[R] How Machine Learning Can Help Unlock the World of Ancient Japan (by Alex Lamb)
Humanity’s rich history has left behind an enormous number of historical documents and artifacts. However, virtually none of these documents, containing stories and recorded experiences essential to our cultural heritage, can be understood by non-experts due to language and writing changes over time.
For instance, archaeologist have unearthed tens of thousands of clay tablets from ancient Babylon, yet only a few hundred specially trained scholars can translate them. The vast majority of these documents have never been read, even if they were uncovered in the 1800s. To give a further illustration of the challenge posed by this scale, a tablet from the Tale of Gilgamesh was collected in an expedition in 1851, but its significance was not brought to light until 1872. This tablet contains a pre-biblical flood narrative, which has enormous cultural significance as a precursor to the Noah’s Ark narrative.
This is a global problem, yet one of the most striking examples is the case of Japan. From 800 until 1900 CE, Japan used a writing system called Kuzushiji, which was removed from the curriculum in 1900 when the elementary school education was reformed. Currently, the overwhelming majority of Japanese speakers cannot read texts which are more than 150 years old. The volume of these texts — comprised of over three million books in storage but only readable by a handful of specially-trained scholars — is staggering. One library alone has digitized 20 million pages from such documents. The total number of documents — including, but not limited to, letters and personal diaries — is estimated to be over one billion. Given that very few people can understand these texts, mostly those with PhDs in classical Japanese literature and Japanese history, it would be very expensive and time-consuming to finance for scholars to convert these documents to modern Japanese. This has motivated the use of machine learning to automatically understand these texts.