Alphabet subsidiary DeepMind has unveiled Ithaca, a new AI model that can help restore and reconstruct historical inscriptions, manuscripts, and other materials. Ithaca is a neural network developed in collaboration with the University of Venice, the University of Oxford, and the Athens University of Economics and Business. It takes its name from Ithaca, the Greek island described in Homer’s Odyssey.
Artificial intelligence has revolutionized the way archaeologists study the past in recent years. Though it cannot fight cursed mummies or crack a whip like Indiana Jones, it has proved to be a valuable asset in unearthing the past. Archaeologists, for example, are examining manuscripts and tablets using computer vision techniques. In many parts of the world, machine learning is used to assess satellite data and other aerial imagery to find potential archaeological sites.
According to a paper published in Nature by DeepMind, Ithaca was trained using natural language processing to restore ancient texts that have degraded over time, identify where a text was originally written, and determine when it was produced. The research had two objectives: finding a way to decode ancient but damaged Greek inscriptions, and developing a modern technique for dating them.
These objectives matter because inscriptions are frequently damaged owing to their age, which makes restoration a difficult but gratifying effort. In addition, because they are frequently etched on inorganic materials like stone or metal, contemporary dating methods such as radiocarbon dating cannot be used to determine when they were written.
Pythia, Ithaca’s precursor, which takes its name from the priestess of Delphi, was DeepMind’s first text restoration system, launched in 2019. The researchers’ first step was to convert the Packard Humanities Institute (PHI) dataset, the world’s largest digitized collection of ancient Greek inscriptions, into PHI-ML, a machine-actionable text format. The PHI dataset includes transcribed texts of 178,551 inscriptions. The researchers then taught Pythia to predict the missing letters of words in those inscriptions using both words and individual characters as inputs.
When presented with an incomplete inscription, Pythia generated as many as 20 candidate letters or phrases to fill the gap, along with a confidence level for each suggestion. It was then up to historians, the domain experts, to sort through the choices and make a final decision based on their subject matter expertise.
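The idea of returning several ranked suggestions with confidence scores can be illustrated with a small beam search. This is only a minimal sketch under stated assumptions, not Pythia's actual decoding code: the function name `top_k_restorations` is hypothetical, the probabilities are made-up toy values rather than model output, each missing position is treated as independent, and Latin letters stand in for Greek characters.

```python
import heapq
import math

def top_k_restorations(position_probs, k=20):
    """Return up to k (text, confidence) pairs for a gap, best first.

    position_probs holds one dict per missing character, mapping a
    candidate letter to its probability. Assumption: positions are
    independent, so a sequence's confidence is the product of its
    letter probabilities (a real model conditions on context).
    """
    beams = [(0.0, "")]  # (log-probability, partial restoration)
    for probs in position_probs:
        candidates = [
            (logp + math.log(p), text + letter)
            for logp, text in beams
            for letter, p in probs.items()
            if p > 0
        ]
        # Keep only the k most probable partial restorations.
        beams = heapq.nlargest(k, candidates)
    return [(text, math.exp(logp)) for logp, text in beams]

# Toy two-letter gap; the numbers are illustrative, not model output.
gap = [
    {"a": 0.7, "e": 0.2, "o": 0.1},
    {"n": 0.6, "s": 0.4},
]
hypotheses = top_k_restorations(gap, k=3)
for text, confidence in hypotheses:
    print(f"{text}  {confidence:.2f}")
```

With these toy numbers, the three suggestions come back ordered from most to least probable, which mirrors how a historian would see a ranked list and still make the final call.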
Ithaca’s neural network architecture is built on the transformer, which employs an attention mechanism to weigh the influence of different input elements on the model’s decision-making process. Because the input character and word representations are concatenated with their sequential positional information, the attention mechanism is aware of the position of each component of the input text. Each Ithaca transformer block produces a sequence of processed representations with a length equal to the number of input characters, and each block’s output becomes the input of the next. The final output is sent to three separate task heads, one each for restoration, geographical attribution, and chronological attribution. Each head is a shallow feedforward neural network trained specifically for its task.
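The data flow described above can be sketched in a few lines of plain Python. This is a toy illustration under loud assumptions, not DeepMind's implementation: all names (`embed`, `attention_block`, `ithaca_like_forward`) and sizes are hypothetical, the weights are random and untrained, the attention block omits learned projections and feedforward layers, each head is reduced to a single linear layer with a softmax, and feeding the first position's output to the two attribution heads is a simplifying assumption.

```python
import math
import random

random.seed(0)

def rand_vector(size):
    return [random.gauss(0.0, 1.0) for _ in range(size)]

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

# Toy sizes, chosen for illustration only.
D_CHAR, D_WORD, D_POS = 4, 4, 2
D_MODEL = D_CHAR + D_WORD + D_POS
ALPHABET = list("abcde-")  # '-' marks a missing character
N_REGIONS, N_DATE_BINS, MAX_LEN = 3, 5, 16

char_table = {c: rand_vector(D_CHAR) for c in ALPHABET}
word_table = {}  # word embeddings, created on first use
pos_table = [rand_vector(D_POS) for _ in range(MAX_LEN)]

def embed(chars, words):
    """Concatenate character, word, and positional vectors per character."""
    rows = []
    for i, (c, w) in enumerate(zip(chars, words)):
        word_vec = word_table.setdefault(w, rand_vector(D_WORD))
        rows.append(char_table[c] + word_vec + pos_table[i])
    return rows

def attention_block(xs):
    """Simplified self-attention with a residual connection.

    The output has one vector per input character, so blocks can be stacked.
    """
    scale = math.sqrt(len(xs[0]))
    out = []
    for q in xs:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in xs])
        mixed = [sum(w * v[j] for w, v in zip(weights, xs)) for j in range(len(q))]
        out.append([a + b for a, b in zip(q, mixed)])
    return out

# One untrained linear head per task (stand-ins for shallow feedforward nets).
W_restore = [rand_vector(D_MODEL) for _ in ALPHABET]
W_region = [rand_vector(D_MODEL) for _ in range(N_REGIONS)]
W_date = [rand_vector(D_MODEL) for _ in range(N_DATE_BINS)]

def ithaca_like_forward(chars, words, n_blocks=2):
    hidden = embed(chars, words)
    for _ in range(n_blocks):  # each block's output feeds the next
        hidden = attention_block(hidden)
    restoration = [softmax(matvec(W_restore, h)) for h in hidden]
    region = softmax(matvec(W_region, hidden[0]))
    date = softmax(matvec(W_date, hidden[0]))
    return restoration, region, date

chars = list("ab-de")
words = ["[unk]"] * len(chars)  # a word containing a gap is treated as unknown
restoration, region, date = ithaca_like_forward(chars, words)
print(len(restoration), len(region), len(date))
```

The point of the sketch is the shape of the computation: one shared sequence of representations, as long as the input, feeds three independent output distributions.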
During testing, the team observed that Ithaca is 62% accurate at restoring damaged texts and 71% accurate at identifying where a text was originally written. It was also shown to place the date of writing to within 30 years, on average. Further, this research is unusual because, unlike existing NLP systems for text generation and analysis such as GPT-3, Ithaca does not rely on word sequences alone for textual context; it also reads the text character by character. However, it is important to note that it is a research tool that still depends on humans.
If you have any ancient Greek text on hand, you may try out a pared-down version of Ithaca here, or use one of the provided samples to see how it fills in the gaps. If your text is longer or has more than 10 letters missing, try it out in this Colab notebook.
DeepMind also collaborated with Google Cloud and Google Arts & Culture on an interactive version of Ithaca. It has open-sourced the code and the pre-trained model to encourage further study. On its blog, DeepMind stated that it is already working on versions of Ithaca trained on other ancient languages. Historians could also use the architecture with their own datasets to study other ancient writing systems, including Akkadian, Demotic, Hebrew, and Mayan. Ithaca is available on this GitHub page.