Developing HuBERT: a Natural Language Processing algorithm for extending the Seshat Global History


A wide range of historical and archaeological works document the dynamics of past complex societies across the globe. Recent projects, such as the Seshat and D-PLACE databases, have compiled this information across time for different societies, enabling research in multiple disciplines across the humanities and social sciences. Making these databases easier to use and adding more information to them would allow many more research questions to be answered.

The main challenge in expanding historical datasets and increasing their cross-utilization is that data collection is slow and translating information across projects is tedious and error-prone. It takes many human hours to screen the existing literature, thoroughly read selected articles, and manually record variables or re-enter the information into a different framework. The goal of this project is to develop a Natural Language Processing (NLP) algorithm that can help expand current databases and make data easier to translate across projects. In particular, we will build upon the recent BERT language model to partially automate document screening and data collection from archaeological and historical materials as well as existing databases.


01.03.2023 – 

Maria del Rio-Chanona

