Introduction to Natural Language Processing for DH Research with spaCy




Session I: Intro


Session II: Custom attributes


coffee break (Plein 6)


Session III: Text categorization


Session IV: Prodigy


open discussion & experiments

All of the workshop materials can be accessed on our VM here:


If you prefer to download the notebooks and work locally, they can be found here:

Session I, Seth Bernstein


Seth Bernstein is Assistant Professor of History at the Higher School of Economics (Russian Federation). From January 2020 he will be Assistant Professor of History at the University of Florida (Gainesville, USA). He is the author of Raised under Stalin: Young Communists and the Defense of Socialism (Cornell, 2017). His current project is "Return to the Motherland: The Repatriation of Soviet Citizens after World War II." Seth's work also uses digital techniques such as GIS and massive textual databases to extract and visualize data.

Sessions II + III, David Lassner


David Lassner received his M.Sc. in computer science from TU Berlin in 2017, focusing on machine learning with a minor in German literary studies. He is now a PhD candidate in the machine learning group at TU Berlin, where he researches machine learning in the digital humanities, with a main focus on the (machine-driven) analysis of literature.

Session IV, Andrew Janco


Andrew Janco is the Digital Scholarship Librarian at Haverford College. He completed his Ph.D. in History at the University of Chicago and an MS in Information Science at the University of Illinois. Andy has a passion for inquiry-driven and community-engaged digital projects. He is the lead developer of a digital archive and research application for the Grupo de Apoyo Mutuo, Guatemala's oldest human rights organization. He works on applied machine learning for research applications in humanities and social science scholarship.


NER workflow