Introduction to Natural Language Processing for DH Research with spaCy

Schedule

14:00-14:15  Introduction
14:15-15:00  Session I: Intro
15:00-15:25  Session II: Custom attributes
15:30-16:00  coffee break (Plein 6)
16:00-16:20  Session III: Text categorization
16:20-17:05  Session IV: Prodigy
17:05-18:00  open discussion & experiments

Notebooks

All of the workshop materials can be accessed on our VM here:

 

spacy.apjan.co:8000

 

If you prefer to download the notebooks and work locally, they can be found here:

 

github.com/apjanco/spaCy_DH2019_workshop

Session I, Seth Bernstein

 

Seth Bernstein is Assistant Professor of History at the Higher School of Economics (Russian Federation). From January 2020 he will be Assistant Professor of History at the University of Florida (Gainesville, USA). He is the author of Raised under Stalin: Young Communists and the Defense of Socialism (Cornell, 2017). His current project is "Return to the Motherland: The Repatriation of Soviet Citizens after World War II." Seth's work also uses digital techniques such as GIS and massive textual databases to extract and visualize data.

Sessions II + III, David Lassner

 

David Lassner received his M.Sc. in computer science from TU Berlin in 2017, focusing on machine learning with a minor in German literary studies. He is now a PhD candidate in the machine learning group at TU Berlin, researching machine learning in the digital humanities; his main focus is the (machine-driven) analysis of literature.

Session IV, Andrew Janco

 

Andrew Janco is the Digital Scholarship Librarian at Haverford College. He completed his Ph.D. in History at the University of Chicago and an MS in Information Science at the University of Illinois. Andy has a passion for inquiry-driven and community-engaged digital projects. He is the lead developer of a digital archive and research application for the Grupo de Apoyo Mutuo, Guatemala's oldest human rights organization. He works on applied machine learning for research applications in humanities and social science scholarship.

Prodigy

NER workflow