Digital Humanities

Many scientific domains use computer-based methods and approaches to verify hypotheses or to explore possible patterns in their datasets. This course focuses mainly on text-based datasets and on machine learning methods for extracting both content and stylistic patterns from texts (historical documents, newspaper articles, political speeches, tweets, etc.). Such datasets are typical of the humanities and social sciences. The course discusses different approaches for detecting evolution over time and differences between authors, genders, authors' ages, and psychological profiles.


Code 32108
Type Course
Site Neuchâtel
Track(s) T3 – Advanced Information Processing
Semester A2021


Learning Outcomes

Learning outcomes The main objective of this course is to introduce students to the various techniques and strategies that can be used

  1. to store, convert, and correct texts to generate a corpus
  2. to extract useful patterns from a corpus
  3. to compute the intertextual similarities between texts or corpora (clustering)
  4. to verify the authorship of a document or to draw the profile of the true author
  5. to apply a fair evaluation of those text categorization methods
Lecturer(s) Jacques Savoy
Language English
Course Page

The course page in ILIAS can be found at

Schedules and Rooms

Period Weekly
Schedule Wednesday, 08:15 - 12:00
Location UniNE, Unimail

Additional information


First Lecture
The first lecture will be announced later.


  • Karsdorp, F., Kestemont, M., Riddell, A. (2021). Humanities Data Analysis. Case Studies with Python. Princeton University Press: Princeton.
  • Savoy, J. (2020). Machine Learning Methods for Stylometry: Authorship Attribution and Author Profiling. Springer: Cham.