Digital Humanities Guide: Analysis



The digital format and availability of massive amounts of data provide enormous new opportunities for research and analysis.  The Taxonomy of Digital Research Activities has identified at least seven categories into which much of this work can be grouped:

  • Content: focusing on deeper understanding of the meaning conveyed by a resource
  • Network: studying the relations between actors and entities in a network, whether actual or virtual
  • Relational: discovering relations between objects of study, as in examinations of intertextuality, influence, membership in a genre, or collation of multiple versions of a work
  • Spatial: discovery of trends or patterns in data pertaining to spatial or geographical aspects
  • Structural: analysis of artifacts on the level of their structural components -- e.g. part of speech tagging, studies of syntax, collocation, concordance
  • Stylistic: identification and comparison of formal and stylistic features of objects
  • Visualization: graphic representation of relationships within a set of data


A digital humanities research project will frequently combine several of these approaches and employ a variety of software tools, whether licensed, open-source, or custom-built, to accomplish the task.  The DHC provides or offers support for a number of these:

  • AntConc: a simple but powerful desktop tool for concordance, word frequency, identification of word clusters, mapping a term across a text, and measuring the "keyness" of a given text or collection vis-a-vis a reference corpus
  • Gephi: an interactive visualization and exploration platform for networks, complex systems, and dynamic and hierarchical graphs
  • Mallet: a command-line Java program for identifying topics within a corpus of texts
  • Natural Language Tool Kit (NLTK): a Python library with a very robust set of tools and corpora for natural language processing and text analysis
  • NVivo: "qualitative analysis" software that supports the coding of themes, topics, and other components in a body of text or media sources, their classification with a set of attributes, and their analysis through a suite of text, word frequency, coding, and matrix queries, as well as cluster analysis and model building
  • stylo R package: software for comparing the styles of multiple texts
  • Voyant: online site with a full suite of tools for analyzing texts -- word frequency, concordancing, collocation, word clouds, and much more
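
To make two of the operations named above more concrete (word frequency and concordance), here is a minimal sketch in plain Python. It is not how AntConc, Voyant, or NLTK implement these features; the function names, the sample sentence, and the simple ASCII-only tokenizer are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of ASCII letters/apostrophes as tokens.

    Assumption: a deliberately simple tokenizer; real projects usually need
    something more careful (accented characters, hyphenation, and so on).
    """
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens, top_n=5):
    """Return the top_n most frequent tokens with their counts."""
    return Counter(tokens).most_common(top_n)


def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines with up to `width` tokens on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Hypothetical sample text, standing in for a real corpus file.
sample = "The whale surfaced. The crew watched the whale, and the whale watched the crew."
tokens = tokenize(sample)

print(word_frequencies(tokens, top_n=2))
for line in concordance(tokens, "whale", width=2):
    print(line)
```

The dedicated tools listed above add a great deal on top of this, such as reference corpora, statistical measures like keyness, and interactive displays, so a sketch like this is best treated as a starting point for a custom script.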


Most of the time, your research questions will require you to write one or more custom scripts to arrive at an answer you can understand and base strong claims on. We are here to provide guidance and consultation. Please make an appointment with one of us to tell us more about your research project and goals.

If an off-the-shelf tool will meet your needs, Geoffrey Rockwell and others maintain a useful repository of analysis tools at TAPoR.