Columbia University Libraries holds a license that gives current affiliates free access to ProQuest's TDM Studio, a web-based portal for text and data mining (TDM) research using ProQuest's many databases of full-text sources. TDM Studio has two flavors: Visualization and Workbench.
Workbench requires a consultation with RDS first (email data@library.columbia.edu to set up an appointment) and pre-existing knowledge of programming in either Python or R. As with Visualization, Workbench allows the researcher to build a corpus of relevant ProQuest documents, but it then provides a Jupyter Notebook environment where they can pursue nearly any mode of TDM analysis possible in either programming language.
In TDM Studio, corpora are referred to as "datasets," and constructing one is the first step in using TDM Studio.
Building a dataset in TDM Studio is similar to running a regular ProQuest search.
The dataset takes a few minutes to generate; once it is available, the researcher can access it from their TDM Studio Jupyter environment.
Once a researcher has created a dataset, after a few minutes they can set up a Jupyter Notebook environment that has unique access to their dataset(s).
ProQuest is somewhat flexible on these limitations, but relaxing them requires having RDS mediate between the researcher and ProQuest.
The default Jupyter Notebook includes helpful tutorials from ProQuest on accessing TDM Studio datasets. Technically, the process involves copying the dataset to an AWS S3 bucket to which the Jupyter Notebook has access. Once the dataset has been copied to the bucket, it can be deleted from the TDM Studio Workbench Dashboard, freeing up one of the ten dataset slots.
The dataset takes the form of a folder that contains all of the documents, each an XML file with rich metadata.
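As a rough illustration of working with such a folder, the sketch below walks a directory of XML files and pulls a few fields out of each one using Python's standard library. The function names and the element names Title and PubDate are assumptions chosen for illustration, not a confirmed description of ProQuest's schema; open one of the files in the dataset to see which tags it actually contains.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def first_text(root, tag):
    """Return the stripped text of the first descendant named `tag`, or None."""
    node = root.find(f".//{tag}")
    if node is None or node.text is None:
        return None
    return node.text.strip()


def load_dataset(folder, fields=("Title", "PubDate")):
    """Parse every .xml file in `folder` into a dict of the requested fields.

    The default field names are hypothetical; replace them with the tags
    that appear in the actual dataset files.
    """
    records = []
    for path in sorted(Path(folder).glob("*.xml")):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        record = {"file": path.name}
        for tag in fields:
            record[tag] = first_text(root, tag)
        records.append(record)
    return records
```

Calling load_dataset with the path to the dataset folder returns a list of dictionaries, one per document, which can then be handed to whatever analysis library the researcher prefers.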
For Python users, the environment has access to all of the packages that can be installed via conda. Custom libraries can also be uploaded into the environment.
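Before installing anything, it can help to check which packages are already importable in the notebook. The following is a minimal sketch using only the standard library; the helper name missing_packages is made up for this example, and note that a package's import name can differ from its conda package name (for instance, scikit-learn is imported as sklearn).

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names in `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, missing_packages(["pandas", "nltk"]) returns whichever of those two modules is not yet available in the environment, and an empty list if both are.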