Below is a list of text sources that Columbia affiliates may use for text mining, also called text and data mining (TDM). Each source has its own rules, so when in doubt, please reach out to Research Data Services (RDS, data@library.columbia.edu) for more information.
Publishers also make their datasets available in different formats.
Please remember that this is an emerging, fast-moving field, so information on this page may quickly fall out of date.
The information presented here was largely gathered by Colleen Major, Head of Electronic Resources Management: Operations and Analysis.
Adam Matthew
Primary sources in humanities and social sciences. (list of databases in Clio)
Adam Matthew allows for TDM via an API. Contact RDS.
American Chemical Society
Journals in chemistry.
ACS allows for full-text TDM. Contact RDS.
BioOne
Non-profit scientific research.
BioOne allows for TDM. Contact RDS.
Cambridge University Press
Academic journals and books.
Cambridge University Press allows for TDM. Contact RDS.
EBSCOhost
Academic journals and other scholarly publications.
EBSCOhost does not offer TDM capabilities for its full-text databases.
Elsevier
Citation metadata for publications in health, physical, and social sciences.
Elsevier allows for TDM via an API key tied to Columbia’s license. See Elsevier’s TDM policy and contact RDS with questions.
Gale
Newspapers, magazines, and religious, historical, and social scientific materials. (list of databases in Clio, search for primary sources)
Gale will share full-text content for TDM with Columbia Libraries upon request. Contact RDS.
JSTOR
Journals and books in the humanities and social sciences.
JSTOR’s full-text TDM is now supported by the Constellate platform. Contact RDS for details.
ProQuest
Newspapers, magazines, journals, and books in the humanities and social sciences. (list of databases in Clio)
ProQuest allows for full-text TDM via ProQuest TDM Studio, a virtual environment. Contact RDS for details.
Oxford University Press
Journals and books in the humanities and social sciences.
OUP allows for TDM. Contact RDS.
SAGE
Journals and books in business, humanities, social sciences, science, technology, and medicine.
SAGE allows for TDM that follows its guidelines. It also has datasets available via Data Planet. Contact RDS with questions.
Springer Nature
Journals in science.
Springer Nature provides a limited API for TDM and also allows for full-text TDM via an API key belonging to Columbia. Contact RDS.
Taylor & Francis
Scholarly journals.
Taylor & Francis allows for TDM. Contact RDS.
Web of Science
Citation metadata for articles in the sciences.
Web of Science allows for TDM via an API. Contact RDS with questions.
Wiley
Journals and books in science, technology, medicine, professional development, and higher education.
Wiley allows for full-text TDM via an API. Contact RDS with questions.