Below you will find a list of sources of texts that Columbia affiliates may use for text and data mining (TDM) purposes. Every source has different rules associated with it, so when in doubt, please reach out to Research Data Services (data@library.columbia.edu) for more information.
Publishers also make their datasets available in different formats.
Please remember that this is an emerging, fast-moving field, so information on this page may fall out of date rather quickly.
The information presented here was largely gathered by Colleen Major, Head of Electronic Resources Management: Operations and Analysis.
Adam Matthew
Primary sources in humanities and social sciences. (list of databases in Clio)
Adam Matthew allows for TDM via an API. Contact RDS.
American Chemical Society
Journals in chemistry.
ACS allows for fulltext TDM. Contact RDS.
BioOne
Non-profit scientific research.
BioOne allows for TDM. Contact RDS.
Cambridge University Press
Academic journals and books.
Cambridge University Press allows for TDM. Contact RDS.
EBSCOhost
Academic journals and other scholarly publications.
No TDM capabilities for full text databases.
Elsevier / Scopus
Citation metadata for publications in health, physical, and social sciences.
Elsevier allows for TDM via an API key tied to Columbia’s license. See Elsevier’s TDM policy and contact RDS with questions. Full text is available only for articles published via ScienceDirect.
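As a rough illustration of what a request made with an API key looks like, the Python sketch below builds (but does not send) a full-text request for a single ScienceDirect article. The base URL and the X-ELS-APIKey header reflect Elsevier's developer documentation as understood here and should be verified against it. The DOI and key are placeholders, and build_fulltext_request is a hypothetical helper name, not part of any Elsevier library.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed base URL for Elsevier's article retrieval API; confirm it against
# Elsevier's current developer documentation before relying on it.
BASE_URL = "https://api.elsevier.com/content/article/doi/"


def build_fulltext_request(doi: str, api_key: str) -> Request:
    """Build (but do not send) a full-text request for one ScienceDirect DOI."""
    # Percent-encode the DOI but keep the slash between prefix and suffix.
    url = BASE_URL + quote(doi, safe="/")
    headers = {
        "X-ELS-APIKey": api_key,  # key issued under the institutional license
        "Accept": "text/xml",     # ask for the XML representation
    }
    return Request(url, headers=headers)


if __name__ == "__main__":
    # Placeholder DOI and key, for illustration only.
    req = build_fulltext_request("10.1016/j.example.2020.01.001", "YOUR_API_KEY")
    print(req.full_url)
```

Whether and how such a request may be sent depends on Elsevier’s TDM policy and the terms of Columbia’s license, so check with RDS before running anything at scale.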
Gale
Newspapers, magazines, and religious, historical, and social scientific materials. (list of databases in Clio, search for primary sources)
Gale will share fulltext content for TDM with Columbia Libraries upon request. Contact RDS.
HathiTrust Research Center
The HathiTrust Digital Library
HTRC allows for web-based analysis of the HathiTrust Digital Library.
IEEE Xplore
Scientific and technical content published by the IEEE (Institute of Electrical and Electronics Engineers) and its publishing partners.
IEEE Xplore has an API that Columbia researchers can access. See their Quick Start Guide for more information.
JSTOR
Journals and books in the humanities and social sciences.
JSTOR’s fulltext TDM is supported both by the Constellate platform and individual requests for data transfers. Contact RDS for details.
ProQuest
Newspapers, magazines, journals, and books in the humanities and social sciences. (list of databases in Clio)
ProQuest allows for fulltext TDM via ProQuest TDM Studio, a virtual environment. Contact RDS for details.
Oxford University Press
Journals and books in the humanities and social sciences.
OUP allows for TDM. Contact RDS.
SAGE
Journals and books in business, humanities, social sciences, science, technology, and medicine.
SAGE allows for TDM that follows their guidelines. They also have datasets available via Data Planet. Contact RDS with questions.
Scopus
See Elsevier, above.
Springer Nature
Journals in science.
Springer Nature provides a limited API for TDM, but allows for fulltext TDM via an API key belonging to Columbia. Contact RDS.
Taylor & Francis
Scholarly journals.
Taylor & Francis allows for TDM. Contact RDS.
University of Chicago Press Journals
Journals and books.
The University of Chicago Press allows for TDM. Contact RDS with questions.
Web of Science
Citation metadata for articles in the sciences. Includes citation counts but no full text.
Web of Science allows for TDM via an API. Contact RDS with questions.
Wiley
Journals and books in science, technology, medicine, professional development, and higher education.
Wiley allows for fulltext TDM via an API. Contact RDS with questions.