Most digital humanities work draws on preexisting digital or analog resources. Depending on a resource's format, there are a variety of strategies for identifying and capturing it.
Columbia Libraries' website brings together a uniquely strong collection of resources for humanities research, supplementing the vast array of publicly accessible material on the Internet. The links below can help you search for and retrieve the information and sources you need from both places as effectively as possible.
For large-scale projects, the standard retrieval interfaces may be too slow and time-consuming. We are happy to help you explore other options for mass mining and capture, including tools that scrape and harvest web content automatically.
At the moment, our scanners are not available to the public. However, you can access OCR (optical character recognition) software remotely through CUIT.
The DHC does not currently provide equipment for recording and digitizing audio and video content, although we are exploring options for such a service. (Undergraduate film majors who wish to reserve equipment for their projects can make arrangements with the Film Department.) In the meantime, we can offer some recommendations.
While manuscripts and poorly printed material can be scanned, they are unlikely to yield useful OCR. (Note, however, that large bodies of OCRed text can sometimes lend themselves to certain types of text mining and analysis even when they contain many errors. Consult with staff to learn more.) If you need highly accurate digital text, you will need to use a transcription tool such as those listed below. Reliable speech-to-text transcription is available through services offered by private companies outside Columbia.
At Columbia Libraries, we support a number of programs for converting text, image, audio, and video files from one format to another. Some of the most powerful are noted below. It is also worth noting that texts created or saved in standard markup formats, such as XML or Markdown, can easily be exported to a variety of other formats. We can also consult with you on custom scripted solutions tailored to your needs.