Project examples
Our task was to process and structure the records for song pamphlets in the catalog of the Southwest German Library Network. Working from PDF documents and automatically generated catalog entries, we revised the records according to the applicable cataloging rules and linked them to additional information.
Project duration: 24 months
“Alt-Heidelberg, du feine …”: this song by Joseph Victor von Scheffel, like many others, was circulated as a pamphlet in the 19th century. For centuries, such song pamphlets were printed in relatively large runs and sold commercially. They always consisted of loose, unbound sheets and usually contained several songs, such as glees for convivial singing, religious songs, or political songs. They were frequently illustrated, but only rarely did they include musical notation.
Valuable collections of song pamphlets from the 16th to the 20th century were digitized in libraries and cataloged down to the level of the individual song. This project also supplemented the directories of German-language prints, which are organized by century. In total, the corpus consisted of 15,000 song pamphlets with about 33,000 songs.
GIMD received PDF documents of the pamphlets together with linked, automatically generated catalog entries containing basic information from these documents and from the archival finding aids of the German Folk Song Archives. Our task was to revise these catalog entries online according to the applicable current rules and to enrich them with additional information from the PDF documents.
This involved a great deal of intellectual work. Song lyrics, for instance, are often written in historical dialects, yet the beginnings and choruses of the songs must also be recorded in standardized form in the data record. The languages of the songs were identified and designated according to a predefined language code. This required considerable linguistic expertise, especially for less common languages such as Rhaeto-Romance or Catalan. Our client was informed of problem cases, such as poorly scanned PDF documents, on a weekly basis. ARTIS provided essential support for this reporting.
The project was carried out entirely without paper. After two years, the collection was ready to be presented on the Internet. Thanks to digitization and structuring, these precious collections could be made accessible to academia and the general public without endangering the fragile original documents.