Company for Information Management and Documentation

Converting print data into XML format

Project examples

A scientific reference work, originally available only in print, was reworked so that its content can be used more comprehensively. One of the challenges was to reproduce illustrations of differing resolutions in a uniform way, revealing to the user additional information that was not available in the original.

Project duration: 6 months

Even today, many reference books and standard textbooks are produced primarily for print and are therefore available for secondary use in electronic media only to a limited extent. Search options and navigation via cross-references in particular are heavily restricted. To enable searching and navigation in electronic media as well, we developed, together with a leading academic publisher, a concept for transferring print data into an augmented XML format. In a first step, we created raw files using a parser that exploited information from the typesetting data as far as possible, e.g. type size, font, or color, since these features carry a clearly defined meaning in the printed work. Even at this stage, it turned out that these features were often used ambiguously, e.g. italics both for highlighting and for the names of biological species. Because of this, XML tags can only be assigned unambiguously by a human editor with the requisite expert knowledge. In addition, the publisher decided to augment the text on the content level as well.
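The first step can be illustrated with a minimal Python sketch. The input format (runs of text with a style name), the tag names `term`, `emphasis`, and `species`, and the `review` attribute are assumptions made for illustration, not the parser or schema actually used in the project. Features with a single meaning map directly to a tag; italics, which may mark either emphasis or a species name, are flagged for an expert editor.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from typesetting features to candidate XML tags.
# A feature with more than one candidate is ambiguous.
FEATURE_TAGS = {
    "bold": ["term"],
    "italic": ["emphasis", "species"],
}


def runs_to_xml(runs):
    """Convert (text, style) runs into a raw <para> element.

    Runs whose style maps to several tags are wrapped in the first
    candidate and marked with review="..." for manual resolution.
    """
    para = ET.Element("para")
    last = None  # last child element appended, needed for tail text
    for text, style in runs:
        candidates = FEATURE_TAGS.get(style, [])
        if not candidates:
            # Unstyled text goes into para.text or the previous child's tail.
            if last is None:
                para.text = (para.text or "") + text
            else:
                last.tail = (last.tail or "") + text
            continue
        elem = ET.SubElement(para, candidates[0])
        elem.text = text
        if len(candidates) > 1:
            elem.set("review", "|".join(candidates))
        last = elem
    return para


runs = [
    ("The bacterium ", "plain"),
    ("Escherichia coli", "italic"),
    (" is a ", "plain"),
    ("model organism", "bold"),
    (".", "plain"),
]
raw = runs_to_xml(runs)
print(ET.tostring(raw, encoding="unicode"))
```

An editor would later replace the flagged `emphasis` element with `species` and drop the `review` attribute.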

Practical examples include:

  • insertion of synonym relations (e.g. between term and abbreviation)
  • insertion of additional information, definitions, etc.
  • links between index and text
  • semantic additions (e.g. distinguishing between the starting material and the product of a chemical reaction)
  • additional search options by storing hidden synonyms and spelling variants
  • integration and indexing of purely graphical elements
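How hidden synonyms and spelling variants widen the search can be sketched as follows. The entry structure, the identifiers, and the helper names `normalize` and `build_search_index` are hypothetical; the sketch only assumes that each entry carries a visible term plus a list of variants that are indexed but not displayed.

```python
from collections import defaultdict


def normalize(term):
    """Lower-case and trim a term so lookups ignore case and padding."""
    return term.strip().lower()


def build_search_index(entries):
    """Map every visible term and hidden variant to the ids of its entries.

    `entries` is a hypothetical structure: id -> (term, [hidden variants]).
    """
    index = defaultdict(set)
    for entry_id, (term, variants) in entries.items():
        for variant in [term, *variants]:
            index[normalize(variant)].add(entry_id)
    return index


entries = {
    "e1": ("deoxyribonucleic acid", ["DNA", "desoxyribonucleic acid"]),
    "e2": ("sulfur", ["sulphur"]),
}
index = build_search_index(entries)

# A search for the abbreviation or an alternative spelling finds the entry,
# although neither string appears in the visible term.
print(sorted(index[normalize("dna")]))
print(sorted(index[normalize("Sulphur")]))
```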

At the end of this post-processing and the subsequent validation against a DTD (Document Type Definition), a high-quality data set is available. The high formal and content-related consistency of the data is the precondition for its further use on electronic platforms and in various applications.
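DTD validation itself requires a validating parser, which Python's standard `xml.etree.ElementTree` module is not. As a complement, the sketch below shows one content-level consistency check of the kind such data benefits from: every index entry should point to an existing place in the text. The element names `entry` and `indexterm` and the attributes `id` and `ref` are assumptions for illustration, not the project's actual DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the second index term points nowhere.
SAMPLE = """
<book>
  <text>
    <entry id="e1">Acid</entry>
    <entry id="e2">Base</entry>
  </text>
  <index>
    <indexterm ref="e1">Acid</indexterm>
    <indexterm ref="e3">Salt</indexterm>
  </index>
</book>
"""


def dangling_index_refs(xml_text):
    """Return the ref values of index terms that match no entry id."""
    root = ET.fromstring(xml_text)
    known_ids = {entry.get("id") for entry in root.iter("entry")}
    return [
        term.get("ref")
        for term in root.iter("indexterm")
        if term.get("ref") not in known_ids
    ]


print(dangling_index_refs(SAMPLE))
```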

All required steps can be carried out either remotely in the client's system or in GIMD’s ARTIS database. In the latter case, the ARTIS software supports editors with needs-based checking routines, keyboard shortcuts, automated data import and export, allocation of work packages, and much more.