Conversion from PDF into XML
Many documents today are made available in PDF format. Working systematically and comprehensively with the contents of these documents requires conversion into XML and possibly intellectual post-processing, which includes:
- Standardization of separator characters (e.g. normal blank spaces vs. non-breaking spaces)
- Correct insertion of links with the appropriate attributes (e.g. link to Directive …, effective as of …)
- Standardization of referenced documents (e.g. internal and external references) and of nomenclature (e.g. Dir for “directive”)
- Integration of notes, footnotes, lists, appendices, etc., at the respective position in the text, in order to improve readability in electronic media
- Consolidation of number formats (e.g. decimal point), special characters, and diagrams
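
Some of the items above can be partly automated. The following is a minimal Python sketch of three of them: separator characters, decimal points, and nomenclature. It is not a complete conversion pipeline. The names `SPACE_VARIANTS`, `NOMENCLATURE`, `clean_text`, and `to_xml_paragraph`, as well as the `<p>` element, are hypothetical and chosen only for illustration. The sketch also assumes that the extracted text uses a comma as the decimal separator with at most two decimal places, which will not hold for every source.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: these space variants should all become a normal blank space.
SPACE_VARIANTS = {"\u00a0", "\u2007", "\u2009", "\u202f"}

# Assumption: a comma between digits, followed by one or two digits only,
# is a decimal comma. "1,000" is left alone because three digits follow.
DECIMAL_COMMA = re.compile(r"(?<=\d),(?=\d{1,2}\b)")

# Hypothetical nomenclature table; a real project would maintain its own.
NOMENCLATURE = {re.compile(r"\b[Dd]irective\b"): "Dir"}


def clean_text(text: str) -> str:
    """Normalize spaces, decimal separators, and nomenclature in one string."""
    text = "".join(" " if ch in SPACE_VARIANTS else ch for ch in text)
    text = DECIMAL_COMMA.sub(".", text)
    for pattern, short_form in NOMENCLATURE.items():
        text = pattern.sub(short_form, text)
    return text


def to_xml_paragraph(text: str) -> str:
    """Wrap the cleaned text in a <p> element and serialize it."""
    paragraph = ET.Element("p")
    paragraph.text = clean_text(text)
    return ET.tostring(paragraph, encoding="unicode")


if __name__ == "__main__":
    print(to_xml_paragraph("Directive\u00a0A: 3,5\u202f% of 1,000"))
```

Links, footnote placement, and diagrams usually depend on the meaning of the text, so a script like this can only prepare the material for the intellectual post-processing mentioned above.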