{"id":1652,"date":"2018-05-18T16:28:23","date_gmt":"2018-05-18T14:28:23","guid":{"rendered":"https:\/\/p686699.mittwaldserver.info\/?p=1652"},"modified":"2025-03-20T12:45:10","modified_gmt":"2025-03-20T11:45:10","slug":"conversion-from-pdf-into-xml","status":"publish","type":"post","link":"https:\/\/gimd.de\/en\/2018\/05\/18\/conversion-from-pdf-into-xml\/","title":{"rendered":"Conversion from PDF into XML"},"content":{"rendered":"\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Many documents today are made available in PDF format. Working systematically and comprehensively with their contents requires conversion into <abbr title=\"XML (Extensible Markup Language) is a markup language for representing hierarchically structured data as text files that are readable by both humans and machines.\">XML format<\/abbr> and, where necessary, intellectual post-processing.<\/p>\n<\/blockquote>\n\n\n\n<p><strong>Project duration: <\/strong>9 months<br><br>Today, most official documents, including all EU laws and directives (see <a href=\"http:\/\/eur-lex.europa.eu\">http:\/\/eur-lex.europa.eu<\/a>), are published as PDF files and made available for download. Converting these documents into XML format is advisable for working with them systematically and comprehensively. In a first step, so-called parsers generate an XML file from the formatting information in the PDF file. For most applications, however, this raw file is unusable, because neither the assignment nor the content of the individual <abbr title=\"XML (Extensible Markup Language) is a markup language for representing hierarchically structured data as text files that are readable by both humans and machines. 
Tags supplement the data with additional information.\">XML tags<\/abbr> is consistent from the outset.<br><br>Intellectual post-processing is therefore indispensable whenever specific requirements apply to searchability and cross-referencing. It may include the following steps:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Standardization of separator characters (e.g. normal spaces vs. non-breaking spaces)<\/li>\n\n\n\n<li>Correct insertion of links with the appropriate attributes (e.g. link to Directive \u2026, effective as of \u2026)<\/li>\n\n\n\n<li>Standardization of referenced documents (e.g. internal and external references) and of nomenclature (e.g. \u201cDir\u201d for \u201cdirective\u201d)<\/li>\n\n\n\n<li>Integration of notes, footnotes, lists, appendices, etc. at the appropriate position in the text to improve readability in electronic media<\/li>\n\n\n\n<li>Harmonization of numbers (e.g. decimal separators), special characters, and diagrams<\/li>\n<\/ul>\n\n\n\n<p>After this post-processing and the subsequent validation against a <abbr title=\"A Document Type Definition (DTD) specifies which elements and attributes a document of a particular XML type may contain and how they may be nested.\">DTD (Document Type Definition)<\/abbr>, a high-quality dataset is available. This high degree of formal and content-related consistency is the prerequisite for further use of the data on electronic platforms and in various applications.<br>All required steps can be carried out either remotely, i.e. in the client\u2019s system, or in GIMD\u2019s <a href=\"https:\/\/gimd.de\/en\/software\/\">ARTIS database<\/a>. 
In the latter case, the <a href=\"https:\/\/gimd.de\/en\/software\/\">ARTIS software<\/a> supports our editors with tailored checking routines, keyboard shortcuts, automated data import and export, allocation of work packages, and much more.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Many documents today are made available in a PDF format. To work systematically and comprehensively with the contents of these [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[109,76,95,85,118,71,101,106],"tags":[],"class_list":["post-1652","post","type-post","status-publish","format-standard","hentry","category-converting","category-e-books-en","category-electronic-publishing-systems","category-law","category-processing-editing","category-projects","category-search-engines","category-standardizing"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ 
-->\n<title>Conversion from PDF into XML &#8211; GIMD<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/gimd.de\/en\/2018\/05\/18\/conversion-from-pdf-into-xml\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Conversion from PDF into XML &#8211; GIMD\" \/>\n<meta property=\"og:description\" content=\"Many documents today are made available in a PDF format. To work systematically and comprehensively with the contents of these [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/gimd.de\/en\/2018\/05\/18\/conversion-from-pdf-into-xml\/\" \/>\n<meta property=\"og:site_name\" content=\"GIMD\" \/>\n<meta property=\"article:published_time\" content=\"2018-05-18T14:28:23+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-03-20T11:45:10+00:00\" \/>\n<meta name=\"author\" content=\"gimd-redaktion\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"gimd-redaktion\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/gimd.de\/en\/2018\/05\/18\/conversion-from-pdf-into-xml\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/gimd.de\/en\/2018\/05\/18\/conversion-from-pdf-into-xml\/\"},\"author\":{\"name\":\"gimd-redaktion\",\"@id\":\"https:\/\/gimd.de\/en\/#\/schema\/person\/ba78560ef83b195d9b67f42a1c6e0a6e\"},\"headline\":\"Conversion from PDF into XML\",\"datePublished\":\"2018-05-18T14:28:23+00:00\",\"dateModified\":\"2025-03-20T11:45:10+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/gimd.de\/en\/2018\/05\/18\/conversion-from-pdf-into-xml\/\"},\"wordCount\":331,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/gimd.de\/en\/#organization\"},\"articleSection\":[\"converting\",\"E-Books\/E-Journals\",\"Electronic Publishing Systems\",\"Law\",\"processing\/editing\",\"Projects\",\"Search Engines\",\"standardizing\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/gimd.de\/en\/2018\/05\/18\/conversion-from-pdf-into-xml\/\",\"url\":\"https:\/\/gimd.de\/en\/2018\/05\/18\/conversion-from-pdf-into-xml\/\",\"name\":\"Conversion from PDF into XML &#8211; 
GIMD\",\"isPartOf\":{\"@id\":\"https:\/\/gimd.de\/en\/#website\"},\"datePublished\":\"2018-05-18T14:28:23+00:00\",\"dateModified\":\"2025-03-20T11:45:10+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/gimd.de\/en\/2018\/05\/18\/conversion-from-pdf-into-xml\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/gimd.de\/en\/2018\/05\/18\/conversion-from-pdf-into-xml\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/gimd.de\/en\/2018\/05\/18\/conversion-from-pdf-into-xml\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Startseite\",\"item\":\"https:\/\/gimd.de\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Conversion from PDF into XML\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/gimd.de\/en\/#website\",\"url\":\"https:\/\/gimd.de\/en\/\",\"name\":\"GIMD\",\"description\":\"Limited Corporation for Information Management and Documentation\",\"publisher\":{\"@id\":\"https:\/\/gimd.de\/en\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/gimd.de\/en\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/gimd.de\/en\/#organization\",\"name\":\"Gesellschaft f\u00fcr Informations-Management und Dokumentation mbH\",\"url\":\"https:\/\/gimd.de\/en\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/gimd.de\/en\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/gimd.de\/wp-content\/uploads\/2025\/03\/cropped-GIMD-Logo-002.png\",\"contentUrl\":\"https:\/\/gimd.de\/wp-content\/uploads\/2025\/03\/cropped-GIMD-Logo-002.png\",\"width\":800,\"height\":140,\"caption\":\"Gesellschaft f\u00fcr Informations-Management und Dokumentation 
mbH\"},\"image\":{\"@id\":\"https:\/\/gimd.de\/en\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/gimd.de\/en\/#\/schema\/person\/ba78560ef83b195d9b67f42a1c6e0a6e\",\"name\":\"gimd-redaktion\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/secure.gravatar.com\/avatar\/e0982fccb74046b96083b824011f76dd98b3644c85bed7cc8c056ebb40fcb2cd?s=96&d=mm&r=g\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/e0982fccb74046b96083b824011f76dd98b3644c85bed7cc8c056ebb40fcb2cd?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/e0982fccb74046b96083b824011f76dd98b3644c85bed7cc8c056ebb40fcb2cd?s=96&d=mm&r=g\",\"caption\":\"gimd-redaktion\"},\"url\":\"https:\/\/gimd.de\/en\/author\/gimd-redaktion\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Conversion from PDF into XML &#8211; GIMD","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/gimd.de\/en\/2018\/05\/18\/conversion-from-pdf-into-xml\/","og_locale":"en_US","og_type":"article","og_title":"Conversion from PDF into XML &#8211; GIMD","og_description":"Many documents today are made available in a PDF format. To work systematically and comprehensively with the contents of these [&hellip;]","og_url":"https:\/\/gimd.de\/en\/2018\/05\/18\/conversion-from-pdf-into-xml\/","og_site_name":"GIMD","article_published_time":"2018-05-18T14:28:23+00:00","article_modified_time":"2025-03-20T11:45:10+00:00","author":"gimd-redaktion","twitter_card":"summary_large_image","twitter_misc":{"Written by":"gimd-redaktion","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/gimd.de\/en\/2018\/05\/18\/conversion-from-pdf-into-xml\/#article","isPartOf":{"@id":"https:\/\/gimd.de\/en\/2018\/05\/18\/conversion-from-pdf-into-xml\/"},"author":{"name":"gimd-redaktion","@id":"https:\/\/gimd.de\/en\/#\/schema\/person\/ba78560ef83b195d9b67f42a1c6e0a6e"},"headline":"Conversion from PDF into XML","datePublished":"2018-05-18T14:28:23+00:00","dateModified":"2025-03-20T11:45:10+00:00","mainEntityOfPage":{"@id":"https:\/\/gimd.de\/en\/2018\/05\/18\/conversion-from-pdf-into-xml\/"},"wordCount":331,"commentCount":0,"publisher":{"@id":"https:\/\/gimd.de\/en\/#organization"},"articleSection":["converting","E-Books\/E-Journals","Electronic Publishing Systems","Law","processing\/editing","Projects","Search Engines","standardizing"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/gimd.de\/en\/2018\/05\/18\/conversion-from-pdf-into-xml\/","url":"https:\/\/gimd.de\/en\/2018\/05\/18\/conversion-from-pdf-into-xml\/","name":"Conversion from PDF into XML &#8211; GIMD","isPartOf":{"@id":"https:\/\/gimd.de\/en\/#website"},"datePublished":"2018-05-18T14:28:23+00:00","dateModified":"2025-03-20T11:45:10+00:00","breadcrumb":{"@id":"https:\/\/gimd.de\/en\/2018\/05\/18\/conversion-from-pdf-into-xml\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/gimd.de\/en\/2018\/05\/18\/conversion-from-pdf-into-xml\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/gimd.de\/en\/2018\/05\/18\/conversion-from-pdf-into-xml\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Startseite","item":"https:\/\/gimd.de\/en\/"},{"@type":"ListItem","position":2,"name":"Conversion from PDF into XML"}]},{"@type":"WebSite","@id":"https:\/\/gimd.de\/en\/#website","url":"https:\/\/gimd.de\/en\/","name":"GIMD","description":"Limited Corporation for Information Management and 
Documentation","publisher":{"@id":"https:\/\/gimd.de\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/gimd.de\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/gimd.de\/en\/#organization","name":"Gesellschaft f\u00fcr Informations-Management und Dokumentation mbH","url":"https:\/\/gimd.de\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/gimd.de\/en\/#\/schema\/logo\/image\/","url":"https:\/\/gimd.de\/wp-content\/uploads\/2025\/03\/cropped-GIMD-Logo-002.png","contentUrl":"https:\/\/gimd.de\/wp-content\/uploads\/2025\/03\/cropped-GIMD-Logo-002.png","width":800,"height":140,"caption":"Gesellschaft f\u00fcr Informations-Management und Dokumentation mbH"},"image":{"@id":"https:\/\/gimd.de\/en\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/gimd.de\/en\/#\/schema\/person\/ba78560ef83b195d9b67f42a1c6e0a6e","name":"gimd-redaktion","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/e0982fccb74046b96083b824011f76dd98b3644c85bed7cc8c056ebb40fcb2cd?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/e0982fccb74046b96083b824011f76dd98b3644c85bed7cc8c056ebb40fcb2cd?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e0982fccb74046b96083b824011f76dd98b3644c85bed7cc8c056ebb40fcb2cd?s=96&d=mm&r=g","caption":"gimd-redaktion"},"url":"https:\/\/gimd.de\/en\/author\/gimd-redaktion\/"}]}},"_links":{"self":[{"href":"https:\/\/gimd.de\/en\/wp-json\/wp\/v2\/posts\/1652","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gimd.de\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gimd.de\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gimd.de\/en\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":t
rue,"href":"https:\/\/gimd.de\/en\/wp-json\/wp\/v2\/comments?post=1652"}],"version-history":[{"count":2,"href":"https:\/\/gimd.de\/en\/wp-json\/wp\/v2\/posts\/1652\/revisions"}],"predecessor-version":[{"id":1654,"href":"https:\/\/gimd.de\/en\/wp-json\/wp\/v2\/posts\/1652\/revisions\/1654"}],"wp:attachment":[{"href":"https:\/\/gimd.de\/en\/wp-json\/wp\/v2\/media?parent=1652"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gimd.de\/en\/wp-json\/wp\/v2\/categories?post=1652"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gimd.de\/en\/wp-json\/wp\/v2\/tags?post=1652"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}