Towards tagged PDF

Ross Moore
Macquarie University, Sydney, Australia

This talk will demonstrate recent work by the author and Han The Thanh to enrich pdfTeX with primitives that allow the production of “tagged PDF”. As this is still very much work in progress, the talk will concentrate on those aspects of tagging whose advantages are most easily appreciated. These advantages include, but are not limited to:

  • substitution of Unicode characters for glyph combinations from fonts whose encodings are not Unicode, via CMap resources and other techniques;
  • alternative text, to be read by screen-readers;
  • extraction of text from PDFs in XML format;
  • extraction of mathematical content in MathML format.
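As a rough illustration of the first item, the sketch below builds a ToUnicode CMap for an 8-bit font, mapping single glyph codes to one or more Unicode characters. For example, the OT1 ligature glyphs at codes 0x0B to 0x0F (ff, fi, fl, ffi, ffl) map to their letter sequences, which lets text extraction recover "ffi" rather than an opaque glyph. This is a minimal sketch assuming a one-byte code space. It is not the mechanism used in the enhanced pdfTeX; the function name `to_unicode_cmap` is hypothetical.

```python
def to_unicode_cmap(mapping: dict) -> str:
    """Build a ToUnicode CMap for a one-byte (8-bit) font encoding.

    mapping: font character code (0-255) -> Unicode text for that glyph.
    Hypothetical helper for illustration only.
    """
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<00> <FF>",  # assumption: single-byte codes only
        "endcodespacerange",
    ]
    items = sorted(mapping.items())
    # A single bfchar block may hold at most 100 entries, so split into chunks.
    for start in range(0, len(items), 100):
        chunk = items[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        for code, text in chunk:
            # Destination strings are UTF-16BE, written as uppercase hex.
            dst = text.encode("utf-16-be").hex().upper()
            lines.append(f"<{code:02X}> <{dst}>")
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CIDInit /ProcSet findresource /DefineResource pop",
        "end",
        "end",
    ]
    return "\n".join(lines)


# OT1 ligature slots: each single glyph expands to several Unicode letters.
OT1_LIGATURES = {0x0B: "ff", 0x0C: "fi", 0x0D: "fl", 0x0E: "ffi", 0x0F: "ffl"}

if __name__ == "__main__":
    print(to_unicode_cmap(OT1_LIGATURES))
```

The resulting stream would be referenced from the font dictionary's `/ToUnicode` entry. Mapping a ligature to its letter sequence, rather than to a presentation-form code point such as U+FB00, keeps extracted text searchable.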

Each of these aspects will be illustrated by examples constructed using an enhanced version of pdfTeX.

Also, I’ll try to explain the extra complexity of the internal PDF structures required to generate properly tagged structure and content. If time permits, this may be followed by a discussion of what is needed to adjust the LaTeX format and packages so that properly tagged PDF can be produced automatically, conformant with the ISO 32000-2 standard and with PDF/UA (Universal Accessibility). The former includes MathML tagging of mathematical content; I wish to acknowledge Neil Soiffer (Design Science Inc.) for motivation, much helpful advice, and testing concerning this aspect.
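To give a feel for the internal PDF structures mentioned above, the sketch below emits the two halves that tagging links together. The first half is a page content-stream fragment in which text is wrapped in a marked-content sequence carrying an MCID. The second half is a structure element dictionary whose `/K` entry points back to that MCID on a given page, optionally with `/Alt` text for screen readers. It is a minimal sketch assuming ASCII text and a font resource named `F1`. The helper names and object references (`5 0 R`, `3 0 R`) are hypothetical. A complete file would also need a StructTreeRoot, a ParentTree, and a `/StructParents` entry on the page, none of which are shown here.

```python
from typing import Optional


def pdf_string(text: str) -> str:
    """Encode ASCII text as a PDF literal string, escaping \\, ( and )."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def tagged_paragraph(mcid: int, text: str, font: str = "F1",
                     size: int = 10, x: int = 72, y: int = 720) -> str:
    """Content-stream fragment: text inside a /P marked-content sequence
    whose MCID lets the structure tree refer to it."""
    return (
        f"/P <</MCID {mcid}>> BDC\n"
        f"BT /{font} {size} Tf {x} {y} Td {pdf_string(text)} Tj ET\n"
        "EMC"
    )


def struct_elem(tag: str, parent_ref: str, page_ref: str, mcid: int,
                alt: Optional[str] = None) -> str:
    """Structure element dictionary pointing at marked content `mcid`
    on the page `page_ref`; `alt` becomes the /Alt text if given."""
    entries = [
        "/Type /StructElem",
        f"/S /{tag}",
        f"/P {parent_ref}",   # parent structure element (hypothetical ref)
        f"/Pg {page_ref}",    # page holding the marked content
        f"/K {mcid}",         # integer kid = MCID on that page
    ]
    if alt is not None:
        entries.append(f"/Alt {pdf_string(alt)}")
    return "<< " + " ".join(entries) + " >>"


if __name__ == "__main__":
    print(tagged_paragraph(0, "Hello (tagged) world"))
    print(struct_elem("P", "5 0 R", "3 0 R", 0))
    print(struct_elem("Figure", "5 0 R", "3 0 R", 1, alt="Plot of f(x)"))
```

Keeping content (the page stream) and structure (the StructElem tree) separate, joined only by MCIDs, is what allows the logical reading order and alternative text to differ from the visual drawing order.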

