We describe an experiment in transforming large collections of LaTeX documents into more machine-understandable representations. Concretely, we are translating the collection of scientific publications in the Cornell e-Print Archive (arXiv) using the LaTeXtoXML converter, which is currently under development.