XML-based Extraction of Terminological Information from Corpora
Ana Belén Crespo Bastos (Universidade de Vigo)
Xosé María Gómez Clemente (Universidade de Vigo)
Xavier Gómez Guinovart (Universidade de Vigo)
Susana López Fernández (Universidade de Vigo)

In this paper, we present a methodology for extracting terminological information from textual corpora. We describe the processes we follow to identify term candidates in corpora and to recognize term definitions and conceptual relations in textual data. Both the corpora used as the source of terminological information and the terminological database built from that information are stored and maintained by linguists in XML format, and are converted to a MySQL database for consultation through a PHP-based web application.

Document Processing using XML, XML-based natural language processing