Advisory Committee Chair
Steven J Bethard
Advisory Committee Members
Elliot J Lefkowitz
Thamar Solorio
James H Willig
Document Type
Dissertation
Date of Award
2016
Degree Name by School
Doctor of Philosophy (PhD), College of Arts and Sciences
Abstract
A critical area in the field of information extraction is the assignment of ontological categories (meaning) to fragments of natural language. These fragments are termed "utterances" when referring to a continuous fragment, or "mentions" when the fragment may be discontinuous. In text, most approaches to ontological assignment use named entity recognition (NER) to first discover mention boundaries. This can then be followed by a concept recognition or normalization process that assigns a more precise conceptual category to the NER-discovered mention. Typically, this semantic assignment is represented by only a single identifier from an ontology. If the provided semantic categories are broad and the mention context is simple, these identifiers can provide a reasonable conceptual representation of the mention. However, when finer conceptual granularity is required (and it often is), the gap between the expressiveness of natural language and that of the ontologies used to represent it is exposed. This results in mentions for which no single semantic assignment is available. To alleviate this problem, this dissertation proposes a novel multi-identifier framework for semantically composing and annotating text mentions. This dissertation shows that compositional multi-identifier semantic assignment is both possible and useful. The feasibility of multi-identifier normalization is shown through the creation of a multi-identifier corpus and the construction of software to normalize multi-identifier (composite) concepts on this corpus. The utility of multi-identifier normalization is shown through a practical application that extracts cancer-related information from electronic medical records. Finally, we provide evidence that the "content completeness problem" can be alleviated by the use of composite concepts.
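To make the single-identifier versus composite contrast concrete, the following is a minimal, hypothetical Python sketch. The lexicon entries, the "HYPO:" identifiers, and the greedy longest-match strategy are illustrative assumptions only, not the dissertation's actual method, ontology, or data.

from typing import List, Optional

# Hypothetical ontology lexicon mapping surface forms to single identifiers.
LEXICON = {
    "lung cancer": "HYPO:0001",  # hypothetical id: malignant lung neoplasm
    "left": "HYPO:0002",         # hypothetical id: laterality qualifier
}

def normalize_single(mention: str) -> Optional[str]:
    # Classic single-identifier normalization: succeed only if the whole
    # mention maps to one lexicon entry, otherwise return nothing.
    return LEXICON.get(mention.lower())

def normalize_composite(mention: str) -> List[str]:
    # Composite (multi-identifier) normalization: cover the mention with
    # several identifiers whose combination approximates its meaning.
    # Here a toy greedy longest-match over token spans stands in for the
    # machine-learned composition the dissertation studies.
    tokens = mention.lower().split()
    ids: List[str] = []
    i = 0
    while i < len(tokens):
        for j in range(len(tokens), i, -1):  # longest span first
            span = " ".join(tokens[i:j])
            if span in LEXICON:
                ids.append(LEXICON[span])
                i = j
                break
        else:
            i += 1  # no entry covers this token; leave it unannotated
    return ids

mention = "left lung cancer"
print(normalize_single(mention))     # None: no single identifier fits
print(normalize_composite(mention))  # ['HYPO:0002', 'HYPO:0001']

In the composite case, the laterality qualifier and the disease identifier together approximate a mention that no single identifier in the toy lexicon covers; that unfilled gap is exactly what the multi-identifier framework targets.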
Recommended Citation
Osborne, John David, "Machine Learning Of Composite Concepts And The Alleviation Of The Content Completeness Problem In Text Mention Normalization" (2016). All ETDs from UAB. 2636.
https://digitalcommons.library.uab.edu/etd-collection/2636