All ETDs from UAB

Advisory Committee Chair

Steven J Bethard

Advisory Committee Members

Elliot J Lefkowitz

Thamar Solorio

Willig H James

Document Type

Dissertation

Date of Award

2016

Degree Name by School

Doctor of Philosophy (PhD) College of Arts and Sciences

Abstract

A critical area in the field of information extraction is the assignment of ontological categories (meaning) to fragments of natural language. These fragments are termed "utterances" when referring to a continuous fragment or "mentions" when the fragment may be discontinuous. In text, most approaches to ontological assignment use named entity recognition (NER) to first discover mention boundaries. This can then be followed by a concept recognition or normalization process that assigns a more precise conceptual category to the NER-discovered mention. Typically, this semantic assignment is represented by only a single identifier from an ontology. If the provided semantic categories are broad and the mention context is simple, then these identifiers can provide a reasonable conceptual representation of the mention. However, when finer conceptual granularity is required (and it often is), the gap between the expressiveness of natural language and that of the ontologies used to represent it is exposed. This results in mentions for which no single semantic assignment is available. To alleviate this problem, this dissertation proposes a novel multi-identifier framework to semantically compose and annotate text mentions. This dissertation shows that compositional multi-identifier semantic assignment is both possible and useful. The feasibility of multi-identifier normalization is shown through the creation of a multi-identifier corpus and the construction of software to normalize multi-identifier (composite) concepts on this corpus. The utility of multi-identifier normalization is shown through a practical application that extracts cancer-related information from electronic medical records. Finally, we provide evidence that the "content completeness problem" can be alleviated by the use of composite concepts.
