Date of Award
2016-01-01
Degree Name
Master of Science
Department
Computer Science
Advisor(s)
Olac Fuentes
Abstract
Machine learning is a subfield of artificial intelligence that aims to let algorithms improve automatically through experience. It has been applied successfully to problems ranging from playing checkers to tasks as simple as predicting the next word while a sentence is being typed. These algorithms perform best with large amounts of training data: the more labeled data available, the better a machine learning algorithm can recognize patterns. However, this ideal scenario, in which a large amount of labeled data is available for training, does not always occur. In many cases, labeling data is both time-consuming and expensive.
The lack of training data has created interest in algorithms that can work with only a small labeled training set. One such approach is semi-supervised learning, which uses a large set of unlabeled examples to supplement the small labeled training set. A classic example of a semi-supervised algorithm is Co-Training, which uses a set of positive and negative labeled examples to label the unlabeled set automatically through machine learning rather than manually. Co-Training has been applied to various problems with little labeled training data, such as web-page classification and image detection. While semi-supervised training is generally less accurate than supervised training, it offers a good solution to problems where there is not enough labeled training data. The field has made advances on many problems, such as gene-disease identification in bioinformatics. In this thesis, I focus on one area that has shown promise with this type of learning: applying Positive and Unlabeled learning to natural language processing. I propose a change to a Positive and Unlabeled learning algorithm, Multi-Level Example Learning, that uses word embeddings to improve on the original algorithm's results for text classification.
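The Co-Training idea described above can be illustrated with a small sketch. It follows the classic co-training scheme, in which each example is described by two feature "views", a separate classifier is trained on each view, and each classifier in turn labels the unlabeled examples it is most confident about and adds them to the shared labeled set. Everything here is an illustrative assumption, not the thesis's method: the nearest-centroid base classifier, the distance-gap confidence score, the hypothetical helper names (`co_train`, `NearestCentroid`), and the toy two-view data. It also uses positive and negative seeds, whereas the Positive and Unlabeled setting the thesis targets starts from positive examples only.

```python
import math
from typing import List, Tuple

Vector = List[float]
Example = Tuple[Vector, Vector]  # (view 1 features, view 2 features)


def centroid(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(p[i] for p in points) / len(points) for i in range(len(points[0]))]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NearestCentroid:
    """Toy single-view classifier (a hypothetical stand-in for a real base learner)."""

    def fit(self, xs: List[Vector], ys: List[int]) -> "NearestCentroid":
        self.pos = centroid([x for x, y in zip(xs, ys) if y == 1])
        self.neg = centroid([x for x, y in zip(xs, ys) if y == 0])
        return self

    def predict(self, x: Vector) -> Tuple[int, float]:
        """Return (label, confidence); confidence is the gap between the
        distances to the two class centroids."""
        d_pos, d_neg = distance(x, self.pos), distance(x, self.neg)
        return (1 if d_pos < d_neg else 0), abs(d_pos - d_neg)


def co_train(labeled: List[Tuple[Example, int]], unlabeled: List[Example],
             rounds: int = 5, per_round: int = 1) -> List[Tuple[Example, int]]:
    """Grow the labeled set by letting each view's classifier label the
    unlabeled examples it is most confident about. Assumes `labeled`
    contains at least one positive and one negative example."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                return labeled
            # Retrain this view's classifier on everything labeled so far,
            # including examples labeled by the other view's classifier.
            clf = NearestCentroid().fit([x[view] for x, _ in labeled],
                                        [y for _, y in labeled])
            # Rank the remaining pool by this classifier's confidence.
            ranked = sorted(pool, key=lambda ex: clf.predict(ex[view])[1], reverse=True)
            for ex in ranked[:per_round]:
                labeled.append((ex, clf.predict(ex[view])[0]))
                pool.remove(ex)
    return labeled


if __name__ == "__main__":
    seed = [(([1.0, 1.0], [1.0]), 1), (([-1.0, -1.0], [-1.0]), 0)]
    unlabeled = [([0.9, 1.1], [0.8]), ([-1.2, -0.9], [-0.7]),
                 ([1.2, 0.8], [1.1]), ([-0.8, -1.1], [-1.3])]
    # On this cleanly separated toy data, every unlabeled example receives
    # the label of the cluster it sits in.
    for example, label in co_train(seed, unlabeled, rounds=2):
        print(example, label)
```

The two views matter because each classifier's confident predictions become new training data for the other; with a single view this would reduce to plain self-training.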
Language
en
Provenance
Received from ProQuest
Copyright Date
2016
File Size
51 pages
File Format
application/pdf
Rights Holder
Emmanuel Carlo Tafoya
Recommended Citation
Tafoya, Emmanuel Carlo, "Using Word Embeddings for Text Classification in Positive and Unlabeled Learning" (2016). Open Access Theses & Dissertations. 757.
https://scholarworks.utep.edu/open_etd/757