
Degree Name

Master of Science


Department

Computer Science


Advisor

Olac Fuentes


Convolutional neural networks have seen much success in computer vision and natural language processing tasks. When training convolutional neural networks for text classification, a common technique is to transform an input sequence of words into a dense matrix of word embeddings (words represented as dense vectors) using table lookup operations. This represents the inputs in a form to which the well-known convolution and pooling operations can be applied, much as they are to images. These word embeddings may be further incorporated into the neural network itself as a trainable layer to allow fine-tuning, usually leading to improved model predictions. The drastic increase in free parameters, however, leads to overfitting if proper regularization is not applied or the training set is not large enough.
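The table lookup described above can be sketched in a few lines. The vocabulary and vector values here are hypothetical; in practice the table is learned during training or loaded from pretrained embeddings.

```python
# Sketch of an embedding table lookup. The vocabulary and vector values
# are hypothetical; real embeddings are learned or pretrained.
EMBEDDING_DIM = 3

embedding_table = {
    "neural":   [0.10, 0.30, -0.20],
    "networks": [0.40, -0.10, 0.00],
    "classify": [-0.30, 0.20, 0.50],
}

def embed(tokens, table, dim=EMBEDDING_DIM):
    """Map a token sequence to a dense matrix via table lookup.
    Out-of-vocabulary words fall back to a zero vector."""
    return [table.get(token, [0.0] * dim) for token in tokens]

# One row per token, one column per embedding dimension; 1-D convolution
# and pooling can then slide along the sequence (row) axis.
matrix = embed(["neural", "networks", "text"], embedding_table)
```

Making the table itself a trainable layer, as discussed above, amounts to letting gradients update these vectors during training, which is what multiplies the number of free parameters.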

I give an overview of convolutional and recurrent network architectures, describe their basic functions, and discuss their observed advantages and shortcomings in my experiments. I follow this discussion with an overview of my final architecture, which combines elements of both.

I train neural networks using abstracts from multiple science and engineering fields; each set of abstracts comprises multiple topics. The number of publications available for my task is moderate, in the mid-thousands per topic. I analyze the effect of using word embeddings with the models in terms of fit and prediction. I then propose embedding "trainability" schemes to alleviate overfitting, improve test accuracy, and reduce training times. I conclude my study by proposing several data augmentation techniques designed for text sequences to further mitigate overfitting and improve generalization. Finally, I discuss my empirical results and propose directions for future work.
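As one illustration of what augmentation for text sequences can look like, consider a generic word-dropout scheme (an illustrative example, not necessarily one of the techniques proposed in this thesis):

```python
import random

def word_dropout(tokens, p=0.3, seed=0):
    """Generic text-augmentation sketch: randomly drop words to create a
    perturbed copy of a training sequence. The drop probability p and the
    non-empty fallback are illustrative choices."""
    rng = random.Random(seed)
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else list(tokens)  # never emit an empty sequence

tokens = "convolutional networks classify scientific abstracts".split()
augmented = word_dropout(tokens)
```

Applying such perturbations at training time yields many slightly different copies of each abstract, which acts as a regularizer when the number of original documents is only in the mid-thousands.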




Received from ProQuest

File Size

53 pages


Rights Holder

Jonathan Quijas