Context-Aware Temporal Embeddings for Text and Video Data

Ahnaf Farhan, University of Texas at El Paso


Recent years have seen an exponential increase in unstructured data, primarily in the form of text, images, and videos. Extracting useful features and trends from large-scale unstructured datasets, such as news outlets, scientific papers, and videos like security camera or body cam recordings, poses substantial challenges of volume, scalability, complexity, and semantic understanding. In analyzing trends, comprehending the temporal context is vital for uncovering patterns and narratives that are not apparent from a single video frame or text document. Despite its importance, many existing data mining and machine learning approaches overlook the extraction of evolutionary contextual features from datasets. This oversight leads to missed opportunities to harness insights for improved decision-making and predictive analysis. In this dissertation, I seek to address the gap between regular and temporal (dynamic) representations of video and text data. Regular representations do not capture the temporal contexts of features; dynamic representations, on the other hand, capture contextual changes over time, improving the quality of the features for downstream prediction applications.

My dissertation focuses on neural network embeddings as distributed representations of unit features of the data (e.g., entities or words in text, and objects in videos). I investigate how the embeddings can be generated to capture contextual changes. I propose temporal embedding models that (1) are capable of detecting both short-term and long-term shifts in the semantics of entities within text data, a capability often missing in existing temporal word embedding models, and (2) capitalize on the spatial distance between objects and their appearance across video frames to model contextual trends in video data. My experiments demonstrate that the proposed models provide high-quality temporal embeddings for both text and video data, enriching predictive capabilities for downstream applications.
The findings demonstrate the underutilized potential of temporal embedding models within natural language understanding and computer vision.

Subject Area

Computer science|Information Technology|Computer Engineering

Recommended Citation

Farhan, Ahnaf, "Context-Aware Temporal Embeddings for Text and Video Data" (2023). ETD Collection for University of Texas, El Paso. AAI30819533.