Date of Award

2021-12-01

Degree Name

Master of Science

Department

Computational Science

Advisor(s)

Michael Pokojovy

Abstract

Cluster analysis is an unsupervised machine learning technique commonly used to partition a dataset into distinct groups called clusters. The k-means algorithm is a prominent distance-based clustering method. Despite its overwhelming popularity, the algorithm is not invariant under non-singular linear transformations and is not robust, i.e., it can be unduly influenced by outliers. To address these deficiencies, we propose an alternative clustering procedure based on minimizing a “trimmed” variant of the negative log-likelihood function. We develop a “concentration step”, loosely reminiscent of the classical Lloyd’s algorithm, that can be applied iteratively to reduce the objective function. Multiple real and synthetic datasets are analyzed to assess the performance of our algorithm. Compared to k-means, empirical studies indicate that our algorithm is competitive and often superior.
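The abstract does not spell out the exact objective. As an illustration only, the sketch below assumes a TCLUST-style model in which each cluster is Gaussian with its own mean and covariance, all clusters have equal weight, and a fraction `alpha` of the points with the largest negative log-likelihood is trimmed. It shows one plausible form of a trimmed concentration step. It is not the thesis's implementation. The function name `trimmed_gaussian_cstep_clustering` and its parameters (`alpha`, `n_iter`, `seed`, `reg`, `tol`) are hypothetical.

```python
import numpy as np


def trimmed_gaussian_cstep_clustering(X, k, alpha=0.1, n_iter=50, seed=0,
                                      reg=1e-6, tol=1e-9):
    """Hypothetical sketch of trimmed-likelihood clustering via concentration steps.

    Assumptions (not taken from the thesis): cluster j ~ N(mu_j, Sigma_j) with
    equal weights; the floor((1 - alpha) * n) points with the smallest
    per-point negative log-likelihood (NLL) are retained and the rest trimmed.
    Returns labels (-1 = trimmed), means, covariances, and the trimmed NLL.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    h = int(np.floor((1 - alpha) * n))  # number of retained points
    # Start from k random observations as means and identity covariances, so the
    # first assignment is plain nearest-mean (Lloyd-style) assignment.
    mu = X[rng.choice(n, size=k, replace=False)].astype(float)
    Sigma = np.tile(np.eye(p), (k, 1, 1))
    prev_obj = np.inf
    for _ in range(n_iter):
        # Per-point Gaussian NLL under each cluster, dropping additive constants.
        nll = np.empty((n, k))
        for j in range(k):
            diff = X - mu[j]
            inv = np.linalg.inv(Sigma[j])
            _, logdet = np.linalg.slogdet(Sigma[j])
            maha = np.einsum("ij,jk,ik->i", diff, inv, diff)
            nll[:, j] = 0.5 * (maha + logdet)
        # Assign each point to the cluster where its NLL is smallest.
        labels = nll.argmin(axis=1)
        best = nll[np.arange(n), labels]
        # Concentration: keep only the h best-fitting points.
        keep = np.argsort(best)[:h]
        obj = best[keep].sum()
        if prev_obj - obj < tol:
            break  # no meaningful decrease; parameters match current labels
        prev_obj = obj
        # Re-estimate each cluster by maximum likelihood on its retained points.
        for j in range(k):
            pts = X[keep][labels[keep] == j]
            if len(pts) > p:  # skip clusters too small for a covariance estimate
                mu[j] = pts.mean(axis=0)
                Sigma[j] = np.cov(pts, rowvar=False, bias=True) + reg * np.eye(p)
    out = np.full(n, -1)
    out[keep] = labels[keep]
    return out, mu, Sigma, obj
```

With identity covariances the first pass reduces to nearest-mean assignment, which mirrors the Lloyd's-algorithm connection the abstract mentions. Later passes use Mahalanobis-type distances, so the assignment rule adapts to each cluster's shape. In practice, methods of this kind are usually run from several random starts and the solution with the lowest trimmed objective is kept, because a single start can stall in a poor local solution.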

Language

en

Provenance

Received from ProQuest

File Size

73 p.

File Format

application/pdf

Rights Holder

ANDREWS TAWIAH ANUM
