A New Algorithm for Robust Affine-Invariant Clustering

Andrews Tawia Anum, University of Texas at El Paso

Abstract

Cluster analysis is an unsupervised machine learning technique commonly used to partition a dataset into distinct groups called clusters. The k-means algorithm is a prominent distance-based clustering method. Despite its widespread popularity, k-means is not invariant under non-singular linear transformations of the data, and it is not robust: it can be unduly influenced by outliers. To address these deficiencies, we propose an alternative clustering procedure based on minimizing a “trimmed” variant of the negative log-likelihood function. We develop a “concentration step”, loosely reminiscent of the classical Lloyd’s algorithm, that can iteratively reduce the objective function. Multiple real and synthetic datasets are analyzed to assess the performance of our algorithm. In these empirical studies, our algorithm is competitive with k-means and often superior to it.
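The abstract does not state the exact objective or update rule, so the following is only a minimal sketch under an assumed criterion, not the dissertation's algorithm. It assumes Gaussian clusters with cluster-specific means and covariance matrices, a trimming fraction alpha, and an objective equal to the sum of the h = ceil((1 - alpha) n) smallest per-point costs min_k -log N(x_i; mu_k, Sigma_k), similar in spirit to trimmed likelihood methods such as TCLUST. One hypothetical concentration step assigns each point to its lowest-cost cluster, trims the n - h worst-fitting points, and refits each cluster by maximum likelihood; neither part can increase the objective, much as Lloyd's algorithm alternates assignment and update steps for k-means. All function names (concentration_step, trimmed_cluster, and the helpers) are illustrative, and the sketch omits the covariance constraints a practical method needs to keep the likelihood bounded; it simply raises an error if a cluster degenerates.

```python
import math

# Hypothetical sketch: trimmed Gaussian classification likelihood with C-steps.
# The objective is an assumption for illustration, not the dissertation's method.


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix a."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def neg_log_density(x, mean, L):
    """-log N(x; mean, L L^T), using forward substitution with the Cholesky factor L."""
    p = len(x)
    z = [0.0] * p
    for i in range(p):
        z[i] = (x[i] - mean[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(p))
    return 0.5 * (p * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))


def fit_gaussian(points):
    """Maximum-likelihood mean and covariance (divisor n) of a list of points."""
    n, p = len(points), len(points[0])
    mean = [sum(pt[j] for pt in points) / n for j in range(p)]
    cov = [[sum((pt[a] - mean[a]) * (pt[b] - mean[b]) for pt in points) / n
            for b in range(p)] for a in range(p)]
    return mean, cov


def concentration_step(data, params, h):
    """One C-step: assign each point to its best cluster, keep the h best-fitting
    points, then refit every cluster from its retained members."""
    factors = [(mean, cholesky(cov)) for mean, cov in params]
    costs = []
    for i, x in enumerate(data):
        d = [neg_log_density(x, mean, L) for mean, L in factors]
        k = min(range(len(d)), key=d.__getitem__)
        costs.append((d[k], i, k))
    costs.sort()
    kept = costs[:h]
    objective = sum(c for c, _, _ in kept)  # trimmed objective under the old parameters
    labels = [-1] * len(data)  # -1 marks a trimmed (outlying) point
    for _, i, k in kept:
        labels[i] = k
    new_params = []
    for k in range(len(params)):
        members = [data[i] for i in range(len(data)) if labels[i] == k]
        if len(members) <= len(data[0]):
            raise ValueError(f"cluster {k} has too few points for a covariance estimate")
        new_params.append(fit_gaussian(members))
    return new_params, labels, objective


def trimmed_cluster(data, init_params, alpha=0.1, max_iter=100, tol=1e-10):
    """Repeat C-steps from init_params until the trimmed objective stops decreasing."""
    h = math.ceil((1.0 - alpha) * len(data))  # number of points that are not trimmed
    params, prev = init_params, math.inf
    for _ in range(max_iter):
        params, labels, obj = concentration_step(data, params, h)
        if prev - obj <= tol:  # the objective is non-increasing, so stop on a plateau
            break
        prev = obj
    return labels, params, obj
```

Under a non-singular affine map x -> A x + b, every point's cost shifts by the same constant log|det A|, so running the sketch on transformed data with correspondingly transformed starting values (means A m + b, covariances A Sigma A^T) yields the same labels; k-means, which uses plain Euclidean distance, has no such guarantee. The sketch depends on its starting values, so in practice one would typically run it from several starts and keep the lowest objective.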

Subject Area

Statistics

Recommended Citation

Anum, Andrews Tawia, "A New Algorithm for Robust Affine-Invariant Clustering" (2021). ETD Collection for University of Texas, El Paso. AAI28870001.
https://scholarworks.utep.edu/dissertations/AAI28870001
