Date of Award

2023-08-01

Degree Name

Doctor of Philosophy

Department

Computer Science

Advisor(s)

Olac Fuentes

Abstract

Collective intelligence has emerged as a powerful methodology for annotating and classifying challenging data that automated classifiers struggle to handle. It leverages the "wisdom of the crowds," approximating a ground truth by aggregating experts' feedback and filtering out noise. However, challenges arise in applications that demand accurate and reliable data annotation, such as medical image classification, security threat detection, and financial fraud detection. Experts can be unreliable because their expertise and competencies are inconsistent, and extracting their judgments is costly and time-consuming, which presents additional challenges.

Input aggregation is the process of consolidating multiple individual judgments, feedback, or annotations obtained from a diverse group of experts into a single representative decision or prediction. In this dissertation, we introduce several deep learning techniques that enhance the accuracy of input aggregation methods and optimize task assignments among experts. We demonstrate that incorporating the outputs of an automated classifier as additional features improves the accuracy of traditional input aggregation methods. We also show that this accuracy can be further improved by adding meaningful image features learned by self-supervised models. These additional features reduce the need to collect a large number of inputs from human labelers. We also investigate how task assignments can be optimized for groups of experts with varying degrees of expertise and diverse areas of competency. We show that experts' competencies and samples' complexity can be modeled simultaneously, and that optimization algorithms can leverage deep learning models to select experts optimally. To train and evaluate the deep learning models, we propose a novel algorithm for generating a large dataset of synthetic X-ray images, which serves as a test bed for comprehensive testing and validation of the proposed methodologies.
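The notion of input aggregation can be illustrated with two classic baselines, majority voting and weighted voting. The following is a minimal Python sketch under stated assumptions, not the dissertation's method: the function names (`majority_vote`, `weighted_vote`, `aggregate_with_classifier`), the per-expert reliability weights, and the idea of folding a classifier's prediction in as one extra weighted vote are hypothetical simplifications. According to the abstract, the dissertation instead feeds classifier outputs into aggregation methods as additional features.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def majority_vote(labels: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label; ties go to the label seen first."""
    if not labels:
        raise ValueError("need at least one label")
    # Counter.most_common orders equal counts by first occurrence
    return Counter(labels).most_common(1)[0][0]


def weighted_vote(labels: Sequence[Hashable], weights: Sequence[float]) -> Hashable:
    """Sum each label's weights (e.g. assumed expert reliabilities) and pick the top label."""
    if len(labels) != len(weights):
        raise ValueError("labels and weights must have the same length")
    if not labels:
        raise ValueError("need at least one label")
    scores: dict = defaultdict(float)
    for label, weight in zip(labels, weights):
        scores[label] += weight
    # max returns the first maximal key in insertion order on ties
    return max(scores, key=scores.get)


def aggregate_with_classifier(
    expert_labels: Sequence[Hashable],
    expert_weights: Sequence[float],
    classifier_label: Hashable,
    classifier_weight: float = 1.0,
) -> Hashable:
    """Hypothetical simplification: treat the classifier as one more weighted voter."""
    return weighted_vote(
        list(expert_labels) + [classifier_label],
        list(expert_weights) + [classifier_weight],
    )


if __name__ == "__main__":
    votes = ["benign", "threat", "threat", "benign", "threat"]
    reliabilities = [0.9, 0.4, 0.3, 0.8, 0.2]  # illustrative values, not real data
    print(majority_vote(votes))                 # threat (3 votes vs 2)
    print(weighted_vote(votes, reliabilities))  # benign (1.7 vs 0.9)
    print(aggregate_with_classifier(votes, reliabilities, "threat"))  # threat (1.9 vs 1.7)
```

The example shows why reliability matters: two trusted experts outweigh three less reliable ones, and an additional confident input can tip the decision back.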

Language

en

Provenance

Received from ProQuest

File Size

97 p.

File Format

application/pdf

Rights Holder

Md Mahmudulla Hassan