Publication Date

9-1-2021

Comments

Technical Report: UTEP-CS-21-83

Abstract

At present, the most successful machine learning technique is deep learning, which uses the rectified linear activation function (ReLU) s(x) = max(x, 0) as its nonlinear data processing unit. This selection was guided by general ideas, which were often imprecise, but the selection itself was still largely empirical. This leads to a natural question: are these selections indeed the best, or are there even better ones? A possible way to answer this question is to provide a theoretical explanation of why these selections are, in some reasonable sense, the best. This paper provides a possible theoretical explanation for this empirical fact.
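For readers unfamiliar with the two operations named in the abstract and the report title, the following is a minimal sketch, not code from the report. The function names `relu` and `max_pool_1d` are hypothetical, and the pooling convention (non-overlapping windows, trailing partial window dropped) is an assumption chosen for illustration.

```python
from typing import List, Sequence


def relu(x: float) -> float:
    """Rectified linear activation from the abstract: s(x) = max(x, 0)."""
    return max(x, 0.0)


def max_pool_1d(values: Sequence[float], window: int) -> List[float]:
    """Return the maximum of each non-overlapping window of `values`.

    Assumption for this sketch: a trailing window shorter than
    `window` is dropped rather than padded.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    last_start = len(values) - window
    return [max(values[i:i + window]) for i in range(0, last_start + 1, window)]


if __name__ == "__main__":
    # Negative inputs are clipped to zero; non-negative inputs pass through.
    print([relu(x) for x in (-2.0, 0.0, 3.5)])
    # Each pair of neighbours is reduced to its larger element.
    print(max_pool_1d([1, 5, 2, 4, 3, 0], window=2))
```

Both operations are built from the same primitive, the maximum of several numbers, which is why they are natural to discuss together.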

Included in

Computer Sciences Commons, Mathematics Commons

Departmental Technical Reports (CS)

Why Rectified Linear Activation Functions? Why Max-Pooling? A Possible Explanation
