Publication Date

4-1-2021

Comments

Technical Report: UTEP-CS-21-41

Abstract

Since most dependencies in the physical world are smooth (differentiable), smooth functions were traditionally used to approximate them. In particular, neural networks used smooth activation functions such as the sigmoid function. However, the successes of deep learning have shown that in many cases, non-smooth activation functions such as max(0, z) work much better. In this paper, we explain why non-smooth approximating functions often work better, even when the approximated dependence is smooth.
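As a rough illustration of the comparison the abstract describes (not code from the report), the Python sketch below approximates the smooth function sin(x) with a one-hidden-layer model whose hidden units use either the smooth sigmoid or the non-smooth max(0, z) (ReLU). The random hidden weights, the unit count, and the least-squares fit of the output layer are all illustrative assumptions, not the paper's method.

```python
# Minimal sketch: approximate a smooth target, sin(x), with a one-hidden-layer
# model using either a smooth (sigmoid) or a non-smooth (ReLU) activation.
# Hidden weights/biases are fixed at random and shared by both models;
# only the output layer is fitted, via linear least squares.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    """Smooth activation: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Non-smooth activation: max(0, z)."""
    return np.maximum(0.0, z)

n_units = 50
w = rng.normal(scale=2.0, size=n_units)                 # random input weights
b = rng.uniform(-2 * np.pi, 2 * np.pi, size=n_units)    # random biases

def fit_and_predict(activation, x_train, y_train, x_test):
    """Fit output weights over fixed random hidden features by least squares."""
    H_train = activation(np.outer(x_train, w) + b)       # hidden-layer features
    coef, *_ = np.linalg.lstsq(H_train, y_train, rcond=None)
    H_test = activation(np.outer(x_test, w) + b)
    return H_test @ coef

# Smooth target dependence, as in the abstract's setting.
x_train = np.linspace(0.0, 2 * np.pi, 200)
y_train = np.sin(x_train)
x_test = np.linspace(0.0, 2 * np.pi, 1000)
y_test = np.sin(x_test)

for name, act in [("sigmoid", sigmoid), ("max(0,z) ReLU", relu)]:
    y_hat = fit_and_predict(act, x_train, y_train, x_test)
    err = np.max(np.abs(y_hat - y_test))
    print(f"{name:>14s}: max approximation error = {err:.2e}")
```

Running the script prints the maximum approximation error for each activation on the held-out grid; the exact numbers depend on the random features and are only meant to make the smooth-vs-non-smooth comparison concrete, not to reproduce the report's analysis.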
