Traditional neural networks start from the data; they cannot easily handle prior knowledge, which is one of the reasons why they often take a very long time to train. It is therefore desirable to incorporate prior knowledge into deep learning. For the case when this knowledge consists of propositional statements, a successful way to incorporate it was proposed in a recent paper by van Krieken et al. That paper uses the fact that a neural network does not directly return a truth value: it returns a real value -- in effect, the degree of confidence in the corresponding statement -- from which we extract the truth value by fixing a threshold. Thus, the authors of that paper used formulas for transforming degrees of confidence in individual statements into a reasonable estimate of the degree of confidence in their logical combinations -- formulas developed and studied under the name of fuzzy logic. However, it turns out that the direct use of these formulas often leads to very slow training. The same paper showed that we can get effective training if, instead of directly using the resulting degree of confidence, we first apply a sigmoid-related transformation. In our paper, we provide a theoretical explanation for this semi-empirical idea: specifically, we show that under reasonable conditions, the optimal nonlinear transformation is either a sigmoid, or an (arc)tangent, or an appropriate combination of sigmoids, (arc)tangents, and their limit cases (such as linear functions).
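The idea can be illustrated with a minimal sketch: combine the confidences of individual statements with standard fuzzy-logic formulas (here the product t-norm and the probabilistic sum, two common choices), then pass the combined confidence through a sigmoid-type transformation before using it. The parameters `s` and `b` of the transformation are purely illustrative; they are not taken from the paper under discussion.

```python
import math

def fuzzy_and(p, q):
    # product t-norm: one standard fuzzy-logic "and"
    return p * q

def fuzzy_or(p, q):
    # probabilistic sum: the corresponding fuzzy-logic "or"
    return p + q - p * q

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def transformed_confidence(c, s=9.0, b=0.5):
    # hypothetical sigmoid-related transformation: re-scale the raw
    # fuzzy-logic confidence c through a steep, shifted sigmoid;
    # s (steepness) and b (offset) are illustrative parameters
    return sigmoid(s * (c - b))

# degrees of confidence in two individual statements
p, q = 0.9, 0.8
raw = fuzzy_and(p, q)             # combined confidence: 0.72
print(transformed_confidence(raw))
```

In a training loop, the transformed confidence (rather than the raw fuzzy-logic value) would feed into the loss, which is what makes the gradients better behaved.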