The tanh activation usually works better than the sigmoid activation function for hidden units because the mean of its output is closer to zero, so it centers the data better for the next layer. True or False?
Answer: True. The tanh function outputs values in the range (-1, 1), so its activations have a mean close to zero, whereas the sigmoid outputs values in (0, 1) and its activations have a mean near 0.5. Zero-centered activations make the inputs to the next layer better behaved, which typically eases optimization. Note, however, that both sigmoid and tanh saturate for large-magnitude inputs, so very deep networks built on them suffer from the vanishing gradient problem; the rectified linear unit (ReLU) avoids this saturation, allowing models to learn faster and perform better, which is why it is the more common choice for hidden layers today.
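As a quick illustration, here is a minimal NumPy sketch (the function definitions are the standard ones, not taken from the question) that compares the mean activation produced by sigmoid and tanh on zero-mean inputs:

```python
import numpy as np

def sigmoid(x):
    # Standard logistic sigmoid: outputs lie in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=100_000)  # zero-mean pre-activations

print("mean of sigmoid(x):", sigmoid(x).mean())   # ~0.5, not zero-centered
print("mean of tanh(x):   ", np.tanh(x).mean())   # ~0.0, roughly zero-centered
```

The sigmoid activations average around 0.5, while the tanh activations average near 0, which is exactly the "centers the data better for the next layer" property the question describes.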