This session is a survey of results from the works of Sumio Watanabe on using resolution-of-singularities techniques from nonlinear algebra to improve learning and model selection when the Fisher information matrix of the learning machine is singular. This happens to be almost always the case!
The notion of singularity in mathematics refers to the points on an algebraic variety where the tangent space is ill-behaved. We shall see that singularities make the learning process more challenging: they substantially worsen the bias-variance tradeoff, and the desired convergence properties fail to hold no matter how many training examples are available.
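As a toy illustration (our own example, in the spirit of Watanabe's, not taken from the references): consider the regression model $f(x; a, b) = a \tanh(bx)$ with Gaussian noise. The zero function is realized on the set $\{a = 0\} \cup \{b = 0\}$, the union of the two coordinate axes, which crosses itself at the origin; there the tangent space is ill-defined and the Fisher information matrix degenerates:

```latex
% Gradient of the model output with respect to the parameters:
\[
  \nabla_{(a,b)} f = \bigl(\tanh(bx),\; a\,x\,\operatorname{sech}^2(bx)\bigr),
  \qquad
  I(a,b) = \mathbb{E}_x\!\left[\nabla f \, \nabla f^{\top}\right].
\]
% On the axis b = 0 the gradient collapses to (0, ax), so I(a,0) has rank 1;
% at the crossing point (a,b) = (0,0) the gradient vanishes entirely:
\[
  \nabla f \big|_{(0,0)} = (0,0)
  \;\Longrightarrow\;
  I(0,0) = 0 .
\]
```

So everywhere on the set of true parameters the Fisher information matrix fails to be positive definite, and the usual quadratic (Laplace) approximation of the loss breaks down.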
The Fisher information matrix is the Hessian of the KL divergence (the loss function, viewed as a function of the parameter) evaluated at the true parameter. Following Sagun, Bottou, and LeCun, we take a closer look at how singularities manifest in practice by examining the eigenvalue spectrum of the Hessian of the loss function for some typical neural network examples.
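A minimal sketch of such an experiment, using only NumPy (the model, data, and thresholds are our own illustrative choices, not those of the cited paper): an overparameterized student network is fit to a one-unit teacher, and the Hessian of the loss at a true parameter is computed by finite differences. Because the extra units are switched off at the true parameter, their directions contribute nothing to the Fisher information, and most eigenvalues come out (numerically) zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny regression net: f(x; w) = sum_j a_j * tanh(b_j * x), 4 hidden units, 8 params.
# The data come from a 1-hidden-unit teacher, so the student is overparameterized
# and its Hessian at the true parameter is singular.
X = rng.normal(size=200)
Y = 0.7 * np.tanh(1.3 * X)          # teacher: a single tanh unit

H_UNITS = 4

def f(w, x):
    a, b = w[:H_UNITS], w[H_UNITS:]
    return np.tanh(np.outer(x, b)) @ a

def loss(w):
    return 0.5 * np.mean((f(w, X) - Y) ** 2)

# A true parameter: unit 0 matches the teacher, the others are switched off.
w_true = np.zeros(2 * H_UNITS)
w_true[0], w_true[H_UNITS] = 0.7, 1.3

# Numerical Hessian by central differences.
def hessian(fun, w, eps=1e-4):
    n = len(w)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.eye(n)[i] * eps
            e_j = np.eye(n)[j] * eps
            H[i, j] = (fun(w + e_i + e_j) - fun(w + e_i - e_j)
                       - fun(w - e_i + e_j) + fun(w - e_i - e_j)) / (4 * eps**2)
    return H

eigs = np.linalg.eigvalsh(hessian(loss, w_true))
# The 6 parameters of the switched-off units span a flat direction of the loss,
# so 6 of the 8 eigenvalues are (numerically) zero.
print("eigenvalues:", np.round(eigs, 6))
print("near-zero fraction:", np.mean(np.abs(eigs) < 1e-5))
```

The same qualitative picture, a spectrum dominated by a bulk of near-zero eigenvalues with a few large outliers, is what the referenced paper reports for much larger networks trained by gradient descent.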
References:
Almost All Learning Machines are Singular, Sumio Watanabe.
Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond, Levent Sagun, Leon Bottou, Yann LeCun.