# CCMA Seminar on Mathematics of Data and Computation

This PSU-PKU joint CCMA seminar introduces cutting-edge research on numerical methods and scientific computing related to data science and machine learning. It is held weekly on Thursdays, 8:30-9:30 pm ET (Fridays, 8:30-9:30 am Beijing time). If you want to give a talk, please contact Lian Zhang (luz244@psu.edu).

(Due to COVID-19, all seminars have moved online. Zoom: https://psu.zoom.us/j/560094163)

## Flow dynamic approach and Lagrange multiplier approach for gradient flow

• Qing Cheng, Illinois Institute of Technology
• Time: 8:30pm – 9:30pm, Thursday, July 30, 2020 (ET).
• Abstract: In this talk, I will introduce a new Lagrangian approach, the flow dynamic approach, to effectively capture the interface in phase field models. Its main advantage over numerical methods in Eulerian coordinates is that thin interfaces can be effectively captured with few points in Lagrangian coordinates. I will also introduce the SAV and Lagrange multiplier approaches, which preserve energy dissipation and physical constraints for gradient systems at the discrete level. The advantage of these methods is that they only require solving linear equations with constant coefficients at each time step, plus an additional nonlinear algebraic system that can be solved at negligible cost. Ample numerical results for phase field models are presented to validate the effectiveness and accuracy of the proposed numerical schemes.

## Bayesian Sparse learning with preconditioned stochastic gradient MCMC and its applications

• Guang Lin, Purdue University
• Time: 8:30pm – 9:30pm, Thursday, July 23, 2020 (ET).
• Abstract: Deep neural networks (DNNs) have been successfully employed in a wide variety of research areas, including solving partial differential equations. Despite their significant success, there are some challenges in effectively training DNNs, such as avoiding over-fitting in over-parameterized DNNs and accelerating the optimization in DNNs with pathological curvature.
In this work, we propose a Bayesian-type sparse deep learning algorithm. The algorithm uses a set of spike-and-slab priors for the parameters of the deep neural network. The hierarchical Bayesian mixture is trained using an adaptive empirical method: one alternately samples from the posterior using an appropriate stochastic gradient Markov chain Monte Carlo (SG-MCMC) method and optimizes the latent variables using stochastic approximation. The sparsity of the network is achieved while optimizing the hyperparameters with adaptive searching and penalizing. A popular SG-MCMC approach is stochastic gradient Langevin dynamics (SGLD). However, given the complex geometry of the model parameter space in non-convex learning, updating parameters with a universal step size in each component, as in SGLD, may cause slow mixing. To address this issue, we apply a computationally manageable preconditioner in the updating rule, which provides step sizes adapted to local geometric properties. Moreover, by smoothly optimizing the hyperparameter in the preconditioning matrix, our proposed algorithm ensures a decreasing bias, which is introduced by ignoring the correction term in preconditioned SGLD. Building on the existing theoretical framework, we show that the proposed method converges asymptotically to the correct distribution with a controllable bias under mild conditions. Numerical tests are performed on both synthetic regression problems and learning the solutions of elliptic PDEs, which demonstrate the accuracy and efficiency of the present work.
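
For readers unfamiliar with the Langevin update mentioned in the abstract, the following is a minimal sketch of a Langevin step with an optional diagonal preconditioner. It is not the speaker's algorithm: the Gaussian target, the constant preconditioner, the use of the exact gradient instead of a minibatch gradient, and all function names are assumptions made for illustration. With a constant preconditioner, the correction term referred to in the abstract vanishes, so the sketch does not address the bias discussed there.

```python
import math
import random


def grad_log_density(theta, mean, var):
    """Gradient of the log-density of an independent Gaussian target (assumed for illustration)."""
    return [-(t - m) / v for t, m, v in zip(theta, mean, var)]


def langevin_step(theta, grad, step, precond, rng):
    """One Langevin update with a diagonal preconditioner G.

    theta_i <- theta_i + 0.5 * step * G_i * grad_i + sqrt(step * G_i) * noise_i
    Setting every G_i to 1 gives the plain update with one universal step size.
    """
    return [
        t + 0.5 * step * g_i * g + math.sqrt(step * g_i) * rng.gauss(0.0, 1.0)
        for t, g, g_i in zip(theta, grad, precond)
    ]


def sample(n_steps, step, precond, mean, var, seed=0):
    """Run the chain from the origin and return every visited state."""
    rng = random.Random(seed)
    theta = [0.0 for _ in mean]
    chain = []
    for _ in range(n_steps):
        grad = grad_log_density(theta, mean, var)
        theta = langevin_step(theta, grad, step, precond, rng)
        chain.append(theta)
    return chain
```

In this toy setting, choosing `precond` equal to the target variances gives every coordinate the same contraction rate per step, which is the intuition behind adapting the step size to local geometry.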

## Sparse Machine Learning in a Banach Space

• Yuesheng Xu, Old Dominion University
• Time: 8:30pm – 9:30pm, Tuesday, July 21, 2020 (ET).
• Abstract: In this talk, we report recent developments in kernel-based machine learning. We first review a basic classical problem in machine learning, classification, from which we introduce kernel-based machine learning methods. We discuss two fundamental problems in kernel-based machine learning: representer theorems and kernel universality. We then elaborate on recent exciting advances in sparse learning. In particular, we discuss the notion of reproducing kernel Banach spaces and learning in a Banach space.

## Reduced-order Deep Learning for Flow Dynamics. The Interplay between Deep Learning and Model Reduction

• Min Wang, Duke University
• Time: 8:30pm – 9:30pm, Thursday, July 16, 2020 (ET).
• Abstract: In this work, we investigate neural networks applied to multiscale simulations of porous media flows, taking into account the observed fine data and physical modeling concepts. In addition, the design of a novel deep neural network model reduction approach for multiscale problems is discussed. Our approaches use deep learning techniques combined with local multiscale model reduction methodologies to predict flow dynamics. Constructing deep learning architectures using a reduced-order model can benefit their robustness, since such a model has fewer degrees of freedom. Moreover, numerical results show that by using deep learning with data generated from multiscale models as well as available observed fine data, we can obtain an improved forward map that better approximates the fine-scale model. More precisely, the solution (e.g., pressures and saturation) at time instant n+1 is approximated by a neural network that takes as inputs the solution at time instant n and parameters such as permeability fields, forcing terms, and initial conditions. We further study the features of the coarse-grid solutions that neural networks capture by relating the input-output optimization to $l_1$ minimization of flow solutions. In the proposed multi-layer networks, we can learn the forward operators in a reduced way without computing them, as is done in POD-like approaches. We present soft thresholding operators as activation functions, which promote sparsity and can be further used to find underlying low-rank structures in the data. With these activation functions, the neural network identifies and selects important multiscale features that are crucial in modeling the underlying flow. Using the trained neural network approximation of the input-output map, we construct a reduced-order model.
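
The soft thresholding operator mentioned in the abstract is the proximal operator of the $l_1$ penalty, which is why it promotes sparsity. The sketch below shows the scalar operator applied elementwise; the function names and the threshold value are assumptions for illustration, not taken from the speaker's code.

```python
def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values with |x| <= lam become exactly zero."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def soft_threshold_layer(values, lam):
    """Apply soft thresholding elementwise, as an activation function would."""
    return [soft_threshold(v, lam) for v in values]


activations = soft_threshold_layer([-2.0, -0.3, 0.0, 0.4, 1.5], lam=0.5)
print(activations)  # [-1.5, 0.0, 0.0, 0.0, 1.0]
```

Small inputs are set exactly to zero rather than merely reduced, so the output of such a layer is sparse.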

## Computational Redundancy in Deep Neural Networks

• Gao Huang, Tsinghua University
• Time: 8:30pm – 9:30pm, Thursday, June 25, 2020 (ET).
• Abstract: Deep learning has gained great popularity in computer vision, natural language processing, robotics, etc. However, deep models are often criticized as being cumbersome and energy inefficient. This talk will first demonstrate that deep networks are overparameterized models – although they have millions (or billions) of parameters, they may not use them effectively. High redundancy seems to be helpful in terms of generalization, but it imposes a high computational burden on real systems. This talk will introduce algorithms and architecture innovations that help us understand the redundancy in deep models, and eventually reduce unnecessary computation for efficient deep learning.

## Understanding Deep Learning via Analyzing Trajectories of Gradient Descent

• Wei Hu, Princeton University
• Time: 8:30pm – 9:30pm, Thursday, June 18, 2020 (ET).
• Abstract: Deep learning builds upon the mysterious abilities of gradient-based optimization algorithms. Not only can these algorithms often achieve low loss on complicated non-convex training objectives, but the solutions found can also generalize remarkably well on unseen test data. Towards explaining these mysteries, I will present some recent results that take into account the trajectories taken by the gradient descent algorithm — the trajectories turn out to exhibit special properties that enable the successes of optimization and generalization.

## Statistical Method for Selecting Best Treatment with High-dimensional Covariates

• Xiaohua Zhou, Peking University
• Time: 8:30pm – 9:30pm, Thursday, June 11, 2020 (ET).
• Abstract: In this talk, I will introduce a new semi-parametric modeling method for heterogeneous treatment effect estimation and individualized treatment selection using a covariate-specific treatment effect (CSTE) curve with high-dimensional covariates. The proposed method is flexible enough to depict both local and global associations between the treatment and baseline covariates, and is thus robust against model mis-specification in the presence of high-dimensional covariates. We also establish the theoretical properties of our proposed procedure. I will further illustrate the performance of the proposed method through simulation studies and the analysis of a real data example. This is joint work with Drs. Guo and Ma at the University of California, Riverside.

## Homotopy training algorithm for neural networks and applications in solving nonlinear PDE

• Wenrui Hao, Pennsylvania State University
• Time: 8:30pm – 9:30pm, Thursday, May 21, 2020 (ET).
• Abstract: In this talk, I will introduce two different topics related to neural networks. The first is a homotopy training algorithm designed to solve the nonlinear optimization problem of machine learning by building the neural network adaptively. The second is a randomized Newton’s method used to solve nonlinear systems arising from the neural network discretization of differential equations. Several examples are used to demonstrate the feasibility and efficiency of the two proposed methods.

## From ODE solvers to accelerated first-order methods for convex optimization

• Long Chen, University of California, Irvine
• Time: 8:30pm – 9:30pm, Thursday, May 14, 2020 (ET).
• Abstract: Convergence analyses of accelerated first-order methods for convex optimization problems are presented from the point of view of ordinary differential equation (ODE) solvers. We first take another look at the acceleration phenomenon via A-stability theory for ODE solvers and present a revealing spectrum analysis for quadratic programming. After that, we present the Lyapunov framework for dynamical systems and introduce the strong Lyapunov condition. Many existing continuous convex optimization models, such as gradient flow, the heavy ball system, Nesterov accelerated gradient flow, and the dynamical inertial Newton system, are addressed and analyzed in this framework. Then we present convergence analyses of optimization algorithms obtained from implicit or explicit methods for the underlying dynamical systems. This is joint work with Hao Luo from Sichuan University.
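
To make the ODE-solver viewpoint concrete, the sketch below discretizes the gradient flow x' = -∇f(x) for a diagonal quadratic f(x) = 0.5 Σ a_i x_i² in two ways. Explicit Euler reproduces gradient descent, which is stable only for step sizes below 2 / max(a_i); implicit Euler is stable for any positive step size. The quadratic, the step size, and the function names are assumptions chosen for illustration and are not taken from the talk.

```python
def explicit_euler_step(x, a, h):
    """Explicit Euler on x' = -a * x, i.e. one gradient descent step with step size h."""
    return [xi - h * ai * xi for xi, ai in zip(x, a)]


def implicit_euler_step(x, a, h):
    """Implicit Euler solves x_new = x - h * a * x_new, giving x_new = x / (1 + h * a)."""
    return [xi / (1.0 + h * ai) for xi, ai in zip(x, a)]


def iterate(step, x0, a, h, n):
    """Apply the given one-step method n times starting from x0."""
    x = list(x0)
    for _ in range(n):
        x = step(x, a, h)
    return x


a = [1.0, 100.0]   # eigenvalues of the quadratic (assumed)
h = 0.03           # larger than 2 / 100, so explicit Euler is unstable
explicit = iterate(explicit_euler_step, [1.0, 1.0], a, h, 50)
implicit = iterate(implicit_euler_step, [1.0, 1.0], a, h, 50)
print(abs(explicit[1]) > 1e6, abs(implicit[1]) < 1e-6)
```

In the stiff coordinate the explicit iterate is multiplied by roughly -2 at every step and blows up, while the implicit iterate is multiplied by 1/4 and decays.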

## The Geometry of Functional Spaces of Neural Networks

• Matthew Trager, Courant Institute at NYU
• Time: 8:30pm – 9:30pm, Thursday, May 7, 2020 (ET).
• Abstract: The reasons behind the empirical success of neural networks are not well understood. One important characteristic of modern deep learning architectures compared to other large-scale parametric learning models is that they identify a class of functions that is not linear, but rather has a complex hierarchical structure. Furthermore, neural networks are non-identifiable models, in the sense that different parameters may yield the same function. Both of these aspects come into play significantly when optimizing an empirical risk in classification or regression tasks.
In this talk, I will present some of my recent work that studies the functional space associated with neural networks with linear, polynomial, and ReLU activations, using ideas from algebraic and differential geometry. In particular, I will emphasize the distinction between the intrinsic function space and its parameterization, in order to shed light on the impact of the architecture on the expressivity of a model and on the corresponding optimization landscapes.

## Interpreting Deep Learning Models: Flip Points and Homotopy Methods

• Time: 8:30pm – 9:30pm, Thursday, April 30, 2020 (ET).
• Abstract: This talk concerns methods for studying deep learning models and interpreting their outputs and their functional behavior. A trained model (e.g., a neural network) is a function that maps inputs to outputs. Deep learning has shown great success in performing different machine learning tasks; however, these models are complicated mathematical functions, and their interpretation remains a challenging research question. We formulate and solve optimization problems to answer questions about the model and its outputs. Specifically, we study the decision boundaries of a model using flip points. A flip point is any point that lies on the boundary between two output classes: e.g., for a neural network with a binary yes/no output, a flip point is any input that generates equal scores for “yes” and “no”. The flip point closest to a given input is of particular importance, and this point is the solution to a well-posed optimization problem. To compute the closest flip point, we develop a homotopy algorithm for neural networks that transforms the deep learning function in order to overcome the issues of vanishing and exploding gradients. We show that computing closest flip points allows us to systematically investigate the model, identify decision boundaries, interpret and audit the model with respect to individual inputs and entire datasets, and find vulnerabilities to adversarial attacks. We demonstrate that flip points can help identify mistakes made by a model, improve the model’s accuracy, and reveal the most influential features for classifications.
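
The closest-flip-point problem has a closed form in the simplest case, which may help fix ideas. The sketch below assumes a linear binary classifier whose "yes" and "no" scores are equal exactly when w·x + b = 0, and measures distance in the Euclidean norm; the closest flip point is then the orthogonal projection of the input onto that hyperplane. This is only the linear special case with hypothetical names; the talk addresses neural networks, for which it describes a homotopy algorithm instead.

```python
def closest_flip_point(x, w, b):
    """Project x onto the hyperplane w.x + b = 0 (closest point in the Euclidean norm)."""
    margin = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_sq = sum(wi * wi for wi in w)
    if norm_sq == 0.0:
        raise ValueError("w must be nonzero")
    return [xi - margin * wi / norm_sq for xi, wi in zip(x, w)]


x = [3.0, 4.0]
w = [1.0, 0.0]     # hypothetical weights: only the first feature matters
b = -1.0
flip = closest_flip_point(x, w, b)
print(flip)  # [1.0, 4.0]
```

The difference between `x` and `flip` is nonzero only in the first coordinate, which matches the idea that the direction to the closest flip point indicates which features most influence the classification.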

## Partial Differential Equation Principled Trustworthy Deep Learning

• Bao Wang, University of California, Los Angeles
• Time: 9:00am – 10:30am, Apr. 24th, 2020 (ET).
• Abstract: This talk contains two parts: In the first part, I will present some recent work on developing partial differential equation principled robust neural architecture and optimization algorithms for robust, accurate, private, and efficient deep learning. In the second part, I will discuss some recent progress on leveraging Nesterov accelerated gradient style momentum for accelerating deep learning, which again involves designing stochastic optimization algorithms and mathematically principled neural architecture.

## Machine Learning Models for Drug Design

• Kelin Xia, Nanyang Technological University
• Time: 9:00am – 10:30am, Apr. 17th, 2020 (ET).