AM221 Final Project Blog Post: Adjoint sensitivity and conjugate gradient method

By Yufeng Chen and Xiang Li

We explore optimization problems under partial differential equation constraints. We describe the adjoint sensitivity method, which allows efficient computation of the gradient of the objective function. We further describe the conjugate gradient method and analyze its convergence properties. Finally, we implement both algorithms to solve the 1D and 2D time-independent Schrödinger equations.

Adjoint method

On an abstract level, the adjoint sensitivity method can be derived using Lagrange multipliers. We define the Lagrangian as

L(x,p,\lambda)=f(x,p)+\lambda^{T}g(x,p)

In most cases the objective function only depends on x, so we make the simplification that f(x,p)=f(x). Since g(x,p)=0, we have

\frac{df}{dp}=\frac{dL}{dp}=\frac{\partial f}{\partial x}\frac{dx}{dp}+\frac{d\lambda^{T}}{dp}g+\lambda^{T}\left(\frac{\partial g}{\partial x}\frac{dx}{dp}+\frac{\partial g}{\partial p}\right)

Because g(x,p)=0, the term involving \frac{d\lambda^{T}}{dp} vanishes and the expression simplifies to

\frac{df}{dp}=\left(\frac{\partial f}{\partial x}+\lambda^{T}\frac{\partial g}{\partial x}\right)\frac{dx}{dp}+\lambda^{T}\frac{\partial g}{\partial p}

Note that \frac{dx}{dp} is difficult to evaluate, so for convenience we choose \lambda such that

\frac{\partial f}{\partial x}+\lambda^{T}\frac{\partial g}{\partial x}=0

This is called the adjoint equation. After we solve for \lambda, we can evaluate

\frac{df}{dp}=\lambda^{T}\frac{\partial g}{\partial p}
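To make this concrete, consider a discretized constraint of the form g(x,p)=A(p)x-b=0 with objective f(x)=\frac{1}{2}\|x-x_{target}\|^{2}. This is an illustrative example of ours, not the specific system used later in the post, and the helper names are our own. A minimal NumPy sketch of the adjoint gradient:

import numpy as np

def adjoint_gradient(A, dA_dp, b, x_target):
    # Gradient of f(x) = 0.5*||x - x_target||^2 subject to g(x, p) = A(p) x - b = 0.
    #   A        : (n, n) system matrix at the current parameter values
    #   dA_dp    : list of (n, n) matrices dA/dp_k, one per parameter p_k
    #   b        : (n,) right-hand side
    #   x_target : (n,) desired state
    x = np.linalg.solve(A, b)                    # forward solve: g(x, p) = 0
    # Adjoint equation: df/dx + lambda^T dg/dx = 0 with dg/dx = A,
    # i.e. A^T lambda = -(x - x_target)
    lam = np.linalg.solve(A.T, -(x - x_target))
    # df/dp_k = lambda^T dg/dp_k, where dg/dp_k = (dA/dp_k) x
    grad = np.array([lam @ (dAk @ x) for dAk in dA_dp])
    return x, grad

In this setting the full gradient costs only two linear solves (one forward, one adjoint), independent of the number of parameters, which is exactly why the method avoids forming \frac{dx}{dp}.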

This method is efficient because it avoids the costly evaluation of \frac{dx}{dp}. In addition, it gives the exact value of \frac{df}{dp}. Several conditions need to be imposed on the function g(x,p). First, g(x,p) must be smooth so that \frac{\partial g}{\partial x} is well defined over the domain \Omega. We haven't specified \Omega; in this general setting it can include both space and time. Also, the ODE/PDE systems are usually initial value or boundary value problems defined through conditions on the domain boundary, x|_{\partial\Omega}=h(p), and we require the system to be well posed. In addition, as we will show, the adjoint equation needs to be solved backward in time, so we require that a time-dependent ODE/PDE system not exhibit sensitive dependence on initial conditions. In most physical systems, these conditions are satisfied. The Navier-Stokes equation is one that has been studied extensively, and its adjoint equation is linear; hence the adjoint method gives a framework for optimization of the Navier-Stokes equation.

Note that the adjoint method only gives \frac{df}{dp}, not the optimal solution. To search for an optimum, we need another algorithm that works on the objective function f(x). Simple choices are gradient descent or steepest descent; a minimal driver loop is sketched below. We know that if the objective function is convex, then there exists a unique optimum; however, in most physical systems convexity is usually not guaranteed. We discuss the conjugate gradient method in the following section.
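For concreteness, the outer loop can be as simple as plain gradient descent driven by the adjoint gradient. The sketch below reuses the hypothetical adjoint_gradient helper from above; the fixed step size, tolerance, and assembly functions are placeholders of ours.

def optimize_descent(p0, assemble_A, assemble_dA, b, x_target,
                     step=1e-2, tol=1e-8, max_iter=500):
    # Plain gradient descent on f(x(p)); assemble_A(p) builds A(p) and
    # assemble_dA(p) returns the list of dA/dp_k matrices.
    p = np.array(p0, dtype=float)
    for _ in range(max_iter):
        A = assemble_A(p)
        _, grad = adjoint_gradient(A, assemble_dA(p), b, x_target)
        if np.linalg.norm(grad) < tol:
            break
        p -= step * grad                      # steepest-descent update
    return p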

Conjugate gradient method

We compared four optimization methods: the steepest descent algorithm, Newton's method, the BFGS method, and the Fletcher-Reeves algorithm (a conjugate gradient method).

Among the four algorithms, the steepest descent method and the Fletcher-Reeves method are gradient-based, while Newton's method and the BFGS method are Newton-based. Gradient-based methods have linear convergence, while Newton-type methods have locally quadratic convergence. We can clearly see that the gradient-based methods were trapped in a certain region for a while at the beginning and then converged at a linear rate, with the conjugate gradient method converging much faster than steepest descent. The Newton-type methods converge slowly (or even zigzag for a long time) at the beginning, but once they approach the optimal point, they converge very rapidly at a quadratic rate.

Rosenbrock's function, used in this example, is a particularly good case to illustrate this difference. The minimum of Rosenbrock's function lies in a very narrow curved valley. Gradient-based methods only have information from the first-order derivative, so they often miss the valley and take a long time searching along the curved valley to find the optimal point. Newton-type methods, which take second-order information into account, can adjust the search using the curvature information from the Hessian, and thus find the optimal direction and step length much more accurately. We know that Newton's method can locate the optimum of a quadratic function in a single step, and most functions can be approximated locally by a quadratic, so Newton-type methods work especially well in this example, where the function is locally well approximated by a quadratic.

In terms of computational cost, the conjugate gradient method shows a significant advantage. First, the Fletcher-Reeves method only needs the function value and the gradient (residual) vector, so it does not need to store the matrix; this makes it particularly suitable for very large-scale sparse problems. Second, the conjugate gradient method does not need to calculate the Hessian, which can be very expensive for large problems. Last, although we need the set of conjugate directions to drastically reduce the number of iterations, we do not need to store all of them: by construction, the direction of the next step can be computed from just the previous step's direction. These features give the conjugate gradient method a very broad scope of applications in physics, economics, and operations research.
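As an illustration, here is a minimal sketch of the Fletcher-Reeves update applied to Rosenbrock's function. This is our own sketch, using a simple backtracking line search and a descent-direction safeguard; a production implementation would typically add periodic restarts and a stronger line search.

import numpy as np

def fletcher_reeves(f, grad, x0, tol=1e-6, max_iter=5000):
    # Nonlinear conjugate gradient with the Fletcher-Reeves beta.
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                    # first direction: steepest descent
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:                        # safeguard: restart if not a descent direction
            d = -g
        t = 1.0                               # backtracking (Armijo) line search
        while f(x + t * d) > f(x) + 1e-4 * t * (g @ d):
            t *= 0.5
        x = x + t * d
        g_new = grad(x)
        beta = (g_new @ g_new) / (g @ g)      # only the previous gradient is needed
        d = -g_new + beta * d                 # next direction from the previous one alone
        g = g_new
    return x

# Rosenbrock's function: minimum at (1, 1), inside a narrow curved valley
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])
print(fletcher_reeves(f, grad, [-1.2, 1.0]))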

2D Schrodinger equations

One of the most significant results of quantum mechanics is revealing the wave-particle duality of a photon. In this case, we solve the Schrödinger equation in 2D to emulate a quantum pinhole camera. We assume the probability distribution of many photons passing through a quantum pinhole to be a Gaussian pulse, and we use the adjoint method to solve for the quantum potential in 2D. The problem is set up on a 2D square domain with Dirichlet boundary conditions. The continuous Laplace operator is discretized using a finite-difference stencil, and a sparse matrix of size 1600\times1600 is assembled at each time step. The figure below shows the designed wave function and the simulated quantum potential.

[Figures: designed wave function and simulated quantum potential]
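For reference, here is a minimal sketch of how such a sparse operator can be assembled with SciPy. The 40\times40 interior grid (giving a 1600\times1600 matrix) is our assumption based on the problem size quoted above, and physical constants are omitted.

import numpy as np
import scipy.sparse as sp

def hamiltonian_2d(n=40, L=1.0, V=None):
    # Finite-difference Hamiltonian -laplacian + diag(V) on an n x n interior grid
    # with Dirichlet boundary conditions (boundary points are simply dropped).
    h = L / (n + 1)                                               # grid spacing
    D1 = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n)) / h**2   # 1D second difference
    I = sp.identity(n)
    lap = sp.kron(D1, I) + sp.kron(I, D1)                         # 2D Laplacian (Kronecker sum)
    if V is None:
        V = np.zeros(n * n)                                       # potential on the flattened grid
    return (lap + sp.diags(V)).tocsr()

H = hamiltonian_2d()
print(H.shape)    # (1600, 1600)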

This 2D optimization problem has 1600 spatial variables, and the conjugate gradient method converges in approximately 150 iterations. This fast convergence reduces the computational cost and illustrates the power of Krylov subspace methods.
