By Yufeng Chen and Xiang Li
We explore optimization problems under partial differential equation
constraints. We describe the adjoint sensitivity method, which allows
efficient computation of the gradient of the objective function. We further
describe the conjugate gradient method and analyze its convergence properties.
Finally, we implement both algorithms to solve the 1D and 2D time-independent
Schrödinger equations.
On an abstract level, the adjoint sensitivity method can be derived
using Lagrange multipliers. Let $u$ be the state, $p$ the parameters,
$f(u, p)$ the objective, and $g(u, p) = 0$ the constraint (the ODE/PDE
system). We define the Lagrangian as
$$\mathcal{L} = f(u, p) + \lambda^T g(u, p).$$
In most cases, the objective function only depends on $u$, so we
make the simplification that $f = f(u)$. Since $g(u, p) = 0$, we have
$$\frac{df}{dp} = \frac{d\mathcal{L}}{dp} = \frac{\partial f}{\partial u}\frac{du}{dp} + \lambda^T \left( \frac{\partial g}{\partial u}\frac{du}{dp} + \frac{\partial g}{\partial p} \right).$$
This expression can be simplified into
$$\frac{df}{dp} = \left( \frac{\partial f}{\partial u} + \lambda^T \frac{\partial g}{\partial u} \right) \frac{du}{dp} + \lambda^T \frac{\partial g}{\partial p}.$$
Note that $du/dp$ is difficult to evaluate, hence for convenience we choose
$\lambda$ such that its coefficient vanishes:
$$\left( \frac{\partial g}{\partial u} \right)^T \lambda = -\left( \frac{\partial f}{\partial u} \right)^T.$$
This is called the adjoint equation. After we solve for $\lambda$,
we can evaluate
$$\frac{df}{dp} = \lambda^T \frac{\partial g}{\partial p}.$$
This method is efficient because it avoids the costly evaluation of
$du/dp$. In addition, this method gives the exact value of $df/dp$, not a
finite-difference approximation.
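To make the recipe concrete, the following sketch applies the adjoint method to a discrete linear constraint $g(u, p) = Au - b(p) = 0$ with objective $f(u) = \frac{1}{2}\|u - u_\text{target}\|^2$. The toy problem and the names `b_of_p` and `dbdp` are our own illustration, not from the text:

```python
import numpy as np

def adjoint_gradient(A, b_of_p, dbdp, u_target, p):
    """Gradient of f(u(p)) = 0.5 * ||u - u_target||^2 subject to A u = b(p).

    Constraint: g(u, p) = A u - b(p) = 0
    Adjoint:    A^T lam = -(df/du)^T = -(u - u_target)
    Gradient:   df/dp = lam^T dg/dp = -lam^T db/dp
    """
    u = np.linalg.solve(A, b_of_p(p))            # forward solve
    lam = np.linalg.solve(A.T, -(u - u_target))  # adjoint solve
    grad = -dbdp(p).T @ lam                      # df/dp as a vector
    return u, grad

# Toy problem: 2x2 system, two parameters entering the right-hand side.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
u_target = np.array([1.0, 2.0])
b_of_p = lambda p: np.array([p[0], p[0] + p[1]])
dbdp = lambda p: np.array([[1.0, 0.0],
                           [1.0, 1.0]])          # entries db_i/dp_j
p = np.array([0.5, -0.3])
u, grad = adjoint_gradient(A, b_of_p, dbdp, u_target, p)
```

Note that only two linear solves are needed regardless of how many parameters $p$ has; a finite-difference gradient would need one forward solve per parameter.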
Several conditions need to be imposed on the constraint function $g$.
First of all, $g$ must be smooth so that $\partial g/\partial u$
is well defined over the domain. We have not specified the independent
variable; in this general setting it can be both space and time. Also, the
ODE/PDE systems are usually initial value or boundary value problems with
data prescribed on the domain boundary, and we require the system
$g(u, p) = 0$ to be well posed. In addition, as we will show, the adjoint
equation needs to be solved backward in time; hence we impose that a
time-dependent ODE/PDE system must not exhibit sensitive dependence on
initial conditions.
In most physical systems, these conditions are satisfied. The Navier-Stokes
equations are extensively studied in this context, and their adjoint
equations are linear. Hence, the adjoint method gives a framework for
optimization constrained by the Navier-Stokes equations.
Note that the adjoint method only gives the gradient $df/dp$, not the
optimal solution. To search for an optimum, we need another
algorithm that works on the objective function $f$. Simple choices
are gradient descent or steepest descent. We know that if the objective
function is convex, then there exists a unique optimum; however, in
most physical systems convexity is not guaranteed. We discuss the
conjugate gradient method in the following section.
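As a minimal sketch of such a search loop (the quadratic test objective and step size below are our own illustration, not from the text), plain gradient descent repeatedly steps against the gradient that the adjoint method supplies:

```python
import numpy as np

def gradient_descent(grad_f, p0, lr=0.1, tol=1e-8, max_iter=10_000):
    """Plain gradient descent: step against the gradient until it vanishes."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(p)
        if np.linalg.norm(g) < tol:   # stationary point reached
            break
        p = p - lr * g
    return p

# Convex quadratic toy objective f(p) = (p0 - 1)^2 + 2 (p1 + 2)^2,
# whose unique optimum is at p = (1, -2).
grad_f = lambda p: np.array([2.0 * (p[0] - 1.0), 4.0 * (p[1] + 2.0)])
p_opt = gradient_descent(grad_f, [0.0, 0.0])
```

On a convex quadratic this converges linearly, which is exactly the behavior the next section contrasts with conjugate gradient and Newton-type methods.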
Conjugate gradient method
We compared four optimization methods: the steepest descent algorithm, Newton's method, the BFGS method, and the Fletcher-Reeves algorithm (a conjugate gradient method).
Among the four algorithms, the steepest descent and Fletcher-Reeves methods are gradient-based, while Newton's method and the BFGS method are Newton-based. Gradient-based methods have linear convergence, while Newton-type methods have locally quadratic convergence. We can clearly see that the gradient-based methods were trapped in a certain region at the beginning and then converged at a linear rate, with the conjugate gradient method converging much faster than steepest descent. The Newton-type methods converge slowly (or even zigzag for a long time) at the beginning, but once they approach the optimal point they converge very rapidly, at a quadratic rate.
The Rosenbrock function used in this example is a particularly good case to illustrate this difference. Its minimum lies in a very narrow curved valley. Gradient-based methods only have information from the first-order derivative, so they often overshoot the valley and take a long time to search along it for the optimal point. Newton-type methods, which take second-order information into account, can adjust the search direction and step length using curvature information from the Hessian, and thus locate the optimum much more accurately. Newton's method finds the optimum of a quadratic function in a single step, and most functions are locally well approximated by a quadratic, so the Newton-type methods worked especially well in this example, where the function is locally close to quadratic.
In terms of computational cost, the conjugate gradient method shows a significant advantage. First, the Fletcher-Reeves method only needs the function value and the residual vector, so it does not need to store the matrix; this is particularly suitable for very large sparse systems. Second, the conjugate gradient method does not need to calculate the Hessian, which can be very expensive at large scale. Last, although we rely on a set of conjugate directions to drastically reduce the number of iterations, we do not need to store all of them: by construction, the direction for the next step can be computed from just the previous step's direction. These features give the conjugate gradient method a very broad scope of application in physics, economics, and operations research.
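The storage claims above can be made concrete with a matrix-free linear conjugate gradient sketch (our own illustration, not the report's code): the matrix enters only through a matvec callback, and only the previous search direction is kept. In the linear case the Fletcher-Reeves update $\beta_k = \|r_{k+1}\|^2 / \|r_k\|^2$ is exactly the formula used here:

```python
import numpy as np

def conjugate_gradient(matvec, b, x0=None, tol=1e-10, max_iter=None):
    """Matrix-free CG for A x = b with symmetric positive-definite A.

    A enters only through the matvec callback, so A itself never needs
    to be stored; only the previous direction d is carried along.
    """
    x = np.zeros_like(b) if x0 is None else np.array(x0, dtype=float)
    r = b - matvec(x)          # residual
    d = r.copy()               # search direction
    rs_old = r @ r
    for _ in range(len(b) if max_iter is None else max_iter):
        Ad = matvec(d)
        alpha = rs_old / (d @ Ad)       # exact line search along d
        x += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs_old) * d   # Fletcher-Reeves beta
        rs_old = rs_new
    return x

# Well-conditioned SPD test system
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)
b = rng.standard_normal(50)
x = conjugate_gradient(lambda v: A @ v, b)
```

Because each iteration touches only a few vectors, memory scales with the number of unknowns rather than with the (possibly dense) matrix.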
2D Schrödinger equations
One of the most significant results of quantum mechanics is revealing
the wave-particle duality of the photon. Here, we solve the
Schrödinger equation in 2D to emulate a quantum pinhole camera.
We assume the probability distribution of many photons passing
through a quantum pinhole to be a Gaussian pulse, and we use the adjoint
method to solve for the quantum potential in 2D. The problem is set
up on a 2D square domain with Dirichlet boundary conditions. The continuous
Laplace operator is discretized using a finite-difference stencil,
and a sparse matrix is assembled at each time
step. The figure below shows the designed wave function and the simulated
result.
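As an illustration of the assembly step, the 2D Laplacian with homogeneous Dirichlet conditions can be built as a Kronecker sum of 1D second-difference operators. We assume here the standard five-point stencil and a 40 x 40 interior grid, which matches the 1600 unknowns quoted below; the text does not specify these details:

```python
import numpy as np
import scipy.sparse as sp

def laplacian_2d(n, h=1.0):
    """Five-point-stencil Laplacian on an n x n interior grid with
    homogeneous Dirichlet boundary conditions (boundary rows eliminated)."""
    main = -2.0 * np.ones(n)
    off = np.ones(n - 1)
    # 1D second-difference operator
    D = sp.diags([off, main, off], [-1, 0, 1], format='csr') / h**2
    I = sp.identity(n, format='csr')
    # Kronecker sum acts on the row-major flattened n*n grid
    return (sp.kron(I, D) + sp.kron(D, I)).tocsr()

A = laplacian_2d(40)   # 1600 x 1600 sparse matrix
```

The resulting operator is symmetric and has at most five nonzeros per row, which is what makes sparse storage and Krylov solvers effective at this scale.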
This 2D optimization problem has 1600 spatial variables, and the conjugate
gradient method converges after approximately 150 iterations.
This fast convergence reduces the computational cost and illustrates
the power of Krylov subspace methods.