Adjoint Sensitivity Method

Motivation

Numerical modeling and simulation of complex physical systems are increasingly important in physics and engineering. Many problems are formulated as ordinary or partial differential equations (ODEs/PDEs), which require numerical solution schemes such as finite difference, finite element, or integral methods. A particular problem can have many input parameters, and one often wants to optimize an objective function that depends on the computed solution. In these cases, the ODEs or PDEs serve as the constraints, and the objective function evaluates their numerical solution. Examples of ODE/PDE-constrained optimization include weather forecasting, airfoil shape design, and the design of micro-electro-mechanical devices. A characteristic of this type of optimization problem is the high cost of each constraint evaluation, since solving differential equations numerically is expensive. We introduce the adjoint sensitivity method, which computes the objective-function gradient at roughly the same computational cost as solving the differential equations once.

Formulation

We present the adjoint sensitivity method for a partial differential equation that can be semi-discretized in space and solved as an ODE in time. The heat and wave equations are examples of this type of equation. The optimization problem can be formulated as

\begin{array}{c}  \underset{p}{\text{minimize}}\ F(x,p),\quad\text{where }F(x,p)=\int_{0}^{T}f(x,p,t)\,dt\\  \text{subject to }h(x,\dot{x},p,t)=0\\  g(x(0),p)=0\end{array}

We can interpret h(x,\dot{x},p,t)=0 as the governing ordinary differential equation in time, where p is the vector of input parameters, t is time, and x is the solution of the ODE. The function g(x(0),p)=0 constrains the initial conditions. The objective function F(x,p) assigns a value to each parameter-solution pair (p,x). Solving this optimization problem usually involves computing the gradient \frac{dF}{dp} and applying a gradient-based algorithm such as conjugate gradient descent. We derive \frac{dF}{dp} via the adjoint sensitivity method.
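To make the role of the gradient concrete, here is a minimal, hedged sketch (not a method from the text): once \frac{dF}{dp} is available from the adjoint method, any gradient-based optimizer can consume it. Plain gradient descent on the toy objective F(p)=(p-3)^2 stands in for an expensive ODE/PDE-constrained objective.

```python
# Hypothetical sketch: dF_dp would come from the adjoint method in practice.
def gradient_descent(dF_dp, p0, lr=0.1, steps=200):
    p = p0
    for _ in range(steps):
        p -= lr * dF_dp(p)  # each dF_dp call would cost ~2 ODE solves
    return p

# Toy objective F(p) = (p - 3)^2, with gradient dF/dp = 2(p - 3).
p_opt = gradient_descent(lambda p: 2.0 * (p - 3.0), p0=0.0)
```

The minimizer converges to p = 3; with an adjoint-supplied gradient the same loop applies to ODE/PDE-constrained problems.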

We first define the Lagrangian

L=\int_{0}^{T}[f(x,p,t)+\lambda^{T}h(x,\dot{x},p,t)]dt+\mu^{T}g(x(0),p)

where \lambda and \mu are Lagrange multipliers. Since the constraint terms vanish along feasible trajectories, \frac{dL}{dp}=\frac{dF}{dp}, so we differentiate the Lagrangian:

\frac{dL}{dp}=\int_{0}^{T}\left[\frac{\partial f}{\partial x}\frac{dx}{dp}+\frac{\partial f}{\partial p}+\lambda^{T}\left(\frac{\partial h}{\partial x}\frac{dx}{dp}+\frac{\partial h}{\partial\dot{x}}\frac{d\dot{x}}{dp}+\frac{\partial h}{\partial p}\right)\right]dt+\mu^{T}\left(\frac{\partial g}{\partial x(0)}\frac{dx(0)}{dp}+\frac{\partial g}{\partial p}\right)

Note that \frac{d\dot{x}}{dp} is difficult to compute directly, so we eliminate it using integration by parts. We also eliminate the costly sensitivity term \frac{dx}{dp} by choosing the Lagrange multipliers appropriately. In this case, we can let

\begin{array}{c}  \mu^{T}=\lambda^{T}\frac{\partial h}{\partial\dot{x}}\big|_{t=0}\left(\frac{\partial g}{\partial x(0)}\right)^{-1}\\  \frac{\partial f}{\partial x}+\lambda^{T}\left(\frac{\partial h}{\partial x}-\frac{d}{dt}\frac{\partial h}{\partial\dot{x}}\right)-\dot{\lambda}^{T}\frac{\partial h}{\partial\dot{x}}=0,\quad\lambda(T)=0\end{array}
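These choices come from integrating the \frac{d\dot{x}}{dp} term by parts, an intermediate step worth making explicit:

\int_{0}^{T}\lambda^{T}\frac{\partial h}{\partial\dot{x}}\frac{d\dot{x}}{dp}dt=\left[\lambda^{T}\frac{\partial h}{\partial\dot{x}}\frac{dx}{dp}\right]_{0}^{T}-\int_{0}^{T}\frac{d}{dt}\left(\lambda^{T}\frac{\partial h}{\partial\dot{x}}\right)\frac{dx}{dp}dt

Imposing \lambda(T)=0 removes the boundary term at t=T, the choice of \mu cancels the remaining boundary term at t=0, and the stationarity condition on \lambda annihilates every surviving occurrence of \frac{dx}{dp}.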

Finally, the gradient simplifies to

\frac{dF}{dp}=\int_{0}^{T}\left[\frac{\partial f}{\partial p}+\lambda^{T}\frac{\partial h}{\partial p}\right]dt+\lambda^{T}\frac{\partial h}{\partial\dot{x}}\Big|_{t=0}\left(\frac{\partial g}{\partial x(0)}\right)^{-1}\frac{\partial g}{\partial p}
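The recipe is: one forward ODE solve for x(t), one backward ODE solve for \lambda(t), then a quadrature to assemble the gradient. A hedged sketch, for the common special case h(x,\dot{x},p)=\dot{x}-\varphi(x,p) with a fixed initial condition; the model \varphi=-px, f=x^{2}, x(0)=1 is an illustrative assumption, not an example from the text.

```python
import numpy as np
from scipy.integrate import solve_ivp, trapezoid

p, T, x0 = 0.7, 2.0, 1.0
phi, dphi_dx, dphi_dp = (lambda x: -p * x), (lambda x: -p), (lambda x: -x)
df_dx = lambda x: 2.0 * x

# Step 1: forward solve for x(t).
fwd = solve_ivp(lambda t, x: phi(x), (0.0, T), [x0],
                dense_output=True, rtol=1e-10, atol=1e-12)
x = lambda t: fwd.sol(t)[0]

# Step 2: adjoint solve, backward from lambda(T) = 0.  With dh/dxdot = 1 and
# dh/dx = -dphi/dx, the stationarity condition reduces to
# d(lambda)/dt = df/dx - lambda * dphi/dx.
adj = solve_ivp(lambda t, lam: df_dx(x(t)) - lam * dphi_dx(x(t)),
                (T, 0.0), [0.0], dense_output=True, rtol=1e-10, atol=1e-12)
lam = lambda t: adj.sol(t)[0]

# Step 3: assemble dF/dp = int_0^T [df/dp + lambda * dh/dp] dt, where here
# df/dp = 0 and dh/dp = -dphi/dp (the initial condition does not depend on p).
ts = np.linspace(0.0, T, 2001)
dF_dp = trapezoid([-lam(t) * dphi_dp(x(t)) for t in ts], ts)

# Cross-check against a central finite difference of F(p).
def F(q):
    s = solve_ivp(lambda t, x: -q * x, (0.0, T), [x0],
                  dense_output=True, rtol=1e-10, atol=1e-12)
    return trapezoid([s.sol(t)[0] ** 2 for t in ts], ts)

eps = 1e-6
fd = (F(p + eps) - F(p - eps)) / (2 * eps)
print(abs(dF_dp - fd) < 1e-4)  # the two gradients should agree closely
```

Note the adjoint solve is integrated from t=T down to t=0, matching the terminal condition \lambda(T)=0; the finite-difference check needs one extra forward solve per parameter, which is exactly the cost the adjoint method avoids.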

Example

We demonstrate the adjoint method on a very simple example. We want to compute the gradient of the objective for the system

\begin{array}{c}  F=\int_{0}^{T}x\,dt\\  \text{subject to }\dot{x}=bx\\  x(0)-a=0,\end{array}

where p=[a,b] are the input parameters. First, following the approach of the previous section, we solve the forward problem once: x(t)=ae^{bt}. We then compute the Lagrange multiplier by solving the adjoint system

\begin{array}{c}  1-b\lambda-\dot{\lambda}=0\\  \lambda(T)=0\end{array}

This leads to \lambda(t)=b^{-1}(1-e^{b(T-t)}). Finally, we can compute the objective function gradient as

\frac{dF}{dp}=\left(\begin{array}{c}  \frac{\partial F}{\partial a}\\  \frac{\partial F}{\partial b}\end{array}\right)=\left(\begin{array}{c}  \frac{1}{b}(e^{bT}-1)\\  \frac{a}{b}Te^{bT}-\frac{a}{b^{2}}(e^{bT}-1)\end{array}\right),

which agrees with differentiating F=\frac{a}{b}(e^{bT}-1) directly.
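As a sanity check (an addition, assuming NumPy), the adjoint-derived gradient can be compared against central finite differences of the closed-form objective F=\frac{a}{b}(e^{bT}-1):

```python
import numpy as np

# Closed-form objective for the example: F = int_0^T a*e^{bt} dt = (a/b)(e^{bT} - 1)
def F(a, b, T):
    return a / b * (np.exp(b * T) - 1.0)

# Gradient components from the adjoint derivation: dF/da comes from the
# boundary term -lambda(0), dF/db from the integral of -lambda * x.
def adjoint_gradient(a, b, T):
    dF_da = (np.exp(b * T) - 1.0) / b
    dF_db = a / b * T * np.exp(b * T) - a / b**2 * (np.exp(b * T) - 1.0)
    return np.array([dF_da, dF_db])

a, b, T = 2.0, 0.5, 1.0
eps = 1e-7
fd = np.array([
    (F(a + eps, b, T) - F(a - eps, b, T)) / (2 * eps),
    (F(a, b + eps, T) - F(a, b - eps, T)) / (2 * eps),
])
print(np.allclose(adjoint_gradient(a, b, T), fd, atol=1e-5))  # True
```

Both components match finite differences, confirming the gradient above.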

The adjoint method can be applied to harder problems, such as the inverse Schrödinger equation, and to the optimization of fluid flows that exhibit chaotic behavior. More examples can be found at http://engineer-chaos.blogspot.com/p/what-is-chaos.html and http://math.mit.edu/~stevenj/18.336/adjoint.pdf.


Written by Yufeng (Kevin) Chen
