Non-Proprietary Optimization - Implementation Details


This article documents the implementation of ASCEND's wrapper for IPOPT, a non-proprietary optimization tool from COIN-OR. It also discusses automatic differentiation support in ASCEND.


IPOPT

IPOPT (https://projects.coin-or.org/Ipopt) is a software package for nonlinear optimization from COIN-OR. It is designed to find local solutions to mathematical problems of the kind:


\begin{align}
\min_{x \in \mathbb{R}^n} \quad & f \left( x \right), \qquad f : \mathbb{R}^n \rightarrow \mathbb{R} \\
\mbox{subject to} \quad & g_L \le g \left( x \right) \le g_U, \qquad g : \mathbb{R}^n \rightarrow \mathbb{R}^m \\
& x_L \le x \le x_U
\end{align}

where f(x) is the objective function and g(x) the vector of m constraint relations. The vectors g_L and g_U denote the lower and upper bounds of the constraint relations g(x), and x_L and x_U the lower and upper bounds of the independent variables x. The functions f(x) and g(x) can be nonlinear and nonconvex, but they should be twice continuously differentiable. Observe that equality constraints can be formulated in the above relations by setting the corresponding components of g_L and g_U to the same value.
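
For example, an equality constraint h(x) = 0 can be posed in this form by choosing one component of g and pinching its bounds together:

\begin{align}
g_i \left( x \right) = h \left( x \right), \qquad g_{L,i} = g_{U,i} = 0
\end{align}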

The IPOPT page documents how to set up and install IPOPT, and briefly describes how to build against it.

With IPOPT functioning, we have a completely FOSS way of solving nonlinear optimization problems with ASCEND. The IPOPT solver wrapper has been written along the guidelines documented in External Solvers.


Foundations

Any expression can be parsed to generate a computational graph (in fact a binary tree), as shown below for the relation 'f':

 x1^2 + 3 * x2 + Sin(x1 * x2) = C

Re-arranging the relation a bit:

 x1^2 + 3 * x2 + Sin(x1 * x2) - C = 0 

where the relation corresponds to

 f: x1^2 + 3 * x2 + Sin(x1 * x2) - C

x1 and x2 are independent variables, and C is a constant. The upper and lower bounds of this relation are both 0. Likewise, we can have other bounded relations: bounded on both sides, or bounded only on one side.

File:Computational Graph.png
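
For later reference, the first partial derivatives of this relation are:

\begin{align}
\frac{\partial f}{\partial x_1} & = 2 x_1 + x_2 \cos \left( x_1 x_2 \right) \\
\frac{\partial f}{\partial x_2} & = 3 + x_1 \cos \left( x_1 x_2 \right)
\end{align}

These are the derivatives that the reverse sweep described below must reproduce.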


Definitions

A detailed practical and theoretical treatment of calculating derivatives can be found in Griewank's book on automatic differentiation, Evaluating Derivatives. For ASCEND and most other equation-based modeling languages, the terminology used by Griewank can be confusing because it is very general and applies to the differentiation of entire software programs, not just algebraic equations. When differentiating constraint equations or explicit objective functions, the following table can be used to map between the terminology of the text and that commonly used in discussing ASCEND.


Type Of Node
Definition: indicates the type of a node: real, integer, variable, binary operations like +, -, *, /, and unary functions.
Example: e_real, e_int, e_var, e_plus, e_minus, e_uminus, e_times, e_div, e_power, e_ipower, e_func, e_equal.

Value Of Node
Definition: the value of the expression whose subtree is rooted at that node.
Example: the value at the pow node in the computational graph shown above is x1^2; if x1 is 2, then the value at pow is 4.

Adjoint Of A Node
Definition: the derivative of the output relation f with respect to the expression rooted at that node. Note: for constants (real or integer), the value can be obtained manually by temporarily assigning a symbol to the constant and taking the derivative of the relation with respect to that symbol at that particular value.
Example: the adjoint of the Sin node in the computational graph above is 1.

Tangent Of A Node
Definition: the derivative of the node with respect to a seed direction.
Example: if the seed direction is x1, the tangent of the Sin node is x2*Cos(x1*x2); similarly, the tangent of f is 2*x1 + x2*Cos(x1*x2).

Tangent Of Adjoint
Definition: the derivative of the adjoint of a node in a particular seed direction. Note that each such derivative is only a partial result: to obtain the second derivative of f with respect to x1, we must sum up these partial results, which gives 2 - x2^2 * Sin(x1*x2) in the seed direction x1 (a worked summation follows this table).
Example: for the subtree pow(x1,2), the tangent of the adjoint of node x1 in that subtree in the seed direction x1 is d/dx1(df/dx1) [in that subtree] = d/dx1(2*x1) = 2.
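
To see how the partial results sum to the full second derivative, note that x1 appears in two subtrees of f: pow(x1,2) and Sin(x1*x2). Differentiating the tangent of f in the seed direction x1 gives one contribution from each subtree:

\begin{align}
\frac{\partial^2 f}{\partial x_1^2} = \frac{\partial}{\partial x_1} \left( 2 x_1 + x_2 \cos \left( x_1 x_2 \right) \right) = 2 - x_2^2 \sin \left( x_1 x_2 \right)
\end{align}

where the 2 is the partial result from the pow(x1,2) subtree and the remaining term comes from the Sin(x1*x2) subtree.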



Reverse Automatic Differentiation

Data structures

The data structures are detailed at Reverse Automatic Differentiation Implementation.


Concepts to Digest

Adjoint of a Node: if f: R^m -> R is a relation of m variables (m = 2 in the above computational graph), then the adjoint of a node in the computational graph is defined as df/dNode.

Hence for the node f, the adjoint is 1, as df/df is 1.

Similarly, the adjoint of the Sin node is 1, since the Sin node enters f additively with coefficient 1.
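
As a worked illustration, propagating adjoints from f down through the Sin subtree by the chain rule gives:

\begin{align}
\bar{f} & = 1 \\
\overline{\sin \left( x_1 x_2 \right)} & = \bar{f} \cdot 1 = 1 \\
\overline{x_1 x_2} & = \overline{\sin \left( x_1 x_2 \right)} \cdot \cos \left( x_1 x_2 \right) = \cos \left( x_1 x_2 \right) \\
\bar{x}_1 \big|_{\mathrm{Sin\ subtree}} & = \overline{x_1 x_2} \cdot x_2 = x_2 \cos \left( x_1 x_2 \right)
\end{align}

Adding the partial adjoint 2*x1 contributed by the pow(x1,2) subtree recovers df/dx1 = 2*x1 + x2*Cos(x1*x2), matching the tangent of f given in the table above.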


How it works

In the ASCEND implementation of the reverse mode of automatic differentiation, a tape representing the above computational graph is created in postfix form as a linked list of Element_struct nodes. Pointers to the children of a particular node are stored in the arg1 and arg2 fields of the Element_struct.

Now, to calculate the adjoints, we set the Bar.Val field of the node f to 1 (as df/df = 1). Once we set that, we traverse the linked list in the reverse direction, i.e. from the node f down to the very bottom. As we traverse the tree in this fashion, we apply the chain rule to get the adjoints of every node we visit.
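
The following is a minimal standalone sketch of this return sweep in C. It is not the actual code from reverse_ad.c: the struct layout, the reduced set of node types and all names except arg1 and arg2 are simplifications for illustration.

 #include <math.h>
 
 /* Simplified node types; the real tape has many more (e_power, e_func, ...). */
 enum node_type { N_VAR, N_CONST, N_PLUS, N_TIMES, N_SIN };
 
 /* Simplified tape element: a postfix tape is an array of these,
  * with children always appearing before their parent. */
 struct elem {
     enum node_type type;
     double val;          /* value filled in during the forward sweep   */
     double bar;          /* adjoint, filled in during the return sweep */
     struct elem *arg1;   /* left child (NULL for leaves)               */
     struct elem *arg2;   /* right child (NULL for unary ops, leaves)   */
 };
 
 /* Return sweep: tape[n-1] is the root node f. Walk the tape backwards,
  * pushing each node's adjoint down to its children via the chain rule. */
 static void return_sweep(struct elem *tape, int n)
 {
     int i;
     for (i = 0; i < n; ++i)
         tape[i].bar = 0.0;
     tape[n - 1].bar = 1.0;              /* df/df = 1 */
 
     for (i = n - 1; i >= 0; --i) {
         struct elem *e = &tape[i];
         switch (e->type) {
         case N_PLUS:                    /* d(a+b)/da = d(a+b)/db = 1    */
             e->arg1->bar += e->bar;
             e->arg2->bar += e->bar;
             break;
         case N_TIMES:                   /* d(a*b)/da = b, d(a*b)/db = a */
             e->arg1->bar += e->bar * e->arg2->val;
             e->arg2->bar += e->bar * e->arg1->val;
             break;
         case N_SIN:                     /* d(sin a)/da = cos a          */
             e->arg1->bar += e->bar * cos(e->arg1->val);
             break;
         default:                        /* leaves: nothing to push down */
             break;
         }
     }
 }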

For illustration purposes, below is a computational graph which shows the reverse mode of derivative accumulation. This file has been borrowed from the Wikipedia article on automatic differentiation [1].

File:ReverseaccumulationAD.png

The logic of the above computational graph is implemented in the functions ReturnSweep and AccumulateAdjoints (and their safe variants) in the file reverse_ad.c. Comparing the two computational graphs above, we observe that in the first graph the nodes x1 and x2 are repeated for each occurrence in the relation, whereas in the second graph x1 and x2 appear only once and multiple links from a node to a higher-level node are allowed. In the ASCEND implementation the nodes are replicated and only a single link to a higher node is used. Hence the accumulation phase is required, which takes care of summing up the partial adjoints of each independent variable.
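
Continuing the simplified tape sketched above, the accumulation phase can be pictured as follows (the varindex array and gradient output are hypothetical names introduced here, not ASCEND's):

 /* Accumulation phase: a variable may appear as several N_VAR leaves on
  * the tape; its total adjoint df/dx is the sum of those partial adjoints.
  * varindex[i] identifies which variable tape[i] refers to. */
 static void accumulate_adjoints(const struct elem *tape, int n,
                                 const int *varindex, double *gradient,
                                 int nvars)
 {
     int i;
     for (i = 0; i < nvars; ++i)
         gradient[i] = 0.0;
     for (i = 0; i < n; ++i)
         if (tape[i].type == N_VAR)
             gradient[varindex[i]] += tape[i].bar;
 }

For the relation f above, the two N_VAR leaves for x1 contribute the partial adjoints 2*x1 and x2*Cos(x1*x2), which sum to df/dx1.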


Exact Second Partials

The exact second partials have been implemented using the `Tangent of Adjoint` method, which is detailed in Evaluating Derivatives by Andreas Griewank.

The `Tangent of Adjoint` method can be better understood from the following example:

File:Tangent of adjoint.png

As is observed from the above figure, the tangent of adjoint (equation 2) is calculated by applying the chain rule to the adjoint (equation 1).

Note: In the `Tangent of Adjoint` equation, the expression \begin{align} \dot{\left( \frac{\partial Y}{\partial X_{1}} \right)} \end{align} is evaluated by calculating the total derivative of \begin{align} \frac{\partial Y}{\partial X_{1}} \end{align} in the seed direction. That is, if some X is the seed direction and \begin{align} Y = X_{1} \times X_{2} \end{align}, then the total derivative of \begin{align} \frac{\partial Y}{\partial X_{1}} = X_{2} \end{align} in the direction X is \begin{align} \dot{X}_{2} \end{align}, where \begin{align} \dot{X}_{2} = \frac{\partial X_{2}}{\partial X} \end{align}; this value is stored in the Val.Dot field of the node \begin{align} X_{2} \end{align} in the computational graph during the forward sweep.
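
As a concrete sketch of this forward-over-reverse rule (a statement of the method in Griewank's notation, not code from ASCEND): for Y = X_1 * X_2 the return sweep updates the adjoint of X_1 by adding \bar{Y} X_2 to it, and differentiating that update in the seed direction gives the corresponding tangent-of-adjoint update:

\begin{align}
\dot{\bar{X}}_{1} \leftarrow \dot{\bar{X}}_{1} + \dot{\bar{Y}} X_{2} + \bar{Y} \dot{X}_{2}
\end{align}

Here \bar{Y} is available from the return sweep and \dot{X}_{2} from the Val.Dot field set during the forward sweep, so each seed direction costs one combined forward and return sweep and yields one row of the Hessian.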


Dense Hessian

Scope & Proposed Work

References

[1] Automatic differentiation. (2009, August 21). In Wikipedia, The Free Encyclopedia. Retrieved 19:34, October 19, 2009, from http://en.wikipedia.org/w/index.php?title=Automatic_differentiation&oldid=309238913