CONOPT
Using second derivatives

CONOPT can take advantage of second derivatives, provided the user can supply them in analytic form. CONOPT can use two forms of second derivatives: a matrix with the second derivatives of the Lagrangian of the model (the Hessian of the Lagrangian), and/or a set of directional second derivatives. We will show how the first form can be used with the model defined above. The second form is usually more difficult to implement by hand, but it can be very useful for automatically computed derivatives.

If second derivatives are difficult to code, CONOPT can instead be instructed to compute directional second derivatives using perturbations of the Jacobian. This form of second derivatives requires fast and accurate Jacobian computations and may not always be worthwhile, so you are advised to experiment. This possibility is turned on and off with the routine COIDEF_Perturb, as shown in the example in qp5.f90 or qp5.c.

The sparsity pattern

You must first determine the sparsity pattern of the matrix of second derivatives. This is done by determining the second derivatives of each nonlinear constraint and then combining them:

From the Constraint Reorganization section we have the nonlinear terms of the individual nonlinear constraints (only the nonlinear terms are shown here):

Nonlinear terms: \(Out P\)

The second derivatives are constant with value +1 in the ( \(P\), \(Out\)) and ( \(Out\), \(P\)) positions as follows:

             3: \(Out\)   4: \(P\)
3: \(Out\)                +1
4: \(P\)     +1

Nonlinear terms: \((A_{l} L^{-\rho} + A_{k} K^{-\rho} + A_{Inp} Inp^{-\rho})^{-1/\rho}\)

There are varying second derivatives in the ( \(L\), \(L\)), ( \(Inp\), \(L\)), ( \(L\), \(Inp\)) and ( \(Inp\), \(Inp\)) positions:

             1: \(L\)   2: \(Inp\)
1: \(L\)     NL         NL
2: \(Inp\)   NL         NL

The Hessian of the Lagrangian is a weighted sum of these two matrices so the sparsity pattern for the overall Hessian can be derived from the tables above. Since the matrix is symmetric, CONOPT only uses the diagonal and the lower triangular part. The structure of this matrix is:

             1: \(L\)      2: \(Inp\)    3: \(Out\)   4: \(P\)
1: \(L\)     1: NL (1)
2: \(Inp\)   2: NL (1)     3: NL (1)
3: \(Out\)
4: \(P\)                                 4: +1 (0)

The first row and column show the variables with their index and name. The inner cells represent the sparse Hessian. We have assigned each nonzero cell an index in the sparse Hessian by numbering the cells column-wise starting from 1, shown by the number before the colon. In the figure, the Hessian index is followed by a code for whether the Hessian element is constant or nonlinear and, in parentheses, the constraint(s) it comes from. Note that in complex models a Hessian element can have contributions from several constraints.

You must tell CONOPT that second derivatives are available by registering a 2DLagr callback routine. You should also give an estimate of the number of elements in the Hessian, to be used in memory estimates. In addition, it can be useful to let CONOPT debug the second derivatives by defining Debug2D. These lines are included in the revised model in tutorial2.f90:

External :: Tut_2DLagr
INTEGER :: COIDEF_Num2D
INTEGER :: COIDEF_Debug2D
..
COI_Error = MAX( COI_Error, COIDEF_Num2D   ( CntVect, 4 ) )
COI_Error = MAX( COI_Error, COIDEF_Debug2D ( CntVect, 0 ) )
COI_Error = MAX( COI_Error, COIDEF_2DLagr  ( CntVect, Tut_2DLagr ) )

in tutorial2.c:

Int Num2D;
Int Debug2D;
..
Num2D = 4; Debug2D = ..;
..
COI_Error += COIDEF_Num2D   ( CntVect, &Num2D );
COI_Error += COIDEF_Debug2D ( CntVect, &Debug2D );
COI_Error += COIDEF_2DLagr  ( CntVect, &Tut_2DLagr );

You must then define a 2nd order callback routine, here named Tut_2DLagr. The complete implementation of Tut_2DLagr, with adjustments in the main program, is given in the files tutorial2.f90 and tutorial2.c. Tut_2DLagr will be called in three different modes. The first call, represented by argument mode == 1, is a size call. The user must return in nhess (*NHESS) the number of nonzero elements in the Hessian of the Lagrangian as counted above. Only the arguments n (*N) (number of variables), m (*M) (number of constraints), mode (*MODE) (= 1), and possibly usrmem (USRMEM) (see COIDEF_Usrmem) are defined during this call.

The second call, represented by argument mode == 2, is a sparsity structure call. The user must return the sparsity pattern of the Hessian of the Lagrangian as defined in the figure above in the two integer vectors hsrw (HSRW) (for Hessian rows) and hscl (HSCL) (for Hessian columns), which CONOPT in the meantime has allocated with nhess (*NHESS) elements each. Only the arguments n (*N), m (*M), nhess (*NHESS), mode (*MODE), and usrmem (USRMEM) are defined during this call.

The remaining calls, represented by mode == 3, are numerical evaluation calls. The user must compute the numerical values of the Hessian elements as defined below and return them in hsvl (HSVL) (for Hessian values). Tut_2DLagr has as input the usual x vector that represents the point at which the Hessian should be evaluated. The second argument is a vector u with weights for each constraint. x and u are only defined during the numerical evaluation mode, i.e. when mode == 3. The values of the second derivatives are returned in the fifth argument, hsvl. During these calls, the arguments hsrw, hscl, n, m, nhess, mode, and usrmem are all defined. hsrw and hscl hold the values from the mode == 2 call; they can in some cases make the coding of the routine easier.

The user must return the Hessian values according to the following definition:

\begin{equation} \texttt{hsvl(i)} = \sum_{k} \texttt{u(k)} \frac{\partial^{2}\texttt{g(k)}}{\partial \texttt{x}_{\texttt{hsrw(i)}}\,\partial \texttt{x}_{\texttt{hscl(i)}}} \end{equation}

i.e. for each element the user must return the sum over all constraints of the second derivative with respect to the two variables given by hsrw and hscl, with the contribution from each constraint weighted by u.

The implementation in tutorial2.f90 and tutorial2.c shows how common terms are used in the definition of the individual Hessian terms. Notice how complicated the expressions are because constraint 2 involves a power function of a long expression. It is very easy to make mistakes, so the second derivative debugger is highly recommended (even though the output of the debugger is not nicely formatted).

If you are going to use second derivatives, it can often be useful to write the model in a way that makes it easier to derive the second derivatives. In particular, functions of long expressions should be avoided by defining intermediate variables that represent the inner expressions. In our particular example, the equation

\[ Out = (A_{l} L^{-\rho} + A_{k} K^{-\rho} + A_{Inp} Inp^{-\rho})^{-1/\rho} \]

can, with the help of an extra variable, be split into the two equations

\begin{equation} \begin{aligned} inner &= A_{l} L^{-\rho} + A_{k} K^{-\rho} + A_{Inp} Inp^{-\rho} \\ Out &= inner^{-1/\rho} \end{aligned} \end{equation}
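To see why the split helps, note that each nonlinear term in the two equations now depends on a single variable, so all mixed second derivatives vanish. Up to the sign convention used when the equations are moved into constraint form, the only nonzero second derivatives of the nonlinear terms are:

```latex
\begin{equation}
\begin{aligned}
\frac{\partial^{2}}{\partial L^{2}} \left( A_{l} L^{-\rho} \right)
  &= \rho(\rho+1)\, A_{l} L^{-\rho-2} \\
\frac{\partial^{2}}{\partial Inp^{2}} \left( A_{Inp} Inp^{-\rho} \right)
  &= \rho(\rho+1)\, A_{Inp} Inp^{-\rho-2} \\
\frac{\partial^{2}}{\partial inner^{2}} \left( inner^{-1/\rho} \right)
  &= \frac{1}{\rho}\left(\frac{1}{\rho}+1\right) inner^{-1/\rho-2}
\end{aligned}
\end{equation}
```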

All first and second derivatives become much simpler, with only diagonal terms in the Hessian. The price is a slightly larger model, but the overall solution effort will often be reduced by this type of reformulation. The revision is implemented in the models tutorial2r.f90 and tutorial2r.c. It is not shown here, but the source is available with CONOPT.