CONOPT

Topics

 Computing the Hessian of the Lagrangian
 
 Computing the second derivative of a constraint in a direction
 
 Second derivative of the Lagrangian in a direction
 

Detailed Description

Methods for computing second order information for non-linear functions.

CONOPT is able to take advantage of second derivatives. Only some of the potential ways of defining second derivatives are currently implemented, and this chapter describes the interface defined so far.

Second order information is used to formulate and solve an inner Quadratic Programming sub-model in a Sequential Quadratic Programming (SQP) like algorithm. If second order information is not readily available, it is approximated from gradient values using finite differences. Note that these perturbations may require a large number of gradient calls, so this fallback assumes that gradient evaluations are cheap. The perturbations also require accurate gradients, and even with accurate gradients the resulting approximations are less accurate than properly coded second derivatives. It is therefore highly recommended to provide some form of second order information.
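The following Python sketch illustrates the finite-difference fallback: a directional second derivative is approximated from two gradient evaluations, so probing a full Hessian this way costs one extra gradient call per direction. The `grad` callable and the step size `eps` are hypothetical and not part of the CONOPT API.

```python
import numpy as np

def fd_directional_second_derivative(grad, x, v, eps=1e-6):
    # Approximate H(x) @ v by a forward difference of the gradient.
    # The truncation error is O(eps), while the rounding error grows
    # like gradient_noise / eps -- which is why the gradients must be
    # accurate for this approximation to be usable.
    return (grad(x + eps * v) - grad(x)) / eps

# Example constraint g(x) = x0^2 * x1, so H = [[2*x1, 2*x0], [2*x0, 0]].
grad = lambda x: np.array([2.0 * x[0] * x[1], x[0] ** 2])
x = np.array([1.0, 2.0])
v = np.array([1.0, 0.0])
print(fd_directional_second_derivative(grad, x, v))  # ~ [4.0, 2.0] = H @ v
```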

Second order information can be available in two forms: (1) as a sparse matrix of second derivatives, the Hessian matrix, or (2) as a vector of directional second derivatives. Second order information can be defined based on individual constraints or on the Lagrangian function, giving a total of four combinations, of which three are supported at the moment. The notation and mathematical relationship between these concepts is as follows:

The Hessian of a single constraint \(k\), \(H_k\), is the matrix of second derivatives defined as

\[ H_{ij}^k = \frac{\partial^{2}g_k}{\partial x_{i}\partial x_{j}} \]

The Hessian is by definition a symmetric matrix and we will usually only work with the lower triangular part of the matrix, including the diagonal. \(H^k\) will only have non-zero elements in rows and columns corresponding to variables that appear nonlinearly in constraint \(k\), so if constraint \(k\) is sparse, the corresponding Hessian will usually also be sparse. But notice that if constraint \(k\) depends on \(p\) variables, the Hessian may have up to \(p^2\) elements in total or \(p \cdot (p+1)/2\) elements in the lower triangular part of the matrix. There are models where the constraints on average are sparse, but one or a few of the constraints depend on many variables, and their Hessians have a huge number of nonzero elements.
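As a concrete illustration, take a hypothetical constraint \(g_k(x) = x_0 x_1 + e^{x_2}\): only three variables appear, and the lower triangular part of \(H^k\) has just two nonzeros. The triplet layout in the sketch below is purely illustrative and is not CONOPT's calling convention.

```python
import numpy as np

# g_k(x) = x0*x1 + exp(x2); the Hessian has nonzeros only in the rows
# and columns of variables that appear nonlinearly in the constraint.
def hessian_lower_triplets(x):
    # Lower triangular (row >= col) nonzeros, including the diagonal:
    # d2g/dx1dx0 = 1 and d2g/dx2dx2 = exp(x2); all other entries are zero.
    return [(1, 0, 1.0),
            (2, 2, np.exp(x[2]))]

for row, col, val in hessian_lower_triplets(np.array([1.0, 2.0, 0.5])):
    print(row, col, val)
```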

The Lagrangian function is defined as

\[ L(x,u)=\sum_{k}u_{k} \cdot g_{k}(x) \]

where we have ignored some terms that do not contribute to second derivatives. \(u\) is a vector of Lagrange multipliers defined by CONOPT. There is one Lagrange multiplier for each constraint.

The Hessian of the Lagrangian, \(HL\), is defined as:

\[ HL = \sum_{k}u_{k} \cdot H^{k} \quad \text{or} \quad HL_{ij} = \sum_{k}u_{k} \cdot \frac{\partial^{2}g_{k}}{\partial x_{i}\partial x_{j}} \]

The sparsity pattern for the Hessian of the Lagrangian is the union of the sparsity patterns for the Hessians of all constraints in the model. This means that if one of the constraints has a dense Hessian, then the Hessian of the Lagrangian will also be dense, and CONOPT cannot use a Hessian efficiently for these models.
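The accumulation of \(HL\) from per-constraint Hessians can be sketched as follows; the triplet representation is again illustrative rather than CONOPT's actual interface.

```python
from collections import defaultdict

def lagrangian_hessian(u, constraint_hessians):
    # Accumulate HL = sum_k u[k] * H^k from lower triangular triplets.
    # The resulting sparsity pattern is the union of the individual
    # patterns: a single dense H^k makes HL dense as well.
    HL = defaultdict(float)
    for uk, triplets in zip(u, constraint_hessians):
        for row, col, val in triplets:
            HL[(row, col)] += uk * val
    return dict(HL)

H0 = [(0, 0, 2.0)]               # g_0 = x0^2
H1 = [(0, 0, 6.0), (1, 1, 2.0)]  # g_1 = 3*x0^2 + x1^2
print(lagrangian_hessian([1.0, 0.5], [H0, H1]))
# {(0, 0): 5.0, (1, 1): 1.0}
```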

The directional first derivative of constraint \(k\) in point \(x\) and in the direction \(v\) is defined as

\[ d1_{k}(x,v) = \frac{\partial g_{k}(x+\theta\cdot v)}{\partial \theta} = \frac{\partial g_{k}}{\partial x} \cdot v \]

or the scalar product of the first derivative of \(g_k\) with the direction \(v\). The directional second derivative is defined as

\[ d2_{k}(x,v) = \frac{\partial d1_{k}(x,v)}{\partial x} = \frac{\partial^{2}g_{k}}{\partial x^{2}} \cdot v = H^{k} \cdot v \]

or the product of the Hessian with the direction \(v\). The directional second derivative of the Lagrangian is similarly defined as

\[ HL \cdot v = \sum_{k}u_{k} \cdot H^{k} \cdot v \]

where the order of the multiplications can be selected by the implementer. Notice that in cases where \(H^k\) has too many elements to be of practical direct use, the directional second derivative is still a vector with only one element per variable, and it may be cheap to compute.
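The sketch below contrasts the two evaluation orders for \(HL \cdot v\): weighting the per-constraint products \(H^k \cdot v\) avoids ever forming \(HL\), which is the cheap route when some \(H^k\) is too dense to build explicitly. The dense arrays are illustrative stand-ins for user-evaluated Hessians.

```python
import numpy as np

# Hypothetical Hessians of two constraints in three variables.
H = [np.array([[2.0, 1.0, 0.0],
               [1.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]),
     np.array([[0.0, 0.0, 0.0],
               [0.0, 4.0, 1.0],
               [0.0, 1.0, 2.0]])]
u = np.array([1.0, 0.5])     # Lagrange multipliers supplied by the solver
v = np.array([1.0, 1.0, 1.0])

# Per-constraint directional second derivatives: d2_k = H^k @ v.
d2 = [Hk @ v for Hk in H]

# Order 1: weight the vectors, never forming HL explicitly.
HLv = sum(uk * d2k for uk, d2k in zip(u, d2))

# Order 2: build HL first, then multiply (equivalent, but needs HL).
HLv_check = (u[0] * H[0] + u[1] * H[1]) @ v
print(HLv, HLv_check)  # both [3.0, 3.5, 1.5]
```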

Currently, CONOPT accepts second derivative information in three forms:

1. The Hessian of the Lagrangian,

2. Directional second derivatives of individual constraints, and

3. Directional second derivatives of the Lagrangian.

The remaining combination, the Hessians of individual constraints, can be implemented if they are more convenient for some users.

CONOPT uses the second order information in a Sequential Quadratic Programming (SQP) algorithm to find good directions for a line search. The SQP model is solved approximately using a conjugate gradient (CG) algorithm. The SQP tolerances are large at the beginning of the optimization and are tightened as the solution is approached. The gradients used in the CG algorithm are either computed by repeated calls to the directional second derivative routine or by a single call to the Hessian routine followed by several multiplications of the Hessian with a direction vector inside CONOPT. If both a Hessian and directional derivatives are available, CONOPT will initially use 2DDir or 2DDirLagr when few CG iterations are expected, and switch to 2DLagrVal when tighter accuracy is required and a larger number of iterations is expected.
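To show why directional second derivatives are sufficient for the inner solve, here is a minimal conjugate gradient sketch that drives the QP step entirely through Hessian-vector products. This illustrates the general CG mechanism, not CONOPT's internal implementation, and `hess_vec` stands in for whichever of the routines above supplies \(HL \cdot v\).

```python
import numpy as np

def cg_step(hess_vec, g, tol=1e-8, max_iter=50):
    # Approximately solve HL @ p = -g using only products HL @ d,
    # i.e. one directional second derivative evaluation per iteration.
    p = np.zeros_like(g)
    r = -g                     # residual of HL @ p = -g at p = 0
    d = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Hd = hess_vec(d)       # the only access to second order info
        alpha = rs / (d @ Hd)
        p += alpha * d
        r -= alpha * Hd
        rs_new = r @ r
        d = r + (rs_new / rs) * d
        rs = rs_new
    return p

HL = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, 2.0])
print(cg_step(lambda v: HL @ v, g))  # approximately solves HL @ p = -g
```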