CONOPT
|
Enumerations | |
enum class | ConoptSDEvaluationType { ConoptSDEvaluationType::None = 0 , ConoptSDEvaluationType::Constraint , ConoptSDEvaluationType::Lagrangian } |
the evaluation type for the directional second derivatives More... | |
Functions | |
void | ConoptModelData::setSDEvaluationType (ConoptSDEvaluationType sdevaltype) |
informs CONOPT of the method for evaluating the second derivative | |
void | ConoptModelData::setSDLagrangianStructure (const std::vector< int > &rownum, const std::vector< int > &colnum) |
sets the structure of the second derivatives of the Lagrangian | |
virtual int | ConoptModelData::SDLagrVal (const double x[], const double u[], const int hsrw[], const int hscl[], double hsvl[], int *nodrv, int numvar, int numcon, int nhess) |
Computes and returns the numerical values of the Hessian. | |
virtual int | ConoptModelData::SDDirLagr (const double x[], const double dx[], const double u[], double d2g[], int newpt, int *nodrv, int numvar, int numcon) |
computes the directional second derivative for the Lagrangian | |
virtual int | ConoptModelData::SDDir (const double x[], const double dx[], double d2g[], int rowno, const int jacnum[], int *nodrv, int numvar, int numjac, int thread) |
computes the directional second derivative for a single constraint | |
virtual int | ConoptModelData::SDDirIni (const double x[], const double dx[], const int rowlist[], int listsize, int numthread, int newpt, int *nodrv, int numvar) |
called by CONOPT before a sequence of 2DDir calls each time either the point or the direction changes. | |
virtual int | ConoptModelData::SDDirEnd (int *nodrv) |
called by CONOPT after a sequence of 2DDir calls each time either the point or the direction changes. | |
Virtual functions that can be optionally implemented by the user to define the second order derivatives.
CONOPT is able to take advantage of second derivatives. Only some of the potential ways of defining second derivatives is currently being implemented and this chapter describes the interface defined so far.
Second order information is used to formulate and solve an inner Quadratic Programming sub-model in a Sequential Quadratic Programming (SQP) like algorithm. If 2nd order information is not readily available it will be approximated from gradient values using finite differences. Please note that perturbations may require very many gradient calls so this option assumes that gradient calls are cheap. In addition, perturbations require accurate gradients. Even given accurate derivatives they are not as accurate as properly coded 2nd derivatives. It is therefore highly recommended to provide some form or second order information.
Second order information can be available in two forms: (1) as a sparse matrix of second derivatives, the Hessian matrix, and (2) or as a vector of directional second derivatives. Second-order information can be defined based on individual constraints or on the Lagrangian function, giving a total of four combinations of which three are supported at the moment. The notation and mathematical relationship between these concepts is as follows:
The Hessian of a single constraint \(k\), \(H_k\), is the matrix of second derivatives defined as
\[ H_{ij}^k = \frac{\partial^{2}g_k}{\partial x_{i}\partial x_{j}} \]
The Hessian is by definition a symmetric matrix and we will usually only work with the lower triangular part of the matrix, including the diagonal. \(H^k\) will only have non-zero elements in rows and columns corresponding to variables that appear nonlinearly in constraint \(k\), so if constraint \(k\) is sparse, the corresponding Hessian will usually also be sparse. But notice that if constraint \(k\) depends on \(p\) variables, the Hessian may have up to \(p^2\) elements in total or \(p \cdot (p+1)/2\) elements in the lower triangular part of the matrix. There are models where the constraints on average are sparse, but one or a few of the constraints depend on many variables, and their Hessians have a huge number of nonzero elements.
The Lagrangian function is defined as
\[ L(x,u)=\sum_{k}u_{k} \cdot g_{k}(x) \]
where we have ignored some terms that do not contribute to second derivatives. \(u\) is a vector of Lagrange multipliers defined by CONOPT. There is one Lagrange multiplier for each constraint.
The Hessian of the Lagrangian, \(HL\), is defined as:
\[ HL = \sum_{k}u_{k} \cdot H^{k} \quad \text{or} \quad HL_{ij} = \sum_{k}u_{k} \cdot \frac{\partial^{2}g_{k}}{\partial x_{i}\partial x_{j}} \]
The sparsety pattern for the Hessian of the Lagrangian is the union of the sparsety patterns for the Hessians of all constraints in the model. This means that if one of the constraints has a dense Hessian then the Hessian of the Lagrangian will also be dense and CONOPT cannot use a Hessian efficiently with these models.
The directional first derivative of constraint \(k\) in point \(x\) and in the direction \(v\) is defined as
\[ d1_{k}(x,v) = \frac{\partial g_{k}(x+\theta\cdot v)}{\partial \theta} = \frac{\partial g_{k}}{\partial x} \cdot v \]
or the scalar product of the first derivative of \(g_k\) with the direction \(v\). The directional second derivative is defined as
\[ d2_{k}(x,v) = \frac{\partial d1_{k}(x,v)}{\partial x} = \frac{\partial^{2}g_{k}}{\partial x^{2}} \cdot v = H^{k} \cdot v \]
or the product of the Hessian with the direction v. The directional second derivative of the Lagrangian is similarly defined as
\[ HL \cdot v = \sum_{k}u_{k} \cdot H_{k} \cdot v \]
where the order of the multiplications can be selected by the implementer. Notice that in cases where \(H_k\) has too many elements to be of practical direct use, the directional second derivative is still a vector with only one element per variable, and it may be cheap to compute.
Currently, CONOPT accept second derivative information in three forms:
1. The Hessian of the Lagrangian,
2. Directional second derivatives of individual constraints, and
3. Directional second derivatives of the Lagrangian.
The remaining combination, the Hessians of individual constraints, can be implemented if they are more convenient for some users.
CONOPT uses the second order information in a Sequential Quadratic Programming (SQP) algorithm to find good directions for a line search. The SQP model is solved approximately using a conjugate gradient (CG) algorithm. The SQP tolerances are large on the beginning of the optimization and are reduced as we approach the optimal solution. The gradients used in the CG algorithm are either computed by repeated calls to the directional second derivative routine or by a single call to the Hessian routine followed by several multiplications of the Hessian with a direction vector inside CONOPT. If both a Hessian and directional derivatives are available, CONOPT will initially use 2DDir
or 2DDirLagr
when few CG iterations are expected, and switch to 2DLagrVal
when the accuracy is tighter and a larger number of iterations are expected.
The modeler can tell CONOPT that second order information is available as the Hessian of the Lagrangian, assumed to be a sparse matrix, by calling ConoptModelData::setSDLagrangianStructure(). The number of nonzero elements and the sparsety pattern of the Hessian of the Lagrangian are defined in two setup calls to 2DLagrStr
. The numerical values of the Hessian must be returned by calls to ConoptModelData::SDLagrVal(). The numerical calls will be scattered throughout the optimization, usually with no more than one call per iteration.
If the user wants to compute the directional second derivative, this is registered by calling ConoptModelData::setSDEvaluationType() and supplying either ConoptSDEvaluationType::Constraint or ConoptSDEvaluationType::Lagrangian. If ConoptSDEvaluationType::Constraint is set, then the user must implement ConoptModelData::SDDir(). Alternatively, if ConoptSDEvaluationType::Lagrangian is set, then the user must implement ConoptModelData::SDDirLagr(). For the directional derivatives, the user may also implement ConoptModelData::SDDirIni() and ConoptModelData::SDDirEnd() for the setup and teardown of the second derivative evaluation.
|
strong |
void ConoptModelData::setSDEvaluationType | ( | ConoptSDEvaluationType | sdevaltype | ) |
informs CONOPT of the method for evaluating the second derivative
void ConoptModelData::setSDLagrangianStructure | ( | const std::vector< int > & | rownum, |
const std::vector< int > & | colnum ) |
sets the structure of the second derivatives of the Lagrangian
rownum | Vector of row numbers of the lower triangular part of the Hessian. |
colnum | Vector of column numbers of the lower triangular part of the Hessian. |
|
inlinevirtual |
Computes and returns the numerical values of the Hessian.
The structure of the Lagrangian of the Hessian is provided using setSDLagrangianStructure()
where:
MODE
: Distinguishes between the three modes described above.X
: Vector with the point in which the Hessian of the Lagrangian should be computed. Defined by CONOPT when MODE = 3
.U
: The vector of weights on the individual constraints. The Lagrangian is defined as L = SUM(r in rows)U(r) . function(r)
. U
is defined by CONOPT when MODE = 3
.ROWNO
: Vector of row numbers of the lower triangular part of the Hessian. Must be defined by the modeler when MODE = 2
and is provided as a help to the modeler when MODE = 3
.COLNO
: Vector of column numbers of the lower triangular part of the Hessian. Must be defined by the modeler when MODE = 2
and is provided as a help to the modeler when MODE = 3
. The elements of the Hessian must be sorted column wise, i.e. COLNO
must be non-decreasing. Within each column the elements must be sorted row-wise, i.e. the elements of ROWNO
must be increasing for sequences for which COLNO
is constant. The row and column numbers are interpreted according to Base, i.e. they are between 1 and N when Base = 1 (Fortran conventions) and between 0 and N-1 when Base = 0 (C conventions).VALUE
: Vector returning the values of the second derivatives when MODE = 3
. The individual elements must be defined in the order used for ROWNO
and COLNO
. VALUE will be initialized to zero when 2DLagrVal
is called and it must be defined by the modeler.NODRV
: Can be set to 1 if the derivatives for some reason could not be computed, for example because some of them were not defined. Is initialized to 0 by CONOPT. CONOPT will not use second order methods in the current point if NODRV
is 1.N
: The number of variables in the model as defined in ConoptModelData::setProblemDimension. Provided by CONOPT.M
: The number of constraints in the model as defined in ConoptModelData::setProblemDimension. Provided by CONOPT.NHESS
: The number of nonzero elements in the Hessian. When MODE = 1
CONOPT will define the maximum number of elements it will accept in NHESS
, and 2DLagrVal
must return the actual number of element. If the actual number is larger than the input value of NHESS
then CONOPT will not use the Hessian of the Lagrangian and 2DLagrVal
will not be called again. When MODE > 1
NHESS
is defined by CONOPT as the value provided by the modeler in the first call.2DLagrVal
will only be called in points in which the constraint values have been computed successfully before with FDEval
. Checks for function evaluation errors can therefore usually be limited. Note that 2DLagrVal
in some cases can be called in a point that is different from the last point in which FDEval
was called.
The maximum number of Hessian nonzero elements accepted by CONOPT is computed as the option value RVHESS
multiplied by the number of nonlinear Jacobian elements. The default value of RVHESS
is 10. If the Hessian has more than 10 times as many elements as the Jacobian it is expected that it is too expensive to compute and that directional 2nd derivatives in some form are more efficient to use. The modeler may allow CONOPT to use a denser Hessian by increasing RVHESS
.
There is a rudimentary debugger of the values returned by 2DLagrVal
. It can be turned on by setting Debug2D
in coidef_debug2d() or with the option LKDEB2
. Information on the debugger including messages and error return codes can be found in Error Return Codes.
Reimplemented in Tut2_ModelData.
Definition at line 432 of file modeldata.h.
|
inlinevirtual |
computes the directional second derivative for the Lagrangian
where:
X
: Vector with the point in which the directional second derivatives should be computed. Defined by CONOPT.DX
: Vector with the direction in which the directional second derivatives should be computed. Defined by CONOPT.U
: The vector of weights on the individual constraints. The Lagrangian is defined as L = SUM(r in rows)U(row) \cdot function(row)
. U
is defined by CONOPT when MODE = 3
.D2G
: Vector returning the directional second derivative. The relevant positions (see JCNM
below) will be zero on entry to 2DDir
.NEWPT
: Indicator that describes the pair (X,DX)
. Is provided by CONOPT: - <b>1:</b> This is a new point. - <b>2:</b> This is the same point as in last call, i.e. the variables in `X` have not changed but the direction `DX` has changed. `NEWPT` is provided by CONOPT as a service to the modeler. It may be used to calculate and reuse terms that depend on `X` but do not depend on the direction.
NODRV
: Must be set to 1 if the directional second derivatives for some reason could not be computed, for example because some of them were not defined. CONOPT will not use any output from a point that returned a nonzero NODRV
.N
: The number of variables in the model as defined in ConoptModelData::setProblemDimension. Provided by CONOPT.M
: The number of constraints in the model as defined in ConoptModelData::setProblemDimension. Provided by CONOPT.Definition at line 445 of file modeldata.h.
|
inlinevirtual |
computes the directional second derivative for a single constraint
where:
X
: Vector with the point in which the directional second derivatives should be computed. Defined by CONOPT.DX
: Vector with the direction in which the directional second derivatives should be computed. Defined by CONOPT.D2G
: Vector returning the directional second derivative. The relevant positions (see JCNM
below) will be zero on entry to 2DDir
.ROWNO
: The constraint for which the directional second derivatives should be computed. Defined by CONOPT. 2DDir
will be called for one constraint at a time and only for nonlinear constraints.JCNM
: List of column numbers for the variables that appear nonlinearly in the current constraint. JCNM
and NJ
(see below) are provided as a service to modelers that may find this information useful; it can be ignored by others.NODRV
: Must be set to 1 if the directional second derivatives for some reason could not be computed, for example because some of them were not defined. CONOPT will not use any output from a point that returned a nonzero NODRV
.N
: The number of variables in the model as defined in ConoptModelData::setProblemDimension. Provided by CONOPT.NJ
: The number of variables appearing nonlinearly in the current constraint. Defined by CONOPT.THREAD
: In some multi-threading environments multiple copies of FDEval
can be called at the same time, (see Multi Threading). THREAD
will hold the number of the thread for the current call of FDEval
. THREAD
is in the interval from 0 to the maximum number of threads allowed for FDEval
.Definition at line 458 of file modeldata.h.
|
inlinevirtual |
called by CONOPT before a sequence of 2DDir
calls each time either the point or the direction changes.
This method can be used to initialize the 2DDir
computations, for example to compute common terms.
where:
X
: Vector with the point in which the directional second derivatives should be computed in future 2DDir
calls. Defined by CONOPT.DX
: Vector with the direction in which the directional second derivatives should be computed in future 2DDir
calls. Defined by CONOPT.ROWLIST
: List of the constraints that will be evaluated in the future 2DDir
calls.LISTSIZE
: The number of elements in ROWLIST
, defined by CONOPT.NUMTHREAD
: The number of threads to be used for the following 2DDir
calls.NEWPT
: Indicator that describes the pair (X,DX)
. Is provided by CONOPT: - <b>1:</b> This is a new point. - <b>2:</b> This is the same point as in last call, i.e. the variables in `X` have not changed but the direction `DX` has changed. `NEWPT` is provided by CONOPT as a service to the modeler. It may be used to calculate and reuse terms that depend on `X` but do not depend on the direction.
NODRV
: Must be set to 1 if the directional second derivatives for some reason could not be computed, for example because some of them were not defined. CONOPT will not use any output from a point that returned a nonzero NODRV
and 2DDir
and the optional 2DDirEnd
will not be called in this point.N
: The number of variables in the model as defined in ConoptModelData::setProblemDimension. Provided by CONOPT.Definition at line 473 of file modeldata.h.
|
inlinevirtual |
called by CONOPT after a sequence of 2DDir
calls each time either the point or the direction changes.
This method can be used to perform cleanup tasks, including to report if anything went wrong.
where:
NODRV
: Can be set to 1 if the directional second derivatives for some reason could not be computed, for example because some of them were not defined. this is an alternative to the use of NODRV
in 2DDir
. CONOPT will not use any output from a point and direction that returned a nonzero NODRV
.Definition at line 488 of file modeldata.h.