Derivative Operators

A guide for derivatives in InfiniteOpt. See the respective technical manual for more details.

Overview

Derivative operators commonly arise in many infinite-dimensional problems, particularly in space-time optimization. InfiniteOpt.jl provides a simple yet powerful interface to model these objects for derivatives of any order, including partial derivatives. Derivatives can be used in defining measures and constraints.

Basic Usage

Derivative operators can be defined in a few different ways in InfiniteOpt. To motivate these, let's first define an InfiniteModel along with some parameters and variables:

julia> using InfiniteOpt, Distributions;

julia> model = InfiniteModel();

julia> @infinite_parameter(model, t in [0, 10], 
                           derivative_method = OrthogonalCollocation(3));

julia> @infinite_parameter(model, ξ ~ Uniform(-1, 1));

julia> @variable(model, y, Infinite(t, ξ));

julia> @variable(model, q, Infinite(t));

Notice that we used the derivative_method keyword argument to specify which numerical method will be used to evaluate any derivatives that depend on that infinite parameter t. In this case, we specified orthogonal collocation over finite elements using 3 nodes. We'll come back to this a little further below to more fully describe the various methods we can use.

First, let's discuss how to define derivatives in InfiniteOpt.jl. Principally, this is accomplished via @deriv which will operate on a particular InfiniteOpt expression (containing parameters, variables, and/or measures) with respect to infinite parameters specified with their associated orders. Behind the scenes all the appropriate calculus will be applied, creating derivative variables as needed. For example, we can define the following:

julia> d1 = @deriv(y, t)
∂/∂t[y(t, ξ)]

julia> d2 = @deriv(y, t, ξ)
∂/∂ξ[∂/∂t[y(t, ξ)]]

julia> d3 = @∂(q, t^2)
∂/∂t[∂/∂t[q(t)]]

julia> d_expr = @deriv(y * q - 2t, t)
∂/∂t[y(t, ξ)]*q(t) + ∂/∂t[q(t)]*y(t, ξ) - 2

Thus, we can define derivatives in a variety of forms according to the problem at hand. The last example even shows how the product rule is correctly applied.

Note

For convenience in making more compact code, we provide ∂ and @∂ as wrappers for deriv and @deriv, respectively.
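
For instance, the functional variant can be called directly and returns the same derivative as d1 from above:

julia> ∂(y, t)
∂/∂t[y(t, ξ)]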

Also, notice that the appropriate analytic calculus is applied to infinite parameters. For example, we could also compute:

julia> @deriv(3t^2 - 2t, t)
6 t - 2

Conveniently, @deriv can be called within any measure and constraint. However, in certain cases we may need to define an initial guess (i.e., a guess trajectory). This can be accomplished in 2 ways:

  • Call set_start_value_function using the individual derivative (e.g., d1 above)
  • Define the derivative using @variable with the Deriv variable type object and use the start keyword argument.

In either case, a single value can be given or a start value function that will generate a value in accordance with the support values (i.e., following the same syntax as infinite variables). For example, we can set the starting value of d1 to 0 via the following:

julia> set_start_value_function(d1, 0)
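
Alternatively, a start value function can be supplied that generates the guess from the support values (a sketch; the function arguments follow the order of the parameter dependencies (t, ξ)):

julia> set_start_value_function(d1, (t_val, ξ_val) -> 0.1 * t_val)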

Now let's return to our discussion on derivative evaluation methods. These are the methods that can/will be invoked to transcribe the derivatives when solving the model. The methods native to InfiniteOpt are described in the table below:

Method                  Type           Needed Boundary Conditions   Creates Supports
FiniteDifference        Forward        Final & optional Initial     No
FiniteDifference        Central        Initial & Final              No
FiniteDifference        Backward       Initial & optional Final     No
OrthogonalCollocation   GaussLobatto   Initial                      Yes

Here the default method is backward finite difference. These methods are enforced on a per infinite parameter basis (i.e., they are tied to the parameter the differential operator is taken with respect to). Thus, in the above examples any derivatives taken with respect to t will use orthogonal collocation on finite elements since that is what we specified as our derivative method. More information is provided in the Derivative Methods section below. However, we note here that set_derivative_method can be invoked anytime after parameter definition to specify/modify the derivative method used. More conveniently, we can call set_all_derivative_methods:

julia> set_all_derivative_methods(model, FiniteDifference(Forward()))

Warning

InfiniteOpt does not ensure proper boundary conditions are provided by the user. Thus, it is imperative that the user ensure these are provided appropriately with the derivative evaluation method that is used. We recommend specifying such conditions via a constraint that uses Restricted Variables. For example:

@constraint(model, initial_condition, y(0, ξ) == 42)

Advanced Definition

This section will detail the inner-workings and more advanced details behind defining derivatives in InfiniteOpt.

Manual Definition

The workflow for derivative definition mirrors that of variable definition as summarized in the following steps:

  1. Define the variable information via a JuMP.VariableInfo.
  2. Build the derivative using build_derivative.
  3. Add the derivative to the model via add_derivative.

To exemplify this process, let's first define appropriate variable information:

julia> info = VariableInfo(true, 0., true, 42., false, 0., false, 0., false, false);

More detailed information on JuMP.VariableInfo is provided in the Variable Definition Methodology section.

Warning

Instances of JuMP.VariableInfo used to define derivatives should have info.binary = false and info.integer = false, since most derivative evaluation methods require that derivatives be continuous.

Now that we have our variable information we can make a derivative using build_derivative:

julia> d = build_derivative(error, info, y, ξ);

julia> d isa Derivative
true

Here the argument variable can be an infinite variable, semi-infinite variable, derivative, or measure that depends on the infinite parameter provided; an error is thrown otherwise.

Now we can add the derivative to the model via add_derivative which will add the Derivative object and return a GeneralVariableRef pointing to it that we can use in InfiniteOpt expressions:

julia> dref = add_derivative(model, d)
∂/∂ξ[y(t, ξ)]

This will also create any appropriate information based constraints (e.g., lower bounds).

Finally, we note that higher order derivatives are made by simply nesting this process, as sketched below.
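
For instance, a second derivative with respect to ξ could be sketched as follows (shown for illustration only; the @deriv macro described below handles this nesting automatically):

d2nd = add_derivative(model, build_derivative(error, info, dref, ξ))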

Macro Definition

There are two macros we provide for defining derivatives: @variable that uses the Deriv variable type and @deriv.

Warning

The @derivative_variable macro used by previous versions of InfiniteOpt is now discontinued in favor of using @variable with the Deriv variable type object.

First, @variable simply automates the process described above in a manner inspired by the syntax of the variable macros. As such, it supports all the same keywords and constraint syntax used with the variable macros. For example, we can define the derivative $\frac{\partial^2 y(t, \xi)}{\partial t^2}$ using d1 (defined in the Basic Usage section), enforcing a lower bound of 1 with an initial guess of 0, and assign it to an alias GeneralVariableRef called dydt2:

julia> @variable(model, dydt2 >= 1, Deriv(d1, t), start = 0)
dydt2(t, ξ)

This will also support anonymous definition and multi-dimensional definition. Please see Macro Variable Definition for more information.
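
For instance, an anonymous counterpart of the definition above could be sketched via JuMP's anonymous variable syntax (using a hypothetical name d_anon):

d_anon = @variable(model, variable_type = Deriv(d1, t), lower_bound = 1, start = 0)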

Warning

The same derivative should not be defined with multiple @variable calls, nor should @variable be used on a derivative whose properties have already been set. This is because the latest @variable call will overwrite any existing properties a derivative might already have.

Second, for more convenient definition we use @deriv (or @∂) as shown in the Basic Usage section above. Unlike @variable this can handle any InfiniteOpt expression as the argument input. It also can build derivatives that depend on multiple infinite parameters and/or are taken to higher orders. This is accomplished via recursive derivative definition, handling the nesting as appropriate. For example, we can "define" $\frac{\partial^2 y(t, \xi)}{\partial t^2}$ again:

julia> @deriv(d1, t)
dydt2(t, ξ)

julia> @deriv(y, t^2)
dydt2(t, ξ)

Notice that the derivative references all point to the same derivative object we defined above with its alias name dydt2. This macro can also tackle complex expressions using the appropriate calculus, such as:

julia> @deriv(∫(y, ξ) * q, t)
∂/∂t[∫{ξ ∈ [-1, 1]}[y(t, ξ)]]*q(t) + ∂/∂t[q(t)]*∫{ξ ∈ [-1, 1]}[y(t, ξ)]

This demonstrates the convenience of using @deriv.

With all this in mind, we recommend using @deriv as the de facto definition method, and then using @variable as a convenient way to specify information constraints and an initial guess value/trajectory, as sketched below.
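
For instance, a typical workflow (sketched here with a hypothetical alias dqdt_alias) first defines the derivative and then attaches information to it:

dqdt = @deriv(q, t)  # define the derivative
@variable(model, dqdt_alias >= 0, Deriv(q, t), start = 0)  # attach a bound and guess via an alias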

Derivative Evaluation

In this section, we detail how derivatives are evaluated in InfiniteOpt to then be used in reformulating the model for solution.

Theory

To motivate the principles behind numerical derivative evaluation/transcription, let's first consider the initial value problem:

\[\frac{d y(t)}{dt} = f(t, y(t)), \ \ \ y(t_0) = y_0\]

With a finite support set $\{t_0, t_1, \dots, t_k\}$ we can numerically approximate the value of $\frac{d y(t_n)}{dt}$ at each time point $t_n$ via the Euler method (i.e., forward finite difference). We thus obtain a system of equations:

\[\begin{aligned} &&& y(t_{n+1}) = y(t_n) + (t_{n+1} - t_n) \frac{d y(t_n)}{dt}, && \forall n = 0, \dots, k-1\\ &&& \frac{d y(t_n)}{dt} = f(t_n, y(t_n)), && \forall n = 0, \dots, k \\ &&& y(t_0) = y_0 \end{aligned}\]

Thus, we obtain 3 sets of equations:

  1. constraint transcriptions
  2. auxiliary derivative equations
  3. boundary conditions.

In the case above, we could reduce the number of equations by substituting out the point derivatives in the constraint transcriptions since we have explicit relationships in the auxiliary equations (see the recursion below). However, this is not possible in general, such as when we encounter more complex partial differential equations.
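
For instance, carrying out this substitution recovers the familiar explicit Euler recursion:

\[y(t_{n+1}) = y(t_n) + (t_{n+1} - t_n) f(t_n, y(t_n)), \ \ \ \forall n = 0, \dots, k-1\]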

Thus, in InfiniteOpt derivatives are treated as variables which can be contained implicitly in constraints and/or measures. This allows us to support implicit dependencies and higher order derivatives. This means that when the model is reformulated, its constraints and measures can be reformulated as normal (treating any derivative dependencies as variables). We then can apply the appropriate derivative evaluation technique to derive the necessary set of auxiliary derivative equations to properly characterize the derivative variables. This can be formalized as:

\[\begin{aligned} &&& f_j(y(\lambda), Dy(\lambda)) \leq 0, && \forall j \in J, \lambda \in \Lambda \\ &&& h_i(y(\lambda), Dy(\lambda)) = 0, && \forall i \in I, \lambda \in \Lambda \\ &&& g_k(y(\hat{\lambda}), Dy(\hat{\lambda})) = 0, && \forall k \in K, \hat{\lambda} \in \hat{\Lambda} \end{aligned}\]

where $y(\lambda)$ and $Dy(\lambda)$ denote all the variables and derivatives in the problem and $\lambda$ denote all the problem's infinite parameters. With this let the constraints $f_j$ denote the problem constraints which can contain any variables, parameters, derivatives, and/or measures associated with the problem. The constraints $h_i$ denote the auxiliary derivative equations formed by the appropriate numerical method to implicitly define the behavior of the derivative variables present in $f_j$. Finally, the necessary boundary conditions are provided in the constraints $g_k$.

Note that this general paradigm captures a wide breadth of problems and derivative evaluation techniques. Higher order derivatives are dealt with naturally since such techniques can be applied to nested derivative operators recursively. For example, consider the second-order partial derivative:

\[\frac{\partial^2 y(t, \xi)}{\partial t^2} = \frac{\partial}{\partial t}\left(\frac{\partial y(t, \xi)}{\partial t}\right)\]

The 2 forms are equivalent; thus, when we apply the Euler method we obtain the following auxiliary equations:

\[\begin{aligned} &&& y(t_{n+1}, \xi) = y(t_n, \xi) + (t_{n+1} - t_n) \frac{\partial y(t_n, \xi)}{\partial t}, && \forall \xi \in \mathcal{D}_\xi, n = 0, \dots, k-1\\ &&& \frac{\partial y(t_{n+1}, \xi)}{\partial t} = \frac{\partial y(t_n, \xi)}{\partial t} + (t_{n+1} - t_n) \frac{\partial^2 y(t_n, \xi)}{\partial t^2}, && \forall \xi \in \mathcal{D}_\xi, n = 0, \dots, k-1\\ \end{aligned}\]

In the section below we detail the derivative evaluation methods that InfiniteOpt natively implements.

Derivative Methods

As discussed briefly above in the Basic Usage section, we natively employ 4 derivative methods in InfiniteOpt (see the table in that section for a summary).

These methods are defined in association with individual infinite parameters and will be applied to any derivatives that are taken with respect to that parameter. These methods are specified via the derivative_method keyword argument in the @infinite_parameter macro and can also be defined by invoking set_derivative_method or set_all_derivative_methods:

julia> set_derivative_method(t, FiniteDifference(Forward()))

In this example, we set t's derivative evaluation method to use forward finite difference. This will also reset any changes that were made with the old method (e.g., removing old collocation points). Now let's describe the ins and outs of these methods.

The first class of methods pertain to finite difference techniques. The syntax for specifying these techniques is described in FiniteDifference and exemplified here:

julia> FiniteDifference(Forward(), true)
FiniteDifference{Forward}(Forward(), true)

where the first argument indicates the type of finite difference scheme we wish to employ and the second argument indicates whether this method should be enforced at the boundary point. The default for all infinite parameters is FiniteDifference(Backward(), true).

Forward finite difference (i.e., explicit Euler) is exemplified by approximating the first order derivative $\frac{d y(t)}{dt}$ via

\[y(t_{n+1}) = y(t_n) + (t_{n+1} - t_{n})\frac{d y(t_n)}{dt}, \ \forall n = 0, 1, \dots, k-1\]

Note that in this case, the boundary relation corresponds to $n = 0$ and would be included if we set FiniteDifference(Forward(), true) or excluded if we let the second argument be false. We recommend selecting false when an initial condition is provided. Also, note that a terminal condition should be provided when using this method since an auxiliary equation for the derivative at the terminal point cannot be made. Thus, if a terminal condition is not given, the derivative at the terminal point will be a free variable.
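
For instance, when an initial condition is provided, we would construct the variant that excludes the boundary relation (shown here without applying it to t):

julia> FiniteDifference(Forward(), false)
FiniteDifference{Forward}(Forward(), false)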

Central finite difference is exemplified by approximating the first order derivative $\frac{d y(t)}{dt}$ via

\[y(t_{n+1}) = y(t_{n-1}) + (t_{n+1} - t_{n-1})\frac{d y(t_n)}{dt}, \ \forall n = 1, 2, \dots, k-1\]

Note that this form cannot be invoked at $n = 0$ or $n = k$ and thus cannot produce an equation at either boundary. With this in mind, the syntax is FiniteDifference(Central()) where the second argument is omitted since it doesn't apply to this scheme. As a result, both initial and terminal conditions should be specified; otherwise, the derivatives at those points will be free variables.

Backward finite difference (i.e., implicit Euler) is our last (and default) finite difference method and is exemplified by approximating the first order derivative $\frac{d y(t)}{dt}$ via

\[y(t_{n}) = y(t_{n-1}) + (t_{n} - t_{n-1})\frac{d y(t_{n})}{dt}, \ \forall n = 1, 2, \dots, k\]

Here the boundary case corresponds to $n = k$ and would be included if we set FiniteDifference(Backward(), true) (the default) or excluded if we set the second argument to false. We recommend selecting false when a terminal condition is provided. Also, note that an initial condition should always be given; otherwise, the derivative at the first point will be free.

Finally, we employ orthogonal collocation on finite elements via the OrthogonalCollocation object (please refer to it in the manual for complete syntax details). In general terms, this technique fits an $m$ degree polynomial to each finite element (i.e., each sequential support pair). This fit is done via $m+1$ collocation nodes (supports), which include the finite element supports along with $m-1$ additional internal collocation nodes chosen in accordance with an orthogonal quadrature scheme. The typical syntax for specifying this method is OrthogonalCollocation(num_nodes) where num_nodes indicates the number of collocation nodes to be used for each finite element. For example, we can specify to use 3 collocation nodes (i.e., 1 internal node per finite element), corresponding to a 2nd degree polynomial, via

julia> OrthogonalCollocation(3)
OrthogonalCollocation{GaussLobatto}(3, GaussLobatto())

Notice that the 2nd attribute is GaussLobatto, which indicates that we are using collocation nodes selected via Lobatto quadrature. This is currently the only node selection technique supported by OrthogonalCollocation, although more may be added in future versions. Please note that an initial condition must be provided; otherwise, the corresponding derivative will be a free variable. For more information on orthogonal collocation over finite elements, this page provides a good reference.

Other methods can be employed via user-defined extensions. Please visit our Extensions page for more information.

User-Invoked Evaluation

Typically, derivative evaluation is handled when the model is reformulated, in such a way that the InfiniteModel is left unmodified so that modifications and repeated solutions can be done efficiently and seamlessly. This is also the recommended workflow. However, we do provide user-accessible derivative evaluation methods that generate the auxiliary derivative equations and add them to the InfiniteModel. This can be useful for visualizing how these techniques work and can be helpful for user-defined reformulation extensions (i.e., optimizer model extensions).

We can build these relations for a particular derivative via evaluate. For example, let's build evaluation equations for d1:

julia> d1 
∂/∂t[y(t, ξ)]

julia> fill_in_supports!(t, num_supports = 3) # add supports first

julia> evaluate(d1)

julia> derivative_constraints(d1)
2-element Vector{InfOptConstraintRef}:
 5 ∂/∂t[y(t, ξ)](5, ξ) - y(10, ξ) + y(5, ξ) = 0, ∀ ξ ~ Uniform
 5 ∂/∂t[y(t, ξ)](0, ξ) - y(5, ξ) + y(0, ξ) = 0, ∀ ξ ~ Uniform

Note that we made sure t had supports first over which we could carry out the evaluation, otherwise an error would have been thrown. Moreover, once the evaluation was completed we were able to access the auxiliary equations via derivative_constraints.
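
To confirm that the auxiliary equations are now in place, we can check has_derivative_constraints:

julia> has_derivative_constraints(d1)
true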

We can also add the necessary auxiliary equations for all the derivatives in the model by calling evaluate_all_derivatives!:

julia> fill_in_supports!(ξ, num_supports = 4) # add supports first

julia> evaluate_all_derivatives!(model)

julia> derivative_constraints(dydt2)
2-element Vector{InfOptConstraintRef}:
 5 dydt2(5, ξ) - ∂/∂t[y(t, ξ)](10, ξ) + ∂/∂t[y(t, ξ)](5, ξ) = 0, ∀ ξ ~ Uniform
 5 dydt2(0, ξ) - ∂/∂t[y(t, ξ)](5, ξ) + ∂/∂t[y(t, ξ)](0, ξ) = 0, ∀ ξ ~ Uniform

Finally, we note that once derivative constraints have been added to the InfiniteModel any changes to the respective infinite parameter sets, supports, or derivative method will necessitate the deletion of these auxiliary constraints and a warning will be thrown to indicate such:

julia> derivative_constraints(d1)
2-element Vector{InfOptConstraintRef}:
 5 ∂/∂t[y(t, ξ)](5, ξ) - y(10, ξ) + y(5, ξ) = 0, ∀ ξ ~ Uniform
 5 ∂/∂t[y(t, ξ)](0, ξ) - y(5, ξ) + y(0, ξ) = 0, ∀ ξ ~ Uniform

julia> add_supports(t, 0.2)
┌ Warning: Support/method changes will invalidate existing derivative evaluation constraints that have been added to the InfiniteModel. Thus, these are being deleted.
└ @ InfiniteOpt ~/work/infiniteopt/InfiniteOpt.jl/src/scalar_parameters.jl:783

julia> has_derivative_constraints(d1)
false

Query Methods

Here we describe the various query techniques that we can employ on derivatives in InfiniteOpt.

Basic Queries

First, let's overview the basic object inquiries: derivative_argument, operator_parameter, derivative_method, and name:

julia> derivative_argument(dydt2) # get the variable the derivative operates on
∂/∂t[y(t, ξ)]

julia> operator_parameter(dydt2) # get the parameter the operator is taken with respect to
t

julia> derivative_method(dydt2) # get the numerical derivative evaluation method
FiniteDifference{Forward}(Forward(), true)

julia> name(dydt2) # get the name if there is one
"dydt2"

These all work as exemplified above. We note that derivative_method simply queries the derivative method associated with the operator parameter.
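
In other words, querying the method of the operator parameter directly gives the same result:

julia> derivative_method(operator_parameter(dydt2))
FiniteDifference{Forward}(Forward(), true)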

Derivatives also inherit all the usage methods employed by infinite variables. For example:

julia> is_used(d1)
true

julia> used_by_measure(dydt2)
false

julia> used_by_semi_infinite_variable(d2)
true

Also, since derivatives are analogous to infinite variables, they inherit many of the same queries including parameter_refs:

julia> parameter_refs(d1)
(t, ξ)

julia> parameter_refs(derivative_argument(d1))
(t, ξ)

Since derivatives simply inherit their infinite parameter dependencies from the argument variable, the above lines are equivalent.

Variable Information

Again, since derivatives are essentially a special case of infinite variables, they inherit all the same methods for querying variable information. For example, consider the following queries:

julia> has_lower_bound(dydt2)
true

julia> lower_bound(dydt2)
1.0

julia> LowerBoundRef(dydt2)
dydt2(t, ξ) ≥ 1, ∀ t ∈ [0, 10], ξ ~ Uniform

julia> has_upper_bound(dydt2)
false 

julia> func = start_value_function(dydt2);

Model Queries

We can also determine the number of derivatives a model contains and obtain a list of them via num_derivatives and all_derivatives, respectively:

julia> num_derivatives(model)
7

julia> all_derivatives(model)
7-element Vector{GeneralVariableRef}:
 ∂/∂t[y(t, ξ)]
 ∂/∂ξ[∂/∂t[y(t, ξ)]]
 ∂/∂t[q(t)]
 ∂/∂t[∂/∂t[q(t)]]
 ∂/∂ξ[y(t, ξ)]
 dydt2(t, ξ)
 ∂/∂t[∫{ξ ∈ [-1, 1]}[y(t, ξ)]]

Modification Methods

In this section, we'll highlight some of the modification methods that can be used on derivatives in InfiniteOpt.

Variable Information

As discussed above, derivatives inherit the same variable methods as infinite variables. Thus, we can modify/delete bounds and starting values for derivatives using the same methods. For example:

julia> set_lower_bound(dydt2, 0)

julia> lower_bound(dydt2)
0.0

julia> set_upper_bound(dydt2, 2)

julia> upper_bound(dydt2)
2.0

julia> fix(dydt2, 42, force = true)

julia> fix_value(dydt2) 
42.0

julia> set_start_value_function(dydt2, (t, xi) -> t + xi)

julia> unfix(dydt2)

Deletion

Finally, there are 2 deletion methods we can employ apart from deleting variable information. First, we can employ delete_derivative_constraints to delete any derivative evaluation constraints associated with a particular derivative:

julia> delete_derivative_constraints(d2)

julia> has_derivative_constraints(d2)
false

Lastly, we can employ delete to delete a particular derivative and all its dependencies:

julia> delete(model, d2)

julia> is_valid(model, d2)
false