Exercises
The update of Newton's method computes $A^{-1}b$. The most intuitive way of writing this is inv(A) * b, which first computes the inverse of A and then multiplies it with a vector. However, this approach has several disadvantages:
- Specialized algorithms for solving the linear system $Ax=b$ cannot be used.
- When A is sparse, its inverse is generally dense, and additional memory is needed to store the dense matrix.
For these reasons, the linear system of equations is solved by A \ b, which calls specialized algorithms.
Use the package BenchmarkTools to benchmark both possibilities.
Solution:
We first create a random matrix A and a random vector b.
using BenchmarkTools
n = 1000
A = randn(n,n)
b = randn(n)
We verify that both possibilities result in the same solution by computing the norm of their difference.
julia> using LinearAlgebra
julia> norm(inv(A)*b - A \ b)
9.321906736594836e-12
We benchmark the first possibility.
julia> @btime inv($A)*($b)
71.855 ms (6 allocations: 8.13 MiB)
We benchmark the second possibility.
julia> @btime ($A) \ ($b)
31.126 ms (4 allocations: 7.64 MiB)
The second possibility is faster and has lower memory requirements.
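As a side note, the backslash operator also works with precomputed factorizations. When the same matrix appears with several right-hand sides, factorizing once and reusing the factorization saves most of the work. A minimal sketch, reusing the A, b and n defined above (the names F, x1 and x2 are our own):
using LinearAlgebra
F = lu(A)          # factorize A once
x1 = F \ b         # reuses the factorization
x2 = F \ randn(n)  # further right-hand sides cost only triangular solves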
Similarly to Newton's method, the bisection method is primarily designed to solve equations by finding their zeros. It can only solve equations $f(x)=0$ where $f:\mathbb{R}\to\mathbb{R}$. It starts with an interval $[a,b]$ on whose endpoints $f$ has opposite signs, $f(a)f(b)<0$. Then it evaluates $f$ at the midpoint of $[a,b]$ and halves the interval so that the new interval again satisfies the condition on opposite signs $f(a)f(b)<0$. This is repeated until the function value is small or until the interval is sufficiently short.
Implement the bisection method and use it to minimize $f(x) = x^2 - x$ on $[-1,1]$. During the implementation, do not evaluate $f$ unless necessary.
Solution:
First, we write the bisection method. We initialize it with arguments $f$ and the initial interval $[a,b]$. We also specify an optional tolerance. We start by saving the function value fa = f(a) so that we do not need to recompute it every time. The syntax fa == 0 && return a may look a bit cryptic. Since && is the "and" operator, it first checks whether fa == 0 is satisfied, and only if so, it evaluates the second part, which exits the function and returns a. Since we need $f(a)f(b)<0$, we check this condition and throw an error if it is not satisfied. Finally, we run the while loop, where every iteration halves the interval. The condition on opposite signs is enforced by the if condition inside the loop.
function bisection(f, a, b; tol=1e-6)
    fa = f(a)
    fb = f(b)
    # an endpoint may already be a zero
    fa == 0 && return a
    fb == 0 && return b
    fa*fb > 0 && error("Wrong initial values for bisection")
    while b - a > tol
        c = (a+b)/2
        fc = f(c)
        fc == 0 && return c
        # keep the half of the interval on which the sign change occurs
        if fa*fc > 0
            a = c
            fa = fc
        else
            b = c
            fb = fc
        end
    end
    return (a+b)/2
end
This implementation is efficient in that only one function evaluation is needed per iteration. The price to pay is the additional variables fa, fb and fc.
To use the bisection method to minimize a function $f(x)$, we apply it to the optimality condition $f'(x)=0$.
f(x) = x^2 - x
g(x) = 2*x - 1   # derivative of f
x_opt = bisection(g, -1, 1)
The correct solution is
0.5
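As a quick sanity check (a sketch reusing g and x_opt from above), the derivative should vanish at the computed point, and since $f''(x) = 2 > 0$, the stationary point is indeed a minimum.
# first-order condition holds up to the bisection tolerance
abs(g(x_opt)) <= 1e-6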
The library to perform optimization is called JuMP. Install it, go briefly through its documentation, and use it to solve the linear optimization problem
\[\begin{aligned} \text{minimize}\qquad &x_1 + x_2 + x_5 \\ \text{subject to}\qquad &x_1+2x_2+3x_3+4x_4+5x_5 = 8, \\ &x_3+x_4+x_5 = 2, \\ &x_1+x_2 = 2, \\ &x\ge 0. \end{aligned}\]
Solution:
The best start is the official documentation of the JuMP package. Since JuMP is only an interface to solvers, we need to include an actual solver as well. For linear programs, we can use using GLPK; for non-linear ones, we would need using Ipopt instead. We specify the constraints in matrix form. It is also possible to write them directly via @constraint(model, x[1] + x[2] == 2); this direct way is more pleasant for complex constraints. Since x is a vector, we need to use the broadcast value.(x) instead of the wrong value(x).
using JuMP
using GLPK
# constraint matrix, right-hand side and objective of the linear program
A = [1 2 3 4 5; 0 0 1 1 1; 1 1 0 0 0]
b = [8; 2; 2]
c = [1; 1; 0; 0; 1]
n = size(A, 2)
model = Model(GLPK.Optimizer)
@variable(model, x[1:n] >= 0)
@objective(model, Min, c'*x)
@constraint(model, A*x .== b)   # all equalities in matrix form
optimize!(model)
x_val = value.(x)               # broadcast extracts the solution values
The correct solution is
[2.0, 0.0, 2.0, 0.0, 0.0]
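For comparison, here is a minimal sketch of the same model with the constraints written one by one, as mentioned above (the name model2 is our own choice):
model2 = Model(GLPK.Optimizer)
@variable(model2, x[1:5] >= 0)
@objective(model2, Min, x[1] + x[2] + x[5])
@constraint(model2, x[1] + 2x[2] + 3x[3] + 4x[4] + 5x[5] == 8)
@constraint(model2, x[3] + x[4] + x[5] == 2)
@constraint(model2, x[1] + x[2] == 2)
optimize!(model2)
value.(x)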
Derive the SQP method for an optimization problem with only equality constraints
\[\begin{aligned} \text{minimize}\qquad &f(x) \\ \text{subject to}\qquad &h_j(x) = 0, j=1,\dots,J. \end{aligned}\]
SQP writes the Karush-Kuhn-Tucker optimality conditions and then applies Newton's method to solve the resulting system of equations.
Apply the obtained algorithm to
\[\begin{aligned} \text{minimize}\qquad &\sum_{i=1}^{10} ix_i^4 \\ \text{subject to}\qquad &\sum_{i=1}^{10} x_i = 1. \end{aligned}\]
Verify that the numerically obtained solution is correct.
Solution:
The Lagrangian reads
\[L(x,\mu) = f(x) + \sum_{j=1}^J\mu_j h_j(x).\]
Since there are no inequality constraints, the optimality conditions contain no complementarity slackness and read
\[\begin{aligned} \nabla f(x) + \sum_{j=1}^J\mu_j \nabla h_j(x) &= 0,\\ h_j(x) &= 0, \qquad j=1,\dots,J. \end{aligned}\]
Newton's method at iteration $k$ maintains a pair $(x^k,\mu^k)$ and performs the update
\[\begin{pmatrix} x^{k+1} \\ \mu^{k+1} \end{pmatrix} = \begin{pmatrix} x^{k} \\ \mu^{k} \end{pmatrix} - \begin{pmatrix} \nabla^2 f(x^k) + \sum_{j=1}^J \mu_j^k \nabla^2 h_j(x^k) & \nabla h(x^k) \\ \nabla h(x^k)^\top & 0 \end{pmatrix}^{-1} \begin{pmatrix} \nabla f(x^k) + \sum_{j=1}^J\mu_j^k \nabla h_j(x^k) \\ h(x^k) \end{pmatrix}. \]
For the numerical implementation, we define the functions $f$ and $h$ together with their gradients and Hessians. The simplest way to create a diagonal matrix is Diagonal from the LinearAlgebra package. It can, of course, be done manually as well; a sketch follows the code block below.
using LinearAlgebra
n = 10
f(x) = sum((1:n) .* x.^4)             # objective Σ i*x_i^4
f_grad(x) = 4*(1:n).*x.^3
f_hess(x) = 12*Diagonal((1:n).*x.^2)  # diagonal Hessian
h(x) = sum(x) - 1                     # equality constraint h(x) = 0
h_grad(x) = ones(n)
h_hess(x) = zeros(n,n)
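As mentioned, the Hessian of $f$ can also be built manually; one possibility is diagm, which creates a dense matrix instead of the lazy Diagonal (the name f_hess_manual is our own):
f_hess_manual(x) = diagm(12 .* (1:length(x)) .* x.^2)   # dense diagonal matrix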
To implement SQP, we first randomly generate the initial $x$ and $\mu$ and then write the procedure derived above. Since we update $x$ inside a for loop, we need to declare it as a global variable; otherwise, it would be a local variable, and the global one (outside the loop) would not update. We can write inv(A)*b or the more efficient A\b. To subtract from $x$, we use the shortened notation x -= ?, which is the same as x = x - ?.
x = randn(n)
μ = randn()
for i in 1:100
    global x, μ
    # assemble and solve the Newton system for the KKT conditions
    A = [f_hess(x) + μ*h_hess(x) h_grad(x); h_grad(x)' 0]
    b = [f_grad(x) + μ*h_grad(x); h(x)]
    step = A \ b
    x -= step[1:n]
    μ -= step[n+1]
end
The need to differentiate global and local variables in scripts is one reason why functions should be used as much as possible.
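Following this advice, here is a minimal sketch of the same iteration wrapped in a function (the names sqp and maxiter are our own choices); locals can be rebound without any global declaration:
function sqp(x, μ; maxiter=100)
    n = length(x)
    for _ in 1:maxiter
        A = [f_hess(x) + μ*h_hess(x) h_grad(x); h_grad(x)' 0]
        b = [f_grad(x) + μ*h_grad(x); h(x)]
        step = A \ b
        x -= step[1:n]
        μ -= step[n+1]
    end
    return x, μ
end

x, μ = sqp(randn(n), randn())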
To validate the solution, we need to verify optimality and feasibility; both need to equal zero. These are exactly the components of the b variable. However, we cannot access b directly, as it is local to the for loop.
julia> f_grad(x) + μ*h_grad(x)
10-element Vector{Float64}:
  0.0
  0.0
  3.469446951953614e-18
  6.938893903907228e-18
 -3.469446951953614e-18
  3.469446951953614e-18
  0.0
  0.0
 -3.469446951953614e-18
  0.0
julia> h(x)
0.0
The correct solution is
[0.1608, 0.1276, 0.1115, 0.1013, 0.094, 0.0885, 0.084, 0.0804, 0.0773, 0.0746]
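The solution can also be verified analytically: the stationarity condition $4ix_i^3 + \mu = 0$ implies that $x_i$ is proportional to $i^{-1/3}$, and the constraint $\sum_i x_i = 1$ determines the proportionality constant. A sketch of this check (the name x_analytic is our own):
# stationarity: 4i*x_i^3 = -μ, hence x_i ∝ i^(-1/3)
x_analytic = (1:n) .^ (-1/3)
x_analytic = x_analytic / sum(x_analytic)  # normalize so that sum(x) = 1
norm(x - x_analytic)                       # should be close to zero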
Show that the primal formulation for a problem with no inequalities is equivalent to the min-max formulation.
Solution:
The primal problem with no inequalities reads
\[\begin{aligned} \text{minimize}\qquad &f(x) \\ \text{subject to}\qquad &h_j(x) = 0,\ j=1,\dots,J. \end{aligned}\]
The Lagrangian has the form
\[L(x;\mu) = f(x) + \sum_{j=1}^J \mu_j h_j(x).\]
Now consider the min-max formulation
\[\operatorname*{minimize}_x\quad \operatorname*{maximize}_{\mu}\quad f(x) + \sum_{j=1}^J \mu_j h_j(x).\]
If $h_j(x)\neq 0$ for some $j$, then it is simple to choose $\mu_j$ so that the inner maximization problem has the optimal value $+\infty$. However, since the outer problem minimizes the objective, it never selects points with the value $+\infty$. Therefore, we can ignore all points with $h_j(x)\neq 0$ and prescribe $h_j(x)=0$ as a hard constraint. That is precisely the primal formulation.
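In display form, the inner maximization evaluates to
\[\max_{\mu}\ \Big( f(x) + \sum_{j=1}^J \mu_j h_j(x) \Big) = \begin{cases} f(x) & \text{if } h_j(x) = 0 \text{ for all } j, \\ +\infty & \text{otherwise}, \end{cases}\]
and minimizing the left-hand side over $x$ is therefore the same as minimizing $f$ over the feasible set.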
Derive the dual formulation for linear programming.
Solution:
The linear program
\[\begin{aligned} \text{minimize}\qquad &c^\top x \\ \text{subject to}\qquad &Ax=b, \\ &x\ge 0 \end{aligned}\]
has the Lagrangian
\[L(x;\lambda,\mu) = c^\top x - \lambda^\top x + \mu^\top (b-Ax) = (c - \lambda - A^\top\mu)^\top x + b^\top \mu.\]
We need the term $-\lambda^\top x$ because we require the constraints in the form $g(x)\le 0$, in other words $-x\le 0$. The dual problem then by definition reads
\[\operatorname*{maximize}_{\lambda\ge0, \mu} \quad \operatorname*{minimize}_x \quad (c - \lambda - A^\top\mu)^\top x + b^\top \mu.\]
Since the minimization with respect to $x$ is unconstrained, the same argument as in the previous exercise implies the hard constraint $c - \lambda - A^\top\mu=0$. Then we may simplify the dual problem into
\[\begin{aligned} \text{maximize}\qquad &b^\top \mu \\ \text{subject to}\qquad &c - \lambda - A^\top\mu = 0, \\ &\lambda\ge 0. \end{aligned}\]
From this formulation, we may eliminate $\lambda = c - A^\top\mu \ge 0$ and obtain the constraint $A^\top \mu\le c$. This is the desired dual formulation.
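As a numerical illustration, we may solve this dual for the data A, b and c from the JuMP exercise above and check that the optimal values of both problems coincide, as strong duality predicts (a sketch; the name dual_model is our own):
dual_model = Model(GLPK.Optimizer)
@variable(dual_model, μ[1:size(A, 1)])   # one multiplier per equality constraint
@objective(dual_model, Max, b'*μ)
@constraint(dual_model, A'*μ .<= c)      # the derived constraint A'μ ≤ c
optimize!(dual_model)
objective_value(dual_model)              # should equal c'*x_val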