Mathematical background for PDLP

This page contains mathematical background for developers and advanced users of PDLP, a linear and quadratic programming solver available in OR-Tools. It serves as reference for parts of the code and is not intended to be read on its own. Interested readers should first familiarize themselves with the paper, "Practical Large-Scale Linear Programming using Primal-Dual Hybrid Gradient", then look at the code, then come back to this document when the code references it.

Primal

PDLP considers the following convex quadratic programming problem,

$$ \begin{align} \min_x & \, c^Tx + \frac{1}{2}x^TQx \\ \text{s.t.}\; & l^{c} \le Ax \le u^{c} \\ & l^{v} \le x \le u^{v} \end{align} $$

where $A$ is an $m \times n$ matrix and $Q$ is a symmetric $n \times n$ positive semidefinite matrix. The upper bound vectors $u^{c}$ and $u^{v}$ have entries in $\mathbb{R} \cup { \infty}$, and the lower bound vectors $l^{c}$ and $l^{v}$ have entries in $\mathbb{R} \cup { -\infty}$. We assume $l^{c} \le u^{c}$ and $l^{v} \le u^{v}$.

Dual

Let $a \in \mathbb{R}$. Let $[a]_+$ denote its positive part and $[a]_-$ denote its negative part, i.e., $a = [a]_+ - [a]_-$. When applied to a vector, the positive and negative parts are computed element-wise.

The dual of the above primal problem is over $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$, and $r \in \mathbb{R}^n$. The vector $y$ contains dual multipliers on the linear constraints ($l^{c} \le Ax \le u^{c}$). The vector $r$, called the reduced costs, contains dual multipliers on the bound constraints ($l^{v} \le x \le u^{v}$).

$$ \begin{align} \max_{x, y, r} & \, -\frac{1}{2}x^TQx + \left((l^{c})^T[y]_+ - (u^{c})^T[y]_- \right) + \left((l^{v})^T[r]_+ - (u^{v})^T[r]_- \right) \\ \text{s.t.}\; & Qx + c - A^Ty = r \end{align} $$

When $Q = 0$, we can drop $x$ from the dual, and we recover LP duality.

Dual variable bounds

We say that $y$ satisfies the dual variable bounds if the $y$-term in the objective is finite, that is:

$$ y_i \geq 0 \qquad \text{if }u^c_i = \infty, \\ y_i \leq 0 \qquad \text{if }l^c_i = -\infty. $$

Derivation via conjugate duality

Preliminaries

Let $a \in \mathbb{R} \cup {-\infty}$ and $b \in \mathbb{R} \cup {\infty}$ with $b \ge a$ and consider the interval $[a, b] \subseteq \mathbb{R} \cup {-\infty, \infty}$.

Let $\mathcal{I}_{[a, b]} : \mathbb{R} \to \mathbb{R} \cup { \infty}$ be the indicator function of the interval, i.e., $\mathcal{I}_{[a, b]}(x)$ is zero when $x \in [a, b]$ and $\infty$ otherwise.

Define $p(y; a, b): \mathbb{R} \to \mathbb{R} \cup {\infty}$ as:

$$ p(y; a, b) = \begin{cases} ay & \text{ if } y < 0 \\ 0 & \text{ if } y = 0 \\ by & \text{ if } y > 0 \end{cases}. $$

When $a$ or $b$ are infinite, follow standard extended real arithmetic.

Basic result: \(p(y; a, b) = (\mathcal{I}_{[a, b]})^*(y)\) where $(\cdot)^*$ denotes the convex conjugate.

For vectors $l \subseteq (\mathbb{R} \cup {-\infty})^n$ and $u \subseteq (\mathbb{R} \cup {\infty})^n$, the indicator function $\mathcal{I}_{[l, u]} : \mathbb{R}^n \to \mathbb{R} \cup { \infty}$ is defined analogously, and we define $p(y; l, u) = (\mathcal{I}_{[l, u]})^*(y) = \sum_{i=1}^n p(y_i; l_i, u_i)$.

Derivation

Introducing auxiliary variables $\tilde a \in \mathbb{R}^m$ and $\tilde x \in \mathbb{R}^n$, we restate the primal problem as:

$$ \begin{align} \min_{x, \tilde x, \tilde a} & \, c^Tx + \frac{1}{2}x^TQx + \mathcal{I}_{[l^c,u^c]}(\tilde a) + \mathcal{I}_{[l^v,u^v]}(\tilde x) \\ \text{s.t.}\; & \tilde a = Ax \\ & \tilde x = x \end{align} $$

Dualizing the equality constraints, we obtain:

$$ \min_{x, \tilde x, \tilde a} \max_{y, r} c^Tx + \frac{1}{2}x^TQx + y^T\tilde a - y^TAx + r^T\tilde x - r^Tx + \mathcal{I}_{[l^c,u^c]}(\tilde a) + \mathcal{I}_{[l^v,u^v]}(\tilde x) $$

Exchanging min/max and regrouping:

$$ \max_{y, r} \min_{x, \tilde x, \tilde a} c^Tx + \frac{1}{2}x^TQx - y^TAx - r^Tx + \left( \mathcal{I}_{[l^c,u^c]}(\tilde a) + y^T\tilde a \right) + \left(\mathcal{I}_{[l^v,u^v]}(\tilde x) + r^T\tilde x \right) $$

The joint minimization over $x$, $\tilde x$, and $\tilde a$ decomposes. For $x$ we see that a minimizer, if one exists, satisfies $Qx + c - A^Ty = r$, in which case the minimum value is $-\frac{1}{2} x^TQx$. For $\tilde x$ and $\tilde a$ we apply the definition of convex conjugates with minor adjustments for signs.

This yields the dual:

$$ \begin{align} \max_{x, y, r} & \, -\frac{1}{2}x^TQx - p(-y, l^c, u^c) - p(-r, l^v, u^v) \\ \text{s.t.}\; & Qx + c - A^Ty = r \end{align} $$

By expanding the definition of $p$, we obtain the dual stated at the top.

Saddle-point formulation

Primal-dual hybrid gradient (see Chambolle and Pock) targets a primal problem of the form

$$ \begin{align} \min_x f(x) + g(x) + h(Kx) \end{align} $$

which by conjugate duality is equivalent to the saddle-point problem

$$ \begin{align} \min_x \max_y f(x) + g(x) + y^TKx - h^*(y) \end{align} $$

PDLP coerces the convex quadratic programming problem into this form by setting:

  • $f(x) = 0$
  • $g(x) = c^T x + \frac{1}{2} x^T Q x + \mathcal{I}_{[l^v, u^v]}(x)$
  • $h(a) = \mathcal{I}_{[-u^c,-l^c]}(a)$
  • $K = -A$

As derived above, \(h^*(y) = p(y; -u^c,-l^c)\) is a piecewise linear convex function. Both $g$ and $h^*$ can take infinite values, which effectively limits the domains of $x$ and $y$ respectively.

Note that the proximal operator of $g$ is computable in closed form under PDLP's assumption that $Q$ is diagonal, using the fact that $g$ is separable and the following property that holds for any functions \(f_1, f_2\):

$$ f_2(t) = f_1(t) + \frac{\mu}{2} t^2 \Rightarrow \mathrm{prox}_{\lambda f_2}(t) = \mathrm{prox}_{\frac{\lambda}{1 + \lambda \mu} f_1}\left( \frac{t}{1 + \lambda \mu} \right). $$

For a proof of this fact, see e.g., Theorem 6.13 in First-Order Methods in Optimization. The resulting expression is given by

$$ \begin{equation} \mathrm{prox}_{\lambda g}(x) = \mathrm{proj}_{[l^v, u^v]}\left( (I + \lambda Q)^{-1} (x - \lambda c) \right) \end{equation} $$

Reduced costs, dual residuals, and the corrected dual objective

The saddle point formulation only explicitly works with $(x,y)$; the reduced costs $r$ are implicit. In order to return $(x,y,r)$ when solving the saddle-point formulation, we choose an $r$ that approximately satisfies $Qx + c - A^Ty = r$, that is,

$$ Qx + c - A^T y = r + \tilde{r} $$

where $\tilde{r}$ are the dual residuals and $r$ are the reduced costs. Without this allocation of $Qx + c - A^T y$ between dual residuals and reduced costs, the objective value of the dual problem would be $-\infty$ whenever there is a non-zero reduced cost on an infinite bound. The term corrected dual objective is used to identify the objective value of the dual problem when $\tilde{r} = 0$. Thus the corrected dual objective always gives a lower bound on the objective value, but is often $-\infty$.

Rescaling

Suppose we are given a diagonal (column) rescaling matrix $C$ and a diagonal (row) rescaling matrix $R$ with positive entries on the diagonal. Applying rescaling as in ShardedQuadraticProgram::RescaleQuadraticProgram, we obtain the following transformed problem:

$$ \begin{align} \min_{\tilde x} & \, (Cc)^T{\tilde x} + \frac{1}{2}{\tilde x}^T(CQC){\tilde x} \\ \text{s.t.}\; & Rl^{c} \le RAC\tilde x \le Ru^{c} \\ & C^{-1}l^{v} \le \tilde x \le C^{-1}u^{v} \end{align} $$

A solution to the original problem is recovered as $x = C\tilde x$. If $\tilde y$ is a dual solution and $\tilde r$ are reduced costs for the transformed problem, then $y = R\tilde y$ is a dual solution and $r = C^{-1}\tilde r$ are reduced costs for the original problem (derivation omitted).

Infeasibility identification

A certificate of primal infeasibility is a point $(y, r) \in \mathbb{R}^m \times \mathbb{R}^n$ satisfying:

$$ \begin{equation} \left((l^{c})^T[y]_+ - (u^{c})^T[y]_- \right) + \left((l^{v})^T[r]_+ - (u^{v})^T[r]_- \right) > 0 \\ -A^T y = r . \end{equation} $$

The existence of such a point implies that the primal problem doesn't have a solution.

Similarly, a certificate of dual infeasibility is a point $x \in \mathbb{R}^n$ such that:

$$ \begin{equation} Qx = 0 \\ c^T x < 0 \\ (Ax)_i \begin{cases} = 0 & \text{if }l^c_i , u^c_i \in \mathbb{R}, \\ \geq 0 & \text{if }l^c_i \in \mathbb{R}, u^c_i = \infty, \\ \leq 0 & \text{if }l^c_i = -\infty, u^c_i \in \mathbb{R}, \end{cases} \\ x_i \begin{cases} = 0 & \text{if }l^v_i , u^v_i \in \mathbb{R}, \\ \geq 0 & \text{if }l^v_i \in \mathbb{R}, u^v_i = \infty, \\ \leq 0 & \text{if }l^v_i = -\infty, u^v_i \in \mathbb{R}. \end{cases} \end{equation} $$

Note that the certificates for a linear program can be obtained by setting $Q=0$.