Proximal gradient method

Proximal gradient methods are a generalized form of projection used to solve non-differentiable convex optimization problems. Many interesting problems can be formulated as convex optimization problems of the form


\operatorname{minimize}_{x \in \mathbb{R}^N} \qquad f_1(x) +f_2(x) + \cdots+ f_{n-1}(x) +f_n(x)

where f_1, f_2, \dots, f_n are convex functions from \mathbb{R}^N to \mathbb{R}, some of which may be non-differentiable. This rules out conventional smooth optimization techniques such as the steepest descent method and the conjugate gradient method, but there is a specific class of algorithms that can solve such problems. These methods proceed by splitting, in that the functions f_1, \dots, f_n are used individually so as to yield an easily implementable algorithm. They are called proximal because each non-smooth function among f_1, \dots, f_n is involved via its proximity operator. The iterative shrinkage-thresholding algorithm, projected Landweber, projected gradient, alternating projections, the alternating-direction method of multipliers, and the alternating split Bregman method are special instances of proximal algorithms. Details of proximal methods are discussed in Combettes and Pesquet.[1] For the theory of proximal gradient methods from the perspective of and with applications to statistical learning theory, see proximal gradient methods for learning.
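
To make the splitting idea concrete, consider the common case n = 2 with a smooth f_1 and f_2 = \lambda \| \cdot \|_1. The proximal gradient iteration then specializes to the iterative shrinkage-thresholding algorithm (ISTA),[2] whose proximity step is elementwise soft-thresholding (derived in the Definition section below). The following Python sketch is a minimal illustration, assuming a least-squares data term f_1(x) = \frac{1}{2} \| Ax - b \|_2^2; the function names and problem setup are chosen for the example, not taken from any particular library.

    import numpy as np

    def soft_threshold(v, t):
        # Proximity operator of t * ||.||_1, applied elementwise.
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def ista(A, b, lam, n_iter=500):
        # Minimize (1/2) * ||A x - b||_2^2 + lam * ||x||_1 by alternating a
        # gradient step on the smooth term with the proximity step on the
        # non-smooth term.
        step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L, L = Lipschitz constant of the gradient
        x = np.zeros(A.shape[1])
        for _ in range(n_iter):
            grad = A.T @ (A @ x - b)             # gradient of the smooth term f_1
            x = soft_threshold(x - step * grad, step * lam)
        return x

The fast variant FISTA adds a momentum step to the same iteration.[2]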

Notations and terminology

Let \mathbb{R}^N, the N-dimensional Euclidean space, be the domain of the function f: \mathbb{R}^N \rightarrow [-\infty,+\infty], and let C be a non-empty convex subset of \mathbb{R}^N. Then the indicator function of C is defined as

 i_C : x \mapsto
\begin{cases}
0        &  \text{if } x \in C \\
+ \infty &  \text{if } x \notin C 
\end{cases}

The p-norm \| \cdot \|_p is defined as

\|x\|_p = ( |x_1|^p + |x_2|^p + \cdots + |x_N|^p )^{1/p}

The distance from x \in \mathbb{R}^N to C is defined as


D_C(x) =  \min_{y \in C} \|x - y\|_2

If C is closed and convex, the projection of x \in \mathbb{R}^N onto C is the unique point P_Cx \in C such that D_C(x) = \| x - P_Cx \|_2 .
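
For example, if C is the Euclidean ball of radius r centered at the origin, the projection has the closed form

 P_C(x) =
\begin{cases}
x                 & \text{if } \|x\|_2 \leq r \\
r \, x / \|x\|_2  & \text{if } \|x\|_2 > r
\end{cases}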

The subdifferential of f is given by


 \partial f : \mathbb{R}^N \rightarrow 2^{\mathbb{R}^N} : x \mapsto \{ u \in \mathbb{R}^N \mid \forall y \in \mathbb{R}^N,\ (y-x)^\mathrm{T}u+f(x) \leq f(y) \}
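
A simple one-dimensional example, used again below: for the absolute value function f = |\cdot| on \mathbb{R},

 \partial |x| =
\begin{cases}
\{-1\}  & \text{if } x < 0 \\
[-1,1]  & \text{if } x = 0 \\
\{+1\}  & \text{if } x > 0
\end{cases}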

Projection onto convex sets (POCS)

One of the widely used convex optimization algorithms is POCS (projection onto convex sets). This algorithm is employed to recover or synthesize a signal that simultaneously satisfies several convex constraints. Let f_i be the indicator function of the non-empty closed convex set C_i modeling a constraint. This reduces to the convex feasibility problem, which requires us to find a solution lying in the intersection of all the convex sets C_i. In the POCS method each set C_i is incorporated via its projection operator P_{C_i}, so in each iteration x is updated as


x_{k+1} = P_{C_1}  P_{C_2} \cdots P_{C_n} x_k
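
As a minimal sketch of this update, the loop below alternates projections onto two sets whose projections have simple closed forms: the hyperplane \{ x \mid a^\mathrm{T} x = b \} and the non-negative orthant. The sets and function names are chosen purely for illustration.

    import numpy as np

    def project_hyperplane(x, a, b):
        # Projection onto the hyperplane {x : a^T x = b}.
        return x - ((a @ x - b) / (a @ a)) * a

    def project_orthant(x):
        # Projection onto the non-negative orthant {x : x >= 0}.
        return np.maximum(x, 0.0)

    def pocs(a, b, n_iter=100):
        # Repeated application of the composed projections, as in the
        # update x_{k+1} = P_{C_1} P_{C_2} x_k above.
        x = np.zeros_like(a)
        for _ in range(n_iter):
            x = project_hyperplane(project_orthant(x), a, b)
        return x   # approximately in the intersection, if it is non-empty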

However, beyond such feasibility problems, projection operators are no longer appropriate, and more general operators are required. Among the various generalizations of the notion of a convex projection operator that exist, proximity operators are best suited for this purpose.

Definition

The proximity operator of a function f is defined as follows. For every x \in \mathbb{R}^N, the minimization problem


\operatorname{minimize}_{y \in \mathbb{R}^N} \qquad f(y) + \frac{1}{2} \left\| x-y \right\|_2^2

admits a unique solution, which is denoted by \operatorname{prox}_f(x). This defines the proximity operator as a mapping


 \operatorname{prox}_f : \mathbb{R}^N \rightarrow \mathbb{R}^N

In particular, when f = i_C is the indicator function of a non-empty closed convex set C, \operatorname{prox}_{i_C} = P_C; proximity operators thus generalize projections.

The proximity operator of f is characterized by the inclusion


p=\operatorname{prox}_f(x) \Leftrightarrow x-p \in \partial f(p) \qquad (\forall(x,p) \in \mathbb{R}^N \times \mathbb{R}^N)

If f is differentiable, then the subdifferential reduces to the gradient and the above equation becomes


 p=\operatorname{prox}_f(x) \Leftrightarrow x-p = \nabla f(p) \quad (\forall(x,p) \in \mathbb{R}^N \times \mathbb{R}^N)
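
As a worked example of this characterization, take f = \lambda |\cdot| on \mathbb{R} with \lambda > 0. Substituting the subdifferential of |\cdot| computed above into the inclusion x - p \in \lambda \, \partial |p| and solving each case yields the soft-thresholding operator used by ISTA:

 \operatorname{prox}_{\lambda |\cdot|}(x) =
\begin{cases}
x - \lambda & \text{if } x > \lambda \\
0           & \text{if } |x| \leq \lambda \\
x + \lambda & \text{if } x < -\lambda
\end{cases}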

Examples

Special instances of proximal gradient methods include the projected Landweber algorithm, the projected gradient method, alternating projections, the alternating-direction method of multipliers (ADMM), the alternating split Bregman method, and the fast iterative shrinkage-thresholding algorithm (FISTA).[2]

See also

  * Proximal gradient methods for learning

References

Notes

  1. Combettes, Patrick L.; Pesquet, Jean-Christophe (2009). "Proximal Splitting Methods in Signal Processing". pp. 11–23.
  2. Beck, A.; Teboulle, M. (2009). "A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems". SIAM Journal on Imaging Sciences. 2 (1): 183–202.
