Proximal gradient method

Proximal gradient methods are a generalized form of projection used to solve non-differentiable convex optimization problems. Many interesting problems can be formulated as convex optimization problems of the form

\operatorname {min} \limits _{x\in \mathbb {R} ^{N}}f_{1}(x)+f_{2}(x)+\cdots +f_{n-1}(x)+f_{n}(x)

where $f_{1},f_{2},\ldots ,f_{n}$ are convex functions from ${\mathbb {R}}^{N}$ to ${\mathbb {R}}$ , some of which are non-differentiable. This rules out conventional smooth optimization techniques such as the steepest descent method and the conjugate gradient method. Proximal methods are a class of algorithms that can solve the above optimization problem. They proceed by splitting, in that the functions $f_{1},\ldots ,f_{n}$ are used individually so as to yield an easily implementable algorithm. They are called proximal because each non-smooth function among $f_{1},\ldots ,f_{n}$ is involved via its proximity operator. The iterative shrinkage-thresholding algorithm, projected Landweber, projected gradient, alternating projections, the alternating-direction method of multipliers, and alternating split Bregman are special instances of proximal algorithms. Details of proximal methods are discussed in Combettes and Pesquet.^[1] For the theory of proximal gradient methods from the perspective of statistical learning theory, and with applications to it, see proximal gradient methods for learning.

Notations and terminology

Let $\mathbb {R} ^{N}$ , the $N$ -dimensional Euclidean space, be the domain of the function $f: \mathbb{R}^N \rightarrow (-\infty,+\infty]$ . Suppose $C$ is a non-empty convex subset of $\mathbb {R} ^{N}$ . Then the indicator function of $C$ is defined as

i_{C}:x\mapsto {\begin{cases}0&{\text{if }}x\in C\\+\infty &{\text{if }}x\notin C\end{cases}}

The $p$ -norm $\|\cdot \|_{p}$ is defined as

\|x\|_{p}=(|x_{1}|^{p}+|x_{2}|^{p}+\cdots +|x_{N}|^{p})^{{1/p}}

The distance from $x\in {\mathbb {R}}^{N}$ to $C$ is defined as

D_{C}(x)=\min _{y\in C}\|x-y\|_{2}

If $C$ is closed and convex, the projection of $x\in {\mathbb {R}}^{N}$ onto $C$ is the unique point $P_{C}x\in C$ such that $D_{C}(x)=\|x-P_{C}x\|_{2}$ .
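As a concrete illustration of the projection and distance defined above, the following Python sketch takes $C$ to be the closed Euclidean ball of radius $r$ centred at the origin, for which the projection has a simple closed form. The choice of set and the function names are assumptions made for illustration; they do not come from the text above.

```python
import math

def project_onto_ball(x, radius=1.0):
    """Project x onto the closed ball {y : ||y||_2 <= radius} (illustrative set)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        # x already lies in C, so it is its own projection.
        return list(x)
    # Otherwise rescale x so that it lands on the boundary of the ball.
    scale = radius / norm
    return [scale * v for v in x]

def distance_to_ball(x, radius=1.0):
    """D_C(x) = ||x - P_C x||_2 for the ball C."""
    p = project_onto_ball(x, radius)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, p)))

x = [3.0, 4.0]
print(project_onto_ball(x))   # approximately [0.6, 0.8]
print(distance_to_ball(x))    # approximately 4.0
```

For a point inside the ball the projection returns the point itself and the distance is zero.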

The subdifferential of $f$ at $x$ is given by

\partial f(x)=\{u\in \mathbb {R} ^{N}\mid \forall y\in \mathbb {R} ^{N},\ (y-x)^{\mathrm {T} }u+f(x)\leq f(y)\}
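The subgradient inequality can be checked numerically on a simple example. The sketch below assumes the one-dimensional function $f(x)=|x|$ , whose subdifferential at $0$ is the interval $[-1,1]$ , and tests the inequality only at a finite set of sample points, so it is a sanity check rather than a proof. The helper name and the tolerance are illustrative assumptions.

```python
def is_subgradient(f, x, u, sample_points, tol=1e-12):
    """Check (y - x) * u + f(x) <= f(y) at every sampled y (1-D case only)."""
    return all((y - x) * u + f(x) <= f(y) + tol for y in sample_points)

# Sample points in [-3, 3]; a finite grid is an assumption of this sketch.
points = [-3.0 + 0.1 * i for i in range(61)]

print(is_subgradient(abs, 0.0, 0.5, points))   # True: 0.5 lies in [-1, 1]
print(is_subgradient(abs, 0.0, 1.5, points))   # False: 1.5 lies outside [-1, 1]
print(is_subgradient(abs, 2.0, 1.0, points))   # True: f is differentiable at 2 with slope 1
```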

Projection onto convex sets (POCS)

One of the widely used convex optimization algorithms is projection onto convex sets (POCS). This algorithm is employed to recover or synthesize a signal that simultaneously satisfies several convex constraints. Let $f_{i}$ be the indicator function of a non-empty closed convex set $C_{i}$ modeling a constraint. The problem then reduces to a convex feasibility problem, which requires finding a point in the intersection of all the convex sets $C_{i}$ . In the POCS method, each set $C_{i}$ is incorporated through its projection operator $P_{{C_{i}}}$ , so in each iteration $x$ is updated as

x_{{k+1}}=P_{{C_{1}}}P_{{C_{2}}}\cdots P_{{C_{n}}}x_{k}
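The POCS update can be sketched in a few lines of Python. The two sets used here, the unit Euclidean ball and the half-space $\{y\mid y_{1}\geq 0.5\}$ , as well as the starting point and the iteration count, are assumptions chosen only because their projections are easy to write down and their intersection is non-empty.

```python
import math

def project_onto_ball(x, radius=1.0):
    """Projection onto {y : ||y||_2 <= radius}."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [radius / norm * v for v in x]

def project_onto_halfspace(x, threshold=0.5):
    """Projection onto {y : y[0] >= threshold}: clip the first coordinate."""
    y = list(x)
    y[0] = max(y[0], threshold)
    return y

def pocs(x0, projections, iterations=200):
    """Iterate x_{k+1} = P_{C_1} P_{C_2} ... P_{C_n} x_k.

    `projections` lists [P_{C_1}, ..., P_{C_n}]; the rightmost operator
    in the product is applied first, hence the reversed loop.
    """
    x = list(x0)
    for _ in range(iterations):
        for project in reversed(projections):
            x = project(x)
    return x

x = pocs([-2.0, 3.0], [project_onto_ball, project_onto_halfspace])
print(x)  # a point (numerically) in both the ball and the half-space
```

A fixed iteration count keeps the sketch short; a practical implementation would more likely stop once successive iterates differ by less than a tolerance.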

However, beyond such feasibility problems, projection operators are not appropriate and more general operators are required. Among the various generalizations of the notion of a convex projection operator that exist, proximity operators are well suited to this purpose.

Definition

The proximity operator of a convex function $f$ at $x$ is defined as the unique solution to

\operatorname {argmin} \limits _{y}f(y)+{\frac {1}{2}}\left\|x-y\right\|_{2}^{2}

and is denoted $\operatorname {prox}_{f}(x)$ . This defines a map

\operatorname {prox}_{f}:{\mathbb {R}}^{N}\rightarrow {\mathbb {R}}^{N}

Note that in the specific case where $f$ is the indicator function $i_{C}$ of some non-empty closed convex set $C$ ,

{\begin{aligned}\operatorname {prox} _{i_{C}}(x)&=\operatorname {argmin} \limits _{y}{\begin{cases}{\frac {1}{2}}\left\|x-y\right\|_{2}^{2}&{\text{if }}y\in C\\+\infty &{\text{if }}y\notin C\end{cases}}\\&=\operatorname {argmin} \limits _{y\in C}{\frac {1}{2}}\left\|x-y\right\|_{2}^{2}\\&=P_{C}(x)\end{aligned}}

showing that the proximity operator is indeed a generalisation of the projection operator.
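A standard example that is not a projection is the proximity operator of $\lambda \|\cdot \|_{1}$ , which is the component-wise soft-thresholding map used by shrinkage-thresholding algorithms. The Python sketch below implements it and compares it, in one dimension, against a brute-force grid minimization of $\lambda |y|+{\frac {1}{2}}(x-y)^{2}$ . The function names, the grid bounds and the grid resolution are assumptions made for illustration.

```python
def soft_threshold(x, lam):
    """prox of lam * ||.||_1 at x: shrink each component toward zero by lam."""
    out = []
    for v in x:
        if v > lam:
            out.append(v - lam)
        elif v < -lam:
            out.append(v + lam)
        else:
            out.append(0.0)
    return out

def brute_force_prox_1d(x, lam, lo=-5.0, hi=5.0, steps=20001):
    """Minimize lam*|y| + 0.5*(x - y)^2 over a uniform grid (sanity check only)."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda y: lam * abs(y) + 0.5 * (x - y) ** 2)

x = [3.0, -0.5, -2.0]
print(soft_threshold(x, 1.0))                        # [2.0, 0.0, -1.0]
print([brute_force_prox_1d(v, 1.0) for v in x])      # close to the line above
```

The brute-force result agrees with the closed form only up to the grid spacing, which is why it serves as a check and not as a practical method.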

The proximity operator of $f$ is characterized by the inclusion

p=\operatorname {prox}_{f}(x)\Leftrightarrow x-p\in \partial f(p)\qquad (\forall (x,p)\in {\mathbb {R}}^{N}\times {\mathbb {R}}^{N})

If $f$ is differentiable, then the above inclusion reduces to the equation

p=\operatorname{prox}_f(x) \Leftrightarrow x-p = \nabla f(p) \quad (\forall(x,p) \in \mathbb{R}^N \times \mathbb{R}^N)
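For a differentiable function this characterization can be verified directly. The sketch below assumes $f(y)={\frac {\lambda }{2}}\|y\|_{2}^{2}$ with $\lambda >0$ , for which solving $x-p=\lambda p$ gives $\operatorname {prox}_{f}(x)=x/(1+\lambda )$ . The choice of $f$ and the function names are illustrative assumptions.

```python
def prox_squared_norm(x, lam):
    """prox of f(y) = (lam / 2) * ||y||_2^2, obtained by solving x - p = lam * p."""
    return [v / (1.0 + lam) for v in x]

def grad_squared_norm(p, lam):
    """Gradient of f(y) = (lam / 2) * ||y||_2^2 evaluated at p."""
    return [lam * v for v in p]

x = [2.0, -4.0]
lam = 3.0
p = prox_squared_norm(x, lam)
residual = [xi - pi for xi, pi in zip(x, p)]   # left-hand side x - p
gradient = grad_squared_norm(p, lam)           # right-hand side, gradient of f at p

print(p)          # [0.5, -1.0]
print(residual)   # [1.5, -3.0]
print(gradient)   # [1.5, -3.0]
```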

Examples

Special instances of proximal gradient methods include

Projected Landweber
Alternating projection
Alternating-direction method of multipliers
Fast Iterative Shrinkage Thresholding Algorithm (FISTA)^[2]
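To make the splitting idea concrete, the following Python sketch applies the basic (non-accelerated) iterative shrinkage-thresholding iteration $x_{k+1}=\operatorname {prox}_{\gamma \lambda \|\cdot \|_{1}}(x_{k}-\gamma A^{\mathrm {T} }(Ax_{k}-b))$ to the problem of minimizing ${\frac {1}{2}}\|Ax-b\|_{2}^{2}+\lambda \|x\|_{1}$ . This problem instance, the data, and the step size $\gamma =1/\|A\|_{F}^{2}$ (a conservative choice, since the squared Frobenius norm bounds the Lipschitz constant of the gradient of the smooth term) are assumptions made for illustration. It is not the accelerated FISTA scheme, and all function names are hypothetical.

```python
def soft_threshold(x, t):
    """prox of t * ||.||_1: component-wise shrinkage by t."""
    return [v - t if v > t else v + t if v < -t else 0.0 for v in x]

def mat_vec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def mat_t_vec(A, r):
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def lasso_objective(A, b, lam, x):
    residual = [p - q for p, q in zip(mat_vec(A, x), b)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(v) for v in x)

def ista(A, b, lam, iterations=500):
    """Proximal gradient iteration for 0.5*||Ax - b||^2 + lam*||x||_1."""
    # Step size 1 / ||A||_F^2 never exceeds 1 / L, where L = ||A||_2^2.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * len(A[0])
    for _ in range(iterations):
        residual = [p - q for p, q in zip(mat_vec(A, x), b)]
        gradient = mat_t_vec(A, residual)                          # gradient of the smooth term
        forward = [xi - step * g for xi, g in zip(x, gradient)]    # gradient step
        x = soft_threshold(forward, step * lam)                    # proximal step
    return x

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 3.0]
x = ista(A, b, lam=0.5)
print(x, lasso_objective(A, b, 0.5, x))
```

The smooth term is handled through its gradient and the non-smooth term through its proximity operator, which is the splitting described in the introduction. When $A$ is the identity, the iteration converges to the soft-thresholding of $b$ , which gives a quick way to check an implementation.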

References

Rockafellar, R. T. (1970). Convex analysis. Princeton: Princeton University Press.
Combettes, Patrick L.; Pesquet, Jean-Christophe (2011). "Proximal Splitting Methods in Signal Processing". Fixed-Point Algorithms for Inverse Problems in Science and Engineering. Springer Optimization and Its Applications. 49. pp. 185–212.

Notes

[1] Combettes, Patrick L.; Pesquet, Jean-Christophe (2009). "Proximal Splitting Methods in Signal Processing". arXiv:0912.3522.
[2] Beck, A.; Teboulle, M. (2009). "A fast iterative shrinkage-thresholding algorithm for linear inverse problems". SIAM Journal on Imaging Sciences. 2. pp. 183–202.

External links

Stephen Boyd and Lieven Vandenberghe Book, Convex optimization
EE364a: Convex Optimization I and EE364b: Convex Optimization II, Stanford course homepages
EE227A: Lieven Vandenberghe Notes Lecture 18

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.