Proximal Gradient Methods


Convex optimization is a subfield of optimization that can produce reliable solutions and can be solved exactly. Many signal processing problems can be formulated as convex optimization problems of the form

\min_{x \in \mathbb{R}^N} \; f_1(x) + f_2(x) + \cdots + f_{n-1}(x) + f_n(x)

where f_1, f_2, \ldots, f_n are convex functions f_i : \mathbb{R}^N \rightarrow \mathbb{R}, some of which may be non-differentiable. This rules out conventional smooth optimization techniques such as the steepest descent method and the conjugate gradient method, but there is a specific class of algorithms that can solve the above optimization problem. These methods proceed by splitting, in that the functions f_1, \ldots, f_n are used individually so as to yield an easily implementable algorithm. They are called proximal because each nonsmooth function among f_1, \ldots, f_n is involved via its proximity operator. The iterative shrinkage-thresholding algorithm, projected Landweber, projected gradient, alternating projections, alternating-direction method of multipliers, and alternating split Bregman are special instances of proximal algorithms. Details of proximal methods are discussed in Combettes et al.[1] For the theory of proximal gradient methods from the perspective of and with applications to statistical learning theory, see proximal gradient methods for learning.
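As a concrete illustration, the iterative shrinkage-thresholding algorithm applies this splitting idea to a problem of the form f_1(x) + f_2(x), with f_1(x) = \frac{1}{2}\|Ax - b\|_2^2 smooth and f_2(x) = \lambda\|x\|_1 nonsmooth: a gradient step on f_1 is followed by the proximity operator of f_2 (soft thresholding). The following NumPy sketch is illustrative only; the matrix A, vector b, regularization weight, and iteration count are arbitrary placeholders, not part of the original article.

    import numpy as np

    def soft_threshold(v, t):
        # Proximity operator of t*||.||_1 (componentwise soft thresholding).
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def ista(A, b, lam, n_iter=200):
        # Minimize 0.5*||Ax - b||^2 + lam*||x||_1 by forward-backward splitting.
        step = 1.0 / np.linalg.norm(A, 2) ** 2                 # 1/L, with L the Lipschitz constant of the gradient
        x = np.zeros(A.shape[1])
        for _ in range(n_iter):
            grad = A.T @ (A @ x - b)                           # gradient (forward) step on the smooth part
            x = soft_threshold(x - step * grad, step * lam)    # proximal (backward) step on the l1 part
        return x

    # Toy data, chosen only for illustration.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 50))
    x_true = np.zeros(50)
    x_true[:3] = [2.0, -1.5, 1.0]
    b = A @ x_true
    print(ista(A, b, lam=0.1)[:5])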

Notations and Terminology

Let \mathbb{R}^N, the N-dimensional Euclidean space, be the domain of the function f : \mathbb{R}^N \rightarrow [-\infty, +\infty]. Suppose C is a non-empty convex subset of \mathbb{R}^N. Then the indicator function of C is defined as

i_C : x \mapsto \begin{cases} 0 & \text{if } x \in C \\ +\infty & \text{if } x \notin C \end{cases}

The p-norm \|\cdot\|_p is defined as

\|x\|_p = \left( |x_1|^p + |x_2|^p + \dots + |x_N|^p \right)^{1/p}
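For example, for x = (3, -4) \in \mathbb{R}^2,

\|x\|_1 = |3| + |-4| = 7, \qquad \|x\|_2 = \sqrt{3^2 + 4^2} = 5.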

The distance from x \in \mathbb{R}^N to C is defined as

D_C(x) = \min_{y \in C} \|x - y\|

If C is closed and convex, the projection of x \in \mathbb{R}^N onto C is the unique point P_C x \in C such that D_C(x) = \|x - P_C x\|_2.
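For instance, when C is the closed Euclidean ball of radius r centred at the origin, the projection has the closed form P_C x = x if \|x\|_2 \leq r and P_C x = r x / \|x\|_2 otherwise. A minimal NumPy sketch (the radius and test point are arbitrary choices):

    import numpy as np

    def project_ball(x, r=1.0):
        # Projection of x onto the closed Euclidean ball of radius r centred at 0.
        norm = np.linalg.norm(x)
        return x if norm <= r else (r / norm) * x

    x = np.array([3.0, 4.0])
    p = project_ball(x, r=1.0)
    print(p)                      # [0.6 0.8]
    print(np.linalg.norm(x - p))  # the distance D_C(x), here 4.0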

The subdifferential of f is given by

\partial f : \mathbb{R}^N \rightarrow 2^{\mathbb{R}^N} : x \mapsto \left\{ u \in \mathbb{R}^N \;\middle|\; \forall y \in \mathbb{R}^N,\ (y - x)^{\mathrm{T}} u + f(x) \leq f(y) \right\}
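For example, for the absolute-value function f(x) = |x| on \mathbb{R}, the subdifferential is

\partial f(x) = \begin{cases} \{-1\} & \text{if } x < 0 \\ [-1, 1] & \text{if } x = 0 \\ \{+1\} & \text{if } x > 0, \end{cases}

which reduces to the singleton \{f'(x)\} wherever f is differentiable.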

Projection onto Convex Sets (POCS)

One of the most widely used convex optimization algorithms is POCS (projection onto convex sets). This algorithm is employed to recover or synthesize a signal that simultaneously satisfies several convex constraints. Let f_i be the indicator function of a non-empty closed convex set C_i modeling a constraint. The problem then reduces to a convex feasibility problem, which requires us to find a solution lying in the intersection of all the convex sets C_i. In the POCS method each set C_i is incorporated via its projection operator P_{C_i}, so in each iteration x is updated as

x_{k+1} = P_{C_1} P_{C_2} \cdots P_{C_n} x_k
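As a minimal sketch of this iteration with two constraint sets (a box C_1 and a halfspace C_2, both arbitrary choices made only for illustration):

    import numpy as np

    def project_box(x, lo=-1.0, hi=1.0):
        # Projection onto the box C1 = {x : lo <= x_i <= hi}.
        return np.clip(x, lo, hi)

    def project_halfspace(x, a, beta):
        # Projection onto the halfspace C2 = {x : a^T x <= beta}.
        viol = a @ x - beta
        return x if viol <= 0 else x - (viol / (a @ a)) * a

    a, beta = np.array([1.0, 1.0]), 1.0
    x = np.array([5.0, -3.0])
    for _ in range(50):                                   # x_{k+1} = P_{C1} P_{C2} x_k
        x = project_box(project_halfspace(x, a, beta))
    print(x)  # a point in (or near) the intersection of C1 and C2, when it is non-empty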

However, beyond such feasibility problems projection operators are no longer appropriate, and more general operators are required. Among the various generalizations of the notion of a convex projection operator, proximity operators are best suited for this purpose.

Definition

The proximity operator of a function f at a point x is defined as follows.

For every x \in \mathbb{R}^N, the minimization problem

\min_{y \in \mathbb{R}^N} \; f(y) + \frac{1}{2} \left\| x - y \right\|_2^2

admits a unique solution, which is denoted by \operatorname{prox}_f(x). The proximity operator of f is thus the mapping

\operatorname{prox}_f : \mathbb{R}^N \rightarrow \mathbb{R}^N

The proximity operator of f is characterized by the inclusion

p = \operatorname{prox}_f(x) \quad \Leftrightarrow \quad x - p \in \partial f(p) \qquad (\forall (x, p) \in \mathbb{R}^N \times \mathbb{R}^N)

If f is differentiable, then the above inclusion reduces to the equation

p = \operatorname{prox}_f(x) \quad \Leftrightarrow \quad x - p = \nabla f(p) \qquad (\forall (x, p) \in \mathbb{R}^N \times \mathbb{R}^N)
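For example, the proximity operator of f = \lambda\|\cdot\|_1 is componentwise soft thresholding, and the proximity operator of the indicator function i_C is exactly the projection P_C. A small NumPy sketch checking the inclusion above for the \ell_1 case (\lambda and the test vector are arbitrary choices):

    import numpy as np

    def prox_l1(x, lam):
        # prox of lam*||.||_1: componentwise soft thresholding.
        return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

    lam = 0.5
    x = np.array([2.0, -0.3, 0.7])
    p = prox_l1(x, lam)
    print(p)              # [ 1.5 -0.   0.2]
    # Characterization: x - p must lie in lam * (subdifferential of ||.||_1 at p),
    # i.e. each component of (x - p)/lam lies in [-1, 1] and equals sign(p_i) where p_i != 0.
    print((x - p) / lam)  # [ 1.  -0.6  1. ]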

Examples

Special instances of proximal gradient methods include

  • Projected Landweber
  • Projected gradient method
  • Iterative shrinkage-thresholding algorithm (ISTA)
  • Alternating projections
  • Alternating-direction method of multipliers (ADMM)
  • Alternating split Bregman

See also

  • Proximal gradient methods for learning

References

  • Rockafellar, R. T. (1970). Convex analysis. Princeton: Princeton University Press. 
  • Combettes, Patrick L.; Pesquet, Jean-Christophe (2011). "Proximal Splitting Methods in Signal Processing". In Fixed-Point Algorithms for Inverse Problems in Science and Engineering. Springer Optimization and Its Applications, vol. 49. pp. 185–212.

Notes

  1. Combettes, Patrick L.; Pesquet, Jean-Christophe (2009). "Proximal Splitting Methods in Signal Processing". pp. 11–23.
  2. "Beck, A; Teboulle, M (2009). "A fast iterative shrinkage-thresholding algorithm for linear inverse problems". SIAM J. Imaging Science 2. pp. 183–202. 
