Proximal gradient method

Proximal gradient methods are a generalized form of projection used to solve non-differentiable convex optimization problems. Many interesting problems can be formulated as convex optimization problems of the form


\operatorname{minimize}_{x \in \mathbb{R}^N} \qquad f_1(x) +f_2(x) + \cdots+ f_{n-1}(x) +f_n(x)

where f_1, f_2, \dots, f_n are convex functions from \mathbb{R}^N to \mathbb{R}, some of which may be non-differentiable. This rules out conventional smooth optimization techniques such as the steepest descent method and the conjugate gradient method, but there is a specific class of algorithms that can solve such problems. These methods proceed by splitting, in that the functions f_1, \dots, f_n are used individually so as to yield an easily implementable algorithm. They are called proximal because each non-smooth function among f_1, \dots, f_n is involved via its proximity operator. The iterative shrinkage-thresholding algorithm, projected Landweber, projected gradient, alternating projections, the alternating-direction method of multipliers, and the alternating split Bregman method are special instances of proximal algorithms. Details of proximal methods are discussed in Combettes and Pesquet.[1] For the theory of proximal gradient methods from the perspective of and with applications to statistical learning theory, see proximal gradient methods for learning.
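
To make the splitting idea concrete, consider the common case n = 2 with a smooth f_1 and f_2 = \lambda \| \cdot \|_1. The proximal gradient iteration then specializes to the iterative shrinkage-thresholding algorithm (ISTA),[2] whose proximity step is elementwise soft-thresholding (derived in the Definition section below). The following Python sketch is a minimal illustration, assuming a least-squares data term f_1(x) = \frac{1}{2} \| Ax - b \|_2^2; the function names and problem setup are chosen for the example, not taken from any particular library.

    import numpy as np

    def soft_threshold(v, t):
        # Proximity operator of t * ||.||_1, applied elementwise.
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def ista(A, b, lam, n_iter=500):
        # Minimize (1/2) * ||A x - b||_2^2 + lam * ||x||_1 by alternating a
        # gradient step on the smooth term with the proximity step on the
        # non-smooth term.
        step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L, L = Lipschitz constant of the gradient
        x = np.zeros(A.shape[1])
        for _ in range(n_iter):
            grad = A.T @ (A @ x - b)             # gradient of the smooth term f_1
            x = soft_threshold(x - step * grad, step * lam)
        return x

The fast variant FISTA adds a momentum step to the same iteration.[2]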

Notations and terminology

Let \mathbb{R}^N, the N-dimensional Euclidean space, be the domain of the function f: \mathbb{R}^N \rightarrow [-\infty,+\infty], and let C be a non-empty convex subset of \mathbb{R}^N. Then the indicator function of C is defined as

 i_C : x \mapsto
\begin{cases}
0        &  \text{if } x \in C \\
+ \infty &  \text{if } x \notin C 
\end{cases}

The p-norm \| \cdot \|_p is defined as

\|x\|_p = ( |x_1|^p + |x_2|^p + \cdots + |x_N|^p )^{1/p}

The distance from x \in \mathbb{R}^N to C is defined as


D_C(x) =  \min_{y \in C} \|x - y\|_2

If C is closed and convex, the projection of x \in \mathbb{R}^N onto C is the unique point P_Cx \in C such that D_C(x) = \| x - P_Cx \|_2 .
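
For example, if C is the Euclidean ball of radius r centered at the origin, the projection has the closed form

 P_C(x) =
\begin{cases}
x                 & \text{if } \|x\|_2 \leq r \\
r \, x / \|x\|_2  & \text{if } \|x\|_2 > r
\end{cases}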

The subdifferential of f is given by


 \partial f : \mathbb{R}^N \rightarrow 2^{\mathbb{R}^N} : x \mapsto \{ u \in \mathbb{R}^N \mid \forall y \in \mathbb{R}^N,\ (y-x)^\mathrm{T}u+f(x) \leq f(y) \}
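
A simple one-dimensional example, used again below: for the absolute value function f = |\cdot| on \mathbb{R},

 \partial |x| =
\begin{cases}
\{-1\}  & \text{if } x < 0 \\
[-1,1]  & \text{if } x = 0 \\
\{+1\}  & \text{if } x > 0
\end{cases}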

Projection onto convex sets (POCS)

One of the widely used convex optimization algorithms is POCS (projection onto convex sets). This algorithm is employed to recover or synthesize a signal that simultaneously satisfies several convex constraints. Let f_i be the indicator function of the non-empty closed convex set C_i modeling a constraint. This reduces to the convex feasibility problem, which requires us to find a solution lying in the intersection of all the convex sets C_i. In the POCS method each set C_i is incorporated via its projection operator P_{C_i}, so in each iteration x is updated as


x_{k+1} = P_{C_1}  P_{C_2} \cdots P_{C_n} x_k
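
As a minimal sketch of this update, the loop below alternates projections onto two sets whose projections have simple closed forms: the hyperplane \{ x \mid a^\mathrm{T} x = b \} and the non-negative orthant. The sets and function names are chosen purely for illustration.

    import numpy as np

    def project_hyperplane(x, a, b):
        # Projection onto the hyperplane {x : a^T x = b}.
        return x - ((a @ x - b) / (a @ a)) * a

    def project_orthant(x):
        # Projection onto the non-negative orthant {x : x >= 0}.
        return np.maximum(x, 0.0)

    def pocs(a, b, n_iter=100):
        # Repeated application of the composed projections, as in the
        # update x_{k+1} = P_{C_1} P_{C_2} x_k above.
        x = np.zeros_like(a)
        for _ in range(n_iter):
            x = project_hyperplane(project_orthant(x), a, b)
        return x   # approximately in the intersection, if it is non-empty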

However, beyond such feasibility problems, projection operators are no longer appropriate, and more general operators are required. Among the various generalizations of the notion of a convex projection operator that exist, proximity operators are best suited for this purpose.

Definition

The proximity operator of a function f is defined as follows. For every x \in \mathbb{R}^N, the minimization problem


\operatorname{minimize}_{y \in \mathbb{R}^N} \qquad f(y) + \frac{1}{2} \left\| x-y \right\|_2^2

admits a unique solution, which is denoted by \operatorname{prox}_f(x). This defines the proximity operator as a mapping


 \operatorname{prox}_f : \mathbb{R}^N \rightarrow \mathbb{R}^N

In particular, when f = i_C is the indicator function of a non-empty closed convex set C, \operatorname{prox}_{i_C} = P_C; proximity operators thus generalize projections.

The proximity operator of f is characterized by the inclusion


p=\operatorname{prox}_f(x) \Leftrightarrow x-p \in \partial f(p) \qquad (\forall(x,p) \in \mathbb{R}^N \times \mathbb{R}^N)

If f is differentiable, then the subdifferential reduces to the gradient and the above equation becomes


 p=\operatorname{prox}_f(x) \Leftrightarrow x-p = \nabla f(p) \quad (\forall(x,p) \in \mathbb{R}^N \times \mathbb{R}^N)
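
As a worked example of this characterization, take f = \lambda |\cdot| on \mathbb{R} with \lambda > 0. Substituting the subdifferential of |\cdot| computed above into the inclusion x - p \in \lambda \, \partial |p| and solving each case yields the soft-thresholding operator used by ISTA:

 \operatorname{prox}_{\lambda |\cdot|}(x) =
\begin{cases}
x - \lambda & \text{if } x > \lambda \\
0           & \text{if } |x| \leq \lambda \\
x + \lambda & \text{if } x < -\lambda
\end{cases}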

Examples

Special instances of proximal gradient methods include the projected Landweber algorithm, the projected gradient method, alternating projections, the alternating-direction method of multipliers (ADMM), the alternating split Bregman method, and the fast iterative shrinkage-thresholding algorithm (FISTA).[2]

See also

  * Proximal gradient methods for learning

References

Notes

  1. Combettes, Patrick L.; Pesquet, Jean-Christophe (2009). "Proximal Splitting Methods in Signal Processing". pp. 11–23.
  2. Beck, A.; Teboulle, M. (2009). "A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems". SIAM Journal on Imaging Sciences. 2 (1): 183–202.
