# Acceleration convergence using a potential function

We cover the analysis of acceleration from this paper. The main motivation is to fill in some of the details that has been left out most likely for clarity.

### 1.1 Rewriting in terms of Mirror descent

We can write NAG:

as

where is -smooth, and (See e.g. this paper). This is the scheme we will be analysing.

### 1.2 Convergence

the general strategy is using potential function goes like this:

1. Define a potential function

1. Find bound for .
2. Get convergence by telescoping the difference,

##### Acceleration

This suggests that we will apply primal analysis to and dual to .

• We want to bound the difference which we can expand as

• Using (for euclidean norm) and the update for

we rewrite

Interpretation: we want to be small for to improve the most.

• -smooth and the update rule for gives us

Interpretation: we want to be large for to improve the most.

This lets us bound

By picking the slow step size to be such that the last term disappears this reduces to

• By convexity

• Using that the update is we choose the step size such that the terms cancels

The RHS of the inner product cancels,

• Now we have a bound for . We directly get from eq. 1

So to be less than we have to run for at least iterations. This is a quadratic speed up over vanilla gradient descent on smooth functions which requires steps.

The crucial observation is that we bound in terms of the norm and in terms of the gradient. This technique is further exploited in Mirr+Grad paper for general Mirror maps in the aggressive update, .