# Various ways of writing Nesterov's acceleration

Posted on June 4, 2020
Nestorov’s acceleration is usually presented in the following form (AGM1)

where is the gradient Lipschitz parameter of .

The update is known to be notoriously opaque. There are alternatives ways of rewriting the first-order scheme, however, two of which we will cover here.

### 1.1 Acceleration from primal dual perspective

We want to rewrite AGM1 as the following scheme (called AGM2)

where is -smooth, and . This is a linear coupling of a small step size update () and a large step size update ().

##### Derivation

Notice that the definition for is already the same which leaves us with verifying the update for and . Let’s start by assuming that the correct update for exists

This leaves us with checking whether this leads to the right definition of . That is, we need to show that according to the AGM2 update. First we make a guess about the relationship between parameters,

Using this after isolating in eq. 1 gives us,

Expanding

Using definition of in AGM1,

We can cancel terms by subtracting it

We can further develop by using the update for ,

What is ? If it is then we are done.

So we are weirdly enough off by one since it should be …

Source: potential function paper, p. 468.

### 1.2 As momentum

NAG can be written as a momentum scheme where the gradient is queried at “a future point”.

##### Derivation

We start with AGM1

Now, naturally define the momentum as and the parameter for convenience. This lets us write the above as,

This definition can be used in the update for

Finally using this form of in the definition for we obtain the desired momentum update

Simply writing using the definition of completes the scheme

Source: acceleration as momentum (appendix).