# Various ways of writing Nesterov's acceleration

Nestorov’s acceleration is usually presented in the following form (AGM1)

where is the gradient Lipschitz parameter of .

The update is known to be notoriously opaque. There are alternatives ways of rewriting the first-order scheme, however, two of which we will cover here.

### 1.1 Acceleration from primal dual perspective

We want to rewrite AGM1 as the following scheme (called AGM2)

where is -smooth, and . This is a linear coupling of a small step size update () and a large step size update ().

##### Derivation

Notice that the definition for is already the same which leaves us with verifying the update for and . Let’s start by assuming that the correct update for exists

This leaves us with checking whether this leads to the right definition of . That is, we need to show that according to the AGM2 update. First we make a guess about the relationship between parameters,

Using this after isolating in eq. 1 gives us,

Expanding

Using definition of in AGM1,

We can cancel terms by subtracting it

We can further develop by using the update for ,

What is ? If it is then we are done.

So we are weirdly enough off by one since it should be

Source: potential function paper, p. 468.

### 1.2 As momentum

NAG can be written as a momentum scheme where the gradient is queried at “a future point”.