# Various ways of writing Nesterov's acceleration

Nestorov’s acceleration is usually presented in the following form (AGM1) where is the gradient Lipschitz parameter of .

The update is known to be notoriously opaque. There are alternatives ways of rewriting the first-order scheme, however, two of which we will cover here.

### 1.1 Acceleration from primal dual perspective

We want to rewrite AGM1 as the following scheme (called AGM2) where is -smooth, and . This is a linear coupling of a small step size update ( ) and a large step size update ( ).

##### Derivation

Notice that the definition for is already the same which leaves us with verifying the update for and . Let’s start by assuming that the correct update for exists This leaves us with checking whether this leads to the right definition of . That is, we need to show that according to the AGM2 update. First we make a guess about the relationship between parameters, Using this after isolating in eq. 1 gives us, Expanding  Using definition of in AGM1, We can cancel terms by subtracting it We can further develop by using the update for , What is ? If it is then we are done. So we are weirdly enough off by one since it should be Source: potential function paper, p. 468.

### 1.2 As momentum

NAG can be written as a momentum scheme where the gradient is queried at “a future point”. ##### Derivation Now, naturally define the momentum as and the parameter for convenience. This lets us write the above as, This definition can be used in the update for  Finally using this form of in the definition for we obtain the desired momentum update Simply writing using the definition of completes the scheme Source: acceleration as momentum (appendix).