Nestorov’s acceleration is usually presented in the following form (AGM1)
where is the gradient Lipschitz parameter of .
The update is known to be notoriously opaque. There are alternatives ways of rewriting the first-order scheme, however, two of which we will cover here.
We want to rewrite AGM1 as the following scheme (called AGM2)
where is -smooth, and . This is a linear coupling of a small step size update () and a large step size update ().
Notice that the definition for is already the same which leaves us with verifying the update for and . Let’s start by assuming that the correct update for exists
This leaves us with checking whether this leads to the right definition of . That is, we need to show that according to the AGM2 update. First we make a guess about the relationship between parameters,
Using this after isolating in eq. 1 gives us,
Using definition of in AGM1,
We can cancel terms by subtracting it
We can further develop by using the update for ,
What is ? If it is then we are done.
So we are weirdly enough off by one since it should be …
NAG can be written as a momentum scheme where the gradient is queried at “a future point”.
We start with AGM1
Now, naturally define the momentum as and the parameter for convenience. This lets us write the above as,
This definition can be used in the update for
Finally using this form of in the definition for we obtain the desired momentum update
Simply writing using the definition of completes the scheme
Source: acceleration as momentum (appendix).