Disclaimer: we will require some familiarity with convex optimization.
The player is given a fixed compact convex decision set $\mathcal{X} \subseteq \mathbb{R}^d$. At each round $t = 1, \dots, T$:

1. The player picks a decision $x_t \in \mathcal{X}$.
2. The environment picks a convex loss function $f_t : \mathcal{X} \to \mathbb{R}$.
3. The player suffers the loss $f_t(x_t)$ and observes $f_t$.
We want to have some notion of whether we are doing well. In the above setting, simply using the cumulative loss $\sum_{t=1}^T f_t(x_t)$ we could never do well, since the environment can pick a new adversarial function every time. Instead we compare ourselves with the best fixed strategy $x^\star \in \arg\min_{x \in \mathcal{X}} \sum_{t=1}^T f_t(x)$:
Regret is defined as

$$R_T = \sum_{t=1}^T f_t(x_t) - \min_{x \in \mathcal{X}} \sum_{t=1}^T f_t(x).$$

This is called the regret as it captures how badly we did in hindsight. Ideally we want $R_T = o(T)$, since then $R_T / T \to 0$.
To get a feel for the problem, let's first try a naive approach. A straightforward idea is what goes by the name of Follow The Leader (FTL) in the machine learning community and fictitious play in economics. We pick

$$x_t = \arg\min_{x \in \mathcal{X}} \sum_{s=1}^{t-1} f_s(x).$$
However, there is a simple adversarial strategy against this algorithm for the environment. Consider $\mathcal{X} = [-1, 1]$ and linear losses $f_t(x) = z_t x$; the environment picks

$$z_1 = \tfrac{1}{2}, \qquad z_t = \begin{cases} -1 & t \text{ even} \\ +1 & t \text{ odd},\ t > 1. \end{cases}$$

So when minimizing $\sum_{s=1}^{t-1} f_s(x) = \big(\sum_{s=1}^{t-1} z_s\big) x$ at every $t$ we have

$$\sum_{s=1}^{t-1} z_s = \begin{cases} +\tfrac{1}{2} & t \text{ even} \\ -\tfrac{1}{2} & t \text{ odd},\ t > 1, \end{cases}$$

so FTL alternates between $x_t = -1$ ($t$ even) and $x_t = +1$ ($t$ odd). We will end up always picking the $x_t$ that leads to the maximum loss, suffering $f_t(x_t) = 1$ every round, while the fixed strategy $x = 0$ suffers zero loss. So $R_T = \Omega(T)$: we achieve the worst possible, linear, growth of the regret. Can we do better?
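A quick simulation makes the failure concrete. The sketch below assumes the standard construction above ($\mathcal{X} = [-1,1]$, $f_t(x) = z_t x$ with $z_1 = \frac{1}{2}$ and alternating $\pm 1$ afterwards) and breaks the tie at $t = 1$ by playing $0$:

```python
import numpy as np

def ftl_adversarial(T):
    """Simulate FTL on X = [-1, 1] with linear losses f_t(x) = z_t * x,
    where z_1 = 1/2 and z_t alternates: -1 for even t, +1 for odd t."""
    z = np.array([0.5] + [(-1.0 if t % 2 == 0 else 1.0) for t in range(2, T + 1)])
    cum = 0.0       # running sum of z_s for s < t
    ftl_loss = 0.0
    for t in range(T):
        # argmin over [-1, 1] of cum * x is -sign(cum); ties resolve to 0
        x_t = -np.sign(cum)
        ftl_loss += z[t] * x_t
        cum += z[t]
    # losses are linear, so the best fixed point is an endpoint (or 0)
    best_fixed = min(cum * x for x in (-1.0, 0.0, 1.0))
    return ftl_loss - best_fixed

print(ftl_adversarial(1000))  # → 999.5: regret grows linearly in T
```

After round 1, FTL loses exactly 1 per round while the best fixed strategy loses at most $\frac{1}{2}$ in total, so the regret is essentially $T$.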
Let’s say that we could cheat and play $x_{t+1}$ at time $t$ — an algorithm called Be The Leader (BTL) — thus avoiding the adversary. For notational convenience we still keep the same definition:

$$x_{t+1} = \arg\min_{x \in \mathcal{X}} \sum_{s=1}^{t} f_s(x).$$
However, now in the regret analysis we incur $f_t(x_{t+1})$ instead of $f_t(x_t)$. We can now redo the regret analysis:

$$R_T^{\text{BTL}} = \sum_{t=1}^T f_t(x_{t+1}) - \min_{x \in \mathcal{X}} \sum_{t=1}^T f_t(x) \le 0.$$

This follows by induction on $T$: assuming $\sum_{t=1}^{T-1} f_t(x_{t+1}) \le \min_{x} \sum_{t=1}^{T-1} f_t(x) \le \sum_{t=1}^{T-1} f_t(x_{T+1})$, adding $f_T(x_{T+1})$ to both sides gives $\sum_{t=1}^{T} f_t(x_{t+1}) \le \sum_{t=1}^{T} f_t(x_{T+1}) = \min_{x} \sum_{t=1}^T f_t(x)$.

Not surprisingly, BTL's regret is non-positive. But it provides an interesting insight:

$$R_T^{\text{FTL}} \le \sum_{t=1}^T \big( f_t(x_t) - f_t(x_{t+1}) \big).$$
The above is just a simplification of

$$\sum_{t=1}^T f_t(x_t) - \min_{x \in \mathcal{X}} \sum_{t=1}^T f_t(x) \le \sum_{t=1}^T f_t(x_t) - \sum_{t=1}^T f_t(x_{t+1}),$$

which is immediately obtained by application of the BTL bound. So $R_T^{\text{FTL}}$ is no worse than the difference between $\sum_t f_t(x_t)$ and $\sum_t f_t(x_{t+1})$. Intuitively, a solution to our problem with FTL is then to make sure the strategy does not change too much between rounds. In other words: we want to regularize / stabilize!
We modify our selection to

$$x_t = \arg\min_{x \in \mathcal{X}} \sum_{s=1}^{t-1} f_s(x) + \frac{1}{\eta} R(x),$$

where $\eta > 0$ is the learning rate and $R$ is a regularizer. This algorithm is known as Follow The Regularized Leader (FTRL).
The regularizer is there to ensure that our choice does not change too much between iterations. To capture this we will rely on $R$ being 1-strongly convex. That is,

$$R(y) \ge R(x) + \langle \nabla R(x), y - x \rangle + \frac{1}{2} \| y - x \|^2 \qquad \forall x, y \in \mathcal{X},$$

for some norm $\|\cdot\|$.
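As a concrete example (my choice, not fixed by the notes), the squared Euclidean norm $R(x) = \frac{1}{2}\|x\|_2^2$ is 1-strongly convex with respect to $\|\cdot\|_2$; for this quadratic $R$ the inequality in fact holds with equality. A quick numeric check:

```python
import numpy as np

rng = np.random.default_rng(0)

def R(x):
    return 0.5 * np.dot(x, x)  # R(x) = ||x||^2 / 2

def grad_R(x):
    return x

# Check R(y) >= R(x) + <grad R(x), y - x> + (1/2) ||y - x||^2 at random points.
for _ in range(100):
    x, y = rng.normal(size=3), rng.normal(size=3)
    lhs = R(y)
    rhs = R(x) + grad_R(x) @ (y - x) + 0.5 * np.dot(y - x, y - x)
    assert lhs >= rhs - 1e-12  # equality up to floating point for quadratic R
```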
This will allow us to prove the following notion of stability.
Stability says that

$$\| x_t - x_{t+1} \| \le \eta \, \| \nabla f_t(x_t) \|_*,$$

where $\|\cdot\|_*$ is the dual norm of $\|\cdot\|$, defined as $\| g \|_* = \sup_{\|x\| \le 1} \langle g, x \rangle$.
Let’s define the function we minimize in the decision selection,

$$\Phi_t(x) = \sum_{s=1}^{t-1} f_s(x) + \frac{1}{\eta} R(x).$$

By strong convexity of $R$ and convexity of the $f_s$ we have that $\Phi_t$ is $\frac{1}{\eta}$-strongly convex:

$$\Phi_t(y) \ge \Phi_t(x) + \langle \nabla \Phi_t(x), y - x \rangle + \frac{1}{2\eta} \| y - x \|^2.$$

This will ultimately let us bound $\| x_t - x_{t+1} \|$.
FTRL defines $x_t = \arg\min_{x \in \mathcal{X}} \Phi_t(x)$, so $x_t$ is the optimum of $\Phi_t$, and first-order optimality applies (the first-order characterization of convexity says $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$; when $x$ is a minimum over the set we have $\langle \nabla f(x), y - x \rangle \ge 0$ for all feasible $y$). So, taking $x = x_t$ and dropping the non-negative gradient term, a weaker version of the bound above is

$$\Phi_t(y) \ge \Phi_t(x_t) + \frac{1}{2\eta} \| y - x_t \|^2.$$
$$\Phi_t(x_{t+1}) \ge \Phi_t(x_t) + \frac{1}{2\eta} \| x_{t+1} - x_t \|^2$$
$$\Phi_{t+1}(x_t) \ge \Phi_{t+1}(x_{t+1}) + \frac{1}{2\eta} \| x_t - x_{t+1} \|^2$$

The first line takes $y = x_{t+1}$; the second line simply applies the same reasoning to $\Phi_{t+1}$ with $y = x_t$. We can now sum the two lines and expand the definition $\Phi_{t+1}(x) = \Phi_t(x) + f_t(x)$. All terms cancel out except the terms in $f_t$, and so we get

$$f_t(x_t) - f_t(x_{t+1}) \ge \frac{1}{\eta} \| x_t - x_{t+1} \|^2.$$
Now we have the tools to bound the change $\| x_t - x_{t+1} \|$. By convexity of $f_t$ and Hölder’s inequality:

$$f_t(x_t) - f_t(x_{t+1}) \le \langle \nabla f_t(x_t), x_t - x_{t+1} \rangle \le \| \nabla f_t(x_t) \|_* \, \| x_t - x_{t+1} \|.$$

Combining the two inequalities and solving for $\| x_t - x_{t+1} \|$ we get

$$\| x_t - x_{t+1} \| \le \eta \, \| \nabla f_t(x_t) \|_*.$$
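For intuition, here is a small sanity check of stability under assumptions of my choosing: linear losses $f_t(x) = \langle g_t, x \rangle$, the quadratic regularizer $R(x) = \frac{1}{2}\|x\|_2^2$, and the unconstrained domain $\mathcal{X} = \mathbb{R}^d$, where FTRL has the closed form $x_t = -\eta \sum_{s<t} g_s$ and stability holds with equality:

```python
import numpy as np

rng = np.random.default_rng(1)
eta = 0.1
T, d = 50, 4

# FTRL with R(x) = ||x||^2 / 2 over X = R^d and linear losses f_t(x) = <g_t, x>:
# x_t = argmin_x <sum_{s<t} g_s, x> + ||x||^2 / (2 eta) = -eta * sum_{s<t} g_s.
g = rng.normal(size=(T, d))
grad_sum = np.zeros(d)
for t in range(T):
    x_t = -eta * grad_sum
    x_next = -eta * (grad_sum + g[t])
    # stability: ||x_t - x_{t+1}|| <= eta * ||g_t||  (equality in this setting)
    assert np.linalg.norm(x_t - x_next) <= eta * np.linalg.norm(g[t]) + 1e-12
    grad_sum += g[t]
```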
Equipped with stability and the BTL lemma we almost directly obtain the regret bound for FTRL.
With learning rate $\eta$ we have

$$R_T \le \eta \sum_{t=1}^T \| \nabla f_t(x_t) \|_*^2 + \frac{R(x^\star) - R(x_1)}{\eta}.$$
By defining $f_0(x) = \frac{1}{\eta} R(x)$ we can write the regularized selection as

$$x_t = \arg\min_{x \in \mathcal{X}} \sum_{s=0}^{t-1} f_s(x).$$
First note that by the BTL lemma applied to the sequence $f_0, f_1, \dots, f_T$ we have

$$\sum_{t=0}^{T} f_t(x_{t+1}) \le \min_{x \in \mathcal{X}} \sum_{t=0}^{T} f_t(x) \le \sum_{t=0}^{T} f_t(x^\star).$$

Note that we additionally use $x_1 = \arg\min_x f_0(x)$, since the BTL lemma as stated only applies to $t \ge 1$.
Now we can bound the FTRL regret as follows. We want to use stability, so we need to move from $f_t(x_t)$ to $f_t(x_{t+1})$; we do this by getting the regret on a form on which we can apply the BTL lemma:

$$
\begin{aligned}
R_T &= \sum_{t=1}^T f_t(x_t) - \sum_{t=1}^T f_t(x^\star) \\
&\le \sum_{t=1}^T \big( f_t(x_t) - f_t(x_{t+1}) \big) + f_0(x^\star) - f_0(x_1) \\
&\le \sum_{t=1}^T \| \nabla f_t(x_t) \|_* \, \| x_t - x_{t+1} \| + \frac{R(x^\star) - R(x_1)}{\eta} \\
&\le \eta \sum_{t=1}^T \| \nabla f_t(x_t) \|_*^2 + \frac{R(x^\star) - R(x_1)}{\eta}.
\end{aligned}
$$
Further, if each $f_t$ is $G$-Lipschitz (so that the gradient is bounded, $\| \nabla f_t(x) \|_* \le G$) and the optimal learning rate

$$\eta = \sqrt{\frac{D}{G^2 T}}, \qquad D \ge R(x^\star) - R(x_1),$$

is chosen, then we have

$$R_T \le 2 G \sqrt{D T} = O(\sqrt{T}).$$

The optimal $\eta$ is found by simply optimizing the above regret bound, which is of the form $a \eta + b / \eta$ and is minimized at $\eta = \sqrt{b / a}$.
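Returning to the sequence that broke FTL, we can check the theorem numerically. This is a sketch under assumptions of my choosing: $\mathcal{X} = [-1,1]$, the quadratic regularizer $R(x) = x^2/2$ (so the constrained argmin is a clipped point), $G = 1$ and $D = \frac{1}{2}$:

```python
import numpy as np

def ftrl_adversarial(T):
    """FTRL with R(x) = x^2 / 2 on X = [-1, 1] against the same
    alternating linear losses f_t(x) = z_t * x that broke FTL."""
    z = np.array([0.5] + [(-1.0 if t % 2 == 0 else 1.0) for t in range(2, T + 1)])
    G, D = 1.0, 0.5                   # |z_t| <= 1 and R(x*) - R(x_1) <= 1/2
    eta = np.sqrt(D / (G ** 2 * T))   # the learning rate from the theorem
    cum, loss = 0.0, 0.0
    for t in range(T):
        # x_t = argmin_{x in [-1,1]} cum * x + x^2 / (2 eta) = clip(-eta * cum)
        x_t = np.clip(-eta * cum, -1.0, 1.0)
        loss += z[t] * x_t
        cum += z[t]
    best_fixed = min(cum * x for x in (-1.0, 0.0, 1.0))
    regret = loss - best_fixed
    assert regret <= 2 * G * np.sqrt(D * T)  # the O(sqrt(T)) guarantee
    return regret
```

Here the cumulative loss alternates between $\pm\frac{1}{2}$, so FTRL hovers near $0$ and suffers only $O(\eta)$ per round, in contrast to FTL's constant loss per round.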
At this point we have good regret bounds, but at every iteration we have to minimize a sum of convex functions. This simplifies significantly if we assume that the loss is linear, $f_t(x) = \langle g_t, x \rangle$, since we can push $x$ out of the summation:

$$\sum_{s=1}^{t-1} \langle g_s, x \rangle = \Big\langle \sum_{s=1}^{t-1} g_s, x \Big\rangle.$$

Now we only have to keep track of a sum of gradients, and the minimization problem reduces to the sum of a linear function and the strongly convex regularizer. So we can write the program as:
FTRL with linear loss:

$$x_t = \arg\min_{x \in \mathcal{X}} \Big\langle \sum_{s=1}^{t-1} g_s, x \Big\rangle + \frac{1}{\eta} R(x).$$
It turns out that we can assume linear loss without loss of generality! That is, we can get the same regret bound for convex loss functions even if we run the above algorithm using only the gradient $g_t = \nabla f_t(x_t)$ at each $x_t$. To see this, consider

$$f_t(x_t) - f_t(x^\star) \le \langle \nabla f_t(x_t), x_t - x^\star \rangle = \langle g_t, x_t \rangle - \langle g_t, x^\star \rangle,$$

which holds by convexity of $f_t$. So we can bound the regret for $f_t$ by bounding the regret for the new problem which only considers its gradient $g_t$. The bound stays as tight, since we already used convexity in our argument for stability.
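Putting everything together, here is a minimal sketch of linearized FTRL. The function names, the L2-ball domain, the quadratic regularizer, and the quadratic example losses are all illustrative choices, not prescribed by the notes:

```python
import numpy as np

def linearized_ftrl(loss_grads, T, d, eta, radius=1.0):
    """Run FTRL with R(x) = ||x||^2 / 2 over the L2 ball of given radius,
    feeding back only gradients g_t = grad f_t(x_t) (linearized losses).

    loss_grads: function (t, x_t) -> gradient of f_t at x_t.
    """
    grad_sum = np.zeros(d)
    xs = []
    for t in range(T):
        # argmin_{||x|| <= radius} <grad_sum, x> + ||x||^2 / (2 eta):
        # the unconstrained minimizer -eta * grad_sum, projected onto the ball
        x_t = -eta * grad_sum
        norm = np.linalg.norm(x_t)
        if norm > radius:
            x_t *= radius / norm
        xs.append(x_t)
        grad_sum += loss_grads(t, x_t)
    return xs

# Example: quadratic losses f_t(x) = ||x - y_t||^2 / 2, so g_t = x_t - y_t.
rng = np.random.default_rng(0)
ys = rng.normal(size=(200, 3)) * 0.1 + 0.3
xs = linearized_ftrl(lambda t, x: x - ys[t], T=200, d=3, eta=0.05)
```

Note that only the running gradient sum is stored between rounds, which is exactly the computational simplification the linearization buys us.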
These notes are primarily based on Prof. Luo's lecture notes [1], with minor modifications borrowed from Hazan's book [2] and the lecture notes of Prof. Orabona [3].
[1] H. Luo, “CSCI 699: Introduction to Online Learning.” [Online]. Available: https://haipeng-luo.net/courses/CSCI699/. [Accessed: 02-Nov-2019].
[2] E. Hazan, “Introduction to Online Convex Optimization,” Sep. 2019. [Online]. Available: http://arxiv.org/abs/1909.05207. [Accessed: 02-Nov-2019].
[3] F. Orabona, “Introduction to Online Learning,” Parameter-free Learning and Optimization Algorithms, 20-Sep-2019. [Online]. Available: https://parameterfree.com/lecture-notes-on-online-learning/. [Accessed: 02-Nov-2019].