*Disclaimer: we will require some familiarity with convex optimization*.

The player is given a fixed compact convex decision set $\mathcal{X} \subseteq \mathbb{R}^d$. At each round $t = 1, \dots, T$:

- The player picks $x_t \in \mathcal{X}$.
- The environment picks a convex loss function $f_t : \mathcal{X} \to \mathbb{R}$.
- The player then gets access to $f_t$ and suffers loss $f_t(x_t)$.
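The protocol above is just a loop. Here is a minimal sketch, where `pick` (the player's strategy) and `environment` (the adversary) are hypothetical callables introduced only for illustration:

```python
from typing import Callable, List

# Online protocol sketch: the player moves first, the adversary answers,
# and only then is the loss function revealed to the player.
LossFn = Callable[[float], float]

def play(pick: Callable[[List[LossFn]], float],
         environment: Callable[[int], LossFn],
         T: int) -> float:
    total = 0.0
    history: List[LossFn] = []   # losses revealed so far
    for t in range(T):
        x_t = pick(history)      # player commits to x_t
        f_t = environment(t)     # adversary picks f_t afterwards
        total += f_t(x_t)        # player suffers f_t(x_t)
        history.append(f_t)      # f_t is revealed for future rounds
    return total

# e.g. a player that always plays 0 against the quadratic loss (x - 1)^2:
print(play(lambda h: 0.0, lambda t: (lambda x: (x - 1) ** 2), 3))  # 3.0
```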

We want to have some notion of whether we are doing well. In the above setting, simply using the cumulative loss $\sum_{t=1}^T f_t(x_t)$ we could never do well, since the environment can pick a new adversarial function every time. Instead we compare ourselves with the best fixed strategy $x^*$ in hindsight.

Regret is defined as

$$R_T = \sum_{t=1}^T f_t(x_t) - \min_{x \in \mathcal{X}} \sum_{t=1}^T f_t(x).$$

This is called the *regret* as it captures how badly we did in hindsight. Ideally we want $R_T = o(T)$, since then the average regret $R_T / T \to 0$ as $T \to \infty$.


To get a feel, let’s first try a naive approach. A straightforward idea is what goes by the name of Follow The Leader (FTL) in the machine learning community and *fictitious play* in economics. We pick

$$x_t = \arg\min_{x \in \mathcal{X}} \sum_{s=1}^{t-1} f_s(x).$$

However, there is a simple adversarial strategy against this algorithm for the environment. Consider $\mathcal{X} = [-1, 1]$; then the environment picks

$$f_t(x) = \begin{cases} \tfrac{1}{2} x & t = 1 \\ -x & t \text{ even} \\ x & t \text{ odd},\ t > 1. \end{cases}$$

So when minimizing the cumulative loss at every $t \ge 2$ we have

$$\sum_{s=1}^{t-1} f_s(x) = \begin{cases} \tfrac{1}{2} x & t \text{ even} \\ -\tfrac{1}{2} x & t \text{ odd}, \end{cases}$$

and FTL flip-flops between the endpoints: $x_t = -1$ for even $t$ and $x_t = 1$ for odd $t$. We end up always picking the $x_t$ that suffers loss $f_t(x_t) = 1$, while the fixed strategy $x = 0$ suffers $0$. So the regret grows linearly, $R_T = \Omega(T)$, matching the worst case. Can we do better?
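To see the linear regret concretely, here is a small simulation of FTL against this adversarial sequence. It is a sketch under the assumptions above: losses are $f_t(x) = z_t x$ with $z_1 = \tfrac12$ and alternating $\pm 1$ afterwards, and the round-1 tie is broken by playing $0$:

```python
# Simulate FTL on X = [-1, 1] against the alternating adversarial losses.
T = 1000
z = [0.5] + [1.0 if t % 2 == 1 else -1.0 for t in range(2, T + 1)]

def ftl_pick(cum):
    # argmin of cum * x over [-1, 1]: the endpoint opposite in sign to cum
    if cum > 0:
        return -1.0
    if cum < 0:
        return 1.0
    return 0.0  # tie at t = 1: every point is a minimizer

cum = 0.0      # cumulative coefficient sum_{s < t} z_s
player = 0.0   # player's cumulative loss
for z_t in z:
    player += z_t * ftl_pick(cum)
    cum += z_t

best_fixed = min(sum(z) * x for x in (-1.0, 0.0, 1.0))  # linear loss
regret = player - best_fixed
print(regret)  # grows linearly with the horizon T
```

Running this, the player suffers loss $1$ on every round after the first, so the regret is roughly $T$.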

Let’s say that we could cheat and play $x_{t+1}$ at time $t$, thus avoiding the adversary. This is called Be The Leader (BTL). For notational convenience we still keep the same definition:

$$x_{t+1} = \arg\min_{x \in \mathcal{X}} \sum_{s=1}^{t} f_s(x).$$

However, now in the regret analysis we incur $f_t(x_{t+1})$ instead of $f_t(x_t)$. We can now redo the regret analysis:

$$R_T^{\text{BTL}} = \sum_{t=1}^T f_t(x_{t+1}) - \min_{x \in \mathcal{X}} \sum_{t=1}^T f_t(x) \le 0.$$
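The key inequality behind this, $\sum_{t=1}^T f_t(x_{t+1}) \le \min_{x \in \mathcal{X}} \sum_{t=1}^T f_t(x)$, follows from a short induction over $T$ (standard in the cited references):

```latex
% Base case T = 1: f_1(x_2) = \min_x f_1(x) by the definition of x_2.
% Inductive step, assuming the claim for T - 1:
\begin{align*}
\sum_{t=1}^{T} f_t(x_{t+1})
  &\le \sum_{t=1}^{T-1} f_t(x_T) + f_T(x_{T+1})
     && \text{induction hypothesis} \\
  &\le \sum_{t=1}^{T-1} f_t(x_{T+1}) + f_T(x_{T+1})
     && x_T \text{ minimizes } \textstyle\sum_{t=1}^{T-1} f_t \\
  &= \sum_{t=1}^{T} f_t(x_{T+1})
   = \min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x).
\end{align*}
```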

Not surprisingly, BTL’s regret is non-positive. But it provides an interesting insight:

$$R_T^{\text{FTL}} \le \sum_{t=1}^T \left( f_t(x_t) - f_t(x_{t+1}) \right).$$

The above is just a simplification of

$$R_T^{\text{FTL}} = \sum_{t=1}^T f_t(x_t) - \min_{x \in \mathcal{X}} \sum_{t=1}^T f_t(x) = \sum_{t=1}^T \left( f_t(x_t) - f_t(x_{t+1}) \right) + R_T^{\text{BTL}},$$

which is immediately obtained by application of the BTL bound $R_T^{\text{BTL}} \le 0$.

So $R_T^{\text{FTL}}$ is no worse than the summed differences between $f_t(x_t)$ and $f_t(x_{t+1})$. Intuitively, a solution to our problem with FTL is then to make sure the strategy does not change too much between rounds. In other words: we want to regularize / stabilize!

We modify our selection to

$$x_t = \arg\min_{x \in \mathcal{X}} \; \eta \sum_{s=1}^{t-1} f_s(x) + R(x),$$

where $\eta > 0$ is the learning rate and $R$ is a regularizer. This algorithm is known as Follow The Regularized Leader (FTRL).

The regularizer is there to ensure that our choice does not change too much between iterations. To capture this we will rely on $R$ being 1-strongly convex. That is,

$$R(y) \ge R(x) + \langle \nabla R(x), y - x \rangle + \frac{1}{2} \lVert y - x \rVert^2$$

for some norm $\lVert \cdot \rVert$.
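For concreteness, the two standard examples (used throughout the cited references) are the quadratic and entropic regularizers. For the quadratic one, strong convexity holds with equality:

```latex
% R(x) = \tfrac{1}{2} \lVert x \rVert_2^2 is 1-strongly convex w.r.t. \lVert\cdot\rVert_2:
R(y) - R(x) - \langle \nabla R(x), y - x \rangle
  = \tfrac{1}{2} \lVert y \rVert_2^2 - \tfrac{1}{2} \lVert x \rVert_2^2 - \langle x,\, y - x \rangle
  = \tfrac{1}{2} \lVert y - x \rVert_2^2.
% The negative entropy R(x) = \sum_i x_i \log x_i is 1-strongly convex
% w.r.t. \lVert\cdot\rVert_1 on the probability simplex (by Pinsker's inequality).
```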

This will allow us to prove the following notion of *stability*.

Stability says that

$$\lVert x_t - x_{t+1} \rVert \le \eta \, \lVert \nabla f_t(x_t) \rVert_*,$$

where $\lVert \cdot \rVert_*$ is the dual norm of $\lVert \cdot \rVert$, defined as $\lVert g \rVert_* = \sup_{\lVert x \rVert \le 1} \langle g, x \rangle$.

Let’s define the function we minimize in the decision selection,

$$F_t(x) = \eta \sum_{s=1}^{t-1} f_s(x) + R(x), \qquad x_t = \arg\min_{x \in \mathcal{X}} F_t(x).$$

By strong convexity of $R$ and convexity of the $f_s$ we have that each $F_t$ is 1-strongly convex, i.e.

$$F_t(y) \ge F_t(x) + \langle \nabla F_t(x), y - x \rangle + \frac{1}{2} \lVert y - x \rVert^2.$$

This will ultimately let us bound $\lVert x_t - x_{t+1} \rVert$.

FTRL defines $x_t = \arg\min_{x \in \mathcal{X}} F_t(x)$, so $x_t$ is the optimum of $F_t$, and the first-order optimality condition gives $\langle \nabla F_t(x_t), y - x_t \rangle \ge 0$ for all $y \in \mathcal{X}$. (Recall that the **first-order characterization** of convexity says $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$.) So when $x = x_t$ is the minimum, the strong convexity inequality becomes

$$F_t(y) \ge F_t(x_t) + \frac{1}{2} \lVert y - x_t \rVert^2. \tag{1}$$

So a weaker bound obtained from eq. 1 is

$$\begin{aligned} F_t(x_{t+1}) &\ge F_t(x_t) + \frac{1}{2} \lVert x_{t+1} - x_t \rVert^2, \\ F_{t+1}(x_t) &\ge F_{t+1}(x_{t+1}) + \frac{1}{2} \lVert x_t - x_{t+1} \rVert^2. \end{aligned}$$

The second line simply applies the same reasoning to $F_{t+1}$. We can now sum the two lines and expand the definition $F_{t+1}(x) = F_t(x) + \eta f_t(x)$. All terms cancel out except the last term in $F_{t+1}$, and so we get

$$\lVert x_t - x_{t+1} \rVert^2 \le \eta \left( f_t(x_t) - f_t(x_{t+1}) \right).$$

Now we have the tools to bound the change $f_t(x_t) - f_t(x_{t+1})$. (Recall **Hölder’s inequality**: $\langle g, x \rangle \le \lVert g \rVert_* \lVert x \rVert$.) By convexity of $f_t$ and Hölder’s inequality,

$$f_t(x_t) - f_t(x_{t+1}) \le \langle \nabla f_t(x_t), x_t - x_{t+1} \rangle \le \lVert \nabla f_t(x_t) \rVert_* \lVert x_t - x_{t+1} \rVert.$$

Combining this with the previous inequality gives

$$\lVert x_t - x_{t+1} \rVert^2 \le \eta \lVert \nabla f_t(x_t) \rVert_* \lVert x_t - x_{t+1} \rVert.$$

Solving for $\lVert x_t - x_{t+1} \rVert$ we get

$$\lVert x_t - x_{t+1} \rVert \le \eta \lVert \nabla f_t(x_t) \rVert_*,$$

which is exactly the stability claim.
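As a quick sanity check, here is a numeric sketch of the stability bound. The setting is an assumption for illustration: $\mathcal{X} = [-1, 1]$, the quadratic regularizer $R(x) = x^2/2$ (1-strongly convex), and linear losses $f_t(x) = g_t x$, so the FTRL argmin of $\eta L x + x^2/2$ over $\mathcal{X}$ has the closed form $\mathrm{clip}(-\eta L)$:

```python
import random

# Check |x_t - x_{t+1}| <= eta * |g_t| on every round of FTRL with
# R(x) = x^2 / 2 on X = [-1, 1] and random linear losses.
random.seed(0)
eta = 0.1

def clip(v, lo=-1.0, hi=1.0):
    return max(lo, min(hi, v))

L = 0.0                 # running sum of gradients
x = clip(-eta * L)      # x_1 = argmin R
violations = 0
for _ in range(1000):
    g = random.uniform(-1.0, 1.0)
    L += g
    x_next = clip(-eta * L)
    if abs(x - x_next) > eta * abs(g) + 1e-12:
        violations += 1
    x = x_next
print(violations)  # 0: the stability bound holds on every round
```

Here stability is easy to see directly: `clip` is 1-Lipschitz, so consecutive decisions differ by at most $\eta |g_t|$.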

Equipped with stability and the BTL lemma we almost directly obtain the regret bound for FTRL.

With learning rate $\eta$ we have

$$R_T \le \frac{D}{\eta} + \eta \sum_{t=1}^T \lVert \nabla f_t(x_t) \rVert_*^2,$$

where $D = \max_{x \in \mathcal{X}} R(x) - \min_{x \in \mathcal{X}} R(x)$.

By defining $f_0(x) = \frac{1}{\eta} R(x)$ we can write the regularized selection as

$$x_t = \arg\min_{x \in \mathcal{X}} \sum_{s=0}^{t-1} f_s(x).$$

First note that by the BTL lemma, applied to the sequence $f_0, f_1, \dots, f_T$, we have

$$\sum_{t=1}^T f_t(x_{t+1}) - \sum_{t=1}^T f_t(x^*) \le f_0(x^*) - f_0(x_1) \le \frac{D}{\eta},$$

with

$$x^* = \arg\min_{x \in \mathcal{X}} \sum_{t=1}^T f_t(x).$$

(Note that we additionally use $\min_{x \in \mathcal{X}} \sum_{t=0}^T f_t(x) \le \sum_{t=0}^T f_t(x^*)$, since BTL only applies to the minimizer of the whole sequence including $f_0$.)

Now we can bound the FTRL regret as follows. (We want to use stability, so we need to move from $f_t(x_t)$ to $f_t(x_{t+1})$; we do this by getting the regret on a form on which we can apply the BTL lemma.)

$$\begin{aligned} R_T &= \sum_{t=1}^T f_t(x_t) - \sum_{t=1}^T f_t(x^*) \\ &= \sum_{t=1}^T \left( f_t(x_t) - f_t(x_{t+1}) \right) + \sum_{t=1}^T f_t(x_{t+1}) - \sum_{t=1}^T f_t(x^*) \\ &\le \sum_{t=1}^T \lVert \nabla f_t(x_t) \rVert_* \lVert x_t - x_{t+1} \rVert + \frac{D}{\eta} \\ &\le \eta \sum_{t=1}^T \lVert \nabla f_t(x_t) \rVert_*^2 + \frac{D}{\eta}, \end{aligned}$$

where the last two steps use convexity with Hölder’s inequality, the BTL bound, and stability.

Further, if each $f_t$ is $G$-Lipschitz (so that the gradient is bounded, $\lVert \nabla f_t(x) \rVert_* \le G$) and the optimal learning rate

$$\eta = \sqrt{\frac{D}{G^2 T}}$$

is chosen, then we have

$$R_T \le 2 G \sqrt{D T}.$$

(The optimal $\eta$ is found by simply optimizing the above regret bound, which is of the form $\frac{a}{\eta} + b \eta$: it is minimized at $\eta = \sqrt{a/b}$ with value $2\sqrt{ab}$, here with $a = D$ and $b = G^2 T$.)

At this point we have good regret bounds, but at every iteration we have to minimize a sum of convex functions. This simplifies significantly if we assume that the loss is linear, $f_t(x) = \langle g_t, x \rangle$, since we can push the sum inside the inner product:

$$\eta \sum_{s=1}^{t-1} \langle g_s, x \rangle + R(x) = \eta \left\langle \sum_{s=1}^{t-1} g_s, x \right\rangle + R(x).$$

Now we only have to keep track of a sum of gradients, and the minimization problem reduces to the sum of a linear function and the strongly convex regularizer. So we can write the program as:

**FTRL with linear loss**

$$x_t = \arg\min_{x \in \mathcal{X}} \; \eta \left\langle \sum_{s=1}^{t-1} g_s, x \right\rangle + R(x).$$
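A minimal sketch of this program in code, under assumptions chosen for illustration: $\mathcal{X} = [-1, 1]$, the quadratic regularizer $R(x) = x^2/2$ (so $D = 1/2$), random sign losses with $G = 1$, and the optimal learning rate. The measured regret must respect the $2G\sqrt{DT}$ bound from the theorem:

```python
import math
import random

# FTRL with linear losses on X = [-1, 1] and R(x) = x^2 / 2. The update
#   x_t = argmin_{x in X} eta * (sum_{s<t} g_s) * x + x^2 / 2
# has the closed form clip(-eta * sum_{s<t} g_s).
random.seed(1)
T, G = 10_000, 1.0
D = 0.5                             # max R - min R over [-1, 1]
eta = math.sqrt(D / (G ** 2 * T))   # the optimal rate from the bound

def clip(v, lo=-1.0, hi=1.0):
    return max(lo, min(hi, v))

grads = [random.choice([-1.0, 1.0]) for _ in range(T)]  # adversary's g_t
L, loss = 0.0, 0.0
for g in grads:
    x = clip(-eta * L)   # FTRL decision: only the gradient sum is needed
    loss += g * x
    L += g

best_fixed = min(sum(grads) * x for x in (-1.0, 1.0))   # linear => endpoint
regret = loss - best_fixed
print(regret <= 2 * G * math.sqrt(D * T))  # True: the theorem's bound holds
```

Note that the algorithm stores only the scalar gradient sum `L`, exactly the simplification the linear-loss assumption buys us.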

It turns out that we can assume linear loss without loss of generality! That is, we get the same regret bound for convex loss functions even if we run the above algorithm using only the gradient $g_t = \nabla f_t(x_t)$ at each round. To see this, define the linear surrogate $\tilde f_t(x) = \langle \nabla f_t(x_t), x \rangle$ and consider

$$f_t(x_t) - f_t(x^*) \le \langle \nabla f_t(x_t), x_t - x^* \rangle = \tilde f_t(x_t) - \tilde f_t(x^*),$$

where the inequality is just convexity of $f_t$.

So we can bound the regret for the $f_t$ by bounding the regret for the new problem with losses $\tilde f_t$, which only considers the gradients $\nabla f_t(x_t)$. The bound stays just as tight, since we already used convexity in the same way in our argument for *stability*.

*These notes are primarily based on Prof. Luo’s lecture notes [1], with minor modifications borrowed from Hazan’s book [2] and the lecture notes of Prof. Orabona [3]*.

[1] “CSCI 699: Introduction to Online Learning.” [Online]. Available: https://haipeng-luo.net/courses/CSCI699/. [Accessed: 02-Nov-2019]

[2] E. Hazan, “Introduction to Online Convex Optimization,” Sep. 2019 [Online]. Available: http://arxiv.org/abs/1909.05207. [Accessed: 02-Nov-2019]

[3] “Introduction to Online Learning,” *Parameter-free Learning and Optimization Algorithms*, 20-Sep-2019. [Online]. Available: https://parameterfree.com/lecture-notes-on-online-learning/. [Accessed: 02-Nov-2019]