Short bio#
I am interested in understanding the structure underlying deep learning models and in using this understanding to build algorithms that learn efficiently and robustly. Of particular importance to me is how to make these systems learn continuously in changing environments and in multi-agent settings. In my work, I am typically on the lookout for where and why training instabilities occur, since these often help identify the barriers to both efficiency and robustness.
I strongly believe that deep learning does not need to be an art but can be made into a scientific discipline through both theoretical and empirical work. Interestingly, in recent years we have seen the theory-practice gap in optimization for deep learning close, slowly leading to a unification across both training paradigms and model families such as RNNs, GANs, Transformers, and diffusion models. These developments open up many opportunities, which I am excited to explore and exploit.
I received my PhD from EPFL under Prof. Volkan Cevher, where I was broadly interested in optimization for machine learning with a focus on stable training of deep learning models. During my studies, I interned with Amazon and ETH Zürich.
Selected publications#
See publications for the full list and Google Scholar for the most up-to-date version.
Training Deep Learning Models with Norm-Constrained LMOs
Thomas Pethick, Wanyun Xie, Kimon Antonakopoulos, Zhenyu Zhu, Antonio Silveti-Falls and Volkan Cevher
International Conference on Machine Learning (ICML) 2025 (spotlight)
paper code tweet
Efficient Interpolation between Extragradient and Proximal Methods for Weak MVIs
Thomas Pethick, Ioannis Mavrothalassitis and Volkan Cevher
International Conference on Learning Representations (ICLR) 2025
paper tweet
SAMPa: Sharpness-aware Minimization Parallelized
Wanyun Xie, Thomas Pethick and Volkan Cevher
Neural Information Processing Systems (NeurIPS) 2024
paper code tweet
Stable nonconvex-nonconcave training via linear interpolation
Thomas Pethick, Wanyun Xie and Volkan Cevher
Neural Information Processing Systems (NeurIPS) 2023 (spotlight)
paper code tweet
Solving stochastic weak Minty variational inequalities without increasing batch size
Thomas Pethick, Olivier Fercoq, Puya Latafat, Panagiotis Patrinos and Volkan Cevher
International Conference on Learning Representations (ICLR) 2023
paper code tweet
Escaping limit cycles: Global convergence for constrained nonconvex-nonconcave minimax problems
Thomas Pethick, Puya Latafat, Panagiotis Patrinos, Olivier Fercoq and Volkan Cevher
International Conference on Learning Representations (ICLR) 2022 (spotlight)
paper code tweet
Content#
A geometric view on optimization
Online learning
Talks
Provably beneficial artificial intelligence by Stuart Russell
From causal inference to autoencoders and gene regulation by Caroline Uhler
Tidbits
All the posts can also be found in chronological order in the archive.
Open source#
Some of the projects I worked on prior to my PhD:
Scalable Gaussian Processes for Economic Models. This codebase can be used to run high-dimensional, scalable Gaussian processes on economic models on a high-performance computing cluster.
Ensembled Deep Network for Global Optimization. This project explores the behavior of an ensembled variant of the architecture proposed by Snoek et al. (2015) on various Bayesian optimization benchmark problems.
Prolog code generation from Isabelle’s inner syntax. This project takes a theorem prover written and proven in Isabelle and compiles it into Prolog. It does so in Haskell through several catamorphisms that transform the Isabelle AST into a Prolog AST (see the sketch after this list).
CampusNet Sync. A Dropbox-inspired app to sync your computer with the filesystem used at the Technical University of Denmark.
Anki OneNote importer. Allows one to import .mht files exported from OneNote into Anki.
… and more on GitHub, including this site, which was originally built with Hakyll with some added \(\text{\LaTeX}\) goodies. I have since moved to the Executable Book Project for a well-maintained codebase with many of the same features.
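For the curious, here is a minimal sketch of the catamorphism idea behind the Isabelle-to-Prolog translation mentioned above. The toy term language and the `toProlog` algebra are hypothetical stand-ins; the actual project works on Isabelle’s full inner syntax.

```haskell
{-# LANGUAGE DeriveFunctor #-}
import Data.List (intercalate)

-- Base functor of a toy term language; recursion is abstracted into 'a'.
data TermF a
  = Var String
  | App String [a]
  deriving (Functor)

-- Tie the knot: the recursive AST is the fixed point of the base functor.
newtype Fix f = Fix (f (Fix f))

-- The catamorphism: fold the AST bottom-up with an algebra 'alg'.
cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg (Fix t) = alg (fmap (cata alg) t)

-- One algebra: render a term as Prolog source text.
toProlog :: TermF String -> String
toProlog (Var x)      = x
toProlog (App f args) = f ++ "(" ++ intercalate ", " args ++ ")"

-- Example: the term f(x, g(y)) prints as "f(x, g(y))".
main :: IO ()
main = putStrLn (cata toProlog term)
  where
    term = Fix (App "f" [Fix (Var "x"), Fix (App "g" [Fix (Var "y")])])
```

Swapping in a different algebra (e.g. one that builds a Prolog AST instead of a string) reuses the same `cata`, which is what makes the fold-based design convenient for AST-to-AST translation.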