Jonas Geiping

Tübingen, Germany

ELLIS Institute

Maria-von-Linden Straße 2

Hi, I’m Jonas. I am a Machine Learning researcher in Tübingen, where I lead the research group for safety- & efficiency- aligned learning (🦭). Before this, I’ve spent time at the Universities of Maryland, Siegen and Münster.

I am constantly fascinated by questions of safety and efficiency in modern machine learning. There are a number of fundamental machine learning questions that come up in these topics that we still do not understand well. On the safety side, I investigate how models can be manipulated through data poisoning, jailbreaks, and adversarial attacks. I’m curious about watermarking for generative models, privacy guarantees in machine learning, and the challenge of defining “safety” in a meaningful technical way. Are there feasible technical solutions that reduce harm?

For efficiency, I study how we can build systems that do more with less, from weight averaging techniques to recursive computation approaches that extend model capabilities. I’m particularly interested in how these systems reason, and whether we can enhance their reasoning abilities while maintaining efficiency. How do we build mechanisms that let these models learn to be intelligent systems? At the core of my research is this intersection: Can we make models that reason well without sacrificing safety? How do computational constraints affect safety guarantees? Can we design systems where intelligence and safety reinforce each other?

In short:

Safety, Security and Privacy in Machine Learning
Efficient Machine Learning (especially in Language Modeling)
Understanding Reasoning in Intelligent Systems
Deep Learning as-a-Science

Incoming PhD Students:

If you are interested in these topics, feel free to reach out for more information! I’m admitting PhD students on a yearly basis through the following PhD programs:

For more details, make sure to read the openings page carefully.

Selected Publications

2025

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Jonas Geiping, Sean McLeish, Neel Jain, John Kirchenbauer, Siddharth Singh, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, and Tom Goldstein

arXiv:2502.05171 [cs], Feb 2025

We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test-time. This stands in contrast to mainstream reasoning models that scale up compute by producing more tokens. Unlike approaches based on chain-of-thought, our approach does not require any specialized training data, can work with small context windows, and can capture types of reasoning that are not easily represented in words. We scale a proof-of-concept model to 3.5 billion parameters and 800 billion tokens. We show that the resulting model can improve its performance on reasoning benchmarks, sometimes dramatically, up to a computation load equivalent to 50 billion parameters.
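The recurrent-depth idea can be illustrated with a toy sketch in plain Python. It assumes a split into a prelude that embeds the input, a shared core block that is iterated on a latent state together with that embedding, and a coda that reads out the result. The functions `prelude`, `core_block`, and `coda` and their arithmetic are hypothetical stand-ins for transformer layers, not the paper's model. The point is only that the parameter count stays fixed while test-time compute grows with `num_iterations`.

```python
import math
import random


def prelude(x):
    # Hypothetical embedding step: map raw inputs into a latent vector e.
    return [math.tanh(v) for v in x]


def core_block(e, s):
    # Hypothetical recurrent block: combines the latent state s with the
    # embedded input e. The same "weights" (0.5, 0.5) are reused on every
    # iteration, so unrolling deeper adds compute but no parameters.
    return [math.tanh(0.5 * si + 0.5 * ei) for si, ei in zip(s, e)]


def coda(s):
    # Hypothetical readout: collapse the final latent state to one number.
    return sum(s) / len(s)


def forward(x, num_iterations, seed=0):
    rng = random.Random(seed)
    e = prelude(x)
    # Assumption: the latent state starts from random Gaussian noise.
    s = [rng.gauss(0.0, 1.0) for _ in e]
    for _ in range(num_iterations):  # test-time depth is chosen freely
        s = core_block(e, s)
    return coda(s)


x = [0.3, -1.2, 2.0]
shallow = forward(x, num_iterations=2)
deep = forward(x, num_iterations=64)
```

In this toy the core block is a contraction, so runs started from different random states agree once enough iterations are spent; this is a property of the sketch, not a claim about the real model.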

2024

Coercing LLMs to Do and Reveal (Almost) Anything

Jonas Geiping, Alex Stein, Manli Shu, Khalid Saifullah, Yuxin Wen, and Tom Goldstein

arXiv:2402.14020 [cs], Feb 2024

It has recently been shown that adversarial attacks on large language models (LLMs) can "jailbreak" the model into making harmful statements. In this work, we argue that the spectrum of adversarial attacks on LLMs is much larger than merely jailbreaking. We provide a broad overview of possible attack surfaces and attack goals. Based on a series of concrete examples, we discuss, categorize and systematize attacks that coerce varied unintended behaviors, such as misdirection, model control, denial-of-service, or data extraction. We analyze these attacks in controlled experiments, and find that many of them stem from the practice of pre-training LLMs with coding capabilities, as well as the continued existence of strange "glitch" tokens in common LLM vocabularies that should be removed for security reasons.

Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text

Abhimanyu Hans, Avi Schwarzschild, Valeriia Cherepanova, Hamid Kazemi, Aniruddha Saha, Micah Goldblum, Jonas Geiping, and Tom Goldstein

In Proceedings of the Forty-first International Conference on Machine Learning, Jan 2024

Detecting text generated by modern large language models is thought to be hard, as both LLMs and humans can exhibit a wide range of complex behaviors. However, we find that a score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text. Based on this mechanism, we propose a novel LLM detector that only requires simple calculations using a pair of pre-trained LLMs. The method, called Binoculars, achieves state-of-the-art accuracy without any training data. It is capable of spotting machine text from a range of modern LLMs without any model-specific modifications. We comprehensively evaluate Binoculars on a number of text sources and in varied situations. Over a wide range of document types, Binoculars detects over 90% of generated samples from ChatGPT (and other LLMs) at a false positive rate of 0.01%, despite not being trained on any ChatGPT data.
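The contrast between two models can be written as a ratio, sketched here under our reading of the Binoculars formulation: the log-perplexity of the text under model 1, divided by the cross-entropy between model 1's and model 2's next-token distributions. The toy distributions, token strings, and helper names below are hypothetical. In practice both distributions would come from a pair of pre-trained LLMs over a shared vocabulary, and the decision threshold would be calibrated separately.

```python
import math


def log_perplexity(dists, tokens):
    # Mean negative log-likelihood of the observed tokens under one model.
    return -sum(math.log(d[t]) for d, t in zip(dists, tokens)) / len(tokens)


def log_cross_perplexity(dists_1, dists_2):
    # Mean cross-entropy between the models' next-token distributions at each
    # position: model 1's probabilities weight model 2's log-probabilities.
    total = 0.0
    for p, q in zip(dists_1, dists_2):
        total -= sum(p[v] * math.log(q[v]) for v in p)
    return total / len(dists_1)


def binoculars_score(dists_1, dists_2, tokens):
    return log_perplexity(dists_1, tokens) / log_cross_perplexity(dists_1, dists_2)


# Toy next-token distributions over a two-word vocabulary at two positions.
model_1 = [{"a": 0.9, "b": 0.1}] * 2
model_2 = [{"a": 0.8, "b": 0.2}] * 2
predictable = binoculars_score(model_1, model_2, ["a", "a"])
surprising = binoculars_score(model_1, model_2, ["b", "b"])
```

A lower score means the observed tokens are less surprising relative to how much the two models' predictions diverge; in this toy, the predictable sequence scores lower than the surprising one.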

2023

Cramming: Training a Language Model on a Single GPU in One Day

Jonas Geiping and Tom Goldstein

In Proceedings of the 40th International Conference on Machine Learning, Jul 2023

Recent trends in language modeling have focused on increasing performance through scaling, and have resulted in an environment where training language models is out of reach for most researchers and practitioners. While most in the community are asking how to push the limits of extreme computation, we ask the opposite question: How far can we get with a single GPU in just one day? We investigate the downstream performance achievable with a transformer-based language model trained completely from scratch with masked language modeling for a single day on a single consumer GPU. Aside from re-analyzing nearly all components of the pretraining pipeline for this scenario and providing a modified pipeline with performance close to BERT, we investigate why scaling down is hard, and which modifications actually improve performance in this scenario. We provide evidence that even in this constrained setting, performance closely follows scaling laws observed in large-compute settings. Through the lens of scaling laws, we categorize a range of recent improvements to training and architecture and discuss their merit and practical applicability (or lack thereof) for the limited compute setting. We provide code to reproduce all experiments at github.com/JonasGeiping/cramming.

A Watermark for Large Language Models

John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein

In Proceedings of the 40th International Conference on Machine Learning, Jul 2023

Potential harms of large language models can be mitigated by watermarking model output, i.e., embedding signals into generated text that are invisible to humans but algorithmically detectable from a short span of tokens. We propose a watermarking framework for proprietary language models. The watermark can be embedded with negligible impact on text quality, and can be detected using an efficient open-source algorithm without access to the language model API or parameters. The watermark works by selecting a randomized set of "green" tokens before a word is generated, and then softly promoting use of green tokens during sampling. We propose a statistical test for detecting the watermark with interpretable p-values, and derive an information-theoretic framework for analyzing the sensitivity of the watermark. We test the watermark using a multi-billion parameter model from the Open Pretrained Transformer (OPT) family, and discuss robustness and security.
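The green-list mechanism and its detection test can be sketched in plain Python. This is an illustrative toy, not the released implementation: the vocabulary size, `GAMMA` (green-list fraction), `DELTA` (logit bonus), the secret `KEY`, the way the RNG is seeded from the previous token, and the uniform stand-in logits are all assumptions. Detection counts green tokens and applies a one-proportion z-test against the fraction `GAMMA` expected from unwatermarked text.

```python
import math
import random

VOCAB_SIZE = 1000
GAMMA = 0.25  # fraction of the vocabulary on each green list (assumed)
DELTA = 4.0   # logit bonus for green tokens (assumed)
KEY = 12345   # hypothetical secret key shared by generator and detector


def green_list(prev_token):
    # Pseudorandomly partition the vocabulary, seeded by the key and the
    # previous token, and keep a GAMMA fraction as the green list.
    rng = random.Random(KEY * 1_000_003 + prev_token)
    ids = list(range(VOCAB_SIZE))
    rng.shuffle(ids)
    return set(ids[: int(GAMMA * VOCAB_SIZE)])


def sample_watermarked(logits, prev_token, rng):
    # Softly promote green tokens by adding DELTA to their logits, then sample.
    green = green_list(prev_token)
    biased = [z + DELTA if i in green else z for i, z in enumerate(logits)]
    top = max(biased)
    weights = [math.exp(b - top) for b in biased]  # stable softmax numerators
    return rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0]


def generate(length, watermark, seed=0):
    rng = random.Random(seed)
    tokens = [0]  # arbitrary start token
    for _ in range(length):
        logits = [0.0] * VOCAB_SIZE  # stand-in for a real model's logits
        if watermark:
            tokens.append(sample_watermarked(logits, tokens[-1], rng))
        else:
            tokens.append(rng.randrange(VOCAB_SIZE))
    return tokens


def z_score(tokens):
    # One-proportion z-test: green hits versus the GAMMA * T expected by chance.
    t = len(tokens) - 1
    hits = sum(tok in green_list(prev) for prev, tok in zip(tokens, tokens[1:]))
    return (hits - GAMMA * t) / math.sqrt(t * GAMMA * (1 - GAMMA))


z_marked = z_score(generate(200, watermark=True))
z_plain = z_score(generate(200, watermark=False, seed=1))
```

The detector in this sketch only needs `KEY` and the token sequence, never the model itself, which mirrors the abstract's point that detection works without access to the language model API or parameters.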