Jonathan Tompson

Selected Projects

Gemini for Robotics
Gemini Robotics Team [116 authors]
Technical Report [paper] [website]
Gemini Robotics is a Gemini 2.0-based model designed for robotics applications and is state of the art in both embodied reasoning and low-level action prediction tasks. It is a multi-embodiment, multi-task model that supports natural human interaction.

ALOHA Unleashed: A simple recipe for robot dexterity
Tony Z Zhao, Jonathan Tompson, Danny Driess, Pete Florence, Kamyar Ghasemipour, Chelsea Finn, Ayzaan Wahid
CoRL 2024 [paper] [website]
ALOHA Unleashed introduced a new level of dexterous policy learning. Using high-quality data from Aloha2 and a diffusion policy architecture, we were able to automate dexterous tasks such as shoelace tying and hanging t-shirts and laundry.

ALOHA 2: An Enhanced Low-Cost Hardware for Bimanual Teleoperation
Jorge Aldaco, Travis Armstrong, Robert Baruch, Jeff Bingham, Sanky Chan, Kenneth Draper, Debidatta Dwibedi, Chelsea Finn, Pete Florence, Spencer Goodrich, Wayne Gramlich, Torr Hage, Alexander Herzog, Jonathan Hoech, Thinh Nguyen, Ian Storz, Baruch Tabanpour, Leila Takayama, Jonathan Tompson, Ayzaan Wahid, Ted Wahrburg, Sichun Xu, Sergey Yaroshenko, Kevin Zakka and Tony Z. Zhao
whitepaper 2024 [paper] [website]
Aloha2 is the next generation of low-cost puppeteering teleoperation for bi-arm manipulation. We demonstrate teleop and policy learning on extremely dexterous tasks from putting a t-shirt on a hanger to tying shoe laces.

Video Language Planning
Yilun Du, Mengjiao Yang, Pete Florence, Fei Xia, Ayzaan Wahid, Brian Ichter, Pierre Sermanet, Tianhe Yu, Pieter Abbeel, Joshua B. Tenenbaum, Leslie Kaelbling, Andy Zeng, Jonathan Tompson
ICLR 2024 [paper] [website]
We present VLP, a novel way to perform robust, long-horizon, complex task planning using generative video models. VLP is able to reason over multi-part tasks with complex branching, and we demonstrate results on a number of robotic platforms.

UniSim: Learning Interactive Real-World Simulators
Sherry Yang, Yilun Du, Kamyar Ghasemipour, Jonathan Tompson, Leslie Kaelbling, Dale Schuurmans, Pieter Abbeel
ICLR 2024 [paper] [website]
UniSim uses state-of-the-art video prediction to learn a universal simulator of real-world interactions. We demonstrate the usefulness of the system for a wide range of applications, including robotic planning, simulated reinforcement learning, and multiple computer vision applications.

Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Open X-Embodiment Collaboration [>150 Authors]
ICRA 2024 [paper] [website]
Open X-Embodiment is a large multi-institution collaboration to create the world's largest multi-embodiment robotic manipulation dataset. The presented baselines demonstrate the importance of multi-embodiment data and show positive task transfer between embodiments with different morphologies.

PaLM-E: An Embodied Multimodal Language Model
Danny Driess, Fei Xia, Mehdi Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence
ICML 2023 [website]
We propose embodied language models to establish the link between words and percepts by directly incorporating continuous inputs from sensor modalities. A single large embodied multimodal PaLM-E model can address a variety of embodied reasoning tasks from VQA to robotic task planning.

Interactive Language: Talking to Robots in Real Time
Corey Lynch, Ayzaan Wahid, Jonathan Tompson, Tianli Ding, James Betker, Robert Baruch, Travis Armstrong, Pete Florence
RA-L 2023 [paper] [website]
We present a framework for building interactive, real-time, natural language-instructable robots in the real world.

Inner Monologue: Embodied Reasoning through Planning with Language Models
Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter
CoRL 2022 [paper] [website]
Building on SayCan, we incorporate multiple feedback sources into the LLM, allowing it to more richly process and plan in robotic control scenarios.

Keynote Talk: Pick and Place at Scale
Jonathan Tompson
ICRA 2022 Workshop: Challenges in Applying Academic Research to Real-World Robotics. [slides] [video]
In this talk I cover the challenges of 3 production robotics problems within the Robotics At Google team: Singulation, Kitting and Depalletization. I discuss the practical and engineering challenges of each task, as well as propose open research questions.

Implicit Behavioral Cloning
Pete Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian Wong, Johnny Lee, Igor Mordatch, Jonathan Tompson
CoRL 2021 [paper] [website]
We show that, on real-world robotic policy learning tasks, implicit behavioral cloning policies with energy-based models (EBMs) often outperform common explicit behavioral cloning policies, including on tasks with high-dimensional action spaces and visual image inputs.

Transporter Networks: Rearranging the Visual World for Robotic Manipulation
Andy Zeng, Pete Florence, Jonathan Tompson, Stefan Welker, Jonathan Chien, Maria Attarian, Travis Armstrong, Ivan Krasin, Dan Duong, Vikas Sindhwani, Johnny Lee
CoRL 2020 [paper] [website]
We present a novel end-to-end system for data-efficient robotic manipulation (pick-and-place). Transporter Networks learns to map query pick objects to their target place locations.

Counting Out Time: Class Agnostic Video Repetition Counting in the Wild
Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman
CVPR 2020 [paper] [website]
We present an end-to-end algorithm for detecting periodic motion in videos, and counting repeated actions in a sequence.

Temporal Cycle-Consistency Learning
Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman
CVPR 2019 [paper] [webpage] [video]
A self-supervised representation learning method based on the task of temporal alignment between videos. In addition to robustly solving video alignment, these representations enable few-shot classification of video action phases.

Discovery of Semantic 3D Keypoints via End-to-end Geometric Reasoning
Supasorn Suwajanakorn, Noah Snavely, Jonathan Tompson, Mohammad Norouzi
NIPS 2018 Oral [paper] [website]
A semi-supervised method to recover semantically consistent 3D keypoints from weakly labeled RGB data.

Towards Accurate Multi-person Pose Estimation in the Wild
George Papandreou, Tyler Zhu, Nori Kanazawa, Alexander Toshev, Jonathan Tompson, Chris Bregler, Kevin Murphy
CVPR 2017 [paper] [slides]
State-of-the-art RGB human pose on MSCOCO using a 2-stage system for top-down detection.

Accelerating Eulerian Fluid Simulation With Convolutional Networks
Jonathan Tompson, Kristofer Schlachter, Pablo Sprechmann, Ken Perlin
ICML 2017 [paper] [video] [video presentation] [website]
A learning-based system for simulating Navier-Stokes Equations in real-time. We do so by reformulating the standard operator splitting method as an end-to-end network.

Inside-out Hand Tracking
Google Daydream 2016
Led a project to enable high-quality hand tracking from a head-mounted camera. The ConvNet-based system ran in real time on embedded hardware and demonstrated robustness to occlusion and hand-shape variation. More details are unfortunately not yet public.

Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation
Jonathan Tompson, Arjun Jain, Christoph Bregler, Yann LeCun
NIPS 2014 [paper] [predictions]
Following ICLR 2014 work, we substantially improved the architecture, incorporated the MRF into the ConvNet and significantly outperformed existing SoTA.

Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks
Jonathan Tompson, Murphy Stein, Ken Perlin, Yann LeCun
SIGGRAPH 2014 [paper] [video] [ppt] [code]
A novel method for real-time pose recovery of markerless complex articulable objects from a single depth image. We showed state-of-the-art results for real-time hand tracking. Released NYU Hand Pose Dataset.

IC Research and Development at Epoch Microelectronics
Designed a wide range of mixed signal RFICs including:

RF Modulators (passive and active).
Fractional and integer PLLs.
Wideband, low-noise LNAs.
Low-noise bandgaps and regulators.
LC and XTAL oscillators.
ΣΔ modulators.
FIR/IIR interpolation and decimation filters.
Anti-aliasing filters for data converters.
Continuous time, high linearity analog filters for base-band processing.

Worked with multiple telecommunication standards, including:

Cellular: GSM (EDGE), CDMA, WCDMA, LTE, WiMAX
Terrestrial and Cable Television: DVB, ATSC, ISDB
Low-power standards: Bluetooth, Zigbee

2.6GHz RF Inductive Power Delivery for Contactless On-Wafer Characterization
Jonathan Tompson, Adam Dolin and Peter Kinget
ICMTS 2008 [paper]
Designed a contactless IC testing mechanism using inductive probing through custom devices.
