I am a Staff Research Scientist and Research Manager at Google DeepMind in Mountain View, CA. My research background covers a wide range of topics: robotics, imitation learning, reinforcement learning, computer vision and graphics, computational fluid dynamics, unsupervised learning, hand and human-body tracking, and integrated circuit design.

Projects

ALOHA 2: An Enhanced Low-Cost Hardware for Bimanual Teleoperation
Jorge Aldaco, Travis Armstrong, Robert Baruch, Jeff Bingham, Sanky Chan, Kenneth Draper, Debidatta Dwibedi, Chelsea Finn, Pete Florence, Spencer Goodrich, Wayne Gramlich, Torr Hage, Alexander Herzog, Jonathan Hoech, Thinh Nguyen, Ian Storz, Baruch Tabanpour, Leila Takayama, Jonathan Tompson, Ayzaan Wahid, Ted Wahrburg, Sichun Xu, Sergey Yaroshenko, Kevin Zakka and Tony Z. Zhao
whitepaper 2024 [paper] [website]
ALOHA 2 is the next generation of low-cost puppeteering teleoperation hardware for bimanual manipulation. We demonstrate teleoperation and policy learning on highly dexterous tasks, from putting a t-shirt on a hanger to tying shoelaces.

FlexCap: Generating Rich, Localized, and Flexible Captions in Images
Debidatta Dwibedi, Vidhi Jain, Jonathan Tompson, Andrew Zisserman, Yusuf Aytar
Preprint 2024 [paper] [website]
We introduce FlexCap, a versatile vision-language model capable of generating region-specific descriptions of varying lengths. FlexCap achieves SoTA results on dense captioning on the Visual Genome dataset and SoTA zero-shot performance on a number of VQA datasets.

Learning to Learn Faster from Human Feedback with Language Model Predictive Control
Jacky Liang, Fei Xia, Wenhao Yu, Andy Zeng + 48 Authors
Preprint 2024 [paper] [website]
We investigate fine-tuning robot code-writing LLMs to remember their in-context interactions and improve their "teachability", i.e., how efficiently they adapt to human inputs. We present Language Model Predictive Control (LMPC), a framework that fine-tunes PaLM 2 to improve its teachability on 78 tasks across 5 robot embodiments.

RT-H: Action Hierarchies Using Language
Suneel Belkhale, Tianli Ding, Ted Xiao, Pierre Sermanet, Quan Vuong, Jonathan Tompson, Yevgen Chebotar, Debidatta Dwibedi, Dorsa Sadigh
Preprint 2024 [paper] [website]
A policy conditioned on language motions can easily be corrected during execution through human-specified language motions. This enables a new paradigm for flexible policies that can learn from human intervention in language. Our method, RT-H, builds an action hierarchy using language motions: it first learns to predict language motions and, conditioned on these along with the high-level task, it then predicts actions, using visual context at all stages.

Video Language Planning
Yilun Du, Mengjiao Yang, Pete Florence, Fei Xia, Ayzaan Wahid, Brian Ichter, Pierre Sermanet, Tianhe Yu, Pieter Abbeel, Joshua B. Tenenbaum, Leslie Kaelbling, Andy Zeng, Jonathan Tompson
ICLR 2024 [paper] [website]
We present VLP, a novel way to perform robust long-horizon, complex task planning using generative video models. VLP is able to reason over multi-part tasks with complex branching, and we demonstrate results on a number of robotic platforms.

UniSim: Learning Interactive Real-World Simulators
Sherry Yang, Yilun Du, Kamyar Ghasemipour, Jonathan Tompson, Leslie Kaelbling, Dale Schuurmans, Pieter Abbeel
ICLR 2024 [paper] [website]
UniSim uses state-of-the-art video prediction to learn a universal simulator of real-world interactions. We demonstrate the usefulness of the system for a wide range of applications, including robotic planning, simulated reinforcement learning, and multiple computer vision tasks.

Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Open X-Embodiment Collaboration [>150 Authors]
ICRA 2024 [paper] [website]
Open X-Embodiment is a large multi-institution collaboration to create the world's largest multi-embodiment robotic manipulation dataset. The presented baselines demonstrate the importance of multi-embodiment data and show positive task transfer between embodiments with different morphologies.

Geometry Matching for Multi-Embodiment Grasping
Maria Attarian, Muhammad Adil Asif, Jingzhou Liu, Ruthrash Hari, Animesh Garg, Igor Gilitschenski, Jonathan Tompson
CoRL 2023 [paper] [website]
We propose GeoMatch, which applies supervised learning to grasping data from multiple embodiments, learning end-to-end contact-point likelihood maps as well as conditional autoregressive prediction of grasps, keypoint by keypoint.

Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models
Ted Xiao, Harris Chan, Pierre Sermanet, Ayzaan Wahid, Anthony Brohan, Karol Hausman, Sergey Levine, Jonathan Tompson
RSS 2023 [paper] [website]
We propose Data-driven Instruction Augmentation for Language-conditioned control (DIAL), which uses semi-supervised language labels, leveraging the semantic understanding of VLMs to propagate knowledge onto unlabeled demonstration data.

Scaling Robot Learning with Semantically Imagined Experience
Tianhe Yu, Ted Xiao, Austin Stone, Jonathan Tompson, Anthony Brohan, Su Wang, Jaspiar Singh, Clayton Tan, Dee M, Jodilyn Peralta, Brian Ichter, Karol Hausman, Fei Xia
RSS 2023 [paper] [website]
We present ROSIE: Scaling RObot Learning with Semantically Imagined Experience, where we augment real robotics data with semantically imagined scenarios for downstream manipulation learning.

PaLM-E: An Embodied Multimodal Language Model
Danny Driess, Fei Xia, Mehdi Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence
ICML 2023 [website]
We propose embodied language models to establish the link between words and percepts by directly incorporating continuous inputs from sensor modalities. A single large embodied multimodal PaLM-E model can address a variety of embodied reasoning tasks from VQA to robotic task planning.

Interactive Language: Talking to Robots in Real Time
Corey Lynch, Ayzaan Wahid, Jonathan Tompson, Tianli Ding, James Betker, Robert Baruch, Travis Armstrong, Pete Florence
RA-L 2023 [paper] [website]
We present a framework for building interactive, real-time, natural language-instructable robots in the real world.

Visuomotor Control in Multi-Object Scenes Using Object-Aware Representations
Negin Heravi, Ayzaan Wahid, Corey Lynch, Pete Florence, Travis Armstrong, Jonathan Tompson, Pierre Sermanet, Jeannette Bohg, Debidatta Dwibedi
ICRA 2023 [paper]
We explore the effectiveness of using self-supervised object-aware representations for robotic policy learning tasks. Our self-supervised representations are learned by observing the agent freely interacting with different parts of the environment to learn a robust object-centric representation, outperforming baselines with significantly less data.

Contrastive Value Learning: Implicit Models for Simple Offline RL
Bogdan Mazoure*, Ben Eysenbach*, Ofir Nachum, Jonathan Tompson (* equal)
CoRL 2023 [paper]
We propose Contrastive Value Learning (CVL), which learns an implicit, multi-step model of the environment dynamics. CVL outperforms SoTA methods on offline RL datasets, and we show that it can incorporate out-of-domain demonstrations during pretraining to perform task transfer.

Inner Monologue: Embodied Reasoning through Planning with Language Models
Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter
CoRL 2022 [paper] [website]
Building on SayCan, we incorporate multiple sources of feedback into the LLM, allowing it to more richly process and plan in robotic control scenarios.

Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions
Bogdan Mazoure, Ofir Nachum, Ilya Kostrikov, Jonathan Tompson
NeurIPS 2022 [paper]
Achieved state-of-the-art in the challenging setting of zero-shot offline RL on the Procgen benchmark. We use a theoretically motivated framework, Generalized Similarity Functions, to learn improved state representations, leading to more robust and generalizable policies.

Keynote Talk: Pick and Place at Scale
Jonathan Tompson
ICRA 2022 Workshop: Challenges in Applying Academic Research to Real-World Robotics. [slides] [video]
In this talk I cover the challenges of three production robotics problems within the Robotics at Google team: singulation, kitting, and depalletization. I discuss the practical and engineering challenges of each task and propose open research questions.

Implicit Behavioral Cloning
Pete Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian Wong, Johnny Lee, Igor Mordatch, Jonathan Tompson
CoRL 2021 [paper] [website]
We show that on real-world robotic policy learning tasks, implicit behavioral cloning policies with energy-based models (EBMs) often outperform common explicit behavioral cloning policies, including on tasks with high-dimensional action spaces and visual image inputs.
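To make the "implicit" part concrete, here is a minimal sketch of inference with an energy-based policy, loosely following the derivative-free sampler described in the paper; the function names and hyperparameters are illustrative, not taken from our released code:

    import numpy as np

    def infer_action(energy, obs, low, high, n_samples=1024, iters=3, sigma=0.33):
        # Sample candidate actions uniformly within the action bounds.
        acts = np.random.uniform(low, high, size=(n_samples, len(low)))
        for _ in range(iters):
            e = energy(obs, acts)                 # (n_samples,) energies
            probs = np.exp(-e - np.max(-e))       # softmax over negative energy
            probs /= probs.sum()
            # Resample around low-energy candidates with shrinking noise.
            idx = np.random.choice(n_samples, size=n_samples, p=probs)
            acts = np.clip(acts[idx] + sigma * np.random.randn(*acts.shape), low, high)
            sigma *= 0.5
        return acts[np.argmin(energy(obs, acts))]  # lowest-energy action wins

Unlike an explicit policy that regresses actions directly, the argmin over a learned energy landscape lets the policy represent multi-modal and discontinuous action distributions.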

XIRL: Cross-embodiment Inverse Reinforcement Learning
Kevin Zakka, Andy Zeng, Pete Florence, Jonathan Tompson, Jeannette Bohg, Debidatta Dwibedi
CoRL 2021 [paper] [website]
We introduce the x-MAGICAL benchmark, geared towards cross-embodiment imitation, and demonstrate an unsupervised reward-learning approach (using TCC) that successfully learns from expert demonstrations of a different agent embodiment.
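The learned reward itself is simple. A hedged sketch, assuming a TCC-trained `encoder` (names are illustrative): the goal embedding is the average embedding of the expert demos' final frames, and the reward is the negative distance to it.

    import numpy as np

    def goal_embedding(encoder, demo_videos):
        # Average the embedding of the final frame across expert demos.
        return np.mean([encoder(v[-1]) for v in demo_videos], axis=0)

    def reward(encoder, frame, z_goal, scale=1.0):
        z = encoder(frame)
        # Closer to the goal in embedding space => higher reward.
        return -scale * np.linalg.norm(z - z_goal)

Because the embedding space is trained to align videos across embodiments, this reward transfers to agents whose morphology differs from the demonstrator's.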

Offline Reinforcement Learning with Fisher Divergence Critic Regularization
Ilya Kostrikov, Jonathan Tompson, Rob Fergus, Ofir Nachum
ICML 2021 [paper]
Achieved state-of-the-art on the offline RL D4RL benchmark by proposing a Fisher divergence behavior regularization term to replace CQL's KL constraint.
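From memory (see the paper for the precise objective): the critic is parameterized as an offset f from the behavior policy's log-density, and the Fisher divergence term then reduces to a gradient penalty on that offset:

    Q_\theta(s,a) = f_\theta(s,a) + \log \mu(a \mid s),
    \qquad
    \mathcal{L}_{\mathrm{reg}} = \mathbb{E}_{(s,a) \sim \mathcal{D}}
        \big[ \lVert \nabla_a f_\theta(s,a) \rVert^2 \big]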

Learning to Rearrange Deformable Cables, Fabrics, and Bags with Goal-conditioned Transporter Networks
Daniel Seita, Pete Florence, Jonathan Tompson, Erwin Coumans, Vikas Sindhwani, Ken Goldberg, Andy Zeng
ICRA 2021 [paper] [website]
This work extends Transporter Nets to perform goal-conditioned manipulation of deformable objects. We propose a suite of simulated benchmark tasks ranging from rope manipulation to placing objects in bags.

With a Little Help From My Friends: Nearest-neighbor Contrastive Learning of Visual Representations
Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman
ICCV 2021 [paper]
Our method, Nearest-Neighbor Contrastive Learning of visual Representations (NNCLR), achieved state-of-the-art in self-supervised learning on ImageNet by using nearest neighbors from a support set as positives.
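The core change relative to standard instance-discrimination methods fits in a few lines. A minimal sketch, assuming unit-normalized embeddings and a FIFO support set (illustrative, not our training code):

    import numpy as np

    def nnclr_logits(z1, z2, support, temperature=0.1):
        # z1, z2: (B, D) embeddings of two augmented views; support: (Q, D) queue.
        sim = z1 @ support.T                  # (B, Q) similarities to the queue
        nn = support[np.argmax(sim, axis=1)]  # nearest neighbor replaces the view
        return (nn @ z2.T) / temperature      # (B, B); diagonal entries are positives

Training minimizes the cross-entropy of these logits against the diagonal; using a nearest neighbor as the positive injects more semantic variation than image augmentations alone.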

Transporter networks: Rearranging the Visual World for Robotic Manipulation
Andy Zeng, Pete Florence, Jonathan Tompson, Stefan Welker, Jonathan Chien, Maria Attarian, Travis Armstrong, Ivan Krasin, Dan Duong, Vikas Sindhwani, Johnny Lee
CoRL 2020 [paper] [website]
We present a novel end-to-end system for data-efficient robotic manipulation (pick-and-place). Transporter Networks learn to map picked query objects to their corresponding place locations.
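The place step is essentially deep template matching. A hedged sketch (illustrative shapes and names): features of a crop around the pick location are cross-correlated with features of the whole scene, and the argmax of the response map gives the place location.

    import numpy as np

    def place_heatmap(phi_query, phi_key):
        # phi_query: (c, h, w) features of a crop around the picked object.
        # phi_key: (c, H, W) features of the full scene.
        c, h, w = phi_query.shape
        _, H, W = phi_key.shape
        out = np.zeros((H - h + 1, W - w + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                # Cross-correlation score at candidate place location (i, j).
                out[i, j] = np.sum(phi_query * phi_key[:, i:i+h, j:j+w])
        return out  # argmax over this map is the predicted place location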

Counting Out Time: Class Agnostic Video Repetition Counting in the Wild
Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman
CVPR 2020 [paper] [website]
We present an end-to-end algorithm for detecting periodic motion in videos, and counting repeated actions in a sequence.
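At the heart of the method is a temporal self-similarity matrix over per-frame embeddings, from which a small network (not shown) predicts per-frame period lengths. A minimal sketch with an assumed temperature:

    import numpy as np

    def self_similarity(z, temperature=1.0):
        # z: (T, D) per-frame embeddings. Returns a (T, T) matrix of
        # row-wise softmax-normalized negative squared distances.
        d2 = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
        logits = -d2 / temperature
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

Periodic motion shows up as a diagonal-stripe pattern in this matrix, which is what makes counting class-agnostic: the downstream predictor never sees raw pixels.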

Imitation Learning via Off-Policy Distribution Matching
Ilya Kostrikov, Ofir Nachum, Jonathan Tompson
ICLR 2020 [paper]
Our ValueDICE algorithm improves upon DAC, achieving similar performance in an entirely offline setting. This is done via a novel formulation that phrases distribution matching as a value-function learning problem.
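Hedged from memory (the paper has the precise derivation): applying the Donsker-Varadhan representation of the KL between the policy's and the expert's state-action distributions, together with a change of variables x = \nu - \mathcal{B}^{\pi}\nu, yields a fully off-policy saddle-point objective over a single value-like function \nu:

    \max_{\pi} \min_{\nu} \;
    \log \mathbb{E}_{(s,a) \sim d^{E}}
        \big[ e^{\nu(s,a) - \mathcal{B}^{\pi}\nu(s,a)} \big]
    - (1-\gamma)\, \mathbb{E}_{s_0 \sim p_0,\, a_0 \sim \pi(\cdot \mid s_0)}
        \big[ \nu(s_0, a_0) \big]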

ADAIL: Adaptive Adversarial Imitation Learning
Yiren Lu, Jonathan Tompson
NeurIPS 2019 Workshop + Submission [paper]
ADAIL tackles the problem of learning adaptive policies. We use an explicit latent dynamics encoder to improve policy robustness to unseen dynamics.

An Analysis of Object Representations in Deep Visual Trackers
Jonathan Tompson*, Ross Goroshin*, Debidatta Dwibedi (* equal)
Submission [paper]
In this work, we perform an in-depth analysis of a popular tracking architecture and suggest a novel architecture to mitigate saliency biases.

Temporal Cycle-Consistency Learning
Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman
CVPR 2019 [paper] [webpage] [video]
A self-supervised representation learning method based on the task of temporal alignment between videos. In addition to robustly solving video alignment, these representations enable few-shot classification of video action phases.
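The training signal is easiest to see in code. A minimal sketch of one cycle (the paper's cycle-back regression additionally fits a Gaussian over the returned distribution; names here are illustrative):

    import numpy as np

    def cycle_back_index(u, v, i):
        # u, v: (T, D) frame embeddings of two videos of the same action.
        def softmax(x):
            e = np.exp(x - x.max())
            return e / e.sum()
        alpha = softmax(-np.sum((v - u[i]) ** 2, axis=1))    # attention over v
        v_tilde = alpha @ v                                  # soft NN of u[i] in v
        beta = softmax(-np.sum((u - v_tilde) ** 2, axis=1))  # attention back in u
        return beta @ np.arange(len(u))                      # expected index in u

Training pushes the returned soft index back toward i; a frame is only cycle-consistent if both videos embed the same action phase nearby, which is what forces the alignment.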

Learning Latent Plans from Play
Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Jonathan Tompson, Sergey Levine, Pierre Sermanet
CoRL 2019 [paper] [website]
A novel method for learning hierarchical robotic control policies from unstructured play data.

Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning
Ilya Kostrikov, Kumar Krishna Agrawal , Debidatta Dwibedi, Sergey Levine, Jonathan Tompson
ICLR 2019 [paper] [code]
Presented a SoTA Adversarial Imitation Learning method that utilizes an off-policy variant of the GAIL algorithm, as well as a novel mechanism for handling absorbing states.
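The paper analyzes how common discriminator-derived rewards bias learning; one form it considers, r = log D - log(1 - D), can take both signs and hence avoids survival or termination bias. A minimal sketch (`eps` is just numerical safety):

    import numpy as np

    def ail_reward(d, eps=1e-8):
        # d: discriminator output D(s, a) in (0, 1).
        return np.log(d + eps) - np.log(1.0 - d + eps)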

PersonLab: Person Pose Estimation and Instance Segmentation
George Papandreou, Tyler Zhu, Liang-Chieh Chen, Spyros Gidaris, Jonathan Tompson, Kevin Murphy
ECCV 2018 [paper]
A box-free bottom-up approach for the tasks of pose estimation and instance segmentation of people in multi-person images using an efficient single-shot model.

Learning Actionable Representations from Visual Observations
Debidatta Dwibedi, Jonathan Tompson, Corey Lynch, Pierre Sermanet
IROS 2018 [paper] [website]
A novel framework for learning robust visual features for training robotic agents. Building upon the TCN framework, we show that policies trained on our learned representations are as performant as policies trained on true state representations.

Temporal Reasoning in Videos using Convolutional Gated Recurrent Units
Debidatta Dwibedi, Jonathan Tompson, Pierre Sermanet
CVPR 2018 Workshop [paper]
An architecture for video-based action recognition using a novel latent prediction loss to constrain and improve latent representations.

Discovery of Semantic 3D Keypoints via End-to-end Geometric Reasoning
Supasorn Suwajanakorn, Noah Snavely, Jonathan Tompson, Mohammad Norouzi
NIPS 2018 Oral [paper] [website]
A semi-supervised method to recover semantically consistent 3D keypoints from weakly labeled RGB data.

Learning Robotic Manipulation of Granular Media
Connor Schenck, Jonathan Tompson, Dieter Fox, Sergey Levine
CoRL 2017 [paper] [video]
This paper examines the problem of robotic manipulation of granular media, where we learn predictive models of granular media dynamics to perform scooping and dumping actions.

Towards Accurate Multi-person Pose Estimation in the Wild
George Papandreou, Tyler Zhu, Nori Kanazawa, Alexander Toshev, Jonathan Tompson, Chris Bregler, Kevin Murphy
CVPR 2017 [paper] [slides]
State-of-the-art RGB human pose on MSCOCO using a 2-stage system for top-down detection.

Accelerating Eulerian Fluid Simulation With Convolutional Networks
Jonathan Tompson, Kristofer Schlachter, Pablo Sprechmann, Ken Perlin
ICML 2017 [paper] [video] [video presentation] [website]
A learning-based system for simulating the Navier-Stokes equations in real time. We do so by reformulating the standard operator splitting method as an end-to-end network.
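One simulation step then looks like classical operator splitting with the expensive Poisson solve swapped for a network. A hedged sketch where `advect`, `subtract_gradient`, and `pressure_net` are assumed, illustrative functions rather than our released code:

    def step(velocity, density, dt, pressure_net, advect, subtract_gradient):
        density = advect(density, velocity, dt)     # semi-Lagrangian advection
        velocity = advect(velocity, velocity, dt)
        pressure = pressure_net(velocity, density)  # ConvNet replaces the iterative solver
        velocity = subtract_gradient(velocity, pressure)  # enforce divergence-free flow
        return velocity, density

Because the network only replaces the projection step, the rest of the solver's structure (and its boundary handling) is preserved, which is what keeps the rollouts stable.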

Inside-out Hand Tracking
Google Daydream 2016
Led a project to enable high-quality hand tracking from a head-mounted camera. The ConvNet-based system ran in real time on embedded hardware and demonstrated robustness to occlusion and hand-shape variation. More details are unfortunately not yet public.

PhD Thesis: Localization of Humans in Images Using Convolutional Networks
NYU 2015 [thesis] [ppt]
My PhD thesis covers *most* of the human body tracking work I did while at NYU.

Efficient ConvNet-based Marker-less Motion Capture in General Scenes with a Low Number of Cameras
Ahmed Elhayek, Edilson De Aguiar, Arjun Jain, Jonathan Tompson, Leonid Pishchulin, Micha Andriluka, Christoph Bregler, Bernt Schiele, Christian Theobalt
CVPR 2015 [paper] [video] [website]
SoTA motion capture in arbitrary scenes from few cameras.

Efficient Object Localization Using Convolutional Networks
Jonathan Tompson, Ross Goroshin, Arjun Jain, Yann LeCun, Christoph Bregler
CVPR 2015 [paper] [predictions_flic] [predictions_mpii]
A novel cascaded architecture to help overcome the effects of MaxPooling and a modified dropout that works better in the presence of spatially-coherent activations. Achieved SoTA in human body tracking.
Mechanical-Turk-annotated MPII images containing one person: [data].
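The modified dropout (often called SpatialDropout) is tiny but matters when nearby activations are strongly correlated, as in convolutional feature maps. A minimal numpy sketch:

    import numpy as np

    def spatial_dropout(x, p=0.5, training=True):
        # x: (N, C, H, W). Drops entire channels rather than single activations.
        if not training or p == 0.0:
            return x
        keep = (np.random.rand(x.shape[0], x.shape[1], 1, 1) > p)
        return x * keep / (1.0 - p)  # inverted-dropout scaling

Dropping individual pixels barely regularizes a conv net, since neighbors carry nearly the same information; zeroing whole feature maps removes that escape hatch.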

FLIC-plus Dataset
Jonathan Tompson, Arjun Jain, Christoph Bregler, Yann LeCun
NIPS 2014 [website]
Cleaned up and filtered the FLIC Human Pose dataset of Sapp et al. for fairer evaluation and higher-quality labels.

Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation
Jonathan Tompson, Arjun Jain, Christoph Bregler, Yann LeCun
NIPS 2014 [paper] [predictions]
Following our ICLR 2014 work, we substantially improved the architecture, incorporated the MRF into the ConvNet, and significantly outperformed the existing SoTA.

MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation
Arjun Jain, Jonathan Tompson, Yann LeCun, Christoph Bregler
ACCV 2014 [paper]
For ambiguous poses with poor image evidence (such as detecting the pose of camouflaged actors), we showed that motion flow features allow us to outperform state-of-the-art techniques.

Unsupervised Feature Learning from Temporal Data
Rostislav Goroshin, Joan Bruna, Jonathan Tompson, Arthur Szlam, David Eigen, Yann LeCun
ACCV 2014 [paper]
A sparse auto-encoder architecture that makes use of temporal coherence. This formulation enables pre-training on unlabeled video data (of which there is a massive abundance) to improve ConvNet performance.

Learning Human Pose Estimation Features with Convolutional Networks
Arjun Jain, Jonathan Tompson, Mykhaylo Andriluka, Graham W. Taylor, Christoph Bregler
ICLR 2014 [paper]
A new architecture for human pose estimation using a ConvNet + MRF spatial model; this was the first paper to show that a deep learning approach could outperform existing human pose architectures.

NYU Hand Pose Dataset
Jonathan Tompson, Murphy Stein, Ken Perlin, Yann LeCun
[website] [code]
A high-quality hand pose dataset. For years, it served as the community's primary hand pose evaluation dataset.

Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks
Jonathan Tompson, Murphy Stein, Ken Perlin, Yann LeCun
SIGGRAPH 2014 [paper] [video] [ppt] [code]
A novel method for real-time pose recovery of markerless complex articulable objects from a single depth image. We showed state-of-the-art results for real-time hand tracking.

Distributed Locking Protocol
While working at MongoDB Inc. with the server kernel team (under Alberto Lerner), I developed a new distributed lease protocol for the sharding config server, using a heavily modified two-phase commit with a timeout mechanism.

Open-Source Randomized Decision Forests
[video] [code]
Early hand-tracking research: a randomized decision forest classifier trained to recognize hand pixels using Microsoft's Kinect. Provides an OSS implementation of "Real-Time Human Pose Recognition in Parts from Single Depth Images".
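The per-pixel features behind these forests are simple depth differences at depth-normalized offsets (after Shotton et al.). A minimal sketch with illustrative names:

    import numpy as np

    def depth_feature(depth, x, u, v, big=1e6):
        # depth: (H, W) depth image; x: (row, col) pixel; u, v: 2D pixel offsets.
        d = depth[x[0], x[1]]
        def probe(off):
            # Normalizing offsets by depth makes the feature roughly depth-invariant.
            r = int(x[0] + off[0] / d)
            c = int(x[1] + off[1] / d)
            if 0 <= r < depth.shape[0] and 0 <= c < depth.shape[1]:
                return depth[r, c]
            return big  # out-of-bounds probes return a large constant depth
        return probe(u) - probe(v)

Each tree node thresholds one such feature, so classification is just a handful of image lookups per pixel, which is what makes the approach real-time.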

ARCADE
Jonathan Tompson, Ken Perlin, Murphy Stein, Charlie Hendee, Xiao Xiao, Hiroshi Ishii
SIGGRAPH Realtime Live 2012 [video]
Group project with the MIT Media Lab and NYU Media Research Lab. ARCADE is a system that allows real-time video-based presentations that convey the illusion that presenters are directly manipulating holographic 3D objects with their hands.

Miscellaneous Open-Source Graphics Projects
Over the years I have open-sourced implementations of multiple graphics algorithms; several of them are listed under Open Source Tools below.

IC Research and Development at Epoch Microelectronics
Designed a wide range of mixed signal RFICs including:

  • RF Modulators (passive and active).
  • Fractional and integer PLLs.
  • Wideband, low-noise LNAs.
  • Low-noise bandgaps and regulators.
  • LC and XTAL oscillators.
  • ΣΔ modulators.
  • FIR/IIR interpolation and decimation filters.
  • Anti-aliasing filters for data converters.
  • Continuous time, high linearity analog filters for base-band processing.
Worked with multiple telecommunication standards, including:
  • Cellular: GSM (EDGE), CDMA, WCDMA, LTE, WiMAX
  • Terrestrial and Cable Television: DVB, ATSC, ISDB
  • Low-power standards: Bluetooth, Zigbee

2.6GHz RF Inductive Power Delivery for Contactless On-Wafer Characterization
Jonathan Tompson, Adam Dolin and Peter Kinget
ICMTS 2008 [paper]
Designed a contactless IC testing mechanism using inductive probing through custom devices.

Mismatch Characterization of Ring-Oscillators
Jonathan Tompson, Peter Kinget
2008 Research Associate
Investigated the matching of on-chip oscillators and compared statistics to theoretical estimates.

High-Speed, Chip-to-Chip Communication
Jonathan Tompson, Gu-Yeon Wei
2006 Undergraduate Honors thesis
Designed a novel transformer-based communication system for high-speed digital systems.

Open Source Tools
I've written or contributed to many OSS tools over the years. This is a short list of some:

  • jtorch - Torch7 Utility Library for running models in OpenCL / C++
  • jcl - OpenCL Wrapper (to make OpenCL easier)
  • torchzlib - A utility library for zlib compression / decompression of Torch7 tensors
  • matlabnoise - Matlab procedural noise library
  • matlabobj - Matlab obj reader
  • torch7 - I'm a reasonably regular contributor to torch and its various packages
  • icp - A C++ Iterative Closest Point library (with Matlab interface)
  • ik - A very simple inverse kinematics library (in C++)
  • ModelFit - Off-line fitting portion of the hand-tracking paper above
  • jzmq - A ZeroMQ Utility Library (C++)
  • There are probably others... See my GitHub.

Resume

PDF
WORD