I am currently on the job market. Feel free to connect!

I am a PhD student (2021-) at the Max Planck Institute for Intelligent Systems, where I am advised by MPI Director Michael Black. Earlier, I worked as an Applied Scientist at Amazon (2019-2021). I earned my Master's degree (2017-2019) from the Robotics Institute, Carnegie Mellon University, working with Prof. Kris Kitani. I received the Meta Research PhD Fellowship in 2023.

At Amazon Lab126, I closely collaborated with Prof. James Rehg, Dr. Amit Agrawal and Dr. Ambrish Tyagi. In 2023, I spent time at Epic Games as a research intern working with Dr. Carsten Stoll, Dr. Christoph Lassner and Dr. Daniel Holden. Recently, I interned at Meta Zurich, where I worked with Bugra Tekin on improving spatial understanding and visual grounding in 3D foundation models.

It has been my great fortune to have worked with excellent mentors and advisors.

Research

My research lies at the intersection of machine learning, computer vision and computer graphics. Specifically, I am interested in 3D modeling of human bodies, modeling human-object interactions, physics-inspired human motion understanding, and spatial understanding of 3D scenes. In the past, I have worked on synthetic data generation for applications such as object detection, and on human pose estimation from limited supervision.

Publications

InteractVLM: 3D Interaction Reasoning from 2D Foundational Models

Sai Kumar Dwivedi, Dimitrije Antić, Shashank Tripathi, Omid Taheri, Cordelia Schmid, Michael J. Black, Dimitrios Tzionas
Computer Vision and Pattern Recognition (CVPR) 2025
(Winner of the contact estimation tracks at CVPR 2025)
InteractVLM is a novel method to estimate 3D contact points on human bodies and objects from single in-the-wild images, enabling accurate joint reconstruction by leveraging large foundation models.

PICO: Reconstructing 3D People In Contact with Objects

Shashank Tripathi*, Alpár Cseke*, Sai Kumar Dwivedi, Arjun Lakshmipathy, Agniv Chatterjee, Michael J. Black, Dimitrios Tzionas
Computer Vision and Pattern Recognition (CVPR) 2025
PICO recovers humans, objects, and their interactions (HOI), all in 3D, from just a single internet image. To this end, we collect PICO-db, a new dataset of 3D contact correspondences, and develop PICO-fit, a novel three-stage optimization framework.

HUMOS: HUman MOtion Model Conditioned on Body Shape

Shashank Tripathi, Omid Taheri, Christoph Lassner, Michael J. Black, Daniel Holden, Carsten Stoll
European Conference on Computer Vision (ECCV) 2024
People with different body shapes perform the same motion differently. Our method, HUMOS, generates natural, physically plausible, and dynamically stable human motions based on body shape.

DECO: Dense Estimation of 3D Human-Scene COntact in the Wild

Shashank Tripathi*, Agniv Chatterjee*, Jean-Claude Passy, Hongwei Yi, Dimitrios Tzionas, Michael J. Black
International Conference on Computer Vision (ICCV) 2023
(Oral presentation)
DECO estimates dense vertex-level 3D human-scene and human-object contact across the full body mesh and works on diverse and challenging human-object interactions in arbitrary in-the-wild images.

EMOTE: Emotional Speech-Driven Animation with Content-Emotion Disentanglement

Radek Danecek, Kiran Chhatre, Shashank Tripathi, Yandong Wen, Michael J. Black, Timo Bolkart
SIGGRAPH ASIA 2023
Given audio input and an emotion label, EMOTE generates an animated 3D head that has state-of-the-art lip synchronization while expressing the emotion. The method is trained from 2D video sequences using a novel video emotion loss and a mechanism to disentangle emotion from speech.

3D Human Pose Estimation via Intuitive Physics

Shashank Tripathi, Lea Müller, Chun-Hao P. Huang, Omid Taheri, Michael Black, Dimitrios Tzionas
Computer Vision and Pattern Recognition (CVPR) 2023
IPMAN estimates a 3D body from a color image in a "stable" configuration by encouraging plausible floor contact and overlap between the center of pressure (CoP) and the center of mass (CoM). It exploits interpenetration of the body mesh with the ground plane as a heuristic for pressure.

BITE: Beyond Priors for Improved Three-D Dog Pose Estimation

Nadine Rüegg, Shashank Tripathi, Konrad Schindler, Michael J. Black, Silvia Zuffi
Computer Vision and Pattern Recognition (CVPR) 2023
BITE enables 3D shape and pose estimation of dogs from a single input image. The model handles a wide range of shapes and breeds, as well as challenging postures far from the available training poses, like sitting or lying on the ground.

MIME: Human-Aware 3D Scene Generation

Hongwei Yi, Chun-Hao P. Huang, Shashank Tripathi, Lea Hering, Justus Thies, Michael J. Black
Computer Vision and Pattern Recognition (CVPR) 2023
MIME takes 3D human motion capture and generates plausible 3D scenes that are consistent with the motion. Why? Most mocap sessions capture the person but not the scene.

PERI: Part Aware Emotion Recognition in the Wild

Akshita Mittel, Shashank Tripathi
European Conference on Computer Vision Workshops (ECCVW) 2022
An in-the-wild emotion recognition network that leverages both body pose and facial landmarks using a novel part-aware spatial (PAS) image representation and context infusion (Cont-In) blocks.

AGORA: Avatars in Geography Optimized for Regression Analysis

Priyanka Patel, Chun-Hao P. Huang, Joachim Tesch, David T. Hoffman, Shashank Tripathi and Michael J. Black
Computer Vision and Pattern Recognition (CVPR) 2021
A synthetic dataset with high realism and highly accurate ground truth, containing 4240 textured scans with SMPL-X fits.

PoseNet3D: Learning Temporally Consistent 3D Human Pose via Knowledge Distillation

Shashank Tripathi, Siddhant Ranade, Ambrish Tyagi and Amit Agrawal
International Conference on 3D Vision (3DV), 2020
(Oral presentation)
Temporally consistent recovery of 3D human pose from 2D joints without using 3D data in any form

Learning to Generate Synthetic Data via Compositing

Shashank Tripathi, Siddhartha Chandra, Amit Agrawal, Ambrish Tyagi, James Rehg and Visesh Chari
Computer Vision and Pattern Recognition (CVPR) 2019
Efficient, task-aware and realistic synthesis of composite images for training classification and object detection models

C2F: Coarse-to-Fine Vision Control System for Automated Microassembly

Shashank Tripathi, Devesh Jain and Himanshu Dutt Sharma
Nanotechnology and Nanoscience-Asia 2018
Automated, visual-servoing based closed loop system to perform 3D micromanipulation and microassembly tasks

Sub-cortical Shape Morphology and Voxel-based Features for Alzheimer's Disease Classification

Shashank Tripathi, Seyed Hossein Nozadi, Mahsa Shakeri and Samuel Kadoury
IEEE International Symposium on Biomedical Imaging (ISBI) 2017
Alzheimer's disease patient classification using a combination of grey-matter voxel-based intensity variations and 3D structural (shape) features extracted from MRI brain scans

Deep Spectral-Based Shape Features for Alzheimer’s Disease Classification

MICCAI Spectral and Shape Analysis in Medical Imaging (SeSAMI) 2016
Alzheimer's disease classification using a deep variational autoencoder on shape-based features

Patents

Miscellaneous

Some other unpublished work: