Shashank Tripathi
Research
My research lies at the intersection of machine learning, computer vision, and computer graphics.
Specifically, I am interested in 3D modeling of human bodies, modeling human-object interactions, and physics-inspired human motion understanding. In the past, I have worked on synthetic data for applications like object detection and human pose estimation from limited supervision.
Before diving into human body research, I dabbled in visual servoing, medical image analysis, pedestrian detection, and reinforcement learning.
Publications
DECO: Dense Estimation of 3D Human-Scene COntact in the Wild
Shashank Tripathi, Agniv Chatterjee, Jean-Claude Passy, Hongwei Yi, Dimitrios Tzionas, Michael J. Black
International Conference on Computer Vision (ICCV)
2023
(Oral presentation)
DECO estimates dense vertex-level 3D human-scene and human-object contact across the full body mesh and works on diverse and challenging human-object interactions in arbitrary in-the-wild images. DECO is trained on DAMON, a new and unique dataset with 3D contact annotations for in-the-wild images, manually annotated using a custom 3D contact labeling tool.
paper | abstract | project | dataset | video | bibtex | poster
Understanding how humans use physical contact to interact with the world is key to enabling human-centric artificial intelligence. While inferring 3D contact is crucial for modeling realistic and physically-plausible human-object interactions, existing methods either focus on 2D, consider body joints rather than the surface, use coarse 3D body regions, or do not generalize to in-the-wild images. In contrast, we focus on inferring dense, 3D contact between the full body surface and objects in arbitrary images. To achieve this, we first collect DAMON, a new dataset containing dense vertex-level contact annotations paired with RGB images containing complex human-object and human-scene contact. Second, we train DECO, a novel 3D contact detector that uses both body-part-driven and scene-context-driven attention to estimate vertex-level contact on the SMPL body. DECO builds on the insight that human observers recognize contact by reasoning about the contacting body parts, their proximity to scene objects, and the surrounding scene context. We perform extensive evaluations of our detector on DAMON as well as on the RICH and BEHAVE datasets. We significantly outperform existing SOTA methods across all benchmarks. We also show qualitatively that DECO generalizes well to diverse and challenging real-world human interactions in natural images. The code, data, and models are available for research purposes.
@inproceedings{tripathi2023deco,
title = {{DECO}: Dense Estimation of {3D} Human-Scene Contact In The Wild},
author = {Tripathi, Shashank and Chatterjee, Agniv and Passy, Jean-Claude and Yi, Hongwei and Tzionas, Dimitrios and Black, Michael J.},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023},
pages = {8001-8013}
}
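For readers who want a concrete picture of the vertex-level prediction task, here is a minimal PyTorch sketch of a contact head that fuses scene-context and body-part features with attention and outputs per-vertex contact probabilities on the SMPL mesh. The module name, feature dimensions, and token counts are illustrative assumptions, not the released DECO code.
import torch
import torch.nn as nn

class ToyContactHead(nn.Module):
    """Illustrative only: per-vertex contact probabilities from two feature streams."""
    def __init__(self, feat_dim=256, num_vertices=6890):  # 6890 = SMPL vertex count
        super().__init__()
        self.vertex_queries = nn.Parameter(torch.randn(num_vertices, feat_dim))
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(feat_dim, 1)

    def forward(self, scene_feats, part_feats):
        # scene_feats, part_feats: (B, N_tokens, feat_dim) from two image branches.
        tokens = torch.cat([scene_feats, part_feats], dim=1)
        queries = self.vertex_queries.unsqueeze(0).expand(tokens.shape[0], -1, -1)
        fused, _ = self.attn(queries, tokens, tokens)              # (B, V, feat_dim)
        return torch.sigmoid(self.classifier(fused)).squeeze(-1)  # (B, V) contact probs

head = ToyContactHead()
probs = head(torch.randn(2, 49, 256), torch.randn(2, 49, 256))
print(probs.shape)  # torch.Size([2, 6890])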
EMOTE: Emotional Speech-Driven Animation with Content-Emotion Disentanglement
Radek Danecek, Kiran Chhatre, Shashank Tripathi, Yandong Wen, Michael J. Black, Timo Bolkart
SIGGRAPH Asia 2023
Given audio input and an emotion label, EMOTE generates an animated 3D head that has state-of-the-art lip synchronization while expressing the emotion. The method is trained from 2D video sequences using a novel video emotion loss and a mechanism to disentangle emotion from speech.
paper | abstract | project | bibtex
To be widely adopted, 3D facial avatars must be animated easily, realistically, and directly from speech signals. While the best recent methods generate 3D animations that are synchronized with the input audio, they largely ignore the impact of emotions on facial expressions. Realistic facial animation requires lip-sync together with the natural expression of emotion. To that end, we propose EMOTE (Expressive Model Optimized for Talking with Emotion), which generates 3D talking-head avatars that maintain lip-sync from speech while enabling explicit control over the expression of emotion. To achieve this, we supervise EMOTE with decoupled losses for speech (i.e., lip-sync) and emotion. These losses are based on two key observations: (1) deformations of the face due to speech are spatially localized around the mouth and have high temporal frequency, whereas (2) facial expressions may deform the whole face and occur over longer intervals. Thus, we train EMOTE with a per-frame lip-reading loss to preserve the speech-dependent content, while supervising emotion at the sequence level. Furthermore, we employ a content-emotion exchange mechanism in order to supervise different emotions on the same audio, while maintaining the lip motion synchronized with the speech. To employ deep perceptual losses without getting undesirable artifacts, we devise a motion prior in the form of a temporal VAE. Due to the absence of high-quality aligned emotional 3D face datasets with speech, EMOTE is trained with 3D pseudo-ground-truth extracted from an emotional video dataset (i.e., MEAD). Extensive qualitative and perceptual evaluations demonstrate that EMOTE produces speech-driven facial animations with better lip-sync than state-of-the-art methods trained on the same data, while offering additional, high-quality emotional control.
@inproceedings{EMOTE,
title = {Emotional Speech-Driven Animation with Content-Emotion Disentanglement},
author = {Danecek, Radek and Chhatre, Kiran and Tripathi, Shashank and Wen, Yandong and Black, Michael and Bolkart, Timo},
publisher = {ACM},
year = {2023},
doi = {10.1145/3610548.3618183},
url = {https://emote.is.tue.mpg.de/index.html}
}
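The decoupled supervision described above can be summarized in a few lines: speech content is penalized per frame around the mouth, while emotion is penalized once per sequence. The sketch below is an assumed simplification with made-up tensor shapes and weights, not the authors' training code.
import torch
import torch.nn.functional as F

def content_emotion_loss(pred_lip_feats, gt_lip_feats,
                         pred_emotion_logits, gt_emotion_label,
                         w_content=1.0, w_emotion=0.5):
    # Per-frame content term: (B, T, D) lip features compared frame by frame.
    content = F.mse_loss(pred_lip_feats, gt_lip_feats)
    # Sequence-level emotion term: a single label per sequence.
    emotion = F.cross_entropy(pred_emotion_logits, gt_emotion_label)
    return w_content * content + w_emotion * emotion

loss = content_emotion_loss(torch.randn(4, 60, 128), torch.randn(4, 60, 128),  # 60 frames
                            torch.randn(4, 8), torch.randint(0, 8, (4,)))      # 8 emotions
print(loss.item())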
3D Human Pose Estimation via Intuitive Physics
Shashank Tripathi, Lea Müller, Chun-Hao P. Huang, Omid Taheri, Michael J. Black, Dimitrios Tzionas
Computer Vision and Pattern Recognition (CVPR)
2023
IPMAN estimates a 3D body from a color image in a "stable" configuration by encouraging plausible floor contact and overlapping CoP and CoM. It exploits interpenetration of the body mesh with the ground plane as a heuristic for pressure.
paper | abstract | project | dataset | video | bibtex | poster
The estimation of 3D human body shape and pose from images has advanced rapidly. While the results are often well aligned with image features in the camera view, the 3D pose is often physically implausible; bodies lean, float, or penetrate the floor. This is because most methods ignore the fact that bodies are typically supported by the scene. To address this, some methods exploit physics engines to enforce physical plausibility. Such methods, however, are not differentiable, rely on unrealistic proxy bodies, and are difficult to integrate into existing optimization and learning frameworks. To account for this, we take a different approach that exploits novel intuitive-physics (IP) terms that can be inferred from a 3D SMPL body interacting with the scene. Specifically, we infer biomechanically relevant features such as the pressure heatmap of the body on the floor, the Center of Pressure (CoP) from the heatmap, and the SMPL body’s Center of Mass (CoM) projected on the floor. With these, we develop IPMAN, to estimate a 3D body from a color image in a “stable” configuration by encouraging plausible floor contact and overlapping CoP and CoM. Our IP terms are intuitive, easy to implement, fast to compute, and can be integrated into any SMPL-based optimization or regression method; we show examples of both. To evaluate our method, we present MoYo, a dataset with synchronized multi-view color images and 3D bodies with complex poses, body-floor contact, and ground-truth CoM and pressure. Evaluation on MoYo, RICH and Human3.6M show that our IP terms produce more plausible results than the state of the art; they improve accuracy for static poses, while not hurting dynamic ones. Code and data will be available for research.
@inproceedings{tripathi2023ipman,
title = {{3D} Human Pose Estimation via Intuitive Physics},
author = {Tripathi, Shashank and M{\"u}ller, Lea and Huang, Chun-Hao P. and Taheri, Omid
and Black, Michael J. and Tzionas, Dimitrios},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR)},
month = {June},
year = {2023}
}
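The stability idea is simple enough to state in a few lines of NumPy. The sketch below is an assumed simplification of the intuitive-physics terms (soft pressure from proximity to a ground plane at z = 0, uniform vertex mass for the CoM), not the paper's exact formulation.
import numpy as np

def stability_terms(vertices, alpha=10.0):
    """vertices: (V, 3) body mesh vertices; ground plane assumed at z = 0."""
    heights = vertices[:, 2]
    # Soft pressure proxy: ~1 for vertices at or below the floor, decays above it.
    pressure = np.exp(-alpha * np.clip(heights, 0.0, None))
    cop = (pressure[:, None] * vertices[:, :2]).sum(0) / (pressure.sum() + 1e-8)
    com_xy = vertices[:, :2].mean(0)          # uniform-mass CoM projected on the floor
    stability = np.linalg.norm(cop - com_xy)  # small when CoP and CoM overlap
    return cop, com_xy, stability

verts = np.random.rand(6890, 3)               # stand-in for SMPL vertices
print(stability_terms(verts)[2])
A term of this form is differentiable in the vertices, which is what makes it easy to drop into SMPL-based optimization or regression.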
BITE: Beyond Priors for Improved Three-D Dog Pose Estimation
Nadine Rüegg, Shashank Tripathi, Konrad Schindler, Michael J. Black, Silvia Zuffi
Computer Vision and Pattern Recognition (CVPR)
2023
BITE enables 3D shape and pose estimation of dogs from a single input image. The model handles a wide range of shapes and breeds, as well as challenging postures far from the available training poses, like sitting or lying on the ground.
paper | abstract | project | video | bibtex
We address the problem of inferring the 3D shape and pose of dogs from images. Given the lack of 3D training data, this problem is challenging, and the best methods lag behind those designed to estimate human shape and pose. To make progress, we attack the problem from multiple sides at once. First, we need a good 3D shape prior, like those available for humans. To that end, we learn a dog-specific 3D parametric model, called D-SMAL. Second, existing methods focus on dogs in standing poses because when they sit or lie down, their legs are self occluded and their bodies deform. Without access to a good pose prior or 3D data, we need an alternative approach. To that end, we exploit contact with the ground as a form of side information. We consider an existing large dataset of dog images and label any 3D contact of the dog with the ground. We exploit body-ground contact in estimating dog pose and find that it significantly improves results. Third, we develop a novel neural network architecture to infer and exploit this contact information. Fourth, to make progress, we have to be able to measure it. Current evaluation metrics are based on 2D features like keypoints and silhouettes, which do not directly correlate with 3D errors. To address this, we create a synthetic dataset containing rendered images of scanned 3D dogs. With these advances, our method recovers significantly better dog shape and pose than the state of the art, and we evaluate this improvement in 3D. Our code, model and test dataset are publicly available for research purposes at https://bite.is.tue.mpg.de.
@inproceedings{bite2023rueegg,
title = {{BITE}: Beyond Priors for Improved Three-{D} Dog Pose Estimation},
author = {R\"uegg, Nadine and Tripathi, Shashank and Schindler, Konrad and Black, Michael J. and Zuffi, Silvia},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR)},
pages = {8867-8876},
month = {June},
year = {2023}
}
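As a rough illustration of how body-ground contact labels can act as side information, the toy loss below pulls labeled contact vertices onto the ground plane; the function name, vertex count, and plane convention are assumptions for the example.
import torch

def ground_contact_loss(vertices, contact_labels, ground_z=0.0):
    """vertices: (B, V, 3); contact_labels: (B, V) in {0, 1}."""
    height = vertices[..., 2] - ground_z
    # Penalize labeled contact vertices that hover above (or sink below) the floor.
    per_vertex = contact_labels * height.abs()
    return per_vertex.sum() / (contact_labels.sum() + 1e-8)

verts = torch.randn(2, 3889, 3)                        # vertex count is arbitrary here
labels = torch.randint(0, 2, (2, 3889)).float()
print(ground_contact_loss(verts, labels).item())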
MIME: Human-Aware 3D Scene Generation
Hongwei Yi, Chun-Hao P. Huang, Shashank Tripathi, Lea Hering, Justus Thies, Michael J. Black
Computer Vision and Pattern Recognition (CVPR)
2023
MIME takes 3D human motion capture and generates plausible 3D scenes that are consistent with the motion. Why? Most mocap sessions capture the person but not the scene.
paper | abstract | project | video | bibtex
Generating realistic 3D worlds occupied by moving humans has many applications in games, architecture, and synthetic data creation. But generating such scenes is expensive and labor intensive. Recent work generates human poses and motions given a 3D scene. Here, we take the opposite approach and generate 3D indoor scenes given 3D human motion. Such motions can come from archival motion capture or from IMU sensors worn on the body, effectively turning human movement into a "scanner" of the 3D world. Intuitively, human movement indicates the free space in a room and human contact indicates surfaces or objects that support activities such as sitting, lying or touching. We propose MIME (Mining Interaction and Movement to infer 3D Environments), which is a generative model of indoor scenes that produces furniture layouts that are consistent with the human movement. MIME uses an auto-regressive transformer architecture that takes the already generated objects in the scene as well as the human motion as input, and outputs the next plausible object. To train MIME, we build a dataset by populating the 3D FRONT scene dataset with 3D humans. Our experiments show that MIME produces more diverse and plausible 3D scenes than a recent generative scene method that does not know about human movement. Code and data will be available for research.
@inproceedings{yi2022mime,
title = {{MIME}: Human-Aware {3D} Scene Generation},
author = {Yi, Hongwei and Huang, Chun-Hao P. and Tripathi, Shashank and Hering, Lea and
Thies, Justus and Black, Michael J.},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR)},
pages={12965-12976},
month = {June},
year = {2023}
}
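To make the auto-regressive generation loop concrete, here is a toy version in PyTorch: objects placed so far and the human motion are encoded together, and the model proposes parameters for the next object. All interfaces, dimensions, and the 7-number object encoding are assumptions, not MIME's released API.
import torch
import torch.nn as nn

class ToySceneDecoder(nn.Module):
    def __init__(self, d=64, obj_dim=7):          # e.g. class id + 3D position + 3D size
        super().__init__()
        self.embed_obj = nn.Linear(obj_dim, d)
        self.embed_human = nn.Linear(3, d)        # per-frame root position of the motion
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d, obj_dim)

    def forward(self, objects, motion):
        tokens = torch.cat([self.embed_human(motion), self.embed_obj(objects)], dim=1)
        return self.head(self.encoder(tokens)[:, -1])   # parameters of the next object

model = ToySceneDecoder()
motion = torch.randn(1, 120, 3)                  # 120 frames of root trajectory
objects = torch.zeros(1, 1, 7)                   # start token
for _ in range(5):                               # greedily place five objects
    nxt = model(objects, motion)
    objects = torch.cat([objects, nxt.unsqueeze(1)], dim=1)
print(objects.shape)  # torch.Size([1, 6, 7])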
PERI: Part Aware Emotion Recognition in the Wild
Akshita Mittel, Shashank Tripathi
European Conference on Computer Vision Workshops (ECCVW)
2022
An in-the-wild emotion recognition network that leverages both body pose and facial landmarks using a novel part aware spatial (PAS) image representation and context infusion (Cont-In) blocks.
paper | abstract
Emotion recognition aims to interpret the emotional states of a person based on various inputs including audio, visual, and textual cues. This paper focuses on emotion recognition using visual features. To leverage the correlation between facial expression and the emotional state of a person, pioneering methods rely primarily on facial features. However, facial features are often unreliable in natural unconstrained scenarios, such as in crowded scenes, as the face lacks pixel resolution and contains artifacts due to occlusion and blur. To address this, in the wild emotion recognition exploits full-body person crops as well as the surrounding scene context. In a bid to use body pose for emotion recognition, such methods fail to realize the potential that facial expressions, when available, offer. Thus, the aim of this paper is two-fold. First, we demonstrate our method, PERI, to leverage both body pose and facial landmarks. We create part aware spatial (PAS) images by extracting key regions from the input image using a mask generated from both body pose and facial landmarks. This allows us to exploit body pose in addition to facial context whenever available. Second, to reason from the PAS images, we introduce context infusion (Cont-In) blocks. These blocks attend to part-specific information, and pass them onto the intermediate features of an emotion recognition network. Our approach is conceptually simple and can be applied to any existing emotion recognition method. We provide our results on the publicly available in the wild EMOTIC dataset. Compared to existing methods, PERI achieves superior performance and leads to significant improvements in the mAP of emotion categories, while decreasing Valence, Arousal and Dominance errors. Importantly, we observe that our method improves performance in both images with fully visible faces as well as in images with occluded or blurred faces.
@inproceedings{mittel2022peri,
title = {{PERI}: Part Aware Emotion Recognition in the Wild},
author = {Mittel, Akshita and Tripathi, Shashank},
booktitle = {Computer Vision -- ECCV 2022 Workshops},
publisher = {Springer Nature Switzerland},
pages = {76--92},
year = {2023}
}
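A context-infusion block of the kind described above can be sketched as a small convolutional branch that injects the part-aware spatial (PAS) map into intermediate backbone features; the additive fusion and all shapes below are my assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextInfusion(nn.Module):
    def __init__(self, channels, pas_channels=1):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(pas_channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1))

    def forward(self, feats, pas_map):
        # Resize the PAS map to the feature resolution, then infuse it additively.
        pas = F.interpolate(pas_map, size=feats.shape[-2:], mode="bilinear",
                            align_corners=False)
        return feats + self.proj(pas)

block = ContextInfusion(channels=256)
out = block(torch.randn(2, 256, 28, 28), torch.rand(2, 1, 224, 224))
print(out.shape)  # torch.Size([2, 256, 28, 28])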
Occluded Human Mesh Recovery
Rawal Khirodkar, Shashank Tripathi, Kris Kitani
Computer Vision and Pattern Recognition (CVPR)
2022
A novel top-down mesh recovery architecture capable of leveraging image spatial context for handling multi-person occlusion and crowding.
paper | abstract | project
Top-down methods for monocular human mesh recovery have two stages: (1) detect human bounding boxes; (2) treat each bounding box as an independent single-human mesh recovery task. Unfortunately, the single-human assumption does not hold in images with multi-human occlusion and crowding. Consequently, top-down methods have difficulties in recovering accurate 3D human meshes under severe person-person occlusion. To address this, we present Occluded Human Mesh Recovery (OCHMR) - a novel top-down mesh recovery approach that incorporates image spatial context to overcome the limitations of the single-human assumption. The approach is conceptually simple and can be applied to any existing top-down architecture. Along with the input image, we condition the top-down model on spatial context from the image in the form of body-center heatmaps. To reason from the predicted body centermaps, we introduce Contextual Normalization (CoNorm) blocks to adaptively modulate intermediate features of the top-down model. The contextual conditioning helps our model disambiguate between two severely overlapping human bounding-boxes, making it robust to multi-person occlusion. Compared with state-of-the-art methods, OCHMR achieves superior performance on challenging multi-person benchmarks like 3DPW, CrowdPose and OCHuman. Specifically, our proposed contextual reasoning architecture applied to the SPIN model with ResNet-50 backbone results in 75.2 PMPJPE on 3DPW-PC, 23.6 AP on CrowdPose and 37.7 AP on OCHuman datasets, a significant improvement of 6.9 mm, 6.4 AP and 20.8 AP respectively over the baseline. Code and models will be released.
@inproceedings{khirodkar_ochmr_2022,
title = {Occluded Human Mesh Recovery},
author = {Khirodkar, Rawal and Tripathi, Shashank and Kitani, Kris},
booktitle = {IEEE/CVF Conf.~on Computer Vision and Pattern Recognition (CVPR)},
month = jun,
year = {2022},
doi = {},
month_numeric = {6}
}
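The contextual conditioning can be pictured as a conditional-normalization layer: the body-center heatmaps predict per-channel scale and shift that modulate intermediate features. The block below is a sketch under that reading of the abstract; the normalization choice and channel counts are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoNormSketch(nn.Module):
    def __init__(self, channels, heatmap_channels=2):   # e.g. local + global centermaps
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.to_gamma = nn.Conv2d(heatmap_channels, channels, kernel_size=1)
        self.to_beta = nn.Conv2d(heatmap_channels, channels, kernel_size=1)

    def forward(self, feats, centermaps):
        maps = F.interpolate(centermaps, size=feats.shape[-2:], mode="bilinear",
                             align_corners=False)
        return self.norm(feats) * (1 + self.to_gamma(maps)) + self.to_beta(maps)

block = CoNormSketch(channels=256)
out = block(torch.randn(2, 256, 56, 56), torch.rand(2, 2, 224, 224))
print(out.shape)  # torch.Size([2, 256, 56, 56])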
AGORA: Avatars in Geography Optimized for Regression Analysis
Priyanka Patel, Chun-Hao P. Huang, Joachim Tesch, David T. Hoffmann, Shashank Tripathi and Michael J. Black
Computer Vision and Pattern Recognition (CVPR)
2021
A synthetic dataset with high realism and highly accurate ground truth containing 4240 textured scans and SMPL-X fits.
paper | abstract | project | video
While the accuracy of 3D human pose estimation from images has steadily improved on benchmark datasets, the best methods still fail in many real-world scenarios. This suggests that there is a domain gap between current datasets and common scenes containing people. To obtain ground-truth 3D pose, current datasets limit the complexity of clothing, environmental conditions, number of subjects, and occlusion. Moreover, current datasets evaluate sparse 3D joint locations corresponding to the major joints of the body, ignoring the hand pose and the face shape. To evaluate the current state-of-the-art methods on more challenging images, and to drive the field to address new problems, we introduce AGORA, a synthetic dataset with high realism and highly accurate ground truth. Here we use 4240 commercially-available, high-quality, textured human scans in diverse poses and natural clothing; this includes 257 scans of children. We create reference 3D poses and body shapes by fitting the SMPL-X body model (with face and hands) to the 3D scans, taking into account clothing. We create around 14K training and 3K test images by rendering between 5 and 15 people per image using either image-based lighting or rendered 3D environments, taking care to make the images physically plausible and photoreal. In total, AGORA consists of 173K individual person crops. We evaluate existing state-of-the-art methods for 3D human pose estimation on this dataset and find that most methods perform poorly on images of children. Hence, we extend the SMPL-X model to better capture the shape of children. Additionally, we fine-tune methods on AGORA and show improved performance on both AGORA and 3DPW, confirming the realism of the dataset. We provide all the registered 3D reference training data, rendered images, and a web-based evaluation site at https://agora.is.tue.mpg.de/.
PoseNet3D: Learning Temporally Consistent 3D Human Pose via Knowledge Distillation
Shashank Tripathi, Siddhant Ranade, Ambrish Tyagi and Amit Agrawal
International Conference on 3D Vision (3DV)
2020
(Oral presentation)
Temporally consistent recovery of 3D human pose from 2D joints without using 3D data in any form.
paper | abstract | videos
Recovering 3D human pose from 2D joints is a highly unconstrained problem. We propose a novel neural network architecture, PoseNet3D, that takes 2D joints as input and outputs 3D skeletons and SMPL pose parameters. By casting our learning approach in a Knowledge Distillation framework, we avoid using any 3D data such as paired 2D-3D data, unpaired 3D data, motion capture sequences or multi-view images during training. We first train a teacher network that outputs 3D skeletons, using only 2D poses for training. The teacher network distills its knowledge to a student network that predicts 3D pose in SMPL representation. Finally, both the teacher and the student networks are jointly fine-tuned in an end-to-end manner using self-consistency and adversarial losses, improving the accuracy of the individual networks. Results on the Human3.6M dataset for 3D human pose estimation demonstrate that our approach reduces the 3D joint prediction error by 18% or more compared to previous methods. Qualitative results show that the recovered 3D poses and meshes are natural, realistic, and flow smoothly over consecutive frames.
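A minimal sketch of the distillation objective, assuming a toy orthographic camera and generic joint tensors (not the paper's code): the student matches the teacher's 3D skeleton while both remain consistent with the observed 2D joints.
import torch
import torch.nn.functional as F

def project(joints3d):
    # Toy orthographic projection: simply drop the depth coordinate.
    return joints3d[..., :2]

def distillation_loss(student_j3d, teacher_j3d, joints2d, w_distill=1.0, w_reproj=1.0):
    distill = F.mse_loss(student_j3d, teacher_j3d.detach())   # follow the frozen teacher
    reproj = F.mse_loss(project(student_j3d), joints2d)       # stay faithful to 2D input
    return w_distill * distill + w_reproj * reproj

loss = distillation_loss(torch.randn(8, 17, 3), torch.randn(8, 17, 3), torch.randn(8, 17, 2))
print(loss.item())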
Learning to Generate Synthetic Data via Compositing
Shashank Tripathi, Siddhartha Chandra, Amit Agrawal, Ambrish Tyagi, James Rehg and Visesh Chari
Computer Vision and Pattern Recognition (CVPR)
2019
Efficient, task-aware and realistic synthesis of composite images for training classification and object detection models.
paper | abstract | poster
We present a task-aware approach to synthetic data generation. Our framework employs a trainable synthesizer network that is optimized to produce meaningful training samples by assessing the strengths and weaknesses of a 'target' network. The synthesizer and target networks are trained in an adversarial manner wherein each network is updated with a goal to outdo the other. Additionally, we ensure the synthesizer generates realistic data by pairing it with a discriminator trained on real-world images. Further, to make the target classifier invariant to blending artefacts, we introduce these artefacts to background regions of the training images so the target does not over-fit to them. We demonstrate the efficacy of our approach by applying it to different target networks including a classification network on AffNIST, and two object detection networks (SSD, Faster-RCNN) on different datasets. On the AffNIST benchmark, our approach is able to surpass the baseline results with just half the training examples. On the VOC person detection benchmark, we show improvements of up to 2.7% as a result of our data augmentation. Similarly on the GMU detection benchmark, we report a performance boost of 3.5% in mAP over the baseline method, outperforming the previous state of the art approaches by up to 7.5% on specific categories.
@inproceedings{tripathi2019learning,
title={Learning to generate synthetic data via compositing},
author={Tripathi, Shashank and Chandra, Siddhartha and Agrawal, Amit and Tyagi, Ambrish
and Rehg, James M and Chari, Visesh},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition},
pages={461--470},
year={2019}
}
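The adversarial objective for the synthesizer can be written schematically as follows; the loss weights and interfaces are assumptions for illustration, not the paper's implementation. The synthesizer is rewarded when the target network fails on its composites and when a discriminator judges them realistic.
import torch
import torch.nn.functional as F

def synthesizer_loss(target_logits, true_labels, disc_score, w_adv=1.0, w_real=0.5):
    # Reward composites that the target network misclassifies.
    fool_target = -F.cross_entropy(target_logits, true_labels)
    # Standard GAN term: the discriminator should believe the composites are real.
    look_real = F.binary_cross_entropy_with_logits(disc_score, torch.ones_like(disc_score))
    return w_adv * fool_target + w_real * look_real

loss = synthesizer_loss(torch.randn(16, 10), torch.randint(0, 10, (16,)), torch.randn(16, 1))
print(loss.item())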
C2F: Coarse-to-Fine Vision Control System for Automated Microassembly
Shashank Tripathi, Devesh Jain and Himanshu Dutt Sharma
Nanotechnology and Nanoscience-Asia
2018
Automated, visual-servoing-based closed-loop system to perform 3D micromanipulation and microassembly tasks.
paper | abstract | video
In this paper, the authors present the development of a completely automated system to perform 3D micromanipulation and microassembly tasks. The microassembly workstation consists of a 3 degree-of-freedom (DOF) MM3A® micromanipulator arm attached to a microgripper, two 2-DOF PI® linear micromotion stages, one optical microscope coupled with a CCD image sensor, and two CMOS cameras for coarse vision. The whole control strategy is subdivided into sequential vision-based routines: manipulator detection and coarse alignment, autofocus and fine alignment of the microgripper, target object detection, and performing the required assembly tasks. A section comparing various objective functions useful in the autofocusing regime is included. The control system is built entirely in the image frame, eliminating the need for system calibration, hence improving speed of operation. A micromanipulation experiment performing pick-and-place of a micromesh is illustrated. This demonstrates a three-fold reduction in setup and run time for fundamental micromanipulation tasks, as compared to manual operation. Accuracy, repeatability and reliability of the programmed system are analyzed.
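As an example of the kind of autofocus objective the paper compares, here is the variance-of-Laplacian focus measure; this particular objective is my illustrative choice rather than the one the authors recommend.
import numpy as np
from scipy.ndimage import laplace

def focus_score(gray_image):
    """Higher variance of the Laplacian indicates a sharper (better-focused) image."""
    return laplace(gray_image.astype(np.float64)).var()

stack = [np.random.rand(480, 640) for _ in range(5)]   # images along the focus axis
best = max(range(len(stack)), key=lambda i: focus_score(stack[i]))
print("best focus index:", best)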
Sub-cortical Shape Morphology and Voxel-based Features for Alzheimer's Disease Classification
Shashank Tripathi, Seyed Hossein Nozadi, Mahsa Shakeri and Samuel Kadoury
IEEE International Symposium on Biomedical Imaging (ISBI)
2017
Alzheimer's disease patient classification using a combination of grey-matter voxel-based intensity variations and 3D structural (shape) features extracted from MRI brain scans.
paper | abstract | poster
Neurodegenerative pathologies, such as Alzheimer’s disease, are linked with morphological alterations and tissue variations in subcortical structures which can be assessed from medical imaging and biological data. In this work, we present an unsupervised framework for the classification of Alzheimer’s disease (AD) patients, stratifying patients into four diagnostic groups, namely: AD, early Mild Cognitive Impairment (MCI), late MCI and normal controls, by combining shape and voxel-based features from 12 sub-cortical areas. An automated anatomical labeling using an atlas-based segmentation approach is proposed to extract multiple regions of interest known to be linked with AD progression. We take advantage of gray-matter voxel-based intensity variations and structural alterations extracted with a spherical harmonics framework to learn the discriminative features between multiple diagnostic classes. The proposed method is validated on 600 patients from the ADNI database by training binary SVM classifiers on dimensionality-reduced features, using both linear and RBF kernels. Results show near state-of-the-art classification accuracy (>88%), especially for the more challenging discrimination tasks: AD vs. LMCI (76.81%), NC vs. EMCI (75.46%) and EMCI vs. LMCI (70.95%). By combining multimodality features, this pipeline demonstrates the potential of exploiting complementary features to improve cognitive assessment.
@inproceedings{tripathi2017sub,
title={Sub-cortical shape morphology and voxel-based features for Alzheimer's disease
classification},
author={Tripathi, Shashank and Nozadi, Seyed Hossein and Shakeri, Mahsa and Kadoury,
Samuel},
booktitle={Biomedical Imaging (ISBI 2017), 2017 IEEE 14th International Symposium on},
pages={991--994},
year={2017},
organization={IEEE}
}
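The classification stage maps directly onto a standard scikit-learn pipeline; the sketch below uses synthetic features in place of the real shape and voxel descriptors, and the component counts are arbitrary.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.random.randn(200, 500)        # 200 subjects, 500 combined shape + voxel features
y = np.random.randint(0, 2, 200)     # a binary task, e.g. AD vs. LMCI

for kernel in ("linear", "rbf"):     # dimensionality reduction + binary SVM
    clf = make_pipeline(StandardScaler(), PCA(n_components=50), SVC(kernel=kernel))
    print(kernel, cross_val_score(clf, X, y, cv=5).mean())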
Deep Spectral-Based Shape Features for Alzheimer’s Disease Classification
Mahsa Shakeri, Hervé Lombaert, Shashank Tripathi and Samuel Kadoury
MICCAI Spectral and Shape Analysis in Medical Imaging (SeSAMI)
2016
Alzheimer's disease classification using a deep-learning variational auto-encoder on shape-based features.
paper | abstract
Alzheimer’s disease (AD) and mild cognitive impairment (MCI) are the most prevalent neurodegenerative brain diseases in the elderly population. Recent studies on medical imaging and biological data have shown morphological alterations of subcortical structures in patients with these pathologies. In this work, we take advantage of these structural deformations for classification purposes. First, triangulated surface meshes are extracted from segmented hippocampus structures in MRI and point-to-point correspondences are established among a population of surfaces using a spectral matching method. Then, a deep learning variational auto-encoder is applied on the vertex coordinates of the mesh models to learn the low-dimensional feature representation. A multi-layer perceptron using softmax activation is trained simultaneously to classify Alzheimer’s patients from normal subjects. Experiments on the ADNI dataset demonstrate the potential of the proposed method in classification of normal individuals from early MCI (EMCI), late MCI (LMCI), and AD subjects, with classification rates outperforming the standard SVM-based approach.
@inproceedings{shakeri2016deep,
title={Deep spectral-based shape features for alzheimer’s disease classification},
author={Shakeri, Mahsa and Lombaert, Herve and Tripathi, Shashank and Kadoury, Samuel
and Alzheimer’s Disease Neuroimaging Initiative and others},
booktitle={International Workshop on Spectral and Shape Analysis in Medical Imaging},
pages={15--24},
year={2016},
organization={Springer}
}
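A compact sketch of the model family described above, assuming flattened hippocampus vertex coordinates as input and arbitrary layer sizes: a variational auto-encoder with a jointly trained softmax classifier on the latent code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShapeVAEClassifier(nn.Module):
    def __init__(self, n_vertices=1002, latent=32, n_classes=2):  # mesh size is arbitrary
        super().__init__()
        d = n_vertices * 3
        self.enc = nn.Linear(d, 256)
        self.mu = nn.Linear(256, latent)
        self.logvar = nn.Linear(256, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, d))
        self.cls = nn.Linear(latent, n_classes)

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        return self.dec(z), self.cls(mu), mu, logvar

model = ShapeVAEClassifier()
x = torch.randn(8, 1002 * 3)                    # 8 meshes, flattened vertex coordinates
recon, logits, mu, logvar = model(x)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = F.mse_loss(recon, x) + kl + F.cross_entropy(logits, torch.randint(0, 2, (8,)))
print(loss.item())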
Miscellaneous
Some other unpublished work: