[NEW!] [Jun. 2022] 2 papers accepted by CVPR 2022.

► [May 2021] Started my new journey at Adobe Research as a full-time research scientist.

► [Mar. 2021] Gave a talk on deep learning architectures for character animation at Intelligent Graphics Lab, Chinese Academy of Sciences.

► [Nov. 2020] Our summer intern project #OnTheBeatSneak was presented at Adobe MAX 2020 (Sneak Peek). [Quick Look] [Full Youtube Link] [Press]

► [Aug. 2020] Our paper MakeItTalk accepted by SIGGRAPH ASIA 2020. [Video]

► [Apr. 2020] Our paper RigNet accepted by SIGGRAPH 2020. [Video]

► [Nov. 2019] Our summer intern project #SweetTalkSneak was presented at Adobe MAX 2019 (Sneak Peek). [Youtube Link] [Press]

► [Aug. 2019] Our paper on Animation Skeleton Prediction accepted by 3DV 2019.

► [Jul. 2019] Our paper SceneGraphNet accepted by ICCV 2019.

► [Jun. 2019] Joined Adobe CIL (Seattle) as a summer intern.

► [Jun. 2018] Joined Wayfair Next Research as a summer intern and fall co-op intern.

► [Apr. 2018] Our paper VisemeNet accepted by SIGGRAPH 2018. [Video]



Audio-driven Neural Gesture Reenactment with Video Motion Graphs 2020-2022

Yang Zhou, J. Yang, D. Li, J. Saito, D. Aneja, E. Kalogerakis
CVPR 2022

Human speech is often accompanied by body gestures including arm and hand gestures. We present a method that reenacts a high-quality video with gestures matching a target speech audio. The key idea of our method is to split and re-assemble clips from a reference video through a novel video motion graph encoding valid transitions between clips. To seamlessly connect different clips in the reenactment, we propose a pose-aware video blending network which synthesizes video frames around the stitched frames between two clips. Moreover, we developed an audio-based gesture searching algorithm to find the optimal order of the reenacted frames. Our system generates reenactments that are consistent with both the audio rhythms and the speech content. We evaluate our synthesized video quality quantitatively, qualitatively, and with user studies, demonstrating that our method produces videos of much higher quality and consistency with the target audio compared to previous work and baselines.

[Project Page] [Paper] [Code]

APES: Articulated Part Extraction from Sprite Sheets 2021-2022

Z. Xu, M. Fisher, Yang Zhou, D. Aneja, R. Dudhat, L. Yi, E. Kalogerakis
CVPR 2022

Rigged puppets are one of the most prevalent representations to create 2D character animations. Creating these puppets requires partitioning characters into independently moving parts. In this work, we present a method to automatically identify such articulated parts from a small set of character poses shown in a sprite sheet, which is an illustration of the character that artists often draw before puppet creation. Our method is trained to infer articulated body parts, e.g. head, torso and limbs, that can be re-assembled to best reconstruct the given poses. Our results demonstrate significantly better performance than alternatives qualitatively and quantitatively.

[Project Page] [Paper]

MakeItTalk: Speaker-Aware Talking-Head Animation 2019-2020

Yang Zhou, X. Han, E. Shechtman, J. Echevarria, E. Kalogerakis, D. Li
SIGGRAPH Asia 2020

We present a method that generates expressive talking heads from a single facial image with audio as the only input. Our method first disentangles the content and speaker information in the input audio signal. The audio content robustly controls the motion of lips and nearby facial regions, while the speaker information determines the specifics of facial expressions and the rest of the talking head dynamics. Our method is able to synthesize photorealistic videos of entire talking heads with full range of motion and also animate artistic paintings, sketches, 2D cartoon characters, Japanese manga, and stylized caricatures in a single unified framework.

[Project Page] [Paper] [Video] [New Video!] [Code]

RigNet: Neural Rigging for Articulated Characters 2018-2019

Z. Xu, Yang Zhou, E. Kalogerakis, C. Landreth, K. Singh
SIGGRAPH 2020

We present RigNet, an end-to-end automated method for producing animation rigs from input character models. Given an input 3D model representing an articulated character, RigNet predicts a skeleton that matches animator expectations in joint placement and topology. It also estimates surface skin weights based on the predicted skeleton. Our method is based on a deep architecture that directly operates on the mesh representation without making assumptions about shape class and structure.

[Project Page] [Video] [Code] [Paper]

SceneGraphNet: Neural Message Passing for 3D Indoor Scene Augmentation 2018-2019

Yang Zhou, Z. While, E. Kalogerakis
International Conference on Computer Vision (ICCV), 2019

We propose a neural message passing approach to augment an input 3D indoor scene with new objects matching their surroundings. Given an input, potentially incomplete, 3D scene and a query location, our method predicts a probability distribution over object types that fit well in that location. Our distribution is predicted through passing learned messages in a dense graph whose nodes represent objects in the input scene and whose edges represent spatial and structural relationships.

[Project Page] [Paper] [Code]

Predicting Animation Skeletons for 3D Articulated Models via Volumetric Nets 2018-2019

Z. Xu, Yang Zhou, E. Kalogerakis, K. Singh
International Conference on 3D Vision (3DV) 2019

We present a learning method for predicting animation skeletons for input 3D models of articulated characters. In contrast to previous approaches that fit pre-defined skeleton templates or predict fixed sets of joints, our method produces an animation skeleton tailored for the structure and geometry of the input 3D model.

[Project Page] [Code]

Large-Scale 3D Shape Reconstruction and Segmentation from ShapeNet Core55 2017

L. Yi, L. Shao, M. Savva, H. Huang, Yang Zhou, et al.
International Conference on Computer Vision Workshops (ICCVW), 2017

ShapeNet is an ongoing effort to establish a richly-annotated, large-scale dataset of 3D shapes. We collaborated with the ShapeNet team to help build the training and testing datasets for “Large-Scale 3D Shape Reconstruction and Segmentation from ShapeNet Core55”. In particular, we helped check for duplicate geometry in the ShapeNet Core dataset.

[3D Shape Reconstruction and Segmentation Task Page] [Paper] [ShapeNet Duplicate Check]

A Tube-and-Droplet-based Approach for Representing and Analyzing Motion Trajectories 2014-2016

W. Lin, Yang Zhou, H. Xu, J. Yan, M. Xu, J. Wu, Z. Liu
IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 39(8), pp. 1489-1503, 2017

We address the problem of representing motion trajectories in a highly informative way and use the resulting representation to analyze trajectories. We apply our tube-and-droplet representation to trajectory analysis applications including trajectory clustering, trajectory classification and abnormality detection, and 3D action recognition.

[Project Page] [Paper] [Dataset] [Code]

Unsupervised Trajectory Clustering via Adaptive Multi-Kernel-based Shrinkage 2014-2015

H. Xu, Yang Zhou, W. Lin, H. Zha
International Conference on Computer Vision (ICCV), pp. 4328-4336, 2015

We introduce an adaptive multi-kernel-based estimation process to estimate the 'shrunk' positions and speeds of trajectories' points. This kernel-based estimation effectively leverages both the multiple kinds of structural information within a trajectory and the local motion patterns across multiple trajectories, so that the shrunk points become more discriminative.


Representing and recognizing motion trajectories: a tube and droplet approach 2013-2014

Yang Zhou, W. Lin, H. Su, J. Wu, J. Wang, Y. Zhou
ACM Intl. Conf. on Multimedia (MM), pp. 1077-1080, 2014

This paper addresses the problem of representing and recognizing motion trajectories. We propose a 3D tube that effectively embeds both motion and scene-related information of a motion trajectory, and a droplet-based method that captures the characteristics of the 3D tube for activity recognition.




Adobe, Inc | Media Intelligence Lab

May 2021 | Research Scientist

Working on projects related to digital humans.

Adobe, Inc | Media Intelligence Lab

June 2020 | Research Intern

Collaborated with researchers on 3D facial and skeleton animation based on deep learning approaches.

Our intern project #OnTheBeatSneak was presented at Adobe MAX 2020 (Sneak Peek).

[Quick Look] [Full Youtube Link] [Press]

Adobe, Inc | Creative Intelligence Lab

June 2019 | Research Intern

Collaborated with researchers on audio-driven facial animation and lip-sync technologies for cartoon and real human faces, based on deep learning approaches.

Our intern project #SweetTalkSneak was presented at Adobe MAX 2019 (Sneak Peek).

[Youtube Link] [Press]

Wayfair, Inc | Wayfair Next Research

June 2018 | Research Intern

Worked on 3D scene synthesis based on deep learning approaches.

NetEase Game, Inc

June 2015 | Management Trainee

Worked on mobile game design, especially profit models and user experience.

Contact Me

The best way to reach me is to send an email.