The SMPL Model

Keep it SMPL™

What is SMPL?

SMPL, the Skinned Multi-Person Linear Model, is a patented 3D parametric model of the human body. It is trained on hundreds of thousands of 3D and 4D scans of people, and it parametrizes accurate human body shape and realistic, pose-dependent changes in body shape.

The SMPL parametrization encodes 3D body shape, pose, soft-tissue motion, hands, fingers, and facial expressions. Try it live at Meshcapade Me.

The SMPL Patent

The SMPL Model has been developed at the Max Planck Institute for Intelligent Systems (MPI) and is owned by and proprietary material of the Max-Planck-Gesellschaft zur Foerderung der Wissenschaften e.V. (MPG). The SMPL-Model specification is defined in the patent application WO2016207311A1 (2016-12-29), Skinned Multi-Person Linear Model. MPG owns patented technology disclosed in this patent application. Meshcapade has an exclusive license to use and sub-license the use of the SMPL Model to its customers.

Skinned multi-person linear model, US patent US10395411B2

Compatibility & Controllability

SMPL is appearance agnostic. Our focus is creating real human motion, behavior, and expressions that can be used to drive the character layer, or skin, from any source.

  • Compatibility with traditional graphics: SMPL can be used to drive any 3D avatar system, such as Unreal Engine, Roblox, and Fortnite (see the sketch after this list).
  • Control signal for video diffusion models: SMPL is also used as a control signal for video diffusion models, neural appearance models, and Gaussian splatting.
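As a rough illustration of the first point, the sketch below converts SMPL's per-joint axis-angle pose parameters into the quaternions most game engines expect when driving a rig. This is a minimal sketch rather than an engine integration: the joint order follows the common open-source SMPL convention, and mapping these names onto a specific engine's bones is an assumption that must be checked against the target skeleton.

```python
# Minimal sketch: convert a 72-dim SMPL body pose (24 joints x axis-angle)
# into per-joint quaternions for retargeting onto a game-engine skeleton.
# The joint order below is the commonly used SMPL convention; mapping these
# names to a specific engine's bones is an assumption to verify for your rig.
import numpy as np
from scipy.spatial.transform import Rotation

SMPL_JOINT_NAMES = [
    "pelvis", "left_hip", "right_hip", "spine1", "left_knee", "right_knee",
    "spine2", "left_ankle", "right_ankle", "spine3", "left_foot", "right_foot",
    "neck", "left_collar", "right_collar", "head", "left_shoulder",
    "right_shoulder", "left_elbow", "right_elbow", "left_wrist", "right_wrist",
    "left_hand", "right_hand",
]

def smpl_pose_to_quaternions(pose: np.ndarray) -> dict:
    """Map a (72,) SMPL pose vector to {joint_name: quaternion (x, y, z, w)}."""
    rotvecs = pose.reshape(24, 3)                    # axis-angle per joint
    quats = Rotation.from_rotvec(rotvecs).as_quat()  # shape (24, 4)
    return dict(zip(SMPL_JOINT_NAMES, quats))

# A zero pose (rest pose) maps every joint to the identity quaternion.
print(smpl_pose_to_quaternions(np.zeros(72))["left_elbow"])  # [0. 0. 0. 1.]
```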

The De-Facto Standard

SMPL is the de-facto standard for representing 3D humans in any AI, machine learning and generative training architecture, whether it’s for industrial research, products or academia. We’ve made it available for widespread adoption through licensing for many different commercial and R&D purposes.

The most significant features that make the SMPL model so important are:

  1. SMPL is a compression algorithm: It converts the 3D reality of humans into a form that computers can “read”. SMPL encodes all of the 3D information about human body shape, pose, soft tissue motion, expressions, hand articulation and dynamics into just 100 parameters (see the sketch after this list).
  2. SMPL is the control signal for diffusion models. SMPL is a statistical distribution and the most compact 3D representation of humans. Using SMPL in the latent encoding of video diffusion models gives us the control to drive generated video with human motion.
  3. SMPL is the label for all human behavior. Human body motion and pose cannot be completely described in words or language. The 100 parameters of the SMPL model give us the “language” of human behavior for training AI agents.
  4. SMPL is the “canvas” for humans in 3D space. SMPL provides a consistent representation for 3D space on and inside the human. SMPL very quickly became the de-facto standard for representing 3D humans because anyone who needs to work with 3D human data, whether 3D scans, 2D images, or motion capture, needs this 3D “canvas”.
  5. SMPL is a 3D graphics primitive. It’s like any other equation for the base primitives in 3D engines, like 3D planes, cubes, spheres, cones, cylinders — and of course the Utah teapot. The SMPL model equation simply creates a 3D human instead of a sphere or a cube.
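To make the "100 parameters" and "graphics primitive" points concrete, here is a minimal sketch of the SMPL function itself, assuming the open-source smplx Python package: shape and pose parameters go in, a full 3D human mesh comes out. The "models/" path is a placeholder, and the SMPL model files must be obtained separately under their own license.

```python
# Minimal sketch of SMPL as a function: ~100 parameters in, a 3D human out.
# Assumes the open-source `smplx` package and locally downloaded SMPL model
# files; the "models/" path below is a placeholder.
import torch
import smplx

model = smplx.create("models/", model_type="smpl", gender="neutral")

betas = torch.zeros(1, 10)         # 10 shape parameters (body proportions)
body_pose = torch.zeros(1, 69)     # 23 body joints x 3 axis-angle values
global_orient = torch.zeros(1, 3)  # root orientation

output = model(betas=betas, body_pose=body_pose,
               global_orient=global_orient, return_verts=True)

print(output.vertices.shape)  # (1, 6890, 3): the full-body mesh
print(output.joints.shape)    # 3D joint locations driven by the same parameters
```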

ACM ToG: Seminal Papers Award

SMPL: A Skinned Multi-Person Linear Model

Seminal Graphics Papers: Pushing the Boundaries, Volume 2

SMPL is the third most cited publication in the history of ACM Transactions on Graphics and was included in the Seminal Graphics Papers collection celebrating 50 years of SIGGRAPH in 2023. Since its publication, SMPL has been used in a myriad of applications across industry and research.

SMPL in the Industry

SMPL is the de-facto standard representation for 3D humans in industry and academia in many different fields ranging from AI, computer graphics, computer vision to 3D motion training, simulation and generation.

  • Microsoft: SMPL in Mixed Reality at Microsoft (Microsoft Research)
  • Amazon: A simple strategy for body estimation from partial-view images
  • Zalando: Zalando brings a virtual fitting room pilot to millions of customers
  • Nvidia: BLADE: Single-view Body Mesh Learning through Accurate Depth Estimation
  • Nvidia: GENMO: A GENeralist Model for Human MOtion
  • Meta: Controllable Human-Object Interaction Synthesis
  • Google: DNO: Optimizing Diffusion Noise Can Serve As Universal Motion Priors
  • Apple: NeuMan: Neural Human Radiance Field from a Single Video
  • Microsoft: Look Ma, no markers: holistic performance capture without the hassle
  • Apple: 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
  • Nvidia: TeSMo: Generating Human Interaction Motions in Scenes with Text Control
  • Meta: Meta Motivo
  • Nvidia: ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills
  • Niantic: HandDGP: Camera-Space Hand Mesh Prediction with Differentiable Global Positioning
  • Meta: META AI Habitat 3.0 for Socially Intelligent Robots (SiRo)
  • Microsoft: HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations (Microsoft Research)
  • Google: COINS: Compositional Human-Scene Interaction Synthesis with Semantic Control
  • Nvidia: OKAMI: Teaching Humanoid Robots Manipulation Skills through Single Video Imitation
  • ByteDance: AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose
  • Apple: CoMotion: Concurrent Multi-Person 3D Motion
  • Apple: HUGS: Human Gaussian Splats
  • Disney: Robot Motion Diffusion Model: Motion Generation for Robotic Characters (Disney Research)
  • Google: PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations
  • Alibaba: Motionshop-2

SMPL in Meshcapade Technologies

At Meshcapade we have combined the SMPL model and our 3D motion datasets to build our own procedural generation pipeline so that any of our ML scientists can generate data, on demand, to suit their model training needs in a self-serve manner. This enables us to generate unlimited amounts of video of people in motion, containing a wide variety of human motions, body shapes, scenes, lighting, clothing, cameras, and camera motions, all with perfect 3D ground truth data.
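As a toy illustration of this procedural idea (not our actual pipeline), the sketch below samples random SMPL shape parameters and exports the resulting bodies as meshes; every sample comes with exact vertex-level 3D ground truth by construction. It assumes the open-source smplx and trimesh packages and a placeholder model path.

```python
# Toy sketch of procedural data generation: sample random body shapes from the
# SMPL shape space and export them as meshes with exact 3D ground truth.
# Assumes `smplx`, `trimesh`, and locally downloaded SMPL model files;
# the "models/" path is a placeholder.
import torch
import smplx
import trimesh

num_bodies = 8
model = smplx.create("models/", model_type="smpl", gender="neutral",
                     batch_size=num_bodies)

betas = torch.randn(num_bodies, 10)  # random but plausible body shapes
output = model(betas=betas,
               body_pose=torch.zeros(num_bodies, 69),
               global_orient=torch.zeros(num_bodies, 3))

# Every sampled body is a full mesh with known vertex positions.
for i, verts in enumerate(output.vertices.detach().numpy()):
    trimesh.Trimesh(vertices=verts, faces=model.faces).export(f"body_{i:03d}.obj")
```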

But most importantly, our unique technological and data generation moat has allowed us to create the world’s best technologies for 3D human motion capture, understanding and generation.

MoCapade: 3D Motion Capture

Our 3D motion capture product is live and free for testing. Try it now: MoCapade 3.5

Input: video from any camera

Output: 3D rigged animation file in GLB, FBX & SMPL formats.
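For a quick sanity check of an exported result, the GLB file can be opened with any standard glTF tooling. Here is a minimal sketch using the trimesh Python library; the filename is a placeholder.

```python
# Minimal sketch: inspect an exported GLB animation file with trimesh.
# "mocapade_result.glb" is a placeholder filename.
import trimesh

scene = trimesh.load("mocapade_result.glb")
for name, geometry in scene.geometry.items():
    print(name, geometry.vertices.shape)  # each mesh contained in the export
```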

Using our real-world & synthetic 3D data, we have trained the world's best motion-from-video model. With this, we are taking motion capture out of the studio and into the real world. This is now our internal engine for building the world's largest dataset of real-world 3D motion in the wild:

Key Features of MoCapade

  • 👯 Multi-person capture
  • 🎥 Export estimated 3D camera trajectories
  • 🙌 Detailed hand articulation and gestures
  • ⏫ New 3D & video export options: GLB, MP4 and SMPL

All of this, just from a single video. Any video. Even moving handheld cameras, phones and awkward angles.

Estimating body shape and accurate motion from a single view is one of the longest-researched and most sought-after problems in computer vision. Behind the scenes, a decades-long history of scientific publications and research from many key members of Meshcapade went into the development of our MoCapade product. Parametric body models, body shape estimation methods, motion extraction methods, pose probability models, camera estimation and tracking models, and finally tokenized representations of body shape and motion are just a few of the pieces of that decades-long development, all from our small team at Meshcapade.

The Science behind MoCapade

The model behind MoCapade is based on our publication PromptHMR (Patent Pending):

PromptHMR: Promptable Human Mesh Recovery (ICCV 2025)

PromptHMR is a promptable human pose and shape (HPS) estimation method that processes images with spatial or semantic prompts.

MoGen: 3D Motion Generation

We leverage the 3D motion information of this real-world data, together with its rich spatial and semantic context, to teach our AI characters how humans interact with the real world, its objects, and its spaces. We are turning the 3D physical reality of humans that we capture using MoCapade into behavioral intelligence for AI models. We are teaching them how to react in realtime like real people.

Interested in trying it out? Reach out to us: Contact Sales.

Fine-tuning MoGen for Observed Behavior

The Science behind MoGen

The model behind MoGen is based on our publication PRIMAL (Patent Pending):

PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning (ICCV 2025)

PRIMAL is a generative motion model that works in realtime.

Foundation models that enable digital humans to see, understand, and move.

© Meshcapade GmbH 2018 - 2025