Recently I’ve been on a huge animation kick. I love doing all things 3D, but if I ever were to specialize I think I’d really love to become a technical animator. So a big goal for me across this year is to take some larger steps to get there.
I also have been working on a custom VTuber setup inside of Unity for a few years now. There’s a lot of interesting work that can be done when you combine live tracking from a webcam with premade animations and procedural animations.
One of my dreams is to build something truly larger than life. Plenty of other VTubers have created setups that feel real using expensive mocap suits and trackers, with creators like CodeMiko and Saruei coming to mind recently. The hardware they use is crazy expensive, though, and I don't really have the space for a full body tracking solution of that scale. So instead I want to try to mimic the expression you could get out of a high quality mocap performance, but using a variety of premade and procedural animations.
I’ve been doing this for a while with a very simple approach: mix the posing received from live webcam tracking with premade third person character animation. This doesn’t always make sense, especially when the character is far away, so I blend the influence of the tracking based on the angle of my face to the camera and the distance. There’s a bit of head tracking assistance too, since my real head isn’t actually turning to look around the environment in the game; it’s all on my actual screen. You can see a little bit of this in this twitch clip. This still feels pretty basic and I’ll get into how I want to improve it below.
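To give a rough idea of what that blending looks like, here’s a minimal sketch of the weighting logic as a Unity script. The field names and thresholds are placeholders for illustration, not my actual values.

```csharp
using UnityEngine;

// Sketch: fade webcam tracking influence out as my face turns away from the
// virtual camera or moves further back. Names and thresholds are illustrative.
public class TrackingBlend : MonoBehaviour
{
    [SerializeField] Transform head;          // avatar head bone
    [SerializeField] Transform webcamAnchor;  // where the virtual "camera" sits
    [SerializeField] float maxYaw = 60f;      // beyond this angle, tracking fades to 0
    [SerializeField] float maxDistance = 1.5f;

    public float TrackingWeight { get; private set; }

    void Update()
    {
        Vector3 toCamera = webcamAnchor.position - head.position;

        // How far the face is turned away from the camera, in degrees.
        float yaw = Vector3.Angle(head.forward, toCamera);

        // Fade tracking out as the angle and distance grow.
        float angleWeight = 1f - Mathf.Clamp01(yaw / maxYaw);
        float distanceWeight = 1f - Mathf.Clamp01(toCamera.magnitude / maxDistance);

        TrackingWeight = angleWeight * distanceWeight;
        // Elsewhere, this weight would drive the blend between the tracked
        // pose and the premade animation layers.
    }
}
```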
The next level is procedural animation, using IK systems and blendspace/blendtree states. I used this to create a mouse tracking system and a nearly identical pen tracking system. These use poses set up around a fake mouse/tablet in the game world, with my real mouse position driving the positioning. With a setup like this, we can create something a webcam can’t feasibly track.
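Here’s a rough sketch of how mouse position can drive a blend tree like this. The parameter names and damping value are assumptions for the sake of the example; the pen version works the same way with tablet input instead.

```csharp
using UnityEngine;

// Sketch: normalize the cursor position and feed it into a 2D blend tree of
// hand poses authored around a fake mouse in the game world.
// "MouseX"/"MouseY" are placeholder parameter names.
public class MousePoseDriver : MonoBehaviour
{
    [SerializeField] Animator animator;

    void Update()
    {
        // Normalize the cursor into a -1..1 range across the screen.
        float x = Mathf.Clamp((Input.mousePosition.x / Screen.width) * 2f - 1f, -1f, 1f);
        float y = Mathf.Clamp((Input.mousePosition.y / Screen.height) * 2f - 1f, -1f, 1f);

        // The blend tree picks where between the authored poses to sit,
        // with a little damping so the hand doesn't snap.
        animator.SetFloat("MouseX", x, 0.05f, Time.deltaTime);
        animator.SetFloat("MouseY", y, 0.05f, Time.deltaTime);
    }
}
```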
One of the earliest examples of this was from 2024, just using a raw IK target on the hand and a head look system.
Here’s an example I showed last year on Twitter when I set this up to mimic a mouse, using blendtrees:
https://fixupx.com/maidmage/status/1942847204416233573?s=20
and below is a more recent update, showing off a more polished version of both with body tracking combined.
But I feel like a lot of these solutions are a little basic and could use more depth. There are two problems I’ve been facing: the first is the quality of my animations, and the second is how I’ve been implementing procedural animation. So I’ve been doing a lot of animation study since the start of this month, partly focused on a 3D animation course called Alive! that I’ve been working through at least an hour a day. The rest has been spent watching a variety of industry talks about animation, particularly a lot of procedural stuff.
I was watching a talk given by James Benson (and others) at GDC called Animation Tricks of the Trade (2018). Benson has worked on a variety of games, most notably Ori and the Blind Forest, Firewatch, and Half-Life: Alyx. During his segment he mentioned character look-at systems and how they should be broken down into multiple layers. A single chain can end up appearing rigid, but with separation and delay you can build a more life-like performance from your characters.
“So if instead you split your character’s attention into say the eyes, the head, the chest, the equipment they’re using- even if you want all those things pointed in one direction, they’re gonna get there with different timings, and even better you can point them in different places…”
I want to take this approach but go a little further by adding some other procedural animations, things like turn-in-place animations and some upper body blendspaces. But before I can get there I need to break the problem down into its components.
So the first thing I had to do was find a way to implement an eye-look system for my VTuber avatar. I have existing blendshape/morph target/shape key poses used by my ARKit tracking, but I hadn’t tried animating them yet. My rig was only meant to key bones, and since my avatar controls those expressions without bones, I wasn’t quite sure how I would animate them as clips or whether Unity would treat them properly. Especially since I want to be animating multiple different characters with the same rig, which was another constraint.
It turned out to be a lot simpler than expected: I animated the expressions directly inside of Unity, rather than in Blender, by keying my desired posing at the mesh level. I then set up a few poses with my eyes looking forward, up, down, left, and right, and placed them inside a blend tree. Using a script, I track a given Eye Look Target’s position and reference it against the head’s forward direction to calculate an approximate yaw/pitch angle from my face. These angles are then remapped, so an angle of about 45 degrees translates to a maximum look left, for example. I made the vertical axis a bit more sensitive, for taste.
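As a rough sketch of that calculation, the driver script looks something like this. The parameter names and angle ranges here are stand-ins, not my exact values.

```csharp
using UnityEngine;

// Sketch: take the target's position relative to the head, derive yaw/pitch,
// then remap into -1..1 parameters for the eye blend tree.
public class EyeLookDriver : MonoBehaviour
{
    [SerializeField] Animator animator;
    [SerializeField] Transform head;
    [SerializeField] Transform eyeLookTarget;
    [SerializeField] float maxYawDegrees = 45f;
    [SerializeField] float maxPitchDegrees = 30f; // vertical kept more sensitive

    void LateUpdate()
    {
        // Target direction in the head's local space.
        Vector3 local = head.InverseTransformPoint(eyeLookTarget.position).normalized;

        // Approximate yaw (left/right) and pitch (up/down) from that direction.
        float yaw = Mathf.Atan2(local.x, local.z) * Mathf.Rad2Deg;
        float pitch = Mathf.Asin(Mathf.Clamp(local.y, -1f, 1f)) * Mathf.Rad2Deg;

        // Remap so roughly 45 degrees of yaw means a full look to the side.
        animator.SetFloat("EyeYaw", Mathf.Clamp(yaw / maxYawDegrees, -1f, 1f));
        animator.SetFloat("EyePitch", Mathf.Clamp(pitch / maxPitchDegrees, -1f, 1f));
    }
}
```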
Here’s an example of the eye expressions using the preview window.
Once the eyes could look in the general direction of a target, I added back in a head look system with less influence. A direct head look feels too rigid, so I gave it about 0.5 strength, with 0.1 on the neck. I find it’s a lot more natural to turn slightly with the head and turn more with the eyes. Then, to add a bit of independence as Benson suggested, I pointed the head at a separate target that lags behind the eyes. With the current system the eyes move first and the head follows. This creates a bit of an overshoot effect with the eyes: they reach an extreme angle, then relax back toward center as the head catches up.
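A minimal sketch of that delayed head target idea, assuming the head look-at is pointed at a proxy transform rather than the real target (the lag time here is just illustrative):

```csharp
using UnityEngine;

// Sketch: the eyes aim at the real target, while the head aims at a proxy
// that drifts toward it over time. That delay is what produces the
// eyes-lead, head-follows overshoot.
public class DelayedHeadTarget : MonoBehaviour
{
    [SerializeField] Transform eyeLookTarget;  // what the eyes track directly
    [SerializeField] Transform headLookTarget; // proxy the head look-at uses
    [SerializeField] float headLagTime = 0.35f;

    Vector3 velocity;

    void LateUpdate()
    {
        // Smoothly chase the eye target; the head look-at (at ~0.5 weight,
        // ~0.1 on the neck) points at this proxy instead of the real target.
        headLookTarget.position = Vector3.SmoothDamp(
            headLookTarget.position,
            eyeLookTarget.position,
            ref velocity,
            headLagTime);
    }
}
```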
The effect can be seen below. I also added some extra looping facial expression layers to make the avatar feel more like a living NPC in the game world. This is all without any tracking enabled whatsoever; my webcam wasn’t even plugged in.
https://www.youtube.com/watch?v=_QEB6QxXhvs
I think the next step is to take this a bit further with some upper body posing, allowing the look system to flow through the rest of the body. If the focus is far enough off-axis, I’d also like to engage a turn-in-place animation to orient the character to a more natural angle.
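As a rough sketch of how that trigger could work, assuming a simple yaw threshold against the body’s facing (the trigger names and threshold are made up for illustration):

```csharp
using UnityEngine;

// Sketch: if the look target sits beyond some yaw threshold from the body's
// facing, fire a turn-in-place animation in that direction.
public class TurnInPlaceTrigger : MonoBehaviour
{
    [SerializeField] Animator animator;
    [SerializeField] Transform body;
    [SerializeField] Transform lookTarget;
    [SerializeField] float turnThreshold = 70f;

    void Update()
    {
        Vector3 toTarget = lookTarget.position - body.position;
        toTarget.y = 0f; // only the horizontal angle matters here

        float yaw = Vector3.SignedAngle(body.forward, toTarget, Vector3.up);

        // A real version would also wait for the current turn to finish
        // before re-triggering; this just shows the threshold idea.
        if (Mathf.Abs(yaw) > turnThreshold)
            animator.SetTrigger(yaw > 0f ? "TurnRight" : "TurnLeft");
    }
}
```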
Some smaller things I’d like to do as well, in no particular order:
- Find a workflow for recording facial animations from my tracking
- Build a system that can blend between these premade facial animations and my live face tracking data
- Change the auto-blink animations and facial expressions to work on a random time value rather than a fixed loop (see the sketch after this list).
- Create a system for toggling these facial expression layers separately.
- Change the look delay to have acceleration and overshoot, add some inertia to stay idle before looking.
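For the random blink timing mentioned above, a minimal sketch would be a coroutine with a randomized wait, something like this (the trigger name and interval range are placeholders):

```csharp
using System.Collections;
using UnityEngine;

// Sketch: fire the blink on a random interval instead of a fixed looping clip.
public class RandomBlink : MonoBehaviour
{
    [SerializeField] Animator animator;
    [SerializeField] Vector2 intervalRange = new Vector2(2f, 6f);

    IEnumerator Start()
    {
        while (true)
        {
            // Wait a random amount of time, then trigger the blink animation.
            yield return new WaitForSeconds(Random.Range(intervalRange.x, intervalRange.y));
            animator.SetTrigger("Blink");
        }
    }
}
```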
Systems such as these are standard in the AAA games space for making NPCs feel like real characters, and they often feel much more alive than some cheaper implementations of VTuber tracking. I’m hoping that if I can use the same techniques these games use to bring an NPC to life, I can augment my tracking and create something beyond what a mocapped performance can deliver. Premade and procedural animations can bring something that is simply impossible for a human to perform, and if we drive that with human inputs and a blend of human expressions, I think the end result could be something really interesting and technically impressive.