I suspect that, for the facial animation at least, there's no mocap going on at all. This looks like algorithmically generated lipsync, which maps phonemes in the audio to associated mouth shapes. For games with a ton of dialog, it's way less time-consuming to do it this way than to mocap (or, god forbid, hand-animate) every line, and when done well, it's fairly convincing. It looks terrible here because the tweening from mouth shape to mouth shape has atrocious easing (look at the way the mouths snap from shape to shape), and the devs don't seem to have bothered writing any code to animate the rest of the body beyond blinking and wobbling in place.
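To make the easing point concrete, here's a rough sketch of phoneme-driven lipsync: a lookup table from phoneme to mouth shape, plus an eased blend between shapes. This is only an illustration, not what this game or any particular engine does. The table, the channel names (`jaw_open`, `lip_round`) and the function names are all made up for the example.

```python
# Hypothetical phoneme-to-mouth-shape table. Real tools use larger sets;
# the channel names and values here are invented for illustration.
MOUTH_SHAPES = {
    "AA": {"jaw_open": 0.8, "lip_round": 0.1},
    "OW": {"jaw_open": 0.5, "lip_round": 0.9},
    "M":  {"jaw_open": 0.0, "lip_round": 0.2},
}


def snap(t):
    """No easing at all: hold the old shape, then jump at the very end."""
    return 1.0 if t >= 1.0 else 0.0


def smoothstep(t):
    """Ease-in/ease-out curve with zero slope at both ends."""
    return t * t * (3.0 - 2.0 * t)


def tween(shape_a, shape_b, t, ease=smoothstep):
    """Blend every channel from shape_a to shape_b at progress t in [0, 1]."""
    t = min(max(t, 0.0), 1.0)  # clamp progress so overshoot can't happen
    k = ease(t)
    return {name: shape_a[name] + (shape_b[name] - shape_a[name]) * k
            for name in shape_a}


# Halfway through a transition from "M" (closed) to "AA" (open):
eased = tween(MOUTH_SHAPES["M"], MOUTH_SHAPES["AA"], 0.5)
snapped = tween(MOUTH_SHAPES["M"], MOUTH_SHAPES["AA"], 0.5, ease=snap)
```

With `smoothstep` the jaw is partway open at the midpoint. With `snap` it's still fully closed and will pop open on the last frame, which is roughly the look being complained about.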
Guild Wars 2 (which came out in 2012, for reference) uses the same sort of computer-driven lipsync in its in-game cinematics, but it also has a library of body and facial expressions that get layered on top of the lipsync to give the characters more life. It's still not perfect, but at least in GW2 when a character says something in an angry tone of voice, they actually have a scowl on their face and gesture threateningly.
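I have no idea how GW2 implements this internally, but the general idea of layering an expression over lipsync can be sketched as adding offsets onto the mouth pose and clamping the result. Everything here (the `layer` function, the `ANGRY` preset, the channel names) is hypothetical and just for illustration.

```python
def layer(lipsync_pose, expression, weight=1.0):
    """Add expression offsets on top of a lipsync pose, clamped to [0, 1].

    Channels only present in the expression (e.g. brows) start from 0.
    """
    out = dict(lipsync_pose)  # copy so the lipsync pose is left untouched
    for name, offset in expression.items():
        value = out.get(name, 0.0) + weight * offset
        out[name] = min(max(value, 0.0), 1.0)
    return out


# Hypothetical "angry" preset: lowered brows, slightly tighter jaw.
ANGRY = {"brow_lower": 0.9, "jaw_open": -0.1}

mouth = {"jaw_open": 0.6, "lip_round": 0.2}  # pose coming from the lipsync
frame = layer(mouth, ANGRY)                  # full-strength scowl
mild = layer(mouth, ANGRY, weight=0.5)       # same expression, dialed back
```

The lipsync still drives the mouth, and the expression only nudges it, so the same line can be played angry, neutral or anywhere in between by changing `weight`.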