How to convert Mediapipe Face Mesh to blendshape weights

I would like to apply facial motions predicted by Mediapipe Face Mesh to 3D models using blendshapes.

The target 3D models have blendshapes similar to ARFaceAnchor.BlendShapeLocation on iOS.

So I need to convert the face landmarks into blendshape weights.
To make this happen, I guess I should look at the landmark positions and calculate the distances between them relative to calibrated vertex positions.
But that probably requires fine-tuning and lacks versatility.
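For reference, this is the kind of distance-based heuristic I mean. It is only a rough sketch: the landmark indices and the closed/open calibration values below are my own assumptions and would need to be verified and calibrated per user.

```python
import numpy as np

# Assumed Face Mesh indices (verify against the canonical mesh):
UPPER_LIP, LOWER_LIP = 13, 14     # inner lip midpoints
FOREHEAD, CHIN = 10, 152          # used to normalize by face height

def jaw_open_weight(landmarks, closed=0.02, open_=0.25):
    """landmarks: (468, 3) array of Face Mesh landmark coordinates."""
    mouth_gap = np.linalg.norm(landmarks[UPPER_LIP] - landmarks[LOWER_LIP])
    face_height = np.linalg.norm(landmarks[FOREHEAD] - landmarks[CHIN])
    ratio = mouth_gap / face_height           # scale-invariant opening ratio
    # Map the calibrated [closed, open_] range to a 0..1 blendshape weight.
    return float(np.clip((ratio - closed) / (open_ - closed), 0.0, 1.0))
```

Doing this for every blendshape by hand is exactly the fine-tuning problem I am worried about.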
According to this paper, Google has a model for this, but unfortunately they will not publish it (here).

Could you suggest a good approach?
Or, if you know of any already published logic that might help, I would like to hear about it.



Solution 1:[1]

I'm having the same problem at the moment. My current solution is below:

Quote from the paper you cited:

Puppeteering Our model can also be used for virtual puppeteering and facial triggers. We built a small fully connected model that predicts 10 blend shape coefficients for the mouth and 8 blend shape coefficients for each eye. We feed the output of the attention mesh submodels to this blend shape network. In order to handle differences between various human faces, we apply Laplacian mesh editing to morph a canonical mesh into the predicted mesh [3]. This lets us use the blend shape coefficients for different human faces without additional fine-tuning. We demonstrate some results in Figure 5.

I think my approach at the moment is pretty much the same as what they've done.

My approach: first, sample many pairs of random blendshape weights -> face mesh (by detecting the face mesh on the 3D model), and then learn an inverse model from that data. (A simple neural net would do.)

That way you end up with a model that can output blendshape weights given a face mesh.
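Roughly like this (just a sketch of the idea: the blendshape-to-mesh function below is a random linear stand-in so the script runs; in practice you would pose your rigged 3D model with the sampled weights, render it, and run Mediapipe Face Mesh on the render):

```python
import torch
import torch.nn as nn

NUM_BLENDSHAPES = 26          # e.g. 10 mouth + 8 per eye, as in the paper
NUM_LANDMARKS = 468           # Mediapipe Face Mesh vertex count

# Stand-in for "pose the model and detect its mesh": a random linear
# blendshape basis. Replace with: pose the rigged model, render, run
# Face Mesh on the render, and return the flattened landmarks.
neutral = torch.randn(NUM_LANDMARKS * 3)
basis = torch.randn(NUM_BLENDSHAPES, NUM_LANDMARKS * 3) * 0.05

def mesh_from_blendshapes(weights):              # (B, NUM_BLENDSHAPES)
    return neutral + weights @ basis             # (B, NUM_LANDMARKS * 3)

# 1) Sample random blendshape vectors and the meshes they produce.
train_w = torch.rand(8192, NUM_BLENDSHAPES)
train_mesh = mesh_from_blendshapes(train_w)

# 2) Fit a small fully connected inverse model: mesh -> blendshape weights.
model = nn.Sequential(
    nn.Linear(NUM_LANDMARKS * 3, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, NUM_BLENDSHAPES), nn.Sigmoid(),   # weights in [0, 1]
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(2000):
    idx = torch.randint(0, train_w.shape[0], (64,))
    loss = loss_fn(model(train_mesh[idx]), train_w[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()

# At inference: weights = model(detected_mesh.reshape(1, -1))
```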

The catch, which is also mentioned in the blurb above, is that you want to handle face mesh inputs from different people. In the blurb it seems they sample the 3D model but transform the sampled mesh into the canonical face mesh, and hence end up with a canonical inverse model. At inference time you transform a given mesh into the canonical face mesh as well.

Another solution might be to transform each person's face mesh directly into the 3D model's mesh.
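If it helps, here is a minimal sketch of an alignment step. Note this is a plain similarity (Procrustes/Umeyama) fit that only removes global translation, rotation and scale, which is much simpler than the Laplacian mesh editing the paper describes:

```python
import numpy as np

def align_to_canonical(mesh, canonical):
    """mesh, canonical: (N, 3) arrays of corresponding landmark positions.
    Returns `mesh` mapped into the canonical frame by a similarity transform."""
    mu_m, mu_c = mesh.mean(axis=0), canonical.mean(axis=0)
    m, c = mesh - mu_m, canonical - mu_c
    # Optimal rotation via SVD of the cross-covariance (Kabsch/Procrustes).
    u, s, vt = np.linalg.svd(m.T @ c)
    d = np.sign(np.linalg.det(u @ vt))           # guard against reflections
    r = u @ np.diag([1.0, 1.0, d]) @ vt
    scale = (s * np.array([1.0, 1.0, d])).sum() / (m ** 2).sum()
    return scale * (m @ r) + mu_c
```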

I haven't done the canonical mesh part yet, but step one should work.

Best regards, C

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Yunnosch