How do you implement SVoice?

I'm trying to use Facebook's SVoice to separate the different speakers in my audio file using Python. I found the implementation here:

https://github.com/facebookresearch/svoice

However, I'm having trouble running it. The readme explains how to train on my own dataset, which I can't really do since I don't have the noises separated out in my own audio files. It also says I can separate my own file using one of the models in the models folder, but when I follow the readme and try to create a model from the toy dataset, I get the following error:

File "/mnt/c/Users/imrea/PycharmProjects/svoice/svoice/data/audio.py", line 34, in find_audio_files
    siginfo, _ = torchaudio.info(file)
TypeError: cannot unpack non-iterable AudioMetaData object

How do I run this to test the output on an audio file of my own? Has anyone used this before? Any guidance would be greatly appreciated!



Solution 1:[1]

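You need torchaudio version 0.6.0. The error comes from newer torchaudio releases, where torchaudio.info returns a single AudioMetaData object instead of a tuple, so the two-value unpacking in svoice/data/audio.py fails. Try:

    pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 torchaudio==0.6.0 -f https://download.pytorch.org/whl/torch_stable.html

This worked for me. Note that the +cu101 builds target CUDA 10.1.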
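If pinning old versions is not an option, another approach might be to patch the failing line so it accepts either return shape. The following is only a sketch, not something taken from the svoice repo: `frames_per_channel` is a hypothetical helper name, and the attribute names on the old tuple form (`length`, `channels`) are an assumption about torchaudio 0.6.0. The newer object's `num_frames` attribute is part of AudioMetaData. The stand-in objects at the bottom only mimic the two shapes so the snippet runs without torchaudio installed.

```python
from types import SimpleNamespace


def frames_per_channel(info):
    """Return frames per channel from the result of torchaudio.info(path).

    Handles both return shapes:
      * newer torchaudio: one metadata object exposing `num_frames`
      * older torchaudio (assumed 0.6.0 layout): a (signal_info, encoding_info)
        tuple, where signal_info.length counts samples across all channels
    """
    if hasattr(info, "num_frames"):
        # Newer API: num_frames is already per channel.
        return info.num_frames
    # Older API: first tuple element holds the signal info (assumption).
    siginfo = info[0]
    return siginfo.length // siginfo.channels


# Stand-ins that mimic the two shapes; real code would pass torchaudio.info(file).
new_style = SimpleNamespace(sample_rate=8000, num_frames=16000, num_channels=2)
old_style = (SimpleNamespace(rate=8000, length=32000, channels=2), SimpleNamespace())

print(frames_per_channel(new_style))
print(frames_per_channel(old_style))
```

In find_audio_files you would then call the helper on torchaudio.info(file) in place of the tuple unpacking. Other parts of the repo may depend on the old torchaudio API as well, so pinning the versions as above is still the safer route.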

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 cchoi1022