How to mix audio description track into stereo mix in FFMPEG or SOX

I have a video file with a stereo mix. I have also been provided with an additional audio description track (a narration which describes what's happening on screen for visually-impaired audiences) as a mono WAV.

I am trying to mix the two together, but the tricky part is adjusting the levels. The level of the main mix should be dipped just before each line of speech in the AD track and raised again afterwards.

The company that produced the AD track has offered to do this for a fee. However, I noticed that the fee is the same regardless of the length of the film, so I assume it must be an automated process (if it involved a sound mixer in a studio, it would be charged at a per-minute rate).

I'm wondering if it's possible to do this myself in FFMPEG.

The AD track is cleanly recorded at a consistent level and is entirely silent between the lines of narration. So I imagine it would, in principle, be possible to determine automatically where the main mix needs to go down and back up.

I would probably need to:

1. Analyse the levels of the AD track and convert them to a list of "fade down here", "fade up here" instructions.
2. Apply that list of instructions to the main mix to create an intermediate.
3. Mix the intermediate together with the AD track.
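To make the first step concrete, here is a rough Python sketch of what I have in mind. It assumes the AD track has already been decoded into a list of mono float samples in the range -1.0 to 1.0. The function names, the silence threshold, the minimum gap and the fade lead time are all hypothetical values I picked for illustration, not anything taken from FFMPEG or SOX.

```python
def find_speech_regions(samples, sample_rate, threshold=0.01, min_gap=0.5):
    """Return (start, end) times in seconds where the AD track is not silent.

    Loud samples separated by less than min_gap seconds of silence are
    treated as one region, so short pauses inside a sentence do not make
    the main mix pump up and down.
    """
    regions = []
    start = None      # start time of the region currently being built
    last_loud = None  # time of the most recent sample above the threshold
    for i, sample in enumerate(samples):
        if abs(sample) <= threshold:
            continue
        t = i / sample_rate
        if start is None:
            start = t
        elif t - last_loud >= min_gap:
            # The silence was long enough: close the old region, open a new one.
            regions.append((start, last_loud))
            start = t
        last_loud = t
    if start is not None:
        regions.append((start, last_loud))
    return regions


def duck_instructions(regions, lead=0.5):
    """Turn speech regions into ("fade down", t) / ("fade up", t) instructions.

    The fade down is scheduled `lead` seconds before the speech starts
    (clamped at 0) so the main mix is already low when the narration begins.
    """
    instructions = []
    for start, end in regions:
        instructions.append(("fade down", max(0.0, start - lead)))
        instructions.append(("fade up", end))
    return instructions


if __name__ == "__main__":
    # Tiny synthetic "AD track" at 10 samples per second: two bursts of speech.
    rate = 10
    track = [0.0] * 10 + [0.5] * 5 + [0.0] * 10 + [0.5] * 5 + [0.0] * 3
    for action, when in duck_instructions(find_speech_regions(track, rate)):
        print(f"{when:6.2f}s  {action}")
```

The resulting list would still have to be turned into something the mixing tool understands, which is the part I am unsure about.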

The final step could be achieved with the amix filter, but I have little idea how to approach the first 2 steps.
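One possibility I have come across, which is not the explicit instruction list described above, is FFMPEG's sidechaincompress filter: it lowers the main mix whenever a second "sidechain" input (here the AD track) is loud, and its output can then go into amix. The Python sketch below only builds the command line. The helper name, the file names and the threshold, ratio, attack and release values are my own guesses, and I have not verified the result against a real film.

```python
import shlex


def build_ducking_command(video, ad_track, output,
                          threshold=0.02, ratio=10,
                          attack_ms=200, release_ms=1000):
    """Build an ffmpeg argument list that ducks the main mix under the AD track.

    All numeric defaults are illustrative assumptions and will need tuning
    by ear for a given film.
    """
    filter_graph = (
        # Split the AD track: one copy drives the compressor, one is mixed in.
        "[1:a]asplit=2[sc][ad];"
        # Compress the main mix whenever the sidechain (AD) signal is loud.
        f"[0:a][sc]sidechaincompress=threshold={threshold}:ratio={ratio}"
        f":attack={attack_ms}:release={release_ms}[ducked];"
        # Mix the ducked main audio with the AD narration.
        "[ducked][ad]amix=inputs=2:duration=first[mix]"
    )
    return [
        "ffmpeg", "-i", video, "-i", ad_track,
        "-filter_complex", filter_graph,
        "-map", "0:v", "-map", "[mix]",
        "-c:v", "copy",  # leave the video stream untouched
        output,
    ]


if __name__ == "__main__":
    # Hypothetical file names, printed as a shell command for inspection.
    print(shlex.join(build_ducking_command("film.mp4", "ad.wav", "film_ad.mp4")))
```

As far as I understand, amix scales its inputs by default, so the overall level of the result may need adjusting afterwards.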

Does anyone know if this is achievable with FFMPEG? I'd also be open to using other programs such as SOX.

ffmpeg sox

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
