[D] Find text in a long audio clip.
Are there any ML models or tools that take a sentence (text) and a 1-hour audio clip (mp3/wav) as input and output the time span(s) where that sentence is spoken in the audio?
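To make the desired input and output concrete, here is a rough sketch of the lookup step only. It assumes some speech recognizer or forced aligner has already produced word-level timestamps for the clip (an assumption; no specific model or tool is implied). The function names, the `(word, start_sec, end_sec)` tuple format, and the 0.8 similarity threshold are all hypothetical choices for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, drop punctuation, and split into word tokens."""
    return re.sub(r"[^a-z0-9']+", " ", text.lower()).split()


def find_sentence_spans(words, sentence, threshold=0.8):
    """Return [(start_sec, end_sec, score), ...] where `sentence` is spoken.

    `words` is a list of (word, start_sec, end_sec) tuples, assumed to come
    from an ASR or forced-alignment step that is not shown here.
    """
    target = normalize(sentence)
    tokens = []
    for word, start, end in words:
        tok = "".join(normalize(word))
        if tok:  # skip entries that are pure punctuation
            tokens.append((tok, start, end))
    n = len(target)
    if n == 0 or len(tokens) < n:
        return []

    # Score every window of n consecutive words against the target sentence.
    hits = []
    for i in range(len(tokens) - n + 1):
        window = [tok for tok, _, _ in tokens[i:i + n]]
        score = SequenceMatcher(None, target, window).ratio()
        if score >= threshold:
            hits.append((score, i))

    # Keep the best-scoring windows first and drop any that overlap them.
    hits.sort(key=lambda h: (-h[0], h[1]))
    chosen = []
    for score, i in hits:
        if all(abs(i - j) >= n for _, j in chosen):
            chosen.append((score, i))

    chosen.sort(key=lambda h: h[1])
    return [(tokens[i][1], tokens[i + n - 1][2], score) for score, i in chosen]
```

The fuzzy score is there because a transcript rarely matches the query word for word; a threshold below 1.0 tolerates a misrecognized word or two, at the cost of possible false positives on short sentences.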
submitted by /u/marksbren