[D] Is state-of-the-art speech synthesis at the point where it could be used to narrate a book or other long-form literary works?
Recent papers on Tacotron and its derivative models often include appendices with short speech samples that appear to have crossed the threshold of sounding perfectly human. However, those samples are only a few seconds long, so I wonder whether anyone has tried using these models to narrate long-form text, and how the results would rate against a proper human narrator.
submitted by /u/leostrauss