[D] I have acquired the videos for the AVSpeech dataset, and I want to share them for research. How?
Cleanup is still in progress, as the dataset contains many errors that could skew anyone’s research. For example, a dubbed translation over a speaker, or an off-screen narrator, will produce false positives if used as training data.
After cleanup, I expect the raw output of clipped resources to be approximately 10 TB. Is there a place that would host the dataset for others to use?