[D] Analyzing thousands of podcast transcripts – interesting project ideas & best algos?
Hi – we’re a small startup focusing on podcast transcripts. We are working to make them readable, searchable, etc. We have previously used TF-IDF + LDA topic modeling to extract underlying topics in the corpus and to find related podcasts.
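For anyone unfamiliar with the approach, the TF-IDF half of a "related podcasts" step could look roughly like the sketch below. It is a simplified pure-Python illustration under several assumptions: whitespace tokenization, a smoothed IDF formula, cosine similarity as the relatedness measure, and hypothetical function names (`tfidf_vectors`, `cosine`, `most_related`). The LDA step is left out entirely.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per transcript (smoothed IDF, an assumption)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many transcripts contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_related(index, vectors):
    """Return the position of the transcript most similar to vectors[index]."""
    scores = [
        (cosine(vectors[index], vec), i)
        for i, vec in enumerate(vectors)
        if i != index
    ]
    return max(scores)[1]
```

With three toy transcripts, two about crypto and one about baking, `most_related(0, tfidf_vectors(docs))` should point at the other crypto transcript. On a real corpus you would want proper tokenization, stop-word removal, and a library implementation rather than this hand-rolled version.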
Potential ideas for future interesting projects include:
- automatically identifying ads
- picking up trends/sentiment on politics/people/companies, etc.
- finding promo codes
- auto-generating podcast summaries
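
To make the promo-code idea concrete, a naive first pass might be a regex over the transcript text, as sketched below. The cue phrases, the 3 to 20 character length limit, the small stop-word filter, and the names `PROMO_PATTERN` and `find_promo_codes` are all assumptions for illustration. Real ad reads (and speech-to-text output, where codes may be spelled out letter by letter) will be messier than this.

```python
import re

# Hypothetical cue phrases that often precede a code in an ad read.
PROMO_PATTERN = re.compile(
    r"\b(?:promo code|offer code|coupon code|use code|enter code|code)"
    r"\s+([A-Za-z0-9]{3,20})\b",
    re.IGNORECASE,
)

# Common words that can follow "code" without being a promo code (assumed list).
STOPWORDS = {"THE", "FOR", "AND", "THAT", "THIS", "WITH", "YOU", "YOUR"}


def find_promo_codes(transcript):
    """Return candidate promo codes, upper-cased, in order of appearance."""
    candidates = (m.group(1).upper() for m in PROMO_PATTERN.finditer(transcript))
    return [code for code in candidates if code not in STOPWORDS]
```

This only produces candidates. Something like a frequency check across episodes of the same show would probably be needed to separate real codes from false positives.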
What would you find most interesting, and why?