
[D] SOTA topic extraction

TLDR: Are there non-LDA algorithms for topic modeling that are performant or state-of-the-art?

I’m working for a company that has a corpus of 10k articles for which they’d like to have topics identified and extracted. The company has a specific clientele; therefore the articles are already quite focused and topical (i.e., an engineering company would probably only write articles about engineering or engineering-adjacent things). Essentially, I’m trying to mine articles for sub-topics within our area of expertise.

I’m aware of LDA/LDA2Vec for topic modeling. In our case, since all of the articles already fall under the same umbrella topic, the “topics” found via LDA tend to overlap heavily; relevance and salience metrics tend to prioritize words that relate both to the umbrella topic and the subtopic (unhelpful), or that are extremely rare (useless). This is after multiple passes of filtering out frequent, rare, and low-value words.

I guess I’m hoping for something that either draws inferences from semantic meaning or uses a more sophisticated “topic” definition than probabilistic co-occurrence.

Thanks!

submitted by /u/namnnumbr