Skip to main content

Blog

Learn About Our Meetup

5000+ Members

MEETUPS

LEARN, CONNECT, SHARE

Join our meetup, learn, connect, share, and get to know your Toronto AI community. 

JOB POSTINGS

INDEED POSTINGS

Browse through the latest deep learning, ai, machine learning postings from Indeed for the GTA.

CONTACT

CONNECT WITH US

Are you looking to sponsor space, be a speaker, or volunteer, feel free to give us a shout.

[D] Word embeddings for categorical variables?

I am working on a classification problem with a data set containing numerical as well as categorical data. A colleague of mine said that instead of encoding the categorical variables in a “primitive way” (label encoding, creating dummy variables etc.) he would use word2vec to get some kind of word embeddings. This would be a more realistic way of representing these variables. To me this makes no sense. If I understood correctly, for word2vec to work the words we want to embed need neighbors for there to be some kind of context. In a column of a DataFrame containing one string in each row and maybe 3 – 10 unique categories there isn’t any context. Each entry is independent from the entry in the next row. Am I missing something?

I hope I posed the question in a somewhat understandable way.

Thanks, guys.

submitted by /u/aeppelsaeft
[link] [comments]