[D] Classifier for tSNE or UMAP results?

Recently I worked on a binary classification problem. The input data are high-dimensional (>100 dimensions) series. I used PCA to reduce the input to a much smaller dimension (<10), then applied Gradient Boosting, and this seems to give good results. However, I want to improve the results by replacing the PCA step, since PCA is linear and the structure of the data is not necessarily linear.
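A minimal sketch of a PCA + Gradient Boosting setup like the one described above, assuming scikit-learn. The data come from `make_classification` as a hypothetical stand-in for the poster's series, and the scaling step and the component counts tried are assumptions. Putting the reducer inside a `Pipeline` means it is refit within every cross-validation fold, and the cumulative explained variance at the end is the kind of PCA diagnostic the post asks about for tSNE/UMAP.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the real data: 500 series with 120 dimensions.
X, y = make_classification(n_samples=500, n_features=120, n_informative=8,
                           random_state=0)

def make_model(reducer):
    """Scale -> reduce -> gradient boosting (scaling is an assumption).

    Because the reducer is a pipeline step, cross-validation refits it on
    each training fold, so the held-out fold never influences the projection.
    """
    return Pipeline([
        ("scale", StandardScaler()),
        ("reduce", reducer),
        ("clf", GradientBoostingClassifier(random_state=0)),
    ])

scores = {}
for k in (2, 5, 10):
    model = make_model(PCA(n_components=k, random_state=0))
    scores[k] = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"PCA({k}) + GB: mean ROC AUC = {scores[k]:.3f}")

# PCA's built-in hint for choosing k: cumulative explained variance.
pca = PCA().fit(StandardScaler().fit_transform(X))
cumvar = np.cumsum(pca.explained_variance_ratio_)
print("components needed for 90% of variance:",
      int(np.searchsorted(cumvar, 0.90) + 1))
```

`umap.UMAP` from the umap-learn package implements `fit`/`transform` and takes an `n_components` argument, so it can in principle be passed to `make_model` in place of `PCA`. scikit-learn's `TSNE` only offers `fit_transform` and cannot embed unseen points, so it does not fit into this kind of pipeline.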

I tried both tSNE and UMAP, and they can bring out clusters even in 2D. However, I don’t know what to do next:

  1. Should I use a clustering algorithm like DBSCAN to do the binary classification? If so, how? One issue is that although I can see a cluster of positives, there are also clusters of mixed positives and negatives that I can’t label.
  2. I tried feeding the UMAP results into Gradient Boosting and, to my surprise, it actually gave poorer classification than PCA + Gradient Boosting. One issue, I believe, is that I only tried tSNE and UMAP at 2 or 3 dimensions because of the computation time involved. So is there a way (in tSNE or UMAP) to know the intrinsic dimension of an input dataset, like the explained variance or factor loadings in PCA?
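For item 1, one possible approach (a sketch, not the poster's method) is to use clustering only to summarise the embedding: run DBSCAN on the 2D coordinates, compute the positive rate inside each cluster, and call a cluster positive or negative only when it is nearly pure. The embedding and labels below are synthetic and hypothetical, and the `eps`, `min_samples`, and 0.9 purity threshold are arbitrary assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Hypothetical stand-in for a 2-D UMAP/tSNE embedding: three separated blobs.
emb, blob_id = make_blobs(n_samples=600, centers=[[0, 0], [6, 0], [0, 6]],
                          cluster_std=0.6, random_state=0)

# Made-up labels: blob 0 all positive, blob 1 all negative, blob 2 a coin flip.
rng = np.random.default_rng(0)
y = np.where(blob_id == 0, 1,
             np.where(blob_id == 1, 0, rng.integers(0, 2, size=len(blob_id))))

# DBSCAN assigns -1 to noise points; eps and min_samples are assumptions
# that would need tuning for a real embedding.
cluster = DBSCAN(eps=0.5, min_samples=10).fit_predict(emb)

def cluster_report(cluster, y, purity=0.9):
    """Map each non-noise cluster id to (size, positive rate, decision)."""
    report = {}
    for c in np.unique(cluster):
        if c == -1:
            continue  # skip noise points
        mask = cluster == c
        rate = float(y[mask].mean())
        if rate >= purity:
            decision = "positive"
        elif rate <= 1 - purity:
            decision = "negative"
        else:
            decision = "mixed"  # the embedding alone does not separate classes here
        report[int(c)] = (int(mask.sum()), rate, decision)
    return report

report = cluster_report(cluster, y)
for c, (size, rate, decision) in report.items():
    print(f"cluster {c}: n={size}, positive rate={rate:.2f} -> {decision}")
```

In scikit-learn, `DBSCAN` has no `predict` method, so labelling new points would need an extra step (for example a nearest-neighbour lookup into the labelled embedding). The "mixed" clusters are where a supervised model is still needed.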
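For item 2, tSNE and UMAP do not report anything like PCA's explained variance. A separate option, not mentioned in the original post, is to estimate the intrinsic dimension directly from nearest-neighbour distances. The sketch below implements the Two-NN estimator (Facco et al., 2017) in its maximum-likelihood form, assuming NumPy and scikit-learn; the check uses a synthetic 2D plane embedded in 100 dimensions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_dimension(X):
    """Two-NN intrinsic dimension estimate (maximum-likelihood form).

    For each point, mu = r2 / r1, the ratio of the distances to its second
    and first nearest neighbours. Under the Two-NN model mu follows a Pareto
    law with exponent d, whose maximum-likelihood estimate is N / sum(log mu).
    """
    # n_neighbors=3 because each query point's nearest neighbour is itself.
    dist, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    r1, r2 = dist[:, 1], dist[:, 2]
    keep = r1 > 0  # drop exact duplicates, which would divide by zero
    mu = r2[keep] / r1[keep]
    return float(keep.sum() / np.sum(np.log(mu)))

rng = np.random.default_rng(0)
# Synthetic check: a 2-D square mapped linearly into 100-D space.
latent = rng.uniform(size=(2000, 2))
basis = rng.normal(size=(2, 100))
X_plane = latent @ basis
print(f"estimated intrinsic dimension: {twonn_dimension(X_plane):.2f}")
```

Applied to the real (scaled) input, such an estimate could suggest an `n_components` to try for UMAP instead of stopping at 2 or 3. It is sensitive to noise and duplicated rows, so it is best treated as a rough guide.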

I have read many articles on how to use tSNE/UMAP properly, but most of them seem to focus on visualization and clustering.

submitted by /u/dinoaide