[D] Classifier for tSNE or UMAP results?
Recently I worked on a binary classification problem. The input data is a high-dimensional (>100) series. I used PCA to reduce the input to a much lower dimension (<10), then applied Gradient Boosting to it, and this seems to give good results. However, I want to improve the results by replacing the PCA step, since PCA is a linear projection and the underlying structure is not necessarily linear.
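For reference, here is a rough sketch of the kind of baseline I mean. It uses synthetic data from `make_classification` as a stand-in for my series, and the scaler, the 8 components, and the ROC AUC scoring are illustrative assumptions rather than my actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: 500 samples, 120 features, binary labels.
X, y = make_classification(
    n_samples=500, n_features=120, n_informative=10, random_state=0
)

# Scale, project to a small number of principal components, then classify.
# n_components=8 is an arbitrary choice below 10, not a tuned value.
baseline = make_pipeline(
    StandardScaler(),
    PCA(n_components=8),
    GradientBoostingClassifier(random_state=0),
)

# Fitting PCA inside the pipeline keeps the projection from seeing the test folds.
scores = cross_val_score(baseline, X, y, cv=5, scoring="roc_auc")
print(scores.mean())
```

Keeping the reduction step inside the pipeline means any replacement for PCA gets evaluated under the same cross-validation splits.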
I tried both tSNE and UMAP, and they can bring out clusters even in 2D. However, I don’t know what to do next:
- Should I use a clustering algorithm like DBSCAN to do the binary classification? If so, how? One issue is that although I can see a cluster of positives, there are also clusters of mixed positives and negatives that I couldn’t label;
- I tried feeding the UMAP results into Gradient Boosting and, to my surprise, it actually gives poorer classification than PCA + Gradient Boosting. One issue, I believe, is that I only tried tSNE and UMAP at 2 or 3 dimensions because of the computation time involved. So is there a way (in tSNE or UMAP) to estimate the intrinsic dimension of an input dataset, analogous to the explained variance or factor loadings in PCA?
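To be concrete about the PCA diagnostic I am comparing against, this is roughly how I read the explained variance to pick a dimension. Again the data is a synthetic stand-in, and the 95% threshold is just an assumed cutoff for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: 500 samples, 120 features.
X, _ = make_classification(
    n_samples=500, n_features=120, n_informative=10, random_state=0
)
X_scaled = StandardScaler().fit_transform(X)

# Fit PCA with all components and accumulate the explained variance ratios.
pca = PCA().fit(X_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative explained variance
# reaches the (assumed) 95% threshold.
n_components_95 = int(np.searchsorted(cumulative, 0.95) + 1)
print(n_components_95)
```

What I am looking for is an equivalent of this kind of readout for tSNE or UMAP.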
I have read many articles on how to use tSNE/UMAP properly, but most of them seem to focus on visualization and clustering rather than on classification.