[D] Is there a way to prove that there are no clusters in a population?
For example, I am looking for clusters in a dataset of 10,000 binary variables. I reduce it to 50 variables with PCA, then apply t-SNE to those and look for clusters in the output.
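To make the setup concrete, here is a minimal sketch of that pipeline with scikit-learn. The random binary matrix is only a hypothetical stand-in for the real data (and much smaller so it runs quickly), and the perplexity and seeds are arbitrary choices, not tuned values.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Hypothetical stand-in for the real data: 300 samples x 500 binary variables.
X = rng.integers(0, 2, size=(300, 500)).astype(float)

# Step 1: reduce to 50 principal components.
X_pca = PCA(n_components=50, random_state=0).fit_transform(X)

# Step 2: embed the PCA scores in 2D with t-SNE for visualisation.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_pca)

print(X_pca.shape, embedding.shape)
```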
There are separated shapes in the t-SNE visualisation, but as I understand it, points that are far apart in the output are not necessarily far apart in the original high-dimensional space. t-SNE can even show apparent clusters in a normally distributed dataset.
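One rough way to check how much the embedding distorts distances is to compare pairwise distances before and after t-SNE with a rank correlation. The sketch below does this on synthetic Gaussian data that has no clusters by construction; the data, sizes, and parameters are assumptions for illustration only.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic data with no cluster structure: one isotropic Gaussian in 50 dimensions.
X_gauss = rng.normal(size=(300, 50))

emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_gauss)

# Rank correlation between pairwise distances in the original space and in the embedding.
# A value well below 1 means large distances in the plot are not reliable.
rho, _ = spearmanr(pdist(X_gauss), pdist(emb))
print(f"Spearman correlation of pairwise distances: {rho:.2f}")
```

Plotting `emb` for a few different perplexity values on data like this is also a quick way to see what "clusters" look like when none exist.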
Can we prove that the data follow a normal distribution in all directions of the space? Is there a way to remove variables that would only add noise to the clustering, like some kind of variable selection, but for clustering?
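This would not be a proof, but one possible approach is a Monte Carlo comparison against a single-Gaussian null: fit one Gaussian to the data, simulate datasets from it, and check whether the real data clusters more strongly than the simulations. The sketch below uses the k-means silhouette score as the cluster-strength statistic; the function names `cluster_strength` and `gaussian_null_pvalue` are hypothetical, and the choices of k = 2 and 20 simulations are arbitrary assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def cluster_strength(X, k=2, seed=0):
    """Silhouette score of a k-means partition (higher = more cluster-like)."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    return silhouette_score(X, labels)


def gaussian_null_pvalue(X, k=2, n_null=20, seed=0):
    """Compare the observed statistic with draws from one Gaussian fitted to X."""
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    observed = cluster_strength(X, k, seed)
    null = [
        cluster_strength(rng.multivariate_normal(mean, cov, size=len(X)), k, seed)
        for _ in range(n_null)
    ]
    # Monte Carlo p-value with the usual +1 correction.
    p_value = (1 + sum(s >= observed for s in null)) / (n_null + 1)
    return observed, p_value


# Toy check on data that does contain two well-separated groups.
rng = np.random.default_rng(1)
two_blobs = np.vstack([rng.normal(0, 1, (150, 5)), rng.normal(6, 1, (150, 5))])
obs, p = gaussian_null_pvalue(two_blobs)
print(f"silhouette = {obs:.2f}, p-value vs. Gaussian null = {p:.3f}")
```

A large p-value only means this particular statistic found nothing beyond what a single Gaussian would produce. It does not show that the data is Gaussian in every direction or that no clusters exist.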
submitted by /u/elpiro