[P] Structure-preserving dimensionality reduction in very large datasets
Hi there, we’re a London-based research team working on clinical applications of machine learning. Recently, we’ve been dealing a lot with clinical datasets that exceed 1M+ observations and 20K+ features. We found that traditional dimensionality reduction and feature extraction methods don’t scale to this data without subsampling, and are actually quite poor at preserving both the global and local structure of the data. To address these issues, we’ve been looking into Siamese Networks for non-linear dimensionality reduction and metric learning applications. We are making our work available through an open-source project: https://github.com/beringresearch/ivis
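For anyone unfamiliar with how Siamese Networks learn embeddings, the core idea is a triplet margin loss: pull an anchor point toward a "positive" neighbour and push it away from a "negative" non-neighbour. Here's a minimal NumPy sketch of that loss (illustrative only, with a standard Euclidean triplet loss – not ivis's actual implementation; the function name and margin value are our own):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: embeddings are trained so the
    anchor ends up closer to the positive than to the negative,
    by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings: positive close to the anchor, negative far away.
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])
n = np.array([3.0, 0.0])

print(triplet_loss(a, p, n))  # 0.0 – the margin constraint is already satisfied
```

Because the positives can be picked from each point's nearest neighbours, the loss preserves local structure, while the negatives keep distant regions apart – which is what lets this approach retain global structure too.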
So far, we’ve applied ivis to single-cell datasets, images, and free text – we’re really keen to see what other applications could be enabled! We’ve also run a large number of benchmarks looking at both the accuracy of embeddings and processing speed – https://bering-ivis.readthedocs.io/en/latest/timings_benchmarks.html – and can see that ivis begins to stand out on datasets with 250K+ observations. We’re really excited to make this project open source – there’s so much potential for Siamese Networks beyond one-shot learning!