
[P] Using protein sequences to make better classifiers in bioinformatics

As a data scientist in the bioinformatics field, I have often found it useful to add features describing proteins to my models. These features were usually manually engineered or derived from heuristics and sequence alignments. They also lacked information about protein structure, because structural data is relatively sparse.
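
To make "manually engineered features" concrete, here is a minimal Python sketch of one classic sequence descriptor: amino acid composition, the fraction of each of the 20 standard amino acids in a sequence. This is an illustrative example, not code from the post. The function name `aa_composition` is hypothetical, and ignoring non-standard residue codes (X, B, Z, ...) is an assumption of this sketch.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector lines up.
STANDARD_AAS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(sequence: str) -> list[float]:
    """Hypothetical helper: 20-dimensional vector of standard amino acid frequencies."""
    counts = Counter(sequence.upper())
    # Assumption: non-standard codes (X, B, Z, ...) are ignored in the total.
    total = sum(counts[aa] for aa in STANDARD_AAS)
    if total == 0:
        return [0.0] * len(STANDARD_AAS)
    return [counts[aa] / total for aa in STANDARD_AAS]

features = aa_composition("MKTAYIAKQR")
print(len(features))  # 20
```

A descriptor like this captures sequence content but nothing about how the chain folds, which is the gap the structure-informed embeddings discussed below aim to fill.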

Recently I came across a paper by Bepler and Berger, published at ICLR 2019, in which they trained a set of models that use weak supervision to produce protein embeddings. In this blog post I look at the theory behind the paper and present an intermediate-level tutorial for readers who want to include these embeddings in their own models. I also include a comprehensive analysis of the predictive power of these embeddings.
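
One common way to feed sequence embeddings into a conventional classifier is to pool them into a fixed-length vector and concatenate that vector with existing features. The Python sketch below shows this under stated assumptions: the embedding model is assumed to return one d-dimensional vector per residue (an L x d matrix, here a plain list of lists), and mean pooling is chosen purely for illustration. The helper names `mean_pool` and `build_feature_vector` are hypothetical and are not part of the paper's code or the post's tutorial.

```python
def mean_pool(residue_embeddings: list[list[float]]) -> list[float]:
    """Average L per-residue vectors (L x d) into one d-dimensional vector."""
    if not residue_embeddings:
        raise ValueError("need at least one residue embedding")
    dim = len(residue_embeddings[0])
    if any(len(row) != dim for row in residue_embeddings):
        raise ValueError("all residue embeddings must have the same dimension")
    n = len(residue_embeddings)
    return [sum(row[j] for row in residue_embeddings) / n for j in range(dim)]

def build_feature_vector(residue_embeddings: list[list[float]],
                         handcrafted: list[float]) -> list[float]:
    """Concatenate the pooled embedding with existing hand-engineered features."""
    return mean_pool(residue_embeddings) + list(handcrafted)

# Toy values (not real embeddings): 3 residues with 2-dimensional embeddings,
# plus 2 hand-engineered features.
emb = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = build_feature_vector(emb, [0.5, 0.25])
print(x)  # [3.0, 4.0, 0.5, 0.25]
```

Mean pooling makes proteins of different lengths comparable as fixed-size inputs. Max pooling or a learned attention layer are alternatives when position-specific signals matter.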

https://stephanheijl.com/protein_sequence_ml.html

submitted by /u/Yuras_Stephan