Author: torontoai

Great Data Science Company Blogs

List of popular company Data Science blogs

Some of the best technology companies showcase their innovation on their blogs from time to time. These blogs are a great resource when you are preparing for company-specific data science interviews. From a company's perspective, the blogs help attract data professionals. In the last few years, companies have been racing to hire data talent and have started showcasing their data science technology and techniques in separate data science, machine learning or AI sections on their blogs.

At Acing Data Science, we consume a lot of papers, blogs, videos and podcasts about data science. Lots of companies write about data science, but below are our top picks of company data science blogs. Each of these blogs covers one or a few aspects of data science in a way that helps the whole data science community.

  • Google: Google is where some of the very early research in data science and AI began. Their AI blog is one of the most mature and complete examples of what an AI blog can look like. The blog covers everything from publications, stories, open source data science frameworks, data sets, tools and learning courses to careers at this AI institution.
  • Uber: Uber AI Labs has a fantastic set of articles which give us a sneak peek into the great work going on within Uber. Uber's blog also describes the building blocks of its coveted ML-as-a-service platform, Michelangelo. Uber has also open sourced many data engineering and data science frameworks and written about them on its blog.
  • Facebook: Facebook has been doing great work in computer vision and conversational AI. They have open sourced PyTorch, which is increasingly cited in papers on arXiv. Their blog also covers publications, experiments and techniques within Facebook which help advance the data science field.
  • Airbnb AI & Machine Learning: Airbnb has one of the best AI and ML company blogs. They have done some amazing work using deep learning models on search, listing photos and a host of other things. Airbnb data scientists are split across teams, an arrangement described in detail by Elena Grewal. It shows some of the best ways to think about building and managing teams within product companies.
  • Instacart Data Science | Instacart ML: Instacart handles more than 200 million grocery items on their platform. The blog showcases their data engineering prowess. It also shows some of the techniques they apply to critical business areas like delivery, cost prediction and real-time availability of grocery items, along with some great data visualizations built on their data.
  • OpenAI blog: OpenAI’s mission is to ensure that artificial general intelligence benefits all of humanity. OpenAI publishes some great papers and findings on their blog, which are at the cutting edge of AI.
  • Stitch Fix: Stitch Fix has the most underrated data science blog, especially for its data visualizations. Their algorithms tour is one of the best ways I have seen data scientists explain what their product does. Their blog (MultiThreaded) does not have a separate section for data science, but it covers the interesting things they do within Stitch Fix.

This is by no means an exhaustive list of company blogs to follow and read. These blogs have some of the best data science content, which is helpful for all data professionals!

The sole motivation of this blog article is to learn about the different AI company blogs and their technologies. All data is sourced from public online sources. I aim to make this a living document, so updates and suggested changes can always be included. Please provide relevant feedback.


Great Data Science Company Blogs was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

TVEyes transcribing 13,000 podcasts… should I be impressed? [News]

The company TVEyes says it is now transcribing 1,500 hours of podcasts every day (press release). TVEyes clients receive alerts when they are mentioned in one of the 13,000 most popular podcasts.

I’ve never worked with audio, so I’m curious how impressive it is to be able to process 62.5 hours of spoken audio per hour. Is this something that could be done with an off-the-shelf algorithm and a $5,000 server, or are we talking about PhD-level expertise and massive computing power?
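To make the arithmetic in the question explicit, here is a small Python sketch. The figure of 1,500 hours per day comes from the post; the per-worker speeds passed to `workers_needed` are purely hypothetical values chosen for illustration and say nothing about any real speech-to-text system.

```python
import math

# Figure quoted in the post.
HOURS_TRANSCRIBED_PER_DAY = 1_500


def real_time_factor(audio_hours_per_day: float) -> float:
    """Hours of audio that must be processed per wall-clock hour."""
    return audio_hours_per_day / 24


def workers_needed(audio_hours_per_day: float, speed_per_worker: float) -> int:
    """Parallel workers required to keep up with the incoming audio.

    speed_per_worker is a hypothetical number: audio hours one worker
    transcribes per wall-clock hour (1.0 means exactly real time).
    """
    return math.ceil(real_time_factor(audio_hours_per_day) / speed_per_worker)


if __name__ == "__main__":
    print(real_time_factor(HOURS_TRANSCRIBED_PER_DAY))
    # Assumed speeds, for illustration only.
    for speed in (1.0, 10.0):
        print(speed, workers_needed(HOURS_TRANSCRIBED_PER_DAY, speed))
```

Under these assumptions, the required capacity scales inversely with how much faster than real time a single worker runs, so the answer to the question depends mostly on that one number.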

submitted by /u/TrueBirch

[D] Best of AI News of September

Hi, here is a recap of 10 news items and reflections that came up in September. Did I forget an important one? What are the most important ones for you in October?

Here is the list:

  • Documentary aired on how AI was seen in 1960
  • TensorFlow 2.0 released (release notes)
  • Reflection on how to make technology work WITH society
  • MIT is helping doctors detect patients' pain
  • Launch of the Facebook & Microsoft Deepfake challenge
  • How the US government overcomes the European GDPR law
  • Google fined for targeting children
  • Report published on the worldwide expansion of AI
  • Google published a new multilingual speech recognition model
  • Report on the pollution associated with artificial intelligence

https://www.sicara.ai/blog/2019-10-21-best-of-ai-september-2019

submitted by /u/kouskastook

[D] ML Inference optimization, runtimes, compilers

I’m doing a study on inference latency. What are the different ways of optimizing a model for this? Let’s say the goal is to get inference latency as low as possible. I’ve heard of ONNX Runtime (apparently used by Microsoft in production) and compilers such as Intel nGraph, TVM, Intel OpenVINO and so on. Are these kinds of tools used in production, or do most companies just use PyTorch and TF in inference mode? If anyone here has experience from unique deployments, I’d love to hear about it!
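Whichever runtime or compiler is being compared, the comparison needs a consistent way to measure latency. The following is a minimal, framework-agnostic sketch using only the Python standard library. `measure_latency` and `dummy_model` are hypothetical names introduced here, and `dummy_model` is just a stand-in workload; in a real study it would be replaced by a call into the runtime under test.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(
    predict: Callable[[], object], warmup: int = 10, runs: int = 100
) -> Dict[str, float]:
    """Time repeated calls to predict() and report latency in milliseconds."""
    # Warm-up calls are not timed, so one-off costs (lazy initialisation,
    # caches, JIT compilation) do not distort the measurements.
    for _ in range(warmup):
        predict()

    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        predict()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    samples_ms.sort()
    # Simple nearest-rank style index for the 95th percentile.
    p95_index = min(runs - 1, int(0.95 * runs))
    return {
        "p50_ms": statistics.median(samples_ms),
        "p95_ms": samples_ms[p95_index],
        "mean_ms": statistics.mean(samples_ms),
    }


def dummy_model() -> int:
    """Hypothetical stand-in for a single inference call."""
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(measure_latency(dummy_model, warmup=5, runs=50))
```

Reporting a tail percentile alongside the median is a deliberate choice here, since production latency targets are often stated in terms of the tail rather than the average.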

submitted by /u/dilledalle

[D] Are small transformers better than small LSTMs?

Transformers are currently beating the state of the art on different NLP tasks.

Some examples are:

  • Machine translation: Transformer Big + BT
  • Named entity recognition: BERT large
  • Natural language inference: RoBERTa

Something I noticed is that in all of the papers, the models are massive, with maybe 20 layers and hundreds of millions of parameters.

Of course, using larger models is a general trend in NLP, but it raises the question of whether small transformers are any good. I recently had to train a sequence-to-sequence model from scratch and I was unable to get better results with a transformer than with LSTMs.

I am wondering if someone here has had similar experiences or knows of any papers on this topic.

submitted by /u/djridu

[D] Looking for suggestions for biomedical datasets similar to the Wisconsin Breast cancer database

I am looking for biomedical databases similar to the Wisconsin breast cancer database (available at https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(original)). This database has 9 features (each taking integer values from 1 to 10) and two classes: benign and malignant. The defining characteristic of this dataset is that higher feature values generally indicate a higher chance of abnormality (malignancy). I am looking for other biomedical datasets whose features have this property (they need not be integer valued and can also be real valued; preferably with a low number of features as well, fewer than 30 or so).
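One way to screen a candidate dataset for this property is to check, feature by feature, how often an abnormal sample has a higher value than a normal one. The sketch below computes that fraction (equivalent to a per-feature AUC) in plain Python. The function name `feature_auc` and the toy values are invented for illustration; they are not taken from the Wisconsin dataset.

```python
from typing import Sequence


def feature_auc(values: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (abnormal, normal) pairs where the abnormal value is higher.

    labels use 1 for the abnormal class and 0 for the normal class.
    Ties count as half. A result near 1.0 means higher values go with
    abnormality; a result near 0.5 means the feature carries no such trend.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented toy feature on a 1-10 scale, three normal and three abnormal rows.
    toy_values = [1, 2, 1, 8, 10, 7]
    toy_labels = [0, 0, 0, 1, 1, 1]
    print(feature_auc(toy_values, toy_labels))
```

A dataset would match the description in the post if most of its features score well above 0.5 under this check. The pairwise loop is quadratic in the number of samples, which is acceptable for small biomedical tables but would need a rank-based formulation for large ones.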

submitted by /u/daffodils123

[D] What is the current state of the art in unsupervised document/information retrieval for NLP tasks?

Hello everybody,

Are there any good unsupervised methods for retrieving the top-k documents from a corpus based on a rather short query?

I did a bit of googling but couldn’t find anything that isn’t tf-idf based.

Maybe it would be possible to somehow retrieve similarities between docs and query by utilising contextual embeddings (such as from BERT) and use some sort of scoring function to evaluate it.
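The idea in the previous paragraph can be sketched as follows, assuming each document and the query have already been turned into fixed-size vectors by some encoder. The three-dimensional vectors below are made up for illustration and do not come from BERT or any real model; cosine similarity is used as the scoring function, which is one common choice rather than the only one.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(
    query_vec: Sequence[float], doc_vecs: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k documents most similar to the query, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings; in practice these would come from an encoder.
toy_docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.8, 0.1],
    "doc_c": [0.7, 0.3, 0.0],
}
toy_query = [1.0, 0.0, 0.0]

if __name__ == "__main__":
    for doc_id, score in top_k(toy_query, toy_docs, k=2):
        print(doc_id, round(score, 3))
```

This brute-force scan scores every document against the query, which is fine for a small corpus. A larger corpus would likely need an approximate nearest-neighbour index, which is outside the scope of this sketch.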

Anyway, thank you in advance for your answers.

submitted by /u/Slowai