
Acing Data Science Interviews

According to Indeed, there is a 344% increase in demand for data scientists year over year.

In January 2018, I started the Acing AI blog with the goal of helping people get into data science. My first article was about “The State of Autonomous Transportation”. As I wrote more, I realized people were interested in acing data science interviews, which led me to start a series of articles covering various companies’ data science interview questions and processes. The Acing AI blog will continue to publish interesting articles as always. That journey continues with today’s exciting announcements.

First, we are launching the Acing Data Science/Acing AI newsletter. This newsletter will always be free. We will be sharing interesting data science articles, interview tips, and more via the newsletter. Some of you are already subscribed and will continue to receive emails from it.

Through my first newsletter, I also wanted to share the next evolution of the Acing AI blog, Acing Data Science Interviews.

I partnered with Johnny to come up with an amazing course to help people ace data science interviews. Everything we have learned from conducting interviews, giving interviews, writing this blog, and learning from the best people in data science has been packaged into this course. Think of it as six-plus years of collective learning condensed into a three-month course. That is Acing Data Science Interviews.

At a high level, we will cover different topics from a data science interview perspective. These include SQL, coding, probability and statistics, data analysis, machine learning algorithms, advanced machine learning, machine learning system design, deep learning, neural networks, big data concepts, and finally how to approach a data science interview. The first few topics provide the foundational aspects of data science; they are followed by the applied data science topics. Collectively, these should encompass everything that could be asked in a data science interview.

The first sessions will start in the second half of September 2019. We are aiming for a small group of 15 people. The original course will be only $199. We are focused on quality and want to provide the best possible experience, which is why we are keeping the group size small.

Acing Data Science Interviews

Thank you for reading!


Acing Data Science Interviews was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Workday Data Science Interviews

In 2012, Workday launched a successful IPO valued at $9.5 billion.

Workday is a leading provider of enterprise cloud applications for finance and human resources. It was founded in 2005. Workday delivers financial management, human capital management, planning, and analytics applications designed for the world’s largest companies, educational institutions, and government agencies. In January 2018, Workday announced that it acquired SkipFlag, makers of an AI knowledge base that builds itself from a company’s internal communications. In July 2018, they acquired Stories.bi to boost augmented analytics. These two acquisitions point towards an increased investment in the data science domain.

Source: cloudfoundation.com

Interview Process

The process starts with a phone screen with a recruiter. That is followed by a technical phone interview with the hiring manager. The questions are typical machine learning and data science questions, with some data structures and algorithms questions mixed in. If both of those go well, there is an onsite interview.
The onsite consists of five interviews with different team members, hiring managers, and executives. The questions are about programming skills, algorithmic skills, data structures, and anything related to machine learning techniques.

Important Reading

Source: https://workday.github.io/scala/2014/05/15/managing-a-job-grid-using-akka

Data Science Related Interview Questions

  • Given data from the World Bank, provide insights on a small CSV file.
  • Write a C++ class to perform garbage collection.
  • Given two sorted arrays, merge them into one. If the first array has enough space to hold both, how do you merge them without using extra space? (See the sketch after this list.)
  • Given a huge collection of books, how would you tag each book based on genre?
  • Compare different classification algorithms.
  • Logistic regression vs. neural network.
  • Given an integer array, find pairs of values that add up to a given target value.
  • How would you improve the complexity of a list merging algorithm from quadratic to linear?
  • What is a p-value?
  • Perform a tweet correlation analysis and tweet prediction for the given dataset.
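
For the in-place merge question above, a common approach is to fill the first array from the back so that no extra space is needed. Here is a minimal Python sketch; the function name and array layout are assumptions made only for illustration:

```python
def merge_in_place(a, m, b, n):
    """Merge sorted array b (length n) into sorted array a, whose first m
    entries are valid and which has at least m + n slots in total."""
    i, j, k = m - 1, n - 1, m + n - 1
    while j >= 0:                      # b still has unmerged elements
        if i >= 0 and a[i] > b[j]:
            a[k] = a[i]                # larger tail element of a moves back
            i -= 1
        else:
            a[k] = b[j]
            j -= 1
        k -= 1
    return a

# Example: a has extra capacity (zeros) for b's elements.
a = [1, 3, 5, 0, 0, 0]
b = [2, 4, 6]
print(merge_in_place(a, 3, b, 3))      # [1, 2, 3, 4, 5, 6]
```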

Reflecting on the Questions

The questions are highly technical in nature. They point towards a strong requirement for data scientists who can code very well. Workday is, in essence, the employee directory in the cloud, and there are interesting things that can be done with that data. A strong inclination towards coding can surely land a data scientist a job with Workday!

Subscribe to our Acing Data Science newsletter. A new course to ace data science interviews is coming soon. Sign up below to join the waitlist!

Acing Data Science Interviews

Thanks for reading! 😊 If you enjoyed it, test how many times you can hit 👏 in 5 seconds. It’s great cardio for your fingers AND will help other people see the story.


Workday Data Science Interviews was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Goldman Sachs Data Science Interviews

Goldman Sachs reported net revenue of $35.94 billion in 2018.

The Goldman Sachs Group, Inc. is a leading global investment banking, securities and investment management firm that provides a wide range of financial services to a substantial and diversified client base that includes corporations, financial institutions, governments and individuals. Goldman Sachs makes key decisions by taking calculated risks based on data and evidence. As a data science practitioner, your analysis can have a first-hand impact worth millions of dollars. The FAST (Franchise Analytics Strategy and Technology) team at Goldman Sachs is a group of data scientists and engineers who are responsible for generating insights and creating products that turn big data into easily digestible takeaways. In essence, the FAST team is comprised of data experts who help other professionals at Goldman Sachs act on relevant insights.

Source: https://revenuesandprofits.com/how-goldman-sachs-makes-money/

Interview Process

The first step is a phone screen with the hiring manager. For ML/data engineer type roles, there is usually a HackerRank/CoderPad coding assignment. If that goes well, there is an onsite interview, which usually consists of 4–6 people and dives deep into analysis, probability and statistics, coding, and data science concepts.

Important Reading

Data Science Related Interview Questions

  • Design a random number generator.
  • How to treat missing and null values in a dataset?
  • Given N noodles in a bowl, you repeatedly pick two ends at random and attach them. What is the expected number of loops at the end?
  • How to remove duplicates from a database table without using DISTINCT?
  • When is value at risk inappropriate?
  • What is the Wiener process?
  • Given the 2×2 matrix A = [[-2, -1], [9, 4]], what is A¹⁰⁰⁰?
  • Write an algorithm for a tree traversal.
  • Write a program to calculate the Levenshtein distance. (See the sketch after this list.)
  • Estimate the total number of trees in the United States.
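
For the Levenshtein distance question above, the standard dynamic-programming approach looks roughly like this; it is a sketch of the general technique, not a prescribed Goldman Sachs answer:

```python
def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn s into t."""
    m, n = len(s), len(t)
    # dp[i][j] = distance between s[:i] and t[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                            # delete all of s[:i]
    for j in range(n + 1):
        dp[0][j] = j                            # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n]

print(levenshtein("kitten", "sitting"))  # 3
```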

Reflecting on the Questions

GS is one of the best places to work because they really take care of their people. The questions are a mix of puzzles and analysis-based problems, which reflect the basis of financial investing in general. Thinking on your feet is very important, as the puzzles can get complicated. A great presence of mind and ample preparation can surely land you a job with one of the most prestigious investment banks in the world!

Subscribe to our Acing AI newsletter; I promise not to spam, and it’s FREE!

Acing AI Newsletter – Revue

Thanks for reading! 😊 If you enjoyed it, test how many times you can hit 👏 in 5 seconds. It’s great cardio for your fingers AND will help other people see the story.


Goldman Sachs Data Science Interviews was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

How to evaluate ML models using a confusion matrix?

Model Evaluation using Confusion Matrix

Model evaluation is a very important aspect of data science. Evaluating a data science model adds more colour to our hypothesis and helps us compare different models to see which one produces better results on our data.

What Big-O is to coding, validation and evaluation are to data science models.

Photo by Leon Koye on Unsplash

When we are implementing a multi-class classifier, we have multiple classes, and the number of data entries belonging to each class differs. During testing, we need to know whether the classifier performs equally well for all the classes or whether there is a bias towards some classes. This analysis can be done using the confusion matrix, which holds a count of how many data entries are correctly classified and how many are misclassified.

Let’s take an example. There are ten data entries that belong to a class whose label is “Class 1”. When we generate predictions from our ML model, we check how many of those ten entries get the predicted label “Class 1”. Suppose six data entries are correctly classified and get the label “Class 1”. For those six entries, the predicted label and the true (actual) label are the same, so the class-wise accuracy is 60%. The ML model misclassifies the remaining four entries, predicting class labels other than “Class 1”. From this example, it is visible that the confusion matrix gives us an idea of how many data entries are classified correctly and how many are misclassified, and it lets us explore the class-wise accuracy of the classifier.
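
As a concrete sketch of the idea above, assuming scikit-learn is available, a confusion matrix and the class-wise accuracy can be computed like this (the labels below are made up to mirror the ten-entry example):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 10 entries truly belong to "Class 1" (6 predicted
# correctly), and 5 entries belong to "Class 2" (4 predicted correctly).
y_true = ["Class 1"] * 10 + ["Class 2"] * 5
y_pred = ["Class 1"] * 6 + ["Class 2"] * 4 + ["Class 2"] * 4 + ["Class 1"]

labels = ["Class 1", "Class 2"]
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)                        # rows = true class, columns = predicted class

# Class-wise accuracy: correct predictions divided by the row totals.
class_accuracy = cm.diagonal() / cm.sum(axis=1)
for label, acc in zip(labels, class_accuracy):
    print(label, f"{acc:.0%}")   # Class 1 -> 60%, Class 2 -> 80%
```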

Source: ML Solutions

For more learning on similar topics, the ML solutions book provides good explanations.

For more such answers to important Data Science concepts, please visit Acing AI.

Subscribe to our Acing AI newsletter; I promise not to spam, and it’s FREE!

Acing AI Newsletter – Revue

Thanks for reading! 😊 If you enjoyed it, test how many times you can hit 👏 in 5 seconds. It’s great cardio for your fingers AND will help other people see the story.


How to evaluate ML models using a confusion matrix? was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Shopify Data Science Interview Questions

Shopify powers 800,000 businesses in approximately 175 countries.

The first iteration of Shopify (before it was called that) was an online store that sold snowboards. Eventually, there was a pivot to becoming an e-commerce platform. It’s been named Canada’s “smartest” company, among myriad other well-earned accolades. Shopify was the third largest e-commerce CMS in 2018, with a market share of 10.03% among the first million websites. In 2018, the Shopify platform did over $1.5 billion in sales on Cyber Monday alone.

Source: https://mobilesyrup.com/2018/05/08/shopify-new-retail-features-chip-reader/

Interview Process

The first step is a phone screen with an HR person. The next step is a three-part in-person interview (a ‘life story’ interview plus technical interviews). Once those are cleared, there is an onsite interview, which consists of two more technical interviews and three more interviews with prospective team leads.

Important Reading

Source: Surviving Flashes of High Write Traffic Using Scriptable Load Balancers (Shopify Engineering)

Data Science Related Interview Questions

  • Go through a previously completed project and explain it. Why did you make the choices in the project that you did?
  • What’s the difference between Type I and Type II error?
  • Explain the difference between L1 and L2 regularization.
  • Write a program to simulate Conway’s Game of Life. (See the sketch after this list.)
  • What is the difference between supervised and unsupervised machine learning?
  • What’s the difference between a generative and discriminative model?
  • What’s the F1 score? How would you use it?
  • What is your experience working on big data technologies?
  • Do you have experience with Spark or big data tools for machine learning?
  • How do you ensure you are not overfitting with a model?
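
For the Game of Life question above, one straightforward sketch advances a grid of 0/1 cells one generation at a time; the grid representation and function name are illustrative assumptions:

```python
def step(grid):
    """Advance a 2D grid of 0/1 cells one generation under Conway's rules."""
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count live neighbours among the 8 surrounding cells.
            live = sum(grid[r + dr][c + dc]
                       for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                       if (dr, dc) != (0, 0)
                       and 0 <= r + dr < rows and 0 <= c + dc < cols)
            if grid[r][c] == 1:
                nxt[r][c] = 1 if live in (2, 3) else 0   # survival
            else:
                nxt[r][c] = 1 if live == 3 else 0        # reproduction
    return nxt

# A "blinker" oscillates between a horizontal and a vertical bar.
blinker = [[0, 0, 0],
           [1, 1, 1],
           [0, 0, 0]]
print(step(blinker))   # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```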

Reflecting on the Questions

The 800,000 businesses that Shopify powers generate massive amounts of data. The data science team at Shopify asks basic data science questions that are fundamental in nature. Sometimes the questions revolve around your resume and the problems you have solved earlier in your career. A good grip on the fundamentals can surely land you a job with one of the world’s largest e-commerce platforms!

Subscribe to our Acing AI newsletter; I promise not to spam, and it’s FREE!

Acing AI Newsletter – Revue

Thanks for reading! 😊 If you enjoyed it, test how many times you can hit 👏 in 5 seconds. It’s great cardio for your fingers AND will help other people see the story.

The sole motivation of this blog article is to learn about Shopify and its technologies and to help people get into the company. All data is sourced from online public sources. I aim to make this a living document, so any updates and suggested changes can always be included. Please provide relevant feedback.


Shopify Data Science Interview Questions was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

What is TF-IDF in Feature Engineering?

Basic concept of TF-IDF in NLP

TF-IDF stands for term frequency–inverse document frequency. It is a numerical statistic that lets us decide how important a word is to a given document within a dataset or corpus.


What is TF-IDF?

TF-IDF indicates how important a word is for understanding the document or dataset. Let us understand this with an example. Suppose you have a dataset where students write an essay on the topic “My House”. In this dataset, the word a appears many times; it is a high-frequency word compared to other words in the dataset. The dataset contains other words like home, house, and rooms that appear less often, so their frequencies are lower and they carry more information than the word a does. This is the intuition behind TF-IDF.

Let us dive deep into the mathematical aspect of TF-IDF. It has two parts: Term Frequency(TF) and Inverse Document Frequency(IDF). The term frequency indicates the frequency of each of the words present in the document or dataset.

So, its equation is given as follows:

TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document)

The second part is inverse document frequency. IDF tells us how important a word is to the document. This matters because when we calculate TF, we give equal importance to every single word: if a word appears in the dataset frequently, its term frequency (TF) value is high even though it may not be that important to the document.

So, if the word the appears in a document 100 times, it is not carrying much information compared to words that appear less frequently in the dataset. Thus, we need to weigh down the frequent terms while scaling up the rare ones, and this weighting decides the importance of each word. We achieve it with the following equation:

IDF(t) = log10(Total number of documents / Number of documents with term t in it).

Hence, the equation to calculate TF-IDF is as follows:

TF * IDF = [ (Number of times term t appears in a document) / (Total number of terms in the document) ] * log10(Total number of documents / Number of documents with term t in it).

In other words, TF-IDF is simply the product of TF and IDF: TF * IDF.

Now, let’s take an example where you have two sentences and are considering those sentences as different documents in order to understand the concept of TF-IDF:

Document 1: This is a sample.

Document 2: This is another example.

Source: Python NLP

In summary, to calculate TF-IDF, we will follow these steps:

1. We first calculate the frequency of each word for each document.

2. We calculate IDF.

3. We multiply TF and IDF.
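
Here is a minimal Python sketch of these three steps, applied to the two example documents above (tokenization is a simplistic lowercase whitespace split, which is an assumption made only for illustration):

```python
import math
from collections import Counter

docs = ["this is a sample", "this is another example"]
tokenized = [d.split() for d in docs]
n_docs = len(docs)

# Document frequency: in how many documents does each term appear?
df = Counter()
for tokens in tokenized:
    df.update(set(tokens))

for i, tokens in enumerate(tokenized, start=1):
    counts = Counter(tokens)
    for term, count in counts.items():
        tf = count / len(tokens)                 # term frequency
        idf = math.log10(n_docs / df[term])      # inverse document frequency
        print(f"doc {i}  {term:8s}  tf={tf:.3f}  idf={idf:.3f}  tf-idf={tf * idf:.3f}")
```

Words that appear in both documents (this, is) get an IDF of 0, while words unique to one document (sample, another, example) get a positive TF-IDF score.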

Subscribe to our Acing AI newsletter; I promise not to spam, and it’s FREE!

Acing AI Newsletter – Revue

Thanks for reading! 😊 If you enjoyed it, test how many times you can hit 👏 in 5 seconds. It’s great cardio for your fingers AND will help other people see the story.

Reference: Python NLP


What is TF-IDF in Feature Engineering? was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

What is AUC?

Data Science Interview Questions based on AUC.

A few weeks ago, I wrote about ROC curves. The purpose was to provide a basic primer on ROC curves. As a follow-up, this article talks about AUC.

Photo by Zbysiu Rodak on Unsplash

AUC stands for Area Under the Curve. ROC can be quantified using AUC, which measures how much area is covered under the ROC curve. A perfect classifier has an AUC score of 1.0, while a classifier that guesses at random has an AUC score of 0.5. In the real world, we don’t expect an AUC score of 1.0, but a classifier with an AUC score in the range of 0.6 to 0.9 is considered a good classifier.

AUC for the ROC curve

In the preceding figure, the area covered under the curve is our AUC score. It gives us an indication of how well or poorly our classifier is performing. ROC and AUC are two indicators that provide insight into how our classifier performs.
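
A quick sketch of how this is typically computed, assuming scikit-learn is available (the labels and predicted scores below are made up for illustration):

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels and predicted probabilities for the positive class.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # points on the ROC curve
auc = roc_auc_score(y_true, y_score)                # area under that curve
print("AUC:", auc)       # 1.0 = perfect classifier, 0.5 = random guessing
```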

Subscribe to our Acing AI newsletter; I promise not to spam, and it’s FREE!

Acing AI Newsletter – Revue

Thanks for reading! 😊 If you enjoyed it, test how many times you can hit 👏 in 5 seconds. It’s great cardio for your fingers AND will help other people see the story.

Reference: ML Solutions


What is AUC? was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Expedia Data Science Interview Questions

There are 37 million Expedia members across 32 countries.

Expedia has covered 534 billion miles in air travel, which is enough for 72 round trips (in passenger miles flown) between the Sun and Pluto. Expedia is a travel company like Booking.com, which we have covered at Acing AI previously. It has sold enough hotel room nights in the last 20 years to account for every person living in the United States. The amount of data Expedia accumulates by serving so many travellers every year leads to huge investment in technology: Expedia has invested over $850M in tech spend over the trailing year. A mature tech stack helps data scientists do their job. This is a great opportunity for any data scientist to build their career.

Photo by Vincent Versluis on Unsplash

Interview Process

If you are shortlisted after resume screening, the first interview is with the manager of the data science team and includes technical questions about machine learning and statistics. After clearing that, there is a technical coding interview. The third round is a more classic, typical job interview with HR.

Important Reading

Source: Streaming Data Ecosystems

Data Science Related Interview Questions

  • What is the process of cross-validation? (A short sketch follows this list.)
  • How can we do price optimization for properties on Expedia?
  • Predict Hotel prices in a given dataset.
  • Explain a Machine Learning project on your resume.
  • Develop a recommendation system based on a provided dataset.
  • Which flight path is more profitable: London-Lisbon or London-Milan?
  • Should we invest in buying more property in city X?
  • Explain linear and logistic regression.
  • Give pros and cons of SVM.
  • Explain the meaning of overfitting to non technical people.
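
For the cross-validation question above, a short scikit-learn sketch; the model and dataset are chosen purely for illustration:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# 5-fold cross-validation: train on 4 folds, score on the held-out fold,
# and rotate so every fold is used for validation exactly once.
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2")
print(scores)          # one R^2 score per fold
print(scores.mean())   # averaged estimate of out-of-sample performance
```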

Reflecting on the Questions

The data science team at Expedia is geographically dispersed. The technical team has built a very mature data science architecture that enables the data science team. The interview questions are based on the problems the data science team at Expedia answers day to day. A great product sense about the Expedia product and its business can surely land you a job at one of the world’s largest travel sites!

Subscribe to our Acing AI newsletter; I promise not to spam, and it’s FREE!

Acing AI Newsletter – Revue

Thanks for reading! 😊 If you enjoyed it, test how many times you can hit 👏 in 5 seconds. It’s great cardio for your fingers AND will help other people see the story.

The sole motivation of this blog article is to learn about Expedia and its technologies and to help people get into the company. All data is sourced from online public sources. I aim to make this a living document, so any updates and suggested changes can always be included. Please provide relevant feedback.


Expedia Data Science Interview Questions was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Lyft Data Science Interview Questions

As of January 2018, Lyft could count 23 million users.

Lyft currently offers services in 350 US cities, plus Toronto and Ottawa in Canada. It was launched in 2012 as part of Zimride, a long-distance car-pooling business that was the largest such service in the US (and was named after the transportation culture in Zimbabwe); it was later renamed Lyft. Launched in Silicon Valley, Lyft spread from 60 US cities in April 2014 to 300 in January 2017 and 350 today, plus the two aforementioned Canadian cities. With 350 cities, millions of users, and billions of rides, the data generated at Lyft is huge. The product achieves economies of scale by deploying data science; hence, data science is a core part of the product and not just an added feature.

Photo by Austin Distel on Unsplash

Interview Process

The interview process starts with a phone interview with a data scientist. It is an in-depth conversation about your resume and past projects. That interview is followed by a take-home test, which usually involves a ride-sharing data set; as part of the take-home test, you create a presentation for the onsite interview. The onsite consists of 4–5 interviews, one of which is the presentation of the take-home test. It also includes a SQL test, a stats and probability interview, and a business case. There is a final core values interview to see if you fit within the Lyft culture. The interview is challenging, but the reward when you clear it is totally worth it.

Important Reading

Source: From shallow to deep learning in fraud

Data Science Related Interview Questions

  • Find the expectation of a random variable with a basic distribution. How would you construct a confidence interval? How would you estimate the probability of ordering a ride? What assumptions do you need in order to estimate this probability?
  • What optimization techniques are you familiar with and how do they work? How would you find the optimal price given a linear demand function?
  • A coin came up heads x times in y flips. How can we test whether it is a fair coin? (See the sketch after this list.)
  • What are some metrics for monitoring supply and demand in the Lyft marketplace?
  • Explain correlation and variance.
  • What is the lifetime value of a driver?
  • Implement k nearest neighbour using a quad tree.
  • What are the different factors that could influence a rise in average wait time of a driver?
  • What are the best ways to achieve pool matching?
  • How do you reduce churn on the supply side?
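
For the fair-coin question above, one common sketch is a two-sided test based on the normal approximation to the binomial distribution (hand-rolled here to keep dependencies minimal; in practice a binomial test from a stats library would do the same job):

```python
import math

def fair_coin_p_value(heads, flips):
    """Two-sided p-value for H0: P(heads) = 0.5, using the normal approximation."""
    expected = flips * 0.5
    std = math.sqrt(flips * 0.25)          # sqrt(n * p * (1 - p)) with p = 0.5
    z = (heads - expected) / std
    # P(|Z| >= |z|) for a standard normal variable.
    return math.erfc(abs(z) / math.sqrt(2))

# Example: 60 heads in 100 flips.
print(round(fair_coin_p_value(60, 100), 3))   # ~0.046: borderline evidence against fairness
```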

Reflecting on the Questions

The data science team at Lyft moves very quickly. The data sets are huge and the problems so wide-ranging that the team explores different types of models that can provide higher precision for the same recall and feature set. The questions reflect the tough problems the team faces day to day: a mix of model building along with complex coding questions. As I mentioned before, the interviews are tough, but they are well worth it for the chance to work on an excellent team. Hard work can surely get you a job at one of the world’s largest transportation companies!

Subscribe to our Acing AI newsletter; I promise not to spam, and it’s FREE!

Acing AI Newsletter – Revue

Thanks for reading! 😊 If you enjoyed it, test how many times you can hit 👏 in 5 seconds. It’s great cardio for your fingers AND will help other people see the story.

The sole motivation of this blog article is to learn about Lyft and its technologies and to help people get into the company. All data is sourced from online public sources. I aim to make this a living document, so any updates and suggested changes can always be included. Please provide relevant feedback.


Lyft Data Science Interview Questions was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Booking.com Data Science Interview Questions

There are 28 million+ places to stay listed on Booking.com.

Booking.com is the travel e-commerce arm of Booking Holdings. They have over 140,000 destinations in 230 countries all over the world. They also have over 1.5 million room nights reserved every day on their platform. From a data science perspective, this translates into over 300 TB of data. A robust data engineering infrastructure coupled with huge amounts of data makes Booking.com one of the best places for a data scientist to build their career.

Photo by John Matychuk on Unsplash

Interview Process

The interview process starts with an MCQ-based test on machine learning and statistics. That is followed by an HR phone interview. Once you clear both of those, there is a technical phone interview with data scientists, which is based around your projects and also includes a case study discussion. Finally, there is an onsite interview consisting of technical interviews, a behavioural interview, and a hiring manager interview.

Important Reading

Booking.com streaming ecosystem

Data Science Related Interview Questions

  • What is the difference between L1 and L2 regularization?
  • What is gradient descent? (See the sketch after this list.)
  • Why did you use random forests instead of clustering on a particular problem? (case study)
  • How to deal with new hotels that do not have an official rating?
  • If the training error and the testing error are both high, as the number of data points increase, what measures will you take to fix the model?
  • How would you optimize the advertising that directs people to your site? How do you evaluate how much to spend on each channel?
  • What do you do to make sure your model is not overfitting?
  • Given a business case, how would you handle it with a machine learning solution?
  • How did you validate your model?
  • What are the parameters of decision trees and random forests, and how would you choose them?
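
For the gradient descent question above, a bare-bones sketch that fits a one-variable linear regression by minimizing mean squared error (the data and learning rate are illustrative):

```python
import numpy as np

# Toy data roughly following y = 2x + 1.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

w, b = 0.0, 0.0          # parameters to learn
lr = 0.01                # learning rate

for _ in range(5000):
    y_hat = w * X + b
    error = y_hat - y
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2 * np.mean(error * X)
    grad_b = 2 * np.mean(error)
    w -= lr * grad_w     # step in the direction opposite to the gradient
    b -= lr * grad_b

print(w, b)              # approximately 2 and 1
```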

Reflecting on the Questions

Booking.com is headquartered in Amsterdam but has offices all over the globe. Data dictates the spending and drives efficiency in their business; it is a critical component of their product. The questions are about deep data science fundamentals and also about the different situations within their business where they deploy data science. A good knowledge of data science fundamentals coupled with know-how about their business can surely land you a job with one of the world’s largest travel booking sites!

Subscribe to our Acing AI newsletter; I promise not to spam, and it’s FREE!

Acing AI Newsletter – Revue

Thanks for reading! 😊 If you enjoyed it, test how many times you can hit 👏 in 5 seconds. It’s great cardio for your fingers AND will help other people see the story.

The sole motivation of this blog article is to learn about Booking.com and its technologies and to help people get into the company. All data is sourced from online public sources. I aim to make this a living document, so any updates and suggested changes can always be included. Please provide relevant feedback.


Booking.com Data Science Interview Questions was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
