
Visa, Inc. Data Science Interviews


In 2018, Visa had $20.61 billion in revenue.

In 2015, the Nilson Report, a publication that tracks the credit card industry, found that Visa’s global network (known as VisaNet) processed 100 billion transactions during 2014 with a total volume of US$6.8 trillion. VisaNet data centers can handle up to 30,000 simultaneous transactions and up to 100 billion computations every second. Visa is a household name all over the world; if you have ever owned a credit card, you surely know what Visa is. With 100 billion transactions a year, the scale of data at the company is beyond compare, and working with it could be a highlight of a data professional’s career.


Interview Process

A senior data scientist from the team reaches out for the first telephonic interview after the resume is selected. This interview involves resume-based questions, SQL, and/or a business case study. After the first round, there is another technical phone interview. Finally, there are five on-site interviews with senior personnel, including directors and VPs. Each of those interviews is 45 minutes long.

Important Reading

Source: Data Science at Visa

Data Science Related Interview Questions

  • Reverse an integer without the use of strings.
  • Write a sorting algorithm.
  • How do you estimate a customer’s location based on Visa transaction data?
  • Write code to generate a Fibonacci sequence.
  • What functions can I perform using a spreadsheet?
  • Who would be your first point of contact for reporting missing data that you are keeping a record of?
  • Give the top three employee salaries in each department in a company.
  • What is Node.js?
  • What is MVC?
  • What is synchronous vs. asynchronous JavaScript?
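Several of these are quick coding screens. For instance, the integer-reversal question can be answered with pure arithmetic; this is one common approach (a sketch, not Visa’s expected answer):

```python
def reverse_integer(n):
    """Reverse the digits of an integer using only arithmetic (no strings)."""
    sign = -1 if n < 0 else 1
    n = abs(n)
    reversed_n = 0
    while n > 0:
        reversed_n = reversed_n * 10 + n % 10  # shift left, append last digit
        n //= 10                               # drop the last digit
    return sign * reversed_n

print(reverse_integer(1234))  # 4321
print(reverse_integer(-560))  # -65
```

The loop peels off the last digit with `% 10` and appends it to the running result, so trailing zeros disappear naturally (as in `-560` above).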

Reflecting on the Interviews

The data science interview at Visa, Inc. is a rigorous process involving many different interviews. The team is top-notch, and they are looking to hire candidates of the same caliber. Most interviews test fundamentals in SQL, coding, probability and statistics, as well as ML. A decent amount of hard work can surely get you a job with the world’s largest credit transaction processing company!

Subscribe to our Acing AI newsletter. I promise not to spam, and it’s FREE!

Newsletter

Thanks for reading! 😊 If you enjoyed it, see how many times you can hit 👏 in 5 seconds. It’s great cardio for your fingers AND will help other people see the story.

The sole motivation of this blog article is to learn about Visa Inc. and its technologies and to help people get into the company. All data is sourced from publicly available online sources. I aim to make this a living document, so updates and suggested changes can always be included. Please provide relevant feedback.


Visa, Inc. Data Science Interviews was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Analysis of Data Science Interview Questions

Breaking down data science interview questions by category

tl;dr: Unless the data science role is nuanced, most data science roles require fundamental knowledge about the basics of data science. (SQL, Coding, Probability and Stats, Data Analysis)

We analyzed hundreds of data science interview questions to find trends, patterns and topics that are the core of a data science interview.

At a high level, we divided these questions into different categories and assigned a weight to each category. The weight of a category is simply the number of questions, out of a random corpus of 100, that fell into that bucket.

From the pie chart above, categories like SQL and coding are unambiguous. Machine learning basics consist of linear/logistic regression and related ML algorithms. Advanced ML consists of comparisons between multiple approaches, algorithms, and nuanced techniques. Big data technology includes concepts such as Hadoop and Spark, and may extend to the infrastructure, data engineering, and deployment side of data science models. In essence, data science fundamentals are asked about 70% of the time in a data science interview.
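The weighting amounts to a simple tally. Here is a minimal sketch with purely illustrative counts (the real distribution comes from our question corpus, not from these numbers):

```python
from collections import Counter

# Hypothetical category labels for a corpus of 100 questions; the weight
# of a category is simply how many questions fall into its bucket.
labels = (["SQL"] * 25 + ["Coding"] * 20 + ["Probability and Stats"] * 15 +
          ["Data Analysis"] * 10 + ["ML Basics"] * 10 +
          ["Advanced ML"] * 10 + ["Big Data"] * 10)

weights = Counter(labels)
fundamentals = ["SQL", "Coding", "Probability and Stats", "Data Analysis"]
share = sum(weights[c] for c in fundamentals)
print(share)  # fundamentals questions out of 100
```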

While SQL and coding based questions might be part of the initial online assessment, data analysis questions tend to be a take-home assessment. The remaining categories are usually covered during the phone/in-person interview and vary based on the role, company, years of experience and team composition.

Considering all this data, we designed a data science interview course to help people Ace Data Science Interviews. All the categories mentioned above will be covered in this course. The current cohort starts September 16, 2019. It will be a small group of 15 people. Sign up here!

Acing Data Science Interviews

Subscribe to the Acing AI/Data Science Newsletter. It is FREE!

Newsletter


Analysis of Data Science Interview Questions was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Acing Data Science Interviews

According to Indeed, there is a 344% increase in demand for data scientists year over year.

In January 2018, I started the Acing AI blog with a goal to help people get into data science. My first article was about “The State of Autonomous Transportation”. As I wrote more, I realized people were interested in acing data science interviews. This led me to start my articles covering various companies’ data science interview questions and processes. The Acing AI blog will continue to have interesting articles as always. This journey continues with today’s exciting announcements.

First, we are launching the Acing Data Science/Acing AI newsletter. This newsletter will always be free. We will be sharing interesting data science articles, interview tips and more via the newsletter. Some of you are already subscribed to this newsletter and will continue to get emails on it.

Through my first newsletter, I also wanted to share the next evolution of the Acing AI blog, Acing Data Science Interviews.

I partnered with Johnny to come up with an amazing course to help people ace data science interviews. Everything we have learned from conducting interviews, giving interviews, writing these blogs, and learning from the best people in data science has been packaged into this course. Think of our collective six-plus years of learning condensed into a three-month course. That is Acing Data Science Interviews.

At a high level, we will cover different topics from a data science interview perspective. These include SQL, coding, probability and statistics, data analysis, machine learning algorithms, advanced machine learning, machine learning system design, deep learning, neural networks, big-data concepts, and finally, approaching a data science interview. The first few topics cover the foundational aspects of data science. They are followed by the data science application topics. Collectively, these should encompass everything that could be asked in a data science interview.

The first sessions will start in the second half of September 2019. We are aiming for a small group of 15 people. The original course will be only $199. We are focused on quality and want to provide the best experience; hence the small group size.

Acing Data Science Interviews

Thank you for reading!


Acing Data Science Interviews was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Workday Data Science Interviews

In 2012, Workday launched a successful IPO valued at $9.5 billion.

Workday is a leading provider of enterprise cloud applications for finance and human resources. It was founded in 2005. Workday delivers financial management, human capital management, planning, and analytics applications designed for the world’s largest companies, educational institutions, and government agencies. In January 2018, Workday announced that it acquired SkipFlag, makers of an AI knowledge base that builds itself from a company’s internal communications. In July 2018, they acquired Stories.bi to boost augmented analytics. These two acquisitions point towards an increased investment in the data science domain.

Source: cloudfoundation.com

Interview Process

The process starts with a phone screen with a recruiter. That is followed by a technical phone interview with the hiring manager. The questions are typical machine learning and data science questions, with some data structures and algorithms questions mixed in. If both of those go well, there is an onsite interview.
The onsite consists of five interviews with different team members, hiring managers, and executives. The questions are about programming skills, algorithmic skills, data structures, and anything related to machine learning techniques.

Important Reading

Source: https://workday.github.io/scala/2014/05/15/managing-a-job-grid-using-akka

Data Science Related Interview Questions

  • Given data from the World Bank, provide insights on a small CSV file.
  • Write a C++ class to perform garbage collection.
  • Given 2 sorted arrays, merge them into 1 array. If the first array has enough space for both, how do you merge them without using extra space?
  • Given a huge collection of books, how would you tag each book based on genre?
  • Compare different classification algorithms.
  • Logistic regression vs. neural network.
  • Given an integer array, find pairs of values that sum to a certain target value.
  • How would you improve the complexity of a list merging algorithm from quadratic to linear?
  • What is a p-value?
  • Perform a tweet correlation analysis and tweet prediction for the given dataset.
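For the sorted-array merge with no extra space, the usual trick is to fill the larger array from the back so nothing is overwritten. A minimal Python sketch (the interview may well expect C++, given the other questions):

```python
def merge_in_place(a, m, b):
    """Merge sorted list b into sorted list a, where a holds m real
    elements followed by len(b) empty slots. Filling a from the back
    means no extra array is needed."""
    i, j, k = m - 1, len(b) - 1, m + len(b) - 1
    while j >= 0:
        if i >= 0 and a[i] > b[j]:
            a[k] = a[i]   # largest remaining element of a goes last
            i -= 1
        else:
            a[k] = b[j]   # otherwise take from b
            j -= 1
        k -= 1
    return a

print(merge_in_place([1, 3, 5, 0, 0, 0], 3, [2, 4, 6]))  # [1, 2, 3, 4, 5, 6]
```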

Reflecting on the Questions

The questions are highly technical in nature. They point toward a strong requirement for data scientists who can code very well. Workday is the employee directory in the cloud, and there are interesting things that could be done based on that data. Strong coding skills can surely land a data scientist a job with Workday!

Subscribe to our Acing Data Science newsletter. A new course to ace data science interviews is coming soon. Sign up below to join the waitlist!

Acing Data Science Interviews



Workday Data Science Interviews was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Goldman Sachs Data Science Interviews

Goldman Sachs generated $35.94 billion in net revenue in 2018.

The Goldman Sachs Group, Inc. is a leading global investment banking, securities, and investment management firm that provides a wide range of financial services to a substantial and diversified client base that includes corporations, financial institutions, governments, and individuals. Goldman Sachs makes key decisions by taking calculated risks, based on data and evidence. As a data science practitioner, your analysis might have a first-hand impact on making millions of dollars. The FAST (Franchise Analytics Strategy and Technology) team at Goldman Sachs is a group of data scientists and engineers who are responsible for generating insights and creating products that turn big data into easily digestible takeaways. In essence, the FAST team is comprised of data experts who help other professionals at Goldman Sachs act on relevant insights.

Source: https://revenuesandprofits.com/how-goldman-sachs-makes-money/

Interview Process

The first step is a phone screen with the hiring manager. There is usually a HackerRank/CoderPad coding assignment involved for an ML/data engineer type of role. If that goes well, there is an onsite interview, usually with 4–6 people, that dives deep into analysis, probability and statistics, coding, and data science concepts.

Important Reading

Data Science Related Interview Questions

  • Design a random number generator.
  • How do you treat missing and null values in a dataset?
  • Given N noodles in a bowl, you repeatedly tie randomly chosen ends together. What is the expected number of loops you will have in the end?
  • How do you remove duplicates from a database table without using DISTINCT?
  • When is value at risk inappropriate?
  • What is the Wiener process?
  • Given the matrix A = [[-2, -1], [9, 4]], what is A¹⁰⁰⁰?
  • Write an algorithm for a tree traversal.
  • Write a program for Levenshtein distance calculation.
  • Count the total number of trees in the United States.
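The Levenshtein distance question has a standard dynamic-programming answer. A minimal Python sketch of the row-rolling variant (one of several equivalent formulations):

```python
def levenshtein(s, t):
    """Edit distance between strings s and t using insert, delete,
    and substitute operations, keeping only two DP rows in memory."""
    m, n = len(s), len(t)
    prev = list(range(n + 1))  # distances from s[:0] to every prefix of t
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution (or match)
        prev = curr
    return prev[n]

print(levenshtein("kitten", "sitting"))  # 3
```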

Reflecting on the Questions

GS is one of the best places to work because they really take care of their people. The questions reflect a mix of puzzles and analysis-based questions, which form the basis of financial investing in general. Thinking on your feet is very important, as the puzzles can get complicated. A great presence of mind and ample preparation can surely land you a job with one of the most prestigious investment banks in the world!



Goldman Sachs Data Science Interviews was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

How to evaluate ML models using confusion matrix?

Model Evaluation using Confusion Matrix

Model evaluation is a very important aspect of data science. Evaluating a data science model adds more colour to our hypothesis and helps us compare different models that could provide better results on our data.

What Big-O is to coding, validation and evaluation are to data science models.


When we are implementing a multi-class classifier, we have multiple classes and the number of data entries belonging to all these classes is different. During testing, we need to know whether the classifier performs equally well for all the classes or whether there is bias towards some classes. This analysis can be done using the confusion matrix. It will have a count of how many data entries are correctly classified and how many are misclassified.

Let’s take an example. There are ten data entries that belong to a class labeled “Class 1”. When we generate predictions from our ML model, we check how many of those ten entries get the predicted label “Class 1”. Suppose six data entries are correctly classified and get the label “Class 1”. For those six entries, the predicted label and the true (actual) label are the same, so the class-wise accuracy is 60%. The ML model misclassifies the remaining four entries, predicting class labels other than “Class 1”. From this example, it is visible that the confusion matrix gives us an idea of how many data entries are classified correctly and how many are misclassified, and it lets us explore the class-wise accuracy of the classifier.
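The tally described above can be sketched in a few lines of plain Python (in practice a library routine such as scikit-learn’s `confusion_matrix` does the same job; the labels here simply mirror the example):

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs - a confusion matrix
    represented as a dictionary keyed by label pairs."""
    return Counter(zip(y_true, y_pred))

# Ten entries whose true label is "Class 1": six predicted correctly,
# four misclassified as some other class.
y_true = ["Class 1"] * 10
y_pred = ["Class 1"] * 6 + ["Class 2"] * 4

cm = confusion_counts(y_true, y_pred)
correct = cm[("Class 1", "Class 1")]
print(correct / len(y_true))  # class-wise accuracy: 0.6
```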

Source: ML Solutions

For more learning on similar topics, the ML solutions book provides good explanations.

For more such answers to important Data Science concepts, please visit Acing AI.

Subscribe to our Acing AI newsletter, I promise not to spam and its FREE!

Acing AI Newsletter – Revue

Thanks for reading! 😊 If you enjoyed it, test how many times can you hit 👏 in 5 seconds. It’s great cardio for your fingers AND will help other people see the story.


How to evaluate ML models using confusion matrix? was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Shopify Data Science Interview Questions

Shopify powers 800,000 businesses in approximately 175 countries.

The first iteration of Shopify (before it was called that) was an online store that sold snowboards. Eventually, it pivoted to becoming an e-commerce platform. It’s been named Canada’s “smartest” company, among myriad other well-earned accolades. Shopify was the third largest e-commerce CMS in 2018, with a market share of 10.03% among the first million websites. In 2018, the Shopify platform did over $1.5 billion in sales on Cyber Monday alone.

Source: https://mobilesyrup.com/2018/05/08/shopify-new-retail-features-chip-reader/

Interview Process

The first step is a phone screen with an HR person. The next step is a three-part in-person interview (a ‘life story’ interview and a technical interview). Once those are cleared, there is an onsite interview which consists of two more technical interviews and three interviews with prospective team leads.

Important Reading

Surviving Flashes of High-Write Traffic Using Scriptable Load Balancers

Data Science Related Interview Questions

  • Go through a previously completed project and explain it. Why did you make the choices in the project that you did?
  • What’s the difference between Type I and Type II error?
  • Explain the difference between L1 and L2 regularization.
  • Write a program to solve a simulation of Conway’s game of life.
  • What is the difference between supervised and unsupervised machine learning?
  • What’s the difference between a generative and discriminative model?
  • What’s the F1 score? How would you use it?
  • What is your experience working on big data technologies?
  • Do you have experience with Spark or big data tools for machine learning?
  • How do you ensure you are not overfitting with a model?
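For the F1 question, the score is the harmonic mean of precision and recall. A minimal sketch (the counts in the example call are made up for illustration):

```python
def f1_score(tp, fp, fn):
    """F1 score from raw counts: true positives, false positives,
    and false negatives. F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)  # of all positive predictions, how many were right
    recall = tp / (tp + fn)     # of all actual positives, how many were found
    return 2 * precision * recall / (precision + recall)

# Example: 6 true positives, 2 false positives, 4 false negatives.
print(f1_score(6, 2, 4))
```

F1 is useful when classes are imbalanced and plain accuracy would be misleading, which is a natural follow-up to the “how would you use it?” part of the question.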

Reflecting on the Questions

The 800,000 businesses that Shopify powers generate massive amounts of data. The data science team at Shopify asks basic data science questions that are fundamental in nature. Sometimes the questions revolve around your resume and the problems you have solved in your past career. A good grip on the fundamentals can surely land you a job with the world’s largest e-commerce platform!


The sole motivation of this blog article is to learn about Shopify and its technologies and to help people get into the company. All data is sourced from publicly available online sources. I aim to make this a living document, so updates and suggested changes can always be included. Please provide relevant feedback.


Shopify Data Science Interview Questions was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

What is TF-IDF in Feature Engineering?

Basic concept of TF-IDF in NLP

TF-IDF stands for term frequency-inverse document frequency. It is a numerical statistic: with it, we can decide how important a word is to a given document in the dataset or corpus at hand.


What is TF-IDF?

TF-IDF indicates how important a word is for understanding the document or dataset. Let us understand this with an example. Suppose you have a dataset where students write an essay on the topic My House. In this dataset, the word a appears many times; it is a high-frequency word compared to the other words in the dataset. The dataset contains other words, like home, house, and rooms, that appear less often; their frequencies are lower, and they carry more information than the word a. This is the intuition behind TF-IDF.

Let us dive deep into the mathematical aspect of TF-IDF. It has two parts: Term Frequency(TF) and Inverse Document Frequency(IDF). The term frequency indicates the frequency of each of the words present in the document or dataset.

So, its equation is given as follows:

TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document)

The second part is inverse document frequency. IDF tells us how important the word is to the document. This matters because when we calculate TF, we give equal importance to every single word. If a word appears in the dataset more frequently, its term frequency (TF) value is high even if the word is not that important to the document.

So, if the word the appears in a document 100 times, it is not carrying much information compared to words that are less frequent in the dataset. Thus, we need to weigh down the frequent terms while scaling up the rare ones, and that weighting decides the importance of each word. We achieve this with the following equation:

IDF(t) = log10(Total number of documents / Number of documents with term t in it).

Hence, the equation to calculate TF-IDF is as follows.

TF * IDF = [ (Number of times term t appears in a document) / (Total number of terms in the document) ] * log10(Total number of documents / Number of documents with term t in it).

In other words, TF-IDF is simply the product of the two parts: TF * IDF.

Now, let’s take an example where you have two sentences and are considering those sentences as different documents in order to understand the concept of TF-IDF:

Document 1: This is a sample.

Document 2: This is another example.

Source: Python NLP

In summary, to calculate TF-IDF, we will follow these steps:

1. We first calculate the frequency of each word for each document.

2. We calculate IDF.

3. We multiply TF and IDF.
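The three steps above can be sketched directly from the equations, using the two documents as the corpus (tokens are lowercased with punctuation dropped, which is an assumption about the preprocessing):

```python
import math

# The two example documents, tokenized.
docs = [
    "this is a sample".split(),
    "this is another example".split(),
]

def tf(term, doc):
    # Step 1: occurrences of the term / total terms in the document
    return doc.count(term) / len(doc)

def idf(term, docs):
    # Step 2: log10 of (total documents / documents containing the term)
    n_containing = sum(1 for d in docs if term in d)
    return math.log10(len(docs) / n_containing)

def tf_idf(term, doc, docs):
    # Step 3: multiply TF and IDF
    return tf(term, doc) * idf(term, docs)

print(tf_idf("sample", docs[0], docs))  # appears in only one document
print(tf_idf("this", docs[0], docs))    # appears in both documents, so IDF = 0
```

Note how a word shared by every document (like "this") gets a TF-IDF of zero: it does nothing to distinguish one document from another.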


Reference: Python NLP


What is TF-IDF in Feature Engineering? was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

What is AUC?

Data Science Interview Questions based on AUC.

A few weeks ago, I wrote about ROC curves. The purpose was to provide a basic primer on ROC curves. As a follow-up, this article talks about AUC.


AUC stands for Area Under the Curve. A ROC curve can be quantified using AUC by measuring how much area is covered under it. A perfect classifier has an AUC score of 1.0, while a classifier that guesses at random has an AUC score of 0.5. In the real world, we don’t expect an AUC score of 1.0, but a classifier with an AUC score in the range of 0.6 to 0.9 is considered a good classifier.

AUC for the ROC curve

In the preceding figure, the area covered under the curve becomes our AUC score, giving us an indication of how well (or poorly) our classifier is performing. ROC and AUC are two indicators that provide insight into how our classifier performs.
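Measuring the covered area can be sketched with the trapezoidal rule over the curve’s points (libraries such as scikit-learn compute this for you; this is just an illustration of the idea):

```python
def auc(fpr, tpr):
    """Trapezoidal area under a ROC curve given as (fpr, tpr) points,
    assumed sorted by increasing false-positive rate."""
    area = 0.0
    for i in range(1, len(fpr)):
        # area of the trapezoid between consecutive points
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
    return area

print(auc([0.0, 1.0], [0.0, 1.0]))            # diagonal (random guessing): 0.5
print(auc([0.0, 0.0, 1.0], [0.0, 1.0, 1.0]))  # perfect classifier: 1.0
```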


Reference: ML Solutions


What is AUC? was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.


Toronto AI is a social and collaborative hub to unite AI innovators of Toronto and surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, vr, robotics and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.