Blog

Learn About Our Meetup

4500+ Members

Category: Vimarsh Karbhari

What advice should a data scientist ignore?

What advice should a data scientist ignore? — Interview with Johannes, Senior Data Engineer at Loop Insights

Photo by Frame Harirak on Unsplash

Johannes Giorgis is a Senior Data Engineer at Loop Insights. His story is fascinating how he has gone from a big company to a fast paced data based startup. I met Johnny while we took the deep learning Nanodegree at Udacity together. We have stayed in touch ever since. Over the last few years of knowing about Johnny I have realized that “still water runs deep” is an apt proverb for him. He shares his learning via his blog. Going through the interview with him, he details how it is an important for folks to understand to know where their ML models fit in the larger scheme of a software system.

For more some similar inspiration:

Vimarsh Karbhari(VK): What top three books about AI/ML/DS have you liked the most? What books have had the most impact in your career?

Johannes Giorgis(JG):

Artificial Intelligence: A Modern Approach was an eye opener to the field of Artificial Intelligence. I read through that book back when I was first enrolled in Udacity’s Artificial Intelligence Nanodegree.

I’m currently reading Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable and Maintainable Systems. This has been a great resource into building systems that power data driven applications.

Next on my list is Data Science from Scratch — I’m excited about this book as it focuses on the base algorithms that power a lot of data science today. Re-writing these algorithms and applying them in the context of being a new data scientist at a company gives us that one level deeper perspective that we lose when we rely on higher level libraries’ functions.

VK: What tool/tools (software/hardware/habit) that you have as a Data Scientist has the most impact on your work?

JG: Pandas! I’m always excited to utilize my Pandas skills to clean, format and explore datasets. Recently, I’ve taken up learning Docker for web related tasks at work. I’m hoping to incorporate it into my data science workflow/toolkit to help me create a reproducible data science development workspace. Being able to quickly share the environment under which you built a model is a huge advantage.

VK: Can you share about the Data Science related failures/projects/experiments that you have learned from the most?

JG: I and my friend got together to explore the respective tech meetups in our cities — Vancouver and San Francisco. We initially explored the Meetup API to see what it allowed us to do. From there, we built some helper functions to get data for multiple groups, transform it into Pandas Dataframes so we could move forward with cleaning and exploring the data. Jumping straight into a problem, looking up enough documentation, tutorials to move you forward one inch at a time was an invaluable lesson. I often find myself stuck in tutorial hell, where I’m unable to apply what I’ve just learnt to anything that will help me retain it.

By focusing on a project or problem that I’m interested in exploring or solving, I avoid getting stuck with tutorials. — Johnny

VK: If you were to write a book what would be the title of the book? What would be the main topics you would cover in the book?

JG: I am interested in writing a book that explores how a company can build its data capabilities. From no data teams, to some or plenty of data to a fully fledged data infrastructure that enables analytics and Machine Learning exploration. Then taking that to the next level and being able to deploy machine learning and AI in an effective way to solve business problems.

Too many resources out there focus on doing the sexy data science/ML model building part, which in reality is what data scientists tend to spend the least amount of time on. A majority of the time is spent in capturing the data, cleaning and transforming it into something they can actually use. In the real world, data is messy, it’s not in one single place, etc. Being able to take that and build a data infrastructure that enables data scientists, analysts and machine learning engineers to do their work is an area that fascinates me.

Tied to that is also the deployment of machine learning/AI systems. Again, lots of resources walk you through how to build a model, but not enough show you how to make it useful — build a web app and deploy it to heroku, dockerize it and deploy it to a cloud environment, etc. The value of these systems will only be realized by making it available to people whether you are building a side project for fun or building a business. Everyone doesn’t need to know about scale, ML platforms, etc but it is an important aspect to understand so folks can know where their ML models fit in the larger scheme of a software system.

Going hand in hand with all this is how can you evangelize an organization to become more data-driven, to communicate the importance of using and building data capabilities to executives and decision-makers.

VK: In terms of time, money or energy what are the best investments you have made which have given you compounded rewards in your career?

JG: Having moved to Vancouver while still exploring the field of AI, Meetups have been invaluable to me. I met so many people that were on the same journey as me, some I could learn from and others I could help. Going out and meeting folks is a great way to connect, to understand the problems people are solving and even to find new roles!

Conferences are also a great learning and networking opportunity. You tend to be surrounded by folks you don’t usually have the chance to meet in person, so take advantage and connect. It is also a place to learn in more detail what other companies are working on, the challenges they have faced and how they solved it. I attended Data Science Go earlier this year in San Diego and I met lots of exciting and passionate people. I’m looking forward to attending next year as well as finding more relevant Data conferences to attend.

Working on a project on my own accord separate from an online class has also been very rewarding. Courses are great for covering the basics and getting you started but projects allow you to sink your teeth into and really wrap your head around how to get stuff done with the skills you’ve learnt. I’ve worked on exploring Tech Meetups in Vancouver, scraping data from multiple pages to create my own catalog, etc. While working on these projects, I get more ideas on how to extend them, which in turn requires me to learn more skills to achieve that.

Podcasts are another resource I spend a lot of time using — there are lots of good Data Science focused podcasts that explore different aspects — practical applications, theoretical papers, how to build your career, leadership, ethics, data engineering, etc.

VK: In the last year, what has improved your work life which could benefit others?

JG: I joined a startup earlier this year so I have been adjusting to the speed change coming from a much larger company. Every task in a startup can seem like it is a priority 1, so being able to prioritize tasks and communicate the expectation of how long they will take is a crucial skill I’ve needed to develop.

VK: What advice would you give to someone starting in this field? What advice should they ignore?

JG: This was an advice I heard while attending Data Science Go — focus on the area that you are interested in. Specifically, if you aren’t interested in working with images, don’t learn Convolutional Neural Networks. If you aren’t interested in Marketing, don’t bother learning Marketing related analytics. Sometimes it is easier to figure out what we aren’t interested in rather than what we are interested in. So go through this process to narrow down the areas you may be interested in.

This field is quite vast — although more specialized roles are being created, a data scientist could either do data infrastructure, build machine learning models, do analytics or conduct statistical experiments or some combination of these and more. Although there is talk of the unicorn full stack data scientist, you must realize that this will take years to achieve (if you are aiming to do it well).

Start blogging! Start learning how you can communicate your findings, your challenges in written form. Share what you are learning. Just as there is someone in ahead of you, there is someone behind you who can learn from you.

VK: How do you determine saying no to experiments/projects?

JG: Right now, I’m really interested in building ML/AI projects that will have a meaningful business impact.

Some experiments/projects sound super cool from a technical perspective, but don’t provide any immediate business value. These are the projects I say no to. — Johnny

Currently, I work for an IoT based Data Analytics company. Our IoT device sits in retail stores at the Point of Sale — theoretically, we could use it to provide translation services between a cashier and a customer. It could be a very cool project to build such a model that could work on the edge effectively. However, it wouldn’t have a business impact as that is not the business problem we are trying to solve.

VK: In your opinion what is the ideal Organizational placement for a data team?

JG: This really depends. Again, another take away I had from Data Science Go (did I mention you learn a lot at conferences 🙂 ) was from a talk that focused on the roles of a Data Science team, which determined where the team sat in the organization.

Depending on the organization and its needs, data science teams could sit in the engineering team helping them build ML pipelines/products, or in a centralized/embedded team serving as a center of excellence for data science/analytics, or in the research department exploring next generation of AI products, etc.

VK: If you could redo your career today, what would you do?

JG: I would have worked on my soft skills earlier. I would have joined a Toastmasters group, started attending Meetups and offered to give talks.

Along with this, I would have focused on building applications in my free time, honing my software engineering skills while building my Operations/Cloud Architecture and deployment skills.

VK: What online blogs/people do you follow for getting advice/ learning more about DS?

JG: Some of the blogs I follow and podcasts I listen to are below.

Blogs:
Acing AI 🙂
Towards Data Science

Podcasts:
Practical AI
TWIML
Super Data Science
Data Engineering Podcast
Data Science at Home
AI in Industry
Data Skeptic

Subscribe to our Acing AI newsletter, I promise not to spam and its FREE!

Newsletter

Thanks for reading! 😊 If you enjoyed it, test how many times can you hit 👏 in 5 seconds. It’s great cardio for your fingers AND will help other people see the story.


What advice should a data scientist ignore? was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Great Data Science Company Blogs

List of popular company Data Science blogs

Some of the best technology companies showcase their innovation from time to time on their blogs. These blogs are a great source to read when you are preparing for company specific data science interviews. From a company perspective, the blogs help attract data professionals. In the last few years, companies are in a race to hire data talent and have started showcasing their data science technology and techniques by having separate data science/ machine learning or AI sections on their blogs.

At Acing Data Science, we consume a lot of papers, blogs, videos and podcasts about data science. Lots of companies write about data science but below are our top picks of the company data science blogs. These blogs cover one or few aspects of data science in a very helpful way helping the whole data science community in general.

Photo by Luke Chesser on Unsplash
  • Google: Google is where some of the very early research in data science and AI began. Their AI blog is one of the most mature and complete manifestation of what an AI blog would look like. The blog covers everything from publications, stories, open source data science frameworks, data sets, tools, learning courses and finally careers at this AI institution.
  • Uber: Uber AI Labs has a fantastic set of articles which gives us a speak peak into the great work going on within Uber. Uber’s also gives building blocks about its coveted ML-as-a-service platform Michaelangelo. Uber has also open sourced many data engineering and data science frameworks and mentioned them on its blog.
  • Facebook: Facebook has been doing great work in computer vision and conversational AI. They have open sourced Pytorch which is increasingly cited in papers on ArXiv. Their blog also covers publications, experiments and techniques within Facebook which helps advance the data science field forward.
  • AirBnB AI & Machine Learning: Airbnb has one of the best AI and ML company blogs. They have done some amazing work using deep learning models on search, listing photos and a host of other things. Airbnb data scientists are split across teams which is detailed by Elena Grewal. It shows some of the best ways to think about building and managing teams within product companies.
  • Instacart Data Science | Instacart ML: Instacart handles 200 million plus grocery items on their platform. The blog showcases their data engineering prowess. It also shows some of the techniques they apply to critical business areas like delivery, cost prediction, real-time availability of grocery items and even some great data visualizations using their data.
  • OpenAI blog: OpenAI’s mission is to ensure that artificial general intelligence benefits all of humanity. OpenAI has some great papers and findings on their blog which are on the cutting edge of AI.
  • StitchFix: Stichfix is the most under rated data science blog for their data visualizations. Their algorithms tour is one of the best ways I have seen data scientists explain what their product does. Their blog (multi-threaded) does not have a separate section for data science but they cover the interesting things they do within Stitchfix.

This is by no means an exhaustive list of company blogs to follow and read. These blogs have some of the best data science content helpful for all data professionals!

Subscribe to our Acing AI newsletter, I promise not to spam and its FREE!

Newsletter

Thanks for reading! 😊 If you enjoyed it, test how many times can you hit 👏 in 5 seconds. It’s great cardio for your fingers AND will help other people see the story.

The sole motivation of this blog article is to learn about the different AI company blogs and its technologies. All data is sourced from online public sources. I aim to make this a living document, so any updates and suggested changes can always be included. Please provide relevant feedback.


Great Data Science Company Blogs was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Visa, Inc. Data Science Interviews

Visa Data Science Interviews

In 2018, Visa had $20.61 billion in revenue.

In 2015, the Nilson Report, a publication that tracks the credit card industry, found that Visa’s global network (known as VisaNet) processed 100 billion transactions during 2014 with a total volume of US$6.8 trillion. VisaNet data centers can handle up to 30,000 simultaneous transactions and up to 100 billion computations every second. Visa is a very household name all over the world. If you ever owned a credit card, you will surely know what Visa is. With a 100 billion transactions, the scale of data in the company is beyond compare. It could be a highlight of a Data professionals’ career.

Photo by Sharon McCutcheon on Unsplash

Interview Process

A senior data scientist from the team reaches out for the first telephonic interview after the resume is selected. The interview involves resume based questions, SQL, and or a business case study. After the first round, there is another telephonic technical interview. Eventually, there are five on-site interviews. On-site interviews are with top level personnel, directors and VPs. Each of those interviews is 45 minutes long.

Important Reading

Source: Data Science at Visa

Data Science Related Interview Questions

  • Invert an integer without the use of strings.
  • Write a sorting algorithm.
  • How do you estimate a customer’s location based on Visa transaction data?
  • Write a code for a Fibonacci sequence.
  • What functions can I perform using a spreadsheet?
  • Who would be your first line of contact to report a missing data you’re keeping record of?
  • Give the top three employee salaries in each department in a company.
  • What is Node.js?
  • What is MVC?
  • What is synchronous vs asynchronous Javascript?

Reflecting on the Interviews

The data science interview at Visa, Inc. is a rigorous process which involves many different interviews. The team is top notch and they are looking for similar candidates to hire. Most interviews look for fundamentals in SQL, coding, probability and statistics as well as ML. A decent amount of hard work can surely get you a job with the world’s largest credit transaction processing company!

Subscribe to our Acing AI newsletter, I promise not to spam and its FREE!

Newsletter

Thanks for reading! 😊 If you enjoyed it, test how many times can you hit 👏 in 5 seconds. It’s great cardio for your fingers AND will help other people see the story.

The sole motivation of this blog article is to learn about Visa Inc. and its technologies and help people to get into it. All data is sourced from online public sources. I aim to make this a living document, so any updates and suggested changes can always be included. Please provide relevant feedback.


Visa, Inc. Data Science Interviews was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Analysis of Data Science Interview Questions

Breaking down data science interview questions by category

tl;dr: Unless the data science role is nuanced, most data science roles require fundamental knowledge about the basics of data science. (SQL, Coding, Probability and Stats, Data Analysis)

We analyzed hundreds of data science interview questions to find trends, patterns and topics that are the core of a data science interview.

At a high level, we divided these questions into different categories. We added a weight to each category. Weight of a category is simply the number of times we found a question occurring or repeating in the bucket from a random corpus of 100 questions.

From the pie chart above, categories like SQL, coding are non ambiguous. Machine learning basics consists of Linear/Logistic regression and related ML algorithms. Advanced ML consists of comparisons between multiple approaches, algorithms and nuanced techniques. Big Data technology includes big data concepts such as Hadoop, Spark and may include the infrastructure side/data engineering/deployment side of data science models. In essence, data science fundamentals are asked 70% of the time in a data science interview.

While SQL and coding based questions might be part of the initial online assessment, data analysis questions tend to be a take-home assessment. The remaining categories are usually covered during the phone/in-person interview and vary based on the role, company, years of experience and team composition.

Considering all this data, we designed a data science interview course to help people Ace Data Science Interviews. All the categories mentioned above will be covered in this course. The current cohort starts September 16, 2019. It will be a small group of 15 people. Sign up here!

Acing Data Science Interviews

Subscribe to the Acing AI/Data Science Newsletter. It is FREE!

Newsletter


Analysis of Data Science Interview Questions was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Acing Data Science Interviews

According to Indeed, there is a 344% increase in demand for data scientists year over year.

In January 2018, I started the Acing AI blog with a goal to help people get into data science. My first article was about “The State of Autonomous Transportation”. As I wrote more, I realized people were interested in acing data science interviews. This led me to start my articles covering various companies’ data science interview questions and processes. The Acing AI blog will continue to have interesting articles as always. This journey continues with today’s exciting announcements.

First, we are launching the Acing Data Science/Acing AI newsletter. This newsletter will always be free. We will be sharing interesting data science articles, interview tips and more via the newsletter. Some of you are already subscribed to this newsletter and will continue to get emails on it.

Through my first newsletter, I also wanted to share the next evolution of the Acing AI blog, Acing Data Science Interviews.

I partnered with Johnny to come up with an amazing course to help people ace data science interviews. Everything we have learned from conducting interviews, giving interviews, writing these blogs and learning from the best people in the data science, we packaged that into this course. Think about the collective six plus years of learning condensed into a three month course. That would be Acing Data Science Interviews.

At a high level, we will cover different topics from a data science interview perspective. These include SQL, coding, probability and statistics, data analysis, machine learning algorithms, advanced machine learning, machine learning system design, deep learning, neural networks, big-data concepts and finally approaching a data science interviews. The first few topics provide the foundation aspects of data science. They are followed by the data science application topics. Collectively, all these should encompass everything that could be asked in a data science interview.

The first sessions will start in the second half of September 2019. We are aiming to have a small group of 15 people. The original course will be only 199$. We are focused on quality and would like to provide the best experience and hence, we want to keep the small group size.

Acing Data Science Interviews

Thank you for reading!


Acing Data Science Interviews was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Workday Data Science Interviews

In 2012, Workday launched a successful IPO valued at $9.5 billion.

Workday is a leading provider of enterprise cloud applications for finance and human resources. It was founded in 2005. Workday delivers financial management, human capital management, planning, and analytics applications designed for the world’s largest companies, educational institutions, and government agencies. In January 2018, Workday announced that it acquired SkipFlag, makers of an AI knowledge base that builds itself from a company’s internal communications. In July 2018, they acquired Stories.bi to boost augmented analytics. These two acquisitions point towards an increased investment in the data science domain.

Source: cloudfoundation.com

Interview Process

The process starts with a phone screen with a recruiter. That is followed by a technical phone interview with hiring manager. The questions are typical machine learning and data science questions — with some data structures and algorithms questions. If both of those go well, there is an onsite interview.
The onsite consists of five interviews with different team members, hiring managers, and executives. The questions are about programming skills, algorithmic skills, data structures, and anything related to machine learning techniques.

Important Reading

Source: https://workday.github.io/scala/2014/05/15/managing-a-job-grid-using-akka

Data Science Related Interview Questions

  • Given data from the world bank, provide insights on a small CSV file.
  • Write a C++ class to perform garbage collection.
  • Given 2 sorted arrays, merge them into 1 array. If the first array has enough space for 2, how do you merge the 2 without using extra space?
  • Given a huge collection of books, how would you tag each book based on genre?
  • Compare the classification algorithms
  • Logistic regression vs neural network
  • Integer array — get pairs of values that equal a certain target value.
  • How would you improve the complexity of a list merging algorithm from quadratic to linear?
  • What is p-value?
  • Perform a tweet correlation analysis and tweet prediction for the given dataset.

Reflecting on the Questions

The questions are highly technical in nature. They point towards a very strong requirement of having Data Scientists who can code very well. Workday is the employee directory in the cloud and there are interesting things that could be done based on data. A good inclination of a Data Scientists in coding can surely land a job with Workday!

Subscribe to our Acing Data Science newsletter. A new course to ace data science interviews is coming soon. Sign up below to join the waitlist!

Acing Data Science Interviews

Thanks for reading! 😊 If you enjoyed it, test how many times can you hit 👏 in 5 seconds. It’s great cardio for your fingers AND will help other people see the story.


Workday Data Science Interviews was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Goldman Sachs Data Science Interviews

Goldman Sachs did net revenue of 35.94 billion dollars in 2018.

The Goldman Sachs Group, Inc. is a leading global investment banking, securities and investment management firm that provides a wide range of financial services to a substantial and diversified client base that includes corporations, financial institutions, governments and individuals. Goldman Sachs makes key decisions by taking a calculated risk, based on data and evidence. As a Data Science practitioner, your analysis might have first hand impact to make millions of dollars. The FAST (Franchise Analytics Strategy and Technology) team at Goldman Sachs is a group of data scientists and engineers who are responsible for generating insights and creating products that turn big data into easily digestible takeaways. In essence, the FAST team is comprised of data experts who help other professionals at Goldman Sachs act on relevant insights.

Source: https://revenuesandprofits.com/how-goldman-sachs-makes-money/

Interview Process

The first step is the phone screen with hiring manager person. There is usually a hackerank/coderpad coding assignment involved for an ML/Data Engineer type of role. If that goes well, there is an onsite interview. The onsite interview is usually 4–6 people deep dive into analysis, probability and stats, coding and data science concepts.

Important Reading

Data Science Related Interview Questions

  • Design a random number generator.
  • How to treat missing and null values in a dataset?
  • Given N noodles in a bowl and randomly attaching ends. What is the expected number of loops you will have in the end?
  • How to remove duplicates without distinct from a database table?
  • When is value at risk inappropriate?
  • What is the Wiener process?
  • A = [-2 -1] [9 4]. What is A¹⁰⁰⁰?
  • Write an algorithm for a tree traversal.
  • Write a program for Levenshtein Distance calculation.
  • Count the total number of trees in the states.

Reflecting on the Questions

GS is one of the best places to work for because they really take care of their people. The questions reflect a mix of puzzles and analysis based questions which form the basis of financial investments in general. Thinking on your feet is very important as puzzles can get complicated. A great presence of mind and ample preparation can surely land you a job with one of the most prestigious investment banks in the world!

Subscribe to our Acing AI newsletter, I promise not to spam and its FREE!

Acing AI Newsletter – Revue

Thanks for reading! 😊 If you enjoyed it, test how many times can you hit 👏 in 5 seconds. It’s great cardio for your fingers AND will help other people see the story.


Goldman Sachs Data Science Interviews was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

How to evaluate ML models using confusion matrix?

Model Evaluation using Confusion Matrix

Model evaluation is a very important aspect of data science. Evaluation of a Data Science Model provides more colour to our hypothesis and helps evaluate different models that would provide better results against our data.

What Big-O is to coding, validation and evaluation is to Data Science Models.

Photo by Leon Koye on Unsplash

When we are implementing a multi-class classifier, we have multiple classes and the number of data entries belonging to all these classes is different. During testing, we need to know whether the classifier performs equally well for all the classes or whether there is bias towards some classes. This analysis can be done using the confusion matrix. It will have a count of how many data entries are correctly classified and how many are misclassified.

Let’s take an example. There is a total of ten data entries that belong to a class, and the label for that class is “Class 1”. When we generate the prediction from our ML model, we will check how many data entries out of the ten entries get the predicted label as “Class 1”. Suppose six data entries are correctly classified and get the label “Class 1”. In this case, for six entries, the predicted label and True(actual) label is the same, so the accuracy is 60%. For the remaining data entries (4 entries), the ML model misclassifies them. The ML model predicts class labels other than “Class 1”. From the preceding example, it is visible that the confusion matrix gives us an idea about how many data entries are classified correctly and how many are misclassified. We can explore the class-wise accuracy of the classifier.

Source: ML Solutions

For more learning on similar topics, the ML solutions book provides good explanations.

For more such answers to important Data Science concepts, please visit Acing AI.

Subscribe to our Acing AI newsletter, I promise not to spam and its FREE!

Acing AI Newsletter – Revue

Thanks for reading! 😊 If you enjoyed it, test how many times can you hit 👏 in 5 seconds. It’s great cardio for your fingers AND will help other people see the story.


How to evaluate ML models using confusion matrix? was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Next Meetup

 

Days
:
Hours
:
Minutes
:
Seconds

 

Plug yourself into AI and don't miss a beat

 


Toronto AI is a social and collaborative hub to unite AI innovators of Toronto and surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, vr, robotics and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.