What advice should a data scientist ignore?

What advice should a data scientist ignore? — Interview with Johannes, Senior Data Engineer at Loop Insights

Johannes Giorgis is a Senior Data Engineer at Loop Insights. His story of moving from a big company to a fast-paced, data-driven startup is fascinating. I met Johnny while we took the Deep Learning Nanodegree at Udacity together, and we have stayed in touch ever since. Over the few years I have known Johnny, I have realized that “still waters run deep” is an apt proverb for him. He shares his learning via his blog. In this interview, he explains why it is important for folks to understand where their ML models fit in the larger scheme of a software system.


Vimarsh Karbhari (VK): Which three books about AI/ML/DS have you liked the most? Which books have had the most impact on your career?

Johannes Giorgis (JG):

Artificial Intelligence: A Modern Approach was an eye-opener to the field of Artificial Intelligence. I read through that book back when I was first enrolled in Udacity’s Artificial Intelligence Nanodegree.

I’m currently reading Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable and Maintainable Systems. This has been a great resource on building the systems that power data-driven applications.

Next on my list is Data Science from Scratch — I’m excited about this book as it focuses on the base algorithms that power a lot of data science today. Re-writing these algorithms and applying them in the context of being a new data scientist at a company gives us that one level deeper perspective that we lose when we rely on higher level libraries’ functions.

VK: Which tool or tools (software, hardware, or habit) that you use as a Data Scientist have the most impact on your work?

JG: Pandas! I’m always excited to use my Pandas skills to clean, format and explore datasets. Recently, I’ve taken up learning Docker for web-related tasks at work. I’m hoping to incorporate it into my data science workflow and toolkit to help me create a reproducible data science development workspace. Being able to quickly share the environment in which you built a model is a huge advantage.

VK: Can you share the Data Science related failures, projects, or experiments that you have learned from the most?

JG: A friend and I got together to explore the tech meetups in our respective cities, Vancouver and San Francisco. We initially explored the Meetup API to see what it allowed us to do. From there, we built some helper functions to get data for multiple groups and transform it into Pandas DataFrames so we could move forward with cleaning and exploring the data. Jumping straight into a problem and looking up just enough documentation and tutorials to move forward one inch at a time was an invaluable lesson. I often find myself stuck in tutorial hell, where I’m unable to apply what I’ve just learnt to anything that will help me retain it.

By focusing on a project or problem that I’m interested in exploring or solving, I avoid getting stuck with tutorials. — Johnny
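As an editorial illustration of the kind of helper functions Johnny describes, here is a minimal sketch and not his actual code. It assumes a hypothetical `fetch_events` callable that returns a list of event dictionaries for a group. The field names (`title`, `date`, `rsvps`), the group names, and the sample data are all made up, and no real Meetup API call is shown.

```python
from typing import Callable, Dict, Iterable, List

import pandas as pd

# Hypothetical record shape: one dict per event, as a fetcher might return it.
Event = Dict[str, object]


def collect_events(group_names: Iterable[str],
                   fetch_events: Callable[[str], List[Event]]) -> pd.DataFrame:
    """Fetch events for several groups and stack them into one DataFrame."""
    frames = []
    for name in group_names:
        events = fetch_events(name)
        if not events:
            continue  # skip groups that returned no events
        frame = pd.DataFrame(events)
        frame["group"] = name  # remember which group each row came from
        frames.append(frame)
    if not frames:
        return pd.DataFrame(columns=["group"])
    return pd.concat(frames, ignore_index=True)


def clean_events(df: pd.DataFrame) -> pd.DataFrame:
    """Parse dates, drop rows whose date cannot be parsed, drop duplicates."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"], errors="coerce")
    out = out.dropna(subset=["date"]).drop_duplicates()
    return out.reset_index(drop=True)


# Stand-in for a real API client: made-up data keyed by made-up group names.
FAKE_DATA = {
    "vancouver-data": [
        {"title": "Intro to Pandas", "date": "2018-05-01", "rsvps": 40},
        {"title": "Docker for DS", "date": "not a date", "rsvps": 25},
    ],
    "sf-ml": [
        {"title": "CNN basics", "date": "2018-06-12", "rsvps": 80},
    ],
    "empty-group": [],
}


def fake_fetch(group_name: str) -> List[Event]:
    """Return the made-up events for a group, or an empty list."""
    return FAKE_DATA.get(group_name, [])


events = clean_events(collect_events(FAKE_DATA.keys(), fake_fetch))
rsvps_by_group = events.groupby("group")["rsvps"].sum()
```

Passing the fetcher in as an argument keeps the DataFrame logic separate from whatever client actually talks to the API, so the cleaning and exploration steps can be tried out on sample data first.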

VK: If you were to write a book what would be the title of the book? What would be the main topics you would cover in the book?

JG: I am interested in writing a book that explores how a company can build its data capabilities: from no data team, to some or plenty of data, to a fully fledged data infrastructure that enables analytics and Machine Learning exploration. It would then take that to the next level and cover how to deploy machine learning and AI in an effective way to solve business problems.

Too many resources out there focus on the sexy data science/ML model building part, which in reality is what data scientists tend to spend the least amount of time on. Most of the time is spent capturing the data, cleaning it and transforming it into something they can actually use. In the real world, data is messy and rarely sits in one single place. Being able to take that and build a data infrastructure that enables data scientists, analysts and machine learning engineers to do their work is an area that fascinates me.

Tied to that is the deployment of machine learning/AI systems. Again, lots of resources walk you through how to build a model, but not enough show you how to make it useful: build a web app and deploy it to Heroku, dockerize it and deploy it to a cloud environment, etc. The value of these systems will only be realized by making them available to people, whether you are building a side project for fun or building a business. Not everyone needs to know about scale, ML platforms, etc., but it is an important aspect to understand so folks know where their ML models fit in the larger scheme of a software system.

Going hand in hand with all this is the question of how you can evangelize data within an organization so that it becomes more data-driven, and how you can communicate the importance of using and building data capabilities to executives and decision-makers.

VK: In terms of time, money or energy what are the best investments you have made which have given you compounded rewards in your career?

JG: Having moved to Vancouver while still exploring the field of AI, Meetups have been invaluable to me. I met so many people that were on the same journey as me, some I could learn from and others I could help. Going out and meeting folks is a great way to connect, to understand the problems people are solving and even to find new roles!

Conferences are also a great learning and networking opportunity. You tend to be surrounded by folks you don’t usually have the chance to meet in person, so take advantage and connect. It is also a place to learn in more detail what other companies are working on, the challenges they have faced and how they solved them. I attended Data Science Go earlier this year in San Diego and I met lots of exciting and passionate people. I’m looking forward to attending next year as well as finding more relevant data conferences to attend.

Working on a project of my own accord, separate from an online class, has also been very rewarding. Courses are great for covering the basics and getting you started, but projects allow you to sink your teeth into a problem and really wrap your head around how to get stuff done with the skills you’ve learnt. I’ve worked on exploring Tech Meetups in Vancouver, scraping data from multiple pages to create my own catalog, etc. While working on these projects, I get more ideas on how to extend them, which in turn requires me to learn more skills to achieve that.

Podcasts are another resource I spend a lot of time using — there are lots of good Data Science focused podcasts that explore different aspects — practical applications, theoretical papers, how to build your career, leadership, ethics, data engineering, etc.

VK: In the last year, what has improved your work life which could benefit others?

JG: I joined a startup earlier this year so I have been adjusting to the speed change coming from a much larger company. Every task in a startup can seem like it is a priority 1, so being able to prioritize tasks and communicate the expectation of how long they will take is a crucial skill I’ve needed to develop.

VK: What advice would you give to someone starting in this field? What advice should they ignore?

JG: This is a piece of advice I heard while attending Data Science Go: focus on the area that you are interested in. Specifically, if you aren’t interested in working with images, don’t learn Convolutional Neural Networks. If you aren’t interested in Marketing, don’t bother learning Marketing related analytics. Sometimes it is easier to figure out what we aren’t interested in than what we are interested in. So go through this process to narrow down the areas you may be interested in.

This field is quite vast — although more specialized roles are being created, a data scientist could either do data infrastructure, build machine learning models, do analytics or conduct statistical experiments or some combination of these and more. Although there is talk of the unicorn full stack data scientist, you must realize that this will take years to achieve (if you are aiming to do it well).

Start blogging! Start learning how to communicate your findings and your challenges in written form. Share what you are learning. Just as there is someone ahead of you, there is someone behind you who can learn from you.

VK: How do you decide when to say no to experiments/projects?

JG: Right now, I’m really interested in building ML/AI projects that will have a meaningful business impact.

Some experiments/projects sound super cool from a technical perspective, but don’t provide any immediate business value. These are the projects I say no to. — Johnny

Currently, I work for an IoT-based Data Analytics company. Our IoT device sits in retail stores at the Point of Sale. Theoretically, we could use it to provide translation services between a cashier and a customer. It could be a very cool project to build such a model that could work on the edge effectively. However, it wouldn’t have a business impact, as that is not the business problem we are trying to solve.

VK: In your opinion what is the ideal Organizational placement for a data team?

JG: This really depends. Again, another takeaway I had from Data Science Go (did I mention you learn a lot at conferences? 🙂) came from a talk on the roles of a Data Science team and how those roles determine where the team sits in the organization.

Depending on the organization and its needs, data science teams could sit in the engineering team helping them build ML pipelines/products, or in a centralized/embedded team serving as a center of excellence for data science/analytics, or in the research department exploring the next generation of AI products, etc.

VK: If you could redo your career today, what would you do?

JG: I would have worked on my soft skills earlier. I would have joined a Toastmasters group, started attending Meetups and offered to give talks.

Along with this, I would have focused on building applications in my free time, honing my software engineering skills while building my Operations/Cloud Architecture and deployment skills.

VK: Which online blogs/people do you follow for advice and for learning more about DS?

JG: Some of the blogs I follow and podcasts I listen to are below.

Blogs:
Acing AI 🙂
Towards Data Science

Podcasts:
Practical AI
TWIML
Super Data Science
Data Engineering Podcast
Data Science at Home
AI in Industry
Data Skeptic


What advice should a data scientist ignore? was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
