Category: Vibhanshu Sharma

Working on Kaggle datasets through Google Colab

Colab + Kaggle

Data scientists often practice their skills on Kaggle datasets, but downloading and managing those datasets can be tricky. An easier way is to use Google Colab, which integrates with Kaggle through a couple of simple commands. Let’s dive in!

Prerequisites:

You need only two things to get started.

  1. A Gmail account.
  2. A Kaggle account.

Open the Kaggle home page and go to your account by clicking your profile picture in the top right-hand corner.

Kaggle Account

Now scroll down to the API section and click on “Create new API token”

get kaggle API key

You will get a kaggle.json file; save it somewhere on your PC. Now open Google Colab and make sure to sign in with your Google account.

Your first Colab notebook (rename the notebook to whatever you want)

You can always choose the Python version and your choice of hardware accelerator (GPU/TPU) by clicking ‘Change runtime type’ under the ‘Runtime’ tab.

Next, install the kaggle library in your environment by typing the command below. Note that Colab needs a ! at the start of every shell command, which you do not need when you work on your PC (command prompt or Anaconda).

!pip install kaggle

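Unix commands like ls, mkdir, unzip and pip work directly in Colab when prefixed with “!” (cd is the exception: !cd does not persist between commands, so use %cd to change directory). Once the library is installed, create a directory named .kaggle in your home directory, which is where the Kaggle client looks for your credentials.

!mkdir -p ~/.kaggle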

Now use the kaggle.json file that you downloaded to connect your Colab notebook with Kaggle. Use the command below to upload your JSON file; once you run it, you will automatically be prompted to upload your file. In Colab this is done with the files.upload() helper from the google.colab module:

from google.colab import files
files.upload()

Upload your kaggle.json by clicking “Choose Files”. Your username and key are then displayed, which confirms that the upload worked.

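Move the uploaded file into the .kaggle directory and restrict its permissions so that only you can read it (the Kaggle client warns you if the key is readable by other users).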
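The move-and-permissions step can be sketched in Python as follows. This is a minimal sketch, not the exact commands from the original post: install_kaggle_token is a hypothetical helper name, and it assumes the token was uploaded as kaggle.json into the current working directory and that the Kaggle client reads it from ~/.kaggle/kaggle.json.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_token(token_path, kaggle_dir=None):
    """Copy kaggle.json into the Kaggle config directory and lock down its permissions.

    Hypothetical helper for illustration; it is not part of the Kaggle client.
    """
    # Assumption: the Kaggle client looks for credentials in ~/.kaggle by default.
    target_dir = Path(kaggle_dir) if kaggle_dir else Path.home() / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)

    target = target_dir / "kaggle.json"
    shutil.copy(token_path, target)

    # Owner read/write only, the same effect as `chmod 600 kaggle.json`.
    os.chmod(target, 0o600)
    return target


# In a Colab cell, after uploading the file, you would call:
# install_kaggle_token("kaggle.json")
```

The helper takes an optional kaggle_dir argument only so that it can be tried out against a scratch directory before touching your real credentials.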

You are good to go now!

Basic Kaggle commands

Here are the most useful Kaggle commands that you’ll need to work on Kaggle datasets.

1. Browsing Kaggle datasets: This command lists the datasets available on Kaggle.

!kaggle datasets list

Other information, such as the size of each dataset and its download count, is also shown in the output.

To list competitions instead, use the command below; for example, beginner competitions can be listed using

!kaggle competitions list --category gettingStarted

2. Checking the leaderboard of a competition: Using the command below, you can check the leaderboard of any competition you want.

!kaggle competitions leaderboard favorita-grocery-sales-forecasting -s

3. Downloading datasets for a competition: You can download the dataset for any competition from the command line. Go to Kaggle → choose the competition → go to the Data tab and copy the API command.

You can run this command directly to download the dataset.
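For reference, here is roughly what that download step looks like when driven from Python. This is a minimal sketch, assuming the Kaggle CLI form kaggle competitions download -c <competition> -p <folder>; the function names are hypothetical, and the competition name is just the one used elsewhere in this post.

```python
import subprocess


def download_command(competition, dest="."):
    """Build the Kaggle CLI command that downloads a competition's data files."""
    # -c selects the competition, -p sets the folder the files are saved to.
    return ["kaggle", "competitions", "download", "-c", competition, "-p", dest]


def download_competition(competition, dest="."):
    """Run the download; needs the kaggle package and a valid kaggle.json."""
    subprocess.run(download_command(competition, dest), check=True)


# Example call (hypothetical destination folder):
# download_competition("humpback-whale-identification", "data")
```

Keeping the command builder separate from the call that runs it makes it easy to print and check the command before anything is downloaded.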

It’s good to create two separate directories, Train and Test, move the respective zip files into them and unzip them there.
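The unzip step can also be done with Python’s standard library instead of shell commands. The sketch below assumes the downloaded archives are called train.zip and test.zip, which is an assumption for illustration; the actual file names depend on the competition. extract_into is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def extract_into(zip_path, dest_dir):
    """Unzip an archive into dest_dir, creating the directory if needed."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)

    # Return the top-level names now present in the directory, sorted.
    return sorted(p.name for p in dest.iterdir())


# Example calls (archive names are assumptions):
# extract_into("train.zip", "Train")
# extract_into("test.zip", "Test")
```

Extracting straight into Train and Test means you do not need a separate move step for the zip files.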

4. Submitting to a competition: Every competition asks you to upload a submission file in CSV format that contains the predictions made by your code. You can submit directly from Colab using the command below.

!kaggle competitions submit humpback-whale-identification -f sample_submission.csv -m "Submitted by DSVS_01272019"

Pro tip: You can create a small shell script for automated submissions, but remember that Kaggle will accept only 5 submissions per day.
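One way to automate submissions while respecting that cap is sketched below in Python rather than shell. It is only an illustration under stated assumptions: the limit of 5 is the figure quoted in this post, the submissions.json log file and all function names are hypothetical, and the command uses the kaggle competitions submit -c/-f/-m form.

```python
import json
import subprocess
from datetime import date
from pathlib import Path

DAILY_LIMIT = 5  # Assumption: the per-day cap quoted in this post.


def submit_command(competition, csv_path, message):
    """Build the Kaggle CLI command for one submission."""
    return ["kaggle", "competitions", "submit", "-c", competition,
            "-f", str(csv_path), "-m", message]


def submit_with_cap(competition, csv_path, message,
                    log_path="submissions.json",
                    runner=subprocess.run, today=None):
    """Submit only if fewer than DAILY_LIMIT submissions were logged today.

    Returns True if a submission was made, False if the cap was reached.
    """
    today = today or date.today().isoformat()
    log_file = Path(log_path)

    # The log maps an ISO date string to the number of submissions that day.
    counts = json.loads(log_file.read_text()) if log_file.exists() else {}
    if counts.get(today, 0) >= DAILY_LIMIT:
        return False

    runner(submit_command(competition, csv_path, message), check=True)

    counts[today] = counts.get(today, 0) + 1
    log_file.write_text(json.dumps(counts))
    return True
```

The runner argument exists so the logic can be tried out with a stand-in function before it calls the real Kaggle client. Note that the log only counts submissions made through this helper, not ones made on the website.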

Conclusion: You can use the commands above to make your Kaggling journey easier with Google Colab. If you like this article, please share it with other Kagglers and data scientists. Do clap if you find this useful. I’ll be covering important concepts in a more interesting way in my next posts. Thanks for reading.

A-Z Learning curriculum for Data Science to follow in 2019.

I’ve been learning and applying end-to-end data science concepts for the past two and a half years. Below is the list of resources and MOOCs that I used to learn. If you work at a fast pace, you can complete this in 4 months (investing 3–4 hours every day). The two things you need to complete this curriculum are discipline and consistency. I recommend watching the videos at 1.5x–2x (try different speeds and see what suits you best).

Month 1: Python basics and its implementation in data science

Week 1– Learn Python: Two learning resources are very useful for learning Python for data science; I recommend going through both in the sequence below. https://automatetheboringstuff.com/ https://www.codecademy.com/learn/learn-python

Week 2– This week is more intensive: you have to learn the data science aspects of Python. https://www.edx.org/course/introduction-python-data-science-2 Watch at 1.5x. If you’re new to Python, invest time in the first 3 topics (basics, data structures, functions & packages); otherwise jump directly to section 4, i.e. NumPy, plotting with matplotlib and pandas basics. This course will give you a basic idea of these libraries and their usage. If you want to explore the advanced parts, you can refer to the book by Chris Albon (https://www.amazon.ca/Machine-Learning-Python-Cookbook-Preprocessing/dp/1491989386/ref=sr_1_1?ie=UTF8&qid=1546373714&sr=8-1&keywords=chris+albon).

Week 3– By this week you’ve gained enough knowledge to understand the code presented in this series: https://www.youtube.com/watch?v=T5pRlIbr6gg&list=PL2-dafEMk2A6QKz1mrk1uIGfHkC1zZ6UU Replicate the code shown in these videos by Siraj yourself and add it to your online GitHub profile.

Week 4– This week you’ll learn intermediate material through the course below from the University of Michigan: https://www.coursera.org/specializations/data-science-python. A suggestion for this course: resize your browser and Python editor so that both are visible on the screen at the same time, so you can listen to the course and code side by side.

Month 2: Mathematics basics and its implementation in data science

This whole month is a journey into understanding mathematics. We will divide the month across 4 main mathematical areas: Linear Algebra, Calculus, Probability and Statistics. Divide your study hours so that you watch one lecture from Math of Intelligence by Siraj ( https://www.youtube.com/watch?v=xRJCOz3AfYY&list=PL2-dafEMk2A7mu0bSksCGMJEmeddU_H4D) every day as you progress on this journey.

Week 1– Linear Algebra course by MIT; I recommend listening to it at 2.5x–3x. First listen to all the concepts in the video, then write them down on paper (I’ll share those concepts in my next post). 3blue1brown should be on your subscription list, and Essence of Linear Algebra ( https://www.youtube.com/watch?v=kjBOesZCoqc&index=1&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab) should be your next course; set aside a weekend for it and spend 4–5 hours to finish it in one go.

Week 2– For Calculus (univariate and multivariate), Essence of Calculus should be your next course. https://www.youtube.com/playlist?list=PLZHQObOWTQDMsr9K-rj53DwVRMYO3t5Yr Allow one or two days to complete this course.

Week 3– Introduction to Probability: this course will take more than a week (use the remaining days of the calculus week). https://courses.edx.org/courses/course-v1:MITx+6.041x_4+1T2017/course/ By this time you may be frustrated, but do not lose your cool, and do not apply for any job yet, as you have not yet achieved what is required for the post of machine learning engineer or data scientist.

Week 4– Introduction to statistics by Khan Academy is the best course for statistics (https://www.khanacademy.org/math/statistics-probability), but there is another good course as well, from Imperial College London on Coursera, that covers the mathematics for machine learning (https://www.coursera.org/learn/pca-machine-learning/lecture/QCWpn/variance-of-higher-dimensional-datasets). Naked Statistics by Charles Wheelan is a must-read by the end of this week (https://www.amazon.ca/Naked-Statistics-Stripping-Dread-Data/dp/1480590185).

Month 3: Machine Learning basics and related algorithms

This month you will learn the main machine learning algorithms.

Week 1– This course on Udemy will help you understand machine learning line by line (https://www.udemy.com/machinelearning/). It is built on scikit-learn and Keras (both among the best libraries for academia). You will find implementations in both R and Python. I suggest keeping the R videos for later, as you’ve already invested in learning Python; that way you can skip half the videos in each section and still excel in this course. The course also contains deep learning lectures as a separate section; park those aside too.

Week 2– You may not be able to complete the above course in a week and may need a few more days. Use this week for that, along with auditing a nice course from edX ( https://courses.edx.org/courses/course-v1:ColumbiaX+DS102X+2T2018/course/). Don’t revisit concepts that you’ve already covered.

Week 3– Now comes something that may seem unfair and make you wonder why you need to repeat the same material from a different provider. But believe me: just as the black and white pawns in a chess game are both important and complement each other despite being on opposite teams, the course mentioned below will complement the one you took in week 2, and you will learn all the missing components, this time using TensorFlow (https://classroom.udacity.com/courses/ud120). You may feel you can easily skim through this course because you already know the concepts, but the order of doing these courses is important.

Week 4– Now it’s time to apply your knowledge to a real-world problem and build your résumé. Take part in any Kaggle competition that involves classification or regression, work on it to the end and submit your solution (the confidence you’ll get by doing this is something no university degree can give you).

Month 4: Deep Learning basics from beginner to advanced

This month you’ll apply all your knowledge to learn advanced concepts, take part in Kaggle competitions and start applying for jobs.

Week 1– Again, I’ll ask you to revisit Udemy for the A-Z Deep Learning course (https://www.udemy.com/deeplearning). Don’t forget to add all your code to your GitHub profile as you progress through this month.

Week 2 & 3 – Revisiting the same concepts will reinforce them, and you’ll never forget them, as they will be imprinted on your brain. Udacity’s Deep Learning Nanodegree is such a course (get its certificate, which will help you find new career opportunities).

Week 4– This week you’ll be finalizing your concepts and starting to fulfil your dreams. I recommend going through the fast.ai course (http://course.fast.ai/) and a brilliant course on storytelling from edX (https://www.edx.org/course/analytics-storytelling-impact-1). Realistically, both can’t be completed in a week, but try to complete them as soon as you can.

Conclusion: By this point you’ve learnt enough concepts and have a good GitHub profile to showcase to potential recruiters. Keep posting your work on GitHub as you proceed through the curriculum. I’ll post detailed concepts in my next posts so that you don’t have to carry your notes. All the best, and keep learning till you die.

Toronto AI is a social and collaborative hub to unite AI innovators of Toronto and surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, vr, robotics and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.