Working on Kaggle datasets through Google Colab

Colab + Kaggle

Data scientists often practice their skills on Kaggle datasets, but downloading and managing those datasets can be tricky. Google Colab makes this easier: it integrates with Kaggle through a couple of simple commands. Let's dive in!

Prerequisites:

You need only two things to get started:

  1. A Google (Gmail) account.
  2. A Kaggle account.

Open the Kaggle home page and go to your account by clicking your profile picture in the top right-hand corner.

Kaggle Account

Now scroll down to the API section and click “Create new API token”.

Get Kaggle API key

This downloads a kaggle.json file; save it somewhere on your PC. Now open Google Colab and make sure you are signed in with your Google account.

Your first Colab notebook (rename the notebook to whatever you want)

You can choose the Python version and the hardware accelerator (GPU/TPU) by clicking ‘Change runtime type’ under the ‘Runtime’ menu.

Next, install the kaggle library in your environment with the command below. Note that in Colab every shell command must start with !; you do not need this prefix when you run the same command in a terminal on your PC (Command Prompt or Anaconda Prompt).

!pip install kaggle

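Unix commands such as ls, mkdir, unzip and pip work directly in Colab when prefixed with “!”. The exception is cd: each ! command runs in its own subshell, so use %cd to change the notebook's working directory. Once kaggle is installed, create a hidden directory named .kaggle in your home directory.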

!mkdir -p ~/.kaggle

Now you need the kaggle.json file you downloaded to connect your Colab notebook with Kaggle. Use the command below to upload your JSON file; once you run it, you will automatically be prompted to choose the file.

Upload your kaggle.json by clicking “Choose Files”. If the upload worked, the output displays your username and key.
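In Colab the upload prompt is commonly produced by `from google.colab import files` followed by `uploaded = files.upload()`, which returns a dict mapping each uploaded file name to its contents as bytes. Assuming that, the "did it upload correctly?" check can be done in code rather than by eye. A minimal sketch; `check_kaggle_json` is a hypothetical helper, not part of any library.

```python
import json


def check_kaggle_json(raw: bytes) -> str:
    """Validate the contents of an uploaded kaggle.json and return the username.

    Hypothetical helper: it only checks that the file is a JSON object with
    non-empty "username" and "key" fields, the two fields the token contains.
    """
    try:
        data = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError(f"kaggle.json is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("kaggle.json should contain a JSON object")
    for field in ("username", "key"):
        if not data.get(field):
            raise ValueError(f"kaggle.json is missing the '{field}' field")
    # Return only the username so the secret key does not end up in cell output
    return data["username"]
```

In a Colab cell you might call it as `check_kaggle_json(uploaded["kaggle.json"])` right after the upload.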

Move the uploaded file into the ~/.kaggle directory and give it proper permissions, i.e. readable and writable only by you (chmod 600).
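By default the kaggle CLI reads its token from `~/.kaggle/kaggle.json` (the `KAGGLE_CONFIG_DIR` environment variable can point it elsewhere) and warns when that file is readable by other users. Here is a minimal Python sketch of the move-and-permissions step, assuming the upload left `kaggle.json` in the current working directory; `install_kaggle_token` is a hypothetical name.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def install_kaggle_token(src: str = "kaggle.json", home: Optional[str] = None) -> Path:
    """Copy kaggle.json into <home>/.kaggle and restrict it to the owner.

    Roughly equivalent to:
        mkdir -p ~/.kaggle && cp kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json
    """
    home_dir = Path(home) if home is not None else Path.home()
    config_dir = home_dir / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    dest = config_dir / "kaggle.json"
    shutil.copyfile(src, dest)
    os.chmod(dest, 0o600)  # owner read/write only
    return dest
```

The optional `home` argument exists so the function can be tried against a scratch directory; in Colab you would normally call `install_kaggle_token()` with no arguments.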

You are good to go now!

Basic Kaggle commands

Below are the most useful Kaggle commands you'll need to work with Kaggle datasets.

  1. Browsing Kaggle datasets: this command lists the datasets available on Kaggle.
!kaggle datasets list
The output also shows other details, such as each dataset's size and download count.
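If you would rather sort or filter that listing in Python than read it on screen, one option is to ask the CLI for CSV output and parse the text. This assumes your kaggle version supports the `--csv` flag on `datasets list` (`kaggle datasets list -h` will tell you); `parse_listing` and `sort_by_int` are hypothetical helpers, and the column you sort on must match the header row your CLI version actually prints.

```python
import csv
import io
from typing import Dict, List


def parse_listing(csv_text: str) -> List[Dict[str, str]]:
    """Turn CSV text printed by the kaggle CLI into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def sort_by_int(rows: List[Dict[str, str]], column: str) -> List[Dict[str, str]]:
    """Sort rows in descending order of an integer-valued column."""
    return sorted(rows, key=lambda row: int(row[column]), reverse=True)
```

In a notebook cell, IPython's `lines = !kaggle datasets list --csv` captures the output as a list of lines, so something like `sort_by_int(parse_listing("\n".join(lines)), "downloadCount")` should put the most-downloaded datasets first, assuming the header names that column `downloadCount` and the CLI printed no warning lines ahead of the CSV.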

To search for specific competitions, use the competitions list command; for example, beginner competitions can be listed with:

!kaggle competitions list --category gettingStarted
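The same commands can also be driven from Python with `subprocess`, which is handy when you want to reuse a query across cells or scripts. A minimal sketch; `competitions_list_cmd` and `run_kaggle` are hypothetical helpers, and the flags used (`--category`, `-s` for a search term) are ones the kaggle CLI offers for `competitions list` (run `kaggle competitions list -h` to confirm them for your version).

```python
import subprocess
from typing import List, Optional


def competitions_list_cmd(category: Optional[str] = None,
                          search: Optional[str] = None) -> List[str]:
    """Build the argument list for `kaggle competitions list`."""
    cmd = ["kaggle", "competitions", "list"]
    if category:
        cmd += ["--category", category]  # e.g. "gettingStarted"
    if search:
        cmd += ["-s", search]  # free-text search term
    return cmd


def run_kaggle(cmd: List[str]) -> str:
    """Run a kaggle CLI command and return its standard output."""
    # check=True raises CalledProcessError if the CLI exits with an error
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

For example, `print(run_kaggle(competitions_list_cmd(category="gettingStarted")))` does the same as the ! command above. Passing a list instead of one string avoids shell-quoting problems, which is exactly what the garbled dash in a copied command can cause.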

2. Checking a competition's leaderboard: use the command below to view the leaderboard of any competition you want. The -s flag shows the top of the leaderboard in the output.

!kaggle competitions leaderboard favorita-grocery-sales-forecasting -s

3. Downloading datasets for a competition: you can download the dataset for any competition with a single command. Go to Kaggle → choose the competition → go to the Data tab and copy the API command shown there (you may need to accept the competition's rules on its page first).

Run that command directly in Colab, prefixed with !, to download the dataset.

It's a good idea to create two separate directories, Train and Test, move the respective zip files into them and unzip them there.
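The Train/Test housekeeping can be scripted with Python's standard `zipfile` and `shutil` modules. This sketch assumes the downloaded archives sit in one folder and that their file names contain "train" or "test" (for example `train.zip` and `test.zip`); real archive names differ between competitions, and `organize_zips` is a hypothetical helper.

```python
import shutil
import zipfile
from pathlib import Path
from typing import Dict, List


def organize_zips(download_dir: str) -> Dict[str, List[str]]:
    """Move *train*/*test* zip files into Train/ and Test/ and extract them there.

    Returns a mapping from folder name to the zip files that were handled.
    Zips whose names mention neither "train" nor "test" are left untouched.
    """
    base = Path(download_dir)
    handled: Dict[str, List[str]] = {"Train": [], "Test": []}
    # sorted() takes a snapshot of the listing before any file is moved
    for archive in sorted(base.glob("*.zip")):
        name = archive.name.lower()
        if "train" in name:
            folder = "Train"
        elif "test" in name:
            folder = "Test"
        else:
            continue
        target_dir = base / folder
        target_dir.mkdir(exist_ok=True)
        moved = Path(shutil.move(str(archive), str(target_dir / archive.name)))
        with zipfile.ZipFile(moved) as zf:
            zf.extractall(target_dir)  # unzip next to the moved archive
        handled[folder].append(archive.name)
    return handled
```

Calling `organize_zips(".")` after the download would leave, say, `train.zip` and its extracted files in `Train/`, while an unrelated archive such as a zipped sample submission stays where it was.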

4. Submitting to a competition: every competition asks you to upload a submission file in CSV format containing the predictions made by your code. You can submit directly from Colab with the command below.

!kaggle competitions submit humpback-whale-identification -f sample_submission.csv -m "Submitted by DSVS_01272019"
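The file passed with -f has to match the competition's expected format, so a safe habit is to reuse the header from the competition's sample_submission.csv. A minimal sketch with the standard `csv` module; `read_header` and `write_submission` are hypothetical helpers, and the row-length check is simply a guard against malformed rows, not a description of Kaggle's own validation.

```python
import csv
from typing import Iterable, List, Sequence


def read_header(sample_path: str) -> List[str]:
    """Return the column names from a competition's sample_submission.csv."""
    with open(sample_path, newline="") as f:
        return next(csv.reader(f))


def write_submission(path: str, header: Sequence[str],
                     rows: Iterable[Sequence[object]]) -> int:
    """Write predictions under the given header and return the number of rows.

    Raises ValueError if a row has a different number of fields than the header.
    """
    count = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in rows:
            if len(row) != len(header):
                raise ValueError(
                    f"row {count} has {len(row)} fields, expected {len(header)}")
            writer.writerow(row)
            count += 1
    return count
```

A typical flow would be `write_submission("submission.csv", read_header("sample_submission.csv"), predictions)`, then the submit command above with `-f submission.csv`; `predictions` here stands for whatever rows your model produces.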

Pro tip: you can write a small shell script to automate submissions, but remember that Kaggle accepts only 5 submissions per day.
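The same automation can also live in Python. The sketch below keeps a local log of submissions per competition and per UTC day, and refuses to call the CLI once the daily cap is reached. The cap defaults to the 5 mentioned above, but limits may differ between competitions, so check each competition's rules. The day boundary (midnight UTC), the log file name and the `SubmissionGuard` class are all assumptions of this sketch. Kaggle's own count is what actually matters; this log only knows about submissions made through it.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, List, Optional


def _run(cmd: List[str]) -> None:
    """Default runner: execute the kaggle CLI and fail loudly on errors."""
    subprocess.run(cmd, check=True)


class SubmissionGuard:
    """Hypothetical helper: cap submissions per competition per UTC day."""

    def __init__(self, log_path: str = "submission_log.json", daily_limit: int = 5,
                 today: Optional[Callable[[], str]] = None):
        self.log_path = Path(log_path)
        self.daily_limit = daily_limit
        # `today` is injectable for testing; defaults to the current UTC date
        self.today = today or (lambda: datetime.now(timezone.utc).date().isoformat())

    def _load(self) -> List[List[str]]:
        # Each entry is [competition, "YYYY-MM-DD"]
        if self.log_path.exists():
            return json.loads(self.log_path.read_text())
        return []

    def remaining(self, competition: str) -> int:
        day = self.today()
        used = sum(1 for comp, d in self._load() if comp == competition and d == day)
        return max(self.daily_limit - used, 0)

    def _record(self, competition: str) -> None:
        entries = self._load()
        entries.append([competition, self.today()])
        self.log_path.write_text(json.dumps(entries))

    def submit(self, competition: str, csv_path: str, message: str,
               runner: Callable[[List[str]], object] = _run) -> bool:
        """Submit via the kaggle CLI if today's budget allows; return True if submitted."""
        if self.remaining(competition) == 0:
            return False
        cmd = ["kaggle", "competitions", "submit", competition,
               "-f", csv_path, "-m", message]
        runner(cmd)
        self._record(competition)  # only logged if the runner did not raise
        return True
```

A scheduled notebook or script could then call `SubmissionGuard().submit("humpback-whale-identification", "submission.csv", "nightly run")` and simply skip the upload once the day's budget is used up.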

Conclusion: the commands above should make your Kaggle journey easier with Google Colab. If you liked this article, please share it with other Kagglers and data scientists, and clap if you found it useful. I'll be covering important concepts in more interesting ways in my next posts. Thanks for reading!

Toronto AI is a social and collaborative hub to unite AI innovators of Toronto and surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, vr, robotics and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.