Working on Kaggle Datasets through Google Colab
Data scientists often practice their skills on Kaggle datasets, but downloading and managing those datasets can be tricky. An easier way is to use Google Colab, which integrates with Kaggle through a couple of simple commands. Let’s dive in!
Prerequisites:
You need only two things to get started: a Kaggle API token and a Google account for Colab.
Open the Kaggle home page and go to your account by clicking your profile picture in the top right-hand corner.
Now scroll down to the API section and click on “Create new API token”.
You will get a kaggle.json file; save it somewhere on your PC. Now open Google Colab and make sure you are signed in with your Google account.
You can choose the Python version and your choice of hardware (GPU/TPU) by clicking ‘Change runtime type’ under the ‘Runtime’ tab.
Next, install the kaggle library in your environment by typing the command below. Note that Colab needs a ! at the start of every shell command, which you do not need when you work on your PC (Command Prompt or Anaconda).
!pip install kaggle
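Most Unix commands, such as ls, mkdir, unzip and pip, work directly in Colab when prefixed with “!”. The exception is cd: each “!” command runs in its own subshell, so a directory change does not persist (use %cd for that). Once kaggle is installed, create a directory named .kaggle in your home directory, which is where the Kaggle client looks for the token.
!mkdir -p ~/.kaggle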
Now you have to use the kaggle.json file that you downloaded to connect your Colab notebook with Kaggle. Upload it from a notebook cell with files.upload() from the google.colab module; once you run the cell, you are automatically prompted to choose your file.
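Here is a minimal sketch of the upload step in Python. It assumes that files.upload() returns a dictionary mapping each uploaded file name to its bytes, and that kaggle.json holds a "username" and a "key" field. The helper names extract_kaggle_token and upload_kaggle_token are made up for this illustration.

```python
import json


def extract_kaggle_token(uploaded):
    """Pick kaggle.json out of an upload dict and check that it looks valid.

    `uploaded` is assumed to map file names to raw bytes, which is the
    shape returned by google.colab's files.upload().
    """
    if "kaggle.json" not in uploaded:
        raise ValueError("kaggle.json was not among the uploaded files")
    token = json.loads(uploaded["kaggle.json"].decode("utf-8"))
    if not isinstance(token, dict):
        raise ValueError("kaggle.json should contain a JSON object")
    missing = {"username", "key"} - set(token)
    if missing:
        raise ValueError(f"kaggle.json is missing fields: {sorted(missing)}")
    return token


def upload_kaggle_token():
    """Prompt for the file in Colab and return the parsed token."""
    # google.colab exists only inside a Colab runtime, so import it lazily.
    from google.colab import files
    return extract_kaggle_token(files.upload())
```

In a Colab cell you would call upload_kaggle_token() and pick the file in the dialog that appears. Checking the fields up front catches the case where the wrong file was selected.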
Move the uploaded file into the .kaggle directory and restrict its permissions so that only your user can read it (mv, then chmod 600).
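If you prefer to stay in Python, the move and the permission change can be sketched as below. This assumes the token was uploaded to the current working directory as kaggle.json and that the Kaggle client reads it from ~/.kaggle. The function name install_kaggle_token is hypothetical.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_token(src="kaggle.json", config_dir=None):
    """Move the token into the config directory and make it owner-only."""
    # Default to ~/.kaggle, where the Kaggle client is assumed to look.
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)
    dest = config_dir / "kaggle.json"
    shutil.move(str(src), str(dest))
    # 0o600 = read/write for the owner only, same as `chmod 600`.
    os.chmod(dest, 0o600)
    return dest
```

Calling install_kaggle_token() with no arguments covers the usual Colab case. The config_dir parameter is there only so you can try the function on a scratch directory first.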
Basic Kaggle commands
Here are the most useful Kaggle commands that you’ll need to work with Kaggle datasets.
1. Browsing Kaggle datasets: This command lists the datasets available on Kaggle.
!kaggle datasets list
To search for specific competitions, pass a filter to the list command. For example, beginner competitions can be listed using
!kaggle competitions list --category gettingStarted
2. Checking the leaderboard of a competition: The command below shows the leaderboard of any competition you want.
!kaggle competitions leaderboard favorita-grocery-sales-forecasting -s
3. Downloading datasets for a competition: Go to Kaggle → choose the competition → go to the Data tab and copy the API command, which has the form kaggle competitions download -c followed by the competition name. Run it in Colab with a “!” prefix to download the data.
It’s good practice to create two separate directories, Train and Test, move the respective zip files into them and unzip them there.
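The Train/Test layout can also be set up from Python with the standard zipfile module instead of mkdir, mv and unzip. The sketch below assumes the downloaded archives are ordinary zip files. The function name extract_archives and the file names in the usage note are placeholders, since the actual names depend on the competition.

```python
import zipfile
from pathlib import Path


def extract_archives(archives, root="."):
    """Unzip each archive into its own folder under `root`.

    `archives` maps a folder name (e.g. "Train") to the path of a zip file.
    Returns the sorted member names extracted into each folder.
    """
    extracted = {}
    for folder, zip_path in archives.items():
        target = Path(root) / folder
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(target)
            extracted[folder] = sorted(zf.namelist())
    return extracted
```

For example, extract_archives({"Train": "train.zip", "Test": "test.zip"}) would leave the training files under Train/ and the test files under Test/. Unlike the shell approach, this leaves the zip files where they were downloaded.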
4. Submitting to a competition: Every competition asks you to upload a submission file in CSV format containing the predictions made by your code. You can submit directly from Colab using the command below.
!kaggle competitions submit humpback-whale-identification -f sample_submission.csv -m "Submitted by DSVS_01272019"
Pro-tip: You can create a small shell script for automated submissions, but remember that Kaggle accepts only a limited number of submissions per day (typically 5, though the limit varies by competition).
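As a rough illustration of automated submissions, here is a sketch written in Python rather than shell. It wraps the same submit command shown above and keeps a local count so that it stops at the daily limit. The DAILY_LIMIT value, the submissions.json log file and the submit function are assumptions made for this example. The count is per calendar day on the local clock, which may not match how Kaggle counts a day.

```python
import datetime
import json
import subprocess
from pathlib import Path

DAILY_LIMIT = 5  # figure quoted in this article; check your competition's rules


def submit(competition, csv_file, message,
           log_path="submissions.json", runner=subprocess.run, today=None):
    """Submit via the kaggle CLI unless today's local count hit the limit.

    Returns True if the command was run, False if the limit was reached.
    `runner` and `today` are injectable so the logic can be tried offline.
    """
    day = (today or datetime.date.today()).isoformat()
    path = Path(log_path)
    log = json.loads(path.read_text()) if path.exists() else {}
    if log.get(day, 0) >= DAILY_LIMIT:
        return False
    cmd = ["kaggle", "competitions", "submit", competition,
           "-f", csv_file, "-m", message]
    runner(cmd, check=True)  # raises if the CLI exits with an error
    # Record the submission only after the command succeeded.
    log[day] = log.get(day, 0) + 1
    path.write_text(json.dumps(log))
    return True
```

A call such as submit("humpback-whale-identification", "sample_submission.csv", "Submitted from Colab") mirrors the command above. Note that the log lives in the Colab working directory, so the count starts over when the runtime is recycled.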
Conclusion: The commands above should make your Kaggle journey with Google Colab easier. If you like this article, please share it with other Kagglers and data scientists. Do clap if you find it useful. I’ll be covering important concepts in more interesting ways in my next posts. Thanks for reading.