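[Trick] Getting a Kaggle account's token and API key (for downloading datasets)
0: Background
My upcoming research needs a U-Net, but a senior labmate's spaghetti code would not run, and the U-Net I wrote myself performed poorly when loading his dataset. So I decided to start from the very basics: take a U-Net implementation from GitHub and get one example running end to end on a public dataset.
The U-Net repository on GitHub: https://github.com/milesial/Pytorch-UNet
It includes a bash script for downloading the dataset, which bundles the terminal commands needed.
The script is as follows: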
#!/bin/bash
if [[ ! -f ~/.kaggle/kaggle.json ]]; then
  echo -n "Kaggle username: "
  read USERNAME
  echo
  echo -n "Kaggle API key: "
  read APIKEY
  mkdir -p ~/.kaggle
  echo "{\"username\":\"$USERNAME\",\"key\":\"$APIKEY\"}" > ~/.kaggle/kaggle.json
  chmod 600 ~/.kaggle/kaggle.json
fi
pip install kaggle --upgrade
kaggle competitions download -c carvana-image-masking-challenge -f train_hq.zip
unzip train_hq.zip
mv train_hq/* data/imgs/
rm -d train_hq
rm train_hq.zip
kaggle competitions download -c carvana-image-masking-challenge -f train_masks.zip
unzip train_masks.zip
mv train_masks/* data/masks/
rm -d train_masks
rm train_masks.zip
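As the script shows, a Kaggle username and API key are required to download the data; without valid ones, the download fails with an Unauthorized error.
The output looks like this: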
root@autodl-container-25494b9550-92941e6b:~/Unet# bash scripts/download_data.sh
Looking in indexes: http://mirrors.aliyun.com/pypi/simple
Requirement already satisfied: kaggle in /root/miniconda3/lib/python3.8/site-packages (1.6.17)
Requirement already satisfied: requests in /root/miniconda3/lib/python3.8/site-packages (from kaggle) (2.28.2)
Requirement already satisfied: python-dateutil in /root/miniconda3/lib/python3.8/site-packages (from kaggle) (2.8.2)
Requirement already satisfied: bleach in /root/miniconda3/lib/python3.8/site-packages (from kaggle) (6.0.0)
Requirement already satisfied: certifi>=2023.7.22 in /root/miniconda3/lib/python3.8/site-packages (from kaggle) (2024.12.14)
Requirement already satisfied: urllib3 in /root/miniconda3/lib/python3.8/site-packages (from kaggle) (1.26.20)
Requirement already satisfied: tqdm in /root/miniconda3/lib/python3.8/site-packages (from kaggle) (4.64.1)
Requirement already satisfied: python-slugify in /root/miniconda3/lib/python3.8/site-packages (from kaggle) (8.0.4)
Requirement already satisfied: six>=1.10 in /root/miniconda3/lib/python3.8/site-packages (from kaggle) (1.16.0)
Requirement already satisfied: webencodings in /root/miniconda3/lib/python3.8/site-packages (from bleach->kaggle) (0.5.1)
Requirement already satisfied: text-unidecode>=1.3 in /root/miniconda3/lib/python3.8/site-packages (from python-slugify->kaggle) (1.3)
Requirement already satisfied: charset-normalizer<4,>=2 in /root/miniconda3/lib/python3.8/site-packages (from requests->kaggle) (3.1.0)
Requirement already satisfied: idna<4,>=2.5 in /root/miniconda3/lib/python3.8/site-packages (from requests->kaggle) (2.10)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
401 - Unauthorized - Unauthenticated
unzip: cannot find or open train_hq.zip, train_hq.zip.zip or train_hq.zip.ZIP.
mv: cannot stat 'train_hq/*': No such file or directory
rm: cannot remove 'train_hq': No such file or directory
rm: cannot remove 'train_hq.zip': No such file or directory
401 - Unauthorized - Unauthenticated
unzip: cannot find or open train_masks.zip, train_masks.zip.zip or train_masks.zip.ZIP.
mv: cannot stat 'train_masks/*': No such file or directory
rm: cannot remove 'train_masks': No such file or directory
rm: cannot remove 'train_masks.zip': No such file or directory
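In the log above, every command after the failed download still runs, which is why a single 401 turns into a string of unzip, mv, and rm errors. A minimal sketch of a guard that skips the unpacking when the archive never arrived might look like this. `download_then_unzip` is a hypothetical helper name, not part of the repository's script, and checking for the archive file instead of trusting the exit code of the download command is an assumption on my part about what is most robust.

```shell
# Hypothetical helper (not part of the original script):
# run a download command, then unzip only if the archive actually exists.
download_then_unzip() {
    archive="$1"
    shift
    # The remaining arguments are the download command itself.
    "$@" || echo "download command reported an error" >&2
    # Check for the archive itself instead of relying only on the exit code.
    if [ ! -f "$archive" ]; then
        echo "$archive was not downloaded; skipping unzip" >&2
        return 1
    fi
    unzip -q "$archive"
}

# Intended use, left commented out so that nothing is downloaded here:
# download_then_unzip train_hq.zip \
#     kaggle competitions download -c carvana-image-masking-challenge -f train_hq.zip
```

With a guard like this, an authentication failure would produce one clear message per archive and leave the mv and rm steps for the caller to skip.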
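Running bash scripts/download_data.sh again afterwards produces the same errors and never returns to the username and API key prompts, because the script only prompts when ~/.kaggle/kaggle.json does not exist.
So the existing, possibly misconfigured kaggle.json file has to be deleted.
The command is: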
rm -f ~/.kaggle/kaggle.json
After that, running the script again prompts for the Kaggle account information first.
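Since the script only checks whether `~/.kaggle/kaggle.json` exists, a file with empty fields (for example, one written after pressing Enter at both prompts) is enough to block the prompts from then on. The following is a rough sketch of a local sanity check. `check_kaggle_json` is a hypothetical helper, and the `grep` patterns are a heuristic that assumes the flat one-line JSON the script writes. It cannot tell whether Kaggle actually accepts the credentials, only whether both fields are present and non-empty.

```shell
# Hypothetical helper: print "ok", "incomplete", or "missing" for a
# kaggle.json-style file, and return 0 only in the "ok" case.
check_kaggle_json() {
    file="$1"
    if [ ! -f "$file" ]; then
        echo "missing"
        return 1
    fi
    # Heuristic: both fields must be present with a non-empty string value.
    if grep -Eq '"username"[[:space:]]*:[[:space:]]*"[^"]+"' "$file" &&
       grep -Eq '"key"[[:space:]]*:[[:space:]]*"[^"]+"' "$file"; then
        echo "ok"
        return 0
    fi
    echo "incomplete"
    return 1
}

# Intended use, left commented out so that no real file is touched here:
# check_kaggle_json ~/.kaggle/kaggle.json || rm -f ~/.kaggle/kaggle.json
```

If the file passes this check and the 401 persists, the stored values themselves are probably wrong or expired, and deleting the file to re-enter them is still the way out.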
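1: Getting the Kaggle account information
1.1: Log in to Kaggle
Open https://www.kaggle.com/# in a browser to reach the Kaggle website.
Click Sign In to log in to your account.
If you have not registered with Kaggle before, choose Register and follow the prompts.
Accounts are usually registered by email, so click Sign in with Email here.
Enter your email and password; after a successful login you are redirected to the home page.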
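1.2: Download the token from Settings
Click your avatar in the top-right corner and choose Settings.
On the Account page, scroll down to the API section and click Create New Token.
A JSON file is then downloaded.
Open it with VS Code or Notepad to see its contents.
It contains both the username and the API key.
These can then be entered at the corresponding prompts when running the bash script.
The prompts look like this: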
root@autodl-container-25494b9550-92941e6b:~/Unet# bash scripts/download_data.sh
Kaggle username:
Kaggle API key:
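Instead of retyping the two values at the prompts, the downloaded file can also be put where the script looks for it, since the script skips the prompts when `~/.kaggle/kaggle.json` already exists. A minimal sketch, assuming the downloaded `kaggle.json` has already been copied onto the machine that runs the script. `install_kaggle_token`, its arguments, and the `~/Downloads` path in the usage comment are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: copy a downloaded token file into the Kaggle config
# directory and restrict its permissions, mirroring what the script does.
install_kaggle_token() {
    src="$1"
    dest_dir="${2:-$HOME/.kaggle}"   # default matches the path the script checks
    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir" &&
        cp "$src" "$dest_dir/kaggle.json" &&
        chmod 600 "$dest_dir/kaggle.json"
}

# Intended use, left commented out so that no real file is touched here
# (the source path is only an example):
# install_kaggle_token ~/Downloads/kaggle.json
```

Copying the file avoids typos in the long API key, and the `chmod 600` keeps it readable only by the current user, as in the original script.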
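1.3: Revoke the token
Click Expire Token; a notice pops up in the bottom-left corner saying that all API tokens have been expired.
This ensures that the account's API key cannot be misused by others.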