Downloading an Entire FTP Directory
Recently I needed to download close to 10 TB of data from an FTP server. After trying out several tools, lftp's mirror command turned out to be the least hassle.
mirror options
Mirror specified source directory to local target directory. If target directory ends with a slash, the source base name is appended to target directory name. Source and/or target can be URLs pointing to directories.
-c, --continue continue a mirror job if possible
-e, --delete delete files not present at remote site
--delete-first delete old files before transferring new ones
--depth-first descend into subdirectories before transferring files
-s, --allow-suid set suid/sgid bits according to remote site
--allow-chown try to set owner and group on files
--ascii use ascii mode transfers (implies --ignore-size)
--ignore-time ignore time when deciding whether to download
--ignore-size ignore size when deciding whether to download
--only-missing download only missing files
--only-existing download only files already existing at target
-n, --only-newer download only newer files (-c won't work)
--no-empty-dirs don't create empty directories (implies --depth-first)
-r, --no-recursion don't go to subdirectories
--no-symlinks don't create symbolic links
-p, --no-perms don't set file permissions
--no-umask don't apply umask to file modes
-R, --reverse reverse mirror (put files)
-L, --dereference download symbolic links as files
-N, --newer-than=SPEC download only files newer than specified time
--on-change=CMD execute the command if anything has been changed
--older-than=SPEC download only files older than specified time
--size-range=RANGE download only files with size in specified range
-P, --parallel[=N] download N files in parallel
--use-pget[-n=N] use pget to transfer every single file
--loop loop until no changes found
-i RX, --include RX include matching files
-x RX, --exclude RX exclude matching files
-I GP, --include-glob GP include matching files
-X GP, --exclude-glob GP exclude matching files
-v, --verbose[=level] verbose operation
--log=FILE write lftp commands being executed to FILE
--script=FILE write lftp commands to FILE, but don't execute them
--just-print, --dry-run same as --script=-
--use-cache use cached directory listings
--Remove-source-files remove files after transfer (use with caution)
-a same as --allow-chown --allow-suid --no-umask
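For reference, a typical invocation combining a few of the options above might look like the sketch below. The user, password, ftp.example.com, and the directory paths are placeholders, not the ones from my actual job.
# dry run first: print what would be transferred without downloading anything
lftp -c "open -u user,password ftp.example.com; mirror --dry-run /remote/dir /local/dir"
# real run: resumable, up to 8 files in parallel, skip files already present locally, keep a log
lftp -c "open -u user,password ftp.example.com; mirror -c --parallel=8 --only-missing --log=mirror.log /remote/dir /local/dir"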
Issues encountered
1. Although mirror transfers files in parallel, and we were downloading three large directories (each containing many subdirectories), the directory listing phase still ate a lot of time. It is better to mirror the subdirectories directly; that way more transfers actually run at once.
2. Stick to the --only-missing option. With other options such as --only-newer, for reasons I never pinned down, files already on disk were deleted and downloaded again.
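The script below follows both points: it maps each remote subdirectory to a local target, starts one mirror job per subdirectory in the background with --only-missing, writes a separate log per directory under sync_logs/, and appends any path that lftp reports as "File not available" to shibai.txt for later inspection. The host, credentials, and the fumulu/zimulu paths are masked placeholders.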
#!/bin/bash
# FTP server details
FTP_HOST="xxxxx"
FTP_USER="xxxx"
FTP_PASS="xxxxxxx"

# Remote directories to sync and the local target for each
declare -A DIR_MAP=(
    ["/fumulu/zimulu1"]="/data/0/bendi/fumulu/"
    ["/fumulu/zimulu2"]="/data/0/bendi/fumulu/"
    ["/fumulu/zimulu3"]="/data/0/bendi/fumulu/"
    ["/fumulu/zimulu4"]="/data/0/bendi/fumulu/"
    ["/fumulu/zimulu5"]="/data/0/bendi/fumulu/"
    ["/fumulu/zimulu6"]="/data/0/bendi/fumulu/"
    ["/fumulu/zimulu7"]="/data/0/bendi/fumulu/"
)

# Create the log directory
LOG_DIR="sync_logs"
mkdir -p "$LOG_DIR"
sync_directory() {
    local remote_dir=$1
    local local_dir=$2

    # Build the log file name (replace directory separators with underscores)
    local log_name=$(echo "${remote_dir}" | tr '/' '_')
    local log_file="$LOG_DIR/${log_name}sync.log"

    # Make sure the local directory exists
    mkdir -p "$local_dir"

    echo "Starting sync of $remote_dir to $local_dir..." | tee -a "$log_file"
    echo "Sync started at: $(date)" >> "$log_file"

    # Mirror with lftp, skipping files that already exist locally (--only-missing)
    temp_log=$(mktemp)
    lftp -c "open -u $FTP_USER,$FTP_PASS $FTP_HOST; \
        mirror --parallel=1000 --verbose --only-missing $remote_dir $local_dir" 2>&1 | tee -a "$temp_log" "$log_file"

    # Check for files the server refused to hand over
    if grep -i "File not available" "$temp_log" > /dev/null; then
        echo "Some files failed to download, recording them in shibai.txt..."
        # Extract and record the failed file paths
        grep -i "File not available" "$temp_log" | while read -r line; do
            # Pull out the token after '@' (the path as it appears in the error line)
            full_path=$(echo "$line" | grep -o "@.*" | cut -d' ' -f1)
            echo "$full_path" >> shibai.txt
        done
    fi

    echo "Sync finished at: $(date)" >> "$log_file"
    echo "----------------------------------------" >> "$log_file"

    # Clean up the temporary log file
    rm -f "$temp_log"
}
# Launch all sync jobs at once, one background process per directory
for remote_dir in "${!DIR_MAP[@]}"; do
    local_dir=${DIR_MAP[$remote_dir]}
    sync_directory "$remote_dir" "$local_dir" &
done

# Wait for all background jobs to finish
wait
echo "All sync jobs have finished."