Deploying a FATE cluster with Docker Compose
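The official documentation is not very clear.
KubeFATE is used to generate the deployment scripts (link).
The deployment machine is the host where KubeFATE was downloaded; the running machines are the hosts where the FATE containers will be installed (the deployment machine and a running machine can be the same host).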
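- Two hosts: CentOS 7 is not mandatory; Ubuntu works as well
- Docker version: 19.03.0+; Docker Compose version: 1.27.0+
- Make sure the deployment machine can SSH into both running hosts without a password (not needed if the deployment machine and the running machine are the same host)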
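The prerequisites above can be checked with a short script. This is a minimal sketch, assuming GNU sort (for the -V version ordering) and the Docker Compose v2 plugin (docker compose); the helper names version_ge, report, and check_ssh are hypothetical and not part of KubeFATE.

```shell
#!/bin/sh
# Sketch: check the version prerequisites listed above.
# Helper names are illustrative, not part of KubeFATE.

# version_ge A B: succeed when version A >= version B (needs GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# report NAME FOUND REQUIRED: print whether FOUND satisfies REQUIRED.
report() {
    if version_ge "$2" "$3"; then
        echo "$1 $2: OK (>= $3)"
    else
        echo "$1 $2: too old (need >= $3)"
    fi
}

# check_ssh USER HOST: succeed only if key-based (passwordless) login works.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1@$2" true
}

if command -v docker >/dev/null 2>&1; then
    docker_ver="$(docker version --format '{{.Client.Version}}' 2>/dev/null)"
    report "Docker" "${docker_ver:-0}" "19.03.0"
    compose_ver="$(docker compose version --short 2>/dev/null)"
    compose_ver="${compose_ver#v}"   # drop a leading "v" if present
    report "Docker Compose" "${compose_ver:-0}" "1.27.0"
else
    echo "docker not found on this host"
fi
```

To test passwordless login from the deployment machine, call check_ssh with the user and IP configured in parties.conf, for example: check_ssh fate 192.168.0.1 && echo "ssh OK".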
$ vim docker-deploy/parties.conf
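user=fate # changing this to root is more convenient
dir=/data/projects/fate
party_list=(10000 9999) # party IDs to deploy; list only one ID for a single-party deployment
party_ip_list=(192.168.0.1 192.168.0.2) # IPs matching the party IDs, in the same order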
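Because party_list and party_ip_list are matched by position, a mismatch between them is easy to miss. The following is a minimal sketch of a sanity check to run before generating the configs; list_items and check_parties are hypothetical helper names, and the script reads the file with sed rather than sourcing it, so nothing in the config is executed.

```shell
#!/bin/sh
# Sketch: sanity-check parties.conf before running generate_config.sh.
# Helper names are illustrative, not part of KubeFATE.

# list_items FILE KEY: print the space-separated entries of the line KEY=(...).
list_items() {
    sed -n "s/^$2=(\([^)]*\)).*/\1/p" "$1"
}

# check_parties FILE: succeed only if every party ID has exactly one IP,
# and print the resulting ID-to-IP pairing.
check_parties() {
    ids="$(list_items "$1" party_list)"
    ips="$(list_items "$1" party_ip_list)"
    # Unquoted on purpose: word splitting counts the entries.
    set -- $ids; n_ids=$#
    set -- $ips; n_ips=$#
    if [ "$n_ids" -eq 0 ] || [ "$n_ids" -ne "$n_ips" ]; then
        echo "parties.conf: $n_ids party IDs but $n_ips IPs" >&2
        return 1
    fi
    i=0
    for id in $ids; do
        i=$((i + 1))
        ip="$(echo "$ips" | awk -v n="$i" '{ print $n }')"
        echo "party $id -> $ip"
    done
}

# Check the file given as the first argument, or the default location.
conf="${1:-docker-deploy/parties.conf}"
if [ -f "$conf" ]; then
    check_parties "$conf"
fi
```

With the configuration shown above, the script would print one line per party, pairing 10000 with 192.168.0.1 and 9999 with 192.168.0.2.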
$ bash ./generate_config.sh
$ ls docker-deploy/outputs/
confs-10000.tar confs-9999.tar serving-10000.tar serving-9999.tar
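$ bash ./docker_deploy.sh all --training # deploy all parties
$ bash ./docker_deploy.sh 9999 --training # deploy a single party
docker_deploy.sh copies the configuration archives to the target hosts via scp (which is why passwordless login is required), extracts them, and creates and starts the containers.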
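The copy, extract, and start steps described above can be pictured roughly as follows. This is a simplified sketch for illustration only, not the actual contents of docker_deploy.sh; the run and deploy_party helpers, the DRY_RUN switch, and the exact remote commands are assumptions. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Simplified sketch of the per-party steps; NOT the real docker_deploy.sh.
# Set DRY_RUN=0 to execute the commands instead of printing them.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# deploy_party USER IP DIR PARTY_ID: copy, unpack, and start one party.
deploy_party() {
    user="$1"; ip="$2"; dir="$3"; id="$4"
    # Make sure the target directory exists on the running host.
    run ssh "$user@$ip" "mkdir -p $dir"
    # Copy the generated archive (this is why passwordless SSH is needed).
    run scp "outputs/confs-$id.tar" "$user@$ip:$dir/"
    # Unpack the archive and start the containers in the background.
    run ssh "$user@$ip" "cd $dir && tar -xf confs-$id.tar && cd confs-$id && docker compose up -d"
}

# Example values taken from the parties.conf shown earlier.
deploy_party fate 192.168.0.1 /data/projects/fate 10000
```

Reading the dry-run output is a quick way to see which host and directory each archive would end up in before anything is changed on the running hosts.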
$ cd /data/projects/fate/confs-10000
$ docker compose ps
NAME IMAGE COMMAND SERVICE CREATED STATUS PORTS
confs-10000-clustermanager-1 federatedai/eggroll:3.2.0-release "/tini -- bash -c 'j…" clustermanager About a minute ago Up About a minute 4670/tcp
confs-10000-fateflow-1 federatedai/fateflow:2.2.0-release "/bin/bash -c 'set -…" fateflow About a minute ago Up About a minute (healthy) 192.168.7.1:9360->9360/tcp, :::9360->9360/tcp, 192.168.7.1:9380->9380/tcp, :::9380->9380/tcp
confs-10000-mysql-1 mysql:8.0.28 "docker-entrypoint.s…" mysql About a minute ago Up About a minute 3306/tcp, 33060/tcp
confs-10000-nodemanager-1 federatedai/eggroll:3.2.0-release "/tini -- bash -c 'j…" nodemanager About a minute ago Up About a minute 4671/tcp
confs-10000-osx-1 federatedai/osx:2.2.0-release "/tini -- bash -c 'j…" osx About a minute ago Up About a minute 192.168.7.1:9370->9370/tcp, :::9370->9370/tcp
confs-10000-fateboard-1 federatedai/fateboard:2.1.1-release "sh -c 'java -Dsprin…" fateboard About a minute ago Up About a minute 192.168.7.1:8080->8080/tcp
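With six containers per party, scanning the STATUS column by eye gets tedious. The filter below is a minimal sketch that reads the docker compose ps output and prints only the containers that are not up or that report an unhealthy state; not_up is a hypothetical helper name, and matching on the text " Up " is a heuristic based on the listing format shown above.

```shell
#!/bin/sh
# Sketch: list containers that need attention, based on the text output
# of `docker compose ps -a`. The helper name is illustrative.

# not_up: read the ps listing on stdin, skip the header row, and print the
# NAME column of every row that lacks an "Up" status or is "(unhealthy)".
not_up() {
    awk 'NR > 1 && ($0 !~ / Up / || $0 ~ /\(unhealthy\)/) { print $1 }'
}

# Usage, from the confs-<party_id> directory on a running host:
#   docker compose ps -a | not_up
```

The -a flag matters here because stopped containers are otherwise left out of the listing; empty output means every listed container is up.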
$ docker compose exec fateflow bash
Toy verification (flow test toy --guest-party-id 10000 --host-party-id 9999) fails inside the container: the flow command cannot be found.