Trying PaddlePaddle's PaddleHelix biocomputing framework (failed)
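PaddleHelix is a biocomputing toolkit that applies machine learning, deep neural networks in particular, to advance the following areas:
- Drug discovery. Provides 1) large-scale pretrained models for compounds and proteins; 2) several applications: molecular property prediction, drug-target affinity prediction, and molecule generation.
- Vaccine design. Provides RNA design algorithms, including LinearFold and LinearPartition.
- Precision medicine. Provides applications for drug combinations.
Official repository: https://github.com/PaddlePaddle/PaddleHelix
Later, to simplify usage, the maintainers started offering PaddleHelix as an API service. For the technical details, see:
PaddleHelix Platform API SDK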
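Installation attempts
Plain pip install first
Start with pip install paddlehelix:
pip install paddlehelix
It failed.
On inspection, the release is two years old, so it probably does not support Python 3.10.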
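Installing inside a conda virtual environment
Create a Python 3.7 environment with conda:
conda create -n paddlehelix python=3.7
I later switched to a Python 3.8 environment:
conda create -n paddlehelix python=3.8
Activate the environment:
conda activate paddlehelix
Try installing rdkit with conda:
conda install -c conda-forge rdkit
If that is too slow, install it with pip instead.
One catch with a fresh conda environment: PaddlePaddle has to be installed again.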
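Install PaddlePaddle:
python -m pip install paddlepaddle-gpu -f https://paddlepaddle.org.cn/whl/stable.html
Install the related libraries:
pip install pgl
Install PaddleHelix with pip:
pip install paddlehelix
It still did not install.
After a lot of fiddling I never got it working. No wonder the maintainers later offered an API service: they must have noticed that people really cannot get PaddleHelix installed!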
Debugging
While pip-installing paddlehelix, the error THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE appeared:
Collecting networkx (from paddlehelix)
Downloading https://mirrors.aliyun.com/pypi/packages/a8/05/9d4f9b78ead6b2661d6e8ea772e111fc4a9fbd866ad0c81906c11206b55e/networkx-3.1-py3-none-any.whl (2.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 1.6/2.1 MB 33.5 MB/s eta 0:00:01
ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
networkx from https://mirrors.aliyun.com/pypi/packages/a8/05/9d4f9b78ead6b2661d6e8ea772e111fc4a9fbd866ad0c81906c11206b55e/networkx-3.1-py3-none-any.whl#sha256=4f33f68cb2afcf86f28a45f43efc27a9386b535d567d2127f8f61d51dec58d36 (from paddlehelix):
Expected sha256 4f33f68cb2afcf86f28a45f43efc27a9386b535d567d2127f8f61d51dec58d36
Got 9f9c721c0a7b33c92099cdd018101b05913b46ce578a533eeb80892853f94afb
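The log above says the bytes pip received did not hash to the sha256 carried in the URL fragment. A minimal Python sketch for checking a wheel by hand, assuming the file has already been downloaded locally; the helper names `sha256_of` and `matches_expected` are made up for illustration and are not part of pip:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex sha256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large wheels do not need to fit in memory
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest with the value pip printed as 'Expected sha256'."""
    return sha256_of(path) == expected_hex.strip().lower()
```

If the digest of a fresh download still differs from the expected value, a truncated transfer or a stale cache entry are possible causes; retrying with pip's `--no-cache-dir` option or another mirror is one thing to try, though that is a guess rather than a confirmed diagnosis of this case.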
I switched to the Baidu mirror.
sklearn error: The 'sklearn' PyPI package is deprecated, use 'scikit-learn'
Downloading https://mirrors.aliyun.com/pypi/packages/46/1c/395a83ee7b2d2ad7a05b453872053d41449564477c81dc356f720b16eac4/sklearn-0.0.post12.tar.gz (2.6 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [15 lines of output]
The 'sklearn' PyPI package is deprecated, use 'scikit-learn'
rather than 'sklearn' for pip commands.
Here is how to fix this error in the main use cases:
- use 'pip install scikit-learn' rather than 'pip install sklearn'
- replace 'sklearn' by 'scikit-learn' in your pip requirements files
(requirements.txt, setup.py, setup.cfg, Pipfile, etc ...)
- if the 'sklearn' package is used by one of your dependencies,
it would be great if you take some time to track which package uses
'sklearn' instead of 'scikit-learn' and report it to their issue tracker
- as a last resort, set the environment variable
SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True to avoid this error
More information is available at
https://github.com/scikit-learn/sklearn-pypi-package
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
Setting this did not help either:
SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True
Installing scikit-learn manually did not help either.
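One possible reason the variable had no effect (an assumption, since the post does not show how it was set): a bare `NAME=value` line in a shell sets a shell variable without exporting it, so the pip process never sees it. A minimal sketch that runs pip from Python with the variable placed directly in the child environment; `pip_env` and `pip_install` are hypothetical helper names:

```python
import os
import subprocess
import sys

OPT_IN = "SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL"


def pip_env(extra=None):
    """Copy the current environment and add the sklearn opt-in variable."""
    env = dict(os.environ)
    env[OPT_IN] = "True"
    env.update(extra or {})
    return env


def pip_install(package):
    """Run `python -m pip install <package>` with the opt-in variable set."""
    cmd = [sys.executable, "-m", "pip", "install", package]
    # check=False: return the exit code instead of raising on failure
    return subprocess.run(cmd, env=pip_env(), check=False).returncode
```

Calling `pip_install("paddlehelix")` would then pass the variable to pip and its build subprocesses. Even so, this only gets past the deprecated `sklearn` placeholder; it does not guarantee that the remaining dependencies install.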
I finally found the official file:
wget https://baidu-nlp.bj.bcebos.com/PaddleHelix/HelixFold/ppfleetx-0.0.0-py3-none-any.whl
But this step then failed with:
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... error
error: subprocess-exited-with-error
× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Checking for Rust toolchain....
Cargo, the Rust package manager, is not installed or is not on PATH.
This package requires Rust and Cargo to compile extensions. Install it through
the system's package manager or via https://rustup.rs/
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
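The conclusion: setting up the environment yourself is not practical, or at least it is hard to do on AI Studio.
I checked the official site, and it now offers an API.
However, it requires authentication from the very start, and no pricing or other information is available before you authenticate.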
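Appendix: PaddleHelix Platform API SDK
1. Set the API credentials (AK and SK)
- Obtain the Access Key ID (AK) and Secret Access Key (SK) needed to generate the API authentication information; see: How to obtain the AK and SK
- Enable the CHPC service: activation page
- Set the AK and SK as the environment variables PADDLEHELIX_API_AK and PADDLEHELIX_API_SK
export PADDLEHELIX_API_AK="your_access_key"
export PADDLEHELIX_API_SK="your_secret_key"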
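Before calling the SDK it can help to confirm that both variables are actually visible to the Python process. A minimal sketch; the helper name `missing_credentials` is made up here and is not part of the SDK:

```python
import os

REQUIRED_VARS = ("PADDLEHELIX_API_AK", "PADDLEHELIX_API_SK")


def missing_credentials(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

A script could call `missing_credentials()` at startup and stop with a clear message if the returned list is not empty, for example when it is launched from a different terminal than the one where the exports were run.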
2. Install the API SDK
- Download paddlehelix-1.3.1-py3-none-any.whl to your machine
- Install it with pip, replacing paddlehelix-1.3.1-py3-none-any.whl with the file's actual path on your machine
pip install paddlehelix-1.3.1-py3-none-any.whl
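3. HelixFold3 JSON format
4. Using HelixFold3 end to end
There is only one task submission function, helixfold3.execute. It is described below in three parts: data, single-task submission, and batch submission.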
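Data preparation
The input data can be passed through four parameters. At least one must be valid, and several can be combined. The parameters are data, data_list, file_path, and file_dir:
- data: a single JSON object; for the format see "HelixFold3 JSON format"
- data_list: a list of JSON objects in the form [{},{},...], where each element has the same format as data
- file_path: path to a data file whose content has the same format as data or data_list
- file_dir: path to a directory containing multiple data files, each with the same format as the file for file_path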
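To make the file-based parameters concrete, here is a minimal sketch that builds a list of jobs and writes it to a JSON file that could then be passed as file_path. The job fields mirror the single-task sample in this appendix; the helper names `make_job` and `write_jobs` and the file name are hypothetical, and whether the SDK expects exactly this file layout is an assumption based only on the statement that the file content matches data/data_list:

```python
import json
from pathlib import Path


def make_job(job_name, sequence, count=1):
    """Build one job dict with a single protein entity."""
    return {
        "job_name": job_name,
        "entities": [
            {"type": "protein", "sequence": sequence, "count": count},
        ],
    }


def write_jobs(jobs, path):
    """Write a list of job dicts to a JSON file and return its path."""
    path = Path(path)
    path.write_text(json.dumps(jobs, indent=2), encoding="utf-8")
    return path
```

For example, `write_jobs([make_job("job_a", "HKTDSFVGLMA", count=2)], "jobs.json")` produces a file holding a one-element list in the data_list shape.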
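Task submission
The data parameters described under data preparation apply to both single-task and multi-task (batch) mode. The mode refers to how tasks are submitted in each request. Suppose a user has n tasks: in either mode, all n tasks end up being submitted. The difference is that in single-task mode each request submits one task, and requests are repeated in a loop; in multi-task mode each request submits m tasks, and requests are repeated until all n tasks have been submitted.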
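The difference between the two modes comes down to how n tasks are grouped into requests. The sketch below illustrates only that grouping in plain Python; it does not call the SDK, and how helixfold3.execute selects a mode or a batch size is not shown in this appendix. The function name `batches` is made up for illustration:

```python
def batches(tasks, m):
    """Yield consecutive groups of at most m tasks, one group per request."""
    if m < 1:
        raise ValueError("m must be at least 1")
    for start in range(0, len(tasks), m):
        yield tasks[start:start + m]


tasks = ["t1", "t2", "t3", "t4", "t5"]

# Single-task mode corresponds to m = 1: one task per request
single = list(batches(tasks, 1))

# Multi-task mode with m = 2: the last request carries the remainder
multi = list(batches(tasks, 2))
```

With n = 5, m = 1 gives five requests, while m = 2 gives three requests, the last one carrying a single task.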
Note: when running the sample code below, make sure you are in the terminal session where PADDLEHELIX_API_AK and PADDLEHELIX_API_SK were set as described above.
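- Single-task mode
from paddlehelix.task import helixfold3
# A single job: one protein entity with count 2
data = {
    "job_name": "7xwo_chain_F_22",
    "entities": [
        {
            "type": "protein",
            "sequence": "HKTDSFVGLMA",
            "count": 2
        }
    ]
}
# Submit the job, passing "output" as the output directory
helixfold3.execute(data=data, output_dir="output")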
- Multi-task mode