
PeptidesFunctionalDataset (lrgb.py in helpers.dataset_classes)

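Task type: multi-task binary classification
Purpose: `PeptidesFunctionalDataset` processes the molecular graphs of peptides and performs 10-way multi-task binary classification of their functional classes. The goal is to predict, from a peptide's molecular graph, whether it belongs to each functional class (e.g. anticancer, antiviral).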
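To make "10-way multi-task binary classification" concrete, here is a minimal sketch in plain Python. It assumes each peptide's target is a 10-element 0/1 vector following the class order given in the dataset docstring, and that each task is scored with a binary cross-entropy on raw logits. The helper names `encode_labels` and `multitask_bce` are hypothetical and are not part of `lrgb.py`; the loss choice is likewise an assumption, not something the source specifies.

```python
import math

# Class order as listed in the dataset docstring.
CLASSES = ['antifungal', 'cell_cell_communication', 'anticancer',
           'drug_delivery_vehicle', 'antimicrobial', 'antiviral',
           'antihypertensive', 'antibacterial', 'antiparasitic', 'toxic']


def encode_labels(active):
    """Hypothetical helper: 10-element 0/1 vector; several entries may be 1."""
    unknown = set(active) - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")
    return [1.0 if name in active else 0.0 for name in CLASSES]


def multitask_bce(logits, targets):
    """Hypothetical helper: mean binary cross-entropy over tasks, from logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(targets)


# A peptide can belong to several functional classes at once.
y = encode_labels({'anticancer', 'antiviral'})
# An untrained model that outputs logit 0 for every task.
loss = multitask_bce([0.0] * len(CLASSES), y)
```

Unlike single-label multi-class classification, the ten outputs are independent yes/no decisions, so one peptide can be positive for several classes and each task contributes its own binary term to the loss.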

from helpers.dataset_classes.lrgb import PeptidesFunctionalDataset

'''
Adapted from https://github.com/vijaydwivedi75/lrgb.git
https://github.com/HySonLab/Multires-Graph-Transformer.git
https://github.com/hamed1375/Exphormer.git
'''
import hashlib
import os.path as osp
import pickle
import shutil

import pandas as pd
import torch
from ogb.utils import smiles2graph
from ogb.utils.torch_util import replace_numpy_with_torchtensor
from ogb.utils.url import decide_download
from torch_geometric.data import (InMemoryDataset, Data, download_url,
                                  extract_zip)
from tqdm import tqdm
import os

class PeptidesFunctionalDataset(InMemoryDataset):
    def __init__(self, root='data', smiles2graph=smiles2graph,
                 transform=None, pre_transform=None):
        """
        PyG dataset of 15,535 peptides represented as their molecular graph
        (SMILES) with 10-way multi-task binary classification of their
        functional classes.
        The goal is to use the molecular representation of peptides instead
        of amino acid sequence representation ('peptide_seq' field in the file,
        provided for possible baseline benchmarking but not used here) to test
        GNNs' representation capability.
        The 10 classes represent the following functional classes (in order):
            ['antifungal', 'cell_cell_communication', 'anticancer',
            'drug_delivery_vehicle', 'antimicrobial', 'antiviral',
            'antihypertensive', 'antibacterial', 'antiparasitic', 'toxic']
        Args:
            root (string): Root directory where the dataset should be saved.
            smiles2graph (callable): A callable function that converts a SMILES
                string into a graph object. We use the OGB featurization.
                * The default smiles2graph requires rdkit to be installed *
        """

        self.original_root = root
        self.smiles2graph = smiles2graph
        self.folder = osp.join(root, 'peptides-functional')

        self.url = 'https://www.dropbox.com/s/ol2v01usvaxbsr8/peptide_multi_class_dataset.csv.gz?dl=1'
        self.version = '701eb743e899f4d793f0e13c8fa5a1b4'  # MD5 hash of the intended dataset file
        self.url_stratified_split = 'https://www.dropbox.com/s/j4zcnx2eipuo0xz/splits_random_stratified_peptide.pickle?dl=1'
        self.md5sum_stratified_split = '5a0114bdadc80b94fc7ae974f13ef061'

        # Check version and update if necessary.
        release_tag = osp.join(self.folder, self.version)
        if osp.isdir(self.folder) and (not osp.exists(release_tag)):
            print(f"{self.__class__.__name__} has been updated.")
            if input("Will you update the dataset now? (y/N)\n").lower() == 'y':
                shutil.rmtree(self.folder)

        super().__init__(self.folder, transform, pre_transform)
        self.data, self.slices = torch.load(self.processed_paths[0])

    @property
    def raw_file_names(self):
        return 'peptide_multi_class_dataset.csv.gz'

    @property
    def processed_file_names(self):
        return 'geometric_data_processed.pt'

    def _md5sum(self, path):
        hash_md5 = hashlib.md5()
        with open(path, 'rb') as f:
            buffer = f.read()
            hash_md5.update(buffer)
        return hash_md5.hexdigest()

    def download(self):
        if decide_download(self.url):
            path = download_url(self.url, self.raw_dir)
            # Save to disk the MD5 hash of the downloaded file.
            hash = self._md5sum(path)
            if hash != self.version:
              
