
AF3 DataPipeline class: a walkthrough of the process_multiseq_fasta method

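The process_multiseq_fasta method of the DataPipeline class in the AlphaFold3 data_pipeline module processes a multi-sequence FASTA file and builds the features required for AlphaFold3 structure prediction. It is intended for predicting multi-chain complexes. It applies the "AlphaFold-Gap" strategy that Minkyung Baek proposed on Twitter: the chains are concatenated into a single sequence, and a fixed gap (ri_gap, 200 by default) is added to the residue index at each chain boundary to emulate a multi-chain complex. Each chain's MSA is then padded with '-' characters so that it spans the full concatenated length.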
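The residue-index shift described above can be illustrated with a minimal sketch. The helper name offset_residue_index is hypothetical, and a plain Python list stands in for the array stored under sequence_features["residue_index"]. Only the offset logic mirrors the method.

```python
def offset_residue_index(seq_lens, ri_gap=200):
    """Return residue indices for concatenated chains, shifted by
    ri_gap at every chain boundary (hypothetical helper)."""
    # Start from a contiguous index over the stitched sequence.
    residue_index = list(range(sum(seq_lens)))
    total_offset = 0
    for sl in seq_lens:
        total_offset += sl
        # Shift everything after the end of the current chain.
        # For the last chain this range is empty, so nothing changes.
        for i in range(total_offset, len(residue_index)):
            residue_index[i] += ri_gap
    return residue_index


# Two chains of length 3 and 2: the second chain starts 200 positions later.
print(offset_residue_index([3, 2]))  # [0, 1, 2, 203, 204]
```

With three chains the shifts accumulate, so the third chain ends up offset by 2 * ri_gap relative to a contiguous numbering.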

Source code:

    def process_multiseq_fasta(self,
                               fasta_path: str,
                               super_alignment_dir: str,
                               ri_gap: int = 200,
                               ) -> FeatureDict:
        """
            Assembles features for a multi-sequence FASTA. Uses Minkyung Baek's
            hack from Twitter (a.k.a. AlphaFold-Gap).
        """
        with open(fasta_path, 'r') as f:
            fasta_str = f.read()

        input_seqs, input_descs = parsers.parse_fasta(fasta_str)

        # No whitespace allowed
        input_descs = [i.split()[0] for i in input_descs]

        # Stitch all of the sequences together
        input_sequence = ''.join(input_seqs)
        input_description = '-'.join(input_descs)
        num_res = len(input_sequence)

        sequence_features = make_sequence_features(
            sequence=input_sequence,
            description=input_description,
            num_res=num_res,
        )

        seq_lens = [len(s) for s in input_seqs]
        total_offset = 0
        for sl in seq_lens:
            total_offset += sl
            sequence_features["residue_index"][total_offset:] += ri_gap

        msa_list = []
        deletion_mat_list = []
        for seq, desc in zip(input_seqs, input_descs):
            alignment_dir = os.path.join(
                super_alignment_dir, desc
            )
            msas = self._get_msas(
                alignment_dir, seq, None
            )
            msa_list.append([m.sequences for m in msas])
            deletion_mat_list.append([m.deletion_matrix for m in msas])

        final_msa = []
        final_deletion_mat = []
        final_msa_obj = []
        msa_it = enumerate(zip(msa_list, deletion_mat_list))
        for i, (msas, deletion_mats) in msa_it:
            prec, post = sum(seq_lens[:i]), sum(seq_lens[i + 1:])
            msas = [
                [prec * '-' + seq + post * '-' for seq in msa] for msa in msas
            ]
            deletion_mats = [
                [prec * [0] +
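The padding step in the loop above can be tried in isolation. The following is a simplified sketch, assuming a single MSA per chain (the method itself keeps a list of MSAs per chain). The helper name pad_chain_msas is hypothetical.

```python
def pad_chain_msas(chain_msas, seq_lens):
    """Pad each chain's MSA rows with '-' so that every row spans the
    full stitched sequence (hypothetical helper, one MSA per chain)."""
    padded = []
    for i, msa in enumerate(chain_msas):
        # Columns that belong to the chains before and after chain i.
        prec, post = sum(seq_lens[:i]), sum(seq_lens[i + 1:])
        padded.extend(prec * '-' + seq + post * '-' for seq in msa)
    return padded


rows = pad_chain_msas([["ACD", "AC-"], ["EF"]], seq_lens=[3, 2])
print(rows)  # ['ACD--', 'AC---', '---EF']
```

Each row carries residues only in the columns of its own chain, so the combined MSA is block-diagonal across chains.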

