AF3 _build_query_to_hit_index_mapping函数解读
AlphaFold3 中templates模块的_build_query_to_hit_index_mapping函数是将原始查询序列(original_query_sequence
)中的索引与hit 序列(hit_sequence
)中的索引进行映射。
在蛋白质序列比对(如 HHsearch)中,hit 是与查询序列部分匹配的区域。由于存在缺口(-
)和部分比对,该函数修正索引,使得可以将原始序列的每个氨基酸位置正确映射到 hit 序列中。
该函数输入参数主要来自 HHsearch/HMMsearch 解析器 生成TemplateHit数据,返回的索引映射map可以用于提取模版特征(_extract_template_features函数参数)。
源代码:
def _build_query_to_hit_index_mapping(
hit_query_sequence: str,
hit_sequence: str,
indices_hit: Sequence[int],
indices_query: Sequence[int],
original_query_sequence: str,
) -> Mapping[int, int]:
"""Gets mapping from indices in original query sequence to indices in the hit.
hit_query_sequence and hit_sequence are two aligned sequences containing gap
characters. hit_query_sequence contains only the part of the original query
sequence that matched the hit. When interpreting the indices from the .hhr, we
need to correct for this to recover a mapping from original query sequence to
the hit sequence.
Args:
hit_query_sequence: The portion of the query sequence that is in the .hhr
hit
hit_sequence: The portion of the hit sequence that is in the .hhr