We have already covered the classic APIs of HF's Transformers library and its overall architecture in the previous parts. [Motivation] That leaves a few smaller questions: How do the Tokenizer and a Specific Model actually work under the hood? How do I inspect and modify a model's architecture, its forward and backward passes, its loss computation, and its activation functions? How do we learn about and operate on these model-level details concretely?
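As a quick preview before reading the source, here is a minimal sketch of a forward pass, loss computation, and backward pass on a tiny, randomly initialized LLaMA-style model (the tiny hyperparameter values below are made up purely so the model builds instantly on CPU):

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# A deliberately tiny configuration, for illustration only.
config = LlamaConfig(
    vocab_size=1000,
    hidden_size=128,
    intermediate_size=256,
    num_hidden_layers=2,
    num_attention_heads=4,
)
model = LlamaForCausalLM(config)

# Fake a batch of token ids; passing labels makes the model compute the
# causal-LM cross-entropy loss internally during the forward pass.
input_ids = torch.randint(0, config.vocab_size, (2, 16))
outputs = model(input_ids=input_ids, labels=input_ids)

print(outputs.loss)          # scalar training loss
print(outputs.logits.shape)  # (batch, seq_len, vocab_size)

# The backward pass is ordinary PyTorch autograd.
outputs.loss.backward()
```

The configuration that produced this model is the natural place to start the source dive. Here is the docstring of `LlamaConfig`, which documents the LLaMA architecture hyperparameters: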
Args:
    vocab_size (`int`, *optional*, defaults to 32000):
        Vocabulary size of the LLaMA model. Defines the number of different tokens that can be represented by the
        `inputs_ids` passed when calling [`LlamaModel`]
    hidden_size (`int`, *optional*, defaults to 4096):
        Dimension of the hidden representations.
    intermediate_size (`int`, *optional*, defaults to 11008):
        Dimension of the MLP representations.
    num_hidden_layers (`int`, *optional*, defaults to 32):
        Number of hidden layers in the Transformer encoder.
    num_attention_heads (`int`, *optional*, defaults to 32):
        Number of attention heads for each attention layer in the Transformer encoder.
    hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
        The non-linear activation function (function or string) in the decoder.
    max_position_embeddings (`int`, *optional*, defaults to 2048):
        The maximum sequence length that this model might ever be used with. Typically set this to something large
        just in case (e.g., 512 or 1024 or 2048).
    initializer_range (`float`, *optional*, defaults to 0.02):
        The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
    rms_norm_eps (`float`, *optional*, defaults to 1e-12):
        The epsilon used by the rms normalization layers.
    use_cache (`bool`, *optional*, defaults to `True`):
        Whether or not the model should return the last key/values attentions (not used by all models). Only
        relevant if `config.is_decoder=True`.
    tie_word_embeddings (`bool`, *optional*, defaults to `False`):
        Whether to tie weight embeddings
Example:
```python
>>> from transformers import LlamaModel, LlamaConfig
>>> # Initializing a LLaMA llama-7b style configuration
>>> configuration = LlamaConfig()

>>> # Initializing a model from the llama-7b style configuration
>>> model = LlamaModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
```"""
On the tokenizer side, `LlamaTokenizer` exposes its vocabulary through a `vocab_size` property and a `get_vocab()` method, both of which read from the underlying SentencePiece model:

```python
@property
def vocab_size(self):
    """Returns vocab size"""
    return self.sp_model.get_piece_size()

def get_vocab(self):
    """Returns vocab as a dict"""
    vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
    vocab.update(self.added_tokens_encoder)
    return vocab
```
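To see these two methods in action, here is a usage sketch; the checkpoint path is a placeholder, and the example assumes a LLaMA tokenizer is available locally or on the Hub:

```python
from transformers import LlamaTokenizer

# Placeholder path: point this at any local or Hub LLaMA checkpoint.
tokenizer = LlamaTokenizer.from_pretrained("path/to/llama-tokenizer")

print(tokenizer.vocab_size)    # size reported by the SentencePiece model

vocab = tokenizer.get_vocab()  # token -> id mapping, including added tokens
print(len(vocab))
print(vocab["<s>"])            # id of the BOS token (1 for LLaMA)

# Round-trip a sentence through the vocabulary built above.
ids = tokenizer("Hello world").input_ids
print(tokenizer.convert_ids_to_tokens(ids))
```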