

class InternLMConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of an [`InternLMModel`]. It is used to instantiate an
    InternLM model according to the specified arguments, defining the model architecture. Instantiating a configuration
    with the defaults will yield a similar configuration to that of the InternLM-7B.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.


    Args:
        vocab_size (`int`, *optional*, defaults to 103168):
            Vocabulary size of the InternLM model. Defines the number of different tokens that can be represented by
            the `input_ids` passed when calling [`InternLMModel`].
        hidden_size (`int`, *optional*, defaults to 4096):
            Dimension of the hidden representations.
        intermediate_size (`int`, *optional*, defaults to 11008):
            Dimension of the MLP representations.
        num_hidden_layers (`int`, *optional*, defaults to 32):
            Number of hidden layers in the Transformer decoder.
        num_attention_heads (`int`, *optional*, defaults to 32):
            Number of attention heads for each attention layer in the Transformer decoder.
        hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
            The non-linear activation function (function or string) in the decoder.
        max_position_embeddings (`int`, *optional*, defaults to 2048):
            The maximum sequence length that this model might ever be used with. Typically set this to something large
            just in case (e.g., 512 or 1024 or 2048).
        initializer_range (`float`, *optional*, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        rms_norm_eps (`float`, *optional*, defaults to 1e-12):
            The epsilon used by the RMS normalization layers.
        use_cache (`bool`, *optional*, defaults to `True`):
            Whether or not the model should return the last key/values attentions (not used by all models). Only
            relevant if `config.is_decoder=True`.
        tie_word_embeddings (`bool`, *optional*, defaults to `False`):
            Whether to tie the input and output word embeddings.

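
    The size arguments above are related to one another. As a minimal sketch, assuming the hidden dimension is split
    evenly across the attention heads (the usual multi-head convention, not confirmed here for InternLM), the
    documented defaults imply the values below. `head_dim` and `mlp_ratio` are hypothetical helpers written for
    illustration only and are not part of [`InternLMConfig`]:

    ```python
    def head_dim(hidden_size: int, num_attention_heads: int) -> int:
        # Assumption: each head receives an equal slice of the hidden dimension.
        if hidden_size % num_attention_heads != 0:
            raise ValueError(
                f"hidden_size ({hidden_size}) is not divisible by "
                f"num_attention_heads ({num_attention_heads})"
            )
        return hidden_size // num_attention_heads


    def mlp_ratio(hidden_size: int, intermediate_size: int) -> float:
        # Ratio of the MLP width to the hidden width.
        return intermediate_size / hidden_size


    # Documented defaults: hidden_size=4096, num_attention_heads=32, intermediate_size=11008
    print(head_dim(4096, 32))  # 128
    print(mlp_ratio(4096, 11008))  # 2.6875
    ```

    A check like this can catch an inconsistent pair of `hidden_size` and `num_attention_heads` before a model is
    built from the configuration.
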
    Example:

    ```python
    >>> from transformers import InternLMModel, InternLMConfig

    >>> # Initializing an InternLM internlm-7b style configuration
    >>> configuration = InternLMConfig()

    >>> # Initializing a model from the internlm-7b style configuration
    >>> model = InternLMModel(configuration)

    >>> # Accessing the model configuration
    >>> configuration = model.config
    ```"""
    model_type = "internlm"
    _auto_class = "AutoConfig"

    def __init__(
        self,
        vocab_size=103168,
        hidden_size=4096,
        intermediate_size=11008,