vllm.model_executor.models.mistral_large_3
MistralLarge3ForCausalLM
Bases: DeepseekV3ForCausalLM
Source code in vllm/model_executor/models/mistral_large_3.py
remapping class-attribute instance-attribute
remapping = {
    "layers\\.(\\d+)\\.attention_norm\\.weight": "model.layers.\\1.input_layernorm.weight",
    "layers\\.(\\d+)\\.attention\\.wq_a\\.(\\w+)": "model.layers.\\1.self_attn.q_a_proj.\\2",
    "layers\\.(\\d+)\\.attention\\.q_a_norm\\.weight": "model.layers.\\1.self_attn.q_a_layernorm.weight",
    "layers\\.(\\d+)\\.attention\\.wq_b\\.(\\w+)": "model.layers.\\1.self_attn.q_b_proj.\\2",
    "layers\\.(\\d+)\\.attention\\.wkv_a_with_mqa\\.(\\w+)": "model.layers.\\1.self_attn.kv_a_proj_with_mqa.\\2",
    "layers\\.(\\d+)\\.attention\\.kv_a_norm\\.weight": "model.layers.\\1.self_attn.kv_a_layernorm.weight",
    "layers\\.(\\d+)\\.attention\\.wkv_b\\.(\\w+)": "model.layers.\\1.self_attn.kv_b_proj.\\2",
    "layers\\.(\\d+)\\.attention\\.wo\\.(\\w+)": "model.layers.\\1.self_attn.o_proj.\\2",
    "layers\\.(\\d+)\\.ffn_norm\\.weight": "model.layers.\\1.post_attention_layernorm.weight",
    "layers\\.(\\d+)\\.feed_forward\\.w1\\.(\\w+)": "model.layers.\\1.mlp.gate_proj.\\2",
    "layers\\.(\\d+)\\.feed_forward\\.w2\\.(\\w+)": "model.layers.\\1.mlp.down_proj.\\2",
    "layers\\.(\\d+)\\.feed_forward\\.w3\\.(\\w+)": "model.layers.\\1.mlp.up_proj.\\2",
    "layers\\.(\\d+)\\.gate\\.weight": "model.layers.\\1.mlp.gate.weight",
    "layers\\.(\\d+)\\.shared_experts\\.w1\\.(\\w+)": "model.layers.\\1.mlp.shared_experts.gate_proj.\\2",
    "layers\\.(\\d+)\\.shared_experts\\.w2\\.(\\w+)": "model.layers.\\1.mlp.shared_experts.down_proj.\\2",
    "layers\\.(\\d+)\\.shared_experts\\.w3\\.(\\w+)": "model.layers.\\1.mlp.shared_experts.up_proj.\\2",
    "layers\\.(\\d+)\\.experts\\.(\\d+)\\.w1\\.(\\w+)": "model.layers.\\1.mlp.experts.\\2.gate_proj.\\3",
    "layers\\.(\\d+)\\.experts\\.(\\d+)\\.w2\\.(\\w+)": "model.layers.\\1.mlp.experts.\\2.down_proj.\\3",
    "layers\\.(\\d+)\\.experts\\.(\\d+)\\.w3\\.(\\w+)": "model.layers.\\1.mlp.experts.\\2.up_proj.\\3",
    "norm\\.weight": "model.norm.weight",
    "tok_embeddings\\.weight": "model.embed_tokens.weight",
    "output\\.weight": "lm_head.weight",
}
_remap_mistral_to_ds
Remap Mistral parameter names to DeepseekV2 parameter names.
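
To illustrate how a regex table like `remapping` can translate checkpoint names, here is a minimal sketch. It is not the body of `_remap_mistral_to_ds`. The helper `remap_name`, the `REMAPPING_SUBSET` constant, the use of `re.fullmatch`, and the `KeyError` on unmatched names are all assumptions made for the example. The patterns are a four-entry subset of the table, written as raw strings.

```python
import re

# Subset of the remapping table, as raw strings (equivalent to the
# double-backslash form shown in the attribute listing).
REMAPPING_SUBSET = {
    r"layers\.(\d+)\.attention\.wq_a\.(\w+)": r"model.layers.\1.self_attn.q_a_proj.\2",
    r"layers\.(\d+)\.experts\.(\d+)\.w1\.(\w+)": r"model.layers.\1.mlp.experts.\2.gate_proj.\3",
    r"norm\.weight": r"model.norm.weight",
    r"output\.weight": r"lm_head.weight",
}


def remap_name(name: str, table: dict[str, str] = REMAPPING_SUBSET) -> str:
    """Hypothetical helper: translate one Mistral-style name via the table."""
    for pattern, replacement in table.items():
        # Assumption: match the whole name, so an unanchored pattern such as
        # "norm\.weight" cannot fire inside "layers.0.attention_norm.weight".
        match = re.fullmatch(pattern, name)
        if match:
            # expand() substitutes \1, \2, ... with the captured groups.
            return match.expand(replacement)
    # Assumption: treat an unmatched name as an error rather than pass it through.
    raise KeyError(f"no remapping rule for {name!r}")


print(remap_name("layers.3.attention.wq_a.weight"))
# model.layers.3.self_attn.q_a_proj.weight
print(remap_name("layers.10.experts.7.w1.weight"))
# model.layers.10.mlp.experts.7.gate_proj.weight
print(remap_name("output.weight"))
# lm_head.weight
```

The capture groups carry the layer index, the expert index, and the parameter suffix through unchanged, so a single pattern covers every layer and every suffix that `\w+` matches.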