vllm.reasoning.holo2_reasoning_parser ¶
Holo2ReasoningParser ¶
Bases: ReasoningParser
Reasoning parser for the Holo2 models, which are based on Qwen3.
The Holo2 model uses Qwen3-style think tokens to delimit its reasoning output. The model provides a switch to enable or disable reasoning output via the 'thinking' chat template parameter (e.g. 'thinking=False').
Chat template args:
- thinking: Whether to enable reasoning output (default: True)
Parsing rules for the model output:
- thinking == False -> the model output is treated purely as the final content
- thinking == True -> the model output is the reasoning content followed by the final content; the parser splits it into reasoning_content and content
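For illustration, a minimal sketch of the thinking == True rule. The checkpoint name, the request contents, and the <think>...</think> delimiters in the sample output are assumptions, not taken from this page; only the class name and module path are.

# Minimal sketch, assuming a Qwen3-style <think>...</think> output format and an
# assumed Holo2 checkpoint name; not the reference usage.
from transformers import AutoTokenizer

from vllm.entrypoints.openai.protocol import ChatCompletionRequest
from vllm.reasoning.holo2_reasoning_parser import Holo2ReasoningParser

# Any tokenizer compatible with the model works here; the checkpoint name is assumed.
tokenizer = AutoTokenizer.from_pretrained("Hcompany/Holo2-4B")
parser = Holo2ReasoningParser(tokenizer)

request = ChatCompletionRequest(
    model="Hcompany/Holo2-4B",
    messages=[{"role": "user", "content": "Where is the Submit button?"}],
)

# thinking == True: the reasoning comes first, the final answer after it,
# and extract_reasoning splits the two apart.
reasoning, content = parser.extract_reasoning(
    "<think>Scan the screenshot for a Submit button.</think>Click (412, 230).",
    request,
)
print(reasoning)  # Scan the screenshot for a Submit button.
print(content)    # Click (412, 230).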
Source code in vllm/reasoning/holo2_reasoning_parser.py
__init__ ¶
__init__(tokenizer: TokenizerLike, *args, **kwargs)
Source code in vllm/reasoning/holo2_reasoning_parser.py
extract_content_ids ¶
extract_reasoning ¶
extract_reasoning(
model_output: str, request: ChatCompletionRequest
) -> tuple[str | None, str | None]
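Continuing the sketch above (same parser and request), the thinking == False rule maps to a None reasoning slot in the returned tuple, so callers should treat the first element as optional:

# Continuation of the earlier sketch: with reasoning disabled (thinking == False)
# the whole model output is returned as content and the reasoning slot is None.
reasoning, content = parser.extract_reasoning("Click (412, 230).", request)
assert reasoning is None
assert content == "Click (412, 230)."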
extract_reasoning_streaming ¶
extract_reasoning_streaming(
previous_text: str,
current_text: str,
delta_text: str,
previous_token_ids: Sequence[int],
current_token_ids: Sequence[int],
delta_token_ids: Sequence[int],
) -> DeltaMessage | None
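The streaming variant is invoked once per generated chunk with the running text and token-id state and returns a DeltaMessage, or None when there is nothing to emit yet. In the OpenAI-compatible server this loop is driven by vLLM itself; a hand-rolled sketch, reusing the parser and tokenizer from the first example and simulating the stream by re-tokenizing a fixed output (the one-token-per-chunk split is an assumption for illustration), might look like this:

# Simulated stream: re-tokenize a finished output and feed it back one token at
# a time. The sample output and its delimiters are assumptions for illustration.
full_output = "<think>Find the Submit button.</think>Click (412, 230)."
token_ids = tokenizer.encode(full_output, add_special_tokens=False)

previous_text = ""
previous_ids: list[int] = []
for token_id in token_ids:
    delta_text = tokenizer.decode([token_id])
    current_text = previous_text + delta_text
    current_ids = previous_ids + [token_id]

    delta = parser.extract_reasoning_streaming(
        previous_text,
        current_text,
        delta_text,
        previous_ids,
        current_ids,
        [token_id],
    )
    if delta is not None:
        # Each DeltaMessage is expected to carry either reasoning_content or
        # content, depending on whether the reasoning boundary has been crossed.
        print(delta.reasoning_content or "", delta.content or "", sep="", end="")

    previous_text = current_text
    previous_ids = current_ids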