vllm.entrypoints.serve.rlhf.api_router
attach_router
engine_client
engine_client(request: Request) -> EngineClient
is_paused async
Return the current pause status.
Source code in vllm/entrypoints/serve/rlhf/api_router.py
pause_generation async
pause_generation(
    raw_request: Request,
    wait_for_inflight_requests: bool = Query(False),
    clear_cache: bool = Query(True),
) -> JSONResponse
Pause generation requests to allow weight updates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| wait_for_inflight_requests | bool | When | Query(False) |
| clear_cache | bool | Whether to clear KV/prefix caches after draining. | Query(True) |
Source code in vllm/entrypoints/serve/rlhf/api_router.py
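Because both parameters are declared with Query(...), they travel in the URL query string rather than in a request body. The following is a minimal client-side sketch of building such a request with the standard library. The route path "/pause", the POST method, and the helper name build_pause_request are assumptions for illustration; none of them is stated on this page, so check the router source before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_pause_request(
    base_url: str,
    wait_for_inflight_requests: bool = False,
    clear_cache: bool = True,
    path: str = "/pause",  # ASSUMPTION: route path is not given on this page
) -> Request:
    """Build (but do not send) a request that pauses generation."""
    # Boolean query parameters are sent as lowercase "true"/"false" strings.
    query = urlencode(
        {
            "wait_for_inflight_requests": str(wait_for_inflight_requests).lower(),
            "clear_cache": str(clear_cache).lower(),
        }
    )
    url = f"{base_url.rstrip('/')}{path}?{query}"
    # ASSUMPTION: the endpoint is registered for POST.
    return Request(url, method="POST")
```

The returned object can be passed to urllib.request.urlopen to send it. The defaults mirror the signature above: wait_for_inflight_requests=False and clear_cache=True.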
resume_generation async
Resume generation after a pause.
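Since pause_generation exists to allow weight updates, a caller will usually want resume_generation to run even if the update fails. One way to sketch that pairing is a context manager that takes the transport as a parameter. The name paused, the post callable, and the "/pause" and "/resume" paths are hypothetical; they are not taken from this page.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def paused(
    post: Callable[[str], object],
    pause_path: str = "/pause",    # ASSUMPTION: route path not given on this page
    resume_path: str = "/resume",  # ASSUMPTION: route path not given on this page
) -> Iterator[None]:
    """Pause generation for the duration of a with-block, then resume."""
    # `post` is a caller-supplied function that sends a request to a path.
    post(pause_path)
    try:
        yield
    finally:
        # Always resume, even if the weight update raised an exception.
        post(resume_path)
```

A usage example with a stand-in transport that only records the calls:

```python
calls = []
with paused(calls.append):
    calls.append("update weights")  # placeholder for the real update step
```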