vLLM & vLLM-ascend Source Code Walkthrough
__init__.py
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
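These two lines carry the license information. SPDX-License-Identifier states that the code is licensed under Apache-2.0, and SPDX-FileCopyrightText states that the copyright belongs to the contributors to the vLLM project.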
# The version.py should be independent library, and we always import the
# version library first. Such assumption is critical for some customization.
from .version import __version__, __version_tuple__ # isort:skip
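The comment says that version.py should be an independent library that is always imported first; some customizations depend on this assumption. from .version import __version__, __version_tuple__ imports the __version__ and __version_tuple__ variables from version.py in the current package, and # isort:skip tells the isort tool not to reorder this line.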
import typing
Imports the typing module, used here for typing.TYPE_CHECKING and the typing.Any annotation.
# The environment variables override should be imported before any other
# modules to ensure that the environment variables are set before any
# other modules are imported.
import vllm.env_override # noqa: F401
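The comment says the environment-variable override module must be imported before any other module, so that the environment variables are already set when the other modules are imported. import vllm.env_override imports the env_override module of the vllm package for its side effects, and # noqa: F401 tells flake8 to suppress the unused-import warning.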
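When the vllm module is imported, vllm.env_override adjusts several key environment variables and PyTorch settings so that vLLM runs correctly in different environments and avoids some known pitfalls. Specifically, in version 0.9.1 this file does the following:
- Handles the NCCL_CUMEM_ENABLE environment variable: unless the user has already set it to a non-zero value, it is set to '0', except on systems that appear to use multi-node NVLink.
- Sets the PYTORCH_NVML_BASED_CUDA_CHECK environment variable so that torch.cuda.is_available() does not unintentionally initialize CUDA.
- Sets the TORCHINDUCTOR_COMPILE_THREADS environment variable and the torch._inductor.config.compile_threads setting, apparently to work around specific issues.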
MODULE_ATTRS = {
    "AsyncEngineArgs": ".engine.arg_utils:AsyncEngineArgs",
    "EngineArgs": ".engine.arg_utils:EngineArgs",
    "AsyncLLMEngine": ".engine.async_llm_engine:AsyncLLMEngine",
    "LLMEngine": ".engine.llm_engine:LLMEngine",
    "LLM": ".entrypoints.llm:LLM",
    "initialize_ray_cluster": ".executor.ray_utils:initialize_ray_cluster",
    "PromptType": ".inputs:PromptType",
    "TextPrompt": ".inputs:TextPrompt",
    "TokensPrompt": ".inputs:TokensPrompt",
    "ModelRegistry": ".model_executor.models:ModelRegistry",
    "SamplingParams": ".sampling_params:SamplingParams",
    "PoolingParams": ".pooling_params:PoolingParams",
    "ClassificationOutput": ".outputs:ClassificationOutput",
    "ClassificationRequestOutput": ".outputs:ClassificationRequestOutput",
    "CompletionOutput": ".outputs:CompletionOutput",
    "EmbeddingOutput": ".outputs:EmbeddingOutput",
    "EmbeddingRequestOutput": ".outputs:EmbeddingRequestOutput",
    "PoolingOutput": ".outputs:PoolingOutput",
    "PoolingRequestOutput": ".outputs:PoolingRequestOutput",
    "RequestOutput": ".outputs:RequestOutput",
    "ScoringOutput": ".outputs:ScoringOutput",
    "ScoringRequestOutput": ".outputs:ScoringRequestOutput",
}
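This defines the dictionary MODULE_ATTRS. Each key is the name of a class or function exposed to users, and each value is its import path inside the package, in the form module:attribute (the leading dot makes the module path relative to the vllm package). Usage example:
- The simplest usage is from vllm import LLM, SamplingParams.
- The actual locations of LLM and SamplingParams are then looked up in this dictionary.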
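To make the module:attribute convention concrete, here is a minimal sketch of how one entry can be resolved with importlib.import_module, including a relative module path (leading dot) resolved against a package name. The resolve helper and the EXAMPLE_ATTRS mapping are hypothetical, and they point at the standard-library json package instead of vllm so the snippet runs without vLLM installed.

```python
from importlib import import_module
from typing import Any

# Hypothetical mapping in the same "module:attribute" format as MODULE_ATTRS.
# The leading dot makes each module path relative to the package passed below.
EXAMPLE_ATTRS = {
    "JSONDecoder": ".decoder:JSONDecoder",
    "JSONEncoder": ".encoder:JSONEncoder",
}


def resolve(spec: str, package: str) -> Any:
    """Split 'module:attribute' and return the attribute (hypothetical helper)."""
    module_name, attr_name = spec.split(":")
    # import_module(".decoder", "json") imports the json.decoder submodule
    module = import_module(module_name, package)
    return getattr(module, attr_name)


decoder_cls = resolve(EXAMPLE_ATTRS["JSONDecoder"], "json")
print(decoder_cls.__module__, decoder_cls.__name__)
```

In vllm/__init__.py the package argument is __package__, i.e. "vllm", so ".entrypoints.llm" becomes vllm.entrypoints.llm.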
if typing.TYPE_CHECKING:
    from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
    from vllm.engine.async_llm_engine import AsyncLLMEngine
    from vllm.engine.llm_engine import LLMEngine
    from vllm.entrypoints.llm import LLM
    from vllm.executor.ray_utils import initialize_ray_cluster
    from vllm.inputs import PromptType, TextPrompt, TokensPrompt
    from vllm.model_executor.models import ModelRegistry
    from vllm.outputs import (ClassificationOutput,
                              ClassificationRequestOutput, CompletionOutput,
                              EmbeddingOutput, EmbeddingRequestOutput,
                              PoolingOutput, PoolingRequestOutput,
                              RequestOutput, ScoringOutput,
                              ScoringRequestOutput)
    from vllm.pooling_params import PoolingParams
    from vllm.sampling_params import SamplingParams
else:

    def __getattr__(name: str) -> typing.Any:
        from importlib import import_module

        if name in MODULE_ATTRS:
            module_name, attr_name = MODULE_ATTRS[name].split(":")
            module = import_module(module_name, __package__)
            return getattr(module, attr_name)
        else:
            raise AttributeError(
                f'module {__package__} has no attribute {name}')
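typing.TYPE_CHECKING is true only while a static type checker analyzes the code, so the imports in the if branch give the checker the real types without importing those modules at runtime. The else branch defines a module-level __getattr__ function, which Python calls when a user accesses an attribute that the module does not define directly. It dynamically imports the module and attribute recorded in MODULE_ATTRS, and raises AttributeError if the name is not in the dictionary. Note: this AttributeError probably shows up most often after a version change.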
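The else branch relies on module-level __getattr__ (PEP 562, Python 3.7+). Below is a minimal, self-contained sketch of the same pattern. Because it cannot ship a real package file, it builds a stand-in module named lazy_demo with types.ModuleType and registers it in sys.modules; lazy_demo, LAZY_ATTRS, and the stdlib targets are hypothetical stand-ins for vllm and MODULE_ATTRS.

```python
import sys
import types
import typing
from importlib import import_module

# Hypothetical stand-in for MODULE_ATTRS, pointing at stdlib modules.
LAZY_ATTRS = {
    "OrderedDict": "collections:OrderedDict",
    "dedent": "textwrap:dedent",
}

lazy_demo = types.ModuleType("lazy_demo")
imported = []  # records which names were served by the lazy hook


def _module_getattr(name: str) -> typing.Any:
    # Python calls this only when normal attribute lookup on the module fails.
    if name in LAZY_ATTRS:
        module_name, attr_name = LAZY_ATTRS[name].split(":")
        imported.append(name)
        return getattr(import_module(module_name), attr_name)
    raise AttributeError(f"module {lazy_demo.__name__} has no attribute {name}")


# A callable stored in the module's __dict__ under "__getattr__" acts as the hook.
lazy_demo.__getattr__ = _module_getattr
sys.modules["lazy_demo"] = lazy_demo

from lazy_demo import dedent  # goes through _module_getattr("dedent")

print(dedent("    hello"), imported)
```

Only "dedent" is resolved; collections is never touched through the hook, which is the import-time saving the vLLM code is after.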
__all__ = [
    "__version__",
    "__version_tuple__",
    "LLM",
    "ModelRegistry",
    "PromptType",
    "TextPrompt",
    "TokensPrompt",
    "SamplingParams",
    "RequestOutput",
    "CompletionOutput",
    "PoolingOutput",
    "PoolingRequestOutput",
    "EmbeddingOutput",
    "EmbeddingRequestOutput",
    "ClassificationOutput",
    "ClassificationRequestOutput",
    "ScoringOutput",
    "ScoringRequestOutput",
    "LLMEngine",
    "EngineArgs",
    "AsyncLLMEngine",
    "AsyncEngineArgs",
    "initialize_ray_cluster",
    "PoolingParams",
]
The __all__ list defines the names that from vllm import * imports.
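A consequence worth knowing: since most names in __all__ are served by __getattr__, a star import asks for every listed name and therefore imports all the corresponding submodules, whereas from vllm import LLM only pulls in what it needs. A minimal sketch of that interaction, assuming a hypothetical star_demo module built with types.ModuleType and stdlib targets in place of vLLM:

```python
import sys
import types
from importlib import import_module

# Hypothetical lazy mapping, mirroring MODULE_ATTRS with stdlib targets.
LAZY_ATTRS = {
    "OrderedDict": "collections:OrderedDict",
    "dedent": "textwrap:dedent",
}
resolved = []  # names that went through the lazy hook

star_demo = types.ModuleType("star_demo")
star_demo.__all__ = ["OrderedDict", "dedent"]  # same idea as vllm.__all__


def _lazy(name):
    if name in LAZY_ATTRS:
        module_name, attr_name = LAZY_ATTRS[name].split(":")
        resolved.append(name)
        return getattr(import_module(module_name), attr_name)
    raise AttributeError(f"module star_demo has no attribute {name}")


star_demo.__getattr__ = _lazy
sys.modules["star_demo"] = star_demo

# A star import is only allowed at module level, so run it in a fresh namespace.
namespace = {}
exec("from star_demo import *", namespace)
print(sorted(resolved))
```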
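Summary: the main job of __init__.py is to expose vLLM's core classes and functions, while using lazy imports to avoid loading modules that are not needed, which keeps import vllm fast.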
env_override.py
logger = init_logger(__name__)
Calls init_logger with the current module name __name__ to create a logger object, logger, used to record information while the program runs.
# set some common config/environment variables that should be set
# for all processes created by vllm and all processes
# that interact with vllm workers.
# they are executed whenever `import vllm` is called.
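This comment explains the purpose of env_override: it sets common configuration and environment variables that should apply to every process created by vllm and every process that interacts with vllm workers. The code runs whenever import vllm is executed.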
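The phrase "all processes created by vllm" works because a child process inherits the parent's os.environ at the moment it is started. A minimal standard-library sketch (the variable name DEMO_OVERRIDE is hypothetical):

```python
import os
import subprocess
import sys

# Set in the parent, as env_override does at `import vllm` time.
os.environ["DEMO_OVERRIDE"] = "0"

# A child process started after the assignment inherits the variable.
child_output = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('DEMO_OVERRIDE'))"],
    capture_output=True,
    text=True,
    check=True,
).stdout.strip()
print(child_output)
```

A worker that is already running does not see variables set afterwards, which is one reason these overrides are applied as early as possible, at import time.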
if os.environ.get('NCCL_CUMEM_ENABLE', '0') != '0':
    logger.warning(
        "NCCL_CUMEM_ENABLE is set to %s, skipping override. "
        "This may increase memory overhead with cudagraph+allreduce: "
        "https://github.com/NVIDIA/nccl/issues/1234",
        os.environ['NCCL_CUMEM_ENABLE'])
elif not os.path.exists('/dev/nvidia-caps-imex-channels'):
    # NCCL requires NCCL_CUMEM_ENABLE to work with
    # multi-node NVLink, typically on GB200-NVL72 systems.
    # The ultimate way to detect multi-node NVLink is to use
    # NVML APIs, which are too expensive to call here.
    # As an approximation, we check the existence of
    # /dev/nvidia-caps-imex-channels, used by
    # multi-node NVLink to communicate across nodes.
    # This will still cost some GPU memory, but it is worthwhile
    # because we can get very fast cross-node bandwidth with NVLink.
    os.environ['NCCL_CUMEM_ENABLE'] = '0'
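This part handles the NCCL_CUMEM_ENABLE environment variable.
- It first checks whether NCCL_CUMEM_ENABLE is set to something other than '0'. If so, it logs a warning that the variable is already set and the override is skipped, noting that this may increase the memory overhead of cudagraph + allreduce and linking the related GitHub issue: https://github.com/NVIDIA/nccl/issues/1234.
- Otherwise (the variable is '0' or unset), it checks whether /dev/nvidia-caps-imex-channels exists. If it does not, the system most likely does not use multi-node NVLink, and NCCL_CUMEM_ENABLE is set to '0'. If it does exist, the variable is left untouched, because NCCL needs cuMem for multi-node NVLink (typically on GB200-NVL72). The comment explains this trade-off: keeping cuMem costs some GPU memory, but is worthwhile because NVLink gives very fast cross-node bandwidth. The file check is only an approximation; precise detection through the NVML APIs is too expensive to run at import time.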
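The branch logic above can be restated as a small pure function, which makes the three cases easy to check. This is a sketch: decide_nccl_cumem and its parameters are hypothetical; it takes the environment as a dict and the result of the /dev/nvidia-caps-imex-channels check as a flag instead of touching the real process environment.

```python
from typing import Dict, Optional


def decide_nccl_cumem(env: Dict[str, str], imex_exists: bool) -> Optional[str]:
    """Return the value env_override would leave in NCCL_CUMEM_ENABLE.

    None means the variable stays unset, so NCCL's own default applies.
    """
    current = env.get("NCCL_CUMEM_ENABLE", "0")
    if current != "0":
        # User explicitly set a non-zero value: keep it (vLLM logs a warning).
        return current
    if not imex_exists:
        # No sign of multi-node NVLink: force cuMem off.
        return "0"
    # Multi-node NVLink likely present: leave the variable as it was.
    return env.get("NCCL_CUMEM_ENABLE")


print(decide_nccl_cumem({}, imex_exists=False))
print(decide_nccl_cumem({"NCCL_CUMEM_ENABLE": "1"}, imex_exists=False))
print(decide_nccl_cumem({}, imex_exists=True))
```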
# see https://github.com/vllm-project/vllm/pull/15951
# it avoids unintentional cuda initialization from torch.cuda.is_available()
os.environ['PYTORCH_NVML_BASED_CUDA_CHECK'] = '1'
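This sets PYTORCH_NVML_BASED_CUDA_CHECK to '1'. The comment links the related GitHub pull request; the goal is to keep torch.cuda.is_available() from unintentionally initializing CUDA. With this variable set, PyTorch checks CUDA availability through NVML instead of the CUDA runtime.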
# see https://github.com/vllm-project/vllm/issues/10619
torch._inductor.config.compile_threads = 1
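When vLLM starts, it sets torch._inductor.config.compile_threads to 1. torch.compile takes its thread count from compile_threads; if the value is not set when vllm is imported but only later, after import torch, torch.compile still starts multiple threads.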
env_override mainly sets a few commonly used environment parameters. For users in mainland China, settings such as os.environ['VLLM_USE_MODELSCOPE'] = 'True' can also be added here.
logger.py
import datetime
import json
import logging
import os
import sys
from collections.abc import Hashable
from functools import lru_cache, partial
from logging import Logger
from logging.config import dictConfig
from os import path
from types import MethodType
from typing import Any, Optional, cast
import vllm.envs as envs
VLLM_CONFIGURE_LOGGING = envs.VLLM_CONFIGURE_LOGGING
VLLM_LOGGING_CONFIG_PATH = envs.VLLM_LOGGING_CONFIG_PATH
VLLM_LOGGING_LEVEL = envs.VLLM_LOGGING_LEVEL
VLLM_LOGGING_PREFIX = envs.VLLM_LOGGING_PREFIX
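This part imports the required modules and reads the logging settings from vllm.envs:
- VLLM_CONFIGURE_LOGGING: whether vLLM configures logging itself
- VLLM_LOGGING_CONFIG_PATH: path to a custom logging config file
- VLLM_LOGGING_LEVEL: default logging level
- VLLM_LOGGING_PREFIX: prefix prepended to every log line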
_FORMAT = (f"{VLLM_LOGGING_PREFIX}%(levelname)s %(asctime)s "
           "[%(filename)s:%(lineno)d] %(message)s")
_DATE_FORMAT = "%m-%d %H:%M:%S"
DEFAULT_LOGGING_CONFIG = {
    "formatters": {
        "vllm": {
            "class": "vllm.logging_utils.NewLineFormatter",
            "datefmt": _DATE_FORMAT,
            "format": _FORMAT,
        },
    },
    "handlers": {
        "vllm": {
            "class": "logging.StreamHandler",
            "formatter": "vllm",
            "level": VLLM_LOGGING_LEVEL,
            "stream": "ext://sys.stdout",
        },
    },
    "loggers": {
        "vllm": {
            "handlers": ["vllm"],
            "level": "DEBUG",
            "propagate": False,
        },
    },
    "version": 1,
    "disable_existing_loggers": False
}
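vLLM's default logging configuration, defined as a dictionary:
- formatters: uses the NewLineFormatter class (defined in vllm.logging_utils) with the format and date format defined above
- handlers: writes to standard output; the level comes from the VLLM_LOGGING_LEVEL environment variable
- loggers: configures the top-level logger named "vllm" with level DEBUG and propagate set to False, so records are not passed up to parent loggers; the actual filtering happens in the handler, whose level is VLLM_LOGGING_LEVEL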
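The structure of DEFAULT_LOGGING_CONFIG can be tried out with the standard library alone. This sketch swaps vllm.logging_utils.NewLineFormatter for the plain logging.Formatter and writes to an in-memory io.StringIO instead of ext://sys.stdout; the logger name demo_vllm and the hard-coded "INFO" handler level (standing in for VLLM_LOGGING_LEVEL) are assumptions. It shows the two-stage filtering: the logger accepts DEBUG, but the handler drops anything below its own level, and child loggers, like the ones init_logger(__name__) creates inside vllm.* modules, reuse the parent's handler.

```python
import io
import logging
from logging.config import dictConfig

buffer = io.StringIO()
_FORMAT = "%(levelname)s %(asctime)s [%(filename)s:%(lineno)d] %(message)s"

config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "demo": {"format": _FORMAT, "datefmt": "%m-%d %H:%M:%S"},
    },
    "handlers": {
        "demo": {
            "class": "logging.StreamHandler",
            "formatter": "demo",
            "level": "INFO",   # stands in for VLLM_LOGGING_LEVEL
            "stream": buffer,  # a non-string object is passed through as-is
        },
    },
    "loggers": {
        "demo_vllm": {"handlers": ["demo"], "level": "DEBUG", "propagate": False},
    },
}
dictConfig(config)

# A child logger propagates to "demo_vllm", whose handler does the output.
log = logging.getLogger("demo_vllm.engine")
log.debug("dropped by the handler")
log.info("engine started")
print(buffer.getvalue())
```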
@lru_cache
def _print_info_once(logger: Logger, msg: str, *args: Hashable) -> None:
    logger.info(msg, *args, stacklevel=2)


@lru_cache
def _print_warning_once(logger: Logger, msg: str, *args: Hashable) -> None:
    logger.warning(msg, *args, stacklevel=2)


class _VllmLogger(Logger):

    def info_once(self, msg: str, *args: Hashable) -> None:
        _print_info_once(self, msg, *args)

    def warning_once(self, msg: str, *args: Hashable) -> None:
        _print_warning_once(self, msg, *args)
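This part prevents the same log message from being printed repeatedly:
- The @lru_cache decorator caches calls keyed on the full argument tuple (logger, msg, args)
- When the same message is logged again, the call is answered from the cache, so the function body, and therefore the actual logging call, does not run
- stacklevel=2 makes the record report the file and line number of the original caller rather than this helper
- The _VllmLogger class only defines the interface; the actual implementation is injected into standard Logger instances by method patching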
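The deduplication trick can be reproduced in isolation. In this sketch the helper name warn_once and the list-collecting ListHandler are hypothetical; the point is that functools.lru_cache memoizes on the whole argument tuple, so a repeated (logger, msg, *args) call never reaches logger.warning. One side effect of the bare @lru_cache form is its default maxsize of 128: once more than 128 distinct messages have been seen, the least recently used entries are evicted and could be printed again.

```python
import logging
from functools import lru_cache


class ListHandler(logging.Handler):
    """Collects formatted messages in a list (hypothetical test helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


@lru_cache
def warn_once(logger: logging.Logger, msg: str, *args) -> None:
    # stacklevel=2 attributes the record to the caller of warn_once
    logger.warning(msg, *args, stacklevel=2)


demo_logger = logging.getLogger("dedup_demo")
demo_logger.setLevel(logging.WARNING)
demo_logger.propagate = False
handler = ListHandler()
demo_logger.addHandler(handler)

for _ in range(3):
    warn_once(demo_logger, "slow path for %s", "model-a")
warn_once(demo_logger, "slow path for %s", "model-b")
print(handler.messages)
```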
def _configure_vllm_root_logger() -> None:
    logging_config = dict[str, Any]()
    if not VLLM_CONFIGURE_LOGGING and VLLM_LOGGING_CONFIG_PATH:
        raise RuntimeError(
            "VLLM_CONFIGURE_LOGGING evaluated to false, but "
            "VLLM_LOGGING_CONFIG_PATH was given. VLLM_LOGGING_CONFIG_PATH "
            "implies VLLM_CONFIGURE_LOGGING. Please enable "
            "VLLM_CONFIGURE_LOGGING or unset VLLM_LOGGING_CONFIG_PATH.")
    if VLLM_CONFIGURE_LOGGING:
        logging_config = DEFAULT_LOGGING_CONFIG
    if VLLM_LOGGING_CONFIG_PATH:
        if not path.exists(VLLM_LOGGING_CONFIG_PATH):
            raise RuntimeError(
                "Could not load logging config. File does not exist: %s",
                VLLM_LOGGING_CONFIG_PATH)
        with open(VLLM_LOGGING_CONFIG_PATH, encoding="utf-8") as file:
            custom_config = json.loads(file.read())
        if not isinstance(custom_config, dict):
            raise ValueError("Invalid logging config. Expected dict, got %s.",
                             type(custom_config).__name__)
        logging_config = custom_config
    for formatter in logging_config.get("formatters", {}).values():
        # This provides backwards compatibility after #10134.
        if formatter.get("class") == "vllm.logging.NewLineFormatter":
            formatter["class"] = "vllm.logging_utils.NewLineFormatter"
    if logging_config:
        dictConfig(logging_config)
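This function configures vLLM's top-level logger:
- Resolves configuration conflicts: if logging configuration is disabled but a config path is given, it raises an error
- Starts from the default config; if a config file is specified, it loads that file and uses the custom config instead
- Fixes a backward-compatibility issue by rewriting the old formatter class path vllm.logging.NewLineFormatter to vllm.logging_utils.NewLineFormatter
- Applies the final config with dictConfig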
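The file-loading branch and the backward-compatibility rewrite can be sketched as a standalone function. load_logging_config is a hypothetical helper, not a vLLM name; it mirrors the steps above but stops before dictConfig (the vllm formatter classes are not importable here), and it builds its error messages with f-strings, because exception constructors do not apply %-style formatting to extra arguments. The example writes a temporary JSON file that still uses the old class path.

```python
import json
import os
import tempfile
from typing import Any, Dict

OLD_CLASS = "vllm.logging.NewLineFormatter"
NEW_CLASS = "vllm.logging_utils.NewLineFormatter"


def load_logging_config(config_path: str) -> Dict[str, Any]:
    """Load a JSON logging config and patch legacy formatter class paths."""
    if not os.path.exists(config_path):
        raise RuntimeError(
            f"Could not load logging config. File does not exist: {config_path}")
    with open(config_path, encoding="utf-8") as file:
        config = json.load(file)
    if not isinstance(config, dict):
        raise ValueError(
            f"Invalid logging config. Expected dict, got {type(config).__name__}.")
    for formatter in config.get("formatters", {}).values():
        # Same backward-compatibility rewrite as the vLLM code
        if formatter.get("class") == OLD_CLASS:
            formatter["class"] = NEW_CLASS
    return config


with tempfile.TemporaryDirectory() as tmp_dir:
    cfg_path = os.path.join(tmp_dir, "logging.json")
    with open(cfg_path, "w", encoding="utf-8") as f:
        json.dump({"version": 1, "formatters": {"vllm": {"class": OLD_CLASS}}}, f)
    loaded = load_logging_config(cfg_path)

print(loaded["formatters"]["vllm"]["class"])
```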
def init_logger(name: str) -> _VllmLogger:
    logger = logging.getLogger(name)
    methods_to_patch = {
        "info_once": _print_info_once,
        "warning_once": _print_warning_once,
    }
    for method_name, method in methods_to_patch.items():
        setattr(logger, method_name, MethodType(method, logger))
    return cast(_VllmLogger, logger)


_configure_vllm_root_logger()
logger = init_logger(__name__)
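The public interface for obtaining loggers:
- init_logger fetches a standard Logger instance and dynamically injects the info_once and warning_once methods
- vLLM's top-level logger is configured automatically when the module is imported
- The module then creates its own logger instance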
def _trace_calls(log_path, root_dir, frame, event, arg=None):
    if event in ['call', 'return']:
        # Extract the filename, line number, function name, and the code object
        filename = frame.f_code.co_filename
        lineno = frame.f_lineno
        func_name = frame.f_code.co_name
        if not filename.startswith(root_dir):
            # only log the functions in the vllm root_dir
            return
        # Log every function call or return
        try:
            last_frame = frame.f_back
            if last_frame is not None:
                last_filename = last_frame.f_code.co_filename
                last_lineno = last_frame.f_lineno
                last_func_name = last_frame.f_code.co_name
            else:
                # initial frame
                last_filename = ""
                last_lineno = 0
                last_func_name = ""
            with open(log_path, 'a') as f:
                ts = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")
                if event == 'call':
                    f.write(f"{ts} Call to"
                            f" {func_name} in {filename}:{lineno}"
                            f" from {last_func_name} in {last_filename}:"
                            f"{last_lineno}\n")
                else:
                    f.write(f"{ts} Return from"
                            f" {func_name} in {filename}:{lineno}"
                            f" to {last_func_name} in {last_filename}:"
                            f"{last_lineno}\n")
        except NameError:
            # modules are deleted during shutdown
            pass
    return partial(_trace_calls, log_path, root_dir)


def enable_trace_function_call(log_file_path: str,
                               root_dir: Optional[str] = None):
    """
    Enable tracing of every function call in code under `root_dir`.
    This is useful for debugging hangs or crashes.
    `log_file_path` is the path to the log file.
    `root_dir` is the root directory of the code to trace. If None, it is the
    vllm root directory.
    Note that this call is thread-level, any threads calling this function
    will have the trace enabled. Other threads will not be affected.
    """
    logger.warning(
        "VLLM_TRACE_FUNCTION is enabled. It will record every"
        " function executed by Python. This will slow down the code. It "
        "is suggested to be used for debugging hang or crashes only.")
    logger.info("Trace frame log is saved to %s", log_file_path)
    if root_dir is None:
        # by default, this is the vllm root directory
        root_dir = os.path.dirname(os.path.dirname(__file__))
    sys.settrace(partial(_trace_calls, log_file_path, root_dir))
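This implements an advanced debugging feature, function-call tracing:
- enable_trace_function_call turns on Python's system tracing via sys.settrace
- _trace_calls is the trace callback; it records function call and return events
- Only functions under the given directory are recorded; by default this is the vLLM root directory
- Each log line contains a timestamp, the function name, file name, line number, and the caller/callee relationship. The function also warns the user that this feature slows the code down significantly and should be used for debugging only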
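The core of the tracer is a sys.settrace callback that filters frames by file path and returns itself, so tracing continues inside each traced frame and the 'return' event is delivered. This sketch keeps that structure but appends tuples to an in-memory list instead of writing a log file; the names _trace, records, inner, and outer are hypothetical, and the "root directory" is simply the file in which the demo functions are defined.

```python
import sys
from functools import partial


def _trace(records, root_dir, frame, event, arg=None):
    if event in ("call", "return"):
        if not frame.f_code.co_filename.startswith(root_dir):
            return None  # outside root_dir: do not trace this frame
        caller = frame.f_back
        caller_name = caller.f_code.co_name if caller is not None else ""
        records.append((event, frame.f_code.co_name, caller_name))
    # Returning a tracer keeps receiving events (including 'return') for this frame
    return partial(_trace, records, root_dir)


def inner(x):
    return x + 1


def outer(x):
    return inner(x) * 2


records = []
root = outer.__code__.co_filename
sys.settrace(partial(_trace, records, root))
try:
    result = outer(3)
finally:
    sys.settrace(None)  # tracing is per-thread; always switch it off again

for event, func, caller in records:
    print(event, func, "<-", caller)
```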