Deterministic behavior was enabled with either `torch.use_deterministic_algorith...

2026-05-04 17:38 Status: processing

🚨 Error message

Deterministic behavior was enabled with either `torch.use_deterministic_algorithms(True)` or `at::Context::setDeterministicAlgorithms(true)`, but this operation is not deterministic because it uses CuBLAS and you have CUDA >= 10.2. To enable deterministic behavior in this case, you must set an environment variable before running your PyTorch application: CUBLAS_WORKSPACE_CONFIG=:4096:8 or CUBLAS_WORKSPACE_CONFIG=:16:8. For more information, go to https://docs.nvidia.com/cuda/cublas/index.html#cublasApi_reproducibility (Triggered internally at ../aten/src/ATen/Context.cpp:156.)

🤖 AI solution

Error analysis report

1. What the error says


```
Original: Deterministic behavior was enabled with either torch.use_deterministic_algorithms(True) or at::Context::setDeterministicAlgorithms(true), but this operation is not deterministic because it uses CuBLAS and you have CUDA >= 10.2. To enable deterministic behavior in this case, you must set an environment variable before running your PyTorch application: CUBLAS_WORKSPACE_CONFIG=:4096:8 or CUBLAS_WORKSPACE_CONFIG=:16:8.
Meaning: Deterministic behavior has been turned on through torch.use_deterministic_algorithms(True) or at::Context::setDeterministicAlgorithms(true), but the current operation is not deterministic because it uses cuBLAS with CUDA 10.2 or later. To make it deterministic, set CUBLAS_WORKSPACE_CONFIG=:4096:8 or CUBLAS_WORKSPACE_CONFIG=:16:8 in the environment before the PyTorch application starts.
```

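2. Cause analysis


  • Root cause: deterministic algorithms are enabled and the operation calls cuBLAS on CUDA ≥ 10.2, but the CUBLAS_WORKSPACE_CONFIG environment variable is not set, so cuBLAS cannot guarantee deterministic results.

  • Where it is raised: the check fires inside PyTorch at ATen/Context.cpp:156, which is the reproducibility check for cuBLAS operations.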
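
The condition described above can be checked from Python before any GPU work starts. The following is a minimal sketch, not PyTorch's own code: the helper name `cublas_config_is_deterministic` is hypothetical, and it assumes that only the two values named in the error message are accepted.

```python
import os

# The two values named in the error message. Treating every other value as
# non-deterministic is an assumption of this sketch.
DETERMINISTIC_CUBLAS_CONFIGS = (":4096:8", ":16:8")


def cublas_config_is_deterministic(env=None):
    """Return True if CUBLAS_WORKSPACE_CONFIG holds one of the two accepted values.

    `env` is any mapping of environment variables; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    return env.get("CUBLAS_WORKSPACE_CONFIG") in DETERMINISTIC_CUBLAS_CONFIGS
```

Calling this helper at the top of a training script gives an early, readable failure instead of an error in the middle of the first matrix multiplication.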
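3. Solution


Set the environment variable before the PyTorch program starts (choose one of the two methods):

Method 1: set it temporarily in the terminal (Linux/macOS)

```bash
# Pick one of the two values.
export CUBLAS_WORKSPACE_CONFIG=:4096:8
# or
export CUBLAS_WORKSPACE_CONFIG=:16:8
```

Method 2: set it in Python code (must come before `import torch`)

```python
import os

# Must be set before torch is imported; ":16:8" is the alternative value.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import torch

torch.use_deterministic_algorithms(True)
```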
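
A third option, when neither the shell profile nor the training script should be modified, is to launch the script from a small wrapper that sets the variable only for the child process. This is a sketch under that assumption; `run_with_cublas_config` is a hypothetical helper name, and `train.py` in the usage comment is a placeholder.

```python
import os
import subprocess
import sys


def run_with_cublas_config(argv, config=":4096:8"):
    """Run `argv` as a child process with CUBLAS_WORKSPACE_CONFIG set.

    The parent environment is copied, so the variable is only changed for the
    child. Output is captured so the caller can inspect it.
    """
    env = dict(os.environ)
    env["CUBLAS_WORKSPACE_CONFIG"] = config
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=False)


# Example usage (train.py is a placeholder script name):
# result = run_with_cublas_config([sys.executable, "train.py"])
# print(result.stdout)
```

Because the variable is part of the child's environment from the moment it starts, it is in place regardless of where that script imports torch.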

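4. Prevention


  • Best practice 1: before enabling deterministic algorithms, check the version compatibility of dependent libraries such as cuBLAS, and prefer configuring reproducibility through environment variables.

  • Best practice 2: combine `torch.backends.cudnn.deterministic = True` with the environment variable as a second layer of protection (the flag covers cuDNN, the variable covers cuBLAS); when troubleshooting, print the environment details with `python -m torch.utils.collect_env`.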
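
The two practices above can be gathered into one setup routine. This is a sketch rather than an official recipe: the function names `prepare_cublas_env` and `enable_torch_determinism` are hypothetical, the seed and the `cudnn.benchmark = False` line are additions not mentioned in the original answer, and torch is imported inside the function so that the environment variable is in place first.

```python
import os


def prepare_cublas_env(default=":4096:8"):
    """Set CUBLAS_WORKSPACE_CONFIG if it is unset and return the value in effect.

    An existing value is left untouched so that a user's choice is respected.
    """
    return os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", default)


def enable_torch_determinism(seed=0):
    """Apply the environment variable and the PyTorch determinism switches together."""
    prepare_cublas_env()

    import torch  # imported late so the variable above is already set

    torch.manual_seed(seed)                    # seeds the CPU and CUDA generators
    torch.backends.cudnn.deterministic = True  # deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False     # assumption: disable autotuning
    torch.use_deterministic_algorithms(True)   # error on non-deterministic ops
    return torch
```

Calling `enable_torch_determinism()` as the first statement of an entry-point script keeps all reproducibility settings in one place.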
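---
Note: the solutions above follow the reproducibility requirements of PyTorch and CUDA; the environment variable must be in effect before the program starts.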