Deep-learning "alchemy" notes
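Clearing GPU memory
Kill every process that holds a /dev/nvidia* device open. fuser prints only the PIDs to stdout, and awk turns each one into a kill -9 command. This also kills display servers and other users' jobs on the GPU:
sudo fuser -v /dev/nvidia* |awk '{for(i=1;i<=NF;i++)print "kill -9 " $i;}' | sudo sh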
Or only list them first (-v shows the user, PID, access type and command of each process):
fuser -v /dev/nvidia*
https://zhuanlan.zhihu.com/p/637164912
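The kill one-liner relies on fuser writing only PIDs to stdout, so awk can turn every field into a kill -9 line. A minimal Python sketch of the same idea, useful for reviewing the list before killing anything, is below. The function names are hypothetical; it assumes fuser (from psmisc) is installed, and without sudo fuser may not see other users' processes.

```python
import subprocess


def gpu_pids_from_fuser(stdout_text: str) -> list[int]:
    """Extract PIDs from fuser's stdout.

    fuser writes only PIDs to stdout (file names and access flags go to
    stderr), so each whitespace-separated token should be a PID. Tokens that
    are not plain integers are skipped as an extra safety measure.
    """
    return [int(tok) for tok in stdout_text.split() if tok.isdigit()]


def kill_commands(pids: list[int]) -> list[str]:
    """Build the same `kill -9 <pid>` lines that the awk stage prints."""
    return [f"kill -9 {pid}" for pid in pids]


def list_gpu_pids() -> list[int]:
    """Run fuser on the NVIDIA device files and return the PIDs it reports.

    The /dev/nvidia* glob is expanded by the shell, as in the one-liner.
    fuser exits non-zero when nothing matches, so the return code is ignored.
    """
    result = subprocess.run(
        "fuser /dev/nvidia*", shell=True, capture_output=True, text=True
    )
    return gpu_pids_from_fuser(result.stdout)
```

For example, print("\n".join(kill_commands(list_gpu_pids()))) shows the commands for inspection instead of piping them straight into sudo sh.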
nvitop
nvidia-ml-py conflicts with nvidia-ml-py3 and pynvml: all three packages install a module named pynvml.py. Uninstall the other two and reinstall nvidia-ml-py:
pip3 uninstall nvidia-ml-py3 pynvml
pip3 install --force-reinstall nvidia-ml-py==11.450.51
Alternatively, create a fresh virtual environment and install nvitop there.
https://github.com/XuehaiPan/nvitop/issues/4#issuecomment-894659795
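To check whether an environment has more than one of the conflicting packages installed, a small sketch using the standard-library importlib.metadata could look like this. The helper names are hypothetical, and the set of conflicting packages is taken from the nvitop issue note; newer releases of these packages may have changed what they ship.

```python
import re
from importlib.metadata import distributions

# Distributions that, per the nvitop issue, each install a top-level pynvml.py
# and therefore overwrite one another.
PYNVML_PROVIDERS = {"nvidia-ml-py", "nvidia-ml-py3", "pynvml"}


def normalize(name: str) -> str:
    """PEP 503 name normalization: collapse runs of -, _ and . to '-', lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def conflicting_providers(installed_names) -> list[str]:
    """Return the pynvml providers present in `installed_names`, sorted."""
    found = {normalize(n) for n in installed_names} & PYNVML_PROVIDERS
    return sorted(found)


def installed_distribution_names() -> list[str]:
    """Names of all distributions visible in the current environment."""
    names = []
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:
            names.append(name)
    return names
```

Calling conflicting_providers(installed_distribution_names()) and getting more than one entry suggests the uninstall and reinstall step is needed.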
vLLM
How to release a vllm.LLM and its GPU memory within one script, then start another vllm.LLM:
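# `llm` is an existing vllm.LLM instance
import gc
import os
import ray
import torch
# Import path used by older vLLM releases; the module has moved in later versions
from vllm.model_executor.parallel_utils.parallel_state import destroy_model_parallel
# Avoid a huggingface/tokenizers process deadlock
os.environ["TOKENIZERS_PARALLELISM"] = "false"
destroy_model_parallel()
# Delete the executor (a vllm.executor.ray_gpu_executor.RayGPUExecutor object)
del llm.llm_engine.model_executor
del llm
# Collect unreachable objects so their CUDA tensors are actually freed,
# then release the allocator's unoccupied cached blocks
gc.collect()
torch.cuda.empty_cache()
# Shut down Ray, which vLLM starts for multi-GPU execution
ray.shutdown()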
https://github.com/vllm-project/vllm/issues/1908#issuecomment-2074543512
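Why gc.collect() runs between del llm and torch.cuda.empty_cache(): if the objects that own CUDA tensors sit in reference cycles, del alone does not free them, and empty_cache() can only release cached memory that no live tensor occupies. The pure-Python sketch below uses hypothetical Engine/Executor classes as a stand-in for such an object graph (an assumption, not vLLM's real structure) and shows a cycle surviving del until gc.collect() runs.

```python
import gc
import weakref


class Engine:
    """Hypothetical stand-in: engine and executor reference each other."""

    def __init__(self):
        self.executor = Executor(self)


class Executor:
    def __init__(self, engine):
        self.engine = engine  # back-reference closes the cycle


def alive_after_del(collect: bool) -> bool:
    """Create an Engine, drop the only external reference, optionally run
    gc.collect(), and report whether the Engine object still exists."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector from running mid-demo
    try:
        engine = Engine()
        probe = weakref.ref(engine)  # a weak reference does not keep it alive
        del engine
        if collect:
            gc.collect()
        return probe() is not None
    finally:
        if was_enabled:
            gc.enable()
```

alive_after_del(collect=False) reports the object as still alive, while alive_after_del(collect=True) reports it gone; the same ordering is why the vLLM snippet collects garbage before emptying the CUDA cache.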