Qwen3-8B Dify RAG Environment Setup
I. Environment
Attribute             Value
CUDA Driver Version   555.42.02
CUDA Version          12.5
OS                    Ubuntu 20.04.6 LTS
Docker                24.0.5, build 24.0.5-0ubuntu1~20.04.1
GPU                   NVIDIA GeForce RTX 3090 (24 GB VRAM)
II. Setup Steps
1. Create the container
docker run --runtime nvidia --gpus all -ti \
-v "$PWD":/home -w /home \
-p 8000:8000 --ipc=host nvcr.io/nvidia/pytorch:24.03-py3 bash
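Before downloading anything, it is worth confirming that the GPU actually made it into the container. A minimal check, to be run inside the container started above (falls back to a message rather than failing if `nvidia-smi` is missing):

```shell
# Confirm GPU passthrough worked inside the container.
if command -v nvidia-smi >/dev/null 2>&1; then
    # Print GPU model and total memory, one line per GPU.
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
else
    echo "nvidia-smi not found - check the --runtime nvidia / --gpus all flags"
fi
```

If the RTX 3090 does not appear here, fix the `docker run` flags before proceeding.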
2. Download Qwen3-8B and the embedding model
cd /home
pip install modelscope
modelscope download --model Qwen/Qwen3-8B --local_dir Qwen3-8B
modelscope download --model maidalun/bce-embedding-base_v1 --local_dir bce-embedding-base_v1
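Large downloads occasionally get interrupted, so a quick sanity check that both model directories are complete saves debugging later. This sketch just checks for `config.json` (present in both repos) and counts files; the paths come from the `--local_dir` arguments above:

```shell
# Verify each downloaded model directory looks intact.
for d in /home/Qwen3-8B /home/bce-embedding-base_v1; do
    if [ -f "$d/config.json" ]; then
        echo "$d: OK ($(ls "$d" | wc -l) files)"
    else
        echo "$d: config.json missing - re-run the modelscope download"
    fi
done
```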
3. Install transformers
cd /home
git clone https://github.com/huggingface/transformers.git
cd transformers
git checkout v4.51.0
pip install tokenizers==0.21
pip install .
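After the source install, confirm the interpreter picks up the pinned versions (4.51.0 for transformers, 0.21.x for tokenizers, per the steps above):

```shell
# Print the installed versions; reports gracefully if not yet installed.
python3 - <<'EOF'
try:
    import transformers, tokenizers
    print("transformers", transformers.__version__)
    print("tokenizers", tokenizers.__version__)
except ImportError as e:
    print("not installed yet:", e)
EOF
```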
4. Install vLLM
pip install vllm
pip install flashinfer-python==v0.2.2
python3 -m pip install --upgrade 'optree>=0.13.0'
pip install 'bitsandbytes>=0.45.3' -i https://pypi.tuna.tsinghua.edu.cn/simple
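The guide installs vLLM but does not show a launch command. One plausible way to serve Qwen3-8B on the port mapped earlier (`-p 8000:8000`) is vLLM's OpenAI-compatible server. Note that `--max-model-len` and `--gpu-memory-utilization` below are assumed starting points for a single 24 GB RTX 3090, not values from the original article:

```shell
# Serve Qwen3-8B via vLLM's OpenAI-compatible API on port 8000.
# Context length and memory fraction are assumptions; tune for your workload.
vllm serve /home/Qwen3-8B \
    --host 0.0.0.0 --port 8000 \
    --served-model-name qwen3-8b \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.90
```

Once the server is up, Dify's model provider can be pointed at `http://<host>:8000/v1` as an OpenAI-API-compatible endpoint.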
5. Install flash-attention
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention/
git checkout fd2fc9d85c8e54e5c20436465bca709bc1a6c5a1
python3 setup.py build_ext
python3 setup.py bdist_wheel
pip install dist/flash_attn-*.whl
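As a final check, make sure the freshly built wheel actually imports (a mismatched CUDA or PyTorch build typically fails right here at import time):

```shell
# Verify the flash-attn wheel installed and imports cleanly.
python3 - <<'EOF'
try:
    import flash_attn
    print("flash_attn", flash_attn.__version__)
except ImportError as e:
    print("flash_attn not importable:", e)
EOF
```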
————————————————
Copyright notice: this is an original article by CSDN blogger "Hi20240217", licensed under CC 4.0 BY-SA; when reposting, please include the original source link and this notice.
原文链接:https://blog.csdn.net/m0_61864577/article/details/147704158