
Notes on downloading Hugging Face models through a mirror site when using chromadb

news 2025/8/24 7:24:49

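By default, chromadb embeds text with the sentence-transformers/all-MiniLM-L6-v2 embedding model. When a program runs for the first time and a collection's add or query call is given no embeddings or query_embeddings, chromadb downloads this model automatically. Downloads from the default Hugging Face endpoint are often very slow, however, so it helps to point the download at a mirror site instead.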
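A mirror can stand in for the official site because huggingface_hub builds each file URL from a configurable endpoint plus the repository path. The sketch below reproduces that URL pattern, which to my understanding matches the `{endpoint}/{repo_id}/resolve/{revision}/{filename}` template huggingface_hub uses for model repositories. `hf_resolve_url` is a hypothetical helper written for illustration, not a library function.

```python
import os
from typing import Optional
from urllib.parse import quote

# Hub endpoint used when HF_ENDPOINT is not set.
DEFAULT_ENDPOINT = "https://huggingface.co"


def hf_resolve_url(repo_id: str, filename: str, revision: str = "main",
                   endpoint: Optional[str] = None) -> str:
    """Build the download URL for one file in a model repo (hypothetical helper).

    The host comes from `endpoint`, else HF_ENDPOINT, else the official site,
    which is why pointing HF_ENDPOINT at a mirror redirects every download.
    """
    base = (endpoint or os.environ.get("HF_ENDPOINT") or DEFAULT_ENDPOINT).rstrip("/")
    # Branch names may contain "/", so encode it there; keep "/" inside file paths.
    return f"{base}/{repo_id}/resolve/{quote(revision, safe='')}/{quote(filename)}"


print(hf_resolve_url("sentence-transformers/all-MiniLM-L6-v2", "config.json",
                     endpoint="https://hf-mirror.com"))
```

Only the host part changes between the official site and the mirror; the repository path and file name stay the same.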

On Windows, the steps are as follows:

1. Install huggingface_hub:

pip install huggingface_hub

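2. Set the HF_ENDPOINT environment variable to the mirror URL. In cmd, set only affects the current console window, so run the following commands in that same window (setx HF_ENDPOINT https://hf-mirror.com stores the variable permanently for consoles opened afterwards):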

set HF_ENDPOINT=https://hf-mirror.com
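The same variable can also be set from inside a Python program, which is handy when the script is not started from a prepared console. A minimal sketch, assuming the mirror URL used in this article; `use_hf_mirror` is a hypothetical helper. It has to run before huggingface_hub, sentence_transformers or chromadb is imported, because huggingface_hub reads HF_ENDPOINT once, when it is first imported.

```python
import os

MIRROR = "https://hf-mirror.com"  # the mirror used in this article


def use_hf_mirror(endpoint: str = MIRROR, override: bool = False) -> str:
    """Set HF_ENDPOINT for this process (and any child processes it starts).

    An existing non-empty value is kept unless override=True, so a value set
    in the console still wins. Returns the value now in effect.
    """
    if override or not os.environ.get("HF_ENDPOINT"):
        os.environ["HF_ENDPOINT"] = endpoint.rstrip("/")
    return os.environ["HF_ENDPOINT"]


# Call this at the very top of the script, before any Hub-related import:
print("HF_ENDPOINT =", use_hf_mirror())
# from sentence_transformers import SentenceTransformer  # import only afterwards
```

Setting os.environ this way changes only the current process and the processes it launches; the system-wide variable is untouched.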

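3. Check that the variable took effect; the output should list the mirror URL as ENDPOINT (huggingface_hub versions without the hf command provide the same report via huggingface-cli env):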

hf env
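The endpoint actually in effect can also be checked from Python. A sketch, assuming that huggingface_hub exposes it as `huggingface_hub.constants.ENDPOINT` (true for the versions I know of, but treat it as an assumption); if the package is not installed, it falls back to the same env-var-or-default rule. `effective_hf_endpoint` is a hypothetical name.

```python
import os

DEFAULT_ENDPOINT = "https://huggingface.co"


def effective_hf_endpoint() -> str:
    """Return the Hub endpoint that downloads in this process will use."""
    try:
        # Assumption: huggingface_hub keeps the endpoint in constants.ENDPOINT,
        # computed once when the package is first imported.
        from huggingface_hub import constants
        return constants.ENDPOINT
    except ImportError:
        # Package not installed: apply the same rule (env var, else official site).
        return os.environ.get("HF_ENDPOINT", DEFAULT_ENDPOINT).rstrip("/")
```

If this returns https://huggingface.co even though HF_ENDPOINT is set, the usual cause is that huggingface_hub was imported before the variable was set in that process.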

4. Download the model (here all-MiniLM-L6-v2) into a local folder:

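huggingface-cli download sentence-transformers/all-MiniLM-L6-v2 --local-dir ./models/all-MiniLM-L6-v2

In recent huggingface_hub releases, interrupted downloads resume automatically and --local-dir stores real files instead of symlinks, so the --resume-download and --local-dir-use-symlinks False flags are deprecated and can be left out; newer releases also offer the same command as hf download.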
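Before pointing any code at the folder, it can be worth confirming that the download completed. A minimal sketch; `missing_model_files` is a hypothetical helper, and the file names it checks (modules.json and config.json, plus model.safetensors or pytorch_model.bin for the weights) are assumptions based on what the sentence-transformers/all-MiniLM-L6-v2 repository normally contains.

```python
from pathlib import Path
from typing import List, Sequence, Union


def missing_model_files(
    local_dir: Union[str, Path],
    required: Sequence[str] = ("modules.json", "config.json"),
    weights: Sequence[str] = ("model.safetensors", "pytorch_model.bin"),
) -> List[str]:
    """List the expected files that are absent from a downloaded model folder.

    Every name in `required` must exist; for `weights`, any one file is enough.
    An empty list means the folder looks complete.
    """
    root = Path(local_dir)
    missing = [name for name in required if not (root / name).is_file()]
    if weights and not any((root / name).is_file() for name in weights):
        missing.append(" or ".join(weights))
    return missing


problems = missing_model_files("./models/all-MiniLM-L6-v2")
print("Model folder looks complete" if not problems else f"Missing: {problems}")
```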

5. Example: loading the local model (all-MiniLM-L6-v2) in a program:

from sentence_transformers import SentenceTransformer
# Path to the local model (replace with your actual path)
model_path = r".\models\all-MiniLM-L6-v2"  # on Windows, a raw string r"" avoids backslash-escape problems
model = SentenceTransformer(model_path)  # load the model from the local folder
# List of input sentences
sentences = ["This is an example sentence.", "Each sentence is converted."]
embeddings = model.encode(sentences)  # each sentence becomes a 384-dimensional vector
# Print the results (example)
print("Embedding shape:", embeddings.shape)
for i, emb in enumerate(embeddings):
    print(f"First 5 dimensions of sentence '{sentences[i]}': {emb[:5]}")
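These vectors are used for retrieval by comparing them: a query returns the stored vectors closest to the query vector. Below is a dependency-free sketch of two common measures; the function names are illustrative. To my understanding Chroma's default distance is squared L2 (smaller means more similar), and since all-MiniLM-L6-v2 normalizes its output to unit length, squared L2 then equals 2 - 2·cos, so both measures rank results the same way.

```python
import math
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance: smaller means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# For unit-length vectors, squared L2 distance equals 2 - 2 * cosine similarity.
u, v = [0.6, 0.8], [1.0, 0.0]
print(squared_l2(u, v), 2 - 2 * cosine_similarity(u, v))
```

With the embeddings from the example above, cosine_similarity(embeddings[0], embeddings[1]) compares the two sentences, since NumPy rows support len() and iteration like sequences.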

6. Example: using the local embedding model with chromadb:

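import chromadb
from sentence_transformers import SentenceTransformer
# Path to the local model (replace with your actual path)
model_path = r".\models\all-MiniLM-L6-v2"  # on Windows, a raw string r"" avoids backslash-escape problems
model = SentenceTransformer(model_path)  # load the model from the local folder
chroma_client = chromadb.Client()
collection = chroma_client.create_collection(name="my_collection")
# Documents to store
documents = [
    "This is a document about pineapple",
    "This is an island of the USA",
    "This is a location where there are many tourists",
    "This is a document about oranges",
]
# Encode the documents with the local model; .tolist() turns the NumPy array into plain lists, the safest input format across Chroma versions
embeddings = model.encode(documents).tolist()
# Add the records; because embeddings are supplied, Chroma does not embed the documents itself
collection.add(embeddings=embeddings, ids=["id1", "id2", "id3", "id4"], documents=documents)
# Query text
query_texts = ["This is a query document about hawaii"]
# Encode the query with the same model so it lives in the same vector space
query_embeddings = model.encode(query_texts).tolist()
# Query with the vector only: Chroma expects exactly one of query_embeddings or query_texts,
# and passing query_texts would make it fall back to its default embedding model
results = collection.query(query_embeddings=query_embeddings, n_results=2)  # n_results: how many results to return
print(results)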
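The printed result is a dict of parallel nested lists with one inner list per query, so results["ids"][0] holds the ids for the first (here the only) query. A small sketch that zips them into rows for easier reading; `top_hits` is a hypothetical helper, and it assumes the "ids", "distances" and "documents" keys of Chroma's query result, treating a field that is None or missing as absent.

```python
from typing import Any, List, Mapping, Optional, Tuple

Hit = Tuple[str, Optional[float], Optional[str]]


def top_hits(results: Mapping[str, Any], query_index: int = 0) -> List[Hit]:
    """Return (id, distance, document) rows for one query of a query() result."""
    ids = results["ids"][query_index]

    def column(key: str) -> List[Any]:
        # A field excluded from the query comes back as None (or may be missing).
        rows = results.get(key)
        return list(rows[query_index]) if rows is not None else [None] * len(ids)

    return list(zip(ids, column("distances"), column("documents")))


# Usage with the `results` from the example above:
# for doc_id, distance, text in top_hits(results):
#     print(f"{doc_id}  distance={distance:.4f}  {text}")
```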


http://www.xdnf.cn/news/1351441.html