
Building a Book Recommendation System with Gemini, LangChain, and Gradio (Part 4)

news 2025/6/8 20:59:37

Part 4: Adding Emotion Labels to Each Book

import pandas as pd
books = pd.read_csv("books_with_categories.csv")
from transformers import pipeline
classifier = pipeline("text-classification",model="j-hartmann/emotion-english-distilroberta-base",top_k = None,device = 0)
classifier("I love this!")

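transformers is the natural language processing toolkit from Hugging Face.
pipeline is a convenience helper in that library for running pretrained models on common NLP tasks such as text classification, generation, and translation.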

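pipeline("text-classification", …) creates a pipeline for the text classification task.
model="j-hartmann/emotion-english-distilroberta-base" selects the pretrained model. This model is built for emotion classification and recognizes emotions such as joy, sadness, and anger.
top_k=None returns every label together with its score, not only the best one.
device=0 runs the model on GPU 0. On a machine without a GPU, use device=-1 to run on the CPU.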
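One way to avoid hard-coding the device is to choose it at runtime. The sketch below assumes PyTorch is the backend. pick_device is a hypothetical helper, not part of transformers, and it falls back to the CPU when torch is missing or no CUDA GPU is visible.

```python
def pick_device() -> int:
    """Return 0 (first GPU) when CUDA is usable, otherwise -1 (CPU)."""
    try:
        import torch  # assumption: PyTorch is the installed backend
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(device)
```

The resulting value can then be passed as device=device when the pipeline is created.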

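Passing the string "I love this!" to the classifier asks the model to predict the emotion of that text.
The result is a list with one entry per input text; each entry is a list of all emotion labels with their scores.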

Device set to use cuda:0
[[{'label': 'joy', 'score': 0.9771687984466553},{'label': 'surprise', 'score': 0.00852868054062128},{'label': 'neutral', 'score': 0.005764591973274946},{'label': 'anger', 'score': 0.004419785924255848},{'label': 'sadness', 'score': 0.0020923891570419073},{'label': 'disgust', 'score': 0.001611991785466671},{'label': 'fear', 'score': 0.00041385178337804973}]]
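To make the nesting concrete, here is a small sketch that picks the strongest emotion out of such a result. The data is the output shown above with scores rounded to four decimals; no model is loaded.

```python
# Output of classifier("I love this!") from above, scores rounded.
result = [[
    {"label": "joy", "score": 0.9772},
    {"label": "surprise", "score": 0.0085},
    {"label": "neutral", "score": 0.0058},
    {"label": "anger", "score": 0.0044},
    {"label": "sadness", "score": 0.0021},
    {"label": "disgust", "score": 0.0016},
    {"label": "fear", "score": 0.0004},
]]

scores = result[0]  # outer list: one entry per input text
top = max(scores, key=lambda item: item["score"])  # highest-scoring label
total = sum(item["score"] for item in scores)

print(top["label"])  # joy
print(round(total, 2))  # the scores in this output add up to roughly 1
```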

books["description"][0]

A NOVEL THAT READERS and critics have been eagerly anticipating for over a decade, Gilead is an astonishingly imagined story of remarkable lives. John Ames is a preacher, the son of a preacher and the grandson (both maternal and paternal) of preachers. It’s 1956 in Gilead, Iowa, towards the end of the Reverend Ames’s life, and he is absorbed in recording his family’s story, a legacy for the young son he will never see grow up. Haunted by his grandfather’s presence, John tells of the rift between his grandfather and his father: the elder, an angry visionary who fought for the abolitionist cause, and his son, an ardent pacifist. He is troubled, too, by his prodigal namesake, Jack (John Ames) Boughton, his best friend’s lost son who returns to Gilead searching for forgiveness and redemption. Told in John Ames’s joyous, rambling voice that finds beauty, humour and truth in the smallest of life’s details, Gilead is a song of celebration and acceptance of the best and the worst the world has to offer. At its heart is a tale of the sacred bonds between fathers and sons, pitch-perfect in style and story, set to dazzle critics and readers alike.

classifier(books["description"][0])

[[{'label': 'fear', 'score': 0.6548413634300232},{'label': 'neutral', 'score': 0.16985207796096802},{'label': 'sadness', 'score': 0.11640888452529907},{'label': 'surprise', 'score': 0.02070062793791294},{'label': 'disgust', 'score': 0.019100705161690712},{'label': 'joy', 'score': 0.015161297284066677},{'label': 'anger', 'score': 0.003935146611183882}]]

classifier(books["description"][0].split("."))

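books["description"] selects the description column of the books DataFrame and returns a Series.
[0] takes the first element of that column, that is, the description in row 0.

.split(".") is the string split() method; it splits the text at every period.
This breaks the description into a list of rough sentences. Because the description ends with a period, the last element of the list is an empty string, and it is classified as well.

The call then runs the emotion classifier created earlier.
Each sentence in the list is classified separately.
The result is a list of predictions, one per sentence, each containing all emotion labels with their scores.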

[[{'label': 'surprise', 'score': 0.7296026945114136},{'label': 'neutral', 'score': 0.1403856873512268},{'label': 'fear', 'score': 0.06816219538450241},{'label': 'joy', 'score': 0.04794241115450859},{'label': 'anger', 'score': 0.009156348183751106},{'label': 'disgust', 'score': 0.00262847519479692},{'label': 'sadness', 'score': 0.0021221605129539967}],[{'label': 'neutral', 'score': 0.44937071204185486},{'label': 'disgust', 'score': 0.2735914885997772},{'label': 'joy', 'score': 0.10908304899930954},{'label': 'sadness', 'score': 0.09362724423408508},{'label': 'anger', 'score': 0.040478333830833435},{'label': 'surprise', 'score': 0.02697017975151539},{'label': 'fear', 'score': 0.006879060063511133}],[{'label': 'neutral', 'score': 0.6462162137031555},{'label': 'sadness', 'score': 0.2427332103252411},{'label': 'disgust', 'score': 0.04342261329293251},{'label': 'surprise', 'score': 0.028300540521740913},{'label': 'joy', 'score': 0.014211442321538925},{'label': 'fear', 'score': 0.014084079302847385},{'label': 'anger', 'score': 0.01103188470005989}],[{'label': 'fear', 'score': 0.928167998790741},{'label': 'anger', 'score': 0.03219102695584297},{'label': 'neutral', 'score': 0.012808729894459248},{'label': 'sadness', 'score': 0.008756889030337334},{'label': 'surprise', 'score': 0.008597911335527897},{'label': 'disgust', 'score': 0.008431846275925636},{'label': 'joy', 'score': 0.001045582932420075}],[{'label': 'sadness', 'score': 0.9671575427055359},{'label': 'neutral', 'score': 0.015104170888662338},{'label': 'disgust', 'score': 0.006480592768639326},{'label': 'fear', 'score': 0.005393994972109795},{'label': 'surprise', 'score': 0.0022869433742016554},{'label': 'anger', 'score': 0.0018428893527016044},{'label': 'joy', 'score': 0.0017338789766654372}],[{'label': 'joy', 'score': 0.9327971935272217},{'label': 'disgust', 'score': 0.03771771863102913},{'label': 'neutral', 'score': 0.01589190773665905},{'label': 'sadness', 'score': 0.006444551516324282},{'label': 'anger', 'score': 
0.005025018472224474},{'label': 'surprise', 'score': 0.0015812073834240437},{'label': 'fear', 'score': 0.0005423100665211678}],[{'label': 'joy', 'score': 0.6528703570365906},{'label': 'neutral', 'score': 0.25427502393722534},{'label': 'surprise', 'score': 0.0680830255150795},{'label': 'sadness', 'score': 0.009908979758620262},{'label': 'disgust', 'score': 0.006512209307402372},{'label': 'anger', 'score': 0.00482131028547883},{'label': 'fear', 'score': 0.0035290152300149202}],[{'label': 'neutral', 'score': 0.5494765639305115},{'label': 'sadness', 'score': 0.1116902083158493},{'label': 'disgust', 'score': 0.10400670021772385},{'label': 'surprise', 'score': 0.07876556366682053},{'label': 'anger', 'score': 0.0641336739063263},{'label': 'fear', 'score': 0.05136282742023468},{'label': 'joy', 'score': 0.040564440190792084}]]
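The behavior of split(".") on text that ends with a period can be seen on a short made-up string. The filtering step at the end is only a suggestion for dropping blank pieces; the code in this article classifies the raw list.

```python
text = "Gilead is a novel. It is set in Iowa."

parts = text.split(".")
print(parts)  # ['Gilead is a novel', ' It is set in Iowa', '']

# Optional cleanup (not done in the article): strip whitespace, drop blanks.
sentences = [part.strip() for part in parts if part.strip()]
print(sentences)  # ['Gilead is a novel', 'It is set in Iowa']
```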

sentences = books["description"][0].split(".")
predictions = classifier(sentences)
sentences[0]

A NOVEL THAT READERS and critics have been eagerly anticipating for over a decade, Gilead is an astonishingly imagined story of remarkable lives

predictions[0]

[{'label': 'surprise', 'score': 0.7296026945114136},{'label': 'neutral', 'score': 0.1403856873512268},{'label': 'fear', 'score': 0.06816219538450241},{'label': 'joy', 'score': 0.04794241115450859},{'label': 'anger', 'score': 0.009156348183751106},{'label': 'disgust', 'score': 0.00262847519479692},{'label': 'sadness', 'score': 0.0021221605129539967}]

sentences[3]

 Haunted by his grandfather’s presence, John tells of the rift between his grandfather and his father: the elder, an angry visionary who fought for the abolitionist cause, and his son, an ardent pacifist

predictions[3]

[{'label': 'fear', 'score': 0.928167998790741},{'label': 'anger', 'score': 0.03219102695584297},{'label': 'neutral', 'score': 0.012808729894459248},{'label': 'sadness', 'score': 0.008756889030337334},{'label': 'surprise', 'score': 0.008597911335527897},{'label': 'disgust', 'score': 0.008431846275925636},{'label': 'joy', 'score': 0.001045582932420075}]

predictions

[[{'label': 'surprise', 'score': 0.7296026945114136},{'label': 'neutral', 'score': 0.1403856873512268},{'label': 'fear', 'score': 0.06816219538450241},{'label': 'joy', 'score': 0.04794241115450859},{'label': 'anger', 'score': 0.009156348183751106},{'label': 'disgust', 'score': 0.00262847519479692},{'label': 'sadness', 'score': 0.0021221605129539967}],[{'label': 'neutral', 'score': 0.44937071204185486},{'label': 'disgust', 'score': 0.2735914885997772},{'label': 'joy', 'score': 0.10908304899930954},{'label': 'sadness', 'score': 0.09362724423408508},{'label': 'anger', 'score': 0.040478333830833435},{'label': 'surprise', 'score': 0.02697017975151539},{'label': 'fear', 'score': 0.006879060063511133}],[{'label': 'neutral', 'score': 0.6462162137031555},{'label': 'sadness', 'score': 0.2427332103252411},{'label': 'disgust', 'score': 0.04342261329293251},{'label': 'surprise', 'score': 0.028300540521740913},{'label': 'joy', 'score': 0.014211442321538925},{'label': 'fear', 'score': 0.014084079302847385},{'label': 'anger', 'score': 0.01103188470005989}],[{'label': 'fear', 'score': 0.928167998790741},{'label': 'anger', 'score': 0.03219102695584297},{'label': 'neutral', 'score': 0.012808729894459248},{'label': 'sadness', 'score': 0.008756889030337334},{'label': 'surprise', 'score': 0.008597911335527897},{'label': 'disgust', 'score': 0.008431846275925636},{'label': 'joy', 'score': 0.001045582932420075}],[{'label': 'sadness', 'score': 0.9671575427055359},{'label': 'neutral', 'score': 0.015104170888662338},{'label': 'disgust', 'score': 0.006480592768639326},{'label': 'fear', 'score': 0.005393994972109795},{'label': 'surprise', 'score': 0.0022869433742016554},{'label': 'anger', 'score': 0.0018428893527016044},{'label': 'joy', 'score': 0.0017338789766654372}],[{'label': 'joy', 'score': 0.9327971935272217},{'label': 'disgust', 'score': 0.03771771863102913},{'label': 'neutral', 'score': 0.01589190773665905},{'label': 'sadness', 'score': 0.006444551516324282},{'label': 'anger', 'score': 
0.005025018472224474},{'label': 'surprise', 'score': 0.0015812073834240437},{'label': 'fear', 'score': 0.0005423100665211678}],[{'label': 'joy', 'score': 0.6528703570365906},{'label': 'neutral', 'score': 0.25427502393722534},{'label': 'surprise', 'score': 0.0680830255150795},{'label': 'sadness', 'score': 0.009908979758620262},{'label': 'disgust', 'score': 0.006512209307402372},{'label': 'anger', 'score': 0.00482131028547883},{'label': 'fear', 'score': 0.0035290152300149202}],[{'label': 'neutral', 'score': 0.5494765639305115},{'label': 'sadness', 'score': 0.1116902083158493},{'label': 'disgust', 'score': 0.10400670021772385},{'label': 'surprise', 'score': 0.07876556366682053},{'label': 'anger', 'score': 0.0641336739063263},{'label': 'fear', 'score': 0.05136282742023468},{'label': 'joy', 'score': 0.040564440190792084}]]

sorted(predictions[0], key=lambda x: x["label"])

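predictions[0] is the prediction for the first sentence: a list of dictionaries, each holding one emotion label and its score.

sorted() is Python's built-in sorting function. It returns a new sorted list and leaves the original list unchanged.
The key argument tells sorted() what to sort by; here it is key=lambda x: x["label"].

lambda x: x["label"] is an anonymous function that returns the "label" value of each dictionary x.
The predictions are therefore ordered alphabetically by label.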

[{'label': 'anger', 'score': 0.009156348183751106},{'label': 'disgust', 'score': 0.00262847519479692},{'label': 'fear', 'score': 0.06816219538450241},{'label': 'joy', 'score': 0.04794241115450859},{'label': 'neutral', 'score': 0.1403856873512268},{'label': 'sadness', 'score': 0.0021221605129539967},{'label': 'surprise', 'score': 0.7296026945114136}]
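The effect of the key function is easiest to see by sorting the same list two ways. This sketch uses three made-up entries in the same shape as the classifier output.

```python
prediction = [
    {"label": "surprise", "score": 0.73},
    {"label": "neutral", "score": 0.14},
    {"label": "fear", "score": 0.07},
]

# Alphabetical by label, as in the article.
by_label = sorted(prediction, key=lambda x: x["label"])
# Highest score first, for comparison.
by_score = sorted(prediction, key=lambda x: x["score"], reverse=True)

print([p["label"] for p in by_label])  # ['fear', 'neutral', 'surprise']
print([p["label"] for p in by_score])  # ['surprise', 'neutral', 'fear']
print(prediction[0]["label"])  # surprise (the original list is untouched)
```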

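import numpy as np

emotion_labels = ["anger", "disgust", "fear", "joy", "sadness", "surprise", "neutral"]
isbn = []
emotion_scores = {label: [] for label in emotion_labels}

def calculate_max_emotion_scores(predictions):
    # Collect, for each emotion, the score of every sentence.
    per_emotion_scores = {label: [] for label in emotion_labels}
    for prediction in predictions:
        # Look scores up by label, so the result does not depend on ordering.
        scores_by_label = {item["label"]: item["score"] for item in prediction}
        for label in emotion_labels:
            per_emotion_scores[label].append(scores_by_label[label])
    # Keep the highest sentence-level score for each emotion.
    return {label: np.max(scores) for label, scores in per_emotion_scores.items()}

emotion_labels defines the emotion categories we track.
isbn stores the book identifiers.
emotion_scores is a dictionary that holds a list of scores for each emotion. It starts out as:
{'anger': [], 'disgust': [], 'fear': [], 'joy': [], 'sadness': [], 'surprise': [], 'neutral': []}

calculate_max_emotion_scores takes predictions, the classifier output for several pieces of text: a list with one entry per sentence, each entry being a list of emotions and their scores.
It starts with a fresh dictionary that collects the scores of each emotion for this set of predictions.

Each prediction is a list of emotion dictionaries.
It is converted to a dictionary keyed by label, so every score is looked up by name. Sorting each prediction alphabetically and indexing it by position in emotion_labels does not work here: alphabetical order is anger, disgust, fear, joy, neutral, sadness, surprise, so the scores for neutral, sadness, and surprise would be stored under sadness, surprise, and neutral.

For every label, the score is appended to the list in per_emotion_scores[label].

Finally, np.max(scores) takes the maximum of each emotion's list, which gives the highest sentence-level score per emotion.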
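Whether a positional lookup into alphabetically sorted predictions is safe depends on emotion_labels itself being in alphabetical order. A small check, using only the label list from the code above:

```python
emotion_labels = ["anger", "disgust", "fear", "joy", "sadness", "surprise", "neutral"]

alphabetical = sorted(emotion_labels)
print(alphabetical)
# ['anger', 'disgust', 'fear', 'joy', 'neutral', 'sadness', 'surprise']

# Positions where the declared label differs from the alphabetically sorted one.
mismatches = [
    (declared, found)
    for declared, found in zip(emotion_labels, alphabetical)
    if declared != found
]
print(mismatches)
# [('sadness', 'neutral'), ('surprise', 'sadness'), ('neutral', 'surprise')]
```

Each pair reads (key in emotion_labels, label found at that position after sorting), which is why keying the scores by label is the more robust choice.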

for i in range(10):
    isbn.append(books["isbn13"][i])
    sentences = books["description"][i].split(".")
    predictions = classifier(sentences)
    max_scores = calculate_max_emotion_scores(predictions)
    for label in emotion_labels:
        emotion_scores[label].append(max_scores[label])

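The ISBN of book i (the value in the isbn13 column) is appended to the isbn list so the scores can be matched to the book later.
The description of book i is split at periods into a list of sentences.
All sentences are passed to classifier for emotion classification.
predictions is a list with one entry per sentence. Abbreviated to three labels for illustration, it looks like this:
[
[{'label': 'joy', 'score': 0.85}, {'label': 'sadness', 'score': 0.10}, {'label': 'anger', 'score': 0.05}],
[{'label': 'joy', 'score': 0.65}, {'label': 'sadness', 'score': 0.30}, {'label': 'anger', 'score': 0.05}],
…
]
calculate_max_emotion_scores(), defined above, then computes the maximum score for each emotion.
The inner loop goes through emotion_labels and appends each emotion's maximum score to the matching list in emotion_scores.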
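The accumulation pattern can be tried without downloading a model. In the sketch below, fake_classifier is a hypothetical stand-in that returns made-up scores in the same shape as the real pipeline; the label list is shortened, the book ids are invented, and max_scores mirrors the per-book maximum step with the built-in max.

```python
emotion_labels = ["anger", "joy", "sadness"]  # shortened for the example


def fake_classifier(sentences):
    """Hypothetical stand-in for the pipeline: one score list per sentence."""
    results = []
    for sentence in sentences:
        results.append([
            {"label": "joy", "score": 0.8 if "love" in sentence else 0.1},
            {"label": "sadness", "score": 0.7 if "lost" in sentence else 0.1},
            {"label": "anger", "score": 0.1},
        ])
    return results


def max_scores(predictions):
    """Highest sentence-level score per emotion, looked up by label."""
    by_label = {label: [] for label in emotion_labels}
    for prediction in predictions:
        lookup = {item["label"]: item["score"] for item in prediction}
        for label in emotion_labels:
            by_label[label].append(lookup[label])
    return {label: max(scores) for label, scores in by_label.items()}


descriptions = {  # made-up ids and texts
    "isbn-a": "They love the sea. The ship is lost",
    "isbn-b": "A quiet town",
}

isbn = []
emotion_scores = {label: [] for label in emotion_labels}
for book_id, description in descriptions.items():
    isbn.append(book_id)
    book_max = max_scores(fake_classifier(description.split(".")))
    for label in emotion_labels:
        emotion_scores[label].append(book_max[label])

print(isbn)  # ['isbn-a', 'isbn-b']
print(emotion_scores)
# {'anger': [0.1, 0.1], 'joy': [0.8, 0.1], 'sadness': [0.7, 0.1]}
```

Each list in emotion_scores ends up with one value per book, in the same order as isbn, which is what makes the later DataFrame construction line up.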

emotion_scores

{'anger': [np.float64(0.0641336739063263),np.float64(0.6126185059547424),np.float64(0.0641336739063263),np.float64(0.35148391127586365),np.float64(0.08141247183084488),np.float64(0.2322249710559845),np.float64(0.5381842255592346),np.float64(0.0641336739063263),np.float64(0.30067017674446106),np.float64(0.0641336739063263)],'disgust': [np.float64(0.2735914885997772),np.float64(0.3482844829559326),np.float64(0.10400670021772385),np.float64(0.15072263777256012),np.float64(0.1844954937696457),np.float64(0.727174699306488),np.float64(0.15585491061210632),np.float64(0.10400670021772385),np.float64(0.2794813811779022),np.float64(0.1779276728630066)],'fear': [np.float64(0.928167998790741),np.float64(0.9425278306007385),np.float64(0.9723208546638489),np.float64(0.36070623993873596),np.float64(0.09504325687885284),np.float64(0.05136282742023468),np.float64(0.7474286556243896),np.float64(0.4044959247112274),np.float64(0.9155241250991821),np.float64(0.05136282742023468)],'joy': [np.float64(0.9327971935272217),np.float64(0.7044215202331543),np.float64(0.7672368884086609),np.float64(0.25188079476356506),np.float64(0.040564440190792084),np.float64(0.04337584972381592),np.float64(0.8725654482841492),np.float64(0.040564440190792084),np.float64(0.040564440190792084),np.float64(0.040564440190792084)],'sadness': [np.float64(0.6462162137031555),np.float64(0.887939453125),np.float64(0.5494765639305115),np.float64(0.732685387134552),np.float64(0.8843895196914673),np.float64(0.6213927268981934),np.float64(0.7121942639350891),np.float64(0.5494765639305115),np.float64(0.8402896523475647),np.float64(0.8603722453117371)],'surprise': [np.float64(0.9671575427055359),np.float64(0.1116902083158493),np.float64(0.1116902083158493),np.float64(0.1116902083158493),np.float64(0.4758807122707367),np.float64(0.1116902083158493),np.float64(0.40800026059150696),np.float64(0.820281982421875),np.float64(0.35446029901504517),np.float64(0.1116902083158493)],'neutral': 
[np.float64(0.7296026945114136),np.float64(0.2525450885295868),np.float64(0.07876556366682053),np.float64(0.07876556366682053),np.float64(0.07876556366682053),np.float64(0.27190276980400085),np.float64(0.07876556366682053),np.float64(0.23448744416236877),np.float64(0.13561409711837769),np.float64(0.07876556366682053)]}

Note that in this recorded output the last three keys are shifted. Comparing the first entries with the sentence-level predictions above, the list under 'sadness' holds the neutral scores, 'surprise' holds the sadness scores, and 'neutral' holds the surprise scores.

from tqdm import tqdm

emotion_labels = ["anger", "disgust", "fear", "joy", "sadness", "surprise", "neutral"]
isbn = []
emotion_scores = {label: [] for label in emotion_labels}

for i in tqdm(range(len(books))):
    isbn.append(books["isbn13"][i])
    sentences = books["description"][i].split(".")
    predictions = classifier(sentences)
    max_scores = calculate_max_emotion_scores(predictions)
    for label in emotion_labels:
        emotion_scores[label].append(max_scores[label])

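tqdm is a widely used progress bar library. Wrapping a loop in it shows how far the code has progressed, which is especially useful when processing a lot of data.
emotion_labels lists the emotion categories the classifier supports; they correspond to the label field in the model output.

isbn: stores the ISBN of each book.
emotion_scores: stores the highest score per emotion label, with one list per label.
After initialization it looks like this:
{
'anger': [],
'disgust': [],
'fear': [],
'joy': [],
'sadness': [],
'surprise': [],
'neutral': []
}

The for loop wrapped in tqdm displays a progress bar.
range(len(books)) iterates over the whole books DataFrame row by row, from the first book to the last.

The ISBN of the current book is appended to the isbn list.
The description of the current book is split at periods into a list of sentences.

The list sentences is passed to classifier, the emotion classification model.
The model returns a prediction for each sentence, with a score for every emotion.

calculate_max_emotion_scores, defined earlier, computes the highest score of each emotion category for the current book.

The inner loop goes through every emotion label
and appends the current book's score for that emotion to the matching list in emotion_scores.
When the loop finishes:
isbn contains the ISBNs of all books.
emotion_scores contains the emotion scores of all books, one list of scores per emotion.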
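tqdm can wrap any iterable, not only range(...). A minimal sketch with made-up texts follows; the try/except fallback is there only so the snippet still runs where tqdm is not installed.

```python
try:
    from tqdm import tqdm
except ImportError:
    # Fallback so the sketch runs without tqdm: just return the iterable.
    def tqdm(iterable, **kwargs):
        return iterable

texts = ["a. b", "c", "d. e. f"]  # made-up descriptions

lengths = []
for text in tqdm(texts, desc="books"):
    # Count the pieces produced by splitting at periods.
    lengths.append(len(text.split(".")))

print(lengths)  # [2, 1, 3]
```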

emotions_df = pd.DataFrame(emotion_scores)
emotions_df["isbn13"] = isbn
emotions_df


books = pd.merge(books, emotions_df, on = "isbn13")
books

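pd.merge() is the pandas function for combining (joining) two DataFrames, similar to a JOIN in SQL.

The two tables being merged:
Left table (books): the basic book information, such as isbn13, title, author, description, and simple_categories.
Right table (emotions_df): the results of the emotion analysis, that is, each book's scores for anger, joy, sadness, fear, and so on.

on = "isbn13":
This tells pandas which column to join on.
isbn13 is chosen as the key because every book has a unique ISBN.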
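A toy version of the merge shows what happens to rows without a match. The tables below are made up; validate="one_to_one" is an optional pandas check, not used in this article, that raises an error if the key is duplicated on either side.

```python
import pandas as pd

# Made-up stand-ins for the two tables.
books = pd.DataFrame({"isbn13": [1, 2, 3], "title": ["A", "B", "C"]})
emotions_df = pd.DataFrame({"joy": [0.9, 0.2], "fear": [0.1, 0.7]})
emotions_df["isbn13"] = [1, 3]  # no scores for book 2

# Default is an inner join: only keys present in both tables survive.
merged = pd.merge(books, emotions_df, on="isbn13", validate="one_to_one")
print(merged)

# A left join keeps every book and fills missing scores with NaN.
left = pd.merge(books, emotions_df, on="isbn13", how="left")
print(len(merged), len(left))  # 2 3
```

With the real data every book gets a row in emotions_df, so the default inner join keeps all books; the left join is shown only to make the difference visible.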

books.to_csv("books_with_emotions.csv", index = False)
