Sentiment Lexicon Enhancement: From tidytext to Custom Domain Lexicons

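Go beyond the built-in lexicons: combine tidytext with word vectors (GloVe) and labeled data to build custom sentiment lexicons for finance or healthcare. · Difficulty: Beginner · +10XP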

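General-purpose sentiment lexicons such as AFINN perform poorly in specialized domains. This tutorial shows how to build a domain-specific lexicon with the tidytext and textdata packages. First, collect financial news labeled positive or negative and compute the log-odds ratio of each word between the two classes. Next, use GloVe word vectors to find nearest-neighbor words and expand the lexicon; embedding neighbors can include antonyms as well as synonyms, so review the candidates before adding them. Finally, combine the lexicon with text2vec to compute lexicon-based sentiment scores for each text, and compare the results against a general-purpose lexicon to measure the improvement. You will also learn how to export the custom lexicon as a data.frame that can be used directly in an inner_join sentiment analysis.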
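The log-odds step can be sketched as follows. This is a minimal illustration, not the tutorial's own implementation: the six headlines are made-up toy data, the add-one smoothing is one common choice among several, and the 0.5 cutoff is an arbitrary assumption that would need tuning on real data.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Toy labeled headlines (hypothetical data, for illustration only)
news <- tibble(
  id = 1:6,
  label = c("positive", "positive", "positive",
            "negative", "negative", "negative"),
  text = c(
    "profit growth beats forecast",
    "strong profit and revenue growth",
    "dividend increase after strong quarter",
    "loss widens after weak quarter",
    "default risk rises as loss deepens",
    "weak revenue and impairment loss"
  )
)

# Count how often each word occurs in each class (one row per word)
counts <- news %>%
  unnest_tokens(word, text) %>%
  count(label, word) %>%
  pivot_wider(names_from = label, values_from = n, values_fill = 0)

# Smoothed log-odds ratio: add 1 to every count so that words
# unseen in one class still get a finite value
vocab_size <- nrow(counts)
word_stats <- counts %>%
  mutate(
    p_pos = (positive + 1) / (sum(positive) + vocab_size),
    p_neg = (negative + 1) / (sum(negative) + vocab_size),
    lor   = log(p_pos / (1 - p_pos)) - log(p_neg / (1 - p_neg))
  ) %>%
  arrange(desc(lor))

# Keep only words that lean clearly to one side
# (the 0.5 threshold is an assumption, not a recommended value)
domain_dict <- word_stats %>%
  filter(abs(lor) > 0.5) %>%
  transmute(word, sentiment = if_else(lor > 0, "positive", "negative"))
```

The result, `domain_dict`, has the same `word` and `sentiment` columns as the lexicons returned by `get_sentiments()`, so it can be joined against tokenized text in the same way. Words that occur equally often in both classes, such as "after" in the toy data, fall below the cutoff and are left out.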

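library(tidytext)
library(textdata)
library(dplyr)
# Start from the Loughran-McDonald financial lexicon (columns: word, sentiment)
fin_dict <- get_sentiments('loughran')
# Expansion: find the nearest neighbors of 'profit'.
# word_vectors is assumed to be a GloVe embedding matrix with words as rownames.
# drop = FALSE keeps the query as a 1-row matrix, which sim2() requires.
sims <- text2vec::sim2(word_vectors['profit', , drop = FALSE], y = word_vectors, method = 'cosine')
# Skip the first hit, which is 'profit' itself (cosine similarity 1)
top_syn <- names(sort(sims[1, ], decreasing = TRUE))[2:6]
# Add to the lexicon; the Loughran lexicon has no `value` column, so none is added here
new_words <- data.frame(word = top_syn, sentiment = 'positive')
extended_dict <- distinct(bind_rows(fin_dict, new_words))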
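Once a lexicon exists as a data frame, scoring text with it might look like the sketch below. To keep the example runnable without downloads, `custom_dict` is a small hand-written stand-in (hypothetical entries) rather than the extended lexicon, and `docs` holds two made-up sentences; the score definition, positive count minus negative count, is one simple option among many.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Stand-in for the custom lexicon (hypothetical entries); any data frame
# with `word` and `sentiment` columns can be used the same way
custom_dict <- tibble(
  word      = c("profit", "growth", "loss", "default"),
  sentiment = c("positive", "positive", "negative", "negative")
)

# Made-up documents to score
docs <- tibble(
  id   = 1:2,
  text = c("Profit growth offsets a small loss",
           "Default risk and loss mount")
)

doc_scores <- docs %>%
  unnest_tokens(word, text) %>%             # one lowercased token per row
  inner_join(custom_dict, by = "word") %>%  # keep only words in the lexicon
  count(id, sentiment) %>%                  # matches per document and class
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(score = positive - negative)       # net sentiment per document
```

To compare against a general-purpose lexicon, the same pipeline can be rerun with `get_sentiments("bing")` in place of `custom_dict`, and the two sets of scores checked against the labeled data. Note that documents with no lexicon matches drop out of an `inner_join`, so they need to be handled separately if every document must receive a score.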
