In Practice | Scraping 36,000 Comments on 《云南虫谷》 with Python (Part 4)


df.query('userid==1368145091')[['nick','剧集','time','content']].sort_values(by='time')
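I have to say, this one looks a bit more normal... Still, they seem like a bit of a chatterbox, haha!
Curious what avatars these two use? Take a look: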
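The queries above hard-code user IDs. Here is a minimal sketch of how such IDs could be found and how one user's comments could be pulled up in order. It assumes the DataFrame has the `userid`, `nick`, `剧集`, `time` and `content` columns used in the query above, and that `time` sorts chronologically. `top_commenters` and `user_timeline` are hypothetical helper names, and the toy data is made up.

```python
import pandas as pd


def top_commenters(df: pd.DataFrame, n: int = 2) -> pd.Series:
    """Return the n user IDs with the most comments, mapped to their comment counts."""
    return df["userid"].value_counts().head(n)


def user_timeline(df: pd.DataFrame, userid) -> pd.DataFrame:
    """Return one user's comments in chronological order (boolean mask instead of a query string)."""
    cols = ["nick", "剧集", "time", "content"]
    return df.loc[df["userid"] == userid, cols].sort_values(by="time")


# Toy data standing in for the scraped comments (invented for illustration)
demo = pd.DataFrame({
    "userid": [1, 2, 1, 3, 1, 2],
    "nick": ["a", "b", "a", "c", "a", "b"],
    "剧集": [1, 1, 2, 2, 3, 3],
    "time": [30, 10, 20, 15, 5, 25],
    "content": ["x", "y", "z", "w", "v", "u"],
})
print(top_commenters(demo))
print(user_timeline(demo, 1))
```

On the real data, `top_commenters(df)` would list the two most active users, and `user_timeline(df, uid)` would return the same rows as the `query` call above.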
from skimage import io

# Display the avatar
img_url = df.query('userid==640014751')['head'].iloc[0]
image = io.imread(img_url)
io.imshow(image)
Avatar of the user with the most comments
from skimage import io

# Display the avatar
img_url = df.query('userid==1368145091')['head'].iloc[0]
image = io.imread(img_url)
io.imshow(image)
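Avatar of the user with the second-most comments
Ahem, I'll keep my opinions to myself. After all, my own avatar and nickname are also rather...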
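To compare both avatars in one figure, a small matplotlib helper could replace the two separate `io.imshow` calls. `show_avatars` is a hypothetical name. It expects images that are already loaded, for example with `io.imread(url)`, so the demo below runs offline on dummy arrays.

```python
import matplotlib.pyplot as plt
import numpy as np


def show_avatars(images, titles):
    """Draw the images side by side, one subplot per avatar, and return the figure."""
    fig, axes = plt.subplots(1, len(images), figsize=(3 * len(images), 3), squeeze=False)
    for ax, img, title in zip(axes[0], images, titles):
        ax.imshow(img)
        ax.set_title(title)
        ax.axis("off")  # avatars need no ticks or axis frame
    fig.tight_layout()
    return fig


# With the real data (needs network access), something like:
# urls = [df.query(f'userid=={uid}')['head'].iloc[0] for uid in (640014751, 1368145091)]
# fig = show_avatars([io.imread(u) for u in urls], ['Most comments', 'Second-most comments'])

# Offline demo with a black and a white dummy "avatar"
demo = [np.zeros((64, 64, 3), dtype=np.uint8), np.full((64, 64, 3), 255, dtype=np.uint8)]
fig = show_avatars(demo, ["Most comments", "Second-most comments"])
```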
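3.9. Comment word clouds
This part is based on the article 《140行代码自己动手写一个词云制作小工具(文末附工具下载)》 ("Build your own word-cloud tool in 140 lines of code", with a tool download at the end). We will cover an overall word cloud and a word cloud for each main character.
First, let's see how often our three main characters are mentioned.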
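import pandas as pd
import pandas_bokeh  # registers the DataFrame.plot_bokeh accessor
from bokeh.palettes import Spectral
from bokeh.transform import linear_cmap

df.fillna('', inplace=True)
# Aliases used to detect each main character in a comment
hu = ['老胡', '胡八一', '潘粤明', '胡', '潘']
yang = ['张雨绮', 'Shirley', '杨']
wang = ['姜超', '胖子']
# Keep the comments whose content contains any alias of the character
df_hu = df[df['content'].str.contains('|'.join(hu))]
df_yang = df[df['content'].str.contains('|'.join(yang))]
df_wang = df[df['content'].str.contains('|'.join(wang))]
# "Weight" = number of comments that mention the character
df_star = pd.DataFrame({'角色': ['胡八一', 'Shirley杨', '王胖子'],
                        '权重': [len(df_hu), len(df_yang), len(df_wang)]})
y = df_star['权重']
# Map each bar's value onto the 11-color Spectral palette
mapper = linear_cmap(field_name='权重', palette=Spectral[11], low=min(y), high=max(y))
df_star_bar = df_star.plot_bokeh.bar(x='角色', y='权重', ylabel="Mention weight",
                                     title="Mention weight of the main characters",
                                     color=mapper, alpha=0.8, legend=False)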
Wang Pangzi is the comic relief, and I have to say he comes up a lot!!!
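The "weight" above counts matching comments, not occurrences. Short aliases such as '胡' or '杨' can also match unrelated text, so each figure is at best a rough upper bound. The sketch below reports both numbers: comments containing any alias, and total alias hits. It escapes each alias for the regex and tries longer aliases first, so that 'Shirley杨' is counted once rather than as 'Shirley' plus '杨'. `mention_stats` is a hypothetical helper, and the sample comments are invented.

```python
import re

import pandas as pd


def mention_stats(content: pd.Series, aliases: dict) -> pd.DataFrame:
    """For each character, count comments mentioning any alias and total alias occurrences."""
    text = content.fillna("").astype(str)
    rows = []
    for role, names in aliases.items():
        # Longest aliases first: regex alternation tries options left to right at each position
        pattern = "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
        rows.append({
            "角色": role,
            "comments": int(text.str.contains(pattern, regex=True).sum()),
            "mentions": int(text.str.count(pattern).sum()),
        })
    return pd.DataFrame(rows)


demo = pd.Series(["胡八一和胖子", "老胡帅", "Shirley杨好看", None, "胖子胖子"])
aliases = {
    "胡八一": ["老胡", "胡八一", "胡"],
    "Shirley杨": ["Shirley杨", "Shirley", "杨"],
    "王胖子": ["胖子"],
}
print(mention_stats(demo, aliases))
```

On the real data, a call such as `mention_stats(df['content'], {'胡八一': hu, 'Shirley杨': yang, '王胖子': wang})` would reuse the alias lists defined above.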
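Overall word cloud
[Figure: overall word cloud]
Hu Bayi word cloud
[Figure: Hu Bayi]
Shirley Yang word cloud
[Figure: Zhang Yuqi]
Wang Pangzi word cloud
[Figure: Wang Pangzi]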
Core word-cloud code
import os
import stylecloud
from PIL import Image
import jieba
import jieba.analyse
import pandas as pd
from wordcloud import STOPWORDS


def ciYun(data, addWords, stopWords):
    print('Generating word cloud...')
    # Join all comments into one string (str(list) would also embed brackets and quotes)
    comment_data = '\n'.join(map(str, data))
    # Register custom words so jieba keeps names and terms intact
    for addWord in addWords:
        jieba.add_word(addWord)
    comment_after_split = jieba.cut(comment_data, cut_all=False)
    words = ' '.join(comment_after_split)
    # Word-cloud stop words: wordcloud's built-in set plus our own
    stopwords = STOPWORDS.copy()
    for stopWord in stopWords:
        stopwords.add(stopWord)
    # This single call is all it takes to render the word cloud
    stylecloud.gen_stylecloud(
        text=words,
        size=800,
        palette='tableau.BlueRed_6',  # color scheme
        icon_name='fas fa-mountain',  # alternatives: paper-plane, mountain, thumbs-up, male, fa-cloud
        custom_stopwords=stopwords,
        font_path='FZZJ-YGYTKJW.TTF',  # font; for Chinese text this must be a Chinese font available on your machine
        # bg=bg,
    )
    print('Word cloud generated~')
    # stylecloud writes stylecloud.png (its default output_name) to the working directory
    pic_path = os.getcwd()
    print(f'The word cloud image has been saved in {pic_path}')


data = df.content.to_list()
addWords = ['潘老师', '云南虫谷', '还原度', '老胡', '胡八一', '潘粤明', '张雨绮',
            'Shirley', 'Shirley杨', '杨参谋', '王胖子', '胖子']
# Add stop words
stoptxt = pd.read_table(r'stop.txt', encoding='utf-8', header=None)
stoptxt.drop_duplicates(inplace=True)
stopWords = stoptxt[0].to_list()
words = ['说', '年', 'VIP', '真', '这是', '没', '干', '好像']
stopWords.extend(words)
# Run it~
ciYun(data, addWords, stopWords)
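A word cloud sizes words by how often they appear after segmentation and stop-word removal. The sketch below shows only that counting step, without any rendering. `word_frequencies` is a hypothetical helper, and it takes tokens that are already segmented (for example from `jieba.lcut(text)`). Dropping single-character tokens is an assumption of this sketch, a common heuristic for Chinese text, and not something stylecloud is known to do.

```python
from collections import Counter


def word_frequencies(tokens, stopwords, top_n=10):
    """Return the top_n (word, count) pairs, skipping stop words, blanks and single characters."""
    stop = set(stopwords)
    counts = Counter(
        t for t in (tok.strip() for tok in tokens)
        if len(t) > 1 and t not in stop
    )
    return counts.most_common(top_n)


# Pre-segmented sample tokens (invented); in practice: tokens = jieba.lcut(text)
tokens = ["潘粤明", "的", "老胡", "还原度", "高", "老胡", "真", " ", "还原度", "老胡"]
print(word_frequencies(tokens, ["的", "真"], top_n=2))
```

You may want sizes driven by exactly these counts. In that case, `dict(...)` of the result could be passed to `WordCloud.generate_from_frequencies` from the wordcloud package instead of feeding raw text to stylecloud.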
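That's all for this installment.
If you found it interesting, please give it a like and tap "Wow" (在看).
Reply 0907 to the official account to get the code and data!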