Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


A problem with BeautifulSoup's find method when searching for Chinese text

The code is as follows:
`
from bs4 import BeautifulSoup
import requests

url = "http://www.paopaoche.net/psp/280873.html"
res = requests.get(url)
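res.encoding = "gb2312"  # decode the response body as GB2312
bsObj = BeautifulSoup(res.text, "html.parser")  # name the parser explicitly to avoid the bs4 warning
tag1 = bsObj.find("dd", {"class": "left"}).find(class_="xq").find("em", text="游戏类型")
print(tag1)  # prints None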
`
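Running this in the terminal prints None. If I use find("em", text="1993") instead, it does match, so I assumed this was an encoding problem, but I have tried every fix I could find online and none of them worked.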
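One way to test the encoding theory is to check whether the Chinese text survives parsing at all. The sketch below is only an offline illustration: `raw` is a hypothetical stand-in for `res.content`, built from the `<em>` markup shown in the answer, and it assumes the page really is GB2312. If the label shows up in the parsed text, decoding is fine and the cause lies elsewhere.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for res.content: the <em> fragment encoded as GB2312 bytes.
raw = '<em>游戏类型:<a href="http://www.paopaoche.net/psp/110_1.html" target="_blank">动作冒险</a></em>'.encode("gb2312")

# from_encoding tells BeautifulSoup how to decode the bytes.
soup = BeautifulSoup(raw, "html.parser", from_encoding="gb2312")
em = soup.find("em")

# If the label is present in the element's text, the decoding step worked.
decoded_ok = "游戏类型" in em.get_text()
print(decoded_ok)
```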



1 Answer

tag1 = bsObj.find("dd", {"class": "left"}).find(class_="xq").find("em")
print(str(tag1).encode('gbk', 'ignore').decode('gbk'))

Written this way, the result I get is:

<em>游戏类型:<a href="http://www.paopaoche.net/psp/110_1.html" target="_blank">动作冒险</a></em>

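The em element does not contain plain text only: it holds a text node followed by an a tag, so its .string is None and find("em", text=...) cannot match it. That is why your version finds nothing. The version below does return a value; adjust it to your situation.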

tag1 = bsObj.find("dd", {"class": "left"}).find(class_="xq").find("em").find("a").text
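If the `<em>` needs to be located by its label rather than by position, one option is to search for the text node and then step up to its parent. The following is a minimal sketch, not the only way to do it. The HTML is a reduced, hypothetical fragment based on the output above (the `<em>1993</em>` element is assumed from the question's remark that "1993" matches), and `string=` is the newer spelling of the `text=` argument in recent bs4 releases.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, reduced fragment modelled on the output shown above.
html = '''<dd class="left"><div class="xq">
<em>游戏类型:<a href="http://www.paopaoche.net/psp/110_1.html" target="_blank">动作冒险</a></em>
<em>1993</em>
</div></dd>'''

soup = BeautifulSoup(html, "html.parser")
xq = soup.find("dd", {"class": "left"}).find(class_="xq")

# string= is compared against tag.string, which is None when a tag has
# more than one child (here: a text node plus an <a> tag).
no_match = xq.find("em", string="游戏类型")

# An <em> whose only child is text does have a .string, so this matches.
year = xq.find("em", string="1993")

# Search for the text node itself with a regex, then go up to its parent tag.
label = xq.find(string=re.compile("游戏类型"))
em = label.parent
genre = em.find("a").get_text(strip=True)

print(no_match, year, genre)
```

The regex is used because the text node is "游戏类型:" with a trailing colon, so an exact comparison against "游戏类型" would not match even at the text-node level.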
