Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


beautifulsoup - How to use Python Beautiful Soup to get only the level 1 NavigableString text?

I am using Beautiful Soup to get the text from this example HTML code:

....
<div style="s1">
    <div style="s2">Here is text 1</div>
    <div style="s3">Here is text 2</div>
Here is text 3 and this is what I want.
</div>
....

Text 1 and text 2 are both at nesting level 2, while text 3 is one level up, at level 1. I only want text 3, and tried this:

for anchor in tbody.findAll('div', style="s1"):
    review = anchor.text
    print(review)

But this code returns all of the text (1, 2, and 3). How do I get only the first-level text 3?


1 Answer


Something like:

import bs4
for anchor in tbody.findAll('div', style="s1"):
    # keep only the strings that are direct children of the div
    text = ''.join([x for x in anchor.contents if isinstance(x, bs4.element.NavigableString)])

works. Just know that you'll also get the line breaks in there, so calling .strip() on the result might be necessary.

For example:

for anchor in tbody.findAll('div', style="s1"):
    text = ''.join([x for x in anchor.contents if isinstance(x, bs4.element.NavigableString)])
    print([text])
    print([text.strip()])

Prints

[u'\n\n\nHere is text 3 and this is what I want.\n']
[u'Here is text 3 and this is what I want.']

(I put them in lists so you could see the newlines.)
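
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It assumes Python 3 with the bs4 package installed and uses the built-in "html.parser"; the HTML constant and the direct_text helper are names made up for this illustration, not part of Beautiful Soup. The second variant, find_all(string=True, recursive=False), is a different route to the same direct-child strings and is offered as an alternative, not as what the answer above used.

```python
import bs4
from bs4 import BeautifulSoup

# Sample markup from the question (hypothetical constant name).
HTML = """
<div style="s1">
    <div style="s2">Here is text 1</div>
    <div style="s3">Here is text 2</div>
Here is text 3 and this is what I want.
</div>
"""


def direct_text(tag):
    """Join only the strings that are direct children of `tag`, then strip."""
    return "".join(
        child for child in tag.contents
        if isinstance(child, bs4.element.NavigableString)
    ).strip()


soup = BeautifulSoup(HTML, "html.parser")
outer_divs = soup.find_all("div", style="s1")

# .get_text() walks all descendants, which is why the question saw text 1 and 2.
full_text = outer_divs[0].get_text()

# Approach from the answer: filter .contents by type.
results = [direct_text(div) for div in outer_divs]

# Alternative: ask find_all for strings only, without descending into children.
alt_results = [
    "".join(div.find_all(string=True, recursive=False)).strip()
    for div in outer_divs
]

print(results)  # ['Here is text 3 and this is what I want.']
```

Note that Comment objects are also NavigableString instances, so if the markup contains HTML comments directly inside the div, both variants would likely include them unless they are filtered out separately.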



...