Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
778 views
in Technique[技术] by (71.8m points)

xml parsing - Python element tree - extract text from element, stripping tags

With ElementTree in Python, how can I extract all the text from a node, stripping any tags in that element and keeping only the text?

For example, say I have the following:

<tag>
  Some <a>example</a> text
</tag>

I want to return Some example text. How do I go about doing this? So far, the approaches I've taken have had fairly disastrous outcomes.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

If you are running under Python 3.2+, you can use itertext.

itertext creates a text iterator which loops over this element and all subelements, in document order, and returns all inner text:

import xml.etree.ElementTree as ET
xml = '<tag>Some <a>example</a> text</tag>'
tree = ET.fromstring(xml)
print(''.join(tree.itertext()))

# -> 'Some example text'

If you are running in a lower version of Python, you can reuse the implementation of itertext() by attaching it to the Element class, after which you can call it exactly like above:

# original implementation of .itertext() for Python 2.7
def itertext(self):
    tag = self.tag
    if not isinstance(tag, basestring) and tag is not None:
        return
    if self.text:
        yield self.text
    for e in self:
        for s in e.itertext():
            yield s
        if e.tail:
            yield e.tail

# if necessary, monkey-patch the Element class
if 'itertext' not in ET.Element.__dict__:
    ET.Element.itertext = itertext

xml = '<tag>Some <a>example</a> text</tag>'
tree = ET.fromstring(xml)
print(''.join(tree.itertext()))

# -> 'Some example text'

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...