Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Scraping data from XML with Python

I want to get some data from https://kartkatalog.geonorge.no/api/search?limit=10000&text=&facets[0]name=type&facets[0]value=software&mediatype=xml

What I need is the "title" and "GetCapabilitiesUrl" for every record. I have tried playing around with BeautifulSoup, but I can't find the right way to get the data I want.

Does someone know how to proceed with this?

Thanks.



1 Answer

The link you posted looks like it returns JSON, not XML. JSON is built from braces, brackets, and key/value pairs rather than XML's angle-bracket tags. You can use the json module in Python to parse this data.

Once you have the response from the website as a string, json.loads() converts it into the corresponding Python object: a dict for a JSON object, a list for a JSON array.
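As a minimal sketch of that conversion, the example below parses a small hand-written string instead of the real response. The field names and values in it are made up for illustration and are not taken from the Geonorge API.

```python
import json

# Made-up sample text, loosely shaped like a small search response.
raw = '{"Results": [{"Title": "Example A"}, {"Title": "Example B"}]}'

data = json.loads(raw)

# JSON objects become dicts and JSON arrays become lists.
print(type(data).__name__)             # dict
print(type(data["Results"]).__name__)  # list
print(data["Results"][0]["Title"])     # Example A
```

Since Python 3.6, json.loads() also accepts bytes, so the result of read() on a urllib response can be passed to it without decoding first.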

The following code snippet puts all titles in a list called titles and all URLs in a list called urls:

import json
import ssl
import urllib.request

# Disables HTTPS certificate verification for urllib.
# Remove this line if the request succeeds without it.
ssl._create_default_https_context = ssl._create_unverified_context

url = "https://kartkatalog.geonorge.no/api/search?limit=10000&text=&facets%5B0%5Dname=type&facets%5B0%5Dvalue=software&mediatype=xml"
raw_json_string = urllib.request.urlopen(url).read()
json_object = json.loads(raw_json_string)

titles = []
urls = []

for record in json_object["Results"]:
    titles.append(record["Title"])
    # Records without a "GetCapabilitiesUrl" key are skipped,
    # so urls can end up shorter than titles.
    try:
        urls.append(record["GetCapabilitiesUrl"])
    except KeyError:
        pass
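
Because the loop above skips records that have no "GetCapabilitiesUrl", the two lists can drift out of step, and a title at one index no longer belongs to the URL at the same index. If you need the title and URL for every record together, one possible variation is to keep them as pairs. The sketch below assumes the same "Title" and "GetCapabilitiesUrl" keys; extract_pairs and the sample records are hypothetical names and data used only for illustration.

```python
def extract_pairs(records):
    """Return (title, url) tuples; url is None when the key is missing."""
    pairs = []
    for record in records:
        title = record.get("Title")
        url = record.get("GetCapabilitiesUrl")  # None if the key is absent
        pairs.append((title, url))
    return pairs


# Made-up records standing in for json_object["Results"].
sample_records = [
    {"Title": "Service A", "GetCapabilitiesUrl": "https://example.com/a"},
    {"Title": "Service B"},
]

for title, url in extract_pairs(sample_records):
    print(title, url)
```

With the real data you would call extract_pairs(json_object["Results"]) instead of passing the sample list.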

When writing the code, an online JSON viewer can help you work out how the dictionaries and lists are nested and which keys they contain.
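If you would rather stay in Python, a similar inspection can be done locally. This is only a sketch: json_object below is a made-up stand-in for the parsed response, so that the example runs without a network request.

```python
import json

# Made-up stand-in for the parsed response.
json_object = {
    "Results": [
        {"Title": "Service A", "GetCapabilitiesUrl": "https://example.com/a"},
    ]
}

# Top-level keys of the response.
print(list(json_object.keys()))

# Keys of a single record.
first = json_object["Results"][0]
print(sorted(first.keys()))

# Pretty-print one record to see how it is nested.
print(json.dumps(first, indent=2, ensure_ascii=False))
```

Printing a single record rather than the whole response keeps the output readable when the result list is long.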

