Parsing XML/JSON/YAML into Python data structures

In this article, I’ll cover how to parse XML, JSON, and YAML data into Python data structures. If you want to learn more about data structures, visit this article.

Describe parsing of common data format (XML, JSON, and YAML) to Python data structures

Cisco DEVASC 200-901 exam topic covered in this article

XML

Let’s take the following XML code snippet. It’s an inventory containing one device – Switch1, with hostname, IP, vendor, model and os_version elements.

<?xml version="1.0" encoding="UTF-8"?>
<inventory>
    <Switch1>
        <hostname>Cisco-1</hostname>
        <ip>10.10.10.2</ip>
        <vendor>Cisco</vendor>
        <model>C2960X</model>
        <os_version>15.2(2)E7</os_version>
    </Switch1>
</inventory>

Looking at the available Python data structures, the most convenient way to represent our device (Switch1) is a dictionary. Let’s take a look at the Python code that parses our XML inventory into a Python dictionary.

import xmldict

with open('inventory.xml') as file:
    xmlstring = file.read()

xml_dictionary = xmldict.xml_to_dict(xmlstring)

I’ve used the xmldict library. You can find its documentation here.

First of all, we open the inventory file and read its content. The string with the XML code is loaded into the xmlstring variable. Then we call the xml_to_dict function from the xmldict library. That’s how we parse the XML string into a native Python data structure, a dictionary. The structure of the resulting xml_dictionary variable is described below.

If you’re not familiar with Python dictionary type, you can visit the official documentation page. It’s full of practical examples that might help you.

Looking at the content of the xml_dictionary variable, we can see that there is an inventory key containing one element, Switch1, which is also a dictionary.

Let’s now examine the content of the Switch1 dictionary. In the debugger view, look at the __len__ entry that sits on the same level as the other device attributes. It tells us that Switch1 has 5 key-value pairs: hostname, ip, vendor, model, and os_version. They are named the same way as in the XML code.
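To make the structure described above concrete, here is a minimal sketch that writes the dictionary out as a literal and inspects it with len() and a loop. The literal is an assumption reconstructed from the description in this article, not output captured from xmldict.

```python
# Assumed shape of xml_dictionary, reconstructed from the description above.
xml_dictionary = {
    'inventory': {
        'Switch1': {
            'hostname': 'Cisco-1',
            'ip': '10.10.10.2',
            'vendor': 'Cisco',
            'model': 'C2960X',
            'os_version': '15.2(2)E7',
        }
    }
}

switch1 = xml_dictionary['inventory']['Switch1']

# len() returns the number of key-value pairs, the value a debugger lists as __len__.
pair_count = len(switch1)
print(pair_count)

# Walk over every key-value pair of the device.
for key, value in switch1.items():
    print('{key}: {value}'.format(key=key, value=value))
```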

Now that we know how xml_dictionary is built, let’s take a look at how we can access the values of Switch1 from Python code. Let’s say that we want to print the hostname and IP address of Switch1. We can do it the following way.

print('Hostname: {hostname}, ip: {ip}'.format(
    hostname=xml_dictionary['inventory']['Switch1']['hostname'],
    ip=xml_dictionary['inventory']['Switch1']['ip']))

And the command output looks like this.

Hostname: Cisco-1, ip: 10.10.10.2

As you can see, access to the Switch1 key-value pairs is straightforward.
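If you would rather not install a third-party package, a similar dictionary can be built with the standard library. The sketch below is an alternative to the xmldict approach, not the method used in this article: it relies on xml.etree.ElementTree, and the helper name element_to_dict is hypothetical. It assumes that every element holds either text or uniquely named child elements, which is true for our inventory.

```python
import xml.etree.ElementTree as ET

# The inventory from this article, embedded as bytes so the example is self-contained.
XML_BYTES = b"""<?xml version="1.0" encoding="UTF-8"?>
<inventory>
    <Switch1>
        <hostname>Cisco-1</hostname>
        <ip>10.10.10.2</ip>
        <vendor>Cisco</vendor>
        <model>C2960X</model>
        <os_version>15.2(2)E7</os_version>
    </Switch1>
</inventory>
"""


def element_to_dict(element):
    """Hypothetical helper: convert an element into nested dictionaries.

    Leaf elements become their text. Repeated child tags would overwrite
    each other, so this only suits documents with unique tag names.
    """
    children = list(element)
    if not children:
        return element.text
    return {child.tag: element_to_dict(child) for child in children}


root = ET.fromstring(XML_BYTES)
xml_dictionary = {root.tag: element_to_dict(root)}

print(xml_dictionary['inventory']['Switch1']['hostname'])
```

Attributes and mixed content are ignored here, so a dedicated library remains the more complete choice for real-world XML.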

JSON

In the JSON example, we will take the same inventory, so you can focus on the parsing process. Below you can find a JSON representation of the inventory.

{ 
  "Switch1" : {
    "hostname": "Cisco-1",
    "ip": "10.10.10.2",
    "vendor": "Cisco",
    "model": "C2960X",
    "os_version": "15.2(2)E7"
  }
}

Parsing JSON is more convenient than parsing XML. We will use the json library, which is part of the Python standard library, so you don’t have to install it manually. If you want to know more, visit the official documentation site. All it takes to parse JSON is to open the file containing our inventory and call the load function from the json library.

import json

with open(r'inventory.json') as file:
    json_file = json.load(file)

Let’s now check the content of the json_file variable. Again, we will use the debugger for that.

In this case, the situation is clear. The json_file variable is a standard Python dictionary. We can access it the same way as in the XML example, with one difference: this JSON document has no top-level inventory key, so Switch1 is the outermost key.
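As a small illustration of that access pattern, the sketch below parses the same inventory from a string with json.loads (the string-based counterpart of json.load) and prints the hostname and IP address. Embedding the JSON in the JSON_TEXT constant is only a convenience for this example.

```python
import json

# The inventory from this article, embedded so the example runs without a file.
JSON_TEXT = """
{
  "Switch1": {
    "hostname": "Cisco-1",
    "ip": "10.10.10.2",
    "vendor": "Cisco",
    "model": "C2960X",
    "os_version": "15.2(2)E7"
  }
}
"""

# json.loads parses a string; json.load does the same for an open file object.
json_file = json.loads(JSON_TEXT)

hostname = json_file['Switch1']['hostname']
ip = json_file['Switch1']['ip']
print('Hostname: {hostname}, ip: {ip}'.format(hostname=hostname, ip=ip))
```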

YAML

The last example covers the parsing of the YAML data structure. Below you can find our inventory in YAML format.

---
device:
  hostname: Cisco-1
  ip: 10.10.10.2
  vendor: Cisco
  model: C2960X
  os_version: 15.2(2)E7
...

And now let’s jump straight to the code.

import yaml

with open(r'inventory.yaml') as file:
    yaml_file = yaml.load(file, Loader=yaml.FullLoader)

As you can see, the parsing process is similar to JSON. In this case, we’re using the PyYAML library. It’s not part of the standard library, so you have to install it yourself.

You can install the PyYAML library using pip: pip install pyyaml

After opening the inventory file, we call the load function. And that’s it! Now let’s inspect the yaml_file variable.

As you can see, the yaml_file variable is again a dictionary, so you can use it the same way as described in the XML section. Note that the top-level key in this YAML document is device.
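The sketch below shows the access pattern for the YAML inventory. It swaps in yaml.safe_load, a PyYAML function that only builds standard Python types and is a common choice when the input is not fully trusted; the article's own code uses yaml.load with FullLoader. The YAML_TEXT constant is just a way to keep the example self-contained.

```python
import yaml  # third-party: pip install pyyaml

# The inventory from this article, embedded so the example runs without a file.
YAML_TEXT = """---
device:
  hostname: Cisco-1
  ip: 10.10.10.2
  vendor: Cisco
  model: C2960X
  os_version: 15.2(2)E7
...
"""

# safe_load accepts a string or an open file object.
yaml_file = yaml.safe_load(YAML_TEXT)

device = yaml_file['device']
print('Hostname: {hostname}, ip: {ip}'.format(
    hostname=device['hostname'], ip=device['ip']))
```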

Summary

Keep in mind that there are multiple ways to parse XML, JSON, and YAML data. In this article, I’ve covered just three of them at a basic level.
