YAML for Configuration Files

Preamble

In [ ]:
# used to create block diagrams
%reload_ext xdiag_magic
%xdiag_output_format svg
    
import os

Introduction

In this section, we're going to have a look at YAML, which is a recursive acronym for "YAML Ain't Markup Language". It is a data interchange file format that is often found with a .yaml or .yml file extension.

What this section is most interested in is using YAML for configuration files: moving the parameters that our programs use out of the source code and into a separate file. This means that configuration files can be shared between scripts/programs, and they can be modified without needing to modify source code.

In [2]:
%%blockdiag
{
    orientation = portrait
    "config.yaml" <-> "notebook.ipynb"
    "config.yaml" [color = '#ffffcc']
}

Of course, there are many alternatives such as JavaScript Object Notation (JSON) or Tom's Obvious, Minimal Language (TOML), and they all have their advantages and disadvantages. We won't do a full comparison of YAML vs. alternatives, but some advantages of YAML are:

  • It is human-readable, making YAML files easy to read and to write by hand.
  • Many popular programming languages have support for managing YAML files.
  • YAML (as of version 1.2) is a superset of JSON, meaning that a valid JSON document is also valid YAML and existing JSON can be carried over easily if needed.

Let's look at a small example of a YAML configuration file:

    learning_rate: 0.1
    random_seed: 789108
    maintainer: Shahin Rostami
    categories:
    - hotdog
    - not a hotdog

We can see that we have some key-value mappings, where the key appears before the colon and the value appears after. The first key we have is learning_rate with a value of 0.1, the second is random_seed with a value of 789108, and the third is maintainer with a value of Shahin Rostami. Finally, I have included a key that maps to a sequence: categories has a list as its value, which contains the items "hotdog" and "not a hotdog".
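As a rough sketch of what these mappings and sequences turn into once parsed, the snippet below reads the example configuration from a string with PyYAML (installation is covered later in this section). It uses yaml.safe_load, and the variable names are only illustrative. It also parses the same data written as JSON, to illustrate the superset point from the list above.

```python
import json
import yaml  # PyYAML

yaml_text = """\
learning_rate: 0.1
random_seed: 789108
maintainer: Shahin Rostami
categories:
- hotdog
- not a hotdog
"""

parsed = yaml.safe_load(yaml_text)

# Mappings become dicts, sequences become lists, and scalars are
# converted to the closest Python type (float, int, str, ...).
for key, value in parsed.items():
    print(f"{key}: {type(value).__name__}")

# The same data serialised as JSON is accepted by the YAML loader too.
json_text = json.dumps(parsed)
same_data = yaml.safe_load(json_text)
print(same_data == parsed)
```

Each key keeps the type you would expect from reading the file, so no manual conversion from strings is needed.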

You can read more about YAML at the YAML Specification web page.

So that we can work with this configuration later in this section, save the YAML in a file named config.yaml in the same directory as this notebook. Alternatively, just run the cell below, which writes the file for you.

In [3]:
config_string = '''learning_rate:  0.1
random_seed: 789108
maintainer: Shahin Rostami
categories:
- hotdog
- not a hotdog'''

with open('config.yaml', 'w') as f:
    f.write(config_string)

Getting Python Ready for YAML

Before we begin working with YAML files in Python, we need to make sure we have PyYAML installed. There are alternatives to PyYAML, such as ruamel.yaml, but their APIs may differ from what is shown here, so the following instructions may need adapting if you use one of them.

Some options to install PyYAML are with:

Anaconda

conda install -c conda-forge pyyaml

pip

pip install PyYAML

Once you have the package installed you should be ready to import the PyYAML package within Python.

In [4]:
import yaml

Loading YAML with Python

It is surprisingly easy to load a YAML file into a Python dictionary with PyYAML.

In [5]:
with open('config.yaml') as f:
    config = yaml.load(f, Loader=yaml.FullLoader)

We can confirm that it worked by displaying the contents of the config variable.

In [6]:
config
Out[6]:
{'learning_rate': 0.1,
 'random_seed': 789108,
 'maintainer': 'Shahin Rostami',
 'categories': ['hotdog', 'not a hotdog']}

It's as easy as that. We can now access the various elements of the data structure like a normal Python dictionary.

In [7]:
config['learning_rate']
Out[7]:
0.1
In [8]:
config['categories']
Out[8]:
['hotdog', 'not a hotdog']
In [9]:
config['categories'][0]
Out[9]:
'hotdog'
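Because the loaded configuration is a plain dictionary, the usual dictionary tools apply. The sketch below is one possible way to read required and optional settings. It uses yaml.safe_load, which only builds standard Python types and is generally a reasonable choice for configuration files. The batch_size key and its default of 32 are hypothetical and are not part of config.yaml.

```python
import yaml  # PyYAML

# Parse from a string so the sketch is self-contained; with a file,
# pass the open file object to yaml.safe_load instead.
config = yaml.safe_load("""\
learning_rate: 0.1
categories:
- hotdog
- not a hotdog
""")

# Required setting: indexing raises KeyError if the key is missing.
learning_rate = config["learning_rate"]

# Hypothetical optional setting: fall back to a default when absent.
batch_size = config.get("batch_size", 32)

n_categories = len(config["categories"])

print(learning_rate, batch_size, n_categories)
```

Indexing makes a missing required setting fail loudly, while .get keeps optional settings from breaking older configuration files.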

Updating YAML with Python

Let's say we want to update our learning_rate to 0.2 and add an extra category to our categories list. We can do this using normal Python dictionary and list operations.

In [10]:
config['learning_rate'] = 0.2
In [11]:
config['categories'].append('kind of hotdog')

We can then write this back to the config.yaml file to save our changes.

In [12]:
with open('config.yaml', 'w') as f:
    # yaml.dump returns None when given a stream, so don't reassign config
    yaml.dump(config, stream=f,
              default_flow_style=False, sort_keys=False)

All done! We can confirm this by loading our YAML file again and displaying the dictionary.

In [13]:
with open('config.yaml') as f:
    config = yaml.load(f, Loader=yaml.FullLoader)
config
Out[13]:
{'learning_rate': 0.2,
 'random_seed': 789108,
 'maintainer': 'Shahin Rostami',
 'categories': ['hotdog', 'not a hotdog', 'kind of hotdog']}
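To see the text that yaml.dump produces without touching the file, we can leave out the stream argument, in which case the YAML is returned as a string. The following is a minimal sketch using a shortened, illustrative dictionary. It assumes PyYAML 5.1 or later, where the sort_keys argument is available.

```python
import yaml  # PyYAML

config = {
    "learning_rate": 0.2,
    "categories": ["hotdog", "not a hotdog", "kind of hotdog"],
}

# With no stream argument, yaml.dump returns the YAML text as a string.
# sort_keys=False keeps the keys in their original insertion order.
block_text = yaml.dump(config, default_flow_style=False, sort_keys=False)
print(block_text)

# sort_keys=True (the default) orders the keys alphabetically instead.
sorted_text = yaml.dump(config, default_flow_style=False, sort_keys=True)
print(sorted_text)

# Dumping and reloading gives back an equal dictionary.
round_trip = yaml.safe_load(block_text)
print(round_trip == config)
```

Keeping sort_keys=False means the saved file keeps the same key order as the one that was loaded, which makes changes easier to spot when comparing versions.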

Conclusion

In this section, we briefly introduced YAML before using the PyYAML package to load, manipulate, and save a collection of configuration settings that we stored in a file named config.yaml. Keeping your configuration settings separate from your source code has several benefits: the settings can be changed without modifying source code, they are easier to search for and automate across a project, and they can be shared between multiple pieces of work.
