How to Work with YAML in Python
This article explains how to manage Python YAML with PyYAML.
If you’ve ever worked with Docker or Kubernetes, you’ve likely used YAML files. From configuring an application’s services in Docker to defining Kubernetes objects like pods and services, YAML is used for them all.
If you’d like to learn how to work with YAML in the Python programming language, then this tutorial is for you. This tutorial will cover creating, writing, reading, and modifying YAML in Python.
The Need for Data Serialization and Why You Should Use YAML
Data serialization makes it possible to exchange unstructured or semi-structured data across applications or systems that run on different underlying infrastructures. Data serialization languages use standardized and well-documented syntax to share data across machines.
Some of the widely used data serialization languages include YAML, XML, and JSON. While XML and JSON are used for data transfer between applications, YAML is often used to define the configuration for applications.
YAML is characterized by a simple syntax involving line separation and indentation, without complex syntax involving the use of curly braces, parentheses, and tags.
YAML is a human-readable data-serialization language and stands for “YAML Ain’t Markup Language”, often also referred to as “Yet Another Markup Language”. It is written with a .yml or .yaml (preferred) file extension.
It is used because of its readability to write configuration settings for applications. It is user-friendly and easy to understand.
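To make the syntax comparison concrete, here is a minimal sketch that writes the same small configuration once as YAML and once as JSON and parses both. The sample data (a service with two ports) is made up for illustration, and the snippet assumes PyYAML is already installed, which is covered in the installation section of this tutorial.

```python
import json

import yaml  # provided by the PyYAML package

# The same configuration written twice: indentation-based YAML...
yaml_text = """
service:
  name: web
  ports:
    - 80
    - 443
"""

# ...and brace-and-bracket-based JSON.
json_text = '{"service": {"name": "web", "ports": [80, 443]}}'

from_yaml = yaml.safe_load(yaml_text)
from_json = json.loads(json_text)

# Both texts parse to the same Python dictionary.
print(from_yaml == from_json)  # True
```

The YAML version carries the same structure with no braces, brackets, or quotes, which is a large part of why it is popular for configuration files.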
Prerequisites
To follow along you’ll need the following:
- Local installation of Python 3.x
- A text editor
The PyYAML Library
The PyYAML library is widely used for working with YAML in Python. It comes with a yaml module that you can use to read, write, and modify contents of a YAML file, serialize YAML data, and convert YAML to other data formats like JSON.
Installing the PyYAML Library
To parse YAML in Python, you’ll need to install the PyYAML library.
In your working directory, run the command below to install PyYAML via pip3:
pip3 install pyyaml
To confirm that the installation was successful, you can run the command below:
pip3 show pyyaml
If the PyYAML installation was successful, the command prints the package’s metadata, including its name and version.
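You can also confirm the installation from inside Python. The short sketch below imports the yaml module and reads its version string; the variable name installed_version is only illustrative, and the value you see depends on the release you installed.

```python
import yaml

# PyYAML exposes its release number as a string on the module.
installed_version = yaml.__version__
print(installed_version)
```

If the import raises ModuleNotFoundError, the package was most likely installed for a different Python interpreter than the one running the script.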
Creating YAML in Python
You can download the code snippets used in this tutorial from this GitHub repository.
Now that you have PyYAML installed, you can start working with YAML in Python.
In your working directory, create a file called script.py and import the yaml module:
import yaml
Let’s create a dictionary called data with the following key-value pairs:
data = {
    'Name': 'John Doe',
    'Position': 'DevOps Engineer',
    'Location': 'England',
    'Age': '26',
    'Experience': {'GitHub': 'Software Engineer',
                   'Google': 'Technical Engineer', 'Linkedin': 'Data Analyst'},
    'Languages': {'Markup': ['HTML'],
                  'Programming': ['Python', 'JavaScript', 'Golang']}
}
The dump() Function
Now, to create a YAML representation of the data dictionary created above, you can use the dump() function in the yaml module. The dump() function expresses Python objects in YAML format. It accepts two arguments: data (a Python object), which is required, and optionally a stream, such as an open file, to which the YAML output is written. If no stream is given, dump() returns the YAML as a string.
You can also pass in optional parameters that specify formatting details for the emitter. Commonly used ones are sort_keys, which sorts the keys of a Python object in alphabetical order and is True by default, and default_flow_style, which controls whether nested collections are written in indented block style (False) or in inline flow style (True). Since PyYAML 5.1, default_flow_style is False by default.
The code below will return a str object that corresponds to a YAML document. As we’ve set sort_keys to False, the original order of the keys is preserved.
yaml_output = yaml.dump(data, sort_keys=False)
print(yaml_output)
Now, run script.py. You should see the following output:
Name: John Doe
Position: DevOps Engineer
Location: England
Age: '26'
Experience:
  GitHub: Software Engineer
  Google: Technical Engineer
  Linkedin: Data Analyst
Languages:
  Markup:
  - HTML
  Programming:
  - Python
  - JavaScript
  - Golang
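The effect of the sort_keys and default_flow_style parameters is easier to see side by side. The sketch below dumps a small, made-up dictionary (sample) three ways; the variable names are illustrative only.

```python
import yaml

# Keys are deliberately out of alphabetical order.
sample = {'b': [1, 2], 'a': {'x': 1}}

# Block style: nested collections are laid out with indentation.
block_style = yaml.dump(sample, sort_keys=False, default_flow_style=False)

# Flow style: nested collections are written inline, similar to JSON.
flow_style = yaml.dump(sample, sort_keys=False, default_flow_style=True)

# With sort_keys left at its default (True), 'a' is emitted before 'b'.
sorted_keys = yaml.dump(sample)

print(block_style)
print(flow_style)
print(sorted_keys)
```

All three strings describe the same data, so loading any of them back returns a dictionary equal to sample; only the layout and key order differ.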
You can also serialize a Python object that holds several items, such as a list of dictionaries, into a single stream in which each dictionary is represented as its own YAML document. To do this, you can use the dump_all() function.
The dump_all() Function
💡 The dump_all() function is used to serialize Python objects, in order, into a single stream. It expects an iterable of Python objects, such as a list of dictionaries. If you pass in a single dictionary instead, dump_all() iterates over its keys and outputs each key as a separate YAML document.
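The difference between passing a list and passing a single dictionary can be checked with a short sketch. The sample data and variable names below are made up for illustration.

```python
import yaml

documents = [{'name': 'first'}, {'name': 'second'}]
single_mapping = {'name': 'first', 'kind': 'demo'}

# A list of dictionaries: each dictionary becomes one YAML document.
as_list = yaml.dump_all(documents, sort_keys=False)

# A single dictionary: dump_all() iterates over it, which yields its keys,
# so each key string becomes its own YAML document.
as_dict = yaml.dump_all(single_mapping)

print(as_list)
print(as_dict)

# Loading the streams back shows what was actually serialized.
print(list(yaml.safe_load_all(as_list)))  # [{'name': 'first'}, {'name': 'second'}]
print(list(yaml.safe_load_all(as_dict)))  # ['name', 'kind']
```

If you only have one dictionary to serialize, dump() is the function to use; if you really want it as a one-document stream, wrap it in a list before passing it to dump_all().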
Let’s define a list of dictionaries called data2.
data2 = [
    {
        'apiVersion': 'v1',
        'kind': 'persistentVolume',
        'metadata': {'name': 'mongodb-pv', 'labels': {'type': 'local'}},
        'spec': {'storageClassName': 'hostpath'},
        'capacity': {'storage': '3Gi'},
        'accessModes': ['ReadWriteOnce'],
        'hostpath': {'path': '/mnt/data'}
    },
    {
        'apiVersion': 'v1',
        'kind': 'persistentVolume',
        'metadata': {'name': 'mysql-pv', 'labels': {'type': 'local'}},
        'spec': {'storageClassName': 'hostpath'},
        'capacity': {'storage': '2Gi'},
        'accessModes': ['ReadWriteOnce'],
        'hostpath': {'path': '/mnt/data'}
    }
]
Now call the dump_all() function to serialize the Python object as a stream of YAML documents:
yaml_output2 = yaml.dump_all(data2, sort_keys=False)
print(yaml_output2)
Once you execute this code, you should have the following output. The output below shows that the data2 list has been dumped into a single stream, as multiple blocks of YAML separated by ---.
apiVersion: v1
kind: persistentVolume
metadata:
  name: mongodb-pv
  labels:
    type: local
spec:
  storageClassName: hostpath
capacity:
  storage: 3Gi
accessModes:
- ReadWriteOnce
hostpath:
  path: /mnt/data
---
apiVersion: v1
kind: persistentVolume
metadata:
  name: mysql-pv
  labels:
    type: local
spec:
  storageClassName: hostpath
capacity:
  storage: 2Gi
accessModes:
- ReadWriteOnce
hostpath:
  path: /mnt/data
Writing YAML to a File in Python
Now that you’ve learned how to create YAML documents from Python objects, let’s learn how to write them into a file for future use.
The dump() function optionally accepts a file object as one of its arguments. When you provide this optional file object argument, the dump() function will write the produced YAML document into the file.
Let’s define a Python function write_yaml_to_file() that converts a Python object into YAML and writes it into a file.
def write_yaml_to_file(py_obj, filename):
    with open(f'{filename}.yaml', 'w') as f:
        yaml.dump(py_obj, f, sort_keys=False)
    print('Written to file successfully')

write_yaml_to_file(data, 'output')
Upon calling the write_yaml_to_file() function with the data dictionary as the argument, an output.yaml file will be created in the working directory.
Similarly, you can call the write_yaml_to_file() function with data2 as the argument to convert it to YAML and store it in a YAML file. Because data2 is a list of documents, this version of the function uses dump_all() instead of dump():
def write_yaml_to_file(py_obj, filename):
    with open(f'{filename}.yaml', 'w') as f:
        yaml.dump_all(py_obj, f, sort_keys=False)
    print('written to file successfully')

write_yaml_to_file(data2, 'output2')
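Rather than redefining the function for each case, one option is a single helper that switches between dump() and dump_all(). The sketch below is only one way to do it: the helper name write_yaml and its multiple flag are hypothetical, and the files are written to a temporary directory so the example cleans up after itself.

```python
import tempfile
from pathlib import Path

import yaml


def write_yaml(py_obj, path, multiple=False):
    """Hypothetical helper: write one document, or several when multiple=True."""
    with open(path, 'w', encoding='utf-8') as f:
        if multiple:
            # py_obj is treated as an iterable of documents.
            yaml.dump_all(py_obj, f, sort_keys=False)
        else:
            yaml.dump(py_obj, f, sort_keys=False)


with tempfile.TemporaryDirectory() as tmp:
    single_path = Path(tmp) / 'single.yaml'
    multi_path = Path(tmp) / 'multi.yaml'

    write_yaml({'Name': 'John Doe'}, single_path)
    write_yaml([{'id': 1}, {'id': 2}], multi_path, multiple=True)

    # Read the files back as text to inspect what was written.
    single_text = single_path.read_text(encoding='utf-8')
    multi_text = multi_path.read_text(encoding='utf-8')

print(single_text)
print(multi_text)
```

An explicit flag keeps the caller in control; guessing from the type of py_obj would be ambiguous, because a list can also be a single YAML document.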
Reading YAML in Python
The yaml module comes with a function that can be used to read YAML files. This process of reading YAML files with PyYAML is also referred to as loading a YAML file.
How to Read YAML Files With safe_load()
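The safe_load() function is used to read YAML files with the PyYAML library. The other function you can use is load(), but it is not recommended for untrusted input because, depending on the loader it is given, it can construct arbitrary Python objects.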
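The sketch below illustrates why safe_load() is the safer choice. It feeds the function a made-up YAML string that uses a Python-specific tag asking the loader to call os.system; safe_load() does not recognize such tags and raises an error instead of executing anything. The variable names are illustrative.

```python
import yaml

# A YAML string that tries to make the loader call os.system().
untrusted = "!!python/object/apply:os.system ['echo unsafe']"

try:
    yaml.safe_load(untrusted)
    rejected = False
except yaml.YAMLError as exc:
    # safe_load() only builds plain types (dict, list, str, int, ...),
    # so the Python-specific tag is refused.
    rejected = True
    print(type(exc).__name__)

print(rejected)  # True
```

Catching yaml.YAMLError covers this case as well as ordinary syntax errors, since it is the base class of PyYAML's parsing and construction errors.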
The function read_one_block_of_yaml_data will open a YAML file in read mode, load its contents using the safe_load() function, and print out the output as a dictionary of dictionaries.
def read_one_block_of_yaml_data(filename):
    with open(f'{filename}.yaml', 'r') as f:
        output = yaml.safe_load(f)
    print(output)

read_one_block_of_yaml_data('output')
{'Name': 'John Doe', 'Position': 'DevOps Engineer', \
'Location': 'England', 'Age': '26', 'Experience': \
{'GitHub': 'Software Engineer', 'Google': 'Technical Engineer', \
'Linkedin': 'Data Analyst'}, 'Languages': {'Markup': ['HTML'],\
'Programming': ['Python', 'JavaScript', 'Golang']}}
You can also read the contents of a YAML file and write them into another file. The code below reads the file output.yaml and writes its content into another file, output3.yaml.
def read_and_write_one_block_of_yaml_data(filename, write_file):
    with open(f'{filename}.yaml', 'r') as f:
        data = yaml.safe_load(f)
    with open(f'{write_file}.yaml', 'w') as file:
        yaml.dump(data, file, sort_keys=False)
    print('done!')

read_and_write_one_block_of_yaml_data('output', 'output3')
Once this runs, output3.yaml should contain the same YAML data as output.yaml.
For reading a YAML file with multiple blocks of YAML data, you’ll use the safe_load_all() function and convert the output to a list:
def read_multiple_block_of_yaml_data(filename):
    with open(f'{filename}.yaml', 'r') as f:
        data = yaml.safe_load_all(f)
        print(list(data))

read_multiple_block_of_yaml_data('output2')
The output below shows the result as a list of dictionaries:
[{'apiVersion': 'v1', 'kind': 'persistentVolume', 'metadata': \
{'name': 'mongodb-pv', 'labels': {'type': 'local'}}, 'spec': \
{'storageClassName': 'hostpath'}, 'capacity': {'storage': '3Gi'}, \
'accessModes': ['ReadWriteOnce'], 'hostpath': {'path': '/mnt/data'}}, \
{'apiVersion': 'v1', 'kind': 'persistentVolume', 'metadata': \
{'name': 'mysql-pv', 'labels': {'type': 'local'}}, 'spec': \
{'storageClassName': 'hostpath'}, 'capacity': {'storage': '2Gi'}, \
'accessModes': ['ReadWriteOnce'], 'hostpath': {'path': '/mnt/data'}}]
If you print the loaded data as it is, without converting it to a list, you’ll see a generator object and its memory address rather than the contents of the YAML file:
<generator object load_all at 0x7f9e0e0b6880>
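Because safe_load_all() returns a generator, documents are parsed one at a time as you iterate, and the generator can only be consumed once. The sketch below shows this with a small two-document YAML string made up for illustration; it loads from a string rather than a file to keep the example self-contained.

```python
import inspect

import yaml

stream = """\
name: first
---
name: second
"""

documents = yaml.safe_load_all(stream)
is_generator = inspect.isgenerator(documents)
print(is_generator)  # True

# Iterating parses each document in turn.
names = [doc['name'] for doc in documents]
print(names)  # ['first', 'second']

# The generator is now exhausted, so a second pass yields nothing.
leftover = list(documents)
print(leftover)  # []
```

Converting to a list up front, as the tutorial does, is the simplest approach when you need to index into the documents or go over them more than once.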
Similarly, you can also write the loaded data into another file:
filename = 'output2'  # the multi-document file written earlier
with open(f'{filename}.yaml', 'r') as f:
    data = yaml.safe_load_all(f)
    # Convert to a list while the file is still open.
    loaded_data = list(data)
with open('output4.yaml', 'w') as file:
    yaml.dump_all(loaded_data, file, sort_keys=False)
print('done!')
And you have the complete function as follows:
def read_multiple_block_of_yaml_data(filename, write_file):
    with open(f'{filename}.yaml', 'r') as f:
        data = yaml.safe_load_all(f)
        loaded_data = list(data)
    with open(f'{write_file}.yaml', 'w') as file:
        yaml.dump_all(loaded_data, file, sort_keys=False)
    print('done!')

read_multiple_block_of_yaml_data('output2', 'output4')
Modifying YAML in Python
You can modify the contents of a YAML file using the yaml module from PyYAML. All you have to do is ensure the function takes in the following arguments: a YAML file to read, the key to change, and the new value.
As an example, you’ll change the Age key of the data dictionary to have a value of 30 instead of 26. The code below creates a function read_and_modify_one_block_of_yaml_data that takes a YAML file, a key, and a value as arguments. It reads that file, sets the Age key to 30, and prints the modified data.
def read_and_modify_one_block_of_yaml_data(filename, key, value):
    with open(f'{filename}.yaml', 'r') as f:
        data = yaml.safe_load(f)
        data[f'{key}'] = f'{value}'
        print(data)
    print('done!')

read_and_modify_one_block_of_yaml_data('output', key='Age', value=30)
{'Name': 'John Doe', 'Position': 'DevOps Engineer', 'Location': \
'England', 'Age': '30', 'Experience': {'GitHub': 'Software Engineer', \
'Google': 'Technical Engineer', 'Linkedin': 'Data Analyst'}, \
'Languages': {'Markup': ['HTML'], 'Programming': \
['Python', 'JavaScript', 'Golang']}}
done!
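Note that the output shows 'Age': '30' as a string even though the integer 30 was passed in, because the function formats the value with an f-string before storing it. The sketch below, using made-up one-line YAML strings, shows how quoting affects the Python type on both loading and dumping.

```python
import yaml

# A quoted scalar loads as a string; an unquoted number loads as an int.
quoted = yaml.safe_load("Age: '30'")
unquoted = yaml.safe_load("Age: 30")

print(quoted)    # {'Age': '30'}
print(unquoted)  # {'Age': 30}

# The same distinction is kept when dumping.
dumped_str = yaml.dump({'Age': '30'})
dumped_int = yaml.dump({'Age': 30})

print(dumped_str)  # Age: '30'
print(dumped_int)  # Age: 30
```

If you want the modified file to hold a number instead, one option is to assign the value directly without formatting it as a string.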
You can optionally write the modified data into another file. The code below writes the modified data into another file output5.yaml.
def read_and_modify_one_block_of_yaml_data(filename, write_file, key, value):
    with open(f'{filename}.yaml', 'r') as f:
        data = yaml.safe_load(f)
        data[f'{key}'] = f'{value}'
        print(data)
    with open(f'{write_file}.yaml', 'w') as file:
        yaml.dump(data, file, sort_keys=False)
    print('done!')

read_and_modify_one_block_of_yaml_data('output', 'output5', key='Age', value=30)
Once executed successfully, output5.yaml should contain the same data as output.yaml, with the Age value changed to '30'.
To illustrate further, you can also modify the output2.yaml file. The code below modifies the first block of YAML data so that accessModes contains both ‘ReadWriteOnce’ and ‘ReadOnlyMany’, and writes the result to a file output6.yaml:
def read_modify_save_yaml_data(filename, index, key, value, write_file):
    with open(f'{filename}.yaml', 'r') as f:
        data = yaml.safe_load_all(f)
        loaded_data = list(data)
        loaded_data[index][f'{key}'].append(f'{value}')
    with open(f'{write_file}.yaml', 'w') as file:
        yaml.dump_all(loaded_data, file, sort_keys=False)
    print(loaded_data)

read_modify_save_yaml_data('output2', 0, 'accessModes', 'ReadOnlyMany', 'output6')
Once this code is executed, output6.yaml should contain both YAML documents, with ReadOnlyMany added to the accessModes list of the first one.
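Selecting a document by its position works for a fixed file, but it breaks if the order of the documents changes. As an alternative sketch, the code below picks the document by its metadata name instead. The helper append_access_mode is hypothetical, and the YAML is a trimmed-down, in-memory version of the persistent volume data used in this tutorial.

```python
import yaml

stream = """\
metadata:
  name: mongodb-pv
accessModes:
- ReadWriteOnce
---
metadata:
  name: mysql-pv
accessModes:
- ReadWriteOnce
"""


def append_access_mode(documents, name, mode):
    """Hypothetical helper: append mode to the document whose metadata.name matches."""
    for doc in documents:
        if doc.get('metadata', {}).get('name') == name:
            doc['accessModes'].append(mode)
            return True
    return False  # no document with that name was found


docs = list(yaml.safe_load_all(stream))
found = append_access_mode(docs, 'mysql-pv', 'ReadOnlyMany')

# Serialize the modified documents back into a single YAML stream.
updated = yaml.dump_all(docs, sort_keys=False)
print(found)
print(updated)
```

Returning a boolean lets the caller decide what to do when no document matches, rather than silently writing an unchanged file.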
How to Convert YAML to JSON in Python
You can convert YAML to another data-serialization format like JSON.
First, you’ll need to import the json module from the Python standard library:
import json
Now, run the code below to convert a YAML document to a JSON object and save it in a file called output.json.
def convert_yaml_to_json(yfile, jfile):
    with open(f'{yfile}.yaml', 'r') as f:
        yaml_file = yaml.safe_load(f)
    with open(f'{jfile}.json', 'w') as json_file:
        json.dump(yaml_file, json_file, indent=3)
    print('done!')

convert_yaml_to_json('output', 'output')
Once converted successfully, you should have an output.json file in your working directory.
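One thing to watch for in this conversion is that YAML supports some types JSON does not. For example, safe_load() turns an unquoted date into a Python datetime.date object, which the json module cannot serialize on its own. The sketch below, using a made-up YAML string, shows the failure and one possible workaround: passing default=str so that unsupported values are written as strings.

```python
import json

import yaml

yaml_text = """\
name: release
date: 2024-01-15
"""

# The unquoted date is loaded as a datetime.date object.
loaded = yaml.safe_load(yaml_text)
print(type(loaded['date']).__name__)  # date

try:
    json.dumps(loaded)
    plain_dump_failed = False
except TypeError:
    # json does not know how to serialize datetime.date.
    plain_dump_failed = True

# default=str is called for any value json cannot handle natively.
as_json = json.dumps(loaded, default=str)
print(as_json)  # {"name": "release", "date": "2024-01-15"}
```

Whether stringifying is acceptable depends on who consumes the JSON; the data used in this tutorial contains only strings, lists, and dictionaries, so it converts without this step.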
Conclusion
In this tutorial, you’ve learned how to handle YAML using Python and the PyYAML library. You’ve worked with functions like safe_load(), safe_load_all(), dump(), and dump_all() to manipulate YAML data and even convert it to JSON with Python’s json library.
As you continue to explore and expand your Python skills, you might find yourself in need of a CI process. If that’s the case, give Earthly a try. It’s a great tool for automating Python builds and can significantly streamline your workflow.
Now, why not take your newfound knowledge a step further? Try converting JSON to CSV next.