Validate Your YAML (with CUE)

43 minute read

Adam Gordon Bell

This article explains how Cuelang configurations enhance CI pipelines. Earthly guarantees reproducible and efficient builds. Learn more about Earthly.

I’ve complained before about using YAML when a programming language is what’s needed. But, when you’ve got configuring to do, YAML is pretty useful. It’s so much nicer to read and write than the XML I had to write back in the early days of Java development. But one advantage XML had over YAML was that XML schemas were commonly used and so I wouldn’t get errors like this:

Error parsing yaml file:
map values are not allowed here
  in "/Users/adam/sandbox/earthly-website/blog/authors.yml", line 2, column 8 

This is an error I got today on this very blog you are reading right now. You see, the blog is expecting a certain format of config, and I’ve violated that expectation. The blog wants a string, and I’ve given it a map. If only we had XML Schema for YAML or, even better, a static type system for configuration, then I wouldn’t have to get this error at runtime.

It turns out such things do exist. There are JSON Schema, Dhall, and Cuelang[1]. I’ll save covering Dhall and JSON Schema for another day. Today I’m going to show you how to use Cuelang, which is both a configuration language and a command-line tool, to validate your YAML. And I’m going to attempt to use it to prevent future problems with this blog’s authors.yml file.

Start Validating

OK, so the authors of this blog are stored in a YAML file aptly named authors.yml and it looks like this:

Corey:
    name    : "Corey Larson"
    bio     : "Eats, runs, and codes. Dad. Engineer. Progressive. LDS. Disneyland fanatic."
    avatar  : "/assets/images/authors/coreylarson.jpg"
Alex:
    name    : "Alex Couture-Beil"
    bio     : "Alex enjoys writing code, growing vegetables, and the great outdoors."
Vlad:
    name    : "Vlad A. Ionescu"
    bio     : " Founder of Earthly. Founder of ShiftLeft. Ex Google. Ex VMware. Co-author RabbitMQ Erlang Client."
    avatar  : "/assets/images/authors/vladaionescu.jpg"

All I want to do is find out before my blog starts if this file contains anything that will cause a runtime error. You know, catching problems without running the blog and hitting a code path that reads these values. It won’t be an earth-shattering improvement to the blog, but it will remove a little bit of friction.

To start with I’ll create a description of what my author type looks like in a file called authors-type.cue:

#Author : {
    name   : string
    bio    : string 
    avatar : string
}

Types in Cuelang start with # and look like a struct or class definition. So here I’m saying an #Author type has a name, bio, and an avatar – all of which are strings.

Next, I need to tell cue that my YAML file, at the root level, is going to be a map with string keys and #Author values. Doing so is pretty simple:

[string] : #Author
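
Put together, the two snippets above make up the whole schema file. Here is a minimal sketch of what authors-type.cue might look like at this point; the comments are mine, and the layout is just one reasonable way to arrange it:

```cue
// authors-type.cue: a schema for authors.yml.

// #Author describes a single entry in the authors map.
// Every field listed here is required and must be a string.
#Author: {
	name:   string
	bio:    string
	avatar: string
}

// At the root of the file, every key (an author's short name)
// must map to a value that satisfies #Author.
[string]: #Author
```

Because #Author is a definition, it is not treated as one of the authors itself; the [string] pattern only applies to regular fields such as Corey or Vlad.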

Then I can use the cue command-line tool to validate my author file, but first let’s install it.

Installing Cue

To install Cuelang, run brew install cue-lang/tap/cue on a Mac, or download a release directly from GitHub.

YAML Validate

Now that I have the cue command installed I can vet my authors.yml like this:

$ cue vet authors-type.cue authors.yml
Alex.avatar: incomplete value string:
    ./authors-type.cue:6:14

As you can see, I’m missing an avatar value for Alex. I’ll add that:

Alex:
    name    : "Alex Couture-Beil"
    bio     : "Alex enjoys writing code, growing vegetables, and the great outdoors."
+    avatar  : "/assets/images/authors/alexcouturebeil.jpg"

And now everything passes.

$ cue vet authors-type.cue authors.yml

That’s a small win for cue, but it shows how static typing can be valuable.

What Did I Learn?

Cuelang can be used to specify a schema for a plain YAML file. You can put your types in a separate file and run cue vet types.cue plain.yml and start benefiting from static types right away.

To specify a Type, the convention is to prefix with a # like this:

#Point: {
    x: number
    y: number
}
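
To tie this back to the error from the introduction, here is a small sketch of the "wanted a string, got a map" mistake written directly in CUE. The nested first and last fields are made up for illustration; they are not part of the real authors file. Running cue vet on a file like this should report a conflict for Alex.name rather than letting the problem surface at runtime:

```cue
#Author: {
	name:   string
	bio:    string
	avatar: string
}

[string]: #Author

// Fine: every field is a string.
Corey: {
	name:   "Corey Larson"
	bio:    "Eats, runs, and codes. Dad. Engineer. Progressive. LDS. Disneyland fanatic."
	avatar: "/assets/images/authors/coreylarson.jpg"
}

// Not fine: name is a struct (a map), but #Author says it must be a string.
// The first/last fields are hypothetical, used only to trigger the mismatch.
Alex: {
	name: {
		first: "Alex"
		last:  "Couture-Beil"
	}
	bio:    "Alex enjoys writing code, growing vegetables, and the great outdoors."
	avatar: "/assets/images/authors/alexcouturebeil.jpg"
}
```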

Optional Fields

The authors.yml snippet I’ve shown so far is a bit simplified. I also have links optionally attached to authors. It looks like this:

Adam:
    name    : "Adam Gordon Bell"
    bio     : "Spreading the word about Earthly. Host of CoRecursive podcast. Physical Embodiment of Cunningham's Law"
    avatar  : "/assets/images/authors/adamgordonbell.png"
    links:
        - label: ""
          icon: "fab fa-fw fa-twitter-square" 
          url: "https://twitter.com/adamgordonbell"
        - label: ""
          icon: "fas fa-fw fa-envelope-square"
          url: "mailto:adam+website@earthly.dev"

That is not described in our type, so now validation fails:

$ cue vet authors-type.cue authors.yml
Adam: field not allowed: links:
    ./authors-type.cue:3:11
    ./authors-type.cue:9:12
    ./authors2.yml:17:6

So to add links to my #Author type I first create a type for links:

#Link : {
    label: string
    icon: string
    url: string
}

Then I add it to #Author as optional (using ? marks it as optional):

#Author : {
    name   : string
    bio    : string 
    avatar : string
    links?: [...#Link]
}

Then running vet finds no errors.

$ cue vet authors-type.cue authors.yml

I found it a little strange at first that cue vet complains when I add fields the type doesn’t mention, but Cuelang types are closed by default and don’t allow for extending. That’s just a default, though. If I want to skip adding links to my type, I can easily mark the type as open (using ...) and get around the errors.

#Author : {
    name   : string
    bio    : string 
    avatar : string
    ... //Type is open
}

[string] : #Author
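
To see the difference side by side, here is a small sketch with one closed and one open type. The names #Closed and #Open are made up for this example. The rejected case is commented out so the file still vets cleanly; uncommenting it should bring back a "field not allowed" style error:

```cue
// Closed (the default): only the listed fields are allowed.
#Closed: {
	name: string
}

// Open: the trailing ... allows fields the type does not describe.
#Open: {
	name: string
	...
}

// Accepted: #Open tolerates the extra links field.
adam: #Open & {
	name:  "Adam Gordon Bell"
	links: []
}

// Rejected if uncommented: #Closed has no links field.
// adam2: #Closed & {
// 	name:  "Adam Gordon Bell"
// 	links: []
// }
```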

It’s also possible to define links as optional but leave its type unspecified:

#Author : {
    name   : string
    bio    : string 
    avatar : string
    links?: [...]
}

This is less open than the totally open solution, but less constrained than the fully typed solution.

By giving us types, but leaving ways to leave the types under-specified, Cuelang is behaving a bit like TypeScript. It improves upon YAML, but it tries to meet YAML where it’s at, and provides some escape hatches like open types.

The other way Cuelang is like TypeScript is that it does things with the type system that are a little unusual. This brings me to constraints.

What Did I Learn?

  • Optional fields can be specified by ending their name with ?. For example, tip?: number would be an optional value.
  • Lists can be defined without a type list: [...] or with a type strings: [...string].
  • Types are closed by default. Adding new elements to them is a type error.
  • Types can be opened by ending the definition with ....

Putting that all together you can write something like this:

#MyOpenType: {
  tip?: number
  list: [...]
  strings: [...string]
  ...
}

Constraints

The layout on this blog has limited space for bios. If an author’s bio is longer than 250 characters, then it will still fit, but it won’t look great.

No one should have to read this much about me.

I can ensure that never happens like this:

#Author : {
    name   : string
    bio    : string 
    bio   : =~ "^.{1,250}$" //Max 250 length regex
    avatar : string
    links?: [...#Link]
}

What I’ve done here is add a constraint to the #Author type using a regex. You can constrain the type of strings using a regex! That is pretty cool, and any regex can be attached to any string type. Granted, a regex is a bit of an odd way to verify a string length. So let’s fix that.

Cuelang has a standard library that can be used for creating constraints. If I import the strings module, I can ensure the number of Unicode runes in my bio string never goes over 250 like this:

import "strings"

#Author : {
    name   : string
    bio    : string 
    bio   : strings.MaxRunes(250)
}

And get errors like this:

$ cue vet authors-type.cue authors.yml
Adam.bio: invalid value 
"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla eu dolor nec nisi scelerisque semper..." 
(does not satisfy strings.MaxRunes(250)):

The Cuelang standard library has many other functions for constraining types. Any function that takes in a value of the type in question and returns a bool can be used for this purpose. In this way, constraints in Cuelang are like refinement types.

The number of potential invariants you can enforce in this way is immense. For example, I could parse the links to ensure they are valid links, and I could ensure that avatar string always ended in .png and so on. However, for this simple author list, what I’ve got is enough validation.
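
If I did want to go further, a sketch of those extra checks might look like the following. The specific rules are my assumptions, not requirements of the blog theme: I'm assuming avatars live under /assets/ and end in .png or .jpg, and that links are either http(s) URLs or mailto addresses. Both are expressed with the same regex constraint shown earlier:

```cue
import "strings"

#Link: {
	label: string
	icon:  string
	// Assumption: only http(s) and mailto links are expected.
	url: =~"^(https?://|mailto:)"
}

#Author: {
	name: string
	// Same 250-rune limit as before, combined with the string type.
	bio: string & strings.MaxRunes(250)
	// Assumption: avatars are .png or .jpg files under /assets/.
	avatar: =~"^/assets/.*\\.(png|jpg)$"
	links?: [...#Link]
}

[string]: #Author
```

These regexes only check the shape of the strings. They don't confirm that an image exists or that a URL resolves, so they catch typos, not dead links.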

What I Learned

Cuelang lets you constrain your values by adding constraints to the types. There are built-in constraints that let you enforce ranges and ensure strings match or don’t match regular expressions like this:

#Person: {
    age:   >=0 & <=100
    phone: =~ "[0-9]+"
}

And you can also import constraints from the standard library:

import "list"

#Author : {
    parents   : list.IsSortedStrings()
    pets      : list.MaxItems(3)
}

Packages and Imports

The authors.yml file I’ve been using as an example came with our blog theme, and it’s pretty simple to understand what each field is for. But we could do better than this, right? If, instead of using YAML for config, the theme used Cuelang directly, then they could have shipped it with the author type we specified above. Then the configuration would be typed and validated without any effort on my part.

That might work something like this. The Minimal Mistakes theme could put this cue file up on GitHub:

package blog

import "strings"

#Link : {
    label: string
    icon: string
    url: string
}

#Author : {
    name   : string
    bio    : string 
    bio   : strings.MaxRunes(250)
    avatar : string
    links?: [...#Link]
}

Then I could convert my YAML file to cue:

$ cue import authors.yml

Which would give me something like this:


Corey: {
 name:   "Corey Larson"
 bio:    "Eats, runs, and codes. Dad. Engineer. Progressive. LDS. Disneyland fanatic."
 avatar: "/assets/images/authors/coreylarson.jpg"
}
Alex:  {
 name:   "Alex Couture-Beil"
 bio:    "Alex enjoys writing code, growing vegetables, and the great outdoors."
 avatar: "/assets/images/authors/alexcouturebeil.jpg"
}
Vlad: {
 name:   "Vlad A. Ionescu"
 bio:    " Founder of Earthly. Founder of ShiftLeft. Ex Google. Ex VMware. Co-author RabbitMQ Erlang Client."
 avatar: "/assets/images/authors/vladaionescu.jpg"
}
Adam:  {
 name:   "Adam Gordon Bell"
 bio:    "Spreading the word about Earthly. Host of CoRecursive podcast. Physical Embodiment of Cunningham's Law"
 avatar: "/assets/images/authors/adamgordonbell.png"
 links: [
  {
   label: ""
   icon:  "fab fa-fw fa-twitter-square"
   url:   "https://twitter.com/adamgordonbell"
  }, {
   label: ""
   icon:  "fas fa-fw fa-envelope-square"
   url:   "mailto:adam+website@earthly.dev"
  }, {
   label: ""
   icon:  "fas fa-fw fa-link"
   url:   "https://corecursive.com"
  },
 ]
}

And then I can import the blog.#Author type and use it to annotate my CUE like so:

import (
 "github.com/mmistakes/minimal-mistakes/blog"
)

Corey: blog.#Author & {
 name:   "Corey Larson"
 bio:    "Eats, runs, and codes. Dad. Engineer. Progressive. LDS. Disneyland fanatic."
 avatar: "/assets/images/authors/coreylarson.jpg"
}
Alex: blog.#Author & {
 name:   "Alex Couture-Beil"
 bio:    "Alex enjoys writing code, growing vegetables, and the great outdoors."
 avatar: "/assets/images/authors/alexcouturebeil.jpg"
}
Vlad: blog.#Author & {
 name:   "Vlad A. Ionescu"
 bio:    " Founder of Earthly. Founder of ShiftLeft. Ex Google. Ex VMware. Co-author RabbitMQ Erlang Client."
 avatar: "/assets/images/authors/vladaionescu.jpg"
}
Adam: blog.#Author & {
 name:   "Adam Gordon Bell"
 bio:    "Spreading the word about Earthly. Host of CoRecursive podcast. Physical Embodiment of Cunningham's Law"
 avatar: "/assets/images/authors/adamgordonbell.png"
 links: [
  {
   label: ""
   icon:  "fab fa-fw fa-twitter-square"
   url:   "https://twitter.com/adamgordonbell"
  }, {
   label: ""
   icon:  "fas fa-fw fa-envelope-square"
   url:   "mailto:adam+website@earthly.dev"
  }, {
   label: ""
   icon:  "fas fa-fw fa-link"
   url:   "https://corecursive.com"
  },
 ]
}

And I can now run cue vet without needing to specify the schema or create the types myself:

$ cue vet authors.cue

And I can be sure all my config data is structured correctly.

Once you start importing types as part of config writing, you can see why something like Cuelang or its competitor Dhall can make a lot of sense, especially for config-heavy domains like Kubernetes.

(A great post that expands on this idea is How CUE wins, which makes a great case for why something like CUE is desperately needed.)

There is one problem, though…

What Did I Learn?

Cue supports packages with the package keyword. You can split cue files up into packages.

package lib1

#MyString : string
#MyNumber : number

Packages are imported by their import path, not by file name.

import "lib.io/lib1"

n1 : lib1.#MyNumber
s1 : lib1.#MyString

It all works very much like Golang packages.

Back To YAML

Having type checking for YAML sounds great. And Cuelang has lots more features for improving configuration. But I have a problem. I want types and constraints for my blog config, but how much incentive do the Jekyll theme authors have to provide them? Not much, because not many Jekyll users are using cue, which won’t change because of the lack of types. It’s the chicken-and-egg problem of type annotations.

TypeScript solved this with DefinitelyTyped, a massive community-driven effort to source types for various JavaScript libraries. Unfortunately, Cuelang doesn’t yet have a similar concept.

Generating YAML

Another problem is that, ideally, I’d like my programming language to be able to read .cue files directly. Only Golang supports this.

Nevertheless, much like TypeScript, as a sufficiently motivated user, I can do an end-run around these concerns. I just have to write my own types and then use the cue CLI tool to turn my cue back into YAML. (And this is all I can really do with Jekyll, since Jekyll is written in Ruby and has no support for cue. So I need to convert back to YAML before running the blog.)

cue does support this, through cue export (plain cue eval prints CUE syntax by default, not YAML):

$ cue export authors-type.cue authors.cue --out yaml > authors.yml

And so if I keep my types in authors-type.cue and my values in authors.cue and then generate authors.yml in source using cue export, then I can ensure my YAML is always correct. But I’ll have to keep a copy of my cue config and the YAML version, and I think that is too much to ask.

I do look forward to the day when types and constraints spread throughout configuration land, but what I’m going to do for now is just keep my types in an authors-type.cue file and keep the authors list in YAML, validating things using the cue vet approach. That way, I still get some type checks but don’t have to maintain two files.

What Did I Learn?

  • Only Golang can read cue files. 😢
  • cue can generate YAML or JSON with cue export, though. 😊
  • But then I have to hold onto my config in two formats or use cue vet. 😕
  • But if more people use Cuelang, support will grow over time. 🤞

Conclusion

That is a hopefully breezy introduction to the basics of Cuelang. There is much more to learn, but if you remember that it’s a configuration format that uses types and constraints and packages to improve on YAML, then you’ve got the gist of it. After that, you just have to draw the rest of the damn owl.

To that end, cuetorials.com and Bitfield Consulting have the best online tutorials I was able to find.

(Just remember not to use configuration when a programming language is what you need.)

And also, if you’re the type of person who thinks validating YAML is a worthy goal, you might be the type of person who cares about software builds. If that’s you, take a look at Earthly.


  1. I’ll be referring to Cue as Cuelang in this article, and the command line tool as cue. Cue is a bit hard to search for in Google, so perhaps I can start the trend of following GoLang’s naming convention.
