Documentation & Markdown Tables

Documentation is a vital part of almost any technical project. However, maintaining good documentation is hard for many reasons. One is that the time needed to document things is time you’re not spending on the “building / fixing / operating / assisting” aspects of your work, which is especially painful when you have lots of other urgent or important work to do.

Therefore, an important value of any documentation system is to make it as easy as possible for the would-be documenter to access, create, and edit content at any time. This includes enabling the documenter to express their ideas without requiring excessive authentication, cumbersome markup, or slow / complex UI interactions.

In addition to making things fast and easy for documentation authors, a good documentation system needs to enable documentation consumers to quickly and effortlessly find relevant, understandable, and well-presented information. This means you can’t just use walls of text; instead you’ll need properly formatted prose (or at least bullet points), and you’ll occasionally need diagrams, photos, videos, syntax-highlighted code, and other attachments to express ideas clearly and simply. You also can’t rely on the user to find files scattered around a filesystem or to know which subcategory you’ve chosen for a given document, so you need a search feature.

Using a wiki as a documentation system is a pretty good, but not perfect, fit for these needs and values. I haven’t yet found the ideal wiki software for my purposes, so for both professional and personal projects I’m currently supplementing a wiki with CodiMD (which is offered commercially as HackMD). But I am still hunting for the-one-true-system, and I have my eye on Wiki.js as a potential candidate.

Markdown

The “MD” in “CodiMD” is Markdown. For tech documentation purposes, Markdown stands out in the field of markup languages because it was designed with strong opinions about what you actually need and what you don’t. As a result, it has an easy, minimal, human-readable syntax (which still lets you leverage much of the power of HTML if you absolutely need it). There are more expressive and powerful markup languages, like LaTeX or (god help you) SGML. In contrast to those, Markdown addresses the paradox of choice by eliminating, or at least discouraging, many formatting options so that you can focus on the core content instead of things like the box model, kerning, or drop caps. You want people thinking about their core content when they’re documenting, not tweaking line-heights or trying to remember which of many similar markup tags to use for a given situation.

Tables

Tables aren’t part of the original Markdown syntax, but widely used extensions such as GitHub Flavored Markdown support them, rendering this text:

animal | color | feature
------ | ----- | ------------
turtle | green | shell
crow   | black | intelligence
horse  | brown | long face

…as this table:

animal   color   feature
turtle   green   shell
crow     black   intelligence
horse    brown   long face
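One wrinkle in this syntax is that the pipe character separates cells. A cell whose text contains a literal pipe therefore needs it escaped as \| (GitHub Flavored Markdown supports this). A line break inside a value would also end the row early, because each table row is a single line. Here is a minimal sketch of a hypothetical escape_cell helper (not part of the code presented in this post) that prepares a value before it goes into a row:

```python
def escape_cell(value):
    """Hypothetical helper: make a value safe to put in a markdown table cell."""
    text = str(value)
    # a table row must stay on one line, so collapse line breaks into spaces
    text = " ".join(text.splitlines())
    # a bare | would start a new cell; GFM reads \| as a literal pipe
    return text.replace("|", "\\|")


row = ["tortoise", "green | brown", "shell\nslow"]
print(" | ".join(escape_cell(cell) for cell in row))
# prints: tortoise | green \| brown | shell slow
```

Values would need to pass through a helper like this before column widths are calculated, since each added backslash makes the cell one character wider.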

Tables like this can make it easier for readers of your documentation to understand what you’re trying to communicate. However, making an ASCII table for use in Markdown by hand can be time-consuming, so I have some code that saves me time making these tables. It’s a Python function that takes a list of dictionaries (aka maps/hashes/associative arrays) and turns them into table rows.

Here I present that function in the context of reviewing the output of a program that emits a series of JSON documents (one per line) representing tabular data:

#!/usr/bin/env python
""" Example showing markdown table generation from a list of dicts """
from __future__ import print_function

import collections
import json
import subprocess


def markdown_table_from_dicts(row_list, col_list=None, fill_empty=""):
    """
    return a string containing a markdown-formatted table. the table is
    generated from the supplied list of dicts that contain row data, and a
    list of column names which refer to keys in the dicts. column widths are
    automatically calculated. the optional fill_empty argument defines the
    value inserted into cells that don't have data
    """
    # for example to create this table...
    #
    # animal | color | feature
    # ------ | ----- | ------------
    # turtle | green | shell
    # crow   | black | intelligence
    # horse  | brown | long face
    #
    # you'd supply this row_list (remember regular python dicts are unordered):
    # [
    #     {"animal": "turtle", "feature": "shell", "color": "green"},
    #     {"color": "black", "animal": "crow", "feature": "intelligence"},
    #     {"feature": "long face", "animal": "horse", "color": "brown"},
    # ]
    # ...and (optionally) this col_list: ["animal", "color", "feature"]

    if not col_list:
        # the user didn't supply a col_list so we generate one from the set of
        # keys in all the rows
        col_list_tmp = set()
        for row in row_list:
            col_list_tmp.update(row)
        col_list = sorted(col_list_tmp)

    if fill_empty is not None:
        # fill_empty represents the default value for cells that aren't defined
        # in a row
        filler = lambda: fill_empty
        row_list = [collections.defaultdict(filler, row) for row in row_list]

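    # calculate the width of each column. this is derived from the max length
    # of the row contents in a column including the column name itself.
    # note: when the rows are defaultdicts, indexing r[col_name] here also
    # inserts fill_empty for any missing key, which is what lets
    # row_fmt.format(**row) below succeed for rows that lacked some columns
    col_widths = {
        col_name: max([len(r[col_name]) for r in row_list] + [len(col_name)])
        for col_name in col_list
    }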

    # generate a format string that can later be used to print the rows of the
    # table. in the above example this would work out to:
    # "{animal:<6} | {color:<5} | {feature:<12}"
    row_fmt = ' | '.join([
        '{{{name}:<{width}}}'.format(name=col_name, width=col_widths[col_name])
        for col_name in col_list
    ])

    # the header row contains just the names of the columns
    header_row = {col_name: col_name for col_name in col_list}

    # the delimiter row contains the dash/underlines that appear on the line
    # below the header row (this is vital for md tables to be parsed as tables)
    delim_row = {col_name: "-" * col_widths[col_name] for col_name in col_list}

    # putting it all together
    return "\n".join([row_fmt.format(**row)
                      for row
                      in [header_row, delim_row] + row_list])


def main():
    """ entrypoint for direct execution """
    # this is for example purposes, generally you don't want to shell-out to
    # docker. you should use the API instead.
    docker_ps = ["docker", "ps", "-a", "--format", "{{json .}}"]
    containers = [json.loads(line)
                  for line in subprocess.check_output(docker_ps).splitlines()]

    # containers[0].keys():
    #   [u'Status', u'Image', u'Labels', u'Ports', u'Networks', u'Command',
    #    u'Names', u'Mounts', u'RunningFor', u'LocalVolumes', u'ID',
    #    u'CreatedAt', u'Size']

    # containers is now a list of dicts, let's markdown-tableize it!
    print(markdown_table_from_dicts(containers, ["Names", "Image", "CreatedAt",
                                                 "Status"]))

    return


if __name__ == '__main__':
    main()

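The simple program above lists the Docker containers on the local machine (or on a remote machine if that’s how you roll). Because docker ps is passed the -a flag, the list includes stopped containers as well as running ones. The program then prints a markdown table with excerpts of that data.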
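The format-string approach in markdown_table_from_dicts has two subtleties. First, it relies on a defaultdict side effect to fill in missing cells. Second, str.format parses a column name containing ".", "[" or ":" as attribute, index, or format-spec syntax rather than as a plain key. Below is a minimal Python 3 sketch of an alternative that avoids both. It looks cells up with dict.get and pads them with str.ljust. The name markdown_table is hypothetical, and this is not the author's code:

```python
def markdown_table(rows, columns, fill_empty=""):
    """
    Hypothetical alternative to markdown_table_from_dicts: return a markdown
    table built from a list of dicts (rows) and a list of keys (columns).
    """
    header = [str(col) for col in columns]

    # look each cell up with dict.get so missing keys fall back to fill_empty
    # without mutating the caller's dicts; str() tolerates non-string values
    table = [[str(row.get(col, fill_empty)) for col in columns] for row in rows]

    # each column is as wide as its longest cell, including the header
    widths = [max([len(name)] + [len(cells[i]) for cells in table])
              for i, name in enumerate(header)]

    delimiter = ["-" * width for width in widths]

    # pad with ljust instead of a str.format template, so column names are
    # treated as plain text; rstrip drops the padding after the last column
    lines = [
        " | ".join(cell.ljust(width) for cell, width in zip(cells, widths)).rstrip()
        for cells in [header, delimiter] + table
    ]
    return "\n".join(lines)


animals = [
    {"animal": "turtle", "feature": "shell", "color": "green"},
    {"color": "black", "animal": "crow", "feature": "intelligence"},
    {"feature": "long face", "animal": "horse", "color": "brown"},
]
# prints the animal/color/feature table from the Tables section
print(markdown_table(animals, ["animal", "color", "feature"]))
```

Unlike the original, each line is right-stripped, so the last column carries no trailing padding. GFM trims whitespace around cell contents, so the rendered table comes out the same either way.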