Recutils — Small Technology Notes

Hi! It looks like I have since continued, updated, or rethought this post in some ways, so you may want to look at this after you're done reading here.

Database Basics from Apr 5, 2020, 8:55am

Once in a while, I have a small problem with some regularity that has an easy solution. Traditionally, I write a note to myself, somewhere, and then need to hunt down the note when I need it next time. Since that’s inefficient, whenever I have a few of these, I’m going to start using them as short blog posts.

If anybody else finds them useful, that’s great, but it’s at least as important to me to be able to just look through the https://john.colagioia.net/blog/tag/techtips summary lines for what I want, instead of needing to sift through my notes with hundreds of other random ideas, assuming it made it to my specific note-taking app. And it’s also now a fixed URL that I can point other people to, instead of copying and pasting into e-mails.

Recutils, in General

When I store information, I’m a big fan of lightweight, text-based formats for a variety of reasons.

They store and compress well, when that’s important.
It’s possible to quickly inspect or edit whatever I’m working on.
I can quickly search anything I’m working on with a tool like grep.
I’m not tied to specific applications.
It’s easier to convert to another format, making it closer to future-proof.

For example, this blog, my novel, and my personal notes are generally all written as Markdown files. I store most of my spreadsheets as sets of CSV files, unless the formatting or calculated cells are important. In the rare cases I need something like a flow chart, it’s in pic or—increasingly—Mermaid format.

So, I got a bit excited, this past week, when I discovered recutils, a system to manage plain-text databases. It’s not perfect (unless I’m missing something important), but it’s definitely close to the sort of thing I would generally reach for.

File Organization

One of the clunkier aspects to recutils is that it’s difficult (impossible?) to work with an arbitrary subset of files. So, there seems to be two options:

Put all the record types into one file, or
Separate the record types into different files, while specifying all of them with a wildcard (such as “*.rec”) whenever working in situations that require more than one record type, like a Join.

Neither of these is ideal, of course, but the single file is probably acceptable for small databases and the latter is probably the most intuitive. I’ll use the multiple-file approach, here.

Obviously, recutils doesn’t actually care either way, and it’s possible to combine the two, so that record types that are often used together can share a file, while the other clusters of types or individual types can be in their own files. I’m not really sure that helps, though.

Record Schema

While it’s not strictly necessary to have a schema for records to conform to, it makes life significantly easier, if for no other reason than documentation. So, the header of a file of records, optionally, shows the types of each field, keys, and so forth.

It looks like unspecified fields are all arbitrary strings, multi-line with a plus sign (+) at the beginning of every line after the first. Other than that, if we are looking to store something like application users, we might have something like…

%rec: user
%doc: User information

# The non-multi-line string fields
%type: id int
%type: login line
%type: email email
%type: password line
%type: name line
%type: created_at: date
%type: updated_at: date
%type: role enum Guest User Moderator Administrator
%type: deleted bool

# Roles for fields; note that we sneak in "profile,"
# which is our multi-line string.
%key: id
%mandatory: login email password
%allowed: name profile created_at updated_at role deleted
# We could also %prohibit fields we want to ensure doesn't
# exist until we're ready to set it.

# Special restrictions
%auto: id created_at updated_at
%confidential: password
%unique: login email

That’s probably more extensive than it needs to be for an example, but we see the record type name (%rec), decent coverage of the available types (int, line, email, date, enum, and bool, the different kinds of fields (other than prohibited), and constraints (%auto, %confidential, and %unique).

There’s one useful type not shown here, that I’ll get to in a bit.

Records

Each user record, now, is going to need to have our specified fields and the right types, all in name: value format. For example, I created the following user via the Fake Name Generator.

id: 7760
login: Brouch
email: VanessaJStolz@dayrep.com
password: $2a$11$WBkGS/r5/0tn2muqPXreVO40wy6dxdEzKj/lY5.y7BQ5YoOHKerNW
name: Vanessa J. Stolz
profile: A 78-year-old Libra from Atlanta, Georgia, who works
+ for De Pinna as a machine feeder and drives a blue 2000
+ Toyota Pod.
created_at: 2020-02-05 06:48:55-0500
updated_at: 2020-02-05 06:48:55-0500
role: Moderator
deleted: False

Beyond just editing our users.rec file manually, we have a few options to populate our file with data.

And no, I’ve never heard of the Toyota Pod before now. It came generated with the fictional person’s dossier. It was a concept car that appears to be the automotive equivalent of the early smart refrigerators that nobody would really want, but is easy to explain to people.

As an aside, recutils doesn’t care about extra fields. So, if you want to add a birthday, significant other’s name, and hat size to Vanessa’s record above, but not anybody else’s, that’s probably going to be somewhat difficult to work with, but not destructive.

Conversion

If there’s a pre-existing data source, recutils comes with two utilities to convert from common database types:

csv2rec turns each line of a CSV file into a record, with the header line providing the field names.
mdb2rec converts a Microsoft Access database (or a single table) into records.

I can confirm that the CSV program works well, but couldn’t find an Access database to test.

Insertion

The recins utility, probably unsurprisingly, inserts records. Continuing with our users, it can be used something like…

recins --type=user --field=login --value=Gode1962 \
  --field=email --value=PhillipSCollins@teleworm.us \
  --field=password --value=password \
  --field=name --value="Phillip S. Collins" \
  --field=profile --value="Suburban driver of Dodge Rams" \
  --field=role --value=User \
  --field=deleted --value=False \
  --verbose users.rec

Assuming that none of the constraints have been violated, here, this will add the record to the users.rec file. If it fails, the --verbose argument will produce an error message explaining what went wrong.

Exploring

We can get some basic information, now. The following gives us a count of each record type in the file.

recinf users.rec

The difference between that and retrieving the schema is a matter of an argument.

recinf --descriptor users.rec

And then we can check into more general queries of the data, though the syntax isn’t SQL. We can pick a random set of records.

recsel --random 5 users.rec

We can get a “projection” of all data by specifying columns. The columns can also be renamed for convenience, which will be important, later.

recsel --print-values=login,email users.rec
recsel --print-values=login:Username,email:EMail users.rec

Replacing --print-values with just --print produces output that looks like a record, with name: value pairs. Also, if there are multiple types of records, we would need to specify --type=user.

We can also run arbitrary queries.

recsel --expression="id = 23573" users.rec
recsel --expression="email ~ gmail.com" users.rec

The expressions can get fairly sophisticated.

Deleting Records

Now that we understand insertion and query expressions, we should be able to figure out how to delete a record.

recdel --type=user --expression="login = Selits" --comment users.rec

The --comment argument deletes the record by commenting it out in the file, so it can be found or restored later.

Multiple Tables

Now that we’ve settled our users, we can give the users something to work with, like favorite websites. We’ll keep this website.rec schema simpler than our users.

%rec: website
%doc: Favorite websites submitted to the database.

%type: id int
%type: url line
%type: owner rec user
%type: created_at: date
%type: updated_at: date

%key: id
%mandatory: url owner

%auto: id created_at updated_at

The important field, here, sets up a foreign key:

%type: owner rec user

That is, owner is a reference to a record of type user. Because the user record type’s key is an integer, the owner field is an integer, as well.

So, we have an example record.

id: 2031
url: https://www.gnu.org/software/recutils/manual/Foreign-Keys.html#Foreign-Keys
owner: 7760
created_at: 2020-02-05 06:48:55-0500
updated_at: 2020-02-05 06:48:55-0500

Joining Tables

Now, we get to the important stuff. We can now get a list of URLs and the name of the person who submitted each one. Remember, we can’t explicitly specify multiple files, so we either have a single file or a wildcard.

recsel --type=website --field owner \
  --print-values=url:URL,owner_name:Owner *.rec

And, of course, we could add a query expression to the command, if we wanted, or we could involve additional tables.

Note, incidentally, that the join is an inner join. One of the problems with recutils seems to be that other kinds of joins are not permitted.

Programming Support

In the grand scheme, a database that only allows ad hoc queries isn’t particularly useful. Instead, we need some kind of API that can be used by software.

For programming in C, there’s the librec-dev library package, and a lot of other languages can (indirectly) use that. But I only see bindings available for Python and PHP, with neither library updated in several years.

Also, the library documentation is “go read the source code,” specifically the main header file. Apparently, the important functions in the library approximately map to…

librec Function	recutils Utility
`rec_db_query`	`recsel`
`rec_db_insert`	`recins`
`rec_db_delete`	`recdel`
`rec_db_set`	`recset`
`rec_int_check_db`	`recfix --check`

That’s not spectacular, but still has some potential to improve, at least. Maybe in a few weeks, when other things settle down, I’ll revisit recutils specifically to document the library.

Other Capabilities

This just scratches the surface, honestly. Queries can be grouped. Aggregate functions are available. Subtypes are available, such as strings that conform to a regular expression or integers in a limited range, and can be named with %typedef. Records can have compound fields or be sorted. There are more encryption options than just %confidential. Output can be formatted with templates. Records can be updated. The recfix program performs diagnostics and repairs some issues.

In other words, it’s a sophisticated system with a lot going for it, despite the handful of problems I’ve called out in the preceding discussion. It won’t replace CSV files for my simple data sets, but for anything more complicated than that (other than calculation, which still needs a lightweight solution that doesn’t have the overhead of a typical programming language), it looks pretty good!

A good example of where I’ll start using it soon, for the Silver Bat stories (more in that universe are definitely coming), I currently just have a Markdown file that lists all the characters. It includes their profiles from my profile generator, plus a brief biography and possible plot threads. When the time comes to share that information with other people, it would be nice to manage that data better and also turn it into a kind of concordance to be able to see where all the characters have been used.

For much more information, you can read through the documentation. It’s not particularly friendly to newcomers who don’t already understand how things work—which, in theory, is nobody who got to the end of this post—but it’s extensive enough to answer most questions that aren’t related to the library.

Bonus: SQLite Export

Since I wanted to get up and running quickly with data with recutils, I decided to use a database I was already working with to populate my records. SQLite, which is where the data is, has a mildly clunky way of exporting tables to CSV files.

First, get into the SQLite shell.

sqlite3 database.sqlite3

Then, decide what you want to export and go for it.

.tables # <- This lists all tables
# Now that we know what tables are available,
# we can export them.
.headers on
.mode csv
.output users.csv
SELECT * FROM users;
# The table and file names don't need to match.
.output websites.csv
SELECT * FROM websites;
# If you don't issue a new .output directive,
# all your data will end up in the same file,
# which is going to make everything else more
# complicated than it needs to be.
.quit

Now, we have files that can be converted to record sets with csv2rec as described above.

Credits: The header image is work, technology, vintage, wheel, retro, clock, by an anonymous PxHere photographer, is made available under the terms of the CC0 1.0 Universal Public Domain Dedication.

By commenting, you agree to follow the blog's Code of Conduct and that your comment is released under the same license as the rest of the blog. Or do you not like comments sections? Continue the conversation in the #entropy-arbitrage chatroom on Matrix…

Get monthly ^* updates on Entropy Arbitrage posts, additional reading of interest, thoughts that are too short/personal/trivial for a full post, and previews of upcoming projects, delivered right to your inbox. I won’t share your information or use it for anything else. But you might get an occasional discount on upcoming services.
Or…	Mailchimp 🐒 seems less trustworthy every month, so you might prefer to head to my Buy Me a Coffee ☕ page and follow me there, which will get you the newsletter three days after Mailchimp, for now. Members receive previews, if you feel so inclined.
Email Address
First Name
Last Name
Email Format	html text


* Each issue of the newsletter is released on the Saturday of the Sunday-to-Saturday week including the last day of the month.
Can’t decide? You can read previous issues to see what you’ll get.