- Database Basics from Apr 5, 2020, 8:55am
Once in a while, I have a small problem with some regularity that has an easy solution. Traditionally, I write a note to myself, somewhere, and then need to hunt down the note when I need it next time. Since that’s inefficient, whenever I have a few of these, I’m going to start using them as short blog posts.
If anybody else finds them useful, that’s great, but it’s at least as important to me to be able to just look through the https://john.colagioia.net/blog/tag/techtips summary lines for what I want, instead of needing to sift through my notes with hundreds of other random ideas, assuming it made it to my specific note-taking app. And it’s also now a fixed URL that I can point other people to, instead of copying and pasting into e-mails.
Recutils, in General
When I store information, I’m a big fan of lightweight, text-based formats for a variety of reasons.
- They store and compress well, when that’s important.
- It’s possible to quickly inspect or edit whatever I’m working on.
- I can quickly search anything I’m working on with a tool like
- I’m not tied to specific applications.
- It’s easier to convert to another format, making it closer to future-proof.
For example, this blog, my novel, and my personal notes are generally all written as Markdown files. I store most of my spreadsheets as sets of CSV files, unless the formatting or calculated cells are important. In the rare cases I need something like a flow chart, it’s in pic or—increasingly—Mermaid format.
So, I got a bit excited, this past week, when I discovered recutils, a system to manage plain-text databases. It’s not perfect (unless I’m missing something important), but it’s definitely close to the sort of thing I would generally reach for.
One of the clunkier aspects to recutils is that it’s difficult (impossible?) to work with an arbitrary subset of files. So, there seems to be two options:
- Put all the record types into one file, or
- Separate the record types into different files, while specifying all of them with a wildcard (such as “
*.rec”) whenever working in situations that require more than one record type, like a Join.
Neither of these is ideal, of course, but the single file is probably acceptable for small databases and the latter is probably the most intuitive. I’ll use the multiple-file approach, here.
Obviously, recutils doesn’t actually care either way, and it’s possible to combine the two, so that record types that are often used together can share a file, while the other clusters of types or individual types can be in their own files. I’m not really sure that helps, though.
While it’s not strictly necessary to have a schema for records to conform to, it makes life significantly easier, if for no other reason than documentation. So, the header of a file of records, optionally, shows the types of each field, keys, and so forth.
It looks like unspecified fields are all arbitrary strings, multi-line with a plus sign (
+) at the beginning of every line after the first. Other than that, if we are looking to store something like application users, we might have something like…
%rec: user %doc: User information # The non-multi-line string fields %type: id int %type: login line %type: email email %type: password line %type: name line %type: created_at: date %type: updated_at: date %type: role enum Guest User Moderator Administrator %type: deleted bool # Roles for fields; note that we sneak in "profile," # which is our multi-line string. %key: id %mandatory: login email password %allowed: name profile created_at updated_at role deleted # We could also %prohibit fields we want to ensure doesn't # exist until we're ready to set it. # Special restrictions %auto: id created_at updated_at %confidential: password %unique: login email
That’s probably more extensive than it needs to be for an example, but we see the record type name (
%rec), decent coverage of the available types (
bool, the different kinds of fields (other than
prohibited), and constraints (
There’s one useful type not shown here, that I’ll get to in a bit.
user record, now, is going to need to have our specified fields and the right types, all in
name: value format. For example, I created the following user via the Fake Name Generator.
id: 7760 login: Brouch email: VanessaJStolz@dayrep.com password: $2a$11$WBkGS/r5/0tn2muqPXreVO40wy6dxdEzKj/lY5.y7BQ5YoOHKerNW name: Vanessa J. Stolz profile: A 78-year-old Libra from Atlanta, Georgia, who works + for De Pinna as a machine feeder and drives a blue 2000 + Toyota Pod. created_at: 2020-02-05 06:48:55-0500 updated_at: 2020-02-05 06:48:55-0500 role: Moderator deleted: False
Beyond just editing our
users.rec file manually, we have a few options to populate our file with data.
And no, I’ve never heard of the Toyota Pod before now. It came generated with the fictional person’s dossier. It was a concept car that appears to be the automotive equivalent of the early smart refrigerators that nobody would really want, but is easy to explain to people.
As an aside, recutils doesn’t care about extra fields. So, if you want to add a birthday, significant other’s name, and hat size to Vanessa’s record above, but not anybody else’s, that’s probably going to be somewhat difficult to work with, but not destructive.
If there’s a pre-existing data source, recutils comes with two utilities to convert from common database types:
csv2recturns each line of a CSV file into a record, with the header line providing the field names.
mdb2recconverts a Microsoft Access database (or a single table) into records.
I can confirm that the CSV program works well, but couldn’t find an Access database to test.
recins utility, probably unsurprisingly, inserts records. Continuing with our users, it can be used something like…
recins --type=user --field=login --value=Gode1962 \ --field=email --value=PhillipSCollins@teleworm.us \ --field=password --value=password \ --field=name --value="Phillip S. Collins" \ --field=profile --value="Suburban driver of Dodge Rams" \ --field=role --value=User \ --field=deleted --value=False \ --verbose users.rec
Assuming that none of the constraints have been violated, here, this will add the record to the
users.rec file. If it fails, the
--verbose argument will produce an error message explaining what went wrong.
We can get some basic information, now. The following gives us a count of each record type in the file.
The difference between that and retrieving the schema is a matter of an argument.
recinf --descriptor users.rec
And then we can check into more general queries of the data, though the syntax isn’t SQL. We can pick a random set of records.
recsel --random 5 users.rec
We can get a “projection” of all data by specifying columns. The columns can also be renamed for convenience, which will be important, later.
recsel --print-values=login,email users.rec recsel --print-values=login:Username,email:EMail users.rec
--print-values with just
name: value pairs. Also, if there are multiple types of records, we would need to specify
We can also run arbitrary queries.
recsel --expression="id = 23573" users.rec recsel --expression="email ~ gmail.com" users.rec
The expressions can get fairly sophisticated.
Now that we understand insertion and query expressions, we should be able to figure out how to delete a record.
recdel --type=user --expression="login = Selits" --comment users.rec
--comment argument deletes the record by commenting it out in the file, so it can be found or restored later.
Now that we’ve settled our users, we can give the users something to work with, like favorite websites. We’ll keep this
website.rec schema simpler than our users.
%rec: website %doc: Favorite websites submitted to the database. %type: id int %type: url line %type: owner rec user %type: created_at: date %type: updated_at: date %key: id %mandatory: url owner %auto: id created_at updated_at
The important field, here, sets up a foreign key:
%type: owner rec user
owner is a reference to a record of type
user. Because the
user record type’s key is an integer, the
owner field is an integer, as well.
So, we have an example record.
id: 2031 url: https://www.gnu.org/software/recutils/manual/Foreign-Keys.html#Foreign-Keys owner: 7760 created_at: 2020-02-05 06:48:55-0500 updated_at: 2020-02-05 06:48:55-0500
Now, we get to the important stuff. We can now get a list of URLs and the name of the person who submitted each one. Remember, we can’t explicitly specify multiple files, so we either have a single file or a wildcard.
recsel --type=website --field owner \ --print-values=url:URL,owner_name:Owner *.rec
And, of course, we could add a query expression to the command, if we wanted, or we could involve additional tables.
Note, incidentally, that the join is an inner join. One of the problems with recutils seems to be that other kinds of joins are not permitted.
In the grand scheme, a database that only allows ad hoc queries isn’t particularly useful. Instead, we need some kind of API that can be used by software.
For programming in C, there’s the librec-dev library package, and a lot of other languages can (indirectly) use that. But I only see bindings available for Python and PHP, with neither library updated in several years.
Also, the library documentation is “go read the source code,” specifically the main header file. Apparently, the important functions in the library approximately map to…
|librec Function||recutils Utility|
| || |
| || |
| || |
| || |
| || |
That’s not spectacular, but still has some potential to improve, at least. Maybe in a few weeks, when other things settle down, I’ll revisit recutils specifically to document the library.
This just scratches the surface, honestly. Queries can be grouped. Aggregate functions are available. Subtypes are available, such as strings that conform to a regular expression or integers in a limited range, and can be named with
%typedef. Records can have compound fields or be sorted. There are more encryption options than just
%confidential. Output can be formatted with templates. Records can be updated. The
recfix program performs diagnostics and repairs some issues.
In other words, it’s a sophisticated system with a lot going for it, despite the handful of problems I’ve called out in the preceding discussion. It won’t replace CSV files for my simple data sets, but for anything more complicated than that (other than calculation, which still needs a lightweight solution that doesn’t have the overhead of a typical programming language), it looks pretty good!
A good example of where I’ll start using it soon, for the Silver Bat stories (more in that universe are definitely coming), I currently just have a Markdown file that lists all the characters. It includes their profiles from my profile generator, plus a brief biography and possible plot threads. When the time comes to share that information with other people, it would be nice to manage that data better and also turn it into a kind of concordance to be able to see where all the characters have been used.
For much more information, you can read through the documentation. It’s not particularly friendly to newcomers who don’t already understand how things work—which, in theory, is nobody who got to the end of this post—but it’s extensive enough to answer most questions that aren’t related to the library.
Bonus: SQLite Export
Since I wanted to get up and running quickly with data with recutils, I decided to use a database I was already working with to populate my records. SQLite, which is where the data is, has a mildly clunky way of exporting tables to CSV files.
First, get into the SQLite shell.
Then, decide what you want to export and go for it.
.tables # <- This lists all tables # Now that we know what tables are available, # we can export them. .headers on .mode csv .output users.csv SELECT * FROM users; # The table and file names don't need to match. .output websites.csv SELECT * FROM websites; # If you don't issue a new .output directive, # all your data will end up in the same file, # which is going to make everything else more # complicated than it needs to be. .quit
Now, we have files that can be converted to record sets with
csv2rec as described above.
Tags: techtips recutils linux sqlite export