Joey
typed pipes in every shell

PowerShell and nushell take Unix piping beyond raw streams of text to structured or typed data. Is it possible to keep a traditional shell like bash and still get typed pipes?

I think it is possible, and I'm now surprised no one seems to have done it yet. This is a fairly detailed design for how to do it. I've not implemented it yet. RFC.

Let's start with a command called typed. You can use it in a pipeline like this:

typed foo | typed bar | typed baz

What typed does is discover the types of the commands to its left and its right, while communicating the type of the command it runs back to them. Then it checks if the types match, and runs the command, communicating the type information to it. Pipes are unidirectional, so it may seem hard to discover the type to the right, but I'll explain how it can be done in a minute.

Now suppose that foo generates json, and bar filters structured data of a variety of types, and baz consumes csv and pretty-prints a table. Then bar will be informed that its input is supposed to be json, and that its output should be csv. If bar didn't support json, typed foo and typed bar would both fail with a type error.
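
Here is a minimal sketch of that check, assuming types are written as alternatives separated by "|" (as in "JSON | CSV") and that the first type both sides support wins. The function names are hypothetical, and the real type syntax is still undecided.

```python
def parse_alternatives(spec):
    """Split a spec such as "JSON | CSV" into a list of type names, in order."""
    return [name.strip() for name in spec.split("|") if name.strip()]


def negotiate(producer_output, consumer_input):
    """Pick the first output type of the producer that the consumer accepts.

    Raises TypeError when the two sides share no type, which is the
    "type error" case described in the text.
    """
    accepted = parse_alternatives(consumer_input)
    for candidate in parse_alternatives(producer_output):
        if candidate in accepted:
            return candidate
    raise TypeError(
        f"type error: producer offers {producer_output!r}, "
        f"consumer expects {consumer_input!r}"
    )


# The pipeline from the text: foo emits JSON, bar accepts and emits
# JSON or CSV, baz consumes CSV.
bar_input = negotiate("JSON", "JSON | CSV")
bar_output = negotiate("JSON | CSV", "CSV")
```

With these inputs bar is told to read JSON and write CSV. A producer whose type is not among bar's alternatives raises the type error instead.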

Writing "typed" in front of everything is annoying. But it can be made a shell alias like "t". It's also possible to wrap programs using typed:

cat >~/bin/foo <<EOF
#!/usr/bin/typed /usr/bin/foo
EOF
chmod +x ~/bin/foo

Or a program could import a library that uses typed, so it natively supports being used in typed pipelines. I'll explain one way to make such a library later on, once some more details are clear.

Which gets us back to a nice simple pipeline, now automatically typed.

foo | bar | baz

If one of the commands is not actually typed, the other ones in the pipe will treat it as having a raw stream of text as input or output. Which will sometimes result in a type error (yay, I love type errors!), but in other cases can do something useful.

find | bar | baz
# type error, bar expected json or csv

foo | bar | less
# less displays csv 

So how does typed discover the types of the commands to the left and right? That's the hard part. It has to start by finding the pids to its left and right. There is no really good way to do that, but on Linux, it can be done: Look at what /proc/self/fd/0 and /proc/self/fd/1 link to; the link targets contain the unique identifiers of the pipes. Then look at other processes' fd/0 and fd/1 to find matching pipe identifiers. (It's also possible to do this on macOS, I believe. I don't know about BSDs.)
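
A rough sketch of that lookup on Linux, where a pipe's fd link reads like "pipe:[12345]". The function names and the proc_root parameter (there only so the code can be pointed at a fake directory) are my own assumptions, not part of the design.

```python
import os
import re

PIPE_LINK = re.compile(r"^pipe:\[(\d+)\]$")


def pipe_inode(link_target):
    """Return the inode number from a link target like "pipe:[12345]", else None."""
    match = PIPE_LINK.match(link_target)
    return int(match.group(1)) if match else None


def fd_pipe_inode(pid, fd, proc_root="/proc"):
    """Return the pipe inode behind a process's fd, or None if it is not a
    pipe, the process is gone, or we lack permission to look."""
    try:
        target = os.readlink(os.path.join(proc_root, str(pid), "fd", str(fd)))
    except OSError:
        return None
    return pipe_inode(target)


def find_neighbours(my_pid, candidate_pids, proc_root="/proc"):
    """Find the pids on the other ends of our stdin and stdout pipes.

    The process to our left has our stdin pipe as its fd 1; the process
    to our right has our stdout pipe as its fd 0.
    """
    my_stdin = fd_pipe_inode(my_pid, 0, proc_root)
    my_stdout = fd_pipe_inode(my_pid, 1, proc_root)
    left = right = None
    for pid in candidate_pids:
        if pid == my_pid:
            continue
        if my_stdin is not None and fd_pipe_inode(pid, 1, proc_root) == my_stdin:
            left = pid
        if my_stdout is not None and fd_pipe_inode(pid, 0, proc_root) == my_stdout:
            right = pid
    return left, right
```

A side that is not a pipe at all (a terminal or a redirected file) simply yields None for that neighbour.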

Searching through all processes would be a bit expensive (around 15 ms with an average number of processes), but there's a nice optimisation: The shell will have started the processes close together in time, so the pids are probably nearby. So look at the previous pid, and the next pid, and fan outward. Also, check isatty to detect the beginning and end of the pipeline and avoid scanning all the processes in those cases.
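
The fan-out order can be sketched like this. It's only an illustration: the names are made up, the upper bound would in practice come from something like /proc/sys/kernel/pid_max, and pid wrap-around is ignored.

```python
def fan_out(pid, max_pid, max_distance=None):
    """Yield candidate pids by increasing distance: pid-1, pid+1, pid-2, ...

    Candidates outside 1..max_pid are skipped. Without max_distance the
    generator eventually covers every other pid in range.
    """
    limit = max_pid if max_distance is None else max_distance
    for distance in range(1, limit + 1):
        for candidate in (pid - distance, pid + distance):
            if 1 <= candidate <= max_pid:
                yield candidate


def sides_to_search(stdin_is_tty, stdout_is_tty):
    """Return (search_left, search_right).

    A tty on stdin means nothing is to our left; a tty on stdout means
    nothing is to our right, so that side needs no scan.
    """
    return (not stdin_is_tty, not stdout_is_tty)
```

A caller would feed fan_out's candidates into the pipe matching and stop as soon as both needed neighbours are found, passing os.isatty(0) and os.isatty(1) to sides_to_search first.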

To indicate the type of the command it will run, typed simply opens a file with an extension of ".typed". The file can be located anywhere, and can be an already existing file, or can be created as needed (eg in /run). Once it discovers the pid at the other end of a pipe, typed first looks at /proc/$pid/cmdline to see if it's also running typed. If it is, it looks at its open file handles to find the first ".typed" file. It may need to wait for the file handle to get opened, which is why it needs to verify the pid is running typed.
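
A sketch of that neighbour inspection, assuming "first" means the lowest-numbered file descriptor and that the neighbour's argv[0] is a program named typed. Both are my reading of the design; the function names and the proc_root parameter are hypothetical.

```python
import os


def runs_typed(pid, proc_root="/proc"):
    """True if the process's argv[0] is a program named "typed"."""
    try:
        with open(os.path.join(proc_root, str(pid), "cmdline"), "rb") as handle:
            argv = handle.read().split(b"\0")
    except OSError:
        return False
    return os.path.basename(argv[0]) == b"typed"


def first_typed_file(pid, proc_root="/proc"):
    """Return the target of the lowest-numbered open fd whose path ends in
    ".typed", or None if the process has not opened one (yet)."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    try:
        fds = sorted(int(name) for name in os.listdir(fd_dir) if name.isdigit())
    except OSError:
        return None
    for fd in fds:
        try:
            target = os.readlink(os.path.join(fd_dir, str(fd)))
        except OSError:
            continue  # the fd was closed while we were looking
        if target.endswith(".typed"):
            return target
    return None
```

When runs_typed is true but first_typed_file returns None, the neighbour has probably not opened its file yet, so the caller would retry after a short sleep.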

There also needs to be a way for typed to learn the type of the command it will run. Reading /usr/share/typed/$command.typed is one way. Or it can be specified at the command line, which is useful for wrapper scripts:

cat >~/bin/bar <<EOF
#!/usr/bin/typed --type="JSON | CSV" --output-type="JSON | CSV" /usr/bin/bar
EOF
chmod +x ~/bin/bar

And typed communicates the type information to the command that it runs. This way a command like bar can know what format its input should be in, and what format to use as output. This might be done with environment variables, eg INPUT_TYPE=JSON and OUTPUT_TYPE=CSV.
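
To illustrate, here's how a command like bar might honour those variables. This is a sketch under assumptions: both variables are set, the data is a list of flat records, and only JSON and CSV are handled. The function names are invented for the example.

```python
import csv
import io
import json
import os


def read_records(text, input_type):
    """Parse input text into a list of dicts according to the declared type."""
    if input_type == "JSON":
        return json.loads(text)
    if input_type == "CSV":
        return list(csv.DictReader(io.StringIO(text)))
    raise TypeError(f"unsupported input type: {input_type}")


def write_records(records, output_type):
    """Serialise a list of dicts according to the requested output type."""
    if output_type == "JSON":
        return json.dumps(records)
    if output_type == "CSV":
        out = io.StringIO()
        fields = list(records[0]) if records else []
        writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
        return out.getvalue()
    raise TypeError(f"unsupported output type: {output_type}")


def convert(text, environ=os.environ):
    """Pass data through, changing format as the environment dictates."""
    records = read_records(text, environ["INPUT_TYPE"])
    return write_records(records, environ["OUTPUT_TYPE"])
```

A real bar would do its filtering between the read and the write; the point here is only that the formats come from the environment rather than from command-line flags.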

I think that's everything typed needs, except for the syntax of types and how the type checking works. Which I should probably not try to think up off the cuff. I used Haskell ADT syntax in the example above, but don't think that's necessarily the right choice.

Finally, here's how to make a library that lets a program natively support being used in a typed pipeline. It's a bit tricky, because the program has to be run by typed, since typed checks /proc/$pid/cmdline as detailed above. So, check an environment variable. When it's not set yet, set it, and exec typed, passing it the path to the program, which it will re-exec. This should be done before the program does anything else.
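
A minimal sketch of such a library. The guard variable name TYPED_WRAPPED is made up (the design doesn't pick one), /usr/bin/typed is taken from the wrapper examples, and the program is assumed to be directly executable so typed can re-exec it.

```python
import os
import sys

GUARD = "TYPED_WRAPPED"  # hypothetical name; the design does not specify one


def typed_exec_argv(environ, program, args, typed_path="/usr/bin/typed"):
    """Return the argv to exec typed with, or None if already wrapped.

    Sets the guard variable in environ as a side effect, so the re-exec'd
    program sees it and does not loop.
    """
    if environ.get(GUARD):
        return None
    environ[GUARD] = "1"
    return [typed_path, program] + list(args)


def ensure_typed():
    """Call first thing in main(): re-exec under typed unless already wrapped."""
    argv = typed_exec_argv(os.environ, os.path.abspath(sys.argv[0]), sys.argv[1:])
    if argv is not None:
        os.execv(argv[0], argv)  # does not return on success
```

A program would call ensure_typed() before touching stdin or stdout. On the second pass the guard is set and the call returns immediately.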


This work was sponsored by Mark Reidenbach on Patreon.

git-annex devblog (Joey devblog)
day 639 major keys database milestone

I've fallen completely out of practice on this dev blog, but I felt I had to mention a major milestone accomplished over the past week. The database that git-annex maintains about keys and worktree files used to be guaranteed up-to-date only for unlocked files; it had no information about locked files. Now it does, and it's kept up-to-date automatically and (I hope) efficiently.

That let a long-standing bug get fixed, where when 2 files used the same key, the preferred content expression could match one file and not the other and cause get/drop to happen over and over.

But there are probably a lot of other ways this database could be used, now that it's fully available. For example, it would be easy to write a git-annex command that queries for which worktree files use a key, without needing to scan the whole worktree to find them.

Joey
the end of the olduse.net exhibit

Ten years ago I began the olduse.net exhibit, spooling out Usenet history in real time with a 30 year delay. My archive has reached its end, and ten years is more than long enough to keep running something you cobbled together overnight way back when. So, this is the end for olduse.net.

The site will continue running for another week or so, to give you time to read the last posts. Find the very last one, if you can!

The source code used to run it, and the content of the website, have themselves been archived for posterity at The Internet Archive.

Sometime in 2022, a spammer will purchase the domain, but not find it to be of much value.

The Utzoo archives that underlay it have currently sadly been censored off the Internet by someone. This will be unsuccessful; by now they have spread and many copies will live on.


I told a lie ten years ago.

You can post to olduse.net, but it won't show up for at least 30 years.

Actually, those posts drop right now! Here are the followups to 30-year-old Usenet posts that I've accumulated over the past decade.

Mike replied in 2011 to JPM's post in 1981 on fa.arms-d "Re: CBS Reports"

A greeting from the future: I actually watched this yesterday (2011-06-10) after reading about it here.

Christian Brandt replied in 2011 to schrieb phyllis's post in 1981 on the "comments" newsgroup "Re: thank you rrg"

Funny, it will be four years until you post the first subnet post i ever read and another eight years until my own first subnet post shows up.

Bernard Peek replied in 2012 to mark's post in 1982 on net.sf-lovers "Re: luke - vader relationship"

i suggest that darth vader is luke skywalker's mother.

You may be on to something there.

Martijn Dekker replied in 2012 to henry's post in 1982 on the "test" newsgroup "Re: another boring test message"

trentbuck replied in 2012 to dwl's post in 1982 on the "net.jokes" newsgroup "Re: A child hood poem"

Eveline replied in 2013 to a post in 1983 on net.jokes.q "Re: A couple"

Ha!

Bill Leary replied in 2015 to Darin Johnson's post in 1985 on net.games.frp "Re: frp & artwork"

Frederick Smith replied in 2021 to David Hoopes's post in 1990 on trial.rec.metalworking "Re: Is this group still active?"
