Saturday, March 28, 2009

~/bin

I've started a project on GitHub for my collection of general-purpose shell scripts: the ones I keep in ~/bin on each of my shell accounts. If you have any general-purpose utilities, don't hesitate to fork the project; I'm sure we could collectively build a fantastic set of power tools.


I wrote a new one this week, called xip, that is a shell analog for the zip function in many languages (the name zip is naturally reserved for the pkzip utility). I created this script to join the ranks of diff and comm, programs that take more than one input stream. This comes on the heels of discovering at commandlinefu.com that there's a syntax for subshell fifo replacement (the bash manual calls it process substitution). That is, you can supply a subshell as an argument to a command, and it will be replaced with the name of a file connected to that subshell's output: a /dev/fd entry or a named pipe, depending on the system. Let's take the canonical example:

$ cat a
a
b
c
$ cat b
b
a
$ diff <(sort a) <(sort b)
3d2
< c

To peer under the hood, I used echo.

$ echo <(echo)
/dev/fd/63

Ahah! The stream gets passed as an argument!
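
To make that concrete, here is a minimal sketch in bash. show_path is a made-up helper for illustration, not something from the ~/bin project. It prints the argument it receives and then reads from it like any other file. The exact path printed depends on the system.

```bash
#!/usr/bin/env bash
# show_path is a hypothetical helper: it prints the argument it was handed,
# then reads from that path exactly as it would from a regular file.
show_path() {
    echo "argument: $1"
    cat "$1"
}

# bash replaces <(echo hello) with a path before show_path ever runs.
show_path <(echo hello)
```

The function never learns that a subshell was involved; it only sees a file name, which is why any program that accepts file arguments can be used this way.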

So, this opens up a world of possibilities. Normally you can only work with linear pipelines because the functions or programs only have one input and one output stream, and this limitation has created a dearth of standard utilities for working with multiple input streams. Before discovering this feature, the command line was like a programming language where functions only accepted one argument (and no implicit partial application, smarty-pants). Now I feel like I've discovered bash's secret cow level.

So, to remedy the lack of multi-parameter functions in shell, I started by making xip. It takes any number of file names as arguments and interlaces the lines of their output until one of the streams closes.

$ xip <(echo 1; echo 2) <(echo a; echo b)
1
a
2
b
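
For readers who want a feel for how such a tool can work, here is a rough sketch as a bash function. It is not the xip script from the repository: the name xip_sketch is made up, it assumes bash 4.1 or later for the {fd} redirection syntax, and it makes one design choice of its own, which is to print only complete rounds, so a line is dropped if a later stream has already run dry.

```bash
#!/usr/bin/env bash
# xip_sketch: hypothetical reimplementation, NOT the script from the repository.
# Interleaves lines from each file argument until any one of them runs out.
xip_sketch() {
    local fds=() round=() fd file line
    [ "$#" -gt 0 ] || return 0

    # Open every argument on its own descriptor so each keeps its own position.
    for file in "$@"; do
        exec {fd}<"$file" || return 1
        fds+=("$fd")
    done

    # Round-robin: collect one line per stream, stop at the first end of file.
    while :; do
        round=()
        for fd in "${fds[@]}"; do
            IFS= read -r line <&"$fd" || break 2
            round+=("$line")
        done
        printf '%s\n' "${round[@]}"
    done

    # Close the descriptors opened above.
    for fd in "${fds[@]}"; do
        exec {fd}<&-
    done
}

xip_sketch <(echo 1; echo 2) <(echo a; echo b)
```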

You can then pipe that to a while read loop, or to xargs -n 2, to create a table. This example enumerates the lines of a file. It uses seq, which is what Linux provides; on BSD, use jot instead.

$ xip <(seq `cat a | wc -l`) a | xargs -n 2
1 a
2 b
3 c
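
The while read variant can be sketched like this. pair_lines is a made-up name, and the printf input stands in for the interleaved output of xip. Unlike xargs, which splits its input on whitespace and interprets quotes, this loop keeps each line intact.

```bash
#!/usr/bin/env bash
# pair_lines: hypothetical helper that reads stdin two lines at a time
# and prints each pair as "first<TAB>second".
pair_lines() {
    local first second
    while IFS= read -r first && IFS= read -r second; do
        printf '%s\t%s\n' "$first" "$second"
    done
}

# Stand-in for interleaved input such as: xip <(seq 3) a
printf '1\na\n2\nb\n3\nc\n' | pair_lines
```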

I suppose the next fun trick is producing multiple output streams, with something like tee and mkfifo. I leave this as an exercise for the reader.
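
One possible shape for that exercise, offered as a sketch rather than a finished tool: branch_demo is a made-up function that feeds its input to two consumers through named pipes. The two consumers (tr and wc) and the temporary file names are arbitrary choices for illustration.

```bash
#!/usr/bin/env bash
# branch_demo: hypothetical example; copies stdin to two consumers via named pipes.
branch_demo() {
    local dir
    dir="$(mktemp -d)" || return 1
    mkfifo "$dir/upper" "$dir/count"

    # Consumers run in the background; each blocks until tee opens its pipe.
    tr '[:lower:]' '[:upper:]' < "$dir/upper" > "$dir/upper.out" &
    wc -l < "$dir/count" > "$dir/count.out" &

    # tee copies stdin into both pipes; its own stdout is discarded.
    tee "$dir/upper" "$dir/count" > /dev/null
    wait

    # Show what each branch produced, then clean up.
    cat "$dir/upper.out" "$dir/count.out"
    rm -rf "$dir"
}

printf 'a\nb\nc\n' | branch_demo
```

bash also has an output-side counterpart to <(command), written >(command), so something like tee >(first) >(second) can branch a stream without creating the pipes by hand.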


I've also included some of my older scripts from back in the days when I was working exclusively on Linux and used mpg123 to play my music. mpg123 is a command-line music player, and it doesn't really have a playlist system built in (for that there are alternatives, but I digress). So, I used a pipeline to generate my playlist stream. cycle, shuffle, and enquote are in the GitHub ~/bin project.

$ find . -name '*.mp3' \
	| cycle \
	| shuffle `find . -name '*.mp3' | wc -l` \
	| enquote \
	| xargs -n 1 mpg123

2 comments:

Unknown said...

Producing multiple output streams is fully supported in shell (Bourne, Korn, bash) with external call-outs like tee. Simply use the exec keyword to open/close file descriptors.

Kris Kowal said...

@Wes, perhaps you can provide an example. I'm familiar with "tee" but not familiar with how, in conjunction with "exec", this can be harnessed to create branches in a pipeline, which is what I presume you mean to address.