Saturday, March 28, 2009

~/bin

I've started a project on GitHub for my collection of general-purpose shell scripts: the ones I keep in ~/bin on each of my shell accounts. If you have any general-purpose utilities, don't hesitate to fork the project; I'm sure we could collectively build a fantastic set of power tools.


I wrote a new one this week, called xip, that is a shell analog for the zip function in many languages (the name zip is naturally reserved for the pkzip utility). I created this script to join the ranks of diff and comm, commands that benefit from multiple input streams. This comes on the heels of discovering at commandlinefu.com that there's a syntax for replacing a subshell with a fifo, which the bash manual calls process substitution. That is, you can supply a subshell as an argument to a command, and it will be replaced with the name of a file (a named pipe or a /dev/fd entry) from which the command can read the subshell's output. Let's take the canonical example:

$ cat a
a
b
c
$ cat b
b
a
$ diff <(sort a) <(sort b)
3d2
< c

To peer under the hood, I used echo.

$ echo <(echo)
/dev/fd/63

Aha! The stream gets passed to the command as a file name argument!

So, this opens up a world of possibilities. Normally you can only work with linear pipelines because the functions or programs only have one input and one output stream, and this limitation has created a dearth of standard utilities for working with multiple input streams. Before discovering this feature, the command line was like a programming language where functions only accepted one argument (and no implicit partial application, smarty-pants). Now I feel like I've discovered bash's secret cow level.

So, to remedy the lack of multi-parameter functions in shell, I started by making xip. It takes any number of file names as arguments and interlaces the lines of their output until one of the streams closes.

$ xip <(echo 1; echo 2) <(echo a; echo b)
1
a
2
b
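
The interleaving itself is easy to sketch outside the shell. The JavaScript below is only an illustration, not the xip script: the interleave name is made up, arrays stand in for streams, and it assumes output stops at the first stream that runs dry, even in the middle of a round.

```javascript
// Hypothetical helper, not the actual xip script: take one value from each
// iterable in turn and stop as soon as any one of them is exhausted.
function* interleave(...iterables) {
  const iterators = iterables.map((iterable) => iterable[Symbol.iterator]());
  if (iterators.length === 0) return; // nothing to interleave
  while (true) {
    for (const iterator of iterators) {
      const { value, done } = iterator.next();
      if (done) return; // one "stream" closed, so stop everything
      yield value;
    }
  }
}

// Mirrors: xip <(echo 1; echo 2) <(echo a; echo b)
const lines = [...interleave(['1', '2'], ['a', 'b'])];
console.log(lines.join('\n'));
```

With inputs of uneven length this sketch still emits the leading values of the final, incomplete round; a stricter zip would drop them.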

You can then pipe that to a while read loop, or to xargs -n 2, to create a table. This example numbers the lines of a file (use jot on BSD, seq on Linux).

$ xip <(seq `cat a | wc -l`) a | xargs -n 2
1 a
2 b
3 c

I suppose the next fun trick is producing multiple output streams, with something like tee and mkfifo. I leave this as an exercise for the reader.


I've also included some of my older scripts from back in the days when I was working exclusively on Linux and used mpg123 to play my music. mpg123 is a command line music player, and it doesn't really have a playlist system built in (for that there are alternatives, but I digress). So, I used a pipeline to generate my playlist stream. cycle, shuffle, and enquote are in the GitHub ~/bin project.

$ find . -name '*.mp3' \
	| cycle \
	| shuffle `find . -name '*.mp3' | wc -l` \
	| enquote \
	| xargs -n 1 mpg123

Saturday, March 21, 2009

Interoperable JavaScript Modules

This year has begun with a combination of wondrous events in the JavaScript theatre. I've been struggling to promote the idea of a module system in JavaScript for several years now, and there has been a sudden explosion of progress.

It started with a series of prototypes for module loaders for Tale in college, which eventually developed into modules.js, which in turn was refined over the last few years by the development of the Chiron module library. I presented Chiron at BarCampLA last year, succeeding only in putting Dan Kaminsky to sleep. Before I left Apple and the Bay Area, I introduced myself to Mark Miller of the Google Caja team, which includes Ihab Awad and Mike Samuel, and discussed modules over lunch in Mountain View. In May last year, Peter Michaux and I started discussing converging on a common module standard so that our Chiron and XJS libraries could be interoperable, but that effort floundered. However, Peter introduced me to the Helma NG project and to Hannes Wallnoefer, who has a compatible notion of modules. We converged partially toward a standard in August. Peter was also kind enough to notify me when Ihab Awad started a discussion about modules on the ECMAScript standard discussion list. I met up with the Caja team again in October for a full day, specifically to design a module system that was both usable and securable. We discovered a way to make a module system that looked just like any other, but also reused inert module factories for multiple sandboxes, addressing the need for dependency injection.

In January this year, Mark Miller sent word to Ihab and me that we were on the agenda for the next ECMAScript committee meeting later that month to present a proposal for adding modules to a future version of JavaScript. After extensive discussion, we nailed down a proposal and Ihab flew down to LA to work on a presentation with me before the meeting. We presented to the committee on the second day and it was received well. The conversation focused on what additional requirements we would need to nail down to actually make the modules secure.

At about the same time, Kevin Dangoor from the Bespin team at Mozilla prompted a massive discussion that attracted a flash crowd of developers around the world who were interested in sharing code among JavaScript implementations outside the browser. One week later, with 224 members and 653 messages posted, we knew Kevin had struck a nerve.

The group founded the ServerJS project, and one of the first common efforts was to converge on a module system. Ihab and I camped out on the list promoting, receiving feedback on, and refining a securable module proposal. There are now several efforts to create compliant module loaders for various platforms, including Jack (which works on Rhino with Jetty and Simple, and eventually v8cgi among others), a project called JSEng or GPSE to be released eventually by Wes Garland at PageMail, Kris Zyp's Persevere, and of course Chiron. We're working on getting the various platforms passing unit tests and sharing code. I've got about 11 KLOC of Chiron ported to the standard.

Meanwhile, Kevin has hinted that Bespin may eventually have a JavaScript backend running on Jack, which would be an impressive foothold for the eventual JavaScript standard module library.

So, if last year was the year of JavaScript module struggles, this year looks like it will be the year of JavaScript module success.


The technical details are on the Securable Modules wiki page. The general idea is that modules receive a "require" function for getting other modules with both absolute and relative identifiers, an "exports" object which the module shares with other modules, and an "environment" object for modules that use dependency injection, those things that ultimately provide IO in secured sandboxes.

A module would look like:

var file = require('file');
exports.foo = function (bar) {
 return file.File(bar, 'r');
};

Secure module loaders would prevent tampering with the primordials and the global scope by creating module factory functions that receive those three variables under a hermetic bell. The hermetic bell is a special evaluator, to be provided by the JavaScript engine, that runs programs in an alternate, transitively frozen global scope. A sandbox would be a group of secured singleton modules produced by calling the module factory functions, and sandboxes can create smaller sandboxes and share loaders to improve performance without "leaking" capabilities. If you're used to dependency injection modules, the difference is that the only security boundary is at the sandbox interface: instead of instantiating each module with an explicit list of its required modules, you inject capabilities in the environment, and all modules in that environment are loaded on demand and have access to those capabilities.

Enjoy!

Sunday, March 8, 2009

Dict versus dict

Chiron's base module provides both a dict operator and a Dict factory method, as well as List and list, and analogously Set and unique. Dict and dict both accept the same basic types.

dict({a: 10}) =
Dict({a: 10}) =
Dict([["a", 10]])

dict("abc") =
Dict("abc") =
Dict([[0, "a"], [1, "b"], [2, "c"]]) =
dict(["a", "b", "c"]) =
Dict(["a", "b", "c"]) =
Dict(iter("abc"))

In code, the difference is that Dict is a type and dict is an operator. The difference in practice is that dict first checks whether the first argument is a subtype of Base (which includes all objects using the type system), and whether it implements a dict method. If so, it defers to that polymorphic dict method. Otherwise, it defers to Dict.

var base = require('./base');
var test = require('./test');
exports.Foo = base.type(function (self, supr) {
 self.dict = function () {
  return base.Dict({'a': 10});
 };
});
var foo = exports.Foo();
test.assertEq(
 base.dict(foo),
 base.Dict({'a': 10}),
 'dict behaves as a polymorphic operator'
);
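
The dispatch rule can be sketched without Chiron at all. In the stand-alone version below, Base and Dict are simplified stand-ins, not Chiron's implementations: Dict only accepts a plain object and returns a Map, and the type check is reduced to instanceof.

```javascript
// Hypothetical stand-ins for Chiron's Base and Dict; the real ones differ.
class Base {}

// Simplified Dict factory: only handles plain objects in this sketch.
function Dict(object) {
  return new Map(Object.entries(object));
}

// The operator: defer to the value's own dict method when it is a Base
// instance that provides one, otherwise fall back to the Dict factory.
function dict(value) {
  if (value instanceof Base && typeof value.dict === 'function') {
    return value.dict();
  }
  return Dict(value);
}

class Foo extends Base {
  dict() {
    return Dict({ a: 10 });
  }
}

console.log(dict(new Foo()));  // goes through Foo's own dict method
console.log(dict({ b: 20 }));  // falls back to Dict
```

Keeping the operator separate from the factory means a type can change how it converts to a dictionary without callers needing to know.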

The same difference applies to the polymorphic operators unique (which defers to Set if unique is not a member of the type) and list (which defers to List if list is not a member of the type). object and array are also polymorphic operators: they work as copy constructors for Object and Array, but also defer to polymorphic object and array members. The default behavior of object and array is to copy or coerce the argument to an Object or Array, since Object and Array cannot be used as copy constructors themselves. A wide variety of coercions is possible, extending well into the bizarre and insane.

array("abc") =
["a", "b", "c"]
object([1, 2, 3]) =
{'0': 1, '1': 2, '2': 3}
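
For comparison, the two coercions above can be approximated with built-ins alone. This is a sketch of the default behavior only: toArray and toObject are made-up names, and Chiron's array and object operators also handle polymorphic members, which this skips.

```javascript
// Hypothetical approximations of the default coercions, using only built-ins.
function toArray(value) {
  return Array.from(value); // copies any iterable or array-like into a new Array
}

function toObject(value) {
  return Object.assign({}, value); // copies own enumerable properties
}

console.log(toArray('abc'));
console.log(toObject([1, 2, 3]));
```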