Sunday, December 6, 2009

Object: Container or Type

JavaScript, in a vain attempt to make the language simpler, conflates the concerns of the Object system and the lookup-table. "objects" in JavaScript serve as both instances of types and as key to value mappings, but only for string keys. Object-literals can be used as lookup tables, instances of types, or both at the same time. The "Object" constructor itself serves as both a lookup table base-type, and the base-type of all types.

The principle advantage of using objects as types and as lookups is a reduction in syntax. Many languages have two separate notations for dealing with properties and keys. Keys get the brackets; properties get the dot notation. But dot notation does not provide a facility for parameterized properties names. The easy solution was to just use brackets for both properties and keys.

object[propertyName]

Many other languages separate these concerns. The principle advantage of separating these concerns is that a lookup-table object needs to have two key domains, that of its type and that of its contents. When these domains are conflated, neither can express the full range of potential keys. Their key-spaces collide.

Furthermore, the domain of object properties should be more restricted than that of a lookup table. In the former you want all keys to be valid symbols. In the latter, you want keys to be any reference and any string. Instead, Objects-used-as-lookups can only use some Strings: those that do not collide with methods, unless you're really careful.

Which brings me to my thesis. My point is not that JavaScript should be fixed; there is no technically viable solution to that problem, and using another language isn't always a solution. My point is that we have to be really careful. Objects can be safely used as lookup tables for the full range of at least Strings. In order to do so, you have to avoid using them as instances. That means you can't call their function properties (member functions). To do so would be to assume that the member function name is an invalid entry in the lookup key-space. You cannot enforce that restriction without peril.

So, to use an Object as a lookup table, you must only use the "owned" properties of the "Object". By convention, any function in its prototype chain must be treated as a method of the type, not contents of the lookup table. This distinction is useful in determining whether a property is a member function or contents of a lookup type.

has

return Object.prototype.hasOwnProperty.call(mapping, key)

get

if (Object.prototype.hasOwnProperty.call(mapping, key)
	return mapping[key];

set

mapping[key] = value;

getset

if (!Object.prototype.hasOwnProperty.call(mapping, key)
	mapping[key] = value;
return mapping[key];

del

delete mapping[key];

The complete and hideous Object.prototype.hasOwnProperty.call(mapping, key) instead of the polymorphic mapping.hasOwnProperty(mapping, key) is draconian but enables "hasOwnProperty" to be a key in the container space. Some would argue that this particular value is not worth the effort, and that a polymorphic "hasOwnProperty" is useful in creating Object-as-lookup-and-as-subtypes. If you can validate your key space, it might be an optimization you can use. However, if you are writing generic code to operate on objects that may have been crafted by suspect users, this is not a luxury you can afford. If you want polymorphic types, use a polymorphic type.

To that end, I propose that you make or find a polymorphic collection type. These are easy to define. We do not have the luxury of creating hash tables in JavaScript since there is no good hashing solution for arbitrary objects, but we do have "toString". We can use "toString" as a hash function and arrays as collision buckets. Then, we can wrap the "internal" Object of Arrays with polymorphic "get", "set", "has", "getset", "del", "put", and "len" property functions in the type name space. Chiron defines sets and dictionaries in this fashion.

Narwhal has a util module that exports top-level functions by those names that will operate, via their first argument, on either objects-as-mappings or objects-as-instances generically. It distinguishes name-as-key from name-as-method by checking whether it is an owned property. So, an object literal that happens to be tracking whether it has encountered the "get" method name in a collection of instances would own a "get" property, but an instance that has a "get" method that mediates some crazy internal storage mechanism would not own its "get" property, it would be in the prototype chain.

To this end, I also propose that any Crockford-style constructor that returns an object-as-instance should use the new ECMAScript 5 "Object.create(self)" idiom so that its member functions can be distinguished from object-literal contents.

Saturday, March 28, 2009

~/bin

I've started a project on github for my collection of general-purpose shell scripts: the ones I keep in ~/bin on each of my shell accounts. If you have any general purpose utilities, don't hesitate to fork the project; I'm sure we could collectively build a fantastic set of power tools.


I wrote a new one this week, called xip, that is a shell analog for the zip function in many languages (the name zip is naturally reserved for the pkzip utility). I created this script to join the ranks of diff and comm, all functions that benefit from multiple input streams. This comes on the heels of discovering at commandlinefu.com that there's a syntax for subshell fifo replacement. That is, you can supply a subshell as an argument to a command, and it will be replaced with the file name of a named pipe. Let's take the the canonical example:

$ cat a
a
b
c
$ cat b
b
a
$ diff <(sort a) <(sort b)
3d2
< c

To peer under the hood, I used echo.

$ echo <(echo)
/dev/fd/63

Ahah! The stream gets passed as an argument!

So, this opens up a world of possibilities. Normally you can only work with linear pipelines because the functions or programs only have one input and one output stream, and this limitation has created a dearth of standard utilities for working with multiple input streams. Before discovering this feature, the command line was like a programming language where functions only accepted one argument (and no implicit partial application, smarty-pants). Now I feel like I've discovered bash's secret cow level.

So, to remedy the lack of multi-parameter functions in shell, I started by making xip. It takes any number of file names as arguments and interlaces the lines of their output until one of the streams closes.

$ xip <(echo 1; echo 2) <(echo a; echo b)
1
a
2
b

You can then pipe that to a while read loop, or xargs -n 2 loop, to create a table. This example enumerates the lines of a file (jot for BSD, seq for Linux).

$ xip <(seq `cat a | wc -l`) a | xargs -n 2
1 a
2 b
3 c

I suppose the next fun trick is producing multiple output streams, with something like tee and mkfifo. I leave this as an exercise for the reader.


I've also included some of my older scripts from back in the days when I was working exclusively on Linux and used mpg123 to play my music. mpg123 is a command line music player, and it doesn't really have a playlist system built in (for that there are alternatives, but I digress). So, I used a pipeline to generate my playlist stream. cycle, shuffle, and enquote are in the github ~/bin project.

$ find . -name '*.mp3' \
	| cycle \
	| shuffle `find . -name '*.mp3' | wc -l` \
	| enquote \
	| xargs -n 1 mpg123

Saturday, March 21, 2009

Interoperable JavaScript Modules

This year has begun with a combination of wondrous events in the JavaScript theatre. I've been struggling to promote the idea of module system in JavaScript for several years now. There has been a sudden explosion of progress.

It started with a series of prototypes for module loaders for Tale in college, which eventually developed into modules.js, that over the last few years was refined by the development of the Chiron module library. I presented Chiron at BarCampLA last year, only succeeding to put Dan Kaminsky to sleep. Before I left Apple and the bay area, I introduced myself to Mark Miller from the Google Caja team including Ihab Awad and Mike Samuel and discussed modules over lunch in Mountain View. In May last year, Peter Michaux and I started discussing converging on a common module standard so that our Chiron and XJS libraries could be interoperable, but that effort floundered. However, Peter introduced me to the Helma NG project and Hannes Wallnoefer which has a compatible notion about modules. We converged partially toward a standard in August. Peter was also kind enough to notify me when Ihab Awad started a discussion about modules on the ECMAScript standard discussion list. I met up with the Caja team again in October for a full day to specifically design a module system that was both usable and securable. We discovered a way to make a module system that looked just like any other, but also reused inert module factories for multiple sandboxes, addressing the need for dependency injection.

January this year, Mark Miller sent word to Ihab and me that we were on the agenda for the next ECMAScript committee meeting later that month to present a proposal for adding modules to a future version of JavaScript. After extensive discussion, we nailed down a proposal and Ihab flew down to LA to work on a presentation with me before the meeting. We presented to the committee on the second day and it was received well. The conversation focused on what additional requirements we would need to nail down to actually make the modules secure.

At about the same time, Kevin Dangoor from the Bespin team at Mozilla prompted a massive discussion that attracted a flash crowd of developers around the world who were interested in sharing code among JavaScript implementations outside the browser. One week later, with 224 members, and 653 messages posted, we knew Kevin had struck a nerve.

The group founded the ServerJS project, and among the first common efforts was to converge on a module system. Ihab and I camped out on the list promoting, receiving feedback, and refining a securable module proposal. There are now several efforts to create compliant module loaders for various platforms including Jack (which works on Rhino with Jetty and Simple, and eventually v8cgi among others), a project called JSEng or GPSE to be released eventually by Wes Garland at PageMail, Kris Zyp's Persevere, and of course Chiron. We're working on getting the various platforms passing unit tests and sharing code. I've got about 11KLOCs of Chiron ported to the standard.

Meanwhile, Kevin has hinted that Bespin may eventually have a JavaScript backend running on Jack, which would be an impressive foothold for the eventual JavaScript standard module library.

So, if last year was the year of JavaScript module struggles, this year looks like it will be the year of JavaScript module success.


The technical details are on the Securable Modules wiki page. The general idea is that modules receive a "require" function for getting other modules with both absolute and relative identifiers, an "exports" object which the module shares with other modules, and an "environment" object for modules that use dependency injection, those things that ultimately provide IO in secured sandboxes.

A module would look like:

var file = require('file');
exports.foo = function (bar) {
 return file.File(bar, 'r');
};

Secure module loaders would prevent tampering with the primordials and the global scope by creating module factory functions that receive those three variables under a hermetic bell. A sandbox would be a group of secured singleton modules produced by calling the module factory functions, and sandboxes can create smaller sandboxes and share loaders to improve performance without "leaking" capabilities. If you're used to dependency injection modules, the difference is that the only security boundary is at the sandbox interface, and instead of instantiating modules with an explicit list of its required modules, you inject capabilities in the environment and all modules in that environment are loaded on demand and have access to those capabilities. The hermetic bell is a special evaluator to be provided by the JavaScript engine that runs programs in an alternate transitively frozen global scope.

Enjoy!

Sunday, March 8, 2009

Dict versus dict

Chiron's base module provides both a dict operator and a Dict factory method, as well as List and list, and analogously Set and unique. Dict and dict both accept the same basic types.

dict({a: 10}) =
Dict({a: 10}) =
Dict([["a", 10]])
dict("abc") =
Dict("abc") =
Dict([[0, "a"], [1, "b"], [2, "c"]]) =
dict(["a", "b", "c"]) =
Dict(["a", "b", "c"]) =
Dict(iter("abc"))

In code, the difference is that Dict is a type and dict is an operator. The difference in practice is that dict first checks whether the first argument is a subtype of Base (which includes all objects using the type system), and whether it implements a dict method. If so, it defers to that polymorphic dict method. Otherwise, it defers to Dict.

var base = require('./base');
var test = require('./test');
exports.Foo = base.type(function (self, supr) {
 self.dict = function () {
  return base.Dict({'a': 10});
 };
});
var foo = exports.Foo();
test.assertEq(
 base.dict(foo),
 base.Dict({'a': 10}),
 'dict behaves as a polymorphic operator'
);

The same difference applies to the polymorphic operators unique (that defers to Set if unique is not a member of the type), or list (that defers to List if list is not a member of the type). object and array are also polymorphic operators that work as copy constructors for Object and Array, but also defer to polymorphic object and array members. The default behaviors of object and array are to copy or coerce the argument to an Object or Array, since Object and Array cannot be used as copy constructors themselves. A complete variety of coercions are possible, extending well into the bizarre and insane.

array("abc") =
["a", "b", "c"]
object([1, 2, 3]) =
{'0': 1, '1': 2, '2': 3}