Thursday, February 21, 2008

Unicode

What I've learned today:

For a program or library to operate in a Unicode-compatible fashion, all of its internal strings must be Unicode. All input strings must be decoded into Unicode as soon as they arrive, and all output strings must be encoded out of Unicode at the very last possible moment. This is because, outside of Unicode strings, the encoding is not part of a string's type, and type information does not generally cross API boundaries accurately. Regular expressions also don't play well with characters of mixed byte length.

Django works entirely in Unicode and encodes a string to UTF-8 at the last moment. I needed Python's Textile library to transform text, but it only dealt with byte strings. It turned out to be quick work to change all of its strings to unicode and not bother with encoding and decoding inside the library.
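
The post's context is Python, but the same "decode early, encode late" discipline can be sketched in JavaScript, the language of the other posts here. This is only an illustration: readText, writeText, and shout are hypothetical names, and the sketch assumes an environment that provides the standard TextDecoder and TextEncoder globals (modern browsers and Node.js).

```javascript
// Hypothetical helpers; only TextDecoder and TextEncoder are standard APIs.

// Input boundary: turn bytes into a string as soon as they arrive.
function readText(bytes, encoding) {
    // fatal: true throws on malformed input instead of silently substituting U+FFFD.
    return new TextDecoder(encoding || "utf-8", {fatal: true}).decode(bytes);
}

// Output boundary: turn the string back into bytes at the last moment.
function writeText(text) {
    return new TextEncoder().encode(text); // TextEncoder always produces UTF-8
}

// Everything in between works on strings only and never sees an encoding.
function shout(text) {
    return text.toUpperCase();
}

var input = new Uint8Array([0x63, 0x61, 0x66, 0xC3, 0xA9]); // "café" as UTF-8 bytes
var text = readText(input);
var output = writeText(shout(text));

console.log(input.length); // 5 bytes
console.log(text.length);  // 4 characters
```

The byte count and the character count differ, which is the kind of mismatch that trips up code working on undecoded input.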

Tuesday, February 19, 2008

Integrating Simile

Not to name names, but I've been working on integrating code from Simile from MIT into Chiron. Refactoring an existing JavaScript project highlights all the things you get for free in Chiron.

Simile has its own XHR engine, DOM event wrappers, DOM layout and style functions, PNG transparency solution, and a SortedArray type that provides binary search functions. Here are some of my observations.

  • Simile's layout getSize was better than mine. I will rectify this.
  • Not having a module system makes us reinvent the wheel, frequently.
  • It's hard to write a good XHR engine. There are a lot of XHR modules out there, and most of them have some issue or another: they lack support for some browser, don't report an OK status on local files, don't unify browser caching inconsistencies, don't support timeouts, don't expose XML (the X in AJAX) in IE, and so on. If you're going to make a new one, you should use these as references and do some serious research, development, and testing. Otherwise, you should copy or use the best of them (jQuery, in my opinion). Also, it needs to support asynchronous (the A in AJAX) requests, and you need to use them as often as possible.
  • Not having a solid, modular library makes us lazy. The inconvenience of name-spacing makes us lazy. This causes us to write sloppy code. For example, we should always use an enquote function when we're interpolating strings into HTML attributes and an inoculate function when we're interpolating strings into HTML, or we should use DOM functions or a DOM wrapper API to generate our HTML.
  • As I integrate code from other libraries, a pattern emerges. In my first pass, I collapse the name-spaces. Every module is a name-space, so all the manual creation of hierarchies like Simile.DOM (Simile = {} presumed, then Simile.DOM = {}, then endless repetition of Simile.DOM to augment or use its contents) is unnecessary and undesirable.
  • Referencing the URLs of resources, like other scripts and images, relative to the URL of the script you're currently in, is hard. Starting from scratch, this usually means you're going to have a global URL constant. This means domain-coupling. Maybe you make the URL relative to the root. This means domain-coupling. Maybe you provide it as a configuration variable. This means site-coupling. Maybe you scrape the script tags on the page for the URL of your script, then resolve the URL relative to your own URL. This means you're going to write a lot of slow code for what you perceive to be little value. In Chiron, you can get a function called resolve from http.js that resolves a URL relative to a base URL. Chiron also provides your modules with a moduleUrl variable that is the URL of the script you're in. resolve also implicitly uses this variable as your base URL if you don't provide a second argument (include('http.js'); resolve('images/blah.gif')).

    Chiron grabs the src of the modules.js script tag exactly once, since it needs that URL to resolve other module URLs, and removes the script element from the DOM (so other scripts can't sniff it). From there, Chiron keeps track of where all of your modules are relative to it and provides that information to each module.

  • About SortedArray:
    • A collection type should create an empty instance if you pass no arguments in.
    • A collection type should populate itself from the values of another collection if you pass one in as its first argument. This should always be the first argument, even if you frequently create empty collections with overrides on later arguments. Force your user to pass in a null or undefined.
    • Try to accept null and undefined as equivalent unless the distinction is meaningful.
    • Try to distinguish null and undefined from 0 in all meaningful cases.
    • Invariants like "sorted" are a promise. Guarantee your invariants across all function calls, including construction. If this means an unacceptable performance degradation, permit the user to suppress whatever code you need to verify the invariant if they are willing to provide treated data.
    • If there is a reasonable default, it should always be implicit. I should not have to explicitly send the global compare function into a SortedArray if I want a SortedArray of types supported by compare.
  • Not having a system of base types makes for noisy APIs where names from different organizations have different meanings. For example, find functions should always accept the same kinds of arguments and return the same kinds of values. Simile's name choices are very close to mine, to the extent that they could almost be used as partially implemented duck types for mine, but some of the names would have to be realigned. find in Simile accepts a comparator and returns an acceptable index at which to insert or remove a particular element. find in Chiron accepts a value (not a comparator), returns an index or key at which an item can be inserted or removed, and guarantees that it will be the first occurrence of that value. It was very easy to refactor SortedArray to subscribe to the stricter model. Also, removeAll needed to be clear, length and getCount both needed to be getLength, getIterator needed to be iter, and next needed to throw StopIteration once in a while, among others.
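
The SortedArray guidelines above can be sketched roughly as follows. This is a minimal illustration, not Simile's or Chiron's actual code: the default compare function, the options object, and its presorted flag are all assumptions made for the example.

```javascript
// Hypothetical default ordering; stands in for a library-wide compare function.
function compare(a, b) {
    return a < b ? -1 : a > b ? 1 : 0;
}

// Illustrative SortedArray. The collection to copy is always the first argument.
function SortedArray(values, options) {
    // null and undefined are treated alike: both mean "nothing passed".
    if (values == null) values = [];
    if (options == null) options = {};
    // The reasonable default is implicit; callers override only when needed.
    this.compare = options.compare == null ? compare : options.compare;
    // Copy, so later changes to the caller's collection cannot break the invariant.
    this.items = Array.from(values);
    // Guarantee "sorted" at construction unless the caller vouches for the data.
    if (!options.presorted) this.items.sort(this.compare);
}

// Binary search for the first index whose item is not less than value.
// That index is the first occurrence of value if present, and otherwise
// the position at which value can be inserted.
SortedArray.prototype.find = function (value) {
    var low = 0, high = this.items.length;
    while (low < high) {
        var middle = (low + high) >>> 1;
        if (this.compare(this.items[middle], value) < 0) low = middle + 1;
        else high = middle;
    }
    return low;
};

// Inserting at the found index preserves the invariant.
SortedArray.prototype.insert = function (value) {
    this.items.splice(this.find(value), 0, value);
};

SortedArray.prototype.getLength = function () { return this.items.length; };
SortedArray.prototype.clear = function () { this.items.length = 0; };

var sorted = new SortedArray([3, 1, 2, 2]);
sorted.insert(0);
console.log(sorted.items);   // [0, 1, 2, 2, 3]
console.log(sorted.find(2)); // 2, the first of the two 2s
```

Passing {presorted: true} skips the sort, which is the escape hatch for callers who are willing to provide treated data.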

I'm looking forward to having a semblance of Simile Timeplot and Timechart in the Chiron family.

Monday, February 18, 2008

Polymorphic repr

I'm not about to debate whether debugging is a valuable exercise in JavaScript, nor whether introspection or reflection tools are useful, nor whether they would be especially helpful in a dynamic language. Ruby has inspect. Python has repr. The Chiron JavaScript library has repr too.

Notionally, repr is the inverse of eval for a reasonable subset of JavaScript. There are a lot of object hierarchies that cannot be reconstructed from a repr serialization, but as a debugging tool, repr is indispensable.

repr is a polymorphic function you can import from base.js. If you pass repr an object that implements a repr member function, repr will defer to your overridable repr. Otherwise, repr returns reasonable defaults for other types. For example, repr provides defaults for Array, Object, String, Number, Boolean, and Date. repr also recursively represents members of arrays and objects, but provides circular reference protection by tracking visited objects in a memo Set.
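
To make that dispatch concrete, here is a minimal sketch of a repr in the same spirit. It is not Chiron's implementation: the Date and function formats are assumptions, and this version only tracks the objects on the current path, so a value that is merely shared (not cyclic) is still printed in full.

```javascript
// Illustrative repr, written for this post's description rather than copied from base.js.
function repr(value, memo) {
    // memo holds the objects currently being represented, to detect cycles.
    if (memo === undefined) memo = new Set();
    if (value === null) return "null";
    if (value === undefined) return "undefined";
    // Polymorphism: defer to the object's own repr method if it has one.
    if (typeof value.repr === "function") return value.repr();
    if (typeof value === "string") return JSON.stringify(value);
    if (typeof value === "number" || typeof value === "boolean") return String(value);
    if (value instanceof Date) return "new Date(" + value.getTime() + ")"; // assumed format
    if (typeof value === "function") return "<function>"; // assumed placeholder
    if (memo.has(value)) return "<cycle>";
    memo.add(value);
    var result;
    if (Array.isArray(value)) {
        result = "[" + value.map(function (item) {
            return repr(item, memo);
        }).join(", ") + "]";
    } else {
        result = "{" + Object.keys(value).map(function (key) {
            return JSON.stringify(key) + ": " + repr(value[key], memo);
        }).join(", ") + "}";
    }
    // Leaving this object's subtree: forget it so shared references are not mistaken for cycles.
    memo.delete(value);
    return result;
}

var a = {};
a.a = a;
console.log(repr(a));               // {"a": <cycle>}
console.log(repr([1, "hi", true])); // [1, "hi", true]
```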

Chiron's debugger uses repr to convert the value of an expression on the command line to a human-readable string.

j$ 1
1
j$ repr(1)
"1"
j$ "hi"
"hi"
j$ repr("hi")
"\"hi\""
j$ true
true
j$ {a: 10}
{"a": 10}
j$ [1, 2, 3]
[1, 2, 3]
j$ [{a: 10}]
[{"a": 10}]
j$ var a = {}; a.a = a; a
{"a": <cycle>}
j$ type()()
<instance run.html#0 0>
j$ type({'repr': function () {return "x"}})()
x