Tuesday, September 30, 2008

Metaclasses - The Python Saga - Part 4

The original type function, whose behavior is preserved in modern Python 2.5, accepts an object and returns the class, that is, the type, that emitted it. It's like the typeof operator in JavaScript that returns the String name of an object's primitive type, or the C++ facility that returns a pointer to an object's virtual function table. They're all sufficient for comparing apples to oranges, but all of them are insufficient for the more interesting comparison of an apple to the idea of a Fiji apple: the question, "Does your type inherit from this?", which can be answered with Python's isinstance, JavaScript's instanceof, or C++'s infernal dynamic_cast. So, type's single-argument behavior is effectively retired.
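The distinction is easy to see with bool, which subclasses int; a quick sketch:

```python
# type(x) answers "what exact type emitted x?"
assert type(10) is int
assert type(True) is bool
assert type(True) is not int    # an apple is not the idea of a Fiji apple

# isinstance answers the more interesting question,
# "does x's type inherit from this?"
assert isinstance(True, int)    # bool inherits from int
assert isinstance(10, object)   # everything inherits from object
```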

At some transcendental moment, somebody deeply involved in the Python project must have been thinking, "Well, if functions and classes return objects, what returns a class? Could a class, like a property, be syntactic sugar for some deeply metaphysical latent behavior in pure Python?". I figure this is how the type function grew its new wings.

So consider a class declaration:

class Foo(object):
	bar = 10
	def __init__(self, bar = None):
		if bar is not None:
			self.bar = bar

This is what is actually happening behind the curtains:

name = 'Foo'
bases = (object,)
def __init__(self, bar = None):
	if bar is not None:
		self.bar = bar
attys = {'bar': 10, '__init__': __init__}
Foo = type(name, bases, attys)

That is to say, there is no magic in the syntax. Ultimately all of the magic happens when you call type. By "magic" I mean functionality that cannot be replicated in pure Python without the interpreter's intervention.
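As a sanity check, here is the same class built by calling type directly, with the attribute value written inline; the result is indistinguishable from the class statement above:

```python
def __init__(self, bar=None):
	if bar is not None:
		self.bar = bar

# the three arguments: class name, tuple of bases, attribute dictionary
Foo = type('Foo', (object,), {'bar': 10, '__init__': __init__})

assert Foo.__name__ == 'Foo'
assert Foo().bar == 10     # the class attribute shows through
assert Foo(42).bar == 42   # an instance attribute shadows it
assert issubclass(Foo, object)
```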

The three-argument type function returns a type: an object that, when called, produces new instances. A type whose instances are themselves classes is called a "metaclass". type just happens to also be the implied metaclass of object. That is to say, you can create your own metaclasses.

The big question about metaclasses is, "Why on earth would you want to define a metaclass?". David Mertz from IBM wrote that you would simply know when you needed them. Since I read that article, I've racked my brain for a reason to use metaclasses, to no avail. At some point, I was reading Django's ORM code and it occurred to me that the reason you would want to define a metaclass is to provide a class in your API that, when subclassed by unsuspecting users, invokes certain preparations without their knowledge or consent. Here's how:

Define a metaclass. The best way to define a metaclass is to inherit from type and override its __init__ method.

class FooType(type):
	def __init__(self, name, bases, attys):
		super(FooType, self).__init__(name, bases, attys)
		print '%s was declared!' % name

Define a base class for your API. The trick here is that you can override its metaclass. Let's look at this one in an interactive interpreter:

>>> class Foo(object):
...     __metaclass__ = FooType
...
Foo was declared!
>>>

Whoa! You didn't call anything. Not true. Here's what actually happened:

name = 'Foo'
bases = (object,)
attys = {}
attys['__metaclass__'] = FooType
Foo = attys.get('__metaclass__', type)(name, bases, attys)

Python checks your attributes for a metaclass before defaulting to type.

That means that your FooType.__init__ got called. Hot damn. I wonder what happens if you create a subclass.

>>> class Bar(Foo):
...     pass
...
Bar was declared!
>>>

Whoa! I totally inherited a metaclass.

So, the reason for writing a metaclass is that metaclasses give you an opportunity to get and manipulate your derived class objects before anyone instantiates them. You get to do this once, right after the class dictionary is fully populated. You can take this opportunity to monitor class declarations, to prepare additional attributes, or to interpolate additional base types.
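For a concrete illustration, here is a hypothetical registry metaclass in that Django-ish spirit; the names (RegisteredType, registry, Model) are mine, not from any real API. Creating the classes by calling the metaclass directly is equivalent to declaring them with the __metaclass__ hook:

```python
registry = {}

class RegisteredType(type):
	def __init__(self, name, bases, attys):
		super(RegisteredType, self).__init__(name, bases, attys)
		registry[name] = self  # monitor every class declaration

# equivalent to declaring these classes with __metaclass__ = RegisteredType
Model = RegisteredType('Model', (object,), {})
User = RegisteredType('User', (Model,), {'table': 'users'})

assert registry['Model'] is Model
assert registry['User'] is User  # subclasses register themselves too
assert User.table == 'users'
```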


Keep in mind that metaclasses are jealous. If you create a metaclass for a type that inherits from base classes in someone else's API, your metaclass must inherit from their metaclass. I suspect that it's best not to assume that your base types use a particular metaclass. Thankfully, you can use an expression for your base type.

class FooType(getattr(Bar, '__metaclass__', type)):
	pass
class Foo(Bar):
	__metaclass__ = FooType

This takes advantage of the Python idiom of accessor methods like dict.get and getattr that accept a default-if-none-exists argument. Unfortunately, Python's object doesn't explicitly state that type is its metaclass. Otherwise, you could safely say:

class FooType(Bar.__metaclass__):
	pass

Such things are to be looked for in Python 3. I find that the Python developers have either, after considerable review and debate, already accepted or rejected most of my ideas before I even consider them, so I'm not even going to check for a PEP on this one.

Monday, September 29, 2008

Properties - The Python Saga - Part 3

Properties come out of a tired programming language genesis. In the beginning, there were structs. The trouble with structs was that an opaque data structure could not programmatically monitor or intercept access and mutation of its member data.

So that's not a big deal; we could solve the problem with classes. The best practice to avoid programming yourself into a corner was to never expose a datum; you would write accessor and mutator functions, whether you needed them at the moment or not. Thus, as your design grew, you could eventually do nice things like validation, observation, or proxying. The trouble with this approach was that you had to write six times as much code on the off chance you'd need to extend it some day. But it was worth it.

The idea of managed properties came along eventually in various languages (Python, C#, some implementations of JavaScript, and recent versions of [C]). The notion is that you would initially write all of your classes like structs with member data camped in public view. You would encourage your API consumers to interact with those members directly. Then, as need arose, you would subvert the member variables with property objects. These objects would intercept accesses and mutations with functions that you could write at any time of your design process.

Let's observe this design shift in Python. Here's a class with unmanaged data:

class Foo(object):
	def __init__(self):
		self.bar = 10

Here's some other fellow's code that uses your class:

foo = Foo()
foo.bar = 20
print foo.bar
del foo.bar

So there you have it. Just to keep on the same page, the idea at this point is to add a feature to Python that permits both of those code samples to work and, in fact, be perfectly cromulent. However, we also want to eventually add features to Foo such that its bar attribute can be managed, validated, proxied, secured, or outright lied about. Enter property. property is a function that accepts an accessor function and optional mutator and deleter functions. The property must be a class attribute to work. Here's how you would use a property:

class Foo(object):
	def __init__(self):
		self.bar = 10
	def get_bar(self):
		return self.baz / 2
	def set_bar(self, value):
		self.baz = value * 2
	def del_bar(self):
		del self.baz
	bar = property(get_bar, set_bar, del_bar)

Now we have a Foo class that transparently maintains the invariant that "bar" will always be half of "baz".
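From the consumer's side, the invariant is invisible but enforced. Here is a self-contained sketch, with an illustrative Halved class so it can stand alone:

```python
class Halved(object):
	def __init__(self):
		self.bar = 10  # routed through set_bar, so baz becomes 20
	def get_bar(self):
		return self.baz / 2
	def set_bar(self, value):
		self.baz = value * 2
	def del_bar(self):
		del self.baz
	bar = property(get_bar, set_bar, del_bar)

foo = Halved()
assert foo.baz == 20   # the setter maintained the invariant
assert foo.bar == 10
foo.bar = 7
assert foo.baz == 14
del foo.bar            # the deleter removes the backing datum
assert not hasattr(foo, 'baz')
```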

Sometimes you don't need to have a setter for a property, and you almost never need a deleter. For the common case, you can use the property function as a decorator.

class Foo(object):
	def __init__(self):
		self.baz = 20
	@property
	def bar(self):
		return self.baz / 2

Creating the property function.

So, it's easy to assume that the property function does all the magic behind the scenes, setting up traps in your class's accessor and mutator paths. There's actually another layer of code that can be done entirely in Python. That is, we can implement the property function in pure Python. The trick is that the property function is actually a type or factory method (who cares which) that returns a Python duck-type: a property object. A property object is any object that implements __get__, __set__, or __delete__. These are special magic Python methods that intercept access, mutation, and deletion of members. All you have to do is install an object with one of these methods defined on a class, under the name of the member you want to manage. The property function just handles the common cases. Let's redefine the property function in Python, as the Property class.

class Property(object):
	def __init__(self, fget):
		self.fget = fget
	def __get__(self, objekt, klass):
		return self.fget(objekt)

This defines enough of the Property object to decorate an accessor function.

class Foo(object):
	def __init__(self):
		self.baz = 20
	@Property
	def bar(self):
		return self.baz / 2

Here's a full implementation of Property. You will note that, in order to exactly emulate the property object, the __init__ method has the same argument names as the internal property so that code that uses keyword arguments will function in perfect ambivalence.

class Property(object):
	def __init__(
		self,
		fget,
		fset = None,
		fdel = None,
		doc = None,
	):
		self.fget = fget
		self.fset = fset
		self.fdel = fdel
		self.__doc__ = doc
	def __get__(self, objekt, klass):
		return self.fget(objekt)
	def __set__(self, objekt, value):
		self.fset(objekt, value)
	def __delete__(self, objekt):
		self.fdel(objekt)
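Exercising the pure-Python Property shows it trapping access and mutation just like the builtin. This version is condensed with lambdas, and adds a guard for class-level access that the builtin also provides:

```python
class Property(object):
	def __init__(self, fget, fset=None, fdel=None, doc=None):
		self.fget, self.fset, self.fdel, self.__doc__ = fget, fset, fdel, doc
	def __get__(self, objekt, klass):
		if objekt is None:
			return self  # class-level access yields the descriptor itself
		return self.fget(objekt)
	def __set__(self, objekt, value):
		self.fset(objekt, value)
	def __delete__(self, objekt):  # the descriptor name for del, not __del__
		self.fdel(objekt)

class Foo(object):
	def __init__(self):
		self.baz = 20
	bar = Property(
		fget=lambda self: self.baz / 2,
		fset=lambda self, value: setattr(self, 'baz', value * 2),
	)

foo = Foo()
assert foo.bar == 10
foo.bar = 7
assert foo.baz == 14
```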

Sunday, September 28, 2008

Decorators - The Python Saga - Part 2

Python introduced a short-hand for the adapter pattern on functions. You can "decorate" a function with another function. This is a neat tool you can use to factor out some common code from a bunch of functions. You can fiddle with the arguments, return values, or intercept exceptions thrown by any function you decorate.

The canonical example is a memoize decorator. The idea is to generalize the notion of memoization so you can simply subscribe to it in any function you want to memoize.

def factorial(n):
	if n == 1: return 1
	return n * factorial(n - 1)
factorial = memoize(factorial)

You accomplish this by writing the memoize decorator. A decorator is a function that accepts a function and returns another. Python virtuously provides a shorthand for taking the function, decorating it, and assigning it to a variable with the same name.

@memoize
def factorial(n):
	if n == 1: return 1
	return n * factorial(n - 1)

In the imagined normal case of decorators, the returned function accepts the same arguments and returns the same kinds of values as the accepted function. However, a decorator does have the liberty of extending or restricting that interface, like accepting additional arguments or raising an exception if the arguments are of the wrong type. It might also perform some common computation on the original arguments and pass the result to the original function as an additional argument. In any case, you can use some closures to create a decorator:

def memoize(function):
	cache = {}
	def decorated(*args):
		if args not in cache:
			cache[args] = function(*args)
		return cache[args]
	return decorated
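To verify the cache actually short-circuits repeated calls, here is a sketch that counts real invocations of a toy square function:

```python
def memoize(function):
	cache = {}
	def decorated(*args):
		if args not in cache:
			cache[args] = function(*args)
		return cache[args]
	return decorated

calls = []

@memoize
def square(n):
	calls.append(n)  # record each invocation that misses the cache
	return n * n

assert square(4) == 16
assert square(4) == 16  # the second call is served from the cache
assert calls == [4]     # the underlying function ran exactly once
```

Note that the recursive factorial benefits as well: the recursive call looks up the name factorial at call time, and by then that name is bound to the decorated function.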

Of course, that's too simple. A lot of things you put after the "@" symbol are just functions that return decorators, so that they can be configured with arguments. For example, you probably want to make a memoize decorator that lets you specify your own cache object. So, you need another layer of indirection.

def memoize(cache = None):
	if cache is None: cache = {}
	def decorator(function):
		def decorated(*args):
			if args not in cache:
				cache[args] = function(*args)
			return cache[args]
		return decorated
	return decorator

@memoize({})
def factorial(n):
	if n == 1: return 1
	return n * factorial(n - 1)

Since, in Python, functions, objects, and types are indistinguishable to the casual observer, you can do the exact same thing with a class, although I shudder to think that you might want to forgo the simplicity and elegance of closures. After the transform, the previous code might look like this:

class memoize(object):
	def __init__(self, cache = None):
		self.cache = cache
	def __call__(self, function):
		return Memoized(function, self.cache)

class Memoized(object):
	def __init__(self, function, cache = None):
		if cache is None: cache = {}
		self.function = function
		self.cache = cache
	def __call__(self, *args):
		if args not in self.cache:
			self.cache[args] = self.function(*args)
		return self.cache[args]

@memoize()
def factorial(n):
	if n == 1: return 1
	return n * factorial(n - 1)

So now you can use a Least Recently Used Cache, assuming it is a dictionary-like-object (a duck-dict, if you will):

from lru_cache import LruCache
@memoize(LruCache(max_size = 100, cull = .25))
def factorial(n):
	if n == 1: return 1
	return n * factorial(n - 1)

Download decorators.zip.

Variadic Positional and Keyword Arguments - The Python Saga - Part 1

Python supports "variadic" arguments. Variadic arguments are the man behind the curtain for C's printf function. The idea is that a function can accept a variable number of positional arguments, the values to put in your format string. In C this is accomplished with an ellipsis, ..., and some va_list macro stuff that I always have to look up. Python goes a couple steps further with variadic arguments and the results are stunning, orthogonal, and actually useful almost every day. With Python, you get both "positional" arguments, like C, and keyword arguments: those arguments that conceptually map, in any order, to the names of the arguments in your function's declaration. The magic symbols are "*" and "**" for positional and keyword arguments respectively. With one "*", you can declare a function that accepts any number of positional arguments, gathered into a tuple:

def foo(*args):
	return args
assert foo(1, 2, 3) == (1, 2, 3)

You can also pass a list of positional arguments to a function with very similar syntax:

def foo(a, b, c):
	return [a, b, c]
assert foo(*[1, 2, 3]) == [1, 2, 3]

And you can do the same thing with keyword arguments except you use dictionaries:

def foo(**kwargs):
	return kwargs
assert foo(a = 10, b = 20, c = 30) == {'a': 10, 'b': 20, 'c': 30}

Likewise, you can pass keyword arguments:

def foo(a, b, c):
	return [a, b, c]
assert foo(**{'a': 10, 'b': 20, 'c': 30}) == [10, 20, 30]

You can use them in combination, along with default arguments to provide beautiful, orthogonal, reusable abstractions:

def foo(a, b = None, c = None, d = None):
	return [a, b, c, d]
assert foo(*[1, 2], **{'c': 3}) == [1, 2, 3, None]
def bar(a, b, c, *args, **kws):
	return [a, b, c], args, kws
assert bar(1, 2, 3, 4, 5, f = 6) == ([1, 2, 3], (4, 5), {'f': 6})
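Together, * and ** make it possible to forward an arbitrary call signature untouched, which is exactly what decorators rely on. A sketch with a hypothetical pass-through wrapper:

```python
def passthrough(function):
	def wrapped(*args, **kws):
		# forward everything we received, positionally and by keyword
		return function(*args, **kws)
	return wrapped

def bar(a, b, c, *args, **kws):
	return [a, b, c], args, kws

# the wrapper is transparent: same arguments in, same value out
assert passthrough(bar)(1, 2, 3, 4, 5, f=6) == ([1, 2, 3], (4, 5), {'f': 6})
```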

Saturday, September 27, 2008

It Shtarts with a Bloody Esh.

The light saber application for the iPhone was recently "upgraded" to a naggy advertisement application for a game that I will not give the benefit of calling out by name. If, like me, you need to "downgrade" your iPhone application, fear not, there is a way.

If you upgraded the application directly on the phone, the old version may be backed up on your main iTunes library. If you have docked since you got the application originally, and have not docked since you upgraded, just delete the application from your phone then dock and sync. The old version will be restored.

If you have already docked and synced your iPhone with your iTunes library, it will appear that the new application is both on your phone and your computer with no sign of the old one. Check the trash; it might still be there. Copy the .ipa (iPhone App) file for your application to your iTunes/Mobile Applications directory and open it in iTunes. It will prompt you for whether you would like to replace the newer version. Just say, "YES!".

Thursday, September 4, 2008

JavaScript Module Standard

The purpose of this document is to propose a contract between a collection of JavaScript module loaders and modules. The specification describes the environment that module loader implementations provide, and the environment that modules may depend upon. In particular, compliant module systems:

  • map one file to one module (while leaving room for implementation-specific multi-module bundling for website performance),
  • cache singleton module objects before executing the corresponding module file,
  • execute modules with a feature-rich context,
  • resolve module URLs relative to a module path, and
  • conform to a domain-name-based site-package naming convention and leave a name-space for a centrally managed standard library.

The specification is intended to be suitable for client- and server-side JavaScript module loader implementations.

The specification is intended to provide insights and an easy migration path to future versions of JavaScript.

The specification is intended to narrow the domain in which JavaScript modules can universally depend to maximize portability.

The specification encourages modules to adhere to a strict subset of the JavaScript environment in which they may be loaded. In spirit, this is a theoretical version of the JavaScript language that provides the intersection of behaviors provided by Grade-A browsers and server-side run-times including Rhino, plus this system for loading modules.

Module Execution Context

The singleton module object MUST be declared and cached BEFORE the corresponding module file is executed. The module file MUST only be executed ONCE for the duration of a page-view or JavaScript program.

In a module file's execution context, the context object, represented by this, MUST be the module object.

The scope chain, from global to local, of a module file's execution context MUST consist of:

  • builtins
  • moduleScope
  • module
  • <anonymous>

builtins

Rules for module systems:

  • The builtins object MAY be frozen.
  • All objects in the transitive closure of builtins on item selection MAY be frozen.
  • The builtins object MAY contain more values than specified.
  • The builtins object MUST include:
    • String
    • Number
    • Boolean
    • Object
    • Array
    • Date
    • RegExp
    • Error
    • EvalError
    • RangeError
    • ReferenceError
    • SyntaxError
    • TypeError
  • The module loader MAY enforce these invariants at any time. In some environments, verifying these invariants will not be possible or pragmatic.
  • All objects in builtins MUST conform to the JavaScript subset described in the introduction: one that consists of the intersection of behaviors of the respective objects in all Grade-A browsers and server-side JavaScript environments.

Rules for modules:

  • Modules MUST NOT write items to the builtins object.
  • Modules MUST NOT modify any object in the transitive closure through references on the builtins object.
  • Modules MUST NOT access any items in the builtins object not herein specified.
  • Modules MUST NOT use non-standard features provided by builtins.

moduleScope

The moduleScope is a module's private name-space for module loader functions and imported values.

The moduleScope MUST provide:

  • builtins
  • module
  • moduleUrl
  • moduleRootUrl
  • require
  • include
  • foreignModuleBind
  • log

The moduleScope MAY provide:

  • register
  • publish

Modules MAY augment the moduleScope with additional items.

Modules MUST NOT overwrite the items specified here.

Module Loading

require(<moduleUrl>, [<structure>])

The require function returns an object with items from a foreign module. The required module is referenced with a URL. By default, all items from the foreign module are copied into the returned object. The returned object MAY be frozen. Modules MUST NOT modify the returned object. If a structure is provided, a subset of the items from the foreign module will be returned, the result of destructure(<module>, <structure>). If a function in the foreign module was declared with the foreignModuleBind decorator, the corresponding item in the returned object is the result of <foreignModule>.moduleScope.moduleBind(<name>, <value>).

If the URL begins with a dot, ("."), the fully qualified URL for the requested module is resolved relative to the fully qualified URL of the current module. This would be the result of urlJoin(moduleRootUrl, moduleUrl, foreignModuleUrl). Otherwise the fully qualified URL is urlJoin(moduleRootUrl, foreignModuleUrl).

Regarding module file names:

  • Directory components and file names in modules MUST be in camelCase.
  • Modules MUST have a ".js" extension if they are provided by files.
  • Modules MUST NOT have an extension if they are provided by the module loader but are not backed by real files. This might include a "window" module in a particular browser implementation.
  • The module root is reserved for a cross-browser JavaScript standard library.
  • Modules provided by entities other than the standard library MUST exist in a subdirectory of the module root corresponding to a domain name controlled by the author, or a subdirectory thereof.

Module authors are encouraged to use module relative URLs to increase the mobility of entire directory trees of modules.

include(<moduleUrl>, [<structure>])

The include function defers to require, forwarding its arguments and returning its result. However, include also copies all of the items from the object returned by require onto the moduleScope object.

foreignModuleBind(<function>)

A function decorator that denotes that the module loader guarantees that, when the decorated function is called from another module, it will receive as its context object, this, the module object for the module file in which the call was made. foreignModuleBind MAY return the same Function it was passed. The returned function MUST be usable in the module in which it was declared as if it were the function passed to foreignModuleBind. foreignModuleBind MAY modify properties of the given Function.

For example:

this.foo = foreignModuleBind(function () {
	log("foo called from: " + this.moduleScope.moduleUrl);
});

destructure(<object>, <structure>)

For the purpose of this specification, the "destructure" function has the following semantics. If the structure is an Array, the returned Object contains the items from the given <object> corresponding to the keys provided in the given Array structure. If the structure is an Object, the returned object contains items where each key corresponds to a value in the structure, and the value is a value from the object corresponding to the key in the structure. For example:

  • destructure({"a": 10, "b": 20}, ["a"]) == {"a": 10}
  • destructure({"a": 10, "b": 20}, {"a": "A"}) == {"A": 10}

module

The module object is a module's public interface. Adding items to the module object makes them available for export to other module scopes. To that end, the module object is mutable. The module object MUST provide a moduleScope. Modules MAY augment the module object.

moduleRootUrl

moduleRootUrl is the fully qualified URL of the JavaScript site-packages directory: the module root.

moduleUrl

moduleUrl is the relative URL of the current module file relative to moduleRootUrl.

log(<message>, [<label>])

log is a simple console logging function.

The module loader MAY ignore calls to log. The module loader MAY ignore the label argument.

The optional label MAY be any string, and MUST be suitable for use as a CSS class-name (preferably lower-case delimited by hyphens) of which the following MAY be significant:

  • info
  • warn
  • error
  • module
  • help
  • pass
  • fail

Afterword: Browser Implementations

This specification outlines the process of requiring modules from within other modules. However, in a browser's global context, JavaScript execution blocks are not modules. To that end, this specification does not require that the module loader be invoked in any particular fashion. A particular implementation might hook an initial module to be loaded from within a script tag. Another implementation might scan the DOM for script tags with an alternate language attribute and execute them as modules with the current page's URL as their module URL.

Afterword: Future ECMAScript import semantics.

A future version of the ECMAScript standard might specify new syntax and semantics for importing modules. Current discussions about this feature trend toward having new syntax that "desugars" to native JavaScript. To that end, I propose the following syntax and desugaring transformations in the context of this specification:

  • import "<moduleUrl>" as <moduleName>; desugars to:
    module.<moduleName> = require("<moduleUrl>");
  • from "<moduleUrl>" import *; desugars to:
    include("<moduleUrl>");
  • from "<moduleUrl>" import <a>; desugars to:
    include("<moduleUrl>", ["<a>"]);
  • from "<moduleUrl>" import <a> as <a'>; desugars to:
    include("<moduleUrl>", {"<a>": "<a'>"});