13.4.06. Snakes and Rubies in the Temple of Bondage and Discipline down in the City of Heartbreak and Needles

One of the new topics making the rounds is a spate of responses to an old (2-3 years, at least) post comparing Python and Ruby. One piece of interest is this bit:

Larry Wall, the designer of Perl, has the slogan “There’s more than one way to do it”. In contrast, Bertrand Meyer, the designer of Eiffel, says “A programming language should provide one good way of performing any operation of interest; it should avoid providing two.” Ruby follows the anarchist approach of Larry Wall, while Python follows the bondage-and-discipline approach of Eiffel.

While Python’s ‘import this’ (aka “The Zen of Python”) says “There should be one – and preferably only one – obvious way to do it”, that’s seldom the case in actuality. Ian Bicking’s response to the post linked above mentions this as well. There are well-meaning attempts at consistency – but do they happen?

Python makes some interesting design trades. As many note, it’s not a so-called “Pure Object Oriented” system. You can write Python in a purely procedural style, in a Modula style, in functional style, in pure scripting style, in “pure OO” style, or in a mix of all.

The comprehensive standard library is a big mix of all of these styles. Some of it stems from certain core modules being thin layers over common C libraries. Some of it stems from different styles of different authors. Some of it stems from age.

I’ve generally been a fan of Python’s modular approach. Common modules like os and os.path provide a nice – but thin – abstraction over operating system specific behaviors, especially with regard to the file system. Python’s early development, according to legend, happened on Macs (and considering Python’s age, it should be noted that this was long before Mac OS X). Classic Mac OS, DOS/Windows, and Unix all spell paths differently. In the mid-nineties, as Python was starting to really find its audience (including me), its primary rival was perceived to be Perl. Perl’s history is steeped in Unix style shell scripting. Perl scripts, then, typically had very Unix style paths and elements inside of them, just by default. While there were some good Perl implementations available for classic Mac OS, it was hard to play with other people’s code – there were just a lot of system expectations built in, especially at the time I started experimenting with the two P languages.

Python, on the other hand, had the ‘os’ and ‘os.path’ modules. These modules contain, among other things, knowledge of how the current OS spells certain file system elements, like ‘parent directory’ or path separator. Classic Mac HFS paths were separated by colons, for example, and ‘pardir’ was not ’..’ but ’::’. And, of course, Windows/DOS uses backslashes instead of forward slashes. So while you could write paths the normal way, it was (and still is) considered good design to use the os module’s tools:

    import os

    pydir = os.path.join(os.pardir, 'lib', 'python')
    appdir = os.path.join(pydir, 'example')
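
To see what those modules abstract away, here is a small sketch in present-day Python 3 syntax. It leans on the standard library’s posixpath and ntpath modules, which hold the Unix and Windows flavors of os.path, so both spellings can be compared on one machine. Classic Mac OS is left out of the sketch.

```python
import ntpath      # the Windows/DOS flavor of os.path
import posixpath   # the Unix flavor of os.path

parts = ('lib', 'python')

# The same call, spelled the way each operating system wants it.
unix_dir = posixpath.join(posixpath.pardir, *parts)
windows_dir = ntpath.join(ntpath.pardir, *parts)

print(unix_dir)      # ../lib/python
print(windows_dir)   # ..\lib\python
```

Code that goes through os.path gets whichever of these matches the platform it runs on, which is the whole point of not hard-coding the slashes.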

But! The os module’s elements for working with paths are very procedural. There’s not really a built-in concept of a Path object. That may be changing in Python 2.5 if PEP 355 really finds its way in. Then the above could be more like:

    pydir = Path(Path.cwd().parent, 'lib', 'python')
    appdir = Path(pydir, 'example')

While that doesn’t look like much, it’s when one looks at some of the proposed replacements that one can see the difference:

    CURRENT (using 'os')
    --------------------
    DIR = '/usr/home/guido/bin'
    for f in os.listdir(DIR):
        if f.endswith('.py'):
            path = os.path.join(DIR, f)
            os.chmod(path, 0755)

    NEW (using 'path')
    ------------------
    for f in Path('/usr/home/guido/bin').files("*.py"):
        f.chmod(0755)

Taking that old – but useful (and usable) – procedural code and turning it into a class / objects is just beautiful, and is one of those “why didn’t they do that before?” things. The new Path class also brings a lot of functionality that’s dispersed in other modules (that people may not even be aware of) together, like ‘glob’, ‘fnmatch’, and ‘shutil’. (It looks like Ruby 1.8 picked up something similar with the new Pathname class).
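
For a feel of how thin such a wrapper can be, here is a rough sketch in present-day Python 3 syntax. The class is a made-up, stripped-down stand-in and not the PEP 355 implementation; only the files() and chmod() spellings are borrowed from the example above, and the temporary directory is there just so the sketch can run anywhere.

```python
import fnmatch
import os
import tempfile


class Path(str):
    """Hypothetical stand-in for the proposed Path class (not PEP 355 code)."""

    def files(self, pattern='*'):
        """Return the files directly inside this directory matching pattern."""
        found = []
        for name in sorted(os.listdir(self)):
            full = Path(os.path.join(self, name))
            if os.path.isfile(full) and fnmatch.fnmatch(name, pattern):
                found.append(full)
        return found

    def chmod(self, mode):
        """Same as os.chmod(path, mode), spelled as a method."""
        os.chmod(self, mode)


# Try it out on a throwaway directory holding three empty files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.py', 'b.txt', 'c.py'):
        open(os.path.join(tmp, name), 'w').close()
    scripts = Path(tmp).files('*.py')
    for f in scripts:
        f.chmod(0o755)
    names = [os.path.basename(f) for f in scripts]

print(names)   # ['a.py', 'c.py']
```

The os, os.path and fnmatch calls are all still there; they have just moved behind one object.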

The point being – this is an area where there are really quite a few ways of doing things in Python. Is it good? Is it bad? It’s a nice kind of freedom, I guess, to be able to write in both procedural/modular and OO styles. There’s definitely no hard “bondage and discipline” here.

There’s also no hard “bondage and discipline” in Python when it comes to naming. There are some good practices, but not everyone follows them (myself included). I have a personal style guide, much of it taken from practices we had at Zope Corp (4 spaces indentation, must fit in 80 columns) and styles picked up over recent years (all package/module names lower case, all classes capitalized, never import * if it can be avoided, etc). And I wince when I see code that differs – mixed case package/module names being one of the biggest turn-offs to me, personally. import * has generally been offensive as well. One thing that I’ve historically liked about Python has been the ability to trace back any name I see in code. “Path... ah, it came from ‘from path import Path’! And implements... Oh, it came from ‘from zope.interface import implements’! And self.bluecheese is obviously an attribute available to the instance!” import * tends to get in the way of name chasing. I like to learn code by reading it (since most documentation tends to be flaky), so being able to find where ‘mapper’ or ‘begin’ or ‘desc’ or ‘Table’ came from is something I like. And for maintainability reasons, I’ve liked being able to find and trace names within a function/class/module without having to speculate.

But that’s just my preference. Python’s namespaces and general rules here have generally worked out for me. But as I’ve gone on, I’ve found myself much more critical of other Python code than I should be, which is contributing to this crisis of faith I’ve been going through. Whereas when I’m reading Ruby code, I have different expectations and can be a lot more forgiving.

But there’s something significant that I’ve come to realize about Ruby. It can be argued that Python’s biggest “bondage and discipline” element is its use of significant whitespace (clear structure by indentation is enforced at the language level – although there are ways around this). Ruby, on the other hand, enforces significant naming conventions. As I’ve been reading Ruby code (with most of my experience being limited to Rails and code from why the lucky stiff), I’ve grown to appreciate this:

  • Constants begin with a capital letter. Classes and modules are constants, and so they are always capitalized. Pathname.pwd, ActiveRecord::Base, and so on.
  • Variables are lower case (or at least, start with a lowercase letter). This includes methods.

Understanding those two elements alone (along with learning Ruby’s symbols for self.attribute access, etc) has made it really easy to start reading and understanding Ruby code. I can understand how Rails can be a system that “prefers convention over configuration” – there seems to be a fair amount of convention enforced in the language and core library as it is.

Another thing I find interesting is that object attributes are not exposed directly, but can be exposed easily through methods. With parentheses being optional on method calls, this makes zero-input methods (i.e. ‘reader’ methods) much easier to use and work with. An historical Zope 1 and 2 problem was the case of ‘id’. Some objects would just have their ‘id’ (name within their container) set as an attribute. Others would compute it and have a callable method. In DTML and TAL, you could refer to ‘id’ and the templating language’s expression engine would figure out if it needed to be a call or not (and then translate it into ‘id()’ if needed). So sometimes, obj.id would return a string like ‘monkey’. But other times it would return (bound method id of ...). Python’s descriptors (since Python 2.2) make this a little bit better, but in my encounters with Ruby so far, its approach has made code more readable and consistent and predictable. The proposed Path class for Python (mentioned above), for example, has a mix of properties and methods:

    curpath = Path.cwd()
    documents = curpath + 'Documents'
    curpath.parent
    curpath.realpath()
    curpath.mtime()
    curpath.isdir()
    str(curpath)

Compared now to Ruby’s Pathname object:

    curpath = Pathname.pwd
    documents = curpath + 'Documents'
    curpath.parent
    curpath.realpath
    curpath.mtime
    curpath.directory?
    curpath.to_s

The Ruby code just looks cleaner, more consistent – even though one could put parentheses on all of these calls. ‘Pathname’ is obviously (or at least very likely) a class. For the Path class proposed in Python PEP 355, one of the open issues in the PEP is this:

The name obviously has to be either “path” or “Path,” but where should it live? In its own module or in os?

Let’s look at that first part again: The name obviously has to be either ‘path’ or ‘Path’. Either? Well, a lot of core Python classes and types are lower case – although many of them weren’t really types/classes until recently. I’m talking ‘str’, ‘object’, ‘file’ (aka ‘open’), etc. Some of this has to do with the history of the language and how it has grown / changed over the years. But obviously there’s more than one way to do it if you can’t really be sure of the proper way to capitalize it.
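
Circling back to the ‘id’ problem for a moment: this is roughly what descriptors buy you on the Python side. It is a minimal sketch in present-day Python 3 syntax, and both classes are made up for illustration (neither is actual Zope code). One stores its id, the other computes it, and a property lets both be read the same way.

```python
class StoredItem:
    """Keeps its id as a plain attribute."""

    def __init__(self, id):
        self.id = id


class ComputedItem:
    """Computes its id on demand, but exposes it like an attribute."""

    def __init__(self, title):
        self.title = title

    @property
    def id(self):
        return self.title.lower().replace(' ', '-')


for obj in (StoredItem('monkey'), ComputedItem('Blue Cheese')):
    # No guessing between obj.id and obj.id() here.
    print(obj.id)
```

It works, but somebody has to decide, class by class, to write the property. Nothing in the language pushes everyone toward the same spelling.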

Indentation can be significant when scanning over code (not even reading into details). So can naming and other behaviors. Python programmers may see the ‘end’ keywords that close Ruby’s blocks, for loops, if statements, etc, and go “ha! Python don’t need that line noise!”. But Ruby programmers may just as easily look at parentheses on method calls that take no arguments and go “Ha! Ruby don’t need that line noise!” They may also look at things like ‘list.sort()’ and wonder why they don’t get a sorted copy of the list back (and may also wonder why it took until Python 2.4 for Python to finally grow an in-line ‘sorted copy’ function of its own). Ruby’s use of punctuation helps distinguish between the destructive (changes the object in question) and the non-destructive (returns a copy of the object in question):

    Python:
    >>> alist = [3,2,7,13,8]
    >>> print alist.sort()
    None
    >>> alist
    [2, 3, 7, 8, 13]
    >>> alist = [3,2,7,13,8]
    >>> print sorted(alist)
    [2, 3, 7, 8, 13]
    >>> alist
    [3, 2, 7, 13, 8]

    Ruby:
    irb> alist = [3,2,7,13,8]
    irb> p alist.sort
    [2, 3, 7, 8, 13]
    irb> alist
    => [3, 2, 7, 13, 8]
    irb> p alist.sort!
    [2, 3, 7, 8, 13]
    irb> alist
    => [2, 3, 7, 8, 13]

Things like list.sort() sorting in place, and len(), sorted(), etc, being standalone functions rather than methods, are things you get used to in Python. At the same time, they give little guidance on best practices (or just good practices) for writing your own code. If writing something like a Field object which can be bound to another object – should ‘bind(target)’ return a new copy of the Field bound to the target, or should it bind in place? In Ruby, one might write two methods, bind(target) and bind!(target), with the second one denoting that it changes the Field in place.
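
Python has no ‘!’ to lean on, so the nearest equivalent is a pair of names plus the list.sort() habit of returning None from the destructive one. A rough sketch in present-day Python 3 syntax; the Field class and the name bind_in_place are invented here for illustration:

```python
import copy


class Field:
    """Hypothetical field that can be bound to a target object."""

    def __init__(self, name, target=None):
        self.name = name
        self.target = target

    def bind(self, target):
        """Non-destructive: return a bound copy and leave self alone."""
        bound = copy.copy(self)
        bound.target = target
        return bound

    def bind_in_place(self, target):
        """Destructive: bind self. Returns None, the way list.sort() does."""
        self.target = target


title = Field('title')
bound = title.bind('some object')
print(title.target, bound.target)   # None some object

title.bind_in_place('another object')
print(title.target)                 # another object
```

The difference from Ruby is that nothing marks bind_in_place as the destructive one except the name somebody happened to pick.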

I don’t know how consistent other Ruby developers are in following that standard, but it’s great to have as an option. I also like the question mark used for boolean methods: alist.empty?, alist.include? 7, and so on. Again – in my limited experience with Ruby, it’s been easier to understand a lot of the code I’ve seen:

    Ruby:
    popped = alist.pop if alist.respond_to? :pop
    if popped.nil?
        puts "Bastard didn't pop"
    end

    Python:
    if hasattr(alist, 'pop'):
        popped = alist.pop()
    else:
        print "Bastard couldn't pop"

The differences are negligible here, mostly a matter of preference. How about testing for emptiness? Not just testing for truth – but testing to see if a container is empty?

    Ruby:
    if alist.empty?
        puts "It's Empty" 
    end

    Python:
    # truth testing
    if not alist:
        print "It's Empty" 
    # length testing
    if len(alist) == 0:
        print "It's Empty" 

    # Writing a custom 'empty' tester.
    def empty(obj):
        return (len(obj) == 0)

    if empty(alist):
        print "It's Empty" 

    # But 'empty' could also be destructive
    def empty(obj):
        obj[:] = []
        return obj

    if empty(alist):
        print "will always be empty because this version of empty destroys" 

More than one way to do it. Of course, the truth testing is the most commonly used in Python – but there are times when ‘emptiness’ and ‘truthiness’ have different meaning / impact. alist.empty? reads easily. The punctuation in the name differentiates it from a method like empty or empty! which might clear the object out (like ‘clear’ does in Ruby arrays). So this useful feature and convention makes it possible to look at an object’s list of methods and pick out the queries.
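
One case where the two drift apart is a value that might be None instead of a list. A small sketch in present-day Python 3 syntax; is_empty is a made-up helper and results is just an example name:

```python
def is_empty(container):
    """True only for a sized container that holds nothing."""
    return len(container) == 0


results = None              # say, "nothing has been looked up yet"

print(not [])               # True
print(not results)          # True as well: truth testing can't tell them apart
print(is_empty([]))         # True

try:
    is_empty(results)
except TypeError:
    print("None is not a container, so it isn't 'empty' either")
```

Truth testing happily lumps None, 0, '' and [] together. The length test only answers for things that actually have a length.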

So – am I saying Ruby is better than Python? No. Nor am I saying it the other way around. There are some very different designs here. But the enforced naming conventions and other small touches add up: constants versus variables, leading to consistent class and method spellings; punctuation in method names; ‘optional’ method call parentheses, so one doesn’t have to worry about whether to use object.id or object.id(). Together they have enabled me to read large framework code like that in Rails or Hobix as easily as Python, or more easily, even though Python claims to be written for readability, I’ve been programming in Python for a decade now, and I’ve still scarcely used Ruby at all. There really seems to be a clean and simple elegance about it that I find myself yearning for.

The communities are interesting too. Python can’t even seem to settle on a good documentation tool. pydoc is way too simple in its HTML mode (but quite nice in the interpreter when you can do help(obj)). epydoc is quite comprehensive and supports rich markup options, including the ability to document instance variables, method parameters, etc., but its output can still be a bit hard to read, or at least it’s hard to separate the wheat from the chaff (I don’t think it’s epydoc’s fault; Python modules and objects may just be exposing too many extra objects that don’t always need to be documented, and those get in the way of finding useful help). The almighty effbot has added PythonDoc to the mix as well. And there’s also Pudge. Zope 3 has its own API documentation tool with some specific knowledge of additional Zope 3 elements – sadly there’s no publicly available online version to use for reference (yet).

And with all of these options – there’s nothing that quite matches the discoverability and usability of the Ruby on Rails API Docs. Again, I’ve gotten far better answers out of it than on any of the Python web tools I’ve evaluated… Ever. Which is why reading the source and being able to follow names is so important to me in Python. The source is often the best documentation you can easily get to and through. The Python standard library is very well documented. But beyond that, it’s often a lot of work. Ian Bicking wrote in his post Orthogonality is Pretentious (a post I generally disagree with – orthogonality leads to a pleasant predictability, I think, in languages like Smalltalk and also in Ruby) about PHP:

One example of this design in PHP is the lack of objects in the standard library. Instead it is one very long list of functions. If you want to do something with the standard library there either is a function for doing it, or there is not a function. If you want to see what you can do, you can just scan through the index.

And that’s what I like about Ruby on Rails. In the API Docs, one of the frames lists all of the methods in the system. That alone has been useful for just looking up functionality. One big list of functions (essentially), most of which are available where you need them most. Easy list to turn to and scroll through.

Similar functionality is in the WebHelpers library for Python – but there’s no version of the docs (at this time) that has that nice “big list of functions”. It’s click / try / click / try / click / try. At least it’s documented! But still, it just takes more time to weed out pertinent information.

Jumping to api.rubyonrails.org and scrolling through that bottom list to find ‘has_many’ again to find its options is a very quick process. (Just like most Python standard library lookups – the library and module indexed reference for Python is damn handy!)

Ugh. Where was I going with all of this? Nowhere really. I still like Python and it’s still my working programming language. But there are a lot of things about Ruby and what its community has been producing over recent years (beyond just Rails) that are increasingly enticing.

7.4.06. Because I Could (SQLAlchemy, Zope 3, Metaclass Fun, Etc)

I continue to be (generally) impressed by SQLAlchemy. After a couple of days’ worth of work, digging around the guts of Zope transactions and thread management along with SQLAlchemy’s guts of similar nature, I think I have a system in place that will work fine with Zope 3. Zope 3 uses a two phase commit transaction management system, whereas SQLAlchemy generally supports the basic ‘begin / commit / abort’ cycle. However, by taking advantage of SQLAlchemy’s nesting of transactions, I essentially have the following model in place:

  • Zope transaction begin—SQLAlchemy engine begin (starts a fresh outer scope and calls ‘begin’ on the database connection itself)
  • On Zope commit start (first phase of commit)—SQLAlchemy objectstore commit (sends changes managed in SQLAlchemy’s unit of work / object graph over the wire)
    • This is of interest because objectstore commit also calls engine.begin() and engine.commit(). SQLAlchemy is designed for ‘reentrance’, so that multiple calls to something like engine.begin() are OK – as long as they’re matched by an equal number of finishing calls like engine.commit(). When the nesting count is back to zero, engine.commit() saves.
    • In two phase commits, it’s allowable for an error to happen here. It’s probably not preferred, but it’s OK.
  • On Zope commit finish (final phase of commit)—SQLAlchemy engine commit. This ‘commit’ matches to the ‘begin’ at the start of the Zope transaction, and sends the ‘COMMIT’ statement to the target database.
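
The nesting trick in the list above can be sketched in a few lines. This is a toy model in present-day Python 3 syntax, not SQLAlchemy’s code: NestingEngine and flush_changes are made-up names, and the ‘database’ is just a list of statements.

```python
class NestingEngine:
    """Hypothetical stand-in for an engine with reentrant begin/commit."""

    def __init__(self):
        self.depth = 0
        self.log = []              # statements "sent" to the database

    def begin(self):
        if self.depth == 0:
            self.log.append('BEGIN')
        self.depth += 1

    def commit(self):
        if self.depth == 0:
            raise RuntimeError('commit() without a matching begin()')
        self.depth -= 1
        if self.depth == 0:        # only the outermost commit really saves
            self.log.append('COMMIT')


def flush_changes(engine):
    """Plays the role of the objectstore commit: its own begin/commit pair."""
    engine.begin()
    engine.log.append('INSERT (pending changes)')
    engine.commit()                # nested, so no COMMIT goes out yet


engine = NestingEngine()
engine.begin()                     # Zope transaction begin
flush_changes(engine)              # first phase of the Zope commit
engine.commit()                    # final phase: depth is back to zero
print(engine.log)   # ['BEGIN', 'INSERT (pending changes)', 'COMMIT']
```

The changes go over the wire in the first phase, but the one real COMMIT waits for the call that matches the outermost begin.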

Beyond that, I’m not doing too much work to track changes in either environment (SQLAlchemy, ZODB) in this current setup. ZODB’s persistence manages the loading and unloading of its objects, and SQLAlchemy manages its own too. It’s not that I’m trying to mix them – it’s just that Zope really really really wants to run with the ZODB, so even if it’s not used, it’s probably going to be there.

If this system ends up actually doing its job, I’ll share the code. I’m trying to write it separate from the application I’m developing.

Also: for kicks, I’ve toyed with how I might get ActiveRecord style syntax (which I like) using Python and SQLAlchemy. This mostly works as it stands right now:

    from ruts.storage.base import Base
    from ruts.storage import act
    import db.tables

    class Venue(Base):
        act.fromTable(db.tables.venues)

        act.hasMany('Event', 'events', order_by='start_date')

    class Presenter(Base):
        act.fromTable(db.tables.presenters)

        act.hasMany('Event', 'events', order_by='start_date')

    class Event(Base):
        act.fromTable(db.tables.events)

        act.hasOne(Presenter)
        act.hasOne(Venue)

    >>> v = Venue.get(1)
    >>> v.events
    [<example.model.Event object at 0x104a930>]
    >>> v.events[0].venue.title
    'Testy McVenueVille'
    >>> v.events.append(Event(title="Jackson"))    
    >>> v.commit()
    >>> v.events
    [<example.model.Event object at 0x104a930>, <example.model.Event object at 0x2b2c090>]
    >>> e2 = Event.get(2)
    >>> e = Event.get(1)
    >>> from datetime import timedelta
    >>> e2.start_date = e.start_date + timedelta(1)
    >>> e2.end_date = e.end_date + timedelta(1)
    >>> e.end_date
    datetime.datetime(2006, 4, 8, 23, 0)
    >>> e2.end_date
    datetime.datetime(2006, 4, 9, 23, 0)
    >>> v.events[1].end_date
    datetime.datetime(2006, 4, 9, 23, 0)
    >>> v.events[1].title
    'Jackson'
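
For the curious, here is one way bare calls inside a class body can end up as class-level declarations. It is a guess at the general technique in present-day Python 3 syntax, not the ruts.storage code: every name in it is made up, it records relations without touching a database, and it relies on sys._getframe, which is a CPython implementation detail.

```python
import sys


def _declare(kind, *args, **options):
    # Frame 0 is _declare, frame 1 is has_many/has_one,
    # frame 2 is the class body that made the call.
    namespace = sys._getframe(2).f_locals
    namespace.setdefault('_declarations', []).append((kind, args, options))


def has_many(class_name, attr, **options):
    _declare('has_many', class_name, attr, **options)


def has_one(class_name, attr):
    _declare('has_one', class_name, attr)


class ModelMeta(type):
    """Collects the declarations once the class body has run."""

    def __new__(mcls, name, bases, namespace):
        declarations = namespace.pop('_declarations', [])
        cls = super().__new__(mcls, name, bases, namespace)
        cls.relations = {
            args[1]: (kind, args[0], options)
            for kind, args, options in declarations
        }
        return cls


class Base(metaclass=ModelMeta):
    pass


class Venue(Base):
    has_many('Event', 'events', order_by='start_date')


class Event(Base):
    has_one('Venue', 'venue')


print(Venue.relations)
# {'events': ('has_many', 'Event', {'order_by': 'start_date'})}
print(Event.relations)
# {'venue': ('has_one', 'Venue', {})}
```

A real version would turn each recorded relation into a mapper property instead of a dictionary entry, but the class body would read the same.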

5.4.06. SQLAlchemy

Among the short list of good (or at least, personally interesting) Python projects under development today stands SQLAlchemy. It’s a powerful toolkit for working with relational databases in Python, including doing object-relational mapping. It offers a lot more power and flexibility than SQLObject (Python) and ActiveRecord (Ruby), while still staying pretty close to those in relative ease of use (i.e. no Hibernate style XML mappings).

Especially interesting is that SQLAlchemy is a SQL toolkit first, and an O-R mapper second. It has a lot of support for query construction, compilation, native feature support, feature-agnostic support, pre-processing, post-processing, and more, that can also be used in the mapping layer. Already it seems like it’d be free of the kind of issues that I ran into when I tried to deal with a moderately complicated inheritance / contained object scheme with Rails and ActiveRecord.