Secession

Sit back and imagine how Rush Limbaugh and his compatriots would have reacted if the Democratic governor of New York had talked about seceding in 2002.

Can you even imagine the deafening cries of un-Americanism? Of a lack of patriotism? Of unfitness for office?

What a bunch of hypocrites.

A version to rule them all

I've been working on Conary for a long time now. For those who don't know, Conary is basically a version control system for deployed systems. We've all been using Subversion or whatever for our sources (and probably for anything else that looks like source if you squint the slightest bit) for years now, but for some reason we don't use anything like it for deployed systems.

At best, we use a loose collection of versions. Tools like dpkg and rpm (and apt and yum) have integrated the versions of software components into people's minds pretty well. Questions like "which version of vi are you running?" are easy to answer in the Linux world; a simple command like "rpm -qf /bin/vi" will give you a reasonable answer.

What that leaves is systems defined by, oh, 500 versions or so. Oh, and those versions probably don't include version information for your actual application or third party software products. Add those to the count.

What all this means is that those 10,000 servers you're running don't have a version each, they have 500 versions each. You can't go look at two of them and easily see if they're the same. You can't make two of them match without going through backflips. You can't ask what version a server was running last week because the question doesn't make sense; you have to ask what 500 versions it was running last week. It's like using RCS to manage all of your source files. You might get a tag to get some kind of consistency, but aren't git versions better? Finally a single version to describe the state of your source tree. A single way ("hg parent" in my world) to know what the heck is there.

Conary provides the same capability for running systems. Define your systems as Conary groups, and each one has a single version. Need another system like that one? Just install the same version of the same group. Want to know what it was running last week? Look at which version of the group was on there last week (the rollback stack or /var/log/conary will both tell you). Need to downgrade? No problem.
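Here is a toy Python model of that idea, purely for illustration. It is not Conary's data model, command set, or API. The group name group-web, its versions, and the package manifests are all made up. Each system records only which group version it has installed, yet that one value pins every package. Comparing two servers, diffing versions, and rolling back therefore all work on a single version.

```python
# Toy model only: group names, versions, and manifests are hypothetical.
GROUPS = {
    ("group-web", "1.0"): {"httpd": "2.2.11-3", "myapp": "1.4", "vim": "7.2-1"},
    ("group-web", "1.1"): {"httpd": "2.2.11-4", "myapp": "1.5", "vim": "7.2-1"},
}


class System:
    """A machine whose entire state is named by one (group, version) pair."""

    def __init__(self):
        self.history = []  # installed group versions, oldest first

    def install(self, group):
        self.history.append(group)

    def rollback(self):
        """Drop the most recent install, returning to the previous group version."""
        self.history.pop()

    @property
    def version(self):
        return self.history[-1]

    def packages(self):
        return GROUPS[self.version]


def diff(old, new):
    """Packages whose versions differ between two group versions."""
    a, b = GROUPS[old], GROUPS[new]
    return {name: (a.get(name), b.get(name))
            for name in sorted(set(a) | set(b)) if a.get(name) != b.get(name)}


web1, web2 = System(), System()
web1.install(("group-web", "1.0"))
web2.install(("group-web", "1.0"))
print(web1.version == web2.version)  # one comparison instead of hundreds

web1.install(("group-web", "1.1"))
print(diff(("group-web", "1.0"), web1.version))
web1.rollback()
print(web1.version)
```

The "what was it running last week?" question becomes a lookup in history rather than a reconstruction of hundreds of separate package versions.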

I simply don't know how system management can scale without a version associated with a system. Not a piece of a system, but the entire system.

Fun and games with a Kindle

Right after the Kindle 2 started shipping, I bought one (that Monday). I figured if I could get FAA approach plates on the thing (like ReaderPlates does for the Sony) I'd save enough money to make the thing free within about 18 months. Oh, and it was a toy.

Funny enough, I've discovered that I kind of like reading books on the thing. Especially big, fat books where I lose track of things; having search there is great. But really, any book I don't feel the need to own forever I'm happy to read on the device. It's light, the UI is good, the battery lasts forever, and you really disappear into a good book just like you do with paper. Nice job, Amazon.

That aside, I've also enjoyed hacking up a couple of web apps which generate ebooks for it on the fly. With one of them, I can list a couple of airports and get an ebook with all of the FAA plates I need to fly in instrument conditions. The plates are date stamped, so I know when I'm current. Michael was a huge help in getting all of that done. It gave me a chance to play with Django and Genshi too, which I needed to learn more about anyway.

Today I decided to play with the TripIt APIs and see what I could make happen. After 240 lines of Python code and about the same amount of Genshi templates (with a huge dependency on XObj!), I have a web app which lists my itineraries in TripIt, lets me choose one, and creates an ebook summary of that itinerary. Now all the integration I've done with email filters for TripIt pays off not only in iCal, but in book format. Alright, so it's not clear why I need my itinerary both on my phone's calendar and on my Kindle, but it is cool to have hacked that up so quickly (almost all of it while a chicken was roasting). The authentication isn't real (it uses basic auth, not OAuth), but it's real enough for me!

Airline TV payment model

I'm writing this at 30,000 feet (I'll post it later) on a flight from New York to San Diego. A long flight. It's scheduled for six and a half hours nonstop; it's about as far as you can go without leaving the continental United States.

I spent some time hacking on a side project I'm working on with mkj, wrote about a third of a white paper he and I are writing with Jake, read the latest Economist for a while, ate dinner (eh), had dessert (yummy, but I shouldn't have), tried not to watch Gary Unmarried, watched all of Big Bang Theory, watched most of a How I Met Your Mother, mostly ignored a movie about a teenage girl (I don't identify), played Minesweeper a bit out of pure desperation, and now I'm watching the video screen loop.

I have an hour and a half left.

As my eyes flicked back up to Gary Unmarried I realized it was a repeat. A loop. This flight is so long they ran out of material. Get me out of here.

I was actually tempted to watch it this time, but managed to resist and started thinking about which shows I do watch. The list is really pretty short. In order of priority, they are:

  1. 30 Rock
  2. My Name is Earl
  3. Scrubs
  4. some animated stuff I record but don't seem to actually watch

What struck me about this list is that I started watching 30 Rock on airplanes and My Name is Earl in hotel rooms (yes, I travel a bit). Thinking about it, I realized that I also watch Big Bang Theory and Everybody Loves Raymond when I'm traveling, because I've seen them on airplanes and they both make the time pass. That's about all I watch, save Family Guy, which seems to have eaten TNT.

Ignoring Family Guy (which is little more than background noise), I started watching 60% of my shows on airplanes, and another 20% while traveling. My wife got into Scrubs, so of the shows I've picked out to watch over the last few years, I started watching all of them while traveling, and most of them on airplanes!

This surprised me a bit, and it has an interesting implication. I've always wondered how the partnerships between airlines and the networks work (American partners with CBS and Delta with NBC; notice I don't watch anything from ABC). I had always assumed the airlines paid the networks for content. But based on a sample size of one, the highly captive audience on an airplane is incredibly valuable, because it has a high potential for turning into new viewers. I wonder if the networks pay the airlines to show content to that audience. If they don't, they probably ought to.

80 minutes left. I'm back to being bored.

Google Maps Fail

To really get this, click on the links...

Say you're staying at the Doubletree hotel in downtown Boston. You look it up in Google Maps to see where it is. You realize that you're renting a car but have no reason to keep it for the next day, so you search for nearby Avis locations.

Look at that, there's an Avis location right next to the hotel at 1 Bennet Street! Fantastic! So you call up Avis (you're out on a Sunday and you need the car for the next day) and set up the pickup at Logan and the drop-off at 1 Bennet Street. All set.

Now it turns out that the Avis is really at the Charles Hotel at 1 Bennett Street (note the extra t), which is in Cambridge, not Boston. That's miles away, and not at all useful. There is a 1 Bennet Street in Boston, though, right next to the hotel you're staying at. Apparently that, plus Cambridge being kind of like Boston, was enough to confuse Google Maps, and you.
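Nobody outside Google knows how that match actually happened, so treat this only as a toy Python illustration of how easy the mistake is. It is not Google's algorithm. If a matcher compares street strings by similarity and ignores the city, the two addresses in this story look nearly identical. The 0.9 threshold and the "street, city, state" format are assumptions made for the example.

```python
from difflib import SequenceMatcher

AVIS = "1 Bennett Street, Cambridge, MA"  # where the Avis really is
HOTEL = "1 Bennet Street, Boston, MA"     # the street next to the Doubletree


def parts(address):
    """Split a hypothetical 'street, city, state' string into lowercase pieces."""
    return [p.strip().lower() for p in address.split(",")]


def naive_same_place(a, b, threshold=0.9):
    """Fuzzy-match the street only, ignoring the city entirely."""
    return SequenceMatcher(None, parts(a)[0], parts(b)[0]).ratio() >= threshold


def careful_same_place(a, b):
    """Require an exact street match and the same city."""
    pa, pb = parts(a), parts(b)
    return pa[0] == pb[0] and pa[1] == pb[1]


print(naive_same_place(AVIS, HOTEL))    # True: one letter apart, city ignored
print(careful_same_place(AVIS, HOTEL))  # False
```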

Toll roads

Here is a concise, well-written justification of toll roads. My wife regularly makes fun of me for suggesting that privatizing the interstate system would be a good idea. It's nice to see an actual economic argument about how to fix road overcrowding.

Now, if someone would explain to me why it makes sense to heavily subsidize automobiles, moderately subsidize air travel, and barely subsidize train travel, I'd appreciate it.

XObj, Part 2

giles7777 posted some comments about the XObj announcement I posted here on Tuesday. He raised some points Brett and I talked about quite a bit when we put this thing together, so I wanted to respond to them. As an aside, giles7777 and I did more extreme programming together (at the time, we called it finding the damn bugs, but now it has an actual name!) than I've done with anyone else, so I hate ignoring his thoughts!

He asked if we had thought about parsing the schema instead of the XML instance in order to generate types. This approach is quite popular in the Java world, for instance. It's quite sane for a non-dynamic language, where items need to be strongly typed from the beginning, and there are projects in the Python world which do it as well. Brett and I talked long and hard about starting with the schema, and it was the approach I favored in the beginning. There were a few reasons we didn't take it.

One important reason was that we wanted a framework which works with XML documents that have no schema. While there are proper schemas for many types of documents, the informal XML formats we use internally at rPath, as well as in many of our APIs, have never had a schema written. You can certainly argue that we're lazy not to do so, but I strongly suspect the vast majority of XML documents do not have a schema associated with them.

Another reason is that schemas can be poorly defined, requiring augmentation to be really useful. OVF is, unfortunately, an example of this: its types use attributes which should rightfully be defined as IDREF types, but aren't. Having a way to augment a formal schema was necessary for us, and leaving the schema behind was an easy way to achieve this. (The Python XObj implementation currently resolves ID/IDREF pairs for both serializing and parsing, automatically referencing a single object in both places; the AS3 implementation is a little behind but will get this feature soon.)
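As a rough idea of what resolving ID/IDREF pairs means, here is a stdlib-only Python sketch. It is not XObj's resolver, and the document is made up rather than real OVF. It assumes, hypothetically, that identifiers live in an attribute named id and references in an attribute named ref. A proper schema would declare which attributes are ID and IDREF typed, which is exactly the information OVF leaves out.

```python
import xml.etree.ElementTree as ET

# Made-up document: two items point at the same disk.
DOC = """
<envelope>
  <disk id="disk1" capacity="10GB"/>
  <disk id="disk2" capacity="20GB"/>
  <item ref="disk1"/>
  <item ref="disk2"/>
  <item ref="disk1"/>
</envelope>
"""


def resolve_refs(root, id_attr="id", ref_attr="ref"):
    """Pair every referencing element with the very element carrying its id."""
    by_id = {el.get(id_attr): el for el in root.iter() if el.get(id_attr) is not None}
    resolved = []
    for el in root.iter():
        target = el.get(ref_attr)
        if target is None:
            continue
        if target not in by_id:
            raise KeyError("dangling IDREF: %s" % target)
        resolved.append((el, by_id[target]))
    return resolved


pairs = resolve_refs(ET.fromstring(DOC))
print([(ref.get("ref"), disk.get("capacity")) for ref, disk in pairs])
```

The point is that both items referring to disk1 end up holding the same object, not two copies, which is what lets a serializer write a single element back out.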

The third major reason we wound up ignoring schemas was forward compatibility. The approach we took will parse any XML, let the application modify the pieces it understands, and output the XML without losing the unknown parts. We can represent both the known and the unknown consistently, which gives us high levels of compatibility as the document content gets enhanced. You can argue this is a misfeature, since it is highly tolerant of documents which don't comply with their own schema, though the Python implementation does allow the caller to validate a document against a schema as part of the parsing process.
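To make the round-trip idea concrete, here is a minimal stdlib-only Python sketch. It is not XObj's code; the vm document, the VM class, and the KNOWN field list are hypothetical. The application understands only name and memory, changes one of them, and writes the document back out without dropping the gpu element it knows nothing about.

```python
import xml.etree.ElementTree as ET

KNOWN = ("name", "memory")  # the only fields this hypothetical application understands


class VM(object):
    def __init__(self):
        self.name = None
        self.memory = None
        self.unknown = []  # elements we don't understand, kept untouched


def parse_vm(text):
    vm = VM()
    for child in ET.fromstring(text):
        if child.tag in KNOWN:
            setattr(vm, child.tag, child.text)
        else:
            vm.unknown.append(child)
    return vm


def vm_to_xml(vm):
    root = ET.Element("vm")
    for tag in KNOWN:
        ET.SubElement(root, tag).text = str(getattr(vm, tag))
    root.extend(vm.unknown)  # re-emit the parts we didn't understand
    return ET.tostring(root, encoding="unicode")


vm = parse_vm('<vm><name>web</name><memory>512</memory><gpu vendor="acme">1</gpu></vm>')
vm.memory = "1024"
print(vm_to_xml(vm))
```

This simple version re-emits unknown elements after the known ones. It keeps their content but not necessarily their original position; preserving position as well would mean remembering the original order of the children.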

The final reason, which may not be as important a consideration as the others, is that we wanted this library to have a flat learning curve. A lot of the existing XML parsing solutions take some getting used to, and coders who aren't steeped in XML take time to become productive. By ignoring the schema, skipping code generation steps, and generating native objects, XObj stays easy to use. A single line turns XML into an object hierarchy which is easy to use, modify, and spit back out. The downside, of course, is that it's just as easy to produce documents which violate a schema when you modify them.

Introducing XObj

A couple of weeks ago, I was talking with Brett Adam about XML decoding. We're looking at how to handle OVF in Python. I talked about the approach the open-ovf project had taken, and about my desire to get real Python classes around the OVF objects instead of a Python layer around the XML. I wanted something which turns the XML into an artifact of the object model, instead of an object model which merely gives access to the XML.

Brett told me about the approach to XML he'd used for the ActionScript implementation of the rPath Management Console, and I thought it was a good one. He enhanced the XML parser provided with ActionScript by making it type aware: elements get parsed into classes of the right type, which can then be serialized again on output. This provides a natural place to hang methods, as well as a nice way of feeding typing hints into the parser.

I decided to code up a similar approach for Python. As I worked on it, I kept walking into Brett's office with corner cases, XML oddities, and things I just didn't understand. We decided to work together on a common approach to XML parsing in the two languages, and to release the work as the XObj open source project under an MIT license. The code hasn't been released yet, but it is available via Mercurial at http://hg.rpath.com/xobj/.

You should really think of what we've built as an object reflector. Its goal is either to take the objects described by an XML document and turn them into Python or AS3 objects consistent with that document, or to take a set of objects and serialize them as XML. It supports type information for both elements and attributes, lets the caller specify which classes to use where, and provides places to put serialization hints.
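To give a feel for the serialization half of that description, here is a minimal stdlib-only Python 3 sketch. It is not XObj's toxml; the to_element and toxml names are hypothetical. Its rules are assumptions chosen for brevity: public instance attributes become child elements in sorted order, lists become repeated elements, and anything without a __dict__ is written out as text.

```python
import xml.etree.ElementTree as ET


def to_element(obj, tag):
    """Serialize obj as <tag>; public instance attributes become child elements."""
    elem = ET.Element(tag)
    attrs = getattr(obj, "__dict__", None)
    if attrs is None:  # a plain value such as a str or an int
        elem.text = str(obj)
        return elem
    for name in sorted(attrs):
        if name.startswith("_"):
            continue  # skip private housekeeping attributes
        value = attrs[name]
        for item in value if isinstance(value, list) else [value]:
            elem.append(to_element(item, name))
    return elem


def toxml(obj, tag):
    return ET.tostring(to_element(obj, tag), encoding="unicode")


class Foo(object):
    pass


class Bar(object):
    pass


foo = Foo()
foo.val = "value"
foo.bar = Bar()
foo.bar.first = 1
foo.bar.second = 2
print(toxml(foo, "foo"))
```

Because the sketch walks instance attributes, the same object tree used in the interactive examples serializes without any schema or generated code.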

One of this project's primary goals is to be easy to get started with, letting the programmer use the more complicated features as he finds the need for them. Let me show you a couple of examples in Python.

>>> from xobj import xobj
>>> class Foo(object):
...     pass
... 
>>> class Bar(object):
...     def sum(self):
...         return self.first + self.second
... 
>>> foo = Foo()
>>> foo.val = "value"
>>> foo.bar = Bar()
>>> foo.bar.first = 1
>>> foo.bar.second = 2
>>> print(xobj.toxml(foo, 'foo', xml_declaration = False))
<foo>
  <bar>
    <first>1</first>
    <second>2</second>
  </bar>
  <val>value</val>
</foo>

See? Nice and simple. Now, let's say that same hunk of XML is stored in the string variable xml. Here's how you turn it into a Python object tree:

>>> doc = xobj.parse(xml)

The object which is returned, doc, represents the entire document. It includes some housekeeping elements which make generation cleaner, as well as objects which represent the XML document. The top-level element in the XML was called foo, so the object representing that element is stored as doc.foo.

>>> doc.foo.val
'value'
>>> doc.foo.bar.first
'1'
>>> doc.foo.bar.second
'2'

Simple in-and-out object serialization with XML in between. Nothing fancy, and there are certainly other ways of doing this. Let's look at what else we can do. Notice that we've lost the class information for Bar, since we didn't tell the parser which class to use for the bar element. We can fix that.

>>> doc = xobj.parse(xml, typeMap = {'bar' : Bar})
>>> doc.foo.bar.sum()
'12'

Okay, so maybe that's not quite what we wanted. We need to tell the parser that first and second are integers, not strings.

>>> doc = xobj.parse(xml, typeMap = {'bar' : Bar, 'first' : int, 'second' : int})
>>> doc.foo.bar.sum()
3

That typeMap specifies the type for bar elements and for the first and second elements. To keep those maps from getting overly complicated, here is another way of doing the same thing:

>>> class Bar2(Bar):
...     first = int
...     second = int
... 
>>> doc = xobj.parse(xml, typeMap = {'bar' : Bar2})
>>> doc.foo.bar.sum()
3

Here we're using class variables as a form of prototyping. We could carry this even further and specify a class for foo which tells the parser what bar should be.

>>> class Foo2(Foo):
...     bar = Bar2
... 
>>> doc = xobj.parse(xml, typeMap = {'foo' : Foo2})
>>> doc.foo.bar.sum()
3

There are lots of other things you can do with prototypes like this. The last I'm going to show here (as it is 5pm two days before Christmas!) shows how you can force an item to be a list.

>>> doc = xobj.parse(xml, typeMap = {'bar' : [ Bar2 ] } )
>>> doc.foo.bar[0].sum()
3

By mapping the bar element to a list containing the Bar2 class, we've told the parser to always use a list (normally an element produces a list of objects only if it appears more than once). Note that I used a typeMap here instead of a prototype in the Foo class, but the two approaches are equivalent.
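For the curious, here is a rough, stdlib-only Python 3 sketch of how the parsing side of a reflector like this might work. It covers the three behaviors shown above: a typeMap, class attributes used as prototypes, and a one-item list that forces list results. It is not XObj's implementation. The names parse, reflect, choose_type, and Node are made up for this sketch. It also skips plenty, such as text on elements that also carry attributes, and ID/IDREF resolution.

```python
import xml.etree.ElementTree as ET


class Node(object):
    """Fallback container for elements that have no registered type."""


def choose_type(tag, parent_cls, type_map):
    """In this sketch, a class attribute on the parent (a prototype) wins over the typeMap."""
    proto = getattr(parent_cls, tag, None) if parent_cls is not None else None
    if isinstance(proto, (type, list)):
        return proto
    return type_map.get(tag)


def reflect(elem, wanted, type_map):
    """Convert one element; wanted may be a type or a one-item list like [Bar]."""
    cls = wanted[0] if isinstance(wanted, list) else wanted
    children = list(elem)
    if not children and not elem.attrib:
        text = elem.text or ""
        return cls(text) if cls is not None else text  # e.g. int("1") gives 1
    obj = (cls or Node)()
    for name, value in elem.attrib.items():
        setattr(obj, name, value)
    grouped, forced = {}, set()
    for child in children:
        child_type = choose_type(child.tag, cls, type_map)
        if isinstance(child_type, list):
            forced.add(child.tag)
        grouped.setdefault(child.tag, []).append(reflect(child, child_type, type_map))
    for tag, values in grouped.items():
        # Repeated (or forced) elements become lists; single ones stay scalar.
        setattr(obj, tag, values if tag in forced or len(values) > 1 else values[0])
    return obj


def parse(text, type_map=None):
    type_map = type_map or {}
    root = ET.fromstring(text)
    doc = Node()
    setattr(doc, root.tag, reflect(root, type_map.get(root.tag), type_map))
    return doc


class Bar(object):
    first = int  # prototypes: tell the parser these children are integers
    second = int

    def sum(self):
        return self.first + self.second


XML = "<foo><bar><first>1</first><second>2</second></bar><val>value</val></foo>"

print(repr(parse(XML).foo.bar.first))               # '1': untyped text stays a str
print(parse(XML, {"bar": Bar}).foo.bar.sum())       # 3
print(parse(XML, {"bar": [Bar]}).foo.bar[0].sum())  # 3, via a forced list
```

Letting a parent's prototype take precedence over the typeMap is a choice made for this sketch, not documented XObj behavior.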

There are lots more things XObj can do, all of which are demonstrated in the Python test cases. Give it a try, and tell me what you think!

Conary and Version Control

When we got started on Conary, one of the original goals was to bring the strengths of the version control tools used by software developers to the systems tools used by system administrators. We explored those ideas in the first paper we presented on Conary, and it's been an important guideline ever since.

I was talking to an architect at a well-known financial institution last week, and he had a good analogy for the power of Conary. He compares the rpm/dpkg/yum/apt methodology to RCS: each file is separately managed, with separate versioning and little to no coordination between the files. Conary, according to him, is much more like Subversion (I'd have preferred Mercurial, but nevertheless!), where the entire repository is treated and versioned as a whole (thanks to Conary's groups). That makes it much easier to keep sets of things in sync and to operate at a system level instead of an individual package level.

I thought this was a pretty good way of thinking about how Conary is different from the older approaches, and provides some insight into why groups are so powerful.