Thursday, October 18, 2012

Python's screwed up exception hierarchy

Doing this in Python is bad bad bad:
try:
  # some code
except Exception, e:  # Bad
  log.error("Uncaught exception!", e)
Yet you need to do something like that, typically in the event loop of an application server, or when one library is calling into another library and needs to make sure that no exception escapes from the call, or that all exceptions are re-packaged in another type of exception.

The reason the above is bad is that Python badly screwed up their standard exception hierarchy.
    __builtin__.object
        BaseException
            Exception
                StandardError
                    ArithmeticError
                    AssertionError
                    AttributeError
                    BufferError
                    EOFError
                    EnvironmentError
                    ImportError
                    LookupError
                    MemoryError
                    NameError
                        UnboundLocalError
                    ReferenceError
                    RuntimeError
                        NotImplementedError
                    SyntaxError
                        IndentationError
                            TabError
                    SystemError
                    TypeError
                    ValueError
Meaning, if you try to catch all Exceptions, you're also hiding real problems like syntax errors (!!), typoed imports, etc.  But then what are you gonna do?  Even if you wrote something silly such as:
try:
  # some code
except (ArithmeticError, ..., ValueError), e:
  log.error("Uncaught exception!", e)
You still wouldn't catch the many cases where people define new types of exceptions that inherit directly from Exception. So it looks like your only option is to catch Exception and then filter out things you really don't want to catch, e.g.:
try:
  # some code
except Exception, e:
  if isinstance(e, (AssertionError, ImportError, NameError, SyntaxError, SystemError)):
    raise
  log.error("Uncaught exception!", e)
But then nobody does this. And pylint still complains.

Unfortunately it looks like Python 3.0 didn't fix the problem :( – they only moved SystemExit, KeyboardInterrupt, and GeneratorExit to be subclasses of BaseException but that's all.

They should have introduced another separate level of hierarchy for those errors that you generally don't want to catch because they are programming errors or internal errors (i.e. bugs) in the underlying Python runtime.

Saturday, October 6, 2012

Perforce killed my productivity. Again.

I've used Perforce for 2 years at Google.  Google got a lot of things right, but Perforce has always been a pain in the ass to deal with, despite the huge amount of tooling Google built on top.  I miss a lot of things from my days at Google, but Perforce is definitely not on the list.  Isn't it ironic that for a company that builds large distributed systems on commodity machines, their P4 server had to be by far the beefiest, most expensive server?  Oh and guess what ended up happening to P4 at Google?

Anyways, after a 3 year break during which I happily forgot my struggle with Perforce, I am now back to using it.  Sigh.  Now what's 'funny' is that Arista has the same problem as Google: they locked themselves in through tools.  When you have a large code base of tools built on top of an SCM, it's really, really hard to migrate to something else.

Arista, like Google, literally has tens of thousands of lines of code of tools built around Perforce.  It's kind of ironic that Perforce, the company, doesn't appear to have done anything actively evil to lock the customers in.  The customers got locked in by themselves.  Also note that in both of these instances the companies started quite a few years ago, back when Git didn't exist, or barely existed in Arista's case, so Perforce was a reasonable choice at the time (provided you had the $$$, that is) given that the only other options then were quite brain damaging.

Now I could go on and repeat all the things that have been written many times all over the web about why Perforce sucks.  Yes it's slow, yes you can't work offline, yes you can't do anything that doesn't make it wanna talk to the server, yes it makes all your freaking files read-only and it forces you to tell the server that you're going to edit a file, etc.

But Perforce has its own advantages too.  It has quasi-decent branching / merging capabilities (merging is often more painful than with Git IMO).  It gives you a flexible way to compose your working copy, what's in it, where it comes from.  It's more forgiving for organizations that like to dump a lot of random crap in their SCM.  This seems fairly common, people just find it convenient to commit binaries and such.  It is convenient indeed if you lack better tools, but that doesn't mean it's right.
Used to be a productive software engineer, took a P4 arrow in the knee
So what's my grip with Perforce?  It totally ruins my workflow.  This makes my life as a software engineer utterly miserable.  I always work on multiple things at the same time.  Most of the time they're related.  I may be working on a big change, and I want to break it down in many multiple small incremental steps.  And I often like to revisit these steps.  Or I just wanna go back and forth between a few somewhat related things as I work on an idea and sort of wander into connected ideas.  And I want to get my code reviewed.  Before it gets upstream.

This means that I use git rebase very, very extensively.  And git stash.  I find that this the hardest thing to explain to people who don't know Git.  But once it clicks in your mind, and you understand how powerful git rebase is, you realize it's the best Swiss army knife to manipulate your changes and their history.  When it comes to writing code, it's literally my best friend after vim.

Git, as a tool to manipulate changes made to files, is several orders of magnitude better and more convenient.  It's so simple to select what goes into what commit, undo, redo, squash, split, swap, drop, amend changes.  I always feel like I can manipulate my code and commits effortlessly, that it's malleable, flexible.  I'm removing some lint around some code I'm refactoring?  No problem, git commit -p to select hunk-by-hunk what goes into the refactoring commit and what goes into the "small clean up" commit.  Perforce on the other hand doesn't offer anything but "mark this file for add/edit/delete" and "put these files in a change" and "commit the change".  This isn't the 1990s anymore, but it sure feels like it.

With Perforce you have to serialize your workflow, you have to accept to commit things that will require subsequent "fix previous commit" commits, and thus you tend to commit fewer bigger changes because breaking up a change in smaller chunks is a pain in the ass.  And when you realize you got it wrong, you can't go back, you just have to fix it up with another change.  And your project history is all fugly.  I've used the patch command more over the past 2 months than in the previous 3 years combined.  I'm back to the stone age.

Oh and you can't switch back and forth between branches.  At all.  Like, you just can't.  Period.  This means you have to maintain multiple workspaces and try to parallelize your work across them.  I already have 8 workspaces across 2 servers at Arista, each of which contains mostly-the-same copy of several GB of code.  The overhead to go back and forth between them is significant, so I end up switching a lot less than when I just do git checkout somebranch.  And of course creating a new branch/workspace is extremely time consuming, as in we're talking minutes, so you really don't wanna do it unless you know you're going to amortize the cost over the next several days.

I think the fact that P4 coerces you into a workflow that sucks shows in Perforce's marketing material and product strategy too.  Now they're rolling out this Git integration, dubbed Perforce Git Fusion, that essentially makes the P4 server speak Git so that you can work with Git but still use P4 on the server.  They sell it as "improving the Git experience".  That must be the best joke of the year.  But I think the reality is that engineers don't want to deal with the bullshit way of doing things Perforce imposes, and they want to work with Git.  Anyways this integration sounds great, I would love to use it to stop the pain, only you have to be on a recent enough version of Perforce to be able to use it, and if you're not you "just" need to pay an arm and a fucking leg to upgrade.

My lame workaround: overlay a Git repo on top of my P4 workspace, p4 edit the files I want to work on, maintain the changes in Git until I'm ready to push them upstream.  Still a royal PITA, but at least I can manipulate the files in my workspace.

And then, of course, there is the problem that I'm impatient.  I can't stand waiting more than 500ms at a prompt.  It's quite rare to be able to p4 edit a file in less than a second or two.  At 1:30am on Saturday, after a dozen p4 edits in a row, I was able to get the latency down to 300-500ms (yes it really took a dozen edits/reverts in a row to reliably get lower latency).  It often takes several minutes to trace the history of a file or a branch, or to blame a file ... when that's useful at all with Perforce.

We're in 2012, soon 2013, running on 32 core 128GB RAM machines hooked to 10G/40G networks with an RTT of less than 60µs.  Why would I ever need to wait more than a handful of milliseconds for any of these mundane things to happen?

So, you know what Perforce, (╯°□°)╯︵ ┻━┻

Edit: despite the fact that Arista uses Perforce, which is a bummer, I love that place, love the people I work with and what we're building.  So you should join!