Add support for running a subset of tests (aka "sharding"). This patch adds two new command line arguments, --shard-index and --total-shards. Together they select a fractional subset of the tests: the runner executes every `total_shards`th test in the list of tests, starting at offset `shard_index`. Also, bump the version to 0.9.5.
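
For illustration, the selection rule described above can be sketched in Python roughly as follows. This is a minimal sketch, not the code in this patch: the helper names `select_shard` and `parse_args`, the argument defaults (0 and 1), and the range checks are all assumptions made for the example.

```python
import argparse


def select_shard(tests, shard_index, total_shards):
    """Return every total_shards-th test, starting at offset shard_index.

    Hypothetical helper; the validation below is an assumption, not
    necessarily what the patch does.
    """
    if total_shards < 1:
        raise ValueError("total_shards must be at least 1")
    if not 0 <= shard_index < total_shards:
        raise ValueError("shard_index must be in [0, total_shards)")
    # Extended slicing: start at shard_index, step by total_shards.
    return tests[shard_index::total_shards]


def parse_args(argv):
    """Parse the two sharding flags; the defaults here are assumed."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--shard-index", type=int, default=0)
    parser.add_argument("--total-shards", type=int, default=1)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--shard-index", "1", "--total-shards", "2"])
    all_tests = ["test_a", "test_b", "test_c", "test_d", "test_e"]
    print(select_shard(all_tests, args.shard_index, args.total_shards))
```

With this rule the shards are disjoint and together cover the whole list, so running one invocation per index from 0 up to one less than the shard count runs each test exactly once, provided every invocation sees the tests in the same order.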