Our current HTML dumper creates one directory per title, containing a single file named after the revision number. While simple, this layout does not scale well on some file systems: a tar file that unpacks into 10 million subdirectories under a single parent directory would not work well for many users.
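For concreteness, a minimal sketch of how a (title, revision) pair maps to a path in this flat layout; `dump_root` and the function name are illustrative placeholders, not part of the dumper:

```python
from pathlib import Path

def revision_path(dump_root: str, title: str, rev_id: int) -> Path:
    """Path of one revision's HTML in the current flat layout:
    one directory per title, one file per revision."""
    # e.g. <dump_root>/Foo/12345 for revision 12345 of the page "Foo"
    return Path(dump_root) / title / str(rev_id)
```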
One option to avoid this is a subdirectory tree keyed on the actual title (like `/F/Fo/Foo/12345`) or on a hash of the title. However, such a tree is not straightforward to work with and requires a fair amount of custom client-side code.
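To illustrate what such a tree could look like, here is a hedged sketch of both variants in Python; the function names, the prefix depth, and the choice of MD5 are assumptions for the example rather than anything the dumper currently does:

```python
import hashlib
from pathlib import Path

def title_tree_path(dump_root: str, title: str, rev_id: int) -> Path:
    """/F/Fo/Foo/12345-style tree: 1-char prefix, then 2-char prefix,
    then the full title, then the revision file."""
    return Path(dump_root) / title[:1] / title[:2] / title / str(rev_id)

def hash_tree_path(dump_root: str, title: str, rev_id: int) -> Path:
    """Same idea, but the prefixes come from a hash of the title, which
    spreads entries evenly regardless of how titles are distributed."""
    digest = hashlib.md5(title.encode("utf-8")).hexdigest()
    return Path(dump_root) / digest[:2] / digest[2:4] / title / str(rev_id)
```

The hash variant keeps directory sizes predictable, but either way every consumer has to reimplement the same mapping, which is the custom client-side code mentioned above.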
Another option is to distribute a sqlite database keyed on title and revision (lzma-compressed, e.g. `en.wikipedia.org_articles.sqlite.xz`). Major advantages of this option are wide client support, random access out of the box, and no special requirements on the file system. The biggest question mark is performance at large database sizes, although posts like [this one](http://stackoverflow.com/questions/784173/what-are-the-performance-characteristics-of-sqlite-with-very-large-database-file) describe settings that seem to work at the sizes we need.
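To make the access pattern concrete, here is a minimal sketch of the kind of schema and lookup this option implies; the table name, column layout, and PRAGMA value are assumptions for illustration, and the `.xz` file would have to be decompressed before sqlite can open it:

```python
import sqlite3

# Hypothetical schema: one row per (title, revision), HTML stored as a blob.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    title    TEXT    NOT NULL,
    revision INTEGER NOT NULL,
    html     BLOB    NOT NULL,
    PRIMARY KEY (title, revision)
);
"""

def open_dump(path):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    # A larger page cache is one of the tunables suggested for multi-GB
    # databases in the linked discussion; -64000 means roughly 64 MB here.
    conn.execute("PRAGMA cache_size = -64000")
    return conn

def get_html(conn, title, revision):
    """Random access to a single revision's HTML by (title, revision)."""
    row = conn.execute(
        "SELECT html FROM articles WHERE title = ? AND revision = ?",
        (title, revision),
    ).fetchone()
    return row[0] if row else None
```

Readers would then get keyed random access through the standard sqlite client libraries available for most languages, with no custom path-mapping code on their side.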