
Evaluate the Zim file format for "DB" like features
Closed, Resolved · Public

Description

ZIM files efficiently compress HTML, enabling a large number of wiki articles to be transferred to clients with minimal bandwidth and minimal storage. The format also allows for efficient searching of the content.
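A rough illustration of where those savings come from (assumption, based on my reading of the format: ZIM packs many articles into clusters that are compressed as a unit, rather than compressing each article separately). Similar HTML pages compressed together shrink far better than the same pages compressed one at a time. A minimal stdlib sketch, with made-up sample articles:

```python
import lzma

# Hypothetical sample articles: small HTML pages with shared markup,
# standing in for wiki articles stored in a ZIM cluster.
articles = [
    f"<html><head><title>Article {i}</title></head>"
    f"<body><p>Shared boilerplate and infobox markup. Body {i}.</p></body></html>"
    for i in range(100)
]

raw = "".join(articles).encode("utf-8")

# Compressing each article on its own (roughly what per-row compression
# in a conventional DB would give you).
per_article = sum(len(lzma.compress(a.encode("utf-8"))) for a in articles)

# Compressing all articles as one block (cluster-style).
clustered = len(lzma.compress(raw))

print(f"uncompressed: {len(raw)} bytes")
print(f"per-article:  {per_article} bytes")
print(f"clustered:    {clustered} bytes")
```

The clustered size comes out well below both the raw size and the sum of individually compressed articles, which is the trade-off the questions below poke at.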

As we begin to support ZIM files in our products, some questions come to mind, especially when contrasting the format with standard DBs:

When comparing the compression with, say, a SQLite DB (or MongoDB, PostgreSQL), what are the average space savings?
Can we efficiently iterate over all title metadata to show the articles as a list (like a DB cursor)?
Is there any way to update a single article without rewriting the entire file? What would it take to do this?
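For comparison, the DB side of questions 2 and 3 is trivial. A minimal SQLite sketch (table and column names are made up for illustration) showing cursor-style title iteration and an in-place single-article update:

```python
import sqlite3

# In-memory DB standing in for an article store; schema is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT PRIMARY KEY, html TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [(f"Article {i}", f"<p>Body {i}</p>") for i in range(5)],
)

# Question 2: iterate title metadata with a cursor, without loading bodies.
titles = [row[0] for row in conn.execute("SELECT title FROM articles ORDER BY title")]
print(titles)

# Question 3: update a single article in place; SQLite rewrites only the
# affected pages, not the whole file.
conn.execute(
    "UPDATE articles SET html = ? WHERE title = ?",
    ("<p>Updated</p>", "Article 3"),
)
row = conn.execute(
    "SELECT html FROM articles WHERE title = ?", ("Article 3",)
).fetchone()
print(row[0])
```

The open question is how close a ZIM-backed store can get to this without giving up its compression ratio.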

Event Timeline

Dbrant claimed this task.

The Offline Library feature is currently shelved, but just a quick note on the ZIM format:
The format seems to be designed for high compression and fast searching, but is *not* designed for incremental updates, which is ultimately its downfall here. If we decide to bring back the Offline Library, we should consider formats other than ZIM.
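To spell out why incremental updates don't fit, under my assumptions about the layout (articles packed into clusters compressed as a unit, with byte offsets recorded up front; the names below are illustrative, not the real ZIM structures): editing one article forces the whole cluster to be rebuilt and recompressed, and because the compressed size changes, every offset after it shifts too. A toy sketch:

```python
import lzma

# Toy model of a ZIM-like layout: articles packed into one compressed
# cluster, with an offset recorded for each article.
articles = [f"<p>Article {i} body</p>".encode() for i in range(4)]

def build_cluster(arts):
    """Concatenate articles, record their offsets, compress the blob."""
    offsets, pos = [], 0
    for a in arts:
        offsets.append(pos)
        pos += len(a)
    return offsets, lzma.compress(b"".join(arts))

offsets, cluster = build_cluster(articles)

# "Updating" article 2: there is no in-place edit; the cluster must be
# decompressed, rebuilt, and recompressed, and later offsets recomputed.
articles[2] = b"<p>Article 2, now much longer after an edit</p>"
new_offsets, new_cluster = build_cluster(articles)

print(offsets, len(cluster))
print(new_offsets, len(new_cluster))
```

Scale this toy cluster up to a file-wide pointer list and the rewrite effectively cascades through the rest of the archive, which is why a DB-style UPDATE has no cheap equivalent.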