As a Wikisource user, I would like the potential benefits of, and options for, caching generated ebooks on the server to be investigated, so that it can be determined a) how this can be done, b) how much of an improvement a user could see, and c) how much time and complexity such work requires.
Background: With this work, if someone downloads Book A and someone else then wants to download Book A, the generated ebook could be served from a cache instead of being generated again. This task is a Phabricator placeholder for https://github.com/wsexport/tool/issues/38, which states: "It would be nice if you cached the files you produce for at least some days, as [[mw:OCG]] / Collection does. Not only that could save some processing time, but most importantly we could serve books faster to our users."
This might be as simple as caching the full generated epub/pdf/etc. files, or it might involve caching various parts of the ebook construction process (e.g. contributor fetching). Please refer to the GitHub link for more details.
Caching could help users in many cases, such as when many people download the featured books of the month, or when many people download the common/popular books that some wikis list permanently (such as Bengali Wikisource), among other cases. You can see data on recently popular ebook downloads on the Wikisource Stats page.
Acceptance Criteria:
- Read the relevant discussion on GitHub to get the full context (see Sam's comment below for discussion)
- Investigate what we can cache to improve ebook export reliability
- Investigate how long, generally speaking, cached files can be kept on the server
- Investigate the primary work that would need to be done in order to cache generated ebooks
- Investigate the main challenges, risks, and dependencies associated with such work
- Investigate if/how we could give users the option of skipping the cache
- Provide a general estimate, if possible, of the impact caching may have on ebook export reliability
- Provide a rough estimate of the difficulty and effort required to do such work
- Think about storage options and disk space
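To make the investigation points above more concrete, here is a minimal sketch of the simplest option (caching the full generated file), written in Python for illustration only. It is not taken from the export tool's code: the class name `EbookCache`, the `generate` callback, the `skip_cache` flag, the three-day default lifetime, and the size limit in `prune` are all hypothetical. It shows a cache key derived from the request parameters, a time limit on cached files, a way for a user to skip the cache, and a cleanup step that bounds disk usage.

```python
import hashlib
import os
import time
from pathlib import Path
from typing import Callable


class EbookCache:
    """Hypothetical file cache for generated ebooks, keyed by request parameters."""

    def __init__(self, cache_dir: str, ttl_seconds: float = 3 * 24 * 3600):
        # The three-day default is an assumption, not a recommendation.
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.ttl_seconds = ttl_seconds

    def _path(self, lang: str, title: str, fmt: str) -> Path:
        # Hash the parameters so that any title maps to a safe file name.
        raw = f"{lang}\n{title}\n{fmt}".encode("utf-8")
        return self.cache_dir / hashlib.sha256(raw).hexdigest()

    def get_or_generate(
        self,
        lang: str,
        title: str,
        fmt: str,
        generate: Callable[[], bytes],
        skip_cache: bool = False,
    ) -> bytes:
        path = self._path(lang, title, fmt)
        if not skip_cache and path.exists():
            age = time.time() - path.stat().st_mtime
            if age < self.ttl_seconds:
                return path.read_bytes()  # cache hit
        # Cache miss, expired entry, or the user asked to skip the cache.
        data = generate()
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_bytes(data)
        os.replace(tmp, path)  # atomic swap, so readers never see a partial file
        return data

    def prune(self, max_bytes: int) -> None:
        """Delete expired files, then the oldest files until under max_bytes."""
        now = time.time()
        live = []
        for p in self.cache_dir.iterdir():
            if not p.is_file():
                continue
            st = p.stat()
            if now - st.st_mtime >= self.ttl_seconds:
                p.unlink()
            else:
                live.append((st.st_mtime, st.st_size, p))
        live.sort(key=lambda entry: entry[0])  # oldest first
        total = sum(size for _, size, _ in live)
        for _, size, p in live:
            if total <= max_bytes:
                break
            p.unlink()
            total -= size
```

In this sketch, a second request for the same language, title, and format within the lifetime is served from disk, while `skip_cache=True` forces regeneration and refreshes the stored copy. A time limit alone cannot reflect edits made to the book on the wiki, so invalidation when the source pages change is one of the open questions for the investigation.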