"As a Reader of a Wikimedia site, I want to quickly get the HTML representation of a common page, and not have to wait for a parse to happen because the parser cache is full of a lot of other data."
I had a hard time formulating this as a user story. Roughly, @cscott estimates that Parsoid output for a page will be about 3x the size of the default output, counting the metadata blobs. That means we might hold roughly 4x the data in the cache for a single page (the current output plus the ~3x Parsoid output). My completely uneducated guess is that this would hurt cache performance.
I can see a lot of possibilities, but I'd like to have others weigh in on what we could do here. My first takes are:
- Add hardware. This seems kind of extreme, especially since we think this transition is going to be temporary.
- Give the non-default parser's output a lower retention priority, a shorter TTL, or something along those lines. I'm not sure how this would work in practice or whether it would actually help.
- Use existing storage. We currently have the default parser cache on memcache, and the RESTBase cache in Cassandra. The "parser cache" discussed in these user stories might just be multiplexed across those different storage layers (see the sketch after this list). I don't have the chops to say whether that's preferred or not.
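
To make the last two bullets a bit more concrete, here is a minimal sketch of the multiplexing idea, assuming a thin wrapper that routes each entry to a backend based on which parser produced it and gives the non-default parser a shorter TTL. This is purely illustrative Python: the class names, key scheme, TTL values, and in-memory backends are all invented for the example and are not MediaWiki's actual ParserCache/BagOStuff interfaces.

```
import time


class InMemoryBackend:
    """Stand-in for a real store (memcached, Cassandra, ...)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, ttl):
        self._data[key] = (value, time.time() + ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self._data[key]
            return None
        return value


class MultiplexedParserCache:
    """Route parser output to a backend chosen by parser type, and give the
    non-default parser a shorter TTL so it occupies less space over time."""

    DEFAULT_TTL = 30 * 24 * 3600   # hypothetical: ~30 days for default output
    SECONDARY_TTL = 7 * 24 * 3600  # hypothetical: shorter retention for Parsoid

    def __init__(self, default_backend, parsoid_backend):
        self.backends = {
            "default": (default_backend, self.DEFAULT_TTL),
            "parsoid": (parsoid_backend, self.SECONDARY_TTL),
        }

    def _key(self, page_id, parser):
        return f"parsercache:{parser}:{page_id}"

    def save(self, page_id, parser, html):
        backend, ttl = self.backends[parser]
        backend.set(self._key(page_id, parser), html, ttl)

    def get(self, page_id, parser):
        backend, _ = self.backends[parser]
        return backend.get(self._key(page_id, parser))


# Usage: default and Parsoid output live in separate stores with separate retention.
cache = MultiplexedParserCache(InMemoryBackend(), InMemoryBackend())
cache.save(12345, "default", "<p>default HTML</p>")
cache.save(12345, "parsoid", "<p>Parsoid HTML + metadata</p>")
assert cache.get(12345, "parsoid") == "<p>Parsoid HTML + metadata</p>"
```

The only point of the sketch is that placement and retention can be decided per parser, so the secondary output doesn't have to compete on equal footing with the default output for cache space.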