Introduce PCS cache management layer
Closed, Resolved
Public

Assigned To

Authored By

	MSantos
	Oct 16 2023, 2:38 PM

Description

Background Information

In order to remove PCS from restbase, PCS needs to manage its own cache.

What this is not about

This is not about pre-generation, which will be covered in a separate task.

What

The cache layer should connect to the same Cassandra cluster used by restbase
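As an illustration only, a cache layer of this kind is often built as a cache-aside wrapper around a storage backend. The sketch below is not the code from the linked patch: `PcsCache`, `InMemoryStore`, the key layout, and the `render` callback are all hypothetical names, and an in-memory map stands in for a Cassandra-backed store.

```javascript
// Hypothetical sketch: none of these names come from the mobileapps codebase.

// Stand-in for a Cassandra-backed store; only async get/set are assumed.
class InMemoryStore {
  constructor() {
    this.rows = new Map();
  }

  async get(key) {
    return this.rows.get(key);
  }

  async set(key, value) {
    this.rows.set(key, value);
  }
}

class PcsCache {
  // store:  any object exposing async get(key) and set(key, value)
  // render: async function that generates the content on a cache miss
  constructor(store, render) {
    this.store = store;
    this.render = render;
  }

  // Assumed key layout: project, endpoint, and title joined with ":".
  static key(project, title, endpoint) {
    return `${project}:${endpoint}:${title}`;
  }

  // Cache-aside read: return the stored value if present,
  // otherwise render it, store it, and return it.
  async fetch(project, title, endpoint) {
    const key = PcsCache.key(project, title, endpoint);
    const cached = await this.store.get(key);
    if (cached !== undefined) {
      return { value: cached, hit: true };
    }
    const value = await this.render(project, title, endpoint);
    await this.store.set(key, value);
    return { value, hit: false };
  }
}
```

Keeping the store behind a two-method interface means the backend could be swapped without touching the service code that calls `fetch`.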

Open Questions

What work is needed on the infrastructure side?

Acceptance Criteria

A strategy exists to migrate within capacity constraints
PCS is able to manage its own cache layer and restbase cache can be turned off

Details

	Subject	Repo	Branch	Lines +/-
	Introduce server side caching	mediawiki/services/mobileapps	master	+137 -3

Related Objects

Status	Assigned	Task
Stalled	None	T324931 Clean up open RESTBase related tickets
In Progress	None	T262315 <CORE TECHNOLOGY> API Migration & RESTbase Sunset
Stalled	Dbrant	T328943 Replace PCS lazy-loading logic with standard "loading=lazy" attribute
Open	None	T314025 [EPIC] Migrate PCS service away from restbase
Open	Jgiannelos	T319365 PCS caching and pregeneration when restbase is decommissioned
Resolved	Jgiannelos	T348995 Introduce PCS cache management layer
Open	hnowlan	T350507 Update mobileapps k8s deployment chart for Cassandra credentials

Event Timeline

MSantos created this task. Oct 16 2023, 2:38 PM

Restricted Application added a subscriber: Aklapper. Oct 16 2023, 2:38 PM

MSantos triaged this task as High priority. Oct 16 2023, 2:43 PM

MSantos added a parent task: T319365: PCS caching and pregeneration when restbase is decommissioned. Oct 16 2023, 2:48 PM

Jgiannelos claimed this task. Oct 23 2023, 2:50 PM

Jgiannelos moved this task from Backlog to In Progress on the Content-Transform-Team-WIP board.

MSantos moved this task from Unsorted to PCS Service Pile on the RESTBase Sunsetting board. Oct 24 2023, 10:54 AM

Jgiannelos moved this task from In Progress to Code Review on the Content-Transform-Team-WIP board. Oct 27 2023, 2:45 PM

Initial patch here after bootstrapping the nodejs env and gitlab CI:
https://gitlab.wikimedia.org/repos/content-transform/nodejs-cassandra-storage/-/merge_requests/1

@Eevans will you be available to take a look at this patch, mostly for the Cassandra-specific changes?

In T348995#9299509, @Jgiannelos wrote:

Initial patch here after bootstrapping the nodejs env and gitlab CI:
https://gitlab.wikimedia.org/repos/content-transform/nodejs-cassandra-storage/-/merge_requests/1

@Eevans will you be available to take a look at this patch, mostly for the Cassandra-specific changes?

I left comments on the merge-request (TL;DR LGTM)

Eevans updated the task description. Nov 3 2023, 7:04 PM

I added an item to the acceptance criteria about having a strategy for migration that works within the capacity constraints of the cluster. Hopefully this is the right issue for this; let me know if that's not the case and I can move it elsewhere (including to a separate ticket).

I'm going to be taking a closer look at utilization, as I have some reason to think it might be artificially high, but it's really doubtful that we'll have enough space to store all of PCS's cache twice (once for the old system, and once for the new) and still maintain a responsible amount of overhead. I don't know if that means that we migrate by group (wikipedia, enwiki, etc.), or by table (mobile_html, page_summary, etc.), or via some combination of both (or whether either would be prohibitively difficult). Or perhaps there are other RESTBase use-cases that are near enough to being removed that this won't be an issue?

{F41439373}

Eevans mentioned this in T348993: Create new cassandra table data model for PCS. Nov 3 2023, 7:49 PM

@Eevans I can't see the file you attached in your comment. If storage is a concern, we can always switch over gradually by project.
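One way a per-project switchover could be gated is with an allowlist of projects that already read from the new cache. This is a rough sketch under assumptions rather than anything from the patch: the `MIGRATED_PROJECTS` set, its example entries, and the `projectOf` and `cacheBackendFor` helpers are hypothetical.

```javascript
// Hypothetical allowlist: projects already served from the PCS-managed cache.
// The entries are examples only, not a proposed migration order.
const MIGRATED_PROJECTS = new Set(['wikivoyage.org', 'wiktionary.org']);

// Reduce a domain such as "en.wikipedia.org" to its project, "wikipedia.org".
function projectOf(domain) {
  return domain.split('.').slice(-2).join('.');
}

// Decide which cache serves a request for the given domain.
function cacheBackendFor(domain, migrated = MIGRATED_PROJECTS) {
  return migrated.has(projectOf(domain)) ? 'pcs' : 'restbase';
}
```

Growing the allowlist one project at a time would let storage for a migrated project be reclaimed from the old tables before the next project is moved.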

In T348995#9344181, @Jgiannelos wrote:

@Eevans I can't see the file you attached in your comment.

That's strange, it's an image of storage utilization on the cluster. I can see it inline, and click to view it full page. I wonder what happens if I attach the same image again?