
Deploy instance of hoarde as linked-artifacts(?) in k8s
Open, Medium, Public

Description

Data-Persistence is working on an MVP of a service for the persistent caching of outputs which correspond to MediaWiki page content (aka linked artifacts). See the (work-in-progress) proposal for more information.

The software is called Hoarde, and its source repo is: https://gitlab.wikimedia.org/repos/sre/hoarde.

Ideally we'd keep the software & service names disjoint, as we did for Kask (software), and sessionstore and echostore (services). Possible ideas for the service name are: linked-artifacts, linked-cache, linked-artifact-cache, artifact-cache, etc. Bikeshedding of service names welcome!

Edit: Updated title to reflect the state-of-the-art for this discussion.

The service endpoint will need to be accessible from within the production network, and should presumably use the service mesh (it will be accessed from MW extensions). It will need to connect to Cassandra (the RESTBase cluster) for storage, and to make connections to configured "lambdas" (services running in k8s, including ML's k8s cluster).


See also: T402984: Data Persistence Design Review: Article topic model caching

Event Timeline

Eevans triaged this task as Medium priority. Jan 8 2026, 4:55 PM
Eevans updated the task description.
Eevans added a subscriber: BWojtowicz-WMF.

Change #1224817 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/deployment-charts@master] WIP hoard chart

https://gerrit.wikimedia.org/r/1224817

Bikeshedding of service names welcome!

:)

Do you prefer Artifact over Entity? Entity seems like a more common term for this concept to me, whatcha think? In DPE at least, artifacts usually mean something more like a static file

Also, why 'linked'? I'd guess maybe because a record stored in the cache is 'linked' by a stable foreign key to the source system? I worry that 'linked' may be an overloaded term here, especially if we ever store page links ;) What did you intend 'linked' to mean?

entity-cache maybe?

[ ... ]
Do you prefer Artifact over Entity? Entity seems like a more common term for this concept to me, whatcha think?

Both are a little too generic for my taste, but I think the entity here is the MediaWiki article. It is a particular and discrete unit that can be considered apart from its properties...

image.png (317×732 px, 33 KB)

What is being stored here is more ...object produced or shaped by craft, especially a tool...or ornament of archaeological or historical interest, or something viewed as a product...rather than an inherent element. So artifacts associated with entities.

image.png (373×732 px, 51 KB)

In DPE at least, artifacts usually mean something more like a static file

I don't think that's too far off. For example, from the perspective of this system, the data is opaque (as it is for a system storing files), even if it's somehow structured according to the thing that produced it.

Also, why 'linked'? I'd guess maybe because a record stored in the cache is 'linked' by a stable foreign key to the source system? I worry that 'linked' may be an overloaded term here, especially if we ever store page links ;) What did you intend 'linked' to mean?

Yes, linked as in "connected". Each is linked to a particular revision of a MediaWiki article.

I think that just as important as having the capability itself is having a system (and its corresponding documentation, nomenclature, etc.) that encapsulates a well-defined problem. We should be able to clearly and easily define exactly what this is for (and perhaps even more importantly, what it isn't for).

entity-cache maybe?

This feels less specific, where I would prefer something more specific. The less ambiguous the better!

Oh, entity vs. artifact, interesting. Indeed! We would not be storing the entity, but data directly associated with / about the entity. A MW example: pagelinks is derived data, but we would not say that a pagelink is a core MW entity. It is derived data about a MW entity: the page. Is it a new derived entity? Meh, I suppose you could call it that, but that is not helpful. Okay, I'm convinced entity is not a good name for this, then.

Another term for what this is sort of doing: materialized view storage. Or, it can be used as a materialized view, but it also has cache-like abilities: fetching the requested value on a cache miss. materialized-artifacts? Meh.
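
For anyone following along, the "fetch on cache miss" behaviour can be sketched roughly as below. This is only an illustration under stated assumptions, not Hoarde's actual code or API: `ArtifactKey`, `ReadThroughCache`, and the `compute` callable (standing in for a call to a configured "lambda") are hypothetical names, the key fields are a guess at what "linked to a particular revision" might look like, and an in-memory dict stands in for Cassandra.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ArtifactKey:
    """Hypothetical key: an artifact is linked to one revision of one page."""
    wiki: str
    page_id: int
    rev_id: int
    artifact_type: str


class ReadThroughCache:
    """Illustrative read-through cache; a dict stands in for persistent storage."""

    def __init__(self, compute: Callable[[ArtifactKey], bytes]) -> None:
        # `compute` is an assumption: a stand-in for calling a "lambda" service.
        self._compute = compute
        self._store: Dict[ArtifactKey, bytes] = {}
        self.misses = 0

    def get(self, key: ArtifactKey) -> bytes:
        # Hit: return the stored artifact as-is (the value is opaque bytes).
        if key in self._store:
            return self._store[key]
        # Miss: produce the value, persist it, then return it.
        self.misses += 1
        value = self._compute(key)
        self._store[key] = value
        return value
```

A second `get` with the same key is served from the store without calling `compute` again, which is the materialized-view-like part; the first `get` is the cache-like part.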

Yes, linked as in "connected".

Hm, what about using the term 'foreign' as in foreign key somehow. foreign-artifacts? Too weird? :) associated-artifacts ?

Maybe linked-artifacts is just fine :D

[ ... ]

Hm, what about using the term 'foreign' as in foreign key somehow. foreign-artifacts? Too weird? :) associated-artifacts ?

Maybe linked-artifacts is just fine :D

I'm not a huge fan of linked-artifacts, but I've vacillated on this quite a bit, with a trajectory that looked a lot like the above. :)

Change #1227851 had a related patch set uploaded (by Federico Ceratto; author: Federico Ceratto):

[operations/puppet@production] service, trafficserver: Prepare "linked-artifacts" k8s pod

https://gerrit.wikimedia.org/r/1227851

Eevans renamed this task from Deploy instance of hoarde as artifact-cache(?) in k8s to Deploy instance of hoarde as linked-artifacts(?) in k8s. Fri, Jan 16, 6:44 PM
Eevans updated the task description.

We (Serviceops) are at capacity this quarter, but we'll keep an eye on this. Please tag us if we can help with design questions or issues during deployment.

Change #1237258 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] deployment_server: add linked-artifacts kubeconfig files

https://gerrit.wikimedia.org/r/1237258