Eevans (Eric Evans)
Staff Site Reliability Engineer

User Details

User Since
Feb 27 2015, 10:47 PM (562 w, 1 d)
Availability
Available
IRC Nick
urandom
LDAP User
Eevans
MediaWiki User
EEvans (WMF) [ Global Accounts ]

Recent Activity

Tue, Dec 2

Eevans added a comment to T410075: Discovery of Cassandra cluster nodes.

[ ... ] The external service should be basically a sort of software LB for hosts outside k8s, so you can contact them from k8s simply starting from a common name. I've read a bit of code and discovered stuff like https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/953283, my worry was that we had to duplicate the cassandra instance hostnames in the egress rules and in the initial hostnames list (for discovery), but it seems that we have a workaround in place.

Tue, Dec 2, 3:48 PM · Data-Persistence, SRE, Cassandra
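The duplication worry above (listing the same Cassandra hostnames in both the egress rules and the initial discovery list) disappears if both are derived from one inventory. Below is a minimal sketch of that idea. The inventory structure, the hostnames, and the rule format are hypothetical and do not reflect the actual deployment-charts layout.

```python
from dataclasses import dataclass


# Hypothetical single source of truth: one list of instances per cluster.
# Hostnames and port are illustrative placeholders only.
@dataclass(frozen=True)
class CassandraInstance:
    hostname: str
    port: int = 9042


CLUSTERS = {
    "example-cluster": [
        CassandraInstance("cassandra-a.example.org"),
        CassandraInstance("cassandra-b.example.org"),
        CassandraInstance("cassandra-c.example.org"),
    ],
}


def egress_rules(cluster: str) -> list[dict]:
    """Derive (hypothetical) egress allow-rules from the shared inventory."""
    return [{"host": i.hostname, "port": i.port} for i in CLUSTERS[cluster]]


def contact_points(cluster: str) -> list[str]:
    """Derive the initial discovery (contact point) list from the same inventory."""
    return [i.hostname for i in CLUSTERS[cluster]]
```

Adding or removing a node then means editing one list, and both outputs stay in sync by construction.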

Mon, Dec 1

Eevans added a comment to T410075: Discovery of Cassandra cluster nodes.

[ ... ]
The only thing that we can explore at this point is a custom external service / svc in k8s for the various clusters, so every client will only need that single name, instead of a list of Cassandra instances, to discover the various IPs to connect to. We'll keep the egress policies managed by the external-services (sorry for the name overloading) configs that Balthazar created in the past; it should be a good-enough compromise to alleviate the proliferation of IPs/hostnames in various configs. Pooling and depooling will not be ideal, but if things are in a single place we should be more efficient in management over a long period of time. Lemme know your thoughts, or if you have a different idea!

Mon, Dec 1, 2:46 PM · Data-Persistence, SRE, Cassandra

Fri, Nov 28

Eevans added a comment to T271140: Some Data Persistence clusters apparently do not support IPv6.

We're good for Thanos!

[ ... ]

  • the restbase cluster that is now consistent, although missing AAAA records for all hosts, is that intended @Eevans?

Not that I know of!

For restbase they don't automatically get provisioned with v6 because they're in that list: https://gerrit.wikimedia.org/r/c/operations/software/netbox-extras/+/1208360

Fri, Nov 28, 11:40 PM · Data-Persistence, IPv6

Wed, Nov 26

Eevans added a comment to T410962: Provision Global Editor Metrics tables & endpoints.

Ok, schema has been created, grants made, and DG v1.0.14 has been deployed to staging. Let me know if you encounter any problems.

Wed, Nov 26, 6:57 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence
Eevans added a comment to T410075: Discovery of Cassandra cluster nodes.

@Eevans thanks for the explanation, I kinda assumed that a query to any of the cassandra nodes would have worked as-is, routing the request to the right node (if needed) behind the scenes. IIUC you are saying that whatever endpoint is provided to connect to a cassandra, it is used to retrieve the list of nodes and then the client picks up the right IP address based on $routing policy.

Wed, Nov 26, 3:34 PM · Data-Persistence, SRE, Cassandra
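The behaviour described above (whatever endpoint the client is given is only used to learn the list of nodes, after which the client picks nodes itself according to a routing policy) can be modelled with a toy example. This is not driver code: real drivers typically read topology from system tables such as system.peers and offer richer policies (e.g. token-aware routing). The topology map and the round-robin policy here are simplified stand-ins.

```python
import itertools

# Hypothetical topology: what each node would report about its peers.
TOPOLOGY = {
    "10.0.0.1": ["10.0.0.2", "10.0.0.3"],
    "10.0.0.2": ["10.0.0.1", "10.0.0.3"],
    "10.0.0.3": ["10.0.0.1", "10.0.0.2"],
}


def discover(contact_points: list[str]) -> list[str]:
    """Learn every node from the first reachable contact point."""
    for cp in contact_points:
        if cp in TOPOLOGY:  # stand-in for "connection succeeded"
            return sorted([cp, *TOPOLOGY[cp]])
    raise ConnectionError("no contact point reachable")


class RoundRobinPolicy:
    """Simplest possible load-balancing policy: cycle through all known nodes."""

    def __init__(self, nodes: list[str]):
        self._cycle = itertools.cycle(nodes)

    def next_host(self) -> str:
        return next(self._cycle)
```

For example, `discover(["10.9.9.9", "10.0.0.2"])` skips the unreachable first entry and returns all three nodes; subsequent requests are spread over every node, not just the contact point.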
Eevans added a comment to T410075: Discovery of Cassandra cluster nodes.

Naïve q, piggybacking on @Eevans 's response: what about a DNS domain resolving to the node IPs? If we have a recent enough version, we can let the client perform the DNS resolution and trial of the different resolved node IPs, as per https://issues.apache.org/jira/browse/CASSANDRA-14361

Wed, Nov 26, 2:56 PM · Data-Persistence, SRE, Cassandra
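The DNS idea above relies on one name resolving to several node addresses, which the client can then try in turn. A minimal, generic sketch using the standard resolver follows. It is not the CASSANDRA-14361 implementation. In practice the name would be something like a hypothetical `cassandra-cluster.example.org` with one A/AAAA record per node; `localhost` is used below only so the example runs anywhere.

```python
import socket


def resolve_all(name: str, port: int = 9042) -> list[str]:
    """Resolve one DNS name to every address it returns (A and AAAA), de-duplicated.

    A client could try each returned address in turn as a contact point.
    """
    infos = socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)
    addresses: list[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:  # keep resolver order, drop duplicates
            addresses.append(ip)
    return addresses


if __name__ == "__main__":
    print(resolve_all("localhost"))
```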
Eevans added a comment to T410075: Discovery of Cassandra cluster nodes.

Hi folks!

[ ... ]

I totally agree; my proposal is just to have a quick way to find a random Cassandra host to connect to that is pooled and not under maintenance. I don't think we'd need to disable anything on the Cassandra side; in theory the routing strategy at the LVS level shouldn't really affect clients, but it is something to test. Is there a specific use case (that you have in mind) that may trigger issues?

Wed, Nov 26, 2:52 PM · Data-Persistence, SRE, Cassandra
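What the proposed LVS endpoint would provide ("a random host that is pooled and not under maintenance") can be sketched client-side. This is a minimal sketch, assuming a hypothetical `pooled` map of hostname to pooled state and a hypothetical `can_connect` probe standing in for a TCP connect. The chosen host only serves as the bootstrap point; the driver still discovers and routes to the rest of the cluster.

```python
import random
from typing import Callable, Optional


def pick_contact_point(
    pooled: dict[str, bool],
    can_connect: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> str:
    """Return a random pooled host that accepts a connection.

    `pooled` maps hostname -> True if the host is pooled (not under
    maintenance). Candidates are tried in random order to spread load.
    """
    rng = rng or random.Random()
    candidates = [host for host, is_pooled in pooled.items() if is_pooled]
    rng.shuffle(candidates)
    for host in candidates:
        if can_connect(host):
            return host
    raise ConnectionError("no pooled host accepted a connection")
```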

Tue, Nov 25

Eevans added a comment to T410075: Discovery of Cassandra cluster nodes.

[ ... ]

Lemme know :)

Tue, Nov 25, 9:16 PM · Data-Persistence, SRE, Cassandra
Eevans added a comment to T410962: Provision Global Editor Metrics tables & endpoints.

The more I look at this top k endpoint, the more I think I may have misunderstood what was intended. I reckon the endpoint should accept a start and end timestamp, and return all of the aggregations included (like the other). I'll update it to reflect that.

Backend-wise, we will only aggregate monthly for the foreseeable future. Product-wise, it might not even make sense to be more flexible.
However, for consistent endpoint modeling, using start and end parameters makes sense...

Tue, Nov 25, 4:54 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence
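If the endpoint accepts start and end timestamps while the backend only stores monthly aggregates, a handler has to map the requested range onto monthly buckets. A minimal sketch, assuming buckets are identified as `YYYY-MM` and that any range is widened to whole calendar months (both assumptions, not documented Data Gateway behaviour):

```python
from datetime import date


def months_in_range(start: date, end: date) -> list[str]:
    """List the monthly buckets (YYYY-MM) overlapped by [start, end], inclusive."""
    if end < start:
        raise ValueError("end must not precede start")
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return months
```

For example, a request from 2025-11-15 to 2026-02-01 would cover the buckets 2025-11, 2025-12, 2026-01 and 2026-02.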
Eevans added a comment to T410962: Provision Global Editor Metrics tables & endpoints.

The more I look at this top k endpoint, the more I think I may have misunderstood what was intended. I reckon the endpoint should accept a start and end timestamp, and return all of the aggregations included (like the other). I'll update it to reflect that.

Tue, Nov 25, 3:44 AM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence

Mon, Nov 24

Eevans triaged T410962: Provision Global Editor Metrics tables & endpoints as Medium priority.

@mforns, @amastilovic I opened https://gitlab.wikimedia.org/repos/sre/data-gateway/-/merge_requests/10, which combines the work started by @Ottomata (data-gateway/mr-8), with the schema and endpoint for pageviews_top_pages_per_editor.

Mon, Nov 24, 10:36 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence
Eevans created T410962: Provision Global Editor Metrics tables & endpoints.
Mon, Nov 24, 10:23 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence

Thu, Nov 13

Eevans added a comment to T409414: Configure Lift Wing isvc Integration with Cassandra.

[ ... ]
@Eevans was there any discussion about adding an LVS endpoint in front of the Cassandra nodes? It shouldn't be super difficult, and it would simplify things a lot :)

Thu, Nov 13, 6:34 PM · Machine-Learning-Team
Eevans created T410075: Discovery of Cassandra cluster nodes.
Thu, Nov 13, 6:33 PM · Data-Persistence, SRE, Cassandra

Wed, Nov 12

Eevans updated the task description for T409850: Cassandra role & grants for Lift Wing isvc integration.
Wed, Nov 12, 10:12 PM · Data-Persistence, Machine-Learning-Team
Eevans closed T409850: Cassandra role & grants for Lift Wing isvc integration as Resolved.

Ok, a new role has been created: revise_tone_task_generator, and it has been given MODIFY permissions on ml_cache.page_paragraph_tone_scores. This is the case for both the cassandra-dev (staging), and aqs (production) clusters.

Wed, Nov 12, 10:11 PM · Data-Persistence, Machine-Learning-Team
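The exact statements run aren't recorded here. Assuming standard CQL role management, creating the role and granting it MODIFY on the table would look roughly like the output of this helper. The password is a placeholder, and the helper itself is hypothetical. It interpolates identifiers directly, so it is only suitable for trusted, operator-supplied names.

```python
def role_and_grant_cql(
    role: str,
    keyspace: str,
    table: str,
    permissions: tuple[str, ...] = ("MODIFY",),
) -> list[str]:
    """Build CQL to create a login role and grant it table-level permissions."""
    statements = [
        # Placeholder secret: real credentials belong in a secrets store.
        f"CREATE ROLE IF NOT EXISTS {role} WITH PASSWORD = '<secret>' AND LOGIN = true;"
    ]
    for perm in permissions:
        statements.append(f"GRANT {perm} ON TABLE {keyspace}.{table} TO {role};")
    return statements


for stmt in role_and_grant_cql("revise_tone_task_generator", "ml_cache", "page_paragraph_tone_scores"):
    print(stmt)
```

If reads turned out to be required, passing `permissions=("MODIFY", "SELECT")` would add the second GRANT.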
Eevans closed T409850: Cassandra role & grants for Lift Wing isvc integration, a subtask of T409414: Configure Lift Wing isvc Integration with Cassandra, as Resolved.
Wed, Nov 12, 10:11 PM · Machine-Learning-Team
Eevans added a comment to T409850: Cassandra role & grants for Lift Wing isvc integration.

When the service starts, Lift Wing will validate whether the target table exists, so we'll need SELECT as well. @BWojtowicz-WMF, is it correct?

In the current implementation, we try to validate that our connection is successful and the table exists by running this query on init: SELECT table_name FROM system_schema.tables WHERE keyspace_name = %s AND table_name = %s. I think system_schema.tables is readable by default, so this validation probably doesn't require any additional permissions.

Wed, Nov 12, 2:28 PM · Data-Persistence, Machine-Learning-Team
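The startup check quoted above can be sketched as a small function. It assumes `session` behaves like a cassandra-driver `Session`, whose `execute()` accepts `%s` placeholders with a parameter tuple and returns a result set with `.one()`. This is an illustration of the described check, not the inference-services code.

```python
TABLE_EXISTS_CQL = (
    "SELECT table_name FROM system_schema.tables "
    "WHERE keyspace_name = %s AND table_name = %s"
)


def table_exists(session, keyspace: str, table: str) -> bool:
    """Return True if keyspace.table exists.

    Only reads system_schema.tables, so it should not need SELECT on the
    target table itself.
    """
    return session.execute(TABLE_EXISTS_CQL, (keyspace, table)).one() is not None
```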

Tue, Nov 11

Eevans updated the task description for T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.
Tue, Nov 11, 8:49 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans added a comment to T409850: Cassandra role & grants for Lift Wing isvc integration.

With respect to GRANTs, is it safe to assume that MODIFY is sufficient? There is no requirement to do reads here, is there?

Tue, Nov 11, 4:37 PM · Data-Persistence, Machine-Learning-Team
Eevans updated the task description for T409850: Cassandra role & grants for Lift Wing isvc integration.
Tue, Nov 11, 4:36 PM · Data-Persistence, Machine-Learning-Team
Eevans added a comment to T409414: Configure Lift Wing isvc Integration with Cassandra.

@Eevans i guess we can just start with a set of shared credentials and split later if needed

These clusters are managed as multi-tenant, so what I'm trying to establish here is whether this is logically one tenant, or many (two currently). If what is writing to Cassandra is some piece of shared infrastructure or service (presumably a single code repository), then that would be one tenant (one set of credentials). If each project contains the code that manages connections to Cassandra, those are separate tenants, and we should create a role for each.

I think it should be one tenant. We're using a shared infrastructure (Lift Wing) and a single code repository (inference-services).

Tue, Nov 11, 4:31 PM · Machine-Learning-Team
Eevans triaged T409850: Cassandra role & grants for Lift Wing isvc integration as Medium priority.
Tue, Nov 11, 4:29 PM · Data-Persistence, Machine-Learning-Team
Eevans created T409850: Cassandra role & grants for Lift Wing isvc integration.
Tue, Nov 11, 4:27 PM · Data-Persistence, Machine-Learning-Team

Mon, Nov 10

Eevans updated the task description for T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.
Mon, Nov 10, 10:45 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans updated the task description for T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.
Mon, Nov 10, 10:43 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans added a comment to T409414: Configure Lift Wing isvc Integration with Cassandra.

@Eevans i guess we can just start with a set of shared credentials and split later if needed

Mon, Nov 10, 4:35 PM · Machine-Learning-Team

Nov 6 2025

Eevans added a comment to T409414: Configure Lift Wing isvc Integration with Cassandra.

[ ... ]

@Eevans Hi! Is there a load balancing endpoint in front of the cassandra nodes, or should we randomly pick one to connect to?

Nov 6 2025, 4:29 PM · Machine-Learning-Team
Eevans added a comment to T409414: Configure Lift Wing isvc Integration with Cassandra.

is Cassandra running on the prod network? if yes it should be reachable at a given address/port with a set of credentials, no?

Nov 6 2025, 3:23 PM · Machine-Learning-Team

Nov 4 2025

Eevans added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

The table has been created, the mock data loaded, and v1.0.13 of the Gateway (w/ the new endpoint) has been deployed.

Nov 4 2025, 8:22 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

Clearly I care about naming! But once again, I only put my arguments here and leave it to you to make the final call!

Nov 4 2025, 3:14 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence

Nov 3 2025

Eevans added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

FYI, I've rearranged some of what I'm quoting here (I hope that's OK).

Nov 3 2025, 9:45 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

Could we go with page_paragraph_tone_scores?

I think it's clear that the data represents paragraphs from MediaWiki pages when we have page_id as part of the primary key.

Perhaps, but I'm suggesting this in anticipation of more derived data related to MediaWiki page entities. In the future I think we will want to standardize on a table name and key for derived data keyed by page_id (ML predictions, structured tasks, etc.). If and when we do, we'd likely want to include the MediaWiki entity name in all derived data tables like this, to make it clear which ones are about pages (and what their expected key is), which ones are about users, etc.

Nov 3 2025, 7:03 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence

Oct 31 2025

Eevans added a comment to T343855: AQS 2.0 differentially private pageviews deploy API.

Any updates on this?

Oct 31 2025, 7:47 PM · Cassandra, serviceops, AQS2.0, Service-deployment-requests, Services, SRE
Eevans closed T360548: Cassandra quorum read timeouts during node decommissions as Resolved.

I'd pretty much forgotten about this issue, but there have since been a number of decommissions (routine node refreshes) that have happened with no reported errors. I think it is reasonable to assume that the upgrade to 4.1.5 (we are now on 4.1.8) did in fact fix this issue, and so I will boldly close the ticket (we can always reopen if we learn otherwise).

Oct 31 2025, 7:00 PM · Cassandra
Eevans closed T358141: sre.cassandra.roll-restart cookbook can fail if it overlaps with a puppet run as Resolved.

This was fixed in https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1023873.

Oct 31 2025, 6:51 PM · Cassandra
Eevans moved T408935: Provision anonymous session storage from Backlog to Next on the Cassandra board.
Oct 31 2025, 6:47 PM · MediaWiki-Platform-Team (Radar), SRE-OnFire, Sustainability (Incident Followup), Cassandra
Eevans added a comment to T392170: sessionstorage namespacing.

@Tgr at this point, is there any obstacle and/or objections to separating storage of central auth sessions? Using a separate store/namespace for them is perhaps even more interesting if it means we could set the TTLs accordingly.

Oct 31 2025, 6:28 PM · MediaWiki-Platform-Team (Radar), SRE-OnFire, Sustainability (Incident Followup), Cassandra
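Separating central auth sessions into their own namespace lets each class of session carry its own TTL. Below is a minimal in-memory sketch of namespaced keys with per-namespace expiry. The namespace names and TTL values are hypothetical, and this does not model the sessionstore service's actual API.

```python
import time
from typing import Callable, Optional

# Hypothetical namespaces and TTLs (seconds), for illustration only.
NAMESPACE_TTLS = {"central_auth": 86400, "anon": 3600}


class NamespacedSessionStore:
    """Keys are (namespace, key) pairs; each namespace has its own TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._data: dict[tuple[str, str], tuple[float, bytes]] = {}

    def put(self, namespace: str, key: str, value: bytes) -> None:
        ttl = NAMESPACE_TTLS[namespace]  # unknown namespaces raise KeyError
        self._data[(namespace, key)] = (self._clock() + ttl, value)

    def get(self, namespace: str, key: str) -> Optional[bytes]:
        entry = self._data.get((namespace, key))
        if entry is None or entry[0] <= self._clock():  # missing or expired
            return None
        return entry[1]
```

The same key can live independently in two namespaces, and expiring anonymous sessions aggressively does not affect central auth ones.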
Eevans added a comment to T408935: Provision anonymous session storage.

@Tgr what portion of the overall workload is anon? Is there a dashboard for this?

Oct 31 2025, 6:21 PM · MediaWiki-Platform-Team (Radar), SRE-OnFire, Sustainability (Incident Followup), Cassandra
Eevans triaged T408935: Provision anonymous session storage as Medium priority.
Oct 31 2025, 6:18 PM · MediaWiki-Platform-Team (Radar), SRE-OnFire, Sustainability (Incident Followup), Cassandra
Eevans updated the task description for T408935: Provision anonymous session storage.
Oct 31 2025, 6:18 PM · MediaWiki-Platform-Team (Radar), SRE-OnFire, Sustainability (Incident Followup), Cassandra
Eevans updated the task description for T408935: Provision anonymous session storage.
Oct 31 2025, 6:13 PM · MediaWiki-Platform-Team (Radar), SRE-OnFire, Sustainability (Incident Followup), Cassandra
Eevans created T408935: Provision anonymous session storage.
Oct 31 2025, 6:12 PM · MediaWiki-Platform-Team (Radar), SRE-OnFire, Sustainability (Incident Followup), Cassandra
Eevans triaged T392170: sessionstorage namespacing as Medium priority.
Oct 31 2025, 6:01 PM · MediaWiki-Platform-Team (Radar), SRE-OnFire, Sustainability (Incident Followup), Cassandra
Eevans added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

[ ... ]

My question is: what work is required from us to load a few mock records on staging Cassandra? Is this something that can be done on your end?

Oct 31 2025, 4:31 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans added a comment to T402853: Deploy separate anonymous session backend to Wikimedia production.

[ ... ]
Is the relevant task T392170: sessionstorage namespacing?

Oct 31 2025, 3:37 PM · MediaWiki-Platform-Team (Kanban Board), Data-Persistence, OKR-Work, Wikimedia-Site-requests

Oct 30 2025

Eevans added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

Ok, I've updated https://gitlab.wikimedia.org/repos/sre/data-gateway/-/merge_requests/9

Oct 30 2025, 9:48 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans edited P84494 (An Untitled Masterwork).
Oct 30 2025, 9:03 PM
Eevans created P84494 (An Untitled Masterwork).
Oct 30 2025, 9:02 PM

Oct 29 2025

Eevans closed T401394: ☂️ [FY2025-26][Hypothesis] WE6.2.3 Data Storage Design Review as Resolved.
Oct 29 2025, 8:32 PM · Data-Persistence
Eevans added a comment to T401394: ☂️ [FY2025-26][Hypothesis] WE6.2.3 Data Storage Design Review.

This is complete, and the design review published as: https://wikitech.wikimedia.org/wiki/SRE/Data_Persistence/Design_Review

Oct 29 2025, 8:32 PM · Data-Persistence
Eevans triaged T408746: ☂️ [FY2025-26][Hypothesis] WE6.2.4 Data Storage Design Review as Medium priority.
Oct 29 2025, 8:29 PM · Data-Persistence
Eevans created T408746: ☂️ [FY2025-26][Hypothesis] WE6.2.4 Data Storage Design Review.
Oct 29 2025, 8:29 PM · Data-Persistence
Eevans updated the task description for T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.
Oct 29 2025, 8:25 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans updated the task description for T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.
Oct 29 2025, 8:25 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

Trying to address the blocking ones:

  • Naming

Keyspace: ml_cache
Table name: page_tone_check

I think "page_tone_check" is okay since the ML model is called Tone Check, and the data stored is outputs from the model.

Oct 29 2025, 8:20 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans updated the task description for T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.
Oct 29 2025, 7:26 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence

Oct 27 2025

Eevans created P84310 (An Untitled Masterwork).
Oct 27 2025, 9:25 PM
Eevans added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

every change to the model absolutely requires a change to the application code as well

This is probably a good thing. IIUC, model_version rarely changes, but if it does, you probably want to have a managed upgrade path. This would also give you the ability to A/B test serving different model versions. When this happens, I would expect that ML could generate and store tasks using both models until we are sure the new model_version is the one to use.

Oct 27 2025, 9:07 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

I've created a draft merge-request here: https://gitlab.wikimedia.org/repos/sre/data-gateway/-/merge_requests/9, please have a look and let me know if —among other things— there are any issues with the keyspace and table names (which by convention are exposed in the DG urls), attribute names (which by convention will be returned in JSON results), or the order/disposition of URL parameters (I've ordered them differently to how they appear in the schema).

From that MR I'm having trouble deciphering exactly what the request pattern would look like. Would it just be http://localhost:1234/ml_cache/{wiki_id}/{page_id}? (With localhost:1234 being the internal URL for the Data Gateway.) I would have expected that to include something about Revise Tone. Note that for image suggestions it is a bit more elaborate: /public/image_suggestions/suggestions/{wiki}/{page_id} (per the docs).

Oct 27 2025, 4:10 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
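Under the convention mentioned above (keyspace and table names exposed in the Data Gateway URLs), a request path would carry the table name as well as the keyspace, followed by the key parameters. A minimal sketch, assuming a hypothetical `/{keyspace}/{table}/{key parts...}` layout; the real MR may add a prefix (image suggestions, for instance, are served under `/public/...`).

```python
from urllib.parse import quote


def dg_path(keyspace: str, table: str, *key_parts: object) -> str:
    """Build a hypothetical Data Gateway path: /{keyspace}/{table}/{key}/...

    Each segment is percent-encoded so values containing '/' cannot alter
    the path structure.
    """
    segments = [keyspace, table, *(str(p) for p in key_parts)]
    return "/" + "/".join(quote(s, safe="") for s in segments)


print(dg_path("ml_cache", "page_paragraph_tone_scores", "enwiki", 12345))
```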
Eevans added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

But also, it kind of is cache isn't?

I'm not sure if it is! At the very least, it is not a read-through cache. But as we discussed in slack, the line is blurry.

In this case, there is no way to make a request for the task content to be generated. Tasks are the data stored in this cassandra table. If the data isn't there, it can't be obtained by users. If we had done a batch task generation approach, I think this would be more apparent.

Oct 27 2025, 4:06 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

QUESTION: What is Tone Suggestion Generator (as referenced in F66750510)? Is it architecturally compatible with the sort of HTTP service mentioned above?

[ ... ]
It is not expected to handle client reads, which are handled by Data Gateway. The ML team don't have a preference on the RESTBase or AQS cluster. But we would like to use Data Gateway since Growth team has been using it for other structured tasks.

Oct 27 2025, 3:36 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans updated the task description for T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.
Oct 27 2025, 3:16 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans updated the task description for T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.
Oct 27 2025, 3:06 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans updated the task description for T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.
Oct 27 2025, 3:03 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans updated the task description for T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.
Oct 27 2025, 3:01 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans updated the task description for T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.
Oct 27 2025, 2:58 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence

Oct 24 2025

Eevans added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

I've created a draft merge-request here: https://gitlab.wikimedia.org/repos/sre/data-gateway/-/merge_requests/9, please have a look and let me know if —among other things— there are any issues with the keyspace and table names (which by convention are exposed in the DG urls), attribute names (which by convention will be returned in JSON results), or the order/disposition of URL parameters (I've ordered them differently to how they appear in the schema). Also, let me know whether or not you think we should include all of the attributes in the results. For example, do you want wiki_id, page_id, etc, in the results given that presumably the caller will know them (having just supplied them as query parameters).

Oct 24 2025, 6:39 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
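The question of whether to echo wiki_id, page_id, etc. back in the results amounts to an optional projection step before serializing JSON. A minimal sketch; the attribute names other than wiki_id and page_id are hypothetical placeholders.

```python
def strip_request_keys(rows: list[dict], request_keys: set[str]) -> list[dict]:
    """Drop attributes the caller already supplied as URL parameters."""
    return [{k: v for k, v in row.items() if k not in request_keys} for row in rows]


rows = [{"wiki_id": "enwiki", "page_id": 1, "revision_id": 5, "score": 0.9}]
print(strip_request_keys(rows, {"wiki_id", "page_id"}))
```

Keeping the keys makes each result self-describing (useful if responses are logged or cached on their own). Stripping them makes responses smaller and avoids redundancy.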

Oct 23 2025

Eevans triaged T408129: Provision Cassandra + Data Gateway resources for Tone Check as Medium priority.
Oct 23 2025, 3:03 PM · Cassandra, OKR-Work, Goal, Machine-Learning-Team
Eevans created T408129: Provision Cassandra + Data Gateway resources for Tone Check.
Oct 23 2025, 3:03 PM · Cassandra, OKR-Work, Goal, Machine-Learning-Team
Eevans updated the task description for T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.
Oct 23 2025, 2:12 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans updated the task description for T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.
Oct 23 2025, 2:11 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence

Oct 22 2025

Eevans updated the task description for T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.
Oct 22 2025, 11:18 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans updated the task description for T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.
Oct 22 2025, 10:05 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans updated the task description for T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.
Oct 22 2025, 10:04 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans updated the task description for T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.
Oct 22 2025, 8:14 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans updated the task description for T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.
Oct 22 2025, 7:53 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans updated the task description for T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.
Oct 22 2025, 7:03 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans updated the task description for T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.
Oct 22 2025, 12:12 AM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

Summary for yesterday's meeting (doc)

Data Model:

  • Use “MVCC” revision_id in composite key

Update Pipeline:

  • Streaming updates in LiftWing+ChangeProp
    • With intentions / commitment for DPE to examine and hopefully build platform support for this in the near future.

@Eevans For next steps, we would like to have the instance ready so we can begin working on bootstrap/initial ingestion to Cassandra. Do you have an estimated timeline for when this can be ready? Is there anything else you need from us? :)

Oct 22 2025, 12:10 AM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
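Using revision_id as an "MVCC" component of the composite key means each new revision adds a row instead of overwriting the previous one, and readers pick the newest (or a specific) revision. In CQL this is typically a partition key of (wiki_id, page_id) with revision_id as a clustering column in descending order, read with `LIMIT 1`. The toy in-memory model below is an illustration, not the agreed schema.

```python
from collections import defaultdict
from typing import Any, Optional


class RevisionedStore:
    """Toy model of a table keyed by ((wiki_id, page_id), revision_id)."""

    def __init__(self) -> None:
        # (wiki_id, page_id) -> {revision_id: value}
        self._partitions: dict = defaultdict(dict)

    def write(self, wiki_id: str, page_id: int, revision_id: int, value: Any) -> None:
        # New revisions add rows; earlier revisions remain readable.
        self._partitions[(wiki_id, page_id)][revision_id] = value

    def latest(self, wiki_id: str, page_id: int) -> Optional[tuple[int, Any]]:
        partition = self._partitions.get((wiki_id, page_id))
        if not partition:
            return None
        newest = max(partition)  # like CLUSTERING ORDER BY (revision_id DESC) LIMIT 1
        return newest, partition[newest]
```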

Oct 21 2025

Eevans updated the task description for T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.
Oct 21 2025, 9:58 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Eevans updated the task description for T401260: Global Editor Metrics - Data Persistence Design Review.
Oct 21 2025, 4:39 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Data-Persistence

Oct 20 2025

Eevans added a comment to T407414: aqs1012 is down.

@Eevans, are you able to reimage the server? I have had no luck due to a "no root partition" error. Also, the preseed file has -efi in its RAID configuration; is that right for a server set up for legacy BIOS?

I haven't tried (and wouldn't trust putting it back into production without first understanding the failure that got us here). That preseed is supposed to work with legacy BIOS (even though it supports UEFI). It's the same preseed that was last used to install that host (back in... July-ish, I think?), along with all of the sessionstore hosts (also legacy BIOS).

TL;DR I think there is something else wrong here.

Oct 20 2025, 2:07 PM · SRE, DC-Ops, ops-eqiad
Eevans added a comment to T407414: aqs1012 is down.

@Eevans, are you able to reimage the server? I have had no luck due to a "no root partition" error. Also, the preseed file has -efi in its RAID configuration; is that right for a server set up for legacy BIOS?

Oct 20 2025, 2:05 PM · SRE, DC-Ops, ops-eqiad

Oct 15 2025

Eevans triaged T407414: aqs1012 is down as High priority.
Oct 15 2025, 5:51 PM · SRE, DC-Ops, ops-eqiad
Eevans created T407414: aqs1012 is down.
Oct 15 2025, 5:51 PM · SRE, DC-Ops, ops-eqiad
Eevans added a comment to T405942: eqiad row C/D Data Persistence host migrations.

[ ... ]

Provided that the moves happen one at a time (probably goes without saying), then the Cassandra hosts can be done at any time, and without coordination. The Cassandra hosts here are: aqs*, restbase*, & sessionstore*

aqs*, restbase*, & sessionstore* can be done anytime without coordination. @Eevans: So no Icinga notice, and we just move the network port without further interaction with the OS or services? If so, that is by far the easiest. Do they require any time between hosts? That is 6 aqs hosts, 5 restbase, and 2 sessionstore, so fewer than 6 business days.

So if we need to do anything other than move the port, please let us know.

Oct 15 2025, 5:37 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
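If "one at a time" is read as "don't start the next port move until the previously moved node is back to Up/Normal", that gate can be checked from nodetool status output, where each node line begins with a two-letter status/state code such as UN (Up/Normal) or DN (Down/Normal). This is a minimal Python sketch under that assumption; the function names and the sample addresses are hypothetical, and it does not say how long to wait between hosts.

```python
import re

# Node lines in `nodetool status` begin with a two-letter code: U(p)/D(own),
# then N(ormal)/L(eaving)/J(oining)/M(oving), followed by the node address.
NODE_LINE = re.compile(r"^([UD])([NLJM])\s+(\S+)")


def unhealthy_nodes(status_output: str) -> list[str]:
    """Return the addresses of nodes that are not Up/Normal ("UN")."""
    bad = []
    for line in status_output.splitlines():
        m = NODE_LINE.match(line)
        if m and m.group(1) + m.group(2) != "UN":
            bad.append(m.group(3))
    return bad


def safe_to_move_next_host(status_output: str) -> bool:
    """Hypothetical gate: only proceed when every node reports UN."""
    return not unhealthy_nodes(status_output)


# Illustrative output with placeholder addresses (not from a real cluster).
SAMPLE = """Datacenter: eqiad
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Tokens  Owns  Host ID  Rack
UN  10.0.0.1   1.2 TiB   256     ?     aaaa     r1
DN  10.0.0.2   1.1 TiB   256     ?     bbbb     r2
"""

if __name__ == "__main__":
    print(unhealthy_nodes(SAMPLE))         # → ['10.0.0.2']
    print(safe_to_move_next_host(SAMPLE))  # → False
```

In practice the input would be the output of nodetool status run against the affected cluster; the next host is only touched once the list of unhealthy nodes is empty.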

Oct 13 2025

Eevans added a comment to T405942: eqiad row C/D Data Persistence host migrations.

Provided that the moves happen one at a time (probably goes without saying), then the Cassandra hosts can be done at any time, and without coordination. The Cassandra hosts here are: aqs*, restbase*, & sessionstore*

Oct 13 2025, 1:23 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad

Oct 10 2025

Eevans added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

[...]

If possible, I would be interested in decoupling the schema decision discussed in the task from the update/ingestion mechanism and its architecture, and my understanding is that your latest recommendation allows us to achieve this.

Oct 10 2025, 2:28 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence

Oct 9 2025

Eevans updated subscribers of T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

@Eevans Aiko has suggested a way to query by page_id, revision_id & model_version in T401021#11190742:

PRIMARY KEY((wiki, page_id, revision_id), model_version)

Oct 9 2025, 3:00 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
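With PRIMARY KEY((wiki, page_id, revision_id), model_version), the three-column tuple is the partition key and model_version is a clustering column, so a single-partition read returns every model version stored for one revision, and adding model_version narrows it to one row. This Python sketch models only that lookup behavior in memory; the ToneStore class and its methods are hypothetical and are not a Cassandra client.

```python
from collections import defaultdict


class ToneStore:
    """In-memory model of PRIMARY KEY((wiki, page_id, revision_id), model_version)."""

    def __init__(self):
        # partition key tuple -> {model_version: paragraphs}
        self._partitions = defaultdict(dict)

    def upsert(self, wiki, page_id, revision_id, model_version, paragraphs):
        # Writing the same full primary key again overwrites the row (upsert semantics).
        self._partitions[(wiki, page_id, revision_id)][model_version] = paragraphs

    def versions(self, wiki, page_id, revision_id):
        # Single-partition read: every model version for one revision, ordered by
        # the clustering column (ascending, Cassandra's default order).
        rows = self._partitions.get((wiki, page_id, revision_id), {})
        return sorted(rows.items())

    def get(self, wiki, page_id, revision_id, model_version):
        # Full primary-key lookup: at most one row (None if absent).
        return self._partitions.get((wiki, page_id, revision_id), {}).get(model_version)


if __name__ == "__main__":
    store = ToneStore()
    store.upsert("enwiki", 12, 345, "v2", {"Some paragraph.": 0.91})
    store.upsert("enwiki", 12, 345, "v1", None)
    print(store.versions("enwiki", 12, 345))
    # → [('v1', None), ('v2', {'Some paragraph.': 0.91})]
```

As in Cassandra itself, this layout only serves lookups that supply the whole partition key; questions like "all revisions of a page" or "all rows for a model version" would need a different key or an additional table.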

Oct 7 2025

Eevans added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

[ ... ]

@Eevans We would like to use the following schema. What do you think?

CREATE TABLE tone_suggestions (  -- hypothetical name; "table" is a reserved word in CQL
  wiki_id        text,             -- enwiki, frwiki, etc.
  page_id        int,
  revision_id    int,
  paragraphs     map<text, float>, -- plaintext paragraph with tone issues -> score; null if no paragraphs have tone issues
  model_version  text,
  PRIMARY KEY ((wiki_id, page_id, revision_id), model_version)
);
Oct 7 2025, 1:12 AM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
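Two details of the proposed column types are easy to trip over when writing rows: CQL int is a 32-bit signed integer (bigint is the 64-bit type), and Cassandra does not distinguish an empty non-frozen collection such as map<text, float> from null. This hypothetical Python helper checks the first and normalizes the second before a row is handed to whatever writer ends up doing ingestion; it is a sketch, not part of the proposal.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1  # range of the CQL `int` type


def build_row(wiki_id, page_id, revision_id, model_version, paragraphs):
    """Hypothetical helper: build a row dict matching the proposed columns."""
    for name, value in (("page_id", page_id), ("revision_id", revision_id)):
        if not INT32_MIN <= value <= INT32_MAX:
            raise ValueError(f"{name}={value} does not fit in a CQL int (consider bigint)")
    return {
        "wiki_id": wiki_id,
        "page_id": page_id,
        "revision_id": revision_id,
        "model_version": model_version,
        # An empty map and null read back the same way, so represent
        # "no paragraphs with tone issues" explicitly as None.
        "paragraphs": dict(paragraphs) if paragraphs else None,
    }


if __name__ == "__main__":
    print(build_row("enwiki", 12, 345, "v1", {})["paragraphs"])  # → None
```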
Eevans added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

Update: Growth team won't be doing the testwiki PoC this quarter, so we don't have an urgent timeline to ingest a one-off dataset into staging Cassandra.

I'm circling back on this to figure out if we can align on the timelines. We would like to have the instance by mid-October (the 15th) so we can work on ingesting data that would enable an A/B test. Is this possible?

Oct 7 2025, 12:19 AM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence

Oct 6 2025

Eevans added a comment to T402850: Decide on anonymous session backend.

@Eevans we are now very close to wrapping up the coding part of T400372: Separate storage backend for anonymous sessions. That will allow for separate Cassandra namespaces for anonymous and authenticated sessions (also e.g. per-wiki as proposed in T392170: sessionstorage namespacing if that's deemed useful), but also something more aggressive like using Cassandra for authenticated sessions but Memcached for anonymous sessions. (Using Memcached was proposed in T362335: Simplify MediaWiki session store at WMF but rejected because routine Memcached maintenance would then result in users getting logged out. With anonymous users only, that's not really a problem.)

What would be the best way to determine what store to use?

(cc @Krinkle @DAlangi_WMF)

Oct 6 2025, 3:55 PM · Data-Persistence, OKR-Work, MediaWiki-Platform-Team
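The split described here (authenticated sessions in Cassandra, anonymous sessions in Memcached) amounts to choosing a backend per request from the session's authentication state. MediaWiki's actual implementation (T400372) is PHP and is not reproduced here; this is a Python sketch of the idea in which DictBackend stands in for both stores and every class name is hypothetical.

```python
class DictBackend:
    """Stand-in for a real store (e.g. Cassandra or Memcached) in this sketch."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value


class RoutingSessionStore:
    """Hypothetical router: authenticated sessions to one backend, anonymous to another."""

    def __init__(self, authenticated, anonymous):
        self.authenticated = authenticated
        self.anonymous = anonymous

    def backend_for(self, is_authenticated):
        return self.authenticated if is_authenticated else self.anonymous

    def save(self, session_id, data, is_authenticated):
        self.backend_for(is_authenticated).set(session_id, data)

    def load(self, session_id, is_authenticated):
        return self.backend_for(is_authenticated).get(session_id)


if __name__ == "__main__":
    cassandra, memcached = DictBackend("cassandra"), DictBackend("memcached")
    store = RoutingSessionStore(authenticated=cassandra, anonymous=memcached)
    store.save("anon-session", {"token": "x"}, is_authenticated=False)
    store.save("user-session", {"user": "Example"}, is_authenticated=True)
    print(memcached.data, cassandra.data)
```

One thing the sketch glosses over: a session that starts anonymous and then logs in has to be written to the authenticated backend from that point on.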
Eevans added a comment to T402984: Data Persistence Design Review: Article topic model caching.

I have updated ownership and expiration date.
@Eevans There has been a change of plans regarding the integration of this work with this year's Year In Review, so although we still need this Cassandra instance, the request that we have filed for the improve tone structured task in T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task is of higher priority. I just wanted to mention this so you can plan your priorities and timelines accordingly.

Oct 6 2025, 2:01 PM · Data-Persistence-Design-Review, Machine-Learning-Team, Data-Persistence

Oct 2 2025

Eevans updated the task description for T402984: Data Persistence Design Review: Article topic model caching.
Oct 2 2025, 3:52 PM · Data-Persistence-Design-Review, Machine-Learning-Team, Data-Persistence
Eevans updated the task description for T402984: Data Persistence Design Review: Article topic model caching.
Oct 2 2025, 3:48 PM · Data-Persistence-Design-Review, Machine-Learning-Team, Data-Persistence
Eevans updated the task description for T402984: Data Persistence Design Review: Article topic model caching.
Oct 2 2025, 3:08 PM · Data-Persistence-Design-Review, Machine-Learning-Team, Data-Persistence
Eevans updated the task description for T402984: Data Persistence Design Review: Article topic model caching.
Oct 2 2025, 3:03 PM · Data-Persistence-Design-Review, Machine-Learning-Team, Data-Persistence
Eevans added a comment to T402984: Data Persistence Design Review: Article topic model caching.

[ ... ]

I see you filled out the description with all the discussed details, thanks a lot!

Oct 2 2025, 1:52 PM · Data-Persistence-Design-Review, Machine-Learning-Team, Data-Persistence

Oct 1 2025

Eevans updated the task description for T402984: Data Persistence Design Review: Article topic model caching.
Oct 1 2025, 11:49 PM · Data-Persistence-Design-Review, Machine-Learning-Team, Data-Persistence