Sat, Jan 25

Eevans committed rMSKS662f71ebf9bc: Configurable query and connect timeouts (authored by Eevans).
Configurable query and connect timeouts
Sat, Jan 25, 2:07 AM

Fri, Jan 24

Eevans closed T243123: Login to at least en.wikipedia.beta.wmflabs.org and commons.wikimedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login session` as Resolved.
Fri, Jan 24, 10:28 PM · Core Platform Team Workboards (Clinic Duty Team), MediaWiki-extensions-CentralAuth, Beta-Cluster-Infrastructure, MediaWiki-User-login-and-signup, MediaWiki-Core-Testing, User-zeljkofilipin, Quality-and-Test-Engineering-Team (QTE)
Eevans added a comment to T243123: Login to at least en.wikipedia.beta.wmflabs.org and commons.wikimedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login session`.

Kask has been updated with higher (default) Cassandra timeouts, and deployment-prep has been updated. I'm going to close this; feel free to re-open if this happens again.

Fri, Jan 24, 10:27 PM · Core Platform Team Workboards (Clinic Duty Team), MediaWiki-extensions-CentralAuth, Beta-Cluster-Infrastructure, MediaWiki-User-login-and-signup, MediaWiki-Core-Testing, User-zeljkofilipin, Quality-and-Test-Engineering-Team (QTE)

Thu, Jan 23

Eevans triaged T243544: Cassandra PHP language driver packaging (Debian) as Medium priority.
Thu, Jan 23, 8:10 PM · Core Platform Team, User-Eevans
Eevans created T243544: Cassandra PHP language driver packaging (Debian).
Thu, Jan 23, 8:09 PM · Core Platform Team, User-Eevans
Eevans added a comment to T243123: Login to at least en.wikipedia.beta.wmflabs.org and commons.wikimedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login session`.

The default timeouts in the Cassandra Go driver (both Timeout and ConnectTimeout) are 600ms. This seems quite low; by comparison, the Java and NodeJS drivers both use 12s for queries and 5s for connecting. I propose we make these values configurable in Kask (with defaults of 12s and 5s).
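A minimal sketch of the proposal, written in Python for illustration only (Kask itself is a Go service, and the Go driver exposes these settings as `Timeout` and `ConnectTimeout` on its cluster config). The config keys `query_timeout` and `connect_timeout`, the duration format, and both helper functions are hypothetical: unset values fall back to the proposed 12s/5s defaults instead of the driver's 600ms.

```python
from __future__ import annotations

from datetime import timedelta

# Default for both Timeout and ConnectTimeout in the Cassandra Go driver, per the comment above.
DRIVER_DEFAULT = timedelta(milliseconds=600)

# Proposed Kask defaults, matching the Java and NodeJS drivers.
DEFAULT_QUERY_TIMEOUT = timedelta(seconds=12)
DEFAULT_CONNECT_TIMEOUT = timedelta(seconds=5)

_UNITS = {"ms": "milliseconds", "s": "seconds", "m": "minutes"}


def parse_duration(value: str) -> timedelta:
    """Parse a simple (hypothetical) duration string such as '600ms', '12s' or '1m'."""
    # Check "ms" before "s" so that "600ms" is not read as seconds.
    for suffix in ("ms", "s", "m"):
        if value.endswith(suffix):
            return timedelta(**{_UNITS[suffix]: float(value[: -len(suffix)])})
    raise ValueError(f"unsupported duration: {value!r}")


def resolve_timeouts(config: dict[str, str]) -> tuple[timedelta, timedelta]:
    """Return (query_timeout, connect_timeout); unset keys fall back to the proposed defaults.

    In Kask, the resolved values would then be handed to the driver's
    Timeout and ConnectTimeout settings instead of leaving them at 600ms.
    """
    query = config.get("query_timeout")
    connect = config.get("connect_timeout")
    return (
        parse_duration(query) if query else DEFAULT_QUERY_TIMEOUT,
        parse_duration(connect) if connect else DEFAULT_CONNECT_TIMEOUT,
    )
```

For example, `resolve_timeouts({"query_timeout": "3s"})` overrides only the query timeout and keeps the 5s connect default.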

Thu, Jan 23, 1:01 AM · Core Platform Team Workboards (Clinic Duty Team), MediaWiki-extensions-CentralAuth, Beta-Cluster-Infrastructure, MediaWiki-User-login-and-signup, MediaWiki-Core-Testing, User-zeljkofilipin, Quality-and-Test-Engineering-Team (QTE)

Wed, Jan 22

Eevans lowered the priority of T243123: Login to at least en.wikipedia.beta.wmflabs.org and commons.wikimedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login session` from Unbreak Now! to Medium.

It looks like Cassandra queries from Kask have been intermittently timing out. Kask and Cassandra are co-located on the same VM, which is pretty resource-constrained, but AFAIK it has been working OK until now. We can probably begin with a restart and go from there.

Wed, Jan 22, 11:23 PM · Core Platform Team Workboards (Clinic Duty Team), MediaWiki-extensions-CentralAuth, Beta-Cluster-Infrastructure, MediaWiki-User-login-and-signup, MediaWiki-Core-Testing, User-zeljkofilipin, Quality-and-Test-Engineering-Team (QTE)

Fri, Jan 17

Eevans updated the task description for T243000: Bootstrap new Cassandra instances: restbase202[123]-{a,b,c}.
Fri, Jan 17, 11:51 PM · Core Platform Team Workboards (Clinic Duty Team), ops-codfw, Operations
Eevans updated the task description for T243000: Bootstrap new Cassandra instances: restbase202[123]-{a,b,c}.
Fri, Jan 17, 10:02 PM · Core Platform Team Workboards (Clinic Duty Team), ops-codfw, Operations
Eevans updated the task description for T243000: Bootstrap new Cassandra instances: restbase202[123]-{a,b,c}.
Fri, Jan 17, 9:56 PM · Core Platform Team Workboards (Clinic Duty Team), ops-codfw, Operations
Eevans updated the task description for T243106: Phased rollout of sessionstore to production fleet.
Fri, Jan 17, 9:41 PM · Patch-For-Review, TPG-Epics (Team Practices Group Coaching Clinic), CPT Initiatives (Multi-DC (TEC1)), User-Clarakosi, User-Eevans
Eevans updated the task description for T243106: Phased rollout of sessionstore to production fleet.
Fri, Jan 17, 9:40 PM · Patch-For-Review, TPG-Epics (Team Practices Group Coaching Clinic), CPT Initiatives (Multi-DC (TEC1)), User-Clarakosi, User-Eevans
Eevans created T243106: Phased rollout of sessionstore to production fleet.
Fri, Jan 17, 9:36 PM · Patch-For-Review, TPG-Epics (Team Practices Group Coaching Clinic), CPT Initiatives (Multi-DC (TEC1)), User-Clarakosi, User-Eevans
Eevans updated the task description for T243000: Bootstrap new Cassandra instances: restbase202[123]-{a,b,c}.
Fri, Jan 17, 8:07 PM · Core Platform Team Workboards (Clinic Duty Team), ops-codfw, Operations
Eevans updated the task description for T243000: Bootstrap new Cassandra instances: restbase202[123]-{a,b,c}.
Fri, Jan 17, 5:13 PM · Core Platform Team Workboards (Clinic Duty Team), ops-codfw, Operations
Eevans updated the task description for T243000: Bootstrap new Cassandra instances: restbase202[123]-{a,b,c}.
Fri, Jan 17, 2:32 PM · Core Platform Team Workboards (Clinic Duty Team), ops-codfw, Operations
Eevans updated the task description for T243000: Bootstrap new Cassandra instances: restbase202[123]-{a,b,c}.
Fri, Jan 17, 2:45 AM · Core Platform Team Workboards (Clinic Duty Team), ops-codfw, Operations
Eevans updated the task description for T243000: Bootstrap new Cassandra instances: restbase202[123]-{a,b,c}.
Fri, Jan 17, 12:30 AM · Core Platform Team Workboards (Clinic Duty Team), ops-codfw, Operations

Thu, Jan 16

Eevans updated the task description for T243000: Bootstrap new Cassandra instances: restbase202[123]-{a,b,c}.
Thu, Jan 16, 10:21 PM · Core Platform Team Workboards (Clinic Duty Team), ops-codfw, Operations
Eevans updated the task description for T243000: Bootstrap new Cassandra instances: restbase202[123]-{a,b,c}.
Thu, Jan 16, 9:30 PM · Core Platform Team Workboards (Clinic Duty Team), ops-codfw, Operations
Eevans added a comment to T234286: Multi-DC Echo Notification Storage.

To the best of my knowledge, everything here is done.

Thu, Jan 16, 7:21 PM · Growth-Team, Notifications, Core Platform Team, CPT Initiatives (Multi-DC Echo Notification Storage)
Eevans claimed T234296: Completed migration.

Done.

Thu, Jan 16, 7:21 PM · Growth-Team, Notifications, Core Platform Team Workboards (User Stories), Story, CPT Initiatives (Multi-DC Echo Notification Storage)
Eevans claimed T234963: Deploy final configuration.

Done.

Thu, Jan 16, 7:20 PM · Core Platform Team Workboards (Clinic Duty Team), Notifications, Growth-Team, CPT Initiatives (Multi-DC Echo Notification Storage)
Eevans edited projects for T241784: (No Need By Date) rack/setup/install restbase1029, restbase1029, restbase1030, added: Core Platform Team Workboards (Clinic Duty Team); removed Core Platform Team.
Thu, Jan 16, 6:23 PM · Core Platform Team Workboards (Clinic Duty Team), ops-eqiad, Operations
Eevans edited projects for T241790: (No Need By Date Provided) rack/setup/install restbase202[123], added: Core Platform Team Workboards (Clinic Duty Team); removed Core Platform Team.
Thu, Jan 16, 6:23 PM · Core Platform Team Workboards (Clinic Duty Team), ops-codfw, Operations
Eevans triaged T243000: Bootstrap new Cassandra instances: restbase202[123]-{a,b,c} as Medium priority.
Thu, Jan 16, 5:56 PM · Core Platform Team Workboards (Clinic Duty Team), ops-codfw, Operations
Eevans moved T243000: Bootstrap new Cassandra instances: restbase202[123]-{a,b,c} from Inbox to Doing on the Core Platform Team Workboards (Clinic Duty Team) board.
Thu, Jan 16, 5:56 PM · Core Platform Team Workboards (Clinic Duty Team), ops-codfw, Operations
Eevans edited projects for T243000: Bootstrap new Cassandra instances: restbase202[123]-{a,b,c}, added: Core Platform Team Workboards (Clinic Duty Team); removed Core Platform Team.
Thu, Jan 16, 5:55 PM · Core Platform Team Workboards (Clinic Duty Team), ops-codfw, Operations
Eevans created T243000: Bootstrap new Cassandra instances: restbase202[123]-{a,b,c}.
Thu, Jan 16, 5:55 PM · Core Platform Team Workboards (Clinic Duty Team), ops-codfw, Operations

Wed, Jan 15

Eevans moved T234963: Deploy final configuration from Doing to Waiting for Review on the Core Platform Team Workboards (Clinic Duty Team) board.
Wed, Jan 15, 9:14 PM · Core Platform Team Workboards (Clinic Duty Team), Notifications, Growth-Team, CPT Initiatives (Multi-DC Echo Notification Storage)
Eevans moved T234963: Deploy final configuration from Inbox to Doing on the Core Platform Team Workboards (Clinic Duty Team) board.
Wed, Jan 15, 9:14 PM · Core Platform Team Workboards (Clinic Duty Team), Notifications, Growth-Team, CPT Initiatives (Multi-DC Echo Notification Storage)
Eevans triaged T234963: Deploy final configuration as Medium priority.
Wed, Jan 15, 9:13 PM · Core Platform Team Workboards (Clinic Duty Team), Notifications, Growth-Team, CPT Initiatives (Multi-DC Echo Notification Storage)

Mon, Jan 13

Eevans added a comment to T242461: restrouter.svc.{eqiad,codfw}.wmnet in a failed state.

Since (long-term) we aim to replace all of this, is abandoning it entirely an option?

Is it possible to take it out for now until we either prioritize it again or drop it entirely?

You mean undeploy? Sure, we can undeploy it. The only caveat is that redeploying it will take some time, as we will need to create the necessary resources again (LVS entries, DNS, Kubernetes namespaces, etc.).

We're running CI for RESTBase in both RESTBase and RESTRouter modes, so it will be in a mostly deployable state if we want to put it back online; however, maintaining an unused production deployment seems like a waste.

Indeed.

A lot has changed since we began this migration, including https://www.mediawiki.org/wiki/Core_Platform_Team/Decisions_Architecture_Research_Documentation/Services_Architecture_Recommendations_(2019), which is expected to be a lengthy process but will ultimately result in a REST{Router,Base}-less world. I guess the question we should be asking is: is this still something we should do in the meantime (and schedule and resource it to completion), or should we cut bait, undeploy from k8s, and leave things as they are?
@WDoranWMF ?

Mon, Jan 13, 8:51 PM · serviceops, Core Platform Team Workboards (Clinic Duty Team)
Eevans added a comment to T242461: restrouter.svc.{eqiad,codfw}.wmnet in a failed state.
Mon, Jan 13, 4:43 PM · serviceops, Core Platform Team Workboards (Clinic Duty Team)

Fri, Jan 10

Eevans triaged T242461: restrouter.svc.{eqiad,codfw}.wmnet in a failed state as Medium priority.

It's not clear to me what the status of this is. Do we need to deploy the latest code here? Since (long-term) we aim to replace all of this, is abandoning it entirely an option?

Fri, Jan 10, 8:39 PM · serviceops, Core Platform Team Workboards (Clinic Duty Team)
Eevans created T242461: restrouter.svc.{eqiad,codfw}.wmnet in a failed state.
Fri, Jan 10, 8:35 PM · serviceops, Core Platform Team Workboards (Clinic Duty Team)
Eevans added a comment to T242344: Remove Parsoid-JS tables from Cassandra.

The tables have been dropped in all 3 environments. The only thing remaining is to clear the snapshots (and actually reclaim the space). Out of an abundance of caution, I'll sit on this for a couple days and close the ticket once complete.

Fri, Jan 10, 8:27 PM · Core Platform Team Workboards (Clinic Duty Team), Parsoid-PHP, RESTBase
Eevans added a comment to T242344: Remove Parsoid-JS tables from Cassandra.

OK, here is what I propose applying. Review appreciated!

Fri, Jan 10, 7:51 PM · Core Platform Team Workboards (Clinic Duty Team), Parsoid-PHP, RESTBase
Eevans created P10118 deployment-prep.yaml.
Fri, Jan 10, 7:50 PM
Eevans created P10116 dev.yaml.
Fri, Jan 10, 7:49 PM
Eevans created P10115 production.yaml.
Fri, Jan 10, 7:47 PM
Eevans updated the task description for T242344: Remove Parsoid-JS tables from Cassandra.
Fri, Jan 10, 7:35 PM · Core Platform Team Workboards (Clinic Duty Team), Parsoid-PHP, RESTBase
Eevans triaged T242344: Remove Parsoid-JS tables from Cassandra as Medium priority.
Fri, Jan 10, 7:34 PM · Core Platform Team Workboards (Clinic Duty Team), Parsoid-PHP, RESTBase
Eevans edited projects for T241068: Restrouter health checks fail when local wikifeeds instance is not pool in discovery records, added: Core Platform Team Workboards (Clinic Duty Team); removed Core Platform Team.
Fri, Jan 10, 5:47 PM · Core Platform Team Workboards (Clinic Duty Team), serviceops-radar
Eevans edited projects for T178445: flapping monitoring for recommendation_api on scb, added: Core Platform Team Workboards (Clinic Duty Team); removed Core Platform Team.
Fri, Jan 10, 5:43 PM · Core Platform Team Workboards (Clinic Duty Team), Recommendation-API, Discovery, Services (watching), Wikidata, Operations, observability
Eevans edited projects for T241905: Investigate JobQueue outage from 2020-01-04 22:00 UTC, added: Core Platform Team Workboards (Clinic Duty Team); removed Core Platform Team.
Fri, Jan 10, 5:41 PM · Core Platform Team Workboards (Clinic Duty Team), Wikimedia-Incident, WMF-JobQueue
Eevans edited projects for T241940: No option to continue querying for more results in globalallusers API, added: Core Platform Team Workboards (Clinic Duty Team); removed Core Platform Team.
Fri, Jan 10, 5:40 PM · Core Platform Team Workboards (Clinic Duty Team), MediaWiki-extensions-CentralAuth, MediaWiki-API
Eevans edited projects for T242249: Unclear MCR replacement for WikiPage::prepareContentForEdit, added: Core Platform Team Workboards (Clinic Duty Team); removed Core Platform Team.
Fri, Jan 10, 5:40 PM · Core Platform Team Workboards (Clinic Duty Team), Documentation, CPT Initiatives (MCR)
Eevans edited projects for T242409: languageinfo API returns a TypeError if you request fallbacks, added: Core Platform Team Workboards (Clinic Duty Team); removed Core Platform Team.
Fri, Jan 10, 5:40 PM · MW-1.35-notes (1.35.0-wmf.15; 2020-01-14), Core Platform Team Workboards (Clinic Duty Team), Wikimedia-production-error, MediaWiki-API, Regression
Eevans removed a project from T224425: MW Job consumers sometimes pause for several minutes: Core Platform Team.
Fri, Jan 10, 5:39 PM · Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (Modern Event Platform (TEC2)), WMF-JobQueue, Discovery-Search (Current work)
Eevans added a project to T224425: MW Job consumers sometimes pause for several minutes: Core Platform Team Workboards (Clinic Duty Team).
Fri, Jan 10, 5:38 PM · Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (Modern Event Platform (TEC2)), WMF-JobQueue, Discovery-Search (Current work)
Eevans triaged T240307: Hook container with strong types and DI as Medium priority.
Fri, Jan 10, 5:34 PM · TechCom-RFC (TechCom-Approved), User-Daniel, Core Platform Team
Eevans triaged T170603: API Edit Requires a Captcha, but on Wiki edit does not as Medium priority.
Fri, Jan 10, 5:33 PM · MediaWiki-extensions-OAuth, ConfirmEdit (CAPTCHA extension), MediaWiki-API
Eevans triaged T192023: Allowing seaching the archive table for titles of deleted pages through the API as Medium priority.
Fri, Jan 10, 5:25 PM · MediaWiki-API
Eevans triaged T241940: No option to continue querying for more results in globalallusers API as Medium priority.
Fri, Jan 10, 5:23 PM · Core Platform Team Workboards (Clinic Duty Team), MediaWiki-extensions-CentralAuth, MediaWiki-API
Eevans triaged T242249: Unclear MCR replacement for WikiPage::prepareContentForEdit as Medium priority.
Fri, Jan 10, 5:22 PM · Core Platform Team Workboards (Clinic Duty Team), Documentation, CPT Initiatives (MCR)
Eevans triaged T241905: Investigate JobQueue outage from 2020-01-04 22:00 UTC as Medium priority.
Fri, Jan 10, 5:21 PM · Core Platform Team Workboards (Clinic Duty Team), Wikimedia-Incident, WMF-JobQueue
Eevans triaged T242409: languageinfo API returns a TypeError if you request fallbacks as Medium priority.
Fri, Jan 10, 5:05 PM · MW-1.35-notes (1.35.0-wmf.15; 2020-01-14), Core Platform Team Workboards (Clinic Duty Team), Wikimedia-production-error, MediaWiki-API, Regression

Tue, Jan 7

Eevans updated subscribers of T228294: Cassandra PHP driver evaluation.

This seems in our wheelhouse.

  • Once we have a packaged driver, what would we do with it?
Tue, Jan 7, 9:35 PM · Core Platform Team, User-Eevans

Mon, Jan 6

Eevans added a comment to T241790: (No Need By Date Provided) rack/setup/install restbase202[123].
In T238580#5710739, @Eevans wrote:
In T238580#5709953, @RobH wrote:

Also note I assumed details for the racking/hostnames and would appreciate confirmation of those details in task description, thanks!

This cluster uses a replication count of 3 (per-DC), and for eqiad we have machines evenly distributed over a, b, and d. This replica-to-row affinity makes it very nice to reason about where data will be moving from/to on topology changes and it would be a shame if we lost that now. Will there be a problem keeping these to the same 3 rows currently in-use?
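The row affinity described above can be illustrated with a simplified model of rack-aware placement in the spirit of Cassandra's NetworkTopologyStrategy, which prefers placing replicas on distinct racks within a datacenter. Assuming each row is configured as a Cassandra rack, three replicas over exactly three rows put one copy of every range in each row. This sketch ignores vnodes and multiple datacenters; the function name, node names, and tokens are made up.

```python
from __future__ import annotations

from bisect import bisect_left


def replicas_for(token: int, ring: list[tuple[int, str, str]], rf: int) -> list[str]:
    """Pick rf replicas for a token, preferring racks not yet used.

    ring holds (token, node, rack) tuples for one datacenter, sorted by token.
    The walk starts at the first node whose token is >= the key's token and
    proceeds clockwise. A node whose rack already holds a replica is skipped,
    and skipped nodes are used only if there are fewer racks than replicas.
    """
    tokens = [t for t, _, _ in ring]
    start = bisect_left(tokens, token) % len(ring)
    chosen: list[str] = []
    seen_racks: set[str] = set()
    skipped: list[str] = []
    for i in range(len(ring)):
        _, node, rack = ring[(start + i) % len(ring)]
        if rack not in seen_racks:
            chosen.append(node)
            seen_racks.add(rack)
            if len(chosen) == rf:
                return chosen
        else:
            skipped.append(node)
    # Fewer racks than replicas: fill the remainder with skipped nodes, in ring order.
    return chosen + skipped[: rf - len(chosen)]


# Six hypothetical instances, two per row (rows a, b and d as racks),
# deliberately placed so that same-row instances are ring neighbours.
RING = [
    (0, "n1", "a"), (10, "n2", "a"),
    (20, "n3", "b"), (30, "n4", "b"),
    (40, "n5", "d"), (50, "n6", "d"),
]
```

Even when same-row instances sit next to each other on the ring, every token maps to one replica per row, which is what makes data movement on topology changes easy to reason about.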

Mon, Jan 6, 4:58 PM · Core Platform Team Workboards (Clinic Duty Team), ops-codfw, Operations
Eevans updated the task description for T241790: (No Need By Date Provided) rack/setup/install restbase202[123].
Mon, Jan 6, 4:57 PM · Core Platform Team Workboards (Clinic Duty Team), ops-codfw, Operations

Dec 19 2019

Eevans closed T218609: Figure out future for newly created deployment-prep jessie instances, a subtask of T218729: Migrate deployment-prep away from Debian Jessie to Debian Stretch/Buster, as Resolved.
Dec 19 2019, 9:17 PM · Cloud-VPS (Debian Jessie Deprecation), Beta-Cluster-Infrastructure
Eevans closed T218609: Figure out future for newly created deployment-prep jessie instances as Resolved.

This is now done. Sorry for the long delay.

Dec 19 2019, 9:17 PM · Patch-For-Review, Beta-Cluster-Infrastructure
Eevans added a comment to T122825: Service Ownership and Maintenance.

I think most of the issues described here have been in the meantime solved by the implementation of the code stewardship review process and a list of developers/maintainers. @Pchelolo @Eevans @Clarakosi any opinions?

Dec 19 2019, 4:52 PM · Core Platform Team, TechCom, User-mobrovac, Operations
Eevans added a comment to T218609: Figure out future for newly created deployment-prep jessie instances.

@Eevans: It has been 6 months, please respond.

Dec 19 2019, 1:46 AM · Patch-For-Review, Beta-Cluster-Infrastructure

Dec 4 2019

Eevans added a comment to T222851: Improve Echo seentime code for multi-DC access.

Summarizing an IRC discussion: @Catrope will pick this up mid-November(ish), and we'll target deployment for sometime after the November freeze (27th–29th), and before the December freeze (December 23rd-January 3rd).

Dec 4 2019, 7:55 PM · MW-1.35-notes (1.35.0-wmf.15; 2020-01-14), CPT Initiatives (Multi-DC Echo Notification Storage), User-Eevans, Notifications, Growth-Team
Eevans triaged T239856: Fold services recommendations into Standards for services RfC as Medium priority.
Dec 4 2019, 7:52 PM · Core Platform Team Workboards (Clinic Duty Team)
Eevans created T239856: Fold services recommendations into Standards for services RfC.
Dec 4 2019, 7:51 PM · Core Platform Team Workboards (Clinic Duty Team)

Dec 2 2019

Eevans added a comment to T236113: API developer creates automated documentation.

We discussed this in our kickoff meeting today.
There was a lot of resistance to the idea of having an endpoint for OpenAPI 3.0 definitions of the (other) endpoints. I like the idea of using OpenAPI since there are a lot of other tools that would benefit, such as client code generators.
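To make the idea concrete: an endpoint serving OpenAPI 3.0 definitions of the other endpoints could build its document from the same route table the router already uses. The route list and function below are hypothetical; only the document fields (`openapi`, `info`, `paths`, operation `summary`, `parameters`, `responses`) come from the OpenAPI 3.0 specification. The output is plain JSON of the kind client code generators consume.

```python
from __future__ import annotations

import json

# Hypothetical route table: (HTTP method, path template, summary).
ROUTES = [
    ("get", "/v1/page/{title}", "Get the latest source of a page"),
    ("get", "/v1/page/{title}/history", "Get the revision history of a page"),
    ("put", "/v1/page/{title}", "Create or update a page"),
]


def openapi_document(routes, title: str = "Example REST API", version: str = "1.0.0") -> dict:
    """Build a minimal OpenAPI 3.0 document from (method, path, summary) routes."""
    paths: dict[str, dict] = {}
    for method, path, summary in routes:
        # Path parameters are the {name} segments of the template; OpenAPI requires them.
        params = [
            {"name": seg[1:-1], "in": "path", "required": True, "schema": {"type": "string"}}
            for seg in path.split("/")
            if seg.startswith("{") and seg.endswith("}")
        ]
        operation: dict = {"summary": summary, "responses": {"200": {"description": "OK"}}}
        if params:
            operation["parameters"] = params
        paths.setdefault(path, {})[method] = operation
    return {"openapi": "3.0.3", "info": {"title": title, "version": version}, "paths": paths}


if __name__ == "__main__":
    print(json.dumps(openapi_document(ROUTES), indent=2))
```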

Dec 2 2019, 11:02 PM · Core Platform Team Workboards (Green), Story, CPT Initiatives (Core REST API in PHP)

Nov 27 2019

Eevans moved T207946: Evaluate possible optimizations for concurrent JVMs from Inbox to Icebox on the Core Platform Team board.
Nov 27 2019, 7:23 PM · Core Platform Team, Cassandra, User-Eevans
Eevans edited projects for T207946: Evaluate possible optimizations for concurrent JVMs, added: Core Platform Team; removed Core Platform Team (Needs Cleaning - Cassandra Operational).
Nov 27 2019, 7:23 PM · Core Platform Team, Cassandra, User-Eevans
Eevans moved T226553: Install Cassandra table properties Debian package on Cassandra hosts from Inbox to Backlog on the Core Platform Team Workboards (Clinic Duty Team) board.
Nov 27 2019, 7:22 PM · Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, User-WDoran
Eevans edited projects for T226553: Install Cassandra table properties Debian package on Cassandra hosts, added: Core Platform Team Workboards (Clinic Duty Team); removed Core Platform Team (Needs Cleaning - Cassandra Operational).
Nov 27 2019, 7:22 PM · Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, User-WDoran
Eevans edited projects for T228294: Cassandra PHP driver evaluation, added: Core Platform Team; removed Core Platform Team (Needs Cleaning - Cassandra Operational).
Nov 27 2019, 7:20 PM · Core Platform Team, User-Eevans

Nov 25 2019

Eevans moved T237143: Log warning: Duplicate get(): "officewiki:echo:seen:message:time:{n}" fetched 2 times from Waiting for Review to Done on the Core Platform Team Workboards (Clinic Duty Team) board.

This was deployed during SWAT. See: https://logstash.wikimedia.org/goto/a61eb70c51d26b11835e4bb4caadda0b

Nov 25 2019, 11:13 PM · Notifications, Growth-Team, MediaWiki-Cache, Core Platform Team Workboards (Clinic Duty Team)

Nov 22 2019

Eevans moved T231027: Cassandra instances outages (was: Outage of restbase2017-b) from Backlog to Ready on the Core Platform Team Workboards (Green) board.
Nov 22 2019, 1:38 AM · Core Platform Team Workboards (Green), User-Eevans
Eevans edited projects for T231027: Cassandra instances outages (was: Outage of restbase2017-b), added: Core Platform Team Workboards (Green); removed Core Platform Team Workboards (Clinic Duty Team).

Since an upgrade to Cassandra 3.11.4 was pending anyway (T200803: Upgrade Cassandra 3.11.2 clusters to 3.11.4 (bugfix release)), we prioritized that work a) in case the issue had been fixed upstream, and b) so that, if we had to dig deeper to troubleshoot, we would be doing so against a current release. Unfortunately, the upgrade does not seem to have fixed the issue (see: T238591), and we will indeed need to dig deeper.

Nov 22 2019, 1:38 AM · Core Platform Team Workboards (Green), User-Eevans
Eevans merged T238591: Cassandra node outage: restbase2015-c (Cassandra 3.11.4) into T231027: Cassandra instances outages (was: Outage of restbase2017-b).
Nov 22 2019, 1:37 AM · Core Platform Team Workboards (Green), User-Eevans
Eevans merged task T238591: Cassandra node outage: restbase2015-c (Cassandra 3.11.4) into T231027: Cassandra instances outages (was: Outage of restbase2017-b).
Nov 22 2019, 1:37 AM · Core Platform Team Workboards (Clinic Duty Team), Cassandra
Eevans renamed T238591: Cassandra node outage: restbase2015-c (Cassandra 3.11.4) from Casssandra node outage: restbase2015-c to Casssandra node outage: restbase2015-c (Cassandra 3.11.4).
Nov 22 2019, 1:32 AM · Core Platform Team Workboards (Clinic Duty Team), Cassandra

Nov 19 2019

Eevans triaged T238591: Cassandra node outage: restbase2015-c (Cassandra 3.11.4) as Medium priority.
Nov 19 2019, 7:50 PM · Core Platform Team Workboards (Clinic Duty Team), Cassandra

Nov 18 2019

Eevans added a subtask for T231027: Cassandra instances outages (was: Outage of restbase2017-b): T238591: Cassandra node outage: restbase2015-c (Cassandra 3.11.4).
Nov 18 2019, 8:46 PM · Core Platform Team Workboards (Green), User-Eevans
Eevans added a parent task for T238591: Cassandra node outage: restbase2015-c (Cassandra 3.11.4): T231027: Cassandra instances outages (was: Outage of restbase2017-b).
Nov 18 2019, 8:46 PM · Core Platform Team Workboards (Clinic Duty Team), Cassandra
Eevans added a comment to T238591: Cassandra node outage: restbase2015-c (Cassandra 3.11.4).

This looks suspiciously similar to T231027, right down to the read timeout exceptions during read-repair that precede the event:

Nov 18 2019, 8:46 PM · Core Platform Team Workboards (Clinic Duty Team), Cassandra
Eevans added projects to T238591: Cassandra node outage: restbase2015-c (Cassandra 3.11.4): Core Platform Team, Cassandra.
Nov 18 2019, 8:30 PM · Core Platform Team Workboards (Clinic Duty Team), Cassandra
Eevans created T238591: Cassandra node outage: restbase2015-c (Cassandra 3.11.4).
Nov 18 2019, 8:30 PM · Core Platform Team Workboards (Clinic Duty Team), Cassandra
Eevans raised the priority of T237143: Log warning: Duplicate get(): "officewiki:echo:seen:message:time:{n}" fetched 2 times from Medium to High.
Nov 18 2019, 5:05 PM · Notifications, Growth-Team, MediaWiki-Cache, Core Platform Team Workboards (Clinic Duty Team)

Nov 7 2019

Eevans added a comment to T227776: Generalize ParserCache into a generic service class for large "current" page-derived data.

I like the idea of making the ParserCache a more generalized caching mechanism for MediaWiki. I have serious doubts about other things hinted at here, specifically exposing a caching endpoint to other services. I'd argue that such a caching service should be separated from MediaWiki, have a simple API, and probably be structured around the page/revision identifier. We also probably don't want such a system to be written in PHP, as we would aim for the highest possible throughput.

We do not want an application doing some business logic to also be the cache storage for everything else. It was wrong with RESTBase; it would be wrong here. Each application should manage its own caching logic. This logic should not be delegated to another application and should not rely on the automagic properties of some centralized management system that then becomes the brain of the whole architecture. The only exception I see to this could be some purging logic.

So if we want such a system to be generalized and usable outside of MediaWiki it should be a thin service in front of a storage system[1] and it should:

  • Have primitives that reproduce whatever API we use with e.g. BagOStuff
  • Be able to work across datacenters in write/write mode
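One way to read these two requirements together, as a sketch only: BagOStuff-style primitives (`get`, `set`, `delete`, `add`) over a per-datacenter store, with concurrent writes in two datacenters reconciled by last-write-wins on a writer-supplied timestamp (the same general rule Cassandra applies to conflicting writes). Everything here is hypothetical: the class names, the tie-breaking rule, and the `merge` step that stands in for cross-DC replication. It is not an API of MediaWiki or of any existing service.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    value: Optional[bytes]  # None marks a deletion (a tombstone)
    timestamp: int          # write time chosen by the writer, e.g. microseconds since the epoch


def _rank(cell: Cell) -> tuple[int, int, bytes]:
    # Newest timestamp wins; ties go to the tombstone, then to the larger value,
    # so every datacenter picks the same winner regardless of merge order.
    return (cell.timestamp, 1 if cell.value is None else 0, cell.value or b"")


class DCStore:
    """Hypothetical per-datacenter store exposing BagOStuff-like primitives."""

    def __init__(self) -> None:
        self._cells: dict[str, Cell] = {}

    def _apply(self, key: str, cell: Cell) -> bool:
        # Returns True if the incoming cell won and was stored.
        current = self._cells.get(key)
        if current is None or _rank(cell) > _rank(current):
            self._cells[key] = cell
            return True
        return False

    def get(self, key: str) -> Optional[bytes]:
        cell = self._cells.get(key)
        return None if cell is None else cell.value

    def set(self, key: str, value: bytes, timestamp: int) -> None:
        self._apply(key, Cell(value, timestamp))

    def delete(self, key: str, timestamp: int) -> None:
        self._apply(key, Cell(None, timestamp))

    def add(self, key: str, value: bytes, timestamp: int) -> bool:
        # "Set if absent" is only checked locally; it is not atomic across datacenters.
        if self.get(key) is not None:
            return False
        return self._apply(key, Cell(value, timestamp))

    def merge(self, other: DCStore) -> None:
        # Stand-in for asynchronous cross-DC replication: apply every cell from the other side.
        for key, cell in other._cells.items():
            self._apply(key, cell)
```

Plain `set`/`delete` converge under this rule; primitives that depend on the current value (such as `add`) only give local guarantees unless writes for a key are coordinated.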
Nov 7 2019, 10:57 PM · CPT Initiatives (Parsoid REST API in PHP (CDP2)), User-Eevans, User-mobrovac, TechCom, User-Daniel, Proposal
Eevans added a comment to T227776: Generalize ParserCache into a generic service class for large "current" page-derived data.

What is the use case for external access?

To clarify: I was referring to access outside MW core, but inside the local (in our case, WMF) network. The intent is not to make this a public service that can be accessed directly by external clients.
Concrete use cases (some currently in core) for extracting data from page content and caching it for later access: Wikibase constraint validation, graphoid, kartographer, mathoid, template data, page summary...

How would an external consumer deal with value validation (e.g. matches known rev id), fragmentation parameters

Ideally, the cache service itself would know about these things and handle them correctly. E.g. before returning a cache entry, it would check that it's not stale, and when purging the entry for a given page, it would purge the entire "bucket" of cached variants.
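A minimal sketch of the behaviour described in that answer, assuming entries are bucketed by page ID, each bucket holds variants keyed by a hypothetical rendering-options string (e.g. a language variant), and every entry records the revision ID it was derived from. The class and method names are illustrative, not an existing MediaWiki API.

```python
from __future__ import annotations

from typing import Any, Optional


class PageDerivedCache:
    """Hypothetical cache for data derived from the current revision of a page."""

    def __init__(self) -> None:
        # page_id -> {variant -> (rev_id, value)}
        self._buckets: dict[int, dict[str, tuple[int, Any]]] = {}

    def get(self, page_id: int, variant: str, current_rev_id: int) -> Optional[Any]:
        entry = self._buckets.get(page_id, {}).get(variant)
        if entry is None:
            return None
        rev_id, value = entry
        # Validation: an entry derived from any other revision is stale.
        if rev_id != current_rev_id:
            return None
        return value

    def set(self, page_id: int, variant: str, rev_id: int, value: Any) -> None:
        self._buckets.setdefault(page_id, {})[variant] = (rev_id, value)

    def purge(self, page_id: int) -> None:
        # Purging a page drops every cached variant in its bucket at once.
        self._buckets.pop(page_id, None)
```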

and how would it deal with absence of the value? I see ParserCache as fundamentally a getWithSet-like interface (with very high persistence and PoolCounter etc., but nonetheless fundamentally lazy-populated).

Currently, ParserCache isn't getWithSet. If there is no entry cached or the cached entry is stale, you get nothing back. Generating and then caching is the caller's responsibility.
For the new component described here, I'd propose to keep it that way. Generally, a component that accesses the cache (inside mw core or as a standalone service) would be using the cache for a kind of derived resource it knows how to generate.
The idea is: there would be one place to go to for getting rendered content, one to go to for getting extracted infobox data, one to go to for graphoid output, etc., and each of these places knows how to generate its derived resource and uses the unified cache internally. This makes more sense to me than a generic endpoint for fetching any kind of resource, with some kind of internal routing to generate each resource.
Why then should different components that derive different kinds of things from pages share the caching infrastructure, instead of writing their own? Because the purging mechanism is the same, the access keys are the same, and the scale is similar. Having to re-invent this wheel leads to duplication and annoyance, or to the abuse of less-than-ideal existing mechanisms, like page props.
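Sketching the proposed split under similar assumptions: the shared cache only stores and returns values (a miss or a stale entry yields nothing, so it is not getWithSet), while each component, such as a hypothetical infobox extractor or page-summary generator, knows how to generate its own resource and populates the cache itself. All names are illustrative, and the toy extractor is not a real wikitext parser.

```python
from __future__ import annotations

from typing import Any, Callable, Optional


class UnifiedCache:
    """Minimal shared store: returns None on a miss or a stale entry; never generates."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, int], tuple[int, Any]] = {}

    def get(self, kind: str, page_id: int, rev_id: int) -> Optional[Any]:
        entry = self._entries.get((kind, page_id))
        return entry[1] if entry is not None and entry[0] == rev_id else None

    def set(self, kind: str, page_id: int, rev_id: int, value: Any) -> None:
        self._entries[(kind, page_id)] = (rev_id, value)


class DerivedResource:
    """One place to get one kind of derived data: it owns generation, the cache owns storage."""

    def __init__(self, kind: str, cache: UnifiedCache, generate: Callable[[str], Any]) -> None:
        self.kind = kind
        self.cache = cache
        self.generate = generate
        self.generated = 0  # number of cache misses, for illustration

    def fetch(self, page_id: int, rev_id: int, wikitext: str) -> Any:
        value = self.cache.get(self.kind, page_id, rev_id)
        if value is None:
            # Generating and then caching is the caller's responsibility.
            value = self.generate(wikitext)
            self.cache.set(self.kind, page_id, rev_id, value)
            self.generated += 1
        return value


def extract_infobox(wikitext: str) -> dict[str, str]:
    """Toy extractor: collect '| key = value' lines."""
    fields = {}
    for line in wikitext.splitlines():
        if line.startswith("|") and "=" in line:
            key, _, value = line[1:].partition("=")
            fields[key.strip()] = value.strip()
    return fields
```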

Nov 7 2019, 10:47 PM · CPT Initiatives (Parsoid REST API in PHP (CDP2)), User-Eevans, User-mobrovac, TechCom, User-Daniel, Proposal
Eevans added a comment to T227776: Generalize ParserCache into a generic service class for large "current" page-derived data.

Kask,[1] accessed via RESTBagOStuff?

Probably not Kask, but perhaps something similar, or a derivative or successor of Kask.

This sounds an awful lot like file storage (where I'm defining "file" to mean some semi-large (for definition of large) chunk of opaque data), which Kask (and Cassandra) aren't well suited for.

Though I'm not entirely sure that we want Cassandra as a backend for this.

Same.

Nov 7 2019, 10:35 PM · CPT Initiatives (Parsoid REST API in PHP (CDP2)), User-Eevans, User-mobrovac, TechCom, User-Daniel, Proposal
Eevans added a comment to T180051: Reduce the number of fields declared in elasticsearch by logstash.

An additional 2¢

Nov 7 2019, 7:18 PM · Patch-For-Review, observability, Core Platform Team Legacy (Watching / External), Services (watching), Operations, Wikimedia-Logstash

Nov 4 2019

Eevans updated subscribers of T234295: Migration of old timestamps.

During discussions w/ @Catrope, we determined that the impact of timestamp misses was sufficiently minor as to not justify spending the time to write, test, and debug a migration of data from Redis. What we will do instead: deploy with a MultiWriteBagOStuff that wraps the new and old stores (read-from-new, fallback-to-old, write-to-both). There will be some 90 days or more between this deployment and the decommissioning of Redis, during which time most active users will have had their seen-times seeded into the new store.
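The read/write policy described above can be sketched as a small two-tier wrapper. This illustrates only the stated policy (read from new, fall back to old, write to both) with dict-backed stores; it is not MediaWiki's MultiWriteBagOStuff, whose options and exact behaviour may differ, and the class name is hypothetical.

```python
from __future__ import annotations

from typing import Any, MutableMapping, Optional


class ReadNewFallbackOld:
    """Hypothetical wrapper: read from the new store, fall back to the old one, write to both."""

    def __init__(self, new: MutableMapping[str, Any], old: MutableMapping[str, Any]) -> None:
        self.new = new
        self.old = old

    def get(self, key: str) -> Optional[Any]:
        if key in self.new:
            return self.new[key]
        # Fallback: seen-times written before the switch still live only in the old store.
        return self.old.get(key)

    def set(self, key: str, value: Any) -> None:
        # Writing to both seeds the new store as users stay active,
        # and keeps the old store current until it is decommissioned.
        self.new[key] = value
        self.old[key] = value
```

Keys that are never rewritten during the overlap period simply stop resolving once the old store is removed; per the discussion above, such misses were judged acceptable.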

Nov 4 2019, 3:26 PM · Growth-Team, Notifications, Core Platform Team Workboards (User Stories), Story, CPT Initiatives (Multi-DC Echo Notification Storage)

Nov 2 2019

Eevans triaged T237143: Log warning: Duplicate get(): "officewiki:echo:seen:message:time:{n}" fetched 2 times as Medium priority.
Nov 2 2019, 12:33 AM · Notifications, Growth-Team, MediaWiki-Cache, Core Platform Team Workboards (Clinic Duty Team)
Eevans created T237143: Log warning: Duplicate get(): "officewiki:echo:seen:message:time:{n}" fetched 2 times.
Nov 2 2019, 12:33 AM · Notifications, Growth-Team, MediaWiki-Cache, Core Platform Team Workboards (Clinic Duty Team)

Nov 1 2019

Eevans added a comment to T222851: Improve Echo seentime code for multi-DC access.
Nov 1 2019, 3:00 PM · MW-1.35-notes (1.35.0-wmf.15; 2020-01-14), CPT Initiatives (Multi-DC Echo Notification Storage), User-Eevans, Notifications, Growth-Team

Oct 31 2019

Eevans reopened T222851: Improve Echo seentime code for multi-DC access, a subtask of T212129: Use a multi-dc aware store for ObjectCache's MainStash if needed., as Open.
Oct 31 2019, 9:03 PM · MediaWiki-General, serviceops-radar, User-mobrovac, User-jijiki, Performance-Team (Radar), Operations
Eevans reopened T222851: Improve Echo seentime code for multi-DC access, a subtask of T234294: Configurable timestamp storage, as Open.
Oct 31 2019, 9:03 PM · Growth-Team, Notifications, Core Platform Team Workboards (User Stories), Story, CPT Initiatives (Multi-DC Echo Notification Storage)
Eevans reopened T222851: Improve Echo seentime code for multi-DC access as "Open".

I believe this task to be done.

Oct 31 2019, 9:03 PM · MW-1.35-notes (1.35.0-wmf.15; 2020-01-14), CPT Initiatives (Multi-DC Echo Notification Storage), User-Eevans, Notifications, Growth-Team
Eevans updated the task description for T222851: Improve Echo seentime code for multi-DC access.
Oct 31 2019, 9:02 PM · MW-1.35-notes (1.35.0-wmf.15; 2020-01-14), CPT Initiatives (Multi-DC Echo Notification Storage), User-Eevans, Growth-Team, Notifications
Eevans updated the task description for T222851: Improve Echo seentime code for multi-DC access.
Oct 31 2019, 9:01 PM · MW-1.35-notes (1.35.0-wmf.15; 2020-01-14), CPT Initiatives (Multi-DC Echo Notification Storage), User-Eevans, Growth-Team, Notifications