Eevans (Eric Evans)
Senior Software Engineer

Projects (13)

User Details

User Since
Feb 27 2015, 10:47 PM (295 w, 5 d)
Availability
Available
IRC Nick
urandom
LDAP User
Eevans
MediaWiki User
Unknown

Recent Activity

Yesterday

Eevans committed rMSKSba9d72b5d6ef: Upgrade build environment to golang:1.13-3 (authored by Eevans).
Upgrade build environment to golang:1.13-3
Wed, Oct 28, 3:54 PM
Eevans committed rMSKS2f0d47c08c79: Create new buster porting branch (authored by Eevans).
Create new buster porting branch
Wed, Oct 28, 3:29 PM

Tue, Oct 20

Eevans added a comment to T258414: Cassandra Grafana dashboards seem to disagree with actual utilization.

The following list of keyspaces seem to correspond with these ghost snapshots.

Allegedly obsolete keyspaces
commons_T_parsoid
commons_T_parsoidd3o5Dn1wcj_Xve2tXe4_rtmeWSU
enwiki_T_parsoid
enwiki_T_parsoidd3o5Dn1wcj_Xve2tXe4_rtmeWSU
enwiki_T_references
others_T_parsoid
others_T_parsoidd3o5Dn1wcj_Xve2tXe4_rtmeWSU
others_T_references
wikipedia_T_parsoid
wikipedia_T_parsoidd3o5Dn1wcj_Xve2tXe4_rtmeWSU
wikipedia_T_references

@Pchelolo can you check me on this before I go around rm -r-ing stuff?

Tue, Oct 20, 2:41 PM · Platform Team Workboards (Clinic Duty Team), observability
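Before removing snapshot data by hand, it may help to list exactly which directories would go. The following is a minimal, read-only Python sketch. It assumes Cassandra's usual on-disk layout of `<data_dir>/<keyspace>/<table>-<id>/snapshots/<tag>/`. The data directory paths in the usage lines are hypothetical placeholders, not the actual restbase layout.

```python
from pathlib import Path
from typing import Iterable, Iterator


def find_keyspace_snapshots(data_dirs: Iterable[str], keyspaces: Iterable[str]) -> Iterator[Path]:
    """Yield snapshot directories that belong to the given keyspaces.

    Assumes the layout <data_dir>/<keyspace>/<table>-<id>/snapshots/<tag>/.
    Read-only: nothing is deleted.
    """
    wanted = set(keyspaces)
    for data_dir in map(Path, data_dirs):
        if not data_dir.is_dir():
            continue  # e.g. a JBOD directory that does not exist on this host
        for ks_dir in sorted(data_dir.iterdir()):
            if not ks_dir.is_dir() or ks_dir.name not in wanted:
                continue
            # Every table directory in the keyspace may hold its own snapshots/.
            for snap_dir in sorted(ks_dir.glob("*/snapshots/*")):
                if snap_dir.is_dir():
                    yield snap_dir


def dir_size_bytes(path: Path) -> int:
    """Total size of all regular files under path."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


if __name__ == "__main__":
    # A subset of the keyspaces listed above.
    obsolete = ["commons_T_parsoid", "enwiki_T_references", "others_T_references"]
    # Hypothetical JBOD data directories; substitute the real ones per instance.
    data_dirs = ["/srv/sda4/cassandra-a/data", "/srv/sdb4/cassandra-a/data"]
    for snap in find_keyspace_snapshots(data_dirs, obsolete):
        print(f"{dir_size_bytes(snap):>15,d}  {snap}")
```

Printing sizes alongside paths gives a reviewer something concrete to sign off on before anything is deleted.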

Tue, Sep 29

Eevans updated subscribers of T258414: Cassandra Grafana dashboards seem to disagree with actual utilization.

The following list of keyspaces seem to correspond with these ghost snapshots.

Tue, Sep 29, 11:55 PM · Platform Team Workboards (Clinic Duty Team), observability
Eevans added a comment to T258414: Cassandra Grafana dashboards seem to disagree with actual utilization.

It seems as though we do in fact have some snapshot data:

Tue, Sep 29, 11:48 PM · Platform Team Workboards (Clinic Duty Team), observability
Eevans updated the task description for T258414: Cassandra Grafana dashboards seem to disagree with actual utilization.
Tue, Sep 29, 11:39 PM · Platform Team Workboards (Clinic Duty Team), observability
Eevans updated the task description for T258414: Cassandra Grafana dashboards seem to disagree with actual utilization.
Tue, Sep 29, 11:39 PM · Platform Team Workboards (Clinic Duty Team), observability
Eevans updated the task description for T258414: Cassandra Grafana dashboards seem to disagree with actual utilization.
Tue, Sep 29, 11:28 PM · Platform Team Workboards (Clinic Duty Team), observability

Sep 21 2020

Eevans added a comment to T261512: Provision new RESTBase/Cassandra cluster nodes: restbase1028, restbase1029, restbase1030.

Which racks should these new hosts go into? I'm guessing one in each of a, b, and d, which are the eqiad racks at the moment (each currently has 4 hosts).

Sep 21 2020, 7:28 PM · RESTBase-Cassandra, Platform Engineering, Cassandra

Aug 28 2020

Eevans created T261512: Provision new RESTBase/Cassandra cluster nodes: restbase1028, restbase1029, restbase1030.
Aug 28 2020, 4:13 PM · RESTBase-Cassandra, Platform Engineering, Cassandra

Aug 6 2020

Eevans added a comment to T256863: restbase2009 down.

I'm looking into this today. I see that restbase2009 has been up for 9 days, has been configured by Puppet, and has been added to the Cassandra cluster, but I don't see anything in SAL about who did it. Still investigating.

eevans@restbase2009:~$ c-any-nt status -r | grep 2009
UN  restbase2009-a.codfw.wmnet  554.01 GiB  256          6.9%              c0d7a947-d423-49b3-b307-416a783a722f  d
UN  restbase2009-b.codfw.wmnet  523.81 GiB  256          6.6%              3ec9435b-dc45-46d3-a2ea-d0d5153615a9  d
UN  restbase2009-c.codfw.wmnet  470.2 GiB  256          6.5%              2e4e1268-1e17-476f-aa9e-b8c42035d115  d
eevans@restbase2009:~$

Umm, wow.

TL;DR It wasn't me. :)

Aug 6 2020, 2:02 PM · RESTBase, Operations, ops-codfw
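The `status -r` output above can also be checked programmatically. Below is a minimal Python sketch. It assumes `c-any-nt` passes through standard `nodetool status` output and that each node row has the eight whitespace-separated fields shown here: state code, address, two-part load, tokens, ownership, host ID, and rack. Rows for nodes that are down or joining may not fit this shape, so the parser rejects anything else rather than guessing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeRow:
    status: str      # first letter of the code: U (up) or D (down)
    state: str       # second letter: N (normal), L (leaving), J (joining), M (moving)
    address: str
    load: str        # kept as text, e.g. "554.01 GiB"
    tokens: int
    owns_pct: float
    host_id: str
    rack: str


def parse_status_line(line: str) -> NodeRow:
    """Parse one node row of `nodetool status` output (column order assumed from the sample)."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"unexpected row shape: {line!r}")
    code, address, load_value, load_unit, tokens, owns, host_id, rack = fields
    return NodeRow(
        status=code[0],
        state=code[1],
        address=address,
        load=f"{load_value} {load_unit}",
        tokens=int(tokens),
        owns_pct=float(owns.rstrip("%")),
        host_id=host_id,
        rack=rack,
    )


def all_up_normal(rows: List[NodeRow]) -> bool:
    """True if every parsed row reports UN (Up/Normal)."""
    return all(r.status == "U" and r.state == "N" for r in rows)


SAMPLE = """\
UN  restbase2009-a.codfw.wmnet  554.01 GiB  256          6.9%              c0d7a947-d423-49b3-b307-416a783a722f  d
UN  restbase2009-b.codfw.wmnet  523.81 GiB  256          6.6%              3ec9435b-dc45-46d3-a2ea-d0d5153615a9  d
UN  restbase2009-c.codfw.wmnet  470.2 GiB  256          6.5%              2e4e1268-1e17-476f-aa9e-b8c42035d115  d
"""

rows = [parse_status_line(line) for line in SAMPLE.splitlines()]
print(all_up_normal(rows), sum(r.tokens for r in rows))  # prints: True 768
```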
Eevans added a comment to T256863: restbase2009 down.

I'm looking into this today. I see that restbase2009 has been up for 9 days, has been configured by Puppet, and has been added to the Cassandra cluster, but I don't see anything in SAL about who did it. Still investigating.

Aug 6 2020, 1:58 PM · RESTBase, Operations, ops-codfw

Jul 29 2020

Eevans updated subscribers of T256769: Client Developer makes unauthenticated sample API calls.

Setting aside how antithetical this requirement seems for a Wikimedia project...

Jul 29 2020, 9:40 PM · Patch-For-Review, Platform Team Sprints Board (Sprint 1), Platform Team Workboards (Green), Story, Platform Team Initiatives (API Gateway)

Jul 20 2020

Eevans created T258414: Cassandra Grafana dashboards seem to disagree with actual utilization.
Jul 20 2020, 4:12 PM · Platform Team Workboards (Clinic Duty Team), observability
Eevans updated subscribers of T256863: restbase2009 down.

Hi @Eevans - it looks like this was originally scheduled to be refreshed this fiscal year during the annual CapEx planning, but then someone decided to push out the refresh until FY21-22. Can this be decommissioned, since we're towards the end of the 5yr server life cycle? Thanks, Willy

Do you mean... indefinitely?

Yeah. Unfortunately, this server is about 5 years old now and out of warranty. @Eevans, will you be able to get by without having this system in rotation until it's time to refresh it? On line 51 of this year's CapEx budget sheet below, I see there's a comment to postpone the refresh until FY21-22 along with the rest of the batch.

Jul 20 2020, 4:04 PM · RESTBase, Operations, ops-codfw

Jul 17 2020

Eevans added a comment to T256863: restbase2009 down.

Hi @Eevans - it looks like this was originally scheduled to be refreshed this fiscal year during the annual CapEx planning, but then someone decided to push out the refresh until FY21-22. Can this be decommissioned, since we're towards the end of the 5yr server life cycle? Thanks, Willy

Jul 17 2020, 2:35 PM · RESTBase, Operations, ops-codfw

Jul 16 2020

Eevans added a comment to T258155: Bump kask base images from stretch to buster.

I have a local branch where I ported Kask to Buster (as I recall, there were one or two minor changes to the APIs of dependencies). I'll dig that up.

Jul 16 2020, 4:01 PM · serviceops-radar, Platform Team Initiatives (Session Management Service (CDP2))

Jun 23 2020

Eevans added a comment to T224041: Kask functional testing with Cassandra via the Deployment Pipeline.

However, up to now we haven't needed to publish test images to the registry, yet we still run integration tests via helm test (IIRC mathoid runs them). What has changed?

I may have misunderstood; I thought the team wanted to run the functional and integration tests in the kask repo, which are written in Go.

I just took a closer look at service-checker, and I think it could replicate the Go integration tests, but only the GET and POST ones, not the DELETE ones. If that's sufficient, then you're right: we wouldn't need to publish any additional images or make changes to the kask chart (newly added to the task checklist).

Jun 23 2020, 9:38 PM · Patch-For-Review, Platform Engineering, Platform Team Initiatives (Session Management Service (CDP2)), Release-Engineering-Team (Pipeline), Release-Engineering-Team-TODO, Services (next), User-Eevans, Release Pipeline, Operations, serviceops

Jun 9 2020

Eevans added a comment to T254911: Investigate how to include private claims in JWTs.

Any information included in JWTs is publicly visible to anyone with access to the JWT. Because the JWT is signed, the rate-limiting information cannot be modified, but it can be read. Care should therefore be taken with claims that include confidential or non-public information. This is somewhat mitigated by the fact that JWTs primarily function as access tokens: they are exchanged over HTTPS and are not publicly logged, and unauthorized parties having access to them would already constitute a security issue.

Jun 9 2020, 5:10 PM · Patch-For-Review, MediaWiki-extensions-OAuth, Platform Team Initiatives (API Gateway)
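As the quoted comment notes, signing protects a JWT's claims from modification but not from being read, because the payload is only base64url-encoded JSON. Below is a minimal Python sketch using only the standard library. The token, its claims, and the `ratelimit` claim name are hypothetical, and the signature segment is a placeholder. The sketch shows decoding only, not verification.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments are."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring any stripped padding."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def read_claims(token: str) -> dict:
    """Return a JWT's claims WITHOUT verifying the signature.

    This only demonstrates that claims are readable by anyone holding the
    token; never use claims read this way for authorization decisions.
    """
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


# Hypothetical token: claim names and values are made up, and the third
# segment is a placeholder rather than a real signature.
header = b64url_encode(json.dumps({"alg": "RS256", "typ": "JWT"}).encode())
payload = b64url_encode(
    json.dumps({"sub": "12345", "ratelimit": {"requests_per_unit": 5000, "unit": "HOUR"}}).encode()
)
token = f"{header}.{payload}.placeholder-signature"

print(read_claims(token)["ratelimit"])
```

No key material is involved at any point, which is why confidential values should not be placed in claims.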

Jun 4 2020

Eevans added a comment to T209106: Setup session storage service testing/continuous integration.

Anything left to do here?

Jun 4 2020, 4:30 PM · Platform Team Initiatives (Session Management Service (CDP2)), User-Clarakosi, User-Eevans
Eevans added a comment to T209109: Security model for session storage service.

I think we can close it.

Jun 4 2020, 4:26 PM · Platform Team Initiatives (Session Management Service (CDP2)), User-Clarakosi, User-Eevans
Eevans added a comment to T222990: Audit session storage to determine max age of un-GC'd sessions.

Anything left to do here?

Jun 4 2020, 4:25 PM · Platform Team Initiatives (Session Management Service (CDP2)), audits-data-retention, User-Clarakosi, User-Eevans

May 20 2020

Eevans created T253244: Upstream gocql bug affects Kask.
May 20 2020, 4:46 PM · Platform Team Workboards (Clinic Duty Team), Cassandra

May 19 2020

Eevans added a comment to T252898: echostore connection error in Beta Cluster.

echoseen in deployment-prep doesn't do HTTPS

May 19 2020, 8:57 PM · Platform Team Workboards (Clinic Duty Team), Growth-Team, Beta-Cluster-Infrastructure, Notifications

May 18 2020

Eevans added a comment to T249756: Cassandra3 migration plan proposal.

@Eevans please be patient; Joseph and I had a long chat about upgrading in place, and we have some doubts. Overall, what we'd like to do is something like:

May 18 2020, 4:01 PM · Analytics-Kanban, Analytics-Clusters, User-Elukey, Cassandra

May 13 2020

Eevans added a comment to T249756: Cassandra3 migration plan proposal.

Before starting, there are some notes to keep in mind:

  • we currently have 6 nodes running Cassandra 2.2
  • 3 of them are due to be refreshed because their hardware warranty is expiring
  • we are considering expanding the cluster to 9 or 12 nodes to host more data (needs to be verified)
  • the goal is to upgrade to Cassandra 3.11 (already running for RESTBase)

We have a couple of options, though we're not yet sure about their flexibility, etc.:

  • Upgrade the cluster in place

This is something that wasn't tested by the Services team at the time, since a new RESTBase cluster was created instead. The idea would be to upgrade one node at a time and run nodetool upgradesstables (which usually takes a long time to complete).
We could take the current cluster, upgrade it in place, and then add/remove nodes later on.
The in-place upgrade looks appealing, but if we hit bugs halfway through we could end up in a weird state (recovering at that point may not be trivial).

  • Create a new cluster and stream sstables content to it

This could be doable with sstableloader, but there are some open questions about streaming 2.2 SSTables to a 3.11 cluster (should we use the 3.11 sstableloader on a 2.2 node? Is it sufficient to use the 2.2 sstableloader on a 2.2 node, stream to a 3.11 node, and then run nodetool upgradesstables on it? etc.).

May 13 2020, 4:25 PM · Analytics-Kanban, Analytics-Clusters, User-Elukey, Cassandra
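For the in-place option described above, the key property is that only one node is out of service at a time, and each node's SSTables are rewritten once it is back up on the new version. Below is a minimal Python sketch that builds such a plan as text rather than executing anything. The systemd unit name, the package-upgrade step, and the host names are hypothetical placeholders.

```python
from typing import List

# Hypothetical placeholder: the real package/pinning step is site-specific.
UPGRADE_STEP = "<upgrade the Cassandra package to 3.11>"


def rolling_upgrade_plan(nodes: List[str], unit: str = "cassandra") -> List[str]:
    """Return an ordered upgrade plan that touches one node at a time.

    Nothing is executed; `unit` is an assumed systemd unit name.
    """
    plan: List[str] = []
    for node in nodes:
        plan += [
            f"{node}: nodetool drain",            # flush memtables and stop accepting connections
            f"{node}: systemctl stop {unit}",
            f"{node}: {UPGRADE_STEP}",
            f"{node}: systemctl start {unit}",
            f"{node}: wait until nodetool status shows {node} as UN",
            f"{node}: nodetool upgradesstables",  # rewrite SSTables in the new on-disk format
        ]
    return plan


# Hypothetical host names, for illustration only.
for step in rolling_upgrade_plan(["node1", "node2"]):
    print(step)
```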

May 5 2020

Eevans committed rMSKSb38c389701c8: convert test to prototype nodejs module (authored by Eevans).
convert test to prototype nodejs module
May 5 2020, 8:54 PM

Apr 30 2020

Eevans committed rMSKS1f80ad6ff397: Integration tests based on the mediawiki/api-testing framework (authored by Eevans).
Integration tests based on the mediawiki/api-testing framework
Apr 30 2020, 8:58 PM

Apr 29 2020

Eevans added a comment to T251063: Investigate issues with domain overlap.

There is related discussion at https://www.mediawiki.org/wiki/Topic:Vie8y5khj6w3qs3y

The defining question for me, which also came up in the discussion linked above, is whether we are building only an API Portal, or whether we are building a Developer Portal and we just happen to be starting with the API portion because that's what we're working on right now.

If we're building a Developer Portal, the name developer.wikimedia.org makes sense to me. If we're just building an API Portal and stopping there, then the name api.wikimedia.org makes sense to me. My understanding is that we're building just an API Portal, and I don't see any technical showstoppers for using the name api.wikimedia.org, so that's the name I prefer.

Apr 29 2020, 7:51 PM · Platform Team Initiatives (API Gateway)

Apr 27 2020

Eevans closed T250050: Degraded RAID on restbase2014 as Resolved.

AFAIK, this is complete

Apr 27 2020, 9:20 PM · Operations, ops-codfw

Apr 20 2020

Eevans added a comment to T250050: Degraded RAID on restbase2014.

@Eevans /dev/sdc has been replaced. Let me know if you have any questions

Apr 20 2020, 3:50 PM · Operations, ops-codfw

Apr 17 2020

Eevans added a comment to T250498: restbase2014: systemd critical - cassandra-c.service loaded failed.

I think this popped up when Puppet was re-enabled (which ironically is failing anyway because of the failed unit). It's been put under maintenance.

Apr 17 2020, 4:08 PM · Operations, RESTBase-Cassandra, RESTBase
Eevans merged T250498: restbase2014: systemd critical - cassandra-c.service loaded failed into T250050: Degraded RAID on restbase2014.
Apr 17 2020, 4:06 PM · Operations, ops-codfw
Eevans merged task T250498: restbase2014: systemd critical - cassandra-c.service loaded failed into T250050: Degraded RAID on restbase2014.
Apr 17 2020, 4:06 PM · Operations, RESTBase-Cassandra, RESTBase

Apr 15 2020

Eevans added a comment to T250050: Degraded RAID on restbase2014.

@Eevans the iDRAC is not showing any failed drive. Is it possible for you to get me some system logs showing the bad disk, so I can upload them when I ask for a disk replacement? The last log I have for this system from the iDRAC is from 2018.

Also, I need to clear the log and upgrade the firmware on this system.

Current version:

BIOS Version 1.5.6
iDRAC Firmware Version 3.21.21.21

New version:

BIOS Version 2.5.4
iDRAC Firmware Version 4.10.10

Please let me know when I can do this.

Thanks

Apr 15 2020, 4:39 PM · Operations, ops-codfw

Apr 14 2020

Eevans added a comment to T250050: Degraded RAID on restbase2014.

[ ... ]

Once that's complete, we'll need to do the b & c instances as well. Then we can either re-image the node entirely, or replace the SSD, rebuild the arrays, and completely wipe Cassandra state (I'd prefer the former for repeatability's sake, but defer to SRE here). I'll update the ticket when we're at that point.

Apr 14 2020, 5:29 PM · Operations, ops-codfw
Eevans added a comment to T250050: Degraded RAID on restbase2014.

OK, so it seems like we have a failed SSD (/dev/sdc) and, as a result, some degraded arrays. Ideally we'd replace the SSD and rebuild the array, but we use the /dev/sd[x]4 partitions on these machines as a JBOD for Cassandra. Unfortunately, Cassandra distributes its own system tables over these devices as well, and the instance isn't recoverable after losing a chunk of them like this.

Apr 14 2020, 1:52 PM · Operations, ops-codfw
Eevans added a comment to T250050: Degraded RAID on restbase2014.

@Eevans this is the weekend of broken cassandra hosts, adding you as FYI :)

Apr 14 2020, 1:42 PM · Operations, ops-codfw

Apr 10 2020

Eevans closed T248543: Evaluate Envoy proxy for API gateway (and rate-limiter) as Resolved.
Apr 10 2020, 3:43 PM · Platform Team Workboards (Green), Platform Team Initiatives (API Gateway)
Eevans closed T248543: Evaluate Envoy proxy for API gateway (and rate-limiter), a subtask of T235270: Wikimedia API Gateway, as Resolved.
Apr 10 2020, 3:43 PM · Platform Team Workboards (Initiatives), Platform Team Initiatives (API Gateway)
Eevans updated the task description for T248543: Evaluate Envoy proxy for API gateway (and rate-limiter).
Apr 10 2020, 3:42 PM · Platform Team Workboards (Green), Platform Team Initiatives (API Gateway)
Eevans added a comment to T235437: RESTBase/RESTRouter/service-runner rate limiting plans.

Ah, I see. My interest is specifically in service-runner as my understanding is that it will continue to be used by most or all Wikimedia Node.js services. I'm currently working on updating an open-source Node.js service for Wikimedia production and I'm wondering (a) if I should plan to incorporate service-runner as part of that work, and (b) if so, whether I can plan to use service-runner's existing rate-limiting facility or I should plan to look elsewhere for that.

Apr 10 2020, 2:38 PM · Platform Engineering (Icebox), service-runner, User-mobrovac, Services (doing), Platform Team Initiatives (RESTBase Split (CDP2)), serviceops, Kubernetes, Service-deployment-requests, Operations

Apr 9 2020

Eevans updated the task description for T248543: Evaluate Envoy proxy for API gateway (and rate-limiter).
Apr 9 2020, 9:53 PM · Platform Team Workboards (Green), Platform Team Initiatives (API Gateway)
Eevans updated the task description for T248543: Evaluate Envoy proxy for API gateway (and rate-limiter).
Apr 9 2020, 9:08 PM · Platform Team Workboards (Green), Platform Team Initiatives (API Gateway)
Eevans updated the task description for T248543: Evaluate Envoy proxy for API gateway (and rate-limiter).
Apr 9 2020, 7:45 PM · Platform Team Workboards (Green), Platform Team Initiatives (API Gateway)
Eevans updated the task description for T248543: Evaluate Envoy proxy for API gateway (and rate-limiter).
Apr 9 2020, 1:01 AM · Platform Team Workboards (Green), Platform Team Initiatives (API Gateway)

Apr 1 2020

Eevans edited P10847 Masterwork From Distant Lands.
Apr 1 2020, 4:43 PM

Mar 31 2020

Eevans closed T248018: Drop Cassandra keyspaces for /page/references as Resolved.

Done.

Mar 31 2020, 12:26 AM · Platform Team Workboards (Clinic Duty Team), Page Content Service, Product-Infrastructure-Team-Backlog

Mar 30 2020

Eevans updated the task description for T248543: Evaluate Envoy proxy for API gateway (and rate-limiter).
Mar 30 2020, 9:54 PM · Platform Team Workboards (Green), Platform Team Initiatives (API Gateway)
Eevans updated the task description for T248543: Evaluate Envoy proxy for API gateway (and rate-limiter).
Mar 30 2020, 9:45 PM · Platform Team Workboards (Green), Platform Team Initiatives (API Gateway)
Eevans updated the task description for T248543: Evaluate Envoy proxy for API gateway (and rate-limiter).
Mar 30 2020, 9:35 PM · Platform Team Workboards (Green), Platform Team Initiatives (API Gateway)
Eevans updated the task description for T248543: Evaluate Envoy proxy for API gateway (and rate-limiter).
Mar 30 2020, 8:39 PM · Platform Team Workboards (Green), Platform Team Initiatives (API Gateway)
Eevans updated the task description for T248543: Evaluate Envoy proxy for API gateway (and rate-limiter).
Mar 30 2020, 8:00 PM · Platform Team Workboards (Green), Platform Team Initiatives (API Gateway)
Eevans updated the task description for T248543: Evaluate Envoy proxy for API gateway (and rate-limiter).
Mar 30 2020, 7:57 PM · Platform Team Workboards (Green), Platform Team Initiatives (API Gateway)
Eevans updated the task description for T248543: Evaluate Envoy proxy for API gateway (and rate-limiter).
Mar 30 2020, 7:47 PM · Platform Team Workboards (Green), Platform Team Initiatives (API Gateway)
Eevans updated the task description for T248543: Evaluate Envoy proxy for API gateway (and rate-limiter).
Mar 30 2020, 7:45 PM · Platform Team Workboards (Green), Platform Team Initiatives (API Gateway)
Eevans updated the task description for T248543: Evaluate Envoy proxy for API gateway (and rate-limiter).
Mar 30 2020, 7:43 PM · Platform Team Workboards (Green), Platform Team Initiatives (API Gateway)
Eevans updated the task description for T248543: Evaluate Envoy proxy for API gateway (and rate-limiter).
Mar 30 2020, 7:41 PM · Platform Team Workboards (Green), Platform Team Initiatives (API Gateway)

Mar 26 2020

Eevans updated the task description for T248543: Evaluate Envoy proxy for API gateway (and rate-limiter).
Mar 26 2020, 6:38 PM · Platform Team Workboards (Green), Platform Team Initiatives (API Gateway)
Eevans updated subscribers of T248543: Evaluate Envoy proxy for API gateway (and rate-limiter).
Mar 26 2020, 12:56 AM · Platform Team Workboards (Green), Platform Team Initiatives (API Gateway)
Eevans added a subtask for T235270: Wikimedia API Gateway: T248543: Evaluate Envoy proxy for API gateway (and rate-limiter).
Mar 26 2020, 12:54 AM · Platform Team Workboards (Initiatives), Platform Team Initiatives (API Gateway)
Eevans added a parent task for T248543: Evaluate Envoy proxy for API gateway (and rate-limiter): T235270: Wikimedia API Gateway.
Mar 26 2020, 12:54 AM · Platform Team Workboards (Green), Platform Team Initiatives (API Gateway)
Eevans triaged T248543: Evaluate Envoy proxy for API gateway (and rate-limiter) as Medium priority.
Mar 26 2020, 12:53 AM · Platform Team Workboards (Green), Platform Team Initiatives (API Gateway)
Eevans created T248543: Evaluate Envoy proxy for API gateway (and rate-limiter).
Mar 26 2020, 12:51 AM · Platform Team Workboards (Green), Platform Team Initiatives (API Gateway)

Mar 25 2020

Eevans added a comment to T248018: Drop Cassandra keyspaces for /page/references.

Ok, these keyspaces have been removed from production, dev, and deployment-prep. Out of an abundance of caution, I will leave this open until Friday, and close after cleaning up the snapshots.

Mar 25 2020, 9:52 PM · Platform Team Workboards (Clinic Duty Team), Page Content Service, Product-Infrastructure-Team-Backlog
Eevans added a comment to T248018: Drop Cassandra keyspaces for /page/references.

LGTM. Beta cluster has one as well

Mar 25 2020, 9:48 PM · Platform Team Workboards (Clinic Duty Team), Page Content Service, Product-Infrastructure-Team-Backlog
Eevans triaged T248018: Drop Cassandra keyspaces for /page/references as Medium priority.
Mar 25 2020, 9:33 PM · Platform Team Workboards (Clinic Duty Team), Page Content Service, Product-Infrastructure-Team-Backlog
Eevans added a comment to T248018: Drop Cassandra keyspaces for /page/references.

This has been deployed, so we can drop those keyspaces!

Mar 25 2020, 9:33 PM · Platform Team Workboards (Clinic Duty Team), Page Content Service, Product-Infrastructure-Team-Backlog

Mar 13 2020

Eevans added a comment to T239856: Fold services recommendations into Standards for services RfC.

Is there anything left to do/review here?

Mar 13 2020, 8:12 PM · Platform Team Workboards (Clinic Duty Team)

Mar 6 2020

Eevans added a comment to T243544: Cassandra PHP language driver packaging (Debian).

I've overhauled things and moved stuff to a more Debian-compliant layout here: https://github.com/nosmo/cpp-driver/tree/debian/debian
I'm still not sure whether this is up to snuff, though; it needs some testing.

Mar 6 2020, 1:44 PM · Platform Team Workboards (Initiatives), User-Eevans

Mar 3 2020

Eevans placed T137419: Investigate aberrant disk read throughput in Cassandra (affects 2.2.x and 3.x) up for grabs.

This is actually something we should be following up on with upstream more aggressively.

@Eevans: Hi, do you plan to do this? Asking because you have been the task assignee for a while now.

Mar 3 2020, 3:51 PM · Platform Engineering (Icebox), User-Eevans, Services (later), Cassandra
Eevans updated subscribers of T243000: Bootstrap new Cassandra instances: restbase202[123]-{a,b,c}.

Yeah, it's ready to be closed, but AFAIK, we're supposed to wait for the PM (@CCicalese_WMF) to close it after moving it to Done on the workboard.

Mar 3 2020, 3:47 PM · Platform Team Workboards (Clinic Duty Team), ops-codfw, Operations
Eevans placed T92471: enable authenticated access to Cassandra JMX up for grabs.
Mar 3 2020, 3:43 PM · Platform Engineering (Icebox), User-Eevans, Cassandra, Operations, Patch-For-Review

Feb 27 2020

Eevans created T246379: Research rate limiter implementations and rate limiter-capable HTTP reverse proxies.
Feb 27 2020, 8:59 PM · Platform Team Initiatives (API Gateway), Platform Team Workboards (Green)

Feb 25 2020

zeljkofilipin awarded T243123: Login to at least en.wikipedia.beta.wmflabs.org and commons.wikimedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login session` a Party Time token.
Feb 25 2020, 10:49 AM · Platform Team Workboards (Clinic Duty Team), MediaWiki-extensions-CentralAuth, Beta-Cluster-Infrastructure, MediaWiki-User-login-and-signup, MediaWiki-Core-Testing, User-zeljkofilipin, Quality-and-Test-Engineering-Team (QTE)

Feb 24 2020

Cparle awarded T243123: Login to at least en.wikipedia.beta.wmflabs.org and commons.wikimedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login session` a Like token.
Feb 24 2020, 4:30 PM · Platform Team Workboards (Clinic Duty Team), MediaWiki-extensions-CentralAuth, Beta-Cluster-Infrastructure, MediaWiki-User-login-and-signup, MediaWiki-Core-Testing, User-zeljkofilipin, Quality-and-Test-Engineering-Team (QTE)
Eevans added a comment to T243123: Login to at least en.wikipedia.beta.wmflabs.org and commons.wikimedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login session`.

I believe this is pending https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/574034 (T224712).

Feb 24 2020, 4:13 PM · Platform Team Workboards (Clinic Duty Team), MediaWiki-extensions-CentralAuth, Beta-Cluster-Infrastructure, MediaWiki-User-login-and-signup, MediaWiki-Core-Testing, User-zeljkofilipin, Quality-and-Test-Engineering-Team (QTE)

Feb 21 2020

Eevans reassigned T245875: Parsoid REST endpoint not working on en.wikipedia.beta.wmflabs.org from Eevans to Pchelolo.
Feb 21 2020, 10:02 PM · Platform Team Workboards (Clinic Duty Team)
Eevans closed T245875: Parsoid REST endpoint not working on en.wikipedia.beta.wmflabs.org as Resolved.
Feb 21 2020, 10:02 PM · Platform Team Workboards (Clinic Duty Team)

Feb 19 2020

Kaartic awarded T243123: Login to at least en.wikipedia.beta.wmflabs.org and commons.wikimedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login session` a Heartbreak token.
Feb 19 2020, 5:51 PM · Platform Team Workboards (Clinic Duty Team), MediaWiki-extensions-CentralAuth, Beta-Cluster-Infrastructure, MediaWiki-User-login-and-signup, MediaWiki-Core-Testing, User-zeljkofilipin, Quality-and-Test-Engineering-Team (QTE)
Eevans moved T243123: Login to at least en.wikipedia.beta.wmflabs.org and commons.wikimedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login session` from Done to Inbox on the Platform Team Workboards (Clinic Duty Team) board.
Feb 19 2020, 1:59 PM · Platform Team Workboards (Clinic Duty Team), MediaWiki-extensions-CentralAuth, Beta-Cluster-Infrastructure, MediaWiki-User-login-and-signup, MediaWiki-Core-Testing, User-zeljkofilipin, Quality-and-Test-Engineering-Team (QTE)

Feb 18 2020

Eevans added a comment to T243123: Login to at least en.wikipedia.beta.wmflabs.org and commons.wikimedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login session`.

I don't think this is sessionstore (at least, it's not the timeout issue with Cassandra that we saw before).

Feb 18 2020, 2:37 PM · Platform Team Workboards (Clinic Duty Team), MediaWiki-extensions-CentralAuth, Beta-Cluster-Infrastructure, MediaWiki-User-login-and-signup, MediaWiki-Core-Testing, User-zeljkofilipin, Quality-and-Test-Engineering-Team (QTE)

Feb 13 2020

WMDE-Fisch awarded T243123: Login to at least en.wikipedia.beta.wmflabs.org and commons.wikimedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login session` a Heartbreak token.
Feb 13 2020, 9:27 AM · Platform Team Workboards (Clinic Duty Team), MediaWiki-extensions-CentralAuth, Beta-Cluster-Infrastructure, MediaWiki-User-login-and-signup, MediaWiki-Core-Testing, User-zeljkofilipin, Quality-and-Test-Engineering-Team (QTE)
awight awarded T243123: Login to at least en.wikipedia.beta.wmflabs.org and commons.wikimedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login session` a Heartbreak token.
Feb 13 2020, 9:27 AM · Platform Team Workboards (Clinic Duty Team), MediaWiki-extensions-CentralAuth, Beta-Cluster-Infrastructure, MediaWiki-User-login-and-signup, MediaWiki-Core-Testing, User-zeljkofilipin, Quality-and-Test-Engineering-Team (QTE)

Feb 10 2020

Eevans updated the task description for T243106: Phased rollout of sessionstore to production fleet.
Feb 10 2020, 7:41 PM · Performance-Team-publish, Platform Team Workboards (Clinic Duty Team), Platform Team Initiatives (Session Management Service (CDP2)), serviceops-radar, TPG-Epics (Team Practices Group Coaching Clinic), User-Clarakosi, User-Eevans
Eevans awarded T244508: Request for +2 access to mediawiki-config a Party Time token.
Feb 10 2020, 7:37 PM · Release-Engineering-Team, Operations, SRE-Access-Requests, Gerrit-Privilege-Requests
Eevans added a comment to T244508: Request for +2 access to mediawiki-config.

@Eevans You should have +2 on the mw-config repo now. Probably after logging out and back in.

Feb 10 2020, 7:37 PM · Release-Engineering-Team, Operations, SRE-Access-Requests, Gerrit-Privilege-Requests

Feb 6 2020

Eevans created T244508: Request for +2 access to mediawiki-config.
Feb 6 2020, 5:51 PM · Release-Engineering-Team, Operations, SRE-Access-Requests, Gerrit-Privilege-Requests
Eevans edited P10322 Masterwork From Distant Lands.
Feb 6 2020, 5:41 PM

Feb 5 2020

Eevans updated the task description for T243106: Phased rollout of sessionstore to production fleet.
Feb 5 2020, 4:45 PM · Performance-Team-publish, Platform Team Workboards (Clinic Duty Team), Platform Team Initiatives (Session Management Service (CDP2)), serviceops-radar, TPG-Epics (Team Practices Group Coaching Clinic), User-Clarakosi, User-Eevans

Feb 4 2020

Eevans updated the task description for T243106: Phased rollout of sessionstore to production fleet.
Feb 4 2020, 7:54 PM · Performance-Team-publish, Platform Team Workboards (Clinic Duty Team), Platform Team Initiatives (Session Management Service (CDP2)), serviceops-radar, TPG-Epics (Team Practices Group Coaching Clinic), User-Clarakosi, User-Eevans
Eevans moved T243000: Bootstrap new Cassandra instances: restbase202[123]-{a,b,c} from Doing to Done on the Platform Team Workboards (Clinic Duty Team) board.

This is complete.

Feb 4 2020, 4:45 PM · Platform Team Workboards (Clinic Duty Team), ops-codfw, Operations
Eevans updated the task description for T243000: Bootstrap new Cassandra instances: restbase202[123]-{a,b,c}.
Feb 4 2020, 4:45 PM · Platform Team Workboards (Clinic Duty Team), ops-codfw, Operations
Eevans added a comment to T242461: restrouter.svc.{eqiad,codfw}.wmnet in a failed state.

I believe we have consensus around de-deploying restrouter from k8s, @WDoranWMF can you confirm?

Feb 4 2020, 12:26 AM · serviceops
Eevans updated the task description for T243106: Phased rollout of sessionstore to production fleet.
Feb 4 2020, 12:12 AM · Performance-Team-publish, Platform Team Workboards (Clinic Duty Team), Platform Team Initiatives (Session Management Service (CDP2)), serviceops-radar, TPG-Epics (Team Practices Group Coaching Clinic), User-Clarakosi, User-Eevans
Eevans updated the task description for T243106: Phased rollout of sessionstore to production fleet.
Feb 4 2020, 12:11 AM · Performance-Team-publish, Platform Team Workboards (Clinic Duty Team), Platform Team Initiatives (Session Management Service (CDP2)), serviceops-radar, TPG-Epics (Team Practices Group Coaching Clinic), User-Clarakosi, User-Eevans

Feb 3 2020

Eevans added a comment to T244178: Deploy restbase to restbase202[123].

All these steps should be done after Cassandra is bootstrapped. See T219404 for the ticket where the fresh deploy was done last time.

Feb 3 2020, 9:59 PM · Patch-For-Review, Platform Team Workboards (Clinic Duty Team)
Eevans triaged T244178: Deploy restbase to restbase202[123] as Medium priority.
Feb 3 2020, 9:09 PM · Patch-For-Review, Platform Team Workboards (Clinic Duty Team)
Eevans created T244178: Deploy restbase to restbase202[123].
Feb 3 2020, 9:09 PM · Patch-For-Review, Platform Team Workboards (Clinic Duty Team)
Eevans committed rDEPLOYCHARTS6751955304bc: Upgrade sessionstore production to Kask v1.0.6 (authored by Eevans).
Upgrade sessionstore production to Kask v1.0.6
Feb 3 2020, 5:34 PM