
Joe (Giuseppe Lavagetto)
Spy

Projects (23)

User Details

User Since
Oct 3 2014, 5:57 AM (416 w, 6 d)
Availability
Available
LDAP User
Giuseppe Lavagetto
MediaWiki User
GLavagetto (WMF)

Recent Activity

Yesterday

Ladsgroup awarded T271736: Migrate WMF production from PHP 7.2 to PHP 7.4 a Love token.
Thu, Sep 29, 11:47 AM · Dumps-Generation, Patch-For-Review, Performance-Team (Radar), serviceops
TheresNoTime awarded T318894: Remove php 7.2 from production a Love token.
Thu, Sep 29, 7:31 AM · Patch-For-Review, Dumps-Generation, Performance-Team (Radar), serviceops
Joe created T318894: Remove php 7.2 from production.
Thu, Sep 29, 7:30 AM · Patch-For-Review, Dumps-Generation, Performance-Team (Radar), serviceops
Joe added a comment to T271736: Migrate WMF production from PHP 7.2 to PHP 7.4.

Production is effectively migrated, so this task can be considered "Resolved" as far as CI etc are concerned.

Thu, Sep 29, 6:01 AM · Dumps-Generation, Patch-For-Review, Performance-Team (Radar), serviceops
Joe updated the task description for T271736: Migrate WMF production from PHP 7.2 to PHP 7.4.
Thu, Sep 29, 5:57 AM · Dumps-Generation, Patch-For-Review, Performance-Team (Radar), serviceops

Wed, Sep 28

Joe updated the task description for T271736: Migrate WMF production from PHP 7.2 to PHP 7.4.
Wed, Sep 28, 10:39 AM · Dumps-Generation, Patch-For-Review, Performance-Team (Radar), serviceops
Joe updated the task description for T271736: Migrate WMF production from PHP 7.2 to PHP 7.4.
Wed, Sep 28, 6:51 AM · Dumps-Generation, Patch-For-Review, Performance-Team (Radar), serviceops
Joe updated the task description for T271736: Migrate WMF production from PHP 7.2 to PHP 7.4.
Wed, Sep 28, 6:50 AM · Dumps-Generation, Patch-For-Review, Performance-Team (Radar), serviceops
Joe updated the task description for T271736: Migrate WMF production from PHP 7.2 to PHP 7.4.
Wed, Sep 28, 5:28 AM · Dumps-Generation, Patch-For-Review, Performance-Team (Radar), serviceops

Tue, Sep 27

Joe added a comment to T318697: Ensure wikimedia::memcached role bootstraps cleanly.

We can probably check in an exec that the socket exists and restart memcached only if it doesn't, or something along those lines.
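
As a rough illustration of the check described above (not the actual Puppet change), the decision "restart only if the socket is missing" could be sketched as follows. The function names are hypothetical, and the socket path is left as a parameter because the task excerpt does not state it.

```python
import os
import stat


def socket_exists(path: str) -> bool:
    """Return True only if `path` exists and is a UNIX domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except FileNotFoundError:
        # Nothing at that path at all: treat as "socket missing".
        return False


def restart_needed(path: str) -> bool:
    """Hypothetical guard: restart memcached only when its socket is absent."""
    return not socket_exists(path)
```

A regular file at the same path counts as "missing", so a stale placeholder would still trigger a restart. In a Puppet exec the same guard would more likely be a shell test in an `unless` or `onlyif` clause.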

Tue, Sep 27, 3:36 PM · serviceops
Joe updated the task description for T271736: Migrate WMF production from PHP 7.2 to PHP 7.4.
Tue, Sep 27, 3:11 PM · Dumps-Generation, Patch-For-Review, Performance-Team (Radar), serviceops
Joe added a project to T318065: IABot is encountering 429 on Wikimedia Production: Traffic.

Do you happen to have any further detail on the response headers and body you get whenever you receive a 429 response? It would help us identify which layer is returning the 429 errors.

Tue, Sep 27, 2:30 PM · Traffic, InternetArchiveBot, SRE
Joe updated the task description for T271736: Migrate WMF production from PHP 7.2 to PHP 7.4.
Tue, Sep 27, 2:12 PM · Dumps-Generation, Patch-For-Review, Performance-Team (Radar), serviceops
Joe triaged T318671: Ensure that all appserver-related roles can be cleanly applied on bootstrap as Medium priority.
Tue, Sep 27, 9:16 AM · serviceops
Joe created T318671: Ensure that all appserver-related roles can be cleanly applied on bootstrap.
Tue, Sep 27, 9:16 AM · serviceops
Joe closed T313973: GrowthExperiments\NewcomerTasks\AddImage\ServiceImageRecommendationProvider::get Unable to decode JSON response for page {title} upstream connect error or disconnect/reset before headers. reset reason: connection termination as Resolved.

My patch from yesterday that reduced the keepalive timeout to a short time in the service proxy has resolved most of the "upstream connection failed" issues:

Tue, Sep 27, 5:00 AM · Structured-Data-Backlog, Structured Data Engineering, Patch-For-Review, serviceops, API Platform, MW-1.39-notes (1.39.0-wmf.25; 2022-08-15), Platform Engineering, Growth-Team (Current Sprint), Image-Suggestions, Growth-Structured-Tasks, Wikimedia-production-error
Joe updated the task description for T271736: Migrate WMF production from PHP 7.2 to PHP 7.4.
Tue, Sep 27, 4:32 AM · Dumps-Generation, Patch-For-Review, Performance-Team (Radar), serviceops

Mon, Sep 26

Joe added a comment to T315995: Document how to disable x2 per DC.

@Marostegui we now have a cookbook to depool the non-primary datacenters from traffic, using:

Mon, Sep 26, 5:48 AM · Sustainability (Incident Followup), Performance-Team

Sun, Sep 11

Joe added a comment to T312638: Parsoid migration to php 7.4.

html2wt time per output KB also saw a dip although it is not as prominent as the wt2html one.

But, the p75 html2wt components graph makes it really obvious. This change is for all the components, even the small lines at the bottom (which shows up when you suppress the domdiff and serialize graphs).

Looks like maybe an approximately 30% speedup based on eyeballing the various plots in the full panel.

Sun, Sep 11, 9:22 AM · Performance-Team-publish, Patch-For-Review, Parsoid, Performance-Team (Radar), serviceops

Fri, Sep 9

Joe created P34337 (An Untitled Masterwork).
Fri, Sep 9, 3:05 PM

Thu, Sep 8

Joe updated subscribers of T317283: Coordinate with ServiceOps Team about a rework of the Search Update Pipeline.

There is a general problem I have with this plan, which is that as we stand, the API and appserver clusters are reserved (as much as possible) to live requests from the website or other services to perform their duties.

Thu, Sep 8, 10:11 AM · Discovery-Search (Current work), serviceops
Joe added a comment to T317187: GrowthExperiments Special:Homepage: investigate performance regression since September 6 2022.

The appserver envoy metrics show many 503s, but the istio metrics and logs show no errors. So my theory is that something goes wrong with kube-proxy, but only for ingestion. If I understand correctly, the request flow is MW -> envoy -> kube-proxy -> istio/envoy -> kube-proxy -> image-suggestions. If kube-proxy was broken both times, istio would see errors.

Thu, Sep 8, 5:48 AM · Performance-Team (Radar), Growth-Team (Current Sprint), GrowthExperiments, Performance Issue

Wed, Sep 7

Joe added a comment to T317128: Failed to convert audio to MP3: /bin/bash: /usr/bin/lame: No such file or directory on Beta Cluster.

The reason lame is not installed in production appservers is that it uses a remote shellbox instance for security reasons.

Wed, Sep 7, 10:58 AM · Beta-Cluster-Infrastructure, Community-Tech, MediaWiki-extensions-Phonos
Joe added a comment to T315995: Document how to disable x2 per DC.

From irc:

[09:45:42]  <volans> is there already some documentation on how depool MW from codfw? I was thinking to add a note in the dns's admin_state with a link as a reminder to consider if people needs to depool MW too when depooling one of the core DCs in the DNS
[09:46:34]  <volans> (this ofc applies mostly for the RO DC, is not meant to replace the switchdc workflow, whose cookbooks needs to be refactored to take into account the new structure)
[09:47:15]  <@marostegui> volans: I created https://phabricator.wikimedia.org/T315995 but we can probably expand it to be: how to disable the RO DC from MW

@Krinkle @tstarling can we document the above too? How to disable the RO DC entirely from MW in case of issues?

Wed, Sep 7, 7:59 AM · Sustainability (Incident Followup), Performance-Team

Tue, Sep 6

Joe added a comment to T317064: History pages' caches not being invalidated after edits.

Another option is to do the query sorting for purges, which are a special case, in either:

Tue, Sep 6, 9:48 AM · Patch-For-Review, Performance-Team (Radar), MediaWiki-Core-HTTP-Cache, SRE, Regression, Traffic, MediaWiki-Page-history

Mon, Sep 5

Joe added a comment to T306042: Serve beta cluster via PHP 7.4 by default.

As of now, deployment-prep is using php 7.4 only. We can clean up later and remove php 7.2 completely.

Mon, Sep 5, 10:48 AM · Beta-Cluster-Infrastructure, serviceops
Joe closed T306042: Serve beta cluster via PHP 7.4 by default, a subtask of T271736: Migrate WMF production from PHP 7.2 to PHP 7.4, as Resolved.
Mon, Sep 5, 10:34 AM · Dumps-Generation, Patch-For-Review, Performance-Team (Radar), serviceops
Joe closed T306042: Serve beta cluster via PHP 7.4 by default as Resolved.
Mon, Sep 5, 10:34 AM · Beta-Cluster-Infrastructure, serviceops

Fri, Sep 2

Joe awarded T316706: Run user-submitted code under gVisor a Love token.
Fri, Sep 2, 5:15 AM · Abstract Wikipedia team (Phase θ – Throttling)

Thu, Sep 1

Joe added a comment to T314118: Reduce IRC flood/spam during incidents.

Regarding the appserver alerts, I think we should go in the following direction:

  • Have one metric that tells us if apache is up; I think that AppserversUnreachable is checking up{job="apache"} rather than apache_up, which is what the exporter uses to signal whether the apache server is reachable. But otherwise, we're covered and we can remove the individual server alerts
  • I'd also add a per-server alert if apache is down for more than 3 hours on that specific server, though - just so that if a server has been left misconfigured or broken for any reason we'll notice.
  • We also need a metric that tells us if php-fpm is able to respond to queries, although that is mostly covered by the PHPBusyWorkers alerts
  • Finally, and this is still lacking in alertmanager, we need a metric similar to the php7 rendering one: basically, an HTTP check of a URL that involves calling the mediawiki code. A good candidate is probably what pybal checks.
Thu, Sep 1, 10:14 AM · Patch-For-Review, serviceops-radar, User-fgiunchedi, SRE Observability (FY2022/2023-Q1), SRE
Joe closed T316601: PHP Warning: Erroneous data format for unserializing 'Wikimedia\Rdbms\MySQLPrimaryPos' as Resolved.
Thu, Sep 1, 9:29 AM · MW-1.39-notes (1.39.0-wmf.28; 2022-09-05), Performance-Team, MediaWiki-libs-Rdbms, Wikimedia-production-error
Joe added a comment to T316601: PHP Warning: Erroneous data format for unserializing 'Wikimedia\Rdbms\MySQLPrimaryPos'.

We're back to 1% of traffic and no issue is showing up, I'll resolve the task and move to 5% now.

Thu, Sep 1, 9:29 AM · MW-1.39-notes (1.39.0-wmf.28; 2022-09-05), Performance-Team, MediaWiki-libs-Rdbms, Wikimedia-production-error
Joe added a comment to T312638: Parsoid migration to php 7.4.

For the record, that failure was expected and Clement was well aware of it.

Thu, Sep 1, 8:07 AM · Performance-Team-publish, Patch-For-Review, Parsoid, Performance-Team (Radar), serviceops
Joe created P33731 (An Untitled Masterwork).
Thu, Sep 1, 7:41 AM
Joe added a comment to T316601: PHP Warning: Erroneous data format for unserializing 'Wikimedia\Rdbms\MySQLPrimaryPos'.

I have brought back 1% of the user traffic to php 7.4 now, let's see if errors spike up again, otherwise I'll resolve the task.

Thu, Sep 1, 6:13 AM · MW-1.39-notes (1.39.0-wmf.28; 2022-09-05), Performance-Team, MediaWiki-libs-Rdbms, Wikimedia-production-error
Joe added a comment to T316601: PHP Warning: Erroneous data format for unserializing 'Wikimedia\Rdbms\MySQLPrimaryPos'.

@hashar the latest package is +buster3.1

Thu, Sep 1, 6:12 AM · MW-1.39-notes (1.39.0-wmf.28; 2022-09-05), Performance-Team, MediaWiki-libs-Rdbms, Wikimedia-production-error

Wed, Aug 31

Joe changed the status of T316601: PHP Warning: Erroneous data format for unserializing 'Wikimedia\Rdbms\MySQLPrimaryPos' from Open to In Progress.

Just to clarify, this task is about a production issue.

Wed, Aug 31, 10:47 AM · MW-1.39-notes (1.39.0-wmf.28; 2022-09-05), Performance-Team, MediaWiki-libs-Rdbms, Wikimedia-production-error
Joe added a comment to T316601: PHP Warning: Erroneous data format for unserializing 'Wikimedia\Rdbms\MySQLPrimaryPos'.

Code search for __serialize shows the following implementations in deployed code:

  • ApiMessageTrait
  • Message
  • GenericArrayObject
  • HashRing
  • MapCacheLRU
  • MySQLPrimaryPos
  • Site
  • Various classes in ramsey/uuid
  • RunningStat\PSquare
  • Wikibase\Lib\Changes\{EntityDiffChangedAspects, RepoRevisionIdentifier}
  • Wikibase\DataModel\Snak\{ReferenceList, SnakList}
  • Wikibase\MediaInfo\DataModel\MediaInfoId

If I understand the bug correctly, if any of these classes are serialized with PHP 7.4, they will fail to unserialize on PHP 7.2, so it's not enough to just patch MySQLPrimaryPos.

My PHP patch is tested now. With php72-serialize.php having the code from T316601#8196601:

$ sapi/cli/php /srv/mw/core/junk/php72-serialize.php 
C:4:"Test":27:{a:1:{s:3:"foo";s:4:"Test";}}
Test
Test

This shows my patched serialize() producing PHP 7.2 compatible output but correctly interpreting the new format. That's with the updated patch 7bc2f4d194e2136 at the head of my bc-serialize-7.4 branch. It compiles with -Werror and should pass make test.
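
To make the difference between the two output shapes concrete, here is a minimal sketch that inspects only the header of a serialized PHP value and reports whether it uses the `C:` (custom payload) form shown above or the `O:` (object with properties) form. The helper name is hypothetical, it is not part of the patch, and it does not parse the nested payload.

```python
import re

# Header of a serialized PHP object: C:<len>:"<class>":<n>:{ or O:<len>:"<class>":<n>:{
_HEADER = re.compile(r'^([CO]):(\d+):"([^"]*)":(\d+):\{')


def describe_serialized_object(data: str):
    """Classify a serialized PHP object string by its header, or return None."""
    m = _HEADER.match(data)
    if m is None:
        return None
    kind, name_len, name, count = m.group(1), int(m.group(2)), m.group(3), int(m.group(4))
    # The declared class-name length is a byte length.
    if len(name.encode("utf-8")) != name_len:
        return None
    if kind == "C":
        # For the C: form, <n> is the byte length of the opaque payload.
        if not data.endswith("}"):
            return None
        payload = data[m.end():-1]
        if len(payload.encode("utf-8")) != count:
            return None
        return {"format": "custom", "class": name, "payload": payload}
    # For the O: form, <n> is the number of serialized properties.
    return {"format": "object", "class": name, "properties": count}
```

Applied to the output above, the header declares a 27-byte payload for class `Test`, which matches the nested `a:1:{...}` array exactly.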

Wed, Aug 31, 5:57 AM · MW-1.39-notes (1.39.0-wmf.28; 2022-09-05), Performance-Team, MediaWiki-libs-Rdbms, Wikimedia-production-error

Aug 30 2022

Joe added a comment to T316601: PHP Warning: Erroneous data format for unserializing 'Wikimedia\Rdbms\MySQLPrimaryPos'.

The comment by @Krinkle is incorrect: the traffic gets drained pretty quickly (see https://logstash.wikimedia.org/goto/db7753562f930806440d6fbd95c7b7ab for the disappearing warning), as has always been the case historically, because the cookie is re-evaluated on every page view.

Aug 30 2022, 5:30 AM · MW-1.39-notes (1.39.0-wmf.28; 2022-09-05), Performance-Team, MediaWiki-libs-Rdbms, Wikimedia-production-error
Joe closed T316611: No localisation cache found for English. Please run maintenance/rebuildLocalisationCache.php as Invalid.

These errors come from servers that we didn't finish installing yesterday because we ran out of time, and that are still in the process of getting a complete mediawiki push. Apologies for the confusion.

Aug 30 2022, 5:23 AM · Wikimedia-production-error, Growth-Team (Current Sprint), GrowthExperiments-CommunityConfiguration

Aug 29 2022

Joe added a comment to T271736: Migrate WMF production from PHP 7.2 to PHP 7.4.

My current plan is to proceed as follows:

  • Move 0.1% of the traffic on monday 22nd
  • Ramp up to […]

Note that the client-side code currently in prod doesn't work.

In T311388#8161816 on 17 August 2022:

Change 824204 merged:

[mediawiki/extensions/WikimediaEvents@master] phpEngine: Actually include the phpEngine.js file

https://gerrit.wikimedia.org/r/824204

This stack will need backporting as it landed 17th, thus out by train next Thursday 26th.

Aug 29 2022, 1:39 PM · Dumps-Generation, Patch-For-Review, Performance-Team (Radar), serviceops
Joe added a comment to T314871: Investigation: Determine approach for DST support.

I am not sure you actually need to store the version of the tzinfo db in your database, reading the above, but please correct me if I'm wrong.

Aug 29 2022, 10:34 AM · Campaign-Tools (Campaign-Tools-Sprint-21), Campaign-Registration
Joe added a comment to T311385: Netbox and Redis.

Silly question: do we have an idea of the size of the cached dataset? If it's small, do we need to keep redis remote to the VM where netbox runs, or should we install it as a local sidecar?

Aug 29 2022, 8:46 AM · Infrastructure-Foundations, netbox, serviceops
Joe changed the status of T313968: codfw (2) memcached host service implementation tracking from Open to In Progress.
Aug 29 2022, 5:55 AM · serviceops, SRE
Joe changed the status of T313968: codfw (2) memcached host service implementation tracking, a subtask of T313966: Q1:rack/setup/install new codfw memcached hosts, from Open to In Progress.
Aug 29 2022, 5:55 AM · serviceops, SRE, ops-codfw, DC-Ops

Aug 19 2022

Joe added a comment to T271736: Migrate WMF production from PHP 7.2 to PHP 7.4.

For jobrunners, I would say that if jobs are working in beta where they're supposedly running on 7.4, we can just do the conversion relatively quickly.

I reverted my changes in beta after a week of PHP 7.4 testing, in the interests of keeping beta in sync with production. I can see from the X-Powered-By headers in tcpdump that jobs are still running PHP 7.2. I would suggest switching over all of beta to 100% PHP 7.4 shortly before the production rollout. Watch out for the PHP-FPM port conflict issue I documented at T295578#7987696.

Aug 19 2022, 7:57 AM · Dumps-Generation, Patch-For-Review, Performance-Team (Radar), serviceops
Joe closed T311386: Install php 7.4 in production, a subtask of T271736: Migrate WMF production from PHP 7.2 to PHP 7.4, as Resolved.
Aug 19 2022, 7:56 AM · Dumps-Generation, Patch-For-Review, Performance-Team (Radar), serviceops
Joe closed T311386: Install php 7.4 in production as Resolved.
Aug 19 2022, 7:56 AM · Patch-For-Review, Performance-Team (Radar), serviceops

Aug 18 2022

Joe triaged T315538: Move Clement Goubert to ops as Medium priority.
Aug 18 2022, 9:53 AM · serviceops, SRE, SRE-Access-Requests
Joe updated the task description for T315538: Move Clement Goubert to ops.
Aug 18 2022, 9:52 AM · serviceops, SRE, SRE-Access-Requests
Joe created T315538: Move Clement Goubert to ops.
Aug 18 2022, 9:52 AM · serviceops, SRE, SRE-Access-Requests

Aug 17 2022

TheresNoTime awarded T306042: Serve beta cluster via PHP 7.4 by default a The World Burns token.
Aug 17 2022, 1:45 PM · Beta-Cluster-Infrastructure, serviceops
Joe changed the status of T306042: Serve beta cluster via PHP 7.4 by default, a subtask of T271736: Migrate WMF production from PHP 7.2 to PHP 7.4, from Open to In Progress.
Aug 17 2022, 1:30 PM · Dumps-Generation, Patch-For-Review, Performance-Team (Radar), serviceops
Joe changed the status of T306042: Serve beta cluster via PHP 7.4 by default from Open to In Progress.
Aug 17 2022, 1:30 PM · Beta-Cluster-Infrastructure, serviceops
Joe added a comment to T271736: Migrate WMF production from PHP 7.2 to PHP 7.4.

My current plan is to proceed as follows:

Aug 17 2022, 6:49 AM · Dumps-Generation, Patch-For-Review, Performance-Team (Radar), serviceops

Aug 16 2022

lmata awarded T314840: Productionize vopsbot a Like token.
Aug 16 2022, 2:52 PM · Observability-Alerting, SRE-OnFire, SRE
herron awarded T314840: Productionize vopsbot a Love token.
Aug 16 2022, 1:39 PM · Observability-Alerting, SRE-OnFire, SRE
Joe changed the status of T271736: Migrate WMF production from PHP 7.2 to PHP 7.4, a subtask of T261872: Drop PHP 7.2 & 7.3 support from MediaWiki master branch, once Wikimedia production is on 7.4, from Open to In Progress.
Aug 16 2022, 1:23 PM · MW-1.40-notes (1.40.0-wmf.4; 2022-10-03), MediaWiki-Releasing, PHP 7.3 support, serviceops, PHP 7.2 support
Joe changed the status of T271736: Migrate WMF production from PHP 7.2 to PHP 7.4, a subtask of T293568: PHP Notice: Undefined offset in wikimedia/remex-html when rendering rest.php error page, from Open to In Progress.
Aug 16 2022, 1:23 PM · Parsoid (Tracking), Upstream, Excimer, Wikimedia-production-error
Joe changed the status of T271736: Migrate WMF production from PHP 7.2 to PHP 7.4, a subtask of T297667: mysqli/mysqlnd memory leak, from Open to In Progress.
Aug 16 2022, 1:23 PM · serviceops-radar, WMF-General-or-Unknown
Joe changed the status of T271736: Migrate WMF production from PHP 7.2 to PHP 7.4 from Open to In Progress.
Aug 16 2022, 1:23 PM · Dumps-Generation, Patch-For-Review, Performance-Team (Radar), serviceops
Joe closed T314840: Productionize vopsbot as Resolved.
Aug 16 2022, 11:11 AM · Observability-Alerting, SRE-OnFire, SRE
Joe committed rLPRIf0e6d5f678ce: Add stub data for profile::vopsbot (authored by Joe).
Add stub data for profile::vopsbot
Aug 16 2022, 9:19 AM
Joe updated the task description for T314840: Productionize vopsbot.
Aug 16 2022, 8:31 AM · Observability-Alerting, SRE-OnFire, SRE
Joe committed rWISC1b6c7bef4b03: wikimedia-sre: allow SRE's new bot to change topic (authored by Joe).
wikimedia-sre: allow SRE's new bot to change topic
Aug 16 2022, 8:10 AM
Joe updated the task description for T314840: Productionize vopsbot.
Aug 16 2022, 7:13 AM · Observability-Alerting, SRE-OnFire, SRE

Aug 12 2022

Joe updated the task description for T314840: Productionize vopsbot.
Aug 12 2022, 9:05 AM · Observability-Alerting, SRE-OnFire, SRE
Joe added a comment to T314840: Productionize vopsbot.

Thank you for vopsbot, looks really good and useful!

A perhaps silly/minor thing: I think we should be using - instead of _ as a delimiter for commands, as that requires only one keystroke on US keyboards instead of two

Aug 12 2022, 6:28 AM · Observability-Alerting, SRE-OnFire, SRE
Joe changed the status of T238751: Only generate maxlag from pooled query service servers. from Open to Stalled.

Hi, any news on this front? I'll release this bug as its completion doesn't depend on me right now. When the functionality has been merged, please reassign to me.

Aug 12 2022, 5:14 AM · Wikidata-Query-Service, wmde-wikidata-tech, User-ItamarWMDE, SRE-OnFire, wdwb-tech, Sustainability (Incident Followup), Patch-For-Review, User-Addshore, Wikidata
Joe changed the status of T238751: Only generate maxlag from pooled query service servers., a subtask of T270614: Automatically depool wdqs servers that are "lagged", from Open to Stalled.
Aug 12 2022, 5:13 AM · Wikidata, Wikidata-Query-Service

Aug 11 2022

Joe updated the task description for T314840: Productionize vopsbot.
Aug 11 2022, 3:38 PM · Observability-Alerting, SRE-OnFire, SRE
Joe closed T314842: User management in vopsbot, a subtask of T314840: Productionize vopsbot, as Resolved.
Aug 11 2022, 10:16 AM · Observability-Alerting, SRE-OnFire, SRE
Joe closed T314842: User management in vopsbot as Resolved.

I'm resolving the task because I think the current changes are enough for the current goal. I'll come back to look at what @RhinosF1 suggested (thanks for that, by the way!) when I have some time for looking at maybe adding support for the account tag in the base bot libraries upstream.

Aug 11 2022, 10:15 AM · Observability-Alerting, SRE-OnFire, SRE
Joe added a comment to T314842: User management in vopsbot.

Basically, you're asking to base the bot's reactions on a state that's completely managed by an external source (nickserv/chanserv) and that we don't get with every IRC message.

This is not true and hasn't been for a while. Libera supports the account-tag CAP which means that any client that supports it can request that every message includes the services account if the user is signed in.

Most frameworks that have been updated recently should support this now.

See https://ircv3.net/specs/extensions/account-tag#:~:text=The%20account%2Dtag%20capability%20causes,the%20sender's%20current%20services%20username
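
A minimal sketch of what relying on account-tag could look like on the bot side, assuming the client has negotiated the capability so that raw lines arrive with an `account=` message tag. The function names and the allowlist contents are hypothetical, tag-value unescaping is skipped, and lowercasing is a simplification of IRC casemapping.

```python
def parse_account(raw_line: str):
    """Return the services account from an IRCv3 tagged line, or None."""
    if not raw_line.startswith("@"):
        return None  # no message tags on this line
    tag_part, _, _rest = raw_line[1:].partition(" ")
    for tag in tag_part.split(";"):
        key, _, value = tag.partition("=")
        if key == "account" and value:
            return value
    return None


# Hypothetical allowlist keyed by services account, not by nickname.
ALLOWED_ACCOUNTS = {"volans"}


def is_authorized(raw_line: str, allowed=ALLOWED_ACCOUNTS) -> bool:
    """Accept a command only if the sender is logged in to an allowed account."""
    account = parse_account(raw_line)
    return account is not None and account.lower() in allowed
```

Because the check keys on the account, a line sent from a nick such as `Volans|off` is still matched by the single `volans` entry, while an unauthenticated user who merely holds the nick is rejected.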

Aug 11 2022, 10:10 AM · Observability-Alerting, SRE-OnFire, SRE
Joe closed T314843: vopsbot: UX improvements, a subtask of T314840: Productionize vopsbot, as Resolved.
Aug 11 2022, 9:56 AM · Observability-Alerting, SRE-OnFire, SRE
Joe closed T314843: vopsbot: UX improvements as Resolved.
Aug 11 2022, 9:56 AM · Observability-Alerting, SRE-OnFire, SRE

Aug 10 2022

Joe updated the task description for T314843: vopsbot: UX improvements.
Aug 10 2022, 12:24 PM · Observability-Alerting, SRE-OnFire, SRE
Joe updated the task description for T314843: vopsbot: UX improvements.
Aug 10 2022, 9:31 AM · Observability-Alerting, SRE-OnFire, SRE
Joe updated the task description for T314842: User management in vopsbot.
Aug 10 2022, 9:30 AM · Observability-Alerting, SRE-OnFire, SRE
Joe updated the task description for T314919: Access request for Clément Goubert.
Aug 10 2022, 9:09 AM · Security-Team
Joe added a member for WMF-NDA: Clement_Goubert.
Aug 10 2022, 8:38 AM
Joe added a member for acl*sre-team: Clement_Goubert.
Aug 10 2022, 8:37 AM
Joe added a comment to T314842: User management in vopsbot.

For the IRC side, it's probably better to check cloak or account:

  • 1: a nickname on IRC normally has a short period of time (although this can range from 0 to infinity depending on account setup) where nicknames can be used before being switched to GuestXXXX nicks. This poses a small window where an SRE account might be impersonated and can end up accepting a command (yes, SREs are supposed to have it turned on but while I'm sure everyone is trusted, mistakes can happen at both the network level (during a netsplit) and by users - an account or cloak is likely to be much more persistent - I don't think this task is the best place to list every possible way you can hijack a nickname).
  • 2: some SREs like Volans use Volans|off - matching on account means only having one entry for every user rather than individually listing every nickname an SRE might use.

Aug 10 2022, 6:15 AM · Observability-Alerting, SRE-OnFire, SRE

Aug 9 2022

Joe updated the task description for T313902: Requesting access to production / the sreadmins group for Clément Goubert.
Aug 9 2022, 10:22 AM · SRE, LDAP-Access-Requests
Joe renamed T313902: Requesting access to production / the sreadmins group for Clément Goubert from Requesting access to wmf and ops for Clément Goubert to Requesting access to production / the sreadmins group for Clément Goubert.
Aug 9 2022, 9:44 AM · SRE, LDAP-Access-Requests
Joe claimed T313902: Requesting access to production / the sreadmins group for Clément Goubert.

Hi @BCornwall I'm going to take care of this task together with @Clement_Goubert - we already have @akosiaris' approval from before the delays we had; but I'll ask Lucasz to formally approve here anyways.

Aug 9 2022, 9:43 AM · SRE, LDAP-Access-Requests
Joe created T314843: vopsbot: UX improvements.
Aug 9 2022, 9:17 AM · Observability-Alerting, SRE-OnFire, SRE
Joe updated the task description for T314840: Productionize vopsbot.
Aug 9 2022, 9:12 AM · Observability-Alerting, SRE-OnFire, SRE
Joe created T314842: User management in vopsbot.
Aug 9 2022, 9:11 AM · Observability-Alerting, SRE-OnFire, SRE
Joe added a comment to T314840: Productionize vopsbot.

As of now, the debianization is done in the software, but I'm waiting to build and upload a package until I've solved some of the outstanding issues.

Aug 9 2022, 9:03 AM · Observability-Alerting, SRE-OnFire, SRE
Joe created T314840: Productionize vopsbot.
Aug 9 2022, 9:01 AM · Observability-Alerting, SRE-OnFire, SRE
Joe added a comment to T211661: Automatically clean up unused thumbnails in Swift.

As several people have pointed out in conversation with me, the fact that we're storing derivative artifacts that can be re-generated relatively cheaply in 3 replicas, in 2 data centers in Swift and in Varnish in all data centers is dubious and merits a serious re-think.

Aug 9 2022, 7:16 AM · Patch-For-Review, Traffic, SRE-swift-storage, SRE, Performance-Team

Aug 8 2022

Joe updated the task description for T313966: Q1:rack/setup/install new codfw memcached hosts.
Aug 8 2022, 6:45 AM · serviceops, SRE, ops-codfw, DC-Ops
Joe updated the task description for T313963: Q1:rack/setup/install new eqiad memcached hosts.
Aug 8 2022, 6:45 AM · ops-eqiad, SRE, serviceops, DC-Ops
Joe assigned T313963: Q1:rack/setup/install new eqiad memcached hosts to Jclark-ctr.
Aug 8 2022, 6:44 AM · ops-eqiad, SRE, serviceops, DC-Ops
Joe assigned T313966: Q1:rack/setup/install new codfw memcached hosts to Papaul.

@RobH the task should be complete with all the info, reassigning to Papaul

Aug 8 2022, 6:44 AM · serviceops, SRE, ops-codfw, DC-Ops
Joe placed T313966: Q1:rack/setup/install new codfw memcached hosts up for grabs.
Aug 8 2022, 6:43 AM · serviceops, SRE, ops-codfw, DC-Ops
Joe updated subscribers of T313963: Q1:rack/setup/install new eqiad memcached hosts.

@RobH all info should be filled in now.

Aug 8 2022, 6:42 AM · ops-eqiad, SRE, serviceops, DC-Ops
Joe placed T313963: Q1:rack/setup/install new eqiad memcached hosts up for grabs.
Aug 8 2022, 6:42 AM · ops-eqiad, SRE, serviceops, DC-Ops
Joe renamed T313965: eqiad (2) memcached host for wikifunctions service implementation tracking from eqiad (2) memcached host service implementation tracking to eqiad (2) memcached host for wikifunctions service implementation tracking.
Aug 8 2022, 6:36 AM · SRE, serviceops