jijiki (effie mouzeli)
is an animal

User Details

User Since
Aug 14 2018, 10:50 AM (115 w, 4 d)
Availability
Available
IRC Nick
effie
LDAP User
Effie Mouzeli
MediaWiki User
EMouzeli (WMF)

Recent Activity

Yesterday

jijiki added a comment to T252391: Reimage one memcached shard per DC to Buster.

We removed shard18 from redis.yaml so that we can avoid installing redis-server on this server pair (mc1036-mc2036).

  • mc1036.eqiad.wmnet is left with puppet disabled, because puppet is broken while the server is absent from redis.yaml
  • mc2036.codfw.wmnet has been reimaged to buster without redis-server installed 🎉
Fri, Oct 30, 9:20 PM · User-jijiki, Growth-Team (Current Sprint), User-Elukey, Patch-For-Review, Operations, serviceops
jijiki renamed T252391: Reimage one memcached shard per DC to Buster from Reimage one memcached shard to Buster to Reimage one memcached shard per DC to Buster.
Fri, Oct 30, 9:17 PM · User-jijiki, Growth-Team (Current Sprint), User-Elukey, Patch-For-Review, Operations, serviceops
jijiki updated subscribers of P13119 mc2036.
Fri, Oct 30, 9:01 PM
jijiki created P13119 mc2036.
Fri, Oct 30, 9:00 PM
jijiki added a subtask for T266865: Very long response time on frwiki main page: T266900: FeaturedFeeds loads all feed content just to output the feed URLs on the main page.
Fri, Oct 30, 6:46 PM · serviceops, Traffic, Operations, Performance-Team, Performance Issue
jijiki added a parent task for T266900: FeaturedFeeds loads all feed content just to output the feed URLs on the main page: T266865: Very long response time on frwiki main page.
Fri, Oct 30, 6:46 PM · Performance Issue, MediaWiki-extensions-FeaturedFeeds
jijiki raised the priority of T266865: Very long response time on frwiki main page from High to Unbreak Now!.
Fri, Oct 30, 6:36 PM · serviceops, Traffic, Operations, Performance-Team, Performance Issue

Thu, Oct 29

jijiki moved T266577: codfw: relocate sessionstore2002 and mc2029 from C4 to C3 from Inbox 🐅 to In Progress 🏋️‍♀️ on the User-jijiki board.
Thu, Oct 29, 2:44 PM · User-jijiki, serviceops, Operations, ops-codfw

Wed, Oct 28

jijiki added a comment to T213089: Upgrade memcached cluster to Debian Stretch/Buster.

@aaron thank you! I updated the task description

Wed, Oct 28, 8:04 PM · Wikidata, Platform Engineering, User-jijiki, serviceops, Performance-Team (Radar), Operations, User-Elukey
jijiki updated the task description for T213089: Upgrade memcached cluster to Debian Stretch/Buster.
Wed, Oct 28, 8:03 PM · Wikidata, Platform Engineering, User-jijiki, serviceops, Performance-Team (Radar), Operations, User-Elukey
jijiki added a comment to T213089: Upgrade memcached cluster to Debian Stretch/Buster.

@aaron If you have any insights regarding the Redis Lock Manager and file upload, it would be much appreciated (+ T265643)

Wed, Oct 28, 6:59 PM · Wikidata, Platform Engineering, User-jijiki, serviceops, Performance-Team (Radar), Operations, User-Elukey
jijiki updated the task description for T213089: Upgrade memcached cluster to Debian Stretch/Buster.
Wed, Oct 28, 5:44 PM · Wikidata, Platform Engineering, User-jijiki, serviceops, Performance-Team (Radar), Operations, User-Elukey
jijiki added a project to T213089: Upgrade memcached cluster to Debian Stretch/Buster: Wikidata.
Wed, Oct 28, 5:44 PM · Wikidata, Platform Engineering, User-jijiki, serviceops, Performance-Team (Radar), Operations, User-Elukey
jijiki updated the task description for T265643: Upgrade MediaWiki's Redis cluster to Debian Buster.
Wed, Oct 28, 5:28 PM · User-jijiki, Operations, serviceops, Platform Engineering
jijiki added a parent task for T264991: Upgrade the MediaWiki servers to ICU 63: T245757: Upgrade MediaWiki appservers to Debian Buster (debian 10).
Wed, Oct 28, 5:17 PM · Operations, serviceops
jijiki added a subtask for T245757: Upgrade MediaWiki appservers to Debian Buster (debian 10): T264991: Upgrade the MediaWiki servers to ICU 63.
Wed, Oct 28, 5:17 PM · Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, serviceops
jijiki changed the status of T245757: Upgrade MediaWiki appservers to Debian Buster (debian 10), a subtask of T247045: Migrate all of production metal and VMs to Buster or later, from Stalled to Open.
Wed, Oct 28, 3:39 PM · Operations, Epic
jijiki changed the status of T245757: Upgrade MediaWiki appservers to Debian Buster (debian 10), a subtask of T261872: Drop PHP 7.2 support from MediaWiki master branch, once Wikimedia production is on 7.3, from Stalled to Open.
Wed, Oct 28, 3:39 PM · Patch-For-Review, Release-Engineering-Team, serviceops, PHP 7.2 support
jijiki changed the status of T245757: Upgrade MediaWiki appservers to Debian Buster (debian 10), a subtask of T252432: Drop MediaWiki testing in stretch and instead test only in buster, from Stalled to Open.
Wed, Oct 28, 3:39 PM · Release-Engineering-Team-TODO, Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure
jijiki renamed T245757: Upgrade MediaWiki appservers to Debian Buster (debian 10) from upgrade MediaWiki appservers to Debian 10 (buster) to Upgrade MediaWiki appservers to Debian Buster (debian 10).
Wed, Oct 28, 3:39 PM · Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, serviceops
jijiki updated the task description for T264991: Upgrade the MediaWiki servers to ICU 63.
Wed, Oct 28, 2:39 PM · Operations, serviceops
jijiki renamed T264991: Upgrade the MediaWiki servers to ICU 63 from Upgrade the MediaWiki appservers to debian buster, icu63 to Upgrade the MediaWiki servers to ICU 63.
Wed, Oct 28, 12:47 PM · Operations, serviceops
jijiki updated subscribers of T263683: Mechanism to flag webrequests as "debug".

@Milimetric, after discussing with @ema, traffic feels that those requests should be visible in turnilo (e.g. webrequests_sampled_128), but we should be able to filter them out easily. Moreover, I gave it some more thought and I think we should extend the condition to require the "x-wikimedia-debug" header as well, along with the X-Analytics: debug=1 header. The way I have written it now, it can easily be used to run requests which would be invisible in turnilo. All our mwdebug servers are small VMs, so sending a lot of traffic towards them would take a lot of time :)
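
A minimal sketch of the stricter condition described above, assuming headers arrive as a plain name-to-value mapping and that X-Analytics carries semicolon-separated key=value pairs. The function name is_debug_request and the sample values are hypothetical; this is not the actual patch.

```python
from typing import Mapping


def is_debug_request(headers: Mapping[str, str]) -> bool:
    """Return True only when both debug markers are present."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalised = {name.lower(): value for name, value in headers.items()}

    # Marker 1: the x-wikimedia-debug header must be present at all.
    has_debug_header = "x-wikimedia-debug" in normalised

    # Marker 2: X-Analytics must contain debug=1 (pair format is an assumption).
    analytics = normalised.get("x-analytics", "")
    pairs = [item.strip() for item in analytics.split(";") if item.strip()]
    has_debug_flag = "debug=1" in pairs

    return has_debug_header and has_debug_flag


if __name__ == "__main__":
    sample = {"X-Wikimedia-Debug": "backend=example", "X-Analytics": "debug=1"}
    print(is_debug_request(sample))
```

Requiring both markers means that X-Analytics: debug=1 on its own does not flag a request as debug.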

Wed, Oct 28, 9:11 AM · Patch-For-Review, serviceops, Analytics-Kanban, Analytics, User-jijiki

Tue, Oct 27

jijiki added a project to T266577: codfw: relocate sessionstore2002 and mc2029 from C4 to C3 : User-jijiki.
Tue, Oct 27, 8:55 PM · User-jijiki, serviceops, Operations, ops-codfw

Mon, Oct 26

jijiki updated the task description for T213089: Upgrade memcached cluster to Debian Stretch/Buster.
Mon, Oct 26, 5:10 PM · Wikidata, Platform Engineering, User-jijiki, serviceops, Performance-Team (Radar), Operations, User-Elukey
jijiki lowered the priority of T265501: Make a way to build Scap .deb in Docker from High to Low.
Mon, Oct 26, 4:27 PM · serviceops, Operations, Scap, Release-Engineering-Team-TODO (2020-10-01 to 2020-12-31 (Q2))
jijiki added a member for Kubernetes: jijiki.
Mon, Oct 26, 4:24 PM
jijiki added a project to T264025: Create a structured testing environment for applications running on kubernetes : Kubernetes.
Mon, Oct 26, 4:24 PM · Kubernetes, User-jijiki, serviceops
jijiki updated the task description for T264025: Create a structured testing environment for applications running on kubernetes .
Mon, Oct 26, 1:35 PM · Kubernetes, User-jijiki, serviceops

Fri, Oct 23

jijiki added a comment to T243009: Add option in Scap to restart php-fpm for emergency deployments, and skip depooling/pooling servers.

I would also like for someone to investigate if systemctl reload php7.2-fpm clears opcache.

IIRC it does, but we saw the same kind of corruption happening when issuing a reload as when invalidating it explicitly (that is, much more commonly than it happens now with automatic revalidation).

Fri, Oct 23, 5:06 PM · Patch-For-Review, Release-Engineering-Team-TODO (2020-10-01 to 2020-12-31 (Q2)), Sustainability (Incident Followup), Release-Engineering-Team (Deployment services), Scap
jijiki added a comment to T243009: Add option in Scap to restart php-fpm for emergency deployments, and skip depooling/pooling servers.

I would also like for someone to investigate if systemctl reload php7.2-fpm clears opcache.

Fri, Oct 23, 4:22 PM · Patch-For-Review, Release-Engineering-Team-TODO (2020-10-01 to 2020-12-31 (Q2)), Sustainability (Incident Followup), Release-Engineering-Team (Deployment services), Scap
jijiki updated the task description for T262202: Create a separate 'mwdebug' cluster.
Fri, Oct 23, 3:57 PM · Patch-For-Review, Analytics-Radar, Release-Engineering-Team, observability, serviceops, User-jijiki
jijiki updated the task description for T262202: Create a separate 'mwdebug' cluster.
Fri, Oct 23, 3:56 PM · Patch-For-Review, Analytics-Radar, Release-Engineering-Team, observability, serviceops, User-jijiki
jijiki updated the task description for T252391: Reimage one memcached shard per DC to Buster.
Fri, Oct 23, 1:29 PM · User-jijiki, Growth-Team (Current Sprint), User-Elukey, Patch-For-Review, Operations, serviceops

Thu, Oct 22

jijiki renamed T243009: Add option in Scap to restart php-fpm for emergency deployments, and skip depooling/pooling servers from Add option in Scap to skip restarting php-fpm for emergency deployments to Add option in Scap to restart php-fpm for emergency deployments, and skip depooling/pooling servers.
Thu, Oct 22, 5:43 PM · Patch-For-Review, Release-Engineering-Team-TODO (2020-10-01 to 2020-12-31 (Q2)), Sustainability (Incident Followup), Release-Engineering-Team (Deployment services), Scap
jijiki added a comment to T243009: Add option in Scap to restart php-fpm for emergency deployments, and skip depooling/pooling servers.

Proposal:

Revise /usr/local/bin/safe-service-restart to accept --force to make it skip depool/repool.
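
A rough sketch of the proposed behaviour, not the real safe-service-restart script: the depool, restart and repool actions are passed in as hypothetical callables, and only the --force flag comes from the proposal above.

```python
import argparse
from typing import Callable, List, Sequence


def restart(
    argv: Sequence[str],
    depool: Callable[[], None],
    restart_service: Callable[[str], None],
    repool: Callable[[], None],
) -> List[str]:
    """Restart a service, skipping depool/repool when --force is given."""
    parser = argparse.ArgumentParser(description="hypothetical restart wrapper")
    parser.add_argument("service", help="name of the service to restart")
    parser.add_argument("--force", action="store_true",
                        help="skip depool/repool around the restart")
    args = parser.parse_args(list(argv))

    steps: List[str] = []
    if not args.force:
        depool()
        steps.append("depool")
    restart_service(args.service)
    steps.append("restart " + args.service)
    if not args.force:
        repool()
        steps.append("repool")
    return steps


if __name__ == "__main__":
    noop = lambda: None
    print(restart(["php7.2-fpm", "--force"], noop, lambda name: None, noop))
```

Returning the list of steps taken makes the difference between the normal and the forced path easy to check.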

Thu, Oct 22, 5:02 PM · Patch-For-Review, Release-Engineering-Team-TODO (2020-10-01 to 2020-12-31 (Q2)), Sustainability (Incident Followup), Release-Engineering-Team (Deployment services), Scap

Wed, Oct 21

jijiki added a comment to T253673: Avoid php-opcache corruption in WMF production.

Yesterday we had opcache corruptions on 2 servers, mw2252 and mw2328. I don't know about other times, but for those 2 specific corruptions I can say that they happened right after opcache restarted, because on these servers it had reached its max cached keys:

mw2328:
    "start_time": 1600177590, -> Tuesday, 15 September 2020 13:46:30
    "last_restart_time": 1603211850, -> Tuesday, 20 October 2020 16:37:30
    "oom_restarts": 0,
    "hash_restarts": 2,

mw2252:
    "start_time": 1600174055, -> Tuesday, 15 September 2020 12:47:35
    "last_restart_time": 1603217533, -> Tuesday, 20 October 2020 18:12:13
    "oom_restarts": 0,
    "hash_restarts": 2,
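
As a small sketch, the epoch values above can be converted to UTC and the restart counters checked like this. The dictionary layout and the helper names to_utc and only_hash_restarts are illustrative; the numbers are copied from the status output above.

```python
from datetime import datetime, timezone

# Values copied from the opcache status snippets above.
status = {
    "mw2328": {"start_time": 1600177590, "last_restart_time": 1603211850,
               "oom_restarts": 0, "hash_restarts": 2},
    "mw2252": {"start_time": 1600174055, "last_restart_time": 1603217533,
               "oom_restarts": 0, "hash_restarts": 2},
}


def to_utc(epoch: int) -> str:
    """Format a Unix timestamp as a UTC date and time."""
    moment = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


def only_hash_restarts(stats: dict) -> bool:
    """True when every recorded restart was a hash (key table) restart."""
    return stats["hash_restarts"] > 0 and stats["oom_restarts"] == 0


if __name__ == "__main__":
    for host, stats in status.items():
        print(host, to_utc(stats["last_restart_time"]), only_hash_restarts(stats))
```

The counters are cumulative, so this only shows that no out-of-memory restart has occurred since start_time; it does not identify the cause of one specific restart if both counters are non-zero.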

Wed, Oct 21, 4:43 PM · User-jijiki, Patch-For-Review, Sustainability (Incident Followup), Performance-Team, serviceops
jijiki added a comment to T265501: Make a way to build Scap .deb in Docker.

@LarsWirzenius after discussing it, we decided that for the time being we can't adopt this solution, given that generally there are no issues with the way we currently build debs. You can always use the package_builder instance on cloud VPS to check whether your package builds properly. Last time I updated the documentation, so I doubt we will have any issues building scap this time. Thank you!

Wed, Oct 21, 3:35 PM · serviceops, Operations, Scap, Release-Engineering-Team-TODO (2020-10-01 to 2020-12-31 (Q2))
jijiki added a comment to T265324: Create the base container images for running MediaWiki in a production environment.

Also I want to clarify: we can reduce the pain as much as possible, but for the duration of the transition phase, it will be somewhat more work than we're used to for this kind of change. There is no way around that that I can think of.

Hopefully we'll be able to reduce the amount of time needed as much as possible.

I don't think "you also need to do a deploy to k8s" is a bad compromise, but at this point I'd like to hear from others.

Wed, Oct 21, 2:44 PM · Patch-For-Review, Operations, serviceops, MW-on-K8s
jijiki added a comment to T265324: Create the base container images for running MediaWiki in a production environment.
Wed, Oct 21, 11:45 AM · Patch-For-Review, Operations, serviceops, MW-on-K8s
jijiki added a comment to T265324: Create the base container images for running MediaWiki in a production environment.

Regarding the apache httpd container, I am approaching layering as follows:

  • one base image, which uses the apache2-bin debian package and just modifies the vanilla configuration to listen on port 8080 (so that the container can run as user www-data).
  • one image configured to manage a php-fpm application. This image will be used as a base for both MediaWiki and the shellout service. It will include all modules and base configurations we need, and have a single virtualhost sending all "*.php" files to the fastcgi daemon
Wed, Oct 21, 11:18 AM · Patch-For-Review, Operations, serviceops, MW-on-K8s

Tue, Oct 20

jijiki edited projects for T213089: Upgrade memcached cluster to Debian Stretch/Buster, added: Platform Engineering; removed Patch-For-Review.
Tue, Oct 20, 8:45 PM · Wikidata, Platform Engineering, User-jijiki, serviceops, Performance-Team (Radar), Operations, User-Elukey
jijiki updated the task description for T213089: Upgrade memcached cluster to Debian Stretch/Buster.
Tue, Oct 20, 8:45 PM · Wikidata, Platform Engineering, User-jijiki, serviceops, Performance-Team (Radar), Operations, User-Elukey
jijiki changed the status of T213089: Upgrade memcached cluster to Debian Stretch/Buster, a subtask of T244852: Upgrade and improve our application object caching service (memcached), from Stalled to Open.
Tue, Oct 20, 8:45 PM · Patch-For-Review, Operations, serviceops
jijiki changed the status of T213089: Upgrade memcached cluster to Debian Stretch/Buster, a subtask of T224549: Track remaining jessie systems in production, from Stalled to Open.
Tue, Oct 20, 8:45 PM · Operations
jijiki changed the status of T213089: Upgrade memcached cluster to Debian Stretch/Buster from Stalled to Open.
Tue, Oct 20, 8:45 PM · Wikidata, Platform Engineering, User-jijiki, serviceops, Performance-Team (Radar), Operations, User-Elukey
jijiki changed the status of T213089: Upgrade memcached cluster to Debian Stretch/Buster, a subtask of T247045: Migrate all of production metal and VMs to Buster or later, from Stalled to Open.
Tue, Oct 20, 8:45 PM · Operations, Epic
jijiki moved T265643: Upgrade MediaWiki's Redis cluster to Debian Buster from Inbox 🐅 to In Progress 🏋️‍♀️ on the User-jijiki board.
Tue, Oct 20, 4:45 PM · User-jijiki, Operations, serviceops, Platform Engineering
jijiki added a project to T265643: Upgrade MediaWiki's Redis cluster to Debian Buster: User-jijiki.
Tue, Oct 20, 4:01 PM · User-jijiki, Operations, serviceops, Platform Engineering
jijiki added a subtask for T213089: Upgrade memcached cluster to Debian Stretch/Buster: T265643: Upgrade MediaWiki's Redis cluster to Debian Buster.
Tue, Oct 20, 4:00 PM · Wikidata, Platform Engineering, User-jijiki, serviceops, Performance-Team (Radar), Operations, User-Elukey
jijiki added a parent task for T265643: Upgrade MediaWiki's Redis cluster to Debian Buster: T213089: Upgrade memcached cluster to Debian Stretch/Buster.
Tue, Oct 20, 4:00 PM · User-jijiki, Operations, serviceops, Platform Engineering
jijiki moved T252391: Reimage one memcached shard per DC to Buster from Inbox 🐅 to In Progress 🏋️‍♀️ on the User-jijiki board.
Tue, Oct 20, 3:55 PM · User-jijiki, Growth-Team (Current Sprint), User-Elukey, Patch-For-Review, Operations, serviceops
jijiki moved T213089: Upgrade memcached cluster to Debian Stretch/Buster from Q2 2020 to In Progress 🏋️‍♀️ on the User-jijiki board.
Tue, Oct 20, 3:53 PM · Wikidata, Platform Engineering, User-jijiki, serviceops, Performance-Team (Radar), Operations, User-Elukey
jijiki added a project to T252391: Reimage one memcached shard per DC to Buster: User-jijiki.
Tue, Oct 20, 3:44 PM · User-jijiki, Growth-Team (Current Sprint), User-Elukey, Patch-For-Review, Operations, serviceops
jijiki updated the task description for T252391: Reimage one memcached shard per DC to Buster.
Tue, Oct 20, 3:44 PM · User-jijiki, Growth-Team (Current Sprint), User-Elukey, Patch-For-Review, Operations, serviceops
jijiki added a comment to T265258: High latency on push notification service initialization.

@Jgiannelos is there any help you would like from serviceops ?

Tue, Oct 20, 3:41 PM · Product-Infrastructure-Team-Backlog (Kanban), User-jijiki, serviceops, Push-Notification-Service
jijiki closed T236292: php-fpm invalid opcode on mw1317 as Resolved.

Resolving it since it has not been updated in so long :)

Tue, Oct 20, 2:27 PM · Operations, serviceops
jijiki moved T258779: Roll out proxy gutter pool from Inbox 🐅 to Next up 🥌 on the User-jijiki board.
Tue, Oct 20, 10:51 AM · User-jijiki, serviceops
jijiki moved T263683: Mechanism to flag webrequests as "debug" from Inbox 🐅 to Next up 🥌 on the User-jijiki board.
Tue, Oct 20, 10:50 AM · Patch-For-Review, serviceops, Analytics-Kanban, Analytics, User-jijiki
jijiki moved T260661: Create a cookbook to perform a rolling reboot of a kubernetes cluster from Inbox 🐅 to Next up 🥌 on the User-jijiki board.
Tue, Oct 20, 10:50 AM · User-jijiki, SRE-tools, serviceops, Operations
jijiki moved T263494: Evaluate use of Gerrit dashboard for code review from Inbox 🐅 to In Progress 🏋️‍♀️ on the User-jijiki board.
Tue, Oct 20, 10:50 AM · User-jijiki, serviceops, Performance-Team, Developer Productivity
jijiki moved T264604: MediaWiki to route specific keys to /*/mw-with-onhost-tier/ from Inbox 🐅 to Radar 📻 on the User-jijiki board.
Tue, Oct 20, 10:50 AM · Patch-For-Review, User-jijiki, Operations, serviceops, Performance-Team

Mon, Oct 19

jijiki updated the task description for T252391: Reimage one memcached shard per DC to Buster.
Mon, Oct 19, 8:29 PM · User-jijiki, Growth-Team (Current Sprint), User-Elukey, Patch-For-Review, Operations, serviceops
jijiki added a comment to T252391: Reimage one memcached shard per DC to Buster.

@kostajh Thank you! 💃🏼

Mon, Oct 19, 8:23 PM · User-jijiki, Growth-Team (Current Sprint), User-Elukey, Patch-For-Review, Operations, serviceops
jijiki added projects to T265501: Make a way to build Scap .deb in Docker: Operations, serviceops.
Mon, Oct 19, 8:19 PM · serviceops, Operations, Scap, Release-Engineering-Team-TODO (2020-10-01 to 2020-12-31 (Q2))

Fri, Oct 16

jijiki closed T264698: Degraded RAID on mw2279 as Resolved.
Fri, Oct 16, 1:42 PM · serviceops, Operations, ops-codfw
jijiki added a comment to T264698: Degraded RAID on mw2279.

@jijiki disk replaced

Fri, Oct 16, 12:34 PM · serviceops, Operations, ops-codfw

Thu, Oct 15

jijiki updated the task description for T252391: Reimage one memcached shard per DC to Buster.
Thu, Oct 15, 5:58 PM · User-jijiki, Growth-Team (Current Sprint), User-Elukey, Patch-For-Review, Operations, serviceops
jijiki triaged T265643: Upgrade MediaWiki's Redis cluster to Debian Buster as Medium priority.
Thu, Oct 15, 5:57 PM · User-jijiki, Operations, serviceops, Platform Engineering
jijiki renamed T213089: Upgrade memcached cluster to Debian Stretch/Buster from Upgrade memcached for Debian Stretch/Buster to Upgrade memcached cluster to Debian Stretch/Buster.
Thu, Oct 15, 5:50 PM · Wikidata, Platform Engineering, User-jijiki, serviceops, Performance-Team (Radar), Operations, User-Elukey
jijiki updated the task description for T244852: Upgrade and improve our application object caching service (memcached).
Thu, Oct 15, 5:48 PM · Patch-For-Review, Operations, serviceops
jijiki added a subtask for T244852: Upgrade and improve our application object caching service (memcached): T213089: Upgrade memcached cluster to Debian Stretch/Buster.
Thu, Oct 15, 5:47 PM · Patch-For-Review, Operations, serviceops
jijiki added a parent task for T213089: Upgrade memcached cluster to Debian Stretch/Buster: T244852: Upgrade and improve our application object caching service (memcached).
Thu, Oct 15, 5:47 PM · Wikidata, Platform Engineering, User-jijiki, serviceops, Performance-Team (Radar), Operations, User-Elukey
jijiki removed a subtask for T213089: Upgrade memcached cluster to Debian Stretch/Buster: T244852: Upgrade and improve our application object caching service (memcached).
Thu, Oct 15, 5:47 PM · Wikidata, Platform Engineering, User-jijiki, serviceops, Performance-Team (Radar), Operations, User-Elukey
jijiki removed a parent task for T244852: Upgrade and improve our application object caching service (memcached): T213089: Upgrade memcached cluster to Debian Stretch/Buster.
Thu, Oct 15, 5:47 PM · Patch-For-Review, Operations, serviceops
jijiki added a parent task for T244852: Upgrade and improve our application object caching service (memcached): T213089: Upgrade memcached cluster to Debian Stretch/Buster.
Thu, Oct 15, 5:44 PM · Patch-For-Review, Operations, serviceops
jijiki added a subtask for T213089: Upgrade memcached cluster to Debian Stretch/Buster: T244852: Upgrade and improve our application object caching service (memcached).
Thu, Oct 15, 5:44 PM · Wikidata, Platform Engineering, User-jijiki, serviceops, Performance-Team (Radar), Operations, User-Elukey
jijiki removed a subtask for T244852: Upgrade and improve our application object caching service (memcached): T213089: Upgrade memcached cluster to Debian Stretch/Buster.
Thu, Oct 15, 5:44 PM · Patch-For-Review, Operations, serviceops
jijiki removed a parent task for T213089: Upgrade memcached cluster to Debian Stretch/Buster: T244852: Upgrade and improve our application object caching service (memcached).
Thu, Oct 15, 5:44 PM · Wikidata, Platform Engineering, User-jijiki, serviceops, Performance-Team (Radar), Operations, User-Elukey
jijiki added a subtask for T213089: Upgrade memcached cluster to Debian Stretch/Buster: T252391: Reimage one memcached shard per DC to Buster.
Thu, Oct 15, 5:43 PM · Wikidata, Platform Engineering, User-jijiki, serviceops, Performance-Team (Radar), Operations, User-Elukey
jijiki edited parent tasks for T252391: Reimage one memcached shard per DC to Buster, added: T213089: Upgrade memcached cluster to Debian Stretch/Buster; removed: T244852: Upgrade and improve our application object caching service (memcached).
Thu, Oct 15, 5:43 PM · User-jijiki, Growth-Team (Current Sprint), User-Elukey, Patch-For-Review, Operations, serviceops
jijiki removed a subtask for T244852: Upgrade and improve our application object caching service (memcached): T252391: Reimage one memcached shard per DC to Buster.
Thu, Oct 15, 5:43 PM · Patch-For-Review, Operations, serviceops
jijiki changed the status of T252391: Reimage one memcached shard per DC to Buster from Stalled to Open.
Thu, Oct 15, 5:42 PM · User-jijiki, Growth-Team (Current Sprint), User-Elukey, Patch-For-Review, Operations, serviceops
jijiki changed the status of T252391: Reimage one memcached shard per DC to Buster, a subtask of T244852: Upgrade and improve our application object caching service (memcached), from Stalled to Open.
Thu, Oct 15, 5:41 PM · Patch-For-Review, Operations, serviceops

Wed, Oct 14

jijiki added a project to T263494: Evaluate use of Gerrit dashboard for code review: User-jijiki.
Wed, Oct 14, 9:57 PM · User-jijiki, serviceops, Performance-Team, Developer Productivity
jijiki added a project to T263683: Mechanism to flag webrequests as "debug": serviceops.
Wed, Oct 14, 6:37 PM · Patch-For-Review, serviceops, Analytics-Kanban, Analytics, User-jijiki
jijiki added a comment to T263683: Mechanism to flag webrequests as "debug".

@Milimetric Sorry for the late reply, thank you very much! I will move forward with the relevant patch. Do we need to coordinate after merging it?

Wed, Oct 14, 6:35 PM · Patch-For-Review, serviceops, Analytics-Kanban, Analytics, User-jijiki
jijiki added a comment to T225140: Icinga alerts that should open tasks instead of alerting.

I like what we do for degraded RAIDs; I think something like this will help us move forward.

Wed, Oct 14, 2:32 PM · observability

Tue, Oct 13

jijiki added a comment to T264698: Degraded RAID on mw2279.

Thank you!

Tue, Oct 13, 7:23 PM · serviceops, Operations, ops-codfw
jijiki added a project to T265258: High latency on push notification service initialization: User-jijiki.
Tue, Oct 13, 4:31 PM · Product-Infrastructure-Team-Backlog (Kanban), User-jijiki, serviceops, Push-Notification-Service

Tue, Oct 6

jijiki updated subscribers of T264698: Degraded RAID on mw2279.

According to netbox, this server is still under warranty.

Tue, Oct 6, 10:51 AM · serviceops, Operations, ops-codfw
jijiki added a comment to T264698: Degraded RAID on mw2279.
[Tue Oct  6 06:28:23 2020] ata2.00: failed command: READ FPDMA QUEUED
[Tue Oct  6 06:28:23 2020] ata2.00: cmd 60/80:00:00:a9:f7/00:00:03:00:00/40 tag 0 ncq dma 65536 in
                                    res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[Tue Oct  6 06:28:23 2020] ata2.00: status: { DRDY }
Tue, Oct 6, 10:39 AM · serviceops, Operations, ops-codfw

Mon, Oct 5

jijiki added a subtask for T244340: Reduce read pressure on mc* servers by adding a machine-local Memcached instance (on-host memcached): T264604: MediaWiki to route specific keys to /*/mw-with-onhost-tier/.
Mon, Oct 5, 12:12 PM · User-jijiki, Sustainability (Incident Followup), Performance-Team, Patch-For-Review, Operations, serviceops
jijiki added a parent task for T264604: MediaWiki to route specific keys to /*/mw-with-onhost-tier/: T244340: Reduce read pressure on mc* servers by adding a machine-local Memcached instance (on-host memcached).
Mon, Oct 5, 12:12 PM · Patch-For-Review, User-jijiki, Operations, serviceops, Performance-Team
jijiki updated the task description for T264604: MediaWiki to route specific keys to /*/mw-with-onhost-tier/.
Mon, Oct 5, 12:12 PM · Patch-For-Review, User-jijiki, Operations, serviceops, Performance-Team
jijiki created T264604: MediaWiki to route specific keys to /*/mw-with-onhost-tier/.
Mon, Oct 5, 12:11 PM · Patch-For-Review, User-jijiki, Operations, serviceops, Performance-Team
jijiki added a project to T260661: Create a cookbook to perform a rolling reboot of a kubernetes cluster: User-jijiki.
Mon, Oct 5, 10:58 AM · User-jijiki, SRE-tools, serviceops, Operations

Fri, Oct 2

jijiki added a comment to T244340: Reduce read pressure on mc* servers by adding a machine-local Memcached instance (on-host memcached).

If the local-memcached's blind TTL is around the time we tolerate purges to be delayed for, and if it explicitly excludes mw-wan broadcasts and other WANCache internal keys, then this is probably fine to proceed with as an experiment without further input.

  • WANCache:v:* - Fine, these are the bulk of values stored in Memcached.
  • WANCache:[^v]: - Not fine, but should be minor, these are misc unusual/short-lived keys.
  • /*/mw-wan/ - Not fine, but should be minor, these are WANCache tombstones.
  • Everything else - Unsure, but should be minor, these are dc-local direct uses of BagOStuff without WANCache. The only example that comes to mind is MW's rate limits. Sounds unsafe to split-brain on local app servers?

@aaron Can you confirm the above, and maybe fill in more about the non-WANCache?

@Joe Purge delay tolerance is indeed in the ballpark of 10s; if it is exceeded, MW automatically shortens lots of things, e.g. it refuses to store results in higher-level caches or uses drastically shorter TTLs for them, shortens CDN maxage, and eventually throttles edits, goes read-only, etc.
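
To make the key classes from the list above concrete, here is a minimal sketch that sorts a key into "fine", "not fine" or "unsure" for the on-host tier. The function name onhost_safety and the sample keys are hypothetical, and checking the /*/mw-wan/ routing prefix first is an assumption about how such keys would look.

```python
import re


def onhost_safety(key: str) -> str:
    """Classify a memcached key following the list in the comment above."""
    # Broadcast-routed WANCache tombstones: not fine.
    if key.startswith("/*/mw-wan/"):
        return "not fine"
    # Plain WANCache values, the bulk of what is stored: fine.
    if key.startswith("WANCache:v:"):
        return "fine"
    # Other WANCache internal keys (any type letter except v): not fine.
    if re.match(r"WANCache:[^v]:", key):
        return "not fine"
    # Direct BagOStuff use without WANCache, e.g. rate limits: unsure.
    return "unsure"


if __name__ == "__main__":
    for sample in ("WANCache:v:example", "WANCache:t:example",
                   "/*/mw-wan/example", "example:limiter"):
        print(sample, "->", onhost_safety(sample))
```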

Fri, Oct 2, 1:56 PM · User-jijiki, Sustainability (Incident Followup), Performance-Team, Patch-For-Review, Operations, serviceops
jijiki updated subscribers of T263958: Test onhost memcached performance and functionality.

What happens when onhost memcached is unavailable? https://phabricator.wikimedia.org/T244340#6211682 @elukey @aaron

Fri, Oct 2, 1:14 PM · Patch-For-Review, User-jijiki, Operations, serviceops
jijiki added a comment to T244340: Reduce read pressure on mc* servers by adding a machine-local Memcached instance (on-host memcached).

What happens when onhost memcached is unavailable? https://phabricator.wikimedia.org/T244340#6211682 @elukey @aaron

Fri, Oct 2, 1:13 PM · User-jijiki, Sustainability (Incident Followup), Performance-Team, Patch-For-Review, Operations, serviceops