jcrespo (Jaime Crespo)
Sr Database Administrator

User Details

User Since
May 11 2015, 8:31 AM (285 w, 5 d)
Availability
Available
IRC Nick
jynus
LDAP User
Jcrespo
MediaWiki User
JCrespo (WMF)

Recent Activity

Yesterday

jcrespo added a comment to T264189: Prepare a proof of concept of the minimum setup capable of backup and recover testwiki media files.

Further exploration of the existing metadata has been done at: https://gerrit.wikimedia.org/r/637769

Fri, Oct 30, 7:42 PM · Patch-For-Review, Data-Persistence-Backup, Goal, Operations, SRE-swift-storage
jcrespo added a comment to T266483: Enable report_host for mariadb.

Yes, no problem with that, I was just giving context in case you didn't remember about that, after they got moved around.

Fri, Oct 30, 12:33 PM · Patch-For-Review, Orchestrator, DBA, User-Kormat
jcrespo added a comment to T266483: Enable report_host for mariadb.

Small hint- note there are 2 testing hosts on eqiad, db1077 and the "backup testing" host, db1133.

Fri, Oct 30, 12:23 PM · Patch-For-Review, Orchestrator, DBA, User-Kormat
jcrespo added a comment to T266775: Stalls on db1075 (s3) replica db.

db performance

Fri, Oct 30, 9:43 AM · Datacenter-Switchover, User-Urbanecm, DynamicPageList (Wikimedia), MediaWiki-General, DBA

Thu, Oct 29

jcrespo added a project to T266775: Stalls on db1075 (s3) replica db: Datacenter-Switchover.

Not related, but first incident was spotted during datacenter-switchover.

Thu, Oct 29, 3:56 PM · Datacenter-Switchover, User-Urbanecm, DynamicPageList (Wikimedia), MediaWiki-General, DBA
jcrespo added a comment to T263220: Limit concurrency of DPL queries.

+1

Thu, Oct 29, 3:54 PM · MW-1.36-notes (1.36.0-wmf.14; 2020-10-20), Performance Issue, Patch-For-Review, Platform Engineering, DynamicPageList (Wikimedia)
jcrespo lowered the priority of T263220: Limit concurrency of DPL queries from Unbreak Now! to Medium.

This is no longer UBN, please feel free to consider it resolved/declined based on my comments on the subtask.

Thu, Oct 29, 3:49 PM · MW-1.36-notes (1.36.0-wmf.14; 2020-10-20), Performance Issue, Patch-For-Review, Platform Engineering, DynamicPageList (Wikimedia)
jcrespo reassigned T266775: Stalls on db1075 (s3) replica db from jcrespo to Urbanecm.
Thu, Oct 29, 3:47 PM · Datacenter-Switchover, User-Urbanecm, DynamicPageList (Wikimedia), MediaWiki-General, DBA
jcrespo closed T266775: Stalls on db1075 (s3) replica db as Resolved.

I am going to consider this resolved, unless we were completely wrong and this wasn't the cause of the database stalls/connectivity issues.

Thu, Oct 29, 3:47 PM · Datacenter-Switchover, User-Urbanecm, DynamicPageList (Wikimedia), MediaWiki-General, DBA
jcrespo awarded T266775: Stalls on db1075 (s3) replica db a Love token.
Thu, Oct 29, 3:47 PM · Datacenter-Switchover, User-Urbanecm, DynamicPageList (Wikimedia), MediaWiki-General, DBA
jcrespo closed T266775: Stalls on db1075 (s3) replica db, a subtask of T263220: Limit concurrency of DPL queries, as Resolved.
Thu, Oct 29, 3:47 PM · MW-1.36-notes (1.36.0-wmf.14; 2020-10-20), Performance Issue, Patch-For-Review, Platform Engineering, DynamicPageList (Wikimedia)
jcrespo added a comment to T266775: Stalls on db1075 (s3) replica db.

I am mostly certain that this was the issue causing db1075 stalls, as processlist has decreased a lot (a slow query can be millions of times more impactful than a regular query).

Thu, Oct 29, 3:44 PM · Datacenter-Switchover, User-Urbanecm, DynamicPageList (Wikimedia), MediaWiki-General, DBA
jcrespo added a comment to T266775: Stalls on db1075 (s3) replica db.

Looking great so far-

Thu, Oct 29, 3:41 PM · Datacenter-Switchover, User-Urbanecm, DynamicPageList (Wikimedia), MediaWiki-General, DBA
jcrespo added a comment to T266723: When switching DCs, update pc hosts in tendril.

If I can provide more background: under normal circumstances, pc* hosts are active-active, and no change should happen on them (no read only changes, etc.). This was solved on zarcillo by setting masters per datacenter so no change has to happen. Because zarcillo never substituted tendril, the issue is not as much with the switchover scripts as with the tendril model, which can only set up one master per global replica set, and not one per datacenter. For convenience, on tendril the "masters" are considered the ones on the active dc, but that is not really accurate to reality.

Thu, Oct 29, 1:10 PM · Datacenter-Switchover, DBA, Operations
jcrespo raised the priority of T263220: Limit concurrency of DPL queries from High to Unbreak Now!.

Because this is actively causing outages for 800+ wikis on s3.

Thu, Oct 29, 12:01 PM · MW-1.36-notes (1.36.0-wmf.14; 2020-10-20), Performance Issue, Patch-For-Review, Platform Engineering, DynamicPageList (Wikimedia)
jcrespo added a project to T266775: Stalls on db1075 (s3) replica db: MediaWiki-General.

Adding MediaWiki-General because we need to identify which team may know about DynamicPageListHooks::renderDynamicPageList

Thu, Oct 29, 11:47 AM · Datacenter-Switchover, User-Urbanecm, DynamicPageList (Wikimedia), MediaWiki-General, DBA
jcrespo edited projects for T266775: Stalls on db1075 (s3) replica db, added: DBA; removed Security-Team.
Thu, Oct 29, 11:45 AM · Datacenter-Switchover, User-Urbanecm, DynamicPageList (Wikimedia), MediaWiki-General, DBA
jcrespo created T266775: Stalls on db1075 (s3) replica db.
Thu, Oct 29, 11:44 AM · Datacenter-Switchover, User-Urbanecm, DynamicPageList (Wikimedia), MediaWiki-General, DBA

Wed, Oct 28

jcrespo added a comment to T266636: Orchestrator db logical backups.

I have added it to the logical backup process by adding the right grants to the existing dump user/process to the new database, but let's revisit once people working on the setup are happy with the deployment.

Wed, Oct 28, 8:50 AM · Orchestrator, Data-Persistence-Backup, DBA
jcrespo added a comment to T264189: Prepare a proof of concept of the minimum setup capable of backup and recover testwiki media files.

For archival purposes, this is the (naive) code solution for downloading all images of a wiki using the mwclient library (https://gerrit.wikimedia.org/r/636007):

Wed, Oct 28, 7:36 AM · Patch-For-Review, Data-Persistence-Backup, Goal, Operations, SRE-swift-storage

Tue, Oct 27

jcrespo closed T265323: Add toil::systemd_scope_cleanup to dbprov hosts, a subtask of T199911: Systemd session creation fails under I/O load, as Resolved.
Tue, Oct 27, 1:23 PM · Operations, SRE-tools
jcrespo closed T265323: Add toil::systemd_scope_cleanup to dbprov hosts as Resolved.
<icinga-wm> RECOVERY - Check systemd state on dbprov1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
Tue, Oct 27, 1:23 PM · Data-Persistence-Backup, Operations, SRE-tools
jcrespo added a comment to T266432: Increase on database writes and deletes activity on Commonswiki leads to some replication lag.

Independently of the source of the issue, could these regenerations be throttled/rate limited (assuming it is a background process)? It is clear that while they don't affect (a lot) the same dc databases, the number of writes does not scale over multiple datacenters?

Tue, Oct 27, 11:27 AM · Platform Engineering, Wikimedia-production-error, DBA, Commons, Release-Engineering-Team, Operations
jcrespo added a comment to T265323: Add toil::systemd_scope_cleanup to dbprov hosts.

So my guess is this is only happening on buster.

Tue, Oct 27, 9:14 AM · Data-Persistence-Backup, Operations, SRE-tools

Mon, Oct 26

jcrespo added a comment to T266483: Enable report_host for mariadb.

I didn't find a ticket, so maybe it was only an informal conversation with no actionables. This was something we wanted to do, because when implementing "primaryhost.slaves()" on WMFMariaDB code we didn't have a report of the host.

Mon, Oct 26, 3:57 PM · Patch-For-Review, Orchestrator, DBA, User-Kormat
jcrespo added a comment to T266483: Enable report_host for mariadb.

I believe there was a ticket where we referred to this, let me try to search for it as I think I ran into this issue for the WMFReplication class.

Mon, Oct 26, 3:43 PM · Patch-For-Review, Orchestrator, DBA, User-Kormat
jcrespo added a comment to T266432: Increase on database writes and deletes activity on Commonswiki leads to some replication lag.

I am getting strange, inconsistent results every time I check, now I've seen the increase happening starting at ~9:04h (with no deployments around that time):

Mon, Oct 26, 2:18 PM · Platform Engineering, Wikimedia-production-error, DBA, Commons, Release-Engineering-Team, Operations
jcrespo added a project to T266432: Increase on database writes and deletes activity on Commonswiki leads to some replication lag: Wikimedia-production-error.

Adding Wikimedia-production-error as it seems to coincide with a non-train deploy at 16:45 on the 22. I am unable to find it on SAL, however?

Mon, Oct 26, 10:49 AM · Platform Engineering, Wikimedia-production-error, DBA, Commons, Release-Engineering-Team, Operations

Sun, Oct 25

jcrespo added a comment to T266334: Display number of total files uploaded on Special:MediaStatistics.

I'm not sure (really) what exactly you want.

Sun, Oct 25, 2:57 PM · Structured-Data-Backlog, Design, Structured Data Engineering, MediaWiki-Special-pages, Commons

Fri, Oct 23

jcrespo created P13058 wikireplica alias.
Fri, Oct 23, 1:05 PM
jcrespo added a comment to T266334: Display number of total files uploaded on Special:MediaStatistics.

This is not a serious blocker, but maybe it could be an "easy" task for a newcomer, assuming people are ok with it?

Fri, Oct 23, 12:52 PM · Structured-Data-Backlog, Design, Structured Data Engineering, MediaWiki-Special-pages, Commons
jcrespo created T266334: Display number of total files uploaded on Special:MediaStatistics.
Fri, Oct 23, 12:39 PM · Structured-Data-Backlog, Design, Structured Data Engineering, MediaWiki-Special-pages, Commons
jcrespo added a comment to T265866: Run check table periodically on backup source hosts.

First run has been done on all hosts, all clean now as far as mysqlcheck / CHECK TABLES is concerned (only commonswiki on db2099 had a bad index, now fixed and rechecked).

Fri, Oct 23, 8:24 AM · Data-Persistence-Backup
jcrespo updated the task description for T265866: Run check table periodically on backup source hosts.
Fri, Oct 23, 8:22 AM · Data-Persistence-Backup

Thu, Oct 22

jcrespo added a comment to T265866: Run check table periodically on backup source hosts.

After running mysqlcheck in a closely supervised way on almost all hosts, I can say this is not as easy as "just setting up a cron and running it every week". The CHECK TABLES command on all tables can take up to 24 hours per host, and it is very impactful. We don't have the proper monitoring tuning configuration to handle this, plus it makes backups fail frequently if both run concurrently (at least 3 snapshots failed because of ongoing checks).

Thu, Oct 22, 10:34 AM · Data-Persistence-Backup
jcrespo updated the task description for T265866: Run check table periodically on backup source hosts.
Thu, Oct 22, 8:35 AM · Data-Persistence-Backup
jcrespo added a comment to T265866: Run check table periodically on backup source hosts.

I am dropping and then recreating the index on different transactions with the hope that that will be a bit faster than recreating the full table- I will do a check tables at the end to check that fixes it.

Thu, Oct 22, 5:57 AM · Data-Persistence-Backup
jcrespo updated the task description for T265866: Run check table periodically on backup source hosts.
Thu, Oct 22, 5:50 AM · Data-Persistence-Backup

Wed, Oct 21

jcrespo added a comment to T265866: Run check table periodically on backup source hosts.

We finally have a positive:

Wed, Oct 21, 8:00 PM · Data-Persistence-Backup
jcrespo closed T165756: Create summary templates on Wikitech wiki to stop writing the same things everywhere, everytime as Resolved.

I am going to consider it as resolved, as this was created in the past for a very specific event, but it is no longer very concrete. That doesn't mean documentation shouldn't be improved; it is that I see no value in tracking anything concrete on a task. Reopen if you disagree. Documentation should be improved a lot, but tasks should have a reason to be open, and there is no longer much activity/clear actionables here.

Wed, Oct 21, 7:35 PM · Data-Persistence-Admin, MediaWiki-backport-deployments, Wikimedia-Site-requests, Documentation, Wikimedia-Hackathon-2017
jcrespo closed T165756: Create summary templates on Wikitech wiki to stop writing the same things everywhere, everytime, a subtask of T165726: [Hackathon doc sprint] Improve deployment documentation, as Resolved.
Wed, Oct 21, 7:34 PM · Wikimedia-Site-requests, Documentation, Wikidata, Wikimedia-Hackathon-2017
jcrespo updated the task description for T265866: Run check table periodically on backup source hosts.
Wed, Oct 21, 6:35 PM · Data-Persistence-Backup
jcrespo updated the task description for T265866: Run check table periodically on backup source hosts.
Wed, Oct 21, 5:45 PM · Data-Persistence-Backup
jcrespo added a comment to T261405: db1139 memory errors on boot 2020-08-27.

the host will need reimage

Wed, Oct 21, 4:24 PM · Operations, DBA, ops-eqiad
jcrespo updated the task description for T265866: Run check table periodically on backup source hosts.
Wed, Oct 21, 1:14 PM · Data-Persistence-Backup
jcrespo added a comment to T265866: Run check table periodically on backup source hosts.

We should drop the profiling table from source backup hosts before setting up the regular checking to prevent extra log spam.

Wed, Oct 21, 1:13 PM · Data-Persistence-Backup
jcrespo added a subtask for T265866: Run check table periodically on backup source hosts: T266125: Drop table profiling from WMF wiki mariadb servers.
Wed, Oct 21, 1:12 PM · Data-Persistence-Backup
jcrespo added a parent task for T266125: Drop table profiling from WMF wiki mariadb servers: T265866: Run check table periodically on backup source hosts.
Wed, Oct 21, 1:12 PM · Data-Persistence-Backup, DBA
jcrespo updated the task description for T54921: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking).
Wed, Oct 21, 1:10 PM · Epic, DBA, Tracking-Neverending
jcrespo triaged T266125: Drop table profiling from WMF wiki mariadb servers as Medium priority.

I will take care of dropping it first on the source backups so those don't contaminate other hosts; other hosts will have to wait until the dc switchback from codfw to eqiad.

Wed, Oct 21, 1:08 PM · Data-Persistence-Backup, DBA
jcrespo created T266125: Drop table profiling from WMF wiki mariadb servers.
Wed, Oct 21, 1:07 PM · Data-Persistence-Backup, DBA
jcrespo updated the task description for T54921: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking).
Wed, Oct 21, 1:02 PM · Epic, DBA, Tracking-Neverending
jcrespo added a comment to T265323: Add toil::systemd_scope_cleanup to dbprov hosts.

I did a manual systemctl reset-failed once.

Wed, Oct 21, 12:37 PM · Data-Persistence-Backup, Operations, SRE-tools
jcrespo updated the task description for T265866: Run check table periodically on backup source hosts.
Wed, Oct 21, 11:24 AM · Data-Persistence-Backup
jcrespo added a comment to T264703: Race condition when re-importing a logical backup and a new one is generated.

Aside from that, what I can do is add a check just before rotation to "latest" to see if there is something "reading" the dir and kill it before moving it to latest? Maybe restrict it to myloader pid/recovery script?

Wed, Oct 21, 9:43 AM · Data-Persistence-Backup
jcrespo added a comment to T234826: Repurpose db1108 as generic Analytics db replica.

Everything there looks fine! There may be procedures that I could help you simplify to be done more easily, we can talk on a different medium at a later time to avoid spamming other people here.

Wed, Oct 21, 9:39 AM · Analytics-Clusters, User-Elukey, Analytics-Kanban
jcrespo added a comment to T264703: Race condition when re-importing a logical backup and a new one is generated.

I've added:
https://wikitech.wikimedia.org/wiki/MariaDB/Backups#Pre-requisites_before_recovering_the_backup

Wed, Oct 21, 9:28 AM · Data-Persistence-Backup
jcrespo closed T265323: Add toil::systemd_scope_cleanup to dbprov hosts as Declined.

I am going to decline this, not because it is a bad suggestion, but because the fix is not really a fix, as much as a "way to avoid alerting" (aka reduce toil), I want to make sure this is toil- and happens more than once before deploying it. If it happens again, CC @Kormat reopen this and I will just deploy it on all dbprovs.

Wed, Oct 21, 9:17 AM · Data-Persistence-Backup, Operations, SRE-tools
jcrespo closed T265323: Add toil::systemd_scope_cleanup to dbprov hosts, a subtask of T199911: Systemd session creation fails under I/O load, as Declined.
Wed, Oct 21, 9:17 AM · Operations, SRE-tools
jcrespo added a comment to T265866: Run check table periodically on backup source hosts.

@Marostegui s2 on codfw gave no errors. This is what I expected- given we had no issues in the past with 10.1, I think it has to be the combination of corruption and the upgrade to 10.4. We can do a test of moving s2 to 10.4 on a test host, and then running the test? Or we can establish to do so after every upgrade (a full check tables and not only the one done for the upgrade, even if it takes longer).

Wed, Oct 21, 9:10 AM · Data-Persistence-Backup
jcrespo updated the task description for T265866: Run check table periodically on backup source hosts.
Wed, Oct 21, 9:07 AM · Data-Persistence-Backup
jcrespo closed T215028: Unexpected extracts greek API response, a subtask of T109238: Clean up broken namespace pages across Wikimedia sites, as Resolved.
Wed, Oct 21, 9:01 AM · Wikimedia-maintenance-script-run, Wikimedia-Site-requests
jcrespo closed T215028: Unexpected extracts greek API response as Resolved.

I am going to be bold and close this as fixed, based on original reporter response, pending tasks could be fixed at parent T109238.

Wed, Oct 21, 9:01 AM · Readers-Web-Backlog (Tracking), TextExtracts, MediaWiki-API
jcrespo added a comment to T261405: db1139 memory errors on boot 2020-08-27.

Sorry for the late response, it was very late on our TZ.

Wed, Oct 21, 8:49 AM · Operations, DBA, ops-eqiad

Tue, Oct 20

jcrespo added a comment to T263842: S5 replication issue, affecting watchlist and probably recentchanges.

As a last comment, I thought at first it was 1, but after some analysis, I believe there are more chances that it was 2, given the rows involved were very frequent, but as you say, it is not easy to prove it.

Tue, Oct 20, 5:26 PM · Sustainability (Incident Followup), Wikimedia-Incident, Operations, DBA
jcrespo lowered the priority of T79922: Set up backup strategy for es clusters from High to Medium.
Tue, Oct 20, 5:21 PM · Data-Persistence-Backup, Patch-For-Review, Goal, Operations
jcrespo added a comment to T79922: Set up backup strategy for es clusters.

ES are backed up, but currently only locally. We need to finish the cross-dc backup, hopefully in Q3.

Tue, Oct 20, 5:21 PM · Data-Persistence-Backup, Patch-For-Review, Goal, Operations
jcrespo added a comment to T200398: Document clearly the mariadb backup and recovery setup, specially how to recover a backup.

I think I will need guidance on what is not clear from someone less "into" backups. I understand what is there now is not great; the problem is I am too deep into the rabbit hole to figure out how to approach this (TODOs? recipes? Examples?)

Tue, Oct 20, 5:19 PM · Data-Persistence-Backup, User-Marostegui
jcrespo lowered the priority of T205628: Handle object metadata backups and compare it with stored database object inventory from Medium to Low.

Wishlist but not planned at the moment, we need first to work on object inventory- but we will want it eventually to check live data corruption/backup corruption.

Tue, Oct 20, 5:16 PM · Data-Persistence-Backup
jcrespo added a comment to T85278: Setup an Offsite backup infrastructure.

This is most likely delayed to Q3 or even if we setup an alternative backup method to bacula.

Tue, Oct 20, 5:15 PM · Data-Persistence-Backup, Operations
jcrespo closed T234900: Setup bacula backup monitoring as Resolved.

I am going to consider this resolved- there is monitoring, and we have a dashboard and tooling for it (command line and prometheus exporter) (https://grafana.wikimedia.org/d/413r2vbWk/bacula). Everything documented at: https://wikitech.wikimedia.org/wiki/Bacula#Monitoring

Tue, Oct 20, 4:54 PM · Data-Persistence-Backup, Patch-For-Review, Sustainability, observability, Goal, Operations
jcrespo closed T234900: Setup bacula backup monitoring, a subtask of T229209: Strengthen backup infrastructure and support, as Resolved.
Tue, Oct 20, 4:54 PM · Patch-For-Review, Goal, DBA, serviceops, Operations
jcrespo added a comment to T264703: Race condition when re-importing a logical backup and a new one is generated.

So the second part is kinda expected: "Running myloader..." will indicate that the process has started and it won't finish as long as the underlying myloader hasn't finished... which, unless I understood incorrectly, it hadn't finished (it was blocked)?

Tue, Oct 20, 4:45 PM · Data-Persistence-Backup
jcrespo added a comment to T165756: Create summary templates on Wikitech wiki to stop writing the same things everywhere, everytime.

@Dereckson @Quiddity Do you see any more concrete actionables to keep this open still? Some changes were done already. Is there something specific beyond "improve documentation"?

Tue, Oct 20, 4:34 PM · Data-Persistence-Admin, MediaWiki-backport-deployments, Wikimedia-Site-requests, Documentation, Wikimedia-Hackathon-2017
jcrespo added a comment to T234826: Repurpose db1108 as generic Analytics db replica.

I can confirm backups have been flowing weekly as expected:

Tue, Oct 20, 4:24 PM · Analytics-Clusters, User-Elukey, Analytics-Kanban
jcrespo closed T243884: Strange URL pattern after search https://en.wikipedia.org/w/index.php?sort=relevance&sort=relevance&sort=relevance&sort=relevance&sort=relevance&sort=relevance ... as Resolved.

Not sure who owns this to declare it resolved, but as the original reporter, I think it is, using the above link, there was no incident in the last 4 weeks. Thanks to everyone that helped here.

Tue, Oct 20, 4:17 PM · MW-1.36-notes (1.36.0-wmf.10; 2020-09-22), Traffic, Operations, Advanced-Search, Readers-Web-Backlog (Tracking), Discovery-Search, Wikimedia-production-error
jcrespo added a comment to T86530: Replace wb_terms table with more specialized mechanisms for terms (tracking).

To be fair, technically, this is resolved because wb_terms has been replaced, AFAIK, with a more specialized mechanism (several smaller and normalised tables) :-)

Tue, Oct 20, 4:12 PM · Wikidata-Turtles-Tech-Debt, Wikidata-Ministry-Of-Magic-Tech-Debt, Tracking-Neverending, § Wikidata-Sprint-2015-02-03, Performance Issue, Wikidata, MediaWiki-extensions-WikibaseRepository
jcrespo added a comment to T111929: Puppetize grants for mysql hosts that are the source of recovery (dbstore, passive misc).

In other words, this is a subtask of the bigger issue T146149, specific to the backup-related hosts.

Tue, Oct 20, 4:07 PM · Operations, DBA
jcrespo added a comment to T111929: Puppetize grants for mysql hosts that are the source of recovery (dbstore, passive misc).

I think Manuel and/or I requested to document what grants are needed to set up a backup host. The problem is there is no good way to do so, as grants are currently only maintained/supported to document on a text file for core mediawiki hosts, and there is no way to define/document non-core/non-misc grants.

Tue, Oct 20, 4:05 PM · Operations, DBA
jcrespo added a comment to T265323: Add toil::systemd_scope_cleanup to dbprov hosts.

@Marostegui 2 questions:

Tue, Oct 20, 3:34 PM · Data-Persistence-Backup, Operations, SRE-tools
jcrespo added a comment to T263587: CAPEX for ParserCache for Parsoid.

I'm going by the Dell quotes for the hw, backtracking from the racking task. If those are wrong, can $someone point me to the right ones?

Tue, Oct 20, 1:36 PM · DBA, serviceops, Platform Team Workboards (Green), MediaWiki-Parser, Parsoid
jcrespo added a comment to T263587: CAPEX for ParserCache for Parsoid.

Another small correction:

it could bring us capability to write into the ParserCache from the secondary DC, which we don't currently need, but certainly could think of some usages for it

Tue, Oct 20, 1:30 PM · DBA, serviceops, Platform Team Workboards (Green), MediaWiki-Parser, Parsoid
jcrespo added a comment to T263587: CAPEX for ParserCache for Parsoid.

Small addendum: Note that parsercache functionality is memcached + MySQL, not just MySQL. In fact the MySQL part was a later addition for disk persistence/larger dataset.

Tue, Oct 20, 1:25 PM · DBA, serviceops, Platform Team Workboards (Green), MediaWiki-Parser, Parsoid
jcrespo updated the task description for T265866: Run check table periodically on backup source hosts.
Tue, Oct 20, 1:11 PM · Data-Persistence-Backup
jcrespo added a comment to T170298: sshd stretch puppet support.

https://www.openssh.com/txt/release-7.5:

This release deprecates the sshd_config UsePrivilegeSeparation
option, thereby making privilege separation mandatory. Privilege
separation has been on by default for almost 15 years and
sandboxing has been on by default for almost the last five.

Tue, Oct 20, 12:34 PM · Patch-For-Review, Operations
jcrespo added a subtask for T262668: WMF media storage must be adequately backed up in a remote location: T160229: Back up of Commons files.
Tue, Oct 20, 10:45 AM · Data-Persistence-Backup, Epic, Goal, Operations, SRE-swift-storage
jcrespo added a parent task for T160229: Back up of Commons files: T262668: WMF media storage must be adequately backed up in a remote location.
Tue, Oct 20, 10:45 AM · Datasets-Archiving, Operations, Datasets-General-or-Unknown, Community-Wishlist-Survey-2016, Commons
jcrespo added a comment to T160229: Back up of Commons files.

Backup of commons files is a part of the more ambitious "Backup all wikis media files" project currently being worked on at: T262668 and subtasks.

Tue, Oct 20, 10:45 AM · Datasets-Archiving, Operations, Datasets-General-or-Unknown, Community-Wishlist-Survey-2016, Commons
jcrespo added a comment to T265321: ipblocks_restrictions.ir_type is tinyint(1) in code but tinyint(4) in production.

It's something we seemingly only do in a minority. At least in MW core, only on TINYINT, in two cases. 17 other tinyint in tables.sql don't do it

Tue, Oct 20, 8:58 AM · Data-Persistence-Consultation, User-Ladsgroup, Schema-change
jcrespo added a comment to T265866: Run check table periodically on backup source hosts.

Is s2 completed

Tue, Oct 20, 8:28 AM · Data-Persistence-Backup
jcrespo added a comment to T263587: CAPEX for ParserCache for Parsoid.

I have one question before everything else- does the parsercache expansion mean like a new "cluster/service" in parallel to the existing parsercache or would it be more like an expansion of the current service, to increase the number of hits/change the pc policy to store more data?

Tue, Oct 20, 8:01 AM · DBA, serviceops, Platform Team Workboards (Green), MediaWiki-Parser, Parsoid
jcrespo added a comment to T265866: Run check table periodically on backup source hosts.

But the backup source hosts have notifications disabled for lag, no?

Tue, Oct 20, 7:04 AM · Data-Persistence-Backup

Mon, Oct 19

jcrespo added a comment to T227739: Contention on User::getActorId ?.

@PlatformEng Note my previous comment at T227739#5327312, as the original reporter, which is in line with @Aklapper's comments.

Mon, Oct 19, 5:54 PM · Platform Engineering (Icebox), Wikimedia-database-error, Wikimedia-production-error, MediaWiki-User-management
jcrespo added a comment to T265866: Run check table periodically on backup source hosts.

100% agree with this, sadly, the lag part is not configurable yet on icinga alerts. :-( We'll see what I can get done, however for now I will just run it manually on all hosts to discard ongoing issues.

Mon, Oct 19, 12:34 PM · Data-Persistence-Backup
jcrespo added a comment to T265866: Run check table periodically on backup source hosts.

Implementing this should be relatively easy, just running mysqlcheck -c -A on the host. The problem is how to deal with potential replication delays?

Mon, Oct 19, 12:29 PM · Data-Persistence-Backup
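As a rough illustration of the `mysqlcheck -c -A` run mentioned in the comment above, a minimal Python sketch might look like the following. The function names and the dry-run default are assumptions made for illustration and are not taken from the task; `-c` is the short form of `--check` and `-A` of `--all-databases`. The sketch does not address the replication-lag question raised in the comment.

```python
import shlex
import subprocess


def build_check_command(extra_args=()):
    """Return the argv for a CHECK TABLE pass over every database.

    -c  is the short form of --check
    -A  is the short form of --all-databases
    """
    return ["mysqlcheck", "-c", "-A", *extra_args]


def run_check(dry_run=True):
    """Run the check, or only return the command line when dry_run is set.

    dry_run defaults to True (an assumption of this sketch) because a full
    check can take many hours and noticeably loads the host.
    """
    cmd = build_check_command()
    if dry_run:
        # Return a shell-quoted string so it can be reviewed or logged.
        return shlex.join(cmd)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    print(run_check(dry_run=True))
```

Keeping the command construction separate from execution makes it easy to review what would run before committing a host to a long check.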
jcrespo closed T265885: test task, ignore as Invalid.
Mon, Oct 19, 11:30 AM
jcrespo updated subscribers of T265885: test task, ignore.
Mon, Oct 19, 11:27 AM
jcrespo created T265885: test task, ignore.
Mon, Oct 19, 11:26 AM
jcrespo added a comment to T119173: RFC: Discourage use of MySQL's ENUM type.

Ladsgroup: I already gave an opinion at T119173#6064823 (please note that that was in the context of the original proposal of the task, "ban ENUMs", not the context of the current task/RFC). That and subsequent comments already capture the essence of my suggestions (discourage them and encourage table normalization). Of course, maybe those DBAs acting on schema changes may have additional thoughts.

Mon, Oct 19, 10:56 AM · Data-Persistence-Consultation, Performance-Team (Radar), TechCom-RFC, MediaWiki-General
jcrespo added a comment to T265321: ipblocks_restrictions.ir_type is tinyint(1) in code but tinyint(4) in production.

Interesting trivia: MySQL is going to deprecate definition of length for numeric values: https://dev.mysql.com/worklog/task/?id=13127 We should stop defining them in the first place.

Mon, Oct 19, 10:26 AM · Data-Persistence-Consultation, User-Ladsgroup, Schema-change
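
Since the display width of an integer column (the `(1)` in `tinyint(1)`) does not change its storage size or value range, a schema comparison could ignore it when checking code against production. A minimal Python sketch of that idea follows; the helper names are hypothetical and this is not the tooling used in T265321.

```python
import re

# Matches an integer type followed by a display width, e.g. "tinyint(4)".
_INT_WIDTH = re.compile(
    r"\b(tinyint|smallint|mediumint|int|integer|bigint)\s*\(\s*\d+\s*\)",
    re.IGNORECASE,
)


def strip_display_width(column_type):
    """Drop the display width from integer types, leaving other types alone."""
    return _INT_WIDTH.sub(lambda m: m.group(1).lower(), column_type.strip())


def same_integer_type(a, b):
    """Compare two column types, ignoring integer display widths and case."""
    return strip_display_width(a).lower() == strip_display_width(b).lower()


if __name__ == "__main__":
    print(same_integer_type("tinyint(1)", "tinyint(4)"))
```

Non-integer types such as `varchar(255)` are left untouched, because their length is part of the type's meaning.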