
jcrespo (Jaime Crespo)
Sr Database Administrator

User Details

User Since
May 11 2015, 8:31 AM (210 w, 5 d)
Availability
Available
IRC Nick
jynus
LDAP User
Jcrespo
MediaWiki User
JCrespo (WMF) [ Global Accounts ]

Recent Activity

Fri, May 17

jcrespo awarded T223446: Cannot upload larger JPG/WEBM/GIF files: "An error was encountered when opening the file for ZIP checks" a Like token.
Fri, May 17, 3:21 PM · Patch-For-Review, UploadWizard, Multimedia, MediaWiki-Uploading

Thu, May 16

jcrespo added a comment to T223448: ErrorException from line 1274 of /srv/mediawiki/php-1.34.0-wmf.5/includes/upload/UploadBase.php: PHP Warning: fread() expects parameter 1 to be resource, boolean given.

Based on log correlation, I think this, T222994 and T223446 are caused by the same regression.

Thu, May 16, 2:19 PM · MW-1.34-notes (1.34.0-wmf.6; 2019-05-21), Patch-For-Review, Multimedia, UploadWizard, Wikimedia-production-error
jcrespo removed a project from T223440: Special:Contributions on meta.wikimedia.org: associated=1: "PHP fatal error: Call to undefined method MediaWiki\MediaWikiServices::getAssociated()": MediaWiki-General-or-Unknown.
Thu, May 16, 12:28 PM · MW-1.34-notes (1.34.0-wmf.6; 2019-05-21), Regression, MediaWiki-Special-pages
jcrespo added a project to T223440: Special:Contributions on meta.wikimedia.org: associated=1: "PHP fatal error: Call to undefined method MediaWiki\MediaWikiServices::getAssociated()": MediaWiki-General-or-Unknown.
Thu, May 16, 12:28 PM · MW-1.34-notes (1.34.0-wmf.6; 2019-05-21), Regression, MediaWiki-Special-pages

Wed, May 15

jcrespo added a comment to T223371: Thank links missing at some wikis (for just me?).

I don't see a (thank) on the Not working ones for bots and anonymous users, but I would guess that is intended.

Wed, May 15, 12:13 PM · Growth-Team, Thanks

Tue, May 14

jcrespo added a comment to T223126: Install new PDUs into b5-eqiad.

dbproxy1006 switched over completely. The above patch (plus the db1139 shutdown) will be done hours before the maintenance.

Tue, May 14, 2:49 PM · ops-eqiad, Operations
jcrespo added a comment to T213664: correctable memory errors db1068 (commons primary master database).

Ignore the above, that is unrelated.

Tue, May 14, 11:23 AM · Patch-For-Review, DBA, Operations
jcrespo added a comment to T213664: correctable memory errors db1068 (commons primary master database).

It now says: CRITICAL: Devices (12) not equal to PDs (2)

Tue, May 14, 11:15 AM · Patch-For-Review, DBA, Operations
jcrespo added a comment to T221595: MovePage::move contention on cebwiki.

Not sure if related, but now there seems to be contention on cebwiki for LinksUpdate::updateLinksTimestamp. This one looks more like a structural problem, as it seems a lot of connections are trying to update the same page row at the same time:

Tue, May 14, 9:02 AM · MediaWiki-Special-pages, Wikimedia-production-error, MediaWiki-API, Contributors-Team, Editing-team
jcrespo added a comment to T206504: Create a new endpoint which returns articles in need of a description.

If you want a fast, in-memory, replicated storage we have memcached. It doesn't have the fancy data types redis has, but it's hardly impossible to use it for the same purpose.
If you want a less fast, reliable, eventually consistent multi-dc storage, you have cassandra via kask.

Tue, May 14, 8:00 AM · WikimediaEditorTasks, Wikipedia-Android-App-Backlog, Reading-Infrastructure-Team-Backlog (Kanban), Mobile-Content-Service

Mon, May 13

jcrespo updated subscribers of T223126: Install new PDUs into b5-eqiad.

CC @akosiaris @ayounsi @RobH for m1 proxy for potential even if unlikely impact on etherpad, bacula, puppet (the mysql database) & librenms, racktables & rt.

Mon, May 13, 4:45 PM · ops-eqiad, Operations
jcrespo added a comment to T197126: Create tool to handle the state of database configuration in MediaWiki in etcd.

It would be nice to have a mockup of the API to test soon (with no production effect except maybe some debug information). That would allow testing automation from the scripts we already have. I think that would be step #6?

Mon, May 13, 3:48 PM · User-ArielGlenn, Patch-For-Review, User-Joe, MediaWiki-Configuration, Operations, DBA

Sat, May 11

jcrespo added a comment to T206203: Implement database binary backups into the production infrastructure.

All 9 + 9 backups worked, starting 20 UTC and the last one finished at 10:15 the next day. 14.4 TB of backups produced in that interval (~5.6 TB after compression).
In comparison, dumps do 14 + 13 backups and it takes from 17 UTC to 00:12 the next day, with a total size of 2.9 TB after compression.

Sat, May 11, 10:54 AM · Patch-For-Review, Goal, DBA

Fri, May 10

jcrespo added a comment to T206203: Implement database binary backups into the production infrastructure.

Things pending I would like to work on:

Fri, May 10, 4:53 PM · Patch-For-Review, Goal, DBA
jcrespo added a comment to T220002: Decommission dbstore1001, dbstore2001, dbstore2002.

MySQL and Prometheus have been stopped on the above hosts. This is almost ready; the only thing pending is to wait some time and see if there is something we would like to keep from these old hosts.

Fri, May 10, 4:42 PM · Goal, DBA
jcrespo closed T213406: Purchase and setup remaining hosts for database backups, a subtask of T206203: Implement database binary backups into the production infrastructure, as Resolved.
Fri, May 10, 4:40 PM · Patch-For-Review, Goal, DBA
jcrespo closed T213406: Purchase and setup remaining hosts for database backups as Resolved.

I'd say, after closing all children, that this is done.

Fri, May 10, 4:40 PM · Goal, DBA
jcrespo renamed T200398: Document clearly the mariadb backup and recovery setup, specially how to recover a backup from MySQL playbook: How to recover a backup to Document clearly the mariadb backup and recovery setup, specially how to recover a backup.
Fri, May 10, 4:38 PM · User-Marostegui
jcrespo merged task T205626: Document clearly the mariadb backup and recovery setup into T200398: Document clearly the mariadb backup and recovery setup, specially how to recover a backup.
Fri, May 10, 4:37 PM · Patch-For-Review, DBA
jcrespo merged T205626: Document clearly the mariadb backup and recovery setup into T200398: Document clearly the mariadb backup and recovery setup, specially how to recover a backup.
Fri, May 10, 4:37 PM · User-Marostegui
jcrespo claimed T219631: Create a recovery/provisioning script for database binary backups.
Fri, May 10, 9:46 AM · DBA
jcrespo added a comment to T219631: Create a recovery/provisioning script for database binary backups.

A first version has been integrated into transfer.py:

Fri, May 10, 9:45 AM · DBA
jcrespo added a comment to P8506 table sizes.
root@db1115.eqiad.wmnet[zarcillo]> select file_path, file_name, round(size/1024/1024/1024 * 100)/100 as GB FROM backup_files where backup_id=1311 order by size desc LIMIT 20;
+--------------+------------------------------------------------------+----------+
| file_path    | file_name                                            | GB       |
+--------------+------------------------------------------------------+----------+
| wikidatawiki | wb_terms.ibd                                         | 475.8500 |
| wikidatawiki | revision.ibd                                         | 188.8200 |
| wikidatawiki | revision_actor_temp.ibd                              | 161.7200 |
| wikidatawiki | pagelinks.ibd                                        | 155.5300 |
| wikidatawiki | slots.ibd                                            |  87.2900 |
| wikidatawiki | content.ibd                                          |  86.7900 |
| wikidatawiki | revision_comment_temp.ibd                            |  51.8000 |
| wikidatawiki | comment.ibd                                          |  45.0700 |
| wikidatawiki | change_tag.ibd                                       |  42.6300 |
| wikidatawiki | text.ibd                                             |  36.4000 |
| wikidatawiki | cu_changes.ibd                                       |  20.6700 |
| wikidatawiki | page_props.ibd                                       |  15.8800 |
| wikidatawiki | wb_items_per_site.ibd                                |  12.0900 |
| wikidatawiki | externallinks.ibd                                    |  10.4100 |
| wikidatawiki | recentchanges.ibd                                    |   8.5500 |
| wikidatawiki | page.ibd                                             |   6.7300 |
| wikidatawiki | wikimedia_editor_tasks_entity_description_exists.ibd |   6.2200 |
| wikidatawiki | wb_changes_subscription.ibd                          |   5.2400 |
| wikidatawiki | __wmf_checksums.ibd                                  |   4.7200 |
| wikidatawiki | watchlist.ibd                                        |   4.3500 |
+--------------+------------------------------------------------------+----------+
20 rows in set (0.00 sec)
Fri, May 10, 9:26 AM
jcrespo added a comment to P8506 table sizes.
root@db1115.eqiad.wmnet[zarcillo]> select IF(LOCATE('eqiad', source), 'eqiad', 'codfw') as dc, section, round (total_size / 1024/1024/1024 * 100)/100 as GB FROM backups where start_date > now() - interval 23 hour ORDER BY total_size DESC;
+-------+---------+-----------+
| dc    | section | GB        |
+-------+---------+-----------+
| eqiad | s8      | 1450.8400 |
| codfw | s8      | 1150.2200 |
| eqiad | s4      |  994.5700 |
| codfw | s4      |  969.4600 |
| eqiad | s1      |  896.9300 |
| eqiad | s2      |  805.2100 |
| eqiad | s7      |  801.9200 |
| eqiad | s3      |  800.4600 |
| codfw | s3      |  713.2600 |
| codfw | s5      |  608.6100 |
| codfw | x1      |  110.6500 |
| eqiad | x1      |   95.5100 |
| codfw | s1      |      NULL |
+-------+---------+-----------+
13 rows in set (0.00 sec)
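
The GB columns in the two queries above come from `round(x/1024/1024/1024 * 100)/100`, i.e. bytes converted to GiB and kept to two decimals. A minimal Python sketch of the same arithmetic follows; the function name `bytes_to_gib` is hypothetical, and passing NULL through as `None` is an assumption that mirrors the NULL row in the second result.

```python
def bytes_to_gib(size_bytes):
    """Mirror round(size/1024/1024/1024 * 100)/100 from the queries above."""
    if size_bytes is None:
        # A NULL size in SQL stays NULL after arithmetic; keep it as None here.
        return None
    # Note: Python rounds exact halves to even, while SQL ROUND() on decimals
    # rounds them away from zero, so results may differ on exact .5 ties.
    return round(size_bytes / 1024 / 1024 / 1024 * 100) / 100


print(bytes_to_gib(1024 ** 3))         # one GiB
print(bytes_to_gib(1536 * 1024 ** 2))  # one and a half GiB
print(bytes_to_gib(None))              # NULL stays None
```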
Fri, May 10, 9:19 AM
jcrespo added a comment to T221764: Overview of wb_terms redesign.

Multichill, I think a solution to your problems can be implemented on wikireplicas (or at a similar level): wikireplicas don't need to have the same structure as production, and additional tables or indexes can be created there. I don't think internal production needs should cater to tool needs. That doesn't mean tool needs should not be met; on the contrary, better query methods should be provided, but I think they are different problems that should not be confused with one another. Better APIs should be made available with a stable interface, I 100% agree with that, but the database cannot be an interface that is guaranteed to be stable.

Fri, May 10, 8:23 AM · wb_terms - Tool Builders Migration
jcrespo edited P8506 table sizes.
Fri, May 10, 6:26 AM
jcrespo created P8506 table sizes.
Fri, May 10, 6:25 AM

Thu, May 9

jcrespo committed rOSMD832117977aab: transfer.py: Stop slave when calling stop slave (authored by jcrespo).
transfer.py: Stop slave when calling stop slave
Thu, May 9, 10:45 AM
jcrespo committed rOSMDb8a10e738cd4: transfer.py: Stop slave when calling stop slave (authored by jcrespo).
transfer.py: Stop slave when calling stop slave
Thu, May 9, 10:41 AM
jcrespo closed T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host as Resolved.

Compression has finished for these hosts.

Thu, May 9, 8:57 AM · Patch-For-Review, DBA
jcrespo closed T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host, a subtask of T220787: Fix RAID handler alert and puppet facter to work with Gen10 hosts and ssacli tool, as Resolved.
Thu, May 9, 8:57 AM · Patch-For-Review, Operations, Icinga, observability
jcrespo closed T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host, a subtask of T213406: Purchase and setup remaining hosts for database backups, as Resolved.
Thu, May 9, 8:57 AM · Goal, DBA
jcrespo committed rOSMD078fd5ae99e9: transfer.py: Fix bug that broke xtrabackup transfers (authored by jcrespo).
transfer.py: Fix bug that broke xtrabackup transfers
Thu, May 9, 8:28 AM
jcrespo added a comment to T214975: proton experienced a period of high CPU usage, busy queue, lockups.

Now it says proton1001: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues

Thu, May 9, 8:06 AM · Reading-Infrastructure-Team-Backlog, Proton, Operations
jcrespo updated the task description for T222772: Productionize db2[103-120].
Thu, May 9, 6:56 AM · Patch-For-Review, Goal, DBA

Wed, May 8

jcrespo added a comment to T206203: Implement database binary backups into the production infrastructure.

Executed as I mentioned above:

Wed, May 8, 4:25 PM · Patch-For-Review, Goal, DBA
jcrespo added a comment to T214975: proton experienced a period of high CPU usage, busy queue, lockups.

I think it overloaded again today (times are CEST):

Wed, May 8, 4:08 PM · Reading-Infrastructure-Team-Backlog, Proton, Operations
jcrespo added a comment to T222753: db2114 hardware problem .

Thanks.

Wed, May 8, 3:12 PM · Operations, ops-codfw, DBA
jcrespo added a comment to T206203: Implement database binary backups into the production infrastructure.

A question about setting up replication on the destination: does the metadata file contain only GTID coordinates, so we have to do the "translation" by looking at the master's binlog?

Wed, May 8, 1:28 PM · Patch-For-Review, Goal, DBA
jcrespo committed rOSMD37c035ae06c5: transfer.py: Ignore stopping and starting slave if option is not set (authored by jcrespo).
transfer.py: Ignore stopping and starting slave if option is not set
Wed, May 8, 12:15 PM
jcrespo added a comment to T206203: Implement database binary backups into the production infrastructure.
root@cumin2001:~$ time transfer.py --no-checksum --no-encrypt --type=decompress dbprov2001.codfw.wmnet:/srv/backups/snapshots/latest/snapshot.s6.2019-05-07--20-00-02.tar.gz db2117.codfw.wmnet:/srv/sqldata
...
WARNING: Original size is 207411963527 but transferred size is 494357898256 for copy to db2117.codfw.wmnet
494357898256 bytes correctly transferred from dbprov2001.codfw.wmnet to db2117.codfw.wmnet
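
The size warning above is presumably expected with `--type=decompress`, since the source is a `.tar.gz` and the destination receives the unpacked data. As a rough sketch, the expansion ratio can be read off the two byte counts in the output; the variable names below are illustrative only and not part of transfer.py.

```python
# Byte counts copied from the transfer.py output above.
original_size = 207411963527     # size of the compressed tarball on the source
transferred_size = 494357898256  # size written to the destination after decompression

# How much larger the unpacked data is than the compressed snapshot.
ratio = transferred_size / original_size
print(f"decompressed/compressed ratio: {ratio:.2f}")
```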
Wed, May 8, 12:06 PM · Patch-For-Review, Goal, DBA
jcrespo added a comment to P8490 Logical backup duration.

Snapshot duration:

Wed, May 8, 8:51 AM
jcrespo created P8490 Logical backup duration.
Wed, May 8, 8:24 AM

Mon, May 6

jcrespo renamed T220002: Decommission dbstore1001, dbstore2001, dbstore2002 from Decommission dbstore1001, dbstore2001, dbstore2002 and es2001-4 hosts* to Decommission dbstore1001, dbstore2001, dbstore2002.
Mon, May 6, 9:33 AM · Goal, DBA
jcrespo moved T222592: Decommission es2001, es2002, es2003, es2004 from Triage to Blocked external/Not db team on the DBA board.
Mon, May 6, 9:33 AM · DBA
jcrespo changed the status of T222592: Decommission es2001, es2002, es2003, es2004 from Open to Stalled.

Blocked on bacula setup.

Mon, May 6, 9:33 AM · DBA
jcrespo created T222592: Decommission es2001, es2002, es2003, es2004.
Mon, May 6, 9:32 AM · DBA
jcrespo added a comment to T220002: Decommission dbstore1001, dbstore2001, dbstore2002.

Does T220002#5158901 conflict with setting it as spare? I wanted to set it as spare soon-ish, decom later.

Mon, May 6, 8:57 AM · Goal, DBA
jcrespo added a comment to T208323: Predictive failures on disk S.M.A.R.T. status.

You might be confused with db2047, I don't recall db2049 having a disk replaced lately

Mon, May 6, 8:48 AM · Operations, DBA

Sun, May 5

jcrespo added a comment to T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host.

All the hosts have been set up and provisioned. The only patch pending deployment is https://gerrit.wikimedia.org/r/507925. There are, however, a few iterations of table optimization and compression still to do.

Sun, May 5, 1:01 PM · Patch-For-Review, DBA
jcrespo updated the task description for T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host.
Sun, May 5, 12:57 PM · Patch-For-Review, DBA
jcrespo added a comment to T208323: Predictive failures on disk S.M.A.R.T. status.

T222526 db2049 (again?)

Sun, May 5, 12:54 PM · Operations, DBA
jcrespo assigned T222526: Degraded RAID on db2049 to Papaul.

@Papaul Please see if you have spare 600 GB disks (this is unlikely to be covered by warranty) to replace this. In case you don't have any, we can look at other options.

Sun, May 5, 12:50 PM · Operations, ops-codfw

Thu, May 2

jcrespo added a comment to T222224: Normalizing *links tables.

How does it sound?

Thu, May 2, 7:39 PM · MediaWiki-Database, TechCom-RFC
jcrespo added a comment to T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host.

eqiad is complete too, also pending only possible recompressions to save space, like most of the codfw servers here.

Thu, May 2, 7:33 PM · Patch-For-Review, DBA
jcrespo updated the task description for T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host.
Thu, May 2, 7:29 PM · Patch-For-Review, DBA
jcrespo added a comment to T207901: dbproxy1005 reports database failover.
-- Logs begin at Sat 2019-04-20 15:06:53 UTC, end at Thu 2019-05-02 16:07:12 UTC. --
May 02 14:53:39 dbproxy1005 haproxy[14940]: Backup Server mariadb/db1117:3325 is DOWN. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
May 02 14:55:25 dbproxy1005 haproxy[14940]: Server mariadb/db1073 is DOWN. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
May 02 14:55:25 dbproxy1005 haproxy[14940]: proxy mariadb has no server available!
May 02 15:04:22 dbproxy1005 haproxy[14940]: Backup Server mariadb/db1117:3325 is DOWN. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
May 02 15:04:22 dbproxy1005 haproxy[14940]: proxy mariadb has no server available!

Both servers were detected as down, so likely a network/app level issue of the proxy, not the databases.
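
To spot the moments when the proxy lost every backend, the journal lines above can be filtered for the "has no server available" message. The following is a minimal sketch, assuming the log format stays exactly as in the excerpt; the regular expression and the function name `total_outages` are illustrative, not an existing tool.

```python
import re

# Log lines copied from the journal excerpt above.
LOG = """\
May 02 14:53:39 dbproxy1005 haproxy[14940]: Backup Server mariadb/db1117:3325 is DOWN. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
May 02 14:55:25 dbproxy1005 haproxy[14940]: Server mariadb/db1073 is DOWN. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
May 02 14:55:25 dbproxy1005 haproxy[14940]: proxy mariadb has no server available!
May 02 15:04:22 dbproxy1005 haproxy[14940]: Backup Server mariadb/db1117:3325 is DOWN. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
May 02 15:04:22 dbproxy1005 haproxy[14940]: proxy mariadb has no server available!
"""

# Matches only the lines where haproxy reports that a proxy has no backend left.
NO_SERVER = re.compile(
    r"^(?P<when>\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ haproxy\[\d+\]: "
    r"proxy (?P<proxy>\S+) has no server available!$",
    re.MULTILINE,
)


def total_outages(log_text):
    """Return (timestamp, proxy name) pairs for every total-outage line."""
    return [(m["when"], m["proxy"]) for m in NO_SERVER.finditer(log_text)]


outages = total_outages(LOG)
for when, proxy in outages:
    print(f"{when}: proxy {proxy} had no backends")
```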

Thu, May 2, 4:08 PM · cloud-services-team, DBA
jcrespo added a comment to T207901: dbproxy1005 reports database failover.

This happened again, restarting proxy, as I don't see a clear connection with max_connections. Network instability?

Thu, May 2, 4:02 PM · cloud-services-team, DBA
jcrespo closed T218985: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves) as Resolved.

installed, implementation (provisioning) will be handled at T220572.

Thu, May 2, 12:38 PM · Patch-For-Review, Operations, ops-eqiad, DBA
jcrespo closed T218985: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves), a subtask of T213406: Purchase and setup remaining hosts for database backups, as Resolved.
Thu, May 2, 12:38 PM · Goal, DBA
jcrespo closed T218985: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves), a subtask of T220002: Decommission dbstore1001, dbstore2001, dbstore2002, as Resolved.
Thu, May 2, 12:38 PM · Goal, DBA
jcrespo updated the task description for T218985: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves).
Thu, May 2, 12:37 PM · Patch-For-Review, Operations, ops-eqiad, DBA
jcrespo updated the task description for T218985: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves).
Thu, May 2, 12:23 PM · Patch-For-Review, Operations, ops-eqiad, DBA
jcrespo added a comment to T218985: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves).

@Cmjohnson In case this is useful for you, I have documented how to enable ipmi on ilo5 from the web interface here: https://wikitech.wikimedia.org/w/index.php?title=Management_Interfaces&diff=1824940&oldid=1823217

Thu, May 2, 12:22 PM · Patch-For-Review, Operations, ops-eqiad, DBA
jcrespo added a comment to T218985: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves).

Either dns, remote ipmi or password may not be configured properly:

Error: Unable to establish IPMI v2 / RMCP+ session
11:23:36 | Unable to run wmf-auto-reimage: Remote IPMI failed for mgmt 'db1139.mgmt.eqiad.wmnet': Command '['ipmitool', '-I', 'lanplus', '-H', 'db1139.mgmt.eqiad.wmnet', '-U', 'root', '-E', 'chassis', 'power', 'status']' returned non-zero exit status 1

Trying to debug following the workbook.

Thu, May 2, 11:25 AM · Patch-For-Review, Operations, ops-eqiad, DBA
jcrespo added a comment to T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host.

db2102 is set up, pending loading data, which is being done now while testing, at the same time, the latest recover_dump.py version and the generated backup.

Thu, May 2, 9:43 AM · Patch-For-Review, DBA
jcrespo added a comment to T222224: Normalizing *links tables.

+1 to add namespace to the "title" table.

Thu, May 2, 8:06 AM · MediaWiki-Database, TechCom-RFC

Tue, Apr 30

jcrespo updated the task description for T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host.
Tue, Apr 30, 7:51 PM · Patch-For-Review, DBA
jcrespo updated the task description for T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host.
Tue, Apr 30, 3:18 PM · Patch-For-Review, DBA
jcrespo updated subscribers of T221449: Redesign querycache* tables.

All tables that are being used in a master-replica database need to have a PK (preferably an auto_increment integer PK).

Tue, Apr 30, 7:47 AM · MediaWiki-Database

Mon, Apr 29

jcrespo added a comment to T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host.

98 and 99 done, although they need recompression (especially s3).

Mon, Apr 29, 5:33 PM · Patch-For-Review, DBA
jcrespo updated the task description for T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host.
Mon, Apr 29, 5:32 PM · Patch-For-Review, DBA
jcrespo claimed T218985: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves).
Mon, Apr 29, 4:57 PM · Patch-For-Review, Operations, ops-eqiad, DBA

Sat, Apr 27

jcrespo added a comment to T221595: MovePage::move contention on cebwiki.

The deadlocks came back, so I don't think we can close this for now. I still do not think it is high priority, but it is an ongoing event: https://logstash.wikimedia.org/goto/e5a2230fb3cd9c90155d5391fa54a484

Sat, Apr 27, 7:29 PM · MediaWiki-Special-pages, Wikimedia-production-error, MediaWiki-API, Contributors-Team, Editing-team
jcrespo moved T187153: Special:Abuselog throws when viewing details or examining (BadMethodCallException: Call get getId() on null) from Resolved to Found during 1.34-wmf.1 on the Wikimedia-production-error board.

Based on T187153#5101883 and the rate at https://logstash.wikimedia.org/goto/c930467fddcf4aaa4d4c0f8f00838498 lots of hits of this on fiwiki right now. The logging is the blocker issue, not the actual problem.

Sat, Apr 27, 7:25 PM · MW-1.34-notes (1.34.0-wmf.6; 2019-05-21), User-zeljkofilipin, MW-1.33-notes (1.33.0-wmf.12; 2019-01-08), Patch-For-Review, User-Daimona, Regression, Multi-Content-Revisions, User-Addshore, Wikimedia-production-error, Chinese-Sites, AbuseFilter
jcrespo updated the task description for T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host.
Sat, Apr 27, 4:03 PM · Patch-For-Review, DBA
jcrespo added a comment to T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host.

Recompression is ongoing on db2097, but technically it is done.

Sat, Apr 27, 4:03 PM · Patch-For-Review, DBA
jcrespo added a comment to T151029: duplicate key problems.
root@db2097:/srv$ mysql -A -BN -S /run/mysqld/mysqld.s6.sock -e "select CONCAT(table_schema, '.', table_name) FROM information_schema.tables where table_Schema like '%wik%' and engine='InnoDB' and row_format != 'COMPRESSED'" | head -n 1 | while read table; do echo "$table..."; mysql -S /run/mysqld/mysqld.s6.sock -e "set session sql_log_bin=0; ALTER TABLE $table row_format=COMPRESSED"; done
frwiki.actor...
ERROR 1062 (23000) at line 1: Duplicate entry 'X.X.X.X' for key 'actor_name'
Sat, Apr 27, 1:24 PM · Patch-For-Review, DBA

Fri, Apr 26

jcrespo claimed T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host.
Fri, Apr 26, 1:16 PM · Patch-For-Review, DBA

Apr 25 2019

jcrespo added a comment to T221595: MovePage::move contention on cebwiki.

I suggest closing this rather than keeping it around if someone has already checked it. There is no reason to keep a backlog if it is not clearly actionable, and we can reopen if it recurs. Normally I don't create a ticket for this kind of error, but I was worried that it kept happening for days rather than hours or minutes.

Apr 25 2019, 10:29 PM · MediaWiki-Special-pages, Wikimedia-production-error, MediaWiki-API, Contributors-Team, Editing-team
jcrespo closed T219399: rack/setup/deploy eqiad dedicated backup recovery/provisioning hosts as Resolved.

This is now done, both servers are in production (although not with 100% of the final load, only all logical backups and one snapshot each). There is still the issue with permissions for free disk monitoring, which is important but not fatal. It will be fixed at the same time as the issues with the dbprov2* servers (T218336#5081898).

Apr 25 2019, 5:08 PM · Patch-For-Review, Operations, ops-eqiad, DBA
jcrespo updated the task description for T219399: rack/setup/deploy eqiad dedicated backup recovery/provisioning hosts.
Apr 25 2019, 5:06 PM · Patch-For-Review, Operations, ops-eqiad, DBA
jcrespo updated the task description for T219399: rack/setup/deploy eqiad dedicated backup recovery/provisioning hosts.
Apr 25 2019, 5:04 PM · Patch-For-Review, Operations, ops-eqiad, DBA

Apr 24 2019

jcrespo updated the task description for T219399: rack/setup/deploy eqiad dedicated backup recovery/provisioning hosts.
Apr 24 2019, 3:50 PM · Patch-For-Review, Operations, ops-eqiad, DBA
jcrespo added a comment to T220894: Replacement of network::constant's special_hosts.

I think it is better to hardcode the constants in modules/profile/manifests/mariadb/ferm.pp (for now, not as an ideal situation) than to go for a multi-file refactoring commit without even notifying or seeking input from the code maintainer (note I also inherited that code).

Apr 24 2019, 11:46 AM · Patch-For-Review, Operations

Apr 23 2019

jcrespo added a comment to T149670: Predictive disk failure on db2047.

(also not the same disk slot, so different issues and should be tracked separately)

Apr 23 2019, 2:57 PM · ops-codfw, Operations
jcrespo added a comment to T200297: Introduce a new namespace for collaborative judgements about wiki entities.

@Harej My question is more like, is the summary still accurate about the result of the conversations? (e.g. rampup of 1%, etc.), bots technically not allowed, etc. If yes, no problem, if not, I was asking to update it to reflect the latest agreement.

Apr 23 2019, 1:51 PM · MW-1.33-notes (1.33.0-wmf.14; 2019-01-22), Patch-For-Review, Scoring-platform-team (Current), DBA, Operations, Jade, TechCom-RFC
jcrespo moved T221463: questions about standalone wmf-mariadb103 from Triage to Blocked external/Not db team on the DBA board.
Apr 23 2019, 1:41 PM · Patch-For-Review, DBA
jcrespo added a comment to T221463: questions about standalone wmf-mariadb103.

wmf-mariadb103 doesn't exist, and if it exists, it won't work- we don't support it yet as we found some bugs and we are not working on those at the moment. The plan is only to support wmf-mariadb101 for stretch, and wmf-mariadb103 for buster (we stopped supporting wmf-mariadb for ubuntu and wmf-mariadb10 for jessie). If you need a roadmap, please ask us.

Apr 23 2019, 1:39 PM · Patch-For-Review, DBA
jcrespo added a comment to T221458: Special:Log on commons -- entire web request took longer than 60 seconds and timed out.

Let's keep them separated for now, my bet is they are the same underlying issue, but the effects (Special:Log vs Special:Contributions), are different and they may need different resolutions (e.g. different query hints or indexes).

Apr 23 2019, 9:15 AM · Core Platform Team Kanban (Done with CPT), MW-1.34-notes (1.34.0-wmf.3; 2019-04-30), Performance, MediaWiki-Logging, MediaWiki-Database, DBA, Operations, Wikimedia-production-error
jcrespo added a comment to T221511: Possible full scan query ApiQueryUserContribs::execute for revision_actor_temp table on commonswiki.

possible duplicate of T221380

Apr 23 2019, 9:04 AM · MW-1.34-notes (1.34.0-wmf.3; 2019-04-30), Core Platform Team Kanban (Done with CPT), Patch-For-Review, Performance, MediaWiki-Database
jcrespo updated the task description for T221595: MovePage::move contention on cebwiki.
Apr 23 2019, 8:44 AM · MediaWiki-Special-pages, Wikimedia-production-error, MediaWiki-API, Contributors-Team, Editing-team
jcrespo created T221595: MovePage::move contention on cebwiki.
Apr 23 2019, 8:38 AM · MediaWiki-Special-pages, Wikimedia-production-error, MediaWiki-API, Contributors-Team, Editing-team
jcrespo updated the task description for T221577: Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: GeoData\Hooks::doLinksUpdate does not have outer scope.
Apr 23 2019, 8:14 AM · Patch-For-Review, Performance-Team, Multimedia, MediaWiki-Database, GlobalUsage, MediaWiki-extensions-PageAssessments, Discovery-Search, GeoData, Wikimedia-production-error
jcrespo triaged T221577: Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: GeoData\Hooks::doLinksUpdate does not have outer scope as Unbreak Now! priority.

Going up to unbreak now, because as far as I can see all edit hooks may be broken, causing long-lasting issues on the metadata. Change if that is not true.

Apr 23 2019, 8:13 AM · Patch-For-Review, Performance-Team, Multimedia, MediaWiki-Database, GlobalUsage, MediaWiki-extensions-PageAssessments, Discovery-Search, GeoData, Wikimedia-production-error
jcrespo updated the task description for T221577: Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: GeoData\Hooks::doLinksUpdate does not have outer scope.
Apr 23 2019, 8:06 AM · Patch-For-Review, Performance-Team, Multimedia, MediaWiki-Database, GlobalUsage, MediaWiki-extensions-PageAssessments, Discovery-Search, GeoData, Wikimedia-production-error
jcrespo added a project to T221577: Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: GeoData\Hooks::doLinksUpdate does not have outer scope: MediaWiki-Watchlist.
Apr 23 2019, 7:50 AM · Patch-For-Review, Performance-Team, Multimedia, MediaWiki-Database, GlobalUsage, MediaWiki-extensions-PageAssessments, Discovery-Search, GeoData, Wikimedia-production-error
jcrespo added projects to T221577: Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: GeoData\Hooks::doLinksUpdate does not have outer scope: MediaWiki-extensions-PageAssessments, GlobalUsage, MediaWiki-Database.
Apr 23 2019, 7:48 AM · Patch-For-Review, Performance-Team, Multimedia, MediaWiki-Database, GlobalUsage, MediaWiki-extensions-PageAssessments, Discovery-Search, GeoData, Wikimedia-production-error
jcrespo moved T221577: Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: GeoData\Hooks::doLinksUpdate does not have outer scope from Untriaged to Found during 1.34-wmf.1 on the Wikimedia-production-error board.
Apr 23 2019, 7:42 AM · Patch-For-Review, Performance-Team, Multimedia, MediaWiki-Database, GlobalUsage, MediaWiki-extensions-PageAssessments, Discovery-Search, GeoData, Wikimedia-production-error
jcrespo created T221577: Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: GeoData\Hooks::doLinksUpdate does not have outer scope.
Apr 23 2019, 7:38 AM · Patch-For-Review, Performance-Team, Multimedia, MediaWiki-Database, GlobalUsage, MediaWiki-extensions-PageAssessments, Discovery-Search, GeoData, Wikimedia-production-error