jcrespo (Jaime Crespo)
Sr Database Administrator

User Details

User Since
May 11 2015, 8:31 AM (144 w, 6 d)
Availability
Available
IRC Nick
jynus
LDAP User
Jcrespo
MediaWiki User
JCrespo (WMF)

Recent Activity

Today

jcrespo created T187651: Setting packages on 'hold' breaks puppet runs.
Sun, Feb 18, 6:07 PM · Puppet, Operations

Yesterday

jcrespo awarded T175881: Redirect Toolforge Quarry page to Cloud VPS Quarry a Love token.
Sat, Feb 17, 6:40 PM · Toolforge, Quarry
jcrespo created T187626: https://tools.wmflabs.org/quarry/ is accessible but broken.
Sat, Feb 17, 5:58 PM · Quarry
jcrespo added a comment to T187542: Decommission db1043.

The usage of the tags is ok.

Sat, Feb 17, 4:37 PM · DBA, Patch-For-Review, ops-eqiad, hardware-requests, Operations

Fri, Feb 16

jcrespo triaged T187543: Decommission db2012 as Normal priority.
Fri, Feb 16, 1:09 PM · DBA, Patch-For-Review, hardware-requests, ops-codfw, Operations
jcrespo triaged T187542: Decommission db1043 as Normal priority.
Fri, Feb 16, 12:53 PM · DBA, Patch-For-Review, ops-eqiad, hardware-requests, Operations
jcrespo updated subscribers of T187526: Disk #5 (count starts at #0) of db1111 has corrupted sectors.
Fri, Feb 16, 9:46 AM · DBA, Operations, ops-eqiad
jcrespo moved T187526: Disk #5 (count starts at #0) of db1111 has corrupted sectors from Triage to Blocked external/Not db team on the DBA board.
Fri, Feb 16, 9:45 AM · DBA, Operations, ops-eqiad
jcrespo created T187526: Disk #5 (count starts at #0) of db1111 has corrupted sectors.
Fri, Feb 16, 9:45 AM · DBA, Operations, ops-eqiad
jcrespo added a comment to T159423: Meta ticket: Migrate multi-source database hosts to multi-instance.

Or 2 + eventlogging, more realistically, with the current budget.

Fri, Feb 16, 8:33 AM · Epic, DBA
jcrespo added a comment to T187521: Optimize recentchanges and wbc_entity_usage table across wikis.

Sqoops the wbc_entity_usage tables

Fri, Feb 16, 7:39 AM · Wikidata, DBA

Thu, Feb 15

jcrespo added a comment to T187143: Upcoming phabricator upgrade requires unusually long database migrations.

That is not what I asked; I asked you to disable the first one (the alter), not the second one.

Thu, Feb 15, 5:51 AM · Patch-For-Review, Phabricator (2018-02-15), Release, DBA
jcrespo added a comment to T187419: Degraded RAID on db2048.

It finally failed completely.

Thu, Feb 15, 5:44 AM · DBA, Operations, ops-codfw
jcrespo assigned T187419: Degraded RAID on db2048 to Papaul.
Thu, Feb 15, 5:43 AM · DBA, Operations, ops-codfw
jcrespo merged task T187328: db2048: RAID with predictive failure into T187419: Degraded RAID on db2048.
Thu, Feb 15, 5:43 AM · ops-codfw, DBA, Operations
jcrespo merged T187328: db2048: RAID with predictive failure into T187419: Degraded RAID on db2048.
Thu, Feb 15, 5:43 AM · DBA, Operations, ops-codfw

Wed, Feb 14

jcrespo added a comment to T187143: Upcoming phabricator upgrade requires unusually long database migrations.

The switchover will probably require your help restarting apache/phab at the time.

Wed, Feb 14, 1:32 PM · Patch-For-Review, Phabricator (2018-02-15), Release, DBA
jcrespo added a comment to T187143: Upcoming phabricator upgrade requires unusually long database migrations.

The alter is already done on the slave; try to see if you can cleanly skip that migration. If there are other backwards-compatible alters, we could do those, too. During the maintenance, we will switch over to the new host, you finish the migrations, and you end up with a newer version and a new, faster host with the latest mariadb package.

Wed, Feb 14, 1:31 PM · Patch-For-Review, Phabricator (2018-02-15), Release, DBA
jcrespo added a comment to T187143: Upcoming phabricator upgrade requires unusually long database migrations.

Is it easy to trick phabricator into skipping migrations? We could do that safely and online, and it seems backwards compatible.

Wed, Feb 14, 12:49 PM · Patch-For-Review, Phabricator (2018-02-15), Release, DBA
jcrespo added a comment to T187143: Upcoming phabricator upgrade requires unusually long database migrations.

I do not see any schema change, only updates using ids, is there an actual schema change done before that? https://secure.phabricator.com/source/phabricator/browse/master/resources/sql/autopatches/20180208.maniphest.02.populate.php;215b8b4727aa96dc5cbd2a53120f97d5f4d4ce3b

Wed, Feb 14, 12:44 PM · Patch-For-Review, Phabricator (2018-02-15), Release, DBA
jcrespo moved T136335: Allow self-serve database credential and permissions management for Tool Labs projects from Backlog to Backlog (help welcome) on the DBA board.

I will not be working on this, but will help if someone else wants to pick it up in the future (add us back). Not sure if it is still relevant with the new way of handling accounts.

Wed, Feb 14, 12:00 PM · Cloud-Services, DBA, Toolforge
jcrespo moved T165674: Investigate slow servermon updating queries on db1016 from Backlog to Backlog (help welcome) on the DBA board.
Wed, Feb 14, 11:58 AM · DBA, Operations
jcrespo moved T183983: Re-institute query killer for the analytics WikiReplica from In progress to Next on the DBA board.
Wed, Feb 14, 11:57 AM · Data-Services, DBA
jcrespo moved T186579: labsdb1010 crashed from In progress to Next on the DBA board.
Wed, Feb 14, 11:56 AM · Data-Services, DBA
jcrespo moved T184888: Replace codfw x1 master (db2033) (WAS: Failed BBU on db2033 (x1 master)) from In progress to Next on the DBA board.
Wed, Feb 14, 11:56 AM · Patch-For-Review, DBA
jcrespo added a comment to T183029: Stop managing account creation for labsdb1001 and 1003 through the maintain-dbusers script.

This is the new list of m5 databases being backed up; if you confirm that it is as intended, you can proceed now that we have proper[sic] backups.

root@dbstore2001:/srv/backups/m5.20180214102352$ ls *-schema-create.sql.gz
ceilometer-schema-create.sql.gz
designate_pool_manager-schema-create.sql.gz
designate-schema-create.sql.gz
glance-schema-create.sql.gz
keystone-schema-create.sql.gz
labsdbaccounts-schema-create.sql.gz
labspuppet-schema-create.sql.gz
neutron-schema-create.sql.gz
nodepooldb-schema-create.sql.gz
nova-schema-create.sql.gz
striker-schema-create.sql.gz
Wed, Feb 14, 10:54 AM · cloud-services-team (Kanban), Data-Services
jcrespo closed T186585: Review m5 backups as Resolved.

I have not dropped percona; I will want to examine the checksums later. Will do at another time.

Wed, Feb 14, 10:50 AM · DBA, cloud-services-team (Kanban), Data-Services
jcrespo closed T186585: Review m5 backups, a subtask of T183029: Stop managing account creation for labsdb1001 and 1003 through the maintain-dbusers script, as Resolved.
Wed, Feb 14, 10:50 AM · cloud-services-team (Kanban), Data-Services
jcrespo moved T186585: Review m5 backups from Triage to In progress on the DBA board.
Wed, Feb 14, 10:18 AM · DBA, cloud-services-team (Kanban), Data-Services
jcrespo added a project to T186585: Review m5 backups: DBA.
Wed, Feb 14, 10:18 AM · DBA, cloud-services-team (Kanban), Data-Services
jcrespo claimed T186585: Review m5 backups.
Wed, Feb 14, 10:17 AM · DBA, cloud-services-team (Kanban), Data-Services
jcrespo added a comment to T186585: Review m5 backups.

Final list:

Wed, Feb 14, 10:13 AM · DBA, cloud-services-team (Kanban), Data-Services
jcrespo moved T187295: Apply AbuseFilter patch-fix-index from Triage to Backlog on the DBA board.
Wed, Feb 14, 9:31 AM · DBA, Blocked-on-schema-change

Tue, Feb 13

jcrespo added a comment to T186815: Badges not displaying on trwiki.

ok

Tue, Feb 13, 8:24 PM · Operations, Wikidata
jcrespo added a comment to T186815: Badges not displaying on trwiki.
curl 'https://en.wikipedia.org/w/load.php?debug=false&lang=en&modules=ext.cite.styles%7Cext.echo.badgeicons%7Cext.echo.styles.badge%7Cext.uls.interlanguage%7Cext.visualEditor.desktopArticleTarget.noscript%7Cext.wikimediaBadges%7Cmediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.sectionAnchor%7Cmediawiki.skinning.interface%7Cskins.vector.styles%7Cwikibase.client.init&only=styles&skin=vector' \
| grep '/w/extensions/WikimediaBadges/resources/images/badge-golden-star.png'
Tue, Feb 13, 7:49 PM · Operations, Wikidata
jcrespo added a comment to T179131: AbuseFilter should actively prune old IP data.

The selects seem to perform badly (>30 seconds to execute each batch of 200); this will probably need optimization, but I will wait to see whether it is only due to the large number of rows pending or something else.

Tue, Feb 13, 12:19 PM · Privacy, AbuseFilter
jcrespo added a comment to T187175: Load-test maps servers.

For load testing, you want to work closely with the SRE team, in particular Traffic.

Tue, Feb 13, 12:00 PM · Maps-Sprint
jcrespo added a comment to T179131: AbuseFilter should actively prune old IP data.

This seems to be ongoing, but will probably take a long time to finish on its first execution. This is actually desirable: we want this to happen, but not so fast that it affects other writes.

Tue, Feb 13, 11:56 AM · Privacy, AbuseFilter
jcrespo renamed T183485: Please consider purging/moving the cx_corpora table at x1 from Please consider purging the cx_corpora table to Please consider purging/moving the cx_corpora table at x1.
Tue, Feb 13, 8:57 AM · Language-2018-Jan-Mar, ContentTranslation
jcrespo added a comment to T183485: Please consider purging/moving the cx_corpora table at x1.

I want to stress that I do not really need purging if that is not desirable, as long as chunks of content are not on s*/x* hosts and are moved to es* (content hosts).

Tue, Feb 13, 8:57 AM · Language-2018-Jan-Mar, ContentTranslation
jcrespo added a comment to T187143: Upcoming phabricator upgrade requires unusually long database migrations.

Can you share the migration script in advance?

Tue, Feb 13, 8:51 AM · Patch-For-Review, Phabricator (2018-02-15), Release, DBA
jcrespo added a comment to T186815: Badges not displaying on trwiki.

@Superyetkin As per Ladsgroup, it seems the CSS you added here is not correct https://tr.wikipedia.org/w/index.php?title=MediaWiki%3AVector.css&type=revision&diff=18969658&oldid=18388018 Please double check it.

Tue, Feb 13, 8:49 AM · Operations, Wikidata

Mon, Feb 12

jcrespo awarded T184090: Decommission db2016, db2017, db2018, db2019, db2023, db2028, db2029 a Love token.
Mon, Feb 12, 5:14 PM · hardware-requests, ops-codfw, Operations, DBA
jcrespo added a comment to T186123: rack/setup/install db2093 (WAS: rack/setup/install tendril2001).

@Papaul as you may have heard, we are in a kind of emergency right now, busy fixing other stuff; this will have to be delayed.

Mon, Feb 12, 5:11 PM · Patch-For-Review, ops-codfw, DBA, Operations
jcrespo added a subtask for T42810: Wikibase badges (tracking): T186815: Badges not displaying on trwiki.
Mon, Feb 12, 10:25 AM · Wikidata, Tracking, MediaWiki-extensions-WikibaseClient
jcrespo added a parent task for T186815: Badges not displaying on trwiki: T42810: Wikibase badges (tracking).
Mon, Feb 12, 10:25 AM · Operations, Wikidata
jcrespo added a comment to T186815: Badges not displaying on trwiki.

This is maybe a defect in the deployment of the badges extension/Wikidata or a breakage in the apache redirects. However, I have opened other languages that show the image in the browser, and all generate a 404; the image comes from a base64 encoding, not from the static image:

Mon, Feb 12, 10:22 AM · Operations, Wikidata

Sun, Feb 11

jcrespo added a comment to T133523: [RFC] improve parsercache replication and sharding handling.

See my latest comments on: T167784#3961866

Sun, Feb 11, 10:51 PM · Patch-For-Review, Operations, codfw-rollout, DBA
jcrespo added a comment to T167784: WMF ParserCache disk space exhaustion.

reduction in disk usage

Sun, Feb 11, 10:42 PM · MediaWiki-Platform-Team (MWPT-Q1-Jul-Sep-2017), Performance-Team, DBA, MediaWiki-Parser
jcrespo added a comment to T186764: refreshLinks/jobqueue issues in wmf.20 causing MW-reported replag.

Do you have a one-line summary of what the issue was ("bug on X patch/deployment of Y feature")? Not looking to blame anyone, just genuinely curious.

Sun, Feb 11, 10:30 PM · MW-1.31-release-notes (WMF-deploy-2018-02-06 (1.31.0-wmf.20)), Patch-For-Review, MediaWiki-JobQueue, MediaWiki-Database, Regression
jcrespo added a comment to T186973: DBA review of purgeOldLogIPData.php.

Please set an ORDER BY; a LIMIT without an ORDER BY can lead to different results on masters and replicas. You can argue that the same rows will eventually be deleted even if the query runs in a different order on different slaves, but there is always the risk that, in combination with other queries, it breaks replication. The idea that "nondeterministic queries are in general banned from mediawiki" is a policy: https://www.mediawiki.org/wiki/Development_policy#Database_policy unless it can be justified for exceptional reasons.

Sun, Feb 11, 10:00 PM · Patch-For-Review, DBA, AbuseFilter
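
The ORDER BY point in the comment above can be illustrated with a minimal sketch. It is not the purgeOldLogIPData.php code: the table `log_entries`, its columns `id` and `ts`, and the helper `delete_batch` are hypothetical names, and SQLite stands in for MariaDB purely so the example is self-contained. Each batch is chosen by primary-key order, so the rows picked by LIMIT are well defined.

```python
import sqlite3


def delete_batch(conn, cutoff, batch_size=200):
    """Delete up to batch_size rows older than cutoff, lowest id first.

    Returns the list of deleted ids (empty when nothing is left to purge).
    """
    # ORDER BY on the primary key makes the LIMIT deterministic: any server
    # running this statement against the same data picks the same rows.
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM log_entries WHERE ts < ? ORDER BY id LIMIT ?",
            (cutoff, batch_size),
        )
    ]
    if ids:
        placeholders = ",".join("?" * len(ids))
        conn.execute(
            f"DELETE FROM log_entries WHERE id IN ({placeholders})", ids
        )
        conn.commit()
    return ids


# Hypothetical data: ids 1..10 with timestamps 10..100.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_entries (id INTEGER PRIMARY KEY, ts INTEGER)")
conn.executemany(
    "INSERT INTO log_entries (id, ts) VALUES (?, ?)",
    [(i, i * 10) for i in range(1, 11)],
)
first = delete_batch(conn, cutoff=80, batch_size=3)
```

Calling `delete_batch` repeatedly until it returns an empty list walks through the old rows in a fixed order, which is the property the comment asks for.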

Fri, Feb 9

jcrespo added a comment to T166733: Deploy refactored comment storage.

Thanks.

Fri, Feb 9, 4:28 PM · Patch-For-Review, User-notice, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Epic, Release-Engineering-Team (Watching / External)
jcrespo added a comment to T159170: Sunset MySQL data store for eventlogging. Find an alternative query interface for eventlogging on analytics cluster that can replace MariaDB.

Have you considered clickhouse? It seems like an interesting open source solution with a familiar SQL interface and much better performance for analytics-like workloads. We were considering it for the analytics server on labs, but there will be challenges in loading the data, as it doesn't have a proper replication or update strategy.

Fri, Feb 9, 10:48 AM · Analytics-Kanban, Analytics-EventLogging
jcrespo added a comment to T166733: Deploy refactored comment storage.

Small question regarding the old comment format: while BOTH, is the old format truncated in advance, or do you let the database do it on its own?

Fri, Feb 9, 10:42 AM · Patch-For-Review, User-notice, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Epic, Release-Engineering-Team (Watching / External)

Thu, Feb 8

jcrespo added a comment to T186596: dbstore1001 crashed: Multibit ECC errors were detected on the RAID controller..

This is high for us DBAs, not high for dc-ops (but we cannot express that difference).

Thu, Feb 8, 7:22 PM · Patch-For-Review, ops-eqiad, DBA, Operations
jcrespo raised the priority of T186596: dbstore1001 crashed: Multibit ECC errors were detected on the RAID controller. from Normal to High.
Thu, Feb 8, 7:13 PM · Patch-For-Review, ops-eqiad, DBA, Operations
jcrespo added a comment to T186579: labsdb1010 crashed.
Could not execute Delete_rows_v1 event on table hywiki.geo_tags; Can't find record in 'geo_tags', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log db1095-bin.004423, end_log_pos 400824621
Thu, Feb 8, 11:20 AM · Data-Services, DBA
jcrespo added a comment to T186689: toolforge: xtools: exceed max database connections.

@Krinkle Can you see why I do not just raise the quota directly? In this case, according to T186689#3953106, queries started to go wild (something that can happen to everybody, even to the best programmers); a limit helps to avoid bringing the whole service down.

Thu, Feb 8, 11:10 AM · XTools
jcrespo added a comment to T186752: Swap objectcache table for MEMORY engine?.

If the worry is "It tends to fill up with stale entries", which I think it is not, you can easily set up a job to slowly purge entries there, or a mysql event if people don't have the infrastructure for a job.

Thu, Feb 8, 11:02 AM · MediaWiki-Cache, MediaWiki-Database
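
A rough sketch of the "slowly purge entries" job suggested in the comment above. The table name `cache_entries`, its columns `keyname` and `exptime`, and the function `purge_expired` are assumptions made for illustration, and SQLite is used only to keep the example runnable on its own; a real job would run against the actual cache table and pick its own batch size and pause.

```python
import sqlite3
import time


def purge_expired(conn, now, batch_size=100, pause=0.0):
    """Delete expired rows in small batches, sleeping between batches.

    Returns the total number of rows deleted.
    """
    total = 0
    while True:
        keys = [
            row[0]
            for row in conn.execute(
                "SELECT keyname FROM cache_entries "
                "WHERE exptime < ? ORDER BY keyname LIMIT ?",
                (now, batch_size),
            )
        ]
        if not keys:
            return total
        conn.executemany(
            "DELETE FROM cache_entries WHERE keyname = ?",
            [(k,) for k in keys],
        )
        conn.commit()  # short transactions keep locks brief
        total += len(keys)
        time.sleep(pause)  # throttle so other writes are not starved


# Hypothetical data: 250 expired entries and 50 that are still valid.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cache_entries (keyname TEXT PRIMARY KEY, exptime INTEGER)"
)
conn.executemany(
    "INSERT INTO cache_entries (keyname, exptime) VALUES (?, ?)",
    [(f"old{i:04d}", 0) for i in range(250)]
    + [(f"new{i:04d}", 1000) for i in range(50)],
)
purged = purge_expired(conn, now=500, batch_size=100)
```

The pause between batches is the knob that trades purge speed against impact on other traffic.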
jcrespo added a comment to T186752: Swap objectcache table for MEMORY engine?.

I have 0 context for this, but even without context: never use MEMORY. InnoDB is going to be more efficient in every context. Ask if you want the long explanation, but the short explanation is that temporary tables in memory have been converted from MEMORY to InnoDB and gained performance. MEMORY is also a pain to maintain, because it can very easily break replication (restart -> failover -> write).

Thu, Feb 8, 11:00 AM · MediaWiki-Cache, MediaWiki-Database
jcrespo edited projects for T186764: refreshLinks/jobqueue issues in wmf.20 causing MW-reported replag, added: MediaWiki-Database; removed DBA.

@demon In addition to T186764#3954890, the fact that this happened exactly at code deploy means this is a MediaWiki-Database not a DBA issue. Check with the maintainers of that. Maybe some load balancer issue? CC @aaron

Thu, Feb 8, 10:35 AM · MW-1.31-release-notes (WMF-deploy-2018-02-06 (1.31.0-wmf.20)), Patch-For-Review, MediaWiki-JobQueue, MediaWiki-Database, Regression

Wed, Feb 7

jcrespo updated subscribers of T186596: dbstore1001 crashed: Multibit ECC errors were detected on the RAID controller..

FYI, I am currently on step 2, "Moving current backup files to dbstore2001" (/srv/backups/_mysqldump); I will take a break and wait for its completion tomorrow.

Wed, Feb 7, 6:38 PM · Patch-For-Review, ops-eqiad, DBA, Operations
jcrespo closed T186730: Change user u11106 to have max 1 open connection as Resolved.

Done on the 3 labsdb backends for wikireplicas (not on toolsdb).

SHOW GRANTS FOR 'u11106';
GRANT USAGE ON *.* TO 'u11106'@'%' ... WITH MAX_USER_CONNECTIONS 1;
Wed, Feb 7, 6:18 PM · DBA
jcrespo claimed T186730: Change user u11106 to have max 1 open connection.
Wed, Feb 7, 5:21 PM · DBA
jcrespo added a comment to T153182: Perform schema change to add externallinks.el_index_60 to all wikis.

Well, it was supposed to happen just after comments in our schedule, so sorry about that.

Wed, Feb 7, 3:47 PM · Patch-For-Review, Schema-change, Blocked-on-schema-change, DBA
jcrespo claimed T184697: Failover existing eqiad database backup system to the new codfw database logical backup system.
Wed, Feb 7, 3:42 PM · Patch-For-Review, DBA
jcrespo moved T184697: Failover existing eqiad database backup system to the new codfw database logical backup system from Next to In progress on the DBA board.
Wed, Feb 7, 3:42 PM · Patch-For-Review, DBA
jcrespo added a comment to T184666: DBA review for GlobalPreferences schema.

We are currently experiencing a bit of a crisis with several emergencies on core systems. We may have some delay in attending to feature requests due to that, as site reliability is taking precedence. I ask you to be patient, but contact us louder (through SOS or management) if that will block you a lot (but please only for unbreak now or similar).

Wed, Feb 7, 3:41 PM · Patch-For-Review, Community-Tech, MediaWiki-extensions-GlobalPreferences, Schema-change, DBA
jcrespo added a comment to T153182: Perform schema change to add externallinks.el_index_60 to all wikis.

We are currently experiencing a bit of a crisis with several emergencies on core systems. We may have some delay in attending to feature requests due to that, as site reliability is taking precedence. I ask you to be patient, but contact us louder (through SOS or management) if that will block you a lot (but please only for unbreak now or similar).

Wed, Feb 7, 3:40 PM · Patch-For-Review, Schema-change, Blocked-on-schema-change, DBA
jcrespo moved T162070: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases from Triage to Meta/Epic on the DBA board.
Wed, Feb 7, 3:40 PM · Patch-For-Review, Operations, DBA
jcrespo added a comment to T146055: Improve privilege separation for phabricator's config files and mysql credentials.

We need to catch up on a lot of pending DBA-phabricator tasks (this, some pending failovers, upgrades to stretch/mariadb 10.1), proxies, and the codfw deployment. Let's try to schedule some time to do all of those, but for now it will have to wait as we have other important fires going on.

Wed, Feb 7, 3:39 PM · Release-Engineering-Team (Kanban), DBA, Phabricator, Security
jcrespo moved T165358: Set up and package wmfmariadbpy helper scripts so they can easily be deployed to all database server and client hosts from Triage to Meta/Epic on the DBA board.
Wed, Feb 7, 3:37 PM · DBA
jcrespo closed T180380: "ERROR 2006 (HY000): MySQL server has gone away" failures for a variety of queries against Wiki Replica servers as Resolved.

Sadly, labsdb IS a hostile environment, and the way to solve this, which is to kill users abusing resources, also causes complaints, so there is no easy fix. I will consider this fixed, and we can continue talking about how to improve the architecture with a solution that works for all users on the list, wikis and other tickets.

Wed, Feb 7, 3:37 PM · DBA, Data-Services
jcrespo added a comment to T180636: Make Dispenser's principle_links table accessible in new Wiki replica cluster.

I think this could be added to the analytics databases without problem, but it may take some time due to other ongoing issues, plus the work needed to productionize this service. I ask you to be patient.

Wed, Feb 7, 3:34 PM · Data-Services, DBA
jcrespo closed T182948: Create method for accessing user watchlists in database queries, a subtask of T142807: Migrate all users to new Wiki Replica cluster and decommission old hardware, as Declined.
Wed, Feb 7, 3:02 PM · Patch-For-Review, Goal, cloud-services-team (FY2017-18), Data-Services, DBA
jcrespo closed T182948: Create method for accessing user watchlists in database queries as Declined.

I am going to decline this based on our own feedback, but that doesn't mean we cannot continue discussing other methods of doing what you need. However, the initial response to the request is: "do not write private data to databases, and even less to wikireplicas". My advice would be to copy data that is public to toolsdb and do the necessary joins there, but if possible, always keep user data in application memory, so it doesn't accidentally leak.

Wed, Feb 7, 3:02 PM · Data-Services, DBA
jcrespo moved T185084: Allow use of EtcdConfig to configure slave databases from Triage to Meta/Epic on the DBA board.
Wed, Feb 7, 2:59 PM · DBA, discovery-system, MediaWiki-Configuration, Operations
jcrespo moved T185673: Add base36 functions to ToolForge database from Backlog to Wiki replicas on the Data-Services board.
Wed, Feb 7, 2:58 PM · Data-Services, DBA
jcrespo added a comment to T185673: Add base36 functions to ToolForge database.

We are currently experiencing a bit of a crisis with several emergencies on core systems. We may have some delay in attending to feature requests due to that, as site reliability is taking precedence. I ask you to be patient, but contact us louder (through SOS or management) if that will block you a lot (but please only for unbreak now or similar).

Wed, Feb 7, 2:57 PM · Data-Services, DBA
jcrespo added a comment to T184680: Update duplicate handling in reading lists API.

We are currently experiencing a bit of a crisis with several emergencies on core systems. We may have some delay in attending to feature requests due to that, as site reliability is taking precedence. I ask you to be patient, but contact us louder (through SOS or management) if that will block you a lot (but please only for unbreak now or similar).

Wed, Feb 7, 2:56 PM · DBA, Reading-Infrastructure-Team-Backlog (Kanban), Reading List Service
jcrespo moved T173943: Display count of remaining content space errors from Triage to Blocked external/Not db team on the DBA board.
Wed, Feb 7, 2:52 PM · DBA, Patch-For-Review, MediaWiki-extensions-Linter
jcrespo added a comment to T186596: dbstore1001 crashed: Multibit ECC errors were detected on the RAID controller..

@Marostegui, let me know what you think of the plan:

  • Deploy the above patch
  • Move current backup files to dbstore2001
  • Reimage and format all partitions of dbstore1001, including a stretch upgrade
  • In parallel, try to do the goal: T184697 and its related tickets.
Wed, Feb 7, 2:50 PM · Patch-For-Review, ops-eqiad, DBA, Operations
jcrespo removed a project from T109179: Migrate MySQLs to use ROW-based replication: Goal.

This is important, but not a goal for this quarter. We are still blocked on mediawiki extension maintainers making their code compatible with it; however, all databases (misc, x1, parsercache, es) have meanwhile been migrated to ROW already, with great success.

Wed, Feb 7, 2:47 PM · Operations, DBA
jcrespo updated subscribers of T159423: Meta ticket: Migrate multi-source database hosts to multi-instance.

Only dbstore1002 pending, which we are in talks with Analytics to replace soon CC @Ottomata @elukey (not sure if there is a ticket already).

Wed, Feb 7, 2:42 PM · Epic, DBA
jcrespo renamed T176243: Decommission database hosts <= db2031 (tracking) from Decommission database hosts < db2030 (tracking) to Decommission database hosts <= db2031 (tracking).
Wed, Feb 7, 2:40 PM · Patch-For-Review, Goal, DBA
jcrespo closed T159430: convert dbstore1001 to multi-instance + InnoDB compressed by importing db shards to it as Declined.

T186596 happened; we should decline this and create a new one for setting up the 2 new provisioning hosts.

Wed, Feb 7, 2:38 PM · DBA
jcrespo claimed T186596: dbstore1001 crashed: Multibit ECC errors were detected on the RAID controller..

Claiming it for cleanup purposes only.

Wed, Feb 7, 2:33 PM · Patch-For-Review, ops-eqiad, DBA, Operations
jcrespo added a comment to T186685: Remove deleted wikis from wikireplicas.

I do, we can reverse that if we want, but that would take more effort (reimporting all deleted wikis). I would prefer to work towards deleting them from production, too :-).

Wed, Feb 7, 12:33 PM · Data-Services, DBA
jcrespo added a comment to T186675: Add 'centralauth' to meta_p.wiki so that apps can re-use the appropriate slice.

@Krinkle, in fact, replica maintenance scripts do fall into cloud team territory, so not much we can help here. But it seems like a reasonable request to me, centralauth should be on s7, and while I could manually add it easily, I prefer them to do it, as it probably needs code changes and would be lost on the next change.

Wed, Feb 7, 12:30 PM · Toolforge, Data-Services, cloud-services-team
jcrespo updated subscribers of T186689: toolforge: xtools: exceed max database connections.

@aborrero Let the maintainers (@Cyberpower678 @Matthewrbowker @MusikAnimal @Samwilson @Technical13) triage this. 20 connections are already a lot (more than the default), so maybe they can implement some queuing; the dbs are up and running, so there is nothing we DBAs can do for now.

Wed, Feb 7, 12:20 PM · XTools
jcrespo added a comment to T182916: Database error: Unable to connect to s7.web.db.svc.eqiad.wmflabs.

As I said, 10 increased to 15-20 is something I am open to doing, and we did it for others, e.g. if more than 10 people use it at the same time, that is reasonable.

Wed, Feb 7, 9:44 AM · cloud-services-team (Kanban), DBA, Toolforge, Data-Services, Tool-Global-user-contributions
jcrespo added a comment to T186685: Remove deleted wikis from wikireplicas.

Why close this? This is perfectly valid!

Wed, Feb 7, 9:26 AM · Data-Services, DBA
jcrespo added a comment to T186685: Remove deleted wikis from wikireplicas.

T181925

Wed, Feb 7, 9:23 AM · Data-Services, DBA

Tue, Feb 6

jcrespo added a comment to T182916: Database error: Unable to connect to s7.web.db.svc.eqiad.wmflabs.

No, I do not expect you to close the connection after every query, but I do expect you, unlike on dedicated databases, to close and not reuse connections OR to limit the number of open connections to fewer than 5-10 (a bit more if the resources are slightly increased). Persistent connections ("reusing connections") were, and still are, disallowed, as reflected on https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#Connection_handling_policy even if that negatively impacts your latency. Please also provide your account name; it is possible that other problems are happening, like you getting banned automatically by the db if you create too much traffic, etc.

Tue, Feb 6, 9:33 PM · cloud-services-team (Kanban), DBA, Toolforge, Data-Services, Tool-Global-user-contributions
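
One way a tool could follow the connection handling described in the comment above, sketched under assumptions: the class `BoundedConnections` and its `max_open` parameter are hypothetical, and `sqlite3.connect` is only a stand-in for whatever driver a tool really uses. Each unit of work opens a fresh connection, closes it when done (no persistent reuse), and a semaphore makes extra callers wait instead of opening more connections than the cap.

```python
import sqlite3
import threading
from contextlib import contextmanager


class BoundedConnections:
    """Hand out short-lived connections, never more than max_open at once."""

    def __init__(self, connect, max_open=5):
        self._connect = connect  # zero-argument callable returning a connection
        self._slots = threading.BoundedSemaphore(max_open)

    @contextmanager
    def connection(self):
        # Block here (acting as a simple queue) when the cap is reached.
        self._slots.acquire()
        try:
            conn = self._connect()
            try:
                yield conn
            finally:
                conn.close()  # always close; connections are not reused
        finally:
            self._slots.release()


# Stand-in connection factory; a real tool would connect to its replica here.
pool = BoundedConnections(lambda: sqlite3.connect(":memory:"), max_open=5)
with pool.connection() as conn:
    result = conn.execute("SELECT 1").fetchone()[0]
```

Because waiting callers block on the semaphore, a traffic spike turns into queued requests rather than a burst of new database connections.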
jcrespo added a comment to T186585: Review m5 backups.

Thanks.

Tue, Feb 6, 6:49 PM · DBA, cloud-services-team (Kanban), Data-Services
jcrespo added a comment to T186585: Review m5 backups.

I will add labsdbaccounts, make sure there is at least 1 full backup and unblock T183029

Tue, Feb 6, 5:20 PM · DBA, cloud-services-team (Kanban), Data-Services
jcrespo raised the priority of T186585: Review m5 backups from Normal to High.
Tue, Feb 6, 9:50 AM · DBA, cloud-services-team (Kanban), Data-Services
jcrespo added a comment to T186579: labsdb1010 crashed.
Time will dictate if we actually need to rebuild this host if more errors show up after the crash.
Tue, Feb 6, 9:47 AM · Data-Services, DBA
jcrespo added a comment to T186585: Review m5 backups.

heartbeat, information_schema, performance_schema and mysql are system databases that need no sql-backups. percona may contain some checksums, but probably should be deleted and checked again.

Tue, Feb 6, 9:44 AM · DBA, cloud-services-team (Kanban), Data-Services
jcrespo triaged T186585: Review m5 backups as Normal priority.
Tue, Feb 6, 9:42 AM · DBA, cloud-services-team (Kanban), Data-Services