tstarling (Tim Starling) Administrator
User

Projects (17)

User Details

User Since
Oct 15 2014, 8:27 PM (208 w, 6 d)
Roles
Administrator
Availability
Available
LDAP User
Tim Starling
MediaWiki User
Tim Starling (WMF)

Recent Activity

Wed, Oct 10

tstarling created T206586: Write LuaSandbox DocBook manual.
Wed, Oct 10, 1:34 AM · Patch-For-Review, Documentation, Core Platform Team, LuaSandbox

Thu, Sep 27

tstarling added a comment to T205059: Excimer: new profiler for PHP.

I think there should also be ExcimerProfiler::flush(), which detaches the log and returns it, similar to what happens on an implicit flush. The theory is that ExcimerProfiler::stop() will leave the log still attached to the profiler, so this:

Thu, Sep 27, 4:06 AM · Core Platform Team Kanban (Doing), Patch-For-Review, Performance-Team (Radar), PHP 7.1 support

Wed, Sep 26

kostajh awarded T176370: Migrate to PHP 7 in WMF production a Love token.
Wed, Sep 26, 6:42 PM · Patch-For-Review, Core Platform Team Backlog (Watching / External), TechCom-RFC (TechCom-Approved), User-ArielGlenn, HHVM, Operations

Tue, Sep 25

tstarling closed T97192: HHVM request timeouts not working; support lowering the API request timeout per request as Resolved.
Tue, Sep 25, 1:07 PM · User-notice, Performance-Team (Radar), Patch-For-Review, User-Joe, Operations, Services (watching), Wikimedia-Incident, HHVM, Availability, MediaWiki-API
tstarling closed T97192: HHVM request timeouts not working; support lowering the API request timeout per request, a subtask of T97204: RFC: Request timeouts and retries, as Resolved.
Tue, Sep 25, 1:07 PM · Services (watching), Wikimedia-Incident, TechCom-RFC (TechCom-Approved), Proposal, Operations, RESTBase, Availability, Performance, Incident-20150423-Commons, Service-Architecture
tstarling created T205370: PHP 7 object layout/allocation error in LuaSandbox.
Tue, Sep 25, 6:37 AM · Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018), PHP 7.0 support, LuaSandbox

Mon, Sep 24

tstarling added a comment to T205059: Excimer: new profiler for PHP.

I'm planning the timer backend component. An interesting wrinkle is ZTS support. As in LuaSandbox, we can have an integer ID (sival_int) with our timer struct stored in a hashtable, with a lock protecting it from concurrent updates. Instead of setting a hook in a lua_State, we need to store &EG(vm_interrupt) in the timer struct, since in PHP 7.0+ it is declared with __thread, so taking the address of it is the only way to transport it to the handler thread. Then when the zend_interrupt_function() hook is called, the hook function will need to find all the ExcimerTimer/ExcimerProfiler instances associated with the local thread that have pending events -- this was not a problem with LuaSandbox which only had one "timer set" per lua_State.
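
A rough Python model of the ZTS scheme described above may help make it concrete. It assumes that a `threading.Event` can stand in for the owning thread's `EG(vm_interrupt)` flag, and that the handler thread receives only the integer timer ID, as with `sival_int`. `TimerRegistry`, `TimerEntry` and their methods are hypothetical names. The real component would be C inside the PHP extension, so treat this as a sketch of the data flow rather than an implementation.

```python
import threading
from dataclasses import dataclass


@dataclass
class TimerEntry:
    owner: int                   # ident of the thread that created the timer
    interrupt: threading.Event   # stand-in for the owner's &EG(vm_interrupt)
    pending: int = 0             # expirations the owner has not handled yet


class TimerRegistry:
    """Integer timer ID -> TimerEntry, with one lock guarding all updates."""

    def __init__(self):
        self._lock = threading.Lock()
        self._timers = {}
        self._next_id = 1

    def register(self, interrupt):
        """Called on the owner thread; returns the integer ID (cf. sival_int)."""
        with self._lock:
            timer_id = self._next_id
            self._next_id += 1
            self._timers[timer_id] = TimerEntry(threading.get_ident(), interrupt)
            return timer_id

    def unregister(self, timer_id):
        with self._lock:
            self._timers.pop(timer_id, None)

    def on_expire(self, timer_id):
        """Called on the handler thread, which receives only the integer ID."""
        with self._lock:
            entry = self._timers.get(timer_id)
            if entry is None:
                return  # timer was removed before the event arrived
            entry.pending += 1
            entry.interrupt.set()  # request that the owner run its interrupt hook

    def collect_for_current_thread(self):
        """Interrupt hook: gather all of this thread's timers with pending events."""
        me = threading.get_ident()
        fired = []
        with self._lock:
            for timer_id, entry in self._timers.items():
                if entry.owner != me:
                    continue
                entry.interrupt.clear()
                if entry.pending:
                    fired.append((timer_id, entry.pending))
                    entry.pending = 0
        return fired
```

The final loop addresses the difference from LuaSandbox: a thread can own several timers, so the interrupt hook scans for every timer belonging to the current thread that has pending events, rather than consulting a single timer set.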

Mon, Sep 24, 11:35 AM · Core Platform Team Kanban (Doing), Patch-For-Review, Performance-Team (Radar), PHP 7.1 support
tstarling added a comment to T186302: Promote LuaSandbox as its own project, separate from Scribunto.

I started the process of adding LuaSandbox to PECL: https://marc.info/?l=pecl-dev&m=153776610925078&w=2

Mon, Sep 24, 5:26 AM · Patch-For-Review, LuaSandbox, Librarization, MediaWiki-extensions-Scribunto
tstarling added a comment to T205059: Excimer: new profiler for PHP.

The main reason to use a flush callback is for real-time analysis of overload events. The problem we've had in the past is that if profiling data is only logged at the end of the request, the requests that are timing out are invisible. If we log once every 10 seconds, we can get a realistic snapshot of what the cluster is doing.
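
The following is a minimal sketch of time-based flushing. It assumes a profiler that appends samples to an in-memory log and passes the detached log to a callback once the flush period has elapsed. `PeriodicLog`, `flush_period` and `on_flush` are hypothetical names, and the clock is injectable so the behaviour can be tested without waiting.

```python
import time
from typing import Callable, List


class PeriodicLog:
    """Collects samples and flushes them every flush_period seconds (hypothetical)."""

    def __init__(self, flush_period: float,
                 on_flush: Callable[[List[str]], None],
                 clock: Callable[[], float] = time.monotonic):
        self.flush_period = flush_period
        self.on_flush = on_flush
        self.clock = clock
        self.log: List[str] = []
        self.last_flush = clock()

    def add_sample(self, stack: str) -> None:
        self.log.append(stack)
        # Flush mid-request, so a request that later times out has still been logged.
        if self.clock() - self.last_flush >= self.flush_period:
            self.flush()

    def flush(self) -> List[str]:
        """Detach the current log, hand it to the callback, and start a new one."""
        detached, self.log = self.log, []
        self.last_flush = self.clock()
        if detached:
            self.on_flush(detached)
        return detached
```

With a 10-second period, samples from a long-running request reach the callback in batches while the request is still running, instead of only at the end.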

Mon, Sep 24, 3:22 AM · Core Platform Team Kanban (Doing), Patch-For-Review, Performance-Team (Radar), PHP 7.1 support

Sat, Sep 22

Krinkle awarded T205059: Excimer: new profiler for PHP an Orange Medal token.
Sat, Sep 22, 11:50 PM · Core Platform Team Kanban (Doing), Patch-For-Review, Performance-Team (Radar), PHP 7.1 support

Fri, Sep 21

tstarling added a subtask for T176916: Set up sampling profiler for PHP 7 (alternative to HHVM Xenon): T205059: Excimer: new profiler for PHP.
Fri, Sep 21, 6:53 AM · Core Platform Team Kanban (Doing), PHP 7.1 support, Core Platform Team (PHP7 (TEC4)), Performance-Team
tstarling added a parent task for T205059: Excimer: new profiler for PHP: T176916: Set up sampling profiler for PHP 7 (alternative to HHVM Xenon).
Fri, Sep 21, 6:53 AM · Core Platform Team Kanban (Doing), Patch-For-Review, Performance-Team (Radar), PHP 7.1 support
tstarling created T205059: Excimer: new profiler for PHP.
Fri, Sep 21, 6:44 AM · Core Platform Team Kanban (Doing), Patch-For-Review, Performance-Team (Radar), PHP 7.1 support

Wed, Sep 19

tstarling committed rMSPC9f85eb972148: Add man page (authored by Legoktm).
Add man page
Wed, Sep 19, 6:29 AM

Sep 12 2018

tstarling added a comment to T151291: "User::loadFromSession called before the end of Setup.php" warning due to AbuseFilter.

For createaccount/autocreateaccount filtering, shouldn't the log performer always be anonymous? It doesn't make sense to use a non-existent, half-created user as the performer. That logic shouldn't depend on User::isSafeToLoad(), which is implemented in a hackish way; it should just depend on $action.

Sep 12 2018, 11:23 PM · Core Platform Team (Security, stability, performance and scalability (TEC1)), Core Platform Team Backlog (Later), User-Daimona, AbuseFilter, Wikimedia-production-error
tstarling moved T151291: "User::loadFromSession called before the end of Setup.php" warning due to AbuseFilter from Inbox to Backlog on the Core-Platform-Team-Old board.
Sep 12 2018, 1:38 AM · Core Platform Team (Security, stability, performance and scalability (TEC1)), Core Platform Team Backlog (Later), User-Daimona, AbuseFilter, Wikimedia-production-error
tstarling moved T78802: Localization Cache Redo from Inbox to Backlog on the Core-Platform-Team-Old board.
Sep 12 2018, 1:35 AM · Core Platform Team ( Code Health (TEC13)), Core Platform Team Backlog (Epic), Release-Engineering-Team, Deployments, MediaWiki-extensions-LocalisationUpdate, I18n, Epic
tstarling moved T158360: RFC: Reevaluate LocalisationUpdate extension for WMF from Inbox to Backlog on the Core-Platform-Team-Old board.
Sep 12 2018, 1:35 AM · Core Platform Team ( Code Health (TEC13)), Core Platform Team Backlog (Later), Release-Engineering-Team, Deployments, TechCom-RFC, I18n
tstarling moved T203356: Sort out semantics of causeAgent and triggeringUser/triggeringRevisionId from Inbox to Watching on the Core-Platform-Team-Old board.
Sep 12 2018, 1:20 AM · Core Platform Team Backlog (Watching / External), Technical-Debt, MediaWiki-Documentation
tstarling removed a project from T203929: cpjobqueue should log a warning when there is an HTTP error: Core-Platform-Team-Old.
Sep 12 2018, 1:16 AM · Services (doing), ChangeProp
tstarling moved T203935: Support Save-Data header from Inbox to Watching on the Core-Platform-Team-Old board.
Sep 12 2018, 1:15 AM · Core Platform Team Backlog (Watching / External), TimedMediaHandler, MediaWiki-Parser, Reading-Infrastructure-Team-Backlog, Readers-Web-Backlog (Tracking), MobileFrontend
tstarling moved T204072: parser tests should reset service locator from Inbox to Backlog on the Core-Platform-Team-Old board.
Sep 12 2018, 1:13 AM · Multi-Content-Revisions (MCR-SDC File Caption Support - phase 2), MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Core-Platform-Team-Old, MediaWiki-Core-Tests

Sep 11 2018

tstarling added a comment to T203929: cpjobqueue should log a warning when there is an HTTP error.

It's very mysterious. My best guess is that "PHP fatal error: unlink(1): No such file or directory" was actually a suppressed warning, and was later misinterpreted as a fatal. There are definitely no entries in fatal.log with "unlink" in the message.

Sep 11 2018, 6:25 AM · Services (doing), ChangeProp
tstarling added a comment to T203929: cpjobqueue should log a warning when there is an HTTP error.

It was webVideoTranscodePrioritized. This is the JobExecutor log line for one of the affected jobs:

Sep 11 2018, 5:46 AM · Services (doing), ChangeProp
tstarling updated the task description for T204010: Timeouts in wikidiff2.
Sep 11 2018, 3:08 AM · Wikimedia-production-error, wikidiff2
tstarling added a comment to T181454: Port wikidiff2 to a memory-safe language.

I would be interested in those logs, can you point me to them?

Sep 11 2018, 1:16 AM · MediaWiki-History-or-Diffs, wikidiff2
tstarling added a comment to T204010: Timeouts in wikidiff2.

You can find the logs by going to https://logstash.wikimedia.org, selecting "Dev Tools" and entering the query

Sep 11 2018, 1:16 AM · Wikimedia-production-error, wikidiff2
tstarling created T204010: Timeouts in wikidiff2.
Sep 11 2018, 1:10 AM · Wikimedia-production-error, wikidiff2

Sep 10 2018

tstarling added a comment to T181454: Port wikidiff2 to a memory-safe language.

I'm not especially concerned about wikidiff2, since it's written in a defensive way and has mostly been reviewed for security. We are passing user input to much more horrifying C code, like exif, which has the worst pointer arithmetic tricks I have ever seen and has been the subject of multiple security vulnerabilities.

Sep 10 2018, 8:31 AM · MediaWiki-History-or-Diffs, wikidiff2
tstarling updated subscribers of T203930: EchoDiscussionParser is slow, causes timeouts.

The diff in question has a large number of diff ops, but only one section. So for every diff op, getSectionStartIndex() and getSectionEndIndex() have to run a regex match on almost every line in the page, looking for headings. I think a better algorithm would be to have a cache giving the section start index for each line. If a line is not in the cache, then do the preg_match() on the current line, and if it is not a section header, repeat for the previous line with a new cache fetch. And do the same for getSectionEndIndex() looking forwards.
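
Below is a minimal sketch of the cached lookup proposed above, written in Python rather than the extension's PHP. The heading regex is a simplified assumption (a line wrapped in matching runs of "="). `SectionIndex` and its methods are hypothetical names, not EchoDiscussionParser's API. Each walk stops at the first cached line or heading and caches every line it visited, so repeated per-diff-op lookups no longer rescan the whole page.

```python
import re

# Assumed heading syntax: a line wrapped in matching runs of "=" (simplified).
HEADING_RE = re.compile(r"^(=+).+\1\s*$")


class SectionIndex:
    """Per-line caches of section start/end indexes (hypothetical helper)."""

    def __init__(self, lines):
        self.lines = lines
        self._start = {}  # line index -> index of the heading that opens its section
        self._end = {}    # line index -> index of the last line of its section

    def _is_heading(self, i):
        return HEADING_RE.match(self.lines[i]) is not None

    def section_start(self, i):
        visited = []
        j = i
        while True:
            if j in self._start:          # cache hit: reuse earlier work
                result = self._start[j]
                break
            visited.append(j)
            if j == 0 or self._is_heading(j):
                result = j
                break
            j -= 1                        # not a heading: check the previous line
        for k in visited:
            self._start[k] = result
        return result

    def section_end(self, i):
        last = len(self.lines) - 1
        visited = []
        j = i
        while True:
            if j in self._end:
                result = self._end[j]
                break
            visited.append(j)
            if j == last or self._is_heading(j + 1):
                result = j
                break
            j += 1                        # next line is not a heading: keep going
        for k in visited:
            self._end[k] = result
        return result
```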

Sep 10 2018, 5:29 AM · Patch-For-Review, Performance-Team (Radar), Notifications, Growth-Team
tstarling created T203930: EchoDiscussionParser is slow, causes timeouts.
Sep 10 2018, 4:57 AM · Patch-For-Review, Performance-Team (Radar), Notifications, Growth-Team
tstarling added projects to T203929: cpjobqueue should log a warning when there is an HTTP error: Core-Platform-Team-Old, Services.
Sep 10 2018, 2:58 AM · Services (doing), ChangeProp
tstarling created T203929: cpjobqueue should log a warning when there is an HTTP error.
Sep 10 2018, 2:57 AM · Services (doing), ChangeProp

Sep 7 2018

tstarling committed rMSPC07a9f0fef273: Improve test comments (authored by tstarling).
Improve test comments
Sep 7 2018, 7:04 AM
tstarling committed rMSPC05de76ad0043: Improve test comments (authored by tstarling).
Improve test comments
Sep 7 2018, 7:04 AM

Sep 6 2018

tstarling reopened T97192: HHVM request timeouts not working; support lowering the API request timeout per request as "Open".

It's not fixed, or has regressed. I noticed this today due to T203628 and confirmed it by placing a simple infinite loop script in mwdebug1002's /w directory.

Sep 6 2018, 7:17 AM · User-notice, Performance-Team (Radar), Patch-For-Review, User-Joe, Operations, Services (watching), Wikimedia-Incident, HHVM, Availability, MediaWiki-API
tstarling reopened T97192: HHVM request timeouts not working; support lowering the API request timeout per request, a subtask of T97204: RFC: Request timeouts and retries, as Open.
Sep 6 2018, 7:17 AM · Services (watching), Wikimedia-Incident, TechCom-RFC (TechCom-Approved), Proposal, Operations, RESTBase, Availability, Performance, Incident-20150423-Commons, Service-Architecture
tstarling awarded T190111: VirtualHost for mod_status breaks debugging Apache/MediaWiki from localhost a Cookie token.
Sep 6 2018, 5:26 AM · Performance-Team (Radar), Wikimedia-Apache-configuration, Operations
tstarling closed T203628: Infinite loop in quiz shuffleAnswers as Resolved.
Sep 6 2018, 4:17 AM · MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Patch-For-Review, MediaWiki-extensions-Quiz
tstarling created T203628: Infinite loop in quiz shuffleAnswers.
Sep 6 2018, 3:20 AM · MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Patch-For-Review, MediaWiki-extensions-Quiz

Sep 5 2018

tstarling assigned T203424: Replace the WikiExporter backup dump streaming mode with batched queries to BPirkle.
Sep 5 2018, 2:12 AM · Core Platform Team Kanban (Done with CPT), MW-1.32-notes (WMF-deploy-2018-10-02 (1.32.0-wmf.24)), Core Platform Team (Security, stability, performance and scalability (TEC1)), Patch-For-Review, MediaWiki-Export-or-Import

Sep 4 2018

tstarling closed T202641: Allow NameTableStores to be reset in sync with their associated database tables in unit tests. as Resolved.
Sep 4 2018, 11:25 PM · MW-1.32-notes (WMF-deploy-2018-09-04 (1.32.0-wmf.20)), Patch-For-Review, MediaWiki-Core-Tests
tstarling created T203424: Replace the WikiExporter backup dump streaming mode with batched queries.
Sep 4 2018, 4:43 AM · Core Platform Team Kanban (Done with CPT), MW-1.32-notes (WMF-deploy-2018-10-02 (1.32.0-wmf.24)), Core Platform Team (Security, stability, performance and scalability (TEC1)), Patch-For-Review, MediaWiki-Export-or-Import

Aug 28 2018

jcrespo awarded T183488: MCR schema migration stage 2: populate new fields an Evil Spooky Haunted Tree token.
Aug 28 2018, 1:57 PM · MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018), Multi-Content-Revisions (Deployment), Patch-For-Review, SDC General, Wikidata

Aug 27 2018

tstarling updated subscribers of T202641: Allow NameTableStores to be reset in sync with their associated database tables in unit tests..

@CCicalese_WMF asked me to look at this. Tracking all NameTableStore objects looked easy enough, and is beneficial for other reasons, so I did that in the patch linked above. However, without existing calling code, I'm not sure exactly what @Addshore needs for a reset/reload interface and where it would need to be called, so maybe he could add that in a dependent patch.

Aug 27 2018, 5:38 AM · MW-1.32-notes (WMF-deploy-2018-09-04 (1.32.0-wmf.20)), Patch-For-Review, MediaWiki-Core-Tests

Aug 24 2018

tstarling added a comment to T202546: Requesting access to restricted production access for Bill Pirkle.

I approve of this request. As the task description says, I recommended it. I note that @Fjalapeno is Bill's manager, so he should sign off on it; I'm just a tech lead now.

Aug 24 2018, 2:58 AM · Patch-For-Review, Operations, SRE-Access-Requests

Aug 23 2018

awight awarded T185607: Provide an inline discussion feature, "DiscussThis" a Doubloon token.
Aug 23 2018, 10:18 PM · Growth-Team, VisualEditor-MediaWiki-Plugins, Collaboration-Team-Triage, StructuredDiscussions, VisualEditor, TechCom-RFC

Aug 22 2018

tstarling added a comment to T202483: www.mediawiki.org showing: error:Unknown database 'wikidatawiki' on shard: s3.

BlobStoreFactory has a single LoadBalancer injected into its constructor, but allows the caller to choose the wiki ID in newSqlBlobStore(). So that's wrong. Wikidata just gets the BlobStoreFactory from MediaWikiServices::getBlobStoreFactory(), which doesn't allow you to specify the wiki ID and just gives a BlobStoreFactory with the LoadBalancer for the current wiki. BlobStoreFactory could take an LBFactory instead, which would allow newSqlBlobStore() to fetch the correct LoadBalancer.
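
Here is a minimal Python model of the proposed dependency change, using hypothetical stand-in classes rather than MediaWiki's PHP API. The factory holds an LB factory and resolves the load balancer from the requested wiki ID, so a store created for another wiki cannot end up querying the local wiki's shard. The cluster name "cluster-x" in the check below is made up.

```python
class LoadBalancer:
    def __init__(self, cluster: str):
        self.cluster = cluster


class LBFactory:
    """Hands out one load balancer per database cluster (hypothetical model)."""

    def __init__(self, wiki_to_cluster: dict):
        self.wiki_to_cluster = wiki_to_cluster
        self._lbs = {}

    def get_main_lb(self, wiki_id: str) -> LoadBalancer:
        cluster = self.wiki_to_cluster[wiki_id]
        if cluster not in self._lbs:
            self._lbs[cluster] = LoadBalancer(cluster)
        return self._lbs[cluster]


class SqlBlobStore:
    def __init__(self, lb: LoadBalancer, wiki_id: str):
        self.lb = lb
        self.wiki_id = wiki_id


class BlobStoreFactory:
    def __init__(self, lb_factory: LBFactory, local_wiki_id: str):
        self.lb_factory = lb_factory
        self.local_wiki_id = local_wiki_id

    def new_sql_blob_store(self, wiki_id=None) -> SqlBlobStore:
        wiki_id = wiki_id or self.local_wiki_id
        # Resolve the load balancer from the requested wiki, not the local one.
        return SqlBlobStore(self.lb_factory.get_main_lb(wiki_id), wiki_id)
```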

Aug 22 2018, 6:36 AM · MW-1.32-notes (WMF-deploy-2018-08-28 (1.32.0-wmf.19)), User-Addshore, Patch-For-Review, Wikidata, Wikimedia-production-error, Release, Release-Engineering-Team (Kanban), Train Deployments
tstarling updated subscribers of T202483: www.mediawiki.org showing: error:Unknown database 'wikidatawiki' on shard: s3.
Aug 22 2018, 6:25 AM · MW-1.32-notes (WMF-deploy-2018-08-28 (1.32.0-wmf.19)), User-Addshore, Patch-For-Review, Wikidata, Wikimedia-production-error, Release, Release-Engineering-Team (Kanban), Train Deployments
tstarling added a comment to T202483: www.mediawiki.org showing: error:Unknown database 'wikidatawiki' on shard: s3.

Passing the wrong LoadBalancer into the SqlBlobStore constructor would have approximately this effect. The "previous.trace" is:

Aug 22 2018, 6:24 AM · MW-1.32-notes (WMF-deploy-2018-08-28 (1.32.0-wmf.19)), User-Addshore, Patch-For-Review, Wikidata, Wikimedia-production-error, Release, Release-Engineering-Team (Kanban), Train Deployments
tstarling added a comment to T199383: WaitConditionLoop callers need to log on timeout.

The above patches only address BagOStuff. There is also:

Aug 22 2018, 4:23 AM · Core Platform Team (Security, stability, performance and scalability (TEC1)), Core Platform Team Backlog (Later), MW-1.32-notes (WMF-deploy-2018-07-24 (1.32.0-wmf.14)), Patch-For-Review, MediaWiki-General-or-Unknown, Performance-Team (Radar), Wikimedia-Incident
tstarling added a comment to T202107: Job queue should not overload the DB servers when there is replication lag.

You could measure concurrency by summing a timing metric, like I did for the "load" metrics in the API dashboard: https://grafana.wikimedia.org/dashboard/db/api-backend-summary?refresh=5m&orgId=1 . I used:
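
The idea behind summing a timing metric can be shown with a small calculation. The total time spent inside requests during a window, divided by the window length, gives the average number of requests in flight (Little's law). This sketch assumes each sample is the duration in seconds of a request that completed within the window. It illustrates the arithmetic only and is not the dashboard's actual query.

```python
def average_concurrency(durations_s, window_s):
    """Mean number of requests in flight over a window of window_s seconds."""
    if window_s <= 0:
        raise ValueError("window must be positive")
    return sum(durations_s) / window_s


# 300 requests of 2 s each finishing within a 60 s window -> 10 in flight on average.
print(average_concurrency([2.0] * 300, 60))
```

Requests that straddle a window boundary are counted entirely in the window where they finish, so very short windows give noisy results.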

Aug 22 2018, 1:23 AM · WMF-JobQueue, Services (doing), Patch-For-Review, ChangeProp, Availability
tstarling added a comment to T172497: Fix mediawiki heartbeat model, change pt-heartbeat model to not use super-user, avoid SPOF and switch automatically to the real master without puppet dependency.

Does it have to use the same table definition? To measure lag, MediaWiki uses

Aug 22 2018, 1:11 AM · Core Platform Team Backlog (Watching / External), Performance-Team (Radar), MediaWiki-Database, Wikimedia-Incident, DBA

Aug 21 2018

tstarling updated subscribers of T73010: Don't use the same 'ArticleView' poolcounter for anonymous and logged in users.

Can anyone recall the rationale for this? @Anomie thinks the task was created as an action item from a MediaWiki Core Team meeting. If there were notes from that meeting, then that might help.

Aug 21 2018, 2:48 AM · Core Platform Team Kanban (Done with CPT), MediaWiki-Interface, PoolCounter
tstarling added a comment to T198176: Mediawiki page deletions should happen in batches of revisions.

For testing after the new feature reaches group0, I note that there are a few pages with a large number of revisions on test.wikipedia.org:

Aug 21 2018, 2:45 AM · Core Platform Team Kanban (Doing), MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26)), Core Platform Team ( Code Health (TEC13)), Wikimedia-production-error, Patch-For-Review, MediaWiki-Page-deletion
tstarling reassigned T202032: Duplicate ar_rev_id values in several wikis from tstarling to Anomie.
Aug 21 2018, 1:37 AM · Multi-Content-Revisions (Deployment), MW-1.32-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), Patch-For-Review, Wikidata, Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018), SDC General
tstarling moved T73010: Don't use the same 'ArticleView' poolcounter for anonymous and logged in users from Inbox to CPT TEC1 Backlog on the Core-Platform-Team-Old board.
Aug 21 2018, 1:26 AM · Core Platform Team Kanban (Done with CPT), MediaWiki-Interface, PoolCounter
tstarling removed a project from T106244: URL encoded values using fallback 8-bit encoding (invalid UTF-8) cause mediawiki.Uri to crash: Core-Platform-Team-Old.
Aug 21 2018, 1:23 AM · MediaWiki-General-or-Unknown, Performance-Team, JavaScript
tstarling moved T202352: Convert MultiHttpClient to use Guzzle from Inbox to CPT-Q1-Jul-Sep-2018 on the Core-Platform-Team-Old board.
Aug 21 2018, 1:20 AM · Core Platform Team Kanban (Doing), Core Platform Team ( Code Health (TEC13)), Patch-For-Review, MediaWiki-General-or-Unknown

Aug 20 2018

tstarling added a comment to T183488: MCR schema migration stage 2: populate new fields.

enwiki is complete now, so only the T202032 wikis remain.

Aug 20 2018, 4:32 AM · MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018), Multi-Content-Revisions (Deployment), Patch-For-Review, SDC General, Wikidata

Aug 17 2018

tstarling added a comment to T202107: Job queue should not overload the DB servers when there is replication lag.

I can stop replication or partially bring one server down and show it, but @tstarling won't let me for now.

Aug 17 2018, 10:35 AM · WMF-JobQueue, Services (doing), Patch-For-Review, ChangeProp, Availability
tstarling added a comment to T202107: Job queue should not overload the DB servers when there is replication lag.

I think I've found the correct configuration file now, at mediawiki/services/change-propagation/jobqueue-deploy/scap/vars.yaml . I couldn't tell whether the concurrency limits are normally reached, and I couldn't figure out how they add up to a global connection count. Looking at current connection counts from scb* to jobrunner.svc with netstat, I see counts of 113, 318, 52 and 107. MediaWiki has 60 job types; is it correct to multiply that by 30, which is the top-level concurrency in vars.yaml, and then to adjust for the overridden queue types? 55 classes with 30 connections each plus the 5 overrides would make 1970 connections. Then I multiply this by 4 scb servers, for a total of 7880 maximum connections. Is this correct?
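
The arithmetic in the question is written out below. The comment does not give the 320 connections for the five overridden queues directly; that figure is implied by the 1970 total.

```latex
\begin{aligned}
55 \times 30 &= 1650 \\
1970 - 1650 &= 320 \quad \text{(implied total for the 5 overridden queues)} \\
1970 \times 4 &= 7880 \quad \text{(across 4 scb servers)}
\end{aligned}
```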

Aug 17 2018, 5:50 AM · WMF-JobQueue, Services (doing), Patch-For-Review, ChangeProp, Availability
tstarling changed the visibility for T202107: Job queue should not overload the DB servers when there is replication lag.
Aug 17 2018, 1:56 AM · WMF-JobQueue, Services (doing), Patch-For-Review, ChangeProp, Availability
tstarling triaged T202107: Job queue should not overload the DB servers when there is replication lag as Normal priority.
Aug 17 2018, 1:18 AM · WMF-JobQueue, Services (doing), Patch-For-Review, ChangeProp, Availability

Aug 16 2018

tstarling added a comment to T183488: MCR schema migration stage 2: populate new fields.

Current status: everything is done except enwiki and the T202032 wikis. enwiki has about another 49 hours to run.

Aug 16 2018, 10:05 PM · MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018), Multi-Content-Revisions (Deployment), Patch-For-Review, SDC General, Wikidata
tstarling triaged T202032: Duplicate ar_rev_id values in several wikis as Normal priority.
Aug 16 2018, 1:32 AM · Multi-Content-Revisions (Deployment), MW-1.32-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), Patch-For-Review, Wikidata, Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018), SDC General
tstarling created P7461 aawikibooks ar_rev_id conflicts.
Aug 16 2018, 12:06 AM

Aug 15 2018

tstarling added a comment to T183488: MCR schema migration stage 2: populate new fields.

So how do we end up trying to insert a row for revision 3003 twice?

Aug 15 2018, 10:23 AM · MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018), Multi-Content-Revisions (Deployment), Patch-For-Review, SDC General, Wikidata

Aug 14 2018

tstarling added a comment to T183488: MCR schema migration stage 2: populate new fields.

You can see the full logs at mwmaint1001:/var/log/mediawiki/populateContentTables/ . On both aawikibooks and gotwikibooks, the error occurred on the second batch of the archive table, starting at ar_rev_id 2001. In both cases it was also the last batch, with the maximum ar_rev_id being 3275 and 3175 respectively.

Aug 14 2018, 10:32 AM · MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018), Multi-Content-Revisions (Deployment), Patch-For-Review, SDC General, Wikidata
tstarling closed T200881: Create ParserFactory service as Resolved.
Aug 14 2018, 6:56 AM · MW-1.32-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), Patch-For-Review, MediaWiki-Parser, Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018)
tstarling closed T200246: Introduce ContentLanguage service to replace $wgContLang as Resolved.
Aug 14 2018, 6:54 AM · MW-1.32-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), Patch-For-Review, Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018), MediaWiki-General-or-Unknown, Technical-Debt
tstarling closed T200246: Introduce ContentLanguage service to replace $wgContLang, a subtask of T160815: Deprecate $wgContLang, as Resolved.
Aug 14 2018, 6:54 AM · MW-1.32-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), Patch-For-Review, Technical-Debt (Deprecation), MediaWiki-General-or-Unknown
tstarling closed T110209: Maintenance scripts should fail on unknown parameters as Resolved.
Aug 14 2018, 6:52 AM · MW-1.32-notes (WMF-deploy-2018-08-07 (1.32.0-wmf.16)), Patch-For-Review, Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018), Wikimedia-Incident, Incident-20150825-Redis, MediaWiki-Maintenance-scripts
tstarling moved T200864: Pingback on non-MySQL databases fails to save to updatelog, generates a high rate of unique pings from In Progress to Closed on the Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018) board.
Aug 14 2018, 6:50 AM · MW-1.31-release-notes, MW-1.32-notes (WMF-deploy-2018-08-07 (1.32.0-wmf.16)), MediaWiki-General-or-Unknown, MW-1.31-release, Patch-For-Review, Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018)
tstarling moved T200861: Web upgrade of SQLite does not work, just skips to install from In Progress to Closed on the Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018) board.
Aug 14 2018, 6:50 AM · MW-1.31-release-notes, MW-1.30-release-notes, MW-1.29-release-notes, MW-1.31-release, MW-1.30-release, MW-1.29-release, MW-1.32-notes (WMF-deploy-2018-08-07 (1.32.0-wmf.16)), Patch-For-Review, MediaWiki-Database, SQLite, Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018)
tstarling added a comment to T193565: Foreign query for metawiki fails with "Table 'centralauth.page' doesn't exist" (DBConnRef mixup?).

I tried importing a file into testwiki with curl, forcing a centralauth DB connection in the same request by first deleting the global:centralauth-user:... cache key, but I still could not reproduce it.

Aug 14 2018, 6:45 AM · MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Patch-For-Review, Core Platform Team Kanban (Doing), MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26)), Performance-Team, Core Platform Team (Security, stability, performance and scalability (TEC1)), Wikimedia-production-error, MediaWiki-Database
tstarling added a comment to T183488: MCR schema migration stage 2: populate new fields.

Log summary:

Aug 14 2018, 5:32 AM · MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018), Multi-Content-Revisions (Deployment), Patch-For-Review, SDC General, Wikidata
tstarling added a comment to T183488: MCR schema migration stage 2: populate new fields.

@tstarling Please stop writes going to *s2* unless they have already finished

Aug 14 2018, 5:12 AM · MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018), Multi-Content-Revisions (Deployment), Patch-For-Review, SDC General, Wikidata

Aug 13 2018

tstarling updated subscribers of T193565: Foreign query for metawiki fails with "Table 'centralauth.page' doesn't exist" (DBConnRef mixup?).

It doesn't have to be a LoadBalancer bug, it could just be some other extension calling reuseConnection() inappropriately. It's hard to debug without a reproduction procedure. I see in the logs that there was a series of these on 2018-08-06 with URL https://sat.wikipedia.org/w/index.php?title=%E1%B1%9F%E1%B1%A5%E1%B1%9A%E1%B1%A0%E1%B1%9F%E1%B1%AD:Import&action=submit , and the failed query indicates that the user was @MF-Warburg , who did have successful file upload imports at that time in the logs: https://sat.wikipedia.org/wiki/%E1%B1%9F%E1%B1%A5%E1%B1%9A%E1%B1%A0%E1%B1%9F%E1%B1%AD:Log/import

Aug 13 2018, 6:54 AM · MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Patch-For-Review, Core Platform Team Kanban (Doing), MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26)), Performance-Team, Core Platform Team (Security, stability, performance and scalability (TEC1)), Wikimedia-production-error, MediaWiki-Database
tstarling added a comment to T201799: Should ParserFactory call firstCallInit()?.

It's important to avoid running it on requests that don't need it. In particular, requests that only call $wgParser->setHook() but not Parser::parse() should not call firstCallInit(). Maybe the risk of that is fading but my understanding is that it's not quite gone yet.
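
The following minimal sketch shows the lazy-initialization pattern being argued for, in Python with hypothetical names. Registering a tag hook only records it. The expensive one-time setup, which stands in for firstCallInit(), runs on the first parse() call and never on requests that only register hooks.

```python
class LazyParser:
    """Hypothetical parser that defers one-time setup until it is needed."""

    def __init__(self, first_call_init):
        self._first_call_init = first_call_init  # expensive one-time setup
        self._initialized = False
        self.tag_hooks = {}

    def set_hook(self, tag, callback):
        # Registering a hook must not force initialization.
        self.tag_hooks[tag] = callback

    def _ensure_initialized(self):
        if not self._initialized:
            self._initialized = True
            self._first_call_init(self)

    def parse(self, text):
        self._ensure_initialized()
        # Toy expansion: replace <tag/> markers with the hook's output.
        for tag, callback in self.tag_hooks.items():
            text = text.replace(f"<{tag}/>", callback())
        return text
```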

Aug 13 2018, 4:15 AM · Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018), MediaWiki-Parser

Aug 11 2018

tstarling added a comment to T194697: Multiblocks — Allow for multiple, simultaneously blocks with different expiration dates..
  1. Is there a reason why bt_auto exists in block_target instead of block_entry? It feels to me that autoblocks can be just another entry

Aug 11 2018, 11:43 PM · MediaWiki-User-management, Anti-Harassment

Aug 9 2018

tstarling added a comment to T134976: SpecialRecentChangesLinked::doMainQuery blocking database infrastructure.

The initial report showed a query which didn't even use ORES, so it seems unfair to assign it to them.

Aug 9 2018, 4:38 AM · Core Platform Team (Security, stability, performance and scalability (TEC1)), Core Platform Team Backlog (Later), Growth-Team, MediaWiki-Recent-changes, Wikimedia-production-error
tstarling added a comment to T194697: Multiblocks — Allow for multiple, simultaneously blocks with different expiration dates..

Here's my proposal.

Aug 9 2018, 1:01 AM · MediaWiki-User-management, Anti-Harassment

Aug 8 2018

jcrespo awarded T201482: LinksUpdate fails, spams exception logs, whenever replication lag on any server rises above 10s a Love token.
Aug 8 2018, 6:49 AM · Core Platform Team (Security, stability, performance and scalability (TEC1)), Core Platform Team Backlog (Next), MW-1.32-notes (WMF-deploy-2018-09-04 (1.32.0-wmf.20)), Patch-For-Review, Performance-Team (Radar), MediaWiki-Database
tstarling renamed T59186: Drop blob_tracking and blob_orphans everywhere from blob_tracking indexes apparently unused to Drop blob_tracking and blob_orphans everywhere.
Aug 8 2018, 6:28 AM · Patch-For-Review, DBA, MediaWiki-Database
tstarling added a comment to T201240: Transaction timeout for LinksUpdate::updateLinksTimestamp (SET page_links_updated) .

We have debug logs for this request. On mwlog1001 do zgrep W2XVZApAAC4AAEKMbQAAAAAV /srv/mw-log/archive/test2wiki.log-20180805.gz

Aug 8 2018, 4:35 AM · Performance-Team, Core-Platform-Team-Old, Regression, Wikimedia-production-error, MediaWiki-Page-editing
tstarling closed T198049: Investigate possible outage on wikidata on 25th June - 04:13AM UTC - 05:27AM UTC as Resolved.
Aug 8 2018, 2:37 AM · Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018), Wikidata, Operations
tstarling created T201482: LinksUpdate fails, spams exception logs, whenever replication lag on any server rises above 10s.
Aug 8 2018, 2:34 AM · Core Platform Team (Security, stability, performance and scalability (TEC1)), Core Platform Team Backlog (Next), MW-1.32-notes (WMF-deploy-2018-09-04 (1.32.0-wmf.20)), Patch-For-Review, Performance-Team (Radar), MediaWiki-Database
tstarling created T201481: API maxlag stats.
Aug 8 2018, 2:07 AM · Core Platform Team (Security, stability, performance and scalability (TEC1)), Core Platform Team Backlog (Next), MediaWiki-API
tstarling added a comment to T198049: Investigate possible outage on wikidata on 25th June - 04:13AM UTC - 05:27AM UTC.

db1071, the master, had no writes

Aug 8 2018, 1:48 AM · Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018), Wikidata, Operations

Aug 7 2018

tstarling added a comment to T198049: Investigate possible outage on wikidata on 25th June - 04:13AM UTC - 05:27AM UTC.

The drop may have been caused by the API maxlag parameter. Wikidata:Bots recommends using a maxlag parameter, and some client libraries set maxlag=5 by default. The point of this feature is to make bots pause during replication lag, to prioritise human users and avoid worsening the situation.
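
Here is a minimal sketch of how a bot client might honour maxlag. It assumes that a lagged response carries the error code "maxlag" and a Retry-After header, as described on MediaWiki's Manual:Maxlag parameter page. The transport is injected as `send` so that no real HTTP library is assumed. `call_with_maxlag` and its parameters are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

Response = Tuple[Dict[str, str], dict]  # (HTTP headers, decoded JSON body)


def call_with_maxlag(send: Callable[[dict], Response], params: dict,
                     maxlag: int = 5, max_retries: int = 5,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Send an API request with maxlag, pausing while replication lag is high."""
    request = dict(params, maxlag=maxlag)
    for attempt in range(max_retries + 1):
        headers, body = send(request)
        if body.get("error", {}).get("code") != "maxlag":
            return body
        if attempt < max_retries:
            # Back off for the suggested delay; assume 5 s if the header is absent.
            sleep(float(headers.get("Retry-After", 5)))
    raise RuntimeError("replication lag stayed above maxlag; giving up")
```

Because every such client waits while lag is high, automated load on the database drops during lag events, which fits the drop described above.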

Aug 7 2018, 11:57 AM · Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018), Wikidata, Operations
tstarling added a comment to T198049: Investigate possible outage on wikidata on 25th June - 04:13AM UTC - 05:27AM UTC.

51,715 exceptions with:

[{exception_id}] {exception_url} Wikimedia\Rdbms\DBReplicationWaitError from line 426 of /srv/mediawiki/php-1.32.0-wmf.8/includes/libs/rdbms/lbfactory/LBFactory.php: Could not wait for replica DBs to catch up to db1071

Aug 7 2018, 6:14 AM · Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018), Wikidata, Operations
tstarling assigned T182748: $wgExternalDiffEngine should have shell restrictions to BPirkle.
Aug 7 2018, 2:11 AM · Core Platform Team Kanban (Done with CPT), Core Platform Team (Security, stability, performance and scalability (TEC1)), MediaWiki-History-or-Diffs, MediaWiki-Shell
tstarling assigned T179901: Create a tmp directory just for MediaWiki to BPirkle.
Aug 7 2018, 2:11 AM · Core Platform Team (Security, stability, performance and scalability (TEC1)), Core Platform Team Backlog (Later), Security, Security-Core, MediaWiki-General-or-Unknown
tstarling assigned T198176: Mediawiki page deletions should happen in batches of revisions to BPirkle.
Aug 7 2018, 2:00 AM · Core Platform Team Kanban (Doing), MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26)), Core Platform Team ( Code Health (TEC13)), Wikimedia-production-error, Patch-For-Review, MediaWiki-Page-deletion

Aug 6 2018

tstarling added a comment to T183488: MCR schema migration stage 2: populate new fields.

@greg The WN31 things are done now, taking only 1081 seconds for mediawikiwiki and 9252 seconds for metawiki. For metawiki the rate was about the same as anomie got for testwiki: 2000 rows per second for the revision table and 600 rows per second for the archive table. At that rate, we can expect wikidatawiki to take about 91 hours and commonswiki to take about 48 hours. We can run them concurrently since they are on different DB clusters, and that way maybe get them done by the end of the week.
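
The following sketch shows the estimate behind these numbers. It uses the two throughput figures quoted above and assumes that runtime scales linearly with row count and that the revision and archive tables are processed one after the other. Actual row counts for each wiki are not given here, so the functions take them as inputs. Both function names are hypothetical.

```python
REV_ROWS_PER_S = 2000  # observed revision-table rate quoted above
AR_ROWS_PER_S = 600    # observed archive-table rate quoted above


def estimated_hours(revision_rows: int, archive_rows: int) -> float:
    """Single-wiki runtime in hours at the observed throughput."""
    seconds = revision_rows / REV_ROWS_PER_S + archive_rows / AR_ROWS_PER_S
    return seconds / 3600


def wall_clock_hours(per_wiki_hours, concurrent: bool) -> float:
    """Wikis on different DB clusters can run at once, so elapsed time is the max."""
    return max(per_wiki_hours) if concurrent else sum(per_wiki_hours)
```

With the quoted estimates, running wikidatawiki and commonswiki concurrently takes about 91 hours of wall-clock time, compared with 139 hours if they ran one after the other.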

Aug 6 2018, 10:16 AM · MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018), Multi-Content-Revisions (Deployment), Patch-For-Review, SDC General, Wikidata
tstarling added a comment to T200960: Logstash packet loss.

Back up to ~60% loss now, due to a slow drop in capacity on logstash1008 and logstash1009. And there was a similar event on August 4, which was fixed when @fgiunchedi restarted logstash. Can we have a daily restart cron job now?

Aug 6 2018, 6:56 AM · Operations, Patch-For-Review, Wikimedia-Logstash
tstarling closed T197816: Enable MCR migration stage "write both, read old" on live systems as Resolved.
Aug 6 2018, 4:06 AM · Patch-For-Review, Multi-Content-Revisions (Deployment), SDC General, Wikidata
tstarling closed T197816: Enable MCR migration stage "write both, read old" on live systems, a subtask of T183488: MCR schema migration stage 2: populate new fields, as Resolved.
Aug 6 2018, 4:06 AM · MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Core-Platform-Team-Old (CPT-Q1-Jul-Sep-2018), Multi-Content-Revisions (Deployment), Patch-For-Review, SDC General, Wikidata
tstarling closed T197816: Enable MCR migration stage "write both, read old" on live systems, a subtask of T194750: Deploy Structured Data on Commons baseline , as Resolved.
Aug 6 2018, 4:06 AM · Core Platform Team (MCR), Core Platform Team Backlog (Epic), SDC Engineering, Multi-Content-Revisions (Deployment), Epic, Multimedia-Team-Working-Board, Wikidata, Multimedia