Variable from InitialiseSettings can be undefined (corrupt opcache?)
Closed, DuplicatePublicPRODUCTION ERROR
Actions

Assigned To

None

Authored By

	DannyS712
	Mar 11 2019, 6:52 AM

Description

After performing this edit using rollback on the english wikipedia, I saw an error at the "Action complete" page (here). Screenshots of the page, and of the html that caused it to display, are attached. There were no errors in the console relevant to this. I believe it is related to https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/CommonSettings.php#L517, since the error reports that $wmgUsePagedTiffHandler is undefined, and I can't find it mentioned previously in that file.

Details

	Subject	Repo	Branch	Lines +/-
	profile::mediawiki::php: disable display_errors in FPM mode	operations/puppet	production	+1 -0

Customize query in gerrit

Related Objects
Search...

Status	Subtype	Assigned	Task
Resolved		None	T176370 Migrate to PHP 7 in WMF production
Resolved		jijiki	T219127 SRE FY19-20 Q1 goal: complete the transition to PHP7
Duplicate	PRODUCTION ERROR	None	T218005 Variable from InitialiseSettings can be undefined (corrupt opcache?)

Event Timeline

DannyS712 created this task.Mar 11 2019, 6:52 AM

Restricted Application added subscribers: Liuxinyu970226, Aklapper. · View Herald TranscriptMar 11 2019, 6:52 AM

DannyS712 updated the task description. (Show Details)Mar 11 2019, 6:53 AM

Thanks for reporting this!

I can't find it mentioned previously in that file.

It's in https://noc.wikimedia.org/conf/InitialiseSettings.php.txt

Nirmos added a project: Wikimedia-production-error.Mar 11 2019, 8:30 AM

@DannyS712 do you have the PHP 7 beta feature enabled?

In T218005#5014474, @Legoktm wrote:

@DannyS712 do you have the PHP 7 beta feature enabled?

@Legoktm yes, I do

Legoktm added a project: PHP 7.2 support.Mar 11 2019, 8:38 AM

Daimona subscribed.Mar 11 2019, 11:49 AM

Krinkle moved this task from Untriaged to Sep2019/1.34.wmf.21+ on the Wikimedia-production-error board.Mar 16 2019, 4:24 AM

This error suggests that somehow (part of) InitialiseSettings' variables were not exported under PHP 7. That's very concerning.

The code we use to cache and export these globals via a temporary file may need to be re-reviewed for this. In addition to potential PHP 7 compatibility, it's also possible we're seeing a pre-existing race condition where maybe different threads see a partially written file there or something else weird.

Alternatively, if the setting of this variable works fine, it might be that something is un-setting it. But that seems rather unlikely (and code search suggests nothing else refers to this except the 1 line that sets it, and the 1 line that reads it).

Krinkle triaged this task as High priority.Mar 16 2019, 4:29 AM

Krinkle added a parent task: T176370: Migrate to PHP 7 in WMF production.

Logs show 461 such errors in the last week, all between 2019-03-11T06:32:48 and 2019-03-11T06:49:09, all on the same server: mw1258. Cross-referencing with SAL shows that we were deploying around that time:

06:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1099 after upgrade (duration: 00m 48s)
06:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1099 after upgrade (duration: 00m 52s)
06:38 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217818)
06:37 marostegui: Power cycle mw1280 - server down
06:35 marostegui: Upgrade mysql and kernel on db1099
06:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099 for upgrade (duration: 03m 01s)
06:03 effie: Restarting pdfrender on scb1003

And know what was the end of these errors?

2019-03-11T06:49:10	php	PHP Warning: Cannot modify header information - headers already sent by (output started at /srv/mediawiki/wmf-config/CommonSettings.php:517)
2019-03-11T06:49:22	scap	http://mw1258:9181 "GET /opcache-free HTTP/1.1" 200 None

(dashboard)

Change 497024 had a related patch set uploaded (by MaxSem; owner: MaxSem):
[operations/puppet@production] Disable display_errors in FPM mode

https://gerrit.wikimedia.org/r/497024

gerritbot added a project: Patch-For-Review.Mar 16 2019, 5:55 AM

Krinkle renamed this task from PHP notice "Undefined variable: wmgUsePagedTiffHandler" shown after rollback on enwp to Variable wmgUsePagedTiffHandler from InitialiseSettings undefined.Mar 16 2019, 6:16 AM

Krinkle moved this task from Sep2019/1.34.wmf.21+ to Resolved on the Wikimedia-production-error board.

Ladsgroup subscribed.Mar 16 2019, 1:15 PM

Nikerabbit subscribed.Mar 16 2019, 4:14 PM

So while the errors found by @MaxSem on logstash seem to refer to a host that was somehow in a bad state in terms of opcache state, which will need further analysis, the other issue to fix is the display_errors ini setting, which used to be in our php configuration and I didn't modify.

Looking at the logs on the server:

2019-03-11T06:32:48 errors start
2019-03-11T06:34:57 opcache received a request to invalidate a single file /opcache-free?file=wmf-config%2Fdb-eqiad.php, during a scap run.
2019-03-11T06:49:09 last error is logged
2019-03-11T06:49:22 a full opcache flush was requested /opcache-free - not sure what caused this, but it came from scap.

Now it seems clear to me that a full opcache flush repaired the issue, but the daemon has been restarted since so I lost any ability to further inspect what happened with its opcache. Grafana data don't show anything unusual either.

So while an opcache flush did the trick of stopping the errors, there is no obvious issue at play here. Having opcache revalidation active every 60 seconds would've probably prevented this from happening.

I'll solve the issue with display_errors and leave the bug open at lower priority so that we can relate it to other episodes of this happening.

Change 497024 merged by Giuseppe Lavagetto:
[operations/puppet@production] profile::mediawiki::php: disable display_errors in FPM mode

https://gerrit.wikimedia.org/r/497024

Krinkle renamed this task from Variable wmgUsePagedTiffHandler from InitialiseSettings undefined to Variable from InitialiseSettings can be undefined (corrupt opcache?) .Mar 28 2019, 12:23 AM

Krinkle mentioned this in T219148: Use PHP7 to run all async jobs.Apr 10 2019, 6:37 PM

Krinkle moved this task from Untriaged to Wikimedia production on the PHP 7.2 support board.Apr 16 2019, 12:34 AM

jijiki subscribed.Apr 23 2019, 12:43 PM

Krinkle removed a project: Patch-For-Review.Apr 23 2019, 9:36 PM

DannyS712 moved this task from Unsorted to In progress on the User-DannyS712 board.Apr 26 2019, 7:52 AM

Krinkle mentioned this in T218692: read only on mediawiki generates "LoadBalancer.php: Cannot access the database: Unknown error".May 7 2019, 3:02 PM

We have done some work to overcome this sporadic failures:

Reenabled opcache revalidation, which should hopefully refresh the corrupt cache within 1 minute
Allowed pybal to monitor both HHVM and php7, so that a failing server will be depooled in case of failures.

Krinkle closed this task as a duplicate of T221347: PHP7 opcache sometimes corrupts when cleared (was: Fatal ConfigException, undefined InitialiseSettings variable).May 8 2019, 2:00 PM

Restricted Application removed a subscriber: Liuxinyu970226. · View Herald TranscriptMay 8 2019, 2:00 PM

@Krinkle this task predates T221347 by a month - isn't that one the duplicate?

DannyS712 moved this task from In progress to Resolved tasks (x-tools) on the User-DannyS712 board.May 13 2019, 2:27 AM

DannyS712 moved this task from Resolved tasks (x-tools) to Archive on the User-DannyS712 board.

@DannyS712: Date is not relevant - whatever task has more info or a patch should be the one to merge into.

• mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:07 PM

Variable from InitialiseSettings can be undefined (corrupt opcache?) Closed, DuplicatePublicPRODUCTION ERRORActions

Description

Details

Related ObjectsSearch...

Event Timeline

Variable from InitialiseSettings can be undefined (corrupt opcache?)
Closed, DuplicatePublicPRODUCTION ERROR
Actions

Related Objects
Search...