
Multiple users reporting content pages displaying "NULL" instead of the expected content
Closed, Resolved, Public


<ankry> Hmm, what does it mean:
<ankry> I got NULL instead of page content
<nadando> Does anyone know why this page just says NULL
<p858snake> nadando / ankry: are you currently logged in?
<nadando> p858snake, yes
<Waggie> p858snake, we just had another user report the same thing on
<Waggie> (over on -en-help)
<p858snake> try doing a force refreshed (most browsers: Ctrl+F5
<Waggie> They've left, so I can't inquire further.
<p858snake> *forced refresh
<nadando> That page says NULL too, same if I log out and force refresh
<p858snake> Can you take a screenshot please?
<Waggie> I get NULL also, Chrome, logged in, on Linux. Will take screenshot momentarily.
<Waggie> Also, page source consists only of "NULL". No html, or anything.

Event Timeline

Somehow "NULL" got cached in varnish.

legoktm@terbium:~$ echo "" | mwscript purgeList.php --wiki=commonswiki
Purging 1 urls

And the url works now.

If you append ?action=purge to the URL of an affected article, it should be fixed. We'll (probably not me though) figure out some way to purge everything with the content of "NULL".
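As an illustration of the ?action=purge workaround, here is a minimal Python sketch that builds the purge URL for an affected page. The function name `purge_url` and the example.org addresses are hypothetical; the sketch only assumes that the purge is requested through the `action` query parameter, as described above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def purge_url(page_url: str) -> str:
    """Return page_url with action=purge set in its query string."""
    parts = urlsplit(page_url)
    # Drop any existing "action" parameter so the result carries exactly one.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "action"
    ]
    query.append(("action", "purge"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical URL, for illustration only.
print(purge_url("https://example.org/wiki/Some_page"))
# -> https://example.org/wiki/Some_page?action=purge
```

Replacing an existing `action` value instead of appending a second one keeps the resulting URL unambiguous.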

In ru-wiki, this problem hit the Special:Watchlist page.
So I think this bug needs critical priority.

Mentioned in SAL [2016-03-22T06:54:27Z] <_joe_> banning all pages with content-length of 25 from the caches, T130575

Joe triaged this task as Unbreak Now! priority. Mar 22 2016, 6:56 AM

My hackish ban should've removed all the current pages with NULL content (at least if they're gzipped).
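The ban matched on content length, which is why it only catches gzipped copies: the same body has a different length uncompressed. A rough Python sketch of how one might compute the gzipped size of a candidate body follows. The helper names are hypothetical, and the byte count depends on the compressor and its settings, so this is not claimed to reproduce the cache's exact value or the actual ban expression, neither of which appears in this task.

```python
import gzip


def gzipped_length(body: bytes) -> int:
    """Size in bytes of body after gzip compression with Python's defaults."""
    # mtime=0 makes the output deterministic; it does not change the size.
    return len(gzip.compress(body, mtime=0))


def matches_ban(body: bytes, banned_length: int) -> bool:
    """True if the gzipped body has exactly the banned length."""
    return gzipped_length(body) == banned_length


# Compare raw and compressed sizes for a few candidate bodies (assumed examples).
for candidate in (b"NULL", b"NULL\n"):
    print(candidate, len(candidate), gzipped_length(candidate))
```

For very short bodies the gzip framing (header and trailer) makes the compressed form larger than the raw one, so a length-based ban has to target one encoding or the other.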

The fact remains that we have no idea what could have caused this.

Please let me know if you see this again. I'm not at all confident this was a "permanent" fix (which would mean the error we've seen was transient).

@John_of_Reading I think my ban of the cached content should've stopped this from happening now, so let's see if more reports come in from now on.

Joe lowered the priority of this task from Unbreak Now! to High. Mar 22 2016, 7:45 AM

I tested one of the affected urls against all appservers, and they're all responding correctly now.
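A per-appserver check like this can be sketched as follows. This is not the procedure actually used here: the function name, the host names, and the `fetch` callback are all hypothetical. `fetch` stands in for whatever requests the affected URL from one specific backend and returns the response body.

```python
from typing import Callable, Iterable, List


def find_null_backends(
    hosts: Iterable[str], fetch: Callable[[str], bytes]
) -> List[str]:
    """Return the hosts whose response body consists only of "NULL"."""
    bad = []
    for host in hosts:
        body = fetch(host)
        # Ignore surrounding whitespace, e.g. a trailing newline.
        if body.strip() == b"NULL":
            bad.append(host)
    return bad


# Canned responses standing in for real backends (hypothetical names).
fake_responses = {
    "app-a.example": b"<html>ok</html>",
    "app-b.example": b"NULL\n",
}
print(find_null_backends(fake_responses, fake_responses.__getitem__))
# -> ['app-b.example']
```

Passing the fetcher in as a parameter keeps the check independent of how a given backend is addressed, and lets it be exercised without network access.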

Change 278849 had a related patch set uploaded (by Giuseppe Lavagetto):
conftoool: remove the debug appservers from the pool

I just found out that three out of four of the application servers dedicated to debugging were still pooled to serve traffic. If someone was performing some tests on those, the assumption that tests would not impact production would be voided.

This hypothesis came to mind after @Legoktm suggested the following on IRC earlier this morning:

< legoktm> NULL in all uppercase suggests a "var_dump(null)" or something

It's completely unverified, but I'm fixing the potential issue anyway, as this could be a real-world scenario.

Change 278849 merged by Giuseppe Lavagetto:
conftoool: remove the debug appservers from the pool

I am also resolving the ticket, as it seems no more incidents have been reported.

Joe claimed this task.