Warning: Empty regular expression in /srv/mediawiki/php-1.29.0-wmf.9/includes/parser/DateFormatter.php on line 200
Closed, ResolvedPublic

Description

Symptoms

With 1.29.0-wmf.9:

Warning: Empty regular expression in /srv/mediawiki/php-1.29.0-wmf.9/includes/parser/DateFormatter.php on line 200

Example of spam after wmf.9 deployments:

  • 2017-01-25 20:50 rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.29.0-wmf.9

Zoom over 20 minutes highlighting the bursts:

  • 2017-01-26 13:01 rebuilt wikiversions.php and synchronized wikiversions files: group1 back to 1.29.0-wmf.9 T156310

In both cases one can see a huge spike after 300?? seconds. And it is vanishing entirely after exactly(?) an hour.

Code

includes/parser/DateFormatter.php
184             $regex = $this->regexes[$i];
185 
186             // Horrible hack
187             if ( !$linked ) {
188                 $regex = str_replace( [ '\[\[', '\]\]' ], '', $regex );
189             }
190 
191             if ( $match_whole ) {
192                 // Let's hope this works
193                 $regex = preg_replace( '!^/!', '/^', $regex );
194                 $regex = str_replace( $this->regexTrail,
195                     '$' . $this->regexTrail, $regex );
196             }
197 
198             // Another horrible hack
199             $this->mLinked = $linked;
200             $text = preg_replace_callback( $regex, [ &$this, 'replace' ], $text );
201             unset( $this->mLinked );

Note the nice comment hoping that will work.

hashar created this task.Jan 26 2017, 1:29 PM

https://gerrit.wikimedia.org/r/#/c/332702 did some changes to DateFormatter.php

That patch didn't touch this code, I have no idea why you flagged it.

As for the warning itself, do we know any way at all to reproduce it?

If we don't have any way to reproduce it, it seems from looking at HHVM's source that this error is raised when $regex is empty or contains only whitespace (per C isspace()) up to the first NUL byte, so we could probably live hack in something like this to get some logs.

if ( preg_match( '/^[\f\n\r\t\v ]*($|\0)/', $regex ) ) {
    wfDebugLog( 'temp-debug', 'Empty regex! lang={lang} i={i} orig={orig} regex={regex}', 'all', [
        'lang' => $this->lang->getCode(),
        'i' => $i,
        'orig' => $this->regexes[$i],
        'regex' => $regex,
        'options' => $options,
    ] );
}

If we don't have any way to reproduce it, it seems from looking at HHVM's source that this error is raised when $regex is empty or contains only whitespace (per C isspace()) up to the first NUL byte, so we could probably live hack in something like this to get some logs.

+1 that sounds like a good idea to me.

greg added a comment.Jan 26 2017, 11:15 PM

So, blocker status? :)

This doesn't appear to be happening anymore. I'm removing blocking status.

After deploying wmf.9 to all wikis, the error came back:

mmodell raised the priority of this task from Normal to High.Jan 26 2017, 11:49 PM

@Anomie wrote

> https://gerrit.wikimedia.org/r/#/c/332702 did some changes to DateFormatter.php
That patch didn't touch this code, I have no idea why you flagged it.

Random guess sorry :-(

There was another old task that mentioned the empty regex was caused by a faulty/empty cache of the linktail. The spam of log happened as well on Tuesday (group0) and eventually disappeared. So might be outofdate/corrupted cache?

There was another old task that mentioned the empty regex was caused by a faulty/empty cache of the linktail.

In T66603 it's pretty clear how it would happen on lines 1263 and 1630 where the variable is being passed directly. Here, though, even if something is corrupted I'd think the concatenation to build the regexes used should prevent this particular error. But I can't say for 100% certain it's not a corrupted cache somehow.

From a very quick chat with @Anomie and given we don't have a way to reproduce, I am all in favor of adding some debug log as proposed on T156364#2973081 . That would help track down the root cause since it does not even happen on beta cluster.

hashar updated the task description. (Show Details)Jan 27 2017, 4:58 PM
hashar updated the task description. (Show Details)

Edited to add some screenshot event rate failure from logstash.

Anomie added a comment.EditedJan 27 2017, 5:46 PM

Mukunda's email prompted me to look for warnings from DateFormatter in 1.29.0-wmf.8. I spotted several like the following:

Jan 27 00:47:45 mw1227:  #012Notice: Undefined property: DateFormatter::$keys in /srv/mediawiki/php-1.29.0-wmf.8/includes/parser/DateFormatter.php on line 212
Jan 27 00:47:45 mw1227:  #012Notice: Undefined property: DateFormatter::$targets in /srv/mediawiki/php-1.29.0-wmf.8/includes/parser/DateFormatter.php on line 229
Jan 27 00:47:45 mw1227:  #012Notice: Undefined index: F in /srv/mediawiki/php-1.29.0-wmf.8/includes/parser/DateFormatter.php on line 251
Jan 27 00:47:45 mw1227:  #012Notice: Undefined index:  in /srv/mediawiki/php-1.29.0-wmf.8/includes/parser/DateFormatter.php on line 341
Jan 27 00:47:45 mw1227:  #012Notice: Undefined index: j in /srv/mediawiki/php-1.29.0-wmf.8/includes/parser/DateFormatter.php on line 260
Jan 27 00:47:45 mw1227:  #012Notice: Undefined variable: matches in /srv/mediawiki/php-1.29.0-wmf.8/includes/parser/DateFormatter.php on line 304
Jan 27 00:47:45 mw1227:  #012Notice: Undefined index: m in /srv/mediawiki/php-1.29.0-wmf.8/includes/parser/DateFormatter.php on line 311

Something extremely weird is going on there: keys and targets are set unconditionally in the constructor, so how can we have an instance of the class without them set? I also note that these were logged relatively frequently from Jan 26 11:12:18 to 12:11:05 and Jan 26 23:49:08 to Jan 27 00:48:23 and not otherwise, both almost exactly one hour time periods.

Looking at code that creates the object, I find the following:

public static function getInstance( $lang = null ) {
    global $wgContLang, $wgMainCacheType;

    $lang = $lang ? wfGetLangObj( $lang ) : $wgContLang;
    $cache = ObjectCache::getLocalServerInstance( $wgMainCacheType );

    static $dateFormatter = false;
    if ( !$dateFormatter ) {
        $dateFormatter = $cache->getWithSetCallback(
            $cache->makeKey( 'dateformatter', $lang->getCode() ),
            $cache::TTL_HOUR,
            function () use ( $lang ) {
                return new DateFormatter( $lang );
            }
        );
    }

    return $dateFormatter;
}

So that's probably where we're getting the weird behavior that makes it look like an invalid cache that goes away after an hour: rMW4ca09bd76f19: Make most of DateFormatter private changed how DateFormatter is serialized, so when 1.29.0-wmf.9 loads a DateFormatter from cache that was saved in 1.29.0-wmf.8 or vice versa it's probably missing setting of expected fields.

Assuming that's true, we could probably just redeploy and see that the errors really do go away after an hour. Or we could add a version number to the cache key like we do with some other classes that cache instances.

@Anomie: I believe you are onto something, that seems like a solid theory and after discussing it with @greg I think we are going to go ahead with deploying wmf.9 and see how things look after 1 hour. Assuming the errors die down as expected then we'll leave it at that.

Mentioned in SAL (#wikimedia-operations) [2017-01-27T18:52:19Z] <twentyafterfour> Rolling forward with group2 to 1.29.0-wmf.9 refs T156364 T154683

Error rate is dropping off right on schedule.

Interesting issue. PHP names private vars internally differently from public ones (since there can be a number of private vars with the same name in the inheritance chain). So we should be very careful moving to/from private in serialized classes.
Check out: https://3v4l.org/t2IPr

The same would probably happen in any refactoring that either adds fields or changes visibility of fields. Unfortunately, PHP does not have any mechanism to ensure class definition that wrote the serialization is the same as one that is reading it.

mmodell closed this task as Resolved.Jan 27 2017, 9:09 PM
mmodell claimed this task.

Looks like @Anomie was right about this one. Will reopen if the warning appears again.

hashar added a comment.EditedJan 27 2017, 11:17 PM

My summary is:

We had the spam after group0 and group1 deploy and that ceased after one hour. The same happens on group2 deploy, though I am not sure why we rollbacked four hours later since the spam should have been gone. Maybe due to another issue, but if the spam was still going on after the one hour TTL that confuses me.

Anomie is the huge asset and pointed to the non trivial issue which is that the cache unserialize objects that do not match the new class signature. Havoc happens because of PHP inconsistency.

That is at least the third time we have some havoc due to such signatures changes, I could not find previous bugs but I am 101% sure we had the exact same trouble and faced some kind of nice outage / blank pages as a result. Though it is really a corner case it is definitely annoying.


I like @Smalyshev https://3v4l.org/t2IPr it is rather annoying that an object can end up with two $a property, one public the other protected. I wish unserialize() would just explode in such case or warning and return false.

I went with my own one (P4826): that does public -> private and private -> public: https://3v4l.org/QKTGW That even show a bug/different behavior on HHMV which concatenate both properties (last line). Might be worth filling a bug https://github.com/facebook/hhvm/issues/7629 .


Would there be a way to have BagOfStuff to detect obsolete serialization or an unserialization resulting in a "wrong" object? Might be worth investigating further. I am not quite sure how that would work with array of objects or objects referencing other objects......

mmodell reassigned this task from mmodell to Anomie.Jan 28 2017, 2:21 AM

My summary is:

We had the spam after group0 and group1 deploy and that ceased after one hour. The same happens on group2 deploy, though I am not sure why we rollbacked four hours later since the spam should have been gone. Maybe due to another issue, but if the spam was still going on after the one hour TTL that confuses me.

Out of an abundance of caution, I rolled back immediately when going to group2, it was not several hours later.

Would there be a way to have BagOfStuff to detect obsolete serialization or an unserialization resulting in a "wrong" object? Might be worth investigating further. I am not quite sure how that would work with array of objects or objects referencing other objects......

This sounds like a really good idea and I think it warrants a followup task...