Page MenuHomePhabricator

MobileFrontend is failing mwext-mw-selenium test
Closed, ResolvedPublic

Description

MobileFrontend is failing mwext-mw-selenium test

Steps to reproduce

Actual results

  • All scenarii are failling

Expected results

  • The test should pass with success.

Note all patches in that repo are failing.

If you view open patches here https://gerrit.wikimedia.org/r/#/q/status:open+project:mediawiki/extensions/MobileFrontend+branch:master,n,z

Example trace from the Selenium output:

@smoke @editing @integration
Scenario: Added and removed content                            # features/diff.feature:5
  Given I am logged into the mobile website                    # features/step_definitions/common_steps.rb:29
    757: unexpected token at ' 
      
       ( ! )  Fatal error: Call to a member function getIP() on a non-object in includes/db/loadbalancer/LBFactory.php on line  285   
      Call Stack  
      #  Time  Memory  Function  Location  
      1  0.0081  256912  {main}(  )  ../api.php : 0  
      2  0.0085  281800  require(  'includes/WebStart.php'  )  ../api.php : 38  
      3  0.0484  2830016  require_once(  'LocalSettings.php'  )  ../WebStart.php : 123  
      4  0.0623  3818936  wfGetDB(  )  ../LocalSettings.php : 363  
      5  0.0623  3819160  wfGetLB(  )  ../GlobalFunctions.php : 3562  
      6  0.0623  3819192  wfGetLBFactory(  )  ../GlobalFunctions.php : 3572  
      7  0.0635  3953544  LBFactory::singleton(  )  ../GlobalFunctions.php : 3581   
      8  0.0640  3993176  LBFactorySimple->__construct(  )  ../LBFactory.php : 75  
      9  0.0640  3993248  LBFactory->__construct(  )  ../LBFactorySimple.php : 37  
      10  0.0640  3993400  LBFactory->newChronologyProtector(  )  ../LBFactory.php : 49  
      
    ' (JSON::ParserError)
    ./features/step_definitions/common_steps.rb:31:in `/^I am logged into the mobile website$/'
    features/diff.feature:6:in `Given I am logged into the mobile website'

Exceptions while hitting Special:Blankpage:

0.0607   5.2M  Start request GET /jenkins-mwext-mw-selenium-2136/index.php/Special:BlankPage
HTTP HEADERS:
USER-AGENT: curl/7.35.0
HOST: localhost:9412
ACCEPT: */*
[caches] cluster: EmptyBagOStuff, WAN: mediawiki-main-default, stash: db-replicated, message: SqlBagOStuff, parser: SqlBagOStuff
[caches] LocalisationCache: using store LCStoreCDB
0.1039  10.8M  Fully initialised
0.1391  13.2M  LocalisationCache::isExpired(en): cache for en expired due to GlobalDependency
0.1461  13.0M  LocalisationCache::recache: got localisation for en from source
0.3282  15.2M  IP: 127.0.0.1
[connect] Connected to database 0 at 127.0.0.1:3306
[DBPerformance] Expectation (masterConns <= 0) by MediaWiki::main not met:
[connect to 127.0.0.1:3306 (jenkins_u2_mw)]
TransactionProfiler.php line 311 calls wfBacktrace()
TransactionProfiler.php line 146 calls TransactionProfiler->reportExpectationViolated()
LoadBalancer.php line 573 calls TransactionProfiler->recordConnection()
GlobalFunctions.php line 3562 calls LoadBalancer->getConnection()
MessageBlobStore.php line 235 calls wfGetDB()
LocalisationCache.php line 1032 calls MessageBlobStore->clear()
LocalisationCache.php line 463 calls LocalisationCache->recache()
LocalisationCache.php line 337 calls LocalisationCache->initLanguage()
LocalisationCache.php line 274 calls LocalisationCache->loadItem()
Language.php line 3344 calls LocalisationCache->getItem()
SpecialPageFactory.php line 274 calls Language->getSpecialPageAliases()
SpecialPageFactory.php line 334 calls SpecialPageFactory::getAliasList()
Title.php line 1092 calls SpecialPageFactory::resolveAlias()
MediaWiki.php line 182 calls Title->isSpecial()
MediaWiki.php line 672 calls MediaWiki->performRequest()
MediaWiki.php line 474 calls MediaWiki->main()
index.php line 43 calls MediaWiki->run()

[DBPerformance] Expectation (writes <= 0) by MediaWiki::main not met:
query-m: DELETE FROM `msg_resource` [TRX#002e1341e90d]
TransactionProfiler.php line 311 calls wfBacktrace()
TransactionProfiler.php line 200 calls TransactionProfiler->reportExpectationViolated()
Database.php line 1006 calls TransactionProfiler->recordQueryCompletion()
Database.php line 2951 calls DatabaseBase->query()
MessageBlobStore.php line 236 calls DatabaseBase->delete()
LocalisationCache.php line 1032 calls MessageBlobStore->clear()
LocalisationCache.php line 463 calls LocalisationCache->recache()
LocalisationCache.php line 337 calls LocalisationCache->initLanguage()
LocalisationCache.php line 274 calls LocalisationCache->loadItem()
Language.php line 3344 calls LocalisationCache->getItem()
SpecialPageFactory.php line 274 calls Language->getSpecialPageAliases()
SpecialPageFactory.php line 334 calls SpecialPageFactory::getAliasList()
Title.php line 1092 calls SpecialPageFactory::resolveAlias()
MediaWiki.php line 182 calls Title->isSpecial()
MediaWiki.php line 672 calls MediaWiki->performRequest()
MediaWiki.php line 474 calls MediaWiki->main()
index.php line 43 calls MediaWiki->run()

Event Timeline

Paladox raised the priority of this task from to Needs Triage.
Paladox updated the task description. (Show Details)
Paladox subscribed.

This has nothing to do with continuous integration. A regression occurred either in MobileFrontend, one of it dependencies or mediawiki/core itself.

The regression seems to have occurred between 12 nov. 2015 22:05 UTC and 13 nov. 2015 20:38 UTC. Looking at build against master, two got trigger by the l10n bot on postmerge:

The first failure was build #2​12​4 on 13 nov. 2015 20:38 UTC

Triggered by change: 253009,1 on branch master and as a postmerge.

The previous change on the master branch was successful. build #2​12​1 on 12 nov. 2015 22:05 UTC

Triggered by change: 252820,1 on branch master and as a postmerge.

The job, when triggered against MobileFrontend, clones the following repositories:

mediawiki/core
mediawiki/vendor
mediawiki/extensions/MobileFrontend
mediawiki/extensions/Echo
mediawiki/extensions/VisualEditor

One of them had the regression / incompatibility introduced between 12 nov. 2015 22:05 UTC and 13 nov. 2015 20:38 UTC.

There is a couple commits commits mentioning newChronologyProtector

1af6474a6ea3b77268f3af097894061286d61f80
Author: Aaron Schulz <aschulz@wikimedia.org>
AuthorDate: Fri Nov 13 00:07:50 2015
Commit: Aaron Schulz <aschulz@wikimedia.org>
CommitDate: Fri Nov 13 00:09:25 2015

Add LBFactory::disableChronologyProtection method

This is useful for things like job running entry points by
avoiding pointless overhead.

Change-Id: I007273dd95e6a2e9d258d475d383bc869d63f7b1

commit 85c0f85e925e206f3e2f76d348d2550c0644fdc4
Author: Aaron Schulz <aschulz@wikimedia.org>
AuthorDate: Sun Oct 18 19:53:40 2015
Commit: Tim Starling <tstarling@wikimedia.org>
CommitDate: Thu Nov 12 23:11:18 2015

Decouple ChronologyProtector from user sessions

* It now works for users without using sessions.
  Sessions should not be cluttered with things
  unrelated to authentication and tokens.
* Public services doing api.php requests on behalf
  of a users only need to set XFF headers (as normal)
  for position wait logic to trigger. They can opt out
  of ChronologyProtector via a new HTTP header
  "ChronologyProtection: false".
* Requests across subdomains, such as the SUL2 handshake
  for CentralAuth on account creation, now have position
  wait logic applied. This helps avoid anomolies were a
  row just written in the last request may not be seen.
* Use merge() to avoid rolling back master positions if
  the user has multiple tabs open and doing writes at once.
* $_SESSION global state is gone from ChronologyProtector.
* Cleaned up post-send LBFactory::shutdown() logic for
  avoiding master position writes with an explicit flag.
* Use 'replication' debug log group in more places.

Bug: T111264
Change-Id: Ib25d05994d62b25c2f89e67b7f51009c54f4bca8

The related function is:

277     /**
278      * @return ChronologyProtector
279      */
280     protected function newChronologyProtector() {
281         $request = RequestContext::getMain()->getRequest();
282         $chronProt = new ChronologyProtector(
283             ObjectCache::getMainStashInstance(),
284             array(
285                 'ip' => $request->getIP(),   # <---- unset / missing  for some reason
286                 'agent' => $request->getHeader( 'User-Agent' )
287             )
288         );
289         if ( PHP_SAPI === 'cli' ) {
290             $chronProt->setEnabled( false );
291         } elseif ( $request->getHeader( 'ChronologyProtection' ) === 'false' ) {
292             // Request opted out of using position wait logic. This is useful for requests
293             // done by the job queue or background ETL that do not have a meaningful session.
294             $chronProt->setWaitEnabled( false );
295         }
296 
297         return $chronProt;
298     }

Some custom configuration is injected at the end of LocalSettings.php which shows up in the trace: 4 0.0623 3818936 wfGetDB( ) ../LocalSettings.php : 363

353 <?php
354 $wgRightsText = 'Creative Commons Attribution 3.0';
355 $wgRightsUrl = "http://creativecommons.org/licenses/by-sa/3.0/";
356 // Allow users to edit privacy link.
357 $wgGroupPermissions['user']['editinterface'] = true;
358 
359 
360 $wgMFIgnoreEventLoggingBucketing = true;
361 // Make languages.feature test work.
362 try {
363     $dbw = wfGetDB( DB_MASTER );   # <------- call / root cause
364     $dbw->insert( 'interwiki',
365         array(
366             'iw_prefix' => 'es',
367             'iw_url' => 'http://wikifoo.org/es/index.php/$1',
368             'iw_local' => 0,
369             'iw_trans' => 0,
370         )
371     );
372 } catch ( Exception $e ) {
373     // Pass - interwiki link already exists.
374 }

That is being injected from MobileFrontend tests/browser/LocalSettings.php and hasn't been touched for a few weeks. With the introduction of the ChronologyProtection system, that makes the wfGetDB() call to eventually call an unset request context :(

That LocalSettings.php hack should be made a hook, probably in SetupAfterCache.

hashar triaged this task as High priority.Nov 17 2015, 10:26 AM

high priority since it breaks tests.

I don't think ChronologyProtector is to blame. I cced Aaron and Tim merely for information.

It seems that the problem was caused by rMW85c0f85e925e: Decouple ChronologyProtector from user sessions like mentioned by @hashar. Instead of creating a new object of ChronologyProtector directly, it uses newChronologyProtector which requires a WebRequest/FauxRequest object (through RequestContext::getRequest()). It will fall back to the global wgRequest, because it's not set yet (I think this is nearly the first call to RequestContext::getMain(), which crates a new object of itself).Unfortunately, the $wgRequest global will be initialized, _after_ LocalSettings.php was loaded.

It would be possible to fix the MF only problem by moving the db update job into a hook, but that doesn't solve the problem, that wfGetDB isn't available in LocalSettings.php anymore. So we should consider: Should wfGetDB() be available in LocalSettings.php? I think yes, any function, that is available (and GlobalFunctions.php is loaded, before LocallSettings.php) should work. Maybe we can fix this task in the following way?

[[ https://github.com/wikimedia/mediawiki/blob/master/includes/context/RequestContext.php#L124-L125 | RequestContext::getRequest() ]] already has a comment for the usage of the global variable (# fallback to $wg till we can improve this). Maybe we can move the initialization of the WebRequest object from Setup.php into RequestContext.php, e.g. into a private function called createRequest() or something similar. This function would hold the functionality part of https://github.com/wikimedia/mediawiki/blob/master/includes/Setup.php#L617-L636 and the debugging part would live there too, or (I'm not sure, how helpful would this) would still be in Setup.php, which calls RequestContext::getMain()->getRequest() to initialize the Request object.

Change 253651 had a related patch set uploaded (by Florianschmidtwelzow):
RequestContext: Load the request object for getRequest on first call

https://gerrit.wikimedia.org/r/253651

Wait what?? You can't make DB writes in LocalSettings.php and expect that to work....that's crazy. If you want to add an interwiki prefix, just use the InterwikiLoadPrefix hook.

Change 253658 had a related patch set uploaded (by Florianschmidtwelzow):
Use InterwikiLoadPrefix to use interwiki data for brwoser tests

https://gerrit.wikimedia.org/r/253658

Change 253658 merged by jenkins-bot:
Use InterwikiLoadPrefix to use interwiki data for browser tests

https://gerrit.wikimedia.org/r/253658

Everything seems to work now.

Should I close as resolved.

Paladox removed a project: Patch-For-Review.
Paladox removed a subscriber: gerritbot.

Thank you everyone :-)