MobileFrontend corrupts parser cache for regular page views
Closed, ResolvedPublic

Description

Starting around 4 July, there have been many reports (approximately one every two days) on [[de:Wikipedia:Fragen_zur_Wikipedia]] (the dewiki village pump) about issues obviously caused by badly nested HTML. Purging fixes the issue, but it seems that Tidy wasn't executed in these cases. Since this kind of issue now definitely occurs more frequently than before, it should be investigated why Tidy fails to run so often.


Version: unspecified
Severity: major
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=58042

bzimport added a project: MobileFrontend-stable.Via ConduitNov 22 2014, 1:02 AM
bzimport set Reference to bz38273.
Schnark created this task.Via LegacyJul 10 2012, 8:12 AM
TheDJ added a comment.Via ConduitJul 10 2012, 1:40 PM

If this happens, can people use the "View source" feature of their browser, look at the bottom of the source for "<!-- Served by mw## in 2.259 secs. -->", and note the mw## id before purging the page?

That will probably help in pinpointing the problem further.

Reedy added a comment.Via ConduitJul 10 2012, 1:41 PM

(In reply to comment #1)

If this happens, can people use the "View source" feature of their browser,
look at the bottom of the source for "<!-- Served by mw## in 2.259 secs. -->", and note the mw## id before purging the page?

That will probably help in pinpointing the problem further.

Or <!-- Served by srv#### in 2.259 secs. -->

Reedy added a comment.Via ConduitJul 10 2012, 1:51 PM

Don't even need to do that, looking at the dpkg output suggests multiple are missing it

Reedy added a comment.Via ConduitJul 10 2012, 2:11 PM

(In reply to comment #3)

Don't even need to do that, looking at the dpkg output suggests multiple are
missing it

bleh, ignore me

Schnark added a comment.Via ConduitJul 14 2012, 9:01 AM

mw53 just served an untidy html for [[de:Keith Jarrett]]

TheDJ added a comment.Via ConduitJul 19 2012, 9:52 AM

Can someone with shell access do a sanity check on that host please ?

Reedy added a comment.Via ConduitJul 23 2012, 12:03 PM

reedy@mw53:~$ which tidy
/usr/bin/tidy
reedy@mw53:~$ tidy --version
HTML Tidy for Linux released on 25 March 2009
reedy@mw53:~$ php /usr/local/apache/common-local/multiversion/MWScript.php eval.php enwiki

echo $wgTidyConf

/usr/local/apache/common-local/php-1.20wmf7/includes/tidy.conf

echo $wgTidyBin

tidy

Need to check the source for existence of "Tidy was unable to run" or "Tidy found serious XHTML errors"
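
The check described above could be scripted roughly as follows. This is only a sketch: the function name and the returned dictionary are made up for illustration, and the only strings it relies on are the "Served by" comment and the two Tidy messages quoted in this thread.

```python
import re

# Matches the footer comment quoted above, e.g.
# "<!-- Served by mw53 in 2.259 secs. -->" or "<!-- Served by srv243 in ... -->".
SERVED_BY = re.compile(r"<!--\s*Served by (\S+) in ([\d.]+) secs\.\s*-->")

# The two messages mentioned in the comment above.
TIDY_MESSAGES = ("Tidy was unable to run", "Tidy found serious XHTML errors")


def inspect_page_source(html):
    """Return the serving host and any Tidy failure messages found in the HTML."""
    match = SERVED_BY.search(html)
    return {
        "server": match.group(1) if match else None,
        "tidy_messages": [msg for msg in TIDY_MESSAGES if msg in html],
    }
```

Run it on the saved "View source" output before purging; an empty tidy_messages list together with badly nested markup would suggest Tidy was skipped rather than that it failed.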

Orlodrim added a comment.Via ConduitJul 25 2012, 3:14 PM

We have a similar problem on frwiki. As far as I know, the first error was reported on 25 June, and there have been at least 10 reports since then. Today, I loaded a page (Richard Feynman) twice. The server was mw6 the first time and srv243 the second time, and I got exactly the same incorrect rendering both times (then I purged the cache, which fixed the problem).

Drongou added a comment.Via ConduitJul 25 2012, 10:09 PM

The </div> of <div id="content" class="mw-body"> is after <!-- Served ...
see : http://imageshack.us/f/801/capturedcran20120720011.png/

Umherirrender added a comment.Via ConduitJul 29 2012, 9:06 AM

Served by mw4 in 0.208 secs. on dewiki

Schniggendiller added a comment.Via ConduitJul 30 2012, 10:04 AM

http://de.wikipedia.org/wiki/DB_City_Night_Line: Served by srv240 in 0.351 secs. (A few days ago.)

Schnark added a comment.Via ConduitAug 1 2012, 10:13 AM

Served by mw30 in 0.196 secs. ([[de:Galatasaray Istanbul]])

<div clear="all" style="clear:both;" /><br />
<div style="background-color:#888; height:1px; width:8em;"/>

became

<p><br style="clear:both;" clear="all"/>
</br>
</p>
<div style="background-color:#888; height:1px; width:8em;"/>

Schnark added a comment.Via ConduitAug 1 2012, 10:15 AM

Served by srv229 in 0.118 secs. ([[de:Glosche]])
Served by mw11 in 0.111 secs. ([[de:Toupet]])

Schnark added a comment.Via ConduitAug 2 2012, 8:15 AM

(In reply to comment #12)

<div clear="all" style="clear:both;" /><br />
<div style="background-color:#888; height:1px; width:8em;"/>

became

<p><br style="clear:both;" clear="all"/>
</br>
</p>
<div style="background-color:#888; height:1px; width:8em;"/>

Sorry, I wrote nonsense there. That strange
<br style="clear:both;" clear="all"/>
</br>
already was in the wikitext. I didn't notice it, because I use a script to automatically clean up some errors.

Schniggendiller added a comment.Via ConduitAug 7 2012, 6:44 PM

Right now and not yet fixed with purging: http://de.wikipedia.org/wiki/Bundespr%C3%A4sident_%28Deutschland%29: Served by srv260 in 0.155 secs

Umherirrender added a comment.Via ConduitAug 12 2012, 4:51 PM

Increasing priority, because HTML Tidy is being skipped on pages more often, at least on de.wp.

Umherirrender added a comment.Via ConduitAug 20 2012, 7:39 PM

HTML tidy is still missing on some pages.

Is there no solution?

Orlodrim added a comment.Via ConduitAug 21 2012, 7:28 PM

As a workaround, I replaced <div ... /> with <div ... ></div> in the most-used templates on frwiki. I have not seen any complaints since I did that, 20 days ago. The display of some pages is still broken, but this is virtually invisible.
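
For anyone who wants to apply the same workaround to template wikitext in bulk, a regex along these lines might do. It is a sketch only: the function name is hypothetical, and it assumes attribute values never contain "<" or ">".

```python
import re

# A self-closed <div ... /> with optional attributes; \b keeps it from
# matching other tag names that merely start with "div".
SELF_CLOSED_DIV = re.compile(r"<div\b([^<>]*?)\s*/>", re.IGNORECASE)


def expand_self_closed_divs(wikitext):
    """Rewrite <div ... /> as <div ...></div>, leaving everything else alone."""
    return SELF_CLOSED_DIV.sub(r"<div\1></div>", wikitext)
```

Applied to the example from the earlier comment, <div style="background-color:#888; height:1px; width:8em;"/> becomes an explicit open/close pair, so the output no longer depends on Tidy to repair it.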

Redrose64 added a comment.Via ConduitSep 1 2012, 11:10 PM

This is still happening on en.wp. On 14 August 2012, mw4 and mw44 both served invalid HTML; see http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_102#Are_the_HTML_generators_out_of_sync_on_some_servers.3F

Today, a few minutes ago, I saw exactly the same problem on a different article. On checking, I found that I was again served invalid HTML, this time by srv272.

The basic problem concerns two table cells which each contain an unordered list with several items. The last </li> of each list and the </ul> immediately following were not in the proper place, but placed later on: either in between one </th> and the next <td>, or between a </tr> and the </table> following.

Umherirrender added a comment.Via ConduitSep 4 2012, 6:50 PM

This needs timely further investigation, because it breaks many pages on many wikis. Some users get confused because the problem goes away after a while, once someone else has purged the page.

Please have a look at this. Thanks.

matmarex added a comment.Via ConduitSep 9 2012, 8:40 PM

I think this may be a dupe: bug 40121. Interesting thing: the NewPP limit report is missing. Page served by mw15.

Redrose64 added a comment.Via ConduitSep 12 2012, 4:45 PM

Same problem seen in the navbox at the bottom of http://en.wikipedia.org/wiki/Operation_Nougat - this was served by mw20. Omitting all attributes, the textual content of enclosures, and all correctly-paired tags which do not enclose bad tags, the mis-ordered tags are:

<table>
<tr>
  <td>
    <div>
            <ul>
              <li>
    </div>
    <div>
              </li>
            </ul>
    </div>
    <table>
      <tr>
        <td>
          <div>
                  <ul>
                    <li>
          </div>
          <div>
                    </li>
                    <li>
          </div>
        </td>
      </tr>
      <tr>
        <td>
          <div>
                    </li>
                    <li>
          </div>
        </td>
      </tr>
      <tr>
        <td>
          <div>
                    </li>
                  </ul>
          </div>
        </td>
      </tr>
    </table>
  </td>
</tr>

</table>
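
Mis-ordered tags like the ones above can be spotted mechanically with a simple stack check. The following is a rough sketch, not a replacement for Tidy: the function name is hypothetical, and void elements written without a trailing slash (such as a bare <br>) would be reported as false positives.

```python
import re

# Groups: optional leading "/", tag name, optional trailing "/" (self-closed).
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^<>]*?(/?)>")


def find_misnested(html):
    """Return (closing_tag, tag_that_was_open) pairs for every bad close."""
    stack, problems = [], []
    for match in TAG.finditer(html):
        closing, name, self_closed = match.group(1), match.group(2).lower(), match.group(3)
        if self_closed:
            continue  # <div ... /> opens and closes itself
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            problems.append((name, stack[-1] if stack else None))
            if name in stack:
                # Recover by closing everything up to the matching open tag.
                while stack.pop() != name:
                    pass
    return problems
```

On the first few tags of the structure above (<div> <ul> <li> </div> ...), the first problem reported is a </div> arriving while <li> is still open.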

Schnark added a comment.Via ConduitSep 14 2012, 8:31 AM

I think I found the problem why Tidy sometimes isn't executed:

./includes/job/RefreshLinksJob.php calls
ParserOptions::newFromUserAndLang( new User, $wgContLang )
while in other places makeParserOptions from ./includes/WikiPage.php is called, which additionally calls
$options->enableLimitReport();
$options->setTidy( true );

This also explains why the limit report is missing.

This means that this bug report is in the wrong component, but since I know neither where it actually belongs nor how to change both product and component, I'm just leaving it as is.

btw: The structure in the previous comment reminds me a bit of Alice in Wonderland: https://en.wikisource.org/wiki/Alice%27s_Adventures_in_Wonderland/Chapter_3

Krinkle added a comment.Via ConduitSep 16 2012, 2:51 PM

*** Bug 40121 has been marked as a duplicate of this bug. ***

Redrose64 added a comment.Via ConduitSep 26 2012, 3:04 PM

Happened again with server mw44. BTW the long and sad tale (which I first read in about 1973) was not forefront - I wanted to illustrate the mismatching by means of indent levels. For indents as deep as twelve levels, tabs are impractical so I used two spaces.

Redrose64 added a comment.Via ConduitSep 27 2012, 3:36 PM

A new one: srv200 did this too

Redrose64 added a comment.Via ConduitSep 27 2012, 4:51 PM

Also mw29

matmarex added a comment.Via ConduitSep 27 2012, 5:10 PM

Comment 23 contains what could be a patch. Could somebody competent look at this?

(Updating component and project.)

MaxSem added a comment.Via ConduitSep 28 2012, 10:18 PM

(In reply to comment #23)

I think I found the problem why Tidy sometimes isn't executed:

./includes/job/RefreshLinksJob.php calls

ParserOptions::newFromUserAndLang( new User, $wgContLang )

while in other places makeParserOptions from ./includes/WikiPage.php is called,
which additionally calls

$options->enableLimitReport();
$options->setTidy( true );

This also explains why the limit report is missing.

RefreshLinksJob doesn't save the results of this parse, so people shouldn't be getting these results, and avoiding Tidy calls here makes a lot of sense, as the results are used only for link updates.

Schnark added a comment.Via ConduitOct 4 2012, 7:48 AM

I tried to reproduce the issue by creating a page with wrongly nested syntax and changing linked/transcluded pages, but everything displayed correctly.

But all reports about broken layout that say something about the NewPP limit report mention that it is missing. (Latest report: [[de:Frankfurt (Main) Hauptbahnhof]] served by mw29 and srv264)

So either there is some other place where ParserOptions is created without enabling Tidy and LimitReport, or under some strange circumstances I wasn't able to replicate, RefreshLinksJob does save the result to cache.

MZMcBride added a comment.Via ConduitOct 5 2012, 10:45 PM

Tim, could you please poke at this? This seems like your kind of thing. :-)

Schnark added a comment.Via ConduitOct 9 2012, 9:26 AM

[[de:Blue October]]: no tidy, no NewPP limit report
Saved in parser cache with key dewiki:pcache:idhash:3450708-0!*!0!!de!4!* and timestamp 20120912181102
Served by srv271 in 0.190 secs
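
To see how long such an entry has been sitting in the cache, the timestamp can be decoded as follows. This is a small sketch that assumes the usual 14-digit MediaWiki format (YYYYMMDDHHMMSS); the helper name is made up, and the time zone of the comment time used for comparison is not known.

```python
from datetime import datetime


def parse_mw_timestamp(ts):
    """Decode a 14-digit YYYYMMDDHHMMSS timestamp into a datetime."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


saved = parse_mw_timestamp("20120912181102")
# Time of this comment (Oct 9 2012, 9:26 AM); its time zone is an unknown here.
reported = datetime(2012, 10, 9, 9, 26)
age_days = (reported - saved).days
```

The entry was saved on 12 September, so the bad output had been served from cache for roughly four weeks without a purge.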

tstarling added a comment.Via ConduitOct 9 2012, 11:51 PM

Looks like a MobileFrontend bug. Call stack:

  • require
  • ApiMain::execute
  • ApiMain::executeActionWithErrorHandling
  • ApiMain::executeAction
  • ApiMobileView::execute
  • ApiMobileView::getData
  • WikiPage::getParserOutput
  • PoolCounterWork::execute
  • PoolWorkArticleView::doWork
  • ParserCache::save

ApiMobileView creates a new default ParserOptions object instead of getting one from Article::getParserOptions() etc., where Tidy and limit reports are enabled.
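
The effect can be illustrated with a toy model. None of this is MediaWiki code: the classes below only borrow the names used in this thread, the "Tidy" step is a trivial string replacement, and the model assumes the cache key does not depend on the tidy or limit-report settings, which is what the symptoms suggest.

```python
from dataclasses import dataclass


@dataclass
class ParserOptions:
    # Defaults model a freshly created options object: both features off.
    tidy: bool = False
    limit_report: bool = False


def page_view_options():
    """Stand-in for the options a regular page view uses."""
    return ParserOptions(tidy=True, limit_report=True)


def parse(wikitext, options):
    html = wikitext
    if options.tidy:
        html = html.replace("<div/>", "<div></div>")  # stand-in for Tidy
    if options.limit_report:
        html += "\n<!-- NewPP limit report -->"
    return html


class ParserCache:
    def __init__(self):
        self._store = {}

    def get_or_parse(self, page_id, wikitext, options):
        # Keyed by page only: whoever parses first decides what everyone gets.
        if page_id not in self._store:
            self._store[page_id] = parse(wikitext, options)
        return self._store[page_id]


cache = ParserCache()
wikitext = "<div/>text"
mobile_html = cache.get_or_parse(1, wikitext, ParserOptions())        # mobile API request
desktop_html = cache.get_or_parse(1, wikitext, page_view_options())   # regular page view
```

Here the regular page view gets back the untidied output without a limit report, because the mobile request populated the cache first. Purging empties the entry, which matches the observation that purging fixes the page until the next mobile request happens to win the race.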

MaxSem added a comment.Via ConduitOct 23 2012, 10:14 PM

Was deployed at least a week ago.
