
Adjustments to how we log troublesome infoboxes and move lead paragraph
Closed, Resolved · Public · 2 Story Points

Description

There are several improvements we should make before rolling out the lead paragraph move that will reduce the likelihood of an unexpected outcome and allow us to discover edge cases. Reducing the noise in the logs will increase our confidence about deploying. Deploying is only going to increase the volume of logging so we should do our best to clean up beforehand.

Spike notes

This task is the result of the spike T162713. Here is a copy & paste with slight modifications:

I see a little more than 2,000 instances of message:"Found infobox wrapped" over the last 10 days. Here are some of the page titles logged:

  1. Rodney Peete - There are two infoboxes on the page and the second one is a child of the first one;
  2. LNER_Class_A4_4468_Mallard - the infobox is a direct child of the lead section, but there are other infoboxes inside the main one;
  3. Portal:Current_events and https://es.wikipedia.org/wiki/Wikipedia:Tablón de anuncios de los bibliotecarios/Portal/Archivo/Protección de artículos/Actual?oldid=98664160 - Considering these pages are laid out using tables, I wonder if we need to worry only about articles in the main namespace;
  4. Bob_Hope_Airport - another case of an infobox within an infobox
  5. https://bn.wikipedia.org/wiki/%E0%A6%A8%E0%A7%87%E0%A6%A4%E0%A6%BE%E0%A6%9C%E0%A6%BF_%E0%A6%B8%E0%A7%81%E0%A6%AD%E0%A6%BE%E0%A6%B7%E0%A6%9A%E0%A6%A8%E0%A7%8D%E0%A6%A6%E0%A7%8D%E0%A6%B0_%E0%A6%AC%E0%A6%B8%E0%A7%81_%E0%A6%86%E0%A6%A8%E0%A7%8D%E0%A6%A4%E0%A6%B0%E0%A7%8D%E0%A6%9C%E0%A6%BE%E0%A6%A4%E0%A6%BF%E0%A6%95_%E0%A6%AC%E0%A6%BF%E0%A6%AE%E0%A6%BE%E0%A6%A8%E0%A6%AC%E0%A6%A8%E0%A7%8D%E0%A6%A6%E0%A6%B0?oldid=2553697 - infobox within infobox on Bengali wiki

A/C

The goal of the task is to fix the false positives listed above, i.e.

  • don't log cases where an infobox is a child of another infobox;
  • Only move infoboxes on pages in the main namespace (0) and thus only log infoboxes wrapped in containers on these pages
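The two criteria above can be sketched as a small filter. This is an illustration only, not the MobileFrontend patch: it assumes infoboxes are `table` elements carrying the `infobox` class, that the lead section is available as a parsed element, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

MAIN_NAMESPACE = 0  # articles live in namespace 0


def is_infobox(el):
    # Assumption: an infobox is a <table> whose class list contains "infobox".
    return el.tag == "table" and "infobox" in el.get("class", "").split()


def wrapped_infoboxes_to_log(lead, namespace):
    """Return infoboxes in the lead section that are worth logging.

    Skips pages outside the main namespace and infoboxes nested inside
    another infobox; reports only infoboxes wrapped in some other container.
    """
    if namespace != MAIN_NAMESPACE:
        return []

    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in lead.iter() for child in parent}

    to_log = []
    for el in lead.iter():
        if el is lead or not is_infobox(el):
            continue
        # Collect all ancestors up to the lead section.
        ancestors = []
        node = el
        while node in parent_of:
            node = parent_of[node]
            ancestors.append(node)
        if any(is_infobox(a) for a in ancestors):
            continue  # child of another infobox: a false positive
        if parent_of[el] is not lead:
            to_log.append(el)  # wrapped in a non-infobox container
    return to_log
```

An infobox that is a direct child of the lead section is never reported, since it is already where the lead paragraph move expects it.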

Event Timeline

Restricted Application added a subscriber: Aklapper. · Apr 25 2017, 5:45 PM

Can you add the revision ids to all these examples? They are really important for debugging.

Jdlrobson moved this task from Incoming to Needs Prioritization on the Readers-Web-Backlog board.
bmansurov removed bmansurov as the assignee of this task. · Apr 26 2017, 7:01 PM
bmansurov updated the task description.
bmansurov updated the task description. · Apr 26 2017, 7:44 PM

There is one important thing: the script was logging every wrapped infobox on every page, even if a correct infobox was found. I think the logging could be a bit better: log all instances only if no correct infobox is found.
For example, https://en.m.wikipedia.org/wiki/Rodney_Peete?oldid=773172112 is a correct page.
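A minimal sketch of that suggestion, under the assumptions that a "correct" infobox is a `table` with the `infobox` class sitting directly under the lead section, and that the lead section is available as a parsed element. The function names are hypothetical and this is not the code from the extension.

```python
import xml.etree.ElementTree as ET


def is_infobox(el):
    # Assumption: an infobox is a <table> whose class list contains "infobox".
    return el.tag == "table" and "infobox" in el.get("class", "").split()


def infoboxes_to_log_for_page(lead):
    """Log wrapped infoboxes only when the page has no correct infobox."""
    # A correct infobox is a direct child of the lead section.
    if any(is_infobox(child) for child in lead):
        return []
    # No correct infobox found, so every infobox in the lead is wrapped
    # in some container and gets reported.
    return [el for el in lead.iter() if el is not lead and is_infobox(el)]
```

With this rule a page like the Rodney Peete revision above, which has a correctly placed infobox and a second one nested inside it, produces no log entries.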

Jdlrobson renamed this task from Logging of instances of infoboxes being wrapped in containers is misbehaving to Adjustments to how we log troublesome infoboxes and move lead paragraph. · Apr 26 2017, 11:59 PM
Jdlrobson updated the task description.
Jdlrobson moved this task from Needs Prioritization to Upcoming on the Readers-Web-Backlog board.
Jdlrobson added a subscriber: Nirzar.

1+2+5) Agreed.

  3. Agreed. We should only be moving infoboxes if the page is in the main namespace.
  4. Let's debug this a little more. I couldn't find this in logstash. Having looked at the code, I see no reason why we would be logging pages that do not have infoboxes.

In the absence of @ovasileva, @Nirzar can make the decision about whether we should limit moving the lead paragraph to pages in the main namespace. It should be fine to limit this to articles (main namespace = 0).

Jdlrobson updated the task description. · Apr 27 2017, 12:06 AM

The only mention of that page and revision id that I can see in logstash is this unrelated entry. So I think we can safely dismiss any problems with the infobox logging for 香水 ...?

Jdlrobson triaged this task as High priority. · Apr 27 2017, 12:11 AM
Jdlrobson updated the task description.

We should aim to reduce the amount of logging that is occurring to give us more confidence for enabling this feature.

@Jdlrobson that was my bad: I took the referrer rather than the actual page where the infobox is wrapped in a div, which is 苦橙.

bmansurov updated the task description. · Apr 27 2017, 5:43 PM
Jdlrobson updated the task description. · Apr 27 2017, 5:52 PM
pmiazga claimed this task. · May 1 2017, 5:16 PM

I'm pulling this task into Sprint 96 as there is nothing else to do right now.

Jdlrobson added a subscriber: MBinder_WMF.

Channeling @MBinder_WMF: we should estimate this during standup on Tuesday.

pmiazga moved this task from To Do to Doing on the Reading-Web-Sprint-96 board. · May 1 2017, 10:06 PM

Change 351340 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/extensions/MobileFrontend@master] Skip logging infoboxes in special cases

https://gerrit.wikimedia.org/r/351340

pmiazga set the point value for this task to 2. · May 2 2017, 5:07 PM

Change 351340 merged by jenkins-bot:
[mediawiki/extensions/MobileFrontend@master] Skip logging infoboxes in special cases

https://gerrit.wikimedia.org/r/351340