This task is the result of the spike T162713. The findings below are copied from it with slight modifications:
I see a little more than 2,000 instances of `message:"Found infobox wrapped"` in Logstash [[ https://logstash.wikimedia.org/app/kibana#/discover?_g=(refreshInterval:(display:Off,pause:!f,value:0),time:(from:'2017-04-13T15:43:43.192Z',mode:absolute,to:'2017-04-25T15:43:43.192Z'))&_a=(columns:!(_source),index:'logstash-*',interval:d,query:(query_string:(analyze_wildcard:!t,query:'message:%22Found%20infobox%20wrapped%22')),sort:!('@timestamp',desc)) | over the last 10 days ]]. Here are some of the page titles that were logged:
1. [[https://en.m.wikipedia.org/wiki/Rodney_Peete | Rodney Peete ]] - the infobox is a direct child of the lead section, and the page was last edited 25 days ago while the log entry was created today;
2. [[https://en.m.wikipedia.org/wiki/LNER_Class_A4_4468_Mallard | LNER_Class_A4_4468_Mallard ]] - the infobox is a direct child of the lead section;
3. [[https://zh.m.wikipedia.org/wiki/%E9%A6%99%E6%B0%B4 | 香水 ]] - no infobox on the page;
4. [[https://en.m.wikipedia.org/wiki/Portal:Current_events | Portal:Current_events]] - the page is laid out using tables, so I wonder whether we only need to worry about articles in the Main namespace;
5. [[https://en.m.wikipedia.org/wiki/Bob_Hope_Airport | Bob_Hope_Airport]] - the infobox is a direct child of the lead section.
#4 above is a product decision in my opinion. @ovasileva to clarify what to do and update the A/C below.
=A/C=
The goal of the task is to fix the false positives listed above, i.e.
[] don't log cases where the infobox is a direct child of the lead section;
[] don't log cases where no infobox is present on the page.
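
To make the two conditions above concrete, here is a minimal sketch in Python of the decision they describe. It is not the actual implementation: the function name `should_log_wrapped_infobox`, the use of the `infobox` class to identify an infobox, and the assumption that the lead section arrives as a well-formed XML fragment whose root element is the lead section are all hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET


def has_class(element, name):
    """Return True if `name` is one of the element's space-separated classes."""
    return name in element.get("class", "").split()


def should_log_wrapped_infobox(lead_section_markup):
    """Return True only when an infobox exists and is wrapped in another element.

    Assumptions (hypothetical, for illustration only):
    - the root element of `lead_section_markup` is the lead section;
    - an infobox is any element carrying the `infobox` class.
    """
    lead = ET.fromstring(lead_section_markup)

    # Collect every infobox below the lead section (the root itself is skipped).
    infoboxes = [
        el for el in lead.iter()
        if el is not lead and has_class(el, "infobox")
    ]

    # No infobox on the page: nothing to log.
    if not infoboxes:
        return False

    # Log only if at least one infobox is not a direct child of the lead section.
    direct_children = list(lead)
    return any(box not in direct_children for box in infoboxes)
```

With this sketch, a lead section such as `<div><table class="infobox"/><p>text</p></div>` and one without any infobox would both be skipped, while `<div><div><table class="infobox"/></div></div>` would still be logged.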