Page MenuHomePhabricator

[BUG] [Content Service] Tapping random causes an unknown error sometimes
Closed, ResolvedPublic1 Estimated Story Points

Description

Steps to reproduce

Tap random until it goes to an article with a comma in the latitude, such as Savonlinna on ES Wikipedia.

Expected resultsPage is shown.

Actual results

An unknown error occurs.

screenshot-2016-05-17-16-39-31-999812163.png (2×1 px, 96 KB)

Environments observed

App version: 10fbcadedd9dba6e8b4c9a8e890e1d1998c346e7
Content Service: 5ed16e2fa034e217898f80f68a362ea3618349ec
Android OS versions: API 23
Device model: Nexus 6P
Device language: English

Details

Related Changes in Gerrit:

Event Timeline

Change 289329 had a related patch set uploaded (by Niedzielski):
Fix geo prop

https://gerrit.wikimedia.org/r/289329

Niedzielski renamed this task from [BUG] Tapping random causes an unknown error sometimes to [BUG] [Content Service] Tapping random causes an unknown error sometimes.May 18 2016, 7:41 PM
Niedzielski set the point value for this task to 1.

@Nicholas.tsg I've moved this here since the coding work is done but it's not yet ready for QA since we still have to deploy the server-side change. I'll update here when it's ready for testing.

The Mobile Content Service change is deployed. The results won't be seen until the affected pages get edited.
The Services team would need to run pre-generation if this is urgent and cannot wait until the affected pages get edited.

@mobrovac Roughly how long would we expect a pre-generation run to take?

The Mobile Content Service change is deployed. The results won't be seen until the affected pages get edited.
The Services team would need to run pre-generation if this is urgent and cannot wait until the affected pages get edited.

@mobrovac Roughly how long would we expect a pre-generation run to take?

If we force pre-generation, then around a week for all wikis, but I would really discourage doing that as (a) it takes a bad toll on storage and Cassandra (it's an intensive process); and (b) sets a bad precedent for the future. A better approach would be to get the list of affected pages (if possible) and regenerate only those. Could you provide such a list?

Generally speaking, this is also a bug in the App, as it must be able to handle such situations.

It seems like many pages on ES wiki are affected. Would it be possible to regenerate just ES?

I'm not in favour of going crazy and regenerate stuff based on a hunch. Can you provide more details as to the affected pages? Can you produce a list of such pages? Or at least rough estimates about any type of characteristics these pages share so that we can narrow it down somehow?

Sure, here's a list of pages generated by tapping the random button 30 times on ES Wikipedia:

  • Samba_(India)
  • Universidad_de_Leópolis
  • Harem_(Siria)
  • Pueblo_Nuevo_de_las_Raíces
  • Linz

The characteristic is that geo.latitude is null. Since many wiki pages are places, and places often have a geolocation, the failure is common.

Ok, so all pages that have lead.geo but are missing either latitude or longitude ?

I don't think the pages are missing latitude or longitude. They just weren't being parsed right due to an unexpected trailing comma in the latitude. For example, the Linz page has this:

<span class="geo">
  <span class="latitude">48.3,</span>
  <span class="longitude">14.283333333333</span>
</span>

Right, sorry, I was referring to the output of the MCS. I.e. are we to re-render all titles for which the stored content of the mobile-sections output is missing one or the other?

Oh! Then yes :) If it's possible to re-render on something like lead.geo && (lead.geo.latitude === null || lead.geo.longitude === null), that would be ideal!

Mentioned in SAL [2016-05-25T18:58:14Z] <mobrovac> restbase running a partial mobile-sections dump of eswiki for T135571 on restbase1009

I have started the regeneration on eswiki, but only for articles that satisfy the condition geo && !(geo.latitude && geo.longitude). Note that even though a subset of articles will be re-rendered, it will take a while for the script to complete since it needs to fetch the contents of every article, parse it and only then ask for it to be regenerated.

Mentioned in SAL [2016-05-26T07:43:36Z] <mobrovac> restbase starting partial mobile-sections dump of enwiki for T135571 on restbase1009

eswiki done, I started the same process for enwiki. @Niedzielski could you check and make sure the error's gone for eswiki?

Issuing curl localhost:8888/es.wikipedia.org/v1/page/mobile-sections-lead/Linz on the MCS node in production gives the same result, so it seems the problem hasn't been fully fixed.

Hm, I'm seeing something different on my machine for deploy/2016-05-23/5ce4f31. curl http://localhost:6927/es.wikipedia.org/v1/page/mobile-sections-lead/Linz|jq .geo yields:

{
  "latitude": 48.3,
  "longitude": 14.283333333333
}

The patch included a test for this scenario and seemed to work. Sorry, but would you mind double checking?

So, it turns out https://gerrit.wikimedia.org/r/289329 hasn't been deployed yet. Consequently, I have stopped the regeneration. Let's be more careful in the future and coordinate better.

I've added T136355 to help verify that the deployment happened for real.

The fix is now deployed. Should we restart the regeneration?

PS @Nicholas.tsg -- this still isn't quite ready for testing yet, but should be soon. I'll update when it is.

@mobrovac, @Mholloway, Yay! It's working!

@Nicholas.tsg I think is ready for testing on español Wikipedia. The steps would be:

  1. Change the app language to español by tapping the search bar, then tapping the language button in the upper right, typing "spanish", and selecting "Español / Spanish".
  2. Search for Linz.
  3. Tap the Linz entry and verify the page loads.

Tested on Toshiba ATC-7 with Android 4.4.2 and Wikipedia app 2.2.146-alpha-2016-05-31. This is fixed now as after I followed the @Niedzielski steps above the Linz page loaded on español Wikipedia.