Version: WikipediaApp/7.3.0.2242 (iOS 16.5; Phone)
Description
Details
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| Debug: Log empty mostread responses | mediawiki/services/wikifeeds | master | +6 -0 |

| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | | None | T339127 Bug: Blank top read articles |
| Resolved | | Jgiannelos | T354605 Add debug information for empty most read responses |
Event Timeline
Questions for @ARamadan-WMF:
- Was there only one report, or multiple reports?
- What day/time were the issues received?
I've asked the following of the content transform team:
Are there any status alerts for Wikifeeds? My question relates to https://phabricator.wikimedia.org/T339127. I suspect there might have been some sort of issue with the API, since the problem was intermittent, and I'm wondering whether we have any decent means to measure uptime and/or correctness of the endpoint outputs.
@Seddon, we received only one report for it; it came in on the 12th of June and hasn't been resolved yet.
+1 report on the 17th of June.
Version: WikipediaApp/7.3.0.2242 (iOS 16.5; Phone)
And another report on the 19th of June.
Version: WikipediaApp/7.3.0.2242 (iPadOS 16.5; Tablet)
This diagram looks like it correlates with the issue:
https://grafana.wikimedia.org/goto/Ie0w_5u4k?orgId=1
Even though the error rate in the logs is unchanged, it looks like some of the requests are returning 50x errors.
Can somebody from the apps team help me with a client-side implementation detail:
- Which API request renders the top-read functionality? Is it /page/most-read?
- Would this issue be reproduced if there was an error on /page/most-read, if the response was empty, or both?
I can't find a log entry that is related to most-read, and the error rate is very low.
@Jgiannelos, the reports I received through the iOS support email are for the top read widget only, as shown in the screenshot.
No, I think that would be enough to filter out some logs.
Just for clarity regarding debugging: the first user report was on the 12th of June, but the errors show up on the 15th, so the slight error rate increase on the diagram might be a red herring.
The first reply we got was as follows:
In English. The app is in the latest version too
The second response:
I use it in English as the primary language.
The third and last response:
The language I use in English.
We got a new report on the 20th of June:
Version: WikipediaApp/7.3.0.2242 (iPadOS 16.5; Tablet)
@Jgiannelos we fetch top read widget data via the wikifeeds featured API call:
https://en.wikipedia.org/api/rest_v1/feed/featured/2023/06/20
The data comes from the "mostread" object. Have you noticed any errors lately with that?
Would this issue be reproduced if there was an error on /page/most-read, if the response was empty, or both?
Both - for that mostread object in the featured URL response. We rewrote the way we pulled this data relatively recently, so something could be off with the way we're deserializing now (perhaps we assume some data will be there but it's missing in the response).
I did test the widget myself by changing my device date to the 12, 17, 19 and 20 - all dates populated fine for me.
Tagging @Dmantena in case he has any other thoughts or corrections.
@Jgiannelos we fetch top read widget data via the wikifeeds featured API call:
https://en.wikipedia.org/api/rest_v1/feed/featured/2023/06/20
...
I did test the widget myself by changing my device date to the 12, 17, 19 and 20 - all dates populated fine for me.
Just adding a note that while hitting this endpoint days afterwards (like now) does appear to return populated mostread content, there have been days in the past where the mostread object has been entirely missing. One of those recent days was for 2023/06/20, where the featured API call above returned objects for tfa, image and onthisday, but did not return mostread at all.
+1 user:
My top read widget does not work. It displays empty sections. It used to work 100%.
I tried to delete the app and download it again, as well as logging out and back in but it’s still faulty.
Please find attached screenshots.
Version: WikipediaApp/7.4.4.2841 (iOS 17.1.1; Phone)
+1 report:
The 'Top Reads' widget doesn't seem to work anymore. It's just placeholder stuff:
It was last working about a month or so ago.
Version: WikipediaApp/7.4.5.2871 (iOS 17.2; Phone)
I did a bit more digging on this - here is our legacy fetching method (now only used for On This Day widget):
- Fetches the latest top read item from Core Data
- Checks that item is today's date, if yes returns that item.
- If not, it fetches new wikifeeds data via the API, which populates Core Data
- Then it fetches the latest top read item from Core Data, and returns it without checking for today's date.
So before our refactor in https://github.com/wikimedia/wikipedia-ios/pull/4504, the top read widget always displayed cached old data from previous dates as a last resort. I think the frequency of the blank top read widget since our refactor is just reflecting how often the wikifeeds call is missing the top read data.
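The difference between the two flows described above can be sketched as follows. This is a minimal Python illustration, not the app's Swift code: `fetch_feed`, the dict-based `cache`, and both function names are hypothetical stand-ins for the wikifeeds call and Core Data.

```python
from datetime import date
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins: `cache` maps a date to that day's top read list;
# `fetch_feed` returns the top read list for a date, or None when the
# featured response carries no usable mostread object.
Cache = Dict[date, List[str]]
Fetch = Callable[[date], Optional[List[str]]]


def legacy_top_read(today: date, cache: Cache, fetch_feed: Fetch) -> Optional[List[str]]:
    """Legacy flow: fall back to the newest cached entry, whatever its date."""
    if cache and max(cache) == today:
        return cache[today]
    fresh = fetch_feed(today)
    if fresh:
        cache[today] = fresh  # a successful fetch populates the cache
    # Return the latest cached item without checking that it is for today.
    return cache[max(cache)] if cache else None


def refactored_top_read(today: date, cache: Cache, fetch_feed: Fetch) -> Optional[List[str]]:
    """Post-refactor flow: only data for today is accepted."""
    if today in cache:
        return cache[today]
    fresh = fetch_feed(today)
    if fresh:
        cache[today] = fresh
        return fresh
    return None  # nothing for today, so the widget has nothing to show
```

With yesterday's data cached and a fetch that returns nothing, the legacy sketch still returns yesterday's list while the refactored one returns `None`, which matches the blank widget seen in the reports.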
@JTannerWMF @Seddon @ARamadan-WMF
A couple of things we could do:
- Add client-side caching back in as a last resort, meaning the top read widget prefers data from older dates over blank. We should take care to clear it and refresh the widgets when user clears cache via Settings to limit vandalism.
- Have Content Transform investigate why top read data is missing from the wikifeeds API call (could it be a timing thing?)
We do have some metrics that show the empty responses (compared with the regular 200s):
codfw:
https://grafana.wikimedia.org/goto/IJLAVdKIz?orgId=1
eqiad:
https://grafana.wikimedia.org/goto/mV314dFIz?orgId=1
The redirects shown by 30x status code are the random title redirect.
It makes sense to add some more metrics showing empty responses per domain and endpoint, to detect what's wrong.
That said, do we know whether this issue is caused by empty responses, or whether the response is non-empty but the values for the keys are empty?
eg.
{...
mostread: []
...}
@Jgiannelos Since we don't have similar blank widget complaints on featured article or picture of the day widgets (which are populated with the same API call), it's more likely the mostread object is empty or missing but other data is populated and the response is 200. To echo @Dmantena:
there have been days in the past where the mostread object has been entirely missing
If I pick a day in the future the mostread object is missing (which makes sense):
Perhaps there are hiccups at the metrics layer that cause mostread to be missing for the whole (current) day. If you could add some logging for how often that featured endpoint returns 200 with the mostread object missing, that may be helpful. Looking at our code, if it's missing the widget should try fetching/refreshing again in 2 hours.
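The cases being distinguished here (a 200 response with mostread missing, a 200 with mostread present but empty, and a populated response) could be told apart with a check like the one below. It is only a sketch: the function name and labels are hypothetical, and the `mostread` and `articles` keys are taken from the response snippets quoted in this task.

```python
from typing import Any, Dict


def classify_mostread(status: int, body: Dict[str, Any]) -> str:
    """Label a featured-feed response by the state of its mostread object."""
    if status != 200:
        return "http_error"
    if "mostread" not in body:
        return "missing"
    mostread = body["mostread"]
    # An empty container, or an object whose "articles" list is empty,
    # both count as "empty" for the widget's purposes.
    if not mostread:
        return "empty"
    if isinstance(mostread, dict) and not mostread.get("articles"):
        return "empty"
    return "populated"
```

Counting the "missing" and "empty" labels separately per domain would show which of the two cases lines up with the user reports.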
Android will be working on Widgets next. I'd rather this is resolved at the mobile-html level if possible before we do things client side.
@Tsevener sounds good, the 204s are way too low in our metrics to cause issues. I will add some Grafana metrics for empty keys.
Change 987949 had a related patch set uploaded (by Jgiannelos; author: Jgiannelos):
[mediawiki/services/wikifeeds@master] Debug: Log empty mostread responses
There is a pending patch for instrumentation that could help us figure out when and how often we get empty mostread, but I was wondering: do we only show widgets for most read articles, or is this the only one with known failures?
That's the only component of the aggregated/featured endpoint that queries AQS for metrics instead of MW/Parsoid for page content. Maybe we can narrow it down to a problem between wikifeeds and AQS.
@Jgiannelos we have received complaints about our picture of the day widget as well, though we suspect those are client-side memory limits. I came across another ticket today that indicates top read could have the same memory limit issue. I'll do a little exploration when I get a chance to see if a top read memory limit hit could have the same appearance as what is pictured in the description. It's possible it's a memory thing on our end, or a missing mostread object, or both.
Change 987949 merged by jenkins-bot:
[mediawiki/services/wikifeeds@master] Debug: Log empty mostread responses
A few things since we deployed the debug logging:
- We have a lot of empty mostreads from fy.wikipedia.org because it's explicitly disabled in the code (the relevant commit says that the content is not PG rated, no relevant bug)
- The commit is very old, so it might be worth revisiting whether it's still an issue
- We get some requests for future dates, which are expected to be empty
I want to leave it for one more day to see if there are any patterns over a whole day (we see an increase in empty mostread towards the start of my evening UTC but not sure if this is correlated)
Here you can see the logs in logstash:
https://logstash.wikimedia.org/goto/b5056d08d110b2ef1797b9f7d3e64598
Found some relevant documentation - https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews#Updates_and_backfilling.
The data is loaded at the end of the timespan in question. So data for 2015-12-01 will be loaded on 2015-12-02 00:00:00 UTC; Data for 2015-11-10 18:00:00 UTC will be loaded on 2015-11-10 19:00:00 UTC; and so on
It does seem like the featured aggregate endpoint is smartly offsetting by 1:
i.e. Right now when I call:
https://en.wikipedia.org/api/rest_v1/feed/featured/2024/01/10
I receive a mostread item of 01/09:
"mostread": {
"date": "2024-01-09Z",
"articles": [...]
}
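The one-day offset observed above can be written down as a small helper. This is a sketch based only on the single request/response pair quoted here; the function name is hypothetical, and the `Z` suffix simply mirrors the `date` value in that response.

```python
from datetime import date, timedelta


def expected_mostread_date(requested: date) -> str:
    """Date string the mostread object appears to carry for a featured request."""
    # The featured endpoint was observed returning the previous day's data.
    return (requested - timedelta(days=1)).isoformat() + "Z"
```

For the request above, `expected_mostread_date(date(2024, 1, 10))` gives `"2024-01-09Z"`.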
Another callout:
The loading can take a long time. It's usually a few hours, but sometimes 24 hours, sometimes more if there are problems.
I guess it could just be that, if loading often takes a long time. These complaints do seem to happen in spurts. The app tries again in two hours but that still might not be enough. Falling back to our last cached data as described in https://phabricator.wikimedia.org/T339127#9431167 could help with this.
@BPirkle I was told to tag you for AQS questions - do you know if the information in https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews#Updates_and_backfilling is still accurate? This would be for the availability of data at something like https://wikimedia.org/api/rest_v1/metrics/pageviews/top/en.wikipedia.org/all-access/2024/01/09. We think we're having timing issues in the iOS app.
@BPirkle I was told to tag you for AQS questions
As of last summer's reorg, I'm no longer actively working on AQS. I'm happy to help if there's something I can do, but I'm subscribing some of the folks still working on it, as they're more up-to-date on current status and will probably be more help.
It looks like the most read date calculation returns either yesterday or two days ago if the API call uses the 'aggregated' parameter. Without it, wikifeeds may select the current day or yesterday, depending on the server's UTC time and when the request is made. If the 'aggregated' parameter is used, it should never select today. To fix the non-aggregated call, we could either make the code smarter about server UTC offsets, or just use the same -1 day hack and sometimes get the most read from two days ago instead of one, but never request the most read for today.
I think this can be fixed, with a bit more code, to always produce yesterday's most read regardless of the server UTC time setting and the request time.
Thanks @Sbailey! From what I can tell I think the iOS app always uses the aggregated endpoint, using a call like https://en.wikipedia.org/api/rest_v1/feed/featured/2024/01/10. Is there a non-aggregated endpoint accessible to us? https://en.wikipedia.org/api/rest_v1/page/most-read/2024/01/10 didn't work for me but maybe I have something wrong.
Another thing we could do client-side is refresh the widgets a little after 3am, instead of midnight. If data "usually takes a few hours" I wonder if that would get most users past the most flaky timespan.
do you know if the information in https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews#Updates_and_backfilling is still accurate?
Yes, it's still valid.
All batch jobs need to wait until the period in question has fully elapsed, so that all corresponding source data has been collected.
Moreover, once the source data is present, dataset refinement and metric computations take place, which do also take some time.
And finally, for AQS metrics, the data needs to be loaded to the serving layer (Druid or Cassandra), which takes some time as well.
So, for AQS most of the time, the metric values will only show up a couple hours after the period in question ends.
For instance (the times are made up, just to give an idea):
- On a daily endpoint, metrics for Tuesday, Jan 9th (whole day) will only be available on Wednesday, Jan 10th at e.g. 02:30h.
- On an hourly endpoint, metrics for hour 5 (05:00:00.000 -> 05:59:59.999) will be available the same day, at e.g. 07:23h.
- On a monthly endpoint, metrics for the month of December 2023 will be available e.g. on 2024 January 5th at 12:35h.
Depending on the metric, the underlying pipeline can be lighter or heavier, and it can take more or less time for the data to be available.
Some monthly endpoints take days after the month has elapsed to be ready.
I agree it's a good idea to show the latest available data-point in the most read rank.
I imagine you could request the latest data; and if that returns a 404, request the previous interval, and so forth.
Not sure though that this is possible in your setup?
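The "request the latest, then step back on a 404" idea could look roughly like this. It is a sketch under assumptions: `fetch` is a hypothetical callable that returns the parsed data for a day, or `None` when the endpoint answers 404, and `max_steps` is an arbitrary cap so the loop cannot walk back indefinitely.

```python
from datetime import date, timedelta
from typing import Any, Callable, Dict, Optional, Tuple

Fetch = Callable[[date], Optional[Dict[str, Any]]]


def latest_available(
    start: date, fetch: Fetch, max_steps: int = 3
) -> Optional[Tuple[date, Dict[str, Any]]]:
    """Walk back one day at a time until `fetch` returns data."""
    day = start
    for _ in range(max_steps):
        result = fetch(day)
        if result is not None:
            return day, result
        day -= timedelta(days=1)  # not loaded yet, try the previous interval
    return None  # nothing found within the allowed number of steps
```

Returning the day alongside the data lets the caller show which date the ranking actually belongs to.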
For what it's worth, GETing the failing URL from RESTBase (the setup we used until recently) also returns empty mostread responses, so I believe this issue is not related to the wikifeeds switchover to the standalone service.
Also, wikifeeds does try to use the previous UTC day to work around today's (UTC) data not yet being ready in AQS, but the requests that cause the empty mostread bug are for tomorrow (in UTC). So, for example:
- Today in UTC is 2024/01/16
- The URLs that cause the error (from the logs) are for /feed/featured/2024/01/17
- Wikifeeds backend tries to compute the previous day so we have metrics but 2024/01/16 is not ready yet
- Returns empty response
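One way such a "tomorrow in UTC" request can arise is a client that builds the URL from its local calendar date while being ahead of UTC. The sketch below assumes exactly that; the function name and the fixed-offset timezone are illustrative, and only the `/feed/featured/YYYY/MM/DD` path shape comes from the logs above.

```python
from datetime import datetime, timedelta, timezone


def requested_feed_path(now_utc: datetime, utc_offset_hours: float) -> str:
    """Path a client would request if it used its local calendar date."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local_now = now_utc.astimezone(local_tz)  # now_utc must be timezone-aware
    return local_now.strftime("/feed/featured/%Y/%m/%d")
```

At 20:00 UTC on 2024/01/16, a device at UTC+9 is already on the 17th locally and would request `/feed/featured/2024/01/17`, while a device at UTC-5 would still request `/feed/featured/2024/01/16`.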
I suspect this problem comes from the iOS app code:
WMFFeedContentsource.m uses the wmf_midnightUTCDateFromLocalDate function and might better use wmf_midnightUTCDateFromUTCDate instead?
@Sbailey Thanks for looking at this! WMFFeedContentSource.m is an old fetcher that we moved away from for some of our widgets (including top read).
The widget top read fetch starts here - https://github.com/wikimedia/wikipedia-ios/blob/main/Widgets/Widgets/TopReadWidget.swift#L119, which will eventually make a call like https://en.wikipedia.org/api/rest_v1/feed/featured/2024/01/10, passing in the current local device date (the Date() parameter here is passing in the current date). We do lean on a basic file cache in these methods if it's available, but only if that cache was fetched on the current date. In the completion block here, you will see that we tell the widgets to next refresh a little after midnight if all went well; if something was off, we tell them to refresh in two hours.
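The refresh scheduling described above can be sketched like this. It is a Python illustration, not the widget's Swift code: the function name is hypothetical, and the one-minute margin is an assumed value for "a little after midnight".

```python
from datetime import datetime, timedelta

RETRY_DELAY = timedelta(hours=2)       # refresh again in two hours on failure
AFTER_MIDNIGHT = timedelta(minutes=1)  # assumption: "a little after midnight"


def next_refresh(now: datetime, fetch_succeeded: bool) -> datetime:
    """Pick the next widget refresh time from the outcome of the last fetch."""
    if not fetch_succeeded:
        return now + RETRY_DELAY
    # Midnight at the start of the next local day, in the same timezone as `now`.
    next_midnight = datetime.combine(
        now.date() + timedelta(days=1), datetime.min.time(), tzinfo=now.tzinfo
    )
    return next_midnight + AFTER_MIDNIGHT
```

Because the success path is keyed to local midnight, the first request of a new local day can land before the corresponding UTC day's data is available.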
There's also an interesting nugget in T355134 (possibly related) that says this has only been a problem since iOS 17.
I think the iOS-side could be a lot smarter about picking a widget refresh date that won't result in an invalid featured endpoint date (for top read) for all timezones. Untagging content transform.




