
Implement Content Service endpoint for availability of feed content by Wikipedia languages
Closed, ResolvedPublic

Description

Implement an endpoint determining the availability of feed endpoints (card types) for each Wikipedia language.

This makes it clear in the 'Customize feed' UI why some card types are not shown, depending on the Wikipedia language(s) set.

This endpoint essentially enables iOS (T186624) and Android (T190920) to show only the card types available for the selected languages in their customization screens.

Event Timeline

RHo created this task.Apr 9 2018, 10:59 AM
Restricted Application added a subscriber: Aklapper.Apr 9 2018, 10:59 AM
RHo updated the task description.Apr 17 2018, 5:22 PM
bearND added a subscriber: bearND.Apr 17 2018, 5:25 PM

Couldn't the apps just get the feed for the current day and see which items are available that way?

hi @bearND – the thinking is that when the user changes or adds an app language, the "Customize the feed" UI can show which cards are actually available for that language. However, I defer to @Fjalapeno @Dbrant to comment on whether the alternative you've proposed could be the way to go.

Mholloway renamed this task from Implement Content Service endpoint for available of feed content by Wikipedia languages to Implement Content Service endpoint for availability of feed content by Wikipedia languages.Apr 17 2018, 6:12 PM

Ok, during our weekly RI meeting @Fjalapeno brought up the concern about the most-read entry not being available at all times for the current day. If checking the previous day is not good enough, we could consider adding a new endpoint. RI would like some input from the app devs in what (output) form that would be most useful for the apps. Maybe something like a JSON object with properties for all available explore (aggregated + onthisday) feed items for the current language?

en.wikipedia.org/api/rest_v1/....

{
  "news": true,
  "tfa": true,
  "potd": true,
  "onthisday": true
}

or an array in a more generic but still site-specific endpoint:

{
  "availableFeeds": [
    "news",
    "potd",
    "tfa",
    "onthisday"
  ]
}

Another idea is to do one per endpoint type and return all language codes that support it.

{
  "news": [ "de", "en", "es", "fr" ]
}

As for the suggestions, either a per-project or an aggregate endpoint works for me.

As an additional option:
We can tweak @bearND's second suggestion without providing a new endpoint: instead, we can add the "availableFeeds" content as a dictionary in the existing feed endpoint. Thoughts?

A more general comment:

If checking the previous day is not good enough, we could consider adding a new endpoint. RI would like some input from the app devs in what (output) form that would be most useful for the apps.

@bearND I didn't mention this specifically yesterday, but my feeling here is that APIs should explicitly support features rather than relying on clients to use heuristics to provide those features.

More concretely, I would like to avoid clients needing to implement logic like: "Fetch data for the current day, if everything is there good, but if not also check the day before that may have content that is missing for the current day…"

Hope that clarifies my rationale here.
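
For illustration only, the client-side heuristic described above might look roughly like the sketch below. `fetch_feed` is a hypothetical callable standing in for a request to the aggregated feed endpoint, and the card keys are the ones used elsewhere in this thread; none of this is taken from the apps' actual code.

```python
from datetime import date, timedelta

# Card keys as named elsewhere in this thread.
EXPECTED_CARDS = {"tfa", "mostread", "news", "image", "onthisday"}


def infer_available_cards(fetch_feed, today):
    """Guess which cards a wiki offers by probing today's feed, then yesterday's.

    `fetch_feed(day)` is a hypothetical callable returning that day's feed as a
    dict keyed by card type; it is not a real client API.
    """
    found = EXPECTED_CARDS & set(fetch_feed(today))
    if found != EXPECTED_CARDS:
        # Something is missing (most-read may lag), so also check the day before.
        found |= EXPECTED_CARDS & set(fetch_feed(today - timedelta(days=1)))
    return found


# Canned data standing in for network responses.
feeds = {
    date(2018, 4, 18): {"tfa": {}, "image": {}},
    date(2018, 4, 17): {"tfa": {}, "image": {}, "mostread": {}},
}
print(sorted(infer_available_cards(lambda day: feeds.get(day, {}), date(2018, 4, 18))))
```

Even in this small form the weakness shows: a card absent on both days could mean "not supported for this language" or merely "not generated yet", and the client cannot tell which.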

LGoto triaged this task as High priority.Apr 18 2018, 8:17 PM
RHo added a subscriber: Mhurd.Apr 24 2018, 5:01 PM

hi @Mhurd - adding for your perspective from the iOS side regarding the above discussion.
Also + @Dbrant – if you've any opinions based on initial work on T190920.

Mholloway added a subscriber: Mholloway.EditedApr 25 2018, 3:24 PM

I like @bearND's suggestion of putting up a per-feature list of supported sites, maybe at something like api/rest_v1/feed/availability:

{
  "tfa": [ "bg", "bn", "bs", "cs", "de", "el", "en", "fa", "fr", "he", "hu", "ja", "la", "no", "sco", "ur", "vi" ],
  "mostread": [ "*" ],
  "image": [ "*" ],
  "news": [ "bs", "da", "de", "el", "en", "es", "fi", "fr", "he", "ko", "no", "pl", "pt", "ru", "sco", "sv", "vi" ],
  "onthisday": [ "ar", "bs", "de", "en", "es", "fr", "pt", "ru", "sv" ]
}
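
A client could consume a response of this shape roughly as follows. This is a minimal sketch, assuming the payload looks exactly like the example above and that "*" means "every language"; `cards_for_language` is a hypothetical helper name, not something from the apps.

```python
import json

# The example response from above, used here as canned input.
AVAILABILITY = json.loads("""
{
  "tfa": ["bg", "bn", "bs", "cs", "de", "el", "en", "fa", "fr", "he", "hu", "ja", "la", "no", "sco", "ur", "vi"],
  "mostread": ["*"],
  "image": ["*"],
  "news": ["bs", "da", "de", "el", "en", "es", "fi", "fr", "he", "ko", "no", "pl", "pt", "ru", "sco", "sv", "vi"],
  "onthisday": ["ar", "bs", "de", "en", "es", "fr", "pt", "ru", "sv"]
}
""")


def cards_for_language(availability, lang):
    """Return the card types offered for `lang`, treating "*" as a wildcard."""
    return sorted(
        card
        for card, langs in availability.items()
        if "*" in langs or lang in langs
    )


for lang in ("en", "ja", "ar"):
    print(lang, cards_for_language(AVAILABILITY, lang))
```

One request is enough to answer the question for any number of app languages, which is what the 'Customize feed' screen needs.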

@Fjalapeno I don't much like the idea of putting it in the existing feed endpoint, because that's a very un-performant endpoint with a processing-heavy response and I'd hate to increase traffic to it unnecessarily just to get this little snippet. Of course we'd hope that clients would be good citizens and cache this info, but I wouldn't want to rely on that.

I don't much like the idea of putting it in the existing feed endpoint, because that's a very un-performant endpoint with a processing-heavy response and I'd hate to increase traffic to it unnecessarily just to get this little snippet. Of course we'd hope that clients would be good citizens and cache this info, but I wouldn't want to rely on that.

I don't think that would make any difference for MCS due to Varnish caching and RESTBase storage.

Change 428955 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] [Proposal for discussion] New feed content availability endpoint

https://gerrit.wikimedia.org/r/428955

The per-feature list of languages as proposed by @Mholloway / @bearND will work perfectly well for our purposes.

Mhurd added a comment.EditedApr 26 2018, 9:45 PM

The only tweak I would consider to @Mholloway's suggestion would be to avoid acronyms, i.e. "tfa" -> "featuredarticle" and "potd" -> "pictureoftheday". Otherwise lgtm!

bearND added a project: Services.EditedMay 10 2018, 10:04 PM

+ Services for visibility. We'll need to expose this endpoint through RESTBase. The output is fairly static (same output for every WP project even). The only time this can change is during MCS deploy time. I can write the RB part. Let me know if you have any thoughts or concerns.

I'm wondering if it would be nicer to expose the actual JSON schema of the expected response and not invent a custom format?

We could expose it under something like /feed/featured/schema or even add a query parameter to the existing feed endpoint (this would be bad for caching, though).

Sounds interesting.

@Pchelolo could you add some example output or show what the output format would look like for this?

@bearND basically just the schema we already have, just adjusted per-domain:

feed:
  type: object
  description: Aggregated feed content for a given date
  properties:
    tfa:
      description: Data about the featured article of the day
      $ref: '#/definitions/summary'
    mostread:
      description: Data about most viewed articles
      $ref: '#/definitions/mostread'
    news:
      description: Data about in the news section
      $ref: '#/definitions/news'
    image:
      description: Featured image for a given date
      $ref: '#/definitions/image'
    onthisday:
      description: List of featured events that happened on this day
      $ref: '#/definitions/onthisday'
  additionalProperties: false

@Pchelolo Oh, I was thinking more of a JSON endpoint. Not sure if the apps want to also add libraries for parsing YAML.
So, were you thinking that the domains that don't support a featured feed card would be omitted in this output?

@bearND heh, I just copy-pasted this from the config; obviously it should be returned in JSON format.

So I was thinking that, say, for enwiki it will return all the properties. For wikis that don't support news, it will return the schema with everything but the news property. For domains with no feed support, it will just return an empty object.

The idea is to have exactly what you're providing, just in the standardized format of JSON Schema.
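
To make the per-domain schema idea concrete, here is a rough sketch of what a client would do with such a response. The JSON below is a hypothetical rendering of the schema for a wiki without news support (descriptions trimmed); the helper name is made up for illustration.

```python
import json

# Hypothetical JSON form of the schema for a wiki that lacks "news";
# descriptions are trimmed for brevity.
SCHEMA_WITHOUT_NEWS = json.loads("""
{
  "type": "object",
  "properties": {
    "tfa": {"$ref": "#/definitions/summary"},
    "mostread": {"$ref": "#/definitions/mostread"},
    "image": {"$ref": "#/definitions/image"},
    "onthisday": {"$ref": "#/definitions/onthisday"}
  },
  "additionalProperties": false
}
""")


def cards_from_schema(schema):
    """List the card types a per-domain schema declares; {} yields no cards."""
    return sorted(schema.get("properties", {}))


print(cards_from_schema(SCHEMA_WITHOUT_NEWS))
print(cards_from_schema({}))  # a domain with no feed support
```

The parsing itself is small; the cost of this approach is that the answer covers only the domain that was queried.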

@Pchelolo I see the appeal of that approach but it seems like it would be less convenient for clients based on what I understand their use case to be; but I could be wrong about that. @Dbrant / @Mhurd ?

Yeah, it might indeed be more complex to parse. Just throwing out ideas; feel free to discard them.

Fjalapeno added a comment.EditedMay 14 2018, 6:57 PM

@Pchelolo do you have any other objections or were you just looking to reuse schemas for consistency?

FWIW: The clients would like this to be a single API so they don't have to check every language (instead it is a single API call that gives them everything they need).

Having said that, is it OK to go with @Mholloway / @bearND's proposal, or do you have other tweaks in mind given that info?

@Pchelolo do you have any other objections or were you just looking to reuse schemas for consistency?

Just throwing ideas around, I don't have any strong objections to implementing a custom format if that's easier and more convenient for the clients. Just wanted us to consider reusing a standard format.

I would be +1 on reusing the format proposed by @Pchelolo as it's something we are already using for the spec and the format is well-documented elsewhere, so it's a more future-proof and long-term solution IMO. Clients will have to parse whatever we serve them, so I don't think there will be increased complexity here.

@mobrovac Links to the docs and/or concrete examples of the actual format would be good to have.

It really would be preferable for clients to receive the data in the kind of simple JSON list format specified earlier, since it would fit much more readily into our existing parsing logic.

While I appreciate the desirability of a standard output format in general I'll cast my vote in favor of giving the clients exactly what they want here. From the service side, serving this up as I've written the patch is as close to a freebie as we ever get, as far as I can see.

@Mholloway That works for me. All that's left before we can merge your patch is a minor update to it.

While I appreciate the desirability of a standard output format in general I'll cast my vote in favor of giving the clients exactly what they want here. From the service side, serving this up as I've written the patch is as close to a freebie as we ever get, as far as I can see.

I strongly disagree. Not with this particular case but with the attitude we're getting into here. The API should be generic. If we make APIs for particular clients' convenience, we basically prohibit all the other uses of the API for the rest of the possible clients.

The summary API started as a single use case and now it's powering numerous use cases. Writing 5 more lines of code in the apps is way easier than rebuilding the API.

I'm sorry, @Mholloway; if we had a power of veto in the WMF, this time I would use it.

I strongly disagree. Not with this particular case but with the attitude we're getting into here.

Point taken. What sways me in this case is that what I'm proposing gives clients the info they need for their use case in one call, vs. ${total # of active wikis} API calls, as the counterproposal would require, if I understand it correctly.
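
The trade-off can be sketched as follows, under the assumption that a per-domain response boils down to a list of card types for that domain. With per-domain responses, the client has to collect one result per wiki and fold them together itself to get the aggregate view; `aggregate` and the sample data are hypothetical.

```python
def aggregate(per_site):
    """Fold {domain: [card, ...]} into {card: [domain, ...]}."""
    per_card = {}
    for domain in sorted(per_site):
        for card in per_site[domain]:
            per_card.setdefault(card, []).append(domain)
    return per_card


# Made-up results of three separate per-domain requests.
per_site = {
    "en.wikipedia.org": ["news", "tfa"],
    "ar.wikipedia.org": ["onthisday"],
    "de.wikipedia.org": ["news"],
}
print(len(per_site), "requests needed to build:", aggregate(per_site))
```

An aggregate endpoint returns the folded structure directly, so the request count stays at one no matter how many wikis exist.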

What sways me in this case is that what I'm proposing gives clients the info they need for their use case in one call, vs. ${total # of active wikis} API calls, as the counterproposal would require, if I understand it correctly.

Hm, looking more into why we're doing that, point taken. I back off :) +1 to your proposal.

Feels a bit weird to request info about all the domains from a specific domain though, perhaps put it under global domain https://wikimedia.org/api/rest_v1/ ?

Feels a bit weird to request info about all the domains from a specific domain though, perhaps put it under global domain https://wikimedia.org/api/rest_v1/ ?

That makes sense to me, but how about 'wikipedia.org' instead, since currently (and for the foreseeable future) the feed content this refers to is only available for the Wikipedias?

That makes sense to me, but how about 'wikipedia.org' instead, since currently (and for the foreseeable future) the feed content this refers to is only available for the Wikipedias?

heh.. we don't have restbase on wikipedia.org

Ha, wikimedia.org it is!

@Pchelolo @Mholloway I would prefer to expose it on all WPs since then clients wouldn't need to contact a new domain and would avoid an additional TLS handshake. Or are the apps already contacting wikimedia.org for another reason? I believe the Android app contacts meta.wikimedia.org. Maybe that one would be better?

@Pchelolo @Mholloway I would prefer to expose it on all WPs since then clients wouldn't need to contact a new domain and would avoid an additional TLS handshake. Or are the apps already contacting wikimedia.org for another reason? I believe the Android app contacts meta.wikimedia.org. Maybe that one would be better?

The certificate contains all of our domains AFAIK, so the handshake wouldn't need to happen with HTTP/2. Furthermore, having it served from wm.org improves caching.

Change 428955 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Feed content availability endpoint

https://gerrit.wikimedia.org/r/428955

Change 437469 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Availability: Make projects the top level of the response

https://gerrit.wikimedia.org/r/437469

Change 437469 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Availability: Provide full project domains for supported projects

https://gerrit.wikimedia.org/r/437469

Per discussion on the PR, the output is updated to provide full domains rather than language codes only:

{
  "foo": [
    "ar.wikipedia.org",
    "en.wikipedia.org"
  ]
}

This is to improve consistency with other REST API responses and to allow for the possibility of feed content being generated for other projects in the future.
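
With full domains in the response, a client might map its configured app languages onto card types along these lines. This is only a sketch: deriving the domain as `<lang>.wikipedia.org` is a simplifying assumption, the "*" check is kept defensively because the thread does not say whether the wildcard from the earlier proposal survives in this format, and `languages_per_card` is a hypothetical name.

```python
def languages_per_card(availability, app_languages):
    """Map each card type to the subset of `app_languages` that supports it."""
    result = {}
    for card, domains in availability.items():
        result[card] = [
            lang
            for lang in app_languages
            # Assumption: a language code maps to "<lang>.wikipedia.org".
            if "*" in domains or f"{lang}.wikipedia.org" in domains
        ]
    return result


# The example output from above, with its placeholder key "foo".
availability = {"foo": ["ar.wikipedia.org", "en.wikipedia.org"]}
print(languages_per_card(availability, ["en", "fr"]))
```

A card whose list comes back empty is one the customization screen can hide or explain for the current language selection.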