Page MenuHomePhabricator

Make public "Anniversaries" endpoint
Closed, ResolvedPublic

Description

After implementing:
T143408

  • Add hydration
  • Expose public endpoints

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptAug 19 2016, 2:50 PM
JMinor added a subscriber: JMinor.Aug 19 2016, 4:55 PM
Fjalapeno renamed this task from Incorporate "On This Day" endpoint into aggregate feed endpoint to Make public "On This Day" endpoint.Jan 19 2017, 7:21 PM
Fjalapeno updated the task description. (Show Details)
Fjalapeno renamed this task from Make public "On This Day" endpoint to Make public "Anniversaries" endpoint.Jan 23 2017, 7:58 PM

In preparation I've made a PR abstracting out the feed logic to make adding the new feed for anniversaries easier: https://github.com/wikimedia/restbase/pull/750

@Pchelolo awesome - just saw this - thanks!

@Fjalapeno From the discussions we've had about this, the public API should look as follows:

/feed/anniversaries/{yyyy}/{mm}/{dd} - returns ALL anniversaries for the selected date. Stored in RESTBase similar to feed content.
/feed/featured/{yyyy}/{mm}/{dd} - add the SELECTED (featured) anniversaries to the response under an anniversaries field in response.

@Pchelolo a couple slight tweaks to that:

  • We have multiple individual paths for components of the response
  • No year component

'births' on given day:

/feed/anniversaries/births/05/22

'deaths' on given day:

/feed/anniversaries/deaths/05/22

'anniversaries' on given day:

/feed/anniversaries/events/05/22

'selected' on given day:

/feed/anniversaries/selected/05/22

'holidays' on given day:

/feed/anniversaries/holidays/05/22

'all' on given day (includes everything above):

/feed/anniversaries/all/05/22

Does that look ok to you?

If so, looks like we need to do 2 things yet:

  1. Add the feed component to the route
  2. Change the name form "onthisday" to "anniversaries"

@Fjalapeno

  1. From what I've understood the all anniversaries endpoint would be used in the app after the user clicks a 'more' button on the selected anniversaries card. How do you expect the other individual types of anniversaries be used? Should we expose individual anniversary types at all?
  1. What do you think about leaving the {yyyy} component in there? Although it doesn't make much sense here, it will provide nice commonality with the feed endpoint and potentially other feeds that we could add in the future. Does it make sense to special-case this?

Hm, there are also births/deaths/events per year too, e.g. 2016. In the same vein, there are also per-month pages, e.g. deaths in March 2016. Is this something we should consider in this discussion as well? Or at least keep the room for it?

@Fjalapeno

  1. From what I've understood the all anniversaries endpoint would be used in the app after the user clicks a 'more' button on the selected anniversaries card. How do you expect the other individual types of anniversaries be used? Should we expose individual anniversary types at all?

I think it would be interesting to expose that data separately as well, since we are able to obtain it. If it's not requested, then no harm no foul. On the other hand, it is giving us the flexibility for future uses.

  1. What do you think about leaving the {yyyy} component in there? Although it doesn't make much sense here, it will provide nice commonality with the feed endpoint and potentially other feeds that we could add in the future. Does it make sense to special-case this?

I'm not sure this makes sense. While there is an appeal in keeping it consistent with the feed hierarchy, I don't see the use in it. Is this something we would really want to be able to retrieve as-was? Is there value in that?

Pchelolo added a comment.EditedFeb 8 2017, 10:27 PM

I think it would be interesting to expose that data separately as well, since we are able to obtain it. If it's not requested, then no harm no foul. On the other hand, it is giving us the flexibility for future uses.

Ok, agreed, no harm in doing that.

Hm, there are also births/deaths/events per year too, e.g. 2016. In the same vein, there are also per-month pages, e.g. deaths in March 2016. Is this something we should consider in this discussion as well? Or at least keep the room for it?

I'm not sure this makes sense. While there is an appeal in keeping it consistent with the feed hierarchy, I don't see the use in it. Is this something we would really want to be able to retrieve as-was? Is there value in that?

That's what I'm concerned about - if we don't have the {yyyy} parameter, we'd never be able to add something like this, and who knows where the feature would be going in the future. It seems for the client supplying the {yyyy} is not hard at all, while it's keeping the API more flexible for potential future use-cases.

For example I could imaging having a parameter that would supply only stuff that happened on a particular day of the particular year.

That's what I'm concerned about - if we don't have the {yyyy} parameter, we'd never be able to add something like this, and who knows where the feature would be going in the future. It seems for the client supplying the {yyyy} is not hard at all, while it's keeping the API more flexible for potential future use-cases.
For example I could imaging having a parameter that would supply only stuff that happened on a particular day of the particular year.

Keeping the year would allow us to have something like /feed/anniversaries/deaths/{yyyy}{/mm}{/dd} which would allow the client to select what type of anniversaries they want. I like this. But then, how would they say I'm just interested in anniversaries for March, 5th ? I guess forcing them to supply the current year is not that big of a deal, but I'm not entirely sure because then clients wouldn't be able to always use the same URI for that particular case.

Ye, I'm on the edge here as well.

If we have /feed/anniversaries/{type}/{yyyy}/{mm}/{dd} it's not very clear that the anniversaries would be provided for all years before yyyy and not just for that particular year. But if we omit the {yyyy} parameter, we wouldn't be able to add types that return content just for that year.

Let's see what @Fjalapeno have to say, but after thinking a bit more I'm leaning towards omitting the yyyy

Mhurd added a subscriber: Mhurd.EditedFeb 9 2017, 1:33 AM

Hey all, thought I'd add some notes...

This endpoint was designed to provide information about specific days from day pages - hence the original name "onthisday". i.e. if you want to know who died on Feb 8 you could use /onthisday/deaths/02/08 and it would give you a list of deaths, by year, which occurred on Feb 8, or you could ask it /onthisday/holidays/12/03 and it would give you a list of Dec 3 holidays. Note that holiday results are not "by year" because they recur yearly by definition - also you run into issues trying to associate years with holidays because if you go far enough in the past there's a point where every holiday didn't exist yet.

The term "anniversaries" does not make this "onthisday" intent clear and may lead to more confusion about a yyyy use case, which, while interesting, is different than the use case this endpoint was designed for.

From what I've understood the all anniversaries endpoint would be used in the app after the user clicks a 'more' button on the selected anniversaries card. How do you expect the other individual types of anniversaries be used? Should we expose individual anniversary types at all?

"all" returns a very large amount of information - all of the data from each of the atomic endpoints (births, deaths, holidays, selected and events) see https://en.wikipedia.org/wiki/January_20, for example - if you return all of those titles hydrated it's going to be a pretty big payload.

iirc the apps use case is actually to use *only* the individual types - i.e. the user can pick a "holidays" or "deaths" or "events" section to appear in their feed or as a widget. Respective widgets would show the first few most recent respective items, potentially with a "more" button to see the rest. My personal preference is to not have "all" at all and just have the atomic individual types - but that's just my preference to keep endpoints tightly scoped, atomic and specific.


Oh, side question - why are these to be grouped under feed? Seems to presume a use case...

Thank you @Mhurd. I think after your explanation we'd all agree not to include yyyy parameter. In case we ever want to implement the functionality to return stuff that happened exactly on the day of the particular year we can add one more endpoint for that.

My personal preference is to not have "all" at all and just have the atomic individual types - but that's just my preference to keep endpoints tightly scoped, atomic and specific.

The endpoint would look like /feed/anniversaries/{type}/{mm}/{dd}, so exporting all is not a big deal for us.

Oh, side question - why are these to be grouped under feed? Seems to presume a use case...

We don't have any other top-level group to put it into and I don't think we should create one for this endpoint.

Now one more question comes into mind. So we've decided that we wouldn't have the yyyy parameter. The content will be stored in RESTBase for performance reasons, but right now the feed stores 'historic' content forever. So after one year, all the content will be stored indefinitely. And the births/deaths keep happening, pages get edited, more articles and more anniversaries keep being added or removed. So we can't use the same storage concept as we've used in the featured feed endpoint.

I propose to store the results in RESTBase for X amount of time and not have a permanent storage for this endpoint at all. Since we can't actively purge this, we have to decide what X is gonna be - 1 day? 1 week? 1 month? The lower the TTL here is the faster new anniversaries will show up and the higher the load on MCS would be and the higher the latency would be. I think 1-2 days is OK.

Mhurd added a comment.Feb 9 2017, 1:54 AM

@Pchelolo

In case we ever want to implement the functionality to return stuff that happened exactly on the day of the particular year we can add one more endpoint for that.

Makes sense. If we find demand for that I'd be interested in hacking at it :)

I think 1-2 days is OK.

Agreed.

I still don't prefer anniversaries over onthisday but I can live with it if everyone else likes it ;)

Mhurd added a comment.Feb 9 2017, 2:11 AM

Background on my preference for onthisday:

My main issue with anniversaries is it by definition creates a mental association with year (when it should be with day):

wiktionary
From the Latin adjective anniversarius ‎(“returning yearly”), from vertere ‎(“to turn”), + annus ‎(“year”).

Only some of the items on days pages ( https://en.wikipedia.org/wiki/January_20 ) have year associations (see my comment above about holidays), but they all have association with a day. The editor's naming of the pages by day reenforces this and is what got me thinking about how the endpoint could be named to reflect the editor's existing organization "scheme".

Thank you @Mhurd. I think after your explanation we'd all agree not to include yyyy parameter. In case we ever want to implement the functionality to return stuff that happened exactly on the day of the particular year we can add one more endpoint for that.

I think you guys already figured out the yyyy stuff, but just to toss this in:

Yes the end point was designed to tell you all the interesting events that occurred on a particular day of the year (e.g Jan 20, July 12) that reflects the content of the "day" pages ( https://en.wikipedia.org/wiki/January_20 ) that it is extracted from.

It wasn't intended to be able to query events for a particular day in history (Tell me who died on July 12, 1962), though that is an interesting use case.

I would be fine with keeping things as is, but if we wanted to support that, we could take a hint form the page view API and do something like:

/feed/events/byday/{type}/{mm}/{dd}

/feed/events/byyear/{type}/{yyyy}

/feed/events/bydate/{type}/{yyyy}/{mm}/{dd}

In this way it can be expanded to encompass new time slices.

But in this case we need to change the endpoint name totally to support different use cases. "Events" is about the best I can think of as its pretty generic.

I propose to store the results in RESTBase for X amount of time and not have a permanent storage for this endpoint at all. Since we can't actively purge this, we have to decide what X is gonna be - 1 day? 1 week? 1 month? The lower the TTL here is the faster new anniversaries will show up and the higher the load on MCS would be and the higher the latency would be. I think 1-2 days is OK.

Now one more question comes into mind. So we've decided that we wouldn't have the yyyy parameter. The content will be stored in RESTBase for performance reasons, but right now the feed stores 'historic' content forever. So after one year, all the content will be stored indefinitely. And the births/deaths keep happening, pages get edited, more articles and more anniversaries keep being added or removed. So we can't use the same storage concept as we've used in the featured feed endpoint.

1-2 days is fine, but a couple questions:

  1. Why can't we purge when we get the change prop event for the page being edited? Is this purely because of it being under the "feed" route?
  2. If it is because of being under feed, is this reason enough to reconsider putting it there?
  3. What about vandalism - I would think these pages are most vulnerable to vandalism close to the day that they would be shown in the app, if we don't purge on edits, that could be an issue.

Why can't we purge when we get the change prop event for the page being edited? Is this purely because of it being under the "feed" route?

Because we don't know which pages to track. We'd need to maintain a list telling us that 'if a page Jan_10 is changed also purge content from anniversaries endpoint'. We don't support that.

What about vandalism - I would think these pages are most vulnerable to vandalism close to the day that they would be shown in the app, if we don't purge on edits, that could be an issue.

We can do < 1 day. 2 hours?

2 hours is good 👍

Need to work out the name of the routes with @bearND based on if we intent to ever include routes for dates, months, and years

Created the primary version of the PR here: https://github.com/wikimedia/restbase/pull/761

Still needs some tweaking though

bearND added a comment.Feb 9 2017, 8:33 PM

/feed/events/byday/{type}/{mm}/{dd} sounds fine with me.

mobrovac triaged this task as Normal priority.Feb 9 2017, 8:50 PM

/feed/events/byday/{type}/{mm}/{dd} is a bit unclear to me. What events? "Event" is like a super-broad concept, it could mean almost anything. In the API it's even more confusing, because in computer science 'event' have a pretty specific meaning.

/feed/events/byday/{type}/{mm}/{dd} is a bit unclear to me. What events? "Event" is like a super-broad concept, it could mean almost anything. In the API it's even more confusing, because in computer science 'event' have a pretty specific meaning.

Agreed. While in the context of WP event is pretty self-explanatory, in the context of the API one might easily confuse that with the feed of events happening in the system, i.e. associate that endpoint with EventStreams.

Adding my 2¢: I agree with @Pchelolo and @mobrovac that "event" is a bit too broad. I personally associate anniversary with a date repeating after some years & would be fine with using it. "Onthisday" wfm as well.

I've been trying to look something up in the dictionary, but didn't really find anything good. I'm ok with onthisday or anniversaries

Hm, but if we plan to extend it to include specific years and/or months as we discussed today in the meeting, neither of these really work :(

Hm, but if we plan to extend it to include specific years and/or months as we discussed today in the meeting, neither of these really work :(

Actually 'anniversaries' will work in this case, if we add a separate one for specific date and call it 'onthisday'. And we will have to add a separate one, because we're not including the yyyy parameter in this one.

Sounds like feed/onthisday/{type}/{mm}/{dd} for this endpoint then, right?

Sounds like feed/onthisday/{type}/{mm}/{dd} for this endpoint then, right?

I'm ambivalent… @mobrovac @Pchelolo ?

Personally, I find anniversaries more elegant, but onthisday is future-proof, as we could have /feed/onthisdate/ later (even though the similarity in names may lead to confusion). TL;DR let's be future-proof rather than elegant, IMHO.

I don't really care too much, let's just get to a conclusion and deploy this. Looks like we are all leaning to onthisday, so I will agree with that. 3. 2. 1. Sold?

Yes, I'll make the change in MCS to add the /feed/ layer shortly, so I can deploy this afternoon.

Cool. I will update the RESTBase PR after @bearND makes changes to MCS and we can deploy RB portion today/tomorrow

Change 337436 had a related patch set uploaded (by BearND):
On this day: move routes under /feed

https://gerrit.wikimedia.org/r/337436

Change 337436 merged by jenkins-bot:
On this day: move routes under /feed

https://gerrit.wikimedia.org/r/337436

Change 337514 had a related patch set uploaded (by BearND):
Hygiene: On this day: move lib file under feed

https://gerrit.wikimedia.org/r/337514

Change 337514 merged by Ppchelko:
Hygiene: On this day: move lib file under feed

https://gerrit.wikimedia.org/r/337514

Pchelolo closed this task as Resolved.Feb 13 2017, 11:55 PM

Deployment done: https://en.wikipedia.org/api/rest_v1/#!/Feed/aggregatedFeed_0
The endpoint is completely operational!