Have Restbase invalidate MCS changed content (primarily to support quickly updating vandalized content)
Closed, Resolved (Public)

Description

Content displayed in the feed of the iOS and Android apps can sometimes be vandalized.

This is problematic if anything in the feed (featured article, picture of the day, most read) is downloaded and cached by the clients.

When this happens, it is common for the page to be quickly fixed by community members. When an article is updated, it would be expected that any cached content on the server (like a summary of an article) would be invalidated and removed from the cache.

It would be expected that tags for updated content would be different so that clients would know that the content was updated and can invalidate their local caches as well.
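
As a rough illustration of the client-side behaviour described above, here is a minimal sketch. It assumes that "tags" means ETag-like version identifiers returned with each response; the `ClientCache` class and its method names are hypothetical and are not taken from the apps or from RESTBase.

```python
from typing import Any, Dict, Tuple


class ClientCache:
    """Hypothetical local cache keyed by article title (illustration only)."""

    def __init__(self) -> None:
        # title -> (tag the content was fetched with, cached content)
        self._entries: Dict[str, Tuple[str, Any]] = {}

    def store(self, title: str, tag: str, content: Any) -> None:
        """Remember the content together with the tag the server sent."""
        self._entries[title] = (tag, content)

    def is_stale(self, title: str, server_tag: str) -> bool:
        """True if nothing is cached or the server now reports a different tag."""
        entry = self._entries.get(title)
        return entry is None or entry[0] != server_tag


cache = ClientCache()
cache.store("Todd_Manning", "tag-1", {"extract": "(placeholder summary)"})

# Same tag: the local copy can be reused. Different tag: refetch and replace.
unchanged = cache.is_stale("Todd_Manning", "tag-1")
changed = cache.is_stale("Todd_Manning", "tag-2")
```

The point of the sketch is only that a changed tag is the client's signal to drop its copy; how the tag is computed on the server is outside its scope.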

This may become more of an issue once T143912 is implemented, since content that is updated infrequently will likely have a longer cache time.

Event Timeline

One idea for how we could achieve that:

  1. MCS will not fetch summaries of the articles on its own, but instead emit only article titles, so the whole feed result would look something like this:
tfa:
  items:
    - title: Todd_Manning
random:
  items:
    - title: Delaware_Route_37
mostread:
  date: "2016-08-24Z"
  items:
    - title: Spy_Hard_(song)
      views: 446191
      rank: 3
news:
   items:
     - story: "" # I don't fully understand what's that and where's the content coming from, could you clarify?
       items:
          - title: 2016_Central_Italy_earthquake
     - Another one
image: # unchanged. But I don't understand the 'image' property we have there.
  2. RESTBase would store this info with a larger TTL (10-30 minutes?)
  3. On every request for the feed, RESTBase would fetch this information and fill in all the titles with summaries from RESTBase storage
  4. Cache the response in Varnish for a shorter period (on the order of several seconds, because of the random part)
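
The fill-in step of this proposal could look roughly like the sketch below. It is not RESTBase code (RESTBase is not written in Python); `hydrate`, `lookup_summary`, and the in-memory `SUMMARIES` dict are hypothetical stand-ins for the stored summaries, and the summary text is a placeholder.

```python
from typing import Any, Callable, Dict

Summary = Dict[str, Any]


def hydrate(node: Any, get_summary: Callable[[str], Summary]) -> Any:
    """Return a copy of the titles-only feed with summaries merged into
    every mapping that carries a 'title' key."""
    if isinstance(node, list):
        return [hydrate(child, get_summary) for child in node]
    if isinstance(node, dict):
        filled = {key: hydrate(value, get_summary) for key, value in node.items()}
        if "title" in filled:
            # Summary fields are added; feed-specific fields such as
            # 'views' and 'rank' take precedence on a key conflict.
            filled = {**get_summary(filled["title"]), **filled}
        return filled
    return node


# Hypothetical stand-in for summary storage (placeholder content).
SUMMARIES: Dict[str, Summary] = {
    "Todd_Manning": {"extract": "(placeholder summary A)"},
    "Spy_Hard_(song)": {"extract": "(placeholder summary B)"},
}


def lookup_summary(title: str) -> Summary:
    return SUMMARIES.get(title, {})


stored_feed = {
    "tfa": {"items": [{"title": "Todd_Manning"}]},
    "mostread": {
        "date": "2016-08-24Z",
        "items": [{"title": "Spy_Hard_(song)", "views": 446191, "rank": 3}],
    },
}

response = hydrate(stored_feed, lookup_summary)
```

Because the stored feed holds only titles, a re-rendered summary shows up in the next hydrated response without touching the stored feed itself.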

With this schema the MCS code would be simplified (no need to fetch summaries any more), and the content in the summaries would always be fresh, since we are re-rendering them via ChangeProp. I also think we would get some performance gain from this, since fetching a bunch of summaries from RB storage is definitely faster than regenerating the feed in MCS.

One caveat: you have the Wikidata description in the summaries while we don't, but that's going to be fixed soon with T128894. What do you think?

That's the only viable option, I think, because to actively purge the feed endpoint we need to know which articles are used there, so we'd need some kind of dependency tracking, which we don't have.

@Pchelolo I'm fine with the strategy to just store the titles in the aggregated feed endpoint vs. the whole summary data.
What I really want to get to though is to store the contents of the microservices separately since they have different lifetimes. It would be great if the aggregation would be done inside RESTBase then.
Separating the most-read endpoint in particular could benefit the cluster by reducing the load incurred by recalculating this big payload, which doesn't change more than once a day. It would also reduce load on the PageView API. Since the PageView API is in RB, it should be simple to set up ChangeProp for this. I'd also like to see this for the other feed microservices. What do you think?

What I really want to get to though is to store the contents of the microservices separately since they have different lifetimes. It would be great if the aggregation would be done inside RESTBase then.

This is a completely separate question tracked in T143912 I think, but I agree that it's the right direction. Not necessarily store the content in separate buckets though, but update it separately definitely.

@Pchelolo How do you update it separately if the things are in the same bucket?

@bearND we could update just one property and store the other stuff unchanged. Although we'd have to fetch the full feed to get only a portion of it for the individual endpoints, the payload is small, so it should be OK.
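
A minimal sketch of what "update just one property" could mean, assuming the stored feed is a single mapping with one key per feed part. The helper names `update_section` and `read_section` are hypothetical, and the new most-read values are made up for illustration.

```python
from typing import Any, Dict

Feed = Dict[str, Any]


def update_section(stored: Feed, section: str, new_content: Any) -> Feed:
    """Return a new feed in which only `section` is replaced; every other
    section is carried over unchanged."""
    updated = dict(stored)  # shallow copy, other sections are reused as-is
    updated[section] = new_content
    return updated


def read_section(stored: Feed, section: str) -> Any:
    """Serve an individual endpoint by loading the full feed and returning
    just one part of it."""
    return stored[section]


stored_feed: Feed = {
    "tfa": {"items": [{"title": "Todd_Manning"}]},
    "mostread": {"date": "2016-08-24Z", "items": []},
}

# Refresh only the most-read part (illustrative values).
refreshed = update_section(
    stored_feed, "mostread", {"date": "2016-08-25Z", "items": []}
)
tfa_only = read_section(refreshed, "tfa")
```

The trade-off is the one noted in the comment: individual endpoints read the whole stored object to return a slice of it, which is acceptable while the titles-only payload stays small.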

@Fjalapeno Nono, the overall output will be exactly the same as it is right now, the question is just about what is stored in RB

@Fjalapeno Nono, the overall output will be exactly the same as it is right now, the question is just about what is stored in RB

Got it… thanks for the clarification… removing my comment so it's not confusing

Question: for all the rest of the articles MCS emits 2 properties:

  • title - an article title in DB key format (with underscores)
  • normalisedtitle - an article title in human-readable format - with spaces.

while summary endpoint uses title with human-readable article title format. Should we update summary endpoint to also emit both to match feed?

while summary endpoint uses title with human-readable article title format. Should we update summary endpoint to also emit both to match feed?

That would be a breaking change, as the parameter would effectively be renamed from title to normalisedtitle. Or do you mean just inside the feed output?

I mean we need to try to make it work the same inside feed and inside summary, so I wanna understand how it's used in the app and whether we can change that?

title is used for fetching the article from RESTBase, while normalisedtitle is used for GUI display. Since there's already an Android app version that conforms to that, I highly doubt we can, but perhaps the Android folks have some ideas.

Deriving the human readable title from the regular normalizedDBKey (with underscores) is a matter of s/_/ /g. Given the ease of doing this replacement, a separate attribute might not be warranted.
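
In code, the replacement amounts to a one-liner. The function name `display_title` is hypothetical, and the sketch covers only the underscore-to-space direction mentioned here.

```python
def display_title(db_key: str) -> str:
    """Convert a DB-key title (underscores) to its display form (spaces),
    the equivalent of s/_/ /g."""
    return db_key.replace("_", " ")


shown = display_title("Delaware_Route_37")
```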

The PR for this has been merged, so after it's deployed MCS can remove summary fetching and return only titles. I will signal when it can be done (when RB is deployed) here, but for now transferring this ticket to MCS team.

What should be returned: exactly the same thing as before, but the summaries shouldn't be substituted. Only title/normalizedtitle and various content-specific values, like 'rank' for most_read content.

Pchelolo claimed this task.

The follow-up work on the MCS side is tracked by T146041; nothing more is needed from RESTBase here, so resolving.