Move feed assembly from RESTBase to Wikifeeds
Closed, ResolvedPublic

Description

Background

The feed is really expensive to build on the fly, so for a while we tried to store the results in RESTBase, until we realized there were just too many problems with that: the daily updates are unpredictable, and there's no way to track or observe them. Back when the feed metadata was stored, we wanted to provide the newest summary content for feed items, so the feed was enriched on the fly with summary data. Since the feed was stored in RESTBase, the code fetching summaries now lives in RESTBase too.

Status quo

After we gave up on storing feed metadata in RESTBase, the assembly code remained, so now we have a very strange pattern: wikifeeds provides RESTBase URLs, and RESTBase calls itself to enrich the response with the summaries. This makes no sense.

Solution

We need to port the feed enriching code from RESTBase to Wikifeeds. The code is quite generic, residing mostly in here. The code can't be removed from RESTBase right away, since it's used elsewhere (with a less crazy pattern: there the URLs are assembled in RESTBase too).
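For readers unfamiliar with the enrichment step, here is a minimal sketch of the idea, not the actual RESTBase or Wikifeeds code. It assumes feed items carry a `$merge` list of summary URLs (the `$merge` key is the one named in the related patches), and that fetched fields only fill in keys the item does not already have. The `hydrate` function, the `fetch_summary` callback, and the sample titles are all hypothetical.

```python
from typing import Any, Callable, Optional

Summary = dict[str, Any]


def hydrate(node: Any, fetch_summary: Callable[[str], Optional[Summary]]) -> Any:
    """Recursively replace `$merge` URL lists with the fetched summary fields."""
    if isinstance(node, list):
        return [hydrate(item, fetch_summary) for item in node]
    if not isinstance(node, dict):
        return node
    # Copy everything except the `$merge` marker itself.
    result = {k: hydrate(v, fetch_summary) for k, v in node.items() if k != "$merge"}
    for url in node.get("$merge", []):
        summary = fetch_summary(url)
        if summary is None:
            continue  # missing summary: leave the item as it is
        for key, value in summary.items():
            # Assumption: fields already on the item win over fetched ones.
            result.setdefault(key, value)
    return result


# Stand-in for an HTTP client: summaries keyed by URL (illustrative data only).
BASE = "https://en.wikipedia.org/api/rest_v1/page/summary/"
FAKE_SUMMARIES = {
    BASE + "Foo": {"title": "Foo", "extract": "Foo is ..."},
    BASE + "Bar": {"title": "Bar", "extract": "Bar is ..."},
}

feed = {
    "tfa": {"$merge": [BASE + "Foo"]},
    "mostread": {"articles": [{"views": 10, "$merge": [BASE + "Bar"]}]},
}

hydrated = hydrate(feed, FAKE_SUMMARIES.get)
```

Doing this inside Wikifeeds means the service returns complete items and RESTBase no longer has to call itself to resolve the URLs.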

Making wikifeeds return the full feed response will have the following benefits:

  • This is the first step in the RESTBase sunset project
  • Makes the request flow much more sane
  • We can point api-gateway directly to the feeds service and stop proxying it via RESTBase
  • RESTBase becomes a simple no-op proxy for feeds, kept just for backwards compatibility. Requests via RESTBase can be deprecated and traffic slowly migrated to api-gateway. The feed is already exposed there.

Event Timeline

Pchelolo created this task.
Pchelolo updated the task description.

One other reason feeds were proxied via RESTBase was that we didn't request the whole aggregated feed in a single request; instead, we fanned out multiple requests for portions of the feed and then assembled it all together. This was done because an aggregated feed request was too heavy for a single wikifeeds worker and frequently timed out.

We can have the same pattern in wikifeeds, just calling localhost, but I think we should try to build the aggregated feed in a single worker as a first step and see how it behaves.

Implementing fan-out in wikifeeds without going via HTTP would require exposing some worker API in service-runner, which we do not currently have.
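As a rough illustration of the single-worker variant, the sketch below builds every portion of the feed concurrently in one process and assembles whatever finishes in time. It is an assumption-laden sketch rather than the Wikifeeds implementation: `build_aggregate`, the part names, and the choice to silently drop a part that fails or times out are all hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable

PartBuilder = Callable[[], Awaitable[Any]]


async def build_aggregate(parts: dict[str, PartBuilder], timeout: float = 5.0) -> dict[str, Any]:
    """Build all feed parts concurrently and assemble the ones that succeed."""

    async def run(name: str, builder: PartBuilder) -> tuple[str, Any]:
        try:
            return name, await asyncio.wait_for(builder(), timeout)
        except Exception:
            # Assumption: a failed or timed-out part is omitted, not fatal.
            return name, None

    results = await asyncio.gather(*(run(name, b) for name, b in parts.items()))
    return {name: value for name, value in results if value is not None}


# Illustrative part builders; real ones would call the MediaWiki APIs.
async def featured_article() -> dict[str, Any]:
    return {"title": "Foo"}


async def most_read() -> dict[str, Any]:
    return {"articles": []}


async def on_this_day() -> dict[str, Any]:
    await asyncio.sleep(1)  # simulates a part that is too slow
    return {}


aggregate = asyncio.run(
    build_aggregate(
        {"tfa": featured_article, "mostread": most_read, "onthisday": on_this_day},
        timeout=0.05,
    )
)
```

The per-part timeout is the piece that matters for the concern above: one slow portion cannot make the whole aggregated request time out.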

An interesting disparity between the RB feed and the wikifeeds feed is that in RB we expose the aggregate feed under /feed/featured, while in wikifeeds /feed/featured is just for the featured article component of the feed.

Plan:

  • Add /feed/aggregate to wikifeeds, expose that in api-gateway
  • Make RESTBase use /feed/aggregate as a backend for /feed/featured, deprecate accessing the feed via RESTBase
  • Cleanup: at this point wikifeeds will not need to support the ?aggregate query parameter anymore, so drop it.
  • Migrate clients from {domain}/rest_v1/feed/featured to api.wikimedia.org/{project}/{lang}/v1/feed/aggregate
  • Remove feeds from RESTBase entirely.
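To make the client migration step concrete, here is a small sketch of rewriting an old RESTBase feed URL into the api-gateway form given in the plan. It assumes hosts of the shape `{lang}.{project}.org` and that anything after `/feed/featured` is carried over unchanged; both are assumptions, and `to_gateway_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit

OLD_MARKER = "/rest_v1/feed/featured"


def to_gateway_url(restbase_url: str) -> str:
    """Map {domain}/rest_v1/feed/featured to the api.wikimedia.org aggregate URL."""
    parts = urlsplit(restbase_url)
    labels = (parts.hostname or "").split(".")
    if len(labels) != 3:
        # Assumption: only hosts like en.wikipedia.org are handled here.
        raise ValueError(f"unsupported host: {parts.hostname!r}")
    lang, project, _tld = labels
    _before, marker, suffix = parts.path.partition(OLD_MARKER)
    if not marker:
        raise ValueError(f"not a feed URL: {restbase_url!r}")
    return f"https://api.wikimedia.org/{project}/{lang}/v1/feed/aggregate{suffix}"


migrated = to_gateway_url("https://en.wikipedia.org/api/rest_v1/feed/featured/2020/10/05")
```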

I'm open to suggestions on the naming of the aggregate endpoint in wikifeeds.

> I'm open to suggestions on the naming of the aggregate endpoint in wikifeeds.

/feed/aggregate sounds fine to me.

Change 628426 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/wikifeeds@master] WIP: Provide summaries in the responses instead of $merge.

https://gerrit.wikimedia.org/r/628426

Change 628426 merged by jenkins-bot:
[mediawiki/services/wikifeeds@master] Provide summaries in the responses instead of $merge.

https://gerrit.wikimedia.org/r/628426

Change 629761 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/wikifeeds@master] Filter out missing summaries from most-read feed

https://gerrit.wikimedia.org/r/629761

Change 629761 merged by jenkins-bot:
[mediawiki/services/wikifeeds@master] Filter out missing summaries from most-read feed

https://gerrit.wikimedia.org/r/629761

Change 629792 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/wikifeeds@master] Include 'normalizedtitle' property in summary for legacy clients

https://gerrit.wikimedia.org/r/629792

Change 629792 merged by jenkins-bot:
[mediawiki/services/wikifeeds@master] Include 'normalizedtitle' property in summary for legacy clients

https://gerrit.wikimedia.org/r/629792

Change 629803 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[operations/deployment-charts@master] Update wikifeeds to 2020-09-24-191356-production

https://gerrit.wikimedia.org/r/629803

Change 629803 merged by jenkins-bot:
[operations/deployment-charts@master] Update wikifeeds to 2020-09-24-191356-production

https://gerrit.wikimedia.org/r/629803

Change 630684 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/wikifeeds@master] Deduplicate most-read and onthisday pages after fetching summaries

https://gerrit.wikimedia.org/r/630684

Change 630684 merged by jenkins-bot:
[mediawiki/services/wikifeeds@master] Deduplicate most-read and onthisday pages after fetching summaries

https://gerrit.wikimedia.org/r/630684

Mentioned in SAL (#wikimedia-operations) [2020-10-05T14:13:44Z] <ppchelko@deploy1001> Started deploy [restbase/deploy@366a543]: T263133 T264035

Mentioned in SAL (#wikimedia-operations) [2020-10-05T14:36:07Z] <ppchelko@deploy1001> Finished deploy [restbase/deploy@366a543]: T263133 T264035 (duration: 22m 23s)