
Recovery: Trending service should be able to replay last 1hr of edits
Closed, Resolved | Public | 5 Estimated Story Points

Description

When the script starts, it has no knowledge of the current editing activity on the wiki.

The wikitrender service dealt with this by dropping an article from its store once it had been inactive for a configurable amount of time or had exceeded a maximum lifespan. It cached the store in a file, which it loaded on startup.
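For reference, the purge rule described above could look roughly like the following. This is only a sketch of the idea, not the wikitrender code: the `purge` function, the two thresholds, and the `firstSeen`/`lastEdit` fields are all made-up names and values for illustration.

```javascript
// Hypothetical sketch of a wikitrender-style purge; names and values are assumptions.
const MAX_INACTIVITY_MS = 20 * 60 * 1000;    // assumed: drop after 20 minutes without edits
const MAX_LIFESPAN_MS = 24 * 60 * 60 * 1000; // assumed: drop after 24 hours in the store

// Remove every page that is inactive for too long OR has exceeded the max lifespan.
function purge(store, now) {
  for (const [title, page] of store) {
    const inactive = now - page.lastEdit > MAX_INACTIVITY_MS;
    const expired = now - page.firstSeen > MAX_LIFESPAN_MS;
    if (inactive || expired) {
      store.delete(title); // deleting during Map iteration is safe in JavaScript
    }
  }
  return store;
}

const now = Date.parse('2016-09-13T20:36:00Z');
const MINUTE = 60 * 1000;
const HOUR = 60 * MINUTE;
const store = new Map([
  ['Active', { firstSeen: now - HOUR, lastEdit: now - MINUTE }],
  ['Stale', { firstSeen: now - 2 * HOUR, lastEdit: now - 30 * MINUTE }],
  ['Old', { firstSeen: now - 25 * HOUR, lastEdit: now - MINUTE }],
]);

purge(store, now);
console.log([...store.keys()]); // [ 'Active' ]
```

Persisting the store to a file and reloading it on startup is left out here.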

Instead of this approach, we will replay the edits made within the last hour.

Event Timeline

Jdlrobson renamed this task from Trending service should be able to store page to Trending service should be able to store an overview of editing activity of a wiki. Sep 13 2016, 8:36 PM
Jdlrobson renamed this task from Trending service should be able to store an overview of editing activity of a wiki to Storage: Trending service should be able to store an overview of editing activity of a wiki.
Jdlrobson updated the task description.
Jdlrobson updated the task description.
Jdlrobson set the point value for this task to 5.
Jdlrobson renamed this task from Storage: Trending service should be able to store an overview of editing activity of a wiki to Storage: Trending service should be able to replay last 1hr of edits. Sep 29 2016, 4:30 PM
Jdlrobson renamed this task from Storage: Trending service should be able to replay last 1hr of edits to Recovery: Trending service should be able to replay last 1hr of edits.
Jdlrobson updated the task description.

Depending on the method of ingesting edits, this may be accomplished via Kafka rather than needing a persistence layer.

This should be mostly solved here: T145553

But let's leave this card for now in case we have any follow-up tasks.

Yes, this should be done, but I've not been able to verify it. @Pchelolo how can we do so?

@Jdlrobson I would do the following:

  1. Set the max_age in the config to 5 minutes
  2. Put a console.log where the message is received
  3. Start the service, produce a message to Kafka, wait 5 minutes, produce another message, stop the service
  4. Start the service, give it some time to subscribe, and check that only the second message gets reprocessed.
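To make the expected outcome of those steps concrete, here is a rough sketch of an age check on replayed messages. It is an illustration only: the `isFresh` helper, the message shape, and the assumption that `max_age` is expressed in seconds are mine, not taken from the service.

```javascript
// Hypothetical sketch: decide whether a replayed message is recent enough to process.
const config = { max_age: 5 * 60 }; // assumed unit: seconds (5 minutes)

// True when the message is no older than maxAgeSeconds at time `now` (ms since epoch).
function isFresh(message, now, maxAgeSeconds) {
  const ageMs = now - Date.parse(message.timestamp);
  return ageMs <= maxAgeSeconds * 1000;
}

// Simulated restart time and the two messages produced in step 3.
const now = Date.parse('2016-12-01T12:10:00Z');
const first = { title: 'A', timestamp: '2016-12-01T12:00:00Z' };  // 10 minutes old
const second = { title: 'B', timestamp: '2016-12-01T12:06:00Z' }; // 4 minutes old

for (const msg of [first, second]) {
  if (isFresh(msg, now, config.max_age)) {
    console.log('reprocessing', msg.title); // only reached for the second message
  }
}
```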

I will try to create an integration test once we're done setting up Kafka in CI.

This doesn't seem to be working.
I tested by doing the following:

(No results)

(Results)

@Jdlrobson This will only refetch stuff if some offsets have already been committed for the consumer group. On first deploy we don't want to refetch the week of data currently stored in Kafka, so we default to starting from the end of the topic. To check whether you have something committed in Kafka, you can run the following command inside vagrant:

kafka run-class  kafka.admin.ConsumerGroupCommand --describe --group <YOUR_CONSUMER_GROUP_NAME> --bootstrap-server localhost:9092 --new-consumer

You can find out your consumer group name by putting a console.log in edit-stream.js:54
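The start-position rule described in this comment (resume from the committed offset if there is one, otherwise start at the end of the topic) can be sketched as a plain function. This is not the service's code or any Kafka client API; `chooseStartOffset` and its arguments are hypothetical.

```javascript
// Hypothetical sketch of the start-position rule; not a Kafka client API.
function chooseStartOffset(committedOffset, logEndOffset) {
  // Nothing committed yet (e.g. first deploy): skip the backlog, start at the end.
  if (committedOffset === null || committedOffset === undefined) {
    return logEndOffset;
  }
  // Otherwise resume where the consumer group left off, so recent edits are replayed.
  return committedOffset;
}

console.log(chooseStartOffset(null, 1000)); // 1000
console.log(chooseStartOffset(940, 1000));  // 940
```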

@Pchelolo if we do a deploy (or restart the service), what I see is an empty list of trending results when the service comes back up. This is not expected: it should show the same results as it did before it went offline. Can you investigate this? I'm not too clear on the ins and outs of the Kafka part of this service...

@Jdlrobson kk, will investigate and get back to you with the results

Change 327859 had a related patch set uploaded (by Ppchelko):
Kafka consumer: Switch back to flowing mode.

https://gerrit.wikimedia.org/r/327859

Ok, stupid bug.

We've never been actually committing anything, because we've been comparing strings with integers when deciding whether it's time to commit already or not yet. Adding a single Date.parse call fixes the issue. To avoid merge conflicts I've updated https://gerrit.wikimedia.org/r/#/c/327859/ to include a fix for this issue too.
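For anyone curious how this kind of bug stays silent, here is a small illustration. It is not the actual patch: the function names and the commit interval are invented, and it only assumes the timestamp arrives as an ISO 8601 string while the last commit time is a number of milliseconds.

```javascript
// Hypothetical illustration of a string-vs-integer comparison bug; not the actual patch.
const COMMIT_INTERVAL_MS = 1000; // assumed value

// Buggy: messageTimestamp is an ISO string, so the subtraction yields NaN,
// and any comparison with NaN is false. The commit condition never fires.
function shouldCommitBuggy(messageTimestamp, lastCommitMs) {
  return messageTimestamp - lastCommitMs >= COMMIT_INTERVAL_MS;
}

// Fixed: parse the string into milliseconds since the epoch before comparing.
function shouldCommitFixed(messageTimestamp, lastCommitMs) {
  return Date.parse(messageTimestamp) - lastCommitMs >= COMMIT_INTERVAL_MS;
}

const lastCommit = Date.parse('2017-01-18T01:04:36.000Z');
const ts = '2017-01-18T01:06:02.000Z';

console.log(shouldCommitBuggy(ts, lastCommit)); // false
console.log(shouldCommitFixed(ts, lastCommit)); // true
```

No exception is thrown in the buggy version, which is why nothing pointed at the problem until the replay was tested end to end.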

Change 327859 merged by Mobrovac:
Kafka consumer: Switch back to flowing mode.

https://gerrit.wikimedia.org/r/327859

Stashbot subscribed.

Mentioned in SAL (#wikimedia-operations) [2017-01-18T00:53:58Z] <mobrovac@tin> Starting deploy [trending-edits/deploy@1d53b7c]: fixes for T153122 and T145571

Mentioned in SAL (#wikimedia-operations) [2017-01-18T00:59:04Z] <mobrovac@tin> Finished deploy [trending-edits/deploy@1d53b7c]: fixes for T153122 and T145571 (duration: 05m 06s)

mobrovac subscribed.

The fix has been deployed. Right after the restart, I can see:

mobrovac@scb1001:~$ curl localhost:6699/en.wikipedia.org/v1/feed/trending-edits/ | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   459    0   459    0     0  58396      0 --:--:-- --:--:-- --:--:-- 65571
{
  "timestamp": "2017-01-18T01:06:11.741Z",
  "pages": [
    {
      "totalEdits": 22,
      "editors": 12,
      "trendiness": 1.1073497847740705,
      "isNew": false,
      "updated": "2017-01-18T01:06:02.000Z",
      "$merge": [
        "http://restbase.svc.eqiad.wmnet:7231/en.wikipedia.org/v1/page/summary/Whitley_Bay"
      ]
    },
    {
      "totalEdits": 7,
      "editors": 1,
      "trendiness": -1,
      "isNew": false,
      "updated": "2017-01-18T01:04:36.000Z",
      "$merge": [
        "http://restbase.svc.eqiad.wmnet:7231/en.wikipedia.org/v1/page/summary/Underworld_Unleashed"
      ]
    }
  ]
}
Jdlrobson claimed this task.

Wow so fast :) Thanks @Pchelolo and @mobrovac !!