Recovery: Trending service should be able to replay last 1hr of edits
Closed, Resolved | Public | 5 Story Points

Description

When the script starts, it has no knowledge of the current editing activity on the wiki.

The wikitrender service dealt with this by dropping articles from a store once they had been inactive for a configurable amount of time or had exceeded a maximum lifespan, and by caching the store in a file that is loaded on startup.
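
A minimal sketch of that eviction rule, assuming a hypothetical in-memory `Map` from page title to `firstSeen` and `lastEdit` timestamps in milliseconds. The function name, field names, and thresholds are illustrative and are not taken from wikitrender:

```javascript
// Hypothetical store shape: Map<title, { firstSeen: number, lastEdit: number }>.
// Drops a page if it has been inactive too long OR has outlived its max lifespan.
function pruneStore(store, now, maxInactiveMs, maxLifespanMs) {
  for (const [title, page] of store) {
    const inactive = now - page.lastEdit > maxInactiveMs;
    const expired = now - page.firstSeen > maxLifespanMs;
    if (inactive || expired) {
      store.delete(title); // deleting during Map iteration is safe in JavaScript
    }
  }
  return store;
}

const HOUR = 60 * 60 * 1000;
const now = Date.parse('2016-09-13T20:00:00Z');
const store = new Map([
  ['Active page', { firstSeen: now - 2 * HOUR, lastEdit: now - 60 * 1000 }],
  ['Quiet page', { firstSeen: now - 2 * HOUR, lastEdit: now - 90 * 60 * 1000 }],
  ['Old page', { firstSeen: now - 30 * HOUR, lastEdit: now - 60 * 1000 }],
]);

// Assumed thresholds: 1 hour of inactivity, 24 hours maximum lifespan.
pruneStore(store, now, HOUR, 24 * HOUR);
console.log([...store.keys()]); // [ 'Active page' ]
```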

Instead of this approach, we will replay the edits made within the last hour.

Jdlrobson changed the title from "Trending service should be able to store page" to "Trending service should be able to store an overview of editing activity of a wiki". Sep 13 2016, 8:36 PM
Jdlrobson edited the task description.
Jdlrobson edited the task description.
Jdlrobson set the point value for this task to 5.
Jdlrobson changed the title from "Trending service should be able to store an overview of editing activity of a wiki" to "Storage: Trending service should be able to store an overview of editing activity of a wiki".
Jdlrobson changed the title from "Storage: Trending service should be able to store an overview of editing activity of a wiki" to "Storage: Trending service should be able to replay last 1hr of edits". Sep 29 2016, 4:30 PM
Jdlrobson edited the task description.
Jdlrobson changed the title from "Storage: Trending service should be able to replay last 1hr of edits" to "Recovery: Trending service should be able to replay last 1hr of edits".

Depending on the method of ingesting edits, this may be accomplished via Kafka, without needing a persistence layer.

This should be mostly solved here: T145553

But let's leave this card open for now in case we have any follow-up tasks.

Yes this should be done, but I've not been able to verify. @Pchelolo how can we do so?

@Jdlrobson I would do the following:

  1. Set the max_age in the config to 5 minutes
  2. Put a console.log when the message is received
  3. Start the service, produce a message to Kafka, wait 5 minutes, produce another message, then stop the service
  4. Start the service, give it some time to subscribe, and check that only the second message gets reprocessed.
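
The expected outcome of these steps can be simulated without Kafka. This is only a sketch: it assumes each message carries an ISO 8601 `timestamp` field and that `max_age` is applied as a simple age cutoff on replay. The helper names and the message shape are hypothetical, not the service's actual code:

```javascript
// Assumed config value from step 1: 5 minutes, in milliseconds.
const MAX_AGE_MS = 5 * 60 * 1000;

// Hypothetical helper: true if a replayed message is still within max_age.
function isFresh(message, now, maxAgeMs = MAX_AGE_MS) {
  return now - Date.parse(message.timestamp) <= maxAgeMs;
}

// Hypothetical replay: keep only the messages that are recent enough.
function replay(messages, now, maxAgeMs = MAX_AGE_MS) {
  return messages.filter((m) => isFresh(m, now, maxAgeMs));
}

// Step 3: two messages produced 5 minutes apart.
const messages = [
  { title: 'First', timestamp: '2016-12-07T22:00:00Z' },
  { title: 'Second', timestamp: '2016-12-07T22:05:00Z' },
];

// Step 4: the service restarts one minute after the second message.
const restartTime = Date.parse('2016-12-07T22:06:00Z');
const reprocessed = replay(messages, restartTime);
console.log(reprocessed.map((m) => m.title)); // [ 'Second' ]
```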

I will try to create an integration test after we're done setting up Kafka in CI.

Jdlrobson added a comment. Edited Dec 7 2016, 10:16 PM

This doesn't seem to be working.
I tested by doing the following:

(No results)

(Results)

@Jdlrobson This will only refetch messages if some offsets have already been committed for the consumer group. On first deploy we don't want to refetch the week of data currently stored in Kafka, so we default to starting from the end of the topic. To check whether anything has been committed in Kafka, you can run the following command inside Vagrant:

kafka run-class  kafka.admin.ConsumerGroupCommand --describe --group <YOUR_CONSUMER_GROUP_NAME> --bootstrap-server localhost:9092 --new-consumer

You can find out your consumer group name by adding a console.log at edit-stream.js:54
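
The start-position rule described above can be sketched as a small pure function. This is an illustration only: `chooseStartOffset` and its arguments are hypothetical names, not code from the service, and in a real Kafka client this behaviour is normally configured through the consumer's `auto.offset.reset` setting rather than written by hand.

```javascript
// Hypothetical helper: decide where a consumer group resumes reading.
// committedOffset is null when the group has never committed anything.
function chooseStartOffset(committedOffset, endOfTopicOffset) {
  if (committedOffset === null || committedOffset === undefined) {
    // First deploy: skip the backlog and start from the end of the topic.
    return endOfTopicOffset;
  }
  // Restart: resume from the committed position, so later messages are refetched.
  return committedOffset;
}

console.log(chooseStartOffset(null, 5000)); // 5000
console.log(chooseStartOffset(4200, 5000)); // 4200
```

If no offset is ever committed, every restart looks like a first deploy and nothing is replayed.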

@Pchelolo if we do a deploy (or restart the service), what I see is an empty list of trending results when the service comes back up. This is not expected: it should show the same results as it did before it went offline. Can you investigate this? I'm not too clear on the ins and outs of the Kafka part of this service...

@Jdlrobson kk, will investigate and get back to you with the results

Change 327859 had a related patch set uploaded (by Ppchelko):
Kafka consumer: Switch back to flowing mode.

https://gerrit.wikimedia.org/r/327859

Ok, stupid bug.

We've never actually been committing anything, because we were comparing strings with integers when deciding whether it was time to commit. Adding a single Date.parse call fixes the issue. To avoid merge conflicts, I've updated https://gerrit.wikimedia.org/r/#/c/327859/ to include a fix for this issue too.
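
To illustrate why nothing was ever committed, here is a sketch of this class of bug. It assumes the message timestamp is an ISO 8601 string and the last commit time is a number of milliseconds; the function names and the commit interval are hypothetical, not the service's actual code:

```javascript
const COMMIT_INTERVAL_MS = 1000; // assumed value, for illustration only

// Buggy: comparing a non-numeric string with a number coerces the string to NaN,
// and any comparison with NaN is false, so this never reports "time to commit".
function shouldCommitBuggy(messageTimestamp, lastCommitMs) {
  return messageTimestamp >= lastCommitMs + COMMIT_INTERVAL_MS;
}

// Fixed: Date.parse turns the ISO string into milliseconds since the epoch first.
function shouldCommitFixed(messageTimestamp, lastCommitMs) {
  return Date.parse(messageTimestamp) >= lastCommitMs + COMMIT_INTERVAL_MS;
}

const lastCommitMs = Date.parse('2017-01-18T01:04:36.000Z');
const messageTimestamp = '2017-01-18T01:06:02.000Z';

console.log(shouldCommitBuggy(messageTimestamp, lastCommitMs)); // false
console.log(shouldCommitFixed(messageTimestamp, lastCommitMs)); // true
```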

Change 327859 merged by Mobrovac:
Kafka consumer: Switch back to flowing mode.

https://gerrit.wikimedia.org/r/327859

Stashbot added a subscriber: Stashbot.

Mentioned in SAL (#wikimedia-operations) [2017-01-18T00:53:58Z] <mobrovac@tin> Starting deploy [trending-edits/deploy@1d53b7c]: fixes for T153122 and T145571

Mentioned in SAL (#wikimedia-operations) [2017-01-18T00:59:04Z] <mobrovac@tin> Finished deploy [trending-edits/deploy@1d53b7c]: fixes for T153122 and T145571 (duration: 05m 06s)

mobrovac added a subscriber: mobrovac.

The fix has been deployed. Right after the restart, I can see:

mobrovac@scb1001:~$ curl localhost:6699/en.wikipedia.org/v1/feed/trending-edits/ | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   459    0   459    0     0  58396      0 --:--:-- --:--:-- --:--:-- 65571
{
  "timestamp": "2017-01-18T01:06:11.741Z",
  "pages": [
    {
      "totalEdits": 22,
      "editors": 12,
      "trendiness": 1.1073497847740705,
      "isNew": false,
      "updated": "2017-01-18T01:06:02.000Z",
      "$merge": [
        "http://restbase.svc.eqiad.wmnet:7231/en.wikipedia.org/v1/page/summary/Whitley_Bay"
      ]
    },
    {
      "totalEdits": 7,
      "editors": 1,
      "trendiness": -1,
      "isNew": false,
      "updated": "2017-01-18T01:04:36.000Z",
      "$merge": [
        "http://restbase.svc.eqiad.wmnet:7231/en.wikipedia.org/v1/page/summary/Underworld_Unleashed"
      ]
    }
  ]
}
Jdlrobson closed this task as "Resolved". Jan 18 2017, 10:06 PM
Jdlrobson claimed this task.

Wow so fast :) Thanks @Pchelolo and @mobrovac !!