
Remove webrequest misc analytics related jobs and code after cache misc -> text merge is complete
Closed, Resolved · Public · 5 Story Points

Details

Related Gerrit Patches:
operations/puppet (production): camus: set proper number of consumers for Webrequest
analytics/refinery (master): Remove cache misc from Refinery

Event Timeline

Ottomata triaged this task as Normal priority. Jul 31 2018, 4:59 PM
Ottomata created this task.
Restricted Application added a subscriber: Aklapper. Jul 31 2018, 4:59 PM
elukey added a subscriber: ema. Sep 11 2018, 5:19 PM

After checking with kafkacat -b kafka-jumbo1001.eqiad.wmnet:9092 -t webrequest_misc, I can only see health checks flowing through that topic.
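A minimal sketch of the same check in Python, turning kafkacat output into counts instead of eyeballing it. It assumes each message is a JSON webrequest record with a uri_path field; the paths in HEALTH_CHECK_PATHS are hypothetical placeholders, since the actual health-check URLs are not stated in this task.

```python
import json
import sys
from collections import Counter
from typing import Iterable

# Hypothetical health-check paths: placeholders, not taken from this task.
HEALTH_CHECK_PATHS = {"/check", "/healthz"}


def classify(line: str) -> str:
    """Label one kafkacat output line as 'health_check', 'traffic' or 'unparseable'."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return "unparseable"
    if not isinstance(record, dict):
        return "unparseable"
    # Assumes webrequest JSON records carry the request path in 'uri_path'.
    if record.get("uri_path") in HEALTH_CHECK_PATHS:
        return "health_check"
    return "traffic"


def summarize(lines: Iterable[str]) -> Counter:
    """Count message categories, ignoring blank lines."""
    return Counter(classify(line) for line in lines if line.strip())


def main(lines: Iterable[str]) -> int:
    """Print the counts; return 1 if any non-health-check traffic was seen."""
    counts = summarize(lines)
    print(dict(counts), file=sys.stdout)
    return 1 if counts["traffic"] else 0
```

One possible invocation, saving the block as a hypothetical check_misc.py and reading the last 1000 messages of each partition: kafkacat -C -b kafka-jumbo1001.eqiad.wmnet:9092 -t webrequest_misc -o -1000 -e | python3 -c "import sys, check_misc; sys.exit(check_misc.main(sys.stdin))". A zero exit status means only health checks (or unparseable lines) were seen.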

@ema just to be sure, can you confirm that cache misc is gone and that we can get rid of all our data processing for it?

Change 459827 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] Remove cache misc from Refinery

https://gerrit.wikimedia.org/r/459827

ema added a comment. Sep 12 2018, 9:08 AM

> @ema just to be sure, can you confirm that cache misc is gone and that we can get rid of all our data processing for it?

Yes, cache_misc is gone.

elukey added a subscriber: leila. Edited Sep 12 2018, 2:43 PM

Hi @leila!

Our dear cache misc (the Varnish hosts that were serving misc websites like Phabricator, yarn, etc.) has now been merged into cache text. While reviewing all the Analytics Hadoop jobs referencing cache misc in https://gerrit.wikimedia.org/r/#/c/459827/, we also found wdqs_extract. Is it still used? I am asking because we now have two options:

  1. delete it if not needed
  2. fix it (as done in the code review linked above) and also figure out how to fill the data gap of the past weeks (from when the Traffic team merged misc into text up to now). Since wdqs_extract kept reading cache misc data, it has not been pulling any relevant data (which was now in cache text), probably leading to a nice flatline.
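If option 2 were chosen, the backfill would amount to re-running the job once per hourly partition in the gap. A minimal sketch of enumerating those partitions, assuming webrequest data is partitioned by webrequest_source, year, month, day and hour with unpadded values; the start and end timestamps are hypothetical inputs, since the exact merge date is not given here.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple

Partition = Tuple[int, int, int, int]  # (year, month, day, hour)


def hourly_partitions(start: datetime, end: datetime) -> Iterator[Partition]:
    """Yield every hour in [start, end); the hour containing `start` is included."""
    current = start.replace(minute=0, second=0, microsecond=0)
    while current < end:
        yield (current.year, current.month, current.day, current.hour)
        current += timedelta(hours=1)


def partition_spec(source: str, part: Partition) -> str:
    """Render one partition as a Hive-style path (assumed layout, unpadded values)."""
    year, month, day, hour = part
    return f"webrequest_source={source}/year={year}/month={month}/day={day}/hour={hour}"
```

Each tuple corresponds to one hourly run that would need to be re-executed against the text source, which is why the size of the gap matters when deciding between the two options.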

Let me know!

Luca

@elukey I can't make the call on deleting it until I have confirmation from @mkroetzsch. I've already sent him an email about the next steps for the MOU, and whether to delete it depends on his response on that front. Can this wait a couple more weeks?

It's already been broken for a few weeks. We don't need to delete the data at all, but the change Luca is working on will make this job read from the webrequest_text data partition, which is a lot larger than webrequest_misc. We can do it, but we'd rather not if we don't have to!
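To illustrate why the switch matters, here is a hypothetical sketch of how a job like wdqs_extract might select one hour of input; it is not the actual wdqs_extract query. Hive can prune on the webrequest_source partition, but uri_host is an ordinary column, so pointing the same query at text means scanning the much larger text partition and then discarding everything except query.wikidata.org requests. The column names are assumed from the wmf.webrequest schema.

```python
# Sources assumed to remain after the merge; 'misc' is gone.
VALID_SOURCES = {"text", "upload"}


def wdqs_hour_query(source: str, year: int, month: int, day: int, hour: int) -> str:
    """Build an illustrative HiveQL query for one hour of WDQS requests."""
    if source not in VALID_SOURCES:
        raise ValueError(f"unknown webrequest_source: {source!r}")
    return (
        "SELECT dt, uri_path, uri_query, http_status "
        "FROM wmf.webrequest "
        # Partition predicates: these limit which files Hive reads.
        f"WHERE webrequest_source = '{source}' "
        f"AND year = {int(year)} AND month = {int(month)} "
        f"AND day = {int(day)} AND hour = {int(hour)} "
        # Row-level filter: applied only after the whole partition is scanned.
        "AND uri_host = 'query.wikidata.org'"
    )
```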

Luca, I suggest removing the job and if we hear back otherwise we can re-add it then.

leila added a comment. Sep 13 2018, 4:34 PM

> Luca, I suggest removing the job and if we hear back otherwise we can re-add it then.

+1.

Thanks for the comments! I've updated https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/459827/ to remove the job; this patch will probably be deployed during or after the Analytics offsite :)

elukey claimed this task. Sep 14 2018, 6:46 AM
elukey moved this task from Next Up to Ready to Deploy on the Analytics-Kanban board.

Yes, it is fine to stop the extraction for now. Many thanks!

Change 459827 merged by Joal:
[analytics/refinery@master] Remove cache misc from Refinery

https://gerrit.wikimedia.org/r/459827

Change 462761 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] camus: set proper number of consumers for Webrequest

https://gerrit.wikimedia.org/r/462761

Change 462761 merged by Elukey:
[operations/puppet@production] camus: set proper number of consumers for Webrequest

https://gerrit.wikimedia.org/r/462761

elukey set the point value for this task to 5. Sep 25 2018, 4:37 PM
elukey moved this task from Ready to Deploy to Done on the Analytics-Kanban board.
Nuria closed this task as Resolved. Sep 26 2018, 7:12 PM