
Remove webrequest misc analytics related jobs and code after cache misc -> text merge is complete
Closed, Resolved | Public | 5 Estimated Story Points

Event Timeline

Ottomata triaged this task as Medium priority. Jul 31 2018, 4:59 PM
Ottomata created this task.

After checking with kafkacat -b kafka-jumbo1001.eqiad.wmnet:9092 -t webrequest_misc, I can only see health checks flowing from Kafka.
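
A check like the one above could be scripted roughly as follows. This is only a minimal sketch: it assumes the messages are one JSON object per line, and the field name `uri_path` and the `/check` prefix used to recognise a health check are hypothetical placeholders, not taken from the actual webrequest schema.

```python
import json
from collections import Counter
from typing import Iterable

# Hypothetical values: adjust to whatever actually identifies a health check.
HEALTH_CHECK_FIELD = "uri_path"
HEALTH_CHECK_PREFIX = "/check"


def classify(lines: Iterable[str]) -> Counter:
    """Count health-check, other, and unparseable messages in a stream of lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            # Skip blank lines rather than counting them as errors.
            continue
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            counts["unparseable"] += 1
            continue
        if not isinstance(msg, dict):
            # Valid JSON, but not an object we can inspect.
            counts["unparseable"] += 1
            continue
        path = str(msg.get(HEALTH_CHECK_FIELD, ""))
        if path.startswith(HEALTH_CHECK_PREFIX):
            counts["health_check"] += 1
        else:
            counts["other"] += 1
    return counts
```

Calling `classify(sys.stdin)` in a small script fed by the kafkacat output would give a quick summary; an `other` count of zero would match the observation that only health checks remain.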

@ema just to be sure, can you confirm that cache misc is gone and that we can get rid of all our data processing for it?

Change 459827 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] Remove cache misc from Refinery

https://gerrit.wikimedia.org/r/459827

> @ema just to be sure, can you confirm that cache misc is gone and that we can get rid of all our data processing for it?

Yes, cache_misc is gone.

Hi @leila!

Our dear cache misc (the Varnish hosts that were hosting misc websites like phabricator, yarn, etc.) has now been merged into cache text. When reviewing all the Analytics Hadoop jobs referencing cache misc in https://gerrit.wikimedia.org/r/#/c/459827/, we also found wdqs_extract. Is it still used? I am asking because we have two options now:

  1. delete it if not needed
  2. fix it (as is done in the code review linked above) and also try to figure out how to fill the data gaps from the past weeks (from when the Traffic team merged misc into text up to now). Since wdqs_extract kept reading from cache misc, it was not pulling any relevant data (which was in cache text), probably leading to a nice flatline.
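
For the second option, the hours to backfill could be enumerated along these lines. This is a sketch under stated assumptions: the partition layout (`webrequest_source=.../year=.../month=.../day=.../hour=...`) is assumed rather than confirmed in this task, the function name is hypothetical, and the start and end of the gap must be supplied by whoever knows when the merge actually happened.

```python
from datetime import datetime, timedelta
from typing import Iterator


def hourly_partitions(start: datetime, end: datetime, source: str = "text") -> Iterator[str]:
    """Yield one partition spec per hour in the half-open interval [start, end)."""
    # Truncate to the top of the hour so every yielded partition is a whole hour.
    current = start.replace(minute=0, second=0, microsecond=0)
    while current < end:
        yield (
            f"webrequest_source={source}"
            f"/year={current.year}/month={current.month}"
            f"/day={current.day}/hour={current.hour}"
        )
        current += timedelta(hours=1)
```

Each yielded spec would correspond to one hour the job needs to re-run over, which also gives a quick sense of how much webrequest_text data a backfill would touch.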

Let me know!

Luca

@elukey I can't make the call for deleting it until I have confirmation from @mkroetzsch. I've already sent him an email about the next steps for the MOU, and whether to delete it depends on his response on that front. Can this wait a couple more weeks?

It's already been broken for a few weeks. We don't need to delete the data at all, but the change Luca is working on will cause this job to run with the webrequest_text data partition, which is a lot more than webrequest_misc. We can do it, but we'd rather not if we don't have to!

Luca, I suggest removing the job and if we hear back otherwise we can re-add it then.

> Luca, I suggest removing the job and if we hear back otherwise we can re-add it then.

+1.

Thanks for the comments! I've updated https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/459827/ to remove the job. This patch will probably be deployed during or after the Analytics offsite :)

Yes, it is fine to stop the extraction for now. Many thanks!

Change 459827 merged by Joal:
[analytics/refinery@master] Remove cache misc from Refinery

https://gerrit.wikimedia.org/r/459827

Change 462761 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] camus: set proper number of consumers for Webrequest

https://gerrit.wikimedia.org/r/462761

Change 462761 merged by Elukey:
[operations/puppet@production] camus: set proper number of consumers for Webrequest

https://gerrit.wikimedia.org/r/462761

elukey set the point value for this task to 5. Sep 25 2018, 4:37 PM
elukey moved this task from Ready to Deploy to Done on the Analytics-Kanban board.