Set up standard HTTPS Termination -> 2layer caching for maps service
Closed, ResolvedPublic

Description

Basically, we need:

  • maps cache cluster in eqiad
    • standard 2layer setup like "upload", with fresh/minimal VCL
    • can reuse the recently decommed cp104[34] machines for now
    • backend is kartotherian.svc.codfw.wmnet:4000 (LVS service)
  • maps.wikimedia.org LVS/DNS setup into new IP for maps cache cluster
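
The 2layer lookup order described above can be modelled roughly as follows. This is only an illustrative Python sketch, not the actual VCL: the class and method names are hypothetical, and the origin fetch is passed in as a plain function standing in for a request to kartotherian.svc.codfw.wmnet:4000.

```python
from typing import Callable, Dict, Tuple

# Backend named in the task description; used here only as a label.
ORIGIN = "kartotherian.svc.codfw.wmnet:4000"


class TwoLayerCache:
    """Hypothetical model: frontend layer -> backend layer -> origin."""

    def __init__(self, fetch_origin: Callable[[str], bytes]) -> None:
        self.frontend: Dict[str, bytes] = {}  # small first layer
        self.backend: Dict[str, bytes] = {}   # larger second layer
        self.fetch_origin = fetch_origin      # stands in for a request to ORIGIN

    def get(self, path: str) -> Tuple[bytes, str]:
        # 1. Try the frontend layer first.
        if path in self.frontend:
            return self.frontend[path], "frontend-hit"
        # 2. Fall through to the backend layer and promote the object.
        if path in self.backend:
            body = self.backend[path]
            self.frontend[path] = body
            return body, "backend-hit"
        # 3. Miss in both layers: fetch from the origin and fill both.
        body = self.fetch_origin(path)
        self.backend[path] = body
        self.frontend[path] = body
        return body, "miss"
```

With this model, a first request for a tile goes to the origin, and repeat requests are served from the frontend layer until it is cleared, at which point the backend layer answers without touching the origin.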

Event Timeline

Yurik created this task. · Jul 7 2015, 8:34 PM
Yurik raised the priority of this task to Needs Triage.
Yurik updated the task description.
Yurik added subscribers: Aklapper, MaxSem, Yurik and 2 others.
Restricted Application added a subscriber: Matanya. · Jul 7 2015, 8:34 PM
MaxSem moved this task from Backlog to Stalled/Waiting on the Maps-Sprint board. · Jul 9 2015, 8:06 PM
Yurik updated the task description. · Jul 18 2015, 12:00 AM
Yurik set Security to None.
fgiunchedi triaged this task as Medium priority. · Jul 20 2015, 2:25 PM
fgiunchedi added a subscriber: fgiunchedi.
jcrespo claimed this task. · Jul 21 2015, 8:30 AM

Questions from the data end:

  1. Is this going to be integrated with the existing kafka pipelines for logging?
  2. If so, what source cluster will it be under? web, misc, a new "maps" one..?
Yurik updated the task description. · Aug 6 2015, 10:22 AM
BBlack renamed this task from "Assign varnish memory-only role to maps servers" to "Set up standard HTTPS Termination -> 2layer caching -> LVS -> servicehosts for maps service". · Aug 7 2015, 7:17 PM
BBlack updated the task description.
BBlack added a project: Traffic.
BBlack updated the task description. · Aug 7 2015, 7:24 PM
BBlack renamed this task from "Set up standard HTTPS Termination -> 2layer caching -> LVS -> servicehosts for maps service" to "Set up standard HTTPS Termination -> 2layer caching for maps service". · Aug 7 2015, 7:54 PM
BBlack claimed this task. · Aug 7 2015, 10:41 PM

Change 230243 had a related patch set uploaded (by BBlack):
add maps.wikimedia.org (-> maps-lb.eqiad)

https://gerrit.wikimedia.org/r/230243

Change 230242 had a related patch set uploaded (by BBlack):
create maps-lb.eqiad, repurposing osm-lb.eqiad ipv4

https://gerrit.wikimedia.org/r/230242

Change 230241 had a related patch set uploaded (by BBlack):
remove unused revdns for osm-lb.esams

https://gerrit.wikimedia.org/r/230241

Change 230246 had a related patch set uploaded (by BBlack):
Add cache_maps and related LVS config, using cp104[34]

https://gerrit.wikimedia.org/r/230246

Change 230241 merged by BBlack:
remove unused revdns for osm-lb.esams

https://gerrit.wikimedia.org/r/230241

Change 230242 merged by BBlack:
create maps-lb.eqiad, repurposing osm-lb.eqiad ipv4

https://gerrit.wikimedia.org/r/230242

Change 230243 merged by BBlack:
add maps.wikimedia.org (-> maps-lb.eqiad)

https://gerrit.wikimedia.org/r/230243

Change 230246 merged by BBlack:
Add cache_maps and related LVS config, using cp104[34]

https://gerrit.wikimedia.org/r/230246

This is now basically working at https://maps.wikimedia.org/static/ . Don't link to it or use it on wikis anywhere yet; there are probably lots of ancillary details to sort out, review, and/or fix first.

Note also that because we took whatever spares were available here and there, the cluster config isn't really ideal or standard. The cache cluster for this is a pair of older non-SSD nodes in eqiad, while the service hosts are over in codfw.

awesome

BBlack updated the task description. · Aug 8 2015, 12:44 AM
Yurik added a comment. · Aug 8 2015, 1:34 AM

@BBlack, thank you for doing this on such short notice! The results so far are amazing! We will be doing a performance evaluation and regenerating the tiles to fix the most outstanding issues. Once again, it is awesome!

Yurik closed this task as Resolved. · Aug 8 2015, 9:19 AM
Yurik added a subscriber: mark.

@BBlack, re non-SSD -- I think there are a few smaller unused SSDs that might be used to upgrade these servers. @mark mentioned them when we were considering the storage for the codfw servers.

Also, I will close this issue and move cache invalidation to a separate task. Max is working on the HTCP component, and once that is done we can figure out the rest. Thanks!

Yurik updated the task description. · Aug 8 2015, 9:19 AM
Yurik moved this task from Stalled/Waiting to Done on the Maps-Sprint board. · Aug 8 2015, 11:45 AM
Ironholds reopened this task as Open. · Aug 8 2015, 2:23 PM

Not done; to repeat my questions above:

  1. Is this going to be integrated with the existing kafka pipelines for logging?
  2. If so, what source cluster will it be under? web, misc, a new "maps" one..?
BBlack added a comment. · Aug 8 2015, 3:10 PM

It will be a new "maps" one, and yes there's probably still work to do there on my side and otto's to hook up the ends of the pipes. But it's using our same basic puppetization infrastructure as e.g. upload, text, etc.

Gotcha; awesome! Long as the pipes get hooked up I'm all happy :).

(We need a new task for that, or..?)

BBlack added a subscriber: Ottomata. · Aug 8 2015, 4:18 PM

No idea, I'm going to ping @Ottomata Monday and talk it out with him first and then see what we need to do.

BBlack moved this task from Triage to In Progress on the Traffic board. · Aug 8 2015, 4:38 PM
Yurik moved this task from Done to Stalled/Waiting on the Maps-Sprint board. · Aug 8 2015, 6:16 PM

Change 230535 had a related patch set uploaded (by Ottomata):
Make camus import webrequest_maps from new maps varnish cluster

https://gerrit.wikimedia.org/r/230535

Yup.

I just created the webrequest_maps kafka topic:

kafka topic --create --topic webrequest_maps --partitions 12 --replication-factor 3

And we'll also need:
https://gerrit.wikimedia.org/r/#/c/230535/

As long as the new varnish maps cluster has varnishkafka configured in the same way as elsewhere (with topic => 'webrequest_maps'), then this should just work.

@BBlack, just a note for this in the future. We have recently started running Kafka with auto.create.topics.enable set, which means if anything starts producing to a Kafka topic that doesn't exist, that topic will be created. Unfortunately, we cannot delete topics in Kafka. It doesn't really hurt to have unused topics in Kafka, but it is just ugly and I have to look at them. I just wanted you to know in case you fire up varnishkafka instances in places with a new topic name set.
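
One way to avoid accidentally auto-creating a topic is to check the intended name against the topics that already exist before starting a producer. Below is a minimal Python sketch of such a guard. It assumes a `webrequest_<cluster>` naming pattern (only `webrequest_maps` is confirmed above), the function names are hypothetical, and the list of existing topics would have to come from whatever Kafka admin tooling is in use.

```python
from typing import Iterable


def webrequest_topic(cluster: str) -> str:
    """Build a topic name, assuming the webrequest_<cluster> pattern."""
    return f"webrequest_{cluster}"


def ensure_topic_exists(topic: str, existing_topics: Iterable[str]) -> str:
    """Return the topic if it already exists, otherwise raise.

    Hypothetical guard: with auto.create.topics.enable set, producing to a
    misspelled or new topic name would silently create it, so refuse early.
    """
    if topic not in set(existing_topics):
        raise ValueError(
            f"topic {topic!r} does not exist; create it explicitly first"
        )
    return topic
```

For example, `ensure_topic_exists(webrequest_topic("maps"), topics)` returns the name unchanged once `webrequest_maps` is in `topics`, and raises `ValueError` for a typo such as `webrequest_mapz`.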

Change 230539 had a related patch set uploaded (by BBlack):
Add webrequest_maps kafka topic output for cache_maps

https://gerrit.wikimedia.org/r/230539

Change 230546 had a related patch set uploaded (by BBlack):
fix description for maps LVS

https://gerrit.wikimedia.org/r/230546

Change 230547 had a related patch set uploaded (by BBlack):
remove icinga monitoring for maps.wm.o for now (not production)

https://gerrit.wikimedia.org/r/230547

Change 230546 merged by BBlack:
fix description for maps LVS

https://gerrit.wikimedia.org/r/230546

Change 230547 merged by BBlack:
remove icinga monitoring for maps.wm.o for now (not production)

https://gerrit.wikimedia.org/r/230547

Change 230539 merged by Ottomata:
Add webrequest_maps kafka topic output for cache_maps

https://gerrit.wikimedia.org/r/230539

Change 231726 had a related patch set uploaded (by BBlack):
maps.wm.o: turn back on, but only for beta self referer

https://gerrit.wikimedia.org/r/231726

Change 231726 merged by BBlack:
maps.wm.o: turn back on, but only for beta self referer

https://gerrit.wikimedia.org/r/231726

Yurik closed this task as Resolved. · Aug 15 2015, 1:11 AM

Per IRC with @BBlack, closing this task as complete.

<bblack> there are outstanding spinoff issues like purging, or like actually deploying a production-ready variant of this better

So we will need more tasks to resolve them, but separate from this one.

Change 230535 abandoned by Ottomata:
Make camus import webrequest_maps from new maps varnish cluster

Reason:
This was done by joseph in another commit.

https://gerrit.wikimedia.org/r/230535

BBlack moved this task from In Progress to Done on the Traffic board. · Aug 27 2015, 2:41 AM