
Swift container for performance flame graphs (ArcLamp)
Open, High, Public

Description

The ArcLamp pipeline collects stack traces (via Redis) and produces the flame graphs seen at https://performance.wikimedia.org/php-profiling/.

Right now, the processed logfiles and SVGs are stored in /srv/xenon on webperf1002, and processing runs via cron on that host. We would like to run the pipeline in a more distributed manner (T227026), increase retention (T200108), and not require bespoke backup/restore/failover procedures for this data. I believe the path forward is thus to store this data in Swift. The analytics cluster was also considered, however the data needs to be externally available via HTTPS on the performance site.

I've done some initial work towards rewriting the cron job to read/write data from a local Swift instance on my laptop. I would like to start running this on real data, initially in parallel with the current pipeline.

This task is to determine replication, etc. settings and create Swift container(s) for this data. As I do not have admin rights in the Swift cluster, some SRE input & assistance is requested.

Event Timeline

Looks like hieradata/(eqiad|codfw)/params.yaml needs updating, along with the private puppet repo (beforehand).

I'd suggest "performance:arc-lamp" (rather than "performance:xhgui") as the Swift account:user. It could have admin rights for "performance" (analogous to our main MediaWiki user, which can create containers automatically). Manual container creation can be done via https://wikitech.wikimedia.org/wiki/Swift/How_To#Create_a_container (the "swift" client tool is just doing the HTTP POST for you here).
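
As an illustration of that last point, here is a minimal sketch of creating one of the containers with python-swiftclient instead of the CLI; the auth URL and key are placeholders, and the account:user just follows the suggestion above.

```
import swiftclient

# Placeholder credentials for the suggested "performance" account; the real
# values would live in the private puppet repo.
conn = swiftclient.Connection(
    authurl='https://ms-fe.svc.eqiad.wmnet/auth/v1.0',  # placeholder auth endpoint
    user='performance:arc-lamp',
    key='REDACTED',
)

# Create the container and mark it world-readable (".r:*") so its objects
# can later be served publicly.
conn.put_container('arclamp-svgs-daily',
                   headers={'X-Container-Read': '.r:*'})
```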

Great to see this work! Re: authentication and permissions, it is indeed as @aaron outlined: we'd create a user that can create containers and upload files at will.

A bunch of questions to get a better idea of the dataset: how big are you expecting this data to get over time? Does the data need to be replicated in codfw too (assuming you'd start writing in eqiad)? Our standard replication within the datacenter is 3x copies. We have a standard rate limit of 30/s write operations (PUT/DELETE/POST); mentioning it in case it is relevant. Also, in terms of access to the files, I'd imagine the containers would be public for reads and accessed by reverse-proxying via the webserver on the performance website?

Looking at yesterday's (2020-02-11) output, it was about 8 GB of (uncompressed) logs and 14 MB of SVGs, and about 800 files total. We can control the sampling interval to regulate how big these get, so let's assume it's relatively constant. I'll have to check if there's a reason we don't compress the logs; I feel like we should, which would dramatically reduce this. (I just now tried gzip -1 on one set of logs, and they went from 4 GB to 479 MB.)

Yes, we want this data to be available in both data centers. Right now, the entire pipeline is duplicated in both places: we read from Redis twice, and generate two sets of files, one in each location. We could save some resources if replication happened at the file level instead. The Redis pubsub channel tracks what data has been consumed, so it shouldn't be too difficult to make processing fail over reliably.

At 800 files/day, we'll be well under the rate limit (assuming it's per-file and not per-block). The only time I could think it'd be an issue would be if the pipeline breaks and we have to backfill data, but so long as exceeding the rate limit generates a unique error, we can add logic to back off if we get throttled.
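
To make that back-off idea concrete, here is a hedged sketch (not existing arc-lamp code); which HTTP status the Swift proxy returns when throttling is an assumption here, so the codes in the tuple are illustrative.

```
import time

from swiftclient.exceptions import ClientException


def put_with_backoff(conn, container, name, contents, max_tries=5):
    """Upload an object, retrying with exponential back-off if throttled."""
    for attempt in range(max_tries):
        try:
            # contents is bytes/str here so it can safely be re-sent on retry.
            return conn.put_object(container, name, contents)
        except ClientException as exc:
            # Status codes assumed to indicate throttling/overload; adjust to
            # whatever the proxy actually returns when rate-limiting.
            if exc.http_status not in (429, 498, 503) or attempt == max_tries - 1:
                raise
            time.sleep(2 ** attempt)
```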

Are public Swift containers directly accessible? If so, I can think of points both for and against a reverse proxy, instead of linking directly. If Swift reads can only happen from inside, then the reverse proxy is definitely how we'll go.

Compression seems doable. LZMA works well per https://phabricator.wikimedia.org/T235455#5837382 . arclamp-grep would have to change though; maybe grep(fname, search_string) could stream zipped log object contents to lzcat and loop through the resulting lines.
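
A minimal sketch of that grep(fname, search_string) idea, assuming the logs are stored as LZMA/xz-compressed objects: rather than shelling out to lzcat, this uses Python's built-in lzma module to decompress the stream incrementally, which has the same effect. The names here are illustrative, not the actual arclamp-grep code.

```
import lzma


def grep(conn, container, fname, search_string):
    """Yield decompressed log lines from a Swift object that contain search_string."""
    # resp_chunk_size makes python-swiftclient return the body as an iterator
    # of chunks instead of buffering the whole object in memory.
    _headers, chunks = conn.get_object(container, fname, resp_chunk_size=65536)
    needle = search_string.encode()
    decompressor = lzma.LZMADecompressor()
    buf = b''
    for chunk in chunks:
        buf += decompressor.decompress(chunk)
        # Keep the trailing partial line in the buffer for the next chunk.
        *lines, buf = buf.split(b'\n')
        for line in lines:
            if needle in line:
                yield line.decode('utf-8', 'replace')
    if needle in buf:
        yield buf.decode('utf-8', 'replace')
```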

@fgiunchedi How hard is it to set up hourly swift-repl container sync for eqiad <=> codfw for such a container? That seems doable. It might be interesting to try https://docs.openstack.org/swift/latest/overview_container_sync.html for this. It has far less traffic than the MediaWiki upload repos, so it would be lower risk. My concern would be the fact that the daemon that listens to redis will keep appending JSON lines to the objects representing the "current" buckets (daily, hourly) for each endpoint. That might trigger a lot of naive "copy the whole file" writes each time some JSON profiling lines are flushed.

As long as there is proper buffering of JSON lines into chunks to periodically flush, I don't see the object write rate being a problem.
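
For instance, a toy buffering helper along those lines might look like this (illustrative only, not the actual daemon code): JSON lines accumulate in memory and are flushed in batches, so the object write rate stays low even when profiling lines arrive continuously.

```
import time


class LineBuffer:
    """Accumulate JSON lines and flush them in chunks via a caller-supplied function."""

    def __init__(self, flush_fn, max_lines=1000, max_age=60):
        self.flush_fn = flush_fn      # e.g. re-uploads the "current" object to Swift
        self.max_lines = max_lines
        self.max_age = max_age        # seconds between forced flushes
        self.lines = []
        self.last_flush = time.monotonic()

    def add(self, line):
        self.lines.append(line)
        if (len(self.lines) >= self.max_lines
                or time.monotonic() - self.last_flush >= self.max_age):
            self.flush()

    def flush(self):
        if self.lines:
            self.flush_fn('\n'.join(self.lines) + '\n')
            self.lines = []
        self.last_flush = time.monotonic()
```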

AFAIK, and having looked at puppet today, Swift is only indirectly accessible from the outside. We have upload.wikimedia.org/X, which falls back to a random/close ms-fe* Swift proxy server via DNS discovery. The custom WSGI middleware we have, rewrite.py, runs in the proxy server and reroutes the request as follows (a simplified sketch appears after the list):

  • auth requests (/auth) or supposedly authenticated (AUTH_) URLs pass directly to the core Swift response handler (a valid account name in the URL and the presence of a valid token header are enforced)
  • known public paths as recognized by rewrite.py are rewritten to unauthenticated URLs for the corresponding Swift containers (within the "mw" account) and passed to the core Swift response handler (401s if the container does not have .r:* within its x-container-read ACL)
  • for any other path, an error response is returned
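
For readers unfamiliar with rewrite.py, a greatly simplified sketch of that routing logic follows; the public path and container name are placeholders rather than the real mapping, and the actual middleware in operations/puppet does considerably more.

```
import re


def route(path):
    # Auth requests and already-authenticated URLs pass straight through to
    # the core Swift response handler.
    if path.startswith('/auth') or path.startswith('/v1/AUTH_'):
        return path
    # Known public paths are rewritten to an unauthenticated URL for the
    # corresponding container in the "mw" account; Swift then answers 401
    # unless that container's X-Container-Read ACL includes ".r:*".
    m = re.match(r'^/example-public-prefix/(?P<rest>.+)$', path)  # placeholder pattern
    if m:
        return '/v1/AUTH_mw/example-container/' + m.group('rest')
    # Any other path: the middleware returns an error response.
    return None
```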

> Looking at yesterday's (2020-02-11) output, it was about 8 GB of (uncompressed) logs and 14 MB of SVGs, and about 800 files total. We can control the sampling interval to regulate how big these get, so let's assume it's relatively constant. I'll have to check if there's a reason we don't compress the logs; I feel like we should, which would dramatically reduce this. (I just now tried gzip -1 on one set of logs, and they went from 4 GB to 479 MB.)

Sounds good. From T200108, even without compression it seems the whole dataset is going to be ~200 GB, which is OK.

> Yes, we want this data to be available in both data centers. Right now, the entire pipeline is duplicated in both places: we read from Redis twice, and generate two sets of files, one in each location. We could save some resources if replication happened at the file level instead. The Redis pubsub channel tracks what data has been consumed, so it shouldn't be too difficult to make processing fail over reliably.

> At 800 files/day, we'll be well under the rate limit (assuming it's per-file and not per-block). The only time I could think it'd be an issue would be if the pipeline breaks and we have to backfill data, but so long as exceeding the rate limit generates a unique error, we can add logic to back off if we get throttled.

OK! Yeah, the rate limit doesn't seem to be a problem in practice; we can whitelist the account and/or bump the limits if it comes to that.

> Are public Swift containers directly accessible? If so, I can think of points both for and against a reverse proxy, instead of linking directly. If Swift reads can only happen from inside, then the reverse proxy is definitely how we'll go.

As @aaron mentioned, the containers are accessible via our frontend caching layer; in the case of commons/upload.wikimedia.org, the caching layer talks directly to Swift for historical reasons and we have a custom middleware in Swift to translate container names. For new use cases, having the service/webserver reverse-proxy to Swift is recommended, so the request flow (all HTTPS) will be: clients -> frontend caches -> apache on webperf -> swift.

> @fgiunchedi How hard is it to set up hourly swift-repl container sync for eqiad <=> codfw for such a container? That seems doable. It might be interesting to try https://docs.openstack.org/swift/latest/overview_container_sync.html for this. It has far less traffic than the MediaWiki upload repos, so it would be lower risk. My concern would be the fact that the daemon that listens to redis will keep appending JSON lines to the objects representing the "current" buckets (daily, hourly) for each endpoint. That might trigger a lot of naive "copy the whole file" writes each time some JSON profiling lines are flushed.

Not hard to set up container sync; in fact that's what we are doing for the docker registry (cf. T214289, T227570, plus puppet). Although I'm confused about the JSON file you mentioned: I was under the impression that only the output SVGs and logs would be stored in Swift (i.e. https://performance.wikimedia.org/arclamp/logs/)? In terms of failover, I don't know offhand if it is possible to keep the same container name two-way synchronized (assuming only one writer at a time, or that the object names don't collide).

In tactical/practical terms: I can assist with this and can review patches but my time spent on swift is limited (mostly maintenance) as swift maintainership itself is currently not funded.

HTH!

Change 572129 had a related patch set uploaded (by Dave Pifke; owner: Dave Pifke):
[operations/puppet@production] Add Swift user for ArcLamp

https://gerrit.wikimedia.org/r/572129

I submitted a patch which I *think* does what's needed to create the user, less the private keys. I don't know if there's more to it than this, but hopefully it's a starting point.

Change 572129 merged by Filippo Giunchedi:
[operations/puppet@production] Add Swift user for ArcLamp

https://gerrit.wikimedia.org/r/572129

Mentioned in SAL (#wikimedia-operations) [2020-02-19T08:14:09Z] <godog> roll restart swift proxies - T244776

In case it is useful: we have setup a separate Swift cluster to host Prometheus long term data (https://wikitech.wikimedia.org/wiki/Thanos) which unlike the main swift cluster is multi-site, let me know if that's something you'd be interested in trying.

Change 618626 had a related patch set uploaded (by Dave Pifke; owner: Dave Pifke):
[operations/puppet@production] arclamp: require python-swiftclient

https://gerrit.wikimedia.org/r/618626

The account and container are now working in beta (deployment-ms-fe03.deployment-prep.eqiad.wmflabs):

$ swift list
arclamp-logs-daily
arclamp-logs-hourly
arclamp-svgs-daily
arclamp-svgs-hourly

Some notes:

The labs instance gets its account hieradata (https://gerrit.wikimedia.org/r/572129) from horizon. I had to add AUTH_performance there.

The puppet agent has been failing on this host at least as far back as syslog goes (>7 days), due to:

Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Function lookup() did not find a value for the name 'profile::prometheus::memcached_exporter::arguments' (file: /etc/puppet/modules/profile/manifests/prometheus/memcached_exporter.pp, line: 1) on node deployment-ms-fe03.deployment-prep.eqiad.wmflabs

I think this should have been fixed by https://gerrit.wikimedia.org/r/c/operations/puppet/+/612507, but despite that patch being present, I had to manually add an empty value for the missing key in the host configuration on horizon.

The dummy account credentials in beta were being overridden by a newer local commit in /var/lib/git/labs/private on the puppetmaster, so I had to add another local commit there.

service swift-proxy restart was needed to get it to pick up the new proxy-server.conf after the puppet run (this was not done automatically).

Puppet doesn't generate (and actively deletes if present) the /etc/swift/account_AUTH_*.env files (from the Swift How-To) in the beta environment:

Aug  5 22:07:25 deployment-ms-fe03 puppet-agent[32512]: (/Stage[main]/Swift::Stats::Accounts/Swift::Stats::Stats_account[performance_arclamp]/File[/etc/swift/account_AUTH_performance.env]/ensure) removed

I manually created one in my home directory instead.

Change 618626 merged by Dzahn:
[operations/puppet@production] arclamp: require python-swiftclient

https://gerrit.wikimedia.org/r/618626

webperf1002: Package[python-swiftclient]/ensure: created

Change 622904 had a related patch set uploaded (by Dave Pifke; owner: Dave Pifke):
[operations/puppet@production] arclamp: provide Swift credentials to cron jobs

https://gerrit.wikimedia.org/r/622904

Change 622915 had a related patch set uploaded (by Dave Pifke; owner: Dave Pifke):
[performance/arc-lamp@master] [WIP] Copy SVGs to Swift

https://gerrit.wikimedia.org/r/622915

Change 623068 had a related patch set uploaded (by Dave Pifke; owner: Dave Pifke):
[operations/puppet@production] [WIP] arclamp: serve SVGs from Swift

https://gerrit.wikimedia.org/r/623068

Change 624795 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] deployment-prep: add profile::prometheus::memcached_exporter::arguments: '' to Hiera

https://gerrit.wikimedia.org/r/624795

> The puppet agent has been failing on this host at least as far back as syslog goes
> I think this should have been fixed by https://gerrit.wikimedia.org/r/c/operations/puppet/+/612507, but despite that patch being present, I had to manually add an empty value

Unfortunately lookups that work for prod don't work in cloud as one would expect (--> T255787).

This should do it though:

https://gerrit.wikimedia.org/r/624795

Change 622904 merged by Dzahn:
[operations/puppet@production] arclamp: provide Swift credentials to cron jobs

https://gerrit.wikimedia.org/r/622904

on webperf1002.eqiad.wmnet there are now Swift credentials in /etc/swift/

Change 624795 merged by Dzahn:
[operations/puppet@production] deployment-prep: add profile::prometheus::memcached_exporter::arguments

https://gerrit.wikimedia.org/r/624795

Change 622915 merged by jenkins-bot:
[performance/arc-lamp@master] Copy SVGs and compressed logs to Swift

https://gerrit.wikimedia.org/r/622915

Change 626241 had a related patch set uploaded (by Dave Pifke; owner: Dave Pifke):
[performance/arc-lamp@master] Generate JSON index

https://gerrit.wikimedia.org/r/626241

Change 626779 had a related patch set uploaded (by Dave Pifke; owner: Dave Pifke):
[operations/puppet@production] [WIP] arclamp: fix spurious "environment changed"

https://gerrit.wikimedia.org/r/626779

Change 626779 merged by Dzahn:
[operations/puppet@production] arclamp: fix spurious "environment changed"

https://gerrit.wikimedia.org/r/626779

Change 626241 merged by jenkins-bot:
[performance/arc-lamp@master] Generate JSON index

https://gerrit.wikimedia.org/r/626241

Change 629497 had a related patch set uploaded (by Dave Pifke; owner: Dave Pifke):
[performance/arc-lamp@master] [WIP] Generate HTML index

https://gerrit.wikimedia.org/r/629497

Change 623068 merged by Legoktm:
[operations/puppet@production] arclamp: serve SVGs, compressed logs from Swift

https://gerrit.wikimedia.org/r/623068

Change 673602 had a related patch set uploaded (by Dave Pifke; owner: Dave Pifke):
[operations/puppet@production] arclamp: allow Puppet CA for ms-fe.svc.eqiad.wmnet

https://gerrit.wikimedia.org/r/673602

Change 673602 merged by Legoktm:
[operations/puppet@production] arclamp: enable SSL to ms-fe.svc.eqiad.wmnet

https://gerrit.wikimedia.org/r/673602

Krinkle triaged this task as High priority. Fri, Sep 24, 4:01 PM