
Spike: Can/should Swift be used as Flink checkpoint backend?
Closed, Resolved · Public

Description

As a WDQS Streaming Updater maintainer, I want the updater to use external storage that is widely adopted within WMF for checkpointing, so that I can minimize the effort required to maintain this component of the solution.

AC:

  • Plan to move forward on the Swift-backed checkpoint store, or an informed decision not to do so.

-> decision is to move forward with Swift

Event Timeline

Gehel triaged this task as High priority. Sep 15 2020, 7:42 AM

The plan is to timebox this spike at 2 days.

@Zbyszko re: docker and swift. @JMeybohm suggested using https://github.com/swiftstack/docker-swift (and possibly lowering auth token TTLs to make sure renewing expired tokens works as expected)

re: monitoring, we have production dashboards at https://grafana.wikimedia.org/d/OPgmB1Eiz/swift. I'm not sure about monitoring docker-swift; at a very basic level, though, Swift will send statsd metrics out, so you can temporarily send them to statsd.eqiad.wmnet for testing purposes.

HTH!


@Ottomata There's some confusion on how to access Swift from the Hadoop cluster - I understood that it isn't doable, but from what I hear, search pipeline results go there. Can we reuse the same mechanism here? I'd rather have a setup done with the current updater, so we have a working end-to-end PoC before we go to production.

Context in T219544 - IIRC your team uses a specific Oozie util to push data to Swift in the coordinator's definition:

https://github.com/wikimedia/wikimedia-discovery-analytics/commit/02fddbfc949b665457354b910bef43da9c421412 (see swift_upload occurrences)

There also seems to be some Airflow code, but I don't know much about it.


@fgiunchedi We estimate we'd need around 500GB of storage for the streaming updater (not accounting for replicas). Our use case is almost always write-only (checkpoints are only read on pipeline restarts, which ideally will happen rarely) - but we have some elasticity when it comes to the configuration of the checkpoint interval.

500G seems reasonable to me; do you have a sense of how/if this size increases over time? (Need to know in general, but not a blocker ATM.) Do you know if Flink also takes care of creating the containers once it has access to Swift? I'm asking because on container creation we can pick whether data will be stored on HDD or SSD.

@fgiunchedi unfortunately, there is no Docker on stat instances, so I'm unable to test Swift that way. I'd still prefer to have a container on an already running service (whichever is accessible from the Analytics cluster). The test we want to set up would involve a longer-running service, starting from longer checkpoint intervals and going back to shorter ones. We want to set up a full end-to-end solution, so it'd make sense to use a stable setup anyway.

I understand where you are coming from and wanting to set up a full end-to-end solution. Where is (local?) testing of the Flink/WDQS pipeline happening at the moment? I'm asking because wherever that environment is, it'd be good to have an (even minimal, single-host) Swift cluster there to test the integration.

@fgiunchedi Currently, the Flink pipeline resides on the Analytics Hadoop cluster. As for the question of whether Flink creates its containers - I think not; it did complain when there was no container, so I assume it expects one to exist.
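(For reference, one way to pre-create the checkpoint container outside the pipeline is a few lines against the Swift API, e.g. with the JOSS Java client; the library, container name and credentials below are illustrative assumptions, not something already in use here.)

import org.javaswift.joss.client.factory.AccountConfig;
import org.javaswift.joss.client.factory.AccountFactory;
import org.javaswift.joss.client.factory.AuthenticationMethod;
import org.javaswift.joss.model.Account;
import org.javaswift.joss.model.Container;

public class EnsureCheckpointContainer {
    public static void main(String[] args) {
        AccountConfig config = new AccountConfig();
        config.setAuthUrl("https://thanos-swift.discovery.wmnet/auth/v1.0");
        config.setUsername("wdqs:flink");                               // placeholder account:user
        config.setPassword(System.getenv("SWIFT_KEY"));                 // placeholder secret
        config.setAuthenticationMethod(AuthenticationMethod.TEMPAUTH);  // Swift V1 auth

        Account account = new AccountFactory(config).createAccount();

        // Create the container only if it is missing; container creation is also the
        // point where the storage policy (HDD vs SSD) is decided on the Swift side.
        Container container = account.getContainer("flink-checkpoints"); // placeholder name
        if (!container.exists()) {
            container.create();
        }
    }
}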


Ack, thank you! For a POC / test I'd still prefer production resources not to be used, especially as we don't know (I think?) what the write patterns are like. At the same time, it doesn't look like running Docker on stat hosts is a thing (?) (cc @Ottomata @elukey ?). Do you have a sense (e.g. order of magnitude) of how many writes we're talking about (for the test/POC and also at full scale), at what concurrency, and how much data would be written at a time? That'd help greatly with understanding whether we can go ahead with the production Swift accounts.

We lack precise data for production - we haven't really optimised yet and the complete functionality isn't ready (it will be soon, though). Rarely, we get checkpoints of around 8-9GB (when bootstrapping, for example), but they do not happen regularly. Normally, checkpoints are incremental and every 30 seconds they write sub-1MB of data. We may change those values, but we'll still use incremental checkpoints.

It's hard to tell when it comes to concurrency. It should be something around 20-30 threads writing small amounts of data, but due to the nature of the process, it's hard to know how many of these will be writing at the same time.
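(For reference, a minimal sketch of the checkpoint settings discussed above - 30s interval, incremental RocksDB checkpoints - assuming Flink's RocksDB state backend and a hypothetical swift:// checkpoint URI provided by a filesystem plugin.)

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointConfigSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 30 seconds (the interval we currently run with; tunable).
        env.enableCheckpointing(30_000, CheckpointingMode.EXACTLY_ONCE);

        // RocksDB state backend with incremental checkpoints enabled (second argument),
        // so only the delta since the previous checkpoint is uploaded to the backend.
        // The URI scheme/container are hypothetical and depend on the Swift FS plugin.
        env.setStateBackend(
            new RocksDBStateBackend("swift://flink-checkpoints/wdqs-updater", true));

        // ... job topology goes here ...
        env.execute("checkpoint-config-sketch");
    }
}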

Ok, thank you for the information. It doesn't seem we have an isolated test environment anyway, so even though I'm reluctant, we'll have to test on the production Swift cluster. A middle ground, I suppose, would be to create the accounts on the thanos Swift cluster first, which is functionally the same as production but not in the hot path for serving content. Let me know how you'd like to proceed!

I'm fine with the thanos cluster option - we can proceed with that. @Ottomata do you know if the thanos swift cluster is accessible from Hadoop?

I don't know! @fgiunchedi how does one access the cluster? @elukey can check the network VLAN ACLs and update accordingly.


The canonical url is https://thanos-swift.discovery.wmnet

Ok, it looks like we need to allow traffic to 10.2.2.54:443 and 10.2.1.54:443 from the Analytics VLAN. @elukey can you add that? TY!

term swift {
    from {
        destination-address {
            /* swift.svc.codfw */
            10.2.1.27/32;
            /* swift.svc.eqiad */
            10.2.2.27/32;
        }
        protocol tcp;
        destination-port 443;
    }
    then accept;
}

I'll add those in the swift term then!

Change 635319 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/homer/public@master] Add Thanos Swift endpoints to the analytics-in4 filter

https://gerrit.wikimedia.org/r/635319

Change 635319 merged by Elukey:
[operations/homer/public@master] Add Thanos Swift endpoints to the analytics-in4 filter

https://gerrit.wikimedia.org/r/635319

elukey@stat1004:~$ telnet thanos-swift.discovery.wmnet 443
Trying 10.2.2.54...
Connected to thanos-swift.discovery.wmnet.
Escape character is '^]'.
^]

:)

Change 635501 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] hieradata: add Swift account for wdqs

https://gerrit.wikimedia.org/r/635501

Thank you all for swift (pun intended) action!

Change 635501 merged by Filippo Giunchedi:
[operations/puppet@production] hieradata: add Swift account for wdqs

https://gerrit.wikimedia.org/r/635501


haha! The account is set up now; I've written the credentials in your home on deploy1001.

Great! thanks - I'll get on that.

Progress so far:

I managed to connect to Swift via the command line and create a container. Unfortunately, we use V1 auth (or at least I don't know about any newer method) and the Swift client that Flink uses only supports V2+. I'll try the S3 compatibility layer now, unless there is a way of authenticating with our Swift cluster through V2 auth.
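(For context, Swift's V1 auth, a.k.a. tempauth, is just an HTTP GET against /auth/v1.0 that exchanges a user/key pair for a token plus a storage URL, whereas the client bundled with Flink speaks the Keystone V2+ API. A rough sketch of the V1 exchange with plain JDK HTTP; the account name and key are placeholders.)

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class TempAuthSketch {
    public static void main(String[] args) throws IOException {
        // Swift V1 (tempauth): GET the auth endpoint with user/key headers...
        URL authUrl = new URL("https://thanos-swift.discovery.wmnet/auth/v1.0");
        HttpURLConnection conn = (HttpURLConnection) authUrl.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("X-Auth-User", "wdqs:flink");              // placeholder account:user
        conn.setRequestProperty("X-Auth-Key", System.getenv("SWIFT_KEY")); // placeholder secret

        // ...and read the token and storage URL back from the response headers.
        // Subsequent object requests go to storageUrl with X-Auth-Token set.
        String token = conn.getHeaderField("X-Auth-Token");
        String storageUrl = conn.getHeaderField("X-Storage-Url");
        System.out.println("storage url: " + storageUrl + ", got token: " + (token != null));
        conn.disconnect();
    }
}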

There are unfortunately issues with S3, too. One implementation (Presto) doesn't seem to work anymore since the introduction of RecoverableWriters; the other (Hadoop) produces class-loading errors (I'll post them once I finish my research). Right now I'm estimating how complicated it would be to implement our own Flink FileSystem.
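(For anyone curious what "our own Flink FileSystem implementation" would entail: Flink discovers filesystems through a FileSystemFactory, registered via META-INF/services and dropped into the plugins/ directory, and the factory returns a FileSystem subclass that maps open/create/list/delete/rename onto Swift object operations. Below is a rough, non-functional skeleton of the factory side; the scheme and class names are made up for illustration.)

import java.io.IOException;
import java.net.URI;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.FileSystemFactory;

// Skeleton only. A real implementation would also ship a FileSystem subclass
// implementing getUri/open/create/listStatus/delete/mkdirs/rename/getFileStatus
// on top of the Swift REST API, plus a
// META-INF/services/org.apache.flink.core.fs.FileSystemFactory entry so the
// plugin loader can find the factory.
public class SwiftFileSystemFactorySketch implements FileSystemFactory {

    private Configuration flinkConfig;

    // Called by Flink so the factory can read swift.* options
    // (auth URL, credentials, container, ...) from flink-conf.yaml.
    public void configure(Configuration config) {
        this.flinkConfig = config;
    }

    @Override
    public String getScheme() {
        // URIs like swift://container/path would be routed to this factory.
        return "swift";
    }

    @Override
    public FileSystem create(URI fsUri) throws IOException {
        // Here the real plugin would authenticate (tempauth in our case) and
        // return the Swift-backed FileSystem instance for fsUri.
        throw new UnsupportedOperationException("sketch only - not implemented");
    }
}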

I'm happy to report that I managed to get the Swift-Flink integration running - after reimplementing authorization (tempauth wasn't supported in the original Swift plugin). Unfortunately, this implementation also suffers from the lack of recoverable writers, but it turned out that this is only an issue for the sink implementation. That means we can't use it as a sink for spurious/late events or ignored mutations, but since those aren't as crucial to the continued healthy run of the pipeline, we can figure out a different solution for them.
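(The checkpoint/sink split comes from the fact that checkpoint uploads only need FileSystem#create() to write a stream, while the streaming sink path asks for FileSystem#createRecoverableWriter(), which filesystems have to opt into explicitly. A tiny sketch of the call that fails; the swift:// path is hypothetical.)

import java.io.IOException;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;
import org.apache.flink.core.fs.RecoverableWriter;

public class RecoverableWriterProbe {
    public static void main(String[] args) throws IOException {
        // Resolve a path through the (hypothetical) swift:// scheme and ask for a
        // recoverable writer: this throws UnsupportedOperationException unless the
        // plugin implements one. Checkpoint writes don't take this code path; sinks do.
        FileSystem fs = new Path("swift://flink-checkpoints/out").getFileSystem();
        RecoverableWriter writer = fs.createRecoverableWriter(); // throws for this plugin
    }
}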

Change 643257 had a related patch set uploaded (by ZPapierski; owner: ZPapierski):
[wikidata/query/flink-swift-plugin@master] Flink Swift FS plugin with TempAuth

https://gerrit.wikimedia.org/r/643257

Change 643257 merged by jenkins-bot:
[wikidata/query/flink-swift-plugin@master] Flink Swift FS plugin with TempAuth

https://gerrit.wikimedia.org/r/643257

Gehel updated the task description. (Show Details)