
Spike: Can/should Swift be used as Flink checkpoint backend?
Closed, Resolved · Public

Description

As a WDQS Streaming Updater maintainer, I want the updater to use external storage that is widely adopted within WMF for checkpointing, so that I can minimize the effort required to maintain this component of the solution.

AC:

  • Plan to move forward on the Swift-backed checkpoint store, or an informed decision not to do so.

-> decision is to move forward with Swift

Event Timeline

Gehel triaged this task as High priority. Sep 15 2020, 7:42 AM

The plan is to timebox this spike at 2 days.

@Zbyszko re: docker and swift. @JMeybohm suggested using https://github.com/swiftstack/docker-swift (and possibly lowering auth token TTLs to make sure renewing expired tokens works as expected)

re: monitoring, we have production dashboards at https://grafana.wikimedia.org/d/OPgmB1Eiz/swift. I'm not sure about monitoring docker-swift; at a very basic level, though, Swift will send statsd metrics out, so you can temporarily send them to statsd.eqiad.wmnet for testing purposes.

HTH!


@Ottomata There's some confusion on how to access Swift from the Hadoop cluster - I understood that it isn't doable, but from what I hear, search pipeline results go there. Can we reuse the same mechanism here? I'd rather have a setup done with the current updater, so we have a working end-to-end PoC before we go to production.

Context in T219544 - IIRC your team uses a specific Oozie util to push data to Swift in the coordinator's definition:

https://github.com/wikimedia/wikimedia-discovery-analytics/commit/02fddbfc949b665457354b910bef43da9c421412 (see swift_upload occurrences)

There also seems to be some Airflow code, but I don't know much about it.


@fgiunchedi We estimate we'd need around 500GB of storage for the streaming updater (not accounting for replicas). Our use case is almost always write-only (checkpoints are only read on pipeline restarts, which ideally will happen rarely) - but we have some elasticity when it comes to the configuration of the checkpoint interval.

500G seems reasonable to me; do you have a sense of how/if this size increases over time? (Need to know in general, but not a blocker ATM.) Do you know if Flink also takes care of creating the containers once it has access to Swift? I'm asking because on container creation we can pick whether data will be stored on HDD or SSD.

@fgiunchedi unfortunately, there is no Docker on stat instances, so I'm unable to test Swift that way. I'd still prefer to have a container on an already running service (whichever is accessible from the Analytics cluster). The test we want to set up would involve a longer-running service, starting from longer checkpoint intervals and going back to shorter ones. We want to set up a full end-to-end solution, so it'd make sense to use a stable setup anyway.

I understand where you are coming from and wanting to set up a full end-to-end solution. Where is (local?) testing of the Flink/WDQS pipeline happening at the moment? I'm asking because wherever that environment is, it'd be good to have an (even minimal, single-host) Swift cluster there to test the integration.

@fgiunchedi Currently, the Flink pipeline resides on the Analytics Hadoop cluster. As for the question of whether Flink creates its containers - I think not; it did complain when there was no container, so I assume it expects one to exist.
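(For reference, one way to pre-create the checkpoint container outside the pipeline is a few lines against the Swift API, e.g. with the JOSS Java client; the library, container name and credentials below are illustrative assumptions, not something already in use here.)

import org.javaswift.joss.client.factory.AccountConfig;
import org.javaswift.joss.client.factory.AccountFactory;
import org.javaswift.joss.client.factory.AuthenticationMethod;
import org.javaswift.joss.model.Account;
import org.javaswift.joss.model.Container;

public class EnsureCheckpointContainer {
    public static void main(String[] args) {
        AccountConfig config = new AccountConfig();
        config.setAuthUrl("https://thanos-swift.discovery.wmnet/auth/v1.0");
        config.setUsername("wdqs:flink");                               // placeholder account:user
        config.setPassword(System.getenv("SWIFT_KEY"));                 // placeholder secret
        config.setAuthenticationMethod(AuthenticationMethod.TEMPAUTH);  // Swift V1 auth

        Account account = new AccountFactory(config).createAccount();

        // Create the container only if it is missing; container creation is also the
        // point where the storage policy (HDD vs SSD) is decided on the Swift side.
        Container container = account.getContainer("flink-checkpoints"); // placeholder name
        if (!container.exists()) {
            container.create();
        }
    }
}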


Ack, thank you! For a POC / test I'd still prefer production resources not to be used, especially as we don't know (I think?) what the write patterns are like. At the same time, it doesn't look like running Docker on stat hosts is a thing (?) (cc @Ottomata @elukey ?). Do you have a sense (e.g. order of magnitude) of how many writes we're talking about (for the test/POC and also at full scale), at what concurrency, and how much data would be written at a time? That'd help greatly with understanding whether we can go ahead with the production Swift accounts.

We lack precise data for production - we haven't really optimised yet and the complete functionality isn't ready (it will be soon, though). Rarely, we get checkpoints of around 8-9GB (when bootstrapping, for example), but they do not happen regularly. Normally, checkpoints are incremental and every 30 seconds they write sub-1MB of data. We may change those values, but we'll still use incremental checkpoints.

It's hard to tell when it comes to concurrency. It should be something around 20-30 threads writing small amounts of data, but due to the nature of the process, it's hard to know how many of these will be writing at the same time.
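(For reference, a minimal sketch of the checkpoint settings discussed above - 30s interval, incremental RocksDB checkpoints - assuming Flink's RocksDB state backend and a hypothetical swift:// checkpoint URI provided by a filesystem plugin.)

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointConfigSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 30 seconds (the interval we currently run with; tunable).
        env.enableCheckpointing(30_000, CheckpointingMode.EXACTLY_ONCE);

        // RocksDB state backend with incremental checkpoints enabled (second argument),
        // so only the delta since the previous checkpoint is uploaded to the backend.
        // The URI scheme/container are hypothetical and depend on the Swift FS plugin.
        env.setStateBackend(
            new RocksDBStateBackend("swift://flink-checkpoints/wdqs-updater", true));

        // ... job topology goes here ...
        env.execute("checkpoint-config-sketch");
    }
}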

Ok, thank you for the information. It doesn't seem we have an isolated test environment anyway, so even though I'm reluctant, we'll have to test on the production Swift cluster. A middle ground, I suppose, would be to create the accounts on the thanos Swift cluster first, which is functionally the same as production but not in the hot path for serving content. Let me know how you'd like to proceed!

I'm fine with the thanos cluster option - we can proceed with that. @Ottomata do you know if the thanos swift cluster is accessible from Hadoop?

I don't know! @fgiunchedi how does one access the cluster? @elukey can check the network VLAN ACLs and update accordingly.


The canonical url is https://thanos-swift.discovery.wmnet

Ok, it looks like we need to allow traffic to 10.2.2.54:443 and 10.2.1.54:443 from the Analytics VLAN. @elukey can you add that? TY!

term swift {
    from {
        destination-address {
            /* swift.svc.codfw */
            10.2.1.27/32;
            /* swift.svc.eqiad */
            10.2.2.27/32;
        }
        protocol tcp;
        destination-port 443;
    }
    then accept;
}

I'll add those in the swift term then!

Change 635319 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/homer/public@master] Add Thanos Swift endpoints to the analytics-in4 filter

https://gerrit.wikimedia.org/r/635319

Change 635319 merged by Elukey:
[operations/homer/public@master] Add Thanos Swift endpoints to the analytics-in4 filter

https://gerrit.wikimedia.org/r/635319

elukey@stat1004:~$ telnet thanos-swift.discovery.wmnet 443
Trying 10.2.2.54...
Connected to thanos-swift.discovery.wmnet.
Escape character is '^]'.
^]

:)

Change 635501 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] hieradata: add Swift account for wdqs

https://gerrit.wikimedia.org/r/635501

Thank you all for swift (pun intended) action!

Change 635501 merged by Filippo Giunchedi:
[operations/puppet@production] hieradata: add Swift account for wdqs

https://gerrit.wikimedia.org/r/635501


haha! The account is set up now; I've written the credentials in your home on deploy1001.

Great! thanks - I'll get on that.

Progress so far:

I managed to connect to Swift via the command line and create a container. Unfortunately, we use V1 auth (or at least I don't know about any newer method) and the Swift client that Flink uses only supports V2+. I'll try the S3 compatibility layer now, unless there is a way of authenticating with our Swift cluster through V2 auth.
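(For context, Swift's V1 auth, a.k.a. tempauth, is just an HTTP GET against /auth/v1.0 that exchanges a user/key pair for a token plus a storage URL, whereas the client bundled with Flink speaks the Keystone V2+ API. A rough sketch of the V1 exchange with plain JDK HTTP; the account name and key are placeholders.)

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class TempAuthSketch {
    public static void main(String[] args) throws IOException {
        // Swift V1 (tempauth): GET the auth endpoint with user/key headers...
        URL authUrl = new URL("https://thanos-swift.discovery.wmnet/auth/v1.0");
        HttpURLConnection conn = (HttpURLConnection) authUrl.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("X-Auth-User", "wdqs:flink");              // placeholder account:user
        conn.setRequestProperty("X-Auth-Key", System.getenv("SWIFT_KEY")); // placeholder secret

        // ...and read the token and storage URL back from the response headers.
        // Subsequent object requests go to storageUrl with X-Auth-Token set.
        String token = conn.getHeaderField("X-Auth-Token");
        String storageUrl = conn.getHeaderField("X-Storage-Url");
        System.out.println("storage url: " + storageUrl + ", got token: " + (token != null));
        conn.disconnect();
    }
}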

There are unfortunately issues with S3, too. One implementation (Presto) doesn't seem to work anymore since the introduction of RecoverableWriters; the other (Hadoop) produces class-loading errors (I'll post them once I finish my research). Right now I'm estimating how complicated it would be to implement our own Flink FileSystem.
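(For anyone curious what "our own Flink FileSystem implementation" would entail: Flink discovers filesystems through a FileSystemFactory, registered via META-INF/services and dropped into the plugins/ directory, and the factory returns a FileSystem subclass that maps open/create/list/delete/rename onto Swift object operations. Below is a rough, non-functional skeleton of the factory side; the scheme and class names are made up for illustration.)

import java.io.IOException;
import java.net.URI;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.FileSystemFactory;

// Skeleton only. A real implementation would also ship a FileSystem subclass
// implementing getUri/open/create/listStatus/delete/mkdirs/rename/getFileStatus
// on top of the Swift REST API, plus a
// META-INF/services/org.apache.flink.core.fs.FileSystemFactory entry so the
// plugin loader can find the factory.
public class SwiftFileSystemFactorySketch implements FileSystemFactory {

    private Configuration flinkConfig;

    // Called by Flink so the factory can read swift.* options
    // (auth URL, credentials, container, ...) from flink-conf.yaml.
    public void configure(Configuration config) {
        this.flinkConfig = config;
    }

    @Override
    public String getScheme() {
        // URIs like swift://container/path would be routed to this factory.
        return "swift";
    }

    @Override
    public FileSystem create(URI fsUri) throws IOException {
        // Here the real plugin would authenticate (tempauth in our case) and
        // return the Swift-backed FileSystem instance for fsUri.
        throw new UnsupportedOperationException("sketch only - not implemented");
    }
}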

I'm happy to report that I managed to get the Swift-Flink integration running - after reimplementing authorization (tempauth wasn't supported in the original Swift plugin). Unfortunately, this implementation also suffers from the lack of recoverable writers, but it turned out that this is only an issue for the sink implementation. That means we can't use it as a sink for spurious/late events or ignored mutations, but since those aren't as crucial to the continued healthy run of the pipeline, we can figure out a different solution for them.
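(The checkpoint/sink split comes from the fact that checkpoint uploads only need FileSystem#create() to write a stream, while the streaming sink path asks for FileSystem#createRecoverableWriter(), which filesystems have to opt into explicitly. A tiny sketch of the call that fails; the swift:// path is hypothetical.)

import java.io.IOException;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;
import org.apache.flink.core.fs.RecoverableWriter;

public class RecoverableWriterProbe {
    public static void main(String[] args) throws IOException {
        // Resolve a path through the (hypothetical) swift:// scheme and ask for a
        // recoverable writer: this throws UnsupportedOperationException unless the
        // plugin implements one. Checkpoint writes don't take this code path; sinks do.
        FileSystem fs = new Path("swift://flink-checkpoints/out").getFileSystem();
        RecoverableWriter writer = fs.createRecoverableWriter(); // throws for this plugin
    }
}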

Change 643257 had a related patch set uploaded (by ZPapierski; owner: ZPapierski):
[wikidata/query/flink-swift-plugin@master] Flink Swift FS plugin with TempAuth

https://gerrit.wikimedia.org/r/643257

Change 643257 merged by jenkins-bot:
[wikidata/query/flink-swift-plugin@master] Flink Swift FS plugin with TempAuth

https://gerrit.wikimedia.org/r/643257

Gehel updated the task description. (Show Details)