
Possible Flink optimizations/cleanups
Closed, Resolved · Public · 8 Estimated Story Points

Description

As part of the Flink review with Ververica, here are the points we agreed to experiment with before our next meeting in January:

  • Remove non-optimized reordering
  • Remove unnecessary chained operators (e.g. routing to side outputs can be done directly inside the same process function)
  • Clean up unnecessary serialization of the Patch class; it might only be necessary to register the Statement interface with Kryo
  • Use DataStreamUtils#reinterpretAsKeyedStream(DataStream<T>, KeySelector<T,K>) and possibly make more use of the KeyedStream signature
  • Try to drop custom parallelism
  • Enable object reuse
  • Test unaligned checkpoints on backfills

Event Timeline

Restricted Application added a subscriber: Aklapper.
CBogen set the point value for this task to 8. · Dec 7 2020, 6:30 PM

Change 656867 had a related patch set uploaded (by ZPapierski; owner: ZPapierski):
[wikidata/query/rdf@master] Remove option for non-optimized reordering

https://gerrit.wikimedia.org/r/656867

Change 657104 had a related patch set uploaded (by ZPapierski; owner: ZPapierski):
[wikidata/query/rdf@master] Inline side output routing

https://gerrit.wikimedia.org/r/657104

Change 657575 had a related patch set uploaded (by ZPapierski; owner: ZPapierski):
[wikidata/query/rdf@master] Remove custom Patch serializer

https://gerrit.wikimedia.org/r/657575

Change 658595 had a related patch set uploaded (by ZPapierski; owner: ZPapierski):
[wikidata/query/rdf@master] Use reinterpretAsKeyedStream on pre-keyed streams

https://gerrit.wikimedia.org/r/658595

Change 656867 merged by jenkins-bot:
[wikidata/query/rdf@master] Remove option for non-optimized reordering

https://gerrit.wikimedia.org/r/656867

Change 657104 merged by jenkins-bot:
[wikidata/query/rdf@master] Inline side output routing

https://gerrit.wikimedia.org/r/657104
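Change 657104 inlines side-output routing into the process function instead of using separate chained operators. A rough sketch of that pattern follows. It is plain Java with no Flink dependency, and it is not the code from the change: OutputTag, Context, process and the tag names are hypothetical stand-ins that only mimic the shape of Flink's ProcessFunction.Context#output(OutputTag, value).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java sketch (no Flink dependency, Java 16+ for the record type) of
 * routing records to side outputs from inside a single process step.
 * OutputTag, Context and process() are hypothetical stand-ins for Flink's
 * OutputTag and ProcessFunction.Context#output(OutputTag, value).
 */
public class SideOutputRouting {

    /** Hypothetical side-output tag, identified by its id. */
    record OutputTag(String id) {}

    /** Collects main-output records and side-output records per tag. */
    static final class Context {
        final List<String> main = new ArrayList<>();
        final Map<OutputTag, List<String>> side = new HashMap<>();

        void collect(String value) {
            main.add(value);
        }

        void output(OutputTag tag, String value) {
            side.computeIfAbsent(tag, t -> new ArrayList<>()).add(value);
        }
    }

    // Hypothetical tag names, for illustration only.
    static final OutputTag LATE = new OutputTag("late-events");
    static final OutputTag INVALID = new OutputTag("invalid-events");

    /**
     * A single pass decides where each record goes, instead of a chain of
     * separate filter/split operators that each inspect every record again.
     */
    static void process(String event, Context ctx) {
        if (event.isBlank()) {
            ctx.output(INVALID, event);
        } else if (event.startsWith("late:")) {
            ctx.output(LATE, event);
        } else {
            ctx.collect(event);
        }
    }

    public static void main(String[] args) {
        Context ctx = new Context();
        for (String e : List.of("Q1", "late:Q2", " ", "Q3")) {
            process(e, ctx);
        }
        System.out.println(ctx.main);              // [Q1, Q3]
        System.out.println(ctx.side.get(LATE));    // [late:Q2]
        System.out.println(ctx.side.get(INVALID)); // [ ]
    }
}
```

In an actual Flink job the same idea uses org.apache.flink.util.OutputTag, ctx.output(tag, value) inside a ProcessFunction, and SingleOutputStreamOperator#getSideOutput(tag) downstream, so one operator replaces a chain of routing steps.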

Change 657575 merged by jenkins-bot:
[wikidata/query/rdf@master] Remove custom Patch serializer

https://gerrit.wikimedia.org/r/657575

Change 658595 merged by jenkins-bot:
[wikidata/query/rdf@master] Use reinterpretAsKeyedStream on pre-keyed streams

https://gerrit.wikimedia.org/r/658595

Change 659313 had a related patch set uploaded (by ZPapierski; owner: DCausse):
[wikidata/query/rdf@master] Drop custom parallelism

https://gerrit.wikimedia.org/r/659313

Change 660776 had a related patch set uploaded (by ZPapierski; owner: ZPapierski):
[wikidata/query/rdf@master] Enable object reuse in streaming updater

https://gerrit.wikimedia.org/r/660776

Change 659313 merged by jenkins-bot:
[wikidata/query/rdf@master] Drop custom parallelism

https://gerrit.wikimedia.org/r/659313

Change 660776 merged by jenkins-bot:
[wikidata/query/rdf@master] Enable object reuse in streaming updater

https://gerrit.wikimedia.org/r/660776
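Change 660776 enables object reuse (in Flink, ExecutionConfig#enableObjectReuse()). The Flink documentation warns that this can lead to bugs in user functions that are not aware of it, because the runtime may hand the same instance along instead of a copy. A function therefore must not keep a reference to an input object across calls. The plain-Java sketch below illustrates that hazard without any Flink dependency. Event, emitAll and the two sinks are hypothetical names, and this is not code from the updater.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Plain-Java illustration (no Flink dependency) of the contract that comes
 * with object reuse: upstream may pass the SAME mutable instance for every
 * record, so downstream code must copy anything it wants to keep.
 */
public class ObjectReuseHazard {

    /** Hypothetical mutable record type, like a POJO flowing through a job. */
    static final class Event {
        String entityId;
        long revision;

        Event copy() {
            Event e = new Event();
            e.entityId = entityId;
            e.revision = revision;
            return e;
        }
    }

    /** Emits every input through ONE reused instance, as object reuse may do. */
    static void emitAll(String[] ids, long[] revs, Consumer<Event> downstream) {
        Event reused = new Event();
        for (int i = 0; i < ids.length; i++) {
            reused.entityId = ids[i];
            reused.revision = revs[i];
            downstream.accept(reused);
        }
    }

    public static void main(String[] args) {
        String[] ids = {"Q1", "Q2", "Q3"};
        long[] revs = {10, 20, 30};

        // Unsafe: buffers the reference it was handed.
        List<Event> unsafe = new ArrayList<>();
        emitAll(ids, revs, unsafe::add);

        // Safe: copies the record before the next one overwrites it.
        List<Event> safe = new ArrayList<>();
        emitAll(ids, revs, e -> safe.add(e.copy()));

        // All three `unsafe` entries are the same object, holding the last record.
        unsafe.forEach(e -> System.out.println("unsafe " + e.entityId + "@" + e.revision));
        safe.forEach(e -> System.out.println("safe   " + e.entityId + "@" + e.revision));
    }
}
```

Running it prints "unsafe Q3@30" three times, followed by Q1@10, Q2@20 and Q3@30 for the safe buffer. This is why functions in a job with object reuse enabled should either be stateless per record or copy what they keep.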