Edit type enrichment: Add timeout
Open, Medium, Public

Description

T421026#11775078:

The research Spark job for HTML-based edit types has a 60s timeout (previously 300s). The errors in the research edit-types table show that some processing did reach the timeout limit. Note that the research pipeline did not have to apply the diff; it only computed edit types.

spark.sql("select edit_types.error from research.edit_types_html").groupBy("error").count().show(truncate=False)

+---------------------------+---------+
|error                      |count    |
+---------------------------+---------+
|null                       |124586530|
|timeout error (300 seconds)|2759     |
|timeout error (60 seconds) |4043     |
|None                       |1469     |
+---------------------------+---------+
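To put the table in proportion, the counts can be summed directly. This is only arithmetic on the numbers shown above; the variable names are illustrative. Timeouts come out at roughly 1 in 18,000 rows, so they are rare but not negligible for a streaming job.

```python
# Counts copied from the error table above.
error_counts = {
    "null": 124_586_530,
    "timeout error (300 seconds)": 2_759,
    "timeout error (60 seconds)": 4_043,
    "None": 1_469,
}

total_rows = sum(error_counts.values())
# Both timeout settings (300s and 60s) count as timeouts.
timeout_rows = sum(
    count
    for label, count in error_counts.items()
    if label.startswith("timeout error")
)
timeout_share = timeout_rows / total_rows

print(f"{timeout_rows} of {total_rows} rows timed out ({timeout_share:.4%})")
```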

We should have timeouts for the feature counts enrichment job as well:

  • in applying the diff
  • in computing edit-types
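A minimal sketch of what a per-stage timeout could look like in Python, assuming each stage is an ordinary callable. `run_with_timeout`, `slow_stage` and the `stage` label are hypothetical names, not part of the enrichment job. The error message imitates the format seen in the research table. One caveat: a thread-based timeout only stops waiting for the result; it does not stop the work itself, so a real job may need a process-based or library-level timeout instead.

```python
import concurrent.futures
import time


def run_with_timeout(fn, timeout_s, stage, *args, **kwargs):
    """Run fn(*args, **kwargs) and raise TimeoutError after timeout_s seconds.

    Hypothetical helper. The worker thread is NOT killed on timeout; it
    keeps running in the background until fn returns on its own.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # Re-raise with a message that names the stage and the limit.
        raise TimeoutError(
            f"{stage}: timeout error ({timeout_s} seconds)"
        ) from None
    finally:
        # Do not block on a worker that may still be running.
        executor.shutdown(wait=False)


def slow_stage(seconds):
    """Stand-in for applying the diff or computing edit types."""
    time.sleep(seconds)
    return "done"
```

Each of the two stages would get its own call, for example `run_with_timeout(slow_stage, 60, "compute edit-types", 0.01)`, so the error records which stage hit the limit.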

Without timeouts, some processing may take a long time and block the pipeline. More importantly, if the pipeline fails for whatever reason because of this long processing time (OOM?), the Flink app will crash. We should pre-emptively raise on timeout so that these events go to the error sink; that saves the app and keeps events flowing.

  • With recent changes, the app should auto-restart with a backoff time, but if it repeatedly attempts to process the same event, the pipeline will not progress and will remain stuck.
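The routing idea can be sketched as follows, assuming the enrichment step raises `TimeoutError` when it exceeds its limit. `route_events` and `fake_enrich` are hypothetical, and plain Python lists stand in for the main output and the error sink. Because the timeout is caught per event, a single slow event is recorded once and skipped rather than retried on every restart.

```python
from typing import Any, Callable, Iterable


def route_events(
    events: Iterable[Any],
    enrich: Callable[[Any], Any],
) -> tuple[list, list]:
    """Enrich each event; send timed-out events to the error output.

    Hypothetical sketch: the two lists stand in for the main sink and
    the error sink of the streaming app.
    """
    enriched, errors = [], []
    for event in events:
        try:
            enriched.append(enrich(event))
        except TimeoutError as exc:
            # Record the failure and move on, so the stream keeps flowing.
            errors.append({"event": event, "error": str(exc)})
    return enriched, errors


def fake_enrich(event):
    """Stand-in enrichment that times out on one specific event."""
    if event == "slow-event":
        raise TimeoutError("timeout error (60 seconds)")
    return {"event": event, "edit_types": {}}


main_output, error_sink = route_events(["a", "slow-event", "b"], fake_enrich)
```

Only `TimeoutError` is caught here on purpose; whether other exceptions should also go to the error sink is a separate design decision.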
