Page MenuHomePhabricator

Investigate DB connection issues faced from airflow on an-launcher1002
Closed, ResolvedPublic

Description

Every night at between 00:01:30 to 00:03:30 airflow scheduler from an-aluncher1002 is facing DB conneciton issue

Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]: The above exception was the direct cause of the following exception:
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]: MySQLdb._exceptions.OperationalError: (2026, 'SSL connection error: Error in the pull function.')
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:     super().__init__(*args, **kwargs2)
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:   File "/usr/lib/airflow/lib/python3.7/site-packages/MySQLdb/connections.py", line 185, in __init__
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:     return Connection(*args, **kwargs)
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:   File "/usr/lib/airflow/lib/python3.7/site-packages/MySQLdb/__init__.py", line 130, in Connect
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:     return self.dbapi.connect(*cargs, **cparams)
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:   File "/usr/lib/airflow/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 508, in connect
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:     return dialect.connect(*cargs, **cparams)
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:   File "/usr/lib/airflow/lib/python3.7/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:     connection = pool._invoke_creator(self)
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:   File "/usr/lib/airflow/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 656, in __connect
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:     raise exception
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:   File "/usr/lib/airflow/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:     with_traceback=exc_tb,
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:   File "/usr/lib/airflow/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:     pool.logger.debug("Error on connect(): %s", e)
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:   File "/usr/lib/airflow/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 661, in __connect
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:     self.__connect(first_connect_check=True)
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:   File "/usr/lib/airflow/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 440, in __init__
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:     return _ConnectionRecord(self)
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:   File "/usr/lib/airflow/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 309, in _create_connection
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:     return self._create_connection()
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:   File "/usr/lib/airflow/lib/python3.7/site-packages/sqlalchemy/pool/impl.py", line 137, in _do_get
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:     raise exception
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:   File "/usr/lib/airflow/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:     with_traceback=exc_tb,
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:   File "/usr/lib/airflow/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:     self._dec_overflow()
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:   File "/usr/lib/airflow/lib/python3.7/site-packages/sqlalchemy/pool/impl.py", line 140, in _do_get
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:     rec = pool._do_get()
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:   File "/usr/lib/airflow/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 495, in checkout
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:     fairy = _ConnectionRecord.checkout(pool)
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:   File "/usr/lib/airflow/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:     return _ConnectionFairy._checkout(self)
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:   File "/usr/lib/airflow/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 364, in connect
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:     return fn()
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]:   File "/usr/lib/airflow/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2336, in _wrap_pool_connect
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]: Traceback (most recent call last):
Mar 06 00:01:33 an-launcher1002 airflow-scheduler@analytics[5803]: Process DagFileProcessor346386-Process:
Mar 06 00:01:27 an-launcher1002 airflow-scheduler@analytics[5803]: (Background on this error at: http://sqlalche.me/e/13/e3q8)

This issue can lead to jobs being reported in failure while they have run successfully

Mar 03 00:03:41 an-launcher1002 airflow-scheduler@analytics[5803]: [2023-03-03 00:03:41,218] {scheduler_job.py:655} INFO - Setting task instance <TaskInstance: pageview_actor_hourly.compute_pageview_actor_hourly 2023-03-02 22:00:00+00:00 [queued]> state to failed as reported by executor
Mar 03 00:03:41 an-launcher1002 airflow-scheduler@analytics[5803]: [2023-03-03 00:03:41,217] {scheduler_job.py:648} ERROR - Executor reports task instance <TaskInstance: pageview_actor_hourly.compute_pageview_actor_hourly 2023-03-02 22:00:00+00:00 [queued]> finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
...
DB ISSUE
...
Mar 03 00:01:31 an-launcher1002 airflow-scheduler@analytics[5803]:         <TaskInstance: pageview_actor_hourly.compute_pageview_actor_hourly 2023-03-02 22:00:00+00:00 [scheduled]>
Mar 03 00:01:31 an-launcher1002 airflow-scheduler@analytics[5803]:         <TaskInstance: pageview_actor_hourly.compute_pageview_actor_hourly 2023-03-02 22:00:00+00:00 [scheduled]>

Event Timeline

Looks to me that we reach 100% network usage on an-launcher1002 when the connection issues happens: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=an-launcher1[…]analytics&from=1677715234951&to=1677715701346&viewPanel=11
This node only have a 1GB NIC, we should identify that local job which issue this usage and see if we can throttle or make it run on hadoop.

Trying to identify process which could generate this tx network usage:

  • running a PS command every 15s in an-launcher1002 stored in /home/nfraison/proc.log
  • running netstat -tup te get process/src/dst ip/port stored in /home/nfraison/netstat.log
  • running iftop -tPs 5 to get traffic send/receive from last 5s src host/dst host/port stored in /home/nfraison/iftop.log
nfraison renamed this task from Investigate DB connection issues faced from airflow on an-aluncher1002 to Investigate DB connection issues faced from airflow on an-launcher1002.Mar 6 2023, 2:23 PM

An idea was that reportupdater job can be the root cause of that high tx
Here is the log of all reportupdater service running at midnight (or last log when not having logs at midnight. Doesn't seems to match the timeline

● reportupdater-pingback.service - Periodic execution of reportupdater-pingback.service
Mar 06 00:00:01 an-launcher1002 systemd[1]: Started Report Updater job for pingback.
Mar 06 00:00:01 an-launcher1002 systemd[1]: reportupdater-pingback.service: Succeeded.
Mar 06 00:00:01 an-launcher1002 kerberos-run-command[22919]: 2023-03-06 00:00:01,321 - INFO - Execution complete.
Mar 06 00:00:01 an-launcher1002 kerberos-run-command[22919]: 2023-03-06 00:00:00,730 - INFO - Starting execution.
Mar 06 00:00:01 an-launcher1002 kerberos-run-command[22919]: kinit: Failed to store credentials: Internal credentials cache error (filename: /tmp/krb5cc_906) while getting initial credentials
Mar 06 00:00:00 an-launcher1002 kerberos-run-command[23016]: User analytics executes as user analytics the command ['/usr/bin/python3', '/srv/reportupdater/reportupdater/update_reports.py', '-l', 'info', '/srv/reportupdater/jobs/reportupdater-queries/pin
Mar 06 00:00:00 an-launcher1002 systemd[1]: Starting Report Updater job for pingback...


● reportupdater-published_cx2_translations.service - Periodic execution of reportupdater-published_cx2_translations.service
Mar 06 00:03:12 an-launcher1002 kerberos-run-command[22994]: 2023-03-06 00:01:17,466 - ERROR - Report "pages_with_unreviewed_translations" could not be executed because of error: object of type 'NoneType' has no len()
Mar 06 00:03:12 an-launcher1002 kerberos-run-command[22994]: 2023-03-06 00:00:47,557 - INFO - Executing "<Report key=pages_with_unreviewed_translations type=script granularity=weeks lag=86400 first_date=2018-01-01 start=2022-12-25 end=2023-01-01 db_key=N
Mar 06 00:03:12 an-launcher1002 kerberos-run-command[22994]: TypeError: object of type 'NoneType' has no len()
Mar 06 00:03:12 an-launcher1002 kerberos-run-command[22994]:     empty_row = [report.start] + [None] * (len(normalized_header) - 1)
Mar 06 00:03:12 an-launcher1002 kerberos-run-command[22994]:   File "/srv/reportupdater/reportupdater/reportupdater/executor.py", line 198, in normalize_results
Mar 06 00:03:12 an-launcher1002 kerberos-run-command[22994]:     report.results = self.normalize_results(report, None, tsv_reader)
Mar 06 00:03:12 an-launcher1002 kerberos-run-command[22994]:   File "/srv/reportupdater/reportupdater/reportupdater/executor.py", line 161, in execute_script_report
Mar 06 00:03:12 an-launcher1002 kerberos-run-command[22994]: Traceback (most recent call last):
Mar 06 00:03:12 an-launcher1002 kerberos-run-command[22994]: 2023-03-06 00:00:47,555 - ERROR - Report "pages_with_unreviewed_translations" could not be executed because of error: object of type 'NoneType' has no len()
Mar 06 00:03:12 an-launcher1002 kerberos-run-command[22994]: 2023-03-06 00:00:00,747 - INFO - Executing "<Report key=pages_with_unreviewed_translations type=script granularity=weeks lag=86400 first_date=2018-01-01 start=2022-12-18 end=2022-12-25 db_key=N
Mar 06 00:03:12 an-launcher1002 kerberos-run-command[22994]: 2023-03-06 00:00:00,733 - INFO - Starting execution.


● reportupdater-reference-previews.service - Periodic execution of reportupdater-reference-previews.service
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]: 2023-03-05 20:39:08,610 - INFO - Executing "<Report key=baseline type=hive granularity=days lag=10800 first_date=2019-11-02 start=2021-09-08 end=2021-09-09 db_key=None hql_template=-- -- Estimat
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]: TypeError: object of type 'NoneType' has no len()
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:     empty_row = [report.start] + [None] * (len(normalized_header) - 1)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:   File "/srv/reportupdater/reportupdater/reportupdater/executor.py", line 198, in normalize_results
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:     report.results = self.normalize_results(report, None, tsv_reader)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:   File "/srv/reportupdater/reportupdater/reportupdater/executor.py", line 134, in execute_hive
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]: Traceback (most recent call last):
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]: 2023-03-05 20:39:08,608 - ERROR - Report "baseline" could not be executed because of error: object of type 'NoneType' has no len()
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]: FAILED: ParseException line 42:0 cannot recognize input near 'event_counts' ';' '<EOF>' in joinSource
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.util.RunJar.main(RunJar.java:158)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.util.RunJar.run(RunJar.java:244)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at java.lang.reflect.Method.invoke(Method.java:498)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:787)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:468)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:70)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:77)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:208)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1333)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:2284)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:35710)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:35822)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:36633)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:36987)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.ql.parse.HiveParser.atomSelectStatement(HiveParser.java:36735)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.ql.parse.HiveParser.fromClause(HiveParser.java:41904)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromClause(HiveParser_FromClauseParser.java:1312)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromSource(HiveParser_FromClauseParser.java:1690)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.joinSource(HiveParser_FromClauseParser.java:2475)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]:         at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.atomjoinSource(HiveParser_FromClauseParser.java:1790)
Mar 06 04:49:35 an-launcher1002 kerberos-run-command[8105]: NoViableAltException(350@[])


● reportupdater-language.service - Periodic execution of reportupdater-language.service
Mar 06 00:03:19 an-launcher1002 python3[22953]: 2023-03-06 00:00:00,995 - INFO - Executing "<Report key=content_translation_beta type=sql granularity=days lag=0 first_date=2015-01-13 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_
Mar 06 00:03:19 an-launcher1002 python3[22953]: 2023-03-06 00:00:00,665 - INFO - Starting execution.
Mar 06 00:00:00 an-launcher1002 systemd[1]: Starting Report Updater job for language...


● reportupdater-templatewizard.service - Periodic execution of reportupdater-templatewizard.service
Mar 06 00:00:47 an-launcher1002 systemd[1]: Started Report Updater job for templatewizard.
Mar 06 00:00:47 an-launcher1002 systemd[1]: reportupdater-templatewizard.service: Succeeded.
Mar 06 00:00:47 an-launcher1002 kerberos-run-command[22960]: 2023-03-06 00:00:47,578 - INFO - Execution complete.
Mar 06 00:00:47 an-launcher1002 kerberos-run-command[22960]: 2023-03-06 00:00:00,755 - INFO - Starting execution.
Mar 06 00:00:00 an-launcher1002 kerberos-run-command[23021]: User analytics executes as user analytics the command ['/usr/bin/python3', '/srv/reportupdater/reportupdater/update_reports.py', '-l', 'info', '/srv/reportupdater/jobs/reportupdater-queries/tem
Mar 06 00:00:00 an-launcher1002 systemd[1]: Starting Report Updater job for templatewizard...


● reportupdater-mt_engines.service - Periodic execution of reportupdater-mt_engines.service
Mar 06 00:12:05 an-launcher1002 systemd[1]: Started Report Updater job for mt_engines.
Mar 06 00:12:05 an-launcher1002 systemd[1]: reportupdater-mt_engines.service: Succeeded.
Mar 06 00:12:05 an-launcher1002 python3[22972]: 2023-03-06 00:12:05,467 - INFO - Execution complete.
Mar 06 00:12:05 an-launcher1002 python3[22972]: 2023-03-06 00:12:05,464 - INFO - Report mt_engines has been updated.
Mar 06 00:12:05 an-launcher1002 python3[22972]: 2023-03-06 00:00:00,692 - INFO - Executing "<Report key=mt_engines type=sql granularity=weeks lag=86400 first_date=2016-01-18 start=2023-02-26 end=2023-03-05 db_key=wikishared hql_template=None sql_template
Mar 06 00:12:05 an-launcher1002 python3[22972]: 2023-03-06 00:00:00,648 - INFO - Starting execution.
Mar 06 00:00:00 an-launcher1002 systemd[1]: Starting Report Updater job for mt_engines...

● reportupdater-wmcs.service - Periodic execution of reportupdater-wmcs.service
Mar 06 00:13:39 an-launcher1002 kerberos-run-command[22962]: 2023-03-06 00:03:42,879 - ERROR - Report "wikis_by_wmcs_edits_percent" could not be executed because of error: object of type 'NoneType' has no len()
Mar 06 00:13:39 an-launcher1002 kerberos-run-command[22962]: NameError: name 'number' is not defined
Mar 06 00:13:39 an-launcher1002 kerberos-run-command[22962]:     pivoted[wiki][date] += number
Mar 06 00:13:39 an-launcher1002 kerberos-run-command[22962]:   File "/srv/reportupdater/jobs/reportupdater-queries/wmcs/dynamic_pivot.py", line 21, in <module>
Mar 06 00:13:39 an-launcher1002 kerberos-run-command[22962]: Traceback (most recent call last):
Mar 06 00:13:39 an-launcher1002 kerberos-run-command[22962]: 2023-03-06 00:00:01,320 - INFO - Executing "<Report key=wikis_by_wmcs_edits_percent type=script granularity=months lag=345600 first_date=2019-10-01 start=2021-02-01 end=2021-03-01 db_key=None h
Mar 06 00:13:39 an-launcher1002 kerberos-run-command[22962]: 2023-03-06 00:00:00,759 - INFO - Starting execution.
Mar 06 00:00:00 an-launcher1002 kerberos-run-command[23031]: User analytics executes as user analytics the command ['/usr/bin/python3', '/srv/reportupdater/reportupdater/update_reports.py', '-l', 'info', '/srv/reportupdater/jobs/reportupdater-queries/wmc
Mar 06 00:00:00 an-launcher1002 systemd[1]: Starting Report Updater job for wmcs...

● reportupdater-cx.service - Periodic execution of reportupdater-cx.service
Mar 06 00:00:00 an-launcher1002 systemd[1]: Started Report Updater job for cx.
Mar 06 00:00:00 an-launcher1002 systemd[1]: reportupdater-cx.service: Succeeded.
Mar 06 00:00:00 an-launcher1002 python3[22954]: 2023-03-06 00:00:00,752 - INFO - Execution complete.
Mar 06 00:00:00 an-launcher1002 python3[22954]: 2023-03-06 00:00:00,673 - INFO - Starting execution.
Mar 06 00:00:00 an-launcher1002 systemd[1]: Starting Report Updater job for cx...

● reportupdater-published_cx2_translations_mysql.service - Periodic execution of reportupdater-published_cx2_translations_mysql.service
Mar 06 00:30:06 an-launcher1002 systemd[1]: Started Report Updater job for published_cx2_translations_mysql.
Mar 06 00:30:06 an-launcher1002 systemd[1]: reportupdater-published_cx2_translations_mysql.service: Succeeded.
Mar 06 00:30:06 an-launcher1002 python3[4041]: 2023-03-06 00:30:06,633 - INFO - Execution complete.
Mar 06 00:30:06 an-launcher1002 python3[4041]: 2023-03-06 00:30:06,633 - INFO - Report published_cx2_translators has been updated.
Mar 06 00:30:06 an-launcher1002 python3[4041]: 2023-03-06 00:30:05,629 - INFO - Executing "<Report key=published_cx2_translators type=sql granularity=weeks lag=86400 first_date=2018-01-01 start=2023-02-26 end=2023-03-05 db_key=wikishared hql_template=Non
Mar 06 00:30:06 an-launcher1002 python3[4041]: 2023-03-06 00:30:05,615 - INFO - Report published_cx2_translations has been updated.
Mar 06 00:30:06 an-launcher1002 python3[4041]: 2023-03-06 00:30:00,912 - INFO - Executing "<Report key=published_cx2_translations type=sql granularity=weeks lag=86400 first_date=2018-01-01 start=2023-02-26 end=2023-03-05 db_key=wikishared hql_template=No
Mar 06 00:30:06 an-launcher1002 python3[4041]: 2023-03-06 00:30:00,875 - INFO - Starting execution.
Mar 06 00:30:00 an-launcher1002 systemd[1]: Starting Report Updater job for published_cx2_translations_mysql...

● reportupdater-codemirror.service - Periodic execution of reportupdater-codemirror.service
Mar 06 00:17:01 an-launcher1002 systemd[1]: Started Report Updater job for codemirror.
Mar 06 00:17:01 an-launcher1002 systemd[1]: reportupdater-codemirror.service: Succeeded.
Mar 06 00:17:01 an-launcher1002 kerberos-run-command[22990]: 2023-03-06 00:17:01,172 - INFO - Execution complete.
Mar 06 00:17:01 an-launcher1002 kerberos-run-command[22990]: 2023-03-06 00:17:01,168 - INFO - Report users_codemirror_and_wikitext has been updated.
Mar 06 00:17:01 an-launcher1002 kerberos-run-command[22990]: 2023-03-06 00:16:37,006 - INFO - Executing "<Report key=users_codemirror_and_wikitext type=sql granularity=days lag=0 first_date=2021-06-21 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_
Mar 06 00:17:01 an-launcher1002 kerberos-run-command[22990]: 2023-03-06 00:16:36,801 - INFO - Report users_codemirror_and_wikitext has been updated.
Mar 06 00:17:01 an-launcher1002 kerberos-run-command[22990]: 2023-03-06 00:16:11,112 - INFO - Executing "<Report key=users_codemirror_and_wikitext type=sql granularity=days lag=0 first_date=2021-06-21 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_
Mar 06 00:17:01 an-launcher1002 kerberos-run-command[22990]: 2023-03-06 00:16:10,868 - INFO - Report users_codemirror_and_wikitext has been updated.
Mar 06 00:17:01 an-launcher1002 kerberos-run-command[22990]: 2023-03-06 00:15:04,539 - INFO - Executing "<Report key=users_codemirror_and_wikitext type=sql granularity=days lag=0 first_date=2021-06-21 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_
Mar 06 00:17:01 an-launcher1002 kerberos-run-command[22990]: 2023-03-06 00:15:04,334 - INFO - Report users_codemirror_and_wikitext has been updated.
Mar 06 00:17:01 an-launcher1002 kerberos-run-command[22990]: 2023-03-06 00:01:07,938 - INFO - Executing "<Report key=users_codemirror_and_wikitext type=sql granularity=days lag=0 first_date=2021-06-21 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_
Mar 06 00:17:01 an-launcher1002 kerberos-run-command[22990]: 2023-03-06 00:01:07,850 - INFO - Report users_codemirror_and_wikitext has been updated.
Mar 06 00:17:01 an-launcher1002 kerberos-run-command[22990]: 2023-03-06 00:01:01,734 - INFO - Executing "<Report key=users_codemirror_and_wikitext type=sql granularity=days lag=0 first_date=2021-06-21 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_
Mar 06 00:17:01 an-launcher1002 kerberos-run-command[22990]: 2023-03-06 00:00:00,747 - INFO - Starting execution.
Mar 06 00:00:00 an-launcher1002 kerberos-run-command[23019]: User analytics executes as user analytics the command ['/usr/bin/python3', '/srv/reportupdater/reportupdater/update_reports.py', '-l', 'info', '/srv/reportupdater/jobs/reportupdater-queries/cod
Mar 06 00:00:00 an-launcher1002 systemd[1]: Starting Report Updater job for codemirror...

● reportupdater-edit-beta-features.service - Periodic execution of reportupdater-edit-beta-features.service
Mar 06 00:00:27 an-launcher1002 systemd[1]: Started Report Updater job for edit-beta-features.
Mar 06 00:00:27 an-launcher1002 systemd[1]: reportupdater-edit-beta-features.service: Succeeded.
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:27,835 - INFO - Execution complete.
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:27,819 - INFO - Report new_wikitext_editor_beta has been updated.
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:27,497 - INFO - Executing "<Report key=new_wikitext_editor_beta type=sql granularity=days lag=0 first_date=2017-03-17 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:26,928 - INFO - Report new_wikitext_editor_beta has been updated.
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:26,591 - INFO - Executing "<Report key=new_wikitext_editor_beta type=sql granularity=days lag=0 first_date=2017-03-17 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:26,232 - INFO - Report new_wikitext_editor_beta has been updated.
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:25,287 - INFO - Executing "<Report key=new_wikitext_editor_beta type=sql granularity=days lag=0 first_date=2017-03-17 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:24,501 - INFO - Report new_wikitext_editor_beta has been updated.
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:24,284 - INFO - Executing "<Report key=new_wikitext_editor_beta type=sql granularity=days lag=0 first_date=2017-03-17 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:23,919 - INFO - Report new_wikitext_editor_beta has been updated.
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:23,616 - INFO - Executing "<Report key=new_wikitext_editor_beta type=sql granularity=days lag=0 first_date=2017-03-17 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:23,141 - INFO - Report new_wikitext_editor_beta has been updated.
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:22,646 - INFO - Executing "<Report key=new_wikitext_editor_beta type=sql granularity=days lag=0 first_date=2017-03-17 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:22,199 - INFO - Report new_wikitext_editor_beta has been updated.
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:21,905 - INFO - Executing "<Report key=new_wikitext_editor_beta type=sql granularity=days lag=0 first_date=2017-03-17 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:21,543 - INFO - Report new_wikitext_editor_beta has been updated.
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:21,174 - INFO - Executing "<Report key=new_wikitext_editor_beta type=sql granularity=days lag=0 first_date=2017-03-17 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:20,647 - INFO - Report new_wikitext_editor_beta has been updated.
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:19,641 - INFO - Executing "<Report key=new_wikitext_editor_beta type=sql granularity=days lag=0 first_date=2017-03-17 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:19,223 - INFO - Report new_wikitext_editor_beta has been updated.
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:18,552 - INFO - Executing "<Report key=new_wikitext_editor_beta type=sql granularity=days lag=0 first_date=2017-03-17 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:17,999 - INFO - Report new_wikitext_editor_beta has been updated.
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:17,530 - INFO - Executing "<Report key=new_wikitext_editor_beta type=sql granularity=days lag=0 first_date=2017-03-17 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:16,988 - INFO - Report new_wikitext_editor_beta has been updated.
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:16,394 - INFO - Executing "<Report key=new_wikitext_editor_beta type=sql granularity=days lag=0 first_date=2017-03-17 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:15,771 - INFO - Report new_wikitext_editor_beta has been updated.
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:14,573 - INFO - Executing "<Report key=new_wikitext_editor_beta type=sql granularity=days lag=0 first_date=2017-03-17 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:14,039 - INFO - Report new_wikitext_editor_beta has been updated.
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:13,314 - INFO - Executing "<Report key=new_wikitext_editor_beta type=sql granularity=days lag=0 first_date=2017-03-17 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:12,667 - INFO - Report new_wikitext_editor_beta has been updated.
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:11,560 - INFO - Executing "<Report key=new_wikitext_editor_beta type=sql granularity=days lag=0 first_date=2017-03-17 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:11,346 - INFO - Report new_wikitext_editor_beta has been updated.
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:10,986 - INFO - Executing "<Report key=new_wikitext_editor_beta type=sql granularity=days lag=0 first_date=2017-03-17 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:10,336 - INFO - Report new_wikitext_editor_beta has been updated.
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:09,353 - INFO - Executing "<Report key=new_wikitext_editor_beta type=sql granularity=days lag=0 first_date=2017-03-17 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:08,665 - INFO - Report new_wikitext_editor_beta has been updated.
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:00,843 - INFO - Executing "<Report key=new_wikitext_editor_beta type=sql granularity=days lag=0 first_date=2017-03-17 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_
Mar 06 00:00:27 an-launcher1002 python3[22955]: 2023-03-06 00:00:00,669 - INFO - Starting execution.
Mar 06 00:00:00 an-launcher1002 systemd[1]: Starting Report Updater job for edit-beta-features...

● reportupdater-interlanguage.service - Periodic execution of reportupdater-interlanguage.service
Mar 06 00:04:27 an-launcher1002 systemd[1]: Started Report Updater job for interlanguage.
Mar 06 00:04:27 an-launcher1002 systemd[1]: reportupdater-interlanguage.service: Succeeded.
Mar 06 00:04:27 an-launcher1002 kerberos-run-command[22996]: 2023-03-06 00:04:27,018 - INFO - Execution complete.
Mar 06 00:04:27 an-launcher1002 kerberos-run-command[22996]: 2023-03-06 00:04:26,776 - INFO - Report percent_interlanguage_navigation_prev has been updated.
Mar 06 00:04:27 an-launcher1002 kerberos-run-command[22996]: 2023-03-06 00:03:06,597 - INFO - Executing "<Report key=percent_interlanguage_navigation_prev type=script granularity=weeks lag=86400 first_date=2018-01-01 start=2023-02-26 end=2023-03-05 db_ke
Mar 06 00:04:27 an-launcher1002 kerberos-run-command[22996]: 2023-03-06 00:03:04,405 - INFO - Report percent_interlanguage_navigation_curr has been updated.
Mar 06 00:04:27 an-launcher1002 kerberos-run-command[22996]: 2023-03-06 00:00:06,950 - INFO - Executing "<Report key=percent_interlanguage_navigation_curr type=script granularity=weeks lag=86400 first_date=2018-01-01 start=2023-02-26 end=2023-03-05 db_ke
Mar 06 00:04:27 an-launcher1002 kerberos-run-command[22996]: 2023-03-06 00:00:00,747 - INFO - Starting execution.
Mar 06 00:04:27 an-launcher1002 kerberos-run-command[22996]: kinit: Failed to store credentials: No credentials cache found (filename: /tmp/krb5cc_906) while getting initial credentials
Mar 06 00:00:00 an-launcher1002 kerberos-run-command[23033]: User analytics executes as user analytics the command ['/usr/bin/python3', '/srv/reportupdater/reportupdater/update_reports.py', '-l', 'info', '/srv/reportupdater/jobs/reportupdater-queries/int
Mar 06 00:00:00 an-launcher1002 systemd[1]: Starting Report Updater job for interlanguage...

● reportupdater-structured-data.service - Periodic execution of reportupdater-structured-data.service
Mar 06 00:40:37 an-launcher1002 systemd[1]: Started Report Updater job for structured-data.
Mar 06 00:40:37 an-launcher1002 systemd[1]: reportupdater-structured-data.service: Succeeded.
Mar 06 00:40:37 an-launcher1002 kerberos-run-command[22982]: 2023-03-06 00:40:36,995 - INFO - Execution complete.
Mar 06 00:40:37 an-launcher1002 kerberos-run-command[22982]: 2023-03-06 00:40:36,994 - INFO - Report commons_file_usage_in_wikimedia_projects has been updated.
Mar 06 00:40:37 an-launcher1002 kerberos-run-command[22982]: 2023-03-06 00:30:12,810 - INFO - Executing "<Report key=commons_file_usage_in_wikimedia_projects type=script granularity=months lag=432000 first_date=2019-12-01 start=2023-02-01 end=2023-03-01 
Mar 06 00:40:37 an-launcher1002 kerberos-run-command[22982]: 2023-03-06 00:30:12,807 - INFO - Report wikidata_usage_in_wikimedia_projects has been updated.
Mar 06 00:40:37 an-launcher1002 kerberos-run-command[22982]: 2023-03-06 00:16:32,643 - INFO - Executing "<Report key=wikidata_usage_in_wikimedia_projects type=script granularity=months lag=432000 first_date=2019-11-01 start=2023-02-01 end=2023-03-01 db_k
Mar 06 00:40:37 an-launcher1002 kerberos-run-command[22982]: 2023-03-06 00:16:32,639 - INFO - Report commonswiki_mediainfo_slots has been updated.
Mar 06 00:40:37 an-launcher1002 kerberos-run-command[22982]: 2023-03-06 00:00:00,756 - INFO - Executing "<Report key=commonswiki_mediainfo_slots type=script granularity=months lag=432000 first_date=2019-11-01 start=2023-02-01 end=2023-03-01 db_key=None h
Mar 06 00:40:37 an-launcher1002 kerberos-run-command[22982]: 2023-03-06 00:00:00,745 - INFO - Starting execution.
Mar 06 00:00:00 an-launcher1002 kerberos-run-command[23023]: User analytics executes as user analytics the command ['/usr/bin/python3', '/srv/reportupdater/reportupdater/update_reports.py', '-l', 'info', '/srv/reportupdater/jobs/reportupdater-queries/str
Mar 06 00:00:00 an-launcher1002 systemd[1]: Starting Report Updater job for structured-data...

● reportupdater-flow-beta-features.service - Periodic execution of reportupdater-flow-beta-features.service
Mar 06 00:00:22 an-launcher1002 systemd[1]: Started Report Updater job for flow-beta-features.
Mar 06 00:00:22 an-launcher1002 systemd[1]: reportupdater-flow-beta-features.service: Succeeded.
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:22,133 - INFO - Execution complete.
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:22,131 - INFO - Report flow_beta has been updated.
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:21,832 - INFO - Executing "<Report key=flow_beta type=sql granularity=days lag=0 first_date=2016-01-19 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_template=select
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:21,373 - INFO - Report flow_beta has been updated.
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:21,052 - INFO - Executing "<Report key=flow_beta type=sql granularity=days lag=0 first_date=2016-01-19 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_template=select
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:20,415 - INFO - Report flow_beta has been updated.
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:18,870 - INFO - Executing "<Report key=flow_beta type=sql granularity=days lag=0 first_date=2016-01-19 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_template=select
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:18,216 - INFO - Report flow_beta has been updated.
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:14,278 - INFO - Executing "<Report key=flow_beta type=sql granularity=days lag=0 first_date=2016-01-19 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_template=select
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:13,303 - INFO - Report flow_beta has been updated.
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:13,028 - INFO - Executing "<Report key=flow_beta type=sql granularity=days lag=0 first_date=2016-01-19 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_template=select
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:12,411 - INFO - Report flow_beta has been updated.
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:11,873 - INFO - Executing "<Report key=flow_beta type=sql granularity=days lag=0 first_date=2016-01-19 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_template=select
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:11,508 - INFO - Report flow_beta has been updated.
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:11,380 - INFO - Executing "<Report key=flow_beta type=sql granularity=days lag=0 first_date=2016-01-19 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_template=select
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:11,032 - INFO - Report flow_beta has been updated.
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:10,616 - INFO - Executing "<Report key=flow_beta type=sql granularity=days lag=0 first_date=2016-01-19 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_template=select
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:09,756 - INFO - Report flow_beta has been updated.
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:09,404 - INFO - Executing "<Report key=flow_beta type=sql granularity=days lag=0 first_date=2016-01-19 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_template=select
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:08,310 - INFO - Report flow_beta has been updated.
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:04,722 - INFO - Executing "<Report key=flow_beta type=sql granularity=days lag=0 first_date=2016-01-19 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_template=select
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:04,221 - INFO - Report flow_beta has been updated.
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:03,658 - INFO - Executing "<Report key=flow_beta type=sql granularity=days lag=0 first_date=2016-01-19 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_template=select
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:02,790 - INFO - Report flow_beta has been updated.
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:02,444 - INFO - Executing "<Report key=flow_beta type=sql granularity=days lag=0 first_date=2016-01-19 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_template=select
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:01,826 - INFO - Report flow_beta has been updated.
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:01,305 - INFO - Executing "<Report key=flow_beta type=sql granularity=days lag=0 first_date=2016-01-19 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_template=select
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:01,073 - INFO - Report flow_beta has been updated.
Mar 06 00:00:22 an-launcher1002 python3[22956]:   result = self._query(query)
Mar 06 00:00:22 an-launcher1002 python3[22956]: /usr/lib/python3/dist-packages/pymysql/cursors.py:170: Warning: (1292, "Truncated incorrect DOUBLE value: ''")
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:00,885 - INFO - Executing "<Report key=flow_beta type=sql granularity=days lag=0 first_date=2016-01-19 start=2023-03-05 end=2023-03-06 db_key=mediawiki hql_template=None sql_template=select
Mar 06 00:00:22 an-launcher1002 python3[22956]: 2023-03-06 00:00:00,659 - INFO - Starting execution.
Mar 06 00:00:00 an-launcher1002 systemd[1]: Starting Report Updater job for flow-beta-features...

● reportupdater-browser.service - Periodic execution of reportupdater-browser.service
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:25:28,842 - ERROR - Report "mobile_site_by_os_family_percent" could not be executed because of error: object of type 'NoneType' has no len()
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: TypeError: 'NoneType' object has no attribute '__getitem__'
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]:     print header[0] + '\t' + '\t'.join(categories)
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]:   File "/srv/reportupdater/jobs/reportupdater-queries/browser/dynamic_pivot.py", line 24, in <module>
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: Traceback (most recent call last):
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:25:04,930 - INFO - Executing "<Report key=mobile_site_by_os_family_percent type=script granularity=weeks lag=86400 first_date=2015-06-07 start=2021-02-07 end=2021-02-14 db_key=Non
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:25:04,913 - INFO - Report mobile_site_by_os_family_and_major_percent has been updated.
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:23:04,801 - INFO - Executing "<Report key=mobile_site_by_os_family_and_major_percent type=script granularity=weeks lag=86400 first_date=2015-06-07 start=2023-02-26 end=2023-03-05 
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:23:04,444 - INFO - Report mobile_site_by_os_family_and_major has been updated.
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:21:28,451 - INFO - Executing "<Report key=mobile_site_by_os_family_and_major type=script granularity=weeks lag=86400 first_date=2015-06-07 start=2023-02-26 end=2023-03-05 db_key=N
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:21:28,318 - INFO - Report mobile_site_by_browser_family_percent has been updated.
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:19:25,192 - INFO - Executing "<Report key=mobile_site_by_browser_family_percent type=script granularity=weeks lag=86400 first_date=2015-06-07 start=2023-02-26 end=2023-03-05 db_ke
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:19:25,151 - INFO - Report mobile_site_by_browser_family_and_major_percent has been updated.
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:17:21,225 - INFO - Executing "<Report key=mobile_site_by_browser_family_and_major_percent type=script granularity=weeks lag=86400 first_date=2015-06-07 start=2023-02-26 end=2023-0
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:17:20,653 - INFO - Report mobile_site_by_browser_family_and_major has been updated.
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:15:47,404 - INFO - Executing "<Report key=mobile_site_by_browser_family_and_major type=script granularity=weeks lag=86400 first_date=2015-06-07 start=2023-02-26 end=2023-03-05 db_
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:15:46,300 - INFO - Report all_sites_by_os_family_percent has been updated.
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:13:50,796 - INFO - Executing "<Report key=all_sites_by_os_family_percent type=script granularity=weeks lag=86400 first_date=2015-06-07 start=2023-02-26 end=2023-03-05 db_key=None 
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:13:50,778 - INFO - Report all_sites_by_os_family_and_major_percent has been updated.
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:11:53,503 - INFO - Executing "<Report key=all_sites_by_os_family_and_major_percent type=script granularity=weeks lag=86400 first_date=2015-06-07 start=2023-02-26 end=2023-03-05 db
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:11:53,365 - INFO - Report all_sites_by_os_family_and_major has been updated.
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:10:22,563 - INFO - Executing "<Report key=all_sites_by_os_family_and_major type=script granularity=weeks lag=86400 first_date=2015-06-07 start=2023-02-26 end=2023-03-05 db_key=Non
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:10:21,688 - INFO - Report all_sites_by_os_and_browser_percent has been updated.
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:08:15,140 - INFO - Executing "<Report key=all_sites_by_os_and_browser_percent type=script granularity=weeks lag=86400 first_date=2015-06-07 start=2023-02-26 end=2023-03-05 db_key=
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:08:12,932 - INFO - Report all_sites_by_os_and_browser has been updated.
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:06:45,200 - INFO - Executing "<Report key=all_sites_by_os_and_browser type=script granularity=weeks lag=86400 first_date=2015-06-07 start=2023-02-26 end=2023-03-05 db_key=None hql
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:06:43,474 - INFO - Report all_sites_by_browser_family_percent has been updated.
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:04:44,132 - INFO - Executing "<Report key=all_sites_by_browser_family_percent type=script granularity=weeks lag=86400 first_date=2015-06-07 start=2023-02-26 end=2023-03-05 db_key=
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:04:43,867 - INFO - Report all_sites_by_browser_family_and_major_percent has been updated.
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:02:54,848 - INFO - Executing "<Report key=all_sites_by_browser_family_and_major_percent type=script granularity=weeks lag=86400 first_date=2015-06-07 start=2023-02-26 end=2023-03-
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:02:54,406 - INFO - Report all_sites_by_browser_family_and_major has been updated.
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:00:01,724 - INFO - Executing "<Report key=all_sites_by_browser_family_and_major type=script granularity=weeks lag=86400 first_date=2015-06-07 start=2023-02-26 end=2023-03-05 db_ke
Mar 06 01:17:38 an-launcher1002 kerberos-run-command[22983]: 2023-03-06 00:00:00,768 - INFO - Starting execution.
Mar 06 00:00:00 an-launcher1002 kerberos-run-command[23035]: User analytics executes as user analytics the command ['/usr/bin/python3', '/srv/reportupdater/reportupdater/update_reports.py', '-l', 'info', '/srv/reportupdater/jobs/reportupdater-queries/bro
Mar 06 00:00:00 an-launcher1002 systemd[1]: Starting Report Updater job for browser...

● reportupdater-visualeditor.service - Periodic execution of reportupdater-visualeditor.service
Mar 06 00:01:03 an-launcher1002 systemd[1]: Started Report Updater job for visualeditor.
Mar 06 00:01:03 an-launcher1002 systemd[1]: reportupdater-visualeditor.service: Succeeded.
Mar 06 00:01:03 an-launcher1002 kerberos-run-command[22969]: 2023-03-06 00:01:02,976 - INFO - Execution complete.
Mar 06 00:01:03 an-launcher1002 kerberos-run-command[22969]: 2023-03-06 00:00:00,788 - INFO - Starting execution.
Mar 06 00:01:03 an-launcher1002 kerberos-run-command[22969]: kinit: Failed to store credentials: No credentials cache found (filename: /tmp/krb5cc_906) while getting initial credentials
Mar 06 00:00:00 an-launcher1002 kerberos-run-command[23038]: User analytics executes as user analytics the command ['/usr/bin/python3', '/srv/reportupdater/reportupdater/update_reports.py', '-l', 'info', '/srv/reportupdater/jobs/reportupdater-queries/vis
Mar 06 00:00:00 an-launcher1002 systemd[1]: Starting Report Updater job for visualeditor...

● reportupdater-templatedata.service - Periodic execution of reportupdater-templatedata.service
Mar 06 00:00:02 an-launcher1002 systemd[1]: Started Report Updater job for templatedata.
Mar 06 00:00:02 an-launcher1002 systemd[1]: reportupdater-templatedata.service: Succeeded.
Mar 06 00:00:02 an-launcher1002 kerberos-run-command[22968]: 2023-03-06 00:00:02,418 - INFO - Execution complete.
Mar 06 00:00:02 an-launcher1002 kerberos-run-command[22968]: 2023-03-06 00:00:00,738 - INFO - Starting execution.
Mar 06 00:00:00 an-launcher1002 kerberos-run-command[23024]: User analytics executes as user analytics the command ['/usr/bin/python3', '/srv/reportupdater/reportupdater/update_reports.py', '-l', 'info', '/srv/reportupdater/jobs/reportupdater-queries/tem
Mar 06 00:00:00 an-launcher1002 systemd[1]: Starting Report Updater job for templatedata...

Tonight issue:

Mar 07 00:01:31 an-launcher1002 airflow-scheduler@analytics[5803]: Process DagFileProcessor652438-Process:
Mar 07 00:03:17 an-launcher1002 airflow-scheduler@analytics[5803]: [2023-03-07 00:03:17,896] {scheduler_job.py:354} INFO - 5 tasks up for execution:

From iftop we can see that the issue happen exactly during this high tx load.
The load is spread over multiple hadoop datanode -> probably some data being persisted to hdfs

Tue 07 Mar 2023 12:01:08 AM UTC
   1 an-launcher1002.eqiad.wmnet:12001        =>     41.9Mb      118Mb      118Mb     88.3MB
     analytics1066.eqiad.wmnet:60330          <=      116Kb      348Kb      348Kb      261KB
   2 an-launcher1002.eqiad.wmnet:12002        =>     42.0Mb     93.7Mb     93.7Mb     70.3MB
     an-worker1104.eqiad.wmnet:37226          <=     70.1Kb      143Kb      143Kb      107KB
   3 an-launcher1002.eqiad.wmnet:12001        =>     41.9Mb     62.4Mb     62.4Mb     46.8MB
     an-worker1080.eqiad.wmnet:57734          <=     75.8Kb      116Kb      116Kb     86.8KB
   4 an-launcher1002.eqiad.wmnet:12002        =>     42.9Mb     49.3Mb     49.3Mb     37.0MB
     analytics1059.eqiad.wmnet:36474          <=     98.5Kb      128Kb      128Kb     95.9KB
########
Tue 07 Mar 2023 12:01:49 AM UTC
Listening on eno1
   # Host name (port/service if enabled)            last 2s   last 10s   last 40s cumulative
--------------------------------------------------------------------------------------------
   1 an-launcher1002.eqiad.wmnet:12004        =>     44.8Mb     43.4Mb     43.4Mb     32.6MB
     an-worker1086.eqiad.wmnet:60036          <=     63.4Kb     66.2Kb     66.2Kb     49.7KB
   2 an-launcher1002.eqiad.wmnet:12000        =>     27.4Mb     25.7Mb     25.7Mb     19.3MB
     an-worker1133.eqiad.wmnet:52948          <=     63.0Kb     50.7Kb     50.7Kb     38.0KB
   3 an-launcher1002.eqiad.wmnet:12004        =>     27.4Mb     25.7Mb     25.7Mb     19.3MB
     an-worker1089.eqiad.wmnet:33876          <=     43.3Kb     41.6Kb     41.6Kb     31.2KB
   4 an-launcher1002.eqiad.wmnet:12004        =>     25.9Mb     24.8Mb     24.8Mb     18.6MB
     an-worker1127.eqiad.wmnet:53228          <=     44.5Kb     42.7Kb     42.7Kb     32.0KB
   5 an-launcher1002.eqiad.wmnet:12004        =>     25.9Mb     24.8Mb     24.8Mb     18.6MB
     an-worker1108.eqiad.wmnet:33848          <=     39.6Kb     38.9Kb     38.9Kb     29.2KB
   6 an-launcher1002.eqiad.wmnet:12001        =>     25.8Mb     24.8Mb     24.8Mb     18.6MB
     an-worker1121.eqiad.wmnet:37246          <=     43.5Kb     49.8Kb     49.8Kb     37.4KB
   7 an-launcher1002.eqiad.wmnet:12004        =>     25.9Mb     24.7Mb     24.7Mb     18.5MB
     an-worker1118.eqiad.wmnet:47564          <=     59.3Kb     56.8Kb     56.8Kb     42.6KB
   8 an-launcher1002.eqiad.wmnet:12004        =>     25.9Mb     24.7Mb     24.7Mb     18.6MB
     an-worker1122.eqiad.wmnet:57926          <=     51.6Kb     44.3Kb     44.3Kb     33.2KB
   9 an-launcher1002.eqiad.wmnet:12000        =>     25.9Mb     24.7Mb     24.7Mb     18.5MB
     an-worker1118.eqiad.wmnet:47096          <=     50.8Kb     51.8Kb     51.8Kb     38.8KB
  10 an-launcher1002.eqiad.wmnet:12004        =>     25.9Mb     24.7Mb     24.7Mb     18.5MB
     an-worker1133.eqiad.wmnet:51164          <=     57.9Kb     49.8Kb     49.8Kb     37.4KB
########
Tue 07 Mar 2023 12:02:16 AM UTC
Listening on eno1
   # Host name (port/service if enabled)            last 2s   last 10s   last 40s cumulative
--------------------------------------------------------------------------------------------
   1 an-launcher1002.eqiad.wmnet:12001        =>     25.9Mb     24.0Mb     24.0Mb     18.0MB
     an-worker1084.eqiad.wmnet:38220          <=     46.5Kb     44.0Kb     44.0Kb     33.0KB
   2 an-launcher1002.eqiad.wmnet:12000        =>     25.6Mb     24.0Mb     24.0Mb     18.0MB
     an-worker1089.eqiad.wmnet:60706          <=     46.5Kb     41.3Kb     41.3Kb     31.0KB
   3 an-launcher1002.eqiad.wmnet:12000        =>     25.1Mb     23.9Mb     23.9Mb     17.9MB
     an-worker1118.eqiad.wmnet:47074          <=     53.2Kb     47.7Kb     47.7Kb     35.8KB
   4 an-launcher1002.eqiad.wmnet:12004        =>     25.4Mb     23.9Mb     23.9Mb     17.9MB
     an-worker1143.eqiad.wmnet:49266          <=     35.8Kb     38.5Kb     38.5Kb     28.8KB
   5 an-launcher1002.eqiad.wmnet:12004        =>     25.4Mb     23.8Mb     23.8Mb     17.9MB
     an-worker1133.eqiad.wmnet:51172          <=     54.0Kb     49.2Kb     49.2Kb     36.9KB
   6 an-launcher1002.eqiad.wmnet:12000        =>     25.7Mb     23.8Mb     23.8Mb     17.8MB
     an-worker1118.eqiad.wmnet:47086          <=     51.2Kb     52.0Kb     52.0Kb     39.0KB
   7 an-launcher1002.eqiad.wmnet:12000        =>     25.1Mb     23.7Mb     23.7Mb     17.8MB
     an-worker1080.eqiad.wmnet:51526          <=     50.4Kb     45.8Kb     45.8Kb     34.4KB
   8 an-launcher1002.eqiad.wmnet:12004        =>     25.4Mb     23.7Mb     23.7Mb     17.8MB
     an-worker1119.eqiad.wmnet:34204          <=     57.9Kb     55.9Kb     55.9Kb     41.9KB
   9 an-launcher1002.eqiad.wmnet:12000        =>     25.4Mb     23.7Mb     23.7Mb     17.8MB
     an-worker1119.eqiad.wmnet:57538          <=     52.2Kb     48.5Kb     48.5Kb     36.4KB
  10 an-launcher1002.eqiad.wmnet:12004        =>     25.4Mb     23.7Mb     23.7Mb     17.8MB
     an-worker1087.eqiad.wmnet:38122          <=     49.4Kb     46.0Kb     46.0Kb     34.5KB
########
....
########
Tue 07 Mar 2023 12:03:06 AM UTC
Listening on eno1
   # Host name (port/service if enabled)            last 2s   last 10s   last 40s cumulative
--------------------------------------------------------------------------------------------
   1 an-launcher1002.eqiad.wmnet:12004        =>     73.5Mb     54.7Mb     54.7Mb     41.0MB
     an-worker1148.eqiad.wmnet:40920          <=      135Kb     88.4Kb     88.4Kb     66.3KB
   2 an-launcher1002.eqiad.wmnet:12004        =>     41.9Mb     50.7Mb     50.7Mb     38.0MB
     an-worker1119.eqiad.wmnet:34192          <=     64.8Kb     76.8Kb     76.8Kb     57.6KB
   3 an-launcher1002.eqiad.wmnet:12004        =>     53.8Mb     36.4Mb     36.4Mb     27.3MB
     analytics1077.eqiad.wmnet:58216          <=      109Kb     74.6Kb     74.6Kb     56.0KB
   4 an-launcher1002.eqiad.wmnet:12004        =>     40.3Mb     31.9Mb     31.9Mb     23.9MB
     analytics1077.eqiad.wmnet:58210          <=     86.5Kb     67.7Kb     67.7Kb     50.8KB
   5 an-launcher1002.eqiad.wmnet:12004        =>     39.7Mb     31.6Mb     31.6Mb     23.7MB
     an-worker1143.eqiad.wmnet:49150          <=     75.8Kb     64.9Kb     64.9Kb     48.6KB
   6 an-launcher1002.eqiad.wmnet:12000        =>     39.4Mb     31.6Mb     31.6Mb     23.7MB
     an-worker1119.eqiad.wmnet:57562          <=     70.5Kb     59.9Kb     59.9Kb     44.9KB
   7 an-launcher1002.eqiad.wmnet:12000        =>     38.6Mb     31.5Mb     31.5Mb     23.7MB
     an-worker1120.eqiad.wmnet:51910          <=     75.2Kb     61.0Kb     61.0Kb     45.8KB
   8 an-launcher1002.eqiad.wmnet:12000        =>     38.6Mb     31.4Mb     31.4Mb     23.5MB
     an-worker1118.eqiad.wmnet:47104          <=     76.8Kb     64.5Kb     64.5Kb     48.4KB
   9 an-launcher1002.eqiad.wmnet:12000        =>     38.9Mb     31.4Mb     31.4Mb     23.5MB
     an-worker1118.eqiad.wmnet:47122          <=     73.9Kb     63.7Kb     63.7Kb     47.8KB
  10 an-launcher1002.eqiad.wmnet:12000        =>     38.6Mb     31.4Mb     31.4Mb     23.5MB
     an-worker1119.eqiad.wmnet:57554          <=     74.1Kb     60.8Kb     60.8Kb     45.6KB

From that point I can identify those 4 spark jobs runing in yarn client mode

analyti+ 18328 14486 99 00:00 ?        00:01:59 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -cp /usr/lib/hive/lib/hive-jdbc.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common.jar:/usr/lib/hive/lib/hive-service.jar:/usr/lib/spark2/conf/:/usr/lib/spark2/jars/*:/etc/hadoop/conf/:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/usr/share/java/apache-log4j-extras.jar -Xmx2G -Djava.net.useSystemProxies=True org.apache.spark.deploy.SparkSubmit --master yarn --deploy-mode client --conf spark.driver.memory=2G --conf spark.driver.extraClassPath=/usr/lib/hive/lib/hive-jdbc.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common.jar:/usr/lib/hive/lib/hive-service.jar --conf spark.dynamicAllocation.maxExecutors=64 --class org.wikimedia.analytics.refinery.job.HiveToDruid --name eventlogging_to_druid_netflow-sanitization_daily --queue production --files /etc/hive/conf/hive-site.xml,/etc/refinery/eventlogging_to_druid/eventlogging_to_druid_netflow-sanitization_daily.properties /srv/deployment/analytics/refinery/artifacts/org/wikimedia/analytics/refinery/refinery-job-0.0.146.jar --config_file /etc/refinery/eventlogging_to_druid/eventlogging_to_druid_netflow-sanitization_daily.properties --since 2023-01-05T00:00:00 --until 2023-01-06T00:00:00

analyti+ 19199 14387 93 00:00 ?        00:01:48 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -cp /usr/lib/hive/lib/hive-jdbc.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common.jar:/usr/lib/hive/lib/hive-service.jar:/usr/lib/spark2/conf/:/usr/lib/spark2/jars/*:/etc/hadoop/conf/:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/usr/share/java/apache-log4j-extras.jar -Xmx2G -Djava.net.useSystemProxies=True org.apache.spark.deploy.SparkSubmit --master yarn --deploy-mode client --conf spark.driver.memory=2G --conf spark.driver.extraClassPath=/usr/lib/hive/lib/hive-jdbc.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common.jar:/usr/lib/hive/lib/hive-service.jar --conf spark.dynamicAllocation.maxExecutors=64 --class org.wikimedia.analytics.refinery.job.HiveToDruid --name eventlogging_to_druid_netflow_daily --queue production --files /etc/hive/conf/hive-site.xml,/etc/refinery/eventlogging_to_druid/eventlogging_to_druid_netflow_daily.properties /srv/deployment/analytics/refinery/artifacts/org/wikimedia/analytics/refinery/refinery-job-0.0.146.jar --config_file /etc/refinery/eventlogging_to_druid/eventlogging_to_druid_netflow_daily.properties --since 2023-03-03T00:00:00 --until 2023-03-04T00:00:00

analyti+ 19138 14467 78 00:00 ?        00:00:59 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -cp /usr/lib/hive/lib/hive-jdbc.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common.jar:/usr/lib/hive/lib/hive-service.jar:/usr/lib/spark2/conf/:/usr/lib/spark2/jars/*:/etc/hadoop/conf/:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/usr/share/java/apache-log4j-extras.jar -Xmx2G -Djava.net.useSystemProxies=True org.apache.spark.deploy.SparkSubmit --master yarn --deploy-mode client --conf spark.driver.memory=2G --conf spark.driver.extraClassPath=/usr/lib/hive/lib/hive-jdbc.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common.jar:/usr/lib/hive/lib/hive-service.jar --conf spark.dynamicAllocation.maxExecutors=32 --class org.wikimedia.analytics.refinery.job.HiveToDruid --name eventlogging_to_druid_netflow_hourly --queue production --files /etc/hive/conf/hive-site.xml,/etc/refinery/eventlogging_to_druid/eventlogging_to_druid_netflow_hourly.properties /srv/deployment/analytics/refinery/artifacts/org/wikimedia/analytics/refinery/refinery-job-0.0.146.jar --config_file /etc/refinery/eventlogging_to_druid/eventlogging_to_druid_netflow_hourly.properties --since 2023-03-06T18:00:00 --until 2023-03-06T19:00:00

analyti+ 18945 14391 49 00:00 ?        00:00:57 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -cp /usr/lib/hive/lib/hive-jdbc.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common.jar:/usr/lib/hive/lib/hive-service.jar:/usr/lib/spark2/conf/:/usr/lib/spark2/jars/*:/etc/hadoop/conf/:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/usr/share/java/apache-log4j-extras.jar -Xmx2G -Djava.net.useSystemProxies=True org.apache.spark.deploy.SparkSubmit --master yarn --deploy-mode client --conf spark.driver.memory=2G --conf spark.driver.extraClassPath=/usr/lib/hive/lib/hive-jdbc.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common.jar:/usr/lib/hive/lib/hive-service.jar --conf spark.dynamicAllocation.maxExecutors=64 --class org.wikimedia.analytics.refinery.job.HiveToDruid --name eventlogging_to_druid_network_flows_internal_daily --queue production --files /etc/hive/conf/hive-site.xml,/etc/refinery/eventlogging_to_druid/eventlogging_to_druid_network_flows_internal_daily.properties /srv/deployment/analytics/refinery/artifacts/org/wikimedia/analytics/refinery/refinery-job-0.0.146.jar --config_file /etc/refinery/eventlogging_to_druid/eventlogging_to_druid_network_flows_internal_daily.properties --since 2023-03-03T00:00:00 --until 2023-03-04T00:00:00

Linked to those 4 services

● eventlogging_to_druid_network_flows_internal_daily.timer - Periodic execution of eventlogging_to_druid_network_flows_internal_daily.service
   Loaded: loaded (/lib/systemd/system/eventlogging_to_druid_network_flows_internal_daily.timer; enabled; vendor preset: enabled)
   Active: active (waiting) since Wed 2022-04-13 16:43:05 UTC; 10 months 23 days ago
  Trigger: Wed 2023-03-08 00:00:00 UTC; 13h left
● eventlogging_to_druid_netflow-sanitization_daily.timer - Periodic execution of eventlogging_to_druid_netflow-sanitization_daily.service
   Loaded: loaded (/lib/systemd/system/eventlogging_to_druid_netflow-sanitization_daily.timer; enabled; vendor preset: enabled)
   Active: active (waiting) since Wed 2022-04-13 16:43:07 UTC; 10 months 23 days ago
  Trigger: Wed 2023-03-08 00:00:00 UTC; 13h left
● eventlogging_to_druid_netflow_daily.timer - Periodic execution of eventlogging_to_druid_netflow_daily.service
   Loaded: loaded (/lib/systemd/system/eventlogging_to_druid_netflow_daily.timer; enabled; vendor preset: enabled)
   Active: active (waiting) since Wed 2022-04-13 16:43:04 UTC; 10 months 23 days ago
  Trigger: Wed 2023-03-08 00:00:00 UTC; 13h left
● eventlogging_to_druid_netflow_hourly.timer - Periodic execution of eventlogging_to_druid_netflow_hourly.service
   Loaded: loaded (/lib/systemd/system/eventlogging_to_druid_netflow_hourly.timer; enabled; vendor preset: enabled)
   Active: active (waiting) since Wed 2023-02-22 18:48:41 UTC; 1 weeks 5 days ago
  Trigger: Tue 2023-03-07 11:00:00 UTC; 15min left

The driver is in charge of servicing files, jars and app jar through http file server. With those potential 64 executors * 4 jobs getting the 100MB refinery-job-0.0.146.jar it indeed generate some loads (more than 3 min of network transfer at full speed).

We must move to yarn cluster mode

Let's wait for Airflow 2.5 - we'll migrate those jobs to Airflow and they'll run through Skein.

BTullis claimed this task.
BTullis moved this task from Backlog to Shared Data Infra on the Data-Engineering-Planning board.
BTullis subscribed.

To wrap up this investigation, here's a brief summary as I understand it.

  • The main cause of the MariaDB errors around midnight was the fact that the network interface on an-launcher1002 is completely saturated, blocking MariaDB traffic.
  • The main cause of the saturation is that several jobs are currently launched in spark client mode. These jobs are:
    1. eventlogging_to_druid_netflow_daily
    2. eventlogging_to_druid_netflow_hourly
    3. eventlogging_to_druid_netflow-sanitization_daily
    4. eventlogging_to_druid_network_flows_internal_daily
  • It is likely the copying of all of the jar files to all of the executor containers in client mode that causes the excessive traffic.
  • We can rewrite these jobs to use skein, which will prevent this excessive traffic.
  • The migratin to spark3 and skein is already planned and will be taking place anyway, as part of the migration to Airflow.

The ticket tracking the migration work is: T324485: [Airflow] Migrate Druid loading Oozie jobs - Parent task

So I will close this ticket for now.