Division by zero when running consistency_check
Closed, Resolved (Public)

Description

idwikivoyage is a small wiki. On a recent `consistency_check.py` run against it, we hit a `ZeroDivisionError: division by zero`. The invocation and the relevant logs follow.

Script:

script: venv/bin/spark-submit --driver-cores 4 --conf spark.executorEnv.PYSPARK_DRIVER_PYTHON=venv/bin/python
  --conf spark.executorEnv.PYSPARK_PYTHON=venv/bin/python --conf spark.executorEnv.SPARK_CONF_DIR=/etc/spark3/conf
  --conf spark.executorEnv.SPARK_HOME=venv/lib/python3.10/site-packages/pyspark
  --master yarn --conf spark.shuffle.service.name=spark_shuffle_3_3 --conf spark.shuffle.service.port=7338
  --conf spark.yarn.archive=hdfs:///user/spark/share/lib/spark-3.3.2-assembly.zip
  --conf spark.jars.packages=org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.6.1
  --conf spark.jars.ivySettings=/etc/maven/ivysettings.xml --conf 'spark.driver.extraJavaOptions=-Divy.cache.dir=/tmp/analytics/ivy_spark3/cache
  -Divy.home=/tmp/analytics/ivy_spark3/home' --conf spark.dynamicAllocation.maxExecutors=16
  --conf spark.sql.shuffle.partitions=64 --conf spark.jars=hdfs:///wmf/cache/artifacts/airflow/analytics/mysql-connector-j-8.2.0.jar,hdfs:///wmf/cache/artifacts/airflow/analytics/refinery-spark-0.2.51.jar
  --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 --conf spark.yarn.appMasterEnv.SPARK_HOME=venv/lib/python3.10/site-packages/pyspark
  --conf spark.yarn.appMasterEnv.SPARK_CONF_DIR=/etc/spark3/conf --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=venv/bin/python
  --conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=venv/bin/python --archives
  'hdfs:///wmf/cache/artifacts/airflow/analytics/mediawiki-content-dump-0.2.0.dev0-fix-emit-dt-issue.conda.tgz#venv'
  --executor-cores 2 --executor-memory 8G --driver-memory 8G --keytab analytics.keytab
  --principal analytics/an-launcher1002.eqiad.wmnet@WIKIMEDIA --name dumps_reconcile_wikitext_raw_daily__reconcile.spark_consistency_check__20241202
  --queue production --deploy-mode client venv/bin/consistency_check.py --wiki_db
  idwikivoyage --target_table wmf_dumps.wikitext_raw_rc2 --results_table wmf_dumps.wikitext_inconsistent_rows_rc1
  --min_timestamp 2024-12-02T00:00:00 --max_timestamp 2024-12-03T00:00:00 --spark_jdbc_num_partitions
  1 --computation_dt 2024-12-03T00:00:00 --mariadb_password_file ***

Logs:

sudo -u analytics yarn logs -appOwner analytics -applicationId application_1732360673415_354900

...
24/12/03 13:37:35 INFO DAGScheduler: Job 0 finished: take at /var/lib/hadoop/data/g/yarn/local/usercache/analytics/appcache/application_1732360673415_354900/container_e129_1732360673415_354900_01_000001/venv/lib/python3.10/site-packages/mediawiki_content_dump/consistency_check.py:129, took 9.617400 s
Traceback (most recent call last):
  File "/var/lib/hadoop/data/b/yarn/local/usercache/analytics/appcache/application_1732360673415_354900/filecache/10/mediawiki-content-dump-0.2.0.dev0-fix-emit-dt-issue.conda.tgz/bin/consistency_check.py", line 8, in <module>
    sys.exit(main())
  File "/var/lib/hadoop/data/g/yarn/local/usercache/analytics/appcache/application_1732360673415_354900/container_e129_1732360673415_354900_01_000001/venv/lib/python3.10/site-packages/mediawiki_content_dump/consistency_check.py", line 408, in main
    find_inconsistent_rows(
  File "/var/lib/hadoop/data/g/yarn/local/usercache/analytics/appcache/application_1732360673415_354900/container_e129_1732360673415_354900_01_000001/venv/lib/python3.10/site-packages/mediawiki_content_dump/consistency_check.py", line 233, in find_inconsistent_rows
    source_revisions_sql_df = df_from_mariadb_replica_adaptive(
  File "/var/lib/hadoop/data/g/yarn/local/usercache/analytics/appcache/application_1732360673415_354900/container_e129_1732360673415_354900_01_000001/venv/lib/python3.10/site-packages/mediawiki_content_dump/consistency_check.py", line 151, in df_from_mariadb_replica_adaptive
    estimate_ratio = max((end_timestamp - start_timestamp).days, 1) / (last_timestamp - first_timestamp).days
ZeroDivisionError: division by zero
...
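
For reference, the failing expression divides by `timedelta.days`, which truncates to whole days, so any observed span shorter than 24 hours yields a divisor of 0. The sketch below reproduces that with made-up timestamps and shows one possible guard. The function names, the sample values, and the choice to clamp the divisor to 1 are assumptions for illustration only, not the actual fix; how `first_timestamp` and `last_timestamp` are derived in `consistency_check.py` is not shown in the logs above.

```python
from datetime import datetime


def estimate_ratio_unguarded(start_timestamp, end_timestamp,
                             first_timestamp, last_timestamp):
    # Same expression as line 151 in the traceback.
    return (max((end_timestamp - start_timestamp).days, 1)
            / (last_timestamp - first_timestamp).days)


def estimate_ratio_guarded(start_timestamp, end_timestamp,
                           first_timestamp, last_timestamp):
    # Hypothetical guard: clamp the observed span to at least one day,
    # mirroring the max(..., 1) already applied to the numerator.
    requested_days = max((end_timestamp - start_timestamp).days, 1)
    observed_days = max((last_timestamp - first_timestamp).days, 1)
    return requested_days / observed_days


# Requested window taken from --min_timestamp / --max_timestamp above.
start = datetime(2024, 12, 2)
end = datetime(2024, 12, 3)

# Invented sample: two rows observed less than 24 hours apart.
first = datetime(2024, 12, 2, 9, 15)
last = datetime(2024, 12, 2, 18, 30)

# timedelta.days drops the sub-day remainder, so this span counts as 0 days.
observed_span_days = (last - first).days

try:
    estimate_ratio_unguarded(start, end, first, last)
    unguarded_error = None
except ZeroDivisionError as exc:
    unguarded_error = exc

guarded_ratio = estimate_ratio_guarded(start, end, first, last)

print("observed span in days:", observed_span_days)
print("unguarded raised:", repr(unguarded_error))
print("guarded ratio:", guarded_ratio)
```

Clamping keeps the ratio finite on small wikis at the cost of a coarse estimate. An alternative would be to compute the span with `timedelta.total_seconds()`, which preserves sub-day precision but still needs a guard when both timestamps are identical.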