idwikivoyage is a small wiki. On a recent `consistency_check.py` run against it, we hit a `ZeroDivisionError: division by zero`:
Script:
script: venv/bin/spark-submit --driver-cores 4 --conf spark.executorEnv.PYSPARK_DRIVER_PYTHON=venv/bin/python --conf spark.executorEnv.PYSPARK_PYTHON=venv/bin/python --conf spark.executorEnv.SPARK_CONF_DIR=/etc/spark3/conf --conf spark.executorEnv.SPARK_HOME=venv/lib/python3.10/site-packages/pyspark --master yarn --conf spark.shuffle.service.name=spark_shuffle_3_3 --conf spark.shuffle.service.port=7338 --conf spark.yarn.archive=hdfs:///user/spark/share/lib/spark-3.3.2-assembly.zip --conf spark.jars.packages=org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.6.1 --conf spark.jars.ivySettings=/etc/maven/ivysettings.xml --conf 'spark.driver.extraJavaOptions=-Divy.cache.dir=/tmp/analytics/ivy_spark3/cache -Divy.home=/tmp/analytics/ivy_spark3/home' --conf spark.dynamicAllocation.maxExecutors=16 --conf spark.sql.shuffle.partitions=64 --conf spark.jars=hdfs:///wmf/cache/artifacts/airflow/analytics/mysql-connector-j-8.2.0.jar,hdfs:///wmf/cache/artifacts/airflow/analytics/refinery-spark-0.2.51.jar --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 --conf spark.yarn.appMasterEnv.SPARK_HOME=venv/lib/python3.10/site-packages/pyspark --conf spark.yarn.appMasterEnv.SPARK_CONF_DIR=/etc/spark3/conf --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=venv/bin/python --conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=venv/bin/python --archives 'hdfs:///wmf/cache/artifacts/airflow/analytics/mediawiki-content-dump-0.2.0.dev0-fix-emit-dt-issue.conda.tgz#venv' --executor-cores 2 --executor-memory 8G --driver-memory 8G --keytab analytics.keytab --principal analytics/an-launcher1002.eqiad.wmnet@WIKIMEDIA --name dumps_reconcile_wikitext_raw_daily__reconcile.spark_consistency_check__20241202 --queue production --deploy-mode client venv/bin/consistency_check.py --wiki_db idwikivoyage --target_table wmf_dumps.wikitext_raw_rc2 --results_table wmf_dumps.wikitext_inconsistent_rows_rc1 --min_timestamp 2024-12-02T00:00:00 --max_timestamp 2024-12-03T00:00:00 
--spark_jdbc_num_partitions 1 --computation_dt 2024-12-03T00:00:00 --mariadb_password_file ***
Logs:
sudo -u analytics yarn logs -appOwner analytics -applicationId application_1732360673415_354900
...
24/12/03 13:37:35 INFO DAGScheduler: Job 0 finished: take at /var/lib/hadoop/data/g/yarn/local/usercache/analytics/appcache/application_1732360673415_354900/container_e129_1732360673415_354900_01_000001/venv/lib/python3.10/site-packages/mediawiki_content_dump/consistency_check.py:129, took 9.617400 s
Traceback (most recent call last):
File "/var/lib/hadoop/data/b/yarn/local/usercache/analytics/appcache/application_1732360673415_354900/filecache/10/mediawiki-content-dump-0.2.0.dev0-fix-emit-dt-issue.conda.tgz/bin/consistency_check.py", line 8, in <module>
sys.exit(main())
File "/var/lib/hadoop/data/g/yarn/local/usercache/analytics/appcache/application_1732360673415_354900/container_e129_1732360673415_354900_01_000001/venv/lib/python3.10/site-packages/mediawiki_content_dump/consistency_check.py", line 408, in main
find_inconsistent_rows(
File "/var/lib/hadoop/data/g/yarn/local/usercache/analytics/appcache/application_1732360673415_354900/container_e129_1732360673415_354900_01_000001/venv/lib/python3.10/site-packages/mediawiki_content_dump/consistency_check.py", line 233, in find_inconsistent_rows
source_revisions_sql_df = df_from_mariadb_replica_adaptive(
File "/var/lib/hadoop/data/g/yarn/local/usercache/analytics/appcache/application_1732360673415_354900/container_e129_1732360673415_354900_01_000001/venv/lib/python3.10/site-packages/mediawiki_content_dump/consistency_check.py", line 151, in df_from_mariadb_replica_adaptive
estimate_ratio = max((end_timestamp - start_timestamp).days, 1) / (last_timestamp - first_timestamp).days
ZeroDivisionError: division by zero
...
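The failing line divides by `(last_timestamp - first_timestamp).days`. `timedelta.days` only counts whole days, so it is 0 whenever the two timestamps are less than 24 hours apart, which seems likely for a small wiki with few revisions. Below is a minimal sketch of one possible guard, assuming the four values are `datetime` objects as the traceback suggests. The function name `estimate_ratio_safe` and the sample timestamps are hypothetical, and clamping the denominator to 1 simply mirrors what the numerator already does; it is not necessarily the fix the authors would choose.

```python
from datetime import datetime


def estimate_ratio_safe(start_timestamp, end_timestamp, first_timestamp, last_timestamp):
    """Hypothetical guarded version of the expression at consistency_check.py:151."""
    # Same clamp as the original numerator: at least one day requested.
    requested_days = max((end_timestamp - start_timestamp).days, 1)
    # Assumption: clamp the sampled span the same way, so that a span of
    # less than one day (timedelta.days == 0) cannot cause a division by zero.
    sampled_days = max((last_timestamp - first_timestamp).days, 1)
    return requested_days / sampled_days


# Illustrative values only: a one-day window (as in the failing run) whose
# sampled revisions all fall within the same day.
first = datetime(2024, 12, 2, 3, 0, 0)
last = datetime(2024, 12, 2, 21, 0, 0)
assert (last - first).days == 0  # this is what triggers the ZeroDivisionError

ratio = estimate_ratio_safe(
    datetime(2024, 12, 2), datetime(2024, 12, 3), first, last
)
print(ratio)
```

An alternative would be to compute both spans with `timedelta.total_seconds()`, which keeps sub-day resolution, but that would still need a guard for the case where only one revision is sampled and `first_timestamp == last_timestamp`.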