
wmf_dumps.wikitext_raw_rc2 backfill failing with FetchFailedException
Closed, Resolved | Public | 1 Estimated Story Points

Description

This has happened sporadically, multiple times, on the spark_create_intermediate_table job, which runs backfill_create_intermediate_table.py.

I believe a quick fix similar to what we did in T340863#9397991 will resolve this issue. TL;DR: give the temporary table a write order as well, via ALTER TABLE tmp.tmp_table WRITE ORDERED BY wiki_db, revision_timestamp.
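As a rough illustration of the proposed fix, the sketch below builds the statement from a table name and a list of sort columns. The helper name `write_ordered_by_ddl` is hypothetical and is not taken from backfill_create_intermediate_table.py. It assumes the temporary table is an Iceberg table and that the Spark session has the Iceberg SQL extensions enabled, since WRITE ORDERED BY is Iceberg DDL.

```python
def write_ordered_by_ddl(table: str, columns: list[str]) -> str:
    """Build an Iceberg `ALTER TABLE ... WRITE ORDERED BY` statement.

    Hypothetical helper for illustration only; the real job may issue
    the statement differently.
    """
    if not columns:
        raise ValueError("at least one sort column is required")
    return f"ALTER TABLE {table} WRITE ORDERED BY {', '.join(columns)}"


# Table and column names are the ones quoted in the task description.
ddl = write_ordered_by_ddl("tmp.tmp_table", ["wiki_db", "revision_timestamp"])
print(ddl)

# Assumption: with a SparkSession named `spark` that has the Iceberg SQL
# extensions configured, the statement would be applied with:
#   spark.sql(ddl)
```

The statement would need to run after the temporary table is created and before it is written to, so that the write is sorted by the declared columns.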

Event Timeline

Spark job: https://yarn.wikimedia.org/cluster/app/application_1707226456123_87407. Finished successfully in ~17 hours. Some evidence of retries, but overall the best time so far.

Will wait until the full backfill is done to do a full comparison.

The spark_create_intermediate_table job did not show evidence of FetchFailedExceptions. There were retries, but they were unrelated. It also finished ~21% faster as shown below.

The remaining Spark tasks were comparable to the last successful run. Overall, the whole backfill process finished ~9% faster. We are good here.

| Task | Before (2024-01-01) | After (2024-02-01) | Diff factor | Notes |
| --- | --- | --- | --- | --- |
| wait_for_data_in_mw_wikitext_history | 0:00:04 | 0:00:02 | 0.5 | |
| wait_for_data_in_raw_mediawiki_revision | 0:00:03 | 0:00:02 | 0.6666666667 | |
| spark_create_intermediate_table | 21:32:33 | 16:58:52 | 0.7882609312 | |
| spark_backfill_merge_into_2001 | 0:03:08 | 0:03:19 | 1.058510638 | |
| spark_backfill_merge_into_2002 | 0:03:40 | 0:04:27 | 1.213636364 | |
| spark_backfill_merge_into_2003 | 0:03:39 | 0:04:44 | 1.296803653 | |
| spark_backfill_merge_into_2004 | 0:07:08 | 0:05:58 | 0.8364485981 | |
| spark_backfill_merge_into_2005 | 0:17:45 | 0:16:45 | 0.9436619718 | |
| spark_backfill_merge_into_2006 | 0:59:55 | 0:46:36 | 0.7777468707 | |
| spark_backfill_merge_into_2007 | 1:38:11 | 1:09:18 | 0.705822441 | |
| spark_backfill_merge_into_2008 | 1:38:40 | 1:26:29 | 0.8765202703 | |
| spark_backfill_merge_into_2009 | 1:30:53 | 1:33:50 | 1.032459197 | |
| spark_backfill_merge_into_2010 | 1:41:07 | 1:24:53 | 0.8394593704 | |
| spark_backfill_merge_into_2011 | 1:36:24 | 1:24:17 | 0.8743084371 | |
| spark_backfill_merge_into_2012 | 1:38:54 | 1:35:26 | 0.9649477587 | |
| spark_backfill_merge_into_2013 | 1:46:36 | 1:40:52 | 0.9462163852 | |
| spark_backfill_merge_into_2014 | 1:41:41 | 1:40:59 | 0.9931158826 | |
| spark_backfill_merge_into_2015 | 1:54:56 | 1:37:21 | 0.847012761 | |
| spark_backfill_merge_into_2016 | 2:07:12 | 2:06:14 | 0.9924004193 | |
| spark_backfill_merge_into_2017 | 2:10:11 | 2:13:32 | 1.025732941 | |
| spark_backfill_merge_into_2018 | 2:36:20 | 2:54:49 | 1.118230277 | |
| spark_backfill_merge_into_2019 | 3:29:06 | 2:58:00 | 0.8512673362 | |
| spark_backfill_merge_into_2020 | 2:22:12 | 3:11:50 | 1.349038912 | |
| spark_backfill_merge_into_2021 | 2:52:15 | 2:41:29 | 0.9374939526 | |
| spark_backfill_merge_into_2022 | 4:08:28 | 4:25:13 | 1.067413469 | |
| spark_backfill_merge_into_2023 | 4:28:56 | 3:49:54 | 0.854858701 | |
| spark_backfill_merge_into_2024 | 0:07:10 | 0:36:58 | 5.158139535 | 2024 data keeps growing so this one is not informative. |
| drop_intermediate_table | 0:00:25 | 0:00:27 | 1.08 | |
| remove_intermediate_files | 0:00:31 | 0:00:30 | 0.9677419355 | |
| Total | 62:38:03 | 56:53:06 | 0.9082103751 | 9% total time improvement |
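The Diff factor values are consistent with dividing the After duration by the Before duration, so values below 1 mean the newer run was faster. The following is a minimal sketch of that arithmetic; the function names `to_seconds` and `diff_factor` are illustrative and not part of the backfill code.

```python
def to_seconds(duration: str) -> int:
    """Convert an H:MM:SS duration (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def diff_factor(before: str, after: str) -> float:
    """Return after / before; a value below 1 means the After run was faster."""
    return to_seconds(after) / to_seconds(before)


# Durations taken from the table rows for the intermediate table job and the total.
intermediate = diff_factor("21:32:33", "16:58:52")
total = diff_factor("62:38:03", "56:53:06")

print(f"spark_create_intermediate_table: {intermediate:.4f}")
print(f"Total: {total:.4f}")
```

Subtracting the factor from 1 gives the relative improvement, which is where the roughly 21% figure for spark_create_intermediate_table and the roughly 9% figure for the total come from.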