Seen in the logs while processing analytics_platform_eng.image_suggestions_search_index_delta/snapshot=2024-02-12.
Stack:
LogType:stdout
LogLastModifiedTime:Sun Feb 25 00:24:20 +0000 2024
LogLength:1359
LogContents:
INFO:root:Validating configuration for table analytics_platform_eng.image_suggestions_search_index_delta
INFO:root:Validating fields aliased to weighted_tags
Traceback (most recent call last):
  File "/var/lib/hadoop/data/k/yarn/local/usercache/analytics-search/appcache/application_1707226456123_97551/container_e117_1707226456123_97551_01_000001/convert_to_esbulk.py", line 8, in <module>
    sys.exit(run_cli())
  File "/var/lib/hadoop/data/k/yarn/local/usercache/analytics-search/appcache/application_1707226456123_97551/container_e117_1707226456123_97551_01_000001/venv/lib/python3.10/site-packages/discolytics/cli/convert_to_esbulk.py", line 731, in run_cli
    return main(**dict(vars(args)))
  File "/var/lib/hadoop/data/k/yarn/local/usercache/analytics-search/appcache/application_1707226456123_97551/container_e117_1707226456123_97551_01_000001/venv/lib/python3.10/site-packages/discolytics/cli/convert_to_esbulk.py", line 719, in main
    unique_value_per_partition(df, limit_per_file, 'wikiid')
  File "/var/lib/hadoop/data/k/yarn/local/usercache/analytics-search/appcache/application_1707226456123_97551/container_e117_1707226456123_97551_01_000001/venv/lib/python3.10/site-packages/discolytics/cli/convert_to_esbulk.py", line 631, in unique_value_per_partition
    raise Exception('Empty dataframe provided')
Exception: Empty dataframe provided

Checking analytics_platform_eng.image_suggestions_search_index_delta/snapshot=2024-02-12 shows that the partition is empty, which appears to be expected (see T345570, https://wikimedia.slack.com/archives/C01DFVAQRGA/p1708941091842049).
AC:
- convert_to_esbulk.py should accept empty partitions
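
One possible shape for the fix, as a rough sketch only: check for an empty input before the partitioning step and return early instead of raising. The helper names `is_empty` and `convert_unless_empty` are hypothetical and are not part of discolytics. `FakeDataFrame` is a stand-in so the sketch runs without Spark; the only dataframe method assumed is `take(n)`, which Spark dataframes provide. Whether an empty run should still write some output for downstream consumers is left open here.

```python
import logging
from typing import Any, Callable, Optional


def is_empty(df: Any) -> bool:
    """Return True when the dataframe has no rows.

    take(1) fetches at most one row, so this avoids a full count.
    """
    return len(df.take(1)) == 0


def convert_unless_empty(df: Any, convert: Callable[[Any], Any]) -> Optional[Any]:
    """Hypothetical guard: skip the conversion when the input is empty.

    `convert` stands for the existing conversion step (for example the code
    that calls unique_value_per_partition); it is only invoked when there is
    at least one row.
    """
    if is_empty(df):
        logging.info("Input partition is empty, nothing to convert")
        return None
    return convert(df)


class FakeDataFrame:
    """Stand-in for a Spark DataFrame so this sketch runs without Spark."""

    def __init__(self, rows):
        self._rows = list(rows)

    def take(self, n):
        # Mirrors the Spark behaviour of returning a list of up to n rows.
        return self._rows[:n]


if __name__ == "__main__":
    empty = FakeDataFrame([])
    non_empty = FakeDataFrame([{"wikiid": "enwiki"}])
    # The converter is never called for the empty input.
    print(convert_unless_empty(empty, lambda d: "converted"))
    print(convert_unless_empty(non_empty, lambda d: "converted"))
```

With a guard like this, an empty snapshot such as 2024-02-12 would finish cleanly with an INFO log line rather than failing the job.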