Seen in the logs while processing analytics_platform_eng.image_suggestions_search_index_delta/snapshot=2024-02-12.
Stack:
LogType:stdout
LogLastModifiedTime:Sun Feb 25 00:24:20 +0000 2024
LogLength:1359
LogContents:
INFO:root:Validating configuration for table analytics_platform_eng.image_suggestions_search_index_delta
INFO:root:Validating fields aliased to weighted_tags
Traceback (most recent call last):
  File "/var/lib/hadoop/data/k/yarn/local/usercache/analytics-search/appcache/application_1707226456123_97551/container_e117_1707226456123_97551_01_000001/convert_to_esbulk.py", line 8, in <module>
    sys.exit(run_cli())
  File "/var/lib/hadoop/data/k/yarn/local/usercache/analytics-search/appcache/application_1707226456123_97551/container_e117_1707226456123_97551_01_000001/venv/lib/python3.10/site-packages/discolytics/cli/convert_to_esbulk.py", line 731, in run_cli
    return main(**dict(vars(args)))
  File "/var/lib/hadoop/data/k/yarn/local/usercache/analytics-search/appcache/application_1707226456123_97551/container_e117_1707226456123_97551_01_000001/venv/lib/python3.10/site-packages/discolytics/cli/convert_to_esbulk.py", line 719, in main
    unique_value_per_partition(df, limit_per_file, 'wikiid')
  File "/var/lib/hadoop/data/k/yarn/local/usercache/analytics-search/appcache/application_1707226456123_97551/container_e117_1707226456123_97551_01_000001/venv/lib/python3.10/site-packages/discolytics/cli/convert_to_esbulk.py", line 631, in unique_value_per_partition
    raise Exception('Empty dataframe provided')
Exception: Empty dataframe provided
On inspection, analytics_platform_eng.image_suggestions_search_index_delta/snapshot=2024-02-12 is empty, and this appears to be expected (see T345570, https://wikimedia.slack.com/archives/C01DFVAQRGA/p1708941091842049).
AC:
- convert_to_esbulk.py should accept empty partitions
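One possible shape for the fix, as a rough sketch only: check for an empty input before the conversion step and skip it instead of raising. The helper name `convert_unless_empty` and the `FakeFrame` stand-in are hypothetical and not part of discolytics. The sketch assumes the input is a Spark DataFrame, whose `take(1)` returns a list of at most one row. Whether an empty partition should produce no output at all or an empty output file is left open here.

```python
import logging
from typing import Any, Callable, List, Optional

log = logging.getLogger(__name__)


def convert_unless_empty(df: Any, convert: Callable[[Any], Any]) -> Optional[Any]:
    """Run convert(df), or return None without calling it when df has no rows.

    Hypothetical helper: only relies on df having a take(num) method.
    """
    # take(1) fetches at most one row, so the check avoids a full count().
    if not df.take(1):
        log.info("Empty dataframe provided, skipping conversion")
        return None
    return convert(df)


class FakeFrame:
    """Stand-in for a Spark DataFrame, used only to demonstrate the guard."""

    def __init__(self, rows: List[dict]):
        self.rows = rows

    def take(self, num: int) -> List[dict]:
        return self.rows[:num]


# Empty input: the conversion callable is never invoked.
skipped = convert_unless_empty(FakeFrame([]), lambda d: len(d.rows))

# Non-empty input: the conversion callable runs as usual.
converted = convert_unless_empty(
    FakeFrame([{"wikiid": "enwiki"}]), lambda d: len(d.rows)
)
```

In the real script the guard would presumably sit before the `unique_value_per_partition(df, limit_per_file, 'wikiid')` call in `main`, so that an empty snapshot exits cleanly rather than failing the job.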