
Figure out a good place for static HDFS helper files for the structured data team.
Closed, Resolved (Public)

Description

From https://gitlab.wikimedia.org/repos/structured-data/section-topics/-/merge_requests/6#note_13763:

Let's figure out independently of this PR what is a good place on HDFS to put this FILTER_PARQUET parquet file.

Event Timeline

@xcollazo, what about setting paths with VariableProperties, much as we do with the conda artifact?
Something like helper = var_props.get('helper', '/path/to/hdfs')

Yes, that makes sense.

I had two concerns with the current way of using static files:

  1. what happens if we migrate the DAG elsewhere?
  2. what happens if someone else starts using the platform_eng Airflow instance and inadvertently messes up the files?

@mfossati's suggestion takes care of (1): we could temporarily override the path while we change the default to the new location.
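
To make the override idea concrete, here is a minimal sketch of a default-with-override lookup. StandInProps is a hypothetical stand-in written for illustration, not the real VariableProperties class, and the JSON override format and the '/path/to/new/hdfs' value are assumptions. Only the helper = var_props.get('helper', '/path/to/hdfs') call shape comes from the thread.

```python
import json


class StandInProps:
    """Hypothetical stand-in for VariableProperties (assumption, not the real class).

    Values supplied as overrides win; otherwise the caller's default is used.
    """

    def __init__(self, overrides_json: str = "{}"):
        # Assumed format: overrides arrive as a JSON object of name -> value.
        self._overrides = json.loads(overrides_json)

    def get(self, name: str, default: str) -> str:
        return self._overrides.get(name, default)


# Placeholder default taken verbatim from the thread.
DEFAULT_HELPER = "/path/to/hdfs"

# Normal run: no override, so the default path is used.
var_props = StandInProps()
helper = var_props.get("helper", DEFAULT_HELPER)

# During a migration: point at the new location without touching the DAG code.
migrating_props = StandInProps('{"helper": "/path/to/new/hdfs"}')
helper_after_override = migrating_props.get("helper", DEFAULT_HELPER)
```

Once the files are settled in the new location, the default in the DAG can be updated and the temporary override dropped.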

For (2): We can also come up with some namespacing strategy like:

/user/analytics-platform-eng/structured-data/image_suggestions/ for image_suggestions static files

and

/user/analytics-platform-eng/structured-data/section_topics/ for section_topics static files

etc.

I think the above would make it clear that these static files should not be touched by other folks.
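
The namespacing above could be captured in a small helper so every DAG builds its static paths the same way. This is only a sketch: the function names and the example file name are hypothetical, and only the base directory and the two project names come from this thread.

```python
import posixpath

# Base directory proposed in this thread.
BASE_DIR = "/user/analytics-platform-eng/structured-data"


def static_dir(project: str) -> str:
    """Return the namespaced HDFS directory for a project's static files."""
    if not project or "/" in project:
        # Keep every project inside its own single-level namespace.
        raise ValueError(f"invalid project name: {project!r}")
    return posixpath.join(BASE_DIR, project) + "/"


def static_file(project: str, filename: str) -> str:
    """Return the full HDFS path of one static file (filename is up to the caller)."""
    return posixpath.join(static_dir(project), filename)


image_suggestions_dir = static_dir("image_suggestions")
section_topics_dir = static_dir("section_topics")
# "filter.parquet" is a made-up file name used only for illustration.
example_file = static_file("section_topics", "filter.parquet")
```

A path produced this way can also serve as the default value passed to var_props.get, so it stays overridable.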

That sounds good to me!