Re-Ingestion: Update Structured Data service to work with multiple topic versions
Closed, Resolved | Public | 5 Estimated Story Points

Description

As an Engineer, I want to be able to configure the structured-data service to publish messages to multiple versions of compacted topics (e.g., v1 and v2), enabling me to execute the ingestion process without disrupting normal WME API operations.

Acceptance Criteria

  • Service configuration allows users to specify single or multiple versions of compacted topics.
  • Messages published are delivered only to the selected topic versions.

To-Do(s)

  • Expose an environment variable to propagate the configuration.
  • Update / add unit tests.
  • Make sure that, when the configuration is provided, messages are published to the appropriate topics.
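
As a rough illustration of the first to-do item, the sketch below reads a comma-separated list of topic versions from an environment variable and falls back to v1 when nothing is set. The variable name TOPIC_VERSIONS, the comma-separated format, and both function names are assumptions made for this example; the task does not specify them.

```python
import os

# Fallback that mirrors the current behaviour, where only v1 is published.
DEFAULT_VERSIONS = ["v1"]


def parse_topic_versions(raw):
    """Turn a comma-separated string such as "v1, v2" into ["v1", "v2"].

    Blank entries are skipped and duplicates are dropped, keeping first-seen order.
    """
    if raw is None or not raw.strip():
        return list(DEFAULT_VERSIONS)
    versions = []
    for part in raw.split(","):
        version = part.strip()
        if version and version not in versions:
            versions.append(version)
    # A value such as " , " contains no usable entries, so use the default.
    return versions or list(DEFAULT_VERSIONS)


def load_topic_versions(environ=os.environ):
    """Read the hypothetical TOPIC_VERSIONS variable from the given mapping."""
    return parse_topic_versions(environ.get("TOPIC_VERSIONS"))
```

Accepting the environment mapping as a parameter keeps the parsing easy to cover in unit tests, which fits the second to-do item.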

Notes

  • It's important to remember that our structured-data service has multiple handlers designed for different event types. Every one of these handlers must be able to support the new configuration.
  • We might consider applying a configuration similar to the one in our on-demand service, where we provide the full list of topics in advance (just a thing to consider).

Additional Context
Within our system, each Wikipedia project has a dedicated compacted topic storing its articles. The topic name is dynamically generated from the project and namespace identifiers during event processing by the structured-data service.
For instance, an event updating the Joe Biden page in English Wikipedia would send a payload containing enwiki (project ID) and 0 (namespace ID).
Using the template aws.structured-data.{project_id}-{namespace_name}-compacted.v1, the generated topic name is aws.structured-data.enwiki-articles-compacted.v1, and the message will be published to this topic.
We want to enable publishing to different topic versions simultaneously.
Currently, publishing is limited to version v1 because it's hardcoded.
This change would allow publishing to multiple versions (e.g., v2, v3) concurrently.
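
A minimal sketch of the naming scheme described above, extended so that the version suffix is a parameter instead of the hardcoded v1. The template and the enwiki / 0 / articles example come from this description; the function name, the {version} placeholder, and the namespace lookup table (which contains only the single mapping the example implies) are assumptions for illustration.

```python
# Template from the task description, with the hardcoded "v1" replaced by a placeholder.
TOPIC_TEMPLATE = "aws.structured-data.{project_id}-{namespace_name}-compacted.{version}"

# Assumed lookup from namespace ID to namespace name. Only the mapping implied by
# the example (0 -> "articles") is included; other namespaces are not specified here.
NAMESPACE_NAMES = {0: "articles"}


def topic_names(project_id, namespace_id, versions):
    """Return one compacted topic name per configured version."""
    namespace_name = NAMESPACE_NAMES[namespace_id]
    return [
        TOPIC_TEMPLATE.format(
            project_id=project_id,
            namespace_name=namespace_name,
            version=version,
        )
        for version in versions
    ]


if __name__ == "__main__":
    # The Joe Biden example: project enwiki, namespace 0, published to v1 and v2.
    for name in topic_names("enwiki", 0, ["v1", "v2"]):
        print(name)
```

Each handler could then publish the same message once per returned topic name, so that only the selected versions receive it.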

Event Timeline

REsquito-WMF set the point value for this task to 5. (Jan 31 2024, 2:33 PM)
Protsack.stephan renamed this task from "Re-Ingestion: Update Structured Data To update multiple topic versions" to "Re-Ingestion: Update Structured Data service to work with multiple topic versions". (Feb 5 2024, 3:02 PM)
Protsack.stephan updated the task description.