Wikimedia Technical Conference 2018 Session - Identifying the requirements and goals for dependency tracking and events
Closed, Resolved (Public)

Description

Session Themes and Topics

  • Theme: Architecting our code for change and sustainability
  • Topic: Dependency Tracking and Events

Session Leader

  • Alexandros Kosiaris

Facilitator

  • Jon Katz

Description

Tracking dependencies and propagating change events is a critical function of our infrastructure: it lets us invalidate the many caches we keep for our content and regenerate the artifacts derived from that content. New use cases, especially around Wikidata, have greatly increased the number of events we need to propagate and process to ensure we always return the latest content to users. We are currently designing a new Modern Event Platform to fill this need as well as the needs of our Analytics stack. This session aims to gather needs and requirements for dependency tracking in order to provide parameters for designing this new system.
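As a rough illustration of what "dependency tracking" means here, the sketch below keeps a reverse index from a source resource to the cached artifacts rendered from it, and purges those artifacts when a change event arrives. It is a toy in-memory model, not the Modern Event Platform or any existing MediaWiki component; the class name, method names, and the `item:`/`page:`/`template:` identifiers are all hypothetical.

```python
from collections import defaultdict


class DependencyTracker:
    """Toy reverse index: source resource -> cached artifacts derived from it.

    Hypothetical illustration only; not an existing Wikimedia API.
    """

    def __init__(self):
        self._dependents = defaultdict(set)
        self.cache = {}

    def record(self, artifact, resources):
        """Register that `artifact` was rendered from each of `resources`."""
        for resource in resources:
            self._dependents[resource].add(artifact)

    def on_change(self, resource):
        """Handle a change event: purge and return every dependent artifact."""
        stale = sorted(self._dependents.get(resource, ()))
        for artifact in stale:
            self.cache.pop(artifact, None)
        return stale


tracker = DependencyTracker()
tracker.cache = {"page:A": "html A", "page:B": "html B", "page:C": "html C"}
tracker.record("page:A", ["item:X", "template:T"])
tracker.record("page:B", ["item:X"])
tracker.record("page:C", ["template:T"])

# A single change to item:X invalidates every page that depends on it.
stale_pages = tracker.on_change("item:X")
print(stale_pages)            # ['page:A', 'page:B']
print(sorted(tracker.cache))  # ['page:C']
```

In a real deployment the index would have to be stored durably and scale to many dependents per resource, which is one of the requirements this session is meant to surface.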

Questions to answer during this session

Each question is followed by its significance: why the question is important, and what is blocked by it remaining unanswered.

Question: What event propagation issues do we have now? What use cases do we see event propagation as a solution for? Are the current efforts for the Modern Event Platform poised to solve these? What other gaps do we have, and do we have solutions for them? If not, what do we need to do to find them?
Significance: We are building a new event propagation system on Kafka. It is being used in several contexts, including dependency tracking. We should make sure we understand what the new system solves and identify any gaps. We should also make sure we have a scalable way to store dependencies in order to route events.

Question: What are the acceptable delays per type of update? How do we solve the issue of the number of events caused by invalidation of Wikidata items due to the inherent structure of Wikidata? Is there a way to reduce the number of events? If not, is there a way to scale to handle this number of events? If we integrate Wikidata and Wikibase more into other content wikis, what is the impact?
Significance: When content is edited, we need to invalidate our caches to ensure clients get the most recent version. Most of this is currently done using the JobQueue. Wikidata and the way it stores data have increased the number of updates and made invalidation more difficult when a change impacts many pages. As more Wikidata content is included on other wikis, this may get worse. Do we know how we are going to solve this?

Question: How does Product's need to increase the use of content from other wikis impact the needs of event propagation?
Significance: Product is moving towards including content from other projects, specifically Wikipedia. This is already happening with Wikidata. It would seem to increase the need to propagate events to invalidate content across projects.

Question: Would deterministic parsing and closed templates impact the needs of dependency tracking?
Significance: Dependency tracking is key to invalidating content. Syntactically closed templates would allow us to ensure parts of the HTML are independent of each other. How does this impact the need for dependency tracking and re-parsing of content?
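To make the Wikidata fan-out concern concrete, the sketch below expands a burst of item edits into one invalidation event per page that uses the item, then collapses the burst so each page is purged only once. Coalescing is just one possible way to reduce the event count and is shown here as an assumption, not as a solution chosen for this session; the function names and the `usage` mapping are hypothetical.

```python
def fan_out(edits, usage):
    """Expand item edits into one (item, page) invalidation event per usage."""
    return [(item, page) for item in edits for page in usage.get(item, ())]


def coalesce(events):
    """Collapse a burst of events so each page appears once, in first-seen order."""
    seen = set()
    unique_pages = []
    for _item, page in events:
        if page not in seen:
            seen.add(page)
            unique_pages.append(page)
    return unique_pages


# Hypothetical usage data: which pages include which items.
usage = {
    "item:X": ["page:A", "page:B", "page:C"],
    "item:Y": ["page:B", "page:C"],
}
edits = ["item:X", "item:Y", "item:X"]  # three edits in quick succession

events = fan_out(edits, usage)
pages = coalesce(events)
print(len(events), len(pages))  # 8 3
```

The trade-off is latency: coalescing only helps if events are held back for some window, which ties directly into the question of acceptable delays per type of update.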

Facilitator and Scribe notes

Facilitator reminders

Session Structure

  • Define session scope, clarify desired outcomes, present agenda
  • Get some Post-its and write down issues to fill up the whiteboards (10 mins). For each issue, note:
    • Issue
    • Consumer (optionally)
    • Producer (optionally)
  • Dot vote on the whiteboards. Deduplicate/cluster while at it if possible. Gather the 5 most-voted issues. (5 mins)
  • Split up into groups of 6-8 people. Try to have at least 1 person per group who has some knowledge of event issues
    • Pick 3 issues
  • Try to answer the following (20 mins; try to give 6-7 mins to each issue)
    • What is the nature/cause of the issue?
    • What obvious solutions exist, if any?
    • The importance of the issue (try to stick to low/medium/high)
  • Regroup, sort issues by importance, try to come up with
    • Goal/action/decision/question

Session Leaders please:

  • Add more details to this task description.
  • Coordinate any pre-event discussions (here on Phab, IRC, email, hangout, etc).
  • Outline the plan for discussing this topic at the event.
  • Optionally, include what it will not try to solve.
  • Update this task with summaries of any pre-event discussions.
  • Include ways for people not attending to be involved in discussions before the event and afterwards.

Post-event Summary:

  • ...

Action items:

  • ...

Event Timeline

kchapman renamed this task from Wikimedia Technical Conference 2018 Session - Dependency Tracking and Events to Wikimedia Technical Conference 2018 Session - Identifying the requirements and goals for dependency tracking and events. Oct 3 2018, 2:38 AM
debt added a subscriber: akosiaris.
debt updated the task description.
debt removed a subscriber: akosiaris.
akosiaris triaged this task as Medium priority. Oct 24 2018, 6:02 PM
akosiaris updated the task description.
akosiaris updated the task description.