= Session Themes and Topics =
* Theme: Architecting our code for change and sustainability
* Topic: Dependency Tracking and Events
=Session Leader=
* Alexandros Kosiaris
=Facilitator=
*
=Description=
Dependency tracking and change-event propagation are critical functions of our infrastructure: they let us invalidate the many caches we keep for our content and regenerate the artifacts derived from it. New use cases, especially around Wikidata, have greatly increased the number of events we need to propagate and process to ensure we always return the latest content to users. We are currently designing a new Modern Event Platform to meet this need as well as the needs of our Analytics stack. This session aims to collect the needs and requirements for dependency tracking, in order to provide parameters for the design of this new system.
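To make the terms concrete for the discussion, here is a minimal sketch of the idea: a reverse index from a changed resource to the cached artifacts that depend on it, consulted when a change event arrives. It is illustrative only and does not describe the current JobQueue-based implementation or the planned Modern Event Platform. The names `ChangeEvent`, `DependencyTracker`, and `invalidate`, the in-memory dictionary cache, and the `wiki:Title` key format are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    """A change to one resource, e.g. an edit to a Wikidata item (hypothetical shape)."""
    resource: str
    revision: int


class DependencyTracker:
    """Reverse index: resource -> artifacts that were rendered using it."""

    def __init__(self):
        self._dependents = defaultdict(set)

    def record(self, artifact, resources):
        # Called when an artifact is (re)generated, with everything it used.
        for resource in resources:
            self._dependents[resource].add(artifact)

    def affected(self, event):
        # Artifacts that must be invalidated or regenerated for this event.
        return set(self._dependents.get(event.resource, ()))


def invalidate(cache, tracker, event):
    """Drop every cached artifact that depends on the changed resource."""
    stale = tracker.affected(event)
    for artifact in stale:
        cache.pop(artifact, None)
    return stale


if __name__ == "__main__":
    tracker = DependencyTracker()
    tracker.record("enwiki:Douglas_Adams", ["wikidata:Q42", "enwiki:Template:Infobox"])
    tracker.record("dewiki:Douglas_Adams", ["wikidata:Q42"])
    cache = {
        "enwiki:Douglas_Adams": "<html>",
        "dewiki:Douglas_Adams": "<html>",
        "enwiki:Unrelated": "<html>",
    }
    print(sorted(invalidate(cache, tracker, ChangeEvent("wikidata:Q42", 2))))
```

In this sketch a single edit to one Wikidata item invalidates pages on two wikis, which is the fan-out effect the questions below ask about: the size of the reverse index and the number of resulting invalidations both grow with cross-wiki content reuse.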
=Questions to answer during this session=
|**Question**|**Significance: Why is this question important? What is blocked by it remaining unanswered?**
|What event propagation issues do we have now? What use cases do we see event propagation as a solution for? Are the current efforts for the Modern Event Platform poised to solve these? What other gaps do we have, and do we have solutions for them? If not, what do we need to do to find them? |We are building a new event propagation system on top of Kafka. It is being used in several contexts, including dependency tracking. We should make sure we understand what the new system solves and identify any gaps. We should also make sure we have a scalable way to store dependencies in order to route events.
|What is the SLA for dependency tracking / delay for different types of updates? How do we solve the issue of the number of events caused by invalidation of Wikidata items due to the inherent structure of Wikidata? Is there a way to reduce the number of events? If not, is there a way to scale to handle this number of events? If we integrate Wikidata and Wikibase more into other content wikis, what is the impact of this? |When content is edited, we need to invalidate our caches to ensure clients get the most recent version. Most of this is currently done using the JobQueue. Wikidata and the way it stores data have increased the number of updates and have made invalidation more difficult when a change affects many pages. As more Wikidata content is included on other wikis, this may get worse. Do we know how we are going to solve this?
|Would deterministic parsing and closed templates impact the needs of dependency tracking? |Dependency tracking is key to invalidating content. Syntactically closed templates would allow us to ensure parts of HTML are independent of each other. How does this impact the need for dependency tracking and re-parsing of content?
|How does the product need to increase the use of content from other wikis impact the needs of event propagation? |Product is moving towards including content from other projects, specifically Wikipedia. This is already happening with Wikidata. This would seem to increase the need to propagate events to invalidate content across projects.
=Scribe Instructions=
Please make a copy of the notes worksheet located here to take notes: https://docs.google.com/document/d/1J-wTeelHFGeXw6dO1ywkGr0NfnzG-cUykowc6aSKoWE/edit?usp=sharing
=Facilitator Instructions=
Use this document for reference: https://docs.google.com/document/d/1tGH8LEykXQ3r82rPT9n_EsLHAljDKjo7UtcatZSiH1g
= Resources: =
* Session Guide: https://www.mediawiki.org/wiki/Wikimedia_Technical_Conference/2018/Session_Guide
= Session Structure =
* **Define session scope, clarify desired outcomes, present agenda**
* Discuss Focus Areas
** Discuss and Adjust. ''Note that we are not trying to come to a final agreement, we are just prioritizing and assigning responsibilities!''
** For each proposition [add etherpad link here]
*** Decide whether there is (mostly) agreement or disagreement on the proposition(s).
*** Decide whether there is more need for discussion on the topic, and how urgent or important that is.
*** Identify any open questions that need answering from others, and from whom (product, ops, etc.)
*** Decide who will drive the further discussion/decision process (i.e. with a four-month deadline)
* Discuss additional strategy questions [add etherpad link here]. For each question:
** Decide whether it is considered important.
** Discuss who should answer it.
** Decide who will follow up on it.
* **Wrap up**
----
**Session Leaders** please:
[] Add more details to this task description.
[] Coordinate any pre-event discussions (here on Phab, IRC, email, hangout, etc).
[] Outline the plan for discussing this topic at the event.
[] Optionally, include what it will //not// try to solve.
[] Update this task with summaries of any pre-event discussions.
[] Include ways for people not attending to be involved in discussions before the event and afterwards.
----
Post-event Summary:
* ...
Action items:
* ...