
Enable topic subscription dashboard to update regularly
Open, Needs Triage, Public

Description

T287126 extended the Talk Pages Project Superset Dashboard to include the information about topic subscriptions listed below. This data comes from the discussiontools_subscription table and currently requires manual updates. This task represents the work of iterating on the Talk Pages Project Superset Dashboard so that the data shown within the Topic Notifications tab is automatically updated each time someone views the dashboard.

Requirements

The requirements below were finalized in the 19 October meeting between @Milimetric, @MNeisler, and @ppelberg.

  • The Topic Notifications tab within the Talk Pages Project Superset Dashboard [i] is automatically updated hourly
  • All data within the discussiontools_subscription tables should be "replicated" (right word?) within Hadoop
  • discussiontools_subscription from all Wikimedia wikis should be "replicated" within Hadoop, and subsequently be made available within Superset

Open questions

  • 1. How might we ensure that the data presented in Superset remains accurate while Superset is attempting to incorporate new data? Asked another way: How can Superset atomically read while Sqoop writes to the data behind the Superset dashboard?
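
One common pattern for this kind of read-while-write question is "write to a staging location, then swap it into place in a single atomic step", so readers only ever see the complete old snapshot or the complete new one. The sketch below illustrates the idea on a local filesystem only. It is an assumption for discussion, not something decided in this ticket, and it does not describe how Sqoop, Hadoop, or Superset actually behave. The function names, the JSON format, and the row fields are all hypothetical.

```python
import json
import os
import tempfile


def publish_snapshot(rows, target_path):
    """Write rows to a temporary file, then atomically swap it into place.

    Readers opening target_path see either the previous complete snapshot
    or the new complete snapshot, never a partially written file.
    """
    directory = os.path.dirname(os.path.abspath(target_path))
    # Create the temp file in the same directory so the final rename stays
    # on one filesystem, which is what makes os.replace atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(rows, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # The single atomic step: replace the old snapshot with the new one.
        os.replace(tmp_path, target_path)
    except BaseException:
        # Clean up the staging file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_snapshot(target_path):
    """Read whichever complete snapshot is currently published."""
    with open(target_path, encoding="utf-8") as handle:
        return json.load(handle)
```

In a Hadoop setting the analogous move would be to load each import into a staging location and only then repoint the dashboard's data source at it; whether and how that is possible here is exactly what this open question asks.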

Done

  • The answers to all "Open questions" are documented within this ticket.
  • The data within the Topic Notifications tab within the Talk Pages Project Superset Dashboard is automatically updated hourly.

Event Timeline

  • Frequency of Updates: These charts cannot be updated in real time because their data source is a query that currently needs to be run manually and then uploaded into Superset. This process does not take long, but because it is not automated, it will require some planning and discussion on the needed frequency of updates. If updates are needed frequently, a job scheduler can eventually be set up to automate this process.

Note: The current query and notebook require some reformatting so that they can be automated and run by a job scheduler. I can update this ticket with the updated query once complete.

ppelberg renamed this task from "Enable topic subscription dashboard to update in real-time" to "Enable topic subscription dashboard to update regularly". Sep 9 2021, 9:23 PM
ppelberg updated the task description.

Analytics Engineering is looking into the feasibility of adding access to the MariaDB replicas in Superset. This would allow us to directly query the discussiontools_subscription database table within Superset and create a chart that is updated in real-time.

We will decide whether we need to create a job scheduler to update the topic subscriptions charts depending on the estimated timeline and confirmation of feasibility in T291195.

Meta: per T291195#7396005, Analytics Engineering is fine with us implementing what this ticket is describing in the time between now and when work on T291195 can be prioritized.

ppelberg updated the task description.
ppelberg added a subscriber: Milimetric.

(@Milimetric + @MNeisler: I've updated the task description with the "Requirements" and "Open questions" we talked about during today's meeting; please comment here if you see anything unexpected.)