
Monitor: Nov 28 Advanced Search as a default feature to all wikis
Closed, ResolvedPublic


It has been announced that Advanced Search will become the default on all wikis on November 28, 2018.

It is critical to monitor the operation of the Advanced Search Dashboard around the time the extension becomes the default, for the following reasons:

  • the current dashboard update engine (AdvancedSearchExtension_Update_Production.R, running on a regular daily schedule at 06:00 UTC from stat1007) concatenates the existing eventlogging schemata for this extension from the log database in memory;
  • the deployment of the feature as default on all wikis is likely to increase the extent of data by orders of magnitude;
  • if the update engine survives this (in terms of successfully processing the data under given memory constraints on the server) there will be no need for immediate adjustments, but a study of its stability with the future growth of the dataset will be necessary;
  • if the update engine fails under the expected circumstances, immediate adjustments will be made, most probably including HiveQL orchestration from within R in order to switch the processing to the eventlogging schemata in the Data Lake, instead of fetching from the log database and processing in R.
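
The memory concern in the list above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the code of the R update engine; the function names, the `(wiki, date)` row shape, and the sample rows are all hypothetical. It contrasts concatenating every schema table in memory before aggregating with aggregating incrementally, which is roughly the effect of pushing the aggregation down to the data store (for example via HiveQL).

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical row shape: (wiki, date), standing in for one eventlogging record.
Event = Tuple[str, str]


def concat_then_count(tables: Iterable[List[Event]]) -> Counter:
    """In-memory approach: concatenate all tables first, then aggregate.

    Peak memory grows with the total number of rows.
    """
    all_rows: List[Event] = []
    for table in tables:
        all_rows.extend(table)
    return Counter(all_rows)


def stream_count(tables: Iterable[Iterable[Event]]) -> Counter:
    """Incremental approach: aggregate row by row.

    Peak memory grows with the number of distinct (wiki, date) keys,
    not with the total number of rows.
    """
    counts: Counter = Counter()
    for table in tables:
        for row in table:
            counts[row] += 1
    return counts


# Made-up sample data for two hypothetical schema tables.
schema_a = [("enwiki", "2018-11-28"), ("dewiki", "2018-11-28")]
schema_b = [("enwiki", "2018-11-28"), ("enwiki", "2018-11-29")]

concat_result = concat_then_count([schema_a, schema_b])
stream_result = stream_count([schema_a, schema_b])
```

Both functions return the same counts; they differ only in how much data has to be held in memory at once, which is the quantity expected to grow once the feature is the default everywhere.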

The update engine will not necessarily fail. For example, the Wiktionary Cognate Dashboard's update engine processes hundreds of millions of rows in-memory, in R, previously imported from a daily SQL dump of the respective database. However, for the Advanced Search Extension we still need to see if the same approach will hold or not. It all depends upon the extent of the incoming data as of November 28.

@Lea_WMDE At this point, I would need to know if your team is planning to introduce *any* changes that would affect the eventlogging for this extension before its roll-out as a default feature.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Nov 21 2018, 10:18 AM
  • Also, to improve the dashboard responsiveness, some of the local processing in Shiny server.R will be migrated to the dashboard's update engine.

Hi @GoranSMilovanovic there are no plans for any changes before rolling out AdvancedSearch as a default feature.

  • One missing update run detected; missing data are being recollected now;
  • critical tests will be conducted tomorrow, starting with one attempt at the standard dashboard update (i.e. as it is now) once the extension has become the default everywhere.
  • Missing data recollected, dashboard operating as usual.
GoranSMilovanovic added a comment. · Edited · Nov 29 2018, 11:11 AM

@Lea_WMDE @RazShuty

  • The Advanced Search Extension usage is now spiking on the dashboard's Special:Search graph :)

My conclusions and the recommended course of action:

  • the dashboard update engine did not encounter any problems due to the increased amount of data;
  • as expected, the dashboard itself takes more time to load the datasets now; this will be handled by moving some data pre-processing procedures away from the Shiny front-end to reduce the amount of data processed there.
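
A minimal sketch of the idea of moving pre-processing out of the front-end, again in Python for illustration only and not taken from the dashboard's R/Shiny code. The function names, the `(date, wiki)` event shape, and the CSV layout are assumptions: the update step reduces raw events to a small summary, and the dashboard side only parses that summary.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def precompute_daily_counts(events: Iterable[Tuple[str, str]]) -> str:
    """Update-engine side (hypothetical): reduce raw (date, wiki) events
    to a compact CSV summary with one row per date and wiki."""
    counts = Counter(events)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date", "wiki", "searches"])
    for (date, wiki), n in sorted(counts.items()):
        writer.writerow([date, wiki, n])
    return buffer.getvalue()


def load_summary(csv_text: str) -> List[Dict[str, str]]:
    """Dashboard side (hypothetical): parse the summary only;
    no aggregation happens at load time."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Made-up raw events standing in for one update run.
raw_events = [
    ("2018-11-28", "enwiki"),
    ("2018-11-28", "enwiki"),
    ("2018-11-29", "dewiki"),
]
summary_csv = precompute_daily_counts(raw_events)
rows = load_summary(summary_csv)
```

With this split, the cost of the front-end load depends on the size of the summary rather than on the number of raw events.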

The recommended optimization will take place in the coming days, as soon as my work on several other tickets is completed.

  • The dashboard now loads fast as lightning;
  • all data pre-processing happens in production.

I will wait for one regular update run (tomorrow morning) to check that everything runs smoothly, and then close the ticket as resolved.

GoranSMilovanovic closed this task as Resolved. · Dec 7 2018, 12:06 AM