It has been announced that Advanced Search will become the default on all wikis (!) on November 28, 2018.
It is critical to monitor the operation of the Advanced Search Dashboard around the time the feature becomes the default, for the following reasons:
- the current dashboard update engine (AdvancedSearchExtension_Update_Production.R, running on a regular daily schedule at 06:00 UTC from stat1007) concatenates the existing eventlogging schemata for this extension, fetched from log, in memory;
- the deployment of the feature as the default on all wikis is likely to increase the volume of data by orders of magnitude;
- if the update engine survives this (that is, it successfully processes the data under the memory constraints on the server), there will be no need for immediate adjustments, but a study of its stability as the dataset keeps growing will be necessary;
- if the update engine fails under the expected circumstances, immediate adjustments will be made, most probably including HiveQL orchestration from within R in order to switch the processing to eventlogging schemata in the Data Lake instead of using log to fetch and R to process.
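To make the memory concern concrete, here is a minimal sketch of the general alternative to in-memory concatenation: aggregating the events chunk by chunk, so that only one chunk and the running totals are held at any time. It is written in Python purely for illustration and is not the actual update engine; the `wiki` field, the chunk size, and both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def iter_chunks(rows: Iterable[Dict], chunk_size: int) -> Iterator[List[Dict]]:
    """Yield consecutive lists of at most chunk_size rows from any iterable."""
    chunk: List[Dict] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield chunk


def count_events_by_wiki(rows: Iterable[Dict], chunk_size: int = 10_000) -> Counter:
    """Count events per wiki without materialising all rows at once.

    Assumes each row is a dict with a 'wiki' key (a hypothetical field name).
    """
    totals: Counter = Counter()
    for chunk in iter_chunks(rows, chunk_size):
        # Only the current chunk and the running totals are kept in memory.
        totals.update(row["wiki"] for row in chunk)
    return totals
```

Passing a generator as `rows` (for example, one that reads a result set page by page) keeps peak memory proportional to the chunk size rather than to the full dataset. Moving the aggregation into HiveQL would push the same idea one step further, to the side where the data lives.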
The update engine will not necessarily fail. For example, the Wiktionary Cognate Dashboard's update engine processes hundreds of millions of rows in memory, in R, after importing them from a daily SQL dump of the respective database. However, for the Advanced Search Extension we still need to see whether the same approach will hold. It all depends on the volume of the incoming data as of November 28.
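As a rough way to reason about whether the in-memory approach will hold, one could compare an estimated footprint against the available memory. The sketch below is in Python for illustration only; every input (row count, bytes per row, memory budget, overhead factor) is a placeholder to be replaced with observed values, not a measurement from stat1007.

```python
def fits_in_memory(
    n_rows: int,
    bytes_per_row: int,
    memory_budget_bytes: int,
    overhead_factor: float = 2.0,
) -> bool:
    """Return True if the estimated footprint stays within the memory budget.

    overhead_factor is an assumed allowance for temporary copies made while
    concatenating tables; it is a guess, not a measured property of R.
    """
    estimated_bytes = n_rows * bytes_per_row * overhead_factor
    return estimated_bytes <= memory_budget_bytes


# Hypothetical inputs: 200 bytes per row and an 8 GiB budget.
budget = 8 * 1024**3
print(fits_in_memory(1_000_000, 200, budget))
print(fits_in_memory(1_000_000_000, 200, budget))
```

Such an estimate is only a first check; actual behaviour depends on column types and on how many intermediate copies the processing steps create.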
@Lea_WMDE At this point, I would need to know if your team is planning to introduce *any* changes that would affect the eventlogging for this extension before its roll-out as a default feature.