
Compare dewiki vs enwiki incoming new editors
Closed, Resolved, Public

Description

Compare the time series profile of incoming new editors on enwiki with the one regularly generated for dewiki users. Figure out whether the drop in the number of new editors is specific to dewiki.
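One way to make "is the drop specific to dewiki?" concrete is to put both wikis on a common scale before comparing them. enwiki has far more new editors than dewiki, so raw counts are not directly comparable. The sketch below is an illustration only, not the method used in this task. It assumes monthly new-editor counts keyed by "YYYY-MM" strings. The function names and the `tolerance` threshold are hypothetical. It indexes each series to a baseline period and flags the dewiki drop as wiki-specific only when dewiki declines by clearly more than enwiki over the same window.

```python
from typing import Dict, List, Tuple


def index_to_baseline(series: Dict[str, int], baseline_months: List[str]) -> Dict[str, float]:
    """Rescale monthly counts so the mean over the baseline months equals 100."""
    base = sum(series[m] for m in baseline_months) / len(baseline_months)
    if base == 0:
        raise ValueError("baseline mean is zero; cannot index the series")
    # "YYYY-MM" keys sort chronologically as plain strings.
    return {month: 100.0 * count / base for month, count in sorted(series.items())}


def relative_change(series: Dict[str, float], start: str, end: str) -> float:
    """Fractional change between two months, e.g. -0.2 for a 20% drop."""
    if series[start] == 0:
        raise ValueError(f"zero count in start month {start}")
    return (series[end] - series[start]) / series[start]


def compare_drop(
    dewiki: Dict[str, float],
    enwiki: Dict[str, float],
    start: str,
    end: str,
    tolerance: float = 0.05,  # hypothetical margin; would need calibrating on real data
) -> Tuple[float, float, bool]:
    """Return (dewiki change, enwiki change, whether the dewiki drop looks dewiki-specific)."""
    de = relative_change(dewiki, start, end)
    en = relative_change(enwiki, start, end)
    # dewiki must decline, and by more than enwiki's change minus the tolerance.
    dewiki_specific = de < 0 and de < en - tolerance
    return de, en, dewiki_specific
```

Relative change is scale-free, so `compare_drop` can take either raw or baseline-indexed series. The indexed form is mainly useful for plotting both wikis on one chart. A real comparison would also need to account for seasonality and noise, which this sketch ignores.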

Event Timeline

@Verena @Stefan_Schneider_WMDE

Current status: 62.8 GB of RAM on our new stat1006 production machine is not enough to process the enwiki data. The R script failed on memory allocation, despite a very defensive and cautious approach to resource management.

Way to go: Spark. Even if this task is not high priority, processing the dewiki new editors datasets will require scaling sooner or later.

Problem to solve: connecting R {sparklyr} to our Spark cluster from production. People have tried already, but the problem has not been fully solved.

I will not focus on this exclusively. I will report back as soon as I have something.

No problem. Thanks for the update.

Merged with https://phabricator.wikimedia.org/T171420 (Migrate New Editors Analytics to Production) on 08/29/2017.