Wed, Feb 13
After talking to @joal at All Hands, I'm now thinking that T211069 should be the first priority for integrating the scoring pipeline into Hadoop. We should design a stream transformer to calculate features for each revision, and store the resulting values partitioned by feature name. This involves very different challenges than what's being discussed here, and in the end we might be able to entirely circumvent this minor glitch about merging the data centers.
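To make the "partitioned by feature name" idea concrete, here's a rough sketch of the shape I have in mind. Everything in it is an assumption for illustration: `extract_features` is a hypothetical stand-in for the real feature extraction, the revision dicts are made-up inputs, and an in-memory dict stands in for whatever partitioned storage we'd actually use in Hadoop.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def extract_features(revision: dict) -> Dict[str, float]:
    """Hypothetical placeholder: the real pipeline would compute model features here."""
    text = revision["text"]
    return {"chars": len(text), "words": len(text.split())}


def partition_by_feature(revisions: Iterable[dict]) -> Dict[str, List[Tuple[int, float]]]:
    """Consume a stream of revisions and group the computed values by feature name."""
    partitions: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for rev in revisions:
        for name, value in extract_features(rev).items():
            # One partition per feature name, each row keyed by revision ID.
            partitions[name].append((rev["rev_id"], value))
    return dict(partitions)
```

The point of keying on feature name is that adding a new feature later would only mean writing one new partition, rather than rewriting rows for every revision.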
Fri, Feb 8
Didn't have time to do this, regrettably.
Wed, Feb 6
I've only had very general conversations about this project so far, and didn't realize there was working code :-)
Wed, Jan 30
Tue, Jan 29
Sat, Jan 26
Turns out this is really annoying in a stock mediawiki-vagrant; it obscures the form:
Fri, Jan 25
@fr-tech: Okay, just the main patch left and it's in your court! Let me know if I can do anything more.
Just for fun, I elaborated on the quick estimate based on existing w_cache files. Note that these are not the "root" data sources; they are the final, calculated values, ready to be input into a model. Storing the calculated values is one alternative we might consider. The tradeoff is that the values are compact and can be used for existing models but lack flexibility and completeness, so features added in the future will probably require another full MW API extraction.
Just discovered some good documentation about this exact problem, here: https://www.mediawiki.org/wiki/Manual:Extension_registration#Requirements_(dependencies)
@Nikerabbit Do you know of any mechanism to enable Extension:Translate when unit testing other extensions? ExtensionRegistry#isLoaded tells me that Translate isn't available from WMF CI tests for FundraisingTranslateWorkflow.
Thu, Jan 24
Wed, Jan 23
Important change of plans: we're discussing backfilling, and it might be best to allow mismatched model versions in the dumps for now. In other words, go ahead and backfill with whatever the current model version is. Normalized data will continue to be segregated by model version, but the monthly "current" and "historical" dumps will patch together whatever scores are available, simply taking the newest model version used to score each revision.
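For the "patch together" step, something like the following is roughly what I mean. This is only a sketch under assumptions: the `Score` record and its field names are hypothetical, and I'm assuming model versions are dotted integers (e.g. "0.4.0") so they can be compared numerically.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Score(NamedTuple):
    # Hypothetical record layout, for illustration only.
    rev_id: int
    model_version: str
    score: float


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '0.10.1' into (0, 10, 1) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def newest_scores(scores: Iterable[Score]) -> Dict[int, Score]:
    """Keep, for each revision, the score produced by the newest model version."""
    newest: Dict[int, Score] = {}
    for s in scores:
        current = newest.get(s.rev_id)
        if current is None or version_key(s.model_version) > version_key(current.model_version):
            newest[s.rev_id] = s
    return newest
```

Revisions that were only ever scored by an older model keep that older score, which is what lets the dump mix model versions.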
Tue, Jan 22
@Nikerabbit Hi, are we unblocked now that MLEB 2019.01 is released?