
Create a newcomerquality meta-model for revscoring
Closed, ResolvedPublic


Once T201370 is finished or labels are collecting, we can shift attention to creating the meta-model that will score newcomers.

halfak: "I think we'll want to implement a user-oriented set of "Datasources" in revscoring
And we'll want to create a separate repo (maybe "newcomerquality") to put specific features (e.g. the score of another model)
One of the issues we'll have is not duplicating the memory usage of other models."

Event Timeline

Restricted Application added a project: artificial-intelligence. · Oct 1 2018, 8:52 PM
Restricted Application added a subscriber: Aklapper.
notconfusing added a comment. · Edited Oct 16 2018, 9:22 PM

Ideas for feature engineering:

+ Temporal (time variance)
+ Number of article edits
+ Number of Namespaces
+ Includes self-reverts
+ Multiple reverts (undoings)
+ Multiple repeated additions (or deletions) (sometimes identical, sometimes very similar by Levenshtein distance)
+ Replacing text on talk pages
+ Proper noun edits

@notconfusing and I worked together on looking at what it would take to implement a user_oriented dependency tree with a first_session DependencyList. It was more complicated than I expected it to be. One of the most difficult aspects was working out how to handle a list of items within the dependency tree. We concluded with an agreement to let me spend some more time thinking through the considerations for our dependency extraction strategy and if that isn't wildly successful, develop the newcomer quality/session quality model outside of the revscoring framework.
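To make the list-handling difficulty concrete, here is a minimal sketch of a dependency solver. This is not revscoring's actual API; the names `user_revisions` and `first_session`, the stubbed revision data, and the one-hour session cutoff are all hypothetical illustrations of the problem.

```python
# Minimal sketch (NOT revscoring's real API) of a user-oriented
# dependency tree where one datasource yields a *list* of revisions,
# which downstream datasources must then handle.

class Datasource:
    def __init__(self, name, process, depends_on=()):
        self.name = name
        self.process = process
        self.depends_on = list(depends_on)

def solve(datasource, cache):
    """Resolve a datasource by recursively solving its dependencies,
    memoizing results so shared dependencies are computed once."""
    if datasource.name in cache:
        return cache[datasource.name]
    args = [solve(dep, cache) for dep in datasource.depends_on]
    value = datasource.process(*args)
    cache[datasource.name] = value
    return value

# Root datasource: the raw edits a user made (stubbed here).
user_revisions = Datasource(
    "user_revisions",
    lambda: [{"id": 1, "ts": 0}, {"id": 2, "ts": 300}, {"id": 3, "ts": 90000}])

# The tricky part: a dependent datasource that consumes the list,
# e.g. cutting the first session at a one-hour (3600 s) gap.
first_session = Datasource(
    "first_session",
    lambda revs: [r for i, r in enumerate(revs)
                  if i == 0 or revs[i]["ts"] - revs[i - 1]["ts"] <= 3600],
    depends_on=[user_revisions])

session = solve(first_session, cache={})
```

The memoizing cache is what would keep model memory usage from being duplicated when several features depend on the same upstream datasource.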

Starting repo with ipynb documenting work so far:


  1. Logistic regression works well enough and CV doesn't show overfitting.
  2. Feature importances show:
    1. The maximum time between edits indicates good faith (like a long, contemplated edit?). Note that singleton sessions have an imputed value of zero here, so this needs to be checked.
    2. The mean and variance of the goodfaith scores come in at number 2; these are the simplest mathematical things you could do as a meta-classifier.
    3. Then we have the number of edits in a session being a poor indicator. Does this mean that vandalism happens in bursts? Or COI editing likewise? Or that revert wars show up here? I have seen such a thing in my own labelling, but couldn't put my finger on it.
    4. Number 4 is total seconds, and, surprisingly, longer sessions are worse?
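The analysis above can be sketched roughly as follows. The data is synthetic, the feature names just mirror the ones discussed, and the label is artificially tied to the mean goodfaith score, so only the shape of the workflow (cross-validation, then coefficient magnitudes as rough importances) is meaningful.

```python
# Sketch of the described workflow: logistic regression with CV, then
# standardized coefficients as rough feature importances.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.exponential(3600, n),              # max seconds between edits
    rng.uniform(0, 1, n),                  # mean goodfaith score of edits
    rng.uniform(0, 0.1, n),                # variance of goodfaith scores
    rng.integers(1, 20, n).astype(float),  # number of edits in session
])
# Synthetic label, loosely tied to the mean goodfaith score.
y = (X[:, 1] + rng.normal(0, 0.2, n) > 0.5).astype(int)

# Standardize so coefficient magnitudes are roughly comparable.
X = (X - X.mean(axis=0)) / X.std(axis=0)

clf = LogisticRegression(max_iter=1000)
cv_scores = cross_val_score(clf, X, y, cv=5)  # guard against overfitting
clf.fit(X, y)

names = ["max_gap_s", "mean_goodfaith", "var_goodfaith", "n_edits"]
importance = sorted(zip(names, np.abs(clf.coef_[0])),
                    key=lambda t: t[1], reverse=True)
```

Note that raw logistic-regression coefficients are only comparable as importances after standardization, and they are distorted by correlated features, which is exactly the multicollinearity concern raised in the todos below.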

Next todos:

  1. Inspect Edit<-->Session label mismatches. Were goodfaith sessions composed of badfaith edits? 22 out of 189 (find a good word for this too).
  2. Inspect False Positives
  3. Decide on a metric (not accuracy). It should be recall-based to make sure good-faith editors are saved.
  4. Remove correlated features to address multicollinearity (or move away from logistic regression).
  5. Rerun analysis for 'damaging' label as well.
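For todo 3, one way to make the metric recall-based is to choose the highest score threshold that still catches nearly all good-faith sessions. This is a hedged sketch on synthetic data; the 95% recall floor is an illustrative choice, not a decided value.

```python
# Sketch: pick a decision threshold by a recall floor rather than
# accuracy, so good-faith newcomers are not missed.
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, 1000)          # 1 = good-faith session
# Synthetic scores: good-faith sessions tend to score higher.
y_score = np.clip(y_true * 0.4 + rng.uniform(0.0, 0.6, 1000), 0.0, 1.0)

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
# recall[:-1] aligns element-wise with thresholds; recall decreases as
# the threshold rises. Take the highest threshold with recall >= 0.95.
ok = recall[:-1] >= 0.95
threshold = thresholds[ok][-1]
```

The precision achieved at that threshold then tells you how much noise a consumer like a mentoring bot would have to tolerate.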

Note: I sent this talkpage message to everyone on en:Wikipedia:Labels/Edit quality

Invitation to New AI-Labelling Campaign for Newcomer Sessions

I'm reaching out to you because I saw that you signed up as a labelling volunteer at [[Wikipedia:Labels/Edit quality]]. I'm starting a new project that builds on Edit quality, to predict newcomer quality: that is, to predict the damagingness and goodfaithness of "sessions" (multiple related edits) of users within 1 day of their registration. With this AI trained, we could help automatically distinguish between productive and unproductive new users. If you wouldn't mind taking a look at this new labelling campaign and labelling a few sessions, I would be very grateful. In addition, if you have any feedback or discover any bugs in the process, I would appreciate that too. You can find the project page at [[Wikipedia:Labels/Newcomer_session_quality]] or go directly to [] and look for the campaign titled "Newcomer Session quality (2018)". Thanks so much!


notconfusing added a subscriber: Capt_Swing. · Edited Oct 30 2018, 9:10 PM

@Halfak and I reconvened and found that redoing the dependency management of revscoring was too much for this project's scope at the moment. Our new goal is to create a Python package or simple "code snippet" that will allow other applications to use this model as easily as possible. There will be a very pretty function like newcomer_quality(user_id, timestamp) or newcomer_quality([revid(s)]). This is enwiki-only for now, although frwiki is a strong possibility because of the interest from CivilServant. I am going to ask @Capt_Swing if he knows how HostBot would want the AI optimized; we have thought about it being a minimum-precision model (at around 1-5% minimum).
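A hedged sketch of what such an entry point could look like. Only the call shape newcomer_quality(user_id, timestamp) comes from the discussion above; the feature extraction and the fallback scoring here are stubs invented for illustration.

```python
# Sketch of the proposed entry point; everything below the signature
# is a hypothetical stub, not the actual implementation.

def extract_session_features(user_id, timestamp):
    """Stub: would query the wiki API for the user's first-day edits
    and compute session features (gaps, edit counts, score stats)."""
    return [0.2, 0.7, 0.01, 3]

def newcomer_quality(user_id, timestamp, model=None):
    """Score a newcomer's first session; returns a dict of scores."""
    features = extract_session_features(user_id, timestamp)
    if model is None:
        # Stub scoring: the mean goodfaith score stands in for a model.
        return {"goodfaith": features[1]}
    return {"goodfaith": model.predict_proba([features])[0][1]}
```

Keeping the signature this small is what would let a consumer like HostBot call the model without knowing anything about revscoring internals.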

Optionally, if possible, we can see about removing sklearn as a dependency, because we'll just be running the models forward (not training them). It will still be necessary to retrain the models occasionally as the underlying edit-quality model is updated.
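Running a trained logistic regression forward without sklearn only requires storing its coefficients and intercept. A sketch, with made-up coefficient values standing in for a real exported model:

```python
# Forward pass of a logistic regression with no sklearn dependency.
import math

# Hypothetical values exported from a trained sklearn model
# (clf.coef_[0] and clf.intercept_[0]).
COEF = [-0.8, 2.1, -0.3, 0.1]
INTERCEPT = -0.5

def predict_proba(features):
    """Sigmoid of the linear combination: the whole inference step."""
    z = INTERCEPT + sum(w * x for w, x in zip(COEF, features))
    return 1.0 / (1.0 + math.exp(-z))
```

The stored coefficients would need to be re-exported whenever the model is retrained against an updated edit-quality model.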

notconfusing closed this task as Resolved.Oct 30 2018, 9:48 PM
notconfusing triaged this task as Normal priority.