
Okapi: Fresher -> Safer Spectrum, please review!!
Closed, Resolved, Public

Assigned To
Authored By: RBrounley_WMF, Sep 25 2020, 8:16 PM

Description

Hi all -

Ok this is fun and interesting, so please chime in if you are interested!

The Problem:

In our research with large Wikimedia data reusers, an obvious pain point is accidentally ingesting live content that contains vandalism in the window between when the vandalism happens and when editors catch it. This creates obvious risk for the products reusing our content, and it also means that content our community would normally handle ends up living as a phantom outside of our projects.

The Solution:

We are building a feature into our alpha HTML Exports on Okapi to create different "safety" levels of our corpus of content, so as to fit the needs of different users. We are creating five different exports for each text-based wiki project, covering a spectrum from Safe (heavily community-reviewed content) to Fresh (the raw data feed, similar to the current offerings). Below is a mock of what it will look like for downloaders. Disclaimer: "Credible -> Fresh" is not the language we're using here; these are just mocks.

Screen Shot 2020-09-25 at 12.24.46 PM.png (646×1 px, 75 KB)

We are using inputs from the projects/communities themselves, i.e. Flagged Revisions, Patrolled Revisions, ORES thresholds, the user's edit count (including for anonymous users), and how long the edit has been live on the project. Below is a TL;DR of the major differences at each level, as well as a diagram explaining the data flow; a rough configuration sketch follows the list. Open to suggestions, critiques, and thoughts.

  • Level 1: Raw data feed; no considerations

Raw (Level 1).png (555×732 px, 52 KB)

  • Level 2: Fresh; .5 ORES threshold, 5 minutes live if passes threshold, trust anonymous user after 5 edits, trust anonymous edit after 30 minutes live.

Fresh (Level 2).png (946×1 px, 143 KB)

  • Level 3: Fresher; .6 ORES threshold, 30 minutes live if passes threshold, trust anonymous user after 10 edits, trust anonymous edit after 1 hour live.

Fresher (Level 3).png (946×1 px, 139 KB)

  • Level 4: Safer; .67 ORES threshold, 3 hours live if passes threshold, trust anonymous user after 50 edits, trust anonymous edit after 12 hours live.

Safer (Level 4).png (946×1 px, 140 KB)

  • Level 5: Safest; .7 ORES threshold, 6 hours live if passes threshold, trust anonymous user after 100 edits, trust anonymous edit after 1 day live.

Safest (Level 5).png (946×1 px, 140 KB)
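
Read as configuration, the levels above might look roughly like the sketch below. This is only an illustration: the field names, the `edit` object, and the way the conditions are combined are my assumptions, not the actual Okapi implementation, and the task does not say which ORES model the threshold applies to.

```
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

@dataclass
class LevelConfig:
    """Illustrative thresholds for one export level (not the real Okapi config)."""
    name: str
    ores_threshold: Optional[float]             # minimum ORES score; None = no check
    min_time_live: timedelta                    # how long a passing edit must have been live
    trust_anon_user_after_edits: Optional[int]  # trust an anonymous user after this many edits
    trust_anon_edit_after: Optional[timedelta]  # trust an anonymous edit after this long live

LEVELS = [
    LevelConfig("Raw",     None, timedelta(0),          None, None),
    LevelConfig("Fresh",   0.5,  timedelta(minutes=5),  5,    timedelta(minutes=30)),
    LevelConfig("Fresher", 0.6,  timedelta(minutes=30), 10,   timedelta(hours=1)),
    LevelConfig("Safer",   0.67, timedelta(hours=3),    50,   timedelta(hours=12)),
    LevelConfig("Safest",  0.7,  timedelta(hours=6),    100,  timedelta(days=1)),
]

def passes_level(edit, cfg):
    """Rough per-level gate; how the conditions combine is a guess, not the task's spec."""
    if cfg.ores_threshold is None:
        return True  # Level 1: raw feed, no considerations
    if edit.ores_score < cfg.ores_threshold:
        return False
    if edit.is_anonymous:
        trusted_user = edit.user_edit_count >= cfg.trust_anon_user_after_edits
        trusted_edit = edit.time_live >= cfg.trust_anon_edit_after
        if not (trusted_user or trusted_edit):
            return False
    return edit.time_live >= cfg.min_time_live
```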

Let us know your thoughts. Resources on how different projects handle time delays would be very helpful, as these will differ along project lines; we understand that and want to defer to the folks who understand those projects to help guide those decisions. Thanks all.

Ryan

Event Timeline

RBrounley_WMF updated the task description.

Feel free to add more subscribers, we want more opinions on this!

That sounds very cool, and seems well thought out! Will that flag be available via API as well, or only in dumps?

One thing maybe worth considering is replacing the editcount requirement with something closer to how the autoconfirmed group is defined (a combination of edit count and account age). Another might be to check if other recent edits of the user were reverted (revert detection has landed recently in core: T152434).

Also, especially on smaller wikis where most editors are geographically clustered, part of day can make a big difference in patroller reaction speed. Maybe the frequency of patrol action could be used to automatically adjust to that...

Does this "trust" expire?

For example it's certainly possible an anon user could be making good edits "today", but then tomorrow, next week, month or whatever time later are making bad edits (change of user using the IP etc)

That sounds very cool, and seems well thought out!

Why thank you. I'll totally take some credit for that compliment 😄

Will that flag be available via API as well, or only in dumps?

So the current plan is that we receive an edit via the event stream. It then gets fed through each of these logic flows, and five versions of a page are maintained depending on how the edit matches these conditions.
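
For anyone curious what the ingestion side looks like, here is a minimal sketch of consuming the public Wikimedia EventStreams recentchange feed (using the `sseclient-py` package); the point where each edit would be routed through the five level checks is only marked with a comment, since that part is Okapi-internal.

```
import json

import requests
import sseclient  # pip install sseclient-py

STREAM_URL = "https://stream.wikimedia.org/v2/stream/recentchange"

def recent_changes():
    """Yield recent-change events from the public EventStreams SSE feed."""
    response = requests.get(STREAM_URL, stream=True, headers={"Accept": "text/event-stream"})
    for event in sseclient.SSEClient(response).events():
        if event.event == "message" and event.data:
            yield json.loads(event.data)

for change in recent_changes():
    if change.get("type") == "edit":
        # This is where each edit would be fed through the five level checks
        # and written into the matching export versions of the page.
        print(change["wiki"], change["title"], change["revision"]["new"])
```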

One thing maybe worth considering is replacing the editcount requirement with something closer to how the autoconfirmed group is defined (a combination of edit count and account age). Another might be to check if other recent edits of the user were reverted (revert detection has landed recently in core: T152434).

I think we talked about possibly doing this. I can't remember why we didn't include it in this version of the logic, but yes, that makes sense. That recent-edit reversion detection sounds really interesting and is definitely something to look into.

Also, especially on smaller wikis where most editors are geographically clustered, part of day can make a big difference in patroller reaction speed. Maybe the frequency of patrol action could be used to automatically adjust to that...

Interesting. It definitely adds complexity. We had planned that each wiki would have different settings for the various levels. I guess we could add some weightings based on typical activity patterns, but that's maybe something for later versions.

Does this "trust" expire?

For example it's certainly possible an anon user could be making good edits "today", but then tomorrow, next week, month or whatever time later are making bad edits (change of user using the IP etc)

No, right now we haven't built in hindsight beyond the check to see if a user has been blocked whilst the edit was in a holding pattern.

One thing maybe worth considering is replacing the editcount requirement with something closer to how the autoconfirmed group is defined (a combination of edit count and account age). Another might be to check if other recent edits of the user were reverted (revert detection has landed recently in core: T152434).

We actually aren't able to find the account age via the Action API; the only data point we could see was edit count, making this a little more one-dimensional than we want it to be. Do you know if there is a way to find account age via the Action API? Reversion detection is great; let me look at that ticket and see if we can easily pull that in.

Also, especially on smaller wikis where most editors are geographically clustered, part of day can make a big difference in patroller reaction speed. Maybe the frequency of patrol action could be used to automatically adjust to that...

I need to design a way to test the success of this system... I think we can then adjust and iterate at each level. We'll have the five levels of output; one of my initial thoughts is to take a list of actually vandalized revisions across every project and see if they make it into any of these versions. Maybe isolating around 10 pages per project and comparing them manually... initial thoughts: is there an easy way to scrape vandalized edits?
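
If such a labeled set of vandalized revision IDs can be assembled (which, per the reply below, is the hard part), the per-level comparison itself is just a leak-rate calculation; a small sketch with hypothetical inputs:

```
def leak_rate(known_bad_revids, exported_revids):
    """Fraction of known-vandalized revisions that still appear in one export level."""
    known_bad = set(known_bad_revids)
    if not known_bad:
        return 0.0
    return len(known_bad & set(exported_revids)) / len(known_bad)

# Hypothetical usage: the same labeled set checked against each of the five levels.
# for level_name, revids in exports_by_level.items():
#     print(level_name, leak_rate(labeled_vandalism_revids, revids))
```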

Does this "trust" expire?

For example it's certainly possible an anon user could be making good edits "today", but then tomorrow, next week, month or whatever time later are making bad edits (change of user using the IP etc)

No, right now we haven't built in hindsight beyond the check to see if a user has been blocked whilst the edit was in a holding pattern.

We also calculate "trust" on every revision; that is, if Editor A is trusted on Monday, we will keep Monday's revision. If the same anonymous user makes an edit on Tuesday that isn't trusted, we will not keep Tuesday's revision, but Monday's will stay. Right now we don't have a database of the editors (on purpose); we could maybe backtrack and find Editor A's past revisions, but I feel like we also have the time delays to vet those bad revisions out. I am of the opinion that adding complications could make this unmanageable... curious about others' thoughts on it.

We actually aren't able to find the account age via the Action API; the only data point we could see was edit count, making this a little more one-dimensional than we want it to be. Do you know if there is a way to find account age via the Action API?

The users API tells you the registration date (it might be missing for very old accounts; we started logging it about a decade ago). To get the account age at the time of the edit, you'd have to do the date math from the registration date and the edit timestamp; the API doesn't provide it directly. (The Data Lake does calculate it, so if you are using Analytics infrastructure for this anyway, maybe it is possible to reuse that?)
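
A sketch of that lookup and the date math against the Action API (the enwiki endpoint and the username are placeholders):

```
from datetime import datetime, timezone

import requests

API = "https://en.wikipedia.org/w/api.php"  # example wiki

def account_age_and_editcount(username, at_time):
    """Return (account age at `at_time`, edit count) for a registered user via list=users."""
    params = {
        "action": "query",
        "list": "users",
        "ususers": username,
        "usprop": "registration|editcount",
        "format": "json",
        "formatversion": "2",
    }
    user = requests.get(API, params=params).json()["query"]["users"][0]
    registration = user.get("registration")  # may be missing for very old accounts
    age = None
    if registration:
        registered = datetime.fromisoformat(registration.replace("Z", "+00:00"))
        age = at_time - registered
    return age, user.get("editcount")

# Example: account age at the time of a placeholder edit timestamp.
age, edits = account_age_and_editcount("ExampleUser", datetime(2020, 9, 25, tzinfo=timezone.utc))
```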

is there an easy way to scrape vandalized edits?

Not in any non-circular way AFAIK - you could use ORES (on wikis which have it) but that's also part of the logic that needs to be tested so I imagine it would not help.

Hi @RBrounley_WMF, thanks for sharing this and for the great work you are doing. A few comments from my side:

  • Although "Safe" is better than "credible", I think here you are referring more to "stable" or "reviewed". I think is risky to mark some content as "safe" with a WMF stamp. For example, in the context of COVID-19 related information (or elections-related content), with the tools you are applying we can say if something was "reviewed", but saying an information is "safe" could have even legal implications.
  • Where are all the thresholds coming from? Why 50 and not 40? or 30? I think those numbers might change a lot depending on the project's size. I'm not sure about the tech pipeline you are using, but without knowing the details, for my sounds natural to have adaptative thresholds depending on project's activity.
  • Have you consider adding any information about pageviews? This could be useful for (i) assessing content relevance (maybe is not important to have super fresh content for non-relevant pages) and (ii) detecting content anomalies .
  • More in general, I think labeling content (if I understood correctly you have 5 classes) is a risky approach. First, because you need to put names on those levels (and take the responsibility about those names), and second because is a one-dimensom label (although you are considering different inputs). Imho, it would be better to give a multidimensional score, with objective parameters (number of edits, ORES score, time on the platform, etc ...) and let the final dump consumer to put their own levels. In other words, if company X wants to call content with more than Y edits, and Z ores-score: "safe", that would be their responsibility. I'm not saying this only to avoid potential legal issues, but also because it depends a lot on the contents' usage and context. It is not the same to have "safe" information about a soccer score, than about the COVID-19 vaccines date of release. I understand that from a "dump" approach is difficult to do what I'm suggesting (because dumps implies a kind of "bucketing") but giving all the risks associated with WMF signed labels, it might good to think how work with multidimensional scores in a dump-based approach.
  • I also think you need to evaluate your algorithm some how. You have an hypothesis (you can filter content safety by a set of heuristics) and you have a methodology (the rule-based algorithm you are describing), but the evaluation is missing.
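
To make the multidimensional-score idea concrete, a per-revision record in the export could carry the raw signals instead of a label. The field names and values below are purely illustrative, not a proposed schema:

```
# Purely illustrative per-revision signal record (not a proposed schema).
revision_signals = {
    "revid": 123456789,           # placeholder revision ID
    "ores_damaging_score": 0.12,  # raw model output, no threshold applied
    "seconds_live": 5400,         # time the revision has been live at export time
    "editor_edit_count": 240,
    "editor_is_anonymous": False,
    "flagged_revision": True,     # FlaggedRevs state, on wikis that use it
    "patrolled": True,
}

# A reuser then applies their own cutoffs for their own use case, e.g.:
acceptable_for_my_use_case = (
    revision_signals["ores_damaging_score"] < 0.3
    and revision_signals["seconds_live"] > 3600
)
```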

@RBrounley_WMF thank you for the conversation yesterday and today, and for your continued work to gather feedback and improve the work. I'm documenting all my feedback below (in no particular order).

  • When it comes to credibility/truth/misinformation, @diego is the subject matter expert in Research and I highly recommend you all continue the conversations to find a solution that works for all sides.
  • I read Diego's comments and I would like to emphasize the importance of each of them. I can expand each point from my perspective further if it's needed, but to err on the side of brevity: please follow his advice.
  • I understand the potential need from the consumer perspective to have an easy answer in the form of a product from us. I empathize with your situation as well. However, there is no easy answer from us at this point. We have not evaluated many of the hypotheses you have (for example, why 6 hours in level 5? I can tell you that my hypothesis is that the number of hours will depend on the number of pageviews to the page, number of editors in the project, how many editors have watched the page, if there are any stewards who speak the language of the project, and many more factors. I can confidently say that 6 hours for all projects is most likely wrong.).
  • You understand the customer needs. If I may share my perspective: when I gave a keynote talk in Google almost 2 years ago, what I heard loud and clear from many folks who are trying to answer the hard questions around data integrity in Google was: 1) we don't know all the important details of how WP/WM works when it comes to patrolling and content integrity and decision making on edits (consistent with something you had heard from others), 2) if you tell us what we should pay attention to, we can build models to handle the rest. This feedback makes me think: 1) we really need to make it clear what signals/features they should be aware of and pay attention to (both from your product's perspective and also that by doing so we help organizations in the ecosystem of the Web have better ways to stop the propagation of misinformation or bias), 2) we can/should rely on them doing the rest of the math themselves. If you don't agree with my second point, then my ask to you is that we get together and plan for doing the work to make it possible for you to offer a service we can stand behind.
  • I understand that OKAPI is a project that has many components and this particular part may not be the priority that you want to linger on for long. If that is the case, let's offer a much simpler solution and move on. I discourage you from putting a service out that we have not fully evaluated.

I close this feedback by re-iterating: my team and I are committed to working with you and your team to make your product better if this is a priority for the organization. There are some areas where we need to be really clear and transparent, such as our ability (or inability) to form judgements and assign labels. I'd like to be very clear about those, as that's my responsibility here. I've asked Diego to do the same. I hope you find the feedback helpful.

I'll be around on Oct. 1 and 2 if you want to brainstorm more. I'll be focused on other projects in the week of October 5 and can spend more time with you starting October 12. (Same for Diego, to a good extent.)

A few technical notes to back up what others have said quite well above:

It is not the same to have "safe" information about a soccer score as about the release date of a COVID-19 vaccine.

Diego is making a really good and core point here. The ORES scores, for instance, do a very good job of detecting vandalism such as a bunch of spam characters being injected or racial slurs being added (e.g., content warning, but see the lists for English). That's a very different sort of problem than detecting (m|d)isinformation like release dates around vaccines -- there's no way for an ML service to know whether a given date is reasonable, and it's quite difficult to even know what information within an article would be considered most sensitive.

trust anonymous user after 100 edits

This is really "trust IP address after 100 edits" -- I hesitate to do it because most individuals who intend to make upwards of 100 edits will create an account, as that's a much better way to be anonymous, while the IP addresses that will plausibly reach 100 edits are much more likely to be shared IP addresses / proxies -- i.e. multiple people all editing (knowingly or unknowingly) with the same IP -- that haven't been blocked for spam yet. I frankly wouldn't base any criteria on the # of edits by a given IP address; it's just too unstable an identifier to be a good proxy for trust.

Where are all the thresholds coming from? Why 50 and not 40, or 30? I think those numbers might change a lot depending on the project's size. I'm not sure about the tech pipeline you are using, but without knowing the details, to me it sounds natural to have adaptive thresholds depending on the project's activity.

For instance, comparing English ORES filters with Spanish ORES filters shows how these thresholds do vary between language communities -- partially this is the model, and partially this is how a community wants to interpret a given label. I think we probably want some analysis of time-to-revert as well, per the point made by TGR.
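
For reference, those per-revision scores came from the public ORES scoring API (as it existed when this task was filed); a sketch of querying it directly, with a placeholder revision ID and noting that model availability varies by wiki:

```
import requests

ORES_URL = "https://ores.wikimedia.org/v3/scores/{wiki}/"

def damaging_probability(wiki, revid):
    """Return the ORES 'damaging' probability for one revision, on wikis with that model."""
    resp = requests.get(ORES_URL.format(wiki=wiki), params={"models": "damaging", "revids": revid})
    score = resp.json()[wiki]["scores"][str(revid)]["damaging"]["score"]
    return score["probability"]["true"]

# Example with a placeholder revision ID on English Wikipedia.
print(damaging_probability("enwiki", 123456789))
```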

And a few additional points:

  • Another factor that you could consider incorporating is the # of visiting watchers -- i.e. the number of people with that article on their watchlist who have viewed the article within the six months before the most recent edit (see the API sketch after this list). I'm personally very excited about the watchlist expiry project, as it'll hopefully give much better data on how many editors are actually watching a page. This, along with pageviews as mentioned by Diego, could be used to more dynamically decide what threshold of time since the last edit might be considered "safe".
  • Be aware that, depending on the success of this project, these thresholds could introduce new motivation to game the metrics. Some can't be gamed -- e.g., time since edit -- but introducing new edit counts to achieve trust levels could be gamed, so perhaps these edit counts should be mapped to existing user access level thresholds on the wiki so that patrollers don't have to start watching for new patterns of abnormal behavior.
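
A sketch of pulling those watcher counts via the Action API's `prop=info`; both fields come back omitted when the count is below the wiki's privacy threshold:

```
import requests

API = "https://en.wikipedia.org/w/api.php"  # example wiki

def watcher_counts(title):
    """Return (watchers, visitingwatchers) for a page, where the wiki exposes them."""
    params = {
        "action": "query",
        "prop": "info",
        "inprop": "watchers|visitingwatchers",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    page = requests.get(API, params=params).json()["query"]["pages"][0]
    # Both fields are omitted entirely when the count is below the wiki's threshold.
    return page.get("watchers"), page.get("visitingwatchers")

print(watcher_counts("Example"))
```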

@RBrounley_WMF: Hi, do you still plan to work on this, as this task is assigned to you? Thanks.

@RBrounley_WMF: Removing task assignee as this open task has been assigned for more than two years - see the email sent to the task assignee on February 22nd, 2023.
Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome! :)
If this task has been resolved in the meantime, or should not be worked on by anybody ("declined"), please update its task status via "Add Action… 🡒 Change Status".
Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips how to best manage your individual work in Phabricator. Thanks!

JArguello-WMF claimed this task.