
Where to surface AI in Wikimedia Projects
Closed, Resolved · Public

Authored By
Halfak, Oct 19 2016, 10:20 PM
Referenced Files
None
Tokens
"Like" token, awarded by Tgr."Like" token, awarded by Glorian_Yapinus."Like" token, awarded by Niharika."Like" token, awarded by ZhouZ."Cup of Joe" token, awarded by Capt_Swing."Like" token, awarded by Fjalapeno."Love" token, awarded by dr0ptp4kt.

Description

Session title

Where to surface AI in Wikimedia Projects

Main topic

T147708: Facilitate Wikidev'17 main topic "Artificial Intelligence to build and navigate content"

Type of activity

Unconference session

Description

Currently, ORES is available in production and article recommendations are right around the corner. Where should we be taking advantage of these predictions?

1. The problem

There are new useful AI services, but the predictions/rankings are not yet consistently surfaced through our UIs. Where should they be surfaced? What should we prioritize?

For example, ORES has 4 models:

  • reverted: Should this edit be reverted?
  • goodfaith: Was this edit saved in good faith?
  • damaging: Did this edit cause damage?
  • wp10: What's the quality level of this article?

Where should these predictions be surfaced?
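For concreteness, these models are exposed over ORES's public HTTP API. Below is a minimal sketch of how a client might fetch predictions for a revision; the nesting shown in the comment reflects the v3 scores endpoint as commonly documented, but treat the exact field layout as an assumption to verify against the live service.

```python
import requests

ORES_HOST = "https://ores.wikimedia.org"

def fetch_scores(context, rev_id, models=("damaging", "goodfaith", "wp10")):
    """Fetch ORES predictions for one revision on a given wiki (e.g. 'enwiki')."""
    url = f"{ORES_HOST}/v3/scores/{context}/{rev_id}"
    resp = requests.get(url, params={"models": "|".join(models)})
    resp.raise_for_status()
    data = resp.json()
    # v3 responses are nested: {context: {"scores": {rev_id: {model: {"score": ...}}}}}
    return data[context]["scores"][str(rev_id)]

scores = fetch_scores("enwiki", 123456)
for model, result in scores.items():
    print(f"{model}: {result['score']['prediction']}")
```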

2. Expected outcome

Develop a list of products and features that could take advantage of available AIs.

3. Current status of the discussion

While the session was initially intended to be interpreted broadly, ORES and general predictions became the focus of discussion. It seems that the Recommendation-API isn't quite ready for this type of discussion. @Quiddity brought up the proposal to do the obvious with the ORES wp10 (article quality) models -- surface them on article pages. @jmatazzoni is interested in discussing surfacing the ORES edit quality models in his team's Edit Review Improvements project. @EBernhardson talked about pulling the wp10 models into ranking search results (and @Sumit signaled interest in collaborating). @Tgr talked about extending where the edit quality models of ORES are surfaced (e.g. diff pages) and discussed T132901, a specific project about using ORES scores to trigger flagged revisions. @Cenarium is already working on some patches to implement ORES-based (as well as other scoring mechanisms) auto-flagging of edits. @ZhouZ, @Niharika, @Capt_Swing, @Fjalapeno, @dr0ptp4kt, and @Glorian_Yapinus signalled interest in the topic by awarding this task a token.

4. Links

Proposed by

@Halfak (Facilitated by @Fjalapeno)

Preferred group size

Any - whoever is interested in getting AI in our products

Any supplies that you would need to run the session

A/V equipment for presenting and an Etherpad for discussion

Interested attendees (sign up below)

  1. Joe Matazzoni (@jmatazzoni)
  2. Pau Giner (@Pginer-WMF)
  3. @Tgr
  4. @Niedzielski
  5. @EBernhardson
  6. @RHo
  7. @dr0ptp4kt if morning
  8. @Mholloway

Event Timeline


@leila, would you be interested in including the recommender AI in this session? I'd originally intended to include it, but I only provided ORES examples above.

@Halfak Can I get back to you about this by October 28? I'm thinking about it and I need to spend some cycles on it before finalizing the decision. With the WWW paper due on the 24th, I'm completely consumed at the moment though.

@Halfak I hear that my previous response came across as dismissive of the dev summit. That was not my intention and I'm sorry about it. I'll rephrase and expand below. :)

The bulk of the recommendation work we have done on article recommendation for creation is simple machine learning with solid research outcomes. There is a public API surfaced for the work at this point, and a few tool developers are using it; the CX team is our most sustained user. We have a tool that can be used by the different communities. These are all great, but they are not ideal material for a summit with a focus on a developer audience. For this audience, I would like us to have enough interesting material for a 15-20 min presentation, focused on the current state of development, development opportunities (on the API or using the API) in the future, and discussing the roadmap for how to build a recommendation pipeline for Wikimedia projects. I won't be able to deliver such a presentation given that I have already committed myself to quite a few items for the coming 2 months and I cannot add more items to my plate. Sorry.

If @ellery or @schana would like to pick this up, they are by all means welcome to.
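For reference, a sketch of what calling such a recommendation service might look like. The host, path, and parameter names below are assumptions for illustration only, not the documented interface; check the service's own documentation for the actual endpoint.

```python
import requests

# NOTE: host and parameter names are illustrative assumptions, not the
# documented interface of the recommendation service.
RECOMMEND_HOST = "https://recommend.wmflabs.org"

def missing_article_recommendations(source="en", target="fr", count=10):
    """Sketch: ask the service for articles that exist in the source wiki
    but are missing from the target wiki (creation/translation candidates)."""
    url = f"{RECOMMEND_HOST}/types/translation/v1/articles"
    resp = requests.get(url, params={"source": source,
                                     "target": target,
                                     "count": count})
    resp.raise_for_status()
    return resp.json()

for rec in missing_article_recommendations():
    print(rec)
```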

I would like us to have enough interesting material for a 15-20 min presentation

I was imagining a discussion style session. I hadn't intended for any sort of formal presentation.

For the wp10 model, I'd like to be able to see the output surfaced at the top of the article via a gadget/script, very similarly to how this Enwiki gadget does it: https://en.wikipedia.org/wiki/Wikipedia:Metadata_gadget - It could append to that line something like: "ORES-quality rating: B-class" - but it would work on all projects that have the wp10 model available.
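A rough sketch of the lookup such a gadget would perform (written in Python here for consistency with the other examples; an actual gadget would do the equivalent fetch in JavaScript). The class labels in the docstring follow the enwiki assessment scale the wp10 model predicts.

```python
import requests

def article_quality(context, rev_id):
    """Return the wp10 predicted class (e.g. 'Stub', 'Start', 'C', 'B',
    'GA', 'FA'), suitable for display as 'ORES-quality rating: B-class'."""
    url = f"https://ores.wikimedia.org/v3/scores/{context}/{rev_id}/wp10"
    resp = requests.get(url)
    resp.raise_for_status()
    score = resp.json()[context]["scores"][str(rev_id)]["wp10"]["score"]
    return score["prediction"]

print(f"ORES-quality rating: {article_quality('enwiki', 123456)}-class")
```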

@EBernhardson expressed interest in surfacing ORES wp10 scores in search results and using them for weighting results in search

@Fjalapeno, @JMinor, @JKatzWMF, et al. expressed interest in surfacing ORES scores to readers (probably wp10)

@ellery has been working on reader recommendations that should probably be served through some reader interface.

We're also working on building an edit types model that I'd like to surface on article history pages and Special:Contributions. See https://meta.wikimedia.org/wiki/Research:Automated_classification_of_edit_types

@Cervisiarius has been working on a way to surface a link recommender to editors. It seems we might be able to surface something like that in VE.

@jmatazzoni & @Pginer-WMF will probably be interested in discussing their current work re. surfacing edit quality scores from ORES in the edit review improvements project. This session at the summit will probably provide a good venue for attracting collaborators and testers for the current work. It would be great to get some people who do patrolling work to give some feedback on the early mocks that we have or participate in some design exercises.

@jmatazzoni & @Pginer-WMF will probably be interested in discussing their current work re. surfacing edit quality scores from ORES

Happy to do it if there's interest.

@Cenarium has been working on Deferred Changes, a system that would use scores from ClueBot NG or ORES to post-hoc flag edits for review. That's a novel way of surfacing AI from within a baked-in wiki process. It seems like that would be worth including in a discussion at the dev summit.

With regard to wp10 in search: it is 100% still on our radar and something we are evaluating how to do right. We are still a little behind on figuring out how best to evaluate query-independent factors so that we can integrate them into the scoring equation and produce better results. Semi-related, I've got a side project going to try and use machine learning on search clickthroughs to determine (query, page_id) relevance scores, and then correlating QI factors (such as wp10) to those relevance scores to determine how to actually integrate them properly.
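A minimal sketch of the correlation step described above, assuming you already have per-(query, page_id) relevance scores from a click model and a numeric quality weight derived from wp10; the data shapes and values here are illustrative assumptions.

```python
import numpy as np

# Assumed inputs (illustrative): click-model relevance per (query, page_id),
# and a numeric quality weight derived from wp10 class probabilities.
relevance = {("cats", 1234): 0.82, ("cats", 5678): 0.35, ("dogs", 1234): 0.10}
wp10_weight = {1234: 0.9, 5678: 0.4}

pairs = [(rel, wp10_weight[page_id])
         for (query, page_id), rel in relevance.items()
         if page_id in wp10_weight]
rels, quals = zip(*pairs)

# Pearson correlation between click-derived relevance and article quality:
# a rough signal for how much weight wp10 deserves in the ranking function.
r = np.corrcoef(rels, quals)[0, 1]
print(f"correlation(relevance, wp10): {r:.3f}")
```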

Great to hear @EBernhardson and I'm not surprised. Extra work is hard to schedule even when it's mega cool. :)

On the #revision-scoring-as-a-service team, we're working on some new models that might come in handy. Specifically, we're modeling sentences down to the grammar (T148037: Generate PCFG sentence models). I figure that either this might give you or your team ideas for what ORES might provide beyond wp10, *or* someone on your team might be able to help us work through some of the hairy details of extracting natural language from wikitext.
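As a sketch of where those hairy details start, here is one way to strip markup before any sentence-level modeling, using the mwparserfromhell library; the naive regex sentence split is a placeholder, and templates, tables, and references are exactly the cases that make this hard in practice.

```python
import re
import mwparserfromhell

def extract_sentences(wikitext):
    """Strip wiki markup and do a naive sentence split.
    Templates, tables, refs, and lists are where this breaks down."""
    wikicode = mwparserfromhell.parse(wikitext)
    plain = wikicode.strip_code(normalize=True, collapse=True)
    # Naive split on sentence-final punctuation; real pipelines need better.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", plain) if s.strip()]

sample = "'''Example''' is a [[thing]].{{citation needed}} It has two sentences."
for sentence in extract_sentences(sample):
    print(sentence)
```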

For the editor-facing models (vandalism/goodfaith/damaging) a lot of it is obvious: show it on diffs, show it on pages which list diffs (recent changes, watchlist, user contributions, article history) and, if possible, allow filtering by it; provide robust and versatile APIs so that it's easy to integrate them in community-maintained patrolling tools. Maybe show aggregates on user profiles, although that's a bit scary. A less obvious option is to use them to decide whether to hide the last change to the article. I would like to work on T132901: Use revision scoring to trigger flagged protection when I have some free time. Also it would be interesting to expose that information to search engines somehow; Google indexing the vandalized version of a page is a frequent problem.
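A sketch of the patrolling-feed idea: pull recent edits from the standard MediaWiki API, score them in batch with ORES, and keep only the edits above a damaging-probability threshold. The 0.8 threshold is an illustrative assumption; a real tool would tune it against the model's precision/recall.

```python
import requests

def recent_changes(api_url="https://en.wikipedia.org/w/api.php", limit=50):
    """Fetch recent edits (revision ids) from the MediaWiki API."""
    params = {
        "action": "query", "list": "recentchanges", "rctype": "edit",
        "rcprop": "ids|title", "rclimit": limit, "format": "json",
    }
    return requests.get(api_url, params=params).json()["query"]["recentchanges"]

def likely_damaging(rev_ids, threshold=0.8):
    """Batch-score revisions with ORES; keep those likely to be damaging."""
    url = "https://ores.wikimedia.org/v3/scores/enwiki"
    resp = requests.get(url, params={"models": "damaging",
                                     "revids": "|".join(map(str, rev_ids))})
    scores = resp.json()["enwiki"]["scores"]
    return [rid for rid in map(str, rev_ids)
            # some revisions return an error object instead of a score
            if "score" in scores.get(rid, {}).get("damaging", {})
            and scores[rid]["damaging"]["score"]["probability"]["true"] >= threshold]

changes = recent_changes()
flagged = likely_damaging([rc["revid"] for rc in changes])
print(f"{len(flagged)} of {len(changes)} recent edits look damaging")
```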

As I understand it, there is also an experimental service to detect harassment / personal attacks? It would be interesting to turn that into some sort of patrollable stream of civility violations.

Also, for both civility and diff quality metrics, if they are reliable enough and fast enough, some AbuseFilter-like functionality ("are you sure you want to save that?") could be very useful. Of course that brings up transparency concerns, as it would not be easy to notice bias introduced by an algorithm that vets not-yet-saved edits...

Semi-related, I've got a side project going to try and use machine learning on search clickthroughs to determine (query, page_id) relevance scores, and then correlating QI factors (such as wp10) to those relevance scores to determine how to actually integrate them properly.

@EBernhardson is the side project somewhere public? I'd like to play with it and help out if possible as I'm already working on a subset of relevance problem.

I would like to work on T132901: Use revision scoring to trigger flagged protection when I have some free time.

For FlaggedRevs, I've been working on an implementation of this concept: see T118696 for bots and AbuseFilter, and T150593 for ORES. See also this commit for the implementation (bots/AbuseFilter only, no commit yet for ORES), and the recently approved enwiki RFC.
In core MediaWiki, we could have a minimalist RC patrol UI that displays a change as unpatrolled and allows patrolling it only in certain circumstances. I had worked on this concept using change tags in this commit; if this gets implemented, we would only have to make ORES tag damaging edits. Compared with deferred changes, this could be used on a broader category of changes (we absolutely need to avoid backlogs in deferred changes).

@EBernhardson is the side project somewhere public? I'd like to play with it and help out if possible as I'm already working on a subset of relevance problem.

Only kinda-sorta. It uses https://gerrit.wikimedia.org/r/#/c/317019/ to join click data against the original results that were presented to the user. ~90 days of this information is in Hive at ebernhardson.top_query_clicks. The code I'm processing it with is at https://github.com/ebernhardson/l2r
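For a feel of what a first processing step over such a click log might look like, here is a pandas sketch that aggregates clicks into a crude CTR-style relevance label per (query, page_id). The column names are assumptions for illustration, not the actual schema of the Hive table.

```python
import pandas as pd

# Illustrative rows; the real data lives in Hive, and the column names here
# (query, page_id, clicked) are assumptions, not the table's actual schema.
log = pd.DataFrame([
    {"query": "cats", "page_id": 1234, "clicked": 1},
    {"query": "cats", "page_id": 1234, "clicked": 0},
    {"query": "cats", "page_id": 5678, "clicked": 0},
    {"query": "dogs", "page_id": 1234, "clicked": 1},
])

# Crude relevance label: clickthrough rate per (query, page) pair.
# Real pipelines (e.g. DBN-style click models) also correct for position bias.
relevance = (log.groupby(["query", "page_id"])["clicked"]
                .agg(impressions="count", ctr="mean")
                .reset_index())
print(relevance)
```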

@Halfak Hey! As the developer summit is less than four weeks from now, we are working on a plan to incorporate the ‘unconference sessions’ that have been proposed so far and those that will be generated on the spot. Could you confirm whether you plan to facilitate this session at the summit? Also, if your answer is 'YES,' I would like to encourage you to update/arrange the task description fields to appear in the following format:

Session title
Main topic
Type of activity
Description (move ‘The Problem,' ‘Expected Outcome,' ‘Current status of the discussion' and ‘Links' to this section)
Proposed by (your name linked to your MediaWiki URL, or profile elsewhere on the internet)
Preferred group size
Any supplies that you would need to run the session (e.g. post-its)
Interested attendees (sign up below)

  1. Add your name here

We will be reaching out to the summit participants next week asking them to express their interest in unconference sessions by signing up.

@Fjalapeno and @jmatazzoni, any interest in taking the lead here? I've got a lot of other obligations for the dev summit and I know you've both been thinking about this a lot.

^ also potentially @dr0ptp4kt, who is also digging into these areas. If none of them want to, I'd be willing to be a backup facilitator. I'm not working on this day-to-day, but I am experienced in this area of tech and believe this is an important discussion.

To maintain consistency, please consider referring to the template in the following task description: https://phabricator.wikimedia.org/T149564.

@Halfak I can facilitate, but I am far from an expert in ORES… would you be able to provide some support (especially with AI knowledge)?

Also, will you be attending the session?

I'll attend and I'm happy to help with the discussion. I don't think an intimate knowledge of ORES will be necessary at all. In writing this up, I was imagining a broader conversation about where AIs can feed into new information and usage patterns in the UI. I'd like to encourage thinking about AIs broadly and not limiting ourselves to ORES' current capabilities. Maybe we could purposefully schedule the session after T147710: What should an AI do for you? Building an AI Wishlist.

@Halfak thanks that will be helpful. I updated the description to fit the new template. Do you have thoughts on the group size or supplies for the session?

Do you think it is anything more than a discussion forum (so maybe just projector/screen with an etherpad)?

Looks solid. I was imagining an open discussion with an etherpad & projector/screen.

Oh yeah, re. audience size, I don't think we need to set a limit. I'd like to leave the format up to you, but I wanted to suggest opening with a small teaser of some ideas that people have about surfacing AI. It might be a good idea to show a few of the tools that use ORES and the recommender service. We might even want to pull in some AI/UI ideas from other sites. I think a brief presentation like that would help everyone find common language when discussing their own ideas.

Fjalapeno added a subscriber: jmatazzoni.

@Halfak format sounds good to me… thanks!

I might hit you up for more info on recommended demos before the Dev summit.

Tgr updated the task description.

To the facilitator of this session: We have updated the unconference page with more instructions and FAQs. Please review it in detail before the summit: https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2017/Unconference. If there are any questions or confusion, please ask! If your session gets a spot on the schedule, we would like you to read the session guidelines in detail: https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2017/Session_Guidelines. We would also expect you to recruit two (min) to three (max) note-takers, a remote moderator, and an advocate (optional) on the spot before the beginning of your session. Instructions for each role are outlined in the guidelines. Physical versions of the role cards will be available in all the session rooms! See you at the summit! :)