Introduce a new namespace for collaborative judgements about wiki entities
Open, NormalPublic

Description

Background

Hi, we've been asked by Site Reliability to circle back through the TechCom process in case you can help us find a more palatable way to store our data. The problem from an Ops perspective is that we're increasing row count in the page and revision tables, which are obviously critical infrastructure, and already at a breaking point. As I understand it, page and revision tables can't be sharded any more granularly than per-wiki, and they must be replicated to every DB node, so they scale poorly. Another concern is that pages can't be deleted in case our project fails or we decide to disable the on-wiki storage component.

The arguments in favor of wiki page storage revolve around how well wiki pages satisfy our requirements for collaboration, suppression, and visibility.

Here's some context for this project:
https://www.mediawiki.org/wiki/Jade

The currently proposed technical implementation and its code:
https://www.mediawiki.org/wiki/Extension:Jade

Exploration of alternative implementations:
https://www.mediawiki.org/wiki/Jade/Implementations

Exploration of alternative implementations (in spreadsheet form):
https://docs.google.com/spreadsheets/d/1y7CPeAFpjOO-FTXLhp9qfO3lx6-OsaroCMNSNJMUFqc/edit#gid=0

Anticipated use cases (pending user tests):
https://docs.google.com/spreadsheets/d/1RPb8VHbseE_xPe46nFqo4QVYmwzgfFJrO4_Wh-QKBSw/edit#gid=0

Older discussion about using wiki page storage for judgments:
https://etherpad.wikimedia.org/p/Jade_scalability_FAQ
T196547: [Epic] Extension:JADE scalability concerns

Proposal

The proposal is to create two new namespaces, Judgment and Judgment_talk (exact names to be decided at T200365). In the content namespace, pages will be a JSON description of judgments about wiki entities such as a particular edit. For example, w:en:Judgment:Diff/123457 would have judgments about whether https://en.wikipedia.org/wiki/?diff=123457 is a damaging or good-faith edit, and its talk page the dialogue leading to this consensus.

Integration

Our first integrations will be to transparently duplicate data from existing user workflows. (UPDATE: We have a new guideline for the project, which is to only integrate in ways that allow for collaboration. That way, our data will have more consistent quality and isn't just a mirror of the simpler, existing processes. Jade data should be produced and reviewed collaboratively.)

The first planned integration is T201361: Jade Implementation: Watchlist integration, which will expose Jade edits and summary information about the judgment in watchlists which track the page being judged.

If the watchlist integration goes well, similar principles can be used to embed Jade in other revision pagers (Special:RecentChanges, Special:Contributions, action=history).

Additional workflows can be enriched by Jade integration. For example, we can collect comments during patrolling actions, a practice that has been shown to increase operator accuracy in other domains. We can expose existing Jade judgments in patrolling interfaces, and allow for collaborative interaction with the judgment content. This is quite vague for now and will wait until after the initial integration cycles.

Each workflow integration will be enabled or rolled back by a separate wiki configuration.

Impact

Our estimated impact is to ramp up to a 1% increase in the number of revisions created on each wiki, with a page also created for each of these judgments. The integration will be done incrementally, so this increase doesn't have to happen all at once, but can be stretched out over months or years. We insist that our namespace is only appropriate for human judgments and not bot predictions, so human labor time should be the limiting factor for how much review is performed and how many pages are created. This human labor assumption is the basis for our 1% overhead, and it comes from the total proportion of current wiki edits which are reviewed across all review workflows. Since bots could blow through this ceiling in dangerous ways, we're asking for a social agreement to curb bot abuse in the new namespaces before enabling Jade on any wiki.

Future

In the very long term, we anticipate that structured content models will have dedicated storage support which will be a natural fit for Jade, allowing us to shard more appropriately, and run analysis queries into JSON content. Ideally, that migration will reclaim all storage from MariaDB.

Alternatives

See this document for alternative implementations:
https://www.mediawiki.org/wiki/Jade/Implementations

Update, 2018-11-19

We've implemented some of the pilot features for Extension:Jade, see the following resources:

Beta cluster sandbox judgments:

Browse the source code:

Event Timeline

I think we can support filtering by adding an index on the summary data? I was planning to do this unless there's a technical reason not to.

This would be a significant change to the schema and usage patterns as discussed in the RFC. It's actually a completely new usage pattern. It will need at least another feedback round with DBAs, and kind of invalidates the RFC discussion we had.

An index on that field would allow prefix matches. Denormalized summaries for display are conceptually quite different from filter criteria, though. This should be explored with a clearer view of the use cases.

Again: adding filter functionality is not a minor change, it really turns the entire proposal on its head.

I'm... happy to not add the index also, if this is really the case. There's no prefix matching though, these are two tinyint fields holding a boolean each. Filtering is currently done on several other tables joined in a similar way in the revision pager queries, such as rc_tags and ORES scores, so I don't see this as a major upheaval. Also, we wouldn't be adding a filtering UI any time soon, we're just supporting the future use case by providing an index on the boolean columns.

Neither the augmentation nor filtering use cases is on our product roadmap for the near future, we're only adding the columns because of helpful suggestions in the IRC meeting. If you feel that it's very likely people will want to see augmentation, but unlikely that they want to do filtering, please do share the reasoning that supports this conclusion.

There's no prefix matching though, these are two tinyint fields holding a boolean each.

Ah right, summaries are to be normalized and referenced by ID. So no prefix matching, only exact matches.

If you feel that it's very likely people will want to see augmentation, but unlikely that they want to do filtering, please do share the reasoning that supports this conclusion.

I do think people will ask for filtering immediately.

Filtering is currently done on several other tables joined in a similar way in the revision pager queries, such as rc_tags and ORES scores, so I don't see this as a major upheaval

Adding more things to join, and more ways to filter, adds more complexity and more load.

You are adding an access pattern that was not part of the proposal before, and you are modifying the proposed schema to accommodate it. This is a significant change to the schema as proposed. That means that at least DBAs need to be looped in again.

Neither the augmentation nor filtering use cases is on our product roadmap for the near future, we're only adding the columns because of helpful suggestions in the IRC meeting.

The suggestion was made to avoid a situation where user demand to see the judgement in RC would lead to page content being loaded for RC views. Joining a table for selecting from it is not too much of a deal - it adds load and complexity, but it shouldn't be much. Joining the table in order to filter by it is potentially a much bigger deal, and needs DBA approval.

Adding an index that you may or may not need later for a feature which you are not yet sure you want to build also should be run by DBAs: it may be a good thing (to avoid the need to ALTER later), or it may be a bad thing (overhead with no benefit). This is a trade-off to be discussed with them.

Thanks, this has been a helpful tangent!

If you feel that it's very likely people will want to see augmentation, but unlikely that they want to do filtering, please do share the reasoning that supports this conclusion.

I do think people will ask for filtering immediately.

That seems likely to me too, and I'm glad it came up before deployment. I agree that the right way to proceed is to propose a schema and use cases, and ask for another DBA review.

Filtering is currently done on several other tables joined in a similar way in the revision pager queries, such as rc_tags and ORES scores, so I don't see this as a major upheaval

Adding more things to join, and more ways to filter, adds more complexity and more load.

My layperson's understanding is that the change I'm proposing will have these effects (and non-effects):

  • We're already fetching and processing judgment content when inserting to the link table, so there's no additional cost to extract judgment values.
  • 2 or 3 more indexes on the link tables, cost is O(1) time and O(n) storage.
  • There should be no impact on revision pager queries until we have UI to filter on the new index.

The way I see it, we're taking the same risk whether or not we make my proposed change, which is that we might have to run a final ALTER before we can safely support filtering at scale. If we don't add the filtering indexes, we're guaranteed to need a schema change. If we do add the indexes, we stand a chance of not needing a schema change.

I feel like the more complete implementation is better, so I'll go ahead and submit that for review later this week. Out of respect for our discussion, I'm going to implement in two patches. The first will add columns and logic for "query for" but not "query by", and the second will support the questionable filtering use case.

Joining the table in order to filter by it is potentially a much bigger deal, and needs DBA approval.

Yes, but without enabling the filtering UI we probably see no impact. I'll still get review, of course.

There should be no impact on revision pager queries until we have UI to filter on the new index.

The danger with this approach, and the thing I'm trying to guard against here, is a pattern I have seen a few times before: first it's "we don't need a big discussion since we are not using the index/field/table/api yet" and then it's "we don't need a big discussion, because we already have the index/field/table/api". In the end, this leads to no broad discussion taking place, leading to insufficient review and coordination, through nobody's fault.

So I favor not introducing things in the backend without discussion and coordination, even if they're unused for now. This also helps with avoiding premature implementation (YAGNI).

That being said, I don't see any immediate problem with the column or indexes as proposed. But they do change the proposal significantly, and the people involved in the discussion, especially DBAs, should be made aware.

Change 476447 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/JADE@master] Add indexes to filter by judgment value

https://gerrit.wikimedia.org/r/476447

We discussed this again in the TechCom meeting the other day. If DBAs are ok with not just the new field and indexes, but also with the new usage pattern (that is, filtering by values in the joined table), this can move forward in the RFC process. So, if DBAs have no objections, this can move on to last call.

@Marostegui Hello! I've added a few summary columns and indexes to the link tables, and the resulting DDL would look like this:
https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/JADE/+/4644f1f0f9fa723062123df947af30b084d3b1f8/sql/

The motivation is so that JADE judgment values can be used in the new RC filters in the same way as ORES predictions, to filter and highlight rows.

Please review at your convenience.

What's the expected growth for that table?
There is really not much to judge in regards to the table/indexes without knowing which kind of queries they'll have. There are indexes on every single column of the jade_diff_judgment table, some of those might end up being useless/duplicate depending on the queries you are expecting to have.

jrbs added a subscriber: jrbs. Dec 11 2018, 6:40 PM
SPoore added a subscriber: SPoore. Dec 11 2018, 7:21 PM

Here are some example queries to help with reviewing the DDL. @Marostegui, I'm especially interested in your feedback obviously, that will allow us to close this final chapter.

The DDL to review is in these two patches, split between the basic content fields and index for joining on revision in the highlighting case, and a followup which adds indexes for filtering.
https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/JADE/+/475932/
https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/JADE/+/476447/

Example of a recentchanges query which joins with the diff judgment table in order to augment RC rows with judgment summary data. This would be rendered by highlighting "damaging" rows. My intention is that jaded_revision and rc_this_oldid are both indexed, and can be efficiently joined. The covering index includes jaded_damaging, so no row lookups need to be made.

SELECT
  rc_id,
  rc_timestamp,
  rc_namespace,
  rc_title,
  rc_minor,
  rc_bot,
  rc_new,
  rc_cur_id,
  rc_this_oldid,
  rc_last_oldid,
  rc_type,
  rc_source,
  rc_patrolled,
  rc_ip,
  rc_old_len,
  rc_new_len,
  rc_deleted,
  rc_logid,
  rc_log_type,
  rc_log_action,
  rc_params,
  comment_rc_comment.comment_text AS `rc_comment_text`,
  comment_rc_comment.comment_data AS `rc_comment_data`,
  comment_rc_comment.comment_id AS `rc_comment_cid`,
  rc_user,
  rc_user_text,
  NULL AS `rc_actor`,
  wl_user,
  wl_notificationtimestamp,
  page_latest,
  (SELECT GROUP_CONCAT(ctd_name SEPARATOR ',') FROM `change_tag` INNER JOIN `change_tag_def` ON ((ct_tag_id=ctd_id)) WHERE ct_rc_id=rc_id ) AS `ts_tags`,
  jade_diff_judgment.jaded_damaging  -- added for Jade
FROM `recentchanges`
JOIN `comment` `comment_rc_comment`
  ON ((comment_rc_comment.comment_id = rc_comment_id))
LEFT JOIN `watchlist`
  ON (wl_user = '1' AND (wl_title=rc_title) AND (wl_namespace=rc_namespace))
LEFT JOIN `page`
  ON rc_cur_id = page_id
LEFT JOIN `jade_diff_judgment`  -- added for Jade
  ON rc_this_oldid = jaded_revision
WHERE
  rc_bot = '0'
  AND (rc_timestamp >= '20181210232653')
  AND rc_new IN ('0','1')
ORDER BY rc_timestamp DESC
LIMIT 50;

This is a recentchanges query which filters on the same field, so only showing edits judged to be "damaging". The specialized index on jaded_damaging is intended to select efficiently for a small number of matching jade_diff_judgment rows.

SELECT
  rc_id,
  rc_timestamp,
  rc_namespace,
  rc_title,
  rc_minor,
  rc_bot,
  rc_new,
  rc_cur_id,
  rc_this_oldid,
  rc_last_oldid,
  rc_type,
  rc_source,
  rc_patrolled,
  rc_ip,
  rc_old_len,
  rc_new_len,
  rc_deleted,
  rc_logid,
  rc_log_type,
  rc_log_action,
  rc_params,
  comment_rc_comment.comment_text AS `rc_comment_text`,
  comment_rc_comment.comment_data AS `rc_comment_data`,
  comment_rc_comment.comment_id AS `rc_comment_cid`,
  rc_user,
  rc_user_text,
  NULL AS `rc_actor`,
  wl_user,
  wl_notificationtimestamp,
  page_latest,
  (SELECT GROUP_CONCAT(ctd_name SEPARATOR ',') FROM `change_tag` INNER JOIN `change_tag_def` ON ((ct_tag_id=ctd_id)) WHERE ct_rc_id=rc_id ) AS `ts_tags`
FROM `recentchanges`
JOIN `comment` `comment_rc_comment`
  ON ((comment_rc_comment.comment_id = rc_comment_id))
LEFT JOIN `watchlist`
  ON (wl_user = '1' AND (wl_title=rc_title) AND (wl_namespace=rc_namespace))
LEFT JOIN `page`
  ON rc_cur_id = page_id
JOIN `jade_diff_judgment`  -- added for Jade
  ON rc_this_oldid = jaded_revision
WHERE
  rc_bot = '0'
  AND (rc_timestamp >= '20181210232653')
  AND rc_new IN ('0','1')
  AND jade_diff_judgment.jaded_damaging = 1  -- added for Jade
ORDER BY rc_timestamp DESC
LIMIT 50;
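
To make the difference between the two access patterns concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the extension's code: the tables are cut down to a few columns, both index names are hypothetical, and SQLite stands in for MariaDB. It only illustrates what the two joins return, and says nothing about production performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Heavily simplified stand-ins for the real tables (assumption).
    CREATE TABLE recentchanges (
        rc_id INTEGER PRIMARY KEY,
        rc_timestamp TEXT NOT NULL,
        rc_this_oldid INTEGER NOT NULL
    );
    CREATE TABLE jade_diff_judgment (
        jaded_revision INTEGER NOT NULL,
        jaded_damaging INTEGER
    );
    -- Hypothetical index names; the real DDL lives in the Gerrit patches.
    CREATE UNIQUE INDEX jaded_revision_idx
        ON jade_diff_judgment (jaded_revision, jaded_damaging);
    CREATE INDEX jaded_damaging_idx
        ON jade_diff_judgment (jaded_damaging);
""")

# Three recent changes; only two of them have a judgment row.
conn.executemany("INSERT INTO recentchanges VALUES (?, ?, ?)", [
    (1, "20181210232700", 101),
    (2, "20181210232800", 102),
    (3, "20181210232900", 103),
])
conn.executemany("INSERT INTO jade_diff_judgment VALUES (?, ?)", [
    (101, 1),  # judged damaging
    (102, 0),  # judged not damaging
])

# "Query for": every RC row is kept, judged or not (highlighting case).
highlight_rows = conn.execute("""
    SELECT rc_id, jaded_damaging
    FROM recentchanges
    LEFT JOIN jade_diff_judgment ON rc_this_oldid = jaded_revision
    ORDER BY rc_timestamp DESC
    LIMIT 50
""").fetchall()

# "Query by": only RC rows with a matching damaging judgment survive.
filter_rows = conn.execute("""
    SELECT rc_id
    FROM recentchanges
    JOIN jade_diff_judgment ON rc_this_oldid = jaded_revision
    WHERE jaded_damaging = 1
    ORDER BY rc_timestamp DESC
    LIMIT 50
""").fetchall()

print(highlight_rows)
print(filter_rows)
```

In the highlighting case the unjudged change comes back with a NULL judgment, while the filtering case drops every row that has no matching "damaging" judgment, which is why the database has to decide which table to drive the join from.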

Here are some example queries to help with reviewing the DDL. @Marostegui, I'm especially interested in your feedback obviously, that will allow us to close this final chapter.

Thanks for providing example queries - I asked at T200297#4793282 if you know more or less how much those two tables will grow?

I was talking to Brad about these queries and he pointed me to T97797, which describes something that might or might not happen to this query.
Without data in the table it is impossible to know whether the optimizer will do the right thing (use the index rather than scan the whole table), or whether, because few records pass the filter, it will decide to scan the full table to find the records that match.
We have seen the optimizer do unexpected things before (later fixed in newer releases), so this cannot be ruled out. A recent example is T197486#4749182. Sometimes these are documented bugs, and sometimes they are things we only see with big tables or complicated queries.
Other than a possible misbehaviour of the optimizer, they look ok to me.

What's the expected growth for that table?

Once Jade is fully accepted by communities, I'm assuming that the upper bound on table growth will follow existing patrolling volume. Manual patrol artifacts are created for roughly 1% of revisions, so I expect the jade_diff_judgment and jade_revision_judgment tables to grow at that rate once we reach full adoption. There will be at most one row in each table for these 1% of revisions that are patrolled. The ramp-up to this volume will probably take several years and follow a sigmoidal curve, with discontinuous jumps as various wiki-workflows are transferred over to Jade storage.

Other than a possible misbehaviour of the optimizer, they look ok to me.

Well, that's definitely a concern but it sounds like there's not much I can do to code around these potential bugs. I believe that in the highlighting case we won't be slowing anything down, but in the filter case we would be risking exactly the scenario you describe, where jade_* tables have very few rows relative to the recentchanges table so we might end up greatly increasing (100x) the number of recentchanges rows to scan. Other filters have this same problem, for example change tags and ORES.

@daniel or anyone else from TechCom want to help close this out? Code review is on hold pending DBA review, so we should take Marostegui's feedback and decide whether it makes sense to proceed with the schema as currently written.

Marostegui added a comment. Edited Dec 20 2018, 5:31 PM

I cannot really provide more feedback on the code itself apart from what I commented about the queries.
I'm not experienced enough with MW to be able to comment on the code itself - I will rely on the MW experts! :-)

daniel moved this task from Under discussion to Inbox on the TechCom-RFC board. Dec 20 2018, 5:36 PM

Moving to the RFC inbox, so TechCom will look at it during the next meeting. Since the DBAs have approved the plan, TechCom will probably put the RFC on last call (per T200297#4787259) during its next meeting, which will be on January 3rd.

I cannot really provide more feedback on the code itself apart from what I commented about the queries.
I'm not experienced enough with MW to be able to comment on the code itself - I will rely on the MW experts! :-)

Thank you for sharing your insights into the queries, it was a tremendous help. I'm glad you flagged the point about sparse rows in the join possibly fooling the optimizer, I'll make a task to monitor the query plans after deployment.
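
Monitoring could start from something as simple as capturing the plan for the filter query and checking which index, if any, it uses. The sketch below is only an illustration under stated assumptions: it uses Python's sqlite3 and SQLite's `EXPLAIN QUERY PLAN`, whereas production runs MariaDB, where the equivalent statement is `EXPLAIN` and the planner behaves differently. The cut-down tables, the index name and the `plan_details` helper are all hypothetical.

```python
import sqlite3


def plan_details(conn, sql):
    """Return the human-readable detail text of each query plan step."""
    # The detail text is the last column of every EXPLAIN QUERY PLAN row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for the real tables (assumption).
    CREATE TABLE recentchanges (
        rc_id INTEGER PRIMARY KEY,
        rc_timestamp TEXT NOT NULL,
        rc_this_oldid INTEGER NOT NULL
    );
    CREATE TABLE jade_diff_judgment (
        jaded_revision INTEGER NOT NULL,
        jaded_damaging INTEGER
    );
    -- Hypothetical index name.
    CREATE INDEX jaded_damaging_idx
        ON jade_diff_judgment (jaded_damaging, jaded_revision);
""")

FILTER_QUERY = """
    SELECT rc_id
    FROM recentchanges
    JOIN jade_diff_judgment ON rc_this_oldid = jaded_revision
    WHERE jaded_damaging = 1
    ORDER BY rc_timestamp DESC
    LIMIT 50
"""

details = plan_details(conn, FILTER_QUERY)
for step in details:
    print(step)
```

The plan chosen on an empty toy table tells us nothing about the plan chosen on a large wiki, so the same check would need to be repeated against real data once the jade_* tables start filling up.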

For instance, in recent changes patrolling when a change is marked as "patrolled", we can assume it is also "non damaging"

This is incorrect. The way recent change patrol works is to process it as a queue of unreviewed changes (like an inbox) resulting in them eventually being marked as dealt with: "marked as patrolled". An edit having been marked as patrolled could've been any of the following:

  • good faith and without need for immediate follow-up.
  • good faith but needed a follow-up fix which has been done, or the patroller will soon do.
  • bad faith and needs revert, which has been done or will soon be done.

It has no relation to the perceived intention of the user, nor the quality of the contribution.

awight added a comment. Jan 2 2019, 7:46 PM

Thank you for the clarifications! Actually, we've eliminated this entire integration concept from our project and I'm surprised I missed all the text in this task description. I'll edit now to bring it up to date.

The benefit of these integrations is that it will allow work deduplication.

I would very much like the patrolling workflows to become more effective, and reducing the duplication of efforts is a big part of that. So I'm happy to see this mentioned.

Having said that, I find lacking in the proposal a plan or direction for how Jade would accomplish that.

Today, patrolling typically happens mainly through tools that consume EventStreams, or poll API:RecentChanges.

On the small number of wikis that actually have "RC patrol" functionality enabled (where the "mark as patrolled" button is visible), we already have near-perfect deduplication because these tools would exclude edits in the user interface where rc_patrolled=marked (through the API query, or by listening for log events in the stream).
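
As an illustration of the polling side of this, a tool might build an API:RecentChanges request that excludes already-patrolled edits via `rcshow=!patrolled`. This is a minimal sketch, not taken from any particular tool: the Dutch Wikipedia endpoint, the chosen `rcprop` fields and the `unpatrolled_changes_url` helper are assumptions for the example, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed endpoint for the example; any wiki with RC patrol enabled would do.
API_ENDPOINT = "https://nl.wikipedia.org/w/api.php"


def unpatrolled_changes_url(limit=50):
    """Build a list=recentchanges query that skips patrolled edits."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcshow": "!patrolled",  # exclude edits already marked as patrolled
        "rcprop": "ids|title|timestamp|patrolled",
        "rclimit": str(limit),
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


url = unpatrolled_changes_url()
print(url)
```

As far as I know, the API only honours the patrol filter for accounts that hold the relevant patrol rights, so the request above would have to be made by an authenticated patroller.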

In terms of cross-coordination, this already happens as well. There are numerous ways through which an edit can be "dealt with" (reviewed, patrolled). All native ways to do that (core and extensions) automatically also mark the edit as patrolled, even if the user didn't use the patrolling UI. For example, core "rollback" functionality automatically marks revisions as patrolled. The same for FlaggedRevisions.

Dutch Wikipedia, Commons, and a few dozen other wikis are enjoying collaborative patrolling through this mechanism across different patrolling workflows.

The big problem is with the majority of wikis where this feature is disabled (no "mark as patrolled" buttons, and thus no ability to query/filter). Once we work with these communities to have the feature enabled (especially en.wikipedia.org), deduplication would naturally be solved there. Currently, deduplication is still achieved minimally within each tool. For example, Huggle has its own broadcasting system where it deduplicates efforts for users of the same tool, despite not being able to coordinate via the API (due to rcpatrol being disabled on enwiki).

Is the intention for Jade judgements to be indexed on recent changes, as for rc_patrolled and as for ORES, to allow real-time filtering and querying to list "unjudged" revisions?

awight added a comment. Jan 2 2019, 8:09 PM

The benefit of these integrations is that it will allow work deduplication.

I would very much like the patrolling workflows to become more effective, and reducing the duplication of efforts is a big part of that. So I'm happy to see this mentioned.

Having said that, I find lacking in the proposal a plan or direction for how Jade would accomplish that.

....

On the small number of wikis that actually have "RC patrol" functionality enabled (where the "mark as patrolled" button is visible), we already have near-perfect deduplication because these tools would exclude edits in the user interface where rc_patrolled=marked (through the API query, or by listening for log events in the stream).

This is really helpful, we'll make sure that Jade is marking the "patrolled" bit as well. We'll also go through our potential pilot wikis and consider what to do for the ones without RC patrol.

I didn't fully grasp how patrolling deduplication happens until your explanation, but it sounds like Jade probably doesn't have any "magic" improvements to offer in this area, as I'd previously hoped. Instead, I think our contribution will be to enrich the type of interaction that can happen.

Is the intention for Jade judgements to be indexed on recent changes, as for rc_patrolled and as for ORES, to allow real-time filtering and querying to list "unjudged" revisions?

Yes, judgments will be indexed by revision and we'll optimize for the recent changes case. For example, see this (unmerged) DDL, https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/JADE/+/3c23f4928f759c8ea03f8f6b5929c8d6280a3bff/sql/jade_diff_judgment.sql and some example queries in T200297#4829689

awight updated the task description. Jan 2 2019, 8:16 PM

For this week's TechCom-RFC Inbox triage, I'm unsure whether to move to Under discussion or Backlog.

We usually move things to Under discussion even when the proposal isn't final. The main factor for moving there being that there's a good understanding of the product needs and technical requirements, so that we can encourage others and ourselves to help approve, improve, or create a proposal.

At this point, however, I feel I don't have sufficient understanding to be able to ask someone for input on the technical implementation. I'll leave it in the Inbox for now to discuss in this week's TechCom meeting.

Also, @awight has invited me to join a meeting soon about Jade which will hopefully lead to that understanding.

Disclaimer: I've only read content on this task, and not the referenced mediawiki.org links. It's quite possible the questions I have would be answered there, in which case I'm happy to help summarise those bits here.

Halfak added a subscriber: Petrb. Jan 3 2019, 4:42 PM

I'll be attending the meeting that @Krinkle mentions. But in the meantime, I'd like to offer an explanation for what Jade will do that "patrolled" flags do not. This conversation is getting very Producty but if this is what TechComm needs to in order to consider

The most important difference is that the structured data of a "patrolling" action is recorded. Right now, we don't even have structured data about whether or not an edit was reverted. With Jade, we'll be able to record information about *why* an edit was reverted or not reverted. This is important for many reasons. One is that we can use it to check the correlation between ORES models and human judgment. If one particular editor is marking lots of edits as "badfaith" (patrolled + reverted + warning in the unstructured world) that ORES thinks are good, it would probably be good to review those judgments to make sure that neither the editor in question nor ORES is going crazy.

Secondarily, Jade is far more broad than the patrol flag and supports a multi-backlog view. We can use Jade on any wiki entity. E.g. people who want to support the Teahouse Hosts but who do not want to engage in social interaction with newcomers can use Jade to flag newcomers who are goodfaith but need help for others to reach out to. This flexibility makes patrolling more valuable too. In a conversation with @Petrb at the last hackathon, he told me about how Huggle users wanted to flag some edits as "suspicious but not obviously bad". That way, an edit could be removed from the Huggle queue but highlighted to watchlist users or added to some other backlog for careful review.

Finally, we can take advantage of increased data quality/specificity to train/test machine learning models. As editors go about their work, they are constantly making judgments about the quality of articles, edits, and other editors. These judgments can be harvested for the production and refinement of tools that make wiki-work easier. One reason that this appears as a key feature is that our users have been asking for it. Currently, we ask editors to manually label a random sample of edits to help us train/test models. This work is entirely secondary to actual wiki-work and feels a bit like a waste of time. Our collaborators have asked us to build something like Jade to cross the gap -- to turn their regular wiki-work into training/testing data so that they don't need to duplicate their work.

daniel added a comment. Edited Jan 4 2019, 11:30 AM

Quick meta-point:

@Halfak wrote

This conversation is getting very Producty but if this is what TechCom needs to in order to consider

While TechCom doesn't make decisions on products and features (unless these products and features are APIs or other technical interfaces used by engineers), TechCom needs to understand the product requirements and the expected usage patterns. This is particularly crucial for assessing performance and scalability. An RFC discussion may involve detailed discussion about the product specification, so we can get an idea of which technical solutions are feasible.

In the end, engineering decisions are a cost/benefit tradeoff. I think the best approach to finding the optimal tradeoff is to iteratively refine the requirements from the product side, and the constraints from the engineering side.

Change 482319 had a related patch set uploaded (by Halfak; owner: Halfak):
[research/ores/wheels@master] Adds langdetect-1.0.7

https://gerrit.wikimedia.org/r/482319

Halfak added a comment.Jan 4 2019, 4:36 PM

^ Nevermind that. Was a copy-paste mistake

daniel moved this task from Inbox to Backlog on the TechCom-RFC board.Jan 8 2019, 2:42 PM

Moving this to "backlog" for now. This should have happened right after last week's TechCom meeting, but I forgot to do it.

TechCom will have a look at this during the next meeting again, and decide whether the questions regarding the product requirements have been fully answered, and how to proceed. @Halfak did the meeting happen? What was the outcome?

Halfak added a comment.Jan 8 2019, 4:01 PM

The meeting happened. @Krinkle confirmed that his questions had been answered. I imagined that he'd report to y'all.

Change 475932 merged by jenkins-bot:
[mediawiki/extensions/JADE@master] Summarize preferred judgment values in link table

https://gerrit.wikimedia.org/r/475932

Change 476447 merged by jenkins-bot:
[mediawiki/extensions/JADE@master] Add indexes to filter by judgment value

https://gerrit.wikimedia.org/r/476447

Krinkle renamed this task from Introduce a new namespace for collaborative judgments about wiki entities to Introduce a new namespace for collaborative judgements about wiki entities.Jan 17 2019, 5:50 AM
Krinkle added a comment.EditedJan 17 2019, 5:52 AM

I'm writing to summarise the meeting with the Scoring team about Jade on January 4th.

Note that the meeting wasn't required for the RFC, and TechCom's never had meetings like this before. We met because @awight wanted to hear from me in my volunteer role as CVN lead, about future iterations of Jade. Thanks very much for having taken the time to walk me through everything, and for hearing my perspective as a Wikipedia user. I believe the Scoring team has interviewed, or will also interview, other contributors and developers active in the area of content review.

From the TechCom perspective, I felt there were misunderstandings about Jade. As such, this meeting was an opportunity to mentally validate or correct my understanding of Jade. This could've been done on Phabricator, of course. The meeting presented an opportunity to accelerate that. In line with our actual process, I'll now synchronise to Phabricator.


The technical requirements of Jade are listed at mw:JADE/Implementations#System requirements. I'll highlight a few:

  • The DB must store (somehow) each version of each judgement. A judgement being: the label(s), the author, a timestamp, some free-form text, and whether this judgement is the preferred one for that entity. (There is a many-to-one relationship between judgements and diffs).
  • The DB must store in an efficient and queryable manner, a subset of the (one) preferred judgement's data for each revision and diff. The subset specifically does not contain the free-form text blob. The current prototype stores only the label in this fast index. I suppose the fast index could store the author and timestamp without concern - if you were to need that.
  • The UI at a dedicated url will show all judgements for a given diff or revision, with a way to add/change/remove this information. For example, users can read the free-form text here (akin to an edit summary) with which users can explain why they judged a certain way. And here a user could change which judgement is marked "primary".
  • The act of adding, changing, or removing judgements should integrate with existing tooling that communities have for monitoring activity on the wiki, so abuse of Jade is noticed (e.g. through RC/EventStreams)
  • The free-form text saved as part of a Judgement must be addressable by content filters, and content filters should be capable of preventing a Judgement from being stored. (e.g. through AbuseFilter)
  • Users should be able to easily revert the adding/changing/removing of a Judgement.
  • Users should be able to hide the freeform text of a past judgement from the public. e.g. by reverting its addition, and then hiding the addition through RevDel or similar.
  • An API that, given a revision or diff ID, exposes the full data of judgement(s) - not just data from a fast index.
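As a rough illustration of the first two requirements, here is a minimal sketch of a judgement record and of the subset that would go into the fast index. All field and function names are hypothetical and are not taken from the Jade schema. The prototype is described as indexing only the label; author and timestamp are included here only because the list above says they could be.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Judgement:
    # Hypothetical field names mirroring the requirement list, not the Jade schema.
    labels: dict      # e.g. {"damaging": False, "goodfaith": True}
    author_id: int    # numeric user ID rather than a user name
    timestamp: str    # ISO 8601 string
    notes: str        # free-form text, deliberately kept out of the index
    preferred: bool = False


def fast_index_row(entity_type: str, entity_id: int,
                   judgements: List[Judgement]) -> Optional[dict]:
    """Return the queryable subset of the one preferred judgement, without notes."""
    preferred = [j for j in judgements if j.preferred]
    if len(preferred) != 1:
        return None  # no single preferred judgement to index
    j = preferred[0]
    return {
        "entity_type": entity_type,
        "entity_id": entity_id,
        "labels": j.labels,
        "author_id": j.author_id,
        "timestamp": j.timestamp,
    }
```

The full record, including the notes, would still have to be served by the API mentioned in the last requirement; the index row is only the part that needs to be cheap to filter on.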

The technical concerns I raised were:

  1. The feature proposes to incorporate page IDs into user-facing and public APIs.

This is considered problematic because MediaWiki does not guarantee page IDs to remain stable. If explicitly required, I believe it could be made to work, but not without cross-departmental coordination and changing other aspects of MediaWiki; which might delay the project.

I understand now that this wasn't a requirement but rather a presumed easier way to associate pages (easier than associating by title, as titles can change and would have to be kept in sync). Doing the syncing, however, is the status quo of page association in MediaWiki, and should be used here too, to avoid technical debt, disruption, or incompatibility with other features. (Links in wikitext, Page deletion, history merging, XML export/re-imports, etc.)

@daniel recommended using an MCR slot for the "page" type judgements. The team was receptive to this, but has instead decided to leave out this particular feature of Jade. To be reconsidered later. The scope of Jade is reduced to judgements about revisions and diffs.

  2. The Jade page model might be incompatible with expectations from other features in MediaWiki.

Here, I was thinking about the Liskov substitution principle, and how if some of the page features that come for free are undesirable for Jade, it might indicate that it isn't a good fit. Specifically, page creation would be disabled (mostly, or entirely) given title requirements, and not create-able through the generic UIs and APIs. Page rename would need to be disabled. And "Change content model" would need to be disabled. Etc.

Thinking about this a bit more, I believe the specific cases I thought of are not problematic, and can't think of other ones. For example through Page protection, TitleBlacklist, and user-rights, it is already possible to deny these actions. Technically, these abilities could appear unconditionally denied.

  3. The feature proposes to store arbitrary text (specifically, wikitext) inside JSON blobs.

Before the meeting, I was not aware of this feature. I thought that the "Notes" field in the UI mock for diffs (image) would be stored as an edit summary, on the revision to the Jade JSON page. However, the current prototype saves these as a field in the JSON schema.

The user that created a judgement is also referenced in the JSON content, by user_id and centralauth_gu_id. I haven't given much thought as to whether storing references to users inside page content could be avoided, or how to best do it. It's not something we've done before in a machine-readable way. User names would not be viable, given, unlike page titles, we mustn't change existing revision text so user rename would break the data integrity. IDs are a better choice. I'll leave this to others to review further.

The storing of wikitext inside JSON blobs, however, I have thought a bit more about. I don't currently see architectural or cross-cutting problems with it. But, I believe within Jade, it would likely be a source of bugs, technical debt, and high implementation/maintenance cost. For example:

  • Pre-save transformation (e.g. tildes, subst etc.) would not work. If left this way, would effectively be a new unspecified variant of wikitext. Workaround: Maybe to manually invoke the PST phase of the Parser over each string separately, before saving the edit.
  • Search index would see unexpanded wikitext, in JSON-escaped form; instead of expanded wikitext in natural form. Workaround? unsure.
  • AbuseFilter would see unexpanded wikitext, in JSON-escaped form. Rules would not work as expected. Workaround: AbuseFilter has hooks to allow a content model to change how its blob is seen by filter rules.
  • Diff view: Would see a JSON-escaped form of wikitext.
  • Images, templates, and out-going links from the wikitext would not be recorded in the database due to the real Parser only being invoked when viewing the page. E.g. images/templates appear unused, leading to breakage when user and bots can't find references to them. E.g. "issue" categories from Linter extension or Cite extension would not be populated. E.g. page links would be unknown and thus not be seen when fixing disambiguation links. Workaround: Maybe some kind of pseudo-page could be temporarily composed during the save action with just the wikitext blobs concatenated (unsafe?), to populate link tables, but discarded otherwise. Or to invoke it separately on each and merge them somehow. MCR has done some pre-work in this area with regards to multiple Content slots being mergeable.
  • Assuming link tables are populated, the wikitext blobs would still not be editable through the usual means. E.g. can't remove categories with HotCat, AutoWikiBrowser would misinterpret the wikitext. API bots like CommonsDelinker would either skip the page as non-wikitext or wrongly edit it as if it were wikitext. Some of this applies to Wikidata as well, but the problem is narrower in scope there as its blobs don't contain wikitext. It has keys that store plain text, and keys that store a specific type of entity directly (e.g. file name, cat name or title).
  • Section editing.
  • Parsoid. VisualEditor. WikiEditor plugins from extensions and gadgets. CharInsert. Other stuff I haven't thought of...
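To make the "JSON-escaped form" concern in the list above concrete, here is a small sketch. The `notes` key and the sample text are made up for illustration. It shows that a naive substring rule written against natural wikitext can miss the same text once it is serialized inside a JSON string.

```python
import json


def naive_match(needle: str, haystack: str) -> bool:
    """A stand-in for a filter or search rule that does plain substring matching."""
    return needle in haystack


# Hypothetical note containing quotes and a newline, both of which JSON escapes.
notes = 'Reverted; see [[WP:3RR|the "three-revert" rule]]\nand {{uw-ew}}.'
blob = json.dumps({"notes": notes})
needle = 'the "three-revert" rule'

print(naive_match(needle, notes))                      # natural form
print(naive_match(needle, blob))                       # serialized page blob
print(naive_match(needle, json.loads(blob)["notes"]))  # decoded field
```

The rule only behaves as its author expects when it is run against the decoded field, which is the kind of adaptation the AbuseFilter workaround above refers to.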

I think the UI and workflows can be implemented, and look and function exactly as proposed.

But, perhaps with the notes field internally stored elsewhere. For example, could the notes be subpages or actions of Judgement_talk? (keyed into the JSON schema). This would technically be much like how the current prototype produces a rendering that mixes a fixed layout and rendering of schema data, with various sequences of wikitext parsing. It might end up rendering more cheaply and with less concern for layout interference by transcluding them as ParserOutput HTML (already balanced) and composing it directly into an HTML template, rather than as wikitext.

  3. The feature proposes to store arbitrary text (specifically, wikitext) inside JSON blobs.

Responding to this point because it seems like the only one that might still be a blocker. As a point of information, our approach is very similar to what Wikibase does with inlining text and wikitext inside JSON, but we have our own code paths.

The user that created a judgement is also referenced in the JSON content, by user_id and centralauth_gu_id. I haven't given much thought as to whether storing references to users inside page content could be avoided, or how to best do it. It's not something we've done before in a machine-readable way. User names would not be viable, given, unlike page titles, we mustn't change existing revision text so user rename would break the data integrity. IDs are a better choice. I'll leave this to others to review further.

We store user IDs in endorsements, here's the corresponding schema: https://phabricator.wikimedia.org/diffusion/EJAD/browse/master/jsonschema/judgment/v1.json$108
These will be the numeric local ID and global ID, and not the username for the reason you already gave.

  • Pre-save transformation (e.g. tildes, subst etc.) would not work. If left this way, would effectively be a new unspecified variant of wikitext. Workaround: Maybe to manually invoke the PST phase of the Parser over each string separately, before saving the edit.

I checked and you're right, although we do the parser pass which updates the links table and so on, we don't have signature expansion. We'll plan to fix this bug. There are other interesting questions we'll need to specify, for example what happens if the text includes a __TOC__, <references />, or other normally page-level constructions.
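The workaround of running a pre-save pass over each string separately could look roughly like the sketch below. `pre_save_transform` is only a stand-in for MediaWiki's PST that expands a four-tilde signature and nothing else (it does not handle three or five tildes, or subst), and the `notes` key name is an assumption.

```python
def pre_save_transform(text: str, user_name: str, timestamp: str) -> str:
    # Stand-in for MediaWiki's pre-save transform: four-tilde signatures only.
    return text.replace("~~~~", f"[[User:{user_name}|{user_name}]] {timestamp}")


def transform_notes(doc, user_name: str, timestamp: str, key: str = "notes"):
    """Return a copy of doc with the transform applied to every string stored under key."""
    if isinstance(doc, dict):
        out = {}
        for k, v in doc.items():
            if k == key and isinstance(v, str):
                out[k] = pre_save_transform(v, user_name, timestamp)
            else:
                out[k] = transform_notes(v, user_name, timestamp, key)
        return out
    if isinstance(doc, list):
        return [transform_notes(item, user_name, timestamp, key) for item in doc]
    return doc  # numbers, booleans, and other strings are left untouched
```

Only the designated free-text fields are transformed, so labels and IDs elsewhere in the document cannot be altered by the pass.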

  • Search index would see unexpanded wikitext, in JSON-escaped form; instead of expanded wikitext in natural form. Workaround? unsure.

Interestingly, this works already:
https://en.wikipedia.beta.wmflabs.org/w/index.php?search=%22notice+of+edit+warring%22&title=Special%3ASearch&profile=advanced&fulltext=1&advancedSearch-current=%7B%22namespaces%22%3A%5B810%5D%7D&ns810=1

  • AbuseFilter would see unexpanded wikitext, in JSON-escaped form. Rules would not work as expected. Workaround: AbuseFilter has hooks to allow a content model to change how its blob is seen by filter rules.

I'm not sure about this, so we'll have to dig deeper. What I've done so far is that our Content subclass implements fillParserOutput, in which we render the JSON into wikitext and feed it to the parser. This gained us a few basic integrations such as flattening the advanced search summary, above.
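The general idea of rendering the JSON into wikitext before handing it to the parser can be sketched as follows. This is not the extension's code, which is PHP; the page layout and the `judgments`, `labels`, `notes` and `preferred` keys are assumptions made for the example.

```python
def judgement_page_to_wikitext(page: dict) -> str:
    """Flatten a judgement page into wikitext so the notes appear in natural form."""
    lines = []
    for j in page.get("judgments", []):
        marker = " (preferred)" if j.get("preferred") else ""
        lines.append(f"== Judgment{marker} ==")
        for name, value in sorted(j.get("labels", {}).items()):
            lines.append(f"* {name}: {value}")
        if j.get("notes"):
            lines.append("")
            lines.append(j["notes"])  # unescaped, so links and templates are visible
    return "\n".join(lines)
```

Anything that consumes the parser's view of the page would then see the notes as ordinary wikitext rather than as an escaped JSON string.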

  • Diff view: Would see a JSON-escaped form of wikitext.

Good point, this is gross. In the long term we'll want to implement a custom diff like Wikibase has done.

  • Images, templates, and out-going links from the wikitext would not be recorded in the database due to the real Parser only being invoked when viewing the page.

This isn't correct, we gained integration via fillParserOutput.

  • Assuming link tables are populated, the wikitext blobs would still not be editable through the usual means. E.g. can't remove categories with HotCat, AutoWikiBrowser would misinterpret the wikitext. API bots like CommonsDelinker would either skip the page as non-wikitext or wrongly edit it as if it were wikitext. Some of this applies to Wikidata as well, but the problem is narrower in scope there as its blobs don't contain wikitext. It has keys that store plain text, and keys that store a specific type of entity directly (e.g. file name, cat name or title).

Good point.

  • Section editing.
  • Parsoid. VisualEditor. WikiEditor plugins from extensions and gadgets. CharInsert. Other stuff I haven't thought of...

Lumping these together, we plan to implement a custom editing experience, loosely based on the Wikibase paradigm.

But, perhaps with the notes field internally stored elsewhere. For example, could the notes be subpages or actions of Judgement_talk? (keyed into the JSON schema). This would technically be much like how the current prototype produces a rendering that mixes a fixed layout and rendering of schema data, with various sequences of wikitext parsing. It might end up rendering more cheaply and with less concern for layout interference by transcluding them as ParserOutput HTML (already balanced) and composing it directly into an HTML template, rather than as wikitext.

One possibility I had briefly entertained was that we store all the "machine-readable" data in JSON, in a non-main MCR slot. Then, the main slot would contain all the wikitext, "human-readable" content organized in an ad-hoc format. The Scoring Platform team decided that MCR isn't mature enough to depend on yet, but I'm interested in exploring this general idea in the future. I think it's a great idea to split the two types of content.

This wikitext-in-JSON thing seems really complicated. I read through both comments above and walked away with a much better understanding of mediawiki, and I think that's a bad thing :)

Quick obvious question that has a small chance of being useful: is wikitext really required here? Can't you take advantage of the JSON structure to infer things like signatures (you have user + timestamp) and just make the notes field simple text? I think the focus of these notes is not presentation, it's just making a point. And simple text seems like a more democratic way to allow that to happen.
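A minimal sketch of that suggestion, assuming hypothetical `notes`, `user_id` and `timestamp` fields: the note is treated as plain text and escaped, and the attribution is derived from the structured fields instead of from a wikitext signature.

```python
import html


def render_plain_note(judgement: dict) -> str:
    """Render a plain-text note with an attribution built from structured fields."""
    body = html.escape(judgement["notes"])  # no markup is interpreted
    return f'{body} (user {judgement["user_id"]}, {judgement["timestamp"]})'
```

Note that a `[[Foo]]` in the note would come out as literal text here, not as a link.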

This wikitext-in-JSON thing seems really complicated.

It's certainly tricky and might cause problems. We believe that freeform text should be wikitext to allow for arbitrary links and so on. If embedding in JSON turns out to be impossible, we'll have to work around it with some other technique such as splitting the text and JSON into separate documents, maybe with slots as I mentioned in my last comment, maybe as a subpage, or something else. Simple text is a good suggestion, but I'm annoyed that it can't express links.

Here's a counter-precedent from Wikidata in which linkable entities are mentioned in simple text, https://www.wikidata.org/wiki/Q503 -> section "Wikidata usage instructions". Better if that could be linked, right? Especially if the judgment notes might be expressing a complex idea which relies on specific evidence from diffs or other articles, etc.

Simple text is a good suggestion, but I'm annoyed that it can't express links.

I have a proposal in the pipeline for a simplified subset of wikitext for use in i18n messages. It would support links, bold/italic, paragraphs, and perhaps lists, but no nested lists, no templates, no images, no html tags, etc. Perhaps that would be an option.

Harej added a comment.Jan 24 2019, 5:04 PM

How does it relate to the subset of wikitext used for edit summaries? As I understand you can do simple things in edit summaries like links but nothing fancy like template expansion.

How does it relate to the subset of wikitext used for edit summaries? As I understand you can do simple things in edit summaries like links but nothing fancy like template expansion.

It's not the same subset. Edit summaries *only* support links, and really should not support any styling. Also, a link like [[Foo|Bar]] will be linked, but still render as [[Foo|Bar]] in the summary, not as Bar, as it would in wikitext. So, link support in summaries isn't a true subset, since it produces different output even for the part of the syntax it supports.

Harej updated the task description. (Show Details)Tue, Apr 16, 12:18 AM
Harej removed awight as the assignee of this task.
Harej removed a subscriber: awight.

I have not seen any updates for this RFC in a few months. My understanding is that most of the issues are addressed. Is there anything outstanding that should be addressed?

Could you summarize the tables added and the resource commitment, just to make sure we all understand it well?

colewhite triaged this task as Normal priority.Tue, Apr 16, 6:05 PM
Harej added a comment.Fri, Apr 19, 2:47 AM

The secondary schema that was requested was addressed in T202596. I think at one point the schema was submitted to @Marostegui for review but I am not sure what happened with that.

As far as I remember (it has been a while) all the stuff that was sent for me to review was reviewed and I believe it was even merged.

Harej added a comment.Sun, Apr 21, 6:48 PM

In your opinion, are there any problems you can foresee from a database design perspective?

jcrespo added a comment.EditedTue, Apr 23, 1:51 PM

@Harej My question is more like: is the summary still accurate about the result of the conversations (e.g. rampup of 1%, bots technically not allowed, etc.)? If yes, no problem; if not, I was asking to update it to reflect the latest agreement.