
Design a system for recording votes on depicts suggestions
Closed, Resolved · Public

Description

We need to design a system for recording user votes on MV-generated depicts suggestions.

Overview / Requirements
  • The user will have the opportunity to approve, reject, or abstain
  • Approved labels will be promoted to SDC Depicts
  • For MVP, a single approving vote will suffice to promote to SDC Depicts
  • Label promotions to SDC Depicts (via approving votes) will be recorded as edits attributed to the approving user
  • All votes should be recorded somewhere, to be used later for a/b testing and model refinement
  • An image is removed from the label approvals queue after a submission of votes that approves at least one label
    • Assumption: At least one good label will be presented in the majority of cases
Questions
  • Should the votes themselves (including negative votes and abstentions) be tracked as wiki revisions, perhaps in an MCR slot? This would have the benefit of allowing for easy third-party and community review.
    • The most likely alternative is to store votes in a dedicated MySQL table.
  • Should we provide for a multiple-vote requirement for approval/rejection from the start? How can we best design a system that will accommodate a possible future change in this requirement?

Event Timeline


It's too bad that Jade isn't further along, because all of this is pretty similar in spirit.

I'm in favor of tracking negative votes as wiki revisions, especially since a negative vote should be able to be reverted. I am curious about counting abstaining votes though. How or why would an abstention be counted? Especially if we were to consider the default state of a tag to be neutral?

I'm imagining something like a JSON structure recording the tags presented and the outcome for each, e.g.:

"votes": [
  {
    "user": "User",
    "labels": [
      {
        "label": "dog",
        "vote": "approve"
      },
      {
        "label": "tree",
        "vote": "reject"
      },
      {
        "label": "nimbus cloud",
        "vote": "abstain"
      },
      ...
    ]
  },
  ...
]

I don't have any use cases off the top of my head for the abstentions, but it seems like info we shouldn't just throw away.

I suppose I need to review exactly how mediainfo revisions are recorded and presented, since that's a pretty close analogue to what I'm proposing here.

Edit: The labels would probably be stored as Q-numbers rather than English text.
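For illustration, here is a minimal Python sketch of that structure with the labels keyed by Q-number. The class names, the `make_record` helper, and the Q-numbers themselves are placeholders I made up for the example, not anything that exists in the codebase.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict, List

# Assumption: the three outcomes named in the task description.
VALID_VOTES = {"approve", "reject", "abstain"}


@dataclass
class LabelVote:
    label: str  # Wikidata item ID (placeholder values below)
    vote: str   # one of VALID_VOTES


@dataclass
class VoteRecord:
    user: str
    labels: List[LabelVote]


def make_record(user: str, votes: Dict[str, str]) -> VoteRecord:
    """Build a record from a {q_id: vote} mapping, rejecting unknown votes."""
    labels = []
    for q_id, vote in votes.items():
        if vote not in VALID_VOTES:
            raise ValueError(f"unknown vote {vote!r} for {q_id}")
        labels.append(LabelVote(label=q_id, vote=vote))
    return VoteRecord(user=user, labels=labels)


# Placeholder Q-numbers standing in for "dog", "tree", "nimbus cloud".
record = make_record("User", {"Q1": "approve", "Q2": "reject", "Q3": "abstain"})
serialized = json.dumps({"votes": [asdict(record)]})
```

Validating the vote value at construction time keeps malformed outcomes out of whatever store we end up choosing.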

I'd be wary about introducing a new type of revision that might have to be accounted for in community tools and workflows. Plus, what's there to revert? If we're taking the image out of the stack/queue once it gets input from a user anyway, then there's nothing to be undone, and I don't think we want to expose the potential to get into edit wars using this tool as a weapon.

Would we want to store the entire "Depicts: Dog" statement (P180: Q12345) at the DB level (as opposed to just the label/Q-number), in case this tool was ever re-purposed to work for other properties beyond Depicts?

@Ramsey-WMF if a user downvoted a suggestion on an image, and then someone "reverted" that downvote, I'd assume the image could go back into the queue with all of the original suggestions back in place – assuming no other structured data existed for it yet. We could treat it as if they had made an edit to Special:ConfirmDepicts/File:Foo.jpg. That would be in keeping with https://www.mediawiki.org/wiki/Everything_is_a_wiki_page.

I think that a good argument could be made for not treating "abstain" votes as an edit (or even logging them at all), however.

If any "edit" should be visible to and revertible by users, then 1) showing "abstain" votes (non-votes) as an edit will introduce a lot of noise into histories, etc., and 2) there is no meaningful "revert" action one could take to undo an abstention.

I think something like this would make a decent amount of sense:

  • Voting in favor of a suggestion = editing the structured data for a given file page, an action that can be logged and reverted.
  • Voting against a suggestion = editing some kind of special page corresponding to the image in question, and could similarly be logged/reverted.
  • Not voting for a given suggestion simply does nothing and is not recorded anywhere.

if a user downvoted a suggestion on an image, and then someone "reverted" that downvote, I'd assume the image could go back into the queue with all of the original suggestions back in place – assuming no other structured data existed for it yet.

This sounds like a lot of conditions to account for :)

Also - are you suggesting that a reversion of one confirmation = revert all confirmations? Again, I see this as a very perturbing vector for edit wars, confusion, and further complications.

This sounds like a lot of conditions to account for :)

I think all the "queue" would have to keep track of is all qualifying images (prominent / above minimum size / not NSFW / etc.) that:

  • do not have structured data currently
  • possess one or more "machine-aided depicts statements" (at or above a given confidence threshold, if that is factored in)
  • have not had their machine-aided suggestions rejected by user action

If "reverting" a reject vote would cause an image to be shown in the queue again, then that's fine. Most/all of the above logic will need to be handled anyway so I don't think this is adding complexity.
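The conditions above could be sketched as a single predicate, roughly like this. It is only a sketch: the `Image` and `Suggestion` types, their field names, and the `min_confidence` parameter are hypothetical, and the "qualifying" checks are collapsed into one flag.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Suggestion:
    label: str         # Q-number of the suggested depicts value
    confidence: float  # machine-vision confidence score


@dataclass
class Image:
    title: str
    qualifies: bool            # prominent / above minimum size / not NSFW / etc.
    has_structured_data: bool
    suggestions: List[Suggestion] = field(default_factory=list)
    suggestions_rejected: bool = False  # set by a reject vote, cleared by a revert


def in_queue(image: Image, min_confidence: float = 0.0) -> bool:
    """Return True if the image should currently be offered for voting."""
    usable = [s for s in image.suggestions if s.confidence >= min_confidence]
    return (
        image.qualifies
        and not image.has_structured_data
        and len(usable) > 0
        and not image.suggestions_rejected
    )
```

Under this model, "reverting" a reject vote just clears `suggestions_rejected`, and the image becomes eligible again without any extra queue logic.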

Also - are you suggesting that a reversion of one confirmation = revert all confirmations? Again, I see this as a very perturbing vector for edit wars, confusion, and further complications.

This is a good question, and I guess it depends on how we interpret the rejection votes. Does a single reject vote for any item purge the file from the feed? If so then maybe an image could only be re-introduced if all rejected items are reverted (something that would probably have to be done one-at-a-time on an edit page, and would therefore be uncommon).

If "reverting" a reject vote would cause an image to be shown in the queue again, then that's fine.

How would the original user who rejected the tag dispute the revert?

I'm trying to imagine the full scenario here:

User A visits Special:ConfirmDepicts/File:Foo.jpg and rejects a given suggestion. That gets recorded as an edit in the history of Special:ConfirmDepicts/File:Foo.jpg, associated with their account or IP address. Now Foo.jpg will no longer show up in the queue of images (even if no other structured data has been added to it); someone will have to add statements to its File page manually from now on.

User B could visit Special:ConfirmDepicts/File:Foo.jpg and view the page history. Here they could see User A's rejection of a given suggestion, and revert it if desired. At that point, User A would be notified through whatever mechanisms normally inform users when an edit they have made has been reverted. At this point disputes, etc. could be handled normally.

If the suggestion which User A rejected from Foo.jpg was the only rejection (and if no other statements had been added to it), then theoretically Foo.jpg could return to the queue after User B reverted User A's edit.

I acknowledge that this scenario is probably something of an edge case, but I don't see anything here that violates any fundamental wiki-editing paradigms that I'm familiar with.

That gets recorded as an edit in the history of Special:ConfirmDepicts/File:Foo.jpg

Uhm....hmm. I don't think that's how special pages are supposed to work :)

If you can show me a working example of something similar in action (could even be something you make on Beta), I'd consider it. Though I'd still have reservations about creating millions of new "pages" for a tool that is essentially just an optional (and perhaps temporary) helper, and not meant to be a core part of the Commons editing flow, with all the consequences that entails.

Should the votes themselves (including negative votes and abstentions) be tracked as wiki revisions, perhaps in an MCR slot?

How much will an image be voted on? MCR does not currently have any kind of sophisticated history handling, so if it's too spammy (in the page history, in recent changes, in watchlists...) that might be an issue. In RC/watchlists it can be hidden by default; page history and user contribs do not currently have such functionality.

Also the storage requirements will probably be nontrivial: revision is the biggest MediaWiki table, Commons is a fairly large wiki, most of its pages have a single revision, and this would add another one. (Of course it's not any less data in a separate table; still, I'd make sure DBAs are on board with the idea.)

Also, revision content is stored in immutable blob storage so if someone adds a new vote, both the pre-vote JSON object and the new JSON object with the new vote added would have to be stored forever. That's probably not too bad given that the average image is not expected to receive lots of votes, and blob storage is essentially a key-value store and so much easier to scale than revision metadata which needs to be indexed in various ways, but still worth considering.

OTOH using MCR means you don't have to deal with files and voting data getting detached (page moves, deletion / undeletion, even cross-wiki import in the future could probably be just left to the framework to handle). And the standard anti-abuse toolset (revision deletion, AbuseFilter) would work with no extra effort, although voting on label suggestions is probably not something where anti-abuse would be needed much.

Should we provide for a multiple-vote requirement for approval/rejection from the start? How can we best design a system that will accommodate a possible future change in this requirement?

Presumably the depicts statement edit would still be attributed to the last voter? Other than that, I don't see the system affected much. The "should this vote result in an edit to the statement?" logic would become more complicated but recording votes would not be any different.
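To make that concrete, here is a rough sketch of keeping the promotion check separate from vote recording. The function name, the vote tuple shape, and the `required_approvals` parameter are all hypothetical; it also assumes rejections do not cancel approvals, which is a product decision and not something settled here.

```python
from typing import List, Optional, Tuple


def promoting_voter(
    votes: List[Tuple[str, str]], required_approvals: int = 1
) -> Optional[str]:
    """Decide whether the recorded votes for one label warrant a depicts edit.

    votes: (user, vote) pairs in chronological order, vote being
    "approve" or "reject". Returns the user whose approval reached the
    threshold (the edit would be attributed to them), or None.
    """
    approvers: List[str] = []
    for user, vote in votes:
        # Count each user's approval at most once.
        if vote == "approve" and user not in approvers:
            approvers.append(user)
            if len(approvers) >= required_approvals:
                return user
    return None
```

With `required_approvals=1` this reduces to the MVP rule (a single approving vote promotes the label), so moving to a multiple-vote requirement later would only change the parameter, not the stored data.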

It's too bad that Jade isn't further along, because all of this is pretty similar in spirit.

OTOH it is a good time to surface this as a potential use case to make sure it will be flexible enough to handle it eventually.

I'm in favor of tracking negative votes as wiki revisions, especially since a negative vote should be able to be reverted.

Note that a revert, as usually understood, would remove the fact that the vote happened, instead of just disabling it somehow. That's probably useful in case of some kind of vote vandalism, but I'm not sure if it is wanted in general.

I am curious about counting abstaining votes though. How or why would an abstention be counted?

Depending on how much information is needed (specific labels the user has abstained on vs. just the fact of abstention), it could be a null revision, with information in the edit summary. Or the abstention could be recorded in whatever internal data structure the MCR slot has.
(That doesn't answer the "why", of course.)

User A visits Special:ConfirmDepicts/File:Foo.jpg and rejects a given suggestion. That gets recorded as an edit in the history of Special:ConfirmDepicts/File:Foo.jpg, associated with their account or IP address. Now Foo.jpg will no longer show up in the queue of images (even if no other structured data has been added to it); someone will have to add statements to its File page manually from now on.

Special pages do not have any page history (despite the name, they aren't really pages in the sense in which that word is used in MediaWiki). There could be a custom namespace which contains voting data about other pages (e.g. Reviews:File:Foo.jpg would contain votes about File:Foo.jpg) with all kinds of interface customizations; that's how JADE is planned to work, since it needs to be able to hold reviews about any kind of page. But when the review-handling code works in close concert with the code handling the target page, it seems like extra complexity without much benefit compared to just using an MCR slot on the file page. It would separate file page history from label review history, which is probably beneficial in some cases, but having to deal with extra pages (e.g. page creation log spam) would IMO offset that.

It sounds like we might be best off just keeping votes in a MySQL table in the first instance, especially since (notwithstanding my assertion to the contrary in the description) we will in fact want them to be queryable for purposes of selecting for inclusion in the queue. This should also leave us flexible enough to surface the voting history in the UI however we want down the line. Introducing votes as a new type of revision is indeed probably more complexity than we should be taking on.

@PDrouin-WMF Would having users able to review somewhere that label X was suggested and voted down, so that they could go back in and add the label manually if they disagree, address your concern about being able to "revert" rejecting votes? I tend to agree with @Tgr that the ability to revert a vote might not actually be what we want.

Yes, I'd be curious about how feasible it'd be to have that information stored somewhere, but I'm not sure where or what kind of workflow that would require. Worth considering!

I'm beginning to lean against counting abstentions. They won't have any use in our current discussions, and counting only affirmative upvotes or downvotes would allow us to get away with putting them in a simple boolean field, besides keeping overall storage needs smaller.

Out of curiosity, do we plan to allow anon users to vote on depicts suggestions, or encourage logging in? Do we care either way?

@Mholloway I think that's definitely worth discussing. I was imagining non-logged in users could participate, but be warned that their IP address would be logged in the edit history.

Should the votes themselves (including negative votes and abstentions) be tracked as wiki revisions, perhaps in an MCR slot?

How much will an image be voted on? MCR does not currently have any kind of sophisticated history handling, so if it's too spammy (in the page history, in recent changes, in watchlists...) that might be an issue. In RC/watchlists it can be hidden by default; page history and user contribs do not currently have such functionality.

The plan for v1 is to have labels confirmed or rejected by a single vote. It was also proposed to have labels require more than one vote to reject/confirm unless the voter is the uploader, in which case a single upvote/downvote will suffice. I don't know if this is still on the table.

We plan to allow users to "abstain," and there's some discussion of sending labels without affirmative confirmations or rejections, or those which have never received votes, back through the "queue" at a lower priority than images whose labels have never been presented for voting. Depending on how we store labels (for example, if we decide to store all labels for an image from a given provider in a single blob in an attempt to mitigate potential scaling problems), this might be tough to support.

Also the storage requirements will probably be nontrivial: revision is the biggest MediaWiki table, Commons is a fairly large wiki, most of its pages have a single revision, and this would add another one. (Of course it's not any less data in a separate table; still, I'd make sure DBAs are on board with the idea.)

That concern may be somewhat reduced here, since for MV depicts we're not concerned with every image on Commons. Two stages are planned: in the first, we'll process the ~259k images on Commons that are designated as featured, valued, or quality. The second, larger stage will involve every image used in mainspace on a non-Commons wiki. So, this is still a lot of new revisions (enough to be concerned about); but we're not talking about doubling the size of the revisions table essentially overnight.

My biggest concern is that if we want images to appear in the "queue," or not, based on a query over whether/how many times their label suggestions have been voted on, then keeping the vote record in an MCR slot seems like a nonstarter, at least unless we're simultaneously updating an auxiliary MySQL table or something.

I'm beginning to lean against counting abstentions. They won't have any use in our current discussions, and counting only affirmative upvotes or downvotes would allow us to get away with putting them in a simple boolean field, besides keeping overall storage needs smaller.

Can we calculate abstentions instead? If we showed it and it wasn't voted on = abstain? And could that be part of a data dump?
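A small sketch of what that calculation might look like, assuming we keep the list of labels that were presented alongside the boolean votes. The function name and the Q-numbers are placeholders.

```python
from typing import Dict, List


def derive_abstentions(presented: List[str], votes: Dict[str, bool]) -> List[str]:
    """Labels that were shown but received neither an upvote nor a downvote.

    votes maps label -> True (approve) or False (reject); abstentions are
    never stored, only derived.
    """
    return [label for label in presented if label not in votes]


presented = ["Q1", "Q2", "Q3"]        # placeholder Q-numbers shown to the user
votes = {"Q1": True, "Q2": False}     # boolean field: approve / reject
abstained = derive_abstentions(presented, votes)
```

This only works if the set of presented labels is recorded somewhere; if we store nothing for an image until a vote arrives, the abstentions cannot be reconstructed for a data dump.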

Out of curiosity, do we plan to allow anon users to vote on depicts suggestions, or encourage logging in? Do we care either way?

I think it can work either way, but I'm leaning towards requiring sign-in for a couple of reasons:

  1. It could help with tracking bad actors (or those who are just making honest mistakes)
  2. It's another way to remind people about the on-site notifications we'll use to inform them that their uploads are ready for tag confirmation. I think a lot of casual users forget to log in all the time and miss notifications (I'm guilty of this myself to a degree). Requiring log-in should help a little bit to make sure they see those notifs.

Here's a concrete proposal for v1 in advance of today's meeting. The goal is to keep the implementation as simple as possible for this initial iteration.

  • The "queue" is implemented as an ImageQueryPage and suggests only images which have never been voted on
  • All depicts suggestions for an image are shown at once on the voting page, and each image with suggestions is voted on exactly once, for all labels.
  • The entire voting record is persisted.
  • The vote is a one-off and is final; it cannot be reverted.
  • A vote approving a label results in a revision being created, attributed to the voting user, that adds the label as a depicts statement. This can of course be reverted as with any ordinary revision.
  • The vote is stored in MySQL, and a special page is created at which users can look up the vote result for any given image.

I am concerned about the implications of tracking votes for each label separately, particularly as we scale up to the larger second corpus of all images used on external projects. At that point, our queries will very swiftly involve joins on tables with tens of millions of rows, and I become concerned about scalability and performance. The above design will allow us to store label suggestions, and votes on them, compactly in a single table.
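As a rough illustration of the single-table idea, here is a sketch using Python's built-in sqlite3 as a stand-in for MySQL. The table name, column names, and helper functions are all hypothetical; the real schema would need DBA review.

```python
import json
import sqlite3

# In-memory SQLite stands in for the production MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE image_suggestions (
        page_id INTEGER PRIMARY KEY,  -- one row per image
        labels  TEXT NOT NULL,        -- JSON list of suggested Q-numbers
        voter   TEXT,                 -- NULL until the image is voted on
        votes   TEXT                  -- JSON {label: true/false}, NULL until voted
    )
    """
)


def add_suggestions(page_id, labels):
    """Store all machine-generated suggestions for an image in one row."""
    conn.execute(
        "INSERT INTO image_suggestions (page_id, labels) VALUES (?, ?)",
        (page_id, json.dumps(labels)),
    )


def record_vote(page_id, voter, votes):
    """Persist the one-off vote; returns False if the image was already voted on."""
    cur = conn.execute(
        "UPDATE image_suggestions SET voter = ?, votes = ? "
        "WHERE page_id = ? AND voter IS NULL",
        (voter, json.dumps(votes), page_id),
    )
    return cur.rowcount == 1


def queue():
    """Images that have suggestions and have never been voted on."""
    rows = conn.execute(
        "SELECT page_id FROM image_suggestions WHERE voter IS NULL ORDER BY page_id"
    )
    return [row[0] for row in rows]
```

Because the vote is final, the `voter IS NULL` guard in the update is enough to enforce "voted on exactly once," and the queue query needs no joins.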

I am even more concerned about the notion of allowing users to "revert" votes, which I don't think makes much sense conceptually, and which I frankly don't know how we'd support sanely in the "queue."

As I mentioned earlier, there is an existing project called Jade, or Judgment and Dialogue Engine, which is precisely about enabling on-wiki review of and dialogue about AI judgments. It's designed and built by our in-house AI experts. Unfortunately, it's not yet deployed to production. In the event that this feature lives on longer-term, I believe strongly that we should help push Jade forward and adopt it for the fuller product vision(s) discussed above, rather than creating our own, feature-specific proto-Jade. And I don't think it's realistic to try to hack something similar together in a few weeks.

Mholloway assigned this task to Tgr.