
Implement features that allow for tracking and measuring machine-aided depicts usage/activity
Open, Medium, Public

Description

We have this:
Machine-aided depicts (MAD) is designed to happen on a special page on Commons OR via external clients that utilize the tag suggestion and voting features via API. Currently, there are no solid instrumentation plans for this feature.

We want this:
On top of what we have, implementation of whatever additional setup needed for instrumentation (tags to identify edits from this tool, etc.)

Acceptance Criteria:

  • An edit tag system that accurately identifies confirmed tags from this tool (for example in recent changes), and is easy to measure
  • Instrumentation capability that allows us to measure how often depicts tag additions from this tool are reverted
  • Instrumentation capability that allows cross-referencing the tag confidence scores and the frequency of reversion
  • Instrumentation capability that allows cross-referencing the tag confidence scores and the frequency of user confirmation/rejection
  • Instrumentation showing how often/which files are skipped

Details

Related Gerrit Patches:
mediawiki/extensions/MachineVision : master · Add event logging for Special:SuggestedTags user interactions
mediawiki/extensions/MachineVision : master · Tag reverted computer-aided tagging revisions
mediawiki/extensions/MachineVision : master · Add depicts statement when a MAD suggestion is accepted

Event Timeline

Restricted Application added a subscriber: Aklapper. · Aug 22 2019, 5:25 PM

I can add a machine-aided depicts tag (or similar) to every revision adding a depicts statement, as a start.

I can add a machine-aided depicts tag (or similar) to every revision adding a depicts statement, as a start.

That will be a great first start, thanks! Paging @mpopov for additional input/ideas :)

Ramsey-WMF renamed this task from [Stub] Implement features that allow for tracking and measuring machine-aided depicts usage/activity to Implement features that allow for tracking and measuring machine-aided depicts usage/activity. · Aug 31 2019, 12:09 AM
Ramsey-WMF updated the task description.
Mholloway triaged this task as Medium priority. · Aug 31 2019, 5:28 PM

This doesn't need to hold up technology reviews, but we will want it in place before launch.

Change 533978 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/extensions/MachineVision@master] Add depicts statement when a MAD suggestion is accepted

https://gerrit.wikimedia.org/r/533978

Change 533978 merged by jenkins-bot:
[mediawiki/extensions/MachineVision@master] Add depicts statement when a MAD suggestion is accepted

https://gerrit.wikimedia.org/r/533978

This comment was removed by mpopov.
mpopov added a comment. · Edited · Sep 10 2019, 9:13 PM

I can add a machine-aided depicts tag (or similar) to every revision adding a depicts statement, as a start.

That will be a great first start, thanks! Paging @mpopov for additional input/ideas :)

Thank you, Michael! The great news is that an edit tag gets us the second criterion too since it's easy to query edits and their revert status from the mediawiki history monthly snapshots in our data lake. (Calculating revert rate on a daily basis manually is possible but much more difficult.)

For "cross-referencing the tag confidence scores" I'm curious if those are currently stored anywhere. Or is the task to figure out that storage?

For client-side analytics instrumentation, Jason and I are working on a set of cross-platform libraries for standardized and unified product analytics that would simplify this, BUT we won't have production-quality stuff until end of Q2 or Q3. Until then, MachineAidedDepictsUsage is a potential schema we can use and instrument for. It's a relatively simple impression/click-style design that would track:

  • name of service used
  • the confidence score
  • how long the suggestion took

for action = "receive" events. What the user does with the received suggestions is tracked in follow-up "confirm"/"reject" events which don't need to include that information.
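As a rough illustration of that impression/click-style design, the sketch below models the three actions in Python. The class, the helper and the field names (service, confidence, wait_time_ms) are assumptions loosely based on the bullets above and on the tag_id and page_id fields discussed in this thread. It is not the actual MachineAidedDepictsUsage schema definition.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DepictsEvent:
    """One hypothetical MachineAidedDepictsUsage-style event."""

    action: str                          # "receive", "confirm" or "reject"
    page_id: int                         # file page the suggestion belongs to
    tag_id: str                          # identifier of the suggested tag
    service: Optional[str] = None        # only sent with "receive"
    confidence: Optional[float] = None   # only sent with "receive"
    wait_time_ms: Optional[int] = None   # only sent with "receive"


def to_payload(event: DepictsEvent) -> dict:
    """Drop unset fields so follow-up events carry only what they need."""
    return {key: value for key, value in asdict(event).items() if value is not None}


# Made-up values, for illustration only.
receive = DepictsEvent("receive", page_id=42, tag_id="Q1390",
                       service="google", confidence=0.91, wait_time_ms=350)
confirm = DepictsEvent("confirm", page_id=42, tag_id="Q1390")
```

With this shape, a "confirm" or "reject" payload holds only the action and the two identifiers, and can be joined back to its "receive" event on page_id and tag_id.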

I'm starting vacation tomorrow so in the meantime I'd love to get a review of that design from someone on Product Analytics team and also for @Mholloway to let me know whether the tag_id and page_id make sense/are doable. I'm not sure how much logging is going to happen on the backend that's going to act as the mediator between MediaWiki and external services like Google Vision API, so tag_id might have to be replaced with a randomly generated identifier. As for page_id, that's more of a performance concern because it's easier to send an integer than a filename string, but let me know if that would be problematic.

@Ramsey-WMF: Once the schema design has been given a thumbs up, who would be instrumenting it? I don't know that I'm able to work on this in an engineering capacity.

@mpopov Thanks for drafting a schema! Some thoughts:

  • page_id should be fine.
  • tag_id — it seems like you have in mind here either a row ID or some other arbitrary unique identifier, but since page_id is required, maybe we could just use the Wikidata ID string here? (Wikidata ID + image SHA1 digest uniquely identify a label suggestion in the underlying table.)
  • wait_time — is this the time elapsed between submitting the labeling request(s) and receiving a response, or the time between the user opening Special:SuggestedTags and the tags being loaded from the DB and presented? The former may or may not be interesting if we ultimately use Google as our MV provider, since we'll be planning to use batched, asynchronous labeling requests.
  • We'll also be storing voting results in the DB, as well as per-provider confidence scores (see here). Perhaps it's redundant to log these in EL? Actually, now that I look at the needs in the description more closely, reversion is the only piece that isn't accounted for in the DB schema.

As for your last question, I'm guessing the implementation will fall either to Product Infra or to the Structured Data PHP devs.

Taking this off kanban until the final schema is confirmed.

mpopov added a comment. · Oct 2 2019, 2:58 PM

@Mholloway can I please take a look at some db tables that get created with Extension:MachineVision and some existing data in them?

Hey @mpopov,

There are (nearly-up-to-date) table schemas on wiki at https://www.mediawiki.org/wiki/Extension:MachineVision/Schema.

Here are some examples of them with data populated, from my local MW-Vagrant environment:

  • machine_vision_label
+--------+---------------------------------+-----------------+------------+-----------------+--------------------+-----------------+-------------------+
| mvl_id | mvl_image_sha1                  | mvl_wikidata_id | mvl_review | mvl_uploader_id | mvl_suggested_time | mvl_reviewer_id | mvl_reviewed_time |
+--------+---------------------------------+-----------------+------------+-----------------+--------------------+-----------------+-------------------+
|    118 | 9w2x3j4ul5tz8hsthz1bno1bdz5haqu | Q1390           |         -1 |               1 | 15690146515989     |               1 | 15690146599604    |
|    127 | 9w2x3j4ul5tz8hsthz1bno1bdz5haqu | Q43806          |         -1 |               1 | 15690146515989     |               1 | 15690146599691    |
|    136 | 9w2x3j4ul5tz8hsthz1bno1bdz5haqu | Q156449         |         -1 |               1 | 15690146515989     |               1 | 15690146599614    |
|    145 | 9w2x3j4ul5tz8hsthz1bno1bdz5haqu | Q938020         |          1 |               1 | 15690146515989     |               1 | 15690146599025    |
|    154 | 9w2x3j4ul5tz8hsthz1bno1bdz5haqu | Q1745802        |         -1 |               1 | 15690146515989     |               1 | 15690146599633    |
|    172 | 9w2x3j4ul5tz8hsthz1bno1bdz5haqu | Q2707760        |         -1 |               1 | 15690146515989     |               1 | 15690146599644    |
|    181 | 9w2x3j4ul5tz8hsthz1bno1bdz5haqu | Q11946202       |          1 |               1 | 15690146515989     |               1 | 15690146598602    |
|    190 | 9w2x3j4ul5tz8hsthz1bno1bdz5haqu | Q1141466        |          1 |               1 | 15690146515989     |               1 | 15690146597438    |
+--------+---------------------------------+-----------------+------------+-----------------+--------------------+-----------------+-------------------+
  • machine_vision_suggestion
+------------+-----------------+----------------+----------------+
| mvs_mvl_id | mvs_provider_id | mvs_timestamp  | mvs_confidence |
+------------+-----------------+----------------+----------------+
|        194 |               3 | 20190920153620 |       0.910027 |
|        195 |               3 | 20190920153620 |       0.910027 |
|        196 |               3 | 20190920153620 |       0.823577 |
|        197 |               3 | 20190920153620 |       0.822089 |
|        198 |               3 | 20190920153620 |       0.781328 |
|        199 |               3 | 20190920153620 |       0.777754 |
|        200 |               3 | 20190920153620 |       0.761064 |
|        201 |               3 | 20190920153620 |       0.760972 |
|        202 |               3 | 20190920153620 |       0.734942 |
|        203 |               3 | 20190920153620 |       0.688029 |
|        204 |               3 | 20190920153620 |       0.686922 |
+------------+-----------------+----------------+----------------+
  • machine_vision_provider
+--------+----------+
| mvp_id | mvp_name |
+--------+----------+
|      3 | google   |
+--------+----------+

These schemas (particularly machine_vision_label and machine_vision_suggestion) may change as a result of T227355: DBA review for the MachineVision extension.

Mholloway updated the task description. · Oct 23 2019, 6:58 PM

Hey @mpopov, is this ready for dev? Or should I set up a meeting to talk about the best approach? It seems like the remaining items in the description can be derived from data being stored in MySQL, so I'm not sure EventLogging-style instrumentation is needed.

Mholloway raised the priority of this task from Medium to High. · Oct 23 2019, 7:00 PM

Looks like we will want an additional tag for when a CAT revision is reverted (i.e., undone or rolled back), which I will add.

Change 547323 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/extensions/MachineVision@master] Tag reverted computer-aided tagging revisions

https://gerrit.wikimedia.org/r/547323

The above patch adds a tag (computer-aided-tagging-revert) that is applied to reverted CAT revisions. How often CAT revisions are reverted can be calculated by dividing the count of revisions tagged with computer-aided-tagging-revert by the count of revisions tagged with computer-aided-tagging.
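To make the arithmetic concrete, here is a minimal sketch that takes the revert rate to be the share of computer-aided-tagging revisions that also carry computer-aided-tagging-revert. The input format (one set of tag names per revision) and the function name are assumptions for illustration. In practice the counts would come from the change tag tables or the data lake.

```python
from collections import Counter

CAT_TAG = "computer-aided-tagging"
REVERT_TAG = "computer-aided-tagging-revert"


def revert_rate(revision_tags):
    """Return reverted CAT revisions / all CAT revisions, or None if there are none."""
    counts = Counter()
    for tags in revision_tags:
        if CAT_TAG in tags:
            counts[CAT_TAG] += 1
            # Reverted CAT revisions carry the revert tag in addition to the CAT tag.
            if REVERT_TAG in tags:
                counts[REVERT_TAG] += 1
    if counts[CAT_TAG] == 0:
        return None
    return counts[REVERT_TAG] / counts[CAT_TAG]


# Made-up sample: four CAT revisions, one of them reverted, plus one unrelated edit.
sample = [
    {CAT_TAG},
    {CAT_TAG, REVERT_TAG},
    {CAT_TAG},
    {CAT_TAG},
    {"mobile edit"},
]
rate = revert_rate(sample)
```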

As for the remainder, to make my thinking explicit:

Instrumentation capability that allows cross-referencing the tag confidence scores and the frequency of reversion

It's probably best if I add a column to machine_vision_label to track whether the label was reverted, in addition to the above added tag; will follow up with another patch.

Instrumentation capability that allows cross-referencing the tag confidence scores and the frequency of user confirmation/rejection

Can be done with the existing DB tables: SELECT mvs_confidence, mvl_review FROM machine_vision_suggestion LEFT JOIN machine_vision_label ON mvs_mvl_id = mvl_id;
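A hedged sketch of how that join could be summarized, using Python's sqlite3 with in-memory stand-ins for the two tables. Only the columns the query needs are created, the rows are made up, and reading mvl_review as 1 = accepted and -1 = rejected is an assumption based on the sample data earlier in this thread.

```python
import sqlite3


def build_demo_db():
    """Create in-memory tables holding only the columns the query needs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE machine_vision_label (
            mvl_id INTEGER PRIMARY KEY,
            mvl_review INTEGER
        );
        CREATE TABLE machine_vision_suggestion (
            mvs_mvl_id INTEGER,
            mvs_confidence REAL
        );
        -- Made-up rows, not real Commons data.
        INSERT INTO machine_vision_label VALUES (1, 1), (2, -1), (3, 1), (4, -1);
        INSERT INTO machine_vision_suggestion VALUES
            (1, 0.9), (2, 0.7), (3, 0.8), (4, 0.5);
        """
    )
    return conn


def confidence_by_review(conn):
    """Return {mvl_review: (suggestion count, mean confidence)}."""
    rows = conn.execute(
        """
        SELECT mvl_review, COUNT(*), AVG(mvs_confidence)
        FROM machine_vision_suggestion
        LEFT JOIN machine_vision_label ON mvs_mvl_id = mvl_id
        GROUP BY mvl_review
        """
    ).fetchall()
    return {review: (count, mean) for review, count, mean in rows}


summary = confidence_by_review(build_demo_db())
```

Grouping by mvl_review gives a quick view of whether accepted suggestions tend to have higher confidence than rejected ones. Bucketing mvs_confidence instead would give an acceptance rate per confidence range.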

Instrumentation showing how often/which files are skipped

This is a good candidate for client-side event logging. @AnneT Do you have bandwidth to take on adding some event logging to the front end?

Mholloway lowered the priority of this task from High to Medium. · Oct 30 2019, 11:10 PM

Upon further reflection, lowering this to normal (but leaving in Kanban) since it won't become critical until the official feature launch.

Change 547323 merged by jenkins-bot:
[mediawiki/extensions/MachineVision@master] Tag reverted computer-aided tagging revisions

https://gerrit.wikimedia.org/r/547323

Instrumentation capability that allows cross-referencing the tag confidence scores and the frequency of reversion

It's probably best if I add a column to machine_vision_label to track whether the label was reverted, in addition to the above added tag; will follow up with another patch.

This is done. This can be calculated by dividing the number of revisions tagged computer-aided-tagging-revert by the number tagged computer-aided-tagging.

Mholloway updated the task description. · Nov 4 2019, 11:32 PM

Hi @Ramsey-WMF & @mpopov, I drafted an EventLogging schema for Special:SuggestedTags interactions: https://meta.wikimedia.org/wiki/Schema:SuggestedTagsAction. Please let me know your thoughts when you get a chance.

Mholloway added a comment. · Edited · Nov 15 2019, 11:14 PM

Instrumentation capability that allows cross-referencing the tag confidence scores and the frequency of reversion

It's probably best if I add a column to machine_vision_label to track whether the label was reverted, in addition to the above added tag; will follow up with another patch.

Upon further reflection, a new column isn't required for this. Reverts can be identified by tag and the confidence score for the associated label can be looked up in machine_vision_suggestion.

Mholloway updated the task description. · Nov 15 2019, 11:14 PM

Hi! Sorry, this disappeared completely off my radar so I'm just now catching up on the updates to this.

@Mholloway: Your SuggestedTagsAction schema looks great! Let's go with that one :) Also, to clarify, confirmed_count is 0 if the user published that none of the tags apply and NULL otherwise (e.g. skipped), right?

Let me know if that's what you or @AnneT or someone else end up instrumenting it for client-side tracking so I can whitelist/set the appropriate data retention config. It will also need purge info documented on its Talk page (see https://meta.wikimedia.org/wiki/Schema_talk:MobileWikiAppSuggestedEdits for example)

@Mholloway: Your SuggestedTagsAction schema looks great! Let's go with that one :) Also, to clarify, confirmed_count is 0 if the user published that none of the tags apply and NULL otherwise (e.g. skipped), right?

Thanks! Yes, that's correct.

Let me know if that's what you or @AnneT or someone else end up instrumenting it for client-side tracking so I can whitelist/set the appropriate data retention config. It will also need purge info documented on its Talk page (see https://meta.wikimedia.org/wiki/Schema_talk:MobileWikiAppSuggestedEdits for example)

I'm planning on working on this today. At what point does the retention config stuff need to happen?

Do we need a separate confirm action to reflect the final confirmation step after the user hits "publish"? @Ramsey-WMF @PDrouin-WMF @AnneT @mpopov What do you think?

@Mholloway unless I am misunderstanding you, we have this flow:

user confirm tags -> user hits publish button -> user sees confirmation screen -> user hits OK

and then the feed refreshes. Do you mean something else? I think this is sufficient.

@PDrouin-WMF The user-facing flow LGTM. For instrumentation, I'm wondering if we want to separately capture whether the user hits 'confirm' after initially hitting publish. I think that we do.

@Mholloway: Yes, I think there should be a confirm action since that's another step in the funnel.

The retention config stuff can happen anytime in the next several weeks. It's for retaining data past the most recent 90 days, so we have some time :)
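To show why a separate confirm action is useful for funnel analysis, the sketch below computes how many publish events are followed through to confirmation. The action names "publish", "confirm" and "skip" and the event shape are assumptions for illustration, not the values defined in the SuggestedTagsAction schema.

```python
from collections import Counter


def completion_rate(events):
    """Return confirm events / publish events, or None if nothing was published."""
    counts = Counter(event["action"] for event in events)
    if counts["publish"] == 0:
        return None
    return counts["confirm"] / counts["publish"]


# Made-up events: four publishes, three of which reach the final confirmation.
events = (
    [{"action": "publish"}] * 4
    + [{"action": "confirm"}] * 3
    + [{"action": "skip"}]
)
rate = completion_rate(events)
```

Without the confirm action, users who hit publish and then abandon the flow would be indistinguishable from users who completed it.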

Change 554377 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/extensions/MachineVision@master] Add event logging for Special:SuggestedTags user interactions

https://gerrit.wikimedia.org/r/554377

@Ramsey-WMF @mpopov One minor tweak I'd like to propose: rather than is_user_upload, would it be acceptable to simply log the tab that the user is currently on (popular vs. personal uploads)? This is what the code is currently actually doing. I was about to update the initial API query to get the original uploader for each file (i.e., the uploader of the first version, since a file can have multiple versions), but I'm not sure it's worth the added complexity, since in practice users are likely to encounter their own uploads only on the personal uploads tab.

I think this is fine for our Special Page, but I wonder if it's going to be problematic for 2nd party use cases (like the probable eventual Android app tool that will make use of the API for this). If we're going to tag/log such edits from other interfaces differently anyway, it's a moot point and you can ignore me 😺

@Ramsey-WMF @mpopov One minor tweak I'd like to propose: rather than is_user_upload, would it be acceptable to simply log the tab that the user is currently on (popular vs. personal uploads)? This is what the code is currently actually doing. I was about to update the initial API query to get the original uploader for each file (i.e., the uploader of the first version, since a file can have multiple versions), but I'm not sure it's worth the added complexity, since in practice users are likely to encounter their own uploads only on the personal uploads tab.

Ah, that's a good point. I guess it depends on whether we want to try to have the apps also log to this schema, or they'll get their own. This schema is based directly on actions available in the web UI, but I wouldn't expect the apps' UI to be significantly different. @mpopov What do you think? Do we typically try to have all clients logging to the same schema, or is that not something to worry about?

mpopov added a comment. · Dec 6 2019, 9:49 PM

Ah, that's a good point. I guess it depends on whether we want to try to have the apps also log to this schema, or they'll get their own. This schema is based directly on actions available in the web UI, but I wouldn't expect the apps' UI to be significantly different. @mpopov What do you think? Do we typically try to have all clients logging to the same schema, or is that not something to worry about?

Don't worry about it :) Other interfaces (e.g. Android app, particularly if this is included in the Suggested Edits feature) will use their own schema, especially since EventLogging on Android currently decorates events with client-side timestamps and the app install ID. Next year we'll be able to use Modern Event Platform and the multi-platform EPC libraries that Jason and I are working on to have unified cross-platform data collection, but for now this needs to use classic EventLogging and a schema specific to the web UI.

@Mholloway: I do like the idea of logging the tab name. Do you anticipate there being more tabs added in the future? Asking because if you anticipate more tabs, then that field shouldn't be an enum (updating the allowed values and the revision number will become a pain in the butt) so in that case leaving it as a flexible string would be better.

Thanks, @mpopov!

@Mholloway: I do like the idea of logging the tab name. Do you anticipate there being more tabs added in the future? Asking because if you anticipate more tabs, then that field shouldn't be an enum (updating the allowed values and the revision number will become a pain in the butt) so in that case leaving it as a flexible string would be better.

I don't think there are any plans yet to add more tabs, but I think you're right that it's safest to leave as a flexible string for now.

Change 554377 merged by jenkins-bot:
[mediawiki/extensions/MachineVision@master] Add event logging for Special:SuggestedTags user interactions

https://gerrit.wikimedia.org/r/554377

Mholloway updated the task description. · Dec 10 2019, 7:50 PM

Confirmed that EventLogging is working well on Beta.

I'll take another look at this when it's rolled out to production, then leave it to @Ramsey-WMF to mark the task resolved.