Provide API module in GrowthExperiments to allow querying image suggestion API for titles
Open, Medium, Public

Description

Background
The Android team was responsible for the Image Recommendations MVP. Our roadmap included learning from the Growth team's V1 implementation of image recommendations. Now that V1 is complete and the team is exploring section images, the Android team is interested in bringing the Image Recommendations task into the app. We are determining the best course of action for API use, and whether we should follow the same path as Growth (one task for adding images to articles that have none, and a separate task for adding images to sections) or build a system that adds images to articles generally. Before we decide on an approach, we need to explore the APIs.

Task
Explore the possibility of using GrowthExperiments as a proxy, as explained by @kostajh and affirmed by @BPirkle in T306349#8311919.

Ideal State
With 9.2% of logged-in users from a 60-day sample making cross-platform edits, the solution will ideally honor the following user story:
As an Android device user who completed GrowthExperiments Image Recommendations on Mobile Web, and someone that completes Suggested Edits in the Android app, I would like the progress I've made on the Image Recommendations task to carry over to the Image Recommendations task in the Android app and those contributions to be recognized across platform, so that I can have an accurate representation of how many tasks I've completed for topic areas I care about.

Acceptable Alternative
If we cannot have parity, the following user story will work:
As an Android device user who completed GrowthExperiments Image Recommendations on Mobile Web, and someone that completes Suggested Edits in the Android app, I want the Android app image recommendations suggested edits feature to be clearly distinct from the mobile web version, so that I can know when to use the mobile web version and when to use the Android app version.

Event Timeline

JTannerWMF triaged this task as Medium priority. Oct 13 2022, 3:55 PM
JTannerWMF created this task.
JTannerWMF moved this task from Backlog to Radar on the API Platform board.

So if I understand this correctly, @kostajh 's proposal is for the Growth team to

  1. write and maintain an Image Suggestions http api to proxy GET requests to the existing image suggestions cassandra gateway inside the GrowthExperiments extension
  2. possibly also provide a POST endpoint which would involve writing to Cassandra (or maybe just would write an event to EventGate that would eventually propagate to Cassandra)

If the growth team have the people/time to do this then it might expedite things, but I'm not sure it's a good idea. I feel like ImageSuggestions is growing beyond an "experiment" at this stage, and that this sort of API falls naturally under the purview of the API platform team rather than inside Product

I concur.

The direction we've been going in is to uniformly host APIs under the API Gateway (where requests are proxied to either a microservice or a MediaWiki REST endpoint); utilizing the extension like this feels like a work-around.

And yes, the idea is that for datasets hosted on this platform, the HTTP gateway is used for reads, and pipelines consisting of either scheduled jobs, or events, are used for writes (i.e. we're decoupling the database).

TL;DR If such a service is going to accept POSTs for feedback submission, it would handle them by submitting an event.

I feel like ImageSuggestions is growing beyond an "experiment" at this stage, and that this sort of API falls naturally under the purview of the API platform team rather than inside Product

While I don't disagree, I'm the only developer on API Platform with MediaWiki development experience, and my time is already pretty much spoken for. If this task involves MediaWiki development (as presumably it does if we're collecting user-specific feedback), then we'd need to reshuffle some things for API Platform to take it on. I have no objections to that - I'm here to do whatever is most needed - so I'll let the various folks with "Manager" in their title sort that out. :-)

This is part of why we were asking if authentication/POSTs were required: if we could implement this as a simple GET-only service outside MediaWiki, then it would share a lot of similarities with how we're approaching T263489: AQS 2.0. We'd therefore have more options for who could do the coding, because we already have other devs spun up on that approach. To be clear: it makes sense that collecting feedback is desirable and I'm not suggesting we jump straight to omitting requirements. It just does make it a little more challenging for API Platform to take it on right now.

And FWIW, we're actively working on onboarding/hiring more developers. So I'm hopeful we'll be able to more easily take on things like this in the future. We're just not quite there yet.

I feel like ImageSuggestions is growing beyond an "experiment" at this stage, and that this sort of API falls naturally under the purview of the API platform team rather than inside Product

While I don't disagree, I'm the only developer on API Platform with MediaWiki development experience, and my time is already pretty much spoken for. If this task involves MediaWiki development (as presumably it does if we're collecting user-specific feedback), then we'd need to reshuffle some things for API Platform to take it on. I have no objections to that - I'm here to do whatever is most needed - so I'll let the various folks with "Manager" in their title sort that out. :-)

Out of curiosity, which parts would require MediaWiki development? I had imagined a stand-alone service deployed to k8s, like you had mentioned elsewhere.

I was mostly basing that on this bit from the task description:

I would like the progress I've made on the Image Recommendations task to carry over to the Image Recommendations task in the Android app and those contributions to be recognized across platform

Maybe I'm wrong.

I can't speak to what is technically best; I will leave that for @Dbrant to decide after talking more with Kosta and perhaps Bill. If it is helpful, I have delayed development on our end until January at the earliest so that the API Platform team has time to estimate the LOE if they were the ones to pick this up (I already discussed this with Virginia), so I hope that relieves some pressure. The intent of this task is that, if the estimated effort is pretty large, we already know going into January whether this is a feasible backup plan.

I would like the progress I've made on the Image Recommendations task to carry over to the Image Recommendations task in the Android app and those contributions to be recognized across platform

Can I get some clarification on this? Does the GrowthExperiments extension record and show user progress on image recommendations?

I understood this sentence to mean, "show me the number of image suggestion edits I have made". We add a "Suggested: add images" tag to edits that are made via the suggested edits image suggestion workflow (example).

I guess it would be OK for Android app to use the same edit tag (cc @nettrom_WMF @KStoller-WMF about that) because we could distinguish between web and Android app edits by looking at additional tags added on the edit (e.g. "Visual edit" indicates it was done via VE on desktop/mobile, and presumably Android/iOS apps add their own edit tags).

So if I understand this correctly, @kostajh 's proposal is for the Growth team to

  1. write and maintain an Image Suggestions http api to proxy GET requests to the existing image suggestions cassandra gateway inside the GrowthExperiments extension
  2. possibly also provide a POST endpoint which would involve writing to Cassandra (or maybe just would write an event to EventGate that would eventually propagate to Cassandra)

If the growth team have the people/time to do this then it might expedite things, but I'm not sure it's a good idea. I feel like ImageSuggestions is growing beyond an "experiment" at this stage, and that this sort of API falls naturally under the purview of the API platform team rather than inside Product

I concur.

The direction we've been going in is to uniformly host APIs under the API Gateway (where requests are proxied to either a microservice or a MediaWiki REST endpoint); utilizing the extension like this feels like a work-around.

Yeah, totally agree, and it is not my preference – just offering it as a temporary workaround since there isn't a timeline yet in T306349: Public-facing API for querying image suggestion recommendations and submitting user feedback.

And yes, the idea is that for datasets hosted on this platform, the HTTP gateway is used for reads, and pipelines consisting of either scheduled jobs, or events, are used for writes (i.e. we're decoupling the database).

TL;DR If such a service is going to accept POSTs for feedback submission, it would handle them by submitting an event.

AIUI we would need to proxy the feedback event through MediaWiki unless we adjust the EventGate configuration for the relevant stream to allow for external event creation.

Yeah, totally agree, and it is not my preference – just offering it as a temporary workaround since there isn't a timeline yet in T306349: Public-facing API for querying image suggestion recommendations and submitting user feedback.

Cool ... can I propose that we don't progress with this spike until we've agreed the API interface in T306349 and got an idea of a potential timeline?

👍 from me.

[ ... ]

TL;DR If such a service is going to accept POSTs for feedback submission, it would handle them by submitting an event.

AIUI we would need to proxy the feedback event through MediaWiki unless we adjust the EventGate configuration for the relevant stream to allow for external event creation.

Any event-based pipeline for the datasets hosted on this platform shouldn't be limited to submissions from MediaWiki, that would be really limiting/inflexible. If that is a current limitation, we should change that (and it seems like an easy thing to fix).

I understood this sentence to mean, "show me the number of image suggestion edits I have made". We add a "Suggested: add images" tag to edits that are made via the suggested edits image suggestion workflow (example).

I guess it would be OK for Android app to use the same edit tag (cc @nettrom_WMF @KStoller-WMF about that) because we could distinguish between web and Android app edits by looking at additional tags added on the edit (e.g. "Visual edit" indicates it was done via VE on desktop/mobile, and presumably Android/iOS apps add their own edit tags).

As far as I'm concerned, using the same edit tag makes sense since the tasks are so similar (or identical, I'm not completely aware of the details here). I'd also expect the app edits to add the "mobile app edit" like they usually do. My main concern is that we document this well so that those who work with this data know that we'll need to exclude or exclusively include app edits depending on what metrics we're calculating.

Adding @SNowick_WMF to this as well for awareness, and in case I've missed something about analysis needs for app metrics.

Change 850497 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] [DNM] Add ApiQueryImageSuggestionMetadata

https://gerrit.wikimedia.org/r/850497

I made this while having a look to see what might be involved in proxying access. It's marked "DNM" (do not merge), so just for discussion for now. We'd have to get approval from SRE about whether or not this would be allowed.

Based on the excellent notes in the google doc, as well as the blog post, I'll try to digest and reiterate all the specific APIs and other technical details for use in our implementation of image recommendations.
@kostajh Would you mind answering just a couple more minor questions (see inline below)?

Retrieving a list of recommendations

This can be done using action=query with generator=growthtasks, which provides page titles/ids for which recommendations are available, but not the contents of the recommendations themselves.
Note: paginating through these results should not be relied upon (since it uses a random sort), so we'll need to keep track of pageIds and filter out ones that have been seen already. There is also the question of refreshing the internal cache of suggestions, which is being worked on.
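As a rough illustration of the "keep track of pageIds" idea above, here is a minimal Python sketch. It only builds the query parameters and filters an already-parsed response; no request is sent. The parameter names ggttasktypes and ggtlimit are assumptions (a "g" generator prefix plus a guessed module prefix) and should be checked against the API help, and the helper names are hypothetical. The response shape assumes formatversion=2, where query.pages is a list.

```python
from typing import Any


def build_growthtasks_params(limit: int = 10) -> dict[str, str]:
    """Query parameters for one batch of pages that have image recommendations."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "generator": "growthtasks",
        # ASSUMPTION: these two parameter names are not confirmed by the task;
        # verify them in the API help for generator=growthtasks.
        "ggttasktypes": "image-recommendation",
        "ggtlimit": str(limit),
    }


def unseen_pages(response: dict[str, Any], seen: set[int]) -> list[dict[str, Any]]:
    """Return pages whose pageid has not been seen yet, recording them in `seen`.

    Because the sort is random, the same page can come back in later batches,
    so the client de-duplicates instead of relying on pagination.
    """
    fresh = []
    for page in response.get("query", {}).get("pages", []):
        page_id = page["pageid"]
        if page_id not in seen:
            seen.add(page_id)
            fresh.append(page)
    return fresh
```

The `seen` set would need to be persisted by the client across sessions if repeated suggestions should also be avoided after a restart.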

Retrieving the details of a specific recommendation

needs to be built

Adding an image to an article

This can be done using action=visualeditoredit, since this is simply adding a File: link to the top of the article.
Note that we'll also need to pass plugins data that contains the accept/reject status, plus the reason code. (refer to the blog post).

Question: When "rejecting" a recommendation, what does it mean to use the VisualEditor API in that case, if we're not actually making an edit? (i.e. what are the other api parameters?)

Invalidating a recommendation

This is done using action=growthinvalidateimagerecommendation

Question: Does this need to be called in addition to the VisualEditor action that provides the accept/reject status?

Working with Topics

The list of topics and topic-groups used by Growth is here.
Topics can be basically hard-coded in the client, since they are unlikely to change.
Localized topic names can be retrieved from meta=allmessages, with the messages being of the form growthexperiments-homepage-suggestededits-topic-name-[topic] for topics and growthexperiments-homepage-suggestededits-topic-group-name-[group] for groups.
The list of topics currently selected by the user is stored in the user option growthexperiments-homepage-se-ores-topic-filters, which can be retrieved from meta=userinfo and set using action=options.
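To make the message-key scheme above concrete, here is a small Python sketch that builds the meta=allmessages parameters for a set of topics and groups, and reads the raw topic-filter option out of a meta=userinfo response. The topic and group ids passed in are placeholders, the helper names are hypothetical, and the option value is returned as-is because its encoding is not described in this thread. The response shape assumes formatversion=2 with uiprop=options.

```python
from typing import Any, Optional

TOPIC_PREFIX = "growthexperiments-homepage-suggestededits-topic-name-"
GROUP_PREFIX = "growthexperiments-homepage-suggestededits-topic-group-name-"
TOPIC_FILTER_OPTION = "growthexperiments-homepage-se-ores-topic-filters"


def topic_message_params(topics: list[str], groups: list[str], lang: str = "en") -> dict[str, str]:
    """Parameters for fetching localized topic and topic-group names."""
    keys = [TOPIC_PREFIX + t for t in topics] + [GROUP_PREFIX + g for g in groups]
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "meta": "allmessages",
        "ammessages": "|".join(keys),  # multiple values are pipe-separated
        "amlang": lang,
    }


def selected_topics_raw(userinfo_response: dict[str, Any]) -> Optional[Any]:
    """Return the stored topic-filter option without interpreting its format."""
    options = userinfo_response.get("query", {}).get("userinfo", {}).get("options", {})
    return options.get(TOPIC_FILTER_OPTION)
```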

Misc. configuration

The user options growthexperiments-homepage-enable and growthexperiments-homepage-suggestededits-activated must be set to 1.
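A minimal sketch of how a client might set both options in one action=options call, using the pipe-separated name=value form of the change parameter. The function name is hypothetical, the CSRF token is assumed to have been obtained separately, and the parameters would be sent as a POST by a logged-in session.

```python
def enable_suggested_edits_params(csrf_token: str) -> dict[str, str]:
    """POST parameters that set both required user options to 1."""
    changes = {
        "growthexperiments-homepage-enable": "1",
        "growthexperiments-homepage-suggestededits-activated": "1",
    }
    return {
        "action": "options",
        "format": "json",
        # name=value pairs, separated by "|"
        "change": "|".join(f"{name}={value}" for name, value in changes.items()),
        "token": csrf_token,
    }
```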

Adding an image to an article

This can be done using action=visualeditoredit, since this is simply adding a File: link to the top of the article.
Note that we'll also need to pass plugins data that contains the accept/reject status, plus the reason code. (refer to the blog post).

Question: When "rejecting" a recommendation, what does it mean to use the VisualEditor API in that case, if we're not actually making an edit? (i.e. what are the other api parameters?)

Yeah, in retrospect, this could probably have been a separate endpoint. Maybe we could do that for this project. Here is an example of the payload for rejecting an image:

-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="action"

visualeditoredit
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="format"

json
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="formatversion"

2
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="paction"

save
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="useskin"

vector-2022
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="errorformat"

html
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="errorlang"

en
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="errorsuselocal"

true
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="editingStatsId"

4uemdleocs0v44vsvqdnd7cvrtj8goca
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="html"

rawdeflate,jVTRspMwEP2VyIz6BKFcWlosdaov+uB1xuv44FuaLCXTQGISWvv3btpScYarPsCQzdmzZw+brF8Izf3ZAGl8qzbr8CbGQi1/VpHgJYa9KSk1vVWJtnsqOPVgW0dJe7rvtiAkO8mDvECsqGlE2E73vopuCKU5U412vlymy5QGLH0ywCVT5RfMtsA9tXCUTuqO5kWeRigGmLiLaU/2Xu+/2JCgBc8Ib5h1gFJ6X8fLIWqsNmD9ORCXhu3ho4gI152HDqGzRZY+i3xkLTjDOIwSAlrJ7kAsqItxFoxCiIsw4nRvOVxbGPX4sJgsMSCePmxnowocsgXMV8Uuy/L5cpfzGiArFqzI5gB5KlbFis15PtEgqmm1kLWEcYtZmmVxWsRp/nW2KB9W5XyepGn6fVJTGItvYIOsMUWySKZtCvjy+GzCH05Jd2P+XEekwb/916H5xF6zVvcKWXbMwb8TEOilV7AZMtf0uh6pcP6swDUAflBAT0jGRGIa81axbl9B94q15g0a2SO0+j3x7iC7Tnb75NZlYnDctBQvi/cTIIkYW+Ng4LaTHpJr6Qu37tS5Gq1DUnXEUdZ2MDn0GcOPXh5xIK714iCvx7Ec+QzdVMKR2fMItOUcjEcgDecM7dTiTKQI/2+7RZxizoVFfC/kLQma8XxdvsOD2yFvwJBb85cgufcfYGEHbIyXgunRZiSpImSIyMXeq2Qz1N9Hm3cYP5Bw3pI1NagyUAax4Z76BQ==
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="campaign"


-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="wpWatchlistExpiry"

infinite
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="watchlist"

nochange
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="summary"

/* growthexperiments-addimage-summary-summary: 1 */
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="plugins"

ge-task-image-recommendation
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="data-ge-task-image-recommendation"

{"taskType":"image-recommendation","filename":"American_Crime_Story.jpg","accepted":false,"reasons":["unfamiliar"],"caption":""}
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="page"

Ma'amoul
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="oldid"

4740
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="basetimestamp"

20220704163955
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="starttimestamp"

20221103114312
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="etag"

"direct:4740/b1723834-5b6c-11ed-8d45-3e87d2fb6220/stash"
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="assert"

user
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="assertuser"

Admin
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="vetags"

visualeditor
-----------------------------3095042902054379116673148534
Content-Disposition: form-data; name="token"

f55441dd259dae7562ceca7ea2166a496363a94f+\
-----------------------------3095042902054379116673148534--

I think we could be convinced to move the "reject" use case to a separate API module, like action=growthrejectimagerecommendation. Then it would be much simpler for your client to interact with.
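For reference, the two plugin-related fields in the payload above can be produced along these lines. This is a sketch based only on the example payload: the function name is hypothetical, it covers just the rejection fields (not the rest of the visualeditoredit form, such as html, oldid, etag, or token), and "unfamiliar" is the only reason code that appears in this thread.

```python
import json

PLUGIN_NAME = "ge-task-image-recommendation"


def rejection_plugin_fields(filename: str, reasons: list[str]) -> dict[str, str]:
    """Build the `plugins` and `data-<plugin>` form fields for a rejection."""
    data = {
        "taskType": "image-recommendation",
        "filename": filename,
        "accepted": False,  # a rejection; no caption is submitted
        "reasons": reasons,
        "caption": "",
    }
    return {
        "plugins": PLUGIN_NAME,
        # Compact separators reproduce the formatting seen in the example payload.
        "data-" + PLUGIN_NAME: json.dumps(data, separators=(",", ":")),
    }
```

These fields would be merged into the remaining form data before posting to action=visualeditoredit.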

Invalidating a recommendation

This is done using action=growthinvalidateimagerecommendation

Question: Does this need to be called in addition to the VisualEditor action that provides the accept/reject status?

We currently only call this in edge case scenarios where the user lands on a page that already has an image added to it. This can happen because there can be a lag between when an article is edited (with an image) and the processing in the data pipeline for flagging articles as having image recommendations.

Adding an image to an article

This can be done using action=visualeditoredit, since this is simply adding a File: link to the top of the article.
Note that we'll also need to pass plugins data that contains the accept/reject status, plus the reason code. (refer to the blog post).

Question: When "rejecting" a recommendation, what does it mean to use the VisualEditor API in that case, if we're not actually making an edit? (i.e. what are the other api parameters?)

Yeah, in retrospect, this could probably have been a separate endpoint.
[...]
I think we could be convinced to move the "reject" use case to a separate API module, like action=growthrejectimagerecommendation. Then it would be much simpler for your client to interact with.

Filed as T322309: Create API module for image recommendation rejections

Note: paginating through these results should not be relied upon (since it uses a random sort), so we'll need to keep track of pageIds and filter out ones that have been seen already.

We could look into making the sort more controlled / controllable if that's useful. It would probably have to wait until gerrit 526621 is fully applied to the search index (cf T147505), as a stable pseudorandom sort wouldn't really be stable without that.

kostajh renamed this task from [SPIKE] Explore using GrowthExperiments as a proxy for bringing image recommendations into the Android app to Provide API module in GrowthExperiments to allow querying image suggestion API for titles. Fri, Jan 27, 1:49 PM