
Provide some form of image filter to not show 'unexpected' images in image search unless the user opts-in
Open, LowestPublic

Description

When trying to add a file on a user testing page at Portuguese Wikipedia (User:Username/Testes), human balls are shown as examples.

That is because the subpage name (Testes) is automatically pasted into the search box. That is an uncomfortable view that should be removed somehow.

I have made a video to make myself clear: https://www.youtube.com/watch?v=fKVusyd_dHE

Thanks.

Event Timeline

Teles raised the priority of this task from to Needs Triage.
Teles updated the task description. (Show Details)
Teles added a project: VisualEditor.
Teles subscribed.

I have long thought that auto-completing the image search query with the page title wasn't that useful. I'd rather we showed the user's personal uploads or nothing; this seems like another good reason not to do it.

Also: Best. Bug. Ever.

I have long thought that auto-completing the image search query with the page title wasn't that useful. I'd rather we showed the user's personal uploads or nothing; this seems like another good reason not to do it.

Also: Best. Bug. Ever.

The idea of listing user uploads makes sense to me.

But what if you want to look at Testes?

We should still allow people to search all uploads. This is really more of a search issue than a VE one. Even that is questionable.

I do wonder if searching by description (as opposed to file name) would be more helpful, but if the idea with this is to get us to add some sort of content flag you can consider it declined.

But what if you want to look at Testes?

What if I don't?

The search box will always be there for looking for anything. The point is that when I am on a page named "Testes" I am pointed to "Testes" in its English-language meaning, which I can assure you is definitely not desired.

(I was just drive-by commenting; I totally agree that showing the user's uploads makes much more sense.)

We should still allow people to search all uploads. This is really more of a search issue than a VE one. Even that is questionable.

I do wonder if searching by description (as opposed to file name) would be more helpful, but if the idea with this is to get us to add some sort of content flag you can consider it declined.

The idea is not seeing testicles any time a random user tries to test how to add a file.

The idea is not seeing testicles any time a random user tries to test how to add a file.

Do you actually care about how we get to that result? You should know that we don't control the content here; the local wiki and any foreign file repositories (in this case, Commons) control it.

The idea is not seeing testicles any time a random user tries to test how to add a file.

Do you actually care about how we get to that result? You should know that we don't control the content here; the local wiki and any foreign file repositories (in this case, Commons) control it.

I care about improving the tool, and that is why this bug was opened. If you also care about it, I don't get your point. Do you think we should close it? Ignore it? I was definitely expecting more, but it sounds like you are the boss here when your first comment is about declining rather than actually thinking. Did you actually think about good solutions, or did you just narrow your mind to an impossible one?

If there isn't a way to improve searching, I suggest renaming the local page to something less porn-friendly-by-accident.

I care about improving the tool, and that is why this bug was opened. If you also care about it, I don't get your point. Do you think we should close it? Ignore it? I was definitely expecting more, but it sounds like you are the boss here when your first comment is about declining rather than actually thinking. Did you actually think about good solutions, or did you just narrow your mind to an impossible one?

The reason I ask is that if you are asking for us to blacklist specific strings inside our code or WMF server config then I would just decline it. Because you are going after this one particular case instead of a more general problem, I thought maybe that's what you meant. There are of course other innocent-sounding search queries which will return results that you probably disagree with.
I don't think there is a good/acceptable technical solution to this.

If there isn't a way to improve searching, I suggest renaming the local page to something less porn-friendly-by-accident.

This is not a decision made in the capacity of being a VE developer, so it's not really an option to be considered in a VE phabricator ticket.

In reply to T121613#1886358:

Please criticize ideas, not people. Thanks for helping to keep Wikimedia Phabricator a respectful place.

I think we're getting a bit ahead of ourselves here. Irrespective of whether there's a solution or not, the report is well intentioned, and a user of our products is having a suboptimal experience. Let's slow down and think about whether we can help the user rather than being rash, shall we? :-)

The issue is that there are words that are quite innocuous in one language but profane in another. The example that @Teles gives is not the only one that comes to mind, actually! I'll spare you all the awkwardness of me sharing them here. :-p

A simple way to avert this problem is not to pre-populate the search field of the image box with the title of the page you're on. This removes some functionality, but it might be an acceptable tradeoff to solve this problem. I'm unsure. The decision ultimately rests with @Jdforrester-WMF as the product owner.

A simple way to avert this problem is not to pre-populate the search field of the image box with the title of the page you're on.

I don't think that really solves much, all it does is hide the problem (query returning results quite different to what the requester expected in their head) behind a 'search' button :/

This removes some functionality, but it might be an acceptable tradeoff to solve this problem.

Yeah, I don't think it is acceptable just because of this. If someone does try to upload a VE patch for that (I doubt it), I don't plan to accept it, especially in James' absence

@Deskana That would solve this particular case, but there is probably a more general bug for the discovery team about serving inappropriate images for wrong-language searches. It may even exist already, as I've seen people in the past complaining about innocent search terms returning NSFW imagery on Commons.

@Teles thanks for filing this issue - I don't think I would've stumbled across it with my English test cases.

@Krenair I don't think anyone is suggesting we censor image results; that would probably have to come from the community.

I don't think that really solves much, all it does is hide the problem behind a 'search' button :/

You're right, it doesn't solve it. But, it does help. Whether the tradeoff is worth it is the tricky point.

Yeah, I don't think it is acceptable just because of this. If someone does try to upload a VE patch for that (I doubt it), I don't plan to accept it, especially in James' absence

I personally think that this is enough of an edge case that this can wait for James's input. Besides, it's unlikely that we'd be able to deploy a fix for this before he's back anyway, due to the special deployment schedule over the holidays.

@Deskana That would solve this particular case, but there is probably a more general bug for the discovery team about serving inappropriate images for wrong-language searches. It may even exist already, as I've seen people in the past complaining about innocent search terms returning NSFW imagery on Commons.

Yeah, there's a bunch of issues with search that can cause inappropriate content to be surfaced. Some are language-based, some are not. A lot of them were fixed a while back. Not all of them, obviously. :-)

Incidentally, if I do an image search in Portuguese on google.pt for "testes", I still get tons of images of testes. Fewer than if I searched in English, but still. Clearly this is not a trivial problem! Regardless, I'll file a task for it. I doubt we'll get to it any time soon, however.

jayvdb renamed this task from Remove non-requested human testicles from user test pages to Remove non-requested English matches of human testicles from user test pages on Portuguese Wikipedia. Dec 17 2015, 1:17 AM
jayvdb triaged this task as High priority.
jayvdb added a project: I18n.
jayvdb set Security to None.
Krenair raised the priority of this task from High to Needs Triage. Dec 17 2015, 1:19 AM

In reply to T121613#1886358:

Please criticize ideas, not people. Thanks for helping to keep Wikimedia Phabricator a respectful place.

Maybe I wasn't as harsh as it might have sounded. Read it just as advice, at your discretion. There is no respectful answer when somebody points out a problem and somebody else asks if they care about anything. It is not about anyone; it is about good treatment.

I should bring this to Phabricator, as I believe that the better solutions may come from here. In case it is best to try an alternative solution from the local community, I should have it denied here first, I suppose.

One way around this /might/ be to search not on image title but on image description in the local language, i.e. for users on pt.wp display only images that have "testes" in the Portuguese description field, for users on e.g. ca.wp display only images that have "testes" in the Catalan description, etc. Obviously this requires there to be a pt, etc. description for useful files (and there are fewer of these than en descriptions) and won't resolve issues of homographs or vandalism, but it should (I think) otherwise resolve the issue reported here (and might even encourage more local language descriptions on Commons, which would be a nice side effect). This should be the only search done for automatic searches.

For manual searches, options could be a check box for "search image titles" and a radiobox pair for "search only <wiki (interface?) language> descriptions" or "search all language descriptions".
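
To make the proposal above concrete, here is a minimal sketch of the two search modes, assuming a made-up in-memory `MediaFile` record rather than the real Commons data model or search backend. The names `MediaFile`, `search_local_descriptions` and `search_all_descriptions` are hypothetical, and plain substring matching stands in for whatever the search engine would actually do.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MediaFile:
    # Hypothetical record for illustration; not the real Commons data model.
    title: str
    descriptions: Dict[str, str] = field(default_factory=dict)  # language code -> description


def search_local_descriptions(files: List[MediaFile], query: str, lang: str) -> List[MediaFile]:
    """Automatic search: match only the description written in `lang`."""
    needle = query.casefold()
    return [f for f in files if needle in f.descriptions.get(lang, "").casefold()]


def search_all_descriptions(files: List[MediaFile], query: str) -> List[MediaFile]:
    """Manual opt-in search: match the description in any language."""
    needle = query.casefold()
    return [f for f in files if any(needle in d.casefold() for d in f.descriptions.values())]


# Made-up sample data.
files = [
    MediaFile("Human testes diagram.svg", {"en": "Diagram of human testes"}),
    MediaFile("Software tests.png", {"pt": "Captura de tela de testes de software"}),
]

local_hits = [f.title for f in search_local_descriptions(files, "Testes", "pt")]
all_hits = [f.title for f in search_all_descriptions(files, "Testes")]
```

With this sample data the pt-only search returns just the software screenshot, while the all-languages search returns both files. As noted above, files without a description in the local language simply drop out of the automatic search.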

Anticipating user personal uploads seems a good idea when adding an image that you recently uploaded. I think we could include that as part of the default set of images we provide initially to the user to anticipate the next steps.

Nevertheless, surfacing media about the topic at hand also seems useful in the default state. One of the problems here is relying too much on text and too little on the semantics behind it, which can result in images about an unrelated topic. Relying more on Wikidata connections can help.

One approach to consider would be to show the images used on equivalent pages in other languages which are not used in the current one. For example, when Italian Wikipedia users edit the "Formaggio" page they will find those images that English Wikipedia editors used for "Cheese" article or French Wikipedia editors added to the "Fromage" article. This may not be useful for test pages, but can be helpful for more common cases.
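
A rough sketch of that idea, assuming the per-language image lists for the equivalent articles have already been fetched somehow (for instance via interlanguage links). The function name `suggest_images` and the file names are made up for illustration.

```python
from typing import Dict, List


def suggest_images(current_lang: str, images_by_lang: Dict[str, List[str]]) -> List[str]:
    """Return images used on equivalent pages in other languages
    that the current-language page does not use yet."""
    already_used = set(images_by_lang.get(current_lang, []))
    seen = set()
    suggestions = []
    for lang, images in images_by_lang.items():
        if lang == current_lang:
            continue
        for image in images:
            # Skip images the current page already has, and de-duplicate across wikis.
            if image not in already_used and image not in seen:
                seen.add(image)
                suggestions.append(image)
    return suggestions


# Made-up example: the "Formaggio" / "Cheese" / "Fromage" articles.
images_by_lang = {
    "it": ["Formaggio.jpg"],
    "en": ["Cheese platter.jpg", "Formaggio.jpg"],
    "fr": ["Fromage.jpg", "Cheese platter.jpg"],
}

suggestions = suggest_images("it", images_by_lang)
```

Here an Italian editor would be offered "Cheese platter.jpg" and "Fromage.jpg". A page with no equivalents in other languages, such as a user sandbox, yields an empty list.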

If there isn't a way to improve searching, I suggest renaming the local page to something less porn-friendly-by-accident.

The "Testes" link in the personal bar comes from the SandboxLink extension, and the subpage's name can be configured at https://pt.wikipedia.org/wiki/MediaWiki:Sandboxlink-subpage-name (the personal bar link label can be changed to match at https://pt.wikipedia.org/wiki/MediaWiki:Sandboxlink-portlet-label). However, this would "break" the link for all the users who have already created their sandbox.

Jdforrester-WMF renamed this task from Remove non-requested English matches of human testicles from user test pages on Portuguese Wikipedia to Provide some form of image filter to not show 'unexpected' images in image search unless the user opts-in. Jan 19 2016, 8:05 PM
Jdforrester-WMF triaged this task as Lowest priority.
Jdforrester-WMF moved this task from To Triage to Freezer on the VisualEditor board.

I think the realistic way of handling it would be to use the Commons category hierarchy to figure out which images might be "undesirable" (probably by configuration) and (optionally) not display such images in completion/search results. Unfortunately, as I see it currently, there are a number of pieces missing to make that work: Commons categories are not structured enough, and there's no good way to quickly query the category tree.
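
For illustration only, the category-based approach could look roughly like this if the category tree were available as a simple in-memory mapping, which is exactly the piece noted as missing above. All names here (`ancestor_categories`, `filter_results`, the category and file names) are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def ancestor_categories(start: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Collect the starting categories and all their ancestors (breadth-first).
    The `seen` set also guards against cycles in the category graph."""
    seen = set(start)
    queue = deque(seen)
    while queue:
        category = queue.popleft()
        for parent in parents.get(category, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def filter_results(results, file_categories, parents, flagged, opted_in=False):
    """Drop results that sit under a flagged category unless the user opted in."""
    if opted_in:
        return list(results)
    flagged = set(flagged)
    return [
        name for name in results
        if not (ancestor_categories(file_categories.get(name, ()), parents) & flagged)
    ]


# Made-up category data; real Commons categories are far less tidy.
parents = {
    "Human testicles": ["Male genitalia"],
    "Male genitalia": ["Human anatomy"],
    "Software testing": ["Software engineering"],
}
file_categories = {
    "Testes.jpg": ["Human testicles"],
    "Unit tests.png": ["Software testing"],
}
flagged = {"Male genitalia"}  # would come from per-wiki configuration

results = ["Testes.jpg", "Unit tests.png"]
default_view = filter_results(results, file_categories, parents, flagged)
opt_in_view = filter_results(results, file_categories, parents, flagged, opted_in=True)
```

By default only "Unit tests.png" is shown; with the opt-in both results come back. Walking the tree per result like this would be too slow at Commons scale, which is the "no good way to quickly query the category tree" problem.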

The bug as originally described has been resolved by T62398; however, there is still no general filter for "unexpected" images in media search.