[M] Categories and Pages tab for Media Search
Closed, ResolvedPublic

Description

Currently, Media Search has no place for pages and other text-based search results. We want to expand the Categories tab into a Categories and Pages tab that also includes any other text-based page results.

categories_pages_tab.jpg (703×1 px, 112 KB)

Acceptance Criteria:

  • The Categories tab should be renamed "Categories and Pages"
  • The Categories and Pages tab should show results for categories and all other text-based pages
    • The full list: Category pages, Commons pages, Help pages, Creator pages, Institution pages, and Talk pages (Talk, User talk, Commons talk, File talk, MediaWiki talk, Template talk, Help talk, Category talk, Creator talk, TimedText talk, Sequence talk, Institution talk, Campaign talk, Data talk, GWToolset talk, Module talk, Translation talk, Gadget talk, Gadget definition talk)
  • The metadata shown for each category result should be: Category name, # of members, # of subcategories and # of files
  • The metadata shown for each page result should be: Page title, page size (page size will be handled as part of T262992), word count (this is currently not feasible)
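
The two result shapes listed above could be modeled roughly as follows. This is only an illustrative sketch in Python: the class names, field names, and the `describe` helper are assumptions made for the example and are not taken from the extension's code. Expressing page size in bytes is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class CategoryResult:
    """Metadata shown for a category result (hypothetical field names)."""
    name: str
    members: int
    subcategories: int
    files: int


@dataclass
class PageResult:
    """Metadata shown for any other text-based page result."""
    title: str
    size_bytes: Optional[int] = None  # page size is tracked in T262992
    word_count: Optional[int] = None  # currently not feasible, so left unset


def describe(result: Union[CategoryResult, PageResult]) -> str:
    """Build a one-line metadata summary for either result type."""
    if isinstance(result, CategoryResult):
        return (f"{result.name}: {result.members} members, "
                f"{result.subcategories} subcategories, {result.files} files")
    parts = [result.title]
    if result.size_bytes is not None:
        parts.append(f"{result.size_bytes} bytes")
    if result.word_count is not None:
        parts.append(f"{result.word_count} words")
    return ", ".join(parts)
```

Keeping page size and word count optional lets a page result render with only its title until those values become available.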

During development, please test the following:

  • Test this feature while logged in AND logged out
  • Test this feature on at least one mobile browser

Event Timeline

CBogen renamed this task from Categories and Pages tab to Categories and Pages tab for Media Search. Jul 22 2020, 3:59 PM
CBogen renamed this task from Categories and Pages tab for Media Search to [M] Categories and Pages tab for Media Search. Jul 22 2020, 4:22 PM
egardner added a subscriber: egardner.

Our "Alpha" stage tasks are getting pretty extensive, so I think this is a good candidate to bump into Beta. It's an additional feature that we can add in when some of the other work is out of the way.

Change 626787 had a related patch set uploaded (by Anne Tomasevich; owner: Anne Tomasevich):
[mediawiki/extensions/WikibaseMediaInfo@master] Expand categories tab to include various namespaces

https://gerrit.wikimedia.org/r/626787

Change 627359 had a related patch set uploaded (by Eric Gardner; owner: Eric Gardner):
[mediawiki/extensions/WikibaseMediaInfo@master] Enable URL-based feature-flag for Quickview

https://gerrit.wikimedia.org/r/627359

Change 626787 merged by jenkins-bot:
[mediawiki/extensions/WikibaseMediaInfo@master] Expand categories tab to include various namespaces

https://gerrit.wikimedia.org/r/626787

@mwilliams FYI, we're unable to include the word count right now due to a limitation of the API that would require some time and discussion with other teams to resolve. We may be able to add it eventually. If it's highly critical in your opinion, please let me know and we can try to move it forward. Thanks!

@AnneT No problem, not critical at all. I thought we were getting that one for free for some reason.

I thought so too! Turns out the way we're doing the query is a bit of an exception in terms of the data we have access to.

Etonkovidova updated the task description.

Checked in commons wmf.9 - all is in place, except for

The metadata shown for each page result should be: Page title, page size (page size will be handled as part of T262992), word count (this is currently not feasible)

Since it'd be handled in T262992 - closing this task as Resolved.

matthiasmullie added a subscriber: matthiasmullie.

Reopening for question:

Description mentions wanting to search "The full list", with a long list of namespaces.
The full list is actually missing a few pages (e.g. User, Sequence, Campaign, ...) - is there a particular reason they were omitted?

I suspect what we really want is not a predefined list of namespaces to search, but rather "all namespaces except for NS_FILE", am I right?
(asking because implementing it that way is actually a lot safer in case namespaces change, or this gets installed on another - differently configured - wiki)
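
To illustrate the difference, here is a minimal sketch in Python (for illustration only; it is not the extension's code). The numeric IDs are MediaWiki's standard ones (`NS_FILE` is 6, and negative IDs are the virtual Special and Media namespaces), while the function name `searchable_namespaces` and the sample wiki configurations are made up for the example.

```python
NS_FILE = 6  # standard MediaWiki ID for the File namespace


def searchable_namespaces(all_namespace_ids):
    """Return every real namespace ID except NS_FILE, sorted.

    Negative IDs (Special, Media) are virtual namespaces and are skipped.
    """
    return sorted(ns for ns in all_namespace_ids if ns >= 0 and ns != NS_FILE)


# Hypothetical wiki A: a handful of core namespaces.
wiki_a = [-2, -1, 0, 1, 2, 3, 6, 7, 14, 15]
# Hypothetical wiki B: same core set plus custom namespaces 100 and 101.
wiki_b = wiki_a + [100, 101]

namespaces_a = searchable_namespaces(wiki_a)
namespaces_b = searchable_namespaces(wiki_b)
```

With a hardcoded list, the custom namespaces on wiki B would be silently left out of the search. Deriving the list from whatever the wiki reports picks them up without a code change.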

I wrote that description and it wasn't intentional on my part - my understanding was that that was the full list and I didn't realize it was missing page types. So from my POV we can definitely implement it as "all namespaces except for NS_FILE". Would love confirmation from @Ramsey-WMF though.

"All namespaces except for NS_FILE" is fine for now. We might have to have a later conversation about how we rank all that (categories first, I would guess), but for now this should work.

Change 641145 had a related patch set uploaded (by Matthias Mullie; owner: Matthias Mullie):
[mediawiki/extensions/WikibaseMediaInfo@master] Use <all namespaces except file> instead of hardcoded list

https://gerrit.wikimedia.org/r/641145

Change 641145 merged by jenkins-bot:
[mediawiki/extensions/WikibaseMediaInfo@master] Use <all namespaces except file> instead of hardcoded list

https://gerrit.wikimedia.org/r/641145

Two notes
(1) @matthiasmullie Is the File talk NS also excluded? I could not find an example of a File talk page in betalabs, only in production.
In commons wmf.16 the File talk namespace is present: Seal of Ocean Springs, Mississippi.jpg in the screenshot below refers to the File talk:Seal of Ocean Springs, Mississippi.jpg page.

(2) Normal search displays the namespace for each search result. On Special:MediaSearch, namespace info is not displayed, which might make it hard for users to judge how relevant a result is, especially when the search terms are common words or phrases.

Search | Media search - Categories and Pages
Galleries with coordinates | Galleries with coordinates
Screen Shot 2020-11-20 at 4.58.02 PM.png (748×599 px, 209 KB) | Screen Shot 2020-11-20 at 4.59.24 PM.png (666×886 px, 94 KB)
Pages with maps | Pages with maps
Screen Shot 2020-11-20 at 5.26.02 PM.png (699×809 px, 163 KB) | Screen Shot 2020-11-20 at 5.26.32 PM.png (691×856 px, 85 KB)

@Etonkovidova Good call on the namespaces; I agree we should display them on this tab.

Change 643442 had a related patch set uploaded (by Matthias Mullie; owner: Matthias Mullie):
[mediawiki/extensions/WikibaseMediaInfo@master] Pass correct param array format into FauxRequest

https://gerrit.wikimedia.org/r/643442

File_talk pages are also searched on betawiki - e.g. search for LCCN2003689153. However, there appears to be an issue when the search is performed from PHP.
When using the UI to search, the requests are sent via JS and work just fine (File talk pages are included), but when you arrive at the search page with a URL that already contains the search term (e.g. after refreshing the page), it fails to find those pages.
Patch with fix in CR.
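
The thread does not spell out which parameter format was wrong, so the following is only a guess at the general shape of the problem, sketched in Python. The MediaWiki action API expects multi-valued parameters such as `srnamespace` as a single pipe-separated string. A server-side request that passes the namespaces in another shape can therefore return different results than the JS request. The helper name `build_search_params` is hypothetical, and `list=search` stands in for whichever search module the extension actually calls.

```python
def build_search_params(term, namespaces):
    """Build action API search parameters with pipe-separated namespaces."""
    return {
        "action": "query",
        "list": "search",
        "srsearch": term,
        # Multi-valued API parameters are joined with "|", e.g. "0|7|14".
        "srnamespace": "|".join(str(ns) for ns in namespaces),
        "format": "json",
    }


params = build_search_params("LCCN2003689153", [0, 7, 14])
```

Building the parameters through one shared helper is one way to keep the initial server-side render and later client-side requests consistent.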

Change 643442 merged by jenkins-bot:
[mediawiki/extensions/WikibaseMediaInfo@master] Pass correct param array format into FauxRequest

https://gerrit.wikimedia.org/r/643442

@matthiasmullie - searching for a full file name, e.g. File:Black flower.jpg (https://commons.wikimedia.org/wiki/Special:MediaSearch?type=page&q=File%3ABlack+flower.jpg) returns images from the File namespace - is it intentional?

Screen Shot 2020-12-03 at 5.04.23 PM.png (469×808 px, 53 KB)

Change 646672 had a related patch set uploaded (by Matthias Mullie; owner: Matthias Mullie):
[mediawiki/extensions/WikibaseMediaInfo@master] Fix NS_FILE exclusion

https://gerrit.wikimedia.org/r/646672

I had to dig pretty deep, but existing code in CirrusSearch explicitly overrides namespace configuration if it detects a namespace prefix in the search term.
I wasn't aware, but it is intentional & I guess it makes sense?

(also: found another minor issue with the initial PHP render, where it could still include NS_FILE and exclude NS_MAIN pages due to an error in how it processes the namespaces array - fix up for review)
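
The namespace-prefix override described above can be pictured with a rough Python sketch. This is not CirrusSearch's actual code. The lookup table, the function name `resolve_namespaces`, and the choice to strip the prefix from the term are all assumptions made for illustration. Only the numeric IDs are the standard MediaWiki ones.

```python
# Small illustrative subset of namespace names mapped to standard MediaWiki IDs.
NAMESPACE_IDS = {"file": 6, "file talk": 7, "help": 12, "category": 14}


def resolve_namespaces(query, configured):
    """Return (search term, namespaces to search).

    If the query starts with a known namespace prefix such as "File:",
    that namespace replaces the configured list. Otherwise the configured
    namespaces are used unchanged.
    """
    prefix, sep, rest = query.partition(":")
    ns = NAMESPACE_IDS.get(prefix.strip().lower().replace("_", " "))
    if sep and ns is not None and rest.strip():
        return rest.strip(), [ns]
    return query, list(configured)


pages_tab = [0, 12, 14]  # hypothetical configuration that excludes NS_FILE
overridden = resolve_namespaces("File:Black flower.jpg", pages_tab)
unchanged = resolve_namespaces("Black flower", pages_tab)
```

Under this model, an explicit `File:` prefix wins over the tab's configuration, which matches the behavior observed for the File:Black flower.jpg query.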

Change 646672 merged by jenkins-bot:
[mediawiki/extensions/WikibaseMediaInfo@master] Fix NS_FILE exclusion

https://gerrit.wikimedia.org/r/646672

Only the Categories and Pages filter behaves like that, i.e. it will display any kind of file if a full name is provided. So far, I don't see that this would cause any problems.