Decide on In-site Search functionality (portal content itself)
Closed, Resolved, Public

Description

Consider whether we want an in-site/in-page search for content on the Developer Portal itself.
Hopefully not needed, but let's check again at a later stage.

Event Timeline

Aklapper changed the task status from Open to Stalled. Jul 27 2021, 4:20 PM
Aklapper triaged this task as Lowest priority.
Aklapper created this task.
Aklapper moved this task from Inbox to 2022-Q2 or later on the Wikimedia-Developer-Portal board.
Aklapper renamed this task from In-site Search functionality (portal content itself) to Decide on In-site Search functionality (portal content itself). Aug 24 2021, 4:22 PM

To put the priority of this item even lower: going for a static platform in T287175 (Decide on most suitable underlying technical platform) would make searching considerably harder.
Also, search versus translation (T276700: Content localization infrastructure setup (export, import, deploy), which is more important than this very task) raises questions about indexing, stemming, etc.
What if a language were only 50% translated: would that lead to incomplete search results? Or should search cover both English and that other language?

TBurmeister added a subscriber: TBurmeister.

It seems like we have a nice search functionality in the current portal, but there appear to be duplicates in the results. Maybe a bug?

Screenshot from 2022-01-25 13-26-09.png (1×2 px, 477 KB)

The duplicates are very likely untranslated "Spanish" versions of the pages. Currently there is no separation of the lunr search index content by content language.

This example search https://developer-portal.wmcloud.org/?q=machine+learning returns:

  • /build-tools/automate-editing/#use-machine-learning-to-detect-vandalism
  • /use-content/data/#use-machine-learning-to-detect-vandalism
  • /es/build-tools/automate-editing/#use-machine-learning-to-detect-vandalism
  • /es/use-content/data/#use-machine-learning-to-detect-vandalism
  • /use-content/#access-data-and-analyze-wikis
  • /es/use-content/#access-data-and-analyze-wikis
  • /grow/tech-capacity/
  • /es/grow/tech-capacity/
  • /grow/
  • /es/grow/
  • /#grow-your-technical-skills
  • /es/#grow-your-technical-skills

Each URL here is unique. Some of the matched content is duplicated because the same document descriptor is present in multiple categories, and thus on multiple target pages. And, as postulated, everything is repeated in the /es/ namespace because content translations are lacking at this stage of site preparation.
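
To make the /es/ repetition visible, here is a minimal sketch that groups the result paths listed above by language prefix. The paths are copied from the example search; the `split_language` and `group_by_language` helpers, the "en" label for unprefixed paths, and the assumption that "es" is the only language prefix are illustrative and not part of the portal code.

```python
from collections import defaultdict

# Result paths copied from the example search above.
RESULTS = [
    "/build-tools/automate-editing/#use-machine-learning-to-detect-vandalism",
    "/use-content/data/#use-machine-learning-to-detect-vandalism",
    "/es/build-tools/automate-editing/#use-machine-learning-to-detect-vandalism",
    "/es/use-content/data/#use-machine-learning-to-detect-vandalism",
    "/use-content/#access-data-and-analyze-wikis",
    "/es/use-content/#access-data-and-analyze-wikis",
    "/grow/tech-capacity/",
    "/es/grow/tech-capacity/",
    "/grow/",
    "/es/grow/",
    "/#grow-your-technical-skills",
    "/es/#grow-your-technical-skills",
]

# Assumption: "es" is the only non-default language prefix in use.
LANG_PREFIXES = ("es",)


def split_language(path, prefixes=LANG_PREFIXES, default="en"):
    """Return (language, path with the language prefix removed)."""
    for lang in prefixes:
        head = f"/{lang}/"
        if path.startswith(head):
            return lang, "/" + path[len(head):]
    # Unprefixed paths are treated as the default language (assumed "en").
    return default, path


def group_by_language(paths):
    """Map each language to the list of prefix-free paths found for it."""
    groups = defaultdict(list)
    for path in paths:
        lang, bare = split_language(path)
        groups[lang].append(bare)
    return dict(groups)


groups = group_by_language(RESULTS)
# Every /es/ hit mirrors an unprefixed hit once the prefix is stripped.
mirrored = sorted(groups["en"]) == sorted(groups["es"])
```

Under these assumptions, half of the twelve hits are /es/ mirrors of the other half, which matches the explanation that the Spanish pages are still untranslated copies.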

@bd808, is it possible to have multiple lunr search indexes, one for each language, and search just that index based on the current language?

It should be possible with custom development work, but no, it is not currently possible.
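
As a rough illustration of what per-language indexes could look like, the sketch below partitions documents by language, builds one index per language, and queries only the index for the current language. It uses a toy inverted index as a stand-in and does not call lunr at all; the sample documents, function names, and the fallback to "en" are all hypothetical, and the real custom work would depend on how the portal builds its lunr index.

```python
from collections import defaultdict

# Hypothetical sample documents, keyed by language and then by path.
# The "es" texts are deliberately untranslated, as on the current site.
DOCS_BY_LANG = {
    "en": {
        "/use-content/data/": "use machine learning to detect vandalism",
        "/grow/": "grow your technical skills",
    },
    "es": {
        "/es/use-content/data/": "use machine learning to detect vandalism",
        "/es/grow/": "grow your technical skills",
    },
}


def build_index(docs):
    """Toy inverted index: lowercase token -> set of document paths."""
    index = defaultdict(set)
    for path, text in docs.items():
        for token in text.lower().split():
            index[token].add(path)
    return index


def build_indexes_by_language(docs_by_lang):
    """Build one independent index per language."""
    return {lang: build_index(docs) for lang, docs in docs_by_lang.items()}


def search(indexes, lang, query, fallback="en"):
    """Search only the index for `lang`; use `fallback` if none exists."""
    index = indexes.get(lang) or indexes[fallback]
    tokens = query.lower().split()
    if not tokens:
        return []
    # A document matches only if it contains every query token.
    hits = set.intersection(*(index.get(token, set()) for token in tokens))
    return sorted(hits)


INDEXES = build_indexes_by_language(DOCS_BY_LANG)
english_hits = search(INDEXES, "en", "machine learning")
spanish_hits = search(INDEXES, "es", "machine learning")
```

With separate indexes, a query from an /es/ page would return only /es/ paths and a query from an English page only unprefixed paths, so the mirrored hits no longer appear side by side.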

Closing this as resolved based on the current search functionality. I opened T306458 for implementing per-language search after launch.