pasting full URL into entity selector no longer works
Closed, Resolved · Public

Description

It used to be possible to paste the full URL of an item or property into the entity selector. This no longer returns a result.

Details

Subject	Repo	Branch	Lines +/-
Restore wgCirrusSearchRescoreFunctionScoreChains after test	mediawiki/extensions/Wikibase	master	+72 -44
Add integration tests for pasting full URLs into entity selectors	mediawiki/extensions/Wikibase	master	+253 -5
Add special case handling for some forms of IDs	mediawiki/extensions/Wikibase	master	+582 -3

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		Smalyshev	T179061 pasting full URL into entity selector no longer works
		Resolved		thiemowmde	T117763 Small feature request: recognize bracketed IDs as valid IDs

Event Timeline

Lydia_Pintscher created this task.Oct 26 2017, 10:33 AM

Restricted Application added a subscriber: Aklapper.Oct 26 2017, 10:33 AM

@Smalyshev can you have a look? @thiemowmde had some special handling for this it seems.

thiemowmde added a subtask: T117763: Small feature request: recognize bracketed IDs as valid IDs.Oct 26 2017, 10:37 AM

The code that made this possible was in EntitySearchTermIndex::getEntityIdsMatchingSearchTerm, introduced via T117763. The idea was intentionally trivial:

The first step always tried to parse the user's input as an entity ID.
If this failed, the regex /.*(\b\w{2,})/s tried to extract the last ASCII word sequence from the user's input and parse that as an entity ID. This covers full URLs as well as copy-pastes like "(Q42)".
All this is cheap.
In addition the term table was queried in two steps: exact matches first, then prefix matches.
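
What the quoted regex extracts can be sketched as follows, in Python rather than the PHP of the original code. The pattern is the one from the comment above; the `re.ASCII` flag is an assumption meant to mimic an ASCII-only `\w`, and the helper name `last_word_sequence` is hypothetical.

```python
import re

# Pattern quoted in the comment above; re.S mirrors the /s modifier.
# re.ASCII is an assumption: it keeps \w and \b ASCII-only.
LAST_SEQUENCE = re.compile(r".*(\b\w{2,})", re.S | re.ASCII)


def last_word_sequence(text):
    """Return the last run of two or more word characters, or None."""
    match = LAST_SEQUENCE.match(text)
    return match.group(1) if match else None


# The greedy ".*" pushes the capture group to the last possible position.
for pasted in ("https://www.wikidata.org/wiki/Q42", "(Q42)", "ata.org/wiki/Q42"):
    print(pasted, "->", last_word_sequence(pasted))
```

The extracted sequence is only a candidate. It still has to pass entity ID parsing before it is used.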

This code path is not executed any more with the switch to the new EntitySearchElastic implementation, introduced via T125500.

Too bad we had no integration test for this feature.

Change 386602 had a related patch set uploaded (by Thiemo Mättig (WMDE); owner: Thiemo Mättig (WMDE)):
[mediawiki/extensions/Wikibase@master] Add integration tests for pasting full URLs into entity selectors

https://gerrit.wikimedia.org/r/386602

gerritbot added a project: Patch-For-Review.Oct 26 2017, 10:53 AM

debt edited projects, added Discovery-Search (Current work); removed Discovery-Search.Oct 26 2017, 5:04 PM

debt moved this task from Incoming to Needs review on the Discovery-Search (Current work) board.

We could probably pre-process the input, yes. Though I am not sure we should encourage these things... while something like case-insensitive matching is common search functionality, pasting URLs etc. with magic rules seems to be going a bit too far. But if the old one supported it, fine. I wish there were some docs or tests, though, showing which extra syntaxes we process. So far I've got:

URL - should it match only wikidata URL? Would http://google.com/Q42 also look for Q42 (which is kinda weird)?
Parens removal - (Q42) is the same as Q42. Should it work for others too, e.g. (P42)? (L42)? (Douglas Adams)? Should (Q42 and Q42)))) and ())Q42() also work?

If this fails, the regex /.*(\b\w{2,})/s tried to grep the last ASCII sequence from the user's input, and parse that as an entity ID

That's not how the Elastic search currently works. In general, we don't have ID parsing etc. In fact, the Elastic index has no idea what an "ID" is. We match the title against the Elastic index. We could add an extra step of ID parsing, but this complicates things quite a bit, since if we don't run the search we don't have the necessary data. We could parse the ID, then extract it and run the search on it, but that looks like duplicating the work. Anyway, if we have pre-processing rules, we could probably do it.

I think we can keep it to Wikidata URLs.
Parenthesis removal should also work for the other entity types. I don't think we need it for labels etc.
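
A pre-processing step along these lines could look like the sketch below. It is not the merged patch. The ID shape (Q, P or L followed by a number), the accepted URL paths and the function name `normalize_search_term` are all assumptions made for illustration.

```python
import re

# Assumed shapes, not taken from the task: IDs look like Q42, P31 or L42,
# and Wikidata URLs use /wiki/ or /entity/ paths with an optional namespace.
ID_PATTERN = r"[QPL][1-9]\d*"
WIKIDATA_URL = re.compile(
    r"^https?://(?:www\.)?wikidata\.org/(?:wiki|entity)/"
    r"(?:Property:|Lexeme:)?(" + ID_PATTERN + r")$",
    re.I,
)
BRACKETED_ID = re.compile(r"^\((" + ID_PATTERN + r")\)$", re.I)


def normalize_search_term(term):
    """Reduce a pasted Wikidata URL or a bracketed ID to the bare ID.

    Anything else, including bracketed labels and foreign URLs,
    is returned unchanged apart from surrounding whitespace.
    """
    term = term.strip()
    for pattern in (WIKIDATA_URL, BRACKETED_ID):
        match = pattern.match(term)
        if match:
            return match.group(1).upper()
    return term


print(normalize_search_term("https://www.wikidata.org/wiki/Q42"))
print(normalize_search_term("(P42)"))
print(normalize_search_term("(Douglas Adams)"))
```

With these rules, "http://google.com/Q42" and "(Douglas Adams)" are passed to the search as typed, which matches the restriction to Wikidata URLs and to entity IDs.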

Smalyshev added a project: User-Smalyshev.Oct 26 2017, 5:51 PM

Smalyshev moved this task from Backlog to Next on the User-Smalyshev board.Oct 28 2017, 10:24 AM

The URL can be in different formats:

Not sure how many variants we have. Not too hard to catch in a regex, but it might be a bit more clutter when you add these URLs to preprocessing.

Yes this is a problem... I wonder though whether we should cover all of these. I can see why one would paste the first one into the selector, but the other ones one would have to take from RDF or specially construct... Is it a reasonable expectation that search would cover all of them?

I believe that even pasting a partial URL like "ata.org/wiki/Q42" should work. And it easily can, as I already tried to show above. Just try to apply two steps:

Try to parse the user's input as an entity ID. We do have WikibaseRepo::getEntityIdParser() which should be used for this.
If this fails, fetch the last alphanumeric character sequence from the user's input and try again. The regex /.*(\b\w{2,})/s I used for this can even be simplified if you like: /.*(\b\w+)/s.

EntityIdParser throws an exception on a parse error. Is there any API that allows checking whether something is a valid ID without throwing?

What about utilizing a try-catch? You can wrap it in a private passesEntityIdParsing function if you like.
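
The two steps and the try-catch wrapper can be sketched together as follows, in Python rather than PHP. `parse_entity_id` is a hypothetical stand-in for the parser returned by WikibaseRepo::getEntityIdParser(), and the ID shape it accepts is an assumption. Only the regex is taken from the comments above.

```python
import re


class EntityIdParsingError(ValueError):
    """Stand-in for the exception the real parser throws."""


def parse_entity_id(serialization):
    # Hypothetical stand-in for the real entity ID parser.
    # Assumption: an ID is Q, P or L followed by a positive number.
    if not re.fullmatch(r"[QPL][1-9]\d*", serialization, re.I):
        raise EntityIdParsingError(serialization)
    return serialization.upper()


def try_parse_entity_id(text):
    """Wrap the throwing parser: return the ID, or None if parsing fails."""
    try:
        return parse_entity_id(text)
    except EntityIdParsingError:
        return None


def entity_id_from_input(text):
    """Step 1: parse the whole input. Step 2: retry with the last word."""
    text = text.strip()
    entity_id = try_parse_entity_id(text)
    if entity_id is None:
        # Simplified regex from the comment above; re.S mirrors /s.
        match = re.match(r".*(\b\w+)", text, re.S | re.ASCII)
        if match:
            entity_id = try_parse_entity_id(match.group(1))
    return entity_id


for pasted in ("Q42", "ata.org/wiki/Q42", "(Q42)", "Douglas Adams"):
    print(pasted, "->", entity_id_from_input(pasted))
```

Note that this fallback does not look at the host, so "http://google.com/Q42" also resolves to Q42, which is the behavior questioned earlier in the thread.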

Liuxinyu970226 subscribed.Oct 28 2017, 3:16 PM

Change 387025 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/Wikibase@master] Add special case handling for some forms of IDs

https://gerrit.wikimedia.org/r/387025

Smalyshev moved this task from Next to Waiting/Blocked on the User-Smalyshev board.Oct 31 2017, 11:47 PM

Smalyshev claimed this task.Nov 1 2017, 12:54 AM

Change 389698 had a related patch set uploaded (by Thiemo Mättig (WMDE); owner: Thiemo Mättig):
[mediawiki/extensions/Wikibase@master] Restore wgCirrusSearchRescoreFunctionScoreChains after test

https://gerrit.wikimedia.org/r/389698

thiemowmde added a project: Wikidata-Former-Sprint-Board.Nov 7 2017, 11:46 AM

thiemowmde moved this task from ready to go to in progress on the Wikidata board.

thiemowmde moved this task from Proposed to Review on the Wikidata-Former-Sprint-Board board.

WMDE-leszek added a project: Wikidata-Sprint-2017-11-07.Nov 7 2017, 1:58 PM

WMDE-leszek moved this task from Backlog to Review on the Wikidata-Sprint-2017-11-07 board.Nov 7 2017, 2:02 PM

thiemowmde closed this task as Resolved.Nov 8 2017, 11:18 AM

thiemowmde moved this task from Review to Done on the Wikidata-Sprint-2017-11-07 board.

thiemowmde removed a project: Patch-For-Review.

thiemowmde moved this task from Review to Done on the Wikidata-Former-Sprint-Board board.

Liuxinyu970226 unsubscribed.Nov 8 2017, 3:35 PM

Change 387025 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add special case handling for some forms of IDs

https://gerrit.wikimedia.org/r/387025

Change 386602 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add integration tests for pasting full URLs into entity selectors

https://gerrit.wikimedia.org/r/386602

Change 389698 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Restore wgCirrusSearchRescoreFunctionScoreChains after test

https://gerrit.wikimedia.org/r/389698

ReleaseTaggerBot added a project: MW-1.31-release-notes (WMF-deploy-2017-11-14 (1.31.0-wmf.8)).Nov 8 2017, 7:00 PM

EBernhardson moved this task from Needs review to Needs Reporting on the Discovery-Search (Current work) board.May 6 2019, 3:58 PM
