
"You may create the page" suggestion does not appear if search contains a hyphen
Open, LowestPublic

Description

New article creation is not suggested if one of the words in the search term starts with a hyphen. In Finnish, we have some compound terms which contain such words.

I guess this clashes with the special meaning of the hyphen in the search syntax. What is the best way to fix the problem?

Example: Go to https://fi.wikipedia.org/ and enter "New York -syndrooma" (without quotes) in the search box.

Expected result: The search should ask if you want to create a new article.

Actual result: https://fi.wikipedia.org/w/index.php?search=New+York+-syndrooma&title=Toiminnot%3AHaku&go=Siirry lists only the search results.

Discussion in Finnish: https://fi.wikipedia.org/wiki/Wikipedia:Kahvihuone_(tekniikka)/Arkisto_32#Uusi_hakukone_on_ongelma


Version: unspecified
Severity: normal

Details

Reference
bz73355

Event Timeline

bzimport raised the priority of this task from to Needs Triage. · Nov 22 2014, 3:54 AM
bzimport added a project: CirrusSearch.
bzimport set Reference to bz73355.
bzimport added a subscriber: Unknown Object (MLST).
demon removed a subscriber: demon. · Aug 19 2015, 3:38 PM
Restricted Application added a project: Discovery. · Aug 19 2015, 3:38 PM
Restricted Application added a subscriber: Aklapper.
Restricted Application added a subscriber: StudiesWorld. · Nov 29 2015, 2:05 PM
Deskana triaged this task as Lowest priority. · Dec 29 2015, 11:19 PM
Deskana added a subscriber: Deskana.

Related to T122309: "You may create the page" suggestion does not appear if search contains 'AND', 'OR', 'NOT' anywhere in search even when these are not used as special syntax. This is because the search system avoids giving you the link to create the article if it thinks your search contains any advanced syntax. This is low priority for us to fix.
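The behaviour described above can be illustrated with a minimal sketch. This is not the actual CirrusSearch code; the function names and the exact set of checks are assumptions made only to show why a query such as New York -syndrooma loses the creation suggestion while a query with ordinary in-word hyphens keeps it.

```python
# Hypothetical heuristic for illustration only; not CirrusSearch's real logic.
BOOLEAN_KEYWORDS = {"AND", "OR", "NOT"}


def looks_like_special_syntax(query: str) -> bool:
    """Return True if any whitespace-separated token resembles search syntax."""
    for token in query.split():
        # Upper-case boolean keywords are treated as operators.
        if token in BOOLEAN_KEYWORDS:
            return True
        # A hyphen at the start of a token reads as negation.
        if len(token) > 1 and token.startswith("-"):
            return True
    return False


def should_suggest_creation(query: str, title_exists: bool) -> bool:
    """Offer 'You may create the page' only for plain queries with no existing page."""
    return not title_exists and not looks_like_special_syntax(query)


if __name__ == "__main__":
    for q in ("New York -syndrooma", "i-like-hyphens", "Tom AND Jerry"):
        print(q, "->", should_suggest_creation(q, title_exists=False))
```

Under these assumptions, only the query whose hyphens sit inside words would still get the suggestion.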

Deskana renamed this task from Red link suggestion not created for terms that contain a word starting with hyphen to Search does not ask you to create a new article if the search query contains a hyphen. · Dec 29 2015, 11:20 PM
Deskana set Security to None.
Deskana moved this task from Needs triage to Search on the Discovery board.
Deskana renamed this task from Search does not ask you to create a new article if the search query contains a hyphen to "You may create the page" suggestion does not appear if search contains a hyphen. · Dec 29 2015, 11:24 PM
TJones updated the task description. · Aug 4 2016, 9:25 PM
Restricted Application added a project: Discovery-Search. · Aug 4 2016, 9:25 PM
TJones added a subscriber: TJones. · Aug 4 2016, 9:30 PM

Is it common usage to have word-initial hyphens in Finnish? In English, it is not. The example New York -syndrooma doesn't look like normal language to my English-speaking, search-engine-loving eyes. A word-initial hyphen looks like special syntax, which is how it's being treated here.

Queries with the more typical hyphens between words do get an option to create a page: i-like-hyphens, or English-speaking, search-engine-loving eyes.

I read the Finnish discussion in translation—so I didn't get all of it—but I did find the link discussed there: Eino Leino -palkinto, which translates to Eino Leino -award. Is that a common usage?

Is it common usage to have word-initial hyphens in Finnish?

Yes. If the first part of a Finnish compound consists of separately written words (e.g., "Eino Leino" in your example), we use a space and hyphen (or non-breaking hyphen) to separate the two parts. The spelling conventions are explained here (unfortunately only in Finnish): http://www.kielitoimistonohjepankki.fi/ohje/131

TJones added a comment. · Aug 5 2016, 5:47 PM

Yes. If the first part of a Finnish compound consists of separately written words (e.g., "Eino Leino" in your example), we use a space and hyphen

That's very interesting. In American English, copyeditors would use en dashes, and non-copyeditors would use regular connected hyphens—until their copyeditor corrected them.

I can see why it's causing problems on Finnish Wikipedia. Thanks for explaining/verifying/linking. It makes more sense now.

My first thought is that the most reasonable fix would be to add a configuration parameter to control which elements of search syntax should be disallowed in titles that are recommended for article creation (instead of disallowing everything). Just allowing space+hyphen in titles everywhere would lead to complaints in the other direction from users whose writing conventions don't use hyphens that way.
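A rough sketch of what such a per-wiki parameter could look like follows. Everything in it is hypothetical: the names `CreateSuggestionConfig`, `blocking_syntax`, `DEFAULT` and `FIWIKI` are invented for illustration and do not correspond to any existing CirrusSearch setting.

```python
from dataclasses import dataclass

# Hypothetical labels for the syntax elements that can block the suggestion.
ALL_SYNTAX = frozenset({"leading_hyphen", "boolean_keyword"})


@dataclass(frozen=True)
class CreateSuggestionConfig:
    # Syntax elements that suppress the "You may create the page" suggestion.
    blocking_syntax: frozenset = ALL_SYNTAX


def detect_syntax(query: str) -> set:
    """Return the set of syntax elements that appear in the query."""
    found = set()
    for token in query.split():
        if token in {"AND", "OR", "NOT"}:
            found.add("boolean_keyword")
        elif len(token) > 1 and token.startswith("-"):
            found.add("leading_hyphen")
    return found


def may_suggest_creation(query: str, config: CreateSuggestionConfig) -> bool:
    """Allow the suggestion unless the query uses a blocking syntax element."""
    return not (detect_syntax(query) & config.blocking_syntax)


# Default: every syntax element blocks the suggestion (the current behaviour).
DEFAULT = CreateSuggestionConfig()
# Assumed override for a wiki where space+hyphen is ordinary spelling.
FIWIKI = CreateSuggestionConfig(blocking_syntax=ALL_SYNTAX - {"leading_hyphen"})
```

With this shape, Eino Leino -palkinto would be blocked under `DEFAULT` but allowed under `FIWIKI`, while other wikis keep the existing behaviour.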

I have a follow-up question: do Finnish users want to use the hyphen as a negation marker? Searching for Eino Leino -palkinto gives an exact title match, which is fine. With an extra n as a typo, though, Eino Leinno -palkinto matches Eino Leino well enough but prevents any match on palkinto, which seems bad. Or are people just used to that kind of thing?
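To make the typo case concrete, here is a simplified sketch of how a query gets split once a word-initial hyphen is read as negation. The function `split_negation` is hypothetical and ignores the rest of the real query parser (including the exact-title match that rescues the correctly spelled query).

```python
def split_negation(query: str):
    """Split a query into required terms and negated terms.

    A token that starts with a hyphen is treated as negated, which is
    an assumption about the parser made for this illustration.
    """
    positive, negated = [], []
    for token in query.split():
        if len(token) > 1 and token.startswith("-"):
            negated.append(token[1:])  # drop the leading hyphen
        else:
            positive.append(token)
    return positive, negated


if __name__ == "__main__":
    print(split_negation("Eino Leinno -palkinto"))
```

Under this reading, palkinto ends up in the negated list, so pages containing it are excluded instead of matched.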

A simpler solution might be to remove hyphens from the search syntax for Finnish wikis. ! is still available for negation. On the other hand, Google Finland treats the space+hyphen as a negated search, and, for better or worse, Google tends to set the expectations for basic search syntax for unsophisticated users.

debt added a subscriber: debt.

Since this uses the special syntax and we typically don't suggest making a page based on the special syntax, we're not sure if this needs to be done. Moving to the later column until we get more feedback.