
Phrase matching: Skip over headings
Closed, Resolved · Public

Description

See this reverted diff https://de.wikipedia.org/w/index.php?title=City_Bomber&curid=1614969&diff=215675485&oldid=188948263. The recommendations from mwaddlink are:

{
  "links": [
    {
      "context_after": " zu berück",
      "context_before": "e und die ",
      "link_index": 0,
      "link_target": "Fluggeschwindigkeit",
      "link_text": "Fluggeschwindigkeit",
      "match_index": 0,
      "score": 0.5112375020980835,
      "wikitext_offset": 1673
    },
    {
      "context_after": " war recht",
      "context_before": "\nDie ",
      "link_index": 1,
      "link_target": "Programmierung",
      "link_text": "Programmierung",
      "match_index": 0,
      "score": 0.5177647471427917,
      "wikitext_offset": 2173
    },
    {
      "context_after": ")\n",
      "context_before": " (Orwin ",
      "link_index": 2,
      "link_target": "Software",
      "link_text": "Software",
      "match_index": 0,
      "score": 0.6518036127090454,
      "wikitext_offset": 3117
    },
    {
      "context_after": " richtig g",
      "context_before": "dings das ",
      "link_index": 3,
      "link_target": "Raumschiff",
      "link_text": "Raumschiff",
      "match_index": 0,
      "score": 0.5431332588195801,
      "wikitext_offset": 3969
    },
    {
      "context_after": ".\n\n",
      "context_before": " eher ein ",
      "link_index": 4,
      "link_target": "Geschicklichkeitsspiel",
      "link_text": "Geschicklichkeitsspiel",
      "match_index": 0,
      "score": 0.7253251671791077,
      "wikitext_offset": 4130
    }
  ],
  "links_count": 5,
  "meta": {
    "application_version": "5cfca36",
    "dataset_checksums": {
      "anchors": "0fa5550b374ccffdaf9a82f473bc10a584235ad46aef4bb62e1f9eef020bb5ff",
      "model": "b63fcdc8cbf2d4f0b1d693be3eff4dea723cbc42aa1e57cd59bfcd9a607f0a45",
      "pageids": "03d1e2b0e11b1fefb88baebaa0d9d0fbc00815ef0a5bee84f10fdad313699487",
      "redirects": "0353f76e007c232a63e5522b5bfe9269a1243957def1d08e21a1d611a80768ad",
      "w2vfiltered": "2a7c0735546f50ca8b6fb4d01f712c19cd05efa3fb6953ff6c3cbcdb3573c3c9"
    },
    "format_version": 1
  },
  "page_title": "City Bomber",
  "pageid": 1614969,
  "revid": 215675783
}
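
Each recommendation carries the link text, a short slice of surrounding context, and a wikitext_offset. The sketch below checks a recommendation against raw wikitext. It assumes, without confirmation from this task, that wikitext_offset is the character offset where link_text starts and that context_before/context_after are the characters immediately around it. The function name find_anchor and the sample wikitext are hypothetical.

```python
def find_anchor(wikitext: str, rec: dict) -> int:
    """Return rec["wikitext_offset"] if link_text and its context match there, else -1.

    Assumption: wikitext_offset is the character offset where link_text starts,
    and context_before/context_after are the characters immediately around it.
    """
    start = rec["wikitext_offset"]
    before, text, after = rec["context_before"], rec["link_text"], rec["context_after"]
    end = start + len(text)
    if start < len(before):
        return -1  # not enough room for the preceding context
    if (wikitext[start - len(before):start] == before
            and wikitext[start:end] == text
            and wikitext[end:end + len(after)] == after):
        return start
    return -1


# Hypothetical wikitext: the same word appears in a heading and in the body.
wikitext = "== Programmierung ==\nDie Programmierung war recht aufwendig."
rec = {
    "link_text": "Programmierung",
    "context_before": "\nDie ",
    "context_after": " war recht",
    "wikitext_offset": wikitext.index("Die Programmierung") + len("Die "),
}
print(find_anchor(wikitext, rec))  # prints 25
```

Pointed at the heading occurrence instead (offset 3), the same check returns -1, because the preceding context does not match.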

The Add-Link plugin for VisualEditor (VE) proposes linking "Programmierung" in the heading instead of in the body text:

(Screenshot attached: image.png)

We should be able to update the phrase matching code to skip over headings, provided headings are always excluded from mwaddlink output.
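
A minimal sketch of the proposed change. It assumes the article is modelled as a flat list of blocks tagged with their HTML element name, and that match_index counts occurrences of link_text outside headings only (the condition stated above). The Block type, locate_match, and the whole-word regex are hypothetical illustrations, not the GrowthExperiments code.

```python
import re
from dataclasses import dataclass

# Block-level elements whose text is ignored when counting matches.
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


@dataclass
class Block:
    tag: str   # HTML element name, e.g. "p", "h2", "li"
    text: str  # plain text content of the block


def locate_match(blocks, link_text, match_index, skip_headings=True):
    """Return (block_position, char_offset) of the match_index-th whole-word
    occurrence of link_text, or None if there are not enough occurrences.

    When skip_headings is True, occurrences inside h1-h6 blocks are not counted,
    on the assumption that mwaddlink never counts them either.
    """
    pattern = re.compile(r"\b" + re.escape(link_text) + r"\b")
    seen = 0
    for position, block in enumerate(blocks):
        if skip_headings and block.tag in HEADING_TAGS:
            continue  # skip heading text entirely when counting occurrences
        for match in pattern.finditer(block.text):
            if seen == match_index:
                return position, match.start()
            seen += 1
    return None


# Hypothetical layout mirroring the City Bomber case.
blocks = [
    Block("p", "Intro paragraph."),
    Block("h2", "Programmierung"),
    Block("p", "Die Programmierung war recht aufwendig."),
]
print(locate_match(blocks, "Programmierung", 0))                       # prints (2, 4)
print(locate_match(blocks, "Programmierung", 0, skip_headings=False))  # prints (1, 0)
```

With skip_headings=False, match_index 0 lands on the heading, which reproduces the behaviour seen on City Bomber; with headings skipped, it lands on the body text.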

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 722280 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] AddLink: Skip over headings in phrase matching

https://gerrit.wikimedia.org/r/722280

Change 722280 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] AddLink: Skip over headings in phrase matching

https://gerrit.wikimedia.org/r/722280

Change 722449 had a related patch set uploaded (by Gergő Tisza; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.23] AddLink: Skip over headings in phrase matching

https://gerrit.wikimedia.org/r/722449

Change 722449 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.23] AddLink: Skip over headings in phrase matching

https://gerrit.wikimedia.org/r/722449

Mentioned in SAL (#wikimedia-operations) [2021-09-21T00:16:27Z] <tgr@deploy1002> Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/modules/ext.growthExperiments.StructuredTask/addlink/AddLinkArticleTarget.js: Backport: [[gerrit:722449|AddLink: Skip over headings in phrase matching (T291361)]] (duration: 00m 57s)

Etonkovidova subscribed.

On enwiki (wmf.23) I checked the article that previously showed a recommended link in the title: https://en.wikipedia.org/wiki/Black-footed_cat

(Screenshot attached: add_link in header.png)

Looks ok now:
https://api.wikimedia.org/service/linkrecommendation/v1/linkrecommendations/wikipedia/en/Black-footed_cat?threshold=0.5&max_recommendations=15

{"links":[{"context_after":". It usual","context_before":"s natural ","link_index":0,"link_target":"Habitat","link_text":"habitat","match_index":0,"score":0.7018957734107971,"wikitext_offset":2039}
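
The API request above can be rebuilt programmatically. This sketch only mirrors the URL pattern visible in that query (project, language, title, threshold, max_recommendations). The helper names are hypothetical, and percent-encoding every character of the title, including "/", is an assumption.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Base path copied from the query URL in this task.
BASE = "https://api.wikimedia.org/service/linkrecommendation/v1/linkrecommendations"


def recommendation_url(project: str, lang: str, title: str,
                       threshold: float = 0.5, max_recommendations: int = 15) -> str:
    """Build a link recommendation URL (hypothetical helper)."""
    # Spaces become underscores as in page URLs; encoding "/" too is an assumption.
    encoded_title = quote(title.replace(" ", "_"), safe="")
    query = urlencode({"threshold": threshold, "max_recommendations": max_recommendations})
    return f"{BASE}/{project}/{lang}/{encoded_title}?{query}"


def fetch_recommendations(url: str) -> dict:
    """Fetch and decode the JSON response (performs a network request)."""
    # Wikimedia asks clients to send a descriptive User-Agent; this one is a placeholder.
    req = Request(url, headers={"User-Agent": "addlink-check-example/0.1"})
    with urlopen(req) as resp:
        return json.load(resp)


url = recommendation_url("wikipedia", "en", "Black-footed cat")
print(url)
```

The fix itself is client-side (see the next comment), so the response from this endpoint is not expected to change because of the patch.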

FWIW, the API output did not change with the patch above. The fix is in the VisualEditor plugin, which no longer attempts to link words inside heading elements (h1 to h6). To test it:

1. Open a link recommendation task article on testwiki/betalabs and find the first word proposed as a link (e.g. "habitat").
2. In a private browser tab, edit that article and add the same word in a heading (e.g. == Habitat ==) placed just before the spot where the word was proposed as a link. Save the edit.
3. Back in the first tab, where the link recommendation plugin is open, refresh the page.

The word added in the heading should not be linked.