Page MenuHomePhabricator

Add a link engineering: Create MySQL table for caching link recommendations
Closed, ResolvedPublic

Description

See https://wikitech.wikimedia.org/wiki/Add_Link for high level overview and additional reading.

The link recommendations returned by mwaddlink will be the raw wikitext of an article, but with new links added where the algorithm thinks we should have new links.

Depending on what we do in T261408: Add a link engineering: Maintenance script for retrieving, caching, and updating search index, we might be storing:

  • the raw wikitext as is
  • annotated wikitext that distinguishes the algorithm-added links from the already existing links
  • a diff of the old wikitext and new wikitext

So, we'll need to store this data, we might also want to think about storing:

  • the article ID it's associated with
  • timestamp of when the recommendations were generated
  • the proposed new link text extracted from the overall recommendation document (mwaddlink might be able to provide this for us separately from the existing wikitext), see Concept B in the design for how this might be used on the front-end.
  • anything else?

Event Timeline

kostajh changed the task status from Open to Stalled.Sep 24 2020, 10:32 AM

Per T261411, we may not use a MySQL table.

We decided on a JSON-in-MySQL approach. See T261411#6546055 and T261411#6562849.

Change 635911 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/vagrant@master] growthexperiments: Run DB updates

https://gerrit.wikimedia.org/r/635911

Change 635912 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@master] Add a link: Create MySQL table for caching link recommendations

https://gerrit.wikimedia.org/r/635912

We decided on a JSON-in-MySQL approach. See T261411#6546055 and T261411#6562849.

I didn't realise that JSON data-type was added in MariaDB 10.2, we are migrating to 10.4, but unfortunately we are not there yet, as the masters cannot be migrated and will remain on 10.1 until we are able to migrate the labsdb hosts, which we expect to be able to do completely by Q3

Although by looking at https://gerrit.wikimedia.org/r/635912, I don't see the JSON data-type, but just using a mediumblob to store that sort of data, that would work. I thought you specifically wanted to use the JSON column data-type.
Nevermind then!
Thanks!

Yeah, sorry, I just meant JSON strings. Proper JSON types don't seem to give a lot of value and limit reusability.

kostajh changed the task status from Stalled to Open.Oct 26 2020, 2:23 PM

Change 635912 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Add a link: Create MySQL table for caching link recommendations

https://gerrit.wikimedia.org/r/635912

Change 635911 merged by jenkins-bot:
[mediawiki/vagrant@master] growthexperiments: Run DB updates

https://gerrit.wikimedia.org/r/635911

I recommend to update createExtensionTables.php so it is possible to create tables used by the extension in a standardized way, similar to other extensions that need tables.

Change 636941 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[mediawiki/extensions/WikimediaMaintenance@master] Add growthexperiments to createExtensionsTables.php

https://gerrit.wikimedia.org/r/636941

Etonkovidova subscribed.

Checked betalabs (MariaDB)

MariaDB [enwiki]> 
*************************** 1. row ***************************
       Table: growthexperiments_link_recommendations
Create Table: CREATE TABLE `growthexperiments_link_recommendations` (
  `gelr_revision` int(10) unsigned NOT NULL,
  `gelr_page` int(10) unsigned NOT NULL,
  `gelr_data` mediumblob NOT NULL,
  PRIMARY KEY (`gelr_revision`),
  KEY `gelr_page` (`gelr_page`)
) ENGINE=InnoDB DEFAULT CHARSET=binary

two indexes have been created:

MariaDB [enwiki]> show index from growthexperiments_link_recommendations\G
*************************** 1. row ***************************
        Table: growthexperiments_link_recommendations
   Non_unique: 0
     Key_name: PRIMARY
 Seq_in_index: 1
  Column_name: gelr_revision
    Collation: A
  Cardinality: 0
     Sub_part: NULL
       Packed: NULL
         Null: 
   Index_type: BTREE
      Comment: 
Index_comment: 
*************************** 2. row ***************************
        Table: growthexperiments_link_recommendations
   Non_unique: 1
     Key_name: gelr_page
 Seq_in_index: 1
  Column_name: gelr_page
    Collation: A
  Cardinality: 0
     Sub_part: NULL
       Packed: NULL
         Null: 
   Index_type: BTREE
      Comment: 
Index_comment:

Change 636941 merged by jenkins-bot:
[mediawiki/extensions/WikimediaMaintenance@master] Add growthexperiments to createExtensionTables.php

https://gerrit.wikimedia.org/r/636941