
Orphan articles as reading recommendations
Closed, Resolved, Public

Description

In T341846, we built a proof-of-concept for surfacing orphan articles as recommendations to readers to increase their visibility. While this works well for some articles (example 1), the current approach times out if the article exists in many languages (example 2) due to the large number of API calls needed to find the recommendations. This means that the current implementation is not ready to be used in practice. However, discussions with the Web Team around annual planning indicate that recommending orphans to readers would be very relevant, highlighting the need to improve the current prototype.

Therefore, in this task we want to find a better way of generating the recommendations, specifically optimizing the time needed to serve the recommendations, in order to make them more useful in practice.

Event Timeline

weekly update:

  • spent some time trying to figure out what the bottleneck is in the current setup
  • starting to brainstorm different options for improvement, such as i) pre-computing look-up tables, ii) narrowing down candidates earlier in the pipeline, iii) using embeddings to take advantage of fast approximate nearest neighbor lookup, etc.

weekly update:

  • no update this week

weekly update:

  • no update this week because I didn't manage to free up time for this task (shorter week, annual planning deadlines, wiki workshop submission deadline)

weekly update:

  • investigated how we could take advantage of existing recommendations from the RelatedArticles feature. the feature obtains recommendations from queries to cirrussearch's morelike. There seem to be two interesting options among those available in the API
    • filtering or boosting by the number of inlinks (i.e. prioritizing articles with low indegree), e.g., via srsort=incoming_links_asc
    • using the `morelikethis` option, which can be combined with a filter on specific templates (hastemplate in the example query). In our case, we could then require recommendations to have the Orphan template. The query then yields related articles that are marked as orphans via the corresponding template. Example query: https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=morelikethis:Tiwanaku%20hastemplate:%22orphan%22&srlimit=10
    • the latter approach seems especially promising (as it could also be adapted to other languages) but would require some evaluation of the quality of the recommendations. while the recommendations are orphans, it is not clear how "well" they are related to the query article. however, in contrast to link translation, the queries via morelike/morelikethis are much faster.
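
A minimal sketch of the morelikethis + hastemplate query above, in Python with only the standard library. The search parameters mirror the example query; the function names, the User-Agent string, and the `lang` parameter are illustrative assumptions, not part of the existing tool. Multi-word titles may need additional quoting in `srsearch`, which has not been checked here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_orphan_query(title, template="orphan", limit=10, lang="en"):
    """Build the search URL for articles related to `title` that carry `template`.

    Mirrors the example query in the update above; `lang` selects the wiki
    (assumption: the template name has to be adapted per language).
    """
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f'morelikethis:{title} hastemplate:"{template}"',
        "srlimit": limit,
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def fetch_orphan_recommendations(title, **kwargs):
    """Run the query and return the titles of the hits (needs network access)."""
    url = build_orphan_query(title, **kwargs)
    # Hypothetical User-Agent; a real tool should identify itself properly.
    req = Request(url, headers={"User-Agent": "orphan-recs-sketch/0.1"})
    with urlopen(req, timeout=10) as resp:
        data = json.load(resp)
    return [hit["title"] for hit in data["query"]["search"]]


if __name__ == "__main__":
    print(build_orphan_query("Tiwanaku"))
```

This is a single request per query article, which is where the speed advantage over link translation would come from; the quality of the results would still need the evaluation mentioned above.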

weekly update:

  • no update since my main focus was on Wiki Workshop T352543

weekly update:

  • no update. main focus was preparing for attending ICWSM T362416

weekly updates:

  • finally I managed to spend some time on this.
  • I figured out that one of the main bottlenecks was to calculate the indegree for each potential link-target (this is crucial since we want to prioritize articles with low indegree such as orphans). My initial approach was to use the Linkshere-API. However, this requires a separate call for each individual article. A much cheaper alternative is to query the replicas, with only a single query for potentially hundreds of articles for which we want to get the indegree (see the example in quarry). The replicas can be easily queried from toolforge (wikitech-documentation, example script in PAWS). For an example article, I could reduce the query-time 10-fold.
  • I integrated this (and a few other fixes to improve the recommendations) into the latest version of the tool. Example: https://linkrec.toolforge.org/readmore?lang=en&title=Tiwanaku
  • the example still takes some time but it's much less likely to just time out since the number of API calls is much smaller.
  • in case we need much more substantial speedups, we might want to resort to other heuristics such as the one mentioned above (T361944#9809372), using morelikethis in cirrussearch combined with the orphans-template
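
The batching idea from this update (one SQL query for the indegree of many candidates instead of one Linkshere call per article) can be sketched as follows. This is a toy illustration only: it uses an in-memory SQLite table `links(source, target)` with made-up rows, which is an assumption for the sake of a runnable example and not the schema of the actual replicas; the function names are hypothetical as well.

```python
import sqlite3


def batch_indegree(conn, titles):
    """Return {title: indegree} for all `titles` using a single SQL query."""
    if not titles:
        return {}
    placeholders = ",".join("?" for _ in titles)
    sql = (
        "SELECT target, COUNT(*) FROM links "
        f"WHERE target IN ({placeholders}) GROUP BY target"
    )
    # Titles without any incoming link do not appear in the result, so start at 0.
    counts = {title: 0 for title in titles}
    for target, n in conn.execute(sql, list(titles)):
        counts[target] = n
    return counts


def rank_by_low_indegree(conn, titles):
    """Order candidates so that articles with the fewest inlinks come first."""
    counts = batch_indegree(conn, titles)
    return sorted(titles, key=lambda t: (counts[t], t))


# Toy link table (hypothetical data): B has two inlinks, C has one, D has none.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (source TEXT, target TEXT)")
conn.executemany(
    "INSERT INTO links VALUES (?, ?)",
    [("A", "B"), ("C", "B"), ("A", "C")],
)

if __name__ == "__main__":
    print(rank_by_low_indegree(conn, ["B", "C", "D"]))
```

Against the replicas, the same pattern would apply with the real link tables and a MySQL client from toolforge; the point of the sketch is only that the number of round trips stays constant as the number of candidates grows.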