
Have PageAssessments store all the assessments in ExtensionData until the page is finished parsing
Closed, Resolved | Public | 3 Estimated Story Points

Description

Right now PageAssessments writes each assessment to the database as soon as its {{#assessment}} parser function is parsed. This has two disadvantages:

  1. We have no way of knowing when an assessment has been removed from the page and thus needs to be deleted.
  2. Each page parse may trigger several separate database inserts. It would be better if we could batch the inserts (and deletions) in a single transaction.

To improve things, we should instead have PageAssessments temporarily store each assessment in the ParserOutput's ExtensionData. Then, once the page parsing is complete, we can retrieve all the data using getExtensionData(), figure out which updates need to be made, and batch the updates into a single transaction (using begin and commit).
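The "figure out which updates need to be made" step can be sketched roughly as follows. This is a minimal illustration in Python rather than the extension's actual PHP code: the `page_assessments` table layout, the function names `plan_updates` and `apply_updates`, and the use of SQLite are all assumptions made for the example. It compares the assessments collected during the parse with those already stored, then applies the inserts and deletions inside one transaction.

```python
import sqlite3


def plan_updates(stored, parsed):
    """Diff stored assessments against those found in the latest parse.

    Both arguments map project name -> (class, importance).
    Returns (to_upsert, to_delete).
    """
    # New or changed assessments need to be written.
    to_upsert = {p: v for p, v in parsed.items() if stored.get(p) != v}
    # Assessments no longer present on the page need to be deleted.
    to_delete = [p for p in stored if p not in parsed]
    return to_upsert, to_delete


def apply_updates(conn, page_id, to_upsert, to_delete):
    """Apply all changes for one page in a single transaction.

    Assumes a hypothetical table page_assessments with primary key
    (page_id, project).
    """
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "DELETE FROM page_assessments WHERE page_id = ? AND project = ?",
            [(page_id, p) for p in to_delete],
        )
        conn.executemany(
            "INSERT OR REPLACE INTO page_assessments "
            "(page_id, project, class, importance) VALUES (?, ?, ?, ?)",
            [(page_id, p, c, i) for p, (c, i) in to_upsert.items()],
        )
```

Because the full set of parsed assessments is known before anything is written, removals fall out of the same comparison that finds new and changed rows, which addresses both disadvantages listed above.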

Event Timeline

kaldari raised the priority of this task from to Needs Triage.
kaldari updated the task description.
kaldari added a project: Community-Tech-Sprint.
kaldari subscribed.

I ran into a fundamental problem with using getExtensionData(): we need to know the keys in order to use it, and we don't.
Each entry would be set as ( project => array( class, importance ) ), but to fetch that data back I would have to know the list of WikiProjects used on the page.

Could also do this by storing the list in a known key, or by just putting everything under a single array or object that's indexed by a known key. I'm a bit leery of exposing the full list of other extensions' internal data...

In T121068#1870344, @brion wrote:

Could also do this by storing the list in a known key, or by just putting everything under a single array or object that's indexed by a known key.

I need to store multiple key-value pairs at different times during parsing. Doing the above would mean repeatedly calling getExtensionData( 'known key' ), appending to the array, and then calling setExtensionData( 'known key', 'changed array' ). That seems a bit hacky.

I'm a bit leery of exposing the full list of other extensions' internal data...

Could you tell me about any possible concerns with doing this? It would be interesting to know how or why this could be a problem.

@brion, @NiharikaKohli: Another option would be to create a function for appending data to a known key, rather than replacing it. How does that sound?
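For illustration, an append-style helper along those lines might look like the sketch below. It is written in Python as a stand-in, not in the extension's PHP: `ParserOutputStub` only mimics the get/set behaviour of ParserOutput's getExtensionData() and setExtensionData(), and the key name `ext-pageassessments` and the function `append_assessment` are hypothetical.

```python
class ParserOutputStub:
    """Stand-in for ParserOutput, exposing only get/set of extension data."""

    def __init__(self):
        self._extension_data = {}

    def set_extension_data(self, key, value):
        self._extension_data[key] = value

    def get_extension_data(self, key):
        # Returns None when nothing has been stored under the key yet.
        return self._extension_data.get(key)


# Hypothetical known key under which every assessment for the page is kept.
ASSESSMENTS_KEY = "ext-pageassessments"


def append_assessment(output, project, klass, importance):
    """Add one assessment to the collection stored under the known key."""
    assessments = output.get_extension_data(ASSESSMENTS_KEY) or {}
    assessments[project] = (klass, importance)
    output.set_extension_data(ASSESSMENTS_KEY, assessments)
```

Each parser function call would then go through the helper, so the read-append-write cycle lives in one place and the complete list can be read back from the single known key once parsing finishes.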

Change 259522 had a related patch set uploaded (by Niharika29):
Initial commit for PageAssessments extension

https://gerrit.wikimedia.org/r/259522

This is basically finished, but the patch is awaiting security review.

Change 259522 merged by jenkins-bot:
Initial commit for PageAssessments extension

https://gerrit.wikimedia.org/r/259522

DannyH moved this task from Q3 2018-19 to Q1 2018-19 on the Community-Tech-Sprint board.
DannyH subscribed.