Page MenuHomePhabricator

Federated Properties, ApiEntityLookup, trying to get data that was not prefetched.
Closed, ResolvedPublic

Description

The federated properties test system is broken with the exception above.
This happens on multiple code paths.

Event Timeline

Addshore created this task.May 19 2020, 3:16 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMay 19 2020, 3:16 PM

Change 597297 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikibase@master] FP: ApiEntityLookup fetch and log when not already prefetched

https://gerrit.wikimedia.org/r/597297

Restricted Application added a project: Wikidata. · View Herald TranscriptMay 19 2020, 3:17 PM
Restricted Application added a subscriber: RhinosF1. · View Herald Transcript
Addshore claimed this task.May 19 2020, 3:17 PM
Addshore triaged this task as High priority.
Addshore updated the task description. (Show Details)
Restricted Application added a project: User-Addshore. · View Herald TranscriptMay 19 2020, 3:17 PM

Change 597297 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] FP: ApiEntityLookup fetch and log when not already prefetched

https://gerrit.wikimedia.org/r/597297

Addshore added a comment.EditedMay 25 2020, 1:36 PM

So we did a task break down related to this ticket (although the immediate issue described in this ticket has already been fixed).

The rough breakdown below should move the prefetching of data for federated properties to before "any" (perhaps) data is used.

  • Move the code from SingleEntitySourceServices::getEntityInfoBuilder into the entity type wiring files
  • Alter FullEntityParserOutput::getParserOutput to call getEntityInfo earlier and always (Need to think and check this is okay...) (See Fig1)
    • Need to figure out what makes the calls to get parser output without getting the HTML?
    • And how much this code path is called
    • And what the alter in logic might do?
    • TODO investigate what this entity info lookup is still used for etc... (Might be leftover from https://phabricator.wikimedia.org/project/view/3407/ ? )
  • Implement an EntityInfoBuilder for federated properties that prefetches $entityIds using the ApiEntityLookup
  • Override the EntityInfoBuilder for federated properties to do the prefetching

Fig1.

The prefetching for other cases such as the history page should already be covered in LanguageFallbackLabelDescriptionLookupFactory::newLabelDescriptionLookup which does a prefetchTerms using a termBuffer.
However looking at ItemHandler::getActionOverrides, the newLabelDescriptionLookup call which would allow the prefetching currently doesn't pass in any entity ids so maybe this doesn't happen?
TODO figure this out? This probably doesn't happen on Wikidata either that means? but should it? because of the now used terms caching stuff?

The stacktraces that we looked at during the breakdown were P11194 and P11297

I dug into some of this code a little bit more and put some findings below.

I believe that rather than use and abuse this entity info thing, we should create our own new things specifically for prefetching referenced entities if needed.

EntityInfo as a concept is not currently intentionally used and should be removed.

What is EntityInfo, why does it exist and is it still used?

As a result of the above, one of the first things that happens when rendering an entity is getting the entity info, which is then not used at all during rendering, as this is all now done via the formatted cache things.

Some other interesting observations (hinting toward removing entity info):

  • FormatterLabelDescriptionLookupFactory::OPT_LABEL_DESCRIPTION_LOOKUP is likely fairly pointless now, as this option is not looked at (see above) when creating the formatted to be used for most things. However::::
    • WikibaseValueFormatterBuilders::newItemPropertyIdHtmlLinkFormatter uses the formatted cache, and creates its own lookup
    • Other methods in this class still call labelDescriptionLookupFactory->getLabelDescriptionLookup which includes a check for OPT_LABEL_DESCRIPTION_LOOKUP These include:
      • newPlainEntityIdFormatter (seems it might need terms? it is a plain formatter that is used in newEntityIdFormatter (see next line)
      • newEntityIdFormatter
      • getVocabularyUriFormatter
    • However adding breakpoints locally to the EntityInfoTermLookup class methods didn't result in being hit
    • Adding a breakpoint to the OPT_LABEL_DESCRIPTION_LOOKUP case in getLabelDescriptionLookup resulted in hits for:
      • getVocabularyUriFormatter (so this code path still gets the terminfo lookup specified)
      • newLabelsProviderEntityIdHtmlLinkFormatter (we just touched / are touching this in fed prop) (it is called by newEntityIdFormatter)
    • All calls to this should be able to use the formatter cache and we remove EntityInfo

Some bookmarks of the code areas I was looking at throughout this

Impact of removal

The current rate of requests to this EntityInfo builder stuff: https://grafana.wikimedia.org/d/u5wAugyik/wikibase-statsdrecordingsimplecache?panelId=2&fullscreen&orgId=1&from=now-1h&to=now
The rate of requests for EntityInfo is essentially the same as the # of parser output generations for wikibase data.
Looking at the cache metrics
In a 12 hour period this is ~120million calls, so 10 million an hour, or 166k per min or ~2700 calls per second
The cache hits come from local server cache, but the misses result in lookups from the terms storage db tables.

We also track the db calls for this code path that may not be needed on https://grafana.wikimedia.org/d/000000548/wikibase-sql-term-storage?orgId=1&refresh=30s
You can see these in the top panel for the "select.EntityInfoBuilder_collectTermsForEntities" metric.
Averaging 4.7k ops a second/minute (TBA confirm this??)

Needless to say that not doing all of this stuff that is not needed should speed things up and reduce memory consumption.
It will also remove an old outdated concept that should have been killed some while ago.

Addshore closed this task as Resolved.Jun 2 2020, 9:22 PM

Ticket created so I can resolve this now