Found in T243915. For example, the RDF output of Q7251 loads 252 other entities, called from Wikibase\Rdf\RdfBuilder::resolveMentionedEntities. This won't scale: most of the time spent responding with RDF goes into loading those entities, not to mention the huge memory footprint it causes. The entities are in ExternalStorage, so loading can't be batched (due to the way ES uses consistent hashing), and the I/O this requires is enormous even though the RDF builder doesn't need the other entities in full. It only needs labels, property info, and statements such as "formatter URL", which could also be cached here and there. After talking to devs and the PM, it doesn't seem to be intentional.
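To make the access pattern concrete, here is a toy Python model that only counts backend round trips. CountingStore, load_entity, and load_terms_batch are hypothetical names, not Wikibase APIs. The batched path assumes the labels come from a store that supports bulk lookups, which ExternalStorage itself does not, per the description above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CountingStore:
    """Hypothetical entity store that only counts backend round trips."""

    entities: Dict[str, dict]  # entity id -> full entity document
    round_trips: int = 0

    def load_entity(self, entity_id: str) -> dict:
        # One round trip per entity; the whole document comes back
        # even when the caller only needs a label.
        self.round_trips += 1
        return self.entities[entity_id]

    def load_terms_batch(self, entity_ids: List[str]) -> Dict[str, dict]:
        # One round trip for the whole batch, returning only the labels.
        self.round_trips += 1
        return {eid: self.entities[eid]["labels"] for eid in entity_ids}


def labels_one_by_one(store: CountingStore, mentioned: List[str]) -> Dict[str, dict]:
    # N mentioned entities -> N full entity loads.
    return {eid: store.load_entity(eid)["labels"] for eid in mentioned}


def labels_batched(store: CountingStore, mentioned: List[str]) -> Dict[str, dict]:
    # Same labels, fetched with a single bulk lookup.
    return store.load_terms_batch(mentioned)
```

With 252 mentioned entities, as in the Q7251 case, the first function makes 252 round trips and the second makes one, while returning the same labels.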
Status | Assigned | Task
---|---|---
Resolved | Addshore | T243915 Performance audit of Wikibase/Wikidata, Jan 2020
Duplicate | None | T243950 RDF output of an entity loads all referenced entities, it shouldn't
Event Timeline
> the entities are in ExternalStorage
One note about that: these are most likely retrieved from the WAN cache, not actually from ExternalStorage.
Yeah, and that's not great either; it just moves the problem from one part of the infrastructure to another. Network, bandwidth, I/O, and other costs are still pretty high.
So I did a quick check on this in my volunteer capacity, and the result is really interesting. If you remove the $this->mentionedEntityTracker->entityReferenceMentioned call from EntityIdRdfBuilder::addValue(), the RDF output stays exactly the same (I tried it on mwdebug in production with Q7251; even the hash is the same), but the time spent producing it drops to 1/20th and the memory used drops to one fifth:
- Before the removal: https://performance.wikimedia.org/xhgui/run/view?id=5e31dfb93f3dfab12dff839f
- After the removal: https://performance.wikimedia.org/xhgui/run/view?id=5e496c38bb854457660102b2
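A rough Python model of why the value triple itself does not depend on the tracker call: the mention is only recorded for a later stub-resolution pass. MentionTracker, ToyEntityIdValueBuilder, and resolve_mentioned are hypothetical, simplified stand-ins for the tracker, EntityIdRdfBuilder::addValue(), and RdfBuilder::resolveMentionedEntities, not their actual implementations.

```python
from typing import Callable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class MentionTracker:
    """Records each mentioned entity id once, in first-seen order."""

    def __init__(self) -> None:
        self.mentioned: List[str] = []
        self._seen: Set[str] = set()

    def entity_reference_mentioned(self, entity_id: str) -> None:
        if entity_id not in self._seen:
            self._seen.add(entity_id)
            self.mentioned.append(entity_id)


class ToyEntityIdValueBuilder:
    """Writes the value triple and, optionally, notifies a tracker."""

    def __init__(self, tracker: Optional[MentionTracker] = None) -> None:
        self.tracker = tracker
        self.triples: List[Triple] = []

    def add_value(self, subject: str, predicate: str, entity_id: str) -> None:
        # The triple only needs the id, never the referenced entity.
        self.triples.append((subject, predicate, f"wd:{entity_id}"))
        if self.tracker is not None:
            # This is the call that later leads to loading the entity.
            self.tracker.entity_reference_mentioned(entity_id)


def resolve_mentioned(
    tracker: MentionTracker, load_entity: Callable[[str], dict]
) -> List[dict]:
    # One load per mentioned entity, used only to build stubs.
    return [load_entity(eid) for eid in tracker.mentioned]
```

Dropping the tracker call leaves the value triples identical and skips the loads entirely, which is consistent with the speed-up reported above; it also means no stubs are produced for the mentioned entities.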
I'll make a patch for this and ask @daniel and @Tpt to take a look. Maybe I'm missing something obvious here.
I also tested it with a random string in the URL to bypass Varnish.
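For reproducing this kind of measurement, a small Python sketch that appends a random query parameter so each request gets a distinct URL. The parameter name nocache and the Special:EntityData URL in the usage line are assumptions for illustration. A fresh URL sidesteps the edge cache for that request, but it does not necessarily bypass caching further down the stack, as the later comment about caching suggests.

```python
import secrets
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busting_url(url: str, param: str = "nocache") -> str:
    """Return `url` with a random query parameter appended.

    `param` is an arbitrary, hypothetical name; the server is assumed to
    ignore it, so only the cache key changes.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, secrets.token_hex(8)))  # 16 random hex characters
    return urlunsplit(parts._replace(query=urlencode(query)))


print(cache_busting_url("https://www.wikidata.org/wiki/Special:EntityData/Q7251.ttl"))
```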
Change 572491 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Do not try to load the whole entity because their id is mentioned in RDF
Okay, apparently some caching was indeed hiding the actual difference. The outputs differ now: with this patch, we no longer get things like this:
> wd:Q15442776 a wikibase:Item ;
> rdfs:label "cryptographer"@en ;
> skos:prefLabel "cryptographer"@en ;
> schema:name "cryptographer"@en ;
> rdfs:label "cryptographe"@fr ;
> skos:prefLabel "cryptographe"@fr ;
> schema:name "cryptographe"@fr ;
> rdfs:label "Kryptograph"@de ;
> skos:prefLabel "Kryptograph"@de ;
> schema:name "Kryptograph"@de ;
> rdfs:label "криптограф"@ru ;
> skos:prefLabel "криптограф"@ru ;
> schema:name "криптограф"@ru ;
> rdfs:label "crittografo"@it ;
> skos:prefLabel "crittografo"@it ;
> schema:name "crittografo"@it ;
> rdfs:label "κρυπτογράφος"@el ;
> skos:prefLabel "κρυπτογράφος"@el ;
> schema:name "κρυπτογράφος"@el ;
> rdfs:label "criptógrafo"@es ;
> skos:prefLabel "criptógrafo"@es ;
> schema:name "criptógrafo"@es ;
> rdfs:label "cryptograaf"@nl ;
> skos:prefLabel "cryptograaf"@nl ;
> schema:name "cryptograaf"@nl ;
> rdfs:label "criptógrafo"@pt ;
> skos:prefLabel "criptógrafo"@pt ;
> schema:name "criptógrafo"@pt ;
> rdfs:label "криптограф"@sr ;
> skos:prefLabel "криптограф"@sr ;
> schema:name "криптограф"@sr ;
> rdfs:label "криптограф"@sr-ec ;
> skos:prefLabel "криптограф"@sr-ec ;
> schema:name "криптограф"@sr-ec ;
> rdfs:label "kriptograf"@sr-el ;
> skos:prefLabel "kriptograf"@sr-el ;
> schema:name "kriptograf"@sr-el ;
> rdfs:label "kryptograf"@cs ;
> skos:prefLabel "kryptograf"@cs ;
> schema:name "kryptograf"@cs ;
> rdfs:label "kryptograf"@da ;
> skos:prefLabel "kryptograf"@da ;
> schema:name "kryptograf"@da ;
> rdfs:label "kryptograf"@sv ;
> skos:prefLabel "kryptograf"@sv ;
> schema:name "kryptograf"@sv ;
> rdfs:label "kriptograf"@sl ;
> skos:prefLabel "kriptograf"@sl ;
> schema:name "kriptograf"@sl ;
> rdfs:label "գաղտնագիր"@hy ;
> skos:prefLabel "գաղտնագիր"@hy ;
> schema:name "գաղտնագիր"@hy ;
> rdfs:label "criptògraf"@ca ;
> skos:prefLabel "criptògraf"@ca ;
> schema:name "criptògraf"@ca ;
> rdfs:label "criptograf"@ro ;
> skos:prefLabel "criptograf"@ro ;
> schema:name "criptograf"@ro ;
> rdfs:label "kriptográfus"@hu ;
> skos:prefLabel "kriptográfus"@hu ;
> schema:name "kriptográfus"@hu ;
> rdfs:label "عالم تعمية"@ar ;
> skos:prefLabel "عالم تعمية"@ar ;
> schema:name "عالم تعمية"@ar ;
> rdfs:label "криптограф"@uk ;
> skos:prefLabel "криптограф"@uk ;
> schema:name "криптограф"@uk ;
> rdfs:label "密碼學家"@zh-hk ;
> skos:prefLabel "密碼學家"@zh-hk ;
> schema:name "密碼學家"@zh-hk ;
> rdfs:label "密碼學家"@yue ;
> skos:prefLabel "密碼學家"@yue ;
> schema:name "密碼學家"@yue ;
> rdfs:label "密碼學家"@zh ;
> skos:prefLabel "密碼學家"@zh ;
> schema:name "密碼學家"@zh ;
> rdfs:label "密码学家"@zh-cn ;
> skos:prefLabel "密码学家"@zh-cn ;
> schema:name "密码学家"@zh-cn ;
> rdfs:label "密码学家"@zh-hans ;
> skos:prefLabel "密码学家"@zh-hans ;
> schema:name "密码学家"@zh-hans ;
> rdfs:label "密碼學家"@zh-hant ;
> skos:prefLabel "密碼學家"@zh-hant ;
> schema:name "密碼學家"@zh-hant ;
> rdfs:label "密碼學家"@zh-mo ;
> skos:prefLabel "密碼學家"@zh-mo ;
> schema:name "密碼學家"@zh-mo ;
> rdfs:label "密码学家"@zh-my ;
> skos:prefLabel "密码学家"@zh-my ;
> schema:name "密码学家"@zh-my ;
> rdfs:label "密码学家"@zh-sg ;
> skos:prefLabel "密码学家"@zh-sg ;
> schema:name "密码学家"@zh-sg ;
> rdfs:label "密碼學家"@zh-tw ;
> skos:prefLabel "密碼學家"@zh-tw ;
> schema:name "密碼學家"@zh-tw ;
> rdfs:label "jüfavan"@vo ;
> skos:prefLabel "jüfavan"@vo ;
> schema:name "jüfavan"@vo ;
> rdfs:label "крыптограф"@be ;
> skos:prefLabel "крыптограф"@be ;
> schema:name "крыптограф"@be ;
> rdfs:label "kriptografisto"@io ;
> skos:prefLabel "kriptografisto"@io ;
> schema:name "kriptografisto"@io ;
> rdfs:label "kriptografo"@eu ;
> skos:prefLabel "kriptografo"@eu ;
> schema:name "kriptografo"@eu ;
> rdfs:label "kryptograf"@pl ;
> skos:prefLabel "kryptograf"@pl ;
> schema:name "kryptograf"@pl ;
> rdfs:label "קריפטוגרף"@he ;
> skos:prefLabel "קריפטוגרף"@he ;
> schema:name "קריפטוגרף"@he ;
> rdfs:label "kryptograf"@nb ;
> skos:prefLabel "kryptograf"@nb ;
> schema:name "kryptograf"@nb ;
> rdfs:label "kryptografi"@fi ;
> skos:prefLabel "kryptografi"@fi ;
> schema:name "kryptografi"@fi ;
> rdfs:label "criptógrafu"@ast ;
> skos:prefLabel "criptógrafu"@ast ;
> schema:name "criptógrafu"@ast ;
> rdfs:label "Kryptograph"@lb ;
> skos:prefLabel "Kryptograph"@lb ;
> schema:name "Kryptograph"@lb ;
> rdfs:label "cryptograffwr"@cy ;
> skos:prefLabel "cryptograffwr"@cy ;
> schema:name "cryptograffwr"@cy ;
> rdfs:label "криптограф"@mk ;
> skos:prefLabel "криптограф"@mk ;
> schema:name "криптограф"@mk ;
> rdfs:label "criptografiste"@lfn ;
> skos:prefLabel "criptografiste"@lfn ;
> schema:name "criptografiste"@lfn ;
> rdfs:label "kriptologo"@eo ;
> skos:prefLabel "kriptologo"@eo ;
> schema:name "kriptologo"@eo ;
> rdfs:label "cripteagrafaí"@ga ;
> skos:prefLabel "cripteagrafaí"@ga ;
> schema:name "cripteagrafaí"@ga ;
> schema:description "spécialiste de la cryptographie"@fr,
> "Beruf, der das Verschlüsseln vertrauenswürdiger Information beinhaltet"@de,
> "professione"@it,
> "persona que se especializa en la criptografía"@es,
> "specialist on techniques for secure communication in the presence of third parties"@en,
> "specialista na počítačové šifrování"@cs,
> "ekspert i kryptografi"@da,
> "специалист в области криптографии"@ru,
> "kriptografian aditua dena"@eu,
> "ekspert i kryptografi"@nb,
> "henkilö, joka työkseen tekee uusia salakirjoitusmenetelmiä ja suojaa viestejä salakirjoitukselta"@fi,
> "在第三方面前進行安全通信技術的專家"@zh .
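The quoted output has a regular shape: a type triple, three triples (rdfs:label, skos:prefLabel, schema:name) per label language, and one schema:description object list. A Python sketch that produces that shape from plain dicts of terms; entity_stub_turtle is a hypothetical helper, not Wikibase's RdfBuilder, and it assumes the wd:, wikibase:, rdfs:, skos:, and schema: prefixes are declared elsewhere in the document.

```python
from typing import Dict


def escape_turtle(text: str) -> str:
    # Escape characters that would end or break a short Turtle string literal.
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def entity_stub_turtle(
    entity_id: str,
    entity_type: str,
    labels: Dict[str, str],
    descriptions: Dict[str, str],
) -> str:
    """Build a Turtle stub shaped like the quoted output (hypothetical helper)."""
    lines = [f"wd:{entity_id} a wikibase:{entity_type} ;"]
    for lang, text in labels.items():
        literal = f'"{escape_turtle(text)}"@{lang}'
        # Each label is emitted three times, once per predicate.
        lines += [
            f"    rdfs:label {literal} ;",
            f"    skos:prefLabel {literal} ;",
            f"    schema:name {literal} ;",
        ]
    if descriptions:
        objects = ",\n        ".join(
            f'"{escape_turtle(text)}"@{lang}' for lang, text in descriptions.items()
        )
        lines.append(f"    schema:description {objects} .")
    else:
        # No descriptions: turn the final " ;" into the statement terminator.
        lines[-1] = lines[-1][:-2] + " ."
    return "\n".join(lines)


print(entity_stub_turtle(
    "Q15442776", "Item",
    {"en": "cryptographer", "fr": "cryptographe"},
    {"en": "specialist on techniques for secure communication in the presence of third parties"},
))
```

Every extra label language adds three lines, which is why a stub for a widely translated item grows so quickly.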
Is this intentional? If so, I need to come up with a better solution.
The intent is to provide "stub" information for all mentioned entities, such as their type, label, and description (though maybe not in all languages?). Basically, this is the information needed to generate a minimal human-readable representation of the entity. This is skipped in "dump" mode, and could generally be made optional, though it seems a good default behavior.
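One way the "skipped in dump mode, otherwise optional but on by default" behavior could be expressed, including the open question about restricting stub languages. StubOptions and its fields are hypothetical names for illustration, not existing Wikibase settings.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class StubOptions:
    dump_mode: bool = False      # dumps skip stubs entirely
    include_stubs: bool = True   # stubs on by default, but can be turned off
    languages: Optional[FrozenSet[str]] = None  # None means "all languages"


def should_emit_stubs(opts: StubOptions) -> bool:
    return opts.include_stubs and not opts.dump_mode


def filter_terms(terms: Dict[str, str], opts: StubOptions) -> Dict[str, str]:
    # Keep only the configured languages; with no restriction, copy everything.
    if opts.languages is None:
        return dict(terms)
    return {lang: text for lang, text in terms.items() if lang in opts.languages}
```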
All the info needed for the stubs should be available in database tables (such as the terms table), and it should be possible to (pre-)fetch it using bulk queries. My approach would be to rewrite resolveMentionedEntities() based on EntityInfo. This is complicated, however, by the fact that stubs for some kinds of entities expose additional information by implementing EntityRdfBuilder::addEntityStub(). I think in practice this is only needed for properties, so it could be a hard-coded special case.
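A sketch of that direction under stated assumptions: collect the mentioned ids, fetch stub terms for all of them in one bulk call, and special-case properties with a second bulk call for their extra stub data. The two fetcher callables are hypothetical placeholders for whatever bulk queries would back EntityInfo, and detecting properties by the "P" prefix is a simplification assumed here, not how Wikibase parses entity ids.

```python
from typing import Callable, Dict, Iterable, List

BulkFetcher = Callable[[List[str]], Dict[str, dict]]


def resolve_stub_info(
    mentioned_ids: Iterable[str],
    fetch_terms_bulk: BulkFetcher,
    fetch_property_info_bulk: BulkFetcher,
) -> Dict[str, dict]:
    """Gather stub data for all mentioned entities with at most two bulk calls."""
    ids = list(dict.fromkeys(mentioned_ids))  # de-duplicate, keep first-seen order
    if not ids:
        return {}

    # One bulk query for the terms of every mentioned entity.
    info = {eid: dict(data) for eid, data in fetch_terms_bulk(ids).items()}

    # Hard-coded special case: only properties need extra stub data.
    property_ids = [eid for eid in ids if eid.startswith("P")]
    if property_ids:
        extra = fetch_property_info_bulk(property_ids)
        for pid in property_ids:
            info.setdefault(pid, {}).update(extra.get(pid, {}))
    return info
```

The number of queries stays constant regardless of how many entities are mentioned, instead of growing with each one.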
Change 572491 abandoned by Ladsgroup:
Do not try to load the whole entity because their id is mentioned in RDF
Reason:
Not this way.