
Don't generate Wikibase html when indexing entities for search
Open, Needs Triage · Public

Description

I noticed that we generate HTML when CirrusSearch requests ParserOutput for Wikibase entities. From profiling locally, I believe this slows down indexing quite a bit.

I don't think full HTML is needed for Wikibase content (except that it might be used for 'text_bytes', but maybe that's not so relevant for entities?). For entities, we build the search text and fields directly from the entity object.
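
To illustrate the idea of building search fields straight from the entity object, here is a minimal Python sketch. It is not the Wikibase or CirrusSearch code: the function name `build_search_doc` and the output field names are hypothetical, and the input is assumed to follow the Wikibase entity JSON layout (`labels`, `descriptions`, `aliases` keyed by language). Deriving `text_bytes` from the plain text instead of rendered HTML is likewise only an assumption about what could replace it.

```python
from typing import Any


def build_search_doc(entity: dict[str, Any]) -> dict[str, Any]:
    """Collect searchable fields from an entity dict without rendering HTML.

    Hypothetical helper: the output field names are illustrative only.
    """
    # Labels and descriptions hold one term per language.
    labels = {lang: t["value"] for lang, t in entity.get("labels", {}).items()}
    descriptions = {
        lang: t["value"] for lang, t in entity.get("descriptions", {}).items()
    }
    # Aliases hold a list of terms per language; flatten them into one list.
    aliases = [
        a["value"] for group in entity.get("aliases", {}).values() for a in group
    ]

    # Plain search text assembled from the terms, no parser involved.
    text = " ".join([*labels.values(), *aliases, *descriptions.values()])

    return {
        "id": entity["id"],
        "labels": labels,
        "descriptions": descriptions,
        "aliases": aliases,
        "text": text,
        # Assumption: size of the plain text stands in for the HTML-derived value.
        "text_bytes": len(text.encode("utf-8")),
    }
```

Called with an entity that has an English label, alias and description, this returns a flat document whose `text` field joins those three strings, and nothing in the path touches HTML generation.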

Event Timeline

aude created this task. May 6 2016, 3:12 AM
Restricted Application added projects: Discovery, Discovery-Search. May 6 2016, 3:13 AM
Restricted Application added subscribers: Zppix, Aklapper.

@EBernhardson says that the performance impact of this is minimal, so whilst it's absolutely correct that this shouldn't be done, it's not necessary for us to chase it down right now.

Deskana moved this task from Uncategorised to Technical on the CirrusSearch board.
aude added a comment. Jun 3 2016, 7:53 PM

@Deskana do you mean *should* be done?

> so whilst it's absolutely correct that this shouldn't be done