
Document CirrusSearch schema
Closed, Resolved, Public

Description

To make cloudelastic useful to developers, we need to tell them what is actually in there. Document the schema somewhere (wikitech? mw.org?). This schema documentation will also be useful for the CirrusSearch dumps.

Information that should probably be included about each property:

  • What is this property?
  • In which ways is it analyzed?
  • How stable is it?

We likely also need some documentation at a higher level than the individual properties. The various analysis chain variants that are applied (keyword, near_match, near_match_asciifolding, plain, prefix, prefix_asciifolding, trigram, etc.) will need to be documented as well.
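
As a rough illustration of the per-property information listed above, here is a minimal Python sketch of one possible record layout. The class name `PropertyDoc`, its fields, and the wikitext table row format are assumptions made for illustration only; nothing here reflects how the schema page is actually generated, and the variant list is just the one named in this task (it ends in "etc.", so it is likely incomplete).

```python
from dataclasses import dataclass, field

# Analysis chain variants named in this task; the real list is likely longer.
KNOWN_VARIANTS = (
    "keyword", "near_match", "near_match_asciifolding",
    "plain", "prefix", "prefix_asciifolding", "trigram",
)


@dataclass
class PropertyDoc:
    """One documented property (hypothetical record layout)."""
    name: str
    description: str                                   # What is this property?
    variants: list[str] = field(default_factory=list)  # In which ways is it analyzed?
    stability: str = "unknown"                         # How stable is it?

    def unknown_variants(self) -> list[str]:
        """Return variants that are not in the list named in this task."""
        return [v for v in self.variants if v not in KNOWN_VARIANTS]

    def to_wikitext_row(self) -> str:
        """Render the record as a single wikitext table row."""
        variants = ", ".join(self.variants) or "none listed"
        return f"|-\n| {self.name} || {self.description} || {variants} || {self.stability}"


# "example_field" is a placeholder, not a real CirrusSearch property.
doc = PropertyDoc("example_field", "Placeholder description", ["plain", "trigram"], "stable")
print(doc.to_wikitext_row())
```

Keeping the three questions as explicit fields would make it easy to spot properties whose analysis or stability is still undocumented.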

Event Timeline

Created a horrible first draft that lists most of the properties and provides a short description for the ones used across most wikis. We should figure out how we want to format this before going much further:

https://www.mediawiki.org/wiki/Extension:CirrusSearch/Schema

The documentation isn't amazing, but I feel like we need user feedback to improve it much further. I will wait and see what people ask about, then update the parts people are curious about.

https://meta.wikimedia.org/wiki/Data_dumps/Misc_dumps_format#Cirrus_search_index_dumps

This is kind of pulled out of my *ss, but it might be worth stealing some of it. I needed it for documenting the dumps formats.

Thanks! I made some minor corrections. I suppose I could also add a link to wikitech Extension:CirrusSearch/Schema?

Thanks!

Sure, just add it right before the sample excerpt, as we do for the content translation dumps section.

Krinkle removed a subscriber: Krinkle. (May 27 2019, 6:10 PM)
debt closed this task as Resolved. (May 28 2019, 11:53 PM)