I'm creating this task to evaluate our indexing strategy.
It's also related to some tasks we are currently discussing.
Today we index a semi-structured document with a few fields:
But I think we lose important information when merging data into these fields. Examples:
We should maybe not merge all the outgoing links into the same field.
This is somewhat related to a few ongoing discussions:
- pagerank (see Fwd: Wikipedia PageRank thread in internal mailing lists)
- Wishlist : Improve Link Search
Some redirects contain interesting information that we could re-use to display better suggestions:
See https://en.wikipedia.org/wiki/Template:R_from_misspelling or https://en.wikipedia.org/wiki/Template:R_from_incorrect_name
It's just an example to illustrate the idea; the impact may not be worth the effort...
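To make the redirect idea a bit more concrete, here is a rough sketch of grouping redirects by the template found in their wikitext instead of merging them all into one list. It is only an illustration: the function names, the kind labels and the sample titles are made up, and the regex is a naive template matcher, not a real wikitext parser.

```python
import re

# Hypothetical mapping from redirect templates to a redirect "kind".
# Only the two templates linked above come from this task; the labels are assumptions.
REDIRECT_KINDS = {
    "r from misspelling": "misspelling",
    "r from incorrect name": "incorrect_name",
}

# Naive matcher for {{Template name}} or {{Template name|args}}.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|}]+?)\s*(?:\|[^}]*)?\}\}")


def classify_redirect(wikitext):
    """Return the redirect kind found in a redirect page's wikitext, or 'other'."""
    for name in TEMPLATE_RE.findall(wikitext):
        key = name.replace("_", " ").strip().lower()
        if key in REDIRECT_KINDS:
            return REDIRECT_KINDS[key]
    return "other"


def group_redirects(redirects):
    """Group (title, wikitext) pairs by kind instead of merging them into one list."""
    grouped = {}
    for title, wikitext in redirects:
        grouped.setdefault(classify_redirect(wikitext), []).append(title)
    return grouped


# Made-up sample data, for illustration only.
sample = [
    ("Albert Einstien", "#REDIRECT [[Albert Einstein]]\n{{R from misspelling}}"),
    ("A. Einstein", "#REDIRECT [[Albert Einstein]]"),
]
grouped = group_redirects(sample)
```

Each group could then go into its own field, so that misspelling redirects could be treated differently from the others when building suggestions.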
Should we consider all the categories in the same way?
Some are displayed on the page and some are hidden. Why?
I don't have an example here, but I think we could extract interesting features by parsing wikitext.
This is a very broad task and phab may not be the best place to discuss it, so feel free to delete it.