
Find key patterns in Redis keys that indicate components using MainStash
Closed, Resolved · Public · 5 Estimated Story Points


We should be able to map key patterns in Redis to the MediaWiki components that use MainStash, as identified in T228309.

This task will involve dumping the keys from Redis, grouping them by some characteristic (usually a namespacing prefix followed by ":") and matching those to components we know about from code analysis.
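As a rough illustration of the grouping step, the sketch below counts keys by the text before the first ":". It assumes the dump is available as one key per line; the function name `prefix_counts` and the sample keys are made up for this example and are not taken from the real data.

```python
from collections import Counter


def prefix_counts(keys):
    """Count keys by the text before the first ':'.

    A key with no ':' is counted under its full text.
    """
    return Counter(key.split(":", 1)[0] for key in keys)


# Made-up sample keys, for illustration only.
sample = [
    "enwiki:example-component:123",
    "enwiki:example-component:456",
    "global:other-component:abc",
    "no-colon-key",
]

# Print "count,prefix" rows, most common prefix first.
for prefix, count in prefix_counts(sample).most_common():
    print(f"{count},{prefix}")
```

Grouping on the first segment alone is probably too coarse if that segment is a wiki identifier, so in practice the prefix may need to span more than one segment.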

If there are patterns that don't match components we know about, we need to identify those components.

Event Timeline

WDoranWMF triaged this task as Medium priority.
WDoranWMF subscribed.

For reference, this was done previously for the top 7 parameters.

It would be helpful if @jijiki could share any tools or code they used for that previous analysis.

WDoranWMF set the point value for this task to 5. (Jul 23 2019, 1:06 PM)

One algorithm for this might be the following:

  • Make a hashtable for the pattern counts, mapping a pattern string to an integer
  • For each line:
    • pattern = line
    • Replace substrings in pattern that match the code for one of our wikis (like "en_wp") with "(WIKI)"
    • Replace everything after the last colon in pattern with "(ID)"
    • (Other transforms here)
    • Increment the count for this pattern in the big hashtable (or set it to 1 if it does not exist yet)
  • Dump comma-separated values (CSV) of hashtable to stdout as "count,pattern"

We can stop adding transforms once the dumped CSV is shorter than 1000 lines. I can deal with any remaining dupes manually.
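The steps listed above could be sketched in Python roughly as follows. This is a minimal sketch, not a finished tool: `WIKI_CODES` is a hypothetical stand-in for the real list of wiki codes, the function names are invented for the example, and only the two transforms spelled out above are implemented.

```python
import csv
import re
import sys
from collections import Counter

# Hypothetical wiki codes; the real list would come from the wiki configuration.
WIKI_CODES = ["en_wp", "de_wp"]

# Try longer codes first so a short code never shadows a longer one.
WIKI_RE = re.compile(
    "|".join(re.escape(code) for code in sorted(WIKI_CODES, key=len, reverse=True))
)


def to_pattern(key):
    """Reduce one Redis key to a pattern string."""
    # Replace wiki codes with a placeholder.
    pattern = WIKI_RE.sub("(WIKI)", key)
    # Replace everything after the last colon with "(ID)".
    head, sep, _tail = pattern.rpartition(":")
    if sep:
        pattern = head + ":(ID)"
    # (Other transforms would go here.)
    return pattern


def count_patterns(lines):
    """Map each pattern to the number of keys that reduce to it."""
    counts = Counter()
    for line in lines:
        key = line.rstrip("\n")
        if key:
            counts[to_pattern(key)] += 1
    return counts


def dump_csv(counts, out):
    """Write "count,pattern" rows, most common pattern first."""
    writer = csv.writer(out)
    for pattern, count in counts.most_common():
        writer.writerow([count, pattern])


# Made-up sample keys, for illustration only.
sample = ["en_wp:foo:1", "de_wp:foo:2", "en_wp:bar:baz:3", "plainkey"]
dump_csv(count_patterns(sample), sys.stdout)
```

To run it against a real dump, an open file with one key per line can be passed to `count_patterns` in place of `sample`. The `csv` module quotes any pattern that itself contains a comma, so the output stays parseable.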