System Administrator uses object cache to lower database traffic
Open, Low, Public

Description

"As a System Administrator, I want to use an object cache like Memcache or Redis, to reduce the number of database calls that the MW REST API makes."

This is a user story for introducing object cache into the MW REST API. We've already got some issues with database performance for the history counts, for example.

Evaluate the use of ObjectCache/PoolCounter for each endpoint:

  • /user/{name}/hello
  • /v1/page/{title}/history
  • /v1/page/{title}/history/counts/{type}
  • /v1/revision/{from}/compare/{to}
  • /v1/revision/{id}/bare
  • /coredev/v0/search/page
  • /coredev/v0/page/{title}/links/language

Event Timeline

eprodromou renamed this task from "System Adminstrator uses object cache to lower database traffic" to "System Administrator uses object cache to lower database traffic". Nov 14 2019, 10:11 PM

/user/{name}/hello

I guess we eventually would want to get rid of this?

/v1/page/{title}/history

I'm not sure about this one.

ab -n 100 https://en.wikipedia.org/w/rest.php/v1/page/Tank/history
This is ApacheBench, Version 2.3 <$Revision: 1757674 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking en.wikipedia.org (be patient).....done


Server Software:        mw1273.eqiad.wmnet
Server Hostname:        en.wikipedia.org
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-ECDSA-AES256-GCM-SHA384,256,256
TLS Server Name:        en.wikipedia.org

Document Path:          /w/rest.php/v1/page/Tank/history
Document Length:        3571 bytes

Concurrency Level:      1
Time taken for tests:   4.144 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      459330 bytes
HTML transferred:       357100 bytes
Requests per second:    24.13 [#/sec] (mean)
Time per request:       41.439 [ms] (mean)
Time per request:       41.439 [ms] (mean, across all concurrent requests)
Transfer rate:          108.25 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2    3   0.2      3       4
Processing:    25   39  11.2     36      90
Waiting:       25   39  11.1     36      90
Total:         28   41  11.2     39      93

Percentage of the requests served within a certain time (ms)
  50%     39
  66%     42
  75%     46
  80%     48
  90%     58
  95%     64
  98%     70
  99%     93
 100%     93 (longest request)

So, TL;DR: we get a response within about 40 ms (median 39 ms, mean 41 ms), which is really good. The load on the DB is also marginal; we're not doing anything crazy there. My proposal is not to introduce another level of complexity. If we want to get speedier, we can make better use of Varnish here.
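For reference, the headline numbers in the ab report follow directly from the request count and the total time. A small sketch of that arithmetic, using only values copied from the output above (the variable names are mine):

```python
# Values copied from the ab report above.
complete_requests = 100
time_taken_s = 4.144
concurrency = 1

# "Requests per second" is completed requests divided by wall-clock time.
requests_per_second = complete_requests / time_taken_s

# "Time per request" (mean) is concurrency * time taken / completed requests.
mean_time_per_request_ms = concurrency * time_taken_s / complete_requests * 1000

print(f"{requests_per_second:.2f} req/s")
print(f"{mean_time_per_request_ms:.2f} ms per request")
```

Because the run used a concurrency level of 1, these figures describe sequential latency from a single client. They say little about behaviour under concurrent load.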

/v1/page/{title}/history/counts/{type}

Done in T237430

/v1/revision/{from}/compare/{to}

DifferenceEngine already uses WANObjectCache, and we have a TODO comment about integrating the JSON diff into SlotDiffRenderer/DifferenceEngine, so we will be able to piggyback on that. @Jdlrobson was working on getting inline diffs in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/547057. We should file a separate ticket for adding wikidiff2_inline_json_diff as an option in DifferenceEngine and kill two birds with one stone.

/v1/revision/{id}/bare

All it does is fetch a single revision by its primary ID from the database. Not much data mangling is performed, so I don't think we need additional caching on top.

/coredev/v0/search/page

AFAIK there would be no way of purging the cache. We'd need to purge on every page edit to maintain correctness, so this is out of reach.
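To make the purging problem concrete, here is a toy model, not the real search backend. The dict-based "index", the substring matching, and all function names are hypothetical. The point is only that an edit to any page can change the result of any cached query, so correctness requires dropping the whole cache on every edit.

```python
from typing import Dict, List

# Toy page store and per-query result cache (both hypothetical).
pages: Dict[str, str] = {"Tank": "armoured fighting vehicle"}
search_cache: Dict[str, List[str]] = {}


def search_uncached(query: str) -> List[str]:
    """Naive substring search over all pages, standing in for the real backend."""
    return sorted(title for title, text in pages.items() if query in text)


def search(query: str) -> List[str]:
    """Return cached results for the query, computing them on a miss."""
    if query not in search_cache:
        search_cache[query] = search_uncached(query)
    return search_cache[query]


def edit_page(title: str, text: str, purge_all: bool) -> None:
    """Save a page; optionally drop every cached search result."""
    pages[title] = text
    if purge_all:
        # We cannot tell which cached queries the edit affects, so drop them all.
        search_cache.clear()


first = search("vehicle")
edit_page("Truck", "motor vehicle", purge_all=False)
stale = search("vehicle")  # served from cache, misses the new page
edit_page("Truck", "motor vehicle for cargo", purge_all=True)
fresh = search("vehicle")  # recomputed after the full purge

print(first, stale, fresh)
```

On a wiki with a steady stream of edits, a cache that is cleared on every edit would rarely serve a hit, which is why caching this endpoint does not look worthwhile.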

In conclusion, I think that we need to file a separate ticket for integrating JSON diffs into core properly, and not do anything else.

/coredev/v0/page/{title}/links/language

This depends on the target performance. I think it will perform pretty well as is. Purging this cache would not be very complicated, but I doubt the benefit justifies the added complexity.