Improve GettingStarted data storage strategy
Closed, Declined (Public)

Assigned To

None

Authored By

	aaron
	Feb 15 2017, 8:43 PM

Description

Right now, this extension does reads/writes from any datacenter, and uses redis as a backing store. It also relies on redis persistence to some extent. This usage blocks Luca's proposal of simplifying our twemproxy setup for redis.

Right now, we have two sockets on each apache, one that talks to the local redis pool and the other to the remote datacenter pool. Each listens on a socket with a DC-specific name (though mediawiki-config hides this a bit). Having just one socket for the local pool would simplify the setup. The original reason for having two had to do with a way of using redis for sessions that we no longer plan on using (instead, dynamo or cassandra are contenders).

Perhaps this can be switched to another store (like dynamo over redis). It could act as a canary use case to test dynamo. Such a store would handle locality and replication.

Details

	Subject	Repo	Branch	Lines +/-
	Add number of pages to dump_redis.php	mediawiki/extensions/GettingStarted	master	+9 -4


Related Objects

Mentioned In: T206504: Create a new endpoint which returns articles in need of a description
T158572: Explore how to make a category search that includes sub-categories.
Mentioned Here: T163514: Labs undefined index due to data center switchover

Event Timeline

aaron created this task. Feb 15 2017, 8:43 PM

Restricted Application added a subscriber: Aklapper. Feb 15 2017, 8:43 PM

Mattflaschen-WMF added a project: MediaWiki-extensions-GettingStarted. Feb 15 2017, 9:42 PM

Thanks a lot Aaron for opening this task. I am a bit ignorant about this extension and I don't get the "this extension does reads/writes from any datacenter". From the ops point of view, the Redis cluster in codfw is used only for replication from eqiad, so we don't expect any live traffic to reach that pool directly. Is it a bad assumption?

In T158239#3032796, @elukey wrote:

Thanks a lot Aaron for opening this task. I am a bit ignorant about this extension and I don't get the "this extension does reads/writes from any datacenter". From the ops point of view, the Redis cluster in codfw is used only for replication from eqiad, so we don't expect any live traffic to reach that pool directly. Is it a bad assumption?

It basically maintains a cache of which pages are in which categories, so it can use Redis SRANDMEMBER to select random pages from the categories.

It expects:

To write to Redis master when page edits are made (so from the master data center), or from maintenance/populate_categories.php.
Read from Redis slave, which tracks the master, on any web request.
Redis persistence. It doesn't expect Redis to empty out. If it does, we need to re-run the maintenance script.

Most of the key stuff is in https://phabricator.wikimedia.org/diffusion/EGST/browse/master/RedisCategorySync.php (including the hook listeners). Ping me on IRC or reply here if I can help explain.
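
For readers unfamiliar with the pattern described above, here is a minimal sketch in Python. It is not the extension's code: the class and method names are hypothetical, and plain in-memory sets stand in for Redis so the example runs without a server. The comments note which Redis command each method is meant to mirror.

```python
import random
from collections import defaultdict


class CategoryPageCache:
    """In-memory stand-in for the Redis sets; all names here are hypothetical."""

    def __init__(self):
        # One set of page IDs per category, like one Redis set per key.
        self._sets = defaultdict(set)

    def add_page(self, category, page_id):
        # Mirrors SADD: an edit put the page into the category.
        self._sets[category].add(page_id)

    def remove_page(self, category, page_id):
        # Mirrors SREM: an edit took the page out of the category.
        self._sets[category].discard(page_id)

    def random_pages(self, category, count):
        # Mirrors SRANDMEMBER with a positive count: up to `count`
        # distinct members, fewer if the set is smaller.
        members = list(self._sets.get(category, ()))
        return random.sample(members, min(count, len(members)))


cache = CategoryPageCache()
for page_id in (101, 102, 103, 104):
    cache.add_page("Copyedit", page_id)
cache.remove_page("Copyedit", 104)

picked = cache.random_pages("Copyedit", 2)
print(picked)
```

If the backing store is emptied, every category set comes back empty until it is repopulated, which is why the extension depends on persistence or on re-running the maintenance script.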

elukey added a project: User-Elukey. Feb 22 2017, 2:44 PM

In T158239#3034203, @Mattflaschen-WMF wrote:

In T158239#3032796, @elukey wrote:

Thanks a lot Aaron for opening this task. I am a bit ignorant about this extension and I don't get the "this extension does reads/writes from any datacenter". From the ops point of view, the Redis cluster in codfw is used only for replication from eqiad, so we don't expect any live traffic to reach that pool directly. Is it a bad assumption?

It basically maintains a cache of which pages are in which categories, so it can use Redis SRANDMEMBER to select random pages from the categories.

It expects:

To write to Redis master when page edits are made (so from the master data center), or from maintenance/populate_categories.php.

Read from Redis slave, which tracks the master, on any web request.

Redis persistence. It doesn't expect Redis to empty out. If it does, we need to re-run the maintenance script.

Most of the key stuff is in https://phabricator.wikimedia.org/diffusion/EGST/browse/master/RedisCategorySync.php (including the hook listeners). Ping me on IRC or reply here if I can help explain.

Nice to know, as a matter of fact, this raises a few questions:

where are the redis machines used for this configured?
where is master/slave determined from?

and also:

is this storage model documented anywhere outside of the code?

Answering some of my questions:

this uses the $wgObjectCaches['redis_master'] and $wgObjectCaches['redis_slave'] variables, which default to:

The local DC for the slave
The MW master DC for the master

So in eqiad both slave and master are the same servers, reachable via the nutcracker socket.
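
A rough illustration of the defaulting rule just described, written in Python rather than taken from mediawiki-config. The function name and the returned dictionary are assumptions made for this sketch. Only the rule itself (slave uses the local DC, master uses the MW master DC) comes from the comment above.

```python
def resolve_redis_pools(local_dc, master_dc):
    """Hypothetical helper: which datacenter each cache entry points at."""
    return {
        "redis_master": master_dc,  # writes go to the MW master DC
        "redis_slave": local_dc,    # reads stay in the local DC
    }


# With eqiad as the MW master DC:
in_eqiad = resolve_redis_pools(local_dc="eqiad", master_dc="eqiad")
in_codfw = resolve_redis_pools(local_dc="codfw", master_dc="eqiad")

print(in_eqiad)
print(in_codfw)
```

Under this rule the two entries only diverge for a request served outside the master DC, which is the case that needs the second, remote-pool socket.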

I think this is acceptable, but I was honestly not aware we were using redis for storing other things besides the jobqueue data.

May I ask what is special about redis so that it was preferred to other means of storage?

Persistency on redis (and replication) has always been guaranteed as best-effort, which doesn't seem to be great in this case.

I'm also pretty sure both me and ori changed the topology of the redis cluster many times assuming we could live with losing some keys, it might not be the case for this extension.

I would love to rethink where we store this data, maybe using a dedicated instance.

@Mattflaschen-WMF any idea how much data are we storing for this extension?

elukey moved this task from Backlog to Ops Backlog on the User-Elukey board. Feb 23 2017, 1:07 PM

Note, this may be related to T163514: Labs undefined index due to data center switchover.

In T158239#3046783, @Joe wrote:

is this storage model documented anywhere outside of the code?

As far as I know, no.

In T158239#3046817, @Joe wrote:

May I ask what is special about redis so that it was preferred to other means of storage?

It was implemented by Ori, so I'm not the best person to ask. But I think the main nice thing is SRANDMEMBER. We can make as many sets as we want, and easily choose an arbitrary number of random elements from each set. The other related set-manipulation code is simple too.

Persistency on redis (and replication) has always been guaranteed as best-effort, which doesn't seem to be great in this case.

I didn't know it was not persistent in production. We have seen problems with Redis emptying on the Beta Cluster, though.

@Mattflaschen-WMF any idea how much data are we storing for this extension?

Very little.

2,505 page IDs. I don't know exactly how Redis stores strings, or how to explicitly request size. So as an upper bound, assume each page ID is 8 bytes, plus 8 for a length.

So:

2505 * (8 + 8) = 40080 bytes ≈ 40 KB

There's additional overhead for the set structures themselves, etc., but it's negligible space currently.
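
The same back-of-the-envelope estimate as a few lines of Python, in case someone wants to redo it with a different page count. The 8-byte ID and 8-byte length figures are the upper-bound assumptions stated above, not measured Redis overhead, and the variable names are only illustrative.

```python
PAGE_IDS = 2505        # page IDs currently stored
BYTES_PER_ID = 8       # assumed upper bound per page ID
BYTES_PER_LENGTH = 8   # assumed upper bound for the length field

upper_bound_bytes = PAGE_IDS * (BYTES_PER_ID + BYTES_PER_LENGTH)
upper_bound_kb = upper_bound_bytes / 1000  # decimal kilobytes

print(f"{upper_bound_bytes} bytes, about {upper_bound_kb:.0f} KB")
```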

This does not cover all wikis, so we can expect it to expand, but unless the features of the extension change, not radically.

The small list of wikis currently covered by this feature is at https://phabricator.wikimedia.org/source/mediawiki-config/browse/master/dblists/gettingstarted-with-category-suggestions.dblist. Changes to this list would be the main source of expansion currently.

Change 351098 had a related patch set uploaded (by Mattflaschen; owner: Mattflaschen):
[mediawiki/extensions/GettingStarted@master] Add number of pages to dump_redis.php

https://gerrit.wikimedia.org/r/351098

gerritbot added a project: Patch-For-Review. Apr 30 2017, 3:16 AM

Change 351098 merged by jenkins-bot:
[mediawiki/extensions/GettingStarted@master] Add number of pages to dump_redis.php

https://gerrit.wikimedia.org/r/351098

ReleaseTaggerBot added a project: MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)). Apr 30 2017, 4:00 AM

The task seems stuck, so I am trying to get up to speed:

In T158239#3034203, @Mattflaschen-WMF wrote:

It basically maintains a cache of which pages are in which categories, so it can use Redis SRANDMEMBER to select random pages from the categories.

It expects:

To write to Redis master when page edits are made (so from the master data center), or from maintenance/populate_categories.php.

Read from Redis slave, which tracks the master, on any web request.

Redis persistence. It doesn't expect Redis to empty out. If it does, we need to re-run the maintenance script.

In T158239#3046817, @Joe wrote:

Answering some of my questions:

this uses the $wgObjectCaches['redis_master'] and $wgObjectCaches['redis_slave'] variables, which default to:

The local DC for the slave

The MW master DC for the master

So in eqiad both slave and master are the same servers, reachable via the nutcracker socket.

As far as I understand, GettingStarted "thinks" that it is using two datacenters (the one where the Redis master is and the one where the slaves are), but in reality it uses one. Do we need to make this explicit in the code, or is it safe to proceed with the simplification of nutcracker's config (namely, having only local Redis shards listed in each DC)?

Mattflaschen-WMF mentioned this in T158572: Explore how to make a category search that includes sub-categories. Jun 17 2017, 2:04 AM

Mattflaschen-WMF moved this task from Untriaged to External on the Collaboration-Team-Triage board. Jun 20 2017, 5:22 PM

elukey moved this task from Ops Backlog to Stalled on the User-Elukey board. Jun 23 2017, 10:24 AM

elukey moved this task from Stalled to Keep an eye on it on the User-Elukey board. Aug 4 2017, 3:10 PM

Restricted Application added a project: Growth-Team. Aug 30 2018, 1:28 PM

JTannerWMF removed a project: Growth-Team. Sep 6 2018, 10:54 AM

Restricted Application added a project: Growth-Team. Sep 6 2018, 10:54 AM

JTannerWMF moved this task from Inbox to Needs Discussion on the Growth-Team board. Sep 24 2018, 4:02 AM

Catrope added a project: Technical-Debt. Oct 9 2018, 3:42 AM

Catrope moved this task from Needs Discussion to Triaged but Future on the Growth-Team board.

Mholloway subscribed. Jan 10 2019, 5:21 PM

Mholloway mentioned this in T206504: Create a new endpoint which returns articles in need of a description. Jan 10 2019, 5:46 PM

jijiki added a project: User-jijiki. Jan 23 2019, 7:42 AM

jijiki subscribed.

elukey removed a project: User-Elukey. Apr 16 2019, 11:01 AM

jijiki removed a project: User-jijiki. Sep 8 2020, 10:22 AM

MBinder_WMF added a project: Growth-Team-Filtering. Apr 15 2021, 6:56 PM

Aklapper removed a project: Collaboration-Team-Triage. May 25 2021, 9:09 PM

Pppery removed a project: Patch-For-Review. Apr 1 2023, 8:17 PM

MediaWiki-extensions-GettingStarted has been removed from Wikimedia wikis and is being archived per T292654. Declining this task to reflect reality.

For related use cases, see e.g. GrowthExperiments-NewcomerTasks or #GuidedTour instead.
