Audit all existing code to ensure that any extension currently or previously adding blobs to ExternalStore has been registering a reference in the text table (and fix up if wrong)
Open, Needs Triage, Public

Description

Audit all existing code to ensure that any extension currently or previously adding blobs to ExternalStore has been registering a reference in the text table (and fix up if wrong)

Event Timeline

Jdforrester-WMF updated the task description.
Jdforrester-WMF raised the priority of this task from to Needs Triage.
Jdforrester-WMF renamed this task from Audit all existing code to ensure that any extension currently or previously adding blobs to ES has been registering a reference in the text table to Audit all existing code to ensure that any extension currently or previously adding blobs to ES has been registering a reference in the text table (and fix up if wrong).
Jdforrester-WMF set Security to None.
Jdforrester-WMF added subscribers: greg, Legoktm, jcrespo and 4 others.

I see usage of the ExternalStore class in AbuseFilter and Flow. AbuseFilter is already updating the text table AFAIS.

There is, however, a known bug (T34478) that would cause recompressTracked.php to corrupt the stored data. Despite a subsequent change in storage format (rEABF42bd0d84f4244ca2), the bug has still not been fixed.

mark added a subscriber: mark. Jul 22 2015, 11:37 AM

AIUI: the immediate space problem would first be solved by buying new hardware & moving all data to larger disks, right? Flow is not blocking that.

After that, the plan would be to recompress all existing ExternalStore entries by running trackBlobs.php and recompressTracked.php. Flow is not blocking that.
After that, we should be able to decommission the old (uncompressed) clusters, as all data will have been recompressed and moved over. This is blocked by Flow: since Flow doesn't store references in the text table, its entries would not be recompressed to the new cluster, and would be lost once we get rid of the old clusters.

I suggest first moving Flow's ExternalStore entries away from the shared ExternalStore clusters and into its own ExternalStore DB. The script to do that is mostly done already.
Are there any good reasons not to set up a new ExternalStore cluster specific to Flow data (and possibly others), where trackBlobs.php and recompressTracked.php don't run, and move the existing Flow ExternalStore entries there?
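
To make the risk above concrete, here is a minimal sketch of the kind of check the audit implies: given the blob addresses present on a cluster and the rows of the text table, list the blobs that no text row points at. It is an illustration only, not what trackBlobs.php does. The `TextRow` class and both function names are hypothetical, and the sketch assumes that a text row references a blob when its `old_flags` contain `external` and its `old_text` holds the blob address as a plain string such as `DB://cluster1/101`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRow:
    """Hypothetical, simplified view of one row of the text table."""
    old_id: int
    old_text: str   # inline text, or a blob address when flagged "external"
    old_flags: str  # comma-separated flags, e.g. "external,utf-8"


def referenced_addresses(rows):
    """Collect blob addresses that the text table points at."""
    refs = set()
    for row in rows:
        flags = {f.strip() for f in row.old_flags.split(",") if f.strip()}
        if "external" in flags:
            refs.add(row.old_text)
    return refs


def find_untracked(blob_addresses, rows):
    """Return blob addresses with no reference in the text table.

    These are the entries that a tracking pass driven by the text
    table would miss (assumption: addresses compare as plain strings).
    """
    return sorted(set(blob_addresses) - referenced_addresses(rows))


if __name__ == "__main__":
    text_rows = [
        TextRow(1, "DB://cluster1/101", "external,utf-8"),
        TextRow(2, "inline revision text", "utf-8"),
    ]
    blobs = ["DB://cluster1/101", "DB://cluster1/102"]
    # DB://cluster1/102 stands in for a blob written by an extension
    # that never registered it in the text table.
    print(find_untracked(blobs, text_rows))  # ['DB://cluster1/102']
```

Any address reported by `find_untracked` would need either a text table reference added or a move to a cluster that is not subject to recompression before the old clusters go away.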

This comment was removed by ArielGlenn.

Who might be able to take on the recompressTracked.php problem?

@ArielGlenn I already talked to devs about external storage flow management and these were my thoughts: T107610#1506876.

Too many tickets to keep track of :-).

For everybody: Please note that even if new hardware is a blocker for the actual migration, things should be prepared by the time it arrives (I understand this is not an easy topic, though).

yeah I saw and already removed my comment :-)

akosiaris triaged this task as Normal priority. Aug 25 2015, 1:06 PM
akosiaris added a subscriber: akosiaris.
greg added a comment. Sep 10 2015, 4:54 AM

Is the list of blockers here comprehensive (iow: have we audited and found all the culprits)?

It looks like @jcrespo is driving this, yes?

hashar renamed this task from Audit all existing code to ensure that any extension currently or previously adding blobs to ES has been registering a reference in the text table (and fix up if wrong) to Audit all existing code to ensure that any extension currently or previously adding blobs to ExternalStore has been registering a reference in the text table (and fix up if wrong). Sep 10 2015, 7:43 AM
hashar updated the task description.
jcrespo closed this task as Declined. May 16 2017, 3:42 PM
jcrespo added a subscriber: brion.

This is assigned to DBAs-operations. DBAs are not going to audit any kind of mediawiki code == declined. Jdforrester-WMF: feel free to reopen or create a new ticket, but assign it to the right team. Not doing this could break all mediawiki content, though. CC MediaWiki-Platform-Team, cc @brion, as this is probably related to the revision table reworking (it just makes no sense to keep it open as is).

This is assigned to DBAs-operations.

Not true. It wasn't assigned to anyone. This task was created as a split out of your task, T106386: Compress data at external storage, at your instigation a couple of years ago. Did you wish to decline that task instead?

DBAs are not going to audit any kind of mediawiki code

Indeed.

== declined.

Has the need gone away?

No, this is very much needed, but the way I use phabricator for DBA tickets is: if they are assigned to us and I cannot do anything about them, I decline them. Anyone else can reopen and reassign them or start working on them. Otherwise it would give the reporter the wrong expectation that it is on our backlog. There is one exception: if it is assigned to some other project, I move it to "blocked external". Not declining it means it will be lost on my backlog! :-)

I am cool with other people using phabricator differently, in which case, just delete the DBA and #operation tags and put it back to being untriaged.

jcrespo reopened this task as Open.

^I would be ok with this, for example.

jcrespo raised the priority of this task from Normal to Needs Triage. May 16 2017, 4:19 PM

Despite my particular usage of phabricator, there should probably be a way to say: "hey, this is interesting to you and you should be aware of it, but I or someone else will do it" vs. "hey, can you do this if/when you have the time? yep, I cannot say when I will be doing this, but I will own this" :-D I use a separate column for that, but it may not be clear in all cases, especially in the "support" vs. "development" way of doing things.

Generally we put them in a different column, yes – "Watching" or "External" or whatever. But it can be confusing for anyone not in the team, as it's not obvious whether the tag on the task means "Foo are going to do this, you can ignore it" or "Foo are anxious that you get this done right now!" or anything in between. :-)

Per the above comments, adding the Platform team tag.