Page MenuHomePhabricator

Joins on INTEGER and TEXT fail with PostgreSQL
Closed, ResolvedPublic

Description

SpecialAbuseLog does three "LEFT JOIN"s on "af_id=afl_filter" with af_id being INTEGER and afl_filter being TEXT. These fail on PostgreSQL.

MySQL is very forgiving on such joins and for example matches 4711 with "4711 mysql sucks", but PostgreSQL does not. For AbuseFilter, afl_filter seems to be either a reference to af_id or a reference to af_id with "global-" prepended, so there shouldn't be a need for complete imitation. There is no explanation for this data model which must make JOINs even on MySQL very expensive and the needed storage unnecessarily huge.

Short term solution would be something like "wfGetDB( DB_MASTER )->getType() == 'mysql' ? 'af_id=afl_filter' : 'af_id::TEXT=afl_filter'" (not the other way round as that fails on non-convertable strings), proper way would be of course to split afl_filter into afl_filter_id and afl_filter_global or similar and let the database do what it's best at.

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 12:59 AM
bzimport added a project: AbuseFilter.
bzimport set Reference to bz40757.
bzimport added a subscriber: Unknown Object (MLST).

High + Blocker, but no progress since it was reported, a year and half ago?

Is anybody currently planning to work on this?

Lowering priority - PostgreSQL support is maintained by volunteers and patches are welcome...

There is a new Database::buildStringCast which can be used here - https://gerrit.wikimedia.org/r/#/c/296066/

Jdforrester-WMF subscribed.

Migrating from the old tracking task to a tag for PostgreSQL-related tasks.

Daimona subscribed.

Right now we have a total of five cases where such a JOIN is done. Schema changes aren't something to take carelessly, but this column doesn't really make sense. The table is pretty large though (for instance, 22 millions rows on enwiki). I guess we should first add the new column, start writing it, and reading from both. Then clean the afl_filter column with a maintenance script and finally change it to be INT.
Adding DBA for thoughts about the process above.

Marostegui subscribed.

We don't maintain Postresql
Sorry :-(
Me personally I don't even have experience with Postresql either

@Marostegui Unfortunately, neither do I :-) However, PG is just the cause here. What we want to do (and I'm asking DBAs about) is perform a schema change on abuse_filter_action. Simply, we'd like to add a column (afl_filter_global) and ALTER the afl_column to be integer instead of text, for every DBMS. As I was saying, the table isn't that small, so we need to make sure that the process will be smooth.

@Marostegui Unfortunately, neither do I :-) However, PG is just the cause here. What we want to do (and I'm asking DBAs about) is perform a schema change on abuse_filter_action. Simply, we'd like to add a column (afl_filter_global) and ALTER the afl_column to be integer instead of text, for every DBMS. As I was saying, the table isn't that small, so we need to make sure that the process will be smooth.

Why don't you just add a new INT column, start writing to both, migrate rows from the old one to the new one with a maintenance script running on the background and once all the old rows have been migrated, drop the old TEXT one and only use the new INT one?

@Marostegui Perfect. I guess we should seize the opportunity and rename afl_filter to afl_filter_id to make sure that it's INT. At any rate, I think this should be momentarily delayed, since we're already dealing with several big changes to AF (actor migration, major overhaul for throttling and major overhaul for profiling). IMHO they should have the priority over this change, which (although necessary for performance etc.) is harmless on MySQL/MariaDB.

@Marostegui Perfect. I guess we should seize the opportunity and rename afl_filter to afl_filter_id to make sure that it's INT.

That is up to you, just include it on the schema change once you are ready for it.
https://wikitech.wikimedia.org/wiki/Schema_changes#Workflow_of_a_schema_change

Change 459818 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] [WIP] Split afl_filter in afl_filter_id and afl_global

https://gerrit.wikimedia.org/r/459818

As there are no more actionables for the DBA Team here I will untag it.
I will remain subscribed here though in case you have further questions.
Once this is good to go just send the schema change ticket as described in the link in my previous comment above and we will take it from there.

Thanks!

@Marostegui Yep, many thanks :-) I'll go ahead with that as soon as we'll be ready.

Change 468706 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Use string cast for Postgres compatibility

https://gerrit.wikimedia.org/r/468706

Change 488477 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Read afl_filter_id and afl_global instead of afl_filter

https://gerrit.wikimedia.org/r/488477

Change 493240 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Start writing to afl_filter_id and afl_global

https://gerrit.wikimedia.org/r/493240

Change 468706 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Use string cast for Postgres compatibility

https://gerrit.wikimedia.org/r/468706

Change 459818 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Split afl_filter in afl_filter_id and afl_global

https://gerrit.wikimedia.org/r/459818

Change 493240 abandoned by Daimona Eaytoy:
Start writing to afl_filter_id and afl_global

Reason:
Done in parent patch with a config option.

https://gerrit.wikimedia.org/r/493240

Change 488477 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Read afl_filter_id and afl_global instead of afl_filter

https://gerrit.wikimedia.org/r/488477

Change 488477 abandoned by Daimona Eaytoy:
Read afl_filter_id and afl_global instead of afl_filter

Reason:
Done in parent using schema flag.

https://gerrit.wikimedia.org/r/488477

Change 468706 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Use string cast for Postgres compatibility

https://gerrit.wikimedia.org/r/468706

Daimona removed a project: Patch-For-Review.

Calling this resolved because the compatibility problem with postgres is indeed resolved. I'm gonna open another task for the long-term goal of splitting afl_filter in afl_filter_id and afl_global.