Page MenuHomePhabricator

Create SQL database and Tables for Cognate extension to be used on Wiktionaries
Closed, ResolvedPublic

Description

The Cognate extension requires an additional database to be created with tables that will be shared between all wiktionaries.
The extension is already running on beta, a database named cognate_wiktionary has been created on the extension1 cluster.
I imagine it would make sense to carry this over into production (as long as the DBA team approve)
The extension is scheduled for deployment on the 24th April (see parent ticket).

Thus:

1 Database must be created on the decided cluster named 'cognate_wiktionary'
3 Tables must be created, SQL can be found @ https://phabricator.wikimedia.org/diffusion/1890/browse/master/db/

Related Objects

Event Timeline

Addshore created this task.
Addshore updated the task description. (Show Details)

As per T148988#2742029 I assume this is all public info and no further filtering is required when replicating to labs.
Most of the *wiktionary databases currently live in s3 so I would assume we need to create the new database and the 3 tables there too?

As per T148988#2742029 I assume this is all public info and no further filtering is required when replicating to labs.

Indeed, everything in these tables is public information and can be replicated to labs, no filtering required.

Most of the *wiktionary databases currently live in s3 so I would assume we need to create the new database and the 3 tables there too?

Well, the database can live anywhere / on any 'cluster' configured on mediawiki and the extension will work just fine.
If you feel that s3 is the best place for it then we can put it there!

So that we are not a blocker- creation of tables in production, specially if we have already given the OK to the plans, is not considered a schema change, so anyone with production rights can do it- you just need to mark it on the deployments calendar. With this we (DBAs) are not saying we do not want to do it, but you will have to wait for us to have a free slot, while many other people with development production rights may be able to do it better and earlier.

@Marostegui This is going to x1 as it is a shared thing between wikis- we may need to check replication filters.

So that we are not a blocker- creation of tables in production, specially if we have already given the OK to the plans, is not considered a schema change, so anyone with production rights can do it- you just need to mark it on the deployments calendar. With this we (DBAs) are not saying we do not want to do it, but you will have to wait for us to have a free slot, while many other people with development production rights may be able to do it better and earlier.

@Marostegui This is going to x1 as it is a shared thing between wikis- we may need to check replication filters.

Ah @jcrespo thanks for the clarification!
This sentence actually confused me The extension is already running on beta, a database named cognate_wiktionary has been created on the extension1 cluster. and forgot to ask you past Thursday in our meeting, I had it noted down for Monday actually :)

To clarify- it may be blocked on us right now to create the database and because labs filtering is not well managed, but the general idea stays for normal table creations.

This sentence actually confused me

x1 == extension1 :-)

A quick look on the x1 production hosts (thanks Jaime for the clarification) shows no replication filters so for those, we should be fine to create all on the master and let it replicate.
As Jaime mentioned, for labs we might need to manually intervene there, if creating the database+tables does not replicate well to labs because of all the filtering.

production hosts

Was thinking on dbstore (backup) hosts, which were problematic (remember you where the ones to set up those last time) + private table filtering. x1 has been traditionally not replicated to labs, this can be challenging (I would start by not replicationg at all, but setting the filters accordingly). Also reviewing the grants for the wikiuser and wikiadmin users.

Yup, we can leave labs aside for the "first stage" and then study those without blocking this task I would say.

@Addshore I hope labs access is not a blocker for this, that can be done at a later date.

@Addshore I hope labs access is not a blocker for this, that can be done at a later date.

Nope, not a blocker for this.

Hello,

So I have created the db and the 3 tables in the following production hosts (without replication):

root@neodymium:/home/marostegui/git/software/dbtools# cat x1.hosts
dbstore2001.codfw.wmnet	3306
tempdb2001.codfw.wmnet	3306
db2033.codfw.wmnet	3306
dbstore1001.eqiad.wmnet	3306
dbstore1002.eqiad.wmnet	3306
db1029.eqiad.wmnet	3306
db1031.eqiad.wmnet	3306

The dbstore servers, for the x1 replication thread will need to be modified, if we really think it is necessary to replicate this DB and its data as they have:

Replicate_Wild_Do_Table: flowdb.%,wikishared.%,heartbeat.%
root@neodymium:/home/marostegui/git/software/dbtools# for i in `cat x1.hosts | awk -F " " '{print $1}'`; do echo $i; mysql --skip-ssl -h$i cognate_wiktionary -e "show tables;";done
dbstore2001.codfw.wmnet
+------------------------------+
| Tables_in_cognate_wiktionary |
+------------------------------+
| cognate_pages                |
| cognate_sites                |
| cognate_titles               |
+------------------------------+
tempdb2001.codfw.wmnet
+------------------------------+
| Tables_in_cognate_wiktionary |
+------------------------------+
| cognate_pages                |
| cognate_sites                |
| cognate_titles               |
+------------------------------+
db2033.codfw.wmnet
+------------------------------+
| Tables_in_cognate_wiktionary |
+------------------------------+
| cognate_pages                |
| cognate_sites                |
| cognate_titles               |
+------------------------------+
dbstore1001.eqiad.wmnet
+------------------------------+
| Tables_in_cognate_wiktionary |
+------------------------------+
| cognate_pages                |
| cognate_sites                |
| cognate_titles               |
+------------------------------+
dbstore1002.eqiad.wmnet
+------------------------------+
| Tables_in_cognate_wiktionary |
+------------------------------+
| cognate_pages                |
| cognate_sites                |
| cognate_titles               |
+------------------------------+
db1029.eqiad.wmnet
+------------------------------+
| Tables_in_cognate_wiktionary |
+------------------------------+
| cognate_pages                |
| cognate_sites                |
| cognate_titles               |
+------------------------------+
db1031.eqiad.wmnet
+------------------------------+
| Tables_in_cognate_wiktionary |
+------------------------------+
| cognate_pages                |
| cognate_sites                |
| cognate_titles               |
+------------------------------+

I will discuss with Jaime what to do with dbstore servers and labs and if we really need to do it there, as he mentioned, x1 is not being replicated to labs as well.
If not, I will remove the database and the tables from dbstore1001 and dbstore1002.

Will update the ticket.

I will discuss with Jaime what to do with dbstore servers and labs and if we really need to do it there, as he mentioned, x1 is not being replicated to labs as well.
If not, I will remove the database and the tables from dbstore1001 and dbstore1002.

So although replication to labs is not a blocker, It is something that we would like to happen for these tables.

So although replication to labs is not a blocker, It is something that we would like to happen for these tables.

We have been talking about it and we are not really comfortable replicating x1 to labs, as if something goes wrong we would be leaking private data. We have some filtering in place, but it is not a silver bullet and if they don't work as expected, the impact can be quite big, as I said, private information being leaked.
So for now we'd prefer to leave it out from labs.

There are some discussion about some services in the future might allow x1 split and replication to labs, if those finally go on, we could include these tables there.

Marostegui claimed this task.

I am going to close this as resolved as the core servers have been done and labsdb isn't something we can do super fast, as it needs some thoughts as I mentioned above.
We can track this in a separate ticket if necessary for the labs replication.

I would ask Addshore to confirm by running SELECT on the empty tables from terbium/tin, etc, using mediawiki scripts.

I would ask Addshore to confirm by running SELECT on the empty tables from terbium/tin, etc, using mediawiki scripts.

@Addshore can you confirm?
Thanks

All looks good

> addshore@tin:~$ mwscript sql.php --wiki=enwiktionary --cluster=extension1 --replicadb=any --wikidb=cognate_wiktionary --query="SHOW TABLES;"
stdClass Object
(
    [Tables_in_cognate_wiktionary] => cognate_pages
)
stdClass Object
(
    [Tables_in_cognate_wiktionary] => cognate_sites
)
stdClass Object
(
    [Tables_in_cognate_wiktionary] => cognate_titles
)
addshore@tin:~$ mwscript sql.php --wiki=enwiktionary --cluster=extension1 --replicadb=any --wikidb=cognate_wiktionary --query="DESCRIBE cognate_pages;"
stdClass Object
(
    [Field] => cgpa_site
    [Type] => bigint(20)
    [Null] => NO
    [Key] => PRI
    [Default] =>
    [Extra] =>
)
stdClass Object
(
    [Field] => cgpa_namespace
    [Type] => int(11)
    [Null] => NO
    [Key] => PRI
    [Default] =>
    [Extra] =>
)
stdClass Object
(
    [Field] => cgpa_title
    [Type] => bigint(20)
    [Null] => NO
    [Key] => PRI
    [Default] =>
    [Extra] =>
)
addshore@tin:~$ mwscript sql.php --wiki=enwiktionary --cluster=extension1 --replicadb=any --wikidb=cognate_wiktionary --query="DESCRIBE cognate_sites;"
stdClass Object
(
    [Field] => cgsi_key
    [Type] => bigint(20)
    [Null] => NO
    [Key] => PRI
    [Default] =>
    [Extra] =>
)
stdClass Object
(
    [Field] => cgsi_dbname
    [Type] => varbinary(32)
    [Null] => NO
    [Key] =>
    [Default] =>
    [Extra] =>
)
stdClass Object
(
    [Field] => cgsi_interwiki
    [Type] => varbinary(32)
    [Null] => NO
    [Key] =>
    [Default] =>
    [Extra] =>
)
addshore@tin:~$ mwscript sql.php --wiki=enwiktionary --cluster=extension1 --replicadb=any --wikidb=cognate_wiktionary --query="DESCRIBE cognate_titles;"
stdClass Object
(
    [Field] => cgti_raw
    [Type] => varbinary(255)
    [Null] => NO
    [Key] =>
    [Default] =>
    [Extra] =>
)
stdClass Object
(
    [Field] => cgti_raw_key
    [Type] => bigint(20)
    [Null] => NO
    [Key] => PRI
    [Default] =>
    [Extra] =>
)
stdClass Object
(
    [Field] => cgti_normalized_key
    [Type] => bigint(20)
    [Null] => NO
    [Key] => MUL
    [Default] =>
    [Extra] =>
)
addshore@tin:~$ mwscript sql.php --wiki=enwiktionary --cluster=extension1 --replicadb=any --wikidb=cognate_wiktionary --query="SELECT * FROM cognate_titles;"
Query OK, 0 row(s) affected
addshore@tin:~$ mwscript sql.php --wiki=enwiktionary --cluster=extension1 --replicadb=any --wikidb=cognate_wiktionary --query="SELECT * FROM cognate_pages;"
Query OK, 0 row(s) affected
addshore@tin:~$ mwscript sql.php --wiki=enwiktionary --cluster=extension1 --replicadb=any --wikidb=cognate_wiktionary --query="SELECT * FROM cognate_sites;"
Query OK, 0 row(s) affected