TZ: UTC +1/+2
User Details
- User Since: Sep 1 2016, 6:48 AM (408 w, 8 h)
- Availability: Available
- IRC Nick: marostegui
- LDAP User: Marostegui
- MediaWiki User: MArostegui (WMF)
Today
Yesterday
I've installed 10.11 on db2136 (s4) for now. I've pooled it in production for a couple of hours to capture queries that would take longer than 10 seconds to run. For now the host is depooled again and will only be pooled during certain working hours.
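For context, one way to capture such queries on the MariaDB side is the slow query log with a 10-second threshold. This is a hedged sketch, not necessarily how db2136 was actually instrumented (tools like pt-query-digest are an alternative):

-- Sketch: log statements slower than 10 seconds while the host is pooled
-- (assumes SUPER privilege; not necessarily what was done on db2136)
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 10;
-- After the pooling window, check which file to inspect:
SELECT @@slow_query_log_file;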
Just for the record, I have been investigating the current lag on clouddb1019:3314 - it is because of this:
root@clouddb1019.eqiad.wmnet[commonswiki]> show explain for 3790361;
+------+--------------------+---------------+-------+-------------------------------------------------------+--------------------+---------+------------------------------+-----------+--------------------------+
| id   | select_type        | table         | type  | possible_keys                                         | key                | key_len | ref                          | rows      | Extra                    |
+------+--------------------+---------------+-------+-------------------------------------------------------+--------------------+---------+------------------------------+-----------+--------------------------+
|    1 | PRIMARY            | page          | ALL   | page_name_title,page_redirect_namespace_len           | NULL               | NULL    | NULL                         | 146173850 | Using where              |
|    3 | MATERIALIZED       | categorylinks | range | PRIMARY,cl_timestamp,cl_sortkey                       | cl_timestamp       | 257     | NULL                         |         2 | Using where; Using index |
|    2 | MATERIALIZED       | linktarget    | range | PRIMARY,lt_namespace_title                            | lt_namespace_title | 261     | NULL                         |        12 | Using where; Using index |
|    2 | MATERIALIZED       | templatelinks | ref   | PRIMARY,tl_target_id,tl_backlinks_namespace_target_id | tl_target_id       | 8       | commonswiki.linktarget.lt_id |        92 | Using index              |
|    8 | DEPENDENT SUBQUERY | pagelinks     | ref   | pl_target_id                                          | pl_target_id       | 8       | commonswiki.linktarget.lt_id |         6 | Using index              |
|    7 | DEPENDENT SUBQUERY | templatelinks | ref   | tl_target_id                                          | tl_target_id       | 8       | commonswiki.linktarget.lt_id |        92 | Using index              |
+------+--------------------+---------------+-------+-------------------------------------------------------+--------------------+---------+------------------------------+-----------+--------------------------+
6 rows in set, 1 warning (0.030 sec)
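For reference, 3790361 above is the connection id of the running statement, and the first row of the plan (type ALL over ~146M rows of page, no index chosen) is a full table scan, which is what is holding up the replica. A hedged sketch of aborting such a query, assuming it is safe to do so:

-- Abort just the running statement on that connection, keeping the session open
-- (illustrative; whether to kill a user query on the clouddb hosts is a policy call)
KILL QUERY 3790361;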
So in terms of data, my recap is:
- root password is different from production
- the data that is present there is sanitized and there's no data there that cannot be queried publicly. There is some data there that we filter via the views and not only via sanitarium (see the sketch after this list), but I guess that's fine
- the replication user password is not such a big deal, as the risk of someone setting up a new replica directly from production is minimal, and lots of other things would need to be done for that to be successful.
- wikiuser and wikiadmin are no longer there.
- non-public data (such as suppressed edits or bans) is possibly available - but I don't know enough MW to be able to say if this is still doable or not. @Ladsgroup would you know?
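To illustrate the view-based filtering mentioned in the list above, here is a simplified, hypothetical sketch. The table and column names are real MediaWiki schema, but the view itself is invented for illustration and is not the actual wiki-replica DDL:

-- Hypothetical sketch of filtering via a view (not the real wiki-replica definitions):
-- expose only revisions with no suppression bits set in the rev_deleted bitfield
CREATE OR REPLACE VIEW revision_public AS
SELECT rev_id, rev_page, rev_timestamp
FROM commonswiki.revision
WHERE rev_deleted = 0;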
Dropping it in s3, which will take around 8 hours.
This is done, I will track the master switchover in a different task
Tue, Jun 25
I will try - but just in case @ABran-WMF please take some notes!
No errors in db1169 for a week
Sun, Jun 23
If it was stopped correctly, it should be fine to start it again and resume replication too
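Assuming a standard MariaDB replica that was shut down cleanly, resuming would look roughly like this (a sketch; the exact steps depend on why it was stopped):

-- Sketch: resume replication on a cleanly stopped replica
START SLAVE;
-- Confirm both the IO and SQL threads are running and lag is shrinking
SHOW SLAVE STATUS\G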
I didn't bring anything back up as I wasn't aware of what was going on
Sat, Jun 22
From what I can see this was part of T367648
The downtime message above is wrong, it was downtimed cause it crashed :)
Fri, Jun 21
@BTullis can you double-check why an-redacteddb1001 isn't getting check_private_data runs every day like clouddb1021 does? I detected it because it doesn't have the logs:
Thu, Jun 20
There was nothing other than rebuilds/optimizes
I don't think we've had any since then, but I am going to double-check
This is done
Wed, Jun 19
It makes sense that it has more load and hence the queries can take longer, as analytics hosts have larger queries in general, which pile up until they finish or get killed. It is not strange to see. However, I want to reiterate that when the service was first set up it was agreed that it was best effort, and it was never guaranteed the hosts would have 0 lag.
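As an illustration of the pile-up described above, long-running queries on such a host can be listed like this (a hedged sketch; the 10-minute threshold is arbitrary, and the actual monitoring and query-killer setup is not shown here):

-- Illustrative: list user queries that have been running for over 10 minutes
SELECT id, user, time, LEFT(info, 80) AS query_head
FROM information_schema.PROCESSLIST
WHERE command = 'Query' AND time > 600
ORDER BY time DESC;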
This has been done - @BTullis let me know when clouddb1021 is decommissioned so I can remove it from zarcillo
Sorry wrong task
eqiad fixed.
s2 pending: db1162
Thank you!