
High replag on s1, s3, s5
Closed, Resolved · Public

Description

https://tools.wmflabs.org/replag/ shows up to 8 hours of replication lag for a lot of wikis.

c1.labsdb
Shard 	Lag (seconds) 	Lag (time)
s1	11473	03:11:13
s2	0	00:00:00
s3	11473	03:11:13
s4	0	00:00:00
s5	11473	03:11:13
s6	0	00:00:00
s7	0	00:00:00

c3.labsdb
Shard 	Lag (seconds) 	Lag (time)
s1	29811	08:16:51
s2	0	00:00:00
s3	29811	08:16:51
s4	0	00:00:00
s5	29811	08:16:51
s6	0	00:00:00
s7	0	00:00:00
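
For reference, the "Lag (time)" column is the "Lag (seconds)" value rendered as HH:MM:SS. A minimal Python sketch of that conversion follows; the function name `format_lag` is hypothetical and is not taken from the replag tool itself.

```python
def format_lag(seconds: int) -> str:
    """Render a replication lag given in seconds as HH:MM:SS."""
    # Split off whole hours first, then minutes and leftover seconds.
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Values copied from the tables above.
print(format_lag(11473))  # c1.labsdb, shards s1/s3/s5
print(format_lag(29811))  # c3.labsdb, shards s1/s3/s5
```

The 29811-second figure on c3.labsdb is where the "8 hours" in the description comes from.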

Event Timeline

Restricted Application added a subscriber: Aklapper. · Sep 23 2017, 5:19 PM

I see them at 0 on the other hosts; have you tried those?

wikireplica-analytics.eqiad.wmnet
Shard	Lag (seconds)	Lag (time)
s1	0.0000	00:00:00
s2	0.0000	00:00:00
s3	0.0000	00:00:00
s4	0.0000	00:00:00
s5	0.0000	00:00:00
s6	0.0000	00:00:00
s7	0.0000	00:00:00

wikireplica-web.eqiad.wmnet
Shard	Lag (seconds)	Lag (time)
s1	0.0000	00:00:00
s2	0.0000	00:00:00
s3	0.0000	00:00:00
s4	0.0000	00:00:00
s5	0.0000	00:00:00
s6	0.0000	00:00:00
s7	0.0000	00:00:00
jcrespo added subscribers: MahmoudHashemi, Slaporte. · Edited · Sep 23 2017, 5:43 PM

@MahmoudHashemi @Slaporte s52467 / s52467__hashtags seems to be overloading the labs hosts with multiple connections running heavy queries, generating lag and complaints from other users. I have throttled the account to 2 connections per host.

It's hard to say what can be done about this without more information. There's user-generated load from visits to http://tools.wmflabs.org/hashtags, but there's batch traffic as well. Both are necessary for editathon folks to get their data.

What monitoring tools are available for us tool developers to get some insight? Also, as far as throttling goes, what was the original limit and the original usage? The answer to the first question would help us answer the second. :)

Please create separate accounts for separate tools: if users overload one tool and it starts to make bad queries or too many of them, the whole account is affected. Folks will not get their data if the replicas fall behind production. :-)

That's an interesting idea. It would really help to have answers to my questions if we are forced to register an account per language.

For convenience:

  • What is the limit on connections per account?
  • What is/was the usage on the hashtags account?
  • What resource-usage monitoring tools are available for tools developers? Is there a page on the topic I've missed?

Thanks!

Reedy renamed this task from Hight replag on s1, s3, s5 to High replag on s1, s3, s5. · Sep 23 2017, 7:19 PM
jcrespo closed this task as Resolved. · Sep 24 2017, 11:10 AM
jcrespo claimed this task.

Replag is back to 0.

I have increased the max concurrent connections of s52467 back to 10. Most of your questions can be answered by reading the documentation on best practices for accessing the database replicas: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#Connection_handling_policy. Don't worry too much about "doing things badly": when a problem happens, it is my job to ping people and see if limits can be set (but only when problems arise). If you do not get pinged, it means you can continue doing what you are doing.
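
One way a tool could stay under a per-account connection limit is to cap its own concurrency on the client side. The following is a minimal Python sketch under stated assumptions: the limit of 10 is the value quoted above, and `run_query` is a hypothetical callable standing in for whatever database call the tool actually makes. It is not part of any Toolforge API.

```python
import threading
from contextlib import contextmanager

# Assumption: the per-account limit of 10 concurrent connections quoted above.
MAX_CONNECTIONS = 10

# One semaphore shared by every thread in the process.
_slots = threading.BoundedSemaphore(MAX_CONNECTIONS)


@contextmanager
def connection_slot(timeout: float = 30.0):
    """Hold one of the MAX_CONNECTIONS slots for the duration of the block."""
    if not _slots.acquire(timeout=timeout):
        raise TimeoutError("no free connection slot")
    try:
        yield
    finally:
        _slots.release()


def throttled(run_query, *args, **kwargs):
    """Call run_query (hypothetical) while holding a connection slot."""
    with connection_slot():
        return run_query(*args, **kwargs)
```

This only bounds connections opened from a single process. If web requests and batch jobs run as separate processes under the same account, each would need its own share of the limit.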