Collection script seemingly randomly stops working
Closed, Resolved (Public)

Description

This is vague because I haven't been able to debug it yet, but twice in the past two days the collection script seems to have randomly stopped collecting new edits.

The hashtags_scripts_1 container is still up and still reports the python process as running, but clearly no new edits are making it into the database, and the container must be restarted. It then catches up as expected.

Occurred around 5pm on the 16th and then 10am on the 17th.

Suspect it has something to do with the volume of visitors (from #1lib1ref), but I don't have any immediate ideas for why that would be the case.

Event Timeline

Suggestions from @jsn.sherman:

  • Check database logs
  • Consider using ProxySQL to cache searches
  • When inserting new hashtags, follow up with a check that the edit was logged. If not, kick the container/restart the script.
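The last suggestion could be sketched roughly as follows. This is only an illustration, not code from the repository: it uses a staleness check (has anything been logged recently?) rather than a per-insert check, and `get_last_logged`, `restart`, and the 30-minute threshold are all hypothetical. In practice `get_last_logged` might query the newest edit timestamp in the database, and `restart` might shell out to `docker restart hashtags_scripts_1`.

```python
from datetime import datetime, timedelta, timezone


def is_stalled(last_logged, now, max_silence=timedelta(minutes=30)):
    """Return True if no edit has been logged within max_silence."""
    if last_logged is None:
        # Nothing logged at all counts as stalled.
        return True
    return now - last_logged > max_silence


def check_collector(get_last_logged, restart, now=None,
                    max_silence=timedelta(minutes=30)):
    """Call restart() if the collector looks stalled; return True if it did.

    get_last_logged and restart are hypothetical callables supplied by
    the caller (e.g. a database query and a container restart).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if is_stalled(get_last_logged(), now, max_silence):
        restart()
        return True
    return False
```

A check like this would run on a schedule outside the collection script, since the script itself appears to hang while its process stays alive. The threshold would need tuning to the normal gap between hashtag edits.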

Hasn't happened again since the first report.

Happened again today.

We're now running docker-compose up hourly to kick dead containers.

For me, the scripts container goes down every 10-15 minutes. I checked the logs of the scripts container:

Traceback (most recent call last):
  File "collect_hashtags.py", line 38, in <module>
    if db.is_duplicate(hashtag, change['revision']['new']):
KeyError: 'revision'

The change object has no revision key.

I cross-checked against the change object itself:

{
  "title": "シロアリ(YouTuber)",
  "bot": false,
  "parsedcomment": "<a href=\"/wiki/WP:CSD#リダイレクト2-5\" class=\"mw-redirect\" title=\"WP:CSD\">WP:CSD#リダイレクト2-5</a> 曖昧さ回避括弧の使い方違反: 投稿者:<a href=\"/wiki/%E7%89%B9%E5%88%A5:%E6%8A%95%E7%A8%BF%E8%A8%98%E9%8C%B2/YBK75\" title=\"特別:投稿記録/YBK75\">YBK75</a> 内容: 「{{sd|r2-5|2=全角括弧が用いられている}} #転送 <a href=\"/wiki/%E3%82%B7%E3%83%AD%E3%82%A2%E3%83%AA_(YouTuber)\" title=\"シロアリ (YouTuber)\">シロアリ (YouTuber)</a>」",
  "wiki": "jawiki",
  "server_name": "ja.wikipedia.org",
  "log_action_comment": "deleted &quot;[[シロアリ(YouTuber)]]&quot;: [[WP:CSD#リダイレクト2-5]] 曖昧さ回避括弧の使い方違反: 投稿者:[[Special:Contributions/YBK75|YBK75]] 内容: 「{{sd|r2-5|2=全角括弧が用いられている}} #転送 [[シロアリ (YouTuber)]]」",
  "log_type": "delete",
  "type": "log",
  "log_action": "delete",
  "server_url": "https://ja.wikipedia.org",
  "meta": {
    "topic": "eqiad.mediawiki.recentchange",
    "dt": "2019-03-08T14:29:50+00:00",
    "partition": 0,
    "id": "a2072baf-41ae-11e9-84f4-141877613bad",
    "request_id": "122ea342-9747-4a9e-b388-27927826be78",
    "offset": 1438007202,
    "schema_uri": "mediawiki/recentchange/2",
    "domain": "ja.wikipedia.org",
    "uri": "https://ja.wikipedia.org/wiki/%E3%82%B7%E3%83%AD%E3%82%A2%E3%83%AA%EF%BC%88YouTuber%EF%BC%89"
  },
  "timestamp": 1552055390,
  "id": 101699146,
  "log_params": [],
  "comment": "[[WP:CSD#リダイレクト2-5]] 曖昧さ回避括弧の使い方違反: 投稿者:[[Special:Contributions/YBK75|YBK75]] 内容: 「{{sd|r2-5|2=全角括弧が用いられている}} #転送 [[シロアリ (YouTuber)]]」",
  "server_script_path": "/w",
  "log_id": 4229519,
  "user": "Sumaru",
  "namespace": 0
}

To fix this, can you tell me whether we should ignore the object or insert it?

This is a bug I noticed recently. It occurs when a recent change is something other than an edit or page creation. It should be fixed in https://github.com/Samwalton9/hashtags/commit/b7981bdbba54baa2da324dec68afe9affcecddde
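For anyone hitting the same KeyError, a guard along these lines would skip such events. This is a minimal sketch and not necessarily what the linked commit does: `new_revision_id` is a hypothetical helper, and it assumes edits and page creations arrive with "type" set to "edit" or "new", while events like the log entry above ("type": "log") carry no "revision" key.

```python
def new_revision_id(change):
    """Return the new revision ID for edits and page creations, else None.

    Hypothetical helper: assumes edit/creation events have
    type "edit" or "new" and a "revision" dict with a "new" key.
    """
    if change.get("type") not in ("edit", "new"):
        # Log entries (deletions, moves, ...) have no revision to record.
        return None
    revision = change.get("revision") or {}
    return revision.get("new")


# In the collection loop, an event without a revision is skipped
# instead of raising KeyError:
#
#     rev_id = new_revision_id(change)
#     if rev_id is None:
#         continue
```

This answers the question above by ignoring the object rather than inserting it, since a deletion log entry has no revision to store.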

Do you have the latest version of the repository?

After updating, you'll need to ensure the files in the scripts container have updated too. I tend to docker rm the container and then run docker-compose again to make sure it's got the latest files, but there's probably a more convenient method.

The default time limit for executing SQL statements in MySQL is 1000 (1 second). It may be that, due to traffic or overloading, the SQL statements take longer than that to execute.

Looks like T179986 might be related. This potentially isn't a tool issue?

Samwalton9 claimed this task.

This seems to be resolved.