
Hashtag search tool is down
Closed, Resolved
Public

Description

Event Timeline

Shyamal created this task. Jul 3 2018, 1:09 PM

Indeed, it keeps loading. Added the maintainers of the tool.

Samwalton9 added a comment. Edited Jul 3 2018, 7:20 PM

As far as I understand, this issue is going to require more or less a complete rewrite of the tool. It wasn't built to handle what are now millions of hashtag entries, so database queries are taking a prohibitively long time and preventing it from loading.

I'm not sure that there are plans to fix it in the immediate future, unfortunately :(

Now I see

Internal server error
<ExceptionInfo [oursql.PermissionsError: (1045, "Access denied for user 's52467'@'10.68.18.52' (using password: YES)", None)] (12 frames, last=Callpoint('oursql.Connection._raise_error (oursqlx/oursql.c:5885)', 183, 'oursql', 'connection.pyx', -1, ''))>

Samwalton9 added a comment. Edited Aug 28 2018, 10:10 AM

Per T188205, the tool's database access has been revoked, and the tool will therefore remain down for the foreseeable future. I don't have time to create a whole new version of the tool that's more DB-friendly, and I haven't heard from @Slaporte on this.

Just a heads up that I've made some time to work on a new version of this tool incorporating improvements we'd wanted in the past (e.g. EventStream monitoring). It will be hosted on a VPS project (T204059) rather than Toolforge to avoid DB issues.

Samwalton9 added a comment. Edited Oct 2 2018, 11:44 AM

The new hashtags tool is now live at http://hashtags.wmflabs.org/

This version uses Django rather than Flask (source code: https://github.com/Samwalton9/hashtags), simply because it's what I'm more familiar with, and monitors hashtags via the EventStream. This means we now catch hashtags almost immediately and from across all Wikimedia projects (except Wikidata initially, because the volume of edits was very high; I'll be looking into this more closely). The old database has been imported, albeit with bot edits stripped out (and the new tool ignores bot edits). This had the beneficial effect of reducing the database from 8 million entries to around 200,000 (most of the removed edits were from InternetArchiveBot).
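As a rough illustration of the filtering described above (not the tool's actual code), here is a minimal sketch of pulling hashtags out of a single recent-change event. It assumes each event is a dict with `bot`, `wiki`, and `comment` keys, as in the EventStream's recent-change feed. The names `HASHTAG_RE`, `EXCLUDED_WIKIS`, and `extract_hashtags` are hypothetical, and the regular expression and the lowercasing are assumptions about what counts as a hashtag.

```python
import re

# Assumed pattern: '#' followed by word characters, not preceded by a word
# character or '&' (so "foo#bar" and HTML entities like "&#123;" are skipped).
HASHTAG_RE = re.compile(r"(?<![\w&])#(\w+)")

# Assumption: Wikidata is skipped because of its edit volume.
EXCLUDED_WIKIS = {"wikidatawiki"}


def extract_hashtags(event):
    """Return the hashtags found in one event's edit summary.

    Bot edits and edits from excluded wikis yield an empty list.
    """
    if event.get("bot"):
        return []
    if event.get("wiki") in EXCLUDED_WIKIS:
        return []
    comment = event.get("comment") or ""
    # Lowercasing is an illustrative normalisation, not confirmed behaviour.
    return [tag.lower() for tag in HASHTAG_RE.findall(comment)]
```

For example, an event such as `{"bot": False, "wiki": "enwiki", "comment": "Added citation #1Lib1Ref"}` would give `["1lib1ref"]`, while the same summary with `"bot": True` would give an empty list.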

Please poke around and let me know how it's looking!

Notes: The EventStream script is currently running through the historical data, catching up to the live stream. Also, there's a gap in data from August 8th (when Toolforge database access was revoked) through to early September (the furthest back the EventStream historical data goes from today).

Once the tool seems stable I'll rework this Phabricator board; the tool is far from polished and there are a lot of tasks I need to file :)

Samwalton9 closed this task as Resolved. Oct 3 2018, 9:31 AM
Samwalton9 claimed this task.

Resolving this task now that the tool is up again. I will be filing a number of followup tasks at Hashtags.