Page MenuHomePhabricator

Identify top users of Huggle
Closed, ResolvedPublic

Description

We'd like to get in touch with some of the top Huggle users to conduct interviews with them. This is tied to our research into edit-review tools as part of new-product planning. Since Huggle adds a notation to actions taken in the program, it should be possible to find these people by consulting the logs (?).

Let's say, top 20 users over the last three months? Something like that. If you can give an indication of how many edits each user has made in Huggle, that would be of interest.

Results

English Wikipedia

+----------+----------+--------------------------------------+
| edits    | user_id  | user name                            |
+----------+----------+--------------------------------------+
|    17401 |  3174456 | Oshwah                               |
|    11954 |  9676078 | I dream of horses                    |
|    11673 | 11795905 | Dcirovic                             |
|     4433 |  9929111 | Serols                               |
|     2788 | 16075528 | David.moreno72                       |
|     2710 | 12744454 | Donner60                             |
|     2493 | 26778615 | GSS-1987                             |
|     1942 | 20513590 | Patient Zero                         |
|     1683 |  7695475 | Jim1138                              |
|     1540 |   820190 | IronGargoyle                         |
|     1397 | 25523690 | CAPTAIN RAJU                         |
|     1389 |   794069 | Lazylaces                            |
|     1301 |   776137 | Mike1901                             |
|     1286 | 21640150 | Qpalzmmzlapq                         |
|     1177 | 16431338 | Laberkiste                           |
|     1046 | 17239082 | Frosty                               |
|      910 |  4452763 | Hello71                              |
|      904 | 27598246 | Omni Flames                          |
|      875 |  6682565 | Non-dropframe                        |
|      721 |   873244 | Optakeover                           |
+----------+----------+--------------------------------------+

French Wikipedia

+----------+----------+--------------------------------------+
| edits    | user_id  | user name                            |
+----------+----------+--------------------------------------+
|     2213 |  2480038 | Framawiki                            |
|      458 |  1740090 | Thibaut120094                        |
+----------+----------+--------------------------------------+

German Wikipedian

No significant use.

Event Timeline

Restricted Application added subscribers: Zppix, Aklapper. · View Herald TranscriptMay 6 2016, 12:14 AM
Neil_P._Quinn_WMF triaged this task as Normal priority.

This went through my inbox and I was interested so had a go on enwiki_p:

MariaDB [enwiki_p]> select count(*), rev_user, group_concat(distinct rev_user_text) from revision, tag_summary where revision.rev_id = tag_summary.ts_rev_id and ts_tags like '%huggle%' and rev_timestamp > date_sub(now(), interval 3 month) group by rev_user order by count(*) desc limit 20;
+----------+----------+--------------------------------------+
| count(*) | rev_user | group_concat(distinct rev_user_text) |
+----------+----------+--------------------------------------+
|    17401 |  3174456 | Oshwah                               |
|    11954 |  9676078 | I dream of horses                    |
|    11673 | 11795905 | Dcirovic                             |
|     4433 |  9929111 | Serols                               |
|     2788 | 16075528 | David.moreno72                       |
|     2710 | 12744454 | Donner60                             |
|     2493 | 26778615 | GSS-1987                             |
|     1942 | 20513590 | Patient Zero                         |
|     1683 |  7695475 | Jim1138                              |
|     1540 |   820190 | IronGargoyle                         |
|     1397 | 25523690 | CAPTAIN RAJU                         |
|     1389 |   794069 | Lazylaces                            |
|     1301 |   776137 | Mike1901                             |
|     1286 | 21640150 | Qpalzmmzlapq                         |
|     1177 | 16431338 | Laberkiste                           |
|     1046 | 17239082 | Frosty                               |
|      910 |  4452763 | Hello71                              |
|      904 | 27598246 | Omni Flames                          |
|      875 |  6682565 | Non-dropframe                        |
|      721 |   873244 | Optakeover                           |
+----------+----------+--------------------------------------+
20 rows in set (57.60 sec)

Turns out I'm not even in the top 40.

Aklapper added a subscriber: Petrb.May 6 2016, 1:06 PM

Thanks Krenair! I'd like to get perspective from at least one foreign wiki. Do you know what other wikis this is popular on? German, I think...

@Krenair: Thanks for doing that! Out of curiosity, why did you add the group_concat(distinct rev_user_text)? In case the user was renamed during the 3 months?

Awesome, thanks Krenair. I've turned that into a Quarry query:
https://quarry.wmflabs.org/query/9559

How would we adapt that for a wiki that doesn't use edit-tags? Can we search edit-summaries via Quarry? (Dewiki does add the linked (HG) in Huggle edits, e.g. https://de.wikipedia.org/w/index.php?title=Jimdo&diff=prev&oldid=153389069 )

Krenair added a comment.EditedMay 6 2016, 8:47 PM

@Krenair: Thanks for doing that! Out of curiosity, why did you add the group_concat(distinct rev_user_text)? In case the user was renamed during the 3 months?

The GROUP BY was on rev_user, not rev_user_text, so if your MySQL server has ONLY_FULL_GROUP_BY in sql_mode you'll get a something like ERROR 1055 (42000): 'enwiki_p.revision.rev_user_text' isn't in GROUP BY if you just try to select rev_user_text. If your server doesn't have that mode, who knows what value it'll choose? See T102915#1538781

Per T112637: RFC: Increase the strictness of mediawiki SQL code and leverage database code blockers for scalability all new code must be compatible with ONLY_FULL_GROUP_BY.

And yes, theoretically you can get multiple different rev_user_text values for the same user due to renaming issues.

I see you've generated that using the tag_summary, i.e. "huggle" at enwiki https://en.wikipedia.org/wiki/Special:Tags but dewiki doesn't have an equivalent in https://de.wikipedia.org/wiki/Spezial:Markierungen ...

Yeah, https://en.wikipedia.org/wiki/Wikipedia:Huggle/Config#Other has a "tag:huggle" line but dewiki doesn't seem to have an equivalent line.

How would we adapt that for a wiki that doesn't use edit-tags? Can we search edit-summaries via Quarry? (Dewiki does add the linked (HG) in Huggle edits, e.g. https://de.wikipedia.org/w/index.php?title=Jimdo&diff=prev&oldid=153389069 )

It seems that by default modern (v3) versions of Huggle's edit summaries will contain [[Project:Huggle|HG]] or ([[WP:HG|HG 3]])
You can use a WHERE clause like rev_comment like '% ([[WP:HG|HG 3]])%'. I'm giving it a go but the query is taking some time to run.

Thanks Krenair! I'd like to get perspective from at least one foreign wiki. Do you know what other wikis this is popular on? German, I think...

I'm not sure which wikis it's popular on (you could run it by script across all and find out?), and it's more difficult for some (probably most) wikis for the reasons @Quiddity found.

Petrb added a comment.May 6 2016, 10:43 PM

Please note that older versions of Huggle do not support tagging of edits. Also in order for it to work, MediaWiki needs to know this tag. I added it myself on enwiki so that this works precisely for this purpose, but received some negative response in a sense that it's not needed.

I am not sure, but fortunately no sysop deleted the tag so far. If you want to enable this on other wikis, you need to create the tag before enabling this in huggle config, otherwise you will get API warnings

Petrb added a comment.May 6 2016, 10:44 PM

Given how much broken Huggle is on other projects, it wouldn't surprise me if there were barely 20 editors on German wiki though (in last 3 months). I wanted to focus on fixing it during Hackathon, but unfortunately there were almost nobody interested in Huggle.

Krenair added a comment.EditedMay 6 2016, 11:17 PM

Given how much broken Huggle is on other projects, it wouldn't surprise me if there were barely 20 editors on German wiki though (in last 3 months). I wanted to focus on fixing it during Hackathon, but unfortunately there were almost nobody interested in Huggle.

I'd be interested in helping Huggle and have global interface editing rights, unfortunately due to the time of year those Hackathons are scheduled I've never been to one and probably won't go to one for another couple of years.

Petrb added a comment.EditedMay 8 2016, 8:31 PM

Well, it's partially my fault, despite @Aklapper asked me many times to schedule some workshop or post announcement, I didn't. I did however a quick research and asked couple of people on Hackathon about Huggle. Most of them knew it exists, but nobody really had much of an interest in working on it. I even went over some (german?) admin who was slightly upset just hearing word "Huggle" and mumbled something about idiots who use it... :)

Neil_P._Quinn_WMF added a comment.EditedMay 9 2016, 6:17 PM

Thanks, @Krenair! That's very interesting info.

Given how much broken Huggle is on other projects, it wouldn't surprise me if there were barely 20 editors on German wiki though (in last 3 months).

@Petrb, are there any projects other than enwiki where Huggle is used by a significant number of people? I'm wondering which other wikis we should look at this for this research.

Neil_P._Quinn_WMF raised the priority of this task from Normal to High.May 9 2016, 7:43 PM
Neil_P._Quinn_WMF lowered the priority of this task from High to Normal.
Petrb added a comment.May 9 2016, 8:47 PM

Hi, I don't think that there is any project with so many users as on enwiki, which is primarily because enwiki is probably hundreds times more populated compared to other wikis. There are some projects where people do use huggle though, but they are all tiny, dewiki including.

Unfortunately there is no easy way to find them, you could probably search DB, but given that every project has separate DB instance, it's not gonna be that simple (not as running 1 query to find all huggle users). I believe that huggle might be actively used on dewiki, frwiki and ptwiki but I was never watching over them.

@jmatazzoni, we have the results for enwiki from @Krenair. Following @Petrb's suggestion, I ran the numbers for dewiki and frwiki as well. There are some strange bugs in the output that I need to look into, but it looks like Huggle is not widely used on either project.

Results

One thing to consider is the different volume of activity on each wiki. In terms of new articles, English Wikipedia seems to produce ~2.4x more articles and 5x more edits than French or German Wikipedias. Even considering these proportions, the usage of Huggle seems to be minimal in those Wikipedias compared to English.

Neil_P._Quinn_WMF added a comment.EditedMay 11 2016, 6:07 PM

I ran the frwiki query directly rather than through Huggle, and it gave me more sensible results.

SELECT COUNT(*), rev_user, GROUP_CONCAT(DISTINCT rev_user_text)
    -> FROM revision
    -> WHERE
    -> rev_comment LIKE "%WP:HG%" AND
    ->     rev_timestamp > DATE_SUB(NOW(), INTERVAL 3 MONTH)
    -> GROUP BY rev_user
    -> ORDER BY COUNT(*) DESC LIMIT 20;
+----------+----------+--------------------------------------+
| COUNT(*) | rev_user | GROUP_CONCAT(DISTINCT rev_user_text) |
+----------+----------+--------------------------------------+
|     2213 |  2480038 | Framawiki                            |
|      458 |  1740090 | Thibaut120094                        |
|        9 |   576801 | AGhostDog                            |
|        7 |    48924 | Ycarus                               |
|        6 |  1562954 | Rocherd                              |
|        4 |  1160699 | Simon-kempf                          |
|        2 |   121887 | Sebk                                 |
|        2 |   135113 | Dodeeric                             |
|        2 |   356703 | Letartean                            |
|        2 |  2043647 | Fugitron                             |
|        2 |  2471641 | Caméléon Diaphane                    |
|        1 |   290530 | Zepote                               |
|        1 |  1429755 | Vikin                                |
|        1 |  1911569 | KurodaSho                            |
|        1 |  2364398 | CastelRune                           |
|        1 |  2477793 | Elizamarcu                           |
|        1 |  2483069 | AlvaroMolina                         |
|        1 |  2510629 | Pontcas                              |
+----------+----------+--------------------------------------+
18 rows in set (5 hours 35 min 17.67 sec)
Neil_P._Quinn_WMF updated the task description. (Show Details)
Neil_P._Quinn_WMF closed this task as Resolved.

I think the results in the description answer the question! Thanks everybody for the help!

I ran the frwiki query directly rather than through Huggle, and it gave me more sensible results.

Against labsdb1003.eqiad.wmnet/frwiki.labsdb/s6.labsdb instead of labsdb1001.eqiad.wmnet/enwiki.labsdb/s1.labsdb used by quarry?

@Krenair, against analytics-store.eqiad.wmnet (=dbstore1002). I'd guess that with such an intensive query, the labs db can't fully complete the grouping and each user gets broken up into a bunch of separate rows.

Using analytics databases is not what I'd call direct but they're probably more accurate than the public labs replicas. I don't think query difficulty/intensity should change the result...

Yeah, I didn't say it was a good guess :) I filed the weird results as T135087.

Neil_P._Quinn_WMF raised the priority of this task from Normal to Needs Triage.