Page MenuHomePhabricator

Have CopyPatrol output editor's IP address if they are anonymous
Closed, ResolvedPublic3 Estimated Story Points

Description

Currently, if an anonymous IP user creates a new article on Wikipedia, the CopyPatrol interface will show the text "editor not found". Instead, we should show the user's IP address and provide appropriate links. Don't show edit count for IPs.

Event Timeline

I can't imagine this being very useful, honestly. It'll take another join on the Revision table because the user table does not contain information about IP addresses, making the tool even slower.
And we do need the join on the user table to get the editcount.

Aren't we already querying against the revision table to get the user ID? or is that something returned from Eranbot's database? If we're already querying against the revision table, it should be pretty simple: just include the rev_user_text in the query, and if the user ID is 0, output the user text.

And I think we can just omit the edit count if the user is an anon IP.

Aren't we already querying against the revision table to get the user ID? or is that something returned from Eranbot's database? If we're already querying against the revision table, it should be pretty simple: just include the rev_user_text in the query, and if the user ID is 0, output the user text.

Aren't we already querying against the revision table to get the user ID? or is that something returned from Eranbot's database? If we're already querying against the revision table, it should be pretty simple: just include the rev_user_text in the query, and if the user ID is 0, output the user text.

You're right, I am already using the revision table but weirdly, it doesn't seem to contain IP address information.

MariaDB [enwiki_p]> SELECT rev_user_text, rev_user FROM revision LIMIT 10\G;
*************************** 1. row ***************************
rev_user_text: 
     rev_user: 0
*************************** 2. row ***************************
rev_user_text: 
     rev_user: 0
*************************** 3. row ***************************
rev_user_text: 
     rev_user: 0
*************************** 4. row ***************************
rev_user_text: 
     rev_user: 0
*************************** 5. row ***************************
rev_user_text: 
     rev_user: 856
*************************** 6. row ***************************
rev_user_text: 
     rev_user: 62
*************************** 7. row ***************************
rev_user_text: 
     rev_user: 188
*************************** 8. row ***************************
rev_user_text: 
     rev_user: 0
*************************** 9. row ***************************
rev_user_text: 
     rev_user: 90
*************************** 10. row ***************************
rev_user_text: 
     rev_user: 3903
10 rows in set (0.00 sec)

I'll investigate this a bit.

Hmm, I believe it should be storing the IP addresses in the rev_user_text field, but I could be wrong. Also, you may need to do a LEFT OUTER JOIN in the getUserDetails query (rather than a simple JOIN) so that it includes the records where there is no corresponding user table entry (which will be the case with anon IP users).

OK, I'm super confused. If I do...

MariaDB [enwiki_p]> select * FROM revision where rev_user = 0 LIMIT 10;
+-----------+----------+-------------+----------------------------------+----------+-------------------+----------------+----------------+-------------+---------+---------------+---------------------------------+-------------------+--------------------+
| rev_id    | rev_page | rev_text_id | rev_comment                      | rev_user | rev_user_text     | rev_timestamp  | rev_minor_edit | rev_deleted | rev_len | rev_parent_id | rev_sha1                        | rev_content_model | rev_content_format |
+-----------+----------+-------------+----------------------------------+----------+-------------------+----------------+----------------+-------------+---------+---------------+---------------------------------+-------------------+--------------------+
| 519207843 |        0 |           0 | /* Beschreibung */ Kleinigkeiten |        0 | Der-wuppertaler   | 20121004222920 |              0 |           0 |    2836 |             0 | dt0qefs4iu6bysy4g3elu33z0w07vn7 | NULL              | NULL               |
| 519207844 |        0 |           0 | /* Auszeichnungen */             |        0 | Der-wuppertaler   | 20121004222956 |              0 |           0 |    2837 |             0 | 1s7bjkqw61zqgsc5j2lfxjvbc9hr0te | NULL              | NULL               |
|    862220 |       10 |           0 | Automated conversion             |        0 | Conversion script | 20020225154311 |              1 |           0 |      35 |        233192 | i8pwco22fwt12yp12x29wc065ded2bh | NULL              | NULL               |
| 381200179 |       10 |           0 |                                  |        0 | 76.28.186.133     | 20100826222351 |              0 |           0 |    1875 |     133452289 | im247wfm4kfzwwu24bgw3hnxkjd1gvv | NULL              | NULL               |
|     18201 |       12 |           0 | Automated conversion             |        0 | Conversion script | 20020225150022 |              1 |           0 |    9546 |     332419362 | 07sqam7073877kptdznnip3viznphpy | NULL              | NULL               |
|     19746 |       12 |           0 | *                                |        0 | 140.232.153.45    | 20020225154311 |              0 |           0 |   11279 |         18201 | px5ovjydixhpbysqmqiw7floem9ii5i | NULL              | NULL               |
|     19749 |       12 |           0 | *                                |        0 | 24.188.31.147     | 20020227173409 |              0 |           0 |   11394 |         19746 | ir7scjzn4w7y93it75q41tamjrfd3nd | NULL              | NULL               |
|     20514 |       12 |           0 | *                                |        0 | 24.188.31.147     | 20020227173641 |              0 |           0 |   11486 |         19749 | o139okh8ak7f6x7f6hwdogun10kucpy | NULL              | NULL               |
|     42733 |       12 |           0 | term in bold                     |        0 | 213.253.39.175    | 20020301001317 |              0 |           0 |   11488 |         20514 | jmq3csxbr1t85ko3qkt0i8d3dtjlc2v | NULL              | NULL               |
|     42738 |       12 |           0 | *                                |        0 | 206.82.16.35      | 20020402095125 |              0 |           0 |   11546 |         42733 | q0yfrgc9zuz5vcsjsb40301rn49emow | NULL              | NULL               |
+-----------+----------+-------------+----------------------------------+----------+-------------------+----------------+----------------+-------------+---------+---------------+---------------------------------+-------------------+--------------------+

...but...

MariaDB [enwiki_p]> select rev_user_text, rev_user FROM revision where rev_user = 0 LIMIT 10;
+---------------+----------+
| rev_user_text | rev_user |
+---------------+----------+
|               |        0 |
|               |        0 |
|               |        0 |
|               |        0 |
|               |        0 |
|               |        0 |
|               |        0 |
|               |        0 |
|               |        0 |
|               |        0 |
+---------------+----------+

Wow, you're right. I wonder if we're doing something silly here.

Probably there is a covering index for the second query, and using that results in different ordering. (Remember, SQL does not give you any guarantee of sorting consistency if you don't use ORDER BY.) Rows with empty rev_user_text might be the residue of some old bug or maybe that column didn't even exist at the time the first few revisions have been made.

@Tgr: Makes sense.

@Niharika: Looks like we'll need to look for cases where user ID is 0 and user text is empty string, and in those cases show the "editor not found" message.

DannyH set the point value for this task to 3.Jun 16 2016, 6:02 PM
DannyH moved this task from Needs Discussion to Up Next (June 3-21) on the Community-Tech board.

Fixed with this commit, included in the all mighty Wikimania branch!

Indeed, the revision table has what we want, so we just need to do a LEFT JOIN to get the rows from revision that don't match ON r.rev_user = u.user_id (since anons don't have a user_id). No noticeable affect on runtime.

For performance, we should probably be fetching data about the revisions in a single query (so 100 at a time) like WHERE r.rev_id IN (1234, 1235, 1236, ...); I bet that would speed it up quite a bit. You could do the same for the API queries, which have a 50-title limit