Display results of Special:SimilarEditors
Closed, Resolved | Public | 5 Estimated Story Points

Assigned To
Authored By
Tchanders
Apr 27 2022, 5:26 PM
Referenced Files
F35182252: mw_debug_logs.txt
May 27 2022, 1:20 PM
F35182249: similar-users_logs.txt
May 27 2022, 1:20 PM
F35182259: only_headers.png
May 27 2022, 1:20 PM
F35073316: image.png
Apr 29 2022, 1:57 PM
F34752840: image.png
Apr 27 2022, 5:26 PM

Description

From T296214:

  • The response from the service is displayed as a standard table, e.g. see system messages and blocked users

Design

image.png (1×1 px, 114 KB)

Notes

  • The <table> should have the classes mw-datatable and sortable
  • The module jquery.tablesorter should be loaded. Every column should be sortable.

Related investigation: T304525: Investigate how to display results on Special:SimilarEditors [8H]

Event Timeline

@Prtksxna What should the '(timeline)' links from the screenshot link to?

Change 790772 had a related patch set uploaded (by Tchanders; author: Tchanders):

[mediawiki/extensions/SimilarEditors@master] WIP: Sketch of the results formatter for SimilarEditors

https://gerrit.wikimedia.org/r/790772

The spd-test tool has this and links to the interaction timeline between the user that was queried and the user in the row. While I think it's useful, I'm not sure we want to officially link to an external tool (which I think this team built) from a MediaWiki extension. @Niharika what do you think?

Thanks. Let's leave it out for the sake of this task, but we can raise a new task for adding it if we decide to (and any other links).

Am I right in assuming "Label" is not actually needed at the bottom of the table heading cells?

We'll make sure to put this task through design review, and we can file any adjustments as a follow-up too.

I see the usefulness as well. We have also seen the timeline being used on sockpuppet report pages on enwiki and some other wikis. Let's keep it.

Thanks - filed as T309035. (There's a question in the task description about which time period to show)

Change 790772 merged by jenkins-bot:

[mediawiki/extensions/SimilarEditors@master] Display results on Special:SimilarEditors

https://gerrit.wikimedia.org/r/790772

@Tchanders I am seeing a few errors in the similar-editors Python script. I am wondering whether I should report them or whether they are due to the test data being incomplete.

  1. http://localhost:8081/wiki/Special:SimilarEditors?wpTarget=BD2412&quicksurvey=similareditors

Output from similar-editors:

ERROR:similar_users.factory:Exception on /similarusers [GET]
Traceback (most recent call last):
  File "/opt/lib/python/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/lib/python/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/lib/python/site-packages/flask_cors/extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/opt/lib/python/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/opt/lib/python/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/opt/lib/python/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/lib/python/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/opt/lib/python/site-packages/flask_basicauth.py", line 108, in wrapper
    return view_func(*args, **kwargs)
  File "/opt/lib/python/site-packages/prometheus_flask_exporter/__init__.py", line 686, in func
    return current_app.handle_user_exception(ex)
  File "/opt/lib/python/site-packages/flask_cors/extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/opt/lib/python/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/opt/lib/python/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/opt/lib/python/site-packages/prometheus_flask_exporter/__init__.py", line 684, in func
    raise exception
  File "/opt/lib/python/site-packages/prometheus_flask_exporter/__init__.py", line 642, in func
    response = current_app.handle_user_exception(ex)
  File "/opt/lib/python/site-packages/flask_cors/extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/opt/lib/python/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/opt/lib/python/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/opt/lib/python/site-packages/prometheus_flask_exporter/__init__.py", line 639, in func
    response = f(*args, **kwargs)
  File "/srv/service/similar_users/wsgi.py", line 246, in get_similar_users
    update_coedit_data(user_text, edits, app.config["EDIT_WINDOW"])
  File "/srv/service/similar_users/wsgi.py", line 547, in update_coedit_data
    i for i, e in enumerate(revs) if e["user"] == user_text
  File "/srv/service/similar_users/wsgi.py", line 547, in <listcomp>
    i for i, e in enumerate(revs) if e["user"] == user_text
KeyError: 'user'

  2. http://localhost:8081/wiki/Special:SimilarEditors?wpTarget=Arjayay&quicksurvey=similareditors

similar-editors output:

[lots of calls to the MW API]
[2022-05-25 15:46:14 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:8)
[2022-05-25 15:46:14 +0000] [8] [INFO] Worker exiting (pid: 8)
[2022-05-25 15:46:14 +0000] [40] [INFO] Booting worker with pid: 40
  3. http://localhost:8081/wiki/Special:SimilarEditors?wpTarget=2601%3A845%3A100%3Ac00%3A38f7%3A2512%3Aa61e%3A105a&quicksurvey=similareditors

similar-editors output:

[2022-05-25 15:49:10,260] ERROR in wsgi: Failed to get additional edits for user 2601:845:100:c00:38f7:2512:a61e:105a: 'most_recent_edit'
ERROR:similar_users.factory:Failed to get additional edits for user 2601:845:100:c00:38f7:2512:a61e:105a: 'most_recent_edit'

On Special:SimilarEditors I see:

Notice: Undefined index: results in /var/www/html/w/extensions/SimilarEditors/src/SimilarEditorsClient.php on line 81

Warning: array_map(): Argument #2 should be an array in /var/www/html/w/extensions/SimilarEditors/src/SimilarEditorsClient.php on line 81

  4. http://localhost:8081/wiki/Special:SimilarEditors?wpTarget=Ser+Amantio+di+Nicolao&quicksurvey=similareditors

In MW debug logs:

[http] GET: http://blackbird:5000/similarusers?usertext=Ser Amantio di Nicolao
[http] Invalid URL: http&#58;//blackbird:5000/similarusers?usertext&#61;Ser Amantio di Nicolao

(I think this is due to there being a space in the username)

Tchanders added a subscriber: Isaac.

Thanks @dom_walden

@Tchanders I am seeing a few errors in the similar-editors Python script. I am wondering whether I should report them or whether they are due to the test data being incomplete.

  1. http://localhost:8081/wiki/Special:SimilarEditors?wpTarget=BD2412&quicksurvey=similareditors

Filed as: T309232: Similarusers service should check for 'user' key in results returned from Revisions API
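
A minimal sketch of the kind of guard T309232 asks for. It assumes, as the traceback suggests, that `revs` is a list of revision dicts from the Revisions API and that some entries can arrive without a `user` key (possibly when a user name has been hidden, which is a guess at the cause; the log only shows the missing key). The helper name `indices_for_user` and the sample data are hypothetical, not taken from the service:

```python
def indices_for_user(revs, user_text):
    """Return the positions in revs whose 'user' equals user_text.

    Revisions that have no 'user' key are skipped instead of raising KeyError.
    """
    return [
        i for i, e in enumerate(revs)
        if "user" in e and e["user"] == user_text
    ]


# Made-up sample data; the second entry has no 'user' key.
revs = [
    {"user": "BD2412", "timestamp": "2022-05-01T00:00:00Z"},
    {"userhidden": "", "timestamp": "2022-05-02T00:00:00Z"},
    {"user": "Someone else", "timestamp": "2022-05-03T00:00:00Z"},
    {"user": "BD2412", "timestamp": "2022-05-04T00:00:00Z"},
]
print(indices_for_user(revs, "BD2412"))
```

Checking `"user" in e` explicitly, rather than comparing `e.get("user")`, avoids a false match if `user_text` were ever `None`.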

  2. http://localhost:8081/wiki/Special:SimilarEditors?wpTarget=Arjayay&quicksurvey=similareditors

similar-editors output:

[lots of calls to the MW API]
[2022-05-25 15:46:14 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:8)
[2022-05-25 15:46:14 +0000] [8] [INFO] Worker exiting (pid: 8)
[2022-05-25 15:46:14 +0000] [40] [INFO] Booting worker with pid: 40

I believe this is a known limitation, but we should handle it better. Working out how to do that is part of T308649: Investigate: How to implement error handling for similar users api call [8H].

@Isaac Should a task be filed for this, or is this just something that should be worked around for now?

  3. http://localhost:8081/wiki/Special:SimilarEditors?wpTarget=2601%3A845%3A100%3Ac00%3A38f7%3A2512%3Aa61e%3A105a&quicksurvey=similareditors

similar-editors output:

[2022-05-25 15:49:10,260] ERROR in wsgi: Failed to get additional edits for user 2601:845:100:c00:38f7:2512:a61e:105a: 'most_recent_edit'
ERROR:similar_users.factory:Failed to get additional edits for user 2601:845:100:c00:38f7:2512:a61e:105a: 'most_recent_edit'

On Special:SimilarEditors I see:

Notice: Undefined index: results in /var/www/html/w/extensions/SimilarEditors/src/SimilarEditorsClient.php on line 81

Warning: array_map(): Argument #2 should be an array in /var/www/html/w/extensions/SimilarEditors/src/SimilarEditorsClient.php on line 81

Bug with SimilarEditors, but will be addressed in T308649: Investigate: How to implement error handling for similar users api call [8H]
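
The notice and warning above suggest the client reads a `results` key that is absent when the service reports an error. The extension itself is PHP; the sketch below uses Python only to stay in the same language as the service code in this thread, and it illustrates the defensive check rather than the actual fix. The function name `extract_results`, the `error` field, and the sample payloads are assumptions:

```python
import json


def extract_results(body):
    """Parse a service response body and return (results, error).

    results is always a list, so callers can map over it safely.
    error is None on success, otherwise a short description.
    """
    try:
        payload = json.loads(body)
    except ValueError:
        return [], "invalid JSON from service"
    if not isinstance(payload, dict):
        return [], "unexpected response shape"
    results = payload.get("results")
    if not isinstance(results, list):
        # The 'error' field name is an assumption made for illustration.
        return [], str(payload.get("error", "response has no 'results' list"))
    return results, None


print(extract_results('{"results": [{"user_text": "Example"}]}'))
print(extract_results('{"error": "most_recent_edit"}'))
```

Returning an empty list together with an error string lets the caller decide whether to show an error message or an empty table.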

  4. http://localhost:8081/wiki/Special:SimilarEditors?wpTarget=Ser+Amantio+di+Nicolao&quicksurvey=similareditors

In MW debug logs:

[http] GET: http://blackbird:5000/similarusers?usertext=Ser Amantio di Nicolao
[http] Invalid URL: http&#58;//blackbird:5000/similarusers?usertext&#61;Ser Amantio di Nicolao

(I think this is due to there being a space in the username)

Confirmed that it works with underscores instead. (This particular request actually times out, but other user names containing spaces fail when sent with spaces and work when sent with underscores.) Filed as T309235: URL encode user name in Similarusers service request URL
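
For illustration, percent-encoding the user name before building the request URL would look roughly like this. The extension is PHP, so this Python sketch only shows the idea; the function name `build_similarusers_url` is hypothetical, while the base URL and the `usertext` parameter come from the debug log above:

```python
from urllib.parse import quote, urlencode


def build_similarusers_url(base_url, user_text):
    """Build the request URL with the user name percent-encoded."""
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urlencode({"usertext": user_text}, quote_via=quote)
    return f"{base_url}/similarusers?{query}"


print(build_similarusers_url("http://blackbird:5000", "Ser Amantio di Nicolao"))
# http://blackbird:5000/similarusers?usertext=Ser%20Amantio%20di%20Nicolao
```

The same encoding also covers IPv6 addresses, whose colons become `%3A`.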

Should a task be filed for this, or is this just something that should be worked around for now?

Just to be clear: this question pertains to the timeout when gathering new contributions for Arjayay? If so, a few things going on here:

  • I don't think the databases for the tool have been updated in many, many months, so you're going to see this happen much more right now than you would when we have the monthly updates to the databases running.
  • Here is what happens: say the databases are good to 30 April 2022. If you query a user today, the tool hits the API for the pages they edited since 30 April. For each of these pages, the edit history (since 30 April in this example) is then gathered and analyzed. This is the step that's almost certainly timing out. Details:
    • The first set of API calls for pages edited is loosely capped at 1000 pages (code). It would be pretty cheap to reduce that cap to, say, 50 pages. Then if someone edited a lot recently, the first call to the tool would get those 50 pages. The next call would maybe get the next 50. And so on. So you essentially stretch out the data updates over multiple sessions so that no one session times out (hopefully), at the cost of maybe not having all the most current data in that first session. The timespan associated with the data is included in the API response though, so hopefully we could expose this easily to the user of the tool. We're also making some assumptions about the data that aren't perfect, and the more sessions this update is spread across, the more likely we are to introduce error into the data. I wouldn't be super concerned about this, and I haven't empirically evaluated it, but it's worth an FYI. Each monthly database update resets this error to 0 though, so that's good.
    • The second set of API calls (code) is the expensive step. Many active editors will have edited 1000 unique pages since the database was last refreshed, and each of those pages could have many associated API calls to get the edit history (especially right now). There may be ways to explicitly limit this process too, but that is a lot trickier to do (definitely its own task, and I don't know when/if it would be figured out). All that to say, it is much easier to address this at the prior page-gathering step.
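
The idea of capping the page-gathering step and spreading the update over several sessions can be sketched as follows. This is a toy model, not the service's code: the function name `next_batch`, the tuple layout, the oldest-first ordering, and the sample data are all assumptions made for illustration.

```python
def next_batch(edited_pages, cap=50):
    """Pick at most `cap` pages to refresh in this session.

    edited_pages: list of (timestamp, page_title) tuples for pages the user
    edited since the data was last refreshed, sorted oldest first.
    Returns (batch, covered_until), where covered_until is the timestamp up
    to which the user's data is current once the batch has been processed,
    or None if there was nothing to do.
    """
    batch = edited_pages[:cap]
    covered_until = batch[-1][0] if batch else None
    return batch, covered_until


# Simulate repeated queries for a user with 120 recently edited pages.
pending = [(f"2022-05-{1 + i // 5:02d}", f"Page {i}") for i in range(120)]
sessions = []
while pending:
    batch, covered_until = next_batch(pending, cap=50)
    sessions.append((len(batch), covered_until))
    pending = pending[len(batch):]

print(sessions)
# [(50, '2022-05-10'), (50, '2022-05-20'), (20, '2022-05-24')]
```

Each session does a bounded amount of work, and `covered_until` stands in for the timespan that the API response could expose to the user of the tool.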

A few paths forward I can see:

  • I think it's fair to decide it's not a top concern because we'll update the data before we test it with Checkusers (I hope) and so you should see this happen much less often then.
  • The error handling task you mentioned is obviously useful, because whatever solution we identify probably still won't be perfect.
  • The code change to reduce the pages checked is pretty simple. I'm just not certain who should be reviewing that. Now that the tool is a production-level service, I'm not able to test it easily, but I'm happy to submit a patch with the code that I believe should improve it if we have someone who can do the testing / validation / deployment of it.

I do sometimes see the MediaWiki debug logs reporting an HTTP timeout even if the similar-users service does not report a timeout error.

For example:

  • Here is the similar-users log: F35182249: similar-users_logs.txt

  • Here is the MW log for the same request (reporting timeout): F35182252: mw_debug_logs.txt

Requests to some users just return the table headers but no data, e.g. http://localhost:8081/wiki/Special:SimilarEditors?wpTarget=B2Belgium&quicksurvey=similareditors

only_headers.png (505×1 px, 48 KB)

@Tchanders I don't know if these are worth reporting separately.

@dom_walden I'm not sure about the timeouts, but I've added a question about the discrepancy to the performance review: T304633

F35182259

Filed as T310491

  • The code change to reduce the pages checked is pretty simple. I'm just not certain who should be reviewing that. Now that the tool is a production-level service, I'm not able to test it easily, but I'm happy to submit a patch with the code that I believe should improve it if we have someone who can do the testing / validation / deployment of it.

@Isaac Thanks for the detailed information! A patch would be helpful if you have the time, and we can test/review it. Otherwise, if you could file a task, we can look into writing a patch.

Thanks @Tchanders. I have nothing more to add here.

I tested Special:SimilarEditors for every username in the test data set. It either returned results or one of the errors/timeouts mentioned above.

I assume the results from the similarusers service are correct, but I don't know for sure as I have nothing to compare them to.