User Details
- User Since
- Jun 19 2015, 6:00 PM (547 w, 2 d)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- Nettrom
Nov 25 2023
SuggestBot’s web service has been shut off. qstat reports no running jobs, and there are no crontab entries for the tool account.
May 30 2020
I'd like to close this as "declined" for now, as we haven't really seen any interest in this since the last comment. If there's interest, and I'm able to get more focused time to work on this part of SuggestBot, then we can reopen this. It could also be a potential candidate for a Hackathon project, but I don't know much about the criteria for those.
I don't have the bandwidth to work on this, so I'm removing myself as the assignee.
I've updated the project page on meta so it marks the research project as completed, and links to the GitHub repository that contains the code I wrote during the project.
Aug 26 2019
@Bstorm : looks like https://gerrit.wikimedia.org/r/513943 did not include a definition of comment_archive; was that intentional? I would expect there to be a comment_archive table to allow for joining with the archive table when querying archived edit comments (similar to how there's an actor_archive table), but that table doesn't exist on the replicated databases on Toolforge.
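For illustration, the join I'd expect to be able to run looks roughly like this. The `comment_archive` layout below is an assumption (mirroring the `comment` table's `comment_id`/`comment_text` columns), since that view is exactly what's missing:

```python
import sqlite3

# Toy schema sketching the expected join; comment_archive's layout here is an
# assumption, mirroring the comment table (comment_id, comment_text).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE archive (ar_id INTEGER, ar_title TEXT, ar_comment_id INTEGER);
CREATE TABLE comment_archive (comment_id INTEGER, comment_text TEXT);
INSERT INTO archive VALUES (1, 'Deleted_page', 10);
INSERT INTO comment_archive VALUES (10, 'fixing a typo');
""")

# Look up the edit comment of an archived revision:
rows = conn.execute("""
    SELECT ar_title, comment_text
    FROM archive
    JOIN comment_archive ON ar_comment_id = comment_id
""").fetchall()
# rows == [('Deleted_page', 'fixing a typo')]
```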
Dec 15 2018
The suggestbot-prod instance has been shut down and deleted (ref the suggestbot log). SuggestBot is now running from suggestbot-01. Move completed, task resolved.
Oct 27 2018
The migration process has been started by creating a new instance suggestbot-01. Had a bit of a start-stop-start situation as the process of moving instances to the new eqiad region is coinciding with this work, so the new instance had a short life in the old region, but is now present in eqiad1-r.
Aug 21 2018
The report is now posted as a sub-page of the AfC Process Improvement page on enwiki. Marking this as resolved and reassigning it so I can track it there in case it gets reopened.
Fixed it by getting an email alias set up, so I'm marking this as resolved.
Aug 20 2018
Can confirm I have access and everything seems to be working. Thanks for taking care of this, and so quickly as well, awesome work!
Is this something that can be resolved on the Phabricator end, or should I look for a workaround? Either way is fine with me, as long as I can get a second account set up.
Aug 13 2018
@Milimetric Thanks for taking care of the SQL queries! I don't see a need for backfilling the data at the moment; the benefit doesn't warrant the cost. As mentioned, I can help the NPP folks out with getting their data together. In other words, as far as I can tell, this ticket can be closed now.
Aug 10 2018
@Niharika Yes, I'd like to keep this open and try to wrap it up in the near future, if that's okay with you?
I don't think backfilling all the data is very important. The only ones that appear to be affected are the NPP reviewers, and I should be able to run some queries on the Data Lake to either fill the missing data, or get reasonable estimates they can use.
Apr 30 2018
I see you're running into some of the same challenges that I had with getting good data on this for ACTRIAL, and that you've found some of the code and data that I have. Since I'm currently working on T192574, there's also some newer code and data available.
Apr 23 2018
The data gathering for this is now running, and I expect it'll take a day or two to complete. I also updated the database schema to have a column for the timestamp when a submission was withdrawn so that we can use that to better estimate the contribution to the AfC backlog from pages created in the Draft namespace (hypothesis 17).
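As a rough sketch of the schema change (the `submissions` table and column names here are made up for illustration; the real schema differs):

```python
import sqlite3

# Hypothetical AfC submissions table; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE submissions (page_id INTEGER, submitted TEXT)")

# The schema update: record when a submission was withdrawn (NULL = still live).
conn.execute("ALTER TABLE submissions ADD COLUMN withdrawn TEXT")

conn.executemany("INSERT INTO submissions VALUES (?, ?, ?)",
                 [(1, "2018-04-01", None),          # still pending
                  (2, "2018-04-02", "2018-04-10")]) # withdrawn

# Withdrawn submissions no longer contribute to the AfC backlog estimate:
backlog = conn.execute(
    "SELECT COUNT(*) FROM submissions WHERE withdrawn IS NULL").fetchone()[0]
# backlog == 1
```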
Mar 27 2018
I've spent a bit of time looking at this, and as far as I can find, the revision_deleted_timestamp is consistently incorrect. Using a sample dataset of creations from four different months, I've found that 15% of the time the deletion timestamp is missing. For pages that have it set, the vast majority of entries (almost 90%) do not match against the logging table. Lastly, of those that match against the logging table, it's almost always not a page deletion event.
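The checks behind those numbers were along these lines; the records and field names below are hypothetical stand-ins for the sample dataset, not the real data:

```python
# Rough sketch of the consistency checks described above. Each record pairs a
# page's revision_deleted_timestamp with its matching logging-table action,
# if any (None = no match); all values here are made up.
sample = [
    (None, None),                     # deletion timestamp missing
    ("20180101000000", None),         # set, but no logging-table match
    ("20180102000000", "move/move"),  # matches, but not a deletion event
    ("20180103000000", "delete/delete"),
]

missing = sum(1 for ts, _ in sample if ts is None)
have_ts = [(ts, act) for ts, act in sample if ts is not None]
unmatched = sum(1 for _, act in have_ts if act is None)
matched = [act for _, act in have_ts if act is not None]
not_deletions = sum(1 for act in matched if act != "delete/delete")

print(missing / len(sample))         # share with missing timestamp
print(unmatched / len(have_ts))      # share of set timestamps with no match
print(not_deletions / len(matched))  # share of matches that aren't deletions
```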
Mar 22 2018
As mentioned on IRC earlier today, I never filed a ticket because I didn't have the time to sit down and make sure I had data that allowed me to understand exactly what the problem is. Picked it up again today because I now have some time to dig in.
Jan 17 2018
I checked the dashboard for enwiki and spot-checked a dataset, and the data appears to be in working order. Thanks for helping take care of this @Milimetric, and great to learn there's a way to easily fix this next time!
Nov 21 2017
Never mind, it turns out @mforns has already updated that configuration; I should've checked that first. Thanks again for taking care of it!
The Page Creation Dashboard is configured to read data from the log database on dbstore1002. Can I at this point submit a patch to the ReportUpdater configuration updating it to use db1108.eqiad.wmnet, as that host now has the updated log database?
Oct 26 2017
Looks good to me, thanks again!
Sep 27 2017
- Verified that the dataset of number of pages created is available in the correct dataset directory.
- Added metric for number of pages created in the Draft namespace to Dashiki:CategorizedMetrics in this edit.
- Added metric to Config:Dashiki:PageCreations in this edit.
- Verified that the metric is now available, it can be viewed here.
Sep 12 2017
@kaldari : The last three metrics are only defined for English Wikipedia, partly because I saw them as ACTRIAL-specific. When it comes to the autopatrol right, those are also defined for different user groups depending on what wiki we're looking at, and I didn't see the benefit of figuring those out for the entire set of wikis.
@Nuria : Thanks for taking care of this! Sorry I didn't get around to updating the commit message as you requested, forgot to put that on my todo list.
New dataset has now been uploaded to figshare. If this direct link does not work, use this DOI link and download the "2017_english_wikipedia_quality_dataset.tar.bz2" file.
Sep 11 2017
There's a user group called "autoreviewer" that specifically gets the "autopatrol" user right. That right is also applied to bots and admins. Or at least that's how I read en:Special:ListGroupRights. The help page mentions that it used to be called "autoreviewer", so I guess they just never renamed the user group.
@kaldari : No, I really mean "autoreviewer", ref en:Special:ListGroupRights. I haven't been able to find any documentation that defines the user group in the system as "autopatrolled". And yes, I find that confusing.
@Neil_P._Quinn_WMF : I actually ran a query to get similar data on Friday, because I've been using it to figure out how long it takes for articles to get reviewed. My current best version of the query is in our GitHub repository: non_autopatrolled_creations.hql It looks for non-autopatrolled creations, but it's trivial to calculate the opposite proportion as I also have data on all article creations.
@awight : I was working on this yesterday, but didn't get the dataset ready overnight. The process I have goes as follows:
@Nuria : I added a short note to the tutorial about the requirements. Since I don't know npm very well, it's rather non-specific on how to get them installed. I'll make a mental note to look into nvm on a rainy day; that might let me make the note more specific, since I'd then know how to do this both for a global npm install and for a local one using nvm.
Sep 8 2017
@Nuria: I've tested our dashboard locally here and everything seemed to be working just fine. How do we go about getting it deployed? In this specific project, having a VM on Labs isn't really an option.
Sep 6 2017
Ah, I see! The tutorial isn't aligned with said documentation then. I'll update the tutorial and move forward.
From what I can tell after digging around a bit, the configuration of the Dashiki extension limits the creation of pages in the "Config" namespace to ones with titles starting with "Dashiki:" (refs [1,2]). Thus, I can create "Config:Dashiki:PageCreations", but not "Config:PageCreations"; I suspect the latter is instead a pseudo page used by the JsonConfig extension.
Sep 5 2017
@Nuria : I'm working on this now, got the metrics added to [[m:Dashiki:CategorizedMetrics]] without breaking anything, or so it seems. I do not have permissions to create [[m:Config:PageCreationDashboard]], but it appears I can edit existing dashboards. Could you (or someone else who has permissions, pinging @kaldari) create the config page for our dashboard so I can edit it? Feel free to create it with a different title if the one I suggested breaks conventions.
Aug 31 2017
Ah, I remember being confused by the configuration file path in the examples I looked at, but forgot to ask about what it should be. Thanks for figuring that out and updating it, and also for your help with reviewing the patch, much appreciated!
Aug 29 2017
I'm a bit pressed for time at the moment, so to prevent this from stalling I'd like to propose that a first priority is that I try to create a dataset that doesn't have any redirects in it. Given the low number of redirects we have in the dataset, I expect this problem to be minimal if I simply sample a few hundred extra articles in the classes where that is possible. I'll also make sure the dataset doesn't contain any disambiguation pages.
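The resampling step could look roughly like this (the candidate pool, class labels, and helper below are made up for illustration):

```python
import random

# Hypothetical candidate pool: (title, quality_class, is_redirect) tuples.
# A real pool would also carry a disambiguation flag to filter on.
pool = [
    ("Alpha", "FA", False), ("Beta", "FA", True), ("Gamma", "FA", False),
    ("Delta", "GA", False), ("Epsilon", "GA", False), ("Zeta", "GA", True),
]

def sample_class(pool, quality_class, n, rng=random.Random(42)):
    """Sample up to n non-redirect articles from the given quality class."""
    candidates = [t for t, c, redir in pool if c == quality_class and not redir]
    return rng.sample(candidates, min(n, len(candidates)))

# Redirects never make it into the sample:
fa_sample = sample_class(pool, "FA", 2)
```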
Aug 28 2017
Aug 25 2017
I adapted this query for use in gathering some statistics for the ACTRIAL project and noticed that it seemed to fail to pick up deleted articles. In my dataset gathered a week ago there are 730 article creations on 2017-01-01, and 729 of those currently exist in the revision table. A key reason appears to be that event_comment for those deleted articles is NULL, leading any `event_comment NOT REGEXP 'foo'` clause to remove those rows from the query result.
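A minimal sketch of the NULL behaviour, using SQLite with a registered REGEXP function to mimic MySQL's semantics (the table and comment text are made up):

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite rewrites "X REGEXP Y" as regexp(Y, X); return None for NULL input
# to mimic MySQL, where NULL NOT REGEXP 'foo' evaluates to NULL.
conn.create_function(
    "REGEXP", 2,
    lambda pat, s: None if s is None else int(re.search(pat, s) is not None))

conn.execute("CREATE TABLE events (page TEXT, event_comment TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [("kept", "created page"),
                  ("deleted", None)])  # comment is NULL for the deleted page

# NULL NOT REGEXP 'bot' is NULL, not TRUE, so the deleted page's row vanishes:
rows = conn.execute(
    "SELECT page FROM events WHERE event_comment NOT REGEXP 'bot'").fetchall()

# Guarding with IS NULL keeps it in the result:
fixed = conn.execute(
    "SELECT page FROM events "
    "WHERE event_comment IS NULL OR event_comment NOT REGEXP 'bot'").fetchall()
```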
Aug 23 2017
@mforns Patch submitted (linked below), and I added you as a reviewer. First time working with Gerrit, hopefully I got it mostly right! Happy to make changes as need be, fun to learn how to do this. Thanks again!
@mforns : Thanks much for your help with this! I've set up the queries so they return two columns, with the second named after the wiki as you recommended. Also, thanks for the link to the tutorial; it's a lot easier to follow than the technical documentation ([[:wikitech:Analytics/Systems/Dashiki]]). I'd be happy to add a link to the tutorial from that page if that's useful?
Aug 22 2017
Just a heads-up that we've rephrased our hypotheses around patroller workload since the start of the ACTRIAL project, and "number of active patrollers" is now one of our measurements together with a few related ones. Ref hypotheses 9–13 on our project page: https://meta.wikimedia.org/wiki/Research:Autoconfirmed_article_creation_trial I plan to reuse your query for counting number of active patrollers, thanks!
I'm working on this and got ReportUpdater working locally. A couple of questions:
Jul 14 2017
I've gathered revision timestamps for all the revisions in the published dataset, and also checked for redirects. Here are some summaries:
Jun 8 2017
Coming back to this I have a bunch of questions, so I'll just ask them and see where we go from there. Apologies if this is counterproductive, feel free to let me know how to improve in future work.
Jun 7 2017
@Mavrikant Excellent! The extractor looks good to go as far as I can tell. Also, happy to hear that you don't have HTML comments in your WikiProject templates, that makes life a lot easier :)
Jun 6 2017
@Mavrikant: thanks for getting code for the trwiki extractor up on https://github.com/Mavrikant/wikiclass/blob/master/wikiclass/extractors/trwiki.py, it makes everything a lot easier!
Jan 4 2017
3,746,600 rows. The file I'm importing is 259MiB when unzipped.
Dec 7 2016
I just want the record to show my appreciation for your work on this @scfc, thank you so much for figuring this out! I'll look into switching the qsub calls with jsub as you suggest. Thanks again for the work you've done on this!
Nov 28 2016
@Halfak: No, currently nothing interesting to bring, unfortunately.
Oct 25 2016
I'm thinking the 2015 CSCW paper is a better citation for this dataset (and ORES in general moving forward). While the approach is the same in both that and the "Tell Me More" paper, the more recent paper does a much better job of figuring out the appropriate features and the data gathering for training it is much better. So it's overall just more similar to the model ORES has.
Sep 11 2016
I think there are two AQ datasets going around. One is the one @Ladsgroup pointed to, which I believe @Halfak gathered, and is used for ORES training and evaluation. The second is the one I used to do some additional training to improve the wikiclass library, and that's already on figshare: https://figshare.com/articles/English_Wikipedia_Quality_Asssessment_Dataset/1375406 This second dataset is gathered by following the process described in our 2015 CSCW paper, and referenced in the figshare description.
Sep 5 2016
SuggestBot's cron jobs for editing https://en.wikipedia.org/wiki/Wikipedia:Community_portal/Opentask appear to be executing just fine. I also noticed that changing crontab worked on tools-bastion-02 for the tools.suggestbot account.
Aug 22 2016
Switching to trusty by adding the parameter fixed it, sorry for not thinking about that before creating the task. Closing it as resolved.
Jul 18 2016
Tested now in Safari (9.1.1) and it works there too. Possibly an older Safari version fails, or there's been an update to Swagger? Not sure, but labelling it "WORKSFORME". I see little reason to investigate further.
Jul 15 2016
Forgot to add the opening greeting and the screenshot, fixed now: https://no.wikipedia.org/w/index.php?title=Wikipedia:Tinget&oldid=16529063
Made a couple of copyedits, translated it to Norwegian, and posted it to their Village Pump (https://no.wikipedia.org/w/index.php?title=Wikipedia:Tinget&oldid=16529056).
Jul 14 2016
@Halfak : Sure thing, do you have an example post, or some specific points that need to be mentioned?
Jun 14 2016
Sounds like this got resolved, awesome!
Jun 13 2016
@Harej Sorry about the delays, have been busy rewriting much of SuggestBot to get article page view data back. Am clear to work on this now and will get an updated version running that only recommends articles within specific projects.
Jun 2 2016
I went through and copied some commonly used insults into the bad words list. I noticed that the generated list contains quite a lot of variations; will you be stemming the words?
May 31 2016
With regards to the web service being restarted: I count 80 "No running webservice" notifications in ~suggestbot/service.log for 2016-05-30. There are a handful of requests in the access.log for the same day, so it seems that the service is still restarted for no apparent reason.
May 6 2016
Memory usage has been one of my concerns as SuggestBot's link recommender is an iterative breadth-first search from a set of articles, and that's what is mostly used. However, the script (~tools.suggestbot/link-rec/link-recommender.py) is reasonably memory-optimized as it used to run on the Toolserver, which had a 1GB memory limit.
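The search itself is the standard iterative BFS pattern; here's a minimal sketch with a toy link graph (this is not SuggestBot's actual code, just an illustration of the memory behaviour):

```python
from collections import deque

def reachable_articles(links, seeds, max_depth=2):
    """Iterative breadth-first search over article links, up to max_depth hops.

    `links` maps an article title to the titles it links to. Keeping the
    frontier in a deque and tracking `seen` keeps memory proportional to the
    number of articles visited, rather than growing a call stack with depth.
    """
    seen = set(seeds)
    queue = deque((s, 0) for s in seeds)
    while queue:
        article, depth = queue.popleft()
        if depth == max_depth:
            continue  # don't expand beyond the depth limit
        for target in links.get(article, ()):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen

# Toy link graph; SuggestBot reads real wikilinks instead.
graph = {"A": ["B", "C"], "B": ["D"], "D": ["E"]}
found = reachable_articles(graph, ["A"], max_depth=2)
# found == {"A", "B", "C", "D"}; "E" is three hops away, so it's not visited
```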
May 5 2016
I'm not sure if this ticket should be closed now. While I rarely experience issues with accessing my web services, the problem with the server being restarted appears to persist. I wasn't sure if the deployment of T98440 would change anything; for May 1–4 I see no significant change, with the web service restarted 80, 74, 70, and 73 times, respectively. All of these have the matching pattern mentioned in T133090#2221574, although the average time before restart is down to 5s.
Apr 21 2016
There are 71 "server stopped" patterns in ~/error.log as well. All of them have timestamps within 6–12s (mean 7.89s) after each of the timestamps in the service log. So yes, it's the same pattern as mentioned in T133090#2221574.
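Computing those deltas is straightforward; the timestamps below are made-up examples of paired service.log and error.log entries, not the real log data:

```python
from datetime import datetime

# Made-up example entries: each "server stopped" timestamp in error.log paired
# with the preceding restart timestamp in service.log.
service_log = ["2016-04-20 10:00:00", "2016-04-20 10:05:00"]
error_log   = ["2016-04-20 10:00:07", "2016-04-20 10:05:09"]

fmt = "%Y-%m-%d %H:%M:%S"
deltas = [
    (datetime.strptime(e, fmt) - datetime.strptime(s, fmt)).total_seconds()
    for s, e in zip(service_log, error_log)
]
mean_delta = sum(deltas) / len(deltas)
# deltas == [7.0, 9.0], mean_delta == 8.0
```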
I haven't seen many 503s over the past couple of days, so I regard this particular task as resolved. At the same time, I logged 71 restarts of the web service yesterday (see below), so that appears to be something that needs looking at. Should I open a separate ticket for that?
Apr 20 2016
I unfortunately have to report that the problem wasn't resolved: when I checked a couple of hours after @valhallasw's update, 503s were again being reported.
