Improve Copy Patrol performance
Closed, Resolved · Public · 3 Estimated Story Points

Description

We should try to mass-query the API and database to speed things up.

Pull request at https://github.com/Niharika29/PlagiabotWeb/pull/13/files, but it's admittedly quite messy.
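
For context, a minimal sketch of the kind of batched API lookup meant here (Python with requests purely for illustration; the tool itself is PHP, and the endpoint, props, and titles below are assumptions, not the code in the pull request):

```python
import requests

API_URL = "https://en.wikipedia.org/w/api.php"  # assumed wiki

def fetch_page_info(titles):
    """Query basic info for a batch of titles in a single API request."""
    response = requests.get(API_URL, params={
        "action": "query",
        "prop": "info",
        "titles": "|".join(titles),  # pipe-separated batch instead of one request per title
        "format": "json",
        "formatversion": 2,          # flat list of page objects in the response
    })
    response.raise_for_status()
    return response.json()["query"]["pages"]

pages = fetch_page_info(["Albert Einstein", "Marie Curie", "Nonexistent page xyz"])
for page in pages:
    print(page["title"], "missing" if page.get("missing") else "exists")
```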

Event Timeline

MusikAnimal set the point value for this task to 3. Jun 23 2016, 3:47 PM
MusikAnimal moved this task from Ready to Needs Review/Feedback on the Community-Tech-Sprint board.

Also, I was unable to get prepared statements to work when querying the db: https://github.com/Niharika29/PlagiabotWeb/pull/13/files#diff-07db3f9aa54a2870d9c9854e2389730fR60

Not sure how necessary that is. For this particular query I don't see a risk of SQL injection, and that database is read-only anyway.

@MusikAnimal Bryan commented on that. I left a couple of comments too.
Nope, no real risk of SQL injection, but it's good to be consistent.
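
For reference, a sketch of what a parameterized (prepared-statement style) version of such a read-only lookup could look like. This uses Python with PyMySQL rather than the tool's PHP, and the connection details, table, and column names are placeholders, not the real schema:

```python
import pymysql

# Placeholder connection details; the real database is read-only and lives elsewhere.
conn = pymysql.connect(host="localhost", user="reader", password="secret",
                       database="copypatrol_example")

diff_ids = [101, 102, 103]  # hypothetical ids to look up in one go
placeholders = ", ".join(["%s"] * len(diff_ids))
sql = f"SELECT id, page_title FROM example_diffs WHERE id IN ({placeholders})"

with conn.cursor() as cursor:
    # The driver sends the values separately from the SQL text,
    # so untrusted input can't alter the query structure.
    cursor.execute(sql, diff_ids)
    rows = cursor.fetchall()

conn.close()
```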

Merged and deployed the first round of improvements :) Still going to leave this open because I think we can do one better: if we authenticate through the API with bot credentials, we can query for all dead links at once. There's a fair amount of work involved, I think, since we'd probably want to use some sort of API framework.

Does this sound worthwhile? If so, whose bot credentials should we use? We can use MusikBot's if you want; I don't think anyone will care or even notice, but it might make more sense to have our own Community Tech bot account purely for the purpose of getting around API limitations.
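
If we did go that route, the login step itself is not much code. A rough sketch of the standard action=login flow with a bot password (Python for illustration only; the account name and password are placeholders, and whether the account actually gets apihighlimits depends on its rights):

```python
import requests

API_URL = "https://en.wikipedia.org/w/api.php"
session = requests.Session()  # keeps the login cookies between requests

# 1. Fetch a login token.
token = session.get(API_URL, params={
    "action": "query", "meta": "tokens", "type": "login",
    "format": "json",
}).json()["query"]["tokens"]["logintoken"]

# 2. Log in with a bot password (placeholder credentials).
login = session.post(API_URL, data={
    "action": "login",
    "lgname": "CommunityTechBot@copypatrol",  # hypothetical bot-password name
    "lgpassword": "bot-password-here",
    "lgtoken": token,
    "format": "json",
}).json()
assert login["login"]["result"] == "Success"

# 3. Queries made on this session can then batch up to 500 titles per request,
#    provided the account has the apihighlimits right.
```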

@MusikAnimal: Why do we need to authenticate as a bot? The maximum limit should be 500 for regular users and 5000 for bots, but we're only showing 50 titles at a time.

@MusikAnimal Actually, with the performance improvements already in place, we can increase the default number of records to 100, and that should be enough. We shouldn't put more effort into performance improvements until we know people are actively using the tool and running into performance issues.

> @MusikAnimal: Why do we need to authenticate as a bot? The maximum limit should be 500 for regular users and 5000 for bots, but we're only showing 50 titles at a time.

It looks like we're using the basic query endpoint, which, at least according to the docs, only accepts up to 50 titles at a time.
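
So without higher limits, a client with many titles has to split them into batches of 50 and make one request per batch. A minimal illustration (Python; the titles are made up):

```python
def chunked(titles, size=50):
    """Split a list of titles into API-sized batches."""
    for i in range(0, len(titles), size):
        yield titles[i:i + size]

# 150 titles -> 3 separate API requests at the 50-title limit,
# versus a single request if the 500-title bot limit applied.
all_titles = [f"Example page {n}" for n in range(150)]
batches = list(chunked(all_titles))
print(len(batches))  # 3
```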

> @MusikAnimal Actually, with the performance improvements already in place, we can increase the default number of records to 100, and that should be enough. We shouldn't put more effort into performance improvements until we know people are actively using the tool and running into performance issues.

I think the desire for 100 records instead of 50 was because users were having to do a lot of scrolling to get to items that actually needed reviewing (correct me if I'm wrong). With the advent of the filters, it seems there's not much of an advantage to increasing the pagination size; it probably takes users a good while to carefully review 50 records. I'm just really liking opening up /copypatrol and seeing something within 3-4 seconds, and doubling the page size would almost bring us back to where we were before.

I was also thinking that with the improved performance we could add infinite scroll. By the time they're reviewing item 40, 41, 42, etc., the next chunk of 50 records will have already loaded in the background.

@MusikAnimal: You're right, I had the wrong order of magnitude. We're still only loading 50 pages at a time though, so I don't see why we would need to authenticate (since the limit is 50).

I think we should leave it how it is (50 at a time with a load more button) until someone lets us know that they want something different. When in doubt, keep it simple.

> @MusikAnimal: You're right, I had the wrong order of magnitude. We're still only loading 50 pages at a time though, so I don't see why we would need to authenticate (since the limit is 50).

> I think we should leave it how it is (50 at a time with a load more button) until someone lets us know that they want something different. When in doubt, keep it simple.

The API is needed to check for dead links, so the articles plus the editors' user pages and user talk pages amount to up to 150 page titles. We could turn our current 3 API queries into one, and if we also grabbed the info on WikiProjects all at once, it would probably speed things up noticeably. I had a few other ideas on how to improve load time, but the tool indeed goes reasonably fast right now, so nothing high-priority :)
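
To make that concrete, here is a rough sketch of what a single combined lookup could look like: build one title list covering the article plus the editor's user and user talk pages for each row, then flag anything the API reports as missing. Python for illustration only, with made-up row data; it glosses over chunking at the per-request title limit and over how the tool actually fetches WikiProject info:

```python
import requests

API_URL = "https://en.wikipedia.org/w/api.php"

# Hypothetical CopyPatrol rows: each needs the article, the editor's user
# page, and their user talk page checked -- three titles per row.
rows = [
    {"page": "Albert Einstein", "editor": "ExampleUser"},
    {"page": "Marie Curie", "editor": "AnotherUser"},
]

titles = []
for row in rows:
    titles += [row["page"], f"User:{row['editor']}", f"User talk:{row['editor']}"]

# One request for all of them (up to the per-request title limit).
pages = requests.get(API_URL, params={
    "action": "query",
    "prop": "info",
    "titles": "|".join(titles),
    "format": "json",
    "formatversion": 2,
}).json()["query"]["pages"]

dead = {p["title"] for p in pages if p.get("missing")}
print(dead)  # titles that would get the "dead link" treatment in the UI
```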

@MusikAnimal: Ah, now I understand :) Sounds like a good idea, but low priority. We have a Community Tech bot that we could use as the account. Let's create a new card for that and close this one.