Improve Copy Patrol performance
Closed, Resolved · Public · 3 Estimated Story Points

Description

We should try to mass-query the API and database to speed things up.

Pull request at https://github.com/Niharika29/PlagiabotWeb/pull/13/files, but it's admittedly quite messy.
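
For context, a minimal sketch of the kind of batched API lookup meant here (Python with requests purely for illustration; the tool itself is PHP, and the endpoint, props, and titles below are assumptions, not the code in the pull request):

```python
import requests

API_URL = "https://en.wikipedia.org/w/api.php"  # assumed wiki

def fetch_page_info(titles):
    """Query basic info for a batch of titles in a single API request."""
    response = requests.get(API_URL, params={
        "action": "query",
        "prop": "info",
        "titles": "|".join(titles),  # pipe-separated batch instead of one request per title
        "format": "json",
        "formatversion": 2,          # flat list of page objects in the response
    })
    response.raise_for_status()
    return response.json()["query"]["pages"]

pages = fetch_page_info(["Albert Einstein", "Marie Curie", "Nonexistent page xyz"])
for page in pages:
    print(page["title"], "missing" if page.get("missing") else "exists")
```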

Event Timeline

MusikAnimal set the point value for this task to 3. Jun 23 2016, 3:47 PM
MusikAnimal moved this task from Ready to Needs Review/Feedback on the Community-Tech-Sprint board.

Also, I was unable to get prepared statements to work when querying the db: https://github.com/Niharika29/PlagiabotWeb/pull/13/files#diff-07db3f9aa54a2870d9c9854e2389730fR60

Not sure how necessary that is. For this particular query I don't see a risk of SQL injection, and that database is read-only anyway.

@MusikAnimal Bryan commented on that. I left a couple of comments too.
Nope, no real risk of SQL injection, but it's good to be consistent.
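
For reference, a sketch of what a parameterized (prepared-statement style) version of such a read-only lookup could look like. This uses Python with PyMySQL rather than the tool's PHP, and the connection details, table, and column names are placeholders, not the real schema:

```python
import pymysql

# Placeholder connection details; the real database is read-only and lives elsewhere.
conn = pymysql.connect(host="localhost", user="reader", password="secret",
                       database="copypatrol_example")

diff_ids = [101, 102, 103]  # hypothetical ids to look up in one go
placeholders = ", ".join(["%s"] * len(diff_ids))
sql = f"SELECT id, page_title FROM example_diffs WHERE id IN ({placeholders})"

with conn.cursor() as cursor:
    # The driver sends the values separately from the SQL text,
    # so untrusted input can't alter the query structure.
    cursor.execute(sql, diff_ids)
    rows = cursor.fetchall()

conn.close()
```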

Merged and deployed the first round of improvements :) Still going to leave this open because I think we can do one better: if we authenticate through the API with bot credentials, we can query for all dead links at once. There's a fair amount of work involved, I think, since we'd probably want to use some sort of API framework.

Does this sound worthwhile? If so, whose bot credentials should we use? We can use MusikBot's if you want; I don't think anyone will care or even notice, but it might make more sense to have our own Community Tech bot account purely for the purpose of getting around API limitations.
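
If we did go that route, the login step itself is not much code. A rough sketch of the standard action=login flow with a bot password (Python for illustration only; the account name and password are placeholders, and whether the account actually gets apihighlimits depends on its rights):

```python
import requests

API_URL = "https://en.wikipedia.org/w/api.php"
session = requests.Session()  # keeps the login cookies between requests

# 1. Fetch a login token.
token = session.get(API_URL, params={
    "action": "query", "meta": "tokens", "type": "login",
    "format": "json",
}).json()["query"]["tokens"]["logintoken"]

# 2. Log in with a bot password (placeholder credentials).
login = session.post(API_URL, data={
    "action": "login",
    "lgname": "CommunityTechBot@copypatrol",  # hypothetical bot-password name
    "lgpassword": "bot-password-here",
    "lgtoken": token,
    "format": "json",
}).json()
assert login["login"]["result"] == "Success"

# 3. Queries made on this session can then batch up to 500 titles per request,
#    provided the account has the apihighlimits right.
```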

@MusikAnimal: Why do we need to authenticate as a bot? The maximum limit should be 500 for regular users and 5000 for bots, but we're only showing 50 titles at a time.

@MusikAnimal Actually, with the performance improvements already in place, we can increase the default number of records to 100, and that should be enough. We shouldn't put more effort into performance improvements until we know people are actively using the tool and running into performance issues.

> @MusikAnimal: Why do we need to authenticate as a bot? The maximum limit should be 500 for regular users and 5000 for bots, but we're only showing 50 titles at a time.

It looks like we're using the basic query endpoint, which, at least according to the docs, only accepts up to 50 titles at a time.
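
So without higher limits, a client with many titles has to split them into batches of 50 and make one request per batch. A minimal illustration (Python; the titles are made up):

```python
def chunked(titles, size=50):
    """Split a list of titles into API-sized batches."""
    for i in range(0, len(titles), size):
        yield titles[i:i + size]

# 150 titles -> 3 separate API requests at the 50-title limit,
# versus a single request if the 500-title bot limit applied.
all_titles = [f"Example page {n}" for n in range(150)]
batches = list(chunked(all_titles))
print(len(batches))  # 3
```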

> @MusikAnimal Actually, with the performance improvements already in place, we can increase the default number of records to 100, and that should be enough. We shouldn't put more effort into performance improvements until we know people are actively using the tool and running into performance issues.

I think the desire for 100 records instead of 50 was because users were having to do a lot of scrolling to get to items that actually needed reviewing (correct me if I'm wrong). With the advent of the filters, it seems there's not much of an advantage to increasing the pagination size; it probably takes users a good while to carefully review 50 records. I'm just really liking opening up /copypatrol and seeing something within 3-4 seconds, and doubling the page size would almost bring us back to where we were before.

I was also thinking that with the improved performance we could add infinite scroll. By the time they're reviewing item 40, 41, 42, etc., the next chunk of 50 records will have already loaded in the background.

@MusikAnimal: You're right, I had the wrong order of magnitude. We're still only loading 50 pages at a time though, so I don't see why we would need to authenticate (since the limit is 50).

I think we should leave it how it is (50 at a time with a load more button) until someone lets us know that they want something different. When in doubt, keep it simple.

> @MusikAnimal: You're right, I had the wrong order of magnitude. We're still only loading 50 pages at a time though, so I don't see why we would need to authenticate (since the limit is 50).

> I think we should leave it how it is (50 at a time with a load more button) until someone lets us know that they want something different. When in doubt, keep it simple.

The API is needed to check for dead links, so the articles plus the editors' user pages and user talk pages amount to up to 150 page titles. We could turn our current 3 API queries into one, and if we also grabbed the info on WikiProjects all at once, it would probably speed things up noticeably. I had a few other ideas on how to improve load time, but the tool indeed goes reasonably fast right now, so nothing high-priority :)
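
To make that concrete, here is a rough sketch of what a single combined lookup could look like: build one title list covering the article plus the editor's user and user talk pages for each row, then flag anything the API reports as missing. Python for illustration only, with made-up row data; it glosses over chunking at the per-request title limit and over how the tool actually fetches WikiProject info:

```python
import requests

API_URL = "https://en.wikipedia.org/w/api.php"

# Hypothetical CopyPatrol rows: each needs the article, the editor's user
# page, and their user talk page checked -- three titles per row.
rows = [
    {"page": "Albert Einstein", "editor": "ExampleUser"},
    {"page": "Marie Curie", "editor": "AnotherUser"},
]

titles = []
for row in rows:
    titles += [row["page"], f"User:{row['editor']}", f"User talk:{row['editor']}"]

# One request for all of them (up to the per-request title limit).
pages = requests.get(API_URL, params={
    "action": "query",
    "prop": "info",
    "titles": "|".join(titles),
    "format": "json",
    "formatversion": 2,
}).json()["query"]["pages"]

dead = {p["title"] for p in pages if p.get("missing")}
print(dead)  # titles that would get the "dead link" treatment in the UI
```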

@MusikAnimal: Ah, now I understand :) Sounds like a good idea, but low priority. We have a Community Tech bot that we could use as the account. Let's create a new card for that and close this one.