
Add option to refreshLinks.php to only update pages that haven't been updated since a timestamp
Open, Low, Public

Description

We can use the page_links_updated field to find pages that haven't been updated in a while. This makes it easier for us to run the refreshLinks.php script across large wikis without updating pages that were recently updated.
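A minimal sketch of the selection this option would perform, written in Python against an in-memory SQLite table rather than a real wiki database. The `page_id` and `page_links_updated` columns mirror MediaWiki's `page` table; the function names, the sample rows, and the choice to treat a NULL `page_links_updated` as "never refreshed, so include it" are assumptions made for illustration, not the script's actual behaviour.

```python
import sqlite3


def find_stale_pages(conn, cutoff):
    """Return page_ids whose links were never refreshed or were refreshed before cutoff.

    cutoff is a 14-character MediaWiki-style timestamp (YYYYMMDDHHMMSS);
    timestamps in that fixed-width format sort correctly as plain strings.
    """
    rows = conn.execute(
        "SELECT page_id FROM page "
        "WHERE page_links_updated IS NULL OR page_links_updated < ? "
        "ORDER BY page_id",
        (cutoff,),
    )
    return [row[0] for row in rows]


def demo_connection():
    """Build a tiny stand-in for the page table (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_links_updated TEXT)"
    )
    conn.executemany(
        "INSERT INTO page VALUES (?, ?)",
        [
            (1, "20160101000000"),  # refreshed long ago
            (2, "20170301120000"),  # refreshed recently
            (3, None),              # never refreshed
            (4, "20161115083000"),  # refreshed before the cutoff
        ],
    )
    return conn


if __name__ == "__main__":
    print(find_stale_pages(demo_connection(), "20170101000000"))
```

Only the pages returned by the query would then be passed to the expensive link refresh; everything else is skipped after a single timestamp comparison.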

Event Timeline

Legoktm created this task.Mar 3 2017, 5:20 AM

Hi, @Legoktm. What do you mean by "recently"? Last year? Last month? An hour since the previous script run? Half a time until today?
Also, a null edit is a very quick action; I run null-edit bots a lot. Do you think null-editing a page takes longer than checking the timestamp difference, including retrieving the timestamp itself?
Thanks.

Hi, @Legoktm. What do you mean by "recently"? Last year? Last month? An hour since the previous script run? Half a time until today?

The point is that it would be configurable by the person running the script. For the purposes of T157670, we'd probably use a timestamp from a few months back.
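As a rough illustration of "a timestamp from a few months back", the sketch below turns a maximum age in days into a 14-character MediaWiki-style timestamp (YYYYMMDDHHMMSS, UTC) that could serve as the cutoff. The function name `cutoff_timestamp` and the 90-day figure are hypothetical; the task does not fix a particular age or option name.

```python
from datetime import datetime, timedelta, timezone


def cutoff_timestamp(max_age_days, now=None):
    """Return the YYYYMMDDHHMMSS timestamp that lies max_age_days before now (UTC).

    now is injectable so the result is reproducible; by default the current
    UTC time is used.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - timedelta(days=max_age_days)).strftime("%Y%m%d%H%M%S")


if __name__ == "__main__":
    # Roughly "a few months": 90 days before the moment the script is run.
    print(cutoff_timestamp(90))
```

Pages whose `page_links_updated` value is older than the returned string would be the ones to refresh; whoever runs the script picks the age.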

Also, a null edit is a very quick action; I run null-edit bots a lot. Do you think null-editing a page takes longer than checking the timestamp difference, including retrieving the timestamp itself?

Checking whether the page was updated recently is way faster than just running the updates again. Making null edits may seem fast because the server defers some processing until later and tries to give you output as soon as it can, but when we're talking about millions of pages across all wikis, it quickly adds up.

Thank you, @Legoktm.

The point is that it would be configurable by the person running the script. For the purposes of T157670, we'd probably use a timestamp from a few months back.

And who is this person? It should run automatically, once in a while, shouldn't it?

Checking whether the page was updated recently is way faster than just running the updates again. Making null edits may seem fast because the server defers some processing until later and tries to give you output as soon as it can, but when we're talking about millions of pages across all wikis, it quickly adds up.

Thanks, I see.

The point is that it would be configurable by the person running the script. For the purposes of T157670, we'd probably use a timestamp from a few months back.

And who is this person? It should run automatically, once in a while, shouldn't it?

For now it's me doing it manually, but in the future it should run automatically and regularly, using some predetermined time.

For now it's me doing it manually, but in the future it should run automatically and regularly, using some predetermined time.

Very well, @Legoktm. So will there be an option to run it whenever I want and set the "recently" time as I want? And one last question: will the script also run automatically on a schedule?