Similar to T67192, but for -recentchanges and its recentChanges generator.
While this is not a one-liner, it is marked as good first task because T67192 provides a good example of the changes, and it is quite easy to test.
Similar to T67192, but for -recentchanges and its recentChanges generator.
While this is not a one-liner, it is marked as good first task because T67192 provides a good example of the changes, and it is quite easy to test.
Project | Branch | Lines +/- | Subject | |
---|---|---|---|---|
pywikibot/core | master | +62 -9 | Reimplement imageuncat's -recentchanges as a pagegenerators CLA |
Status | Subtype | Assigned | Task | ||
---|---|---|---|---|---|
Resolved | darthbhyrava | T129193 reimplement imageuncat's -recentchanges as a pagegenerators command line argument | |||
Resolved | darthbhyrava | T130189 Remove unnecessary props from rcprop in RecentChangesPageGenerator |
@jayvdb
There is already an existing -recentchanges:x command line argument in pagegenerators, which returns the x most recently changed pages (if x not mentioned, then x=60), while the function of -recentchanges in imageuncat is to return pages changed between 70 and 120 minutes ago.
Would you like me to implement a new command line argument in pagegenerators (maybe -recentchanges-timespan or -recentchanges:timespan or something similar) for imageuncat's functionailty?
Or I could modify the existing -recentchanges in different ways:
This task is about improving pagegenerators' -recentchanges . Please see T67192 and the code used to fix it.
I did go through that task and the corresponding code. :)
My doubt was merely if you wanted a modification of the existing parameter, or the creation of a new one, since the task's wording is 'reimplement'.
Since you have now mentioned 'improving', and since the quoted task also involves modification of an existing parameter, I take it that we have to modify -recentchanges such that it accepts both -recentchanges:x and -recentchanges:delay,block.
I will go ahead with that. Please correct me if I am wrong?
Thank you. :)
Change 276535 had a related patch set uploaded (by Darthbhyrava):
Change: Reimplement imageuncat's -recentchanges as a pagegenerators CL arg
I've added a patch, but am not sure about the tests. Could someone please take a look?
pagegenerators_tests throws the following error:
$ nosetests -v tests.pagegenerators_tests ERROR: Test recentchanges generator with offset and duration params. ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/darthbhyrava/Dev/GSoC/pywikibot/core/tests/pagegenerators_tests.py", line 809, in test_recentchanges_timespan gf.handleArg('-recentchanges:120,70') File "/home/darthbhyrava/Dev/GSoC/pywikibot/core/pywikibot/pagegenerators.py", line 704, in handleArg _filter_unique=self._filter_unique) File "/home/darthbhyrava/Dev/GSoC/pywikibot/core/pywikibot/tools/__init__.py", line 1358, in wrapper return obj(*__args, **__kw) File "/home/darthbhyrava/Dev/GSoC/pywikibot/core/pywikibot/pagegenerators.py", line 1144, in RecentChangesPageGenerator user=user, excludeuser=excludeuser) File "/home/darthbhyrava/Dev/GSoC/pywikibot/core/pywikibot/tools/__init__.py", line 1358, in wrapper return obj(*__args, **__kw) File "/home/darthbhyrava/Dev/GSoC/pywikibot/core/pywikibot/site.py", line 4368, in recentchanges self.assert_valid_iter_params('recentchanges', start, end, reverse) File "/home/darthbhyrava/Dev/GSoC/pywikibot/core/pywikibot/site.py", line 2240, in assert_valid_iter_params "%s: start must be later than end with reverse=False" % msg_prefix) Error: recentchanges: start must be later than end with reverse=False
Should I swap the rcstart and rcend values in my edit of pagegenerators or is problem somewhere else?
Though both recentChangesin imageuncat and RecentChangesPageGenerator in pagegenerators call site.recentchanges (here's a paste listing out both), the differences seem to be as follows:
gen.request['rcprop'] = 'title|user|comment|ids' for p in gen: yield pywikibot.Page(site, p['title'], p['ns'])
gen = (pywikibot.Page(site, x['title']) for x in gen if x['type'] != 'log' or 'title' in x)
If the above can be resolved, then I wouldn't need to have an extra from_imageuncat param.
I think that p['ns'] should not be an issue.
Actually I do not understand why the docstring in RecentChangesPageGenerator says:
@param changetype: only iterate changes of this type ("edit" for edits to existing pages, "new" for new pages, "log" for log entries)
while, then, log is excluded.
Wrong docstring or clause?
We already have -logevents to deal with recent log entries, that is probably why it is excluded. I'll look into it, and confirm.
About the p['ns'] - should I include it or exclude it when merging with imageuncat's code?
And the issue of gen.request ['rcprop'] remains. What can be done about it if we merge the code without the from_imageuncat param?
@jayvdb?
Agree.
Actually I do not understand why the docstring in RecentChangesPageGenerator says:
@param changetype: only iterate changes of this type ("edit" for edits to existing pages, "new" for new pages, "log" for log entries)while, then, log is excluded.
Wrong docstring or clause?
Regarding for x in gen if x['type'] != 'log' or 'title' in x
This is to avoid trying to generate a Page for a log entry where the title has been suppressed. If the log entry has a title, it will be generated.
This is a feature of RecentChangesPageGenerator, and it is a bug in imageuncat's recentChanges. git blame shows it was slightly re-arranged in fc70846a, and the original bug fix is 996fa7e6.
So we dont want to re-implement this bug in imageuncat. (And this is a great example of why using a unified implementation in pagegenerators is a good idea: bug fixes in pagegenerators benefit all scripts that use it, and the bug fix is unlikely to be copied to separate implementations in scripts).
The default rcprop value is documented on mw:API:RecentChanges as timestamp|title|ids, so imageuncat value of title|user|comment|ids has two additional props : user and comment. They are explained on the aforementioned API manual page.
recentChanges fetches these additional properties, but then it only uses p['title'] and p['ns'] , so it is not using the values that prop comment or user provide.
So why is it there? It was added in September 2009 by 7654b64b , with a vague commit message referring to @Multichill who might know more about why it was added. However a little git blame analysis allows us to deduce why the change happened.
In September 2009, APISite.recentchanges was using rcprop="user|comment|timestamp|title|ids|redirect|patrolled|loginfo|flags" (it is only slightly different in the current 2016 code), so that commit was reducing the number of rc props being used: it removed timestamp and redirect|patrolled|loginfo|flags. A comment in that commit says:
So it seems the main objective of the patch was to remove patrolled from the list of requested props. It also avoided requesting unnecessary/unused properties. user and comment are two additional unused properties that could have removed been removed in that changeset, but were not. They are not needed.
Now when we look at RecentChangesPageGenerator, we see it is using APISite.recentchanges, and APISite.recentchanges does not request the patrolled prop. So, the main objective of this line in imageuncat's recentChanges appears to have been solved in APISite.recentchanges. Therefore, imageuncat should be able to use RecentChangesPageGenerator without any problem without a modified rcprop.
Could you please find the commit which removed the patrolled prop from APISite.recentchanges?
However, RecentChangesPageGenerator is requesting many other props which are unnecessary as they are not used. RecentChangesPageGenerator` only uses title and type -- everything else is discarded. So there is a performance improvement which can be made to RecentChangesPageGenerator, by setting the rcprop to only include the props which are used. This should be a separate (new) changeset.
That was certainly very instructive. Thanks. :D
Looking up APISite.recentchanges, I can see the code you speak of mentioning rcprop in recentchanges:
rcgen = self._generator(api.ListGenerator, type_arg="recentchanges", rcprop="user|comment|timestamp|title|ids" "|sizes|redirect|loginfo|flags", namespaces=namespaces, step=step, total=total)
Interestingly, the September '09 change by Alex removed 'patrolled' because of denied permission. (Whitelist issues, maybe?) Anyway, here's the commit you wanted. :)
gen = (pywikibot.Page(site, x['title']) for x in gen if x['type'] != 'log' or 'title' in x)
So, since we don't want to re-implement the non-titled entries bug in imageuncat, that's the issue resolved here.
gen.request['rcprop'] = 'title|user|comment|ids' for p in gen: yield pywikibot.Page(site, p['title'], p['ns'])
And since we just saw how imageuncat doesn't, in fact, call the extra props, a non-modified rcprop will do. That's this issue solved, too. :D
I'll submit another patch with these changes in mind, then.
However, RecentChangesPageGenerator is requesting many other props which are unnecessary as they are not used. RecentChangesPageGenerator` only uses title and type -- everything else is discarded. So there is a performance improvement which can be made to RecentChangesPageGenerator, by setting the rcprop to only include the props which are used. This should be a separate (new) changeset.
I've created a new task T130189 for this changeset, which can be a blocking task for this one. Description literally being your words. :D Take a look?
I'm trying to understand the difference between @deprecated and @deprecated() decorators, as defined in tools, and I didn't quite understand what 'callable' in the description of @deprecated() meant:
# The latter is invoked with the decorator arguments as *args & **kwargs, # and it must return a callable which will be invoked with the decorated # function as args[0].
I've gone with: @deprecated('RecentChangesPagesGenerator') to deprecate recentChanges in -imageuncat.
I've submitted a patch for the blocking task as well as updated patch for this one, please take a look? @jayvdb
Change 276535 merged by jenkins-bot:
Reimplement imageuncat's -recentchanges as a pagegenerators CLA