Hello! It would be nice if you added a -match option to pagegenerators.py, so that the script works only on pages that match some regexp.
Details
- Reference
- bz55078
Event Timeline
-regex is used for replacement. How can we solve a task such as "replace text using some regex if the article matches some other regex"?
Do you expect something like -requiretext:XYZ (in combination with -regex), which could be the opposite of -excepttext:XYZ, analogous to the existing -requiretitle vs. -excepttitle?
I know a workaround for this in compat, but that feature has not been ported to core yet. Hopefully will be soon...
(First round: use replace.py "someregex" "foobar" -save:something.txt, then do the actual replacements with -file:something.txt.)
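The two-round idea above can be illustrated in plain Python. This is only a sketch of the concept, not what replace.py does internally: the toy `WIKI` dict, `save_matching_titles` and `replace_in_listed_pages` are hypothetical names, and the file merely plays the role of something.txt.

```python
import os
import re
import tempfile

# Toy wiki: title -> text (made-up data for illustration only).
WIKI = {
    "Apple": "colour: red\ncategory: fruit",
    "Brick": "colour: red\ncategory: building material",
    "Pear": "colour: green\ncategory: fruit",
}


def save_matching_titles(wiki, condition, path):
    """Round 1: write the titles of pages whose text matches `condition`."""
    rx = re.compile(condition)
    with open(path, "w", encoding="utf-8") as fh:
        for title, text in sorted(wiki.items()):
            if rx.search(text):
                fh.write(title + "\n")


def replace_in_listed_pages(wiki, path, pattern, repl):
    """Round 2: apply the real replacement only to the titles listed in the file."""
    with open(path, encoding="utf-8") as fh:
        titles = [line.strip() for line in fh if line.strip()]
    return {title: re.sub(pattern, repl, wiki[title]) for title in titles}


with tempfile.TemporaryDirectory() as tmp:
    list_file = os.path.join(tmp, "something.txt")
    save_matching_titles(WIKI, r"category: fruit", list_file)
    changed = replace_in_listed_pages(WIKI, list_file, r"colour", "color")

print(sorted(changed))
```

Only the pages that passed the first round are touched in the second, so "Brick" keeps its original text even though it also contains "colour".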
As far as I can see, the -grep option provides this:
-grep A regular expression that needs to match the article otherwise the page won't be returned. Multiple -grep:regexpr can be provided and the page will be returned if content is matched by any of the regexpr provided. Case insensitive regular expressions will be used and dot matches any character, including a newline.
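The matching rules in that help text can be sketched in plain Python. This is only an illustration, not Pywikibot's actual code: `grep_filter` and the `(title, text)` tuples are hypothetical stand-ins for the real filter and page objects, and using `search` (match anywhere in the text) is an assumption.

```python
import re


def grep_filter(pages, patterns):
    """Yield only the (title, text) pairs whose text matches any pattern.

    Hypothetical stand-in for a -grep style content filter.
    """
    # Case-insensitive, and "." also matches newlines, as the help text says.
    compiled = [re.compile(p, re.IGNORECASE | re.DOTALL) for p in patterns]
    for title, text in pages:
        # Assumption: a match anywhere in the text is enough.
        if any(rx.search(text) for rx in compiled):
            yield title, text


pages = [
    ("Alpha", "First line\nSecond line with FOO"),
    ("Beta", "Nothing interesting here"),
    ("Gamma", "start\nmiddle\nend"),
]

matched = [title for title, _ in grep_filter(pages, [r"foo", r"start.*end"])]
print(matched)
```

"Alpha" is kept because matching ignores case, and "Gamma" is kept because the dot is allowed to cross line breaks; a page is returned as soon as any one of the patterns matches.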
- -grep matches regex in page title
- -search:'insource://' matches regex inside page content. This is not ideal (doesn't work with ^, $, \s, ...), but it usually will do just fine
Why do we need non-ideal solutions?
-search:'insource://' is just a workaround for this issue. It works with MediaWiki's CirrusSearch, which does not support some regex operators (e.g. T135280)
OK. Replace.py is very important to me, as I use it heavily for multiple purposes, and I am highly interested in its performance. I contributed a lot to the compat version, but now I have trouble both with using core and with getting back into development. I will look into the problem when I am able.
No, -grep matches the page contents. -titleregex matches the page title. This is clearly documented (https://github.com/wikimedia/pywikibot-core/blob/master/pywikibot/pagegenerators.py#L307, https://github.com/wikimedia/pywikibot-core/blob/master/pywikibot/pagegenerators.py#L226), and corresponds (as far as I can see) to the actual code.
That's weird, for me -grep never worked for page contents at all. I'll try next time once more
Update: It finally works, wow, but a generator is still missing
But maybe the more important difference:
- -search:'insource://' is a generator
- -grep is a filter
Because -search:'insource://' is not ideal (missing support for \s, \n, ^, $, ...), a solution to this task would still be helpful
We already have a -grep filter. It works together with any generator, e.g. with -start. There is no such -match filter on the API side that could be used.
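A minimal sketch of how a content filter composes with an arbitrary generator, and of what that costs. Everything here is hypothetical (the toy `WIKI` dict, `start_generator`, `counting`, `content_filter`); it mimics the idea of combining -start with -grep and is not Pywikibot code.

```python
import re

# Toy wiki: title -> text (made-up data for illustration only).
WIKI = {
    "Apple": "A fruit.\nSee also: Pear",
    "Banana": "Another fruit.",
    "Cherry": "Small fruit.\nSee also: Apple",
    "Date": "Sweet fruit.",
}


def start_generator(wiki, start):
    """Yield (title, text) alphabetically from `start` on, like a -start style source."""
    for title in sorted(wiki):
        if title >= start:
            yield title, wiki[title]


def counting(pages, counter):
    """Pass pages through unchanged while counting how many were pulled."""
    for page in pages:
        counter["seen"] += 1
        yield page


def content_filter(pages, pattern):
    """Keep only pages whose text matches `pattern`; works on any page iterable."""
    rx = re.compile(pattern, re.IGNORECASE | re.DOTALL)
    return (page for page in pages if rx.search(page[1]))


counter = {"seen": 0}
source = counting(start_generator(WIKI, "B"), counter)
kept = [title for title, _ in content_filter(source, r"fruit\.\s+see also")]
print(kept, counter["seen"])
```

The filter accepts the full regex syntax, including `\s`, but it still has to pull every page from the upstream generator (three here) to keep one, which is the practical difference from a search-based generator that only returns matching pages.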