
Wikia returns cached pages for get.py editarticle.py
Closed, DeclinedPublic

Description

Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1537/
Reported by: throwy
Created on: 2012-11-07 12:36:40
Subject: Wikia returns cached pages for get.py editarticle.py
Original description:
get.py and editarticle.py use a method of page fetching that results in cached pages from Wikia
replace.py uses the pagegenerator method, which fetches the latest version of pages from Wikia

The issue is probably a Wikia issue, but it would be nice to implement a workaround in pywikipediabot.

Steps to reproduce:
Create or edit a page on a Wikia wiki. Fetch the page with editarticle.py or get.py . The bot should fetch a cached version. Edit the page with replace.py and the bot should fetch the most recent version, which is the expected behavior.
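One common workaround for stale caches on MediaWiki-based hosts is to ask the server to purge its cached copy before fetching. A minimal sketch of the idea, assuming the wiki exposes the standard `action=purge` and `action=query` API modules (the host and title below are just the ones from this report, used as placeholders):

```python
import urllib.parse

def build_purge_url(api_base, title):
    """Build a MediaWiki action=purge URL that asks the server to
    invalidate its cached copy of `title` before the next fetch."""
    params = {"action": "purge", "titles": title, "format": "json"}
    return api_base + "?" + urllib.parse.urlencode(params)

def build_fetch_url(api_base, title):
    """Build the revisions query used to fetch the latest wikitext."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvlimit": "1",
        "titles": title,
        "format": "json",
    }
    return api_base + "?" + urllib.parse.urlencode(params)

# Purge first, then fetch, so the second request sees a fresh copy:
api = "http://mlp.wikia.com/api.php"
purge_url = build_purge_url(api, "Template:Date/doc")
fetch_url = build_fetch_url(api, "Template:Date/doc")
```

This is only a sketch of the purge-then-fetch pattern; whether Wikia's cache layer honors the purge is exactly what this bug is about.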

Comments:
Someone had already solved this issue for me on #pywikipediabot on freenode. It requires very little alteration to get.py and editarticle.py. Unfortunately I did not back up or document the changes before updating pywikipediabot from SVN and the changes were lost.

----

$ python version.py
Pywikipedia [http] trunk/pywikipedia (r10663, 2012/11/04, 19:53:31)
Python 2.7.3 (v2.7.3:70274d53c1dd, Apr 9 2012, 20:52:43)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
config-settings:
use_api = True
use_api_login = True
unicode test: ok


Version: compat-(1.0)
Severity: normal
See Also:
https://sourceforge.net/p/pywikipediabot/bugs/1537

Details

Reference
bz55165

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 2:24 AM
bzimport set Reference to bz55165.
bzimport added a subscriber: Unknown Object (????).

Some remarks:
I changed the hostname() in the family file to "mlp.wikia.com" and used the following statements:

import wikipedia as wp
s = wp.getSite('wikia', 'wikia')
p = wp.Page(s, 'Template:Date/doc')
t = p.get(force=True)

result:
Traceback (most recent call last):
  File "<pyshell#69>", line 1, in <module>
    t = p.get(force=True)
  File "wikipedia.py", line 699, in get
    expandtemplates = expandtemplates)
  File "wikipedia.py", line 800, in _getEditPage
    "Page does not exist. In rare cases, if you are certain the page does exist, look into overriding family.RversionTab")
NoPage: (wikia:wikia, u'[[wikia:Template:Date/doc]]', 'Page does not exist. In rare cases, if you are certain the page does exist, look into overriding family.RversionTab')

the query param dict was:
{'inprop': ['protection', 'subjectid'], 'rvprop': ['content', 'ids', 'flags', 'timestamp', 'user', 'comment', 'size'], 'prop': ['revisions', 'info'], 'titles': u'Template:Date/doc', 'rvlimit': 1, 'action': 'query'}

the result data dict was:
{u'query': {u'pages': {u'-1': {u'protection': [], u'ns': 10, u'missing': u'', u'title': u'Template:Date/doc'}}}}
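The `-1` page key and the empty `missing` field are how the MediaWiki API flags a nonexistent title, and that marker is what makes the framework raise NoPage above. A small sketch of the check, run against the exact response dict quoted here:

```python
def page_is_missing(api_result):
    """Return True if a MediaWiki query response marks the title as
    missing: such pages get a negative page id and a 'missing' key."""
    pages = api_result.get("query", {}).get("pages", {})
    return any("missing" in page for page in pages.values())

# The response dict quoted in this comment:
result = {
    "query": {
        "pages": {
            "-1": {
                "protection": [],
                "ns": 10,
                "missing": "",
                "title": "Template:Date/doc",
            }
        }
    }
}
```

Since the very same URL returns revision content in a browser, the `missing` marker here points at something differing between the two requests server-side (most plausibly a cache layer), not at the page actually being absent.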

and last the url is:
/api.php?inprop=protection%7Csubjectid&format=json&rvprop=content%7Cids%7Cflags%7Ctimestamp%7Cuser%7Ccomment%7Csize&prop=revisions%7Cinfo&titles=Template%3ADate/doc&rvlimit=1&action=query

which gives the right result via browser e.g.:
http://mlp.wikia.com/api.php?inprop=protection%7Csubjectid&format=json&rvprop=content%7Cids%7Cflags%7Ctimestamp%7Cuser%7Ccomment%7Csize&prop=revisions%7Cinfo&titles=Template%3ADate/doc&rvlimit=1&action=query&format=jsonfm

I found the patched editarticle.py on pastebin, woohoo!

<pre>33a34
> import pagegenerators
157c158
< self.page = pywikibot.Page(site, pageTitle)
---
> self.page = iter(pagegenerators.PreloadingGenerator([pywikibot.Page(site, pageTitle)])).next()</pre>
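The diff swaps a lazy, per-page fetch (`pywikibot.Page(...)` plus a later `get()`) for a PreloadingGenerator, which fetches page content in bulk up front. A generic sketch of that pattern, written with a stub fetcher rather than the real pywikibot API (function names and the batch size are illustrative assumptions, not the library's):

```python
def preloading_generator(titles, fetch_batch, batch_size=60):
    """Yield (title, text) pairs, fetching content in batches up front
    instead of one page at a time (mirrors what PreloadingGenerator does)."""
    batch = []
    for title in titles:
        batch.append(title)
        if len(batch) >= batch_size:
            yield from fetch_batch(batch)
            batch = []
    if batch:
        yield from fetch_batch(batch)

# Stub standing in for the bulk fetch (Special:Export / API in compat):
def fake_fetch(batch):
    return [(t, "text of " + t) for t in batch]

# Like the patched editarticle.py: wrap a single page and take the first item.
first = next(iter(preloading_generator(["Template:Date/doc"], fake_fetch)))
```

Wrapping even a single page this way matters here only because the bulk fetch path happens to bypass whatever cache the per-page path hits.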

diff of editarticle.py with working pagegenerators fetching

diff of get.py with working pagegenerators fetching

Yes, this patch retrieves the page content via Special:Import instead of the API, because the API bulk call is not approved for the trunk release. Thus this patch wouldn't work for the rewrite branch.

Anyway, it is not clear to me why the API returns the data for a browser call but not via the bot framework's query.

I changed the hostname() in the family file to "mlp.wikia.com" and used the following statements:

import wikipedia as wp
s = wp.getSite('wikia', 'wikia')
p = wp.Page(s, 'Template:Date/doc')
t = p.get(force=True)

works for me.

adding version info:
Pywikipedia [https] r/pywikibot/compat (r10308, a208b54, 2013/09/24, 09:51:19, ok)
Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)]
config-settings:
use_api = True
use_api_login = True
unicode test: ok

Was this ever reported to Wikia? If not, please write to community AT wikia.com

Aklapper triaged this task as Lowest priority.Jun 5 2015, 1:41 PM
Aklapper subscribed.

Pywikibot has two versions: Compat and Core. This task was filed about the older version, Pywikibot-compat, which is no longer under active development. Hence I'm lowering the priority of this task to reflect reality.

Unfortunately, the Pywikibot team does not have the manpower to retest every single bug report / feature request against the (maintained) Pywikibot code base. Furthermore, the code base of Pywikibot-compat has changed a lot compared to Pywikibot-core, so there is a chance that the problem described in this task no longer exists.

Please help: if you have time and interest in Pywikibot, please upgrade to Pywikibot-core and add a comment to this task if the problem still happens there (or directly edit the task by removing the Pywikibot-compat project and adding the Pywikibot project). To learn more about Pywikibot and to get involved in its development, please check out https://www.mediawiki.org/wiki/Manual:Pywikibot/Development Thank you for your understanding.

Xqt subscribed.

The compat branch is closed.