Page MenuHomePhabricator

Write scripts to populate /ask from wp backlogs
Closed, ResolvedPublic

Event Timeline

prnk28 created this task.Jun 30 2016, 4:53 PM
Restricted Application added subscribers: Zppix, Aklapper. · View Herald TranscriptJun 30 2016, 4:53 PM

Just to be sure, why do we need the script to be run via ask(). Ask can just handle user-typed inputs and render them in iframes
Can't we just run the script externally, so that queries directly form question files in the records directory so that they can cater to answer()?

@Jsalsman Here's a simple script that scans a backlog page and outputs all article titles along with links on that page.
https://github.com/priyankamandikal/minireview/blob/master/backlog_script.py

Could you please clarify my doubt in the previous comment? Can't I just form question files from this script directly instead of passing to ask?
Each file that's created from a backlogs categ will have its content as:
<Category name>
<Title>
<url>
So when this file is served to a user in /answer, we can display backlog categ and title and display the url in an iframe.

Does that sound good?

prnk28 added a comment.Jul 2 2016, 7:50 AM

I updated the script to reflect the changes. It now extracts articles from a given backlog category and month and makes questions out of all of them
@Jsalsman , is there anything else to be done to mark this task as complete?

@prnk28, I don't know what BeautifulSoup is or what it does. Is this
running anywhere I can look at its output easily? What does one of the -q
files it creates look like? Maybe you can attach one or provide a URL to a
set, please?

prnk28 added a comment.EditedJul 2 2016, 10:28 AM

BeutifulSoup is a python library that makes it very easy extract data out of html
https://www.crummy.com/software/BeautifulSoup/bs4/doc/

You can check the records dir that has 200 files created by the script
https://github.com/priyankamandikal/minireview/tree/master/records

For example, records/000000020q looks like this:

Category:Wikipedia_articles_in_need_of_updating_from_May_2016
American Shakespeare Theatre
https://en.wikipedia.org/wiki/Category:Wikipedia_articles_in_need_of_updating_from_May_2016
Jsalsman triaged this task as High priority.Jul 2 2016, 11:38 PM
Jsalsman lowered the priority of this task from High to Medium.Jul 3 2016, 11:40 AM

@Jsalsman , what else needs to be done here?

@prnk28, re-write them to POST to /ask instead of just creating files, once
you get the iframeurl form element working. That way you can let other
people create questions from their own methods without giving them write
access to records/

Jsalsman raised the priority of this task from Medium to High.Jul 8 2016, 5:44 AM

@prnk28 is working on this; trying "recently" as a keyword to use with its wikiwho date for stale passages.

Jsalsman added a comment.EditedJul 10 2016, 11:40 PM

@prnk28 Don't forget to make these for

  • using arbitrary strings and regular expressions in place of "recently" and e.g. "last year" with Wikiwho age thresholds of those strings (i.e. from known paid editing incidents such as "hydraulic fracturing" etc.)
  • using diffs from particular usernames (e.g. the Wikipedia Education Program students)
  • your Flesch-Kincaid readability test implementation

Is there any way to get a list of all article names in Wikipedia? I need to test the script on a huge bulk of articles to find the ones containing 'recent' or 'recently'.

This task was for generating questions from WP:Backlogs. I have created a new task for the next task: T140559

prnk28 closed this task as Resolved.Jul 17 2016, 12:53 PM