Write scripts to populate /ask from wp backlogs
Closed, Resolved (Public)

Event Timeline

Just to be sure: why does the script need to be run via ask()? ask() can just handle user-typed input and render it in iframes.
Can't we run the script externally, so that it writes question files directly into the records directory for answer() to serve?

@Jsalsman Here's a simple script that scans a backlog page and outputs all article titles along with links on that page.
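For readers without the script at hand, here is a minimal sketch of that kind of scan. It uses the standard library's html.parser instead of the library the actual script uses, so it runs without extra dependencies. The sample HTML, the /wiki/ link filter, BASE_URL, and the names TitleLinkParser and extract_titles are all assumptions for illustration, not taken from the script.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://en.wikipedia.org"  # assumed host for resolving relative links


class TitleLinkParser(HTMLParser):
    """Collects (title, absolute URL) pairs from article links."""

    def __init__(self):
        super().__init__()
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        title = attr.get("title")
        # Assumption: article links look like /wiki/Name, and a colon in the
        # name marks a non-article namespace (Category:, Help:, ...).
        # Limitation: this also skips real articles with a colon in the title.
        if title and href.startswith("/wiki/") and ":" not in href[len("/wiki/"):]:
            self.results.append((title, urljoin(BASE_URL, href)))


def extract_titles(html):
    """Return every (title, url) pair found in an HTML string."""
    parser = TitleLinkParser()
    parser.feed(html)
    return parser.results


# Made-up fragment standing in for a fetched backlog category page.
sample = """
<div id="mw-pages">
  <a href="/wiki/American_Shakespeare_Theatre" title="American Shakespeare Theatre">American Shakespeare Theatre</a>
  <a href="/wiki/Category:Stubs" title="Category:Stubs">Stubs</a>
</div>
"""
for title, url in extract_titles(sample):
    print(title, url)
```

Only the first link in the sample is reported; the category link is filtered out by the namespace check.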

Could you please clarify my question from the previous comment? Can't I just create question files directly from this script instead of passing them to ask?
Each file created from a backlog category will have as its content:
<Category name>
So when this file is served to a user in /answer, we can display the backlog category and title, and show the URL in an iframe.

Does that sound good?
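A rough sketch of what writing such files could look like. The file name pattern (nine zero-padded digits followed by "q") is modeled on the records/000000020q example that appears later in this thread; the three-line layout (category, title, URL) and the function name write_question_files are assumptions, since the thread does not spell out the exact file format.

```python
import os
import tempfile


def write_question_files(records_dir, category, articles, start=0):
    """Write one question file per (title, url) pair and return the paths."""
    os.makedirs(records_dir, exist_ok=True)
    paths = []
    for offset, (title, url) in enumerate(articles):
        # Assumed naming: nine zero-padded digits followed by "q".
        path = os.path.join(records_dir, "%09dq" % (start + offset))
        with open(path, "w", encoding="utf-8") as f:
            # Assumed layout: category, title, and URL on separate lines.
            f.write("%s\n%s\n%s\n" % (category, title, url))
        paths.append(path)
    return paths


articles = [
    ("American Shakespeare Theatre",
     "https://en.wikipedia.org/wiki/American_Shakespeare_Theatre"),
]
# A throwaway directory stands in for records/ in this demo.
with tempfile.TemporaryDirectory() as records:
    for path in write_question_files(records, "Example backlog category", articles, start=20):
        print(os.path.basename(path))
```

Writing files directly like this requires write access to the records directory, which matters for anyone else who wants to contribute questions.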

I updated the script to reflect the changes. It now extracts articles from a given backlog category and month and makes questions out of all of them.
@Jsalsman, is there anything else to be done to mark this task as complete?

@prnk28, I don't know what BeautifulSoup is or what it does. Is this
running anywhere I can look at its output easily? What does one of the -q
files it creates look like? Maybe you can attach one or provide a URL to a
set, please?

BeautifulSoup is a Python library that makes it very easy to extract data out of HTML.

You can check the records dir, which has 200 files created by the script.

For example, records/000000020q looks like this:

American Shakespeare Theatre

Jsalsman lowered the priority of this task from High to Medium. Jul 3 2016, 11:40 AM

@Jsalsman, what else needs to be done here?

@prnk28, re-write them to POST to /ask instead of just creating files, once
you get the iframeurl form element working. That way you can let other
people create questions from their own methods without giving them write
access to records/
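A hedged sketch of the POST variant, using only the standard library. The iframeurl field name comes from this thread; the question field name, the ASK_URL address, and the function name build_ask_request are hypothetical placeholders, since the actual /ask form is not shown here. The request is only built, not sent.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

ASK_URL = "http://localhost:8080/ask"  # hypothetical address of the /ask endpoint


def build_ask_request(question, iframe_url, ask_url=ASK_URL):
    """Build (but do not send) a form-encoded POST request for /ask."""
    form = {
        "question": question,     # hypothetical field name
        "iframeurl": iframe_url,  # form element named in the thread
    }
    data = urlencode(form).encode("utf-8")
    return Request(ask_url, data=data, method="POST")


req = build_ask_request(
    "Example backlog category",
    "https://en.wikipedia.org/wiki/American_Shakespeare_Theatre",
)
print(req.get_method(), req.full_url)
print(parse_qs(req.data.decode("utf-8")))
# To actually submit it: urllib.request.urlopen(req)
```

Going through the endpoint means the question generator never touches the records directory itself.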

Jsalsman raised the priority of this task from Medium to High. Jul 8 2016, 5:44 AM

@prnk28 is working on this, trying "recently" as a keyword, paired with its WikiWho date, to find stale passages.
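One way the keyword-plus-age idea might be sketched. This assumes the passage dates have already been obtained somehow; it does not call WikiWho, and the (text, date_added) pair format, the sample passages, and the name find_stale_passages are invented for illustration. The pattern is a parameter, so any string or regular expression can stand in for "recently".

```python
import re
from datetime import date


def find_stale_passages(passages, pattern, min_age_days, today):
    """Return (text, age_in_days) for passages that match and are old enough.

    `passages` is a list of (text, date_added) pairs. How those dates are
    derived is outside this sketch.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    stale = []
    for text, added in passages:
        age = (today - added).days
        if age >= min_age_days and regex.search(text):
            stale.append((text, age))
    return stale


# Invented sample data.
passages = [
    ("The bridge was recently renovated.", date(2009, 5, 1)),
    ("The bridge spans the river.", date(2009, 5, 1)),
    ("A new mayor was recently elected.", date(2016, 6, 1)),
]
hits = find_stale_passages(passages, r"\brecent(ly)?\b", 365, today=date(2016, 7, 8))
for text, age in hits:
    print(age, text)
```

Only the first passage is flagged: the second has no keyword match, and the third is newer than the one-year threshold.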

@prnk28 Don't forget to make these for

  • using arbitrary strings and regular expressions in place of "recently" (e.g. "last year"), with WikiWho age thresholds for those strings (for instance, strings from known paid editing incidents such as "hydraulic fracturing")
  • using diffs from particular usernames (e.g. the Wikipedia Education Program students)
  • your Flesch-Kincaid readability test implementation
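For the readability item, a minimal sketch of the Flesch-Kincaid grade level formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. This is not @prnk28's implementation, and it assumes the grade-level variant is the one meant. The syllable counter is a crude vowel-group heuristic chosen for brevity, and the sentence splitter simply breaks on ., ! and ?.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups, dropping a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level of `text` (0.0 for empty input)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


simple = "The cat sat on the mat."
dense = "Hydraulic fracturing necessitates considerable infrastructure."
print(round(flesch_kincaid_grade(simple), 2))
print(round(flesch_kincaid_grade(dense), 2))
```

Short sentences of one-syllable words can score below zero, which is expected for this formula; the useful signal is the relative ordering between passages.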

Is there any way to get a list of all article names in Wikipedia? I need to test the script on a huge bulk of articles to find the ones containing 'recent' or 'recently'.
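One possible approach, sketched under assumptions: the MediaWiki API's list=allpages query can enumerate titles in the article namespace, following the continue values returned with each batch. The User-Agent string and the names fetch_json and iter_article_titles are placeholders, and the fetch function is injectable so the paging logic can be exercised without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """Perform one API request and decode the JSON response."""
    req = Request(
        API_URL + "?" + urlencode(params),
        headers={"User-Agent": "backlog-question-script/0.1 (placeholder)"},
    )
    with urlopen(req) as resp:
        return json.load(resp)


def iter_article_titles(fetch=fetch_json, limit=None):
    """Yield article titles batch by batch, following API continuation."""
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": 0,                  # main (article) namespace
        "apfilterredir": "nonredirects",   # skip redirect pages
        "aplimit": "max",
        "format": "json",
    }
    seen = 0
    while True:
        data = fetch(params)
        for page in data["query"]["allpages"]:
            yield page["title"]
            seen += 1
            if limit is not None and seen >= limit:
                return
        if "continue" not in data:
            return
        # Merge the continuation values into the next request.
        params = dict(params, **data["continue"])


# Live usage (needs network access):
# for title in iter_article_titles(limit=10):
#     print(title)
```

For a pass over every article, this means a very large number of requests, so testing on a limited slice first is probably the safer route.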

This task was for generating questions from WP:Backlogs. I have created a new task for the next step: T140559