
Enabling parsoid to generate WIKITEXT from other markups and HTML cleanup abilities[GSoC Proposal]
Closed, Declined (Public)

Description

Name: khannaanant262129 (Vansh Khanna)
Email: vanshkhanna27@gmail.com
IRC: khannaanant26212
Web Page: https://github.com/khannavansh/
Location: Noida (Uttar Pradesh), India
Typical working hours: 10:00 to 14:00 and 18:00 to 02:00 (IST)

Phabricator Task : T127329

SYNOPSIS :
The project intends to create HTML from various other markups (supplied by the user) and feed this HTML to Parsoid for conversion into wikitext, so that information written in other markups across the web becomes available in wikitext format.
This would help Wikipedia broaden its content by including content not originally in HTML.
The project also aims at HTML cleanup, so that the generated wikitext is clean and error-free.
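
As an illustration of the HTML cleanup step mentioned above, here is a minimal sketch in Python using only the standard library. It is not Parsoid code and not the proposal's actual design: the names `clean_html`, `HtmlCleaner`, `DROP_ATTRS` and `UNWRAP_TAGS`, and the choice of which attributes and tags to strip, are assumptions made for the example. The idea is to remove presentational attributes and wrapper tags of the kind word processors tend to emit, so that less noise reaches the HTML-to-wikitext conversion.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical cleanup policy, chosen only for illustration.
DROP_ATTRS = {"style", "class", "id"}   # presentational or tool-specific attributes
UNWRAP_TAGS = {"span", "font"}          # wrappers whose children are kept


class HtmlCleaner(HTMLParser):
    """Re-emit HTML without the attributes and wrapper tags listed above.

    Comments and doctype declarations are dropped, and the sketch is not
    meant for input containing <script> or <style> content.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in UNWRAP_TAGS:
            return  # drop the wrapper, keep whatever is inside it
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name not in DROP_ATTRS
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing input such as <br/> is emitted as a plain start tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag not in UNWRAP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives unescaped, so escape it again on the way out.
        self.out.append(escape(data, quote=False))


def clean_html(html):
    """Return a cleaned copy of the given HTML fragment."""
    cleaner = HtmlCleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)
```

For example, `clean_html('<p style="margin:0"><span class="c1">Hello</span> <b id="x">world</b></p>')` returns `<p>Hello <b>world</b></p>`. A real cleanup pass would need a more careful policy, since some attributes and spans carry meaning that should survive into wikitext.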

Possible mentors : @Arlolra @ssastry @cscott

DELIVERABLES :

  • 21 March to 21 April: Develop a clear understanding of the workflow of the part of the Parsoid code that I will be working with (HTML -> wikitext). (Investigation phase)
  • 23 April to 22 May: Build a strong working relationship with my mentors and, as a start, build a working API that provides utilities for converting some markup to HTML, along with utilities for HTML cleanup.
  • 24 May to 15 June: Develop the API so that it talks to Parsoid and supplies conversion utilities on request, and make sure the wikitext Parsoid generates is accurate and error-free.
  • 16 June to 20 June: Proper testing and documentation of the work so far.
  • 1 July to 10 August: Extend the support for Parsoid with HTML cleanup utilities that clean up HTML from various sources, such as Google Docs, so as to generate accurate and error-free wikitext.
  • 10 August to 15 August: Test my entire work so far and write good documentation for the project.
  • Deployment will not be left to the last week; it will be done regularly, about once a week. The last week is specifically for ensuring that everything is in order and that the deployment and its testing are sound. At present I am thinking of posting my code to GitHub and deploying on Google App Engine, since its beta service for Node.js has launched.
  • My deliverables will also include identifying and fixing issues in Parsoid itself, such as T127207 and T74702.

PARTICIPATION :

  • I wish to be in regular touch with my mentors on the IRC channels and through Phabricator.
  • IRC seems to me the best place to ask for help.
  • This project will help me further enhance my knowledge and technical skills, and I wish to be an active member of this community in the future.

EXTRA :

  • So far I have successfully completed a microtask, T129562.

My past experience :

  • I'm a third-year engineering student pursuing a four-year bachelor's degree in Computer Science. My main focus at college is getting as much exposure as possible to web technologies; networking is my second field of interest.
  • I have worked as an intern for a Netherlands-based company on a business-to-business tool built with CakePHP.
  • My current minor project at college is built on Node.js.

ABOUT ME
How did you hear about this program?
Through a college senior who completed GSoC 2014 with MediaWiki.

Will you have any other time commitments, such as school work, another job, planned vacation, etc., during the duration of the program?
No, I will devote my entire summer vacation to GSoC 2016.

We don't just care about your project -- you are a person, and that matters to us! What drives you? What makes you want to make this the most awesomest wiki enhancement ever?

My constant effort to enhance my technical skills drives me, and I strongly believe in taking up a task and completing it in the best possible way. I also find the project ideas here at Wikimedia out of the box and intriguing.

What does making this project happen mean to you?
I'm a learner and technology enthusiast. Making this project happen would help me grow my technical skills, and working with such capable mentors would add a lot to my knowledge. I also want to prove myself on a platform as big as Google Summer of Code.

Please review my proposal for any necessary changes, corrections and additions, in my proposed timeline and elsewhere.
Yours sincerely,
Vansh Khanna

Event Timeline

@Khannaanant262129 change the title to reflect whether it's a GSoC or Outreachy proposal, or both. Also remove Possible-Tech-Projects and other tags which do not apply here.

As general feedback, it would be better to update the description with the questions and answers from the "About You" section of https://www.mediawiki.org/wiki/Outreach_programs/Application_template.

@Sumit Thank you so much. I have made the changes and additions as per your suggestions; please see if anything else needs to be changed or added. Your help is deeply appreciated.
Yours sincerely,
Vansh Khanna

Good job so far. Here are some notes from my reading:

  • The title should emphasize generating wikitext, not HTML, from various other markups;
  • In the "What drives you?" section, apart from your personal development, which is commendable, you may want to mention what in particular you find appealing about the Wikimedia projects that is driving you to contribute here;
  • In the "past experience" section, if possible, maybe list which college you're attending and what your focus is there;
  • More substantively, in your deliverables, apart from the HTML cleanups and normalizations, there's identifying and fixing issues in Parsoid itself, such as T127207 and T74702;
  • Considering the goal here is to produce a user facing utility, I think figuring out a deployment story shouldn't be left to the last week;
  • Your website is a dead link, maybe remove it until it's restored;
  • Do you have any code samples to add to your GitHub profile?
  • What timezone are those working hours in?
Khannaanant262129 renamed this task from Enabling parsoid to generate HTML from other markups and HTML cleanup abilities to Enabling parsoid to generate WIKITEXT from other markups and HTML cleanup abilities. Mar 23 2016, 7:06 PM
Khannaanant262129 updated the task description.

@Arlolra Thanks for the review! I've made the changes you suggested.

Regarding deployment, I'm a little confused. Google App Engine has recently added support for Node.js, so I was thinking of using that for deployment. What do you suggest?

That could be alright.

It might also be worth exploring if Wikimedia Labs will suit your needs,
https://www.mediawiki.org/wiki/Wikimedia_Labs

By "focus", I meant your area of study (or major, etc.)

@Arlolra As the necessary changes from your review have been made, should this task be closed as resolved?

No, leave it open for the time being.

Khannaanant262129 renamed this task from Enabling parsoid to generate WIKITEXT from other markups and HTML cleanup abilities to Enabling parsoid to generate WIKITEXT from other markups and HTML cleanup abilities[GSoC Proposal]. Mar 24 2016, 5:53 PM
Arlolra triaged this task as Medium priority. Apr 12 2016, 11:32 PM

Thank you for your proposal, but sadly it didn't make it into the selected projects this time. You are welcome to apply for Outreachy round 13 or GSoC round 14 with the same proposal (if it still has consensus), or with a new one if eligible. Please tell any siblings below 18 years of age about the Google Code-in 2016 round (g.co/gci), and add yourself as a mentor for it if eligible. Closing the proposal as Declined; see you around in #wikimedia-dev.