Extract text from wikimarkup
Closed, Resolved · Public

Assigned To

Authored By

	Pavan91727
	Aug 13 2022, 6:02 AM

Description

Wikipedia articles contain a lot of formatting markup. We need a Python package that extracts plain text from them. The extracted text should not contain tables, lists, references, etc.
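
To make the request concrete, here is a minimal sketch of what such an extractor could look like, using only the standard library `re` module. The function name `strip_wikimarkup` and the exact set of rules are assumptions made for illustration, not part of this task or of the linked project. Regular expressions cannot fully parse wikitext (for example, tables nested inside templates), so a dedicated parser such as mwparserfromhell is likely a more robust starting point.

```python
import re


def strip_wikimarkup(text: str) -> str:
    """Return a rough plain-text rendering of a wikitext string (hypothetical helper)."""
    # Drop HTML comments and references (self-closing and paired forms).
    text = re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)
    text = re.sub(r"<ref\b[^>]*/>", "", text, flags=re.IGNORECASE)
    text = re.sub(r"<ref\b[^>]*>.*?</ref>", "", text, flags=re.DOTALL | re.IGNORECASE)

    # Remove templates, innermost first, until none are left.
    previous = None
    while previous != text:
        previous = text
        text = re.sub(r"\{\{[^{}]*\}\}", "", text)

    # Remove tables ({| ... |}).
    text = re.sub(r"\{\|.*?\|\}", "", text, flags=re.DOTALL)

    # Drop file, image, and category links entirely.
    text = re.sub(r"\[\[(?:File|Image|Category):[^\]]*\]\]", "", text, flags=re.IGNORECASE)

    # [[target|label]] -> label, [[target]] -> target.
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)

    # [http://example.org label] -> label.
    text = re.sub(r"\[https?://\S+\s+([^\]]+)\]", r"\1", text)

    # Remove bold and italic quote markers.
    text = re.sub(r"'{2,}", "", text)

    # == Heading == -> Heading.
    text = re.sub(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", r"\1", text, flags=re.MULTILINE)

    # Drop list lines (bulleted, numbered, definition, indented).
    text = re.sub(r"^[*#;:].*$", "", text, flags=re.MULTILINE)

    # Strip any remaining HTML-like tags and collapse runs of blank lines.
    text = re.sub(r"<[^>]+>", "", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


sample = (
    "'''Python''' is a [[programming language|language]].<ref>cite</ref>\n"
    "* item one\n"
    "{{Infobox|name=x}}\n"
    "{| class=\"wikitable\"\n|-\n| a || b\n|}\n"
    "See [[Guido van Rossum]]."
)
print(strip_wikimarkup(sample))
```

The rules run in a fixed order: references and templates are removed before links are unwrapped, so markup inside a removed block is never reinterpreted as article text.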

Event Timeline

Pavan91727 created this task. · Aug 13 2022, 6:02 AM

Restricted Application added a subscriber: Aklapper. · Aug 13 2022, 6:02 AM

LennardHofmann subscribed. · Aug 13 2022, 6:33 AM

Hi @Pavan91727, thanks for taking the time to report this and welcome to Wikimedia Phabricator! Who is "we" exactly?
@lakshmi: Do you plan to work on this?

Aklapper moved this task from Inbox to Hacking Projects on the Wikimania-Hackathon-2022 board. · Aug 13 2022, 11:18 AM

@lakshmi: Thanks for participating in the Hackathon! We hope you had a great time.

If this task was being worked on and resolved at the Hackathon: Please change the task status to resolved via the Add Action... → Change Status dropdown, and make sure that this task has a link to the public codebase.
If this task is still valid and should stay open: Please add another active project tag to this task, so others can find this task (as likely nobody in the future will look back at the Hackathon workboard when trying to find something they are interested in).
In case there is nothing else to do for this task, or nobody plans to work on this task anymore: Please set the task status to declined.

Thank you,
your Hackathon venue housekeeping service

https://github.com/lakshmi-warrier/wikimarkup-formatter

This is the link to the initial draft of the project. It is not complete and still needs work.

Thanks! Resolving this task as the Hackathon is over.
