Security review of Beautiful Soup
Closed, Resolved · Public

Authored By

	• bmansurov
	Oct 12 2017, 3:15 PM

Description

Project Information

Name of tool/project: Beautiful Soup
Project home page: https://www.crummy.com/software/BeautifulSoup/
Name of team requesting review: Readers Web
Primary contact: @bmansurov
Target date for deployment:
Link to code repository / patchset: https://code.launchpad.net/beautifulsoup
Programming Language(s) Used: Python

Description of the tool/project

Beautiful Soup is a Python library designed for quick turnaround projects like screen-scraping.

Description of how the tool will be used at WMF

We'll use the tool to post-process PDFs generated by ElectronPdfService / ChromiumPdfService. The tool will be used by a Python script to query / modify HTML used to generate PDFs.
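
As a rough illustration of the kind of query / modify step described above, the sketch below uses Beautiful Soup to collect headings and prepend a linked table of contents. The helper name `add_toc`, the choice of `h2` headings, and the `section-N` ids are assumptions made for illustration; they are not taken from the actual post-processing script.

```python
from bs4 import BeautifulSoup


def add_toc(html: str) -> str:
    """Hypothetical helper: give each <h2> an id and prepend a list of links to them."""
    soup = BeautifulSoup(html, "html.parser")
    toc = soup.new_tag("ul")

    # Query: find every <h2>. Modify: tag it with an id and add a TOC entry.
    for index, heading in enumerate(soup.find_all("h2"), start=1):
        anchor_id = f"section-{index}"
        heading["id"] = anchor_id

        link = soup.new_tag("a", href=f"#{anchor_id}")
        link.string = heading.get_text()

        item = soup.new_tag("li")
        item.append(link)
        toc.append(item)

    # Put the table of contents at the top of <body>, if there is one.
    if soup.body is not None:
        soup.body.insert(0, toc)
    return str(soup)


sample = "<html><body><h2>Intro</h2><p>Text</p><h2>Details</h2></body></html>"
result = add_toc(sample)
```

The parser is named explicitly ("html.parser") so the behaviour does not depend on which optional parsers happen to be installed.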

Dependencies

None (afaik)

Has this project been reviewed before?

No (afaik)

Working test environment

There's none yet, but we can share something as part of T171960: Create a library to post-process PDF and add page numbers and table of contents

Post-deployment

Related Objects

Mentioned In: E769: Security review of Beautiful Soup
T173014: Security review of pdfrw
Mentioned Here: T177765: Security review of mediawiki-services-chromium-render
T171960: Create a library to post-process PDF and add page numbers and table of contents

Event Timeline

• bmansurov created this task. Oct 12 2017, 3:15 PM

Restricted Application added a subscriber: Aklapper. Oct 12 2017, 3:15 PM

• bmansurov mentioned this in T173014: Security review of pdfrw. Oct 12 2017, 3:15 PM

Jdlrobson moved this task from Incoming to Upcoming on the Web-Team-Backlog board. Oct 12 2017, 3:49 PM

Readers Web will be responsible for maintaining the library.

Are you planning to fork BeautifulSoup? Or did you mean something else..?

Copy & paste fail. ;(

• bmansurov updated the task description. Oct 12 2017, 6:24 PM

FTR this is lower priority than T177765: Security review of mediawiki-services-chromium-render.

• dpatrick mentioned this in E769: Security review of Beautiful Soup. Oct 31 2017, 8:05 PM

phuedx moved this task from Upcoming to Tracking on the Web-Team-Backlog board. Nov 7 2017, 8:24 PM

phuedx edited projects, added Web-Team-Backlog (Tracking); removed Web-Team-Backlog.

In T178077#3723899, @phuedx wrote:

FTR this is lower priority than T177765: Security review of mediawiki-services-chromium-render.

I can tell, but I basically looked at them all at the same time. I've found no issues with BeautifulSoup. I know that its use here is generally limited, but I assumed that some user-controlled HTML may make it through to this parser, so I tested for DoS via resource consumption, code execution via entity expansion, failure to maintain entity encoding, etc., and found no concerns. A quick question which I should have clarified before: will you be using HTMLParser, or an external parser (lxml, html5lib, etc.)?
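
One of the checks mentioned above, that entity encoding is maintained, can be approximated with a small round-trip test. This is only a sketch of the idea, not the reviewer's actual test procedure; the sample markup and variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical user-controlled text that was already escaped upstream.
markup = "<p>&lt;script&gt;alert(1)&lt;/script&gt; &amp; more</p>"

soup = BeautifulSoup(markup, "html.parser")

# The entities are decoded into plain text inside the <p> element...
decoded_text = soup.p.get_text()

# ...and no real <script> element is created from the escaped input.
script_tags = soup.find_all("script")

# Serializing the tree escapes the special characters again.
serialized = str(soup)
```

If the serialized output matched the input but a `<script>` element had appeared in the tree, or the other way around, that would be a sign the escaping was being lost somewhere along the way.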

will you be using HTMLParser, or an external parser (lxml, html5lib, etc.)?

We'll be using Python 3's default html.parser for now: https://github.com/kodchi/ppg/blob/master/src/process_toc.py#L26
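
For context, html.parser is the pure-Python tokenizer from the standard library, so choosing it adds no third-party parser dependency. The sketch below drives it directly, without Beautiful Soup, to show the kind of events a tree builder receives from it. The `EventCollector` class and the sample input are illustrative assumptions, not code from the linked script.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper that records start tags and text seen by html.parser."""

    def __init__(self):
        # convert_charrefs=True decodes character references in text data.
        super().__init__(convert_charrefs=True)
        self.tags = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text_parts.append(data)


collector = EventCollector()
collector.feed("<p>&lt;b&gt;bold&lt;/b&gt; &amp; more</p>")
collector.close()

tags = collector.tags
text = "".join(collector.text_parts)
```

Here the escaped `<b>` arrives as text data rather than as a start tag, so only `p` is recorded as an element.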

In T178077#3760654, @bmansurov wrote:

will you be using HTMLParser, or an external parser (lxml, html5lib, etc.)?

We'll be using Python 3's default html.parser for now: https://github.com/kodchi/ppg/blob/master/src/process_toc.py#L26

Thanks. That looks to be fine. I'll go ahead and mark this complete.

Thanks for taking the time to review this, @dpatrick!

sbassett moved this task from Incoming to Done on the deprecated-security-team-reviews board. Apr 24 2019, 6:10 PM

Restricted Application added a project: Product-Infrastructure-Team-Backlog-Deprecated. Apr 24 2019, 6:10 PM

• chasemp removed a project: deprecated-security-team-reviews. Jan 8 2020, 4:12 PM
