Security review of pdfrw
Closed, Resolved · Public

Assigned To

Authored By

	• bmansurov
	Aug 10 2017, 4:02 PM

Description

Project Information

Name of tool/project: pdfrw
Project home page: https://github.com/pmaupin/pdfrw
Name of team requesting review: Readers Web
Primary contact: @bmansurov
Target date for deployment: Q2 (November), 2017-2018
Link to code repository / patchset: https://github.com/pmaupin/pdfrw
Programming Language(s) Used: Python

Description of the tool/project

pdfrw is a Python library and utility that reads and writes PDF files.

Description of how the tool will be used at WMF

We'll use the tool to post-process PDFs generated by ElectronPdfService. A Python script will use it to modify the PDFs, adding page numbers and a table of contents.

Dependencies

None (afaik)

Has this project been reviewed before?

No (afaik)

Working test environment

There is none yet, but we can share something as part of T171960: Create a library to post-process PDF and add page numbers and table of contents.

Post-deployment

Readers Web will be responsible for maintaining the library.

Related Objects

Status	Assigned	Task
Resolved	• JKatzWMF	T150871 [EPIC] (Proposal) Replicate core OCG features and sunset OCG service
Invalid	None	T186740 [EPIC] It should be possible to print a book using the Proton service
Invalid	None	T171832 Deploy new book renderer to all projects
Resolved	• dpatrick	T173014 Security review of pdfrw

Event Timeline

• bmansurov created this task. · Aug 10 2017, 4:02 PM

Restricted Application added a subscriber: Aklapper. · Aug 10 2017, 4:02 PM

Jdlrobson moved this task from Untriaged to Move to Backlog on the Web-Team-Backlog (Tracking) board. · Aug 10 2017, 4:20 PM

• bmansurov removed a parent task: T171960: Create a library to post-process PDF and add page numbers and table of contents. · Aug 18 2017, 4:32 PM

pmiazga subscribed. · Aug 24 2017, 4:46 PM

phuedx updated the task description. · Aug 24 2017, 4:47 PM

The target date for deployment has shifted forward to EOQ Q1 (end of September). Who's the best person to review this Python library?

ovasileva added a parent task: T171832: Deploy new book renderer to all projects. · Aug 24 2017, 5:02 PM

phuedx updated the task description. · Aug 29 2017, 1:25 PM

Who's the best person to review this Python library?

@Bawolff @dpatrick: IIRC you were the folks to ping in this context. I apologise if that's not the case.

In T173014#3561993, @phuedx wrote:

Who's the best person to review this Python library?

@Bawolff @dpatrick: IIRC you were the folks to ping in this context. I apologise if that's not the case.

Yes. We will add it to our list of things to review.

Thank you, @Bawolff!

In addition to pdfrw, it's looking increasingly likely that we're going to have to use BeautifulSoup for easy DOM querying and manipulation. At this time we won't be using any external parsers such as lxml; we'll use Python's built-in html.parser instead. Should I create a new task for this? I'm not sure whether any past projects have used this library, but neither ORES nor Wikimetrics seems to use it.
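For reference, here is a rough sketch of the kind of DOM querying mentioned above, written against the standard library's html.parser module directly rather than BeautifulSoup. The class and function names (HeadingCollector, collect_headings) are hypothetical and not taken from the ppg code; the idea of collecting headings as input for a table of contents is an assumption about how the script might work.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for h1-h6 elements (hypothetical helper)."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.headings = []   # finished (level, text) pairs
        self._level = None   # level of the heading currently open, if any
        self._parts = []     # text fragments seen inside the open heading

    def handle_starttag(self, tag, attrs):
        # Start collecting only when no heading is already open.
        if tag in self.HEADING_TAGS and self._level is None:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        # Close the heading when its matching end tag arrives.
        if self._level is not None and tag == f"h{self._level}":
            text = " ".join("".join(self._parts).split())
            self.headings.append((self._level, text))
            self._level = None


def collect_headings(html):
    """Return the headings found in an HTML string, in document order."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return parser.headings
```

Calling collect_headings on an article's HTML would give a flat list of heading levels and titles. BeautifulSoup would make this sort of query shorter, which is the reason for wanting it.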

ovasileva added a project: Proton. · Sep 11 2017, 12:18 PM

ovasileva moved this task from Triage to Tracking on the Proton board.

phuedx updated the task description. · Oct 2 2017, 12:21 PM

In T173014#3573827, @bmansurov wrote:

In addition to pdfrw, it's looking increasingly likely that we're going to have to use BeautifulSoup for easy DOM querying and manipulation. At this time we won't be using any external parsers such as lxml; we'll use Python's built-in html.parser instead. Should I create a new task for this? I'm not sure whether any past projects have used this library, but neither ORES nor Wikimetrics seems to use it.

Yes, please create a separate task requesting that review. Thanks.

• dpatrick mentioned this in E752: Security review of pdfrw. · Oct 12 2017, 3:04 PM

• dpatrick moved this task from Incoming to Scheduled on the deprecated-security-team-reviews board. · Oct 12 2017, 3:09 PM

Done: T178077: Security review of Beautiful Soup

I checked out the WIP ppg code in the description of T171960 and I'm wondering whether that will be invoked by the Node service (T177765), returning a ready-to-read PDF which has ToC, page numbers, etc., or will the Node service just render an article which will then be post-processed (adding ToC, page numbers, etc.) separately? I'm asking to ascertain whether the script which will use pdfrw will be firejailed. This question is not a blocker. I'm just curious.

This review is complete. Basic concerns such as system I/O (very limited) and shell execution (none) were found to be safe. The encryption implementation was not reviewed, since we won't be using that portion of the code. My main concerns are code execution or denial of service (a la https://github.com/pmaupin/pdfrw/issues/92) via malicious PDF input. However, the fact that this will run in a closed ecosystem (in which input originates only from a system we control) mitigates those concerns.

In T173014#3724585, @dpatrick wrote:

I checked out the WIP ppg code in the description of T171960 and I'm wondering whether that will be invoked by the Node service (T177765), returning a ready-to-read PDF which has ToC, page numbers, etc., or will the Node service just render an article which will then be post-processed (adding ToC, page numbers, etc.) separately? I'm asking to ascertain whether the script which will use pdfrw will be firejailed. This question is not a blocker. I'm just curious.

I think we'll have the Node service render an article only. We may need to build another service that takes the PDF from the Node service and post-processes it using ppg. I'm not sure if that's what we will end up doing though.

sbassett moved this task from Scheduled to Done on the deprecated-security-team-reviews board. · Jul 9 2019, 8:26 PM

Restricted Application added a project: Product-Infrastructure-Team-Backlog-Deprecated. · Jul 9 2019, 8:26 PM

• chasemp removed a project: deprecated-security-team-reviews. · Jan 8 2020, 4:14 PM
