Background - we would like to investigate replicating some or most of OCG's functionality using the Electron PDF service. Namely, we would like the ability to concatenate articles and to apply transformations that control the look and feel of the Electron output.
Acceptance Criteria
Determine the best way to set up the back end for rendering concatenated PDFs according to the following requirements:
- PDF generation must be triggered from the book creator and from "Download as PDF" links (we must be able to generate PDFs for single and multiple articles)
- For multiple articles (books), the current UI of the book creator will be used
- Users will be able to select between a two-column and a single-column layout, where the two-column layout will render using OCG and the single-column layout will render using Electron. This must be available for both books and individual articles, similar to the current implementation on MediaWiki (https://www.mediawiki.org/w/index.php?title=Special:ElectronPdf&page=MediaWiki&action=show-selection-screen&coll-download-url=%2Fw%2Findex.php%3Ftitle%3DSpecial%3ABook%26bookcmd%3Drender_article%26arttitle%3DMediaWiki%26returnto%3DMediaWiki%26oldid%3D2301969%26writer%3Drdf2latex)
- Concatenated PDFs must include the following:
- Table of contents
- Table of contents must contain the individual table of contents for each article as subsections
- Table of contents must be clickable - selecting a link from the table of contents must navigate to the correct position within the article
- All tables and infoboxes available in the original articles
- Chapter structure - each article must be numbered as a chapter and marked accordingly in the table of contents
- References
- References will appear individually at the end of each article. If links are available within the references, they will be available within the created PDF
- Blue links - all blue links will be available within the PDF. Blue links will be styled differently
- Styles - styles must contain the current desktop print styles (in progress here: T135022: [EPIC] Improve print styles in desktop and mobile sites)
- Contributions:
- All text contributors - a section for text contributions will appear at the end of each book. The list of contributors will be grouped by article name
- All image contributors - a section for image contributions will appear at the end of each book. The list of contributors will be grouped by article name
- Content license
- Table of contents
Example Structure:
Book title (page break)
Table of contents (page break)
Chapter 1, Article Title, article 1, article 1 references (page break)
Chapter 2, Article Title, article 2, article 2 references (page break)
Text and image sources, contributors, and licenses
- Section 1: text sources
- Section 1.1: text sources article 1
- Section 1.2: text sources article 2
- Section 2: image sources
- Section 2.1: image sources article 1
- Section 2.2: image sources article 2
- Section 3: content license
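The closing "sources, contributors, and licenses" part of the structure could be rendered along the following lines. This is only a sketch: the function name `render_back_matter`, the input shape (a mapping from article title to a list of contributor names), and the numbering of subsections under their parent section are assumptions made for illustration, not something the task defines.

```python
from html import escape


def render_back_matter(text_sources, image_sources, license_name):
    """Render the closing sources/contributors/license page as HTML.

    text_sources and image_sources map an article title to a list of
    contributor names (hypothetical input shape, not defined by the task).
    """
    parts = ["<h1>Text and image sources, contributors, and licenses</h1>"]
    groups = [("Text sources", text_sources), ("Image sources", image_sources)]
    for number, (heading, by_article) in enumerate(groups, start=1):
        parts.append(f"<h2>{number}. {heading}</h2>")
        # One numbered subsection per article, e.g. 1.1, 1.2, 2.1, 2.2.
        for sub, (title, names) in enumerate(by_article.items(), start=1):
            parts.append(f"<h3>{number}.{sub}. {escape(title)}</h3>")
            parts.append("<p>" + escape(", ".join(names)) + "</p>")
    # The license section always comes last, after both source groups.
    parts.append(f"<h2>{len(groups) + 1}. Content license</h2>")
    parts.append(f"<p>{escape(license_name)}</p>")
    return "\n".join(parts)
```

Grouping by article first and numbering afterwards keeps the contributor lists separated by article name, as the contribution requirements ask.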
Questions to answer
Outcomes
Using wkhtmltopdf we'll have to make the following transformations:
- Create a cover page in HTML. Make sure that the book title is vertically and horizontally aligned in the middle.
- Retrieve articles from RESTBase, e.g. Book, and lay them out in the hierarchy requested in the metabook. For each article:
- Create a title with the chapter number, e.g. "1. Apple"
- Prefix section titles with the chapter number and section number, e.g. "1.1. Botanical Information"
- Since articles can be grouped into chapters on Special:Book, we need to make each article a subsection of a chapter if the article is part of a group. For example, if I'm interested in creating a book about fruits and vegetables, I may have two chapters called "1. Fruits" and "2. Vegetables". The article "Apple" would go under "1. Fruits" and be titled "1.1. Apple". Sections of the article would be prefixed with "1.1.1.", "1.1.2.", etc.
- Change reference links to point to the references within the generated page rather than to the references at the source URL.
- Remove red links.
- We may also have to "push down" headings when the page has a =-level section (a level-1 heading in wikitext), but this case is rare.
- Retrieve contributors, images (https://en.wikipedia.org/w/api.php?action=query&titles=File%3ABook_Collage.png&prop=imageinfo&iiprop=url|size|mediatype|mime|sha1|extmetadata), and license info from the MW API endpoint and create an HTML page using them.
- Generate a PDF using wkhtmltopdf. The table of contents and outline will be generated automatically if the correct arguments are passed. An example command is as follows:
./wkhtmltopdf cover http://mw.loc/w/cover.html toc page https://en.wikipedia.org/api/rest_v1/page/html/Apple/781322367/8461718d-3d68-11e7-86c3-bba2fc26f3f6 https://en.wikipedia.org/api/rest_v1/page/html/Pear https://en.wikipedia.org/api/rest_v1/page/html/Cherry https://en.wikipedia.org/api/rest_v1/page/html/Grape https://en.wikipedia.org/api/rest_v1/page/html/Persimmon --print-media-type fruits.pdf
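The example command could also be assembled programmatically. The sketch below mirrors its argument order exactly (cover object, `toc` object, `page` inputs, `--print-media-type`, output file); the function name `build_wkhtmltopdf_args` and its parameters are hypothetical, and the inputs may be URLs or local file paths.

```python
def build_wkhtmltopdf_args(cover, pages, output, binary="./wkhtmltopdf"):
    """Return the argument list for a cover + TOC + pages run.

    The layout mirrors the example command: cover object, toc object,
    then the page inputs, then --print-media-type and the output file.
    """
    if not pages:
        raise ValueError("at least one page is required")
    args = [binary, "cover", cover, "toc", "page"]
    args.extend(pages)  # one input per article, in book order
    args.extend(["--print-media-type", output])
    return args
```

The resulting list can be handed to `subprocess.run(args, check=True)`. Passing a list rather than a single shell string avoids quoting problems with article titles that contain spaces or special characters.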
The RESTBase URLs in the example command would have to be replaced with the local HTML files that we create using the transformations.
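As an illustration of one such transformation, the chapter and section numbering could be sketched as follows. The function name `number_article` is hypothetical, and the sketch assumes that article sections are plain `<h2>` elements. Matching headings with a regular expression is a simplification; a real implementation would more likely use an HTML parser.

```python
import re
from html import escape

# Matches an <h2> element, capturing the opening tag, inner HTML, and closing tag.
H2_RE = re.compile(r"(<h2\b[^>]*>)(.*?)(</h2>)", re.DOTALL | re.IGNORECASE)


def number_article(title, body_html, prefix):
    """Prefix the article title and its <h2> sections with numbers.

    prefix is the article's position in the book, e.g. "1" for a
    top-level article or "1.1" for an article grouped under chapter 1.
    """
    counter = 0

    def add_number(match):
        nonlocal counter
        counter += 1
        open_tag, text, close_tag = match.groups()
        return f"{open_tag}{prefix}.{counter}. {text}{close_tag}"

    numbered_body = H2_RE.sub(add_number, body_html)
    return f"<h1>{prefix}. {escape(title)}</h1>\n{numbered_body}"
```

Called with prefix "1", the article "Apple" is titled "1. Apple" and its first section becomes "1.1. Botanical Information". Called with prefix "1.1", for an article grouped under a chapter, the title becomes "1.1. Apple" and its sections are prefixed "1.1.1.", "1.1.2.", and so on.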