
Page rotation not saved in meta data of a PDF file
Open, Needs Triage, Public

Description

When PdfHandler extracts meta data from a PDF with page size A4 and a page rotation of 90 degrees (landscape), the "Page rot" meta information is not saved to the database.

Occurred in PdfHandler on branch REL1_31.

Example output of /usr/bin/pdfinfo -enc UTF-8 -l 9999999 images/b/b5/Overview.pdf

Title:          Overview.vsd
Subject:        
Keywords:       
Author:         auser
Creator:        PDFCreator 2.4.1.13
Producer:       PDFCreator 2.4.1.13
CreationDate:   Thu Jan  2 12:08:27 2019 UTC
ModDate:        Thu Jan  4 12:08:27 2019 UTC
Tagged:         no
UserProperties: no
Suspects:       no
Form:           none
Syntax Warning: Invalid least number of objects reading page offset hints table
JavaScript:     no
Pages:          1
Encrypted:      no
Page    1 size: 595 x 842 pts (A4)
Page    1 rot:  90
File size:      123684 bytes
Optimized:      yes
PDF version:    1.4

Stored information in images.image_meta (unserialized)

array (
  'Title' => 'Overview.vsd',
  'Subject' => '',
  'Keywords' => '',
  'Author' => 'auser',
  'Creator' => 'PDFCreator 2.4.1.13',
  'Producer' => 'PDFCreator 2.4.1.13',
  'CreationDate' => 'Thu Jan  2 12:08:27 2019',
  'ModDate' => 'Thu Jan  4 12:08:27 2019',
  'Tagged' => 'no',
  'Pages' => '1',
  'Encrypted' => 'no',
  'pages' => 
  array (
    1 => 
    array (
      'Page size' => '595 x 842 pts (A4)',
    ),
  ),
  'File size' => '123684 bytes',
  'Optimized' => 'yes',
  'PDF version' => '1.4',
  'mergedMetadata' => 
  array (
    'DateTime' => '2019:01:02 12:08:27',
    'DateTimeDigitized' => '2019:01:04 12:08:27',
    'Software' => 'PDFCreator 2.4.1.13',
    'ObjectName' => 
    array (
      'x-default' => 'Overview.vsd',
      '_type' => 'lang',
    ),
    'Artist' => 
    array (
      0 => 'auser',
      '_type' => 'ol',
    ),
    'ImageDescription' => '',
    'pdf-Producer' => 'PDFCreator 2.4.1.13',
    'pdf-Encrypted' => 'no',
    'pdf-PageSize' => 
    array (
      0 => '595 x 842 pts (A4)',
    ),
    'pdf-Version' => '1.4',
  ),
  'text' => 
  array (
    0 => 'Some text
',
    1 => '',
  ),
)

As you can see, ['pages'][1]['Page rot'] is missing.

This might be due to the "postprocessing" of the extracted meta data: https://github.com/wikimedia/mediawiki-extensions-PdfHandler/blob/51185ca9cb1f3a76bea2054e48c5790802633779/includes/PdfImage.php#L291-L309
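To make the expected result concrete, here is a minimal Python sketch of how the pdfinfo output above could be parsed so that the per-page rotation is kept. It is not the PdfHandler code (which is PHP); the names `parse_pdfinfo`, `PAGE_KEY`, `SAMPLE` and `stored_page` are hypothetical and exist only for this illustration. It assumes every relevant line has the form "Key: value" and that per-page keys look like "Page    1 size" or "Page    1 rot", as in the output pasted above.

```python
import re

# Matches per-page keys such as "Page    1 size" or "Page    1 rot".
PAGE_KEY = re.compile(r"^Page +(\d+) (size|rot)$")


def parse_pdfinfo(dump: str) -> dict:
    """Turn `pdfinfo -l` text output into a nested dict (illustrative only)."""
    data: dict = {}
    for line in dump.splitlines():
        # Split on the first colon only, so values such as times stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        key, value = key.strip(), value.strip()
        match = PAGE_KEY.match(key)
        if match:
            page = int(match.group(1))
            field = "Page size" if match.group(2) == "size" else "Page rot"
            data.setdefault("pages", {}).setdefault(page, {})[field] = value
        else:
            data[key] = value
    return data


# Excerpt of the pdfinfo output from this task.
SAMPLE = """\
Pages:          1
Encrypted:      no
Page    1 size: 595 x 842 pts (A4)
Page    1 rot:  90
File size:      123684 bytes
"""

parsed = parse_pdfinfo(SAMPLE)

# Per-page data as currently stored for page 1 (copied from the dump above).
stored_page = {"Page size": "595 x 842 pts (A4)"}

# Keys present in the pdfinfo output but absent from the stored metadata.
missing = sorted(set(parsed["pages"][1]) - set(stored_page))

print(parsed["pages"][1])  # {'Page size': '595 x 842 pts (A4)', 'Page rot': '90'}
print(missing)             # ['Page rot']
```

With this kind of parse, ['pages'][1] would carry both 'Page size' and 'Page rot', which is what I would expect to find in the stored meta data as well.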

Event Timeline

Osnard created this task. Jul 8 2019, 2:14 PM