
[jobs-api,jobs-cli] php 8.2 crashes when using XMLReader
Open, Medium, Public, Bug Report

Description

Steps to replicate the issue (include links if applicable):

Create a simple XML file named "somefile.xml" in a directory, then run the following code snippet from that directory:

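<?php
  print date('Y-m-d H:i:s') . " Start Test XMLReader\n";
  $xmlFile = new XMLReader();
  // open() returns false if the file cannot be opened
  if (!$xmlFile->open("somefile.xml")) {
    exit("Could not open somefile.xml\n");
  }
  $didRead = 0;
  // read() advances to the next node (elements, end tags, text, whitespace)
  while ($xmlFile->read()) {
    ++$didRead;
  }
  $xmlFile->close();
  print date('Y-m-d H:i:s') . " End Test XMLReader, did read $didRead items\n";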

What happens?:
With PHP 7.3 (the default on tools-sgebastion-11), it runs fine.

With PHP 8.2.7 (the version provided by "webservice php8.2 shell"), it crashes.

What should have happened instead?:

No crash ;^)

Software version (skip for WMF-hosted wikis like Wikipedia):

This PHP version crashes with a segmentation fault:
php -v
PHP 8.2.7 (cli) (built: Jun 9 2023 19:37:27) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.2.7, Copyright (c) Zend Technologies
    with Zend OPcache v8.2.7, Copyright (c), by Zend Technologies

Event Timeline

The file I used:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
        xmlns:schema="http://schema.org/"
        xmlns:gndo="https://d-nb.info/standards/elementset/gnd#"
        xmlns:lib="http://purl.org/library/"
        xmlns:owl="http://www.w3.org/2002/07/owl#"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
        xmlns:skos="http://www.w3.org/2004/02/skos/core#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:editeur="https://ns.editeur.org/thema/"
        xmlns:geo="http://www.opengis.net/ont/geosparql#"
        xmlns:umbel="http://umbel.org/umbel#"
        xmlns:naf="https://id.loc.gov/authorities/names/"
        xmlns:rdau="http://rdaregistry.info/Elements/u/"
        xmlns:sf="http://www.opengis.net/ont/sf#"
        xmlns:bflc="http://id.loc.gov/ontologies/bflc/"
        xmlns:thesoz="http://lod.gesis.org/thesoz/"
        xmlns:dcterms="http://purl.org/dc/terms/"
        xmlns:isbd="http://iflastandards.info/ns/isbd/elements/"
        xmlns:foaf="http://xmlns.com/foaf/0.1/"
        xmlns:mesh="http://id.nlm.nih.gov/mesh/vocab#"
        xmlns:ram="https://data.bnf.fr/ark:/12148/"
        xmlns:mo="http://purl.org/ontology/mo/"
        xmlns:marcRole="http://id.loc.gov/vocabulary/relators/"
        xmlns:agrelon="https://d-nb.info/standards/elementset/agrelon#"
        xmlns:dcmitype="http://purl.org/dc/dcmitype/"
        xmlns:nsogg="https://purl.org/bncf/tid/"
        xmlns:dnbt="https://d-nb.info/standards/elementset/dnb#"
        xmlns:dbp="http://dbpedia.org/property/"
        xmlns:embne="https://datos.bne.es/resource/"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dnb_intern="http://dnb.de/"
        xmlns:madsrdf="http://www.loc.gov/mads/rdf/v1#"
        xmlns:cidoc="http://www.cidoc-crm.org/cidoc-crm/"
        xmlns:v="http://www.w3.org/2006/vcard/ns#"
        xmlns:ebu="http://www.ebu.ch/metadata/ontologies/ebucore/ebucore#"
        xmlns:wdrs="http://www.w3.org/2007/05/powder-s#"
        xmlns:gbv="http://purl.org/ontology/gbv/"
        xmlns:bibo="http://purl.org/ontology/bibo/"
        xmlns:agrovoc="https://aims.fao.org/aos/agrovoc/"
        xmlns:lcsh="https://id.loc.gov/authorities/subjects/"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="https://d-nb.info/gnd/4000002-3">
        <wdrs:describedby>
                <rdf:Description rdf:about="https://d-nb.info/gnd/4000002-3/about">
                        <dcterms:license rdf:resource="http://creativecommons.org/publicdomain/zero/1.0/"/>
                        <dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2010-01-06T12:56:50.000</dcterms:modified>
                        <gndo:descriptionLevel rdf:resource="https://d-nb.info/standards/vocab/gnd/description-level#1"/>
                </rdf:Description>
        </wdrs:describedby>
        <gndo:gndIdentifier rdf:datatype="http://www.w3.org/2001/XMLSchema#string">4000002-3</gndo:gndIdentifier>
        <gndo:oldAuthorityNumber rdf:datatype="http://www.w3.org/2001/XMLSchema#string">(DE-588c)4000002-3</gndo:oldAuthorityNumber>
        <gndo:relatedDdcWithDegreeOfDeterminacy2 rdf:resource="http://dewey.info/class/621.381537/"/>
        <gndo:preferredNameForTheSubjectHeading rdf:datatype="http://www.w3.org/2001/XMLSchema#string">A 302 D</gndo:preferredNameForTheSubjectHeading>
        <gndo:broaderTermGeneral rdf:resource="https://d-nb.info/gnd/4027242-4"/>
        <gndo:gndSubjectCategory rdf:resource="https://d-nb.info/standards/vocab/gnd/gnd-sc#31.9b"/>
        <rdf:type rdf:resource="https://d-nb.info/standards/elementset/gnd#SubjectHeadingSensoStricto"/>
</rdf:Description>
</rdf:RDF>

This is an excerpt of https://data.dnb.de/opendata/authorities-gnd-sachbegriff_lds.rdf.gz

fnegri triaged this task as Medium priority. (Jan 16 2024, 4:37 PM)
fnegri added a project: cloud-services-team.
fnegri moved this task from Inbox to Clinic Duty on the cloud-services-team board.

I can reproduce this on Toolforge, but not on my laptop which has PHP 8.2.12 from Debian. Updating the container seems like a reasonable first thing to try.

Mentioned in SAL (#wikimedia-cloud) [2024-01-17T08:56:02Z] <taavi> update all pre-built docker images T352886

Ah, I was wrong: 8.2.7 is the newest version in Bookworm (stable); the newer versions are in testing. Oddly enough, however, I can't reproduce the crash in a standalone VM running 8.2.7.

The crash seems to appear only after installing the tideways extension (php8.2-tideways):

taavi@taavi-xmltest:~$ php somefile.php
2024-01-18 15:39:59 Start Test XMLReader
2024-01-18 15:39:59 End Test XMLReader, did read 43 items
taavi@taavi-xmltest:~$ sudo apt install php8.2-tideways
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  php8.2-tideways
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 15.5 kB of archives.
After this operation, 57.3 kB of additional disk space will be used.
Get:1 http://mirrors.wikimedia.org/debian bookworm/main amd64 php8.2-tideways amd64 5.0.4-16 [15.5 kB]
Fetched 15.5 kB in 0s (725 kB/s)
Selecting previously unselected package php8.2-tideways.
(Reading database ... 45489 files and directories currently installed.)
Preparing to unpack .../php8.2-tideways_5.0.4-16_amd64.deb ...
Unpacking php8.2-tideways (5.0.4-16) ...
Setting up php8.2-tideways (5.0.4-16) ...
Processing triggers for php8.2-cli (8.2.7-1~deb12u1) ...
taavi@taavi-xmltest:~$ php somefile.php
2024-01-18 15:40:06 Start Test XMLReader
Segmentation fault
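To check whether a given PHP binary (for example the one in the php8.2 webservice image) has the profiler loaded, a small script along these lines could help. It is a minimal sketch and not part of the original report: it assumes the extension installed by php8.2-tideways registers a name containing "tideways" (the exact name is not confirmed in this task), and the function name findTidewaysExtensions is hypothetical.

```php
<?php
// Diagnostic sketch: report which loaded PHP extensions look like the
// tideways profiler. The substring match on "tideways" is an assumption;
// the exact extension name registered by php8.2-tideways is not confirmed here.

/**
 * Hypothetical helper: return the extension names that contain "tideways"
 * (case-insensitive), preserving their original order.
 *
 * @param string[] $extensions List of extension names, e.g. get_loaded_extensions().
 * @return string[]
 */
function findTidewaysExtensions(array $extensions): array
{
    return array_values(array_filter(
        $extensions,
        function (string $name): bool {
            return stripos($name, 'tideways') !== false;
        }
    ));
}

$found = findTidewaysExtensions(get_loaded_extensions());

// Print the PHP version, whether XMLReader is available, and any matches.
echo 'PHP ' . PHP_VERSION . "\n";
echo 'xmlreader: ' . (extension_loaded('xmlreader') ? 'loaded' : 'not loaded') . "\n";
echo 'tideways-like extensions: '
    . ($found === [] ? 'none' : implode(', ', $found)) . "\n";
```

Run it with the same php binary that crashes; if a tideways-like extension is listed alongside xmlreader, that binary matches the failing setup above. "php -m" prints the full list of loaded modules if you prefer not to use a script.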

@taavi: Great job finding this problem. Now the mother of all questions: do we need tideways? Does any tool use its functionality?

dcaro renamed this task from "php 8.2 crashes when using XMLReader" to "[jobs-api,jobs-cli] php 8.2 crashes when using XMLReader". (Mar 5 2024, 4:11 PM)
taavi removed taavi as the assignee of this task. (Jun 25 2024, 3:35 PM)