How to read large XML files efficiently using PHP?

Recently I faced the problem of parsing a large XML file using PHP. Small files are no problem and are parsed quickly, but an attempt to parse larger files often causes a time-out or an Internal Server Error.

We cannot practically use the DOM or SimpleXML extensions to parse large XML documents in PHP. Both extensions load the entire XML document into memory while parsing, so they do not work well for large XML documents.

Then how do we read a large XML document using PHP?

PHP has another extension called XMLReader. This extension is enabled by default as of PHP 5.1.

How does the XMLReader extension differ from the others?

XMLReader reads XML documents as a stream, so it does not store the entire document in memory. Instead, it reads one node at a time and lets you interact with that node as it goes. Once you move on to the next node, the old one is thrown away, unless you explicitly store it yourself for later use. This lets a stream-based parser consume much less memory and, for large files, usually run faster.
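To make the node-by-node behaviour concrete, here is a minimal sketch. The sample document and the `$visited` array are assumptions made up for illustration. Each call to `read()` advances to the next node, and only what the loop copies into `$visited` survives.

[php]
<?php
// Hypothetical sample document, used only for illustration.
$xml = '<products><item><name>Prod 1</name></item></products>';

$reader = new XMLReader();
$reader->XML($xml);

$visited = [];
while ($reader->read()) {
    // Only the current node is available here. Earlier nodes are gone
    // unless we copy what we need, as this array does.
    $visited[] = $reader->nodeType . ':' . $reader->name;
}
$reader->close();

// One line per visited node: "<node type constant>:<node name>"
echo implode("\n", $visited), "\n";
[/php]

The numbers printed are the values of the `XMLReader` node type constants, such as `XMLReader::ELEMENT` and `XMLReader::END_ELEMENT`.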

DOM and SimpleXML, by contrast, are tree-based parsers: they load the entire document into memory and store it in a data structure called a tree. That is the better option for smaller XML documents. For a large XML document, however, it can cause major performance issues or errors such as time-outs.
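For comparison, a small sketch of the tree-based approach with SimpleXML. The sample data and variable names are assumptions for illustration. Because the whole tree is built first, any node can be reached at any time, which is convenient for small documents but is also why memory use grows with file size.

[php]
<?php
// Hypothetical sample document, used only for illustration.
$xml = '<products>'
     . '<item><name>Prod 1</name><price>2.00</price></item>'
     . '<item><name>Prod 2</name><price>5.00</price></item>'
     . '</products>';

// simplexml_load_string() parses the whole document into a tree up front.
$products = simplexml_load_string($xml);

// With the full tree in memory, we can iterate over the items...
$names = [];
foreach ($products->item as $item) {
    $names[] = (string) $item->name;
}

// ...or jump straight to any node by position.
$secondPrice = (string) $products->item[1]->price;

echo implode(', ', $names), ' / ', $secondPrice, "\n";
[/php]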

How to use XMLReader?

Code snippet to load an XML document from a file or URL, or to load XML data from a variable:

[php]
<?php
// Sample XML held in a PHP variable (single quotes keep the $ signs literal)
$document = '<products><item><name>Prod 1</name><price>$2.00</price></item><item><name>Prod 2</name><price>$5.00</price></item></products>';

// Create a new XMLReader instance
$reader = new XMLReader();

// Option 1: load from an XML file or URL
$reader->open('document.xml');

// Option 2: load from a PHP variable (use one option or the other)
$reader->xml($document);
?>
[/php]

Code snippet to loop through the XML document:

[php]
<?php
// Loop through the XML document, one node at a time
while ($reader->read()) {
    // Start tag of a <name> element
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == 'name') {
        // Read the text content of the current element
        echo $reader->readString();
    }

    // End tag of any element
    if ($reader->nodeType == XMLReader::END_ELEMENT) {
        // Handle the closing tag here, if needed
    }
}

// Release the resources held by the reader
$reader->close();
?>
[/php]
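Building on the loop above, here is a rough sketch of how the start and end element checks can work together to collect one record per `<item>`. The helper name `readItems` and the `name`/`price` fields are assumptions taken from the sample document, not part of XMLReader itself, and the generator return type needs PHP 7 or later.

[php]
<?php
/**
 * Hypothetical helper: yields one ['name' => ..., 'price' => ...] array
 * per <item>, so only a single item is held in memory at a time.
 */
function readItems(XMLReader $reader): Generator
{
    $item = null;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            if ($reader->localName === 'item') {
                // A new <item> starts: begin a fresh record.
                $item = [];
            } elseif ($item !== null && in_array($reader->localName, ['name', 'price'], true)) {
                // Store the text content of <name> or <price>.
                $item[$reader->localName] = $reader->readString();
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT && $reader->localName === 'item') {
            // </item> reached: hand the finished record to the caller.
            yield $item;
            $item = null;
        }
    }
}

// Sample data from this post; single quotes keep the $ signs literal.
$document = '<products><item><name>Prod 1</name><price>$2.00</price></item>'
          . '<item><name>Prod 2</name><price>$5.00</price></item></products>';

$reader = new XMLReader();
$reader->XML($document);

$items = [];
foreach (readItems($reader) as $item) {
    $items[] = $item; // In real code, process each item here instead of keeping them all.
}
$reader->close();

print_r($items);
[/php]

For a large file you would call `$reader->open()` with the file path instead of `XML()`, and handle each item inside the `foreach` rather than collecting them in an array.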

I hope this post helps you get started with XMLReader and with parsing large XML documents. You can explore XMLReader and its methods further at http://php.net/manual/en/book.xmlreader.php.

Permanent link to this article: https://blog.openshell.in/2014/09/how-to-read-large-xml-files-efficiently-using-php/