Tuesday, April 19, 2011

XPath Matching in Unique XML Namespaces (xmlns)

I ran into a problem trying to get my Python script to locate an etree node in an XML document that had declared a namespace for all elements in the document. The find and xpath functions weren't returning anything for any of the searches I was doing. Nothing seemed to make sense until I realized that I wasn't including the namespace in the search. The annoying part is that you have to put the namespace on each and every element name in the search path! To simplify things a little, I went with code that looked a little like this:

The full code:

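from lxml import etree as ET

# Parse the sitemap; binary mode lets lxml work out the encoding itself.
with open("sitemap.xml", "rb") as fp:
    element = ET.parse(fp)

# The namespace URI wrapped in braces, ready to prepend to each tag name.
namespace = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Find every <url> element directly under the root and print its text.
e = element.findall('{0}url'.format(namespace))
for i in e:
    print(i.text)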
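
If you don't want to paste the braces onto every tag name, findall and find also accept a prefix-to-namespace mapping. Here is a rough sketch of that approach, not necessarily the best way to do it. The inline sitemap, its example.com addresses, and the "sm" prefix are all made up for illustration, and the import falls back to the standard library's ElementTree (which takes the same findall arguments) in case lxml isn't installed.

```python
try:
    from lxml import etree as ET          # the library used in this post
except ImportError:
    import xml.etree.ElementTree as ET    # stdlib fallback, same find/findall arguments

# Hypothetical two-entry sitemap, inlined so the example needs no file.
SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/about</loc></url>
</urlset>"""

root = ET.fromstring(SITEMAP)

# Searching without the namespace matches nothing, which was the original problem.
unqualified = root.findall("url")

# Map a prefix of our own choosing ("sm" is arbitrary) to the namespace URI.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
qualified = root.findall("sm:url", NS)

# Each <url> wraps a <loc> child that holds the page address.
locs = [url.find("sm:loc", NS).text for url in qualified]

print(len(unqualified))
print(locs)
```

With lxml, the xpath function takes the same kind of mapping through its namespaces keyword, for example root.xpath("//sm:loc", namespaces=NS).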

