Tuesday, April 19, 2011

XPath Matching in Unique XML Namespaces (xmlns)

I ran into a problem trying to get my Python script to locate an etree node in an XML document that declared a default namespace for all of its elements. The find and xpath functions weren't returning anything for any of the searches I was doing. Nothing made sense until I realized that I wasn't including the namespace in the search. The bugger with this is that you have to put the namespace on each and every element name in the search path! To simplify things a little, I stored the namespace in a variable and formatted it into each tag name.

The full code:

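from lxml import etree as ET

# Parse the sitemap; ET.parse accepts a filename directly.
element = ET.parse("sitemap.xml")

# The namespace URI, wrapped in braces, is prepended to every tag name.
namespace = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
e = element.findall('{0}url'.format(namespace))
for i in e:
    # .text is only the text directly inside the matched element.
    print(i.text)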
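An alternative to formatting the namespace into every tag is to pass a prefix-to-URI mapping to findall. The following is only a sketch of that idea, not the script above: the inline sample document, the example.com URLs, the "sm" prefix, and the function name are all made up for illustration, and the import falls back to the standard library's ElementTree if lxml isn't installed.

```python
try:
    from lxml import etree as ET
except ImportError:  # the stdlib module exposes the same findall API
    import xml.etree.ElementTree as ET

# Hypothetical sample sitemap, inlined so the sketch is self-contained.
SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/about</loc></url>
</urlset>"""

# The prefix "sm" is arbitrary; only the URI has to match the document.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locations(xml_bytes):
    """Return the text of every <loc> found under a <url> element."""
    root = ET.fromstring(xml_bytes)
    return [loc.text for loc in root.findall("sm:url/sm:loc", namespaces=NS)]


locations = sitemap_locations(SITEMAP)
for location in locations:
    print(location)
```

With lxml, the xpath function accepts the same kind of mapping through its namespaces argument, so the prefix only has to be defined once for both styles of search.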

Editing XML/DTDs/XSLT/XPath in Linux

My favorite new tool for editing XML in Linux? Eclipse. That's right. The does-it-all-and-comes-with-batteries-too tool does a great job editing XML, XSLT, XPath, etc. All you need to do is add the plugin for Eclipse Web Tools (available in the standard list of plugins). Just enable the software site and install the latest version! Very simple and very powerful.

Thursday, April 14, 2011

Manipulating XML Using Python

I work with XML-related content on a day-to-day basis at work. I come from a .NET background and have written dozens of applications that leverage DOM when manipulating XML. Recently, I've started broadening my horizons to include more languages. I've now written a few applications in Python that do tasks similar to my .NET applications, and there's one area that I always find lacking: XML manipulation with eTree. Perhaps I'm mistaken, but it appears from community pages that eTree is the de facto standard in Python for manipulating XML. Sure, it does *most* things correctly, but every once in a while I can't help but stop and think, "this was a whole lot easier with such and such method in .NET" or "why does etree.xpath() work when etree.find() doesn't?" Why are there two ways to do essentially the same thing in the same class library anyway?

One area where eTree really lacks cohesive support is mixed-content XML, where text and child elements are interleaved (Some Text some more text even more text.). Dealing with text and tails in this sort of situation is a nightmare, but it is completely normal in the XML I work with.
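To make the text/tail split concrete, here is a small sketch. The p and b tag names are my own assumption for illustration; the point is only how the element tree spreads one sentence across three attributes.

```python
try:
    from lxml import etree as ET
except ImportError:  # the stdlib module behaves the same way here
    import xml.etree.ElementTree as ET

# Hypothetical mixed-content fragment: text, a child element, more text.
para = ET.fromstring("<p>Some Text <b>some more text</b> even more text.</p>")
bold = para.find("b")

head = para.text    # text before the first child belongs to the parent
inner = bold.text   # text inside the child
tail = bold.tail    # text after the child is stored on the child, not the parent

# itertext() walks the element in document order and yields every piece.
full = "".join(para.itertext())
print(full)
```

The awkward part is the tail: the trailing text lives on the child element, so removing or moving that child takes its tail along unless you copy it somewhere first.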

Maybe I'm coming at this the wrong way. Maybe there's a better option out there I haven't considered yet? Maybe I'm just not used to seeing DOM in a Python-esque way. What do you think?

Tuesday, April 12, 2011

PyDev and Code Completion

I was struggling with code completion/hinting in PyDev on Eclipse 3.5.2 (default for Ubuntu) and came across this useful little tidbit on the Internet:

"To enable code completion, go to Window > Preferences > Pydev > Editor > Code Completion, and check the 'Use Code Completion?' box, as well as the other boxes for what you want to complete on. It seems to take a second to load, the first time it has to complete something."

Problem solved! I now have my autocomplete settings just the way I want them! Thanks Internet! If I haven't said it lately, you're awesome.