Wednesday, December 15, 2004

my latest stuckness

I just posted the following to c.l.python. Hope they can help...

"I am trying to do some xpath on
but cannot get past 'point A' (that is, I am totally stuck):

>>> import libxml2
>>> mydoc = libxml2.parseDoc(text)
>>> mydoc.xpathEval('/html')
[]"

The purpose of this is to improve my web-scraping skills (I'd like to scrape the audio from HOPE 5, grabbing the mp3, author info, and description, and import it all directly into iTunes). I have pretty much settled on the idea that the ideal route for scraping HTML is HTML | Tidy | XPath. Tidying in Python took a surprisingly long while to get right (Tidy has a lot of keyword options that are not reflected in the Python wrapper, so the wrapper is not self-documenting, which is a strike against it), so now I am stuck on the XPath part. If I do not get unstuck, it's back to plan B.
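
One possible explanation for an empty result like the one quoted above, and it is only a guess: if Tidy is emitting XHTML, the root element carries a default namespace, and an unprefixed XPath step such as /html then matches nothing. The sketch below shows that effect with the standard library's xml.etree.ElementTree rather than libxml2, so it is a stand-in and not the code from the post. The sample markup, the file name talk1.mp3, and the prefix "x" are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of what Tidy might produce with XHTML output turned on.
# The xmlns attribute puts every element into the XHTML namespace.
text = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>sample page</title></head>
  <body><a href="talk1.mp3">Talk 1</a></body>
</html>"""

root = ET.fromstring(text)

# Prefix-to-URI mapping; the prefix "x" is an arbitrary choice.
XHTML = {"x": "http://www.w3.org/1999/xhtml"}

# Unqualified name: finds nothing, because <a> lives in the XHTML namespace.
plain = root.findall(".//a")

# Qualified name: finds the link once the prefix is mapped to the namespace.
qualified = root.findall(".//x:a", XHTML)

# Attributes without a prefix are not namespaced, so plain "href" works.
hrefs = [a.get("href") for a in qualified]

print(plain)
print(hrefs)
```

If this is what is going on, the analogous step in libxml2's Python bindings would presumably be to create an XPath context with xpathNewContext(), register a prefix with xpathRegisterNs(), and evaluate a prefixed expression on that context. I have not verified that against the document in question.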


