Wednesday, December 29, 2004

blogmail???

I recently posted a somewhat flaming comment on blogmaverick, to which someone else replied. I took it offline and replied to their top blog entry for lack of email, and he did the same to mine. Is this a strange new form of communication???

(I think not, but it was a funny experience and probably typical of the interesting accidental ways the Internet brings people together)

Tuesday, December 28, 2004

Asia donation link

Our (Liquidnet) CTO posted a donation link for the tsunami/earthquake victims. He got it from Joel Spolsky's mailing list, and I tend to think of Joel as a pretty thorough guy in terms of recommending things.

the rumors of c++ death are premature

This year I noticed progressive degradation of the CUJ site. With some people's whining about the doom of C++, I (sadly) thought CUJ was going to join the C++ Report in programming language heaven. I was apparently wrong.

Google Scholar: good or evil?

Evil:
All those links to the $10/doc papers locked in ACM. I guess if you are already a member this is good ;).
Good:
Links to find the book in your local library! Really good!

about 'third world' economies

a friend pointed to this. IDL looks very cool, and there is an interview (text+audio) with De Soto here.

Sunday, December 26, 2004

american values

Just as a proof of AMK's sentiment about the way Americans value the lives of people in other parts of the world, the reaction to today's '9' earthquake and the ensuing tsunami that drowned at least 12,000 people (12 going on 40 in places where communication is so poor, IMO) in Asia this morning was underwhelming, at least in the press... Nothing to spoil that after-Christmas sales spirit, huh?

Thursday, December 23, 2004

good food for language junkies

the 'Panel' is very interesting, and occasionally very funny.

Wednesday, December 22, 2004

the tale of two ebooks

I recently made two ebook purchases: I got a speed-reading book from Amazon and a programming book from Manning. The experiences were very different. Amazon's ebook delivery took a nice while (while they were contacting the 3rd-party DRM system, I assume), right away eliminating one of the benefits of ebook buying: immediacy. What I then got was a special 'anchor' file sent to my email, which upon opening in Acrobat (strike 2: does not even try to work in Preview) required me to sign into MS-Passport and then eventually failed with a nondescript error. So much for Mac. I was able to get it to open on my PC (only a single PC - viewing on a second PC gave me a DRM error saying that I had already downloaded the doc). After the file opened it was pretty useless, as any kind of printing or copying was disabled. In summary: don't do it, it's a freaking waste of money!
In contrast, when I got a book from Manning there was an immediate download available, a beautifully typeset book, no DRM insanity. The only protection they implemented is putting my name & email address on every page, basically relying on my integrity. Excellent service, I wish there were more like them.


Tuesday, December 21, 2004

oreilly hacks

I own some books from the O'Reilly Hacks series, and like the format quite a bit. I gave a bit of thought as to why I like this very free-format style. This reminded me of a statistical method called Monte Carlo. The method was actually invented by mathematician Stan Ulam, and was instrumental in the development of the H-Bomb. The basic idea of the method is pretty simple, and the best practical explanation I know of goes like this: say you have a very irregular shape that you need to measure the area of. Because the shape is so irregular it would be hard to use regular area-measurement algorithms on it. But we can do a pretty easy experiment which Stan proved will give us better and better approximations. First, we draw an easily measurable shape, like a rectangle, around the irregular shape. Then we throw a dart randomly inside the rectangle. We record whether it ended up inside the irregular shape. Repeat many times. Tally up how many times the dart ended up inside the shape of interest. The area can be approximated by Area-Of-Rectangle*(Dart-Hits-Inside-Weird-Shape/Total-Dart-Throws). This guess will get more and more accurate with the number of tries.
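Here is a minimal sketch of the dart-throwing experiment in Python. The function name `estimate_area` and the `inside` predicate are made up for illustration, and the "weird shape" is just a unit circle, picked only because its true area (pi) is known and the estimate can be checked against it.

```python
import math
import random

def estimate_area(inside, x_min, x_max, y_min, y_max, throws, seed=0):
    """Estimate the area of a shape by throwing random darts at its bounding rectangle.

    `inside(x, y)` is a hypothetical predicate that says whether a dart hit the shape.
    """
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    hits = 0
    for _ in range(throws):
        x = rng.uniform(x_min, x_max)
        y = rng.uniform(y_min, y_max)
        if inside(x, y):
            hits += 1
    rectangle_area = (x_max - x_min) * (y_max - y_min)
    # Area-Of-Rectangle * (Dart-Hits-Inside-Weird-Shape / Total-Dart-Throws)
    return rectangle_area * hits / throws

# Stand-in for the irregular shape: a circle of radius 1 centered at the origin.
def in_unit_circle(x, y):
    return x * x + y * y <= 1.0

estimate = estimate_area(in_unit_circle, -1.0, 1.0, -1.0, 1.0, throws=100_000)
print(estimate, "vs", math.pi)
```

With more throws the estimate should creep closer to pi; swapping in a different `inside` function is all it takes to measure some other shape.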
How does this relate to the O'Reilly books? I think the disorganized format gives you a good idea of the overall area without covering it in an organized manner. This is similar to the dart throws in the Monte Carlo method. By having multiple somewhat random articles about the subject, this format gives you a broad and accurate overview of an area, which in some sense is more compact and broad than a well-organized book on the subject.

Monday, December 20, 2004

I am sick...

I have NADD++

a diversion

Finally I cracked and decided to blog this. Following Joel Spolsky's recommendation and my habit of occasionally reading business books I got the classic

It's got to be the book on the subject because I did not die of boredom and actually have enough interest to keep on reading, despite the semi-technical flavor of the book. The secret is that the technicalities are first explained conceptually, then mathematically and then reinforced by examples, which makes the whole thing digestible. Another reason I like the book is that it explains things about pricing that you see in the real world, which gives the consumer (me) better ways to think about the values of things and the mental mechanisms businesses use to occasionally get you. The chapter I am reading right now is on the 'reference point' that people use when making pricing decisions. The basic idea is simple: we use the price of competing products as a reference point for evaluating other products. Their real-life examples are great, here are a couple:
"Vaseline Intensive Therapy" lip balm is 1400% (no mistake in zeros) more expensive than a regular tub of Vaseline, which is the same stuff. What allows this markup is competition with "Chapstick", which sets the reference price.
Another good example is Jelly Belly candy. You will never see them sold in your grocery store next to other jellybeans - the huge markup against the generic reference point would not allow this. Instead they rely on being sold in gourmet candy stores where the reference point is going to be, for example, some expensive chocolates. Another thing not openly mentioned in the book (so far) is that the reference point is really a relative term.

not-so-iron python for .NET

This certainly deserves a mention. This implementation of Python, while not compiling to the CLR, has pretty much full access to the .NET platform, plus .NET events can be handled and delegates can be written in Python.

sync my stuff?

A while ago I decided that rsync is The Tool for backing up my stuff, like my large collection of interesting PDF files or my programming adventures. Of course I have not so far taken the time to learn the tool and relied on manual copying and prayer for backups. Now since there is a Python implementation maybe I will get somewhere with this!

python scraping follow-up

So c.l.python was helpful as usual. What I was missing was the default namespace prefix in my xpath expression - I have to do some reading up on xpath/xml namespaces interaction. In the meantime I asked them back what would happen if I simply yank the namespace from the file for now to make the xpath-ing easier. Their concern was about potential multiple namespaces, which is not a problem for me as Tidy does not produce multiple namespaces anyway. So I got xpath to play along after I removed the default namespace.
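For the record, here is a small sketch of the default-namespace gotcha. It is an assumption-laden stand-in: it uses the standard library's ElementTree (which only supports a subset of XPath) instead of the libxml2 bindings I was actually using, and the tiny XHTML document and the prefix `x` are made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical stand-in for what Tidy produces: one default namespace on the root.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body><p>first</p><p>second</p></body>
</html>"""

root = ET.fromstring(DOC)

# Without a prefix the path looks for un-namespaced elements and finds nothing.
no_prefix = root.findall("./body/p")

# Binding a prefix (any name will do) to the namespace URI makes the same path match.
with_prefix = root.findall("./x:body/x:p", {"x": XHTML_NS})

# The quick-and-dirty route: yank the default namespace declaration before parsing.
stripped_root = ET.fromstring(DOC.replace(' xmlns="%s"' % XHTML_NS, "", 1))
after_strip = stripped_root.findall("./body/p")

print(len(no_prefix), [p.text for p in with_prefix], [p.text for p in after_strip])
```

The string-replace trick is only reasonable because there is a single default namespace in the file; with several namespaces in play the prefix mapping is the safer choice.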
On to the next step, finding the right xpath expression to get what I need. BTW the library I have been using is libxml2, which currently seems to have a lot of momentum. I used a couple of editors to look at the document's tree, and the nesting was pretty horrendous. If I was an xpath expert perhaps I would be able to crack this easily, but I felt a need for some kind of automation. I thought it would be cool to find an editor with 'xpath reverse-engineering', which would allow you to point to a node graphically and have an xpath suggested to you. Of course the theoretical problem with this is that multiple xpaths could point to a node, but I had a feeling that some 'reasonable' solution is possible here. I did some googling around and came up with zero Google results. Admittedly I have not tried XmlSpy, which I am told is The Tool for XML.
Then I somehow remembered reading about a tool that allows you to navigate XML like a file system from the command-line. That sounded neat, even though it did not solve my problem directly. I quickly 'remembered' what the tool was (using google) and fired it up - 'xmllint --shell input.xhtml'. This gives you a prompt, and a lot of the standard shell navigation works like you expect: ls lists the current node's subentities, cd allows you to jump to another node, pwd gives you the current path. I wanted to see the rest of the commands, asked the shell for help, and (drum roll...) there it was - the 'grep' command! I immediately realised that this is what I was looking for - you could 'grep' on a node's CDATA and get the xpath to it. The answer to the multiple potential xpaths was also right there - there is a 'simple' xpath to a node which can be obtained by navigation from the root without any wildcards or attribute matching stuff. So now I am almost there.
What remains is finding a slightly more abstract xpath that would select ALL the nodes that I am interested in, so that I can process the whole list of them in a loop. To be continued...
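To make the 'grep, then generalize' idea concrete, here is a rough Python sketch. It is not what xmllint does internally: `grep_paths` is a hypothetical helper written on top of the standard library's ElementTree, and the sample document is invented (and namespace-free, as after the stripping step).

```python
import xml.etree.ElementTree as ET

# Made-up, namespace-free sample document.
SAMPLE = """<html><body>
<div><p>intro</p></div>
<div><p>talk one</p><p>talk two</p></div>
</body></html>"""

def grep_paths(elem, needle, path=""):
    """Yield a 'simple' path (tags and positions only, no wildcards)
    for every element whose own text contains `needle`."""
    path = path or "/" + elem.tag
    if elem.text and needle in elem.text:
        yield path
    seen = {}  # how many children with each tag we have passed so far
    for child in elem:
        seen[child.tag] = seen.get(child.tag, 0) + 1
        child_path = "%s/%s[%d]" % (path, child.tag, seen[child.tag])
        yield from grep_paths(child, needle, child_path)

root = ET.fromstring(SAMPLE)
paths = list(grep_paths(root, "talk"))
print(paths)

# Dropping the last positional index turns one concrete path into a slightly
# more abstract one that selects all the sibling nodes, ready for a loop.
talks = [p.text for p in root.findall("./body/div[2]/p")]
print(talks)
```

The generalizing step is still done by eye here: look at the concrete paths, spot which index varies, and drop it.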

Wednesday, December 15, 2004

my latest stuckness

I just posted the following to c.l.python. Hope they can help...

"I am trying to do some xpath on
http://fluidobjects.com/doc.xhtml
but cannot get past 'point A' (that is, I am totally stuck):

>> import libxml2
>> mydoc = libxml2.parseDoc(text)
>> mydoc.xpathEval('/html')
>> []"

The purpose of this is to improve my www-scraping skill (I'd like to scrape the audio from HOPE 5, grabbing mp3, author info and description and importing directly into my iTunes). I pretty much settled on the idea that the ideal route for scraping HTML is HTML | Tidy | XPath. Tidying in python took a surprising while to get right (it has a lot of keyword options which are not reflected in the python wrapper, ergo the wrapper is not self-documenting, which is a strike), so now I am stuck on the XPath thingy. If I do not get unstuck it's back to plan B.

blockbust

For one reason or another I get a bit nostalgic when I witness a great big business in a downward spiral. The one I am thinking of right now is Blockbuster. Their main business model is obviously at the end of its rope - who wants to shlep to the video store only to find that what they want to rent is not available, and if it is available it is scratched up, and if you even get to watch it you are pretty likely to be hit with an unexpected late fee which is almost impossible to fight if they are wrong. But when there were no other options we were pretty happy going to BB, the store was clean and well taken care of and there was always the excitement of getting a movie. Or maybe that was just what being a kid feels like.
Anyways, why do I think BB is dead? Aren't they copying the successful model of Netflix?
The brilliance of Netflix was not the idea, but the timing. The idea of delivering movies via mail must have seemed crazy at a time when everything was going towards digital delivery. But Netflix saw the opportunity to gain market share in a small window created by the upheaval of the physical-to-digital distribution switchover. This was not their ultimate goal, I venture, and I think the proof is in their recent teaming up with Tivo to do digital delivery. So while in the short run BB can suck some market share off them (mostly opportunity market share, as I doubt current Netflix customers will be wooed by the BB offering), they are fighting yesterday's battle. The company's creative spirit appears stagnant, and while there is always room for a turnaround, things are not looking good...
But in the end the decline of companies is overall a good thing. Without it the world would be dominated by powerful monopolies, and invention and the attendant economic improvement would stop. So, goodbye, BB (maybe)!

Swear this wasn't photoshopped!

Monday, December 13, 2004

Channukah

Here is a Happy Channukah from CLISP (albeit it is one day off, at least if you light at sunset):


And if you forgot your siddur (prayer book) and cannot for some reason remember the Rosh Chodesh Musaf - there is some help from Amazon

Saturday, December 11, 2004

most funnest LISP tutorial EVER (with Macros!)

Go check it out!

python madness

I recently sent around a bug I found (in my own code!) to some of my colleagues as a puzzle and hid the answer at the end in the following obfuscated Python bit:
"You will get the answer when you execute this from command-line:

python -c "import new, sys;print (lambda s: s and s[-1]+new.function(sys._getframe().f_code, globals())(s[:-1]) or s)('.srotarepo noisrevnoc fo eraweB :nosseL \n.tcejbo TNOFH gniylrednu eht llik lliw rotcurtsed s\'tnoFC lanigiro eht dna ,ton si tI \n.tcerroc yllacitnames si tnemngissa tnoFC taht kniht uoy sekam rotcurtsnoc gnitpecca-TNOFH eht htiw noitanibmoc ni rotarepo noisrevnoc TNOFH\n')"

The only really cool (and hopefully useless :) thing here is the list/string reverse function written as a recursive lambda. Of course there is no normal way to call a lambda in python recursively since it doesn't have a name, so you have to poke around the current frame object to get what you want. So you get this beauty:

(lambda s: s and s[-1]+new.function(sys._getframe().f_code, globals())(s[:-1]) or s)
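The `new` module is gone in Python 3, so here is a sketch of the same trick for anyone on a newer interpreter. It assumes CPython (since `sys._getframe` is an implementation detail) and swaps in `types.FunctionType` for `new.function`; the short test string is made up.

```python
import sys
import types

# The lambda rebuilds "itself" from the code object of the frame it is running in,
# then calls that copy on the string minus its last character.
message = (lambda s: s and s[-1] + types.FunctionType(
    sys._getframe().f_code, globals())(s[:-1]) or s)("!dlrow olleh")

print(message)
```

Like the original, this recurses once per character, so a string longer than the interpreter's recursion limit will blow the stack. Still hopefully useless :)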

I think I'll go take a shower now :)

Wednesday, December 08, 2004

The sounds of thunder...

Installed Thunderbird 1.0. It seems NOTICEABLY faster than the previous version (particularly on windoze), just like Firefox did when you switched to 1.0. Coincidence? Whatever it is, can't stop the open source! Yippie-ki-yay, Mister Falcon!

Monday, December 06, 2004

See KILL BILL (2)

(note: this was written a few months ago, and is even more poorly edited than my other stuff. I am sorry you are reading it)

This one was worth the wait. (Side note: apparently I was not the only one waiting for it. I went to Blockbuster 3 times to pick it up and there were no copies left. I assume Blockbuster anticipated the initial spike in demand, because the movie was not "guaranteed in stock" as many other new releases are. They probably figured that due to the "spiky" nature of demand here they would be stuck with a bunch of unrentable copies in the end. This makes sense except for the fact that this movie is bound to become a cult classic and people will want to buy copies to own. But I will let their actuaries do their job. Anyway, the point I was getting at was that I HAD to download the movie ILLEGALLY just to watch it when it came out. (I did rent a copy the following week, which I usually do since I want studios to make more good ones. Besides, the quality was much better for the second viewing.) This is a hint to the studios (anyone listening?) about electronic distribution solving at least the demand vs. physical copy mismatch. Also I actually signed up for CinemaNow a while ago, but it appears that CinemaNow and Movielink are getting short shrift here, because the movie is not available electronically when the demand is the highest. It is available on CinemaNow sometime in September, but if you haven't watched it by then you are not really KILL BILL material.) Anyway, back to the movie. If you saw KILL BILL part 1 you only saw half the movie, and probably the wrong half. Stephen King suggested multiple times in EW that this is not 'great cinema', making comparisons to "Mystic River" (which he thought was great. I did too). I seriously wonder if he regrets saying that after seeing part 2. This movie goes back and puts an incredible amount of emotions (plural, as there is much more than revenge here) into the storyline. (Don't worry, the action sequences are still awesome.) And the cinematography and acting are incredible.
(for those who don't mind me spoiling it: the scene-before-last can qualify as one of the most powerful movie scenes of all time IMO. And here is the (un)real QT twist: the fight scene that ends with BILL being killed is actually a very romantic scene - and we owe that to the "Pai Mei five point exploding heart technique". This technique, when used on an opponent, allegedly guarantees a sure death from a "heart explosion" once the opponent takes a few steps (5 or 8 IIRC). This incredible Kung-Fu move (by QT) allows us to see the relationship between Bea and BILL in its pure raw form: you actually see how much they love each other because there are no other considerations in play - BILL is effectively dead (but still alive and talking until he takes the few fateful steps), which frees both him and Bea from his twisted path and allows a sort of greatness to come out. And the name: "exploding heart" technique - can this really be a coincidence? Wow, totally ingenious. The next scene of Bea just crying her heart out is a good reinforcement.)
See Kill Bill 2. QED.

post of the day

This really made me happy - my feelings on Python vs Java are strongly reinforced now:
http://dirtsimple.org/2004/12/python-is-not-java.html
Of course I do not really know Java. But this explains why :)

Thursday, December 02, 2004

a simple problem

My first blog post. Wonder if anyone will read.
I live on Long Island and work in Manhattan (for Liquidnet, a pretty kick-ass company). As this winter has been pretty warm, New York has felt quite a bit like Seattle (I lived there for 4 years): wet and overcast. All this wetness created the following problem: umbrellas. Well, umbrellas themselves are not the problem, but having one always available is. Of course, some people solved this by having a super-tiny umbrella that they can always carry around. But the recent rains have been accompanied by quite a bit of wind, so I need my Giant Umbrella. That said, the question is where should I keep it. If I keep it at home and it rains on the way back from work I am screwed, and vice versa. What about keeping 2 umbrellas: one at home and one at work? That should work (and working at Liquidnet one can afford at least two umbrellas :). But this scheme would require always ensuring that both locations are umbrella-equipped, which may require me to shuttle my Giant Umbrella next time I go back to the location after the rain forced me to take the umbrella from there. Since this shuttling may be occurring on an obviously sunny day it will shock my co-passengers on the Long Island Rail Road, an effect I try to avoid having on people. Solution? More umbrellas! What if I have, say, 100 umbrellas at each location? Assuming random distribution of rain activity, what should happen is that for every umbrella I take from location A to B due to rain I should eventually take one from B back to A. Actually as I thought about this I realized that this is another reincarnation of the Random Walk problem. This final realization is really the reason I am posting this garbage: the moral of the story is that complicated mathematics is all around us, if we only think a little harder.
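For the curious, here is a toy simulation of the umbrella scheme, a sketch under simplifying assumptions that are mine: every trip is independently rainy with the same probability, and I take an umbrella only when it rains. The names `commute`, `p_rain` and `soaked` are invented for illustration.

```python
import random

def commute(umbrellas_per_location, trips, p_rain, seed=0):
    """Simulate alternating home->work and work->home trips.

    Returns (number of rainy trips with no umbrella on hand, final umbrella stock).
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    stock = {"home": umbrellas_per_location, "work": umbrellas_per_location}
    here, there = "home", "work"
    soaked = 0
    for _ in range(trips):
        if rng.random() < p_rain:       # it rains on this trip
            if stock[here] > 0:
                stock[here] -= 1        # carry one umbrella along
                stock[there] += 1
            else:
                soaked += 1             # none left at this end
        here, there = there, here       # the next trip goes the other way
    return soaked, stock

soaked, stock = commute(umbrellas_per_location=100, trips=150, p_rain=0.3)
print(soaked, stock)
```

The number of umbrellas at home wanders up and down by one as the rain comes and goes, which is the random walk; the bigger the pile at each end, the longer it takes for the walk to hit zero and leave me standing in the rain.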