Wednesday, November 14, 2007

Improvements on the Wikipedia Script

(Edit: Blogspot is lame. If I set nowrap on an element, it cuts off the text at the preconfigured length.)

(Edit: Apparently Blogspot's preview of the code tag behaves differently from the actual post, so I had to insert the newlines manually.)

In my previous post I introduced a script that picks a start and end point for the Six Degrees of Wikipedia game. In retrospect, I shouldn't have worked so hard to keep it on one line when that cost so much readability. The following code is a little more DRY (Don't Repeat Yourself, from The Pragmatic Programmer) and, I hope, at least slightly more readable.

for ((i=0;i < 2;++i)); do
        echo "$(wget -nv 2>&1 | sed "s/^.* URL:\([^ ]*\) .* \"\([^\"]*\)\".*$/<a href='\\1'>\\2<\\/a>/")"
done

There are still a couple of outstanding issues, but they're fairly minor:

  • Escaped double quote in page title breaks regex.
  • Underscores in link text should be replaced with spaces.
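
The second issue looks easy enough to handle. Here is a rough sketch, not a tested replacement for the loop above: it pulls the URL and the title out of the log line separately, then swaps underscores for spaces in the title only, so the href stays intact. The helper name to_link and the sample log line are made up for illustration, and I'm assuming wget -nv keeps logging lines in the shape the regex above expects. It does nothing about the escaped double quote problem.

```shell
# to_link: hypothetical helper that reads one `wget -nv` log line on stdin
# and prints an HTML link whose text has spaces instead of underscores.
to_link() {
    # Read the whole line without trimming whitespace or eating backslashes.
    IFS= read -r line || return 1
    # Same captures as the one-line regex: the URL after "URL:" ...
    url=$(printf '%s\n' "$line" | sed -n 's/^.* URL:\([^ ]*\) .*$/\1/p')
    # ... and the quoted file name, with underscores turned into spaces.
    title=$(printf '%s\n' "$line" | sed -n 's/^.*"\([^"]*\)".*$/\1/p' | tr '_' ' ')
    # Give up quietly if the line did not look like a wget log line.
    [ -n "$url" ] || return 1
    printf "<a href='%s'>%s</a>\n" "$url" "$title"
}

# Made-up sample line in the shape that wget -nv writes to stderr.
printf '%s\n' '2007-11-14 12:00:00 URL:http://en.wikipedia.org/wiki/Alan_Turing [12345] -> "Alan_Turing" [1]' | to_link
# prints: <a href='http://en.wikipedia.org/wiki/Alan_Turing'>Alan Turing</a>
```

In the loop, this would take the place of the sed call: pipe the output of wget -nv 2>&1 into to_link. Splitting the two captures costs an extra sed process per link, but it keeps the underscore replacement away from the URL.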
