Thursday, February 17, 2011

Van Dale script

Picture taken from Nederlands Online
Now that I am back on Linux, I can freely play around with some bash scripting again.

For my communication in Dutch, I regularly consult the online dictionary of Van Dale to check the spelling.

To make the lookup less time-consuming, a perfectly simple bash script will do. Just mix some curl, add some grep for flavour and finish with w3m. But stay away from regular expressions!

update 2014: new version

Of course you first need to make sure that you have curl and w3m installed (both very small tools). If you use a Debian-based system, open a terminal and install them as follows:
sudo apt-get install curl w3m
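To see whether anything still needs installing, a small check along these lines could help. This is only a sketch: `missing_tools` is a made-up helper name, and it relies on nothing more than the shell's `command -v`.

```shell
#!/usr/bin/env bash
# missing_tools is a hypothetical helper (not part of the original script):
# it prints the name of every given command that cannot be found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Warn (without aborting) when curl or w3m is not installed yet.
missing="$(missing_tools curl w3m)"
if [ -n "$missing" ]; then
    echo "Please install first:" $missing >&2
fi
```

If nothing is printed, both tools are available.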
The script itself is fairly simple; it is just about extracting information from the HTML the correct way:
curl -s -b "a=b" "$1" \
| grep -A1000 'div id="results" style="clear:both;">' \
| grep -B1000 'Gebruik dit woordenboek nu ook via de ' \
| head -n -1 \
| w3m -dump -T text/html

First, curl is called with a useful useless cookie (a=b); the URL it fetches is the search query for the Van Dale website. The results of the query sit in a <div> tag with the id "results", so grep cuts the HTML there and keeps up to 1000 lines that come after (-A) that tag.
At the end of the results (also when there are no results), the site writes "Gebruik dit woordenboek blah", so the output can be cut again there, keeping only up to 1000 lines that come before (-B) that sentence.

Personally, I don't find the "Gebruik dit woordenboek blah" sentence very useful, so a simple head -n -1 seems in order, dropping that useless last line.
All this gets piped into w3m so that it gets stripped of all HTML-tags and is shown in a nice layout in the terminal.
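
To try the cutting steps without hitting the website, here is a minimal sketch that runs them on an invented page. The sample HTML and the names `extract_results` and `sample_page` are assumptions for illustration only; the real Van Dale markup may well look different. Note that `head -n -1` needs GNU head.

```shell
#!/usr/bin/env bash
# extract_results (hypothetical name) applies the same three cuts as the script:
# keep what follows the results <div>, keep what precedes the "Gebruik" line,
# then drop that final line itself.
extract_results() {
    grep -A1000 'div id="results"' \
    | grep -B1000 'Gebruik dit woordenboek' \
    | head -n -1
}

# Invented sample page standing in for the site's response.
sample_page() {
    cat <<'EOF'
<html><body>
<div id="header">menu</div>
<div id="results" style="clear:both;">
<p><b>fiets</b> (de; m) rijwiel</p>
</div>
<p>Gebruik dit woordenboek nu ook via de app</p>
<div id="footer">footer</div>
</body></html>
EOF
}

# Prints only the results <div>, without header, footer or the "Gebruik" line.
sample_page | extract_results
```

Appending `| w3m -dump -T text/html` to the last line renders the remaining HTML as plain text, just like in the script.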


PS: Actually it is silly to write this post in English; it is about a Dutch dictionary, for crying out loud :-)

What has been will be again,
what has been done will be done again;
there is nothing new under the sun.
--Ecclesiastes 1:9
Of course, a program with the same functionality has existed since 2001: gnuvd by Dirk-Jan C. Binnema.
