(Picture taken from Nederlands Online)
For my communication in Dutch, I regularly consult the online dictionary of Van Dale to check the spelling.
To make searching less time-consuming, a perfectly simple bash script will do. Just mix some curl, add some grep for flavour and finish with w3m. But stay away from regular expressions!
Update 2014: new version
Of course, you first need to make sure that you have curl and w3m installed (both are very small tools). If you use a Debian-based system, open a terminal and install them as follows:

```bash
sudo apt-get install curl w3m
```

The script itself is fairly simple; it is just about extracting information out of HTML the correct way:
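```bash
#!/bin/bash
# Look up the word given as the first argument in the Van Dale online dictionary.
curl -s -b "a=b" "http://www.vandale.nl/vandale/zoekService.do?selectedDictionary=nn&selectedDictionaryName=Nederlands&searchQuery=$1" |
  grep -A1000 'div id="results" style="clear:both;">' |  # keep the results tag and what follows
  grep -B1000 'Gebruik dit woordenboek nu ook via de ' |  # keep what precedes the footer sentence
  head -n -1 |                                            # drop the footer sentence itself
  w3m -dump -T text/html                                  # render the HTML as plain text
```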
curl is used with a useful useless cookie, and the long URL is the search query for the Van Dale website. The results of the query are placed in a <div> tag with the id "results", so that is where grep cuts the code. grep -A1000 keeps the matching line plus the 1000 lines after (A) it, which should cover the rest of the page.
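To see what grep -A does in this step, here is a minimal sketch that runs on a made-up snippet of HTML. The sample page is hypothetical and much shorter than the real Van Dale markup. The header lines before the results tag are dropped, and the tag plus everything after it is kept.

```bash
#!/bin/bash
# Hypothetical sample page: a header, the results div and some trailing markup.
page='<html><head><title>Van Dale</title></head>
<body>
<div id="results" style="clear:both;">
<p>first result</p>
</div>
</body></html>'

# Print the line containing the results tag plus up to 1000 lines after it
# (-A = trailing context); the header lines before the tag are dropped.
printf '%s\n' "$page" | grep -A1000 'div id="results" style="clear:both;">'
```

This prints the last four lines of the sample, starting with the results tag.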
At the end of the results (even when there are no results), the site writes "Gebruik dit woordenboek blah", so we can cut there again. grep -B1000 keeps only what comes before (B) that sentence, plus the sentence itself.
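The same idea works in the other direction. In this sketch the input is again a hypothetical fragment, and the text after "via de " is a placeholder rather than the site's real wording. grep -B keeps the footer line and everything before it, and drops whatever follows.

```bash
#!/bin/bash
# Hypothetical end of a results page: result lines, the footer sentence, more clutter.
rest='<p>first result</p>
<p>second result</p>
<p>Gebruik dit woordenboek nu ook via de (rest of the sentence)</p>
<div id="footer">other page clutter</div>'

# Print the line with the footer sentence plus up to 1000 lines before it
# (-B = leading context); everything after the sentence is dropped.
printf '%s\n' "$rest" | grep -B1000 'Gebruik dit woordenboek nu ook via de '
```

Only the first three lines survive, and the footer sentence is the last of them. The next step removes that sentence.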
Personally, I don't find the "Gebruik dit woordenboek blah" sentence very useful, so a simple head -n -1 seems in order to drop that last line.
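A negative count like head -n -1 ("print all but the last line") is a GNU coreutils extension. It works on Debian, but BSD head (for example on macOS) rejects it. This sketch runs both the GNU form and sed '$d', which deletes the last line and is a portable alternative, on a hypothetical fragment.

```bash
#!/bin/bash
# What the grep -B step hands over (hypothetical sample): the footer line comes last.
piece='<p>first result</p>
<p>second result</p>
<p>Gebruik dit woordenboek nu ook via de (rest of the sentence)</p>'

# GNU head: a negative count prints everything except the last N lines.
printf '%s\n' "$piece" | head -n -1

# Portable alternative for systems whose head lacks negative counts (e.g. BSD/macOS).
printf '%s\n' "$piece" | sed '$d'
```

Both commands print the two result lines without the footer sentence.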
All this gets piped into w3m, which renders the HTML as plain text: the tags are gone and the result is shown in a nice layout in the terminal.
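If you want something a bit more defensive, the whole pipeline can be wrapped in a bash function. The function name vandale is hypothetical, and so are two additions: the argument check, and letting curl URL-encode the search term (-G appends the --data-urlencode pair to the URL as a query parameter, so words with spaces or accents survive). The URL, cookie and cut markers are copied unchanged from the script above and may no longer match the current Van Dale site.

```bash
#!/bin/bash
# Hypothetical wrapper around the Van Dale lookup pipeline.
vandale() {
  if [ "$#" -ne 1 ]; then
    echo "usage: vandale WORD" >&2
    return 2
  fi
  # -G sends the --data-urlencode pair as a URL-encoded query parameter (GET).
  curl -s -G -b "a=b" \
    --data-urlencode "searchQuery=$1" \
    "http://www.vandale.nl/vandale/zoekService.do?selectedDictionary=nn&selectedDictionaryName=Nederlands" |
    grep -A1000 'div id="results" style="clear:both;">' |  # cut after the results tag
    grep -B1000 'Gebruik dit woordenboek nu ook via de ' |  # cut before the footer sentence
    head -n -1 |                                            # drop the footer line itself
    w3m -dump -T text/html                                  # render the HTML as plain text
}

# Usage (after sourcing this file or putting it in ~/.bashrc): vandale fiets
```

Calling it without exactly one word prints the usage line and returns 2 without touching the network.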
- original post (NedLinux)
- usage of w3m (Matt Wynne)
- usage of head and tail (StackOverflow)
- how not to parse HTML (CodingHorror)
PS: Actually, it is silly to write this post in English; it is about a Dutch dictionary, for crying out loud :-)
Of course, a program with the same functionality has existed since 2001: gnuvd by Dirk-Jan C. Binnema.

What has been will be again,
what has been done will be done again;
there is nothing new under the sun. (Ecclesiastes 1:9)