Picture taken from Nederlands Online
For my communication in Dutch, I regularly consult the online dictionary of Van Dale to check the spelling.
To make the lookup less time-consuming, a perfectly simple bash script will do: just mix some curl, add some grep for flavour and finish with w3m. But stay away from regular expressions!
update 2014: new version
Of course you first need to make sure that you have curl and w3m installed (both very small tools). If you use a Debian-based system, open a terminal and install them as follows:
sudo apt-get install curl w3m
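If you are not sure whether the tools are already there, a quick check before installing could look like the sketch below. The helper name missing_tools is made up for this example; it relies only on the standard command -v builtin.

```shell
# missing_tools: hypothetical helper (not part of the original script).
# Prints the name of every given command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
    return 0
}

missing=$(missing_tools curl w3m)
if [ -n "$missing" ]; then
    # Unquoted on purpose so that several names end up on one line.
    echo "Not installed yet:" $missing
else
    echo "curl and w3m are both available"
fi
```

Anything it reports as missing can then be installed with the apt-get line above.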
The script itself is fairly simple; it is just about extracting information out of HTML the correct way:
curl -s -b "a=b" "http://www.vandale.nl/vandale/zoekService.do?selectedDictionary=nn&selectedDictionaryName=Nederlands&searchQuery=$1" \
| grep -A1000 'div id="results" style="clear:both;">' \
| grep -B1000 'Gebruik dit woordenboek nu ook via de ' \
| head -n -1 \
| w3m -dump -T text/html
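The script puts $1 into the URL as-is, which may go wrong for words with spaces or accented letters. One possible variation, sketched here as a shell function, lets curl do the URL encoding itself through its -G and --data-urlencode options, and refuses to run without exactly one word. The function name vd is an assumption of this example, and the endpoint and parameters are simply the ones from the script above; whether the site still answers on that address is not something this sketch can promise.

```shell
# vd: hypothetical wrapper name, not from the original script.
vd() {
    if [ "$#" -ne 1 ]; then
        echo "usage: vd WORD" >&2
        return 2
    fi
    # -G makes curl send the --data-urlencode fields as a URL-encoded
    # query string in a GET request instead of a POST body.
    curl -s -b "a=b" -G "http://www.vandale.nl/vandale/zoekService.do" \
        --data-urlencode "selectedDictionary=nn" \
        --data-urlencode "selectedDictionaryName=Nederlands" \
        --data-urlencode "searchQuery=$1" \
    | grep -A1000 'div id="results" style="clear:both;">' \
    | grep -B1000 'Gebruik dit woordenboek nu ook via de ' \
    | head -n -1 \
    | w3m -dump -T text/html
}
```

After sourcing this in your shell, a lookup would be something like: vd fiets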
In the script, curl is called first, with a useful useless cookie (a=b); the long URL is the search query for the Van Dale website. The results of the query are wrapped in a <div> tag with the id "results", so grep is used there to cut the code, keeping that tag and everything that comes after it (-A, up to 1000 lines).
At the end of the results (also if there are no results), the site writes "Gebruik dit woordenboek blah", so the output can be cut again there, keeping only that sentence and what comes before it (-B).
Personally, I don't find the "Gebruik dit woordenboek blah" sentence very useful, so an easy head -n -1 seems in order to drop that last line.
All this gets piped into w3m, so that it gets stripped of all HTML tags and is shown in a nice layout in the terminal.
Links:
- original post (NedLinux)
- usage of w3m (Matt Wynne)
- usage of head and tail (StackOverflow)
- how not to parse HTML (CodingHorror)
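The cutting steps described above can be tried without touching the network. The sketch below feeds a made-up page through the same two grep calls and the head call; the HTML in sample_page is invented for illustration and is not the real Van Dale markup, and the function names are assumptions of this example. Note that head -n -1 (all but the last line) is a GNU extension.

```shell
# sample_page: hypothetical stand-in for the HTML returned by the site.
sample_page() {
cat <<'EOF'
<html><body>
<div id="menu">navigation</div>
<div id="results" style="clear:both;">
<b>fiets</b> de; m
</div>
<p>Gebruik dit woordenboek nu ook via de app</p>
<div id="footer">footer</div>
</body></html>
EOF
}

# cut_results: the three cutting stages from the script, reading stdin.
cut_results() {
    # Keep the results tag and up to 1000 lines after it ...
    grep -A1000 'div id="results" style="clear:both;">' \
    | grep -B1000 'Gebruik dit woordenboek nu ook via de ' \
    | head -n -1
    # ... then keep the closing sentence and up to 1000 lines before it,
    # and finally drop that closing sentence itself.
}

sample_page | cut_results
```

Only the results block, from its opening <div> to its closing </div>, is left over; piping that into w3m -dump -T text/html would turn it into plain text.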
PS: Actually, it is silly to write this post in English; it is about a Dutch dictionary, for crying out loud :-)
Edit:
Of course a program with the same functionality has existed since 2001: gnuvd by Dirk-Jan C. Binnema.
What has been will be again,
what has been done will be done again;
there is nothing new under the sun.
--Ecclesiastes 1:9