Posted on 2007-11-26
I am working on something that needs a list of words, without regard to American vs. British spelling, accents, or anything else: it just has to contain as many words as possible. There are a whole bunch of aspell dictionaries available. First expand the files:
preunzip *.cwl
Then merge into a single list, eliminating duplicates:
sort --unique --ignore-case *.wl >list.txt
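To see what that merge does with case variants, here is a small sketch on two made-up files. The names a.wl and b.wl and the scratch directory are only for illustration, not real aspell data. It assumes GNU sort, since the long options are GNU-specific, and LC_ALL=C is my own addition to keep the ordering predictable.

```shell
# Scratch directory so the sample files do not clutter anything.
dir=$(mktemp -d)

# Two tiny sample word lists (hypothetical, not real aspell data).
printf 'colour\nApple\n' >"$dir/a.wl"
printf 'apple\ncolor\ncolour\n' >"$dir/b.wl"

# Same merge as above; LC_ALL=C pins the collation order.
LC_ALL=C sort --unique --ignore-case "$dir"/*.wl >"$dir/list.txt"

cat "$dir/list.txt"
```

"Apple" and "apple" collapse into a single entry, and the duplicate "colour" is dropped, so the sample list ends up with three lines. "color" and "colour" are different words as far as sort is concerned, so both stay.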
In addition, I want everything to be UTF-8:
iconv -f ISO8859-1 -t UTF-8 list.txt >ulist.txt
Pretty simple. The merged English word list has 137,883 words.
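A quick way to double-check the conversion and get a count is sketched below. The one-word sample is made up (byte \351 is "é" in ISO 8859-1), and the file names just mirror the ones above inside a scratch directory.

```shell
# Scratch directory for the sample files.
dir=$(mktemp -d)

# One-word sample list: "café" encoded as ISO 8859-1 (é = octal 351).
printf 'caf\351\n' >"$dir/list.txt"

# Same conversion as above.
iconv -f ISO8859-1 -t UTF-8 "$dir/list.txt" >"$dir/ulist.txt"

# iconv rejects invalid input, so a UTF-8 to UTF-8 pass doubles as a validity check.
iconv -f UTF-8 -t UTF-8 "$dir/ulist.txt" >/dev/null && echo "valid UTF-8"

# One word per line, so the line count is the word count.
wc -l <"$dir/ulist.txt"
```

On the real list, the same wc -l on ulist.txt is presumably how one would arrive at a figure like the one above.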
Tags: aspell iconv sort wordlist