blob: 216093fa836ae6b0af2a65ac5c7bfc2ca827d9c0 [file] [log] [blame]
Name: hunspell dictionaries
URL: http://bgoffice.sourceforge.net/
http://code.google.com/p/tr-spell/
http://extensions.services.openoffice.org/dictionary
http://extensions.services.openoffice.org/en/project/dict_et
http://extensions.services.openoffice.org/en/project/dict_lv_LV
http://extensions.services.openoffice.org/en/project/dict-nl
http://extensions.services.openoffice.org/project/dict_ru_RU
http://extensions.services.openoffice.org/project/pl-dict
http://ftp.linux.ee/pub/openoffice/contrib/dictionaries/et_EE.zip
http://ftp.services.openoffice.org/pub/OpenOffice.org/contrib/dictionaries/de_DE_neu.zip
http://ispell-uk.sourceforge.net/
http://openoffice.rs/dict-sr/
https://addons.mozilla.org/en-US/firefox/addon/albanian-dictionary/
https://addons.mozilla.org/en-US/firefox/addon/thamizha-solthiruthi/
https://addons.mozilla.org/en-US/firefox/addon/turkce-imla-denetimi/
https://dsso.googlecode.com/files/sv-2.12.zip
http://sourceforge.net/projects/magyarispell/
https://spellcheck-ko.googlecode.com/files/ko-aff-dic-0.5.6.zip
http://tajlingvo.tj/project/TajikHunSpellDictionary/tg_TG.zip
https://tr-spell.googlecode.com/files/firefox-tr-dict-v0.3.2.xpi
http://wiki.services.openoffice.org/wiki/Dictionaries
http://wordlist.sourceforge.net/
http://www.broffice.org/
http://www.dicollecte.org/
http://www.justlocal.com.au/
http://lilak-project.com/
http://techiaith.cymru/hunspell/cy_GB/
https://github.com/hyspell/HySpell_3.0.1/tree/master/Dictionaries/Dictr
https://github.com/msmiljan/srdict_chromium
License: MPL 1.1/GPL 2.0/LGPL 2.1/LPGL 3.0
Security Critical: no
Version: unknown
This folder contains the following:
.dic files
.aff files
.bdic files
.dic_delta files
README_<language>_<region>.txt files
The .bdic files are binary files, generated from the corresponding .dic and .aff
files, using convert_dict (chrome\tools\convert_dict). These binary files are
used by the spellchecker. The .dic_delta files are used to add words which are
not there in the .dic files. Irrespective of the encoding of the corresponding
.dic file, the .dic_delta files are encoded as UTF-8.
The final binary file, .bdic, is generated with words from the .dic and
additional words from the .dic_delta file. In order to get the current-most bdic
file versions, it is a good idea to rebuild them using convert_dict from the
.dic, .aff and .dic_delta files in this folder. convert_dict takes care of
duplicate entries present both in .dic and .dic_delta files.
The README_<language>_<region>.txt files contain information about the
individual dictionaries, including copyright information.
The .bdic files are versioned to force clients to download new versions when
necessary. Use the same version for all the dictionaries that you add at the
same time. Increment the major version number (5) if you're updating either dic
or aff files. Increment the minor version number (0) if you're updating only
dic_delta files. If you add or update dictionaries, make sure to update the
constants in chrome\common\spellcheck_common.cc.
Note: the encodings for these files are usually only UTF-8 for English
dictionaries. Otherwise, they could be anything. This will lead to
errors when trying to upload so in general, you'll just have to upload
your change and cross your fingers.
The following 39 dictionaries have been appended with or without additional
words using the .dic_delta files (as of December 14th 2012), and are covered
under the existing GPL/LGPL/MPL tri-license in COPYING:
af_ZA: No changes
bg_BG: No changes
ca_ES: Added words
Added NOSUGGEST flag = ! to .aff file
Added two words to .dic file with the ! flag to mark them forbidden/nosuggest.
cs_CZ: Added words
cy_GB: No changes
da_DK: Added words
Changed "øvrigt/mk" to "øvrigt/" in da_DK.dic to make convert_dict work.
de_DE_neu: Added words
Removed "Information" from dic_delta in favor of "Information/P" to fix
the spelling of "Informationen"
en_AU: Added the same words as en_CA.
en_CA: Added words
en_GB: Added the same words as en_CA.
en_US: Added the same words as en_CA.
es_ES: Added words
et_EE: No changes
fa_IR: Removed IGNORE from affix file, because we don't support it.
fo_FO: No changes
fr_FR: Added words to 4.8 (modern) downloaded from
http://www.dicollecte.org/download.php?prj=fr
he_IL: Added words
hi_IN: Added words
hr_HR: Added words
hu-HU: Added words
hy: Removed a newline on line 6002 (word and affixes were split on 2 lines,
probably by mistake, and that broke our convert_dict)
id_ID: Added words
it_IT: Added words
Added NOSUGGEST flag = % to .aff file
Added three words to .dic file with the ! flag to mark them forbidden/nosuggest.
ko: Removed a line from dic that was longer than 128 characters
lt_LT: Added words.
Added NOSUGGEST flag = ! to .aff file
Added some words to .dic file with the ! flag to mark them forbidden/nosuggest.
lv_LV: Added words
nb_NO: Added words
nl_NL: Added words
pl_PL: Added words
pt_BR: Added words
pt_PT: Added words
ro_RO: Added words
sh: Replaced with new rule based dictionary
sk_SK: Added words
sl_SI: Added words
Added NOSUGGEST flag = ! to .aff file
Added five words to .dic file with the ! flag to mark them forbidden/nosuggest.
Changed "Ÿvrklji/N" to "Ÿvrklji/" in sl_SI.dic to make convert_dict work.
sq: No changes
sr: Replaced with new rule based dictionary
sv_SE: Added words
Changed "ω/r" to "ω/" in sv_SE.dic to make convert_dict work.
ta_IN: Removed the word "சர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர்ர" to make convert_dict work.
tr_TR: No changes
uk_UA: No changes
vi_VN: Added words
The following dictionary has a BSD-like license in README_ru_RU.txt:
ru_RU: Added words
The following dictionary has an Apache license in COPYING.Apache:
tg_TG: No changes
On Jan 20, 2011, en_US.dic_delta was copied to en_AU.dic_delta and
en_GB.dic_delta so these locales would get the same additional words.
On Dec 26, 2012, we reran convert_dict on all dictionaries to add MD5 checksum
to the .bdic files, added sq, ta, and ko dictionaries, and bumped the versions
up to 3-0.
On Jan 8, 2013, we added back et_EE and tr_TR .dic and .aff files to add MD5
checksums to their .bdic files. We used version 3-0 again, because these files
were not updated with the rest of the dictionaries on Dec 26, 2012.
On Jan 9, 2013, we changed the download location of tr_TR from a Firefox website
to a Google code website. The only difference between the two locations is that
the tr_TR affix file from the Google code website includes a "FLAG num"
directive, which prevents Chrome's heapcheck tests from crashing. Given that aff
file changed, we used version 4-0.
On Mar 4, 2014, we added tg_TG dictionaries for Tajik language. We used 5-0
version to denote a new batch of dictionaries.
On Oct 28, 2014, we updated the en dictionaries to the latest upstream version.
We also added a bunch of words from bugs that were being flagged as incorrect.
Given that the aff & dic files changed, we used version 4-0.
On Feb 26, 2015, we updated the en-US and en-CA dictionaries to the latest
upstream version from SCOWL, but made sure that "alot -> a lot" correction does
not disappear, as it is not in SCOWL yet. The updated dictionaries support
typographical apostrophes. Because both aff and dic files changed, we
incremented major version to 6-0. That's 1 higher than the highest major version
of any dictionary.
On Mar 10, 2015, we reran convert_dict on en-US and en-CA dictionaries to fix
the missing REP rules.
On Mar 18, 2016, we added "Lilak" dictionary for Persian Language (fa-IR) and
updated en-US, en-CA and en-GB dictionaries from SCOWL. Added several words to
en*.dic_delta files. Because both aff and dic files changed, we incremented
major version to 7-0. That's 1 higher than the highest major version of any
dictionary.
On Oct 13, 2017, we updated the en dictionaries to the latest upstream version.
Since SCOWL now has an AU dictionary, we changed to using that to generate our
AU dictionary instead of copying the CA version. Because dic files changed, we
incremented the major version to 8-0.
On Aug 20, 2019, we replaced existing no aff rules based Serbian dictionaries
with new aff rule based dictionaries (1800+ rules). That applies to both sr-*
and sh-* dictionaries (Cyrillic and Latin alphabets respectively). New
dictionaries are not continuation of old ones but keep the same file names,
so we just incremented major version from 3-0 to 4-0. Original unaltered new
dictionaries are located at: https://github.com/msmiljan/srdict_chromium
On Jan 8, 2020, we updated the en and fa dictionaries to the latest upstream
versions. Because dic files changed, we incremented the major version to 9-0.