Improve performance of HashMgr::InitHashEntry
Hunspell's HashMgr::InitHashEntry is called for every entry in the
dictionary. Each time it is called, it initializes a FileMgr object.
That FileMgr object is only used if there is a parser error. Should
there be an error, it reports the line number. However, because the
line number doesn't help identify the place which caused the parser
error in a BDICT file, FileMgr::getlinenum() always returns 0. Since
we don't actually need a unique FileMgr object in this scenario, make
it static. This improves performance in instances where the dictionary
is quite large (e.g. German).
Bug: 1432579
Change-Id: I05881b67df13430ca7613caef70d135f1fe63a71
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4416960
Reviewed-by: Nico Weber <thakis@chromium.org>
Commit-Queue: Joanmarie Diggs <jdiggs@igalia.com>
Cr-Commit-Position: refs/heads/main@{#1133402}
diff --git a/third_party/hunspell/README.chromium b/third_party/hunspell/README.chromium
index 625c1d4..6b0cf65 100644
--- a/third_party/hunspell/README.chromium
+++ b/third_party/hunspell/README.chromium
@@ -24,6 +24,7 @@
* calloc buffers in SuggestMgr::lcs to avoid reads from uninintialized buffers.
* Fix string OOB write in reverse_condition in src/hunspell/affixmgr.cxx
Upstream issue: https://github.com/hunspell/hunspell/issues/714
+* Improve performance of HashMgr::InitHashEntry when BDICT file is used.
Chromium-specific changes are in google.patch. To update the patch, follow these
steps, or simply run update_google_patch.sh from the commandline.
diff --git a/third_party/hunspell/google.patch b/third_party/hunspell/google.patch
index 90a37f07..4cba893 100644
--- a/third_party/hunspell/google.patch
+++ b/third_party/hunspell/google.patch
@@ -605,7 +605,7 @@
if (!afflst) {
HUNSPELL_WARNING(
stderr, "Error - could not open affix description file %s\n", affpath);
-@@ -1062,6 +1188,122 @@ bool HashMgr::parse_aliasf(const std::string& line, FileMgr* af) {
+@@ -1062,6 +1188,130 @@ bool HashMgr::parse_aliasf(const std::string& line, FileMgr* af) {
return true;
}
@@ -645,9 +645,17 @@
+ // Initialize a hentry struct with the given parameters, and
+ // append the given string at the end of this hentry struct.
+ memset(entry, 0, item_size);
-+ FileMgr af(NULL);
-+ entry->alen = static_cast<short>(
-+ const_cast<HashMgr*>(this)->get_aliasf(affix_index, &entry->astr, &af));
++
++ // `get_aliasf` only uses the dictionary file in the case of an error.
++ // Should that occur, `FileMgr::getlinenum` is called. But for a BDICT file,
++ // the line number doesn't help identify the place which caused the parser
++ // error. As a result, `getlinenum` always returns 0. Since we don't actually
++ // need the dictionary file, and since `InitHashEntry` is called for every
++ // dictionary entry, eliminate the overhead of creating the `FileMgr` object
++ // every time.
++ static FileMgr dummyDictionaryFile(NULL);
++ entry->alen = static_cast<short>(const_cast<HashMgr*>(this)->get_aliasf(
++ affix_index, &entry->astr, &dummyDictionaryFile));
+ entry->blen = static_cast<unsigned char>(word_length);
+ memcpy(&entry->word, word, word_length);
+
diff --git a/third_party/hunspell/src/hunspell/hashmgr.cxx b/third_party/hunspell/src/hunspell/hashmgr.cxx
index 770fac1..a7bcbb7 100644
--- a/third_party/hunspell/src/hunspell/hashmgr.cxx
+++ b/third_party/hunspell/src/hunspell/hashmgr.cxx
@@ -1224,9 +1224,17 @@
// Initialize a hentry struct with the given parameters, and
// append the given string at the end of this hentry struct.
memset(entry, 0, item_size);
- FileMgr af(NULL);
- entry->alen = static_cast<short>(
- const_cast<HashMgr*>(this)->get_aliasf(affix_index, &entry->astr, &af));
+
+ // `get_aliasf` only uses the dictionary file in the case of an error.
+ // Should that occur, `FileMgr::getlinenum` is called. But for a BDICT file,
+ // the line number doesn't help identify the place which caused the parser
+ // error. As a result, `getlinenum` always returns 0. Since we don't actually
+ // need the dictionary file, and since `InitHashEntry` is called for every
+ // dictionary entry, eliminate the overhead of creating the `FileMgr` object
+ // every time.
+ static FileMgr dummyDictionaryFile(NULL);
+ entry->alen = static_cast<short>(const_cast<HashMgr*>(this)->get_aliasf(
+ affix_index, &entry->astr, &dummyDictionaryFile));
entry->blen = static_cast<unsigned char>(word_length);
memcpy(&entry->word, word, word_length);