[omnibox] [bookmark-paths] Add path index.

**Background**

`TitledUrlIndex` has an index mapping terms in bookmark titles and URLs
to the set of bookmarks containing those terms. This makes it quick to
find bookmarks whose title or URL matches a term.

The omnibox bookmark paths feature began matching bookmark paths as
well. E.g., the input 'myFolder google' can suggest a google bookmark in
a folder named 'myFolder'.

Before the bookmark paths feature, nodes returned had to title or URL
match every term in the input. We used the index to look up title or URL
matches per term, and intersected those matches across terms.

After the bookmark paths feature, nodes have to title, URL, or path
match every term in the input. We can no longer exclude nodes that don't
title or URL match a term since they may path match instead. Since we
don't index paths, we had to give the benefit of the doubt, and union
the matches per term. We'd later iterate through them and make sure
all input terms are contained in their title, URL, or path.
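
The difference between the two strategies can be sketched as follows. This is
only an illustration, not the code in `TitledUrlIndex`: the `NodeSet` and
`TermIndex` aliases, the integer node IDs, and both function names are
hypothetical, and exact term lookup stands in for whatever matching the real
index performs.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: nodes are ints, and the index maps a term to the
// set of nodes whose title or URL contains that term.
using NodeSet = std::set<int>;
using TermIndex = std::map<std::string, NodeSet>;

// Before bookmark paths: keep only nodes that title or URL match every term.
NodeSet IntersectMatches(const TermIndex& index,
                         const std::vector<std::string>& terms) {
  NodeSet result;
  bool first = true;
  for (const auto& term : terms) {
    auto it = index.find(term);
    if (it == index.end())
      return {};  // No node matches this term, so nothing matches all terms.
    if (first) {
      result = it->second;
      first = false;
      continue;
    }
    NodeSet kept;
    for (int node : result) {
      if (it->second.count(node))
        kept.insert(node);
    }
    result = std::move(kept);
  }
  return result;
}

// After bookmark paths, without a path index: keep nodes that title or URL
// match any term, since the remaining terms might match the path instead.
NodeSet UnionMatches(const TermIndex& index,
                     const std::vector<std::string>& terms) {
  NodeSet result;
  for (const auto& term : terms) {
    auto it = index.find(term);
    if (it != index.end())
      result.insert(it->second.begin(), it->second.end());
  }
  return result;
}
```

With a common term such as 'www' in the input, the union contains every node
indexed under 'www', while the intersection is bounded by the rarest term.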

Unioning the per term matches resulted in many more nodes than
intersecting. E.g., inputs containing 'www' or 'bookmark bar' would
return every bookmark (or up to a threshold). Processing this many more
(about 20x) nodes is expensive. We tried various optimizations, but none
were sufficient to reduce latency by 20x.

**This CL**

This CL indexes paths. This allows us to check if any of the input terms
are not contained in any paths and, if so, use intersection instead of
union. This guarantees the same results as before, because if an input
term wasn't contained in any paths, all returned nodes had to contain
that term in their titles or URLs. E.g., if the input is 'x y z', and
there are no folders containing 'x', we can simply return the nodes
title or URL matching 'x', rather than the union of nodes title or URL
matching any of 'x', 'y', or 'z'.
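
One way to picture the check, as a sketch under assumptions rather than the
CL's actual code: the hypothetical helper `TermsNotInAnyPath` below takes the
set of terms that occur in some folder name and returns the input terms that
occur in none. Exact term comparison is a simplification. If the result is
non-empty, candidates can be limited to nodes that title or URL match those
terms; if it is empty, every term might path match and the union is still
needed.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper. `path_terms` holds every term found in any folder
// name; the return value lists input terms that no path can match, in input
// order. Such terms must be matched by title or URL.
std::vector<std::string> TermsNotInAnyPath(
    const std::set<std::string>& path_terms,
    const std::vector<std::string>& input_terms) {
  std::vector<std::string> result;
  for (const auto& term : input_terms) {
    if (path_terms.count(term) == 0)
      result.push_back(term);
  }
  return result;
}
```

For the input 'x y z' with folders containing only 'y' and 'z', the helper
returns just 'x', which corresponds to the example above.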

This doesn't require a full index (i.e. path term -> bookmarks) like we
have with titles and URLs; we just need a small index (path term -> # of
occurrences in paths). This is sufficient to check if each input term is
in any path. We track the # of occurrences so we can remove indexed
terms when folders are renamed or deleted. Unlike a full path index,
this index does not need to be updated when bookmarks or folders are
moved; it's only updated on folder creation, deletion, and rename.
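
A minimal sketch of such a counting index, assuming terms are plain strings.
The class name `PathTermCounts` and its methods are hypothetical and are not
the names used in the CL.

```cpp
#include <map>
#include <string>

// Hypothetical sketch of a path term -> occurrence count index.
class PathTermCounts {
 public:
  // Called once per term when a folder is created or renamed to a new title.
  void AddTerm(const std::string& term) { ++counts_[term]; }

  // Called once per term when a folder is deleted or its old title is
  // replaced. The term is dropped when its last occurrence goes away.
  void RemoveTerm(const std::string& term) {
    auto it = counts_.find(term);
    if (it == counts_.end())
      return;  // Unknown term; nothing to remove.
    if (--it->second == 0)
      counts_.erase(it);
  }

  // True if at least one folder name still contains `term`.
  bool Contains(const std::string& term) const {
    return counts_.count(term) > 0;
  }

 private:
  std::map<std::string, int> counts_;
};
```

Because only counts are stored, moving a bookmark or folder changes no entry:
the set of folder names, and therefore the set of path terms, is unchanged.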

This is guarded by the `BookmarkIndexPaths` feature.

Adds metrics:
- Bookmarks.Memory.IndexMemoryAtStartup
    Will verify the index size isn't expensive.
- Bookmarks.UpdateTitledUrlIndex.[Add|Remove]
    Will measure the latency of updating the existing title and URL
    index.
- Bookmarks.UpdateTitledUrlIndex.[Add|Remove]Path
    Will measure the latency of updating the new path index.

Bug: 1129524, 1252537, 1143217
Change-Id: Ia743d284a5eb5d53cf0b9361f2eb6c1f1a21444c
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3932363
Reviewed-by: Scott Violet <sky@chromium.org>
Commit-Queue: manuk hovanesian <manukh@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1056026}
11 files changed