[Merge 103] [journeys] Make SimilarVisitDeduperClusterFinalizer use url_for_deduping

In a previous patch, we made SimilarVisitDeduperClusterFinalizer use
url_for_display instead of url_for_deduping:
https://chromium-review.googlesource.com/c/chromium/src/+/3646808

That was not the way to go.

Instead, this CL restores the use of url_for_deduping and makes
url_for_deduping more aggressive, so that it ALSO strips away the URL
query. URLs that differ only by the query part may then also be
deduped by SimilarVisitDeduperClusterFinalizer, assuming that the page
title differs.

I added more commentary too, to explain that so long as
url_for_deduping is strictly more aggressive than url_for_display, we
should not display any identical rows in the UI.
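
The invariant above can be illustrated with a minimal, standalone sketch.
It is not the Chromium implementation: the function names are
hypothetical, plain std::string stands in for GURL, page titles are
ignored, and the exact normalizations (display drops only the fragment,
deduping drops the fragment and the query) are assumptions chosen so that
the dedupe key is strictly coarser than the display URL.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for url_for_display: drops only the "#fragment".
std::string SketchUrlForDisplay(const std::string& url) {
  return url.substr(0, url.find('#'));
}

// Hypothetical stand-in for url_for_deduping: drops the "#fragment" AND the
// "?query", so it collapses every URL that SketchUrlForDisplay collapses.
std::string SketchUrlForDeduping(const std::string& url) {
  return url.substr(0, url.find_first_of("?#"));
}

// Keeps the first visit seen for each dedupe key and drops the rest.
std::vector<std::string> DedupeVisits(const std::vector<std::string>& urls) {
  std::unordered_set<std::string> seen_keys;
  std::vector<std::string> kept;
  for (const std::string& url : urls) {
    if (seen_keys.insert(SketchUrlForDeduping(url)).second) {
      kept.push_back(url);
    }
  }
  return kept;
}

// DCHECK-style sanity check: no two surviving rows share a display URL.
void AssertNoDuplicateDisplayRows(const std::vector<std::string>& kept) {
  std::unordered_set<std::string> shown;
  for (const std::string& url : kept) {
    const bool inserted = shown.insert(SketchUrlForDisplay(url)).second;
    assert(inserted);
    (void)inserted;  // Avoids an unused-variable warning when NDEBUG is set.
  }
}
```

Because the dedupe key removes everything the display form removes and
more, two visits that would render as the same row always share a dedupe
key, so at most one of them survives DedupeVisits.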

(cherry picked from commit d60f7bf318fd7b67f6ebd99175b4726cf8cb9838)

Bug: 1325154
Change-Id: I105f9e74e98924deddf3c2481517962cd12a6b15
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3645237
Reviewed-by: Sophie Chang <sophiechang@chromium.org>
Commit-Queue: Tommy Li <tommycli@chromium.org>
Cr-Original-Commit-Position: refs/heads/main@{#1003896}
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3652251
Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
Commit-Queue: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
Reviewed-by: Tommy Li <tommycli@chromium.org>
Auto-Submit: Tommy Li <tommycli@chromium.org>
Cr-Commit-Position: refs/branch-heads/5060@{#64}
Cr-Branched-From: b83393d0f4038aeaf67f970a024d8101df7348d1-refs/heads/main@{#1002911}
3 files changed
tree: b158f1bf0eb9a566774e9e5991bef19899869018
  1. android_webview/
  2. apps/
  3. ash/
  4. base/
  5. build/
  6. build_overrides/
  7. buildtools/
  8. cc/
  9. chrome/
  10. chromecast/
  11. chromeos/
  12. codelabs/
  13. components/
  14. content/
  15. courgette/
  16. crypto/
  17. dbus/
  18. device/
  19. docs/
  20. extensions/
  21. fuchsia/
  22. fuchsia_webengine/
  23. gin/
  24. google_apis/
  25. google_update/
  26. gpu/
  27. headless/
  28. infra/
  29. ios/
  30. ipc/
  31. media/
  32. mojo/
  33. native_client_sdk/
  34. net/
  35. pdf/
  36. ppapi/
  37. printing/
  38. remoting/
  39. rlz/
  40. sandbox/
  41. services/
  42. skia/
  43. sql/
  44. storage/
  45. styleguide/
  46. testing/
  47. third_party/
  48. tools/
  49. ui/
  50. url/
  51. weblayer/
  52. .clang-format
  53. .clang-tidy
  54. .eslintrc.js
  55. .git-blame-ignore-revs
  56. .gitattributes
  57. .gitignore
  58. .gn
  59. .mailmap
  60. .rustfmt.toml
  61. .vpython
  62. .vpython3
  63. .yapfignore
  64. AUTHORS
  65. BUILD.gn
  66. CODE_OF_CONDUCT.md
  67. codereview.settings
  68. DEPS
  69. DIR_METADATA
  70. ENG_REVIEW_OWNERS
  71. LICENSE
  72. LICENSE.chromium_os
  73. OWNERS
  74. PRESUBMIT.py
  75. PRESUBMIT_test.py
  76. PRESUBMIT_test_mocks.py
  77. README.md
  78. WATCHLISTS
README.md

Chromium

Chromium is an open-source browser project that aims to build a safer, faster, and more stable way for all users to experience the web.

The project's web site is https://www.chromium.org.

To check out the source code locally, don't use git clone! Instead, follow the instructions on how to get the code.

Documentation in the source is rooted in docs/README.md.

Learn how to Get Around the Chromium Source Code Directory Structure.

For historical reasons, there are some small top-level directories. The current guidance is that new top-level directories are for products (e.g. Chrome, Android WebView, Ash). Even if a product has multiple executables, its code should live in subdirectories of that product's directory.

If you found a bug, please file it at https://crbug.com/new.