| commit | a551f89f60b6372833ff3c139223bb6341b6f256 | |
|---|---|---|
| author | Wenson Hsieh <wenson_hsieh@apple.com> | Thu Dec 25 01:37:44 2025 |
| committer | Wenson Hsieh <wenson_hsieh@apple.com> | Thu Dec 25 01:37:44 2025 |
| tree | 084f53b940ace6fd21b768b4097e74f4f2610aa9 | |
| parent | 6f14c548f23ccb602a8d859eb021c0864ae36718 | |
[AutoFill Debugging] Part 1/2: Add an option to heuristically shorten/redact high-entropy URLs in extracted text
https://bugs.webkit.org/show_bug.cgi?id=304653
rdar://165847831

Reviewed by Richard Robinson.

Add support for a flag, `-shortenURLs`, that clients can use to opt into aggressive policy around shortening link `href` and image `src` when performing text extraction. For links, we discard all query parameters and fragments, and any path components that are not "low-entropy" (based on the results of a fast, very lightweight binary classifier; see below). For images, we use the last path component only if it's "low-entropy", and otherwise fall back to "image" (preserving any existing file extension).

Test: fast/text-extraction/debug-text-extraction-shorten-urls.html

* LayoutTests/fast/text-extraction/debug-text-extraction-shorten-urls-expected.txt: Added.
* LayoutTests/fast/text-extraction/debug-text-extraction-shorten-urls.html: Added.

Add a layout test to exercise this new option.

* Source/WebCore/Headers.cmake:
* Source/WebCore/Sources.txt:
* Source/WebCore/WebCore.xcodeproj/project.pbxproj:
* Source/WebCore/page/text-extraction/TextExtraction.cpp:
(WebCore::TextExtraction::extractItemData):
* Source/WebCore/page/text-extraction/TextExtractionTypes.h:

Use the helpers below to strip out high-entropy path components from extracted URLs, along with any query parameters and fragment.

* Source/WebCore/platform/StringEntropyHelpers.cpp: Added.
(WebCore::StringEntropyHelpers::symbol):
(WebCore::StringEntropyHelpers::dequantize):
(WebCore::StringEntropyHelpers::bigramWeight):
(WebCore::StringEntropyHelpers::entropyScore):
(WebCore::StringEntropyHelpers::isProbablyHumanReadable):
(WebCore::StringEntropyHelpers::lowEntropyLastPathComponent):
(WebCore::StringEntropyHelpers::removeHighEntropyComponents):

Add the fast path component classifier; see above for more details. Each character is mapped to one of 10 character symbol types (e.g. uppercase hex, lowercase hex, uppercase non-hex, lowercase non-hex, digits, etc.); the classifier is a very simple single-layer perceptron that takes (as inputs) bigrams where each bigram consists of two adjacent symbol types. The 100 weights corresponding to each bigram are encoded in a tiny lookup table, where each weight is quantized to a single byte (`uint8_t`).

* Source/WebCore/platform/StringEntropyHelpers.h: Added.
* Source/WebKit/Shared/TextExtractionToStringConversion.cpp:
(WebKit::centerEllipsize):
(WebKit::TextExtractionAggregator::shortenURLs const):
(WebKit::addPartsForItem):
(WebKit::addTextRepresentationRecursive):
(WebKit::normalizedURLString): Deleted.

Honor the `shortenURLs` flag by using the shortened versions of link hrefs and image sources.

* Source/WebKit/Shared/TextExtractionToStringConversion.h:
* Source/WebKit/Shared/WebCoreArgumentCoders.serialization.in:
* Source/WebKit/UIProcess/API/Cocoa/WKWebView.mm:
(-[WKWebView _extractDebugTextWithConfigurationWithoutUpdatingFilterRules:completionHandler:]):
* Source/WebKit/UIProcess/API/Cocoa/_WKTextExtraction.h:
* Source/WebKit/UIProcess/API/Cocoa/_WKTextExtraction.mm:
(-[_WKTextExtractionConfiguration setShortenURLs:]):
* Tools/TestRunnerShared/UIScriptContext/Bindings/UIScriptController.idl:
* Tools/TestRunnerShared/UIScriptContext/UIScriptController.h:
* Tools/TestRunnerShared/UIScriptContext/UIScriptControllerShared.cpp:
(WTR::toTextExtractionTestOptions):

Add plumbing from `UIHelper` -> `WebKitTestRunner`, for the new `shortenURLs` flag.

* Tools/WebKitTestRunner/cocoa/UIScriptControllerCocoa.mm:
(WTR::createTextExtractionConfiguration):

Canonical link: https://commits.webkit.org/304927@main
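The classifier described in the commit message can be illustrated with a minimal sketch. This is not the code in `StringEntropyHelpers.cpp`: the `sketch` namespace, the exact set of 10 symbol types, the dequantization scale, the decision threshold, and every weight value below are assumptions chosen only to make the example run. Only the overall shape (characters mapped to symbol types, a 10×10 table of `uint8_t` weights indexed by adjacent symbol pairs, and a score summed over bigrams) follows the description.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <string_view>

namespace sketch {

// Hypothetical symbol types: the commit names only a few of the 10.
enum Symbol : uint8_t {
    UpperHex, LowerHex, UpperNonHex, LowerNonHex, Digit,
    Dash, Underscore, Dot, Percent, Other,
    SymbolCount
};
static_assert(SymbolCount == 10, "the description calls for 10 symbol types");

inline Symbol symbolFor(char c)
{
    if (c >= '0' && c <= '9') return Digit;
    if (c >= 'A' && c <= 'F') return UpperHex;
    if (c >= 'a' && c <= 'f') return LowerHex;
    if (c >= 'A' && c <= 'Z') return UpperNonHex;
    if (c >= 'a' && c <= 'z') return LowerNonHex;
    if (c == '-') return Dash;
    if (c == '_') return Underscore;
    if (c == '.') return Dot;
    if (c == '%') return Percent;
    return Other;
}

// Assumed quantization: 128 encodes 0, and each step is 1/32.
inline float dequantize(uint8_t quantized)
{
    return (static_cast<int>(quantized) - 128) / 32.0f;
}

// Made-up weights, not trained values: reward lowercase letter runs,
// penalize hex-letter/digit alternation and digit runs.
inline std::array<uint8_t, SymbolCount * SymbolCount> makeWeights()
{
    std::array<uint8_t, SymbolCount * SymbolCount> table;
    table.fill(128);
    auto set = [&](Symbol a, Symbol b, uint8_t q) { table[a * SymbolCount + b] = q; };
    for (Symbol a : { LowerHex, LowerNonHex }) {
        for (Symbol b : { LowerHex, LowerNonHex })
            set(a, b, 160); // dequantizes to +1.0
    }
    for (Symbol hex : { UpperHex, LowerHex }) {
        set(hex, Digit, 64); // dequantizes to -2.0
        set(Digit, hex, 64);
    }
    set(Digit, Digit, 112); // dequantizes to -0.5
    return table;
}

inline float bigramWeight(Symbol a, Symbol b)
{
    assert(a < SymbolCount && b < SymbolCount);
    static const auto weights = makeWeights();
    return dequantize(weights[a * SymbolCount + b]);
}

// Mean bigram weight over the string; higher means "more word-like" here.
inline float readabilityScore(std::string_view text)
{
    if (text.size() < 2)
        return 0;
    float sum = 0;
    for (size_t i = 0; i + 1 < text.size(); ++i)
        sum += bigramWeight(symbolFor(text[i]), symbolFor(text[i + 1]));
    return sum / static_cast<float>(text.size() - 1);
}

// Assumed decision rule: positive mean score, with very short strings kept.
inline bool isProbablyHumanReadable(std::string_view text)
{
    return text.size() < 2 || readabilityScore(text) > 0;
}

} // namespace sketch
```

With these placeholder weights, a path component such as `about-us` scores positive and would be kept, while `a8f3e9b2c4d1` alternates between hex letters and digits, scores negative, and would be treated as high-entropy. A real table would presumably be fitted to labeled path components rather than written by hand.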
WebKit is a cross-platform web browser engine. On iOS and macOS, it powers Safari, Mail, Apple Books, and many other applications. For more information about WebKit, see the WebKit project website.
On macOS, download Safari Technology Preview to test the latest version of WebKit. On Linux, download Epiphany Technology Preview. On Windows, you'll have to build it yourself.
Once your bug is filed, you will receive email when it is updated at each stage in the bug life cycle. After the bug is considered fixed, you may be asked to download the latest nightly and confirm that the fix works for you.
Run the following command to clone WebKit's Git repository:
git clone https://github.com/WebKit/WebKit.git WebKit
You can enable git fsmonitor to make many git commands (such as `git status`) faster by running `git config core.fsmonitor true`.
Install Xcode and its command line tools if you haven't done so already:
xcode-select --install
Run the following command to build a macOS debug build with debugging symbols and assertions:
Tools/Scripts/build-webkit --debug
For performance testing, and other purposes, use --release instead. If you also need debug symbols (dSYMs), run:
Tools/Scripts/build-webkit --release DEBUG_INFORMATION_FORMAT=dwarf-with-dsym
To build for an embedded platform like iOS, tvOS, or watchOS, pass a platform argument to build-webkit.
For example, to build a debug build with debugging symbols and assertions for embedded simulators:
Tools/Scripts/build-webkit --debug --<platform>-simulator
or embedded devices:
Tools/Scripts/build-webkit --debug --<platform>-device
where platform is ios, tvos or watchos.
You can open WebKit.xcworkspace to build and debug WebKit within Xcode. Select the “Everything up to WebKit + Tools” scheme to build the entire project.
If you don't use a custom build location in Xcode preferences, you have to update the workspace settings to use the WebKitBuild directory. In the menu bar, choose File > Workspace Settings, then click the Advanced button, select “Custom”, “Relative to Workspace”, and enter WebKitBuild for both Products and Intermediates.
For production builds of the GTK port:
cmake -DPORT=GTK -DCMAKE_BUILD_TYPE=RelWithDebInfo -GNinja
ninja
sudo ninja install
For development builds:
Tools/gtk/install-dependencies
Tools/Scripts/update-webkitgtk-libs
Tools/Scripts/build-webkit --gtk --debug
For more information on building WebKitGTK, see the wiki page.
For production builds of the WPE port:
cmake -DPORT=WPE -DCMAKE_BUILD_TYPE=RelWithDebInfo -GNinja
ninja
sudo ninja install
For development builds:
Tools/wpe/install-dependencies
Tools/Scripts/update-webkitwpe-libs
Tools/Scripts/build-webkit --wpe --debug
For building WebKit on Windows, see the WebKit on Windows page.
Run the following command to launch Safari with your local build of WebKit:
Tools/Scripts/run-safari --debug
The run-safari script sets the DYLD_FRAMEWORK_PATH environment variable to point to your build products, and then launches /Applications/Safari.app. DYLD_FRAMEWORK_PATH tells the system loader to prefer your build products over the frameworks installed in /System/Library/Frameworks.
To run other applications with your local build of WebKit, run the following command:
Tools/Scripts/run-webkit-app <application-path>
Run the following command to launch the iOS simulator with your local build of WebKit:
run-safari --debug --ios-simulator
In both cases, if you have built release builds instead, use --release instead of --debug.
To run other applications, for example MobileMiniBrowser, with your local build of WebKit, run the following command:
Tools/Scripts/run-webkit-app --debug --iphone-simulator <application-path>
Open WebKit.xcworkspace, select the intended scheme (such as MobileMiniBrowser) and an iOS simulator as the target, then click Run.
If you have a development build, you can use the run-minibrowser script, e.g.:
run-minibrowser --debug --wpe
Pass one of --gtk, --jsc-only, or --wpe to indicate the port to use.
Congratulations! You’re up and running. Now you can begin coding in WebKit and contribute your fixes and new features to the project. For details on submitting your code to the project, read Contributing Code.