meta charset rewrite.

Implements https://github.com/whatwg/html/issues/6962 . Improves performance
when <meta charset> occurs in head but after the first kilobyte and aligns
behavior better with WebKit and Blink.

The main change is to avoid reloads when meta appears within head but
after the first kilobyte. Prior to this change, Gecko reloaded in that
case (in compliance with the spec!) even though WebKit and Blink did not.

Differences from WebKit and Blink:

* WebKit and Blink honor <meta charset> in <noscript>. This implementation
  does not.
* WebKit and Blink look for meta as if the tree builder were unaware of
  foreign content. This implementation is foreign content-aware. This
  makes a difference for CDATA sections that contain a > before the meta
  as well as style and script elements within foreign content. The CDATA
  case could happen if a CDATA section that has mysteriously been introduced
  around what looks like a meta tag also contains another prior tag-looking
  run of text.
* This implementation processes rel=preload and speculative loads that are
  seen before <meta charset> has been seen. WebKit and Blink instead first
  look for the meta and rewind before starting speculative parsing.
* Unlike WebKit, if there is neither an honored meta nor syntax resembling
  an XML declaration, detection from content takes place (as in Blink).
* Unlike Blink, if there is neither an honored meta nor syntax resembling
  an XML declaration, the detection from content is not dependent on network
  buffer boundaries.
* Unlike Blink, detection from content can trigger a reload at the end of
  the stream if the guess made at that point differs from the first guess.
  (See below for the definition of the input to the first guess.)

Differences from the old spec and Gecko previously:

* Meta inside script and RCDATA elements is no longer honored.
* Late meta is now ignored and no longer triggers a reload.
* A later meta can now count as early enough: in addition to a meta located
  within the first 1024 bytes, as before, a meta that starts within the first
  1024 bytes now counts as early enough. Additionally, if by then there hasn't
  been a template start tag and head hasn't ended, a meta occurring before the
  earlier of the end of the head or a template start tag counts as early
  enough.
* Meta now counts as not-late even if the encoding label has numeric
  character reference escapes.
* Syntax resembling an XML declaration longer than a kilobyte is honored if
  there is no honored meta.
* If there is neither an honored meta nor syntax resembling an XML declaration,
  the initial chardetng scan is potentially longer than before: it covers the
  first 1024 bytes, the token spanning the 1024-byte boundary if there is such
  a token, and, if by then head hasn't ended and there hasn't been a template
  start tag, the bytes up to the end of the template start tag or the end of
  the token that causes head to end, whichever comes first. However, if the
  token implying the end of the head is a text token, only the bytes up to the
  end of the previous non-text token are considered. (This definition avoids
  depending on network buffer boundaries.)
* XML View Source now uses the code for syntax resembling an XML declaration
  instead of expat for extracting the internal encoding label.
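
The "early enough" rule for meta described in the list above can be sketched
roughly as follows. This is a simplified illustration only, not the Gecko
implementation: the function name, the byte-offset parameters, and the
constant name are hypothetical, and the sketch ignores cases such as meta
inside script, RCDATA, or noscript.

```python
from typing import Optional

# The 1024-byte window named in the commit message. The constant name is
# an assumption made for this sketch.
PRESCAN_LIMIT = 1024


def meta_is_early_enough(
    meta_start: int,
    head_end: Optional[int] = None,
    template_start: Optional[int] = None,
) -> bool:
    """Decide whether a meta starting at byte offset meta_start is honored.

    head_end and template_start are the byte offsets at which head ends and
    at which the first template start tag occurs, or None if that has not
    happened in the bytes seen so far (hypothetical inputs for illustration).
    """
    # A meta that starts within the first 1024 bytes always counts.
    if meta_start < PRESCAN_LIMIT:
        return True

    # Past the first 1024 bytes, the meta must occur before the earlier of
    # the end of head or a template start tag.
    cutoffs = [offset for offset in (head_end, template_start) if offset is not None]
    if not cutoffs:
        # Head is still open and no template has been seen yet.
        return True
    return meta_start < min(cutoffs)
```

For example, a meta at offset 2000 counts if head ends at offset 3000, but not
if a template start tag occurred at offset 1800 or if head already ended at
offset 1500.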

Reftests are added as both WPT and Gecko reftests in order to test both http:
and file: URL scenarios. The Gecko tests retain the WPT <link> tags in order
to use the exact same bytes.

An encoding declaration has been added to a number of old tests that weren't
intended to test the new speculation behavior, especially in the context of
https://bugzilla.mozilla.org/show_bug.cgi?id=1727750 .

Differential Revision: https://phabricator.services.mozilla.com/D125808

bugzilla-url: https://bugzilla.mozilla.org/show_bug.cgi?id=1701828
gecko-commit: 9a8abd87cc7935f29b94248c1a6f8203faa14403
gecko-reviewers: smaug
58 files changed
tree: d052dbf828a6ebd66b199e4b64d0c10fad17618c
  1. .github/
  2. .well-known/
  3. accelerometer/
  4. accessibility/
  5. accname/
  6. acid/
  7. ambient-light/
  8. animation-worklet/
  9. annotation-model/
  10. annotation-protocol/
  11. annotation-vocab/
  12. apng/
  13. app-history/
  14. appmanifest/
  15. audio-output/
  16. background-fetch/
  17. BackgroundSync/
  18. badging/
  19. battery-status/
  20. beacon/
  21. bluetooth/
  22. clear-site-data/
  23. client-hints/
  24. clipboard-apis/
  25. common/
  26. compat/
  27. compression/
  28. compute-pressure/
  29. conformance-checkers/
  30. console/
  31. contacts/
  32. content-dpr/
  33. content-index/
  34. content-security-policy/
  35. contenteditable/
  36. cookie-store/
  37. cookies/
  38. core-aam/
  39. cors/
  40. credential-management/
  41. css/
  42. custom-elements/
  43. custom-state-pseudo-class/
  44. delegated-ink/
  45. density-size-correction/
  46. deprecation-reporting/
  47. device-memory/
  48. docs/
  49. document-policy/
  50. dom/
  51. domparsing/
  52. domxpath/
  53. dpub-aam/
  54. dpub-aria/
  55. editing/
  56. element-timing/
  57. encoding/
  58. encoding-detection/
  59. encrypted-media/
  60. entries-api/
  61. event-timing/
  62. eventsource/
  63. eyedropper/
  64. feature-policy/
  65. fetch/
  66. file-system-access/
  67. FileAPI/
  68. focus/
  69. font-access/
  70. fonts/
  71. forced-colors-mode/
  72. fullscreen/
  73. gamepad/
  74. generic-sensor/
  75. geolocation-API/
  76. geolocation-sensor/
  77. graphics-aam/
  78. gyroscope/
  79. hr-time/
  80. html/
  81. html-longdesc/
  82. html-media-capture/
  83. idle-detection/
  84. imagebitmap-renderingcontext/
  85. images/
  86. import-maps/
  87. IndexedDB/
  88. inert/
  89. infrastructure/
  90. input-device-capabilities/
  91. input-events/
  92. installedapp/
  93. interfaces/
  94. intersection-observer/
  95. intervention-reporting/
  96. is-input-pending/
  97. js/
  98. js-self-profiling/
  99. keyboard-lock/
  100. keyboard-map/
  101. largest-contentful-paint/
  102. layout-instability/
  103. lifecycle/
  104. loading/
  105. longtask-timing/
  106. magnetometer/
  107. managed/
  108. mathml/
  109. measure-memory/
  110. media/
  111. media-capabilities/
  112. media-playback-quality/
  113. media-source/
  114. mediacapture-depth/
  115. mediacapture-fromelement/
  116. mediacapture-image/
  117. mediacapture-insertable-streams/
  118. mediacapture-record/
  119. mediacapture-streams/
  120. mediasession/
  121. merchant-validation/
  122. mimesniff/
  123. mixed-content/
  124. mst-content-hint/
  125. native-io/
  126. navigation-timing/
  127. netinfo/
  128. network-error-logging/
  129. notifications/
  130. old-tests/
  131. orientation-event/
  132. orientation-sensor/
  133. origin-policy/
  134. page-lifecycle/
  135. page-visibility/
  136. paint-timing/
  137. parakeet/
  138. payment-handler/
  139. payment-method-basic-card/
  140. payment-method-id/
  141. payment-request/
  142. performance-timeline/
  143. periodic-background-sync/
  144. permissions/
  145. permissions-policy/
  146. permissions-request/
  147. permissions-revoke/
  148. picture-in-picture/
  149. pointerevents/
  150. pointerlock/
  151. portals/
  152. preload/
  153. presentation-api/
  154. priority-hints/
  155. private-click-measurement/
  156. proximity/
  157. push-api/
  158. quirks/
  159. raw-sockets/
  160. referrer-policy/
  161. remote-playback/
  162. reporting/
  163. requestidlecallback/
  164. resize-observer/
  165. resource-timing/
  166. resources/
  167. sanitizer-api/
  168. savedata/
  169. scheduler/
  170. screen-capture/
  171. screen-orientation/
  172. screen-wake-lock/
  173. screen_enumeration/
  174. scroll-animations/
  175. scroll-to-text-fragment/
  176. secure-contexts/
  177. secure-payment-confirmation/
  178. selection/
  179. serial/
  180. server-timing/
  181. service-workers/
  182. shadow-dom/
  183. shape-detection/
  184. signed-exchange/
  185. speculation-rules/
  186. speech-api/
  187. storage/
  188. storage-access-api/
  189. streams/
  190. subresource-integrity/
  191. svg/
  192. svg-aam/
  193. timing-entrytypes-registry/
  194. tools/
  195. touch-events/
  196. trust-tokens/
  197. trusted-types/
  198. ua-client-hints/
  199. uievents/
  200. upgrade-insecure-requests/
  201. url/
  202. urlpattern/
  203. user-timing/
  204. vibration/
  205. video-rvfc/
  206. virtual-keyboard/
  207. visual-viewport/
  208. wai-aria/
  209. wasm/
  210. web-animations/
  211. web-bundle/
  212. web-locks/
  213. web-nfc/
  214. web-otp/
  215. web-share/
  216. webaudio/
  217. webauthn/
  218. webcodecs/
  219. WebCryptoAPI/
  220. webdriver/
  221. webgl/
  222. webgpu/
  223. webhid/
  224. webidl/
  225. webmessaging/
  226. webmidi/
  227. webnn/
  228. webrtc/
  229. webrtc-encoded-transform/
  230. webrtc-extensions/
  231. webrtc-ice/
  232. webrtc-identity/
  233. webrtc-priority/
  234. webrtc-stats/
  235. webrtc-svc/
  236. websockets/
  237. webstorage/
  238. webtransport/
  239. webusb/
  240. webvr/
  241. webvtt/
  242. webxr/
  243. workers/
  244. worklets/
  245. x-frame-options/
  246. xhr/
  247. xslt/
  248. .azure-pipelines.yml
  249. .gitattributes
  250. .gitignore
  251. .mailmap
  252. .taskcluster.yml
  253. CODE_OF_CONDUCT.md
  254. CODEOWNERS
  255. CONTRIBUTING.md
  256. LICENSE.md
  257. lint.ignore
  258. README.md
  259. testharness_runner.html
  260. wpt
  261. wpt.py
README.md

The web-platform-tests Project


The web-platform-tests Project is a cross-browser test suite for the Web-platform stack. Writing tests in a way that allows them to be run in all browsers gives browser projects confidence that they are shipping software that is compatible with other implementations, and that later implementations will be compatible with their implementations. This in turn gives Web authors/developers confidence that they can actually rely on the Web platform to deliver on the promise of working across browsers and devices without needing extra layers of abstraction to paper over the gaps left by specification editors and implementors.

The most important sources of information and activity are:

  • github.com/web-platform-tests/wpt: the canonical location of the project's source code revision history and the discussion forum for changes to the code
  • web-platform-tests.org: the documentation website; details how to set up the project, how to write tests, how to give and receive peer review, how to serve as an administrator, and more
  • wpt.live: a public deployment of the test suite, allowing anyone to run the tests by visiting from an Internet-enabled browser of their choice
  • wpt.fyi: an archive of test results collected from an array of web browsers on a regular basis
  • Real-time chat room: the wpt:matrix.org matrix channel; includes participants located around the world, but busiest during the European working day.
  • Mailing list: a public and low-traffic discussion list
  • RFCs: a repo for requesting comments on substantial changes that would impact other stakeholders or users; people who work on WPT infra are encouraged to watch the repo.

If you'd like clarification about anything, don't hesitate to ask in the chat room or on the mailing list.

Setting Up the Repo

Clone or otherwise get https://github.com/web-platform-tests/wpt.

Note: because of the frequent creation and deletion of branches in this repo, it is recommended to “prune” stale branches when fetching updates, i.e. use git pull --prune (or git fetch -p && git merge).

Running the Tests

See the documentation website and in particular the system setup for running tests locally.

Command Line Tools

The wpt command provides a frontend to a variety of tools for working with and running web-platform-tests. Some of the most useful commands are:

  • wpt serve - For starting the wpt http server
  • wpt run - For running tests in a browser
  • wpt lint - For running the lint against all tests
  • wpt manifest - For updating or generating a MANIFEST.json test manifest
  • wpt install - For installing the latest release of a browser or webdriver server on the local machine.
  • wpt serve-wave - For starting the wpt http server and the WAVE test runner. For more details on how to use the WAVE test runner see the documentation.

Windows Notes

On Windows wpt commands must be prefixed with python or the path to the python binary (if python is not in your %PATH%).

python wpt [command]

Alternatively, you may also use Bash on Ubuntu on Windows in the Windows 10 Anniversary Update build, then access your Windows partition from there to launch wpt commands.

Please make sure git and your text editor do not automatically convert line endings, as it will cause lint errors. For git, please set git config core.autocrlf false in your working tree.
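
As a quick way to spot files whose line endings were converted, a small standalone script along these lines may help. It is only a sketch and is not part of the wpt tooling: the function name and the default suffix list are assumptions made for this example.

```python
from pathlib import Path


def find_crlf_files(root, suffixes=(".html", ".js", ".py")):
    """Return sorted paths under root whose contents contain a CRLF sequence.

    Only files whose suffix is in suffixes are inspected. The default list
    is an arbitrary choice for this sketch.
    """
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            # Read raw bytes so that no newline translation hides the CRLF.
            if b"\r\n" in path.read_bytes():
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Check the current directory and print any offending files.
    for offending in find_crlf_files("."):
        print(offending)
```

Running it from the root of a checkout prints the matching files, if any; files it reports are candidates for the lint errors mentioned above.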

Publication

The master branch is automatically synced to wpt.live and w3c-test.org.

Contributing

Save the Web, Write Some Tests!

Absolutely everyone is welcome to contribute to test development. No test is too small or too simple, especially if it corresponds to something for which you've noted an interoperability bug in a browser.

The way to contribute is just as usual:

  • Fork this repository (and make sure you're still relatively in sync with it if you forked a while ago).
  • Create a branch for your changes: git checkout -b topic.
  • Make your changes.
  • Run ./wpt lint as described above.
  • Commit locally and push that to your repo.
  • Create a pull request based on the above.

Issues with web-platform-tests

If you spot an issue with a test and are not comfortable providing a pull request per above to fix it, please file a new issue. Thank you!