Add support for fuzzy matching in reftests (#12187)

This allows fuzzy matching in reftests in which a comparison can
succeed if the images are different within a specified tolerance. It
is useful in the case of antialiasing, and in other scenarios where
it's not possible to make an exact match in all cases.

Differences between a test and its reference are characterised by two values:

* The maximum difference for any pixel on any color channel (in the
  range 0 to 255)

* The maximum total number of differing pixels
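
As a rough illustration of these two values, the sketch below computes
them for two equally sized images held as nested lists of (r, g, b)
tuples. The function name `measure_difference` and the list-based image
representation are assumptions made for this example only; the runner
in this change works on decoded screenshots instead.

```python
def measure_difference(test_pixels, ref_pixels):
    """Return (max_difference, total_pixels) for two images of equal size.

    Each image is a list of rows; each row is a list of (r, g, b) tuples
    with channel values in the range 0 to 255.
    """
    max_difference = 0
    total_pixels = 0
    for test_row, ref_row in zip(test_pixels, ref_pixels):
        for test_px, ref_px in zip(test_row, ref_row):
            # Largest absolute difference on any single channel of this pixel.
            largest = max(abs(t - r) for t, r in zip(test_px, ref_px))
            if largest > 0:
                total_pixels += 1
                max_difference = max(max_difference, largest)
    return max_difference, total_pixels


green = (0, 128, 0)
white = (255, 255, 255)
test_image = [[green, green], [green, white]]
ref_image = [[green, green], [green, green]]
result = measure_difference(test_image, ref_image)
```

Here one pixel differs, and its largest per-channel difference is 255.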

The fuzziness can be supplied in two places, according to whether it's
a property of the test or of the implementation:

* In the reftest itself, using a <meta name=fuzzy> tag

* In the expectation metadata file using a fuzzy: key that takes a
  list

The general format of the fuzziness specifier is

range = [name "="] [digits, "-"], digits
fuzziness = [ url, ":" ], range, ";", range
name = "maxDifference" | "totalPixels"

The first range represents the maximum difference of any channel per
pixel and the second represents the total number of pixel
differences. So for example a specifier could be:

* "maxDifference=10;totalPixels=300" - meaning a difference of exactly
  10 per color channel and exactly 300 pixels different in total (all
  ranges are inclusive).

* "5-10;200-300" - meaning a maximum difference of between 5 and 10
  per color channel and between 200 and 300 pixels differing in total
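
A minimal sketch of how the two ranges might be parsed and then checked
against measured values, loosely following the parsing added in this
change. The names `parse_ranges` and `within`, and the dictionary return
shape, are assumptions for illustration.

```python
ARG_NAMES = ["maxDifference", "totalPixels"]


def parse_ranges(value):
    """Parse 'range;range' into {name: (low, high)} for both properties."""
    parts = value.split(";")
    if len(parts) != 2:
        raise ValueError("Malformed fuzzy value %s" % value)
    named = {}
    positional = []
    for part in parts:
        name = None
        if "=" in part:
            name, part = [item.strip() for item in part.split("=", 1)]
            if name not in ARG_NAMES:
                raise ValueError("%s is not a valid fuzzy property" % name)
            if name in named:
                raise ValueError("Got multiple values for argument %s" % name)
        if "-" in part:
            low, high = part.split("-")
        else:
            # A single number N is shorthand for the range N-N.
            low = high = part
        parsed = (int(low.strip()), int(high.strip()))
        if name is None:
            positional.append(parsed)
        else:
            named[name] = parsed
    # Unnamed ranges fill whichever properties are still missing, in order.
    return {arg: named[arg] if arg in named else positional.pop(0)
            for arg in ARG_NAMES}


def within(ranges, max_difference, total_pixels):
    """True if both measured values fall inside their inclusive ranges."""
    low_d, high_d = ranges["maxDifference"]
    low_p, high_p = ranges["totalPixels"]
    return (low_d <= max_difference <= high_d and
            low_p <= total_pixels <= high_p)
```

For example, `within(parse_ranges("5-10;200-300"), 7, 250)` is true,
while a maximum difference of 4 falls below the lower bound and fails.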

The definition of url is a little different between the meta element
and the expectation metadata. In the first case the url is resolved
against the current file, and applies to any reference in the current
file with that name. So for example

<meta name="fuzzy" content="option-1-ref.html:5;200">

would require a maximum difference of exactly 5 on any channel and exactly
200 pixels different for comparisons involving the file containing the
meta element and option-1-ref.html.
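
As a sketch of the resolution step for the meta element, the optional
url prefix can be split off at the last colon and resolved against the
URL of the containing file. The helper name `split_reference` and the
path `/css/test1.html` are hypothetical.

```python
from urllib.parse import urljoin


def split_reference(content, test_url):
    """Split a fuzzy content value into (resolved reference url, ranges).

    The reference url is None when the value has no url prefix.
    """
    parts = content.rsplit(":", 1)
    if len(parts) == 1:
        return None, parts[0]
    # Resolve the relative reference url against the containing file.
    return urljoin(test_url, parts[0]), parts[1]


ref_url, ranges = split_reference("option-1-ref.html:5;200", "/css/test1.html")
default = split_reference("5;200", "/css/test1.html")
```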

In the case of expectation metadata, the metadata is always associated
with the root test, so urls are always resolved relative to that. In
the case as above where only a single URL is supplied, any reference
document with that URL will have the fuzziness applied for whatever
comparisons it's involved in e.g.

[test1.html]
  fuzzy: option-1-ref.html:5;200

would apply the fuzziness to any comparison involving option-1-ref.html
whilst running the set of reftests rooted on test1.html. To specify an
exact comparison for the fuzziness, one can also supply a full
reference pair e.g.

[test1.html]
  fuzzy: subtest.html==option-1-ref.html:5;200

in which case the fuzziness would only apply to "match" comparisons
involving subtest.html on the lhs and option-1-ref.html on the
rhs (both resolved relative to test1.html).
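
When both the test and the expectation metadata supply fuzziness, one
value has to be chosen for each comparison. The sketch below shows one
plausible lookup order, modelled on the `get_fuzzy` method added in this
change: metadata is consulted before the test's own values, and within
each source the full (lhs, rhs, relation) key is tried before the
reference-only key and the unprefixed default. The name `select_fuzzy`
and the dictionary shapes are assumptions for illustration.

```python
def select_fuzzy(override, from_test, test_url, ref_url, relation):
    """Pick the fuzziness for one comparison, or None if none applies."""
    # Most specific key first, then reference-only, then the default.
    keys = [(test_url, ref_url, relation), ref_url, None]
    for source in (override, from_test):
        for key in keys:
            if key in source:
                return source[key]
    return None


override = {"/a/ref.html": ((1, 1), (200, 200))}
from_test = {("/a/test.html", "/a/ref.html", "=="): ((2, 3), (10, 15)),
             None: ((0, 0), (0, 5))}

chosen = select_fuzzy(override, from_test, "/a/test.html", "/a/ref.html", "==")
fallback = select_fuzzy(override, from_test, "/a/test.html", "/a/other.html", "==")
```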
diff --git a/docs/_writing-tests/reftests.md b/docs/_writing-tests/reftests.md
index f56a84a..5101654 100644
--- a/docs/_writing-tests/reftests.md
+++ b/docs/_writing-tests/reftests.md
@@ -112,10 +112,67 @@
 ## Fuzzy Matching
 
 In some situations a test may have subtle differences in rendering
-compared to the reference due to, e.g., anti-aliasing. This may cause
-the test to pass on some platforms but fail on others. In this case
-some affordance for subtle discrepancies is desirable. However no
-mechanism to allow this has yet been standardized.
+compared to the reference due to, e.g., anti-aliasing. To allow for
+these small differences, we allow tests to specify a fuzziness
+characterised by two parameters, both of which must be specified:
+
+ * A maximum difference in the per-channel color value for any pixel.
+ * A number of total pixels that may be different.
+
+The maximum difference in the per pixel color value is formally
+defined as follows: let <code>T<sub>x,y,c</sub></code> be the value of
+colour channel `c` at pixel coordinates `x`, `y` in the test image and
+<code>R<sub>x,y,c</sub></code> be the corresponding value in the
+reference image, and let <code>width</code> and <code>height</code> be
+the dimensions of the image in pixels. Then <code>maxDifference =
+max<sub>x=[0,width), y=[0,height), c={r,g,b}</sub>(|T<sub>x,y,c</sub> -
+R<sub>x,y,c</sub>|)</code>.
+
+To specify the fuzziness, add a `<meta name=fuzzy>` element to the
+test file (or, in the case of more complex tests, to any page
+containing the `<link rel=[mis]match>` elements). In the simplest
+case this has a `content` attribute containing the parameters above,
+separated by a semicolon e.g.
+
+```
+<meta name=fuzzy content="maxDifference=15;totalPixels=300">
+```
+
+would allow for a difference of exactly 15 / 255 on any color channel
+and exactly 300 pixels total difference. The argument names are optional
+and may be elided; the above is the same as:
+
+```
+<meta name=fuzzy content="15;300">
+```
+
+The values may also be given as ranges e.g.
+
+```
+<meta name=fuzzy content="maxDifference=10-15;totalPixels=200-300">
+```
+
+or
+
+```
+<meta name=fuzzy content="10-15;200-300">
+```
+
+In this case the maximum pixel difference must be in the range
+`10-15` and the total number of different pixels must be in the range
+`200-300`.
+
+In cases where a single test has multiple possible refs and the
+fuzziness is not the same for all refs, a ref may be specified by
+prefixing the `content` value with the relative url for the ref e.g.
+
+```
+<meta name=fuzzy content="option1-ref.html:10-15;200-300">
+```
+
+One meta element is required per reference requiring a unique
+fuzziness value, but any unprefixed value will automatically be
+applied to any ref that doesn't have a more specific value.
 
 ## Limitations
 
diff --git a/infrastructure/metadata/infrastructure/reftest/reftest_fuzzy.html.ini b/infrastructure/metadata/infrastructure/reftest/reftest_fuzzy.html.ini
new file mode 100644
index 0000000..1ab2d77
--- /dev/null
+++ b/infrastructure/metadata/infrastructure/reftest/reftest_fuzzy.html.ini
@@ -0,0 +1,2 @@
+[reftest_fuzzy.html]
+  fuzzy: fuzzy-ref-1.html:maxDifference=255;100-100
diff --git a/infrastructure/reftest/fuzzy-ref-1.html b/infrastructure/reftest/fuzzy-ref-1.html
new file mode 100644
index 0000000..e50fc11
--- /dev/null
+++ b/infrastructure/reftest/fuzzy-ref-1.html
@@ -0,0 +1,9 @@
+<!DOCTYPE html>
+<style>
+div {
+  width: 100px;
+  height: 100px;
+  background-color: green;
+}
+</style>
+<div></div>
diff --git a/infrastructure/reftest/reftest_fuzzy.html b/infrastructure/reftest/reftest_fuzzy.html
new file mode 100644
index 0000000..7429025
--- /dev/null
+++ b/infrastructure/reftest/reftest_fuzzy.html
@@ -0,0 +1,13 @@
+<!DOCTYPE html>
+<link rel=match href=fuzzy-ref-1.html>
+<!-- This meta is overridden in the corresponding ini file -->
+<meta name=fuzzy content="fuzzy-ref-1.html:128;100">
+<style>
+div {
+  width: 99px;
+  height: 100px;
+  background-color: green;
+}
+</style>
+<div></div>
+
diff --git a/infrastructure/reftest/reftest_fuzzy_1.html b/infrastructure/reftest/reftest_fuzzy_1.html
new file mode 100644
index 0000000..1930fe0
--- /dev/null
+++ b/infrastructure/reftest/reftest_fuzzy_1.html
@@ -0,0 +1,12 @@
+<!DOCTYPE html>
+<link rel=match href=fuzzy-ref-1.html>
+<meta name=fuzzy content="fuzzy-ref-1.html:255;100">
+<style>
+div {
+  width: 99px;
+  height: 100px;
+  background-color: green;
+}
+</style>
+<div></div>
+
diff --git a/tools/manifest/item.py b/tools/manifest/item.py
index c06daee..c6363a7 100644
--- a/tools/manifest/item.py
+++ b/tools/manifest/item.py
@@ -1,5 +1,5 @@
 from copy import copy
-
+from six import iteritems
 from six.moves.urllib.parse import urljoin, urlparse
 from abc import ABCMeta, abstractproperty
 
@@ -169,6 +169,14 @@
     def dpi(self):
         return self._extras.get("dpi")
 
+    @property
+    def fuzzy(self):
+        rv = self._extras.get("fuzzy", [])
+        if isinstance(rv, list):
+            return {(tuple(item[0]) if item[0] is not None else None): item[1]
+                    for item in rv}
+        return rv
+
     def meta_key(self):
         return (self.timeout, self.viewport_size, self.dpi)
 
@@ -181,6 +189,8 @@
             extras["viewport_size"] = self.viewport_size
         if self.dpi is not None:
             extras["dpi"] = self.dpi
+        if self.fuzzy:
+            extras["fuzzy"] = list(iteritems(self.fuzzy))
         return rv
 
     @classmethod
diff --git a/tools/manifest/sourcefile.py b/tools/manifest/sourcefile.py
index b5d7cdf..78843b0 100644
--- a/tools/manifest/sourcefile.py
+++ b/tools/manifest/sourcefile.py
@@ -1,6 +1,7 @@
 import hashlib
 import re
 import os
+from collections import deque
 from six import binary_type
 from six.moves.urllib.parse import urljoin
 from fnmatch import fnmatch
@@ -453,6 +454,79 @@
         return self.dpi_nodes[0].attrib.get("content", None)
 
     @cached_property
+    def fuzzy_nodes(self):
+        """List of ElementTree Elements corresponding to nodes in a test that
+        specify reftest fuzziness"""
+        return self.root.findall(".//{http://www.w3.org/1999/xhtml}meta[@name='fuzzy']")
+
+    @cached_property
+    def fuzzy(self):
+        rv = {}
+        if self.root is None:
+            return rv
+
+        if not self.fuzzy_nodes:
+            return rv
+
+        args = ["maxDifference", "totalPixels"]
+
+        for node in self.fuzzy_nodes:
+            item = node.attrib.get("content", "")
+
+            parts = item.rsplit(":", 1)
+            if len(parts) == 1:
+                key = None
+                value = parts[0]
+            else:
+                key = urljoin(self.url, parts[0])
+                reftype = None
+                for ref in self.references:
+                    if ref[0] == key:
+                        reftype = ref[1]
+                        break
+                if reftype not in ("==", "!="):
+                    raise ValueError("Fuzzy key %s doesn't correspond to a reference" % key)
+                key = (self.url, key, reftype)
+                value = parts[1]
+            ranges = value.split(";")
+            if len(ranges) != 2:
+                raise ValueError("Malformed fuzzy value %s" % item)
+            arg_values = {None: deque()}
+            for range_str_value in ranges:
+                if "=" in range_str_value:
+                    name, range_str_value = [part.strip()
+                                             for part in range_str_value.split("=", 1)]
+                    if name not in args:
+                        raise ValueError("%s is not a valid fuzzy property" % name)
+                    if arg_values.get(name):
+                        raise ValueError("Got multiple values for argument %s" % name)
+                else:
+                    name = None
+                if "-" in range_str_value:
+                    range_min, range_max = range_str_value.split("-")
+                else:
+                    range_min = range_str_value
+                    range_max = range_str_value
+                try:
+                    range_value = [int(x.strip()) for x in (range_min, range_max)]
+                except ValueError:
+                    raise ValueError("Fuzzy value %s must be a range of integers" %
+                                     range_str_value)
+                if name is None:
+                    arg_values[None].append(range_value)
+                else:
+                    arg_values[name] = range_value
+            rv[key] = []
+            for arg_name in args:
+                if arg_values.get(arg_name):
+                    value = arg_values.pop(arg_name)
+                else:
+                    value = arg_values[None].popleft()
+                rv[key].append(value)
+            assert list(arg_values.keys()) == [None] and len(arg_values[None]) == 0
+        return rv
+
+    @cached_property
     def testharness_nodes(self):
         """List of ElementTree Elements corresponding to nodes representing a
         testharness.js script"""
@@ -749,7 +823,8 @@
                     references=self.references,
                     timeout=self.timeout,
                     viewport_size=self.viewport_size,
-                    dpi=self.dpi
+                    dpi=self.dpi,
+                    fuzzy=self.fuzzy
                 )]
 
         elif self.content_is_css_visual and not self.name_is_reference:
diff --git a/tools/manifest/tests/test_sourcefile.py b/tools/manifest/tests/test_sourcefile.py
index 7c368a5..18aa55a 100644
--- a/tools/manifest/tests/test_sourcefile.py
+++ b/tools/manifest/tests/test_sourcefile.py
@@ -789,3 +789,41 @@
                                             u'/_fake_base/html/test.any.worker.html?wss']
 
     assert items[0].url_base == "/_fake_base/"
+
+
+@pytest.mark.parametrize("fuzzy, expected", [
+    (b"ref.html:1;200", {("/foo/test.html", "/foo/ref.html", "=="): [[1, 1], [200, 200]]}),
+    (b"ref.html:0-1;100-200", {("/foo/test.html", "/foo/ref.html", "=="): [[0, 1], [100, 200]]}),
+    (b"0-1;100-200", {None: [[0,1], [100, 200]]}),
+    (b"maxDifference=1;totalPixels=200", {None: [[1, 1], [200, 200]]}),
+    (b"totalPixels=200;maxDifference=1", {None: [[1, 1], [200, 200]]}),
+    (b"totalPixels=200;1", {None: [[1, 1], [200, 200]]}),
+    (b"maxDifference=1;200", {None: [[1, 1], [200, 200]]}),])
+def test_reftest_fuzzy(fuzzy, expected):
+    content = b"""<link rel=match href=ref.html>
+<meta name=fuzzy content="%s">
+""" % fuzzy
+
+    s = create("foo/test.html", content)
+
+    assert s.content_is_ref_node
+    assert s.fuzzy == expected
+
+
+@pytest.mark.parametrize("fuzzy, expected", [
+    ([b"1;200"], {None: [[1, 1], [200, 200]]}),
+    ([b"ref-2.html:0-1;100-200"], {("/foo/test.html", "/foo/ref-2.html", "=="): [[0, 1], [100, 200]]}),
+    ([b"1;200", b"ref-2.html:0-1;100-200"],
+     {None: [[1, 1], [200, 200]],
+      ("/foo/test.html", "/foo/ref-2.html", "=="): [[0,1], [100, 200]]})])
+def test_reftest_fuzzy_multi(fuzzy, expected):
+    content = b"""<link rel=match href=ref-1.html>
+<link rel=match href=ref-2.html>
+"""
+    for item in fuzzy:
+        content += b'\n<meta name=fuzzy content="%s">' % item
+
+    s = create("foo/test.html", content)
+
+    assert s.content_is_ref_node
+    assert s.fuzzy == expected
diff --git a/tools/wptrunner/docs/expectation.rst b/tools/wptrunner/docs/expectation.rst
index 6a0c776..7fe89d9 100644
--- a/tools/wptrunner/docs/expectation.rst
+++ b/tools/wptrunner/docs/expectation.rst
@@ -190,13 +190,6 @@
  * A subsection per subtest, with the heading being the title of the
    subtest.
 
- * A key ``type`` indicating the test type. This takes the values
-   ``testharness`` and ``reftest``.
-
- * For reftests, keys ``reftype`` indicating the reference type
-   (``==`` or ``!=``) and ``refurl`` indicating the URL of the
-   reference.
-
  * A key ``expected`` giving the expectation value of each (sub)test.
 
  * A key ``disabled`` which can be set to any value to indicate that
@@ -207,6 +200,19 @@
    the runner should restart the browser after running this test (e.g. to
    clear out unwanted state).
 
+ * A key ``fuzzy`` that is used for reftests. This is interpreted as a
+   list of entries in the same format as the ``<meta name=fuzzy>``
+   content value: an optional reference identifier followed by a
+   colon, then a range indicating the maximum permitted pixel
+   difference per channel, then a semicolon, then a range indicating
+   the maximum permitted total number of differing pixels. The
+   reference identifier is either a single relative URL, resolved
+   against the base test URL, in which case the fuzziness applies to
+   any comparison with that URL, or takes the form lhs url,
+   comparison, rhs url, in which case the fuzziness only applies to
+   comparisons involving that specific pair of URLs. Some illustrative
+   examples are given below.
+
  * Variables ``debug``, ``os``, ``version``, ``processor`` and
    ``bits`` that describe the configuration of the browser under
    test. ``debug`` is a boolean indicating whether a build is a debug
@@ -246,3 +252,18 @@
 
 Note that ``PASS`` in the above works, but is unnecessary; ``PASS``
 (or ``OK``) is always the default expectation for (sub)tests.
+
+A manifest with fuzzy reftest values might be::
+
+  [reftest.html]
+    fuzzy: [10;200, ref1.html:20;200-300, subtest1.html==ref2.html:10-15;20]
+
+In this case the default fuzziness for any comparison would be to
+require a maximum difference per channel of exactly 10 and exactly
+200 total pixels different. For any comparison involving ref1.html
+on the right hand side, the limits would instead be a difference per
+channel of exactly 20 and a total difference count of not less than
+200 and not more than 300. For the specific comparison
+subtest1.html == ref2.html (both resolved against the test URL)
+these limits would instead be 10 to 15 and exactly 20,
+respectively.
diff --git a/tools/wptrunner/requirements.txt b/tools/wptrunner/requirements.txt
index 24a7d3d..37f4fde 100644
--- a/tools/wptrunner/requirements.txt
+++ b/tools/wptrunner/requirements.txt
@@ -2,4 +2,6 @@
 mozinfo == 0.10
 mozlog==4.0
 mozdebug==0.1.1
+pillow == 5.2.0
 urllib3[secure]==1.24.1
+
diff --git a/tools/wptrunner/wptrunner/executors/base.py b/tools/wptrunner/wptrunner/executors/base.py
index 8958ecf..5fa3056 100644
--- a/tools/wptrunner/wptrunner/executors/base.py
+++ b/tools/wptrunner/wptrunner/executors/base.py
@@ -1,6 +1,7 @@
 import base64
 import hashlib
 import httplib
+import io
 import os
 import threading
 import traceback
@@ -8,6 +9,8 @@
 import urlparse
 from abc import ABCMeta, abstractmethod
 
+from PIL import Image, ImageChops, ImageStat
+
 from ..testrunner import Stop
 from protocol import Protocol, BaseProtocolPart
 
@@ -286,8 +289,7 @@
 
             screenshot = data
             hash_value = hash_screenshot(data)
-
-            self.screenshot_cache[key] = (hash_value, None)
+            self.screenshot_cache[key] = (hash_value, screenshot)
 
             rv = (hash_value, screenshot)
         else:
@@ -299,11 +301,32 @@
     def reset(self):
         self.screenshot_cache.clear()
 
-    def is_pass(self, lhs_hash, rhs_hash, relation):
+    def is_pass(self, hashes, screenshots, relation, fuzzy):
         assert relation in ("==", "!=")
-        self.message.append("Testing %s %s %s" % (lhs_hash, relation, rhs_hash))
-        return ((relation == "==" and lhs_hash == rhs_hash) or
-                (relation == "!=" and lhs_hash != rhs_hash))
+        if not fuzzy or fuzzy == ((0,0), (0,0)):
+            equal = hashes[0] == hashes[1]
+        else:
+            max_per_channel, pixels_different = self.get_differences(screenshots)
+            allowed_per_channel, allowed_different = fuzzy
+            self.logger.info("Allowed %s pixels different, maximum difference per channel %s" %
+                             ("-".join(str(item) for item in allowed_different),
+                              "-".join(str(item) for item in allowed_per_channel)))
+            equal = (allowed_per_channel[0] <= max_per_channel <= allowed_per_channel[1] and
+                     allowed_different[0] <= pixels_different <= allowed_different[1])
+        return equal if relation == "==" else not equal
+
+    def get_differences(self, screenshots):
+        lhs = Image.open(io.BytesIO(base64.b64decode(screenshots[0]))).convert("RGB")
+        rhs = Image.open(io.BytesIO(base64.b64decode(screenshots[1]))).convert("RGB")
+        diff = ImageChops.difference(lhs, rhs)
+        minimal_diff = diff.crop(diff.getbbox())
+        mask = minimal_diff.convert("L", dither=None)
+        stat = ImageStat.Stat(minimal_diff, mask)
+        per_channel = max(item[1] for item in stat.extrema)
+        count = stat.count[0]
+        self.logger.info("Found %s pixels different, maximum difference per channel %s" %
+                         (count, per_channel))
+        return per_channel, count
 
     def run_test(self, test):
         viewport_size = test.viewport_size
@@ -319,6 +342,7 @@
             screenshots = [None, None]
 
             nodes, relation = stack.pop()
+            fuzzy = self.get_fuzzy(test, nodes, relation)
 
             for i, node in enumerate(nodes):
                 success, data = self.get_hash(node, viewport_size, dpi)
@@ -327,7 +351,8 @@
 
                 hashes[i], screenshots[i] = data
 
-            if self.is_pass(hashes[0], hashes[1], relation):
+            if self.is_pass(hashes, screenshots, relation, fuzzy):
+                fuzzy = self.get_fuzzy(test, nodes, relation)
                 if nodes[1].references:
                     stack.extend(list(((nodes[1], item[0]), item[1]) for item in reversed(nodes[1].references)))
                 else:
@@ -352,6 +377,25 @@
                 "message": "\n".join(self.message),
                 "extra": {"reftest_screenshots": log_data}}
 
+    def get_fuzzy(self, root_test, test_nodes, relation):
+        full_key = tuple([item.url for item in test_nodes] + [relation])
+        ref_only_key = test_nodes[1].url
+
+        fuzzy_override = root_test.fuzzy_override
+        fuzzy = test_nodes[0].fuzzy
+
+        sources = [fuzzy_override, fuzzy]
+        keys = [full_key, ref_only_key, None]
+        value = None
+        for source in sources:
+            for key in keys:
+                if key in source:
+                    value = source[key]
+                    break
+            if value:
+                break
+        return value
+
     def retake_screenshot(self, node, viewport_size, dpi):
         success, data = self.executor.screenshot(node, viewport_size, dpi)
         if not success:
diff --git a/tools/wptrunner/wptrunner/executors/executormarionette.py b/tools/wptrunner/wptrunner/executors/executormarionette.py
index b70f0ed..f9fd97b 100644
--- a/tools/wptrunner/wptrunner/executors/executormarionette.py
+++ b/tools/wptrunner/wptrunner/executors/executormarionette.py
@@ -846,7 +846,7 @@
         return screenshot
 
 
-class InternalRefTestImplementation(object):
+class InternalRefTestImplementation(RefTestImplementation):
     def __init__(self, executor):
         self.timeout_multiplier = executor.timeout_multiplier
         self.executor = executor
@@ -870,7 +870,7 @@
         pass
 
     def run_test(self, test):
-        references = self.get_references(test)
+        references = self.get_references(test, test)
         timeout = (test.timeout * 1000) * self.timeout_multiplier
         rv = self.executor.protocol.marionette._send_message("reftest:run",
                                                              {"test": self.executor.test_url(test),
@@ -881,10 +881,11 @@
                                                               "height": 600})["value"]
         return rv
 
-    def get_references(self, node):
+    def get_references(self, root_test, node):
         rv = []
         for item, relation in node.references:
-            rv.append([self.executor.test_url(item), self.get_references(item), relation])
+            rv.append([self.executor.test_url(item), self.get_references(root_test, item), relation,
+                       {"fuzzy": self.get_fuzzy(root_test, [node, item], relation)}])
         return rv
 
     def teardown(self):
diff --git a/tools/wptrunner/wptrunner/manifestexpected.py b/tools/wptrunner/wptrunner/manifestexpected.py
index 80284bd..fb3ef62 100644
--- a/tools/wptrunner/wptrunner/manifestexpected.py
+++ b/tools/wptrunner/wptrunner/manifestexpected.py
@@ -1,5 +1,6 @@
 import os
 import urlparse
+from collections import deque
 
 from wptmanifest.backends import static
 from wptmanifest.backends.static import ManifestItem
@@ -97,6 +98,105 @@
     return rv
 
 
+def fuzzy_prop(node):
+    """Fuzzy reftest match
+
+    This can either be a list of strings or a single string. When a list is
+    supplied, the format of each item matches the description below.
+
+    The general format is
+    fuzzy = [key ":"] <prop> ";" <prop>
+    key = <test name> [reftype <reference name>]
+    reftype = "==" | "!="
+    prop = [propName "=" ] range
+    propName = "maxDifference" | "totalPixels"
+    range = <digits> ["-" <digits>]
+
+    So for example:
+      maxDifference=10;totalPixels=10-20
+
+      specifies that for any test/ref pair for which no other rule is supplied,
+      there must be a maximum pixel difference of exactly 10, and between 10 and
+      20 total pixels different.
+
+      test.html==ref.html:10;20
+
+      specifies that for an equality comparison between test.html and ref.html,
+      resolved relative to the test path, there can be a maximum difference
+      of 10 in the pixel value for any channel and 20 pixels total difference.
+
+      ref.html:10;20
+
+      is just like the above but applies to any comparison involving ref.html
+      on the right hand side.
+
+    The return format is [(key, (maxDifferenceRange, totalPixelsRange))], where
+    the key is either None where no specific reference is specified, the reference
+    name where there is only one component or a tuple (test, ref, reftype) when the
+    exact comparison is specified. maxDifferenceRange and totalPixelsRange are tuples
+    of integers indicating the inclusive range of allowed values.
+"""
+    rv = []
+    args = ["maxDifference", "totalPixels"]
+    try:
+        value = node.get("fuzzy")
+    except KeyError:
+        return rv
+    if not isinstance(value, list):
+        value = [value]
+    for item in value:
+        if not isinstance(item, (str, unicode)):
+            rv.append(item)
+            continue
+        parts = item.rsplit(":", 1)
+        if len(parts) == 1:
+            key = None
+            fuzzy_values = parts[0]
+        else:
+            key, fuzzy_values = parts
+            for reftype in ["==", "!="]:
+                if reftype in key:
+                    key = key.split(reftype)
+                    key.append(reftype)
+                    key = tuple(key)
+        ranges = fuzzy_values.split(";")
+        if len(ranges) != 2:
+            raise ValueError("Malformed fuzzy value %s" % item)
+        arg_values = {None: deque()}
+        for range_str_value in ranges:
+            if "=" in range_str_value:
+                name, range_str_value = [part.strip()
+                                         for part in range_str_value.split("=", 1)]
+                if name not in args:
+                    raise ValueError("%s is not a valid fuzzy property" % name)
+                if arg_values.get(name):
+                    raise ValueError("Got multiple values for argument %s" % name)
+            else:
+                name = None
+            if "-" in range_str_value:
+                range_min, range_max = range_str_value.split("-")
+            else:
+                range_min = range_str_value
+                range_max = range_str_value
+            try:
+                range_value = tuple(int(item.strip()) for item in (range_min, range_max))
+            except ValueError:
+                raise ValueError("Fuzzy value %s must be a range of integers" % range_str_value)
+            if name is None:
+                arg_values[None].append(range_value)
+            else:
+                arg_values[name] = range_value
+        range_values = []
+        for arg_name in args:
+            if arg_values.get(arg_name):
+                value = arg_values.pop(arg_name)
+            else:
+                value = arg_values[None].popleft()
+            range_values.append(value)
+        rv.append((key, tuple(range_values)))
+    return rv
+
+
 class ExpectedManifest(ManifestItem):
     def __init__(self, name, test_path, url_base):
         """Object representing all the tests in a particular manifest
@@ -183,6 +283,10 @@
     def lsan_max_stack_depth(self):
         return int_prop("lsan-max-stack-depth", self)
 
+    @property
+    def fuzzy(self):
+        return fuzzy_prop(self)
+
 
 class DirectoryManifest(ManifestItem):
     @property
@@ -229,6 +333,11 @@
     def lsan_max_stack_depth(self):
         return int_prop("lsan-max-stack-depth", self)
 
+    @property
+    def fuzzy(self):
+        return fuzzy_prop(self)
+
+
 class TestNode(ManifestItem):
     def __init__(self, name):
         """Tree node associated with a particular test in a manifest
@@ -301,6 +410,10 @@
     def lsan_max_stack_depth(self):
         return int_prop("lsan-max-stack-depth", self)
 
+    @property
+    def fuzzy(self):
+        return fuzzy_prop(self)
+
     def append(self, node):
         """Add a subtest to the current test
 
diff --git a/tools/wptrunner/wptrunner/tests/test_manifestexpected.py b/tools/wptrunner/wptrunner/tests/test_manifestexpected.py
new file mode 100644
index 0000000..9355710
--- /dev/null
+++ b/tools/wptrunner/wptrunner/tests/test_manifestexpected.py
@@ -0,0 +1,38 @@
+import os
+import sys
+from io import BytesIO
+
+import pytest
+
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", ".."))
+
+from wptrunner import manifestexpected
+
+
+@pytest.mark.parametrize("fuzzy, expected", [
+    (b"ref.html:1;200", [("ref.html", ((1, 1), (200, 200)))]),
+    (b"ref.html:0-1;100-200", [("ref.html", ((0, 1), (100, 200)))]),
+    (b"0-1;100-200", [(None, ((0, 1), (100, 200)))]),
+    (b"maxDifference=1;totalPixels=200", [(None, ((1, 1), (200, 200)))]),
+    (b"totalPixels=200;maxDifference=1", [(None, ((1, 1), (200, 200)))]),
+    (b"totalPixels=200;1", [(None, ((1, 1), (200, 200)))]),
+    (b"maxDifference=1;200", [(None, ((1, 1), (200, 200)))]),
+    (b"test.html==ref.html:maxDifference=1;totalPixels=200",
+     [((u"test.html", u"ref.html", "=="), ((1, 1), (200, 200)))]),
+    (b"test.html!=ref.html:maxDifference=1;totalPixels=200",
+     [((u"test.html", u"ref.html", "!="), ((1, 1), (200, 200)))]),
+    (b"[test.html!=ref.html:maxDifference=1;totalPixels=200, test.html==ref1.html:maxDifference=5-10;100]",
+     [((u"test.html", u"ref.html", "!="), ((1, 1), (200, 200))),
+      ((u"test.html", u"ref1.html", "=="), ((5,10), (100, 100)))]),
+])
+def test_fuzzy(fuzzy, expected):
+    data = """
+[test.html]
+  fuzzy: %s""" % fuzzy
+    f = BytesIO(data)
+    manifest = manifestexpected.static.compile(f,
+                                               {},
+                                               data_cls_getter=manifestexpected.data_cls_getter,
+                                               test_path="test/test.html",
+                                               url_base="/")
+    assert manifest.get_test("/test/test.html").fuzzy == expected
diff --git a/tools/wptrunner/wptrunner/tests/test_wpttest.py b/tools/wptrunner/wptrunner/tests/test_wpttest.py
index f463dd7..6daa59b 100644
--- a/tools/wptrunner/wptrunner/tests/test_wpttest.py
+++ b/tools/wptrunner/wptrunner/tests/test_wpttest.py
@@ -6,6 +6,7 @@
 
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))
 
+from manifest import manifest as wptmanifest
 from manifest.item import TestharnessTest
 from wptrunner import manifestexpected, wpttest
 
@@ -44,6 +45,11 @@
   lsan-max-stack-depth: 42
 """
 
+test_fuzzy = """\
+[fuzzy.html]
+  fuzzy: fuzzy-ref.html:1;200
+"""
+
 
 testharness_test = """<script src="/resources/testharness.js"></script>
 <script src="/resources/testharnessreport.js"></script>"""
@@ -139,3 +145,26 @@
     test_obj = wpttest.from_manifest(tests, test, inherit_metadata, test_metadata.get_test(test.id))
 
     assert test_obj.lsan_max_stack_depth == 42
+
+
+def test_metadata_fuzzy():
+    manifest_data = {
+        "items": {"reftest": {"a/fuzzy.html": [["/a/fuzzy.html",
+                                                [["/a/fuzzy-ref.html", "=="]],
+                                                {"fuzzy": [[["/a/fuzzy.html", '/a/fuzzy-ref.html', '=='],
+                                                            [[2, 3], [10, 15]]]]}]]}},
+        "paths": {"a/fuzzy.html": ["0"*40, "reftest"]},
+        "version": wptmanifest.CURRENT_VERSION,
+        "url_base": "/"}
+    manifest = wptmanifest.Manifest.from_json(".", manifest_data)
+    test_metadata = manifestexpected.static.compile(BytesIO(test_fuzzy),
+                                                    {},
+                                                    data_cls_getter=manifestexpected.data_cls_getter,
+                                                    test_path="a/fuzzy.html",
+                                                    url_base="/")
+
+    test = next(manifest.iterpath("a/fuzzy.html"))
+    test_obj = wpttest.from_manifest(manifest, test, [], test_metadata.get_test(test.id))
+
+    assert test_obj.fuzzy == {('/a/fuzzy.html', '/a/fuzzy-ref.html', '=='): [[2, 3], [10, 15]]}
+    assert test_obj.fuzzy_override == {'/a/fuzzy-ref.html': ((1, 1), (200, 200))}
diff --git a/tools/wptrunner/wptrunner/wptmanifest/backends/conditional.py b/tools/wptrunner/wptrunner/wptmanifest/backends/conditional.py
index 4eb292e..7ad3575 100644
--- a/tools/wptrunner/wptrunner/wptmanifest/backends/conditional.py
+++ b/tools/wptrunner/wptrunner/wptmanifest/backends/conditional.py
@@ -341,6 +341,8 @@
             yield item
 
     def remove_value(self, key, value):
+        if key not in self._data:
+            return
         try:
             self._data[key].remove(value)
         except ValueError:
diff --git a/tools/wptrunner/wptrunner/wpttest.py b/tools/wptrunner/wptrunner/wpttest.py
index dc1c6b6..6a4fa4f 100644
--- a/tools/wptrunner/wptrunner/wpttest.py
+++ b/tools/wptrunner/wptrunner/wpttest.py
@@ -1,5 +1,6 @@
 import os
 import subprocess
+import urlparse
 from collections import defaultdict
 
 from wptmanifest.parser import atoms
@@ -279,12 +280,12 @@
     @property
     def prefs(self):
         prefs = {}
-        for meta in self.itermeta():
+        for meta in reversed(list(self.itermeta())):
             meta_prefs = meta.prefs
-            prefs.update(meta_prefs)
             if atom_reset in meta_prefs:
-                del prefs[atom_reset]
-                break
+                del meta_prefs[atom_reset]
+                prefs = {}
+            prefs.update(meta_prefs)
         return prefs
 
     def expected(self, subtest=None):
@@ -359,7 +360,7 @@
     test_type = "reftest"
 
     def __init__(self, tests_root, url, inherit_metadata, test_metadata, references,
-                 timeout=None, path=None, viewport_size=None, dpi=None, protocol="http"):
+                 timeout=None, path=None, viewport_size=None, dpi=None, fuzzy=None, protocol="http"):
         Test.__init__(self, tests_root, url, inherit_metadata, test_metadata, timeout,
                       path, protocol)
 
@@ -370,6 +371,7 @@
         self.references = references
         self.viewport_size = viewport_size
         self.dpi = dpi
+        self._fuzzy = fuzzy or {}
 
     @classmethod
     def from_manifest(cls,
@@ -398,7 +400,8 @@
                    path=manifest_test.path,
                    viewport_size=manifest_test.viewport_size,
                    dpi=manifest_test.dpi,
-                   protocol="https" if hasattr(manifest_test, "https") and manifest_test.https else "http")
+                   protocol="https" if hasattr(manifest_test, "https") and manifest_test.https else "http",
+                   fuzzy=manifest_test.fuzzy)
 
         nodes[url] = node
 
@@ -454,6 +457,30 @@
     def keys(self):
         return ("reftype", "refurl")
 
+    @property
+    def fuzzy(self):
+        return self._fuzzy
+
+    @property
+    def fuzzy_override(self):
+        values = {}
+        for meta in reversed(list(self.itermeta(None))):
+            value = meta.fuzzy
+            if not value:
+                continue
+            if atom_reset in value:
+                value.remove(atom_reset)
+                values = {}
+            for key, data in value:
+                if isinstance(key, (tuple, list)):
+                    key = (urlparse.urljoin(self.url, key[0]),
+                           urlparse.urljoin(self.url, key[1])) + tuple(key[2:])
+                elif key:
+                    # Key is just a relative url to a ref
+                    key = urlparse.urljoin(self.url, key)
+                values[key] = data
+        return values
+
 
 class WdspecTest(Test):