 Test Metadata
 =============

 Directory Layout
 ----------------

 Metadata files must be stored under the ``metadata`` directory passed
 to the test runner. The directory layout follows that of
 web-platform-tests with each test source path having a corresponding
 metadata file. Because the metadata path is based on the source file
 path, files that generate multiple URLs (e.g. tests with multiple
 variants, or multi-global tests generated from an ``any.js`` input
 file) share the same metadata file for all their corresponding
 tests. The metadata path under the ``metadata`` directory is the same
 as the source path under the ``tests`` directory, with an additional
 ``.ini`` suffix.

 For example a test with URL::

   /spec/section/file.html?query=param

 generated from a source file with path::

   <tests root>/spec/section/file.html

 would have a metadata file ::

   <metadata root>/spec/section/file.html.ini

 As an optimisation, files which produce only default results
 (i.e. ``PASS`` or ``OK``), and which don't have any other associated
 metadata, don't require a corresponding metadata file.
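
 The mapping from source path to metadata path can be sketched in
 Python as follows. This is a minimal illustration, not wptrunner's own
 code: the function name ``metadata_path`` and the root directories
 used when calling it are hypothetical.

 .. code-block:: python

     from pathlib import PurePosixPath


     def metadata_path(source_path, tests_root, metadata_root):
         """Mirror a test source path under the metadata root, adding .ini."""
         # Path of the source file relative to the tests root.
         relative = PurePosixPath(source_path).relative_to(tests_root)
         # Same relative path under the metadata root, with an extra suffix.
         return str(PurePosixPath(metadata_root) / (str(relative) + ".ini"))

 Because the function starts from the source path rather than the test
 URL, every URL generated from one source file (for example each query
 variant) resolves to the same metadata file.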

 Directory Metadata
 ~~~~~~~~~~~~~~~~~~

 In addition to per-test metadata, default metadata can be applied to
 all the tests in a given source location, using a ``__dir__.ini``
 metadata file. For example, to apply metadata to all tests under
 ``<tests root>/spec/``, add the metadata in
 ``<metadata root>/spec/__dir__.ini``.

 Metadata Format
 ---------------

 The format of the metadata files is based on the ini format. Files are
 divided into sections, each (apart from the root section) having a
 heading enclosed in square brackets. Within each section are key-value
 pairs. There are several notable differences from standard .ini files,
 however:

  * Sections may be hierarchically nested, with significant whitespace
    indicating nesting depth.

  * Only ``:`` is valid as a key/value separator.

 A simple example of a metadata file is::

   root_key: root_value

   [section]
     section_key: section_value

     [subsection]
        subsection_key: subsection_value

   [another_section]
     another_key: [list, value]
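
 To make the nesting rule concrete, the following toy parser reads the
 structure shown above. It is only a sketch under simplifying
 assumptions: it is not the parser wptrunner uses, it keeps every value
 as an uninterpreted string, and it ignores conditional values and
 escaping. The name ``parse_metadata`` and the dictionary layout are
 hypothetical.

 .. code-block:: python

     def parse_metadata(text):
         """Toy parser: indentation-nested sections holding key: value pairs."""
         root = {"name": None, "keys": {}, "sections": []}
         # Stack of (indent of the section heading, section node).
         stack = [(-1, root)]
         for raw in text.splitlines():
             if not raw.strip():
                 continue
             indent = len(raw) - len(raw.lstrip(" "))
             line = raw.strip()
             # Close every open section that is not less indented than this line.
             while stack[-1][0] >= indent:
                 stack.pop()
             parent = stack[-1][1]
             if line.startswith("[") and line.endswith("]"):
                 node = {"name": line[1:-1], "keys": {}, "sections": []}
                 parent["sections"].append(node)
                 stack.append((indent, node))
             else:
                 # Only ":" separates a key from its value.
                 key, _, value = line.partition(":")
                 parent["keys"][key.strip()] = value.strip()
         return root

 Applied to the example file, ``root_key`` lands in the root section,
 ``subsection`` becomes a child of ``section``, and ``another_section``
 is a sibling of ``section`` because its heading returns to the outer
 indentation level.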

 Conditional Values
 ~~~~~~~~~~~~~~~~~~

 In order to support values that depend on some external data, the
 right hand side of a key/value pair can take a set of conditionals
 rather than a plain value. These values are placed on a new line
 following the key, with significant indentation. Conditional values
 are prefixed with ``if`` and terminated with a colon, for example::

   key:
     if cond1: value1
     if cond2: value2
     value3

 In this example, the value associated with ``key`` is determined by
 first evaluating ``cond1`` against external data. If that is true,
 ``key`` is assigned the value ``value1``, otherwise ``cond2`` is
 evaluated in the same way. If both ``cond1`` and ``cond2`` are false,
 the unconditional ``value3`` is used.

 Conditions themselves use a Python-like expression syntax. Operands
 can either be variables, corresponding to data passed in, numbers
 (integer or floating point; exponential notation is not supported) or
 quote-delimited strings. Equality is tested using ``==`` and
 inequality by ``!=``. The operators ``and``, ``or`` and ``not`` are
 used in the expected way. Parentheses can also be used for
 grouping. For example::

   key:
     if (a == 2 or a == 3) and b == "abc": value1
     if a == 1 or b != "abc": value2
     value3

 Here ``a`` and ``b`` are variables, the value of which will be
 supplied when the metadata is used.
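
 The first-match-wins evaluation described above can be sketched in
 Python. Since the condition syntax is Python-like, this illustration
 simply hands each condition to Python's ``eval`` with the supplied
 variables; that is a stand-in chosen for brevity, not how wptrunner
 evaluates conditions, and the function name ``resolve`` is
 hypothetical.

 .. code-block:: python

     def resolve(conditionals, default, variables):
         """Return the value of the first true condition, else the default.

         conditionals is a list of (condition string, value) pairs in file order.
         """
         for condition, value in conditionals:
             # Built-ins are disabled so only the supplied variables are visible.
             # A condition naming an unknown variable raises NameError here.
             if eval(condition, {"__builtins__": {}}, dict(variables)):
                 return value
         return default


     conditionals = [
         ('(a == 2 or a == 3) and b == "abc"', "value1"),
         ('a == 1 or b != "abc"', "value2"),
     ]
     chosen = resolve(conditionals, "value3", {"a": 3, "b": "abc"})

 With ``a = 3`` and ``b = "abc"`` the first condition holds, so
 ``chosen`` is ``"value1"``. With ``a = 5`` and ``b = "abc"`` neither
 condition holds and the unconditional ``"value3"`` is returned.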

 Web-Platform-Tests Metadata
 ---------------------------

 When used for expectation data, metadata files have the following format:

  * A section per test URL provided by the corresponding source file,
    with the section heading being the part of the test URL following
    the last ``/`` in the path (this allows multiple tests in a single
    metadata file with the same path part of the URL, but different
    query parts). This may be omitted if there's no non-default
    metadata for the test.

  * A subsection per subtest, with the heading being the title of the
    subtest. This may be omitted if there's no non-default metadata for
    the subtest.

  * The following known keys:

    :expected:
       The expectation value or values of each (sub)test. In
       the case this value is a list, the first value represents the
       typical expected test outcome, and subsequent values indicate
       known intermittent outcomes e.g. ``expected: [PASS, ERROR]``
       would indicate a test that usually passes but has a known-flaky
       ``ERROR`` outcome.

    :disabled:
      Any value apart from the special value ``@False``
      indicates that the (sub)test is disabled and should either not be
      run (for tests) or that its results should be ignored (subtests).

    :restart-after:
      Any value apart from the special value ``@False``
      indicates that the runner should restart the browser after running
      this test (e.g. to clear out unwanted state).

    :fuzzy:
      Used for reftests. This is interpreted as a list of entries, each
      using the syntax of the ``<meta name=fuzzy>`` content value: an
      optional reference identifier followed by a colon, then a range
      indicating the maximum permitted pixel difference per channel, then
      a semicolon, then a range indicating the maximum permitted total
      number of differing pixels. The reference identifier is either a
      single relative URL, resolved against the base test URL, in which
      case the fuzziness applies to any comparison with that URL, or
      takes the form lhs URL, comparison, rhs URL, in which case the
      fuzziness applies only to comparisons involving that specific
      pair of URLs. Some illustrative examples are given below.

    :implementation-status:
      One of the values ``implementing``,
      ``not-implementing`` or ``backlog``. This is used in conjunction
      with the ``--skip-implementation-status`` command line argument to
      ``wptrunner`` to ignore certain features where running the test is
      low value.

    :tags:
      A list of labels associated with a given test that can be
      used in conjunction with the ``--tag`` command line argument to
      ``wptrunner`` for test selection.

    In addition there are extra arguments which are currently tied to
    specific implementations. For example Gecko-based browsers support
    ``min-asserts``, ``max-asserts``, ``prefs``, ``lsan-disabled``,
    ``lsan-allowed``, ``lsan-max-stack-depth``, ``leak-allowed``, and
    ``leak-threshold`` properties.

  * Variables taken from the ``RunInfo`` data which describe the
    configuration of the test run. Common properties include:

    :product: A string giving the name of the browser under test
    :browser_channel: A string giving the release channel of the browser under test
    :debug: A Boolean indicating whether the build is a debug build
    :os: A string indicating the operating system
    :version: A string indicating the particular version of that operating system
    :processor: A string indicating the processor architecture.

    This information is typically provided by :py:mod:`mozinfo`, but
    different environments may add additional information, and not all
    the properties above are guaranteed to be present in all
    environments. The definitive list of available properties for a
    specific run may be determined by looking at the ``run_info`` key
    in the ``wptreport.json`` output for the run.

  * Top level keys are taken as defaults for the whole file. So, for
    example, a top level key with ``expected: FAIL`` would indicate
    that all tests and subtests in the file are expected to fail,
    unless they have an ``expected`` key of their own.
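
 The interaction between list-valued ``expected`` keys and top-level
 defaults can be sketched as below. This is an illustration only: the
 function names are hypothetical, sections are modelled as plain
 dictionaries, and the sketch covers just the fallback from a
 (sub)test's own key to the top-level key that is described above.

 .. code-block:: python

     def lookup_expected(top_level_keys, own_keys, default="PASS"):
         """Return (typical status, set of known intermittent statuses)."""
         # A (sub)test's own key wins; otherwise the file-wide default applies.
         value = own_keys.get("expected", top_level_keys.get("expected", default))
         statuses = value if isinstance(value, list) else [value]
         return statuses[0], set(statuses[1:])


     def is_unexpected(status, top_level_keys, own_keys, default="PASS"):
         """True if status is neither the typical nor an intermittent outcome."""
         primary, intermittent = lookup_expected(top_level_keys, own_keys, default)
         return status != primary and status not in intermittent

 For a subtest with ``expected: [PASS, TIMEOUT]``, both ``PASS`` and
 ``TIMEOUT`` count as expected outcomes, while a subtest with no key of
 its own in a file whose top level sets ``expected: FAIL`` is expected
 to fail.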

 A simple example metadata file might look like::

   [test.html?variant=basic]
     type: testharness

     [Test something unsupported]
        expected: FAIL

     [Test with intermittent statuses]
        expected: [PASS, TIMEOUT]

   [test.html?variant=broken]
     expected: ERROR

   [test.html?variant=unstable]
     disabled: http://test.bugs.example.org/bugs/12345

 A more complex metadata file with conditional properties might be::

   [canvas_test.html]
     expected:
       if os == "mac": FAIL
       if os == "windows" and version == "XP": FAIL
       PASS

 Note that ``PASS`` in the above works, but is unnecessary since it's
 the default expected result.

 A metadata file with fuzzy reftest values might be::

   [reftest.html]
     fuzzy: [10;200, ref1.html:20;200-300, subtest1.html==ref2.html:10-15;20]

 In this case the default fuzziness for any comparison would be to
 require a maximum difference per channel of less than or equal to 10
 and less than or equal to 200 total pixels different. For any
 comparison involving ref1.html on the right hand side, the limits
 would instead be a difference per channel not more than 20 and a total
 difference count of not less than 200 and not more than 300. For the
 specific comparison ``subtest1.html == ref2.html`` (both resolved against
 the test URL) these limits would instead be 10 to 15 and 0 to 20,
 respectively.
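
 The reading of each entry given above can be expressed as a short
 Python sketch. It handles only the forms shown in this example (an
 optional reference identifier, then two ranges separated by a
 semicolon, where a single number ``n`` means the range 0 to ``n``);
 the function names are hypothetical and this is not wptrunner's
 parser.

 .. code-block:: python

     def parse_range(text):
         """Parse "n" as (0, n) and "m-n" as (m, n)."""
         low, dash, high = text.partition("-")
         return (int(low), int(high)) if dash else (0, int(low))


     def parse_fuzzy(entry):
         """Return (reference or None, per-channel range, total-pixels range)."""
         # The reference identifier, if present, precedes the last colon.
         reference, colon, ranges = entry.strip().rpartition(":")
         per_channel, _, total = ranges.partition(";")
         return (reference if colon else None,
                 parse_range(per_channel),
                 parse_range(total))


     entries = [parse_fuzzy(item) for item in
                "10;200, ref1.html:20;200-300, "
                "subtest1.html==ref2.html:10-15;20".split(",")]

 The three parsed entries are ``(None, (0, 10), (0, 200))``,
 ``("ref1.html", (0, 20), (200, 300))`` and
 ``("subtest1.html==ref2.html", (10, 15), (0, 20))``, matching the
 limits described in the preceding paragraph.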

 Generating Expectation Files
 ----------------------------

 wpt provides the ``wpt update-expectations`` command to generate
 expectation files from the results of a set of test runs. The basic
 syntax for this is::

   ./wpt update-expectations [options] [logfile]...

 Each ``logfile`` is a wptreport log file from a previous run. These
 can be generated from wptrunner using the ``--log-wptreport`` option
 e.g. ``--log-wptreport=wptreport.json``.
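
 A wptreport log also records the run configuration under its
 ``run_info`` key, so it can be inspected to see which variables are
 available for conditions. The following is a small sketch; the
 function name ``available_properties`` is hypothetical, and the file
 name passed to it would be whatever was given to ``--log-wptreport``.

 .. code-block:: python

     import json


     def available_properties(report_path):
         """Return the run_info mapping stored in a wptreport JSON file."""
         with open(report_path, encoding="utf-8") as report_file:
             report = json.load(report_file)
         return report["run_info"]

 Calling ``available_properties("wptreport.json")`` returns a
 dictionary whose keys are the property names that conditions in
 metadata files may refer to for that run.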

 ``update-expectations`` takes several options:

 --full  Overwrite all the expectation data for any tests that have a
         result in the passed log files, not just data for the same run
         configuration.

 --disable-intermittent  When updating test results, disable tests that
                         have inconsistent results across many
                         runs. The option can be followed by a message
                         giving the reason why the test is disabled. If
                         no message is provided, ``unstable`` is the
                         default text.

 --update-intermittent  When this option is used, the ``expected`` key
                        stores expected intermittent statuses in
                        addition to the primary expected status. If
                        there is more than one status, it appears as a
                        list. The default behaviour of this option is to
                        retain any existing intermittent statuses in the
                        list unless ``--remove-intermittent`` is
                        specified.

 --remove-intermittent  This option is used in conjunction with
                        ``--update-intermittent``.  When the
                        ``expected`` statuses are updated, any obsolete
                        intermittent statuses that did not occur in the
                        specified log files are removed from the list.
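
 The combined effect of ``--update-intermittent`` and
 ``--remove-intermittent`` on a single ``expected`` value can be
 sketched as follows. This is an illustration and not the tool's
 implementation. In particular, choosing the most frequently observed
 status as the primary one is an assumption made for this sketch, and
 the function name ``update_expected`` is hypothetical.

 .. code-block:: python

     from collections import Counter


     def update_expected(existing, observed, remove_intermittent=False):
         """Merge observed statuses into an existing ``expected`` value."""
         if not isinstance(existing, list):
             existing = [existing]
         counts = Counter(observed)
         # Assumption for this sketch: the most common status becomes primary.
         primary = counts.most_common(1)[0][0]
         statuses = [primary] + [s for s in counts if s != primary]
         if not remove_intermittent:
             # Keep previously recorded intermittent statuses (all but the first).
             statuses += [s for s in existing[1:] if s not in statuses]
         # A single status is written as a plain value, several as a list.
         return statuses if len(statuses) > 1 else statuses[0]

 Starting from ``[PASS, ERROR]`` with observed results ``PASS``,
 ``PASS`` and ``TIMEOUT``, the sketch yields ``[PASS, TIMEOUT, ERROR]``
 by default and ``[PASS, TIMEOUT]`` when ``remove_intermittent`` is
 set, since ``ERROR`` did not occur in the observed results.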

 Property Configuration
 ~~~~~~~~~~~~~~~~~~~~~~

 In cases where the expectation depends on the run configuration,
 ``wpt update-expectations`` is able to generate conditional values.
 Because the relevant variables depend on the range of configurations
 that need to be covered, it's necessary to specify the list of
 configuration variables that should be used. This is done using a
 JSON file that can be specified with the ``--properties-file`` command
 line argument to ``wpt update-expectations``. When this isn't supplied
 the defaults from ``<metadata root>/update_properties.json`` are used,
 if present.

 Properties File Format
 ++++++++++++++++++++++

 The file is JSON formatted with two top-level keys:

 :``properties``:
   A list of property names to consider for conditionals
   e.g. ``["product", "os"]``.

 :``dependents``:
   An optional dictionary containing properties that
   should only be used as "tie-breakers" when differentiating based on a
   specific top-level property has failed. This is useful when the
   dependent property is always more specific than the top-level
   property, but less understandable when used directly. For example the
   ``version`` property covering different OS versions is typically
   unique amongst different operating systems, but using it when the
   ``os`` property would do instead is likely to produce metadata that's
   too specific to the current configuration and more difficult to
   read. But where there are multiple versions of the same operating
   system with different results, it can be necessary. So specifying
   ``{"os": ["version"]}`` as a dependent property means that the
   ``version`` property will only be used if the condition already
   contains the ``os`` property and further conditions are required to
   separate the observed results.

 So an example ``update_properties.json`` file might look like::

   {
     "properties": ["product", "os"],
     "dependents": {"product": ["browser_channel"], "os": ["version"]}
   }
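
 The tie-breaker role of ``dependents`` can be illustrated with a
 deliberately simplified Python sketch. It is not the algorithm used by
 ``wpt update-expectations``: it only tries single top-level
 properties, then a top-level property paired with one of its
 dependents. The function names and the shape of the ``runs`` input
 (pairs of a run-info dictionary and an observed status) are
 hypothetical.

 .. code-block:: python

     def separates(runs, props):
         """True if runs with equal values for props never differ in status."""
         seen = {}
         for run_info, status in runs:
             key = tuple(run_info.get(prop) for prop in props)
             if seen.setdefault(key, status) != status:
                 return False
         return True


     def pick_properties(runs, config):
         """Choose properties for a condition, or None if none suffice."""
         if len({status for _, status in runs}) <= 1:
             return []  # Every run agrees, so no condition is needed.
         for prop in config["properties"]:
             if separates(runs, [prop]):
                 return [prop]
         # Dependents are considered only together with their parent property.
         for prop in config["properties"]:
             for dependent in config.get("dependents", {}).get(prop, []):
                 if separates(runs, [prop, dependent]):
                     return [prop, dependent]
         return None

 With the example configuration, results that differ only between two
 versions of the same operating system lead the sketch to pick ``os``
 together with ``version``, whereas results that differ between
 operating systems are separated by ``os`` alone.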

 Examples
 ~~~~~~~~

 Update all the expectations from a set of cross-platform test runs::

   wpt update-expectations --full osx.log linux.log windows.log

 Add expectation data for some new tests that are expected to be
 platform-independent::

   wpt update-expectations tests.log

 Why a Custom Format?
 --------------------

 Given the use of the metadata files in CI systems, it was desirable to
 have something with the following properties:

  * Human readable

  * Human editable

  * Machine readable / writable

  * Capable of storing key-value pairs

  * Suitable for storing in a version control system (i.e. text-based)

 The need for different results per platform means either having a
 separate expectation file for each platform, or having a way to
 express conditional values within a single file. The former would be
 rather cumbersome for humans updating the expectation files, so the
 latter approach has been adopted, leading to the requirement:

  * Capable of storing result values that are conditional on the platform.

 There are few extant formats that clearly meet these requirements. In
 particular although conditional properties could be expressed in many
 existing formats, the representation would likely be cumbersome and
 error-prone for hand authoring. Therefore it was decided that a custom
 format offered the best tradeoffs given the requirements.