Internet Advertising Bureau
Technology Interoperability Subcommittee

WD-adreport-19970626

Internet Advertising Report Format

IAB TIS Working Draft WD-adreport-19970626

Latest version:
http://www.basswoodassoc.com/standards/WD-adreport.html
This version:
http://www.basswoodassoc.com/standards/WD-adreport-19970626.html
Author:
Tom Shields <tom.shields@basswoodassoc.com>

Status of this document

This is an IAB TIS Working Draft for review by IAB members and other interested parties. It is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to use IAB Working Drafts as reference material or to cite them as other than "work in progress".

Note: since working drafts are subject to frequent change, you are advised to reference the above URL, rather than the URLs for working drafts themselves.

Abstract

This document presents a standard format for reporting internet ad performance information.  The format is extensible, portable, and easy to parse into a variety of data stores.  The proposal is motivated by the need to capture and compare ad performance information independent of the software used to perform the ad delivery.

Introduction

Advertisers and agencies who plan internet ad campaigns require ad performance reports that are comparable across web sites.  Because many web sites use different ad delivery software, agencies are currently expending considerable manual effort reconciling these reports.  One fixed format report is not enough, because different campaigns require reporting of different kinds of information.  This proposal defines a self-describing report format that is extensible and flexible enough to report many kinds of ad performance information, but simple and strict enough to be easily parsed and converted into data stores for analysis and comparison.

This proposal is broken into two major sections.  The first is a description of the file format syntax, with emphasis on the encoding and ordering of information for easy parsing.  The second part is a "taxonomy" that describes the semantics of standard headers, columns, and report templates, to ensure the utility and comparability of the information.

The report format has the following design goals:

This proposal makes no attempt to define the performance measures used by advertisers, agencies, and ad delivery systems, except to provide for the reporting of them.  This proposal explicitly does not address the issue of transmission of this data, for example from a web site to an advertiser or agency.  This format does not define presentation information, and is designed to be readable, but not camera ready.

This work is distantly based on the W3C Extended Log File Format [Hallam96] working draft.

Format

An advertising report file contains a sequence of lines containing octets terminated by either the sequence LF or CRLF. Report generators should follow the line termination convention for the platform on which they are executed. Analyzers and converters must accept either form. Each line may contain either a directive or an entry. Directives record metadata about the report. Entries consist of a sequence of fields relating to an ad event or series of events. Fields are units of information, separated by whitespace.  In general, directives, attribute names, and field and template identifiers are case insensitive, while data should be considered case sensitive.

Format errors may indicate corrupt data or a non-conforming format generator.  A conforming IARF parser should reject any report file that contains format errors, and roll back any data stores to the state that preceded the start of parsing.  In some non-critical cases, such as a report file viewer, it may be acceptable to ignore corrupt lines and permit viewing uncorrupted data, but an error message indicating the file corruption should be shown.

Directives

Directives are lines beginning with the '#' character.  No whitespace is permitted between the '#' and the directive name. Directive data takes the form of attribute value pairs.  Defined attributes may occur in any order.  If an attribute occurs more than once, the last occurrence is taken as the proper value.  Parsers should ignore unknown directives and unknown attributes.  Every conforming report file must begin with the directive IARF, and contain other required directives as described below.
<directive> = "#" <name> *[ <whitespace> <nvpair> ] <end-of-line>
<name> = <alnum> *[ <alnum> | "-" ]
<nvpair> = <name> "=" <string>
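The directive grammar above can be read with a short regular-expression scanner. The following is a minimal sketch, not a reference implementation: the function name parse_directive is hypothetical, attribute values are returned in raw form (quotes and escapes left in place), and the Remark and Rem directives are special-cased because they do not carry name=value pairs.

```python
import re

# Hypothetical helper names; the patterns mirror <name> and <nvpair> above.
_NAME = re.compile(r'#([A-Za-z0-9][A-Za-z0-9-]*)')
_ATTR = re.compile(
    r'[ \t]+([A-Za-z0-9][A-Za-z0-9-]*)='
    r'("(?:[^"\\]|""|\\x[0-9A-F]{2})*"|[^\s"]+)'
)

def parse_directive(line):
    """Return (name, attributes) for one directive line.

    Names are lower-cased because they are case insensitive; values are
    returned raw, still quoted and escaped.
    """
    line = line.rstrip()
    m = _NAME.match(line)
    if m is None:
        raise ValueError("not a directive: %r" % line)
    name = m.group(1).lower()
    if name in ("remark", "rem"):
        return name, {}                      # free text, ignored by tools
    attrs = {}
    pos = m.end()
    while pos < len(line):
        a = _ATTR.match(line, pos)
        if a is None:
            raise ValueError("malformed attribute in: %r" % line)
        attrs[a.group(1).lower()] = a.group(2)   # last occurrence wins
        pos = a.end()
    return name, attrs

name, attrs = parse_directive(
    '#Source Name="Content Provider" Domain=www.content.com GMT-Offset=-0800')
print(name, attrs)
```

Storing attributes in a dictionary gives the "last occurrence is taken as the proper value" rule for free, since a later assignment replaces an earlier one.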

Entries

Entries are analogous to database rows.  An entry consists of a sequence of fields, separated by whitespace.  All lines not beginning with '#' are considered entries.  If the first field of an entry begins with '#', the field must be surrounded by double quotes ("") to distinguish it from a directive.  Entries are not required in report files, although a report file with no entries may not convey much information.  Lines containing only whitespace should be ignored. If an entry contains too few or too many fields as defined by the Format directive, the file is non-conforming or corrupt, and parsing should halt.  No line continuation character is defined - entries must be completely contained on one line.
<entry> = [ <field> *[ <whitespace> <field> ]] <end-of-line>
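As an illustration of the entry rules above, the sketch below splits one entry line into raw field tokens and checks the field count against a list of field identifiers. The names split_fields and check_entry are hypothetical, and callers are assumed to skip whitespace-only lines before calling check_entry.

```python
def split_fields(line):
    """Split an entry line into raw field tokens (quotes are kept)."""
    line = line.rstrip("\r\n")
    fields, i, n = [], 0, len(line)
    while i < n:
        if line[i] in " \t":                 # field separator
            i += 1
            continue
        start = i
        if line[i] == '"':                   # quoted string field
            i += 1
            while True:
                if i >= n:
                    raise ValueError("unterminated quoted field")
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':
                        i += 2               # doubled quote: an escaped "
                        continue
                    i += 1                   # closing quote
                    break
                i += 1
            if i < n and line[i] not in " \t":
                raise ValueError("text follows closing quote")
        else:                                # unquoted field
            while i < n and line[i] not in " \t":
                i += 1
        fields.append(line[start:i])
    return fields

def check_entry(line, field_ids):
    """Map field identifiers to tokens; a wrong count halts parsing."""
    fields = split_fields(line)
    if len(fields) != len(field_ids):
        raise ValueError("expected %d fields, found %d"
                         % (len(field_ids), len(fields)))
    return dict(zip(field_ids, fields))

print(split_fields('"Ford Explorer" "explorer.gif"      10253 0  843'))
```

Scanning character by character keeps quoted fields containing whitespace in one piece, which a plain whitespace split would not.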

Fields

Fields are individual data entities analogous to database columns in a row.  Fields conform to one of the following data types, according to the field definition. Fields without definitions should be considered as data type <string>. Fields may not contain ASCII control characters unless they are encoded as described below. Because fields are separated by whitespace, fields must consist of at least one non-whitespace character.
<field> = <integer> | <float> | <uri> | <date> | <time> | <string>
<type-identifier> = "integer" | "float" | "uri" | "date" | "time" | "string"
<field-identifier> = <name>

<digit>        = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<lowalpha>     = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
                 "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
                 "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
<hialpha>      = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
<punct>        = "$" | "-" | "_" | "." | "+" | "!" | "*" | "(" | ")" |
                 "{" | "}" | "|" | "&" | "^" | "~" | "[" | "]" | "," |
                 "<" | ">" | "#" | "%" | ";" | "/" | "'" | "`" | "=" |
                 "?" | ":" | "@"
<8-bit>        = <0x80 - 0xFF>
<reserved>     = <"> | "\"
<wchar>        = <tab> | <space>
<whitespace>   = <wchar> *<wchar>
<alnum>        = <lowalpha> | <hialpha> | <digit>
<version>      = <digit> *<digit> "." <digit> *<digit> *[ "." <digit> *<digit> ]

Integer

<integer> = [ "-" ] <digit> *<digit>
Integers are represented as a sequence of digits, base 10.

Float

<float> = [ "-" ] <digit> *<digit> [ "." *<digit> ]
Floats are represented in base 10 with at least one digit to the left of the decimal point and arbitrary precision to the right of it.

URI

A URI as specified by [RFC1738]; relative URIs are specified by [RFC1808]. URIs cannot by definition include whitespace or ASCII control characters, therefore they do not need to be escaped or enclosed in quotes.

Date

<date>  = 4*<digit> "-" 2*<digit> "-" 2*<digit>
Dates are recorded in the format YYYY-MM-DD where YYYY, MM and DD stand for the numeric year, month (01-12) and day (01-31) respectively. All dates are specified in GMT, unless the GMT-Offset attribute of the Source directive is used. This format is chosen to assist collation.

Time

<time>  = 2*<digit> ":" 2*<digit> [ ":" 2*<digit> [ "." *<digit> ]]
Times are recorded in the form HH:MM, HH:MM:SS or HH:MM:SS.S where HH is the hour in 24 hour format, MM is minutes and SS is seconds. All times are specified in GMT, unless the GMT-Offset attribute of the Source directive is used.
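The integer, float, date, and time grammars above translate directly into regular expressions. The sketch below is one possible way to validate and convert a raw field token given its type identifier; the function name convert is hypothetical, and URI and string fields are passed through unchanged.

```python
import re
from datetime import date, time

# One pattern per scalar grammar rule above.
_PATTERNS = {
    "integer": re.compile(r"-?[0-9]+"),
    "float": re.compile(r"-?[0-9]+(\.[0-9]*)?"),
    "date": re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}"),
    "time": re.compile(r"[0-9]{2}:[0-9]{2}(:[0-9]{2}(\.[0-9]*)?)?"),
}

def convert(type_id, text):
    """Validate text against its declared type and return a Python value."""
    type_id = type_id.lower()            # type identifiers are case insensitive
    pattern = _PATTERNS.get(type_id)
    if pattern is None:
        return text                      # uri, string, or unknown type
    if not pattern.fullmatch(text):
        raise ValueError("%r is not a valid %s" % (text, type_id))
    if type_id == "integer":
        return int(text)
    if type_id == "float":
        return float(text)
    if type_id == "date":
        year, month, day = map(int, text.split("-"))
        return date(year, month, day)    # raises ValueError for e.g. month 13
    parts = text.split(":")
    seconds = parts[2] if len(parts) == 3 else "0"
    whole, _, fraction = seconds.partition(".")
    micro = int((fraction + "000000")[:6])   # truncate to microseconds
    return time(int(parts[0]), int(parts[1]), int(whole), micro)

print(convert("date", "1997-04-01"), convert("time", "08:00"))
```

Note that datetime.time holds at most microsecond precision, so fractional seconds beyond six digits are truncated in this sketch even though the grammar allows arbitrary precision.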

String

<string> = <wstring> | <qstring>
<wstring> = <alnum> *[ <alnum> | <punct> | <8-bit> ]
<qstring> = <"> *<schar> <">

<schar> = <alnum> | <punct> | <8-bit> | <wchar> | <xchar>
<xchar> = <""> | <xescape>
<xescape> = "\x" 2*<hexdigit>
<hexdigit> = <digit> | "A" | "B" | "C" | "D" | "E" | "F"
Strings are rendered in either unquoted or quoted/escaped form.  Unquoted form may only be used for strings that do not contain whitespace, reserved, or control characters, and may be parsed by simply scanning for the end of field separator (whitespace).  Quoted strings must be parsed and characters unescaped appropriately.  Empty strings may be represented by <"">.

The character escaping rules are designed to be easy to parse by a variety of conversion tools, and expressive enough to encode any characters.  The basic rule is any character that is either a special character (backslash or double quote) or not printable (such as control characters) may be encoded using the form "\xFF" where the FF represents the two hex digits of the character.  There is a special case designed for legacy conversion tools: the double quote may also be escaped by doubling the character, and should be escaped in this fashion when possible.
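The escaping rules above can be sketched as a pair of functions. The names encode_string and decode_string are hypothetical. The sketch assumes a single-byte character set such as the default ISO-8859-1, so every character fits in two hex digits, and the decoder is lenient: it does not reject stray backslashes or unpaired quotes inside the quoted body.

```python
import re

_ESCAPE = re.compile(r'""|\\x([0-9A-F]{2})')

def decode_string(token):
    """Turn a raw <string> token into its value."""
    if not token.startswith('"'):
        return token                         # unquoted form needs no work
    if len(token) < 2 or not token.endswith('"'):
        raise ValueError("unterminated quoted string: %r" % token)

    def replace(match):
        if match.group(1) is None:
            return '"'                       # doubled quote
        return chr(int(match.group(1), 16))  # \xHH escape

    return _ESCAPE.sub(replace, token[1:-1])

def encode_string(value):
    """Render a value as an unquoted or quoted/escaped <string>."""
    out = []
    for ch in value:
        code = ord(ch)
        if ch == '"':
            out.append('""')                 # preferred escape for quotes
        elif ch == "\\" or code < 0x20 or code == 0x7F:
            out.append("\\x%02X" % code)     # backslash and control characters
        else:
            out.append(ch)
    body = "".join(out)
    needs_quotes = (
        body != value
        or value == ""
        or any(c in " \t" for c in value)
        or not (value[0].isascii() and value[0].isalnum())
    )
    return '"' + body + '"' if needs_quotes else value

print(encode_string('He said "hi"'))
print(decode_string('"Explore the world"'))
```

Quoting any value that does not start with an alphanumeric character also covers the rule that a first field beginning with '#' must be quoted.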

Taxonomy

The taxonomy describes the semantics of standard directives, field identifiers, and templates.  These semantics are subject to change as more feedback is received from the industry.  In particular, the semantics will be changed if necessary to follow IAB definitions and standards.

To maintain simplicity in the format, there are two levels to specify report information: in the Field-Values directive, and in the entries.  This may result in some confusion about where information should be represented.  For example, a report may represent a single campaign over a number of days - in this case, a single Field-Values directive with the campaign-id attribute suffices to represent that information.  However, if a report represents many different campaigns over a period of time, the campaign-name field should be specified in each entry.
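The relationship between the two levels can be shown with a small sketch that merges the values from Field-Values directives into each entry. The function name expand_entry is hypothetical. The draft does not say what happens when a field appears both in a Field-Values directive and in the entry itself; this sketch assumes the entry's own value takes precedence.

```python
def expand_entry(field_ids, values, constants):
    """Combine Field-Values constants with one entry's own fields."""
    if len(values) != len(field_ids):
        raise ValueError("entry does not match the Format directive")
    row = {name.lower(): value for name, value in constants.items()}
    # Assumption: a field present in the entry overrides a constant.
    row.update((name.lower(), value) for name, value in zip(field_ids, values))
    return row

# Constants accumulate across Field-Values directives; a later directive
# replaces only the fields it names.
constants = {}
constants.update({"campaign-id": "Explore the world",
                  "report-start-date": "1997-04-01"})
constants.update({"campaign-id": "Try Java"})

row = expand_entry(["ad-name", "total-ad-clicks"],
                   ["Ford Explorer", "843"], constants)
print(row)
```

After expansion every row carries the same set of columns whether a value came from a directive or from the entry, which is what makes the two levels equivalent for analysis.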

Report file parsers should ignore unknown directives and fields.

Directive Names

The following standard directives are defined.  New directives not approved by the IAB TIS should be preceded with the string "X-" as in "X-New-Directive".  Directive names are case-insensitive. All directive attributes are optional unless explicitly indicated otherwise in the description.  New attributes for existing directives should likewise be preceded by "X-" as in "X-New-Attribute".  Arbitrary comments can be inserted in the file using the Remark directive. All directives may occur more than once within the file, enabling concatenation of conforming report files to result in a conforming file.
IARF Version=<version>
The IARF directive is required and must appear as the first line of the report file. Conforming parsers should ignore lines in the file before this directive is encountered. Subsequent occurrences of this directive before the matching End-IARF must be ignored.  The version attribute is required, and defines the IARF version used by the report. This draft defines version 1.0.
End-IARF
The End-IARF directive is required and must appear as the last line of the report file. No further processing of entries or directives will occur after this directive is encountered.  Conforming parsers should resume processing if an IARF directive is found in the remaining lines of the file, to permit concatenation of valid IARF files.
Content Charset=<string> Language=<string> Encoding=<string>
The Content directive indicates content characteristics such as character set, language, and encoding.  The Charset attribute is defined as exactly the same as the charset parameter of the Content-Type header defined in [RFC2068].  The Language and Encoding attributes are defined as exactly the same as the Content-Language and Content-Encoding headers in [RFC2068].  The default value for Charset is "ISO-8859-1". If Language is not specified, the language is not known. If Encoding is not specified, there is no encoding.  The Content directive may occur more than once, and takes effect for all entries until the next Content directive.
Format Fields="<field-identifier> [<field-identifier> [...]]" Template=<name> Headers=<string>
The Format directive controls the field ordering of the entries that follow.  This directive must appear in the report file before any entry lines are encountered, and may occur more than once to encode multiple formats in one file.  The Format directive will take effect for the entries that follow the directive until superseded by a subsequent Format directive.  The Fields and Template attributes are complementary: either or both may be present, and at least one is required; if both are specified they must agree according to the set of standard templates specified below. Non-standard field identifiers may be typed using the Field-Info directive.  The Headers attribute allows specification of human-readable headers for the fields.
Field-Info Name=<field-identifier> Type=<type-identifier> Header=<string>
Optional additional information for each field; this directive is never required.  Standard fields defined below are strictly typed, and unknown fields should be considered type "string". The Name attribute refers to a field-identifier specified in the Format directive. Type identifiers convey field type information; allowed values are listed above, and are case insensitive.  The Header attribute is a string representing column header label information.
Field-Values <field-identifier>=[ <integer> | <float> | <uri> | <date> | <time> | <string> ]...
The Field-Values directive specifies values for fields that remain constant for many entries, and is intended to reduce the size and complexity of IARF files.  Specifying a field and value using the Field-Values directive is exactly equivalent to specifying the field in the Format directive and placing the value in each entry.  Field values take effect for all entries following the Field-Values directive, and are only overridden by a subsequent Field-Values directive specifying the same field identifier.  Multiple field values may be specified with one Field-Values directive.
Source Name=<string> Domain=<string> Type=<string> GMT-Offset=<integer> Email=<string> Contact=<string>
The Source directive conveys information specific to the entity that generated the data.  Name is a human-readable name; Domain refers to the primary site domain name.  Type is used to categorize the data source; recommended values include "site" and "network".  GMT-Offset represents the offset in hours (first two digits) and minutes (last two digits) for all dates and times in both entries and directives. This directive may occur anywhere within the file and the GMT-Offset takes effect for all directives and entries following it.  If the GMT-Offset attribute does not occur, all dates and times are assumed to be GMT.  The Email and Contact attributes are for representing contact info regarding this report file.
Created Report-Date=<date> Report-Time=<time> Vendor=<name> Version=<version> OS=<name> OS-Version=<version>
The Created directive is used for information about how and when the report was created.  Report-Date and Report-Time represent the date and time when the report was run.  Vendor and Version represent the software used to create the report.  OS and OS-Version are used to indicate the platform this software was running on.
Remark <text>
Rem <text>
Comment information. Data recorded by this directive should be ignored by analysis tools.  It should be noted that this directive does not follow the "name=value" convention of all of the other directives, and may need to be special-cased during processing.
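As an illustration of the GMT-Offset attribute of the Source directive, the sketch below converts an offset such as -0800 into a time difference and normalizes a reported local time to GMT. The function names are hypothetical. Two assumptions are made that the draft does not state: the value always has exactly four digits, and the sign follows the usual convention that local time equals GMT plus the offset.

```python
import re
from datetime import datetime, timedelta

def parse_gmt_offset(text):
    """Convert a GMT-Offset value such as "-0800" to a timedelta."""
    # Assumption: exactly four digits, hours first, then minutes.
    m = re.fullmatch(r"(-?)([0-9]{2})([0-9]{2})", text)
    if m is None:
        raise ValueError("bad GMT-Offset: %r" % text)
    delta = timedelta(hours=int(m.group(2)), minutes=int(m.group(3)))
    return -delta if m.group(1) == "-" else delta

def to_gmt(local, offset):
    """Normalize a reported local date and time to GMT."""
    # Assumption: local time = GMT + offset, so GMT = local - offset.
    return local - offset

offset = parse_gmt_offset("-0800")
print(to_gmt(datetime(1997, 4, 1, 8, 0), offset))
```

Normalizing to GMT before storing makes reports from sites in different time zones directly comparable.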

Field Identifiers

The Fields attribute of the Format directive lists a sequence of field identifiers specifying the semantics and data type recorded in the fields of each entry. New field identifiers not approved by the IAB TIS should be preceded with the string "X-" as in "X-New-Field".  Field identifiers may be one of the following case-insensitive strings:
report-start-date - <date>
Date beginning the entry or report.  If this field is not present, interpretation of the entry date is system-specific.
report-start-time - <time>
Time beginning the entry or report.  If this field is not present, tools should default to 00:00:00.  If this field is present, report-end-time must also be present.
report-end-date - <date>
Date the entry or report ends, inclusive.  If this field is not present, tools should default to the same as the report-start-date, unless the report-end-time precedes the report-start-time, in which case the default should be the day after the report-start-date.
report-end-time - <time>
Time the entry or report ends, exclusive.  This field must be present if report-start-time is present.
ad-name - <string>
The name of the ad banner.
ad-agency-id - <string>
The agency or advertiser assigned ID for the ad.  It is recommended that this field appear in every report, as it is likely to be needed to correlate reports from different placements.
ad-site-id - <string>
The unique server-assigned ID of the ad banner.
ad-click-url - <uri>
The clickthrough URL for the ad banner.  Note that the same ad may have multiple click URLs.
ad-creative-url - <uri>
The URL for the ad banner creative.  Note that one ad may have multiple creatives, or no creative URL.
ad-alt-text - <string>
The "alternate text" for the ad. The text that should be displayed if the browser cannot display images.
advertiser-name - <string>
The name of the advertiser.
advertiser-brand - <string>
The brand associated with the advertiser.
agency-name - <string>
The name of the ad agency representing the advertiser.
agency-insertion-order - <string>
The agency-assigned insertion order number represented by the entry.
site-insertion-order - <string>
The site-assigned insertion order number represented by the entry.
campaign-id - <string>
The agency or advertiser assigned campaign identifier.
campaign-name - <string>
The agency or advertiser assigned campaign name.
flight-name - <string>
The advertiser or agency assigned name of the flight.  For the purposes of this draft, the flight contains advertiser or agency specified information about the ad run, which may or may not correspond to the actual run information returned by the site.
flight-start-date - <date>
The intended start date for the flight, inclusive.
flight-start-time - <time>
The intended start time for the flight, inclusive.
flight-end-date - <date>
The intended end date for the flight, inclusive.
flight-end-time - <time>
The intended end time for the flight, exclusive.
flight-impression-guarantee - <integer>
The number of impressions required to complete the flight.
flight-click-guarantee - <integer>
The number of clicks required to complete the flight.
flight-target - <string>
The agency-defined targeting criteria for the flight, which may bear no relationship to the actual delivery specified in the run target string.
run-name - <string>
The site-defined name of the run.  For the purposes of this draft, the run contains site specified information about the ad run, which may or may not correspond to the flight information requested by the advertiser or agency.
run-start-date - <date>
The actual start date for the run, inclusive.
run-start-time - <time>
The actual start time for the run, inclusive.
run-end-date - <date>
The actual (or projected) end date for the run, inclusive.
run-end-time - <time>
The actual (or projected) end time for the run, exclusive.
run-target - <string>
The name of the target or placement for the run - may be a page URL, section of the site, run of site, keyword, etc.  The format of this is site-defined.
site-name - <string>
The name of the site the ad is displayed on.
site-domain - <string>
The primary domain name of the site the ad is displayed on.
request-browser - <string>
The browser manufacturer of the requesting browser.  Recommended values include "Netscape, Microsoft, Lynx, Unknown".
request-platform - <string>
The platform of the requesting browser.  Recommended values include: "Win16, Win95, WinNT, Mac, Unix, Unknown".
request-top-domain - <string>
The top level domain of the host originating the request.  Values will include "com, edu, net, org, mil, gov", two letter country codes, and the special value "unresolved" for IP addresses that failed reverse DNS lookup.
request-second-domain - <string>
The second level domain of the host originating the request.  Values should also include the top level domain.  Examples include "aol.com, netcom.com, uiuc.edu", and the special value "unresolved" for IP addresses that failed reverse DNS lookup.
total-ad-insertions - <integer>
The total number of ad insertions recorded.  Ad insertions are defined as the same as ad requests as defined by the IAB in [IAB97].  They are distinct from ad downloads (below).
total-ad-downloads - <integer>
The total number of ad downloads recorded.  For the purposes of this draft, ad downloads are defined as requests for the ad media URL, independent of whether or not the media was successfully transferred or viewed by the browser.  They are distinct from ad insertions (above) for a variety of reasons; in particular, several types of ads do not have media URLs.
total-ad-clicks - <integer>
The total number of ad clicks recorded.
total-ad-duration - <float>
The total time duration the ad was viewed, in seconds.  This metric is not yet well defined, and is not available for most ad types currently in use.
total-ad-hover-time - <float>
The total hover time for the ad, in seconds.  This metric is not yet well defined, and is not available for most ad types currently in use.
total-ad-response-time - <float>
The total response time for the ad, in seconds.  This metric is not yet well defined, and may not always be available.
visit-ad-insertions - <integer>
The total number of ad insertions recorded in unique visits.  Visits are defined in [IAB97].
visit-ad-downloads - <integer>
The total number of ad downloads recorded in unique visits.
visit-ad-clicks - <integer>
The total number of ad clicks recorded in unique visits.
unique-ad-insertions - <integer>
The total number of ad insertions recorded by unique users.  Unique users are defined in [IAB97].
unique-ad-downloads - <integer>
The total number of ad downloads recorded by unique users.
unique-ad-clicks - <integer>
The total number of ad clicks recorded by unique users.
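The defaulting rules given above for report-start-time and report-end-date can be sketched as follows. The function name resolve_end_date is hypothetical, and absent fields are represented by None.

```python
from datetime import date, time, timedelta

def resolve_end_date(start_date, start_time=None, end_date=None, end_time=None):
    """Apply the report-end-date default described for the report-* fields."""
    if start_time is not None and end_time is None:
        raise ValueError(
            "report-end-time must be present when report-start-time is present")
    if end_date is not None:
        return end_date                       # explicit value, no default needed
    if start_time is None:
        start_time = time(0, 0, 0)            # default report-start-time
    if end_time is not None and end_time < start_time:
        return start_date + timedelta(days=1) # period runs past midnight
    return start_date

# An entry covering 22:00 to 02:00 ends on the following day.
print(resolve_end_date(date(1997, 4, 1), time(22, 0), None, time(2, 0)))
```

The draft leaves the interpretation of an entry with no report-start-date as system-specific, so this sketch requires a start date from the caller.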

Standard Templates

In order to promote report standardization and enable simpler report analysis tools, a set of standard templates is defined.  Agencies and advertisers may therefore require reports to conform to both the standard format, and one (or more) of the standard templates.  Analysis tools may interpret a limited number of templates, instead of requiring enough generality to interpret the entire spectrum of report types.  A Template attribute should be interpreted exactly the same as the associated Fields attribute; report generators are encouraged to insert both attributes in the Format directive, for maximum flexibility.  Templates also have associated required header fields; these must appear in a Field-Values directive before the first entry.  Templates not approved by the IAB TIS should be preceded by the string "X-" as in "X-New-Template".  For the purposes of this draft, the following standard case-insensitive template names are defined:
Template=ad-totals
Fields="ad-name ad-agency-id total-ad-insertions total-ad-downloads total-ad-clicks"
Required Field-Values are report-start-date, report-end-date, agency-insertion-order, and campaign-id
This template is the most basic set of report criteria.  For each ad, total ad insertions, downloads, and clicks are aggregated and reported.  Report dates may be set using Field-Values and the report-start-date and report-end-date fields.
Template=ad-daily
Fields="report-start-date ad-name ad-agency-id total-ad-insertions total-ad-downloads total-ad-clicks"
Required Field-Values are report-end-date, agency-insertion-order, and campaign-id
This template may be used to break down ad performance by day.
Template=ad-target-totals
Fields="ad-name ad-agency-id run-target total-ad-insertions total-ad-downloads total-ad-clicks"
Required Field-Values are report-start-date, report-end-date, agency-insertion-order, and campaign-id
This template may be used to break down ad performance by targeting criteria, such as keyword or page on the site.
Template=ad-target-daily
Fields="report-start-date ad-name ad-agency-id run-target total-ad-insertions total-ad-downloads total-ad-clicks"
Required Field-Values are report-end-date, agency-insertion-order, and campaign-id
This template may be used to break down ad performance by both day and targeting criteria.
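The four standard templates above can be captured in a lookup table, which makes it straightforward to check that a Format directive's Fields and Template attributes agree and that the required Field-Values are present. This is a sketch under stated assumptions: the names TEMPLATES, resolve_fields, and missing_field_values are hypothetical, and the fields argument is the already-unquoted value of the Fields attribute.

```python
# Field lists and required Field-Values copied from the definitions above.
_COUNTS = ["total-ad-insertions", "total-ad-downloads", "total-ad-clicks"]
_ORDER = ["agency-insertion-order", "campaign-id"]
TEMPLATES = {
    "ad-totals": (["ad-name", "ad-agency-id"] + _COUNTS,
                  ["report-start-date", "report-end-date"] + _ORDER),
    "ad-daily": (["report-start-date", "ad-name", "ad-agency-id"] + _COUNTS,
                 ["report-end-date"] + _ORDER),
    "ad-target-totals": (["ad-name", "ad-agency-id", "run-target"] + _COUNTS,
                         ["report-start-date", "report-end-date"] + _ORDER),
    "ad-target-daily": (
        ["report-start-date", "ad-name", "ad-agency-id", "run-target"] + _COUNTS,
        ["report-end-date"] + _ORDER),
}

def resolve_fields(template=None, fields=None):
    """Return the field list for a Format directive, checking agreement."""
    listed = fields.lower().split() if fields is not None else None
    standard = TEMPLATES.get(template.lower()) if template is not None else None
    if standard is None:                      # no template, or a non-standard one
        if listed is None:
            raise ValueError("Format needs Fields or a standard Template")
        return listed
    if listed is not None and listed != standard[0]:
        raise ValueError("Fields and Template attributes disagree")
    return list(standard[0])

def missing_field_values(template, constants):
    """List required Field-Values that have not been supplied."""
    present = {name.lower() for name in constants}
    return [name for name in TEMPLATES[template.lower()][1]
            if name not in present]

print(resolve_fields(template="AD-Daily"))
print(missing_field_values("ad-daily", {"report-end-date": "1997-04-01"}))
```

A generator that emits both attributes can run the same check before writing the Format directive, so a mismatch is caught at the source rather than by the receiving parser.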

Example

The following is an example file in the report format:
#IARF Version=1.0
#Format Template=ad-totals Fields="ad-name ad-agency-id total-ad-insertions total-ad-downloads total-ad-clicks"
#Source Name="Content Provider" Domain=www.content.com GMT-Offset=-0800
#Created Report-Date=1997-04-19 Vendor=NetGravity Version=3.0
#Field-Values report-start-date=1997-04-01 report-end-date=1997-04-15
#Field-Values advertiser-name=Ford campaign-id="Explore the world"
#Field-Values agency-name="Funky Agency" agency-insertion-order=11783
#Field-Values flight-start-date=1997-04-01 flight-end-date=1997-04-30 flight-impression-guarantee=1000000
"Ford Explorer" "explorer.gif"      10253 0  843
"Ford Ranger"   "ranger.gif"         2543 0   85
"Ford Taurus"   "taurus.gif"        84922 0 1024
"Ford Escort"   "escort.gif"        11765 0  682
#Remark This IARF stuff is pretty cool!
#End-IARF
The next example uses a slightly more complex template.
#IARF Version=1.0
#Format Template=ad-target-daily Fields="report-start-date ad-name ad-agency-id run-target total-ad-insertions total-ad-downloads total-ad-clicks"
#Source Name="Content Provider" Domain=www.content.com GMT-Offset=-0800
#Created Report-Date=1997-04-03 Vendor=NetGravity Version=3.0
#Field-Values report-end-date=1997-04-01
#Field-Values advertiser-name=Microsoft campaign-id="Try Java"
#Field-Values agency-name="Funky Agency" agency-insertion-order=11784
#Field-Values flight-start-date=1997-04-01 flight-end-date=1997-04-30 flight-impression-guarantee=1000000
1997-04-01 "MS Image ad" msimage.gif "Javaless browsers"    0 10253  843
1997-04-01 "MS Java ad" msjava.class "Java capable"      2543     0  175
1997-04-01 "MS Image ad" msimage.gif "Javaless browsers"    0 11874  735
1997-04-01 "MS Java ad" msjava.class "Java capable"      2278     0  156
#End-IARF
The last example includes some experimental directives, attributes, and fields.
#IARF Version=1.0
#Format Fields="report-start-date report-start-time report-end-time ad-agency-id x-ad-size run-target total-ad-downloads total-ad-insertions total-ad-clicks"
#Field-Info Name=x-ad-size Type=string Header="Ad Size"
#Field-Values advertiser-name="The Big Little Co" campaign-id="Size wise"
#Field-Values agency-name="Funky Agency" agency-insertion-order=11785 x-agency-contact="John Smith"
#X-Display-Hint GraphType=Pie
1997-04-01 08:00 09:00 "Big Ad"    banner "Sports"    10253 13378  843
1997-04-01 08:00 09:00 "Little Ad" button "Sports"     2543  2893   85
1997-04-01 09:00 10:00 "Big Ad"    banner "Sports"    84922 90738 1024
1997-04-01 09:00 10:00 "Little Ad" button "Sports"    10765 12094  682
#Field-Values advertiser-name="BigCo, LTD" campaign-id="We're Big!"
#Field-Values agency-insertion-order="94378"
1997-04-01 10:00 11:00 "Tall Ad"  banner "Sports"    10158 15932  729
1997-04-01 10:00 11:00 "Short Ad" button "Sports"     3097  3387   79
1997-04-01 11:00 12:00 "Tall Ad"  banner "Sports"    96123 99332 1083
1997-04-01 11:00 12:00 "Short Ad" button "Sports"    11074 14930  701
#End-IARF
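Reports such as the three above may be concatenated into a single file. The sketch below shows one way to separate such a file into individual reports, following the descriptions of the IARF and End-IARF directives: lines before the first IARF directive are skipped, a repeated IARF directive before the matching End-IARF is ignored, and processing resumes at the next IARF directive. The function name split_reports is hypothetical, and treating a missing End-IARF as an error is an assumption based on that directive being required.

```python
def split_reports(lines):
    """Yield one list of lines per IARF ... End-IARF block."""
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")             # accept LF or CRLF
        # Directive names are case insensitive.
        name = line.split(None, 1)[0].lower() if line.startswith("#") else ""
        if current is None:
            if name == "#iarf":
                current = [line]              # a new report begins
            continue                          # otherwise outside any report
        if name == "#iarf":
            continue                          # repeated IARF is ignored
        current.append(line)
        if name == "#end-iarf":
            yield current
            current = None
    if current is not None:
        raise ValueError("missing End-IARF directive")

text = (
    "#IARF Version=1.0\n"
    "#Format Fields=\"ad-name total-ad-clicks\"\n"
    "\"Ford Explorer\" 843\n"
    "#End-IARF\n"
    "#IARF Version=1.0\r\n"
    "#End-IARF\r\n"
)
reports = list(split_reports(text.splitlines(True)))
print(len(reports))
```

Each yielded block can then be handed to a per-report parser, so a format error in one report can be rejected without discarding the others.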

Acknowledgements

Steve Goldberg pushed the IAB TIS (originally the IATC) to develop this format. Paul Nakada contributed the idea of attribute value pairs for directives. Tom Churchill contributed the idea of a Field-Values directive.  The IAB TIS, in particular Gil Beyda, provided much feedback.

References

[Hallam96]
P. Hallam-Baker, B. Behlendorf, Extended Log File Format, March 1996.
[RFC1808]
R. Fielding, Relative Uniform Resource Locators, RFC 1808, June 1995.
[RFC1738]
T. Berners-Lee, L. Masinter, Uniform Resource Locators (URL), RFC 1738, December 1994.
[RFC2068]
R. Fielding, et al., Hypertext Transfer Protocol -- HTTP/1.1, RFC 2068, January 1997.
[RFC1700]
J. Reynolds, J. Postel, Assigned Numbers, STD 2, RFC 1700, USC/ISI, October 1994.
[IAB97]
IAB Media Measurement Task Force, Metrics and Methodology for Internet Advertising, June 1997.
$Id: WD-adreport.html,v 1.3 1999/02/19 01:33:21 ts Exp $