Internet Advertising Technology Council

WD-adreport-19970515

Internet Advertising Report Format

IATC Working Draft WD-adreport-19970515

Latest version:
http://www.basswoodassoc.com/standards/WD-adreport.html
This version:
http://www.basswoodassoc.com/standards/WD-adreport.html-19970515
Author:
Tom Shields <tom.shields@basswoodassoc.com>

Status of this document

This is an IATC Working Draft for review by IATC members and other interested parties. It is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to use IATC Working Drafts as reference material or to cite them as other than "work in progress".

Note: since working drafts are subject to frequent change, you are advised to reference the above URL, rather than the URLs for working drafts themselves.

Abstract

This document presents a standard format for reporting internet ad performance information.  The format is extensible, portable, and easy to parse into a variety of data stores.  The proposal is motivated by the need to capture and compare ad performance information independent of the software used to perform the ad delivery.

Introduction

Advertisers and agencies who plan internet ad campaigns require ad performance reports that are comparable across web sites.  Because many web sites use different ad delivery software, agencies are currently expending considerable manual effort reconciling these reports.  One fixed format report is not enough, because different campaigns require reporting of different kinds of information.  This proposal defines a self-describing report format that is extensible and flexible enough to report many kinds of ad performance information, but simple and strict enough to be easily parsed and converted into data stores for analysis and comparison.

This proposal is broken into two major sections.  The first is a description of the file format syntax, with emphasis on the encoding and ordering of information for easy parsing.  The second part is a "taxonomy" that describes the semantics of standard headers, columns, and report templates, to ensure the utility and comparability of the information.

The report format has the following design goals:

This proposal makes no attempt to define the performance measures used by advertisers, agencies, and ad delivery systems, except to provide for the reporting of them.  This proposal explicitly does not address the issue of transmission of this data, for example from a web site to an advertiser or agency.  This format does not define presentation information, and is designed to be readable, but not camera ready.

This work is distantly based on the W3C Extended Log File Format [Hallam96] working draft.

Format

An advertising report file contains a sequence of lines containing 8-bit ISO Latin-1 characters terminated by either the sequence LF or CRLF. Report generators should follow the line termination convention for the platform on which they are executed. Analyzers and converters must accept either form. Each line may contain either a directive or an entry. Directives record metadata about the report. Entries consist of a sequence of fields relating to an ad event or series of events. Fields are units of information, separated by whitespace.  In general, directives, attribute names, and field and template identifiers are case insensitive, while data should be considered case sensitive.
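As a non-normative illustration, the line handling described above might be sketched in Python as follows.  The function names split_report_lines and classify are hypothetical and not part of this proposal; the '#' test relies on the rule, given in the Directives section, that directive lines begin with '#'.

```python
def split_report_lines(raw: bytes) -> list[str]:
    """Decode a report as ISO Latin-1 and split it on LF or CRLF."""
    text = raw.decode("latin-1")
    lines = text.split("\n")
    # A trailing line terminator leaves one empty final element; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    # Strip a single trailing CR so that CRLF-terminated lines are accepted too.
    return [line[:-1] if line.endswith("\r") else line for line in lines]


def classify(line: str) -> str:
    """Return 'blank', 'directive' or 'entry' for one line (hypothetical helper)."""
    if line.strip(" \t") == "":
        return "blank"
    return "directive" if line.startswith("#") else "entry"
```

A file that mixes LF and CRLF terminators is split the same way, which is one simple way to satisfy the requirement that analyzers accept either form.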

Format errors may indicate corrupt data or a non-conforming format generator.  A conforming IARF parser should reject any report file that contains format errors, and roll back any data stores to the state that preceded the start of parsing.  In some non-critical cases, such as a report file viewer, it may be acceptable to ignore corrupt lines and permit viewing uncorrupted data, but an error message indicating the file corruption should be shown.

Directives

Directives are lines beginning with the '#' character.  No whitespace is permitted between the '#' and the directive name. Directive data takes the form of attribute value pairs.  Defined attributes may occur in any order.  If an attribute occurs more than once, the last occurrence is taken as the proper value.  Parsers should ignore unknown directives, unknown attributes, and directives with parse errors.  Every conforming report file must begin with the directive IARF, and contain other required directives as described below.
<directive> = "#" <name> ":" *<nvpair> <end-of-line>
<name> = *[ <alnum> | "-" ]
<nvpair>  =  <whitespace> <name> "=" <string>
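The directive grammar can be illustrated with a small Python sketch.  The name parse_directive is hypothetical.  The sketch lower-cases directive and attribute names, keeps the last occurrence of a repeated attribute, leaves attribute values in their raw (possibly quoted) form, and returns None on a parse error so that the caller may ignore the directive.  Note that under this sketch the free text of a Remark directive is also reported as None, which a caller can treat as "ignore".

```python
import re

# "#", the directive name, and ":" with no whitespace after the "#".
_HEAD = re.compile(r'#([A-Za-z0-9-]+):')
# Whitespace, attribute name, "=", then a quoted string or a bare token.
_NVPAIR = re.compile(r'[ \t]+([A-Za-z0-9-]+)=("(?:[^"]|"")*"|[^ \t"]+)')


def parse_directive(line: str):
    """Return (name, attrs) with lower-cased names, or None on a parse error."""
    head = _HEAD.match(line)
    if head is None:
        return None
    attrs = {}
    pos = head.end()
    end = len(line.rstrip(" \t"))
    while pos < end:
        pair = _NVPAIR.match(line, pos)
        if pair is None:
            return None  # parse error: the caller may ignore this directive
        attrs[pair.group(1).lower()] = pair.group(2)  # last occurrence wins
        pos = pair.end()
    return head.group(1).lower(), attrs
```

For example, parse_directive('#Agency: Name="Funky Agency" Insertion-Order=11783') yields the name "agency" and a dictionary with the keys "name" and "insertion-order".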

Entries

Entries are analogous to database rows.  An entry consists of a sequence of fields, separated by whitespace.  All lines not beginning with '#' are considered entries.  If the first field of an entry begins with '#', the field must be surrounded by double quotes ("") to distinguish it from a directive.  Entries are not required in report files, although a report file with no entries may not convey much information.  Lines containing only whitespace should be ignored. If an entry contains too few or too many fields as defined by the Format directive, the file is non-conforming or corrupt, and parsing should halt.  No line continuation character is defined - entries must be completely contained on one line.
<entry> = [ <field> *[ <whitespace> <field> ]] <end-of-line>
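A minimal Python sketch of entry splitting follows, assuming the quoting rules given in the String section (a quoted field may contain whitespace and doubled double quotes).  The names split_fields and check_entry are hypothetical.  Fields are returned in raw form; unescaping is a separate step.

```python
def split_fields(line: str) -> list[str]:
    """Split an entry line into raw fields, keeping quoted fields intact."""
    fields = []
    i, n = 0, len(line)
    while i < n:
        if line[i] in " \t":  # skip separator whitespace
            i += 1
            continue
        start = i
        if line[i] == '"':
            i += 1
            while True:
                if i >= n:
                    raise ValueError("unterminated quoted field")
                if line[i] != '"':
                    i += 1
                elif i + 1 < n and line[i + 1] == '"':
                    i += 2  # a doubled quote stays inside the field
                else:
                    i += 1  # closing quote
                    break
            if i < n and line[i] not in " \t":
                raise ValueError("text directly after a closing quote")
        else:
            while i < n and line[i] not in " \t":
                i += 1
        fields.append(line[start:i])
    return fields


def check_entry(line: str, field_names: list[str]) -> list[str]:
    """Split an entry and halt if the field count disagrees with the format."""
    fields = split_fields(line)
    if len(fields) != len(field_names):
        raise ValueError("expected %d fields, found %d" % (len(field_names), len(fields)))
    return fields
```

Raising an exception on a field count mismatch mirrors the rule that parsing should halt when an entry has too few or too many fields.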

Fields

Fields are individual data entities analogous to database columns in a row.  Fields conform to one of the following data types, according to the field definition. Fields without definitions should be considered as data type <string>. Fields may not contain ASCII control characters, or unprintable characters, unless they are encoded as described below. Because fields are separated by whitespace, fields must consist of at least one non-whitespace character.
<field> = <integer> | <fixed> | <uri> | <date> | <time> | <string>
<type-identifier> = "integer" | "fixed" | "uri" | "date" | "time" | "string"

<digit>        = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<lowalpha>     = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
                 "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
                 "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
<hialpha>      = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
<punct>        = "$" | "-" | "_" | "." | "+" | "!" | "*" | "(" | ")" |
                 "{" | "}" | "|" | "&" | "^" | "~" | "[" | "]" | "," |
                 "<" | ">" | "#" | "%" | ";" | "/" | "'" | "`" | "=" |
                 "?" | ":" | "@"
<reserved>     = <"> | "\"
<wchar>        = <tab> | <space>
<whitespace>   = <wchar> *<wchar>
<alnum>        = <lowalpha> | <hialpha> | <digit>
<version>      = <digit> *<digit> "." <digit> *<digit> *[ "." <digit> *<digit> ]

Integer

<integer> = [ "-" ] <digit> *<digit>
Integers are represented as a sequence of digits, base 10.

Fixed

<fixed> = [ "-" ] <digit> *<digit> [ "." *<digit> ]
Fixed-point numbers are represented with at least one digit to the left of the decimal point, and arbitrary precision to the right of it.
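The two numeric types might be validated and converted as in the following Python sketch.  It assumes that at least one digit is required, as the prose descriptions suggest, and uses Decimal for fixed values to avoid losing the arbitrary precision the format allows.  The names parse_integer and parse_fixed are hypothetical.

```python
import re
from decimal import Decimal

# Optional minus sign followed by one or more base 10 digits.
_INTEGER = re.compile(r'-?[0-9]+')
# As above, optionally followed by "." and any number of digits.
_FIXED = re.compile(r'-?[0-9]+(?:\.[0-9]*)?')


def parse_integer(field: str) -> int:
    """Convert an <integer> field, rejecting anything else."""
    if _INTEGER.fullmatch(field) is None:
        raise ValueError("not an integer field: %r" % field)
    return int(field)


def parse_fixed(field: str) -> Decimal:
    """Convert a <fixed> field without rounding it to binary floating point."""
    if _FIXED.fullmatch(field) is None:
        raise ValueError("not a fixed field: %r" % field)
    return Decimal(field)
```

Validating with a regular expression before conversion rejects forms such as "1e5" or "+7" that Python would otherwise accept but the grammar does not allow.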

URI

URIs are as specified by [RFC1738]; relative URIs are as specified by [RFC1808]. By definition, URIs cannot include whitespace or ASCII control characters, so they do not need to be escaped or enclosed in quotes.

Date

<date>  = 4*<digit> "-" 2*<digit> "-" 2*<digit>
Dates are recorded in the format YYYY-MM-DD where YYYY, MM and DD stand for the numeric year, month and day respectively. All dates are specified in GMT, unless the GMT-Offset attribute of the Source directive is used. This format is chosen to assist collation.

Time

<time>  = 2*<digit> ":" 2*<digit> [ ":" 2*<digit> [ "." *<digit> ]]
Times are recorded in the form HH:MM, HH:MM:SS or HH:MM:SS.S where HH is the hour in 24 hour format, MM is minutes and SS is seconds. All times are specified in GMT, unless the GMT-Offset attribute of the Source directive is used.
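Date and time fields could be converted as in the following Python sketch.  The names parse_date and parse_time are hypothetical.  Fractional seconds beyond six digits are truncated here because the datetime module stores microseconds; that limit comes from the sketch, not from the format.  No GMT offset is applied.

```python
import re
from datetime import date, time

_DATE = re.compile(r'([0-9]{4})-([0-9]{2})-([0-9]{2})')
_TIME = re.compile(r'([0-9]{2}):([0-9]{2})(?::([0-9]{2})(?:\.([0-9]*))?)?')


def parse_date(field: str) -> date:
    """Convert a YYYY-MM-DD field; out-of-range values raise ValueError."""
    m = _DATE.fullmatch(field)
    if m is None:
        raise ValueError("not a date field: %r" % field)
    return date(int(m.group(1)), int(m.group(2)), int(m.group(3)))


def parse_time(field: str) -> time:
    """Convert an HH:MM, HH:MM:SS or HH:MM:SS.S field."""
    m = _TIME.fullmatch(field)
    if m is None:
        raise ValueError("not a time field: %r" % field)
    seconds = int(m.group(3) or 0)
    fraction = m.group(4) or ""
    # Pad or truncate the fraction to exactly six digits (microseconds).
    microseconds = int((fraction + "000000")[:6])
    return time(int(m.group(1)), int(m.group(2)), seconds, microseconds)
```

Because the date format sorts lexically in chronological order, the raw field text can also be compared directly when only collation is needed.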

String

<string> = <wstring> | <qstring>
<wstring> = <alnum> *[ <alnum> | <punct> ]
<qstring> = <"> *<schar> <">

<schar> = <alnum> | <punct> | <wchar> | <xchar>
<xchar> = <""> | <xescape>
<xescape> = "\x" 2*<hexdigit>
<hexdigit> = <digit> | "A" | "B" | "C" | "D" | "E" | "F"
Strings are rendered in either unquoted or quoted/escaped form.  Unquoted form may only be used for strings that do not contain whitespace, reserved, or control characters, and may be parsed by simply scanning for the end of field separator (whitespace).  Quoted strings must be parsed and characters unescaped appropriately.  Empty strings may be represented by <"">.

The character escaping rules are designed to be easy to parse by a variety of conversion tools, and expressive enough to encode any character.  The basic rule is that any character that is either a special character (backslash or double quote) or not printable (such as control characters) may be encoded using the form "\xFF" where FF represents the two hex digits of the character code.  There is a special case designed for legacy conversion tools: the double quote may also be escaped by doubling the character, and should be escaped in this fashion when possible.
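The escaping rules can be sketched in Python as follows.  The names unescape_string and escape_string are hypothetical.  The sketch is lenient: it does not reject a stray backslash or an unpaired double quote inside a quoted string.  The escaping function always emits the quoted form and encodes the ISO 8859-1 control ranges as "\xFF" sequences.

```python
import re

# Backslash, "x", and two upper-case hex digits, as in the <xescape> rule.
_XESCAPE = re.compile(r'\\x([0-9A-F]{2})')


def unescape_string(field: str) -> str:
    """Return the value of a raw <string> field, quoted or unquoted."""
    if not field.startswith('"'):
        return field  # unquoted form needs no processing
    if len(field) < 2 or not field.endswith('"'):
        raise ValueError("unterminated quoted string: %r" % field)
    body = field[1:-1].replace('""', '"')
    return _XESCAPE.sub(lambda m: chr(int(m.group(1), 16)), body)


def escape_string(value: str) -> str:
    """Render a value in quoted form, doubling quotes where possible."""
    out = []
    for ch in value:
        code = ord(ch)
        if code > 0xFF:
            raise ValueError("character outside the 8-bit range: %r" % ch)
        if ch == '"':
            out.append('""')
        elif ch == "\\" or code < 0x20 or 0x7F <= code <= 0x9F:
            out.append("\\x%02X" % code)
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'
```

Doubling the quote rather than writing "\x22" follows the recommendation that the doubled form be used when possible.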

Taxonomy

The taxonomy describes the semantics of standard directives, field identifiers, and templates.  These semantics are subject to change as more feedback is received from the industry.  In particular, the semantics will be changed if necessary to follow IAB definitions and standards.

To maintain simplicity in the format, there are only two levels to specify report information: directives and fields.  This may result in overlaps and confusion about where information should be represented.  For example, a report may represent a single campaign over a number of days - in this case, a single advertiser directive with the campaign attribute suffices to represent that information.  However, if a report represents many different campaigns over a period of time, the campaign field will be required to specify that information for each entry.  To avoid unnecessary complexity, some decisions have been made as to what level is appropriate for each piece of information, but some (like campaign) may be represented by either.

Report file parsers should ignore unknown directives and fields.

Directive Names

The following standard directives are defined.  New directives not approved by the IATC should be preceded with the string "X-" as in "X-New-Directive".  Directive names are case-insensitive. All directive attributes are optional unless explicitly indicated otherwise in the description.  Arbitrary comments can be inserted in the file using the Remark directive. All directives may occur more than once within the file, enabling concatenation of conforming report files to result in a conforming file.
IARF: Version=<version>
The IARF directive is required and must appear as the first line of the report file. Subsequent occurrences of this directive must be ignored.  The version attribute is required, and defines the IARF version used by the report. This draft defines version 1.0.
Content: Charset=<character-set-identifier> Language=<language-identifier> Encoding=<encoding-identifier>
The Content directive indicates content characteristics such as character set, language, and encoding.  The Charset attribute is defined as exactly the same as the charset parameter of the Content-Type header defined in [RFC2068].  The Language and Encoding attributes are defined as exactly the same as the Content-Language and Content-Encoding headers in [RFC2068].  The default value for Charset is "ISO-8859-1". If Language is not specified, the language is not known. If Encoding is not specified, there is no encoding.  The Content directive may occur more than once, and takes effect for all entries until the next Content directive.
Format: Fields="<field-identifier> [<field-identifier> [...]]" Template=<name>
The Format directive controls the field ordering of the entries that follow.  This directive must appear in the report file before any entry lines are encountered, and may occur more than once to encode multiple formats in one file.  The Format directive will take effect for the entries that follow the directive until superseded by a subsequent Format directive.  The Fields and Template attributes are complementary, either or both may be present, and at least one is required; if both are specified they MUST agree according to the set of standard templates specified below. Non-standard field identifiers may be typed using the "Field-Info" directive.
Field-Info: Name=<field-identifier> Type=<type-identifier> Header=<string>
Optional additional information for each field; this directive is never required.  Standard fields defined below are strictly typed, and unknown fields should be considered type "string". The Name attribute refers to a field-identifier specified in the Format directive. Type identifiers convey field type information; allowed values are listed above, and are case insensitive.  The Header attribute is a string representing column header label information.
Source: Name=<string> Domain=<domain> Type=<string> GMT-Offset=<integer> Email=<address> Contact=<string>
The Source directive conveys information specific to the entity that generated the data.  Name is a human-readable name; Domain refers to the primary site domain name.  Type is used to categorize the data source; recommended values include "site" and "network".  GMT-Offset represents the offset in hours for all dates and times in both entries and directives. This directive may occur anywhere within the file and the GMT-Offset takes effect for all directives and entries following it.  If the GMT-Offset attribute does not occur, all dates and times are assumed to be GMT.  The Email and Contact attributes are for representing contact info regarding this report file.
Advertiser: Name=<string> Brand=<string> Campaign=<string>
The Advertiser directive is for advertiser-specific information.  The Name, Brand, and Campaign attributes are self-explanatory.
Agency: Name=<string> Insertion-Order=<string>
The Agency directive may be used for ad agency information.  The Name attribute identifies the agency.  The Insertion-Order attribute represents the agency-assigned insertion order number represented by the report.  Multiple insertion orders may be reported on by using multiple Agency directives.
Flight: Name=<string> Start-Date=<date> Start-Time=<time> End-Date=<date> End-Time=<time> Impression-Guarantee=<integer> Target=<string>
The Flight directive represents ad flight information as specified by the advertiser or agency. Multiple flights may be reported in one report file by using multiple Flight directives.  The Name represents an agency-assigned identifier for the flight. Impression-Guarantee represents the number of impressions guaranteed over the duration of the flight.  Target represents the agency-defined targeting criteria for the flight, which may bear no relationship to the actual delivery specified in the entries.
Created: Report-Date=<date> Report-Time=<time> Vendor=<name> Version=<version> OS=<name> OS-Version=<version>
The Created directive is used for information about how and when the report was created.  Report-Date and Report-Time represent the date and time when the report was run.  Vendor and Version represent the software used to create the report.  OS and OS-Version are used to indicate the platform this software was running on.
Remark or Rem: <text>
Comment information. Data recorded by this directive should be ignored by analysis tools.
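The scoping rule of the Format directive, under which each Format directive governs the entries that follow it until the next one, might be sketched in Python as follows.  The name rows_with_formats is hypothetical.  To stay short, the sketch handles only the Fields attribute (a Format directive without it is rejected), skips every other directive, and leaves field values in raw form.

```python
import re

# A quoted field (with doubled quotes allowed inside) or a bare token.
_TOKEN = re.compile(r'"(?:[^"]|"")*"|[^ \t]+')
# The Fields attribute of a Format directive; names are case insensitive.
_FIELDS = re.compile(r'[ \t]fields="([^"]*)"', re.IGNORECASE)


def rows_with_formats(lines):
    """Return one dict per entry, keyed by the field identifiers in effect."""
    if not lines or not lines[0].lower().startswith("#iarf:"):
        raise ValueError("report must begin with the IARF directive")
    current = None  # field identifiers of the Format directive in effect
    rows = []
    for line in lines:
        if line.strip(" \t") == "":
            continue  # lines containing only whitespace are ignored
        if line.startswith("#"):
            if line.lower().startswith("#format:"):
                found = _FIELDS.search(line)
                if found is None:
                    raise ValueError("Template-only Format is outside this sketch")
                current = found.group(1).lower().split()
            continue  # all other directives are skipped here
        if current is None:
            raise ValueError("entry found before any Format directive")
        fields = _TOKEN.findall(line)
        if len(fields) != len(current):
            raise ValueError("wrong number of fields: %r" % line)
        rows.append(dict(zip(current, fields)))
    return rows
```

Because the format in effect is ordinary state carried from line to line, two conforming report files that are concatenated are processed the same way as one file with two Format directives.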

Field Identifiers

The Fields attribute of the Format directive lists a sequence of field identifiers specifying the semantics and data type recorded in the fields of each entry. Field identifiers may be one of the following case-insensitive strings:
start-date - <date>
Date beginning the entry.  If this field is not present, interpretation of the entry date is system-specific.
start-time - <time>
Time beginning the entry.  If this field is not present, tools should default to 00:00:00.  If this field is present, end-time must also be present.
end-date - <date>
Date the entry ends, inclusive.  If this field is not present, tools should default to the same as the start-date, unless the end-time precedes the start-time, in which case the default should be the day after the start-date.
end-time - <time>
Time the entry ends, exclusive.  This field must be present if start-time is present.
ad-name - <string>
The name of the ad banner.
ad-server-id - <string>
The unique server-assigned ID of the ad banner.
ad-client-id - <string>
The agency or advertiser assigned ID for the ad.
ad-click-url - <uri>
The clickthrough URL for the ad banner.  Note that the same ad may have multiple click URLs.
placement - <string>
The name of the placement - may be a page URL, section of the site, run of site, keyword, etc.  The format of this is site-defined.
campaign - <string>
The campaign name.  Should correspond to the Campaign attribute of the Advertiser directive.
site - <string>
The site name.  Should correspond to the Name attribute of the Source directive if the Type is site.
total-impressions - <integer>
The total number of impressions recorded.  For the purposes of this draft, impressions are defined as ad media downloads, as distinct from insertions (below).
total-insertions - <integer>
The total number of insertions recorded.  For the purposes of this draft, insertions are defined as page views containing ads, as distinct from impressions (above).
total-clicks - <integer>
The total number of clicks recorded.
total-duration - <fixed>
The total time duration the ad was viewed, in seconds.
total-response-time - <fixed>
The total response time for the ad, in seconds.
total-hover-time - <fixed>
The total hover time for the ad, in seconds.
session-impressions - <integer>
The total number of impressions recorded in unique sessions.
session-insertions - <integer>
The total number of insertions recorded in unique sessions.
session-clicks - <integer>
The total number of clicks recorded in unique sessions.
unique-impressions - <integer>
The total number of impressions recorded by unique users.
unique-insertions - <integer>
The total number of insertions recorded by unique users.
unique-clicks - <integer>
The total number of clicks recorded by unique users.
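The defaulting rules given above for start-time and end-date might be sketched in Python as follows.  The name default_end_date is hypothetical.  The sketch assumes the date and time fields have already been converted to datetime.date and datetime.time values, and it does not assign a default to a missing end-time because none is defined.

```python
from datetime import date, time, timedelta


def default_end_date(start_date, start_time=None, end_time=None, end_date=None):
    """Return the end date of an entry, applying the documented defaults."""
    if start_time is not None and end_time is None:
        raise ValueError("end-time must be present when start-time is present")
    if end_date is not None:
        return end_date  # an explicit end-date always wins
    if start_time is None:
        start_time = time(0, 0, 0)  # documented default for a missing start-time
    if end_time is not None and end_time < start_time:
        # The entry runs past midnight, so it ends on the following day.
        return start_date + timedelta(days=1)
    return start_date
```

For instance, an entry dated 1997-04-01 with start-time 12:00 and end-time 00:00 ends on 1997-04-02 under this rule.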

Standard Templates

In order to promote report standardization and enable simpler report analysis tools, a set of standard templates is defined.  Agencies and advertisers may therefore require reports to conform to both the standard format, and one (or more) of the standard templates.  Analysis tools may interpret a limited number of templates, instead of requiring enough generality to interpret the entire spectrum of report types.  A template attribute should be interpreted exactly the same as the associated fields attribute; report generators are encouraged to insert both attributes, for maximum flexibility.  Templates not approved by the IATC should be preceded by the string "X-" as in "X-New-Template".  In the future, the process for standardizing templates may be turned over to a standards body such as the IAB.  For the purposes of this draft, the following standard case-insensitive template names are defined:
Template=basic <=> Fields="start-date ad-name placement total-impressions total-insertions total-clicks"
This template is the most common and most basic set of report criteria.  Impression, insertion, and click counts are aggregated daily, and reported for each ad and placement combination.
Template=adinfo <=> Fields="start-date start-time end-time ad-name ad-client-id ad-click-url placement total-impressions total-insertions total-clicks"
This template is an example of another set of report criteria.  Impression, insertion and click counts are aggregated hourly, and more ad info is included in the report.
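The equivalence between the Template and Fields attributes might be checked as in the following Python sketch.  The names TEMPLATES and resolve_format are hypothetical.  Treating an unrecognized template name (for example an experimental "X-" template) as acceptable when Fields is also given is an assumption of the sketch, not a rule stated in this draft.

```python
# The two standard templates defined above, keyed by lower-cased name.
TEMPLATES = {
    "basic": ("start-date ad-name placement total-impressions "
              "total-insertions total-clicks").split(),
    "adinfo": ("start-date start-time end-time ad-name ad-client-id "
               "ad-click-url placement total-impressions total-insertions "
               "total-clicks").split(),
}


def resolve_format(template=None, fields=None):
    """Return the field identifiers for a Format directive's attribute values."""
    if template is None and fields is None:
        raise ValueError("Format needs a Fields or a Template attribute")
    listed = fields.lower().split() if fields is not None else None
    if template is None:
        return listed
    expected = TEMPLATES.get(template.lower())
    if expected is None:
        if listed is None:
            raise ValueError("unknown template and no Fields: %r" % template)
        return listed  # assumption: fall back to Fields for unknown templates
    if listed is not None and listed != expected:
        raise ValueError("Fields and Template attributes disagree")
    return list(expected)
```

A report generator that writes both attributes allows an analysis tool to run this check and detect a mismatch before any entries are read.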

Example

The following is an example file in the report format:
#IARF: Version=1.0
#Format: Template=basic Fields="start-date ad-name placement total-impressions total-insertions total-clicks"
#Source: Name="Content Provider" Domain=www.content.com GMT-Offset=-8
#Advertiser: Name=Ford Campaign="Explore the world"
#Agency: Name="Funky Agency" Insertion-Order=11783
#Flight: Start-Date=1997-04-01 End-Date=1997-04-30 Impression-Guarantee=1000000
#Created: Report-Date=1997-04-03 Vendor=NetGravity Version=3.0
1997-04-01 "Ford Explorer" "Sports section"         10253 0  843
1997-04-01 "Ford Explorer" "Keyword: outdoors"       2543 0   85
1997-04-01 "Ford Taurus"   "Entertainment section"  84922 0 1024
1997-04-02 "Ford Explorer" "Sports section"         10765 0  682
#Remark: This IARF stuff is pretty cool!
The next example uses a slightly more complex template.
#IARF: Version=1.0
#Format: Template=adinfo Fields="start-date start-time end-time ad-name ad-client-id ad-click-url placement total-impressions total-insertions total-clicks"
#Source: Name="Content Provider" Domain=www.content.com GMT-Offset=-8
#Advertiser: Name=Microsoft Campaign="Try Java"
#Agency: Name="Funky Agency" Insertion-Order=11784
#Flight: Start-Date=1997-04-01 End-Date=1997-04-30 Impression-Guarantee=1000000
#Created: Report-Date=1997-04-03 Vendor=NetGravity Version=3.0
1997-04-01 00:00 12:00 "MS Image ad" msimage.gif http://www.microsoft.com/ "Javaless browsers" 10253 0 843
1997-04-01 00:00 12:00 "MS Java ad" msjava.class http://www.microsoft.com/ "Java capable" 0 2543 175
1997-04-01 12:00 00:00 "MS Image ad" msimage.gif http://www.microsoft.com/ "Javaless browsers" 11874 0 735
1997-04-01 12:00 00:00 "MS Java ad" msjava.class http://www.microsoft.com/ "Java capable" 0 2278 156
The last example includes some experimental directives, attributes, and fields.
#IARF: Version=1.0
#Format: Fields="start-date start-time end-time ad-name x-ad-size placement total-impressions total-clicks"
#Field-Info: Name=x-ad-size Type=string Header="Ad Size"
#Advertiser: Name="The Big Little Co" Campaign="Size wise"
#Agency: Name="Funky Agency" Insertion-Order=11785 Contact="John Smith"
#X-Site-Sizes: Allowed="banner button"
1997-04-01 08:00 09:00 "Big Ad"    banner "Sports"    10253  843
1997-04-01 08:00 09:00 "Little Ad" button "Sports"     2543   85
1997-04-01 09:00 10:00 "Big Ad"    banner "Sports"    84922 1024
1997-04-01 09:00 10:00 "Little Ad" button "Sports"    10765  682
#Advertiser: Name="BigCo, LTD" Campaign="We're Big!"
#Agency: Insertion-Order="94378"
1997-04-01 10:00 11:00 "Tall Ad"  banner "Sports"    10158  729
1997-04-01 10:00 11:00 "Short Ad" button "Sports"     3097   79
1997-04-01 11:00 12:00 "Tall Ad"  banner "Sports"    96123 1083
1997-04-01 11:00 12:00 "Short Ad" button "Sports"    11074  701

Acknowledgements

Steve Goldberg pushed the IATC to develop this format. Paul Nakada contributed the idea of attribute value pairs for directives. The IATC provided much feedback.

References

[Hallam96]
P. Hallam-Baker, B. Behlendorf, Extended Log File Format, March 1996
[RFC1808]
R. Fielding, Relative Uniform Resource Locators, June 1995
[RFC1738]
T. Berners-Lee, L. Masinter, Uniform Resource Locators (URL), December 1994
[RFC2068]
R. Fielding, et al, Hypertext Transfer Protocol -- HTTP/1.1, January 1997
[RFC1700]
J. Reynolds, J. Postel, Assigned Numbers, STD 2, RFC 1700, USC/ISI, October 1994
$Id: WD-adreport-19970515.html,v 1.3 1999/02/19 01:33:21 ts Exp $