Internet Advertising Technology Council

WD-adreport-19970505

Internet Advertising Report Format

IATC Working Draft WD-adreport-19970505

Latest version:
http://www.basswoodassoc.com/standards/WD-adreport.html
This version:
http://www.basswoodassoc.com/standards/WD-adreport.html-19970505
Author:
Tom Shields <tom.shields@basswoodassoc.com> 

Status of this document

This is an IATC Working Draft for review by IATC members and other interested parties. It is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to use IATC Working Drafts as reference material or to cite them as other than "work in progress".

Note: since working drafts are subject to frequent change, you are advised to reference the above URL, rather than the URLs for working drafts themselves.

Abstract

This document presents a standard format for reporting internet ad performance information.  The format is extensible, portable, and easy to parse into a variety of data stores.  The proposal is motivated by the need to capture and compare ad performance information independent of the software used to perform the ad delivery.

Introduction

Advertisers and agencies who plan internet ad campaigns require ad performance reports that are comparable across web sites.  Because many web sites use different ad delivery software, agencies are currently expending considerable manual effort reconciling these reports.  One fixed format report is not enough, because different campaigns require reporting of different kinds of information.  This proposal defines a self-describing report format that is extensible and flexible enough to report many kinds of ad performance information, but simple and strict enough to be easily parsed and converted into data stores for analysis and comparison.

This proposal is broken into two major sections.  The first is a description of the file format itself, with emphasis on the encoding and ordering of information for easy parsing.  The second part is a "taxonomy" of standard headers, columns, and report templates, to ensure the utility and comparability of the information.

The scope of the report format is deliberately limited.  This proposal makes no attempt to define the performance measures used by advertisers, agencies, and ad delivery systems, except to provide for the reporting of them.  It explicitly does not address the transmission of this data, for example from a web site to an advertiser or agency.  The format does not define presentation information; it is designed to be readable, but not camera-ready.

This work is distantly based on the W3C Extended Log File Format [Hallam96] working draft.

Format

An advertising report file contains a sequence of lines containing US-ASCII (ISO-8859-1) characters terminated by either the sequence LF or CRLF. Report generators should follow the line termination convention for the platform on which they are executed. Analyzers and converters must accept either form. Each line may contain either a directive or an entry. Directives record metadata about the report. Entries consist of a sequence of fields relating to an ad event or series of events. Fields are units of information, separated by whitespace.

Report file parsers should be tolerant of errors. If an entry or directive contains corrupt data or is terminated unexpectedly the parser should resynchronize using the end of line marker and continue to parse the following lines.
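The resynchronization rule above can be sketched in Python as follows. This is a minimal illustration, not part of the format: the function name parse_tolerantly and the convention that a line parser signals corrupt data by raising ValueError are assumptions made for the example.

```python
def parse_tolerantly(text, parse_line):
    """Apply parse_line to every line; skip lines that fail to parse."""
    results, skipped = [], []
    # Normalise CRLF to LF so that both termination conventions are accepted.
    lines = text.replace("\r\n", "\n").split("\n")
    for number, line in enumerate(lines, 1):
        if line == "":
            continue  # nothing to parse on an empty line
        try:
            results.append(parse_line(line))
        except ValueError:
            # Corrupt line: record it and resynchronise at the next line.
            skipped.append(number)
    return results, skipped


# Hypothetical usage with int() standing in for a real line parser.
values, bad_lines = parse_tolerantly("1\r\n2\nxx\n3\n", int)
print(values, bad_lines)
```

Keeping the line numbers of skipped lines lets a conversion tool report the damage without aborting the whole import.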

Directives

Directives are lines beginning with the '#' character.  No whitespace is permitted between the '#' and the directive name. Directive data takes the form of attribute value pairs.  Defined attributes may occur in any order.  If an attribute occurs more than once, the last occurrence is taken as the proper value.  Parsers should ignore unknown directives, unknown attributes, and directives with parse errors.  Every conforming report file must begin with the directive IARF, and contain other required directives as described below.
<directive> = "#" <name> ":" *<nvpair> <end-of-line>
<name> = *[ <alnum> | "-" ]
<nvpair>  =  <whitespace> <name> "=" <string>
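As an illustration of the grammar above, a directive line might be parsed as in the following Python sketch. The function name parse_directive and the choice to return None on a parse error are assumptions for this example. Attribute values are returned in raw form, with any surrounding quotes intact. Free-text directives such as Remark do not match the attribute grammar and also yield None here, which is harmless because analysis tools ignore them.

```python
import re

# One attribute: whitespace, a name, "=", then a quoted or unquoted value.
_PAIR = re.compile(r'\s+([A-Za-z0-9-]+)=("(?:[^"]|"")*"|[^\s"]+)')
_HEAD = re.compile(r'#([A-Za-z0-9-]+):')


def parse_directive(line):
    """Return (name, attributes), or None if the line has a parse error."""
    head = _HEAD.match(line)
    if head is None:
        return None
    line = line.rstrip()
    attributes = {}
    pos = head.end()
    while pos < len(line):
        pair = _PAIR.match(line, pos)
        if pair is None:
            return None  # malformed attribute; the caller ignores the directive
        # A repeated attribute overwrites the earlier value (last one wins).
        attributes[pair.group(1)] = pair.group(2)
        pos = pair.end()
    return head.group(1), attributes


print(parse_directive('#Site: Name="Content Provider" GMT-Offset=-8'))
```

Storing attributes in a dictionary gives the "last occurrence wins" behaviour without extra code.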

Entries

Entries are analogous to database rows.  An entry consists of a sequence of fields, separated by whitespace.  All lines not beginning with '#' are considered entries.  If the first field of an entry begins with '#', the field must be surrounded by double quotes ("") to distinguish it from a directive.  Entries are not required in report files, although a report file with no entries may not convey much information.  If an entry contains fewer than the required number of fields as defined by the Fields attribute of the Format directive, the entire entry should be ignored.  If an entry contains extra fields, the extra fields should be ignored.  Blank lines should be ignored, because by definition they will not have enough fields in them. No line continuation character is defined - entries must be completely contained on one line.
<entry> = [ <field> *[ <whitespace> <field> ]] <end-of-line>
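The field-count rules above can be sketched as follows. This is an illustrative Python fragment; the name split_entry and its field_count parameter (the number of identifiers currently in force, assumed to be at least one) are assumptions, and fields are returned in raw form without unescaping.

```python
import re

# A field is either a quoted string (with "" as an escaped quote) or a run
# of non-whitespace characters.
_FIELD = re.compile(r'"(?:[^"]|"")*"|\S+')


def split_entry(line, field_count):
    """Split an entry into raw fields; return None if it must be ignored."""
    fields = _FIELD.findall(line)
    if len(fields) < field_count:
        return None  # too few fields (this also covers blank lines)
    return fields[:field_count]  # extra fields are dropped


row = '1997-04-01 "Ford Explorer" "Sports section"         10253  843'
print(split_entry(row, 5))
```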

Fields

Fields are individual data entities analogous to database columns in a row.  Fields conform to one of the following data types, according to the field definition. Fields without definitions should be considered as data type <string>. Fields may not contain ASCII control characters, or characters outside the 7-bit US-ASCII set, unless they are encoded as described below. Because fields are separated by whitespace, fields must consist of at least one non-whitespace character.
<field> = <integer> | <fixed> | <uri> | <date> | <time> | <string>
<type-identifier> = "integer" | "fixed" | "uri" | "date" | "time" | "string"

<digit>        = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<lowalpha>     = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
                 "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
                 "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
<hialpha>      = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
<punct>        = "$" | "-" | "_" | "." | "+" | "!" | "*" | "(" | ")" |
                 "{" | "}" | "|" | "&" | "^" | "~" | "[" | "]" | "," |
                 "<" | ">" | "#" | "%" | ";" | "/" | "'" | "`" | "=" |
                 "?" | ":" | "@"
<reserved>     = <"> | "\"
<wchar>        = <tab> | <space>
<whitespace>   = <wchar> *<wchar>
<alnum>        = <lowalpha> | <hialpha> | <digit>
<version>      = <digit> *<digit> "." <digit> *<digit> *[ "." <digit> *<digit> ]

Integer

<integer> = [ "-" ] *<digit>
Integers are represented as a sequence of digits, base 10.

Fixed

<fixed> = [ "-" ] [ <digit> *<digit> [ "." *<digit> ]]
Fixed-point numbers are represented with at least one digit to the left of the decimal point, and with arbitrary precision.
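Because the precision is arbitrary, a converter may prefer a decimal type over binary floating point. The following Python sketch shows one way to do that; the name parse_fixed is hypothetical, and requiring at least one digit before the decimal point follows the prose description rather than the more permissive grammar.

```python
import re
from decimal import Decimal

# Optional minus sign, at least one digit, then an optional decimal part.
_FIXED = re.compile(r'-?[0-9]+(?:\.[0-9]*)?')


def parse_fixed(text):
    """Convert a <fixed> field to a Decimal without losing precision."""
    if _FIXED.fullmatch(text) is None:
        raise ValueError("not a fixed-point field: %r" % text)
    return Decimal(text)


print(parse_fixed("3.14159265358979323846"))
```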

URI

A URI as specified by [RFC1738]; relative URIs are specified by [RFC1808]. By definition, URIs cannot include whitespace or ASCII control characters, so they do not need to be escaped or enclosed in quotes.

Date

<date>  = 4*<digit> "-" 2*<digit> "-" 2*<digit>
Dates are recorded in the format YYYY-MM-DD where YYYY, MM and DD stand for the numeric year, month and day respectively. All dates are specified in GMT, unless the GMT-Offset attribute of the Site directive is used. This format is chosen to assist collation.

Time

<time>  = 2*<digit> ":" 2*<digit> [ ":" 2*<digit> [ "." *<digit> ]]
Times are recorded in the form HH:MM, HH:MM:SS or HH:MM:SS.S where HH is the hour in 24 hour format, MM is minutes and SS is seconds. All times are specified in GMT, unless the GMT-Offset attribute of the Site directive is used.
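The date and time fields can be combined into a single timestamp, as in the Python sketch below. The function name parse_timestamp is hypothetical. The sketch assumes that a GMT-Offset of -8 means local time is eight hours behind GMT, and it uses 00:00 when no time field is supplied; both are assumptions of this example.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(date_text, time_text="00:00", gmt_offset_hours=0):
    """Combine a <date> and a <time> field into a timezone-aware datetime."""
    year, month, day = (int(part) for part in date_text.split("-"))
    pieces = time_text.split(":")
    hour, minute = int(pieces[0]), int(pieces[1])
    # Seconds are optional and may carry a fractional part (HH:MM:SS.S).
    sec_text = pieces[2] if len(pieces) > 2 else "0"
    whole, _, fraction = sec_text.partition(".")
    micro = int((fraction + "000000")[:6])  # pad or truncate to microseconds
    zone = timezone(timedelta(hours=gmt_offset_hours))
    return datetime(year, month, day, hour, minute, int(whole), micro, tzinfo=zone)


stamp = parse_timestamp("1997-04-01", "12:00", gmt_offset_hours=-8)
print(stamp.astimezone(timezone.utc).isoformat())  # 1997-04-01T20:00:00+00:00
```

Malformed input raises ValueError (from int or datetime), so a tolerant reader can catch it and skip the entry.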

String

<string> = <wstring> | <qstring>
<wstring> = <alnum> *[ <alnum> | <punct> ]
<qstring> = <"> *<schar> <">

<schar> = <alnum> | <punct> | <wchar> | <xchar>
<xchar> = <""> | <xescape>
<xescape> = "\x" 2*<hexdigit>
<hexdigit> = <digit> | "A" | "B" | "C" | "D" | "E" | "F"
Strings are rendered in either unquoted or quoted/escaped form.  Unquoted form may only be used for strings that do not contain whitespace, reserved, or control characters, and may be parsed by simply scanning for the end of field separator (whitespace).  Quoted strings must be parsed and characters unescaped appropriately.  Empty strings may be represented by <"">.

The character escaping rules are designed to be easy to parse by a variety of conversion tools, and expressive enough to encode any characters.  The basic rule is any character that is either a special character (backslash or double quote) or not printable (such as control characters) may be encoded using the form "\xFF" where the FF represents the two hex digits of the character.  There is a special case designed for legacy conversion tools: the double quote may also be escaped by doubling the character, and should be escaped in this fashion when possible.
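The escaping rules can be sketched as a pair of Python functions. The names unescape_field and escape_field are hypothetical. The encoder below always quotes its output and also hex-escapes tab characters, which is a conservative choice made for this example rather than a requirement of the format.

```python
import re

# Either a doubled quote or a "\x" escape with two upper-case hex digits.
_ESCAPE = re.compile(r'""|\\x([0-9A-F]{2})')


def unescape_field(raw):
    """Decode a raw <string> field; unquoted fields are returned unchanged."""
    if len(raw) < 2 or raw[0] != '"' or raw[-1] != '"':
        return raw

    def decode(match):
        if match.group(1) is None:
            return '"'  # doubled quote
        return chr(int(match.group(1), 16))  # \xHH escape

    return _ESCAPE.sub(decode, raw[1:-1])


def escape_field(text):
    """Encode text as a quoted string, doubling quotes where possible."""
    out = []
    for ch in text:
        if ch == '"':
            out.append('""')
        elif ch == "\\" or not (" " <= ch <= "~"):
            if ord(ch) > 0xFF:
                raise ValueError("cannot encode %r in two hex digits" % ch)
            out.append("\\x%02X" % ord(ch))
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


print(escape_field('12" monitor'))
print(unescape_field('"12"" monitor"'))
```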

Taxonomy

Directive Names

The following standard directives are defined.  New directives not approved by the IATC should be preceded with the string "X-" as in "X-New-Directive".  Directive names are case-insensitive. Arbitrary comments can be inserted in the file using the Remark directive.
IARF: Version=<version>
The IARF directive is required and must appear exactly once as the first line of the report file. The version attribute defines the version of the report format. This draft defines version 1.0.
Format: Fields="<field-identifier> [<field-identifier> [...]]" Template=<name>
The Format directive controls the field ordering of the entries that follow.  A Format directive must appear in the report file before any entry lines are encountered.  Further Format directives may occur anywhere within the report file; each takes effect for the entries that follow it.  The Fields and Template attributes are complementary: either or both may be present, and at least one is required.  See below for a list of standard field identifiers and templates.  Non-standard field identifiers may be typed using the "Field-Info" directive.
Field-Info: Name=<field-identifier> Type=<type-identifier> Header=<string>
Optional additional information for each field; this directive is never required.  Standard fields defined below are strictly typed, and unknown fields should be considered type "string". The Name attribute refers to a field-identifier specified in the Format directive. Type identifiers convey field type information; allowed values are listed above, and are case insensitive.  The Header attribute is a string representing column header label information.
Site: Name=<string> Domain=<domain> GMT-Offset=<integer> Email=<address> Contact=<string>
The Site directive conveys information specific to the site that generated the data.  Name is a human-readable name, and Domain refers to the primary site domain name.  GMT-Offset represents the offset in hours for all dates and times in both entries and directives. This directive may occur anywhere within the file and the GMT-Offset takes effect for all directives and entries following it.  If the GMT-Offset attribute does not occur, all dates and times are assumed to be GMT.  The Email and Contact attributes are for representing contact info regarding this report file.
Advertiser: Name=<string> Brand=<string> Campaign=<string>
The Advertiser directive is for advertiser-specific information.  The Name, Brand, and Campaign attributes are self-explanatory.
Agency: Name=<string> Insertion-Order=<string>
The Agency directive may be used for ad agency information.  The Name attribute identifies the agency.  The Insertion-Order attribute represents the agency-assigned insertion order number represented by the report.  Multiple insertion orders may be reported on by using multiple Agency directives.
Flight: Name=<string> Start-Date=<date> Start-Time=<time> End-Date=<date> End-Time=<time> Impression-Guarantee=<integer>
The Flight directive represents flight information. Multiple flights may be reported in one report file by using multiple Flight directives.  The Name represents an identifier for the flight. Impression-Guarantee represents the number of impressions guaranteed over the duration of the flight.
Created: Report-Date=<date> Report-Time=<time> Vendor=<name> Version=<version> OS=<name> OS-Version=<version>
The Created directive is used for information about how and when the report was created.  Report-Date and Report-Time represent the date and time when the report was run.  Vendor and Version represent the software used to create the report.  OS and OS-Version are used to indicate the platform this software was running on. This directive may occur only once, and should occur near the top.
Remark or Rem: <text>
Comment information. Data recorded by this directive should be ignored by analysis tools.
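Since a Format directive applies to the entries that follow it, a reader has to carry the current field list as state while scanning the file. The Python sketch below illustrates this for the Fields attribute only. The name read_entries is hypothetical, matching the attribute name case-insensitively is an assumption, and all other directives are skipped.

```python
import re

_FIELD = re.compile(r'"(?:[^"]|"")*"|\S+')
# Assumption: the attribute name is matched case-insensitively.
_FIELDS_ATTR = re.compile(r'\sFields="([^"]*)"', re.IGNORECASE)


def read_entries(lines):
    """Yield one dict per entry, keyed by the field identifiers in force."""
    fields = None
    for line in lines:
        if line.startswith("#"):
            name = line[1:].split(":", 1)[0].lower()
            found = _FIELDS_ATTR.search(line)
            if name == "format" and found:
                # Field identifiers are case-insensitive, so fold to lower case.
                fields = found.group(1).lower().split()
            continue  # other directives are not interpreted in this sketch
        values = _FIELD.findall(line)
        if fields is None or len(values) < len(fields):
            continue  # no Format yet, or too few fields: ignore the entry
        yield dict(zip(fields, values))  # zip drops any extra fields


report = [
    '#IARF: Version=1.0',
    '#Format: Fields="start-date clicks"',
    '1997-04-01 843',
    '#Format: Fields="start-date ad-name clicks"',
    '1997-04-02 "Big Ad" 682',
]
for entry in read_entries(report):
    print(entry)
```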

Field Identifiers

The Fields attribute of the Format directive lists a sequence of field identifiers specifying the semantics and data type recorded in the fields of each entry. Field identifiers may be one of the following case-insensitive strings:
start-date - <date>
Date beginning the entry.  If this field is not present, interpretation of the entry date is system-specific.
start-time - <time>
Time beginning the entry.  If this field is not present, tools should default to 00:00:00.
end-date - <date>
Date the entry ends.  If this field is not present, tools should default to the same as the start-date.
end-time - <time>
Time the entry ends.  If this field is not present, tools should default to 23:59:59.999.
ad-name - <string>
The name of the ad banner.
ad-id - <string>
The unique server-assigned ID of the ad banner.
ad-media-filename - <string>
The filename of the ad media.
ad-click-url - <uri>
The clickthrough URL for the ad banner.
placement - <string>
The name of the placement - may be a page URL, section of the site, run of site, keyword, etc.
impressions - <integer>
The number of impressions recorded.
insertions - <integer>
The number of insertions recorded.
clicks - <integer>
The number of clicks recorded.
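The defaults listed above for the date and time fields could be applied as in this Python sketch. The name apply_time_defaults is hypothetical, and entries are assumed to be dictionaries keyed by lower-case field identifier.

```python
def apply_time_defaults(entry):
    """Return a copy of entry with the default time-span fields filled in."""
    filled = dict(entry)
    filled.setdefault("start-time", "00:00:00")
    filled.setdefault("end-time", "23:59:59.999")
    # end-date defaults to start-date. If start-date is absent as well, the
    # interpretation is system-specific, so nothing is filled in.
    if "end-date" not in filled and "start-date" in filled:
        filled["end-date"] = filled["start-date"]
    return filled


print(apply_time_defaults({"start-date": "1997-04-01", "clicks": "843"}))
```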

Standard Templates

In order to promote report standardization and enable simpler report analysis tools, a set of standard templates is defined.  Agencies may therefore require reports to conform to both the standard format, and one (or more) of the standard templates.  Analysis tools may interpret a limited number of templates, instead of requiring enough generality to interpret the entire spectrum of report types.  A Template attribute should be interpreted exactly the same as the associated Fields attribute; report generators are encouraged to include both attributes, for maximum flexibility.  Templates not approved by the IATC should be preceded by the string "X-" as in "X-New-Template".  The following standard case-insensitive template names are defined:
Template=basic <=> Fields="start-date ad-name placement impressions clicks"
This template is the most common and most basic set of report criteria.  Impression and click counts are aggregated daily, and reported for each ad and placement combination.
Template=adinfo <=> Fields="start-date start-time ad-name ad-media-filename ad-click-url placement impressions insertions clicks"
This template is an example of another set of report criteria.  Impression, insertion and click counts are aggregated daily, and more ad info is included in the report.
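The equivalence between template names and field lists can be captured in a lookup table, as in the Python sketch below. The names TEMPLATES and resolve_fields are hypothetical. Preferring an explicit Fields value when both attributes are present is an assumption of this example; the two are meant to agree.

```python
# Standard templates and the field lists they stand for.
TEMPLATES = {
    "basic": "start-date ad-name placement impressions clicks",
    "adinfo": "start-date start-time ad-name ad-media-filename ad-click-url "
              "placement impressions insertions clicks",
}


def resolve_fields(fields=None, template=None):
    """Return the list of field identifiers for a Format directive."""
    if fields is not None:
        return fields.lower().split()  # explicit Fields value takes priority
    if template is not None and template.lower() in TEMPLATES:
        return TEMPLATES[template.lower()].split()
    raise ValueError("need a Fields attribute or a known Template")


print(resolve_fields(template="Basic"))
```

A tool that understands only the standard templates can call resolve_fields with the Template value alone and reject reports for which it raises ValueError.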

Example

The following is an example file in the report format:
#IARF: Version=1.0
#Format: Template=basic Fields="start-date ad-name placement impressions clicks"
#Site: Name="Content Provider" Domain=www.content.com GMT-Offset=-8
#Advertiser: Name=Ford Campaign="Explore the world"
#Agency: Name="Funky Agency" Insertion-Order=11783
#Flight: Start-Date=1997-04-01 End-Date=1997-04-30 Impression-Guarantee=1000000
#Created: Report-Date=1997-04-03 Vendor=NetGravity Version=3.0
1997-04-01 "Ford Explorer" "Sports section"         10253  843
1997-04-01 "Ford Explorer" "Keyword: outdoors"       2543   85
1997-04-01 "Ford Taurus"   "Entertainment section"  84922 1024
1997-04-02 "Ford Explorer" "Sports section"         10765  682
#Remark: This IARF stuff is pretty cool!
The next example uses a slightly more complex template.
#IARF: Version=1.0
#Format: Template=adinfo Fields="start-date start-time ad-name ad-media-filename ad-click-url placement impressions insertions clicks"
#Site: Name="Content Provider" Domain=www.content.com GMT-Offset=-8
#Advertiser: Name=Microsoft Campaign="Try Java"
#Agency: Name="Funky Agency" Insertion-Order=11784
#Flight: Start-Date=1997-04-01 End-Date=1997-04-30 Impression-Guarantee=1000000
#Created: Report-Date=1997-04-03 Vendor=NetGravity Version=3.0
1997-04-01 00:00 "MS Image ad" msimage.gif http://www.microsoft.com/ "Javaless browsers" 10253 0 843
1997-04-01 00:00 "MS Java ad" msjava.class http://www.microsoft.com/ "Java capable" 0 2543 175
1997-04-01 12:00 "MS Image ad" msimage.gif http://www.microsoft.com/ "Javaless browsers" 11874 0 735
1997-04-01 12:00 "MS Java ad" msjava.class http://www.microsoft.com/ "Java capable" 0 2278 156
The last example includes some experimental directives, attributes, and fields.
#IARF: Version=1.0
#Format: Fields="start-date start-time ad-name x-ad-size placement impressions clicks"
#Field-Info: Name=x-ad-size Type=string Header="Ad Size"
#Advertiser: Name="The Big Little Co" Campaign="Size wise"
#Agency: Name="Funky Agency" Insertion-Order=11785 Contact="John Smith"
#X-Site-Sizes: Allowed="banner button"
1997-04-01 08:00 "Big Ad"    banner "Sports"    10253  843
1997-04-01 08:00 "Little Ad" button "Sports"     2543   85
1997-04-01 09:00 "Big Ad"    banner "Sports"    84922 1024
1997-04-01 09:00 "Little Ad" button "Sports"    10765  682
1997-04-01 10:00 "Big Ad"    banner "Sports"    10158  729
1997-04-01 10:00 "Little Ad" button "Sports"     3097   79
1997-04-01 11:00 "Big Ad"    banner "Sports"    96123 1083
1997-04-01 11:00 "Little Ad" button "Sports"    11074  701

Acknowledgements

Steve Goldberg pushed the IATC to develop this format. Paul Nakada contributed the idea of attribute value pairs for directives.

References

[Hallam96]
P. Hallam-Baker, B. Behlendorf, Extended Log File Format, March 1996
[RFC1808]
R. Fielding, Relative Uniform Resource Locators, June 1995
[RFC1738]
T. Berners-Lee, L. Masinter, M. McCahill, Uniform Resource Locators (URL), December 1994
$Id: WD-adreport-19970505.html,v 1.3 1999/02/19 01:33:21 ts Exp $