Documentation

From SF Data Wiki

Revision as of 19:46, 17 September 2009 by Jnath

Data Consumer Requirements

Data Availability

  1. Data feeds should be browseable by type and agency name.
  2. Data feeds should be searchable by tags and/or keywords.
  3. "Live" feeds, with a snapshot of the latest information in a dataset, should be made available via a URL in an XML format (such as RSS or Atom).
  4. Downloadable data sets should be available for regular time periods (e.g., by month or by year).
  5. Historical data sets should be made available going back at least x months or x years.
  6. Custom downloads (e.g., selecting a specific ward in the city, a type of crime, or a class of license) should be supported.
  7. Multiple download options should be supported (e.g., HTTP, FTP).
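
As an illustration of items 1 and 2, browsing and searching could be sketched over a small in-memory catalog. This is only a minimal sketch: the `Feed` class, its field names, the `browse` and `search` helpers, and the sample entries are all assumptions made for the example, not part of any existing SF data service.

```python
from dataclasses import dataclass, field


@dataclass
class Feed:
    """One catalog entry; every field name here is an illustrative assumption."""
    name: str
    agency: str
    data_type: str
    tags: set = field(default_factory=set)


def browse(feeds, agency=None, data_type=None):
    """Return feeds matching the given agency and/or type (None means 'any')."""
    return [
        f for f in feeds
        if (agency is None or f.agency == agency)
        and (data_type is None or f.data_type == data_type)
    ]


def search(feeds, keyword):
    """Return feeds whose name or tags contain the keyword, ignoring case."""
    needle = keyword.lower()
    return [
        f for f in feeds
        if needle in f.name.lower() or any(needle in t.lower() for t in f.tags)
    ]


# Made-up sample entries, not real datasets.
CATALOG = [
    Feed("Crime incidents", "Police", "public safety", {"crime", "incidents"}),
    Feed("Business licenses", "Treasurer", "licensing", {"license", "business"}),
    Feed("Service requests", "311", "public works", {"requests", "potholes"}),
]
```

With these sample entries, `browse(CATALOG, agency="Police")` would return only the crime feed, and `search(CATALOG, "license")` only the licensing feed. A real catalog would likely sit behind a database or search index rather than a Python list.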

Download Formats

  1. At a minimum, the following formats should be supported (others as appropriate):
    1. XML
    2. CSV
    3. KML/ESRI (where appropriate, i.e., when the data has a locational component)
    4. JSON
  2. Proprietary and non-malleable data formats (e.g., Excel, PDF) should be avoided wherever possible.
  3. Data should include government-specific identifiers where appropriate (e.g., crime ID, parcel ID, service request ID). Ideally, these could be used as primary keys (PKs) in a database, provided they are unique. If a public-facing ID is generated, it should be namespaced (e.g., SF#######) to allow for multi-city use.
  4. Data fields should have descriptive names where appropriate (e.g., avoid field1, field2).
  5. Field names with a specific meaning should have a corresponding description in a data dictionary (e.g., FY2009SPECIALFUNDING).
  6. Fields should have specified minimum and maximum widths, i.e., how many characters a specific field can contain.
  7. Field values should be easily convertible to a specific data type (e.g., date information should be formatted as timestamps, booleans as 0/1 or Y/N).
  8. With the exception of narrative fields, each field should take one of a finite set of values (e.g., license types should be one of a predefined set). This will make data validation easier.
  9. Narrative fields should avoid the use of abbreviations, acronyms, and other shorthand language constructs wherever possible. This can help when presenting information in nontraditional UIs.
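
To show how items 3 and 6 through 8 make a consumer's job easier, here is a minimal validation sketch. The field names, the seven-digit reading of the `SF#######` pattern, the width limit, the timestamp layout, and the set of license types are all assumptions chosen for illustration; an actual dataset would define its own rules in its data dictionary.

```python
import re
from datetime import datetime

# Hypothetical rules for one dataset; all of these values are assumptions.
ID_PATTERN = re.compile(r"SF\d{7}")           # namespaced ID such as SF0001234
LICENSE_TYPES = {"RESTAURANT", "RETAIL", "TAXI"}
MAX_NAME_WIDTH = 40                           # maximum characters in business_name
BOOLEAN_VALUES = {"Y": True, "N": False, "1": True, "0": False}
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_record(row):
    """Convert a row of strings into typed values; raise ValueError on bad data."""
    if not ID_PATTERN.fullmatch(row["license_id"]):
        raise ValueError(f"bad license_id: {row['license_id']!r}")
    if row["license_type"] not in LICENSE_TYPES:
        raise ValueError(f"unknown license_type: {row['license_type']!r}")
    if not 1 <= len(row["business_name"]) <= MAX_NAME_WIDTH:
        raise ValueError("business_name is empty or too wide")
    if row["active"] not in BOOLEAN_VALUES:
        raise ValueError(f"bad boolean: {row['active']!r}")
    return {
        "license_id": row["license_id"],
        "license_type": row["license_type"],
        "business_name": row["business_name"],
        # strptime itself raises ValueError if the timestamp is malformed.
        "issued": datetime.strptime(row["issued"], TIMESTAMP_FORMAT),
        "active": BOOLEAN_VALUES[row["active"]],
    }
```

Because every non-narrative field is either pattern-checked, width-checked, or drawn from a finite set, a bad row fails early with a specific message instead of silently entering a database.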

Miscellaneous

  1. All data (whether a live feed or a download) should have a corresponding description file with detailed explanations of the type of information included in the data set, its source, date created, date changed, field definitions, data types (e.g., currency, boolean, narrative), etc.
  2. Application ideas - (API) (Feedback) (Sort and Search) (Healthcare) (Web Service) (Visualization)
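
The description file from item 1 could take many forms. The sketch below assumes a JSON layout, and the key names, the `REQUIRED_KEYS` set, the `missing_keys` helper, and every sample value are hypothetical placeholders rather than an agreed schema.

```python
import json

# Keys a description file might be expected to carry (an assumption).
REQUIRED_KEYS = {
    "dataset", "summary", "source", "date_created", "date_changed", "fields",
}

# Placeholder values only; this does not describe a real dataset.
description = {
    "dataset": "business_licenses",
    "summary": "One row per issued business license.",
    "source": "Example agency",
    "date_created": "2009-09-17",
    "date_changed": "2009-09-17",
    "fields": [
        {"name": "license_id", "type": "identifier",
         "definition": "Namespaced public-facing ID."},
        {"name": "fee", "type": "currency",
         "definition": "License fee in US dollars."},
        {"name": "active", "type": "boolean",
         "definition": "Y if the license is currently valid."},
        {"name": "notes", "type": "narrative",
         "definition": "Free-text remarks."},
    ],
}


def missing_keys(desc):
    """Return the required keys absent from a description, in sorted order."""
    return sorted(REQUIRED_KEYS - desc.keys())


def to_json(desc):
    """Serialize a description so it can be published alongside the data."""
    return json.dumps(desc, indent=2)
```

Calling `to_json(description)` yields text that could be published next to each feed or download, and `missing_keys` gives publishers a quick completeness check before release.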