Design

From SF Data Wiki

Common data format

See also: Architecture

Ingestion

Purpose: Get the data and metadata.
Result: One or more files containing data and metadata.
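As a rough illustration of this stage, the sketch below stores a downloaded payload next to a JSON metadata file. The function name `ingest`, the file naming scheme, and the metadata fields are all assumptions made for the example; the wiki does not prescribe a layout.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def ingest(raw_bytes, source_url, out_dir):
    """Write the raw data and a JSON metadata sidecar (hypothetical layout)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Name both files after the content hash so re-ingesting the same
    # payload overwrites the same pair instead of creating duplicates.
    digest = hashlib.sha256(raw_bytes).hexdigest()
    data_path = out_dir / f"{digest[:16]}.dat"
    meta_path = out_dir / f"{digest[:16]}.meta.json"

    data_path.write_bytes(raw_bytes)

    # Assumed metadata fields; a real source would likely supply more.
    metadata = {
        "source_url": source_url,
        "sha256": digest,
        "size_bytes": len(raw_bytes),
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
    }
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return data_path, meta_path
```

Keeping the raw bytes untouched and recording the metadata separately means the Processing stage can always be re-run against the original input.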

More Ingestion >>

Processing

Purpose: Validate and store data elements, list exceptions.
Result: Clean data in some database format.
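A minimal sketch of this stage, assuming records arrive as dictionaries and that "valid" simply means every required field is present and non-empty. The function names, the `elements` table, and its `name`/`value` columns are hypothetical; SQLite stands in here for whichever store is eventually chosen.

```python
import sqlite3


def validate_records(records, required_fields):
    """Split records into clean rows and a list of exceptions."""
    clean, exceptions = [], []
    for index, record in enumerate(records):
        # A field counts as missing if it is absent, None, or blank.
        missing = [
            field for field in required_fields
            if record.get(field) is None or str(record[field]).strip() == ""
        ]
        if missing:
            exceptions.append({"row": index, "missing": missing})
        else:
            clean.append(record)
    return clean, exceptions


def store_records(clean, conn):
    """Store validated rows in a (hypothetical) 'elements' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS elements (name TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO elements (name, value) VALUES (?, ?)",
        [(row["name"], row["value"]) for row in clean],
    )
    conn.commit()
```

Returning the exceptions as data rather than raising on the first bad row matches the stated purpose of listing exceptions alongside the clean output.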

More Processing >>

API / Access

Note: this stage requires more discussion.

Purpose: Provide a unified interface to access all the data (validated or raw).
Result: AtomPub interface.
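Since this stage is still under discussion, the following is only a sketch of the simplest possible starting point: building a static Atom feed document with the Python standard library, which any web server could then publish. It covers the read-only feed format, not the AtomPub editing protocol. The function name `build_feed` and the shape of the `entries` dictionaries are assumptions for the example.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Serialize Atom elements without a prefix (xmlns="..." on the root).
ET.register_namespace("", ATOM_NS)


def _atom(tag):
    """Return the namespace-qualified name for an Atom element."""
    return f"{{{ATOM_NS}}}{tag}"


def build_feed(feed_id, title, updated, entries):
    """Build an Atom feed string from a list of entry dictionaries.

    Each entry is assumed to have 'id', 'title' and 'updated' keys,
    which are the elements Atom requires on every entry.
    """
    feed = ET.Element(_atom("feed"))
    ET.SubElement(feed, _atom("id")).text = feed_id
    ET.SubElement(feed, _atom("title")).text = title
    ET.SubElement(feed, _atom("updated")).text = updated

    for item in entries:
        entry = ET.SubElement(feed, _atom("entry"))
        ET.SubElement(entry, _atom("id")).text = item["id"]
        ET.SubElement(entry, _atom("title")).text = item["title"]
        ET.SubElement(entry, _atom("updated")).text = item["updated"]

    return ET.tostring(feed, encoding="unicode")
```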

More API / Access >>

Application: Converters

Purpose: Use the API to get the requested data and convert it to popular formats such as XLS, RDF, and CSV.
Result: Usually one file downloaded by the end-user or by applications like Freebase.
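As an illustration, a CSV converter could be as small as the sketch below. It assumes the API has already returned the records as a list of dictionaries; the function name `to_csv` and that record shape are assumptions, and the other formats (XLS, RDF) would need their own converters.

```python
import csv
import io


def to_csv(records, fieldnames):
    """Convert a list of dictionaries to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    # Keys not listed in fieldnames are dropped; missing keys become "".
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

The returned text can be written to a file for download or sent directly as the body of an HTTP response.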

More Application / Converters >>

Notes

  • API/Access: as a first step, this can simply generate XML files and publish them with any web server. The next step would be a web service (e.g. Django or Tornado).
  • It is not yet clear which module can create DTDs.
  • The OGR library seems well suited to processing vector (geo) formats.
  • The term "database" is used loosely: it can be anything from a file or an SQL database to Bigtable or a triple store.
  • We need to think about security, spam and abuse at every step.
  • Don't forget unit tests!
