Design
From SF Data Wiki
Common data format
See also: Architecture
Ingestion
Purpose: Get the data and metadata.
Result: One or more files containing data and metadata.
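A minimal sketch of this stage, assuming the raw bytes have already been fetched by some other means. The function name `ingest`, the `.meta.json` sidecar convention, and the metadata fields are illustrative assumptions, not decisions recorded on this page.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def ingest(raw: bytes, source: str, out_dir: Path, name: str) -> tuple[Path, Path]:
    """Store the raw bytes untouched and write a JSON metadata file next to them."""
    out_dir.mkdir(parents=True, exist_ok=True)
    data_path = out_dir / name
    meta_path = out_dir / (name + ".meta.json")  # hypothetical naming convention

    # Keep the data exactly as received; validation belongs to the next stage.
    data_path.write_bytes(raw)

    # Hypothetical metadata fields: where the data came from and how to verify it.
    metadata = {
        "source": source,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "size_bytes": len(raw),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return data_path, meta_path
```

Keeping data and metadata in separate files means the raw file can be handed to the processing stage unchanged, whatever its format.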
Processing
Purpose: Validate and store data elements, list exceptions.
Result: Clean data in some database format.
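As a rough illustration of this stage, the sketch below validates rows, stores the valid ones, and returns the rejected ones as a list of exceptions. SQLite stands in for "some database format", and the `records` table with its `name` and `value` columns is a hypothetical schema chosen only for the example.

```python
import sqlite3


def process(rows, conn):
    """Insert valid rows into SQLite; return (row_number, reason) for each rejected row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (name TEXT NOT NULL, value REAL NOT NULL)"
    )
    exceptions = []
    for number, row in enumerate(rows, start=1):
        # Rule 1 (assumed): every record needs a non-empty name.
        name = (row.get("name") or "").strip()
        if not name:
            exceptions.append((number, "missing name"))
            continue
        # Rule 2 (assumed): the value must parse as a number.
        try:
            value = float(row.get("value"))
        except (TypeError, ValueError):
            exceptions.append((number, f"value is not a number: {row.get('value')!r}"))
            continue
        conn.execute("INSERT INTO records (name, value) VALUES (?, ?)", (name, value))
    conn.commit()
    return exceptions
```

Calling `process(rows, sqlite3.connect(":memory:"))` is enough to try it out; the returned list is the exception report for that batch.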
API / Access
Note: this stage requires more discussion.
Purpose: Provide unified interface to access all the data (validated or raw).
Result: AtomPub interface.
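Since this stage is still under discussion, the following is only a sketch of the simplest starting point: generating a static Atom feed document with the Python standard library. It covers the feed format only, not the AtomPub publishing protocol itself (service documents, POST/PUT/DELETE). The function name `build_feed` and the entry dictionary keys are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def _child(parent, tag, text=None, **attrs):
    """Append a namespaced Atom child element and return it."""
    node = ET.SubElement(parent, f"{{{ATOM_NS}}}{tag}", attrs)
    if text is not None:
        node.text = text
    return node


def build_feed(feed_id, title, updated, author, entries):
    """Return an Atom feed as a string; each entry is a dict with id, title, updated, href."""
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    _child(feed, "id", feed_id)
    _child(feed, "title", title)
    _child(feed, "updated", updated)
    _child(_child(feed, "author"), "name", author)
    for entry in entries:
        node = _child(feed, "entry")
        _child(node, "id", entry["id"])
        _child(node, "title", entry["title"])
        _child(node, "updated", entry["updated"])
        # The link points at the dataset itself (validated or raw).
        _child(node, "link", rel="alternate", href=entry["href"])
    return ET.tostring(feed, encoding="unicode")
```

The resulting string can be written to a file and served by any web server, which keeps this step independent of the choice of web framework.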
Application: Converters
Purpose: Use the API to get the requested data and convert it to popular formats such as XLS, RDF, and CSV.
Result: Usually one file, downloaded by the end user or by applications like Freebase.
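A converter can be quite small. The sketch below handles the CSV case only and assumes the records have already been retrieved from the API as a list of dictionaries; the function name `to_csv` and the field names in the usage note are hypothetical.

```python
import csv
import io


def to_csv(records, fields):
    """Render records (dicts) as CSV text with a header row, keeping only the given fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fields,
        extrasaction="ignore",  # silently drop keys that are not in `fields`
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)  # missing keys are written as empty cells
    return buffer.getvalue()
```

For example, `to_csv([{"name": "Parks", "count": 3}], ["name", "count"])` produces a two-line file: the header `name,count` followed by `Parks,3`.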
More Application / Converters >>
Notes
- API/Access: as a first step, this can simply generate XML files and publish them with any web server. The next step would be a web service (e.g. Django, Tornado).
- It is not yet clear which module can create DTDs.
- The OGR library seems to be a good fit for processing vector (geo) formats.
- The term "database" is used loosely: it can be anything from a file or an SQL database to Bigtable or a triple store.
- We need to think about security, spam and abuse at every step.
- Don't forget unit tests!
