Architecture
From SF Data Wiki
The goal is a simple but scalable architecture that can run on a laptop or on a Compute Cloud.
See Design goals below.
Data source
Use case 1 - Simple
Goal: Publish a simple dataset.
Example: crime data
Diagram
Components
- Data source: A CRM or any other internal application; it can even be e-mail. In this example, the crime data would originate here.
- Raw data: Raw data exported from the data source (1) in a supported format (CSV, XML, Access, etc.)
- Processing: Data conversion, sanity verification, annotation (receipt date, dataset identifier). Data processing may be written in any language supported by the equipment used for processing. LAMP stacks are cheap and widespread across all sectors, so a reasonable baseline implementation might aim for any interpreted language available on a typical Linux distribution.
- Clean data: Data ready to be accessed. It may be in a standardized format or maintained in a domain-specific format.
- Access application (web service)
- Public access: AtomPub (domain-specific XML)
- Optional: The original raw data URL can be obtained from the Access API (5)
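The Processing component (3) can be sketched as follows. This is a minimal illustration, not a reference implementation: the column names in REQUIRED_FIELDS, the function name process_raw_csv and the annotation keys receipt_date and dataset_id are assumptions made for the example, and the only sanity check is that required fields are present.

```python
import csv
import io
from datetime import date

# Hypothetical column names for a crime dataset; a real export will differ.
REQUIRED_FIELDS = ("incident_id", "category", "date")


def process_raw_csv(raw_text, dataset_id, receipt_date):
    """Convert raw CSV text into clean, annotated records.

    Returns a tuple (clean_records, rejected_rows).
    """
    clean, rejected = [], []
    for row in csv.DictReader(io.StringIO(raw_text)):
        # Sanity verification: reject rows with more values than header columns.
        if None in row:
            rejected.append(row)
            continue
        # Conversion: trim whitespace; missing trailing values become "".
        record = {key: (value or "").strip() for key, value in row.items()}
        # Sanity verification: every required field must be non-empty.
        if any(not record.get(name) for name in REQUIRED_FIELDS):
            rejected.append(row)
            continue
        # Annotation: receipt date and dataset identifier.
        record["receipt_date"] = receipt_date.isoformat()
        record["dataset_id"] = dataset_id
        clean.append(record)
    return clean, rejected


if __name__ == "__main__":
    raw = "incident_id,category,date\n1, theft ,2009-08-01\n2,,2009-08-02\n"
    clean, rejected = process_raw_csv(raw, "crime", date(2009, 8, 20))
    print(len(clean), "clean,", len(rejected), "rejected")
```

Rejected rows are returned rather than dropped, so that the data source owner can be told what failed verification.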
Notes
- Data source (1) can be one application or several, in which case Processing (3) also aggregates the data
- Starting with Raw data storage (2), any or all components can be hosted outside the local network.
- Processing (3) can be a simple application or a distributed one (ex. Hadoop, Cassandra) running on local servers or on a 3rd party compute cluster (ex. Amazon EC2) - See Use case 2
- Conforming applications can export Clean data (4) directly, in which case the Raw data store (2) and Processing (3) are not needed
- Data format: XML by default; HTML, JSON or RDF (accessible to Freebase) can also be selected
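The selectable output format can be illustrated with a small dispatch table. This is only a sketch: the function names, the element names records and record, and the restriction to XML and JSON are assumptions for the example (HTML and RDF are left out), and record keys are assumed to be valid XML element names.

```python
import json
import xml.etree.ElementTree as ET


def to_xml(records):
    """Serialize a list of flat dicts as a simple XML document."""
    root = ET.Element("records")
    for record in records:
        item = ET.SubElement(root, "record")
        for key, value in record.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def to_json(records):
    """Serialize a list of flat dicts as JSON with sorted keys."""
    return json.dumps(records, sort_keys=True)


# Hypothetical format names; XML is the default, as in the note above.
SERIALIZERS = {"xml": to_xml, "json": to_json}


def serialize(records, fmt="xml"):
    """Return the records in the requested format."""
    if fmt not in SERIALIZERS:
        raise ValueError(f"unsupported format: {fmt}")
    return SERIALIZERS[fmt](records)


if __name__ == "__main__":
    data = [{"id": "1", "category": "theft"}]
    print(serialize(data))
    print(serialize(data, "json"))
```

Adding another format then means registering one more function in SERIALIZERS, without touching the access application itself.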
Use case 2 - Large dataset
Goal: Publish a large (aggregated) dataset.
Example: Daily statistics (?)
Diagram
Similar to Use case 1.
Components
Processing (3), Clean data (4) and Access (5) typically run on a redundant and scalable cluster.
Use case 3 - With non-public data
Goal: Allow internal access to the full data while filtering out sensitive information for public access.
Sensitive data: employee home address, water pipe geocode
Diagram
Components
See Use case 1 for details
- One or multiple data sources
- Common (and secure) storage for exported data
- Processing and Access Control List (ACL)
- Clean data - public and sensitive mixed in the same database
- Internal access to full database
- Clean data - just the public records outside local network
- Public access
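The split between the internal database and the public copy can be sketched as a filter applied before records leave the local network. The field names in SENSITIVE_FIELDS and the is_public predicate are hypothetical; a real deployment would derive both from the Access Control List.

```python
# Hypothetical field names, based on the sensitive data examples above.
SENSITIVE_FIELDS = frozenset({"home_address", "geocode"})


def public_view(record, sensitive_fields=SENSITIVE_FIELDS):
    """Return a copy of the record without the sensitive fields."""
    return {k: v for k, v in record.items() if k not in sensitive_fields}


def export_public(records, is_public):
    """Keep only the public records and strip sensitive fields from each.

    is_public is a caller-supplied predicate standing in for the ACL check.
    """
    return [public_view(r) for r in records if is_public(r)]


if __name__ == "__main__":
    full = [
        {"name": "Pipe repair", "geocode": "37.77,-122.41", "public": True},
        {"name": "Staff list", "home_address": "1 Main St", "public": False},
    ]
    print(export_public(full, lambda r: r["public"]))
```

The internal application keeps reading the full database; only the output of export_public would be copied outside the local network.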
Use case 4 - Feedback
Goal: Allow feedback (voting and correction). See The need for User Input for the reason why this is important.
Example:
Diagram
Components
The first part is similar to Use case 1 (Simple)
- Data source
- Raw data
- Processing
- Clean data
- Access application (web service)
- API: Read data
- API: Attach a comment, correction or rating to data. Feedback should generate a tracking number and a status/notification
- Incoming data stored separately (no direct editing)
- Optional: Internal application allows manual or automated update of data
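The feedback path can be sketched as a separate store that issues a tracking number and keeps a status for each submission, so the clean data is never edited directly. The class names, the status values and the in-memory storage are assumptions for this example; a real service would persist the feedback and send notifications.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Feedback:
    tracking_id: int
    record_id: str
    kind: str
    body: str
    status: str = "received"  # hypothetical initial status


class FeedbackStore:
    """Keeps incoming feedback apart from the clean data."""

    KINDS = ("comment", "correction", "rating")

    def __init__(self):
        self._items = {}
        self._next_id = itertools.count(1)

    def submit(self, record_id, kind, body):
        """Store one piece of feedback and return its tracking number."""
        if kind not in self.KINDS:
            raise ValueError(f"unknown feedback kind: {kind}")
        item = Feedback(next(self._next_id), record_id, kind, body)
        self._items[item.tracking_id] = item
        return item.tracking_id

    def status(self, tracking_id):
        """Return the current status for a tracking number."""
        return self._items[tracking_id].status

    def set_status(self, tracking_id, status):
        """Update the status, for example after an internal review."""
        self._items[tracking_id].status = status


if __name__ == "__main__":
    store = FeedbackStore()
    ticket = store.submit("crime-42", "correction", "Wrong street name")
    store.set_status(ticket, "accepted")
    print(ticket, store.status(ticket))
```

The optional internal application would read from this store and decide, manually or automatically, whether to update the data source.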
Data consumer
Use case 1 - Web mashup
Web site: interactive data
- Data Source(s) through CivicDB API
- Web server
- Internet connection (http)
- Web browser
Examples
- http://data.gov style tool catalog
- Web mashup (ex. crime map using Google/Yahoo/MSN maps)
- Web application (ex. crime watch and alerts)
- Feedback on data quality or corrections (comments - See also The Case for User Input)
- Statistics and Visualization
Use case 2 - Download for local processing
Similar to use case 1, but the data is provided in a format other than HTML, mainly for local processing.
- Data Source(s) through CivicDB API
- Web server
- The user selects the format
- Internet connection (http)
- Download the data in desired format for local processing
Use case 3 - Applications
Similar to use case 1, but with native applications that can connect directly to the API (XML, JSON) or through 3rd party providers.
- Data Source(s) through CivicDB API
- Optional: Web server (login, profiles etc.)
- Internet connection (http)
- Mobile or Desktop application (iPhone, Google Earth style application etc.)
Design goals
- International (locale and language)
- Multi platform
- Scalable
- Language bindings: C++, C#, Java, PHP, ASP, Python, RoR, Objective C
- Built-in performance monitoring and metrics
- Open source and License-free (ex. CC-Zero)






