Data Sources
WaterQ is built on publicly available federal data. Here is a detailed overview of our data sources and how we use them.
EPA Safe Drinking Water Information System (SDWIS)
Primary Data Source. SDWIS is the EPA's comprehensive database of public water system information. It contains records for over 150,000 public water systems serving approximately 300 million Americans.
Data We Use
- Water System Inventory: System names, IDs (PWSID), locations, service populations, water source types, and system classifications
- Contaminant Test Results: Measured concentrations, test dates, Maximum Contaminant Levels (MCLs), and violation flags
- Violation Records: Violation types, severity classifications, dates, resolution status, and enforcement actions
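The contaminant fields above can be pictured as a simple typed record. The sketch below is illustrative only: the class, its field names (`result_value`, `unit`, `mcl_mg_per_l`) and the unit labels are hypothetical, not actual SDWIS column names. It shows how a measured concentration might be converted to a common unit before it is compared with its MCL.

```python
from dataclasses import dataclass

# Conversion factors to mg/L for a few unit labels.
# The labels are hypothetical; SDWIS may encode units differently.
_TO_MG_PER_L = {"mg/L": 1.0, "ug/L": 0.001, "ng/L": 0.000001}


@dataclass
class ContaminantResult:
    pwsid: str            # public water system ID (PWSID)
    contaminant: str
    result_value: float   # measured concentration, expressed in `unit`
    unit: str
    mcl_mg_per_l: float   # Maximum Contaminant Level, in mg/L

    def concentration_mg_per_l(self) -> float:
        """Return the measured concentration converted to mg/L."""
        try:
            return self.result_value * _TO_MG_PER_L[self.unit]
        except KeyError:
            raise ValueError(f"unsupported unit: {self.unit!r}") from None

    def exceeds_mcl(self) -> bool:
        """True if this single measurement is above the MCL."""
        return self.concentration_mg_per_l() > self.mcl_mg_per_l


# Arsenic has an MCL of 0.010 mg/L; 12 ug/L is 0.012 mg/L.
sample = ContaminantResult("XX0000001", "Arsenic", 12.0, "ug/L", 0.010)
print(sample.exceeds_mcl())  # True
```

For many contaminants, MCL compliance is determined from averages over multiple samples rather than from one result, so a single-sample check like this is a simplification.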
USGS Water Quality Portal
Supplementary Source. The Water Quality Portal is a cooperative service sponsored by the United States Geological Survey (USGS), the EPA, and the National Water Quality Monitoring Council. It provides access to water quality monitoring data from multiple federal, state, and tribal agencies.
Data We Use
- Environmental Monitoring: Surface water and groundwater quality measurements from monitoring stations
- Regional Context: Helps identify potential contamination sources and regional water quality patterns
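One plausible way to use monitoring-station data for regional context is to find the stations closest to a water system. The sketch below assumes station coordinates have already been downloaded. The `Station` class and both functions are hypothetical helpers, and the code does not call any Water Quality Portal web service. It uses the haversine great-circle distance.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Station:
    station_id: str  # hypothetical monitoring-location identifier
    lat: float       # decimal degrees
    lon: float       # decimal degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def stations_within(lat, lon, stations, radius_km):
    """Return (station, distance_km) pairs within radius_km, nearest first."""
    hits = [(s, haversine_km(lat, lon, s.lat, s.lon)) for s in stations]
    return sorted((h for h in hits if h[1] <= radius_km), key=lambda h: h[1])
```

For example, `stations_within(40.0, -75.0, stations, 25.0)` would return the stations within 25 km of a system located at 40.0° N, 75.0° W. The radius is a tuning choice, not a fixed rule.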
EPA Envirofacts
Supplementary Source. Envirofacts is the EPA's multi-system search tool. It provides access to environmental information from across the agency's databases, including facility compliance and enforcement data.
Data We Use
- Facility Information: Additional details about water treatment facilities and their compliance history
- Enforcement Actions: Formal enforcement actions taken against non-compliant systems
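If both sources identify systems by PWSID, enforcement history can be attached to system records with a keyed join. The sketch below is a minimal illustration. The classes, field names and `attach_enforcement` function are hypothetical. The function returns any actions that match no known system, so they can be reviewed instead of being silently dropped.

```python
from dataclasses import dataclass, field


@dataclass
class EnforcementAction:
    pwsid: str
    action_date: str  # ISO 8601 date string, e.g. "2023-05-01"
    action_type: str  # hypothetical free-text label


@dataclass
class WaterSystem:
    pwsid: str
    name: str
    enforcement: list = field(default_factory=list)


def attach_enforcement(systems, actions):
    """Attach each action to the system with the same PWSID.

    Each system's actions are sorted oldest first. Returns the list of
    actions whose PWSID matched no system.
    """
    by_id = {s.pwsid: s for s in systems}
    unmatched = []
    for action in actions:
        system = by_id.get(action.pwsid)
        if system is None:
            unmatched.append(action)
        else:
            system.enforcement.append(action)
    for s in systems:
        # ISO 8601 date strings sort chronologically as plain strings.
        s.enforcement.sort(key=lambda a: a.action_date)
    return unmatched
```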
Update Schedule
| Data Type | Frequency | Notes |
|---|---|---|
| Water System Inventory | Quarterly | Jan, Apr, Jul, Oct |
| Contaminant Test Results | Quarterly | Aligned with EPA reporting cycles |
| Violations | Quarterly | Including resolution status updates |
| Scores & Grades | After each data update | Recalculated when new data is available |
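The quarterly cadence in this table can be expressed as a small date calculation. The sketch below assumes updates land on the first day of January, April, July and October. The exact day is not stated, so that part is a hypothetical assumption, and `next_update` is an illustrative helper.

```python
from datetime import date

# Jan, Apr, Jul, Oct, per the update schedule table.
UPDATE_MONTHS = (1, 4, 7, 10)


def next_update(today: date) -> date:
    """Return the next scheduled update date strictly after `today`.

    Assumes (hypothetically) that updates happen on the 1st of the month.
    """
    for month in UPDATE_MONTHS:
        candidate = date(today.year, month, 1)
        if candidate > today:
            return candidate
    # Past October 1st: roll over to January of the following year.
    return date(today.year + 1, UPDATE_MONTHS[0], 1)


print(next_update(date(2024, 2, 15)))  # 2024-04-01
```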
Data Processing
Raw data from these sources passes through five processing steps before it appears on WaterQ:
1. Ingestion: Raw data is fetched from federal APIs and loaded into our processing pipeline.
2. Validation: Records are checked for completeness, format consistency, and data quality.
3. Normalization: Data from multiple sources is mapped to a unified schema with consistent units and identifiers.
4. Scoring: Water quality scores and grades are calculated using our scoring methodology.
5. Aggregation: System-level data is aggregated to city, county, and state levels using population-weighted averages.
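In the aggregation step, a population-weighted average of system scores s_i with populations served p_i is sum(p_i * s_i) / sum(p_i). The sketch below is a minimal illustration. The `SystemScore` class, the score scale and the grouping keys are hypothetical, and the scoring methodology itself is not reproduced here. Existing scores are taken as input.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SystemScore:
    pwsid: str
    county: str
    state: str
    population_served: int
    score: float  # system-level score; the scale used here is hypothetical


def population_weighted_average(systems):
    """Return sum(p_i * s_i) / sum(p_i) over the given systems."""
    total_pop = sum(s.population_served for s in systems)
    if total_pop <= 0:
        raise ValueError("total population served must be positive")
    return sum(s.score * s.population_served for s in systems) / total_pop


def aggregate(systems, key):
    """Group systems by key(system) and return a weighted score per group."""
    groups = defaultdict(list)
    for s in systems:
        groups[key(s)].append(s)
    return {k: population_weighted_average(members) for k, members in groups.items()}


systems = [
    SystemScore("XX0000001", "Alpha", "XX", 9000, 90.0),
    SystemScore("XX0000002", "Alpha", "XX", 1000, 40.0),
    SystemScore("XX0000003", "Beta", "XX", 5000, 70.0),
]
print(aggregate(systems, key=lambda s: s.county))  # {'Alpha': 85.0, 'Beta': 70.0}
```

Weighting by population means a county's result reflects the water most of its residents drink. In the example, the unweighted mean for county Alpha would be 65.0, but the weighted value is 85.0 because the larger system dominates. Mapping systems to cities, which a single system may serve several of, is not handled in this sketch.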
Data Accuracy
While we make every effort to ensure accuracy, WaterQ relies on data that water systems report to federal agencies. Reporting delays, data entry errors, and gaps in testing may affect the accuracy and completeness of the information presented. If you believe any data is incorrect, please contact us so we can investigate.