A high-availability serverless pipeline that ingested and normalized daily public health data into a queryable REST API, serving real-time county-level statistics by ZIP code.
It was 2020, and I had two things I wanted to put to use: I'd just passed the AWS Certified Solutions Architect Associate exam, and I'd been learning statistics with Python. The pandemic gave me a problem worth solving. I wanted to look up what was actually happening with COVID infections in my own area, by ZIP code.
This was the first time I designed and built a production system from scratch entirely on AWS. That mattered to me more than the stats themselves.
The New York Times was publishing daily COVID-19 data at the county level. I built a pipeline to pull that data every day and make it queryable by ZIP code.
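As a rough illustration of the normalization step, here is a minimal sketch in Python. It is not the code that ran in the pipeline. It assumes the column layout of the NYT `us-counties.csv` file (`date,county,state,fips,cases,deaths`); the function name `latest_by_fips` and the sample rows are hypothetical, and the numbers are made up for the example.

```python
import csv
import io

# Made-up sample rows in the NYT us-counties.csv column layout.
SAMPLE_CSV = """date,county,state,fips,cases,deaths
2022-05-12,King,Washington,53033,100,10
2022-05-13,King,Washington,53033,120,11
2022-05-13,Unknown,Washington,,5,0
"""


def latest_by_fips(csv_text):
    """Return the most recent row per county, keyed by FIPS code.

    Rows with an empty FIPS field (for example "Unknown" counties)
    are skipped, since they cannot be joined to a ZIP code later.
    """
    latest = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        fips = row["fips"]
        if not fips:
            continue
        current = latest.get(fips)
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if current is None or row["date"] > current["date"]:
            latest[fips] = {
                "date": row["date"],
                "county": row["county"],
                "state": row["state"],
                "cases": int(row["cases"]),
                # Treat a blank deaths field as unknown rather than zero.
                "deaths": int(row["deaths"]) if row["deaths"] else None,
            }
    return latest


if __name__ == "__main__":
    for fips, record in latest_by_fips(SAMPLE_CSV).items():
        print(fips, record)
```

Keying the output by FIPS code keeps one record per county, which is the shape a ZIP code lookup needs downstream.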
The data ran from early 2020 through May 2022, when the NYT wound down its dataset. I stopped updating the pipeline at that point.
The tracker below is still wired to the original API. The data reflects statistics through May 2022 — enter any US ZIP code to see what the numbers looked like at the end of the dataset.
What I actually cared about was the system, not the statistics. I had an idea, knew roughly what AWS could do, and built something end-to-end: scheduled ingestion, a database with a lookup pattern, and a live API someone else could call. For a project I built while learning, it held up.
I also learned something about designing for non-technical users. The ZIP code input was a deliberate choice; county FIPS codes would have been easier to work with technically, but no one thinks in FIPS codes.
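The ZIP-to-county translation can be sketched as a two-step lookup. This is an illustrative Python example under stated assumptions, not the original implementation: the names `ZIP_TO_FIPS`, `COUNTY_STATS`, and `lookup_by_zip` are hypothetical, the statistics are made up, and each ZIP code is assumed to map to a single county, which is a simplification because some ZIP codes cross county lines.

```python
# Hypothetical crosswalk from ZIP code to county FIPS code.
# Both are stored as strings so leading zeros survive.
ZIP_TO_FIPS = {
    "98101": "53033",  # Seattle, King County, WA
    "90210": "06037",  # Beverly Hills, Los Angeles County, CA
}

# Made-up county statistics keyed by FIPS code.
COUNTY_STATS = {
    "53033": {"county": "King", "state": "Washington", "cases": 120, "deaths": 11},
    "06037": {"county": "Los Angeles", "state": "California", "cases": 300, "deaths": 25},
}


def lookup_by_zip(zip_code, zip_to_fips, county_stats):
    """Translate a user-facing ZIP code into county-level statistics.

    Returns None when the ZIP code is well-formed but unknown, and
    raises ValueError when it is not a five-digit string.
    """
    zip_code = zip_code.strip()
    if len(zip_code) != 5 or not (zip_code.isascii() and zip_code.isdigit()):
        raise ValueError(f"not a five-digit ZIP code: {zip_code!r}")
    fips = zip_to_fips.get(zip_code)
    if fips is None:
        return None
    return county_stats.get(fips)


if __name__ == "__main__":
    print(lookup_by_zip("98101", ZIP_TO_FIPS, COUNTY_STATS))
```

The user only ever types a ZIP code; the FIPS code stays an internal join key.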