dbt Migration at City of Boston
Modernizing the Analytics Team's engineering pipelines by migrating transformations and tests to dbt and enabling the team's first data catalog.
During my last year on the Analytics Team at the City of Boston, I proposed and led a redesign of our data warehouse and ETL pipelines centered on the implementation of dbt. This effort started in March 2023 with a proposed new set of schemas for the PostgreSQL data warehouse, and by March 2024 the migration of the engineering team’s pipelines to a dbt-centered implementation was nearly complete. The Analytics Team’s first Data Catalog, built from the dbt-generated docs, also launched for a select group of users in March 2024.
Presenting at Coalesce 2023
I had the opportunity to present this migration to dbt at Coalesce in October 2023. Together with my co-speakers, Ian Rose of the California Office of Data and Innovation and Laurie Merrell of Jarvus Innovations, I made the case for why public sector data teams should consider adding dbt to their toolset, and we presented three dbt implementation case studies from our respective organizations.
We had three goals for this talk: (1) help public sector data teams decide whether they should consider dbt (is their team ready for it, and what value would it add?), (2) show what an implementation might look like, through three case studies with supporting open-source GitHub repos, and (3) jumpstart a community of dbt users in the public sector and adjacent fields. We also wanted to encourage data professionals who already use dbt to consider working in the public sector.
I collated all of the resources associated with this talk in a GitHub repo (dbt-public-sector-resources), which has links to the talk, the slides, and the public GitHub repos for each case study.
For the City of Boston’s public GitHub repo, I created a copy of our dbt project with the models removed, leaving just the skeleton, along with supporting resources such as Civis workflow YAML files and Bash and Python scripts (cob_analytics_dbt_skeleton_project). Hopefully, other teams that use Civis Platform and want to see how to run a dbt build can use this repo as an example.
March 2024 update
I first published our dbt project skeleton in October 2023 (in time for Coalesce), but by March 2024 our dbt implementation had evolved significantly, making full use of the elementary package, selectors, stateful dbt builds, and more. I have since updated the public skeleton repo to match the actual dbt project as of March 11, 2024. You can also compare how the project evolved from October 2023 to March 2024.
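For readers unfamiliar with stateful builds, here is a minimal sketch of the idea in Python. It is not taken from the City of Boston repo: the function name and the `prod_artifacts` directory are hypothetical, and it assumes the artifacts from a previous production run (including `manifest.json`) have been saved to that directory. The sketch only assembles the command line. `state:modified+` selects models that changed relative to the saved state plus everything downstream of them, and `--defer` lets unselected upstream models resolve to the objects recorded in that saved state.

```python
import shlex


def stateful_build_command(state_dir, defer=True):
    """Assemble argv for a dbt build limited to changed models.

    state_dir is assumed to hold the artifacts of an earlier run
    (hypothetical location; adjust to wherever they are stored).
    """
    cmd = [
        "dbt", "build",
        # Modified nodes plus all of their downstream dependents.
        "--select", "state:modified+",
        # Directory containing the earlier run's manifest to compare against.
        "--state", str(state_dir),
    ]
    if defer:
        # Resolve references to unselected models using the saved state.
        cmd.append("--defer")
    return cmd


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch works
    # without a dbt installation.
    print(shlex.join(stateful_build_command("prod_artifacts")))
```

A scheduled job could pass the returned list to `subprocess.run` from the dbt project directory. Building the command as a list rather than a string avoids shell quoting problems.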
Data Catalog
While implementing dbt was a significant improvement for the Data Engineering team, the primary impact for the rest of the Analytics Team and other users of the data warehouse was the documentation that it enabled. The dbt-generated docs website was dbt’s primary selling point for non-engineers (the lineage graphs are especially stunning). So in November 2023 we began a concerted effort to get the docs site published as an officially supported DoIT application, with a boston.gov URL and Single Sign-On authentication.
By March 2024, after a significant collaborative effort across many teams in the department, the first iteration of the official Analytics Data Catalog was published. I am so proud that this important resource is now available to city workers, and to know that my coworkers will continue to iterate on and build both the dbt project and the Data Catalog after my departure.