dbt Migration at City of Boston

Modernizing the Analytics Team's engineering pipelines by migrating transformations and tests to dbt and enabling the team's first data catalog.

During my last year on the Analytics Team at the City of Boston, I proposed and led a redesign of our data warehouse and ETL pipelines centered on dbt. The effort started in March 2023 with a proposed new set of schemas for the PostgreSQL data warehouse, and by March 2024 the migration of the engineering team’s pipelines to a dbt-centered implementation was nearly complete. The Analytics Team’s first Data Catalog, built from the dbt-generated docs, also launched for a select group of users in March 2024.

Presenting at Coalesce 2023

City of Boston's dbt migration was one of 3 case studies in our Coalesce 2023 talk

I had the opportunity to present this migration to dbt at Coalesce in October 2023. Together with my co-speakers, Ian Rose at the California Office of Data and Innovation and Laurie Merrell at Jarvus Innovations, I made the case for why public sector data teams should consider adding dbt to their toolset, and we presented three dbt implementation case studies from our respective organizations.

Official recording of "From coast to coast: Implementing dbt in the public sector" on YouTube

Our goal with this talk was to help other public sector data teams interested in dbt in three ways: (1) deciding whether they should consider dbt at all (is their team ready for it, and what value would it add?); (2) if so, showing what an implementation might look like, through three case studies with supporting open-source GitHub repos; and (3) jumpstarting a community of dbt users in the public sector and adjacent fields. We also wanted to encourage data professionals already using dbt to consider working in the public sector.

PDF of the slides presented during the talk

I collated all of the resources associated with this talk in a GitHub repo (dbt-public-sector-resources), which has links to the talk, the slides, and the public GitHub repos for each case study.

For the City of Boston’s public GitHub repo (cob_analytics_dbt_skeleton_project), I created a copy of our dbt project without any of the models (just the skeleton), along with supporting resources such as Civis workflow YAML files and Bash and Python scripts. I hope other teams that use Civis Platform and want to see how to run a dbt build can use this repo as an example.

March 2024 update

I first published our dbt project skeleton in October 2023 (in time for Coalesce), but by March 2024 we had significantly evolved our dbt implementation, making full use of the elementary package, selectors, stateful dbt builds, and more. The public skeleton repo is now up to date with the actual dbt project as of March 11, 2024. You can also compare how the project evolved from October 2023 to March 2024.
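For readers unfamiliar with stateful builds, here is a minimal sketch of the general idea: dbt compares the current project against the artifacts of a previous run and rebuilds only what changed. This is an illustration, not the script from the City of Boston repo. The function names and the `prod_artifacts` directory are hypothetical. The `state:modified+` selector and the `--state` and `--defer` flags are standard dbt CLI options.

```python
from pathlib import Path
from typing import List


def stateful_build_args(state_dir: str, defer: bool = False) -> List[str]:
    """Assemble the argv for a dbt build limited to modified nodes and
    everything downstream of them, compared against artifacts in state_dir."""
    args = ["dbt", "build", "--select", "state:modified+", "--state", state_dir]
    if defer:
        # --defer resolves references to unselected nodes using the state manifest
        args.append("--defer")
    return args


def choose_build_args(state_dir: str) -> List[str]:
    """Fall back to a full build when no previous manifest.json is available
    (for example, on the very first run)."""
    if (Path(state_dir) / "manifest.json").is_file():
        return stateful_build_args(state_dir)
    return ["dbt", "build"]


if __name__ == "__main__":
    # "prod_artifacts" is a hypothetical directory holding the last run's manifest.json
    print(" ".join(choose_build_args("prod_artifacts")))
```

The resulting list could be passed to `subprocess.run` by whatever job runner schedules the build. Saving `manifest.json` after each successful run is what makes the next comparison possible.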

Data Catalog

While implementing dbt was a significant improvement for the Data Engineering team, the primary impact for the rest of the Analytics Team and other users of the data warehouse was in the documentation that it enabled. The dbt-generated docs website was dbt’s primary selling point for non-engineers (the lineage graphs are especially stunning). So starting in November 2023 we began a concerted effort to get the docs site published as an officially supported DoIT application, with a boston.gov URL and Single Sign-On authentication.

PDF of the slides presented for the DoIT Lunch & Learn

By March 2024, after a significant collaborative effort across many teams in the department, the first iteration of the official Analytics Data Catalog was published. I am so proud that this important resource is now available to city workers, and to know that my coworkers will continue to iterate on and build both the dbt project and the Data Catalog after my departure.

Screenshot of the Data Catalog from March 11, 2024