CV
About me
Full Name: Jennifer (Jenna) Jordan
Job Title: Data Engineer
Citizenship: USA
Skills
- Languages: Python, SQL, Bash, YAML/YAQL, Jinja, regex, XPath, R (tidyverse), SPARQL
- Featured Libraries: dbt, pandas, plotly, streamlit, SQLAlchemy, Great Expectations
- Data Stores: PostgreSQL (and PostGIS), BigQuery, SQLite, DuckDB, JSON, XML, RDF (knowledge graphs)
- Other Tools: Git, GitHub, dbt Cloud, Civis Platform, Jira, OpenRefine, ArcGIS Pro
- Concepts: data modeling (3NF, star schema), relational and analytical database design, data orchestration (ETL pipelines), data management and governance, FAIR data principles
- As well as: data wrangling, visualization, and exploratory data analysis; teaching workshops; writing documentation; presenting projects
Experience
- Aug 2024 - present
Senior Consultant, Data Management & Strategy
Analytics8
- Advising a mission-driven healthcare organization on dbt Mesh best practices and on strategic processes to support data governance and its implementation within dbt, while providing support and training for the transition to a dbt Mesh of multiple projects.
- Mar 2024 - Aug 2024
Staff Consultant, Data Management
Analytics8
- Worked with a mission-driven healthcare organization as they migrated their legacy monolithic dbt project to a dbt Mesh architecture of multiple domain-driven projects
- Jan 2022 - Mar 2024
Data Engineer
Analytics Team, Department of Innovation & Technology, City of Boston
- Initiated & led the engineering team’s migration of ~200 legacy ETL pipelines to dbt, with redesigned orchestration workflows & data warehouse schemas
- This produced the team’s first data catalog, jumpstarted data governance efforts, enabled oversight and analysis of both pipelines and data, and made better use of computational resources and engineers’ development time
- Presented and advocated for this work to the team, the CIO, the department, and at a conference
- Built custom ETL pipelines to integrate data siloed across many different sources, enabling key metrics to be tracked (e.g. time for affordable housing projects to be approved) and complicated manual processes to be automated (e.g. joining permitting, housing, and development data)
- Advised on structural & process changes to involved source systems across different departments
- Wrote custom component scripts in Python to ingest and export data
- Ensured that analysts had high-quality, reliable data for analytics end products (e.g. dashboards) and that the public had up-to-date access to open data published on Analyze Boston.
- Jan 2021 - Nov 2021
Data Engineer, Computational Methods Instructor (Contractor)
Network Contagion Research Institute (NCRI)
- Built ingest pipelines in Python and designed relational databases to add new social media communities (Telegram, 4chan) to Pushshift, supporting NCRI's mission of identifying disinformation and extremism on social media.
- Ran the Computational Methods group for interns at NCRI Labs (Summer and Fall 2021 semesters)
- Designed a semester-long curriculum for computational skills.
- Taught weekly workshops on Bash, Git, Python, SQL, and regex (using the Software Carpentry lesson plans).
- Supervised independent projects.
- Sep 2020 - Apr 2021
Marketing Data Analyst (Contractor)
Bright Wolf
- Built a streamlit web app to visually analyze Bright Wolf’s marketing data from Salesforce, email campaigns, and company data enrichment, enabling the sales team to see overarching trends
- Built the ETL pipelines to ingest data from these sources via APIs into a local database
- Aug 2019 - May 2020
Graduate Research Assistant
Cline Center for Advanced Social Research, University of Illinois at Urbana-Champaign
- Supported the Cline Center with data releases by preparing/cleaning data, updating documentation, and coordinating with the data repository.
- Promoted the Global News Index by preparing and helping to present tutorials.
- Developed a user feedback process for Cline Center software applications and data releases.
- Other support tasks as needed.
- May 2019 - Jul 2019
Data Science Intern
The Program on Governance and Local Development (GLD), University of Gothenburg
- Assisted the Data Scientist with data monitoring tasks for an ongoing survey (LGPI 2019) conducted in Zambia and Kenya; for example, verifying that enumerators completed the requisite number of surveys in each designated hectare, according to the sampling plan.
- Wrote Python scripts to organize and wrangle data downloaded from SurveyToGo, edited surveys in SurveyToGo, worked with geospatial data using Python and QGIS, and helped to develop the Local Governance Performance Index based on data collected from the survey.
- Worked extensively with large XML documents, and wrote custom scripts to extract data from KML files and reformat it into WKT files.
- Attended the 2019 Annual GLD Conference, which focused on the theme "Routes to Accountability."
- Jan 2019 - May 2019
Graduate Research Assistant
Cline Center for Advanced Social Research, University of Illinois at Urbana-Champaign
- Created user-friendly documentation for the Cline Center’s Global News Index using Adobe InDesign.
- Wrote an in-depth codebook on the GNI’s variables and corpora; a detailed user guide and a quick-start guide for Archer (the in-house software developed to query the GNI); and a guide on using Solr to query the GNI.
- Helped to test Archer (a software application recently developed by the Cline Center), document bugs, and suggest features, and assisted users in querying the GNI.
- Sep 2018 - Dec 2018
Research Assistant (graduate hourly position)
iSchool, University of Illinois at Urbana-Champaign
- Investigated and compiled a corpus on Data Management Plans (DMPs) for Prof. Peter Darch.
- Annotated datasets for use in human-in-the-loop machine learning analyses performed by Prof. Jana Diesner’s research group.
- Jun 2016 - Aug 2017
English Teacher
Corem Language Institute, South Korea
- Taught English as a Foreign Language to kindergarten and elementary students at a private academy (hagwon) in Yangsan, South Korea, averaging eight 40-minute classes each day.
- My kindergarten students were 5-6 years old. In addition to teaching basic reading, writing, and conversational skills, I taught fun math, science, and arts & crafts lessons. I also wrote bimonthly progress reports for each student, administered occasional tests, and helped to conduct monthly “phone interviews” with the students in the upper-level classes.
- My elementary students were 7-12 years old. In addition to teaching language lessons, I was in charge of making and grading their tests and writing bimonthly progress reports for each student.
- Jan 2014 - Apr 2014
Intern
U.S. Mission to the UN at Washington, DC (USUNW), U.S. Department of State
- Worked as part of a small team advising the U.S. Ambassador to the United Nations, Samantha Power, serving as a bridge between the United Nations, the Executive Office, and the National Security Council.
- Supported the Senior Policy Advisors, Speechwriter, and Executive Assistant with research, copywriting, and administrative tasks.
- Wrote executive summaries and took notes on office, cross-bureau, and inter-departmental meetings for the USUNW team.
Publications & Presentations
- presented Oct 2023
From coast to coast: implementing dbt in the public sector
Coalesce 2023 (by dbt Labs), San Diego
- Presented on Boston's implementation of dbt to improve data services and data engineering practices, while my co-speakers presented their projects for the State of California and Cal-ITP.
- The session discussed the similarities and differences between the implementations of dbt, and how some of the constraints and challenges of working in government shape both the technical and social design of data services. The speakers reflected on successes, challenges, and lessons learned from adopting modern data tooling in state and local governments.
- published Aug 2021
Interactive Data Visualizations in Python
The Carpentries, Lesson Incubator
- Developed a Carpentries workshop lesson designed to be an introduction to making interactive visualizations in Python.
- Learners create a new environment using conda, wrangle data into the proper format with the pandas library, create visualizations with the Plotly Python library, and display these visualizations and build widgets using Streamlit.
- published Jan 2021
No buzz for bees: Media coverage of pollinator decline
Proceedings of the National Academy of Sciences (PNAS)
- Co-authored a Perspective paper submitted to PNAS in March 2020 and published in January 2021.
- Wrote the ETL scripts to collect data from the Global News Index and transform the data for analysis & visualization, created an interactive visualization tool to aid in exploratory data analysis, and wrote the documentation for the data & code deposited in the Illinois Data Bank.
- published May 2020
Python can be tidy too: pandas recipes for normalizing data
PyCon 2020 (virtual)
- Published poster online for PyCon 2020 (conference moved online-only due to COVID-19)
- Demonstrates a collection of recipes from my Tidy Pandas Cookbook that draw on the “tidy data” and “3rd normal form” philosophies of data organization, using data from the Correlates of War and Uppsala Conflict Data Program.
- presented Oct 2019
Put Relational Databases in Your Data Curation Toolbox
Proceedings of the Association for Information Science and Technology (ASIS&T 82nd Annual Meeting, Melbourne, Australia)
- Presented my poster advocating the use of relational databases in the data curation process, especially for datasets that are published separately but can be used together due to a common identifier scheme and shared attributes.
- The Correlates of War datasets are used as an illustrative example to show how the normalization process results in a design with greater data reusability, while check constraints and foreign key constraints can improve data quality.
Education
- 2020
Master of Science, Library and Information Science
University of Illinois at Urbana-Champaign
- 3.97 GPA
- 2015
Bachelor of Arts, Journalism, Political Science
University of North Carolina at Chapel Hill
- Graduated with Honors
- 3.61 GPA
Certificates
- 2023
dbt Developer
dbt Labs
- 2022
Analytics Engineering with dbt
CoRise
- 2020
Software Carpentries Certified Instructor
The Carpentries
- 2016
CELTA, Teaching English as a Foreign Language
International House Budapest