IRDB: Phase 1

During my first semester, I took one class on databases and another on Python. Both ended with a final project of our own devising, so I merged the two into one: designing and creating a database that incorporates most of the Correlates of War Project datasets, then transforming and loading the data into it with Python.

For the database half of the project, I designed a database schema (and drew the EER diagram), then wrote a SQL script that creates a database to contain, connect, and describe the disparate datasets published by the Correlates of War Project. That script only creates the tables and defines their referential integrity. The Python half of the project handled the data itself: I wrote scripts (using the pandas library) to transform the CSV files into SQL INSERT statements. The transformation process revealed several data quality issues, and integrating the four War datasets turned out to be much more complicated than I had initially thought.
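To give a rough idea of what the CSV-to-INSERT step looks like, here is a minimal sketch. It is not the project's actual code: the project used pandas, while this version uses Python's standard `csv` module so it runs without any dependencies. The table name `state`, the column names, and the sample rows are all made up for illustration, and the function names `sql_literal` and `rows_to_inserts` are hypothetical.

```python
import csv
import io
import re

# Matches plain integers and decimals such as "1816" or "-9.5".
NUMERIC = re.compile(r"-?\d+(\.\d+)?")


def sql_literal(value):
    """Render one CSV field as a SQL literal.

    Empty or missing fields become NULL, plain numbers are left bare,
    and everything else is quoted with single quotes doubled.
    """
    if value is None or value.strip() == "":
        return "NULL"
    text = value.strip()
    if NUMERIC.fullmatch(text):
        return text
    return "'" + text.replace("'", "''") + "'"


def rows_to_inserts(csv_text, table):
    """Turn CSV text with a header row into a list of INSERT statements.

    The table and column names are interpolated as-is, so they are
    assumed to come from a trusted source (here, our own schema).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    names = reader.fieldnames
    columns = ", ".join(names)
    statements = []
    for row in reader:
        values = ", ".join(sql_literal(row[name]) for name in names)
        statements.append(f"INSERT INTO {table} ({columns}) VALUES ({values});")
    return statements


# Illustrative sample only; not copied from the Correlates of War files.
sample = (
    "state_code,state_name,start_year\n"
    "2,United States of America,1816\n"
    "437,Cote d'Ivoire,\n"
)

statements = rows_to_inserts(sample, "state")
for statement in statements:
    print(statement)
```

The second sample row shows the two cases that tend to cause trouble in practice: an apostrophe inside a text value, which has to be escaped, and an empty field, which should become NULL rather than an empty string.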

Together, the two halves of the project produced a GitHub repo containing the scripts anyone needs to create and fill the database on their own machine.