It’s summer, and I have officially completed half of my Master’s in Library & Information Science degree. This past semester has been extremely busy for me - not only was I taking more classes, but I was working 20 hours/week as a Graduate Research Assistant at the Cline Center on campus. And while that meant that I had essentially no free time (hence the utter lack of blog posts), there was nothing that I wanted to drop… so I can only blame myself. And next year looks to be at least as busy! But before I look ahead, I want to take some time to reflect on what I have learned and accomplished during my first year of library school.
The MSLIS degree at UIUC has only two required classes (IS501 & IS502), and I have now completed both. The degree also requires 40 credit hours, and I have now completed 24 - I’ll definitely be going well over the credit-hour requirement by my last semester. My cumulative GPA for the first year is 3.94; despite an overloaded semester, I was able to earn straight A’s. Check, check, check - from an academic standpoint, my MSLIS degree is going very well.
I chose UIUC and planned my classes with the express purpose of gaining and advancing my technical skills. When I started, I had a fairly good idea of which fundamentals were important to learn, and I could hold conversations about various technical concepts. I was familiar enough with the tech world not to be overly intimidated by the concepts we were learning about in class, but not familiar enough to actually accomplish something on my own. Now, just a year later, I have confidence in my ability to solve information problems. I feel like an information magician - in the same way that the magicians in my favorite fantasy novels can manipulate the physical world around them with spells, I can manipulate data in an information world with code. It’s a very empowering feeling in our Information Age. I was able to reach this point by focusing my learning on three core concepts:
An information-oriented headspace
Philosophy has always been extremely interesting to me. I love having philosophical debates and learning about new philosophical ideas. There is nothing more important to learning than being able to think critically - and nothing teaches critical thinking skills like philosophy. I have a whole rant about how severely undervalued and underutilized philosophy is in the modern era - but I will save that for another time. The point is, philosophy is important because it is foundational to every field of study… whether you realize it or not.
In the fall I audited a practical philosophy class, PHIL103: Logic & Reasoning. I was introduced to propositional logic (propositions, validity, truth tables, and translating arguments into logical notation), as well as probability and Bayes’ theorem. Essentially, I learned how to think in a purely logical way - the same way that computers “think.” I was cultivating a logical headspace that proved to be extremely useful in both my database and python classes.
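To make “validity” concrete in truth-table terms, here is a minimal Python sketch (an illustration, not material from the course): an argument is valid when no row of the truth table makes every premise true while the conclusion is false. The helper names `implies` and `is_valid` are made up for this example.

```python
from itertools import product


def implies(p, q):
    """Material conditional: false only when p is true and q is false."""
    return (not p) or q


def is_valid(premises, conclusion, n_vars):
    """Check validity by brute force over every row of the truth table.

    An argument is valid when no assignment of truth values makes all
    premises true while the conclusion is false.
    """
    for row in product([True, False], repeat=n_vars):
        if all(premise(*row) for premise in premises) and not conclusion(*row):
            return False  # this row is a counterexample
    return True


# Modus ponens: P -> Q, P, therefore Q
modus_ponens = is_valid(
    premises=[lambda p, q: implies(p, q), lambda p, q: p],
    conclusion=lambda p, q: q,
    n_vars=2,
)

# Affirming the consequent: P -> Q, Q, therefore P
affirming_the_consequent = is_valid(
    premises=[lambda p, q: implies(p, q), lambda p, q: q],
    conclusion=lambda p, q: p,
    n_vars=2,
)

print(modus_ponens)              # True
print(affirming_the_consequent)  # False
```

The second argument fails because of the row where P is false and Q is true: both premises hold, but the conclusion does not.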
In the spring I took IS590TH: Theories of Information. This was a one-time, small seminar-style class taught by the iSchool Dean, Dr. Renear. The goal of the class was to answer one seemingly simple question: what is information? We learned about the technique of conceptual analysis - essentially defining an abstract concept - and slowly accumulated the building blocks we would need in order to define information. We read original works by Gottlob Frege, Alonzo Church, Bertrand Russell, Saul Kripke, Edmund Gettier, H. P. Grice, and Jonathan Furner. While the Logic class had introduced me to the concept of a proposition, in this class we dissected it - to a degree that I had not thought possible. We dissected a lot of concepts, and as a result gained a much deeper understanding of all the fundamental building blocks that make up the concept of information. This was a class that really got my brain fired up - what can I say? I like thinking deeply about abstract and obscure things. But as a result, the headspace I have been cultivating for working with information deepened, broadened, and became more connected. I am able to think critically about information and data in a way that I could not have imagined a year ago, and I feel prepared to tackle more advanced subjects because I have such a solid foundation in the study of information science.
Organizing information
I knew from the beginning that databases were one of those fundamental technologies I needed to learn, simply because of how prevalent they are for storing information. But I had yet to learn just how important it is to organize and structure data properly so that it can actually be put to use.
For my first semester, I took IS490DB: Introduction to Databases. My main goal was to learn just what databases are and how to access the information in them using SQL. The course, however, placed a lot of emphasis on database design - and in the end, that was far more fascinating to me than learning how to write SQL queries (though I certainly learned that as well). We learned how to design a database from scratch: modeling the information in terms of entities, attributes, and relationships; drafting a Chen-style ER diagram (and later EER diagrams); mapping it to the relational model; and building the relational model first with a GUI and finally with SQL. At every stage there are design decisions to make that fundamentally affect how users will interact with the information. We applied the principles of normalization throughout the course, but we weren’t formally taught normalization and functional dependencies until the very end… and by that point, good database design already felt natural and instinctual. I was fully converted - relational databases are da bomb!
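To illustrate what normalization buys you, here is a small sketch using Python’s built-in `sqlite3` module. The `student`, `course`, and `enrollment` tables and their rows are hypothetical, not from the course. Each course title is stored exactly once, a junction table resolves the many-to-many relationship between students and courses, and a join reassembles the flat view on demand.

```python
import sqlite3

# Hypothetical schema: a course title depends only on course_id, so it lives in
# `course` once instead of being repeated on every enrollment row.
SCHEMA = """
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE course (
    course_id TEXT PRIMARY KEY,
    title     TEXT NOT NULL
);
-- Junction table for the many-to-many student/course relationship;
-- the composite primary key prevents duplicate enrollments.
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course_id  TEXT    NOT NULL REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(SCHEMA)

conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany(
    "INSERT INTO course VALUES (?, ?)",
    [("IS490DB", "Introduction to Databases"),
     ("IS452", "Foundations of Information Processing")],
)
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?)",
    [(1, "IS490DB"), (2, "IS490DB"), (2, "IS452")],
)
conn.commit()

# Join the normalized tables back into a flat, human-readable view.
rows = conn.execute("""
    SELECT s.name, c.title
    FROM enrollment AS e
    JOIN student AS s ON s.student_id = e.student_id
    JOIN course  AS c ON c.course_id  = e.course_id
    ORDER BY s.name, c.course_id
""").fetchall()

for name, title in rows:
    print(f"{name}: {title}")
```

Renaming a course now means updating a single row in `course`, instead of hunting down every copy of the title scattered across enrollment records.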
During the spring I took IS561: Information Modeling. In this class, relational databases were just one technique of many for organizing information. Before you can organize information for use with a particular technology, you need to create a model of that information, and different types of information lend themselves to different models and technologies. We learned how to model documents using XML and define their schemas in a DTD, and how to model networks using graphs, including how the semantic web uses knowledge graphs built from RDF syntax and OWL ontologies. We also studied logic through phrase-structure formal grammars (BNF) and learned first-order predicate logic (an extension of propositional logic) and its applications in developing ontologies. Only at the end of the class did we learn about relational databases and how to map between the relational model and RDF. As it turns out, relational databases are just one of many ways to structure data - the same information can be modeled using many different techniques and technologies, each with its own advantages and disadvantages. Besides adding a bunch of modeling and organizational tools to my information magician toolbelt, the class also advanced my understanding of formal logic. It was a very project-oriented class, too, so I got a lot of hands-on practice with all of these tools.
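Mapping between the relational model and RDF is easiest to see on a single row. Below is a simplified Python sketch loosely modeled on the W3C’s Direct Mapping of relational data to RDF. It is not necessarily how the class approached it, and it skips details such as typed literals, IRI percent-encoding, and foreign-key links. The table, columns, row values, and the `http://example.org/` base IRI are hypothetical.

```python
# Hypothetical base IRI; example.org is reserved for documentation examples.
BASE = "http://example.org/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def escape_literal(value):
    """Escape backslashes, quotes, and line breaks for an N-Triples string literal."""
    text = str(value)
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def row_to_triples(table, key_column, row):
    """Map one relational row to RDF triples written as N-Triples lines.

    Simplified from the idea behind the W3C Direct Mapping: the primary key
    becomes the subject IRI, the table becomes an rdf:type, and every other
    non-NULL column becomes a predicate with a plain (untyped) literal.
    """
    subject = f"<{BASE}{table}/{key_column}={row[key_column]}>"
    triples = [f"{subject} {RDF_TYPE} <{BASE}{table}> ."]
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # the key is already in the subject; NULLs yield no triple
        triples.append(f'{subject} <{BASE}{table}#{column}> "{escape_literal(value)}" .')
    return triples


# A hypothetical row from a `country` table keyed on a numeric country code.
row = {"ccode": 2, "abbrev": "USA", "name": "United States of America", "note": None}
for triple in row_to_triples("country", "ccode", row):
    print(triple)
```

Going the other way (RDF back to tables) is harder, because a graph does not force every subject to have the same set of predicates the way a table forces every row to have the same columns.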
Processing information
I knew from the start that learning how to program would be very important - but it was also something I was kind of dreading. I was afraid I would find it boring, or too difficult and complicated. You can’t really say you have technical skills in this age and not know how to program - and my whole goal for this degree was to gain and advance my technical skills. What if I simply didn’t enjoy it?
As you can probably tell at this point, those fears were not realized. In my first semester, I began learning to program in python in IS452: Foundations of Information Processing. In this class, the emphasis was on using python to process information (as opposed to building software). I had never considered software development to be very interesting, and I hadn’t really thought about uses of programming languages outside of software development. But using code to process and transform data? That I found myself enjoying. I had already run into the problem of having the data I needed but not in the right form for analysis (during my undergrad honors thesis). Having the power to manipulate data structures and get the information I actually needed into a form I could analyze? That was some powerful stuff. Besides learning the basic data structures and techniques in python (lists, dictionaries, tuples, loops, functions, decision structures), we also learned how to read and write files (text, CSV, JSON), navigate XML documents using XPath, and craft regular expressions. By the end of the course, I felt like I had a decent grasp of the base python language and could use it for some simple data wrangling tasks. But I did not yet have full confidence in my ability to handle a real information processing task independently - which is why the next class was so important.
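Here is a small sketch of those skills working together (XPath to find elements, a regular expression to pull out a field, and CSV output), using only the standard library. Note that `xml.etree.ElementTree` supports only a limited subset of XPath; full XPath 1.0 needs a third-party library such as lxml. The XML document and its element and attribute names are hypothetical.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical input; element and attribute names are made up for illustration.
XML = """<articles>
  <article id="a1"><title>Elections held nationwide</title><date>1994-04-27</date></article>
  <article id="a2"><title>Trade talks resume</title><date>2001-11-09</date></article>
</articles>"""

root = ET.fromstring(XML)

# ElementTree's XPath subset supports attribute predicates like [@id='a2'].
second_title = root.find(".//article[@id='a2']/title").text

# Capture the four-digit year from an ISO-style YYYY-MM-DD date.
YEAR_PATTERN = re.compile(r"^(\d{4})-\d{2}-\d{2}$")

records = []
for article in root.iterfind(".//article"):
    match = YEAR_PATTERN.match(article.findtext("date", default=""))
    records.append({
        "id": article.get("id"),
        "title": article.findtext("title"),
        "year": int(match.group(1)) if match else None,
    })

# Write the flattened records as CSV (to an in-memory buffer instead of a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "title", "year"])
writer.writeheader()
writer.writerows(records)
print(second_title)
print(buffer.getvalue())
```

Swapping `io.StringIO()` for `open("articles.csv", "w", newline="")` would write the same rows to disk; the file name here is only an example.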
In the spring I took IS590PR: Programming for Analytics and Data Processing. This class was designed as a follow-up to IS452, and it was all about manipulating data structures to get the target information out of various public datasets. It allowed me to build confidence in my ability to use python for practical data wrangling projects. On every assignment, I started out unsure whether I would even be able to complete it, but by the end I had a solution that I was proud of - and by repeating that process over and over again, I gained confidence that even if I did not know immediately how to solve a problem, I could figure it out. Along the way I learned how to use several important python libraries (Pandas, NumPy), how to document code properly with docstrings and test it with doctests, and how to use classes to write object-oriented programs. We also covered several other topics, including XPath, regex, requesting data from an API, graphs and the NetworkX library, and efficiency/optimization using the Cython and Numba packages. The tool used most heavily in this class, however, was Pandas - a library I have come to appreciate greatly.
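As a small illustration of docstrings and doctests working together, here is a hypothetical helper whose documented examples double as tests. The function `cooperation_rate` is invented for this sketch. In a script you would more commonly call `doctest.testmod()` under an `if __name__ == "__main__":` guard; running the examples for one function explicitly keeps this snippet self-contained.

```python
import doctest


def cooperation_rate(moves):
    """Return the share of moves that are cooperative ("C").

    Args:
        moves: A sequence of single-character plays, each "C" or "D".

    Returns:
        A float between 0.0 and 1.0; an empty sequence returns 0.0.

    >>> cooperation_rate("CCDC")
    0.75
    >>> cooperation_rate([])
    0.0
    """
    if not moves:
        return 0.0
    return sum(1 for move in moves if move == "C") / len(moves)


# Collect the examples from this one docstring and run them as tests.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(cooperation_rate, globs={"cooperation_rate": cooperation_rate}):
    runner.run(test)
results = runner.summarize(verbose=False)
print(results)
```

The appeal of this style is that the examples a reader sees in the documentation are guaranteed to stay correct, because they fail loudly the moment the code drifts away from them.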
One of the things I most appreciate about this degree is how project-oriented the classes are. Furthermore, many of my technical classes have very open-ended final projects, which let me work within the domains I am interested in (political science and international relations). There are two major projects that I am especially proud of.
Correlates of War Database
During my first semester, I was able to combine the final projects for both my python and database classes. My goal was to create a relational database version of the various datasets that make up the Correlates of War (CoW) project. I had previously used some of the CoW datasets for my undergraduate thesis - and had to go to the statistical consulting center to transform the data into the right form so it could be integrated with other datasets. The CoW project is widely used by political scientists studying international conflict, and I wanted to create something that would be useful for them. I knew that the CoW datasets could work so much better together if they existed within one cohesive database - but I had no idea just how difficult it would be to bring them back together. Over time, the datasets had been split apart and maintained by different parties, who made different design decisions. While the datasets could still work together because they shared unique IDs for the various entities the project tracked (countries, territories, wars, etc.), it was no longer possible to simply merge them and move on.
So on the database side, I had to design and create a database schema that would integrate all of the various datasets. On the python side, I had to transform the publicly available datasets into the format my database design required. And since there is no better library for wrangling tabular data than Pandas, that meant I had to start teaching myself Pandas before formally learning it in the spring semester. It was a much more ambitious undertaking than I had expected, and I continued to work on the project even after turning in the components required by my classes. In fact, I plan to keep working on it throughout my degree, expanding it to incorporate other datasets that use the CoW identifier system.
If you’re interested in exploring this project more, please refer to its GitHub repo:
Prisoner’s Dilemma: Monte Carlo Simulation
This was the final project for my spring programming class, IS590PR. This project is significant to me for a few reasons - the amount of work I put into it, the game theory + poli-sci element, the potential for future academic work using this program… but mostly because it is the first time I have written my own classes. Up until then, including for all of my course assignments, I had done everything procedurally. I did not foresee just how difficult it would be to wrap my head around programming with classes. They are just another data structure, yes, but they function in a fundamentally different way, and it took a lot of time and experimentation before I could use classes properly (at a beginner level) and feel comfortable with the data flow.
So what does this project do? Essentially, it runs an iterated prisoner’s dilemma tournament (you will find umpteen-million poli-sci journal articles on this topic) many times, with several randomized elements, in order to rank different strategies by their game scores. The twist that makes this tournament unique is “reactive noise”: in addition to a starting noise level set for the environment, the noise level is incremented after every defecting play and decremented after every cooperative play.
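The reactive-noise idea is easier to see in code. The sketch below is not the project’s actual implementation. It assumes that “noise” means the probability that a player’s intended move gets flipped, uses the classic Axelrod payoffs (5/3/1/0), and picks an arbitrary step size and noise bounds. Each defection nudges the noise level up, each cooperation nudges it down, and a Monte Carlo loop reruns a round-robin tournament with a randomized starting noise level to rank the strategies by average score.

```python
import random

# Classic Axelrod payoffs: (my move, their move) -> my points.
PAYOFFS = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(mine, theirs):
    return theirs[-1] if theirs else "C"  # cooperate first, then copy the opponent


def always_defect(mine, theirs):
    return "D"


def always_cooperate(mine, theirs):
    return "C"


class ReactiveNoise:
    """Noise that rises after each defection and falls after each cooperation.

    Assumption: noise is the probability that an intended move is flipped.
    The default start, step, and bounds are illustrative values only.
    """

    def __init__(self, start=0.05, step=0.01, floor=0.0, ceiling=0.5):
        self.level, self.step = start, step
        self.floor, self.ceiling = floor, ceiling

    def apply(self, move, rng):
        """Flip the intended move with probability equal to the current level."""
        if rng.random() < self.level:
            return "D" if move == "C" else "C"
        return move

    def update(self, move):
        """Increment noise for a defection, decrement it for cooperation."""
        delta = self.step if move == "D" else -self.step
        self.level = min(self.ceiling, max(self.floor, self.level + delta))


def play_match(strat_a, strat_b, rounds, noise, rng):
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = noise.apply(strat_a(hist_a, hist_b), rng)
        move_b = noise.apply(strat_b(hist_b, hist_a), rng)
        score_a += PAYOFFS[(move_a, move_b)]
        score_b += PAYOFFS[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
        noise.update(move_a)  # the shared environment reacts to every play
        noise.update(move_b)
    return score_a, score_b


def run_tournament(strategies, rounds, start_noise, rng):
    """Round robin: every pair of strategies plays one match."""
    totals = dict.fromkeys(strategies, 0)
    names = list(strategies)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            noise = ReactiveNoise(start=start_noise)  # fresh environment per match
            score_a, score_b = play_match(strategies[a], strategies[b], rounds, noise, rng)
            totals[a] += score_a
            totals[b] += score_b
    return totals


def monte_carlo(strategies, n_runs=200, rounds=100, seed=42):
    """Repeat the tournament with a random starting noise; rank by mean score."""
    rng = random.Random(seed)
    grand = dict.fromkeys(strategies, 0)
    for _ in range(n_runs):
        start_noise = rng.uniform(0.0, 0.1)
        for name, score in run_tournament(strategies, rounds, start_noise, rng).items():
            grand[name] += score
    means = {name: total / n_runs for name, total in grand.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


STRATEGIES = {
    "tit_for_tat": tit_for_tat,
    "always_defect": always_defect,
    "always_cooperate": always_cooperate,
}
ranking = monte_carlo(STRATEGIES)
for name, mean_score in ranking:
    print(f"{name:>16}: {mean_score:.1f}")
```

One design choice worth flagging: in this sketch both players in a match share a single noise level, so a defection-heavy pairing degrades its own environment. A separate noise level per player would be an equally plausible reading of the description above.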
If you’re interested in trying to run your own simulations using my code, look for the Quick Start Guide in the GitHub repo:
Bonus: coming soon
So the Correlates of War database project is a good showcase of “organizing information”, and the Prisoner’s Dilemma project is a good showcase of “processing information” - it’s only proper that I should also have a final project to showcase “an information-oriented headspace.” Well, it’s not really a project, or even really a proper academic paper, but I do have something for you.
For my spring philosophy class (IS590TH), our single assignment, which doubled as the final, was to write our own conceptual analysis of a topic relevant to library/information science (or critique someone else’s). I chose to write a conceptual analysis (a.k.a. definition) of unique identifiers. For some reason, identifiers fascinate me - and I had devoted a lot of thought throughout the semester to understanding them on a deeper level.
So what are unique identifiers? Stay tuned for my next blog post to find out! I’ll be posting my paper in its full, rambling glory.
As much fun as learning for the sake of learning is, I’m earning this degree so that I can get a job doing what I enjoy. Taking classes is only half the battle - I also need to gain practical on-the-job experience.
During the fall semester, I had an hourly job and volunteered my time on different research projects in the iSchool. I talked with professors and told them what I wanted to do - work with data in the field of political science. Fortunately, many of those professors knew exactly where I had to go: the Cline Center for Advanced Social Research. The Cline Center is based in UIUC’s Research Park and works on computational social science research with partners at the University and at other organizations focused on providing data for political scientists. The Cline Center was the one place on campus that would give me the opportunity to combine information science with political science. At the end of the fall semester, I found out that they were hiring a graduate research assistant. I owe that opportunity to several amazing professors at the iSchool who advocated for me and connected me with the folks at the Cline Center. The GRA-ship turned out to be a perfect melding of my various skill sets: I would be creating the documentation for the Cline Center’s Global News Index - a massive database of metadata and extracted features for international news articles.
I was hired, and in January I started writing the documentation that would allow the Cline Center to open up the database to the campus at large. Researching and writing the documentation drew on my journalism skills (investigating algorithms and distilling complicated information into a concise, readable form), my political science and journalism domain knowledge, and my newly acquired understanding of information science (knowing what information about the variables and corpora was important to include for researchers from a wide variety of domains). I got to use InDesign to lay out the documentation (and even learned some new tricks!), and learned that technical writing was definitely in my wheelhouse. By the end of the semester, after spending 20 hours every week on this documentation, I had produced documentation for the Global News Index itself, as well as user guides for the applications used to access the database.
I’m happy to say that next semester I will be continuing to work at the Cline Center as a Graduate Research Assistant, with my duties shifting from creating documentation to more directly assisting with the research conducted by the Cline Center and its partners. This GRA-ship was important to me for many reasons - not only did I gain experience in a new skill (technical writing), but I gained confidence in my ability to operate as a professional. I didn’t feel like a student who was just there to help with menial tasks or shadow the real experts; I felt like a part of the team, contributing something valuable on my own merits. What’s more, I enjoyed my work. I even had a couple of small opportunities to use my coding skills (which I enjoyed the most!). I’m so excited to see how I can develop further next semester.
But before the fall semester arrives, there is summer - and what an exciting summer it will be! This summer I am in Gothenburg, Sweden, as a Data Science Intern for the Program on Governance and Local Development at the University of Gothenburg. I am working directly with GLD’s Data Scientist on a major project - a massive survey on local governance issues in multiple Southeast African countries. My internship started off in the best way possible… attending GLD’s annual conference (at an amazing spa hotel) to get a crash course on accountability research in international development. I got to meet both academics and practitioners from international aid organizations and enjoyed many interesting discussions. As for the internship itself, I’ve been able to use my python skills to process information and put it into the form our Data Scientist needs in order to run her analyses. I’m very glad that both of my programming instructors taught a unit on XPath - most of the information I’ve been processing comes from XML documents! I’m enjoying myself immensely, because every day has a new coding challenge - and it’s all real problems (not class assignments) that need to be solved. I’m so excited to be at an internship that uses the information processing and organization skills I’ve spent the past year acquiring, in the domain of international development. Plus… it’s Sweden! Living and working in a foreign country feels pretty normal at this point, but I am beyond stoked to experience life in a Scandinavian nation (especially considering the cool summers). Here’s to another year of professional development - hopefully, it leads to a job!