52 Weeks of Data Science

Weekly posts cover new concepts and hands-on projects. Follow along to explore data analysis, data cleaning, and machine learning.


Over the years, I’ve come across numerous challenges that follow the format “X days of {insert technology here}”, where the goal is to learn a subject by doing exercises each day and posting about it to stay accountable.

Inspired by this, I’m attempting to provide myself with some structure for learning concepts of Data Science and Statistics. Each week, I will tackle new (to me) concepts and projects to further my understanding. Why weekly instead of daily? While I believe that doing something daily to learn is important, some days I’m quite burnt out from work and don’t want to do anything else. I think weekly will give me the flexibility I need to stay motivated and keep a good pace.

Background

While I’ve known about the field, I didn’t really start thinking about Data Science until I met my wife, an Epidemiologist. Seeing and hearing about the projects and data she works with sparked my interest. Although I don’t have any formal training in Data Science or Statistics, the field has become much more interesting to me.

I have a substantial amount of software development experience to lean on, having used a number of different languages throughout my career. I think this will help ease me into the topic since coding won’t be as big of a barrier for me.

The Desired Outcome

In short, I’m hoping to learn skills that will complement my career as a software developer and provide opportunities for a pivot in the coming years.

When I look at data sets, I often don’t have an idea of what questions to ask to glean any insights. I’m looking to develop skills to explore and analyze data effectively, as well as to build a solid mathematical understanding of the concepts behind the analysis.

Resources

There are a couple of books I’m planning to use:

  1. R for Data Science
  2. Learn Statistics with R

These books appear to be highly recommended as excellent introductory resources. I have started going through R for Data Science, and it has been very useful for learning how to clean and explore data using R.

Outside of books, I think the best way to learn is by doing. So, I’m going to find data sets as I progress and start exploring. Right now, I’m looking at using the data sets provided through data.gov, specifically the CDC data sets. I find public health and climate data, in particular, interesting.

Going Forward

My plan is to write a post every week about what I’ve been learning and doing. Currently, the topics I find interesting are analysis, data cleaning, and machine learning. I plan to touch on all of these as I progress.

I hope I can maintain this habit and not move on to another interest in four weeks. Maybe this will help keep me accountable.