This lesson is still being designed and assembled (Pre-Alpha version)

Data Wrangling with Computational Notebooks

Synopsis: This workshop introduces Jupyter Notebooks as a tool for building reproducible computational workflows. Attendees will learn to build notebooks that combine explanatory Markdown-formatted text with Python code. Lessons cover how to load data from formats such as .csv and .tsv files, how to clean the loaded data, and how to analyze it. The workshop concludes with attendees applying these skills to wrangle a real-life dataset and visualize its contents, using Python libraries such as Pandas and Matplotlib.
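To give a feel for the load, clean, and analyze steps described above, here is a minimal sketch of what a single notebook cell might contain. It assumes a small, hypothetical tab-separated dataset (the column names `date`, `station`, and `temp_c` are invented for illustration). In a real notebook you would pass a file path to `pd.read_csv` instead of the in-memory `StringIO` object.

```python
import io

import pandas as pd

# Hypothetical .tsv content used only for illustration; with a real file
# on disk, pass its path to read_csv instead of this StringIO object.
raw_tsv = io.StringIO(
    "date\tstation\ttemp_c\n"
    "2020-01-01\tA\t21.5\n"
    "2020-01-02\tA\t\n"
    "2020-01-03\tA\t23.0\n"
)

# Load: sep="\t" tells read_csv the columns are tab-separated,
# and parse_dates converts the "date" column to datetime values.
df = pd.read_csv(raw_tsv, sep="\t", parse_dates=["date"])

# Clean: the empty temp_c field is read as NaN; drop that row.
clean = df.dropna(subset=["temp_c"])

# Analyze: compute a simple summary statistic on the cleaned data.
mean_temp = clean["temp_c"].mean()
print(mean_temp)
```

Dropping incomplete rows is only one way to handle missing data; the later episodes on cleaning a real dataset discuss when other approaches are more appropriate.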

Prerequisites

  • Basic understanding of Python
  • Basic understanding of Markdown
  • Basic understanding of Unix-like file systems
  • A computer with an internet connection
  • An account on Mana (as a fallback in case of problems with Binder)
  • MFA/Duo enabled on your UH account

Schedule

Setup  Download files required for the lesson
00:00  1. Introduction
         • Why are Jupyter notebooks useful for cleaning and wrangling data?
00:10  2. Jupyter Notebook Interface
         • How is the Jupyter Notebook interface set up?
00:20  3. Loading and Handling Pandas Data
         • How are Pandas data structures set up?
         • How do you load data into Pandas?
         • How do you write data from Pandas to a file?
00:40  4. Wrangling DataFrames
         • How can you select individual columns or rows from a DataFrame?
         • How can you subset a DataFrame?
         • How can you sort a DataFrame?
01:00  5. DataFrame Analysis
         • What are some common attributes of Pandas DataFrames?
         • What are some common methods of Pandas DataFrames?
         • How can you do arithmetic between two Pandas columns?
01:20  6. Real Example Cleaning
         • How do you clean an example dataset?
         • How do you deal with missing data?
         • How do you fix column type mismatches?
01:40  7. Real Example Analysis
         • How do you visualize data from a DataFrame?
         • How do you group data by year and month?
         • How do you plot multiple measurements in a single plot?
02:00  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.