This lesson is still being designed and assembled (Pre-Alpha version)

Data Wrangling with Computational Notebooks: Glossary

Key Points

Introduction	Notebooks help connect the code for cleaning and wrangling data to the documentation explaining what is being done and why.
Jupyter Notebook Interface	A Jupyter notebook is divided into cells that are either code, Markdown, or raw. Cells can be “run”, leading to either the execution of code or the formatting of Markdown, depending on the cell type. Code cells can be rerun, but this should be avoided to prevent obscuring the notebook's workflow.
Loading and Handling Pandas Data	Pandas provides numerous attributes and methods that are useful for wrangling and analyzing data. Pandas also contains numerous methods to load data from, and write data to, files of different types.
Wrangling DataFrames	Select columns by using `["column name"]` or rows by using the `loc` attribute. Sort based on values in a column by using the `sort_values` method.
DataFrame Analysis	Use `.dtypes` to get the type of each column in a `DataFrame`. To get general statistics on the `DataFrame`, use the `describe` method. You can add a constant to a numeric column by using `column + constant`.
Real Example Cleaning	Cleaning a dataset is an iterative process that can require multiple passes. Remember to restart the kernel when cleaning a dataset to make sure that your code encompasses all the cleaning needed.
Real Example Analysis	Grouping data by year and month is a powerful way to identify monthly and yearly changes. You can easily add more measurements to a single plot by passing a list. There is a lot we didn’t cover here, so take a look at the Matplotlib docs and at other libraries that allow you to make dynamic plots, e.g. Plotly.
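
The pandas key points above can be sketched in one short script. This is a minimal illustration, assuming pandas is installed; the dataset, its column names (`date`, `station`, `temp_c`, `rain_mm`), and the Celsius-to-Kelvin conversion are made up for the example and do not come from the lesson's data.

```python
import io

import pandas as pd

# Hypothetical data, invented for illustration (not the lesson's dataset).
CSV_TEXT = """date,station,temp_c,rain_mm
2020-01-15,A,2.0,10.0
2020-01-20,B,4.0,0.0
2020-02-10,A,5.0,3.0
2021-01-05,A,1.0,8.0
2021-02-18,B,7.0,1.0
"""

# Load: read_csv accepts a file path or a file-like object.
df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["date"])

# Select a column with ["column name"] and rows with loc.
temps = df["temp_c"]
station_a = df.loc[df["station"] == "A"]

# Sort by the values in one column, warmest first.
by_temp = df.sort_values("temp_c", ascending=False)

# Inspect the column types and general statistics.
column_types = df.dtypes
summary = df.describe()

# Add a constant to a numeric column (here: Celsius to Kelvin).
df["temp_k"] = df["temp_c"] + 273.15

# Group by year and month to compare monthly and yearly values.
year = df["date"].dt.year.rename("year")
month = df["date"].dt.month.rename("month")
monthly = df.groupby([year, month])[["temp_c", "rain_mm"]].mean()

# Write: to_csv returns the CSV text when no path is given.
csv_out = monthly.to_csv()
```

In a notebook, each commented step would typically live in its own cell, with a Markdown cell explaining why the step is needed. Passing a path such as `"monthly.csv"` to `to_csv` would write a file instead of returning a string.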

Glossary

FIXME