PANDAS

Get started with data analysis using Pandas, the Python package that makes it easy to read, clean, understand, and plot your data.

About The Course

“Data is the new oil,” the saying goes, which means that working with data is a key part of the modern economy. But what tools can you use to work with data?

For millions of people, Pandas is the answer: This Python library is built on top of NumPy, meaning that it combines the speed of C with the friendliness of Python. But Pandas goes way beyond NumPy, providing a wide variety of methods that make it easy to read, clean, analyze, plot, and write data in a large number of formats.

This course introduces you to Pandas, and to the many ways you can use it in your day-to-day work. You’ll learn how Pandas works, how to use such techniques as broadcasting and boolean indexing, how to read data from a variety of formats and then how to clean that data once you’ve read it into memory. And you’ll see how Pandas works not only with numbers, but also with text — allowing for all sorts of interesting analysis.
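
To give a rough taste of what broadcasting, boolean indexing, and string methods look like in practice, here is a minimal sketch. The temperature values, city names, and variable names are made up for illustration; they are not taken from the course materials.

```python
import pandas as pd

# Hypothetical data: daily temperatures in Celsius, indexed by weekday
temps_c = pd.Series([18.5, 22.0, 30.5, 27.0],
                    index=["Mon", "Tue", "Wed", "Thu"])

# Broadcasting: the scalar arithmetic is applied to every element
temps_f = temps_c * 9 / 5 + 32

# Boolean indexing: the comparison produces a boolean series,
# which then selects only the rows where the condition is True
hot_days = temps_c[temps_c > 25]

# Text works too: the .str accessor exposes string methods on a series
cities = pd.Series(["Chicago", "Boston", "Chennai"])
ch_cities = cities[cities.str.startswith("Ch")]

print(temps_f)
print(hot_days)
print(ch_cities)
```

With this sample data, `hot_days` keeps only the Wednesday and Thursday readings, and `ch_cities` keeps Chicago and Chennai.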

This is the same course that I give at Fortune 500 companies around the world. It includes a large number of exercises, along with the Jupyter notebooks I used when preparing the course.

If the world of data analysis interests you, then this course will get you going in relatively short order.

This Course Will Show You How To...

This course teaches you everything you need to know to get started with Pandas:

  • Series
  • Data frames
  • Boolean indexing
  • Grouping
  • Joining
  • Plotting
  • Working with missing data
  • Cleaning data
  • Strings
  • Sorting
  • Indexes and multi-indexes

Every section of the course has multiple exercises that help to confirm you know the material. Plus, you can download the Jupyter notebooks I used to create the course.

Preview The Course

Setting and retrieving indexes
Retrieving with loc and iloc

Assigning to dtypes

Course Contents

Course Length: 12 Hours
Number of Lessons: 151
Training Materials: 34 PDFs
Coding Exercises: 77

  • Introduction (3 mins)
  • What is pandas? (5 mins)
  • Installing pandas (6 mins)
  • Loading pandas into Jupyter (6 mins)
  • Creating a Series (5 mins)
  • Creating a Series with NumPy (4 mins)
  • Setting and retrieving with indexes (5 mins)
  • Retrieving with loc and iloc (5 mins)
  • Setting the index (6 mins)
  • Non-unique indexes (4 mins)
  • Fancy indexing (3 mins)
  • Basic methods (2 mins)
  • Operations by index (3 mins)
  • Broadcasting operators (2 mins)
  • Boolean indexing (3 mins)
  • Exercise 1 (2 mins)
  • Exercise 1 solutions (6 mins)
  • dtypes (7 mins)
  • Assigning to dtypes (6 mins)
  • Using astype (6 mins)
  • NaN (3 mins)
  • Skipping NaN (5 mins)
  • dropna and fillna (7 mins)
  • Fill value (6 mins)
  • Exercise 2 (2 mins)
  • Exercise 2 solutions (5 mins)
  • Size and count (3 mins)
  • Median and quantiles (4 mins)
  • Describe (4 mins)
  • describe with non-numeric data (2 mins)
  • Head and Tail (3 mins)
  • Value Counts (5 mins)
  • Duplicated (3 mins)
  • Replace (5 mins)
  • Sorting (5 mins)
  • Apply (5 mins)
  • Exercise 3 (2 mins)
  • Exercise 3 solutions (6 mins)
  • Strings in Pandas vs NumPy (5 mins)
  • String methods and the “str” object (5 mins)
  • Finding numbers (4 mins)
  • startswith and endswith (3 mins)
  • [] and strings (4 mins)
  • str.contains (4 mins)
  • find and index (6 mins)
  • Modifying data (6 mins)
  • Splitting and reusing str (3 mins)
  • Exercise 4 (1 min)
  • Exercise 4 solutions (5 mins)
  • Simple plots with Matplotlib (14 mins)
  • More sophisticated plotting with Matplotlib (11 mins)
  • Line plots via pandas (8 mins)
  • Bar plots with pandas (5 mins)
  • Histograms (5 mins)
  • Pie plots (6 mins)
  • Exercise 5 (2 mins)
  • Exercise 5 solutions (6 mins)
  • Data frames introduction (5 mins)
  • Index and columns (simple retrievals) (6 mins)
  • Dot syntax for column retrieval (2 mins)
  • Setting the index and columns (7 mins)
  • Retrieving an individual value (5 mins)
  • Creating data frames from NumPy arrays (6 mins)
  • Creating data frames from a list of dicts (5 mins)
  • Creating data frames from a dict of lists, arrays, or series (8 mins)
  • Methods on columns (3 mins)
  • Methods on an entire data frame (4 mins)
  • Retrieving multiple columns (6 mins)
  • Retrieving multiple rows (4 mins)
  • Updating values in a data frame (5 mins)
  • Using “describe” on data frames (4 mins)
  • Updating a column (6 mins)
  • Adding columns (4 mins)
  • Updating values in rows and adding rows (3 mins)
  • Dropping one or more rows (2 mins)
  • Dropping one or more columns (3 mins)
  • Exercise 6 (2 mins)
  • Exercise 6 solutions (9 mins)
  • Boolean indexes on a column (4 mins)
  • Applying boolean indexes to other columns (8 mins)
  • Complex queries across columns (6 mins)
  • Applying a boolean index to an entire data frame (5 mins)
  • Assigning to data frames (recap) (4 mins)
  • Assigning to multiple rows and columns with loc (4 mins)
  • Assigning to a column based on a boolean index (8 mins)
  • Chained assignment: what it is, and how to avoid it (4 mins)
  • Data frame assignment example (4 mins)
  • Assigning a scalar value to a data frame, based on a condition (4 mins)
  • Assigning a vector value to a data frame, based on a condition (5 mins)
  • Using df.replace to replace values across a data frame (6 mins)
  • Using isna, dropna, and fillna with data frames (8 mins)
  • Using mask and where (4 mins)
  • Using clip (6 mins)
  • Exercises 7 (3 mins)
  • Exercise solutions 7 (14 mins)
  • Pandas and IO — and saving to the clipboard (6 mins)
  • Saving to CSV (6 mins)
  • Changing the CSV separator (6 mins)
  • NaN representation (3 mins)
  • Choosing output columns (3 mins)
  • Writing row and column names (6 mins)
  • Saving with compression (6 mins)
  • Reading CSV files (6 mins)
  • Choosing and ignoring header rows (5 mins)
  • Naming columns (5 mins)
  • Choosing columns (5 mins)
  • Choosing + naming (6 mins)
  • Reading NaN values (4 mins)
  • dtype hints when reading CSV (6 mins)
  • Reading from the network (5 mins)
  • Exercise 8 (1 min)
  • Exercise 8 solutions (6 mins)
  • Excel files (6 mins)
  • JSON files (6 mins)
  • SQL databases (13 mins)
  • Analysis of taxi data (6 mins)
  • Taxi data part 2 (3 mins)
  • Taxi data part 3 (5 mins)
  • Taxi data part 4 (5 mins)
  • Taxi data part 5 (6 mins)
  • Taxi data part 6 (6 mins)
  • Exercises 9 (3 mins)
  • Exercise 9 solutions (8 mins)
  • Data frames and memory usage (13 mins)
  • Memory usage in series and data frames (7 mins)
  • Categories (8 mins)
  • Setting dtypes upon load (3 mins)
  • Predefining categories (6 mins)
  • Avoiding low-memory warnings (4 mins)
  • Exercises 10 (2 mins)
  • Exercise solutions 10 (5 mins)
  • set_index and reset_index (7 mins)
  • multi-indexes on series (6 mins)
  • multi-indexes on data frames (5 mins)
  • stack and unstack (4 mins)
  • swaplevel (3 mins)
  • Exercises 11 (3 mins)
  • Exercise 11 solutions (13 mins)
  • sort_index (6 mins)
  • sort_values (4 mins)
  • Concatenating data frames (5 mins)
  • concatenating different data frames (5 mins)
  • Inner and outer joins (3 mins)
  • Merging — inner, left, right, and outer (10 mins)
  • groupby (7 mins)
  • Pivot tables (3 mins)
  • Exercises 12 (3 mins)
  • Exercise solutions 12 (11 mins)
  • Plotting data frames (11 mins)
  • Bar plots (4 mins)
  • Stacked bar plots (4 mins)
  • Histograms (5 mins)
  • Pie plots (4 mins)
  • Box plots (5 mins)
  • Scatter plots (8 mins)
  • Scatter plots and colormaps (7 mins)
  • Scatter matrix (3 mins)
  • Exercises 13 (2 mins)
  • Exercises 13 solutions (10 mins)
  • Conclusion (2 mins)

This Course Is Perfect For...

People with a background in Python (core data types and functions) and with some knowledge of NumPy who want to take advantage of Pandas and its extra capabilities.

GET STARTED NOW

BUY THIS COURSE

One-Time Purchase (Lifetime Access)
$360 One-Time
  • Pandas data types vs. Python data types
  • Retrieving and storing data
  • Analyzing your data
  • Handling missing and corrupt data
  • Create beautiful plots from your data

OR

GET A MEMBERSHIP

Access All My Training
Monthly: $50 Per Month
Annual: $500 Per Year (Save 20%)
  • All my Python courses
  • Monthly office hours + special events
  • Private forum

100% Money Back Guarantee

I’m a one-person company dedicated to improving your career via Python and related technologies. If you haven’t gotten value from any of my courses, then just tell me — and I’ll refund your money.

Meet Your Instructor

Reuven is a full-time Python trainer. In a given year, he teaches courses at companies in the United States, Europe, Israel, India, and China — as well as to people around the world, via his online courses.

Reuven created one of the first 100 Web sites in the world just after graduating from MIT’s computer science department. He opened Lerner Consulting in 1995, and has been offering training services since 1996.

In 2020, Manning published Reuven’s “Python Workout,” a collection of Python exercises with extensive explanations. He’s currently finishing edits on “Pandas Workout,” a similar collection of exercises using the Pandas library for data analytics.

Reuven’s free, weekly “Better developers” newsletter, about Python and software engineering, is read by more than 30,000 developers around the globe. His “Trainer weekly” newsletter is popular among people who give corporate training.

Reuven’s most recent venture is Bamboo Weekly: Every Wednesday, he presents a problem based on current events, using a public data set. And every Thursday, he shares detailed solutions to those problems using Pandas.

Reuven’s monthly column appeared in Linux Journal from 1996 until the magazine’s demise in 2019. He was also a panelist on both the Business of Freelancing and Freelancers Show podcasts.

Reuven has a bachelor’s degree in computer science and engineering from MIT, and a PhD in learning sciences from Northwestern University. He lives in Modi’in, Israel with his wife and three children.