Pandas Practice Workshop
Length: 1 day
Description
Python is the top language for data science, and Pandas is the Python library that people turn to most for data analysis. It combines Python’s ease of use and flexibility with the speed and efficiency of C.
It isn’t hard to learn Pandas and do some basic analysis with it. But getting the most out of Pandas can be hard: the library is huge and offers a wide variety of features, and it’s often not obvious which approach is the most efficient and readable way to solve a problem. This has become particularly true in the last few years, now that Pandas supports several back-end storage options, including Apache Arrow.
Most Pandas courses aim to teach you new techniques. By contrast, this course is designed to help you gain fluency and understanding in techniques you have probably already learned, but haven’t necessarily had a chance to explore in depth. Participants are assumed to have already taken a Pandas course.
The course consists of a series of hands-on exercises, each of which requires one or more Pandas techniques. After each exercise, we will review and discuss each other’s solutions. The instructor’s role is to help participants as they work on the exercises, to explain techniques that participants might not understand well, to walk through the solution to each exercise, and to facilitate the discussion that follows.
The exercises all use real-world data sets, and most have several parts. Many are taken from my Pandas Workout book and my Bamboo Weekly newsletter.
Let’s talk about how to customize this course for your team! Set a meeting at https://savvycal.com/reuven/corp-training.
Audience
Employees who have already taken one or more Pandas courses and are interested in improving their Pandas fluency. Participants should have a solid grasp of series, data frames, dtypes, and basic Pandas methods.
Some of the topics covered in the exercises
• Choosing dtypes
• Reading data from CSV and Excel files
• Cleaning data
• Sorting
• Grouping and pivot tables
• Using .loc, .iloc, and pd.col
• Time series
• Strings
• Plotting
• Optimizing memory usage and query time
• Using PyArrow
