Data analytics with NumPy and Pandas
Length: 4 days
Description
This course introduces the NumPy and Pandas packages for Python, and shows how they can be used to ask and answer a variety of questions involving data analysis. NumPy and Pandas are the foundation of the “SciPy stack,” a set of Python packages that have become extremely popular in recent years. Indeed, some financial institutions have begun to replace certain uses of Excel with Pandas, because of its versatility and power.
This course covers all of the major ways in which NumPy and Pandas are typically used — from reading data, to processing and cleaning it, to visualizing it, to exporting it into other formats.
This is the class that some of the world’s largest and best-known companies (e.g., Apple, Cisco, and Western Digital) invite me to teach to their engineers and data scientists.
The course includes a large number of hands-on exercises, along with live demos and ample opportunity for questions and discussion.
Like all of my courses, this is taught without slides. Instead, I live-code into a Jupyter notebook that is available in real time and which I distribute to participants at the end of the course.
Let’s talk about how to customize this course for your team! Set a meeting at https://savvycal.com/reuven/corp-training.
Audience
This course is aimed at programmers and data scientists with day-to-day, hands-on experience in Python. Participants should know the basic data types, be able to write loops, and be comfortable writing and invoking functions.
Participants will receive the Jupyter notebooks into which I live-code during the class — including demos, exercises, and remarks.
Syllabus
• Intro to Jupyter
• NumPy: Arrays. dtypes. Operations. Boolean indexing. Broadcasting. Sorting. Searching.
• Pandas series: Creating. Retrieving with loc and iloc. pd.col (in Pandas 3). Indexes. Boolean indexing. Method chaining. dtypes. Differences from NumPy.
• Data frames: Creating. Retrieving with 2-argument loc. Modifying and assigning.
• External data: Reading and writing CSV. Excel files. Feather and Parquet. Remote retrieval. Scraping websites.
• Indexing: Single- and multi-level indexing. Stacking and unstacking.
• Grouping and pivot tables.
• Sorting. Joining and merging.
• Working with string data.
• Working with datetime values.
• Optimizing data frames: Memory measurement. Categories.
• Visualization and plotting.
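To give a flavor of the material, here is a minimal sketch combining two of the topics above, boolean indexing and grouping. The column names and data are invented purely for illustration:

```python
import pandas as pd

# Invented sample data, just to illustrate the techniques taught
df = pd.DataFrame({
    "city": ["NYC", "NYC", "LA", "LA"],
    "temp": [38, 45, 70, 72],
})

# Boolean indexing: keep only the rows where temp is above 40
warm = df.loc[df["temp"] > 40]

# Grouping: mean temperature per city
means = df.groupby("city")["temp"].mean()
```

In class, examples like this are built up live in Jupyter, with exercises extending them to real data sets.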
