Introduction to machine learning in Python
Length: 4 days
Description
Machine learning is changing the world — from ChatGPT, to predictive typing on your cellphone, to Amazon’s Alexa voice recognition, to the spam detector in our e-mail programs. In this course, we’ll learn the basic principles behind machine learning, and will see how we can put these ideas into practice using Python and its popular “scikit-learn” library.
The course will discuss the main uses of machine learning: Classification, regression, and clustering — including specific use cases, such as classification of text, classification of images, and clustering for outlier detection. We’ll create models, and then test those models to make sure that they aren’t overfit. We’ll look at ways in which we can transform our data for better model results.
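To give a flavor of what "testing a model to make sure it isn't overfit" looks like in practice, here is a minimal sketch. The choice of the bundled iris dataset, a decision-tree classifier, and a 75/25 split are assumptions made for illustration, not a fixed part of the course material.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small classification dataset that ships with scikit-learn
X, y = load_iris(return_X_y=True)

# Hold back 25% of the rows so the model is tested on data it never saw
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Train a classifier on the training portion only
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Compare accuracy on the training data vs. the held-back test data
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)
print(f"train accuracy: {train_score:.2f}")
print(f"test accuracy:  {test_score:.2f}")
```

A training score that is much higher than the test score is the usual warning sign that a model has memorized its training data rather than learned a general pattern.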
We will also discuss some of the data problems that get in the way of successful machine learning, and the transformations that overcome them, including scaling and one-hot encoding. We’ll then see how these can be automated, using such tools as ColumnTransformer, and how we can package such transformations into a “pipeline.”
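As a rough sketch of how these pieces fit together, the example below scales two numeric columns, one-hot encodes a text column, and wraps both steps plus a model into a single pipeline. The data frame, its column names, and the choice of LogisticRegression are all hypothetical, invented here for illustration only.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data, made up for this example
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29],
    "income": [30_000, 52_000, 81_000, 90_000, 61_000, 40_000],
    "city": ["Paris", "London", "Paris", "Berlin", "London", "Berlin"],
    "bought": [0, 0, 1, 1, 1, 0],
})
X = df[["age", "income", "city"]]
y = df["bought"]

# Apply a different transformation to each group of columns
preprocess = ColumnTransformer([
    ("scale", StandardScaler(), ["age", "income"]),                 # numeric columns
    ("onehot", OneHotEncoder(handle_unknown="ignore"), ["city"]),   # categorical column
])

# Package the transformations and the model into one object
pipe = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression()),
])
pipe.fit(X, y)

# Two scaled columns plus one column per distinct city
transformed = pipe.named_steps["preprocess"].transform(X)
print(transformed.shape)

predictions = pipe.predict(X)
print(predictions)
```

Because the pipeline holds both the transformations and the model, calling predict on new data applies the same scaling and encoding that were learned during fit, which removes a common source of mistakes.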
While we will discuss a number of machine-learning algorithms in the class, the discussion will be at a relatively high level, and will not go into mathematical detail.
By the end of this course, participants will not only understand and appreciate what machine learning is and how it works, but also know how to use it to solve a variety of problems.
The course includes a large number of hands-on exercises, along with live demos and ample opportunity for questions and discussion.
Like all of my courses, this is taught without slides. Instead, I live-code into a Jupyter notebook that is available in real time and which I distribute to participants at the end of the course.
Let’s talk about how to customize this course for your team! Set a meeting at https://savvycal.com/reuven/corp-training.
Audience
Participants are expected to have at least minimal experience with Python: knowledge of the basic data types, an ability to write loops, familiarity with writing and executing basic functions, and a basic understanding of object-oriented programming. In addition, participants should be familiar with Python’s pandas library, especially retrieving, selecting, and modifying data in data frames.
Participants will receive the Jupyter notebooks into which I live-code during the class — including demos, exercises, and remarks.
Syllabus
• What is machine learning?
• Intro to scikit-learn
• Classification problems: Common estimators. Choosing an estimator. Training and testing models. Variance and bias. Hyperparameters.
• Programmatic comparison of models.
• Visualization and models.
• Model persistence.
• Regression problems: Common estimators. Testing regression models. Post-fitting attributes. Scaling. Pipelines. Ensemble estimators. Visualization.
• Transforming data: One-hot encoding. Missing values. ColumnTransformer.
• Classification of text and documents.
• Classification of images.
• Unsupervised learning and clustering.
• Novelty and outlier detection.
