Machine Learning
Basic
Machine learning, a subfield of artificial intelligence, enables computers to learn patterns from data and make predictions or decisions without being explicitly programmed for each task.
This competency area covers feature engineering (feature selection and model selection); selecting, using, and optimizing machine learning models; procuring data; and performing basic operations on data, among other skills.
Key Competencies:
- Feature engineering - Applying feature selection and model selection techniques.
- Machine Learning Models - Selecting, using, and optimizing machine learning models.
- Libraries - Familiarity with scientific Python and machine learning libraries such as SciPy, SymPy, NumPy, pandas, scikit-learn, and Matplotlib.
- Working with Jupyter Notebook - Setting up a machine learning development environment (and associated tools) by launching a Jupyter Notebook server from the command line and creating a new notebook.
- UCI Machine Learning Repository - Procuring data for the machine learning process from the UCI Machine Learning Repository, downloading it from within a Jupyter notebook.
- Ingest data and display the first 10 observations - Loading downloaded data and inspecting its first 10 observations using libraries such as pandas, a popular Python data analysis library that provides DataFrames.
- Visualize data - Visualizing downloaded data using libraries such as Matplotlib, a popular Python library for 2D plotting.
- Clean and prepare data - Cleaning and preparing data by removing observations that contain null (empty) values and duplicate observations, using libraries such as pandas.
- Transform categorical features to numerical - Converting categorical values into numbers using libraries such as pandas. For example, converting “True” and “False” values to 1 and 0, respectively.
- Determine data ratios - Determining class ratios for binary classification using pandas. For example, counting the “True” and “False” observations across a dataset.
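For the feature selection part of the Feature engineering competency, a minimal sketch with scikit-learn might look like the following. It assumes a univariate filter (`SelectKBest` scored with the ANOVA F-test, `f_classif`) is an acceptable choice; other selectors such as recursive feature elimination work too. The helper name `select_top_features` and the synthetic data are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif


def select_top_features(X, y, k):
    """Hypothetical helper: keep the k features with the highest ANOVA F-score against y."""
    selector = SelectKBest(score_func=f_classif, k=k)
    X_selected = selector.fit_transform(X, y)
    # get_support(indices=True) returns the indices of the columns that were kept
    return X_selected, selector.get_support(indices=True)


# Synthetic binary-classification data: 10 features, only 3 of them informative
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)
X_top, kept = select_top_features(X, y, k=3)
print(X_top.shape)  # (200, 3)
print(kept)         # indices of the selected columns
```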
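For the Machine Learning Models competency (selecting, using, and optimizing a model), one common pattern is a cross-validated hyperparameter search. The sketch below assumes a k-nearest-neighbors classifier and the Iris dataset bundled with scikit-learn purely for illustration; the candidate `n_neighbors` values are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Iris ships with scikit-learn, so no download is needed
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Try several values of the k-NN hyperparameter, scoring each with 5-fold cross-validation
search = GridSearchCV(KNeighborsClassifier(),
                      param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
                      cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
# GridSearchCV refits the best model on the full training set; evaluate it on held-out data
print(f"test accuracy: {search.score(X_test, y_test):.3f}")
```

Keeping a separate test split means the reported accuracy comes from data the search never saw.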
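For the UCI Machine Learning Repository competency, downloading a file from a notebook cell can be done with the standard library alone. This is a sketch: the helper name `download_dataset` and the skip-if-present behavior are assumptions, and the actual file URL should be copied from the dataset's page on the UCI site.

```python
import os
import urllib.request


def download_dataset(url, dest_path):
    """Hypothetical helper: fetch url to dest_path, skipping the download if the file exists."""
    if not os.path.exists(dest_path):
        # Create the destination folder if needed; dirname is "" for a bare filename
        folder = os.path.dirname(dest_path)
        if folder:
            os.makedirs(folder, exist_ok=True)
        urllib.request.urlretrieve(url, dest_path)
    return dest_path
```

In a notebook cell this would be called as `download_dataset(url, "data/dataset.csv")`, with `url` set to the link from the UCI page.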
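For the ingest-and-display competency, pandas can read a comma-separated file and show its first rows with `head(10)`. Many UCI files have no header row, so the sketch below assumes column names are supplied by hand; `load_dataset` and the inline sample data are hypothetical stand-ins for a real downloaded file.

```python
import io

import pandas as pd


def load_dataset(source, column_names=None):
    """Hypothetical helper: read a CSV into a DataFrame.

    If column_names is given, the file is assumed to have no header row.
    Otherwise the first row is treated as the header.
    """
    if column_names is None:
        return pd.read_csv(source)
    return pd.read_csv(source, header=None, names=column_names)


# An inline, made-up sample standing in for a downloaded file
raw = io.StringIO("\n".join(f"{i},{i * 0.5},{'yes' if i % 2 else 'no'}" for i in range(15)))
df = load_dataset(raw, column_names=["id", "value", "label"])
print(df.head(10))  # first 10 observations
```

In Jupyter, leaving `df.head(10)` as the last expression in a cell renders it as a formatted table instead of plain text.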
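For the Visualize data competency, a histogram (one feature's distribution) and a scatter plot (two features against each other) are typical first plots. The column names and values below are made up for illustration. The `Agg` backend is selected only so the sketch runs outside a notebook; in Jupyter that line can be dropped and the figure renders inline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({  # hypothetical sample data
    "sepal_length": [5.1, 4.9, 6.3, 5.8, 7.1, 6.5],
    "petal_length": [1.4, 1.4, 6.0, 5.1, 5.9, 4.6],
})

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Distribution of a single feature
ax_hist.hist(df["sepal_length"], bins=5)
ax_hist.set_xlabel("sepal_length")
ax_hist.set_ylabel("count")

# Relationship between two features
ax_scatter.scatter(df["sepal_length"], df["petal_length"])
ax_scatter.set_xlabel("sepal_length")
ax_scatter.set_ylabel("petal_length")

fig.tight_layout()
fig.savefig("features.png")  # write the figure to disk
```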
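For the Clean and prepare data competency, pandas' `dropna` and `drop_duplicates` cover the two cases named above. The sketch also assumes missing values may be encoded as `"?"`, a placeholder several UCI datasets use, and converts it to `NaN` first so `dropna` can catch it; the helper name `clean` and the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd


def clean(df):
    """Hypothetical helper: drop rows with any null value, then drop exact duplicate rows."""
    # Assumption: "?" marks a missing value; turn it into NaN so dropna() sees it
    df = df.replace("?", np.nan)
    return df.dropna().drop_duplicates().reset_index(drop=True)


df = pd.DataFrame({  # made-up sample data
    "age": [25, 25, None, 40, 31],
    "city": ["Oslo", "Oslo", "Lima", "?", "Pune"],
})
cleaned = clean(df)
print(cleaned)  # two rows remain: (25, Oslo) and (31, Pune)
```

Dropping nulls before duplicates is a choice, not a requirement; on real data it is worth checking how many rows each step removes.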
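For the Transform categorical features competency, a two-valued text column can be mapped explicitly to 1/0 as described above, while a column with more than two categories is commonly one-hot encoded. Both columns below are hypothetical examples.

```python
import pandas as pd

df = pd.DataFrame({
    "smoker": ["True", "False", "False", "True"],  # boolean stored as text
    "color": ["red", "green", "red", "blue"],      # nominal category with 3 values
})

# Binary feature: map the text labels to 1/0 explicitly
df["smoker"] = df["smoker"].map({"True": 1, "False": 0})

# Multi-valued feature: one 0/1 column per category (color_blue, color_green, color_red)
df = pd.get_dummies(df, columns=["color"], dtype=int)
print(df)
```

If the column already holds real Python booleans rather than strings, `df["smoker"].astype(int)` gives the same 1/0 result.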
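For the Determine data ratios competency, `value_counts` gives the count of each class and, with `normalize=True`, each class's share of the dataset. The `label` column below is made-up data.

```python
import pandas as pd

df = pd.DataFrame({"label": [True, False, True, True, False, True, True, False]})

counts = df["label"].value_counts()                # number of True and False observations
ratios = df["label"].value_counts(normalize=True)  # fraction of the dataset in each class
print(counts)  # True: 5, False: 3
print(ratios)  # True: 0.625, False: 0.375
```

A strongly skewed ratio is a signal to use stratified splits or class-aware metrics rather than plain accuracy.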