- Import Necessary Libraries:
```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
```
  - `datasets`: This module provides loaders for several built-in datasets. Here we use the Iris dataset.
  - `train_test_split`: This function splits the dataset into training and testing sets.
  - `StandardScaler`: This class standardizes the features by removing the mean and scaling to unit variance.
  - `KNeighborsClassifier`: This is scikit-learn's k-nearest neighbors classifier.
  - `accuracy_score`: This function calculates the accuracy of the model's predictions.
- Load the Iris Dataset:
```python
iris = datasets.load_iris()
X = iris.data
y = iris.target
```
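  - `load_iris()`: This function loads the Iris dataset, a classic machine-learning benchmark of 150 iris flowers. Each flower is described by four measurements (sepal length, sepal width, petal length, petal width, in cm) and labeled with one of three species. `X` holds the 150 × 4 feature matrix and `y` the integer class labels (0, 1, 2).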
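To see what `load_iris()` actually returns, the short check below prints the array shape, the feature and class names, and the number of samples per class. It only uses attributes of the returned object (`data`, `target`, `feature_names`, `target_names`) plus NumPy's `bincount`. It is a quick inspection sketch, not part of the original pipeline.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target

print(X.shape)             # (150, 4): 150 samples, 4 features each
print(iris.feature_names)  # names of the four measured features
print(iris.target_names)   # species names for labels 0, 1, 2
print(np.bincount(y))      # [50 50 50]: samples per class
```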
- Split the Dataset:
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
  - `train_test_split()`: This function shuffles the data and splits it into training and testing sets. With `test_size=0.2`, 80% of the samples (120) are used for training and 20% (30) for testing. The `random_state` parameter fixes the shuffle, so the split is reproducible across runs.
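A quick way to confirm the 80/20 split is to check the sizes it produces. The sketch also shows the optional `stratify` argument of `train_test_split`, which the original code does not use. Passing `stratify=y` is an addition made here only to illustrate how to keep the three classes equally represented in the test set.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)

# Same call as in the tutorial: 20% of the 150 samples go to the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))  # 120 30

# Optional: stratify=y keeps the class proportions the same in both splits
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(np.bincount(y_test))        # class counts without stratification (may be uneven)
print(np.bincount(y_test_strat))  # class counts with stratification
```

Stratification matters more for small or imbalanced datasets, where a purely random split can leave a class under-represented in the test set.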
- Standardize the Features:
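```python
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learn mean/std from the training data, then scale it
X_test = scaler.transform(X_test)        # scale the test data with the training statistics
```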
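  - `StandardScaler()`: This object standardizes each feature to zero mean and unit variance. This matters for k-NN because it compares samples by distance, and features with larger numeric ranges would otherwise dominate.
  - `fit_transform()`: This method computes each feature's mean and standard deviation on the training data and applies the transformation to it.
  - `transform()`: This method applies the same transformation, using the training-set statistics, to the testing data. Fitting the scaler on the test data as well would leak information from the test set into preprocessing.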
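Standardization maps each feature value x to z = (x - μ) / σ, where μ and σ are that feature's mean and (population) standard deviation in the training set. The sketch below recomputes this by hand with NumPy and checks it against the fitted scaler's `mean_` and `scale_` attributes. The `manual_*` and `*_scaled` variable names are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Statistics are learned from the training split only
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)  # population std (ddof=0), matching StandardScaler

# z = (x - mean) / std, applied column by column
manual_train = (X_train - mu) / sigma
manual_test = (X_test - mu) / sigma  # test set reuses the *training* mean and std

print(np.allclose(scaler.mean_, mu), np.allclose(scaler.scale_, sigma))
print(X_train_scaled.mean(axis=0).round(6))  # approximately 0 for every feature
print(X_train_scaled.std(axis=0).round(6))   # approximately 1 for every feature
```

Note that the scaled test features will not have exactly zero mean and unit variance, because they were scaled with statistics from a different sample.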
- Create a K-Nearest Neighbors Classifier:
```python
knn_classifier = KNeighborsClassifier(n_neighbors=3)
```
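  - `KNeighborsClassifier()`: This creates a k-nearest neighbors classifier with `n_neighbors` set to 3. Each test sample is assigned the class that is most common among its 3 closest training samples (closeness is measured with Euclidean distance by default).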
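To make the k-nearest neighbors idea concrete, here is a minimal brute-force version in NumPy. It computes the Euclidean distance from each test sample to every training sample, takes the 3 closest, and returns the majority label. The helper `knn_predict` is hypothetical and written only for illustration. scikit-learn's implementation may use tree-based search and can order equidistant neighbors differently, so the sketch measures agreement with scikit-learn rather than assuming the two are identical.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(X_train, y_train, X_query, k=3):
    """Hypothetical brute-force k-NN: majority vote among the k closest training points."""
    # Pairwise Euclidean distances, shape (n_query, n_train)
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    # Indices of the k smallest distances in each row
    nearest = np.argsort(dists, axis=1, kind="stable")[:, :k]
    # Majority vote; bincount(...).argmax() breaks ties toward the smaller label
    return np.array([np.bincount(y_train[idx]).argmax() for idx in nearest])


X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

manual_pred = knn_predict(X_train, y_train, X_test, k=3)
sk_pred = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(X_test)
print("agreement with scikit-learn:", np.mean(manual_pred == sk_pred))
```

The brute-force version computes all pairwise distances, which costs O(n_query × n_train) memory and time. That is fine for Iris but is the reason libraries offer tree-based search for larger datasets.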
- Train the Model:
```python
knn_classifier.fit(X_train, y_train)
```
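  - `fit()`: This method trains the model on the training data. For k-NN, "training" mostly means storing the training samples (and, depending on the `algorithm` setting, building a search structure such as a k-d tree). Most of the computation happens at prediction time.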
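Because k-NN is a lazy learner, `fit()` mostly records the training data. The sketch below inspects a few fitted attributes (`n_samples_fit_`, `classes_`, `n_features_in_`) and uses `kneighbors()` to confirm that a stored training sample is its own nearest neighbor, at distance 0. These attributes exist in recent scikit-learn releases; older versions may lack some of them.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)

knn_classifier = KNeighborsClassifier(n_neighbors=3)
knn_classifier.fit(X_train, y_train)

# Attributes set by fit()
print(knn_classifier.n_samples_fit_)  # number of stored training samples
print(knn_classifier.classes_)        # class labels seen during fit
print(knn_classifier.n_features_in_)  # number of features per sample

# Query the stored data: the first training sample finds itself at distance 0
dist, idx = knn_classifier.kneighbors(X_train[:1], n_neighbors=1)
print(dist[0, 0])
```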
- Make Predictions:
```python
y_pred = knn_classifier.predict(X_test)
```
  - `predict()`: This method predicts a class label for each sample in the test data.
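Besides hard labels, `predict_proba()` returns the fraction of the k neighbors that vote for each class, which helps explain individual predictions. With `n_neighbors=3` and the default uniform weights, every probability is a multiple of 1/3. The sketch reuses the tutorial's split and scaling.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

knn_classifier = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
y_pred = knn_classifier.predict(X_test)

# One row per test sample: share of the 3 neighbors voting for each class
proba = knn_classifier.predict_proba(X_test)
print(proba[:5])

# With uniform weights, the predicted label is the class with the most votes
print(np.array_equal(knn_classifier.classes_[proba.argmax(axis=1)], y_pred))
```

A row such as `[0, 0.667, 0.333]` signals a less certain prediction than `[0, 1, 0]`, which is useful when inspecting misclassified samples.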
- Evaluate the Model:
```python
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
```
  - `accuracy_score()`: This function calculates the accuracy of the model, i.e. the fraction of test samples whose predicted label (`y_pred`) matches the actual label (`y_test`).
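Accuracy is the number of correct predictions divided by the number of test samples. The sketch below computes it by hand and adds a confusion matrix (`sklearn.metrics.confusion_matrix`, which the original code does not use) to show which classes, if any, get confused with each other.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

y_pred = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(X_test)

# Accuracy = correct predictions / total test samples
manual_accuracy = np.mean(y_pred == y_test)
print(np.isclose(manual_accuracy, accuracy_score(y_test, y_pred)))

# Rows are true labels, columns are predicted labels
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(np.trace(cm) / cm.sum())  # correct predictions (diagonal) / total: the same accuracy
```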
This code is a basic example to get you started with machine learning using scikit-learn. To apply machine learning successfully, you’ll often need to customize the code based on your specific problem, dataset, and the algorithm you choose. It’s also important to understand the concepts behind the code, such as data preprocessing, model selection, training, and evaluation.
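As one example of such customization, the number of neighbors is usually chosen by cross-validation rather than fixed at 3. The sketch below is one possible setup, not the method used above. It wraps `StandardScaler` and `KNeighborsClassifier` in a `Pipeline` so the scaler is refit on each cross-validation training fold, and it uses `GridSearchCV` to try several values of `n_neighbors`. The candidate list and `cv=5` are illustrative assumptions.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling inside the pipeline is refit on each CV training fold,
# so the validation fold never influences the scaler's mean and std
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Candidate values of k (an illustrative choice)
param_grid = {"knn__n_neighbors": [1, 3, 5, 7, 9, 11]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)  # evaluates the refit best pipeline
print("best k:", search.best_params_["knn__n_neighbors"])
print("mean CV accuracy:", search.best_score_)
print("test accuracy:", test_accuracy)
```

Keeping the test set out of the search means the final test accuracy is still an honest estimate for the chosen k.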