The Iris Dataset

2022-02-28 1 min read

Original page: https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html

Iris dataset

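This dataset contains 150 observations from three iris species (Setosa, Versicolor, Virginica). Each sample has four features: the length and width of the flower's sepals and petals. The data is therefore stored as a 150x4 numpy.ndarray.

Each row is one sample. The columns are, in order, sepal length, sepal width, petal length, and petal width.

The plot below uses only the first two features. For more details on this dataset, follow the link on the original page given above.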
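As a quick check of the layout described above, the following minimal sketch loads the dataset and prints its shape, column names, and class names. It only uses attributes of the object returned by `datasets.load_iris()`; the variable name `class_counts` is introduced here for illustration and is not part of the original post.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# iris.data is a NumPy array: one row per flower, one column per feature.
print(type(iris.data).__name__, iris.data.shape)

# Column order of iris.data.
print(iris.feature_names)

# The integer labels in iris.target index into these species names.
print(iris.target_names)

# Number of samples per class label (hypothetical helper variable).
class_counts = np.bincount(iris.target)
print(class_counts)
```

Slicing such as `iris.data[:, :2]` then selects all rows and only the first two columns, that is, sepal length and sepal width.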

# Code source: Gaël Varoquaux
# Modified for documentation by Jaques Grobler
# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import datasets
from sklearn.decomposition import PCA

# Load the dataset.
iris = datasets.load_iris()
X = iris.data[:, :2]  # use only the first two features (sepal length and width)
y = iris.target

x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5

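plt.figure(2, figsize=(8, 6))       # create (or select) figure number 2
plt.clf()       # clear the current figure

# Scatter plot of the samples, drawn one class at a time so each gets a legend entry
for target in np.unique(y):   # target takes the values 0, 1, 2
    # cmap is dropped here: it has no effect without c= and only triggers a warning
    plt.scatter(X[:, 0][y==target], X[:, 1][y==target], edgecolor="k", label=iris.target_names[target])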
# plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Set1, edgecolor="k")
plt.xlabel("Sepal length")
plt.ylabel("Sepal width")
plt.legend(loc='best')

plt.xlim(x_min, x_max)    # x-axis range: x_min to x_max
plt.ylim(y_min, y_max)    # y-axis range: y_min to y_max
plt.xticks(())            # remove x-axis ticks
plt.yticks(())            # remove y-axis ticks

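# To get a better understanding of how the dimensions interact,
# plot the first three PCA (Principal Component Analysis) dimensions
fig = plt.figure(1, figsize=(8, 6))     # create (or select) figure number 1
# Recent matplotlib releases no longer attach Axes3D(fig, ...) to the figure automatically,
# so the 3D axes are created through add_subplot (elevation -150, azimuth 110, in degrees)
ax = fig.add_subplot(111, projection="3d", elev=-150, azim=110)
X_reduced = PCA(n_components=3).fit_transform(iris.data)    # project the iris data onto its first 3 principal components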
for target in np.unique(y):
    ax.scatter(
        X_reduced[:, 0][y==target],
        X_reduced[:, 1][y==target],
        X_reduced[:, 2][y==target],
        label=iris.target_names[target],
        # c=y and cmap=plt.cm.Set1 are not used: each class is drawn separately in a default color
        edgecolor="k",
        s=40,
    )
ax.set_title("First three PCA directions")
ax.set_xlabel("1st eigenvector")
ax.xaxis.set_ticklabels([])     # hide x-axis tick labels (w_xaxis was removed in recent matplotlib)
ax.set_ylabel("2nd eigenvector")
ax.yaxis.set_ticklabels([])     # hide y-axis tick labels
ax.set_zlabel("3rd eigenvector")
ax.zaxis.set_ticklabels([])     # hide z-axis tick labels

plt.show()      # display the figures
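
The 3D plot above shows the data after a PCA projection onto three components. As a rough companion sketch (not part of the original example), the share of variance that each component captures can be read from the fitted model's `explained_variance_ratio_` attribute. The variable names `pca` and `ratios` are chosen here for illustration.

```python
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()

# Fit PCA on all four features and keep the first three components,
# matching the n_components=3 setting used for the 3D plot.
pca = PCA(n_components=3).fit(iris.data)
X_reduced = pca.transform(iris.data)
print(X_reduced.shape)  # (150, 3)

# Fraction of the total variance captured by each component,
# sorted from the largest to the smallest.
ratios = pca.explained_variance_ratio_
for i, ratio in enumerate(ratios, start=1):
    print(f"component {i}: {ratio:.3f}")
print(f"total: {ratios.sum():.3f}")
```

If the first component dominates, as it appears to for this dataset, most of the spread visible in the 3D plot lies along the "1st eigenvector" axis.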
