
Python : Keras : Predicting Iris Species : Example, Implementation, Method

얇은생각 2020. 2. 6. 07:30

https://archive.ics.uci.edu/ml/datasets/Iris

 

UCI Machine Learning Repository: Iris Data Set
Multivariate, 150 instances, 4 real-valued attributes, classification task, no missing values. Donated 1988-07-01.

 

We will do deep learning with the popular iris dataset. The data can be downloaded from the site above.

 

 

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from keras.layers import Dense
from keras.models import Sequential
from keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder

 

First, import the required libraries.

 

 

df = pd.read_csv('iris.data', names=['sl', 'sw', 'pl', 'pw', 'class'], index_col=False)

df.head()

 

Read the data and set the column names. Since iris.data ships without a header row, the names are supplied explicitly.
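If the iris.data file has not been downloaded, the same dataset also ships with scikit-learn; this sketch rebuilds an equivalent DataFrame with the same column names as above. The 'Iris-' prefix is added so the labels match those in the raw file.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=['sl', 'sw', 'pl', 'pw'])
# Map the integer targets back to the species names used in iris.data
df['class'] = ['Iris-' + iris.target_names[t] for t in iris.target]
print(df.shape)   # (150, 5)
print(df['class'].unique())
```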

 

 

df.count()

"""
sl       150
sw       150
pl       150
pw       150
class    150
dtype: int64
"""

 

We confirmed that there are 150 rows of data with no missing values in any column.

 

 

df.describe()

 

 

df.describe() gives the summary statistics above.

 

 

Y = LabelEncoder().fit_transform(df['class'])
Y = to_categorical(Y)
print(Y)

 

The output is the flower species. We preprocess the labels so the network can train on them: LabelEncoder maps the species names to integers, and to_categorical turns those integers into one-hot vectors.
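A minimal sketch of what these two steps do, on a toy label list. For the one-hot step it uses NumPy's `eye` trick, which behaves the same as Keras `to_categorical` for integer codes; note that LabelEncoder assigns codes in alphabetical order of the class names.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

labels = ['Iris-setosa', 'Iris-virginica', 'Iris-setosa', 'Iris-versicolor']
encoded = LabelEncoder().fit_transform(labels)   # strings -> integer codes
print(encoded)                                   # [0 2 0 1]
one_hot = np.eye(3)[encoded]                     # integer codes -> one-hot rows
print(one_hot)
```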

 

 

X = df.drop('class', axis=1)

X_train = X[:-5]
X_test = X[-5:]

Y_train = Y[:-5]
Y_test = Y[-5:]

 

The remaining columns serve as the input data. We hold out the last 5 rows as a test set and use the rest for training. Note that iris.data is sorted by species, so these 5 test rows all belong to the same species; a shuffled split would give a more representative test set.
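Because iris.data is ordered by species, slicing off the last 5 rows yields a single-species test set. A shuffled, stratified split avoids that; this sketch uses scikit-learn's train_test_split on toy stand-in arrays (in the post, X comes from df.drop('class', axis=1) and the labels from the encoder).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins: 150 rows of 4 features, labels sorted by class as in iris.data
X = np.arange(150 * 4).reshape(150, 4).astype(float)
y = np.repeat([0, 1, 2], 50)

# stratify=y keeps all three species represented in the 5-row test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=5, stratify=y, random_state=0)
print(X_train.shape, X_test.shape)   # (145, 4) (5, 4)
```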

 

 

model = Sequential()
model.add(Dense(256, input_shape=(4,), activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(3, activation='softmax'))
model.summary()
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])


hist = model.fit(X_train, Y_train, epochs=20, validation_split=0.1)

"""
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 256)               1280      
_________________________________________________________________
dense_2 (Dense)              (None, 128)               32896     
_________________________________________________________________
dense_3 (Dense)              (None, 32)                4128      
_________________________________________________________________
dense_4 (Dense)              (None, 3)                 99        
=================================================================
Total params: 38,403
Trainable params: 38,403
Non-trainable params: 0
_________________________________________________________________
Train on 130 samples, validate on 15 samples
Epoch 1/20
130/130 [==============================] - 2s 14ms/step - loss: 1.0559 - acc: 0.5923 - val_loss: 0.9480 - val_acc: 0.5333
Epoch 2/20
130/130 [==============================] - 0s 185us/step - loss: 0.8899 - acc: 0.8077 - val_loss: 0.8882 - val_acc: 0.4667
Epoch 3/20
130/130 [==============================] - 0s 192us/step - loss: 0.7484 - acc: 0.8077 - val_loss: 1.1686 - val_acc: 0.0000e+00
Epoch 4/20
130/130 [==============================] - 0s 192us/step - loss: 0.6301 - acc: 0.8308 - val_loss: 0.6557 - val_acc: 1.0000
Epoch 5/20
130/130 [==============================] - 0s 185us/step - loss: 0.5402 - acc: 0.9462 - val_loss: 0.5531 - val_acc: 1.0000
Epoch 6/20
130/130 [==============================] - 0s 185us/step - loss: 0.4642 - acc: 0.9308 - val_loss: 0.6778 - val_acc: 0.6667
Epoch 7/20
130/130 [==============================] - 0s 185us/step - loss: 0.4023 - acc: 0.8538 - val_loss: 1.0861 - val_acc: 0.0000e+00
Epoch 8/20
130/130 [==============================] - 0s 177us/step - loss: 0.3827 - acc: 0.7846 - val_loss: 0.6464 - val_acc: 0.6667
Epoch 9/20
130/130 [==============================] - 0s 200us/step - loss: 0.3345 - acc: 0.9385 - val_loss: 0.3399 - val_acc: 1.0000
Epoch 10/20
130/130 [==============================] - 0s 223us/step - loss: 0.2916 - acc: 0.9462 - val_loss: 0.7931 - val_acc: 0.4667
Epoch 11/20
130/130 [==============================] - 0s 192us/step - loss: 0.2631 - acc: 0.9308 - val_loss: 0.4372 - val_acc: 0.8667
Epoch 12/20
130/130 [==============================] - 0s 192us/step - loss: 0.2320 - acc: 0.9769 - val_loss: 0.4872 - val_acc: 0.7333
Epoch 13/20
130/130 [==============================] - 0s 185us/step - loss: 0.2041 - acc: 0.9538 - val_loss: 0.6330 - val_acc: 0.6667
Epoch 14/20
130/130 [==============================] - 0s 185us/step - loss: 0.2021 - acc: 0.9385 - val_loss: 0.3467 - val_acc: 0.8667
Epoch 15/20
130/130 [==============================] - 0s 192us/step - loss: 0.1703 - acc: 0.9692 - val_loss: 0.3072 - val_acc: 1.0000
Epoch 16/20
130/130 [==============================] - 0s 200us/step - loss: 0.1472 - acc: 0.9769 - val_loss: 0.5854 - val_acc: 0.6667
Epoch 17/20
130/130 [==============================] - 0s 192us/step - loss: 0.1510 - acc: 0.9462 - val_loss: 0.3326 - val_acc: 0.8000
Epoch 18/20
130/130 [==============================] - 0s 192us/step - loss: 0.1297 - acc: 0.9538 - val_loss: 0.4239 - val_acc: 0.7333
Epoch 19/20
130/130 [==============================] - 0s 177us/step - loss: 0.1229 - acc: 0.9615 - val_loss: 0.3638 - val_acc: 0.8000
Epoch 20/20
130/130 [==============================] - 0s 192us/step - loss: 0.1161 - acc: 0.9615 - val_loss: 0.4100 - val_acc: 0.7333
"""

 

Now we train. A simple fully connected model is used: three hidden ReLU layers and a softmax output over the three classes, trained with categorical cross-entropy and the Adam optimizer.
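The 38,403 parameters reported by model.summary() can be checked by hand: a Dense layer with n_in inputs and n_out units holds n_in × n_out weights plus n_out biases.

```python
# (inputs, units) for each Dense layer in the model above
layer_sizes = [(4, 256), (256, 128), (128, 32), (32, 3)]
params = [(n_in + 1) * n_out for n_in, n_out in layer_sizes]
print(params)        # [1280, 32896, 4128, 99]
print(sum(params))   # 38403
```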

 

 

plt.figure(figsize=(10,10))
plt.subplot(1,2,1)

plt.plot(hist.history['loss'], color='r')
plt.plot(hist.history['val_loss'], color='b')
plt.title('loss')

plt.subplot(1,2,2)
plt.plot(hist.history['acc'], color='r')
plt.plot(hist.history['val_acc'], color='b')
plt.title('acc')

 

 

The training curves are shown above. Accuracy does not increase monotonically with more epochs, but the steadily falling loss shows that training went reasonably well.

 

 

score = model.evaluate(X_test, Y_test)
pred = model.predict(X_test)

print(pred)
print(Y_test)

"""
5/5 [==============================] - 0s 4ms/step
[0.3501374423503876, 1.0]
[[7.8439794e-04 2.9193324e-01 7.0728236e-01]
 [1.0062082e-03 2.6554585e-01 7.3344797e-01]
 [1.2481078e-03 4.1020229e-01 5.8854961e-01]
 [4.8029551e-04 1.5584722e-01 8.4367245e-01]
 [1.4025040e-03 3.2443118e-01 6.7416626e-01]]
[[0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]]
"""

 

Finally, we validated the model on the test data. All 5 test samples were classified correctly, giving an accuracy of 1.0. In other words, predicting a flower's species from its measurements can produce meaningful results, and it is clear that good-quality data is essential for this to work. Note, though, that all 5 held-out rows belong to the same species (every Y_test row is [0. 0. 1.]), so this accuracy is only a rough check.
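Each row of pred is a softmax distribution over the three species, so argmax recovers the predicted class, which can be compared against the argmax of the one-hot Y_test. This sketch reuses the prediction values printed above.

```python
import numpy as np

# Softmax outputs copied from the predictions printed above
pred = np.array([[7.8439794e-04, 2.9193324e-01, 7.0728236e-01],
                 [1.0062082e-03, 2.6554585e-01, 7.3344797e-01],
                 [1.2481078e-03, 4.1020229e-01, 5.8854961e-01],
                 [4.8029551e-04, 1.5584722e-01, 8.4367245e-01],
                 [1.4025040e-03, 3.2443118e-01, 6.7416626e-01]])
y_true = np.array([[0., 0., 1.]] * 5)

pred_classes = pred.argmax(axis=1)    # most probable class per row
true_classes = y_true.argmax(axis=1)
accuracy = (pred_classes == true_classes).mean()
print(pred_classes, accuracy)         # [2 2 2 2 2] 1.0
```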
