
Python : Keras : Predicting Concrete Compressive Strength : Example, Method, Concepts

얇은생각 2020. 2. 7. 07:30

https://archive.ics.uci.edu/ml/datasets/concrete+compressive+strength

 


 

In this post, we'll use Keras to work through an example that predicts the compressive strength of concrete. Download the dataset from the page above.
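If you'd rather fetch the file from code, a minimal sketch like the following should work (the direct file path under archive.ics.uci.edu is an assumption based on the repository's usual layout; verify it on the dataset page if the download fails):

import urllib.request

# Assumed direct URL to the Excel file inside the UCI repository.
DATA_URL = ('https://archive.ics.uci.edu/ml/machine-learning-databases/'
            'concrete/compressive/Concrete_Data.xls')

urllib.request.urlretrieve(DATA_URL, 'Concrete_Data.xls')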

 

 

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from keras.layers import Dense
from keras.models import Sequential
from sklearn.preprocessing import MinMaxScaler

 

Import the required libraries.

 

 

df = pd.read_excel('Concrete_Data.xls')  # reading .xls may require the xlrd package
df.head()

 

   Cement  Blast Furnace Slag  Fly Ash  Water  Superplasticizer  Coarse Aggregate  Fine Aggregate  Age (day)  Strength (MPa)
0   540.0                 0.0      0.0  162.0               2.5            1040.0           676.0         28       79.986111
1   540.0                 0.0      0.0  162.0               2.5            1055.0           676.0         28       61.887366
2   332.5               142.5      0.0  228.0               0.0             932.0           594.0        270       40.269535
3   332.5               142.5      0.0  228.0               0.0             932.0           594.0        365       41.052780
4   198.6               132.4      0.0  192.0               0.0             978.4           825.5        360       44.296075

(each ingredient column is measured in kg per m^3 of mixture)

 

The loaded data looks like the above: each row is one concrete mix, and the last column is the measured compressive strength.
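Before going further, a quick sanity check on the data is worthwhile; a small sketch:

print(df.shape)           # expect (1030, 9): eight mix/age features plus the strength target
print(df.isnull().sum())  # this UCI file should contain no missing values
df.describe()             # the feature ranges differ wildly, which is why we scale them later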

 

 

df.columns

"""
Index(['Cement (component 1)(kg in a m^3 mixture)',
       'Blast Furnace Slag (component 2)(kg in a m^3 mixture)',
       'Fly Ash (component 3)(kg in a m^3 mixture)',
       'Water  (component 4)(kg in a m^3 mixture)',
       'Superplasticizer (component 5)(kg in a m^3 mixture)',
       'Coarse Aggregate  (component 6)(kg in a m^3 mixture)',
       'Fine Aggregate (component 7)(kg in a m^3 mixture)', 'Age (day)',
       'Concrete compressive strength(MPa, megapascals) '],
      dtype='object')
"""

 

The column names, shown above, embed the units, which makes them long and awkward to work with.

 

 

df.rename(columns={
    'Cement (component 1)(kg in a m^3 mixture)': 'cement',
    'Blast Furnace Slag (component 2)(kg in a m^3 mixture)': 'blast',
    'Fly Ash (component 3)(kg in a m^3 mixture)': 'fly',
    'Water  (component 4)(kg in a m^3 mixture)': 'water',
    'Superplasticizer (component 5)(kg in a m^3 mixture)': 'super',
    'Coarse Aggregate  (component 6)(kg in a m^3 mixture)': 'coarse',
    'Fine Aggregate (component 7)(kg in a m^3 mixture)': 'fine',
    'Age (day)': 'age',
    'Concrete compressive strength(MPa, megapascals) ': 'strength'}, inplace=True)

 

To make the rest of the code easier to read and write, we rename these verbose columns to short names.

 

 

df.head()

 

   cement  blast    fly  water  super  coarse   fine  age   strength
0   540.0    0.0    0.0  162.0    2.5  1040.0  676.0   28  79.986111
1   540.0    0.0    0.0  162.0    2.5  1055.0  676.0   28  61.887366
2   332.5  142.5    0.0  228.0    0.0   932.0  594.0  270  40.269535
3   332.5  142.5    0.0  228.0    0.0   932.0  594.0  365  41.052780
4   198.6  132.4    0.0  192.0    0.0   978.4  825.5  360  44.296075

 

Check that the renaming worked as intended.

 

 

X = df.drop(['strength'], axis=1)  # 8 input features
Y = df['strength']                 # regression target in MPa

scaler = MinMaxScaler()
X = scaler.fit_transform(X)        # rescale every feature into [0, 1]
X.shape

"""
(1030, 8)
"""

 

Next, every column except strength is used as the input. Because the features sit on very different scales, we min-max scale them so that training converges smoothly.
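For reference, MinMaxScaler's default behavior is equivalent to the following per-column computation (a sketch in plain numpy, assuming X is still the raw feature array):

def min_max_scale(X):
    # Rescale each column into [0, 1]: (x - min) / (max - min).
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    return (X - col_min) / (col_max - col_min)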

 

 

sns.pairplot(df)

 

 

The pairplot shows the pairwise relationships between columns along with each column's distribution. Features with reasonably spread-out distributions generally make training easier.
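If the full pairplot is too busy, a correlation heatmap condenses the linear relationships into a single panel (a sketch; this plot is not part of the original walkthrough):

plt.figure(figsize=(8, 6))
sns.heatmap(df.corr(), annot=True, fmt='.2f', cmap='coolwarm')
plt.title('feature correlations')
plt.show()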

 

 

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1)
X_train.shape

"""
(927, 8)
"""

 

We split the data with train_test_split, which shuffles it and divides it into training and test sets; test_size=0.1 reserves 10% of the rows (103 of 1030) for final evaluation.
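One caveat: above, the scaler was fit on the entire dataset before splitting, which lets information about the test rows leak into the preprocessing. A leakage-free variant would fit the scaler on the training portion only, along these lines:

X_raw = df.drop(['strength'], axis=1)
X_train, X_test, Y_train, Y_test = train_test_split(X_raw, Y, test_size=0.1)

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)  # learn min/max from the training data only
X_test = scaler.transform(X_test)        # apply the same transform to the test data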

 

 

model = Sequential()
model.add(Dense(256, input_shape=(8,), activation='relu'))  # 8 input features
model.add(Dense(128, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='relu'))  # strength is non-negative, so ReLU works as an output here

model.compile(loss='mse', optimizer='adam')  # mean squared error: this is a regression problem

model.summary()

hist = model.fit(X_train, Y_train, epochs=100, validation_split=0.1)

"""
Layer (type)                 Output Shape              Param #   
=================================================================
dense_5 (Dense)              (None, 256)               2304      
_________________________________________________________________
dense_6 (Dense)              (None, 128)               32896     
_________________________________________________________________
dense_7 (Dense)              (None, 32)                4128      
_________________________________________________________________
dense_8 (Dense)              (None, 1)                 33        
=================================================================
Total params: 39,361
Trainable params: 39,361
Non-trainable params: 0
_________________________________________________________________
Train on 834 samples, validate on 93 samples
Epoch 1/100
834/834 [==============================] - 0s 580us/step - loss: 1429.1455 - val_loss: 1002.4135
Epoch 2/100
834/834 [==============================] - 0s 115us/step - loss: 528.6366 - val_loss: 303.4880
Epoch 3/100
834/834 [==============================] - 0s 119us/step - loss: 229.3984 - val_loss: 203.1050
Epoch 4/100
834/834 [==============================] - 0s 119us/step - loss: 187.8111 - val_loss: 176.7382
Epoch 5/100
834/834 [==============================] - 0s 116us/step - loss: 162.9026 - val_loss: 156.8786
Epoch 6/100
834/834 [==============================] - 0s 114us/step - loss: 145.4408 - val_loss: 138.2341
Epoch 7/100
834/834 [==============================] - 0s 122us/step - loss: 131.0700 - val_loss: 128.6725
Epoch 8/100
834/834 [==============================] - 0s 120us/step - loss: 122.2626 - val_loss: 121.5253
Epoch 9/100
834/834 [==============================] - 0s 117us/step - loss: 117.8350 - val_loss: 119.4496
Epoch 10/100
834/834 [==============================] - 0s 126us/step - loss: 113.6401 - val_loss: 114.5831
Epoch 11/100
834/834 [==============================] - 0s 124us/step - loss: 111.4505 - val_loss: 112.7722
Epoch 12/100
834/834 [==============================] - 0s 121us/step - loss: 108.4970 - val_loss: 109.1458
Epoch 13/100
834/834 [==============================] - 0s 122us/step - loss: 106.8510 - val_loss: 108.2624
Epoch 14/100
834/834 [==============================] - 0s 115us/step - loss: 102.4254 - val_loss: 105.4226
Epoch 15/100
834/834 [==============================] - 0s 115us/step - loss: 102.5407 - val_loss: 104.1681
Epoch 16/100
834/834 [==============================] - 0s 112us/step - loss: 98.5970 - val_loss: 99.1225
Epoch 17/100
834/834 [==============================] - 0s 116us/step - loss: 95.1558 - val_loss: 99.9388
Epoch 18/100
834/834 [==============================] - 0s 113us/step - loss: 92.6347 - val_loss: 96.2228
Epoch 19/100
834/834 [==============================] - 0s 126us/step - loss: 88.5141 - val_loss: 98.0514
Epoch 20/100
834/834 [==============================] - 0s 122us/step - loss: 85.5895 - val_loss: 85.3493
Epoch 21/100
834/834 [==============================] - 0s 115us/step - loss: 81.2445 - val_loss: 83.1137
Epoch 22/100
834/834 [==============================] - 0s 113us/step - loss: 77.7913 - val_loss: 77.3656
Epoch 23/100
834/834 [==============================] - 0s 114us/step - loss: 77.4275 - val_loss: 77.2062
Epoch 24/100
834/834 [==============================] - 0s 115us/step - loss: 74.6809 - val_loss: 71.1784
Epoch 25/100
834/834 [==============================] - 0s 114us/step - loss: 70.9468 - val_loss: 66.5246
Epoch 26/100
834/834 [==============================] - 0s 111us/step - loss: 73.2065 - val_loss: 65.0481
Epoch 27/100
834/834 [==============================] - 0s 116us/step - loss: 63.8469 - val_loss: 59.5192
Epoch 28/100
834/834 [==============================] - 0s 114us/step - loss: 68.9593 - val_loss: 71.8911
Epoch 29/100
834/834 [==============================] - 0s 115us/step - loss: 63.5867 - val_loss: 55.2200
Epoch 30/100
834/834 [==============================] - 0s 113us/step - loss: 59.2258 - val_loss: 58.6557
Epoch 31/100
834/834 [==============================] - 0s 114us/step - loss: 57.8298 - val_loss: 51.3701
Epoch 32/100
834/834 [==============================] - 0s 115us/step - loss: 56.0855 - val_loss: 49.2646
Epoch 33/100
834/834 [==============================] - 0s 119us/step - loss: 54.9721 - val_loss: 52.2159
Epoch 34/100
834/834 [==============================] - 0s 126us/step - loss: 52.9757 - val_loss: 50.7561
Epoch 35/100
834/834 [==============================] - 0s 119us/step - loss: 52.1276 - val_loss: 44.1845
Epoch 36/100
834/834 [==============================] - 0s 115us/step - loss: 51.8647 - val_loss: 44.7191
Epoch 37/100
834/834 [==============================] - 0s 113us/step - loss: 48.8898 - val_loss: 46.6701
Epoch 38/100
834/834 [==============================] - 0s 119us/step - loss: 52.6059 - val_loss: 40.7133
Epoch 39/100
834/834 [==============================] - 0s 128us/step - loss: 48.0287 - val_loss: 37.4926
Epoch 40/100
834/834 [==============================] - 0s 125us/step - loss: 46.6562 - val_loss: 40.3896
Epoch 41/100
834/834 [==============================] - 0s 114us/step - loss: 45.7802 - val_loss: 42.5468
Epoch 42/100
834/834 [==============================] - 0s 112us/step - loss: 48.0991 - val_loss: 39.0044
Epoch 43/100
834/834 [==============================] - 0s 115us/step - loss: 44.6652 - val_loss: 48.1392
Epoch 44/100
834/834 [==============================] - 0s 114us/step - loss: 44.9251 - val_loss: 35.2333
Epoch 45/100
834/834 [==============================] - 0s 114us/step - loss: 42.9037 - val_loss: 37.9893
Epoch 46/100
834/834 [==============================] - 0s 116us/step - loss: 44.2775 - val_loss: 50.2931
Epoch 47/100
834/834 [==============================] - 0s 113us/step - loss: 44.1532 - val_loss: 34.7995
Epoch 48/100
834/834 [==============================] - 0s 114us/step - loss: 40.0724 - val_loss: 36.7891
Epoch 49/100
834/834 [==============================] - 0s 114us/step - loss: 41.4432 - val_loss: 37.5936
Epoch 50/100
834/834 [==============================] - 0s 121us/step - loss: 42.5412 - val_loss: 34.2832
Epoch 51/100
834/834 [==============================] - 0s 115us/step - loss: 38.5718 - val_loss: 33.4898
Epoch 52/100
834/834 [==============================] - 0s 114us/step - loss: 38.5902 - val_loss: 33.4109
Epoch 53/100
834/834 [==============================] - 0s 114us/step - loss: 38.4109 - val_loss: 36.4448
Epoch 54/100
834/834 [==============================] - 0s 116us/step - loss: 39.3725 - val_loss: 35.7690
Epoch 55/100
834/834 [==============================] - 0s 121us/step - loss: 38.8388 - val_loss: 33.4616
Epoch 56/100
834/834 [==============================] - 0s 115us/step - loss: 37.8989 - val_loss: 36.6492
Epoch 57/100
834/834 [==============================] - 0s 115us/step - loss: 37.0896 - val_loss: 45.6021
Epoch 58/100
834/834 [==============================] - 0s 116us/step - loss: 38.5968 - val_loss: 33.0854
Epoch 59/100
834/834 [==============================] - 0s 115us/step - loss: 36.5730 - val_loss: 35.5852
Epoch 60/100
834/834 [==============================] - 0s 139us/step - loss: 39.7343 - val_loss: 32.3643
Epoch 61/100
834/834 [==============================] - 0s 120us/step - loss: 36.0273 - val_loss: 33.9282
Epoch 62/100
834/834 [==============================] - 0s 113us/step - loss: 36.3586 - val_loss: 34.9153
Epoch 63/100
834/834 [==============================] - 0s 113us/step - loss: 37.7078 - val_loss: 31.4244
Epoch 64/100
834/834 [==============================] - 0s 116us/step - loss: 34.0002 - val_loss: 33.8998
Epoch 65/100
834/834 [==============================] - 0s 114us/step - loss: 34.8807 - val_loss: 36.9429
Epoch 66/100
834/834 [==============================] - 0s 115us/step - loss: 34.2639 - val_loss: 41.7181
Epoch 67/100
834/834 [==============================] - 0s 115us/step - loss: 36.1606 - val_loss: 33.2617
Epoch 68/100
834/834 [==============================] - 0s 126us/step - loss: 34.2432 - val_loss: 36.9399
Epoch 69/100
834/834 [==============================] - 0s 116us/step - loss: 33.2924 - val_loss: 34.7811
Epoch 70/100
834/834 [==============================] - 0s 119us/step - loss: 33.9718 - val_loss: 42.2463
Epoch 71/100
834/834 [==============================] - 0s 114us/step - loss: 33.3999 - val_loss: 32.5764
Epoch 72/100
834/834 [==============================] - 0s 118us/step - loss: 36.4881 - val_loss: 31.7392
Epoch 73/100
834/834 [==============================] - 0s 115us/step - loss: 35.3655 - val_loss: 33.9083
Epoch 74/100
834/834 [==============================] - 0s 119us/step - loss: 33.2319 - val_loss: 30.5091
Epoch 75/100
834/834 [==============================] - 0s 126us/step - loss: 34.1242 - val_loss: 29.8430
Epoch 76/100
834/834 [==============================] - 0s 120us/step - loss: 32.0600 - val_loss: 31.4169
Epoch 77/100
834/834 [==============================] - 0s 118us/step - loss: 32.1489 - val_loss: 32.2625
Epoch 78/100
834/834 [==============================] - 0s 120us/step - loss: 31.6808 - val_loss: 32.4065
Epoch 79/100
834/834 [==============================] - 0s 114us/step - loss: 31.1104 - val_loss: 32.3849
Epoch 80/100
834/834 [==============================] - 0s 132us/step - loss: 30.8013 - val_loss: 31.2789
Epoch 81/100
834/834 [==============================] - 0s 114us/step - loss: 31.1945 - val_loss: 30.9757
Epoch 82/100
834/834 [==============================] - 0s 113us/step - loss: 30.8913 - val_loss: 31.5188
Epoch 83/100
834/834 [==============================] - 0s 114us/step - loss: 30.7836 - val_loss: 31.5137
Epoch 84/100
834/834 [==============================] - 0s 120us/step - loss: 31.6711 - val_loss: 29.7293
Epoch 85/100
834/834 [==============================] - 0s 116us/step - loss: 30.1709 - val_loss: 46.2979
Epoch 86/100
834/834 [==============================] - 0s 125us/step - loss: 40.2524 - val_loss: 41.9256
Epoch 87/100
834/834 [==============================] - 0s 122us/step - loss: 33.3113 - val_loss: 35.6902
Epoch 88/100
834/834 [==============================] - 0s 117us/step - loss: 31.0072 - val_loss: 30.4290
Epoch 89/100
834/834 [==============================] - 0s 114us/step - loss: 29.0793 - val_loss: 30.0556
Epoch 90/100
834/834 [==============================] - 0s 119us/step - loss: 29.6565 - val_loss: 31.6022
Epoch 91/100
834/834 [==============================] - 0s 122us/step - loss: 30.1623 - val_loss: 33.7353
Epoch 92/100
834/834 [==============================] - 0s 115us/step - loss: 29.2801 - val_loss: 29.8321
Epoch 93/100
834/834 [==============================] - 0s 114us/step - loss: 28.3933 - val_loss: 29.5040
Epoch 94/100
834/834 [==============================] - 0s 114us/step - loss: 27.8342 - val_loss: 28.7708
Epoch 95/100
834/834 [==============================] - 0s 115us/step - loss: 29.4938 - val_loss: 28.7717
Epoch 96/100
834/834 [==============================] - 0s 115us/step - loss: 28.5992 - val_loss: 29.8672
Epoch 97/100
834/834 [==============================] - 0s 118us/step - loss: 28.1570 - val_loss: 30.7702
Epoch 98/100
834/834 [==============================] - 0s 116us/step - loss: 34.0191 - val_loss: 30.1975
Epoch 99/100
834/834 [==============================] - 0s 115us/step - loss: 28.6245 - val_loss: 29.6248
Epoch 100/100
834/834 [==============================] - 0s 128us/step - loss: 27.7708 - val_loss: 30.6792
"""

 

Now we build this simple fully connected network and train it on the data for 100 epochs, holding out 10% of the training set for validation.
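Since the validation loss in the log flattens out and occasionally spikes, one option is to let Keras decide when to stop instead of hard-coding 100 epochs; a sketch using the standard EarlyStopping callback (assuming a reasonably recent Keras that supports restore_best_weights):

from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=10,
                           restore_best_weights=True)  # roll back to the best epoch

hist = model.fit(X_train, Y_train, epochs=500,
                 validation_split=0.1, callbacks=[early_stop])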

 

 

plt.figure(figsize=(10, 5))
plt.plot(hist.history['loss'], color='r', label='train')
plt.plot(hist.history['val_loss'], color='b', label='validation')  # the history key is 'val_loss'
plt.title('loss')
plt.legend()

 

 

The loss curves are shown above. Both the training and validation losses fall steadily, so training went well.

 

 

score = model.evaluate(X_test, Y_test)  # returns the compiled loss, i.e. the MSE
print(score)

pred = model.predict(X_test[-5:])       # predictions for the last five test samples
print(pred)
print(Y_test[-5:])

"""
103/103 [==============================] - 0s 78us/step
32.46895279004736
[[16.99518 ]
 [16.98122 ]
 [44.119606]
 [56.167183]
 [ 9.190986]]
607    18.415904
788    18.126324
352    51.434910
127    55.495923
685    13.664035
Name: strength, dtype: float64
"""

 

Finally, we evaluate the trained model on the held-out test set. The score of about 32.5 is the mean squared error, not an accuracy percentage; it corresponds to a root-mean-squared error of roughly 5.7 MPa, and the five sample predictions above track the true strengths fairly closely.
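To express the result in more interpretable units, standard regression metrics can be computed with scikit-learn (a sketch; these calls are not part of the original post):

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

pred_all = model.predict(X_test).ravel()
print('RMSE:', np.sqrt(mean_squared_error(Y_test, pred_all)))  # typical error in MPa
print('MAE :', mean_absolute_error(Y_test, pred_all))
print('R^2 :', r2_score(Y_test, pred_all))                     # fraction of variance explained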

This dataset seems well worth pushing further with a wider range of models and experiments.

If I get another chance, I'll try different layers and more systematic testing to improve the performance; one possible starting point is sketched below.
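Adding regularization layers is a common next step for a small tabular regressor; a sketch of one untuned variant, offered only as a direction to explore:

from keras.layers import BatchNormalization, Dropout

model2 = Sequential()
model2.add(Dense(256, input_shape=(8,), activation='relu'))
model2.add(BatchNormalization())
model2.add(Dropout(0.2))
model2.add(Dense(128, activation='relu'))
model2.add(Dropout(0.2))
model2.add(Dense(32, activation='relu'))
model2.add(Dense(1))  # a linear output is the usual default for regression

model2.compile(loss='mse', optimizer='adam')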
