- This post is a learning-record blog for the 365-day deep learning training camp
- Original author: K同学啊
Task: the dataset provides fire temperature (Tem1), carbon monoxide concentration (CO 1), and soot concentration (Soot 1) as they change over time; we need to use these data to predict the fire temperature at a future time step.
Requirements:
1. Understand what an LSTM is and use one to build a complete program
2. Reach an R2 of 0.83
Stretch goal:
1. Use the data from time steps 1–8 to predict the temperature at time steps 9–10
My environment:
●Language: Python 3.8
●Editor: Jupyter Lab
●Deep learning framework: TensorFlow 2.4.1
●Data: fire temperature dataset
I. Theoretical Background
For background, see the previous article in this series: "Week R2: LSTM Fire Temperature Prediction: Understand LSTM (Long Short-Term Memory Networks) in One Article".
1. How LSTM Works
In one sentence, LSTM is an upgraded RNN: if an RNN can at best understand a sentence, an LSTM can at best understand a paragraph. In more detail:
LSTM, short for Long Short-Term Memory networks, is a special kind of RNN that can learn long-term dependencies. It was proposed by Hochreiter & Schmidhuber (1997), and many researchers have since refined and popularized it. LSTM works very well on a wide range of problems and is now widely used.
All recurrent neural networks take the form of a chain of repeating neural network modules. In an ordinary RNN, the repeating module has a very simple structure.
LSTM avoids the long-term dependency problem and can retain information over long spans. The LSTM cell has a more complex internal structure: gating states select and regulate the information passing through, remembering what must be kept for a long time and forgetting what is unimportant.
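For readers who want the precise form, the gating just described corresponds to the standard LSTM equations, where $\sigma$ is the logistic sigmoid and $\odot$ denotes element-wise multiplication:

```latex
\begin{aligned}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) && \text{forget gate} \\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) && \text{input gate} \\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) && \text{candidate cell state} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{cell state update} \\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) && \text{output gate} \\
h_t &= o_t \odot \tanh(C_t) && \text{hidden state}
\end{aligned}
```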
To better understand the structure of the data an LSTM takes as input, it helps to visualize the time-series data.
Based on the structure of the input data and the prediction target, such programs fall roughly into the following six categories.
On the code implementation: a minimal data-shaping sketch shared by all of the cases below is given first.
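Every case below feeds the LSTM a tensor of shape (samples, timesteps, features). A minimal sketch of how a univariate series is turned into that shape, using a hypothetical helper named split_sequence (not part of the original post):

```python
import numpy as np

def split_sequence(seq, n_steps):
    """Slice a 1-D series into sliding windows of length n_steps."""
    X, y = [], []
    for i in range(len(seq) - n_steps):
        X.append(seq[i:i + n_steps])  # input window
        y.append(seq[i + n_steps])    # the value right after the window
    return np.array(X), np.array(y)

series = np.array([10, 20, 30, 40, 50, 60, 70, 80])
X, y = split_sequence(series, n_steps=5)
X = X.reshape((X.shape[0], 5, 1))  # LSTM layers expect (samples, timesteps, features)
print(X.shape, y.shape)            # (3, 5, 1) (3,)
```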
3.1. Single input, single output (single output time step)
●Input: one feature, multiple time steps
●Output: one feature, one time step
Data example:
```
Training set:
X                       y
[10, 20, 30, 40, 50]    [60]
[20, 30, 40, 50, 60]    [70]
[30, 40, 50, 60, 70]    [80]
…

Prediction input:
X
[70, 80, 90, 100, 110]

Expected output:
y
[120]
```
TensorFlow 2 implementation:
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

n_steps = 5     # length of the input window
n_features = 1  # a single parallel series

model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
```
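A minimal usage sketch under the same assumptions; the model is untrained here, so the printed value is meaningless until model.fit has been called on windows like the training set above:

```python
import numpy as np

# One prediction sample, reshaped to (samples, timesteps, features)
x_input = np.array([70, 80, 90, 100, 110], dtype='float32')
x_input = x_input.reshape((1, n_steps, n_features))

yhat = model.predict(x_input)
print(yhat)  # after training on the series above, this should be close to [120]
```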
3.2. Multiple inputs, single output (single output time step)
●Input: multiple features, multiple time steps
●Output: one feature, one time step
Data example:
```
Training set:
X            y
[[10,11],
 [20,21],
 [30,31],
 [40,41],
 [50,51]]    60
[[20,21],
 [30,31],
 [40,41],
 [50,51],
 [60,61]]    70
…

Prediction input:
X
[[30,31],
 [40,41],
 [50,51],
 [60,61],
 [70,71]]

Expected output:
y
80
```
TensorFlow 2 implementation:
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

n_steps = 5
n_features = X.shape[2]  # here n_features = 2, because the input has two parallel series

model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
```
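For this case the only extra work is stacking the parallel series into a 3-D tensor. A small illustrative sketch (the variable names are hypothetical, matching the data example above):

```python
import numpy as np

series_a = np.array([10, 20, 30, 40, 50, 60, 70])
series_b = np.array([11, 21, 31, 41, 51, 61, 71])

# Column-stack the two parallel series: shape (7, 2)
data = np.stack([series_a, series_b], axis=-1)

# Two sliding windows of 5 time steps each: shape (2, 5, 2)
X = np.array([data[0:5], data[1:6]])
y = np.array([60, 70])   # target one step after each window
print(X.shape, y.shape)  # (2, 5, 2) (2,)
```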
3.3. Multiple inputs, multiple outputs (single output time step)
●Input: multiple features, multiple time steps
●Output: multiple features, one time step
Data example:
```
Training set:
X            y
[[10,11],
 [20,21],
 [30,31],
 [40,41],
 [50,51]]    [60,61]
[[20,21],
 [30,31],
 [40,41],
 [50,51],
 [60,61]]    [70,71]
…

Prediction input:
X
[[30,31],
 [40,41],
 [50,51],
 [60,61],
 [70,71]]

Expected output:
y
[80,81]
```
TensorFlow 2 implementation:
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

n_steps = 5
n_features = X.shape[2]  # here n_features = 2, because the input has two parallel series

model = Sequential()
# The LSTM layer must return only its final state (return_sequences=False, the
# default), so that Dense(n_features) emits a single time step of 2 features.
model.add(LSTM(100, activation='relu', input_shape=(n_steps, n_features)))
model.add(Dense(n_features))
model.compile(optimizer='adam', loss='mse')
```
3.4. Single input, single output (multiple output time steps)
●Input: one feature, multiple time steps
●Output: one feature, multiple time steps
Data example:
```
Training set:
X                   y
[10,20,30,40,50]    [60,70]
[20,30,40,50,60]    [70,80]
…

Prediction input:
X
[30,40,50,60,70]

Expected output:
y
[80,90]
```
TensorFlow 2 implementation:
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

n_steps = 5      # input: 5 time steps
n_steps_out = 2  # output: 2 time steps
n_features = 1

model = Sequential()
# Return only the final state so Dense(n_steps_out) maps it to 2 output steps
model.add(LSTM(100, activation='relu', input_shape=(n_steps, n_features)))
model.add(Dense(n_steps_out))
model.compile(optimizer='adam', loss='mse')
```
Multiple inputs, single output (multiple output time steps) and multiple inputs, multiple outputs (multiple output time steps) follow the same pattern and are not spelled out here; a sketch of the latter is shown below.
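As a hedged sketch only (the original post elides this case), the multi-input, multi-output, multi-step variant can be handled by flattening the output grid into one Dense layer, with shapes chosen to match the examples above:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

n_steps = 5      # input window length
n_steps_out = 2  # predict 2 future time steps
n_features = 2   # two parallel series in and out

model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(n_steps, n_features)))
# One unit per (output step, feature) pair
model.add(Dense(n_steps_out * n_features))
model.compile(optimizer='adam', loss='mse')
# A prediction of shape (1, 4) can then be reshaped to (1, n_steps_out, n_features)
```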
II. Preparations
```python
import tensorflow as tf
import pandas as pd
import numpy as np

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.experimental.set_memory_growth(gpus[0], True)  # allocate GPU memory on demand
    tf.config.set_visible_devices([gpus[0]], "GPU")
    print("GPU: ", gpus)
else:
    print('CPU:')

# Confirm the currently visible devices
print(tf.config.list_physical_devices())

df_1 = pd.read_csv("./R2/woodpine2.csv")
```
Output:
```
CPU:
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
```
```python
import matplotlib.pyplot as plt
import seaborn as sns

plt.rcParams['savefig.dpi'] = 500  # resolution of saved figures
plt.rcParams['figure.dpi'] = 500   # display resolution

fig, ax = plt.subplots(1, 3, constrained_layout=True, figsize=(14, 3))
sns.lineplot(data=df_1["Tem1"], ax=ax[0])
sns.lineplot(data=df_1["CO 1"], ax=ax[1])
sns.lineplot(data=df_1["Soot 1"], ax=ax[2])
plt.show()
```
Output: line plots of Tem1, CO 1, and Soot 1 over time.
III. Building the Dataset
```python
dataFrame = df_1.iloc[:, 1:]
dataFrame
```
Output:
```
       Tem1      CO 1    Soot 1
0      25.0  0.000000  0.000000
1      25.0  0.000000  0.000000
2      25.0  0.000000  0.000000
3      25.0  0.000000  0.000000
4      25.0  0.000000  0.000000
...     ...       ...       ...
5943  295.0  0.000077  0.000496
5944  294.0  0.000077  0.000494
5945  292.0  0.000077  0.000491
5946  291.0  0.000076  0.000489
5947  290.0  0.000076  0.000487

5948 rows × 3 columns
```
Take Tem1, CO 1, and Soot 1 from the first 8 time steps as X, and Tem1 at the 9th time step as y.
```python
width_X = 8  # use 8 time steps of history as input
width_y = 1  # predict 1 future time step

X = []
y = []

in_start = 0
for _ in range(len(dataFrame)):
    in_end = in_start + width_X
    out_end = in_end + width_y

    if out_end < len(dataFrame):
        # Flatten the 8x3 window into a vector of 24 values
        X_ = np.array(dataFrame.iloc[in_start:in_end, ])
        X_ = X_.reshape((len(X_) * 3))
        # Target: Tem1 of the following time step
        y_ = np.array(dataFrame.iloc[in_end:out_end, 0])
        X.append(X_)
        y.append(y_)

    in_start += 1

X = np.array(X)
y = np.array(y)
X.shape, y.shape
```
Output:
```
((5939, 24), (5939, 1))
```
```python
from sklearn.preprocessing import MinMaxScaler

# Normalize the data to the range 0 to 1
sc = MinMaxScaler(feature_range=(0, 1))
X_scaled = sc.fit_transform(X)
X_scaled.shape
```
Output:
```
(5939, 24)
```
```python
X_scaled = X_scaled.reshape(len(X_scaled), width_X, 3)
X_scaled.shape
```
Output:
```
(5939, 8, 3)
```
Splitting the dataset
Take the samples before index 5000 as the training set and those after as the validation set.
```python
X_train = np.array(X_scaled[:5000]).astype('float64')
y_train = np.array(y[:5000]).astype('float64')

X_test = np.array(X_scaled[5000:]).astype('float64')
y_test = np.array(y[5000:]).astype('float64')
```
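One caveat worth noting: sc.fit_transform above was run on all samples before the split, so the validation rows influence the scaling statistics. A leakage-free variant, assuming the un-scaled (5939, 24) array X built earlier, could look like this sketch:

```python
from sklearn.preprocessing import MinMaxScaler

sc = MinMaxScaler(feature_range=(0, 1))
X_train = sc.fit_transform(X[:5000])  # fit the scaler on training rows only
X_test = sc.transform(X[5000:])       # reuse the training statistics

# Restore the (samples, timesteps, features) shape expected by the LSTM
X_train = X_train.reshape(-1, width_X, 3).astype('float64')
X_test = X_test.reshape(-1, width_X, 3).astype('float64')
```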
IV. Building the Model
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Bidirectional
from tensorflow.keras import Input

# Multi-layer LSTM
model_lstm = Sequential()
model_lstm.add(LSTM(units=64, activation='relu', return_sequences=True,
                    input_shape=(X_train.shape[1], 3)))
model_lstm.add(LSTM(units=64, activation='relu'))
model_lstm.add(Dense(width_y))
```
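A quick sanity check (not required for training, and assuming width_y = 1 as above): a dummy batch pushed through the untrained model should come out with shape (batch, 1).

```python
import numpy as np

dummy = np.zeros((2, 8, 3))     # batch of 2 windows: 8 time steps, 3 features
print(model_lstm(dummy).shape)  # expected: (2, 1)
```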
V. Training the Model
```python
# Only the loss is monitored here (accuracy is not meaningful for regression),
# so the metrics option is omitted
model_lstm.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                   loss='mean_squared_error')  # mean squared error as the loss
```
```python
X_train.shape, y_train.shape
```
Output:
```
((5000, 8, 3), (5000, 1))
```
```python
history_lstm = model_lstm.fit(X_train, y_train,
                              batch_size=64,
                              epochs=40,
                              validation_data=(X_test, y_test),
                              validation_freq=1)
```
Output:
```
Epoch 1/40
79/79 [==============================] - 3s 13ms/step - loss: 17642.7640 - val_loss: 5843.1167
Epoch 2/40
79/79 [==============================] - 1s 9ms/step - loss: 421.8025 - val_loss: 863.2029
Epoch 3/40
79/79 [==============================] - 1s 9ms/step - loss: 68.0383 - val_loss: 443.3524
Epoch 4/40
79/79 [==============================] - 1s 11ms/step - loss: 63.1070 - val_loss: 630.0569
Epoch 5/40
79/79 [==============================] - 1s 9ms/step - loss: 60.8359 - val_loss: 429.6816
Epoch 6/40
79/79 [==============================] - 1s 9ms/step - loss: 55.2357 - val_loss: 332.5534
Epoch 7/40
79/79 [==============================] - 1s 9ms/step - loss: 52.6763 - val_loss: 225.5500
Epoch 8/40
79/79 [==============================] - 1s 9ms/step - loss: 50.2085 - val_loss: 233.0096
Epoch 9/40
79/79 [==============================] - 1s 9ms/step - loss: 48.3704 - val_loss: 200.6572
Epoch 10/40
79/79 [==============================] - 1s 9ms/step - loss: 43.5778 - val_loss: 255.6778
Epoch 11/40
79/79 [==============================] - 1s 9ms/step - loss: 41.6273 - val_loss: 187.6802
Epoch 12/40
79/79 [==============================] - 1s 9ms/step - loss: 37.9668 - val_loss: 152.1306
Epoch 13/40
79/79 [==============================] - 1s 9ms/step - loss: 33.7161 - val_loss: 126.5226
Epoch 14/40
79/79 [==============================] - 1s 9ms/step - loss: 29.3218 - val_loss: 99.1449
Epoch 15/40
79/79 [==============================] - 1s 9ms/step - loss: 27.9880 - val_loss: 91.9206
Epoch 16/40
79/79 [==============================] - 1s 9ms/step - loss: 25.1793 - val_loss: 104.4199
Epoch 17/40
79/79 [==============================] - 1s 9ms/step - loss: 23.2140 - val_loss: 68.4278
Epoch 18/40
79/79 [==============================] - 1s 9ms/step - loss: 20.5209 - val_loss: 58.7139
Epoch 19/40
79/79 [==============================] - 1s 9ms/step - loss: 18.9439 - val_loss: 57.1808
Epoch 20/40
79/79 [==============================] - 1s 9ms/step - loss: 18.0535 - val_loss: 65.7030
Epoch 21/40
79/79 [==============================] - 1s 9ms/step - loss: 16.9911 - val_loss: 50.8789
Epoch 22/40
79/79 [==============================] - 1s 9ms/step - loss: 15.8952 - val_loss: 62.8621
Epoch 23/40
79/79 [==============================] - 1s 9ms/step - loss: 15.9065 - val_loss: 71.4229
Epoch 24/40
79/79 [==============================] - 1s 9ms/step - loss: 9.7059 - val_loss: 60.4816
Epoch 25/40
79/79 [==============================] - 1s 11ms/step - loss: 8.4736 - val_loss: 55.1349
Epoch 26/40
79/79 [==============================] - 1s 9ms/step - loss: 8.2527 - val_loss: 47.9371
Epoch 27/40
79/79 [==============================] - 1s 9ms/step - loss: 8.6649 - val_loss: 78.6073
Epoch 28/40
79/79 [==============================] - 1s 9ms/step - loss: 8.9457 - val_loss: 95.0485
Epoch 29/40
79/79 [==============================] - 1s 9ms/step - loss: 8.2558 - val_loss: 73.9929
Epoch 30/40
79/79 [==============================] - 1s 9ms/step - loss: 8.6800 - val_loss: 46.4249
Epoch 31/40
79/79 [==============================] - 1s 9ms/step - loss: 7.4052 - val_loss: 51.3766
Epoch 32/40
79/79 [==============================] - 1s 11ms/step - loss: 8.3682 - val_loss: 47.5709
Epoch 33/40
79/79 [==============================] - 1s 10ms/step - loss: 9.4248 - val_loss: 47.8780
Epoch 34/40
79/79 [==============================] - 1s 11ms/step - loss: 9.0760 - val_loss: 61.7005
Epoch 35/40
79/79 [==============================] - 1s 10ms/step - loss: 6.7884 - val_loss: 71.0755
Epoch 36/40
79/79 [==============================] - 1s 10ms/step - loss: 7.3383 - val_loss: 47.5915
Epoch 37/40
79/79 [==============================] - 1s 11ms/step - loss: 7.7409 - val_loss: 63.6706
Epoch 38/40
79/79 [==============================] - 1s 12ms/step - loss: 6.7351 - val_loss: 44.5680
Epoch 39/40
79/79 [==============================] - 1s 12ms/step - loss: 6.0092 - val_loss: 59.0267
Epoch 40/40
79/79 [==============================] - 1s 11ms/step - loss: 7.3467 - val_loss: 50.5237
```
VI. Evaluation
```python
# Support Chinese characters in plots
plt.rcParams['font.sans-serif'] = ['SimHei']  # render Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False    # render minus signs correctly

plt.figure(figsize=(5, 3), dpi=120)
plt.plot(history_lstm.history['loss'], label='LSTM Training Loss')
plt.plot(history_lstm.history['val_loss'], label='LSTM Validation Loss')
plt.title('Training and Validation Loss')
plt.legend()
plt.show()
```
Output: training and validation loss curves.
```python
predicted_y_lstm = model_lstm.predict(X_test)  # run the test set through the model

y_test_one = [i[0] for i in y_test]
predicted_y_lstm_one = [i[0] for i in predicted_y_lstm]

plt.figure(figsize=(5, 3), dpi=120)
# Plot the ground truth against the predictions
plt.plot(y_test_one[:1000], color='red', label='真实值')             # ground truth
plt.plot(predicted_y_lstm_one[:1000], color='blue', label='预测值')  # prediction
plt.title('Title')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()
```
Output: comparison of the true and predicted temperature curves over the first 1000 test samples.
```python
from sklearn import metrics

"""
RMSE: root mean squared error -----> the square root of the MSE
R2  : coefficient of determination, a common measure of goodness of fit
"""
# Note: sklearn's convention is metrics.func(y_true, y_pred); the arguments are
# swapped here. MSE is symmetric, but r2_score is not, so this R2 differs
# slightly from r2_score(y_test, predicted_y_lstm)
RMSE_lstm = metrics.mean_squared_error(predicted_y_lstm, y_test)**0.5
R2_lstm = metrics.r2_score(predicted_y_lstm, y_test)

print('RMSE: %.5f' % RMSE_lstm)
print('R2: %.5f' % R2_lstm)
```
Output:
```
RMSE: 7.10801
R2: 0.82670
```
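For reference, the two metrics reported above are defined as

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^2},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^2}{\sum_{i=1}^{n}\bigl(y_i - \bar{y}\bigr)^2}
```

where $y_i$ are the true values, $\hat{y}_i$ the predictions, and $\bar{y}$ the mean of the true values. An $R^2$ of about 0.83 means the model explains roughly 83% of the variance in the targets, meeting the stated requirement.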