Build convolutional neural networks (CNNs) to enhance computer vision

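1. Before you begin

In this codelab, you'll learn how to use CNNs to improve your image classification models.

Prerequisites

This codelab builds on work completed in two previous installments: Build a computer vision model, which introduces some of the code you'll use here, and the Build convolutions and perform pooling codelab, which introduces convolutions and pooling.

What you'll learn

How to improve computer vision and accuracy with convolutions

What you'll build

Layers to enhance your neural network

What you'll need

You can find the code for the rest of the codelab running in Colab.

You'll also need TensorFlow installed, along with the libraries you installed in the previous codelab.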

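2. Improve computer vision accuracy with convolutions

You now know how to do fashion image recognition using a deep neural network (DNN) containing three layers: the input layer (in the shape of the input data), the output layer (in the shape of the desired output), and a hidden layer. You experimented with several parameters that influence the final accuracy, such as different sizes of hidden layers and the number of training epochs.

For convenience, here's the entire code again. Run it and take note of the test accuracy that is printed at the end.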

import tensorflow as tf
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images=training_images/255.0
test_images=test_images/255.0
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(training_images, training_labels, epochs=5)
test_loss, test_accuracy = model.evaluate(test_images, test_labels)
print ('Test loss: {}, Test accuracy: {}'.format(test_loss, test_accuracy*100))

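Your accuracy is probably about 89% on training and 87% on validation. You can make that even better using convolutions, which narrow down the content of the image to focus on specific, distinct details.

If you've ever done image processing using a filter, then convolutions will look very familiar.

In short, you take an array (usually 3x3 or 5x5) and pass it over the image. By changing the underlying pixels based on the formula within that matrix, you can perform operations like edge detection. For example, a 3x3 matrix is typically defined for edge detection where the middle cell is 8 and all of its neighbors are -1. In this case, for each pixel, you multiply its value by 8, then subtract the value of each neighbor. Do this for every pixel, and you'll end up with a new image that has its edges enhanced.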
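
The edge-detection filter described above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not how TensorFlow implements convolutions; the `convolve` helper and the tiny 5x5 image are made up for this example.

```python
# Edge-detection kernel from the text: 8 in the center, -1 for every neighbor.
KERNEL = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
]

def convolve(image, kernel):
    """Slide a 3x3 kernel over a 2D list of pixel values (no padding)."""
    height, width = len(image), len(image[0])
    output = []
    for row in range(1, height - 1):
        out_row = []
        for col in range(1, width - 1):
            total = 0
            # Multiply each pixel in the 3x3 window by the matching kernel cell.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    total += image[row + dr][col + dc] * kernel[dr + 1][dc + 1]
            out_row.append(total)
        output.append(out_row)
    return output

# Hypothetical 5x5 image: dark columns (0) on the left, bright columns (10) on the right.
image = [[0, 0, 10, 10, 10] for _ in range(5)]
edges = convolve(image, KERNEL)
for out_row in edges:
    print(out_row)  # [-30, 30, 0]
```

The uniform bright region produces 0, while the columns next to the dark-to-bright boundary produce non-zero values, which is the edge being picked out. Because no padding is used, the 5x5 input shrinks to 3x3, in the same way that a 28x28 image shrinks to 26x26 after a 3x3 convolution.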

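This is perfect for computer vision, because enhancing features like edges helps the computer distinguish one item from another. Better still, the amount of information needed is much less, because you'll train only on the highlighted features.

That's the concept of convolutional neural networks. Add some layers to do convolution before the dense layers, so that the information going to the dense layers becomes more focused.

3. Try the code

Run the following code. It's the same neural network as earlier, but this time with convolutional layers added first. It will take longer, but look at the impact on the accuracy: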

import tensorflow as tf
print(tf.__version__)
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images=training_images.reshape(60000, 28, 28, 1)
training_images=training_images / 255.0
test_images = test_images.reshape(10000, 28, 28, 1)
test_images=test_images / 255.0
model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(28, 28, 1)),
  tf.keras.layers.MaxPooling2D(2, 2),
  tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2,2),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()
model.fit(training_images, training_labels, epochs=5)
test_loss, test_accuracy = model.evaluate(test_images, test_labels)
print ('Test loss: {}, Test accuracy: {}'.format(test_loss, test_accuracy*100))

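It's likely gone up to about 93% on the training data and 91% on the validation data.

Now try running it for more epochs, say about 20, and explore the results. While the training results might seem really good, the validation results may actually go down due to a phenomenon called overfitting.

Overfitting occurs when the network learns the data from the training set too well, so it becomes specialized to recognize only that data and, as a result, is less effective at seeing other data in more general situations. For example, if you trained only on heels, the network might be very good at identifying heels, but sneakers might confuse it.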
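
One way to spot overfitting is to compare training and validation accuracy epoch by epoch. The sketch below assumes you already have two lists of per-epoch accuracies; in Keras these could come from the `History` object returned by `model.fit` when validation data is supplied. The `summarize_history` helper and the numbers are invented purely for illustration and are not results from this codelab.

```python
def summarize_history(train_acc, val_acc):
    """Return (best validation epoch, final train/validation gap).

    Epochs are 1-based. A best epoch well before the last one, combined
    with a growing gap, hints that the model is overfitting.
    """
    best_epoch = max(range(len(val_acc)), key=lambda i: val_acc[i]) + 1
    final_gap = round(train_acc[-1] - val_acc[-1], 4)
    return best_epoch, final_gap

# Made-up accuracies: training keeps improving, validation peaks and then slips.
train_acc = [0.85, 0.90, 0.93, 0.96, 0.98]
val_acc = [0.86, 0.90, 0.91, 0.905, 0.90]

best_epoch, final_gap = summarize_history(train_acc, val_acc)
print(best_epoch, final_gap)  # 3 0.08
```

Here validation accuracy peaks at epoch 3 while training accuracy continues to climb, which is the pattern the paragraph above describes.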

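Look at the code again, and see step by step how the convolutions were built.

4. Gather the data

The first step is to gather the data.

You'll notice that there's a change here: the training data needs to be reshaped. The convolutional layer expects each image to carry an explicit channel dimension, so the 60,000 images of 28x28 become a single 4D array of 60,000x28x28x1, and the same goes for the test images. If you don't do that, you'll get an error when training because the convolutions don't recognize the shape.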
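
The reshape can be tried in isolation with NumPy. The sketch below uses a small array of zeros as a stand-in for the real images, and passes `-1` so that NumPy infers the number of images; that is an alternative to hard-coding 60000 and 10000 as the codelab does, not something the codelab itself uses.

```python
import numpy as np

# Stand-in for a handful of 28x28 grayscale images (the real training set has 60,000).
images = np.zeros((5, 28, 28), dtype=np.uint8)

# Add a trailing channel axis of size 1; -1 lets NumPy infer the image count.
images_4d = images.reshape(-1, 28, 28, 1)

print(images.shape)     # (5, 28, 28)
print(images_4d.shape)  # (5, 28, 28, 1)
```

The pixel values are unchanged; only the shape gains a final axis that tells the convolutional layer there is one color channel.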

import tensorflow as tf
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images=training_images.reshape(60000, 28, 28, 1)
training_images = training_images/255.0
test_images = test_images.reshape(10000, 28, 28, 1)
test_images = test_images/255.0

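5. Define the model

Next, define your model. Instead of the input layer at the top, you're going to add a convolutional layer. The parameters are:

The number of convolutions you want to generate. A value like 32 is a good starting point.
The size of the convolutional matrix, in this case a 3x3 grid.
The activation function to use, in this case relu.
In the first layer, the shape of the input data.

You'll follow the convolution with a max pooling layer, which is designed to compress the image while maintaining the content of the features that were highlighted by the convolution. By specifying (2,2) for the max pooling, the effect is to reduce the size of the image by a factor of 4. It creates a 2x2 array of pixels and picks the largest pixel value, turning 4 pixels into 1. It repeats this computation across the image, and in so doing halves the number of horizontal pixels and halves the number of vertical pixels.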
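
To make the pooling step concrete, here is a minimal plain-Python sketch of 2x2 max pooling. The `max_pool_2x2` helper and the 4x4 sample values are hypothetical; Keras performs the same reduction internally on each channel.

```python
def max_pool_2x2(image):
    """Keep the largest value from each non-overlapping 2x2 block.

    A trailing row or column that does not fill a whole block is dropped.
    """
    pooled = []
    for row in range(0, len(image) - 1, 2):
        pooled_row = []
        for col in range(0, len(image[0]) - 1, 2):
            block = (
                image[row][col], image[row][col + 1],
                image[row + 1][col], image[row + 1][col + 1],
            )
            pooled_row.append(max(block))
        pooled.append(pooled_row)
    return pooled

# Hypothetical 4x4 image: 16 pixels in, 4 pixels out.
image = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [7, 1, 2, 8],
]
print(max_pool_2x2(image))  # [[4, 2], [7, 8]]
```

Each dimension is halved, so the pixel count drops by a factor of 4. When a dimension is odd, the leftover row or column is discarded, which is why an 11x11 feature map pools down to 5x5.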

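You can call model.summary() to see the size and shape of the network. The summary below comes from the model run in step 3, which uses 64 filters in both convolutional layers. Notice that after every max pooling layer, the image size is reduced in the following way: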

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_2 (Conv2D)            (None, 26, 26, 64)        640       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 13, 13, 64)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 11, 11, 64)        36928     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 1600)              0         
_________________________________________________________________
dense_4 (Dense)              (None, 128)               204928    
_________________________________________________________________
dense_5 (Dense)              (None, 10)                1290      
=================================================================
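
The shapes and parameter counts in the summary can be reproduced with a little arithmetic. The helper functions below are illustrative names, not Keras APIs. They assume 3x3 kernels without padding, 2x2 pooling, and one bias per filter or unit, which matches the defaults used in this codelab.

```python
def conv_output_size(size, kernel=3):
    """Width or height after an unpadded convolution."""
    return size - kernel + 1

def pool_output_size(size, pool=2):
    """Width or height after non-overlapping pooling."""
    return size // pool

def conv_params(filters, kernel, in_channels):
    """Weights plus one bias for each filter."""
    return filters * (kernel * kernel * in_channels + 1)

def dense_params(units, inputs):
    """Weights plus one bias for each unit."""
    return units * (inputs + 1)

size = 28
size = conv_output_size(size)   # 26
size = pool_output_size(size)   # 13
size = conv_output_size(size)   # 11
size = pool_output_size(size)   # 5
flat = size * size * 64         # 1600 values after Flatten

print(conv_params(64, 3, 1))    # 640
print(conv_params(64, 3, 64))   # 36928
print(dense_params(128, flat))  # 204928
print(dense_params(10, 128))    # 1290
```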

Here's the full code for the CNN:

model = tf.keras.models.Sequential([
  # First convolution: 32 filters of size 3x3 over the 28x28x1 input
  tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
  tf.keras.layers.MaxPooling2D(2, 2),
  # Add another convolution
  tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2, 2),
  # Now flatten the output. After this you'll just have the same DNN structure as the non-convolutional version
  tf.keras.layers.Flatten(),
  # The same 128-unit dense layer and 10-unit output layer as in the pre-convolution example:
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])

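6. Compile and train the model

Compile the model, call the fit method to do the training, and evaluate the loss and accuracy from the test set.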

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(training_images, training_labels, epochs=5)
test_loss, test_acc = model.evaluate(test_images, test_labels)
print ('Test loss: {}, Test accuracy: {}'.format(test_loss, test_acc*100))

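7. Visualize the convolutions and pooling

This code shows you the convolutions graphically. print(test_labels[:100]) shows the first 100 labels in the test set, and you can see that the ones at index 0, index 23, and index 28 all have the same value (9). They're all shoes. Take a look at the result of running the convolution on each, and you'll begin to see common features between them emerge. When the dense layers train on that data, they work with a lot less information, and they may find a commonality between shoes based on that combination of convolution and pooling.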

print(test_labels[:100])

[9 2 1 1 6 1 4 6 5 7 4 5 7 3 4 1 2 4 8 0 2 5 7 9 1 4 6 0 9 3 8 8 3 3 8 0 7
 5 7 9 6 1 3 7 6 7 2 1 2 2 4 4 5 8 2 2 8 4 8 0 7 7 8 5 1 1 2 3 9 8 7 0 2 6
 2 3 1 2 8 4 1 8 5 9 5 0 3 2 0 6 5 3 6 7 1 8 0 1 4 2]

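Now you can select some of the images corresponding to those labels and render what they look like as they pass through the convolutions. In the following code, FIRST_IMAGE, SECOND_IMAGE, and THIRD_IMAGE are all indexes of images with the label 9 (an ankle boot in Fashion MNIST).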
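
Rather than reading the indexes off the printed output by eye, you could look them up. The sketch below copies the first 30 labels from the output above into a plain list; `TARGET_LABEL` and `matching_indexes` are names chosen for this illustration.

```python
# First 30 labels copied from the printed test_labels output.
labels = [9, 2, 1, 1, 6, 1, 4, 6, 5, 7, 4, 5, 7, 3, 4, 1, 2, 4, 8, 0,
          2, 5, 7, 9, 1, 4, 6, 0, 9, 3]

TARGET_LABEL = 9  # the label shared by the three images used below

# Collect the position of every label that matches the target.
matching_indexes = [i for i, label in enumerate(labels) if label == TARGET_LABEL]
print(matching_indexes)  # [0, 23, 28]
```

The same list comprehension should work directly on `test_labels` if you want to pick other classes to compare.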

import matplotlib.pyplot as plt

FIRST_IMAGE = 0
SECOND_IMAGE = 23
THIRD_IMAGE = 28
CONVOLUTION_NUMBER = 6  # index of the filter (channel) to display

# Build a model that returns the output of every layer, not just the last one.
layer_outputs = [layer.output for layer in model.layers]
activation_model = tf.keras.models.Model(inputs=model.input, outputs=layer_outputs)

# One row per image, one column for each of the first four layers (conv, pool, conv, pool).
f, axarr = plt.subplots(3, 4)
for row, image_index in enumerate([FIRST_IMAGE, SECOND_IMAGE, THIRD_IMAGE]):
  # Run the image through the network once and keep every layer's activations.
  activations = activation_model.predict(test_images[image_index].reshape(1, 28, 28, 1))
  for x in range(0, 4):
    axarr[row, x].imshow(activations[x][0, :, :, CONVOLUTION_NUMBER], cmap='inferno')
    axarr[row, x].grid(False)

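You should now see a grid of plots in which the convolution captures the essence of the sole of the shoe, effectively spotting it as a common feature across all the shoes.

8. Exercises

Exercise 1

Try editing the convolutions. Change the number of convolutions from 32 to either 16 or 64. What impact does that have on accuracy and training time?

Exercise 2

Remove the final convolution. What impact does that have on accuracy or training time?

Exercise 3

Add more convolutions. What impact does that have?

Exercise 4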

Remove the first convolution. What impact does that have? Experiment with it.

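9. Congratulations

You've built your first CNN! To learn how to further enhance your computer vision models, proceed to Use convolutional neural networks (CNNs) with complex images.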
