tf.nn.conv2d
In this post, I will evaluate TensorFlow's convolution APIs, focusing on tf.nn.conv2d.
As you may already know, the numeric suffix of a tf.nn.convNd API indicates the number of spatial dimensions of the input data.
So TensorFlow has three convolution APIs: conv1d, conv2d, and conv3d.
e.g.
- conv1d: the input data has width and channels
- conv2d: the input data has height, width, and channels
- conv3d: the input data has depth, height, width, and channels
Here I will go through conv2d with the data format "NHWC":
- N: the number of batches
- H: the height
- W: the width
- C: the number of channels
tf.nn.conv2d computes a 2-D convolution given 4-D input and filter tensors.
Given an input tensor of shape [batch, in_height, in_width, in_channels] and a filter/kernel tensor of shape [filter_height, filter_width, in_channels, out_channels], this op performs the following:
- Flattens the filter to a 2-D matrix with shape [filter_height * filter_width * in_channels, output_channels].
- Extracts image patches from the input tensor to form a virtual tensor of shape [batch, out_height, out_width, filter_height * filter_width * in_channels].
- For each patch, right-multiplies the filter matrix and the image patch vector.
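The three steps above can be sketched in plain NumPy. This is a minimal illustration for VALID padding and stride 1, not TensorFlow's actual implementation; the function name is my own:

```python
import numpy as np

def conv2d_valid_nhwc(x, w):
    """Sketch of the three steps above (VALID padding, stride 1).
    x: [batch, H, W, C_in], w: [fh, fw, C_in, C_out]."""
    batch, H, W, C_in = x.shape
    fh, fw, _, C_out = w.shape
    out_h, out_w = H - fh + 1, W - fw + 1
    # 1. Flatten the filter to [fh * fw * C_in, C_out].
    w_mat = w.reshape(fh * fw * C_in, C_out)
    # 2. Extract image patches: [batch, out_h, out_w, fh * fw * C_in].
    patches = np.empty((batch, out_h, out_w, fh * fw * C_in))
    for i in range(out_h):
        for j in range(out_w):
            patches[:, i, j, :] = x[:, i:i + fh, j:j + fw, :].reshape(batch, -1)
    # 3. Right-multiply each patch vector by the filter matrix.
    return patches @ w_mat  # [batch, out_h, out_w, C_out]
```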
In detail, with the default NHWC format, the strides must have strides[0] = strides[3] = 1. For the most common case, with the same horizontal and vertical strides:
strides = [1, stride, stride, 1]
"""Example code for tf.nn.conv2d (https://www.tensorflow.org/versions/r1.8/api_docs/python/tf/nn/conv2d)
tf.nn.conv2d(
    input,
    filter,
    strides,
    padding,
    use_cudnn_on_gpu=True,
    data_format='NHWC',
    dilations=[1, 1, 1, 1],
    name=None
)
"""
import sys
import tensorflow as tf
import numpy as np
print("=== Version checking ===")
print("The version of sys: \n{}".format(sys.version))
print("Tensorflow version: {}".format(tf.__version__))
print("========================")
The following functions show how large the height and width of the output are.
You have to provide the height and width of the input image, the filter size, and the strides, together with the padding scheme.
Then you can get the height and width of the output, and also the size of the padding.
In TensorFlow, the "SAME" padding scheme uses the smallest possible padding.
Let's see an example.
Here, the in_channels of the input has to match the filter's in_channels,
and filter_out_channels is the number of output channels after filtering each patch of the input.
Finally, strides is a list with one entry per input dimension.
batch = 1
in_height = 4
in_width = 5
in_channels = 1
filter_height = 2
filter_width = 5
filter_in_channels = in_channels
filter_out_channels = 1
strides = [1, 1, 1, 1]
print("[batch, in_height, in_width, in_channels] = \n[{}, {}, {}, {}]".format(batch, in_height, in_width, in_channels))
print("[filter_height, filter_width, filter_in_channels, filter_out_channels] = \n[{}, {}, {}, {}]".format(filter_height, filter_width, filter_in_channels, filter_out_channels))
print("strides = \n[{}, {}, {}, {}]".format(strides[0], strides[1], strides[2], strides[3]))
First of all, let's evaluate the output size with the "VALID" padding scheme:
def check_output_size_with_VALID(in_height, in_width, strides, filter_height, filter_width):
    out_height = np.ceil(float(in_height - filter_height + 1) / float(strides[1]))
    out_width = np.ceil(float(in_width - filter_width + 1) / float(strides[2]))
    print("VALID padding is no padding")
    print("output_height: {}".format(out_height))
    print("output_width: {}".format(out_width))

print("What is the height and width of the output?")
check_output_size_with_VALID(in_height, in_width, strides, filter_height, filter_width)
Secondly, check the output size with the "SAME" padding scheme:
def check_output_size_with_SAME(in_height, in_width, strides, filter_height, filter_width):
    out_height = np.ceil(float(in_height) / float(strides[1]))
    out_width = np.ceil(float(in_width) / float(strides[2]))
    print("SAME has padding, and it is the smallest possible padding")
    print("output_height: {}".format(out_height))
    print("output_width: {}".format(out_width))
    if (in_height % strides[1] == 0):
        pad_along_height = max(filter_height - strides[1], 0)
    else:
        pad_along_height = max(filter_height - (in_height % strides[1]), 0)
    if (in_width % strides[2] == 0):
        pad_along_width = max(filter_width - strides[2], 0)
    else:
        pad_along_width = max(filter_width - (in_width % strides[2]), 0)
    print("pad along height and width...")
    print("pad along height: {}".format(pad_along_height))
    print("pad along width: {}".format(pad_along_width))
    pad_top = pad_along_height // 2  # divided by 2
    pad_bottom = pad_along_height - pad_top
    pad_left = pad_along_width // 2
    pad_right = pad_along_width - pad_left
    print("Padding size on top, bottom, left and right")
    print("top: {}".format(pad_top))
    print("bottom: {}".format(pad_bottom))
    print("left: {}".format(pad_left))
    print("right: {}".format(pad_right))

print("What is the height and width of the output?")
check_output_size_with_SAME(in_height, in_width, strides, filter_height, filter_width)
Basically, convolution computes per channel like this.
Let's go through the "VALID" padding scheme in detail from the beginning.
The input image data is random numbers and the kernel is a constant tensor.
batch = 1
in_height = 4
in_width = 5
in_channels = 1
filter_height = 2
filter_width = 5
filter_in_channels = in_channels
filter_out_channels = 1
strides = [1, 1, 1, 1]
# image is [batch, in_height, in_width, in_channels]
image_tensor = tf.get_variable(shape=[batch, in_height, in_width, in_channels], dtype=tf.float32, name="image")
# filter's size is [filter_height, filter_width, in_channels, out_channels]
kernel = tf.constant(1, shape=[filter_height, filter_width, filter_in_channels, filter_out_channels], dtype=tf.float32, name="kernel")
# the result of convolving each patch of the image with the filter.
result = tf.nn.conv2d(image_tensor, kernel, strides=strides, padding="VALID")
init_op = tf.global_variables_initializer()
print("What is height and width of output ?")
check_output_size_with_VALID(in_height, in_width, strides, filter_height, filter_width)
with tf.Session() as sess:
    sess.run(init_op)
    print("====== Image Tensor ======= \n{}".format(image_tensor))
    image = sess.run(image_tensor)
    print(image)
    print("\n====== Kernel Tensor ======= \n{}".format(kernel))
    print(sess.run(kernel))
    #print("\n====== Result Tensor ======= \n{}".format(result))
    #print(sess.run(result))
    print("\n========== Checking separately how to evaluate conv2d =============")
    a = image[0, 0]; b = image[0, 1]; c = image[0, 2]; d = image[0, 3];
    print("\nimage(0, 0, :, :) \n{}".format(a))
    print("\nimage(0, 1, :, :) \n{}".format(b))
    print("\nimage(0, 2, :, :) \n{}".format(c))
    print("\nimage(0, 3, :, :) \n{}".format(d))
    print("\n=============== Sum ===========")
    sumA = np.sum(a); sumB = np.sum(b); sumC = np.sum(c); sumD = np.sum(d)
    print("\nSum of image(0, 0, :, :) \n{}".format(sumA))
    print("\nSum of image(0, 1, :, :) \n{}".format(sumB))
    print("\nSum of image(0, 2, :, :) \n{}".format(sumC))
    print("\nSum of image(0, 3, :, :) \n{}".format(sumD))
    print("\n============== after conv =================")
    print("My prediction")
    print("\nSum of image(0, 0:2, :, :) \n{}".format(sumA + sumB))
    print("\nSum of image(0, 1:3, :, :) \n{}".format(sumB + sumC))
    print("\nSum of image(0, 2:4, :, :) \n{}".format(sumC + sumD))
    print("Real result of conv2d")
    print("\nThe result after conv2d: \n{}".format(sess.run(result)))
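The prediction above can also be checked without a session: with an all-ones 2x5 kernel over a 4x5 single-channel image and VALID padding, each output element is just the sum of two consecutive image rows. A small NumPy sketch, where the random image is a hypothetical stand-in for the TF variable above:

```python
import numpy as np

# Hypothetical 4x5 single-channel image, matching the shapes above.
image = np.random.rand(1, 4, 5, 1)
row_sums = image[0, :, :, 0].sum(axis=1)  # one sum per image row

# With an all-ones 2x5 kernel and VALID padding, every patch covers two
# full rows, so each output value is the sum of two consecutive row sums.
expected = np.array([row_sums[0] + row_sums[1],
                     row_sums[1] + row_sums[2],
                     row_sums[2] + row_sums[3]])
print(expected)  # should match the 3x1 conv2d result printed above
```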
Let's go through another example, changing in_channels and output_channels.
In this case, in_channels is 2 and output_channels is 1.
Keep in mind that this conv2d operation executes a right-multiplication.
batch = 1
in_height = 4
in_width = 5
in_channels = 2
filter_height = 2
filter_width = 5
filter_in_channels = in_channels
filter_out_channels = 1
strides = [1, 1, 1, 1]
# image is [batch, in_height, in_width, in_channels]
image_tensor = tf.get_variable(shape=[batch, in_height, in_width, in_channels], dtype=tf.float32, name="image_with_in_channel2")
# filter's size is [filter_height, filter_width, in_channels, out_channels]
kernel = tf.constant(1, shape=[filter_height, filter_width, filter_in_channels, filter_out_channels], dtype=tf.float32, name="kernel")
# the result of convolving each patch of the image with the filter.
result = tf.nn.conv2d(image_tensor, kernel, strides=strides, padding="VALID")
init_op = tf.global_variables_initializer()
print("What is height and width of output ?")
check_output_size_with_VALID(in_height, in_width, strides, filter_height, filter_width)
with tf.Session() as sess:
    sess.run(init_op)
    print("====== Image Tensor ======= \n{}".format(image_tensor))
    image = sess.run(image_tensor)
    print(image)
    print("\n====== Kernel Tensor ======= \n{}".format(kernel))
    print(sess.run(kernel))
    #print("\n====== Result Tensor ======= \n{}".format(result))
    #print(sess.run(result))
    print("\n========== Checking separately how to evaluate conv2d =============")
    a = image[0, 0]; b = image[0, 1]; c = image[0, 2]; d = image[0, 3];
    print("\nimage(0, 0, :, :) \n{}".format(a))
    print("\nimage(0, 1, :, :) \n{}".format(b))
    print("\nimage(0, 2, :, :) \n{}".format(c))
    print("\nimage(0, 3, :, :) \n{}".format(d))
    print("\n=============== Sum ===========")
    sumA = np.sum(a); sumB = np.sum(b); sumC = np.sum(c); sumD = np.sum(d)
    print("\nSum of image(0, 0, :, :) \n{}".format(sumA))
    print("\nSum of image(0, 1, :, :) \n{}".format(sumB))
    print("\nSum of image(0, 2, :, :) \n{}".format(sumC))
    print("\nSum of image(0, 3, :, :) \n{}".format(sumD))
    print("\n============== after conv =================")
    print("My prediction")
    print("\nSum of image(0, 0:2, :, :) \n{}".format(sumA + sumB))
    print("\nSum of image(0, 1:3, :, :) \n{}".format(sumB + sumC))
    print("\nSum of image(0, 2:4, :, :) \n{}".format(sumC + sumD))
    print("Real result of conv2d")
    print("\nThe result after conv2d: \n{}".format(sess.run(result)))
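The same row-sum reasoning extends to two input channels: an all-ones kernel of shape [2, 5, 2, 1] sums over height, width, and both channels at once. A hypothetical NumPy check, with a random image standing in for the TF variable above:

```python
import numpy as np

# Hypothetical 4x5 image with 2 channels, as in the example above.
image = np.random.rand(1, 4, 5, 2)
# An all-ones 2x5x2x1 kernel sums over height, width, AND both channels,
# so each output value is the total of two consecutive rows across channels.
row_sums = image[0].sum(axis=(1, 2))  # one sum per row, over width and channels
predicted = row_sums[:-1] + row_sums[1:]
print(predicted)  # should match the 3x1 conv2d result printed above
```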
batch = 1
in_height = 4
in_width = 5
in_channels = 2
filter_height = 2
filter_width = 5
filter_in_channels = in_channels
filter_out_channels = 1
strides = [1, 1, 1, 1]
# image is [batch, in_height, in_width, in_channels]
image_tensor = tf.get_variable(shape=[batch, in_height, in_width, in_channels], dtype=tf.float32, name="image_other_filter")
# the filter is a concatenation of two tensors of shape [filter_height, filter_width, in_channels-1, out_channels].
kernel1 = tf.constant(1, shape=[filter_height, filter_width, filter_in_channels-1, filter_out_channels], dtype=tf.float32, name="kernel_1")
kernel2 = tf.constant(0, shape=[filter_height, filter_width, filter_in_channels-1, filter_out_channels], dtype=tf.float32, name="kernel_2")
kernel = tf.concat([kernel1, kernel2], axis=2)
# the result of convolving each patch of the image with the filter.
result = tf.nn.conv2d(image_tensor, kernel, strides=strides, padding="VALID")
init_op = tf.global_variables_initializer()
print("What is height and width of output ?")
check_output_size_with_VALID(in_height, in_width, strides, filter_height, filter_width)
with tf.Session() as sess:
    sess.run(init_op)
    print("====== Image Tensor ======= \n{}".format(image_tensor))
    image = sess.run(image_tensor)
    print(image)
    print("\n====== Kernel Tensor ======= \n{}, \n{}, \n{}".format(kernel1, kernel2, kernel))
    #print(sess.run(kernel1))
    #print(sess.run(kernel2))
    print(sess.run(kernel))
    #print("\n====== Result Tensor ======= \n{}".format(result))
    #print(sess.run(result))
    print("\n========== Checking separately how to evaluate conv2d =============")
    a = image[0, 0]; b = image[0, 1]; c = image[0, 2]; d = image[0, 3];
    one_hot = np.array([1., 0.])
    aMul = np.multiply(a, one_hot); bMul = np.multiply(b, one_hot);
    cMul = np.multiply(c, one_hot); dMul = np.multiply(d, one_hot);
    print("\nimage(0, 0, :, :), one_hot \n{}, \n{}".format(a, aMul))
    print("\nimage(0, 1, :, :), one_hot \n{}, \n{}".format(b, bMul))
    print("\nimage(0, 2, :, :), one_hot \n{}, \n{}".format(c, cMul))
    print("\nimage(0, 3, :, :), one_hot \n{}, \n{}".format(d, dMul))
    print("\n=============== Sum ===========")
    sumA = np.sum(aMul); sumB = np.sum(bMul); sumC = np.sum(cMul); sumD = np.sum(dMul)
    print("\nSum of image(0, 0, :, :)*one_hot \n{}".format(sumA))
    print("\nSum of image(0, 1, :, :)*one_hot \n{}".format(sumB))
    print("\nSum of image(0, 2, :, :)*one_hot \n{}".format(sumC))
    print("\nSum of image(0, 3, :, :)*one_hot \n{}".format(sumD))
    print("\n============== after conv =================")
    print("My prediction")
    print("\nSum of image(0, 0:2, :, :) \n{}".format(sumA + sumB))
    print("\nSum of image(0, 1:3, :, :) \n{}".format(sumB + sumC))
    print("\nSum of image(0, 2:4, :, :) \n{}".format(sumC + sumD))
    print("Real result of conv2d")
    print("\nThe result after conv2d: \n{}".format(sess.run(result)))
Let's go through the "SAME" padding scheme.
The "SAME" scheme uses the smallest possible zero-padding such that (with stride 1) the height and width of the output are the same as those of the input.
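One way to see the "SAME" scheme: zero-pad the input by the amounts computed in check_output_size_with_SAME, then run a "VALID" convolution on the padded tensor. A NumPy sketch of just the padding arithmetic, for the 4x5 image and 2x5 kernel used below (stride 1 assumed):

```python
import numpy as np

in_height, in_width = 4, 5
filter_height, filter_width = 2, 5
stride = 1

# Total padding so that out = ceil(in / stride); TF puts the extra
# pixel on the bottom/right when the total is odd.
pad_along_height = max(filter_height - stride, 0)  # since in_height % stride == 0
pad_along_width = max(filter_width - stride, 0)
pad_top = pad_along_height // 2
pad_bottom = pad_along_height - pad_top
pad_left = pad_along_width // 2
pad_right = pad_along_width - pad_left

image = np.random.rand(1, in_height, in_width, 1)
padded = np.pad(image, ((0, 0), (pad_top, pad_bottom),
                        (pad_left, pad_right), (0, 0)))
# A VALID convolution on the padded tensor now yields the SAME output size.
out_height = padded.shape[1] - filter_height + 1
out_width = padded.shape[2] - filter_width + 1
print(out_height, out_width)  # 4 5, i.e. the input size
```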
batch = 1
in_height = 4
in_width = 5
in_channels = 1
filter_height = 2
filter_width = 5
filter_in_channels = in_channels
filter_out_channels = 1
strides = [1, 1, 1, 1]
image_tensor = tf.get_variable(shape=[batch, in_height, in_width, in_channels], dtype=tf.float32, name="same_zero_padding")
kernel = tf.constant(1, shape=[filter_height, filter_width, filter_in_channels, filter_out_channels], dtype=tf.float32, name="kernel")
result = tf.nn.conv2d(image_tensor, kernel, strides=strides, padding="SAME")
init_op = tf.global_variables_initializer()
print("What is height and width of output ?")
check_output_size_with_SAME(in_height, in_width, strides, filter_height, filter_width)
with tf.Session() as sess:
    sess.run(init_op)
    print("====== Image Tensor ======= \n{}".format(image_tensor))
    image = sess.run(image_tensor)
    print(image)
    print("\n====== Kernel Tensor ======= \n{}".format(kernel))
    print(sess.run(kernel))
    #print("\n====== Result Tensor ======= \n{}".format(result))
    #print(sess.run(result))
    print("\n========== Checking separately how to evaluate conv2d =============")
    a0 = image[0, 0, 0:3]; a1 = image[0, 0, 0:4]; a2 = image[0, 0, 0:5]; a3 = image[0, 0, 1:5]; a4 = image[0, 0, 2:5];
    b0 = image[0, 1, 0:3]; b1 = image[0, 1, 0:4]; b2 = image[0, 1, 0:5]; b3 = image[0, 1, 1:5]; b4 = image[0, 1, 2:5];
    c0 = image[0, 2, 0:3]; c1 = image[0, 2, 0:4]; c2 = image[0, 2, 0:5]; c3 = image[0, 2, 1:5]; c4 = image[0, 2, 2:5];
    d0 = image[0, 3, 0:3]; d1 = image[0, 3, 0:4]; d2 = image[0, 3, 0:5]; d3 = image[0, 3, 1:5]; d4 = image[0, 3, 2:5];
    print("\nimage(0, 0, 0:3, :)- \n{}".format(a0))
    print("\nimage(0, 0, 0:4, :)- \n{}".format(a1))
    print("\nimage(0, 0, 0:5, :)- \n{}".format(a2))
    print("\nimage(0, 0, 1:5, :)- \n{}".format(a3))
    print("\nimage(0, 0, 2:5, :)- \n{}".format(a4))
    print("\n=============== Sum ===========")
    sumA0 = np.sum(a0); sumA1 = np.sum(a1); sumA2 = np.sum(a2); sumA3 = np.sum(a3); sumA4 = np.sum(a4);
    sumB0 = np.sum(b0); sumB1 = np.sum(b1); sumB2 = np.sum(b2); sumB3 = np.sum(b3); sumB4 = np.sum(b4);
    sumC0 = np.sum(c0); sumC1 = np.sum(c1); sumC2 = np.sum(c2); sumC3 = np.sum(c3); sumC4 = np.sum(c4);
    sumD0 = np.sum(d0); sumD1 = np.sum(d1); sumD2 = np.sum(d2); sumD3 = np.sum(d3); sumD4 = np.sum(d4);
    print("\nSum of image(0, 0, :, :)- \n({}, {}, {}, {}, {})".format(sumA0, sumA1, sumA2, sumA3, sumA4))
    print("\nSum of image(0, 1, :, :)- \n({}, {}, {}, {}, {})".format(sumB0, sumB1, sumB2, sumB3, sumB4))
    print("\nSum of image(0, 2, :, :)- \n({}, {}, {}, {}, {})".format(sumC0, sumC1, sumC2, sumC3, sumC4))
    print("\nSum of image(0, 3, :, :)- \n({}, {}, {}, {}, {})".format(sumD0, sumD1, sumD2, sumD3, sumD4))
    print("\n============== after conv =================")
    print("\nSum of image(0, 0:2, :, :)- \n({}, {}, {}, {}, {})".format((sumA0 + sumB0), (sumA1 + sumB1), (sumA2 + sumB2), (sumA3 + sumB3), (sumA4 + sumB4)))
    print("\nSum of image(0, 1:3, :, :)- \n({}, {}, {}, {}, {})".format((sumB0 + sumC0), (sumB1 + sumC1), (sumB2 + sumC2), (sumB3 + sumC3), (sumB4 + sumC4)))
    print("\nSum of image(0, 2:4, :, :)- \n({}, {}, {}, {}, {})".format((sumC0 + sumD0), (sumC1 + sumD1), (sumC2 + sumD2), (sumC3 + sumD3), (sumC4 + sumD4)))
    print("\nSum of image(0, 3:4, :, :) with zero bottom padding- \n({}, {}, {}, {}, {})".format(sumD0, sumD1, sumD2, sumD3, sumD4))
    print("\nThe result after conv2d: \n{}".format(sess.run(result)))
If you want to see more test results, visit here.