Pads and rearranges overlapping windows of the input volume into column vectors, returning the concatenated padded vectors in a matrix `X_col`.

Notes
-----
A NumPy reimagining of MATLAB's ``im2col`` 'sliding' function. Code extended from Andrej Karpathy's ``im2col.py``.
(X, W_shape, pad, stride, dilation=0)
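The shape of the returned matrix can be worked out before calling the function. The sketch below is illustrative only: `im2col_output_shape` is a hypothetical helper, not part of the library, and it assumes the standard convolution output-size formula with an explicit 4-tuple of padding and with `dilation` counted as the number of pixels inserted between kernel elements. The library's own rounding and its handling of `'same'` padding are not shown here and may differ.

```python
def im2col_output_shape(X_shape, W_shape, pad, stride, dilation=0):
    """Hypothetical helper: shape (Q, Z) of X_col for explicit 4-tuple padding."""
    n_ex, in_rows, in_cols, in_ch = X_shape
    fr, fc = W_shape[:2]
    pr1, pr2, pc1, pc2 = pad  # top, bottom, left, right

    # Kernel extent once `dilation` pixels are inserted between elements
    eff_r = fr + (fr - 1) * dilation
    eff_c = fc + (fc - 1) * dilation

    # Assumed: standard output-size formula with floor division
    out_rows = (in_rows + pr1 + pr2 - eff_r) // stride + 1
    out_cols = (in_cols + pc1 + pc2 - eff_c) // stride + 1

    Q = fr * fc * in_ch
    Z = n_ex * out_rows * out_cols
    return Q, Z


# Two 5x5 RGB inputs, 3x3 kernels, one pixel of padding on every side
print(im2col_output_shape((2, 5, 5, 3), (3, 3, 3, 8), (1, 1, 1, 1), stride=1))
# -> (27, 50)
```

Note that `out_ch` does not affect the result: only the kernel's spatial size and the input channel count determine `Q`.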
def im2col(X, W_shape, pad, stride, dilation=0):
    """
    Pads and rearranges overlapping windows of the input volume into column
    vectors, returning the concatenated padded vectors in a matrix `X_col`.

    Notes
    -----
    A NumPy reimagining of MATLAB's ``im2col`` 'sliding' function.

    Code extended from Andrej Karpathy's ``im2col.py``.

    Parameters
    ----------
    X : :py:class:`ndarray <numpy.ndarray>` of shape `(n_ex, in_rows, in_cols, in_ch)`
        Input volume (not padded).
    W_shape : 4-tuple containing `(kernel_rows, kernel_cols, in_ch, out_ch)`
        The dimensions of the weights/kernels in the present convolutional
        layer.
    pad : tuple, int, or 'same'
        The padding amount. If 'same', add padding to ensure that a 2D
        convolution with a kernel of shape `W_shape[:2]` and stride `stride`
        produces an output volume of the same dimensions as the input. If
        2-tuple, specifies the number of padding rows and columns to add *on
        both sides* of the rows/columns in `X`. If 4-tuple, specifies the
        number of rows/columns to add to the top, bottom, left, and right of
        the input volume.
    stride : int
        The stride of each convolution kernel.
    dilation : int
        Number of pixels inserted between kernel elements. Default is 0.

    Returns
    -------
    X_col : :py:class:`ndarray <numpy.ndarray>` of shape (Q, Z)
        The reshaped input volume, where:

        .. math::

            Q &= \\text{kernel_rows} \\times \\text{kernel_cols} \\times \\text{in_ch} \\\\
            Z &= \\text{n_ex} \\times \\text{out_rows} \\times \\text{out_cols}
    p : 4-tuple
        The padding `(pr1, pr2, pc1, pc2)` returned by `pad2D`: the padding
        applied along the rows, followed by the padding applied along the
        columns.
    """
    fr, fc, n_in, n_out = W_shape
    s, p, d = stride, pad, dilation
    n_ex, in_rows, in_cols, n_in = X.shape

    # zero-pad the input
    X_pad, p = pad2D(X, p, W_shape[:2], stride=s, dilation=d)
    pr1, pr2, pc1, pc2 = p

    # move channels ahead of the spatial dims: (n_ex, in_ch, rows, cols)
    X_pad = X_pad.transpose(0, 3, 1, 2)

    # get the indices for im2col
    k, i, j = _im2col_indices((n_ex, n_in, in_rows, in_cols), fr, fc, p, s, d)

    X_col = X_pad[:, k, i, j]
    X_col = X_col.transpose(1, 2, 0).reshape(fr * fc * n_in, -1)
    return X_col, p
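
To illustrate why the `(Q, Z)` layout is useful, here is a minimal, self-contained sketch of the same idea. It is not the library's implementation: `im2col_simple` is a hypothetical name, it uses explicit loops instead of the fancy indexing produced by `_im2col_indices`, it supports only symmetric integer padding with no dilation, and the ordering of its rows and columns is an assumption that may differ from the ordering of `X_col` above. What it does show is that, once every window is a column, the whole convolution reduces to a single matrix product.

```python
import numpy as np


def im2col_simple(X, fr, fc, pad=0, stride=1):
    """Loop-based im2col sketch: symmetric int padding, no dilation."""
    n_ex, in_rows, in_cols, in_ch = X.shape
    # Zero-pad only the two spatial axes
    X_pad = np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)))
    out_rows = (in_rows + 2 * pad - fr) // stride + 1
    out_cols = (in_cols + 2 * pad - fc) // stride + 1

    cols = np.empty((fr * fc * in_ch, n_ex * out_rows * out_cols), dtype=X.dtype)
    z = 0
    for n in range(n_ex):
        for r in range(out_rows):
            for c in range(out_cols):
                r0, c0 = r * stride, c * stride
                window = X_pad[n, r0:r0 + fr, c0:c0 + fc, :]  # (fr, fc, in_ch)
                cols[:, z] = window.ravel()  # one window per column
                z += 1
    return cols, (out_rows, out_cols)


rng = np.random.default_rng(0)
X = rng.normal(size=(2, 5, 5, 3))   # (n_ex, in_rows, in_cols, in_ch)
W = rng.normal(size=(3, 3, 3, 4))   # (kernel_rows, kernel_cols, in_ch, out_ch)

X_col, (out_rows, out_cols) = im2col_simple(X, 3, 3, pad=1, stride=1)
print(X_col.shape)  # (27, 50)

# Flatten the kernels in the same (fr, fc, in_ch) order as the windows,
# then compute every output value with one matrix product.
W_row = W.reshape(-1, W.shape[-1]).T                       # (out_ch, Q)
Y = (W_row @ X_col).T.reshape(2, out_rows, out_cols, -1)   # (n_ex, rows, cols, out_ch)

# Reference: slide the kernel over the padded input directly
X_pad = np.pad(X, ((0, 0), (1, 1), (1, 1), (0, 0)))
Y_ref = np.zeros_like(Y)
for r in range(out_rows):
    for c in range(out_cols):
        patch = X_pad[:, r:r + 3, c:c + 3, :]
        Y_ref[:, r, c, :] = np.tensordot(patch, W, axes=([1, 2, 3], [0, 1, 2]))

assert np.allclose(Y, Y_ref)
```

The loop version trades speed for readability. Precomputing index arrays, as `im2col` does, gathers all windows in a single indexing operation.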