org.deeplearning4j.nn.conf.ConvolutionMode.java Source code

Introduction

Here is the source code for org.deeplearning4j.nn.conf.ConvolutionMode.java

Source

/*******************************************************************************
 * Copyright (c) 2015-2018 Skymind, Inc.
 *
 * This program and the accompanying materials are made available under the
 * terms of the Apache License, Version 2.0 which is available at
 * https://www.apache.org/licenses/LICENSE-2.0.
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
 * License for the specific language governing permissions and limitations
 * under the License.
 *
 * SPDX-License-Identifier: Apache-2.0
 ******************************************************************************/

package org.deeplearning4j.nn.conf;

/**
 * ConvolutionMode defines how convolution operations should be executed for Convolutional and Subsampling layers,
 * for a given input size and network configuration (specifically stride/padding/kernel sizes).<br>
 * Currently, 4 modes are provided:
 * <br>
 * <br>
 * <b>Strict</b>: Output size for Convolutional and Subsampling layers is calculated as follows, in each dimension:
 * outputSize = (inputSize - kernelSize + 2*padding) / stride + 1. If outputSize is not an integer, an exception will
 * be thrown during network initialization or forward pass.
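 * <br>
 * As a worked example of the Strict calculation (illustrative values only, not taken from any particular network):
 * <pre>
 * inputSize = 28, kernelSize = 5, padding = 0, stride = 1
 * outputSize = (28 - 5 + 2*0) / 1 + 1 = 24      //an integer, so this configuration is valid under Strict
 * </pre>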
 * <br>
 * <br>
 * <br>
 * <b>Truncate</b>: Output size for Convolutional and Subsampling layers is calculated in the same way as in Strict (that
 * is, outputSize = (inputSize - kernelSize + 2*padding) / stride + 1) in each dimension.<br>
 * If outputSize is an integer, then Strict and Truncate are identical. However, if outputSize is <i>not</i> an integer,
 * the output size will be rounded down to an integer value.<br>
 * Specifically, ConvolutionMode.Truncate implements the following:<br>
 * output height = floor((inputHeight - kernelHeight + 2*paddingHeight) / strideHeight) + 1.<br>
 * output width = floor((inputWidth - kernelWidth + 2*paddingWidth) / strideWidth) + 1.<br>
 * where 'floor' is the floor operation (i.e., round down to the nearest integer).<br>
 * <br>
 * The major consequence of this rounding down: a border/edge effect will be seen if/when rounding down is required.
 * In effect, some number of inputs along the given dimension (height or width) will not be used as input and hence
 * some input activations can be lost/ignored. This can be problematic higher in the network (where the cropped activations
 * may represent a significant proportion of the original input), or with large kernel sizes and strides.<br>
 * In the given dimension (height or width) the number of truncated/cropped input values is equal to
 * (inputSize - kernelSize + 2*padding) % stride. (where % is the modulus/remainder operation).<br>
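 * <br>
 * As a worked example of the difference between Strict and Truncate (illustrative values only):
 * <pre>
 * inputSize = 10, kernelSize = 3, padding = 0, stride = 3
 * (10 - 3 + 2*0) / 3 + 1 = 3.33...              //not an integer: Strict mode would throw an exception
 * Truncate: floor(7 / 3) + 1 = 2 + 1 = 3        //output size 3
 * cropped inputs: (10 - 3 + 2*0) % 3 = 1        //one input value along this dimension is ignored
 * </pre>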
 * <br>
 * <br>
 * <br>
 * <b>Same</b>: Same mode operates differently to Strict/Truncate, in three key ways:<br>
 * (a) Manual padding values in convolution/subsampling layer configuration are not used; padding values are instead calculated
 *     automatically based on the input size, kernel size and strides.<br>
 * (b) The output sizes are calculated differently (see below) compared to Strict/Truncate. Most notably, when stride = 1
 *     the output size is the same as the input size.<br>
 * (c) The calculated padding values may differ for top/bottom and left/right (when they do differ: right and bottom
 *     may have 1 pixel/row/column more than top/left padding)<br>
 * The output size of a Convolutional/Subsampling layer using ConvolutionMode.Same is calculated as follows:<br>
 * output height = ceil( inputHeight / strideHeight )<br>
 * output width = ceil( inputWidth / strideWidth )<br>
 * where 'ceil' is the ceiling operation (i.e., round up to the nearest integer).<br>
 * <br>
 * The padding for top/bottom and left/right are automatically calculated as follows:<br>
 * totalHeightPadding = (outputHeight - 1) * strideHeight + filterHeight - inputHeight<br>
 * totalWidthPadding =  (outputWidth - 1) * strideWidth + filterWidth - inputWidth<br>
 * topPadding = totalHeightPadding / 2      (note: integer division)<br>
 * bottomPadding = totalHeightPadding - topPadding<br>
 * leftPadding = totalWidthPadding / 2      (note: integer division)<br>
 * rightPadding = totalWidthPadding - leftPadding<br>
 * Note that if top/bottom padding differ, then bottomPadding = topPadding + 1
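 * <br>
 * As a worked example of the Same mode calculations (illustrative values only):
 * <pre>
 * inputHeight = 10, filterHeight = 3, strideHeight = 2
 * outputHeight = ceil(10 / 2) = 5
 * totalHeightPadding = (5 - 1) * 2 + 3 - 10 = 1
 * topPadding = 1 / 2 = 0                        //integer division
 * bottomPadding = 1 - 0 = 1                     //bottomPadding = topPadding + 1
 * </pre>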
 * <br>
 * <br>
 * <br>
 * <b>Causal</b>: Causal padding mode can only be used for 1D convolutional neural networks.<br>
 * The motivation behind causal padding mode is that the output time steps depend only on current and past time steps.<br>
 * That is, out[t] (for time t) depends only on values in[T] for T <= t<br>
 * The output size of 1D convolution/subsampling layers is the same as with SAME convolution mode -
 * i.e., outSize = ceil( inputSize / stride )<br>
 * Padding is also the same as SAME mode, but all padding is on the left (start of sequence) instead of being on both
 * left and right of the input<br>
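 * <br>
 * As a worked example of Causal padding (illustrative values only; with stride 1, the total padding from the Same
 * formulas above equals kernelSize - 1, all of which is applied at the start of the sequence):
 * <pre>
 * inputSize = 10, kernelSize = 3, stride = 1
 * outSize = ceil(10 / 1) = 10
 * totalPadding = (10 - 1) * 1 + 3 - 10 = 2      //= kernelSize - 1
 * leftPadding = 2, rightPadding = 0             //all padding at the start (left) of the sequence
 * </pre>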
 * For more details on causal convolutions, see <a href="https://arxiv.org/abs/1609.03499">WaveNet: A Generative Model for Raw Audio</a>,
 * section 2.1.
 * <br>
 * <br>
 * <br>
 * For further information on output sizes for convolutional neural networks, see the "Spatial arrangement" section at
 * <a href="http://cs231n.github.io/convolutional-networks/">http://cs231n.github.io/convolutional-networks/</a>
 *
 * @author Alex Black
 */
public enum ConvolutionMode {

    Strict, Truncate, Same, Causal

}
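
Below is a minimal usage sketch showing how the enum is typically passed to a layer configuration. It assumes the
standard Deeplearning4j builder API (org.deeplearning4j.nn.conf.layers.ConvolutionLayer.Builder and its
convolutionMode(...) setter), which is defined outside this file; the kernel size, stride and channel counts are
illustrative only.

import org.deeplearning4j.nn.conf.ConvolutionMode;
import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;

public class ConvolutionModeUsage {
    public static void main(String[] args) {
        //Configure a 3x3 convolution layer that uses Same mode: padding is computed
        //automatically from the input size, kernel size and stride, so no explicit
        //padding values are set here. (Builder API assumed from the standard DL4J distribution.)
        ConvolutionLayer layer = new ConvolutionLayer.Builder(3, 3)
                .nIn(1)                                  //input channels
                .nOut(16)                                //output channels
                .stride(2, 2)
                .convolutionMode(ConvolutionMode.Same)   //per-layer convolution mode
                .build();

        System.out.println(layer);
    }
}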