/*******************************************************************************
 * Copyright (c) 2015-2018 Skymind, Inc.
 *
 * This program and the accompanying materials are made available under the
 * terms of the Apache License, Version 2.0 which is available at
 * https://www.apache.org/licenses/LICENSE-2.0.
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
 * License for the specific language governing permissions and limitations
 * under the License.
 *
 * SPDX-License-Identifier: Apache-2.0
 ******************************************************************************/

package org.deeplearning4j.nn.conf;

/**
 * ConvolutionMode defines how convolution operations should be executed for Convolutional and Subsampling layers,
 * for a given input size and network configuration (specifically stride/padding/kernel sizes).<br>
 * Currently, 4 modes are provided:
 * <br>
 * <br>
 * <b>Strict</b>: Output sizes for Convolutional and Subsampling layers are calculated as follows, in each dimension:
 * outputSize = (inputSize - kernelSize + 2*padding) / stride + 1. If outputSize is not an integer, an exception will
 * be thrown during network initialization or forward pass.
 * <br>
 * <br>
 * <br>
 * <b>Truncate</b>: Output sizes for Convolutional and Subsampling layers are calculated in the same way as in Strict
 * (that is, outputSize = (inputSize - kernelSize + 2*padding) / stride + 1) in each dimension.<br>
 * If outputSize is an integer, then Strict and Truncate are identical. However, if outputSize is <i>not</i> an integer,
 * the output size will be rounded down to an integer value.<br>
 * Specifically, ConvolutionMode.Truncate implements the following:<br>
 * output height = floor((inputHeight - kernelHeight + 2*paddingHeight) / strideHeight) + 1<br>
 * output width = floor((inputWidth - kernelWidth + 2*paddingWidth) / strideWidth) + 1<br>
 * where 'floor' is the floor operation (i.e., round down to the nearest integer).<br>
 * <br>
 * The major consequence of this rounding down is a border/edge effect when rounding down is required: some number of
 * inputs along the given dimension (height or width) will not be used, and hence some input activations can be
 * lost/ignored. This can be problematic higher in the network (where the cropped activations may represent a
 * significant proportion of the original input), or with large kernel sizes and strides.<br>
 * In the given dimension (height or width), the number of truncated/cropped input values is equal to
 * (inputSize - kernelSize + 2*padding) % stride, where % is the modulus/remainder operation.<br>
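 * <br>
 * As an illustrative sketch only (a hypothetical helper, not part of this class or of the DL4J API), the
 * Strict/Truncate calculation above can be written as:<br>
 * <pre>{@code
 * // Output size in one dimension for Strict/Truncate. Java integer division rounds
 * // toward zero, which equals floor() here since all quantities are non-negative.
 * static int outputSize(int inputSize, int kernelSize, int padding, int stride) {
 *     int numerator = inputSize - kernelSize + 2 * padding;
 *     // Strict throws if (numerator % stride != 0); Truncate rounds down instead
 *     return numerator / stride + 1;
 * }
 *
 * // Example: inputSize = 10, kernelSize = 3, padding = 0, stride = 3
 * //   (10 - 3 + 0) / 3 + 1 = 3   -> Truncate output size (Strict would throw: 7 % 3 != 0)
 * //   (10 - 3 + 0) % 3     = 1   -> one input value in this dimension is cropped/ignored
 * }</pre>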
 * <br>
 * <br>
 * <br>
 * <b>Same</b>: Same mode operates differently from Strict/Truncate in three key ways:<br>
 * (a) Manual padding values in the convolution/subsampling layer configuration are not used; padding values are
 * instead calculated automatically based on the input size, kernel size and strides.<br>
 * (b) The output sizes are calculated differently (see below) compared to Strict/Truncate. Most notably, when
 * stride = 1 the output size is the same as the input size.<br>
 * (c) The calculated padding values may differ for top/bottom and left/right (when they do differ: right and bottom
 * may have 1 pixel/row/column more padding than top/left).<br>
 * The output size of a Convolutional/Subsampling layer using ConvolutionMode.Same is calculated as follows:<br>
 * output height = ceil( inputHeight / strideHeight )<br>
 * output width = ceil( inputWidth / strideWidth )<br>
 * where 'ceil' is the ceiling operation (i.e., round up to the nearest integer).<br>
 * <br>
 * The padding for top/bottom and left/right is automatically calculated as follows:<br>
 * totalHeightPadding = (outputHeight - 1) * strideHeight + filterHeight - inputHeight<br>
 * totalWidthPadding = (outputWidth - 1) * strideWidth + filterWidth - inputWidth<br>
 * topPadding = totalHeightPadding / 2 (note: integer division)<br>
 * bottomPadding = totalHeightPadding - topPadding<br>
 * leftPadding = totalWidthPadding / 2 (note: integer division)<br>
 * rightPadding = totalWidthPadding - leftPadding<br>
 * Note that if the top/bottom padding values differ, then bottomPadding = topPadding + 1
 * <br>
 * <br>
 * <br>
 * <b>Causal</b>: Causal padding mode can only be used for 1D convolutional neural networks.<br>
 * The motivation behind causal padding mode is that the output time steps should depend only on current and past
 * time steps. That is, out[t] (for time t) depends only on values in[T] for T <= t<br>
 * The output size of 1D convolution/subsampling layers is the same as with Same convolution mode -
 * i.e., outSize = ceil( inputSize / stride )<br>
 * Padding is also the same as Same mode, but all padding is on the left (start of sequence) instead of being split
 * between the left and right of the input<br>
 * For more details on causal convolutions, see
 * <a href="https://arxiv.org/abs/1609.03499">WaveNet: A Generative Model for Raw Audio</a>, section 2.1.
 * <br>
 * <br>
 * <br>
 * For further information on output sizes for convolutional neural networks, see the "Spatial arrangement" section at
 * <a href="http://cs231n.github.io/convolutional-networks/">http://cs231n.github.io/convolutional-networks/</a>
 *
 * @author Alex Black
 */
public enum ConvolutionMode {
    Strict, Truncate, Same, Causal
}
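/*
 * Illustrative usage sketch only. This assumes the standard DL4J layer-builder API (ConvolutionLayer.Builder and
 * its convolutionMode(...) method from deeplearning4j-nn); verify the method names against your DL4J version.
 *
 *   ConvolutionLayer layer = new ConvolutionLayer.Builder(3, 3)   // 3x3 kernel
 *           .nIn(1)
 *           .nOut(16)
 *           .stride(2, 2)
 *           .convolutionMode(ConvolutionMode.Same)                // padding is calculated automatically
 *           .build();
 *
 * Worked example for Same mode with a 28x28 input, 3x3 kernel and stride 2 (assumed values):
 *   outputHeight       = ceil(28 / 2)          = 14
 *   totalHeightPadding = (14 - 1) * 2 + 3 - 28 = 1
 *   topPadding         = 1 / 2                 = 0   (integer division)
 *   bottomPadding      = 1 - 0                 = 1   (the extra row of padding goes at the bottom)
 */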