High-level overview of specialized architectures for images and sequences.
While standard feedforward neural networks are powerful, specialized architectures have been developed to handle specific types of data more effectively. Two of the most important are Convolutional Neural Networks (CNNs) for spatial data such as images, and Recurrent Neural Networks (RNNs) for sequential data such as text or time series.

Convolutional Neural Networks (CNNs) are designed to automatically and adaptively learn spatial hierarchies of features. Instead of connecting every neuron to every neuron in the next layer, CNNs use 'convolutional layers,' which slide a set of learnable filters (or kernels) across the input image. Each filter specializes in detecting a particular feature, such as an edge, a corner, or a texture. As the image passes through the network, earlier layers detect simple features, and later layers combine them to recognize more complex objects. CNNs also use 'pooling layers' to downsample the feature maps, making the representation more compact and robust to small shifts in the input. Because the same filter weights are shared across all spatial positions, and pooling discards small positional differences, CNNs gain a degree of built-in 'translation invariance,' which makes them extremely effective for image classification and object detection.

Recurrent Neural Networks (RNNs) are designed to work with sequences of data. Their key feature is a 'hidden state,' which acts as a memory. As the RNN processes a sequence one element at a time (for example, one word at a time in a sentence), it passes information from the current step to the next via this hidden state. This lets the network retain a memory of earlier elements in the sequence, which is crucial for tasks where context matters, such as language translation or sentiment analysis. Simple RNNs struggle with long-term dependencies, so more advanced variants, the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU), were developed to address this; they formed the basis of natural language processing systems for many years.
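To make the stacked convolution-and-pooling structure concrete, here is a minimal sketch of a small CNN, assuming PyTorch and 28x28 grayscale inputs; the specific layer sizes, channel counts, and the SimpleCNN class name are illustrative assumptions rather than details from the text.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """A minimal CNN: two convolution/pooling stages followed by a linear classifier."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            # Early layers tend to pick up simple features such as edges and textures.
            nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),   # downsample 28x28 -> 14x14
            # Later layers combine simple features into more complex patterns.
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),   # downsample 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)           # shape: (batch, 32, 7, 7)
        x = x.flatten(start_dim=1)     # flatten feature maps for the linear layer
        return self.classifier(x)

model = SimpleCNN()
dummy = torch.randn(4, 1, 28, 28)      # a batch of four 28x28 grayscale images
logits = model(dummy)                   # shape: (4, 10)
```

Note how each pooling layer halves the spatial resolution while the channel count grows, which is one way the network trades positional precision for richer feature representations.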
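To make the role of the hidden state concrete, the following is a minimal NumPy sketch of a vanilla RNN forward pass; the function name rnn_forward and the weight shapes are illustrative assumptions, and in practice one would reach for a library-provided LSTM or GRU layer to cope with long-term dependencies.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence, one element at a time.

    inputs: array of shape (seq_len, input_dim)
    Returns the hidden state at every time step, shape (seq_len, hidden_dim).
    """
    hidden_dim = W_hh.shape[0]
    h = np.zeros(hidden_dim)          # initial hidden state: an "empty memory"
    hidden_states = []
    for x_t in inputs:
        # The new hidden state mixes the current input with the previous
        # hidden state, so information from earlier steps is carried forward.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hidden_states.append(h)
    return np.stack(hidden_states)

# Illustrative dimensions: a sequence of 5 steps with 8-dimensional inputs
# and a 16-dimensional hidden state.
rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 8))
W_xh = rng.normal(scale=0.1, size=(16, 8))
W_hh = rng.normal(scale=0.1, size=(16, 16))
b_h = np.zeros(16)
states = rnn_forward(seq, W_xh, W_hh, b_h)   # shape: (5, 16)
```

The single recurrence line is the whole mechanism: the same weights are reused at every step, and the hidden state is the only channel through which earlier elements of the sequence can influence later ones, which is also why gradients through long sequences shrink or explode and why gated variants such as LSTMs and GRUs were introduced.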