In this video you will learn about convolutional neural networks: why they suit high-dimensional data such as images, what a kernel is, and what convolution is used for.
For data with high dimensionality, for example images (3D tensors overall, but 2D for each RGB channel) but also time series, a single input to a neural network may be represented by thousands of data points. Specifically, a grayscale image of dimensionality 224×224 contains over 50K pixels! Imagine an image classification task. Would each individual pixel be correlated with the subject/class in the picture? Rather, the network should iteratively be trained to associate particular areas of the image that capture at least some part of the subject/class.
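To make the "over 50K pixels" point concrete, here is a minimal sketch comparing the weights a single fully connected unit would need on such an image with the shared weights of one small convolutional kernel (the 3×3 kernel size is an illustrative assumption, not from the text):

```python
# One fully connected unit on a 224x224 grayscale image needs one
# weight per pixel; a convolutional layer reuses one small kernel
# across the whole image instead.
pixels = 224 * 224
print(pixels)          # 50176 weights for a single dense unit

kernel_weights = 3 * 3  # assumed 3x3 kernel, a common choice
print(kernel_weights)   # 9 shared weights for the conv kernel
```

This weight sharing is exactly why convolution scales to large inputs.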
Look at the image below! Segments, or objects if you want, are masked with different colors: e.g. people with a red mask, cars with a blue mask. How is it done? By convolution! If you want to learn more about segmentation, you may be interested in Mask R-CNN, one of the state-of-the-art image segmentation models.
First, convolution reduces the dimensionality of the input data, so-called downsampling! It uses a kernel operator - a small matrix - sliding over the input data. By learning the kernel weights, the network extracts properties of the data/image, like edges.
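The sliding-kernel idea above can be sketched in a few lines of NumPy. This is a minimal "valid" (no padding) 2D convolution: the kernel visits every position, multiplies the overlapping window element-wise, and sums the result. Note that, like most deep learning frameworks, it does not flip the kernel, so strictly speaking it computes a cross-correlation. The 4×4 input and averaging kernel are made-up illustrative values:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over a 2D image; at each position, sum the
    element-wise products of the kernel and the window under it."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A 4x4 input convolved with a 3x3 kernel yields a 2x2 output,
# which is the dimensionality reduction mentioned above.
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0   # simple averaging (blur) kernel
print(conv2d(image, kernel).shape)  # (2, 2)
```

In a CNN the kernel values are not fixed like this blur kernel; they are the weights the network learns during training.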
It's the kernel! The matrix sliding over the matrix (the data). In other words, it is a small matrix applied across an image in order to recognize patterns.
E.g. you can highlight edges by applying a Sobel filter: at each position of the sliding kernel, the overlapping values are multiplied element-wise and summed.
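Here is a small sketch of that Sobel example, using a made-up toy image (a dark left half and a bright right half, i.e. one vertical edge). The horizontal Sobel kernel responds only where the intensity changes:

```python
import numpy as np

# Sobel kernel that responds to horizontal intensity changes
# (i.e. it highlights vertical edges)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# Toy image: dark left half, bright right half -> one vertical edge
img = np.zeros((5, 6))
img[:, 3:] = 1.0

# Slide the kernel: multiply each 3x3 window element-wise and sum
h, w = img.shape
out = np.zeros((h - 2, w - 2))
for i in range(h - 2):
    for j in range(w - 2):
        out[i, j] = np.sum(img[i:i+3, j:j+3] * sobel_x)

print(out)  # nonzero only in the columns crossing the edge
```

Flat regions (all dark or all bright) give 0 because the kernel's columns cancel out; only windows straddling the edge produce a strong response.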