3 Deep Learning algorithms explained in Human Language

by | Feb 25, 2018 | Deep Learning, Machine Learning, Statistics

Deep learning has been the talk of the town in recent years. And with good reason: this subset of machine learning has made impressive inroads in a number of research fields, including facial recognition, speech synthesis and machine translation.

What these fields of research have in common is that they are perceptual problems, linked to our senses and our expression. They have therefore long represented a real challenge for researchers, as it is extremely difficult to translate sight or voice using algorithms and mathematical formulas.

As a result, the first models to be implemented in these fields relied heavily on domain expertise: decomposition into phonemes for speech recognition, grammatical and syntactic rules for machine translation. Years of research were devoted to exploiting and transforming this unstructured data in such a way as to derive meaning from it.

The problem is that these hand-crafted representations of the data generalize poorly to arbitrary text, images or sounds. If you used Google Translate before it moved to neural networks in late 2016, you'll remember the obvious limitations back then.

In deep learning, the network works directly on the raw data, without any prior transformation or aggregation. Then, through an extremely large number of parameters that adjust themselves during training, the network learns the implicit links that exist in the data on its own.

Before going into detail about three different algorithms* used in deep learning for different use cases, let's start by simply defining the model at the heart of deep learning: the "neural network".

*These are also referred to as network architectures.

1. Neural Networks

Let's face it, neural networks have very little to do with the nervous system and the brain. The analogy between a neuron and a single-layer neural network is essentially graphical, in that there is a flow of information from one end of the network to the other.

neural network vs neuron

The first layer of a neural network is the input layer. This is where the data you have will come in. Before you can "feed" the network, you'll need to transform your data into numbers, if it isn't already.

Let's take the example of the sentiment analysis of a text.

You have 10,000 comments on your site about products sold:

- "I loved it, very good taste";
- "I didn't like the packaging very much";
- "I thought it was pretty good".

Together with your team, you label 1,000 of them (we'll see that you can also rely on pre-trained neural networks) into 3 classes (satisfied | neutral | dissatisfied). Three classes is a common choice in sentiment analysis, but it is only an example, and you can define more.

The final layer, known as the output, will provide you with the "satisfied / neutral / dissatisfied" classification.

And all the layers between the input and output layers, the so-called "hidden" layers, are different representations of the data. A representation might be the number of words in a sentence, the number of punctuation marks (?!) in a sentence, etc. You don't need to tell the network what these representations are; if statistically they help to classify sentences correctly, the network will learn them on its own.
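To make the three kinds of layer concrete, here is a minimal sketch in plain Python. It turns a comment into numbers (a simple word count), passes it through one hidden layer, and produces one probability per class. The vocabulary and all the weights are hand-picked assumptions for illustration only; a real network would learn its weights from the labeled comments.

```python
import math

VOCABULARY = ["loved", "good", "didn't"]          # hypothetical, tiny on purpose
CLASSES = ["satisfied", "neutral", "dissatisfied"]

# Hand-picked illustrative weights (assumption: a trained network would learn these).
HIDDEN_W = [[1.0, 1.0, 0.0],    # hidden unit 0 reacts to positive words
            [0.0, 0.0, 1.0]]    # hidden unit 1 reacts to a negation
HIDDEN_B = [0.0, 0.0]
OUTPUT_W = [[2.0, -2.0],        # satisfied
            [0.0, 0.0],         # neutral
            [-2.0, 4.0]]        # dissatisfied
OUTPUT_B = [0.0, 2.5, 0.0]


def bag_of_words(text, vocabulary):
    """Input layer: count how often each vocabulary word appears in the text."""
    cleaned = "".join(ch if ch.isalnum() or ch == "'" else " " for ch in text.lower())
    words = cleaned.split()
    return [words.count(term) for term in vocabulary]


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum of the inputs plus a bias per unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Keep positive values, replace negative ones with zero."""
    return [max(0.0, v) for v in values]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text):
    """Run the input, hidden and output layers and return (label, probabilities)."""
    x = bag_of_words(text, VOCABULARY)
    hidden = relu(dense(x, HIDDEN_W, HIDDEN_B))
    probabilities = softmax(dense(hidden, OUTPUT_W, OUTPUT_B))
    return CLASSES[probabilities.index(max(probabilities))], probabilities


for comment in ["I loved it, very good taste",
                "I thought it was pretty good",
                "I didn't like the packaging very much"]:
    print(comment, "->", classify(comment)[0])
```

With these made-up weights the three example comments fall into three different classes, but nothing here has been learned: the point is only to show where the data enters, where an intermediate representation lives, and where the classification comes out.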

simple neural network

To illustrate these layers, let's take another example: estimating the price of a house.
As we can see, we take 4 variables as inputs: the surface area of the house, the number of bedrooms, the zip code and the degree of affluence of the neighborhood. The output is not a classification but a prediction of a number: the price of the house. This is a regression problem.
The words in italics refer to examples of the representations that the neural network will build from the data after seeing a large number of examples.

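The network parameters are updated through a process called "backpropagation": the prediction error is propagated from the output back through the layers, and each parameter is nudged in the direction that reduces that error. The more hidden layers there are in the network, the "deeper" it is, hence the name "deep" learning.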
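For readers who want to see backpropagation at work, here is a minimal sketch in plain Python on a toy version of the house price problem. Everything specific in it is an assumption made for illustration: the four made-up houses, the rescaled units, the choice of 3 hidden units with a tanh activation, the learning rate and the number of passes over the data.

```python
import math
import random


def train_tiny_network(samples, targets, hidden_units=3,
                       learning_rate=0.02, epochs=1000, seed=0):
    """Train a 1-hidden-layer regression network with backpropagation."""
    rng = random.Random(seed)
    n_inputs = len(samples[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(hidden_units)]
    b1 = [0.0] * hidden_units
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_units)]
    b2 = 0.0

    def forward(x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(w1, b1)]
        output = sum(w * h for w, h in zip(w2, hidden)) + b2
        return hidden, output

    def mean_squared_error():
        return sum((forward(x)[1] - t) ** 2
                   for x, t in zip(samples, targets)) / len(samples)

    history = [mean_squared_error()]
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            hidden, output = forward(x)
            # Backward pass: start from the error at the output...
            d_output = 2 * (output - target)
            for j in range(hidden_units):
                # ...and push it back through tanh (derivative: 1 - tanh^2).
                d_hidden = d_output * w2[j] * (1 - hidden[j] ** 2)
                w2[j] -= learning_rate * d_output * hidden[j]
                for i in range(n_inputs):
                    w1[j][i] -= learning_rate * d_hidden * x[i]
                b1[j] -= learning_rate * d_hidden
            b2 -= learning_rate * d_output
        history.append(mean_squared_error())
    return (lambda x: forward(x)[1]), history


# Made-up houses: (surface in hundreds of m2, bedrooms divided by 5).
SAMPLES = [(0.5, 0.2), (0.8, 0.4), (1.2, 0.6), (1.5, 0.8)]
# Made-up prices, in hundreds of thousands.
TARGETS = [1.0, 1.6, 2.4, 3.0]

predict, history = train_tiny_network(SAMPLES, TARGETS)
print(f"error before training: {history[0]:.3f}, after: {history[-1]:.3f}")
```

The list `history` records the average squared error after each pass over the data, which is a simple way to check that the updates are actually reducing the error.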

Let's take a look at 3 different types of neural network architecture.

2. Convolutional neural networks (CNN)


These networks are used for everything to do with images and video, including facial recognition and image classification.

Baidu (China's equivalent of Google), for example, has set up gates operated by visual recognition that let only its employees through.

Snapchat and many other mobile applications have used the breakthrough of machine learning and CNNs to enhance their "filter" features.

convolutional neural network

The name convolutional network refers to a mathematical term: the convolution product.
In simple terms, the idea is to apply a filter to the input image, with the filter's parameters being learned during training. A learned filter will, for example, detect angles in an image if angles help to classify the image correctly.

The image is first decomposed pixel by pixel into its 3 color channels (R, G, B), which gives 3 matrices of size n x n (where n is the number of pixels along each side of a square image).
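As a small illustration of this decomposition, here is a sketch in plain Python. It assumes the image is stored as a grid of (r, g, b) tuples, which is only one of several possible layouts.

```python
def split_channels(image):
    """Split an n x n grid of (r, g, b) pixels into three n x n matrices."""
    return tuple([[pixel[k] for pixel in row] for row in image] for k in range(3))


# A made-up 2 x 2 image: red, green, blue and a dark grey-blue pixel.
image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (10, 20, 30)]]

red, green, blue = split_channels(image)
print(red)
```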

Here is an example of convolution with a matrix of size 6 x 6:


neural network convolution
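The operation itself fits in a few lines. The sketch below, in plain Python, slides a 3 x 3 filter over a 6 x 6 matrix, one position at a time and without padding, which gives a 4 x 4 result. The input values and the filter (a simple vertical edge detector) are assumptions chosen for illustration, not the ones from the figure. Note that, like most deep learning libraries, it does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding, no kernel flip)."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kernel_h) for j in range(kernel_w))
             for c in range(out_w)]
            for r in range(out_h)]


# Made-up 6 x 6 channel: bright on the left half, dark on the right half.
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]

# Hand-written vertical edge filter (a CNN would learn such values itself).
kernel = [[1, 0, -1] for _ in range(3)]

for row in convolve2d(image, kernel):
    print(row)   # each row is [0, 30, 30, 0]
```

The result is non-zero only where the filter overlaps the boundary between the bright and dark halves, which is exactly what "detecting" a vertical edge means here.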

Convolutional networks have two important inherent advantages:

  • the network can learn to recognize the characteristic elements of an image in stages. To recognize a face, for example, it first learns to recognize eyelids and pupils, and then eyes;
  • once an element has been learned at one point in the image, the network will be able to recognize it anywhere else in the image.
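The second advantage follows from the fact that the same filter is applied at every position. The sketch below, in plain Python, places a small made-up "corner" pattern at two different spots of an empty 6 x 6 image and checks where a filter with the same shape responds most strongly. The pattern, the image size and the helper names are all assumptions for illustration.

```python
def filter_response(image, kernel):
    """Apply the same kernel at every position (stride 1, no padding)."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kernel_h) for j in range(kernel_w))
             for c in range(len(image[0]) - kernel_w + 1)]
            for r in range(len(image) - kernel_h + 1)]


def strongest_response(image, kernel):
    """Return the (row, column) where the filter responds most strongly."""
    response = filter_response(image, kernel)
    _, row, col = max((value, r, c)
                      for r, line in enumerate(response)
                      for c, value in enumerate(line))
    return row, col


def image_with_pattern(size, pattern, top, left):
    """Build an empty size x size image and paste the pattern at (top, left)."""
    image = [[0] * size for _ in range(size)]
    for i, line in enumerate(pattern):
        for j, value in enumerate(line):
            image[top + i][left + j] = value
    return image


CORNER = [[1, 1],
          [1, 0]]   # made-up pattern, also used as the filter

print(strongest_response(image_with_pattern(6, CORNER, 1, 1), CORNER))  # (1, 1)
print(strongest_response(image_with_pattern(6, CORNER, 3, 2), CORNER))  # (3, 2)
```

Nothing had to be relearned when the pattern moved: the filter's values are shared across the whole image, so its strongest response simply moves with the pattern.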


3. Recurrent neural networks (RNN)


Recurrent neural networks are at the heart of a number of substantial improvements in fields as diverse as speech recognition, automatic music composition, sentiment analysis, DNA sequence analysis and machine translation.

The main difference from other neural networks is that they take into account the order in which the data arrives, often over time. For example, when analyzing a series of sensor measurements (a time series), the network still has all or part of the previous observations in memory.

A diagram of this network is shown here:


recurrent neural network

Instead of treating each input separately (as a CNN does when it analyzes a video frame by frame), the recurrent network also takes past inputs into account.
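This "memory" can be sketched with a single number that is carried from one step to the next. The example below is plain Python and deliberately tiny: one input value per step, one hidden value, and hand-picked weights, all of which are assumptions for illustration (a real recurrent network uses vectors and learns its weights).

```python
import math


def rnn_step(x, h_prev, w_x=1.0, w_h=0.5, b=0.0):
    """Combine the current input with the previous hidden state."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(sequence, **weights):
    """Process a sequence step by step and return the hidden state after each step."""
    h = 0.0
    states = []
    for x in sequence:
        h = rnn_step(x, h, **weights)
        states.append(h)
    return states


# A signal at the first step, then nothing.
print(run_rnn([1.0, 0.0, 0.0]))

# Same input, but with the recurrent connection switched off.
print(run_rnn([1.0, 0.0, 0.0], w_h=0.0))
```

With the recurrent weight `w_h` active, the last state is still positive even though the last two inputs are zero: the network "remembers" the first input. With `w_h` set to zero, each step sees only its own input and the trace of the first one is lost immediately.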

Some architectures, known as bidirectional, can also take future data into account. For example, when analyzing text to find named entities (names of people, companies, countries, etc.), you need to see the words in the whole sentence.

For example:

  • "I see [Jean] Valjean has escaped you again, Javert!"
  • "I see that [Jean] R. is also starring in the adaptation of Les Misérables."

The beginning of the sentence is not enough to identify who "Jean" is.
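A bidirectional network can be sketched as two passes over the same sequence, one from left to right and one from right to left, whose states are then paired up position by position. The example below is plain Python with one number per word and hand-picked weights, both assumptions for illustration. The two sequences share the same beginning and differ only at the end, like the two "Jean" sentences.

```python
import math


def run_direction(sequence, w_x=1.0, w_h=0.5):
    """Simple recurrent pass: return the hidden state after each element."""
    h, states = 0.0, []
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


def bidirectional_states(sequence):
    """Pair, for each position, a summary of the past and a summary of the future."""
    forward = run_direction(sequence)
    backward = run_direction(sequence[::-1])[::-1]
    return list(zip(forward, backward))


same_start_a = [1.0, 0.0, 1.0]
same_start_b = [1.0, 0.0, 0.0]

first_a = bidirectional_states(same_start_a)[0]
first_b = bidirectional_states(same_start_b)[0]
print(first_a)
print(first_b)
```

At the first position the forward state is identical for both sequences, since nothing before it differs. Only the backward state, which has already "read" the rest of the sequence, tells them apart.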


4. Autoencoders


Autoencoders are mainly used for anomaly detection (e.g. to detect bank fraud or to find anomalies in an industrial production line). They can also be used for dimension reduction (similar to Principal Component Analysis). In fact, the aim of autoencoders is to teach the machine what "normal" observations consist of.
The architecture of our network is as follows:


The network will therefore represent the data using one or more hidden layers, so that the output will contain the same data as the input.

The objective of finding the same data at the output as at the input is characteristic of autoencoders (analogous to the identity function f(x)=x).
The encoding and decoding phases are not unique to autoencoders. In fact, they are also found in machine translation with recurrent neural networks.

After training the network with sufficient data, it becomes possible to flag an observation as suspicious or abnormal when the gap between its input and its reconstructed output exceeds a certain threshold relative to the learned "norm".
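Here is a minimal sketch of this idea in plain Python. It uses the simplest possible autoencoder: two inputs, a single hidden value, two outputs and no activation function, which makes it closely related to Principal Component Analysis. The "normal" data (two measurements that move together), the learning rate, the number of passes and the rule for the threshold (3 times the worst error seen on normal data) are all assumptions made for illustration.

```python
import random


def reconstruct(x, encoder, decoder):
    """Encode x into a single number, then decode it back to the input size."""
    code = sum(e * xi for e, xi in zip(encoder, x))
    return [d * code for d in decoder]


def reconstruction_error(x, encoder, decoder):
    """Squared distance between an observation and its reconstruction."""
    return sum((r - xi) ** 2 for r, xi in zip(reconstruct(x, encoder, decoder), x))


def train_linear_autoencoder(data, learning_rate=0.1, epochs=2000, seed=0):
    """Gradient descent on the average reconstruction error."""
    rng = random.Random(seed)
    dim = len(data[0])
    encoder = [rng.uniform(0.1, 0.5) for _ in range(dim)]
    decoder = [rng.uniform(0.1, 0.5) for _ in range(dim)]
    for _ in range(epochs):
        grad_enc = [0.0] * dim
        grad_dec = [0.0] * dim
        for x in data:
            code = sum(e * xi for e, xi in zip(encoder, x))
            residual = [d * code - xi for d, xi in zip(decoder, x)]
            # Error sent back from the output to the hidden value.
            back = sum(r * d for r, d in zip(residual, decoder))
            for i in range(dim):
                grad_dec[i] += 2 * residual[i] * code / len(data)
                grad_enc[i] += 2 * back * x[i] / len(data)
        encoder = [e - learning_rate * g for e, g in zip(encoder, grad_enc)]
        decoder = [d - learning_rate * g for d, g in zip(decoder, grad_dec)]
    return encoder, decoder


# Made-up "normal" observations: the two measurements rise and fall together.
NORMAL = [(-1.0, -0.95), (-0.5, -0.55), (0.0, 0.05), (0.5, 0.45), (1.0, 1.05)]

encoder, decoder = train_linear_autoencoder(NORMAL)
threshold = 3 * max(reconstruction_error(x, encoder, decoder) for x in NORMAL)


def is_anomaly(x):
    """Flag observations that the network cannot reconstruct well."""
    return reconstruction_error(x, encoder, decoder) > threshold


print(is_anomaly((0.6, 0.6)))    # follows the usual pattern
print(is_anomaly((1.0, -1.0)))   # the two measurements disagree
```

Because the hidden layer holds only one number, the network can only reproduce observations that follow the pattern it saw during training. An observation that breaks the pattern comes out badly reconstructed, and that gap is what gets compared to the threshold.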



We have seen 3 main types of neural networks:

  1. convolutional networks with applications in facial recognition and image classification;
  2. recurrent networks with applications in text and voice analysis;
  3. autoencoders with applications in anomaly detection and dimension reduction.

Other architectures exist, such as GANs (generative adversarial networks), which consist of one model that generates candidates for a given task, such as synthesizing an image, and another that evaluates them. There is also reinforcement learning, the method used by DeepMind to train its AlphaGo and AlphaGo Zero models.

Of course, there are limits: for example, convolutional networks can be fooled by adding noise to images that is undetectable to the human eye, but which can be fatal to a model that has not been tested thoroughly enough for robustness. New architectures such as capsule networks have emerged to tackle this particular problem.


There's no doubt that deep learning still has a bright future ahead of it, with many new applications for businesses to come.

Gaël Bonnardot,

Cofounder and CTO at Datakeen

At Datakeen we aim to simplify the use and understanding of new machine learning paradigms by business functions across all industries.

Contact us for more information: contact@datakeen.co


