
An Overview of LSTMs


In natural language processing, Recurrent Neural Networks (RNNs) play a crucial role. They bridge the gap between previous information and the current task, with applications ranging from analyzing video frames to predicting the next word in a sentence. But can RNNs truly capture long-term dependencies?

Capturing long-term dependencies runs into the problem of vanishing and exploding gradients: during backpropagation through time, the gradient is multiplied by a similar factor at every time step, so it ends up either vanishingly small or extremely large. In practice, an RNN can autocomplete a sentence only when the relevant information appears close to where it is needed. Ex: When there is fire, there is smoke. The gap between "fire" and "smoke" is small.

But when the gap is larger, RNNs struggle to make the prediction because training across such a long span is very difficult. LSTMs were introduced to address these issues.
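To see why long gaps are hard to train across, here is a toy calculation. It is only a sketch: it assumes the gradient is scaled by one fixed number at every time step, which is a simplification of what happens in a real network. The function name gradient_after and the factors 0.5 and 1.5 are chosen purely for illustration.

```python
def gradient_after(steps, factor):
    """Scale of a gradient after being multiplied by `factor` once per time step."""
    grad = 1.0
    for _ in range(steps):
        grad *= factor
    return grad


# A factor below 1 shrinks the gradient, a factor above 1 blows it up.
for factor in (0.5, 1.5):
    for steps in (5, 50):
        scale = gradient_after(steps, factor)
        print(f"factor={factor}, steps={steps}: {scale:.3e}")
```

With a short gap of 5 steps both gradients stay in a usable range, while after 50 steps one has almost disappeared and the other has grown by many orders of magnitude.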

LSTM is a special variant of RNN that, as the name suggests, maintains a long-term memory alongside the short-term memory. This allows it to remember and use past information effectively.

WHAT IS LSTM ?

Long Short-Term Memory (LSTM) networks are a special variant of RNN. Just as you remember what happened a few minutes ago, an LSTM remembers things that happened in the past, even if it was a long time ago. It can understand the meaning of words in a sentence by looking at the words that came before. For example, if I say "I love ice cream because it is delicious," the LSTM remembers that "it" refers to "ice cream" and helps the computer understand the whole sentence.

LSTM is really good at understanding and remembering patterns in information. It can be used for various applications like predicting the next word in a sentence, translating languages, and even controlling robots. It enables computers to understand and analyse sequential data in a way that is similar to what we do.

WHAT DOES AN LSTM LOOK LIKE ?

As you know, LSTM is a special kind of RNN. An RNN consists of a chain of repeating neural network modules, each containing a single tanh layer. It is called recurrent because it performs the same operation on each successive element, with the output depending on the current input and the previous hidden state.
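The repeating tanh module can be sketched in a few lines. This is a minimal scalar version, assuming a single input and a single hidden unit; the names rnn_step and run_rnn and the weight values are hypothetical, picked only to make the example runnable.

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One step of a scalar RNN: h_t = tanh(w_x * x_t + w_h * h_prev + b)."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)


def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Apply the same module to every element, carrying the hidden state along."""
    h = 0.0
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states


# Only the first input is non-zero; its influence fades in the later states.
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

The hidden state is the only thing passed from one step to the next, so information from the first input survives only as long as repeated tanh and weight multiplications preserve it.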

RNN Structure :

The key difference between an RNN and an LSTM is that the LSTM passes an additional signal from one cell to the next, known as the cell state. The cell state can carry information across many time steps with little loss, and gates control what is added to it or removed from it.

LSTM structure :

WORKING OF AN LSTM :

Cell state:

The top horizontal line in the diagram is the cell state, also known as the memory cell, and it is a crucial component of LSTM networks. At each time step, the LSTM selectively decides how much new information to store in the cell state, how much old information to forget, and how much of the stored information to output to the next time step or the final prediction. The cell state is updated mainly by two operations: multiplication, which scales down old information, and addition, which writes in new information.
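The two operations on the cell state can be illustrated with hand-picked gate values. This is a sketch under the assumption that the gate values are already known; in a real LSTM they are computed from the input and the previous hidden state. The function name update_cell_state is hypothetical.

```python
def update_cell_state(c_prev, forget, inp, candidate):
    """Element-wise update: old memory times forget gate, plus gated candidate."""
    return [f * c + i * g for c, f, i, g in zip(c_prev, forget, inp, candidate)]


c_prev = [2.0, -1.0]     # old cell state with two memory slots
forget = [1.0, 0.0]      # keep slot 0 entirely, erase slot 1
inp = [0.0, 1.0]         # write nothing to slot 0, write fully to slot 1
candidate = [0.9, 0.5]   # new values proposed for each slot

c_t = update_cell_state(c_prev, forget, inp, candidate)
print(c_t)
```

Slot 0 passes through unchanged, while slot 1 is overwritten by its candidate value.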

Let’s understand LSTM step-by-step using an example :

Peacock is the national bird of India whereas the panda and red-crowned crane are the national animal and bird of China.

1) Forget Gate: It applies a sigmoid function to the present input x(t) and the previous hidden state h(t-1), producing a value between 0 and 1 for each entry of the cell state. A value of 0 means the corresponding information should be forgotten completely, while a value of 1 means it should be retained entirely.

Here, in our example, when a panda is encountered we want to forget the memory of the peacock because it is no longer needed.
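A scalar sketch of the forget gate shows how the sigmoid turns the input into a keep-or-drop decision. The weights below are hypothetical and set by hand to force each outcome; a trained network would learn them from data.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forget_gate(h_prev, x_t, w_h, w_x, b):
    """Scalar forget gate: f_t = sigmoid(w_h * h_prev + w_x * x_t + b)."""
    return sigmoid(w_h * h_prev + w_x * x_t + b)


# Hand-picked weights (assumptions): a positive input weight keeps the memory,
# a negative one erases it.
keep = forget_gate(h_prev=0.2, x_t=1.0, w_h=1.0, w_x=5.0, b=0.0)
drop = forget_gate(h_prev=0.2, x_t=1.0, w_h=1.0, w_x=-5.0, b=0.0)

old_memory = 0.8
print("kept:", keep * old_memory)
print("dropped:", drop * old_memory)
```

In the terms of the example sentence, the second case corresponds to the gate closing on the peacock memory once the panda appears.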

2) Input Gate: This gate uses a sigmoid function to decide which values of the cell state should be updated. Separately, a tanh layer produces a vector of candidate values. The candidate is scaled by the input gate and added to the cell state as new information.

In our example, the memory of the peacock is replaced by the memory of the panda, which is the new candidate. In the next iteration, the memory of the red-crowned crane is added, so the cell state now stores both the panda and the red-crowned crane.
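The write step can be sketched in scalar form as well: the sigmoid gate decides how much to write, and the tanh candidate decides what to write. The function name input_gate_update and the weight triples are hypothetical, and the old memory is assumed to have been cleared already by the forget gate.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def input_gate_update(h_prev, x_t, gate_w, cand_w):
    """Return (i_t, c_tilde, i_t * c_tilde) for scalar h_prev and x_t.

    gate_w and cand_w are hypothetical (w_h, w_x, bias) triples.
    """
    gw_h, gw_x, g_b = gate_w
    cw_h, cw_x, c_b = cand_w
    i_t = sigmoid(gw_h * h_prev + gw_x * x_t + g_b)        # how much to write
    c_tilde = math.tanh(cw_h * h_prev + cw_x * x_t + c_b)  # what to write
    return i_t, c_tilde, i_t * c_tilde


i_t, c_tilde, added = input_gate_update(
    h_prev=0.1, x_t=1.0, gate_w=(0.5, 4.0, 0.0), cand_w=(0.5, 2.0, 0.0)
)

c_prev = 0.0          # assumption: the forget gate has erased the old memory
c_t = c_prev + added  # the gated candidate is added to the cell state
print(i_t, c_tilde, c_t)
```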

3) Output Gate: The output gate uses a sigmoid activation function to decide how much of the information stored in the memory cell is relevant for the output. It produces a value between 0 and 1, where 0 means the information is not relevant for the output and 1 means it is entirely relevant. The cell state is passed through the tanh function to compress its values between -1 and 1, and the result is multiplied by the output gate to give the hidden state h(t).

In our example, once the LSTM has seen the panda and the red-crowned crane as input, the output gate ensures that the autocomplete output generated is China.
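The read-out step can be sketched the same way: tanh squashes the cell state, and the output gate decides how much of it is exposed as the hidden state. The function name output_step and the weights are hypothetical, chosen to show one open gate and one closed gate over the same cell state.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def output_step(c_t, h_prev, x_t, w_h, w_x, b):
    """Return (o_t, h_t) where h_t = o_t * tanh(c_t), all scalars."""
    o_t = sigmoid(w_h * h_prev + w_x * x_t + b)
    h_t = o_t * math.tanh(c_t)
    return o_t, h_t


# Same stored memory, two different gate decisions (weights are assumptions).
o_open, h_open = output_step(c_t=3.0, h_prev=0.0, x_t=1.0, w_h=0.0, w_x=6.0, b=0.0)
o_shut, h_shut = output_step(c_t=3.0, h_prev=0.0, x_t=1.0, w_h=0.0, w_x=-6.0, b=0.0)
print("open gate:", h_open)
print("closed gate:", h_shut)
```

The memory itself is untouched in both cases; only the amount revealed to the next step and to the prediction changes.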

COMPUTATIONS IN EACH STAGE :

Figure: Block diagram of the LSTM recurrent neural network cell unit. Source: ResearchGate

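Here $[h_{t-1}, x_t]$ denotes the concatenation of the previous hidden state and the current input, $\sigma$ is the sigmoid function, and $\odot$ is element-wise multiplication.

Forget Gate: $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$

Input Gate: $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$

Candidate: $\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$

Output Gate: $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$

Cell state: $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$

Hidden state: $h_t = o_t \odot \tanh(c_t)$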
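Putting the six computations together gives one full LSTM step. The sketch below uses plain Python lists and assumes that the previous hidden state and the current input are joined by concatenation before the weights are applied, as in the standard formulation. The helper names, the dictionary keys such as "Wf", and the constant weights of 0.1 are all hypothetical; a real implementation would use a numerical library and learned weights.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_i for row, b_i in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state and cell state."""
    z = h_prev + x_t  # list concatenation [h_prev, x_t], not numeric addition
    f_t = [sigmoid(a) for a in affine(params["Wf"], z, params["bf"])]
    i_t = [sigmoid(a) for a in affine(params["Wi"], z, params["bi"])]
    c_tilde = [math.tanh(a) for a in affine(params["Wc"], z, params["bc"])]
    o_t = [sigmoid(a) for a in affine(params["Wo"], z, params["bo"])]
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Hypothetical sizes and untrained constant weights, for illustration only.
hidden_size, input_size = 2, 3
params = {}
for gate in ("f", "i", "c", "o"):
    params["W" + gate] = [[0.1] * (hidden_size + input_size) for _ in range(hidden_size)]
    params["b" + gate] = [0.0] * hidden_size

h = [0.0] * hidden_size
c = [0.0] * hidden_size
for x_t in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]):
    h, c = lstm_step(x_t, h, c, params)
print(h, c)
```

Each weight matrix has hidden_size rows and hidden_size + input_size columns, which is what the concatenation requires.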

APPLICATIONS OF LSTM :

LSTM networks have shown strong performance in a wide range of applications by effectively modelling sequential data and capturing long-term dependencies. A few examples are :

Language Translation – Ex: English to Hindi translation
Speech Recognition – Ex: voice assistants
Sentiment Analysis
Text Generation – Ex: chatbots
Handwriting Recognition – Ex: convert handwritten text to digital format
Music Composition
Anomaly Detection – Ex: Fraud detection
Time Series Prediction – Ex: Stock market prediction, Weather forecasting

CONCLUSION :

In this article, we explored the inner workings of LSTM, covering its architecture, its computations and the key components that earned it a place in the world of sequential learning. We also saw that tasks previously handled by RNNs can now be handled by LSTMs.



By Soumya G
