Fundamentals Of Deep Learning – Introduction To Recurrent Neural Networks (updated November 2023, Cancandonuts.com)
Introduction
Let me open this article with a question: does “working love learning we on deep” make any sense to you? Not really. Now read this one: “We love working on deep learning”. It makes perfect sense! A little jumble in the words made the sentence incoherent. Can we expect a neural network to make sense of it? Not really! If the human brain is confused about what it means, a neural network is going to have a tough time deciphering such text.
There are many tasks in everyday life that are completely disrupted when their sequence is disturbed. For instance, in language, as we saw earlier, the sequence of words defines their meaning; in time series data, time defines the occurrence of events; in genome sequence data, every sequence has a different meaning. In all such cases the order of the information determines the event itself. If we want to use such data to produce any reasonable output, we need a network that has access to some prior knowledge about the data to understand it fully. This is where recurrent neural networks come into play.
In this article I assume that you have a basic understanding of neural networks. In case you need a refresher, please go through this article before you proceed.
Table of Contents
Need for a Neural Network dealing with Sequences
What are Recurrent Neural Networks (RNNs)?
Understanding a Recurrent Neuron in Detail
Forward Propagation in a Recurrent Neuron in Excel
Back propagation in a RNN (BPTT)
Implementation of RNN in Keras
Vanishing and Exploding Gradient Problem
Other RNN Architectures
Need for a Neural Network dealing with Sequences
Before we dive into the details of what a recurrent neural network is, let’s consider whether we really need a network specifically for dealing with sequences of information, and what kinds of tasks we can accomplish with such networks.
The beauty of recurrent neural networks lies in their diversity of application. RNNs can deal with a wide variety of input and output types.
Sentiment Classification – This can be a task of simply classifying tweets into positive and negative sentiment. Here the input is a tweet of varying length, while the output is of a fixed type and size.
Image Captioning – Here, let’s say we have an image for which we need a textual description. So we have a single input, the image, and a sequence of words as output. The image might be of a fixed size, but the output is a description of varying length.
Language Translation – This means that we have some text in a particular language, let’s say English, and we wish to translate it into French. Each language has its own semantics and would have a different length for the same sentence. So here the inputs as well as the outputs are of varying lengths.
So RNNs can be used for mapping inputs to outputs of varying types and lengths, and are fairly general in their application. With these applications in mind, let’s see what the architecture of an RNN looks like.
What are Recurrent Neural Networks?
Let’s say the task is to predict the next word in a sentence. Let’s try accomplishing it using an MLP (multilayer perceptron). So what happens in an MLP? In the simplest form, we have an input layer, a hidden layer and an output layer. The input layer receives the input, the hidden layer activations are applied, and then we finally receive the output.
Let’s have a deeper network, where multiple hidden layers are present. So here, the input layer receives the input, the first hidden layer activations are applied and then these activations are sent to the next hidden layer, and successive activations through the layers to produce the output. Each hidden layer is characterized by its own weights and biases.
Since each hidden layer has its own weights and activations, they behave independently. Now the objective is to identify the relationship between successive inputs. Can we supply the inputs to hidden layers? Yes we can!
Here, the weights and biases of these hidden layers are different. Hence each of these layers behaves independently and they cannot be combined. To combine these hidden layers, we give all of them the same weights and bias.
We can now combine these layers, since the weights and bias of all the hidden layers are the same. All these hidden layers can be rolled together into a single recurrent layer.
So it’s like supplying the input to the hidden layer at every step. At all the time steps the weights of the recurrent neuron are the same, since it’s a single neuron now. A recurrent neuron stores the state of the previous input and combines it with the current input, thereby preserving some relationship between the current input and the previous input.
Understanding a Recurrent Neuron in Detail
Let’s take a simple task at first. Let’s take a character level RNN where we have the word “hello”. We provide the first 4 letters, i.e. h, e, l, l, and ask the network to predict the last letter, i.e. ‘o’. So here the vocabulary of the task is just 4 letters {h,e,l,o}. In real scenarios involving natural language processing, the vocabulary can include all the words in the Wikipedia database, or all the words in a language. Here for simplicity we have taken a very small vocabulary.
Let’s see how the above structure can be used to predict the fifth letter in the word “hello”. In the above structure, the blue RNN block applies something called a recurrence formula to the input vector and its previous state. The letter “h” has nothing preceding it, so let’s take the letter “e”. At the time the letter “e” is supplied to the network, the recurrence formula is applied to the letter “e” and the previous state, which comes from the letter “h”. These are known as the time steps of the input. So if at time t the input is “e”, at time t-1 the input was “h”. The recurrence formula is applied to both e and h, and we get a new state.
The formula for the current state can be written as –
ht = f(ht-1, xt)
Here, ht is the new state, ht-1 is the previous state and xt is the current input. We now have a state of the previous input instead of the input itself, because the input neuron has already applied its transformations to our previous input. Each successive input is called a time step.
In this case we have four inputs to give to the network. In the recurrence formula, the same function and the same weights are applied at each time step.
Taking the simplest form of a recurrent neural network, let’s say that the activation function is tanh, the weight at the recurrent neuron is Whh and the weight at the input neuron is Wxh. We can write the equation for the state at time t as –
ht = tanh(Whh·ht-1 + Wxh·xt)
The recurrent neuron in this case takes only the immediately previous state into consideration. For longer sequences the equation can involve multiple such states. Once the final state is calculated we can go on to produce the output.
Now, once the current state is calculated, we can calculate the output as –
yt = Why·ht
where Why is the weight at the output layer.
Let me summarize the steps in a recurrent neuron for you-
A single time step of the input is supplied to the network i.e. xt is supplied to the network
We then calculate its current state using a combination of the current input and the previous state i.e. we calculate ht
The current ht becomes ht-1 for the next time step
We can go as many time steps as the problem demands and combine the information from all the previous states
Once all the time steps are completed the final current state is used to calculate the output yt
The output is then compared to the actual output and the error is generated
The error is then backpropagated through the network to update the weights (we shall go into the details of backpropagation in later sections) and the network is trained
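The steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the sizes, the random weights, the full 3×3 recurrent matrix and the names `rnn_step`, `W_hy` and `b_h` are chosen for the example and are not the values used in the Excel walkthrough.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One time step: h_t = tanh(W_hh @ h_prev + W_xh @ x_t + b_h)."""
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)

rng = np.random.default_rng(0)
vocab_size, hidden_size = 4, 3            # assumed sizes for the "hello" example
W_xh = rng.normal(scale=0.5, size=(hidden_size, vocab_size))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.5, size=(vocab_size, hidden_size))
b_h = np.zeros(hidden_size)

# One-hot inputs for h, e, l, l (vocabulary order: h, e, l, o).
inputs = np.eye(vocab_size)[[0, 1, 2, 2]]

h = np.zeros(hidden_size)                 # no previous state before the first letter
for x_t in inputs:
    # The same function and weights are applied at every time step;
    # the current h_t becomes h_{t-1} for the next step.
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)

y = W_hy @ h                              # output computed from the final state
print(h.shape, y.shape)
```

Because the weights are random and untrained, the values of `y` carry no meaning yet; the point is only the flow of state from one time step to the next.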
Let’s take a look at how we can calculate these states in Excel and get the output.
Forward Propagation in a Recurrent Neuron in Excel
Let’s take a look at the inputs first –
The inputs are one hot encoded. Our entire vocabulary is {h,e,l,o} and hence we can easily one hot encode the inputs.
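As a small illustration of the encoding, here is one way to one-hot encode the four input letters in plain Python. The helper name `one_hot` and the vocabulary order h, e, l, o are assumptions made for this sketch.

```python
vocab = ["h", "e", "l", "o"]                      # assumed ordering of the vocabulary
char_to_index = {ch: i for i, ch in enumerate(vocab)}

def one_hot(ch):
    """Return a list with a 1 at the letter's index and 0 elsewhere."""
    vec = [0] * len(vocab)
    vec[char_to_index[ch]] = 1
    return vec

encoded = [one_hot(ch) for ch in "hell"]          # the four letters fed to the network
for ch, vec in zip("hell", encoded):
    print(ch, vec)
```

Each letter becomes a vector as long as the vocabulary, so the two occurrences of “l” share the same encoding.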
Now the input neuron transforms the input to the hidden state using the weight Wxh. We have randomly initialized the weights as a 3*4 matrix –
Step 1:
Now for the letter “h”, for the hidden state we need Wxh*xt. By matrix multiplication, we get it as –
Step 2:
Now moving to the recurrent neuron, we have Whh as the weight, which is a 1*1 matrix, and the bias, which is also a 1*1 matrix.
For the letter “h”, the previous state is [0,0,0] since there is no letter prior to it.
Step 3:
Now we can get the current state as –
Since for h, there is no previous hidden state we apply the tanh function to this output and get the current state –
Step 4:
Now we go on to the next state. “e” is now supplied to the network. The ht we just computed now becomes ht-1, while the one hot encoded “e” is xt. Let’s now calculate the current state ht.
Whh*ht-1 +bias will be –
Wxh*xt will be –
Step 5:
Now calculating ht for the letter “e”,
Now this would become ht-1 for the next state and the recurrent neuron would use this along with the new character to predict the next one.
Step 6:
At each state, the recurrent neural network would produce the output as well. Let’s calculate yt for the letter e.
Step 7:
The probability for a particular letter from the vocabulary can be calculated by applying the softmax function, so we shall have softmax(yt).
If we read these probabilities as a prediction, the model says that the letter after “e” should be “h”, since the highest probability is for the letter “h”. Does this mean we have done something wrong? No. We have hardly trained the network: we have just shown it two letters, so it has not learnt anything yet.
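To make the softmax step concrete, here is a minimal sketch in plain Python. The scores in `y_t` are made-up numbers chosen so that “h” wins, as in the walkthrough; they are not the values from the Excel sheet.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)                             # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["h", "e", "l", "o"]
y_t = [1.9, 1.1, 0.3, 0.7]                        # hypothetical output scores
probs = softmax(y_t)
predicted = vocab[probs.index(max(probs))]        # letter with the highest probability
print(predicted)
```

Softmax preserves the ordering of the scores, so the predicted letter is simply the one with the largest raw score; the probabilities matter when we compute the loss.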
Now the next BIG question is how back propagation works in a recurrent neural network. How are the weights updated when there is a feedback loop?
Back propagation in a Recurrent Neural Network (BPTT)
It might be a bit of a challenge to imagine how weights are updated in a recurrent neural network. So to understand and visualize the back propagation, let’s unroll the network at all the time steps. In an RNN we may or may not have outputs at each time step.
In forward propagation, the inputs enter and move forward at each time step. In backward propagation, we are figuratively going back in time to change the weights, hence we call it back propagation through time (BPTT).
In an RNN, if yt is the predicted value and ȳt is the actual value, the error is calculated as a cross entropy loss –
Et(ȳt,yt) = – ȳt log(yt)
E(ȳ,y) = – ∑ ȳt log(yt)
We typically treat the full sequence (word) as one training example, so the total error is just the sum of the errors at each time step (character). The weights, as we can see, are the same at each time step. Let’s summarize the steps for backpropagation:
The cross entropy error is first computed using the current output and the actual output
Remember that the network is unrolled for all the time steps
For the unrolled network, the gradient is calculated for each time step with respect to the weight parameter
Since the weights are the same for all the time steps, the gradients can be combined across all time steps
The weights are then updated for both the recurrent neuron and the dense layers
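The steps above can be sketched for a tiny vanilla RNN in NumPy. This is a hedged illustration rather than a full derivation: it assumes an output at every time step, a softmax with cross entropy loss, no bias terms, and made-up sizes and weights. The function names `forward` and `bptt` are chosen for this example.

```python
import numpy as np

def forward(params, inputs, targets):
    """Unrolled forward pass; returns the summed cross entropy loss, states and probabilities."""
    W_xh, W_hh, W_hy = params
    h = np.zeros(W_hh.shape[0])
    hs, ps, loss = [h], [], 0.0               # hs[0] is the initial state, hs[t + 1] is h_t
    for x, target in zip(inputs, targets):
        h = np.tanh(W_xh @ x + W_hh @ h)
        scores = W_hy @ h
        p = np.exp(scores - scores.max())
        p /= p.sum()                          # softmax probabilities
        loss -= np.log(p[target])             # cross entropy at this time step
        hs.append(h)
        ps.append(p)
    return loss, hs, ps

def bptt(params, inputs, targets):
    """Walk back through the unrolled steps, summing each weight's gradient over time."""
    W_xh, W_hh, W_hy = params
    loss, hs, ps = forward(params, inputs, targets)
    dW_xh, dW_hh, dW_hy = (np.zeros_like(W) for W in params)
    dh_next = np.zeros(W_hh.shape[0])
    for t in reversed(range(len(inputs))):
        dy = ps[t].copy()
        dy[targets[t]] -= 1.0                 # gradient of softmax + cross entropy
        dW_hy += np.outer(dy, hs[t + 1])
        dh = W_hy.T @ dy + dh_next            # from this step's output and from later steps
        draw = (1.0 - hs[t + 1] ** 2) * dh    # back through tanh
        dW_xh += np.outer(draw, inputs[t])
        dW_hh += np.outer(draw, hs[t])        # hs[t] is the previous state h_{t-1}
        dh_next = W_hh.T @ draw
    return loss, (dW_xh, dW_hh, dW_hy)

rng = np.random.default_rng(0)
params = [rng.normal(scale=0.5, size=s) for s in [(3, 4), (3, 3), (4, 3)]]
inputs = np.eye(4)[[0, 1, 2, 2]]              # h, e, l, l
targets = [1, 2, 2, 3]                        # next letters: e, l, l, o
loss, grads = bptt(params, inputs, targets)

learning_rate = 0.1                           # assumed value for the sketch
updated = [W - learning_rate * dW for W, dW in zip(params, grads)]
print(loss)
```

Because the same `W_hh` is used at every step, its gradient is the sum of the per-step contributions, which is exactly the "combine the gradients" step in the list.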
The unrolled network looks much like a regular neural network, and the back propagation algorithm is similar to that of a regular neural network, except that we combine the gradients of the error for all time steps. Now what do you think might happen if there are hundreds of time steps? The network would take really long to converge, since after unrolling it becomes really huge.
In case you do not wish to dive deep into the math of backpropagation, all you need to understand is that back propagation through time works much as it does in a regular neural network once you unroll the recurrent neuron in your network. However, I shall be coming up with a detailed article on recurrent neural networks from scratch, which will have the detailed mathematics of the backpropagation algorithm in a recurrent neural network.
Implementation of Recurrent Neural Networks in Keras
Let’s use recurrent neural networks to predict the sentiment of various tweets. We would like to classify the tweets as positive or negative. You can download the dataset here.
We have around 1,600,000 tweets to train our network. If you’re not familiar with the basics of NLP, I would strongly urge you to go through this article. We also have another detailed article on word embeddings, which will help you understand them in detail.
Let’s now use RNNs to classify various tweets as positive or negative.
    # import all libraries
    # (this listing targets the Keras 2.x API; module paths differ in newer releases)
    from keras.models import Sequential
    from keras.layers import Dense, InputLayer, SimpleRNN
    from keras.preprocessing.text import Tokenizer
    from keras.preprocessing.sequence import pad_sequences
    from sklearn.model_selection import train_test_split
    import pandas as pd
    import numpy as np
    import spacy

    # the old "en" shortcut is not available in newer spaCy releases
    nlp = spacy.load("en_core_web_sm")

    # load the dataset
    train = pd.read_csv("../datasets/training.1600000.processed.noemoticon.csv", encoding="latin-1")
    Y_train = train[train.columns[0]]
    X_train = train[train.columns[5]]

    # split the data into test and train
    trainset1x, trainset2x, trainset1y, trainset2y = train_test_split(
        X_train.values, Y_train.values, test_size=0.02, random_state=42)
    trainset2y = pd.get_dummies(trainset2y)

    # function to remove stopwords
    def stopwords(sentence):
        new = []
        sentence = nlp(sentence)
        for w in sentence:
            if not w.is_stop and w.pos_ != "PUNCT":
                new.append(w.text.strip())
        return " ".join(new)

    # function to lemmatize the tweets
    def lemmatize(sentence):
        sentence = nlp(sentence)
        text = ""
        for w in sentence:
            text += " " + w.lemma_
        return nlp(text)

    # loading the glove model
    def loadGloveModel(gloveFile):
        print("Loading Glove Model")
        model = {}
        with open(gloveFile, "r", encoding="utf-8") as f:
            for line in f:
                splitLine = line.split()
                word = splitLine[0]
                embedding = [float(val) for val in splitLine[1:]]
                model[word] = embedding
        print("Done.", len(model), "words loaded!")
        return model

    # save the glove model
    model = loadGloveModel("/mnt/hdd/datasets/glove/glove.twitter.27B.200d.txt")

    # vectorising the sentences (sum of the word vectors found in the GloVe model)
    def sent_vectorizer(sent, model):
        sent_vec = np.zeros(200)
        numw = 0
        for w in sent.split():
            try:
                sent_vec = np.add(sent_vec, model[str(w)])
                numw += 1
            except KeyError:
                pass  # word not in the GloVe vocabulary
        return sent_vec

    # obtain a clean vector
    cleanvector = []
    for i in range(trainset2x.shape[0]):
        document = trainset2x[i]
        document = document.lower()
        document = lemmatize(document)
        document = str(document)
        cleanvector.append(sent_vectorizer(document, model))

    # getting the input and output in proper shape
    cleanvector = np.array(cleanvector)
    cleanvector = cleanvector.reshape(len(cleanvector), 200, 1)

    # tokenizing the sequences
    tokenizer = Tokenizer(num_words=16000)
    tokenizer.fit_on_texts(trainset2x)
    sequences = tokenizer.texts_to_sequences(trainset2x)
    word_index = tokenizer.word_index
    print('Found %s unique tokens.' % len(word_index))
    data = pad_sequences(sequences, maxlen=15, padding="post")
    print(data.shape)

    # reshape the data and prepare to train
    data = data.reshape(len(cleanvector), 15, 1)
    trainx, validx, trainy, validy = train_test_split(
        data, trainset2y, test_size=0.3, random_state=42)

    # calculate the number of words
    nb_words = len(tokenizer.word_index) + 1

    # obtain the embedding matrix
    embedding_matrix = np.zeros((nb_words, 200))
    for word, i in word_index.items():
        embedding_vector = model.get(word)
        if embedding_vector is not None:
            embedding_matrix[i] = embedding_vector
    print('Null word embeddings: %d' % np.sum(np.sum(embedding_matrix, axis=1) == 0))

    trainy = np.array(trainy, dtype="float32")
    validy = np.array(validy, dtype="float32")

    # building a simple RNN model
    def modelbuild():
        model = Sequential()
        model.add(InputLayer(input_shape=(15, 1)))
        # The original listing constructed an Embedding layer here without adding it
        # to the model, so the RNN reads the padded token ids directly and
        # embedding_matrix is not used by this simple model.
        model.add(SimpleRNN(units=100, activation='relu', use_bias=True))
        model.add(Dense(units=1000, activation='sigmoid'))
        model.add(Dense(units=500, activation='relu'))
        model.add(Dense(units=2, activation='softmax'))
        return model

    # compiling the model
    # (the compile call was missing; this loss and optimizer are assumed choices)
    finalmodel = modelbuild()
    finalmodel.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    finalmodel.fit(trainx, trainy, epochs=10, batch_size=120, validation_data=(validx, validy))

If you run this model, it may not provide you with the best results, since this is an extremely simple architecture and quite a shallow network.
I would strongly urge you to play with the architecture of the network to obtain better results. Also, there are multiple approaches to how to preprocess your data. Preprocessing shall completely depend on the task at hand.
Vanishing and Exploding Gradient Problem
RNNs rely on the fact that the result for a piece of information depends on its previous state or on the previous n time steps. Regular RNNs can have difficulty learning long range dependencies. Take a sentence like “The man who ate my pizza has purple hair”. Here, the description “purple hair” is for the man and not the pizza. So this is a long dependency.
If we backpropagate the error in this case, we need to apply the chain rule. For the error at the third time step, the term that reaches back to the first time step is –
∂E3/∂W = ∂E3/∂y3 * ∂y3/∂h3 * ∂h3/∂h2 * ∂h2/∂h1 * ∂h1/∂W, and the chain grows with every additional time step.
Because these terms are multiplied together, if the factors are consistently smaller than 1 the product shrinks towards zero exponentially fast as the chain gets longer. The early time steps then receive almost no gradient, and the network can no longer learn from them. This is known as the vanishing gradient problem.
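A quick numeric illustration of why long chains are a problem: the sketch below multiplies the same per-step factor many times. The factors 0.5 and 1.5 are arbitrary values chosen for the example; real per-step gradients vary from step to step.

```python
def gradient_after(steps, factor):
    """Product of `steps` identical per-step gradient factors."""
    grad = 1.0
    for _ in range(steps):
        grad *= factor
    return grad

for steps in (5, 20, 100):
    shrinking = gradient_after(steps, 0.5)    # factors below 1: the product vanishes
    growing = gradient_after(steps, 1.5)      # factors above 1: the product explodes
    print(steps, shrinking, growing)
```

After 100 steps the first product is vanishingly small and the second is astronomically large, which is the vanishing and exploding behaviour in miniature.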
The vanishing gradient problem is far more threatening than the exploding gradient problem, where the gradients become very large because one or more gradient values become very high.
The reason the vanishing gradient problem is more concerning is that an exploding gradient can be handled fairly easily by clipping the gradients at a predefined threshold value. Fortunately there are ways to handle the vanishing gradient problem as well. Architectures like the LSTM (Long Short Term Memory) and the GRU (Gated Recurrent Unit) can be used to deal with it.
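Gradient clipping can be sketched in a few lines. The version below rescales the whole gradient vector when its norm exceeds the threshold, which is one common variant; clipping each value individually is another. The function name `clip_by_norm` and the numbers are assumptions made for the example.

```python
import math

def clip_by_norm(gradients, threshold):
    """Rescale the gradient vector so that its L2 norm is at most `threshold`."""
    norm = math.sqrt(sum(g * g for g in gradients))
    if norm <= threshold:
        return list(gradients)                # small gradients pass through unchanged
    scale = threshold / norm
    return [g * scale for g in gradients]

exploding = [30.0, -40.0]                     # hypothetical gradient with norm 50
clipped = clip_by_norm(exploding, threshold=5.0)
print(clipped)
```

Rescaling keeps the direction of the update and only limits its size, so training can continue past an occasional very large gradient.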
Other RNN architectures
As we saw, RNNs suffer from vanishing gradient problems when we ask them to handle long term dependencies. They also become severely difficult to train as the number of parameters becomes extremely large. If we unroll the network, it becomes so huge that its convergence is a challenge.
Long Short Term Memory networks, usually called “LSTMs”, are a special kind of RNN capable of learning long-term dependencies. They were introduced by Hochreiter & Schmidhuber. They work tremendously well on a large variety of problems, and are now widely used. LSTMs also have this chain like structure, but the repeating module has a slightly different structure. Instead of having a single neural network layer, there are multiple layers interacting in a very special way. They have an input gate, a forget gate and an output gate. We shall be coming up with a detailed article on LSTMs soon.
Another efficient RNN architecture is the Gated Recurrent Unit, i.e. the GRU. GRUs are a variant of LSTMs but are simpler in structure and easier to train. Their success is primarily due to the gating signals that control how the present input and the previous memory are used to update the current activation and produce the current state. These gates have their own sets of weights that are adaptively updated in the learning phase. We have just two gates here, the reset gate and the update gate. Stay tuned for more detailed articles on GRUs.
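To show how the two gates interact, here is a minimal sketch of a single GRU step in NumPy. It omits biases, uses random untrained weights, and follows one common convention for mixing the old state with the candidate state; the names `gru_step`, `W_*` and `U_*` are assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, p):
    """One GRU time step with an update gate z and a reset gate r."""
    z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev)               # update gate
    r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev)               # reset gate
    h_cand = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r * h_prev))    # candidate state
    return (1.0 - z) * h_prev + z * h_cand                        # blend old and new

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3                # assumed sizes
p = {}
for gate in "zrh":
    p["W_" + gate] = rng.normal(scale=0.5, size=(hidden_size, input_size))
    p["U_" + gate] = rng.normal(scale=0.5, size=(hidden_size, hidden_size))

h = np.zeros(hidden_size)
for x_t in np.eye(input_size)[[0, 1, 2, 2]]:  # one-hot h, e, l, l
    h = gru_step(x_t, h, p)
print(h)
```

When the update gate is close to 0 the previous state is carried forward almost unchanged, which is what lets information survive across many time steps.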
Top 10 Applications Of Artificial Neural Networks In 2023
Artificial Neural Networks (ANNs) are rapidly emerging as one of the most powerful and versatile technologies of the 21st century. They are a subset of machine learning that is inspired by the structure and function of the human brain and are capable of learning and adapting to complex patterns in data. In recent years, ANNs have found their way into numerous industries and applications, ranging from speech recognition and image processing to financial forecasting and medical diagnosis.
In this article, we will explore the top 10 applications of ANNs in 2023 and what makes them so effective in these domains.
Image Recognition and Computer Vision
Image recognition is one of the most well-known applications of ANNs. In computer vision, ANNs are used to identify objects, people, and scenes in images and videos. ANNs can learn to identify patterns in pictures and make predictions about what is in the image. This technology is already being used in many fields, including surveillance, autonomous vehicles, and medical imaging.
Speech Recognition and Natural Language Processing (NLP)
Speech recognition and NLP are other popular applications of ANNs. In speech recognition, ANNs are used to transcribe spoken words into text, while in NLP, they are used to analyze and understand the meaning of the text. These technologies are being used in virtual assistants, customer service chatbots, and other applications that require the ability to understand and respond to human speech.
Financial Forecasting and Trading
Financial forecasting and trading are areas where ANNs are being used to make predictions about market trends and stock prices. ANNs can analyze large amounts of financial data and identify patterns and relationships that can be used to make informed decisions. This technology is being used by hedge funds, banks, and other financial institutions to improve their investment strategies and minimize risk.
Medical Diagnosis and Treatment Planning
Medical diagnosis and treatment planning are critical applications of ANNs. In medical diagnosis, ANNs are used to analyze medical images and patient data to identify diseases and disorders. In treatment planning, ANNs are used to develop personalized treatment plans based on a patient’s individual characteristics and medical history. These technologies are helping to improve the accuracy and effectiveness of medical diagnoses and treatments, making healthcare more accessible and affordable for everyone.
Autonomous Vehicles
Autonomous vehicles are one of the most exciting applications of ANNs. In autonomous vehicles, ANNs are used to analyze sensor data and make decisions about how the vehicle should respond to its environment. This technology is being used to develop self-driving cars, drones, and other autonomous vehicles that can operate without human intervention.
Recommender Systems
Recommender systems are another application of ANNs that are changing the way we interact with technology. In recommender systems, ANNs are used to analyze user behavior and make recommendations about products, services, and content that are likely to be of interest to the user. This technology is being used by e-commerce websites, streaming services, and other online platforms to improve the user experience and increase engagement.
Natural Language Generation
Natural language generation is a relatively new application of ANNs that is rapidly gaining popularity. In natural language generation, ANNs are used to generate text that mimics human writing. This technology is being used in news articles, reports, and other forms of content that require the ability to write in a natural and engaging style.
Fraud Detection
Fraud detection is an important application of ANNs that is being used to prevent financial losses and protect businesses and consumers. In fraud detection, ANNs are used to analyze financial transactions and identify patterns that indicate fraudulent activity. This technology is being used by banks, credit card companies and other financial institutions to improve their security measures and reduce the risk of fraud.
Supply Chain Optimization
Supply chain optimization is another area where ANNs are being used to improve efficiency and reduce costs. In supply chain optimization, ANNs are used to analyze data from various stages of the supply chain, from raw materials to finished products, to identify bottlenecks and inefficiencies. This technology is helping companies to streamline their supply chains, reduce waste, and improve their overall performance.
Predictive Maintenance
Car Price Prediction – Machine Learning Vs Deep Learning
This article was published as a part of the Data Science Blogathon
1. Objective
In this article, we will be predicting the prices of used cars. We will be building various Machine Learning models and Deep Learning models with different architectures. In the end, we will see how machine learning models perform in comparison to deep learning models.
2. Data Used
Here we have used the data from a hiring competition that was live online. Use the link below to access the data and use it for your analysis.
3. Data Inspection
In this section, we will explore the data. First, let’s see what columns we have in the data and their data types, along with missing values information.
We can observe that the data has 19237 rows and 18 columns.
There are 5 numeric columns and 13 categorical columns. At first look, we can see that there are no missing values in the data.
The ‘Price’ column is going to be the target column, or dependent feature, for this project.
Let’s see the distribution of the data.
4. Data Preparation
Here we will clean the data and prepare it for training the model.
‘ID’ column
We are dropping the ‘ID’ column since it does not hold any significance for car price prediction.

    df.drop('ID', axis=1, inplace=True)

‘Levy’ column
After analyzing the ‘Levy’ column we found that it does contain missing values, but they are recorded as ‘-’ in the data, which is why we did not capture them earlier.
Here we will impute ‘-’ in the ‘Levy’ column with ‘0’, assuming there was no levy. We could also impute it with the mean or median, but that is a choice you have to make.

    import numpy as np

    df['Levy'] = df['Levy'].replace('-', np.nan)
    df['Levy'] = df['Levy'].astype(float)
    levy_fill = 0  # fill value for missing levy (0 here, not the mean)
    df['Levy'] = df['Levy'].fillna(levy_fill)
    df['Levy'] = round(df['Levy'], 2)

‘Mileage’ column
The ‘Mileage’ column here means how many kilometres the car has been driven. ‘km’ is written in the column after each reading. We will remove that.

    # since mileage is in km only, remove 'km' and make the column numerical
    df['Mileage'] = df['Mileage'].apply(lambda x: x.split(' ')[0])
    df['Mileage'] = df['Mileage'].astype('int')

‘Engine volume’ column
In the ‘Engine volume’ column, the type of the engine (Turbo or not Turbo) is written along with the engine volume. We will create a new column that shows the engine type.

    df['Turbo'] = df['Engine volume'].apply(lambda x: 1 if 'Turbo' in str(x) else 0)
    df['Engine volume'] = df['Engine volume'].apply(lambda x: str(x).replace('Turbo', ''))
    df['Engine volume'] = df['Engine volume'].astype(float)

‘Doors’ column

    df['Doors'].unique()

Output:
The ‘Doors’ column represents the number of doors in the car. But as we can see, it is not clean. Let’s clean it.
Handling ‘Outliers’
We will examine outliers across the numerical features.

    import matplotlib.pyplot as plt
    import seaborn as sns

    cols = ['Levy', 'Engine volume', 'Mileage', 'Cylinders', 'Airbags']
    # one boxplot per numerical feature, each on its own figure
    for col in cols:
        plt.figure()
        sns.boxplot(x=df[col])

As we can see, there are outliers in the ‘Levy’, ‘Engine volume’, ‘Mileage’ and ‘Cylinders’ columns. We will handle these outliers using the Interquartile Range (IQR) method.

    def find_outliers_limit(df, col):
        print(col)
        print('-' * 50)
        # quartiles and interquartile range
        q25, q75 = np.percentile(df[col], 25), np.percentile(df[col], 75)
        iqr = q75 - q25
        print('Percentiles: 25th=%.3f, 75th=%.3f, IQR=%.3f' % (q25, q75, iqr))
        # calculate the outlier cutoff
        cut_off = iqr * 1.5
        lower, upper = q25 - cut_off, q75 + cut_off
        print('Lower:', lower, ' Upper:', upper)
        return lower, upper

    def remove_outlier(df, col, upper, lower):
        # identify outliers
        outliers = [x for x in df[col] if x < lower or x > upper]
        print('Identified outliers: %d' % len(outliers))
        # The definitions of outliers_removed and final were lost in extraction.
        # They are reconstructed here as an assumption: values outside the limits
        # are capped at the limits, so the returned column keeps its length.
        outliers_removed = [x for x in df[col] if lower <= x <= upper]
        print('Non-outlier observations: %d' % len(outliers_removed))
        final = np.where(df[col] > upper, upper, np.where(df[col] < lower, lower, df[col]))
        return final

    outlier_cols = ['Levy', 'Engine volume', 'Mileage', 'Cylinders']
    for col in outlier_cols:
        lower, upper = find_outliers_limit(df, col)
        df[col] = remove_outlier(df, col, upper, lower)

Let’s examine the features after handling the outliers.

    plt.figure(figsize=(20, 10))
    df[outlier_cols].boxplot()

We can observe that there are no outliers in the features now.
Creating Additional Features
We see that ‘Mileage’ and ‘Engine volume’ are continuous variables. While performing regression I have observed that binning such variables can help increase the performance of the model. So I am creating ‘bin’ features for these columns.

    import pandas as pd

    labels = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    df['Mileage_bin'] = pd.cut(df['Mileage'], len(labels), labels=labels)
    df['Mileage_bin'] = df['Mileage_bin'].astype(float)
    labels = [0, 1, 2, 3, 4]
    df['EV_bin'] = pd.cut(df['Engine volume'], len(labels), labels=labels)
    df['EV_bin'] = df['EV_bin'].astype(float)

Handling Categorical features
I have used OrdinalEncoder to handle the categorical columns. OrdinalEncoder works similarly to LabelEncoder, but OrdinalEncoder can be applied to multiple features at once while LabelEncoder can be applied to one feature at a time. For more details please visit the links below.

    from sklearn.preprocessing import OrdinalEncoder

    num_df = df.select_dtypes(include=np.number)
    cat_df = df.select_dtypes(include=object)
    encoding = OrdinalEncoder()
    cat_cols = cat_df.columns.tolist()
    encoding.fit(cat_df[cat_cols])
    cat_oe = encoding.transform(cat_df[cat_cols])
    cat_oe = pd.DataFrame(cat_oe, columns=cat_cols)
    cat_df.reset_index(inplace=True, drop=True)
    cat_oe.head()
    num_df.reset_index(inplace=True, drop=True)
    cat_oe.reset_index(inplace=True, drop=True)
    final_all_df = pd.concat([num_df, cat_oe], axis=1)

Checking correlation

    final_all_df['price_log'] = np.log(final_all_df['Price'])

We can observe that the features are not strongly correlated in the data. One thing we can notice, though, is that after log transforming the ‘Price’ column, its correlation with a few features increased, which is a good thing. We will be using the log-transformed ‘Price’ to train the model. Please visit the link mentioned below to better understand how feature transformations help improve model performance.
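One practical consequence of training on the log-transformed price is that predictions come out in log units and have to be mapped back with the exponential. The short sketch below shows the round trip on made-up prices; the array values are assumptions for illustration only.

```python
import numpy as np

prices = np.array([5000.0, 12000.0, 30000.0, 150000.0])   # hypothetical car prices
price_log = np.log(prices)            # compresses the long right tail of prices
recovered = np.exp(price_log)         # inverse transform back to price units

print(price_log.round(2))
print(recovered.round(2))
```

The log scale shrinks a thirty-fold spread in prices to a range of a few units, which is often easier for a regression model to fit.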
5. Data Splitting and ScalingWe have done an 80-20 split on the data. 80% of the data will be used for training and 20% data will be used for testing.
We will also scale the data since feature values in data do not have the same scale and having different scales can produce poor model performance.
cols_drop=['Price','price_log','Cylinders'] X=final_all_df.drop(cols_drop,axis=1) y=final_all_df['Price'] X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=25) scaler=StandardScaler() X_train_scaled=scaler.fit_transform(X_train) X_test_scaled=scaler.transform(X_test) 6. Model BuildingWe built LinearRegression, XGBoost, and RandomForest as machine learning models and two deep learning models one having a small network and another having a large network.
LinearRegression, XGBoost, and RandomForest were built as base models with default settings, so there is not much to show for them. For the deep learning models, we can look at the model summary and at how the training and validation losses converge.
Deep Learning Model – Small Network model summary
model_dl_small.summary()
Deep Learning Model – Small Network: Train & Validation Loss
#plot the loss and validation loss of the dataset
history_df = pd.DataFrame(model_dl_small.history.history)
plt.figure(figsize=(20,10))
plt.plot(history_df['loss'], label='loss')
plt.plot(history_df['val_loss'], label='val_loss')
plt.xticks(np.arange(1,epochs+1,2))
plt.yticks(np.arange(1,max(history_df['loss']),0.5))
plt.legend()
plt.grid()
Deep Learning Model – Large Network model summary
model_dl_large.summary()
Deep Learning Model – Large Network: Train & Validation Loss
#plot the loss and validation loss of the dataset
history_df = pd.DataFrame(model_dl_large.history.history)
plt.figure(figsize=(20,10))
plt.plot(history_df['loss'], label='loss')
plt.plot(history_df['val_loss'], label='val_loss')
plt.xticks(np.arange(1,epochs+1,2))
plt.yticks(np.arange(1,max(history_df['loss']),0.5))
plt.legend()
plt.grid()
6.1 Model Performance
We have evaluated the models using mean squared error (MSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and mean squared log error (MSLE) as performance metrics.
We can observe that the deep learning models did not perform as well as the machine learning models. RandomForest performed particularly well among the machine learning models.
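As a rough illustration of what the four metrics measure, here is a minimal pure-Python sketch. The article itself uses the scikit-learn functions of the same names; the helper names `mse`, `mae`, `mape`, `msle` and the toy prices below are assumptions made only for this example. The MAPE is returned as a fraction and the MSLE uses log(1 + y), which is how scikit-learn defines them.

```python
import math

def mse(y_true, y_pred):
    # mean of squared differences
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    # mean of absolute differences
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    # mean absolute error relative to the true value (a fraction, not a percent)
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

def msle(y_true, y_pred):
    # mean squared difference of log(1 + value)
    return sum((math.log1p(t) - math.log1p(p)) ** 2
               for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical car prices, not taken from the article's dataset
y_true = [10000.0, 20000.0, 30000.0]
y_pred = [11000.0, 18000.0, 30000.0]

for name, fn in [("MSE", mse), ("MAE", mae), ("MAPE", mape), ("MSLE", msle)]:
    print(name, round(fn(y_true, y_pred), 4))
```

MSE and MAE are in price units (squared and plain), while MAPE and MSLE are relative measures, which is why all four are reported side by side.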
Let’s visualize the results from Random Forest.
7. Result Visualization
y_pred=np.exp(model_rf.predict(X_test_scaled))
number_of_observations=20
x_ax = range(len(y_test[:number_of_observations]))
plt.figure(figsize=(20,10))
plt.plot(x_ax, y_test[:number_of_observations], label="True")
plt.plot(x_ax, y_pred[:number_of_observations], label="Predicted")
plt.title("Car Price - True vs Predicted data")
plt.xlabel('Observation Number')
plt.ylabel('Price')
plt.xticks(np.arange(number_of_observations))
plt.legend()
plt.grid()
plt.show()
We can observe in the graph that the model is performing really well, as also seen in the performance metrics.
8. Code
The code was written in a Jupyter notebook. Below is the complete code for the project.
# Loading Libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import OrdinalEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_log_error,mean_squared_error,mean_absolute_error,mean_absolute_percentage_error
import datetime
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from xgboost import XGBRegressor
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns
from keras.models import Sequential
from keras.layers import Dense
from prettytable import PrettyTable

df=pd.read_csv('../input/Participant_Data_TheMathCompany_.DSHH/train.csv')
df.head()

# Data Inspection
df.shape
df.describe().transpose()
df.info()
sns.pairplot(df, diag_kind='kde')

# Data Preprocessing
df.drop('ID',axis=1,inplace=True)
df['Levy']=df['Levy'].replace('-',np.nan)
df['Levy']=df['Levy'].astype(float)
levy_mean=0
df['Levy']=df['Levy'].fillna(levy_mean)
df['Levy']=round(df['Levy'],2)

milage_formats=set()
def get_milage_format(x):
    x=x.split(' ')[1]
    milage_formats.add(x)
df['Mileage'].apply(lambda x:get_milage_format(x));
milage_formats

#since milage is in KM only we will remove 'km' from it and make it numerical
df['Mileage']=df['Mileage'].apply(lambda x:x.split(' ')[0])
df['Mileage']=df['Mileage'].astype('int')

df['Engine volume'].unique()
df['Turbo']=df['Engine volume'].apply(lambda x:1 if 'Turbo' in str(x) else 0)
df['Engine volume']=df['Engine volume'].apply(lambda x:str(x).replace('Turbo',''))
df['Engine volume']=df['Engine volume'].astype(float)

# one boxplot per notebook cell
cols=['Levy','Engine volume', 'Mileage','Cylinders','Airbags']
sns.boxplot(df[cols[0]]);
sns.boxplot(df[cols[1]]);
sns.boxplot(df[cols[2]]);
sns.boxplot(df[cols[3]]);
sns.boxplot(df[cols[4]]);

def find_outliers_limit(df,col):
    print(col)
    print('-'*50)
    #removing outliers
    q25, q75 = np.percentile(df[col], 25), np.percentile(df[col], 75)
    iqr = q75 - q25
    print('Percentiles: 25th=%.3f, 75th=%.3f, IQR=%.3f' % (q25, q75, iqr))
    # calculate the outlier cutoff
    cut_off = iqr * 1.5
    lower, upper = q25 - cut_off, q75 + cut_off
    print('Lower:',lower,' Upper:',upper)
    return lower,upper

def remove_outlier(df,col,upper,lower):
    # identify outliers
    outliers = [x for x in df[col] if x < lower or x > upper]
    print('Identified outliers: %d' % len(outliers))
    # remove outliers
    # NOTE: the statements that defined `outliers_removed` and `final` were lost
    # when this page was extracted. The two definitions below are an assumed
    # reconstruction: values are capped at the limits so the column keeps its length.
    outliers_removed = [x for x in df[col] if x >= lower and x <= upper]
    print('Non-outlier observations: %d' % len(outliers_removed))
    final = np.where(df[col] > upper, upper, np.where(df[col] < lower, lower, df[col]))
    return final

outlier_cols=['Levy','Engine volume','Mileage','Cylinders']
for col in outlier_cols:
    lower,upper=find_outliers_limit(df,col)
    df[col]=remove_outlier(df,col,upper,lower)

#boxplot - to see outliers
plt.figure(figsize=(20,10))
df[outlier_cols].boxplot()

df['Doors'].unique()
df['Doors']=df['Doors'].astype(str)

#Creating Additional Features
labels=[0,1,2,3,4,5,6,7,8,9]
df['Mileage_bin']=pd.cut(df['Mileage'],len(labels),labels=labels)
df['Mileage_bin']=df['Mileage_bin'].astype(float)
labels=[0,1,2,3,4]
df['EV_bin']=pd.cut(df['Engine volume'],len(labels),labels=labels)
df['EV_bin']=df['EV_bin'].astype(float)

#Handling Categorical features
num_df=df.select_dtypes(include=np.number)
cat_df=df.select_dtypes(include=object)
encoding=OrdinalEncoder()
cat_cols=cat_df.columns.tolist()
encoding.fit(cat_df[cat_cols])
cat_oe=encoding.transform(cat_df[cat_cols])
cat_oe=pd.DataFrame(cat_oe,columns=cat_cols)
cat_df.reset_index(inplace=True,drop=True)
cat_oe.head()
num_df.reset_index(inplace=True,drop=True)
cat_oe.reset_index(inplace=True,drop=True)
final_all_df=pd.concat([num_df,cat_oe],axis=1)

#Checking correlation
final_all_df['price_log']=np.log(final_all_df['Price'])
plt.figure(figsize=(20,10))
sns.heatmap(round(final_all_df.corr(),2),annot=True);

cols_drop=['Price','price_log','Cylinders']
final_all_df.columns
X=final_all_df.drop(cols_drop,axis=1)
y=final_all_df['Price']

# Data Splitting and Scaling
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=25)
scaler=StandardScaler()
X_train_scaled=scaler.fit_transform(X_train)
X_test_scaled=scaler.transform(X_test)

# Model Building
def train_ml_model(x,y,model_type):
    if model_type=='lr':
        model=LinearRegression()
    elif model_type=='xgb':
        model=XGBRegressor()
    elif model_type=='rf':
        model=RandomForestRegressor()
    # fit on the features passed in (the original used the global X_train_scaled here)
    model.fit(x,np.log(y))
    return model

def model_evaluate(model,x,y):
    predictions=model.predict(x)
    # models are trained on log(Price), so convert predictions back to prices
    predictions=np.exp(predictions)
    mse=mean_squared_error(y,predictions)
    mae=mean_absolute_error(y,predictions)
    mape=mean_absolute_percentage_error(y,predictions)
    msle=mean_squared_log_error(y,predictions)
    mse=round(mse,2)
    mae=round(mae,2)
    mape=round(mape,2)
    msle=round(msle,2)
    return [mse,mae,mape,msle]

model_lr=train_ml_model(X_train_scaled,y_train,'lr')
model_xgb=train_ml_model(X_train_scaled,y_train,'xgb')
model_rf=train_ml_model(X_train_scaled,y_train,'rf')

## Deep Learning
### Small Network
model_dl_small=Sequential()
model_dl_small.add(Dense(16,input_dim=X_train_scaled.shape[1],activation='relu'))
model_dl_small.add(Dense(8,activation='relu'))
model_dl_small.add(Dense(4,activation='relu'))
model_dl_small.add(Dense(1,activation='linear'))
model_dl_small.compile(loss='mean_squared_error',optimizer='adam')
model_dl_small.summary()
epochs=20
batch_size=10
model_dl_small.fit(X_train_scaled,np.log(y_train),verbose=0,validation_data=(X_test_scaled,np.log(y_test)),epochs=epochs,batch_size=batch_size)

#plot the loss and validation loss of the dataset
history_df = pd.DataFrame(model_dl_small.history.history)
plt.figure(figsize=(20,10))
plt.plot(history_df['loss'], label='loss')
plt.plot(history_df['val_loss'], label='val_loss')
plt.xticks(np.arange(1,epochs+1,2))
plt.yticks(np.arange(1,max(history_df['loss']),0.5))
plt.legend()
plt.grid()

### Large Network
model_dl_large=Sequential()
model_dl_large.add(Dense(64,input_dim=X_train_scaled.shape[1],activation='relu'))
model_dl_large.add(Dense(32,activation='relu'))
model_dl_large.add(Dense(16,activation='relu'))
model_dl_large.add(Dense(1,activation='linear'))
model_dl_large.compile(loss='mean_squared_error',optimizer='adam')
model_dl_large.summary()
epochs=20
batch_size=10
model_dl_large.fit(X_train_scaled,np.log(y_train),verbose=0,validation_data=(X_test_scaled,np.log(y_test)),epochs=epochs,batch_size=batch_size)

#plot the loss and validation loss of the dataset
history_df = pd.DataFrame(model_dl_large.history.history)
plt.figure(figsize=(20,10))
plt.plot(history_df['loss'], label='loss')
plt.plot(history_df['val_loss'], label='val_loss')
plt.xticks(np.arange(1,epochs+1,2))
plt.yticks(np.arange(1,max(history_df['loss']),0.5))
plt.legend()
plt.grid()

summary=PrettyTable(['Model','MSE','MAE','MAPE','MSLE'])
summary.add_row(['LR']+model_evaluate(model_lr,X_test_scaled,y_test))
summary.add_row(['XGB']+model_evaluate(model_xgb,X_test_scaled,y_test))
summary.add_row(['RF']+model_evaluate(model_rf,X_test_scaled,y_test))
summary.add_row(['DL_SMALL']+model_evaluate(model_dl_small,X_test_scaled,y_test))
summary.add_row(['DL_LARGE']+model_evaluate(model_dl_large,X_test_scaled,y_test))
print(summary)

y_pred=np.exp(model_rf.predict(X_test_scaled))
number_of_observations=20
x_ax = range(len(y_test[:number_of_observations]))
plt.figure(figsize=(20,10))
plt.plot(x_ax, y_test[:number_of_observations], label="True")
plt.plot(x_ax, y_pred[:number_of_observations], label="Predicted")
plt.title("Car Price - True vs Predicted data")
plt.xlabel('Observation Number')
plt.ylabel('Price')
plt.xticks(np.arange(number_of_observations))
plt.legend()
plt.grid()
plt.show()
9. Conclusion
In this article, we tried predicting car prices using the various parameters provided in the data about each car. We built machine learning and deep learning models to predict car prices and saw that the machine learning-based models performed better on this data than the deep learning-based models.
10. About the Author
Hi, I am Kajal Kumari. I have completed my Master’s from IIT (ISM) Dhanbad in Computer Science & Engineering. As of now, I am working as a Machine Learning Engineer in Hyderabad. You can also check out a few other blogs that I have written here.
The media shown in this article on LSTM for Human Activity Recognition are not owned by Analytics Vidhya and are used at the Author’s discretion.
Top 10 Applications Of Deep Learning In Cybersecurity In 2023
Deep learning tools have a major role to play in the field of cybersecurity in 2023.
Deep learning, also known as deep neural networks, comprises machine learning techniques that enable a network to learn from unlabeled data and solve complex problems. It can be used extensively in cybersecurity to protect companies from threats such as phishing, spear-phishing, drive-by attacks, password attacks, denial of service, and more. Learn about the top 10 applications of deep learning in cybersecurity.
Detecting Trace of Intrusion
Deep learning, convolutional neural networks, and recurrent neural networks (RNNs) can be applied to create smarter intrusion detection and intrusion prevention (ID/IP) systems by analyzing traffic with better accuracy, reducing the number of false alerts, and helping security teams differentiate bad and good network activities. Notable solutions include Next-Generation Firewall (NGFW), Web Application Firewall (WAF), and User and Entity Behavior Analytics (UEBA).
Battle against Malware
Spam and Social Engineering Detection
Natural Language Processing (NLP), a deep learning technique, can help you to easily detect and deal with spam and other forms of social engineering. NLP learns normal forms of communication and language patterns and uses various statistical models to detect and block spam. You can read this post to learn how Google used TensorFlow to enhance the spam detection capabilities of Gmail.
Network Traffic Analysis
Deep learning ANNs are showing promising results in analyzing HTTPS network traffic to look for malicious activities. This is very useful for dealing with many cyber threats such as SQL injections and DoS attacks.
User Behavior Analytics
Tracking and analyzing user activities and behaviors is an important deep learning-based security practice for any organization. It is much more challenging than recognizing traditional malicious activities against the networks since it bypasses security measures and often doesn’t raise any flags and alerts. User and Entity Behavior Analytics (UEBA) is a great tool against such attacks. After a learning period, it can pick up normal employee behavioral patterns and recognize suspicious activities, such as accessing the system in unusual hours, that possibly indicate an insider attack and raise alerts.
Monitoring Emails
It is vital to keep an eye on employees’ official email accounts to prevent cyberattacks. For instance, phishing attacks are commonly carried out through emails that ask employees for sensitive data. Cybersecurity software combined with deep learning can be used to prevent these kinds of attacks. Natural language processing can also be used to scan emails for suspicious behavior.
Analyzing Mobile Endpoints
Deep learning is already going mainstream on mobile devices and is also driving voice-based experiences through mobile assistants. Using deep learning, an enterprise that wants to counter the growing amount of malware on mobile devices can identify and analyze threats against mobile endpoints.
Enhancing Human Analysis
Deep learning in cybersecurity can help humans detect malicious attacks, protect endpoints, analyze the network, and perform vulnerability assessments. With this support, people can make better decisions about how to solve the problems they find.
Task Automation
The main benefit of deep learning is to automate repetitive tasks that can enable staff to focus on more important work. There are a few cybersecurity tasks that can be automated with the help of machine learning. By incorporating deep learning into the tasks, organizations can accomplish tasks faster and better.
WebShell
A WebShell is a piece of code that is maliciously loaded into a website to provide access for making modifications to the web root of the server. This allows attackers to gain access to the database. Deep learning can help in detecting normal shopping cart behavior, and the model can be trained to differentiate between normal and malicious behavior.
Network Risk Scoring
Artificial Intelligence (AI) And Deep Learning
The horizon of what repetitive tasks a computer can replace continues to expand due to artificial intelligence (AI) and the sub-field of deep learning (DL).
Artificial intelligence gives a device some form of human-like intelligence.
Researchers continue to develop self-teaching algorithms that enable deep learning AI applications like chatbots.
To understand deep learning better, we need to understand it as part of the AI evolution:
Partly to eliminate human-based shortcomings in machine learning, researchers continue to try to create smarter ML algorithms. They design neural networks within ML that can learn on their own from raw, uncategorized data. Neural networks — the key to deep learning — incorporate algorithms based on mathematical formulas that add up weighted variables to generate a decision.
One example of a neural network algorithm is all of the possible variables a self-driving car considers when making the decision if it should proceed forward: is something in the way, is it dangerous to the car, is it dangerous to the passenger, etc. The weighting prioritizes the importance of the variables, such as placing passenger safety over car safety.
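The idea of adding up weighted variables to reach a decision can be sketched as a single artificial neuron. This is only a toy illustration: the input names, weights, and bias below are hypothetical values chosen for the example, not anything a real self-driving system uses.

```python
def neuron_decision(inputs, weights, bias):
    """Return (proceed?, weighted sum) for one artificial neuron with a step activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return weighted_sum > 0, weighted_sum

# Inputs: [something in the way, danger to the car, danger to the passenger]
# Hypothetical weights: passenger safety is weighted more heavily than car safety.
weights = [-2.0, -1.0, -5.0]
bias = 1.0  # with nothing detected, the default is to proceed

clear_road = [0, 0, 0]
obstacle_ahead = [1, 0, 0]
passenger_at_risk = [0, 0, 1]

for name, inputs in [("clear road", clear_road),
                     ("obstacle ahead", obstacle_ahead),
                     ("passenger at risk", passenger_at_risk)]:
    proceed, score = neuron_decision(inputs, weights, bias)
    print(name, "->", "proceed" if proceed else "stop", "(score", score, ")")
```

In a trained network the weights are not set by hand like this; they are learned from data.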
Deep learning extends ML algorithms to multiple layers of neural networks to make a decision tree of many layers of linked variables and related decisions. In the self-driving car example, moving forward would then lead to decisions regarding speed, the need to navigate obstacles, navigating to the destination, etc. Yet, those subsequent decisions may create feedback that forces the AI to reconsider earlier decisions and change them. Deep learning seeks to mimic the human brain in how we can learn by being taught and through multiple layers of near-simultaneous decision making.
Deep learning promises to uncover information and patterns hidden from the human brain from within the sea of computer data.
AI with deep learning surrounds us. Apple’s Siri and Amazon’s Alexa try to interpret our speech and act as our personal assistants. Amazon and Netflix use AI to predict the next product, movie, or TV show we may want to enjoy. Many of the websites we visit for banking, health care, and e-commerce use AI chatbots to handle the initial stages of customer service.
Deep learning algorithms have been applied to:
Customer service: Conversational AI incorporates natural language processing (NLP), call-center style decision trees, and other resources to provide the first level of customer service as chatbots and voicemail decision trees.
Cybersecurity: AI analyzes log files, network information, and more to detect, report, and remediate malware and human attacks on IT systems.
Financial services: Predictive analytics trade stocks, approve loans, flag potential fraud, and manage portfolios.
Health care: Image-recognition AI reviews medical imaging to aid in medical analysis.
Law enforcement:
Track payments and other financial transactions for signs of fraud, money laundering, and other crimes
Extract patterns from voice, video, email and other evidence
Analyze large amounts of data quickly
We do not currently have AI capable of thinking at the human level, but technologists continue to push the envelope of what AI can do. Algorithms for self-driving cars and medical diagnosis continue to be developed and refined.
So far, AI’s main challenges stem from unpredictability and bad training data:
Biased AI judge (2023): To the great dismay of those trying to promote AI as unbiased, an AI algorithm designed to estimate recidivism, a key factor in sentencing, produced biased sentencing recommendations. Unfortunately, the AI learned from historical data which has racial and economic biases baked into the data; therefore, it continued to incorporate similar biases.
AI consists of three general categories: artificial narrow intelligence (ANI) focuses on the completion of a specific task, such as playing chess or painting a car on an assembly line; artificial general intelligence (AGI) strives to reach a human’s level of intelligence; and artificial super intelligence (ASI) attempts to surpass humans. Neither of these last two categories exists, so all functional AI remains categorized as ANI.
Deep learning continues to improve and deliver some results, but it cannot currently reach the higher sophistication levels needed to escape the artificial narrow intelligence category. As developers continue to add layers to the algorithms, AI will continue to assist with increasingly complex tasks and expand its utility. Even if human-like and superhuman intelligence through AI may be eluding us, deep learning continues to illustrate the increasing power of AI.
Understanding Loss Function In Deep Learning
The loss function is very important in machine learning and deep learning. Let’s say you are working on a problem, you have trained a machine learning model on the dataset, and you are ready to put it in front of your client. How can you be sure that this model will give the optimum result? Is there a metric or a technique that will help you quickly evaluate your model on the dataset? Yes, and this is where loss functions come into play. In this article, we will explain loss functions in deep learning.
This article was published as a part of the Data Science Blogathon.
What is Loss Function in Deep Learning?
In mathematical optimization and decision theory, a loss or cost function (sometimes also called an error function) is a function that maps an event or values of one or more variables onto a real number intuitively representing some “cost” associated with the event.
In simple terms, the Loss function is a method of evaluating how well your algorithm is modeling your dataset. It is a mathematical function of the parameters of the machine learning algorithm.
In simple linear regression, the prediction is calculated using the slope (m) and the intercept (b). The loss for a single data point is $(y_i - \hat{y}_i)^2$, where $\hat{y}_i = m x_i + b$; that is, the loss is a function of the slope and the intercept.
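To make the dependence on slope and intercept concrete, here is a minimal sketch. The data points are made up for the example and lie exactly on the line y = 2x + 1; the function names are illustrative.

```python
def squared_loss(m, b, x, y):
    """Squared error of one point for the line y_hat = m * x + b."""
    y_hat = m * x + b
    return (y - y_hat) ** 2

def total_loss(m, b, xs, ys):
    """Sum of the per-point squared losses for a given slope and intercept."""
    return sum(squared_loss(m, b, x, y) for x, y in zip(xs, ys))

# Toy data generated from y = 2x + 1
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

print(total_loss(2.0, 1.0, xs, ys))  # the true slope and intercept give zero loss
print(total_loss(1.0, 1.0, xs, ys))  # a wrong slope gives a larger loss
```

Changing m or b changes the loss, and training amounts to searching for the pair that makes it smallest.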
Why Loss Function in Deep Learning is Important?
The management author Peter Drucker is often quoted as saying, “You can’t improve what you can’t measure.” That is why the loss function comes into the picture: it evaluates how well your algorithm is modeling your dataset.
If the value of the loss function is low, the model is good; otherwise, we have to change the parameters of the model to minimize the loss.
Cost Function vs Loss Function in Deep Learning
Most people confuse the loss function and the cost function. The two terms are often used interchangeably, but strictly speaking they are different: the loss is computed for a single training example, while the cost aggregates the loss over the entire training set.
Loss Function | Cost Function
Measures the error between predicted and actual values in a machine learning model. | Quantifies the overall cost or error of the model on the entire training set.
Used to optimize the model during training. | Used to guide the optimization process by minimizing the cost or error.
Can be specific to individual samples. | Aggregates the loss values over the entire training set.
Examples include mean squared error (MSE), mean absolute error (MAE), and binary cross-entropy. | Often the average or sum of individual loss values in the training set.
Used to evaluate model performance. | Used to determine the direction and magnitude of parameter updates during optimization.
Different loss functions can be used for different tasks or problem domains. | Typically derived from the loss function, but can include additional regularization terms or other considerations.
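The distinction can be sketched in a few lines: a loss is computed per sample, and the cost averages those losses and may add a regularization term. The L2 penalty, the `lam` parameter, and the numbers below are assumptions chosen for illustration.

```python
def squared_error(y, y_hat):
    """Loss for a single sample."""
    return (y - y_hat) ** 2

def cost(y_true, y_pred, weights=(), lam=0.0):
    """Average per-sample loss plus an optional L2 penalty on the weights."""
    losses = [squared_error(y, p) for y, p in zip(y_true, y_pred)]
    data_term = sum(losses) / len(losses)
    penalty = lam * sum(w * w for w in weights)
    return data_term + penalty

y_true = [3.0, -0.5, 2.0]
y_pred = [2.5, 0.0, 2.0]

per_sample = [squared_error(y, p) for y, p in zip(y_true, y_pred)]
print(per_sample)                                            # one loss value per sample
print(cost(y_true, y_pred))                                  # a single cost for the whole set
print(cost(y_true, y_pred, weights=(1.0, -2.0), lam=0.1))    # cost with regularization
```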
Loss Function in Deep Learning
Regression
MSE (Mean Squared Error)
MAE (Mean Absolute Error)
Huber loss
Classification
Binary cross-entropy
Categorical cross-entropy
AutoEncoder
KL Divergence
GAN
Discriminator loss
Minimax GAN loss
Object detection
Focal loss
Word embeddings
Triplet loss
In this article, we will understand regression loss and classification loss.
A. Regression Loss
1. Mean Squared Error / Squared loss / L2 loss
The Mean Squared Error (MSE) is the simplest and most common loss function. To calculate the MSE, you take the difference between the actual value and model prediction, square it, and average it across the whole dataset.
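Written as a formula, this is MSE = (1/n) Σ (y_i − ŷ_i)². A minimal sketch of the calculation follows; the values are made up for the example.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    squared_diffs = [(y - p) ** 2 for y, p in zip(y_true, y_pred)]
    return sum(squared_diffs) / len(squared_diffs)

y_true = [3.0, 5.0, 7.0]
y_pred = [2.5, 5.0, 8.0]

# differences: 0.5, 0.0, -1.0 -> squares: 0.25, 0.0, 1.0 -> mean: 1.25 / 3
print(mean_squared_error(y_true, y_pred))
```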
Advantage
1. Easy to interpret.
2. Always differentiable because of the square.
3. Only one local minimum.
Disadvantage
1. The error is in squared units, which are harder to interpret than the units of the output.
2. Not robust to outliers.
Note – In regression at the last neuron use linear activation function.
2. Mean Absolute Error / L1 loss
The Mean Absolute Error (MAE) is another simple loss function. To calculate the MAE, you take the absolute difference between the actual value and model prediction and average it across the whole dataset.
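As a formula, MAE = (1/n) Σ |y_i − ŷ_i|. The sketch below computes it on made-up data containing one outlier and compares it with the squared-error average on the same data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences, shown only for comparison."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

# The last actual value is an outlier that the model misses by 96
y_true = [1.0, 2.0, 3.0, 100.0]
y_pred = [1.0, 2.0, 3.0, 4.0]

print(mean_absolute_error(y_true, y_pred))  # grows linearly with the outlier's error
print(mean_squared_error(y_true, y_pred))   # grows with the square of the outlier's error
```

The single outlier contributes 96 to the absolute-error sum but 9216 to the squared-error sum, which is the sense in which MAE is more robust.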
Advantage
1. Intuitive and easy.
2. The error unit is the same as that of the output column.
3. Robust to outliers.
Disadvantage
1. The graph is not differentiable at zero, so we cannot use gradient descent directly; instead, we can use subgradient calculation.
Note – In regression at the last neuron use linear activation function.
3. Huber Loss
In statistics, the Huber loss is a loss function used in robust regression that is less sensitive to outliers in data than the squared error loss.
n – the number of data points.
y – the actual value of the data point. Also known as true value.
ŷ – the predicted value of the data point. This value is returned by the model.
δ – defines the point where the Huber loss function transitions from a quadratic to linear.
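Using the symbols just defined, the standard definition of the Huber loss for one point is ½(y − ŷ)² when |y − ŷ| ≤ δ and δ(|y − ŷ| − ½δ) otherwise, averaged over the n points. A minimal sketch, with made-up values and δ = 1 chosen only for the example:

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for small errors, linear for large ones."""
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        error = abs(y - y_hat)
        if error <= delta:
            total += 0.5 * error ** 2                 # behaves like squared error
        else:
            total += delta * (error - 0.5 * delta)    # behaves like absolute error
    return total / len(y_true)

y_true = [1.0, 5.0]
y_pred = [0.5, 2.0]   # errors of 0.5 (quadratic branch) and 3.0 (linear branch)

print(huber_loss(y_true, y_pred, delta=1.0))
```

With δ = 1 the small error contributes 0.125 and the large error contributes 2.5, so large errors are penalized far less than they would be under the squared loss.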
Advantage
Robust to outlier
It lies between MAE and MSE.
B. Classification Loss
1. Binary Cross Entropy / log loss
It is used in binary classification problems, that is, problems with two classes. For example, whether a person has COVID or not, or whether my article becomes popular or not.
Binary cross entropy compares each of the predicted probabilities to the actual class output which can be either 0 or 1. It then calculates the score that penalizes the probabilities based on the distance from the expected value. That means how close or far from the actual value.
yi – actual values
yihat – Neural Network prediction
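The standard binary cross-entropy is −(1/n) Σ [y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i)]. Below is a minimal sketch; the clipping constant `eps` is an assumption added to avoid taking log(0), and the probabilities are made up.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean log loss for labels in {0, 1} and predicted probabilities of class 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)   # keep p away from exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

confident_and_right = binary_cross_entropy([1, 0], [0.9, 0.1])
confident_and_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])

print(confident_and_right)   # small penalty
print(confident_and_wrong)   # large penalty
```

Predictions that are confidently wrong are penalized much more heavily than predictions that are confidently right.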
Advantage
The cost function is differentiable.
Disadvantage
Multiple local minima
Not intuitive
Note – In classification at last neuron use sigmoid activation function.
2. Categorical Cross Entropy
Categorical cross-entropy is used for multiclass classification and softmax regression.
loss function: $L = -\sum_{j=1}^{k} y_j \log(\hat{y}_j)$
cost function: $J = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{k} y_{ij} \log(\hat{y}_{ij})$
where
k is the number of classes,
n is the number of data points,
y is the actual value,
ŷ is the neural network prediction.
Note – In multi-class classification at the last neuron use the softmax activation function.
If the problem statement has 3 classes, the softmax activation for the first class is
$f(z_1) = \frac{e^{z_1}}{e^{z_1} + e^{z_2} + e^{z_3}}$
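A minimal sketch of softmax followed by categorical cross-entropy for a 3-class problem is shown below. The logits and the one-hot target are made up, and subtracting the maximum logit before exponentiating is a common numerical-stability step that is not mentioned in the article.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)                        # for numerical stability only
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(one_hot, probs):
    """Loss for one sample: minus the sum over classes of y_j * log(y_hat_j)."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs))

logits = [1.0, 2.0, 3.0]      # outputs of the last layer before softmax
target = [0, 0, 1]            # the true class is the third one

probs = softmax(logits)
loss = categorical_cross_entropy(target, probs)

print([round(p, 4) for p in probs])
print(round(loss, 4))
```

Because the target is one-hot, only the predicted probability of the true class contributes to the loss.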
When to use categorical cross-entropy and sparse categorical cross-entropy?
If the target column is one-hot encoded into classes like 0 0 1, 0 1 0, 1 0 0, then use categorical cross-entropy. If the target column is numerically encoded into classes like 1, 2, 3, 4, …, n, then use sparse categorical cross-entropy.
Which is Faster?
Sparse categorical cross-entropy is faster than categorical cross-entropy, since it works with the integer labels directly instead of full one-hot vectors.
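The two variants compute the same quantity and differ only in how the target is encoded, which the following sketch illustrates. The function names are illustrative rather than a library API, the probabilities are made up, and the integer label is zero-based here, which is an assumption of this example.

```python
import math

def categorical_cross_entropy(one_hot, probs):
    """Target given as a one-hot vector: sum over all classes."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs))

def sparse_categorical_cross_entropy(label, probs):
    """Target given as an integer class index: a single lookup."""
    return -math.log(probs[label])

probs = [0.1, 0.7, 0.2]        # predicted probabilities for 3 classes

one_hot_target = [0, 1, 0]     # one-hot encoding of the second class
integer_target = 1             # the same class as a zero-based index

dense_loss = categorical_cross_entropy(one_hot_target, probs)
sparse_loss = sparse_categorical_cross_entropy(integer_target, probs)

print(dense_loss, sparse_loss)
```

The sparse version needs one lookup per sample instead of a sum over every class, and it avoids storing a one-hot vector for each label.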
Conclusion
In this article, we learned about different types of loss functions. The key takeaways from the article are:
We learned the importance of loss function in deep learning.
We learned the difference between loss and cost.
The mean absolute error is robust to outliers.
Binary cross-entropy is used for binary classification.
Sparse categorical cross-entropy is faster than categorical cross-entropy.
So, this was all about loss functions in deep learning. Hope you liked the article.
Frequently Asked Questions
Q1. What is a loss function?
A. A loss function is a mathematical function that quantifies the difference between predicted and actual values in a machine learning model. It measures the model’s performance and guides the optimization process by providing feedback on how well it fits the data.
Q2. What is loss and cost function in deep learning?
A. In deep learning, “loss function” and “cost function” are often used interchangeably. They both refer to the same concept of a function that calculates the error or discrepancy between predicted and actual values. The cost or loss function is minimized during the model’s training process to improve accuracy.
Q3. What is L1 loss function in deep learning?
A. L1 loss function, also known as the mean absolute error (MAE), is commonly used in deep learning. It calculates the absolute difference between predicted and actual values. L1 loss is robust to outliers but does not penalize larger errors as strongly as other loss functions like L2 loss.
Q4. What is loss function in deep learning for NLP?
A. In deep learning for natural language processing (NLP), various loss functions are used depending on the specific task. Common loss functions for tasks like sentiment analysis or text classification include categorical cross-entropy and binary cross-entropy, which measure the difference between predicted and true class labels for classification tasks in NLP.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.