An Introduction To Autoencoders For Beginners

Updated in March 2024.

This article was published as a part of the Data Science Blogathon


Autoencoders are unsupervised learning models that use neural networks to perform representation learning. In the context of machine learning, representation learning means embedding the components and features of the original data in some low-dimensional structure for better understanding, visualization, and extraction of meaningful information. These low-dimensional vectors can tell us a great deal about our data, such as how close two instances of the dataset are, or what structure and patterns the dataset contains.

Table of Contents

Current scenario of the industry

Learning with unlabelled data

Introduction to Autoencoders

History of Autoencoders in papers

Introduction to Variational Autoencoders

VAE Variants

Applications of Autoencoders


Current scenario of the industry

In this big-data era, where petabytes of data are generated and processed by leading social networking sites and e-commerce giants, we are living in a world of data abundance. Our machine learning algorithms have so far mainly exploited labeled datasets, which are rare and costly. Most of the data generated is unstructured and unlabelled, so it is high time the machine learning community focused on unsupervised learning algorithms, and not just the supervised ones, to unlock the true potential of AI and machine learning.

“If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake” – Yann LeCun

So why not eat the whole cake?

Learning with unlabelled data

A representation of data is really a mapping. If we have a data point x ∈ X, and we have a function f: X → Z for some data space Z, then f is a representation. The new point f(x) = z ∈ Z is sometimes called a representation of x. A good representation makes downstream tasks easier.

Introduction to Autoencoders

Autoencoders, also known as self-encoders, are networks that are trained to reproduce their own inputs. They fall under unsupervised learning algorithms; some researchers describe them as self-supervised, because for a training example x the label is x itself. In a general sense, though, they are considered unsupervised, since there are no classification or regression labels.

If an autoencoder does this perfectly, then the output vector x' is equal to the input vector x. The autoencoder is designed as a special two-part structure: the encoder and the decoder.

x' = Decoder(Encoder(x))

The model is trained using a reconstruction loss, which aims to minimize the difference between x and x'. We can define the reconstruction loss as something like MSE(x, x') if the inputs are real-valued. The encoder output z = Encoder(x) usually has fewer dimensions than x, which is why autoencoders are also called bottleneck neural networks. We are forcing a compressed knowledge representation.
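The reconstruction objective described above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not a production model: the encoder and decoder are linear and share one weight vector w (a "tied-weight" autoencoder), the code z is a single number, and the toy data lies on the line x2 = 2 * x1. The function names, the data, and the learning rate are illustrative choices that do not come from the original article.

```python
# Toy linear autoencoder: 2-D input -> 1-D code z -> 2-D reconstruction x'.
# Assumption: encoder and decoder share the same weight vector w (tied weights).

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def encode(x, w):
    """Encoder: compress x into a single number z = w . x."""
    return dot(w, x)

def decode(z, w):
    """Decoder: expand the code z back to input space, x' = z * w."""
    return [z * wi for wi in w]

def reconstruction_loss(data, w):
    """Mean over samples of the squared error ||x - x'||^2."""
    total = 0.0
    for x in data:
        x_hat = decode(encode(x, w), w)
        total += sum((xi - hi) ** 2 for xi, hi in zip(x, x_hat))
    return total / len(data)

def gradient(data, w):
    """Gradient of reconstruction_loss with respect to w."""
    grad = [0.0] * len(w)
    for x in data:
        z = encode(x, w)
        residual = [xi - hi for xi, hi in zip(x, decode(z, w))]
        rw = dot(residual, w)
        for j in range(len(w)):
            grad[j] += -2.0 * (residual[j] * z + rw * x[j])
    return [g / len(data) for g in grad]

# Toy data: every point lies on the line x2 = 2 * x1, so one number suffices.
data = [[1.0, 2.0], [2.0, 4.0], [-1.0, -2.0], [0.5, 1.0]]
w = [0.5, 0.5]
initial_loss = reconstruction_loss(data, w)

for _ in range(500):  # plain gradient descent with a small step size
    g = gradient(data, w)
    w = [wi - 0.01 * gi for wi, gi in zip(w, g)]

final_loss = reconstruction_loss(data, w)
print("loss before:", initial_loss, "after:", final_loss)
```

Because the data here is effectively one-dimensional, a one-number bottleneck can reconstruct it almost perfectly; with real data the bottleneck forces the network to keep only the most useful information.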

The encoder is used to map unseen data to a low-dimensional space Z, and a good representation is one in which no important information is lost during compression. Autoencoders resemble Principal Component Analysis (PCA), which is itself a dimensionality reduction algorithm; the difference is that PCA is linear, whereas autoencoders are non-linear because of their neural-network architecture. To build intuition for the latent space, consider an example where the observed variable x is something like (number of people on the beach, ice-cream sales, daily temperature), whereas the latent variable z is something like the tilt of the Earth’s axis (i.e. the season of the year): knowing the season, we can almost predict the number of visitors on the beach, ice-cream sales, and so on.

History of Autoencoders in papers

Below are some of the first research papers to introduce AEs to the machine learning world:

A learning algorithm for Boltzmann machines, D. H. Ackley, G. E. Hinton, T. J. Sejnowski. Cognitive Science, 1985. Describes a simple neural network trained by self-supervision.

Learning representations by back-propagating errors, D. E. Rumelhart, G. E. Hinton, R. J. Williams. Nature, 1986. “We describe a new learning procedure, back-propagation, for networks of neuron-like units”.

Connectionist learning procedures, G. E. Hinton. Machine Learning, 1990. Describes the “self-supervised” bottleneck neural network.

Introduction to Variational Autoencoders

Variational Autoencoders (VAEs) are autoencoders that exploit a sampling technique and Kullback-Leibler regularisation. They aim to make the latent space smoother, i.e. a small change in x leads to a small change in the latent code z, and a small change in z leads to a small change in the reconstructed x. A latent space needs to be smooth, with plausible points throughout, to be effective, and that is what a VAE tries to achieve. In a VAE, the encoder outputs not z itself but two parameters, mu and sigma. A sampling operation then chooses z from these parameters, and the decoder takes z as before.

A good sampling technique provides a good reconstruction of data points, and also a good reconstruction of points near the data points. The process ensures that every point that’s close to the latent location where you encoded [the input x, i.e. z mean] can be decoded to something similar to [x], thus forcing the latent space to be continuously meaningful. Any two close points in the latent space will decode to highly similar images. Continuity, combined with the low dimensionality of the latent space, forces every direction in the latent space to encode a meaningful axis of variation of the data, making the latent space very structured and thus highly suitable for manipulation via concept vectors. The pseudo-code for sampling is shown below:

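z_mean, z_log_variance = encoder(x)
# epsilon is random noise drawn from a standard normal distribution
# exp(0.5 * log-variance) converts the log-variance into a standard deviation
z = z_mean + exp(0.5 * z_log_variance) * epsilon
x_hat = decoder(z)
model = Model(x, x_hat)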
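The sampling step can be made concrete with a small, runnable sketch in plain Python. It assumes the encoder's second output is a log-variance, so the standard deviation is exp(0.5 * log-variance), and it draws the noise with the standard library's random module. The function name sample_latent and the example numbers are illustrative, not taken from any particular library.

```python
import math
import random

def sample_latent(z_mean, z_log_variance, rng=random):
    """Draw z = mean + std * epsilon, with epsilon ~ N(0, 1) per dimension."""
    z = []
    for mean, log_var in zip(z_mean, z_log_variance):
        std = math.exp(0.5 * log_var)   # log-variance -> standard deviation
        epsilon = rng.gauss(0.0, 1.0)   # fresh noise for every dimension
        z.append(mean + std * epsilon)
    return z

rng = random.Random(0)  # fixed seed so the sketch is repeatable
z = sample_latent([0.0, 1.0], [0.0, -2.0], rng)
print(z)
```

When the log-variance is very negative the standard deviation shrinks towards zero and z stays close to z_mean, so the model behaves more like a classic autoencoder.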

In the VAE, we want the data to be distributed as a normal in 𝑧-space, in particular a standard multivariate normal, 𝑁(0, I). When we run the decoder on points drawn from that distribution, we can be confident that they correspond to typical 𝑥 points. There are no “holes”, because the encoder is working hard to compress the data, so it won’t waste space.


“The parameters of a VAE are trained via two loss functions: a reconstruction loss that forces the decoded samples to match the initial inputs, and a regularization loss that helps learn well-formed latent spaces and reduce overfitting to the training data.” – Chollet.

The regularisation loss asks the encoder to put the data into a normal distribution in the latent space.

Kullback-Leibler divergence

The KL loss is a regularisation term in our VAE loss. As always, we can tune the regularisation by multiplying the KL term by a scalar. If it is too strong, the model will collapse; if it is too weak, the model becomes equivalent to a classic AE.

1. The sampling procedure samples from a multivariate Normal 𝑁(𝜇, 𝛴) to get each point 𝑧.

2. The regularisation procedure adds a loss to push the latent distribution to be similar to a standard multivariate Normal 𝑁(0, I).
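For a diagonal Gaussian encoder, the KL term has a well-known closed form: KL = -0.5 * sum(1 + log_var - mean^2 - exp(log_var)). The sketch below computes it in plain Python and combines it with a reconstruction loss through a scalar weight. The function names and the beta parameter are illustrative assumptions, and the diagonal-covariance form is assumed rather than stated in the article.

```python
import math

def kl_to_standard_normal(z_mean, z_log_variance):
    """KL divergence from N(mean, diag(exp(log_var))) to N(0, I)."""
    total = 0.0
    for mean, log_var in zip(z_mean, z_log_variance):
        total += 1.0 + log_var - mean ** 2 - math.exp(log_var)
    return -0.5 * total

def vae_loss(reconstruction_loss, z_mean, z_log_variance, beta=1.0):
    """Total loss: reconstruction term plus the KL term scaled by beta."""
    kl = kl_to_standard_normal(z_mean, z_log_variance)
    return reconstruction_loss + beta * kl

# The KL term is zero when the encoder already outputs a standard normal.
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))
# Moving the mean away from zero is penalised.
print(kl_to_standard_normal([1.0], [0.0]))
```

Setting beta to zero removes the regularisation entirely, while a large beta pushes every encoding towards the standard normal regardless of the input.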

VAE variants

BetaVAE (stronger regularisation for disentangled representation).

Contractive AE (aims for smoothness with a different regularisation).

Conditional VAE (decoder maps (𝑧, 𝑐) → 𝑥, where 𝑐 is chosen, e.g. 𝑐 specifies which digit to generate and 𝑧 specifies the style).

Reference :  Keras Autoencoder

Applications of Autoencoders

Denoising images (Conv AE)

Anomaly detection on time series (1D Conv AE)

Network intrusion detection by anomaly detection (VAE Encoder only)

Generation of video game levels and music (Conv VAE Decoder only).


Finally, I feel that Autoencoders and Variational Autoencoders are among the most powerful unsupervised learning techniques that every data scientist should be aware of. These models do have their own limitations, however; for example, they require comparatively large datasets for training.

The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.



An Introduction To RNN For Beginners


Have you ever used “Grammarly” or a “text summarizer” and wondered how Grammarly detects grammatical mistakes, or how “Apple’s Siri” and “Amazon Alexa” understand what you say? There is an interesting algorithm working behind all these applications, named the “Recurrent Neural Network”, or RNN for short.

The first time I came across RNNs, I was completely mixed up. How can a NN remember things? Why do we even need RNN while we had Feedforward and convolutional Neural networks? After a lot of research and hard work, I somehow managed to understand this algorithm and in this article, I want to convey all my knowledge regarding this algorithm so that you feel comfortable while reading the actual research paper.

Recurrent Neural Network

Let’s start this algorithm with a question – “love I go to the gym to” did this make any sense to you? Not really. Now read this “I love to go to the gym”, this makes perfect sense. A little jumble in the words made the sentence incomprehensible. If a human brain can’t understand this can a computer encode this? It will have a really tough time dealing with such sentences.

From the above example, we can understand that sequence of a text is very important. Data, where the order or the sequence of data is important, can be called sequential data. Text, Speech, and time-series data are a few examples of sequential data.

Why do we need RNNs when we already have Feedforward Neural Networks?

Suppose we have some reviews of cricket players, and our task is to predict if the review is positive or negative. The first step before making a model is to convert all the reviews (textual data) into machine-understandable form. We can use various techniques like One Hot Encoding or deep learning techniques like Word2vec.

Suppose the first review, sentence-1, contains 5 words or tokens.

Similarly, sentence-2, “He is great”, has 3 words, but the input layer of the architecture is fixed at 5. Hence there is no direct way to feed this data into the network. You might ask why we don’t pad every sentence with zeros up to the length of the longest sentence. This would solve the problem, but think of the number of weights the network would then need: it grows with both the maximum sentence length and the vocabulary size, and quickly becomes enormous.

Suppose the maximum length of a sentence is 10 (which is not realistic, it will be much bigger in real-world applications).

Let the number of words in corpus= 10k (considering a very small corpus)

Then each input will become 100k dimensional and with just 10 neurons in the hidden layer, our number of parameters becomes 1 million! To have a proper network means having billions of parameters. To overcome this we need to have a network with weight sharing capabilities.
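The arithmetic above is easy to check directly. The snippet below reproduces the one-million figure and, for comparison, counts the weights of a simple recurrent layer under the assumption that it sees one one-hot encoded word per time step and reuses the same weights at every step. Bias terms are ignored in both counts, and the variable names are illustrative.

```python
max_sentence_length = 10   # words per (padded) sentence
vocabulary_size = 10_000   # one-hot dimension of a single word
hidden_neurons = 10

# Feedforward network: the whole padded sentence is one big input vector.
input_dimension = max_sentence_length * vocabulary_size
feedforward_weights = input_dimension * hidden_neurons

# Recurrent layer (assumption): one word per step, weights shared across steps.
recurrent_weights = (vocabulary_size * hidden_neurons      # input -> hidden
                     + hidden_neurons * hidden_neurons)    # hidden -> hidden

print(input_dimension)       # 100000
print(feedforward_weights)   # 1000000
print(recurrent_weights)     # 100100
```

Note that the recurrent count does not depend on the sentence length at all, which is exactly what weight sharing buys.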

This is where RNN comes into the picture. They use the same weights for each element of the sequence, decreasing the number of parameters and allowing the model to generalize to sequences of varying lengths.

In a standard neural network, all the inputs and outputs are independent of each other. However, in certain cases, such as predicting the next word of a phrase, the preceding words are essential. RNNs were therefore created; they use a hidden state to carry information from earlier steps and so overcome this problem.

RNN Model Overview

In the diagram above, the “Folded” part represents the neural network. We do not pass the entire sentence to it; rather, we pass one word at a time, and each word gives us one output. The network also produces an activation (its hidden state), which is passed on to the next time step, as you can see in the diagram. Both diagrams represent the same thing.

Let’s understand this with the help of an example.

Suppose you are doing a “Named Entity Recognition” project, where you want to predict the name of an entity. The sentence you pass to your model is “Chirag worked at Microsoft in India”. Instead of passing this entire sentence at once, we will only pass the first word in our model.

This RNN takes an input activation, which is initially nothing but a zero matrix, and produces an output activation that is then passed on to the next time step.

At t=2, it takes “worked” as input and produces an activation as well as an output. Here 0 means it is not an entity. It then takes the third word at t=3 and continues repeating the process until we reach the end of the sentence.

A few things to note here: the same block is repeated through time, and the order in which we pass the words matters. For example, in the diagram above the prediction “Organization” is made by considering not just the word Microsoft but also all the words that have occurred in the sentence so far, because we pass the activations on to the next time step repeatedly.
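The step-by-step process described above can be sketched as a tiny forward pass in plain Python. This is a hedged illustration only: it assumes the common update rule h_t = tanh(W_x x_t + W_h h_(t-1) + b), uses made-up weights rather than trained ones, encodes three words as one-hot vectors, and omits the output layer that would turn each hidden state into an entity label.

```python
import math

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One time step: h_t = tanh(W_x x_t + W_h h_prev + b)."""
    h_t = []
    for i in range(len(b)):
        total = b[i]
        total += sum(W_x[i][j] * x_t[j] for j in range(len(x_t)))
        total += sum(W_h[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(total))
    return h_t

def run_rnn(sequence, W_x, W_h, b):
    """Feed the sequence one element at a time, reusing the same weights."""
    h = [0.0] * len(b)          # the initial activation is all zeros
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, W_x, W_h, b)
        states.append(h)
    return states

# Made-up weights: 3-dimensional one-hot input, 2-dimensional hidden state.
W_x = [[0.5, -0.3, 0.8], [0.1, 0.9, -0.4]]
W_h = [[0.2, -0.1], [0.4, 0.3]]
b = [0.0, 0.0]

chirag, worked, microsoft = [1, 0, 0], [0, 1, 0], [0, 0, 1]
states = run_rnn([chirag, worked, microsoft], W_x, W_h, b)
shuffled = run_rnn([microsoft, worked, chirag], W_x, W_h, b)

print(states[-1])    # hidden state after reading the words in order
print(shuffled[-1])  # a different state when the order is reversed
```

The two final states differ even though both runs saw the same three words, which is how the order of the sequence ends up influencing the prediction.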

Types of RNN and their application

Many to one: Suppose you want to build a model that predicts the rating of a movie based on its reviews (movie rating prediction). The output is always a single value, the rating of the movie, while the input may be of any size, since reviews can be of any length. Other applications include sentiment analysis, emotion identification, etc.

One to many: One example is a model whose job is to generate a baby name from a given input. For example, if we pass “male” as the input, the model should generate a male name, and likewise for “female”. Here the input size is one and the output size can be anything. One thing to note is that the output from each time step is also passed as an input to the next time step; that is how the model knows which character has occurred first and which character to produce next.

Many to many: In language translation, for example English to French, the input can be of any length and the output can also be of any length. Here the output is generated only after all the input words have been passed in (fig-4 in the above diagram). In this type of model, the first part, which takes the input, is known as the “Encoder”, and the second part, which produces the output, is called the “Decoder”; we will understand more about these two later.


I hope this article provided you with the basic intuition you need. In the next article, we will look at what this model does under the hood: how backpropagation works in an RNN, along with detailed descriptions of LSTMs and GRUs. To summarize, we learned what sequential data is and how textual data can be used to build applications such as speech recognition and image caption generators.


Top 10 Tips For Beginners To Learn Machine Learning

At its core, Machine Learning functions to answer questions by learning from data.

It may sound simple, but knowing Machine Learning requires that you have the perseverance to learn concepts that you might be oblivious about and that you invest a significant amount of your time to have a firm grasp of the principles behind it.

To help you with your journey towards joining the Machine Learning bandwagon, here are the top ten tips for beginners to learn Machine Learning. 

1. Study the Numbers

If you’re like some people who tend to shy away from numbers and statistics, then I have some good news for you. 

You don’t need to be an expert statistician to process your data for machine learning purposes. 

However, you still need to understand some statistical concepts to help you know how and when to apply or use your data effectively for machine learning.

Some of the statistical concepts that you can prioritize learning are:

•  Mean and distribution

•  Statistical decision theory

•  Regression

•  Mean Square Error, Least Squares
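To make a few of these concepts concrete, here is a short plain-Python sketch that computes a mean, fits a straight line by ordinary least squares, and measures the mean squared error of the fit. The data points and function names are invented for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def least_squares_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def mean_squared_error(ys, predictions):
    return mean([(y - p) ** 2 for y, p in zip(ys, predictions)])

# Invented data that lies exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = least_squares_line(xs, ys)
predictions = [slope * x + intercept for x in xs]
print(slope, intercept, mean_squared_error(ys, predictions))
```

With noisy real-world data the error would not be zero, and comparing that error across candidate models is the basic idea behind regression.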

2. Learn a Programming Language

Learning a programming language can seem like a long and painful process, but it doesn’t have to be. The key is to find a programming language that is popular, easy to learn, and commonly used for data analysis and machine learning.

If you’re new to programming and how it’s applied in machine learning, you can learn through a machine learning course.

These courses alone can help you learn how to develop machine learning algorithms using concepts such as time series modeling and regression.

With that said, since a programming language is used to analyze and manipulate data for statistical purposes, you must learn the language of machine learning.

3. Set Your Goals

Machine learning is a rich and broad field that will continue to expand in the coming years. Because of this, there is a good chance that you’ll get overwhelmed and lose focus as you learn it.

To keep this from happening, you need to set concrete goals first before diving into machine learning. 

This can help you stay on track, avoid wasting your time and keep you moving forward.

You can think about which specific sector in the industry you’ll focus on, the tools, the problems that you’d like to solve through Machine Learning, etc. You can use these as your guiding compass in your journey towards mastering Machine Learning.

4. Understand the Basics of Machine Learning

Machine learning deals with processing a lot of data, and it involves specific steps that can be complicated for the untrained.

As a beginner, you will need to invest some time and effort in understanding the basics of machine learning and data science. 

You need to understand the basic concepts of essential aspects in machine learning like data science, programming, algorithms, and more. 

5. Perform Exploratory Data Analysis 

Exploratory data analysis deals with studying a dataset to understand the shape of data, feature correlations, and signals within the data that can be used to build predictive models.

Performing this analysis can help you determine how to improve your products, understand user behavior, and check if the data can give useful signals for data product building. 

It can include a bit of lightweight modeling to help you determine the importance of various features within datasets, and it’s one of the essential competencies of startup data scientists.
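One common first step in exploratory analysis is to measure how strongly two features move together. The sketch below computes the Pearson correlation coefficient in plain Python on an invented toy dataset; the column names and values are hypothetical and serve only to illustrate the calculation.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance_x = sum((x - x_bar) ** 2 for x in xs)
    variance_y = sum((y - y_bar) ** 2 for y in ys)
    return covariance / math.sqrt(variance_x * variance_y)

# Hypothetical columns of a small dataset.
hours_studied = [1.0, 2.0, 3.0, 4.0, 5.0]
exam_score = [52.0, 58.0, 65.0, 70.0, 79.0]
hours_of_tv = [5.0, 4.0, 3.0, 2.0, 1.0]

print(pearson_correlation(hours_studied, exam_score))   # close to +1
print(pearson_correlation(hours_studied, hours_of_tv))  # exactly opposite trend
```

A value near +1 or -1 suggests a feature carries a strong linear signal, while a value near 0 suggests little linear relationship.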

6. Employ Unsupervised Learning Techniques  

Here’s a concise version of some of the crucial things that you need to know:

•  Autoencoders.

They allow you to encode your features so that they take up less space but still represent the same information.

•  Clustering.

Using a clustering algorithm helps you group your data points into specific clusters.

•  Feature separation techniques.

This helps you see how each of your features contribute to dataset formation, determine which ones are crucial, and the role each one plays in your overall data.
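As a small illustration of clustering, the sketch below runs a one-dimensional version of k-means in plain Python. It is a simplified teaching example under stated assumptions: the points are single numbers, the starting centers are supplied by the caller, and a fixed number of iterations is used instead of a convergence test. The data and names are invented.

```python
def kmeans_1d(points, centers, iterations=10):
    """Very small k-means for 1-D data with caller-supplied starting centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, clusters = kmeans_1d(points, centers=[min(points), max(points)])
print(centers)
print(clusters)
```

No labels were provided, yet the algorithm separates the small values from the large ones, which is the essence of unsupervised grouping.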


7. Develop supervised learning models

The goal of supervised learning is to use an algorithm to learn and estimate the mapping function well enough that when you add new input, the algorithm can predict the output variables for the specific data. 

You can think of the process where the algorithm learns from your training data as a teacher supervising his or her students’ learning process. 

The learning process stops once the algorithm reaches an adequate level of success. 

8. Learn How to Handle Big Data Systems

You can have access to significant amounts of data that you can use for algorithms to come up with the valuable output. 

That being said, this means you need to know how to handle big data systems effectively.  

You need to understand how to store substantial amounts of data and efficiently access and process them.

Doing so can help you create solutions that you can implement in practice and not just theory. 

9. Look Into Deep Learning Models 

The deep learning algorithm is built with connected layers that allow its neural network to learn increasingly complex data features as it goes through each layer. 

With deep learning, you can turn predictions into actionable results since it can perform knowledge-based predictions and pattern discovery. 

Also, by feeding deep learning with big data, you can get remarkable results in terms of your management, innovation, sales, and productivity.

10. Do and Complete a Data Project

Finally, you will need to complete a data project to apply what you have learned so far from the nine tips above. 

Start small and look for sample machine learning projects for beginners like a social media sentiment analysis using Twitter or Facebook dataset.  

After all, what is machine learning without real-life application, right?

Final Thoughts

Machine learning is an expanding field that is showing no signs of stopping any time soon, so get on board and follow these ten tips for beginners. 

A Complete Guide On Docker For Beginners

This article was published as a part of the Data Science Blogathon


It is not difficult to create a machine learning model that operates on our computers. It is more difficult when you are working with a customer who wants to use the model at scale, that is, a model that can scale and perform on all types of servers all over the world. After you have finished designing your model, it may function smoothly on your laptop or server, but not so well on other platforms, such as when you move it to the production stage or a different server. Many things can go wrong, such as performance issues, the application crashing, or the application not being effectively optimized.

Sometimes it is not the model that is the issue but the requirement to recreate the entire stack. Docker enables you to easily replicate the training and running environment for the machine learning model from any location. Docker allows you to package your code and dependencies into containers that can be transferred to different hosts, regardless of hardware or operating system.

Developers can use Docker to keep track of different versions of a container image, see who produced it and with what, and roll back to prior versions. In addition, even if one of the services of your machine learning application is being upgraded, is being fixed, or is down, the rest of the application can continue to run. To update an output message used throughout the application, you do not have to update the whole application and disrupt other services.


Let’s dig in and start investigating Docker.

What is Docker?

It is a software platform that makes developing, executing, managing, and distributing applications easier. This is accomplished by virtualizing the operating system of the computer on which it is installed.

Docker’s first edition was launched in 2013.

Docker is written in the Go programming language.

Given the rich set of functionality Docker offers, it has been widely adopted by some of the world’s leading organizations and universities, such as Visa, PayPal, Cornell University and Indiana University (just to name a few), to run and manage their applications.

Now let us try to understand the problem and the solution offered by Docker.


Let us imagine you want to host three separate Python-based applications on a single server (which could be either a physical or a virtual machine). Each of these programs uses a different version of Python, and the libraries and dependencies vary from application to application.

We are unable to host all three applications on the same workstation, since their conflicting versions of Python and dependencies cannot easily be installed side by side on the same machine.


Let’s see what we could do if we didn’t use Docker to tackle this problem. In this case, we might solve the problem with the help of three physical machines or by using a single physical computer that is powerful enough to host and run three virtual machines.

Both approaches would help us install various versions of Python, and their associated dependencies, on each of these machines.

Regardless of which solution we chose, the costs of purchasing and maintaining the hardware are substantial.

Let’s look at how Docker might be a viable and cost-effective solution to this issue.

To comprehend this, we must first examine its functionality.

In simple terms, the system with Docker installed and running is referred to as a Docker Host or Host.

As a result, anytime you want to deploy an application on the host, it will build a logical entity to host that application. This logical object is known as a Container or a Docker Container in the Docker nomenclature.

There is no operating system installed or running on a Docker Container. However, it includes a virtual replica of the process table, network interface(s), and file system mount point(s).

These are inherited from the host operating system on which the container is hosted and executing. The kernel of the host’s operating system, on the other hand, is shared by all the containers executing on it.

This allows each container on the same host to be isolated from the others. As a result, numerous containers with varied application requirements and dependencies can run on the same host, as long as their operating system requirements are the same.

In other words, rather than virtualizing hardware components, Docker virtualizes the operating system of the host on which it is installed and running.

Pros and Cons of using Docker

Docker allows numerous programs with varied requirements and dependencies to be hosted on the same host as long as they use the same operating system.

Containers are typically a few megabytes in size and occupy relatively little disk space, allowing many applications to be hosted on the same host.

Robustness: there is no operating system installed on a container. As a result, it uses very little memory compared to a virtual machine (which has a complete operating system installed and running on it). This also cuts the boot time to only a few seconds, whereas it takes several minutes to start a virtual machine.

Cost: Docker is less demanding in terms of the hardware needed to run it, so hardware costs are lower.

On the downside, we cannot host applications with different operating system needs together on the same Docker Host. Let’s pretend we have four separate programs, three of which require a Linux-based operating system and one of which requires a Windows-based operating system. The three apps that require a Linux-based OS can be on a single Docker Host. The application that requires a Windows-based OS must be on a separate Docker Host.

Docker Core Components

Docker Engine is one of the core components and is responsible for overall functioning.

It is a client-server based application with three main components:

Server

REST API

Client


The Server runs dockerd (the Docker daemon), which is nothing more than a process. On the Docker platform, it is in charge of creating and managing Docker Images, Containers, Networks, and Volumes.

The REST API defines how applications can interface with the server and tell it how to complete their tasks.

The Client is a command-line interface that allows users to communicate with Docker by issuing commands.

Docker Terminologies

Let’s have a look at some of the terms used in the Docker world.

Docker Images and Docker Containers are the two most important items you’ll encounter while working with Docker regularly.

In simple terms, a Docker Image is a template that includes the program and the dependencies needed to run it on Docker.

A Docker Container, on the other hand, is a logical entity, as previously indicated. It is a functioning instance of the Docker Image in more technical terms.

Docker Hub

Docker Hub is the official online repository where we can find all of the Docker Images that we can use.

If we like, we can also use Docker Hub to store and distribute our custom images. We could also make them public or private, depending on our needs.

Note: Free users can keep one Docker Image private. More than one requires a paid subscription.


Before we get our hands dirty with Docker, one last thing we need to know is that we need to have it installed.

The official Docker CE installation directions are linked below. These instructions for installing Docker on your PC are straightforward.

Do you wish to skip installation and start practicing Docker? 

If you would rather not install Docker, or don’t have enough resources on your PC, don’t panic – there’s a solution to your problem.

Play with Docker, an online playground for Docker, is the best place to start. It enables users to immediately practice Docker commands without the need to install anything on their PC. The best part is that it’s easy to use and completely free.

Docker Commands

It’s finally time to get our hands dirty with Docker commands, as we’ve all been waiting for.

docker create

The docker create command is the first command we’ll look at.

We can use this command to build a new container.

The following is the syntax for this command:

docker create [options] IMAGE [commands] [arguments]

Please keep in mind that everything placed in square brackets is optional. It holds for all of the instructions presented in this guide.

The following are some examples of how to use this command:

$ docker create fedora
02576e880a2ccbb4ce5c51032ea3b3bb8316e5b626861fc87d28627c810af03

The docker create command in the preceding example would create a new container using the most recent Fedora image.

Before building the container, Docker checks whether the latest official Fedora image is available on the Docker Host. If the most recent image isn’t available on the Docker Host, the container is created from the Fedora image downloaded from Docker Hub. If the Fedora image is already present on the Docker Host, the container is created from that image.

On successful creation of the container, Docker returns the container ID. The long string in the example above is the container ID returned by Docker.

A container ID is assigned to each container. When executing various activities on the container, such as starting, stopping, resuming, and so on, we refer to it by its container ID.

Let’s look at another example of the docker create command, this time with parameters and command supplied to it.

$ docker create -t -i ubuntu bash
30986b73dc0022dbba81648d9e35e6e866b4356f026e75660460c3474f1ca005

The docker create command in the preceding example builds a container using the Ubuntu image (if the image isn’t available on the Docker Host, it will download the most recent image from the Docker Hub before building the container).

The -t and -i options tell Docker to allocate a terminal to the container and keep its standard input open, so that the user can interact with it. The trailing bash argument tells Docker to run the bash command every time the container starts.
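When Docker commands are driven from a script, it helps to assemble the argument list in the same order as the syntax shown earlier: options, then the image, then the command. The following Python sketch only builds and prints that list; it does not run Docker. The helper name docker_create_command is hypothetical, and handing the resulting list to subprocess.run would execute it only on a machine where Docker is installed.

```python
def docker_create_command(image, command=None, interactive=False, tty=False):
    """Build the argument list for: docker create [options] IMAGE [command]."""
    argv = ["docker", "create"]
    if tty:
        argv.append("-t")        # allocate a terminal for the container
    if interactive:
        argv.append("-i")        # keep standard input open
    argv.append(image)
    if command:
        argv.extend(command)     # command to run each time the container starts
    return argv

argv = docker_create_command("ubuntu", ["bash"], interactive=True, tty=True)
print(" ".join(argv))  # docker create -t -i ubuntu bash
```

Keeping the command as a list of separate arguments, rather than one long string, avoids quoting problems when an argument contains spaces.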

docker ps

The docker ps command is the next one we’ll look at.

We can use the docker ps command to see all the containers currently executing on the Docker Host.

$ docker ps
CONTAINER ID   IMAGE    COMMAND   CREATED          STATUS              PORTS   NAMES
30986b73dc00   ubuntu   "bash"    45 minutes ago   Up About a minute           elated_franklin

It only shows the containers that are running on the Docker Host right now.

To view the containers created on this Docker host, regardless of their current condition, whether it is running or not, you must use the -a option, which lists all containers created on this Docker Host.

$ docker ps -a
CONTAINER ID   IMAGE    COMMAND       CREATED             STATUS          PORTS   NAMES
30986b73dc00   ubuntu   "bash"        About an hour ago   Up 29 minutes           elated_franklin
02576e880a2c   fedora   "/bin/bash"   About an hour ago   Created                 hungry_sinoussi

Let us understand the above output of the docker ps command.

CONTAINER ID: A unique alphanumeric string associated with each container.

IMAGE: Docker Image used to create the container.

COMMAND: The application-specific command that runs when the container starts.

CREATED: It provides the elapsed time since the creation of the container.

STATUS: It provides the current status of the container.

If the container is running, it will display Up along with time elapsed. (Up About an hour or Up 5 minutes)

If the container is not running, the status will be Exited, with the exit status code enclosed in round brackets and the time elapsed since it exited. (Exited (0) 2 weeks ago or Exited (137) 10 seconds ago)

PORTS: It provides port mappings described for the container.

NAMES: In addition to the CONTAINER ID, each container is given a unique name. A container can be identified by its container ID or by its unique name. By default, Docker generates and assigns a unique name to each container. If you wish to choose the container's name yourself, use the --name option with the docker create or docker run commands.

I hope this helps you better grasp what the docker ps command returns.

docker start

The command helps to start any stopped containers.

docker start [options] CONTAINER ID/NAME [CONTAINER ID/NAME…]

To start the container, you can specify the first unique characters of the container ID or its name.

Below you can look at the example.

$ docker start 30986
$ docker start elated_franklin

docker restart

The command helps to restart any running containers.

docker restart [options] CONTAINER ID/NAME [CONTAINER ID/NAME…]

Similarly, we can restart by specifying the first unique characters of the container ID or its name.

Look at the examples using this command

$ docker restart 30986
$ docker restart elated_franklin

docker stop

The command helps to stop any running containers.

docker stop [options] CONTAINER ID/NAME [CONTAINER ID/NAME…]

Its syntax mirrors that of the start command.

You can specify the first unique characters of the container ID or its name to stop the container.

Have a look at the below examples

$ docker stop 30986
$ docker stop elated_franklin

docker run

It first creates the container and then starts it. In summary, it is a combination of the docker create and start commands.

It has a similar syntax to docker create.

docker run [options] IMAGE [commands] [arguments]

$ docker run ubuntu
30fa018c72682d78cf168626b5e6138bb3b3ae23015c5ec4bbcc2a088e67520

In the above example, Docker creates a container using the latest Ubuntu image and starts it, and the container stops immediately. We do not get a chance to interact with it.

To interact with the container, pass the -it options to the docker run command.

$ docker run -it ubuntu

Type exit in the terminal to come out of the container.

docker rm

We use this command to delete a container.

docker rm [options] CONTAINER ID/NAME [CONTAINER ID/NAME...]

$ docker rm 30fa elated_franklin

In the above example, we are instructing docker to delete two containers in a single command. We specify the ID for the first and the name for the second container for deletion.

The container should be in a stopped state to delete it.

docker images

The command lists out all docker images present on the docker host.

$ docker images

REPOSITORY: It describes the unique name of the docker image.

TAG: Each image is associated with a unique tag that represents a version of the image.

A tag is written as a word, a set of numbers, or a string of alphanumeric characters.

IMAGE ID: It is a string of alphanumeric characters associated with each image.

CREATED: It provides the elapsed time since the image was created.

SIZE: It provides the size of the image.

docker rmi

This command allows us to remove images from the docker host.

docker rmi [options] IMAGE NAME/ID [IMAGE NAME/ID...]

docker rmi mysql

The command removes image mysql from the docker host.

The below command removes the image with ID 94e81 from the docker host.

docker rmi 94e81

The below command removes image ubuntu with tag trusty.

docker rmi ubuntu:trusty

These are some of the basic commands you come across. There are numerous other instructions to explore.

Wind Up

Although containerization has been around for a long time, it has only recently received the attention it deserves. Google, Amazon Web Services (AWS), Intel, Tesla are just a few leading tech businesses with their specialized container engines. They rely significantly on them to develop, run, administer, and distribute their software.

Docker is an extremely powerful containerization engine, and it has a lot to offer when it comes to building, running, managing and distributing your applications efficiently.

You have seen Docker at a high level. There is a lot more to study about Docker, such as:

Commands (More powerful commands)

Docker Images (Build your custom images)

Networking with Docker (Setup and configure networking)

Docker Stack (Grouping services required by an application)

Docker Compose (Tool for managing and running multiple containers)

Docker Swarm (Grouping and managing one or more machines on which docker is running)

If you’ve found this fascinating and want to learn more about it, I recommend enrolling in one of the courses listed below. They were educational and right to the point, in my opinion.

If you are a complete beginner, I recommend enrolling in this course, which has been prepared specifically for you.

Investing your time and money into studying Docker is not something you will regret.

End Notes

I hope you find this article helpful. Please feel free to share it. Thank you, have a great day.


The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.


Basic Troubleshooting Tips For Beginners For Windows 11/10

I felt the need to write this post covering some general tips for beginners since I often receive mail asking for help, to which the answer, in most cases, involves carrying out some basic troubleshooting. This article, therefore, touches upon some common steps a Windows user may take to try to fix or repair his/her Windows 11/10/8/7 computer.

Windows Troubleshooting Tips

OK, so something has gone wrong or something is not working the way you want it to – and you want to fix it! But before you begin, do restart your computer and see if the problem goes away.

1] Run System Restore

2] Run System File Checker

Another thing you can do is to run the in-built System File Checker utility. Surprisingly, this tool is not used so frequently, although it could make some of your problems go away easily. This tool checks if any of your system files have been replaced, damaged, or corrupted and replaces them with “good” files.

Now type sfc /scannow and hit Enter. Go get a coffee or something since this tool will take some time to run.

Once it completes its course, it will present a report. If there are any damaged or corrupted or missing system files, they will be listed. Restart your computer. On restart, your system files will be replaced with the ‘good’ ones!

3] Fix Windows Image or Component Store corruption

You can use DISM.exe to repair a corrupt Windows Image or fix Component Store Corruption.

Use Dism /Online /Cleanup-Image /RestoreHealth.

This checks for component store corruption, records the corruption, and fixes it using Windows Update.

Read: Best Windows Help & Tech Support websites

4] Remove unwanted programs

Open Control Panel and check your installed programs. Do you see something you don’t want – or something you did not install, and the suspect could be a rogue or an unwanted software? Uninstall it.

5] Scan for malware

Run a full in-depth scan of your system with your antivirus. If your security software allows you to schedule a boot-time scan, go ahead and schedule one; otherwise, a regular scan should be fine too. To save scan time, you may want to use the Disk Cleanup utility or CCleaner or Quick Clean to quickly remove your junk files first. If any malware is preventing Windows from functioning normally, your antivirus scan should be able to take care of it by removing the malware. You can check out this list of some good recommended free antivirus software for Windows.

6] Windows computer will not start

If you find that your Windows computer just will not start, you can repair boot problems with Windows Startup Repair. Startup Repair is a Windows recovery tool that can fix certain system problems that might prevent Windows from starting. Startup Repair scans your computer for the problem and then tries to fix it so your computer can start correctly. Startup Repair is one of the recovery tools in the System Recovery Options menu. Windows 11/10 users can access Advanced Startup Options.

7] Run Windows Update

Maybe Microsoft has released fixes for your problems – so it might be a good idea to run Windows Update and see if any are available. If they are, do download and install them.

8] Windows Desktop appears blank on startup

If you find that your computer starts but stops at the logon screen, or your desktop does not appear, or you see a black screen, or you simply see your wallpaper, the reasons could be many, but in many cases it is explorer.exe that is not starting automatically. Press Ctrl-Alt-Del and start the Task Manager.

9] Fix Windows

There are several small problems, nuisances and irritants you may face occasionally…

10] Repair Windows

If you find that your Windows installation has become badly corrupted, and even running System Restore, System File Checker, or other troubleshooting steps does not really help, and you start thinking of re-installing your Windows operating system, try a Repair Install first. Windows 8/10 users may consider using the Refresh or Reset Windows or Automatic Repair feature.

Additional Resources

For specific problems, you can also use the in-built Windows Troubleshooters or the Microsoft Fix it solutions or the Automated Troubleshooting Solutions or the Microsoft Diagnostics Service portal to resolve your issues. You may find additional posts to help solve your particular problems at the following links: Windows Help tutorials, Troubleshooting Windows, Repairing Windows, and Windows Tips.

Read: Windows 11/10 Support and Solutions.

And if all fails, there is always that panacea for all Windows troubles – RESET or REPAIR!

Well, these were just some of the basic troubleshooting tips. If you require something specific, just search for it here on this site, using the search bar in the sidebar. I am sure that you will find a solution. If you don’t, post your specific requirement here and we will try to prepare a tutorial for it.

If you have anything to add, please do go ahead!

A Comprehensive Beginners Guide To Linear Algebra For Data Scientists


One of the most common questions we get on Analytics Vidhya is,

How much maths do I need to learn to be a data scientist?

Even though the question sounds simple, there is no simple answer to it. Usually, we say that you need to know basic descriptive and inferential statistics to start. That is enough to begin with.

But, once you have covered the basic concepts in machine learning, you will need to learn some more math. You need it to understand how these algorithms work, what their limitations are, and whether they make any underlying assumptions. Now, there could be a lot of areas to study including algebra, calculus, statistics, 3-D geometry etc.

If you get confused (like I did) and ask experts what should you learn at this stage, most of them would suggest / agree that you go ahead with Linear Algebra. 

But, the problem does not stop there. The next challenge is to figure out how to learn Linear Algebra. You can get lost in the detailed mathematics and derivation and learning them would not help as much! I went through that journey myself and hence decided to write this comprehensive guide.

If you have faced this question about how to learn & what to learn in Linear Algebra – you are at the right place. Just follow this guide.

And if you’re looking to understand where linear algebra fits into the overall data science scheme, here’s the perfect article:

Table of contents

Motivation – Why learn Linear Algebra?

2.3. Planes

3.3 Representing in Matrix form

4.2.3 Use of Inverse in Data Science

5.2 Use of Eigenvectors in Data Science: PCA algorithm

Singular Value Decomposition of a Matrix

End Notes

1. Motivation – Why learn Linear Algebra?

I would like to present 4 scenarios to showcase why learning Linear Algebra is important, if you are learning Data Science and Machine Learning.

Scenario 1:

What do you see when you look at the image above? You most likely said flower and leaves, which is not too difficult. But, if I ask you to write that logic so that a computer can do the same for you, it will be a very difficult task (to say the least).

You were able to identify the flower because the human brain has gone through millions of years of evolution. We do not understand what goes on in the background to be able to tell whether the colour in the picture is red or black. We have somehow trained our brains to automatically perform this task.

But making a computer do the same is not easy, and it is an active area of research in Machine Learning and Computer Science in general. But before we work on identifying attributes in an image, let us ponder over a particular question: how does a machine store this image?

You probably know that computers of today are designed to process only 0 and 1. So how can an image such as above with multiple attributes like colour be stored in a computer? This is achieved by storing the pixel intensities in a construct called Matrix. Then, this matrix can be processed to identify colours etc.

So any operation which you want to perform on this image would likely use Linear Algebra and matrices at the back end.

Scenario 2:

If you are somewhat familiar with the Data Science domain, you might have heard the word “XGBOOST”, an algorithm employed most frequently by winners of Data Science Competitions. It stores the numeric data in the form of a Matrix to give predictions. This enables XGBOOST to process data faster and provide more accurate results. Moreover, not just XGBOOST but various other algorithms use Matrices to store and process data.

Scenario 3:

Deep Learning- the new buzz word in town employs Matrices to store inputs such as image or speech or text to give a state-of-the-art solution to these problems. Weights learned by a Neural Network are also stored in Matrices. Below is a graphical representation of weights stored in a Matrix.

Scenario 4:

Another active area of research in Machine Learning is dealing with text and the most common techniques employed are Bag of Words, Term Document Matrix etc. All these techniques in a very similar manner store counts(or something similar) of words in documents and store this frequency count in a Matrix form to perform tasks like Semantic analysis, Language translation, Language generation etc.

So, now you would understand the importance of Linear Algebra in machine learning. We have seen image, text or any data, in general, employing matrices to store and process data. This should be motivation enough to go through the material below to get you started on Linear Algebra. This is a relatively long guide, but it builds Linear Algebra from the ground up.

2. Representation of problems in Linear Algebra

Let’s start with a simple problem. Suppose that the price of 1 ball and 2 bats, or of 2 balls and 1 bat, is 100 units. We need to find the price of a ball and a bat.

Suppose the price of a bat is Rs ‘x’ and the price of a ball is Rs ‘y’. Values of ‘x’ and ‘y’ can be anything depending on the situation i.e. ‘x’ and ‘y’ are variables.

Let’s translate this in mathematical form –

2x + y = 100 ...........(1)

Similarly, for the second condition-

x + 2y  =  100 ..............(2)

Now, to find the prices of bat and ball, we need the values of ‘x’ and ‘y’ that satisfy both equations. The basic problem of linear algebra is to find these values of ‘x’ and ‘y’ i.e. the solution of a set of linear equations.

Broadly speaking, in linear algebra data is represented in the form of linear equations. These linear equations are in turn represented in the form of matrices and vectors.

The number of variables as well as the number of equations may vary depending upon the condition, but the representation is in form of matrices and vectors.

2.1 Visualise the problem

It is usually helpful to visualize data problems. Let us see if that helps in this case.

Linear equations represent flat objects. We will start with the simplest one to understand i.e. line. A line corresponding to an equation is the set of all the points which satisfy the given equation. For example,

Points (50,0) , (0,100), (100/3,100/3) and (30,40) satisfy our  equation (1) . So these points should lie on the line corresponding to our equation (1). Similarly, (0,50),(100,0),(100/3,100/3) are some of the points that satisfy equation (2).

Now in this situation, we want both of the conditions to be satisfied i.e. the point which lies on both the lines.  Intuitively, we want to find the intersection point of both the lines as shown in the figure below.

Let’s solve the problem by elementary algebraic operations like addition, subtraction and substitution.

2x + y = 100 .............(1)

x + 2y = 100 ..........(2)

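from equation (1)-

y = 100 - 2x

put value of y in equation (2)-

x + 2*(100 - 2x) = 100......(3)

Now, since equation (3) is an equation in the single variable x, it can be solved for x and subsequently y. It simplifies to 3x = 100, so x = 100/3 and y = 100 - 2x = 100/3, which is the intersection point of the two lines.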
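As a quick check of the bat and ball problem, the same two equations can be handed to NumPy. This is only a minimal sketch: the variable names are illustrative, and np.linalg.solve is used here in place of manual substitution.

```python
import numpy as np

# Coefficients of 2x + y = 100 and x + 2y = 100 (x = bat, y = ball)
coefficients = np.array([[2.0, 1.0],
                         [1.0, 2.0]])
totals = np.array([100.0, 100.0])

# Solve both equations at once
bat_price, ball_price = np.linalg.solve(coefficients, totals)
print(bat_price, ball_price)  # both are approximately 100/3
```

Both prices come out equal, which matches the point (100/3, 100/3) that lies on both lines.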

That looks simple – let’s go one step further and explore.

2.2 Let’s complicate the problem

Now, suppose you are given a set of three conditions with three variables each as given below and asked to find the values of all the variables. Let’s solve the problem and see what happens.




From equation (4) we get,


Substituting value of z in equation (6), we get –



Now, we can solve equations (8) and (5) as a case of two variables to find the values of ‘x’ and ‘y’, as in the problem of bat and ball above. Once we know ‘x’ and ‘y’, we can use (7) to find the value of ‘z’.

As you might see, adding an extra variable has tremendously increased our efforts for finding the solution of the problem. Now imagine having 10 variables and 10 equations. Solving 10 equations simultaneously can prove to be tedious and time consuming.

Now dive into data science. We have millions of data points in a real data set. It is going to be a nightmare to reach solutions using the approach mentioned above. And imagine if we have to do it again and again and again. It’s going to take ages before we can solve this problem. And now if I tell you that it’s just one part of the battle, what would you think? So, what should we do? Should we quit and let it go? Definitely NO. Then?

Matrices are used to solve a large set of linear equations. But before we go further and take a look at matrices, let’s visualise the physical meaning of our problem. Give a little bit of thought to the next topic. It directly relates to the usage of Matrices.

2.3 Planes

A linear equation in 3 variables represents the set of all points whose coordinates satisfy the equations. Can you figure out the physical object represented by such an equation? Try to think of 2 variables at a time in any equation and then add the third one. You should figure out that it represents a three-dimensional analogue of line.

Basically, a linear equation in three variables represents a plane. More technically, a plane is a flat geometric object which extends up to infinity.

As in the case of a line, finding solutions to 3 variables linear equation means we want to find the intersection of those planes. Now can you imagine, in how many ways a set of three planes can intersect? Let me help you out. There are 4 possible cases –

No intersection at all.

Planes intersect in a line.

They can intersect in a plane.

All the three planes intersect at a point.

Can you imagine the number of solutions in each case? Try doing this. Here is an aid picked from Wikipedia to help you visualise.

So, what was the point of having you to visualise all graphs above?

Normal humans like us and most of the super mathematicians can only visualise things in 3 dimensions, and having to visualise things in 4 (or 10000) dimensions is impossible for mortals. So, how do mathematicians deal with higher dimensional data so efficiently? They have tricks up their sleeves, and Matrices are one such trick employed by mathematicians to deal with higher dimensional data.

Now let’s proceed with our main focus i.e. Matrix.

3. Matrix

Matrix is a way of writing similar things together to handle and manipulate them as per our requirements easily. In Data Science, it is generally used to store information like weights in an Artificial Neural Network while training various algorithms. You will be able to understand my point by the end of this article.

Technically, a matrix is a 2-D array of numbers (as far as Data Science is concerned). For example look at the matrix A below.

1 2 3

4 5 6

7 8 9

Generally, rows are denoted by ‘i’ and columns are denoted by ‘j’. The elements are indexed by the ‘i’th row and the ‘j’th column. We denote the matrix by some letter, e.g. A, and its elements by A(ij).

In above matrix

A12 =  2

To reach the result, go along the first row until you reach the second column.
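The same lookup can be sketched in NumPy. One assumption to keep in mind: the text counts rows and columns from 1, while NumPy counts from 0, so the indices shift by one.

```python
import numpy as np

# The matrix A from the text
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# A12 (first row, second column) is A[0, 1] in NumPy's 0-based indexing
a12 = A[0, 1]
print(a12)  # 2
```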

3.1 Terms related to Matrix

Order of matrix – If a matrix has 3 rows and 4 columns, order of the matrix is 3*4 i.e. row*column.

Square matrix – The matrix in which the number of rows is equal to the number of columns.

Diagonal matrix – A matrix with all the non-diagonal elements equal to 0 is called a diagonal matrix.

Upper triangular matrix – Square matrix with all the elements below diagonal equal to 0.

Lower triangular matrix – Square matrix with all the elements above the diagonal equal to 0.

Scalar matrix – Square matrix with all the diagonal elements equal to some constant k and all the non-diagonal elements equal to 0.

Identity matrix – Square matrix with all the diagonal elements equal to 1 and all the non-diagonal elements equal to 0.

Column matrix –  The matrix which consists of only 1 column. Sometimes, it is used to represent a vector.

Row matrix –  A matrix consisting of only 1 row.

Trace – It is the sum of all the diagonal elements of a square matrix.
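Several of the terms above map onto NumPy helpers. The sketch below uses a hypothetical 3*3 matrix M and a hypothetical constant k = 5, chosen only for illustration.

```python
import numpy as np

M = np.arange(1, 10).reshape(3, 3)   # hypothetical square matrix holding 1..9

order = M.shape                      # (rows, columns) of the matrix
diagonal = np.diag(np.diag(M))       # keeps the diagonal, zeros elsewhere
upper = np.triu(M)                   # zeros below the diagonal
lower = np.tril(M)                   # zeros above the diagonal
identity = np.eye(3)                 # ones on the diagonal, zeros elsewhere
scalar = 5 * identity                # scalar matrix with k = 5
trace = np.trace(M)                  # sum of the diagonal: 1 + 5 + 9

print(order, trace)
```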

3.2 Basic operations on matrix

Let’s play with matrices and realise the capabilities of matrix operations.

Addition – Addition of matrices is very similar to basic arithmetic addition. The only requirement is that all the matrices being added have the same order. This point will become obvious once you do matrix addition yourself.

Suppose we have 2 matrices ‘A’ and ‘B’ and the resultant matrix after the addition is ‘C’. Then

Cij  =   Aij + Bij

For example, let’s take two matrices and add them.

A      =

1 0

2 3

B    =

4 -1

0 5


C        =

5 -1

2 8

Observe that to get the elements of C matrix, I have added A and B element-wise i.e. 1 to 4, 3 to 5 and so on.
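The addition above can be reproduced with NumPy, where the + operator adds arrays of the same shape element by element. A minimal sketch using the matrices A and B from the example:

```python
import numpy as np

A = np.array([[1, 0],
              [2, 3]])
B = np.array([[4, -1],
              [0, 5]])

# Cij = Aij + Bij for every position
C = A + B
print(C)
```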

Scalar Multiplication –  Multiplication of a matrix with a scalar constant is called scalar multiplication. All we have to do in a scalar multiplication is to multiply each element of the matrix with the given constant.  Suppose we have a constant scalar ‘c’ and a matrix ‘A’.  Then multiplying ‘c’ with ‘A’  gives-

c[Aij] =  [c*Aij]

Transposition – Transposition simply means interchanging the row and column index. For example-

AijT= Aji

Transpose is used in vectorized implementation of linear and logistic regression.

Code in python
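A minimal sketch of scalar multiplication and transposition with NumPy. It assumes the 3*3 matrix A holding the values 11 to 19 that appears in the R output, and a hypothetical constant c = 2.

```python
import numpy as np

A = np.arange(11, 20).reshape(3, 3)  # assumed matrix: 11..19, filled row by row
c = 2                                # hypothetical scalar constant

scaled = c * A       # every element is multiplied by c
transposed = A.T     # rows and columns are interchanged

print(scaled)
print(transposed)
```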

Code in R



     [,1] [,2] [,3]
[1,]   11   12   13
[2,]   14   15   16
[3,]   17   18   19


t(A)
     [,1] [,2] [,3]
[1,]   11   14   17
[2,]   12   15   18
[3,]   13   16   19

Matrix multiplication

Matrix multiplication is one of the most frequently used operations in linear algebra. We will learn to multiply two matrices as well as go through its important properties.

Before getting to the algorithm, there are a few points to be kept in mind.

The multiplication of two matrices of orders i*j and j*k results in a matrix of order i*k. Just keep the outer indices in order to get the indices of the final matrix.

Two matrices will be compatible for multiplication only if the number of columns of the first matrix and the number of rows of the second one are same.

The third point is that order of multiplication matters.

Don’t worry if you can’t get these points. You will be able to understand by the end of this section.

Suppose, we are given two matrices A and B to multiply. I will write the final expression first and then will explain the steps.

I have picked this image from Wikipedia for your better understanding.

In the first illustration, we know that the order of the resulting matrix should be 3*3. So first of all, create a matrix of order 3*3. To determine (AB)ij, multiply each element of the ‘i’th row of A with the corresponding element of the ‘j’th column of B, one at a time, and add all the terms. To help you understand this row-by-column multiplication, take a look at the code below.

import numpy as np


AB=
array([[2250, 2316, 2382],
       [2556, 2631, 2706],
       [2862, 2946, 3030]])

BA=
array([[2310, 2406, 2502],
       [2526, 2631, 2736],
       [2742, 2856, 2970]])

So, how did we get 2250 as first element of AB matrix?  2250=21*31+22*34+23*37. Similarly, for other elements.
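The definitions of A and B are not shown in the text. The sketch below assumes they hold the values 21 to 29 and 31 to 39, filled row by row, which reproduces the products AB and BA printed above.

```python
import numpy as np

# Assumed inputs, inferred from 2250 = 21*31 + 22*34 + 23*37
A = np.arange(21, 30).reshape(3, 3)
B = np.arange(31, 40).reshape(3, 3)

AB = A @ B   # matrix product: rows of A times columns of B
BA = B @ A   # reversed order gives a different matrix

# First element of AB computed by hand: first row of A with first column of B
first_element = sum(A[0, k] * B[k, 0] for k in range(3))

print(AB)
print(BA)
print(first_element)
```

Note that A * B in NumPy would multiply element by element; the @ operator (or np.dot) is what performs matrix multiplication.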

Code in R


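A*B
     [,1] [,2] [,3]
[1,]  220  252  286
[2,]  322  360  400
[3,]  442  486  532

Note that in R the * operator multiplies matrices element by element, which is what this output shows; the matrix product is computed with the %*% operator.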

Notice the difference between AB and BA.

Properties of matrix multiplication

Matrix multiplication is associative provided the given matrices are compatible for multiplication i.e.

ABC =  (AB)C = A(BC)



array([[306108, 313056, 320004], [347742, 355635, 363528], [389376, 398214, 407052]])

array([[306108, 313056, 320004], [347742, 355635, 363528], [389376, 398214, 407052]])

Matrix multiplication is not commutative i.e. AB and BA are not equal. We have verified this result above.
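Both properties can be checked numerically. The sketch below again assumes A and B hold 21 to 29 and 31 to 39, and additionally assumes a third matrix C holding 41 to 49; with these inputs the product ABC matches the array printed above.

```python
import numpy as np

# Assumed inputs (not shown in the text)
A = np.arange(21, 30).reshape(3, 3)
B = np.arange(31, 40).reshape(3, 3)
C = np.arange(41, 50).reshape(3, 3)

left_first = (A @ B) @ C    # (AB)C
right_first = A @ (B @ C)   # A(BC)

is_associative = np.array_equal(left_first, right_first)
is_commutative = np.array_equal(A @ B, B @ A)

print(is_associative)   # True
print(is_commutative)   # False
```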

Matrix multiplication is used in linear and logistic regression when we calculate the value of output variable by parameterized vector method. As we have learned the basics of matrices, it’s time to apply them.

3.3 Representing equations in matrix form

Let me do something exciting for you.  Take help of pen and paper and try to find the value of the matrix multiplication shown below

It can be verified very easily that the expression contains our three equations. We will name our matrices as ‘A’, ‘X’ and ‘Z’.

It explicitly verifies that we can write our equations together in one place as

AX   = Z

The next step has to be the solution method. We will go through two methods to find the solution.

4. Solving the Problem

Now, we will look in detail at the two methods to solve matrix equations.

Row Echelon Form

Inverse of a Matrix

4.1 Row Echelon form

Now you have visualised what an equation in 3 variables represents and had a warm up on matrix operations. Let’s find the solution of the set of equations given to us to understand our first method of interest and explore it later in detail.

I have already illustrated that solving the equations by the substitution method can prove to be tedious and time-consuming. Our first method introduces you to a neater and more systematic way to accomplish the job, in which we manipulate our original equations systematically to find the solution. But what are those valid manipulations? Are there any qualifying criteria they have to fulfil? Well, yes. There are two conditions which have to be fulfilled by any manipulation to be valid.

Manipulation should preserve the solution i.e. solution should not be altered on imposing the manipulation.

Manipulation should be reversible.

So, what are those manipulations?

We can swap the order of equations.

We can multiply both sides of equations by any non-zero constant ‘c’.

We can multiply an equation by any non-zero constant and then add to other equation.

These points will become more clear once you go through the algorithm and practice it. The basic idea is to clear variables in successive equations and form an upper triangular matrix. Equipped with prerequisites, let’s get started. But before that, it is strongly recommended to go through this link for better understanding.

I will solve our original problem as an illustration. Let’s do it in steps.

Make an augmented matrix from the matrix ‘A’ and ‘Z’.

What I have done is I have just concatenated the two matrices. The augmented matrix simply tells that the elements in a row are coefficients of ‘x’, ‘y’ and ‘z’ and last element in the row is right-hand side of the equation.

Multiply row (1) with 2 and subtract from row (2). Similarly, multiply row (1) with 5 and subtract from row (3).

In order to make an upper triangular matrix, multiply row (2) by 2 and then subtract from row (3).

Now we have simplified our job, let’s retrieve the modified equations. We will start from the simplest i.e. the one with the minimum number of remaining variables. If you follow the illustrated procedure, you will find that last equation comes to be the simplest one.


Now retrieve equation (2) and put the value of ‘z’ in it to find ‘y’. Do the same for equation (1).

Isn’t it pretty simple and clean?
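The steps of this method can be sketched in Python as follows. This is an illustrative implementation, not the article's own code: the function name and the three equations used (x + y + z = 6, 2x + 3y + z = 11, 5x + 2y + 3z = 18) are hypothetical, and a row swap is added so that a zero never ends up on the diagonal.

```python
import numpy as np

def solve_by_row_echelon(A, Z):
    """Solve AX = Z by reducing the augmented matrix to upper triangular form."""
    A = np.asarray(A, dtype=float)
    Z = np.asarray(Z, dtype=float).reshape(-1, 1)
    n = A.shape[0]
    aug = np.hstack([A, Z])  # augmented matrix [A | Z]

    for col in range(n):
        # Swap in the row with the largest entry in this column (valid manipulation 1)
        pivot = col + int(np.argmax(np.abs(aug[col:, col])))
        if np.isclose(aug[pivot, col], 0.0):
            raise ValueError("no unique solution: redundant or inconsistent equations")
        aug[[col, pivot]] = aug[[pivot, col]]
        # Subtract a multiple of the pivot row from each row below it (manipulation 3)
        for row in range(col + 1, n):
            factor = aug[row, col] / aug[col, col]
            aug[row] -= factor * aug[col]

    # Back substitution: start from the last equation, which has one variable left
    X = np.zeros(n)
    for row in range(n - 1, -1, -1):
        known = aug[row, row + 1:n] @ X[row + 1:]
        X[row] = (aug[row, -1] - known) / aug[row, row]
    return X

# Hypothetical system with solution x = 1, y = 2, z = 3
A = [[1, 1, 1],
     [2, 3, 1],
     [5, 2, 3]]
Z = [6, 11, 18]
solution = solve_by_row_echelon(A, Z)
print(solution)
```

The function raises an error when a column has no usable pivot, which corresponds to the cases without a unique solution.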

Let’s ponder over another point. Will we always be able to make an upper triangular matrix which gives a unique solution? Are there different cases possible? Recall that planes can intersect in multiple ways. Take your time to figure it out and then proceed further.

Different possible cases-

It’s possible that we get a unique solution as illustrated in above example. It indicates that all the three planes intersect in a point.

We can get a case like shown below

Note that in last equation, 0=0 which is always true but it seems like we have got only 2 equations. One of the equations is redundant. In many cases, it’s also possible that the number of redundant equations is more than one. In this case, the number of solutions is infinite.

There is another case where Echelon matrix looks as shown below

Let’s retrieve the last equation.



Is it possible? Very clear cut intuition is NO. But, does this signify something? It’s analogous to saying that it is impossible to find a solution and indeed, it is true. We can’t find a solution for such a set of equations. Can you think what is happening actually in terms of planes? Go back to the section where we saw planes intersecting and find it out.

Note that this method is efficient for a set of 5-6 equations. Although the method is quite simple, if equation set gets larger, the number of times you have to manipulate the equations becomes enormously high and the method becomes inefficient.

Rank of a matrix – Rank of a matrix is equal to the maximum number of linearly independent row vectors in a matrix.

A set of vectors is linearly dependent if we can express at least one of the vectors as a linear combination of remaining vectors in the set.
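As a small illustration of rank, NumPy provides np.linalg.matrix_rank. The matrix below is hypothetical: its second row is twice the first, so only two of its rows are linearly independent.

```python
import numpy as np

# Hypothetical matrix: row 2 = 2 * row 1, so the rows are linearly dependent
M = np.array([[1, 2, 3],
              [2, 4, 6],
              [1, 0, 1]])

rank_M = np.linalg.matrix_rank(M)
rank_identity = np.linalg.matrix_rank(np.eye(3))  # all rows independent

print(rank_M, rank_identity)  # 2 3
```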

4.2 Inverse of a Matrix

For solving a large number of equations in one go, the inverse is used. Don’t panic if you are not familiar with the inverse. We will do a good amount of work on all the required concepts. Let’s start with a few terms and operations.

Determinant of a Matrix – The concept of determinant is applicable to square matrices only. I will lead you to the generalised expression of the determinant in steps. To start with, let’s take a 2*2 matrix A = [[a, b], [c, d]], where a and b form the first row and c and d the second.

For now, just focus on the 2*2 matrix. The expression for the determinant of the matrix A will be:

det(A) = a*d - b*c

Note that det(A) is the standard notation for the determinant. Notice that all you have to do to find the determinant in this case is to multiply the elements along each diagonal and put a positive or negative sign before each product. To determine the sign, sum the indices of a particular element. If the sum is an even number, put a positive sign before the product, and if the sum is odd, put a negative sign. For example, the sum of the indices of element ‘a’ (a11) is 2. Similarly, the sum of the indices of element ‘d’ (a22) is 4. So we put a positive sign before the first term in the expression. Do the same thing for the second term yourself.

Now take a 3*3 matrix ‘B’ and find its determinant.

I am writing the expression first and then will explain the procedure step by step.

Each term basically consists of two parts, i.e. a submatrix and a coefficient. First of all, pick a coefficient. Observe that the coefficients are picked from the first row only. To start with, I have picked the first element of the first row. You can start with any element of the row. Once you have picked the coefficient, delete all the elements in the row and column corresponding to the chosen coefficient. Next, make a matrix of the remaining elements, each one in its original relative position, and find the determinant of this submatrix. Repeat the same procedure for each element in the first row. Now, to determine the sign of each term, add the indices of the coefficient element. If the sum is even, put a positive sign, and if it is odd, put a negative sign. Finally, add all the terms to find the determinant. Now, let’s take a higher order matrix ‘C’ and generalise the concept.

Try to relate the expression to what we have done already and figure out the final expression.
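The first-row expansion described above can be sketched as a small recursive function. This is only an illustration, assuming NumPy; the function name det_by_expansion and the matrix B used here are hypothetical, and for real work np.linalg.det is far faster.

```python
import numpy as np

def det_by_expansion(M):
    """Determinant by expanding along the first row (illustrative, slow for big matrices)."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    if n == 1:
        return M[0, 0]
    total = 0.0
    for j in range(n):
        # Delete the first row and column j to get the submatrix
        sub = np.delete(np.delete(M, 0, axis=0), j, axis=1)
        # With 1-based indices the sum is 1 + (j + 1); it is even exactly when j is even
        sign = 1.0 if j % 2 == 0 else -1.0
        total += sign * M[0, j] * det_by_expansion(sub)
    return total

# Hypothetical 3*3 matrix
B = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 4]]

print(det_by_expansion(B))  # 18.0
print(np.linalg.det(B))     # the same value up to floating-point error
```

The recursion mirrors the text: each coefficient from the first row is multiplied by the determinant of its submatrix, with the sign decided by the sum of indices.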

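Code in python

import numpy as np
arr = np.arange(100,116).reshape(4,4)  # the 4*4 matrix shown below

array([[100, 101, 102, 103],
       [104, 105, 106, 107],
       [108, 109, 110, 111],
       [112, 113, 114, 115]])

np.linalg.det(arr)  # the rows of this arr are linearly dependent, so the result is 0 up to floating-point error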



Code in R

View the code on Gist.

            [,1]    [,2]        [,3]
[1,] -0.16208333 -0.1125  0.17458333
[2,] -0.07916667  0.1250 -0.04583333
[3,]  0.20791667 -0.0125 -0.09541667

#Determinant
-0.0004166667

Minor of a matrix

Let’s take a square matrix A. Then the minor corresponding to an element A(ij) is the determinant of the submatrix formed by deleting the ‘i’th row and ‘j’th column of the matrix. You should be able to relate this to what I have already explained in the determinant section. Let’s take an example.

To find the minor corresponding to element A11, delete the first row and the first column to find the submatrix.

Now find the determinant of this matrix as explained already. If you calculate the determinant of this matrix, you should get 4. If we denote the minor by M11, then

M11 = 4

Similarly, you can find the minors for the other elements.

Cofactor of a matrix

In the above discussion of minors, if we also attach a sign to each minor, the result is called the cofactor of the element. To assign the sign, just sum the indices of the corresponding element. If the sum turns out to be even, assign a positive sign. Otherwise, assign a negative sign. Let’s take the above illustration as an example. If we add the indices, i.e. 1+1=2, we get an even number, so we should put a positive sign. Let’s call it C11. Then

C11 = 4

You should find cofactors corresponding to other elements by yourself for a good amount of practice.
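As a quick check on hand calculations, minors and cofactors can be sketched in a few lines. This assumes NumPy; the helper names minor and cofactor and the matrix below are hypothetical (the article’s own matrix A is not reproduced here).

```python
import numpy as np

def minor(A, i, j):
    """Minor M_ij: determinant after deleting row i and column j (1-based indices)."""
    A = np.asarray(A, dtype=float)
    sub = np.delete(np.delete(A, i - 1, axis=0), j - 1, axis=1)
    return np.linalg.det(sub)

def cofactor(A, i, j):
    """Cofactor C_ij: the minor with a sign of +1 if i + j is even, -1 if odd."""
    return (-1) ** (i + j) * minor(A, i, j)

# Hypothetical 3*3 matrix used only for illustration
A = [[1, 2, 3],
     [0, 4, 5],
     [1, 0, 6]]

print(minor(A, 1, 1))     # approximately 24
print(cofactor(A, 1, 2))  # approximately 5 (the minor is -5 and 1 + 2 is odd)
```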

Cofactor matrix

Find the cofactor corresponding to each element. Now in the original matrix, replace the original element by the corresponding cofactor. The matrix thus found is called the cofactor matrix corresponding to the original matrix.

For example, let’s take our matrix A. If you have found the cofactors corresponding to each element, just put them in a matrix according to the rule stated above. If you have done it right, you should get the cofactor matrix

Adjoint of a matrix – In our journey to find the inverse, we are almost at the end. Stay with the article for a couple of minutes more and we will be there. So, next we will find the adjoint of a matrix.

Suppose we have to find the adjoint of a matrix A. We will do it in two steps.

In step 1, find the cofactor matrix of A.

In step 2, just transpose the cofactor matrix.

The resulting matrix is the adjoint of the original matrix. For illustration, let’s find the adjoint of our matrix A. We already have the cofactor matrix C. The transpose of the cofactor matrix should be
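The two steps for the adjoint can be sketched as follows. This is an illustrative sketch assuming NumPy; cofactor_matrix and adjoint are hypothetical helper names, and the matrix is a made-up example rather than the article’s matrix A.

```python
import numpy as np

def cofactor_matrix(A):
    """Replace every element by its cofactor."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            sub = np.delete(np.delete(A, i, axis=0), j, axis=1)
            # 0-based i + j has the same parity as the 1-based index sum
            C[i, j] = (-1) ** (i + j) * np.linalg.det(sub)
    return C

def adjoint(A):
    """Step 1: cofactor matrix. Step 2: transpose it."""
    return cofactor_matrix(A).T

# Hypothetical 3*3 matrix used only for illustration
A = [[1, 2, 3],
     [0, 4, 5],
     [1, 0, 6]]

print(np.round(cofactor_matrix(A), 6))
print(np.round(adjoint(A), 6))
```

For this example the cofactor matrix works out by hand to [[24, 5, -4], [-12, 3, 2], [-2, -5, 4]], and the adjoint is its transpose.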

Finally, in the next section, we will find the inverse.

4.2.1 Finding Inverse of a matrix

Do you remember the concept of the inverse of a number in elementary algebra? If two numbers give 1 when multiplied together, then those two numbers are called inverses of each other. Similarly, in linear algebra, if two matrices give an identity matrix when multiplied together, then the matrices are called inverses of each other. If this is not clear yet, just keep going with the article. It will come to you intuitively. The best way of learning is learning by doing. So, let’s jump straight to the algorithm for finding the inverse of a matrix A. Again, we will do it in two steps.

Step 1: Find the adjoint of the matrix A by the procedure explained in the previous sections.

Step 2: Multiply the adjoint matrix by the reciprocal of the determinant of the matrix A, i.e. by 1/det(A). The resulting matrix is the inverse of A.

For example, let’s take our matrix A and find its inverse. We already have the adjoint matrix. The determinant of matrix A comes out to be -2. So, its inverse will be

Now suppose that the determinant comes out to be 0. What happens when we invert the determinant i.e. 0?  Does it make any sense?  It indicates clearly that we can’t find the inverse of such a matrix. Hence, this matrix is non-invertible. More technically, this type of matrix is called a singular matrix.
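The two-step recipe, together with the singular case, can be sketched as below. This assumes NumPy; inverse_via_adjoint, its tol parameter and the example matrices are hypothetical choices for illustration, and in practice np.linalg.inv is the usual tool.

```python
import numpy as np

def inverse_via_adjoint(A, tol=1e-12):
    """Inverse as adjoint(A) / det(A); raises ValueError for a singular matrix."""
    A = np.asarray(A, dtype=float)
    det = np.linalg.det(A)
    if abs(det) < tol:
        raise ValueError("determinant is 0: the matrix is singular and has no inverse")
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            sub = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(sub)  # cofactor
    return C.T / det  # transpose of the cofactor matrix, scaled by 1/det(A)

# Hypothetical invertible matrix
A = np.array([[1.0, 2.0, 3.0],
              [0.0, 4.0, 5.0],
              [1.0, 0.0, 6.0]])
A_inv = inverse_via_adjoint(A)
print(np.round(A @ A_inv, 6))  # identity matrix, up to rounding

# Hypothetical singular matrix: the second row is twice the first
try:
    inverse_via_adjoint([[1.0, 2.0], [2.0, 4.0]])
except ValueError as err:
    print(err)
```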

Keep in mind that the resultant of multiplication of a matrix and its inverse is an identity matrix. This property is going to be used extensively in equation solving.

Inverse is used in finding parameter vector corresponding to minimum cost function in linear regression.

4.2.2 Power of matrices

What happens when we multiply a number by 1? Obviously it remains the same. The same is applicable for an identity matrix, i.e. if we multiply a matrix by an identity matrix of the same order, it remains the same.

Let’s solve our original problem with the help of matrices. Our original problem represented in matrix form was as shown below

AX = Z i.e.

What happens when we pre-multiply both sides by the inverse of the coefficient matrix A, i.e. A⁻¹? Let’s find out by doing.

A⁻¹ A X = A⁻¹ Z

We can manipulate it as,

(A⁻¹ A) X = A⁻¹ Z

But we know that multiplying a matrix by its inverse gives an identity matrix. So,

IX = A⁻¹ Z

where I is the identity matrix of the corresponding order.

If you observe keenly, we have already reached the solution. Multiplying X by the identity matrix does not change it. So the equation becomes

X = A⁻¹ Z

To solve the equation, we just have to find the inverse. It can be done very easily by executing a few lines of code. Isn’t it a really powerful method?
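As a sketch of the derivation above, here is X = A⁻¹ Z computed with NumPy. The system below is a hypothetical example (it is not the article’s original problem), and np.linalg.solve is shown alongside because it solves the system without forming the inverse explicitly.

```python
import numpy as np

# Hypothetical system:
#    x +  y +  z = 6
#   2x -  y + 3z = 9
#    x + 2y -  z = 2
A = np.array([[1.0, 1.0, 1.0],
              [2.0, -1.0, 3.0],
              [1.0, 2.0, -1.0]])
Z = np.array([6.0, 9.0, 2.0])

X_inv = np.linalg.inv(A) @ Z     # X = A^-1 Z, exactly as derived above
X_solve = np.linalg.solve(A, Z)  # same solution without an explicit inverse

print(X_inv)    # approximately [1. 2. 3.]
print(X_solve)  # approximately [1. 2. 3.]
```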

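Code for inverse in python

import numpy as np
arr1 = np.arange(5,21).reshape(4,4)
np.linalg.inv(arr1)  # note: the rows of this arr1 are linearly dependent (it is singular), so expect an error or meaningless values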


4.2.3 Application of inverse in Data Science

The inverse is used to calculate the parameter vector by the normal equation in linear regression. Here is an illustration. Suppose we are given a data set as shown below:

Team League Year RS RA W OBP SLG BA G OOBP OSLG

ARI NL 2012 734 688 81 0.328 0.418 0.259 162 0.317 0.415

ATL NL 2012 700 600 94 0.32 0.389 0.247 162 0.306 0.378

BAL AL 2012 712 705 93 0.311 0.417 0.247 162 0.315 0.403

BOS AL 2012 734 806 69 0.315 0.415 0.26 162 0.331 0.428

CHC NL 2012 613 759 61 0.302 0.378 0.24 162 0.335 0.424

CHW AL 2012 748 676 85 0.318 0.422 0.255 162 0.319 0.405

CIN NL 2012 669 588 97 0.315 0.411 0.251 162 0.305 0.39

CLE AL 2012 667 845 68 0.324 0.381 0.251 162 0.336 0.43

COL NL 2012 758 890 64 0.33 0.436 0.274 162 0.357 0.47

DET AL 2012 726 670 88 0.335 0.422 0.268 162 0.314 0.402

HOU NL 2012 583 794 55 0.302 0.371 0.236 162 0.337 0.427

KCR AL 2012 676 746 72 0.317 0.4 0.265 162 0.339 0.423

LAA AL 2012 767 699 89 0.332 0.433 0.274 162 0.31 0.403

LAD NL 2012 637 597 86 0.317 0.374 0.252 162 0.31 0.364

It describes different variables for different baseball teams, used to predict whether a team makes the playoffs or not. But for now, to make it a regression problem, suppose we are interested in predicting OOBP from the rest of the variables. So, ‘OOBP’ is our target variable. To solve this problem using linear regression, we have to find the parameter vector. If you are familiar with the Normal equation method, you will know that to do this, we need to make use of matrices. This data is part of a data set taken from Analytics Edge. Let’s proceed and denote our independent variables as the matrix ‘X’ below.

So, X =

734 688 81 0.328 0.418 0.259

700 600 94 0.32 0.389 0.247

712 705 93 0.311 0.417 0.247

734 806 69 0.315 0.415 0.26

613 759 61 0.302 0.378 0.24

748 676 85 0.318 0.422 0.255

669 588 97 0.315 0.411 0.251

667 845 68 0.324 0.381 0.251

758 890 64 0.33 0.436 0.274

726 670 88 0.335 0.422 0.268

583 794 55 0.302 0.371 0.236

676 746 72 0.317 0.4 0.265

767 699 89 0.332 0.433 0.274

637 597 86 0.317 0.374 0.252

To find the final parameter vector (θ), assuming our initial function is parameterised by θ and X, we use the normal equation θ = (XᵀX)⁻¹ Xᵀ Y, where Y is the vector of target values. So the main thing you have to do is to find the inverse of (XᵀX), which can be accomplished very easily by using code as shown below.

First of all, let me make the Linear Regression formulation easier for you to comprehend.

f θ (X) = θᵀX, where θ is the parameter vector we wish to calculate and X is the column vector of features or independent variables.

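import numpy as np

# You don’t need to bother about the following. It just transforms the data from the original source into a matrix.
# (df is assumed to be the data set already loaded as a pandas DataFrame)
Df1 = df.head(14)

# (X is assumed to hold the feature columns of Df1, and Y the 'OOBP' column)
X = np.asmatrix(X)
x = np.transpose(X)

# following the text above: the inverse of (XᵀX), then the parameter vector θ
inv = np.linalg.inv(x.dot(X))
theta = inv.dot(x).dot(Y)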




Imagine if you had to solve this set of equations without using linear algebra. Let me remind you that this data set is less than even 1% of the original data set. Now imagine if you had to find the parameter vector without using linear algebra. It would have taken a lot of time and effort, and could sometimes even be impossible to solve.

One major drawback of the normal equation method when the number of features is large is that it is computationally very costly. The reason is that if there are ‘n’ features, the matrix (XᵀX) is of the order n*n, and inverting it takes time of the order O(n*n*n). Generally, the normal equation method is applied when the number of features is of the order of 1000 or 10,000 at most. Data sets with a larger number of features are handled with the help of another method called Gradient Descent.
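The normal equation on the 14 rows shown above can be sketched in a self-contained way. The feature matrix is copied from the matrix X in the text; the target vector y is my reading of the second-to-last column of the table, which I am assuming is OOBP. There is no intercept term, matching the formulation f θ (X) = θᵀX. np.linalg.lstsq is added only as a numerically more stable cross-check, since the features here have very different scales.

```python
import numpy as np

# Feature matrix X, copied from the text (one row per team)
X = np.array([
    [734, 688, 81, 0.328, 0.418, 0.259],
    [700, 600, 94, 0.320, 0.389, 0.247],
    [712, 705, 93, 0.311, 0.417, 0.247],
    [734, 806, 69, 0.315, 0.415, 0.260],
    [613, 759, 61, 0.302, 0.378, 0.240],
    [748, 676, 85, 0.318, 0.422, 0.255],
    [669, 588, 97, 0.315, 0.411, 0.251],
    [667, 845, 68, 0.324, 0.381, 0.251],
    [758, 890, 64, 0.330, 0.436, 0.274],
    [726, 670, 88, 0.335, 0.422, 0.268],
    [583, 794, 55, 0.302, 0.371, 0.236],
    [676, 746, 72, 0.317, 0.400, 0.265],
    [767, 699, 89, 0.332, 0.433, 0.274],
    [637, 597, 86, 0.317, 0.374, 0.252],
])

# Target values (assumed to be the OOBP column of the table)
y = np.array([0.317, 0.306, 0.315, 0.331, 0.335, 0.319, 0.305,
              0.336, 0.357, 0.314, 0.337, 0.339, 0.310, 0.310])

# Normal equation: theta = (X^T X)^-1 X^T y
theta = np.linalg.inv(X.T @ X) @ X.T @ y

# Cross-check with a least-squares solver that avoids the explicit inverse
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta)        # one parameter per feature
print(theta_lstsq)  # least-squares solution for comparison
```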

5. Eigenvalues and Eigenvectors

Eigenvectors find a lot of applications in different domains like computer vision, physics and machine learning. If you have studied machine learning and are familiar with the Principal Component Analysis algorithm, you must know how important that algorithm is when handling a large data set. Have you ever wondered what is going on behind that algorithm? Actually, the concept of Eigenvectors is the backbone of this algorithm. Let us explore Eigenvectors and Eigenvalues for a better understanding of it.

Let’s multiply a 2-dimensional vector with a 2*2 matrix and see what happens.

This operation on a vector is called linear transformation.  Notice that the directions of input and output vectors are different. Note that the column matrix denotes a vector here.

I will illustrate my point with the help of a picture as shown below.

In the above picture, there are two types of vectors coloured in red and yellow and the picture is showing the change in vectors after a linear transformation. Note that on applying a linear transformation to yellow coloured vector, its direction changes but the direction of the red coloured vector doesn’t change even after applying the linear transformation. The vector coloured in red is an example of Eigenvector.

More precisely, for a particular matrix, the vectors whose direction remains unchanged even after applying the linear transformation with that matrix are called Eigenvectors of that matrix. Remember that the concept of Eigenvalues and Eigenvectors is applicable to square matrices only. Another thing to know is that I have taken the case of two-dimensional vectors, but the concept of Eigenvectors is applicable to a space of any number of dimensions.

5.1 How to find Eigenvectors of a matrix?

Suppose we have a matrix A and an Eigenvector ‘x’ corresponding to the matrix. As explained already, after multiplication with the matrix the direction of ‘x’ doesn’t change. Only a change in magnitude is permitted. Let us write it as an equation:

Ax = cx

(A-cI)x = 0  …….(1)

Here ‘c’ is a scalar (the Eigenvalue) and ‘I’ is the identity matrix of the same order as ‘A’, so ‘cI’ is the identity matrix multiplied by the scalar ‘c’.

We have two unknowns, ‘c’ and ‘x’, and only one equation. Can you think of a trick to solve this equation?

In equation (1), the zero vector is always a solution for ‘x’, but it tells us nothing, so we want a non-zero ‘x’. A non-zero solution exists only if (A-cI) is a singular matrix. And a singular matrix has the property that its determinant equals 0. We will use this property to find the value of ‘c’.

det(A-cI) = 0

Once you expand the determinant of the matrix (A-cI) and equate it to 0, you will get a polynomial equation in ‘c’ whose order equals the order of the given matrix A. All you have to do is find the solutions of this equation. Suppose that we find the solutions ‘c1’, ‘c2’ and so on. Put ‘c1’ in equation (1) and find the vector ‘x1’ corresponding to ‘c1’. The vector ‘x1’ that you just found is an Eigenvector of A. Now, repeat the same procedure with ‘c2’, ‘c3’ and so on.
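For a 2*2 matrix the procedure above can be followed step by step. This is a minimal sketch assuming NumPy, with a hypothetical matrix A. For a 2*2 matrix the determinant condition expands to c² - trace(A)·c + det(A) = 0, whose roots are the Eigenvalues; np.linalg.eig is then used to get the Eigenvectors and to check that Ax = cx holds.

```python
import numpy as np

# Hypothetical 2*2 matrix
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Characteristic equation of a 2*2 matrix: c^2 - trace(A)*c + det(A) = 0
coeffs = [1.0, -np.trace(A), np.linalg.det(A)]
c_values = np.roots(coeffs)
print(np.sort(c_values))  # approximately [1. 3.]

# NumPy returns the Eigenvalues and the Eigenvectors (as columns) in one call
values, vectors = np.linalg.eig(A)
for c, x in zip(values, vectors.T):
    # The direction of x is unchanged: A x is just x scaled by c
    print(c, x, np.allclose(A @ x, c * x))
```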

Code for finding EigenVectors in python

import numpy as np
arr = np.arange(1,10).reshape(3,3)
np.linalg.eig(arr)  # returns the Eigenvalues and the Eigenvectors (as columns) of arr


Code in R for finding Eigenvalues and Eigenvectors:

View the code on Gist.


147.737576 5.317459 -3.055035

           [,1]       [,2]        [,3]
[1,] -0.3948374  0.4437557 -0.74478185
[2,] -0.5497457 -0.8199420 -0.06303763
[3,] -0.7361271  0.3616296  0.66432391

5.2 Use of Eigenvectors in Data Science

The concept of Eigenvectors is applied in a machine learning algorithm called Principal Component Analysis (PCA). Suppose you have data with a large number of features, i.e. it has a very high dimensionality. It is possible that there are redundant features in that data. Apart from this, a large number of features reduces efficiency and takes up more disk space. What PCA does is discard some of the less important features. But how do we determine those features? Here, Eigenvectors come to our rescue.

Let’s go through the algorithm of PCA. Suppose we have ‘n’ dimensional data and we want to reduce it to ‘k’ dimensions. We will do it in steps.

Step 1: Data is mean normalised and feature scaled.

Step 2: We find out the covariance matrix of our data set.

Now we want to reduce the number of features, i.e. dimensions. But cutting off features means loss of information. We want to minimise the loss of information, i.e. we want to keep the maximum variance. So, we want to find the directions in which the variance is maximum. We will find these directions in the next step.

Step 3: We find the Eigenvalues and Eigenvectors of the covariance matrix. The Eigenvectors with the largest Eigenvalues point in the directions of maximum variance.

Step 4: We select the ‘k’ Eigenvectors corresponding to the ‘k’ largest Eigenvalues and form a matrix in which each Eigenvector constitutes a column. We will call this matrix U.

Now it’s the time to find the reduced data points. Suppose you want to reduce a data point ‘a’ in the data set to ‘k’ dimensions.  To do so, you have to just transpose the matrix U and multiply it with the vector ‘a’. You will get the required vector in ‘k’ dimensions.
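The PCA steps above can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the function name pca_reduce is hypothetical, the data is random, and every feature is assumed to have non-zero spread so that the scaling step is defined. The projection Uᵀa is applied to all data points at once as a matrix product.

```python
import numpy as np

def pca_reduce(data, k):
    """Reduce data (rows = data points, columns = features) to k dimensions."""
    data = np.asarray(data, dtype=float)
    # Mean normalisation and feature scaling
    scaled = (data - data.mean(axis=0)) / data.std(axis=0)
    # Covariance matrix of the scaled features
    cov = np.cov(scaled, rowvar=False)
    # Eigenvalues and Eigenvectors (eigh is meant for symmetric matrices)
    values, vectors = np.linalg.eigh(cov)
    # Keep the k Eigenvectors with the largest Eigenvalues as the columns of U
    order = np.argsort(values)[::-1][:k]
    U = vectors[:, order]
    # Each row a of the data becomes U^T a
    return scaled @ U, U

# Hypothetical data: 100 points with 5 features
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))
reduced, U = pca_reduce(data, 2)
print(reduced.shape)  # (100, 2)
```

The columns of U are orthonormal, so the reduced coordinates are simply the components of each data point along the chosen directions.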

6. Singular Value Decomposition

Suppose you are given a feature matrix A. As suggested by the name, what we do is decompose our matrix A into three constituent matrices for a special purpose. Sometimes it is also said that SVD is a sort of generalisation of Eigenvalue decomposition. I will not go into its mathematics, for the reason already explained, and will stick to our plan, i.e. the use of SVD in data science.

SVD is used to remove the redundant features in a data set. Suppose you have a data set which comprises 1000 features. Any real data set with such a large number of features is almost bound to contain redundant features. If you have run machine learning algorithms, you should be familiar with the fact that redundant features cause a lot of problems. Also, running an algorithm on the original data set will be time inefficient and will require a lot of memory. So, what should you do to handle such a problem? Do we have a choice? Can we omit some features? Will it lead to a significant amount of information loss? Will we be able to get an efficient enough algorithm even after omitting the features? I will answer these questions with the help of an illustration.

Look at the pictures shown below taken from this link

We can convert this tiger into black and white and think of it as a matrix whose elements represent the pixel intensity at the relevant location. In simpler words, the matrix contains information about the intensity of the pixels of the image in the form of rows and columns. But is it necessary to have all the columns in the intensity matrix? Will we be able to represent the tiger with a smaller amount of information? The next picture will clarify my point. In this picture, different images are shown corresponding to different ranks, with different resolutions. For now, just assume that a higher rank implies a larger amount of information about pixel intensity. The image is taken from this link

It is clear that we can get a pretty good image with rank 20 or 30 instead of rank 100 or 200, and that’s what we want to do in the case of highly redundant data. What I want to convey is that to get a reasonable hypothesis, we don’t have to retain all the information present in the original data set. Some of the features even cause problems in reaching the best solution. For example, the presence of redundant features causes multicollinearity in linear regression. Also, some features are not significant for our model. Omitting these features helps to find a better fit, along with better time efficiency and less disk space. Singular value decomposition is used to get rid of the redundant features present in our data.
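The idea of keeping only a few ranks can be sketched with NumPy’s SVD. This is an illustration under assumptions: low_rank_approximation is a hypothetical helper name, and the "image" is a synthetic matrix (a rank-3 pattern plus a little noise) standing in for the tiger picture.

```python
import numpy as np

def low_rank_approximation(A, k):
    """Keep only the k largest singular values of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Synthetic stand-in for an intensity matrix: rank-3 structure plus small noise
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40))
image = base + 0.01 * rng.normal(size=(50, 40))

for k in (1, 3, 10):
    approx = low_rank_approximation(image, k)
    rel_error = np.linalg.norm(image - approx) / np.linalg.norm(image)
    print(k, rel_error)  # the error shrinks as more ranks are kept
```

In this synthetic case almost all the information sits in the first 3 ranks, so keeping more than that adds very little, which is the same effect the tiger pictures show.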

7. End notes

