Build And Deploy An ML App Using Streamlit, Docker And GKE

Updated in November 2023.


You have a dataset, have done extensive data analysis, and have built a model around it. Now what? The next step is to deploy the model on a server so that it is accessible to the general public or to your development team, who can integrate it with an app. This article is for you if you want to know how to share your model with your intended audience.

So, in this article, you will learn how to

Serve a machine learning model for predicting employee churn as a web service using FastAPI.

Create a simple web front end using Streamlit.

Dockerize the Streamlit app and the API.

Deploy both on Google Kubernetes Engine (GKE).

So, before getting into code action, let’s understand a few things about model deployment.

This article was published as a part of the Data Science Blogathon.

ML Model Deployment

A typical life cycle of a machine learning model starts with data collection and ends with deployment and monitoring.

There are different ways a machine learning model can be deployed in a production environment. They are

Edge deployment: Models are deployed directly to the apps or IoT devices. The model runs on local device resources. Hence, size and efficiency are capped.

Web service: The most widely used deployment method. The model is wrapped with a REST API, and predictions are fetched via HTTP calls to API endpoints.

Database integration: When the database is small and updated only occasionally, an ML model can be deployed inside the database itself. Postgres, for example, allows Python scripts to be integrated, which can be used for deploying models.

Model deployments depend on various conditions. Deploying a model within an application can be beneficial when there are regulatory or privacy concerns about storing data outside of it. When serving multiple devices, such as mobile, web, and desktop, it’s more efficient to interface the model with a web service instead of deploying it individually on each device.

Model Building

Acquiring data, which can be time-consuming and costly, is the initial step in creating any model. Fortunately, there are a lot of free datasets available on the internet that we can leverage to make a working model. For this project, we will be using an open-sourced employee dataset.

Usually, before writing the model, it is essential to do exploratory data analysis to inspect the underlying patterns in the data. For brevity, I have already done EDA. Here, we will only write the script to create and serialize the model. For an exploratory analysis and dataset, refer to this page.

So, let’s import the libraries for data manipulation.

import pandas as pd
from sklearn.preprocessing import LabelEncoder

Prepare the data

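import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

# df is assumed to already hold the employee dataset as a pandas DataFrame
# note: every feature must be numeric before fitting; the API below encodes
# salary as low=0, medium=1, high=2, so the training data must use the same codes

# encode categorical data
enc = LabelEncoder()
df['departments'] = enc.fit_transform(df.departments)

# split into train and test
y = df['left']
df.drop('left', axis=1, inplace=True)
x_train, x_test, y_train, y_test = train_test_split(df, y, test_size=0.15)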

Import libraries for model building

from sklearn.ensemble import RandomForestClassifier
from sklearn.base import BaseEstimator
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

Create a custom switcher class.

class my_classifier(BaseEstimator):
    # thin wrapper so the grid search can swap in any estimator via 'clf'
    def __init__(self, estimator=None):
        self.estimator = estimator

    def fit(self, X, y=None):
        self.estimator.fit(X, y)
        return self

    def predict(self, X, y=None):
        # predict() takes only the features; y is accepted but ignored
        return self.estimator.predict(X)

    def predict_proba(self, X):
        return self.estimator.predict_proba(X)

    def score(self, X, y):
        return self.estimator.score(X, y)

Create a pipeline and pass parameters. We will be using a Random Forest classifier with multiple hyperparameters.

pipe = Pipeline([('clf', my_classifier())])

parameters = [
    {
        'clf': [RandomForestClassifier()],
        'clf__n_estimators': [75, 100, 125],
        'clf__min_samples_split': [2, 4, 6],
        'clf__max_depth': [5, 10, 15],
    },
]

Create a GridSearchCV object and fit the model with it

grid = GridSearchCV(pipe, parameters, cv=5, scoring='roc_auc')
grid.fit(x_train, y_train)

# best pipeline found by the search and its cross-validated ROC-AUC
model = grid.best_estimator_
score = grid.best_score_
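To get a feel for how much work this grid search does, the number of candidate models can be counted without training anything. The sketch below uses only the standard library; the grid values are copied from the parameters list above, and the final "+ 1" assumes GridSearchCV's default refit behaviour (one extra fit of the best candidate on the full training set).

```python
from itertools import product

# Same hyperparameter values as in the article's `parameters` list.
param_grid = {
    "clf__n_estimators": [75, 100, 125],
    "clf__min_samples_split": [2, 4, 6],
    "clf__max_depth": [5, 10, 15],
}
cv_folds = 5  # matches cv=5 passed to GridSearchCV

# Every combination of values is one candidate model.
candidates = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]

# Each candidate is trained once per fold; with the default refit=True the
# best candidate is then trained one more time on the whole training set.
total_fits = len(candidates) * cv_folds + 1

print(len(candidates), "candidates,", total_fits, "model fits")
```

Adding one more value to any of the three lists adds nine candidates, so the grid is worth keeping small when each random forest takes a while to train.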

Calculate roc-auc of test data

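from sklearn.metrics import roc_auc_score

# predicted class labels for the held-out data; passing
# model.predict_proba(x_test)[:, 1] instead scores the ranking of probabilities
y_pred = model.predict(x_test)
roc_auc = roc_auc_score(y_test, y_pred)
print(f'The ROC-AUC for test data is found to be {roc_auc}')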
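For intuition about what the metric measures, ROC-AUC can be read as the fraction of (leaver, stayer) pairs in which the leaver receives the higher score, with ties counted as half. The following is a minimal standard-library sketch of that definition, not scikit-learn's implementation, and the four labels and scores are made up for illustration.

```python
def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth and predicted probabilities of leaving.
y_true = [0, 0, 1, 1]
proba = [0.1, 0.3, 0.4, 0.9]

# Hard labels obtained by thresholding the same probabilities at 0.5.
labels = [1 if p >= 0.5 else 0 for p in proba]

auc_from_proba = roc_auc(y_true, proba)
auc_from_labels = roc_auc(y_true, labels)
print(auc_from_proba, auc_from_labels)
```

In this toy example the probabilities rank every leaver above every stayer, while the thresholded labels create ties and score lower. That is why AUC is usually computed from probabilities rather than from hard class labels.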

Serialize the model with Joblib and store it

from joblib import dump

dump(model, 'my-model2')

We saved the model in our current directory.

With this, we successfully built our classification model using GridSearchCV.

Create REST API

The next step is to wrap our model with a REST API. This allows us to access our saved model as and when required. We can get our prediction via an HTTP request to an API endpoint. For this, we will be using FastAPI. FastAPI is a high-performance Python framework for building APIs; it is built on Starlette and generates interactive Swagger documentation for your endpoints automatically. For more on this, refer to my article “Getting started with Fast API”.

First of all, import libraries

from fastapi import FastAPI
from pydantic import BaseModel
from joblib import load
import pandas as pd
import json

Instantiate the FastAPI app and load the model

app = FastAPI()
model = load('my-model2')

Build a Pydantic data model for input data

class user_input(BaseModel):
    satisfaction_level: float
    last_evaluation: float
    number_project: int
    average_montly_hours: int
    time_spend_company: int
    Work_accident: int
    promotion_last_5years: int
    departments: str
    salary: str

Create a prediction function that converts the input into the form the model expects

def predict(data):
    # replace the department name with its integer code
    departments_list = ['IT', 'RandD', 'accounting', 'hr', 'management',
                        'marketing', 'product_mng', 'sales', 'support', 'technical']
    data[-2] = departments_list.index(data[-2])
    # replace the salary band with its integer code
    salaries = ['low', 'medium', 'high']
    data[-1] = salaries.index(data[-1])
    columns = ['satisfaction_level', 'last_evaluation', 'number_project',
               'average_montly_hours', 'time_spend_company', 'Work_accident',
               'promotion_last_5years', 'departments', 'salary']
    frame = pd.DataFrame([data], columns=columns)
    prediction = model.predict(frame)
    proba = model.predict_proba(frame)
    return prediction, proba
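The hard-coded departments_list above has to reproduce the integer codes that LabelEncoder assigned during training. LabelEncoder stores its classes in sorted order and uses each class's position as its code, so the list is simply the sorted department names. The sketch below mimics that ordering with the standard library (it does not call scikit-learn); the order of the raw values is an arbitrary assumption.

```python
# Department names as they might appear, in arbitrary order, in the raw data.
departments_seen = ['sales', 'technical', 'support', 'IT', 'product_mng',
                    'marketing', 'RandD', 'accounting', 'hr', 'management',
                    'sales', 'IT']

# Sorted unique labels; uppercase letters sort before lowercase ones,
# which is why 'IT' and 'RandD' come first.
classes = sorted(set(departments_seen))

# Each label's position in the sorted list is its integer code.
encode = {name: code for code, name in enumerate(classes)}

print(classes)
print(encode['sales'])
```

If the training script ever changes how categories are encoded, this list in the API has to change with it; saving the fitted encoder next to the model would remove that coupling.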

Create a base endpoint so you know the API is up and running

@app.get('/')
async def welcome():
    return 'Welcome to HR api'

Create an endpoint for prediction

@app.post('/predict')
async def func(Input: user_input):
    # collect the fields in the column order the model was trained on
    data = [Input.satisfaction_level, Input.last_evaluation, Input.number_project,
            Input.average_montly_hours, Input.time_spend_company, Input.Work_accident,
            Input.promotion_last_5years, Input.departments, Input.salary]
    pred, proba = predict(data)
    output = {'prediction': int(pred[0]), 'probability': float(proba[0][1])}
    return json.dumps(output)
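One detail of this endpoint is easy to miss: it returns json.dumps(output), which is already a string, and FastAPI then serializes that return value to JSON once more. The client therefore receives a JSON string that itself contains JSON and has to decode it twice. The standard-library sketch below imitates the two encoding steps; the prediction values are hypothetical.

```python
import json

# Hypothetical result of the predict() helper.
output = {'prediction': 1, 'probability': 0.87}

# Step 1: what the endpoint function returns (a Python str).
returned_by_endpoint = json.dumps(output)

# Step 2: what goes over the wire once the framework JSON-encodes that str.
response_body = json.dumps(returned_by_endpoint)

# Client side: the first decode (what response.json() does) yields a str...
first_decode = json.loads(response_body)
# ...and only the second decode yields the dictionary.
second_decode = json.loads(first_decode)

print(type(first_decode).__name__, second_decode)
```

Returning the dictionary itself from the endpoint would let the framework do the single encoding step, and the client could then use response.json() directly.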


To view the API, run the command below.

uvicorn hrapp:app --reload

Streamlit App

Streamlit is an open-source library for building data apps. It provides tools that make it easy to create an interactive website to view data, run machine learning models, and accept user input without writing any HTML, CSS, or JavaScript. Check out their official documentation for more information.

This app will be in a separate directory. So, create another virtual environment.

python -m venv streamlit-app

Activate the virtual environment.

source path-to-directory/bin/activate

Create a Python file and import libraries.

import streamlit as st
import requests
import json

Define the title and header and add an image.

st.title('HR-analytics App')  # title to be shown
st.image('office.jpg')  # add an image
st.header('Enter the employee data:')  # header to be shown in app

Create input forms

satisfaction_level = st.number_input('satisfaction level', min_value=0.00, max_value=1.00)
last_evaluation = st.number_input('last evaluation score', min_value=0.00, max_value=1.00)
number_project = st.number_input('number of projects', min_value=1)
average_montly_hours = st.slider('average monthly hours', min_value=0, max_value=320)
time_spend_company = st.number_input(label='Number of years at company', min_value=0)
Work_accident = st.selectbox('If met an accident at work', [1, 0], index=1)
promotion_last_5years = st.selectbox('Promotion in last 5 years yes=1/no=0', [1, 0], index=1)
departments = st.selectbox('Department', ['IT', 'RandD', 'accounting', 'hr', 'management',
                                          'marketing', 'product_mng', 'sales', 'support', 'technical'])
salary = st.selectbox('Salary Band', ['low', 'medium', 'high'])

Create a dictionary of the above variables with keys.

names = ['satisfaction_level', 'last_evaluation', 'number_project', 'average_montly_hours',
         'time_spend_company', 'Work_accident', 'promotion_last_5years', 'departments', 'salary']
params = [satisfaction_level, last_evaluation, number_project, average_montly_hours,
          time_spend_company, Work_accident, promotion_last_5years, departments, salary]
input_data = dict(zip(names, params))

Predict the output

if st.button('Predict'):
    try:
        # send the form values as JSON to the prediction endpoint
        output_ = requests.post('http://localhost:8000/predict', json=input_data)
    except requests.exceptions.RequestException:
        st.error('Not able to connect to api server')
        st.stop()
    # the endpoint returns a JSON-encoded string, so decode it once more
    ans = json.loads(output_.json())
    output = 'Yes' if ans['prediction'] == 1 else 'No'
    if output == 'Yes':
        st.success(f"The employee might leave the company with a probability of {(ans['probability'])*100: .2f}")
    if output == 'No':
        st.success(f"The employee might not leave the company with a probability of {(1-ans['probability'])*100: .2f}")


To launch the app, type the below command in the CLI, followed by the name of your app file.

streamlit run

Containerizing the Apps

The story of modern app deployments is incomplete without containerization. And when it comes to containerization, the first thing that comes to mind is Docker. Docker is an essential part of MLOps and DevOps. It creates an isolated environment for each app component, in this case, the model API and the Streamlit front end. This allows developers to use different tech stacks for various application components. To get an idea of how Docker is used, refer to this article.

To Dockerize the apps, we first need to create a Dockerfile for each component in its respective directory.

Dockerfile for the REST API

FROM python:3.9.15-slim-bullseye
WORKDIR /code
COPY ./requirements.txt /code/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
COPY ./ ./my-model2 /code/
EXPOSE 8000
CMD ["uvicorn", "hr_analytics_api:app", "--host", "0.0.0.0"]

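If you are using Python’s built-in virtual environment, you can create the requirements.txt file from the CLI, for example with pip freeze > requirements.txt.

Create a Dockerfile and a requirements text file for the Streamlit app as well. The last argument of CMD is the name of your Streamlit script.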

FROM python:3.9.15-slim-bullseye
WORKDIR /streamlit_code
COPY ./requirements.txt /streamlit_code/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /streamlit_code/requirements.txt
COPY ./ ./office.jpg /streamlit_code/
EXPOSE 8501
CMD [ "streamlit", "run", ""]

Now create containers for both the Streamlit app and the REST API. You can do it individually or define a docker-compose file that builds containers for both.

In our case, it is pretty straightforward. We define the docker-compose file, which is a YAML file, in the following manner.

version: "2"
services:
  streamlit_app:
    build: ./streamlit-app
    ports:
      - '8501:8501'
  hr_app:
    build: ./hranalytics
    ports:
      - '8000:8000'

This is our project tree structure.

├── docker-compose.yml

├── hranalytics

├── misc

└── streamlit-app

Assuming you have already set up Docker Desktop in your local environment, run the following in your CLI.

docker-compose up

Two containers will be up and running in a few minutes. Check the containers in your local environment by running

docker container ls

Before we go to GKE, we need to push our Docker images to a remote registry such as Docker Hub or Google Container Registry. You can go for either of them, but as Docker Hub is free with no storage limitation, we go with it. Create a Docker Hub account if you do not have one. Then log in to Docker Hub from your CLI (which is needed to push images to the remote registry).

docker login

Enter your credentials when prompted.

Re-tag the images to match your Docker Hub ID and repository name. For example, if your Docker Hub ID is xyz1 and your repository name is my-repo, then name your images as follows.

docker tag

Tag the other image in the same way. Now, push them to Docker Hub.

docker push

You can visit the Docker Hub and see your images there.

Deploy on GKE

So far, so good. But our goal is to deploy the containers on Google Kubernetes Engine. You might be thinking, why GKE of all the options? Well, I like the UI/UX of GCP; unlike AWS, it doesn’t feel clumsy and complicated, which makes it easy for beginners. It is also used industry-wide. But before that, let’s understand a few things about Kubernetes.


Kubernetes is an open-source tool for container orchestration; in simple terms, it is the tool to manage and coordinate containers across a cluster of machines. When our application needs more than one container to work, it is better to opt for Kubernetes, as it helps scale, manage, and replicate multiple containers independently. It also enables safe rolling updates of individual services, and it makes monitoring of services less painful through integrations such as Prometheus. Together, this allows for more efficient use of resources, improved application resilience, and easier management of complex microservice architectures.

Kubernetes Lingo

Pod: The basic unit of Kubernetes deployment is a pod. A pod is a group of one or more containers that share storage and network resources. A single pod can have a single container as well.

Nodes: A node in a cluster is a virtual or physical machine responsible for hosting pods and running containers.

Cluster: Clusters are a set of nodes running containerized applications.

Step-by-step process to deploy containers on GKE

Step-1: The first step is to create a project on GCP. Add proper billing details to be eligible to access GCP services.

Step-2: Go to Cloud Console, search for GKE, and create a Kubernetes cluster by following the prompts. You can also create a Kubernetes cluster from the Google Cloud Shell. Refer to this official guide to create one.

Step-3: Create two YAML files, one for each service: front end and back end. YAML originally stood for Yet Another Markup Language and is now defined as YAML Ain’t Markup Language. These YAML files describe the overall configuration of deployments and services.

Deployments: Higher order abstraction of pods. Responsible for replacing and updating pods as and when needed without any downtime.

Services: Services are responsible for routing and load-balancing traffic from external and internal sources to pods. Whenever a pod is deployed or replaced, its IP address changes. Hence a stable address provider is needed. A service provides stable IP addressing and DNS name to pods.

We will define deployment and service for each of our applications.

YAML for the frontend app.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: streamlit
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: streamlit
  template:
    metadata:
      labels:
        app: streamlit
    spec:
      containers:
        - name: streamlitapp
          image: xyz1/streamlit-app:v1
          imagePullPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: streamlit-app
  namespace: default
spec:
  type: LoadBalancer
  selector:
    app: streamlit
  ports:
    - port: 8501
      targetPort: 8501

YAML for the back end.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hr-api
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: hrapp
          image: xyz1/hrapp:v1
          imagePullPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: backend-api
  namespace: default
spec:
  type: LoadBalancer
  selector:
    app: api
  ports:
    - port: 8000
      targetPort: 8000

It might seem overwhelming at first, but the specifications are pretty straightforward. Refer to this official documentation from Kubernetes to understand Kubernetes objects.

Step-4: Create a GitHub repository and push these YAML files.

Step-5: Open the Google cloud shell and clone the GitHub repository. Enter your directory using the “cd ” command.

Step-6: Type kubectl apply -f frontend.yaml -f backend.yaml. The deployments and services will be created.

Step-7: Check the application resources on the cluster.

kubectl get all

It will show you all the active resources, such as pods, deployments, services, and replicas.

The external IPs are where your apps are deployed. Visit the same in your browser to access your apps. Don’t forget to add ports in the links.

There is still one thing missing. If you run the streamlit app and try to get a prediction, you will encounter an HTTP connection error from the requests library. This is because, in our original streamlit app file, we were sending the HTTP post request to the localhost:8000/predict endpoint. In the local environment, the back end was hosted on localhost. In this case, it is not.

There are two ways you can resolve this issue.

By sending requests directly to the pod IP.

By using DNS of pods through services.

As I mentioned earlier, the former method is not sustainable, as the IP of a pod changes when it is replaced. So, we use the second method. Kubernetes resolves inter-pod communication with the help of services. If a pod is a part of a service, then we can send HTTP requests to that pod through the service IP or hostname, and the service will load-balance the request to one of the pods that match the selector.

This is how we can send requests from one pod to another.
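As a rough sketch of what the edited address can look like, the helper below builds the prediction URL from the service name defined in the back-end YAML (backend-api in the default namespace, port 8000). The svc.cluster.local suffix assumes the default Kubernetes cluster domain, and the API_URL environment variable is a hypothetical override added here so the same file also works locally; neither the helper nor the variable is part of the original app.

```python
import os


def predict_url(service="backend-api", namespace="default", port=8000, path="/predict"):
    """Return the URL of the prediction endpoint.

    API_URL (hypothetical) wins if it is set, e.g. http://localhost:8000 for
    local runs; otherwise the in-cluster DNS name of the service is used.
    """
    override = os.environ.get("API_URL")
    if override:
        return override.rstrip("/") + path
    return f"http://{service}.{namespace}.svc.cluster.local:{port}{path}"


# Inside the Streamlit app, this URL would replace the hard-coded localhost one.
print(predict_url())
```

Within the same namespace the short name (http://backend-api:8000/predict) resolves as well; the fully qualified form just makes the namespace explicit.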

Step-8: Go to your IDE and copy the code after editing the address. To edit our streamlit app file, we need to get into the container inside the pod.

kubectl exec -it -c sh

Check available files by typing the ls command.

Now type,

Paste the Python file. To add a new line, press Enter and then Ctrl + C.

Step-9: Now, everything is ready. Go to your app and make predictions.

The below GIF is the final product. You can find the complete code here in this GitHub repository.


Throughout this article, we covered a lot of ground, from building a model to deploying it on Google Kubernetes Engine. The key takeaways of this article are:

How to create a data app with Streamlit.

How to serve an ML model as a REST API with FastAPI.

How to containerize applications using Docker.

How to deploy the application on GKE.

So, this was all about it. I hope you found the article helpful. Follow me on Twitter for more things related to Development and Machine learning.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion. 



Text Analysis App Using Spacy, Streamlit, And Hugging Face Spaces

This article was published as a part of the Data Science Blogathon.


Text Classification

Text Extraction

Word Frequency

Collocation

Concordance

Word Sense Disambiguation

Clustering


Text Classification aims to assign a predefined tag or category to the unstructured textual data. Some of the most important text classification tasks are sentiment analysis, topic modeling, language detection, and intent detection.

Text Extraction aims to extract a piece of data that is already present in the data. Some of the important text extraction tasks are keyword extraction, named entity recognition. These are useful in identifying relevant information.

Word Frequency aims to measure the most frequently occurring words in a given text using TF-IDF. We can use this to know the most frequent words that customers use while chatting with a customer support executive or even in the case of reviewing product reviews.
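To make the TF-IDF idea concrete, here is a small standard-library sketch. It uses one common variant (term frequency divided by document length, multiplied by log(N / document frequency)); other libraries use slightly different formulas, and the three support-chat sentences are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical customer-support messages, one per document.
docs = [
    "the agent was helpful and the refund was quick",
    "the delivery was late",
    "the refund never arrived",
]
tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents does each word occur?
doc_freq = Counter(word for tokens in tokenized for word in set(tokens))


def tf_idf(tokens):
    """TF-IDF score of every word in one document."""
    counts = Counter(tokens)
    return {
        word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
        for word, count in counts.items()
    }


scores = tf_idf(tokenized[0])
for word, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{word:10s} {score:.3f}")
```

A word such as "the" occurs in every document, so its score is zero even though it is the most frequent token, while words unique to one message rise to the top.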

Collocation calculates the words that commonly co-occur with each other. Bi-grams and Tri-grams are the types of collocation that help us find the hidden semantic structure.
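Bi-grams and tri-grams are easy to count by hand, which helps when checking what a library reports. The sketch below uses only the standard library and an invented sentence; it counts raw co-occurrences, whereas dedicated collocation measures usually also normalize by how frequent each individual word is.

```python
from collections import Counter


def ngrams(tokens, n):
    """All runs of n consecutive tokens, as tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))


# Hypothetical piece of customer feedback.
text = "customer support was great and customer support was fast"
tokens = text.split()

bigrams = Counter(ngrams(tokens, 2))
trigrams = Counter(ngrams(tokens, 3))

print(bigrams.most_common(2))
print(trigrams.most_common(1))
```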

A concordance helps us to find the instances and context of words. Word Sense Disambiguation helps us to determine which meaning a word carries in context when it has more than one. Clustering enables us to group texts with common attributes as a cluster. In this way, text analysis helps us to find the qualitative aspects of a given text.

In this app, we use Text Classification and Text Extraction techniques to analyze the given sentence. More specifically we use Sentiment analysis, Named Entity Recognition, and Subjectivity. Subjectivity gives us the measure of to what extent a given sentence is opinionated.







Spacy

Spacy is an open-source Python library used for all kinds of Natural Language Processing (NLP) tasks and is widely used in the industry. It offers industry-grade scalable features and is very robust. In the app that we are going to build, we shall use the Named Entity Recognition (NER) of the Spacy library.

Spacy TextBlob

Spacy TextBlob is a pipeline component for the Spacy library that enables us to do sentiment analysis. It gives us the sentiment (polarity) of the given sentence as well as its subjectivity. It uses the TextBlob library under the hood to get the results.



Streamlit

Streamlit is an open-source Python library that is used to build web apps. It can be used to quickly build ML web apps and data visualization dashboards. The library is easy to learn, and anyone can quickly pick up the skills for building user interfaces for their ML apps. We shall use this library to build our web app.


Hugging Face Spaces

Hugging Face Spaces is a great way of deploying our machine learning web apps quickly. It offers to host an unlimited number of apps on its servers free of cost. In this project, we will host our app on Hugging Face Spaces.


Building the Application

Firstly, we will install all the necessary libraries as follows –

pip install spacy
pip install spacytextblob
pip install streamlit
python -m spacy download en_core_web_sm

Next, we code our application as follows –

import streamlit as st
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

st.set_page_config(layout='wide', initial_sidebar_state='expanded')
st.title('Text Analysis using Spacy Textblob')
st.markdown('Type a sentence in the below text box and choose the desired option in the adjacent menu.')
side = st.sidebar.selectbox("Select an option below", ("Sentiment", "Subjectivity", "NER"))
Text = st.text_input("Enter the sentence")

@st.cache
def sentiment(text):
    nlp = spacy.load('en_core_web_sm')
    nlp.add_pipe('spacytextblob')
    doc = nlp(text)
    # polarity ranges from -1 (negative) to 1 (positive); depending on the
    # installed spacytextblob version it may be exposed as doc._.blob.polarity
    if doc._.polarity < 0:
        return "Negative"
    elif doc._.polarity == 0:
        return "Neutral"
    else:
        return "Positive"

@st.cache
def subjectivity(text):
    nlp = spacy.load('en_core_web_sm')
    nlp.add_pipe('spacytextblob')
    doc = nlp(text)
    # subjectivity ranges from 0 (objective) to 1 (opinionated)
    if doc._.subjectivity > 0.5:
        return "Highly Opinionated sentence"
    elif doc._.subjectivity < 0.5:
        return "Less Opinionated sentence"
    else:
        return "Neutral sentence"

@st.cache
def ner(sentence):
    nlp = spacy.load("en_core_web_sm")
    doc = nlp(sentence)
    # (entity text, entity label) pairs found in the sentence
    ents = [(e.text, e.label_) for e in doc.ents]
    return ents

def run():
    if side == "Sentiment":
        st.write(sentiment(Text))
    if side == "Subjectivity":
        st.write(subjectivity(Text))
    if side == "NER":
        st.write(ner(Text))

if __name__ == '__main__':
    run()

Explanation of the above code –

As a first step, we import necessary libraries.

Next we set our application page configuration using ‘st.set_page_config()‘. After this, we give the title of our app page using ‘st.title()‘ and write a short description of what our app does, using the  ‘st.markdown()‘. Then we create a sidebar for our application to show the user options using the ‘st.sidebar.selectbox()‘ and give three options for our three text analysis operations as ‘Sentiment’, ‘Subjectivity’ and ‘NER’.

We need to take text input from the user. So we do that using ‘st.text_input()‘. Now we need to create three functions to do three text analysis operations as we wanted. The first function is the sentiment function. We use spacy textblob to find the sentiment of the given text. Here we do a slight modification of the sentiment because spacy textblob gives a polarity score of the text ranging from -1 to 1. If the polarity score is negative then it is ‘Negative’ sentiment. If the polarity score is zero then sentiment is ‘Neutral’ and if the polarity score is positive then the sentiment is ‘Positive’. In this way, we create the Sentiment function as shown in the above code block. We cache this function using ‘@st.cache‘ so that there won’t be any need to re-run the function every time we run the app and this increases the speed of the app.

Similarly, we define the Subjectivity function using spacy textblob. Since subjectivity scores range between 0 and 1 we mark the sentence as highly opinionated if the score is above 0.5 and we mark the sentence as less opinionated if the score is below 0.5 and as a neutral sentence, if the score is equal to 0.5. Next, we create the Named Entity Recognition(NER) function using spacy to get the named entities.
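The two thresholding rules described above can also be written as plain functions that take a score and return a label. This is an optional refactoring sketch rather than part of the original app: the function names are hypothetical, and keeping the rules free of Spacy and Streamlit makes them easy to check in isolation.

```python
def sentiment_label(polarity):
    """Map a polarity score in [-1, 1] to the label used by the app."""
    if polarity < 0:
        return "Negative"
    if polarity == 0:
        return "Neutral"
    return "Positive"


def subjectivity_label(subjectivity):
    """Map a subjectivity score in [0, 1] to the label used by the app."""
    if subjectivity > 0.5:
        return "Highly Opinionated sentence"
    if subjectivity < 0.5:
        return "Less Opinionated sentence"
    return "Neutral sentence"


# Made-up scores, just to exercise each branch.
for score in (-0.4, 0.0, 0.7):
    print(score, sentiment_label(score))
for score in (0.2, 0.5, 0.9):
    print(score, subjectivity_label(score))
```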

Finally, we create our run function to run the app using all the functions we created. If the user inputs a text and selects the Sentiment option in the sidebar then the sentiment function runs and displays the sentiment. If the user selects the subjectivity option then the subjectivity function runs and displays the result as programmed. Similarly, if the user selects the NER option then the ner function runs and displays the named entities of that text.

Deployment spacy spacytextblob

I have already created a text analysis app as described in this article.

Please check it out here – Text Analysis With Spacy And Streamlit – a Hugging Face Space by rajesh1729



image-1 source: spaCyTextBlob · spaCy Universe

image-2 source: Streamlit • The fastest way to build and share data apps

image-3 source: Spaces – Hugging Face

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.


Deploying Machine Learning Models Using Streamlit – An Introductory Guide To Model Deployment


Understand the concept of model deployment

Perform model deployment using Streamlit for loan prediction data


I believe most of you have done some form of data science project at some point in your lives, be it a machine learning project, a deep learning project, or even a visualization of your data. And the best part of these projects is showcasing them to others. This will not only motivate you and reward your hard work but will also help you to improve your project.

But the question is how will you showcase your work to others? Well, this is where Model Deployment will help you.

I have been exploring the field of Model Deployment for the past few months now. Model Deployment helps you showcase your work to the world and make better decisions with it. But deploying a model can get a little tricky at times. Before deploying the model a lot of things need to be looked into, such as data storage, pre-processing, model building, and monitoring. This can be a bit confusing as the number of tools that perform these model deployment tasks efficiently is few. Enter, Streamlit!

Streamlit is a popular open-source framework used for model deployment by machine learning and data science teams. And the best part is that it is free of cost and written purely in Python.

In this article, we are going to deep dive into model deployment. We will first build a loan prediction model and then deploy it using Streamlit.

Table of Contents

Overview of Machine Learning Lifecycle

Understanding the Problem Statement: Automating Loan Prediction

Machine Learning model for Automating Loan Prediction

Introduction to Streamlit

Model Deployment of the Loan Prediction model using Streamlit

Overview of Machine Learning Lifecycle

Let’s start with understanding the overall machine learning lifecycle, and the different steps that are involved in creating a machine learning project. Broadly, the entire machine learning lifecycle can be described as a combination of 6 stages. Let me break these stages for you:

Stage 1: Problem Definition

The first and most important part of any project is to define the problem statement. Here, we want to describe the aim or the goal of our project and what we want to achieve at the end.

Stage 2: Hypothesis Generation

Once the problem statement is finalized, we move on to the hypothesis generation part. Here, we try to point out the factors/features that can help us to solve the problem at hand.

Stage 3: Data Collection

After generating hypotheses, we get the list of features that are useful for a problem. Next, we collect the data accordingly. This data can be collected from different sources.

Stage 4: Data Exploration and Pre-processing

After collecting the data, we move on to explore and pre-process it. These steps help us to generate meaningful insights from the data. We also clean the dataset in this step, before building the model.

Stage 5: Model Building

Once we have explored and pre-processed the dataset, the next step is to build the model. Here, we create predictive models in order to build a solution for the project.

Stage 6: Model Deployment

Once you have the solution, you want to showcase it and make it accessible for others. And hence, the final stage of the machine learning lifecycle is to deploy that model.

These are the 6 stages of a machine learning lifecycle. The aim of this article is to understand the last stage, i.e. model deployment, in detail using streamlit. However, I will briefly explain the remaining stages and the complete machine learning lifecycle along with their implementation in Python, before diving deep into the model deployment part using streamlit.

So, in the next section, let’s start with understanding the problem statement.

Understanding the Problem Statement: Automating Loan Prediction

The project that I have picked for this particular blog is automating the loan eligibility process. The task is to predict whether the loan will be approved or not based on the details provided by customers. Here is the problem statement for this project:

Automate the loan eligibility process based on customer details provided while filling online application form

Based on the details provided by customers, we have to create a model that can decide whether or not their loan should be approved. This completes the problem definition part of the first stage of the machine learning lifecycle. The next step is to generate hypotheses and point out the factors that will help us to predict whether the loan for a customer should be approved or not.

As a starting point, here are a couple of factors that I think will be helpful for us with respect to this project:

Amount of loan: The total amount of loan applied for by the customer. My hypothesis here is that the higher the amount of the loan, the lower the chances of loan approval, and vice versa.

Income of applicant: The income of the applicant (customer) can also be a deciding factor. A higher income will lead to higher probability of loan approval.

Education of applicant: Educational qualification of the applicant can also be a vital factor to predict the loan status of a customer. My hypothesis is if the educational qualification of the applicant is higher, the chances of their loan approval will be higher.

These are some factors that can be useful to predict the loan status of a customer. Obviously, this is a very small list, and you can come up with many more hypotheses. But, since the focus of this article is on model deployment, I will leave this hypothesis generation part for you to explore further.

Next, we need to collect the data. We know certain features that we want like the income details, educational qualification, and so on. And the data related to the customers and loan is provided at the datahack platform of Analytics Vidhya. You can go to the link, register for the practice problem, and download the dataset from the problem statement tab. Here is a summary of the variables available for this particular problem:

We have some variables related to the loan, like the loan ID, which is the unique ID for each customer, Loan Amount and Loan Amount Term, which tells us the amount of loan in thousands and the term of the loan in months respectively. Credit History represents whether a customer has any previous unclear debts or not. Apart from this, we have customer details as well, like their Gender, Marital Status, Educational qualification, income, and so on. Using these features, we will create a predictive model that will predict the target variable which is Loan Status representing whether the loan will be approved or not.

Now we have finalized the problem statement, generated the hypotheses, and collected the data. Next is the data exploration and pre-processing phase. Here, we will explore the dataset and pre-process it. The common steps in this phase are as follows:

Univariate Analysis

Bivariate Analysis

Missing Value Treatment

Outlier Treatment

Feature Engineering

We explore the variables individually which is called the univariate analysis. Exploring the effect of one variable on the other, or exploring two variables at a time is the bivariate analysis. We also look for any missing values or outliers that might be present in the dataset and deal with them. And we might also create new features using the existing features which are referred to as feature engineering. Again, I will not focus much on these data exploration parts and will only do the necessary pre-processing.

After exploring and pre-processing the data, next comes the model building phase. Since it is a classification problem, we can use any of the classification models like the logistic regression, decision tree, random forest, etc. I have tried all of these 3 models for this problem and random forest produced the best results. So, I will use a random forest as the predictive model for this project.

Till now, I have briefly explained the first five stages of the machine learning lifecycle with respect to the project automating loan prediction. Next, I will demonstrate these steps in Python.

Machine Learning model for Automating Loan Prediction

In this section, I will demonstrate the first five stages of the machine learning lifecycle for the project at hand. The first two stages, i.e. Problem definition and hypothesis generation are already covered in the previous section and hence let’s start with the third stage and load the dataset. For that, we will first import the required libraries and then read the CSV file:

Here are the first five rows from the dataset. We know that machine learning models take only numbers as inputs and can not process strings. So, we have to deal with the categories present in the dataset and convert them into numbers.

Python Code:

Here, we have converted the categories present in the Gender, Married and the Loan Status variable into numbers, simply using the map function of python. Next, let’s check if there are any missing values in the dataset:
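As a rough sketch of the two steps just described, the snippet below maps categories to numbers with pandas `Series.map` and then counts the missing values per column. The tiny DataFrame and the exact category labels ("Male", "Yes", "Y" and so on) are assumptions for illustration only; check them against the CSV you downloaded.

```python
import pandas as pd

# Toy stand-in for the loan dataset; values and labels are assumed for illustration.
train = pd.DataFrame({
    "Gender": ["Male", "Female", None, "Male"],
    "Married": ["Yes", "No", "Yes", None],
    "LoanAmount": [128.0, None, 66.0, 120.0],
    "Loan_Status": ["Y", "N", "Y", "Y"],
})

# Convert each category to a number; missing or unmapped entries become NaN.
train["Gender"] = train["Gender"].map({"Male": 0, "Female": 1})
train["Married"] = train["Married"].map({"No": 0, "Yes": 1})
train["Loan_Status"] = train["Loan_Status"].map({"N": 0, "Y": 1})

# Count the missing values in every column.
missing_counts = train.isnull().sum()
print(missing_counts)
```

Note that `map` leaves missing entries as NaN rather than raising an error, which is why the missing-value check comes right after the conversion.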

So, there are missing values on many variables including the Gender, Married, LoanAmount variable. Next, we will remove all the rows which contain any missing values in them:
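A minimal sketch of this clean-up, assuming a small made-up frame in place of the real dataset: `DataFrame.dropna()` with its default arguments drops every row that has at least one missing value.

```python
import pandas as pd

# Toy frame with gaps; the real dataset has many more rows and columns.
train = pd.DataFrame({
    "Gender": [0, 1, None, 0],
    "LoanAmount": [128.0, None, 66.0, 120.0],
    "Loan_Status": [1, 0, 1, 1],
})

# Keep only the rows in which every column has a value.
train = train.dropna()

print(train.isnull().sum().sum())  # total number of missing cells left
print(len(train))                  # number of rows that survived
```

Dropping rows is the simplest treatment, but it also throws data away; imputing values is a common alternative when the dataset is small.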

Now there are no missing values in the dataset. Next, we will separate the dependent (Loan_Status) and the independent variables:
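The separation could look roughly like the sketch below. The feature column names follow the ones used in this article; the sample rows and the extra `Property_Area` column are placeholders added only to show that unused columns are simply left out.

```python
import pandas as pd

# Made-up rows; only the column names mirror the article.
train = pd.DataFrame({
    "Gender": [0, 1, 0],
    "Married": [1, 0, 1],
    "ApplicantIncome": [5849, 4583, 3000],
    "LoanAmount": [128.0, 66.0, 120.0],
    "Credit_History": [1.0, 1.0, 0.0],
    "Property_Area": ["Urban", "Rural", "Urban"],  # placeholder column, not used as a feature
    "Loan_Status": [1, 0, 1],
})

feature_cols = ["Gender", "Married", "ApplicantIncome", "LoanAmount", "Credit_History"]
X = train[feature_cols]   # independent variables
y = train["Loan_Status"]  # dependent (target) variable

print(X.shape, y.shape)
```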


For this particular project, I have picked only 5 variables that I think are most relevant: Gender, Marital Status, ApplicantIncome, LoanAmount, and Credit_History. These are stored in the variable X, and the target variable is stored in another variable, y. There are 480 observations available. Next, let’s move on to the model building stage.

Here, we will first split our dataset into a training and validation set, so that we can train the model on the training set and evaluate its performance on the validation set.
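Here is a small sketch of such a split using scikit-learn's `train_test_split`. The random arrays stand in for the real features and target, and the variable names and `random_state` value are arbitrary choices for this illustration, not necessarily those of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 480 rows and 5 features, mirroring the shapes mentioned above.
rng = np.random.default_rng(0)
X = rng.normal(size=(480, 5))
y = rng.integers(0, 2, size=480)

# Hold out 20% of the rows for validation; random_state only makes the split repeatable.
x_train, x_cv, y_train, y_cv = train_test_split(X, y, test_size=0.2, random_state=10)

print(x_train.shape, x_cv.shape)
```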


We have split the data using the train_test_split function from the sklearn library keeping the test_size as 0.2 which means 20 percent of the total dataset will be kept aside for the validation set. Next, we will train the random forest model using the training set:


Here, I have kept the max_depth as 4 for each of the trees of our random forest and stored the trained model in a variable named model. Now, our model is trained, let’s check its performance on both the training and validation set:
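The training and evaluation steps can be sketched as follows. This is not the original notebook code: the data comes from scikit-learn's `make_classification` rather than the loan dataset, so the accuracy it prints says nothing about the 80% figure reported in this article.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the loan features.
X, y = make_classification(
    n_samples=480, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
x_train, x_cv, y_train, y_cv = train_test_split(X, y, test_size=0.2, random_state=10)

# Limit every tree in the forest to a depth of 4, as described above.
model = RandomForestClassifier(max_depth=4, random_state=10)
model.fit(x_train, y_train)

# Compare accuracy on the training and validation sets.
train_acc = accuracy_score(y_train, model.predict(x_train))
cv_acc = accuracy_score(y_cv, model.predict(x_cv))
print(f"train accuracy: {train_acc:.3f}, validation accuracy: {cv_acc:.3f}")
```

A large gap between the two numbers would suggest overfitting; a shallow `max_depth` is one simple way to keep that gap small.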


The model is 80% accurate on the validation set. Let’s check the performance on the training set too:


Performance on the training set is almost similar to that on the validation set. So, the model has generalized well. Finally, we will save this trained model so that it can be used in the future to make predictions on new observations:
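As an illustration of saving and reloading a model with `pickle`, here is a self-contained sketch. The tiny model and the filename `model.pkl` are placeholders chosen for this example, not the names used in the original project.

```python
import os
import pickle
import tempfile

from sklearn.ensemble import RandomForestClassifier

# Tiny stand-in model so the example runs on its own.
model = RandomForestClassifier(max_depth=4, n_estimators=10, random_state=10)
model.fit([[0, 0], [1, 1], [0, 1], [1, 0]], [0, 1, 0, 1])

# "model.pkl" is a hypothetical filename; a temporary directory keeps the example tidy.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Later (for example inside the web app) the model is loaded back from disk.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored.predict([[1, 1]]))
```

Pickle files should only be loaded from sources you trust, and ideally with the same scikit-learn version that created them.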


We are saving the model in pickle format. This stores the trained model on disk, and we will use this file while deploying the model.

This completes the first five stages of the machine learning lifecycle. Next, we will explore the last stage, which is model deployment. We will deploy this loan prediction model so that it can be accessed by others. To do so, we will use Streamlit, which is a recent and one of the simplest ways of building web apps and deploying machine learning and deep learning models.

So, let’s first discuss this tool, and then I will demonstrate how to deploy your machine learning model using it.

Introduction to Streamlit

As per the founders of Streamlit, it is the fastest way to build data apps and share them. It is a recent model deployment tool that simplifies the entire model deployment cycle and lets you deploy your models quickly. I have been exploring this tool for the past couple of weeks and as per my experience, it is a simple, quick, and interpretable model deployment tool.

Here are some of the key features of Streamlit which I found really interesting and useful:

It quickly turns data scripts into shareable web applications. You just have to pass a running script to the tool and it can convert that to a web app.

Everything in Python. The best thing about Streamlit is that everything we do is in Python. From loading the model to creating the front end, all of it can be done in Python.

All for free. It is open source, so no cost is involved. You can deploy your apps without paying for them.

No front-end experience required. Model deployment generally has two parts: a front end and a back end. The back end is generally a working model, a machine learning model in our case, which is built in Python. The front end generally requires some knowledge of other languages such as JavaScript. Using Streamlit, we can create this front end in Python itself. So, we need not learn any other programming languages or web development techniques. Understanding Python is enough.

Let’s say we are deploying the model without using Streamlit. In that case, the entire pipeline will look something like this:

Model Building

Creating a python script

Write Flask app

Create front-end: JavaScript


We will first build our model and convert it into a Python script. Then we will have to create the web app using, say, Flask. We will also have to create the front end for the web app, and here we will have to use JavaScript. Then, finally, we will deploy the model. So, as you may notice, we need Python to build the model and then a thorough understanding of Flask and JavaScript to build the web app and front end and to deploy the model. Now, let’s look at the deployment pipeline if we use Streamlit:

Model Building

Creating a python script

Create front-end: Python


Here we will build the model and create a python script for it. Then we will build the front-end for the app which will be in python and finally, we will deploy the model. That’s it. Our model will be deployed. Isn’t it amazing? If you know python, model deployment using Streamlit will be an easy journey. I hope you are as excited about Streamlit as I was while exploring it earlier. So, without any further ado, let’s build our own web app using Streamlit.

Model Deployment of the Loan Prediction model using Streamlit

We will start with the basic installations:


We have installed 3 libraries here. pyngrok is a python wrapper for ngrok which helps to open secure tunnels from public URLs to localhost. This will help us to host our web app. Streamlit will be used to make our web app. 

Next, we will have to create a separate session in Streamlit for our app. You can download the required session file from here and store it in your current working directory. This will help you to create a session for your app. Finally, we have to create the Python script for our app. Let me show the code first and then I will explain it to you in detail:


This is the entire python script which will create the app for us. Let me break it down and explain in detail:

In this part, we are saving the script as a Python file and then loading the required libraries: pickle to load the trained model and streamlit to build the app. Then we load the trained model and store it in a variable named classifier.

Next, we have defined the prediction function. This function takes the data provided by users as input and makes the prediction using the model that we loaded earlier. It takes the customer details (gender, marital status, income, loan amount, and credit history) as input, pre-processes them so that they can be fed to the model, and finally makes the prediction using the model loaded as classifier. In the end, it returns whether the loan is approved or not based on the output of the model.
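A prediction function along these lines might look like the sketch below. The numeric encodings, the division of the loan amount by 1000, and the `AlwaysApprove` stand-in class are all assumptions made for this illustration; in practice the encodings must match whatever was used when the model was trained.

```python
def encode_inputs(gender, married, applicant_income, loan_amount, credit_history):
    """Turn the raw form values into the numeric row the model expects.

    The encodings below are assumed; they must mirror the training-time mapping.
    """
    gender_code = 0 if gender == "Male" else 1
    married_code = 0 if married == "Unmarried" else 1
    credit_code = 0 if credit_history == "Unclear Debts" else 1
    loan_in_thousands = loan_amount / 1000  # the dataset stores the loan amount in thousands
    return [[gender_code, married_code, applicant_income, loan_in_thousands, credit_code]]


def prediction(classifier, gender, married, applicant_income, loan_amount, credit_history):
    """Return 'Approved' or 'Rejected' using any object with a scikit-learn style predict()."""
    row = encode_inputs(gender, married, applicant_income, loan_amount, credit_history)
    return "Approved" if classifier.predict(row)[0] == 1 else "Rejected"


class AlwaysApprove:
    """Hypothetical stand-in for the trained model, used only to exercise the function."""

    def predict(self, rows):
        return [1 for _ in rows]


print(prediction(AlwaysApprove(), "Male", "Married", 5000, 120000, "No Unclear Debts"))
```

Keeping the encoding in its own function makes it easy to check that the app and the training code agree on how each category is represented.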

And here is the main app. First of all, we are defining the header of the app. It will display “Streamlit Loan Prediction ML App”. To do that, we are using the markdown function from streamlit. Next, we are creating five boxes in the app to take input from the users. These 5 boxes will represent the five features on which our model is trained. 

The first box is for the gender of the user. The user will have two options, Male and Female, and they will have to pick one from them. We are creating a dropdown using the selectbox function of streamlit. Similarly, for Married, we are providing two options, Married and Unmarried and again, the user will pick one from it. Next, we are defining the boxes for Applicant Income and Loan Amount.

Since both these variables will be numeric in nature, we are using the number_input function from streamlit. And finally, for the credit history, we are creating a dropdown which will have two categories, Unclear Debts, and No Unclear Debts. 
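The layout described above can be sketched with a handful of Streamlit calls (`markdown`, `selectbox`, `number_input`, `button`, `success`). In this sketch the Streamlit module is passed in as the argument `st`, a design choice made only so the layout can be exercised without a running server. The "Predict" button and the `predict` callable are assumptions; they are not described in the text above.

```python
def build_form(st):
    """Draw the five input boxes and return their values.

    `st` is expected to be the imported streamlit module.
    """
    gender = st.selectbox("Gender", ("Male", "Female"))
    married = st.selectbox("Marital Status", ("Unmarried", "Married"))
    applicant_income = st.number_input("Applicant Income")
    loan_amount = st.number_input("Loan Amount")
    credit_history = st.selectbox("Credit History", ("Unclear Debts", "No Unclear Debts"))
    return gender, married, applicant_income, loan_amount, credit_history


def run_app(st, predict):
    """Render the page; `predict` is any callable that accepts the five form values."""
    st.markdown("## Streamlit Loan Prediction ML App")
    values = build_form(st)
    # Assumed interaction: the prediction is shown only after the button is clicked.
    if st.button("Predict"):
        st.success(f"Your loan is {predict(*values)}")


# In a real script launched with `streamlit run`, the last lines would be something like:
#   import streamlit as st
#   run_app(st, predict=my_prediction_function)  # hypothetical callable wrapping the model
```

Streamlit reruns the whole script on every interaction, so the widget calls simply return the current value of each input each time the script executes.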

Alright, let’s now host this app at a public URL using the pyngrok library.


Here, we are first running the python script. And then we will connect it to a public URL:


This will generate a link something like this:

And it is as simple as this to build and deploy your machine learning models using Streamlit. 

End Notes

Congratulations! We have now successfully completed loan prediction model deployment using Streamlit. I encourage you to first try this particular project, play around with the values as input, and check the results. And then, you can try out other machine learning projects as well and perform model deployment using streamlit. 

The deployment is simple, fast, and most importantly in Python. However, there are a couple of challenges with it. We have used Google Colab as the backend to build it and, as you might be aware, the Colab session automatically restarts after 12 hours. Also, if your internet connection breaks, the Colab session breaks. Hence, if we are using Colab as the backend, we have to rerun the entire application once the session expires.


To deal with this, we can change the backend. AWS can be the right option here for the backend and using that, we can host our web app permanently. So, in my next article, I will demonstrate how to integrate AWS with Streamlit and make the model deployment process more efficient.


Using Python + Streamlit To Find Striking Distance Keyword Opportunities

Python is an excellent tool to automate repetitive tasks as well as gain additional insights into data.

It’s perfect for Python beginners and pros alike and is a great introduction to using Python for SEO.

If you’d just like to get stuck in there’s a handy Streamlit app available for the code. This is simple to use and requires no coding experience.

There’s also a Google Colaboratory Sheet if you’d like to poke around with the code. If you can crawl a website, you can use this script!

Here’s an example of what we’ll be making today:

These keywords are found in the page title and H1, but not in the copy. Adding these keywords naturally to the existing copy would be an easy way to increase relevancy for these keywords.

By taking the hint from search engines and naturally including any missing keywords a site already ranks for, we increase the confidence of search engines to rank those keywords higher in the SERPs.

This report can be created manually, but it’s pretty time-consuming.

So, we’re going to automate the process using a Python SEO script.

Preview Of The Output

This is a sample of what the final output will look like after running the report:

The final output takes the top five opportunities by search volume for each page and neatly lays each one horizontally along with the estimated search volume.

It also shows the total search volume of all keywords a page has within striking distance, as well as the total number of keywords within reach.

The top five keywords by search volume are then checked to see if they are found in the title, H1, or copy, then flagged TRUE or FALSE.

This is great for finding quick wins! Just add the missing keyword naturally into the page copy, title, or H1.

Getting Started

The setup is fairly straightforward. We just need a crawl of the site (ideally with a custom extraction for the copy you’d like to check), and an exported file of all keywords a site ranks for.

This post will walk you through the setup, the code, and will link to a Google Colaboratory sheet if you just want to get stuck in without coding it yourself.

To get started you will need:

A crawl of the website.

An export of all keywords a site ranks for.

This Google Colab sheet or this Streamlit app to mash up the crawl and keyword data

We’ve named this the Striking Distance Report as it flags keywords that are easily within striking distance.

(We have defined striking distance as keywords that rank in positions four to 20, but have made this a configurable option in case you would like to define your own parameters.)

Striking Distance SEO Report: Getting Started

1. Crawl The Target Website

Set a custom extractor for the page copy (optional, but recommended).

Filter out pagination pages from the crawl.

2. Export All Keywords The Site Ranks For Using Your Favorite Provider

Filter keywords that trigger as a site link.

Remove keywords that trigger as an image.

Filter branded keywords.

Use both exports to create an actionable Striking Distance report from the keyword and crawl data with Python.

Crawling The Site

I’ve opted to use Screaming Frog to get the initial crawl. Any crawler will work, so long as the CSV export uses the same column names or they’re renamed to match.

The script expects to find the following columns in the crawl CSV export:

"Address", "Title 1", "H1-1", "Copy 1", "Indexability"

Crawl Settings

The first thing to do is to head over to the main configuration settings within Screaming Frog:

The main settings to use are:

Crawl Internal Links, Canonicals, and the Pagination (Rel Next/Prev) setting.

(The script will work with everything else selected, but the crawl will take longer to complete!)

Next, it’s on to the Extraction tab.

At a bare minimum, we need to extract the page title, H1, and calculate whether the page is indexable as shown below.

Indexability is useful because it’s an easy way for the script to identify which URLs to drop in one go, leaving only keywords that are eligible to rank in the SERPs.

If the script cannot find the indexability column, it’ll still work as normal but won’t differentiate between pages that can and cannot rank.
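To see why a missing indexability column does not break anything, consider this small sketch. The two-row crawl export is made up; the `reindex` and `isin` calls are the same ones the script applies to the crawl data later on.

```python
import pandas as pd

# Hypothetical crawl export that lacks the "Indexability" column.
df_crawl = pd.DataFrame({
    "Address": ["https://example.com/a", "https://example.com/b"],
    "Title 1": ["Page A", "Page B"],
    "H1-1": ["A", "B"],
})

cols = ["Address", "Indexability", "Title 1", "H1-1", "Copy 1"]
df_crawl = df_crawl.reindex(columns=cols)  # missing columns are created and filled with NaN

# NaN is not equal to "Non-Indexable", so every row survives the filter.
df_crawl = df_crawl[~df_crawl["Indexability"].isin(["Non-Indexable"])]
print(len(df_crawl))
```

The same mechanism applies to a missing "Copy 1" column: it is created empty, so no keyword can be matched against the page copy.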

Setting A Custom Extractor For Page Copy

In order to check whether a keyword is found within the page copy, we need to set a custom extractor in Screaming Frog.

Name the extractor “Copy” as seen below.

Important: The script expects the extractor to be named “Copy” as above, so please double check!

Lastly, make sure Extract Text is selected to export the copy as text, rather than HTML.

There are many guides on using custom extractors online if you need help setting one up, so I won’t go over it again here.

Once the extraction has been set it’s time to crawl the site and export the HTML file in CSV format.

Exporting The CSV File

Exporting the CSV file is as easy as changing the drop-down menu displayed underneath Internal to HTML and pressing the Export button.

The export screen should look like the below:

Tip 1: Filtering Out Pagination Pages

I recommend filtering out pagination pages from your crawl either by selecting Respect Next/Prev under the Advanced settings (or just deleting them from the CSV file, if you prefer).

Tip 2: Saving The Crawl Settings

Once you have set the crawl up, it’s worth just saving the crawl settings (which will also remember the custom extraction).

This will save a lot of time if you want to use the script again in the future.

Exporting Keywords

Once we have the crawl file, the next step is to load your favorite keyword research tool and export all of the keywords a site ranks for.

The goal here is to export all the keywords a site ranks for, filtering out branded keywords and any which triggered as a sitelink or image.

For this example, I’m using the Organic Keyword Report in Ahrefs, but it will work just as well with Semrush if that’s your preferred tool.

In Ahrefs, enter the domain you’d like to check in Site Explorer and choose Organic Keywords.

This will bring up all keywords the site is ranking for.

Filtering Out Sitelinks And Image links

The next step is to filter out any keywords triggered as a sitelink or an image pack.

The reason we need to filter out sitelinks is that they have no influence on the parent URL ranking. This is because only the parent page technically ranks for the keyword, not the sitelink URLs displayed under it.

Filtering out sitelinks will ensure that we are optimizing the correct page.

Here’s how to do it in Ahrefs.

Lastly, I recommend filtering out any branded keywords. You can do this by filtering the CSV output directly, or by pre-filtering in the keyword tool of your choice before the export.

Finally, when exporting make sure to choose Full Export and the UTF-8 format as shown below.

By default, the script works with Ahrefs (v1/v2) and Semrush keyword exports. It can work with any keyword CSV file as long as the column names the script expects are present.


The following instructions pertain to running a Google Colaboratory sheet to execute the code.

There is now a simpler option for those that prefer it in the form of a Streamlit app. Simply follow the instructions provided to upload your crawl and keyword file.

Now that we have our exported files, all that’s left to be done is to upload them to the Google Colaboratory sheet for processing.

The script will prompt you to upload the keyword CSV from Ahrefs or Semrush first and the crawl file afterward.

That’s it! The script will automatically download an actionable CSV file you can use to optimize your site.

Once you’re familiar with the whole process, using the script is really straightforward.

Code Breakdown And Explanation

If you’re learning Python for SEO and interested in what the code is doing to produce the report, stick around for the code walkthrough!

Install The Libraries

Let’s install pandas to get the ball rolling.

!pip install pandas

Import The Modules

Next, we need to import the required modules.

import pandas as pd
from pandas import DataFrame, Series
from typing import Union
from google.colab import files

Set The Variables

Now it’s time to set the variables.

The script considers any keywords between positions four and 20 as within striking distance.

Changing the variables here will let you define your own range if desired. It’s worth experimenting with the settings to get the best possible output for your needs.

# set all variables here
min_volume = 10  # set the minimum search volume
min_position = 4  # set the minimum position / default = 4
max_position = 20  # set the maximum position / default = 20
drop_all_true = True  # If all checks (h1/title/copy) are true, remove the recommendation (Nothing to do)

Upload The Keyword Export CSV File

The next step is to read in the list of keywords from the CSV file.

It is set up to accept an Ahrefs report (V1 and V2) as well as a Semrush export.

upload = files.upload()
upload = list(upload.keys())[0]
df_keywords = pd.read_csv(
    (upload),
    on_bad_lines="skip",  # skip malformed rows (requires pandas 1.3+)
    low_memory=False,
    encoding="utf8",
    dtype={
        "URL": "str",
        "Keyword": "str",
        "Volume": "str",
        "Position": int,
        "Current URL": "str",
        "Search Volume": int,
    },
)
print("Uploaded Keyword CSV File Successfully!")

If everything went to plan, you’ll see a preview of the DataFrame created from the keyword CSV export. 

Upload The Crawl Export CSV File

Once the keywords have been imported, it’s time to upload the crawl file.

upload = files.upload()
upload = list(upload.keys())[0]
df_crawl = pd.read_csv(
    (upload),
    on_bad_lines="skip",  # skip malformed rows (requires pandas 1.3+)
    low_memory=False,
    encoding="utf8",
    dtype="str",
)
print("Uploaded Crawl Dataframe Successfully!")

Once the CSV file has finished uploading, you’ll see a preview of the DataFrame.

Clean And Standardize The Keyword Data

The next step is to rename the column names to ensure standardization between the most common types of file exports.

Essentially, we’re getting the keyword DataFrame into a good state and filtering using cutoffs defined by the variables.

df_keywords.rename(
    columns={
        "Current position": "Position",
        "Current URL": "URL",
        "Search Volume": "Volume",
    },
    inplace=True,
)

# keep only the following columns from the keyword dataframe
cols = "URL", "Keyword", "Volume", "Position"
df_keywords = df_keywords.reindex(columns=cols)

try:
    # clean the data. (v1 of the ahrefs keyword export combines strings and ints in the volume column)
    df_keywords["Volume"] = df_keywords["Volume"].str.replace("0-10", "0")
except AttributeError:
    pass

# clean the keyword data
df_keywords = df_keywords[df_keywords["URL"].notna()]  # remove any missing values
df_keywords = df_keywords[df_keywords["Volume"].notna()]  # remove any missing values
df_keywords = df_keywords.astype({"Volume": int})  # change data type to int
df_keywords = df_keywords.sort_values(by="Volume", ascending=False)  # sort by highest vol to keep the top opportunity

# make new dataframe to merge search volume back in later
df_keyword_vol = df_keywords[["Keyword", "Volume"]]

# drop rows if minimum search volume doesn't match specified criteria
df_keywords.loc[df_keywords["Volume"] < min_volume, "Volume_Too_Low"] = "drop"
df_keywords = df_keywords[~df_keywords["Volume_Too_Low"].isin(["drop"])]

# drop rows if minimum search position doesn't match specified criteria
df_keywords.loc[df_keywords["Position"] <= min_position, "Position_Too_High"] = "drop"
df_keywords = df_keywords[~df_keywords["Position_Too_High"].isin(["drop"])]

# drop rows if maximum search position doesn't match specified criteria
df_keywords.loc[df_keywords["Position"] >= max_position, "Position_Too_Low"] = "drop"  # flag rows at or beyond the maximum position
df_keywords = df_keywords[~df_keywords["Position_Too_Low"].isin(["drop"])]

Clean And Standardize The Crawl Data

Next, we need to clean and standardize the crawl data.

Essentially, we use reindex to only keep the “Address,” “Indexability,” “Title 1,” “H1-1,” and “Copy 1” columns, discarding the rest.

We use the handy “Indexability” column to only keep rows that are indexable. This will drop canonicalized URLs, redirects, and so on. I recommend enabling this option in the crawl.

Lastly, we standardize the column names so they’re a little nicer to work with.

# keep only the following columns from the crawl dataframe
cols = "Address", "Indexability", "Title 1", "H1-1", "Copy 1"
df_crawl = df_crawl.reindex(columns=cols)

# drop non-indexable rows
df_crawl = df_crawl[~df_crawl["Indexability"].isin(["Non-Indexable"])]

# standardise the column names
df_crawl.rename(columns={"Address": "URL", "Title 1": "Title", "H1-1": "H1", "Copy 1": "Copy"}, inplace=True)
df_crawl.head()

Group The Keywords

As we approach the final output, it’s necessary to group our keywords together to calculate the total opportunity for each page.

Here, we’re calculating how many keywords are within striking distance for each page, along with the combined search volume.

# groups the URLs (remove the dupes and combines stats)
# make a copy of the keywords dataframe for grouping - this ensures stats can be merged back in later from the OG df
df_keywords_group = df_keywords.copy()
df_keywords_group["KWs in Striking Dist."] = 1  # used to count the number of keywords in striking distance
df_keywords_group = (
    df_keywords_group.groupby("URL")
    .agg({"Volume": "sum", "KWs in Striking Dist.": "count"})
    .reset_index()
)
df_keywords_group.head()

Once complete, you’ll see a preview of the DataFrame.
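For a feel of what the grouping produces, here is the same aggregation on a three-row toy DataFrame. The URLs, keywords, and volumes are invented for the example.

```python
import pandas as pd

# Invented keyword data: two keywords for /shoes, one for /hats.
df_keywords = pd.DataFrame({
    "URL": ["/shoes", "/shoes", "/hats"],
    "Keyword": ["red shoes", "blue shoes", "sun hats"],
    "Volume": [500, 300, 200],
    "Position": [5, 12, 8],
})

df_group = df_keywords.copy()
df_group["KWs in Striking Dist."] = 1  # helper column that is counted per URL
df_group = (
    df_group.groupby("URL")
    .agg({"Volume": "sum", "KWs in Striking Dist.": "count"})
    .reset_index()
)
print(df_group)
```

Each URL ends up on a single row with its combined search volume and the number of keywords within striking distance.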

Display Keywords In Adjacent Rows

We use the grouped data as the basis for the final output. We use Pandas.unstack to reshape the DataFrame to display the keywords in the style of a GrepWords export.
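The reshaping step is easier to follow on a toy example. The sketch below uses invented keywords and applies the same `groupby`, `reset_index`, and `unstack` chain as the script: each URL's keywords are numbered from 0, and `unstack` turns those numbers into columns.

```python
import pandas as pd

# Invented data: /shoes has two keywords, /hats has one.
df_keywords = pd.DataFrame({
    "URL": ["/shoes", "/shoes", "/hats"],
    "Keyword": ["red shoes", "blue shoes", "sun hats"],
})

wide = (
    df_keywords.groupby("URL")["Keyword"]
    .apply(lambda x: x.reset_index(drop=True))  # number each URL's keywords 0, 1, 2, ...
    .unstack()                                  # move those numbers into columns
    .reset_index()
)
print(wide)
```

URLs with fewer keywords than the widest row get NaN in the unused columns, which is why the script later replaces NaN values with empty strings.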

# create a new df, combine the merged data with the original data. display in adjacent rows ala grepwords
df_merged_all_kws = df_keywords_group.merge(
    df_keywords.groupby("URL")["Keyword"]
    .apply(lambda x: x.reset_index(drop=True))
    .unstack()
    .reset_index()
)

# sort by biggest opportunity
df_merged_all_kws = df_merged_all_kws.sort_values(
    by="KWs in Striking Dist.", ascending=False
)

# reindex the columns to keep just the top five keywords
cols = "URL", "Volume", "KWs in Striking Dist.", 0, 1, 2, 3, 4
df_merged_all_kws = df_merged_all_kws.reindex(columns=cols)

# create union and rename the columns
df_striking: Union[Series, DataFrame, None] = df_merged_all_kws.rename(
    columns={
        "Volume": "Striking Dist. Vol",
        0: "KW1",
        1: "KW2",
        2: "KW3",
        3: "KW4",
        4: "KW5",
    }
)

# merges striking distance df with crawl df to merge in the title, h1 and category description
df_striking = pd.merge(df_striking, df_crawl, on="URL", how="inner")

Set The Final Column Order And Insert Placeholder Columns

Lastly, we set the final column order and merge in the original keyword data.

There are a lot of columns to sort and create!

# set the final column order and merge the keyword data in
cols = [
    "URL", "Title", "H1", "Copy",
    "Striking Dist. Vol", "KWs in Striking Dist.",
    "KW1", "KW1 Vol", "KW1 in Title", "KW1 in H1", "KW1 in Copy",
    "KW2", "KW2 Vol", "KW2 in Title", "KW2 in H1", "KW2 in Copy",
    "KW3", "KW3 Vol", "KW3 in Title", "KW3 in H1", "KW3 in Copy",
    "KW4", "KW4 Vol", "KW4 in Title", "KW4 in H1", "KW4 in Copy",
    "KW5", "KW5 Vol", "KW5 in Title", "KW5 in H1", "KW5 in Copy",
]

# re-index the columns to place them in a logical order + inserts new blank columns for kw checks.
df_striking = df_striking.reindex(columns=cols)

Merge In The Keyword Data For Each Column

This code merges the keyword volume data back into the DataFrame. It’s more or less the equivalent of an Excel VLOOKUP function.

# merge in keyword data for each keyword column (KW1 - KW5)
df_striking = pd.merge(df_striking, df_keyword_vol, left_on="KW1", right_on="Keyword", how="left")
df_striking['KW1 Vol'] = df_striking['Volume']
df_striking.drop(['Keyword', 'Volume'], axis=1, inplace=True)
df_striking = pd.merge(df_striking, df_keyword_vol, left_on="KW2", right_on="Keyword", how="left")
df_striking['KW2 Vol'] = df_striking['Volume']
df_striking.drop(['Keyword', 'Volume'], axis=1, inplace=True)
df_striking = pd.merge(df_striking, df_keyword_vol, left_on="KW3", right_on="Keyword", how="left")
df_striking['KW3 Vol'] = df_striking['Volume']
df_striking.drop(['Keyword', 'Volume'], axis=1, inplace=True)
df_striking = pd.merge(df_striking, df_keyword_vol, left_on="KW4", right_on="Keyword", how="left")
df_striking['KW4 Vol'] = df_striking['Volume']
df_striking.drop(['Keyword', 'Volume'], axis=1, inplace=True)
df_striking = pd.merge(df_striking, df_keyword_vol, left_on="KW5", right_on="Keyword", how="left")
df_striking['KW5 Vol'] = df_striking['Volume']
df_striking.drop(['Keyword', 'Volume'], axis=1, inplace=True)

Clean The Data Some More

The data requires additional cleaning to populate empty values, (NaNs), as empty strings. This improves the readability of the final output by creating blank cells, instead of cells populated with NaN string values.

Next, we convert the columns to lowercase so that they match when checking whether a target keyword is featured in a specific column.

# replace nan values with empty strings
df_striking = df_striking.fillna("")

# drop the title, h1 and category description to lower case so kws can be matched to them
df_striking["Title"] = df_striking["Title"].str.lower()
df_striking["H1"] = df_striking["H1"].str.lower()
df_striking["Copy"] = df_striking["Copy"].str.lower()

Check Whether The Keyword Appears In The Title/H1/Copy and Return True Or False

This code checks if the target keyword is found in the page title/H1 or copy.

It’ll flag true or false depending on whether a keyword was found within the on-page elements.
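To make the behaviour of this check concrete, here is a three-row toy example with made-up keywords and titles. It uses the same row-wise `in` test as the script, which is a plain substring match.

```python
import pandas as pd

# Made-up rows: an exact phrase match, a near miss, and a row with no keyword.
df = pd.DataFrame({
    "KW1": ["red shoes", "sun hats", ""],
    "Title": ["buy red shoes online", "summer hats", "about us"],
})

# True only when the whole keyword phrase appears inside the title text.
df["KW1 in Title"] = df.apply(lambda row: row["KW1"] in row["Title"], axis=1)
print(df)
```

Two things are worth noticing: "sun hats" is not found in "summer hats" because the exact phrase must appear, and an empty keyword always matches, which is why the script blanks the flags for rows without a keyword in the next step.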

df_striking["KW1 in Title"] = df_striking.apply(lambda row: row["KW1"] in row["Title"], axis=1) df_striking["KW1 in H1"] = df_striking.apply(lambda row: row["KW1"] in row["H1"], axis=1) df_striking["KW1 in Copy"] = df_striking.apply(lambda row: row["KW1"] in row["Copy"], axis=1) df_striking["KW2 in Title"] = df_striking.apply(lambda row: row["KW2"] in row["Title"], axis=1) df_striking["KW2 in H1"] = df_striking.apply(lambda row: row["KW2"] in row["H1"], axis=1) df_striking["KW2 in Copy"] = df_striking.apply(lambda row: row["KW2"] in row["Copy"], axis=1) df_striking["KW3 in Title"] = df_striking.apply(lambda row: row["KW3"] in row["Title"], axis=1) df_striking["KW3 in H1"] = df_striking.apply(lambda row: row["KW3"] in row["H1"], axis=1) df_striking["KW3 in Copy"] = df_striking.apply(lambda row: row["KW3"] in row["Copy"], axis=1) df_striking["KW4 in Title"] = df_striking.apply(lambda row: row["KW4"] in row["Title"], axis=1) df_striking["KW4 in H1"] = df_striking.apply(lambda row: row["KW4"] in row["H1"], axis=1) df_striking["KW4 in Copy"] = df_striking.apply(lambda row: row["KW4"] in row["Copy"], axis=1) df_striking["KW5 in Title"] = df_striking.apply(lambda row: row["KW5"] in row["Title"], axis=1) df_striking["KW5 in H1"] = df_striking.apply(lambda row: row["KW5"] in row["H1"], axis=1) df_striking["KW5 in Copy"] = df_striking.apply(lambda row: row["KW5"] in row["Copy"], axis=1) Delete True/False Values If There Is No Keyword

This will delete true/false values when there is no keyword adjacent.

# delete true / false values if there is no keyword
df_striking.loc[df_striking["KW1"] == "", ["KW1 in Title", "KW1 in H1", "KW1 in Copy"]] = ""
df_striking.loc[df_striking["KW2"] == "", ["KW2 in Title", "KW2 in H1", "KW2 in Copy"]] = ""
df_striking.loc[df_striking["KW3"] == "", ["KW3 in Title", "KW3 in H1", "KW3 in Copy"]] = ""
df_striking.loc[df_striking["KW4"] == "", ["KW4 in Title", "KW4 in H1", "KW4 in Copy"]] = ""
df_striking.loc[df_striking["KW5"] == "", ["KW5 in Title", "KW5 in H1", "KW5 in Copy"]] = ""
df_striking.head()

Drop Rows If All Values == True

This configurable option is really useful for reducing the amount of QA time required for the final output by dropping the keyword opportunity from the final output if it is found in all three columns.
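To make the rule concrete, here is a minimal sketch on made-up data for a single keyword. It assumes the same flag column names used in this article; the URLs and values are invented for illustration, and the boolean-mask approach is just one way to express the filter.

```python
import pandas as pd

# Made-up sample data: the keyword is already present everywhere on /a only.
df = pd.DataFrame(
    {
        "URL": ["/a", "/b", "/c"],
        "KW1 in Title": [True, True, False],
        "KW1 in H1": [True, False, False],
        "KW1 in Copy": [True, True, False],
    }
)

flag_cols = ["KW1 in Title", "KW1 in H1", "KW1 in Copy"]

# True only for rows where the keyword was found in all three places.
all_true = df[flag_cols].eq(True).all(axis=1)

# Keep every row that still needs optimization work.
remaining = df[~all_true]
print(remaining["URL"].tolist())  # ['/b', '/c']
```

Only /a is dropped, because it is the only row where the keyword already appears in the title, the H1 and the copy.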

def true_dropper(col1, col2, col3):
    drop = df_striking.drop(
        df_striking[
            (df_striking[col1] == True)
            & (df_striking[col2] == True)
            & (df_striking[col3] == True)
        ].index
    )
    return drop

if drop_all_true == True:
    df_striking = true_dropper("KW1 in Title", "KW1 in H1", "KW1 in Copy")
    df_striking = true_dropper("KW2 in Title", "KW2 in H1", "KW2 in Copy")
    df_striking = true_dropper("KW3 in Title", "KW3 in H1", "KW3 in Copy")
    df_striking = true_dropper("KW4 in Title", "KW4 in H1", "KW4 in Copy")
    df_striking = true_dropper("KW5 in Title", "KW5 in H1", "KW5 in Copy")

Download The CSV File

The last step is to download the CSV file and start the optimization process.

df_striking.to_csv('Keywords in Striking Distance.csv', index=False)
files.download("Keywords in Striking Distance.csv")

Conclusion

If you are looking for quick wins for any website, the striking distance report is a really easy way to find them.

Don’t let the number of steps fool you. It’s not as complex as it seems. It’s as simple as uploading a crawl and keyword export to the supplied Google Colab sheet or using the Streamlit app.

The results are definitely worth it!


Featured Image: aurielaki/Shutterstock

Install Odoo 15 Using Docker, Nginx On Ubuntu 22.04

Install Odoo 15 using Docker Compose, Nginx and SSL on Ubuntu 22.04. In this tutorial you will learn how to install and set up Odoo using Docker and Docker Compose, configure Nginx and a Let's Encrypt SSL certificate, and install PostgreSQL. Installing Odoo with Docker Compose is easier than installing it manually.

Odoo is self-hosted business management software with a top-notch user experience. The applications within Odoo are tightly integrated with each other, allowing you to automate your business processes easily.


Install Docker on Ubuntu 22.04

Install Docker Compose on Ubuntu 22.04.

Please make sure you have completed all the steps mentioned above:

Domain pointed to your server IP address

Docker installed and configured

Docker Compose installed and configured

Step 1: Create a project directory

SSH to your server and start by creating a new project directory named odoo-project. You can give it any other name you like.



Step 2: Create Docker Compose YAML file

Now navigate into the project directory and create a new docker-compose.yml file with the following configuration.



nano docker-compose.yml

Paste the following configuration.

version: '3.9'
services:
  odoo:
    container_name: odoo
    image: odoo:15.0
    volumes:
      - ./addons-extra:/mnt/extra-addons
      - ./etc/odoo:/etc/odoo
      - odoo-web-data:/var/lib/odoo
    ports:
      - "8069:8069"
    depends_on:
      - postgres
  postgres:
    image: postgres:14
    environment:
      - POSTGRES_DB=postgres
      - POSTGRES_PASSWORD=
      - PGDATA=/var/lib/postgresql/data/pgdata
    volumes:
      - odoo-db-data:/var/lib/postgresql/data/pgdata
  nginx:
    container_name: nginx
    image: nginx:latest
    restart: unless-stopped
    ports:
      - 80:80
      - 443:443
    volumes:
      - ./nginx/conf:/etc/nginx/conf.d
      - ./certbot/conf:/etc/nginx/ssl
      - ./certbot/data:/var/www/html
  certbot:
    container_name: certbot
    image: certbot/certbot:latest
    volumes:
      - ./certbot/conf:/etc/letsencrypt
      - ./certbot/logs:/var/log/letsencrypt
      - ./certbot/data:/var/www/html
volumes:
  odoo-web-data:
  odoo-db-data:

Hit CTRL + X followed by Y and Enter to save the file and exit.

Here are the configuration details.

version: the Compose file format version, which must be compatible with your Docker Engine.

services: here we have 4 services named odoo, postgres, nginx and certbot.

image: we use the Odoo 15, Postgres 14, Nginx and Certbot images available on Docker Hub.


nginx/conf: here we will place the Nginx configuration file to be synced with the default Nginx conf.d folder inside the container.

etc/odoo: here we will place the Odoo 15 database configuration.

certbot/conf: this is where we will receive the SSL certificate; it will be synced with the folder of our choice inside the container.

ports: configure the container to listen upon the listed ports.

command: the command used to receive the SSL certificate.

Step 3: Create Odoo Configuration

Now you can place a custom Odoo configuration inside the directory mentioned in the yml file under the Odoo service.

mkdir -p etc/odoo

Create a new file named odoo.conf

nano etc/odoo/odoo.conf

Fill in the values below to match your PostgreSQL settings.

[options]
; This is the password that allows database operations:
; admin_passwd = admin
db_host =
db_user =
db_password =


Here we use the PostgreSQL service name from docker-compose.yml (postgres) as the hostname.
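Before starting the containers, it can help to confirm that none of the database settings were left empty. The following is a minimal sketch using Python's standard configparser module; the helper name missing_db_settings and the sample values (odoo as the user, an empty password) are assumptions made up for this illustration, not values from this guide.

```python
import configparser

# The three database keys used in odoo.conf in this guide.
REQUIRED_KEYS = ("db_host", "db_user", "db_password")


def missing_db_settings(conf_text: str) -> list:
    """Return the required keys that are absent or empty in [options]."""
    # interpolation=None so a password containing "%" is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    options = parser["options"]
    return [key for key in REQUIRED_KEYS if not options.get(key, "").strip()]


# Sample content with placeholder values; replace them with your own.
sample = """
[options]
; admin_passwd = admin
db_host = postgres
db_user = odoo
db_password =
"""

print(missing_db_settings(sample))  # ['db_password']
```

To check the real file, read etc/odoo/odoo.conf into a string (for example with pathlib.Path.read_text) and pass it to the same function.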

Step 4: Configure Nginx

Now you can create the default configuration file inside the directory mentioned in the yml file under the Nginx service.

mkdir -p nginx/conf

Create a new file named default.conf

nano nginx/conf/default.conf

Paste the following configuration and replace the appropriate values with your domain name.

server {
    listen [::]:80;
    listen 80;

    location ~ /.well-known/acme-challenge {
        allow all;
        root /var/www/html;
    }

    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Host $host;
    }

    location ~* /web/static/ {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Host $host;
    }
}

Step 5: Deploy Odoo with Docker

Now start the containers using the following command. You will receive the SSL certificates once the containers are started.

docker-compose up -d

Once all containers are started, you will see that additional directories for SSL have been created alongside your docker-compose.yml file.

The directory certbot holds all the files related to your SSL certificates.
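If you want to confirm the layout without browsing the folders by hand, a short script can list which of the host directories referenced in docker-compose.yml are still missing. This is a minimal sketch; the helper name missing_dirs is hypothetical, and the directory list is taken from the volume mappings shown earlier in this guide.

```python
from pathlib import Path

# Host directories referenced by the volume mappings in docker-compose.yml.
EXPECTED_DIRS = (
    "etc/odoo",
    "nginx/conf",
    "certbot/conf",
    "certbot/logs",
    "certbot/data",
)


def missing_dirs(project_root: str) -> list:
    """Return the expected directories that do not exist under project_root."""
    root = Path(project_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Run from inside the project directory (odoo-project in this guide).
    print(missing_dirs("."))
```

An empty list means every expected directory is in place.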

To view the containers you can execute the following command.

docker-compose ps

Step 6: Configure SSL for Odoo in Docker

Now that you have received the Let’s Encrypt SSL certificate, you can configure HTTPS and set up redirection to HTTPS.

Edit the default.conf and make the following changes.

sudo nano nginx/conf/default.conf

server {
    listen [::]:80;
    listen 80;
}

server {
}

server {
    location ~ /.well-known/acme-challenge {
        allow all;
        root /var/www/html;
    }

    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Host $host;
    }

    location ~* /web/static/ {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Host $host;
    }
}

Now restart the Nginx service to load the new configurations.

docker-compose restart nginx

Step 7: Setup Odoo

Now you can visit your domain name in your web browser. You will see a page where you can create the database and admin user for your Odoo installation.
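If you would rather check the HTTP to HTTPS redirection from a script than from the browser, the sketch below sends a plain-HTTP HEAD request and inspects the response. It uses only Python's standard http.client module; the function names are hypothetical, example.com is a placeholder for your own domain, and treating 301, 302, 307 and 308 as redirects is an assumption of this sketch.

```python
import http.client


def is_https_redirect(status: int, location) -> bool:
    """True when a response redirects to an https:// URL."""
    return (
        status in (301, 302, 307, 308)
        and bool(location)
        and location.startswith("https://")
    )


def check_redirect(domain: str, timeout: float = 5.0) -> bool:
    """Send a plain-HTTP HEAD request to the domain and inspect the reply."""
    conn = http.client.HTTPConnection(domain, 80, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        response = conn.getresponse()
        return is_https_redirect(response.status, response.getheader("Location"))
    finally:
        conn.close()


# Example (requires network access; example.com is a placeholder domain):
# print(check_redirect("example.com"))
```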


Now you have learned how to install Odoo 15 on Ubuntu 22.04 with Docker Compose and Nginx, and how to secure it with Let’s Encrypt.

How To Backup Apps And Data Without Root Using Helium Android App

The most basic reason to root your device is to be able to use Titanium Backup or a similar app to back up apps together with their app-data, meaning app settings, game progress, etc. Android doesn’t provide a solution for backing up app-data, and neither do any of the OEMs.

But, thankfully, there is an app for this, which backs up apps and games for you without requiring root access at all. You only need a PC to set it up.

The app we’re talking about is Helium, made by Koush, who’s famous in the Android blogosphere for his ClockworkMod recovery and other work.


Helium can back up your apps, offering you a choice to back up data only, or .APK files too. Helium can create backups on your internal and external SD card or, after you connect it to the service, online to your Dropbox, Box and Google Drive.

It gets immensely helpful when you are looking to sync your apps (games, actually, are the fun here) across two or more devices. Just read the section below for this, titled: HOW TO SYNC BACKUP OF APPS USING HELIUM ON ANDROID DEVICES.


Helium is not compatible with Motorola devices, and there are issues with some Sony devices too, mainly the Xperia S and Xperia Z. As the Helium developer says, Sony has disabled backup on these devices. Here’s the quote from the developer:

If you find that Helium *does* work on your Motorola Android, please notify me.

Note: Some Sony devices have issues with Helium. This includes the Xperia S and Xperia Z. Sony has *disabled* backup on these devices completely.


Well, an .APK is an app installer file, like an .EXE on Windows. It simply installs the app, as if it were downloaded from the Play Store: a completely fresh install.

App-data is the app’s data on your phone. It mainly includes the app’s settings, plus other cache files. So, for games, it’s your game progress: levels cleared, items unlocked, achievements, and so on. And for apps, let’s take an alarm app for example: it would be the alarms you created, the ringtones you selected, and so on.

App-data doesn’t include files saved on your SD card. For example, if you downloaded an attachment from Gmail android app, then it’s not its data, and because it’s already saved on PC, you really don’t need to back up these files.

You get to choose whether you want Data Only, or Data plus .APK file, for backup of your Apps and Games.

As you can see in step 3 of the guide ‘How to Backup Apps using Helium Android App’ below, you can select and deselect the option App Data Only (smaller backups) after selecting the apps for backup, to tell the app whether .apk files are to be backed up along with the data of the apps.



On Android 2.3 and below: Go to your device’s Settings » Applications » Development and select the ‘USB debugging’ checkbox.

On Android 4.0 and above, do this:

Enable developer options first: Go to your Settings » About device » scroll to the bottom and tap on “Build number” seven times to enable developer options

Enable USB Debugging: Open Settings » select Developer options » Tick the “USB debugging” checkbox (under Debugging section)


Without appropriate drivers installed and working, it’s impossible for this beauty called Helium to work. So, head over to this page for drivers and install the driver for your device. We’ve got some neat guides to help you out, btw. See the below:


Connect your Android device to the PC. It should show up in My Computer if the drivers are correctly installed. This isn’t a foolproof way to check, but it’s good enough.

If it’s not there, it means the drivers were not installed properly.


Go to the app’s Play Store listing on either your PC or your phone and install the app.



This is the important part. It involves establishing a connection between the Helium Android app and its PC counterpart. Once this is done, Application Backup gets enabled on your device, and you can start backing up apps.

Open Helium on the PC. (It might already be running in the background, as it’s set to auto-start with Windows when you power on the PC.) Now, move to the next step.

└ In fact, the best way there currently is this! (Without root, of course, otherwise Titanium backup and other apps are cool, too.)

— But as we’re focusing on non-root backup solution here, let’s see further in case of non-root devices. See next step as you’re probably not rooted.

Once you get the above on Helium desktop, you can close the PC software and disconnect the device too. An active connection with the PC isn’t required for backup and restore. You will need to connect to the Helium desktop software again only after you power off your Android device, not otherwise.


Now, the easy part, which was worth all this. Taking backup of apps with their data. Open the Helium app on your Android device.

Select the Apps you want to backup.

You can swipe up from below (try from that blue line) to see the options as regards backup. Now, let’s discuss backing up with its .apk file and without .apk file (app-data only):

Data Backup Only: If you want to back up only the data of the apps, then swipe up and keep the checkbox of the option App Data Only (smaller backups) selected, which it is by default. In this case, Helium will ask you to download the app from the Play Store and after that it will restore the data, so that the app/game will be exactly as it was at the time of backup. It’s especially good for online backups, as the size is small compared to full backups.

Full Backup: If you want to back up both the .apk file and the data of apps, then deselect the checkbox of the option “App Data Only (smaller backups)”. In this case, both the app and the app’s data will be restored without the need to download from the Play Store. If you have enough space on your device’s internal/external memory, then choose this option, as it provides a full backup and restore. Evidently, it’s not good for online backups and restores, as the backup size is huge.

After you’ve selected apps for the backup, tap on Backup to start backing up the data of the apps, or the .APK installation files too, depending on whether you selected or deselected the checkbox discussed above.

The app will ask for storage to be used. Choose one.

A black screen will appear with space to enter password. Ignore it. It will be gone anyway in a matter of seconds. And backup progress will show up.

That’s it.


Well, backups are saved in the Carbon folder on the storage you chose: internal, external, or any of the supported online storage services.

Now, one cool and very helpful tip: regarding syncing the data from one device to another (or more) and vice versa. Read below.



It is fun!

But you will need the paid version of the app, Helium (Premium), which costs $4.99 and is well worth it, to be able to restore from online storage. And of course, once bought, it can be used on any number of Android devices.

So, let’s say you are playing Wind-up Knight (oh, that game is PURE fun, and a great challenge, even for seasoned PC gamers!) on your Nexus 7 at home, and want to continue on your Nexus 4 at the office, school or wherever. And vice versa.

You can do so without much fuss, and without root access if you prefer, using the one and only Helium.

So, let’s take the above game and devices as our example for the guide below, and let’s call Nexus 7 as our first device and Nexus 4 as our second device. And the concerned app/game is Wind-up Knight.

Make sure you’ve enabled application backups — as discussed at length above — on both of the devices (Nexus 7 and Nexus 4).

Sign into any of the online storage facility available — Google Drive, Dropbox and Box are supported, fyi — on both or all of your devices (both Nexus 7 and Nexus 4) you’re looking to sync data of and with.

└ Important tip: Keep track of which device has the newest data, so that you don’t overwrite the newest data with old data. For example: if you last played a game on the Nexus 7, then take the backup on the first device (Nexus 7) and restore it on your second device (Nexus 4). And when you’re done playing on the Nexus 4, back up from the Nexus 4 and restore on the Nexus 7 or any other device of yours.

Select your online storage (either of Google Drive, Dropbox or Box) as the backup destination on your first device (Nexus 7). Backup will be saved here.

Now, on your second device (Nexus 4), where you want to restore the data, or sync the data to, make sure you have the Helium android app installed and have logged into online storage you chose in step 4 above.

Make sure you have got the app or the game installed on this second device (Nexus 4) of yours, so that Helium can restore its data.

Open Helium on second device (Nexus 4), swipe right to left to go to Restore option and select the online storage (that you chose in step 4).

Your backed-up apps and games will show up. Select all the apps and games (Wind-up Knight in our example) whose data you want to restore on your second device (Nexus 4).

Tap on Restore. This will require paid version of the app, as I mentioned above. And.. that’s it. You just synced data across two Android devices, without Root. You’re a genius! I have already continued playing the game on Nexus 4, btw. And the game’s total fun, did I tell you that?

Let us know what you think of it.



You can group apps in Helium, so that if you want to back up only some specific apps, you won’t have to select them every time.

Open the app and select the apps to be grouped.

Now, swipe up from blue line.

Write Group name under the option: Remember Group of Apps.

Tap on Backup.

You will get to the Storage selection screen, and your group will have been created silently in the background. If you really wish to back the apps up right now, tap on the storage you prefer; otherwise just go back using the back key.

See the group you just created appearing at the top under SAVED GROUPS.



Well it’s easy. Very easy. Here’s how.

Open your Helium Android app.

Swipe right to left to go to Restore and Sync tab.

└ Swipe up from blue line to get Select All and Deselect All options.

Tap on Restore to begin restoration of apps and games.

A black screen will appear with space to enter password. Ignore it. It will be gone anyway in a matter of seconds. And restore progress will show up.

That’s it.



Well it’s easy. Very easy. Here’s how.

Open your Helium Android app.

Swipe right to left to go to Restore and Sync tab.

Select the apps and games you want to delete.

└ Swipe up from blue line to get Select All and Deselect All options.

Tap on the bin icon (represents delete option) in the right top corner.

Tap on OK to confirm deletion of the selected apps and games. You can’t get deleted apps back; the deletion is permanent.

That’s it.

Small Tips!

→ If you took the backup on either the internal or the external SD card, then keep a copy of it on your PC for safety. Connect the phone to the PC with a USB cable, then copy the Carbon folder to your PC.

→ You can also set the Helium’s backup folder, Carbon, for sync with PC so that all backups are copied to PC automatically. Use an app like Cheetah for this.

→ If you don’t want to buy paid version for restoring from online storage, you can try this: download the files to your phone from the online storage and then copy paste the backup files to your Carbon folder on internal/external SD card, and then restore from here.
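The first tip above (keeping a safety copy of the Carbon folder on your PC) can be scripted once the phone's storage is reachable as a normal folder on the PC. This is a minimal sketch using Python's standard library; the function name copy_carbon_backup and both paths are hypothetical, and where the Carbon folder appears on your PC depends on how your device mounts.

```python
import shutil
from datetime import date
from pathlib import Path


def copy_carbon_backup(carbon_dir: str, pc_backup_root: str) -> Path:
    """Copy the Carbon folder into a dated subfolder and return its path."""
    source = Path(carbon_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"Carbon folder not found: {source}")
    target = Path(pc_backup_root) / f"Carbon-{date.today().isoformat()}"
    # dirs_exist_ok (Python 3.8+) lets a second run on the same day
    # refresh the existing copy instead of failing.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target


# Example with placeholder paths; adjust both to your own setup:
# copy_carbon_backup("E:/Carbon", "C:/PhoneBackups")
```

Keeping one dated folder per day makes it easy to tell which copy is the newest when you later restore.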


Your feedback, and suggestions and corrections if any, for this article are welcomed!
