Does Google Use Sentiment Analysis To Rank Web Pages?
(Updated February 2024)


Many SEOs believe that the sentiment of a web page can influence whether Google ranks a page. If all the pages ranked in the search engine results pages (SERPs) have a positive sentiment, they believe that your page will not be able to rank if it contains negative sentiments.

The evidence and facts are out there to show where Google’s research has been focusing in terms of sentiment analysis.

I asked Bill Slawski (@bill_slawski), an expert in Google-related patents, what he thought about the SEO theory that Google uses sentiment analysis to rank web pages.

“Sentiment is like a flavor, like vanilla or chocolate. It does not reflect the potential information gain that an article might bring.

Information gain can be understood by using NLP processing to extract entities and knowledge about them, and that can lead to a determination of information gain.

Sentiment is a value that doesn’t necessarily reflect how much information an article might bring to a topic.

Positive or negative sentiment is not a reflection of how much knowledge is present and added to a topic.”
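The contrast Bill draws between sentiment and information gain can be illustrated with a toy sketch. This is purely illustrative and not Google's implementation: treat each article as a set of extracted entities, and score it by how many entities are new relative to what has already been seen on the topic.

```python
# Toy illustration (not Google's actual method): the "information gain" of a
# new article is the number of entities it mentions that the already-seen
# articles on the topic do not.

def information_gain(new_entities, seen_entities):
    """Count entities in the new article not covered by prior articles."""
    return len(set(new_entities) - set(seen_entities))

seen = {"reptile", "gecko", "terrarium"}
article_a = {"reptile", "gecko", "heat lamp", "UVB lighting"}  # adds 2 new entities
article_b = {"reptile", "gecko"}                               # adds nothing new

print(information_gain(article_a, seen))  # 2
print(information_gain(article_b, seen))  # 0
```

Under this framing, an article can be glowing or scathing in tone and still contribute zero new knowledge, which is exactly Bill's point.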

Bill affirmed that Google tends to show a range of opinions for review related queries.

“I don’t believe that Google would favor one sentiment over another. That smells of showing potential bias on a topic.

I would expect Google to want some amount of diversity when it comes to sentiment, so if they were considering ranking based upon it, they would not show all negative or positive.”

Bill makes an excellent point about the lack of usefulness if Google search results introduced a sentiment bias.

Some SEOs believe that if all the search results have a positive sentiment, then that’s a reflection of what searchers are looking for. That’s a naive correlation.

There are many known ranking factors such as links that can account for those rankings. There are other factors such as users wanting to see specific sites for specific queries.

Simply isolating one factor and saying, “Aha, all the sites have this, so this is why it’s ranking” is naive; it’s cherry-picking what you want to see.

For example, the same SEO can look at those search results and see that they all use the same brand of SEO plugin. Does that mean the SEO plugin is the reason those sites rank?

The answer is no.

Similarly, the sentiment expressed in the search results does not necessarily reflect what the searcher is looking for.

This is why I say it is naive to look at one factor such as sentiment and say that’s the reason a site is ranking. Just because you see a correlation does not mean it’s the reason a site is ranking.

Does Google Use Sentiment Analysis for Ranking?

Google’s been largely silent on sentiment analysis since 2023.

In July 2023, someone on Twitter asked:

“…it seems like your search algorithm recognizes and takes into account sentiment. Is there a sentiment search operator?”

Danny Sullivan answered:

“It does not recognize sentiment. So, no operator for that.”

Danny made it clear that Google’s search algorithm does not recognize sentiment.

Earlier that year Danny published an official Google announcement about featured snippets where he mentioned sentiment. But the context of sentiment was that for some queries there may be a diversity of opinions and because of that Google might show two featured snippets, one positive and one negative.

“…people who search for “are reptiles good pets” should get the same featured snippet as “are reptiles bad pets” since they are seeking the same information: how do reptiles rate as pets? However, the featured snippets we serve contradict each other.

A page arguing that reptiles are good pets seems the best match for people who search about them being good. Similarly, a page arguing that reptiles are bad pets seems the best match for people who search about them being bad. We’re exploring solutions to this challenge, including showing multiple responses.”

The point of the above section is that they are exploring showing multiple responses.

Since 2023, Google has stopped showing featured snippets for vague queries like “are reptiles good pets?” and instead encourages users to drill down and choose a more specific reptile.


Those statements directly contradict the SEO idea that if the sentiment in the SERPs leans in one direction, your site needs to lean in the same direction to rank.

Rather, Google is asserting that they want to show diversity in opinions.

Positives and Negatives in Reviews

A Google research paper titled, Structured Models for Fine-to-Coarse Sentiment Analysis (PDF 2007) states that a “question answering system” would require sentiment analysis at a paragraph level.

A system that summarizes reviews would need to understand the positive or negative opinion at the sentence or phrase level.

This is sometimes referred to as opinion mining. The point of this kind of analysis is to understand the opinion.

Here’s how the research paper explains the importance of sentiment analysis:

“The ability to classify sentiment on multiple levels is important since different applications have different needs. For example, a summarization system for product reviews might require polarity classification at the sentence or phrase level; a question answering system would most likely require the sentiment of paragraphs; and a system that determines which articles from an online news source are editorial in nature would require a document level analysis.”

The paper further describes how sentiment analysis is useful:

“…One interesting work on sentiment analysis is that of Popescu and Etzioni (2005) which attempts to classify the sentiment of phrases with respect to possible product features.”

What stands out about that research is that it is strictly about understanding the sentiment of text.

There is no context for using it to show search results that are biased toward the sentiment in a user’s search query.

The context is not about ranking text according to the sentiment.

Yet even though the context is not about ranking because of the sentiment, some SEOs will quote this kind of research and then tack on that it’s being used for ranking. And that’s wrong, because the context of this and other research papers is consistently about understanding text, well outside the context of ranking that text.

Sentiment Analysis Encompasses More than Positive and Negative

Another research paper, What’s Great and What’s Not: Learning to Classify the Scope of Negation for Improved Sentiment Analysis (PDF 2010) presents a way to understand the sentiment of product reviews.

The scope of the research is finding a better way to deal with ambiguity in the way ideas are expressed.

Examples of these kinds of linguistic negation phrases are:

“Given the poor reputation of the manufacturer, I expected to be disappointed with the device. This was not the case.”

“Do not neglect to order their delicious garlic bread.”

“Why couldn’t they include a decent speaker in this phone?”

The above examples show how this research paper is focused on understanding what humans mean when they structure their speech in a certain way. This is an example of how sentiment analysis is about more than just positive and negative sentiment.

It’s really about the meaning of words, phrases, paragraphs and documents.

The paper begins by stating the usefulness of sentiment analysis in several scenarios, including question answering:

“The automatic detection of the scope of linguistic negation is a problem encountered in wide variety of document understanding tasks, including but not limited to medical data mining, general fact or relation extraction, question answering, and sentiment analysis.”

How would accurately classifying these kinds of sentences help a search engine in question answering?

A search engine cannot accurately answer a question without understanding the web pages it wants to rank.

It’s not about using that data as ranking factors. It’s about using that data to understand the pages so that they can then be ranked according to ranking criteria.

One way of looking at sentiment analysis is to think of it as obtaining candidate web pages for ranking. A search engine cannot select a candidate if it cannot understand the web page.

Once a search engine can understand a web page, it can then apply the ranking criteria on the pages that are likely to answer the question.

This is especially important for search queries that are ambiguous because of things like linguistic negation, as described in the research paper above.

If sentiment analysis is used by Google, a web page isn’t ranked because of the sentiment analysis. Sentiment analysis helps a web page be understood so that it can be ranked.

Google can’t rank what it can’t understand. Google can’t answer a question that it can’t understand.
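The two-stage view described above can be sketched as a hypothetical pipeline. The page data, scores, and filtering rule here are invented for the example and are not Google's actual pipeline: first filter candidates by whether the page can be understood as covering the query at all, then rank the survivors by separate criteria.

```python
# Hypothetical two-stage sketch: stage 1 selects candidate pages that can be
# understood as covering the query; stage 2 ranks them by independent
# criteria (a made-up "link_score" here). Illustrative only.

def select_candidates(pages, query_terms):
    """Stage 1: keep only pages whose text covers all the query terms."""
    return [p for p in pages if query_terms <= set(p["text"].lower().split())]

def rank(candidates):
    """Stage 2: order candidates by a separate score, not by sentiment."""
    return sorted(candidates, key=lambda p: p["link_score"], reverse=True)

pages = [
    {"url": "a.example", "text": "geckos are reptiles kept as pets", "link_score": 10},
    {"url": "b.example", "text": "stock market update", "link_score": 99},
    {"url": "c.example", "text": "are geckos good pets for beginners", "link_score": 42},
]
results = rank(select_candidates(pages, {"geckos", "pets"}))
print([p["url"] for p in results])  # ['c.example', 'a.example']
```

Note that the highest-scored page overall (b.example) never ranks, because it was never understood as a candidate, which is the distinction the passage above is making.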

More Sentiment Analysis Research

SUIT: A Supervised User-Item Based Topic Model for Sentiment Analysis (PDF 2014)

This research paper studies how to better understand what users mean when they leave online reviews on websites, forums, microblogs and so on.

This is how it describes the problem being solved:

“…most of existing topic methods only model the sentiment text, but do not consider the user, who expresses the sentiment, and the item, which the sentiment is expressed on. Since different users may use different sentiment expressions for different items, we argue that it is better to incorporate the user and item information into the topic model for sentiment analysis.”

Speech Sentiment Analysis via End-To-End ASR Features (PDF 2023)

ASR means Automatic Speech Recognition. This research paper is about understanding speech, and doing things like giving more weight to non-speech inflections like laughter and breathing.

The research shares examples of using breathing and laughter as weighted elements to help them understand the sentiment in the context of speech sentiment analysis, but not for ranking purposes.

One of the examples:

“That would be wonderful, that would be great seriously.”

The paper describes the context of where it is useful:

“Speech sentiment analysis is an important problem for interactive intelligence systems with broad applications in many industries, e.g., customer service, health-care, and education.

The task is to classify a speech utterance into one of a fixed set of categories, such as positive, negative or neutral.”

This research is very new, from 2023, and while not obviously specific to search, it’s indicative of the kind of research Google is doing and how it is far more sophisticated than what the average reductionist SEO sees as a simple ranking factor.

No Sentiment Analysis Bias at Google

Google has consistently stated that it tries not to show pages that merely reflect a searcher’s sentiment intent (e.g., “are geckos bad pets?”).

In fact, Google says the opposite, that it tries to show a diversity of opinions. Google tries not to be led by a sentiment expressed in the search query.

Example of Google Showing Diversity of Opinion

As that example shows, Google does not allow the negative sentiment expressed in the search query to influence it into showing a web page with a negative sentiment.

This directly contradicts the idea that Google shows search results with a specific sentiment bias if that bias exists in the search query.

You can dig around for Google research and patents about sentiment analysis and you will see that the context is about understanding search queries and web pages.

You will not see research that says the sentiment will be used to rank a page according to its bias.

If the pages that Google is ranking all have the same sentiment, do not assume that that is why those pages are there.

It is clear from Google research papers, statements from Google and from Google search results that Google does not allow the sentiment of the user search query to influence the kind of sites that Google will rank.


How To Send Web Pages To Kindle In Google Chrome

Download the official Amazon extension, Send To Kindle For Google Chrome.

The icon should now be visible on the Bookmarks bar on the top right corner of the browser.

On the Settings page, enter your Amazon account email and password.

Send to Kindle skips the preview and sends the web page right away.

When sending content, a brief pop-up box appears to confirm delivery.

You can now check on your Kindle and see it added to your library. The web page will appear just like an ebook. Kindle may need a few seconds to download the content, after which it can be viewed on the app’s archive. If the web page is not showing, do a quick sync to check for new items in the library.

It’s also important that you send content to the right Kindle device. If you have more than one Kindle, be sure to check this on the Settings page of the extension.

Web pages appear uncluttered and clean on Kindle, so it’s a better way to read lengthy articles. It also only takes a few seconds for the web content to show on the Kindle device, as long as there were no problems with delivery. With this extension, reading web content is no longer limited to desktop or laptop computers. You can read anything, anywhere with Send to Kindle for Chrome.

Kim Barloso


Twitter Sentiment Analysis Using Python

A Twitter sentiment analysis determines negative, positive, or neutral emotions within the text of a tweet using NLP and ML models. Sentiment analysis or opinion mining refers to identifying as well as classifying the sentiments that are expressed in the text source. Tweets are often useful in generating a vast amount of sentiment data upon analysis. These data are useful in understanding the opinion of people on social media for a variety of topics.

This article was published as a part of the Data Science Blogathon.

What is Twitter Sentiment Analysis?

Twitter sentiment analysis analyzes the sentiment or emotion of tweets. It uses natural language processing and machine learning algorithms to classify tweets automatically as positive, negative, or neutral based on their content. It can be done for individual tweets or a larger dataset related to a particular topic or event.
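At its simplest, this classification can be sketched with a toy word-counting lexicon. The word lists here are illustrative only, not a real sentiment lexicon like the ones the models later in this article learn from data:

```python
# Minimal lexicon-based classifier sketch: count positive vs negative words
# in a tweet. The word lists are toy examples, not a real lexicon.

POSITIVE = {"love", "great", "amazing", "good", "happy"}
NEGATIVE = {"hate", "terrible", "awful", "bad", "sad"}

def classify_tweet(tweet):
    words = tweet.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify_tweet("I love this amazing phone"))  # positive
print(classify_tweet("what a terrible day"))        # negative
print(classify_tweet("just catching the bus"))      # neutral
```

Lexicon counting breaks down on sarcasm, negation, and context, which is why the rest of this article trains machine learning models instead.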

Why is Twitter Sentiment Analysis Important?

Understanding Customer Feedback: By analyzing the sentiment of customer feedback, companies can identify areas where they need to improve their products or services.

Political Analysis: Sentiment analysis can help political campaigns understand public opinion and tailor their messaging accordingly.

Crisis Management: In the event of a crisis, sentiment analysis can help organizations monitor social media and news outlets for negative sentiment and respond appropriately.

How to Do Twitter Sentiment Analysis?

In this article, we analyze the sentiment of tweets from the Sentiment140 dataset by building a machine learning pipeline with three classifiers (Logistic Regression, Bernoulli Naive Bayes, and Linear SVM) on top of Term Frequency-Inverse Document Frequency (TF-IDF) features. The performance of these classifiers is then evaluated using accuracy and F1 scores.

For data preprocessing, we will be using the NLTK (Natural Language Toolkit) library.
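Before using the library implementation, it helps to see what TF-IDF actually computes. Below is a minimal hand-rolled sketch using the classic tf × log(N/df) weighting; note that scikit-learn's TfidfVectorizer uses a slightly different smoothed, L2-normalized formula:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return per-document {term: tf-idf} using tf * log(N / df)."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()                      # document frequency of each term
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores.append({t: (c / len(tokens)) * math.log(n / df[t])
                       for t, c in tf.items()})
    return scores

docs = ["good phone good battery", "bad phone", "good service"]
scores = tf_idf(docs)
# "phone" appears in 2 of 3 docs, so it is down-weighted relative to a
# term that is frequent inside one document, such as "good" in doc 0.
print(scores[0]["good"] > scores[0]["phone"])  # True
```

The effect is that words common across many tweets contribute less to the feature vector than words that are distinctive to a particular tweet.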

Twitter Sentiment Analysis: Problem Statement

In this project, we try to implement an NLP Twitter sentiment analysis model that helps to overcome the challenges of sentiment classification of tweets. We will be classifying the tweets into positive or negative sentiments. The necessary details regarding the dataset involving the Twitter sentiment analysis project are:

The dataset provided is the Sentiment140 Dataset which consists of 1,600,000 tweets that have been extracted using the Twitter API. The various columns present in this Twitter data are:

target: the polarity of the tweet (positive or negative)

ids: Unique id of the tweet

date: the date of the tweet

flag: It refers to the query. If no such query exists, then it is NO QUERY.

user: It refers to the name of the user that tweeted

text: It refers to the text of the tweet

Twitter Sentiment Analysis: Project Pipeline

The various steps involved in the Machine Learning Pipeline are:

Import Necessary Dependencies

Read and Load the Dataset

Exploratory Data Analysis

Data Visualization of Target Variables

Data Preprocessing

Splitting our data into Train and Test sets.

Transforming Dataset using TF-IDF Vectorizer

Function for Model Evaluation

Model Building

Model Evaluation

Let’s get started,

Step-1: Import the Necessary Dependencies

# utilities
import re
import numpy as np
import pandas as pd
# plotting
import seaborn as sns
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# nltk
from nltk.stem import WordNetLemmatizer
# sklearn
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import BernoulliNB
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import confusion_matrix, classification_report

Step-2: Read and Load the Dataset

# Importing the dataset
DATASET_COLUMNS = ['target', 'ids', 'date', 'flag', 'user', 'text']
DATASET_ENCODING = "ISO-8859-1"
df = pd.read_csv('Project_Data.csv', encoding=DATASET_ENCODING, names=DATASET_COLUMNS)
df.sample(5)


Step-3: Exploratory Data Analysis

3.1: Five top records of data

df.head()

3.2: Columns/features in data

df.columns

Index(['target', 'ids', 'date', 'flag', 'user', 'text'], dtype='object')

3.3: Length of the dataset

print('length of data is', len(df))


length of data is 1048576

3.4: Shape of data

df.shape


(1048576, 6)

3.5: Data information

df.info()

3.6: Datatypes of all columns

df.dtypes

target     int64
ids        int64
date      object
flag      object
user      object
text      object
dtype: object

3.7: Checking for null values

np.sum(df.isnull().any(axis=1))



3.8: Rows and columns in the dataset

print('Count of columns in the data is: ', len(df.columns)) print('Count of rows in the data is: ', len(df))


Count of columns in the data is: 6 Count of rows in the data is: 1048576

3.9: Check unique target values

df['target'].unique()


array([0, 4], dtype=int64)

3.10: Check the number of target values

df['target'].nunique()

2

Step-4: Data Visualization of Target Variables

# Plotting the distribution for dataset.
ax = df.groupby('target').count().plot(kind='bar', title='Distribution of data', legend=False)
ax.set_xticklabels(['Negative', 'Positive'], rotation=0)
# Storing data in lists.
text, sentiment = list(df['text']), list(df['target'])


import seaborn as sns
sns.countplot(x='target', data=df)


Step-5: Data Preprocessing

In the above-given problem statement, before training the model, we performed various pre-processing steps on the dataset that mainly dealt with removing stopwords, removing special characters like emojis, hashtags, etc. The text document is then converted into lowercase for better generalization.

Subsequently, the punctuations were cleaned and removed, thereby reducing the unnecessary noise from the dataset. After that, we also removed the repeating characters from the words along with removing the URLs as they do not have any significant importance.

Finally, we performed stemming (reducing words to their derived stems) and lemmatization (reducing words to their dictionary root form, known as the lemma) for better results.

5.1: Selecting the text and Target column for our further analysis

data = df[['text', 'target']]

5.2: Replacing the positive label 4 with 1 to ease understanding

data['target'] = data['target'].replace(4,1)

5.3: Printing unique values of target variables

data['target'].unique()


array([0, 1], dtype=int64)

5.4: Separating positive and negative tweets

data_pos = data[data['target'] == 1]
data_neg = data[data['target'] == 0]

5.5: Taking a 20,000-tweet sample of each class so we can run it on our machine easily

data_pos = data_pos.iloc[:20000]
data_neg = data_neg.iloc[:20000]

5.6: Combining positive and negative tweets

dataset = pd.concat([data_pos, data_neg])

5.7: Making statement text in lowercase

dataset['text'] = dataset['text'].str.lower()
dataset['text'].tail()


5.8: Defining set containing all stopwords in English.

stopwordlist = ['a', 'about', 'above', 'after', 'again', 'ain', 'all', 'am', 'an', 'and','any','are', 'as', 'at', 'be', 'because', 'been', 'before', 'being', 'below', 'between','both', 'by', 'can', 'd', 'did', 'do', 'does', 'doing', 'down', 'during', 'each','few', 'for', 'from', 'further', 'had', 'has', 'have', 'having', 'he', 'her', 'here', 'hers', 'herself', 'him', 'himself', 'his', 'how', 'i', 'if', 'in', 'into','is', 'it', 'its', 'itself', 'just', 'll', 'm', 'ma', 'me', 'more', 'most','my', 'myself', 'now', 'o', 'of', 'on', 'once', 'only', 'or', 'other', 'our', 'ours','ourselves', 'out', 'own', 're','s', 'same', 'she', "shes", 'should', "shouldve",'so', 'some', 'such', 't', 'than', 'that', "thatll", 'the', 'their', 'theirs', 'them', 'themselves', 'then', 'there', 'these', 'they', 'this', 'those', 'through', 'to', 'too','under', 'until', 'up', 've', 'very', 'was', 'we', 'were', 'what', 'when', 'where','which','while', 'who', 'whom', 'why', 'will', 'with', 'won', 'y', 'you', "youd","youll", "youre", "youve", 'your', 'yours', 'yourself', 'yourselves']

5.9: Cleaning and removing the above stop words list from the tweet text

STOPWORDS = set(stopwordlist)
def cleaning_stopwords(text):
    return " ".join([word for word in str(text).split() if word not in STOPWORDS])
dataset['text'] = dataset['text'].apply(lambda text: cleaning_stopwords(text))
dataset['text'].head()


5.10: Cleaning and removing punctuations

import string
english_punctuations = string.punctuation
punctuations_list = english_punctuations
def cleaning_punctuations(text):
    translator = str.maketrans('', '', punctuations_list)
    return text.translate(translator)
dataset['text'] = dataset['text'].apply(lambda x: cleaning_punctuations(x))
dataset['text'].tail()


5.11: Cleaning and removing repeating characters

def cleaning_repeating_char(text):
    return re.sub(r'(.)\1+', r'\1', text)
dataset['text'] = dataset['text'].apply(lambda x: cleaning_repeating_char(x))
dataset['text'].tail()


5.12: Cleaning and removing URLs

def cleaning_URLs(data):
    return re.sub(r'((www\.[^\s]+)|(https?://[^\s]+))', ' ', data)
dataset['text'] = dataset['text'].apply(lambda x: cleaning_URLs(x))
dataset['text'].tail()


5.13: Cleaning and removing numeric numbers

def cleaning_numbers(data):
    return re.sub('[0-9]+', '', data)
dataset['text'] = dataset['text'].apply(lambda x: cleaning_numbers(x))
dataset['text'].tail()


5.14: Getting tokenization of tweet text

from nltk.tokenize import RegexpTokenizer
tokenizer = RegexpTokenizer(r'\w+')
dataset['text'] = dataset['text'].apply(tokenizer.tokenize)
dataset['text'].head()


5.15: Applying stemming

import nltk
st = nltk.PorterStemmer()
def stemming_on_text(data):
    text = [st.stem(word) for word in data]
    return text
dataset['text'] = dataset['text'].apply(lambda x: stemming_on_text(x))
dataset['text'].head()


5.16: Applying lemmatizer

lm = nltk.WordNetLemmatizer()
def lemmatizer_on_text(data):
    text = [lm.lemmatize(word) for word in data]
    return text
dataset['text'] = dataset['text'].apply(lambda x: lemmatizer_on_text(x))
dataset['text'].head()


5.17: Separating input feature and label

X = data.text
y = data.target

5.18: Plot a cloud of words for negative tweets

data_neg = data['text'][:800000]
plt.figure(figsize=(20, 20))
wc = WordCloud(max_words=1000, width=1600, height=800, collocations=False).generate(" ".join(data_neg))
plt.imshow(wc)


5.19: Plot a cloud of words for positive tweets

data_pos = data['text'][800000:]
wc = WordCloud(max_words=1000, width=1600, height=800, collocations=False).generate(" ".join(data_pos))
plt.figure(figsize=(20, 20))
plt.imshow(wc)


Step-6: Splitting Our Data Into Train and Test Subsets

# Using 95% of the data for training and 5% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05, random_state=26105111)

Step-7: Transforming the Dataset Using TF-IDF Vectorizer

7.1: Fit the TF-IDF Vectorizer

vectoriser = TfidfVectorizer(ngram_range=(1, 2), max_features=500000)
vectoriser.fit(X_train)
print('No. of feature_words: ', len(vectoriser.get_feature_names_out()))


No. of feature_words: 500000

7.2: Transform the data using TF-IDF Vectorizer

X_train = vectoriser.transform(X_train)
X_test = vectoriser.transform(X_test)

Step-8: Function for Model Evaluation

After training the models, we apply evaluation measures to check how each one performs. We use the following evaluation parameters:

Accuracy Score

Confusion Matrix with Plot
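For intuition, both of these can be computed by hand on a tiny example, independent of the trained models (the labels below are made up for illustration):

```python
def evaluate(y_true, y_pred):
    """Accuracy plus a 2x2 confusion matrix laid out as [[TN, FP], [FN, TP]]."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    accuracy = (tp + tn) / len(y_true)
    return accuracy, [[tn, fp], [fn, tp]]

y_true = [0, 0, 1, 1, 1, 0]   # 0 = negative, 1 = positive
y_pred = [0, 1, 1, 1, 0, 0]
acc, cm = evaluate(y_true, y_pred)
print(cm)   # [[2, 1], [1, 2]]  (4 of 6 predictions correct)
```

The sklearn helpers used below report the same quantities, plus per-class precision, recall, and F1.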


def model_Evaluate(model):
    # Predict values for Test dataset
    y_pred = model.predict(X_test)
    # Print the evaluation metrics for the dataset.
    print(classification_report(y_test, y_pred))
    # Compute and plot the Confusion matrix
    cf_matrix = confusion_matrix(y_test, y_pred)
    categories = ['Negative', 'Positive']
    group_names = ['True Neg', 'False Pos', 'False Neg', 'True Pos']
    group_percentages = ['{0:.2%}'.format(value) for value in cf_matrix.flatten() / np.sum(cf_matrix)]
    labels = [f'{v1}\n{v2}' for v1, v2 in zip(group_names, group_percentages)]
    labels = np.asarray(labels).reshape(2, 2)
    sns.heatmap(cf_matrix, annot=labels, cmap='Blues', fmt='', xticklabels=categories, yticklabels=categories)
    plt.xlabel("Predicted values", fontdict={'size': 14}, labelpad=10)
    plt.ylabel("Actual values", fontdict={'size': 14}, labelpad=10)
    plt.title("Confusion Matrix", fontdict={'size': 18}, pad=20)

Step-9: Model Building

In the problem statement, we have used three different models:

Bernoulli Naive Bayes Classifier

SVM (Support Vector Machine)

Logistic Regression

The idea behind choosing these models is that we want to try all the classifiers on the dataset ranging from simple ones to complex models, and then try to find out the one which gives the best performance among them.

8.1: Model-1

BNBmodel = BernoulliNB()
BNBmodel.fit(X_train, y_train)
model_Evaluate(BNBmodel)
y_pred1 = BNBmodel.predict(X_test)


8.2: Plot the ROC-AUC Curve for model-1

from sklearn.metrics import roc_curve, auc
fpr, tpr, thresholds = roc_curve(y_test, y_pred1)
roc_auc = auc(fpr, tpr)
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=1, label='ROC curve (area = %0.2f)' % roc_auc)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC CURVE')
plt.legend(loc="lower right")


8.3: Model-2:

SVCmodel = LinearSVC()
SVCmodel.fit(X_train, y_train)
model_Evaluate(SVCmodel)
y_pred2 = SVCmodel.predict(X_test)


8.4: Plot the ROC-AUC Curve for model-2

from sklearn.metrics import roc_curve, auc
fpr, tpr, thresholds = roc_curve(y_test, y_pred2)
roc_auc = auc(fpr, tpr)
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=1, label='ROC curve (area = %0.2f)' % roc_auc)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC CURVE')
plt.legend(loc="lower right")


8.5: Model-3

LRmodel = LogisticRegression(C=2, max_iter=1000, n_jobs=-1)
LRmodel.fit(X_train, y_train)
model_Evaluate(LRmodel)
y_pred3 = LRmodel.predict(X_test)


8.6: Plot the ROC-AUC Curve for model-3

from sklearn.metrics import roc_curve, auc
fpr, tpr, thresholds = roc_curve(y_test, y_pred3)
roc_auc = auc(fpr, tpr)
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=1, label='ROC curve (area = %0.2f)' % roc_auc)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC CURVE')
plt.legend(loc="lower right")


Step-10: Model Evaluation

Upon evaluating all the models, we can draw the following conclusions:

Accuracy: As far as the accuracy of the model is concerned, Logistic Regression performs better than SVM, which in turn performs better than Bernoulli Naive Bayes.

AUC Score: All three models have the same ROC-AUC score.

We, therefore, conclude that the Logistic Regression is the best model for the above-given dataset.

Logistic Regression also illustrates Occam’s Razor: when a simpler model fits the data as well as more complex ones, the simpler model is preferred. Since Logistic Regression is the simplest of the three models and gives the best performance here, it is the natural choice for this dataset.


We hope this article gave you a basic understanding of how sentiment analysis is used to gauge the public emotion behind people’s tweets. As you’ve read, Twitter sentiment analysis involves preprocessing the data (tweets) using different methods and feeding it into ML models to get the best accuracy.

Key Takeaways

Twitter sentiment analysis is used to identify and classify the sentiments expressed in a text source.

Logistic Regression, SVM, and Naive Bayes are some of the ML algorithms that can be used for Twitter sentiment analysis.


Top 5 Sentiment Analysis Challenges And Solutions In 2023

Words are the most powerful tools to express our thoughts, opinions, intentions, desires, or preferences. However, they do not have the same meaning in all instances. Instead, the meaning conveyed is mainly shaped by the context. This complexity of human languages constitutes a challenge for AI methods that work with natural languages, such as sentiment analysis. 

Consider the following example:

Figure 1. Consumer feedback on a product

The consumer states in his review that he is content with the product, and his words can be classified as positive (e.g., “love,” “amazing,” and “long battery life”). However, in the fifth sentence, he says that his wife does not have similar thoughts. Instead, her sentiment regarding the product is negative (e.g., “too heavy”). So, how would the algorithm classify this review? As positive, negative, or neutral?

Here are the top five challenges of conducting sentiment analysis and how to solve them:

1. Context-dependent errors

Sarcasm

People tend to use sarcasm as a way of expressing their negative sentiment, but the words used can be positive (e.g., “I am so glad that the product arrived in one piece!”). In such cases, sentiment analysis tools can classify the feedback as positive, which in reality is negative.

Solution: Determine the boundaries of sarcasm in the training dataset. For instance, researchers used a multi-head self-attention-based neural network architecture to identify terms that include sarcasm. It highlights the parts that have a sarcastic tone, then connects these parts to each other to obtain an overall score.


Polarity

Although the emotional tone in some sentences can be very apparent and robust (e.g., “It was a terrible experience.”), others are not easily classified as positive, negative, or neutral (e.g., “The service quality is not mentionable.”). So the polarity of a statement cannot always be easily inferred by the algorithms.

Solution: Give polarity scores to the words in the training dataset so that the algorithm can classify the difference between statements such as “very good” and “slightly good.”
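A toy sketch of this idea, with invented polarity and modifier weights, shows how intensity modifiers separate “very good” from “slightly good”:

```python
# Toy polarity scoring with intensity modifiers. The word lists and weights
# are illustrative only, not a real training lexicon.

POLARITY = {"good": 1.0, "bad": -1.0, "terrible": -2.0}
MODIFIERS = {"very": 1.5, "slightly": 0.5, "extremely": 2.0}

def polarity_score(text):
    words = text.lower().split()
    score, weight = 0.0, 1.0
    for w in words:
        if w in MODIFIERS:
            weight = MODIFIERS[w]        # scale the next sentiment word
        elif w in POLARITY:
            score += weight * POLARITY[w]
            weight = 1.0                 # reset after applying the modifier
    return score

print(polarity_score("very good"))      # 1.5
print(polarity_score("slightly good"))  # 0.5
```

Graded scores like these let a classifier rank statements by strength of sentiment instead of forcing a binary positive/negative call.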


Word ambiguity

When a word has more than one meaning (e.g., the head of the sales team vs. wearing an earbud hurts the head), it becomes harder for the algorithm to determine which meaning is intended. If the word is not evaluated in its context, the results of the analysis can be inaccurate.

Solution: Incorporate domain knowledge during the text annotation and model training phases. This helps your sentiment analysis algorithm differentiate between words that have different meanings in different contexts.
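As a rough illustration of why domain knowledge matters, the same term can carry opposite polarity depending on the domain. The words and scores below are invented for the example:

```python
# Toy domain-aware lookup: the same word maps to different polarities
# depending on which domain the text comes from.
DOMAIN_LEXICONS = {
    "movies":      {"unpredictable": 1.0, "slow": -1.0},
    "automobiles": {"unpredictable": -1.5, "slow": -0.5},
}

def domain_score(text: str, domain: str) -> float:
    lexicon = DOMAIN_LEXICONS[domain]
    return sum(lexicon.get(tok, 0.0) for tok in text.lower().split())

print(domain_score("an unpredictable plot", "movies"))        # 1.0
print(domain_score("unpredictable steering", "automobiles"))  # -1.5
```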


2. Negation Detection 

Just because a sentence contains a negation cue (e.g., no, not, non-, dis-, -less), it does not mean that the overall sentiment of the statement is negative. Current negation detection methods are often insufficient to classify the sentiment correctly. For instance, “It was not unpleasant” contains negation and may be classified by the algorithm as negative, even though it conveys a positive meaning.

Solution: Train your algorithm with large datasets that include all possible negation words. A combination of term-counting methods that account for contextual valence shifters and machine learning methods has been found to identify negation signals more accurately.
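A minimal sketch of the term-counting idea with a contextual valence shifter: a negation word flips (and slightly dampens) the polarity of the next sentiment-bearing word. The lexicon and flip factor are illustrative; real methods combine this with machine learning:

```python
# Toy term-counting scorer with a contextual valence shifter.
POLARITY = {"pleasant": 1.0, "unpleasant": -1.0, "good": 1.0, "bad": -1.0}
NEGATORS = {"not", "no", "never"}

def score_with_negation(text: str) -> float:
    total, flip = 0.0, 1.0
    for tok in text.lower().replace(".", "").split():
        if tok in NEGATORS:
            flip = -0.8          # flip and slightly dampen the next word
        elif tok in POLARITY:
            total += POLARITY[tok] * flip
            flip = 1.0
    return total

print(score_with_negation("It was not unpleasant"))  # 0.8 (mildly positive)
print(score_with_negation("It was unpleasant"))      # -1.0
```

Note how “not unpleasant” comes out mildly positive rather than negative, which a naive negation-equals-negative rule would get wrong.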

3. Multilingual Data

Although English is the most widely used language online, companies engage with customers globally as they grow, so customers provide feedback in many different languages. However, most sentiment analysis tools are trained to categorize words in a single language, and some sentiment can get lost in translation. This is a significant problem when conducting sentiment analysis on non-English reviews or feedback.

Solution: Design systems that can learn from multilingual content and make predictions regardless of the language. For instance, you can use a code-switching approach that includes parallel encoders at the word level and implements models such as deep neural networks. You can also check our article on multilingual sentiment analysis for a comprehensive account.
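A very rough sketch of one piece of this: routing text to per-language resources by checking which lexicon covers the most tokens, then scoring with that lexicon. The tiny lexicons are illustrative; production systems use shared multilingual encoders rather than lookups like this:

```python
# Toy language routing: pick the per-language lexicon that covers the
# most tokens, then score the text with that lexicon.
LEXICONS = {
    "en": {"great": 1.0, "awful": -1.0, "product": 0.0},
    "es": {"excelente": 1.0, "horrible": -1.0, "producto": 0.0},
}

def detect_and_score(text: str):
    tokens = text.lower().split()
    # choose the language whose lexicon recognizes the most tokens
    lang = max(LEXICONS, key=lambda l: sum(t in LEXICONS[l] for t in tokens))
    score = sum(LEXICONS[lang].get(t, 0.0) for t in tokens)
    return lang, score

print(detect_and_score("excelente producto"))  # ('es', 1.0)
print(detect_and_score("great product"))       # ('en', 1.0)
```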

4. Emojis

Figure 2. The valence and arousal rates for the most used emojis

Emojis have become part of daily life and are often more effective at expressing sentiment than words. However, because sentiment analysis tools depend on written text, emojis are hard to classify accurately and are thus removed from many analyses. In turn, one ends up with an incomplete analysis.

Solution: Determine valence tags for emojis and incorporate them into your sentiment analysis algorithm to improve the accuracy of your analysis.
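A toy example of tagging emojis with valence values instead of stripping them. The values here are made up for the example; research lexicons assign empirically measured valence and arousal scores, as in Figure 2:

```python
# Toy emoji handling: map emojis to valence tags and include them in the
# overall score instead of removing them from the text.
EMOJI_VALENCE = {"😍": 1.5, "🙂": 0.5, "😡": -1.5, "👎": -1.0}
WORD_POLARITY = {"okay": 0.25, "broken": -1.0}

def score_with_emojis(text: str) -> float:
    total = sum(v for ch, v in EMOJI_VALENCE.items() if ch in text)
    total += sum(WORD_POLARITY.get(t, 0.0) for t in text.lower().split())
    return total

print(score_with_emojis("The product is okay 😍"))  # 1.75
```

Here the emoji carries most of the sentiment; a text-only analysis of the same review would report it as nearly neutral.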

5. Potential Biases in Model Training

Although AI algorithms are powerful tools for making accurate predictions, they are trained by humans, so they inevitably reflect the human biases present in the training dataset. For instance, if an algorithm is trained to label the sentence “I am a sensitive person” as negative and “I can be very ambitious” as positive, the results will be biased against people perceived as emotionally sensitive and in favor of overly ambitious ones.

Solution: Minimize bias in AI systems by applying debiasing methods. For instance, you can detect the words in your dataset that might carry human bias and build a dictionary of these words. This way, you can tag them and then compare the overall sentiment of the text with and without the tagged words.
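The tag-and-compare idea can be sketched as follows; the word lists and scores are invented for the example:

```python
# Toy debiasing check: tag potentially bias-laden words from a small
# dictionary and compare the overall score with and without them.
POLARITY = {"sensitive": -0.5, "ambitious": 1.0, "helpful": 1.0}
BIAS_DICTIONARY = {"sensitive", "ambitious"}  # words flagged for review

def scores_with_and_without_bias_terms(text: str):
    tokens = text.lower().split()
    full = sum(POLARITY.get(t, 0.0) for t in tokens)
    debiased = sum(POLARITY.get(t, 0.0)
                   for t in tokens if t not in BIAS_DICTIONARY)
    return full, debiased

full, debiased = scores_with_and_without_bias_terms(
    "a sensitive and helpful person")
print(full, debiased)  # 0.5 1.0 -- the flagged word drags the score down
```

A large gap between the two scores signals that flagged words are driving the result, which is a cue to re-examine the labels in the training data.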


Begüm Yılmaz

Begüm is an Industry Analyst at AIMultiple. She holds a bachelor’s degree from Bogazici University and specializes in sentiment analysis, survey research, and content writing services.





Google MUM Algorithm Can Do More Than Rank Websites

Google’s John Mueller was asked about how many search queries the MUM algorithm was affecting. John said he didn’t know and then explained that the Google MUM algorithm is multi-purpose and could be used in contexts beyond just ranking.

Question About Google’s Application of MUM Technology

The MUM algorithm is impressive because it can search for answers across web documents regardless of language and can even use images as part of the search query.

So it’s understandable that the person asking the question wanted to know how much MUM was affecting the search results.

Google’s John Mueller answered the question and then tried to put MUM into perspective without any hype.

This is the question that was asked:

“A couple years ago Google noted that when it came to ranking results BERT would better understand and impact about ten percent of searches in the U.S.

My question is two-fold:

Has that percentage changed for BERT?

…What percentage is MUM expected to better understand and impact searches?”

How Many Searches Does MUM Affect?

John Mueller admitted that he didn’t know how many searches MUM affected and then explained why it might be difficult to put a number to the influence of MUM in the search results.

His answer first addressed the numbers for BERT and then addressed MUM.

John Mueller answered:

“I have no idea…

I’m pretty sure that the percentage changed since then because everything is changing.

But I don’t know if we have a fixed number that goes for BERT or that goes for MUM.”


MUM Is Like a Multi-Purpose Machine Learning Library

John Mueller next followed up with thoughts about MUM and said that it can be applied to a wide range of tasks that go beyond ranking.

He said:

“MUM, as far as I understand it, is more like a multi-purpose machine learning library anyway.

So it’s something that can be applied to lots of different parts of search.

It’s not so much that you would isolate it to just ranking.

But rather you might be able to use it for understanding things on a very fine grained level and then that’s kind of interwoven in a lot of different kinds of search results.

But I don’t think we have any fixed numbers.”

Google is Happy with MUM

The person asking the question followed up, and John answered with a hype-free description of MUM, portraying it as doing work that isn’t necessarily as flashy as it might seem from the outside.

The follow-up question:

“It seemed to me like it was going to open up more opportunities actually for different products or queries to be discovered.

It seemed like it was just sort of exponentially going to blow out what one could learn.”

John Mueller responded:

“I don’t know… we’ll see.

I think it’s always tricky to look at the marketing around machine learning algorithms, because it’s very easy to find …very exponential examples.

But that doesn’t mean that everything is as flashy as that.

…In talking with some of these search quality folks, they’re really happy with the way that these kinds of machine learning models are working.”

Google’s MUM Algorithm is More Than Just Ranking

John Mueller added a little bit more information about Google’s MUM algorithm by explaining that it’s more than just applicable for ranking purposes.

He indicated that there are other tasks that it can perform that are beyond the ranking part of Google’s algorithms and that it can play a role in other parts of search.

Mueller also described MUM as being able to understand things with a fine-grained level of detail.



Google Updates Search Snippets For Product Review Pages

Google updates search results for product review pages by listing an item’s pros and cons in the search snippet.

In addition, there’s new structured data to go along with this update, though using it isn’t mandatory to qualify for the new snippets.

While the new pros and cons structured data is recommended, Google says it will try to pull the information into the snippets automatically.

Here’s what’s changing and how to manually add the structured data to your product review pages.

New Search Snippets For Product Review Pages

Google is displaying more detailed snippets for product review pages with new lines of text listing pros and cons.

In a blog post, Google states:

“Product reviews often contain a list of pros and cons, which our research has shown to be popular with shoppers when making their purchasing decisions. Because of their importance to users, Google Search may highlight pros and cons in the product review snippet in Search results.”


Google can create these new snippets automatically, as long as the information appears somewhere on the page.

You can make the information clear to Google by marking up your product review pages with pros and cons structured data.

New Pros & Cons Structured Data

In conjunction with the update to product review search snippets, Google is introducing a new type of structured data.

As a best practice, it’s always recommended to use Google-supported structured data when possible, even if it’s not a requirement.

To manually tell Google about the pros and cons of an editorial product review, add the positiveNotes and/or negativeNotes properties to your nested product review.

Examples of both types of markup code are shown below:
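A minimal sketch of the markup, built here in Python and emitted as JSON-LD. The product name and note text are invented for illustration; the positiveNotes/negativeNotes properties and the nested ItemList/ListItem structure follow Google’s documented format:

```python
import json

# Illustrative pros/cons structured data nested inside a product review.
structured_data = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Wireless Earbuds",
    "review": {
        "@type": "Review",
        "name": "Example Wireless Earbuds review",
        "positiveNotes": {
            "@type": "ItemList",
            "itemListElement": [
                {"@type": "ListItem", "position": 1, "name": "Long battery life"},
                {"@type": "ListItem", "position": 2, "name": "Comfortable fit"},
            ],
        },
        "negativeNotes": {
            "@type": "ItemList",
            "itemListElement": [
                {"@type": "ListItem", "position": 1, "name": "Case is bulky"},
            ],
        },
    },
}

# Emit JSON-LD suitable for a <script type="application/ld+json"> tag.
print(json.dumps(structured_data, indent=2))
```

Note that the example lists three statements in total, satisfying the requirement of at least two positive and/or negative statements.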

See Google’s official documentation for more information about applying this markup.

If you add pros and cons structured data, you must follow these guidelines:

Currently, only editorial product review pages are eligible for the pros and cons appearance in Search, not merchant product pages or customer product reviews.

There must be at least two statements about the product. It can be any combination of positive and/or negative statements (for example, ItemList markup with two positive statements is valid).

The pros and cons must be visible to users on the page.
