Keyword analysis is an essential part of driving traffic to your website, and it's something every content creator and SEO professional should master. Luckily, there are some great tools out there, and you can even build your own using the Python programming language. Here's how to do it.
This guide assumes you have Python installed and know how to run scripts – if you're new, check out our guide to getting started with Python.
I'll explain each part of the script and provide the complete code at the end so you can copy and paste it.
If you don't already have them installed, you'll need the beautifulsoup4, requests, and nltk libraries. Install them by running the following command in the command line:
pip install beautifulsoup4 requests nltk
Now, let's get to the script. First, we import nltk and download the tokenizer and stop word data the script needs:
import nltk
nltk.download('punkt')
nltk.download('stopwords')
Next, we'll use the Requests library to retrieve content from a web page. The fetch_content function takes a URL as input, sends an HTTP GET request to that URL, and returns the HTML content of the page.
import requests

def fetch_content(url):
    response = requests.get(url)
    if response.status_code == 200:
        return response.text
    else:
        print(f"Failed to retrieve content from {url}")
        return ""
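If you want to sanity-check this step on its own, you can call the function directly; the URL below is just a placeholder:

test_html = fetch_content("https://example.com")  # Placeholder URL
print(test_html[:200])  # Preview the first 200 characters of the HTML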
The text is then cleaned up by removing punctuation and excess whitespace, and tokenized for further analysis and processing.
import re
from nltk.tokenize import word_tokenize

def clean_and_tokenize(text):
    text = re.sub(r'\s+', ' ', text)  # Remove excess whitespace
    text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation
    tokens = word_tokenize(text.lower())
    return tokens
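As a quick illustration of what comes out (assuming the NLTK “punkt” data downloaded earlier is available):

tokens = clean_and_tokenize("Hello, World!  How are you?")
print(tokens)  # ['hello', 'world', 'how', 'are', 'you']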
The next step is to filter out stop words: common words like “the”, “is”, and “in” that carry little meaning of their own but can skew your keyword analysis.
from nltk.corpus import stopwords

def remove_stopwords(tokens):
    stop_words = set(stopwords.words('english'))
    filtered_tokens = [word for word in tokens if word not in stop_words]
    return filtered_tokens
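Continuing the example above, NLTK's English stop word list includes words such as “how”, “are”, and “you”, so only the meaningful tokens survive:

print(remove_stopwords(['hello', 'world', 'how', 'are', 'you']))
# ['hello', 'world']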
Next, we calculate the frequency and density of each keyword. Keyword density is the number of times a keyword appears in a text, expressed as a percentage of the total word count.
from collections import Counter

def analyze_keywords(tokens):
    counter = Counter(tokens)
    total_words = sum(counter.values())
    keyword_density = {word: (count / total_words) * 100 for word, count in counter.items()}
    return keyword_density
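To see the math on a toy example: in a list of four tokens where “seo” appears twice, “seo” has a density of 2 ÷ 4 = 50%:

density = analyze_keywords(['seo', 'python', 'seo', 'keywords'])
print(density)  # {'seo': 50.0, 'python': 25.0, 'keywords': 25.0}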
Once the content has been analyzed, the script formats the results into a report and prints it to the screen.
def generate_report(url, keyword_density):
    sorted_keywords = sorted(keyword_density.items(), key=lambda item: item[1], reverse=True)
    report = f"Keyword density report for {url}\n"
    report += "-" * 50 + "\n"
    for keyword, density in sorted_keywords[:10]:  # Display the top 10 keywords
        report += f"Keyword: {keyword}, Density: {density:.2f}%\n"
    return report
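Passing the toy data from the density example above to generate_report (with a placeholder URL) produces output along these lines:

print(generate_report("https://example.com", density))
# Keyword density report for https://example.com
# --------------------------------------------------
# Keyword: seo, Density: 50.00%
# Keyword: python, Density: 25.00%
# Keyword: keywords, Density: 25.00%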
Once all the individual functions are in place, you can tie them together in a main function that takes a URL as input and prints the report when the script is run. The main function also uses BeautifulSoup to strip the HTML tags and extract the page's visible text, so it needs one more import.
from bs4 import BeautifulSoup

def main():
    url = input("Enter the URL of the webpage: ")
    html_content = fetch_content(url)
    if html_content:
        soup = BeautifulSoup(html_content, 'html.parser')
        text = soup.get_text()
        tokens = clean_and_tokenize(text)
        filtered_tokens = remove_stopwords(tokens)
        keyword_density = analyze_keywords(filtered_tokens)
        report = generate_report(url, keyword_density)
        print(report)

if __name__ == "__main__":
    main()
Here's the complete code for you to copy and paste:
import requests
from bs4 import BeautifulSoup
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from collections import Counter
import nltk
import re

# Download the required NLTK data
nltk.download('punkt')
nltk.download('stopwords')

# Get the webpage content
def fetch_content(url):
    try:
        response = requests.get(url)
        response.raise_for_status()  # Raise an HTTPError for invalid responses (4xx and 5xx)
        return response.text
    except requests.exceptions.HTTPError as http_err:
        print(f"An HTTP error occurred: {http_err}")
    except requests.exceptions.ConnectionError as conn_err:
        print(f"Connection error: {conn_err}")
    except requests.exceptions.Timeout as timeout_err:
        print(f"A timeout error occurred: {timeout_err}")
    except requests.exceptions.RequestException as req_err:
        print(f"An error occurred: {req_err}")
    return ""  # Return an empty string on any error

# Clean and tokenize the text
def clean_and_tokenize(text):
    text = re.sub(r'\s+', ' ', text)  # Remove excess whitespace
    text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation
    text = text.lower()  # Convert text to lowercase
    tokens = word_tokenize(text)
    return tokens

# Remove stop words from the tokens
def remove_stopwords(tokens):
    stop_words = set(stopwords.words('english'))
    filtered_tokens = [word for word in tokens if word not in stop_words]
    return filtered_tokens

# Analyze keyword density
def analyze_keywords(tokens):
    counter = Counter(tokens)
    total_words = sum(counter.values())
    keyword_density = {word: (count / total_words) * 100 for word, count in counter.items()}
    return keyword_density

# Generate a keyword density report
def generate_report(url, keyword_density):
    sorted_keywords = sorted(keyword_density.items(), key=lambda item: item[1], reverse=True)
    report = f"Keyword density report for {url}\n"
    report += "-" * 50 + "\n"
    for keyword, density in sorted_keywords[:10]:  # Display the top 10 keywords
        report += f"Keyword: {keyword}, Density: {density:.2f}%\n"
    return report

# Main function
def main():
    url = input("Enter the URL of the webpage: ")
    html_content = fetch_content(url)
    if html_content:
        soup = BeautifulSoup(html_content, 'html.parser')
        text = soup.get_text()
        tokens = clean_and_tokenize(text)
        filtered_tokens = remove_stopwords(tokens)
        keyword_density = analyze_keywords(filtered_tokens)
        report = generate_report(url, keyword_density)
        print(report)
    input("Press Enter to exit...")

if __name__ == "__main__":
    main()
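Save the script under any name you like (keyword_analyzer.py is just an example), then run it from the command line and paste in a URL when prompted:

python keyword_analyzer.py  # The filename here is just an example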
For more useful Python scripts, and to share your comments and questions, follow GeekSided.