
Competitor Keyword Analysis With Python


Competitor Keyword Analysis – How Can Python Help?

Competitor keyword analysis is a part of keyword research that every SEO specialist is familiar with. Finding the right keywords for your new content ideas is great, but assessing the competitive landscape for them is even better. By knowing how many competitors already rank for the same keywords, you can assess your own chances of ranking for them.

This tutorial gives you an example of how to automate this part of your keyword research and analysis using Python. By running this script, you will quickly see how many competitors rank for the same keywords as your site, or for the keywords you would like your site to rank for (depending on your goals).

Additionally, by comparing the shared keywords with the original keyword pool, you can easily spot keywords that none of your competitors rank for. These can be a further opportunity: creating additional content around such a keyword or group of keywords can help you maintain your positions for them. In both cases, the expected result after running the script should look like this:

Why Python and not Excel?

Can we not do this analysis in the tool that we, as SEO specialists, know well, without getting involved in coding? By ‘the tool’ we mean Excel, of course. Yes, we can. We found one way to do this in Excel, and if you have a different suggestion, you are more than welcome to share it in the comments.

One possible solution is to pivot the keywords and the competitors to get a count of how many competitors are associated with each keyword. From this, you can assign a number (the number of competitors, or ‘instances’) to that keyword. You can then use a formula to concatenate the names of the competitors with “Common to” to give you a nice-looking cell with the insight you were looking for. The formula, as well as the result, will look like this:

However, we still have to copy and paste the formula as values, and then copy and paste it again into the empty rows for each keyword that is shared between the competitors. If you have hundreds of keywords in your file, this adds up to a lot of manual work, which not everyone is ready to do. This brings us to the pythonic solution for this task.
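Before moving on to the full script, here is a minimal sketch of how the Excel steps described above (count the competitors per keyword, then join their names after “Common to”) might look in pandas. The data, the column names `keyword` and `competitor`, and the domains are made up for illustration; this is not the script from this post.

```python
import pandas as pd

# Hypothetical long-format data: one row per (keyword, competitor) pair.
rows = pd.DataFrame({
    "keyword": ["seo audit", "seo audit", "link building", "keyword research"],
    "competitor": ["site-a.com", "site-b.com", "site-a.com", "site-b.com"],
})

# Count the competitors per keyword and join their names into one cell,
# mirroring the pivot and concatenation steps described for Excel.
summary = (
    rows.groupby("keyword")["competitor"]
    .agg(
        count="nunique",
        common_to=lambda s: "Common to " + ", ".join(sorted(s.unique())),
    )
    .reset_index()
)

print(summary)
```

Because the grouping and the joining happen in one pass, there is nothing to copy and paste by hand, however many keywords the file contains.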

Folder Set-Up for Your Competitor Keyword Analysis

Create a folder with a CSV file that lists, in one column, the keywords you rank for (or want to rank for) and want to check against your competitors, and, in the column next to it, the name of your domain. We will call this file ‘my-site.csv’. The file should have no header row, because the script reads it with header=None.
Within this folder, add another folder called ‘comp_files’. It will contain one CSV file per competitor, in the same format as the CSV file for your keywords: column A contains the keywords the site is ranking for, and column B the competitor’s name.
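If you would like to try the layout before using real data, the following sketch builds it in a temporary folder. The keywords, the domains, the competitor file names (`site-a.csv`, `site-b.csv`) and the helper `write_keywords` are all hypothetical; only ‘my-site.csv’ and ‘comp_files’ come from the set-up described above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# A temporary folder stands in for your working folder, so nothing real is overwritten.
base = Path(tempfile.mkdtemp())
comp_dir = base / "comp_files"
comp_dir.mkdir()


def write_keywords(path, keywords, domain):
    """Write keywords in column A and the domain in column B, with no header row."""
    frame = pd.DataFrame({"keyword": keywords, "domain": domain})
    frame.to_csv(path, header=False, index=False)


# Your own keywords (made-up examples).
write_keywords(base / "my-site.csv",
               ["seo audit", "link building", "keyword research"], "my-site.com")

# One file per competitor inside comp_files (made-up examples).
write_keywords(comp_dir / "site-a.csv", ["seo audit", "link building"], "site-a.com")
write_keywords(comp_dir / "site-b.csv", ["seo audit", "content marketing"], "site-b.com")

print(sorted(p.name for p in comp_dir.iterdir()))
```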

Python Script to Analyse your Competitors’ Keywords

Step 1. Load libraries

import pandas as pd
import numpy as np
import os
import csv

Step 2. Navigate to your working folder and create the ‘myKeywords’ dataFrame, which will contain the keywords from the ‘my-site.csv’ file, that is, the keywords you want to check against your competitors.

myKeywords = pd.read_csv("my-site.csv", header=None)

Step 3. Create the ‘dup_check’ function, which does the following: 1. It appends a competitor’s keywords from their file to the list of your keywords; 2. It then identifies which of the competitor’s keywords duplicate your own.

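def dup_check(comp_file):
    # Read one competitor file (no header row: column 0 = keyword, column 1 = competitor)
    competitor = pd.read_csv(comp_file, header=None)
    # Stack the competitor's rows under your own keywords
    # (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
    combined = pd.concat([myKeywords, competitor])
    # Flag every keyword that has already appeared further up the combined list
    combined['dup'] = combined.duplicated(subset=0)
    # Keep only the flagged rows, i.e. the rows that repeat an earlier keyword
    return combined[combined['dup']]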
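To see why this picks out the competitor’s rows, here is a small illustration of how `duplicated` behaves on stacked data. The two tiny frames are made up for the example; they are not the files from this post.

```python
import pandas as pd

# Made-up data in the same two-column, header-less shape as the CSV files.
mine = pd.DataFrame([["seo audit", "my-site.com"],
                     ["link building", "my-site.com"]])
theirs = pd.DataFrame([["seo audit", "site-a.com"],
                       ["content marketing", "site-a.com"]])

# Your rows come first, so the competitor's copy of a keyword is the later one.
combined = pd.concat([mine, theirs], ignore_index=True)

# duplicated() marks the second and later occurrences of a value in column 0.
flags = combined.duplicated(subset=0)
print(flags.tolist())  # [False, False, True, False]

shared = combined[flags]
print(shared)
```

Only the competitor’s ‘seo audit’ row is flagged, which is why the competitor’s name ends up next to each shared keyword. Note that a keyword listed twice within the same file would be flagged in the same way, so it is worth removing duplicates from each file first.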

Step 4. Create an empty dataFrame called ‘data’, and point the script to the directory that contains the CSV files with your competitors’ keywords. In this case, it is the ‘comp_files’ folder.

data = pd.DataFrame([])
# Folder with the competitors' CSV files, used by the loop in Step 5
directory = 'comp_files/'

Step 5. Run a for loop that goes through the files with your competitors’ keywords, applies the ‘dup_check’ function to each of them, and appends the result to the ‘data’ dataFrame.

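for file in os.listdir(directory):
    # Add the shared keywords found in each competitor file
    # (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
    data = pd.concat([data, dup_check(os.path.join(directory, file))])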

As a result, this dataFrame will be populated with keywords that you and your competitors have in common:

[Image: Competitor Keyword Analysis - Identified Competitors]

Step 6. The final step involves tidying up the above dataFrame: removing the now unnecessary ‘dup’ column, renaming the columns, counting the total number of competitors that also rank for each common keyword, and adding that count, as well as the competitors’ names, next to each keyword.

We can then finally export the result to the ‘shared_keywords.csv’ file in your working directory, next to the file with your original keywords:

[Image: Competitor Keyword Analysis - Shared Keywords File]

Here is the part of the script that does the above:

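# Drop the helper column and give the remaining columns readable names
final_data = data.drop(columns='dup')
final_data = final_data.rename(columns={0: 'shared_keywords', 1: 'competitors'})
# Count how many competitors share each keyword
final_output = final_data.groupby(['shared_keywords']).size().to_frame('count').reset_index().sort_values('count', ascending=False)
# Number the competitors within each keyword (0, 1, 2, ...) so they can be spread across columns
final_data = final_data.sort_values(['shared_keywords'], ascending=True)
final_data['Count'] = final_data.groupby('shared_keywords').cumcount()

# One row per keyword, one column per competitor
out = final_data.pivot(index='shared_keywords', columns='Count', values='competitors').reset_index()
out_merged = pd.merge(final_output, out, on='shared_keywords')

out_merged.to_csv('shared_keywords.csv', index=False)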
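The script above exports only the shared keywords. The keywords that none of your competitors rank for, mentioned at the start of this post, can be found by comparing the shared keywords with your original list. A minimal sketch with made-up data (the frames below only imitate the shape of ‘myKeywords’ and the shared keywords, and the name `unshared` is hypothetical):

```python
import pandas as pd

# Made-up stand-in for myKeywords: column 0 = keyword, column 1 = your domain.
my_keywords = pd.DataFrame([["seo audit", "my-site.com"],
                            ["link building", "my-site.com"],
                            ["keyword research", "my-site.com"]])

# Made-up stand-in for the shared keywords found by the script.
shared = pd.DataFrame({
    "shared_keywords": ["seo audit", "seo audit", "link building"],
    "competitors": ["site-a.com", "site-b.com", "site-a.com"],
})

# Keep the keywords from your list that never appear among the shared ones.
unshared = my_keywords[~my_keywords[0].isin(shared["shared_keywords"])]
print(unshared[0].tolist())  # ['keyword research']
```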

This script can, and actually should, be extended further with the monthly search volumes for each of the keywords and with your competitors’ rankings, to fully understand the competitive landscape for the keywords you want to rank for. You can find the script as well as the example files used in this post here.

Do you have any further suggestions on how to approach competitor keyword analysis using Python? If so, please share them in the comments!
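As a rough idea of what the search volume extension could look like, the sketch below joins a table of monthly search volumes onto the shared keywords. Everything here is an assumption for illustration: the `volumes` table, its column names `keyword` and `monthly_volume`, and the numbers themselves are made up and would come from your own keyword tool.

```python
import pandas as pd

# Made-up stand-in for the shared keywords and their competitor counts.
shared = pd.DataFrame({
    "shared_keywords": ["seo audit", "link building"],
    "count": [2, 1],
})

# Hypothetical search volume export; the figures are invented placeholders.
volumes = pd.DataFrame({
    "keyword": ["seo audit", "link building", "keyword research"],
    "monthly_volume": [1000, 500, 2000],
})

# A left join keeps every shared keyword, even if no volume is available for it.
enriched = (
    shared.merge(volumes, how="left", left_on="shared_keywords", right_on="keyword")
    .drop(columns="keyword")
    .sort_values("monthly_volume", ascending=False)
)
print(enriched)
```

Competitor rankings could be attached in the same way, with one extra column per competitor.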