Archive by Karina Kumykova

Competitor Keyword Analysis With Python

04Jul

Competitor Analysis with Python
Competitor Keyword Analysis – How Can Python Help?

Competitor keyword analysis as part of keyword research is a task every SEO specialist is very familiar with. Finding the right keywords for your new content ideas is great, but assessing the competitive landscape for them is even better. By knowing how many competitors already rank for the same keywords, you can assess your own chances of ranking for them as well.

This tutorial aims to give you an example of how you can automate this part of your keyword research and analysis using Python. By running this script, you will quickly see how many competitors rank for the same keywords as your site, or for the keywords you would like your site to rank for (depending on your goals).

Additionally, by checking the shared keywords against the original keyword pool, you can easily spot keywords that none of your competitors rank for, which can be treated as an opportunity to create additional content around that keyword or group of keywords. In both cases, the expected result after running the script should look like this:
Competitor Keyword Analysis - Expected Result

Why Python and not Excel?

Can we not do this analysis in the tool that we – SEO specialists – know well, without getting involved in coding? By ‘the tool’ we mean Excel, of course. Yes, we can. Unless you have a better suggestion – which you are more than welcome to share in the comments – here is the way we found to do it in Excel.

One possible solution is to pivot the keywords and the competitors to get a count of how many competitors are associated with each keyword. From this, you can assign the number of competitors (‘instances’) to each keyword. You can then use a formula to concatenate the competitors’ names with “Common to” to produce a tidy cell with the insight you were looking for. The formula, as well as the result, will look like this:
Competitor Keyword Analysis in Excel

However, we would still have to copy and paste the formula as values, and paste it again into the empty rows for each keyword that is shared between the competitors. If you have hundreds of keywords in your file, this results in a large amount of manual work that not everyone would be ready to do. This brings us to the pythonic solution for this task.

Folder Set-Up for Your Competitor Keyword Analysis

Create a folder containing a CSV file with the keywords you rank for (or want to rank for) and want to check against your competitors in one column, and the name of your domain in the column next to it. We will call this file ‘my-site’.
Within this folder, add another folder called ‘comp_files’ containing separate CSV files in the same format as your keywords file: column A contains the keywords the site ranks for, and column B the competitor’s name.
Competitor Keyword Analysis - Folder Set-Up
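
For illustration, my-site.csv might look like this (the keywords and domain are hypothetical):

dog insurance,my-site.com
pet insurance quotes,my-site.com
cheap puppy insurance,my-site.com

Each CSV in ‘comp_files’ follows the same two-column pattern, with the competitor’s domain in column B.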

Python Script to Analyse your Competitors’ Keywords

Step 1. Load libraries

import pandas as pd
import numpy as np
import os
import csv

Step 2. Navigate to your working folder and create a ‘myKeywords’ DataFrame that will contain the keywords from the ‘my-site.csv’ file – basically, the keywords you want to check against your competitors.

os.chdir('PATH TO THE FOLDER WITH YOUR KEYWORDS')
myKeywords = pd.read_csv("my-site.csv", header=None)

Step 3. Create the ‘dup_check’ function, which does the following: 1. It appends a competitor’s keywords to the list of your keywords; 2. It then identifies which of your keywords are duplicated among your competitor’s keywords.

def dup_check(comp_file):
    # Read a competitor's keyword file (same two-column format as my-site.csv)
    competitor = pd.read_csv(comp_file, header=None)
    # Stack the competitor's keywords underneath your own
    combined = pd.concat([myKeywords, competitor], ignore_index=True)
    # Flag keywords that appear more than once, i.e. shared keywords
    combined['dup'] = combined.duplicated(subset=0)
    return combined[combined.dup == True]

Step 4. Create an empty DataFrame called ‘data’, and point to the directory that contains the CSV files with your competitors’ keywords – in this case, the ‘comp_files’ folder.

data = pd.DataFrame([])
directory = 'PATH TO THE FOLDER WITH KEYWORDS OF YOUR COMPETITORS'

Step 5. Run a for loop that iterates through the files with your competitors’ keywords, applies the ‘dup_check’ function to each of them, and appends the result to the empty DataFrame ‘data’.

for file in os.listdir(directory):
    # os.path.join avoids missing-slash bugs when building the file path
    data = pd.concat([data, dup_check(os.path.join(directory, file))])

As a result, this DataFrame will be populated with the keywords that you and your competitors have in common:
Competitor Keyword Analysis - Identified Competitors
Step 6. The final step involves tidying up the above DataFrame: removing the now unnecessary ‘dup’ column, renaming the columns, counting the total number of competitors ranking for each shared keyword, and adding that count next to each keyword along with the competitors’ names.
We can then export the result to a ‘shared_keywords.csv’ file in your working directory, next to the file with your original keywords:
Competitor Keyword Analysis - Shared Keywords File
Here is the actual part of the script that does the above:

final_data = data.drop(columns='dup')
final_data.rename(columns={0: 'shared_keywords', 1: 'competitors'}, inplace=True)
# Count how many competitors share each keyword
final_output = final_data.groupby(['shared_keywords']).size().to_frame('count').reset_index().sort_values('count', ascending=False)
final_data = final_data.sort_values('shared_keywords')
# Number the competitors per keyword (0, 1, 2, ...) so they can be pivoted into columns
final_data['Count'] = final_data.groupby('shared_keywords').cumcount()

# One row per shared keyword, one column per competitor
out = final_data.pivot(index='shared_keywords', columns='Count', values='competitors').reset_index()
out_merged = pd.merge(final_output, out, on='shared_keywords')

out_merged.to_csv('shared_keywords.csv', index=False)
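
As mentioned at the start of this post, you can also flip this around and surface the keywords that none of your competitors rank for. A minimal sketch, reusing the DataFrames defined above:

# Keywords from your original pool that no competitor shares –
# potential uncontested content opportunities
uncontested = myKeywords[~myKeywords[0].isin(out_merged['shared_keywords'])]
uncontested.to_csv('uncontested_keywords.csv', index=False)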

This script can, and arguably should, be further extended with the monthly search volumes for each keyword and your competitors’ rankings, to fully understand the competitive landscape for the keywords you want to rank for. You can find the script as well as the example files used in this post here.
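As a rough sketch of that extension – assuming a hypothetical search_volumes.csv with ‘keyword’ and ‘monthly_searches’ columns – the volumes could be merged in like this:

# Hypothetical file: one row per keyword with its monthly search volume
volumes = pd.read_csv('search_volumes.csv')
out_merged = pd.merge(out_merged, volumes,
                      left_on='shared_keywords', right_on='keyword',
                      how='left').drop(columns='keyword')
out_merged.to_csv('shared_keywords_with_volumes.csv', index=False)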

Do you have any further suggestions on how to approach competitor keyword analysis using Python? If so please share in the comments!

Analyse search queries with Python to help with your PPC and SEO efforts

02May

Anyone working in SEO or PPC is exposed to a large amount of data from various third-party programs. To get the right insights for your next steps, you need the right tools to analyse all this data. Excel is undoubtedly one of them, but as we often learn from experience, it isn’t perfect when it comes to data analysis at scale.

In this post we are going to show you an example of how to use Python instead of Excel to analyse a search queries report. The main goal of this analysis is to extract insights that will help you optimise your future PPC campaigns and improve the CTRs of your organic listings.

Search queries report – Why analyse?

The main data source used in this post is the data export from the Google Analytics Search Queries report: Acquisition > Search Queries

This report shows you not only the keywords you are bidding on, but the actual queries that triggered your Google Ads and drove traffic to your site.

Search query data offers insight into which queries are not working and which are working better than you might have thought. We can also use the report to learn which keywords produce sales and are therefore worth targeting.

Background and expected results

The idea for this script came from our PPC team, who were trying to find a way in Excel to extract words and phrases that did not convert in previous campaigns. Pivot tables, various lookups and macros were tested, but none of them – separately or in combination – delivered the appropriate results.

The first part of the Python script we are going to show here does exactly this. As a result, the user is given a file with the words and phrases that did not convert in the past, together with the total cost of the ads that used them.

The second part of the script shows how the same data from the Search Queries report can be used to optimise your pages for search engines. More specifically, we are going to look up the search queries with transactions from this report in the pool of search queries from Search Console. This can obviously be done in Excel, but we would like to show how it can also be done using pandas – the “Python Data Analysis Library”. The goal of this exercise is to see whether the content of your site is optimised for the search queries that have converted in the past. More on this later.

Programming environment

For this analysis, I will be using Jupyter Notebook as my editor. I will assume that you are familiar with it and have a basic understanding of the pandas library. For all the code as well as the example files used in this post, click here.

Script explained step-by-step

Part I. Data import and preprocessing

Create a folder with two csv files:

  • input_data.csv should contain the search queries from your Search Queries report together with their clicks, cost, transactions and bounce rate (no headers, please!).
  • search_console_data.csv should contain data from Search Console, including queries and their impressions, clicks, CTRs and positions.
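
For illustration, a few hypothetical input_data.csv rows might look like this (query, clicks, cost, transactions, bounce rate):

insurance for computer programmer,12,35.40,0,100%
dance teacher insurance,8,21.10,0,100%
programmer liability insurance,25,60.00,3,45%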

import pandas as pd
import numpy as np
import os
import csv

First, we need to load the libraries that contain the methods/functions for manipulating our data files and navigating through folders: pandas, numpy, os and csv.

os.chdir('YOUR-PATH')

Navigate to the folder that contains your input files using the .chdir() method. Basically, replace YOUR-PATH with the path to this folder.

records = []
with open('input_data.csv') as f:
    for row in csv.reader(f):
        keyword_column = row[0]
        clicks_column = int(row[1])
        cost_column = float(row[2])
        transactions_column = int(row[3])
        # Split each search query into its individual words
        for word in keyword_column.split():
            records.append({'keyword': word, 'clicks': clicks_column,
                            'cost': cost_column, 'transactions': transactions_column})
output_df = pd.DataFrame(records, columns=['keyword', 'clicks', 'cost', 'transactions'])
output_df

We start working with the data by splitting all the search queries into separate words and attaching each word’s clicks, transactions and cost:
search-queries-analysis-step-1

output_df.dtypes

Before we start working with the data in the output_df data frame, we need to tidy up the returned data types. Applying dtypes gives you the data type of each column in output_df:
python-data-types-1

output_df.clicks = output_df.clicks.astype('int64', copy=False)
output_df.transactions = output_df.transactions.astype('int64', copy=False)
output_df.keyword = output_df.keyword.str.lower()
output_df

If clicks comes back as an object data type and transactions as floating-point numbers (also called floats – numbers with a decimal point dividing the integer and fractional parts), convert them into integers (whole numbers with no decimal point) using astype. We also apply the lower() function to the returned words to clean up any that contain capital letters:
search-queries-analysis-step-2

final_df = output_df.groupby("keyword").sum()
final_df.reset_index(inplace=True)
final_df

Now we are going to sum up clicks, costs and transactions for each word. The result will be saved in the final_df data frame:
search-queries-analysis-step-3

final_output_result_no_convs = final_df[final_df.transactions == 0]
final_output_result_no_convs
final_output_result_no_convs.to_csv('1-lookup_words_no_convs.csv', index = False)

Identify and save words with 0 transactions in a separate csv file 1-lookup_words_no_convs.csv:
search-queries-analysis-step-4.png

Part II. Find words and phrases that did not convert

The goal of this part of the script is to find the words and phrases in your input_data.csv file that did not convert in the past. These are now saved in the 1-lookup_words_no_convs.csv file.

with open('1-lookup_words_no_convs.csv', 'r') as file:
    reader = csv.reader(file)
    next(reader)  # skip the header row written by to_csv
    lookup_words_no_convs = list(reader)
lookup_words_no_convs
lookup_words_no_convs_words = [item[0] for item in lookup_words_no_convs]
lookup_words_no_convs_words

Read the file with words that did not convert, and save them into a list:
search-queries-analysis-step-5

my_ads_pool = pd.read_csv('input_data.csv', names=["keyword_phrases", "clicks", "cost", "transactions", "bounce_rate"])
my_ads_pool

Read the file with all your search queries, and save them into a data frame:
search-queries-analysis-step-6.png

my_ads_pool.dtypes

Bounce_rate is loaded as an object, and needs to be converted:
search-queries-analysis-step-7

my_ads_pool['bounce_rate'] = pd.to_numeric(my_ads_pool['bounce_rate'].astype(str).str.strip('%'), errors='coerce')
my_ads_pool

We need to remove the % sign to be able to work with bounce rate values:

search-queries-analysis-step-8

def getKeywordsFrom(keywords):
    result = ""
    # Collect every word from the query that appears in the no-conversion list
    for keyword in keywords.split():
        if keyword in lookup_words_no_convs_words:
            result = result + " " + keyword
    return result
print(getKeywordsFrom('insurance for computer programmer'))

The function above is what makes the whole thing work. It splits each search query into words and checks whether any of them appear in the list of words with no conversions. If found, they are returned.

my_ads_pool['bad_to_ok_keywords'] = my_ads_pool['keyword_phrases'].apply(getKeywordsFrom)
filtered_df = my_ads_pool.sort_values(by='bad_to_ok_keywords')
filtered_df

Next, apply the function above to the data frame with your search queries. This adds a “bad_to_ok_keywords” column; where words with no transactions are found, they are returned here:
search-queries-analysis-step-10

final_no_convs = filtered_df[filtered_df.bad_to_ok_keywords.str.contains(' ')]
final_no_convs

Remove the rows where no non-converting words were found:
search-queries-analysis-step-11

final_df_avg = final_no_convs.groupby("bad_to_ok_keywords").mean(numeric_only=True).drop(['clicks', 'cost', 'transactions'], axis=1)
final_df_avg.reset_index(inplace=True)
final_df_avg_sorted = final_df_avg.sort_values(by='bad_to_ok_keywords', ascending=True)
final_df_avg_sorted

Get bounce rate averages for duplicate words and phrases. This will help us later to split them into bad and ok keywords:
search-queries-analysis-step-12

final_df_sum = final_no_convs.groupby("bad_to_ok_keywords").sum(numeric_only=True).drop(['bounce_rate'], axis=1)
final_df_sum.reset_index(inplace=True)
final_df_sum_sorted = final_df_sum.sort_values(by='bad_to_ok_keywords', ascending=True)
final_df_sum_sorted

Sum up clicks and cost, and add them in the column next to bad_to_ok_keywords:
search-queries-analysis-step-13

final_sums_and_avgs = pd.merge(final_df_sum_sorted, final_df_avg_sorted, on='bad_to_ok_keywords')
final_sums_and_avgs

Merge final_df_sum_sorted and final_df_avg_sorted:
search-queries-analysis-step-14

final_sums_and_avgs.columns = ['bad_to_ok_keywords', 'clicks_sum',  'cost_sum', 'transactions_sum', 'bounce_rate_avg']
final_sums_and_avgs

Rename the columns to indicate sums and avgs:
search-queries-analysis-step-15

final_sums_and_avgs['category'] = np.where(final_sums_and_avgs['bounce_rate_avg']<100, 'ok', 'bad')
final_sums_and_avgs

Categorise words and phrases as “ok” or “bad” based on the following criteria:

  • bounce rate of 100% – bad
  • bounce rate below 100% – ok
search-queries-analysis-step-16

final_sums_and_avgs['bounce_rate_avg'] = final_sums_and_avgs['bounce_rate_avg'].astype(int)
final_sums_and_avgs

Convert the bounce rate averages to integers:
search-queries-analysis-step-17

final_no_convs_pool = final_no_convs.reset_index(drop=True)
final_no_convs_pool

Save the search queries that include non-converting words/phrases in a new data frame, final_no_convs_pool:
search-queries-analysis-step-18

final_result = pd.merge(final_no_convs_pool, final_sums_and_avgs, on='bad_to_ok_keywords')
final_result_drop_columns = final_result.drop(['transactions_sum','bounce_rate'], axis=1)
final_result_drop_columns

Add the click and cost sums and bounce rate averages next to the identified words and phrases in the above data frame:
search-queries-analysis-step-18

final_result_drop_columns.to_csv('2-ppc_bad_to_ok.csv', index = False)

Save to csv.

The result is a data frame, saved to a csv file, that contains your original search query data together with the words and phrases from those queries that did not convert in the past, as well as the total clicks and cost spent on the ads triggered by these queries. Additionally, we categorised these words and phrases into two groups based on their bounce rates: those with bounce rates lower than 100% are “ok”, and the rest are “bad” words and phrases that should definitely not be used in the next campaigns.

By analysing the words and phrases in the “bad_to_ok_keywords” column you can quickly identify related query groups that are less likely to convert. For example, search queries with “dance”, “dance class”, “dance instructor”, “dance instructors”, “dance teacher” and “dance teachers” did not convert; these clearly target the same group of searchers.

Part III. Search for converted queries in Search Console data

The goal of this part of the script is to see whether we can find search queries from the Search Queries report in the Search Console data export. We want to see if we already rank for them, or if we even have relevant content that addresses those queries. If we do, we can then increase the prominence of those keywords by making sure they are present in key areas, such as titles, H1 tags and descriptions, as well as in the anchor text of selected internal links.

inputData = pd.read_csv('input_data.csv', names=["keyword_phrases", "clicks", "cost", "transactions", "bounce_rate"])
inputData

Read the csv file with data export from Search Queries report:
search-queries-analysis-step-20

inputData_with_convs = inputData[inputData.transactions != 0]
inputData_with_convs

Filter out search queries with no conversions:
search-queries-analysis-step-21

searchConsole = pd.read_csv('search_console_data.csv', names=["keyword_phrases", "clicks", "impressions", "ctr", "position"])
searchConsole

Import Search Console data into a data frame:
search-queries-analysis-step-22

vlookup = pd.merge(inputData_with_convs, searchConsole, on='keyword_phrases')
vlookup

Merge with the inputData_with_convs frame to return the search queries found in both the Search Console data and GA’s Search Queries report. You now have a table of search queries, with their metrics from GA and Search Console, that converted in the past and are also present in organic search. Optimise for these queries first if needed:
search-queries-analysis-step-23

vlookup_new = vlookup.drop(['clicks_y','impressions','ctr','position'], axis=1)
vlookup_new.columns = ['keyword_phrases','clicks', 'cost','transactions','bounce_rate']
vlookup_new

Remove the clicks_y, impressions, ctr and position columns from the vlookup data frame, and rename the remaining columns:
search-queries-analysis-step-24

vlookup_new.to_csv('3-converted_found_in_search_console.csv', index = False)

Save to csv.

We now want to find the search queries that did convert but cannot be found in Search Console. These are the queries we will potentially want to include in our content.

joined_frames = pd.concat([vlookup_new, inputData_with_convs]).sort_values(by='keyword_phrases', ascending=False)
not_found_in_search_console_but_converted = joined_frames.drop_duplicates(keep=False)
not_found_in_search_console_but_converted

Append one data frame to the other and remove all duplicate rows. What remains are the search queries that converted but for which we currently have no organic visibility, as they were not found in Search Console:
search-queries-analysis-step-25

not_found_in_search_console_but_converted.to_csv('4-converted_not_found_in_search_console.csv', index = False)

Export to csv.

Summary

After running the script on your own data you should find four additional csv files saved in the same folder as your input data:

  • 1-lookup_words_no_convs.csv that includes separate words that did not convert.
  • 2-ppc_bad_to_ok.csv that includes your original input data next to the words and phrases from the above file.
  • 3-converted_found_in_search_console.csv that includes search queries found in both the Search Queries report and your Search Console data.
  • 4-converted_not_found_in_search_console.csv that contains search queries with no organic visibility to use for further optimisation of your content.

If you have more ideas on how to use Python to analyse search queries for SEO and/or PPC, let me know in the comments!

Index Match Match – an Alternative to Vlookup for SEO Analysis

20Nov

You would be hard-pressed to find an SEO specialist that has never used VLOOKUP. Nowadays we have access to a massive amount of data from various tools, and the most powerful SEO analysis can be achieved when we connect all these data points, which often means looking up values from different sources.

In this post, we want to give you examples on how and when you can use INDEX MATCH and its more powerful version INDEX MATCH MATCH as an alternative to VLOOKUP.

VLOOKUP is a great formula, but it has its limits, the main one being that it only works when pulling data from left to right. This is where INDEX MATCH comes in: it can pull from right to left as well as left to right.

An important thing to note is that by switching the columns around in the master sheet we could complete this task with VLOOKUP; however, doing so may break other formulas tied to those cells. Plus, it is always easier to work with a document’s original output format, as it lets you paste new values into the document without breaking the formula.

Here’s an example of how we’ve used it recently. We were given a list of product numbers, with which we needed to find the associated keyword.

This was the spreadsheet we had to work with, and as you can see it contains both the Product No. and the Keywords we are looking for. A VLOOKUP will not work here because we have a list of product numbers and need to find the associated keywords, which sit further left than the product numbers.

First things first, we created a column for the product numbers we have on hand. Then we set up the formula, which should be placed in the first empty cell of the “Keyword” column in the second table in the image below (cell C14).

The formula is as follows:

=INDEX(C3:C11,MATCH(B14,E3:E11,0))

INDEX (the range of cells you want to return values from,

MATCH (the cell whose corresponding value you want to find, the range of cells that value sits in, 0))

Once the formula has been entered, you can then drag it downwards to the bottom of the table, allowing you to put in your Product No. values and have the formula pull the keyword automatically.

Now that we know why and when we should use INDEX MATCH instead of VLOOKUP, we can look at a more powerful Excel lookup formula combination which is INDEX MATCH MATCH.

How can it be useful for SEO?

“Internal – All – Export” – any regular Screaming Frog user will recognise these steps for exporting crawl data. We all know what a great tool it is and cannot imagine our work without it. But sometimes we wish it would give us only the data we need, depending on which metrics we want to look up. And this is where INDEX MATCH MATCH can come in very handy.

The number of columns in the spreadsheet exported from the Internal tab can vary depending on the elements found on the site (rarely, but sometimes you see sites with two titles added to their pages, or even two canonical tags) and on whether any additional data is included (e.g. Google Search Console and Google Analytics data).

INDEX MATCH MATCH allows you to perform a two-way lookup: a vertical one (to find certain URLs among all the URLs in the leftmost “Address” column of the crawl) and a horizontal one (to match the data from certain columns of your Screaming Frog crawl to those URLs).

An example will help us to see how it actually works.

The table below represents a Screaming Frog crawl. In this example we want to look up the values from the “Meta Robots” and “Canonical Link Element” columns for URLs 4, 7 and 9 (all marked in orange).

To do so we are using INDEX MATCH MATCH as shown below:

The result, as expected, will be the meta robots and canonical URL values for the three URLs:

So, how do we actually get to this result using INDEX MATCH MATCH formula combination?

Below you will find the same images as above but with a coloured markup that should help further explain the formula. However, for clarity, let us review it step-by-step.

We added the formula to cell C16, as per the image below (or add it next to the first URL you want to look up the metric values for):

  1. INDEX (the range covering your entire crawl,
  2. MATCH (lookup value – the first URL to look up, the column we want to match it against, 0),
  3. MATCH (lookup value – the first metric to look up, the header row we want to match it against, 0))

As you can see, we use the $ sign in our INDEX MATCH MATCH formula combination, which allows us to lock columns and rows. With the $ signs added as shown above, you can easily drag the formula down to populate the lookup values for the remaining URLs.
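
If you are following the Python posts above, the same two-way lookup can be sketched in pandas. This is a minimal, hypothetical example – the file name, URLs and column headers are assumptions based on a typical Screaming Frog export:

import pandas as pd

# Hypothetical Screaming Frog "Internal - All" export, indexed by URL
crawl = pd.read_csv('internal_all.csv').set_index('Address')

urls = ['https://example.com/url-4', 'https://example.com/url-7', 'https://example.com/url-9']
metrics = ['Meta Robots 1', 'Canonical Link Element 1']

# .loc performs the two-way lookup: rows matched by URL, columns by metric name
print(crawl.loc[urls, metrics])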

We hope you will give INDEX MATCH MATCH a go as a VLOOKUP alternative in your next SEO analysis, if you haven’t already done so. Here at Passion Digital we love to share knowledge and learn from our peers, so if you have any other use cases where these formula combinations can be applied to everyday SEO analysis, feel free to get in touch and let us know!
