02/05/2019

11 minutes to read

Anyone working in SEO or PPC is exposed to a large amount of data from various third-party programs. In order to get the right insights for your next steps of action, you will need to use the right tools to analyse all this data. Excel is undoubtedly one of them, but as we often learn from experience, when it comes to data analysis at scale, it isn’t perfect.

In this post we are going to show you an example of how to use Python instead of Excel to analyse a search queries report from Google Search Console. The main goal of this analysis is to extract the insights that will help to optimise your future PPC campaigns and improve CTRs of your organic listings.

Search queries report – Why analyse?

The main data source used in this post is the data export from Google Analytics Search Queries report: Acquisition > Search Queries

This report shows you not only the keywords that you are bidding on, but the actual queries that resulted in a display of your Google Ads and have driven traffic to your site.

Search query data offers you insight into which queries are not working and which queries are working better than you might have thought. We can also use the report to learn which keywords produce sales, to target.

Background and expected results

The idea for this script came from our PPC team who were trying to find a way in Excel to extract words and phrases that did not convert in the previous campaigns. Pivot tables, various lookups and macros were tested to complete the task, however none of them – separately or used in combination – were delivering the appropriate results.

The Python script we are going to show here does this in its first part. As a result the user will be given a file with words and phrases that did not convert in the past together with the total amount of cost spent for the ads that had been using these words and phrases.

The second part of the script is showing a way on how the same data from the Search Queries report can be used to optimise your pages for search engines. More specifically, we are going to look up search queries with transactions from this report in the search queries pool from Search Console. This can obviously be done in Excel, but we would like to show you how this can also be done using pandas – “Python Data Analysis Library”. The goal of this exercise is to see if the content of your site is optimised for the search queries that have converted in the past. More on this later.

Programming environment

For this analysis, I will be using the Jupyter Notebook as my editor. I will assume that you are familiar with it and have some basic understanding of pandas library. For all the codes as well as example files used in this post, click here.

Script explained step-by-step

Part I. Data import and preprocessing

Create a folder with two csv files: input_data.csv should contain search queries from your Search Queries report together with their clicks, cost, transactions and bounce rate. (No headers, please!), search_console_data.csv should contain data from Search Console including queries, their impressions, clicks, CTRs and positions.

import pandas as pd
import numpy as np
import os
import csv

First, we will need to load libraries that include methods/functions to manipulate the files with our data and navigate through the folders. These are going to be pandas, numpy, os and csv.

os.chdir('YOUR-PATH')

Navigate to your folder that contains your input files using .chdir() method. Basically, replace YOUR-PATH with the path to this folder.

output_df = pd.DataFrame(columns=['keyword', 'clicks'])
with open('input_data.csv') as f:
    rows = csv.reader(f)
    for row in rows:
        keyword_column = row[0]
        clicks_column = int(row[1])
        cost_column = float(row[2])
        transactions_column = int(row[3])
        keywords = keyword_column.split()
        for word in keywords:
            output_df = output_df.append({'keyword': word, 'clicks': clicks_column,'cost': cost_column, 'transactions': transactions_column}, ignore_index=True)
output_df

We will start working with the data by splitting all search queries into separate words and attaching to them their clicks, transactions and their costs:

output_df.dtypes

Before starting working with the data in the output_df data frame, we will need to tidy up returned data types. By applying dtypes you will get data type of each column in your output_df data frame:

output_df.clicks = output_df.clicks.astype('int64', copy=False)
output_df.transactions = output_df.transactions.astype('int64', copy=False)
output_df.keyword = output_df.keyword.str.lower()
output_df

As object data type was returned for clicks, and transactions as floating-point numbers (also called floats – numbers with a decimal point dividing the integer and fractional parts), we will need to convert them into integers (positive or negative whole numbers with no decimal point) using astype. We will also apply lower() function on the returned words just to make them cleaner in case we have some words with capital letters in them:

final_df = output_df.groupby("keyword").sum()
final_df.reset_index(inplace=True)
final_df

Now we are going to sum up clicks, costs and transactions for each word. The result will be saved in the final_df data frame:

final_output_result_no_convs = final_df[final_df.transactions == 0]
final_output_result_no_convs
final_output_result_no_convs.to_csv('1-lookup_words_no_convs.csv', index = False)

Identify and save words with 0 transactions in a separate csv file 1-lookup_words_no_convs.csv:

Part II. Find words and phrases that did not convert

The goal of this part of the script is to find words and phrases that in your input_data.csv file that did not convert in the past. These are now saved in the 1-lookup_words_no_convs csv file.

with open('1-lookup_words_no_convs.csv', 'r') as file:
  reader = csv.reader(file)
  lookup_words_no_convs = list(reader)
lookup_words_no_convs
lookup_words_no_convs_words = [item[0] for item in lookup_words_no_convs]
lookup_words_no_convs_words

Read the file with words that did not convert, and save them into a list:

my_ads_pool = pd.read_csv('input_data.csv',names = ["keyword_phrases", "clicks", "cost", "transactions", "bounce_rate"])
my_ads_pool

Read the file with all your search queries, and save them into a data frame:

my_ads_pool.dtypes

Bounce_rate is loaded as an object, and needs to be converted:

my_ads_pool['bounce_rate'] = pd.to_numeric(my_ads_pool['bounce_rate'].astype(str).str.strip('%'), errors='coerce')
my_ads_pool

We need to remove the % sign to be able to work with bounce rate values:

def getKeywordsFrom(keywords):
    result = ""
    for keyword in keywords.split():
        for lookup_keyword in lookup_words_no_convs_words:
            if keyword == lookup_keyword:
                result = result + " " + keyword
    return result
print(getKeywordsFrom('insurance for computer programmer'))

The function above is what makes the whole thing work. It splits your search queries into words, and checks if any of the words from the list with no conversions can be found there. If found, they are returned.

my_ads_pool['bad_to_ok_keywords'] = my_ads_pool['keyword_phrases'].apply(
    lambda keywords: getKeywordsFrom(keywords)
)
fitered_df = my_ads_pool.sort_values(by='bad_to_ok_keywords')
fitered_df

Next, apply the function above on the “your data frame” with search queries. Add a “bad_to_ok_keywords” column, and if the words with no transactions are found, they will be returned here:

final_no_convs = fitered_df[fitered_df.bad_to_ok_keywords.str.contains(' ')]
final_no_convs

Remove empty rows:

final_df_avg = final_no_convs.groupby("bad_to_ok_keywords").mean().drop(['clicks', 'cost', 'transactions'], axis=1)
final_df_avg.reset_index(inplace=True)
final_df_avg_sorted = final_df_avg.sort_values(by='bad_to_ok_keywords', ascending=True)
final_df_avg_sorted

Get bounce rate averages for duplicate words and phrases. This will help us later to split them into bad and ok keywords:

final_df_sum = final_no_convs.groupby("bad_to_ok_keywords").sum().drop(['bounce_rate'], axis = 1)
final_df_sum.reset_index(inplace=True)
final_df_sum_sorted = final_df_sum.sort_values(by='bad_to_ok_keywords', ascending=True)
final_df_sum_sorted

Sum up clicks and cost, and add them in the column next to bad_to_ok_keywords:

final_sums_and_avgs = pd.merge(final_df_sum_sorted, final_df_avg_sorted, on='bad_to_ok_keywords')
final_sums_and_avgs

Concatenate final_df_avg_sorted and final_df_sum_sorted:

final_sums_and_avgs.columns = ['bad_to_ok_keywords', 'clicks_sum',  'cost_sum', 'transactions_sum', 'bounce_rate_avg']
final_sums_and_avgs

Rename the columns to indicate sums and avgs:

final_sums_and_avgs['category'] = np.where(final_sums_and_avgs['bounce_rate_avg']<100, 'ok', 'bad')
final_sums_and_avgs

Categorise words and phrases as “ok” or “bad” based on the following criteria:
=100% bounce rate – bad
<100% bounce rate – ok

final_sums_and_avgs['bounce_rate_avg'] = pd.to_numeric(final_sums_and_avgs['bounce_rate_avg'].astype(int))
final_sums_and_avgs

Turn bounce rate to integer:

final_no_convs_pool = final_no_convs.reset_index(drop=True)
final_no_convs_pool

Save the dataframe with search queries that include words/phrases that did not convert in the new data frame: final_no_convs_pool:

final_result = pd.merge(final_no_convs_pool, final_sums_and_avgs, on='bad_to_ok_keywords')
final_result_drop_columns = final_result.drop(['transactions_sum','bounce_rate'], axis=1)
final_result_drop_columns

Add cost, click sums and bounce rate averages next to identified words and phrases in the above dataframe:

final_result_drop_columns.to_csv('2-ppc_bad_to_ok.csv', index = False)

Save to csv.

The result is a data frame saved to a csv file that contains your original search queries data with added words and phrases from these search queries that did not convert in the past, as well as the total amount of clicks and costs spent on the ads triggered by these search queries. Additionally, we categorised these words and phrases into two groups based on their bounce rates with bounce rates lower than 100% being “ok” and the rest being “bad” words and phrases that should definitely not be used in the next campaigns.

By analysing words and phrases from the “bad_to_ok_keywords” column you can quickly identify related query groups that are less likely to convert. For example, search queries with “dance”, “dance class”, “dance instructor”, “dance instructors”, “dance teacher” and “dance teachers” did not convert. These do clearly target the same group of search users.

Part III. Search for converted queries in Search Console data

The goal of this bit of the script is to see if we can find search queries from Search Queries report in the Search Console data export. We want to see if we already rank for them, or if we even have relevant content that addresses those queries. If we do, we can then increase the prominence of those keywords by making sure they are present in key areas, such as titles, H1 tags, and descriptions, as well as in the anchor text of selected internal links.

inputData = pd.read_csv('input_data.csv',names = ["keyword_phrases", "clicks", "cost", "transactions", "bounce_rate"])
inputData

Read the csv file with data export from Search Queries report:

inputData_with_convs = inputData[(inputData[['transactions']] != 0).all(axis=1)]
inputData_with_convs

Filter out search queries with no conversions:

searchConsole = pd.read_csv('search_console_data.csv',names = ["keyword_phrases", "clicks", "impressions", "ctr", "position"])
searchConsole

Import Search Console data into a data frame:

vlookup = pd.merge(inputData_with_convs, searchConsole, on='keyword_phrases')
vlookup

Merge with inputData_with_convs frame, and return search queries that can be found in both: in Search Console data and in GA’s Search Queries report. You have now a table with search queries and their metrics from GA and SC that did convert in the past but also presented in organic search. Optimise these queries first if needed:

vlookup_new = vlookup.drop(['clicks_y','impressions','ctr','position'], axis=1)
vlookup_new.columns = ['keyword_phrases','clicks', 'cost','transactions','bounce_rate']
vlookup_new

Remove clicks_y, impressions, ctr, position columns from vlookup data frame:

vlookup_new.to_csv('3-converted_found_in_search_console.csv', index = False)

Save to csv.

We now want to find search queries that did convert but cannot be found in Search Console. These are the queries that we will potentially want to include into our content.

joined_frames = vlookup_new.append(inputData_with_convs).sort_values(by='keyword_phrases', ascending=False)
not_found_in_search_console_but_converted = joined_frames.drop_duplicates(keep=False)
not_found_in_search_console_but_converted

Append one data frame to another. Remove duplicate rows. Below are search queries that did convert but we do not have any visibility for them at the moment as these were not found in Search Console:

not_found_in_search_console_but_converted.to_csv('4-converted_not_found_in_search_console.csv', index = False)

Export to csv.

Summary

After running the script on your own data you should find four additional csv files saved in the same folder as your input data:

1-lookup_words_no_convs.csv that includes separate words that did not convert.
2-ppc_bad_to_ok.csv that includes your original input data next to the words and phrases from the above file.
3-converted_found_in_search_console.csv that includes search queries found in both – Search Queries report and your Search Console data.
4-converted_not_found_in_search_console.csv that contains search queries with no organic visibility to use for further optimisation of your content.

If you have more ideas on how to use Python to analyse search queries for SEO and/or PPC, let me know in the comments!

Karina Kumykova

Analyse search queries with Python to help with your PPC and SEO efforts