From google webmaster central you can download data for your website. You can create one excel file using downloaded data.
Here is an sample excel file with some sample data downloaded from google webmaster central. In this file there are 50 rows of data. Each row contains one Query field and one Clicks field. The Clicks field says how many visitors has clicked and visited the website by using the given Query.
import pandas as pd
my_data=pd.read_excel("kw1_short.xlsx")
print(my_data)
This will display all 50 records from the excel file. Now using this code we will develop further code.
Listing Queries ( rows ) based on searched string
We used search word 'HTML' and applied case insensitive search by using str.contains.
import pandas as pd
my_data=pd.read_excel("kw1_short.xlsx")
my_data=my_data[my_data.Query.str.contains('HTML',case=False)]
print(my_data)
Output is here
Query Clicks
6 marquee speed html 406
7 html calendar 387
8 html marquee speed 356
9 marquee html speed 341
10 calendar html code 326
12 html calendar code 289
18 calendar html 197
21 marquee behaviour in html 183
23 image button html 178
24 how to select multiple options from a drop dow... 172
29 html event calendar code generator 165
30 image button in html 164
32 html marquee behavior 127
38 paired tags in html 120
40 types of tags in html 117
Note that our DataFrame my_data now contains filtered rows from the main or full DataFrame based on presence of word HTML. This filtered DataFrame will be used in our further processing.
After collecting the filtered rows we used count() to get the number of rows.
print(my_data.Query.count())
Output of count()
15
Find out total number of Clicks for the queries having the word HTML.
We will use sum() for this. Just add this code at the end of the above code.
print(my_data.Clicks.sum())
Ouptut
3528
Find out the query which has generated maximum number of Click having the word HTML
We will use max(). Note that by only using max() we will get the highest Click ( 406 ) but we want the row having the highest click. The 2nd line of code is used for this.
Here 406 is the maximum click and the 6th row with Query marquee speed html generated the highest Clicks.
Find out the shortest query having the word HTML in it
We are using the copy of a slice from main DataFrame ( my_data) which has all the rows having HTML in it. We can get length of the string data by using len().
Now we will find out the length of each Query column. For this we will create one more column NUMBER and then display the DATAFRAME in decreasing order of NUMBER column by using sort_values(). While displaying we will restrict the display to one row only by using iloc[0] .
As you can see the minimum length query is html calendar is of length 13. But as we are displaying one row only, we may miss if more than one column is having length equal to 13. We will modify the code to get all the rows having minimum length of query.
Query Clicks
24 how to select multiple options from a drop dow... 172
Here is the full code. Lines are commented with descriptions, you can remove the comment and check the code.
import pandas as pd
my_data1=pd.read_excel("kw1_short.xlsx")
my_data=my_data1[my_data1.Query.str.contains('HTML',case=False)]
#print(my_data)
#my_data['LENGTH']=my_data['Query'].str.len()
#print(my_data)
#print(my_data.sort_values(by=['LENGTH']).iloc[0]) # shortest query ( one row only)
#print(my_data.Query.count()) # Total number of rows
#print(my_data.Clicks.sum()) # sum of clicks
#print(my_data.Clicks.max()) # maximum clicks
#print(my_data[my_data['Clicks']==my_data.Clicks.max()]) # row with maximum clicks
#print(my_data[my_data['Query'].str.len()==my_data['Query'].str.len().min()]) # shortest Query
#print(my_data[my_data['Query'].str.len()==my_data['Query'].str.len().max()]) # longest Query