Exercise : Pandas and analytics

Pandas

From google webmaster central you can download data for your website. You can create one excel file using downloaded data.

Here is an sample excel file with some sample data downloaded from google webmaster central. In this file there are 50 rows of data. Each row contains one Query field and one Clicks field. The Clicks field says how many visitors has clicked and visited the website by using the given Query.

Download kw1_short.xlsx file

  1. Read the excel file and create a Pandas dataframe.
  2. List all queries which has the word HTML within it.
  3. Find out total number of Clicks for the queries having the word HTML.
  4. Find out the query which has generated maximum number of Click having the word HTML.
  5. Find out the shortest query having the word HTML in it.
  6. Find out the longest query having the word HTML in it.

Reading and creating DataFrame

You can read on how to use read_excel and create one DataFrame.
import pandas as pd 
my_data=pd.read_excel("kw1_short.xlsx")
print(my_data)
This will display all 50 records from the excel file. Now using this code we will develop further code.

Listing Queries ( rows ) based on searched string

We used search word 'HTML' and applied case insensitive search by using str.contains.
import pandas as pd 
my_data=pd.read_excel("kw1_short.xlsx")
my_data=my_data[my_data.Query.str.contains('HTML',case=False)]
print(my_data)
Output is here
                                                Query  Clicks
6                                  marquee speed html     406
7                                       html calendar     387
8                                  html marquee speed     356
9                                  marquee html speed     341
10                                 calendar html code     326
12                                 html calendar code     289
18                                      calendar html     197
21                          marquee behaviour in html     183
23                                  image button html     178
24  how to select multiple options from a drop dow...     172
29                 html event calendar code generator     165
30                               image button in html     164
32                              html marquee behavior     127
38                                paired tags in html     120
40                              types of tags in html     117
After collecting the filtered rows we used count() to get the number of rows.
print(my_data.Query.count())
Output of count()
15

Find out total number of Clicks for the queries having the word HTML.

We will use sum() for this. Just add this code at the end of the above code.
print(my_data.Clicks.sum())
Ouptut
3528

Find out the query which has generated maximum number of Click having the word HTML

We will use max(). Note that by only using max() we will get the highest Click ( 406 ) but we want the row having the highest click. The 2nd line of code is used for this.
print(my_data.Clicks.max())
print(my_data[my_data['Clicks']==my_data.Clicks.max()])
Output
406
                Query  Clicks
6  marquee speed html     406
Here 406 is the maximum click and the 6th row wtih Query marquee speed html generated the highest Clicks.

Find out the shortest query having the word HTML in it

We are using the copy of a slice from main DataFrame ( my_data) which has all the rows having HTML in it. We can get length of the string data by using len().

Now we will find out the length of each Query column. For this we will create one more column NUMBER and then display the DATAFRAME in decreasing order of NUMBER column by using sort_values(). While displaying we will restrict the display to one row only by using iloc[0] .
my_data['LENGTH']=my_data['Query'].str.len()
print(my_data.sort_values(by=['LENGTH']).iloc[0])
Output is here
Query     html calendar
Clicks              387
length               13
As you can see the minimum legth query is html calendar is of length 13. But as we are displaying one row only, we may miss if more than one column is having length equal to 13. We will modify the code to get all the rows having minimum length of query.
print(my_data[my_data['Query'].str.len()==my_data['Query'].str.len().min()])
Output is here
            Query  Clicks  LENGTH
7   html calendar     387      13
18  calendar html     197      13

Find out the longest query having the word HTML in it

Based on the similar line ( as above ) we will find out the longest Query.
print(my_data[my_data['Query'].str.len()==my_data['Query'].str.len().max()])
Here is the output
                                                Query  Clicks
24  how to select multiple options from a drop dow...     172
Here is the full code. Lines are commented with descriptions, you can remove the comment and check the code.
import pandas as pd 
my_data1=pd.read_excel("kw1_short.xlsx")
my_data=my_data1[my_data1.Query.str.contains('HTML',case=False)]
#print(my_data)
#my_data['LENGTH']=my_data['Query'].str.len()
#print(my_data)
#print(my_data.sort_values(by=['LENGTH']).iloc[0]) # shortest query ( one row only)
#print(my_data.Query.count()) # Total number of rows 
#print(my_data.Clicks.sum()) # sum of clicks
#print(my_data.Clicks.max()) # maximum clicks
#print(my_data[my_data['Clicks']==my_data.Clicks.max()]) # row with maximum clicks
#print(my_data[my_data['Query'].str.len()==my_data['Query'].str.len().min()]) # shortest Query
#print(my_data[my_data['Query'].str.len()==my_data['Query'].str.len().max()]) # longest Query
Pandas Pandas DataFrame


plus2net.com



Post your comments , suggestion , error , requirements etc here




We use cookies to improve your browsing experience. . Learn more
HTML MySQL PHP JavaScript ASP Photoshop Articles FORUM . Contact us
©2000-2020 plus2net.com All rights reserved worldwide Privacy Policy Disclaimer