Exercise: Pandas date and time with back link data


Background of the data and sample Excel file

In Google Webmaster Central we can monitor back links (links pointing to your site) by downloading them as an Excel file. Log in to Webmaster Central and open the Links page. It has two sections, External Links and Internal Links. At the top right there is a button labelled Export External Links. Click it, select Latest Links, and download the Excel file.

We modified the file for this site to create a sample file with sample data, in which the site names have been changed. You can use your own site data or this sample file. The output will differ when you use your own file.

Download links_external.xlsx file

There are two columns in this Excel file: Linking page (the URL) and Last crawled (the date).

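Save this file on your computer and use the path that matches your system. We stored the file at the root of the D drive, so we set the path like this. The r prefix makes it a raw string, so the backslash in the Windows path is not treated as an escape character.
import pandas as pd
my_data = pd.read_excel(r'D:\links_external.xlsx')
Read more on how to read an Excel file and create a DataFrame.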

  1. How many rows and columns are there in our sample DataFrame?
  2. What are the columns in the DataFrame?
  3. What is the current year?
  4. How many records are from the current year?
  5. How many records are from the previous year?
  6. Display each year and the number of records in that year.
  7. Display each month and the number of records in that month.
  8. Display each year and month with the number of records in that year and month.
  9. Display the number of records (links) from a particular site (www.stackoverflow.com).

How many rows and columns are there in our sample DataFrame.

import pandas as pd
my_data = pd.read_excel(r'D:\links_external.xlsx')  # change the path
print(my_data.shape)  # (number of rows, number of columns)
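The shape attribute is a (rows, columns) tuple, so it can be unpacked directly. A minimal sketch follows, assuming a small in-memory DataFrame with the same two column names as the sample file; the URLs and dates are hypothetical placeholders, not data from links_external.xlsx.

```python
import pandas as pd

# Hypothetical rows standing in for links_external.xlsx
sample = pd.DataFrame({
    'Linking page': [
        'https://www.example.com/a',
        'https://www.example.org/b',
        'https://www.example.net/c',
    ],
    'Last crawled': ['2020-01-15', '2019-11-02', '2019-03-20'],
})

# shape is a tuple: (number of rows, number of columns)
rows, cols = sample.shape
print(rows, cols)  # 3 2
```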

What are the columns in the DataFrame

print(my_data.columns)
Output
Index(['Linking page', 'Last crawled'], dtype='object')

What is the current year?

We use to_datetime() to get the current timestamp and then read its year attribute.
import pandas as pd
current_year = pd.to_datetime('now').year
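As an alternative sketch, the current year can also be read from pd.Timestamp.now(), which uses the local clock. Depending on the pandas version, the string 'now' passed to to_datetime() may be interpreted as UTC instead, which only matters for a few hours around New Year. The name previous_year is introduced here for illustration.

```python
import pandas as pd

# Current timestamp from the local clock
current_year = pd.Timestamp.now().year

# The previous year is needed for one of the later questions
previous_year = current_year - 1

print(current_year, previous_year)
```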

Number of records of current year

We will use the current_year to match the records against the Last crawled datetime column.
import pandas as pd
current_year = pd.to_datetime('now').year
my_data = pd.read_excel(r'D:\links_external.xlsx')  # change the path
# make sure the column is datetime so that the .dt accessor works
my_data['Last crawled'] = pd.to_datetime(my_data['Last crawled'])
# all records of current year
df = my_data[my_data['Last crawled'].dt.year == current_year]
print("Number of Records of current year : ", len(df))

Number of records of previous year

df = my_data[my_data['Last crawled'].dt.year == current_year - 1]
print("Number of Records of previous year : ", len(df))
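The year filter can be tried without the Excel file. The sketch below assumes a hypothetical four-row DataFrame and a fixed reference_year in place of current_year, so the result does not depend on the day the code is run. Summing the boolean mask counts the matching rows without building a filtered DataFrame first.

```python
import pandas as pd

# Hypothetical rows standing in for links_external.xlsx
sample = pd.DataFrame({
    'Linking page': [
        'https://www.example.com/a',
        'https://www.example.com/b',
        'https://www.example.org/c',
        'https://www.example.net/d',
    ],
    'Last crawled': ['2020-01-15', '2020-02-03', '2019-11-02', '2018-06-30'],
})
sample['Last crawled'] = pd.to_datetime(sample['Last crawled'])

reference_year = 2020  # fixed stand-in for current_year
years = sample['Last crawled'].dt.year

# True counts as 1, so sum() gives the number of matching rows
this_year_count = int((years == reference_year).sum())
last_year_count = int((years == reference_year - 1).sum())

print(this_year_count, last_year_count)  # 2 1
```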

Display year and number of records in that year.

We use groupby() to group the records by year. count() then reports the number of non-empty values in each column for every year.
import pandas as pd
my_data = pd.read_excel(r'D:\links_external.xlsx')  # change the path
my_data['Last crawled'] = pd.to_datetime(my_data['Last crawled'])
print(my_data.groupby(my_data['Last crawled'].dt.year).count())
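Because count() returns one figure per column and skips missing values, size() may be a simpler choice when only the number of rows per year is wanted. This is a sketch on hypothetical data, not the output of the sample file.

```python
import pandas as pd

# Hypothetical rows standing in for links_external.xlsx
sample = pd.DataFrame({
    'Linking page': [
        'https://www.example.com/a',
        'https://www.example.com/b',
        'https://www.example.org/c',
        'https://www.example.net/d',
    ],
    'Last crawled': ['2020-01-15', '2020-02-03', '2019-11-02', '2018-06-30'],
})
sample['Last crawled'] = pd.to_datetime(sample['Last crawled'])

# size() returns a single Series: one row count per year
per_year = sample.groupby(sample['Last crawled'].dt.year).size()
print(per_year)
```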

Display month and number of records in that month.

Here we replace year with month. Note that each month combines the records of all years, so January 2019 and January 2020 are counted together.
print(my_data.groupby(my_data['Last crawled'].dt.month).count())

Display Year and Month with number of records in that year month

Each month is grouped within its year, so the same month of different years is counted separately.
print(my_data.groupby([my_data['Last crawled'].dt.year, my_data['Last crawled'].dt.month]).count())
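One alternative to grouping by two keys is to convert each date to a monthly period with dt.to_period('M') and group by that single key. The sketch below uses hypothetical data and also shows the month-only grouping for contrast.

```python
import pandas as pd

# Hypothetical rows standing in for links_external.xlsx
sample = pd.DataFrame({
    'Linking page': [
        'https://www.example.com/a',
        'https://www.example.com/b',
        'https://www.example.org/c',
        'https://www.example.net/d',
    ],
    'Last crawled': ['2020-01-15', '2020-01-28', '2019-01-05', '2019-11-02'],
})
sample['Last crawled'] = pd.to_datetime(sample['Last crawled'])

# One key per year-month, for example 2020-01
per_year_month = sample.groupby(sample['Last crawled'].dt.to_period('M')).size()
print(per_year_month)

# Month only: January of 2019 and January of 2020 fall into the same group
per_month = sample.groupby(sample['Last crawled'].dt.month).size()
print(per_month)
```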

Display number of records ( links ) from a particular site ( www.stackoverflow.com )

You can use any other site in your example. str.contains() selects the rows whose URL contains the given text.
By using groupby() we then break up the matching records by year.
import pandas as pd
my_data = pd.read_excel(r'D:\links_external.xlsx')  # change the path
# convert to datetime so that the .dt accessor works below
my_data['Last crawled'] = pd.to_datetime(my_data['Last crawled'])
my_data = my_data[my_data['Linking page'].str.contains('stackoverflow')]
print(len(my_data))  # number of records
print(my_data.groupby(my_data['Last crawled'].dt.year).count())
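By default str.contains() treats the pattern as a regular expression and returns a missing value for empty cells, which cannot be used directly as a filter. The sketch below passes regex=False so that the dot in the site name is matched literally, and na=False so that empty URLs count as no match. The rows, including the stackoverflow.com URLs, are hypothetical.

```python
import pandas as pd

# Hypothetical rows; the last one has no URL
sample = pd.DataFrame({
    'Linking page': [
        'https://stackoverflow.com/questions/1',
        'https://www.example.com/a',
        'https://stackoverflow.com/questions/2',
        None,
    ],
    'Last crawled': ['2020-01-15', '2020-02-03', '2019-11-02', '2019-05-09'],
})
sample['Last crawled'] = pd.to_datetime(sample['Last crawled'])

# Literal substring match; missing URLs are treated as False
mask = sample['Linking page'].str.contains('stackoverflow.com', regex=False, na=False)
site_links = sample[mask]

print(len(site_links))  # 2
site_per_year = site_links.groupby(site_links['Last crawled'].dt.year).size()
print(site_per_year)
```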


plus2net.com
©2000-2020 plus2net.com All rights reserved worldwide