In Google Webmaster Central you can monitor the backlinks pointing to your site by downloading them as an Excel file. Log in to Webmaster Central and open the Links section, which has two parts: External Links and Internal Links. At the top right there is a button labeled Export External Links. Click it, select Latest Links, and download the Excel file.
We modified the file for this site to create a sample file with sample data. In the sample file the names of the sites have been changed.
You can use your own site's data or this sample file. The output will differ when you use your own file.
We will use current_year to match the records against the Last crawled datetime column.
In the sample data, most records are from the year 2020, so once the current year moves past 2020 no records will match the current year. In that case you can skip this step and go on to the number of records for the previous year.
import pandas as pd
current_year = pd.Timestamp.now().year  # year of today's date
# raw string so the backslash in the Windows path is taken literally
my_data = pd.read_excel(r'D:\links_external.xlsx')
my_data['Last crawled'] = pd.to_datetime(my_data['Last crawled'])
# all records of current year
df = my_data[my_data['Last crawled'].dt.year == current_year]
print("Number of Records of current year : ", len(df))
Number of records of previous year
df=my_data[my_data['Last crawled'].dt.year==current_year-1]
print("Number of Records of previous year : ",len(df))
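Because the sample file stops growing after 2020, matching against the calendar year eventually returns nothing. One possible workaround, shown here only as a sketch, is to take the most recent year found in the Last crawled column instead of the current year. The small DataFrame below is made-up stand-in data (the URLs and dates are assumptions, not rows from the real file), and the names latest_year, df_latest and df_before are introduced just for this illustration.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded links file (not real data)
my_data = pd.DataFrame({
    'Linking page': ['https://example.com/a',
                     'https://example.org/b',
                     'https://example.net/c'],
    'Last crawled': ['2020-03-14', '2020-11-02', '2019-07-21'],
})
my_data['Last crawled'] = pd.to_datetime(my_data['Last crawled'])

# Use the most recent year present in the data instead of today's year
latest_year = int(my_data['Last crawled'].dt.year.max())

# Records of the latest year and of the year before it
df_latest = my_data[my_data['Last crawled'].dt.year == latest_year]
df_before = my_data[my_data['Last crawled'].dt.year == latest_year - 1]

print("Latest year in data : ", latest_year)
print("Number of Records of latest year : ", len(df_latest))
print("Number of Records of the year before : ", len(df_before))
```

With your own file, replace the stand-in DataFrame with the result of read_excel(); the rest should work unchanged as long as the column is named Last crawled.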
Display each year and the number of records in that year.
We use groupby() to group the data by year.
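The grouping by year described above could look roughly like the following. This is a minimal sketch on made-up stand-in data (the URLs and dates are assumptions), and records_per_year is a name chosen only for this example. It uses size() so that a single count per year is returned.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded links file (not real data)
my_data = pd.DataFrame({
    'Linking page': ['https://example.com/a',
                     'https://example.org/b',
                     'https://example.net/c',
                     'https://example.com/d'],
    'Last crawled': ['2018-05-01', '2019-07-21', '2020-03-14', '2020-11-02'],
})
my_data['Last crawled'] = pd.to_datetime(my_data['Last crawled'])

# Group rows by the year part of 'Last crawled' and count rows in each group
records_per_year = my_data.groupby(my_data['Last crawled'].dt.year).size()
print(records_per_year)
```

Using count() instead of size() would return one count per column, which is useful when some columns contain missing values.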
Display the number of records (links) from a particular site (www.stackoverflow.com).
You can use any other site in your example. Read more on str.contains()
By using groupby() we create a breakup of the records for each year.
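import pandas as pd
# raw string so the backslash in the Windows path is taken literally
my_data = pd.read_excel(r'D:\links_external.xlsx')
# convert to datetime so that .dt.year can be used below
my_data['Last crawled'] = pd.to_datetime(my_data['Last crawled'])
# keep only rows whose linking page contains the site name; skip blank cells
my_data = my_data[my_data['Linking page'].str.contains('stackoverflow', na=False)]
print(len(my_data)) # number of records
print(my_data.groupby(my_data['Last crawled'].dt.year).count())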
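If you do not have the Excel file at hand, the filtering step can be tried on a small Series. The sketch below uses made-up URLs (assumptions, not rows from the real file) to show how str.contains() behaves with a blank cell and with mixed-case text. The options case=False and na=False are choices made for this illustration; pages and mask are names used only here.

```python
import pandas as pd

# Hypothetical linking pages, including one missing value (not real data)
pages = pd.Series([
    'https://www.stackoverflow.com/questions/1',
    'https://example.com/page',
    None,
    'https://StackOverflow.com/questions/2',
])

# case=False ignores upper/lower case, na=False treats blank cells as no match
mask = pages.str.contains('stackoverflow', case=False, na=False)

print(mask.sum())   # number of matching links
print(pages[mask])  # the matching rows
```

Without na=False the blank cell produces a missing value in the mask, which cannot be used directly to filter the rows.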