import pandas as pd
df_l=pd.read_html('https://www.plus2net.com/php_tutorial/site_map-date.php') # returns a list of DataFrames
#print(type(df_l)) # <class 'list'>
df=df_l[0] # take the first DataFrame from the list
print(df.head()) # Top 5 rows from DataFrame
Output is here
             Function                             Description
0                Date                  PHP Date & Time object
1  createfromformat()                      Change date format
2         checkdate()                         Validating date
3              date()  Required date and time in given format
4       date_create()                   Creating date objects
This function reads the <table>, <tr>, <th> and <td> tags of the page and takes care of the colspan and rowspan attributes of the <td> and <th> tags.
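The handling of colspan and rowspan can be checked offline with a small sketch. The HTML table below is a made-up example (not from the page above), and the code assumes an HTML parser such as lxml is installed. A spanned cell is repeated in every row or column it covers.

```python
from io import StringIO

import pandas as pd

# Hypothetical table: one cell spans two columns, another spans two rows
html = """
<table>
  <tr><th>name</th><th>math</th><th>science</th></tr>
  <tr><td>Alex</td><td colspan="2">absent</td></tr>
  <tr><td rowspan="2">Ravi</td><td>80</td><td>75</td></tr>
  <tr><td>85</td><td>70</td></tr>
</table>
"""

# Wrap the HTML string in StringIO so it is read as a file-like object
df = pd.read_html(StringIO(html))[0]
print(df)
```

Here 'absent' appears under both math and science, and 'Ravi' appears in both of the last two rows.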
import pandas as pd
df_l=pd.read_html('C:\\data\\student.html',index_col='id')
df=df_l[0] # creating dataframe from list
print(df.tail()) # Last five rows of DataFrame
Output
           name  class  mark  gender
id
31  Marry Toeey   Four    88    male
32    Binn Rott  Seven    90  female
33    Kenn Rein    Six    96  female
34     Gain Toe  Seven    69    male
35   Rows Noump    Six    88  female
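from io import StringIO
with open('C:\\data\\student.html','r') as fob: # Open in read mode, closed automatically
    data=fob.read() # read the file data
# Passing a raw HTML string is deprecated since pandas 2.1, so wrap it in StringIO
df_l=pd.read_html(StringIO(data),index_col='id')
df=df_l[0]
print(df.tail())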
The output is the same as above.
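If the file C:\data\student.html is not available, both ways of reading a local file can be tried with a temporary file. This is a minimal sketch: the student rows are invented sample data, the file is created in a temporary directory, and an HTML parser such as lxml is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical student data, written to a temporary file for the demonstration
html = """
<table>
  <tr><th>id</th><th>name</th><th>mark</th></tr>
  <tr><td>1</td><td>John Deo</td><td>75</td></tr>
  <tr><td>2</td><td>Max Ruin</td><td>85</td></tr>
</table>
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "student.html")
    with open(path, "w", encoding="utf-8") as fob:
        fob.write(html)

    # 1. Pass the file path directly
    df_path = pd.read_html(path, index_col="id")[0]

    # 2. Pass an open file object
    with open(path, "r", encoding="utf-8") as fob:
        df_file = pd.read_html(fob, index_col="id")[0]

print(df_path)
print(df_path.equals(df_file))  # both approaches give the same DataFrame
```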
The page https://www.plus2net.com/python/site_map.php contains multiple tables. The match parameter takes a string or a regex, and only the tables containing matching text are returned. The commented lines below use different match values; uncomment them one at a time and check the result.
import pandas as pd
#df_l=pd.read_html('https://www.plus2net.com/python/site_map.php',match='Operators')
#df_l=pd.read_html('https://www.plus2net.com/python/site_map.php',match='MySQL database')
#df_l=pd.read_html('https://www.plus2net.com/python/site_map.php',match='Django')
df_l=pd.read_html('https://www.plus2net.com/python/site_map.php',match='Pygsheet')
df=df_l[0] # create a DataFrame
print(df)
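The effect of match can also be seen without a network connection. The sketch below uses three invented tables in one HTML string (an assumption for illustration, not the content of the page above) and assumes an HTML parser such as lxml is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with three small tables
html = """
<table><tr><th>topic</th><th>detail</th></tr>
<tr><td>Django</td><td>Web framework</td></tr></table>
<table><tr><th>topic</th><th>detail</th></tr>
<tr><td>Pillow</td><td>Image library</td></tr></table>
<table><tr><th>topic</th><th>detail</th></tr>
<tr><td>Turtle</td><td>Draw graphics</td></tr></table>
"""

all_tables = pd.read_html(StringIO(html))                          # no filter
django_tables = pd.read_html(StringIO(html), match="Django")       # plain string
either_tables = pd.read_html(StringIO(html), match="Django|Pillow")  # regex

print(len(all_tables), len(django_tables), len(either_tables))
print(django_tables[0])
```

When no table contains the matching text, read_html() raises a ValueError instead of returning an empty list.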
my_dict={'id': 'tb1'} # valid HTML table attributes
df_l=pd.read_html('https://www.plus2net.com/python/site_map.php',attrs=my_dict)
Change the id value to tb2 and check the result.
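Selecting a table by its id can be sketched with two small tables. The ids tb1 and tb2 are reused from the example above, but the table contents here are invented for illustration. An HTML parser such as lxml is assumed to be installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables that differ only by id
html = """
<table id="tb1"><tr><th>name</th></tr><tr><td>tkinter</td></tr></table>
<table id="tb2"><tr><th>name</th></tr><tr><td>Turtle</td></tr></table>
"""

results = {}
for table_id in ("tb1", "tb2"):
    # attrs takes a dictionary of HTML attributes the table must have
    df = pd.read_html(StringIO(html), attrs={"id": table_id})[0]
    results[table_id] = df.loc[0, "name"]

print(results)
```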
There are multiple tables ( 11 in total ) with the same 'class' attribute. read_html() returns all of them as a list, so we can use different elements of the list to get the different tables. Here len() is used to get the number of tables having the matching attribute.
my_dict={'class': 'table table-striped'}
df_l=pd.read_html('https://www.plus2net.com/python/site_map.php',attrs=my_dict)
print(len(df_l)) # 11
df=df_l[1] # Change this value to get different table
print(df)
Output
              0                                                 1
0  BooleanVar()  Tkinter Variable for handling True / False data
1            IP                IP address and host name in Python
2          Json       Json methods to manage Json data formatting
3       tkinter                          Python GUI Module module
4        Turtle                           Draw graphics in Python
5         tuple                   Ordered unchangeable items list
6        Django                              Python web framework
7        Pickle                Pickle or Un-pickle Python objects
8        Pillow                     Python Imageing Library : PIL
Change the index in this line to get the other tables from the list.
df=df_l[2] # Change this value to get different table
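Instead of changing the index by hand, the list can be looped over. The sketch below uses three invented tables, two of which share the class value used above. It assumes an HTML parser such as lxml is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical page: two tables share the same class, one does not
html = """
<table class="table table-striped"><tr><th>name</th></tr>
<tr><td>Json</td></tr></table>
<table class="plain"><tr><th>name</th></tr>
<tr><td>skipped</td></tr></table>
<table class="table table-striped"><tr><th>name</th></tr>
<tr><td>Pickle</td></tr><tr><td>Pillow</td></tr></table>
"""

df_l = pd.read_html(StringIO(html), attrs={"class": "table table-striped"})
print(len(df_l))  # number of tables with the matching class

# Visit every matching table, with its position in the list
for position, df in enumerate(df_l):
    print(position, df.shape)
```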
io : Path, URL or file-like object ( check the above examples ), required
match : Regex or string; only the tables containing matching text are returned ( examples above )
flavor : Parsing engine to use, 'lxml', 'bs4' or 'html5lib' ( You may have to install these libraries if not there )
header : The row number(s) to be used as column headers
index_col : The column to be used as index ( see example above )
skiprows : Number of rows to skip
attrs : Valid HTML attributes passed as a dictionary to identify the table ( see example above )
parse_dates : Parsing date columns
thousands : Separator to use for thousands marking
encoding : Encoding to be used while reading the file
decimal : Character to be used as the decimal point ( , is used in European data )
na_values : Custom values to be treated as NA
keep_default_na : Whether to keep the default NA values when na_values is given
displayed_only : Whether to parse only displayed elements ( elements with display: none are skipped )
extract_links : Extract href values along with the text of the links
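Two of these parameters, header and na_values, can be tried with a short sketch. The table below is invented sample data whose header row uses <td> instead of <th>, so by default the columns are numbered 0 and 1 (as in the output shown earlier). An HTML parser such as lxml is assumed to be installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical table: the first row holds the column names but uses <td> tags
html = """
<table>
  <tr><td>name</td><td>mark</td></tr>
  <tr><td>Alex</td><td>88</td></tr>
  <tr><td>Ravi</td><td>absent</td></tr>
  <tr><td>Binn</td><td>90</td></tr>
</table>
"""

# Default: no header row is detected, columns are numbered 0 and 1
df_default = pd.read_html(StringIO(html))[0]
print(df_default)

# header=0 uses the first row as column names,
# na_values marks the text 'absent' as a missing value
df = pd.read_html(StringIO(html), header=0, na_values=["absent"])[0]
print(df)
print(df["mark"].isna().sum())  # count of missing marks
```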
What is the read_html() function in Pandas?
How do you use the read_html() function to read data from an HTML table?
Can the read_html() function read multiple tables from a single HTML page?
How does the header parameter work in the read_html() function?
How does the read_html() function handle tables with merged cells or complex structures?