read_html(): Read HTML tables into a list of DataFrame objects

import pandas as pd
df_l=pd.read_html('') # returns a list of DataFrames, one per table found
#print(type(df_l)) # <class 'list'>
df=df_l[0] # first table in the list as a DataFrame
print(df.head()) # first 5 rows of the DataFrame
Output
             Function                             Description
0                Date                  PHP Date & Time object
1  createfromformat()                      Change date format
2         checkdate()                         Validating date
3              date()  Required date and time in given format
4       date_create()                   Creating date objects
This function looks for the <table>, <tr>, <th> and <td> tags and takes care of the colspan and rowspan attributes of the <th> and <td> cells.
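As a small illustration of the colspan handling mentioned above, the sketch below parses an in-memory table instead of a web page. The table content and the names html, df_l and df are made up for this example, and it assumes a parser library such as lxml is installed. A cell that spans two columns should appear with its text repeated in both columns of the DataFrame.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample table: the "Absent" cell spans two columns
html = """
<table>
  <tr><th>Name</th><th>Mark</th><th>Grade</th></tr>
  <tr><td>Alex</td><td colspan="2">Absent</td></tr>
  <tr><td>Ron</td><td>75</td><td>B</td></tr>
</table>
"""

df_l = pd.read_html(StringIO(html))  # list with one DataFrame per <table>
df = df_l[0]                         # the only table in this sample
print(df)
```

Wrapping the string in StringIO lets read_html() treat it like a file object, which keeps the example independent of any URL.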

Creating a DataFrame from a local HTML file

Download sample student.html file
import pandas as pd
df_l=pd.read_html('C:\\data\\student.html') # path to the local file
df=df_l[0] # first table in the list as a DataFrame
print(df.tail()) # last five rows of the DataFrame
           name  class  mark  gender
31  Marry Toeey   Four    88    male
32    Binn Rott  Seven    90  female
33    Kenn Rein    Six    96  female
34     Gain Toe  Seven    69    male
35   Rows Noump    Six    88  female

Using a file object

fob=open('C:\\data\\student.html','r') # open in read mode
df_l=pd.read_html(fob) # read the tables from the file object
The output is the same as above.
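The two ways of reading a local file can be compared in one self-contained sketch. Since the sample student.html file is not available here, the code writes a hypothetical two-row stand-in to a temporary folder; the folder, the variable names and the reduced table are assumptions made only for this illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row stand-in for the sample student.html file
html = """
<table>
  <tr><th>name</th><th>class</th><th>mark</th><th>gender</th></tr>
  <tr><td>Kenn Rein</td><td>Six</td><td>96</td><td>female</td></tr>
  <tr><td>Gain Toe</td><td>Seven</td><td>69</td><td>male</td></tr>
</table>
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "student.html")
    with open(path, "w", encoding="utf-8") as fob:
        fob.write(html)

    df_path = pd.read_html(path)[0]  # read by file path

    with open(path, "r", encoding="utf-8") as fob:
        df_file = pd.read_html(fob)[0]  # read from an open file object

print(df_path.equals(df_file))  # both approaches should give the same table
print(df_file)
```

Opening the file inside a with block closes it automatically once the tables are read.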


When a page contains several tables, use the match option to collect only the tables whose text matches a string or regular expression. Check this URL or click Python Home page.
That page has multiple tables. The line below keeps only the tables containing the given text; try other match values and compare the results.
import pandas as pd
df_l=pd.read_html('',match='MySQL database') # only tables containing this text
df=df_l[0] # first matching table as a DataFrame
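To see the effect of match without depending on a live page, here is a minimal sketch using a made-up HTML string with two tables. The table contents and variable names are hypothetical; only the match parameter itself comes from read_html().

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the first one mentions MySQL
html = """
<table>
  <tr><th>Topic</th><th>Description</th></tr>
  <tr><td>MySQL database</td><td>Managing records from Python</td></tr>
</table>
<table>
  <tr><th>Topic</th><th>Description</th></tr>
  <tr><td>Tkinter</td><td>Python GUI module</td></tr>
</table>
"""

all_tables = pd.read_html(StringIO(html))                         # no filter
matched = pd.read_html(StringIO(html), match="MySQL database")    # plain string
by_regex = pd.read_html(StringIO(html), match="Tkinter|Turtle")   # regular expression

print(len(all_tables), len(matched), len(by_regex))
print(matched[0])
```

If no table contains matching text, read_html() raises a ValueError instead of returning an empty list, so a try / except block may be useful when the match value comes from user input.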


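Use the attrs option to pick tables by their HTML attributes.
my_dict={'id': 'tb1'} # valid HTML table attributes
df_l=pd.read_html('',attrs=my_dict) # only tables with these attributes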
Change the id value to tb2 and check the result. Several tables (11 in total) share the same class attribute, so the returned list holds 11 DataFrames and any of them can be picked by its position in the list. Here len() is used to get the number of tables having the matching attribute.

my_dict={'class': 'table table-striped'}
df_l=pd.read_html('',attrs=my_dict) # all tables with this class
print(len(df_l)) # 11
df=df_l[1] # change this index to get a different table
print(df)
             0                                                1
0  BooleanVar()  Tkinter Variable for handling True / False data
1            IP               IP address and host name in Python
2          Json      Json methods to manage Json data formatting
3       tkinter                         Python GUI Module module
4        Turtle                          Draw graphics in Python
5         tuple                  Ordered unchangeable items list
6        Django                             Python web framework
7        Pickle               Pickle or Un-pickle Python objects
8        Pillow                    Python Imageing Library : PIL
Change the index in this line to get the other tables.
df=df_l[2] # Change this value to get different table 
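The attrs selection can be tried on a small in-memory page as well. In the sketch below the ids tb1 and tb2 and the class value follow the text above, while the third table and all cell contents are made up for illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical page: ids and class mirror the text above, content is made up
html = """
<table id="tb1" class="table table-striped">
  <tr><th>Name</th><th>Use</th></tr>
  <tr><td>Json</td><td>Data formatting</td></tr>
</table>
<table id="tb2" class="table table-striped">
  <tr><th>Name</th><th>Use</th></tr>
  <tr><td>Turtle</td><td>Drawing graphics</td></tr>
</table>
<table id="tb3">
  <tr><th>Name</th><th>Use</th></tr>
  <tr><td>Django</td><td>Web framework</td></tr>
</table>
"""

by_id = pd.read_html(StringIO(html), attrs={"id": "tb2"})
by_class = pd.read_html(StringIO(html), attrs={"class": "table table-striped"})

print(len(by_id), len(by_class))  # number of tables matching each attribute
print(by_class[1])                # second table having the class
```

An id normally identifies a single table, while a class can be shared, which is why the class lookup returns a longer list.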


io : Path, URL or file object (check the above examples), required
match : Regex or string; only tables containing matching text are returned (examples above)
flavor : Parsing engine to use: 'lxml' (default), 'bs4' or 'html5lib' (you may have to install these libraries if not there)
header : The row number(s) to be used as column headers
index_col : The column(s) to be used as index
skiprows : Number of rows to skip, or a list of row numbers to skip
attrs : Valid HTML attributes passed as a dictionary to identify the table (see example above)
parse_dates : Whether to parse date columns
thousands : Separator used for thousands marking (default is ,)
encoding : Encoding to be used while reading the file.
decimal : Character to be used as decimal point (default is . ; , is used in European data)
na_values : Custom values to be treated as NA
keep_default_na : If False, the values in na_values replace the default NA values instead of extending them.
displayed_only : If True (default), elements styled with display: none are skipped.
extract_links : Table section ('header', 'footer', 'body' or 'all') from which href values are extracted along with the cell text.
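A few of these parameters can be combined in one call. The sketch below is only an illustration: the price table is invented, and it assumes data written with . as the thousands separator and , as the decimal mark.

```python
from io import StringIO

import pandas as pd

# Hypothetical price list written with European number formatting
html = """
<table>
  <tr><th>id</th><th>item</th><th>price</th></tr>
  <tr><td>1</td><td>Laptop</td><td>1.234,50</td></tr>
  <tr><td>2</td><td>Mouse</td><td>19,99</td></tr>
</table>
"""

df = pd.read_html(
    StringIO(html),
    header=0,       # first row holds the column names
    index_col=0,    # use the id column as the index
    thousands=".",  # "." separates thousands in this data
    decimal=",",    # "," is the decimal mark in this data
)[0]

prices = df["price"].tolist()  # parsed as floats rather than strings
print(df)
print(prices)
```

Setting thousands and decimal together avoids a clash with the default thousands separator, which is a comma.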


