The select() method runs a CSS selector against the parsed document and returns a list of all matching elements. The selector can be a plain tag name, a class, an id, or a combination of these.
import requests
from bs4 import BeautifulSoup
link = "https://www.plus2net.com/html_tutorial/html_form.php"
content = requests.get(link) # download the page
soup = BeautifulSoup(content.text, 'html.parser') # parse the HTML
print(soup.select("title")) # list of all title tags
Output
[<title>Web Form tag elements in HTML</title>]
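Since select() returns a list even when only one tag matches, the text has to be read from an item of that list. A minimal sketch, assuming a small hard-coded page (the html string is a stand-in for the downloaded content so that the code runs offline):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page
html = "<html><head><title>Web Form tag elements in HTML</title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")

titles = soup.select("title")  # a list, even for a single match
# Guard against an empty list before indexing
title_text = titles[0].get_text() if titles else None
print(title_text)
```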
All <meta> tags inside the <head> tag
print(soup.select("head meta"))
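The space in "head meta" is the CSS descendant combinator: it matches every meta tag found anywhere inside head. A short sketch on a made-up page (the html string and its attribute values are assumptions for illustration only):

```python
from bs4 import BeautifulSoup

# Hypothetical page with two meta tags inside head
html = """<html><head>
<meta charset="utf-8">
<meta name="description" content="Demo page">
<title>Demo</title>
</head><body><p>text</p></body></html>"""
soup = BeautifulSoup(html, "html.parser")

metas = soup.select("head meta")  # meta tags that are descendants of head
# .get() returns None when the attribute is absent
meta_names = [m.get("name") for m in metas]
print(meta_names)
```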
Tags with class='table-striped'
print(soup.select('.table-striped'))
All links inside elements with class='table-striped'
print(soup.select(".table-striped a"))
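Once the links are selected, the usual next step is to read their href values. A minimal sketch, assuming a small hand-written table (the markup and the file names are hypothetical):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two links inside the striped table, one outside it
html = """
<table class="table table-striped">
<tr><td><a href="/a.html">A</a></td><td><a href="/b.html">B</a></td></tr>
</table>
<a href="/outside.html">Outside</a>
"""
soup = BeautifulSoup(html, "html.parser")

links = soup.select(".table-striped a")  # only links inside the table
hrefs = [a["href"] for a in links]
print(hrefs)
```

The class selector matches even though the table carries two classes, because CSS class matching works on each class name separately.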
Links at even positions (2nd, 4th, ...) among the <a> tags sharing the same parent
print(soup.select("a:nth-of-type(even)"))
Links at odd positions (1st, 3rd, ...)
print(soup.select("a:nth-of-type(odd)"))
The formula 2n selects the same links as even
print(soup.select("a:nth-of-type(2n)"))
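The position used by nth-of-type is counted among sibling tags of the same type, not across the whole page. A small sketch on four sibling links (the markup is made up for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: four sibling links under one parent
html = """<div>
<a href="#1">one</a>
<a href="#2">two</a>
<a href="#3">three</a>
<a href="#4">four</a>
</div>"""
soup = BeautifulSoup(html, "html.parser")

even = [a.get_text() for a in soup.select("a:nth-of-type(even)")]
odd = [a.get_text() for a in soup.select("a:nth-of-type(odd)")]
two_n = [a.get_text() for a in soup.select("a:nth-of-type(2n)")]
print(even, odd, two_n)
```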
Example
A selector can be a class name, an id, a tag combined with a class, a tag combined with an id, and so on.
content = """<h2>List of web programming languages</h2>
<div class=my_list>
<p>My Pages one </p>
<p class=my_pages>My Pages </p>
<p id=ck1>My ck1 page</p>
<a href='https://www.plus2net.com' class='home_link'>Home page</a>
</div>"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(content, 'html.parser')
print(soup.select("div")) # all div tags
The code above returns all <div> tags.
Collect the tags with class=my_list
print(soup.select('.my_list')) # class=my_list
Print tags with class=my_pages
print(soup.select('.my_pages')) # class=my_pages
Output
[<p class="my_pages">My Pages </p>]
Print tags with id
print(soup.select('#ck1'))
Output
[<p id="ck1">My ck1 page</p>]
Print all <a> tags with class=home_link
print(soup.select('a.home_link'))
Output
[<a class="home_link" href="https://www.plus2net.com">Home page</a>]
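A tag name can be joined directly to a class or an id to narrow the match. The sketch below reuses the sample markup with the attribute values quoted; the variable names are chosen only for this illustration.

```python
from bs4 import BeautifulSoup

content = """<h2>List of web programming languages</h2>
<div class="my_list">
<p>My Pages one </p>
<p class="my_pages">My Pages </p>
<p id="ck1">My ck1 page</p>
<a href="https://www.plus2net.com" class="home_link">Home page</a>
</div>"""
soup = BeautifulSoup(content, "html.parser")

by_tag_class = soup.select("p.my_pages")  # p tags with class=my_pages
by_tag_id = soup.select("p#ck1")  # p tag with id=ck1
# '>' limits the match to direct children of div.my_list
child_link = soup.select("div.my_list > a.home_link")
no_match = soup.select("a.my_pages")  # right class, wrong tag: empty list
print(by_tag_class, by_tag_id, child_link, no_match)
```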
Print all <p>
print(soup.select('p')) # all p tags
Output
[<p>My Pages one </p>, <p class="my_pages">My Pages </p>,
<p id="ck1">My ck1 page</p>]
All <p> tags having class
print(soup.select('p[class]'))
Output
[<p class="my_pages">My Pages </p>]
All <p> tags having id
print(soup.select('p[id]'))
Output
[<p id="ck1">My ck1 page</p>]
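Attribute selectors can also test the value, not only the presence of the attribute. A minimal sketch, assuming the standard CSS operators = (exact), ^= (starts with) and $= (ends with); the links in the html string are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration
html = """<p id="ck1">My ck1 page</p>
<a href="https://www.plus2net.com">Secure</a>
<a href="http://example.com/file.pdf">PDF</a>"""
soup = BeautifulSoup(html, "html.parser")

exact = soup.select('p[id="ck1"]')  # id equals ck1
starts = soup.select('a[href^="https"]')  # href starts with https
ends = soup.select('a[href$=".pdf"]')  # href ends with .pdf
print(exact, starts, ends)
```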
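Find the HTML table with width="170", then collect the text of its 2nd and 3rd <td> tags. The sample string above has no table, so run this against a soup built from a page that contains one.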
str1=soup.select('table[width="170"] td')
print(str1[1].string)
print(str1[2].string)
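Indexing the result directly raises IndexError when fewer cells match than expected, so checking the length first is safer. A sketch of the table lookup above on a made-up table (the cell values are assumptions):

```python
from bs4 import BeautifulSoup

# Hypothetical page with two tables; only the first has width="170"
html = """<table width="170">
<tr><td>Name</td><td>Alex</td><td>Four</td></tr>
</table>
<table width="300"><tr><td>Other</td></tr></table>"""
soup = BeautifulSoup(html, "html.parser")

cells = soup.select('table[width="170"] td')
second = third = None
if len(cells) >= 3:  # avoid IndexError on short results
    second = cells[1].get_text(strip=True)
    third = cells[2].get_text(strip=True)
print(second, third)
```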
select_one
Print only the first <p> tag
print(soup.select_one('p')) # the first p tag only.
Output
<p>My Pages one </p>
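Unlike select(), which returns an empty list when nothing matches, select_one() returns None, so the result should be checked before use. A minimal sketch on a two-paragraph snippet (the markup is made up for illustration):

```python
from bs4 import BeautifulSoup

html = "<div><p>My Pages one </p><p>My Pages two </p></div>"
soup = BeautifulSoup(html, "html.parser")

first_p = soup.select_one("p")  # first match only, as a single tag
missing = soup.select_one("table")  # no table here, so None

first_text = first_p.get_text() if first_p is not None else ""
print(first_text, missing)
```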
Using CSS selector for XML
The same selectors work on XML and return the matching tags with their content.
The "xml" parser needs the lxml library, so install it first.
pip install lxml
import requests
link = "https://www.plus2net.com/php_tutorial/file-xml-demo.xml"
content = requests.get(link)
from bs4 import BeautifulSoup
soup = BeautifulSoup(content.text, "xml")
print(soup.select("name"))
Output (sample)
[<name>John Deo</name>, <name>Max Ruin</name>,
------
------
<name>Rows Noump</name>]
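To try the XML selector without a network call, the same idea can be run on an inline string. This is only a sketch: the XML structure below is an assumption loosely modelled on the names in the sample output, and the fallback to html.parser is there only so the code still runs when lxml is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical XML; the real file may be structured differently
xml_text = """<students>
<student><id>1</id><name>John Deo</name></student>
<student><id>2</id><name>Max Ruin</name></student>
</students>"""

try:
    soup = BeautifulSoup(xml_text, "xml")  # needs lxml installed
except FeatureNotFound:
    # Fallback so the sketch still runs without lxml
    soup = BeautifulSoup(xml_text, "html.parser")

# name tags that are direct children of a student tag
names = [n.get_text() for n in soup.select("student > name")]
print(names)
```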