import pandas as pd
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
'ID':[1,2,3,4,5,6],
'MATH':[80,40,70,70,60,30],
'ENGLISH':[80,70,40,50,60,30]}
my_data = pd.DataFrame(data=my_dict)
my_data['my_cut'] = pd.cut(x=my_data['MATH'],bins=[1, 50, 70, 100])
print(my_data)
Output
NAME ID MATH ENGLISH my_cut
0 Ravi 1 80 80 (70, 100]
1 Raju 2 40 70 (1, 50]
2 Alex 3 70 40 (50, 70]
3 Ron 4 70 50 (50, 70]
4 King 5 60 60 (50, 70]
5 Jack 6 30 30 (1, 50]
print(my_data['my_cut'].dtypes) # category
The column returned by cut() has the categorical data type, which the dtypes attribute reports as category.
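Let us look a little closer at this categorical column. The sketch below rebuilds the MATH column as a standalone Series (the names math, binned and counts are only for illustration) and prints the bins stored as categories and the number of rows in each bin.

```python
import pandas as pd

# Same MATH scores as in my_dict, kept in a standalone Series for brevity
math = pd.Series([80, 40, 70, 70, 60, 30], name="MATH")
binned = pd.cut(x=math, bins=[1, 50, 70, 100])

print(binned.dtype)           # category
print(binned.cat.categories)  # the three bins, stored as an IntervalIndex
print(binned.cat.ordered)     # True, cut() returns ordered categories by default

# Number of rows that fall in each bin
counts = binned.value_counts(sort=False)
print(counts)
```

Because the categories are ordered, the bins can be sorted and compared like grades rather than like plain strings.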
my_data['my_cut'] = pd.cut(x=my_data['MATH'],bins=5)
print(my_data)
Output
NAME ID MATH ENGLISH my_cut
0 Ravi 1 80 80 (70.0, 80.0]
1 Raju 2 40 70 (29.95, 40.0]
2 Alex 3 70 40 (60.0, 70.0]
3 Ron 4 70 50 (60.0, 70.0]
4 King 5 60 60 (50.0, 60.0]
5 Jack 6 30 30 (29.95, 40.0]
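When bins is an integer, cut() divides the range of the data into that many equal-width bins. To see where the edge 29.95 comes from, we can ask for the computed edges with retbins=True. This is a small sketch on a standalone copy of the MATH scores; the variable names are only for illustration.

```python
import pandas as pd

math = pd.Series([80, 40, 70, 70, 60, 30], name="MATH")

# retbins=True returns the binned data together with the computed edges
binned, edges = pd.cut(x=math, bins=5, retbins=True)
print(edges)

# The range 30 to 80 is split into 5 bins of width 10. The lowest edge is
# pushed down by 0.1% of the range (0.05) so the minimum value 30 is included.
```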
Sequence of scalars : We specify the bin edges ourselves, so the bins can have unequal widths.
my_data['my_cut'] = pd.cut(x=my_data['MATH'],bins=[1,50,70,100])
print(my_data)
Output
NAME ID MATH ENGLISH my_cut
0 Ravi 1 80 80 (70, 100]
1 Raju 2 40 70 (1, 50]
2 Alex 3 70 40 (50, 70]
3 Ron 4 70 50 (50, 70]
4 King 5 60 60 (50, 70]
5 Jack 6 30 30 (1, 50]
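IntervalIndex : The exact bins to use; they must not overlap. The example below passes a plain list of edges, which also gives consecutive, non-overlapping bins.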
my_data['my_cut'] = pd.cut(x=my_data['MATH'],bins=[1,49,50,69,70,79,80,100])
print(my_data)
Output
NAME ID MATH ENGLISH my_cut
0 Ravi 1 80 80 (79, 80]
1 Raju 2 40 70 (1, 49]
2 Alex 3 70 40 (69, 70]
3 Ron 4 70 50 (69, 70]
4 King 5 60 60 (50, 69]
5 Jack 6 30 30 (1, 49]
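The code above still passes a list of edges. For comparison, here is a minimal sketch that passes a real IntervalIndex. The two intervals are hypothetical and deliberately leave a gap, so any score outside both intervals becomes NaN.

```python
import pandas as pd

math = pd.Series([80, 40, 70, 70, 60, 30], name="MATH")

# Hypothetical bins (0, 50] and (60, 100], nothing covers 50 < x <= 60
exact_bins = pd.IntervalIndex.from_tuples([(0, 50), (60, 100)])

binned = pd.cut(x=math, bins=exact_bins)
print(binned)

# The score 60 is not inside either interval, so that row is NaN
print(binned.isna().tolist())
```

With an IntervalIndex the options right, labels and include_lowest are ignored, because each interval already states which side is closed.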
my_data['my_cut'] = pd.cut(x=my_data['MATH'],bins=[1,50,70,100],right=True)
Output
NAME ID MATH ENGLISH my_cut
0 Ravi 1 80 80 (70, 100]
1 Raju 2 40 70 (1, 50]
2 Alex 3 70 40 (50, 70]
3 Ron 4 70 50 (50, 70]
4 King 5 60 60 (50, 70]
5 Jack 6 30 30 (1, 50]
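Let us change to right=False. Each bin now includes its left edge and excludes its right edge, so a score of 70 falls in [70, 100) instead of (50, 70].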
my_data['my_cut'] = pd.cut(x=my_data['MATH'],bins=[1,50,70,100],right=False)
Output
NAME ID MATH ENGLISH my_cut
0 Ravi 1 80 80 [70, 100)
1 Raju 2 40 70 [1, 50)
2 Alex 3 70 40 [70, 100)
3 Ron 4 70 50 [70, 100)
4 King 5 60 60 [50, 70)
5 Jack 6 30 30 [1, 50)
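The effect of right is easiest to see on values that sit exactly on a bin edge. The sketch below uses two made-up scores, 50 and 70, with the same edges as above; the Series and variable names are only for illustration.

```python
import pandas as pd

scores = pd.Series([50, 70])  # both values sit exactly on a bin edge
edges = [1, 50, 70, 100]

right_closed = pd.cut(scores, bins=edges, right=True)   # bins like (1, 50]
left_closed = pd.cut(scores, bins=edges, right=False)   # bins like [1, 50)

comparison = pd.DataFrame({
    "score": scores,
    "right=True": right_closed,
    "right=False": left_closed,
})
print(comparison)
```

With right=True the score 50 stays in the lower bin, and with right=False it moves up to the next bin.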
my_labels=['Fail','Second','First']
my_data['my_cut'] = pd.cut(x=my_data['MATH'],bins=[1, 50, 75, 100],labels=my_labels)
print(my_data)
Output
NAME ID MATH ENGLISH my_cut
0 Ravi 1 80 80 First
1 Raju 2 40 70 Fail
2 Alex 3 70 40 Second
3 Ron 4 70 50 Second
4 King 5 60 60 Second
5 Jack 6 30 30 Fail
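Two related options are worth a quick look. As a sketch on a standalone copy of the MATH scores (variable names are illustrative), labels=False returns the integer position of each bin instead of a label, and the named labels keep their order, so they can be compared.

```python
import pandas as pd

math = pd.Series([80, 40, 70, 70, 60, 30], name="MATH")
edges = [1, 50, 75, 100]

# Named labels, one per bin (always one fewer label than edges)
grades = pd.cut(x=math, bins=edges, labels=["Fail", "Second", "First"])

# labels=False gives the bin number instead: 0 for the first bin, and so on
codes = pd.cut(x=math, bins=edges, labels=False)

print(grades.tolist())
print(codes.tolist())

# The labels are ordered (Fail < Second < First), so comparisons work
passed = grades >= "Second"
print(passed.tolist())
```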
We can use the sum of two columns as our input array.
my_labels=['Fail','Second','First']
my_data['my_cut'] = pd.cut(x=my_data['MATH']+my_data['ENGLISH'],bins=[1, 100, 150, 200],labels=my_labels)
print(my_data)
Output
NAME ID MATH ENGLISH my_cut
0 Ravi 1 80 80 First
1 Raju 2 40 70 Second
2 Alex 3 70 40 Second
3 Ron 4 70 50 Second
4 King 5 60 60 Second
5 Jack 6 30 30 Fail
my_data['my_cut'] = pd.cut(x=my_data['MATH'],bins=[30,60,80,100],include_lowest=False)
Output
NAME ID MATH ENGLISH my_cut
0 Ravi 1 80 80 (60.0, 80.0]
1 Raju 2 40 70 (30.0, 60.0]
2 Alex 3 70 40 (60.0, 80.0]
3 Ron 4 70 50 (60.0, 80.0]
4 King 5 60 60 (30.0, 60.0]
5 Jack 6 30 30 NaN
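Let us try include_lowest=True. The first interval now includes its left edge, so the score of 30 is placed in a bin instead of returning NaN.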
my_data['my_cut'] = pd.cut(x=my_data['MATH'],bins=[30,60,80,100],include_lowest=True)
Output
NAME ID MATH ENGLISH my_cut
0 Ravi 1 80 80 (60.0, 80.0]
1 Raju 2 40 70 (29.999, 60.0]
2 Alex 3 70 40 (60.0, 80.0]
3 Ron 4 70 50 (60.0, 80.0]
4 King 5 60 60 (29.999, 60.0]
5 Jack 6 30 30 (29.999, 60.0]
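To check only the rows that change, we can count the missing values with and without include_lowest. This is a minimal sketch with three made-up scores, where 30 is equal to the lowest edge.

```python
import pandas as pd

scores = pd.Series([30, 60, 80])  # 30 is equal to the lowest bin edge
edges = [30, 60, 80, 100]

without_lowest = pd.cut(scores, bins=edges, include_lowest=False)
with_lowest = pd.cut(scores, bins=edges, include_lowest=True)

# True marks a score that did not fall in any bin
print(without_lowest.isna().tolist())
print(with_lowest.isna().tolist())
```

The edge shown as 29.999 in the output is only a small adjustment used to display a right-closed interval that contains 30.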
my_data['my_cut'] = pd.cut(x=my_data['MATH'],bins=[40,50,50,100],duplicates='drop')
print(my_data)
Output
NAME ID MATH ENGLISH my_cut
0 Ravi 1 80 80 (50.0, 100.0]
1 Raju 2 40 70 NaN
2 Alex 3 70 40 (50.0, 100.0]
3 Ron 4 70 50 (50.0, 100.0]
4 King 5 60 60 (50.0, 100.0]
5 Jack 6 30 30 NaN
Let us change to duplicates='raise', which is the default.
my_data['my_cut'] = pd.cut(x=my_data['MATH'],bins=[40,50,50,100],duplicates='raise')
Output
This raises a ValueError, because the bin edges are not unique.
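If the bin edges come from user input or from another calculation, we may want to handle this error instead of letting the script stop. The sketch below is one possible way, using a standalone copy of the MATH scores; the variable names and the fallback to duplicates='drop' are assumptions for illustration.

```python
import pandas as pd

math = pd.Series([80, 40, 70, 70, 60, 30], name="MATH")
edges = [40, 50, 50, 100]  # the edge 50 appears twice

try:
    binned = pd.cut(x=math, bins=edges, duplicates="raise")
    raised = False
except ValueError as err:
    raised = True
    print("cut() failed:", err)
    # Fall back to dropping the repeated edge
    binned = pd.cut(x=math, bins=edges, duplicates="drop")

print(binned)
```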