x : The one-dimensional input array to be categorized.
bins : The segments used for categorization. This can be an integer (a number of equal-width bins), a sequence of bin edges (allowing non-uniform widths) or an IntervalIndex.
right : Default True. Whether each bin includes its rightmost edge (see examples below).
labels : Default None. A list of labels for the bins; it must have one label per segment (bin).
retbins : Default False. Whether to also return the bin edges.
precision : int, default 3. The precision used to store and display the bin labels.
include_lowest : Default False. Whether the first interval should be left-inclusive.
duplicates : Default 'raise'. If the bin edges are not unique, 'raise' raises a ValueError and 'drop' removes the duplicate edges.
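A minimal sketch of some of these parameters, assuming pandas is installed. The `marks` Series is hypothetical sample data copied from the MATH column of the tables in this tutorial; it shows `labels` (one label per segment), `retbins` (the edges are returned as a second value) and `duplicates='drop'`.

```python
import pandas as pd

# Hypothetical sample data: MATH marks copied from the tables in this tutorial
marks = pd.Series([80, 40, 70, 70, 70, 30], name="MATH")

# Three segments need exactly three labels; retbins=True also returns the edges
cats, edges = pd.cut(marks, bins=[0, 50, 70, 100],
                     labels=["Fail", "Second", "First"], retbins=True)
print(cats.tolist())
print(edges)

# The edge 50 is repeated: duplicates='raise' (the default) would raise a
# ValueError, duplicates='drop' keeps only the unique edges 0, 50, 100
dedup = pd.cut(marks, bins=[0, 50, 50, 100], duplicates="drop")
print(dedup.cat.categories)
```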
Examples using options
In this example the MATH mark of each student is used for segmentation. We pass a list of edges to bins to create three non-uniform segments: from 1 to 50, from 50 to 70 and from 70 to 100.
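A sketch of this example, assuming the student data reconstructed from the output tables in this tutorial (King's MATH mark is taken as 70, as in the later tables). The edges `[1, 50, 70, 100]` give the three right-closed segments (1, 50], (50, 70] and (70, 100].

```python
import pandas as pd

# Sample student data reconstructed from the tables in this tutorial
df = pd.DataFrame({
    "NAME": ["Ravi", "Raju", "Alex", "Ron", "King", "Jack"],
    "ID": [1, 2, 3, 4, 5, 6],
    "MATH": [80, 40, 70, 70, 70, 30],
    "ENGLISH": [80, 70, 40, 50, 60, 30],
})

# Three non-uniform segments built from four edges
df["my_cut"] = pd.cut(df["MATH"], bins=[1, 50, 70, 100])
print(df)
```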
How do we decide on the segments for distributing the values? There are three types: fixed-width bins (an integer), non-uniform bins (a sequence of edges) and an IntervalIndex.
Fixed width bins: by passing an integer to bins we set how many segments we want. Here the marks vary over a range of 50 (from 30 to 80), so bins=5 creates segments of fixed width 10. The range of x is extended by 0.1% so that the minimum and maximum values fall inside the bins.
NAME ID MATH ENGLISH my_cut
0 Ravi 1 80 80 (79, 80]
1 Raju 2 40 70 (1, 49]
2 Alex 3 70 40 (69, 70]
3 Ron 4 70 50 (69, 70]
4 King 5 60 60 (50, 69]
5 Jack 6 30 30 (1, 49]
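A sketch of fixed-width binning with bins=5, assuming the same hypothetical MATH marks (30 to 80). With the default right=True, pandas lowers the lowest edge by 0.1% of the range (0.05 here) so that the minimum mark of 30 is inside the first bin; `retbins=True` makes this visible.

```python
import pandas as pd

# Hypothetical sample data: MATH marks ranging from 30 to 80
marks = pd.Series([80, 40, 70, 70, 70, 30], name="MATH")

# bins=5 -> five equal-width segments of width 10 across 30..80
cats, edges = pd.cut(marks, bins=5, retbins=True)
print(edges)                           # first edge is slightly below 30
print(cats.value_counts(sort=False))   # counts per segment, in bin order
```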
right
Which bin should a mark that falls exactly on a bin edge be placed in?
Alex got 70 and he is placed in the 50 to 70 segment, but he could also be placed in the 70 to 100 segment. This is controlled by the right option. By default right=True, so each bin includes its right edge and a MATH mark of 70 falls in the 50 to 70 segment. If we set right=False, each bin includes its left edge instead and a mark of 70 falls in the 70 to 100 segment.
NAME ID MATH ENGLISH my_cut
0 Ravi 1 80 80 First
1 Raju 2 40 70 Fail
2 Alex 3 70 40 Second
3 Ron 4 70 50 Second
4 King 5 70 60 Second
5 Jack 6 30 30 Fail
NAME ID MATH ENGLISH my_cut
0 Ravi 1 80 80 First
1 Raju 2 40 70 Second
2 Alex 3 70 40 Second
3 Ron 4 70 50 Second
4 King 5 70 60 Second
5 Jack 6 30 30 Fail
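A sketch comparing right=True and right=False, assuming the hypothetical MATH marks and the edges 1, 50, 70, 100 with the labels Fail, Second and First. Only the marks sitting exactly on the edge 70 (Alex, Ron and King) change segment.

```python
import pandas as pd

# Hypothetical sample data: MATH marks from the tables in this tutorial
marks = pd.Series([80, 40, 70, 70, 70, 30], name="MATH")
edges = [1, 50, 70, 100]
labels = ["Fail", "Second", "First"]

# right=True (default): (1, 50], (50, 70], (70, 100] -> 70 is "Second"
closed_right = pd.cut(marks, bins=edges, labels=labels)
# right=False: [1, 50), [50, 70), [70, 100) -> 70 is "First"
closed_left = pd.cut(marks, bins=edges, labels=labels, right=False)

print(closed_right.tolist())
print(closed_left.tolist())
```

Note that with right=False the last bin excludes its right edge, so a mark of exactly 100 would become NaN.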
include_lowest
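Default value is False. It sets whether the first interval should be left-inclusive. In the first output below, Jack's mark of 30 sits exactly on the lowest edge, so with the default include_lowest=False it falls outside (30.0, 60.0] and becomes NaN.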
NAME ID MATH ENGLISH my_cut
0 Ravi 1 80 80 (60.0, 80.0]
1 Raju 2 40 70 (30.0, 60.0]
2 Alex 3 70 40 (60.0, 80.0]
3 Ron 4 70 50 (60.0, 80.0]
4 King 5 70 60 (60.0, 80.0]
5 Jack 6 30 30 NaN
NAME ID MATH ENGLISH my_cut
0 Ravi 1 80 80 (50.0, 100.0]
1 Raju 2 40 70 NaN
2 Alex 3 70 40 (50.0, 100.0]
3 Ron 4 70 50 (50.0, 100.0]
4 King 5 70 60 (50.0, 100.0]
5 Jack 6 30 30 NaN
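A sketch of include_lowest, assuming the hypothetical MATH marks and the edges 30, 60, 80 seen in the first output above. With include_lowest=True, the mark 30 that sits on the lowest edge is placed in the first interval instead of becoming NaN; the displayed left edge of that interval is lowered slightly according to precision.

```python
import pandas as pd

# Hypothetical sample data: MATH marks from the tables in this tutorial
marks = pd.Series([80, 40, 70, 70, 70, 30], name="MATH")

# Default include_lowest=False: first interval is (30, 60], so 30 is NaN
without_lowest = pd.cut(marks, bins=[30, 60, 80])
# include_lowest=True: 30 is placed in the first interval
with_lowest = pd.cut(marks, bins=[30, 60, 80], include_lowest=True)

print(without_lowest.iloc[5], "|", with_lowest.iloc[5])   # Jack's mark of 30
print(with_lowest.cat.categories[0])
```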