We will create a DataFrame from 1-D and 2-D NumPy arrays (numpy.ndarray).
A NumPy array holds a single data type, so we will create separate arrays for different types of data and then combine them into one DataFrame holding the names of the students (strings) and their marks (numbers).
Our final DataFrame will have a NAMES column (strings) and the marks in two subjects, MATH and ENGLISH (integers).
Let us create a 1-D array to store the marks of four students in MATH. While creating the DataFrame we set the column name to MATH.
import pandas as pd
import numpy as np
my_np=np.array([30,40,50,45]) # Numpy array
# print(my_np) # display the array
my_pd=pd.DataFrame(data=my_np,columns=['MATH'])
print(my_pd)
Output
MATH
0 30
1 40
2 50
3 45
Using 2-D array to create the DataFrame
We will use a 2-D array to create the DataFrame. Here we do not set the column names, so pandas uses the default integer labels 0 to 3.
import pandas as pd
import numpy as np
my_np1=np.array([[30,40,50,45],
[50,60,50,55]])
my_pd=pd.DataFrame(data=[my_np1[0],my_np1[1]])
print(my_pd)
Output
0 1 2 3
0 30 40 50 45
1 50 60 50 55
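A 2-D array can also be passed to the constructor as it is, without listing its rows one by one. The short sketch below compares the two approaches; the names marks, df_direct and df_from_rows are chosen only for this illustration.

```python
import numpy as np
import pandas as pd

marks = np.array([[30, 40, 50, 45],
                  [50, 60, 50, 55]])

# Pass the 2-D array directly: each inner array becomes one row.
df_direct = pd.DataFrame(marks)

# Build the same table from a list of the two rows, as in the code above.
df_from_rows = pd.DataFrame(data=[marks[0], marks[1]])

print(df_direct.shape)  # (2, 4)
# Compare the values held by the two DataFrames.
print((df_direct.to_numpy() == df_from_rows.to_numpy()).all())  # True
```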
Adding columns
Before adding the column names we transpose the DataFrame, so that each subject becomes a column (four rows and two columns).
import pandas as pd
import numpy as np
my_np1=np.array([[30,40,50,45],
[50,60,50,55]])
# transpose the Dataframe
my_pd=pd.DataFrame(data=[my_np1[0],my_np1[1]]).T
my_pd.columns=['MATH','ENGLISH']
print(my_pd)
Output
MATH ENGLISH
0 30 50
1 40 60
2 50 50
3 45 55
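As an alternative sketch, the NumPy array itself can be transposed with .T before the DataFrame is created, and the column names can be passed through the columns parameter in the same call. The variable names marks and df are used only for this illustration.

```python
import numpy as np
import pandas as pd

marks = np.array([[30, 40, 50, 45],   # MATH
                  [50, 60, 50, 55]])  # ENGLISH

# marks.T has shape (4, 2): one row per student, one column per subject.
df = pd.DataFrame(marks.T, columns=['MATH', 'ENGLISH'])
print(df)
```

This keeps the marks in a single numeric array until the DataFrame is built, so no separate transpose of the DataFrame is needed.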
Here we have the marks of two subjects in our DataFrame. Let us add a string column for the student names.
import pandas as pd
import numpy as np
my_np1=np.array([[30,40,50,45],
[50,60,50,55]])
my_names=np.array(['Alex','Ron','Jack','King'])
my_pd=pd.DataFrame(data=[my_names,my_np1[0],my_np1[1]]).T
my_pd.columns=['NAMES','MATH','ENGLISH']
print(my_pd)
Output
NAMES MATH ENGLISH
0 Alex 30 50
1 Ron 40 60
2 Jack 50 50
3 King 45 55
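When strings and integers are combined in one list and then transposed, pandas will typically store every column with the generic object dtype rather than an integer dtype. The sketch below inspects the dtypes and converts the marks columns explicitly with astype(); treating 'int64' as the target dtype is an assumption made for this example.

```python
import numpy as np
import pandas as pd

marks = np.array([[30, 40, 50, 45],
                  [50, 60, 50, 55]])
names = np.array(['Alex', 'Ron', 'Jack', 'King'])

df = pd.DataFrame(data=[names, marks[0], marks[1]]).T
df.columns = ['NAMES', 'MATH', 'ENGLISH']
print(df.dtypes)  # inspect what pandas inferred for each column

# Convert the marks columns to an integer dtype explicitly.
df = df.astype({'MATH': 'int64', 'ENGLISH': 'int64'})
print(df.dtypes)
```

Numeric dtypes matter once we start calculating with the columns, for example when using methods such as mean() or sum().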
Adding new column to DataFrame
In the code above we have two integer columns showing the marks in two subjects. We can add one more column showing the total marks by adding the two columns with the + operator.
import pandas as pd
import numpy as np
my_np1=np.array([[30,40,50,45],
[50,60,50,55]])
my_names=np.array(['Alex','Ron','Jack','King'])
my_pd=pd.DataFrame(data=[my_names,my_np1[0],my_np1[1]]).T
my_pd.columns=['NAMES','MATH','ENGLISH']
my_pd['Total']=my_pd['MATH'] + my_pd['ENGLISH']
print(my_pd)
Output
NAMES MATH ENGLISH Total
0 Alex 30 50 80
1 Ron 40 60 100
2 Jack 50 50 100
3 King 45 55 100
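With more subjects, writing out every column in the addition becomes long. One possible alternative is a row-wise sum() over a list of subject columns, sketched below. Here the marks are loaded from the transposed array so that the columns are numeric; the list name subjects is introduced only for this example.

```python
import numpy as np
import pandas as pd

names = np.array(['Alex', 'Ron', 'Jack', 'King'])
marks = np.array([[30, 40, 50, 45],   # MATH
                  [50, 60, 50, 55]])  # ENGLISH

subjects = ['MATH', 'ENGLISH']
df = pd.DataFrame(marks.T, columns=subjects)
df.insert(0, 'NAMES', names)  # put the names in the first column

# axis=1 adds across the columns, giving one total per student.
df['Total'] = df[subjects].sum(axis=1)
print(df)
```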
We have used one 2-D array for the two subjects. However, it is often more convenient to use multiple 1-D arrays, one for each subject, so the code can be scaled up to include more subjects.
import pandas as pd
import numpy as np
my_math=np.array([30,40,50,45])
my_english=np.array([50,60,50,55])
my_names=np.array(['Alex','Ron','Jack','King'])
my_pd=pd.DataFrame(data=[my_names,my_math,my_english]).T
my_pd.columns=['NAMES','MATH','ENGLISH']
print(my_pd)
Output
NAMES MATH ENGLISH
0 Alex 30 50
1 Ron 40 60
2 Jack 50 50
3 King 45 55
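Separate 1-D arrays can also be passed as a dictionary, where each key becomes a column name and each array becomes that column. This is a sketch of that alternative; it needs no transpose, and each column keeps the dtype of its own array.

```python
import numpy as np
import pandas as pd

my_math = np.array([30, 40, 50, 45])
my_english = np.array([50, 60, 50, 55])
my_names = np.array(['Alex', 'Ron', 'Jack', 'King'])

# Each key becomes a column name and each array becomes that column.
my_pd = pd.DataFrame({
    'NAMES': my_names,
    'MATH': my_math,
    'ENGLISH': my_english,
})
print(my_pd)
print(my_pd.dtypes)  # the marks columns stay integer
```

To include another subject, add one more key and array to the dictionary.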
Removing index
my_pd=pd.DataFrame(data=[my_names,my_math,my_english]).T
my_pd.columns=['NAMES','MATH','ENGLISH']
print(my_pd)
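# to_string(index=False) returns the table as text without the index column;
# the DataFrame itself still keeps its index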
print( my_pd.to_string(index=False))
Output
NAMES MATH ENGLISH
0 Alex 30 50
1 Ron 40 60
2 Jack 50 50
3 King 45 55
NAMES MATH ENGLISH
Alex 30 50
Ron 40 60
Jack 50 50
King 45 55
Using random integers
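Create a DataFrame from NumPy arrays of random integers. Here we build a marks DataFrame for 5 students (rows) and two subjects (columns). np.random.randint(40,100,size=n) returns n integers from 40 to 99, as the upper limit is excluded. You can extend this to more columns (subjects) and rows (students). The values change on every run, so your output will differ from the one shown here.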
import numpy as np
import pandas as pd
n=5 # Number of students
my_math=np.random.randint(40,100,size=n)
my_english=np.random.randint(40,100,size=n)
my_pd=pd.DataFrame(data=[my_math,my_english]).T
my_pd.columns=['MATH','ENG']
print(my_pd)
Output
MATH ENG
0 76 91
1 53 40
2 69 60
3 47 67
4 73 91
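If the same random marks are needed on every run, a seeded generator can be used. The sketch below uses np.random.default_rng() and creates the marks for both subjects in one call; the seed value 42 is an arbitrary choice made for this example.

```python
import numpy as np
import pandas as pd

n = 5  # Number of students
rng = np.random.default_rng(seed=42)  # arbitrary seed, gives repeatable values

# One call produces an (n, 2) array: a row per student, a column per subject.
marks = rng.integers(40, 100, size=(n, 2))  # upper limit 100 is excluded
my_pd = pd.DataFrame(marks, columns=['MATH', 'ENG'])
print(my_pd)
```

Since the array already has one column per subject, no transpose is required here.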
We can add one more column for the student ID. Here np.arange(1,n+1) generates the IDs 1 to n.
import numpy as np
import pandas as pd
n=5 # Number of students
my_id=np.arange(1,n+1)
my_math=np.random.randint(40,100,size=n)
my_english=np.random.randint(40,100,size=n)
my_pd=pd.DataFrame(data=[my_id,my_math,my_english]).T
my_pd.columns=['ID','MATH','ENG']
print(my_pd.to_string(index=False))
Output
ID MATH ENG
1 65 58
2 58 97
3 75 90
4 42 69
5 55 51