# DataFrame using NumPy arrays

We will create a DataFrame by using 1-D and 2-D NumPy arrays (numpy ndarray).

A NumPy array can hold only one type of data, so we will create separate arrays for the different types of data and then combine them into one DataFrame with the names of the students (string) and their marks (numbers).

Our final DataFrame will have NAMES (string) and the marks in two subjects, MATH and ENGLISH (integer).
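
The single-type rule can be checked quickly. This is a minimal sketch, and the variable names `mixed` and `numbers` are only illustrative. When strings and integers are placed in one array, NumPy converts every element to a string.

```python
import numpy as np

# Strings and integers in one array: NumPy stores all elements as strings
mixed = np.array(['Alex', 30, 50])
print(mixed.dtype.kind)    # 'U' means a Unicode string array
print(mixed[1])            # the integer 30 is now the string '30'

# An array of integers only keeps an integer dtype
numbers = np.array([30, 40, 50, 45])
print(numbers.dtype.kind)  # 'i' means a signed integer array
```

This is why the names and the marks are kept in separate arrays in the examples that follow.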

Let us create one 1-D array to store the marks of four students in MATH. While creating the DataFrame we pass the column name MATH through the columns parameter.
import pandas as pd
import numpy as np
my_np=np.array([30,40,50,45]) # Numpy array
# print(my_np) # display the array
my_pd=pd.DataFrame(data=my_np,columns=['MATH'])
print(my_pd)
Output
   MATH
0    30
1    40
2    50
3    45
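
As a quick check (a sketch that reuses the same marks, with `back` as an illustrative name), the column keeps the integer type of the array, and `to_numpy()` returns the values as a NumPy array again.

```python
import numpy as np
import pandas as pd

my_np = np.array([30, 40, 50, 45])
my_pd = pd.DataFrame(data=my_np, columns=['MATH'])

print(my_pd.shape)               # (4, 1): four rows and one column
print(my_pd['MATH'].dtype.kind)  # 'i': the column holds integers
back = my_pd['MATH'].to_numpy()  # column values as a NumPy array
print(back)
```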

## Using 2-D array to create the DataFrame

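We will use one 2-D array to create the DataFrame. Each row of the array becomes one row of the DataFrame. Here we will not add the column names, so pandas uses the default labels 0, 1, 2 and 3.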
import pandas as pd
import numpy as np
my_np1=np.array([[30,40,50,45],
[50,60,50,55]])
my_pd=pd.DataFrame(data=[my_np1[0],my_np1[1]])
print(my_pd)
Output
    0   1   2   3
0  30  40  50  45
1  50  60  50  55
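
The 2-D array can also be passed to `pd.DataFrame()` directly, without splitting it into rows first. This sketch compares both ways; `my_pd_direct` is an illustrative name that is not used elsewhere on this page.

```python
import numpy as np
import pandas as pd

my_np1 = np.array([[30, 40, 50, 45],
                   [50, 60, 50, 55]])

my_pd = pd.DataFrame(data=[my_np1[0], my_np1[1]])  # list of rows
my_pd_direct = pd.DataFrame(data=my_np1)           # the 2-D array as it is
print(my_pd_direct)

# Both DataFrames hold the same values in the same positions
print((my_pd.to_numpy() == my_pd_direct.to_numpy()).all())
```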

Before adding the column names we will transpose the DataFrame, so that each subject becomes one column (four rows and two columns).
import pandas as pd
import numpy as np
my_np1=np.array([[30,40,50,45],
[50,60,50,55]])
# transpose the DataFrame: rows become columns
my_pd=pd.DataFrame(data=[my_np1[0],my_np1[1]]).T
my_pd.columns=['MATH','ENGLISH']
print(my_pd)
Output
   MATH  ENGLISH
0    30       50
1    40       60
2    50       50
3    45       55
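
The same result can be reached by transposing the NumPy array before creating the DataFrame. The following is a sketch of that alternative; it uses the `.T` attribute of the array and passes the column names at creation time.

```python
import numpy as np
import pandas as pd

my_np1 = np.array([[30, 40, 50, 45],
                   [50, 60, 50, 55]])

# Transpose the array first: shape (2, 4) becomes (4, 2)
my_pd = pd.DataFrame(data=my_np1.T, columns=['MATH', 'ENGLISH'])
print(my_pd)
```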
Now our DataFrame has the marks of two subjects. Let us add one string column to include the student names.
import pandas as pd
import numpy as np
my_np1=np.array([[30,40,50,45],
[50,60,50,55]])
my_names=np.array(['Alex','Ron','Jack','King'])
my_pd=pd.DataFrame(data=[my_names,my_np1[0],my_np1[1]]).T
my_pd.columns=['NAMES','MATH','ENGLISH']
print(my_pd)
Output
  NAMES MATH ENGLISH
0  Alex   30      50
1   Ron   40      60
2  Jack   50      50
3  King   45      55

## Adding new column to DataFrame

In the above code we have two columns showing the marks in two subjects. We can add one more column to show the total marks. We will add the two columns by using the + operator for this.
import pandas as pd
import numpy as np
my_np1=np.array([[30,40,50,45],
[50,60,50,55]])
my_names=np.array(['Alex','Ron','Jack','King'])
my_pd=pd.DataFrame(data=[my_names,my_np1[0],my_np1[1]]).T
my_pd.columns=['NAMES','MATH','ENGLISH']
my_pd['Total']=my_pd['MATH'] + my_pd['ENGLISH']
print(my_pd)
Output
  NAMES MATH ENGLISH Total
0  Alex   30      50    80
1   Ron   40      60   100
2  Jack   50      50   100
3  King   45      55   100
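
The total can also be calculated with `sum(axis=1)`, which adds the selected columns row by row. This is a sketch under one assumption: because names and marks are combined before the transpose, the marks columns are usually stored with the object dtype, so they are converted to integers first by using `astype()`.

```python
import numpy as np
import pandas as pd

my_np1 = np.array([[30, 40, 50, 45],
                   [50, 60, 50, 55]])
my_names = np.array(['Alex', 'Ron', 'Jack', 'King'])
my_pd = pd.DataFrame(data=[my_names, my_np1[0], my_np1[1]]).T
my_pd.columns = ['NAMES', 'MATH', 'ENGLISH']
print(my_pd.dtypes)  # mixing names and marks usually leaves the columns as object

# Convert the two marks columns to integers
my_pd = my_pd.astype({'MATH': int, 'ENGLISH': int})
# sum(axis=1) adds the listed columns row by row
my_pd['Total'] = my_pd[['MATH', 'ENGLISH']].sum(axis=1)
print(my_pd)
```

With more subjects, only the list of column names inside `my_pd[[...]]` has to grow.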
We have used one 2-D array for two subjects. However, it is better to use multiple 1-D arrays, one for each subject, so the code can be scaled up to include more subjects.
import pandas as pd
import numpy as np
my_math=np.array([30,40,50,45])
my_english=np.array([50,60,50,55])
my_names=np.array(['Alex','Ron','Jack','King'])

my_pd=pd.DataFrame(data=[my_names,my_math,my_english]).T
my_pd.columns=['NAMES','MATH','ENGLISH']
print(my_pd)
Output
  NAMES MATH ENGLISH
0  Alex   30      50
1   Ron   40      60
2  Jack   50      50
3  King   45      55
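
Another common way to combine several 1-D arrays is a dictionary, where each key becomes a column name. This sketch is an alternative to the list and transpose approach above, not a replacement for it; one difference is that each column keeps the type of its own array.

```python
import numpy as np
import pandas as pd

my_math = np.array([30, 40, 50, 45])
my_english = np.array([50, 60, 50, 55])
my_names = np.array(['Alex', 'Ron', 'Jack', 'King'])

# Each key is a column name, each value is one 1-D array
my_pd = pd.DataFrame({'NAMES': my_names,
                      'MATH': my_math,
                      'ENGLISH': my_english})
print(my_pd)
print(my_pd['MATH'].dtype.kind)  # 'i': the marks stay as integers
```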

## Removing index

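By default print() shows the index as the first column on the left. We can use to_string(index=False) to display the DataFrame without the index. The DataFrame itself is not changed. This code uses the arrays my_names, my_math and my_english created above.
my_pd=pd.DataFrame(data=[my_names,my_math,my_english]).T
my_pd.columns=['NAMES','MATH','ENGLISH']
print(my_pd) # with index
# display without index
print( my_pd.to_string(index=False))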
Output
  NAMES MATH ENGLISH
0  Alex   30      50
1   Ron   40      60
2  Jack   50      50
3  King   45      55
NAMES MATH ENGLISH
 Alex   30      50
  Ron   40      60
 Jack   50      50
 King   45      55
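
Note that `to_string()` returns a string and leaves the DataFrame unchanged. The sketch below shows this, and also shows `set_index()` as one option when the names themselves should act as the row labels; `text` and `by_name` are illustrative names.

```python
import numpy as np
import pandas as pd

my_math = np.array([30, 40, 50, 45])
my_english = np.array([50, 60, 50, 55])
my_names = np.array(['Alex', 'Ron', 'Jack', 'King'])

my_pd = pd.DataFrame(data=[my_names, my_math, my_english]).T
my_pd.columns = ['NAMES', 'MATH', 'ENGLISH']

text = my_pd.to_string(index=False)  # a str, my_pd is not modified
print(type(text))
print(my_pd.index.tolist())          # the index 0 to 3 is still there

by_name = my_pd.set_index('NAMES')   # new DataFrame with NAMES as the index
print(by_name)
```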

## Using random integers

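Create one DataFrame by using NumPy arrays of random integers. Here np.random.randint(40,100,size=n) returns n integers from 40 to 99 (the upper limit 100 is not included). We create a student marks DataFrame with 5 students (rows) and two subjects (columns). You can increase n to add more rows (students) and add more arrays to include more columns (subjects). As the marks are random, your output will be different from the one shown here.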
import numpy as np
import pandas as pd
n=5 # Number of students
my_math=np.random.randint(40,100,size=n)
my_english=np.random.randint(40,100,size=n)

my_pd=pd.DataFrame(data=[my_math,my_english]).T

my_pd.columns=['MATH','ENG']
print(my_pd)
Output
   MATH  ENG
0    76   91
1    53   40
2    69   60
3    47   67
4    73   91
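
If the same random marks are needed on every run, a seeded generator can be used. This is a sketch with NumPy's `default_rng()`; the seed value 42 and the name `my_marks` are arbitrary choices for illustration. Here both subjects are generated as one 2-D array of shape (n, 2), so no transpose is required.

```python
import numpy as np
import pandas as pd

n = 5  # Number of students
rng = np.random.default_rng(seed=42)  # fixed seed: same numbers on every run

# One 2-D array of shape (n, 2) with integers from 40 to 99 (100 is excluded)
my_marks = rng.integers(40, 100, size=(n, 2))
my_pd = pd.DataFrame(data=my_marks, columns=['MATH', 'ENG'])
print(my_pd)
```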
We can add one more column for the student ID. Here np.arange(1,n+1) creates the IDs from 1 to n.
import numpy as np
import pandas as pd
n=5 # Number of students
my_id=np.arange(1,n+1)

my_math=np.random.randint(40,100,size=n)
my_english=np.random.randint(40,100,size=n)

my_pd=pd.DataFrame(data=[my_id,my_math,my_english]).T

my_pd.columns=['ID','MATH','ENG']
print(my_pd.to_string(index=False)) # display without index
Output
ID  MATH  ENG
 1    65   58
 2    58   97
 3    75   90
 4    42   69
 5    55   51
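
When the marks DataFrame already exists, the ID column can also be added afterwards. This sketch uses `DataFrame.insert()`, where the first argument is the position of the new column.

```python
import numpy as np
import pandas as pd

n = 5  # Number of students
my_math = np.random.randint(40, 100, size=n)
my_english = np.random.randint(40, 100, size=n)

my_pd = pd.DataFrame(data=[my_math, my_english]).T
my_pd.columns = ['MATH', 'ENG']

# insert(loc, column, value): loc=0 places ID as the first column
my_pd.insert(0, 'ID', np.arange(1, n + 1))
print(my_pd.to_string(index=False))
```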
