DataFrame using Numpy arrays


We will create a DataFrame from 1-D and 2-D NumPy arrays (numpy.ndarray).

A DataFrame can be created from NumPy arrays. A NumPy array holds only one type of data, so we will create separate arrays for different types of data and finally combine them into one DataFrame with the names of the students (strings) and their marks (numbers).

Our final DataFrame will have a NAMES column (string) and the marks in two subjects, MATH and ENGLISH (integer).

Let us create one 1-D array to store the marks of four students in MATH only. While creating the DataFrame we will pass the column name MATH through the columns parameter.
import pandas as pd
import numpy as np
my_np=np.array([30,40,50,45]) # Numpy array
# print(my_np) # display the array 
my_pd=pd.DataFrame(data=my_np,columns=['MATH'])
print(my_pd)
Output
   MATH
0    30
1    40
2    50
3    45

Using 2-D array to create the DataFrame

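We will use one 2-D array to create the DataFrame. Here we will not add the column names, so each row of the array becomes one row of the DataFrame and the columns get the default labels 0, 1, 2 and 3.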
import pandas as pd
import numpy as np
my_np1=np.array([[30,40,50,45],
                 [50,60,50,55]])
my_pd=pd.DataFrame(data=[my_np1[0],my_np1[1]])
print(my_pd)
Output
    0   1   2   3
0  30  40  50  45
1  50  60  50  55
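The 2-D array does not have to be split into rows first. As a small sketch (the names df_rows and df_direct are used only for this illustration), passing my_np1 itself to pd.DataFrame() should give the same shape and the same values:

```python
import numpy as np
import pandas as pd

my_np1 = np.array([[30, 40, 50, 45],
                   [50, 60, 50, 55]])

# built from a list of rows, as in the example above
df_rows = pd.DataFrame(data=[my_np1[0], my_np1[1]])
# built from the 2-D array directly
df_direct = pd.DataFrame(data=my_np1)

print(df_direct.shape)  # (number of rows, number of columns)
# compare the values of both DataFrames element by element
print((df_direct.to_numpy() == df_rows.to_numpy()).all())
```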

Adding columns

Before adding the column names we will transpose the DataFrame with .T, so that each subject becomes one column. The result has four rows (students) and two columns (subjects).
import pandas as pd
import numpy as np
my_np1=np.array([[30,40,50,45],
                 [50,60,50,55]])
# transpose the DataFrame
my_pd=pd.DataFrame(data=[my_np1[0],my_np1[1]]).T
my_pd.columns=['MATH','ENGLISH']
print(my_pd)
Output
   MATH  ENGLISH
0    30       50
1    40       60
2    50       50
3    45       55
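As an alternative sketch, the NumPy array itself can be transposed with its .T attribute and the column names passed through the columns parameter in the same step. The name my_pd2 is used only for this illustration.

```python
import numpy as np
import pandas as pd

my_np1 = np.array([[30, 40, 50, 45],
                   [50, 60, 50, 55]])

# my_np1.T has 4 rows and 2 columns, one column for each subject
my_pd2 = pd.DataFrame(data=my_np1.T, columns=['MATH', 'ENGLISH'])
print(my_pd2)
```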
Now our DataFrame has the marks of two subjects. Let us add one string column to it for the names of the students.
import pandas as pd
import numpy as np
my_np1=np.array([[30,40,50,45],
                 [50,60,50,55]])
my_names=np.array(['Alex','Ron','Jack','King'])
my_pd=pd.DataFrame(data=[my_names,my_np1[0],my_np1[1]]).T
my_pd.columns=['NAMES','MATH','ENGLISH']
print(my_pd)
Output
  NAMES MATH ENGLISH
0  Alex   30      50
1   Ron   40      60
2  Jack   50      50
3  King   45      55
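One point to watch: because the names and the marks pass through the same list before the transpose, the marks columns should no longer be stored with an integer dtype. A minimal sketch to check this with the dtypes attribute and to convert the marks back with astype() (the name dtypes_before is only for this illustration):

```python
import numpy as np
import pandas as pd

my_np1 = np.array([[30, 40, 50, 45],
                   [50, 60, 50, 55]])
my_names = np.array(['Alex', 'Ron', 'Jack', 'King'])

my_pd = pd.DataFrame(data=[my_names, my_np1[0], my_np1[1]]).T
my_pd.columns = ['NAMES', 'MATH', 'ENGLISH']

dtypes_before = my_pd.dtypes  # data type of each column after the transpose
print(dtypes_before)

# convert the two marks columns to integers
my_pd = my_pd.astype({'MATH': int, 'ENGLISH': int})
print(my_pd.dtypes)
```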

Adding new column to DataFrame

In the above code we have two columns showing the integer marks in two subjects. We can add one more column to show the total marks. We will add the two columns with the + operator for this.
import pandas as pd
import numpy as np
my_np1=np.array([[30,40,50,45],
                 [50,60,50,55]])
my_names=np.array(['Alex','Ron','Jack','King'])
my_pd=pd.DataFrame(data=[my_names,my_np1[0],my_np1[1]]).T
my_pd.columns=['NAMES','MATH','ENGLISH']
my_pd['Total']=my_pd['MATH'] + my_pd['ENGLISH']
print(my_pd)
Output
  NAMES MATH ENGLISH Total
0  Alex   30      50    80
1   Ron   40      60   100
2  Jack   50      50   100
3  King   45      55   100
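The same total can also be calculated with the DataFrame method sum() along the rows (axis=1). This is a minimal sketch, assuming we first convert the marks columns to integers with astype(), since the transposed DataFrame holds mixed data:

```python
import numpy as np
import pandas as pd

my_np1 = np.array([[30, 40, 50, 45],
                   [50, 60, 50, 55]])
my_names = np.array(['Alex', 'Ron', 'Jack', 'King'])

my_pd = pd.DataFrame(data=[my_names, my_np1[0], my_np1[1]]).T
my_pd.columns = ['NAMES', 'MATH', 'ENGLISH']

# convert the marks columns to integers before the calculation
my_pd = my_pd.astype({'MATH': int, 'ENGLISH': int})
# axis=1 adds the values across the columns of each row
my_pd['Total'] = my_pd[['MATH', 'ENGLISH']].sum(axis=1)
print(my_pd)
```

With sum(axis=1) we only have to extend the list of column names when more subjects are added.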
We have used one 2-D array for two subjects. However, it is better to use multiple 1-D arrays, one for each subject, so that the code can be scaled up to include more subjects.
import pandas as pd
import numpy as np
my_math=np.array([30,40,50,45])
my_english=np.array([50,60,50,55])
my_names=np.array(['Alex','Ron','Jack','King'])

my_pd=pd.DataFrame(data=[my_names,my_math,my_english]).T
my_pd.columns=['NAMES','MATH','ENGLISH']
print(my_pd)
Output
  NAMES MATH ENGLISH
0  Alex   30      50
1   Ron   40      60
2  Jack   50      50
3  King   45      55
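With one 1-D array for each column, another option is to pass a dictionary to pd.DataFrame(), where each key is a column name and each value is an array. This sketch needs no transpose, and each column should keep the data type of its own array:

```python
import numpy as np
import pandas as pd

my_math = np.array([30, 40, 50, 45])
my_english = np.array([50, 60, 50, 55])
my_names = np.array(['Alex', 'Ron', 'Jack', 'King'])

# key: column name, value: 1-D array with the data of that column
my_pd = pd.DataFrame(data={'NAMES': my_names,
                           'MATH': my_math,
                           'ENGLISH': my_english})
print(my_pd)
print(my_pd.dtypes)  # MATH and ENGLISH remain integer columns
```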

Removing index

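By default the index is printed as the first column. To display the DataFrame without the index we can use to_string() with index=False.
import pandas as pd
import numpy as np
my_math=np.array([30,40,50,45])
my_english=np.array([50,60,50,55])
my_names=np.array(['Alex','Ron','Jack','King'])
my_pd=pd.DataFrame(data=[my_names,my_math,my_english]).T
my_pd.columns=['NAMES','MATH','ENGLISH']
print(my_pd) # with index
# remove index
print(my_pd.to_string(index=False))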
Output
  NAMES MATH ENGLISH
0  Alex   30      50
1   Ron   40      60
2  Jack   50      50
3  King   45      55
NAMES MATH ENGLISH
 Alex   30      50
  Ron   40      60
 Jack   50      50
 King   45      55
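Note that to_string() only returns the DataFrame as text; it does not change the DataFrame. A short sketch to confirm this (the name my_text is only for this illustration):

```python
import numpy as np
import pandas as pd

my_math = np.array([30, 40, 50, 45])
my_english = np.array([50, 60, 50, 55])
my_names = np.array(['Alex', 'Ron', 'Jack', 'King'])

my_pd = pd.DataFrame(data=[my_names, my_math, my_english]).T
my_pd.columns = ['NAMES', 'MATH', 'ENGLISH']

my_text = my_pd.to_string(index=False)  # returns a string
print(type(my_text))
print(list(my_pd.index))  # the index is still part of the DataFrame
```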

Using random integers

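Create one DataFrame by using NumPy arrays of random integers. Here we create a student mark DataFrame with 5 students (rows) and two subjects (columns); you can increase these to include more subjects (columns) and students (rows). np.random.randint(40,100,size=n) returns n integers from 40 to 99 (the upper limit 100 is not included), so your output will be different each time you run the code.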
import numpy as np
import pandas as pd
n=5 # Number of students 
my_math=np.random.randint(40,100,size=n)
my_english=np.random.randint(40,100,size=n)

my_pd=pd.DataFrame(data=[my_math,my_english]).T

my_pd.columns=['MATH','ENG']
print(my_pd)
Output
   MATH  ENG
0    76   91
1    53   40
2    69   60
3    47   67
4    73   91
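If the same marks are required on every run, one option is a seeded random number generator. This is a sketch using np.random.default_rng() and its integers() method; the seed value 42 and the names rng and rng2 are arbitrary choices for this illustration.

```python
import numpy as np
import pandas as pd

n = 5  # Number of students
rng = np.random.default_rng(42)  # seeded generator
my_math = rng.integers(40, 100, size=n)     # 40 to 99
my_english = rng.integers(40, 100, size=n)  # 40 to 99

my_pd = pd.DataFrame(data=[my_math, my_english]).T
my_pd.columns = ['MATH', 'ENG']
print(my_pd)

# a second generator with the same seed repeats the same marks
rng2 = np.random.default_rng(42)
print((rng2.integers(40, 100, size=n) == my_math).all())
```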
We can add one more column for the student ID. Here np.arange(1,n+1) creates the IDs from 1 to n.
import numpy as np
import pandas as pd
n=5 # Number of students 
my_id=np.arange(1,n+1)

my_math=np.random.randint(40,100,size=n)
my_english=np.random.randint(40,100,size=n)

my_pd=pd.DataFrame(data=[my_id,my_math,my_english]).T

my_pd.columns=['ID','MATH','ENG']
print(my_pd.to_string(index=False))
Output
 ID  MATH  ENG
  1    65   58
  2    58   97
  3    75   90
  4    42   69
  5    55   51