Pandas data iteration

The behavior of basic iteration over Pandas objects depends on the type. A Series is treated as array-like, so basic iteration produces its values. Other data structures, such as DataFrame and Panel (which has been removed from recent pandas releases), follow the dict-like convention of iterating over the object's keys.

In short, basic iteration (for i in object) produces:

  • Series - values
  • DataFrame - column labels
  • Panel - item labels
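As a quick check of the summary above, the following sketch iterates over a small Series and a small DataFrame. The names s and frame and their contents are made up for illustration.

```python
import pandas as pd

# Illustrative data; the names and values are assumptions, not from the article.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
frame = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})

# Iterating over a Series yields its values, not its index labels.
series_items = [item for item in s]

# Iterating over a DataFrame yields its column labels.
frame_items = [item for item in frame]

print(series_items)  # [10, 20, 30]
print(frame_items)   # ['x', 'y']
```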

Iterating over a Series

Iterating over a Series yields its values, while its keys() and items() methods behave like those of a Python dictionary.

Example:

import pandas as pd

s = pd.Series(['A','B','C'])

# Iterate over the Series values
for item in s:
    print(item)
print('\n')
# Iterate over the Series keys (index labels)
for item in s.keys():
    print(item)
print('\n')

# Iterate over (key, value) pairs, unpacked into two names
for key, value in s.items():
    print(key, value)
print('\n')
# Iterate over (key, value) pairs as tuples
for item in s.items():
    print(item)

# Note: iteritems() was removed in pandas 2.0; use items() instead

Output:

A
B
C


0
1
2


0 A
1 B
2 C


(0, 'A')
(1, 'B')
(2, 'C')
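The dictionary analogy is easier to see when the index holds labels rather than the default integers. The sketch below uses a hypothetical Series named prices; the labels and values are invented for illustration.

```python
import pandas as pd

# Hypothetical data: a Series with string labels instead of the default 0..n-1 index.
prices = pd.Series([1.5, 0.25, 3.0], index=['apple', 'plum', 'melon'])

# keys() returns the index, so this yields the labels.
labels = list(prices.keys())

# items() yields (label, value) pairs, much like dict.items().
pairs = list(prices.items())

# Unlike a dict, plain iteration yields the values rather than the keys.
values = list(prices)

print(labels)  # ['apple', 'plum', 'melon']
print(pairs)   # [('apple', 1.5), ('plum', 0.25), ('melon', 3.0)]
print(values)  # [1.5, 0.25, 3.0]
```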

Iterating over a DataFrame

Iterating over a DataFrame yields its column names.

Example:

import pandas as pd
import numpy as np

N=20

df = pd.DataFrame({
    'A': pd.date_range(start='2016-01-01',periods=N,freq='D'),
    'x': np.linspace(0,stop=N-1,num=N),
    'y': np.random.rand(N),
    'C': np.random.choice(['Low','Medium','High'],N).tolist(),
    'D': np.random.normal(100, 10, size=(N)).tolist()
    })

for col in df:
    print(col)

Running the example above produces the following output (columns appear in the order they were defined):

A
x
y
C
D

To iterate over the columns or rows of a DataFrame, you can use the following methods:

  • items() - iterates over columns as (column label, Series) pairs; named iteritems() before pandas 2.0
  • iterrows() - iterates over rows as (index, Series) pairs
  • itertuples() - iterates over rows as namedtuples

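items() (iteritems() before pandas 2.0)

Iterates over the columns, yielding each column label as the key and the column's values as a Series object. The method was called iteritems() until that name was removed in pandas 2.0.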

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4,3),columns=['col1','col2','col3'])
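# items() replaces iteritems(), which was removed in pandas 2.0
for key, value in df.items():
    print(key, value)

Running the example above produces output like the following (the values are random, so they differ between runs):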

col1 0    0.802390
1    0.324060
2    0.256811
3    0.839186
Name: col1, dtype: float64

col2 0    1.624313
1   -1.033582
2    1.796663
3    1.856277
Name: col2, dtype: float64

col3 0   -0.022142
1   -0.230820
2    1.160691
3   -0.830279
Name: col3, dtype: float64

iterrows()

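iterrows() returns an iterator that yields each index value together with a Series containing that row's data. Because each row is returned as a single Series, dtypes are not preserved across columns of mixed type.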

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4,3),columns = ['col1','col2','col3'])
for row_index, row in df.iterrows():
    print(row_index, row)

Running the example above produces output like the following:

0 col1    1.529759
col2    0.762811
col3   -0.634691
Name: 0, dtype: float64

1 col1   -0.944087
col2    1.420919
col3   -0.507895
Name: 1, dtype: float64

2 col1   -0.077287
col2   -0.858556
col3   -0.663385
Name: 2, dtype: float64

3 col1   -1.638578
col2    0.059866
col3    0.493482
Name: 3, dtype: float64
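One consequence of receiving each row as a Series is that values are upcast to a common dtype. The sketch below illustrates this with a hypothetical two-column frame; the column names count and ratio are assumptions made for the example.

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column.
df = pd.DataFrame({'count': [1, 2], 'ratio': [0.5, 1.5]})

# Each row comes back as a Series, which has a single dtype,
# so the integer in 'count' is upcast to float64.
index, row = next(df.iterrows())

print(df['count'].dtype)  # an integer dtype (int64 on most platforms)
print(row['count'])       # 1.0
print(row.dtype)          # float64
```

If the original dtypes matter, itertuples() is usually the better fit, since it returns the values column by column without forcing them into one Series.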

itertuples()

The itertuples() method returns an iterator that yields a namedtuple for each row in the DataFrame. The first element of the tuple is the row's index value, and the remaining elements are the row's values.

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4,3),columns = ['col1','col2','col3'])
for row in df.itertuples():
    print(row)

Running the example above produces output like the following:

Pandas(Index=0, col1=1.5297586201375899, col2=0.76281127433814944, col3=-0.6346908238310438)

Pandas(Index=1, col1=-0.94408735763808649, col2=1.4209186418359423, col3=-0.50789517967096232)

Pandas(Index=2, col1=-0.07728664756791935, col2=-0.85855574139699076, col3=-0.6633852507207626)

Pandas(Index=3, col1=0.65734942534106289, col2=-0.95057710432604969, col3=0.80344487462316527)
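The namedtuples can be read by attribute or by position, and the index and name parameters of itertuples() control their shape. A minimal sketch, assuming a small hypothetical frame with a string index:

```python
import pandas as pd

# Hypothetical data with string index labels.
df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 1.5]}, index=['a', 'b'])

# Fields are accessible by attribute name or by position.
first = next(df.itertuples())
print(first.Index, first.col1, first[2])  # a 1 0.5

# index=False omits the index; name=None returns plain tuples.
plain = list(df.itertuples(index=False, name=None))
print(plain)  # [(1, 0.5), (2, 1.5)]
```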

Note - Do not attempt to modify any object while iterating over it. Iteration is meant for reading: depending on the data types, the iterator returns a copy of the original data rather than a view, so changes made to it are not reflected in the original object.

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4,3),columns = ['col1','col2','col3'])

for index, row in df.iterrows():
    row['a'] = 10
print(df)

Running the example above produces output like the following (no column 'a' has been added):

        col1       col2       col3
0  -1.739815   0.735595  -0.295589
1   0.635485   0.106803   1.527922
2  -0.939064   0.547095   0.038585
3  -1.016509  -0.116580  -0.523158
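When the goal is to change the data, the usual approach is to write to the DataFrame itself rather than to the row yielded by the iterator. The sketch below is one way to do that, using a small hypothetical frame; the column name a mirrors the example above and the cell update is invented for illustration.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({'col1': [1.0, 2.0], 'col2': [3.0, 4.0]})

# Writing to the row Series inside the loop does not touch df.
for index, row in df.iterrows():
    row['a'] = 10
print('a' in df.columns)  # False

# Assigning on the DataFrame itself adds the column to every row.
df['a'] = 10

# To change a single cell, address it by label on the DataFrame.
df.at[0, 'col1'] = -1.0

print(df['a'].tolist())  # [10, 10]
print(df.at[0, 'col1'])  # -1.0
```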

Tags: Big Data Python

Posted on Fri, 27 Mar 2020 19:52:52 -0700 by phpnewbie25