Introduction to pandas 04 - Data Reading

01 Reading and Writing Data in Text Format

01-01 Files with a Header

Some data loading functions, such as pandas.read_csv, perform type inference because column data types are not part of the file format. This means that you don't have to specify which columns are floating-point, integer, Boolean, or string.
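To see type inference without needing a file on disk, here is a minimal sketch that reads a small CSV from memory. The column names and values are made up for illustration and are not from the example files used in this post.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk
csv_text = "n,x,flag,label\n1,0.5,True,foo\n2,1.5,False,bar\n"

inferred = pd.read_csv(io.StringIO(csv_text))

# Each column gets a dtype inferred from its contents
print(inferred.dtypes)
```

No dtype was passed anywhere; pandas worked out the integer, floating-point, and Boolean columns on its own.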

read_csv can read a comma-delimited file into a DataFrame:

import pandas as pd

df = pd.read_csv(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\examples\ex1.csv')
print(df)

You can also use read_table and specify a delimiter.

df2 = pd.read_table(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\examples\ex1.csv',
	sep=',')
print(df2)

01-02 Files without a Header

Some files do not have a header row. To read such a file, you have two options: let pandas assign default column names automatically, or specify the column names yourself.

# Option 1: let pandas assign default integer column names
data = pd.read_csv(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\examples\ex2.csv', header=None)
print(data)

# Option 2: supply the column names yourself
data1 = pd.read_csv(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\examples\ex2.csv', names=['a', 'b', 'c', 'd', 'message'])
print(data1)
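The two options can be compared side by side on in-memory data. This is only a sketch; the rows below are hypothetical stand-ins for a headerless file.

```python
import io
import pandas as pd

# Hypothetical headerless data
raw = "1,2,3,4,hello\n5,6,7,8,world\n"

auto = pd.read_csv(io.StringIO(raw), header=None)
named = pd.read_csv(io.StringIO(raw), names=["a", "b", "c", "d", "message"])

print(list(auto.columns))   # integer labels assigned by pandas
print(list(named.columns))  # the names passed in
```

In both cases the first line of the data is kept as a data row rather than consumed as a header.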

Suppose you want the message column to be the index of the returned DataFrame. You can either specify the column at position 4 as the index or pass 'message' to the index_col parameter.

names = ['a', 'b', 'c', 'd', 'message']
data2 = pd.read_csv(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\examples\ex2.csv', names=names, index_col='message')
print(data2)
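As a quick check that the two forms of index_col are interchangeable, the sketch below selects the same column once by name and once by position. The data is a hypothetical in-memory stand-in for the file.

```python
import io
import pandas as pd

# Hypothetical headerless data
raw = "1,2,3,4,hello\n5,6,7,8,world\n"
names = ["a", "b", "c", "d", "message"]

by_name = pd.read_csv(io.StringIO(raw), names=names, index_col="message")
by_position = pd.read_csv(io.StringIO(raw), names=names, index_col=4)

# Both calls move the 'message' column into the index
print(by_name.index.tolist())
print(by_position.index.tolist())
```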

To form a hierarchical index from multiple columns, pass a list of column numbers or column names:

parsed = pd.read_csv(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\examples\csv_mindex.csv',index_col=['key1','key2'])
print(parsed)
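Here is a small self-contained sketch of the same idea. The key and value columns are invented for illustration and do not reproduce csv_mindex.csv.

```python
import io
import pandas as pd

# Hypothetical data with two key columns
raw = "key1,key2,value1,value2\none,a,1,2\none,b,3,4\ntwo,a,5,6\n"

parsed_demo = pd.read_csv(io.StringIO(raw), index_col=["key1", "key2"])

# The index now has two levels, so rows are addressed by a tuple
print(parsed_demo.index.nlevels)
print(parsed_demo.loc[("one", "b"), "value1"])
```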

When fields are separated by a variable amount of whitespace, you can pass a regular expression as the delimiter.

# Use a raw string so that \s is not treated as a string escape sequence
result = pd.read_table(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\examples\ex3.csv', sep=r'\s+')
print(result)

Because the number of column names is one less than the number of columns, read_table infers that the first column is the DataFrame index.
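Both points, the whitespace regular expression and the inferred index, can be seen in a minimal sketch. The text below is hypothetical; its header has three names while each data row has four fields.

```python
import io
import pandas as pd

# Hypothetical whitespace-separated text: 3 header names, 4 fields per row
raw = (
    "A B C\n"
    "aaa 0.5   1.5 2.0\n"
    "bbb   3.0 4.5    6.25\n"
)

ws_result = pd.read_csv(io.StringIO(raw), sep=r"\s+")

print(ws_result.index.tolist())    # the extra first field became the index
print(ws_result.columns.tolist())
```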

# Skip the first, third, and fourth lines of the file (skiprows takes 0-based line numbers)
skip_res = pd.read_csv(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\examples\ex4.csv', skiprows=[0, 2, 3])
print(skip_res)
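To make the effect of skiprows concrete, here is a sketch with hypothetical comment lines placed at file lines 1, 3, and 4, which are the 0-based positions 0, 2, and 3.

```python
import io
import pandas as pd

# Hypothetical file content with stray comment lines
raw = (
    "# a comment line\n"
    "a,b,c,d,message\n"
    "# another comment\n"
    "# and one more\n"
    "1,2,3,4,hello\n"
    "5,6,7,8,world\n"
)

skipped = pd.read_csv(io.StringIO(raw), skiprows=[0, 2, 3])
print(skipped)
```

After the skipped lines are dropped, the first remaining line is used as the header.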

02 Reading Text Files in Chunks

# Make the output display more compact
pd.options.display.max_rows = 10

res = pd.read_csv(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\examples\ex6.csv')
print(res)

If you want to read only the first few rows and avoid reading the entire file, specify nrows.

five_rows = pd.read_csv(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\examples\ex6.csv', nrows=5)
print(five_rows)

To read files in blocks, you can specify chunksize as the number of rows per block:

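chunker = pd.read_csv(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\examples\ex6.csv', chunksize=1000)
# Start from an empty Series with an explicit dtype to hold the running totals
tot = pd.Series([], dtype='float64')
for piece in chunker:
    # Add this chunk's counts; a key missing on either side is treated as 0
    tot = tot.add(piece['key'].value_counts(), fill_value=0)
tot = tot.sort_values(ascending=False)
print(tot[:10])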
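The same aggregation pattern can be tried on a tiny in-memory example. This is a sketch with made-up keys and a chunk size of 3, chosen only so that the data spans several chunks.

```python
import io
import pandas as pd

# Hypothetical data: one 'key' column with repeated letters
raw = "key\n" + "\n".join(["a", "b", "a", "c", "a", "b", "a"]) + "\n"

counts = pd.Series(dtype="float64")
for piece in pd.read_csv(io.StringIO(raw), chunksize=3):
    # Each piece is an ordinary DataFrame holding at most 3 rows
    counts = counts.add(piece["key"].value_counts(), fill_value=0)

counts = counts.sort_values(ascending=False)
print(counts)
```

Only one chunk is held in memory at a time, which is the point of reading in blocks.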

The TextFileReader object that read_csv returns when chunksize is given also has a get_chunk method, which lets you read blocks of any size you choose.
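A minimal sketch of get_chunk, assuming hypothetical in-memory data. Passing iterator=True makes read_csv return the reader object without fixing a chunk size up front.

```python
import io
import pandas as pd

# Hypothetical data with six rows
raw = "key,value\n" + "".join(f"k{i},{i}\n" for i in range(6))

reader = pd.read_csv(io.StringIO(raw), iterator=True)
first = reader.get_chunk(2)    # the first 2 rows
second = reader.get_chunk(3)   # the next 3 rows, continuing where the first stopped
reader.close()

print(first)
print(second)
```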

03 Writing Data to Text Format

The to_csv method can be used to export data to comma-separated files.

data.to_csv("out.csv")

Other separators can also be used.

import sys

# Write to sys.stdout (the console) using '|' as the delimiter
data.to_csv(sys.stdout, sep='|')

Missing values appear as empty strings in the output. You can represent them with another marker value instead.

data.to_csv(sys.stdout,na_rep='NULL')
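To inspect the effect of sep and na_rep together, the sketch below writes to an in-memory buffer instead of the console. The frame and its missing values are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical frame with one missing number and one missing string
frame = pd.DataFrame(
    {"a": [1, 5], "b": [2.0, float("nan")], "message": ["hello", None]}
)

buf = io.StringIO()
frame.to_csv(buf, sep="|", na_rep="NULL")
lines = buf.getvalue().splitlines()

for line in lines:
    print(line)
```

Both the NaN and the None are written as NULL, and the fields are separated by the pipe character.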

Writing the row and column labels can be disabled:

data.to_csv(sys.stdout,index=False,header=False)

You can also write only a subset of the columns, in an order you specify:

data.to_csv(sys.stdout,index=False, columns=[1,2,3])
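The label and column options can be checked the same way, by writing to a buffer. This is a sketch on a hypothetical three-column frame.

```python
import io
import pandas as pd

# Hypothetical frame with named columns
frame = pd.DataFrame({"a": [1, 5], "b": [2, 6], "c": [3, 7]})

# Subset of columns in a chosen order, without the row labels
subset_buf = io.StringIO()
frame.to_csv(subset_buf, index=False, columns=["c", "a"])
subset_lines = subset_buf.getvalue().splitlines()

# Neither row labels nor the header line
bare_buf = io.StringIO()
frame.to_csv(bare_buf, index=False, header=False)
bare_lines = bare_buf.getvalue().splitlines()

print(subset_lines)
print(bare_lines)
```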

Series also has a to_csv method.
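A short sketch of Series.to_csv on a made-up Series. Here header=False is passed explicitly so that only the index and values are written.

```python
import io
import pandas as pd

# Hypothetical labeled Series
ts = pd.Series([0, 1, 2], index=["x", "y", "z"], name="value")

series_buf = io.StringIO()
ts.to_csv(series_buf, header=False)
series_lines = series_buf.getvalue().splitlines()

print(series_lines)
```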

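JSON strings are converted to Python objects with the json.loads function, and Python objects are converted back to JSON with json.dumps. (json.load, without the trailing s, reads JSON from a file object rather than from a string.)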
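A minimal round trip with the standard-library json module looks like this. The JSON document is hypothetical.

```python
import json

# Hypothetical JSON document held in a string
obj = '{"name": "Ann", "pet": null, "siblings": [{"name": "Bob", "age": 30}]}'

result = json.loads(obj)     # JSON string -> Python dict
asjson = json.dumps(result)  # Python dict -> JSON string

print(result["siblings"][0]["age"])
print(asjson)
```

JSON null becomes Python None on the way in and is written back as null on the way out.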

Posted on Tue, 10 Sep 2019 18:50:30 -0700 by pdaoust