Introduction to pandas 04 - Data Reading

01 Reading and Writing Data in Text Format

01-01 Files with a Header Row

Some data-loading functions, such as pandas.read_csv, perform type inference, because the column data types are not stored in the text format itself. This means you don't have to specify which columns are numeric, integer, Boolean, or string.

For example, we can use read_csv to read a comma-separated file into a DataFrame:

import pandas as pd

df = pd.read_csv(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\examples\ex1.csv')


You can also use read_table, whose default delimiter is a tab, and specify the comma delimiter explicitly:

df2 = pd.read_table(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\examples\ex1.csv', sep=',')

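Since the example files are not reproduced here, the following sketch uses a small hypothetical CSV held in memory (via io.StringIO) as a stand-in for ex1.csv. It shows that read_csv infers an integer dtype for the numeric columns while keeping message as text, and that read_table with sep=',' parses the text into the same DataFrame.

```python
import io

import pandas as pd

# Hypothetical stand-in for ex1.csv: a header row plus three data rows
csv_text = """a,b,c,d,message
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo
"""

# read_csv splits on commas by default and infers a dtype for each column
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)

# read_table splits on tabs by default, so the comma is passed explicitly
df2 = pd.read_table(io.StringIO(csv_text), sep=',')
print(df.equals(df2))  # both calls parse the text identically
```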

01-02 Files without a Header Row

Some files do not have a header row. When reading such a file, you can either let pandas assign default column names or specify the column names yourself.

data = pd.read_csv(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\examples\ex2.csv',header=None)

data1 = pd.read_csv(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\examples\ex2.csv',names=['a','b','c','d','message'])
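To compare the two options without the original file, this sketch feeds a hypothetical headerless stand-in for ex2.csv through io.StringIO. With header=None the columns are numbered from 0; with names=... the given labels are used, and in both cases the first line is kept as data rather than consumed as a header.

```python
import io

import pandas as pd

# Hypothetical stand-in for ex2.csv: data rows only, no header line
raw = "1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,foo\n"

# header=None: pandas numbers the columns 0, 1, 2, ...
default_cols = pd.read_csv(io.StringIO(raw), header=None)
print(default_cols.columns.tolist())

# names=...: the given labels are used, and the first row stays data
named = pd.read_csv(io.StringIO(raw), names=['a', 'b', 'c', 'd', 'message'])
print(named.columns.tolist())
```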

Suppose you want the message column to become the index of the returned DataFrame. You can either pass its position, index_col=4, or pass the name 'message' to the index_col parameter:

names = ['a', 'b', 'c', 'd', 'message']
data2 = pd.read_csv(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\examples\ex2.csv', names=names, index_col='message')
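The two ways of choosing the index column should give the same result. This sketch checks that on hypothetical in-memory data standing in for ex2.csv.

```python
import io

import pandas as pd

raw = "1,2,3,4,hello\n5,6,7,8,world\n"  # hypothetical ex2.csv contents
names = ['a', 'b', 'c', 'd', 'message']

# Select the index column by name ...
by_name = pd.read_csv(io.StringIO(raw), names=names, index_col='message')
# ... or by its zero-based position
by_pos = pd.read_csv(io.StringIO(raw), names=names, index_col=4)

print(by_name)
print(by_name.equals(by_pos))
```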

To build a hierarchical index from multiple columns, pass a list of column numbers or column names:

parsed = pd.read_csv(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\examples\csv_mindex.csv',index_col=['key1','key2'])

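Here is a self-contained sketch of the hierarchical-index case, using hypothetical in-memory data shaped like csv_mindex.csv. It also shows that column positions work as well as names, and that selecting on the outer key returns all of that key's rows.

```python
import io

import pandas as pd

# Hypothetical stand-in for csv_mindex.csv
raw = """key1,key2,value1,value2
one,a,1,2
one,b,3,4
two,a,9,10
two,b,11,12
"""

# Two index columns produce a two-level MultiIndex
parsed = pd.read_csv(io.StringIO(raw), index_col=['key1', 'key2'])
print(parsed)

# Positions work too: index_col=[0, 1] selects the same columns
same = pd.read_csv(io.StringIO(raw), index_col=[0, 1])

# Selecting on the outer level returns all rows for that key
print(parsed.loc['one'])
```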

When fields are separated by a variable amount of whitespace, you can pass a regular expression such as \s+ as the separator:

result = pd.read_table(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\examples\ex3.csv', sep=r'\s+')

Because there is one fewer column name than there are fields in each data row, read_table infers that the first column should be the DataFrame's index.
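The sketch below reproduces that inference on hypothetical whitespace-aligned text: the header has three names but every data row has four fields, so the leading field of each row becomes the index.

```python
import io

import pandas as pd

# Hypothetical whitespace-separated text: 3 header names, 4 fields per data row
raw = """A     B      C
aaa   0.5   -1.25   2.0
bbb  -0.75   3.5    0.25
ccc   1.0    0.0   -4.5
"""

# r'\s+' matches one or more whitespace characters between fields
result = pd.read_table(io.StringIO(raw), sep=r'\s+')
print(result)
```

The raw-string prefix on r'\s+' keeps Python from treating \s as a string escape, which recent Python versions warn about.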

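To skip particular lines of the file, pass their zero-based row numbers to skiprows:

# Skip the first, third, and fourth lines
skip_res = pd.read_csv(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\examples\ex4.csv', skiprows=[0, 2, 3])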
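As an illustration, this sketch uses hypothetical text in which comment lines surround the header (an assumption about what ex4.csv looks like). Skipping lines 0, 2, and 3 leaves the header and the two data rows.

```python
import io

import pandas as pd

# Hypothetical stand-in for ex4.csv: comment lines mixed in with the data
raw = """# a comment line
a,b,c,d,message
# another comment
# and one more
1,2,3,4,hello
5,6,7,8,world
"""

# Skip lines 0, 2 and 3 (zero-based), leaving the header and the data
skip_res = pd.read_csv(io.StringIO(raw), skiprows=[0, 2, 3])
print(skip_res)
```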

02 Reading Text Files in Pieces

# Make the output display more compact
pd.options.display.max_rows = 10

res = pd.read_csv(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\examples\ex6.csv')

If you want to read only a small number of rows (rather than the whole file), specify nrows:

five_rows = pd.read_csv(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\examples\ex6.csv', nrows=5)
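The following sketch generates a hypothetical 1,000-row CSV in memory as a stand-in for ex6.csv and reads back only the first five data rows with nrows.

```python
import io

import pandas as pd

# Hypothetical stand-in for ex6.csv: 1,000 rows generated in memory
full = pd.DataFrame({'one': range(1000), 'key': [chr(65 + i % 5) for i in range(1000)]})
csv_text = full.to_csv(index=False)

# nrows stops the parser after the first five data rows
five_rows = pd.read_csv(io.StringIO(csv_text), nrows=5)
print(five_rows)
```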

To read a file in chunks, specify chunksize as the number of rows per chunk. The example below iterates over the chunks and aggregates the value counts of the 'key' column:

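chunkers = pd.read_csv(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\examples\ex6.csv', chunksize=1000)
# Start from an empty Series with an explicit dtype (avoids the empty-Series dtype warning)
tot = pd.Series([], dtype='float64')
for piece in chunkers:
    # fill_value=0 treats keys missing from either side as zero counts
    tot = tot.add(piece['key'].value_counts(), fill_value=0)
tot = tot.sort_values(ascending=False)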
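To check the chunked counting logic, this sketch runs the same loop over a hypothetical 10,000-row CSV generated in memory and compares the accumulated totals with value_counts computed on the whole column in one pass.

```python
import io

import pandas as pd

# Hypothetical data: a deterministic mix of keys with unequal frequencies
full = pd.DataFrame({'key': ['ABCDEFG'[(i * i) % 7] for i in range(10_000)]})
csv_text = full.to_csv(index=False)

# Accumulate per-chunk counts; fill_value=0 handles keys missing from a chunk
tot = pd.Series([], dtype='float64')
for piece in pd.read_csv(io.StringIO(csv_text), chunksize=1000):
    tot = tot.add(piece['key'].value_counts(), fill_value=0)
tot = tot.sort_values(ascending=False)

# The chunked totals match a single pass over the full column
print(tot.to_dict() == full['key'].value_counts().to_dict())
```

Only one 1,000-row chunk is held in memory at a time, which is the point of chunksize for files too large to load whole.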

When chunksize (or iterator=True) is given, read_csv returns a TextFileReader object, which also has a get_chunk method for reading a piece of any size.
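A minimal sketch of get_chunk, assuming a hypothetical 20-row CSV held in memory: with iterator=True, each call returns the next block of the requested size, continuing where the previous call stopped.

```python
import io

import pandas as pd

# Hypothetical 20-row CSV held in memory
csv_text = pd.DataFrame({'n': range(20)}).to_csv(index=False)

# iterator=True returns a TextFileReader without parsing the whole file up front
with pd.read_csv(io.StringIO(csv_text), iterator=True) as reader:
    first = reader.get_chunk(3)   # rows 0-2
    second = reader.get_chunk(7)  # the next 7 rows, 3-9
print(first['n'].tolist(), second['n'].tolist())
```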

03 Writing Data to Text Format

The to_csv method can be used to export data to comma-separated files.
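Here is a small sketch of to_csv on a hypothetical DataFrame. Given a file path, to_csv writes that file; called with no destination, it returns the CSV text as a string, which makes the output easy to inspect. By default the row index is written as the first, unnamed column.

```python
import io

import pandas as pd

data = pd.DataFrame({'a': [1, 5], 'b': [2, 6], 'message': ['hello', 'world']})

# With a file path, to_csv writes that file; with no destination it returns a str
text = data.to_csv()
print(text)

# Reading the text back, with the first column as the index
back = pd.read_csv(io.StringIO(text), index_col=0)
```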


Other delimiters can be used as well by passing sep. Passing sys.stdout (the console's standard output) as the destination prints the result instead of writing a file.
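A minimal sketch of both ideas, on a hypothetical DataFrame; the pipe character is just an example delimiter.

```python
import sys

import pandas as pd

data = pd.DataFrame({'a': [1, 5], 'message': ['hello', 'world']})

# sep='|' is an arbitrary example delimiter; sys.stdout prints to the console
data.to_csv(sys.stdout, sep='|')

# The same call without a destination returns the text for inspection
piped = data.to_csv(sep='|')
```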

By default, missing values are written as empty strings; you can mark them with a different sentinel value using the na_rep parameter.
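The sketch below, using a hypothetical DataFrame with one missing value per column, compares the default output with na_rep='NULL'.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'a': [1.0, np.nan], 'b': ['x', None]})

# By default a missing value is written as an empty field
print(data.to_csv(index=False))
# na_rep substitutes a marker string of your choice
marked = data.to_csv(index=False, na_rep='NULL')
print(marked)
```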


You can also disable writing the row and column labels:
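In to_csv, index=False suppresses the row labels and header=False suppresses the column labels. A minimal sketch on a hypothetical DataFrame:

```python
import pandas as pd

data = pd.DataFrame({'a': [1, 5], 'b': [2, 6]})
# index=False drops the row labels, header=False drops the column labels
bare = data.to_csv(index=False, header=False)
print(bare)
```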


You can also write only a subset of the columns, in an order you specify:

import sys
data.to_csv(sys.stdout, index=False, columns=[1, 2, 3])
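Because data was read with header=None, its columns are labelled 0 to 4, which is why integer labels are passed to columns. This sketch rebuilds similar headerless data in memory (hypothetical contents) and shows that the order given in columns is the order written.

```python
import io
import sys

import pandas as pd

# Headerless data, so the columns are labelled 0-4
raw = "1,2,3,4,hello\n5,6,7,8,world\n"
data = pd.read_csv(io.StringIO(raw), header=None)

# columns= picks a subset and fixes the output order
data.to_csv(sys.stdout, index=False, columns=[3, 1])
subset = data.to_csv(index=False, columns=[3, 1])
```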

Series also has a to_csv method.
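A short sketch with a hypothetical named Series: the index and values are written as two columns, with the Series name in the header, and the text can be read back into a Series by selecting that column.

```python
import io

import pandas as pd

s = pd.Series([10, 20, 30], index=['x', 'y', 'z'], name='value')
text = s.to_csv()   # index and values, plus a header row with the name
print(text)

# Read it back and take the single column as a Series again
back = pd.read_csv(io.StringIO(text), index_col=0)['value']
```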

A JSON string can be converted to a Python object with json.loads, and a Python object can be converted back to a JSON string with json.dumps.
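A minimal round-trip sketch with a hypothetical JSON document. Note the s suffix: json.loads and json.dumps work with strings, while json.load and json.dump read from and write to file objects. A list of flat JSON objects also converts directly into DataFrame rows.

```python
import json

import pandas as pd

# Hypothetical JSON document
text = '{"name": "example", "values": [1, 2, 3], "items": [{"k": "a", "v": 1}, {"k": "b", "v": 2}]}'

obj = json.loads(text)    # JSON string -> Python dict
again = json.dumps(obj)   # Python object -> JSON string

# A list of flat JSON objects maps naturally onto DataFrame rows
df = pd.DataFrame(obj['items'])
print(df)
```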


Posted on Tue, 10 Sep 2019 18:50:30 -0700 by pdaoust