pandas - Python Data Analysis Library
Pandas is an open-source library for analysing, cleaning, exploring, and manipulating data, built for the Python programming language. The main data structures in pandas are the Series and the DataFrame (similar to R's data frame). A pandas Series is a one-dimensional labelled array: a sequence of values together with an index. All the values in a Series share a single data type. A pandas DataFrame is a two-dimensional, tabular data structure with both row and column indexes, and each of its columns is a Series object. The pandas module lets developers import data from various file formats (csv, json, sql, xls, etc.) and perform data manipulation operations, including cleaning and reshaping data, summarizing observations, grouping data, and merging multiple datasets.
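The relationship between the two structures can be seen in a few lines. This is a minimal sketch; the names `ages` and `df` and the sample values are illustrative only.

```python
import pandas as pd

# A Series: one-dimensional values plus an index, all of one dtype
ages = pd.Series([34, 52, 25], index=["John", "Doe", "Gates"], name="Age")
print(ages.dtype)
print(ages["Doe"])  # look up a value by its index label

# A DataFrame: two-dimensional, and each column is itself a Series
df = pd.DataFrame({"Age": [34, 52, 25], "Grade": ["B", "A", "B"]})
print(type(df["Age"]))
print(df.shape)  # (rows, columns)
```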
Importing the pandas module
>>> import pandas
If you make many calls to pandas functions, writing pandas.x() over and over again becomes tedious. Instead, it is common practice to import pandas under the short alias pd.
>>> import pandas as pd
Get your data into a DataFrame
There are several ways to take a standard Python data structure and create a pandas DataFrame from it.
Pandas DataFrame from Python List
>>> import pandas as pd
>>> lstColors = ['red','blue','green']
>>> df=pd.DataFrame(lstColors)
>>> print(df)
       0
0    red
1   blue
2  green
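A list of lists works the same way, with each inner list becoming one row. This sketch assumes hypothetical column names and values; the columns argument is what gives the columns readable labels instead of 0, 1, ...

```python
import pandas as pd

# Each inner list becomes one row (sample values are hypothetical)
rows = [["red", 255], ["blue", 0], ["green", 128]]

# columns= names the columns instead of using 0, 1, ...
df = pd.DataFrame(rows, columns=["Color", "Value"])
print(df)
```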
Pandas DataFrame from Python Dictionary
>>> import pandas as pd
>>>
>>> data = {
... "Name": ['John', 'Doe', 'Gates'],
... "Age": [34, 52, 25],
... "Grade": ['B','A','B']
... }
>>>
>>> #load data into a DataFrame object:
>>>
>>> df=pd.DataFrame(data)
>>> print(df)
    Name  Age Grade
0   John   34     B
1    Doe   52     A
2  Gates   25     B
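The same table can also be built from a list of dictionaries, one per row, which is a common shape for records parsed from JSON. A minimal sketch comparing it with the dictionary-of-lists form above:

```python
import pandas as pd

# One dict per row ("records"); keys become column names
records = [
    {"Name": "John", "Age": 34, "Grade": "B"},
    {"Name": "Doe", "Age": 52, "Grade": "A"},
    {"Name": "Gates", "Age": 25, "Grade": "B"},
]
df_records = pd.DataFrame(records)

# The dictionary-of-lists form, one list per column
data = {"Name": ["John", "Doe", "Gates"], "Age": [34, 52, 25], "Grade": ["B", "A", "B"]}
df_columns = pd.DataFrame(data)

# Both constructions produce identical DataFrames
print(df_records.equals(df_columns))
```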
Working with DataFrame Columns and Rows
Select Columns from DataFrame
From the DataFrame, select only the Name and Grade columns
>>> print(df[['Name', 'Grade']])
    Name Grade
0   John     B
1    Doe     A
2  Gates     B
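The double square brackets matter: a single column label returns a Series, while a list of labels returns a DataFrame. A short sketch using the student data from above:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["John", "Doe", "Gates"], "Age": [34, 52, 25], "Grade": ["B", "A", "B"]}
)

names = df["Name"]              # single label -> Series
subset = df[["Name", "Grade"]]  # list of labels -> DataFrame

print(type(names).__name__)
print(type(subset).__name__)
```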
Select Rows from DataFrame
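A pandas DataFrame uses the loc indexer to return one or more specified rows by label. Note that loc is accessed with square brackets, as in df.loc[1], rather than called like a method.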
>>> print(df.loc[1])
Name     Doe
Age       52
Grade      A
Name: 1, dtype: object
Select Multiple rows from DataFrame
>>> print(df.loc[[0,2]])
    Name  Age Grade
0   John   34     B
2  Gates   25     B
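Besides lists of labels, loc accepts slices and boolean masks, and its position-based counterpart iloc selects by integer position. A sketch on the same data; note the difference in slice end points between the two:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["John", "Doe", "Gates"], "Age": [34, 52, 25], "Grade": ["B", "A", "B"]}
)

# loc selects by label; a label slice INCLUDES the end label
print(df.loc[0:1])   # rows 0 and 1

# iloc selects by integer position; a position slice EXCLUDES the end
print(df.iloc[0:1])  # row 0 only

# A boolean mask keeps the rows where the condition is True
print(df.loc[df["Age"] > 30])
```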
Adding Named Indexes
When creating a DataFrame, you can name your own row indexes by passing the index argument.
>>> import pandas as pd
>>>
>>> data = {
... "Name": ['John', 'Doe', 'Gates'],
... "Age": [34, 52, 25],
... "Grade": ['B','A','B']
... }
>>>
>>> df=pd.DataFrame(data,index=['Student-1','Student-2','Student-3'])
>>> print(df)
            Name  Age Grade
Student-1   John   34     B
Student-2    Doe   52     A
Student-3  Gates   25     B
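Named indexes can also come from an existing column rather than a separate list. A minimal sketch using set_index, which moves the Name column into the row index:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["John", "Doe", "Gates"], "Age": [34, 52, 25], "Grade": ["B", "A", "B"]}
)

# Use the existing Name column as row labels (it is removed from the columns)
by_name = df.set_index("Name")
print(by_name.loc["Doe"])
```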
Retrieve data using Named Index
>>> print(df.loc["Student-2"])
Name     Doe
Age       52
Grade      A
Name: Student-2, dtype: object
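To retrieve a single value instead of a whole row, loc also takes a row label and a column label together; the at accessor does the same for scalar lookups. A short sketch on the named-index data:

```python
import pandas as pd

data = {"Name": ["John", "Doe", "Gates"], "Age": [34, 52, 25], "Grade": ["B", "A", "B"]}
df = pd.DataFrame(data, index=["Student-1", "Student-2", "Student-3"])

# [row label, column label] returns one value
print(df.loc["Student-2", "Age"])  # 52
print(df.at["Student-2", "Name"])  # Doe
```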
DataFrame from NumPy ndarray
>>> import numpy as np
>>> import pandas as pd
>>> df=pd.DataFrame(np.random.randint(low=100,high=999,size=(10,4)))
>>> df
     0    1    2    3
0  935  842  850  327
1  232  149  306  615
2  602  943  729  686
3  894  460  563  221
4  223  529  905  486
5  386  961  100  451
6  801  852  692  887
7  922  491  325  186
8  678  942  386  152
9  286  764  359  708
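Because the values above are random, each run prints different numbers. One option, sketched here with NumPy's Generator API and hypothetical column names A to D, is to seed the generator so results are reproducible and to name the columns at the same time. As with randint, the high bound is exclusive, so the largest possible value is 998.

```python
import numpy as np
import pandas as pd

# A seeded generator gives the same "random" numbers on every run
rng = np.random.default_rng(seed=42)

# high is exclusive: values fall in the range 100..998
values = rng.integers(low=100, high=999, size=(10, 4))

# Hypothetical column names instead of the default 0..3
df = pd.DataFrame(values, columns=["A", "B", "C", "D"])
print(df.head())
```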
View the first or last N rows
- The DataFrame head() method returns the first 5 rows by default
- The DataFrame tail() method returns the last 5 rows by default
>>> df.head()
     0    1    2    3
0  935  842  850  327
1  232  149  306  615
2  602  943  729  686
3  894  460  563  221
4  223  529  905  486
You can pass the number of rows as an argument
>>> df.tail(2)
     0    1    2    3
8  678  942  386  152
9  286  764  359  708
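A negative argument flips the meaning: head(-n) returns every row except the last n, and tail(-n) returns every row except the first n. A sketch on a small 10-row frame:

```python
import numpy as np
import pandas as pd

# 10 rows x 4 columns of consecutive integers
df = pd.DataFrame(np.arange(40).reshape(10, 4))

print(df.head(3))   # first 3 rows
print(df.head(-3))  # all rows except the last 3
print(df.tail(-8))  # all rows except the first 8
```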
Loading Data from files
Functions such as read_csv (for comma-separated values), read_excel (for Microsoft Excel spreadsheets), and read_fwf (for fixed-width formatted text) are used to read data from external files.
import pandas as pd
df = pd.read_csv('your-data.csv')
Example: reading a CSV file directly from a URL
import pandas as pd
df = pd.read_csv('https://static.lib.virginia.edu/statlab/materials/data/VDH-COVID-19-PublicUseDataset-EventDate.csv')
df.head()
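read_csv also accepts any file-like object, which is handy for experimenting without a file or network access. This sketch uses a small inline CSV; the column names mirror the dataset above, but the rows are hypothetical. parse_dates converts the date column to a datetime type while reading.

```python
import io

import pandas as pd

# Hypothetical rows standing in for a real CSV file
csv_text = """Event Date,Case Status,Number of Cases
2020-03-01,Confirmed,5
2020-03-02,Probable,2
"""

# StringIO makes the string behave like an open file
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Event Date"])
print(df.dtypes)
```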

Saving a DataFrame
Read data and save the DataFrame to a CSV file.
import pandas as pd
df = pd.read_csv('https://static.lib.virginia.edu/statlab/materials/data/VDH-COVID-19-PublicUseDataset-EventDate.csv')
df.to_csv('d:/data.csv', encoding='utf-8')
print('done')
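By default, to_csv also writes the row index as an unnamed first column, which is why an Unnamed: 0 column appears when such a file is read back. A sketch of the round trip using in-memory strings (to_csv returns a string when no path is given), showing index=False and index_col=0 as two ways to avoid that extra column:

```python
import io

import pandas as pd

df = pd.DataFrame({"Name": ["John", "Doe"], "Age": [34, 52]})

# Default: the index is written as an unnamed first column
with_index = df.to_csv()
print(pd.read_csv(io.StringIO(with_index)).columns.tolist())

# Option 1: do not write the index at all
without_index = df.to_csv(index=False)
print(pd.read_csv(io.StringIO(without_index)).columns.tolist())

# Option 2: write it, then use the first column as the index when reading
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
print(restored.columns.tolist())
```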
Find column data types
>>> import pandas as pd
>>> df = pd.read_csv('data.csv')
>>> print(df.dtypes)
Unnamed: 0                     int64
Event Date                    object
Health Planning Region        object
Case Status                   object
Number of Cases                int64
Number of Hospitalizations     int64
Number of Deaths               int64
dtype: object
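Columns read as object (text) can be converted once their real type is known. A minimal sketch with hypothetical values for two of the columns listed above: pd.to_datetime parses date strings, and astype converts numeric text to integers.

```python
import pandas as pd

# Hypothetical data: dates and numbers both stored as text
df = pd.DataFrame({
    "Event Date": ["2020-03-01", "2020-03-02"],
    "Number of Cases": ["5", "2"],
})

df["Event Date"] = pd.to_datetime(df["Event Date"])
df["Number of Cases"] = df["Number of Cases"].astype("int64")
print(df.dtypes)
```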
Statistical Summary of Data
The pandas describe() method outputs a brief statistical summary of the numeric columns in the data, including descriptive statistics of central tendency and dispersion: count, mean, standard deviation, minimum, the 25th, 50th and 75th percentiles, and maximum.
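A sketch on the small student table from earlier: only Age is numeric, so it is the only column summarized. Calling describe() on a text column instead reports count, unique, top (most frequent value) and freq.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["John", "Doe", "Gates"], "Age": [34, 52, 25], "Grade": ["B", "A", "B"]}
)

# Numeric columns only: here, just Age
summary = df.describe()
print(summary)

# A text column gets count / unique / top / freq instead
grade_summary = df["Grade"].describe()
print(grade_summary)
```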
Copy DataFrame to another DataFrame
import pandas as pd
df = pd.read_csv('data.csv')
dfc = df.copy()
dfc.head()
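The reason to call copy() rather than simply assigning to a new name is that plain assignment does not copy anything: both names refer to the same object. copy() makes a deep copy by default, so changes to it leave the original alone. A sketch with one hypothetical column:

```python
import pandas as pd

df = pd.DataFrame({"Age": [34, 52, 25]})

alias = df       # no copy: alias and df are the same object
dfc = df.copy()  # independent copy (deep=True by default)

dfc.loc[0, "Age"] = 99
print(df.loc[0, "Age"])  # 34, the original is unchanged

alias.loc[0, "Age"] = 35
print(df.loc[0, "Age"])  # 35, because alias is df
```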
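Count rows in a DataFrame
The count() method returns the number of non-missing values in each column, which equals the number of rows only when a column has no missing values.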
>>> import pandas as pd
>>> df = pd.read_csv('data.csv')
>>> df.count()
Unnamed: 0                    2338
Event Date                    2338
Health Planning Region        2338
Case Status                   2338
Number of Cases               2338
Number of Hospitalizations    2338
Number of Deaths              2338
dtype: int64
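To get the total number of rows directly, len(df) or df.shape can be used. A sketch with hypothetical values containing missing entries, contrasting them with count():

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column
df = pd.DataFrame({"Cases": [5, np.nan, 2], "Region": ["A", "B", None]})

print(len(df))     # total rows, missing values included
print(df.shape)    # (rows, columns)
print(df.count())  # non-missing values per column
```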