Pandas DataFrame operations

Data comes in a variety of types. A data type is essentially an internal construct that a programming language uses to understand how to store and operate on data. The format of individual rows and columns affects any analysis performed on a dataset once it is read into a programming environment. The pandas DataFrame is a structure that contains two-dimensional data and its corresponding labels.
  1. Types of Data
  2. Numeric Data Types
  3. Text Data Type
Numeric data types include integers and floats. The text data type is known as object in pandas and as a string (str) in Python. The names of the data types in pandas differ somewhat from those in native Python.

How to Check the Data Type in Pandas DataFrame

There is a good reason to check the data types in a DataFrame: pandas automatically assigns types based on what it detects in the original dataset, and this assignment may or may not be correct. The data type of a column in a pandas DataFrame, or of a Series, is known as its dtype. You can use the dtype property to get the type of a specific column. To check the data types of all columns in a DataFrame, use the following syntax:

df.dtypes

Alternatively, use the syntax below to check the data type of a specific column in a DataFrame:
df['DataFrame Column'].dtype
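
The two checks above can be tried on a small DataFrame. This is a minimal sketch: the column names "quantity" and "price" and their values are made up for illustration and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "quantity": [1, 2, 3],
    "price": [9.99, 14.50, 3.25],
})

# dtypes on a DataFrame returns a Series that maps each column name to its dtype.
all_types = df.dtypes
print(all_types)

# dtype on a single column (a Series) returns one dtype object.
price_type = df["price"].dtype
print(price_type)
```

Note that pandas reports int64 and float64 here rather than the native Python names int and float.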

How to change column type in pandas?

You have four main options for converting types in pandas:

  1. astype()
  2. to_numeric()
  3. infer_objects()
  4. convert_dtypes()

astype()

The astype() method casts a pandas object to a specified dtype. It can also convert any suitable existing column to a categorical type. Example:
df = df.astype(int) # convert all columns to int64
df = df.astype({"x": int, "y": complex}) # column "x" to int64 dtype and "y" to complex type
s = s.astype(np.float16) # Series to float16 type
s = s.astype(str) # Series to Python strings
s = s.astype('category') # Series to categorical type
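
The snippets above assume that df and s already exist. As a self-contained sketch, the following builds its own small objects first; the column names and values are hypothetical and chosen only so that every cast succeeds.

```python
import pandas as pd

# Hypothetical data: "x" holds digits stored as text, "y" holds floats.
df = pd.DataFrame({"x": ["1", "2", "3"], "y": [1.0, 2.0, 3.0]})

# Cast several columns at once by passing a {column: dtype} mapping.
converted = df.astype({"x": "int64", "y": "float32"})
print(converted.dtypes)

# Cast a Series of repeated labels to the categorical type.
grade = pd.Series(["a", "b", "a"]).astype("category")
print(grade.dtype)
```

astype() returns a new object and leaves the original unchanged, which is why the result is assigned to a new name here.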

to_numeric()

The pandas to_numeric() function will try to convert non-numeric objects (such as strings) into integers or floating-point numbers as appropriate.
df["a"] = pd.to_numeric(df["a"]) # column "a" of a DataFrame
df[["a", "b"]] = df[["a", "b"]].apply(pd.to_numeric) # convert columns "a" and "b"
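
When a column contains values that cannot be parsed as numbers, to_numeric() raises an error by default. A minimal sketch of one way to handle this uses the errors="coerce" option, which replaces unparseable values with NaN. The sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical raw column: two parseable strings and one that is not a number.
raw = pd.Series(["1", "2.5", "abc"], dtype=object)

# errors="coerce" turns values that cannot be parsed into NaN instead of raising.
cleaned = pd.to_numeric(raw, errors="coerce")
print(cleaned)
print(cleaned.dtype)
```

Because NaN is a floating-point value, the result here is a float column even though "1" on its own would parse as an integer.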

infer_objects()

The infer_objects() method converts columns of a pandas DataFrame that have an object dtype to a more specific type where possible.


For example, if column 'a' holds integers but is stored with the object dtype, infer_objects() changes its type to int64.
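
A minimal sketch of that conversion follows. The DataFrame is an assumption made for illustration: column 'a' holds integers and column 'b' holds digits stored as text, and both are forced to the object dtype at creation.

```python
import pandas as pd

# Hypothetical data, deliberately stored with the object dtype.
df = pd.DataFrame({"a": [7, 1, 5], "b": ["3", "2", "1"]}, dtype="object")
before = df.dtypes
print(before)

# infer_objects() inspects the actual values held in each object column.
df = df.infer_objects()
print(df.dtypes)
```

Column 'a' becomes int64 because its values are real integers. Column 'b' is not turned into a numeric type, since its values are strings; converting those would need astype() or to_numeric().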

convert_dtypes()

You can use the pandas convert_dtypes() method to convert the default assigned data types to the most suitable ones automatically. One big advantage of convert_dtypes() is that it supports pd.NA, the newer marker for missing values, alongside NaN.
import pandas as pd
import numpy as np

# creating a dataframe
df = pd.DataFrame({"Roll_No.": [101, 102, 103],
                   "Name": ["John", "Doe", "Bill"],
                   "Result": ["Pass", "Fail", np.nan],
                   "Promoted": [True, False, np.nan],
                   "Marks": [80.34, 36.6, np.nan]})

# printing the dataframe (print is used so this also runs outside a notebook)
print("PRINTING DATAFRAME")
print(df)

# checking datatype
print()
print("PRINTING DATATYPE")
print(df.dtypes)

# converting datatype
print()
print("AFTER CONVERTING DATATYPE")
print(df.convert_dtypes().dtypes)