How Does pandas info() Work in Python? A Worked Example


When working with data in Python, the pandas library is one of the most powerful tools at our disposal, and DataFrame.info() is one of its most frequently used methods. It prints a compact summary of a dataset, helping us quickly understand the structure and quality of our data.

What Is DataFrame.info()?

info() is a method of a DataFrame object (there is no top-level pandas.info() function). It prints a summary of the DataFrame: the index type and range, each column's name, non-null count, and dtype, and the approximate memory usage. It's especially useful with large datasets, where we need a quick look at the data before performing operations.

How DataFrame.info() Works in Python

Let's explore DataFrame.info() with a practical example. First, we need to import pandas and create a sample DataFrame:

import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, None],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [70000, 80000, 90000, None, 110000]
}

df = pd.DataFrame(data)

# Display DataFrame info
df.info()

The output of df.info() will look something like this (the exact class path, dtype labels, and memory figure vary with the pandas version and platform):

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Name    5 non-null      object 
 1   Age     4 non-null      float64
 2   City    5 non-null      object 
 3   Salary  4 non-null      float64
dtypes: float64(2), object(2)
memory usage: 288.0+ bytes

Breaking Down the Output

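The output consists of several useful details:

  • DataFrame Class: Shows that the data structure is a pandas DataFrame.
  • RangeIndex: Shows the index type, the number of rows (entries), and the first and last index labels.
  • Column Details: Lists every column with its position, name, non-null count, and dtype. Here Age and Salary each have 4 non-null values out of 5 rows, so each has one missing value.
  • Data Types: The dtypes line counts how many columns use each dtype. Age and Salary are float64 rather than int64 because None is stored as NaN, which is a float.
  • Memory Usage: Gives an approximate memory footprint. A trailing + means the figure is a lower bound, because the contents of object columns are not measured unless memory_usage='deep' is passed.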
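
To make the memory line concrete, here is a minimal sketch that compares the shallow estimate with the deep one. The small two-column frame and the variable names are illustrative only; the exact byte counts depend on the pandas version and platform, so none are shown.

```python
import io

import pandas as pd

# Small illustrative frame with one string column and one numeric column
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, 35, 40, None],
})

# Shallow estimate: does not measure the Python string objects themselves
shallow_bytes = df.memory_usage().sum()

# Deep estimate: also measures the contents of object columns
deep_bytes = df.memory_usage(deep=True).sum()

print(f"shallow: {shallow_bytes} bytes, deep: {deep_bytes} bytes")

# The same deep measurement, reported through info()
buffer = io.StringIO()
df.info(memory_usage="deep", buf=buffer)
deep_report = buffer.getvalue()
print(deep_report)
```

The deep measurement inspects every object value, so it can be noticeably slower on large frames with many string columns.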

Key Use Cases of DataFrame.info()

Why should we use .info()? Here are some common scenarios:

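  1. Detecting Missing Values: A “Non-Null Count” lower than the number of entries means the column has missing data.
  2. Checking Data Types: Confirms that each column was read with the expected dtype (e.g., a column expected to hold integers that shows up as float64 or object usually signals missing or malformed values).
  3. Optimizing Memory Usage: The memory figure shows which frames are worth shrinking, for example by converting columns to smaller or categorical dtypes.
  4. Verifying Data Integrity: Confirms that the expected columns and row count are present. Note that info() does not validate the values themselves.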
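
As a follow-up to what info() reveals, the sketch below counts missing values per column and converts Age to pandas' nullable integer dtype. It reuses the sample data from earlier; choosing "Int64" is just one reasonable option for this column, not something the summary itself prescribes.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, 35, 40, None],
    "City": ["New York", "Los Angeles", "Chicago", "Houston", "Phoenix"],
    "Salary": [70000, 80000, 90000, None, 110000],
})

# Use case 1: number of missing values in each column
missing = df.isna().sum()
print(missing)

# Use case 2: Age was inferred as float64 because None became NaN
print(df["Age"].dtype)  # float64

# The nullable integer dtype keeps whole numbers and marks the gap as <NA>
df["Age"] = df["Age"].astype("Int64")
print(df["Age"].dtype)  # Int64

# The summary now reports Age as Int64, still with 4 non-null values
df.info()
```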

Customizing DataFrame.info() Output

By default, info() prints per-column details as long as the frame has no more columns than the pandas option display.max_info_columns allows, but we can modify its behavior using parameters:

Limiting Output with max_cols

df.info(max_cols=2)

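This does not show just the first two columns. When the DataFrame has more than max_cols columns, info() switches to a truncated summary that drops the per-column table and reports only the column count, the first and last column names, the dtype counts, and memory usage.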
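
A short sketch of how max_cols changes the output. The helper info_text is a hypothetical convenience function written for this example, not part of pandas; it only wraps the buf parameter of info().

```python
import io

import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
    "City": ["New York", "Chicago"],
    "Salary": [70000, 80000],
})


def info_text(frame, **kwargs):
    """Return the text that frame.info(**kwargs) would print (illustrative helper)."""
    buffer = io.StringIO()
    frame.info(buf=buffer, **kwargs)
    return buffer.getvalue()


full = info_text(df)                   # 4 columns, default limit: per-column table
truncated = info_text(df, max_cols=2)  # 4 columns > 2: truncated summary
brief = info_text(df, verbose=False)   # asks for the short form directly

print(full)
print(truncated)
```

Passing verbose=False requests the same short form regardless of how many columns the frame has, while verbose=True forces the per-column table.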

Retrieving Output as a String

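info() writes its summary to sys.stdout and returns None, so assigning its result to a variable captures nothing. To get the summary as a string for further analysis, pass a text buffer through the buf parameter: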

import io

buffer = io.StringIO()
df.info(buf=buffer)
info_str = buffer.getvalue()

print(info_str)
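
The buf parameter accepts any writable text buffer, so the same idea can save the summary to a file. This is a minimal sketch; the file name df_info.txt and the temporary directory are placeholders chosen for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, None],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    report_path = Path(tmp_dir) / "df_info.txt"  # placeholder file name

    # Write the summary straight into an open text file
    with report_path.open("w", encoding="utf-8") as handle:
        result = df.info(buf=handle)

    saved = report_path.read_text(encoding="utf-8")

print(result)  # None: info() never returns the summary itself
print(saved)
```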

Comparison Table: info() vs Other Methods

Method / attribute | Purpose | Output
info() | Summary of the DataFrame | Index range, column names, dtypes, non-null counts, memory usage
describe() | Statistical summary | Count, mean, standard deviation, min, quartiles, and max (numeric columns by default)
head() | First N rows | The first rows of data (5 by default)
dtypes (attribute, not a method) | Data type of each column | A Series mapping column names to dtypes
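
The sketch below calls all four on the sample data to show the difference in what each one gives back. Variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, 35, 40, None],
    "City": ["New York", "Los Angeles", "Chicago", "Houston", "Phoenix"],
    "Salary": [70000, 80000, 90000, None, 110000],
})

# info(): prints a structural summary and returns None
df.info()

# describe(): returns a DataFrame of statistics for the numeric columns
stats = df.describe()
print(stats)

# head(): returns the first N rows as a new DataFrame
first_rows = df.head(2)
print(first_rows)

# dtypes: an attribute (no parentheses) holding a Series of column dtypes
column_types = df.dtypes
print(column_types)
```

Only info() prints instead of returning a value, which is why it needs the buf parameter when the text has to be captured.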

Conclusion

Understanding how DataFrame.info() works is essential for efficient data manipulation. It's a simple yet powerful method that gives a quick overview of any DataFrame. By using it effectively, we can detect missing values, check data types, and spot memory-heavy frames before doing any real work. Whether we're handling small datasets or massive ones, info() belongs in every pandas toolkit.
