How Does pandas read_json Work in Python? Best Example

When working with JSON data in Python, the pandas.read_json() function is an incredibly useful tool. It allows us to read JSON data and convert it into a Pandas DataFrame with just a single line of code. JSON (JavaScript Object Notation) is a common format for data exchange, often used in APIs, configuration files, and more. Let’s explore how pandas.read_json() works and see some practical examples.

Basic Usage of `pandas.read_json()`

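The pandas.read_json() function accepts a file path, a URL, or a file-like object and transforms the JSON it finds there into a DataFrame. Passing a raw JSON string directly has been deprecated since pandas 2.1, so literal JSON should be wrapped in io.StringIO. Let’s begin with a simple example.

from io import StringIO

import pandas as pd

# JSON string
json_data = '{"name": ["Alice", "Bob", "Charlie"], "age": [25, 30, 35]}'

# Wrap the string in StringIO so pandas treats it as a file-like object
df = pd.read_json(StringIO(json_data))
print(df)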

The output:

      name  age
0    Alice   25
1      Bob   30
2  Charlie   35

As you can see, each top-level key in the JSON object becomes a column, and each list supplies that column's values.

Reading JSON from a File

Often, we’ll need to read JSON from an external file rather than a string. The pandas.read_json() function supports this as well.

df = pd.read_json("data.json")

By default (orient="columns" when reading into a DataFrame), it assumes that the file holds a JSON object whose keys are the column names and whose values hold each column's data.
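To try this without an existing data.json, the sketch below writes a small sample file to a temporary directory and reads it back. The sample data and the df_file name are made up for illustration.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical sample data in the default "columns" layout
sample = {"name": ["Alice", "Bob"], "age": [25, 30]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.json")

    # Write the sample to disk so there is a real file to read
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f)

    # read_json accepts the file path directly
    df_file = pd.read_json(path)

print(df_file)
```

The file is read completely before the temporary directory is removed, so df_file stays usable afterwards.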

Understanding the `orient` Parameter

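The orient parameter in pandas.read_json() tells pandas how the JSON data is structured. The available options are:

records – A list of dictionaries, each representing a row.
index – A dictionary where keys are row labels and values are dictionaries mapping column names to values.
columns – A dictionary where keys are column names and values hold that column's data. This is the default for DataFrames.
values – A 2D list containing only the raw values, with no row or column labels.
split – A dictionary with separate "index", "columns", and "data" keys.
table – A dictionary with "schema" and "data" keys, following the Table Schema layout.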

Let’s look at an example using the records orientation:

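from io import StringIO

json_data = '[{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}]'

# Wrap the literal JSON in StringIO; each dictionary becomes one row
df = pd.read_json(StringIO(json_data), orient="records")
print(df)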
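To see why orient matters, the following sketch reads the same JSON text twice: once with orient="index" and once with the default. The variable names are illustrative. With orient="index" the outer keys become row labels, while the default treats them as column names, so the second result is effectively the transpose of the first.

```python
from io import StringIO

import pandas as pd

# Outer keys are row labels, inner keys are column names
json_by_row = '{"0": {"name": "Alice", "age": 25}, "1": {"name": "Bob", "age": 30}}'

# Interpret outer keys as row labels
by_index = pd.read_json(StringIO(json_by_row), orient="index")

# Default orient="columns": outer keys are treated as column names instead
by_columns = pd.read_json(StringIO(json_by_row))

print(by_index)
print(by_columns)
```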

Reading JSON from a URL

Sometimes, JSON data is retrieved from an online API. We can directly read data from a URL using pandas.read_json().

url = "https://api.example.com/data.json"
df = pd.read_json(url)

This makes it easy to load live data straight into a Pandas DataFrame, provided the URL returns JSON in one of the layouts read_json() understands.

Handling Nested JSON

JSON data often comes in a nested format, which can be tricky to handle. Pandas does not automatically flatten nested structures, but we can manually normalize them using json_normalize().

import json
from pandas import json_normalize

nested_json = '{"employees": [{"name": "Alice", "info": {"age": 25, "city": "New York"}}, {"name": "Bob", "info": {"age": 30, "city": "London"}}]}'

data = json.loads(nested_json)
df = json_normalize(data["employees"])
print(df)
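As a small follow-up sketch, the flattened frame names nested fields by joining the keys with a dot (for example info.age). The sep argument of json_normalize() changes that separator, which can be convenient when dotted column names are awkward to work with. The records list below mirrors the employees data above.

```python
import pandas as pd

# Same shape as the "employees" list in the example above
records = [
    {"name": "Alice", "info": {"age": 25, "city": "New York"}},
    {"name": "Bob", "info": {"age": 30, "city": "London"}},
]

# Default separator: nested keys are joined with "."
flat = pd.json_normalize(records)
print(list(flat.columns))

# Custom separator: nested keys are joined with "_"
flat_underscore = pd.json_normalize(records, sep="_")
print(list(flat_underscore.columns))
```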

Error Handling While Reading JSON

Sometimes, reading JSON files may lead to errors due to formatting issues. These are a few common errors:

JSONDecodeError – Raised by json.loads() when JSON is improperly formatted. It is a subclass of ValueError.
FileNotFoundError – Happens when the specified file is missing.
ValueError – Raised by read_json() itself for malformed JSON or unexpected JSON structures.

We can handle these errors using try-except blocks:

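try:
    df = pd.read_json("data.json")
except FileNotFoundError as e:
    # The path does not exist
    print(f"File not found: {e}")
except ValueError as e:
    # Malformed JSON or a structure pandas cannot interpret
    print(f"Error reading JSON: {e}")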
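The same pattern can be wrapped in a small helper. The sketch below uses a hypothetical function, load_json_frame, that returns None instead of raising, and exercises it with a missing file, truncated JSON, and valid JSON. The file name is made up and assumed not to exist.

```python
from io import StringIO

import pandas as pd


def load_json_frame(source):
    """Return a DataFrame, or None if the source cannot be read (hypothetical helper)."""
    try:
        return pd.read_json(source)
    except FileNotFoundError as e:
        print(f"File not found: {e}")
    except ValueError as e:
        print(f"Could not parse JSON: {e}")
    return None


# A path that is assumed not to exist
missing = load_json_frame("no_such_file_12345.json")

# Truncated JSON: the closing bracket and brace are missing
broken = load_json_frame(StringIO('{"name": ["Alice", "Bob"'))

# Well-formed JSON
ok = load_json_frame(StringIO('{"name": ["Alice", "Bob"]}'))
```

Returning None keeps the calling code simple, at the cost of hiding which error occurred. Re-raising or logging may be a better fit in larger programs.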

Comparison of Different Orientations

The table below summarizes different orientations and how the data is structured:

Orientation	Example JSON Structure
`records`	[{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}]
`index`	{"0": {"name": "Alice", "age": 25}, "1": {"name": "Bob", "age": 30}}
`columns`	{"name": ["Alice", "Bob"], "age": [25, 30]}
`values`	[["Alice", 25], ["Bob", 30]]
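As a quick check of the table, the sketch below reads each example structure with its matching orient value. All four should yield a two-row, two-column frame. The values layout carries no labels, so its columns get default integer names. The examples and frames names are illustrative.

```python
from io import StringIO

import pandas as pd

# The four example structures from the table, keyed by orientation
examples = {
    "records": '[{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}]',
    "index": '{"0": {"name": "Alice", "age": 25}, "1": {"name": "Bob", "age": 30}}',
    "columns": '{"name": ["Alice", "Bob"], "age": [25, 30]}',
    "values": '[["Alice", 25], ["Bob", 30]]',
}

# Read each structure using the orientation it was written in
frames = {
    orient: pd.read_json(StringIO(text), orient=orient)
    for orient, text in examples.items()
}

for orient, frame in frames.items():
    print(orient, frame.shape, list(frame.columns))
```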

Conclusion

The pandas.read_json() function is an essential tool when working with JSON data in Python. It allows us to seamlessly integrate JSON files, strings, and URLs into Pandas for efficient data analysis. By understanding its parameters, error handling techniques, and common use cases, we can ensure that JSON data is processed effectively.
