How numpy split works in Python? Best example


When working with arrays in Python, the numpy.split() function lets us divide arrays into multiple sub-arrays along a specified axis. This can be incredibly useful in data processing, machine learning, and numerical computations where we often need to break datasets into smaller, more manageable chunks.

Understanding numpy.split()

The numpy.split() function takes an array and returns a list of sub-arrays. The basic syntax is:

numpy.split(arr, indices_or_sections, axis=0)
  • arr: The input array that we want to split.
  • indices_or_sections: Either an integer N, which splits the array into N equal parts along the axis, or a sorted list of indices marking where each split occurs.
  • axis: The axis along which to split the array. By default, it’s set to 0 (the first axis, which means rows for a 2D array).
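
To make the signature concrete, here is a minimal sketch using a small 4x2 array chosen purely for illustration. It shows that the return value is a Python list of sub-arrays and that leaving out axis splits along axis 0:

```python
import numpy as np

# Illustrative 4x2 array (not taken from the examples in this article)
arr = np.arange(8).reshape(4, 2)

# axis defaults to 0, so the four rows are divided into two groups of two rows
parts = np.split(arr, 2)

print(type(parts))      # the result is a list
print(len(parts))       # number of sub-arrays
print(parts[0].shape)   # shape of each sub-array
```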

Splitting an Array into Equal Parts

If we pass an integer to numpy.split(), it divides the array into that many equal-sized parts. For example:

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])
split_arr = np.split(arr, 3)  # Splitting into 3 equal parts

print(split_arr)

Output:

[array([1, 2]), array([3, 4]), array([5, 6])]

Each sub-array contains two elements because the original array contains six elements and is split into three parts.
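
Because the result is an ordinary list, we can also unpack it straight into separate variables when the number of sections is known in advance. A short sketch (the variable names are just for illustration):

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# Three sections requested, so three names on the left-hand side
first, second, third = np.split(arr, 3)

print(first)   # [1 2]
print(second)  # [3 4]
print(third)   # [5 6]
```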

Splitting an Array at Specific Indices

If we want more control over where the array is split, we can provide a list of indices instead of an integer. Here’s an example:

import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])
split_arr = np.split(arr, [2, 4])

print(split_arr)

Output:

[array([10, 20]), array([30, 40]), array([50, 60])]

In this case, the array is split at indices 2 and 4. Each index marks where a new sub-array begins, so two indices produce three sub-arrays.
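
One way to read an index list is as a set of slice boundaries. The sketch below compares the result of np.split() with the equivalent manual slices of the same array:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

by_split = np.split(arr, [2, 4])
# The same three pieces written as explicit slices
by_slice = [arr[:2], arr[2:4], arr[4:]]

for s, t in zip(by_split, by_slice):
    print(np.array_equal(s, t))  # True for each pair
```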

Splitting Along Different Axes

The axis parameter specifies whether the array should be split along rows or columns. Let’s see how it works with a 2D array:

import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
split_arr = np.split(arr, 2, axis=1)  # Splitting along columns

print(split_arr)

Output:

[array([[1, 2],
       [5, 6]]), array([[3, 4],
       [7, 8]])]

By setting axis=1, we split along columns instead of rows, so each sub-array keeps both rows but only two of the four columns.
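
For contrast, here is a sketch of the same array split with axis=0. The two rows are separated and each sub-array keeps all four columns:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Splitting along rows: two sections, one row each
top, bottom = np.split(arr, 2, axis=0)

print(top)        # [[1 2 3 4]]
print(bottom)     # [[5 6 7 8]]
print(top.shape)  # (1, 4)
```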

Handling Unequal Splits

If the number of elements along the chosen axis isn’t evenly divisible by the number of sections, numpy.split() raises a ValueError. For example:

import numpy as np

arr = np.array([1, 2, 3, 4, 5])
np.split(arr, 3)  # Raises ValueError: 5 elements cannot be divided into 3 equal parts

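To handle uneven splits, NumPy provides an alternative function called numpy.array_split(). It accepts the same arguments but allows sections of different sizes: the leading sub-arrays each receive one extra element until the remainder is used up.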
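
A short sketch of both behaviors side by side, using the same five-element array. The try/except wrapper is only there so the script keeps running after the error:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# np.split() refuses an uneven division
try:
    np.split(arr, 3)
except ValueError as exc:
    print("np.split failed:", exc)

# np.array_split() accepts it and returns sections of sizes 2, 2 and 1
parts = np.array_split(arr, 3)
print(parts)  # [array([1, 2]), array([3, 4]), array([5])]
```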

When to Use numpy.split()?

Some common use cases for numpy.split() include:

  • Dividing datasets into training and testing subsets
  • Preprocessing large numerical datasets
  • Parallel processing of array segments
  • Data manipulation and transformation tasks
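
As an illustration of the first use case, here is a minimal sketch of a train/test split. The dataset, the 80/20 ratio, and the variable names are all assumptions made for this example, and the rows are not shuffled first, which a real workflow would usually do:

```python
import numpy as np

# Hypothetical dataset: 10 samples with 3 features each
data = np.arange(30).reshape(10, 3)

# Assumed 80/20 split; the cut index is where the test portion begins
cut = int(0.8 * len(data))
train, test = np.split(data, [cut])

print(train.shape)  # (8, 3)
print(test.shape)   # (2, 3)
```

Passing a single index in a list yields exactly two sub-arrays, so this works even when the two portions have different sizes.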

Comparison: numpy.split() vs numpy.array_split()

Although both numpy.split() and numpy.array_split() serve a similar purpose, they differ in how they treat an integer section count that doesn’t divide the array evenly.

Function               Can Handle Unequal Splits?   Raises Error When Division is Unequal?
numpy.split()          No                           Yes
numpy.array_split()    Yes                          No

Conclusion

The numpy.split() function provides an efficient way to divide arrays into sub-arrays along a specified axis. Whether splitting into equal parts or using indices, it’s useful for a variety of applications in data science and numerical computing. If an array cannot be split evenly, consider using numpy.array_split() instead.

 
