NumPy strings.decode()

The numpy.strings.decode() function decodes byte strings element-wise in a NumPy array, converting them into regular Unicode strings. It supports multiple encoding formats available in Python.

Syntax

</>

Copy

numpy.strings.decode(a, encoding=None, errors=None)

Parameters

Parameter	Type	Description
`a`	array_like (dtype=bytes_)	Array containing byte strings to be decoded.
`encoding`	str, optional	The name of the encoding format (e.g., `'utf-8'`, `'ascii'`, etc.). Defaults to system default encoding.
`errors`	str, optional	Defines error handling behavior. Options: `'strict'` (default, raises errors), `'ignore'` (ignores errors), `'replace'` (replaces errors with a placeholder).

Return Value

Returns an ndarray containing the decoded Unicode strings. If the input is a scalar, a single decoded string is returned.

Examples

1. Decoding a Single Byte String

Decoding a single byte string encoded in UTF-8.

</>

Copy

import numpy as np

# Define a byte string
byte_string = np.array(b'apple', dtype='S')

# Decode using UTF-8
decoded_string = np.strings.decode(byte_string, encoding='utf-8')

# Print the result
print("Decoded string:", decoded_string)

Output:

Decoded string: apple

2. Decoding an Array of Byte Strings

Decoding multiple byte strings stored in a NumPy array.

</>

Copy

import numpy as np

# Define an array of byte strings
byte_strings = np.array([b'banana', b'cherry', b'grape'], dtype='S')

# Decode using UTF-8
decoded_strings = np.strings.decode(byte_strings, encoding='utf-8')

# Print the results
print("Decoded strings:", decoded_strings)

Output:

Decoded strings: ['banana' 'cherry' 'grape']

3. Handling Decoding Errors with `errors='ignore'`

Using the errors parameter to ignore invalid byte sequences during decoding.

</>

Copy

import numpy as np

# Define a byte string with invalid UTF-8 characters
invalid_bytes = np.array([b'blueberry', b'\x65\x66\x67', b'orange'], dtype='S')

# Decode using UTF-8 with 'ignore' to skip errors
decoded_strings = np.strings.decode(invalid_bytes, encoding='utf-8', errors='ignore')

# Print the results
print("Decoded strings:", decoded_strings)

Output:

Decoded strings: ['blueberry' 'efg' 'orange']

Since the second byte string contains invalid bytes, it is ignored in the output.

4. Replacing Invalid Bytes with `errors='replace'`

Using errors='replace' to substitute invalid byte sequences with a placeholder.

</>

Copy

import numpy as np

# Define a byte string with invalid UTF-8 characters
invalid_bytes = np.array([b'pear', b'\xff\xfa\xfc', b'plum'], dtype='S')

# Decode using UTF-8 with 'replace' to handle errors
decoded_strings = np.strings.decode(invalid_bytes, encoding='utf-8', errors='replace')

# Print the results
print("Decoded strings:", decoded_strings)

Output:

Decoded strings: ['pear' '���' 'plum']

The invalid byte sequence is replaced with the placeholder character �.

Additional Reading

Python Strings

TutorialKart

NumPy strings.decode() – Decode Byte Strings in Array