NumPy strings.decode()

The numpy.strings.decode() function decodes byte strings element-wise in a NumPy array, converting them into Unicode strings. It calls Python's bytes.decode on each element, so it accepts any codec available to Python's standard library.

Syntax

numpy.strings.decode(a, encoding=None, errors=None)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| a | array_like with a bytes_ dtype | Array containing the byte strings to decode. |
| encoding | str, optional | Name of the encoding (e.g., 'utf-8', 'ascii'). If omitted, the default of Python's bytes.decode applies, which is 'utf-8'. |
| errors | str, optional | Error-handling scheme. Common values: 'strict' (the default; raises UnicodeDecodeError on invalid input), 'ignore' (drops invalid bytes), 'replace' (substitutes the replacement character U+FFFD for invalid bytes). |

Return Value

Returns an ndarray of decoded Unicode strings with the same shape as the input. For a scalar or zero-dimensional input, the result is a zero-dimensional array holding the single decoded string.


Examples

1. Decoding a Single Byte String

Decoding a single byte string encoded in UTF-8.

import numpy as np

# Define a byte string
byte_string = np.array(b'apple', dtype='S')

# Decode using UTF-8
decoded_string = np.strings.decode(byte_string, encoding='utf-8')

# Print the result
print("Decoded string:", decoded_string)

Output:

Decoded string: apple

2. Decoding an Array of Byte Strings

Decoding multiple byte strings stored in a NumPy array.

import numpy as np

# Define an array of byte strings
byte_strings = np.array([b'banana', b'cherry', b'grape'], dtype='S')

# Decode using UTF-8
decoded_strings = np.strings.decode(byte_strings, encoding='utf-8')

# Print the results
print("Decoded strings:", decoded_strings)

Output:

Decoded strings: ['banana' 'cherry' 'grape']

3. Handling Decoding Errors with errors='ignore'

Using the errors parameter to ignore invalid byte sequences during decoding.

import numpy as np

# Define byte strings; the second contains an invalid UTF-8 byte (\xff)
invalid_bytes = np.array([b'blueberry', b'\x65\xff\x66\x67', b'orange'], dtype='S')

# Decode using UTF-8 with 'ignore' to skip errors
decoded_strings = np.strings.decode(invalid_bytes, encoding='utf-8', errors='ignore')

# Print the results
print("Decoded strings:", decoded_strings)

Output:

Decoded strings: ['blueberry' 'efg' 'orange']

The second byte string contains the invalid byte \xff. With errors='ignore' that byte is dropped silently, and the remaining valid bytes (\x65, \x66, \x67) decode to 'efg'.

4. Replacing Invalid Bytes with errors='replace'

Using errors='replace' to substitute invalid byte sequences with a placeholder.

import numpy as np

# Define a byte string with invalid UTF-8 characters
invalid_bytes = np.array([b'pear', b'\xff\xfa\xfc', b'plum'], dtype='S')

# Decode using UTF-8 with 'replace' to handle errors
decoded_strings = np.strings.decode(invalid_bytes, encoding='utf-8', errors='replace')

# Print the results
print("Decoded strings:", decoded_strings)

Output:

Decoded strings: ['pear' '���' 'plum']

Each invalid byte is replaced with the Unicode replacement character U+FFFD (�), so the three invalid bytes produce three placeholders.
