NumPy strings.decode()
The numpy.strings.decode()
function decodes byte strings element-wise in a NumPy array, converting them into regular Unicode strings. It supports multiple encoding formats available in Python.
Syntax
</>
Copy
numpy.strings.decode(a, encoding=None, errors=None)
Parameters
Parameter | Type | Description |
---|---|---|
a | array_like (dtype=bytes_) | Array containing byte strings to be decoded. |
encoding | str, optional | The name of the encoding format (e.g., 'utf-8' , 'ascii' , etc.). Defaults to system default encoding. |
errors | str, optional | Defines error handling behavior. Options: 'strict' (default, raises errors), 'ignore' (ignores errors), 'replace' (replaces errors with a placeholder). |
Return Value
Returns an ndarray containing the decoded Unicode strings. If the input is a scalar, a single decoded string is returned.
Examples
1. Decoding a Single Byte String
Decoding a single byte string encoded in UTF-8.
</>
Copy
import numpy as np
# Define a byte string
byte_string = np.array(b'apple', dtype='S')
# Decode using UTF-8
decoded_string = np.strings.decode(byte_string, encoding='utf-8')
# Print the result
print("Decoded string:", decoded_string)
Output:
Decoded string: apple

2. Decoding an Array of Byte Strings
Decoding multiple byte strings stored in a NumPy array.
</>
Copy
import numpy as np
# Define an array of byte strings
byte_strings = np.array([b'banana', b'cherry', b'grape'], dtype='S')
# Decode using UTF-8
decoded_strings = np.strings.decode(byte_strings, encoding='utf-8')
# Print the results
print("Decoded strings:", decoded_strings)
Output:
Decoded strings: ['banana' 'cherry' 'grape']

3. Handling Decoding Errors with errors='ignore'
Using the errors
parameter to ignore invalid byte sequences during decoding.
</>
Copy
import numpy as np
# Define a byte string with invalid UTF-8 characters
invalid_bytes = np.array([b'blueberry', b'\x65\x66\x67', b'orange'], dtype='S')
# Decode using UTF-8 with 'ignore' to skip errors
decoded_strings = np.strings.decode(invalid_bytes, encoding='utf-8', errors='ignore')
# Print the results
print("Decoded strings:", decoded_strings)
Output:
Decoded strings: ['blueberry' 'efg' 'orange']
Since the second byte string contains invalid bytes, it is ignored in the output.

4. Replacing Invalid Bytes with errors='replace'
Using errors='replace'
to substitute invalid byte sequences with a placeholder.
</>
Copy
import numpy as np
# Define a byte string with invalid UTF-8 characters
invalid_bytes = np.array([b'pear', b'\xff\xfa\xfc', b'plum'], dtype='S')
# Decode using UTF-8 with 'replace' to handle errors
decoded_strings = np.strings.decode(invalid_bytes, encoding='utf-8', errors='replace')
# Print the results
print("Decoded strings:", decoded_strings)
Output:
Decoded strings: ['pear' '���' 'plum']

The invalid byte sequence is replaced with the placeholder character �
.