NumPy strings.encode()
The numpy.strings.encode() function encodes an array of strings element-wise using the specified encoding format.
It is useful for converting string data into encoded byte representations.
Syntax
numpy.strings.encode(a, encoding=None, errors=None)
Parameters
Parameter | Type | Description |
---|---|---|
a | array_like | An array of strings with StringDType or str_ dtype to be encoded. |
encoding | str, optional | The encoding format to use, such as 'utf-8' or 'ascii'. If not provided, the default of Python's str.encode ('utf-8') is used. |
errors | str, optional | Specifies how to handle encoding errors. Common values are 'strict', 'ignore', and 'replace'. If not provided, 'strict' handling applies and an unencodable character raises UnicodeEncodeError. |
Return Value
Returns an ndarray with a bytes dtype in which each element is the encoded byte representation of the corresponding string in the input array.
Examples
1. Encoding a Single String
In this example, we encode a single string into UTF-8 format.
import numpy as np
# Define a 0-dimensional array holding a single string
fruit = np.array("apple", dtype="str")
# Encode the string using UTF-8
encoded_fruit = np.strings.encode(fruit, encoding="utf-8")
# Print the encoded result
print("Encoded string:", encoded_fruit)
Output:
Encoded string: np.bytes_(b'apple')

2. Encoding an Array of Strings
We encode multiple strings stored in a NumPy array.
import numpy as np
# Define an array of strings
fruits = np.array(["apple", "banana", "cherry"], dtype="str")
# Encode the array into UTF-8
encoded_fruits = np.strings.encode(fruits, encoding="utf-8")
# Print the encoded result
print("Encoded array:", encoded_fruits)
Output:
Encoded array: [b'apple' b'banana' b'cherry']

3. Encoding with ASCII and Handling Errors
In this example, we attempt to encode a string that contains special characters using ASCII encoding. Since ASCII does not support non-ASCII characters, we specify how errors should be handled.
import numpy as np
# Define an array with a special character
fruits = np.array(["apple", "banãna", "cherry"], dtype="str")
# Encode using ASCII with 'ignore' to skip unsupported characters
encoded_fruits_ignore = np.strings.encode(fruits, encoding="ascii", errors="ignore")
# Encode using ASCII with 'replace' to replace unsupported characters
encoded_fruits_replace = np.strings.encode(fruits, encoding="ascii", errors="replace")
# Print results
print("Encoded with ignore:", encoded_fruits_ignore)
print("Encoded with replace:", encoded_fruits_replace)
Output:
Encoded with ignore: [b'apple' b'banna' b'cherry']
Encoded with replace: [b'apple' b'ban?na' b'cherry']

In the 'ignore' case, the unsupported character is removed, while in the 'replace' case, it is replaced with a question mark.
4. Encoding with a Different Encoding Format
We use an alternative encoding format, 'utf-16', to encode strings.
import numpy as np
# Define an array of strings
fruits = np.array(["apple", "banana", "cherry"], dtype="str")
# Encode using UTF-16
encoded_fruits_utf16 = np.strings.encode(fruits, encoding="utf-16")
# Print results
print("Encoded with UTF-16:", encoded_fruits_utf16)
Output:
Encoded with UTF-16: [b'\xff\xfea\x00p\x00p\x00l\x00e' b'\xff\xfeb\x00a\x00n\x00a\x00n\x00a'
b'\xff\xfec\x00h\x00e\x00r\x00r\x00y']

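UTF-16 encoding results in a different byte representation: each of these characters takes two bytes, and every element starts with the Byte Order Mark (BOM), here \xff\xfe for little-endian byte order. The final \x00 of each element is missing from the printed output because NumPy's fixed-width bytes dtype strips trailing null bytes.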