NumPy strings.encode()

The numpy.strings.encode() function encodes each string in an array by calling str.encode element-wise with the specified encoding. It is useful for converting string data into its byte representation. The numpy.strings namespace is available in NumPy 2.0 and later.

Syntax

numpy.strings.encode(a, encoding=None, errors=None)

Parameters

| Parameter | Type | Description |
|---|---|---|
| a | array_like | An array of strings with StringDType or str_ dtype to be encoded. |
| encoding | str, optional | The name of the encoding to use, such as 'utf-8' or 'ascii'. If not provided, the default of str.encode ('utf-8') is used. |
| errors | str, optional | Specifies how to handle encoding errors. Common values are 'strict' (the default, which raises an error), 'ignore', and 'replace'. |

Return Value

Returns an ndarray of bytes in which each element is the encoded form of the corresponding string in the input array.


Examples

1. Encoding a Single String

In this example, we encode a single string, stored as a zero-dimensional array, as UTF-8.

import numpy as np

# Define a string
fruit = np.array("apple", dtype="str")

# Encode the string using UTF-8
encoded_fruit = np.strings.encode(fruit, encoding="utf-8")

# Print the encoded result
print("Encoded string:", encoded_fruit)

Output:

Encoded string: b'apple'

2. Encoding an Array of Strings

We encode multiple strings stored in a NumPy array.

import numpy as np

# Define an array of strings
fruits = np.array(["apple", "banana", "cherry"], dtype="str")

# Encode the array into UTF-8
encoded_fruits = np.strings.encode(fruits, encoding="utf-8")

# Print the encoded result
print("Encoded array:", encoded_fruits)

Output:

Encoded array: [b'apple' b'banana' b'cherry']

3. Encoding with ASCII and Handling Errors

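In this example, we encode strings containing a special character using ASCII. ASCII covers only code points 0 to 127, so a character such as 'ã' cannot be represented, and the default error mode ('strict') would raise a UnicodeEncodeError. We therefore use the errors parameter to specify how such characters should be handled.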

import numpy as np

# Define an array with a special character
fruits = np.array(["apple", "banãna", "cherry"], dtype="str")

# Encode using ASCII with 'ignore' to skip unsupported characters
encoded_fruits_ignore = np.strings.encode(fruits, encoding="ascii", errors="ignore")

# Encode using ASCII with 'replace' to replace unsupported characters
encoded_fruits_replace = np.strings.encode(fruits, encoding="ascii", errors="replace")

# Print results
print("Encoded with ignore:", encoded_fruits_ignore)
print("Encoded with replace:", encoded_fruits_replace)

Output:

Encoded with ignore: [b'apple' b'banna' b'cherry']
Encoded with replace: [b'apple' b'ban?na' b'cherry']

In the 'ignore' case, the unsupported character is removed, while in the 'replace' case, it is replaced with a question mark.

4. Encoding with a Different Encoding Format

We use an alternative encoding format, 'utf-16', to encode strings.

import numpy as np

# Define an array of strings
fruits = np.array(["apple", "banana", "cherry"], dtype="str")

# Encode using UTF-16
encoded_fruits_utf16 = np.strings.encode(fruits, encoding="utf-16")

# Print results
print("Encoded with UTF-16:", encoded_fruits_utf16)

Output:

Encoded with UTF-16: [b'\xff\xfea\x00p\x00p\x00l\x00e' b'\xff\xfeb\x00a\x00n\x00a\x00n\x00a'
 b'\xff\xfec\x00h\x00e\x00r\x00r\x00y']

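UTF-16 encoding results in a different byte representation: each string starts with a Byte Order Mark (BOM, shown here as \xff\xfe for little-endian byte order), and each of these characters takes two bytes. The final \x00 of each string does not appear in the output because NumPy's fixed-width bytes dtype treats trailing null bytes as padding and strips them.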