NumPy strings.encode()

The numpy.strings.encode() function encodes an array of strings element-wise by calling Python's str.encode on each element with the specified encoding. It is useful for converting string data into its encoded byte representation.

Syntax

numpy.strings.encode(a, encoding=None, errors=None)

Parameters

Parameter	Type	Description
`a`	array_like	An array of strings with `StringDType` or `str_` dtype to be encoded.
`encoding`	str, optional	The name of the encoding to use, such as `'utf-8'` or `'ascii'`. If omitted, Python's `str.encode` default, `'utf-8'`, is used.
`errors`	str, optional	Specifies how to handle encoding errors. Common values are `'strict'` (the default, which raises `UnicodeEncodeError`), `'ignore'`, and `'replace'`.

Return Value

Returns an ndarray of bytes with the same shape as the input, where each element is the encoded byte representation of the corresponding string in the input array.

Examples

1. Encoding a Single String

In this example, we encode a single string, stored as a zero-dimensional array, as UTF-8.

import numpy as np

# Define a string
fruit = np.array("apple", dtype="str")

# Encode the string using UTF-8
encoded_fruit = np.strings.encode(fruit, encoding="utf-8")

# Print the encoded result
print("Encoded string:", encoded_fruit)

Output:

Encoded string: np.bytes_(b'apple')

2. Encoding an Array of Strings

We encode multiple strings stored in a NumPy array.

import numpy as np

# Define an array of strings
fruits = np.array(["apple", "banana", "cherry"], dtype="str")

# Encode the array into UTF-8
encoded_fruits = np.strings.encode(fruits, encoding="utf-8")

# Print the encoded result
print("Encoded array:", encoded_fruits)

Output:

Encoded array: [b'apple' b'banana' b'cherry']

3. Encoding with ASCII and Handling Errors

In this example, we encode an array in which one string contains a non-ASCII character (ã) using the ASCII encoding. ASCII cannot represent that character, so we use the errors parameter to specify how it should be handled.

import numpy as np

# Define an array with a special character
fruits = np.array(["apple", "banãna", "cherry"], dtype="str")

# Encode using ASCII with 'ignore' to skip unsupported characters
encoded_fruits_ignore = np.strings.encode(fruits, encoding="ascii", errors="ignore")

# Encode using ASCII with 'replace' to replace unsupported characters
encoded_fruits_replace = np.strings.encode(fruits, encoding="ascii", errors="replace")

# Print results
print("Encoded with ignore:", encoded_fruits_ignore)
print("Encoded with replace:", encoded_fruits_replace)

Output:

Encoded with ignore: [b'apple' b'banna' b'cherry']
Encoded with replace: [b'apple' b'ban?na' b'cherry']

In the 'ignore' case, the unsupported character is removed, while in the 'replace' case, it is replaced with a question mark.

4. Encoding with a Different Encoding Format

We use an alternative encoding format, 'utf-16', to encode strings.

import numpy as np

# Define an array of strings
fruits = np.array(["apple", "banana", "cherry"], dtype="str")

# Encode using UTF-16
encoded_fruits_utf16 = np.strings.encode(fruits, encoding="utf-16")

# Print results
print("Encoded with UTF-16:", encoded_fruits_utf16)

Output:

Encoded with UTF-16: [b'\xff\xfea\x00p\x00p\x00l\x00e' b'\xff\xfeb\x00a\x00n\x00a\x00n\x00a'
 b'\xff\xfec\x00h\x00e\x00r\x00r\x00y']

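UTF-16 encoding uses two bytes per character here and prefixes each element with a Byte Order Mark (BOM). The BOM \xff\xfe in this output indicates little-endian byte order; the generic 'utf-16' codec uses the platform's native byte order, so the bytes may differ on a big-endian machine.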
