numpy strings

In NumPy 2.0 and later, string operations are primarily handled by the numpy.strings module. This module provides a set of universal functions (ufuncs) that operate on arrays of type numpy.str_ or numpy.bytes_. Because the functions are vectorized, they apply to every element of an array in a single call, which avoids Python-level loops when working with large datasets.

Key Features of numpy.strings:

  • Element-wise Operations: Functions like add, multiply, and mod allow for element-wise string concatenation, repetition, and formatting.
  • String Manipulation: Utilities such as capitalize, center, decode, encode, expandtabs, ljust, lower, lstrip, replace, rjust, rstrip, strip, swapcase, title, translate, upper, and zfill provide various string manipulation capabilities.
  • Comparison Functions: Functions like equal, not_equal, greater_equal, less_equal, greater, and less enable element-wise string comparisons.
  • String Information: Functions such as count, endswith, find, index, isalnum, isalpha, isdecimal, isdigit, islower, isnumeric, isspace, istitle, isupper, rfind, rindex, startswith, and str_len assist in retrieving information about string elements.

String Operations

numpy.strings.add() add(x1, x2, /[, out, where, casting, order, ...])
Performs element-wise string concatenation for two arrays of strings.
numpy.strings.center() center(a, width[, fillchar])
Returns a copy of each string element centered within a string of the specified width, with optional fill characters.
numpy.strings.capitalize() capitalize(a)
Returns a copy of each string element with the first character capitalized and the rest lowercased.
numpy.strings.decode() decode(a[, encoding, errors])
Decodes each byte-string element to a string using the specified encoding.
numpy.strings.encode() encode(a[, encoding, errors])
Encodes each string element to a byte-string using the specified encoding.
numpy.strings.expandtabs() expandtabs(a[, tabsize])
Replaces tab characters in each string element with spaces, using the specified tab size.
numpy.strings.ljust() ljust(a, width[, fillchar])
Returns an array with each string element left-justified in a string of the given width, using optional fill characters.
numpy.strings.lower() lower(a)
Returns a copy of each string element converted to lowercase.
numpy.strings.lstrip() lstrip(a[, chars])
Returns a copy of each string element with leading characters removed, using an optional set of characters to strip.
numpy.strings.mod() mod(a, values)
Performs %-style string formatting (the interpolation syntax that predates Python 2.6's str.format) element-wise, computing a % values for each string element.
numpy.strings.multiply() multiply(a, i)
Performs element-wise string multiplication, repeating each string element i times.
numpy.strings.partition() partition(a, sep)
Splits each string element into three parts: the part before the separator, the separator itself, and the part after.
numpy.strings.replace() replace(a, old, new[, count])
Returns a copy of each string element where occurrences of old are replaced by new, optionally limiting the number of replacements.
numpy.strings.rjust() rjust(a, width[, fillchar])
Returns an array where each string element is right-justified in a field of the specified width, using optional fill characters.
numpy.strings.rpartition() rpartition(a, sep)
Splits each string element into three parts: the part before the last occurrence of the separator, the separator itself, and the part after.
numpy.strings.rstrip() rstrip(a[, chars])
Returns a copy of each string element with trailing characters removed, using an optional set of characters to strip.
numpy.strings.strip() strip(a[, chars])
Returns a copy of each string element with leading and trailing characters removed, using an optional set of characters to strip.
numpy.strings.swapcase() swapcase(a)
Returns a copy of each string element with uppercase characters converted to lowercase and vice versa.
numpy.strings.title() title(a)
Returns a copy of each string element converted to title case, where the first letter of each word is capitalized.
numpy.strings.translate() translate(a, table[, deletechars])
Returns a copy of each string element where characters in deletechars are removed, and remaining characters are mapped using a translation table.
numpy.strings.upper() upper(a)
Returns a copy of each string element converted to uppercase.
numpy.strings.zfill() zfill(a, width)
Returns a copy of each numeric string element left-filled with zeros to match the specified width.

String Comparison Functions

numpy.equal() equal(x1, x2, /[, out, where, casting, ...])
Performs element-wise comparison, returning True where x1 == x2.
numpy.not_equal() not_equal(x1, x2, /[, out, where, casting, ...])
Performs element-wise comparison, returning True where x1 != x2.
numpy.greater_equal() greater_equal(x1, x2, /[, out, where, ...])
Performs element-wise comparison, returning True where x1 >= x2.
numpy.less_equal() less_equal(x1, x2, /[, out, where, casting, ...])
Performs element-wise comparison, returning True where x1 <= x2.
numpy.greater() greater(x1, x2, /[, out, where, casting, ...])
Performs element-wise comparison, returning True where x1 > x2.
numpy.less() less(x1, x2, /[, out, where, casting, ...])
Performs element-wise comparison, returning True where x1 < x2.

String Information Functions

numpy.strings.count() count(a, sub[, start, end])
Returns an array with the number of non-overlapping occurrences of the substring sub within each string element, optionally within the specified range [start, end).
numpy.strings.endswith() endswith(a, suffix[, start, end])
Returns a boolean array where True indicates that the string element ends with the specified suffix, optionally within the range [start, end).
numpy.strings.find() find(a, sub[, start, end])
Returns the lowest index in each string element where the substring sub is found, or -1 if not found, optionally searching within [start, end).
numpy.strings.index() index(a, sub[, start, end])
Similar to find() but raises a ValueError if the substring is not found.
numpy.strings.isalnum() isalnum(x, /[, out, where, casting, order, ...])
Returns True for each element where all characters are alphanumeric and there is at least one character; otherwise, returns False.
numpy.strings.isalpha() isalpha(x, /[, out, where, casting, order, ...])
Returns True for each element where all characters are alphabetic and there is at least one character; otherwise, returns False.
numpy.strings.isdecimal() isdecimal(x, /[, out, where, casting, ...])
Returns True for each element where all characters are decimal digits and there is at least one character; otherwise, returns False.
numpy.strings.isdigit() isdigit(x, /[, out, where, casting, order, ...])
Returns True for each element where all characters are digits and there is at least one character; otherwise, returns False.
numpy.strings.islower() islower(x, /[, out, where, casting, order, ...])
Returns True for each element where all cased characters are lowercase and there is at least one cased character; otherwise, returns False.
numpy.strings.isnumeric() isnumeric(x, /[, out, where, casting, ...])
Returns True for each element where all characters are numeric and there is at least one character; otherwise, returns False.
numpy.strings.isspace() isspace(x, /[, out, where, casting, order, ...])
Returns True for each element where all characters are whitespace and there is at least one character; otherwise, returns False.
numpy.strings.istitle() istitle(x, /[, out, where, casting, order, ...])
Returns True for each element where the string is titlecased (i.e., the first letter of each word is uppercase and the rest are lowercase); otherwise, returns False.
numpy.strings.isupper() isupper(x, /[, out, where, casting, order, ...])
Returns True for each element where all cased characters are uppercase and there is at least one cased character; otherwise, returns False.
numpy.strings.rfind() rfind(a, sub[, start, end])
Returns the highest index in each string element where the substring sub is found, or -1 if not found, optionally searching within [start, end).
numpy.strings.rindex() rindex(a, sub[, start, end])
Similar to rfind() but raises a ValueError if the substring sub is not found.
numpy.strings.startswith() startswith(a, prefix[, start, end])
Returns a boolean array where True indicates that the string element starts with the specified prefix, optionally within the range [start, end).
numpy.strings.str_len() str_len(x, /[, out, where, casting, order, ...])
Returns the length of each string element in the array.