mblen() Function
The mblen() function is declared in the header file <stdlib.h>.
The mblen() function determines the length in bytes of a multibyte character pointed to by a given pointer. It examines up to a specified maximum number of bytes and maintains an internal shift state that can be reset by passing a NULL pointer.
Syntax of mblen()
int mblen(const char *pmb, size_t max);
Parameters
| Parameter | Description |
|---|---|
| pmb | Pointer to the first byte of a multibyte character. It can also be NULL to reset the internal shift state. |
| max | The maximum number of bytes of pmb that mblen() may examine. Passing MB_CUR_MAX is always enough, because no character in the current locale is longer than MB_CUR_MAX bytes. |
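Note that the behavior of mblen() depends on the LC_CTYPE category of the current locale. Every program starts in the "C" locale, so call setlocale(LC_CTYPE, "") from <locale.h> first if mblen() should use the locale configured in the environment. Additionally, calling mblen() with a NULL pointer resets its internal state and reports whether the encoding is state-dependent.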
Return Value
If pmb is not NULL, mblen() returns the number of bytes that form the multibyte character, 0 if pmb points to the terminating null character, or -1 if the next max bytes do not form a valid, complete multibyte character. If pmb is NULL, it returns a nonzero value if the current multibyte encoding is state-dependent, and zero otherwise.
Examples for mblen()
Example 1: Checking the Length of a Single Multibyte Character
This example demonstrates how to use mblen() to obtain the byte length of a single multibyte character.
Program
```c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>

int main(void) {
    setlocale(LC_CTYPE, "");     // use the locale from the environment (assumed UTF-8)
    const char *mbChar = "é";    // a multibyte character in UTF-8 (source saved as UTF-8)
    int len = mblen(mbChar, MB_CUR_MAX);
    if (len == -1) {
        printf("Invalid multibyte sequence.\n");
    } else if (len == 0) {
        printf("Reached the terminating null character.\n");
    } else {
        printf("The multibyte character length is %d bytes.\n", len);
    }
    return 0;
}
```
Explanation:
- The pointer mbChar is set to the string "é", which is a two-byte sequence in UTF-8.
- mblen() is called with mbChar and MB_CUR_MAX to determine how many bytes the first character occupies.
- The function returns the length (2 bytes for "é" in a UTF-8 locale), 0 for the terminating null character, or -1 if the bytes do not form a valid character.
- An appropriate message is printed based on the return value.
Program Output:
The multibyte character length is 2 bytes.
Example 2: Resetting the Internal Shift State
This example shows how to reset the internal shift state of mblen() by calling it with a NULL pointer.
Program
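```c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>

int main(void) {
    setlocale(LC_CTYPE, "");    // use the locale from the environment
    // Reset the internal shift state of mblen() and ask whether
    // the current encoding is state-dependent
    int state = mblen(NULL, 0);
    if (state != 0) {
        printf("Multibyte encodings are state-dependent.\n");
    } else {
        printf("Multibyte encodings are not state-dependent.\n");
    }
    return 0;
}
```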
Explanation:
- The function mblen() is called with a NULL pointer to reset its internal state.
- The return value indicates whether the current multibyte encoding is state-dependent.
- The program prints a message based on whether the encoding is state-dependent or not.
Program Output:
Multibyte encodings are not state-dependent.
Example 3: Handling Invalid Multibyte Sequences
This example demonstrates how mblen() handles an invalid multibyte sequence.
Program
```c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>

int main(void) {
    setlocale(LC_CTYPE, "");    // use the locale from the environment (assumed UTF-8)
    // Intentionally provide an invalid multibyte sequence:
    // the byte 0xFF never appears in UTF-8
    const char *invalidMB = "\xFF";
    int len = mblen(invalidMB, MB_CUR_MAX);
    if (len == -1) {
        printf("Invalid multibyte sequence encountered.\n");
    } else {
        printf("The multibyte character length is %d bytes.\n", len);
    }
    return 0;
}
```
Explanation:
- An invalid multibyte sequence is provided via the pointer invalidMB: the byte 0xFF never occurs in UTF-8.
- mblen() returns -1 to indicate that the bytes do not form a valid character.
- The program checks for the error and prints an appropriate error message.
Program Output:
Invalid multibyte sequence encountered.
Example 4: Determining the Lengths of Characters in a Multibyte String
This example iterates through a multibyte string and uses mblen() to determine the length of each character sequentially.
Program
```c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>

int main(void) {
    setlocale(LC_CTYPE, "");       // use the locale from the environment (assumed UTF-8)
    const char *mbStr = "¡Hola!";
    int totalBytes = 0;
    int charLen;
    while (*mbStr != '\0') {
        charLen = mblen(mbStr, MB_CUR_MAX);   // bytes in the current character
        if (charLen < 0) {
            printf("Invalid multibyte character encountered.\n");
            return 1;
        }
        printf("Character length: %d bytes\n", charLen);
        totalBytes += charLen;
        mbStr += charLen;          // advance to the next character
    }
    printf("Total bytes processed: %d\n", totalBytes);
    return 0;
}
```
Explanation:
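- A multibyte string "¡Hola!" is defined. In UTF-8, "¡" takes 2 bytes and each of the five ASCII characters takes 1, for 7 bytes in total.
- The program iterates over the string, using mblen() to determine the length of each multibyte character and advancing the pointer by that many bytes.
- The length of each character is printed and accumulated into a total byte count.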
- Finally, the total number of bytes processed is printed.
Program Output:
Character length: 2 bytes
Character length: 1 bytes
Character length: 1 bytes
Character length: 1 bytes
Character length: 1 bytes
Character length: 1 bytes
Total bytes processed: 7
