mblen() Function
The mblen() function is declared in the header file <stdlib.h>.
The mblen() function determines the length in bytes of the multibyte character pointed to by a given pointer. It examines up to a specified maximum number of bytes and maintains an internal shift state that can be reset by passing a NULL pointer.
Syntax of mblen()
int mblen(const char *pmb, size_t max);
Parameters
Parameter | Description |
---|---|
pmb | Pointer to the first byte of a multibyte character. It can also be NULL to reset the internal shift state. |
max | The maximum number of bytes to examine. A multibyte character in the current locale never occupies more than MB_CUR_MAX bytes. |
Note that the behavior of mblen() depends on the current locale’s LC_CTYPE category. Additionally, calling mblen() with a NULL pointer resets its internal state and returns information on whether the encoding is state-dependent.
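The locale dependence noted above can be observed by measuring the same bytes under a chosen locale. The following is a minimal sketch, not a definitive implementation: mblen_in_locale() and its -2 return code are hypothetical names introduced here for illustration and are not part of the C library.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the mblen() result for the first character
   of s under the named LC_CTYPE locale, or -2 if that locale cannot be
   selected. The caller's locale is restored before returning. */
int mblen_in_locale(const char *locale_name, const char *s) {
    char saved[256];

    /* setlocale() may reuse its result buffer, so copy the current name. */
    const char *current = setlocale(LC_CTYPE, NULL);
    if (current == NULL || strlen(current) >= sizeof saved) {
        return -2;
    }
    strcpy(saved, current);

    if (setlocale(LC_CTYPE, locale_name) == NULL) {
        return -2;
    }

    (void)mblen(NULL, 0);               /* start from the initial shift state */
    int len = mblen(s, MB_CUR_MAX);

    setlocale(LC_CTYPE, saved);         /* restore the caller's locale */
    return len;
}
```

For example, mblen_in_locale("C", "a") yields 1. The result for a non-ASCII sequence such as "é" depends on which locale is selected and on how the C library treats bytes outside that locale's character set.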
Return Value
If pmb is not NULL, the function returns the number of bytes that form the multibyte character, 0 if pmb points to the terminating null character, or -1 if the bytes examined (at most max) do not form a valid multibyte character. When pmb is NULL, a nonzero value is returned if multibyte encodings are state-dependent, and zero otherwise.
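The three outcomes for a non-NULL pointer can be told apart with a small wrapper. This is only a sketch: the enum mb_kind and the function classify_mb() are hypothetical names used for illustration, not part of <stdlib.h>.

```c
#include <stdlib.h>

/* Hypothetical result codes for this sketch. */
enum mb_kind { KIND_INVALID, KIND_NUL, KIND_CHAR };

/* Classify the first multibyte character in s, examining at most n bytes.
   On success the byte length is stored in *len_out (if len_out is not NULL). */
enum mb_kind classify_mb(const char *s, size_t n, int *len_out) {
    int len = mblen(s, n);

    if (len < 0) {
        return KIND_INVALID;            /* -1: invalid or incomplete sequence */
    }
    if (len_out != NULL) {
        *len_out = len;
    }
    /* 0 means s points at the terminating null character. */
    return (len == 0) ? KIND_NUL : KIND_CHAR;
}
```

In the default "C" locale, classify_mb("a", 1, &len) reports KIND_CHAR with len set to 1, and classify_mb("", 1, &len) reports KIND_NUL.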
Examples for mblen()
Example 1: Checking the Length of a Single Multibyte Character
This example demonstrates how to use mblen() to obtain the byte-length of a single multibyte character.
Program
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
int main(void) {
    // Select the environment's locale so multibyte characters are recognized
    setlocale(LC_CTYPE, "");
    const char *mbChar = "é"; // A multibyte character in UTF-8 encoding
    int len = mblen(mbChar, MB_CUR_MAX);
    if (len == -1) {
        printf("Invalid multibyte sequence.\n");
    } else if (len == 0) {
        printf("Reached the terminating null character.\n");
    } else {
        printf("The multibyte character length is %d bytes.\n", len);
    }
    return 0;
}
Explanation:
- A pointer mbChar is assigned a UTF-8 encoded multibyte character, e.g., “é”.
- mblen() is called with mbChar and MB_CUR_MAX to determine the number of bytes for this character.
- The function returns the length (typically 2 bytes for “é” in UTF-8) or an error code.
- An appropriate message is printed based on the return value.
Program Output (when the environment's locale uses UTF-8):
The multibyte character length is 2 bytes.
Example 2: Resetting the Internal Shift State
This example shows how to reset the internal shift state of mblen() by calling it with a NULL pointer.
Program
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    // Reset the internal shift state of mblen()
    int state = mblen(NULL, 0);
    if (state != 0) {
        printf("Multibyte encodings are state-dependent.\n");
    } else {
        printf("Multibyte encodings are not state-dependent.\n");
    }
    return 0;
}
Explanation:
- The function mblen() is called with a NULL pointer to reset its internal state.
- The return value indicates whether the current multibyte encoding is state-dependent.
- The program prints a message based on whether the encoding is state-dependent or not.
Program Output:
Multibyte encodings are not state-dependent.
Example 3: Handling Invalid Multibyte Sequences
This example demonstrates how mblen() handles an invalid multibyte sequence.
Program
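#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
int main(void) {
    // Use the environment's locale; the output shown assumes a UTF-8 locale,
    // where the byte 0xFF can never begin a valid character. Without this
    // call the program runs in the "C" locale, where the result may differ.
    setlocale(LC_CTYPE, "");
    // Intentionally provide an invalid multibyte sequence
    const char *invalidMB = "\xFF";
    int len = mblen(invalidMB, MB_CUR_MAX);
    if (len == -1) {
        printf("Invalid multibyte sequence encountered.\n");
    } else {
        printf("The multibyte character length is %d bytes.\n", len);
    }
    return 0;
}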
Explanation:
- An invalid multibyte sequence is provided via the pointer invalidMB.
- mblen() returns -1 to indicate an error with the sequence.
- The program checks for the error and prints an appropriate error message.
Program Output:
Invalid multibyte sequence encountered.
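Example 3 stops at the first invalid byte. When a program should continue past bad input instead, one possible approach is to reset the shift state, skip a single byte, and resume scanning. The sketch below assumes that policy; count_chars_lenient() is a hypothetical helper, not a library function, and skipping exactly one byte is a design choice rather than anything mblen() prescribes.

```c
#include <stdlib.h>

/* Hypothetical helper: counts the valid multibyte characters in the
   null-terminated string s. Each byte that cannot start a valid character
   is skipped and tallied in *invalid_bytes (if that pointer is not NULL). */
size_t count_chars_lenient(const char *s, size_t *invalid_bytes) {
    size_t chars = 0;
    size_t bad = 0;

    (void)mblen(NULL, 0);               /* begin in the initial shift state */
    while (*s != '\0') {
        int len = mblen(s, MB_CUR_MAX);
        if (len < 0) {
            (void)mblen(NULL, 0);       /* reset the state after the error */
            bad++;
            s++;                        /* assumed policy: skip one byte */
        } else if (len == 0) {
            break;                      /* defensive: null character reached */
        } else {
            chars++;
            s += len;
        }
    }
    if (invalid_bytes != NULL) {
        *invalid_bytes = bad;
    }
    return chars;
}
```

For a plain ASCII string such as "abc" the function returns 3 and reports no invalid bytes. How many bytes are reported as invalid for other input depends on the current locale.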
Example 4: Determining the Lengths of Characters in a Multibyte String
This example iterates through a multibyte string and uses mblen() to determine the length of each character sequentially.
Program
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
int main(void) {
    // Select the environment's locale so multibyte characters are recognized
    setlocale(LC_CTYPE, "");
    const char *mbStr = "¡Hola!";
    int totalBytes = 0;
    int charLen;
    // Walk the string one multibyte character at a time
    while (*mbStr != '\0') {
        charLen = mblen(mbStr, MB_CUR_MAX);
        if (charLen < 0) {
            printf("Invalid multibyte character encountered.\n");
            return 1;
        }
        printf("Character length: %d bytes\n", charLen);
        totalBytes += charLen;
        mbStr += charLen;
    }
    printf("Total bytes processed: %d\n", totalBytes);
    return 0;
}
Explanation:
- A multibyte string "¡Hola!" is defined.
- The program iterates over the string, using mblen() to determine the length of each multibyte character.
- The length of each character is printed and accumulated into a total byte count.
- Finally, the total number of bytes processed is printed.
Program Output:
Character length: 2 bytes
Character length: 1 bytes
Character length: 1 bytes
Character length: 1 bytes
Character length: 1 bytes
Character length: 1 bytes
Total bytes processed: 7
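Example 4 relies on a terminating null character to stop the loop. For a buffer with a known length and no guaranteed terminator, the max argument can be capped at the number of bytes remaining so that mblen() is never asked to look past the end. The following is a sketch under that assumption; mb_count_n() is a hypothetical name, and returning -1 on the first invalid or truncated character is one possible policy.

```c
#include <stdlib.h>

/* Hypothetical helper: returns the number of multibyte characters in the
   first len bytes of buf, or -1 if an invalid or truncated character is
   found. An embedded null character is counted as a one-byte character. */
long mb_count_n(const char *buf, size_t len) {
    long count = 0;
    size_t pos = 0;

    (void)mblen(NULL, 0);               /* begin in the initial shift state */
    while (pos < len) {
        size_t remaining = len - pos;
        size_t cur_max = (size_t)MB_CUR_MAX;
        /* Never let mblen() examine bytes beyond the end of the buffer. */
        size_t limit = (remaining < cur_max) ? remaining : cur_max;

        int n = mblen(buf + pos, limit);
        if (n < 0) {
            (void)mblen(NULL, 0);       /* leave the state clean for callers */
            return -1;
        }
        if (n == 0) {
            n = 1;                      /* a null character occupies one byte */
        }
        pos += (size_t)n;
        count++;
    }
    return count;
}
```

In a UTF-8 locale, passing the 7 bytes of "¡Hola!" would be expected to give 6, matching the six lines printed by Example 4. Cutting the buffer in the middle of a character makes the function return -1, because the truncated tail is not a complete character.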