mblen() Function

The mblen() function is declared in the header file <stdlib.h>.

The mblen() function determines the length in bytes of the multibyte character that a given pointer refers to. It examines at most a specified number of bytes. For state-dependent encodings it also keeps an internal shift state, which can be reset by passing a NULL pointer.


Syntax of mblen()

int mblen(const char *pmb, size_t max);

Parameters

| Parameter | Description |
| --- | --- |
| pmb | Pointer to the first byte of a multibyte character. It can also be NULL to reset the internal shift state. |
| max | The maximum number of bytes to consider for the multibyte character. No more than MB_CUR_MAX bytes are examined. |

Note that the behavior of mblen() depends on the current locale’s LC_CTYPE category. Additionally, calling mblen() with a NULL pointer resets its internal state and returns information on whether the encoding is state-dependent.


Return Value

If pmb is not NULL, the function returns the number of bytes that form the multibyte character, 0 if pmb points to the terminating null character, or -1 if the next max bytes do not form a valid multibyte character (the sequence is invalid or incomplete). When pmb is NULL, a nonzero value is returned if the current multibyte encoding is state-dependent, and zero otherwise.


Examples for mblen()

Example 1: Checking the Length of a Single Multibyte Character

This example demonstrates how to use mblen() to obtain the byte-length of a single multibyte character.

Program

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <locale.h>

int main() {
    setlocale(LC_CTYPE, "");
    const char *mbChar = "é";  // A multibyte character in UTF-8 encoding
    int len = mblen(mbChar, MB_CUR_MAX);

    if (len == -1) {
        printf("Invalid multibyte sequence.\n");
    } else if (len == 0) {
        printf("Reached the terminating null character.\n");
    } else {
        printf("The multibyte character length is %d bytes.\n", len);
    }
    return 0;
}

Explanation:

  1. A pointer mbChar is assigned a UTF-8 encoded multibyte character, e.g., “é”.
  2. mblen() is called with mbChar and MB_CUR_MAX to determine the number of bytes for this character.
  3. The function returns the length (2 bytes for the precomposed “é” in a UTF-8 locale), 0 for the null character, or -1 for an invalid sequence.
  4. An appropriate message is printed based on the return value.

Program Output (in a UTF-8 locale):

The multibyte character length is 2 bytes.

Example 2: Resetting the Internal Shift State

This example shows how to reset the internal shift state of mblen() by calling it with a NULL pointer.

Program

#include <stdio.h>
#include <stdlib.h>

int main() {
    // Reset the internal shift state of mblen()
    int state = mblen(NULL, 0);

    if (state != 0) {
        printf("Multibyte encodings are state-dependent.\n");
    } else {
        printf("Multibyte encodings are not state-dependent.\n");
    }
    return 0;
}

Explanation:

  1. The function mblen() is called with a NULL pointer to reset its internal state.
  2. The return value indicates whether the multibyte encoding of the current locale is state-dependent. Because the program never calls setlocale(), this is the default "C" locale.
  3. The program prints a message based on whether the encoding is state-dependent or not.

Program Output:

Multibyte encodings are not state-dependent.

Example 3: Handling Invalid Multibyte Sequences

This example demonstrates how mblen() handles an invalid multibyte sequence.

Program

#include <stdio.h>
#include <stdlib.h>
#include <locale.h>

int main() {
    // Use the environment's locale; the result below assumes it is UTF-8
    setlocale(LC_CTYPE, "");

    // Intentionally provide an invalid multibyte sequence
    // (the byte 0xFF never occurs in valid UTF-8)
    const char *invalidMB = "\xFF";
    int len = mblen(invalidMB, MB_CUR_MAX);

    if (len == -1) {
        printf("Invalid multibyte sequence encountered.\n");
    } else {
        printf("The multibyte character length is %d bytes.\n", len);
    }
    return 0;
}

Explanation:

  1. The pointer invalidMB refers to the single byte 0xFF, which cannot occur in valid UTF-8, so it is an invalid multibyte sequence in a UTF-8 locale.
  2. mblen() returns -1 to indicate an error with the sequence.
  3. The program checks for the error and prints an appropriate error message.

Program Output (in a UTF-8 locale):

Invalid multibyte sequence encountered.

Example 4: Determining the Lengths of Characters in a Multibyte String

This example iterates through a multibyte string and uses mblen() to determine the length of each character sequentially.

Program

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include <locale.h>

int main() {
    setlocale(LC_CTYPE, "");
    const char *mbStr = "¡Hola!";
    int totalBytes = 0;
    int charLen;
    
    while (*mbStr != '\0') {
        charLen = mblen(mbStr, MB_CUR_MAX);
        if (charLen < 0) {
            printf("Invalid multibyte character encountered.\n");
            return 1;
        }
        printf("Character length: %d bytes\n", charLen);
        totalBytes += charLen;
        mbStr += charLen;
    }
    
    printf("Total bytes processed: %d\n", totalBytes);
    return 0;
}

Explanation:

  1. A multibyte string "¡Hola!" is defined.
  2. The program iterates over the string, using mblen() to determine the length of each multibyte character.
  3. The length of each character is printed and accumulated into a total byte count.
  4. Finally, the total number of bytes processed is printed.

Program Output (in a UTF-8 locale):

Character length: 2 bytes
Character length: 1 bytes
Character length: 1 bytes
Character length: 1 bytes
Character length: 1 bytes
Character length: 1 bytes
Total bytes processed: 7