C++ char8_t Keyword

The char8_t keyword names a fundamental character type introduced in C++20 for storing UTF-8 code units. It has the same size, signedness, and alignment as unsigned char but is a distinct type, which improves type safety when working with Unicode text. Unlike char, which can hold text in any encoding or raw binary data, char8_t signals that the data is UTF-8. A single char8_t holds one code unit, so a character outside the ASCII range spans several char8_t values.

String literals prefixed with u8 have type const char8_t[N], an array that converts implicitly to const char8_t*. Character literals prefixed with u8 have type char8_t.


Syntax

char8_t variable_name = u8'character';
const char8_t* string_name = u8"string";
char8_t
The type used to declare a variable that stores a single UTF-8 code unit.
variable_name
The name of the variable to store the UTF-8 character.
u8
A prefix for UTF-8 encoded string or character literals. A u8 character literal must fit in a single code unit, so it can only hold an ASCII character.

Examples

Example 1: Declaring a UTF-8 Character

This example demonstrates how to declare a char8_t variable and print its value as a character and integer.

#include <iostream>
using namespace std;

int main() {
    char8_t ch = u8'A'; // Declare a UTF-8 code unit
    cout << "Character: " << static_cast<char>(ch) << endl;
    cout << "ASCII Value: " << static_cast<int>(ch) << endl;
    return 0;
}

Output:

Character: A
ASCII Value: 65

Explanation:

  1. The char8_t variable ch is initialized with u8'A', representing a UTF-8 character.
  2. The character is cast to char for display in the first output line.
  3. The character is cast to int to print its ASCII value, which is 65.

Example 2: Using a UTF-8 String

This example demonstrates how to declare and print a UTF-8 encoded string using char8_t.

#include <iostream>
using namespace std;

int main() {
    const char8_t* message = u8"Hello, UTF-8!";
    cout << "Message: " << reinterpret_cast<const char*>(message) << endl;
    return 0;
}

Output:

Message: Hello, UTF-8!

Explanation:

  1. The string u8"Hello, UTF-8!" is a UTF-8 encoded string stored in a const char8_t* variable.
  2. The string is cast to const char* before printing, because C++20 deletes the std::ostream operator<< overloads for char8_t and const char8_t*.
  3. The program prints the UTF-8 string as expected.

Example 3: UTF-8 with Non-ASCII Characters

This example shows how to use char8_t with UTF-8 encoded non-ASCII characters.

#include <iostream>
#include <string>
using namespace std;

int main() {
    const char8_t* utf8Str = u8"こんにちは"; // "Hello" in Japanese
    cout << "UTF-8 String: " << reinterpret_cast<const char*>(utf8Str) << endl;
    return 0;
}

Output:

UTF-8 String: こんにちは

Explanation:

  1. The string u8"こんにちは" is UTF-8 encoded and stored in a const char8_t* variable.
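  2. The string is cast to const char* for display, because C++20 deletes the std::ostream operator<< overloads for char8_t and const char8_t*.
  3. The program writes the UTF-8 bytes unchanged, so the text appears as “こんにちは” only if the terminal or console interprets its input as UTF-8.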

Key Points about char8_t Keyword

  1. char8_t is a distinct character type introduced in C++20 for UTF-8 code units; it has the same size as unsigned char.
  2. String and character literals of char8_t type are written with the u8 prefix.
  3. The char8_t type improves type safety when working with Unicode text.
  4. Direct output of char8_t variables requires explicit casting to char or const char* for compatibility with std::cout.
  5. Use char8_t when working with UTF-8 strings for better clarity and standards compliance in modern C++ programs.