C++ char8_t Keyword
The char8_t keyword in C++ names a distinct character type introduced in C++20 for storing UTF-8 code units. It has the same size and representation as unsigned char (8 bits on typical platforms) but is a separate type, which makes code more readable and improves type safety when working with Unicode text. Unlike char, which can ambiguously represent text in any encoding or raw binary data, char8_t is specifically designed for UTF-8.
String literals of this type are prefixed with u8. In C++20 such a literal has type const char8_t[N], which decays to const char8_t*, and a u8 character literal has type char8_t.
Syntax
char8_t variable_name = u8'character';
const char8_t* string_name = u8"string";
- char8_t: The keyword for declaring a variable that stores a single 8-bit UTF-8 code unit.
- variable_name: The name of the variable that stores the UTF-8 character.
- string_name: The name of the pointer to the UTF-8 encoded string.
- u8: The prefix used for UTF-8 encoded string and character literals.
Examples
Example 1: Declaring a UTF-8 Character
This example demonstrates how to declare a char8_t variable and print its value as a character and integer.
#include <iostream>
using namespace std;

int main() {
    char8_t ch = u8'A'; // Declare a UTF-8 character
    cout << "Character: " << static_cast<char>(ch) << endl;
    cout << "ASCII Value: " << static_cast<int>(ch) << endl;
    return 0;
}
Output:
Character: A
ASCII Value: 65
Explanation:
- The char8_t variable ch is initialized with u8'A', a UTF-8 character literal.
- The character is cast to char for display in the first output line.
- The character is cast to int to print its numeric value, 65, which matches the ASCII code for 'A'.
Example 2: Using a UTF-8 String
This example demonstrates how to declare and print a UTF-8 encoded string using char8_t.
#include <iostream>
using namespace std;

int main() {
    const char8_t* message = u8"Hello, UTF-8!";
    // cout has no char8_t overload, so view the same bytes as const char*
    cout << "Message: " << reinterpret_cast<const char*>(message) << endl;
    return 0;
}
Output:
Message: Hello, UTF-8!
Explanation:
- The string u8"Hello, UTF-8!" is a UTF-8 encoded string literal, and the const char8_t* variable points to it.
- The pointer is cast to const char* before printing, as cout does not directly support char8_t.
- The program prints the UTF-8 string as expected.
Example 3: UTF-8 with Non-ASCII Characters
This example shows how to use char8_t with UTF-8 encoded non-ASCII characters.
#include <iostream>
#include <string>
using namespace std;

int main() {
    const char8_t* utf8Str = u8"こんにちは"; // "Hello" in Japanese
    // cout has no char8_t overload, so view the same bytes as const char*
    cout << "UTF-8 String: " << reinterpret_cast<const char*>(utf8Str) << endl;
    return 0;
}
Output:
UTF-8 String: こんにちは
Explanation:
- The string u8"こんにちは" is a UTF-8 encoded string literal, and the const char8_t* variable points to it.
- The pointer is cast to const char* for display, as cout does not natively support char8_t.
- The program writes the UTF-8 bytes unchanged, so "こんにちは" appears correctly as long as the terminal interprets its input as UTF-8.
Key Points about char8_t Keyword
- char8_t is an 8-bit data type introduced in C++20 for handling UTF-8 encoded characters.
- String and character literals of type char8_t must be prefixed with u8.
- The char8_t type improves type safety when working with Unicode text.
- Direct output of char8_t values requires explicit casting to char or const char* for compatibility with std::cout.
- Use char8_t when working with UTF-8 strings for better clarity and standards compliance in modern C++ programs.