C++ char32_t Keyword

The char32_t keyword in C++ names a built-in character type, introduced in C++11, that holds a single UTF-32 code unit. It has the same size, signedness, and alignment as std::uint_least32_t, so it is at least 32 bits wide. Because UTF-32 is a fixed-width encoding, every Unicode code point (U+0000 to U+10FFFF) fits in exactly one char32_t, which makes the type convenient for code that indexes or iterates over text one code point at a time.

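Character and string literals of this type are prefixed with U. A string literal such as U"text" has type const char32_t[N], where N counts the terminating null character, and it converts implicitly to const char32_t*.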


Syntax

char32_t variable_name = U'character';
const char32_t* string_name = U"string";
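char32_t
The keyword that declares a variable holding one UTF-32 code unit, that is, one Unicode code point.
variable_name
The name of the variable that stores the Unicode character.
string_name
The name of the pointer to the first character of a UTF-32 string literal.
U
The prefix that marks a character or string literal as UTF-32.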

Examples

Example 1: Declaring a UTF-32 Character

This example demonstrates how to declare a char32_t variable and print its value as a Unicode character and integer.

#include <iostream>
using namespace std;

int main() {
    char32_t ch = U'A'; // UTF-32 character literal
    // Narrowing to char is only safe for ASCII values (0 to 127)
    cout << "Character: " << static_cast<char>(ch) << endl;
    cout << "Unicode Value: " << static_cast<int>(ch) << endl;
    return 0;
}

Output:

Character: A
Unicode Value: 65

Explanation:

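  1. The char32_t variable ch is initialized with the character literal U'A', whose value is the code point U+0041.
  2. The character is cast to char for display in the first output line. This works only because 'A' is an ASCII character; a code point above 127 does not fit in a char and would not print correctly.
  3. The character is cast to int to display its Unicode code point in decimal, which is 65.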

Example 2: Declaring and Printing a UTF-32 String

This example demonstrates how to declare a UTF-32 encoded string using char32_t and print it by first converting it to UTF-8.

#include <iostream>
#include <string>
#include <codecvt> // For conversion (deprecated in C++17)
#include <locale>  // For std::wstring_convert

using namespace std;

int main() {
    // UTF-32 encoded string
    const char32_t* greeting = U"Hello, UTF-32!";

    // Convert to std::u32string
    std::u32string utf32_string(greeting);

    // Convert UTF-32 to UTF-8 using std::wstring_convert
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> converter;
    std::string utf8_string = converter.to_bytes(utf32_string);

    // Output the UTF-8 string
    cout << "Message: " << utf8_string << endl;

    return 0;
}

Output:

Message: Hello, UTF-32!

Explanation:

  1. UTF-32 String Declaration: The pointer greeting is declared as const char32_t* and points to a UTF-32 encoded string literal.
  2. Convert to std::u32string: The string that greeting points to is copied into a std::u32string, the type that the conversion utility accepts.
  3. UTF-32 to UTF-8 Conversion: The std::wstring_convert class is used with std::codecvt_utf8<char32_t> to convert the UTF-32 encoded string to a UTF-8 encoded std::string. Both facilities are deprecated since C++17, so compilers may emit a warning.
  4. Output with std::cout: The UTF-8 encoded string is printed using std::cout, which writes the bytes unchanged.

Example 3: Working with Non-ASCII Characters

This example shows how to use char32_t with UTF-32 encoded non-ASCII characters.

#include <iostream>
#include <string>
#include <codecvt> // For conversion (deprecated in C++17)
#include <locale>  // For std::wstring_convert

int main() {
    // UTF-32 encoded string (Japanese for "Hello")
    const char32_t* japanese = U"こんにちは";

    // Convert to std::u32string
    std::u32string utf32_string(japanese);

    // Convert UTF-32 to UTF-8
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> converter;
    std::string utf8_string = converter.to_bytes(utf32_string);

    // Print the UTF-8 encoded string
    std::cout << "UTF-8 String: " << utf8_string << std::endl;

    return 0;
}

Output:

UTF-8 String: こんにちは

Explanation:

  1. UTF-32 String Declaration: The pointer japanese is declared as const char32_t* and points to a UTF-32 encoded string literal.
  2. Conversion to std::u32string: The string behind the raw pointer is copied into a std::u32string, which is easier to handle and is the type the conversion utility accepts.
  3. UTF-32 to UTF-8 Conversion: The std::wstring_convert class is used along with std::codecvt_utf8<char32_t> to convert the UTF-32 string to a UTF-8 encoded std::string.
  4. Output with std::cout: The UTF-8 encoded string is printed using std::cout, which writes the bytes unchanged. The text displays correctly only if the terminal interprets its input as UTF-8.

Key Points about char32_t Keyword

  1. char32_t is a distinct built-in type introduced in C++11, at least 32 bits wide, for holding UTF-32 encoded characters.
  2. Character and string literals of type char32_t must be prefixed with U.
  3. A single char32_t can hold any Unicode code point (U+0000 to U+10FFFF), so UTF-32 text never needs multi-unit sequences.
  4. std::cout has no character output for char32_t. Before C++20 it prints the numeric value, and since C++20 the corresponding overloads are deleted, so output requires a cast or a conversion to UTF-8.
  5. Use char32_t, usually through std::u32string, when code needs fixed-width access to individual code points, and convert to UTF-8 for output or storage.