C++ char16_t Keyword

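The char16_t keyword in C++ names a character type, introduced in C++11, for representing UTF-16 code units. It is primarily used to handle UTF-16 encoded text, which is common in applications that require internationalization. Plain char has an implementation-defined encoding for ordinary literals. char16_t, by contrast, is a distinct type with the same size and signedness as std::uint_least16_t, and u-prefixed literals of this type are encoded as UTF-16. Characters outside the Basic Multilingual Plane (U+10000 and above) occupy two char16_t code units, called a surrogate pair.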

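A string literal of char16_t characters is written with the u prefix. Its type is const char16_t[N], where N includes the terminating null, and it decays to const char16_t* when assigned to a pointer.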


Syntax

char16_t variable_name = u'character';
const char16_t* string_name = u"string";
char16_t
The keyword used to declare a variable that holds one 16-bit UTF-16 code unit.
variable_name
The name of the variable that stores the character.
string_name
The name of the pointer that refers to the UTF-16 string literal.
u
The prefix that makes a character or string literal a char16_t (UTF-16) literal.

Examples

Example 1: Declaring a UTF-16 Character

This example demonstrates how to declare a char16_t variable and print it both as a character and as its integer code value.

#include <iostream>
using namespace std;

int main() {
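    char16_t ch = u'A'; // Declare a single UTF-16 code unit
    // Narrowing to char prints the right glyph only for ASCII values (0-127)
    cout << "Character: " << static_cast<char>(ch) << endl;
    cout << "Unicode Value: " << static_cast<int>(ch) << endl;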
    return 0;
}

Output:

Character: A
Unicode Value: 65

Explanation:

  1. The char16_t variable ch is initialized with u'A', a UTF-16 character literal.
  2. The character is cast to char for display in the first output line. This works because 'A' is in the ASCII range; a non-ASCII value would not survive the narrowing.
  3. The character is cast to int to display its code point value, 65 (U+0041).

Example 2: Declaring and Printing a UTF-16 String

This example demonstrates how to declare and print a UTF-16 encoded string using char16_t.

#include <iostream>
#include <string>

int main() {
    // Create a UTF-16 encoded string
    const char16_t* greeting = u"Hello, UTF-16!";
    
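    // Copy each UTF-16 code unit into a wchar_t. This is not a true encoding
    // conversion: it works for ASCII and other BMP text, but surrogate pairs
    // are not combined on platforms where wchar_t is 32 bits (e.g. Linux).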
    std::wstring wide_greeting(greeting, greeting + std::char_traits<char16_t>::length(greeting));
    
    // Print using wcout
    std::wcout << L"Message: " << wide_greeting << std::endl;

    return 0;
}

Output:

Message: Hello, UTF-16!

Explanation:

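  1. u"Hello, UTF-16!" is a UTF-16 encoded string literal, and greeting points to its first element.
  2. Because standard output streams cannot print char16_t text directly, each code unit is copied into a std::wstring, which is then printed with std::wcout. This element-by-element copy is correct for ASCII text like this message but is not a general UTF-16 conversion.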

Example 3: Working with Non-ASCII Characters

This example shows how to use char16_t with non-ASCII characters and convert the UTF-16 text to UTF-8 for output.

#include <iostream>
#include <string>
#include <codecvt> // std::codecvt_utf8_utf16 (deprecated since C++17)
#include <locale>  // std::wstring_convert (deprecated since C++17)

int main() {
    const char16_t* japanese = u"こんにちは"; // "Hello" in Japanese
    
    // Convert UTF-16 to UTF-8
    std::u16string utf16_string(japanese); // Convert to std::u16string
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> converter;
    std::string utf8_string = converter.to_bytes(utf16_string);

    // Output the UTF-8 encoded string
    std::cout << "UTF-8 String: " << utf8_string << std::endl;

    return 0;
}

Output (on a console that uses UTF-8):

UTF-8 String: こんにちは

Explanation:

  1. A UTF-16 string is declared as a const char16_t* pointing to a u-prefixed literal: const char16_t* japanese = u"こんにちは";.
  2. The UTF-16 string is copied into a std::u16string for use with the conversion utilities.
  3. The std::wstring_convert class is used with std::codecvt_utf8_utf16 to perform the UTF-16 to UTF-8 conversion.
  4. The to_bytes member function of std::wstring_convert converts the std::u16string into a UTF-8 encoded std::string.
  5. The resulting UTF-8 string is printed with std::cout.
  6. The Japanese text “こんにちは” appears correctly as long as the console interprets the output as UTF-8.

Key Points about char16_t Keyword

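  1. char16_t is a distinct character type introduced in C++11 for UTF-16 code units. It has the same size, signedness, and alignment as std::uint_least16_t (16 bits on common platforms).
  2. char16_t character and string literals are written with the u prefix, for example u'A' and u"text".
  3. One char16_t holds one UTF-16 code unit; characters outside the Basic Multilingual Plane need two code units (a surrogate pair).
  4. std::cout has no overload that prints char16_t as text. Before C++20, a char16_t prints as a number and a const char16_t* prints as a pointer address; C++20 deletes these overloads. Convert the data (for example, to UTF-8) before output.
  5. Use char16_t and std::u16string when working with UTF-16 text, for example when exchanging data with APIs or file formats that use UTF-16.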