How to diagnose and fix the 22P05 untranslatable_character error code in Postgres.

The 22P05 error code in PostgreSQL, which stands for untranslatable_character, indicates that a character in a string could not be translated to the server’s character set. This often occurs when you’re trying to import or convert data between different character encodings where certain characters do not have a direct match in the target encoding.

To diagnose and fix the 22P05 error, here’s what you need to do:

Identify the Problematic Character: Determine which character or characters in your data cannot be translated. This often requires examining the error message details or manually inspecting the data.

Check the Client and Server Encoding: Ensure that the client application and the PostgreSQL server are using compatible character encodings. You can check the server’s encoding with the following SQL command:

   SHOW SERVER_ENCODING;

Convert the Data to a Compatible Encoding: If necessary, convert your data to an encoding compatible with the PostgreSQL server before attempting to insert it into the database.
Use the convert_to and convert_from Functions: PostgreSQL provides functions for converting between different character encodings. You can use these functions to convert character strings within PostgreSQL.

Here’s an example of how you might encounter the 22P05 error:

-- Assuming the server encoding is UTF8 and you try to insert a string with an invalid byte sequence
INSERT INTO my_table (my_column) VALUES (E'\x92');

The byte sequence \x92 might not be valid in the server’s encoding (UTF-8), leading to the 22P05 error.

To fix this, you can try to remove or replace the problematic character, or convert the string from the correct encoding:

-- Remove the problematic character if it's not essential
INSERT INTO my_table (my_column) VALUES ('Some valid UTF-8 string');

-- Or replace it with a valid character
INSERT INTO my_table (my_column) VALUES (replace(E'\x92', '’', ''''));

If you know the source encoding of the string, you can use convert_from to convert it to the server’s encoding:

-- Convert the string from the source encoding (e.g., WINDOWS-1252) to the server encoding (UTF-8)
INSERT INTO my_table (my_column) VALUES (convert_from(E'\x92'::bytea, 'WINDOWS-1252'));

In this example, the byte sequence is assumed to be in the Windows-1252 encoding, where \x92 represents the right single quotation mark. The convert_from function is used to convert it to the server’s UTF-8 encoding.

Remember that character encoding issues can be complex, and it’s essential to understand the encodings involved when working with text data. If you’re regularly dealing with multiple encodings, it may be beneficial to standardize your workflow to use UTF-8, which can represent a vast array of characters from many languages.

Leave a Comment Cancel reply