How to diagnose and fix the 22021 character_not_in_repertoire error code in Postgres.

The 22021: character_not_in_repertoire error in Postgres occurs when you try to store a character in a database that is not valid for the database’s encoding. This typically happens when inserting data that contains invalid byte sequences that cannot be represented in the database’s character set.

To diagnose and fix this error, you can follow these steps:

  1. Identify the Invalid Data: Look at the data you’re trying to insert and check for any unusual characters. This might be non-printable characters, characters from a different encoding, or anything that doesn’t match the database’s expected encoding.
  2. Check Database Encoding: Verify the encoding of your PostgreSQL database using the \l command in the psql command-line interface. Ensure that the data you’re inserting matches this encoding.
  3. Examine the Application Code: If you’re inserting data through an application, ensure that the application specifies the correct encoding when connecting to the database. For instance, set the client encoding in PostgreSQL to match the encoding of the input data.
  4. Convert the Data: If the data is in a different encoding, convert it to the encoding expected by PostgreSQL. You can use tools like iconv in Unix-based systems for conversion. For example, if your data is in Windows-1252 encoding and PostgreSQL expects UTF-8, you would convert it using iconv -f WINDOWS-1252 -t UTF-8.
  5. Remove or Escape Invalid Characters: If there are specific characters that are causing the issue, you might need to remove them or escape them before insertion. For instance, the null byte (0x00) is not allowed in text fields in PostgreSQL and must be removed.
  6. Use Bytea for Binary Data: If you are dealing with binary data, use the bytea data type instead of text types to avoid encoding issues.

Here are some examples where these steps could be applied:

  • When inserting data from a CSV file that was exported from a Windows system, you might encounter encoding issues due to Windows-1252 characters. You would need to convert the file to UTF-8 before importing it into PostgreSQL.
  • When a PHP application incorrectly handles UTF-8 data and tries to store it in PostgreSQL, you might get the 22021 error. You would need to make sure the application correctly encodes the data as UTF-8 before sending it to the database.

For more information on PostgreSQL error codes, you can refer to the official PostgreSQL documentation. If you’re using an Elixir/Phoenix application, you might encounter this error when integrating with libraries such as Cloak, as discussed in a GitHub issue. Troubleshooting would involve checking the data being encrypted or decrypted by the library for invalid characters.

Leave a Comment