SAP Knowledge Base Article - Public

3487712 - Some Chinese Character are not decoding correctly - Invalid Character returned for some chinese words

Symptom

When an XML output file generated by Integration Center is processed, parsing fails with the error message "Character reference "&#55362" is an invalid XML character." for the Chinese character 𠮷.

Preview: <Alt_Last_Name>𠮷田</Alt_Last_Name>
Output file: <Alt_Last_Name>&#55362;&#57271;田</Alt_Last_Name>
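
The parser error can be reproduced outside Integration Center. The sketch below uses Python's standard xml.etree.ElementTree parser, which is an assumption made for illustration (the consuming parser in this KBA is not named). It shows that the two references from the output file are rejected, while a single reference to the code point U+20BB7 is accepted.

```python
import xml.etree.ElementTree as ET

# Fragment as written to the output file: two numeric references,
# one per UTF-16 surrogate code unit.
BROKEN = "<Alt_Last_Name>&#55362;&#57271;田</Alt_Last_Name>"

# Same element with a single reference to the code point U+20BB7.
VALID = "<Alt_Last_Name>&#134071;田</Alt_Last_Name>"


def parses(fragment: str) -> bool:
    """Return True if the fragment is well-formed XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


print(parses(BROKEN))  # False
print(parses(VALID))   # True
```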

Image/data in this KBA is from SAP internal systems, sample data, or demo systems. Any resemblance to real data is purely coincidental.

Environment

  • SAP SuccessFactors HXM Suite
    • Integrations
      • Integration Center

Reproducing the Issue

  1. Go to Integration Center and click Create.
  2. Select XML based output integration.
  3. Configure the fields. The preview displays the character correctly (see the Preview line under Symptom).

  4. Configure destination and save the integration. 
  5. Run the job.
  6. The SFTP output file contains two numeric character references in place of the character (see the Output file line under Symptom).

  7. As shown, the character 𠮷 was not encoded correctly.
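
Output files that have already been generated can be checked for this pattern. The following Python sketch is not an SAP-provided fix. It is an illustrative post-processing step, with a hypothetical function name, that finds adjacent decimal references forming a UTF-16 surrogate pair and merges them into one reference to the actual code point. It handles only decimal references, which is the form shown in this KBA.

```python
import re

# Matches one decimal numeric character reference, e.g. "&#55362;".
_REF = re.compile(r"&#(\d+);")


def merge_surrogate_refs(xml_text: str) -> str:
    """Merge adjacent high/low surrogate references into a single reference."""
    matches = list(_REF.finditer(xml_text))
    parts = []
    last_end = 0
    i = 0
    while i < len(matches):
        current = matches[i]
        following = matches[i + 1] if i + 1 < len(matches) else None
        high = int(current.group(1))
        is_pair = (
            0xD800 <= high <= 0xDBFF
            and following is not None
            and following.start() == current.end()  # must be directly adjacent
            and 0xDC00 <= int(following.group(1)) <= 0xDFFF
        )
        if is_pair:
            low = int(following.group(1))
            code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
            parts.append(xml_text[last_end:current.start()])
            parts.append(f"&#{code_point};")
            last_end = following.end()
            i += 2
        else:
            i += 1
    parts.append(xml_text[last_end:])
    return "".join(parts)


line = "<Alt_Last_Name>&#55362;&#57271;田</Alt_Last_Name>"
print(merge_surrogate_refs(line))
# <Alt_Last_Name>&#134071;田</Alt_Last_Name>
```

References that are not part of a valid pair are left unchanged, so a lone surrogate reference would still need manual review.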

Cause

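The current implementation of XML UTF-8 encoding in Integration Center does not account for supplementary characters (code points above U+FFFF) such as 𠮷 (U+20BB7). In the output file the character appears as two numeric character references, 55362 (0xD842) and 57271 (0xDFB7), which are the two UTF-16 surrogate values for U+20BB7. Surrogate values are not valid XML characters, so parsers reject them.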
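
The numbers in the output file can be checked against the standard UTF-16 surrogate arithmetic. The Python sketch below works through it for 𠮷. The function names are illustrative and are not part of Integration Center.

```python
def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into its UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def from_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair back into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


cp = ord("𠮷")
print(hex(cp))                                 # 0x20bb7
print(to_surrogates(cp))                       # (55362, 57271)
print(f"&#{from_surrogates(55362, 57271)};")   # &#134071;
print("𠮷".encode("utf-8").hex())              # f0a0aeb7
```

The two surrogate values match the references in the output file exactly. A valid XML file would carry either the four UTF-8 bytes F0 A0 AE B7 or the single reference &#134071;.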

Resolution

SAP needs to explore alternative methods for proper encoding, or consider using UTF-16 encoding. This requires further analysis.

A JIRA ticket has been created for this fix. There is currently no ETA. The development team is assessing the change and performing a feasibility check. Once a possible solution is in place, we will provide an estimate and update this KBA.

Please continue to monitor this KBA for the fix.

Workaround:

Currently, supplementary characters (code points U+10000 to U+10FFFF) are not supported in Integration Center. According to our research, the character 𠮷 (U+20BB7) is a variant of the kanji 吉 (U+5409). The code point of 吉 falls within the Basic Multilingual Plane (BMP), which covers U+0000 to U+FFFF. Characters in the BMP are supported in Integration Center.
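
To find out which values are affected, the source data can be scanned for code points above U+FFFF. The following Python sketch assumes the field values are available as plain strings (for example, from an export). The function name is hypothetical.

```python
def supplementary_chars(value: str) -> list[str]:
    """Return the characters in value that lie outside the BMP."""
    return [ch for ch in value if ord(ch) > 0xFFFF]


for name in ["𠮷田", "吉田"]:
    found = supplementary_chars(name)
    labels = [f"U+{ord(ch):04X}" for ch in found]
    print(name, labels)
# 𠮷田 ['U+20BB7']
# 吉田 []
```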

We recommend using the kanji 吉 instead of the character 𠮷. The same approach applies to other supplementary characters: replace each one with an equivalent BMP character where one exists.
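
The substitution can be sketched as a small lookup table applied to the data before it reaches Integration Center. The mapping below contains only the pair named in this KBA. Any further entries, like the function name, are assumptions that would have to be decided per character by the data owner.

```python
# Hypothetical substitution table: supplementary character -> BMP equivalent.
SUBSTITUTIONS = {"𠮷": "吉"}
_TABLE = str.maketrans(SUBSTITUTIONS)


def to_bmp(value: str) -> str:
    """Replace mapped supplementary characters and fail on unmapped ones."""
    replaced = value.translate(_TABLE)
    leftover = [f"U+{ord(ch):04X}" for ch in replaced if ord(ch) > 0xFFFF]
    if leftover:
        raise ValueError("no BMP substitute configured for: " + ", ".join(leftover))
    return replaced


print(to_bmp("𠮷田"))  # 吉田
```

Raising an error for unmapped characters, rather than dropping them silently, keeps names from being altered without review.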

See Also

XML Output | SAP Help Portal

Keywords

XML based integration, XML, Chinese character, encode, invalid XML character, decode, SF, integration center, odata api, odata, SFTP, SFTP file, invalid character, invalid value, Chinese, XML based output integration, KBA, LOD-SF-INT-INC, Integration Center, Problem

Product

SAP SuccessFactors HXM Suite 2311