SAP Knowledge Base Article - Preview

1488246 - Whitespace doesn't break Kanji words - Text Analysis 3.x

Symptom

  • Word segmentation in Kanji
  • The standard Japanese language module
  • The Japanese segmenter doesn't treat whitespace between Kanji words as a token boundary, so Kanji words separated by whitespace are returned as a single token.
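
To make the symptom concrete, here is a minimal Python sketch of the difference between the reported behavior and the expected one. It does not use the Text Analysis or LinguistX SDK: `naive_segmenter` is a hypothetical stand-in that mimics the reported behavior, and `segment_with_whitespace_boundaries` is an illustrative helper, not an SAP-documented fix. It assumes that splitting the input on whitespace before segmentation is acceptable for the text in question.

```python
from typing import Callable, List


def naive_segmenter(text: str) -> List[str]:
    """Hypothetical stand-in that mimics the reported symptom:
    whitespace is ignored and the Kanji run comes back as one token."""
    joined = "".join(text.split())
    return [joined] if joined else []


def segment_with_whitespace_boundaries(
    text: str, segmenter: Callable[[str], List[str]]
) -> List[str]:
    """Split on whitespace first, then segment each chunk separately,
    so whitespace always acts as a token boundary.

    str.split() with no arguments splits on any Unicode whitespace,
    including the ideographic space U+3000 common in Japanese text.
    """
    tokens: List[str] = []
    for chunk in text.split():
        tokens.extend(segmenter(chunk))
    return tokens


if __name__ == "__main__":
    sample = "東京 大学"
    # Reported behavior: the two words are merged into a single token.
    print(naive_segmenter(sample))
    # Expected behavior: whitespace separates the two words.
    print(segment_with_whitespace_boundaries(sample, naive_segmenter))
```

In a real integration, the `segmenter` argument would be whatever callable wraps the product's Japanese segmenter; that wrapper is outside the scope of this sketch.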



Environment

  • Text Analysis 3.0
  • LinguistX Platform 3.7 and 3.8

Product

BusinessObjects Text Analysis, LinguistX platform SDK 3.0; SAP BusinessObjects Text Analysis XI 3.0; SAP Text Analysis SDK (for OEMs) XI 3.0

Keywords

TA, LXP, SDK, Kanji, segment, token, whitespace, KBA, EIM-TA, Text Analysis, Problem
