SAP Knowledge Base Article - Public

3624782 - How to customize the way Optical Character Recognition (OCR) reads field data from uploaded invoices

Symptom

  1. How can you customize the way Optical Character Recognition (OCR) reads field data from uploaded invoices in Central Invoice Management (CIM)?
  2. Changes made to the Extraction Results of an uploaded invoice document in the Manage Document Information Extraction Templates app are not reflected.

Cause

Altering Extraction Results in the Manage Document Information Extraction Templates app may not be sufficient on its own to customize the way OCR reads invoice file data. A subscription to the Document Information Extraction application is also necessary.

Document Information Extraction (DOX) is not included in the installation booster and requires separate configuration and licensing. It is a paid service, available under the Base Edition or Premium Edition depending on your use case. Please see Document Information Extraction in SAP Discovery Center.

A DOX instance is a prerequisite for implementing the solution described below. Customers who have enough credits are advised to go through the documentation.

Resolution

Once the DOX application is available in the customer's CIM subaccount, the steps below can be performed. The steps cover an example in which:

  1. A header-level field that would be detected by default is no longer detected.
  2. A header-level field is detected differently. In this example, two leading zeroes are added.
  3. A header-level field is always detected with a fixed value.

Steps:

  1. Navigate to the BTP Cockpit and open your CIM subaccount. Within the General tab, click Instances and Subscriptions.

  2. Open the Document Information Extraction application.

  3. Make sure the loaded client begins with CIM_DEFAULT_. If it does not, click Change Client and choose the client that starts with CIM_DEFAULT_.

  4. Click the Templates tab.

  5. Choose Create Template.

  6. Enter a Name and Description of your choice. Choose the Document Type and Schema as shown in the screenshot below, then click Create.

     [Screenshot]

  7. The next page shows the created template. Do not activate it yet.

  8. Open the Manage Document Information Extraction Templates app from your CIM home page.

  9. Go to the Documents tab and note the filename of the document for which you want to modify the template. The screenshot below uses the file ‘Internal CIM file_deliverynote (1).pdf’.

     Note: Manually uploaded invoice files are shown in the ‘Manage Document Information Extraction Templates’ app for 15 days. If you do not see the file, you can add a document with the Add Document button on the Documents tab.

     [Screenshot]

  10. Assume a business use case in which the system should not detect the ‘Delivery No.’ at all. In the screenshot above, the green box around the ‘Delivery No.’ field indicates that the field is being detected.

  11. Click Add to Template. The dropdown lists the template that you created earlier (step 6). Click Add.

  12. Return to the Document Information Extraction application and open the created template. The document should now be listed there.

  13. Open the document and click Edit. Because the field in this example is Delivery No., click the green box around its value and click Delete, since the system should not detect it. After you click Delete, the green box disappears, which indicates that the system will no longer read the field. Click Save.

  14. Go back to the template and click the Extraction Fields tab. Click Edit in the Header Fields or Line Item Fields section, depending on your business use case. In this example, it is the header field deliveryNoteNumber.

      Change the Extraction Method from the default, Template with AI, to Template Only. Then click Save.

  15. Activate the Template.

Now that the template is active, Delivery No. is no longer detected in newly uploaded invoices that match this template.

Note: If the business use case is to alter the way the field is detected, for example to add a couple of leading zeroes before the Delivery No., then in step 13 change the value from 350815 to 00350815 and click ‘Apply’.

If the business use case is to always use a constant value for Delivery No. (for example, 999999), then in step 14 set Fixed Value to 999999, in addition to changing the Extraction Method to Template Only.
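
The three outcomes covered in this article (field no longer detected, field value padded with leading zeroes, field set to a fixed value) can be illustrated with a minimal Python sketch. It is not part of Document Information Extraction or of any SAP API: the function name apply_template_rule and the rule names "suppress", "pad", and "fixed" are hypothetical and only model the expected result for the deliveryNoteNumber field.

```python
from typing import Optional


def apply_template_rule(
    detected: Optional[str],
    rule: str,
    *,
    width: int = 8,
    fixed_value: str = "",
) -> Optional[str]:
    """Model the expected value of a header field after a template change.

    Hypothetical helper for illustration only; the real behavior is
    configured in the template UI, not through code like this.
    """
    if rule == "suppress":
        # Annotation deleted in step 13 and Extraction Method set to
        # Template Only: the field is not detected.
        return None
    if rule == "pad":
        # Value edited in step 13, for example 350815 -> 00350815.
        return None if detected is None else detected.zfill(width)
    if rule == "fixed":
        # Fixed Value set in step 14: the field always carries this constant.
        return fixed_value
    raise ValueError(f"unknown rule: {rule}")


if __name__ == "__main__":
    print(apply_template_rule("350815", "suppress"))
    print(apply_template_rule("350815", "pad"))
    print(apply_template_rule("350815", "fixed", fixed_value="999999"))
```

The padding width of 8 is an assumption chosen to match the example value 00350815; in the actual template, the result comes from the value entered during annotation rather than from a padding rule.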

Keywords

KBA , S2P-CIM-OCR , CIM Optical Character Recognition , How To

Product

SAP S/4HANA Cloud Public Edition 2502