SAP Knowledge Base Article - Public

3320686 - Retraining Status of Scanned Invoice is ‘Rejected Due To Personal Data’

Symptom

You want to know:

Why does the system recognize personal data in the invoice document?

According to which criteria is the evaluation carried out, given that your invoice does not contain any personal data?

What do you have to do so that the system uses the invoice for the training?

Environment

SAP Business ByDesign

Reproducing the Issue

  • Go to work center Supplier Invoicing
  • Go to Invoice Scanning View
  • Upload an invoice and click on Start Process
  • Wait until the status changes to Scanning Completed
  • Review the scan (make sure to remove any existing personal data)
  • Select the checkbox “Does not contain Personal Data” and save
  • Create an invoice based on the scan and save it
  • After a while, the Retraining Status changes to “Rejected Due To Personal Data”

Cause

The SAP machine learning algorithm performs a pre-check for the presence of data-privacy-relevant attributes such as email addresses, contact numbers (including handwriting), etc. in the invoice documents.

Because this check is very restrictive, it can at times flag documents that you do not expect to contain personal data, and these documents are then moved to the rejected status.

If any such attributes are found, the document and its annotations are not considered for retraining of the machine learning model. The Retraining Status of those documents changes to ‘Rejected Due To Personal Data’. This personal data check therefore consistently limits the number of submitted invoices that can effectively be used for retraining the machine learning model.
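The effect of such a restrictive pre-check can be illustrated with a minimal sketch. This is not SAP's implementation: the patterns, the function name retraining_status, and the decision logic are assumptions made purely for illustration. The sketch only shows why a document can end up rejected as soon as anything resembling an email address or phone number is found in the scanned text.

```python
import re

# Hypothetical, deliberately restrictive patterns.
# These are assumptions for illustration, not SAP's actual rules.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_PATTERN = re.compile(r"\+?\d[\d\s()/-]{6,}\d")


def retraining_status(extracted_text: str) -> str:
    """Return the status a restrictive pre-check might assign to a scanned document."""
    if EMAIL_PATTERN.search(extracted_text) or PHONE_PATTERN.search(extracted_text):
        # Any hit, even a generic company mailbox, leads to rejection.
        return "Rejected Due To Personal Data"
    return "Accepted For Training"


if __name__ == "__main__":
    print(retraining_status("Invoice 4711, total 100.00 EUR"))
    print(retraining_status("Questions? Write to info@sap.com"))
```

In this sketch the second document is rejected although it only contains a generic company address, which mirrors the behavior described in this article.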

The status of documents that pass the check for privacy-relevant data will be ‘Accepted For Training’. Because of the necessary post-processing for those documents, the retraining of the model can happen several weeks to months later. Currently, there is no status update for documents that have been used for retraining.

Also, because the machine learning algorithm is generic, there is no guarantee that you will see significant changes in its overall performance based on the contribution of your submitted invoices. The algorithm is trained on millions of invoices, so the perceptible impact of the small sample of invoices submitted by your company might sometimes be negligible.

Resolution

SAP is looking into improving the sensitivity of the data privacy pre-check, to ensure that generic information on data privacy attributes is processed appropriately.

Please continue to share feedback, as it provides the variants needed to improve the robustness of the pre-check and positively influences the retraining of the model.

Example: Processing of generic email addresses such as info@sap.com should be allowed, even though email is a data-privacy-relevant attribute.
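As a rough sketch of what treating generic addresses differently could look like, the following example separates role-based mailboxes from other addresses. SAP has not published how the improved pre-check will work; the function name is_generic_email and the list of generic local parts are hypothetical and chosen only to illustrate the idea.

```python
# Hypothetical list of role-based mailbox names (an assumption for illustration).
GENERIC_LOCAL_PARTS = {
    "info", "contact", "sales", "billing",
    "invoice", "accounts", "support", "office",
}


def is_generic_email(address: str) -> bool:
    """Return True if the address looks like a role-based company mailbox."""
    local_part, _, domain = address.strip().lower().partition("@")
    # An address without a domain is not treated as a valid generic mailbox.
    return bool(domain) and local_part in GENERIC_LOCAL_PARTS


if __name__ == "__main__":
    for candidate in ("info@sap.com", "jane.doe@example.com"):
        print(candidate, is_generic_email(candidate))
```

A check of this kind would let an address like info@sap.com pass while still flagging addresses that identify an individual person.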

Keywords

KBA , AP-SIP-SIV , Supplier Invoice , Problem

Product

SAP Business ByDesign all versions