Symptom
You would like to know why the system recognizes personal data in the invoice document.
Which criteria does this evaluation use?
Your invoice does not contain any personal data. What do you need to do so that the system uses the invoice for training?
Environment
SAP Business ByDesign
Reproducing the Issue
- Go to work center Supplier Invoicing.
- Go to Invoice Scanning View.
- Upload an invoice and click on Start Process.
- Wait until the status changes to Scanning Completed.
- Review the scan and remove any personal data it contains.
- Select the checkbox “Does not contain Personal Data” and save.
- Now create an invoice based on the scan and save it.
- After a while you can see that the Retraining Status changes to “Rejected Due To Personal Data”.
Cause
The SAP machine learning algorithm performs a pre-check for the presence of data privacy-relevant attributes, such as email, contact number (including handwriting), etc., in the invoice documents.
Because this check is very restrictive, its result may not always match your expectations: a document that you consider free of personal data can still be moved to the status 'Rejected Due to Personal Data.'
If any such attributes are found, then the document and its annotations are not considered for retraining of the machine learning model. The 'Retraining Status' of those documents will be changed to 'Rejected Due to Personal Data.' This personal data check thus consistently limits the number of submitted invoices that can be effectively used for retraining of the machine learning model.
The status of documents passing the check for privacy-relevant data will be 'Accepted for Training.' Due to the necessary post-processing for those documents, the retraining of the model can happen several weeks to months later. Currently, there is no status update for those documents that have been used for retraining.
Also, given the generic nature of the machine learning algorithm, there is no guarantee that you will see significant changes in the overall performance of the algorithm based on the contribution of your submitted invoices. The algorithm is trained on millions of invoices, so the perceptible impact of the small sample of invoices submitted by your company may sometimes be negligible.
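To make the status logic described above easier to picture, the following is a minimal sketch of a restrictive pre-check. It is not the SAP implementation, which is not public and is based on machine learning. The regular expressions and the function name retraining_status are assumptions chosen only for illustration; the two status texts are the ones named in this article.

```python
import re

# Hypothetical patterns for illustration only; the real pre-check is not public.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Deliberately broad: any run of 8+ digits (with spaces, brackets, slashes, hyphens).
PHONE_RE = re.compile(r"\+?\d[\d\s()/-]{6,}\d")


def retraining_status(extracted_text: str) -> str:
    """Return the Retraining Status a restrictive check would assign."""
    if EMAIL_RE.search(extracted_text) or PHONE_RE.search(extracted_text):
        return "Rejected Due to Personal Data"
    return "Accepted for Training"


if __name__ == "__main__":
    for sample in (
        "Invoice 4711, total 100.00 EUR",
        "Contact: jane.doe@example.com",
        "Tel. +49 6227 7-47474",
    ):
        print(sample, "->", retraining_status(sample))
```

Because the phone pattern in this sketch is intentionally broad, a long document number such as 20240115 would also be flagged. This mirrors, in a simplified way, how a restrictive check can reject a document that contains no personal data.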
Resolution
SAP is looking into improving the sensitivity of the data privacy pre-check to ensure that generic information with data privacy attributes is processed appropriately.
We request that you continue sharing feedback, as this provides the necessary variants to improve the robustness of the pre-check and positively influence the retraining of the model.
Example: The processing of generic emails like info@sap.com should be allowed, even though email is a data privacy-relevant attribute.
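The idea in this example can be sketched as follows, assuming a simple allowlist of generic mailbox names. The set GENERIC_LOCAL_PARTS and the function personal_emails are hypothetical and do not describe how SAP implements or plans to implement the improvement.

```python
import re

# Captures the part before "@" so it can be compared with the allowlist.
EMAIL_RE = re.compile(r"([A-Za-z0-9._%+-]+)@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical allowlist of generic mailbox names (assumption, not an SAP list).
GENERIC_LOCAL_PARTS = {"info", "sales", "support", "billing", "accounting", "office"}


def personal_emails(extracted_text: str) -> list[str]:
    """Return the email addresses that are not generic mailboxes."""
    return [
        match.group(0)
        for match in EMAIL_RE.finditer(extracted_text)
        if match.group(1).lower() not in GENERIC_LOCAL_PARTS
    ]


if __name__ == "__main__":
    print(personal_emails("Questions? info@sap.com"))
    print(personal_emails("Contact jane.doe@example.com or info@sap.com"))
```

In this sketch, a document whose only email address is a generic mailbox yields an empty list and would therefore not be rejected because of that address.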
Keywords
Invoice Scanning; Personal data; Rejected Due To Personal Data; KBA; AP-SIP-SIV; Supplier Invoice; Problem
SAP Knowledge Base Article - Public