How do I read the DO Patent output? – Deep Origin

The DO Patent output consists of several columns that support recognition quality assessment and downstream data processing:

Extracted Image: The image as the algorithm extracted it from the original PDF document.
Predicted structure: A 2D rendering of the chemical structure encoded in the SMILES string (see below).
Confidence: A score indicating how likely the recognition is to be accurate and whether the result needs manual review. We recommend sorting results by confidence score.
- >0.98 confidence score: high likelihood of accurate recognition
- 0.92-0.98 confidence score: manual review is needed
- <0.92 confidence score: poor recognition, consider discarding result
Confidence details: The individual recognition tokens, each corresponding to an element of the molecular structure, that make up the confidence score.
SMILES: A 1D (single-line text) representation of the molecule predicted by the algorithm. SMILES is a standard format that is widely supported for data import in scientific software.
Source: Name of the original PDF document.
Page: The page number on which the recognized image of the molecule appears.
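The following is a minimal sketch of how these columns might be used together after export. It assumes the results have been saved as a CSV file whose headers include `SMILES`, `Confidence`, `Source` and `Page` (a hypothetical layout; check your actual export). It applies the confidence thresholds described above, sorts rows by descending confidence as recommended, and writes the high-confidence structures to a `.smi` text file (one `SMILES name` pair per line) using source and page as the name. The `.smi` output and all function names here are illustrative choices, not part of DO Patent.

```python
import csv
import io
from typing import Iterable

# Thresholds from the confidence guidance above.
ACCEPT_ABOVE = 0.98  # > 0.98: high likelihood of accurate recognition
REVIEW_MIN = 0.92    # 0.92-0.98: manual review; < 0.92: consider discarding


def triage(confidence: float) -> str:
    """Map a confidence score to the recommended action."""
    if confidence > ACCEPT_ABOVE:
        return "accept"
    if confidence >= REVIEW_MIN:
        return "review"
    return "discard"


def load_results(csv_text: str) -> list[dict]:
    """Parse an exported results table (hypothetical CSV layout),
    tag each row with an action, and sort by descending confidence."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["Confidence"] = float(row["Confidence"])
        row["Action"] = triage(row["Confidence"])
    rows.sort(key=lambda r: r["Confidence"], reverse=True)
    return rows


def to_smi(rows: Iterable[dict]) -> str:
    """Build .smi text for accepted rows: 'SMILES name' per line.
    The name records source document and page for traceability;
    spaces are replaced because .smi fields are whitespace-separated."""
    lines = [
        f"{r['SMILES']} {r['Source'].replace(' ', '_')}:p{r['Page']}"
        for r in rows
        if r["Action"] == "accept"
    ]
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    sample = (
        "SMILES,Confidence,Source,Page\n"
        "CCO,0.995,patent_a.pdf,12\n"
        "c1ccccc1O,0.95,patent_a.pdf,14\n"
        "CC(=O)N,0.80,patent_b.pdf,3\n"
    )
    results = load_results(sample)
    for r in results:
        print(r["Confidence"], r["Action"], r["SMILES"])
    print(to_smi(results), end="")
```

Rows tagged `review` are kept in the sorted table rather than dropped, so they can be checked manually against the Extracted Image and Predicted structure columns before inclusion.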
