Use OCR (DEPRECATED)
DEPRECATED: USE files.scanDocument(fileNameId, properties) INSTEAD.
Totalum allows you to use OCR to extract text from images and PDF documents.
Setup Required: For installation and usage of the Totalum SDK or API, see the Installation Guide.
Example of how to use the OCR endpoint with the Totalum SDK:
Get OCR of an image (get the text of an image)
// if you already have the file uploaded to Totalum, you can use the name of the uploaded file to extract the text
const fileName = 'your_file_name_id.your-image-extension'; // replace with the file name ID of your uploaded file, including its image extension
const resultOcr = await totalumClient.files.ocrOfImage(fileName);
const ocrResult = resultOcr.data;
// ocrResult.text will contain all the text of the image
// ocrResult.fullDetails will contain all the details of the image, such as the language, the position of the text, etc.
// if you don't have the file uploaded to Totalum, you will first need to upload it and then extract the text
const fileName = 'your_file_name_id.your-image-extension'; // replace with the name of your file, including its image extension
const file = yourFileBlob; // replace yourFileBlob with your file as a binary Blob
const fileFormData = new FormData();
fileFormData.append('file', file, fileName);
const uploadResult = await totalumClient.files.uploadFile(fileFormData);
const fileNameId = uploadResult.data; // the file name ID assigned by Totalum
const result = await totalumClient.files.ocrOfImage(fileNameId);
const ocrResult = result.data;
// ocrResult.text will contain all the text of the image
// ocrResult.fullDetails will contain all the details of the image, such as the language, the position of the text, etc.
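The upload-then-OCR steps above can be wrapped in a single function. The following is only a minimal sketch: `uploadAndOcrImage` is a hypothetical helper name, not part of the Totalum SDK, and it assumes that `uploadFile` and `ocrOfImage` return objects with a `data` field as shown in the snippets above.

```javascript
// Hypothetical helper (not part of the Totalum SDK): uploads a Blob and runs image OCR on it.
// Assumes `client.files.uploadFile` and `client.files.ocrOfImage` behave as in the examples above.
async function uploadAndOcrImage(client, fileBlob, fileName) {
    // Build the multipart form exactly as in the manual example
    const fileFormData = new FormData();
    fileFormData.append('file', fileBlob, fileName);

    // Upload first; the response data is the file name ID assigned by Totalum
    const uploadResult = await client.files.uploadFile(fileFormData);
    const fileNameId = uploadResult.data;
    if (!fileNameId) {
        throw new Error(`Upload of "${fileName}" returned no file name ID`);
    }

    // Run OCR on the uploaded file and return the text together with the ID
    const ocrResponse = await client.files.ocrOfImage(fileNameId);
    return {
        fileNameId,
        text: ocrResponse.data.text,
        fullDetails: ocrResponse.data.fullDetails,
    };
}
```

It could be called as `const { text } = await uploadAndOcrImage(totalumClient, yourFileBlob, 'photo.png');`. Returning `fileNameId` alongside the text lets you reuse the uploaded file later without uploading it again.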
Get OCR of a PDF (get the text of a PDF)
// if you already have the file uploaded to Totalum, you can use the name of the uploaded file to extract the text
const fileName = 'your_file_name_id.pdf'; // replace with the file name ID of your uploaded PDF
const resultOcr = await totalumClient.files.ocrOfPdf(fileName);
const ocrResult = resultOcr.data;
// ocrResult.text will contain all the text of the PDF
// ocrResult.fullDetails will contain all the details of the PDF, such as the language, the page each piece of text is on, the position of the text, etc.
// if you don't have the file uploaded to Totalum, you will first need to upload it and then extract the text
const fileName = 'your_file_name.pdf'; // replace 'your_file_name' with the name of your PDF file
const file = yourFileBlob; // replace yourFileBlob with your file as a binary Blob
const fileFormData = new FormData();
fileFormData.append('file', file, fileName);
const uploadResult = await totalumClient.files.uploadFile(fileFormData);
const fileNameId = uploadResult.data; // the file name ID assigned by Totalum
const result = await totalumClient.files.ocrOfPdf(fileNameId);
const ocrResult = result.data;
// ocrResult.text will contain all the text of the PDF
// ocrResult.fullDetails will contain all the details of the PDF, such as the language, the page each piece of text is on, the position of the text, etc.
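Because images and PDFs use different methods (`ocrOfImage` and `ocrOfPdf`), code that handles both kinds of file needs to pick one per file. The sketch below shows one possible way to do that from the file extension. `ocrOfFile` is a hypothetical helper name, and treating every non-`.pdf` file as an image is an assumption made for illustration, not documented Totalum behavior.

```javascript
// Hypothetical helper (not part of the Totalum SDK): picks the OCR method from the file extension.
// Assumption: anything that does not end in ".pdf" is treated as an image.
async function ocrOfFile(client, fileNameId) {
    const isPdf = fileNameId.toLowerCase().endsWith('.pdf');

    // Call the matching method shown in the examples above
    const response = isPdf
        ? await client.files.ocrOfPdf(fileNameId)
        : await client.files.ocrOfImage(fileNameId);

    // response.data holds `text` and `fullDetails`, as in the examples above
    return response.data;
}
```

For example, `const ocrResult = await ocrOfFile(totalumClient, fileNameId);` would then work for either file type.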