
Scanned PDF to text for .NET

.NET OCR plug-in that extracts text from scanned PDFs or converts them into searchable documents from which you can select and copy text. Advanced machine learning algorithms accurately identify the structure of text and tables. The original images are placed in the background so you do not lose the content.

Features and benefits

  • Supports European, Cyrillic and Asian languages, including Chinese and Hindi.
  • Superior recognition speed and accuracy.
  • Can be used to build desktop, cloud and web applications for a number of platforms.
  • Detects and recognizes all popular typefaces and font styles.
  • Processes rotated, distorted and noisy images.
  • Reads all image formats you can get from a scanner or camera.
  • Bulk recognition of multiple images in a folder or archive.
  • Recognizes the whole image or selected areas only.
  • Detects and automatically corrects spelling errors.

How to use the plug-in

Install-Package Aspose.OCR

Converting a scanned PDF to text

Digitize scanned documents or convert scans into indexable and searchable PDFs with only a few lines of recognition code. Forget about manual retyping, save precious time and focus on the task at hand.

How to use the plug-in

  1. Install the Aspose.OCR package from NuGet or from a locally downloaded file.
  2. Load a scanned PDF into an `OcrInput` object.
  3. Create an instance of the Aspose.OCR recognition engine.
  4. Extract text from the scanned PDF.
  5. Output the recognized text or save it to a file.
using System;
using System.Collections.Generic; // required for List<T>
using Aspose.OCR;

namespace CSharpTutorials
{
    class Program
    {
        static void Main(string[] args)
        {
            // Apply a metered license (replace the placeholders with your own keys)
            Aspose.OCR.Metered metered = new Aspose.OCR.Metered();
            metered.SetMeteredKey("PublicKey", "PrivateKey");
            // Create the recognition engine
            Aspose.OCR.AsposeOcr recognitionEngine = new Aspose.OCR.AsposeOcr();
            // Add scanned PDFs to the OcrInput object
            Aspose.OCR.OcrInput input = new Aspose.OCR.OcrInput(Aspose.OCR.InputType.PDF);
            input.Add("source1.pdf", 0, 3); // start page and number of pages
            input.Add("source2.pdf"); // recognize all pages
            // Set the recognition language
            Aspose.OCR.RecognitionSettings recognitionSettings = new Aspose.OCR.RecognitionSettings();
            recognitionSettings.Language = Aspose.OCR.Language.Latin;
            // Recognize the PDFs; each item in the list is one recognition result
            List<Aspose.OCR.RecognitionResult> results = recognitionEngine.Recognize(input, recognitionSettings);
            // Print the recognized text
            foreach (Aspose.OCR.RecognitionResult result in results)
            {
                Console.WriteLine(result.RecognitionText);
            }
            // Save the first result as a plain-text file
            results[0].Save("result.txt", Aspose.OCR.SaveFormat.Text);
            // Or save all results as a single searchable PDF
            Aspose.OCR.AsposeOcr.SaveMultipageDocument("result.pdf", Aspose.OCR.SaveFormat.Pdf, results);
        }
    }
}
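
The sample prints every recognition result but saves only the first one as text. As a minimal sketch, assuming you would rather keep one text file per result, the helper below writes each string it receives to a numbered file. `PageTextWriter`, `SavePages` and the `page-001.txt` naming scheme are hypothetical and not part of Aspose.OCR; the demo uses stand-in strings where the `RecognitionText` values would go.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of Aspose.OCR.
static class PageTextWriter
{
    // Writes each text to "<prefix>-<NNN>.txt" inside outputDir and
    // returns the paths of the created files in input order.
    public static List<string> SavePages(IEnumerable<string> pageTexts, string outputDir, string prefix)
    {
        Directory.CreateDirectory(outputDir); // does nothing if the folder already exists
        var paths = new List<string>();
        int pageNumber = 1;
        foreach (string text in pageTexts)
        {
            string path = Path.Combine(outputDir, $"{prefix}-{pageNumber:D3}.txt");
            File.WriteAllText(path, text ?? string.Empty); // treat a missing text as empty
            paths.Add(path);
            pageNumber++;
        }
        return paths;
    }
}

static class Demo
{
    static void Main()
    {
        // Stand-in strings; in the sample these would be the RecognitionText values.
        var pages = new[] { "First page text", "Second page text" };
        foreach (string path in PageTextWriter.SavePages(pages, "output", "page"))
        {
            Console.WriteLine(path);
        }
    }
}
```

To use it with the sample, collect `result.RecognitionText` from each item in `results` into a list and pass that list as the first argument.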

System requirements

To use the Scanned PDF to text for .NET plug-in, make sure your system meets the following requirements: