VB.NET - How to OCR a document to create a searchable PDF in a VB.NET application

Step 1: Download the trial version of Image Viewer CP Gold ActiveX and install it.

Step 2: Create a new Visual Basic project and select Windows Application.

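Step 3: Add a button to the form. The form also needs the Image Viewer CP Gold ActiveX control, which the code below refers to as AxImageViewer1.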

Step 4: Add the following code to the button's Click event.

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim iResult As Integer

        ' Load the multipage TIFF to be recognized.
        ' VB.NET requires parentheses around the argument list.
        AxImageViewer1.LoadMultiPage("c:\yoursourcefile.tif", 1)

        ' OCR settings used in this example.
        AxImageViewer1.OCRCharFilter = ""
        AxImageViewer1.OCRRecognizeMode = 1
        AxImageViewer1.OCRSetRect(0, 0, 0, 0)

        ' Run OCR and write the searchable PDF.
        iResult = AxImageViewer1.OCR2SearchableMultipagePDF("c:\test1.pdf", 0, "dictfiles")

        If iResult = -3 Then
            ' -3: no image has been loaded.
            MessageBox.Show("Please select image first")
        ElseIf iResult <> 1 Then
            ' Any value other than 1 is reported as a failure.
            MessageBox.Show("Start OCR Failed")
        Else
            MessageBox.Show("Save to c:\test1.pdf completed")
        End If
    End Sub
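
If the same result check is needed in more than one place, the handling in the click handler above could be factored into a small helper. This is only a sketch: the module name OcrResultHelper and the function DescribeOcrResult are hypothetical and not part of the ActiveX control. Only the return values 1 (success) and -3 (no image loaded) come from the example; treating every other value as a generic failure is an assumption.

```vbnet
Module OcrResultHelper
    ' Hypothetical helper: turns the value returned by
    ' OCR2SearchableMultipagePDF into the message shown to the user.
    ' Known from the example: 1 = success, -3 = no image loaded.
    ' Assumption: any other value is reported as a generic failure.
    Function DescribeOcrResult(ByVal result As Integer, ByVal outputPath As String) As String
        Select Case result
            Case 1
                Return "Save to " & outputPath & " completed"
            Case -3
                Return "Please select image first"
            Case Else
                Return "Start OCR Failed"
        End Select
    End Function
End Module
```

In the click handler, the If block would then reduce to a single call such as MessageBox.Show(DescribeOcrResult(iResult, "c:\test1.pdf")).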

A searchable PDF is simply a PDF with multiple layers. The top layer is the original image as it was scanned. Beneath it is a layer of text, positioned so that each word sits directly behind the pixels that represent it.

A searchable PDF has the following benefits:

   1. You can select the text, copy it to the clipboard, and then paste into a metadata collection form.
   2. You can highlight search hits.
   3. Search engines will be able to index the PDF and return it later as a search result.