How to extract signatures from paper documents

by Gourav Bais
April 7, 2023
9 min read

In the machine learning (ML) era, everything from language generation to image processing is becoming automated. One emerging field is online document processing, which is used by banking, insurance, healthcare, and other industries to save the time and effort of manual data verification.

ML technologies like intelligent character recognition (ICR) and natural language processing (NLP) are helping organizations to capture data from documents and process them without the risk of human error.

Document processing isn’t limited to extracting text, though; it also involves images and signatures. In this tutorial, you will learn how to create a system that can extract document signatures.

What is signature extraction?

Signature extraction is the technique of automatically identifying the signatures in a scanned document and cropping them to use for different verification purposes. First, a signature must be detected and cropped out of the document. ML or computer vision models can extract that signature no matter how many times it is present in a document. Then, the signature can be used for validating the person’s identity, Know Your Customer (KYC) processing services, or contract and agreement processing.

Banking and finance services especially rely on signatures to verify a person’s identity. As more businesses transition to online platforms, they’re also switching from manually handled verification tasks to signature extraction, which is becoming increasingly accurate. This helps save the time and effort of printing, scanning, emailing, and making changes to documents.

The following are some use cases for signature extraction:

  • Banks: Banks rely on signature verification, where an extracted signature is validated against a ground truth to confirm that it’s from the same person.
  • Real estate: Buying and selling property requires a lot of paperwork. Once the contracts and legal documents are signed and shared across multiple cities or countries, they are normally converted to digital images. The signatures in those images can be extracted for verification.
  • Sales and procurement: Many consumers have already shifted from in-store to e-commerce purchases. Business purchases, however, still require signed sales contracts and other paperwork.
  • Company onboarding: New hires, especially at larger organizations, will need to sign a letter of acceptance, offer letter, and/or nondisclosure agreement, among other documents.
  • Legal agreements: Signatures are required for documents in all types of legal proceedings, such as court cases or estate hearings.

Implementing signature extraction

A signature extraction system can be developed in two ways: traditional computer vision using OpenCV and object detection with deep learning. In this tutorial, you’ll be implementing the first solution using Python 3.9 and Anaconda.

If you install the latest version of Anaconda, it comes with Python 3.9 and pip, Python’s package manager. It also includes platforms to run your code, like Jupyter Notebook and Spyder. While you can use any of these platforms to write the code, Spyder is preferred because it is more interactive.

Once you have the dependencies set up, you can clone or download this project repository from Ahmet Özlü to follow along.

You should find the following files/folders inside the project repository:

  • The `inputs` folder stores the input images that are passed to the model to extract signatures.
  • The `outputs` folder stores the extracted signatures, or the output images produced by the code.
  • The `signature_extractor.py` file contains the implementation of computer-vision-based connected component analysis. You’ll need to run this file to produce the output.

Install the OpenCV library and other dependencies for the task. You can do so using pip either on the Anaconda prompt or on any terminal provided by Anaconda:

pip install opencv-python
pip install scikit-image

Other libraries like Matplotlib and NumPy already come with Anaconda. If you run into issues, though, you can install them in the same fashion:

pip install matplotlib
pip install numpy

When you open `signature_extractor.py`, you’ll see a lot of code. To better understand the process of signature extraction using connected component analysis, and the meaning behind each code block, follow along with this article and create a new Python file.

First, import the dependencies:

import cv2
import matplotlib.pyplot as plt
from skimage import measure, morphology
from skimage.color import label2rgb
from skimage.measure import regionprops
import numpy as np

Here, the `cv2` (OpenCV) and `scikit-image` (a.k.a. `skimage`) libraries are used for overall image processing, `numpy` is used to speed up the mathematical operations applied to the data, and `matplotlib` is used to plot the images.

Read the input image file from the local path and apply preprocessing that will help in the identification of the signature area:

img = cv2.imread('./inputs/in1.jpg', 0)
img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)[1]

Input image

In the above code, the input image is first read from the local path with the mode flag set to `0`, which tells OpenCV to load it as a grayscale image with a single color channel. Then, binary thresholding is applied to the image. Binary thresholding converts each pixel to black or white based on a threshold, in this case `127`: pixel values less than or equal to the threshold become 0 (black), and values greater than the threshold become 255 (white). The image generated is a binary image with only two pixel values.
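
To see the effect in isolation, you can run the same thresholding on a handful of sample pixel values. This is a minimal sketch; the array values here are made up purely for illustration:

import cv2
import numpy as np

# made-up pixel values straddling the threshold of 127
sample = np.array([[50, 126, 127], [128, 200, 255]], dtype=np.uint8)
binary = cv2.threshold(sample, 127, 255, cv2.THRESH_BINARY)[1]
print(binary)
# [[  0   0   0]
#  [255 255 255]]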

Now that the image is ready, connected component analysis must be applied to detect the connected regions in the image. This helps in identifying the signature area, as signature characters are coupled together. `skimage` provides a function to do this:

# connected component analysis with the scikit-image framework
blobs = img > img.mean()
blobs_labels = measure.label(blobs, background=1)
image_label_overlay = label2rgb(blobs_labels, image=img)

A blob is a set of pixel values that generally distinguishes an object from its background. In this case, the text and signature are dark blobs on a background of white pixels. The first line of code builds a boolean mask of pixels brighter than the image's mean intensity, which captures the white background. The second line labels each connected region of the remaining dark pixels, with `background=1` telling `measure.label` to treat the bright mask value as background. Finally, the labels are converted to RGB and overlaid on the original image for better visualization.
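
As a quick sanity check, you can count how many connected regions the labeling step found. This sketch reuses `blobs_labels` from the code above:

# labels run from 1..N, with 0 reserved for the background
num_regions = blobs_labels.max()
print("found", num_regions, "connected regions")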

You might want to see the RGB image after connected component analysis. You can do that with Matplotlib:

# draw image
# fix the figure size to (10, 6)
fig, ax = plt.subplots(figsize=(10, 6))

# plot the connected components (for debugging)
ax.imshow(image_label_overlay)
ax.set_axis_off()
plt.tight_layout()
plt.show()

Connected component analysis

So far you’ve read the image, analyzed its components, and visualized it. Generally, a signature will be bigger than other text areas in a document, so you need to do some measurements. Using component analysis, find the biggest component among the blobs:

# initialize the variables for finding the biggest component
the_biggest_component = 0
total_area = 0
counter = 0
average = 0.0

# iterate over each blob and track the largest component
for region in regionprops(blobs_labels):
    # count blobs bigger than 10 pixels toward the total area
    if region.area > 10:
        total_area += region.area
        counter += 1

    # among regions with large enough areas, keep the largest one
    if region.area >= 250 and region.area > the_biggest_component:
        the_biggest_component = region.area

# calculate the average area of the counted blob regions
average = total_area / counter
print("the_biggest_component: " + str(the_biggest_component))
print("average: " + str(average))

The above code iterates over every blob to find the component with the largest area. If a blob's area is greater than `10`, it is added to `total_area` and counted so the average can be computed. If a blob's area is at least `250` and larger than the current maximum, it replaces the maximum. When the loop finishes, `the_biggest_component` holds the largest area found, and the average blob area is printed alongside it.

As for the sizes this code uses, `10` works well for scanned images because genuine marks are almost always at least that large, while smaller blobs are usually scanning noise. Meanwhile, the largest object in an image is generally the signature, and a minimum area of `250` held up in testing.
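
If you prefer something more compact, the same statistics can be computed with comprehensions. This sketch reproduces the loop's behavior using the `blobs_labels` array from earlier:

# collect the area of every labeled region
areas = [region.area for region in regionprops(blobs_labels)]

# average over blobs larger than 10 pixels
counted = [a for a in areas if a > 10]
average = sum(counted) / len(counted)

# largest component among blobs of at least 250 pixels
the_biggest_component = max((a for a in areas if a >= 250), default=0)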

Next, you need to filter out some outliers that might otherwise be confused with the signature blob:

# the parameters are used to remove outliers of small size connected pixels
constant_parameter_1 = 84
constant_parameter_2 = 250
constant_parameter_3 = 100

# the parameter is used to remove outliers of large size connected pixels
constant_parameter_4 = 18

The values in the code above were settled on after testing different blob-size cutoffs to see which best isolated signatures.

For outlier removal, you need to define some thresholds. Four parameters are initialized above: three for small-size outlier removal and one for large-size outlier removal. First, compute the threshold for removing small outliers:

# experimental-based ratio calculation, modify it for your cases
a4_small_size_outlier_constant = ((average/constant_parameter_1)*constant_parameter_2)+constant_parameter_3
print("a4_small_size_outlier_constant: " + str(a4_small_size_outlier_constant))

Above, `a4_small_size_outlier_constant` serves as a threshold for A4-size scanned documents: any connected component smaller than it is removed as an outlier.
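
For example, if the average blob area came out to 84 pixels, the threshold would be (84 / 84) * 250 + 100 = 350, so any connected component smaller than 350 pixels would be dropped.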

Similarly, check the big size outliers:

# experimental-based ratio calculation, modify it for your cases
a4_big_size_outlier_constant = a4_small_size_outlier_constant*constant_parameter_4
print("a4_big_size_outlier_constant: " + str(a4_big_size_outlier_constant))

Here, `a4_big_size_outlier_constant` serves as the opposite threshold: any connected component in an A4-size scan that is bigger than it is removed as an outlier.
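
Continuing the same example, a small-size threshold of 350 gives a big-size threshold of 350 * 18 = 6,300 pixels, so anything larger (a big graphic or stamp, for instance) is treated as an outlier rather than a signature.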

Once you have these components, you can use the `morphology` operation to remove the outliers from your blob collection. You can then store the image locally, and it should be close to the final result:

# remove connected components smaller than the a4_small_size_outlier_constant threshold
pre_version = morphology.remove_small_objects(blobs_labels, a4_small_size_outlier_constant)

# remove connected components bigger than the a4_big_size_outlier_constant threshold
component_sizes = np.bincount(pre_version.ravel())
too_big = component_sizes > a4_big_size_outlier_constant
too_big_mask = too_big[pre_version]
pre_version[too_big_mask] = 0

# save the pre-version: the label image left after connected component analysis and outlier removal
plt.imsave('pre_version.png', pre_version)

`pre_version.png` is the image obtained after all the preprocessing. As a final step, read this image again and apply Otsu’s thresholding:

# read the pre-version
img = cv2.imread('pre_version.png', 0)
# ensure a binary image with Otsu’s method
img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
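
Note that when the `THRESH_OTSU` flag is set, OpenCV ignores the threshold you pass in (`0` here) and computes one automatically from the image histogram. The computed value comes back as the first element of the tuple `cv2.threshold` returns, which can be handy for debugging:

# same call as above, keeping Otsu's computed threshold for inspection
otsu_threshold, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
print("Otsu threshold:", otsu_threshold)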

To store the final image with signature only, use OpenCV’s write function to save the result:

cv2.imwrite("./outputs/output.png", img)

Output image

Now your document signature extractor is ready.
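
If you'd like all of the steps in one place, they can be wrapped into a single function. The following is a minimal sketch under the same assumptions as this tutorial (A4 scans and the constants tuned above); it also skips the intermediate `pre_version.png` round trip by binarizing the label image directly, so treat it as a starting point rather than a drop-in replacement:

import cv2
import numpy as np
from skimage import measure, morphology
from skimage.measure import regionprops

def extract_signature(input_path, output_path):
    # read as grayscale and binarize
    img = cv2.imread(input_path, 0)
    img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)[1]

    # label connected regions of dark pixels
    blobs = img > img.mean()
    blobs_labels = measure.label(blobs, background=1)

    # average area of blobs larger than 10 pixels
    areas = [r.area for r in regionprops(blobs_labels)]
    counted = [a for a in areas if a > 10]
    average = sum(counted) / len(counted)

    # outlier thresholds tuned for A4 scans (see the constants above)
    small_outlier = (average / 84) * 250 + 100
    big_outlier = small_outlier * 18

    # drop components smaller than small_outlier or bigger than big_outlier
    labels = morphology.remove_small_objects(blobs_labels, small_outlier)
    component_sizes = np.bincount(labels.ravel())
    labels[(component_sizes > big_outlier)[labels]] = 0

    # render the surviving components as black ink on a white page
    result = np.where(labels > 0, 0, 255).astype(np.uint8)
    cv2.imwrite(output_path, result)

extract_signature('./inputs/in1.jpg', './outputs/output.png')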

This signature extractor removes all other entities from the document and leaves only the signature area. If you want to extract the exact location of the signature, you'll have to use an object detection approach instead.

Note that you'll need a decent amount of data to train an object detection model, typically around 200 images. You can follow the steps in this article to prepare your data and train an object detection model for signature extraction.

The best way to extract signatures with Dropbox Sign

You should now have a better understanding of signature extraction and its use cases, as well as how to create a signature extractor. This process offers benefits to a number of industries, because it increases automation and speeds up document processing while reducing human error and freeing up team members to focus on other tasks.

Instead of creating your own signature extractor, though, you can use a ready-made solution. One such application is Dropbox Sign. Its API allows you to sign and track eSignatures while still keeping those documents secure. Dropbox Sign easily integrates into your site or application for a seamless experience. To learn more, check out Dropbox Sign’s documentation.
