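Detect words and graphs in an image and slice the image into one image per word or graph

I am building a web application meant to help with learning math.

I need to display math content from LaTeX files. These LaTeX files are rendered (cleanly) to PDF, which I can then convert cleanly to SVG thanks to pdf2svg.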

_______________________________________ 
|          | 
| 1. Word1 word2 word3 word4   | 
| a. Word5 word6 word7    | 
|          | 
| ///////////Graph1///////////  | 
|          | 
| b. Word8 word9 word10    | 
|          | 
| 2. Word11 word12 word13 word14  | 
|          | 
|_______________________________________|

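The image (SVG, PNG, or any other image format) looks something like the layout above.

The web application is meant to manipulate this content and add new content to it, producing something like this: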

_______________________________________ 
|          | 
| 1. Word1 word2      | <-- New line break 
|_______________________________________| 
|          | 
| -> NewContent1      | 
|_______________________________________| 
|          | 
| word3 word4       | 
|_______________________________________| 
|          | 
| -> NewContent2      | 
|_______________________________________| 
|          | 
| a. Word5 word6 word7    | 
|_______________________________________| 
|          | 
| ///////////Graph1///////////  | 
|_______________________________________| 
|          | 
| -> NewContent3      | 
|_______________________________________| 
|          | 
| b. Word8 word9 word10    | 
|_______________________________________| 
|          | 
| 2. Word11 word12 word13 word14  | 
|_______________________________________|

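A single large image does not give me the flexibility to do this kind of manipulation.

However, if the image file were split into smaller files, each holding a single word or a single graph, I could do these manipulations. I am looking for a way to do this, so that the result looks something like this: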

_______________________________________ 
|   |  |  |   | 
| 1. Word1 | word2 | word3 | word4  | 
|__________|_______|_______|____________| 
|    |  |     | 
| a. Word5 | word6 | word7   | 
|_____________|_______|_________________| 
|          | 
| ///////////Graph1///////////  | 
|_______________________________________| 
|    |  |     | 
| b. Word8 | word9 | word10   | 
|_____________|_______|_________________| 
|   |  |  |   | 
| 2. Word11 | word12 | word13 | word14 | 
|___________|________|________|_________|

What I think I need to do is detect the whitespace in the image and slice the image into multiple sub-images, as in the layout above. What do you think is the way to go?

Thank you!

Source

2017-08-19 enzolito

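Vertical and horizontal projections. First split the whole image into rows, then split each row into columns.

Thanks, Dan. I get the idea. What tools would you use for the vertical and horizontal projections? Can it be automated? Can it detect the rows and columns? – enzolito

Basically, compute the mean intensity per row (for example using 'cv2.reduce') and use it to identify the white gaps between the lines. Find the midpoints of those gaps and use them as cut points, so that each slice holds one line of text or one graph.

Use horizontal and vertical projections to first split the image into lines, and then split each line into smaller slices (e.g. words).

Start by converting the image to grayscale and inverting it, so that gaps contain zeros and any text or graphics are non-zero.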

import cv2
import numpy as np

img = cv2.imread('article.png', cv2.IMREAD_COLOR)
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_gray_inverted = 255 - img_gray

Compute the horizontal projection: use cv2.reduce to get the mean intensity per row, and flatten the result to a linear array.

row_means = cv2.reduce(img_gray_inverted, 1, cv2.REDUCE_AVG, dtype=cv2.CV_32F).flatten()

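Now find the row ranges of all contiguous gaps. You can use the zero_runs function from https://stackoverflow.com/a/24892274/3962537 (it is included in the full script below).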

row_gaps = zero_runs(row_means)

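Finally, compute the midpoints of the gaps, which will be used to cut up the image.

# Integer division keeps the cut points usable as array indices.
row_cutpoints = (row_gaps[:,0] + row_gaps[:,1] - 1) // 2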
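To make the gap and cut point arithmetic concrete, here is a small sketch on a made-up projection. The array `toy_projection` is invented for illustration; `zero_runs` is the helper from the linked answer, repeated so the snippet runs on its own. Note that each range is half-open, i.e. `[start, end)`.

```python
import numpy as np

def zero_runs(a):
    # Half-open [start, end) index ranges of consecutive zeros in a 1-D array.
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    return np.where(absdiff == 1)[0].reshape(-1, 2)

# Toy projection: zeros are blank rows, non-zeros are rows containing ink.
toy_projection = np.array([0, 0, 3, 5, 0, 0, 0, 2, 0], dtype=np.float32)

gaps = zero_runs(toy_projection)
cutpoints = (gaps[:, 0] + gaps[:, 1] - 1) // 2

print(gaps.tolist())       # [[0, 2], [4, 7], [8, 9]]
print(cutpoints.tolist())  # [0, 5, 8]
```

The middle gap covers indices 4, 5 and 6, so its midpoint is 5. Every pair of consecutive cut points then delimits one line.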

Visualized, you end up with a situation where the gaps are highlighted in pink and the cut points in red.

The next step is to process each identified line.

bounding_boxes = [] 
for n,(start,end) in enumerate(zip(row_cutpoints, row_cutpoints[1:])): 
    line = img[start:end] 
    line_gray_inverted = img_gray_inverted[start:end]

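Compute the vertical projection (mean intensity per column), and find the gaps and cut points. Additionally, compute the gap sizes, so that the small gaps between individual letters can be filtered out.

    column_means = cv2.reduce(line_gray_inverted, 0, cv2.REDUCE_AVG, dtype=cv2.CV_32F).flatten()
    column_gaps = zero_runs(column_means)
    column_gap_sizes = column_gaps[:,1] - column_gaps[:,0]
    # Integer division keeps the cut points usable as array indices.
    column_cutpoints = (column_gaps[:,0] + column_gaps[:,1] - 1) // 2

Filter the cut points.

    filtered_cutpoints = column_cutpoints[column_gap_sizes > 5]

Create a list of bounding boxes, one for each segment.

    for xstart,xend in zip(filtered_cutpoints, filtered_cutpoints[1:]):
        # Plain ints so the corners can be passed straight to cv2.rectangle.
        bounding_boxes.append(((int(xstart), int(start)), (int(xend), int(end))))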
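The two passes above can be tried without any image file. The following is a minimal sketch, not the answer's code: it swaps `cv2.reduce` for NumPy's `mean` so it has no OpenCV dependency, and the names `find_cutpoints`, `segment`, `min_gap` and the synthetic `page` are all made up for illustration. The synthetic page has one line with two "words" (the first with a one-column letter gap) and a second line holding one wide "graph".

```python
import numpy as np

def zero_runs(a):
    # Half-open [start, end) index ranges of consecutive zeros in a 1-D array.
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    return np.where(absdiff == 1)[0].reshape(-1, 2)

def find_cutpoints(projection, min_gap=0):
    # Midpoints of the zero runs that are wider than min_gap.
    gaps = zero_runs(projection)
    sizes = gaps[:, 1] - gaps[:, 0]
    cutpoints = (gaps[:, 0] + gaps[:, 1] - 1) // 2
    return cutpoints[sizes > min_gap]

def segment(inverted, min_col_gap):
    # inverted: 2-D array where 0 is background and non-zero is ink.
    boxes = []
    row_cuts = find_cutpoints(inverted.mean(axis=1))
    for top, bottom in zip(row_cuts, row_cuts[1:]):
        line = inverted[top:bottom]
        col_cuts = find_cutpoints(line.mean(axis=0), min_col_gap)
        for left, right in zip(col_cuts, col_cuts[1:]):
            boxes.append(((int(left), int(top)), (int(right), int(bottom))))
    return boxes

# Synthetic "page": already inverted, so ink is 255 and background is 0.
page = np.zeros((12, 20), dtype=np.uint8)
page[2:5, 2:4] = 255    # first word, first letter
page[2:5, 5:7] = 255    # first word, second letter (1-column gap before it)
page[2:5, 11:15] = 255  # second word
page[8:10, 3:16] = 255  # a wide "graph" on the second line

boxes = segment(page, min_col_gap=1)
print(boxes)
# [((0, 0), (8, 6)), ((8, 0), (17, 6)), ((1, 6), (17, 10))]
```

The one-column letter gap is filtered out, so the two letters stay in one box. Note that the page margins also count as gaps: if a margin is narrower than the threshold, or missing entirely, the first or last segment of a line has no outer cut point and is dropped, so the threshold needs tuning to the rendering resolution.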

Visualized for one line, you again get the gaps highlighted in pink and the cut points in red.

Now you can cut up the image. I'll just visualize the bounding boxes that were found.

Full script:

import cv2 
import numpy as np 
import matplotlib.pyplot as plt 
from matplotlib import gridspec 


def plot_horizontal_projection(file_name, img, projection): 
    fig = plt.figure(1, figsize=(12,16)) 
    gs = gridspec.GridSpec(1, 2, width_ratios=[3,1]) 

    ax = plt.subplot(gs[0]) 
    im = ax.imshow(img, interpolation='nearest', aspect='auto') 
    ax.grid(which='major', alpha=0.5) 

    ax = plt.subplot(gs[1]) 
    ax.plot(projection, np.arange(img.shape[0]), 'm') 
    ax.grid(which='major', alpha=0.5) 
    plt.xlim([0.0, 255.0]) 
    plt.ylim([-0.5, img.shape[0] - 0.5]) 
    ax.invert_yaxis() 

    fig.suptitle("FOO", fontsize=16) 
    gs.tight_layout(fig, rect=[0, 0.03, 1, 0.97]) 

    fig.set_dpi(200) 

    fig.savefig(file_name, bbox_inches='tight', dpi=fig.dpi) 
    plt.clf() 

def plot_vertical_projection(file_name, img, projection): 
    fig = plt.figure(2, figsize=(12, 4)) 
    gs = gridspec.GridSpec(2, 1, height_ratios=[1,5]) 

    ax = plt.subplot(gs[0]) 
    im = ax.imshow(img, interpolation='nearest', aspect='auto') 
    ax.grid(which='major', alpha=0.5) 

    ax = plt.subplot(gs[1]) 
    ax.plot(np.arange(img.shape[1]), projection, 'm') 
    ax.grid(which='major', alpha=0.5) 
    plt.xlim([-0.5, img.shape[1] - 0.5]) 
    plt.ylim([0.0, 255.0]) 

    fig.suptitle("FOO", fontsize=16) 
    gs.tight_layout(fig, rect=[0, 0.03, 1, 0.97]) 

    fig.set_dpi(200) 

    fig.savefig(file_name, bbox_inches='tight', dpi=fig.dpi) 
    plt.clf() 

def visualize_hp(file_name, img, row_means, row_cutpoints): 
    row_highlight = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) 
    row_highlight[row_means == 0, :, :] = [255,191,191] 
    row_highlight[row_cutpoints, :, :] = [255,0,0] 
    plot_horizontal_projection(file_name, row_highlight, row_means) 

def visualize_vp(file_name, img, column_means, column_cutpoints): 
    col_highlight = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) 
    col_highlight[:, column_means == 0, :] = [255,191,191] 
    col_highlight[:, column_cutpoints, :] = [255,0,0] 
    plot_vertical_projection(file_name, col_highlight, column_means) 


# From https://stackoverflow.com/a/24892274/3962537 
def zero_runs(a): 
    # Create an array that is 1 where a is 0, and pad each end with an extra 0. 
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0])) 
    absdiff = np.abs(np.diff(iszero)) 
    # Runs start and end where absdiff is 1. 
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2) 
    return ranges 


img = cv2.imread('article.png', cv2.IMREAD_COLOR) 
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) 
img_gray_inverted = 255 - img_gray 

row_means = cv2.reduce(img_gray_inverted, 1, cv2.REDUCE_AVG, dtype=cv2.CV_32F).flatten() 
row_gaps = zero_runs(row_means) 
# Integer division keeps the cut points usable as array indices.
row_cutpoints = (row_gaps[:,0] + row_gaps[:,1] - 1) // 2

visualize_hp("article_hp.png", img, row_means, row_cutpoints) 

bounding_boxes = [] 
for n,(start,end) in enumerate(zip(row_cutpoints, row_cutpoints[1:])): 
    line = img[start:end] 
    line_gray_inverted = img_gray_inverted[start:end] 

    column_means = cv2.reduce(line_gray_inverted, 0, cv2.REDUCE_AVG, dtype=cv2.CV_32F).flatten() 
    column_gaps = zero_runs(column_means) 
    column_gap_sizes = column_gaps[:,1] - column_gaps[:,0] 
    # Integer division keeps the cut points usable as array indices.
    column_cutpoints = (column_gaps[:,0] + column_gaps[:,1] - 1) // 2

    filtered_cutpoints = column_cutpoints[column_gap_sizes > 5] 

    for xstart,xend in zip(filtered_cutpoints, filtered_cutpoints[1:]):
        # Plain ints so the corners can be passed straight to cv2.rectangle.
        bounding_boxes.append(((int(xstart), int(start)), (int(xend), int(end))))

    visualize_vp("article_vp_%02d.png" % n, line, column_means, filtered_cutpoints) 

result = img.copy() 

for bounding_box in bounding_boxes: 
    cv2.rectangle(result, bounding_box[0], bounding_box[1], (255,0,0), 2) 

cv2.imwrite("article_boxes.png", result)
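The script draws the boxes rather than cutting the image. As a hedged sketch of the cutting step, the boxes can be turned into sub-images with plain array slicing. The function name `crop_boxes` and the stand-in data are assumptions made for illustration; the box layout `((x_start, y_start), (x_end, y_end))` is the one built by the script.

```python
import numpy as np

def crop_boxes(image, bounding_boxes):
    # Each box is ((x_start, y_start), (x_end, y_end)).
    # NumPy images are indexed [row, column], i.e. [y, x].
    crops = []
    for (x0, y0), (x1, y1) in bounding_boxes:
        crop = image[int(y0):int(y1), int(x0):int(x1)]
        if crop.size:  # skip degenerate (empty) boxes
            crops.append(crop.copy())
    return crops

# Stand-in data so the sketch runs on its own: a 6x8 image with 3 channels.
demo_image = np.arange(6 * 8 * 3, dtype=np.uint8).reshape(6, 8, 3)
demo_boxes = [((0, 0), (4, 3)), ((4, 0), (8, 3)), ((1, 3), (7, 6))]

pieces = crop_boxes(demo_image, demo_boxes)
print([p.shape for p in pieces])  # [(3, 4, 3), (3, 4, 3), (3, 6, 3)]
```

With the real `img` and `bounding_boxes`, each piece could then be saved with `cv2.imwrite`, using whatever file naming scheme suits the application.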

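Source

2017-08-19 16:50:40

This is more than I expected. – enzolito

OpenCV cannot read or write .svg files. Is there a vector image format that OpenCV can handle? – enzolito

As far as I know, [it can't](https://github.com/opencv/opencv/tree/master/modules/imgcodecs/src). When you think about it, unless you render it, it is not a raster image, so the approach would need to be different (TBH, I would have to do some research to give you a good answer on that). One possibility would be to render it, find the bounding boxes using the current approach, and then use those coordinates to find the corresponding parts of the SVG.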

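Your image is of top quality: clean, not skewed, with well-separated characters. A dream!

First, perform binarization and blob detection (standard in OpenCV).

Then cluster the characters by grouping those that overlap in ordinate (i.e. that sit side by side in a row). This naturally isolates the individual lines.

In every line, sort the blobs from left to right and cluster them by proximity to isolate the words. This is a delicate step, because the spacing between characters within a word is close to the spacing between distinct words. Don't expect perfect results. Still, this should work better than a projection.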
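The grouping steps can be sketched in plain Python, independent of how the blobs were detected. This is an illustration under assumptions, not the answerer's code: blobs are taken to be `(x, y, w, h)` bounding boxes (for instance from `cv2.connectedComponentsWithStats` or `cv2.boundingRect`), and the names `group_into_lines`, `group_into_words` and the threshold `max_gap` are hypothetical.

```python
def group_into_lines(boxes):
    # boxes: list of (x, y, w, h). Blobs whose vertical extents overlap
    # are placed on the same line (interval merging on the y axis).
    lines = []  # each entry: [top, bottom, members]
    for box in sorted(boxes, key=lambda b: b[1]):
        x, y, w, h = box
        if lines and y < lines[-1][1]:  # overlaps the current line
            lines[-1][1] = max(lines[-1][1], y + h)
            lines[-1][2].append(box)
        else:
            lines.append([y, y + h, [box]])
    # Within each line, order the blobs left to right.
    return [sorted(members) for _, _, members in lines]

def group_into_words(line, max_gap):
    # line: boxes sorted left to right. A horizontal gap wider than
    # max_gap starts a new word.
    words = [[line[0]]]
    right_edge = line[0][0] + line[0][2]
    for box in line[1:]:
        if box[0] - right_edge > max_gap:
            words.append([box])
        else:
            words[-1].append(box)
        right_edge = max(right_edge, box[0] + box[2])
    return words

# Made-up blobs: three characters on one line, one wide "graph" below.
blobs = [(20, 10, 5, 10), (0, 40, 30, 20), (7, 12, 5, 8), (0, 10, 5, 10)]

lines = group_into_lines(blobs)
words = group_into_words(lines[0], max_gap=4)
print(lines)  # [[(0, 10, 5, 10), (7, 12, 5, 8), (20, 10, 5, 10)], [(0, 40, 30, 20)]]
print(words)  # [[(0, 10, 5, 10), (7, 12, 5, 8)], [(20, 10, 5, 10)]]
```

A fixed `max_gap` is the simplest possible choice; as noted above, intra-word and inter-word spacing can be close, so the threshold would likely need to depend on the font size, for example on the typical character height of the line.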

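The situation is worse with italics, because the horizontal spacing is even narrower. You may have to look at the "slanted distance", i.e. find the lines that are tangent to the characters in the direction of the italics. This can be achieved by applying a reverse shear transform.

Thanks to the grid, the graph will appear as one big blob.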
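To illustrate why a reverse shear helps, here is a small coordinate-only sketch. It assumes a known slant (horizontal shift per row of height above the baseline); the function name `occupied_columns`, the exaggerated slant of 1.0 and the synthetic strokes are all made up for illustration. On a real image the same shear could be applied with an affine warp such as `cv2.warpAffine`.

```python
import numpy as np

def occupied_columns(binary, slant=0.0):
    # Columns that contain ink after shifting every pixel left by
    # slant * (height above the bottom row), i.e. after undoing the shear.
    ys, xs = np.nonzero(binary)
    baseline = binary.shape[0] - 1
    upright_xs = np.rint(xs - slant * (baseline - ys)).astype(int)
    return np.unique(upright_xs)

# Two slanted strokes whose column ranges overlap (2..11 and 6..15).
strokes = np.zeros((10, 18), dtype=np.uint8)
for y in range(10):
    lean = 9 - y  # the higher the row, the further right the ink
    strokes[y, 2 + lean] = 1
    strokes[y, 6 + lean] = 1

print(occupied_columns(strokes).tolist())             # every column from 2 to 15
print(occupied_columns(strokes, slant=1.0).tolist())  # [2, 6]
```

Before the shear is undone, no column between the strokes is empty, so neither a projection nor a horizontal gap test can separate them. Afterwards the strokes occupy columns 2 and 6 with a clear gap in between.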

Source

2017-08-19 15:52:53

Thanks, Yves. – enzolito
