How can I get notes (frequencies and their times) from a spectrogram in Python?

I'm trying to write a Python program that takes an uploaded music file (a piano recording) and extracts the notes from it. I have created a spectrogram, but how can I get the frequencies out of it? And how can I fix the spectrogram (from the halfway point up it shows a mirror image)? I need something like this. Here is the code.

import numpy as np 
from matplotlib import pyplot as plt 
import scipy.io.wavfile as wav 
from numpy.lib import stride_tricks 

""" short time fourier transform of audio signal """ 
def stft(sig, frameSize, overlapFac=0.5, window=np.hanning): 
    win = window(frameSize) 
    hopSize = int(frameSize - np.floor(overlapFac * frameSize)) 

    # zeros at beginning (thus center of 1st window should be for sample nr. 0) 
    samples = np.append(np.zeros(int(np.floor(frameSize/2.0))), sig)  # np.zeros needs an int length
    # cols for windowing 
    cols = int(np.ceil((len(samples) - frameSize)/float(hopSize))) + 1  # int: used as an array shape below
    # zeros at end (thus samples can be fully covered by frames) 
    samples = np.append(samples, np.zeros(frameSize)) 

    frames = stride_tricks.as_strided(samples, shape=(cols, frameSize), strides=(samples.strides[0]*hopSize, samples.strides[0])).copy() 
    frames *= win 

    return np.fft.rfft(frames)  

""" scale frequency axis logarithmically """  
def logscale_spec(spec, sr=44100, factor=20.): 
    timebins, freqbins = np.shape(spec) 

    scale = np.linspace(0, 1, freqbins) ** factor 
    scale *= (freqbins-1)/max(scale) 
    scale = np.unique(np.round(scale)).astype(int)  # int: used as slice indices below

    # create spectrogram with new freq bins 
    newspec = np.complex128(np.zeros([timebins, len(scale)])) 
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            newspec[:,i] = np.sum(spec[:,scale[i]:], axis=1)
        else:
            newspec[:,i] = np.sum(spec[:,scale[i]:scale[i+1]], axis=1)

    # list center freq of bins 
    allfreqs = np.abs(np.fft.fftfreq(freqbins*2, 1./sr)[:freqbins+1]) 
    freqs = [] 
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            freqs += [np.mean(allfreqs[scale[i]:])]
        else:
            freqs += [np.mean(allfreqs[scale[i]:scale[i+1]])]

    return newspec, freqs 

""" plot spectrogram""" 
def plotstft(audiopath, binsize=2**10, plotpath=None, colormap="jet"): 
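    samplerate, samples = wav.read(audiopath)
    # Stereo files load with shape (n, 2); np.append in stft() would flatten
    # and interleave the channels, so mix down to mono first.
    if samples.ndim > 1:
        samples = samples.mean(axis=1)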
    s = stft(samples, binsize) 

    sshow, freq = logscale_spec(s, factor=1.0, sr=samplerate) 
    ims = 20.*np.log10(np.abs(sshow)/10e-6) # amplitude to decibel 

    timebins, freqbins = np.shape(ims) 

    plt.figure(figsize=(15, 7.5)) 
    plt.imshow(np.transpose(ims), origin="lower", aspect="auto", cmap=colormap, interpolation="none") 
    plt.colorbar() 

    plt.xlabel("time (s)") 
    plt.ylabel("frequency (Hz)") 
    plt.xlim([0, timebins-1]) 
    plt.ylim([0, freqbins]) 

    xlocs = np.float32(np.linspace(0, timebins-1, 5)) 
    plt.xticks(xlocs, ["%.02f" % l for l in ((xlocs*len(samples)/timebins)+(0.5*binsize))/samplerate]) 
    ylocs = np.int16(np.round(np.linspace(0, freqbins-1, 10))) 
    plt.yticks(ylocs, ["%.02f" % freq[i] for i in ylocs]) 

    if plotpath:
        plt.savefig(plotpath, bbox_inches="tight")
    else:
        plt.show()

    plt.clf() 

plotstft("Sound/piano2.wav")

Source

2017-11-05 Gleuq

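The audio transcription problem you describe is well known in the music information retrieval (MIR) research community. It is not an easy one to solve, and it has two aspects:

Pitch detection: finding the pitch frequencies is often hard because of harmonics and because notes are often glided into (C# may be detected instead of C). Tuning discrepancies add to the problem.
Beat detection: audio performances are often not played exactly in time, so finding the actual onsets can be tricky.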
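
To make the two aspects concrete, here is a minimal sketch of the most naive approach: take the strongest frequency bin in each STFT frame, map it to the nearest equal-tempered note, and merge consecutive frames with the same note. The function names `hz_to_note` and `frames_to_notes` and the `min_db` silence threshold are my own assumptions, not part of the question's code. It only works for clean, monophonic input and will be fooled by exactly the harmonics and glides described above.

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def hz_to_note(freq_hz):
    """Nearest equal-tempered note name, assuming A4 = 440 Hz."""
    midi = int(round(69 + 12 * np.log2(freq_hz / 440.0)))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

def frames_to_notes(spec, samplerate, frame_size, hop_size, min_db=-40.0):
    """Turn an rfft spectrogram (frames x bins) into (note, start_s, duration_s) tuples.

    Hypothetical helper: picks one peak per frame, so it is monophonic only.
    """
    mag = np.abs(spec)
    if mag.shape[0] == 0:
        return []
    # Centre frequency in Hz of every rfft bin
    freqs = np.fft.rfftfreq(frame_size, 1.0 / samplerate)
    # Strongest bin per frame, skipping the DC bin (index 0)
    peak_bins = np.argmax(mag[:, 1:], axis=1) + 1
    peak_mag = mag[np.arange(mag.shape[0]), peak_bins]
    # Frames whose peak is more than |min_db| below the loudest peak count as silence
    threshold = peak_mag.max() * 10 ** (min_db / 20.0)
    labels = [hz_to_note(freqs[b]) if m > threshold else None
              for b, m in zip(peak_bins, peak_mag)]

    # Merge runs of identical labels into notes with a start time and duration
    notes = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] is not None:
                notes.append((labels[start],
                              start * hop_size / samplerate,
                              (i - start) * hop_size / samplerate))
            start = i
    return notes
```

With the question's code this could be called on the output of `stft(samples, binsize)` using `frame_size=binsize` and the same hop size that `stft` computes internally. The times are only as precise as the hop size, and the frequency resolution is `samplerate / frame_size`, which is too coarse to separate neighbouring low piano notes with a 1024-sample frame at 44.1 kHz.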

A promising new approach is to use deep neural networks for this, for example:

Boulanger-Lewandowski, N., Bengio, Y., & Vincent, P. (2012). Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. arXiv preprint arXiv:1206.6392.

More information:

Poliner, G. E., Ellis, D. P., Ehmann, A. F., Gómez, E., Streich, S., & Ong, B. (2007). Melody transcription from music audio: Approaches and evaluation. IEEE Transactions on Audio, Speech, and Language Processing, 15(4), 1247-1256.

Source

2017-11-07 06:32:23 dorien
