The librosa library: [NLP] Audio Feature Engineering (1) (Part 2)


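Feature extraction
Zero-crossing rate (ZCR)
The zero-crossing rate is the rate of sign changes along a signal, i.e., the rate at which the signal changes from positive to negative or back. This feature has been used heavily in both speech recognition and music information retrieval. It usually has higher values for highly percussive sounds like those in metal and rock.
In other words, the zero-crossing rate describes how often the signal passes through zero within each frame.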
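# Zero-crossing rate: plot a short window of the waveform.
# `info` is assumed to be the audio time series loaded earlier in this series,
# and `plt` is matplotlib.pyplot.
def ZCR_plot():
    start, end = 1300, 1500
    plt.figure(figsize=(10, 6))
    plt.plot(info[start:end])
    plt.grid(True)
    plt.show()

ZCR_plot()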
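Count the number of zero crossings in that window:
# start and end are local to ZCR_plot(), so the same window is defined again here
start, end = 1300, 1500
n_zcr = librosa.zero_crossings(info[start:end], pad=False)
print('# of ZCR is {}'.format(sum(n_zcr)))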
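To make the counting step concrete, here is a minimal, dependency-free sketch of what "counting zero crossings" means. It is not librosa's implementation: the function name count_zero_crossings and the toy signal are hypothetical, and a sample equal to zero is assumed to count as positive.

```python
def count_zero_crossings(samples):
    """Count sign changes between consecutive samples (zero counts as positive)."""
    crossings = 0
    for previous, current in zip(samples, samples[1:]):
        # A crossing happens when exactly one of the two neighbours is negative.
        if (previous < 0) != (current < 0):
            crossings += 1
    return crossings


# Hypothetical toy signal, not taken from the article's audio file.
toy_signal = [0.5, 0.2, -0.1, -0.4, 0.3, 0.6, -0.2]
n_crossings = count_zero_crossings(toy_signal)
print('# of ZCR is {}'.format(n_crossings))
```

The toy signal changes sign three times (0.2 to -0.1, -0.4 to 0.3, 0.6 to -0.2), so the count is 3.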
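To compute the zero-crossing rate per frame, use librosa.feature.zero_crossing_rate(), whose interface is:
librosa.feature.zero_crossing_rate(y, frame_length=2048, hop_length=512, center=True)
Parameters:
    y: audio time series
    frame_length: frame length, in samples
    hop_length: hop length (frame shift), in samples
    center: bool; if True, the edges of y are padded so that the frames are centered
Returns:
    zcr: zcr[0, i] is the zero-crossing rate of frame i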
print('ZCR is')
print(librosa.feature.zero_crossing_rate(info))
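The frame-wise rate can be sketched in plain Python to show how frame_length and hop_length interact. This is an illustration under stated assumptions, not the library's code: the helper below is hypothetical, does no edge padding (so it corresponds roughly to center=False), and divides the number of sign changes by the frame length.

```python
def zero_crossing_rate(samples, frame_length=2048, hop_length=512):
    """Return one zero-crossing rate per frame, without edge padding."""
    rates = []
    # Slide a window of frame_length samples forward by hop_length each step.
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frame = samples[start:start + frame_length]
        crossings = sum(
            1 for previous, current in zip(frame, frame[1:])
            if (previous < 0) != (current < 0)
        )
        # Fraction of positions in the frame at which the sign flips.
        rates.append(crossings / frame_length)
    return rates


# Hypothetical toy input: an alternating signal flips sign at every step.
alternating = [1.0, -1.0] * 8  # 16 samples
rates = zero_crossing_rate(alternating, frame_length=8, hop_length=4)
print(rates)
```

With 16 samples, a frame length of 8 and a hop of 4, three frames fit, and each contains 7 sign changes out of 8 samples.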
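Spectral centroid
It indicates where the "centre of mass" of a sound is located and is calculated as the weighted mean of the frequencies present in the sound. If the frequencies in a piece of music are the same throughout, the spectral centroid stays around the centre; if there are high frequencies at the end of the sound, the centroid shifts towards its end.
The spectral centroid is the center of mass of the sound, i.e. the first moment of the spectrum. Its position indicates the frequency band around which the energy of that stretch of spectrum is concentrated.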
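As a rough illustration of "weighted mean of the frequencies", the sketch below computes the centroid of a single frame from a list of bin frequencies and their magnitudes. The function name, the toy values and the choice to return 0 Hz for a silent frame are all assumptions made for this example.

```python
def spectral_centroid(frequencies, magnitudes):
    """Magnitude-weighted mean of the bin frequencies for a single frame."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # Assumption: report 0 Hz for a silent frame.
    weighted = sum(f * m for f, m in zip(frequencies, magnitudes))
    return weighted / total


# Hypothetical bin frequencies in Hz and two toy magnitude spectra.
bin_frequencies = [100.0, 200.0, 300.0, 400.0]
flat = spectral_centroid(bin_frequencies, [1.0, 1.0, 1.0, 1.0])
bright = spectral_centroid(bin_frequencies, [0.0, 0.0, 1.0, 3.0])
print(flat, bright)
```

A flat spectrum puts the centroid in the middle of the band (250 Hz here), while moving the energy into the upper bins pulls it upward (375 Hz).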
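# Spectral centroid
import sklearn.preprocessing

def spec_center():
    x = info[:80000]  # 80000 / 8000 = 10 seconds of data
    # Pass the signal by keyword; recent librosa releases require y=...
    spec_centroids = librosa.feature.spectral_centroid(y=x, sr=sr)[0]
    frames = range(len(spec_centroids))
    t = librosa.frames_to_time(frames, sr=sr)  # time axis

    # Min-max normalization so the curve fits on the waveform plot
    def normalize(x, axis=0):
        return sklearn.preprocessing.minmax_scale(x, axis=axis)

    # dd is librosa.display; newer librosa releases replace waveplot with waveshow
    dd.waveplot(x, sr=sr, alpha=0.4, label='wave')
    plt.plot(t, normalize(spec_centroids), color='r', linewidth=1, linestyle=':', label='spec_center')
    plt.legend()
    plt.show()

spec_center()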
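librosa.feature.spectral_centroid computes the spectral centroid of every frame.
librosa.frames_to_time converts frame indices to times, so that time[i] corresponds to frame[i].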
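The frame-to-time conversion is simple arithmetic: frame i starts i * hop_length samples into the signal, and dividing by the sampling rate gives seconds. The sketch below assumes the default hop length of 512 samples and the 8000 Hz sampling rate used in this article; the function is a stand-in written for illustration, not the library routine itself.

```python
def frames_to_time(frames, sr, hop_length=512):
    """Convert frame indices to timestamps in seconds."""
    return [frame * hop_length / sr for frame in frames]


# First four frames at the article's sampling rate of 8000 Hz.
times = frames_to_time(range(4), sr=8000)
print(times)
```

At 8000 Hz with a hop of 512 samples, consecutive frames are 0.064 s apart.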
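Spectral rolloff
Spectral rolloff is the frequency below which a specified percentage of the total spectral energy, e.g. 90%, lies.
Choose a frequency point f such that the spectral energy below f reaches a set fraction of the total energy, e.g. 90% or 85% (empirical values).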
$$\underset{f_c\in\{1,\dots,N\}}{\arg\min}\ \sum_{i=1}^{f_c} m_i \geq 0.85\times\sum_{i=1}^{N} m_i$$
where $f_c$ is the rolloff frequency and $m_i$ is the energy component of the spectrum at frequency bin $i$.
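The arg min condition can be read as "walk up the spectrum, accumulating energy, and stop at the first bin where the running sum reaches 85% of the total". A minimal sketch of that search for one frame follows; the function name is hypothetical, the returned index is 0-based (so it is one less than the 1-based $f_c$ in the formula), and converting the bin index to Hz is left out.

```python
def spectral_rolloff_bin(magnitudes, roll_percent=0.85):
    """Return the first 0-based bin where cumulative energy reaches the threshold."""
    if not magnitudes:
        raise ValueError("magnitudes must not be empty")
    threshold = roll_percent * sum(magnitudes)
    cumulative = 0.0
    for index, magnitude in enumerate(magnitudes):
        cumulative += magnitude
        if cumulative >= threshold:
            return index
    # Only reachable through rounding error; fall back to the last bin.
    return len(magnitudes) - 1


# Hypothetical toy spectra: one flat, one with most energy in the lowest bin.
flat_bin = spectral_rolloff_bin([1.0] * 10)
low_heavy_bin = spectral_rolloff_bin([8.0, 1.0, 1.0, 0.0, 0.0])
print(flat_bin, low_heavy_bin)
```

For the flat spectrum the running sum first reaches 8.5 at the ninth bin (index 8); when most of the energy sits in the lowest bin, the rolloff point drops to index 1.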
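# Spectral rolloff
import sklearn.preprocessing

def spec_rolloff():
    x = info[:160000]  # first 20 seconds
    # Pass the signal by keyword; recent librosa releases require y=...
    spec_roll = librosa.feature.spectral_rolloff(y=x, sr=sr)[0]
    frames = range(len(spec_roll))
    t = librosa.frames_to_time(frames, sr=sr)  # time axis

    # Min-max normalization so the curve fits on the waveform plot
    def normalize(x, axis=0):
        return sklearn.preprocessing.minmax_scale(x, axis=axis)

    # dd is librosa.display; newer librosa releases replace waveplot with waveshow
    dd.waveplot(x, sr=sr, alpha=0.4, label='wave')
    plt.plot(t, normalize(spec_roll), color='r', linewidth=1, linestyle=':', label='spec_roll')
    plt.legend()
    plt.show()

spec_rolloff()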
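MFCC (Mel-frequency cepstral coefficients)
The MFCC is one of the most important methods for extracting a feature of an audio signal and is used heavily when working on audio signals. The mel-frequency cepstral coefficients (MFCCs) of a signal are a small set of features (usually about 10-20) which concisely describe the overall shape of a spectral envelope.
MFCCs are an important audio signal feature. They form a set of features that together represent the envelope of the spectrum.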
# MFCC
def mfcc_plot():
    x = info[:160000]  # first 20 seconds
    # Pass the signal by keyword; recent librosa releases require y=...
    mfccs = librosa.feature.mfcc(y=x, sr=sr)
    print(mfccs.shape)
    dd.specshow(mfccs, sr=sr, x_axis='time')
    plt.show()

mfcc_plot()
The resulting mfccs.shape is (20, 313): each frame has 20 MFCC dimensions, and there are 313 frames.
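The frame count of 313 can be checked by hand. The sketch below assumes the defaults are in effect (hop length of 512 samples and centered, edge-padded frames) and that the clip really contains 160000 samples; the helper name is hypothetical.

```python
def expected_frame_count(n_samples, hop_length=512):
    """Frames produced when the signal is padded so that frames are centered."""
    return 1 + n_samples // hop_length


# 160000 samples is the 20-second excerpt used in the article.
n_frames = expected_frame_count(160000)
print((20, n_frames))
```

160000 // 512 is 312, and the extra frame from centering gives 313, matching the shape reported above.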
Interface
Resampling
Resample a time series from orig_sr to target_sr.
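y_hat = librosa.resample(y, orig_sr=orig_sr, target_sr=target_sr, fix=True, scale=False)
Parameters:
    y: audio time series, mono or stereo
    orig_sr: original sampling rate of y
    target_sr: target sampling rate
    fix: bool; adjust the length of the resampled signal so that it is exactly len(y) / orig_sr * target_sr = t * target_sr
    scale: bool; scale the resampled signal so that y and y_hat have approximately equal energy
Returns:
    y_hat: the resampled audio time series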
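The length that fix=True targets follows directly from the formula above: the duration t = len(y) / orig_sr stays the same, so the sample count scales with the ratio of the two rates. Here is a small sketch of that arithmetic only; it performs no actual resampling, the helper name is hypothetical, and rounding up to a whole sample is an assumption for non-integer results.

```python
import math


def resampled_length(n_samples, orig_sr, target_sr):
    """Number of samples after resampling, keeping the duration unchanged."""
    return math.ceil(n_samples * target_sr / orig_sr)


# 10 seconds at 8000 Hz, resampled to 16000 Hz.
n_out = resampled_length(80000, orig_sr=8000, target_sr=16000)
print(n_out, n_out / 16000)
```

Doubling the sampling rate doubles the number of samples, and the duration remains 10 seconds.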