VuVoPy.data.utils package

Submodules

VuVoPy.data.utils.formant_frequencies module

class VuVoPy.data.utils.formant_frequencies.FormantFrequencies(fs, formants)[source]

Bases: Segmented

FormantFrequencies is a class for extracting and managing formant frequencies from segmented voice data. Formants are resonant frequencies of the vocal tract, and this class provides methods to calculate and retrieve formant frequencies from raw, pre-emphasized, and normalized voice segments. .. attribute:: formants

A 3D array containing formant frequencies for raw, pre-emphasized, and normalized voice segments. The shape is (N, 3, 3), where N is the number of segments, and the second and third dimensions correspond to the first three formants (F1, F2, F3) and the type of segment (raw, pre-emphasized, normalized), respectively.

type:

numpy.ndarray

__init__(fs, formants)[source]

Initializes the FormantFrequencies object with a sampling rate and formant frequencies.

from_voice_sample(segments)[source]

Class method to create an instance of FormantFrequencies by extracting formant frequencies from segmented voice data.

get_formants()[source]

Returns the numpy array of formants extracted from the raw waveform.

get_formants_preem()[source]

Returns the numpy array of formants extracted from the pre-emphasized waveform.

get_formants_norm()[source]

Returns the numpy array of formants extracted from the normalized waveform.

get_sampling_rate()[source]

Returns the sampling rate of the voice data.

classmethod from_voice_sample(segments)[source]

Creates an instance of the class from a voice sample by extracting formant frequencies. :param cls: The class itself, used to create an instance. :param segments: An object containing segmented voice data with methods to retrieve

raw, pre-emphasized, and normalized segments, as well as the sampling rate.

Returns:

An instance of the class initialized with the sampling rate and extracted formant frequencies.

Notes

  • The method calculates LPC coefficients for raw, pre-emphasized, and normalized segments.

  • Formant frequencies are derived from the roots of the LPC polynomial.

  • Only roots with non-negative imaginary parts are considered.

  • The method currently extracts and sorts the first three formants for each segment.

  • The bandwidths of the formants are not calculated at this stage.

get_formants()[source]

Return the numpy array of formants extracted from raw waveform

get_formants_norm()[source]

Return the numpy array of formants extracted from normalized waveform

get_formants_preem()[source]

Return the numpy array of formants extracted from pre-emphasis waveform

get_sampling_rate()[source]

Return the sampling rate.

VuVoPy.data.utils.fundamental_frequency module

class VuVoPy.data.utils.fundamental_frequency.FundamentalFrequency(sample, plim=(30, 500), hop_size=512, dlog2p=0.010416666666666666, dERBs=0.1, sTHR=-inf)[source]

Bases: object

FundamentalFrequency is a class for analyzing and extracting the fundamental frequency (F0) from a given audio sample using the SWIPE’ algorithm. .. attribute:: x

The waveform of the input audio sample.

type:

numpy.ndarray

fs

The sampling rate of the input audio sample.

Type:

int

plim

A tuple specifying the pitch search range (min_freq, max_freq).

Type:

tuple

hop_size

The time step for analysis in samples.

Type:

int

dlog2p

The resolution of pitch candidates in log2 scale.

Type:

float

dERBs

The frequency resolution in ERBs.

Type:

float

sTHR

The pitch strength threshold.

Type:

float

f0

The computed fundamental frequency values.

Type:

numpy.ndarray

time

The time instances corresponding to the computed F0 values.

Type:

numpy.ndarray

strength

The pitch strength values.

Type:

numpy.ndarray

__init__(sample, plim=(30, 500), hop_size=512, dlog2p=1 / 96, dERBs=0.1, sTHR=-np.inf)[source]

Initializes the FundamentalFrequency object with the given parameters and computes F0.

calculate_f0()[source]

Computes the fundamental frequency (F0) using the SWIPE’ algorithm.

get_f0()[source]

Returns the computed fundamental frequency values.

get_time()[source]

Returns the time instances corresponding to the computed F0 values.

get_strength()[source]

Returns the pitch strength values.

get_sampling_rate()[source]

Returns the sampling rate of the input audio sample.

calculate_f0()[source]

Compute F0 using SWIPE’ algorithm.

get_f0()[source]

Return computed fundamental frequency.

get_sampling_rate()[source]

Return the sampling rate.

get_strength()[source]

Return pitch strength values.

get_time()[source]

Return time instances corresponding to F0.

VuVoPy.data.utils.swipep module

VuVoPy.data.utils.swipep.erbs2hz(erbs)[source]

Convert ERBs to frequency in Hz.

VuVoPy.data.utils.swipep.hz2erbs(hz)[source]

Convert frequency in Hz to ERBs.

VuVoPy.data.utils.swipep.pitchStrengthAllCandidates(f, L, pc)[source]

Calculate the pitch strength for all candidates.

Parameters: f – Frequency y L – Loudness y pc – Pitch candidates

Returns: S – Pitch salience matrix

VuVoPy.data.utils.swipep.pitchStrengthOneCandidate(f, L, pc)[source]

Calculate the pitch strength for one pitch candidate.

Parameters: f – Frequency y L – Loudness y pc – Pitch candidate

Returns: S – Pitch strength for this candidate

VuVoPy.data.utils.swipep.swipep(x, fs, plim, hop_size, dlog2p, dERBs, sTHR)[source]

Perform the SWIPE’ (Sawtooth Waveform Inspired Pitch Estimator) algorithm for pitch estimation in a given audio signal. Parameters: ———– x : ndarray

Input audio signal (1D array).

fsfloat

Sampling frequency of the audio signal (in Hz).

plimtuple

Pitch range as a tuple (min_pitch, max_pitch) in Hz.

hop_sizefloat

Hop size for analysis (in samples).

dlog2pfloat

Step size for pitch candidates in log2 scale.

dERBsfloat

Step size for Equivalent Rectangular Bandwidth (ERB) spaced frequencies.

sTHRfloat

Threshold for pitch strength to consider a valid pitch.

Returns:

pndarray

Estimated pitch values (in Hz) for each time frame.

tndarray

Time vector corresponding to the pitch estimates (in seconds).

sndarray

Pitch strength values for each time frame.

Notes:

  • The function uses a multi-resolution analysis approach to estimate pitch.

  • It computes pitch candidates, their strengths, and refines the pitch estimates using parabolic interpolation.

  • The algorithm is robust to noise and works well for a wide range of pitch frequencies.

VuVoPy.data.utils.vuvs_detection module

class VuVoPy.data.utils.vuvs_detection.Vuvs(segment, fs, winlen=512, winover=496, wintype='hann', smoothing_window=5)[source]

Bases: object

The Vuvs class is designed to analyze voiced, unvoiced, and silence segments in an audio signal. It uses Gaussian Mixture Models (GMM) to compute these segments and provides methods to retrieve various statistics about silence durations and counts. .. attribute:: segment

The original audio segment.

type:

array-like

segment_preem

The pre-emphasized audio segment.

Type:

array-like

segment_norm

The normalized audio segment.

Type:

array-like

fs

The sampling rate of the audio signal.

Type:

int

winlen

The length of the analysis window in samples. Default is 512.

Type:

int

winover

The overlap between consecutive windows in samples. Default is 496.

Type:

int

wintype

The type of window to apply (e.g., ‘hann’). Default is ‘hann’.

Type:

str

smoothing_window

The size of the smoothing window for post-processing. Default is 5.

Type:

int

vuvs

The computed voiced/unvoiced/silence segments.

Type:

array-like

calculate_vuvs()[source]

Compute the voiced/unvoiced/silence segments using a GMM-based approach.

get_vuvs()[source]

Retrieve the computed voiced/unvoiced/silence segments.

get_sampling_rate()[source]

Retrieve the sampling rate of the audio signal.

get_total_silence_duration(min_silence_duration_ms=50)[source]

Calculate the total duration (in seconds) of silences longer than a specified threshold.

get_silence_count(min_silence_duration_ms=50)[source]

Count the number of silent segments longer than a specified threshold.

get_silence_durations(min_silence_duration_ms=50)[source]

Retrieve a list of durations (in seconds) for all silences longer than a specified threshold.

calculate_vuvs()[source]

Calculate the voiced/unvoiced segments (VUVS) of an audio signal. This method uses a Gaussian Mixture Model (GMM) to determine the voiced and unvoiced segments of the audio signal based on the provided parameters. :returns: A list of VUVs detected in the audio signal. :rtype: list

Notes

  • The method relies on the vuvs_gmm function, which performs the actual VUV detection.

  • The detection process uses the attributes segment, fs, winover, and smoothing_window of the class instance.

get_sampling_rate()[source]

Return the sampling rate.

get_silence_count(min_silence_duration_ms=50)[source]

Return number of silent segments longer than the threshold.

get_silence_durations(min_silence_duration_ms=50)[source]

Return list of durations (in seconds) for all silences longer than the threshold.

get_total_silence_duration(min_silence_duration_ms=50)[source]

Return total duration (in seconds) of silences longer than the threshold.

get_vuvs()[source]

Return computed voiced/unvoiced/scilence segments.

VuVoPy.data.utils.vuvs_gmm module

VuVoPy.data.utils.vuvs_gmm.vuvs_gmm(segments, sr, winover, smoothing_window=5)[source]

Classifies audio frames into voiced, unvoiced, or silence using Gaussian Mixture Models (GMMs) and applies smoothing and post-processing rules to refine the classification. :param segments: A 2D array of audio frames with shape (num_frames, frame_length). :type segments: numpy.ndarray :param sr: Sampling rate of the audio signal in Hz. :type sr: int :param winover: Overlap between consecutive frames in samples. :type winover: int :param smoothing_window: Window size for smoothing the classification labels.

Defaults to 5.

Returns:

An array of labels for each frame, where:

0 = silence, 1 = unvoiced, 2 = voiced.

Return type:

numpy.ndarray

Notes

  • The function extracts features such as energy, high-to-low frequency ratio, normalized autocorrelation coefficient, and zero-crossing rate for each frame.

  • Two GMMs are used: the first separates voiced frames from unvoiced/silence, and the second separates unvoiced from silence.

  • Smoothing is applied to reduce noise in the classification labels.

  • Post-processing rules are applied to handle short segments and ensure temporal consistency.

Module contents