VuVoPy.data.utils package
Submodules
VuVoPy.data.utils.formant_frequencies module
- class VuVoPy.data.utils.formant_frequencies.FormantFrequencies(fs, formants)[source]
Bases:
FormantFrequencies is a class for extracting and managing formant frequencies from segmented voice data. Formants are resonant frequencies of the vocal tract, and this class provides methods to calculate and retrieve formant frequencies from raw, pre-emphasized, and normalized voice segments.
- formants
A 3D array containing formant frequencies for raw, pre-emphasized, and normalized voice segments. The shape is (N, 3, 3), where N is the number of segments, the second dimension indexes the first three formants (F1, F2, F3), and the third dimension indexes the segment type (raw, pre-emphasized, normalized).
- Type:
numpy.ndarray
- __init__(fs, formants)[source]
Initializes the FormantFrequencies object with a sampling rate and formant frequencies.
- from_voice_sample(segments)[source]
Class method to create an instance of FormantFrequencies by extracting formant frequencies from segmented voice data.
- get_formants_preem()[source]
Returns the numpy array of formants extracted from the pre-emphasized waveform.
- get_formants_norm()[source]
Returns the numpy array of formants extracted from the normalized waveform.
- classmethod from_voice_sample(segments)[source]
Creates an instance of the class from a voice sample by extracting formant frequencies.
- Parameters:
cls – The class itself, used to create an instance.
segments – An object containing segmented voice data with methods to retrieve raw, pre-emphasized, and normalized segments, as well as the sampling rate.
- Returns:
An instance of the class initialized with the sampling rate and extracted formant frequencies.
Notes
The method calculates LPC coefficients for raw, pre-emphasized, and normalized segments.
Formant frequencies are derived from the roots of the LPC polynomial.
Only roots with non-negative imaginary parts are considered.
The method currently extracts and sorts the first three formants for each segment.
The bandwidths of the formants are not calculated at this stage.
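The notes above can be illustrated with a minimal NumPy sketch. This is not the package's implementation: the helper names `lpc_autocorr` and `formants_from_lpc`, the LPC order of 6, and the synthetic test signal are all assumptions made for illustration. It estimates LPC coefficients with the autocorrelation method, takes the roots of the LPC polynomial, keeps the roots with non-negative imaginary part, and converts their angles to frequencies in Hz.

```python
import numpy as np

def lpc_autocorr(frame, order):
    """Hypothetical helper: LPC coefficients [1, a1, ..., ap] (autocorrelation method)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    # Toeplitz normal equations R a = -r[1:], built by index lookup.
    idx = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    a = np.linalg.solve(r[idx], -r[1:])
    return np.concatenate(([1.0], a))

def formants_from_lpc(a, fs, n_formants=3):
    """Hypothetical helper: lowest n_formants root frequencies (Hz) of the LPC polynomial."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) >= 0]          # keep one root of each conjugate pair
    freqs = np.sort(np.angle(roots) * fs / (2.0 * np.pi))
    return freqs[:n_formants]                    # bandwidths are not computed

# Synthetic check: impulse response of an all-pole filter with known resonances.
fs = 8000
true_formants = np.array([500.0, 1500.0, 2500.0])
poles = 0.97 * np.exp(2j * np.pi * true_formants / fs)
a_true = np.real(np.poly(np.concatenate((poles, np.conj(poles)))))

h = np.zeros(2048)
for n in range(len(h)):
    acc = 1.0 if n == 0 else 0.0                 # unit impulse input
    for k in range(1, len(a_true)):
        if n - k >= 0:
            acc -= a_true[k] * h[n - k]
    h[n] = acc

estimated = formants_from_lpc(lpc_autocorr(h, order=6), fs)
```

Because real-valued roots pass the non-negative imaginary part filter, a real signal can yield spurious candidates at 0 Hz or at fs/2; whether the package filters these out is not stated in the notes.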
VuVoPy.data.utils.fundamental_frequency module
- class VuVoPy.data.utils.fundamental_frequency.FundamentalFrequency(sample, plim=(30, 500), hop_size=512, dlog2p=0.010416666666666666, dERBs=0.1, sTHR=-inf)[source]
Bases:
object
FundamentalFrequency is a class for analyzing and extracting the fundamental frequency (F0) from a given audio sample using the SWIPE’ algorithm.
- x
The waveform of the input audio sample.
- Type:
numpy.ndarray
- fs
The sampling rate of the input audio sample.
- Type:
int
- plim
A tuple specifying the pitch search range (min_freq, max_freq).
- Type:
tuple
- hop_size
The time step for analysis in samples.
- Type:
int
- dlog2p
The resolution of pitch candidates in log2 scale.
- Type:
float
- dERBs
The frequency resolution in ERBs.
- Type:
float
- sTHR
The pitch strength threshold.
- Type:
float
- f0
The computed fundamental frequency values.
- Type:
numpy.ndarray
- time
The time instances corresponding to the computed F0 values.
- Type:
numpy.ndarray
- strength
The pitch strength values.
- Type:
numpy.ndarray
VuVoPy.data.utils.swipep module
- VuVoPy.data.utils.swipep.pitchStrengthAllCandidates(f, L, pc)[source]
Calculate the pitch strength for all candidates.
Parameters:
f – Frequency
L – Loudness
pc – Pitch candidates
Returns: S – Pitch salience matrix
- VuVoPy.data.utils.swipep.pitchStrengthOneCandidate(f, L, pc)[source]
Calculate the pitch strength for one pitch candidate.
Parameters:
f – Frequency
L – Loudness
pc – Pitch candidate
Returns: S – Pitch strength for this candidate
- VuVoPy.data.utils.swipep.swipep(x, fs, plim, hop_size, dlog2p, dERBs, sTHR)[source]
Perform the SWIPE’ (Sawtooth Waveform Inspired Pitch Estimator) algorithm for pitch estimation in a given audio signal.
Parameters:
- x : ndarray
Input audio signal (1D array).
- fs : float
Sampling frequency of the audio signal (in Hz).
- plim : tuple
Pitch range as a tuple (min_pitch, max_pitch) in Hz.
- hop_size : float
Hop size for analysis (in samples).
- dlog2p : float
Step size for pitch candidates in log2 scale.
- dERBs : float
Step size for Equivalent Rectangular Bandwidth (ERB) spaced frequencies.
- sTHR : float
Threshold for pitch strength to consider a valid pitch.
Returns:
- p : ndarray
Estimated pitch values (in Hz) for each time frame.
- t : ndarray
Time vector corresponding to the pitch estimates (in seconds).
- s : ndarray
Pitch strength values for each time frame.
Notes:
The function uses a multi-resolution analysis approach to estimate pitch.
It computes pitch candidates, their strengths, and refines the pitch estimates using parabolic interpolation.
The algorithm is robust to noise and works well for a wide range of pitch frequencies.
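Two ingredients mentioned above, the log2-spaced pitch candidates and the parabolic refinement, can be sketched in isolation. The functions `pitch_candidates` and `parabolic_peak` below are hypothetical helpers written for illustration, not the module's own code. The default `dlog2p` of 0.0104166... equals 1/96, that is, 96 candidates per octave.

```python
import numpy as np

def pitch_candidates(plim=(30, 500), dlog2p=1.0 / 96.0):
    """Hypothetical helper: pitch candidates (Hz) spaced uniformly on a log2 axis."""
    log2pc = np.arange(np.log2(plim[0]), np.log2(plim[1]), dlog2p)
    return 2.0 ** log2pc

def parabolic_peak(x, y):
    """Hypothetical helper: vertex of the parabola through three (x, y) points."""
    c2, c1, c0 = np.polyfit(x, y, 2)             # y = c2*x^2 + c1*x + c0
    x_peak = -c1 / (2.0 * c2)
    return x_peak, c2 * x_peak**2 + c1 * x_peak + c0

pc = pitch_candidates()

# Three samples of a parabola whose maximum lies between grid points.
x = np.array([-1.0, 0.0, 1.0])
y = 2.0 - (x - 0.3) ** 2
x_peak, y_peak = parabolic_peak(x, y)
```

In a pitch tracker the three points would be the strongest candidate and its two neighbours on the log2 axis, so the refined pitch is `2.0 ** x_peak` when `x` holds log2 frequencies.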
VuVoPy.data.utils.vuvs_detection module
- class VuVoPy.data.utils.vuvs_detection.Vuvs(segment, fs, winlen=512, winover=496, wintype='hann', smoothing_window=5)[source]
Bases:
object
The Vuvs class is designed to analyze voiced, unvoiced, and silence segments in an audio signal. It uses Gaussian Mixture Models (GMM) to compute these segments and provides methods to retrieve various statistics about silence durations and counts.
- segment
The original audio segment.
- Type:
array-like
- segment_preem
The pre-emphasized audio segment.
- Type:
array-like
- segment_norm
The normalized audio segment.
- Type:
array-like
- fs
The sampling rate of the audio signal.
- Type:
int
- winlen
The length of the analysis window in samples. Default is 512.
- Type:
int
- winover
The overlap between consecutive windows in samples. Default is 496.
- Type:
int
- wintype
The type of window to apply (e.g., ‘hann’). Default is ‘hann’.
- Type:
str
- smoothing_window
The size of the smoothing window for post-processing. Default is 5.
- Type:
int
- vuvs
The computed voiced/unvoiced/silence segments.
- Type:
array-like
- get_total_silence_duration(min_silence_duration_ms=50)[source]
Calculate the total duration (in seconds) of silences longer than a specified threshold.
- get_silence_count(min_silence_duration_ms=50)[source]
Count the number of silent segments longer than a specified threshold.
- get_silence_durations(min_silence_duration_ms=50)[source]
Retrieve a list of durations (in seconds) for all silences longer than a specified threshold.
- calculate_vuvs()[source]
Calculate the voiced/unvoiced segments (VUVS) of an audio signal. This method uses a Gaussian Mixture Model (GMM) to determine the voiced and unvoiced segments of the audio signal based on the provided parameters.
- Returns:
A list of VUVs detected in the audio signal.
- Return type:
list
Notes
The method relies on the vuvs_gmm function, which performs the actual VUV detection.
The detection process uses the attributes segment, fs, winover, and smoothing_window of the class instance.
- get_silence_count(min_silence_duration_ms=50)[source]
Return number of silent segments longer than the threshold.
- get_silence_durations(min_silence_duration_ms=50)[source]
Return list of durations (in seconds) for all silences longer than the threshold.
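As a rough illustration of how the silence statistics could be derived from per-frame labels, the sketch below finds runs of silence frames and converts their lengths to seconds. It is not the class's implementation: the function `silence_durations` is hypothetical, the label value 0 for silence follows the vuvs_gmm description, and the hop between frames (for example winlen - winover = 16 samples with the defaults) is an assumption about how frames map to time.

```python
import numpy as np

def silence_durations(labels, fs, hop, min_silence_duration_ms=50, silence_label=0):
    """Hypothetical helper: durations (s) of silence runs longer than the threshold."""
    is_silence = (np.asarray(labels) == silence_label).astype(int)
    # Pad with zeros so runs touching either end of the signal are closed.
    edges = np.diff(np.concatenate(([0], is_silence, [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    durations = (ends - starts) * hop / fs
    threshold_s = min_silence_duration_ms / 1000.0
    return [float(d) for d in durations if d > threshold_s]

# Toy labels: 0 = silence, 1 = unvoiced, 2 = voiced; 10 ms per frame.
labels = [0] * 10 + [2] * 5 + [0] * 2 + [1] * 3 + [0] * 8
durations = silence_durations(labels, fs=1000, hop=10)

silence_count = len(durations)          # analogue of get_silence_count
total_silence = sum(durations)          # analogue of get_total_silence_duration
```

With these toy labels the 20 ms run is discarded by the 50 ms threshold, leaving two silences.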
VuVoPy.data.utils.vuvs_gmm module
- VuVoPy.data.utils.vuvs_gmm.vuvs_gmm(segments, sr, winover, smoothing_window=5)[source]
Classifies audio frames into voiced, unvoiced, or silence using Gaussian Mixture Models (GMMs) and applies smoothing and post-processing rules to refine the classification.
- Parameters:
segments (numpy.ndarray) – A 2D array of audio frames with shape (num_frames, frame_length).
sr (int) – Sampling rate of the audio signal in Hz.
winover (int) – Overlap between consecutive frames in samples.
smoothing_window – Window size for smoothing the classification labels. Defaults to 5.
- Returns:
An array of labels for each frame, where 0 = silence, 1 = unvoiced, and 2 = voiced.
- Return type:
numpy.ndarray
Notes
The function extracts features such as energy, high-to-low frequency ratio, normalized autocorrelation coefficient, and zero-crossing rate for each frame.
Two GMMs are used: the first separates voiced frames from unvoiced/silence, and the second separates unvoiced from silence.
Smoothing is applied to reduce noise in the classification labels.
Post-processing rules are applied to handle short segments and ensure temporal consistency.
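To make the feature list more concrete, here is a minimal NumPy sketch of three of the per-frame features named above: log energy, zero-crossing rate, and the normalized autocorrelation coefficient at lag 1. The function `frame_features` is a hypothetical helper, the exact definitions used by vuvs_gmm may differ, and the high-to-low frequency ratio is omitted because its band split is not specified here.

```python
import numpy as np

def frame_features(frames, eps=1e-12):
    """Hypothetical helper: (num_frames, 3) array of [log energy, ZCR, lag-1 autocorr]."""
    frames = np.asarray(frames, dtype=float)
    log_energy = np.log(np.sum(frames**2, axis=1) + eps)
    # Zero-crossing rate: fraction of adjacent sample pairs whose sign differs.
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    # Normalized autocorrelation coefficient at lag 1.
    head, tail = frames[:, :-1], frames[:, 1:]
    num = np.sum(head * tail, axis=1)
    den = np.sqrt(np.sum(head**2, axis=1) * np.sum(tail**2, axis=1)) + eps
    return np.column_stack([log_energy, zcr, num / den])

# Toy frames: a 100 Hz sine (voiced-like), white noise (unvoiced-like), zeros (silence).
sr, frame_length = 8000, 512
t = np.arange(frame_length) / sr
rng = np.random.default_rng(0)
frames = np.vstack([
    np.sin(2 * np.pi * 100 * t),
    rng.standard_normal(frame_length),
    np.zeros(frame_length),
])
features = frame_features(frames)
```

The sine frame shows a low zero-crossing rate and a lag-1 autocorrelation near 1, the noise frame shows the opposite, and the silent frame has by far the lowest energy, which is the kind of separation the two-stage GMM relies on.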