VuVoPy package

Subpackages

Module contents

VuVoPy.durmad(folder_path, winlen=512, winover=496, wintype='hamm')

Compute the median absolute deviation (MAD) of silence durations from a voice sample.

This function processes a voice sample from a given folder path, segments it using the specified window parameters, and calculates the silence durations. It then returns the median absolute deviation of these durations.

Parameters:
  • folder_path (str) – Path to the folder containing the WAV voice sample.

  • winlen (int, optional) – Length of the analysis window. Default is 512.

  • winover (int, optional) – Overlap between windows. Default is 496.

  • wintype (str, optional) – Type of windowing function (e.g., ‘hamm’ for Hamming). Default is ‘hamm’.

Returns:

Median absolute deviation of silence durations in seconds.

Return type:

float
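
As a rough illustration of the statistic itself (not of VuVoPy's internals), the median absolute deviation of a list of silence durations can be sketched as below. The helper name median_abs_deviation, the sample durations, and the choice to return 0.0 for empty input are all assumptions made for this example.

```python
from statistics import median

def median_abs_deviation(durations):
    """Median of the absolute deviations from the median (hypothetical helper)."""
    if not durations:
        return 0.0  # assumption: empty input maps to 0.0
    center = median(durations)
    return median(abs(d - center) for d in durations)

# Made-up silence durations in seconds.
silences = [0.12, 0.30, 0.25, 0.90, 0.18]
mad = median_abs_deviation(silences)
```

Because it is built on medians, the statistic is barely affected by the single long 0.90 s pause in this sample.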

VuVoPy.durmed(folder_path, winlen=512, winover=496, wintype='hamm')

Calculate the median silence duration from a voice sample.

This function processes a voice sample from a given file path, segments it using the specified window parameters, and calculates the median duration of silence segments.

Parameters:
  • folder_path (str) – The file path to the voice sample in WAV format.

  • winlen (int, optional) – The length of the analysis window. Default is 512.

  • winover (int, optional) – The overlap between consecutive windows. Default is 496.

  • wintype (str, optional) – The type of window to apply (e.g., ‘hamm’ for Hamming). Default is ‘hamm’.

Returns:

The median duration of silence segments in the voice sample.

Return type:

float
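
The sketch below shows one way per-frame silence decisions could be turned into segment durations before taking the median. It assumes a hop of winlen - winover samples and a 16 kHz sampling rate; the function name silence_durations, the fs parameter, and the label sequence are hypothetical and do not describe VuVoPy's actual implementation.

```python
from statistics import median

def silence_durations(is_silent, winlen=512, winover=496, fs=16000):
    """Durations (s) of consecutive runs of silent frames (hypothetical helper)."""
    hop = winlen - winover  # assumption: frames advance by winlen - winover samples
    durations, run = [], 0
    for flag in is_silent:
        if flag:
            run += 1
        elif run:
            durations.append(run * hop / fs)
            run = 0
    if run:  # close a run that reaches the end of the signal
        durations.append(run * hop / fs)
    return durations

# Made-up frame labels: 1 = silent frame, 0 = speech frame.
labels = [0] * 100 + [1] * 200 + [0] * 50 + [1] * 400 + [0] * 10 + [1] * 300
durs = silence_durations(labels)
median_silence = median(durs)
```

With the default window settings the hop is 16 samples, so each frame contributes 1 ms at 16 kHz.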

VuVoPy.duv(folder_path, winlen=512, winover=496, wintype='hamm')

Computes the voiced-to-unvoiced ratio (VUV) percentage for an audio sample.

This function processes an audio file located at the specified folder path, segments it using the given window parameters, and calculates the percentage of voiced frames in the sample.

Parameters:
  • folder_path (str) – The path to the folder containing the audio file in WAV format.

  • winlen (int, optional) – The length of the analysis window in samples. Default is 512.

  • winover (int, optional) – The overlap between consecutive windows in samples. Default is 496.

  • wintype (str, optional) – The type of window function to apply (e.g., ‘hamm’ for Hamming). Default is ‘hamm’.

Returns:

The percentage of voiced frames in the audio sample.

Return type:

float

Notes

  • The function assumes the presence of preprocessing, segmentation, and VUV classification utilities (pp, vs, sg, and vuvs) in the codebase.

  • The smoothing_window parameter for VUV classification is set to 5 by default.
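
To make the idea of label smoothing followed by a voiced-frame percentage concrete, here is a minimal sketch. The majority-vote rule, the label alphabet ('V', 'U', 'S'), and both function names are assumptions for illustration; the documentation only states that a smoothing window of 5 is used, not how it is applied.

```python
from collections import Counter

def smooth_labels(labels, smoothing_window=5):
    """Majority vote over a centered window (hypothetical smoothing rule)."""
    half = smoothing_window // 2
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - half): i + half + 1]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed

def voiced_percentage(labels, voiced="V"):
    """Share of frames carrying the voiced label, in percent."""
    if not labels:
        return 0.0
    return 100.0 * sum(1 for lab in labels if lab == voiced) / len(labels)

# Made-up frame labels with one isolated unvoiced frame.
raw = list("VVVUVVVSSSSS")
smoothed = smooth_labels(raw)
pct = voiced_percentage(smoothed)
```

In this toy sequence the isolated 'U' frame is absorbed by its voiced neighbours, which raises the voiced share from 50 % to about 58 %.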

VuVoPy.hnr(folder_path, winlen=512, winover=256, wintype='hann', f0_min=75, f0_max=500)

Compute Harmonics-to-Noise Ratio (HNR) using an autocorrelation-based method.

This function processes a WAV file, divides it into overlapping frames, and estimates the harmonic-to-noise ratio (HNR) for each frame using pitch period information derived from the autocorrelation method.

Parameters:
  • folder_path (str) – Path to the audio file (WAV format).

  • winlen (int, optional) – Frame length in samples. Default is 512.

  • winover (int, optional) – Overlap between consecutive frames in samples. Default is 256.

  • wintype (str, optional) – Type of window function to apply (e.g., ‘hann’, ‘hamming’). Default is ‘hann’.

  • f0_min (float, optional) – Minimum fundamental frequency in Hz. Default is 75.

  • f0_max (float, optional) – Maximum fundamental frequency in Hz. Default is 500.

Returns:

Mean HNR value across all frames.

Return type:

float
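
For readers unfamiliar with autocorrelation-based HNR, the following sketch estimates it for a single frame using the common textbook relation HNR = 10·log10(r / (1 − r)), where r is the peak of the normalized autocorrelation within the allowed pitch-lag range. The function name frame_hnr_db, the clamping constant, and the synthetic test signals are assumptions; VuVoPy's own estimator may differ in windowing and normalization.

```python
import math
import random

def frame_hnr_db(frame, fs, f0_min=75, f0_max=500):
    """HNR (dB) of one frame from its normalized autocorrelation peak (sketch)."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]
    energy = sum(v * v for v in x)
    if energy == 0:
        return None  # assumption: HNR is left undefined for a silent frame
    lag_min = int(fs / f0_max)               # shortest pitch period considered
    lag_max = min(int(fs / f0_min), n - 1)   # longest pitch period considered
    peak = max(
        sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        for lag in range(lag_min, lag_max + 1)
    )
    peak = min(max(peak, 1e-6), 1 - 1e-6)    # keep the logarithm finite
    return 10 * math.log10(peak / (1 - peak))

fs = 16000
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 200 * t / fs) for t in range(512)]
noisy = [s + rng.gauss(0, 0.5) for s in clean]
clean_hnr = frame_hnr_db(clean, fs)
noisy_hnr = frame_hnr_db(noisy, fs)
```

The clean 200 Hz tone yields a clearly higher value than the same tone with added noise. A file-level figure would then be the mean of such per-frame values.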

VuVoPy.jitterPPQ(folder_path, n_points=3, plim=(30, 500), hop_size=512, dlog2p=0.010416666666666666, dERBs=0.1, sTHR=-inf)

Calculate the Pitch Perturbation Quotient (PPQ) jitter for a given audio file.

This function computes jitter based on the fundamental frequency (F0) extracted from a voice signal. Jitter reflects cycle-to-cycle variability in F0 and is useful in analyzing vocal stability.

Parameters:
  • folder_path (str) – Path to the WAV audio file to analyze.

  • n_points (int, optional) – Number of points to average when calculating PPQ. Default is 3.

  • plim (tuple, optional) – Pitch range in Hz for F0 extraction. Default is (30, 500).

  • hop_size (int, optional) – Hop size for F0 tracking. Default is 512.

  • dlog2p (float, optional) – Log2 pitch step size. Default is 1/96.

  • dERBs (float, optional) – Frequency resolution in ERBs. Default is 0.1.

  • sTHR (float, optional) – Voicing threshold. Default is -np.inf.

Returns:

The average PPQ jitter value. Returns 0 if there are not enough F0 values.

Return type:

float
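
The textbook perturbation quotient compares each value with the mean of its n_points-wide neighbourhood and normalizes by the overall mean. The sketch below follows that definition for an odd n_points; the function name ppq and the sample F0 track are hypothetical, and whether VuVoPy operates on F0 values or on pitch periods is not stated here.

```python
def ppq(values, n_points=3):
    """Perturbation quotient over an odd n_points window (textbook form, sketch)."""
    if len(values) < n_points:
        return 0.0  # mirrors the documented "not enough F0 values" case
    half = n_points // 2
    deviations = [
        abs(values[i] - sum(values[i - half: i + half + 1]) / n_points)
        for i in range(half, len(values) - half)
    ]
    mean_value = sum(values) / len(values)
    if mean_value == 0:
        return 0.0
    return (sum(deviations) / len(deviations)) / mean_value

# Made-up F0 track in Hz, alternating by 2 Hz.
f0_track = [100.0, 102.0, 100.0, 102.0, 100.0]
jitter = ppq(f0_track, n_points=3)
```

For this alternating track the result is about 0.013, that is, roughly 1.3 % of the mean F0.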

VuVoPy.mpt(folder_path, winlen=512, winover=496, wintype='hamm')

Computes the Maximal Phonation Time (MPT) from a given audio file.

Parameters:
  • folder_path (str) – The file path to the audio sample in WAV format.

  • winlen (int, optional) – The length of the analysis window. Default is 512.

  • winover (int, optional) – The overlap between consecutive windows. Default is 496.

  • wintype (str, optional) – The type of windowing function to apply (e.g., ‘hamm’ for Hamming). Default is ‘hamm’.

Returns:

The Maximal Phonation Time (MPT) in seconds, calculated as the total duration of voiced segments.

Return type:

float

VuVoPy.ppr(folder_path, winlen=512, winover=496, wintype='hamm', min_silence_duration_ms=100)

Compute the percentage of silence in an audio file using voice activity detection.

This function loads and preprocesses an audio file, segments it using a specified windowing approach, and applies a voiced/unvoiced/silence detection algorithm to estimate the percentage of silence.

Parameters:
  • folder_path (str) – Path to the audio file (e.g., WAV format).

  • winlen (int, optional) – Window length for segmentation. Defaults to 512.

  • winover (int, optional) – Overlap between consecutive windows. Defaults to 496.

  • wintype (str, optional) – Window type (‘hann’, ‘hamm’, ‘blackman’, ‘square’). Defaults to ‘hamm’.

  • min_silence_duration_ms (int, optional) – Minimum duration of silence to count, in milliseconds. Defaults to 100.

Returns:

Percentage of silence in the audio signal.

Return type:

float

Notes

  • Ensure the input file is in a compatible format (e.g., mono WAV).

  • The silence detection accuracy depends on the quality of preprocessing and the VAD algorithm.
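
The role of min_silence_duration_ms can be illustrated with a small sketch in which only silent runs at least that long count towards the percentage. The function name silence_percentage, the hop and fs parameters, and the frame labels are assumptions for this example rather than a description of VuVoPy's detector.

```python
from itertools import groupby

def silence_percentage(is_silent, hop=16, fs=16000, min_silence_duration_ms=100):
    """Percent of frames lying in silent runs of at least the minimum duration."""
    if not is_silent:
        return 0.0
    frame_ms = 1000.0 * hop / fs  # time advanced per frame, in milliseconds
    counted = 0
    for flag, run in groupby(is_silent):
        n_frames = len(list(run))
        if flag and n_frames * frame_ms >= min_silence_duration_ms:
            counted += n_frames
    return 100.0 * counted / len(is_silent)

# Made-up labels: a 50 ms pause (ignored) and a 150 ms pause (counted).
labels = [True] * 50 + [False] * 100 + [True] * 150 + [False] * 100
pct_silence = silence_percentage(labels)
```

Here the 50 ms pause falls below the 100 ms threshold, so only the 150 ms pause contributes and the result is 37.5 %.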

VuVoPy.relF0SD(folder_path, plim=(30, 500), hop_size=512, dlog2p=0.010416666666666666, dERBs=0.1, sTHR=-inf)

Calculate the relative standard deviation of the fundamental frequency (F0).

This function computes the relative standard deviation (standard deviation divided by mean) of the fundamental frequency (F0) extracted from an audio file.

Parameters:
  • folder_path (str) – Path to the audio file.

  • plim (tuple, optional) – Tuple (min_freq, max_freq) specifying pitch range in Hz. Default is (30, 500).

  • hop_size (int, optional) – Time step for analysis in samples. Default is 512.

  • dlog2p (float, optional) – Resolution of pitch candidates in log2 space. Default is 1/96.

  • dERBs (float, optional) – Frequency resolution in ERBs. Default is 0.1.

  • sTHR (float, optional) – Pitch strength threshold. Default is -np.inf.

Returns:

Relative standard deviation of the fundamental frequency (std/mean).

Return type:

float

Notes

  • Requires vs.from_wav and f0 functions from VuVoPy.

  • The input file must be supported by vs.from_wav (e.g., WAV format).
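
The std/mean ratio described above can be sketched in a few lines. This version assumes a population standard deviation and assumes that unvoiced frames appear as non-positive values that should be ignored; both choices, as well as the name relative_sd and the F0 values, are assumptions and may not match VuVoPy's behaviour.

```python
from statistics import mean, pstdev

def relative_sd(values):
    """Population standard deviation divided by the mean (sketch)."""
    voiced = [v for v in values if v > 0]  # assumption: 0 marks an unvoiced frame
    if not voiced:
        return 0.0
    return pstdev(voiced) / mean(voiced)

# Made-up F0 track in Hz with one unvoiced frame.
f0_track = [100.0, 110.0, 90.0, 0.0, 100.0]
rel_sd = relative_sd(f0_track)
```

Dividing by the mean makes the measure comparable between speakers with different average pitch.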

VuVoPy.relF1SD(folder_path, winlen=512, winover=256, wintype='hann')

Compute the relative standard deviation of the first formant frequency (F1).

This function segments a WAV audio sample, extracts the first formant (F1), and returns its relative standard deviation (std / mean).

Parameters:
  • folder_path (str) – Path to the voice sample (WAV).

  • winlen (int, optional) – Length of the analysis window in samples. Default is 512.

  • winover (int, optional) – Overlap between consecutive windows in samples. Default is 256.

  • wintype (str, optional) – Type of window function to use. Default is ‘hann’.

Returns:

Relative standard deviation of the F1 frequency.

Return type:

float

VuVoPy.relF2SD(folder_path, winlen=512, winover=256, wintype='hann')

Compute the relative standard deviation of the second formant frequency (F2).

This function processes a WAV voice sample, segments it using a windowing approach, extracts F2 formants, and calculates the relative standard deviation (std/mean) of the second formant frequency.

Parameters:
  • folder_path (str) – Path to the WAV file.

  • winlen (int, optional) – Window length in samples. Default is 512.

  • winover (int, optional) – Overlap between windows in samples. Default is 256.

  • wintype (str, optional) – Type of window function. Default is ‘hann’.

Returns:

Relative standard deviation of F2 (std/mean).

Return type:

float

Notes

  • Requires modules vs, pp, sg, ff for preprocessing and formant extraction.

  • Input file must contain a valid speech sample for accurate results.

VuVoPy.relSEOSD(folder_path, winlen=512, winover=496, wintype='hamm')

Computes the relative standard deviation of the root mean square (RMS) energy contour of a voice sample after silence removal.

Parameters:
  • folder_path (str) – Path to the folder containing the voice sample in WAV format.

  • winlen (int, optional) – Length of the analysis window in samples. Default is 512.

  • winover (int, optional) – Overlap between consecutive windows in samples. Default is 496.

  • wintype (str, optional) – Type of window function to apply (e.g., ‘hamm’ for Hamming window). Default is ‘hamm’.

Returns:

The relative standard deviation (standard deviation divided by the mean) of the RMS energy contour. Returns 0 if the mean RMS energy is zero.

Return type:

float
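
A minimal sketch of an RMS energy contour and its relative standard deviation, including the zero-mean guard mentioned above, might look as follows. It assumes a hop of winlen - winover samples, applies no window function, and skips silence removal; the function names and test signals are hypothetical.

```python
import math
from statistics import mean, pstdev

def rms_contour(signal, winlen=512, winover=496):
    """RMS energy of each frame; frames advance by winlen - winover samples."""
    hop = winlen - winover
    return [
        math.sqrt(sum(s * s for s in signal[i:i + winlen]) / winlen)
        for i in range(0, len(signal) - winlen + 1, hop)
    ]

def contour_relative_sd(contour):
    """std / mean of the contour, or 0.0 when the mean energy is zero."""
    m = mean(contour) if contour else 0.0
    return pstdev(contour) / m if m else 0.0

steady = [1.0, -1.0] * 512                 # constant-energy signal
fading = [s * (1 - i / 1024) for i, s in enumerate(steady)]  # linear fade-out
steady_rel = contour_relative_sd(rms_contour(steady))
fading_rel = contour_relative_sd(rms_contour(fading))
silent_rel = contour_relative_sd(rms_contour([0.0] * 1024))
```

The steady signal gives 0, the fading one a positive value, and the all-zero signal exercises the guard instead of dividing by zero.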

VuVoPy.shimmerAPQ(folder_path, n_points=5, plim=(30, 500), sTHR=0.5, winlen=512, winover=496, wintype='hamm')

Calculate shimmer APQ-N: amplitude perturbation quotient over an N-point window.

This function estimates shimmer by analyzing cycle-to-cycle amplitude variations in voiced frames of an audio signal. The average absolute difference between local peak amplitudes is computed over a moving window of size n_points.

Parameters:
  • folder_path (str) – Path to the .wav file.

  • n_points (int, optional) – Number of points in the local averaging window (e.g., 3 for APQ3, 5 for APQ5). Default is 5.

  • plim (tuple, optional) – F0 pitch range in Hz. Default is (30, 500).

  • sTHR (float, optional) – Voicing threshold for F0 tracking. Default is 0.5.

  • winlen (int, optional) – Window length for segmentation. Default is 512.

  • winover (int, optional) – Window overlap for segmentation. Default is 496.

  • wintype (str, optional) – Window type for segmentation. Default is ‘hamm’.

Returns:

Shimmer APQ-N value.

Return type:

float
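
To show where the amplitudes in APQ-N come from, the sketch below takes one peak amplitude per pitch cycle and then applies the textbook quotient over an n_points window. It assumes a fixed, known cycle length in samples, which real speech does not have; the function names cycle_peaks and apq and the synthetic signal are hypothetical and do not describe how VuVoPy locates peaks.

```python
def cycle_peaks(signal, period):
    """Peak absolute amplitude of each consecutive cycle of `period` samples."""
    return [
        max(abs(s) for s in signal[i:i + period])
        for i in range(0, len(signal) - period + 1, period)
    ]

def apq(amplitudes, n_points=5):
    """Amplitude perturbation quotient over an odd n_points window (sketch)."""
    if len(amplitudes) < n_points:
        return 0.0  # assumption: too few cycles yields 0.0
    half = n_points // 2
    deviations = [
        abs(amplitudes[i] - sum(amplitudes[i - half: i + half + 1]) / n_points)
        for i in range(half, len(amplitudes) - half)
    ]
    mean_amp = sum(amplitudes) / len(amplitudes)
    return (sum(deviations) / len(deviations)) / mean_amp if mean_amp else 0.0

# Synthetic signal: seven 4-sample cycles whose peaks alternate 1.0 / 1.1.
signal = []
for a in [1.0, 1.1, 1.0, 1.1, 1.0, 1.1, 1.0]:
    signal += [0.0, a, 0.0, -a]
peaks = cycle_peaks(signal, period=4)
shimmer = apq(peaks, n_points=5)
```

For these alternating peaks the quotient is about 0.038, i.e. the local deviation is close to 4 % of the mean amplitude.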

VuVoPy.spir(folder_path, winlen=512, winover=496, wintype='hamm')

Calculate the percentage of silence in an audio signal using a windowing approach.

This function segments the signal using a specified window type and size, detects silent segments, and returns the ratio of silence duration to the total duration as a percentage.

Parameters:
  • folder_path (str) – Path to the WAV audio file.

  • winlen (int, optional) – Window length for segmentation. Default is 512.

  • winover (int, optional) – Overlap between consecutive windows. Default is 496.

  • wintype (str, optional) – Type of window function to use. Options are: ‘hann’, ‘hamm’, ‘blackman’, ‘square’. Default is ‘hamm’.

Returns:

Percentage of silence in the signal.

Return type:

float
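
Since every function in this module takes a path as its first argument and returns a float, several measures can be collected in one pass. The helper below is a hypothetical convenience wrapper, not part of VuVoPy; it is demonstrated with stand-in callables so it runs without the package, and "sample.wav" is a placeholder path.

```python
def extract_features(wav_path, extractors):
    """Call each extractor on wav_path and collect the results by name."""
    return {name: float(fn(wav_path)) for name, fn in extractors.items()}

# Stand-in extractors so the sketch runs without VuVoPy installed.
stub_extractors = {
    "constant": lambda path: 1,
    "path_length": lambda path: len(path),
}
features = extract_features("sample.wav", stub_extractors)

# With VuVoPy installed, the documented functions could be passed instead:
# import VuVoPy
# features = extract_features(
#     "sample.wav",
#     {"durmed": VuVoPy.durmed, "spir": VuVoPy.spir, "hnr": VuVoPy.hnr},
# )
```

Passing the extractors as a mapping keeps the feature names explicit and makes it easy to bind non-default window settings with functools.partial.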