VuVoPy.data.containers package

Submodules

VuVoPy.data.containers.prepocessing module

class VuVoPy.data.containers.prepocessing.Preprocessed(x, fs, xnorm, preem, alpha=0.94)[source]

Bases: VoiceSample

The Preprocessed class represents a preprocessed version of a voice sample, extending the VoiceSample class. It includes functionality for normalization and pre-emphasis of the waveform.

Attributes:

x

The original waveform of the voice sample.

Type:

numpy.ndarray

fs

The sampling rate of the voice sample.

Type:

int

xnorm

The normalized waveform. Defaults to the original waveform if not provided.

Type:

numpy.ndarray

preem

The pre-emphasized waveform. Defaults to the original waveform if not provided.

Type:

numpy.ndarray

alpha

The pre-emphasis coefficient. Defaults to 0.94.

Type:

float

Methods:

from_voice_sample(voice_sample, alpha=0.94)[source]

Creates a Preprocessed object from a VoiceSample object by applying normalization and pre-emphasis.

get_preemphasis(alpha=None)[source]

Returns the pre-emphasized waveform as a NumPy array. If an alpha value is provided, it applies pre-emphasis with the given coefficient.

get_normalization()[source]

Returns the normalized waveform as a NumPy array.

get_waveform()[source]

Returns the original waveform as a NumPy array.

get_sampling_rate()[source]

Returns the sampling rate of the voice sample.

classmethod from_voice_sample(voice_sample, alpha=0.94)[source]

Apply normalization and pre-emphasis to a VoiceSample and return a Preprocessed object.
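The preprocessing step can be sketched as follows. This is a minimal sketch under stated assumptions, not VuVoPy's code: it assumes peak normalization to [-1, 1] and pre-emphasis applied to the original (not the normalized) waveform, since this page does not say which normalization is used or which signal is pre-emphasized. `preprocess` and `PreprocessedSketch` are hypothetical names that only mirror the documented attributes.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PreprocessedSketch:
    """Hypothetical stand-in for the fields documented on Preprocessed."""
    x: np.ndarray      # original waveform
    fs: int            # sampling rate in Hz
    xnorm: np.ndarray  # normalized waveform
    preem: np.ndarray  # pre-emphasized waveform
    alpha: float = 0.94


def preprocess(x: np.ndarray, fs: int, alpha: float = 0.94) -> PreprocessedSketch:
    """Hypothetical helper: normalize and pre-emphasize a waveform."""
    x = np.asarray(x, dtype=float)
    # Peak normalization (assumption): scale so that max |x| == 1.
    peak = np.max(np.abs(x)) if x.size else 0.0
    xnorm = x / peak if peak > 0 else x.copy()
    # First-order pre-emphasis y[n] = x[n] - alpha * x[n-1]; first sample kept as-is.
    preem = np.concatenate((x[:1], x[1:] - alpha * x[:-1]))
    return PreprocessedSketch(x=x, fs=fs, xnorm=xnorm, preem=preem, alpha=alpha)


sample = preprocess(np.array([0.0, 0.5, -2.0, 1.0]), fs=16000)
```

Keeping all three versions of the signal on one object, as the Preprocessed attributes do, lets later stages (such as segmentation) pick whichever version they need without recomputing it.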

get_normalization()[source]

Return the normalized waveform as a NumPy array.

get_preemphasis(alpha=None)[source]

Return the waveform with applied pre-emphasis as a NumPy array.
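Pre-emphasis is conventionally the first-order filter y[n] = x[n] - alpha * x[n-1], which boosts high frequencies relative to low ones. A minimal NumPy sketch, assuming VuVoPy uses this standard form and passes the first sample through unchanged (neither detail is confirmed on this page); `preemphasize` is a hypothetical helper illustrating what passing a custom `alpha` to `get_preemphasis` would compute.

```python
import numpy as np


def preemphasize(x, alpha: float = 0.94) -> np.ndarray:
    """Hypothetical helper: apply y[n] = x[n] - alpha * x[n-1].

    The first sample has no predecessor and is kept unchanged (assumption).
    """
    x = np.asarray(x, dtype=float)
    return np.concatenate((x[:1], x[1:] - alpha * x[:-1]))


# A constant (DC) signal is strongly attenuated after the first sample.
y = preemphasize(np.ones(4), alpha=0.94)
```

With `alpha=0` the filter is the identity, and values of `alpha` close to 1 suppress slowly varying components most strongly.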

get_sampling_rate()[source]

Return the sampling rate.

get_waveform()[source]

Return the waveform as a NumPy array.

VuVoPy.data.containers.sample module

class VuVoPy.data.containers.sample.VoiceSample(x: ndarray, fs: int)[source]

Bases: object

Class that holds an audio sample: its waveform x and sampling rate fs, with a class method for loading the sample from a WAV file.

classmethod from_wav(file_path: str, sr: int = None)[source]

Load a WAV file and return a VoiceSample instance.
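Loading a WAV file amounts to reading the sample frames and the sampling rate. The following is a minimal sketch using only the standard-library `wave` module and NumPy, not VuVoPy's implementation: it assumes 16-bit PCM input, averages multiple channels to mono, and ignores the `sr` argument because this page does not say whether `from_wav` resamples. `read_wav_pcm16` is a hypothetical helper.

```python
import os
import tempfile
import wave

import numpy as np


def read_wav_pcm16(file_path: str) -> tuple[np.ndarray, int]:
    """Hypothetical helper: read a 16-bit PCM WAV file as (waveform in [-1, 1), fs)."""
    with wave.open(file_path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        fs = wf.getframerate()
        n_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    # WAV samples are little-endian int16; scale them to floats.
    x = np.frombuffer(raw, dtype="<i2").astype(float) / 32768.0
    if n_channels > 1:
        # Frames are interleaved by channel; average the channels to get mono.
        x = x.reshape(-1, n_channels).mean(axis=1)
    return x, fs


# Usage: write a three-sample mono file at 8 kHz, then read it back.
path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(np.array([0, 16384, -16384], dtype="<i2").tobytes())
x, fs = read_wav_pcm16(path)
```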

get_sampling_rate()[source]

Return the sampling rate.

get_waveform()[source]

Return the waveform as a NumPy array.

VuVoPy.data.containers.segmentation module

class VuVoPy.data.containers.segmentation.Segmented(x, fs, xnorm, preem, xsegment, winlen, wintype, winover, alpha=0.94)[source]

Bases: Preprocessed

A class for segmenting and preprocessing audio signals. The Segmented class extends the Preprocessed class and provides functionality for segmenting audio signals into overlapping frames, applying window functions, and storing the segmented data in multiple forms (original, pre-emphasized, and normalized).

Attributes:

x

The original waveform.

Type:

numpy.ndarray

fs

The sampling rate of the audio signal.

Type:

int

xnorm

The normalized waveform.

Type:

numpy.ndarray

preem

The pre-emphasized waveform.

Type:

numpy.ndarray

xsegment

A 3D array containing segmented data for the original, pre-emphasized, and normalized waveforms.

Type:

numpy.ndarray

winlen

The length of the window used for segmentation, in samples.

Type:

int

wintype

The type of window function applied (e.g., “hann”, “hamming”).

Type:

str

winover

The overlap between consecutive windows, in samples.

Type:

int

alpha

The pre-emphasis coefficient (default is 0.94).

Type:

float

Methods:

from_voice_sample(voice_sample, winlen, wintype, winover, alpha=0.94)

Class method to create a Segmented instance from a voice sample object.

get_segment()

Returns the segmented original waveform as a NumPy array.

get_preem_segment()

Returns the segmented pre-emphasized waveform as a NumPy array.

get_norm_segment()

Returns the segmented normalized waveform as a NumPy array.

get_sampling_rate()

Returns the sampling rate of the audio signal.

get_window_type()

Returns the type of window function applied.

get_window_length()

Returns the length of the window used for segmentation.

get_window_overlap()

Returns the overlap between consecutive windows.

classmethod from_voice_sample(voice_sample, winlen, wintype, winover, alpha=0.94)[source]

Creates a Segmented object from a voice sample. This method segments the voice sample into overlapping frames, applies pre-emphasis and normalization, and multiplies each frame by the specified window function.

Parameters:
  • voice_sample – The input voice sample object, which provides the waveform, sampling rate, pre-emphasis, and normalization methods.

  • winlen (int) – The length of the window (in samples) to be applied to each frame.

  • wintype (str) – The type of window to apply. Supported types are: “hann”, “blackman”, “hamm”, “square”. Defaults to “hamming” if unspecified.

  • winover (int) – The overlap (in samples) between consecutive frames.

  • alpha (float, optional) – The pre-emphasis coefficient. Defaults to 0.94.

Returns:

An instance of the Segmented class containing the segmented waveform, sampling rate, normalized waveform, pre-emphasized waveform, and other parameters.

Return type:

Segmented

Raises:

ValueError – If an unsupported window type is specified.

Notes

  • The input waveform is padded with zeros if its length is not a multiple of the window length.

  • The segmentation process generates three versions of the signal: original, pre-emphasized, and normalized, each of which is windowed and stored in the output.
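The segmentation described above can be sketched in NumPy as follows. This is a sketch under stated assumptions, not VuVoPy's code: the hop between frames is taken as `winlen - winover`, the tail is zero-padded so that the last hop-spaced frame is complete, the documented window names are mapped to `np.hanning`, `np.blackman`, `np.hamming` and a rectangular `np.ones` window, and the three signal versions are stacked along the first axis in the documented order (original, pre-emphasized, normalized). `WINDOWS`, `frame_signal` and `segment` are hypothetical names.

```python
import numpy as np

# Hypothetical mapping from the documented window names to NumPy window functions.
WINDOWS = {
    "hann": np.hanning,
    "blackman": np.blackman,
    "hamm": np.hamming,
    "square": np.ones,
}


def frame_signal(x, winlen: int, winover: int, window: np.ndarray) -> np.ndarray:
    """Split x into overlapping frames of length winlen and apply the window."""
    hop = winlen - winover
    if hop <= 0:
        raise ValueError("winover must be smaller than winlen")
    x = np.asarray(x, dtype=float)
    # Frames needed to cover x; zero-pad the tail so the last frame is full.
    n_frames = max(1, int(np.ceil((len(x) - winlen) / hop)) + 1)
    padded_len = (n_frames - 1) * hop + winlen
    x = np.pad(x, (0, padded_len - len(x)))
    # Row i holds samples [i * hop, i * hop + winlen).
    idx = np.arange(winlen)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * window


def segment(x, preem, xnorm, winlen: int, wintype: str, winover: int) -> np.ndarray:
    """Return shape (3, n_frames, winlen): original, pre-emphasized, normalized."""
    if wintype not in WINDOWS:
        raise ValueError(f"unsupported window type: {wintype!r}")
    window = WINDOWS[wintype](winlen)
    return np.stack([frame_signal(s, winlen, winover, window) for s in (x, preem, xnorm)])


x = np.arange(10, dtype=float)
segs = segment(x, x, x, winlen=4, wintype="square", winover=2)
```

Building an index matrix and gathering `x[idx]` produces all frames in one vectorized step; the resulting 3D array has the same layout as the xsegment attribute described above.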

get_norm_segment()[source]

Return the segmented normalized waveform as a NumPy array.

get_preem_segment()[source]

Return the segmented pre-emphasized waveform as a NumPy array.

get_sampling_rate()[source]

Return the sampling rate.

get_segment()[source]

Return the segmented original waveform as a NumPy array.

get_window_length()[source]

Return the window length.

get_window_overlap()[source]

Return the window overlap.

get_window_type()[source]

Return the window type.

VuVoPy.data.containers.test_voicedsample module

VuVoPy.data.containers.voiced_sample module

class VuVoPy.data.containers.voiced_sample.VoicedSample(preprocessed, vuvs, fs)[source]

Bases: VoiceSample

VoicedSample is a class that processes and analyzes preprocessed audio data to extract voiced samples and remove silence from the waveform. It also provides functionality to stretch labels to match the signal length.

Attributes:

x

The original waveform extracted from the preprocessed data.

Type:

numpy.ndarray

x_preem

The pre-emphasized version of the waveform.

Type:

numpy.ndarray

x_norm

The normalized version of the waveform.

Type:

numpy.ndarray

fs

The sampling rate of the audio signal.

Type:

int

vuvs

An object containing voiced/unvoiced labels for the audio signal.

Type:

object

voiced_sample

The waveform containing only voiced segments.

Type:

numpy.ndarray

silence_removed_sample

The waveform with silence removed.

Type:

numpy.ndarray

Methods:

get_waveform()[source]

Returns the silence-removed waveform as a NumPy array.

label_stretch()[source]

Stretches the voiced/unvoiced labels to match the length of the audio signal.

get_voiced_sample()[source]

Extracts and returns the voiced segments of the waveform based on the stretched labels.

get_silence_remove_sample()[source]

Removes silence from the waveform based on the stretched labels and returns the resulting waveform.

get_sampling_rate()[source]

Returns the sampling rate of the audio signal.

get_sampling_rate()[source]

Return the sampling rate.

get_silence_remove_sample()[source]

Removes segments of silence from the audio sample based on the provided labels. This method identifies silent regions in the audio sample self.x using the labels generated by the label_stretch method. Silent regions are defined as runs of consecutive frames labeled 0 that last at least 50 ms. These regions are then removed from the audio sample.

Returns:

A modified version of the audio sample self.x with silent regions removed.

Return type:

numpy.ndarray
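The silence-removal rule can be sketched as follows, assuming the labels have already been stretched to one label per sample and that label 0 marks silence. Runs of 0 lasting at least 50 ms (at least round(0.05 * fs) samples) are dropped, and shorter pauses are kept. `remove_silence` is a hypothetical helper, not VuVoPy's implementation.

```python
import numpy as np


def remove_silence(x, labels, fs: int, min_dur: float = 0.05) -> np.ndarray:
    """Hypothetical helper: drop runs of label 0 lasting at least min_dur seconds."""
    x = np.asarray(x)
    labels = np.asarray(labels)
    if x.shape[0] != labels.shape[0]:
        raise ValueError("labels must have one entry per sample")
    min_len = int(round(min_dur * fs))
    keep = np.ones(len(x), dtype=bool)
    silent = (labels == 0).astype(int)
    # +1 marks the start of a silent run, -1 the (exclusive) end.
    edges = np.diff(np.concatenate(([0], silent, [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    for s, e in zip(starts, ends):
        if e - s >= min_len:
            keep[s:e] = False
    return x[keep]


fs = 100  # at this rate, 50 ms corresponds to 5 samples
labels = np.array([2, 2, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1])
x = np.arange(len(labels))
y = remove_silence(x, labels, fs)
```

Here the six-sample silent run is removed, while the two-sample pause near the end is shorter than 50 ms and is kept.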

get_voiced_sample()[source]

Extracts and returns the voiced portion of the audio sample. This method uses the label information to identify the voiced segments in the audio sample. It assumes that the labels are generated such that a label value of 2 corresponds to voiced segments.

Returns:

A subset of the audio sample containing only the voiced segments.

Return type:

numpy.ndarray
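Given per-sample labels, selecting the voiced portion reduces to a boolean mask on label value 2, as described above. A minimal sketch, assuming the labels have already been stretched to the length of the waveform; `extract_voiced` is a hypothetical helper.

```python
import numpy as np


def extract_voiced(x, labels) -> np.ndarray:
    """Hypothetical helper: keep only samples whose label equals 2 (voiced)."""
    x = np.asarray(x)
    labels = np.asarray(labels)
    if x.shape[0] != labels.shape[0]:
        raise ValueError("labels must have one entry per sample")
    return x[labels == 2]


voiced = extract_voiced(np.array([0.1, 0.2, 0.3, 0.4, 0.5]), np.array([0, 2, 2, 1, 2]))
```

Note that the mask concatenates all voiced samples, so separate voiced stretches end up adjacent in the result.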

get_waveform()[source]

Return the silence-removed waveform as a NumPy array.

label_stretch()[source]

Stretches or compresses a sequence of labels to match the length of a target array. This method takes a sequence of labels and adjusts their lengths proportionally to match the length of the target array self.x. It preserves the relative proportions of the original label segments while correcting any rounding errors so that the result exactly matches the target length.

Returns:

A stretched or compressed array of labels with the same length as self.x.

Return type:

np.ndarray
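Proportional label stretching can be sketched by run-length encoding the labels, scaling the cumulative run boundaries to the target length, and rounding. A minimal sketch, assuming the input is a 1-D label sequence (for example, one label per analysis frame); the last boundary is pinned to the target length to absorb rounding error. `stretch_labels` is a hypothetical helper, and VuVoPy's own rounding fix may differ.

```python
import numpy as np


def stretch_labels(labels, target_len: int) -> np.ndarray:
    """Hypothetical helper: resize labels to target_len, preserving run proportions."""
    labels = np.asarray(labels)
    if labels.size == 0 or target_len <= 0:
        return np.zeros(max(target_len, 0), dtype=labels.dtype)
    # Run-length encode: values[k] is repeated lengths[k] times in the input.
    change = np.flatnonzero(np.diff(labels)) + 1
    starts = np.concatenate(([0], change))
    lengths = np.diff(np.concatenate((starts, [labels.size])))
    values = labels[starts]
    # Scale cumulative run boundaries to the target length and round them.
    bounds = np.rint(np.cumsum(lengths) * target_len / labels.size).astype(int)
    bounds[-1] = target_len  # absorb any remaining rounding error
    new_lengths = np.diff(np.concatenate(([0], bounds)))
    return np.repeat(values, new_lengths)


stretched = stretch_labels([0, 0, 2, 2, 2, 1], 12)
```

Rounding cumulative boundaries rather than each run length separately keeps the per-run errors from accumulating, so every run stays within one sample of its exact proportional length.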

Module contents