VuVoPy.data.containers package
Submodules
VuVoPy.data.containers.prepocessing module
- class VuVoPy.data.containers.prepocessing.Preprocessed(x, fs, xnorm, preem, alpha=0.94)[source]
Bases:
VoiceSample
The Preprocessed class represents a preprocessed version of a voice sample, extending the VoiceSample class. It includes functionality for normalization and pre-emphasis of the waveform.
- x
The original waveform of the voice sample.
- Type:
numpy.ndarray
- fs
The sampling rate of the voice sample.
- Type:
int
- xnorm
The normalized waveform. Defaults to the original waveform if not provided.
- Type:
numpy.ndarray
- preem
The pre-emphasized waveform. Defaults to the original waveform if not provided.
- Type:
numpy.ndarray
- alpha
The pre-emphasis coefficient. Defaults to 0.94.
- Type:
float
- from_voice_sample(cls, voice_sample, alpha=0.94)[source]
Creates a Preprocessed object from a VoiceSample object by applying normalization and pre-emphasis.
- get_preemphasis(alpha=None)[source]
Returns the pre-emphasized waveform as a NumPy array. If an alpha value is provided, it applies pre-emphasis with the given coefficient.
- classmethod from_voice_sample(voice_sample, alpha=0.94)[source]
Apply normalization and pre-emphasis to a VoiceSample and return a Preprocessed object.
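The two preprocessing steps named above can be illustrated with a minimal sketch. This is not VuVoPy's implementation: it assumes peak normalization and the common first-order pre-emphasis filter y[n] = x[n] - alpha * x[n-1], and the helper names peak_normalize and pre_emphasize are hypothetical.

```python
import numpy as np

def peak_normalize(x):
    """Scale x so its largest absolute value is 1 (assumed normalization)."""
    x = np.asarray(x, dtype=float)
    peak = np.max(np.abs(x)) if x.size else 0.0
    # An all-zero or empty signal is returned unchanged to avoid dividing by zero.
    return x / peak if peak > 0 else x.copy()

def pre_emphasize(x, alpha=0.94):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1], with y[0] = x[0]."""
    x = np.asarray(x, dtype=float)
    if x.size == 0:
        return x.copy()
    return np.append(x[0], x[1:] - alpha * x[:-1])

# Hypothetical usage on a synthetic 100 Hz tone sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 100 * t)
xnorm = peak_normalize(x)
preem = pre_emphasize(x, alpha=0.94)
```

A larger alpha attenuates low frequencies more strongly; the default of 0.94 here simply mirrors the documented default.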
VuVoPy.data.containers.sample module
VuVoPy.data.containers.segmentation module
- class VuVoPy.data.containers.segmentation.Segmented(x, fs, xnorm, preem, xsegment, winlen, wintype, winover, alpha=0.94)[source]
Bases:
Preprocessed
A class for segmenting and preprocessing audio signals. The Segmented class extends the Preprocessed class and provides functionality for segmenting audio signals into overlapping frames, applying window functions, and storing the segmented data in multiple forms (original, pre-emphasized, and normalized).
- x
The original waveform.
- Type:
numpy.ndarray
- fs
The sampling rate of the audio signal.
- Type:
int
- xnorm
The normalized waveform.
- Type:
numpy.ndarray
- preem
The pre-emphasized waveform.
- Type:
numpy.ndarray
- xsegment
A 3D array containing segmented data for the original, pre-emphasized, and normalized waveforms.
- Type:
numpy.ndarray
- winlen
The length of the window used for segmentation.
- Type:
int
- wintype
The type of window function applied (e.g., “hann”, “hamming”).
- Type:
str
- winover
The overlap between consecutive windows.
- Type:
int
- alpha
The pre-emphasis coefficient (default is 0.94).
- Type:
float
- from_voice_sample(voice_sample, winlen, wintype, winover, alpha=0.94)
Class method to create a Segmented instance from a voice sample object.
- get_segment()
Returns the segmented original waveform as a NumPy array.
- get_preem_segment()
Returns the segmented pre-emphasized waveform as a NumPy array.
- get_norm_segment()
Returns the segmented normalized waveform as a NumPy array.
- get_sampling_rate()
Returns the sampling rate of the audio signal.
- get_window_type()
Returns the type of window function applied.
- get_window_length()
Returns the length of the window used for segmentation.
- get_window_overlap()
Returns the overlap between consecutive windows.
- classmethod from_voice_sample(voice_sample, winlen, wintype, winover, alpha=0.94)[source]
Creates a Segmented object from a voice sample. This method segments the voice sample into overlapping frames and applies pre-emphasis, normalization, and the specified window function.
- Parameters:
voice_sample – The input voice sample object providing the waveform, sampling rate, and pre-emphasis and normalization methods.
winlen (int) – The length of the window (in samples) to be applied to each frame.
wintype (str) – The type of window to apply. Supported types are: “hann”, “blackman”, “hamm”, “square”. Defaults to “hamming” if unspecified.
winover (int) – The overlap (in samples) between consecutive frames.
alpha (float, optional) – The pre-emphasis coefficient. Defaults to 0.94.
- Returns:
An instance of the Segmented class containing the segmented waveform, sampling rate, normalized waveform, pre-emphasized waveform, and other parameters.
- Return type:
Segmented
- Raises:
ValueError – If an unsupported window type is specified.
Notes
The input waveform is padded with zeros if its length is not a multiple of the window length.
The segmentation process generates three versions of the signal: original, pre-emphasized, and normalized, each of which is windowed and stored in the output.
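The framing step described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the library's code: the hop size is taken to be winlen - winover, the tail is zero-padded so the last frame is complete, frames are stored as rows, and the mapping from the documented window names to NumPy window functions is a guess. The helper frame_signal is hypothetical.

```python
import numpy as np

# Window names as listed in the documentation; the NumPy function chosen
# for each name is an assumption of this sketch.
WINDOWS = {
    "hann": np.hanning,
    "hamm": np.hamming,
    "blackman": np.blackman,
    "square": np.ones,
}

def frame_signal(x, winlen, wintype, winover):
    """Split x into overlapping, windowed frames of shape (n_frames, winlen)."""
    if wintype not in WINDOWS:
        raise ValueError(f"Unsupported window type: {wintype!r}")
    if not 0 <= winover < winlen:
        raise ValueError("winover must satisfy 0 <= winover < winlen")
    x = np.asarray(x, dtype=float)
    hop = winlen - winover
    # Number of frames needed to cover the signal; at least one frame.
    n_frames = max(1, int(np.ceil(max(len(x) - winlen, 0) / hop)) + 1)
    padded_len = (n_frames - 1) * hop + winlen
    padded = np.pad(x, (0, padded_len - len(x)))  # zero-pad the tail
    # Index matrix: row i selects samples [i * hop, i * hop + winlen).
    idx = np.arange(winlen)[None, :] + hop * np.arange(n_frames)[:, None]
    return padded[idx] * WINDOWS[wintype](winlen)

# Hypothetical usage: 25 ms windows with 10 ms overlap at 16 kHz.
fs = 16000
signal = np.random.default_rng(0).standard_normal(fs)
frames = frame_signal(signal, winlen=400, wintype="hann", winover=160)
```

Applying the same function to the original, pre-emphasized, and normalized waveforms and stacking the results (for example with np.stack) would give a 3D array comparable to the xsegment attribute, although the actual axis order used by the class is not stated here.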
VuVoPy.data.containers.test_voicedsample module
VuVoPy.data.containers.voiced_sample module
- class VuVoPy.data.containers.voiced_sample.VoicedSample(preprocessed, vuvs, fs)[source]
Bases:
VoiceSample
VoicedSample is a class that processes and analyzes preprocessed audio data to extract voiced samples and remove silence from the waveform. It also provides functionality to stretch labels to match the signal length.
- x
The original waveform extracted from the preprocessed data.
- Type:
numpy.ndarray
- x_preem
The pre-emphasized version of the waveform.
- Type:
numpy.ndarray
- x_norm
The normalized version of the waveform.
- Type:
numpy.ndarray
- fs
The sampling rate of the audio signal.
- Type:
int
- vuvs
An object containing voiced/unvoiced labels for the audio signal.
- Type:
object
- voiced_sample
The waveform containing only voiced segments.
- Type:
numpy.ndarray
- silence_removed_sample
The waveform with silence removed.
- Type:
numpy.ndarray
- label_stretch()[source]
Stretches the voiced/unvoiced labels to match the length of the audio signal.
- get_voiced_sample()[source]
Extracts and returns the voiced segments of the waveform based on the stretched labels.
- get_silence_remove_sample()[source]
Removes silence from the waveform based on the stretched labels and returns the resulting waveform.
- get_silence_remove_sample()[source]
Removes segments of silence from the audio sample based on the provided labels. This method identifies silent regions in the audio sample self.x using the labels generated by the label_stretch method. Silent regions are defined as consecutive frames labeled as 0, with a duration greater than or equal to 50 ms. These regions are then removed from the audio sample.
- Returns:
A modified version of the audio sample self.x with silent regions removed.
- Return type:
numpy.ndarray
- get_voiced_sample()[source]
Extracts and returns the voiced portion of the audio sample. This method uses the label information to identify the voiced segments in the audio sample. It assumes that the labels are generated such that a label value of 2 corresponds to voiced segments.
- Returns:
A subset of the audio sample containing only the voiced segments.
- Return type:
numpy.ndarray
- label_stretch()[source]
Stretches or compresses a sequence of labels to match the length of a target array. This function takes a sequence of labels and adjusts their lengths proportionally to match the length of the target array self.x. It ensures that the relative proportions of the original label segments are preserved while fixing any rounding errors to exactly match the target length.
- Returns:
A stretched or compressed array of labels with the same length as self.x.
- Return type:
np.ndarray
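One way to stretch labels proportionally while landing exactly on the target length is to scale the cumulative segment boundaries rather than the individual segment lengths. The following is a sketch of that idea, not necessarily how label_stretch resolves rounding errors; the helper name stretch_labels is hypothetical.

```python
import numpy as np

def stretch_labels(labels, target_len):
    """Resize a label sequence to target_len, keeping segment proportions."""
    labels = np.asarray(labels)
    if labels.size == 0 or target_len <= 0:
        return np.empty(0, dtype=labels.dtype)
    # Run-length encode: start index of each constant-label segment.
    starts = np.concatenate(([0], np.flatnonzero(np.diff(labels)) + 1))
    run_ends = np.concatenate((starts[1:], [labels.size]))
    # Scale the cumulative segment ends. Rounding boundaries instead of
    # lengths guarantees the final boundary equals target_len exactly.
    new_ends = np.rint(run_ends * target_len / labels.size).astype(int)
    new_lengths = np.diff(np.concatenate(([0], new_ends)))
    return np.repeat(labels[starts], new_lengths)
```

Because only boundaries are rounded, rounding errors cannot accumulate across segments; a very short segment may round to zero length when the sequence is compressed heavily.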