Lab 6: Audio Filtering

Due Friday 3/6/2020

Overview

The purpose of this lab is to give students more practice with arrays and loops in Java by implementing a "filtering" method, in which parts of an audio clip are removed based on different criteria.

Code

The code for this assignment can be found in the homework 5 repository. You will be editing the file AudioFiltering.java

Overview

As we have seen, digital audio can be represented as an array of doubles between -1 and 1, which we refer to as samples. In this assignment, you will fill in a method that takes in a sequence of audio samples x and creates a new array of audio samples y which is a subset of x. There are three options, and you must implement one. Each option consists of the following steps:

  1. Count the number of samples in x that are above or below a certain threshold ("energy" for loudness filtering or "zero crossings" for vowel/consonant filtering). This will be the size of the array y
  2. Create an array y of the appropriate size.
  3. Copy over the values from x to y that exceed a threshold.
Below is more information about what the thresholds are for the three different options:

Option 1: Loudness Filtering

Your first option is to fill in the function

public static double[] filterLoudness(double[] x, int win, double cutoff)

This method should return an array y where all of the quiet regions have been removed. The first line of this function has been filled in for you:

double[] energy = ArrayUtils.getLocalEnergy(x, win);

This line creates an array energy which is parallel to x. In particular, energy[i] holds a number between 0 and 1 which is the mean of the square of the audio samples in x between indices i-win and i+win (Click here to see the implementation details if you're curious, but you can treat that as a black box for the purposes of this assignment). energy[i] is higher if the audio samples in x around index i are louder, and lower otherwise. Let's say, for example, that we have the following audio:

Below is a plot that shows the original samples of x (top plot), followed by the energy array (middle plot). The regions in which the person is speaking have higher energy, and the regions between speech has lower energy. If we choose a cutoff of 0.005 in energy (shown as the dotted line), and we only fill in samples in y that are above this cutoff, then we get the following audio (which is also shown as the bottom plot)

Option 2: Consonant Filtering

Your second option is to fill in the function

public static double[] filterZCS(double[] x, int win, int maxzcs)

This method should return an array y where all of the vowels have been removed from the audio. The first line of this function has been filled in for you:

int[] allzcs = ArrayUtils.getZeroCrossings(x, win);

This line creates an array allzcs which is parallel to x. In particular, allzcs[i] holds the number of zero crossings in x between indices i-win and i+win (Click here to see the implementation details if you're curious, but you can treat that as a black box for the purposes of this assignment). Generally, allzcs[i] is higher if x[i] is in the middle of a consonant, and it is lower otherwise. This is because consonants are higher frequency than vowels, which means they have a smaller period (i.e. they go through more cycles in the same amount of time, which means they cross the x axis more). Let's say, for example, that we have the same audio clip:

If we create a new array y in which the zero crossings in a window of size 4001 around x must be at least 150, we get the following audio

Below is a plot that shows how this happens. As before, the original audio is up top, and the filtered audio is at the bottom. The middle plot shows the zero crossings counted in windows of size 4001 around each sample, and a threshold of 150 is drawn as a dotted line. As an example, consider the circled region around the letter "six." The zero crossings peak around s and x, but they dip around i.

Option 3: Vowel Filtering

You can also choose to create a method which is almost identical to the consonant filtering method, but where you only keep samples if the zero crossings are below a certain amount. Instead of keeping the consonants, this will keep the vowels. For example, if we create a new array y in which the zero crossings in a window of size 4001 around x must be at most 150, we get the following audio

What To Submit

You must submit the file Lab5.java to Canvas. Please indicate as a comment which of the three options you chose to implement.