Tutorial Make your own Voice Command App using Java and Sphinx4

Ex094 · 09-10-2016, 01:23 PM

Blog Source: Procurity
GitHub Repo: VoiceCom - A Simple Voice Command App using Java and Sphinx4

Hello and welcome to another tutorial on Java. In this tutorial we'll be creating a voice command application using Java and the Sphinx4 speech recognition library for Java.

If the term voice command is new to you, there are plenty of real-world examples. If you are an Android user, you have probably used the Google app: you say "Ok Google", it listens for your command, and if you say something like "open google", it automatically launches Chrome and opens Google.com.

When you speak into your mic, the computer cannot understand what you are saying on its own. We have to give it the ability to recognize the words we say and convert them into a form it can work with, which is the basis of the term Speech Recognition. You might be wondering how in the world we are going to do that. Well, we don't have to worry, because there is a library called Sphinx4 that does all the complex work for us. We only have to call a few methods in order to create our voice command app.

Approach

So what is it that our app is going to do?

Code:
1) We will speak a command like "open terminal" in our mic

2) Sphinx4 detects and recognizes the words that we speak

3) Sphinx4 outputs the recognized words

4) We compare the words to our list of commands

5) If a command exists, it'll execute a certain task

6) Wait for another command and repeat step 2

That is the basic approach for our voice command application.

Requirements

For this app, you'll require the following:

1) Java 8
2) Sphinx4 (download the latest Alpha 5 version)
Go to https://oss.sonatype.org/#nexus-search and download the Alpha 5 sphinx4-core.jar and sphinx4-data.jar files.
3) NetBeans IDE
4) A good-quality microphone

About Models

There are basically three models required for speech recognition in Sphinx4:

Code:
1) Acoustic Model

2) Phonetic Dictionary (File ends with .dic extension)

3) Language Model (File ends with .lm extension)

The sphinx4-data.jar ships with an English acoustic model by default, so we will be using that. If you are working with another language, you'll have to download its model from Here.

Since we are creating a voice command app, we'll create our own language model and phonetic dictionary, because our vocabulary is limited to our commands only. Now let's create the files we need.

Creating Language Model & Dictionary

As said above, our vocabulary is limited, so making the language model and dictionary will be a breeze thanks to the Sphinx Online Base Generator. But first we have to make a corpus file (the text from which the language model is built) containing our commands. For this tutorial I'll be choosing 4 commands.

Code:
1) open file manager

2) open browser

3) close browser

4) close file manager

Now type these commands into a text file, one per line, and save it. Then navigate to the Sphinx Online Base Generator, click Choose File and select your corpus text file. In response the site will give you a list of files. For now we are only interested in the files ending with the .dic and .lm extensions, so download them and place them in your project folder.
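If you'd rather generate the corpus file from code than type it by hand, here is a minimal sketch. The class name CorpusWriter and the file name corpus.txt are my own choices for illustration; the generator does not require them.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class CorpusWriter {

    // The four commands from this tutorial, one per line in the corpus.
    static final List<String> COMMANDS = Arrays.asList(
            "open file manager",
            "open browser",
            "close browser",
            "close file manager");

    // Writes one command per line to the given file and returns its path.
    static Path writeCorpus(Path target) throws IOException {
        return Files.write(target, COMMANDS, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // corpus.txt is an assumed file name; any text file will do.
        Path corpus = writeCorpus(Paths.get("corpus.txt"));
        System.out.println("Wrote " + corpus.toAbsolutePath());
    }
}
```

Upload the resulting file to the generator exactly as you would a hand-written one.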

Importing Jar Files

We'll create a new Java project in NetBeans and then import the jar files that Sphinx4 requires. Once you have created your project, go to

Run > Set Project Configuration > Customize > Libraries > Add JAR/Folder

Now select the 2 jar files you downloaded earlier, sphinx4-core.jar and sphinx4-data.jar

Press OK and you are all set. Now let's get to the coding part.

Coding the Application

Now that we are done creating and importing the required files, we have to create a Configuration object and pass it to the recognizer so that it can make use of them. Create a new class called VoiceLauncher in the project, which will serve as our main class.

Code:
//Imports
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;
import java.io.IOException;

/**
 *
 * @author ex094
 */
public class VoiceLauncher {

    public static void main(String[] args) throws IOException {
        // Configuration Object
        Configuration configuration = new Configuration();

        // Set path to the acoustic model.
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        // Set path to the dictionary.
        configuration.setDictionaryPath("PATH_TO_YOUR_.DIC_FILE");
        // Set path to the language model.
        configuration.setLanguageModelPath("PATH_TO_YOUR_.LM_FILE");
    }
}

Replace PATH_TO_YOUR_.DIC_FILE with the path to the .dic file and PATH_TO_YOUR_.LM_FILE with the path to the .lm file that you downloaded from the Sphinx Online Base Generator in the Creating Language Model & Dictionary step.

The configuration object is now set and we need to pass it to the recognizer. We also need the recognizer to use our microphone as its input source. Luckily, the latest release (Alpha 5) makes this really easy: we just create a LiveSpeechRecognizer object, pass in the configuration and call the startRecognition method.
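A wrong path is an easy mistake at this step, so it can help to check the files before handing them to the configuration. Here is a small sketch of that idea. ModelPathCheck and requireReadable are hypothetical helper names, not part of Sphinx4, and the check only makes sense for plain file paths, not for resource: locations like the acoustic model above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ModelPathCheck {

    // Returns the location unchanged in meaning if it is a readable regular file,
    // otherwise throws so the problem shows up before recognition starts.
    static String requireReadable(String location) {
        Path path = Paths.get(location);
        if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
            throw new IllegalArgumentException(
                    "Model file not found or unreadable: " + path.toAbsolutePath());
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // Pass your own .dic and .lm paths as arguments; nothing is hard-coded here.
        for (String location : args) {
            System.out.println("OK: " + requireReadable(location));
        }
    }
}
```

You could then wrap the argument of setDictionaryPath and setLanguageModelPath in requireReadable(...) if you want the program to fail early.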

Code:
//Imports
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;
import java.io.IOException;

/**
 *
 * @author ex094
 */
public class VoiceLauncher {

    public static void main(String[] args) throws IOException {
        //Configuration Object
        Configuration configuration = new Configuration();

        // Set path to the acoustic model.
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        // Set path to the dictionary.
        configuration.setDictionaryPath("PATH_TO_YOUR_.DIC_FILE");
        // Set path to the language model.
        configuration.setLanguageModelPath("PATH_TO_YOUR_.LM_FILE");

        //Recognizer Object, Pass the Configuration object
        LiveSpeechRecognizer recognize = new LiveSpeechRecognizer(configuration);

        //Start Recognition Process (The bool parameter clears the previous cache if true)
        recognize.startRecognition(true);
    }
}

Now that the recognition process has started, the recognizer will capture and process your speech whenever you speak into the mic. For the voice command app we need to know which command the user gave, so we need the recognizer to tell us what it recognized from the speech.

For that we will call the recognizer's getResult() method, which gives us a SpeechResult object, and then call getHypothesis() on that result. Using a while loop, we will be able to get every phrase that the user speaks.

Code:
//Imports
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;
import java.io.IOException;

/**
 *
 * @author ex094
 */
public class VoiceLauncher {

    public static void main(String[] args) throws IOException {
        //Configuration Object
        Configuration configuration = new Configuration();

        // Set path to the acoustic model.
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        // Set path to the dictionary.
        configuration.setDictionaryPath("PATH_TO_YOUR_.DIC_FILE");
        // Set path to the language model.
        configuration.setLanguageModelPath("PATH_TO_YOUR_.LM_FILE");

        //Recognizer object, Pass the Configuration object
        LiveSpeechRecognizer recognize = new LiveSpeechRecognizer(configuration);

        //Start Recognition Process (The bool parameter clears the previous cache if true)
        recognize.startRecognition(true);

        //Holds the latest result from the recognizer
        SpeechResult result;

        //Keep looping for as long as the recognizer returns results
        while ((result = recognize.getResult()) != null) {
            //Get the recognized speech
            String command = result.getHypothesis();
        }
    }
}

The command variable stores the recognized speech (the command that you spoke) as a String, so we can check whether it matches any entry in our list of commands and then act on it. We will be using if conditions, but you can do the same with a switch statement.

Code:
//Imports
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;
import java.io.IOException;

/**
 *
 * @author ex094
 */
public class VoiceLauncher {

    public static void main(String[] args) throws IOException {
        //Configuration Object
        Configuration configuration = new Configuration();

        // Set path to the acoustic model.
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        // Set path to the dictionary.
        configuration.setDictionaryPath("/home/ex094/Desktop/4220.dic");
        // Set path to the language model.
        configuration.setLanguageModelPath("/home/ex094/Desktop/4220.lm");

        //Recognizer object, Pass the Configuration object
        LiveSpeechRecognizer recognize = new LiveSpeechRecognizer(configuration);

        //Start Recognition Process (The bool parameter clears the previous cache if true)
        recognize.startRecognition(true);

        //Creating SpeechResult object
        SpeechResult result;

        //Check if recognizer recognized the speech
        while ((result = recognize.getResult()) != null) {
            //Get the recognized speech
            String command = result.getHypothesis();

            //Match recognized speech with our commands
            if (command.equalsIgnoreCase("open file manager")) {
                System.out.println("File Manager Opened!");
            } else if (command.equalsIgnoreCase("close file manager")) {
                System.out.println("File Manager Closed!");
            } else if (command.equalsIgnoreCase("open browser")) {
                System.out.println("Browser Opened!");
            } else if (command.equalsIgnoreCase("close browser")) {
                System.out.println("Browser Closed!");
            }
        }
    }
}

Since the recognized speech is stored in our command variable, we can compare it using ordinary String comparison. Now run the code and speak one of the 4 commands into your mic. If you say "open file manager", it should print "File Manager Opened!".

After your testing is complete, it's time to add real commands, like ones that actually open the file manager when you speak the "open file manager" command. We will store the shell command in a variable and then use Runtime.exec(), which returns a Process object, to execute it.
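The same matching can also be written with a switch on the string, which Java supports since version 7. Here is a sketch of that variant, kept separate from Sphinx4 so it runs on its own. CommandSwitch and messageFor are names made up for this example.

```java
import java.util.Locale;

class CommandSwitch {

    // Returns the message for a recognized phrase, or null if it is not one of our commands.
    static String messageFor(String hypothesis) {
        // A switch compares strings exactly, so normalize the case first
        // to get roughly the behavior of equalsIgnoreCase.
        switch (hypothesis.trim().toLowerCase(Locale.ROOT)) {
            case "open file manager":
                return "File Manager Opened!";
            case "close file manager":
                return "File Manager Closed!";
            case "open browser":
                return "Browser Opened!";
            case "close browser":
                return "Browser Closed!";
            default:
                return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(messageFor("OPEN BROWSER")); // prints "Browser Opened!"
    }
}
```

Inside the while loop you would call messageFor(result.getHypothesis()) and print the result when it is not null.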

Code:
//Imports
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;
import java.io.IOException;

/**
 *
 * @author ex094
 */
public class VoiceLauncher {

    public static void main(String[] args) throws IOException {
        //Configuration Object
        Configuration configuration = new Configuration();

        // Set path to the acoustic model.
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        // Set path to the dictionary.
        configuration.setDictionaryPath("/home/ex094/Desktop/4220.dic");
        // Set path to the language model.
        configuration.setLanguageModelPath("/home/ex094/Desktop/4220.lm");

        //Recognizer object, Pass the Configuration object
        LiveSpeechRecognizer recognize = new LiveSpeechRecognizer(configuration);

        //Start Recognition Process (The bool parameter clears the previous cache if true)
        recognize.startRecognition(true);

        //Creating SpeechResult object
        SpeechResult result;

        //Check if recognizer recognized the speech
        while ((result = recognize.getResult()) != null) {
            //Get the recognized speech
            String command = result.getHypothesis();

            //Shell command to run for the recognized speech (stays null if nothing matches)
            String work = null;
            Process p;

            if (command.equalsIgnoreCase("open file manager")) {
                work = "nautilus";
            } else if (command.equalsIgnoreCase("close file manager")) {
                work = "pkill nautilus";
            } else if (command.equalsIgnoreCase("open browser")) {
                work = "google-chrome";
            } else if (command.equalsIgnoreCase("close browser")) {
                work = "pkill google-chrome";
            }

            //work is still null when the recognized speech matched none of our commands
            if (work != null) {
                //Execute the command
                p = Runtime.getRuntime().exec(work);
            }
        }
    }
}

Run this code and speak the "open browser" command into your mic; it should open the browser. Test all the other commands as well.
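Runtime.exec(String) splits the command string on whitespace before running it. If you want to see or control the argument list yourself, ProcessBuilder from the standard library is an alternative. This is only a rough sketch and not what the code above does; CommandRunner, toArguments and prepare are names I made up, and the main method assumes a java executable is on your PATH.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class CommandRunner {

    // Splits a command line on whitespace. This is a simple tokenization
    // that does not understand quotes, much like Runtime.exec(String).
    static List<String> toArguments(String work) {
        return Arrays.asList(work.trim().split("\\s+"));
    }

    // Builds (but does not start) a process whose output goes to our console.
    static ProcessBuilder prepare(String work) {
        return new ProcessBuilder(toArguments(work)).inheritIO();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // "java -version" stands in for a real command such as "nautilus";
        // it is assumed to be available on the PATH.
        Process p = prepare("java -version").start();
        System.out.println("Exit code: " + p.waitFor());
    }
}
```

In the voice launcher you would replace the Runtime.getRuntime().exec(work) line with prepare(work).start().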

Adding more commands

In order to add more commands, just add them to your corpus text file and repeat the steps from the Creating Language Model & Dictionary section. Then add a matching branch for each new command in the code.

If-Else Spaghetti

There might come a time when you have a lot of commands in your program, and putting them all in if-else branches would be an absolute mess. So what to do? The best thing would be to load all the commands from the corpus text file into a Hashtable (or HashMap) and map each speech command to its respective executable command. I'll add updated code to the GitHub repo of this tutorial in case you need it.

That's it for this tutorial, have fun with your voice command app!
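To give an idea of what that could look like, here is a minimal sketch using a HashMap. It is not the code from the repo. The mapping is hard-coded here instead of being loaded from the corpus file, and CommandTable and lookup are names chosen just for this example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class CommandTable {

    // Spoken command (lower case) -> shell command to execute.
    private static final Map<String, String> COMMANDS = new HashMap<>();

    static {
        COMMANDS.put("open file manager", "nautilus");
        COMMANDS.put("close file manager", "pkill nautilus");
        COMMANDS.put("open browser", "google-chrome");
        COMMANDS.put("close browser", "pkill google-chrome");
    }

    // Returns the shell command for a recognized phrase, or null if there is none.
    static String lookup(String hypothesis) {
        return COMMANDS.get(hypothesis.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(lookup("OPEN BROWSER")); // prints "google-chrome"
        System.out.println(lookup("make coffee"));  // prints "null"
    }
}
```

With this in place, the whole if-else chain in the while loop shrinks to a single lookup(command) call followed by the same null check as before, and a new command only needs one more put line.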