Sports Scores, Inc
When you want scores just Say it.

By: Matthew Moore
Michael Dulberg

 

Objective

The purpose of this project is to deliver sports scores through voice recognition. To serve multiple users, the system must be able to adapt to a specific user. It must also be user friendly and report the requested score for the specified team in a timely fashion.

    Sport is a form of entertainment that people enjoy worldwide. Across the world, people play and watch a variety of games. From football to soccer, basketball to golf, people of all ages want to find out how different teams or individuals have performed in their game of choice. Voice recognition will be used to make looking up sports scores easier. Because training data is limited, the system must also be reconfigurable so that it can accommodate multiple users.

Past Solutions

Today’s cell phones, websites, and 1-800 numbers offer speech recognition. Cell phones often use speech recognition to look up contact information and access key phone features. Some websites and 1-800 numbers use speech recognition for database retrieval. This project follows up on these types of systems. Because our training database is limited, our system must adapt as it is used.

Flow Diagram

Installation/Training

    The initial training of the system is equivalent to running an installation file. The user runs the training program, which saves the transition matrix and the other matrices needed for HMM evaluation to a load file. Each time the training program is run, the user begins with preset HMM matrices. Once training has finished, the user can run the GUI program to capture the input voice and either report the score or retrain the system to that utterance.
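
The save-and-reload step can be illustrated with a short sketch. It is written in Python with NumPy rather than Matlab, and it is not the project's code: the function names, the file name, and the tiny two-state preset model are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np


def save_hmm(path, prior, transmat, means, covars):
    """Store one word model's HMM parameters in a single .npz file."""
    np.savez(path, prior=prior, transmat=transmat, means=means, covars=covars)


def load_hmm(path):
    """Read the parameters back as a plain dict of arrays."""
    with np.load(path) as data:
        return {name: data[name] for name in data.files}


# Hypothetical preset model: 2 states, 3-dimensional feature vectors.
preset = {
    "prior": np.array([1.0, 0.0]),
    "transmat": np.array([[0.9, 0.1], [0.0, 1.0]]),
    "means": np.zeros((2, 3)),
    "covars": np.ones((2, 3)),
}

# "Installation": write the model once. The GUI stage would only load it.
model_path = os.path.join(tempfile.mkdtemp(), "team_model.npz")
save_hmm(model_path, **preset)
restored = load_hmm(model_path)
```

Keeping every matrix of a word model in one file means the recognition stage needs a single load call per word and never has to repeat the slow training step.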

 Endpoint Detection

Endpoint detection finds the beginning and end of an utterance. This increases the likelihood that the HMM testing will return the correct score for that particular utterance. In 1975, L. R. Rabiner and M. R. Sambur published a paper titled An Algorithm for Determining the Endpoints of Isolated Utterances. In it they discuss a technique that breaks the utterance into 10 ms windows and sums the energy within each window. Once an utterance is spoken, the window energy surpasses an energy threshold, which marks the start of the actual utterance. To find this threshold, we averaged the energy over the last 30-40 windows. This region is the noise floor of the recording, and adding a small buffer to that average gives the energy threshold. Rabiner also stated that the zero-crossing rate can improve endpoint detection. Our system does not use the zero-crossing rate as an endpoint criterion, because we found that energy threshold detection alone gave usable results.

Another reference paper consulted for our endpoint detection development was titled A Robust Algorithm for Word Boundary Detection in the Presence of Noise by Junqua, Mak, and Reaves.
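
The energy-threshold idea described above can be sketched as follows. This is a minimal Python illustration, not the project's Matlab code. The function names, the choice of 30 noise windows, and the size of the buffer (here twice the noise floor) are assumptions, and the test recording is synthetic.

```python
import numpy as np


def frame_energy(signal, sample_rate, window_ms=10):
    """Sum of squared samples in consecutive, non-overlapping windows."""
    window = int(sample_rate * window_ms / 1000)
    n_frames = len(signal) // window
    frames = np.reshape(signal[: n_frames * window], (n_frames, window))
    return np.sum(frames.astype(float) ** 2, axis=1), window


def find_endpoints(signal, sample_rate, noise_windows=30, buffer_ratio=2.0):
    """Return (start, end) sample indices of the utterance, or None.

    noise_windows and buffer_ratio are illustrative values, not the
    settings used in the project.
    """
    energy, window = frame_energy(signal, sample_rate)
    # The last windows of the recording are assumed to hold only noise.
    noise_floor = np.mean(energy[-noise_windows:])
    threshold = noise_floor + buffer_ratio * noise_floor
    active = np.flatnonzero(energy > threshold)
    if active.size == 0:
        return None
    return int(active[0]) * window, (int(active[-1]) + 1) * window


# Synthetic 2-second recording: faint noise with a tone from 0.5 s to 1.0 s.
rng = np.random.default_rng(0)
sample_rate = 16000
recording = 0.01 * rng.standard_normal(2 * sample_rate)
t = np.arange(8000) / sample_rate
recording[8000:16000] += 0.5 * np.sin(2 * np.pi * 440 * t)

start, end = find_endpoints(recording, sample_rate)
```

Only the samples between start and end would then be passed to feature extraction, so the silent lead-in and tail do not influence the recognition score.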

HMM Voice Recognition

    Hidden Markov Models (HMMs) are a statistical approach to voice recognition. A separate set of HMM parameters can be trained for each word. In our project we varied the number of states, the number of Gaussian mixtures, and the maximum number of training iterations. To extract the features used in HMM training and recognition, we used the MFCC feature extraction function from the Auditory Toolbox. The HMM testing was done using the HMM toolbox.
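
To show how a trained word model turns a feature sequence into a recognition score, here is a small Python sketch of the forward algorithm in the log domain. It is a simplification and not the HMM toolbox code: each state uses a single diagonal Gaussian instead of a Gaussian mixture, the two word models and the two-dimensional "features" are made up, and all names are hypothetical.

```python
import numpy as np


def log_gaussian(features, means, variances):
    """Log density of every frame under every state's diagonal Gaussian.

    features: (T, D); means and variances: (S, D). Returns (T, S).
    """
    diff = features[:, None, :] - means[None, :, :]
    terms = np.log(2 * np.pi * variances)[None, :, :] + diff ** 2 / variances[None, :, :]
    return -0.5 * np.sum(terms, axis=2)


def log_likelihood(features, prior, transmat, means, variances):
    """Forward algorithm in the log domain: log P(features | model)."""
    with np.errstate(divide="ignore"):  # log(0) = -inf is intended here
        log_prior = np.log(prior)
        log_trans = np.log(transmat)
    log_emit = log_gaussian(features, means, variances)
    alpha = log_prior + log_emit[0]
    for t in range(1, len(features)):
        # alpha[j] <- logsum_i(alpha[i] + log_trans[i, j]) + log_emit[t, j]
        alpha = np.logaddexp.reduce(alpha[:, None] + log_trans, axis=0) + log_emit[t]
    return float(np.logaddexp.reduce(alpha))


# Two made-up left-to-right word models with 2 states and 2-D features.
shared = {
    "prior": np.array([1.0, 0.0]),
    "transmat": np.array([[0.8, 0.2], [0.0, 1.0]]),
    "variances": np.ones((2, 2)),
}
models = {
    "team_a": dict(shared, means=np.array([[0.0, 0.0], [1.0, 1.0]])),
    "team_b": dict(shared, means=np.array([[5.0, 5.0], [6.0, 6.0]])),
}

# Made-up utterance whose frames sit near team_b's state means.
features = np.vstack([np.full((10, 2), 5.0), np.full((10, 2), 6.0)])
scores = {name: log_likelihood(features, **params) for name, params in models.items()}
best = max(scores, key=scores.get)
```

Recognition then amounts to scoring the utterance against every word model and keeping the highest log-likelihood. Working in the log domain avoids the numerical underflow that multiplying many small probabilities would cause.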


Decision Thresholds

    The next part of the project is the decision tree. Once the user has recorded the team of choice, the program determines whether the utterance is acceptable. If it is not, the user is asked to choose between the two teams with the highest recognition scores. Once the user has made a choice, a system-retraining option is offered.
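
One possible form of this decision step is sketched below in Python. The acceptance rule (the best log-likelihood must lead the runner-up by a fixed margin), the margin value, and the team labels are assumptions for illustration. The project's actual thresholds are not given here.

```python
def decide(scores, margin=50.0):
    """Decide whether the top-scoring team can be accepted outright.

    scores maps team name -> log-likelihood and needs at least two entries.
    Returns ("accept", [best]) when the best score leads the runner-up by
    at least `margin`, otherwise ("ask_user", [best, runner_up]).
    The margin is a hypothetical value.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if scores[best] - scores[runner_up] >= margin:
        return "accept", [best]
    return "ask_user", [best, runner_up]


clear_case = decide({"team_a": -100.0, "team_b": -400.0, "team_c": -390.0})
close_case = decide({"team_a": -100.0, "team_b": -120.0, "team_c": -500.0})
```

In the clear case the score for team_a can be reported directly. In the close case the two candidates would be shown to the user, whose choice could then feed the retraining option.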

Our Solution

The Hidden Markov Model (HMM) speech recognition technique was employed to provide accurate detection of spoken team names. Because HMM is a more sophisticated voice recognition technique, it requires additional processing time. This disadvantage, though, is offset by the higher recognition rate for a large vocabulary.

Our system implements HMM training and detection in two stages. The first stage is similar to an installation process. Here the initial training is done and all relevant matrices used in the HMM evaluation are saved. Because training for a large vocabulary takes a long time, this step saves time by giving the user pre-saved information before the program begins. Once the installation, or training, is complete, the user starts the Graphical User Interface (GUI) program. The GUI gives the user simple tools to pick which team’s score to display. All detection and database retrieval takes place within this main program.

The following diagram explains what goes on once the user has recorded their voice sample.  Some of the key features in the main code include endpoint detection, a decision structure, and retraining.
 

Retraining the system

    Once the user has picked the correct utterance, they have the option to retrain the system to that utterance. For several utterances the score may be returned for either the complete team name or only part of the team name, so the user may choose not to retrain the system to that utterance. This provides flexibility for the user. The retraining code combines the previously installed HMM matrices, the feature sets extracted from the training data, and the features extracted from the test utterance. To retrain the system, we use the previous HMM matrices as the initial guess for the mhmm_em function. The algorithm adds the test feature set to the last three original training feature sets. This allows the system to remain speaker independent while becoming more accurate as the same user uses it more.
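
The way the retraining data is put together can be sketched in a few lines of Python. This shows only the data assembly, under the assumption that each feature set is a frames-by-coefficients array. The function name, the 13-coefficient width, and the frame counts are hypothetical, and the EM update itself (done by mhmm_em in the project) is not reproduced.

```python
import numpy as np


def build_retraining_set(training_sets, test_features, keep=3):
    """Last `keep` stored training feature sets plus the new utterance."""
    return list(training_sets[-keep:]) + [test_features]


# Hypothetical stored training data for one team name: five utterances,
# each a (frames x 13) array of MFCC-like features.
training_sets = [np.zeros((frames, 13)) for frames in (40, 42, 38, 45, 41)]

# Features extracted from the utterance the user has just confirmed.
test_features = np.ones((39, 13))

retraining_data = build_retraining_set(training_sets, test_features)
```

The resulting list, together with the saved HMM matrices as the starting point, is what would be handed to the EM training routine. Because three of the four sets come from the original training data, a single new recording nudges the model toward the current user without replacing what it learned from other speakers.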

Directions

The following are directions for the use of this system. We assume the user has Matlab 6.5 or older.

1)      Download the Zip file containing all the required toolboxes and used functions

2)      In Matlab, set path to the folder in which you’ve saved the above mentioned Zip file

3)      Type Training. This will install the HMM data that will be used

4)      Type Project. The GUI should then pop up

5)      Check the sport you would like to know about

6)      Click on the “Record” button and record a team name. The recording will stop after 2 seconds

7)      Next, if the sample was recognized beyond any doubt, the corresponding score will be displayed. If the log-likelihood scores were too close, the top two will be displayed and you will have to choose one. If neither option was the spoken team name, make a new recording by going back to step 6

 

A new recording can be made at any point once the GUI is functional.
