Windows Server Products - Speech Server

[By: Peter van Mol]

Setting up interactive voice-response systems (like those used by large financial companies and movie directory hotlines) used to require a significant investment in proprietary software. But with the debut of Microsoft Speech Server 2004 Standard Edition ($7,999 per CPU), speech-based computing is now within reach of a much wider range of businesses.

With Speech Server, you can develop several kinds of speech applications: touch-tone interfaces, voice-driven menus, and multimodal interfaces (where voice supplements a standard visual Web interface). Microsoft's multimodal application style is new and lets callers interact with Web pages via voice as never before. Speech Server piggybacks on ASP.NET, adding speech to standard Web applications via its Speech Application Language Tags (SALT). It offers voice input recognition as well as text-to-speech conversion, with technology licensed from ScanSoft. However, it does not currently support VoiceXML.

Administration of Speech Server is simple. If you can handle standard ASP.NET Web applications, you'll be able to administer speech-enabled ones, too. Basic admin is handled with a bare-bones MMC snap-in for configuring speech-enabled applications. Additional counters are available in Performance Monitor for pinpointing processing bottlenecks. For debugging on the server, Speech Server offers a trace utility and the ability to view server events within Windows Event Viewer.

The real power behind speech processing is the freely downloadable Microsoft Speech Server SDK 1.0. In the VoiceXML space, development often means using a text editor to write and tweak XML. Microsoft instead offers a component-based model for building speech apps, with some two dozen Visual Basic-style components for handling speech dialogs and managing phone calls, all without delving into the details of XML. In testing, we used these components to model several voice dialogs in C# and SALT from a legacy travel alert application built in VoiceXML.

Other standout tools make creating your first speech application relatively painless. A visual editor let us define and tweak speech recognition grammars (sets of phrases that are valid at particular points within a speech-enabled application). Professional speech developers record all valid prompts ahead of time in sound files; a handy prompt editor in the SDK let us manage a list of text prompts along with recorded WAV files. Speech Server SDK includes support for recording, playing back, and editing sampled speech.
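
To make the idea of a grammar concrete, here is a minimal Python sketch of "sets of phrases that are valid at particular points" in a dialog. It is only a conceptual illustration, not the Speech Server grammar format or the SDK's API: the dialog states (`main_menu`, `confirm`), the phrases, and the `recognize` helper are all hypothetical.

```python
# Hypothetical illustration: a grammar modeled as the set of phrases that are
# valid at each point in a dialog. State names and phrases are invented here;
# this is NOT the Speech Server grammar format.
GRAMMARS = {
    "main_menu": {"check alerts", "book a trip", "operator"},
    "confirm": {"yes", "no"},
}


def normalize(utterance: str) -> str:
    """Lowercase and collapse whitespace so matching ignores case and spacing."""
    return " ".join(utterance.lower().split())


def recognize(state: str, utterance: str):
    """Return the matched phrase if it is valid in this dialog state, else None."""
    phrase = normalize(utterance)
    return phrase if phrase in GRAMMARS.get(state, set()) else None


if __name__ == "__main__":
    print(recognize("main_menu", "  Check   Alerts "))  # valid at the main menu
    print(recognize("confirm", "check alerts"))         # not valid when confirming
```

The same phrase is accepted at one point in the dialog and rejected at another, which is the property a grammar editor lets a developer define and tune.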

A final noteworthy feature is a simulator for running speech applications on a desktop PC. After typing a starting URL, we were able to simulate a complete telephone session using a standard PC microphone, while viewing a detailed trace log of events. In testing, this feature proved effective, though we missed the color-coding available in VoiceXML solutions like Voxeo, which can make slogging through a trace of phone and speech activity a little easier.

Microsoft's opening gambit in IVR systems is promising, though businesses that have already invested in traditional voice software are unlikely to jump on board with this first release. If you are new to voice development, however, and haven't already invested in VoiceXML, the component-based style of programming in Speech Server lets you tackle speech applications with ease. It also expands the kinds of voice applications you can create on the Windows platform.