On Prosody modules running Aculab speech processing firmware, channels are allocated with characteristics governed by the value of the type parameter of sm_channel_alloc_placed(), as follows:
kSMChannelTypeInput - for an input-only channel, capable of recording and detecting DTMF.
kSMChannelTypeOutput - for an output-only channel, capable of replaying and/or generating conference outputs.
kSMChannelTypeHalfDuplex - for a channel capable of replaying or recording (but not both at once) while simultaneously detecting DTMF.
kSMChannelTypeFullDuplex - for a channel capable of replaying, recording and detecting DTMF simultaneously.
Prosody has three different modes of DTMF detection. The correct mode should be chosen based on the requirements of the application. Although all modes detect all frequencies in the specification, the different modes have subtly different behaviour, as follows:
kSMToneDetectionNoMinDuration - is the most sensitive and reacts to the shortest of digit durations.
kSMToneDetectionMinDuration64 - is less sensitive than NoMinDuration, and will only detect DTMF digits that have a duration of at least 64ms. It is therefore slightly more robust against talk-off (see below).
kSMToneDetectionMinDuration40 - is almost as sensitive as NoMinDuration, but has extra safeguards against talk-off (see below). It will detect digits with durations as short as 40ms. It needs a gap of at least 48ms between digits in a sequence.
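The timing limits above can be used to screen out modes that cannot work for a given signalling source. The following is a minimal sketch only: the enum and function names are hypothetical stand-ins rather than part of the Prosody API, and the check covers only the durations stated above, not talk-off behaviour.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the three detection modes; a real application
 * would use the kSMToneDetection... constants from the Prosody headers. */
typedef enum {
    MODE_NO_MIN_DURATION,
    MODE_MIN_DURATION_64,
    MODE_MIN_DURATION_40
} dtmf_mode;

/* Returns true if the timing limits stated in the text do not rule the mode
 * out for digits at least digit_ms long, separated by at least gap_ms. */
bool mode_meets_timing(dtmf_mode mode, unsigned digit_ms, unsigned gap_ms)
{
    switch (mode) {
    case MODE_MIN_DURATION_64:
        return digit_ms >= 64;                 /* digits of 64 ms or longer */
    case MODE_MIN_DURATION_40:
        return digit_ms >= 40 && gap_ms >= 48; /* 40 ms digits, 48 ms gaps */
    case MODE_NO_MIN_DURATION:
        return true;                           /* no minimum is stated */
    }
    return false;
}
```

For example, a source that sends 50ms digits with 50ms gaps passes the check for MinDuration40 and NoMinDuration but not for MinDuration64.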
A DTMF digit is simply a pair of pure tones, added together. Each digit (0, 1, 2, ...) has a high frequency and a low frequency. There are four possible low frequencies and four possible high frequencies, making a total of 16 combinations. Because the DTMF digit is an audio signal, it is susceptible to interference and distortion in the telephone network. Further, the tones can be mimicked unintentionally by speech and other audio signals.
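The four-by-four arrangement can be illustrated with a small lookup. The frequencies used here are the standard DTMF values (as in ITU-T Q.23) and are not quoted from this note; the function name is hypothetical and is not a Prosody call.

```c
#include <stdbool.h>
#include <string.h>

/* Standard keypad layout: each row shares a low frequency and each column
 * shares a high frequency. */
static const char *const DTMF_KEYS = "123A456B789C*0#D";
static const int DTMF_LOW_HZ[4]  = { 697, 770, 852, 941 };
static const int DTMF_HIGH_HZ[4] = { 1209, 1336, 1477, 1633 };

/* Looks up the tone pair for a digit. Returns false for characters that
 * are not one of the 16 DTMF digits. */
bool dtmf_frequencies(char digit, int *low_hz, int *high_hz)
{
    /* strchr would match the terminator for '\0', so reject it first. */
    const char *pos = (digit != '\0') ? strchr(DTMF_KEYS, digit) : NULL;
    if (pos == NULL)
        return false;

    int index = (int)(pos - DTMF_KEYS);
    *low_hz = DTMF_LOW_HZ[index / 4];   /* row selects the low tone */
    *high_hz = DTMF_HIGH_HZ[index % 4]; /* column selects the high tone */
    return true;
}
```

With this table, the digit 5 maps to 770Hz plus 1336Hz, and the digit 0 to 941Hz plus 1336Hz.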
Talk-off is the term used when a DTMF detector mistakenly detects a speech signal as a DTMF digit. Speech often contains, momentarily, very strong frequency components in just the right frequency bands, so a DTMF detection algorithm has to be quite shrewd to avoid talk-off in some situations.
The measures taken against false (talk-off) detection include a very specific definition of frequency limits, a very carefully chosen signal-to-noise ratio requirement and a few secondary measures. The result of good talk-off performance is inevitably that sensitivity is reduced for signals outside of the specified signal requirements (frequencies, SNR, duration).
There are two standard talk-off tests, Mitel and Bellcore. Both are supplied as analogue cassette tapes, and so are not completely consistent within a digital environment. Prosody's DTMF detection algorithm performs well within acceptable limits for both tests, as shown below. The acceptable number of false alarms is 30 for the Mitel test and 666 for the Bellcore test.
kSMDetectModeNoMinDuration - Mitel talk-off = 11, Bellcore talk-off = 100 +/- 5
kSMDetectModeMinDuration64 - Mitel talk-off = 2, Bellcore talk-off = 9 +/- 1
kSMDetectModeMinDuration40 - Mitel talk-off = 0, Bellcore talk-off = 60 +/- 5
Note: For MinDuration40 detection, the Bellcore test is not an accurate reflection of the talk-off in real-world telephone signals. That is, the talk-off performance is better than the figure above would imply.
The term 'cut-through' refers to the ability of a system to respond to input while an outgoing prompt is in progress. As long as the correct types of channels are allocated, there is no reason why DTMF should not be detected while an outgoing signal is being generated. In the presence of echo, however, unwanted detections may occur; see below.
The DTMF detection on Prosody is very finely tuned in order to detect all DTMF digits within its specifications, and to reject the maximum amount of talk-off. As soon as an interfering signal is added to the DTMF digit, there is a possibility that the digit will not be recognised. As long as the power of the interfering signal is much less than the power of the DTMF digit itself, there will not be a problem.
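The power comparison can be made concrete with a short sketch. It assumes 16-bit linear PCM samples and a caller-supplied ratio; both function names and the min_ratio parameter are hypothetical, and no threshold figure is given in this note.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mean-square power of a block of 16-bit linear PCM samples. */
double mean_power(const short *samples, size_t count)
{
    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += (double)samples[i] * (double)samples[i];
    return (count > 0) ? sum / (double)count : 0.0;
}

/* True if the digit is at least min_ratio times more powerful than the
 * interference. min_ratio is a hypothetical tuning value chosen by the
 * caller; it is not a Prosody parameter. */
bool interference_is_small(double digit_power, double interference_power,
                           double min_ratio)
{
    return digit_power >= min_ratio * interference_power;
}
```

A power ratio of 100 corresponds to 20dB, so a digit at amplitude 1000 against interference at amplitude 100 sits exactly at that ratio.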
Interfering signals can come from two sources - background noise or speech echo:
For this situation there are four solutions:
This is a combination of the 'background noise' and the 'echo' situations described above. A telephone should disable its microphone while dialling DTMF digits. This is particularly important for speakerphones. If this is not the case, the background noise problem will occur, and will be exacerbated by the fact that the microphone is open to the room. Further, if the speakerphone does not perform any acoustic echo cancellation, then a fraction of any outgoing speech will be added to the incoming DTMF signal, because of the acoustic coupling between loudspeaker and microphone. To an extent, the echo case can be solved by the techniques described below, but speakerphone echo tends to be far more difficult for an echo canceller to deal with.
Because analogue telephones reflect some of the outgoing signal, a DTMF digit present in the outgoing signal may be detected by the channel connected to the incoming side of the same network connection.
Work around this by ensuring that no DTMF tones exist in the outgoing signal. If DTMF signals must be transmitted, use echo cancellation to remove the DTMF signals. One module will perform echo cancellation on up to 30 duplex channels. See the speech processing API guide for details.
If an application uses DTMF for a user to navigate through a series of prompts, it is usually more appropriate to use 'trailing-edge' DTMF detection, described below. In this mode, the application only responds to DTMF when the digit ceases. If leading-edge detection is used (where the application responds when the digit starts) and the user holds a DTMF button down for over a second, they will probably miss the beginning of the next prompt, especially if the DTMF buttons are in the handset of the telephone.
Trailing-edge DTMF detection is enabled by using mode kSMToneEndDetectionxxxx rather than kSMToneDetectionxxxx.
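The difference between the two approaches can be shown with a simple timing model. This is an illustrative sketch only: the types and functions are hypothetical, detector latency is ignored, and the next prompt is assumed to start as soon as the application reacts.

```c
/* Hypothetical illustration types, not part of the Prosody API. */
typedef enum { EDGE_LEADING, EDGE_TRAILING } edge_mode;

/* Time (ms) at which the application reacts to a key held from press_ms
 * to release_ms. Requires release_ms >= press_ms. */
unsigned response_time_ms(edge_mode mode, unsigned press_ms,
                          unsigned release_ms)
{
    return (mode == EDGE_LEADING) ? press_ms : release_ms;
}

/* Milliseconds of the next prompt that play while the key is still held
 * down, assuming the prompt starts when the application reacts. */
unsigned prompt_overlap_ms(edge_mode mode, unsigned press_ms,
                           unsigned release_ms)
{
    return release_ms - response_time_ms(mode, press_ms, release_ms);
}
```

For a key pressed at 1000ms and released at 2200ms, the leading-edge model plays 1200ms of the next prompt over the key press, while the trailing-edge model plays none.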
It is often the case (particularly in interactive systems for use by the general public) that 'dirty' DTMF digits are received. These can take various forms, for various reasons, but the effect is either:
The Prosody API provides a method for de-bouncing, which means either:
These are controlled by a call to sm_adjust_input_tone_set() to adjust the parameters:
kAdjustToneSetIntParamIdMinOnTime
kAdjustToneSetIntParamIdMinOffTime
When adjusted, these parameters apply to all tone detectors on a module which are using the tone detection table to which they were applied. That is, if sm_adjust_input_tone_set() is called with module 0 and tone-set 0, then all DTMF detection on module 0 will be affected. It is, of course, possible to create multiple tone-sets, so if de-bouncing is only required on some channels, a duplicate of the DTMF table can be created and modified with these parameters. Design of custom tone-tables is explained in the Application Note Configuring universal tone detection on Prosody.
De-bouncing will only take effect if the detection mode is kSMToneEndDetectionMinDuration64 or kSMToneLenDetectionMinDuration64.
Note: If tone detection mode kSMToneLenDetectionxxxx is used, the application can retrieve the duration of the tone and make decisions based on that. This is independent of the de-bouncing parameters; that is, the two work together.
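As a general illustration of de-bouncing with a minimum on-time and a minimum off-time, the sketch below filters a list of bursts of the same digit. It is not the Prosody implementation: the type, the function and the exact semantics of the two thresholds are assumptions made for the example, and the behaviour of kAdjustToneSetIntParamIdMinOnTime and kAdjustToneSetIntParamIdMinOffTime should be checked against the API guide.

```c
#include <stddef.h>

/* One burst of a detected digit, in milliseconds from the start of the call. */
typedef struct {
    unsigned start_ms;
    unsigned end_ms;
} tone_burst;

/* De-bounces bursts of a single digit in place and returns the new count.
 * Assumed semantics (hypothetical, not taken from the Prosody API):
 *   - a gap shorter than min_off_ms joins two bursts into one;
 *   - a joined burst shorter than min_on_ms is discarded.
 * Bursts must be sorted by start time and must not overlap. */
size_t debounce_bursts(tone_burst *bursts, size_t count,
                       unsigned min_on_ms, unsigned min_off_ms)
{
    size_t merged = 0;

    /* Pass 1: bridge gaps that are too short to count as silence. */
    for (size_t i = 0; i < count; i++) {
        if (merged > 0 &&
            bursts[i].start_ms - bursts[merged - 1].end_ms < min_off_ms) {
            bursts[merged - 1].end_ms = bursts[i].end_ms;
        } else {
            bursts[merged++] = bursts[i];
        }
    }

    /* Pass 2: drop bursts that are still too short to count as a digit. */
    size_t kept = 0;
    for (size_t i = 0; i < merged; i++) {
        if (bursts[i].end_ms - bursts[i].start_ms >= min_on_ms)
            bursts[kept++] = bursts[i];
    }
    return kept;
}
```

With a minimum on-time of 50ms and a minimum off-time of 40ms, bursts at 0-30, 40-120, 400-410 and 900-1000ms reduce to two digits: 0-120ms (the 10ms gap is bridged) and 900-1000ms (the 10ms burst is dropped).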
A Prosody channel that is connected to a conference can be made to detect DTMF digits. In a conferencing environment, echoes in the participants' telephones can cause problems. Firstly, if a user hits a DTMF digit, all other parties will be disturbed by the sound of that digit. Secondly, if one user dials a digit, it can be echoed back from some (or potentially all) of the other parties, and therefore detected on all channels. When part of a conference the following features are enabled:
Document reference: AN 1338