Prosody speech processing: Notes on adding call progress tones
- There is only one active repertoire of call-progress tones at any time.
- The set of call-progress tones uses a given tone-set (modified using
sm_add_input_tone_set()).
- The default call-progress repertoire uses tone set 1, defined in
"Prosody speech processing: pre-loaded input tones".
- A tone can be added to the call-progress repertoire using
sm_add_input_cptone(), as long as it uses this same tone set.
- If use of a different tone-set is required, the entire current
repertoire of call-progress tones must be discarded and a new repertoire
created based on the new tone-set.
- To do this, create the new tone set using sm_add_input_tone_set(),
call sm_reset_input_cptones() with the new tone-set id, and then build
the new call-progress repertoire from scratch.
- Call progress tones are specified as a sequence of tones, each with a
frequency and a duration. The algorithm wakes up every time a tone
starts or stops, then looks to see if the detected sequence matches any
of the specified sequences in the call-progress repertoire.
- If the cadence is not symmetrical, the state sequence should be
specified in reverse order, i.e. states[0] is later in time than
states[1]. An example of this is the S.I.T. tone, where the sequence is
tone 2 - silence - tone 3 - silence - tone 4.
- The tone-ids used in call progress tone definition must be
incremented by 1, and a tone-id of zero is used to specify silence.
- Durations are stored internally as multiples of 32ms, so it is
useful to round durations before specifying them. This way you can be
sure of what durations are actually used.
- If the minimum duration for a state is zero, then the tone will be
recognised even if it never occurs.
- In order to detect long periods of silence efficiently, the maximum
duration for a silent state is 2528ms (0x4f x 32ms). After this
duration, the algorithm will wake up and move to the next state. The
next state may also be silence.
- The previous rule also applies to any tone with index 1. The maximum
duration in this case is 1248ms (0x27 x 32ms).
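The duration rules above can be sketched in C as follows. This is only an illustration: the helper names and macros are hypothetical and are not part of the Prosody API. Rounding to the nearest multiple of 32ms is an assumption (the notes do not say whether the library rounds or truncates), and covering a long silence with several consecutive silent states is one reading of "the next state may also be silence".

```c
#include <assert.h>

/* Values taken from the notes above; the macro names are hypothetical. */
#define CP_TICK_MS        32u                   /* internal duration unit */
#define CP_MAX_SILENCE_MS (0x4fu * CP_TICK_MS)  /* 2528 ms per silent state */
#define CP_MAX_TONE1_MS   (0x27u * CP_TICK_MS)  /* 1248 ms for tone index 1 */

/* Round a duration to the nearest multiple of 32 ms (assumed policy). */
unsigned cp_round_ms(unsigned ms)
{
    return ((ms + CP_TICK_MS / 2u) / CP_TICK_MS) * CP_TICK_MS;
}

/* Number of consecutive silent states needed to cover total_ms of
   silence when one silent state lasts at most CP_MAX_SILENCE_MS. */
unsigned cp_silence_states_needed(unsigned total_ms)
{
    assert(total_ms > 0u);
    return (total_ms + CP_MAX_SILENCE_MS - 1u) / CP_MAX_SILENCE_MS;
}
```

For example, under these assumptions a requested duration of 500ms would be specified as 512ms, and a 4000ms silence would need two consecutive silent states.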
- If a maximum duration is specified as ~0U (the largest unsigned
integer value) this means that there is no maximum duration. When
checking a tone specified in this way, the software considers the tone
to have matched as soon as it has persisted for the minimum duration.
For all other maximum values, the software must wait until the end of
the tone so that it can check that its duration does not exceed the
maximum.
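The difference between an unbounded and a bounded maximum can be sketched as a small decision function. This is an illustrative model of the rule described above, not the library's implementation; the type, the function name and the CP_NO_MAX macro are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define CP_NO_MAX (~0U)  /* "no maximum duration", as described above */

typedef enum { CP_PENDING, CP_MATCH, CP_NO_MATCH } cp_result;

/* Decide whether one state matches, given how long the tone has lasted
   so far and whether it has already ended. */
cp_result cp_check_duration(unsigned min_ms, unsigned max_ms,
                            unsigned elapsed_ms, bool tone_ended)
{
    assert(max_ms == CP_NO_MAX || min_ms <= max_ms);

    if (max_ms == CP_NO_MAX) {
        /* No upper bound: match as soon as the minimum is reached. */
        if (elapsed_ms >= min_ms)
            return CP_MATCH;
        return tone_ended ? CP_NO_MATCH : CP_PENDING;
    }

    /* Bounded maximum: the decision has to wait for the end of the tone. */
    if (!tone_ended)
        return CP_PENDING;
    return (elapsed_ms >= min_ms && elapsed_ms <= max_ms) ? CP_MATCH
                                                          : CP_NO_MATCH;
}
```

In this model, a continuous tone with a maximum of ~0U is reported after the minimum duration, while the same tone with a finite maximum is reported only once it stops.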
- An entry in the call progress tone table will match an incoming tone
which contains the correct sequence of tones even if the sequence
does not start at the first item in the table. For example, the
S.I.T. tone, whose pattern is:
tone 2, silence, tone 3, silence, tone 4, silence
will match any of these six incoming sequences:
No | Sequence
1 | tone 2, silence, tone 3, silence, tone 4, silence
2 | silence, tone 3, silence, tone 4, silence, tone 2
3 | tone 3, silence, tone 4, silence, tone 2, silence
4 | silence, tone 4, silence, tone 2, silence, tone 3
5 | tone 4, silence, tone 2, silence, tone 3, silence
6 | silence, tone 2, silence, tone 3, silence, tone 4
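The six sequences above are the cyclic rotations of the pattern. A minimal sketch of such a comparison, assuming tone-ids are plain integers with 0 standing for silence, is shown below. The function name is hypothetical, and the sketch ignores durations entirely; it only illustrates why each of the six rows matches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if observed[0..n-1] equals pattern[0..n-1] read
   cyclically from some starting offset. */
bool cp_matches_any_rotation(const int *pattern, const int *observed,
                             size_t n)
{
    assert(pattern != NULL && observed != NULL && n > 0);

    for (size_t start = 0; start < n; ++start) {
        size_t k = 0;
        /* Compare the observed sequence with the pattern rotated by start. */
        while (k < n && observed[k] == pattern[(start + k) % n])
            ++k;
        if (k == n)
            return true;
    }
    return false;
}
```

With the pattern {2, 0, 3, 0, 4, 0}, the observed sequence {0, 4, 0, 2, 0, 3} (row 4 of the table) is accepted, whereas {3, 0, 2, 0, 4, 0} is rejected because the tones appear in the wrong order.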
- If a tone set is used which has both Nl tones in band 1, denoted
L[0]..L[Nl-1], and Nh tones in band 2, denoted H[0]..H[Nh-1], then any
combination of tones, L[i] with H[j], will have a freq_id value of
i + Nl * j. For example, if there are two tones in band 1, 350 Hz and
900 Hz, and three tones in band 2, 500 Hz, 700 Hz, and 1234 Hz, then
the possible combinations are:
                  | tones with 500 Hz | tones with 700 Hz | tones with 1234 Hz
tones with 350 Hz | 1 is 350 + 500    | 3 is 350 + 700    | 5 is 350 + 1234
tones with 900 Hz | 2 is 900 + 500    | 4 is 900 + 700    | 6 is 900 + 1234
The values in this table are one greater than i + Nl * j, which is
consistent with the rule above that tone-ids used in a call progress
tone definition are incremented by 1 (zero being reserved for silence).
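The numbering of dual-tone combinations can be checked with a short calculation. The sketch below assumes that the values listed in the example are freq_id + 1, in line with the increment-by-1 rule for call progress tone definitions; that reading, and both function names, are assumptions rather than documented API.

```c
#include <assert.h>

/* freq_id of the combination L[i] + H[j], using the formula i + Nl * j. */
unsigned cp_freq_id(unsigned i, unsigned j, unsigned nl, unsigned nh)
{
    assert(i < nl && j < nh);
    return i + nl * j;
}

/* Tone-id as used in a call progress definition: freq_id incremented
   by 1, so that 0 remains available to mean silence (assumed reading). */
unsigned cp_tone_id(unsigned i, unsigned j, unsigned nl, unsigned nh)
{
    return cp_freq_id(i, j, nl, nh) + 1u;
}
```

With Nl = 2 and Nh = 3, cp_tone_id(0, 0, 2, 3) gives 1 (350 + 500) and cp_tone_id(1, 2, 2, 3) gives 6 (900 + 1234), matching the example.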