Prosody speech processing: API: sm_conf_prim_adj_tracking

Prototype Definition

int sm_conf_prim_adj_tracking(struct sm_conf_prim_adj_tracking_parms *trackp)

Parameters

trackp: a pointer to a structure of the following type:

typedef struct sm_conf_prim_adj_tracking_parms {
	tSMChannelId channel;					/* in */
	double min_noise_level;					/* in */
	double speech_thresh;					/* in */
} SM_CONF_PRIM_ADJ_TRACKING_PARMS;

Description

Adjusts two parameters that control when the designated input channel is reported as having an active input while it is one of the participants in a conference. An input is only added to a conference when it is considered to be active.

The speech detection algorithm assumes a fairly constant level of background noise on which the speech is superimposed. It also assumes that the speech contains some pauses.

The signal on an incoming timeslot is analysed to produce two measurements that determine the eventual noise threshold: Lmin, the lowest energy monitored, and Lmax, the highest energy monitored. Because the speech is assumed to contain pauses, Lmin corresponds to the quietest level of background noise. To allow for some variation in the noise level, the threshold is set a little above Lmin, and the signal is assumed to contain speech when it is above this threshold. The exact threshold value used is:

	Lmin + (Lmax - Lmin) * speech_thresh

This means that speech_thresh specifies how far the threshold lies above Lmin, as a fraction of the distance between Lmin and Lmax. The diagram illustrates this:

[Diagram: speech threshold]
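The measurement and comparison steps above can be sketched in C. This is an illustrative model only: it assumes levels are expressed in dBm0 and that the tracker keeps the all-time lowest and highest levels seen. The type and function names (level_tracker, tracker_update and so on) are hypothetical, and the real Prosody detector very likely adapts Lmin and Lmax over time in ways not described on this page.

```c
#include <float.h>

/* Hypothetical tracker holding the lowest (Lmin) and highest (Lmax)
 * levels seen so far. Assumption: levels are in dBm0 and old values
 * are never forgotten. */
typedef struct {
	double lmin;
	double lmax;
} level_tracker;

static void tracker_init(level_tracker *t)
{
	t->lmin = DBL_MAX;   /* any real level will be lower */
	t->lmax = -DBL_MAX;  /* any real level will be higher */
}

/* Feed one measured level into the tracker. */
static void tracker_update(level_tracker *t, double level)
{
	if (level < t->lmin)
		t->lmin = level;
	if (level > t->lmax)
		t->lmax = level;
}

/* Threshold = Lmin + (Lmax - Lmin) * speech_thresh.
 * Only meaningful after at least one tracker_update() call. */
static double tracker_threshold(const level_tracker *t, double speech_thresh)
{
	return t->lmin + (t->lmax - t->lmin) * speech_thresh;
}

/* Nonzero when the level is above the threshold, i.e. treated as speech. */
static int tracker_is_active(const level_tracker *t, double level,
                             double speech_thresh)
{
	return level > tracker_threshold(t, speech_thresh);
}
```

The sketch compares with a strict `>`, so a signal sitting exactly at the threshold is treated as noise; the source does not say which comparison Prosody itself uses.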

The default value of speech_thresh is 0.01, which raises the threshold above Lmin by 1% of the difference between the loudest and quietest sounds in the signal. To make the detector less sensitive, increase this value; values above 0.03 usually make it too insensitive.
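As a worked example with illustrative values (not taken from any measurement), take Lmin = -50 dBm0 and Lmax = -10 dBm0, and assume, purely for illustration, that the formula is applied directly to dBm0 levels; the text above speaks of "energy" without fixing the scale. The default value and the suggested upper limit then give:

```latex
\begin{aligned}
L_{\min} + (L_{\max} - L_{\min}) \times 0.01 &= -50 + 40 \times 0.01 = -49.6~\text{dBm0} \\
L_{\min} + (L_{\max} - L_{\min}) \times 0.03 &= -50 + 40 \times 0.03 = -48.8~\text{dBm0}
\end{aligned}
```

In this example, tripling speech_thresh moves the threshold from 0.4 dB to 1.2 dB above Lmin.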

The other adjustable parameter, min_noise_level, specifies the smallest value permitted for Lmin. If the value calculated from the signal is below this, min_noise_level is used instead. This prevents the threshold from being set too low when there is no noise, for example when the caller has muted their phone. The default value is -53 dBm0. To make the detector less sensitive, increase this value; values above -34 dBm0 usually make it too insensitive.
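The floor that min_noise_level places under Lmin can be sketched as a simple clamp applied before the threshold formula. This is a minimal sketch assuming levels in dBm0, with the floor applied to the measured Lmin only and Lmax used unchanged; the function names are hypothetical.

```c
/* Apply the min_noise_level floor to the measured Lmin (all in dBm0). */
static double effective_lmin(double measured_lmin, double min_noise_level)
{
	return measured_lmin < min_noise_level ? min_noise_level : measured_lmin;
}

/* Threshold using the floored Lmin:
 * Lmin' + (Lmax - Lmin') * speech_thresh, where Lmin' = max(Lmin, floor).
 * Assumption: Lmax is not clamped. */
static double clamped_threshold(double measured_lmin, double lmax,
                                double min_noise_level, double speech_thresh)
{
	double lmin = effective_lmin(measured_lmin, min_noise_level);
	return lmin + (lmax - lmin) * speech_thresh;
}
```

For a muted line with a measured Lmin of -80 dBm0 and an Lmax of -13 dBm0, the default floor lifts Lmin to -53 dBm0, so the default speech_thresh of 0.01 gives a threshold of -52.6 dBm0 instead of about -79.3 dBm0.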

When speech_thresh is zero, the signal is considered to be active whenever its level is above min_noise_level. In this case, setting min_noise_level to -90 dBm0 or lower causes the input always to be considered active.

Note: all input settings are lost when the channel is no longer a conference input, unless the channel has been explicitly attached for conferencing by calling sm_conf_prim_attach().

Fields

channel: The input channel, attached to conferencing, whose detection parameters are to be adjusted.
min_noise_level: The new value for the minimum noise level (in dBm0).
speech_thresh: The new value for the speech threshold ratio.

Returns

0 if the call completed successfully, otherwise a standard error such as:

ERR_SM_DEVERR - device error
ERR_SM_BAD_PARAMETER - illegal min_noise_level or speech_thresh value
ERR_SM_WRONG_CHANNEL_STATE - no conference has been started on the channel
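A minimal calling sketch follows. To keep it compilable on its own, it defines stand-ins for tSMChannelId, SM_CONF_PRIM_ADJ_TRACKING_PARMS and sm_conf_prim_adj_tracking() itself (the stub simply returns 0); in a real program these come from the Prosody headers and library, and the stand-ins must be removed. The helper make_input_less_sensitive() and the chosen values (-50 dBm0, 0.02) are hypothetical, picked to sit between the defaults and the "too insensitive" limits described above.

```c
#include <stdio.h>

/* --- Stand-ins so this sketch compiles on its own (assumptions) ---
 * Real programs get these from the Prosody headers and library. */
typedef int tSMChannelId; /* hypothetical; use the Prosody definition */

typedef struct sm_conf_prim_adj_tracking_parms {
	tSMChannelId channel;		/* in */
	double min_noise_level;		/* in, dBm0 */
	double speech_thresh;		/* in, fraction of (Lmax - Lmin) */
} SM_CONF_PRIM_ADJ_TRACKING_PARMS;

/* Hypothetical stub that always reports success. */
static int sm_conf_prim_adj_tracking(struct sm_conf_prim_adj_tracking_parms *trackp)
{
	(void)trackp;
	return 0;
}
/* --- end of stand-ins --- */

/* Make a conference input somewhat less sensitive than the defaults
 * (-53 dBm0, 0.01). Returns 0 on success, otherwise the error code. */
static int make_input_less_sensitive(tSMChannelId channel)
{
	SM_CONF_PRIM_ADJ_TRACKING_PARMS parms;
	int rc;

	parms.channel = channel;
	parms.min_noise_level = -50.0;	/* raised from the -53 dBm0 default */
	parms.speech_thresh = 0.02;	/* raised from the 0.01 default */

	rc = sm_conf_prim_adj_tracking(&parms);
	if (rc != 0)
		fprintf(stderr, "sm_conf_prim_adj_tracking failed: %d\n", rc);
	return rc;
}
```

Checking the return value against 0, rather than against a fixed list of codes, keeps the caller correct even if the library returns an error not listed above.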

This function is part of the Prosody speech processing API.