Prosody speech processing: API: sm_record_start

Prototype Definition

int sm_record_start(struct sm_record_parms *recordp)

Parameters

*recordp
a structure of the following type:
typedef struct sm_record_parms {
	tSMChannelId channel;					/* in */
	tSMChannelId alt_data_source;				/* in */
	enum kSMDataFormat type;				/* in */
	tSM_UT32 silence_elimination;				/* in */
	enum kSMToneDetection tone_elimination_mode;		/* in */
	tSM_UT32 tone_elimination_set_id;			/* in */
	tSM_UT32 max_octets;					/* in */
	tSM_UT32 max_elapsed_time;				/* in */
	tSM_UT32 max_silence;					/* in */
	tSM_INT agc;						/* in */
	tSM_INT volume;						/* in */
	enum kSMRecordAltSource {
		kSMRecordAltSourceDefault,
		kSMRecordAltSourceInput,
		kSMRecordAltSourceOutput,
	} alt_data_source_type;					/* in */
	tSM_UT32 sampling_rate;					/* in */
	double min_noise_level;					/* in */
	double grunt_threshold;					/* in */
	tSM_UT32 grunt_holdoff;					/* in */
	tSM_UT32 max_initial_silence;				/* in */
} SM_RECORD_PARMS;

Description

This call starts a new recording job using the specified channel.

Normally alt_data_source is set to kSMNullChannelId, and the data recorded is whatever is switched to the input of this channel. If, however, alt_data_source is set to the channel id of another existing channel, the data source for the recording is determined by the value of alt_data_source_type. Note that the channel specified in alt_data_source must not be reconfigured while this recording is in progress. When alt_data_source_type selects the output of a channel, the output datafeed from that channel must be referenced by calling sm_channel_get_datafeed() (or the legacy sm_switch_channel_output()) before starting the recording.
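As a minimal sketch of filling in the structure for the common case (no alternate source), the fragment below zeroes the parameters and then sets only the fields needed for a time-limited A-law recording. The typedefs, the enum values and the kSMNullChannelId definition are local stand-ins so that the fragment compiles on its own; a real application would take them from the Prosody header instead. The helper name init_basic_record is hypothetical.

```c
#include <string.h>

/* Stand-ins for the Prosody SDK definitions (assumptions, illustration only). */
typedef void *tSMChannelId;
typedef unsigned int tSM_UT32;
typedef int tSM_INT;
#define kSMNullChannelId ((tSMChannelId)0)
enum kSMDataFormat { kSMDataFormatNone, kSMDataFormatALawPCM };
enum kSMToneDetection { kSMToneDetectionNone };

/* Same field order as the structure documented above. */
struct sm_record_parms {
	tSMChannelId channel;
	tSMChannelId alt_data_source;
	enum kSMDataFormat type;
	tSM_UT32 silence_elimination;
	enum kSMToneDetection tone_elimination_mode;
	tSM_UT32 tone_elimination_set_id;
	tSM_UT32 max_octets;
	tSM_UT32 max_elapsed_time;
	tSM_UT32 max_silence;
	tSM_INT agc;
	tSM_INT volume;
	enum kSMRecordAltSource {
		kSMRecordAltSourceDefault,
		kSMRecordAltSourceInput,
		kSMRecordAltSourceOutput
	} alt_data_source_type;
	tSM_UT32 sampling_rate;
	double min_noise_level;
	double grunt_threshold;
	tSM_UT32 grunt_holdoff;
	tSM_UT32 max_initial_silence;
};

/* Hypothetical helper: prepare parameters for a plain A-law recording. */
void init_basic_record(struct sm_record_parms *p, tSMChannelId channel)
{
	memset(p, 0, sizeof *p);               /* limits and options start at zero */
	p->channel = channel;
	p->alt_data_source = kSMNullChannelId; /* record this channel's own input */
	p->type = kSMDataFormatALawPCM;
	p->tone_elimination_mode = kSMToneDetectionNone;
	p->max_elapsed_time = 60000;           /* stop after 60 s */
	/* A real application would now call sm_record_start(p) and check for 0. */
}
```

Zeroing the whole structure first means that every limit documented as "0 if no limit" starts out disabled, so only the fields that matter for a given recording need to be set explicitly.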

The PCM data received will be encoded into buffers in the format specified by the type parameter, which takes a value from the same range of values permitted in the type parameter of sm_replay_start().

Note that, for compatibility with earlier releases of Prosody, many other values are permitted for the type field. These compatibility values specify a combination of data type and sampling rate. When one of these is used in the type field, the sampling_rate field must be zero, and the actual rate used will be as listed here. They are:

compatibility code                new code
                                  type                      sampling rate
kSMDataFormat8KHzALawPCM          kSMDataFormatALawPCM      8000
kSMDataFormat8KHzULawPCM          kSMDataFormatULawPCM      8000
kSMDataFormat8KHzOKIADPCM         kSMDataFormatOKIADPCM     8000
kSMDataFormat8KHzACUBLKPCM        kSMDataFormatACUBLKPCM    8000
kSMDataFormat6KHzALawPCM          kSMDataFormatALawPCM      6000
kSMDataFormat6KHzULawPCM          kSMDataFormatULawPCM      6000
kSMDataFormat6KHzOKIADPCM         kSMDataFormatOKIADPCM     6000
kSMDataFormat6KHzACUBLKPCM        kSMDataFormatACUBLKPCM    6000
kSMDataFormat8KHz16bitMono        kSMDataFormat16bit        8000
kSMDataFormat8KHz8bitMono         kSMDataFormat8bit         8000
kSMDataFormat8KHzSigned8bitMono   kSMDataFormatSigned8bit   8000
kSMDataFormatIMAADPCM             kSMDataFormatIMAADPCM     8000
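When migrating older code away from the compatibility codes, the table above can be expressed as a simple lookup. The following is only a sketch: it keeps the identifiers as strings because their numeric values are not given here, and the names compat_row, compat_table and lookup_compat are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* One row of the compatibility table: old code, new type, implied rate. */
struct compat_row {
	const char *compat_code;
	const char *new_type;
	unsigned sampling_rate;
};

static const struct compat_row compat_table[] = {
	{ "kSMDataFormat8KHzALawPCM",        "kSMDataFormatALawPCM",    8000 },
	{ "kSMDataFormat8KHzULawPCM",        "kSMDataFormatULawPCM",    8000 },
	{ "kSMDataFormat8KHzOKIADPCM",       "kSMDataFormatOKIADPCM",   8000 },
	{ "kSMDataFormat8KHzACUBLKPCM",      "kSMDataFormatACUBLKPCM",  8000 },
	{ "kSMDataFormat6KHzALawPCM",        "kSMDataFormatALawPCM",    6000 },
	{ "kSMDataFormat6KHzULawPCM",        "kSMDataFormatULawPCM",    6000 },
	{ "kSMDataFormat6KHzOKIADPCM",       "kSMDataFormatOKIADPCM",   6000 },
	{ "kSMDataFormat6KHzACUBLKPCM",      "kSMDataFormatACUBLKPCM",  6000 },
	{ "kSMDataFormat8KHz16bitMono",      "kSMDataFormat16bit",      8000 },
	{ "kSMDataFormat8KHz8bitMono",       "kSMDataFormat8bit",       8000 },
	{ "kSMDataFormat8KHzSigned8bitMono", "kSMDataFormatSigned8bit", 8000 },
	{ "kSMDataFormatIMAADPCM",           "kSMDataFormatIMAADPCM",   8000 },
};

/* Hypothetical helper: returns the matching row, or NULL if the name is
   not listed as a compatibility code. */
const struct compat_row *lookup_compat(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof compat_table / sizeof compat_table[0]; i++) {
		if (strcmp(compat_table[i].compat_code, name) == 0)
			return &compat_table[i];
	}
	return NULL;
}
```

Remember that when a compatibility code is placed in the type field, the sampling_rate field must be zero; the rate in the table is the one that will actually be used.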

Any form of record requires the module inchan to have been downloaded in addition to the module that is required for the specific type of record, and any module required for the sampling rate:

record type extra firmware required
kSMDataFormatALawPCM recA
kSMDataFormatULawPCM recmu
kSMDataFormatOKIADPCM recoki
kSMDataFormatACUBLKPCM recablk
kSMDataFormatSigned8bit rec8b
kSMDataFormat8bit recms8b
kSMDataFormat16bit rec16b
kSMDataFormatIMAADPCM recima
kSMDataFormatSpeex speexrp

The sampling rate firmware:

sampling rate extra firmware required
8000 -
6000 sixkin
11000 8_to_11
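The two firmware tables above can be combined into a check that lists the modules a given recording needs. This is a sketch only: it works on the identifier names as strings, covers just the record type and sampling rate, and the names type_module, type_modules and modules_for are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct type_module {
	const char *type;
	const char *module;
};

static const struct type_module type_modules[] = {
	{ "kSMDataFormatALawPCM",    "recA" },
	{ "kSMDataFormatULawPCM",    "recmu" },
	{ "kSMDataFormatOKIADPCM",   "recoki" },
	{ "kSMDataFormatACUBLKPCM",  "recablk" },
	{ "kSMDataFormatSigned8bit", "rec8b" },
	{ "kSMDataFormat8bit",       "recms8b" },
	{ "kSMDataFormat16bit",      "rec16b" },
	{ "kSMDataFormatIMAADPCM",   "recima" },
	{ "kSMDataFormatSpeex",      "speexrp" },
};

/* Hypothetical helper: fills out[] with the firmware module names needed
   for the given record type and sampling rate, and returns how many were
   written. Returns 0 if the type or rate is not in the tables above. */
size_t modules_for(const char *type, unsigned rate, const char *out[3])
{
	const char *type_mod = NULL;
	const char *rate_mod;
	size_t i, n = 0;

	for (i = 0; i < sizeof type_modules / sizeof type_modules[0]; i++) {
		if (strcmp(type_modules[i].type, type) == 0)
			type_mod = type_modules[i].module;
	}

	switch (rate) {
	case 8000:  rate_mod = NULL;      break; /* no extra firmware */
	case 6000:  rate_mod = "sixkin";  break;
	case 11000: rate_mod = "8_to_11"; break;
	default:    return 0;
	}
	if (type_mod == NULL)
		return 0;

	out[n++] = "inchan";   /* needed by every form of record */
	out[n++] = type_mod;
	if (rate_mod != NULL)
		out[n++] = rate_mod;
	return n;
}
```

For example, an OKI ADPCM recording at 6000 samples per second yields inchan, recoki and sixkin.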

See Prosody application note: speech processing replay and record data formats for more details on data formats supported by Prosody and their appropriate use.

The volume parameter is the change in volume, in dB, compared to the level of the data (for example, set this to -6 to attenuate by 6 dB). If AGC and volume are both applied, the change in volume requested is applied after AGC.

The agc parameter controls whether automatic gain control is applied to the recorded data. If agc is non-zero then automatic gain control is applied. Even if this is the case, the recording level is still governed by volume. The behaviour of the AGC algorithm may be controlled by changing its parameters; see sm_record_agc_adjust_settings() for more details.

The recorded data may be retrieved by the application through periodic calls to sm_get_recorded_data(). The amount of data recorded is determined by the termination criteria specified in the parameters:

max_octets max octets of data to record, 0 if no limit
max_elapsed_time max recording period in ms, 0 if no limit
max_silence max period of silence in ms before recording terminated, 0 if no limit (see also max_initial_silence)

and also by the function sm_record_abort() which will terminate a recording directly.
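Because max_octets and max_elapsed_time limit the same recording in different units, it can help to convert between them. The sketch below estimates the number of octets produced by a given duration, assuming a constant number of bits per sample (8 for A-law and mu-law, 4 for the ADPCM formats, 16 for 16-bit linear), no per-block overhead in the format, and no silence or tone elimination. The helper name octets_for_duration is hypothetical.

```c
/* Hypothetical helper: octets produced by `ms` milliseconds of audio at
   `rate` samples per second and `bits` bits per sample. Any framing
   overhead added by a particular format is ignored. */
unsigned long long octets_for_duration(unsigned bits, unsigned rate, unsigned ms)
{
	/* Widen before multiplying so long recordings do not overflow. */
	return (unsigned long long)bits * rate * ms / (8ULL * 1000ULL);
}
```

Under these assumptions, 30 seconds of A-law at 8000 samples per second comes to 240000 octets, so setting max_octets to that value gives roughly the same limit as setting max_elapsed_time to 30000.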

If an event has been previously associated with a channel (see sm_channel_set_event()), then the driver will notify the application with that event whenever (for that channel):

The channel is reserved for recording until sm_record_status() returns the status kSMRecordStatusComplete. No other recording activity can take place on the channel during this time.

Fields

channel
The channel on which to perform the recording.
alt_data_source
kSMNullChannelId, or another channel whose input or output is to be recorded. If this specifies a channel, that channel must not be reconfigured while recording is taking place.
type
The format in which to record. (See the main text above for compatibility codes that can also be used in this field.) One of these values:
kSMDataFormatNone
Special value for test purposes only. This indicates that the channel should prepare as if it was about to play or record data, but not actually transfer any data.
kSMDataFormatALawPCM
G.711 A-law. This uses 8 bits per sample.
kSMDataFormatULawPCM
G.711 mu-law. This uses 8 bits per sample.
kSMDataFormatOKIADPCM
A 4-bit coding scheme.
kSMDataFormatACUBLKPCM
This format is obsolete, as cards fitted with SHARC DSPs are no longer supported. It has never been implemented for Prosody X cards.
kSMDataFormat16bit
16-bit linear coding, where each sample is a signed value (-32768 to 32767). The first octet of each sample is the less significant one.
kSMDataFormat8bit
8-bit unsigned linear coding, where each sample is an unsigned value (0 to 255). This is Microsoft's 8-bit format.
kSMDataFormatSigned8bit
8-bit linear coding, where each sample is a signed value (-128 to 127).
kSMDataFormatIMAADPCM
A 4-bit coding scheme standardised by the Interactive Multimedia Association (IMA).
kSMDataFormatSpeex
A patent-free and royalty-free speech compression codec. Use of the functions sm_replay_start() and sm_record_start() only allows playback and recording using the default "narrowband" Speex configuration. Other operating modes and parameters will be made available via new API calls.
silence_elimination
The maximum duration (in ms) of silence to record. Silences longer than this are truncated to this length. The value zero disables silence elimination. Requires the module grunt.
tone_elimination_mode
What types of tones to eliminate from the recording. This allows the same tone detection as sm_listen_for(). Requires the module td unless the value is kSMToneDetectionNone. One of these values:
kSMToneDetectionNone
Simple tones never recognised.
kSMToneDetectionNoMinDuration
Simple tone detection enabled, no minimum period. If the correct frequencies are detected with the correct signal to noise ratio, twist, etc. for however short a duration, the tone is considered to be present and is recognised.
kSMToneDetectionMinDuration64
Simple tone detection enabled; the tone must be valid for a minimum period to be detected. If the tone is valid for 64 ms it will definitely be detected. Tones of shorter duration, between 32 ms and 64 ms, may be detected but this cannot be guaranteed. The minimum duration of a tone can be increased by setting the parameter kAdjustToneSetIntParamIdMinOnTime with sm_adjust_input_tone_set().
kSMToneDetectionMinDuration40
This mode uses a slightly more complex algorithm for analysing the duration of a valid tone, and enables robust detection of tones with a duration as short as 40 ms.
kSMToneEndDetectionNoMinDuration
This mode is like kSMToneDetectionNoMinDuration, but the application is also notified when the end of the tone is detected.
kSMToneEndDetectionMinDuration64
This mode is like kSMToneDetectionMinDuration64, but the application is also notified when the end of the tone is detected.
kSMToneEndDetectionMinDuration40
This mode is like kSMToneDetectionMinDuration40, but the application is also notified when the end of the tone is detected.
kSMToneLenDetectionNoMinDuration
This mode is like kSMToneEndDetectionNoMinDuration, but returns additional tone duration information to the application.
kSMToneLenDetectionMinDuration64
This mode is like kSMToneEndDetectionMinDuration64, but returns additional tone duration information to the application.
kSMToneLenDetectionMinDuration40
This mode is like kSMToneEndDetectionMinDuration40, but returns additional tone duration information to the application.
kSMToneDetectionAsListenFor
This mode is only valid when specified in the parameters for sm_record_start() and a tone detection mode is currently active on the same channel, started by sm_listen_for(). Any tones detected on the same channel as the recording will be eliminated from the recorded data.
tone_elimination_set_id
The tone set to use (only relevant if tone_elimination_mode is not kSMToneDetectionNone). See sm_listen_for() for details of how to select an input tone set.
max_octets
The maximum amount of data to record. The value zero indicates no maximum.
max_elapsed_time
The maximum duration of the recording in ms. The value zero indicates no maximum. Requires the module timerx.
max_silence
The maximum silence permitted (in ms). The value zero indicates no maximum. Silences longer than this cause the recording to terminate. Requires the module grunt.
agc
Indicator of whether automatic gain control is to be enabled (non-zero) or not (zero). Requires the module gainbg.
volume
The desired adjustment to the volume (dB). The range of gain supported is at least +8 to -22 dB. Requires the module gainbg.
alt_data_source_type
If an alt_data_source channel is specified, which kind of data associated with that channel should be recorded. One of these values:
kSMRecordAltSourceDefault
If alt_data_source is an input only channel, then data switched to this channel input will be recorded, otherwise the data being generated on this channel output will be recorded (this feature is normally used to record conferenced outputs). This value is deprecated since it is equivalent to either kSMRecordAltSourceInput or kSMRecordAltSourceOutput which could be used instead.
kSMRecordAltSourceInput
Data switched to alt_data_source input will be recorded. This value is deprecated since several channels can take input from the same timeslot and that is a more straightforward way of achieving the same result.
kSMRecordAltSourceOutput
Data generated on alt_data_source output will be recorded.
sampling_rate
The sampling rate at which to record the data. Currently supported values are: Note that when you specify a non-zero value here, this function assumes that the source of the data to be recorded is providing data at 8000 samples per second. The use of data at other rates is not supported and will cause the data to be recorded at an incorrect sampling rate. Consequently, the use of a non-zero value in this field is deprecated.
min_noise_level
The minimum level, in dBm0, that the noise estimate of the grunt detector may reach. The default is -55 dBm0. Only used if silence_elimination or max_silence is non-zero. Requires the module grunt.
grunt_threshold
The threshold, in dB, above the noise estimate of the grunt detector at which a signal is considered present. The default is 15 dB. Only used if min_noise_level is non-zero. Requires the module grunt.
grunt_holdoff
The period, in ms, following start of speech, to disable updating the estimate of the background noise energy (a non-zero period, typically 1000ms, can be required when long periods of uninterrupted speech are expected). Requires the module grunt.
max_initial_silence
If both max_silence and this parameter are non-zero, then this parameter specifies the maximum period of silence allowed, in ms, prior to start of speech, whereas the max_silence timeout will now specify maximum period of silence allowed subsequent to the start of speech. Requires the module grunt.
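Several of the fields above note that they require an additional firmware module. As a rough sketch, the requirements can be gathered from the parameter values before starting a recording. This assumes that a field left at zero (or tone elimination left at kSMToneDetectionNone) does not need its module, which the descriptions above suggest but do not state outright for every field. The record_options structure and the optional_modules helper are hypothetical and hold only the relevant subset of sm_record_parms.

```c
#include <stddef.h>

/* Hypothetical subset of sm_record_parms fields that pull in extra modules. */
struct record_options {
	unsigned silence_elimination;
	int tone_elimination_enabled; /* non-zero if mode != kSMToneDetectionNone */
	unsigned max_elapsed_time;
	unsigned max_silence;
	int agc;
	int volume;
};

/* Hypothetical helper: fills out[] with the names of the option-driven
   modules and returns how many were written (at most 4). */
size_t optional_modules(const struct record_options *o, const char *out[4])
{
	size_t n = 0;

	if (o->silence_elimination != 0 || o->max_silence != 0)
		out[n++] = "grunt";   /* silence detection */
	if (o->tone_elimination_enabled)
		out[n++] = "td";      /* tone detection */
	if (o->max_elapsed_time != 0)
		out[n++] = "timerx";  /* elapsed-time limit */
	if (o->agc != 0 || o->volume != 0)
		out[n++] = "gainbg";  /* AGC and volume adjustment */
	return n;
}
```

The grunt-related tuning fields (min_noise_level, grunt_threshold, grunt_holdoff, max_initial_silence) are not checked separately here, since they only take effect when silence handling is already enabled.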

Returns

0 if call completed successfully, otherwise a standard error such as:


This function is part of the Prosody speech processing API.