Prosody speech processing: API: sm_record_start

Prototype Definition

int sm_record_start(struct sm_record_parms *recordp)

Parameters

recordp points to a structure of the following type:

typedef struct sm_record_parms {
	tSMChannelId channel;					/* in */
	tSMChannelId alt_data_source;				/* in */
	enum kSMDataFormat type;				/* in */
	tSM_UT32 silence_elimination;				/* in */
	enum kSMToneDetection tone_elimination_mode;		/* in */
	tSM_UT32 tone_elimination_set_id;			/* in */
	tSM_UT32 max_octets;					/* in */
	tSM_UT32 max_elapsed_time;				/* in */
	tSM_UT32 max_silence;					/* in */
	tSM_INT agc;						/* in */
	tSM_INT volume;						/* in */
	enum kSMRecordAltSource {
		kSMRecordAltSourceDefault,
		kSMRecordAltSourceInput,
		kSMRecordAltSourceOutput,
	} alt_data_source_type;					/* in */
	tSM_UT32 sampling_rate;					/* in */
	double min_noise_level;					/* in */
	double grunt_threshold;					/* in */
	tSM_UT32 grunt_holdoff;					/* in */
	tSM_UT32 max_initial_silence;				/* in */
} SM_RECORD_PARMS;

Description

This call starts a new recording job using the specified channel.
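As a minimal sketch of the call, the fragment below fills in SM_RECORD_PARMS for a plain A-law recording with two termination limits. The typedefs, the placeholder value of kSMNullChannelId, the enum values and the sm_record_start() stub are local stand-ins so that the fragment compiles without the Prosody headers; they are assumptions, not the real definitions. The helper start_alaw_recording() is hypothetical.

```c
#include <string.h>

/* ---- Stand-ins so the sketch compiles without the Prosody headers. ----
 * These are NOT the real definitions: in an application, include the
 * Prosody API header instead and remove this whole section. */
typedef void *tSMChannelId;
typedef unsigned int tSM_UT32;
typedef int tSM_INT;
#define kSMNullChannelId ((tSMChannelId)0)   /* placeholder value */
enum kSMDataFormat { kSMDataFormatNone, kSMDataFormatALawPCM };
enum kSMToneDetection { kSMToneDetectionNone };

typedef struct sm_record_parms {
    tSMChannelId channel;
    tSMChannelId alt_data_source;
    enum kSMDataFormat type;
    tSM_UT32 silence_elimination;
    enum kSMToneDetection tone_elimination_mode;
    tSM_UT32 tone_elimination_set_id;
    tSM_UT32 max_octets;
    tSM_UT32 max_elapsed_time;
    tSM_UT32 max_silence;
    tSM_INT agc;
    tSM_INT volume;
    enum kSMRecordAltSource {
        kSMRecordAltSourceDefault,
        kSMRecordAltSourceInput,
        kSMRecordAltSourceOutput
    } alt_data_source_type;
    tSM_UT32 sampling_rate;
    double min_noise_level;
    double grunt_threshold;
    tSM_UT32 grunt_holdoff;
    tSM_UT32 max_initial_silence;
} SM_RECORD_PARMS;

/* Stub standing in for the driver call: remembers the request, returns 0. */
static SM_RECORD_PARMS last_request;
static int sm_record_start(struct sm_record_parms *recordp)
{
    last_request = *recordp;
    return 0;
}
/* ---- End of stand-ins. ---- */

/* Hypothetical helper: start an A-law recording on `channel` that stops
 * after max_ms of recording or max_silence_ms of silence (0 = no limit). */
int start_alaw_recording(tSMChannelId channel, tSM_UT32 max_ms,
                         tSM_UT32 max_silence_ms)
{
    SM_RECORD_PARMS parms;

    /* Zero every field first, so that limits and options not set below
     * are left at zero. */
    memset(&parms, 0, sizeof parms);

    parms.channel = channel;
    parms.alt_data_source = kSMNullChannelId;   /* record this channel's input */
    parms.type = kSMDataFormatALawPCM;
    parms.tone_elimination_mode = kSMToneDetectionNone;
    parms.max_elapsed_time = max_ms;            /* needs module timerx */
    parms.max_silence = max_silence_ms;         /* needs module grunt */

    return sm_record_start(&parms);             /* 0 on success */
}
```

Clearing the structure before setting individual fields is a defensive choice here, so that fields the application does not care about are not left holding stack garbage.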

Normally alt_data_source is set to kSMNullChannelId and the data that will be recorded will be that switched to this input channel. If however alt_data_source is set to the channel id of another existing channel, then the data source for the recording will be determined by the value of alt_data_source_type. Note that the channel specified in alt_data_source must not be reconfigured while this recording is in progress. When alt_data_source_type selects the output of a channel, the output datafeed from that channel must be referenced by calling sm_channel_get_datafeed() (or the legacy sm_switch_channel_output() ), before starting the recording.

The PCM data received will be encoded into buffers in the format specified by the type parameter, which takes a value from the same range of values permitted in the type parameter of sm_replay_start().

Note that, for compatibility with earlier releases of Prosody, many other values are permitted for the type field. These compatibility values specify a combination of data type and sampling rate. When one of these is used in the type field, the sampling_rate field must be zero, and the actual rate used will be as listed here. They are:

compatibility code	new type	new sampling rate
kSMDataFormat8KHzALawPCM	kSMDataFormatALawPCM	8000
kSMDataFormat8KHzULawPCM	kSMDataFormatULawPCM	8000
kSMDataFormat8KHzOKIADPCM	kSMDataFormatOKIADPCM	8000
kSMDataFormat8KHzACUBLKPCM	kSMDataFormatACUBLKPCM	8000
kSMDataFormat6KHzALawPCM	kSMDataFormatALawPCM	6000
kSMDataFormat6KHzULawPCM	kSMDataFormatULawPCM	6000
kSMDataFormat6KHzOKIADPCM	kSMDataFormatOKIADPCM	6000
kSMDataFormat6KHzACUBLKPCM	kSMDataFormatACUBLKPCM	6000
kSMDataFormat8KHz16bitMono	kSMDataFormat16bit	8000
kSMDataFormat8KHz8bitMono	kSMDataFormat8bit	8000
kSMDataFormat8KHzSigned8bitMono	kSMDataFormatSigned8bit	8000
kSMDataFormatIMAADPCM	kSMDataFormatIMAADPCM	8000
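The table above can be expressed as a lookup, which may help when migrating code from compatibility codes to an explicit type and rate. This is only a sketch: the enum is a local stand-in whose names come from the table but whose numeric values are arbitrary, and split_compat_format() is a hypothetical helper, not part of the Prosody API.

```c
#include <stddef.h>

/* Stand-in for the Prosody enum: names from the table, values arbitrary. */
enum kSMDataFormat {
    kSMDataFormatALawPCM, kSMDataFormatULawPCM, kSMDataFormatOKIADPCM,
    kSMDataFormatACUBLKPCM, kSMDataFormat16bit, kSMDataFormat8bit,
    kSMDataFormatSigned8bit, kSMDataFormatIMAADPCM,
    kSMDataFormat8KHzALawPCM, kSMDataFormat8KHzULawPCM,
    kSMDataFormat8KHzOKIADPCM, kSMDataFormat8KHzACUBLKPCM,
    kSMDataFormat6KHzALawPCM, kSMDataFormat6KHzULawPCM,
    kSMDataFormat6KHzOKIADPCM, kSMDataFormat6KHzACUBLKPCM,
    kSMDataFormat8KHz16bitMono, kSMDataFormat8KHz8bitMono,
    kSMDataFormat8KHzSigned8bitMono
};

struct compat_entry {
    enum kSMDataFormat compat;   /* compatibility code */
    enum kSMDataFormat type;     /* equivalent new type */
    unsigned rate;               /* sampling rate actually used */
};

/* One row per line of the compatibility table. */
static const struct compat_entry compat_table[] = {
    { kSMDataFormat8KHzALawPCM,        kSMDataFormatALawPCM,    8000 },
    { kSMDataFormat8KHzULawPCM,        kSMDataFormatULawPCM,    8000 },
    { kSMDataFormat8KHzOKIADPCM,       kSMDataFormatOKIADPCM,   8000 },
    { kSMDataFormat8KHzACUBLKPCM,      kSMDataFormatACUBLKPCM,  8000 },
    { kSMDataFormat6KHzALawPCM,        kSMDataFormatALawPCM,    6000 },
    { kSMDataFormat6KHzULawPCM,        kSMDataFormatULawPCM,    6000 },
    { kSMDataFormat6KHzOKIADPCM,       kSMDataFormatOKIADPCM,   6000 },
    { kSMDataFormat6KHzACUBLKPCM,      kSMDataFormatACUBLKPCM,  6000 },
    { kSMDataFormat8KHz16bitMono,      kSMDataFormat16bit,      8000 },
    { kSMDataFormat8KHz8bitMono,       kSMDataFormat8bit,       8000 },
    { kSMDataFormat8KHzSigned8bitMono, kSMDataFormatSigned8bit, 8000 },
    { kSMDataFormatIMAADPCM,           kSMDataFormatIMAADPCM,   8000 }
};

/* If `code` is listed as a compatibility code, store the equivalent type
 * and rate and return 1; otherwise leave the outputs alone and return 0. */
int split_compat_format(enum kSMDataFormat code,
                        enum kSMDataFormat *type, unsigned *rate)
{
    size_t i;

    for (i = 0; i < sizeof compat_table / sizeof compat_table[0]; i++) {
        if (compat_table[i].compat == code) {
            *type = compat_table[i].type;
            *rate = compat_table[i].rate;
            return 1;
        }
    }
    return 0;
}
```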

Any form of record requires the module inchan to have been downloaded in addition to the module that is required for the specific type of record, and any module required for the sampling rate:

record type	extra firmware required
kSMDataFormatALawPCM	recA
kSMDataFormatULawPCM	recmu
kSMDataFormatOKIADPCM	recoki
kSMDataFormatACUBLKPCM	recablk
kSMDataFormatSigned8bit	rec8b
kSMDataFormat8bit	recms8b
kSMDataFormat16bit	rec16b
kSMDataFormatIMAADPCM	recima
kSMDataFormatSpeex	speexrp

The sampling rate firmware:

sampling rate	extra firmware required
8000	-
6000	sixkin
11000	8_to_11

See Prosody application note: speech processing replay and record data formats for more details on data formats supported by Prosody and their appropriate use.
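The two firmware tables can be combined into a small check that lists the modules to download for a given record type and sampling rate. The sketch below covers only those two tables plus inchan; modules required by individual fields (grunt, td, timerx, gainbg) are not included. The enum is a local stand-in with arbitrary values, and record_firmware() and has_module() are hypothetical helpers.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the Prosody enum: names from the table, values arbitrary. */
enum kSMDataFormat {
    kSMDataFormatALawPCM, kSMDataFormatULawPCM, kSMDataFormatOKIADPCM,
    kSMDataFormatACUBLKPCM, kSMDataFormatSigned8bit, kSMDataFormat8bit,
    kSMDataFormat16bit, kSMDataFormatIMAADPCM, kSMDataFormatSpeex
};

/* Module for the record type, or NULL if the type is not in the table. */
static const char *type_firmware(enum kSMDataFormat type)
{
    switch (type) {
    case kSMDataFormatALawPCM:    return "recA";
    case kSMDataFormatULawPCM:    return "recmu";
    case kSMDataFormatOKIADPCM:   return "recoki";
    case kSMDataFormatACUBLKPCM:  return "recablk";
    case kSMDataFormatSigned8bit: return "rec8b";
    case kSMDataFormat8bit:       return "recms8b";
    case kSMDataFormat16bit:      return "rec16b";
    case kSMDataFormatIMAADPCM:   return "recima";
    case kSMDataFormatSpeex:      return "speexrp";
    }
    return NULL;
}

/* Fills out[] (room for 3 names) with the modules to download.
 * Returns the number of names written, or 0 if the type or the rate is
 * not listed in the tables. */
size_t record_firmware(enum kSMDataFormat type, unsigned rate,
                       const char *out[3])
{
    const char *type_module = type_firmware(type);
    size_t n = 0;

    if (type_module == NULL)
        return 0;
    if (rate != 8000 && rate != 6000 && rate != 11000)
        return 0;

    out[n++] = "inchan";          /* needed by any form of record */
    out[n++] = type_module;
    if (rate == 6000)
        out[n++] = "sixkin";
    else if (rate == 11000)
        out[n++] = "8_to_11";     /* 8000 needs no extra module */
    return n;
}

/* Returns 1 if `name` appears in list[0..n-1], else 0. */
int has_module(const char *const list[], size_t n, const char *name)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(list[i], name) == 0)
            return 1;
    return 0;
}
```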

The volume parameter is the change in volume compared to the level of the data (i.e. set this to -6 to attenuate by 6dB). If AGC and volume are both applied, the change in volume requested is applied after AGC.

The agc parameter controls whether automatic gain control is applied to the recorded data. If agc is non-zero then automatic gain control is applied. Even if this is the case, the recording level is still governed by volume. The behaviour of the AGC algorithm may be controlled by changing its parameters; see sm_record_agc_adjust_settings() for more details.

The recorded data may be retrieved by the application through periodic calls to sm_get_recorded_data(). The amount of data recorded is determined by the termination criteria specified in the parameters:

max_octets	maximum number of octets of data to record, 0 if no limit
max_elapsed_time	maximum recording period in ms, 0 if no limit
max_silence	maximum period of silence in ms before recording is terminated, 0 if no limit (see also max_initial_silence)

and also by the function sm_record_abort() which will terminate a recording directly.
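When choosing max_octets, it can help to work out how many octets a given duration occupies. The sketch below uses the sample sizes quoted in the field descriptions (for example 8 bits for A-law and mu-law, 4 bits for OKI ADPCM, 16 bits for 16-bit linear). It assumes the recorded stream is nothing but packed samples, with no header or framing overhead; that is an assumption, not something this page states. octets_for_duration() is a hypothetical helper.

```c
/* Octets needed to hold duration_ms of audio at rate_hz samples per second
 * and bits_per_sample bits per sample, rounded up to a whole octet.
 * Assumes plain packed samples with no header or framing overhead. */
unsigned long octets_for_duration(unsigned bits_per_sample,
                                  unsigned long rate_hz,
                                  unsigned long duration_ms)
{
    /* Work in a wide type so the intermediate product cannot overflow
     * for realistic telephony rates and durations. */
    unsigned long long bits =
        (unsigned long long)bits_per_sample * rate_hz * duration_ms / 1000ULL;

    return (unsigned long)((bits + 7ULL) / 8ULL);
}
```

For example, ten seconds of A-law at 8000 samples per second comes to 8 x 8000 x 10 = 640000 bits, or 80000 octets.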

If an event has been previously associated with a channel (see sm_channel_set_event()), then the driver will notify the application with that event whenever (for that channel):

recorded data becomes newly available for collection by sm_get_recorded_data()
recorded data remains available for collection by sm_get_recorded_data()
recording terminates due to one of the termination criteria being met

The channel is reserved for recording until sm_record_status() returns the status kSMRecordStatusComplete. No other recording activity can take place on the channel during this time.

Fields

channel

The channel on which to perform the recording.

alt_data_source

kSMNullChannelId, or another channel whose input or output is to be recorded. If this specifies a channel, that channel must not be reconfigured while recording is taking place.

type

The format in which to record. (See the main text above for compatibility codes that can also be used in this field.) One of these values:

kSMDataFormatNone: Special value for test purposes only. This indicates that the channel should prepare as if it was about to play or record data, but not actually transfer any data.
kSMDataFormatALawPCM: G.711 A-law. This uses 8 bits per sample.
kSMDataFormatULawPCM: G.711 mu-law. This uses 8 bits per sample.
kSMDataFormatOKIADPCM: A 4-bit coding scheme.
kSMDataFormatACUBLKPCM: This format is obsolete, as cards fitted with SHARC DSPs are no longer supported. It has never been implemented for Prosody X cards.
kSMDataFormat16bit: 16-bit linear coding, where each sample is a signed value (-32768 to 32767). The first octet of each sample is the less significant one.
kSMDataFormat8bit: 8-bit unsigned linear coding, where each sample is an unsigned value (0 to 255). This is Microsoft's 8-bit format.
kSMDataFormatSigned8bit: 8-bit linear coding, where each sample is a signed value (-128 to 127).
kSMDataFormatIMAADPCM: A 4-bit coding scheme standardised by the Interactive Multimedia Association (IMA).
kSMDataFormatSpeex: A patent and royalty-free speech compression codec. Use of the functions sm_replay_start() and sm_record_start() only allows playback and recording using the default "narrowband" Speex configuration. Other operating modes and parameters will be made available via new API calls.

silence_elimination

The maximum duration (in ms) of silence to record. Silences longer than this are truncated to this length. The value zero disables silence elimination. Requires the module grunt.

tone_elimination_mode

What types of tones to eliminate from the recording. This allows the same tone detection as sm_listen_for(). Requires the module td unless the value is kSMToneDetectionNone. One of these values:

kSMToneDetectionNone: Simple tones never recognised.
kSMToneDetectionNoMinDuration: Simple tone detection enabled, no minimum period. If the correct frequencies are detected with the correct signal to noise ratio, twist, etc. for however short a duration, the tone is considered to be present and is recognised.
kSMToneDetectionMinDuration64: Simple tone detection enabled; the tone must be valid for a minimum period to be detected. If the tone is valid for 64 ms it will definitely be detected. Tones of shorter duration, between 32 ms and 64 ms, may be detected, but detection cannot be guaranteed. The minimum duration of a tone can be increased by setting the parameter kAdjustToneSetIntParamIdMinOnTime with sm_adjust_input_tone_set().
kSMToneDetectionMinDuration40: This mode uses a slightly more complex algorithm for analysing the duration of a valid tone, and enables robust detection of tones with a duration as short as 40 ms.
kSMToneEndDetectionNoMinDuration: This mode is like kSMToneDetectionNoMinDuration, but the application is also notified when the end of the tone is detected.
kSMToneEndDetectionMinDuration64: This mode is like kSMToneDetectionMinDuration64, but the application is also notified when the end of the tone is detected.
kSMToneEndDetectionMinDuration40: This mode is like kSMToneDetectionMinDuration40, but the application is also notified when the end of the tone is detected.
kSMToneLenDetectionNoMinDuration: This mode is like kSMToneEndDetectionNoMinDuration, but returns additional tone duration information to the application.
kSMToneLenDetectionMinDuration64: This mode is like kSMToneEndDetectionMinDuration64, but returns additional tone duration information to the application.
kSMToneLenDetectionMinDuration40: This mode is like kSMToneEndDetectionMinDuration40, but returns additional tone duration information to the application.
kSMToneDetectionAsListenFor: This mode is only valid when specified in the parameters for sm_record_start() and a tone detection mode is currently active on the same channel, started by sm_listen_for(). Any tones detected on the same channel as the recording will be eliminated from the recorded data.

tone_elimination_set_id

The tone set to use (only relevant if tone_elimination_mode is not kSMToneDetectionNone). See sm_listen_for() for details of how to select an input tone set.

max_octets

The maximum amount of data to record. The value zero indicates no maximum.

max_elapsed_time

The maximum duration of the recording in ms. The value zero indicates no maximum. Requires the module timerx.

max_silence

The maximum silence permitted (in ms). The value zero indicates no maximum. Silences longer than this cause the recording to terminate. Requires the module grunt.

agc

Indicates whether automatic gain control is to be enabled (non-zero) or not (zero). Requires the module gainbg.

volume

The desired adjustment to the volume (dB). The range of gain supported is at least +8 to -22 dB. Requires the module gainbg.

alt_data_source_type

If an alt_data_source channel is specified, which kind of data associated with that channel should be recorded. One of these values:

kSMRecordAltSourceDefault: If alt_data_source is an input only channel, then data switched to this channel input will be recorded, otherwise the data being generated on this channel output will be recorded (this feature is normally used to record conferenced outputs). This value is deprecated since it is equivalent to either kSMRecordAltSourceInput or kSMRecordAltSourceOutput which could be used instead.
kSMRecordAltSourceInput: Data switched to alt_data_source input will be recorded. This value is deprecated since several channels can take input from the same timeslot and that is a more straightforward way of achieving the same result.
kSMRecordAltSourceOutput: Data generated on alt_data_source output will be recorded.

sampling_rate

The sampling rate at which to record the data. Currently supported values are:

0 - record at the rate reported via sm_record_status().
8000 - the typical rate for telephony, since it is the rate at which telephone networks themselves operate.
6000 - a rate which reduces file sizes at the cost of lower quality.
11000 - a rate convenient for use with typical PC soundcards. This is sufficiently close to a quarter of the rate used by CDs (44100 Hz) that the difference is not significant, allowing almost universal compatibility with cheap PC soundcards which can handle 11025 Hz sampling.

Note that when you specify a non-zero value here, this function assumes that the source of the data to be recorded is providing data at 8000 samples per second. The use of data at other rates is not supported and will cause the data to be recorded at an incorrect sampling rate. Consequently, the use of a non-zero value in this field is deprecated.

min_noise_level

The minimum level, in dBm0, that the noise estimate of the grunt detector may reach. The default is -55 dBm0. Only used if silence_elimination or max_silence is non-zero. Requires the module grunt.

grunt_threshold

The threshold, in dB, above the noise estimate of the grunt detector at which a signal is considered present. The default is 15 dB. Only used if min_noise_level is non-zero. Requires the module grunt.
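One way to read min_noise_level and grunt_threshold together is as a floor on the noise estimate plus a margin above it. The sketch below is a simplified model of that reading, not the grunt firmware's actual algorithm: it assumes the noise estimate is simply clamped at min_noise_level and that a signal counts as present when its level exceeds the clamped estimate by more than grunt_threshold. signal_present() is a hypothetical helper.

```c
/* Simplified model (an assumption, not the firmware's algorithm):
 * clamp the noise estimate at min_noise_level_dbm0, then report a signal
 * when the measured level is more than grunt_threshold_db above it.
 * Returns 1 if a signal is considered present, else 0. */
int signal_present(double level_dbm0, double noise_estimate_dbm0,
                   double min_noise_level_dbm0, double grunt_threshold_db)
{
    double noise_floor = noise_estimate_dbm0;

    /* The noise estimate may not drop below the configured minimum. */
    if (noise_floor < min_noise_level_dbm0)
        noise_floor = min_noise_level_dbm0;

    return level_dbm0 > noise_floor + grunt_threshold_db;
}
```

With the documented defaults (-55 dBm0 and 15 dB) and a very quiet line, this model would treat anything above -40 dBm0 as signal.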

grunt_holdoff

The period, in ms, following start of speech, to disable updating the estimate of the background noise energy (a non-zero period, typically 1000ms, can be required when long periods of uninterrupted speech are expected). Requires the module grunt.

max_initial_silence

If both max_silence and this parameter are non-zero, this parameter specifies the maximum period of silence allowed, in ms, before the start of speech, and max_silence then specifies the maximum period of silence allowed after the start of speech. Requires the module grunt.
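The interaction between max_silence and max_initial_silence can be sketched as a function that returns the silence limit in force at a given moment. It follows the wording above literally: max_initial_silence only takes effect when both values are non-zero, and otherwise max_silence applies throughout. How the driver behaves when only max_initial_silence is set is not stated here, so treating that case as "no limit" is an assumption. effective_silence_limit() is a hypothetical helper.

```c
/* Returns the silence limit (ms) that applies right now, or 0 for no limit.
 * speech_started is non-zero once speech has been detected.
 * Assumption: max_initial_silence is ignored unless max_silence is also
 * non-zero, following the literal wording of the field description. */
unsigned long effective_silence_limit(unsigned long max_silence,
                                      unsigned long max_initial_silence,
                                      int speech_started)
{
    if (max_silence != 0 && max_initial_silence != 0 && !speech_started)
        return max_initial_silence;   /* waiting for the caller to speak */

    return max_silence;               /* 0 means no silence limit */
}
```

For instance, max_silence = 5000 with max_initial_silence = 10000 gives the caller ten seconds to start speaking, and then ends the recording after five seconds of silence.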

Returns

0 if call completed successfully, otherwise a standard error such as:

ERR_SM_DEVERR - device error
ERR_SM_WRONG_CHANNEL_STATE - if already recording
ERR_SM_WRONG_CHANNEL_TYPE - if attempt to record using output channel
ERR_SM_NOT_SAME_MODULE - alt_data_source channel not located on same module

This function is part of the Prosody speech processing API.