Prosody speech processing: API: sm_record_start
Prototype Definition
int sm_record_start(struct sm_record_parms *recordp)
Parameters
- *recordp
- A structure of the following type:
typedef struct sm_record_parms {
    tSMChannelId channel;                        /* in */
    tSMChannelId alt_data_source;                /* in */
    enum kSMDataFormat type;                     /* in */
    tSM_UT32 silence_elimination;                /* in */
    enum kSMToneDetection tone_elimination_mode; /* in */
    tSM_UT32 tone_elimination_set_id;            /* in */
    tSM_UT32 max_octets;                         /* in */
    tSM_UT32 max_elapsed_time;                   /* in */
    tSM_UT32 max_silence;                        /* in */
    tSM_INT agc;                                 /* in */
    tSM_INT volume;                              /* in */
    enum kSMRecordAltSource {
        kSMRecordAltSourceDefault,
        kSMRecordAltSourceInput,
        kSMRecordAltSourceOutput,
    } alt_data_source_type;                      /* in */
    tSM_UT32 sampling_rate;                      /* in */
    double min_noise_level;                      /* in */
    double grunt_threshold;                      /* in */
    tSM_UT32 grunt_holdoff;                      /* in */
    tSM_UT32 max_initial_silence;                /* in */
} SM_RECORD_PARMS;
Description
This call starts a new recording job using the specified
channel.
Normally alt_data_source is set to kSMNullChannelId, and the data
recorded is whatever is switched to this input channel. If, however,
alt_data_source is set to the channel id of another existing channel,
then the data source for the recording is determined by the value of
alt_data_source_type. Note that the channel specified in
alt_data_source must not be reconfigured while this recording is in
progress. When alt_data_source_type selects the output of a channel,
the output datafeed from that channel must be referenced by calling
sm_channel_get_datafeed() (or the legacy sm_switch_channel_output())
before starting the recording.
The PCM data received will be encoded into buffers in the format
specified by the type parameter, which takes a value from the same
range of values permitted in the type parameter of sm_replay_start().
Note that, for compatibility with earlier releases of Prosody, many
other values are permitted for the type field. These compatibility
values specify a combination of data type and sampling rate. When one
of these is used in the type field, the sampling_rate field must be
zero, and the actual rate used will be as listed here. They are:
compatibility code | new code: type | new code: sampling rate |
kSMDataFormat8KHzALawPCM | kSMDataFormatALawPCM | 8000 |
kSMDataFormat8KHzULawPCM | kSMDataFormatULawPCM | 8000 |
kSMDataFormat8KHzOKIADPCM | kSMDataFormatOKIADPCM | 8000 |
kSMDataFormat8KHzACUBLKPCM | kSMDataFormatACUBLKPCM | 8000 |
kSMDataFormat6KHzALawPCM | kSMDataFormatALawPCM | 6000 |
kSMDataFormat6KHzULawPCM | kSMDataFormatULawPCM | 6000 |
kSMDataFormat6KHzOKIADPCM | kSMDataFormatOKIADPCM | 6000 |
kSMDataFormat6KHzACUBLKPCM | kSMDataFormatACUBLKPCM | 6000 |
kSMDataFormat8KHz16bitMono | kSMDataFormat16bit | 8000 |
kSMDataFormat8KHz8bitMono | kSMDataFormat8bit | 8000 |
kSMDataFormat8KHzSigned8bitMono | kSMDataFormatSigned8bit | 8000 |
kSMDataFormatIMAADPCM | kSMDataFormatIMAADPCM | 8000 |
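The table above can also be expressed as a small lookup. The following
is a minimal sketch and not part of the Prosody API: it works on the
constant names as strings, so it does not depend on the numeric values
defined in the Prosody headers, and the names compat_row and
compat_lookup are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* One row of the compatibility table: legacy code, equivalent new
 * type code, and the sampling rate the legacy code implies. */
struct compat_row {
    const char *compat_code;
    const char *new_type;
    unsigned sampling_rate;
};

static const struct compat_row compat_table[] = {
    {"kSMDataFormat8KHzALawPCM",        "kSMDataFormatALawPCM",    8000},
    {"kSMDataFormat8KHzULawPCM",        "kSMDataFormatULawPCM",    8000},
    {"kSMDataFormat8KHzOKIADPCM",       "kSMDataFormatOKIADPCM",   8000},
    {"kSMDataFormat8KHzACUBLKPCM",      "kSMDataFormatACUBLKPCM",  8000},
    {"kSMDataFormat6KHzALawPCM",        "kSMDataFormatALawPCM",    6000},
    {"kSMDataFormat6KHzULawPCM",        "kSMDataFormatULawPCM",    6000},
    {"kSMDataFormat6KHzOKIADPCM",       "kSMDataFormatOKIADPCM",   6000},
    {"kSMDataFormat6KHzACUBLKPCM",      "kSMDataFormatACUBLKPCM",  6000},
    {"kSMDataFormat8KHz16bitMono",      "kSMDataFormat16bit",      8000},
    {"kSMDataFormat8KHz8bitMono",       "kSMDataFormat8bit",       8000},
    {"kSMDataFormat8KHzSigned8bitMono", "kSMDataFormatSigned8bit", 8000},
    {"kSMDataFormatIMAADPCM",           "kSMDataFormatIMAADPCM",   8000},
};

/* Returns the matching table row, or NULL if name is not listed as a
 * compatibility code. */
const struct compat_row *compat_lookup(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof compat_table / sizeof compat_table[0]; i++) {
        if (strcmp(compat_table[i].compat_code, name) == 0)
            return &compat_table[i];
    }
    return NULL;
}
```

For example, compat_lookup("kSMDataFormat6KHzALawPCM") yields the new
type kSMDataFormatALawPCM with a sampling rate of 6000.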
Any form of record requires the module inchan to have been downloaded
in addition to the module that is required for the specific type of
record, and any module required for the sampling rate:
The sampling rate firmware:
sampling rate | extra firmware required |
8000 | - |
6000 | sixkin |
11000 | 8_to_11 |
See
Prosody application note: speech processing replay and record data formats
for more details on data formats supported by
Prosody and their appropriate use.
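As an illustration of the firmware requirements, the sketch below
builds a checklist of module names for a planned recording. It is not
part of the Prosody API: record_features and required_modules are
hypothetical names, the per-feature modules (grunt, td, timerx,
gainbg) are taken from the Fields section of this page, and the sketch
assumes each of those modules is needed only when the corresponding
feature is actually used. The module for the data format itself is
not covered, because this page does not name it.

```c
#include <stddef.h>

#define MAX_RECORD_MODULES 6

/* Hypothetical summary of the record options that affect firmware. */
struct record_features {
    unsigned sampling_rate;     /* effective rate: 8000, 6000 or 11000 */
    int uses_silence_detection; /* silence_elimination or max_silence set */
    int uses_tone_elimination;  /* mode other than kSMToneDetectionNone */
    int uses_max_elapsed_time;  /* max_elapsed_time non-zero */
    int uses_gain;              /* agc or volume non-zero (assumption) */
};

/* Writes module names into out[] and returns how many were written.
 * Returns -1 if the rate is not in the table above or if out[] has
 * room for fewer than MAX_RECORD_MODULES entries. */
int required_modules(const struct record_features *f,
                     const char *out[], size_t cap)
{
    size_t n = 0;
    const char *rate_module = NULL;

    switch (f->sampling_rate) {
    case 8000:  break;                       /* no extra firmware */
    case 6000:  rate_module = "sixkin";  break;
    case 11000: rate_module = "8_to_11"; break;
    default:    return -1;
    }
    if (cap < MAX_RECORD_MODULES)
        return -1;

    out[n++] = "inchan";                     /* always required */
    if (rate_module != NULL)
        out[n++] = rate_module;
    if (f->uses_silence_detection)
        out[n++] = "grunt";
    if (f->uses_tone_elimination)
        out[n++] = "td";
    if (f->uses_max_elapsed_time)
        out[n++] = "timerx";
    if (f->uses_gain)
        out[n++] = "gainbg";
    return (int)n;
}
```

A recording at 6000 samples per second that uses max_silence and
max_elapsed_time would, under these assumptions, list inchan, sixkin,
grunt and timerx.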
The volume parameter is the change in volume, in dB, compared to the
level of the data (e.g. set this to -6 to attenuate by 6 dB). If AGC
and volume are both applied, the requested change in volume is applied
after AGC.
The agc parameter controls whether automatic gain control is applied
to the recorded data. If agc is non-zero then automatic gain control
is applied. Even then, the recording level is still governed by
volume. The behaviour of the AGC algorithm may be controlled by
changing its parameters; see sm_record_agc_adjust_settings() for more
details.
The recorded data may be retrieved by the application through
periodic calls to
sm_get_recorded_data().
The amount of data recorded is determined by the termination criteria
specified in the parameters (max_octets, max_elapsed_time and
max_silence, described under Fields), and also by the function
sm_record_abort(), which terminates a recording directly.
If an event has been previously associated with a channel (see
sm_channel_set_event()),
then the driver will notify the application with that event
whenever (for that channel):
- recorded data becomes newly available for collection by
sm_get_recorded_data()
- recorded data remains available for collection by
sm_get_recorded_data()
- recording terminates due to one of the termination criteria being
met
The channel is reserved for recording until
sm_record_status()
returns the status
kSMRecordStatusComplete. No other recording activity can
take place on the channel during this time.
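The description above can be illustrated by a sketch that prepares the
parameter structure for a plain A-law recording from the channel's own
input. This is a hedged example, not the definitive way to use the
API. So that it compiles on its own, it declares stand-ins for the
Prosody types and constants: the underlying types and numeric values
shown are placeholders, and real code should take them from the
Prosody header files instead. The helper prepare_basic_record is a
hypothetical name. Zeroing min_noise_level and grunt_threshold is
assumed here to leave the grunt detector at its defaults.

```c
#include <string.h>

/* ---- Stand-ins (assumptions, NOT the real Prosody definitions) ---- */
typedef int tSMChannelId;              /* placeholder underlying type */
typedef unsigned long tSM_UT32;        /* placeholder underlying type */
typedef int tSM_INT;                   /* placeholder underlying type */
#define kSMNullChannelId (-1)          /* placeholder value */
enum kSMDataFormat { kSMDataFormatNone, kSMDataFormatALawPCM };
enum kSMToneDetection { kSMToneDetectionNone };

/* Layout copied from the structure documented on this page. */
typedef struct sm_record_parms {
    tSMChannelId channel;
    tSMChannelId alt_data_source;
    enum kSMDataFormat type;
    tSM_UT32 silence_elimination;
    enum kSMToneDetection tone_elimination_mode;
    tSM_UT32 tone_elimination_set_id;
    tSM_UT32 max_octets;
    tSM_UT32 max_elapsed_time;
    tSM_UT32 max_silence;
    tSM_INT agc;
    tSM_INT volume;
    enum kSMRecordAltSource {
        kSMRecordAltSourceDefault,
        kSMRecordAltSourceInput,
        kSMRecordAltSourceOutput,
    } alt_data_source_type;
    tSM_UT32 sampling_rate;
    double min_noise_level;
    double grunt_threshold;
    tSM_UT32 grunt_holdoff;
    tSM_UT32 max_initial_silence;
} SM_RECORD_PARMS;
/* ---- End of stand-ins ---- */

/* Fills *p for an A-law recording of the channel's own input that
 * stops after max_ms milliseconds, or after max_silence_ms of
 * silence, whichever comes first (zero means no limit). */
void prepare_basic_record(SM_RECORD_PARMS *p, tSMChannelId channel,
                          tSM_UT32 max_ms, tSM_UT32 max_silence_ms)
{
    memset(p, 0, sizeof *p);           /* zero = feature off / no limit */
    p->channel = channel;
    p->alt_data_source = kSMNullChannelId;   /* record this channel */
    p->type = kSMDataFormatALawPCM;
    p->tone_elimination_mode = kSMToneDetectionNone;
    p->sampling_rate = 0;              /* non-zero values are deprecated */
    p->max_elapsed_time = max_ms;      /* needs module timerx */
    p->max_silence = max_silence_ms;   /* needs module grunt */
}
```

With the real headers, the filled structure would then be passed to
sm_record_start(), a return value of 0 indicating that the recording
job has started.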
Fields
- channel
- The channel to perform the record.
- alt_data_source
- kSMNullChannelId, or another channel whose input or output is to be
recorded. If this specifies a channel, that channel must not be
reconfigured while recording is taking place.
- type
- The format in which to record. (See the main text above for
compatibility codes that can also be
used in this field.)
One of these values:
- kSMDataFormatNone
- Special value for test purposes only. This indicates that the
channel should prepare as if it was about to play or record
data, but not actually transfer any data.
- kSMDataFormatALawPCM
- G.711 A-law. This uses 8 bits per sample.
- kSMDataFormatULawPCM
- G.711 mu-law. This uses 8 bits per sample.
- kSMDataFormatOKIADPCM
- A 4-bit coding scheme.
- kSMDataFormatACUBLKPCM
- This format is obsolete, as cards fitted with SHARC DSPs are no longer supported.
It has never been implemented for Prosody X cards.
- kSMDataFormat16bit
- 16-bit linear coding, where each sample is a signed value
(-32768 to 32767). The first octet of each sample is the less
significant one.
- kSMDataFormat8bit
- 8-bit unsigned linear coding, where each sample is an unsigned value
(0 to 255). This is Microsoft's 8-bit format.
- kSMDataFormatSigned8bit
- 8-bit linear coding, where each sample is a signed value (-128 to 127).
- kSMDataFormatIMAADPCM
- A 4-bit coding scheme standardised by the Interactive Multimedia
Association (IMA).
- kSMDataFormatSpeex
- A patent and royalty-free speech compression codec. Use of the
functions sm_replay_start() and sm_record_start() only allows playback
and recording using the default "narrowband" Speex configuration.
Other operating modes and parameters will be made available via new
API calls.
- silence_elimination
- The maximum duration (in ms) of silence to record. Silences longer
than this are truncated to this length. The value zero disables
silence elimination.
Requires the module grunt.
- tone_elimination_mode
- What types of tones to eliminate from the recording. This allows
the same tone detection as sm_listen_for().
Requires the module td unless the value is kSMToneDetectionNone.
One of these values:
- kSMToneDetectionNone
- Simple tones are never recognised.
- kSMToneDetectionNoMinDuration
- Simple tone detection enabled, no minimum period. If
the correct frequencies are detected with the correct
signal to noise ratio, twist, etc. for however short a
duration, the tone is considered to be present and is
recognised.
- kSMToneDetectionMinDuration64
- Simple tone detection enabled; the tone must be valid for a minimum
period to be detected. If the tone is valid for 64 ms it will
definitely be detected. Tones of shorter duration, between 32 ms and
64 ms, may be detected but this cannot be guaranteed. The minimum
duration of a tone can be increased by setting the parameter
kAdjustToneSetIntParamIdMinOnTime with sm_adjust_input_tone_set().
- kSMToneDetectionMinDuration40
- This mode uses a slightly more complex algorithm for analysing the
duration of a valid tone, and enables robust detection of tones with
durations as short as 40 ms.
- kSMToneEndDetectionNoMinDuration
- This mode is like kSMToneDetectionNoMinDuration, but the application
is also notified when the end of the tone is detected.
- kSMToneEndDetectionMinDuration64
- This mode is like kSMToneDetectionMinDuration64, but the application
is also notified when the end of the tone is detected.
- kSMToneEndDetectionMinDuration40
- This mode is like kSMToneDetectionMinDuration40, but the application
is also notified when the end of the tone is detected.
- kSMToneLenDetectionNoMinDuration
- This mode is like kSMToneEndDetectionNoMinDuration, but returns
additional tone duration information to the application.
- kSMToneLenDetectionMinDuration64
- This mode is like kSMToneEndDetectionMinDuration64, but returns
additional tone duration information to the application.
- kSMToneLenDetectionMinDuration40
- This mode is like kSMToneEndDetectionMinDuration40, but returns
additional tone duration information to the application.
- kSMToneDetectionAsListenFor
- This mode is only valid when specified in the parameters for
sm_record_start() and a tone detection mode is currently active on the
same channel, started by sm_listen_for(). Any tones detected on the
same channel as the recording will be eliminated from the recorded
data.
- tone_elimination_set_id
- The tone set to use (only relevant if tone_elimination_mode is not
kSMToneDetectionNone). See sm_listen_for() for details of how to
select an input tone set.
- max_octets
- The maximum amount of data to record. The value zero indicates
no maximum.
- max_elapsed_time
- The maximum duration of the recording in ms. The value zero
indicates no maximum.
Requires the module timerx.
- max_silence
- The maximum silence permitted (in ms). The value zero indicates
no maximum. Silences longer than this cause the recording to
terminate.
Requires the module grunt.
- agc
- Indicator of whether automatic gain control is to be enabled
(non-zero) or not (zero).
Requires the module gainbg.
- volume
- The desired adjustment to the volume (dB). The range of gain
supported is at least +8 to -22 dB.
Requires the module gainbg.
- alt_data_source_type
- If an alt_data_source channel is specified, this selects which kind
of data associated with that channel should be recorded.
One of these values:
- kSMRecordAltSourceDefault
- If alt_data_source is an input-only channel, then data switched to
this channel input will be recorded; otherwise the data being
generated on this channel output will be recorded (this feature is
normally used to record conferenced outputs). This value is deprecated
since it is equivalent to either kSMRecordAltSourceInput or
kSMRecordAltSourceOutput, which could be used instead.
- kSMRecordAltSourceInput
- Data switched to the alt_data_source input will be recorded. This
value is deprecated since several channels can take input from the
same timeslot and that is a more straightforward way of achieving the
same result.
- kSMRecordAltSourceOutput
- Data generated on the alt_data_source output will be recorded.
- sampling_rate
- The sampling rate at which to record the data. Currently supported
values are:
- 0 - record at the rate reported via sm_record_status().
- 8000 - the typical rate for telephony, since it is the rate at
which telephone networks themselves operate.
- 6000 - a rate which reduces file sizes at the cost of lower
quality.
- 11000 - a rate convenient for use with typical PC soundcards.
This is sufficiently close to a quarter of the rate used
by CDs (44100 Hz) that the difference is not significant,
allowing almost universal compatibility with cheap PC
soundcards which can handle 11025 Hz sampling.
Note that when you specify a non-zero value here, this function
assumes that the source of the data to be recorded is providing
data at 8000 samples per second. The use of data at other rates is
not supported and will cause the data to be recorded at an incorrect
sampling rate. Consequently, the use of a non-zero value in this
field is deprecated.
- min_noise_level
- The minimum level, in dBm0, that the noise estimate of the grunt
detector may reach. The default is -55 dBm0. Only used if
silence_elimination or max_silence is non-zero.
Requires the module grunt.
- grunt_threshold
- The threshold, in dB, above the noise estimate of the grunt detector
at which a signal is considered present. The default is 15 dB. Only
used if min_noise_level is non-zero.
Requires the module grunt.
- grunt_holdoff
- The period, in ms, following start of speech, to disable updating the estimate of the background noise energy
(a non-zero period, typically 1000ms, can be required when long periods of uninterrupted speech are expected).
Requires the module
grunt.
- max_initial_silence
- If both max_silence and this parameter are non-zero, then this
parameter specifies the maximum period of silence allowed, in ms,
prior to the start of speech, and max_silence specifies the maximum
period of silence allowed after the start of speech.
Requires the module grunt.
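The interaction between max_silence and max_initial_silence described
in the field list above can be sketched as a small decision function.
This is an illustration only, following the wording literally: the
name effective_silence_limit_ms and the speech_started flag are
hypothetical, and the sketch assumes that max_initial_silence has no
effect when max_silence is zero.

```c
/* Returns the silence limit, in ms, that applies at the moment, or 0
 * if silence does not terminate the recording. speech_started is
 * non-zero once speech has been detected in this recording. */
unsigned long effective_silence_limit_ms(unsigned long max_silence,
                                         unsigned long max_initial_silence,
                                         int speech_started)
{
    /* The initial-silence limit only applies when both values are
     * non-zero and no speech has been heard yet. */
    if (max_silence != 0 && max_initial_silence != 0 && !speech_started)
        return max_initial_silence;

    /* Otherwise max_silence governs (0 meaning no limit). */
    return max_silence;
}
```

For example, with max_silence set to 2000 and max_initial_silence set
to 5000, a caller who never speaks is allowed 5000 ms of silence,
while a pause after speech has started is limited to 2000 ms.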
Returns
0 if the call completed successfully, otherwise a standard error such as:
- ERR_SM_DEVERR - device error
- ERR_SM_WRONG_CHANNEL_STATE - if already recording
- ERR_SM_WRONG_CHANNEL_TYPE - if attempt to record using output channel
- ERR_SM_NOT_SAME_MODULE - alt_data_source channel not located on same module
This function is part of the Prosody speech processing API.