Prosody application note: recording 2-party conversations

Introduction

This application note describes how the Aculab Prosody speech processing API can be used to record multiple two-party conversations efficiently, where efficiency is measured as the maximum number of recordings per Prosody module.

Setting up the conversation

In a conversation between two parties, A and B, the signal from A is transmitted to B, and the signal from B is transmitted to A. Each party has a pair of unidirectional speech paths (incoming and outgoing) that appear as timeslots on the network ports and are connected using the switch driver API.

A common application requirement is to make a recording of the conversation between parties A and B. This may be achieved by adding the incoming signals from A and B using a DSP, and then making a recording of the combined signal.
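To make the idea of "adding the incoming signals" concrete, the following minimal sketch mixes two blocks of 16-bit linear PCM samples in plain C. It is purely conceptual: on Prosody the mixing is done by the DSP firmware, not by application code, and the function names and the clamping behaviour shown here are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Add two 16-bit linear PCM samples, clamping the result to the
 * int16_t range so that loud passages do not wrap around.
 * (Illustrative only; not part of the Prosody API.) */
int16_t mix_sample(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Mix n samples from party A and party B into out. */
void mix_block(const int16_t *a, const int16_t *b, int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = mix_sample(a[i], b[i]);
}
```

A recording made from the mixed block contains both sides of the conversation in a single stream, which is what the approaches described below achieve on the module itself.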

Recording the conversation using signal paths

This is the preferred and most straightforward approach. The steps are:

  1. Create a signal path using sm_path_create().
  2. Feed the signal from one side of the conversation into the path using sm_path_datafeed_connect().
  3. Feed the signal from the other side into a mixer task added to the path using sm_path_mix().
  4. Feed the output of the path to an input channel using sm_path_get_datafeed() and sm_channel_datafeed_connect().

A recording started on this input channel will have the signals from both parties mixed together.

Recording the conversation using conferencing primitives

Alternatively, the mixing can be performed with the Prosody conferencing primitives: a conference output job is created, and the signals from A and B are nominated as the inputs to this conference.

In order to economise on channels and external connections, rather than switching the conference output back into a separate input, Prosody can be made to record the output of the conference job directly.

This is achieved by using the optional parameter alt_data_source when starting the recording.

Channel allocation

For each conversation, Prosody channels are allocated using sm_channel_alloc_placed(). Four channels are required: two for the incoming signals A and B, one for the conference output, and one for the recording.

  1. The first input A is allocated as a channel that is capable of input, e.g. kSMChannelTypeInput.
  2. The second input B is allocated as a channel that is capable of input, e.g. kSMChannelTypeInput.
  3. The combined channel sumofAB, whose signal is generated by the conference job, is the output channel. In this instance, type should be set to kSMChannelTypeOutput.
  4. The recording channel recAB is allocated as a channel that is capable of input, e.g. kSMChannelTypeInput.
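The list above implies a fixed cost per recorded conversation: three input-capable channels (A, B and recAB) and one output-capable channel (sumofAB). As a rough planning aid, the sketch below computes how many conversations fit within a given channel budget. The helper name and the idea of separate input and output budgets are assumptions for illustration; the real capacity of a module depends on its hardware and firmware configuration and is not stated in this note.

```c
/* Channels consumed by one recorded conversation, per the allocation
 * list: A, B and recAB are input-capable, sumofAB is output-capable. */
enum {
    INPUTS_PER_CONVERSATION = 3,
    OUTPUTS_PER_CONVERSATION = 1
};

/* Hypothetical helper: number of conversations that fit within the
 * given budgets of input-capable and output-capable channels. */
int max_recorded_conversations(int input_channels, int output_channels)
{
    if (input_channels < 0 || output_channels < 0)
        return 0;
    int by_input = input_channels / INPUTS_PER_CONVERSATION;
    int by_output = output_channels / OUTPUTS_PER_CONVERSATION;
    return by_input < by_output ? by_input : by_output;
}
```

With assumed budgets of 30 input and 30 output channels, for instance, the input side is the limit and ten conversations fit.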

Prosody firmware configurations

The TiNG modules inchan and outchan are required for input and output of speech data. The module conf is required for the conferencing and, depending on the recording format desired, an appropriate recording module will be needed (for example, recA for A-law).

Starting the Prosody jobs

Following on from the channel allocation above, the jobs are started in this order:

  1. The conference should be set up on the output channel sumofAB using the conferencing primitive sm_conf_prim_start(), with channel set to the channel allocated to sumofAB. At this point, as it has no input signals assigned, the primitive conference is doing nothing.
  2. Add the two input signals to the primitive conference using sm_conf_prim_add(), with channel set to channel sumofAB, and participant set first to channel A, and then channel B. Channel sumofAB will then consist of channels A and B mixed together.
  3. Record the conference using sm_record_start(), sm_record_file_start(), or sm_record_wav_start(), with channel set to recAB (this has the recording resources), but alt_data_source set to channel sumofAB (this is the signal we want to record).

Source code illustration

The following code example shows a two-party recording being set up. For conciseness, it merely returns if an error occurs. A real application would need to clean up (e.g. by freeing any channels already allocated), report exactly which function returned the error, and so on.

The code assumes you have selected appropriate streams and timeslots on the Prosody module. The two inputs will use the specified streams and timeslots. The conference sum needs to be switched to a timeslot to provide a buffer to store the output data until it is recorded. In this example, the conference sum will use the output corresponding to the first input (but since that isn't switched anywhere it doesn't matter where it goes as long as it doesn't conflict with another output). Normally, there is a trivial way to allocate these timeslots (for example by deriving them from the port and timeslot on a bearer circuit).
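As an illustration of deriving module timeslots from a bearer circuit, the sketch below maps a network port and bearer timeslot to a stream/timeslot pair and checks two pairs for a conflict. Everything in it is hypothetical: the struct, the function names, the base stream number and the one-stream-per-port layout are assumptions, and the actual stream numbering must be taken from the documentation for the card in use.

```c
#include <stdbool.h>

/* A stream/timeslot pair on the Prosody module
 * (illustrative type, not part of the Prosody API). */
struct module_slot {
    int stream;
    int ts;
};

/* Hypothetical mapping: one module stream per network port,
 * reusing the bearer timeslot number unchanged. */
struct module_slot slot_for_bearer(int base_stream, int port, int bearer_ts)
{
    struct module_slot s;
    s.stream = base_stream + port;
    s.ts = bearer_ts;
    return s;
}

/* True if two slots refer to the same stream and timeslot. */
bool slots_conflict(struct module_slot x, struct module_slot y)
{
    return x.stream == y.stream && x.ts == y.ts;
}
```

Because each bearer circuit has a unique port and timeslot, a mapping of this kind gives each conversation its own slots without any bookkeeping.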

#include <string.h>	// memset; the Prosody API headers are also required

int setup2party_recording(tSMModuleId module,
	int streamA, int tsA, enum kSMTimeslotType tstypeA,
	int streamB, int tsB, enum kSMTimeslotType tstypeB)
{
 tSMChannelId chanA;
 tSMChannelId chanB;
 tSMChannelId sumofAB;
 tSMChannelId recAB;
 int err;
 {					// allocate A
  SM_CHANNEL_ALLOC_PLACED_PARMS pp;
  memset(&pp, 0, sizeof(pp));
  pp.module = module;
  pp.type = kSMChannelTypeInput;
  err = sm_channel_alloc_placed(&pp);
  if (err) return err;
  chanA = pp.channel;
 }
 {					// switch A's incoming timeslot to chanA
  SM_SWITCH_CHANNEL_PARMS pp;
  memset(&pp, 0, sizeof(pp));
  pp.channel = chanA;
  pp.st = streamA;
  pp.ts = tsA;
  pp.type = tstypeA;
  err = sm_switch_channel_input(&pp);
  if (err) return err;
 }
 {					// allocate B
  SM_CHANNEL_ALLOC_PLACED_PARMS pp;
  memset(&pp, 0, sizeof(pp));
  pp.module = module;
  pp.type = kSMChannelTypeInput;
  err = sm_channel_alloc_placed(&pp);
  if (err) return err;
  chanB = pp.channel;
 }
 {					// switch B's incoming timeslot to chanB
  SM_SWITCH_CHANNEL_PARMS pp;
  memset(&pp, 0, sizeof(pp));
  pp.channel = chanB;
  pp.st = streamB;
  pp.ts = tsB;
  pp.type = tstypeB;
  err = sm_switch_channel_input(&pp);
  if (err) return err;
 }
 {					// allocate sumofAB
  SM_CHANNEL_ALLOC_PLACED_PARMS pp;
  memset(&pp, 0, sizeof(pp));
  pp.module = module;
  pp.type = kSMChannelTypeOutput;
  err = sm_channel_alloc_placed(&pp);
  if (err) return err;
  sumofAB = pp.channel;
 }
 {					// give sumofAB an output timeslot to buffer the mix
  SM_SWITCH_CHANNEL_PARMS pp;
  memset(&pp, 0, sizeof(pp));
  pp.channel = sumofAB;
  pp.st = streamA;
  pp.ts = tsA;
  pp.type = tstypeA;
  err = sm_switch_channel_output(&pp);
  if (err) return err;
 }
 {					// allocate recAB
  SM_CHANNEL_ALLOC_PLACED_PARMS pp;
  memset(&pp, 0, sizeof(pp));
  pp.module = module;
  pp.type = kSMChannelTypeInput;
  err = sm_channel_alloc_placed(&pp);
  if (err) return err;
  recAB = pp.channel;
 }
 {					// start the conference
  SM_CONF_PRIM_START_PARMS pp;
  memset(&pp, 0, sizeof(pp));
  pp.channel = sumofAB;
  err = sm_conf_prim_start(&pp);
  if (err) return err;
 }
 {					// add A to the conference
  SM_CONF_PRIM_ADD_PARMS pp;
  memset(&pp, 0, sizeof(pp));
  pp.channel = sumofAB;
  pp.participant = chanA;
  err = sm_conf_prim_add(&pp);
  if (err) return err;
 }
 {					// add B to the conference
  SM_CONF_PRIM_ADD_PARMS pp;
  memset(&pp, 0, sizeof(pp));
  pp.channel = sumofAB;
  pp.participant = chanB;
  err = sm_conf_prim_add(&pp);
  if (err) return err;
 }
 {					// start recording
  SM_RECORD_PARMS pp;
  memset(&pp, 0, sizeof(pp));
  pp.channel = recAB;
  pp.alt_data_source = sumofAB;
  pp.alt_data_source_type = kSMRecordAltSourceOutput;
  err = sm_record_start(&pp);
  if (err) return err;
 }
 /*
  * process recording until call is cleared
  */
 err = sm_channel_release(recAB);	// release all the channels
 if (err) return err;
 err = sm_channel_release(sumofAB);
 if (err) return err;
 err = sm_channel_release(chanA);
 if (err) return err;
 err = sm_channel_release(chanB);
 if (err) return err;
 return 0;
}
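
The listing above returns immediately on error, which would leave any channels already allocated still held. One way a real application might structure the clean-up is sketched below. To keep it self-contained, stub_alloc() and stub_release() are hypothetical stand-ins for the Prosody allocate and release calls, and the counters exist only so that the behaviour can be exercised without a Prosody module.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the Prosody channel type and calls. */
typedef int channel_id;
#define NO_CHANNEL (-1)

int next_id = 1;
int fail_on_alloc = 0;  /* 1-based index of the allocation to fail, 0 = none */
int alloc_calls = 0;
int live_channels = 0;  /* channels currently held */

int stub_alloc(channel_id *out)
{
    if (++alloc_calls == fail_on_alloc)
        return -1;
    *out = next_id++;
    live_channels++;
    return 0;
}

void stub_release(channel_id id)
{
    if (id != NO_CHANNEL)
        live_channels--;
}

/* Allocate the four channels of one conversation. On any failure,
 * release the channels already held, in reverse order, and return
 * the error so that the caller is left with nothing to free. */
int alloc_conversation(channel_id ch[4])
{
    int err = 0;
    size_t i;

    for (i = 0; i < 4; i++)
        ch[i] = NO_CHANNEL;
    for (i = 0; i < 4; i++) {
        err = stub_alloc(&ch[i]);
        if (err)
            break;
    }
    if (err) {
        while (i-- > 0) {
            stub_release(ch[i]);
            ch[i] = NO_CHANNEL;
        }
    }
    return err;
}
```

The same unwinding idea extends to the later steps: if starting the conference or the recording fails, the channels allocated so far are released before the error is returned.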

Document reference: AN 1348