Prosody application note: building large conferences distributed over multiple Prosody modules

Introduction

The number of participants that can be included together in a conference on a single Prosody module is limited to the number of calls that can be switched to a single Prosody module, this limit is 64 calls. Conferences with up to this number of participants can be built with either the High level conferencing API or by using the conferencing primitives (such as sm_conf_prim_start()) directly.

Larger conferences can be achieved by switching the conference outputs from one DSP to be the input to a conference on another DSP, and vice versa. There are two simple ways to do this: the broadcast model and the pipeline model, and the total size of the conference is limited in different ways by these two as described below. There's also a more complicated model, the hierarchical model, which allows for even bigger conferences.

Classic Single Module Conference

With a classic single module conference on DSP A that has 64 participants, A0..A63, with each participant having input A<N>.in and output A<N>.out.

This situation arranges for participant A0 to have output to them the combined inputs of A1..A63:

	A0.out = A1.in + A2.in + A3.in + ... + A63.in

and for participant A1 to have output to them the combined inputs of A0 and A2..A63:

	A1.out = A0.in + A2.in + A3.in + ... + A63.in

and so on. Notice that each participant's output is formed from the sum of all the other participant's inputs, with the participant's own input being excluded from this sum.

Distributed Conference - Simplest case

Imagine there are 126 conference participants, A0..A62 and B0..B62, and two DSPs, A and B. This would require:

	A0.out =         A1.in + A2.in + ... + A.62.in + B0.in + B1.in + B2.in + ... + B62.in
	A1.out = A0.in +         A2.in + ... + A.62.in + B0.in + B1.in + B2.in + ... + B62.in

	B0.out = A0.in + A1.in + A2.in + ... + A.62.in +         B1.in + B2.in + ... + B62.in
	B1.out = A0.in + A1.in + A2.in + ... + A.62.in + B0.in +         B2.in + ... + B62.in

Now create a summed output on DSP A:

	A.ASUM.out = A0.in + A1.in +... A62.in

and then switch this output to an input on DSP B, called B.ASUM.in, then on DSP B create B0.out as follows:

	B0.out = B.ASUM.in +         B1.in + B2.in + ... + B62.in
	B1.out = B.ASUM.in + B0.in +         B2.in + ... + B62.in

Similarly, create a summed output B.BSUM.out = B0.in + B1.in + ... + B62.in and then switch this output to an input on DSP A, A.ASUM.in, then on DSP A create A0.out as follows:

	A0.out = A1.in + A2.in ... + A.62.in + A.BSUM.in

In this way, a conference of 126 participants can be built up. This is the simplest case with more than one DSP in use, so it fits into both the broadcast and pipeline models. Bigger conferences fall into at most one of the two models.

Distributed Conference - A Larger Broadcast Model

A larger example would be to have 244 participants A0..A60, B0..B60, C0..C60, D0..D60 and four DSPs A,B,C and D.

Create on DSP A:

	A.ASUM.out = A0.in + A1.in + ... + A60.in

on DSP B:

	B.BSUM.out = B0.in + B1.in + ... + B60.in

on DSP C:

	C.CSUM.out = C0.in + C1.in + ... + C60.in

on DSP D:

	D.DSUM.out = D0.in + D1.in + ... + D.60.in

and switch:

	A.ASUM.out	to B.ASUM.in,	and to C.ASUM.in,	and to D.ASUM.in
	B.BSUM.out	to A.BSUM.in,	and to C.BSUM.in,	and to D.BSUM.in
	C.CSW.out	to A.CSUM.in, 	and to B.CSUM.in,	and to D.CSUM.in
	D.DSUM.out	to A.DSUM.in,	and to B.DSUM.in, 	and to C.DSUM.in

The following diagram illustrates this configuration:

diagram of broadcast model conference

In general, with this model we use N DSPs, each of which provides as an output the sum of its local participants, and each of which gets an input from all the other DSPs. This means that we lose N-1 inputs from each DSP for carrying these sums, leaving 64-(N-1) inputs for real participants, which means we get N * (65 - N) participants. Solving this quadratic, we find the maximum number of participants is 1056 which occurs when we use 32 DSPs, losing 31 inputs per DSP for the sums, hence the 32 * 33 = 1056 total participants. For more DSPs, we lose too many inputs carrying the sums. In this model, the delay through the conference is limited to twice the delay through one DSP (i.e. 16 mS).

Distributed Conference - Pipeline Model

In this model the DSPs are arranged as a pipeline with each DSP passing on to is neighbour on the right the sum of all its own participants' inputs and those of all DSPs to the left, and vice versa.

Using 4 DSPs as in the broadcast model arranged as a pipeline of DSPs, then each would have an input of all participants to its left, and an input of all participants to its right, while also producing outputs to left and right suitable for its neighbours. It is constructed as follows: Create on DSP A:

	A.right.out = A0.in + A1.in +... A62.in

on DSP B:

	B.right.out = B.left.in + B0.in + B1.in +... + B61.in
	B.left.out  =             B0.in + B1.in +... + B61.in + B.right.in

on DSP C:

	C.right.out = C.left.in + C0.in + C1.in +... + C61.in
	C.left.out  =             C0.in + C1.in +... + C61.in + C.right.in

on DSP D:

	D.left.out = D0.in + D1.in +... + D62.in + D.right.in

and switch:

	A.right.out	to B.left.in
	B.left.out	to A.right.in
	B.right.out	to C.left.in
	C.left.out	to B.right.in
	C.right.out	to D.left.in
	D.left.out	to C.right.in

The following diagram illustrates this configuration:

diagram of pipeline model conference

This method has the advantage of being scaleable while sacrificing only two potential participants per DSP. With N DSPs you can get 62 * N + 2 participants. However it does have the disadvantage of incurring a maximum transit delay of N * 8 mS. This maximum delay would be incurred for speech from participants connected to the first DSP in the pipeline being transmitted to participants connected to the last DSP in the pipeline, and vice versa. Therefore, although the maximum size of a conference is not limited by the DSP switching (unlike with the broadcast model), it is limited by the maximum delay you can tolerate. For example, for 1056 participants (where the broadcast model stops, but has an 16 mS delay) you need 17 DSPs (17 * 62 + 2 = 1056) and the delay is 136 mS.

Distributed Conference - Hierarchical Model

The hierarchical model is more complex to set up, but allows bigger conferences than the broadcast model with lower delay than the pipeline model. The simplest hierarchical model uses two levels of DSPs. The bottom layer consists of DSPs (L0, L1, ... L63) each of which have 63 participants. Each of these provides the sum of all its participants up to the next level, and receives from the next level, the sum of all participants on the other DSPs. The next level up consists of a DSP (M0) which is responsible for generating all the sums for the layer below it. The following diagram illustrates this configuration:

diagram of two-level hirearchy conference

For example, the DSP L1 gets the sum L0 + L2 + ... + L63 from M0. Since each of the L0..L63 DSPs has 63 local participants, you can have up to 63 * 64 = 4032 participants - far more than with the biggest broadcast model. And the delay is not excessive. It's 3 * 8 = 24 mS at worst (from one L DSP to another).

We can take this even further. Suppose we add another layer. That gives us this configuration:

diagram of three-level hirearchy conference

We replace L63 with a link to a higher level again. This means we can have 64 mid-level DSPs, each of which has 63 low-level DSPs, each of which has 63 local participants. This gives up to 254016 participants, with a maximum delay of 5 * 8 = 40 mS (which occurs between participants under different M DSPs. In principle, this could be extended even further, with each extra layer adding 16 mS delay and multiplying the number of participants by 63. However, notice the connections to the M DSPs with only three layers. There are 64 M DSPs, and all their inputs are in use, which is 64 * 64 = 4096 timeslots just to interconnect the DSPs. This is the total capacity of a H.110 backplane, leaving no capacity to connect the participants! In practice, a single chassis would probably have only two layers, L0 .. L62 and M0, with the M0 <-> H0 link being a connection to a remote chassis. This uses 64 DSPs in a chassis (which is the current limit, given that you can have at most 16 cards per chassis and at most 4 DSPs per card), but may incur a little extra delay on the M0 <-> H0 link.

Document reference: AN 1340