Prosody application note: Dealing with Prosody X connection failure

Determining connection failure

Prosody uses UDP to communicate with a Prosody X card. UDP is an 'unreliable' protocol, meaning packets can be discarded with no indication to either the sender or receiver. Occasional packet loss, and brief interruptions to the network that cause it, must therefore be tolerated. Prosody uses a protocol known as ASSP to manage the acknowledgement and retransmission of packets as needed. This is handled by the API library and needs no user intervention. Usually the only API-visible symptoms of significant, but temporary, packet loss are replay underruns and recording overruns.

There are many ways in which the transmission of packets between the controlling application and the Prosody X can be interrupted. These include cables being removed and network switch failure. In many cases, the sockets interface on the controlling application's host system provides no indication of network failure, so the failure is indistinguishable from packet loss. When the sockets interface does report errors, the API library will reset the connection and start returning errors, such as ERR_SM_DISCONNECTED. In these circumstances, the clean-up procedure given below should be followed.

Since the API library is not always able to determine whether packet loss is caused by transient conditions or by a permanent failure, external methods must be used. The Aculab 'Resource management API' provides methods for tracking which cards are currently reachable and available for use.

Coping with connection failure

When communication with a Prosody X card has failed, the API is no longer informed of changes to the state of any tasks running on the card. Events, except the module event, will not become signalled and blocking functions will not return. In situations where terminating the application is not an option, special care must be taken. It is necessary to avoid all blocking functions and never to wait indefinitely on Prosody events alone. Caution is needed because some functions that normally do not block may do so under certain circumstances. For example, sm_put_replay_data() can block if sm_replay_status() does not report kSMReplayStatusHasCapacity.
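The rule of never waiting indefinitely can be sketched as a bounded polling loop. This is a minimal illustration rather than Prosody code: struct fake_channel, fake_has_capacity() and fake_put_data() are hypothetical stand-ins for a replay channel and for application wrappers around sm_replay_status() and sm_put_replay_data(), whose real parameter structures are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a replay channel (not a Prosody type). */
struct fake_channel {
    int polls_until_capacity; /* negative: card unreachable, never has capacity */
    int writes;               /* number of buffers written so far */
};

/* Stand-in for a wrapper that calls sm_replay_status() and tests for
 * kSMReplayStatusHasCapacity. */
static bool fake_has_capacity(struct fake_channel *ch)
{
    if (ch->polls_until_capacity < 0)
        return false;
    if (ch->polls_until_capacity == 0)
        return true;
    ch->polls_until_capacity--;
    return false;
}

/* Stand-in for a wrapper around sm_put_replay_data(). */
static void fake_put_data(struct fake_channel *ch)
{
    ch->writes++;
}

/* Write one buffer only once capacity is reported; give up after max_polls
 * checks so that an unreachable card cannot stall the caller. */
bool feed_replay_bounded(struct fake_channel *ch, int max_polls)
{
    assert(ch != NULL);
    for (int i = 0; i < max_polls; i++) {
        if (fake_has_capacity(ch)) {
            fake_put_data(ch);
            return true;
        }
        /* A real application would wait here with a finite timeout on the
         * channel event together with some non-Prosody event. */
    }
    return false; /* treat as a possible connection failure */
}
```

A false return does not prove that the connection has failed; it only tells the application to stop waiting and consult an external source, such as the 'Resource management API'.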

Prosody API functions that will block

The functions sm_play_tone(), sm_play_cptone(), and sm_play_digits() will block when wait_for_completion is non-zero. The functions sm_replay_abort() and sm_vmprx_get_ports() will block when nowait is zero.

The function sm_conf_prim_abort() waits for confirmation that the conference has been aborted. The function sm_conf_prim_stop() with a non-zero no_wait should be used instead, along with sm_conf_prim_status() to determine when the conference has been stopped. The high level conference API uses sm_conf_prim_abort() and is therefore subject to the same blocking.
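One way to structure the non-blocking stop is a small state tracker with a deadline. The sketch below is an assumption-laden illustration: the type and function names are hypothetical, the application is assumed to have called sm_conf_prim_stop() without waiting just before conf_stop_begin(), and the reported_stopped flag is assumed to be derived from sm_conf_prim_status(). Times are plain millisecond counts supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states for tracking a conference stop request. */
enum conf_stop_state { CONF_STOP_PENDING, CONF_STOPPED, CONF_STOP_GAVE_UP };

struct conf_stop {
    enum conf_stop_state state;
    long deadline_ms; /* time after which the application stops waiting */
};

/* Record that a non-blocking stop has just been requested. */
void conf_stop_begin(struct conf_stop *cs, long now_ms, long timeout_ms)
{
    assert(timeout_ms >= 0);
    cs->state = CONF_STOP_PENDING;
    cs->deadline_ms = now_ms + timeout_ms;
}

/* Call periodically. reported_stopped is what the status query said.
 * Once the state leaves CONF_STOP_PENDING it never changes again. */
enum conf_stop_state conf_stop_poll(struct conf_stop *cs,
                                    bool reported_stopped, long now_ms)
{
    if (cs->state != CONF_STOP_PENDING)
        return cs->state;
    if (reported_stopped)
        cs->state = CONF_STOPPED;
    else if (now_ms >= cs->deadline_ms)
        cs->state = CONF_STOP_GAVE_UP; /* proceed to local clean up */
    return cs->state;
}
```

The timeout value is an application choice; nothing in this note prescribes one.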

Releasing resources

Waiting for tasks to stop is not possible, but channels, VMP[rx], VMP[tx], RTCP handlers, etc. still need to be cleaned up. It is important, however, to ensure that there is no attempt to use resources after they have been freed. Channels should be released with sm_channel_release() and any associated events freed with smd_ev_free(). Other items should be destroyed using the appropriate function:

Item                 Destroy function
VMP[tx]              sm_vmptx_destroy()
VMP[tx] toneset      sm_vmptx_destroy_toneset()
VMP[tx] CSRC list    sm_vmptx_destroy_csrc_list()
VMP[rx]              sm_vmprx_destroy()
RTCP handler         sm_rtcphand_destroy()
FMP[tx]              sm_fmptx_destroy()
FMP[rx]              sm_fmprx_destroy()
Data path            sm_path_destroy()
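To guard against using a resource after it has been freed, an application might track each handle alongside its destroy function and clear the handle on release. The following is a sketch under that assumption: struct tracked, tracked_release() and count_destroy() are hypothetical, and in a real application each destroy callback would wrap the appropriate function from the table above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type; a real one would wrap e.g. sm_vmprx_destroy(). */
typedef void (*destroy_fn)(void *handle);

struct tracked {
    void *handle;       /* NULL once the resource has been released */
    destroy_fn destroy; /* how to destroy this kind of resource */
};

/* Destroy a resource at most once. Returns 1 if it was destroyed by this
 * call, 0 if it had already been released. */
int tracked_release(struct tracked *t)
{
    if (t->handle == NULL)
        return 0;
    t->destroy(t->handle);
    t->handle = NULL; /* any later use can be detected by testing for NULL */
    return 1;
}

/* Release every tracked resource; returns how many were destroyed. */
int tracked_release_all(struct tracked *items, size_t n)
{
    int released = 0;
    for (size_t i = 0; i < n; i++)
        released += tracked_release(&items[i]);
    return released;
}

/* Stand-in destroy function for demonstration: counts its invocations. */
static void count_destroy(void *handle)
{
    ++*(int *)handle;
}
```

Code that uses a tracked resource should test handle for NULL first, which makes a second release or a late use a harmless no-op instead of an error.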

Closing the module

All resources associated with a module must be released before it is closed. It is an error not to do so. The function sm_close_module() can be used to close the module. However, this function resets the ASSP connection to the module and blocks until the DSP confirms that the reset was successful, or until it can be sure that no packets related to the connection are still in transit. This can take up to 10 minutes, which may be undesirable. Alternatively, the closing of the module can be initiated using sm_shutdown_module(). When the module can be closed without blocking, the module event, obtained using sm_module_get_event(), will become signalled and sm_module_status() will return kSMModuleStatusShutdown. At this point, sm_close_module() will no longer block. Once all the modules on a card have been closed, the card itself can be closed.
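The non-blocking close sequence can be modelled as a small state machine per module. This sketch is illustrative only: the enum, struct card and the three functions are hypothetical names, and the comments mark where an application would call sm_shutdown_module(), sm_module_status() and sm_close_module().

```c
#include <assert.h>
#include <stdbool.h>

enum mod_state { MOD_OPEN, MOD_SHUTTING_DOWN, MOD_CLOSED };

#define MAX_MODULES 8 /* arbitrary limit chosen for this sketch */

struct card {
    enum mod_state mods[MAX_MODULES];
    int count; /* number of modules in use on this card */
};

/* Start closing a module; the application would call sm_shutdown_module()
 * here, which initiates the close without waiting for it to finish. */
void module_begin_shutdown(enum mod_state *m)
{
    if (*m == MOD_OPEN)
        *m = MOD_SHUTTING_DOWN;
}

/* Call when the module event is signalled. status_is_shutdown says whether
 * the status query reported kSMModuleStatusShutdown. Only then would the
 * application call sm_close_module(), which no longer blocks. */
void module_on_event(enum mod_state *m, bool status_is_shutdown)
{
    if (*m == MOD_SHUTTING_DOWN && status_is_shutdown)
        *m = MOD_CLOSED;
}

/* The card may be closed only once every module on it has been closed. */
bool card_can_close(const struct card *c)
{
    assert(c->count >= 0 && c->count <= MAX_MODULES);
    for (int i = 0; i < c->count; i++)
        if (c->mods[i] != MOD_CLOSED)
            return false;
    return true;
}
```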

Resources on the DSP module

The DSP will clean up all resources related to an ASSP connection when it receives an ICMP 'unreachable' packet. As this will only occur in response to an outgoing packet, the timing of the clean up of DSP resources depends on the current activity. To prevent DSP resources from becoming orphaned, the ASSP connection is occasionally validated (every 10 minutes) during idle periods by an exchange of packets.

When connectivity is restored

Application has not started to clean up

If the DSP has not noticed the break in connectivity, then the application can proceed as normal. Depending on the duration of the break, there may be replay underruns and recording overruns.

If, however, the DSP has already cleaned up the resources the application was using, the connection to the module will be reset. This will cause various functions (usually the status functions) to return errors, such as ERR_SM_DISCONNECTED. When this occurs the application should clean up as described above before using the module again.
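A simple way to make sure the clean up happens before the module is used again is to latch the disconnection in a per-module flag. The sketch below assumes every API return code is routed through one checking function; APP_ERR_DISCONNECTED is a placeholder value standing in for ERR_SM_DISCONNECTED, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder for ERR_SM_DISCONNECTED; the real value comes from the
 * Prosody headers and is not assumed here. */
#define APP_ERR_DISCONNECTED (-1000)

struct module_ctx {
    bool needs_cleanup; /* set once a disconnection error has been seen */
};

/* Pass every API return code through this function. A disconnection error
 * latches the flag; later successful codes do not clear it. */
int note_result(struct module_ctx *ctx, int rc)
{
    if (rc == APP_ERR_DISCONNECTED)
        ctx->needs_cleanup = true;
    return rc;
}

/* The module may be used only while no clean up is outstanding. */
bool module_usable(const struct module_ctx *ctx)
{
    return !ctx->needs_cleanup;
}

/* Call after resources have been released and the module closed. */
void module_cleanup_done(struct module_ctx *ctx)
{
    ctx->needs_cleanup = false;
}
```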

Application has closed the module

Methods external to the Prosody API must be used to determine when an application is able to use a module again. Whatever method is used to decide when a module is available, the firmware loaded on it must be in a known state before it is used. It is recommended that when a module becomes available for use, the firmware is reloaded.

The Aculab 'Resource management API' can be used to determine which cards are available for application use. If it has been configured to load firmware on to the card, it will do so and report when this has been done.