Because the SIP and RTP protocols were not designed to be secure, the IETF has defined SIPS and SRTP. SIPS is an implementation of SIP over TLS (Transport Layer Security) that addresses authentication, confidentiality and integrity problems. SRTP is a bit different and defines its own cryptographic mechanisms.
During the summer of 2006, Phil Zimmermann (the PGP guy) released an interesting piece of software called Zfone. It is based on ZRTP (an enhancement of SRTP by Zimmermann) and aims to increase the security level of RTP.
If you haven't looked at SIP before, you might be interested in this other article: SIP and RTP: overview of a VoIP communication. SIP is like HTTP: it was designed to share information between a client and a server and, like most protocols, it didn't embed a lot of security. SIPS solves that, and OpenSER provides a Transport Layer Security implementation.
OpenSER is one of the few SIP proxies that implement a TLS layer. As far as I know, it works in only one way: encrypting the communication using the proxy's certificate. This means that OpenSER sends its own signed public certificate to the client and the client uses it in a TLS handshake. In fact, this is exactly the same as in an HTTPS communication, since it's defined in the TLS protocol stack. I won't detail the TLS protocol here. Other articles (in this wiki too, but in French) and Wikipedia can be useful if you are not familiar with TLS. Let's take a look at an OpenSER SIPS communication with the tool SSLDump. OpenSER is configured to provide its signed certificate to the client. The client AND OpenSER need to know the Certificate Authority, so the CA public certificate has to be on both OpenSER and the client's (soft)phone. The following diagram shows a typical TLS handshake.
SSLDump gives more details about the handshake.
New TCP connection #1: client (C) <-> OpenSER(S)
.1 1 0.0907 (0.0907) C>S Handshake
    ClientHello
        Version 3.1
        cipher suites
        Unknown value 0x39
        Unknown value 0x38
        Unknown value 0x35
        TLS_DHE_RSA_WITH_3DES_EDE_CBC_SHA
        TLS_DHE_DSS_WITH_3DES_EDE_CBC_SHA
        TLS_RSA_WITH_3DES_EDE_CBC_SHA
        Unknown value 0x33
        Unknown value 0x32
        Unknown value 0x2f
        TLS_RSA_WITH_RC4_128_SHA
        TLS_RSA_WITH_RC4_128_MD5
        TLS_DHE_RSA_WITH_DES_CBC_SHA
        TLS_DHE_DSS_WITH_DES_CBC_SHA
        TLS_RSA_WITH_DES_CBC_SHA
        TLS_DHE_RSA_EXPORT_WITH_DES40_CBC_SHA
        TLS_DHE_DSS_EXPORT_WITH_DES40_CBC_SHA
        TLS_RSA_EXPORT_WITH_DES40_CBC_SHA
        TLS_RSA_EXPORT_WITH_RC2_CBC_40_MD5
        TLS_RSA_EXPORT_WITH_RC4_40_MD5
        compression methods
        unknown value
        NULL
.1 2 0.0933 (0.0026) S>C Handshake
    ServerHello
        Version 3.1
        session_id=
          bc 1c 46 ac a4 0e 80 ec 0c 00 48 ef cc f8 ae c3
          11 e9 66 e4 7d fd d0 ad 2c 13 55 ff be f5 a5 2c
        cipherSuite Unknown value 0x35
        compressionMethod NULL
.1 3 0.0934 (0.0000) S>C Handshake
    Certificate
.1 4 0.0934 (0.0000) S>C Handshake
    ServerHelloDone
.1 5 0.2233 (0.1299) C>S Handshake
    ClientKeyExchange
.1 6 0.2233 (0.0000) C>S ChangeCipherSpec
.1 7 0.2233 (0.0000) C>S Handshake
.1 8 0.3300 (0.1066) S>C ChangeCipherSpec
.1 9 0.3305 (0.0005) S>C Handshake
.1 10 0.4177 (0.0872) C>S application_data
.1 11 0.4882 (0.0705) C>S application_data
.1 12 0.7364 (0.2481) S>C application_data
.1 13 0.7369 (0.0005) S>C application_data
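The "Unknown value" entries in the trace are cipher suites that this SSLDump build apparently did not know by name. As a reading aid, the small lookup below maps those code points to their registered TLS names (the AES suites). The helper name `suite_name` is made up for this illustration. "Version 3.1" in the trace is the wire encoding of TLS 1.0.

```python
# Registered TLS cipher suite code points for the AES suites that show up
# as "Unknown value" in the trace. The table only covers those six values.
AES_SUITES = {
    0x002F: "TLS_RSA_WITH_AES_128_CBC_SHA",
    0x0032: "TLS_DHE_DSS_WITH_AES_128_CBC_SHA",
    0x0033: "TLS_DHE_RSA_WITH_AES_128_CBC_SHA",
    0x0035: "TLS_RSA_WITH_AES_256_CBC_SHA",
    0x0038: "TLS_DHE_DSS_WITH_AES_256_CBC_SHA",
    0x0039: "TLS_DHE_RSA_WITH_AES_256_CBC_SHA",
}


def suite_name(code: int) -> str:
    """Return the suite name, or an ssldump-style placeholder if unknown."""
    return AES_SUITES.get(code, f"Unknown value {code:#x}")


# The suite selected by the server in the ServerHello of the trace.
print(suite_name(0x35))
```

Read this way, the server in the trace picked an RSA key exchange with AES-256 in CBC mode, which is consistent with the ClientKeyExchange message that follows the certificate.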
That's almost all we can say about SIPS: it is SIP encapsulated inside TLS. The most important thing to remember is that the CA certificate HAS TO BE KNOWN by all the clients. Otherwise, clients can't verify whether the server's certificate is legitimate.
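To make the CA requirement concrete, here is a minimal client-side sketch using Python's standard `ssl` module. It is not taken from OpenSER or any softphone: the function names and the `ca_file` parameter are assumptions for illustration, and 5061 is the usual default port for SIP over TLS.

```python
import socket
import ssl

SIPS_PORT = 5061  # usual default port for SIP over TLS


def make_sips_context(ca_file=None):
    """Build a TLS client context that insists on a verifiable server certificate.

    If ca_file is given, only that CA bundle is trusted; otherwise the
    system's default CA store is used.
    """
    return ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)


def open_sips_connection(host, ca_file=None, port=SIPS_PORT):
    """Connect to a SIPS proxy; the handshake fails if the CA is unknown."""
    ctx = make_sips_context(ca_file)
    raw = socket.create_connection((host, port), timeout=5)
    # wrap_socket performs the TLS handshake and checks the certificate
    # chain and the host name against the trusted CA(s).
    return ctx.wrap_socket(raw, server_hostname=host)
```

A client that does not have the CA certificate would get a certificate verification error from `wrap_socket` instead of a usable connection, which is exactly the failure described above.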
SRTP is defined in RFC 3711, published in March 2004. Its goal is to ensure confidentiality, integrity, replay protection and message authentication of RTP and RTCP packets (SRTCP protects RTCP packets). It does not provide availability or proof of origin (non-repudiation).
The SRTP layer is located just below RTP in the protocol stack. The SRTP layer intercepts the RTP packet, modifies it and sends it to the UDP layer.
Because RTP has strict requirements on delay, SRTP is designed for high throughput and low packet expansion. The SRTP packet is composed of a regular RTP packet plus an optional Master Key Identifier (MKI) and an authentication tag. The format of an SRTP packet is illustrated below:
As we can see, the whole RTP packet (header and payload) is authenticated, but only the payload (with the padding and the pad count) is encrypted.
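The authentication part can be sketched with the standard library alone. RFC 3711 specifies HMAC-SHA1 as the default, computed over the RTP packet followed by the 32-bit rollover counter (ROC) and truncated to 80 bits. The function names below are made up for this sketch, the payload is assumed to be already encrypted, and the optional MKI is left out.

```python
import hashlib
import hmac
import struct

TAG_LEN = 10  # 80-bit authentication tag, the RFC 3711 default


def srtp_auth_tag(auth_key: bytes, packet: bytes, roc: int) -> bytes:
    """HMAC-SHA1 over (RTP header + payload) || ROC, truncated to TAG_LEN bytes."""
    message = packet + struct.pack("!I", roc)
    return hmac.new(auth_key, message, hashlib.sha1).digest()[:TAG_LEN]


def protect(auth_key: bytes, packet: bytes, roc: int) -> bytes:
    """Append the tag to a packet whose payload is assumed already encrypted."""
    return packet + srtp_auth_tag(auth_key, packet, roc)


def verify(auth_key: bytes, protected: bytes, roc: int) -> bool:
    """Recompute the tag on the receiver side and compare in constant time."""
    packet, tag = protected[:-TAG_LEN], protected[-TAG_LEN:]
    return hmac.compare_digest(tag, srtp_auth_tag(auth_key, packet, roc))
```

Since the tag covers the header as well as the payload, changing a single header bit (the sequence number, for instance) makes `verify` fail, even though the header itself travels in clear.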
The SRTP cryptographic processing is quite complex. The diagram below is an attempt to give an overall view of the successive steps:
SRTP (and SRTCP) is not designed to perform key management. This is left to key management standards such as Multimedia Internet KEYing (MIKEY). MIKEY provides several mechanisms to generate a Master Key (Pre-Shared Key, Public Key, Diffie-Hellman exchange). While PSK (Pre-Shared Key) is clearly not designed to provide a high security level, the Public Key and Diffie-Hellman methods are often used in strong protocols. Diffie-Hellman is very simple to implement but, when the exchange is not authenticated, it is also highly exposed to Man-In-The-Middle attacks.
Diffie-Hellman (D-H) key exchange is a cryptographic protocol that allows two peers that have no prior knowledge of each other to decide on a shared secret key over an insecure communication channel.
(source: Wikipedia)
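A toy run of the exchange makes the definition easier to follow. The numbers below are the classic textbook example (p = 23, g = 5) and are far too small for real use. The function names are chosen for this illustration only.

```python
# Toy public parameters: a small prime modulus and a generator.
# Real deployments use standardized groups of 2048 bits or more.
P, G = 23, 5


def public_value(private: int) -> int:
    """Value sent in clear to the other peer: g^private mod p."""
    return pow(G, private, P)


def shared_secret(their_public: int, my_private: int) -> int:
    """Secret computed locally from the peer's public value."""
    return pow(their_public, my_private, P)


alice_private, bob_private = 6, 15          # never leave each peer
alice_public = public_value(alice_private)  # sent over the insecure channel
bob_public = public_value(bob_private)      # sent over the insecure channel

alice_secret = shared_secret(bob_public, alice_private)
bob_secret = shared_secret(alice_public, bob_private)
print(alice_secret == bob_secret)
```

Nothing in this exchange tells Alice that `bob_public` really came from Bob, which is why an unauthenticated exchange is open to a man in the middle.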
Public Key (asymmetric cryptography) defines the use of public and private keys between two peers to share a Master Key. The classical algorithm used in asymmetric cryptography is RSA, but there are several others (ElGamal, etc.).
All those algorithms have the same goal: to compute a Master Key shared between two or more peers. This Master Key is not directly used as an encryption key, but is an input of the cryptographic context composed of:
The key derivation algorithm is independent of the encryption or authentication algorithms used. Once the key derivation rate is defined at the beginning of the SRTP communication, there's no need for extra communication between the peers. The key derivation algorithm works as follows:
More details about the key derivation function can be found in RFC 3711 (section 4.3).
Once the cryptographic context has been initialized, encryption is performed according to the following steps:
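As a partial illustration, the sketch below computes only the input block of the key derivation as described in RFC 3711 section 4.3.1: a one-byte label is concatenated with the packet index divided by the key derivation rate, and the result is XORed into the right-aligned 112-bit master salt. The AES-based pseudo-random function that turns this input into a session key is not shown, since AES is not in the Python standard library. The function name and the sample salt value are assumptions for this example.

```python
# Labels select which session key is derived.
LABEL_ENCRYPTION, LABEL_AUTH, LABEL_SALT = 0x00, 0x01, 0x02


def kdf_input(master_salt: bytes, label: int, index: int, kdr: int) -> bytes:
    """Return x = (label || (index DIV kdr)) XOR master_salt as 14 bytes.

    A key derivation rate of 0 means the keys are derived only once,
    in which case the division result is defined as 0.
    """
    if len(master_salt) != 14:
        raise ValueError("master salt must be 112 bits (14 bytes)")
    r = index // kdr if kdr else 0
    if r >= 1 << 48:
        raise ValueError("index DIV kdr must fit in 48 bits")
    key_id = (label << 48) | r  # 8-bit label followed by 48-bit r
    x = key_id ^ int.from_bytes(master_salt, "big")
    return x.to_bytes(14, "big")


# Sample 112-bit salt, used here purely as example data.
salt = bytes.fromhex("0EC675AD498AFEEBB6960B3AABE6")
for label in (LABEL_ENCRYPTION, LABEL_AUTH, LABEL_SALT):
    print(label, kdf_input(salt, label, index=0, kdr=0).hex())
```

Both peers can compute this value on their own from the shared master salt and the packet index, which is why no extra signalling is needed once the derivation rate is agreed.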
Then, the packet is sent over the wire and the receiver decrypts the payload.
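Part of this processing can be sketched without a cipher library. With the default AES counter mode transform, RFC 3711 builds a per-packet initialization vector from the session salt, the SSRC and the 48-bit packet index (rollover counter and sequence number). The keystream produced by AES from that IV is then XORed with the payload, which is not shown here because AES is not in the Python standard library. The function names are chosen for this illustration.

```python
def packet_index(roc: int, seq: int) -> int:
    """48-bit packet index: i = 2^16 * ROC + SEQ."""
    return (roc << 16) | seq


def aes_cm_iv(session_salt: bytes, ssrc: int, index: int) -> bytes:
    """IV = (k_s * 2^16) XOR (SSRC * 2^64) XOR (i * 2^16), as 16 bytes."""
    if len(session_salt) != 14:
        raise ValueError("session salt must be 112 bits (14 bytes)")
    k_s = int.from_bytes(session_salt, "big")
    iv = (k_s << 16) ^ (ssrc << 64) ^ (index << 16)
    # The low 16 bits stay zero: they are the block counter within the packet.
    return iv.to_bytes(16, "big")


salt = bytes.fromhex("F0F1F2F3F4F5F6F7F8F9FAFBFCFD")  # example session salt
print(aes_cm_iv(salt, ssrc=0, index=packet_index(roc=0, seq=0)).hex())
```

Because the index changes with every packet, each packet gets its own keystream, and the receiver can rebuild the same IV from the header fields and its local rollover counter.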
ZRTP is an enhancement of SRTP that defines a Diffie-Hellman key exchange and displays information to the users so that they can detect MITM attacks. To put it simply, once the DH key is established, ZRTP displays a short hash of it. The peers can then compare their hashes during the voice communication to verify that they are equal. The big idea is that an attacker can subvert the DH key exchange but cannot modify what the peers are saying over the phone. Because verifying the hash is very simple, this works extremely well!
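The idea can be sketched in a few lines. This is NOT the real ZRTP derivation of the short string: here it is simply the first four hex digits of a SHA-256 hash of the shared secret, and the DH group is a toy one. All names (`run_call`, `short_string`, the Mallory attacker) are made up for the illustration.

```python
import hashlib

P = 2**127 - 1  # toy prime modulus, far too small for real use
G = 3


def dh_secret(their_public: int, my_private: int) -> int:
    return pow(their_public, my_private, P)


def short_string(secret: int) -> str:
    """Short value read aloud by both users (simplified, not ZRTP's format)."""
    return hashlib.sha256(secret.to_bytes(16, "big")).hexdigest()[:4]


def run_call(alice_priv: int, bob_priv: int, mallory_priv=None):
    """Return (alice_secret, bob_secret), optionally with a man in the middle."""
    seen_by_alice = pow(G, bob_priv, P)   # what Alice believes is Bob's value
    seen_by_bob = pow(G, alice_priv, P)   # what Bob believes is Alice's value
    if mallory_priv is not None:
        # Mallory substitutes her own public value in both directions.
        seen_by_alice = seen_by_bob = pow(G, mallory_priv, P)
    return dh_secret(seen_by_alice, alice_priv), dh_secret(seen_by_bob, bob_priv)


a, b = run_call(0x1234567890ABCDEF, 0x0FEDCBA987654321)
print("honest call:", short_string(a), short_string(b))
a, b = run_call(0x1234567890ABCDEF, 0x0FEDCBA987654321, mallory_priv=0x42424242)
print("with MITM:  ", short_string(a), short_string(b))
```

In the honest call both sides derive the same secret and therefore read the same string. With the attacker in the middle the two secrets differ, so the strings the users read to each other will almost certainly not match (a four-digit string can still collide by chance).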
Moreover, ZRTP keeps a history of the keys exchanged between peers. After the first communication, ZRTP uses this history when generating the DH key. This ensures that only the two communicating peers are able to generate the final DH key: an attacker who doesn't know the history is harmless! The final point about ZRTP is that it doesn't disturb the line when the call is made with a peer that doesn't support it. This is possible because RTP ignores headers it doesn't know. ZRTP is still quite young (it is still a draft at the IETF) but it contains all the elements to become a key protocol in protecting VoIP communications.
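The effect of the key history can be illustrated with a deliberately simplified sketch. The real ZRTP key derivation hashes more inputs than this. Here the session key is just a hash of the fresh DH result and the secret retained from the previous call, and every name is an assumption made for the example.

```python
import hashlib


def mix_key(dh_result: bytes, retained_secret: bytes) -> bytes:
    """Session key depends on the fresh DH result AND the shared history."""
    return hashlib.sha256(dh_result + retained_secret).digest()


def next_retained(session_key: bytes) -> bytes:
    """Secret stored by both peers for use in their next call."""
    return hashlib.sha256(b"retained" + session_key).digest()


# First call: no history yet, so both peers start from an empty secret.
key1 = mix_key(b"dh-result-call-1", b"")
history = next_retained(key1)

# Second call: both legitimate peers mix in the same history.
alice_key = mix_key(b"dh-result-call-2", history)
bob_key = mix_key(b"dh-result-call-2", history)

# An attacker who hijacks the second DH exchange lacks the history.
attacker_key = mix_key(b"dh-result-call-2", b"")
print(alice_key == bob_key, attacker_key == alice_key)
```

Even if the attacker fully controls the second DH exchange, the key computed without the retained secret does not match the one the legitimate peers derive.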
Source: ZRTP Draft v2