Secure VoIP protocols: SIPS & SRTP

Because SIP and RTP were not designed with security in mind, the IETF has defined SIPS and SRTP. SIPS is SIP carried over TLS (Transport Layer Security), which addresses authentication, confidentiality and integrity. SRTP takes a different approach and defines its own cryptographic mechanisms.

During the summer of 2006, Phil Zimmermann (the PGP guy) published an interesting piece of software called Zfone. It is based on ZRTP, an enhancement of SRTP by Zimmermann, and aims to raise the security level of RTP.

TLS and SIP, the SIPS implementation

If you have not looked at SIP before, you might be interested in this other article: SIP and RTP : overview of a VoIP communication. SIP resembles HTTP: it was designed to exchange information between a client and a server and, like most protocols of its kind, it did not embed much security. SIPS addresses that, and OpenSER provides a Transport Layer Security implementation.

OpenSER is one of the few SIP proxies that implement a TLS layer. As far as I know, it works in only one way: the communication is encrypted using the proxy's certificate. OpenSER sends its own signed public certificate to the client, and the client uses it in the TLS handshake. This is exactly what happens in an HTTPS communication, since it is defined by the TLS protocol itself. I will not detail the TLS protocol here; other articles (in this wiki too, but in French) and Wikipedia can help if you are not familiar with TLS. Let's take a look at an OpenSER SIPS communication with the tool SSLDump. OpenSER is configured to provide its signed certificate to the client. Both the client and OpenSER need to know the Certificate Authority, so the CA public certificate has to be installed on both OpenSER and the client's (soft)phone. The following diagram shows a typical TLS Handshake.

SSLDump gives more details about the handshake.

New TCP connection #1: client (C) <-> OpenSER(S)
.1 1  0.0907 (0.0907)  C>S  Handshake
        Version 3.1 
        cipher suites
        Unknown value 0x39
        Unknown value 0x38
        Unknown value 0x35
        Unknown value 0x33
        Unknown value 0x32
        Unknown value 0x2f
        compression methods
                unknown value
.1 2  0.0933 (0.0026)  S>C  Handshake
        Version 3.1 
          bc 1c 46 ac a4 0e 80 ec 0c 00 48 ef cc f8 ae c3 
          11 e9 66 e4 7d fd d0 ad 2c 13 55 ff be f5 a5 2c 
        cipherSuite         Unknown value 0x35
        compressionMethod                   NULL
.1 3  0.0934 (0.0000)  S>C  Handshake
.1 4  0.0934 (0.0000)  S>C  Handshake
.1 5  0.2233 (0.1299)  C>S  Handshake
.1 6  0.2233 (0.0000)  C>S  ChangeCipherSpec
.1 7  0.2233 (0.0000)  C>S  Handshake
.1 8  0.3300 (0.1066)  S>C  ChangeCipherSpec
.1 9  0.3305 (0.0005)  S>C  Handshake
.1 10 0.4177 (0.0872)  C>S  application_data
.1 11 0.4882 (0.0705)  C>S  application_data
.1 12 0.7364 (0.2481)  S>C  application_data
.1 13 0.7369 (0.0005)  S>C  application_data
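
ssldump prints "Unknown value" for cipher suites it does not recognise. The codes in the capture above are the AES suites registered for TLS, and "Version 3.1" is the wire encoding of TLS 1.0. The small lookup table below is only a reading aid written in Python; the helper name suite_name is an assumption made for this example.

```python
# TLS cipher suite codes seen in the capture above (the AES suites).
AES_CIPHER_SUITES = {
    0x2F: "TLS_RSA_WITH_AES_128_CBC_SHA",
    0x32: "TLS_DHE_DSS_WITH_AES_128_CBC_SHA",
    0x33: "TLS_DHE_RSA_WITH_AES_128_CBC_SHA",
    0x35: "TLS_RSA_WITH_AES_256_CBC_SHA",
    0x38: "TLS_DHE_DSS_WITH_AES_256_CBC_SHA",
    0x39: "TLS_DHE_RSA_WITH_AES_256_CBC_SHA",
}


def suite_name(code: int) -> str:
    """Return the registered name of a cipher suite code, if known here."""
    return AES_CIPHER_SUITES.get(code, f"unknown (0x{code:02x})")


# Suites offered by the client (ClientHello) and the one picked by OpenSER.
offered = [0x39, 0x38, 0x35, 0x33, 0x32, 0x2F]
selected = 0x35

for code in offered:
    print(f"0x{code:02x}  {suite_name(code)}")
print("server selected:", suite_name(selected))
```

The selected suite uses RSA key exchange with AES-256 in CBC mode, which is consistent with a handshake where only the server presents a certificate.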

That is almost all there is to say about SIPS: it is SIP encapsulated inside TLS. The most important point to remember is that the CA certificate HAS TO BE KNOWN by all the clients. Otherwise, clients cannot verify whether the server's certificate is legitimate.
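
To make the role of the CA certificate concrete, here is a minimal client-side sketch using Python's standard ssl module. It is not how any particular softphone does it: the function names and the ca_file path are assumptions for illustration, and port 5061 is the port registered for SIP over TLS.

```python
import socket
import ssl
from typing import Optional

SIPS_PORT = 5061  # port registered for SIP over TLS


def make_sips_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client TLS context that verifies the proxy certificate.

    ca_file is a hypothetical path to the PEM certificate of the CA that
    signed the proxy certificate; None falls back to the system trust store.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # create_default_context already enables both checks; repeated for clarity.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def open_sips_connection(host: str, port: int = SIPS_PORT,
                         ca_file: Optional[str] = None) -> ssl.SSLSocket:
    """Open a TLS connection to a SIP proxy.

    Raises ssl.SSLCertVerificationError when the proxy certificate does not
    chain to a CA known to the client.
    """
    ctx = make_sips_context(ca_file)
    raw = socket.create_connection((host, port), timeout=5)
    try:
        return ctx.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

If the CA certificate is missing on the client, the handshake fails at the certificate verification step, which is exactly the situation described above.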

SRTP, secure the data communication

SRTP is defined in RFC 3711, published in March 2004. Its goal is to ensure confidentiality, integrity, replay protection and message authentication of RTP and RTCP packets (SRTCP protects RTCP packets). It does not provide availability or non-repudiation (proof of origin).

SRTP Layer

The SRTP layer sits just below RTP in the protocol stack. It intercepts each RTP packet, transforms it and passes it on to the UDP layer.

Because RTP has strong requirements on delay, SRTP is designed for high throughput and low packet expansion. An SRTP packet is composed of a regular RTP packet plus an optional Master Key Identifier (MKI) and an authentication tag. The format of an SRTP packet is illustrated below:

As we can see, the RTP header and the payload are both authenticated, but only the payload (with the padding and the pad count) is encrypted.
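
The layout can also be sketched in code. The function below splits a received SRTP packet into its parts, under stated assumptions: no RTP header extension, and MKI and tag lengths taken from the cryptographic context (the 10-byte tag corresponds to the 80-bit default of RFC 3711). The names SrtpParts and split_srtp are made up for this example.

```python
import struct
from typing import NamedTuple


class SrtpParts(NamedTuple):
    header: bytes    # RTP header: authenticated, sent in clear
    payload: bytes   # payload, padding and pad count: authenticated and encrypted
    mki: bytes       # optional Master Key Identifier, outside the authenticated portion
    auth_tag: bytes  # authentication tag


def split_srtp(packet: bytes, mki_len: int = 0, tag_len: int = 10) -> SrtpParts:
    """Split an SRTP packet into header, payload, MKI and authentication tag."""
    if len(packet) < 12:
        raise ValueError("shorter than a fixed RTP header")
    if packet[0] & 0x10:
        raise ValueError("RTP header extensions are not handled in this sketch")
    csrc_count = packet[0] & 0x0F
    header_len = 12 + 4 * csrc_count          # fixed header plus CSRC list
    body_end = len(packet) - mki_len - tag_len
    if body_end < header_len:
        raise ValueError("packet too short for the given MKI and tag lengths")
    return SrtpParts(
        header=packet[:header_len],
        payload=packet[header_len:body_end],
        mki=packet[body_end:body_end + mki_len],
        auth_tag=packet[body_end + mki_len:],
    )


# Dummy packet: version 2, sequence number 1, timestamp 160, arbitrary SSRC.
rtp_header = struct.pack("!BBHII", 0x80, 0, 1, 160, 0xDEADBEEF)
packet = rtp_header + b"P" * 20 + b"T" * 10
parts = split_srtp(packet)
print(len(parts.header), len(parts.payload), len(parts.mki), len(parts.auth_tag))
```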

The SRTP cryptographic processing is quite complex. The diagram below attempts to give an overall view of the successive steps:

Key Management and Cryptographic contexts

SRTP (and SRTCP) is not designed to perform key management. This is left to key management standards such as Multimedia Internet KEYing (MIKEY). MIKEY provides several mechanisms to establish a Master Key (Pre-Shared Key, Public Key, Diffie-Hellman exchange). While PSK (Pre-Shared Key) is clearly not designed to provide a high security level, the Public Key and Diffie-Hellman methods are often used in strong protocols. Diffie-Hellman is very simple to implement but, when the exchange is not authenticated, it is highly exposed to Man-In-The-Middle attacks.

Diffie-Hellman (D-H) key exchange is a cryptographic protocol that allows two peers that have no prior knowledge of each other to decide on a shared secret key over an insecure communication channel.

  1. Alice and Bob both agree to use a prime number p=23 and base g=5.
  2. Alice chooses a secret integer a=6, then sends Bob (g^a mod p) ⇒ 5^6 mod 23 = 8.
  3. Bob chooses a secret integer b=15, then sends Alice (g^b mod p) ⇒ 5^15 mod 23 = 19.
  4. Alice computes (g^b mod p)^a mod p ⇒ 19^6 mod 23 = 2.
  5. Bob computes (g^a mod p)^b mod p ⇒ 8^15 mod 23 = 2.
  6. Alice and Bob got the same key: 2.

(source: Wikipedia)
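
The numbers above can be checked with Python's built-in modular exponentiation. The second half is a sketch of why an unauthenticated exchange is exposed to a Man-In-The-Middle: the attacker's secret m is an arbitrary value chosen for the illustration.

```python
# Public parameters and secrets from the example (toy sizes, never use in practice).
p, g = 23, 5
a, b = 6, 15

A = pow(g, a, p)           # sent by Alice: 5^6 mod 23 = 8
B = pow(g, b, p)           # sent by Bob:   5^15 mod 23 = 19
key_alice = pow(B, a, p)   # 19^6 mod 23
key_bob = pow(A, b, p)     # 8^15 mod 23
assert (A, B) == (8, 19)
assert key_alice == key_bob == 2

# Man-In-The-Middle: Mallory intercepts A and B and sends her own value instead.
m = 13                     # hypothetical attacker secret
M = pow(g, m, p)
alice_side = pow(M, a, p)  # key Alice believes she shares with Bob
bob_side = pow(M, b, p)    # key Bob believes he shares with Alice
# Mallory can compute both keys from the values she intercepted.
assert alice_side == pow(A, m, p)
assert bob_side == pow(B, m, p)
print("shared key:", key_alice)
```

Nothing in the exchange itself tells Alice that the value she received came from Mallory rather than Bob, which is why the exchange has to be authenticated by some other means.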

Public Key (asymmetric cryptography) methods use a public and private key pair to share a Master Key between two peers. The classical algorithm used in asymmetric cryptography is RSA, but there are several others (ElGamal, etc.).
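
As a rough illustration of the Public Key approach, here is textbook RSA with toy numbers used to transport a master key. This is only a sketch of the principle: real deployments use large keys and padding, and MIKEY defines its own message formats that are not shown here. The value 65 stands in for a master key.

```python
# Bob's toy RSA key pair (textbook sizes, for illustration only).
p, q = 61, 53
n = p * q                    # public modulus: 3233
phi = (p - 1) * (q - 1)      # 3120
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (requires Python 3.8+): 2753

# Alice picks a master key and encrypts it with Bob's public key (n, e).
master_key = 65
ciphertext = pow(master_key, e, n)

# Bob recovers the master key with his private key d.
recovered = pow(ciphertext, d, n)
assert recovered == master_key
print("ciphertext:", ciphertext, "recovered master key:", recovered)
```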

All those algorithms have the same goal: establish a Master Key shared between two or more peers. This Master Key is not used directly as an encryption key, but is an input of the cryptographic context, which is composed of:

  • a Rollover counter (ROC): counts how many times the 16-bit RTP sequence number has wrapped back to zero after reaching 65535;
  • an identifier for the encryption algorithm (AES in counter mode, in f8 mode, and so on);
  • an identifier for the message authentication algorithm;
  • a replay list (maintained by the receiver only);
  • an MKI indicator (0/1 if MKI is present in SRTP/SRTCP packets);
  • the master key(s), random, shared and secret;
  • for each master key, a counter of how many SRTP packets have been processed with that master key;
  • the length of session keys for encryption and authentication;
  • a master salt used in the key derivation of session keys;
  • a key derivation rate (a power of two in {1, 2, 4, ..., 2^24});
  • an MKI value;
  • a lifetime for a master key.
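
The list above maps naturally onto a small data structure. The sketch below is one possible representation, not a reference implementation: field names are invented for the example, and the default values are those I understand to be the RFC 3711 defaults (AES in counter mode with 128-bit session keys, HMAC-SHA1 with a 160-bit key, a key derivation rate of 0 meaning keys are derived only once). The sender_index method shows how the ROC extends the 16-bit sequence number into a 48-bit packet index.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class SrtpCryptoContext:
    master_key: bytes                       # random, shared and secret
    master_salt: bytes                      # used when deriving session keys
    roc: int = 0                            # rollover counter
    encryption_alg: str = "AES-CM"          # identifier of the encryption algorithm
    auth_alg: str = "HMAC-SHA1"             # identifier of the authentication algorithm
    replay_list: Set[int] = field(default_factory=set)  # receiver only
    mki_present: bool = False               # MKI indicator
    mki_value: Optional[bytes] = None
    packets_processed: int = 0              # per master key
    enc_key_len: int = 16                   # session encryption key length, bytes
    auth_key_len: int = 20                  # session authentication key length, bytes
    key_derivation_rate: int = 0            # 0: derive the session keys once
    master_key_lifetime: int = 2 ** 48      # maximum number of SRTP packets
    last_seq: Optional[int] = None          # last sequence number sent

    def sender_index(self, seq: int) -> int:
        """Sender side: bump the ROC on wrap-around and return 2^16 * ROC + SEQ."""
        if self.last_seq is not None and seq < self.last_seq:
            self.roc += 1                   # sequence number wrapped past 65535
        self.last_seq = seq
        self.packets_processed += 1
        return (self.roc << 16) | seq


ctx = SrtpCryptoContext(master_key=bytes(16), master_salt=bytes(14))
print(ctx.sender_index(65535), ctx.sender_index(0), ctx.roc)
```

The receiver cannot simply count wrap-arounds because packets may be lost or reordered; RFC 3711 describes how it estimates the index, which is not shown here.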

Key derivation

The key derivation algorithm is independent of the encryption and authentication algorithms used. Once the key derivation rate is agreed at the beginning of the SRTP communication, no extra communication between the peers is needed. The key derivation algorithm works as follows:

More details about the key derivation function can be found in RFC 3711.
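
As a partial sketch of that function, the code below computes the input block of the default key derivation as I read it in RFC 3711: r = index DIV kdr, key_id = label || r, x = key_id XOR master_salt, and the session key is then the AES counter mode keystream produced under the master key starting from x * 2^16. The AES step is deliberately left out because Python's standard library has no AES. The labels 0x00, 0x01 and 0x02 select the SRTP encryption key, authentication key and salt. The salt used in the example is the one from the RFC's test vectors.

```python
LABEL_ENCRYPTION = 0x00   # SRTP session encryption key
LABEL_AUTH = 0x01         # SRTP session authentication key
LABEL_SALT = 0x02         # SRTP session salt


def key_derivation_input(master_salt: bytes, label: int,
                         index: int = 0, kdr: int = 0) -> bytes:
    """Return x * 2^16, the first AES-CM counter block for one session key."""
    if len(master_salt) != 14:
        raise ValueError("this sketch assumes a 112-bit master salt")
    r = index // kdr if kdr else 0      # "index DIV kdr", 0 when kdr is 0
    key_id = (label << 48) | r          # 8-bit label followed by 48-bit r
    x = key_id ^ int.from_bytes(master_salt, "big")
    return (x << 16).to_bytes(16, "big")


salt = bytes.fromhex("0EC675AD498AFEEBB6960B3AABE6")
for name, label in [("encryption", LABEL_ENCRYPTION),
                    ("authentication", LABEL_AUTH),
                    ("salt", LABEL_SALT)]:
    print(name, key_derivation_input(salt, label).hex().upper())
```

Because both peers hold the same master key, master salt and packet index, each side can run this derivation on its own, which is why no extra message exchange is needed.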

Packet processing

Once the cryptographic context has been initialized, encryption is performed according to the following steps:

  1. Determine which cryptographic context to use;
  2. Determine the index of the SRTP packet;
  3. Determine the Master Key and the Master Salt;
  4. Determine the session keys and the session salt;
  5. Encrypt the RTP Payload with the encryption algorithm indicated in the cryptographic context with the session key and the session salt;
  6. If the MKI indicator is set to one, append the MKI to the packet;
  7. Compute the authentication tag with the ROC, the authentication algorithm and the session authentication key (found in step 4);
  8. If necessary, update the ROC using the packet index determined in step 2.
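
Steps 2, 5 and 7 can be sketched with the standard library alone, assuming the RFC 3711 defaults: the packet index is 2^16 * ROC + SEQ, the AES counter mode IV is (session salt * 2^16) XOR (SSRC * 2^64) XOR (index * 2^16), and the tag is HMAC-SHA1 over the authenticated portion followed by the ROC, truncated to 80 bits. The AES encryption itself is not shown, and the function names are invented for this example.

```python
import hashlib
import hmac


def packet_index(roc: int, seq: int) -> int:
    """Step 2: 48-bit packet index built from the ROC and the RTP sequence number."""
    return (roc << 16) | seq


def aes_cm_iv(session_salt: bytes, ssrc: int, index: int) -> bytes:
    """Step 5 (setup): initial counter block for AES in counter mode."""
    k_s = int.from_bytes(session_salt, "big")        # 112-bit session salt
    iv = (k_s << 16) ^ (ssrc << 64) ^ (index << 16)
    return iv.to_bytes(16, "big")


def auth_tag(auth_key: bytes, authenticated_portion: bytes,
             roc: int, tag_len: int = 10) -> bytes:
    """Step 7: HMAC-SHA1 over (RTP header + encrypted payload) followed by the ROC."""
    message = authenticated_portion + roc.to_bytes(4, "big")
    return hmac.new(auth_key, message, hashlib.sha1).digest()[:tag_len]


def verify_tag(auth_key: bytes, authenticated_portion: bytes,
               roc: int, tag: bytes) -> bool:
    """Receiver side: constant-time comparison of the expected and received tags."""
    expected = auth_tag(auth_key, authenticated_portion, roc, len(tag))
    return hmac.compare_digest(expected, tag)


key = bytes(range(20))                      # dummy session authentication key
portion = b"\x80" + bytes(11) + b"encrypted payload"
tag = auth_tag(key, portion, roc=0)
print(len(tag), verify_tag(key, portion, 0, tag))
```

Including the ROC in the authenticated data means that a receiver whose rollover counter has drifted rejects the packet instead of decrypting it with the wrong index.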

Then, the packet is sent over the wire, and the receiver verifies the authentication tag and decrypts the payload.

ZRTP, protect the key exchange

ZRTP is an enhancement for SRTP that defines a DH key exchange and displays information to the users to help them detect MITM attacks. To put it simply, once the DH key is established, ZRTP displays a short hash derived from the DH key, so that the peers can compare their hashes aloud during the voice communication and verify that they are equal. The big idea is that an attacker can tamper with the DH key exchange but cannot modify what the peers are saying over the phone. Because verifying the hash is very simple, this works extremely well!
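
The idea of the short hash can be illustrated in a few lines. This is deliberately NOT the derivation specified by ZRTP, only a sketch of the principle: both peers hash the secret they ended up with and read the first few characters to each other. The function name, the "sas-sketch" prefix and the 4-character length are assumptions made for the example.

```python
import hashlib


def short_auth_string(shared_secret: bytes, length: int = 4) -> str:
    """Illustrative short hash of a shared secret (not the ZRTP algorithm)."""
    digest = hashlib.sha256(b"sas-sketch" + shared_secret).hexdigest()
    return digest[:length]


# No attacker: both peers hold the same secret and display the same string.
secret = b"key agreed between Alice and Bob"
print("Alice sees", short_auth_string(secret))
print("Bob sees  ", short_auth_string(secret))

# With a MITM, each peer shares a different secret with the attacker, so the
# displayed strings differ except for a rare collision on the short value.
print("Alice sees", short_auth_string(b"key shared with Mallory (Alice side)"))
print("Bob sees  ", short_auth_string(b"key shared with Mallory (Bob side)"))
```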

Moreover, ZRTP keeps a history of the keys exchanged between peers. After the first communication, ZRTP uses this history in the generation of the DH key. This ensures that only the two communicating peers are able to generate the final DH key: an attacker who does not know the history is harmless! Finally, ZRTP does not disturb the call when the other peer does not support it. This is possible because RTP ignores headers it does not know. ZRTP is still quite young (it is still a draft at the IETF) but it contains all the elements to become a key protocol in protecting VoIP communications.
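
The use of the key history can be sketched as follows. Again, this is not the construction specified in the ZRTP draft, only an illustration of the principle under assumed names: the secret of a call mixes the fresh DH result with a secret retained from the previous call, so an attacker who hijacks a later exchange without knowing the retained secret ends up with a different key.

```python
import hashlib


def session_secret(dh_result: bytes, retained_secret: bytes = b"") -> bytes:
    """Mix the fresh DH result with the secret retained from earlier calls."""
    # Length prefix keeps the two inputs unambiguous when concatenated.
    data = len(dh_result).to_bytes(4, "big") + dh_result + retained_secret
    return hashlib.sha256(data).digest()


def next_retained_secret(secret: bytes) -> bytes:
    """Derive the value both peers store for the next call."""
    return hashlib.sha256(b"retained" + secret).digest()


# First call: no history yet, both peers store a retained secret afterwards.
first = session_secret(b"DH result of call 1")
retained = next_retained_secret(first)

# Second call: both legitimate peers mix in the retained secret.
alice = session_secret(b"DH result of call 2", retained)
bob = session_secret(b"DH result of call 2", retained)

# An attacker who controls the second DH exchange but lacks the history.
attacker = session_secret(b"DH result of call 2")
print(alice == bob, attacker == alice)
```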

source: ZRTP Draft v2

note: this article is taken from a university project carried out by David Bigot and myself during our last year of master's studies at the University of Poitiers, in 2007.

en/ressources/dossiers/voip/tls_sips_rtps.txt · Last modified: 2011/03/16 01:30 (external edit)
CC Attribution-Noncommercial-Share Alike 3.0 Unported