====== SIP and RTP : overview of a VoIP communication ======

{{tag>voip sip rtp rfc}}

This page describes in detail the protocols used in a typical SIP/RTP communication, with or without the use of TLS. First, I'm going to describe how a simple VoIP communication works with OpenSER acting as a Proxy/Registrar and two X-Lite clients.

We are working on a very simple network:

  - An OpenSER PBX -> 192.168.0.30
  - Two X-Lite clients -> 10.42.16.48 and 10.42.16.88

The reason why the hosts are not all in the same IP range is that we are using a VPN between our two networks to avoid NAT problems… We will discuss this later.

===== Registering =====

From RFC3261: SIP is an application-layer control protocol that can establish, modify, and terminate multimedia sessions (conferences) such as Internet telephony calls.

The first thing that a user of a SIP network does is register with the SIP Registrar. In our example, we use an OpenSER server that acts as the SIP Registrar (and also the SIP Proxy, but for now it doesn't matter). So, Julien, the user who has the IP 10.42.16.48, launches his softphone, which tries to register by sending a REGISTER datagram such as this one:

No. 
Time Source Destination Protocol Info 74 15.788470 10.42.16.48 192.168.0.30 SIP Request: REGISTER sip:192.168.0.30 {......truncated...datagram......} User Datagram Protocol, Src Port: 5061 (5061), Dst Port: 5060 (5060) Source port: 5061 (5061) Destination port: 5060 (5060) Length: 434 Checksum: 0xc0db [correct] Session Initiation Protocol Request-Line: REGISTER sip:192.168.0.30 SIP/2.0 Method: REGISTER [Resent Packet: False] Message Header Via: SIP/2.0/UDP 10.42.16.48:5061;rport;branch=z9hG4bK240C2422F5DFEF700B859D9BAD8F9063 From: julien ;tag=1847976374 SIP Display info: julien SIP from address: sip:julien@intra-calcman.org SIP tag: 1847976374 To: julien SIP Display info: julien SIP to address: sip:julien@intra-calcman.org Contact: "julien" Contact Binding: "julien" URI: "julien" SIP Display info: "julien" SIP contact address: sip:julien@10.42.16.48:5061 Call-ID: 16E70BC816CCE9FF71C7F405E6C4B56F@192.168.0.30 CSeq: 8933 REGISTER Expires: 1800 Max-Forwards: 70 User-Agent: X-Lite release 1105d Content-Length: 0

The message is very simple. A SIP message is sent in plain text and contains human-readable information such as the client's IP address, the Registrar's IP address, several header fields and a Call-ID. The Call-ID is very important because UDP does not maintain a network session. The Call-ID is therefore used to refer to a session, and it changes when a new session is started (a new registration, a new INVITE and so on).

The Registrar returns a "100 Trying" message while processing the request. This message contains the same Call-ID. But here is the interesting thing: after a very short time, the Registrar returns the following message to julien:

No. 
Time Source Destination Protocol Info 79 15.865100 192.168.0.30 10.42.16.48 SIP Status: 401 Unauthorized (1 bindings) {......truncated...datagram......} User Datagram Protocol, Src Port: 5060 (5060), Dst Port: 5061 (5061) Source port: 5060 (5060) Destination port: 5061 (5061) Length: 536 Checksum: 0xcbdf [correct] Session Initiation Protocol Status-Line: SIP/2.0 401 Unauthorized Status-Code: 401 [Resent Packet: False] Message Header Via: SIP/2.0/UDP 10.42.16.48:5061;rport;branch=z9hG4bK240C2422F5DFEF700B859D9BAD8F9063;received=10.42.16.48 From: julien ;tag=1847976374 SIP Display info: julien SIP from address: sip:julien@intra-calcman.org SIP tag: 1847976374 To: julien ;tag=as4c996f20 SIP Display info: julien SIP to address: sip:julien@intra-calcman.org SIP tag: as4c996f20 Call-ID: 16E70BC816CCE9FF71C7F405E6C4B56F@192.168.0.30 CSeq: 8933 REGISTER User-Agent: OpenSER PBX Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY Contact: Contact Binding: URI: SIP contact address: sip:julien@intra-calcman.org WWW-Authenticate: Digest realm="intra-calcman.org", nonce="3f314b07" Authentication Scheme: Digest Realm: "intra-calcman.org" Nonce Value: "3f314b07" Content-Length: 0 Why is the registrar sending an Unauthorized message ? This could be quite disconcerting… but, in fact, this is the regular way to register!!! As defined in the RFC : When a |Registrar| receives a request from a |Client|, the |Registrar| MAY authenticate the originator before the request is processed. If no credentials (in the Authorization header field) are provided in the request, the |Registrar| can challenge the originator to provide credentials by rejecting the request with a 401 (Unauthorized) status code. The WWW-Authenticate response-header field MUST be included in 401 (Unauthorized) response messages. The field value consists of at least one challenge that indicates the authentication scheme(s) and parameters applicable to the realm. 
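To make the challenge concrete, here is a minimal Python sketch that extracts the scheme, the realm and the nonce from the WWW-Authenticate value seen in the 401 response above. The function name ''parse_challenge'' is only an illustration, and the regular expression is a simplification: it handles quoted parameters only, which is all this capture contains.

```python
import re

def parse_challenge(header_value):
    """Split a WWW-Authenticate value into (scheme, parameters).

    Simplified: only name="value" pairs are recognised.
    """
    scheme, _, params = header_value.partition(" ")
    fields = dict(re.findall(r'(\w+)="([^"]*)"', params))
    return scheme, fields

# Header value copied from the 401 Unauthorized capture
scheme, fields = parse_challenge('Digest realm="intra-calcman.org", nonce="3f314b07"')
print(scheme, fields["realm"], fields["nonce"])
```

The client needs exactly these two values, the realm and the nonce, to build its answer to the challenge.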
So, the client has to re-send a REGISTER request but, this time, it has to include its credentials in an Authorization header. Let's see the next message:

No. Time Source Destination Protocol Info 81 15.869887 10.42.16.48 192.168.0.30 SIP Request: REGISTER sip:192.168.0.30 {......truncated...datagram......} User Datagram Protocol, Src Port: 5061 (5061), Dst Port: 5060 (5060) Source port: 5061 (5061) Destination port: 5060 (5060) Length: 575 Checksum: 0xab44 [correct] Session Initiation Protocol Request-Line: REGISTER sip:192.168.0.30 SIP/2.0 Method: REGISTER [Resent Packet: False] Message Header Via: SIP/2.0/UDP 10.42.16.48:5061;rport;branch=z9hG4bK4FD30F8A1D5F0DE817436029ACE18888 From: julien ;tag=1847976374 SIP Display info: julien SIP from address: sip:julien@intra-calcman.org SIP tag: 1847976374 To: julien SIP Display info: julien SIP to address: sip:julien@intra-calcman.org Contact: "julien" Contact Binding: "julien" URI: "julien" SIP Display info: "julien" SIP contact address: sip:julien@10.42.16.48:5061 Call-ID: 16E70BC816CCE9FF71C7F405E6C4B56F@192.168.0.30 CSeq: 8934 REGISTER Expires: 1800 Authorization: Digest username="julien",realm="intra-calcman.org",nonce="3f314b07",response="3ef916fd68651deaf5dd74b4473aa641",uri="sip:192.168.0.30" Authentication Scheme: Digest Username: "julien" Realm: "intra-calcman.org" Nonce Value: "3f314b07" Digest Authentication Response: "3ef916fd68651deaf5dd74b4473aa641" Authentication URI: "sip:192.168.0.30" Max-Forwards: 70 User-Agent: X-Lite release 1105d Content-Length: 0

As expected, this REGISTER request contains an Authorization field. Because this is not a new registration session, the Call-ID is still the same (only the CSeq number and the branch value have changed) and, as defined in the RFC, the Authorization header includes the realm and the nonce provided by the Registrar. The nonce is a value generated by the server, valid for a limited time, and used to prevent replay attacks. The generation of the "Digest Authentication Response" field is defined in RFC 2617.
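As an illustration, the digest calculation for a challenge without a qop parameter (the case in this capture) can be sketched in Python as follows. The password ''secret'' is a placeholder, since the real one is of course not in the capture, so the printed value will not match the response shown above. The function names are hypothetical.

```python
import hashlib

def md5_hex(text):
    """Return the lowercase hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def digest_response(username, realm, password, method, uri, nonce):
    """Digest response for a challenge that carries no qop parameter."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # identity part
    ha2 = md5_hex(f"{method}:{uri}")                 # request part
    return md5_hex(f"{ha1}:{nonce}:{ha2}")

# Values from the capture, except the password, which is an assumption
response = digest_response(
    "julien", "intra-calcman.org", "secret",
    "REGISTER", "sip:192.168.0.30", "3f314b07",
)
print(response)
```

Because the nonce is mixed into the hash, a captured response cannot be reused once the Registrar issues a new nonce, and the password itself never travels on the wire.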
The response field is an MD5 hash computed from the username, the realm, the password, the request method, the URI and the nonce.

The Registrar sends a "100 Trying" message while processing the request and, after some milliseconds, it sends an OPTIONS request:

No. Time Source Destination Protocol Info 84 15.960925 192.168.0.30 10.42.16.48 SIP Request: OPTIONS sip:julien@10.42.16.48:5061

At this point the Registrar has not yet accepted the registration. The OPTIONS request allows the Registrar to ask the client which capabilities it supports. What is interesting is that this message uses a new Call-ID:

**Call-ID: 0b51f8be3c5396be74caa22c52f26340@192.168.0.30**

Almost at the same time, the Registrar sends the "200 OK" message which validates the registration. So, the client is now logged in. In response to the OPTIONS request, the client sends a "200 OK" message which lists the SIP methods it supports. (The code could be 486 Busy Here if the client was present but not ready to accept a call.)

===== Calling =====

OK, now our client is registered and he can call his friends. To explain this part, let me welcome david@intra-calcman.org. Julien is going to call david. To do this, he sends an INVITE request to the proxy (which is also the registrar in our case, but can be a different server). The following datagram is the INVITE request:

No. 
Time Source Destination Protocol Info 95 30.398482 10.42.16.48 192.168.0.30 SIP/SDP Request: INVITE sip:david@intra-calcman.org, with session description {......truncated...datagram......} User Datagram Protocol, Src Port: 5060 (5060), Dst Port: 38276 (38276) Source port: 5060 (5060) Destination port: 38276 (38276) Length: 831 Checksum: 0x25eb [correct] Session Initiation Protocol Request-Line: INVITE sip:david@intra-calcman.org:38276;rinstance=90eafe1f95fbbfa3 SIP/2.0 Method: INVITE [Resent Packet: False] Message Header Via: SIP/2.0/UDP 192.168.0.30:5060;branch=z9hG4bK01a93c8b;rport From: "julien" ;tag=as1ea41f9e SIP Display info: "julien" SIP from address: sip:julien@intra-calcman.org SIP tag: as1ea41f9e To: SIP to address: sip:david@intra-calcman.org:38276 Contact: Contact Binding: URI: SIP contact address: sip:julien@intra-calcman.org Call-ID: 48f934056782cf6f581c024128f6c29c@192.168.0.30 CSeq: 102 INVITE User-Agent: OpenSER PBX Max-Forwards: 70 Date: Mon, 02 Oct 2006 14:26:13 GMT Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY Content-Type: application/sdp Content-Length: 259 Message body Session Description Protocol Session Description Protocol Version (v): 0 Owner/Creator, Session Id (o): root 3725 3725 IN IP4 192.168.0.30 Owner Username: root Session ID: 3725 Session Version: 3725 Owner Network Type: IN Owner Address Type: IP4 Owner Address: 192.168.0.30 Session Name (s): session Connection Information (c): IN IP4 10.42.16.88 Connection Network Type: IN Connection Address Type: IP4 Connection Address: 10.42.16.88 Time Description, active time (t): 0 0 Session Start Time: 0 Session Stop Time: 0 Media Description, name and address (m): audio 14318 RTP/AVP 0 3 8 101 Media Type: audio Media Port: 14318 Media Proto: RTP/AVP Media Format: ITU-T G.711 PCMU Media Format: GSM 06.10 Media Format: ITU-T G.711 PCMA Media Format: 101 Media Attribute (a): rtpmap:0 PCMU/8000 Media Attribute Fieldname: rtpmap Media Format: 0 MIME Type: PCMU MIME type: 
PCMU Media Attribute (a): rtpmap:3 GSM/8000 Media Attribute Fieldname: rtpmap Media Format: 3 MIME Type: GSM MIME type: GSM Media Attribute (a): rtpmap:8 PCMA/8000 Media Attribute Fieldname: rtpmap Media Format: 8 MIME Type: PCMA MIME type: PCMA Media Attribute (a): rtpmap:101 telephone-event/8000 Media Attribute Fieldname: rtpmap Media Format: 101 MIME Type: telephone-event MIME type: telephone-event Media Attribute (a): fmtp:101 0-16 Media Attribute Fieldname: fmtp Media Format: 101 [telephone-event] Media format specific parameters: 0-16 Media Attribute (a): silenceSupp:off - - - - Media Attribute Fieldname: silenceSupp Media Attribute Value: off - - - -

Once again, reading a SIP message is very simple (this is one of the aims of the protocol). We can see david's address and port:

**INVITE sip:david@intra-calcman.org:38276;rinstance=90eafe1f95fbbfa3 SIP/2.0**

and much other useful information such as the Via header, including the "branch" transaction identifier. The Via header is needed to identify where the response is to be sent.

**Via: SIP/2.0/UDP 192.168.0.30:5060;branch=z9hG4bK01a93c8b;rport**

We have not yet talked about SDP. The Session Description Protocol is defined in RFC 4566 (July 2006). From the RFC:

When initiating multimedia teleconferences, voice-over-IP calls, streaming video, or other sessions, there is a requirement to convey media details, transport addresses, and other session description metadata to the participants. SDP provides a standard representation for such information, irrespective of how that information is transported. SDP is purely a format for session description -- it does not incorporate a transport protocol, and it is intended to use different transport protocols as appropriate, including the Session Announcement Protocol, Session Initiation Protocol, Real Time Streaming Protocol, electronic mail using the MIME extensions, and the Hypertext Transport Protocol. 
SDP is intended to be general purpose so that it can be used in a wide range of network environments and applications. However, it is not intended to support negotiation of session content or media encodings: this is viewed as outside the scope of session description.

For our purpose, we only need to know that SDP syntax is used to define call parameters. In the previous datagram, we see that RTP will be used between david and julien. While OpenSER delivers the datagram to david, a "100 Trying" message is sent to julien. So, what happens on david's side after he has received this datagram?

No. Time Source Destination Protocol Info 33 172.296544 10.42.16.88 192.168.0.30 SIP Status: 180 Ringing

His softphone (we are using X-Lite) sends a "180 Ringing" message, which means exactly what it says… The Ringing response contains all the SIP information needed to identify the call:

User Datagram Protocol, Src Port: 38276 (38276), Dst Port: 5060 (5060) Source port: 38276 (38276) Destination port: 5060 (5060) Length: 435 Checksum: 0xb210 [correct] Session Initiation Protocol Status-Line: SIP/2.0 180 Ringing Status-Code: 180 [Resent Packet: False] Message Header Via: SIP/2.0/UDP 192.168.0.30:5060;branch=z9hG4bK01a93c8b;rport=5060 Contact: Contact Binding: URI: SIP contact address: sip:david@intra-calcman.org:38276 To: ;tag=79182b50 SIP to address: sip:david@intra-calcman.org:38276 SIP tag: 79182b50 From: "julien";tag=as1ea41f9e SIP Display info: "julien" SIP from address: sip:julien@intra-calcman.org SIP tag: as1ea41f9e Call-ID: 48f934056782cf6f581c024128f6c29c@192.168.0.30 CSeq: 102 INVITE User-Agent: X-Lite release 1003l stamp 30942 Content-Length: 0

OpenSER, the proxy, receives this response and sends a "180 Ringing" message to julien. To establish the call, david sends a "200 OK" datagram:

No. 
Time Source Destination Protocol Info 37 175.961865 10.42.16.88 192.168.0.30 SIP/SDP Status: 200 OK, with session description {......truncated...datagram......} User Datagram Protocol, Src Port: 38276 (38276), Dst Port: 5060 (5060) Source port: 38276 (38276) Destination port: 5060 (5060) Length: 785 Checksum: 0xb573 [correct] Session Initiation Protocol Status-Line: SIP/2.0 200 OK Status-Code: 200 [Resent Packet: False] Message Header Via: SIP/2.0/UDP 192.168.0.30:5060;branch=z9hG4bK01a93c8b;rport=5060 Contact: Contact Binding: URI: SIP contact address: sip:david@intra-calcman.org:38276 To: ;tag=79182b50 SIP to address: sip:david@intra-calcman.org:38276 SIP tag: 79182b50 From: "julien";tag=as1ea41f9e SIP Display info: "julien" SIP from address: sip:julien@intra-calcman.org SIP tag: as1ea41f9e Call-ID: 48f934056782cf6f581c024128f6c29c@192.168.0.30 CSeq: 102 INVITE Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, NOTIFY, MESSAGE, SUBSCRIBE, INFO Content-Type: application/sdp User-Agent: X-Lite release 1003l stamp 30942 Content-Length: 239 Message body Session Description Protocol Session Description Protocol Version (v): 0 Owner/Creator, Session Id (o): - 4 2 IN IP4 10.42.16.88 Owner Username: - Session ID: 4 Session Version: 2 Owner Network Type: IN Owner Address Type: IP4 Owner Address: 10.42.16.88 Session Name (s): CounterPath eyeBeam 1.5 Connection Information (c): IN IP4 10.42.16.88 Connection Network Type: IN Connection Address Type: IP4 Connection Address: 10.42.16.88 Time Description, active time (t): 0 0 Session Start Time: 0 Session Stop Time: 0 Media Description, name and address (m): audio 21008 RTP/AVP 0 3 8 101 Media Type: audio Media Port: 21008 Media Proto: RTP/AVP Media Format: ITU-T G.711 PCMU Media Format: GSM 06.10 Media Format: ITU-T G.711 PCMA Media Format: 101 Media Attribute (a): fmtp:101 0-15 Media Attribute Fieldname: fmtp Media Format: 101 Media format specific parameters: 0-15 Media Attribute (a): rtpmap:101 telephone-event/8000 Media 
Attribute Fieldname: rtpmap Media Format: 101 MIME Type: telephone-event MIME type: telephone-event Media Attribute (a): sendrecv Media Attribute (a): x-rtp-session-id:9A236EF6D78A42C39FFB8460582A5DE0 Media Attribute Fieldname: x-rtp-session-id Media Attribute Value: 9A236EF6D78A42C39FFB8460582A5DE0

As you can see, david also sends a session description. In fact, david chooses several parameters among those sent by julien and returns them. Those parameters will define the session. We can also note that the branch value is still the same.

===== Voice transport =====

David takes the call and sends an RTCP message. RTCP (RTP Control Protocol) is a protocol used to manage RTP (Real-time Transport Protocol) communication. Both protocols are defined in RFC 3550. I couldn't present it better than the RFC does:

RTP provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video or simulation data, over multicast or unicast network services. RTP does not address resource reservation and does not guarantee quality-of-service for real-time services. The data transport is augmented by a control protocol (RTCP) to allow monitoring of the data delivery in a manner scalable to large multicast networks, and to provide minimal control and identification functionality.

I will deal with RTP later; for now we have to study the RTCP packet that david sent just before he accepted the call. Why did he send an RTCP packet before sending the "200 OK" that accepts the call? There is no official reason, so we have to assume that it is a choice made by the X-Lite developers... RTCP has been designed to manage the RTP protocol. 
It performs four functions:

  - Provide feedback on the quality of the data distribution;
  - Carry a persistent identifier for an RTP source: the canonical name or CNAME;
  - Let every participant observe the number of participants (since all of them send RTCP packets for the first two functions), and use that number to calculate the rate at which RTCP packets are sent;
  - Provide minimal information about the participants. This last function is optional.

There are several packet types for RTCP:

  * SR : Sender Report (transmission/reception statistics from active senders);
  * RR : Receiver Report (reception statistics from participants that are not active senders, also used together with an SR by active senders that report on more than 31 sources);
  * SDES : Source Description, including CNAME;
  * BYE : a participant has left the session;
  * APP : application-specific functions.

So, now, let's see the RTCP packet sent by david:

No. Time Source Destination Protocol Info 36 175.934060 10.42.16.88 10.42.16.48 RTCP Receiver Report {......truncated...datagram......} User Datagram Protocol, Src Port: 21009 (21009), Dst Port: 14319 (14319) Source port: 21009 (21009) Destination port: 14319 (14319) Length: 140 Checksum: 0x3bf0 [correct] Real-time Transport Control Protocol (Receiver Report) [Stream setup by SDP (frame 34)] [Setup frame: 34] [Setup Method: SDP] 10.. .... = Version: RFC 1889 Version (2) ..0. .... = Padding: False ...0 0000 = Reception report count: 0 Packet type: Receiver Report (201) Length: 1 Sender SSRC: 739353178 Real-time Transport Control Protocol (Source description) [Stream setup by SDP (frame 34)] [Setup frame: 34] [Setup Method: SDP] 10.. .... = Version: RFC 1889 Version (2) ..0. .... = Padding: False ...0 0001 = Source count: 1 Packet type: Source description (202) Length: 30 Chunk 1, SSRC/CSRC 739353178 Identifier: 739353178 SDES items Type: CNAME (user and domain) (1) Length: 61 Text: 7260F2D3A8994DF8B7C89FD1A725211A@unique.z4140B17CA7EF45E8.org Type: PRIV (private extensions) (8) Length: 49 Prefix length: 16 Prefix string: x-rtp-session-id Text: 9A236EF6D78A42C39FFB8460582A5DE0 Type: END (0)

The datagram is a compound RTCP packet: an empty Receiver Report (reception report count 0) followed by an SDES packet, which provides david's CNAME. 
The CNAME normally has the form user@host; here X-Lite appears to fill both parts with randomly generated values. In fact, the CNAME is not the primary identifier of an RTP communication. This functionality is supplied by the SSRC number (which will be described later). But because the SSRC number could change during a session, the CNAME is used to identify a participant in any case. For david, the CNAME is:

**7260F2D3A8994DF8B7C89FD1A725211A@unique.z4140B17CA7EF45E8.org**

David, who is very talkative, also sent the first RTP packet:

No. Time Source Destination Protocol Info 38 175.989523 10.42.16.88 192.168.0.30 RTP Payload type=ITU-T G.711 PCMU, SSRC=739353178, Seq=1424, Time=2097700, Mark {......truncated...datagram......} User Datagram Protocol, Src Port: 21008 (21008), Dst Port: 14318 (14318) Source port: 21008 (21008) Destination port: 14318 (14318) Length: 180 Checksum: 0x78d0 [correct] Real-Time Transport Protocol 10.. .... = Version: RFC 1889 Version (2) ..0. .... = Padding: False ...0 .... = Extension: False .... 0000 = Contributing source identifiers count: 0 1... .... = Marker: True Payload type: ITU-T G.711 PCMU (0) Sequence number: 1424 Timestamp: 2097700 Synchronization Source identifier: 739353178 Payload: FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF...

RTP packets are very simple. Their main goal is to deliver payloads (data samples). So, the header contains only the codec used (Payload Type), the sequence number (to detect lost packets and restore the sending order), the timestamp (the sampling instant of the first sample in the payload, used for playout timing and jitter computation) and the SSRC we have already talked about. The SSRC is a randomly chosen 32-bit number that identifies one specific participant, or more precisely one stream sent by a participant within the session. The SSRC can change during a session (for example after a collision between two sources, or when an application restarts), and that is why the CNAME exists. If nothing of the sort happens, the SSRC stays the same until the end of the conversation. The payload is the voice, the video or anything you want, encoded with the codec.

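The fixed RTP header is only 12 bytes long, so it is easy to decode by hand. The following Python sketch parses it with the standard ''struct'' module; the function name ''parse_rtp_header'' is hypothetical, and the test packet is rebuilt from the field values of the capture above rather than taken from the raw bytes, which are not available here.

```python
import struct

def parse_rtp_header(packet):
    """Decode the 12-byte fixed RTP header (CSRC list and extensions ignored)."""
    if len(packet) < 12:
        raise ValueError("an RTP header needs at least 12 bytes")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

# Header rebuilt from the values shown in the capture; the payload is dummy data
header = struct.pack("!BBHII", 0x80, 0x80, 1424, 2097700, 739353178)
fields = parse_rtp_header(header + b"\xff" * 160)
print(fields)
```

The 160-byte payload used here is consistent with the UDP length of 180 in the capture (8 bytes of UDP header, 12 bytes of RTP header, 160 bytes of audio), which is 20 ms of G.711 at 8000 samples per second.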
===== Closing the call =====

When david wants to leave the communication, he sends a SIP BYE datagram such as this one to the proxy:

No. Time Source Destination Protocol Info 3090 79.192225 10.42.16.88 192.168.0.30 SIP Request: BYE sip:julien@10.42.16.48:5060;transport=UDP {.....truncated...datagram.......} Session Initiation Protocol Request-Line: BYE sip:julien@10.42.16.48:5060;transport=UDP SIP/2.0 Method: BYE Resent Packet: False Message Header Max-Forwards: 70 From: ;tag=13701 SIP from address: sip:david@intra-calcman.org SIP tag: 13701 To: ;tag=6868 SIP to address: sip:julien@intra-calcman.org SIP tag: 6868 CSeq: 602 BYE Call-ID: 11665@10.42.16.88 Route: Via: SIP/2.0/UDP 10.42.16.88:5060;rport;branch=z9hG4bK17673 Content-Length: 0

This is a classic SIP message: david sends a BYE request to julien via the OpenSER proxy. The proxy forwards the BYE request to julien, who answers the proxy with a 200 OK datagram. Finally, the proxy forwards the 200 OK datagram to david, and the call is closed.

No. Time Source Destination Protocol Info 3095 79.356865 10.42.16.24 10.6.0.108 SIP Status: 200 OK {.....truncated...datagram.....} User Datagram Protocol, Src Port: sip (5060), Dst Port: sip (5060) Source port: sip (5060) Destination port: sip (5060) Length: 306 Checksum: 0x0432 [correct] Session Initiation Protocol Status-Line: SIP/2.0 200 OK Status-Code: 200 Resent Packet: False Message Header Max-Forwards: 70 Record-Route: From: ;tag=13701 SIP from address: sip:julien@intra-calcman.org SIP tag: 13701 To: ;tag=6868 SIP to address: sip:david@intra-calcman.org SIP tag: 6868 CSeq: 602 BYE Call-ID: 11665@10.42.16.88 Via: SIP/2.0/UDP 10.42.16.88:5060;rport=5060;branch=z9hG4bK17673 Content-Length: 0

===== Conclusion =====

This page is a non-exhaustive overview of the SIP and RTP protocols. If you want more precise information, I recommend reading the RFCs. Moreover, if you find mismatching values on this page, it is because I did not write it in one go. 
So, several packets come from different communications.

Note: this article is taken from a university project carried out by David Bigot and myself during our last year of master's studies at [[http://iriaf.univ-poitiers.fr/|University of Poitiers]], in 2007.