1, What is SDP
SDP (Session Description Protocol), put plainly, is a text description of the capabilities of each communication endpoint (PC, Mac, Android, iOS, etc.). The capabilities here are the audio and video codecs each endpoint supports, the parameters set for those codecs, the transport protocol used, and the audio and video media included. Let's take a look at a real SDP fragment:
v=0
o=- 3409821183230872764 2 IN IP4 127.0.0.1
...
m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
...
a=rtpmap:111 opus/48000/2
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
...
As the SDP fragment above shows, the SDP describes one audio stream, i.e. m=audio. The payload types supported by this audio stream include 111, 103, 104, etc., and the fragment then describes payload types such as 111, 103 and 104 in more detail. For example, a=rtpmap:111 opus/48000/2 indicates that data with payload type 111 is Opus-encoded audio with a sampling rate of 48000 Hz and two channels. By analogy, a=rtpmap:104 ISAC/32000 means that the audio data is encoded with ISAC at a sampling rate of 32000 Hz, in mono (when the channel count is omitted, one channel is assumed).
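To make the reading of a=rtpmap lines concrete, here is a minimal Python sketch that pulls the payload type, codec name, sampling rate and channel count out of such a line. The function name parse_rtpmap and the regular expression are only illustrative assumptions, not part of any WebRTC API, and the pattern covers only the simple form seen in the fragment above.

```python
import re

# Hypothetical helper: matches "a=rtpmap:<pt> <codec>/<rate>[/<channels>]".
RTPMAP_RE = re.compile(r"^a=rtpmap:(\d+) ([^/]+)/(\d+)(?:/(\d+))?$")

def parse_rtpmap(line):
    """Return (payload_type, codec, clock_rate, channels) for an a=rtpmap line."""
    m = RTPMAP_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not an rtpmap line: {line!r}")
    pt, codec, rate, channels = m.groups()
    # The channel count is optional; when it is omitted, one channel is assumed.
    return int(pt), codec, int(rate), int(channels) if channels else 1

print(parse_rtpmap("a=rtpmap:111 opus/48000/2"))  # (111, 'opus', 48000, 2)
print(parse_rtpmap("a=rtpmap:104 ISAC/32000"))    # (104, 'ISAC', 32000, 1)
```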
2, The role of SDP
In the previous RTSP experiment, we used SDP to describe the communication ports between the server and the client. Let's now expand on how SDP is used in WebRTC.
When two WebRTC clients / browsers make a 1-to-1 call, they must first carry out signaling, and an important part of that signaling is the SDP exchange. The purpose of the SDP exchange is to let each party know what capabilities the other has, and then to negotiate, based on those capabilities, the audio and video codecs both sides accept, the codec parameters (such as number of audio channels, sampling rate, etc.), the transport protocol and other information. For example, suppose A and B communicate. Each first records its supported audio parameters, video parameters, transport protocols and other information in an SDP, and then sends that SDP to the other side through the signaling server. When one party receives the SDP from the opposite end, it compares the received SDP with its own and takes the intersection of the two. This intersection is the result of the negotiation, that is, the audio and video parameters and transport protocol they finally use.
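The "take the intersection" step can be sketched in a few lines of Python. This is a simplified illustration only: the negotiate function and the (codec, rate, channels) tuples are assumptions made for the example, and a real offer/answer exchange also has to reconcile payload type numbers, fmtp parameters, stream direction and the transport protocol.

```python
def negotiate(offer, local):
    """Keep the offered codecs that the local side also supports, in offer order."""
    # Codec names are compared case-insensitively.
    supported = {(name.lower(), rate, ch) for name, rate, ch in local}
    return [c for c in offer if (c[0].lower(), c[1], c[2]) in supported]

# Hypothetical capability lists for endpoints A and B.
offer_a = [("opus", 48000, 2), ("ISAC", 16000, 1), ("PCMU", 8000, 1)]
local_b = [("PCMA", 8000, 1), ("PCMU", 8000, 1), ("opus", 48000, 2)]

print(negotiate(offer_a, local_b))  # [('opus', 48000, 2), ('PCMU', 8000, 1)]
```

Keeping the offerer's order preserves its codec preference, which is why the result is built by walking the offer rather than by a plain set intersection.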
3, Standard SDP specification
Here is the address of the official specification first: https://datatracker.ietf.org/doc/html/rfc4566#page-24
SDP information consists of multiple lines of the form "<type>=<value>", where <type> is exactly one case-significant character and <value> is a text string whose format depends on <type>. The whole protocol is case sensitive, and spaces are not allowed on either side of "=". An SDP session description consists of one session-level description and zero or more media-level descriptions. The session-level description applies to the whole session and runs from the "v=" line to the first media description. A media-level description describes a single media stream, such as the video stream being transmitted, and runs from an "m=" line to the next media description, as shown in the following figure:
The session-level description mainly includes the following fields (fields marked with * are optional):
v= (SDP version number, with no minor version)
o= (originator and session identifier)
s= (session name)
i=* (session information)
u=* (URI of description)
e=* (email address)
p=* (phone number)
c=* (connection information - not required if included in all media)
b=* (bandwidth information)
Time description
t= (time the session is active)
r=* (zero or more repeat times)
The media-level description mainly includes the following fields:
m= (media name and transport address)
i=* (media title)
c=* (connection information - optional if included at session level)
b=* (bandwidth information)
k=* (encryption key)
a=* (zero or more media attribute lines)
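The split between session-level and media-level descriptions can be illustrated with a short Python sketch. The function split_sdp is a hypothetical helper written for this article, and the sample text is assembled from lines that appear in the examples here. It only groups lines and does not validate field order or content.

```python
def split_sdp(text):
    """Split an SDP into (session_lines, [media_lines, ...])."""
    session, media = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("m="):
            media.append([line])      # a new media-level description starts here
        elif media:
            media[-1].append(line)    # belongs to the most recent m= section
        else:
            session.append(line)      # everything before the first m= line
    return session, media

sample = """v=0
o=- 3409821183230872764 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2
m=video 9 UDP/TLS/RTP/SAVPF 100
a=rtpmap:100 VP8/90000
"""

session, media = split_sdp(sample)
print(len(session), [m[0].split()[0] for m in media])  # 4 ['m=audio', 'm=video']
```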
SDP example:
WebRTC example (modified and extended from standard SDP)
v=0
//SDP version number, always 0, as specified in rfc4566
o=- 7017624586836067756 2 IN IP4 127.0.0.1
//o=<username> <sess-id> <sess-version> <nettype> <addrtype> <unicast-address>
//If there is no username, - is used instead. 7017624586836067756 is the ID of the whole session, and 2 is the session version. If the session changes
//(for example the codec is changed) and the SDP is regenerated, the sess-id stays the same and the sess-version is increased by 1
s=-
//Session name; if there is none, - is used instead
t=0 0
//The two values are the start time and end time of the session. 0 here means there is no limit
a=group:BUNDLE audio video data
//Media that share one transport channel. Without this line, audio, video and data would each be sent through their own UDP port
a=msid-semantic: WMS h1aZ20mbQB0GSsq0YxLfJmiYWE9CBfGch97C
//WMS is the abbreviation of WebRTC Media Stream. This line declares that the client supports transmitting multiple streams at the same time. A stream can contain multiple tracks.
//Generally, if this is defined, the a=ssrc lines below carry msid, mslabel and similar attributes
m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 126
//m=audio indicates that the session contains audio. 9 nominally means that the audio is transmitted through port 9, but WebRTC generally does not use this port now. If it is set to 0, the audio is not transmitted.
//UDP/TLS/RTP/SAVPF is the protocol the user supports for transmitting audio: udp, tls and rtp mean that RTP packets are sent over UDP with TLS encryption,
//and SAVPF means that the feedback mechanism of SRTCP is used to control the communication. The following 111 103 104 9 0 8 106 105 13 126 are the codecs supported for the audio of this session; the lines further below describe them in detail
c=IN IP4 0.0.0.0
//This line gives the IP address to be used to receive or send audio. WebRTC uses ICE for transport and does not use this address
a=rtcp:9 IN IP4 0.0.0.0
//The address and port used to transmit RTCP; not used in WebRTC
a=ice-ufrag:khLS
a=ice-pwd:cxLzteJaJBou3DspNaPsJhlQ
//The above two lines are the security verification information used during ICE negotiation
a=fingerprint:sha-256 FA:14:42:3B:C7:97:1B:E8:AE:0C2:71:03:05:05:16:8F:B9:C7:98:E9:60:43:4B:5B:2C:28:EE:5C:8F3:17
//The above line is the authentication information required during DTLS negotiation
a=setup:actpass
//The above line means that this endpoint can act as either client or server during DTLS negotiation. Refer to rfc4145 and rfc4572
a=mid:audio
//The media ID used in the BUNDLE line above
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
//The above line indicates that audio level information is to be added to the RTP header. Refer to rfc6464
a=sendrecv
//The above line indicates two-way communication; the other values are recvonly, sendonly and inactive
a=rtcp-mux
//The above line indicates that RTP and RTCP packets are transmitted over the same port
//The following lines supplement the codec list of the m=audio line, giving the payload type, sampling rate, channels, etc.
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
//The above line shows that Opus supports using RTCP for congestion control. Refer to https://tools.ietf.org/html/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=fmtp:111 minptime=10;useinbandfec=1
//Optional supplementary parameters for Opus: minptime=10 means the minimum packetization time is 10 ms, and useinbandfec=1 means the in-band FEC feature of Opus is used
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:126 telephone-event/8000
a=ssrc:18509423 cname:sTjtznXLCNH7nbRw
//cname identifies a data source. The ssrc may change in case of a conflict, but the cname does not change. It also appears in the SDES of RTCP packets
//and is used for audio and video synchronization
a=ssrc:18509423 msid:h1aZ20mbQB0GSsq0YxLfJmiYWE9CBfGch97C 15598a91-caf9-4fff-a28f-3082310b2b7a
//The above line defines the relationship between the ssrc and the MediaStream and AudioTrack in WebRTC. The first value after msid is the stream ID and the second is the track ID
a=ssrc:18509423 mslabel:h1aZ20mbQB0GSsq0YxLfJmiYWE9CBfGch97C
a=ssrc:18509423 label:15598a91-caf9-4fff-a28f-3082310b2b7a
m=video 9 UDP/TLS/RTP/SAVPF 100 101 107 116 117 96 97 99 98
//Refer to m=audio above; the meaning is similar
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:khLS
a=ice-pwd:cxLzteJaJBou3DspNaPsJhlQ
a=fingerprint:sha-256 FA:14:42:3B:C7:97:1B:E8:AE:0C2:71:03:05:05:16:8F:B9:C7:98:E9:60:43:4B:5B:2C:28:EE:5C:8F3:17
a=setup:actpass
a=mid:video
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:4 urn:3gpp:video-orientation
a=extmap:5 http://www.ietf.org/id/draft-hol ... de-cc-extensions-01
a=extmap:6 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay
a=sendrecv
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:100 VP8/90000
a=rtcp-fb:100 ccm fir
//ccm is the abbreviation of codec control message (codec control using RTCP feedback messages), meaning that the RTCP feedback mechanism is supported for codec control.
//fir is the abbreviation of Full Intra Request, meaning that the receiver asks the sender to send a complete frame
a=rtcp-fb:100 nack
//Supports packet loss retransmission, refer to rfc4585
a=rtcp-fb:100 nack pli
//Supports key frame packet loss retransmission (pli is Picture Loss Indication), refer to rfc4585
a=rtcp-fb:100 goog-remb
//Supports using RTCP packets to control the sender's bitrate
a=rtcp-fb:100 transport-cc
//Refer to opus above
a=rtpmap:101 VP9/90000
a=rtcp-fb:101 ccm fir
a=rtcp-fb:101 nack
a=rtcp-fb:101 nack pli
a=rtcp-fb:101 goog-remb
a=rtcp-fb:101 transport-cc
a=rtpmap:107 H264/90000
a=rtcp-fb:107 ccm fir
a=rtcp-fb:107 nack
a=rtcp-fb:107 nack pli
a=rtcp-fb:107 goog-remb
a=rtcp-fb:107 transport-cc
a=fmtp:107 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
//Additional parameters for the H264 encoding options
a=rtpmap:116 red/90000
//FEC redundant coding. Generally, if this line is present in the SDP, the payload type in the RTP header is 116; otherwise it is the payload type of each codec
a=rtpmap:117 ulpfec/90000
//Supports ULP FEC, refer to rfc5109
a=rtpmap:96 rtx/90000
a=fmtp:96 apt=100
//The above two lines define the RTP payload type of retransmission packets for VP8
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=101
a=rtpmap:99 rtx/90000
a=fmtp:99 apt=107
a=rtpmap:98 rtx/90000
a=fmtp:98 apt=116
a=ssrc-group:FID 3463951252 1461041037
//In WebRTC, retransmission packets use a different ssrc from normal packets. In the above line, the former is the ssrc of normal RTP packets and the latter is the ssrc of retransmission packets
a=ssrc:3463951252 cname:sTjtznXLCNH7nbRw
a=ssrc:3463951252 msid:h1aZ20mbQB0GSsq0YxLfJmiYWE9CBfGch97C ead4b4e9-b650-4ed5-86f8-6f5f5806346d
a=ssrc:3463951252 mslabel:h1aZ20mbQB0GSsq0YxLfJmiYWE9CBfGch97C
a=ssrc:3463951252 label:ead4b4e9-b650-4ed5-86f8-6f5f5806346d
a=ssrc:1461041037 cname:sTjtznXLCNH7nbRw
a=ssrc:1461041037 msid:h1aZ20mbQB0GSsq0YxLfJmiYWE9CBfGch97C ead4b4e9-b650-4ed5-86f8-6f5f5806346d
a=ssrc:1461041037 mslabel:h1aZ20mbQB0GSsq0YxLfJmiYWE9CBfGch97C
a=ssrc:1461041037 label:ead4b4e9-b650-4ed5-86f8-6f5f5806346d
m=application 9 DTLS/SCTP 5000
c=IN IP4 0.0.0.0
a=ice-ufrag:khLS
a=ice-pwd:cxLzteJaJBou3DspNaPsJhlQ
a=fingerprint:sha-256 FA:14:42:3B:C7:97:1B:E8:AE:0C2:71:03:05:05:16:8F:B9:C7:98:E9:60:43:4B:5B:2C:28:EE:5C:8F3:17
a=setup:actpass
a=mid:data
a=sctpmap:5000 webrtc-datachannel 1024
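The rtx entries in the video section are tied to their primary codecs through the apt parameter of a=fmtp. The Python sketch below resolves that association. The function rtx_map is a hypothetical helper for illustration, and the input is limited to the rtpmap and fmtp lines copied from the WebRTC example.

```python
import re

def rtx_map(sdp_lines):
    """Map each rtx payload type to the codec name it retransmits."""
    names, apt = {}, {}
    for line in sdp_lines:
        m = re.match(r"a=rtpmap:(\d+) ([^/]+)/", line)
        if m:
            names[int(m.group(1))] = m.group(2)
        m = re.match(r"a=fmtp:(\d+) apt=(\d+)", line)
        if m:
            # apt = "associated payload type" of the retransmission stream
            apt[int(m.group(1))] = int(m.group(2))
    return {pt: names.get(target) for pt, target in apt.items()}

lines = [
    "a=rtpmap:100 VP8/90000",
    "a=rtpmap:101 VP9/90000",
    "a=rtpmap:107 H264/90000",
    "a=rtpmap:116 red/90000",
    "a=rtpmap:96 rtx/90000", "a=fmtp:96 apt=100",
    "a=rtpmap:97 rtx/90000", "a=fmtp:97 apt=101",
    "a=rtpmap:99 rtx/90000", "a=fmtp:99 apt=107",
    "a=rtpmap:98 rtx/90000", "a=fmtp:98 apt=116",
]

print(rtx_map(lines))  # {96: 'VP8', 97: 'VP9', 99: 'H264', 98: 'red'}
```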
RTSP streaming example
v=0
o=- 1586545639954157 1586545639954157 IN IP4 192.168.1.63
s=Media Presentation
e=NONE
b=AS:5100
t=0 0
a=control:rtsp://192.168.1.63:554/
m=video 0 RTP/AVP 96
c=IN IP4 0.0.0.0
b=AS:5000
a=recvonly
a=x-dimensions:1920,1080
a=control:rtsp://192.168.1.63:554/trackID=1
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=420029; packetization-mode=1; sprop-parameter-sets=Z01AKI2NQDwBE/LgLcBAQFAAAD6AAAw1DoYACYFAABfXgu8uNDAATAoAAL68F3lwoA==,aO44gA==
m=audio 0 RTP/AVP 8
c=IN IP4 0.0.0.0
b=AS:50
a=recvonly
a=control:rtsp://192.168.1.63:554/trackID=2
a=rtpmap:8 PCMA/8000
a=Media_header:MEDIAINFO=494D4B48010300000400000111710110401F000000FA000000000000000000000000000000000000;
a=appversion:1.0

v=0
o=34020000001320000010 0 0 IN IP4 192.168.1.202
s=Play
c=IN IP4 192.168.1.202
t=0 0
m=video 5500 RTP/AVP 96 97 98
a=rtpmap:96 PS/90000
a=rtpmap:97 MPEG4/90000
a=rtpmap:98 H264/90000
a=recvonly
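As a small illustration of how the a=fmtp line of the H264 stream in the first RTSP example can be read, the Python sketch below splits the parameters into a dictionary and base64-decodes sprop-parameter-sets to inspect the NAL unit type of each parameter set. The function parse_fmtp is a hypothetical helper. The NAL type is taken from the low 5 bits of the first decoded byte, where 7 is an SPS and 8 is a PPS in H.264.

```python
import base64

def parse_fmtp(line):
    """Parse 'a=fmtp:<pt> k=v; k=v' into (payload_type, {k: v})."""
    head, _, params = line.partition(" ")
    pt = int(head.split(":", 1)[1])
    out = {}
    for item in params.split(";"):
        # Split on the first "=" only, because base64 values may end in "=".
        key, _, value = item.strip().partition("=")
        if key:
            out[key] = value
    return pt, out

fmtp = ("a=fmtp:96 profile-level-id=420029; packetization-mode=1; "
        "sprop-parameter-sets=Z01AKI2NQDwBE/LgLcBAQFAAAD6AAAw1DoYACYFAAB"
        "fXgu8uNDAATAoAAL68F3lwoA==,aO44gA==")

pt, params = parse_fmtp(fmtp)
# Low 5 bits of the first byte of each parameter set give its NAL unit type.
nal_types = [base64.b64decode(p)[0] & 0x1F
             for p in params["sprop-parameter-sets"].split(",")]
print(pt, params["packetization-mode"], nal_types)  # 96 1 [7, 8]
```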