1.1 Glossary

The following terms are defined in [MS-GLOS]:

authentication
base64
cipher
datagram
encryption
network address translation (NAT)
Unicode string
User Datagram Protocol (UDP)
UTF-8

The following terms are specific to this document:

cipher block chaining (CBC): A DES mode of operation that chains blocks of cipher text as specified in [FIPS46-3].

codec: Short for encoder/decoder. An algorithm used to convert media between digital formats, especially between raw media data (for example, audio or video) and a format that is more suitable for a particular purpose (for example, reducing size in the context of RTP). The conversion from raw data is regarded as the encoding step, and the conversion back to raw data is regarded as the decoding step.

conference: An RTP session involving multiple participants.

connectionless protocol: A transport protocol by means of which endpoints communicate without a prior connection arrangement, and in which each packet is treated independently as a datagram. Examples of this type of protocol include Internet Protocol (IP) and User Datagram Protocol (UDP).

connection-oriented transport protocol: A transport protocol by means of which endpoints communicate after first establishing a connection, and in which each packet is treated according to the connection state. An example of this type of protocol is Transmission Control Protocol (TCP).

contributing source (CSRC): A source of a stream of RTP packets that has contributed to the combined stream produced by an RTP mixer. The mixer inserts a list of the synchronization source (SSRC) identifiers of the sources that contributed to the generation of a particular packet into the RTP header of that packet. This list is called the CSRC list. An example application is audio conferencing where a mixer indicates all the talkers whose speech was combined to produce the outgoing packet, allowing the receiver to indicate the current talker, even though all the audio packets contain the same SSRC identifier (that of the mixer). See [RFC3550] section 3.

Data Encryption Standard (DES): An encryption standard that specifies a FIPS approved cryptographic algorithm as specified in [FIPS46-3].

Dual Tone Multiple Frequency (DTMF): The signaling system used in telephony systems, in which each digit is associated with two specific frequencies. Most commonly associated with telephone touch-tone keypads.
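
The two-frequency association can be illustrated with a small lookup. The following is a minimal sketch that assumes the standard touch-tone keypad layout and its row and column frequencies; the names ROW_HZ, COL_HZ, KEYPAD, and dtmf_frequencies are hypothetical and are not defined by this document.

```python
# Standard touch-tone frequencies in Hz: one low-group tone per keypad row
# and one high-group tone per keypad column.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key: str) -> tuple[int, int]:
    """Return the (row, column) frequency pair for a single keypad symbol."""
    if len(key) != 1:
        raise ValueError("expected exactly one keypad symbol")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col != -1:
            return ROW_HZ[row], COL_HZ[col]
    raise ValueError(f"not a DTMF symbol: {key!r}")
```

For example, dtmf_frequencies("5") yields the pair for the second row and second column of the keypad.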

forward error correction (FEC): A mechanism in which a sender uses redundancy to enable a receiver to recover from packet loss.
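
As an illustration of redundancy-based recovery, the following sketch uses a single XOR parity packet over a group of equal-length packets. This is a generic technique chosen for clarity and is not necessarily the FEC scheme used by this protocol; xor_parity and recover_missing are hypothetical names.

```python
def xor_parity(packets: list[bytes]) -> bytes:
    """Return the byte-wise XOR of a group of equal-length packets."""
    if not packets:
        raise ValueError("need at least one packet")
    size = len(packets[0])
    if any(len(p) != size for p in packets):
        raise ValueError("this sketch assumes equal-length packets")
    parity = bytearray(size)
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received: list[bytes], parity: bytes) -> bytes:
    """Rebuild the one missing packet from the survivors plus the parity."""
    return xor_parity(received + [parity])
```

A sender would transmit the parity packet alongside the group; a receiver that loses exactly one packet of the group can rebuild it, at the cost of one extra packet per group.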

jitter: Variation in the network delay that is perceived by the receiver for each packet.
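
One common way to quantify jitter is the interarrival jitter estimator described in [RFC3550] section 6.4.1, which smooths the change in packet transit time with a gain of 1/16. The sketch below assumes that arrival times are expressed in the same units as the RTP timestamp; update_jitter is a hypothetical helper name.

```python
def update_jitter(jitter, prev_transit, arrival, rtp_timestamp):
    """Return (new_jitter, transit) after one packet arrives.

    arrival and rtp_timestamp must use the same clock units.
    prev_transit is None for the first packet of a stream.
    """
    transit = arrival - rtp_timestamp
    if prev_transit is None:
        # First packet: nothing to compare against yet.
        return jitter, transit
    d = abs(transit - prev_transit)
    # Move the estimate 1/16 of the way toward the latest difference.
    jitter += (d - jitter) / 16.0
    return jitter, transit


# Usage: three packets sent 160 units apart; the third arrives 20 units late.
j, t = 0.0, None
for arrival, ts in [(0, 0), (160, 160), (340, 320)]:
    j, t = update_jitter(j, t, arrival, ts)
```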

message digest algorithm 5 (MD5): A cryptographic hash function that generates 128 bits of hash value as specified in [RFC1321].

mixer: An intermediate system that receives RTP packets from one or more sources, possibly changes the data format, combines the packets in some manner and then forwards a new RTP packet. Because the timing among multiple input sources will not generally be synchronized, the mixer will make timing adjustments among the streams and generate its own timing for the combined stream. Thus, all data packets originating from a mixer will be identified as having the mixer as their synchronization source. See [RFC3550] section 3.

multimedia session: A set of concurrent RTP sessions among a common group of participants. For example, a video conference (which is a multimedia session) may contain an audio RTP session and a video RTP session. See [RFC3550] section 3.

non-RTP means: Protocols and mechanisms that may be needed in addition to RTP to provide a usable service. In particular, for multimedia conferences, a control protocol may distribute multicast addresses and keys for encryption, negotiate the encryption algorithm to be used, and define dynamic mappings between RTP payload type values and the payload formats they represent for formats that do not have a predefined payload type value. Examples of such protocols include the Session Initiation Protocol (SIP) ([RFC3261]), ITU Recommendation H.323, and applications using SDP ([RFC2327]), such as RTSP ([RFC2326]). For simple applications, electronic mail or a conference database may also be used. See [RFC3550] section 3.

packetization time (P-time): For audio, the amount (in milliseconds) of audio data that is sent in a single Real-Time Transport Protocol (RTP) packet.
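
The relationship between P-time, sampling rate, and packet rate is simple arithmetic. The following sketch uses hypothetical helper names and assumes a constant sampling rate.

```python
def samples_per_packet(clock_rate_hz: int, ptime_ms: int) -> int:
    """Number of audio samples carried by one packet of ptime_ms milliseconds."""
    return clock_rate_hz * ptime_ms // 1000


def packets_per_second(ptime_ms: int) -> float:
    """Packet rate that results from a given packetization time."""
    return 1000.0 / ptime_ms
```

For example, 8000 Hz audio with a P-time of 20 milliseconds carries 160 samples per packet at 50 packets per second. A larger P-time lowers per-packet header overhead but adds delay.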

participant: A user who is participating in a conference or peer-to-peer call. May also be used in reference to the object that is used to represent this participant in the implementation.

port: The "abstraction that transport protocols use to distinguish among multiple destinations within a given host computer. TCP/IP protocols identify ports using small positive integers." The transport selectors (TSEL) used by the OSI transport layer are equivalent to ports. RTP depends upon the lower-layer protocol to provide some mechanism such as ports to multiplex the RTP and RTCP packets of a session. See [RFC3550] section 3.

Real-Time Transport Protocol (RTP): A network protocol that provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio and video.

RTCP packet: A control packet consisting of a fixed header part similar to that of RTP packets, followed by structured elements that vary depending upon the RTCP packet type. Typically, multiple RTCP packets are sent together as a compound RTCP packet in a single packet of the underlying protocol; this is enabled by the length field in the fixed header of each RTCP packet. See [RFC3550] section 3.
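
The role of the length field can be shown by splitting a compound packet into its parts. The following is a minimal sketch assuming the common header layout in [RFC3550] section 6.4, where the 16-bit length field gives the packet size in 32-bit words minus one; split_compound_rtcp is a hypothetical name, and the sketch ignores encryption and padding.

```python
import struct


def split_compound_rtcp(data: bytes) -> list[tuple[int, bytes]]:
    """Split a compound RTCP packet into (packet_type, raw_packet) pairs."""
    packets = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 4:
            raise ValueError("truncated RTCP header")
        _flags, packet_type, length_words = struct.unpack(
            "!BBH", data[offset:offset + 4]
        )
        # The length field counts 32-bit words, excluding the first word.
        size = 4 * (length_words + 1)
        if offset + size > len(data):
            raise ValueError("RTCP length field exceeds available data")
        packets.append((packet_type, data[offset:offset + size]))
        offset += size
    return packets
```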

RTP packet: A data packet consisting of the fixed RTP header, a possibly empty list of contributing sources, and the payload data. Some underlying protocols may require an encapsulation of the RTP packet to be defined. Typically one packet of the underlying protocol contains a single RTP packet, but several RTP packets can be contained if permitted by the encapsulation method. See [RFC3550] section 3.
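
The packet structure described above can be sketched as a parser. The following example assumes the fixed header layout in [RFC3550] section 5.1 and does not cover any extensions defined elsewhere in this document; RtpPacket and parse_rtp are hypothetical names.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpPacket:
    version: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc_list: list
    payload: bytes


def parse_rtp(data: bytes) -> RtpPacket:
    """Parse the fixed header, the CSRC list, and the payload of one packet."""
    if len(data) < 12:
        raise ValueError("the fixed RTP header is 12 bytes")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    has_padding = bool(b0 & 0x20)
    has_extension = bool(b0 & 0x10)
    csrc_count = b0 & 0x0F
    offset = 12 + 4 * csrc_count
    if len(data) < offset:
        raise ValueError("truncated CSRC list")
    csrc_list = list(struct.unpack(f"!{csrc_count}I", data[12:offset]))
    if has_extension:
        # Skip the header extension: 16-bit profile, 16-bit length in words.
        if len(data) < offset + 4:
            raise ValueError("truncated header extension")
        _profile, ext_words = struct.unpack("!HH", data[offset:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(data)
    if has_padding:
        # The last byte holds the padding length, including itself.
        pad = data[-1]
        if pad == 0:
            raise ValueError("invalid padding length")
        end -= pad
    if end < offset:
        raise ValueError("header fields exceed packet length")
    return RtpPacket(
        version=b0 >> 6,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
        csrc_list=csrc_list,
        payload=data[offset:end],
    )
```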

RTP payload: The data transported by RTP in a packet, for example audio samples or compressed video data. For more information, see [RFC3550] section 3.

RTP session: An association among a set of participants communicating with RTP. A participant may be involved in multiple RTP sessions at the same time. In a multimedia session, each medium is typically carried in a separate RTP session with its own RTCP packets unless the encoding itself multiplexes multiple media into a single data stream. A participant distinguishes multiple RTP sessions by reception of different sessions using different pairs of destination transport addresses, where a pair of transport addresses comprises one network address plus a pair of ports for RTP and RTCP. All participants in an RTP session may share a common destination transport address pair, as in the case of IP multicast, or the pairs may be different for each participant, as in the case of individual unicast network addresses and port pairs. In the unicast case, a participant may receive from all other participants in the session using the same pair of ports, or may use a distinct pair of ports for each. The distinguishing feature of an RTP session is that each maintains a full, separate space of SSRC identifiers. The set of participants included in one RTP session consists of those that can receive an SSRC identifier transmitted by any one of the participants either in RTP as the SSRC or a CSRC or in RTCP. For example, consider a three-party conference implemented using unicast UDP with each participant receiving from the other two on separate port pairs. If each participant sends RTCP feedback about data received from one other participant only back to that participant, the conference is composed of three separate point-to-point RTP sessions. If each participant provides RTCP feedback about its reception of one other participant to both of the other participants, the conference is composed of one multi-party RTP session. The latter case simulates the behavior that would occur with IP multicast communication among the three participants. The RTP framework allows the variations defined here, but a particular control protocol or application design will usually impose constraints on these variations. See [RFC3550] section 3.

Session Description Protocol (SDP): A protocol that is used for session announcement, session invitation, and other forms of multimedia session initiation [MS-SDP].

Session Initiation Protocol (SIP): An application-layer control (signaling) protocol for creating, modifying, and terminating sessions with one or more participants, as specified in [RFC3261].

silence suppression: A mechanism for conserving bandwidth by detecting silence in the audio input, and not sending packets that would only contain silence.
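
A simple way to picture silence detection is an energy threshold applied to each audio frame. The following sketch assumes frames of signed 16-bit PCM samples and an arbitrary, hypothetical threshold value; real detectors are typically more elaborate, and this document does not prescribe this method.

```python
import math


def is_silence(samples, threshold=500.0):
    """Treat a frame as silence when its RMS level is below the threshold."""
    if not samples:
        return True
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms < threshold


def frames_to_send(frames, threshold=500.0):
    """Keep only the frames that carry audible signal."""
    return [frame for frame in frames if not is_silence(frame, threshold)]
```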

stream: A flow of data from one host to another host. May also be used to refer to the flowing data.

synchronization source (SSRC): The source of a stream of RTP packets, identified by a 32-bit numeric SSRC identifier carried in the RTP header so as not to be dependent upon the network address. All packets from a synchronization source form part of the same timing and sequence number space, so a receiver groups packets by synchronization source for playback. Examples of synchronization sources include the sender of a stream of packets derived from a signal source such as a microphone or a camera, or an RTP mixer. A synchronization source may change its data format (for example, audio encoding) over time. The SSRC identifier is a randomly chosen value meant to be globally unique within a particular RTP session. A participant need not use the same SSRC identifier for all the RTP sessions in a multimedia session; the binding of the SSRC identifiers is provided through RTCP. If a participant generates multiple streams in one RTP session, for example from separate video cameras, each MUST be identified as a different SSRC. See [RFC3550] section 3.
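
Choosing an SSRC identifier can be sketched as drawing a random 32-bit value and retrying if it is already known to be in use. This is a simplified illustration; the full collision detection and resolution rules are in [RFC3550] section 8. The name choose_ssrc and the in_use parameter are hypothetical.

```python
import secrets


def choose_ssrc(in_use: set[int]) -> int:
    """Pick a random 32-bit SSRC that is not already seen in the session."""
    while True:
        candidate = secrets.randbits(32)
        if candidate not in in_use:
            return candidate
```

A participant that sends several streams in one RTP session would call this once per stream, adding each result to in_use so that every stream receives a distinct identifier.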

throttling: The enforcement of a limit on the frequency at which an action can occur.
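
One possible form of throttling is a minimum interval between consecutive actions. The following sketch is illustrative only; the Throttle class and its parameters are hypothetical, and the caller supplies the current time in seconds.

```python
class Throttle:
    """Allow an action at most once per min_interval_s seconds."""

    def __init__(self, min_interval_s: float):
        self.min_interval_s = min_interval_s
        self._last_allowed = None

    def allow(self, now_s: float) -> bool:
        """Return True if the action may occur at time now_s."""
        if (
            self._last_allowed is not None
            and now_s - self._last_allowed < self.min_interval_s
        ):
            return False
        self._last_allowed = now_s
        return True
```

Passing the time in explicitly keeps the sketch deterministic; an implementation would more likely read a monotonic clock.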

transport address: The combination of a network address and port that identifies a transport-level endpoint, for example an IP address and a UDP port. Packets are transmitted from a source transport address to a destination transport address. See [RFC3550] section 3.

video encapsulation: A mechanism for transporting video payload and metadata in RTP packets.

video frame: One of the still images that are shown in quick succession in a video.

MAY, SHOULD, MUST, SHOULD NOT, MUST NOT: These terms (in all caps) are used as described in [RFC2119]. All statements of optional behavior use either MAY, SHOULD, or SHOULD NOT.