Streaming Session Protocol

Owner: Jordan Bayles


This document is formatted to follow Gitiles syntax, which includes several extensions (but not GitHub Extended Markdown).

Objective and Requirements

This document is the reference specification for the libcast Streaming Session Protocol. This spec was originally developed as part of the Google Cast v2 project over a decade ago, and has slowly evolved over time to meet the needs of various Cast devices and scenarios.

With the implementation of libcast and its ever-increasing adoption across the Cast ecosystem, this document will continue to evolve as the official standard for the Cast Streaming protocol.

Goals

  1. Provide a reference specification for consumers and implementers of the libcast library, so that proposed features can be properly discussed, designed, implemented, and maintained.

  2. Define interfaces for establishing, controlling, and terminating Cast streaming sessions (mirroring, remoting, and otherwise).

  3. Prioritize interoperability across the Cast ecosystem, so that Cast devices using the latest and greatest version of libcast are still, to the best of our ability, compatible with legacy devices no longer receiving updates. This specification will attempt to maintain compatibility with legacy Cast v2 devices, but the protocol is expected to grow and change over time.

Non-Goals

Unfortunately, some important Cast APIs are still defined by closed-source specifications and are therefore out of scope for this document. In particular, this document does not define the following:

  • Application control messaging: Some application control messages, such as LAUNCH and STOP, are included. However, there are likely other, unspecified messages used to manage higher-level app state. A functional implementation is provided for the receiver (openscreen::cast::ApplicationAgent), and a reference implementation for the sender (openscreen::cast::LoopingFileAgent).

  • Authentication messaging: On top of TLS, Cast provides additional authentication through the use of certificates and private keys. The authentication API is implemented in several directories under //cast, especially //cast/common/channel and //cast/common/certificate.

  • Keep-alive behavior: Although the control channel may (or may not) have more sophisticated behavior for keeping a session alive, in this specific protocol it is limited to a simple timeout, with deprecated PING/PONG messages kept in the specification.

  • Flinging messaging: there is a rich suite of messages in the media namespace used for controlling flinging sessions, i.e., sessions where the receiver is responsible for fetching content and controlling playback. This document focuses on streaming, both mirroring and remoting, and leaves flinging to closed-source documentation and APIs.

Background

Libcast

See the cast/README.md for more information about the libcast project.

Overview

The streaming session protocol defines how a streaming session interacts with standard Cast messages (in the com.google.cast namespace), mirroring-specific messages (in the urn:x-cast:com.google.cast.webrtc namespace), and remoting-specific messages (in the urn:x-cast:com.google.cast.remoting namespace).

In this context, “mirroring” refers to sending a real-time encoded stream of the sender’s screen to the receiver. “Remoting” refers to an optimization for when the sender’s screen is primarily composed of a video that the receiver can decode efficiently: instead of transcoding the video and streaming the result, the sender forwards the video to the receiver still encoded, and the receiver decodes it directly.

  • Launch and Termination Request from Sender: streaming is initiated using a standard com.google.cast LAUNCH request with the appId parameter set to a predefined 8-character alphanumeric app identifier. The full set of application IDs currently known to libcast is defined in the cast_streaming_app_ids.h header. Termination is via a standard STOP request.

  • Transport Session Negotiation: the sender sends a streaming-specific OFFER message describing the proposed session, and the receiver replies with an ANSWER message containing the answer object. The OFFER must be sent immediately after launch, and can be sent again at any time.

  • Presentation Request from Sender: a PRESENTATION request allows sender control over rendering transformations applied on the receiver. The transformation defines zoom, offset, and rotation; it enables the current letterboxing-elimination feature, overscan compensation (for flexibility, if needed), and rotation (intended for Android, not used by Chrome).

  • Keep Alive: both sender and receiver are expected to use media-level activity and/or transport layer activity for keeping the connection alive. There's no separate application-level keep alive message.

  • Application Protocol & Other Control Messages: the minimum set of control messages required for streaming is LAUNCH, STOP, and RECEIVER_STATUS. The full list is documented in the Session Control Messages section.

Detailed Design

Protocol Messages

This section defines the JSON payloads for various messages used in the Cast protocol.

Notes on Types

Message field types generally fall into one of three classes: primitives, structures, or collections of primitives or structures. The most common primitive types are string and int. Although this spec is generally written with C++ in mind, there is no technical reason an implementation could not be written in other languages. This specification's assumptions about primitive types are defined in the table below:

Name | Definition
int | A signed integer of at least 32 bits (int in C++)
uint32 | An unsigned 32-bit integer (uint32_t in C++)
string | An array of ASCII characters (char* or std::string in C++)

Common Message Fields

It is assumed that messages generally have all of the following properties:

Name | Type | Value/description
sessionId | int | Unique identifier of the session
seqNum | int | Request sequence number
type | string | The specific kind of message

Session Control Messages (com.google.cast)

Basic streaming session control occurs via com.google.cast messages (as currently defined with version=2), as follows:

  • LAUNCH (request): initiates the streaming session. For v2 streaming, the appId parameter must be set to 0F5096E8 if the session is audio and video, or 85CDB22F if the session is audio only. For these app IDs, the Cast receiver is expected to run a specific built-in streaming receiver that implements this specification.

  • LAUNCH_STATUS (reply): sent from the Cast receiver to indicate that the launch request succeeded.

  • LAUNCH_ERROR (reply): sent from the Cast receiver to indicate that the launch request failed.

  • GET_APP_AVAILABILITY (request, reply): name of both the request and response message for getting information about what applications are available.

  • GET_STATUS (request): requests the status of the receiver.

  • STATUS_RESPONSE (reply): response to a GET_STATUS request.

  • STOP (request): terminates the streaming session, and must terminate all underlying media streams from the sender to the receiver.

  • INVALID_REQUEST (reply): Optional message sent by the receiver whenever an invalid command is received.

  • RECEIVER_STATUS (reply): response to a GET_STATUS request.

LAUNCH

This message is sent from the sender to the receiver to initiate a streaming session.

Name | Type | Value/description
type | string | Must be LAUNCH.
requestId | int | A unique identifier for the request.
appId | string | The ID of the application to launch. For streaming, this is typically 0F5096E8 for A/V or 85CDB22F for audio-only.
appParams | object (optional) | An optional object containing application-specific parameters.
language | string (optional) | The preferred language for the application (e.g., “en-US”).
supportedAppTypes | array of string (optional) | A list of application types supported by the sender (e.g., “WEB”, “ANDROID_TV”).
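As a rough illustration of the LAUNCH table, the following C++ sketch builds a minimal LAUNCH payload, choosing between the two streaming app IDs listed in this section. MakeLaunchMessage is a hypothetical helper, not a libcast function; a real implementation would use a JSON library and may also populate the optional fields.

```cpp
#include <string>

// App IDs from this specification: audio+video and audio-only streaming.
constexpr char kAudioVideoStreamingAppId[] = "0F5096E8";
constexpr char kAudioOnlyStreamingAppId[] = "85CDB22F";

// Hypothetical helper: builds the JSON payload of a minimal LAUNCH request.
// Optional fields (appParams, language, supportedAppTypes) are omitted.
std::string MakeLaunchMessage(int request_id, bool audio_only) {
  const char* app_id =
      audio_only ? kAudioOnlyStreamingAppId : kAudioVideoStreamingAppId;
  return std::string("{\"type\":\"LAUNCH\",\"requestId\":") +
         std::to_string(request_id) + ",\"appId\":\"" + app_id + "\"}";
}
```

For example, MakeLaunchMessage(7, false) yields {"type":"LAUNCH","requestId":7,"appId":"0F5096E8"}.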
LAUNCH_STATUS

This message is sent from the receiver to the sender to indicate that the application launch was successful. Note that a full RECEIVER_STATUS message is typically sent immediately after this.

Name | Type | Value/description
responseType | string | Must be LAUNCH_STATUS.
launchRequestId | int | The requestId from the original LAUNCH request.
status | string | A status string; must be USER_ALLOWED.

LAUNCH_ERROR

This message is sent from the receiver to the sender if the application launch failed.

Name | Type | Value/description
responseType | string | Must be LAUNCH_ERROR.
requestId | int | The requestId from the original LAUNCH request.
reason | string | A string code indicating the reason for the failure (e.g., NOT_FOUND, SYSTEM_ERROR).

GET_APP_AVAILABILITY

A request from the sender to check if specific applications can be launched on the receiver. The response uses the same message name in its responseType field.

Request:

Name | Type | Value/description
type | string | Must be GET_APP_AVAILABILITY.
requestId | int | A unique identifier for the request.
appId | array of string | An array of application IDs to check.

Response:

Name | Type | Value/description
responseType | string | Must be GET_APP_AVAILABILITY.
requestId | int | The requestId from the request.
availability | object | An object whose keys are appIds and whose values are APP_AVAILABLE or APP_UNAVAILABLE.

GET_STATUS

A request from the sender to get the receiver's current status. It has no payload other than the common type and requestId fields.

RECEIVER_STATUS

A response sent by the receiver containing its current status. This can be in response to a GET_STATUS request or sent unsolicited when the receiver's state changes.

Name | Type | Value/description
responseType | string | Must be RECEIVER_STATUS.
requestId | int | The requestId from the GET_STATUS request, or 0 if unsolicited.
status | object | The main status object.
status.applications | array of object | A list of running applications. Each object contains appId, displayName, sessionId, transportId, isIdleScreen, etc.
status.volume | object | An object describing the device's volume state, with fields such as level, muted, and controlType.

STOP

This message is sent from the sender to the receiver to terminate a running application.

Name | Type | Value/description
type | string | Must be STOP.
requestId | int | A unique identifier for the request.
sessionId | string | The ID of the session to be terminated.

INVALID_REQUEST

Sent by the receiver when it receives a malformed or invalid request.

Name | Type | Value/description
responseType | string | Must be INVALID_REQUEST.
requestId | int | The requestId from the invalid request, if it could be parsed.
reason | string | A string code for the error (e.g., INVALID_COMMAND).

Discovery Messages (google.cast.receiver.discovery)

GET_DEVICE_INFO

A request from the sender for detailed information about the receiver device. Unusually, the response reuses the same value in its type field rather than using a responseType field.

Request:

Name | Type | Value/description
type | string | Must be GET_DEVICE_INFO.
requestId | int | A unique identifier for the request.

Response:

Name | Type | Value/description
type | string | Must be GET_DEVICE_INFO.
requestId | int | The requestId from the request.
deviceId | string | A unique identifier for the receiver device.
friendlyName | string | The user-configured device name.
deviceModel | string | The product model name.
capabilities | int | A bitmask of device capabilities.
controlNotifications | int | A flag for control notifications.

Setup Messages (com.google.cast.setup)

eureka_info

A response message providing detailed product and build information about the receiver hardware. The request for this message is not clearly defined in the code, but this response also uses a type field instead of a responseType.

Name | Type | Value/description
type | string | Must be eureka_info.
request_id | int | The request_id from the original request. Note the underscore.
response_code | int | A status code (e.g., 200 for OK).
response_string | string | A status string (e.g., “OK”).
data | object | An object containing the device details.
data.name | string | The friendly name of the device.
data.version | int | The version of this info structure.
data.device_info | object | An object containing device hardware details such as manufacturer and product_name.
data.build_info | object | An object containing software build details such as cast_build_revision.

Media Transport Messages (com.google.cast.webrtc)

A number of streaming-specific features are defined via the com.google.cast.webrtc namespace, which defines the following additional messages:

OFFER

This message is sent from the sender to the receiver to propose (or later renegotiate) the parameters of a streaming session.

Name | Type | Value/description
type | string | Must be OFFER.
offer | object | An Offer object

The OFFER request can be sent by the sender at any point in time during the session to renegotiate parameters of the session.

If the receiver returns an error response to the initial offer, the sender should immediately terminate the session and inform the receiver, unless it is able to generate a fallback offer.

A subsequent offer after a session is successfully established is only effective once an “ok” response is generated by the receiver. If an “error” response is generated, the already-established session should remain in effect.
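The rules above for initial versus subsequent offers can be summarized as a small sender-side decision function. This is a sketch only: OfferTracker, SessionConfig, and the SenderAction values are hypothetical names, and how a sender builds a fallback offer is left to the implementation.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for whatever negotiated parameters a sender keeps
// (selected streams, UDP port, etc.).
struct SessionConfig {
  std::string description;
};

enum class AnswerResult { kOk, kError };

enum class SenderAction {
  kUseNewConfig,        // ANSWER was "ok": the offered parameters take effect.
  kKeepExistingConfig,  // Renegotiation failed: the established session stays.
  kSendFallbackOffer,   // Initial offer failed, but a fallback offer exists.
  kTerminateSession,    // Initial offer failed and no fallback is available.
};

class OfferTracker {
 public:
  SenderAction OnAnswer(AnswerResult result, const SessionConfig& offered,
                        bool has_fallback_offer) {
    if (result == AnswerResult::kOk) {
      active_ = offered;
      return SenderAction::kUseNewConfig;
    }
    if (active_.has_value()) {
      // A session is already established; an error leaves it in effect.
      return SenderAction::kKeepExistingConfig;
    }
    return has_fallback_offer ? SenderAction::kSendFallbackOffer
                              : SenderAction::kTerminateSession;
  }

  const std::optional<SessionConfig>& active() const { return active_; }

 private:
  std::optional<SessionConfig> active_;  // Empty until the first "ok".
};
```

The key design point is that the tracker only replaces its active configuration on an "ok" answer, which mirrors the rule that a failed renegotiation must not disturb an established session.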

Example OFFER Message

For a full example OFFER message, see castv2/streaming_examples/offer.json.

ANSWER

This message is sent from the receiver to the sender in response to an OFFER.

Name | Type | Value/description
type | string | Must be ANSWER.
result | string | Must be ok or error.
error | object (optional) | Only populated if result is error. Error object
answer | object (optional) | Only populated if result is ok. Answer object

Example ANSWER message

For a full example ANSWER message, see castv2/streaming_examples/answer.json.

GET_CAPABILITIES

The “type” must be set to GET_CAPABILITIES, with no message body.

CAPABILITIES_RESPONSE

The “type” must be set to CAPABILITIES_RESPONSE, with the message body in a “capabilities” object. Note that the key_systems field has been deprecated and removed here.

Name | Type | Value/description
type | string | Must be set to CAPABILITIES_RESPONSE.
capabilities | object | A Capabilities object

Object Definitions

Offer Object Definition

The “type” must be set to “OFFER”, with the message body in an “offer” object. For a living reference, see libcast's offer_messages.h.

NOTE: the libcast implementation separates out supportedStreams into strongly typed audio_streams and video_streams arrays, but they are collated together when serialized to JSON.

Name | Type | Value/description
supportedStreams | array of Stream objects | An array of stream objects describing all acceptable stream formats that this endpoint supports. The sender only includes codecs it supports, and the order of the stream objects indicates the sender’s preference. The receiver can choose any stream it prefers, or the first stream it supports if it doesn’t have any preference. The receiver informs the sender of the selected stream objects in the sendIndexes field of the ANSWER object.
castMode | string | Indicates whether the offer is for “mirroring” or “remoting”. See CastMode in cast/streaming/public/constants.h.
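For a receiver with no particular preference, stream selection can be as simple as taking, in the sender's preference order, the first offered stream of each media kind whose codec it supports. The sketch below assumes at most one audio and one video stream is selected; OfferedStream, MediaKind, and SelectSendIndexes are hypothetical names, not libcast types.

```cpp
#include <set>
#include <string>
#include <vector>

enum class MediaKind { kAudio, kVideo };

// Simplified view of an offered Stream object: only the fields used for
// selection are kept.
struct OfferedStream {
  int index;
  MediaKind kind;
  std::string codec_name;  // e.g. "opus", "vp8"
};

// Returns candidate sendIndexes: the first supported stream of each kind,
// honoring the order (preference) of the sender's supportedStreams array.
std::vector<int> SelectSendIndexes(
    const std::vector<OfferedStream>& offer,
    const std::set<std::string>& supported_codecs) {
  std::vector<int> send_indexes;
  bool have_audio = false;
  bool have_video = false;
  for (const OfferedStream& stream : offer) {
    bool& have_kind =
        stream.kind == MediaKind::kAudio ? have_audio : have_video;
    if (have_kind || supported_codecs.count(stream.codec_name) == 0) {
      continue;  // Already picked this kind, or codec is unsupported.
    }
    have_kind = true;
    send_indexes.push_back(stream.index);
  }
  return send_indexes;
}
```

A receiver that does have preferences (for example, favoring a hardware-decoded codec) would replace the first-match rule with its own ranking.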
Stream object

The stream object contains a generic section common for both audio and video; it also contains an audio or video specific section based on the type specified in the generic section.

A note on codecs: the set of codec profiles supported for Cast playback and remoting is notably larger than the set supported for mirroring streams. Mirroring requires the sender to encode the stream, while playback and remoting only require the receiver to decode it, and implementing decoding support for a given codec is generally easier than implementing encoding support, more likely to have hardware support, and less likely to run into licensing issues.

Name | Type | Value/description
index | int | An identifier assigned by the initiator. The index of the first stream object MUST be 0, and each following index MUST be the previous index + 1.
type | string | The type of stream offered. Supported values are defined in Stream::Type in cast/streaming/public/offer_messages.h.
codecName | string | The codec. Supported values are defined in AudioCodec and VideoCodec in cast/streaming/public/constants.h. To be compliant, Cast receivers must implement at least opus if they support audio, and vp8 if they support video. Senders must implement at least the baseline codecs (h264 or vp8, and aac or opus).
codecParameter | string | The codec parameter, in accordance with RFC 6381. Also known as the “media type string”, as in Supported Media for Google Cast. Examples include avc1.64002A for H.264 level 4.2, and hev1.1.6.L150.B0 for H.265 Main profile, level 5.0.
rtpProfile | string | The supported RTP profile. Currently only cast is supported; codec is reserved for future interoperability, to indicate that codec-defined RTP profiles (defined by their respective RFCs) shall be used.
rtpPayloadType | int | The RTP payload type used for this stream. Valid values are in the range [96, 127]. See RtpPayloadType.
ssrc | uint32 | The RTP SSRC used for this stream. Values must be unique across all streams from this sender. All values are valid.
targetDelay | int | The desired total end-to-end latency, in milliseconds.
aesKey | string | The AES key to use. Must consist of exactly 32 hex digits. Both an AES key and an initialization vector mask are required: if either field is missing, this stream is invalid.
aesIvMask | string | The initialization vector mask to use. Must consist of exactly 32 hex digits. Required.
receiverRtcpEventLog | boolean (optional) | True to request that the receiver send its event log via RTCP; false otherwise.
receiverRtcpDscp | int (optional) | Requests that the receiver send RTCP packets using the indicated DSCP value. Typically this value is 46.
rtpExtensions | array of string (optional) | RTP extensions supported by the sender. Receivers can reply with the subset of these rtpExtensions that they also support.
timeBase | string (optional) | The time base used by this rtpPayloadType. Default value is 1/90000. Valid values have the form “1/<sample rate>”, where the sample rate is strictly positive.
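Several of the per-stream rules in the Stream object table are mechanical and easy to check on receipt of an OFFER. The following sketch validates index ordering, the rtpPayloadType range, SSRC uniqueness, and the format of aesKey and aesIvMask. StreamFields and ValidateStreams are hypothetical names; libcast's own definitions are in offer_messages.h and may apply additional checks.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Minimal subset of the Stream object fields checked below.
struct StreamFields {
  int index;
  int rtp_payload_type;
  std::uint32_t ssrc;
  std::string aes_key;
  std::string aes_iv_mask;
};

// True if `s` is exactly 32 hexadecimal digits (128 bits).
bool IsHex128(const std::string& s) {
  if (s.size() != 32) return false;
  for (char c : s) {
    if (!std::isxdigit(static_cast<unsigned char>(c))) return false;
  }
  return true;
}

// Checks a whole supportedStreams array; returns false on the first
// violation of the rules stated in the table.
bool ValidateStreams(const std::vector<StreamFields>& streams) {
  std::set<std::uint32_t> seen_ssrcs;
  for (std::size_t i = 0; i < streams.size(); ++i) {
    const StreamFields& s = streams[i];
    // Indexes start at 0 and increase by exactly 1.
    if (s.index != static_cast<int>(i)) return false;
    // Payload type must be in the dynamic range [96, 127].
    if (s.rtp_payload_type < 96 || s.rtp_payload_type > 127) return false;
    // SSRCs must be unique across all of this sender's streams.
    if (!seen_ssrcs.insert(s.ssrc).second) return false;
    // Both the AES key and the IV mask are required, 32 hex digits each.
    if (!IsHex128(s.aes_key) || !IsHex128(s.aes_iv_mask)) return false;
  }
  return true;
}
```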
Audio Stream object

Name | Type | Value/description
bitRate | int | The average bitrate, in bits per second, used by this rtpPayloadType.
channels | int | The number of audio channels used by this rtpPayloadType.
Video Stream object

Note that additional video codec information, such as codec profile and level, and video stream protection are not implemented by any current senders or receivers. If these features are needed in the future, they should be reimplemented.

TODO(crbug.com/471102790): The implementation includes profile and level fields, which contradicts the spec's claim that they are not implemented. The spec should be updated to reflect their presence.

Name | Type | Value/description
maxFrameRate | string | Maximum number of frames per second used by this rtpPayloadType. Note: receivers may ignore this field when providing constraints in the ANSWER message; in that case, the sender must respect those constraints.
maxBitRate | int | Maximum bitrate, in bits per second, used by this rtpPayloadType. Note: receivers may ignore this field when providing constraints in the ANSWER message; in that case, the sender must respect those constraints.
resolutions | array of Video resolution objects | Resolutions supported by this rtpPayloadType. Note: receivers may ignore this field when providing constraints in the ANSWER message; in that case, the sender must respect those constraints.
errorRecoveryMode | string (optional) | How the video stream is encoded. Default value is “castv2”. “castv2” means that the receiver cannot drop any video packets: there is no key frame or intra refresh after the first video frame in the session. “intra_mb_refresh” means that frames are encoded using intra macroblock refresh mode, so the receiver can drop a video frame and recover later after receiving new key frames or intra refresh macroblocks.

Video resolution object

Name | Type | Value/description
width | int | Width in pixels.
height | int | Height in pixels.

Error Object Definition

The “type” may be set to anything, but the “result” field must be present and set to “error”, with the message body in an “error” object.

Name | Type | Value/description
code | int32 | A code indicating what class of error occurred.
description | string | Description of the error.

For a comprehensive and up-to-date list of error codes, refer to the openscreen::Error::Code enum in platform/base/error.h.

Answer Object Definition

The “type” must be set to “ANSWER”, with the message body in an “answer” object. For a living reference, see libcast's answer_messages.h.

Name | Type | Value/description
udpPort | int | The UDP port used for all streams (RTP and RTCP) in this session. Valid values are 1 to 65535.
sendIndexes | array of int | The indexes of the streams chosen from the OFFER message.
ssrcs | array of uint32 | The RTP SSRCs used to send RTCP feedback for the streams indicated by sendIndexes above. Valid values are 0 to 2^32 - 1.
constraints | receiver constraints object (optional, but highly recommended) | Provides the detailed maximum capabilities of the receiver for processing the streams selected in sendIndexes above, including audio sampling rate and number of channels, video dimensions and rates, encoded bit rates, and target latency. A sender may alter video resolution or frame rate throughout a session; these constraints restrict how much data volume is allowed before the sender must subsample (e.g., downscale and/or reduce frame rate).
display | display description object (optional, but highly recommended) | Provides details about the display on the receiver, including dimensions (aspect ratio implied), scaling behavior, color profile, etc.
receiverRtcpEventLog | array of int (optional) | Indexes of the streams that will send an event log via RTCP. If this field is not present, the receiver does not support sending an event log via RTCP.
receiverRtcpDscp | array of int (optional) | Indexes of the streams that will use the DSCP values specified in the OFFER message for RTCP packets. If this field is not present, the receiver does not support DSCP.
rtpExtensions | array of string (optional) | The RTP extensions from the OFFER that the receiver also supports. If this field is not present, the receiver does not support any RTP extensions.
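A sender might sanity-check a received Answer object before configuring its streams. The sketch below checks the udpPort range and that each entry in sendIndexes refers to a distinct offered stream. It assumes that ssrcs[i] is the feedback SSRC for the stream at sendIndexes[i], so the two arrays must have equal length; AnswerFields and IsPlausibleAnswer are hypothetical names.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Subset of the Answer object used for the checks below.
struct AnswerFields {
  int udp_port;
  std::vector<int> send_indexes;
  std::vector<std::uint32_t> ssrcs;
};

// Sender-side sanity checks on a received Answer object.
bool IsPlausibleAnswer(const AnswerFields& answer, int offered_stream_count) {
  if (answer.udp_port < 1 || answer.udp_port > 65535) return false;
  // Assumed one-to-one pairing of sendIndexes and ssrcs.
  if (answer.send_indexes.size() != answer.ssrcs.size()) return false;
  std::set<int> seen;
  for (int index : answer.send_indexes) {
    // Each index must name a stream from the OFFER, at most once.
    if (index < 0 || index >= offered_stream_count ||
        !seen.insert(index).second) {
      return false;
    }
  }
  return true;
}
```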
Receiver Constraints Object Definition

Name | Type | Value/description
audio | audio receiver constraints object | Audio constraints. See below.
video | video receiver constraints object | Video constraints. See below.

Audio Receiver Constraints Object Definition

Name | Type | Value/description
codecName | string | Audio codec name. See AudioCodec in cast/streaming/public/constants.h.
maxSampleRate | int | Maximum supported sampling frequency (not necessarily the ideal sampling frequency).
maxChannels | int | Maximum number of audio channels supported. This number is interpreted relative to a standard speaker layout (e.g., 2 for left-and-right stereo, 5 for left + center + right + left surround + right surround).
minBitRate | int (optional) | Minimum encoded audio bits per second. If not specified, the sender will assume 32 kbps. Note: a receiver should never restrict minBitRate in an attempt to improve quality; it should reflect the true operational minimum.
maxBitRate | int | Maximum encoded audio bits per second. This is the lower of: 1) the maximum capability of the decoder; or 2) the maximum sustained data transfer rate (e.g., as limited by CPU, RAM bandwidth, etc.). If not specified, the sender will assume no greater than 320 kbps.
maxDelay | int (optional) | Maximum supported end-to-end latency for audio, in milliseconds. This is proportional to the size of the receiver's data buffers: assuming a very low-latency link between sender and receiver, this value indicates the amount of buffering that can be maintained (due to RAM capacity, etc.). If not provided, a default of 1200 ms should be used.
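Receivers may omit several audio constraint fields. A minimal way to apply the defaults from the audio receiver constraints table is shown below, assuming kbps means 1000 bits per second; AudioConstraintsJson, EffectiveAudioConstraints, and ApplyAudioDefaults are hypothetical names.

```cpp
#include <chrono>
#include <optional>

// Optional audio constraint fields as parsed from the ANSWER.
struct AudioConstraintsJson {
  std::optional<int> min_bit_rate;
  std::optional<int> max_bit_rate;
  std::optional<int> max_delay_ms;
};

// Constraints after defaults have been filled in.
struct EffectiveAudioConstraints {
  int min_bit_rate;
  int max_bit_rate;
  std::chrono::milliseconds max_delay;
};

// Fills omitted fields with the defaults stated in this specification.
EffectiveAudioConstraints ApplyAudioDefaults(const AudioConstraintsJson& in) {
  return EffectiveAudioConstraints{
      in.min_bit_rate.value_or(32000),   // 32 kbps
      in.max_bit_rate.value_or(320000),  // 320 kbps upper bound
      std::chrono::milliseconds(in.max_delay_ms.value_or(1200)),
  };
}
```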
Video Receiver Constraints Object Definition

Name | Type | Value/description
codecName | string (optional) | Video codec name. See VideoCodec in cast/streaming/public/constants.h. If omitted, these constraints apply to all video codecs.
maxPixelsPerSecond | double (optional) | Maximum pixel rate (width x height x frame rate). Note that this value can, and often will, be much less than the product of the fields in maxDimensions. The purpose of this field is to limit the overall maximum processing rate. A sender uses it, in conjunction with the fields below, to trade off between higher resolution with a lower frame rate and lower resolution with a higher frame rate. Example: a device may be capable of 62208000 pixels per second, which allows a sender to send 1280x720@60 or 1920x1080@30. In this example, maxDimensions might specify {width:1920, height:1080, frameRate:60}.
minResolution | resolution object (optional) | Minimum width and height in pixels. If not specified, the sender will assume a reasonable minimum with the same aspect ratio as maxDimensions and an area as close to 320x180 as possible. Note: a receiver should never restrict minResolution in an effort to improve quality; it should reflect the true operational minimum.
maxDimensions | dimensions object | Maximum width and height in pixels (not necessarily the ideal width or height), and the maximum frame rate (not necessarily the ideal frame rate).
minBitRate | int (optional) | Minimum encoded video bits per second. If not specified, the sender will assume 300 kbps. Note: a receiver should never restrict minBitRate in an effort to improve quality; it should reflect the true operational minimum.
maxBitRate | int | Maximum encoded video bits per second. This is the lower of: 1) the maximum capability of the decoder; or 2) the maximum sustained data transfer rate (e.g., as limited by CPU, RAM bandwidth, etc.).
maxDelay | int (optional) | Maximum supported end-to-end latency for video, in milliseconds. This is proportional to the size of the receiver's data buffers: assuming a very low-latency link between sender and receiver, this value indicates the amount of buffering that can be maintained (due to RAM capacity, etc.). If not provided, a default of 1200 ms should be used.
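The interaction between maxDimensions and maxPixelsPerSecond can be checked directly: a candidate mode must fit within each dimension limit and also within the overall pixel rate. This hypothetical helper (VideoMode and FitsConstraints are not libcast names) reproduces the example from the video receiver constraints table.

```cpp
// A candidate (or maximum) video mode: width x height @ frame rate.
struct VideoMode {
  int width;
  int height;
  double frame_rate;
};

// True if `mode` fits within maxDimensions AND maxPixelsPerSecond.
// Both limits must hold at the same time.
bool FitsConstraints(const VideoMode& mode, const VideoMode& max_dimensions,
                     double max_pixels_per_second) {
  if (mode.width > max_dimensions.width ||
      mode.height > max_dimensions.height ||
      mode.frame_rate > max_dimensions.frame_rate) {
    return false;
  }
  const double pixel_rate =
      static_cast<double>(mode.width) * mode.height * mode.frame_rate;
  return pixel_rate <= max_pixels_per_second;
}
```

With maxDimensions of 1920x1080@60 and maxPixelsPerSecond of 62208000, both 1280x720@60 (55,296,000 pixels per second) and 1920x1080@30 (62,208,000) fit, while 1920x1080@60 (124,416,000) does not, even though it is within maxDimensions.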
Resolution Object Definition

Name | Type | Value/description
width | int | Width, in pixels.
height | int | Height, in pixels.

Dimensions Object Definition

Name | Type | Value/description
width | int | Width, in pixels.
height | int | Height, in pixels.
frameRate | string | Frame rate, specified as a decimal integer or a rational number (e.g., “30” or “30000/1001”).
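Because frameRate (and likewise the Stream object's timeBase) is carried as a string that may be either an integer or a ratio, a parser needs to accept both forms. The sketch below is hypothetical and deliberately strict: it accepts only unsigned decimal integers, optionally followed by a slash and a non-zero denominator.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Parses "N" or "N/D" (N, D unsigned decimal integers) into a double.
// Returns std::nullopt for malformed input or a zero denominator.
std::optional<double> ParseRational(const std::string& text) {
  const std::size_t slash = text.find('/');
  const std::string num_str = text.substr(0, slash);
  const std::string den_str =
      slash == std::string::npos ? "1" : text.substr(slash + 1);
  auto parse_digits = [](const std::string& s) -> std::optional<long long> {
    // At most 18 digits, so the value cannot overflow long long.
    if (s.empty() || s.size() > 18) return std::nullopt;
    long long value = 0;
    for (char c : s) {
      if (c < '0' || c > '9') return std::nullopt;
      value = value * 10 + (c - '0');
    }
    return value;
  };
  const std::optional<long long> num = parse_digits(num_str);
  const std::optional<long long> den = parse_digits(den_str);
  if (!num || !den || *den == 0) return std::nullopt;
  return static_cast<double>(*num) / static_cast<double>(*den);
}
```

For example, “30000/1001” parses to roughly 29.97, and “1/90000” to the default RTP time base.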
Display Description Object Definition

Name | Type | Value/description
dimensions | dimensions object (optional) | If present, the receiver is attached to a fixed display having the given dimensions and frame rate (vsync) configuration. These dimensions may be larger than, equal to, or smaller than those in the constraints. If undefined, the receiver is assumed not to be attached to a fixed display (e.g., it is a panel in a Hangouts UI). The sender uses this to decide the best way to sample, capture, and encode the content to optimize the viewing experience.
aspectRatio | string (optional) | The aspect ratio, in “#:#” format, when the receiver is attached to a fixed display. When missing and dimensions are specified, the sender will assume pixels are square, and the dimensions imply the aspect ratio of the fixed display. When present and dimensions are also specified, this implies the display pixels are not square.
scaling | string (optional) | One of: “sender”: the sender must scale and letterbox the content and provide video frames of a fixed aspect ratio; or “receiver”: the sender may send arbitrarily sized frames, and the receiver will handle scaling and letterboxing as necessary for proper display.
Support for color balance profile, bit depth, and other properties has been discussed in the past, but never added to the spec or finalized.
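The aspectRatio rules for the display description can be sketched as follows: parse the “#:#” string when present, and otherwise derive the ratio from the display dimensions assuming square pixels. AspectRatio, ParseAspectRatio, and EffectiveAspectRatio are hypothetical names.

```cpp
#include <cstddef>
#include <numeric>
#include <optional>
#include <string>
#include <utility>

// An aspect ratio as {width part, height part}, e.g. {16, 9}.
using AspectRatio = std::pair<int, int>;

// Parses the "#:#" format; returns std::nullopt if malformed.
std::optional<AspectRatio> ParseAspectRatio(const std::string& text) {
  const std::size_t colon = text.find(':');
  if (colon == std::string::npos) return std::nullopt;
  auto parse = [](const std::string& s) -> int {
    if (s.empty() || s.size() > 6) return 0;  // 0 signals "invalid".
    int v = 0;
    for (char c : s) {
      if (c < '0' || c > '9') return 0;
      v = v * 10 + (c - '0');
    }
    return v;
  };
  const int w = parse(text.substr(0, colon));
  const int h = parse(text.substr(colon + 1));
  if (w <= 0 || h <= 0) return std::nullopt;
  return AspectRatio{w, h};
}

// Uses the explicit aspectRatio if present; otherwise reduces
// width:height, i.e. assumes square pixels. Requires width, height > 0.
AspectRatio EffectiveAspectRatio(int width, int height,
                                 const std::optional<AspectRatio>& explicit_ratio) {
  if (explicit_ratio) return *explicit_ratio;
  const int g = std::gcd(width, height);
  return {width / g, height / g};
}
```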
Capabilities Object Definition

Name | Type | Value/description
mediaCaps | array of string | List of media capabilities of the receiver. See AudioCapability and VideoCapability in cast/streaming/remoting_capabilities.h. The video value is deprecated and not used.
remoting | int | Remoting version of the receiver.
result | string | Indicates whether getting capabilities succeeded; must be either “ok” or “error”.

The audio value in mediaCaps is a special value that indicates support for a set of codecs defined as a “baseline” set. The current baseline set is defined in the Chrome RendererController as the following list:

  1. MP3
  2. PCM (Baseline, S16BE, S24BE, ALAW)
  3. Vorbis
  4. FLAC (Free Lossless Audio Codec)
  5. AMR (both narrow band and wide band)
  6. GSM MS (a special Microsoft version of GSM Full Rate)
  7. Enhanced AC-3 (Dolby Digital Plus)
  8. ALAC (Apple Lossless Audio Codec)
  9. AC3 (also known as Dolby Digital)
  10. DTS-HD Master Audio
  11. DTS:X (Profile 2, lossy)
  12. DTS Extended Surround
Some legacy receivers may report vp9 and hevc in their mediaCaps response, even if they cannot remote these codecs.

Remoting Messages (com.google.cast.remoting)

Finally, the com.google.cast.remoting namespace contains the following remoting-specific messages:

RPC

The “type” must be set to “RPC”, with the base64-encoded protobuf message stored as a string under the “rpc” key.

Protobuf messages are complex, and defined in the remoting.proto file.
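On the wire, an RPC message is just a JSON wrapper around the serialized protobuf. The sketch below assumes the standard base64 alphabet with “=” padding (RFC 4648) and takes already-serialized remoting.proto bytes as input; Base64Encode and MakeRpcMessage are hypothetical helpers, and protobuf serialization itself is out of scope here.

```cpp
#include <cstddef>
#include <string>

// Standard (RFC 4648) base64 encoding with '=' padding.
std::string Base64Encode(const std::string& bytes) {
  static constexpr char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  out.reserve((bytes.size() + 2) / 3 * 4);
  std::size_t i = 0;
  // Encode each full 3-byte group as 4 characters.
  for (; i + 2 < bytes.size(); i += 3) {
    const unsigned int n = (static_cast<unsigned char>(bytes[i]) << 16) |
                           (static_cast<unsigned char>(bytes[i + 1]) << 8) |
                           static_cast<unsigned char>(bytes[i + 2]);
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += kAlphabet[(n >> 6) & 63];
    out += kAlphabet[n & 63];
  }
  // Encode the 1 or 2 leftover bytes, padding with '='.
  const std::size_t rest = bytes.size() - i;
  if (rest > 0) {
    unsigned int n = static_cast<unsigned char>(bytes[i]) << 16;
    if (rest == 2) n |= static_cast<unsigned char>(bytes[i + 1]) << 8;
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += rest == 2 ? kAlphabet[(n >> 6) & 63] : '=';
    out += '=';
  }
  return out;
}

// Wraps already-serialized remoting.proto bytes in an RPC message.
std::string MakeRpcMessage(const std::string& serialized_proto) {
  return "{\"type\":\"RPC\",\"rpc\":\"" + Base64Encode(serialized_proto) +
         "\"}";
}
```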

Input (Draft)

The com.google.cast.remoting namespace also supports sending input events from receiver to sender.

INPUT

The “type” must be set to “INPUT”, with the base64-encoded protobuf message stored as a string under the “input” key.

Protobuf messages are complex, and defined in the input.proto file.

Media Status Messages (com.google.cast.media)

Most of these messages are not supported in libcast, and are instead used for controlling flinging sessions with the Cast SDKs.

TODO(crbug.com/471102790): evaluate including more media message types in libcast.

MEDIA_STATUS

Sent by a media application on the receiver to update the sender on the state of media playback.

Name | Type | Value/description
responseType | string | Must be MEDIA_STATUS.
requestId | int | An identifier for the request, or 0 if unsolicited.
media | array of object | An array containing one or more media status objects.
media[n].mediaSessionId | int | The ID of the media session.
media[n].playerState | string | The state of playback (e.g., PLAYING, PAUSED, IDLE).
media[n].currentTime | double | The current playback time, in seconds.
media[n].media | object | An object with metadata about the content, such as contentId and title.

Reference Schemas and Examples

TODO(crbug.com/471102790): rename castv2 folder to reflect its evergreen nature.

This specification is backed by two JSON Schemas:

  1. receiver_schema.json: containing core receiver control and status messages, including LAUNCH, STOP, GET_APP_AVAILABILITY, LAUNCH_STATUS, LAUNCH_ERROR, GET_STATUS, RECEIVER_STATUS, INVALID_REQUEST, GET_DEVICE_INFO, eureka_info, and MEDIA_STATUS.

  2. streaming_schema.json: containing messages specific to the streaming session, such as OFFER, ANSWER, GET_CAPABILITIES, CAPABILITIES_RESPONSE, and RPC.

Examples are provided in the castv2/receiver_examples (for receiver control and media status messages) and castv2/streaming_examples (for streaming specific messages) folders, with a C++ validation component defined in castv2/validation.h.

When adding or modifying messages in this specification, the corresponding schema and examples should be updated concurrently. The syntax of these files can be validated using yajsv -- see the castv2/README.md for more information.

Discovering Receiver Capabilities

Prior to the offer/answer exchange, the sender may want information about the receiver in order to create an optimal offer. This capability discovery is currently limited to the DNS-SD ca bit field, which indicates whether the receiver supports audio or video. Remoting-specific capabilities may be discovered by the sender using a GET_CAPABILITIES request, as defined in the Media Transport Messages section.

Keep Alive

There is no official control- or application-level keep-alive, and the PING and PONG messages originally included in the protocol are now deprecated. The sender and receiver are each expected to disconnect either by inferring from status messages that the session has ended, or independently based on media-level timeouts. (The media layer must therefore send an event to the application level to perform the appropriate cleanup, but the exact mechanism for this is specific to particular sender and receiver implementations.)

The default timeout is 15 seconds. If no media packets (RTP, ACK, NACK, etc.) are received from the remote peer within this duration, the sender or receiver that failed to receive data should disconnect.
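The timeout rule can be sketched as a small watchdog that records when the last media packet arrived and reports expiry once the timeout elapses without another one. MediaTimeoutWatchdog is a hypothetical class; the caller passes in the current time so the logic stays testable, and how the expiry is reported to the application layer is left to the implementation.

```cpp
#include <chrono>

// Hypothetical watchdog for the media-level timeout (default 15 seconds).
class MediaTimeoutWatchdog {
 public:
  using Clock = std::chrono::steady_clock;

  explicit MediaTimeoutWatchdog(
      Clock::time_point start,
      Clock::duration timeout = std::chrono::seconds(15))
      : last_packet_(start), timeout_(timeout) {}

  // Call whenever any media packet (RTP, ACK, NACK, etc.) arrives from the
  // remote peer.
  void OnMediaPacketReceived(Clock::time_point now) { last_packet_ = now; }

  // Returns true once no packet has been received for the full timeout, at
  // which point the owner should disconnect and clean up the session.
  bool HasExpired(Clock::time_point now) const {
    return now - last_packet_ >= timeout_;
  }

 private:
  Clock::time_point last_packet_;
  Clock::duration timeout_;
};
```

In practice, the owner would poll HasExpired from a periodic timer, or schedule a single alarm for the last packet time plus the timeout and rearm it whenever a packet arrives.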

In legacy Cast devices, both the sender and the receiver used an application-level “keep-alive” message to detect when a streaming session should be terminated. This handled a variety of scenarios, the most common being quickly ending a session when, for example, a user closes their laptop.

This approach was abandoned in favor of relying on ACKs, status messages, and timeouts, because these mechanisms result in more robust streaming sessions while reducing unnecessary network traffic and battery drain.

Security Considerations

The Cast v2 protocol includes several security mechanisms to protect the streaming session.

Encryption

Media streams must be encrypted using AES-128. The aesKey and aesIvMask fields in the Stream object of the OFFER message provide the encryption key and the initialization vector mask, respectively. Both are 32-character hexadecimal strings, each encoding 128 bits. If either field is absent, the OFFER message shall be considered invalid and the session request rejected.
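A receiver might enforce this rule by decoding each field into 16 bytes and rejecting the OFFER if either one is missing or malformed. ParseAes128Hex and HasValidEncryptionFields are hypothetical helpers, and representing an absent JSON field as std::nullopt is an assumption about how the surrounding OFFER parser reports missing values.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

using Aes128Bytes = std::array<std::uint8_t, 16>;

// Returns the value of one hex digit, or -1 if `c` is not a hex digit.
int HexDigitValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Decodes a 32-character hex string (128 bits) into 16 bytes.
std::optional<Aes128Bytes> ParseAes128Hex(std::string_view hex) {
  if (hex.size() != 32) return std::nullopt;
  Aes128Bytes bytes{};
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    const int hi = HexDigitValue(hex[2 * i]);
    const int lo = HexDigitValue(hex[2 * i + 1]);
    if (hi < 0 || lo < 0) return std::nullopt;
    bytes[i] = static_cast<std::uint8_t>((hi << 4) | lo);
  }
  return bytes;
}

// A Stream is acceptable only if both fields are present and decode cleanly.
// std::nullopt stands in for a field missing from the OFFER message.
bool HasValidEncryptionFields(std::optional<std::string_view> aes_key,
                              std::optional<std::string_view> aes_iv_mask) {
  return aes_key && aes_iv_mask && ParseAes128Hex(*aes_key) &&
         ParseAes128Hex(*aes_iv_mask);
}
```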

Authentication

Device authentication is performed as the first step of the connection, as shown in Appendix A. It is handled by the urn:x-cast:com.google.cast.tp.deviceauth namespace. The details of the authentication protocol are outside the scope of this document.

Deprecated Messages

Several message types originally included in the Cast v2 specification are now deprecated. They may or may not be implemented on modern Cast devices, and they are not required for a device to be a compliant Cast sender or receiver.

These message types are listed below and are also called out in the CastMessageType enum in the libcast implementation:

  • APPLICATION_BROADCAST (reply): context is unknown -- lost to time.

  • INVALID_PLAYER_STATE (reply): indicates that the player is in an invalid state.

  • LOAD_FAILED (reply): indicates that loading media has failed.

  • LOAD_CANCELLED (reply): indicates that loading media was cancelled.

  • MULTIZONE_STATUS (request): context is unknown -- lost to time.

  • PRESENTATION (request): controls zooming, panning, rotation.

  • PING (request): sent to ask the receiver if it is currently alive.

  • PONG (response): sent to inform the sender that this receiver is alive.
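Because these types may still arrive from older devices, an implementation may find it useful to distinguish a deprecated message from a genuinely unknown one, for example to log it quietly rather than report an error. The sketch below is hypothetical: it checks the JSON "type" string against the list above and does not reproduce the libcast CastMessageType enum.

```cpp
#include <algorithm>
#include <array>
#include <string_view>

// Deprecated Cast message types listed in this specification.
constexpr std::array<std::string_view, 8> kDeprecatedMessageTypes = {
    "APPLICATION_BROADCAST", "INVALID_PLAYER_STATE", "LOAD_FAILED",
    "LOAD_CANCELLED",        "MULTIZONE_STATUS",     "PRESENTATION",
    "PING",                  "PONG"};

// Returns true if `type` (the JSON "type" field) names a deprecated message.
bool IsDeprecatedMessageType(std::string_view type) {
  return std::find(kDeprecatedMessageTypes.begin(),
                   kDeprecatedMessageTypes.end(),
                   type) != kDeprecatedMessageTypes.end();
}
```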

Appendix A - Sample Cast Channel Message Flow

This appendix describes the typical sequence of JSON messages exchanged between a Cast sender and receiver to establish, run, and terminate a streaming session.

1. Discovery and Handshake

Before a streaming session can be launched, the sender first discovers the receiver on the network (e.g., via mDNS) and establishes a secure channel. The following message exchange then occurs:

  1. Device Authentication: The sender and receiver exchange authentication messages to verify their identities. This is a prerequisite for all further communication.

    • Namespace: urn:x-cast:com.google.cast.tp.deviceauth
  2. Virtual Connection: The sender establishes a “virtual connection” to the main receiver process. This acts as the initial control channel.

    • Sender → Receiver: urn:x-cast:com.google.cast.tp.connection, type CONNECT
  3. Receiver Status Check: The sender queries the receiver's current status.

    • Sender → Receiver: urn:x-cast:com.google.cast.receiver, type GET_STATUS
    • Receiver → Sender: urn:x-cast:com.google.cast.receiver, type RECEIVER_STATUS. The reply indicates the currently running application (typically the “Backdrop” idle screen) and other device state like volume.
  4. Application Availability: The sender checks if the receiver supports the Cast streaming application.

    • Sender → Receiver: urn:x-cast:com.google.cast.receiver, type GET_APP_AVAILABILITY. The sender sends one request for the standard audio/video streaming app (0F5096E8) and another for the audio-only app (85CDB22F).
    • Receiver → Sender: urn:x-cast:com.google.cast.receiver, type GET_APP_AVAILABILITY response. The receiver confirms whether each app is available or not.
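Since a sender may have several requests outstanding at once, such as the two GET_APP_AVAILABILITY requests, it needs a way to match each reply to the request that produced it. The sketch below assumes that replies echo the requestId of their request, as the LAUNCH and RECEIVER_STATUS examples later in this appendix do (both use requestId 17). PendingRequests is a hypothetical class, not a libcast API.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical tracker that assigns request IDs and matches replies to the
// request that triggered them.
class PendingRequests {
 public:
  // Records an outgoing request and returns the requestId to put in it.
  int Add(std::string description) {
    const int id = next_id_++;
    pending_.emplace(id, std::move(description));
    return id;
  }

  // Returns the description of the request answered by `request_id` and
  // stops tracking it, or std::nullopt for an unknown or duplicate reply.
  std::optional<std::string> Resolve(int request_id) {
    auto it = pending_.find(request_id);
    if (it == pending_.end()) return std::nullopt;
    std::string description = std::move(it->second);
    pending_.erase(it);
    return description;
  }

  std::size_t size() const { return pending_.size(); }

 private:
  int next_id_ = 1;
  std::unordered_map<int, std::string> pending_;
};
```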

2. Streaming Session Launch

Once the user initiates streaming (e.g., by selecting “Cast...” in Chrome), the following message flow begins:

  1. Launch Request: The sender requests the receiver to launch the streaming application.

    • Sender → Receiver: urn:x-cast:com.google.cast.receiver, type LAUNCH
    {
      "type": "LAUNCH",
      "appId": "0F5096E8",
      "requestId": 17
    }
    
  2. Launch Confirmation: The receiver confirms the launch by sending a RECEIVER_STATUS update. This crucial message contains the transportId and sessionId for the newly created application instance. This transportId will be used as the destination_id for all subsequent messages to the streaming app.

    {
      "type": "RECEIVER_STATUS",
      "requestId": 17,
      "status": {
        "applications": [
          {
            "appId": "0F5096E8",
            "displayName": "Chrome Mirroring",
            "sessionId": "154d8823-...",
            "transportId": "154d8823-...",
            "isIdleScreen": false
          }
        ]
      }
    }
    
  3. Connect to Streaming App: The sender establishes a new virtual connection, this time directly to the streaming application instance using its transportId.

    • Sender → Streaming App: urn:x-cast:com.google.cast.tp.connection, type CONNECT
  4. Media Negotiation (Offer/Answer): The sender and receiver negotiate the media format for streaming.

    • Sender → Streaming App: urn:x-cast:com.google.cast.webrtc, type OFFER. The sender proposes a set of supported audio and video streams.
    {
      "type": "OFFER",
      "seqNum": 820263768,
      "offer": {
        "castMode": "mirroring",
        "supportedStreams": [
          {
            "index": 0,
            "type": "audio_source",
            "codecName": "opus",
            "rtpPayloadType": 127,
            "ssrc": 264890,
            "targetDelay": 400,
            "channels": 2,
            ...
          },
          {
            "index": 1,
            "type": "video_source",
            "codecName": "vp8",
            "rtpPayloadType": 96,
            "ssrc": 748229,
            "maxFrameRate": "30",
            "resolutions": [{"width": 1920, "height": 1080}],
            ...
          }
        ]
      }
    }
    
    • Streaming App → Sender: urn:x-cast:com.google.cast.webrtc, type ANSWER. The receiver accepts the offer, selects which streams it will use (via sendIndexes), and specifies the UDP port for media transport.
    {
      "type": "ANSWER",
      "seqNum": 820263768,
      "result": "ok",
      "answer": {
        "udpPort": 33533,
        "sendIndexes": [0, 1],
        "ssrcs": [264891, 748230]
      }
    }
    
  5. Streaming Active: With the negotiation complete, media begins to flow over UDP. The streaming app sends a MEDIA_STATUS message to confirm that playback has started.

    • Streaming App → Sender: urn:x-cast:com.google.cast.media, type MEDIA_STATUS
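Before starting media, a sender might sanity-check the ANSWER against the OFFER it sent. This is a hypothetical sketch: the structs hold only fields that appear in the examples in this step, and it assumes that seqNum must echo the OFFER's, that every sendIndexes entry must name an offered stream index, that an empty selection is invalid, and that ssrcs lines up positionally with sendIndexes, as it does in the example ANSWER.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

struct OfferedStream {
  int index;
  std::uint32_t ssrc;
};

struct Offer {
  int seq_num;
  std::vector<OfferedStream> supported_streams;
};

struct Answer {
  int seq_num;
  std::string result;
  int udp_port;
  std::vector<int> send_indexes;
  std::vector<std::uint32_t> ssrcs;
};

// Returns true if `answer` plausibly accepts `offer` under the assumptions
// described above.
bool AnswerMatchesOffer(const Offer& offer, const Answer& answer) {
  if (answer.seq_num != offer.seq_num || answer.result != "ok") return false;
  if (answer.udp_port <= 0 || answer.udp_port > 65535) return false;
  if (answer.send_indexes.empty() ||
      answer.send_indexes.size() != answer.ssrcs.size()) {
    return false;
  }
  for (int index : answer.send_indexes) {
    const bool offered = std::any_of(
        offer.supported_streams.begin(), offer.supported_streams.end(),
        [index](const OfferedStream& s) { return s.index == index; });
    if (!offered) return false;
  }
  return true;
}
```

Using the values from the examples above, an OFFER with streams 0 and 1 and an ANSWER selecting [0, 1] on port 33533 passes this check.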

3. Streaming Session Termination

When the user stops the session:

  1. Stop Request: The sender sends a STOP message to the main receiver process, referencing the sessionId of the streaming application.

    • Sender → Receiver: urn:x-cast:com.google.cast.receiver, type STOP
    {
      "type": "STOP",
      "sessionId": "154d8823-...",
      "requestId": 22
    }
    
  2. Close Connection: The streaming application, upon termination, sends a CLOSE message to tear down its virtual connection with the sender.

    • Streaming App → Sender: urn:x-cast:com.google.cast.tp.connection, type CLOSE
  3. Return to Idle: The receiver returns to the idle screen and broadcasts a final RECEIVER_STATUS update, showing that “Backdrop” is now the active application.
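The STOP payload can be produced by a small helper that fills in the sessionId taken from the launch RECEIVER_STATUS reply and a fresh requestId. MakeStopMessage is a hypothetical name; a real implementation would normally use a JSON library, so this sketch escapes only quotes and backslashes and assumes the sessionId contains no control characters.

```cpp
#include <string>
#include <string_view>

// Escapes the characters that would break a JSON string literal. Control
// characters are assumed not to appear in session IDs.
std::string EscapeJsonString(std::string_view value) {
  std::string out;
  out.reserve(value.size());
  for (char c : value) {
    if (c == '"' || c == '\\') out.push_back('\\');
    out.push_back(c);
  }
  return out;
}

// Builds the payload sent on urn:x-cast:com.google.cast.receiver to stop the
// streaming application identified by `session_id`.
std::string MakeStopMessage(std::string_view session_id, int request_id) {
  return "{\"type\":\"STOP\",\"sessionId\":\"" + EscapeJsonString(session_id) +
         "\",\"requestId\":" + std::to_string(request_id) + "}";
}
```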