Owner: Jordan Bayles
This document is the reference specification for the libcast Streaming Session Protocol. This spec was originally developed as part of the Google Cast v2 project over a decade ago, and has slowly evolved over time to meet the needs of various Cast devices and scenarios.
With the implementation of libcast and its ever-increasing adoption across the Cast ecosystem, this document will continue to evolve as the official standard for the Cast Streaming protocol.
Provide a reference specification for consumers and implementers of the libcast library, so that proposed features can be properly discussed, designed, implemented, and maintained.
Define interfaces for establishing, controlling, and terminating Cast streaming (mirroring, remoting, and otherwise) sessions.
Prioritize interoperability across the Cast ecosystem, so that Cast devices using the latest and greatest version of libcast are still, to the best of our ability, compatible with legacy devices no longer receiving updates. This specification will attempt to maintain compatibility with legacy Cast v2 devices, but the protocol is expected to grow and change over time.
Unfortunately, some important Cast APIs are still defined by closed source specifications, so defining the following APIs is out of scope for this document:
Application control messaging: Some application control messages, such as LAUNCH and STOP, are included. However, it is likely that there are other, unspecified messages used to manage higher level app state. A functional implementation is provided for the receiver (openscreen::cast::ApplicationAgent) and a reference implementation for the sender (openscreen::cast::LoopingFileAgent).
Authentication messaging: On top of TLS, Cast provides additional authentication through the use of certificates and private keys. The API for authentication is implemented in several directories of cast, especially the //cast/common/channel and //cast/common/certificate folders.
Keep-alive behavior: Although the control channel may (or may not) have more sophisticated behavior for keeping a session alive, in this specific protocol it is limited to a simple timeout, with deprecated PING/PONG messages kept in the specification.
Flinging messaging: there is a rich suite of messages in the media namespace used for controlling flinging sessions, i.e. sessions where the receiver is responsible for fetching content and controlling playback. This document is focused on streaming, both mirroring and remoting, and leaves flinging for closed source documentation and closed source APIs.
See the cast/README.md for more information about the libcast project.
The streaming session protocol defines how a streaming session interacts with standard Cast messages (in the com.google.cast namespace), mirroring-specific messages (in the urn:x-cast:com.google.cast.webrtc namespace), and remoting-specific messages (in the urn:x-cast:com.google.cast.remoting namespace).
In this context, “mirroring” refers to sending a real-time encoded stream of the sender's screen to the receiver. “Remoting” refers to an optimization for the case where the sender's screen is primarily composed of a video that the receiver is capable of decoding performantly: instead of being transcoded and streamed, the video is sent to the receiver still encoded and is decoded directly by the receiver.
Launch and Termination Request from Sender: streaming is initiated using a standard com.google.cast LAUNCH request with the appId parameter set to a pre-defined 8-digit alphanumeric app identifier. The full set of application IDs known to libcast at this time is defined in the cast_streaming_app_ids.h header. Termination is via a standard STOP request.
Transport Session Negotiation: a streaming-specific OFFER event is defined, and used by the sender to generate an offer for the receiver. The ANSWER response to this event contains the answer object. The OFFER must be sent immediately after launch, and can be sent again at any time.
Presentation Request from Sender: a PRESENTATION request allows sender control over rendering transformations applied on the receiver. The transformation defines zoom, offset, and rotation; it enables the current letterboxing-elimination feature, overscan compensation (for flexibility, if needed), and rotation (intended for Android, not used by Chrome).
Keep Alive: both sender and receiver are expected to use media-level activity and/or transport layer activity for keeping the connection alive. There's no separate application-level keep alive message.
Application Protocol & Other Control Messages: the minimum set of control messages required for streaming is LAUNCH, STOP, and RECEIVER_STATUS. The full list is documented in the Session Control Messages section.
This section defines the JSON payloads for various messages used in the Cast protocol.
Message field types are generally one of the following three classes: primitives, structures, or collections of primitives or structures. The most common primitive types are string and int. Although this spec generally describes types in C++ terms, there is no technical reason one could not produce an implementation in other languages. This specification's assumptions about primitive types are defined in the table below:
| Name | Definition |
|---|---|
| int | A signed at-least-32-bit integer value (int in C++) |
| uint32 | An unsigned 32 bit integer value (uint32_t in C++) |
| string | An array of ASCII characters (char* or std::string in C++) |
It is assumed that messages generally have all of the following properties:
| Name | Type | Value/description |
|---|---|---|
| sessionId | int | Unique identifier of the session |
| seqNum | int | Request sequence number |
| type | string | Represents the specific kind of message |
Basic streaming session control occurs via messages in the com.google.cast namespace (as currently defined with version=2), as follows:
LAUNCH (request): initiates the streaming session. For v2 streaming, the appId parameter must be set to 0F5096E8 if the session is audio and video, or 85CDB22F if the session is audio only. For this app name, it is expected that the Cast receiver will run a specific built-in streaming receiver that implements this specification.
LAUNCH_STATUS (reply): sent from the Cast receiver to indicate that the launch request succeeded.
LAUNCH_ERROR (reply): sent from the Cast receiver to indicate that the launch request failed.
GET_APP_AVAILABILITY (request, reply): name of both the request and response message for getting information about what applications are available.
GET_STATUS (request): requests the status of the receiver.
STATUS_RESPONSE (reply): response to a GET_STATUS request.
STOP (request): terminates the streaming session, and must terminate all underlying media streams from the sender to the receiver.
INVALID_REQUEST (reply): Optional message sent by the receiver whenever an invalid command is received.
RECEIVER_STATUS (reply): response to a GET_STATUS request.
The LAUNCH message is sent from the sender to the receiver to initiate a streaming session.
| Name | Type | Value/description |
|---|---|---|
| type | string | Must be LAUNCH. |
| requestId | int | A unique identifier for the request. |
| appId | string | The ID of the application to launch. For streaming, this is typically 0F5096E8 for A/V or 85CDB22F for audio-only. |
| appParams | object (optional) | An optional object containing application-specific parameters. |
| language | string (optional) | The preferred language for the application (e.g., “en-US”). |
| supportedAppTypes | array of string (optional) | A list of application types supported by the sender (e.g., “WEB”, “ANDROID_TV”). |
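As an illustration of the LAUNCH fields above, the following Python sketch assembles a LAUNCH payload. The helper and constant names (`build_launch_message`, `AUDIO_VIDEO_APP_ID`, `AUDIO_ONLY_APP_ID`, `to_json`) are hypothetical and are not part of libcast; the app IDs are the ones quoted in this document.

```python
import json

# App IDs quoted from this specification.
AUDIO_VIDEO_APP_ID = "0F5096E8"
AUDIO_ONLY_APP_ID = "85CDB22F"


def build_launch_message(request_id, audio_only=False, language=None):
    """Return a LAUNCH payload as a dict (hypothetical helper)."""
    message = {
        "type": "LAUNCH",
        "requestId": request_id,
        "appId": AUDIO_ONLY_APP_ID if audio_only else AUDIO_VIDEO_APP_ID,
    }
    # Optional fields are only included when the caller supplies them.
    if language is not None:
        message["language"] = language
    return message


def to_json(message):
    """Serialize a payload dict to a JSON string with stable key order."""
    return json.dumps(message, sort_keys=True)
```

A sender sketch might call `to_json(build_launch_message(1))` and hand the resulting string to whatever transport it uses; the transport itself is outside this example.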
The LAUNCH_STATUS message is sent from the receiver to the sender to indicate that the application launch was successful. Note that a full RECEIVER_STATUS message is typically sent immediately after this.
| Name | Type | Value/description |
|---|---|---|
| responseType | string | Must be LAUNCH_STATUS. |
| launchRequestId | int | The requestId from the original LAUNCH request. |
| status | string | A status string, must be USER_ALLOWED. |
The LAUNCH_ERROR message is sent from the receiver to the sender if the application launch failed.
| Name | Type | Value/description |
|---|---|---|
| responseType | string | Must be LAUNCH_ERROR. |
| requestId | int | The requestId from the original LAUNCH request. |
| reason | string | A string code indicating the reason for the failure (e.g., NOT_FOUND, SYSTEM_ERROR). |
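The two launch replies echo the request identifier under different field names (launchRequestId for LAUNCH_STATUS, requestId for LAUNCH_ERROR). A minimal Python sketch of matching a reply to an outstanding LAUNCH, assuming the reply has already been parsed into a dict, could look like this; `interpret_launch_reply` and its return labels are hypothetical.

```python
def interpret_launch_reply(reply, launch_request_id):
    """Classify a reply as ('launched' | 'failed' | 'unrelated', reason)."""
    response_type = reply.get("responseType")
    if response_type == "LAUNCH_STATUS":
        # LAUNCH_STATUS echoes the request ID in launchRequestId.
        if (reply.get("launchRequestId") == launch_request_id
                and reply.get("status") == "USER_ALLOWED"):
            return ("launched", None)
    elif response_type == "LAUNCH_ERROR":
        # LAUNCH_ERROR echoes the request ID in requestId.
        if reply.get("requestId") == launch_request_id:
            return ("failed", reply.get("reason"))
    # Anything else does not answer this particular LAUNCH request.
    return ("unrelated", None)
```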
GET_APP_AVAILABILITY is a request from the sender to check whether specific applications can be launched on the receiver. The response uses the same message name in its responseType field.
Request:
| Name | Type | Value/description |
|---|---|---|
| type | string | Must be GET_APP_AVAILABILITY. |
| requestId | int | A unique identifier for the request. |
| appId | array of string | An array of application IDs to check. |
Response:
| Name | Type | Value/description |
|---|---|---|
| responseType | string | Must be GET_APP_AVAILABILITY. |
| requestId | int | The requestId from the request. |
| availability | object | An object where keys are appIds and values are APP_AVAILABLE or APP_UNAVAILABLE. |
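To make the shape of the availability object concrete, here is a small Python sketch that extracts the launchable app IDs from a parsed response. The function name `available_app_ids` is hypothetical, and the sketch assumes the response follows the table above.

```python
def available_app_ids(response):
    """Return the sorted app IDs reported as APP_AVAILABLE."""
    if response.get("responseType") != "GET_APP_AVAILABILITY":
        raise ValueError("not a GET_APP_AVAILABILITY response")
    availability = response.get("availability", {})
    return sorted(
        app_id
        for app_id, state in availability.items()
        if state == "APP_AVAILABLE"
    )
```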
GET_STATUS is a request from the sender to get the receiver's current status. It has no payload other than the common type and requestId fields.
RECEIVER_STATUS is a response sent by the receiver containing its current status. It can be sent in response to a GET_STATUS request, or unsolicited when the receiver's state changes.
| Name | Type | Value/description |
|---|---|---|
| responseType | string | Must be RECEIVER_STATUS. |
| requestId | int | The requestId from the GET_STATUS request, or 0 if unsolicited. |
| status | object | The main status object. |
| status.applications | array of object | A list of running applications. Each object contains appId, displayName, sessionId, transportId, isIdleScreen, etc. |
| status.volume | object | An object describing the device's volume state, with fields like level, muted, and controlType. |
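A sender typically needs the sessionId and transportId of the streaming application it launched. The Python sketch below looks up a running application by appId in a parsed RECEIVER_STATUS message; `find_application` is a hypothetical helper, and only the fields named in the table above are assumed.

```python
def find_application(receiver_status, app_id):
    """Return the running application entry matching app_id, or None."""
    applications = receiver_status.get("status", {}).get("applications", [])
    for application in applications:
        if application.get("appId") == app_id:
            return application
    return None
```

The sessionId of the entry found this way is the value a later STOP request would carry.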
The STOP message is sent from the sender to the receiver to terminate a running application.
| Name | Type | Value/description |
|---|---|---|
| type | string | Must be STOP. |
| requestId | int | A unique identifier for the request. |
| sessionId | string | The ID of the session to be terminated. |
The INVALID_REQUEST message is sent by the receiver when it receives a malformed or invalid request.
| Name | Type | Value/description |
|---|---|---|
| responseType | string | Must be INVALID_REQUEST. |
| requestId | int | The requestId from the invalid request, if it could be parsed. |
| reason | string | A string code for the error (e.g., INVALID_COMMAND). |
GET_DEVICE_INFO (namespace google.cast.receiver.discovery): a request from the sender for detailed information about the receiver device. Unusually, the response carries the same value in its type field rather than using a responseType field.
Request:
| Name | Type | Value/description |
|---|---|---|
| type | string | Must be GET_DEVICE_INFO. |
| requestId | int | A unique identifier for the request. |
Response:
| Name | Type | Value/description |
|---|---|---|
| type | string | Must be GET_DEVICE_INFO. |
| requestId | int | The requestId from the request. |
| deviceId | string | A unique identifier for the receiver device. |
| friendlyName | string | The user-configured device name. |
| deviceModel | string | The product model name. |
| capabilities | int | A bitmask of device capabilities. |
| controlNotifications | int | A flag for control notifications. |
eureka_info (namespace com.google.cast.setup): a response message providing detailed product and build information about the receiver hardware. The request for this message is not clearly defined in the code, but this response also uses a type field instead of a responseType field.
| Name | Type | Value/description |
|---|---|---|
| type | string | Must be eureka_info. |
| request_id | int | The request_id from the original request. Note the underscore. |
| response_code | int | A status code (e.g., 200 for OK). |
| response_string | string | A status string (e.g., “OK”). |
| data | object | An object containing the device details. |
| data.name | string | The friendly name of the device. |
| data.version | int | The version of this info structure. |
| data.device_info | object | An object containing device hardware details like manufacturer and product_name. |
| data.build_info | object | An object containing software build details like cast_build_revision. |
A number of streaming-specific features are defined via the com.google.cast.webrtc namespace, which defines the following additional messages:
The OFFER message is sent from the sender to the receiver to initiate a streaming session.
| Name | Type | Value/description |
|---|---|---|
| type | string | Must be OFFER. |
| offer | object | Offer object |
The OFFER request can be sent by the sender at any point in time during the session to renegotiate parameters of the session.
If the receiver generates an error response to the initial offer, the sender should immediately terminate the session and inform the receiver unless it's able to generate a fallback offer.
A subsequent offer after a session is successfully established is only effective once an “ok” response is generated by the receiver. If an “error” response is generated, the already-established session should remain in effect.
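The renegotiation rules above can be summarized as a small decision function. The Python sketch below is one possible reading, not the libcast implementation: configurations are treated as opaque values, `apply_answer` is a hypothetical name, and the "fallback offer" option for a failed initial offer is deliberately left out.

```python
def apply_answer(active_config, offered_config, answer):
    """Return (config_in_effect, keep_session_open) after an ANSWER.

    active_config is None when no session has been established yet.
    """
    if answer.get("result") == "ok":
        # The offered configuration only takes effect on an "ok" answer.
        return offered_config, True
    if active_config is not None:
        # A failed renegotiation leaves the established session in effect.
        return active_config, True
    # The initial offer failed; this sketch does not attempt a fallback
    # offer, so the session should be terminated.
    return None, False
```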
For a full example OFFER message, see castv2/streaming_examples/offer.json.
The ANSWER message is sent from the receiver to the sender in response to an OFFER.
| Name | Type | Value/description |
|---|---|---|
| type | string | Must be ANSWER. |
| result | string | Must be ok or error. |
| error | object (optional) | Only populated if result is error. Error object |
| answer | object (optional) | Only populated if result is ok. Answer object |
For a full example ANSWER message, see castv2/streaming_examples/answer.json.
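The ANSWER table implies a simple consistency rule: the answer object accompanies an ok result and the error object accompanies an error result. A minimal Python sketch of that check, with the hypothetical name `check_answer_message` and hypothetical problem strings, follows. It checks the envelope only, not the contents of the nested objects.

```python
def check_answer_message(message):
    """Return a list of problems; an empty list means the envelope is consistent."""
    problems = []
    if message.get("type") != "ANSWER":
        problems.append("type must be ANSWER")
    result = message.get("result")
    if result == "ok":
        if "answer" not in message:
            problems.append("answer object is required when result is ok")
        if "error" in message:
            problems.append("error object must be absent when result is ok")
    elif result == "error":
        if "error" not in message:
            problems.append("error object is required when result is error")
        if "answer" in message:
            problems.append("answer object must be absent when result is error")
    else:
        problems.append("result must be ok or error")
    return problems
```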
The “type” must be set to GET_CAPABILITIES, with no message body.
The “type” must be set to CAPABILITIES_RESPONSE, with the message body in a “capabilities” object. Note that the key_systems field has been deprecated and removed here.
| Name | Type | Value/description |
|---|---|---|
| type | string | Must be set to CAPABILITIES_RESPONSE |
| capabilities | object | A Capabilities object |
The “type” must be set to “OFFER”, with the message body in an “offer” object. For a living reference, see libcast's offer_messages.h.
NOTE: the libcast implementation separates out supportedStreams into strongly typed audio_streams and video_streams arrays, but they are collated together when serialized to JSON.
| Name | Type | Value/description |
|---|---|---|
| supportedStreams | array of Stream objects | An array of stream objects describing all acceptable stream formats that this endpoint supports. The sender only includes codecs it supports, and the order of the stream objects reflects the sender's preference. The receiver can choose any stream it prefers, or the first stream it supports if it has no preference. The receiver informs the sender about the selected stream objects in the sendIndexes of the ANSWER object. |
| castMode | string | Indicates whether the offer is for “mirroring” or “remoting”. See CastMode in cast/streaming/public/constants.h. |
The stream object contains a generic section common for both audio and video; it also contains an audio or video specific section based on the type specified in the generic section.
| Name | Type | Value/description |
|---|---|---|
| index | int | An identifier established by the initiator that MUST contain a Number. The index of the first stream object must start with 0 and each following index MUST be the previous index +1. |
| type | string | A String specifying the type of stream offered. Supported values are defined in Stream::Type in cast/streaming/public/offer_messages.h. |
| codecName | string | A String specifying the codec. Supported values are defined in AudioCodec and VideoCodec in cast/streaming/public/constants.h. To be compliant, cast receivers must at least implement opus, if they support audio, and vp8 if they support video. Senders must implement at least the baseline codecs (h264 or vp8, and aac or opus). |
| codecParameter | string | A string specifying the codec parameter, in accordance with RFC 6381. Also known as the “media type string” as in Supported Media for Google Cast. Examples include avc1.64002A for H.264 level 4.2, and hev1.1.6.L150.B0 for H.265 main 5.0. |
| rtpProfile | string | A String specifying supported RTP profile. Currently only cast is supported, with codec reserved for future interop with the intention of being used to indicate that codec-defined RTP profiles (defined by their respective RFCs) shall be used. |
| rtpPayloadType | int | A Number specifying the RTP payload type used for this stream. Valid values are in the range [96, 127]. See RtpPayloadType |
| ssrc | uint32 | A Number specifying the RTP SSRC used for this stream. Values must be unique between all streams for this sender. All values are valid. |
| targetDelay | int | Indicates the desired total end-to-end latency. |
| aesKey | string | A String specifying which AES key to use. Must consist of exactly 32 hex digits. Both an AES key and initialization vector are required: if either field is missing, this stream is invalid. |
| aesIvMask | string | A String specifying which initialization vector mask to use. Must consist of exactly 32 hex digits. Must be provided. |
| receiverRtcpEventLog | boolean (optional) | True to request receiver to send event log via RTCP. False otherwise. |
| receiverRtcpDscp | int (optional) | Request receiver to send RTCP packets using DSCP value indicated. Typically this value is 46. |
| rtpExtensions | Array of string (optional) | RTP extensions supported by the Sender. Receivers can then reply with a list of rtpExtensions from this list that it also supports. |
| timeBase | string (optional) | Number specifying the time base used by this “rtpPayloadType”. Default value is 1/90000. Valid values are “1/<sample rate>” where sample rate is strictly positive. |
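Several of the generic stream fields above carry mechanical constraints (consecutive indexes from 0, payload types in [96, 127], unique SSRCs, 32 hex digit AES fields, and a "1/<sample rate>" time base). The Python sketch below checks just those constraints on a list of parsed stream objects. It is not the libcast validator; `check_streams` and the problem strings are hypothetical.

```python
import re

HEX_32 = re.compile(r"[0-9a-fA-F]{32}")
TIME_BASE = re.compile(r"1/([1-9][0-9]*)")


def check_streams(streams):
    """Return a list of problems found in the generic stream fields."""
    problems = []
    seen_ssrcs = set()
    for position, stream in enumerate(streams):
        # Indexes start at 0 and increase by one per stream object.
        if stream.get("index") != position:
            problems.append(f"stream {position}: index must be {position}")
        payload_type = stream.get("rtpPayloadType")
        if not isinstance(payload_type, int) or not 96 <= payload_type <= 127:
            problems.append(f"stream {position}: rtpPayloadType outside [96, 127]")
        ssrc = stream.get("ssrc")
        if ssrc in seen_ssrcs:
            problems.append(f"stream {position}: duplicate ssrc")
        seen_ssrcs.add(ssrc)
        # Both the AES key and the IV mask must be exactly 32 hex digits.
        for field in ("aesKey", "aesIvMask"):
            value = stream.get(field)
            if not isinstance(value, str) or not HEX_32.fullmatch(value):
                problems.append(f"stream {position}: {field} must be 32 hex digits")
        # timeBase is optional; the default given in the table is 1/90000.
        time_base = stream.get("timeBase", "1/90000")
        if not isinstance(time_base, str) or not TIME_BASE.fullmatch(time_base):
            problems.append(f"stream {position}: timeBase must look like 1/<sample rate>")
    return problems
```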
| Name | Type | Value/description |
|---|---|---|
| bitRate | int | A Number specifying the average bitrate in bits per second used by this “rtpPayloadType”. |
| channels | int | A Number specifying the number of audio channels used by this “rtpPayloadType”. |
Note that additional video codec information such as codec profile and level, and video stream protection, are not implemented by any current senders or receivers. If these features need to be used in the future, they should be reimplemented.
TODO(crbug.com/471102790): The implementation includes profile and level fields, which contradicts the spec's claim that they are not implemented. The spec should be updated to reflect their presence.
| Name | Type | Value/description |
|---|---|---|
| maxFrameRate | string | Max number of frames per second used by this “rtpPayloadType”. Note: Receivers may ignore this field when providing constraints in the ANSWER message. In this case, the sender must respect those constraints. |
| maxBitRate | int | Max bitrate in bits per second used by this “rtpPayloadType”. Note: Receivers may ignore this field when providing constraints in the ANSWER message. In this case, the sender must respect those constraints. |
| resolutions | array of Video resolution objects | An array of resolutions supported by this “rtpPayloadType”. Note: Receivers may ignore this field when providing constraints in the ANSWER message. In this case, the sender must respect those constraints. |
| errorRecoveryMode | string (optional) | String to indicate how video stream is encoded. Default value is castv2. “castv2” means that the receiver cannot drop any video packets. There is no key frame or intra refresh mode after the first video frame in the session. “intra_mb_refresh” means that frames are encoded using intra macroblock refresh mode. The receiver can drop a video frame and recover later on after receiving new key frames or intra refresh macroblocks. |
| Name | Type | Value/description |
|---|---|---|
| width | int | Width in pixels. |
| height | int | Height in pixels. |
The “type” may be set to anything, but the “result” field must be present and set to “error”, with the message body in an “error” object.
| Name | Type | Value/description |
|---|---|---|
| code | int32 | A code indicating what class of error occurred. |
| description | string | Description of the error. |
For a comprehensive and up-to-date list of error codes, refer to the openscreen::Error::Code enum in platform/base/error.h.
The “type” must be set to “ANSWER”, with the message body in an “answer” object. For a living reference, see libcast's answer_messages.h.
| Name | Type | Value/description |
|---|---|---|
| udpPort | int | A Number specifying the UDP port used for all streams (RTP and RTCP) in this session. Note: values 1 to 65535 are valid. |
| sendIndexes | Array of int | Numbers specifying the indexes chosen from the OFFER message. |
| ssrcs | Array of uint32 | Numbers specifying the RTP SSRCs used to send the RTCP feedback of the streams indicated by the “sendIndexes” above. Note: values 0 to 2^32 - 1 are valid. |
| constraints | receiver constraints object (optional, but highly recommended) | Provides detailed maximum capabilities of the receiver for processing the streams selected in “sendIndexes” above; including audio sampling rate and number of channels, video dimensions and rates, encoded bit rates, and target latency. A sender may alter video resolution or frame rate throughout a session. The constraints here restrict how much data volume is allowed before the sender must subsample (e.g., downscale and/or reduce frame rate). |
| display | display description object (optional, but highly recommended) | Provides details about the display on the receiver, including dimensions (aspect ratio implied), scaling behavior, color profile, etc. |
| receiverRtcpEventLog | Array of int (optional) | Numbers specifying the indexes of streams that will send event log via RTCP. If this field is not present then the receiver does not support sending an event log via RTCP. |
| receiverRtcpDscp | Array of int (optional) | Numbers specifying the indexes of streams that will use DSCP values specified in the OFFER message for RTCP packets. If this field is not present then the receiver does not support DSCP. |
| rtpExtensions | Array of string (optional) | If this field is not present then the receiver does not support any RTP extensions. |
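A sender can cross-check a received answer object against the offer it sent. The Python sketch below checks the port range, that every entry in sendIndexes refers to an offered stream, and that SSRCs fit in a uint32. It also assumes one ssrcs entry per sendIndexes entry, which is this sketch's reading of the table rather than a stated rule. The name `check_answer_against_offer` is hypothetical.

```python
def check_answer_against_offer(offer, answer):
    """Return a list of problems with an answer object, given the offer object."""
    problems = []
    offered_indexes = {s.get("index") for s in offer.get("supportedStreams", [])}

    port = answer.get("udpPort")
    if not isinstance(port, int) or not 1 <= port <= 65535:
        problems.append("udpPort must be in 1..65535")

    send_indexes = answer.get("sendIndexes", [])
    for index in send_indexes:
        if index not in offered_indexes:
            problems.append(f"sendIndexes entry {index} was not offered")

    ssrcs = answer.get("ssrcs", [])
    # Assumption: one feedback SSRC per selected stream.
    if len(ssrcs) != len(send_indexes):
        problems.append("ssrcs and sendIndexes differ in length")
    for ssrc in ssrcs:
        # SSRCs are uint32, so the largest value is 2**32 - 1.
        if not isinstance(ssrc, int) or not 0 <= ssrc <= 0xFFFFFFFF:
            problems.append(f"ssrc {ssrc} does not fit in a uint32")
    return problems
```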
| Name | Type | Value/description |
|---|---|---|
| audio | audio receiver constraints object | Audio constraints. See below. |
| video | video receiver constraints object | Video constraints. See below. |
| Name | Type | Value/description |
|---|---|---|
| codecName | string | Audio codec name. See AudioCodec in cast/streaming/public/constants.h. |
| maxSampleRate | int | Maximum supported sampling frequency (not necessarily the ideal sampling frequency). |
| maxChannels | int | Maximum number of audio channels supported. The number here is interpreted to relate to a standard speaker layout (e.g., 2 for left-and-right stereo, 5 for left+center+right+left_surround+right_surround). |
| minBitRate | int (optional) | Minimum encoded audio data bits per second. If not specified, the sender will assume 32 kbps. Note: A receiver should never restrict the minBitRate to try to improve quality. This should reflect the true operational minimum. |
| maxBitRate | int | Maximum encoded audio data bits per second. This is the lower of: 1) The maximum capability of the decoder; or 2) The maximum sustained data transfer rate (e.g., could be limited by the CPU, RAM bandwidth, etc.). If not specified, the sender will assume no greater than 320kbps. |
| maxDelay | int (optional) | Maximum supported end-to-end latency, in milliseconds, for audio. This is proportional to the size of the data buffers in the receiver. Meaning, assume a very low-latency link between sender and receiver, and this value would indicate the amount of buffering that can be maintained (due to RAM capacity, etc.). If not provided, a default of 1200ms should be used. |
| Name | Type | Value/description |
|---|---|---|
| codecName | string (optional) | Video codec name. See VideoCodec in cast/streaming/public/constants.h. If omitted, these constraints apply to all video codecs. |
| maxPixelsPerSecond | double (optional) | Maximum pixel rate (width x height x framerate). Note that this value can, and often will be, much less than multiplying the fields in maxDimensions. The purpose of this field is to limit the overall maximum processing rate. A sender will use this, in conjunction with the fields below, to trade-off between higher/lower resolution and lower/higher frame rate. Example: A device may be capable of 62208000 pixels per second, which allows a sender to send 1280x720@60 or 1920x1080@30. In this example, the maxDimensions might specify {width:1920, height:1080, frameRate:60}. |
| minResolution | resolution object (optional) | Minimum width and height in pixels. If not specified, the sender will assume a reasonable minimum having the same aspect ratio as maxDimensions, with an area as close to 320x180 as possible. Note: A receiver should never restrict the minResolution in an effort to improve quality. This should reflect the true operational minimum. |
| maxDimensions | dimensions object | Maximum width and height in pixels (not necessarily the ideal width or height), and the maximum frame rate (not necessarily the ideal frame rate). |
| minBitRate | int (optional) | Minimum encoded video data bits per second. If not specified, the sender will assume 300 kbps. Note: A receiver should never restrict the minBitRate in an effort to improve quality. This should reflect the true operational minimum. |
| maxBitRate | int | Maximum encoded video data bits per second. This is the lower of: 1) The maximum capability of the decoder; or 2) The maximum sustained data transfer rate (e.g., could be limited by the CPU, RAM bandwidth, etc.). |
| maxDelay | int (optional) | Maximum supported end-to-end latency, in milliseconds, for video. This is proportional to the size of the data buffers in the receiver. Meaning, assume a very low-latency link between sender and receiver, and this value would indicate the amount of buffering that can be maintained (due to RAM capacity, etc.). If not provided, a default of 1200ms should be used. |
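The maxPixelsPerSecond example in the table can be checked with a few lines of arithmetic. This Python sketch uses hypothetical helper names (`pixel_rate`, `fits_pixel_budget`) and takes the frame rate as a plain number for simplicity.

```python
def pixel_rate(width, height, frames_per_second):
    """Pixel rate as defined in the table: width x height x frame rate."""
    return width * height * frames_per_second


def fits_pixel_budget(width, height, frames_per_second, max_pixels_per_second):
    """True if the given video format stays within maxPixelsPerSecond."""
    return pixel_rate(width, height, frames_per_second) <= max_pixels_per_second
```

With a budget of 62208000, both 1280x720 at 60 (55296000) and 1920x1080 at 30 (62208000) fit, while 1920x1080 at 60 (124416000) does not, even though each of its fields is within the example maxDimensions.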
| Name | Type | Value/description |
|---|---|---|
| width | int | Width, in pixels. |
| height | int | Height, in pixels. |
| Name | Type | Value/description |
|---|---|---|
| width | int | Width, in pixels. |
| height | int | Height, in pixels. |
| frameRate | string | Frame rate. This should be specified as a rational decimal number (e.g., “30” or “30000/1001”). |
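Because frameRate is a string that may be a whole number or a ratio such as “30000/1001”, exact rational arithmetic is a convenient way to compare rates. The Python sketch below uses the standard `fractions.Fraction` type, which parses both forms; `parse_frame_rate` and `within_max_dimensions` are hypothetical helpers.

```python
from fractions import Fraction


def parse_frame_rate(text):
    """Parse a frameRate string such as "30" or "30000/1001" exactly."""
    rate = Fraction(text)
    if rate <= 0:
        raise ValueError("frame rate must be positive")
    return rate


def within_max_dimensions(width, height, frame_rate, max_dimensions):
    """True if a format does not exceed a dimensions object field by field."""
    return (
        width <= max_dimensions["width"]
        and height <= max_dimensions["height"]
        and parse_frame_rate(frame_rate) <= parse_frame_rate(max_dimensions["frameRate"])
    )
```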
| Name | Type | Value/description |
|---|---|---|
| dimensions | dimensions object (optional) | If present, the receiver is attached to a fixed display having the given dimensions and frame rate (vsync) configuration. These dimensions may exceed, equal, or be less than those mentioned in the constraints. If undefined, the receiver display is assumed to be not fixed (e.g., a panel in a Hangouts UI). The sender uses this to decide the best way to sample, capture, and encode the content to optimize the user viewing experience. |
| aspectRatio | string (optional) | The aspect ratio, in “#:#” format, when the receiver is attached to a fixed display. When missing and dimensions are specified, the sender will assume pixels are square, and the dimensions imply the aspect ratio of the fixed display. When present and dimensions are also specified, this implies the display pixels are not square. |
| scaling | string (optional) | One of: “sender” The sender must scale and letterbox the content and provide video frames of a fixed aspect ratio. “receiver” The sender may send arbitrarily sized frames, and the receiver will handle the scaling and letterboxing as necessary for proper display. |
| Name | Type | Value/description |
|---|---|---|
| mediaCaps | array of string | List of media capabilities of the receiver. See AudioCapability and VideoCapability in cast/streaming/remoting_capabilities.h. video is deprecated and not used. |
| remoting | int | Remoting version of the receiver. |
| result | string | Indicates whether getting capabilities succeeded, must be either “ok” or “error.” |
audio is a special value that indicates support for a set of codecs that have been defined as a “baseline” set. The current baseline set is defined in the Chrome RendererController as the following list:
vp9 and hevc in their mediaCaps response, even if they cannot remote these codecs.
Finally, the com.google.cast.remoting namespace contains the following remoting-specific messages:
The “type” must be set to “RPC”, with the base64-encoded protobuf message stored as a string under the “rpc” key.
Protobuf messages are complex, and defined in the remoting.proto file.
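The RPC envelope itself is simple even though its protobuf payload is not. The Python sketch below wraps and unwraps already-serialized protobuf bytes using the standard `base64` module. It does not touch remoting.proto; `wrap_rpc` and `unwrap_rpc` are hypothetical names.

```python
import base64


def wrap_rpc(serialized_proto):
    """Wrap serialized protobuf bytes in an RPC message dict."""
    encoded = base64.b64encode(serialized_proto).decode("ascii")
    return {"type": "RPC", "rpc": encoded}


def unwrap_rpc(message):
    """Return the serialized protobuf bytes carried by an RPC message."""
    if message.get("type") != "RPC":
        raise ValueError("not an RPC message")
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(message["rpc"], validate=True)
```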
The com.google.cast.remoting namespace also supports sending input events from receiver to sender.
The “type” must be set to “INPUT”, with the base64-encoded protobuf message stored as a string under the “input” key.
Protobuf messages are complex, and defined in the input.proto file.
Most messages in the com.google.cast.media namespace are not supported in libcast, and are instead used for controlling flinging sessions with the Cast SDKs.
The MEDIA_STATUS message is sent by a media application on the receiver to update the sender on the state of media playback.
| Name | Type | Value/description |
|---|---|---|
| responseType | string | Must be MEDIA_STATUS. |
| requestId | int | An identifier for the request, or 0 if unsolicited. |
| media | array of object | An array containing one or more media status objects. |
| media[n].mediaSessionId | int | The ID of the media session. |
| media[n].playerState | string | The state of playback (e.g., PLAYING, PAUSED, IDLE). |
| media[n].currentTime | double | The current playback time in seconds. |
| media[n].media | object | An object with metadata about the content, such as contentId and title. |
This specification is backed by two JSON Schemas:
receiver_schema.json: containing core receiver control and status messages, including LAUNCH, STOP, GET_APP_AVAILABILITY, LAUNCH_STATUS, LAUNCH_ERROR, GET_STATUS, RECEIVER_STATUS, INVALID_REQUEST, GET_DEVICE_INFO, eureka_info, and MEDIA_STATUS.
streaming_schema.json: containing messages specific to the streaming session, such as OFFER, ANSWER, GET_CAPABILITIES, CAPABILITIES_RESPONSE, and RPC.
Examples are provided in the castv2/receiver_examples (for receiver control and media status messages) and castv2/streaming_examples (for streaming specific messages) folders, with a C++ validation component defined in castv2/validation.h.
yajsv -- see the castv2/README.md for more information.
Prior to the offer/answer exchange, the sender may desire information about the receiver in order to create an optimal offer. This discovery of capabilities is currently limited to the DNS-SD ca bit-field, which indicates whether the receiver supports audio or video. Remoting specific capabilities may be discovered by the sender using a GET_CAPABILITIES call, as defined below.
There is no official control-level or application-level keep alive, and the PING and PONG messages originally included in the protocol are now deprecated. The sender and receiver are each expected to disconnect either by inferring from status messages that the session has ended, or independently based on media-level timeouts. In the latter case, the media layer must send an event to the application level so that it can perform the appropriate cleanup; the exact mechanism is specific to each sender and receiver implementation.
The default timeout is 15 seconds. If no media packets (RTP, ACK, NACK, etc.) are received from the remote peer within this duration, the sender or receiver that failed to receive data should disconnect.
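The media-level timeout can be modeled as a small activity monitor. The Python sketch below is an illustration only: `MediaActivityMonitor` is a hypothetical class, and timestamps are passed in explicitly (in seconds) so the logic is easy to test; a real caller might supply `time.monotonic()`.

```python
DEFAULT_TIMEOUT_SECONDS = 15.0


class MediaActivityMonitor:
    """Tracks the time of the last media packet from the remote peer."""

    def __init__(self, now, timeout=DEFAULT_TIMEOUT_SECONDS):
        self._timeout = timeout
        # Treat session start as the most recent activity.
        self._last_packet_time = now

    def on_packet_received(self, now):
        """Record that an RTP or RTCP packet (ACK, NACK, etc.) arrived."""
        self._last_packet_time = now

    def should_disconnect(self, now):
        """True once a full timeout period has passed with no packets."""
        return now - self._last_packet_time >= self._timeout
```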
In legacy Cast devices, an application-level “keep-alive” message was used by both the sender and the receiver to decide when to terminate a streaming session. It handled a variety of scenarios, the most common being the need to end a session quickly when, for example, a user closes their laptop.
This approach was abandoned in favor of relying on ACKs, status messages, and timeouts, because these mechanisms result in more robust streaming sessions and reduce unnecessary network traffic and battery drain.
The Cast v2 protocol includes several security mechanisms to protect the streaming session.
Media streams must be encrypted using AES-128. The aesKey and aesIvMask fields in the Stream object of the OFFER message are used to provide the encryption key and initialization vector mask. Both are 32-digit hex strings. If these fields are absent, the OFFER message shall be considered invalid and the session request rejected.
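As an illustration of the rule above, a receiver might check each Stream object before accepting an OFFER. This is a sketch under stated assumptions: the function name `parse_encryption_fields` is hypothetical, and treating malformed values (not only absent ones) as invalid is an assumption that goes slightly beyond the text.

```python
import re

# 32 hex digits encode 128 bits, matching the AES-128 key size.
_HEX_32 = re.compile(r"[0-9a-fA-F]{32}")


def parse_encryption_fields(stream):
    """Return (key, iv_mask) as 16-byte values, or None if the stream is invalid.

    `stream` is one parsed Stream object from the OFFER message.
    """
    decoded = []
    for field in ("aesKey", "aesIvMask"):
        value = stream.get(field)
        # Absent fields make the OFFER invalid per the text; rejecting
        # malformed values as well is an assumption of this sketch.
        if not isinstance(value, str) or not _HEX_32.fullmatch(value):
            return None
        decoded.append(bytes.fromhex(value))
    return tuple(decoded)
```

If this returns `None` for any stream, the receiver would treat the whole OFFER as invalid and reject the session request.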
Device authentication is performed as the first step of the connection, as shown in Appendix A. This is handled by the com.google.cast.tp.deviceauth namespace. The details of the authentication protocol are outside the scope of this document.
Several message types were originally included in the Cast v2 specification but have since been deprecated. They may or may not be implemented on modern Cast devices, and they are not required for a device to be a compliant Cast sender or receiver.
These message types are called out in the CastMessageType enum in the libcast implementation, as well as listed here:
APPLICATION_BROADCAST (reply): context is unknown -- lost to time.
INVALID_PLAYER_STATE (reply): indicates that the player is in an invalid state.
LOAD_FAILED (reply): indicates that loading media has failed.
LOAD_CANCELLED (reply): indicates that loading media was cancelled.
MULTIZONE_STATUS (request): context is unknown -- lost to time.
PRESENTATION (request): controls zooming, panning, rotation.
PING (request): sent to ask the receiver if it is currently alive.
PONG (response): sent to inform the sender that this receiver is alive.
This appendix describes the typical sequence of JSON messages exchanged between a Cast sender and receiver to establish, run, and terminate a streaming session.
Before a streaming session can be launched, the sender first discovers the receiver on the network (e.g., via mDNS) and establishes a secure channel. The following message exchange then occurs:
Device Authentication: The sender and receiver exchange authentication messages to verify their identities. This is a prerequisite for all further communication.
urn:x-cast:com.google.cast.tp.deviceauth
Virtual Connection: The sender establishes a “virtual connection” to the main receiver process. This acts as the initial control channel.
urn:x-cast:com.google.cast.tp.connection, type CONNECT
Receiver Status Check: The sender queries the receiver's current status.
urn:x-cast:com.google.cast.receiver, type GET_STATUS
urn:x-cast:com.google.cast.receiver, type RECEIVER_STATUS. The reply indicates the currently running application (typically the “Backdrop” idle screen) and other device state like volume.
Application Availability: The sender checks if the receiver supports the Cast streaming application.
urn:x-cast:com.google.cast.receiver, type GET_APP_AVAILABILITY. The sender sends one request for the standard audio/video streaming app (0F5096E8) and another for the audio-only app (85CDB22F).
urn:x-cast:com.google.cast.receiver, type GET_APP_AVAILABILITY response. The receiver confirms whether each app is available or not.
Once the user initiates streaming (e.g., by selecting “Cast...” in Chrome), the following message flow begins:
Launch Request: The sender requests the receiver to launch the streaming application.
urn:x-cast:com.google.cast.receiver, type LAUNCH
{ "type": "LAUNCH", "appId": "0F5096E8", "requestId": 17 }
Launch Confirmation: The receiver confirms the launch by sending a RECEIVER_STATUS update. This crucial message contains the transportId and sessionId for the newly created application instance. This transportId will be used as the destination_id for all subsequent messages to the streaming app.
{ "type": "RECEIVER_STATUS", "requestId": 17, "status": { "applications": [ { "appId": "0F5096E8", "displayName": "Chrome Mirroring", "sessionId": "154d8823-...", "transportId": "154d8823-...", "isIdleScreen": false } ] } }
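A sender-side sketch of reading the launch confirmation follows. The function name `find_streaming_app` is hypothetical, and requiring the reply's requestId to match the LAUNCH request is an assumption drawn from the example above rather than a stated rule.

```python
import json

# App IDs from the text: standard audio/video streaming and audio-only.
STREAMING_APP_IDS = {"0F5096E8", "85CDB22F"}


def find_streaming_app(receiver_status_json, launch_request_id):
    """Return (transportId, sessionId) for the launched streaming app, or None."""
    message = json.loads(receiver_status_json)
    if message.get("type") != "RECEIVER_STATUS":
        return None
    # Assumption: the confirmation echoes the requestId of the LAUNCH request.
    if message.get("requestId") != launch_request_id:
        return None
    for app in message.get("status", {}).get("applications", []):
        if app.get("appId") in STREAMING_APP_IDS:
            return app.get("transportId"), app.get("sessionId")
    return None
```

The returned transportId is what the sender would then use as the destination_id when connecting to the streaming app, and the sessionId is what a later STOP message references.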
Connect to Streaming App: The sender establishes a new virtual connection, this time directly to the streaming application instance using its transportId.
urn:x-cast:com.google.cast.tp.connection, type CONNECT
Media Negotiation (Offer/Answer): The sender and receiver negotiate the media format for streaming.
urn:x-cast:com.google.cast.webrtc, type OFFER. The sender proposes a set of supported audio and video streams.
{ "type": "OFFER", "seqNum": 820263768, "offer": { "castMode": "mirroring", "supportedStreams": [ { "index": 0, "type": "audio_source", "codecName": "opus", "rtpPayloadType": 127, "ssrc": 264890, "targetDelay": 400, "channels": 2, ... }, { "index": 1, "type": "video_source", "codecName": "vp8", "rtpPayloadType": 96, "ssrc": 748229, "maxFrameRate": "30", "resolutions": [{"width": 1920, "height": 1080}], ... } ] } }
urn:x-cast:com.google.cast.webrtc, type ANSWER. The receiver accepts the offer, selects which streams it will use (via sendIndexes), and specifies the UDP port for media transport.
{ "type": "ANSWER", "seqNum": 820263768, "result": "ok", "answer": { "udpPort": 33533, "sendIndexes": [0, 1], "ssrcs": [264891, 748230] } }
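The relationship between the OFFER and ANSWER above can be sketched as a consistency check on the sender side. The function name `answer_matches_offer` is hypothetical, and the individual checks (matching seqNum, sendIndexes drawn from the offered indexes, one SSRC per selected stream) are inferred from the example messages rather than taken from normative text.

```python
def answer_matches_offer(offer_message, answer_message):
    """Return True if a parsed ANSWER is consistent with the parsed OFFER."""
    if answer_message.get("type") != "ANSWER":
        return False
    if answer_message.get("result") != "ok":
        return False
    # The ANSWER in the example echoes the seqNum of the OFFER.
    if answer_message.get("seqNum") != offer_message.get("seqNum"):
        return False

    answer = answer_message.get("answer", {})
    send_indexes = answer.get("sendIndexes", [])
    ssrcs = answer.get("ssrcs", [])
    # The sender built the OFFER itself, so its shape is trusted here.
    offered_indexes = {
        stream["index"] for stream in offer_message["offer"]["supportedStreams"]
    }
    return (
        isinstance(answer.get("udpPort"), int)
        and len(send_indexes) == len(ssrcs)
        and all(index in offered_indexes for index in send_indexes)
    )
```

A sender would run a check of this kind before opening the UDP transport to the port given in the ANSWER.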
Streaming Active: With the negotiation complete, media begins to flow over UDP. The streaming app sends a MEDIA_STATUS message to confirm that playback has started.
urn:x-cast:com.google.cast.media, type MEDIA_STATUS
When the user stops the session:
Stop Request: The sender sends a STOP message to the main receiver process, referencing the sessionId of the streaming application.
urn:x-cast:com.google.cast.receiver, type STOP
{ "type": "STOP", "sessionId": "154d8823-...", "requestId": 22 }
Close Connection: The streaming application, upon termination, sends a CLOSE message to tear down its virtual connection with the sender.
urn:x-cast:com.google.cast.tp.connection, type CLOSE
Return to Idle: The receiver returns to the idle screen and broadcasts a final RECEIVER_STATUS update, showing that “Backdrop” is now the active application.