14 KiB
Yandex Telemost WebRTC API Documentation - YTWA
Overview
Yandex Telemost implements a Selective Forwarding Unit (SFU) architecture for WebRTC conferencing. The system supports audio, video, and SCTP DataChannel transport over WebRTC with separate publisher and subscriber peer connections.
Architecture
Connection Model
- SFU-based routing (not P2P)
- Separate PeerConnections for publishing and subscribing
- WebSocket signaling channel
- STUN/TURN infrastructure for NAT traversal
Transport Capabilities
- Audio: Opus codec (48kHz, stereo, DTX, FEC)
- Video: VP8, VP9, H.264, AV1 codecs with simulcast support
- DataChannel: SCTP over DTLS (max message size: 1023MB, port: 5000)
API Endpoints
Base URL
https://cloud-api.yandex.ru/telemost_front/v2/telemost
1. Connection Initialization
Endpoint: GET /conferences/{encoded_conference_url}/connection
Parameters:
next_gen_media_platform_allowed: boolean (string "true")display_name: string (URL-encoded participant name)waiting_room_supported: boolean (string "true")
Headers:
User-Agent: Browser user agent stringAccept: "/"content-type: "application/json"Client-Instance-Id: UUID v4X-Telemost-Client-Version: Version string (e.g., "187.1.0")idempotency-key: UUID v4Origin: "https://telemost.yandex.ru"Referer: "https://telemost.yandex.ru/"
Response:
{
"connection_type": "CONFERENCE",
"uri": "https://telemost.yandex.ru/j/{conference_id}",
"room_id": "uuid",
"safe_room_id": "uuid",
"peer_id": "uuid",
"session_id": "uuid",
"peer_session_id": "uuid",
"credentials": "string",
"expiration_time": 1775424866469,
"conference_limit": 40,
"media_platform": "GOLOOM",
"client_configuration": {
"media_server_url": "wss://goloom.strm.yandex.net/join",
"service_name": "telemost",
"ice_servers": [
{
"urls": ["stun:stun.rtc.yandex.net:3478"]
}
],
"goloom_session_open_ms": 120000,
"wait_time_to_reconnect_ms": 3553
}
}
WebSocket Protocol
Connection
URL: Obtained from client_configuration.media_server_url
Protocol: WSS (WebSocket Secure)
Message Format
All messages are JSON objects with a uid field (UUID v4) and one message-specific field.
Message Types
1. Client Hello
Direction: Client → Server
Structure:
{
"uid": "uuid",
"hello": {
"participantMeta": {
"name": "string",
"role": "SPEAKER",
"description": "string",
"sendAudio": boolean,
"sendVideo": boolean
},
"participantAttributes": {
"name": "string",
"role": "SPEAKER",
"description": "string"
},
"sendAudio": boolean,
"sendVideo": boolean,
"sendSharing": boolean,
"participantId": "uuid",
"roomId": "uuid",
"serviceName": "telemost",
"credentials": "string",
"capabilitiesOffer": {
"offerAnswerMode": ["SEPARATE"],
"initialSubscriberOffer": ["ON_HELLO"],
"slotsMode": ["FROM_CONTROLLER"],
"simulcastMode": ["DISABLED", "STATIC"],
"selfVadStatus": ["FROM_SERVER", "FROM_CLIENT"],
"dataChannelSharing": ["TO_RTP"]
},
"sdkInfo": {
"implementation": "string",
"version": "string",
"userAgent": "string",
"hwConcurrency": integer
},
"sdkInitializationId": "uuid",
"disablePublisher": boolean,
"disableSubscriber": boolean,
"disableSubscriberAudio": boolean
}
}
2. Server Hello
Direction: Server → Client
Structure:
{
"uid": "uuid",
"serverHello": {
"capabilitiesAnswer": {
"offerAnswerMode": "SEPARATE",
"initialSubscriberOffer": "ON_HELLO",
"slotsMode": "FROM_CONTROLLER",
"simulcastMode": "DISABLED",
"selfVadStatus": "FROM_SERVER",
"dataChannelSharing": "TO_RTP",
"videoEncoderConfig": "NO_CONFIG",
"dataChannelVideoCodec": "UNIQUE_CODEC_FROM_TRACK_DESCRIPTION",
"bandwidthLimitationReason": "BANDWIDTH_REASON_ENABLED",
"publisherVp9": "PUBLISH_VP9_ENABLED",
"svcMode": "SVC_MODE_L3T3_KEY"
},
"servingComponents": [
{
"type": "BORDER|WEBRTC_SERVER|CONTROLLER",
"host": "string",
"version": "string"
}
],
"sessionSecret": "uuid",
"sfuPeerInitializationId": "uuid",
"rtcConfiguration": {
"iceServers": [
{
"urls": ["stun:turn.tel.yandex.net", "stun:stun.rtc.yandex.net"],
"credential": "",
"username": ""
},
{
"urls": ["turn:turn.tel.yandex.net:443"],
"credential": "string",
"username": "string"
}
]
},
"pingPongConfiguration": {
"pingInterval": 5000,
"ackTimeout": 9000
},
"telemetryConfiguration": {
"sendingInterval": 20000
}
}
}
3. Acknowledgment
Direction: Client → Server
Structure:
{
"uid": "uuid",
"ack": {
"status": {
"code": "OK",
"description": "string"
}
}
}
4. Subscriber SDP Offer
Direction: Server → Client
Structure:
{
"uid": "uuid",
"subscriberSdpOffer": {
"pcSeq": integer,
"sdp": "string"
}
}
SDP Format: Standard WebRTC SDP with bundled media streams. Includes:
- Video tracks (m=video) with VP8, VP9, H.264, AV1 codecs
- Audio tracks (m=audio) with Opus, PCMA, PCMU, G722 codecs
- Application track (m=application) for SCTP DataChannel
5. Subscriber SDP Answer
Direction: Client → Server
Structure:
{
"uid": "uuid",
"subscriberSdpAnswer": {
"pcSeq": integer,
"sdp": "string"
}
}
6. Publisher SDP Offer
Direction: Client → Server
Structure:
{
"uid": "uuid",
"publisherSdpOffer": {
"pcSeq": integer,
"sdp": "string"
}
}
7. Publisher SDP Answer
Direction: Server → Client
Structure:
{
"uid": "uuid",
"publisherSdpAnswer": {
"pcSeq": integer,
"sdp": "string"
}
}
8. ICE Candidate
Direction: Bidirectional
Structure:
{
"uid": "uuid",
"webrtcIceCandidate": {
"candidate": "string",
"sdpMid": "string",
"sdpMlineIndex": integer,
"usernameFragment": "string",
"target": "PUBLISHER|SUBSCRIBER",
"pcSeq": integer
}
}
Candidate Format: Standard ICE candidate string
candidate:{foundation} {component} {protocol} {priority} {ip} {port} typ {type} [tcptype {tcptype}]
9. VAD Activity
Direction: Server → Client
Structure:
{
"uid": "uuid",
"vadActivity": {
"active": boolean
}
}
10. Update Description
Direction: Server → Client
Structure:
{
"uid": "uuid",
"updateDescription": {
"description": [
{
"id": "peer-uuid",
"meta": {
"name": "string",
"role": "SPEAKER",
"description": "string",
"sendAudio": boolean,
"sendVideo": boolean
},
"participantAttributes": {
"name": "string",
"role": "SPEAKER",
"description": "string"
},
"sendSharing": boolean,
"tracks": []
}
]
}
}
Description: Sent by server when participants join/leave or change their media state. Contains full state of all participants in the conference.
Connection Flow
1. Initialization Phase
- Client requests connection info via REST API
- Server responds with room credentials and WebSocket URL
- Client establishes WebSocket connection
2. Handshake Phase
- Client sends
hellomessage with capabilities - Server responds with
serverHellocontaining negotiated capabilities - Client acknowledges with
ackmessage
3. Subscriber Setup
- Server sends
subscriberSdpOfferwith remote media tracks - Client creates answer and sends
subscriberSdpAnswer - Client acknowledges offer
- ICE candidates exchanged for subscriber PeerConnection
4. Publisher Setup
- Client creates offer and sends
publisherSdpOffer - Server responds with
publisherSdpAnswer - Client acknowledges answer
- ICE candidates exchanged for publisher PeerConnection
5. Media Exchange
- DataChannel opens on publisher PeerConnection
- Audio/video tracks activated based on configuration
- Bidirectional media flow through SFU
DataChannel Specifications
Configuration
- Protocol: SCTP over DTLS
- Port: 5000
- Max Message Size (Advertised): 1,073,741,823 bytes (1023 MB)
- Max Message Size (Actual): 8,192 bytes (8 KB)
- Ordered: Configurable (recommended: true)
- Label: Custom (e.g., "olcrtc")
SDP Attributes
m=application 9 UDP/DTLS/SCTP webrtc-datachannel
a=sctp-port:5000
a=max-message-size:1073741823
Message Size Limitations
The GOLOOM media server enforces a hard limit of 8KB per SCTP message despite advertising 1023MB in SDP. Messages exceeding 8KB are silently dropped.
Verified Limits:
- 8KB (8,192 bytes): ✓ Delivered
- 10KB (10,240 bytes): ✗ Dropped
Root Cause: SCTP fragmentation limit. Messages requiring more than ~6-7 UDP packets (MTU 1500) exceed server's reassembly buffer.
Large Data Transfer
For data exceeding 8KB, implement application-level chunking:
Method: Split data into 8KB chunks, send sequentially with bufferedAmount throttling.
Performance Metrics (Measured):
- 64KB: 2,198ms (239 Kbps)
- 128KB: 2,218ms (473 Kbps)
- 256KB: 2,204ms (952 Kbps)
Implementation:
chunk_size = 8192
for i in range(0, len(data), chunk_size):
while datachannel.bufferedAmount > chunk_size * 2:
await asyncio.sleep(0.001)
datachannel.send(data[i:i+chunk_size])
Latency Characteristics (RTT):
- 100 bytes: 42-54ms
- 1KB: 63-74ms
- 4KB: 54-118ms
- 8KB: 84-201ms
Usage
DataChannel is created on the publisher PeerConnection and becomes available on the subscriber PeerConnection through the ondatachannel event.
Audio Codec Configuration
Opus
- Sample Rate: 48000 Hz
- Channels: 2 (stereo)
- Parameters:
minptime=10useinbandfec=1(Forward Error Correction)usedtx=1(Discontinuous Transmission)
RED (Redundant Audio Data)
- Payload Type: 101
- Format:
111/111(Opus redundancy)
Audio Channel Limitations
WARNING: Audio channels are unreliable for data transmission through Yandex Telemost SFU.
Observed Issues:
- Severe signal degradation (>95% energy loss)
- Opus codec applies aggressive voice optimization
- Automatic gain control destroys non-voice signals
- Voice activity detection (VAD) filters out data tones
- Noise suppression corrupts encoded data
Test Results:
- Transmitted signal energy: 0.4973 (normalized)
- Received signal energy: 0.0031-0.0125 (99% loss)
- DTMF tones: Not detectable after transmission
- ggwave encoding: Completely destroyed by codec
Root Cause: WebRTC audio pipeline optimized for human voice, not arbitrary data. The Opus codec with DTX, FEC, and server-side processing makes audio channels unsuitable for data encoding schemes (FSK, DTMF, ggwave, etc.).
Recommendation: Use DataChannel for reliable data transmission. Audio channels should only be used for actual voice communication.
Video Codec Configuration
Supported Codecs
-
VP8
- Payload type: 96
- RTX support: 97
- NACK, PLI, REMB feedback
-
VP9
- Payload type: 98
- Profile: 0
- RTX support: 99
- NACK, PLI, REMB feedback
-
H.264
- Multiple profiles supported:
- Baseline (42001f, 42e01f)
- Main (4d001f)
- High (64001f)
- Packetization modes: 0, 1
- Level asymmetry allowed
- Multiple profiles supported:
-
AV1
- Payload type: 45
- RTX support: 46
RTP Extensions
http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
ICE Configuration
STUN Servers
stun:stun.rtc.yandex.net:3478stun:turn.tel.yandex.net
TURN Servers
turn:turn.tel.yandex.net:443turn:turn.tel.yandex.net:443?transport=tcp
Credentials are time-limited and provided in serverHello.rtcConfiguration.iceServers.
Error Handling
Connection Errors
- Retry with exponential backoff
- Maximum wait time:
wait_time_to_reconnect_ms(typically 3553ms)
Session Timeout
- Session open timeout:
goloom_session_open_ms(typically 120000ms) - Expiration time provided in connection response
Ping/Pong
- Ping interval: 5000ms
- ACK timeout: 9000ms
- Connection considered dead if no ACK received
Security Considerations
Authentication
- Credentials obtained from REST API
- Time-limited session tokens
- Peer ID and room ID validation
Transport Security
- WSS (WebSocket Secure) for signaling
- DTLS for media transport
- SRTP for audio/video encryption
Implementation Notes
Guest Access
- No authentication required for participants
- Only conference initiator needs account
- Display name can be arbitrary
Peer Limit
- Default conference limit: 40 participants
- Configurable per conference
User Agent Spoofing
- Server may validate
User-AgentandsdkInfo - Recommended to use realistic browser signatures
DataChannel Availability
- DataChannel support is not guaranteed
- Check SDP for
m=applicationline before assuming availability - Server may disable DataChannel without notice
DataChannel Message Size Workaround
The 8KB message limit can be bypassed using application-level chunking. For reliable large data transfer:
- Split payload into 8KB chunks
- Add sequence headers (chunk index, total chunks, transfer ID)
- Implement reassembly buffer on receiver
- Use bufferedAmount throttling to prevent congestion
Throughput: ~950 Kbps sustained for 256KB transfers with 32 sequential 8KB messages.
Rate Limits
Not explicitly documented. Recommended approach:
- Limit connection attempts per IP
- Implement exponential backoff on failures
- Respect session expiration times
Bandwidth Configuration
Video Layers
- L1: 1000 kbps (single layer)
- L2: 120 kbps (low), 360 kbps (med)
- L3: 120 kbps (low), 360 kbps (med), 800 kbps (hi)
- L4: 120 kbps (low), 360 kbps (med), 800 kbps (hi), 1000 kbps (ultra)
Screen Sharing (4K)
- Codec: VP8
- Min Bitrate: 300 kbps
- Max Bitrate: 2000 kbps
- Min Framerate: 8 fps
- Max Framerate: 30 fps
- Content Hint: detail
Telemetry
- Sending interval: 20000ms
- Endpoint:
logEndpoint(if provided in serverHello) - Format: Not documented
Version Information
- API Version: v2
- Client Version: 187.1.0 (as of documentation date)
- Media Platform: GOLOOM
- SDK Version: Configurable in
sdkInfo