
[AndroidRTC-10] How does WebRTC decide which codecs both endpoints use?

article 2025/3/26 13:56:10

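The Android-RTC series is getting a soft reboot. Instead of reading the source line by line as before, each article now starts from a concrete problem and analyzes the code around it, which makes the material more practical and easier to commit to muscle memory. Topics are not restricted by category or difficulty, and suggestions for questions to dig into are welcome.

Question: how does webrtc-android decide which codec to use, and how can video be switched to H.264?

Analysis: as covered in earlier theory articles, before two RTC endpoints can establish media they must exchange SDP using the Offer/Answer model. The SDP carries the media descriptions from which the media formats for the session are determined, so SDP is where this article focuses.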

SDP Structure

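An SDP description has two parts: the session-level description and the media-level descriptions. The exact layout is defined in RFC 4566; entries marked with an asterisk (*) are optional. The usual structure is:

Session description (session level)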
         v=  (protocol version)
         o=  (originator and session identifier)
         s=  (session name)
         c=* (connection information -- not required if included in all media)
         One or more Time descriptions ("t=" and "r=" lines; see below)
         a=* (zero or more session attribute lines)
         Zero or more Media descriptions

Time description
         t=  (time the session is active)

Media description (media level), if present
         m=  (media name and transport address)
         c=* (connection information -- optional if included at session level)
         a=* (zero or more media attribute lines)

A real SDP description (video only, audio not enabled)

    v=0
    o=- 5771495527976275027 2 IN IP4 127.0.0.1
    s=-
    t=0 0
    a=group:BUNDLE 0
    a=extmap-allow-mixed
    a=msid-semantic: WMS
    m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 35 36 37 38 39 40 41 42 127 103 104 105 106 107 108 43
    c=IN IP4 0.0.0.0
    a=rtcp:9 IN IP4 0.0.0.0
    a=ice-ufrag:e8ac
    a=ice-pwd:J2ER3pB4qDi1l6dExhW5fgmw
    a=ice-options:trickle renomination
    a=fingerprint:sha-256 82:EF:59:DD:D4:8B:49:75:F1:6B:54:CF:E4:8B:94:4A:86:C6:21:C0:23:E7:6C:CA:E4:B4:E2:30:E5:F0:38:11
    a=setup:actpass
    a=mid:0
    a=extmap:1 urn:ietf:params:rtp-hdrext:toffset
    a=extmap:2 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
    a=extmap:3 urn:3gpp:video-orientation
    a=extmap:4 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
    a=extmap:5 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay
    a=extmap:6 http://www.webrtc.org/experiments/rtp-hdrext/video-content-type
    a=extmap:7 http://www.webrtc.org/experiments/rtp-hdrext/video-timing
    a=extmap:8 http://www.webrtc.org/experiments/rtp-hdrext/color-space
    a=extmap:9 urn:ietf:params:rtp-hdrext:sdes:mid
    a=extmap:10 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id
    a=extmap:11 urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id
    a=recvonly
    a=rtcp-mux
    a=rtcp-rsize
    a=rtpmap:96 VP8/90000
    a=rtcp-fb:96 rrtr
    a=rtcp-fb:96 goog-remb
    a=rtcp-fb:96 transport-cc
    a=fmtp:96 x-google-max-bitrate=50000;x-google-min-bitrate=500;x-google-start-bitrate=2000;x-google-huge-frames-sent=0;x-google-packetization-mode=1
    a=rtcp-fb:96 ccm fir
    a=rtcp-fb:96 nack
    a=rtcp-fb:96 nack pli
    a=rtpmap:97 rtx/90000
    a=fmtp:97 apt=96
    a=rtpmap:98 VP9/90000
    a=rtcp-fb:98 goog-remb
    a=rtcp-fb:98 transport-cc
    a=rtcp-fb:98 ccm fir
    a=rtcp-fb:98 nack
    a=rtcp-fb:98 nack pli
    a=fmtp:98 profile-id=0
    a=rtpmap:99 rtx/90000
    a=fmtp:99 apt=98
    a=rtpmap:35 VP9/90000
    a=rtcp-fb:35 goog-remb
    a=rtcp-fb:35 transport-cc
    a=rtcp-fb:35 ccm fir
    a=rtcp-fb:35 nack
    a=rtcp-fb:35 nack pli
    a=fmtp:35 profile-id=1
    a=rtpmap:36 rtx/90000
    a=fmtp:36 apt=35
    a=rtpmap:37 VP9/90000
    a=rtcp-fb:37 goog-remb
    a=rtcp-fb:37 transport-cc
    a=rtcp-fb:37 ccm fir
    a=rtcp-fb:37 nack
    a=rtcp-fb:37 nack pli
    a=fmtp:37 profile-id=3
    a=rtpmap:38 rtx/90000
    a=fmtp:38 apt=37
    a=rtpmap:39 AV1/90000
    a=rtcp-fb:39 goog-remb
    a=rtcp-fb:39 transport-cc
    a=rtcp-fb:39 ccm fir
    a=rtcp-fb:39 nack
    a=rtcp-fb:39 nack pli
    a=rtpmap:40 rtx/90000
    a=fmtp:40 apt=39
    a=rtpmap:41 AV1/90000
    a=rtcp-fb:41 goog-remb
    a=rtcp-fb:41 transport-cc
    a=rtcp-fb:41 ccm fir
    a=rtcp-fb:41 nack
    a=rtcp-fb:41 nack pli
    a=fmtp:41 profile=1
    a=rtpmap:42 rtx/90000
    a=fmtp:42 apt=41
    a=rtpmap:127 H264/90000
    a=rtcp-fb:127 goog-remb
    a=rtcp-fb:127 transport-cc
    a=rtcp-fb:127 ccm fir
    a=rtcp-fb:127 nack
    a=rtcp-fb:127 nack pli
    a=fmtp:127 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
    a=rtpmap:103 rtx/90000
    a=fmtp:103 apt=127
    a=rtpmap:104 H265/90000
    a=rtcp-fb:104 goog-remb
    a=rtcp-fb:104 transport-cc
    a=rtcp-fb:104 ccm fir
    a=rtcp-fb:104 nack
    a=rtcp-fb:104 nack pli
    a=rtpmap:105 rtx/90000
    a=fmtp:105 apt=104
    a=rtpmap:106 red/90000
    a=rtpmap:107 rtx/90000
    a=fmtp:107 apt=106
    a=rtpmap:108 ulpfec/90000
    a=rtpmap:43 flexfec-03/90000
    a=rtcp-fb:43 goog-remb
    a=rtcp-fb:43 transport-cc
    a=fmtp:43 repair-window=10000000

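SDP lines are order-sensitive. For example, the attributes that follow a=rtpmap:96 (rtcp-fb, fmtp) describe that payload type, up to the next a=rtpmap line.

When parsing SDP, every line has the form key=... When the key is a, the rest of the line takes one of two forms:

a=<attribute> or a=<attribute>:<value>

For example, in c=IN IP4 0.0.0.0 the key is c.
For example, in a=rtcp-mux the key is a, the attribute is rtcp-mux, and there is no value.
For example, in a=rtpmap:96 VP8/90000 the key is a, the attribute is rtpmap, and the value is 96 VP8/90000.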
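
The splitting rule above can be sketched in a few lines of C++17. This is only an illustration: `SdpLine` and `ParseSdpLine` are hypothetical names made up for this example, not part of WebRTC, and real parsers validate far more than this.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// One parsed SDP line. `attribute` and `value` are only filled for "a=" lines.
// (Hypothetical type for illustration, not a WebRTC class.)
struct SdpLine {
  char key = '\0';        // the single character before '='
  std::string body;       // everything after '='
  std::string attribute;  // a= lines: text before the first ':'
  std::string value;      // a= lines: text after the first ':' (may be empty)
};

// Splits one line of the form <key>=<body>. Returns std::nullopt when the
// line does not have that shape.
std::optional<SdpLine> ParseSdpLine(const std::string& line) {
  if (line.size() < 2 || line[1] != '=') {
    return std::nullopt;
  }
  SdpLine parsed;
  parsed.key = line[0];
  parsed.body = line.substr(2);
  if (parsed.key == 'a') {
    const std::size_t colon = parsed.body.find(':');
    if (colon == std::string::npos) {
      parsed.attribute = parsed.body;  // flag attribute such as a=rtcp-mux
    } else {
      parsed.attribute = parsed.body.substr(0, colon);
      parsed.value = parsed.body.substr(colon + 1);
    }
  }
  return parsed;
}
```

Splitting on the first colon only matters for lines such as a=extmap:2 http://..., where the value itself contains colons.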

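For a detailed introduction, see: WebRTC SDP 详解和剖析 (Alibaba Cloud Developer Community)

Let's skip straight to the media-level description. The SDP below describes one audio section and one video section; its format follows RFC 4566: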

.........
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=mid:audio
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
.........
m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 39 40 127 
a=mid:video
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96
.........

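The numbers at the end of each m= line, 111 for audio and 96 97 ... for video, are the fmt list: the payload types that identify the media codec formats. They are followed by rtpmap, rtcp-fb, and fmtp attributes that describe each format in more detail.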
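
To make the layout of the m= line concrete, here is a minimal sketch that pulls out the fmt list. `MediaLine` and `ParseMediaLine` are hypothetical names for this example, and the sketch assumes numeric RTP payload types and a plain integer port.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Fields of "m=<media> <port> <proto> <fmt> ..." (illustrative type only).
struct MediaLine {
  std::string media;               // "audio", "video", ...
  int port = 0;                    // 9 in the examples above
  std::string protocol;            // e.g. "UDP/TLS/RTP/SAVPF"
  std::vector<int> payload_types;  // the fmt list, in the order it appears
};

// Parses an m= line into `out`. Returns false when the line is not an m=
// line or the first three fields are missing.
bool ParseMediaLine(const std::string& line, MediaLine* out) {
  if (line.rfind("m=", 0) != 0) {
    return false;
  }
  std::istringstream in(line.substr(2));
  MediaLine parsed;
  if (!(in >> parsed.media >> parsed.port >> parsed.protocol)) {
    return false;
  }
  int payload_type = 0;
  while (in >> payload_type) {
    parsed.payload_types.push_back(payload_type);
  }
  *out = parsed;
  return true;
}
```

The order of `payload_types` is what matters later in this article: the first entry is the most preferred format.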

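The a=mid attribute can be thought of as the unique ID of each m= section. With a=mid:audio, for example, the string audio is the ID of that m= section. The mid value can also be a number: with a=mid:0, the ID is 0. mid values are usually used together with the BUNDLE grouping attribute, for example a=group:BUNDLE audio video, which means the m= sections with mid audio and video are multiplexed over one transport in this session.
The number 9 on the m= line is the transport port for that media. In RTC scenarios data is sent using the addresses from ICE candidates, so the port on the m= line is not actually used.
RTX means retransmission. For example, video payload type 97 is the retransmission format for apt=96. In other words, 97 is not a separate codec: it carries retransmitted packets for 96 (VP8).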
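
The apt relationship can be collected mechanically. The sketch below scans an SDP blob for lines of the form a=fmtp:<rtx> apt=<primary> and builds a map from RTX payload type to the payload type it protects. `CollectRtxAssociations` is a made-up helper for illustration, and it assumes apt is the first fmtp parameter, as in the samples above.

```cpp
#include <cstddef>
#include <exception>
#include <map>
#include <sstream>
#include <string>

// Returns a map: RTX payload type -> primary payload type it retransmits.
std::map<int, int> CollectRtxAssociations(const std::string& sdp) {
  std::map<int, int> rtx_to_primary;
  const std::string prefix = "a=fmtp:";
  std::istringstream in(sdp);
  std::string line;
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
    if (line.compare(0, prefix.size(), prefix) != 0) continue;
    const std::size_t space = line.find(' ', prefix.size());
    if (space == std::string::npos) continue;
    const std::string params = line.substr(space + 1);
    if (params.compare(0, 4, "apt=") != 0) continue;  // some other fmtp line
    try {
      const int rtx_pt =
          std::stoi(line.substr(prefix.size(), space - prefix.size()));
      const int primary_pt = std::stoi(params.substr(4));
      rtx_to_primary[rtx_pt] = primary_pt;
    } catch (const std::exception&) {
      // Payload type was not a number: skip the malformed line.
    }
  }
  return rtx_to_primary;
}
```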

How is the final codec determined? It is the first media format in the intersection of the local offer and the remote answer.
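
As a rough model of that rule, the sketch below intersects two codec name lists while keeping the offerer's order, then takes the first entry. It is a deliberate simplification written for this article: the function names are hypothetical, codecs are compared by name only, and the real negotiation also matches parameters such as profile-level-id and treats rtx/red/ulpfec specially.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// Codec names present on both sides, kept in the offerer's preference order.
std::vector<std::string> IntersectInOfferOrder(
    const std::vector<std::string>& offered,
    const std::vector<std::string>& answered) {
  std::vector<std::string> common;
  for (const std::string& codec : offered) {
    const bool in_answer =
        std::find(answered.begin(), answered.end(), codec) != answered.end();
    const bool already_added =
        std::find(common.begin(), common.end(), codec) != common.end();
    if (in_answer && !already_added) {
      common.push_back(codec);
    }
  }
  return common;
}

// The session codec under this simplified model: first common entry, if any.
std::optional<std::string> PickSessionCodec(
    const std::vector<std::string>& offered,
    const std::vector<std::string>& answered) {
  const std::vector<std::string> common =
      IntersectInOfferOrder(offered, answered);
  if (common.empty()) {
    return std::nullopt;
  }
  return common.front();
}
```

With an offer of VP8, VP9, AV1, H264 and an answer that supports H264 and VP8, this model picks VP8. Moving H264 to the front of the offer changes the result to H264, which is why the order of the local offer is the thing to control.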

PlanB and UnifiedPlan

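The source code also touches on Plan B and Unified Plan. is_unified_plan is the flag in WebRTC that indicates whether Unified Plan SDP semantics are in use. Unified Plan is the newer standard, in which each media track gets its own m= line; Plan B is the older format, in which all streams of the same media type share one m= line. Chrome has used Unified Plan by default since M72 and began formally deprecating Plan B around M89.

In SDP, ssrc attributes indicate how many media streams there are, as shown below.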

m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 39 40 127 103 104 105 106 107 108
a=ssrc-group:FID 61733252 3497130671
a=ssrc:61733252 cname:3pW572Zij3FqEUjL
a=ssrc:61733252 msid:ARDAMS ARDAMSv0
a=ssrc:3497130671 cname:3pW572Zij3FqEUjL
a=ssrc:3497130671 msid:ARDAMS ARDAMSv0

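In practice audio and video may each have multiple SSRCs, and the codec used for each SSRC may or may not be the same. For example, in an Internet video conference, clients joining from mobile devices may all use H.264, while other kinds of endpoints may use other codecs.

If the SSRCs use different codecs, putting them in the same m= section is a problem, and that is the crux of Plan B versus Unified Plan. Plan B has only one m=audio and one m=video section, so the streams within each must share the same codecs, and multiple streams are told apart by SSRC. Unified Plan can have multiple m=audio and m=video sections, one per stream, so each stream can use a different codec.

Plan B and Unified Plan are simply two different SDP negotiation styles for WebRTC in the multi media source scenario. In terms of Streams and Tracks: a Stream may contain an AudioTrack and a VideoTrack, and more Streams mean more Tracks. If every Track maps to its own m= section, that is Unified Plan; if one m= line describes multiple Tracks (track ids), that is Plan B.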
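
The "tracks per m= section" distinction can be checked on an SDP string. The sketch below counts distinct track ids announced through a=ssrc:<id> msid:<stream> <track> lines in each m= section and returns the largest count; a result above 1 suggests Plan B style. `MaxTracksPerMediaSection` is a hypothetical helper, and it only looks at ssrc-level msid lines, so treat it as a heuristic rather than a detector.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>

// Largest number of distinct track ids found in any single m= section.
std::size_t MaxTracksPerMediaSection(const std::string& sdp) {
  std::size_t max_tracks = 0;
  std::set<std::string> tracks;  // track ids seen in the current m= section
  std::istringstream in(sdp);
  std::string line;
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
    if (line.compare(0, 2, "m=") == 0) {
      tracks.clear();  // a new media section starts here
      continue;
    }
    if (line.compare(0, 7, "a=ssrc:") != 0) continue;
    const std::size_t msid = line.find(" msid:");
    if (msid == std::string::npos) continue;
    // "msid:<stream id> <track id>": the track id follows the last space.
    const std::size_t last_space = line.rfind(' ');
    if (last_space <= msid) continue;  // no track id on this line
    tracks.insert(line.substr(last_space + 1));
    max_tracks = std::max(max_tracks, tracks.size());
  }
  return max_tracks;
}
```

For the ARDAMS sample above, both SSRCs (the primary stream and its FID partner) carry the same track id, so the function returns 1.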

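Reading the source code

Back to the question: how does webrtc-android decide which codec to use, and how can video be switched to H.264?

Answer: SDP is a text-based, order-sensitive protocol. After the SDP exchange, the codec for the session is the first common entry in the media format lists of the local offer and the remote answer. In other words, it is decided by the local and remote descriptions together.

So we need to know how the local offer builds its list of video formats, and how to get the H.264-related formats into the first position of the m=video line in the SDP.

The createOffer call chain is as follows:

Java layer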
|--> PeerConnection.createOffer(sdpObserver, sdpMediaConstraints);
|--> nativeCreateOffer

sdk\android\src\jni\pc\peer_connection.cc
|--> static void JNI_PeerConnection_CreateOffer( ... )

api\peer_connection_interface.h
|--> virtual void CreateOffer(CreateSessionDescriptionObserver* observer,
                           const RTCOfferAnswerOptions& options) = 0;

pc\peer_connection.cc
void PeerConnection::CreateOffer(CreateSessionDescriptionObserver* observer,
                                 const RTCOfferAnswerOptions& options) {
  RTC_DCHECK_RUN_ON(signaling_thread());
  sdp_handler_->CreateOffer(observer, options);
}

pc\sdp_offer_answer.cc
void SdpOfferAnswerHandler::DoCreateOffer(
    const PeerConnectionInterface::RTCOfferAnswerOptions& options,
    rtc::scoped_refptr<CreateSessionDescriptionObserver> observer) {
    ... ...
    cricket::MediaSessionOptions session_options;
    GetOptionsForOffer(options, &session_options);
    webrtc_session_desc_factory_->CreateOffer(observer.get(), options,
                                            session_options);
}

I won't go through the earlier part of the call chain in detail. The part to focus on is sdp_handler_ and its webrtc_session_desc_factory_, whose CreateOffer in turn calls InternalCreateOffer.

void WebRtcSessionDescriptionFactory::InternalCreateOffer(
    CreateSessionDescriptionRequest request) {
  ... ...
  auto result = session_desc_factory_.CreateOfferOrError(
      request.options, sdp_info_->local_description()
                           ? sdp_info_->local_description()->description()
                           : nullptr);
  if (!result.ok()) {
    PostCreateSessionDescriptionFailed(request.observer.get(), result.error());
    return;
  }
  std::unique_ptr<cricket::SessionDescription> desc = std::move(result.value());
  RTC_CHECK(desc);
  ... ...
}

The key call in WebRtcSessionDescriptionFactory::InternalCreateOffer jumps to CreateOfferOrError in cricket::MediaSessionDescriptionFactory, shown below.

webrtc::RTCErrorOr<std::unique_ptr<SessionDescription>>
MediaSessionDescriptionFactory::CreateOfferOrError(
    const MediaSessionOptions& session_options,
    const SessionDescription* current_description) const {
  ... ... ...
  StreamParamsVec current_streams =
      GetCurrentStreamParams(current_active_contents);

  AudioCodecs offer_audio_codecs;
  VideoCodecs offer_video_codecs;
  GetCodecsForOffer(current_active_contents, &offer_audio_codecs,
                    &offer_video_codecs);
  
  auto offer = std::make_unique<SessionDescription>();
  ... ... ...
}

Note: how these members relate to each other
std::unique_ptr<SdpOfferAnswerHandler> sdp_handler_
|——> std::unique_ptr<WebRtcSessionDescriptionFactory> webrtc_session_desc_factory_
|——————> cricket::MediaSessionDescriptionFactory session_desc_factory_;

GetCodecsForOffer is clearly what we are after: it collects the media capabilities supported by the current device. Its full code is below, and all_video_codecs_ is the member variable we are looking for.

// Getting codecs for an offer involves these steps:
// 1. Construct payload type -> codec mappings for current description.
// 2. Add any reference codecs that weren't already present
// 3. For each individual media description (m= section), filter codecs based
//    on the directional attribute (happens in another method).
void MediaSessionDescriptionFactory::GetCodecsForOffer(
    const std::vector<const ContentInfo*>& current_active_contents,
    AudioCodecs* audio_codecs,
    VideoCodecs* video_codecs) const {
  // First - get all codecs from the current description if the media type is used.
  // Add them to `used_pltypes` so the payload type is not reused if a
  // new media type is added.
  UsedPayloadTypes used_pltypes;
  MergeCodecsFromDescription(current_active_contents, audio_codecs,
                             video_codecs, &used_pltypes);

  // Add our codecs that are not in the current description.
  MergeCodecs(all_audio_codecs_, audio_codecs, &used_pltypes);
  MergeCodecs(all_video_codecs_, video_codecs, &used_pltypes);
}

all_video_codecs_ is the union of video_recv_codecs_ and video_send_codecs_. It is computed in ComputeVideoCodecsIntersectionAndUnion, which is called from the MediaSessionDescriptionFactory constructor.

void MediaSessionDescriptionFactory::ComputeVideoCodecsIntersectionAndUnion() {
  video_sendrecv_codecs_.clear();

  // Use ComputeCodecsUnion to avoid having duplicate payload IDs
  all_video_codecs_ =
      ComputeCodecsUnion(video_recv_codecs_, video_send_codecs_);

  // Use NegotiateCodecs to merge our codec lists, since the operation is
  // essentially the same. Put send_codecs as the offered_codecs, which is the
  // order we'd like to follow. The reasoning is that encoding is usually more
  // expensive than decoding, and prioritizing a codec in the send list probably
  // means it's a codec we can handle efficiently.
  NegotiateCodecs(video_recv_codecs_, video_send_codecs_,
                  &video_sendrecv_codecs_, true);
}
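
To illustrate what "union without duplicate payload IDs" means, here is a much simplified sketch. It is not WebRTC's ComputeCodecsUnion: the `Codec` struct and `UnionWithUniquePayloadTypes` are hypothetical, codecs are treated as equal when their names match (the real code also compares parameters, so several VP9 profiles can coexist), and a colliding payload type is simply moved to the first free value in the dynamic range 96..127.

```cpp
#include <set>
#include <string>
#include <vector>

// Minimal stand-in for a codec entry (illustrative only).
struct Codec {
  int payload_type;
  std::string name;
};

// Appends to `first` every codec of `second` that `first` lacks, keeping
// payload types unique across the merged list.
std::vector<Codec> UnionWithUniquePayloadTypes(
    const std::vector<Codec>& first,
    const std::vector<Codec>& second) {
  std::vector<Codec> merged = first;
  std::set<int> used_payload_types;
  std::set<std::string> names;
  for (const Codec& codec : first) {
    used_payload_types.insert(codec.payload_type);
    names.insert(codec.name);
  }
  for (Codec codec : second) {  // copy: the payload type may be reassigned
    if (names.count(codec.name) > 0) continue;  // already in the union
    if (used_payload_types.count(codec.payload_type) > 0) {
      // Collision: take the first free dynamic payload type. If none is
      // free the original value is kept (not handled in this sketch).
      for (int pt = 96; pt <= 127; ++pt) {
        if (used_payload_types.count(pt) == 0) {
          codec.payload_type = pt;
          break;
        }
      }
    }
    used_payload_types.insert(codec.payload_type);
    names.insert(codec.name);
    merged.push_back(codec);
  }
  return merged;
}
```

Entries from the first list keep their position, so whichever list is passed first sets the preference order of the result.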

MediaSessionDescriptionFactory::MediaSessionDescriptionFactory(
    cricket::MediaEngineInterface* media_engine,
    bool rtx_enabled,
    rtc::UniqueRandomIdGenerator* ssrc_generator,
    const TransportDescriptionFactory* transport_desc_factory)
    : ssrc_generator_(ssrc_generator),
      transport_desc_factory_(transport_desc_factory) {
  RTC_CHECK(transport_desc_factory_);
  if (media_engine) {
    audio_send_codecs_ = media_engine->voice().send_codecs();
    audio_recv_codecs_ = media_engine->voice().recv_codecs();
    video_send_codecs_ = media_engine->video().send_codecs(rtx_enabled);
    video_recv_codecs_ = media_engine->video().recv_codecs(rtx_enabled);
  }
  ComputeAudioCodecsIntersectionAndUnion();
  ComputeVideoCodecsIntersectionAndUnion();
}

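Once media_engine shows up, things get easier: it is the media engine that hangs off global objects such as PeerConnectionFactory and ConnectionContext, which are created by the application layer. From here we can trace backwards to verify where it comes from, and trace forwards to find the call stack.

Tracing backwards:

media_engine()->video() corresponds to the video_decoder_factory/video_encoder_factory in MediaEngineInterface, as introduced years ago in the article "Android-RTC-9 PeerConnectionFactory". These are the DefaultVideoDecoderFactory/DefaultVideoEncoderFactory passed to PeerConnectionFactory in the application layer.

Tracing forwards, the methods called are: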

// File location: media\engine\webrtc_video_engine.cc
std::vector<VideoCodec> WebRtcVideoEngine::send_codecs(bool include_rtx) const {
  return GetPayloadTypesAndDefaultCodecs(encoder_factory_.get(),
                                         /*is_decoder_factory=*/false,
                                         include_rtx, trials_);
}

template <class T>
std::vector<VideoCodec> GetPayloadTypesAndDefaultCodecs(
    const T* factory, bool is_decoder_factory, bool include_rtx,
    const webrtc::FieldTrialsView& trials) {
  if (!factory) {
    return {};
  }

  std::vector<webrtc::SdpVideoFormat> supported_formats =
      factory->GetSupportedFormats();
  ... ...
}

VideoEncoderFactoryWrapper::VideoEncoderFactoryWrapper(
    JNIEnv* jni,
    const JavaRef<jobject>& encoder_factory)
    : encoder_factory_(jni, encoder_factory) {
  const ScopedJavaLocalRef<jobjectArray> j_supported_codecs =
      Java_VideoEncoderFactory_getSupportedCodecs(jni, encoder_factory);
  supported_formats_ = JavaToNativeVector<SdpVideoFormat>(
      jni, j_supported_codecs, &VideoCodecInfoToSdpVideoFormat);
  const ScopedJavaLocalRef<jobjectArray> j_implementations =
      Java_VideoEncoderFactory_getImplementations(jni, encoder_factory);
  implementations_ = JavaToNativeVector<SdpVideoFormat>(
      jni, j_implementations, &VideoCodecInfoToSdpVideoFormat);
}

std::vector<SdpVideoFormat> VideoEncoderFactoryWrapper::GetSupportedFormats()
    const {
  return supported_formats_;
}

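This mainly comes down to getSupportedCodecs in HardwareVideoEncoderFactory. Where supportedCodecInfos is built there, we put VideoCodecMimeType.H264 in the first position, which gives H.264 the highest priority.

This approach requires modifying the source and rebuilding, which is fairly invasive. The latest demo also still supports the brute-force approach from years ago: in the application layer, wait for onCreateSuccess of the local description, then edit the SDP as text and move the payload types of the desired codec to the front of the m=video line. It is simple and direct; pick whichever approach you prefer.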
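
The text-editing approach can be sketched as follows. On Android this would normally be written in Java inside the SdpObserver callback; it is shown in C++ here only to stay consistent with the other listings. `PreferVideoCodec` is a hypothetical helper, not the demo's code. It assumes a single m=video section, matches the codec name exactly as written in a=rtpmap, and leaves the related rtx payload types where they are.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Rewrites the first m=video line so that payload types mapped to `codec`
// come first. Returns the SDP unchanged if there is nothing to reorder;
// otherwise the result uses CRLF line endings.
std::string PreferVideoCodec(const std::string& sdp, const std::string& codec) {
  std::vector<std::string> lines;
  std::istringstream in(sdp);
  for (std::string line; std::getline(in, line);) {
    if (!line.empty() && line.back() == '\r') line.pop_back();
    lines.push_back(line);
  }

  // Pass 1: find m=video and the payload types whose rtpmap names `codec`.
  int m_index = -1;
  std::vector<std::string> preferred;
  const std::string rtpmap = "a=rtpmap:";
  for (std::size_t i = 0; i < lines.size(); ++i) {
    const std::string& l = lines[i];
    if (m_index < 0 && l.compare(0, 8, "m=video ") == 0) {
      m_index = static_cast<int>(i);
    } else if (l.compare(0, rtpmap.size(), rtpmap) == 0) {
      const std::size_t space = l.find(' ');
      if (space != std::string::npos &&
          l.compare(space + 1, codec.size() + 1, codec + "/") == 0) {
        preferred.push_back(l.substr(rtpmap.size(), space - rtpmap.size()));
      }
    }
  }
  if (m_index < 0 || preferred.empty()) return sdp;

  // Pass 2: tokens are <m=video> <port> <proto> <fmt>...; reorder the fmts.
  std::istringstream m_in(lines[m_index]);
  std::vector<std::string> tokens;
  for (std::string token; m_in >> token;) tokens.push_back(token);
  if (tokens.size() <= 3) return sdp;
  std::stable_partition(
      tokens.begin() + 3, tokens.end(), [&preferred](const std::string& pt) {
        return std::find(preferred.begin(), preferred.end(), pt) !=
               preferred.end();
      });
  std::string rebuilt = tokens[0];
  for (std::size_t i = 1; i < tokens.size(); ++i) rebuilt += " " + tokens[i];
  lines[m_index] = rebuilt;

  std::string out;
  for (const std::string& l : lines) out += l + "\r\n";
  return out;
}
```

Calling PreferVideoCodec(sdp, "H264") on the local offer before passing it to setLocalDescription would put 127 at the head of the fmt list in the sample SDP above. Using std::stable_partition keeps the relative order of all other payload types intact.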