Audio 3A, Part 1: Enabling 3A in the WebRTC Source and Its Call Flow

涛声依旧在 posted on 2024-11-5 13:17:02


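Preface

In the previous article, "Audio 3A: A First Look at Audio 3A", I outlined what 3A does, where it is used, and what problems it brings, and listed some commonly used open-source 3A libraries for each platform. In the articles that follow I plan to use WebRTC (it is simply too classic to skip) to introduce the concrete 3A algorithms, so readers need some familiarity with WebRTC.
WebRTC is very large and 3A is only one of its modules. Since the upcoming articles will touch on many implementation details inside WebRTC, this article first walks through how 3A is actually used in WebRTC before we get to the algorithms, so that interested readers can follow along in the source code.
Copyright notice: 山河君. Reproduction without the author's permission is prohibited.

I. Enabling 3A in WebRTC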

1. Creating the 3A object

All of WebRTC's public 3A interfaces live in webrtc\src\api\audio\audio_processing.h, and the creation method is there as well:
class RTC_EXPORT AudioProcessingBuilder {

  ......

  // Creates an APM instance with the specified config or the default one if
  // unspecified. Injects the specified components transferring the ownership
  // to the newly created APM instance - i.e., except for the config, the
  // builder is reset to its initial state.
  rtc::scoped_refptr<AudioProcessing> Create();
  .....
};
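First create the 3A instance with _audioProcessingPtr = webrtc::AudioProcessingBuilder().Create();
2. Configuring 3A

Once the instance exists, you can choose the 3A configuration. The configuration is Config, a struct nested inside AudioProcessing, and it contains a large number of options: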
class RTC_EXPORT AudioProcessing : public RefCountInterface {
 public:
  struct RTC_EXPORT Config {
    .....
  };
  .....
};
The individual settings are listed below:

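[*]HighPassFilter
struct HighPassFilter {
  bool enabled = false;
  bool apply_in_full_band = true;
} high_pass_filter;

[*]enabled: turns the high-pass filter on.
[*]apply_in_full_band: applies the filter across the full band.
[*]Note: the filter removes low-frequency content. The human ear is not very sensitive to very low frequencies, and a large share of echo energy is low-frequency.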
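
To make the effect of a high-pass stage concrete, here is a minimal sketch of a first-order high-pass filter. It is not WebRTC's filter implementation; the class OnePoleHighPass, the function HighPass, and the cutoff and sample-rate parameters are hypothetical names chosen only for this illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of WebRTC: a first-order (one-pole)
// high-pass filter, y[n] = alpha * (y[n-1] + x[n] - x[n-1]).
class OnePoleHighPass {
 public:
  // cutoff_hz and sample_rate_hz are illustrative parameters.
  OnePoleHighPass(float cutoff_hz, float sample_rate_hz) {
    assert(cutoff_hz > 0.0f && sample_rate_hz > 0.0f);
    const float kPi = 3.14159265358979f;
    const float rc = 1.0f / (2.0f * kPi * cutoff_hz);
    const float dt = 1.0f / sample_rate_hz;
    alpha_ = rc / (rc + dt);
  }

  // Filters one sample and updates the internal state.
  float Process(float x) {
    const float y = alpha_ * (prev_y_ + x - prev_x_);
    prev_x_ = x;
    prev_y_ = y;
    return y;
  }

 private:
  float alpha_ = 0.0f;
  float prev_x_ = 0.0f;
  float prev_y_ = 0.0f;
};

// Runs a fresh filter over a whole buffer and returns the filtered copy.
std::vector<float> HighPass(const std::vector<float>& input,
                            float cutoff_hz,
                            float sample_rate_hz) {
  OnePoleHighPass filter(cutoff_hz, sample_rate_hz);
  std::vector<float> output;
  output.reserve(input.size());
  for (float sample : input) {
    output.push_back(filter.Process(sample));
  }
  return output;
}
```

Feeding a constant (DC) signal through HighPass makes the output decay toward zero, while a rapidly alternating signal passes almost unchanged, which is the behaviour the enabled flag switches on in principle.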

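[*]EchoCanceller
struct EchoCanceller {
  bool enabled = false;
  bool mobile_mode = false;
  bool export_linear_aec_output = false;
  // Enforce the highpass filter to be on (has no effect for the mobile
  // mode).
  bool enforce_high_pass_filtering = true;
} echo_canceller;

[*]enabled: whether echo cancellation is enabled. Set to true to turn it on.
[*]mobile_mode: whether to use mobile mode. When true, the algorithm is adjusted to suit the audio characteristics of mobile devices.
[*]export_linear_aec_output: when true, the linear AEC output is exported. This is typically used for debugging and analysis.
[*]enforce_high_pass_filtering: forces the high-pass filter on. It has no effect in mobile mode; otherwise it helps remove low-frequency noise.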

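[*]NoiseSuppression
struct NoiseSuppression {
  bool enabled = false;
  enum Level { kLow, kModerate, kHigh, kVeryHigh };
  Level level = kModerate;
  bool analyze_linear_aec_output_when_available = false;
} noise_suppression;

[*]enabled: whether noise suppression is enabled. When true, noise suppression is applied to the audio stream.
[*]level: the strength of the noise suppression. Possible values:
1) kLow: light suppression, suited to mild background noise.
2) kModerate: moderate suppression, suited to ordinary background noise.
3) kHigh: strong suppression, suited to heavy background noise.
4) kVeryHigh: very strong suppression, suited to extremely noisy environments.
[*]analyze_linear_aec_output_when_available: whether to analyze the linear echo canceller (AEC) output when it is available. When true, the noise suppressor tries to analyze the AEC output so that it can remove background noise more effectively. This can improve suppression, especially in environments with echo.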

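[*]TransientSuppression
struct TransientSuppression {
  bool enabled = false;
} transient_suppression;

[*]enabled: whether transient suppression is enabled. When true, the audio processing module tries to detect and reduce transient noise. When false, no transient suppression is performed.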

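[*]GainController1/GainController2
AGC is configured through two structs, GainController1 and GainController2. GainController1 is the legacy AGC and GainController2 (AGC2) is the newer controller intended to replace it; both act on the captured (microphone) signal. To cover different scenarios AGC exposes many parameters, such as limits on the gain range and maximum limits. Only the commonly used GainController1 options are introduced here; readers who need more can pick the others as required.

[*]enabled: turns gain control on.
[*]Mode: an enum that defines how the gain controller operates:
1) kAdaptiveAnalog: for devices with an analog volume control; combines analog gain with digital compression.
2) kAdaptiveDigital: for devices without an analog volume control; gain adjustment and compression are done mainly in the digital domain.
3) kFixedDigital: enables only digital compression and processes the input with a fixed gain; suited to embedded devices whose signal level is predictable.
[*]target_level_dbfs: the target peak level in dBFS (decibels relative to digital full scale). By convention the value is positive. For example, 3 means a target level of -3 dBFS.
[*]compression_gain_db: the maximum gain the digital compression stage may apply, in dB. The larger the value, the stronger the compression. A value of 0 applies no compression. The valid range is documented in audio_processing.h.
[*]enable_limiter: whether the limiter is enabled. When enabled, the compression stage limits the signal to the target level.
When the configuration is ready, call _audioProcessingPtr->ApplyConfig(config); to apply it.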
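
The sign convention of target_level_dbfs is easy to get wrong, so here is a small worked sketch of what a given setting means as a linear amplitude. The two functions below are hypothetical helpers written for this article, not WebRTC functions; they only apply the standard decibel formula, amplitude = 10^(dB / 20), to the negated setting.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not a WebRTC function): converts the positive
// target_level_dbfs convention into a linear peak amplitude, where 1.0 is
// digital full scale. A setting of 3 is read as -3 dBFS.
double TargetLevelToLinear(int target_level_dbfs) {
  assert(target_level_dbfs >= 0);
  return std::pow(10.0, -static_cast<double>(target_level_dbfs) / 20.0);
}

// The same level expressed as a peak value for 16-bit PCM samples.
int TargetLevelToInt16Peak(int target_level_dbfs) {
  return static_cast<int>(
      std::lround(TargetLevelToLinear(target_level_dbfs) * 32767.0));
}
```

With a setting of 3, TargetLevelToLinear returns roughly 0.708, so the target peak sits at about 71% of full scale. A setting of 0 corresponds to full scale itself.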
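3. Registering with PeerConnection

Once the 3A instance has been created, it can be registered right at the start, when the PeerConnection factory is created. Because 3A is an independent module, it can also be created separately elsewhere, for example when creating the media engine or when starting the voice engine.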
RTC_EXPORT rtc::scoped_refptr<PeerConnectionFactoryInterface>
CreatePeerConnectionFactory(
    rtc::Thread* network_thread,
    rtc::Thread* worker_thread,
    rtc::Thread* signaling_thread,
    rtc::scoped_refptr<AudioDeviceModule> default_adm,
    rtc::scoped_refptr<AudioEncoderFactory> audio_encoder_factory,
    rtc::scoped_refptr<AudioDecoderFactory> audio_decoder_factory,
    std::unique_ptr<VideoEncoderFactory> video_encoder_factory,
    std::unique_ptr<VideoDecoderFactory> video_decoder_factory,
    rtc::scoped_refptr<AudioMixer> audio_mixer,
    rtc::scoped_refptr<AudioProcessing> audio_processing,
    std::unique_ptr<AudioFrameProcessor> audio_frame_processor = nullptr,
    std::unique_ptr<FieldTrialsView> field_trials = nullptr);
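
II. The 3A call flow in WebRTC

[*]After the 3A instance is passed in when the PeerConnection factory is created, the factory records all of its dependencies. If no instance is passed in, a default one is created.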
rtc::scoped_refptr<PeerConnectionFactoryInterface> CreatePeerConnectionFactory(
    rtc::Thread* network_thread,
    rtc::Thread* worker_thread,
    rtc::Thread* signaling_thread,
    rtc::scoped_refptr<AudioDeviceModule> default_adm,
    rtc::scoped_refptr<AudioEncoderFactory> audio_encoder_factory,
    rtc::scoped_refptr<AudioDecoderFactory> audio_decoder_factory,
    std::unique_ptr<VideoEncoderFactory> video_encoder_factory,
    std::unique_ptr<VideoDecoderFactory> video_decoder_factory,
    rtc::scoped_refptr<AudioMixer> audio_mixer,
    rtc::scoped_refptr<AudioProcessing> audio_processing,
    std::unique_ptr<AudioFrameProcessor> audio_frame_processor,
    std::unique_ptr<FieldTrialsView> field_trials) {
  PeerConnectionFactoryDependencies dependencies;
  .....

  if (audio_processing) {
    dependencies.audio_processing = std::move(audio_processing);
  } else {
    dependencies.audio_processing = AudioProcessingBuilder().Create();
  }
  .....

  EnableMedia(dependencies);
  return CreateModularPeerConnectionFactory(std::move(dependencies));
}

[*]When the PeerConnection factory is created, a connection context is created

// Static
rtc::scoped_refptr<PeerConnectionFactory> PeerConnectionFactory::Create(
    PeerConnectionFactoryDependencies dependencies) {
  .......
  auto context = ConnectionContext::Create(
      CreateEnvironment(std::move(dependencies.trials),
                        std::move(dependencies.task_queue_factory)),
  .....
}

[*]While the context is being created, the media engine (MediaEngine) is created and audio_processing is passed along with the dependencies

ConnectionContext::ConnectionContext(
    const Environment& env,
    PeerConnectionFactoryDependencies* dependencies)
    .....
      media_engine_(
          dependencies->media_factory != nullptr
              ? dependencies->media_factory->CreateMediaEngine(env_,
                                                               *dependencies)
              : nullptr),
    .....

[*]The media engine (MediaEngine) creates the audio engine audio_engine and the video engine video_engine separately, and passes audio_processing to the audio engine
std::unique_ptr<MediaEngineInterface> CreateMediaEngine(
    const Environment& env,
    PeerConnectionFactoryDependencies& deps) override {
  auto audio_engine = std::make_unique<WebRtcVoiceEngine>(....
  auto video_engine = std::make_unique<WebRtcVideoEngine>(....
}

[*]The audio engine holds the 3A instance
WebRtcVoiceEngine::WebRtcVoiceEngine(
    ....
    rtc::scoped_refptr<webrtc::AudioProcessing> audio_processing,
    .....)
    :
      .....
      apm_(audio_processing),
      ....
{...}

[*]When the audio engine audio_engine is initialized, it creates the AudioState

void WebRtcVoiceEngine::Init() {
  ...
  audio_state_ = webrtc::AudioState::Create(config);
  ...
}

[*]AudioState owns an AudioTransportImpl instance and hands it several resources, including the 3A instance
AudioState::AudioState(const AudioState::Config& config)
    : config_(config),
      audio_transport_(config_.audio_mixer.get(),
                       config_.audio_processing.get(),
                       config_.async_audio_processing_factory.get()) {
  ....
}

[*]From this point on, AudioTransportImpl feeds the near-end and far-end signals into 3A during audio capture and rendering

[*]Feeding the near-end signal
int32_t AudioTransportImpl::RecordedDataIsAvailable(
    const void* audio_data,
    size_t number_of_frames,
    size_t bytes_per_sample,
    size_t number_of_channels,
    uint32_t sample_rate,
    uint32_t audio_delay_milliseconds,
    int32_t /*clock_drift*/,
    uint32_t /*volume*/,
    bool key_pressed,
    uint32_t& /*new_mic_volume*/,
    absl::optional<int64_t> estimated_capture_time_ns) {
  ....
  ProcessCaptureFrame(audio_delay_milliseconds, key_pressed,
                      swap_stereo_channels, audio_processing_,
                      audio_frame.get());
  ....
}
void ProcessCaptureFrame(uint32_t delay_ms,
                         bool key_pressed,
                         bool swap_stereo_channels,
                         AudioProcessing* audio_processing,
                         AudioFrame* audio_frame) {
  ....
  if (audio_processing) {
    audio_processing->set_stream_delay_ms(delay_ms);
    audio_processing->set_stream_key_pressed(key_pressed);
    int error = ProcessAudioFrame(audio_processing, audio_frame);
  }
  ....
}
int ProcessAudioFrame(AudioProcessing* ap, AudioFrame* frame) {
  ...
  int result = ap->ProcessStream(frame->data(), input_config, output_config,
                                 frame->mutable_data());

  AudioProcessingStats stats = ap->GetStatistics();
  ...
}

[*]Feeding the far-end signal
int32_t AudioTransportImpl::NeedMorePlayData(const size_t nSamples,
                                             const size_t nBytesPerSample,
                                             const size_t nChannels,
                                             const uint32_t samplesPerSec,
                                             void* audioSamples,
                                             size_t& nSamplesOut,
                                             int64_t* elapsed_time_ms,
                                             int64_t* ntp_time_ms) {
  ....
  if (audio_processing_) {
    const auto error =
        ProcessReverseAudioFrame(audio_processing_, &mixed_frame_);
    RTC_DCHECK_EQ(error, AudioProcessing::kNoError);
  }
  ....
}
int ProcessReverseAudioFrame(AudioProcessing* ap, AudioFrame* frame) {
  ....
  int result = ap->ProcessReverseStream(frame->data(), input_config,
                                        output_config, frame->mutable_data());
  ....
}
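
The chain above can be condensed into a toy model that compiles on its own, without the WebRTC tree. Everything in it is a hypothetical stand-in: FakeApm plays the role of webrtc::AudioProcessing and only counts calls, FakeTransport plays the role of AudioTransportImpl, and ResolveApm mimics the "use the injected instance or create a default" branch in CreatePeerConnectionFactory. It is a sketch of the data flow only and performs no real signal processing.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for webrtc::AudioProcessing; it only records calls.
class FakeApm {
 public:
  void set_stream_delay_ms(int delay_ms) { last_delay_ms_ = delay_ms; }
  // Near-end (capture) path.
  int ProcessStream(std::vector<int16_t>& /*frame*/) {
    ++capture_frames_;
    return 0;
  }
  // Far-end (render) path.
  int ProcessReverseStream(std::vector<int16_t>& /*frame*/) {
    ++render_frames_;
    return 0;
  }
  int capture_frames() const { return capture_frames_; }
  int render_frames() const { return render_frames_; }
  int last_delay_ms() const { return last_delay_ms_; }

 private:
  int capture_frames_ = 0;
  int render_frames_ = 0;
  int last_delay_ms_ = 0;
};

// Hypothetical stand-in for AudioTransportImpl. It borrows the APM pointer
// and does not own it, just as the real class receives a raw pointer.
class FakeTransport {
 public:
  explicit FakeTransport(FakeApm* apm) : apm_(apm) {}

  // Mirrors RecordedDataIsAvailable: microphone data goes to ProcessStream
  // after the stream delay has been reported.
  void OnRecordedData(std::vector<int16_t>& frame, int delay_ms) {
    if (apm_ == nullptr) return;
    apm_->set_stream_delay_ms(delay_ms);
    apm_->ProcessStream(frame);
  }

  // Mirrors NeedMorePlayData: mixed playout data goes to
  // ProcessReverseStream so the echo canceller can see the far-end signal.
  void OnPlayoutData(std::vector<int16_t>& mixed_frame) {
    if (apm_ == nullptr) return;
    apm_->ProcessReverseStream(mixed_frame);
  }

 private:
  FakeApm* apm_;
};

// Mirrors the fallback in CreatePeerConnectionFactory: use the injected
// instance if there is one, otherwise create a default.
std::shared_ptr<FakeApm> ResolveApm(std::shared_ptr<FakeApm> injected) {
  return injected ? std::move(injected) : std::make_shared<FakeApm>();
}
```

A typical use would resolve the APM once, hand its raw pointer to the transport, and then call OnPlayoutData and OnRecordedData once per 10 ms frame (160 samples for 16 kHz mono), which is the frame length WebRTC's audio processing module works with.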
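
Summary

This post introduces how 3A is configured and how its call flow works in WebRTC, as groundwork before going deeper into the 3A algorithms. The real goal of this series is to explain how those algorithms are implemented. The next post gets into the actual algorithm details and will contain a lot of mathematical formulas, so readers who want to go further may want to read my earlier articles on advanced audio topics first.
If this was helpful, please give it a like!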
