FFmpeg audio player - create FFmpeg player

Original address: https://www.jianshu.com/p/73b0a0a9bb0d

Related articles

1. FFmpeg audio decoding and playing---- https://www.jianshu.com/p/76562aba84fb

2. Playing music files through FFMpeg---- https://blog.csdn.net/chenhy24/article/details/84201421

3. ffmpeg command operation audio format conversion---- https://www.bbsmax.com/A/mo5kbyeEJw/

4. Decoding audio files with FFMpeg---- https://blog.csdn.net/douzhq/article/details/82937422

5. Audio processing of FFmpeg---- https://blog.csdn.net/fireroll/article/details/83032025

6. Getting started with FFmpeg (3): playing audio---- https://blog.csdn.net/naibei/article/details/81086483

7. ffmpeg audio and video file playing module---- https://blog.csdn.net/wer85121430/article/details/79689002

8. Implementation of FFmpeg simple player audio playing---- https://blog.csdn.net/leisure_chn/article/details/87641899

FFmpeg audio player (1) - Introduction
FFmpeg audio player (2) - compile dynamic library
FFmpeg audio player (3) - add FFmpeg to Android
FFmpeg audio player (4) - decoding mp3 to pcm
FFmpeg audio player (5) - single input filter(volume,atempo)
FFmpeg audio player (6) - multi input filter(amix)
FFmpeg audio player (7) - play audio with OpenSLES
FFmpeg audio player (8) - create FFmpeg player
FFmpeg audio player (9) - playback control
With the preparatory knowledge from the earlier parts, we can now start building the FFmpeg audio player. The main requirements: mixed playback of multiple audio files, volume control for each track, and variable-speed playback of the mixed audio. Pause, progress bar, and stop are deferred to the next section.

AudioPlayer class

First, we create a C++ class named AudioPlayer. To implement audio decoding, filtering, and playback, it needs member variables for decoding, filtering, the frame queue, PCM output, multithreading, and OpenSL ES. The code is as follows:


int fileCount;                  //Number of input audio files
AVFormatContext **fmt_ctx_arr;  //FFmpeg context array
AVCodecContext **codec_ctx_arr; //Decoder context array
int *stream_index_arr;          //Audio stream index array
AVFilterGraph *graph;           //Filter graph
AVFilterContext **srcs;         //Input filter
AVFilterContext *sink;          //Output filter
char **volumes;                 //Volume of each audio
char *tempo;                    //Playback speed 0.5 ~ 2.0

//AVFrame queue
std::vector<AVFrame *> queue;   //Queue, used to save AVFrame after decoding and filtering

//Input/output format
SwrContext *swr_ctx;            //Resampler, converts AVFrame data to pcm
uint64_t in_ch_layout;          //Input channel layout
int in_sample_rate;             //Input sample rate
int in_ch_layout_nb;            //Input channel count, used with swr_ctx
enum AVSampleFormat in_sample_fmt; //Input audio sample format

uint64_t out_ch_layout;         //Output channel layout
int out_sample_rate;            //Output sample rate
int out_ch_layout_nb;           //Output channel count, used with swr_ctx
int max_audio_frame_size;       //Maximum buffer data size
enum AVSampleFormat out_sample_fmt; //Output audio sample format

//Progress related
AVRational time_base;           //Time base, used to calculate progress
double total_time;              //Total duration (seconds)
double current_time;            //Current progress
int isPlay = 0;                 //Playing status 1: playing

pthread_t decodeId;             //Decode thread id
pthread_t playId;               //Play thread id
pthread_mutex_t mutex;          //Synchronous lock
pthread_cond_t not_full;        //Not full condition, used when producing AVFrame
pthread_cond_t not_empty;       //Not-empty condition, used when consuming AVFrame

//Open SL ES
SLObjectItf engineObject;       //Engine object
SLEngineItf engineItf;          //Engine interface
SLObjectItf mixObject;          //Output mix object
SLObjectItf playerObject;       //Player object
SLPlayItf playItf;              //Player interface
SLAndroidSimpleBufferQueueItf bufferQueueItf;   //Buffer interface

Decoding and playing process


The whole audio processing and playback flow is shown in the figure above. We need two threads: a decoding thread and a playback thread. The decoding thread decodes and filters the multiple audio files and enqueues the resulting frames; the playback thread takes processed AVFrames out of the queue, converts them to PCM, and plays the audio through the buffer callback.
To initialize these member variables, we define an initialization method for each group of members:


int createPlayer();                     //Create player
int initCodecs(char **pathArr);         //Initialize decoder
int initSwrContext();                   //Initialize SwrContext
int initFilters();                      //Initialize filter

In the constructor, we pass in the array of audio file paths and the file count, then call the initialization methods:


AudioPlayer::AudioPlayer(char **pathArr, int len) {
    fileCount = len;
    //Default volume 1.0, speed 1.0
    volumes = (char **) malloc(fileCount * sizeof(char *));
    for (int i = 0; i < fileCount; i++) {
        volumes[i] = "1.0";
    }
    tempo = "1.0";

    pthread_mutex_init(&mutex, NULL);
    pthread_cond_init(&not_full, NULL);
    pthread_cond_init(&not_empty, NULL);

    initCodecs(pathArr);
    initSwrContext();
    initFilters();
    createPlayer();
}


Here we also initialize the per-track volume and playback-speed variables, the synchronization lock, and the condition variables (for producer/consumer control).

Initialize decoder array


int AudioPlayer::initCodecs(char **pathArr) {
    LOGI("init codecs");
    fmt_ctx_arr = (AVFormatContext **) malloc(fileCount * sizeof(AVFormatContext *));
    codec_ctx_arr = (AVCodecContext **) malloc(fileCount * sizeof(AVCodecContext *));
    stream_index_arr = (int *) malloc(fileCount * sizeof(int));
    for (int n = 0; n < fileCount; n++) {
        AVFormatContext *fmt_ctx = avformat_alloc_context();
        fmt_ctx_arr[n] = fmt_ctx;
        const char *path = pathArr[n];

        if (avformat_open_input(&fmt_ctx, path, NULL, NULL) < 0) {//Open file
            LOGE("could not open file:%s", path);
            return -1;
        }
        if (avformat_find_stream_info(fmt_ctx, NULL) < 0) {//Read stream information
            LOGE("find stream info error");
            return -1;
        }
        //Find the audio stream index
        int audio_stream_index = -1;
        for (int i = 0; i < fmt_ctx->nb_streams; i++) {
            if (fmt_ctx->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
                audio_stream_index = i;
                LOGI("find audio stream index:%d", audio_stream_index);
            }
        }
        if (audio_stream_index < 0) {
            LOGE("error find stream index");
            return -1;
        }
        stream_index_arr[n] = audio_stream_index;
        //Get decoder
        AVCodecContext *codec_ctx = avcodec_alloc_context3(NULL);
        codec_ctx_arr[n] = codec_ctx;
        AVStream *stream = fmt_ctx->streams[audio_stream_index];
        avcodec_parameters_to_context(codec_ctx, stream->codecpar);
        AVCodec *codec = avcodec_find_decoder(codec_ctx->codec_id);
        if (n == 0) {//Save the input format from the first file
            in_sample_fmt = codec_ctx->sample_fmt;
            in_ch_layout = codec_ctx->channel_layout;
            in_sample_rate = codec_ctx->sample_rate;
            in_ch_layout_nb = av_get_channel_layout_nb_channels(in_ch_layout);
            max_audio_frame_size = in_sample_rate * in_ch_layout_nb;
            time_base = stream->time_base;
            int64_t duration = stream->duration;
            total_time = av_q2d(stream->time_base) * duration;
            LOGI("total time:%lf", total_time);
        } else {//For additional files, check that the formats match (sample rate, sample format, channel layout)
            if (in_ch_layout != codec_ctx->channel_layout
                || in_sample_fmt != codec_ctx->sample_fmt
                || in_sample_rate != codec_ctx->sample_rate) {
                LOGE("input files must have the same format");
                return -1;
            }
        }
        //Open decoder
        if (avcodec_open2(codec_ctx, codec, NULL) < 0) {
            LOGE("could not open codec");
            return -1;
        }
    }
    return 1;
}

Here, the format information of the input audio is saved for SwrContext initialization and Filter initialization.

Initialize Filters


int AudioPlayer::initFilters() {
    LOGI("init filters");
    graph = avfilter_graph_alloc();
    srcs = (AVFilterContext **) malloc(fileCount * sizeof(AVFilterContext *));
    char args[128];
    AVDictionary *dic = NULL;
    //Mix filter
    AVFilter *amix = avfilter_get_by_name("amix");
    AVFilterContext *amix_ctx = avfilter_graph_alloc_filter(graph, amix, "amix");
    snprintf(args, sizeof(args), "inputs=%d:duration=first:dropout_transition=3", fileCount);
    if (avfilter_init_str(amix_ctx, args) < 0) {
        LOGE("error init amix filter");
        return -1;
    }

    const char *sample_fmt = av_get_sample_fmt_name(in_sample_fmt);
    snprintf(args, sizeof(args), "sample_rate=%d:sample_fmt=%s:channel_layout=0x%" PRIx64,
             in_sample_rate, sample_fmt, in_ch_layout);

    for (int i = 0; i < fileCount; i++) {
        //Input filter abuffer
        AVFilter *abuffer = avfilter_get_by_name("abuffer");
        char name[50];
        snprintf(name, sizeof(name), "src%d", i);
        srcs[i] = avfilter_graph_alloc_filter(graph, abuffer, name);
        if (avfilter_init_str(srcs[i], args) < 0) {
            LOGE("error init abuffer filter");
            return -1;
        }
        //Volume filter
        AVFilter *volume = avfilter_get_by_name("volume");
        AVFilterContext *volume_ctx = avfilter_graph_alloc_filter(graph, volume, "volume");
        av_dict_set(&dic, "volume", volumes[i], 0);
        if (avfilter_init_dict(volume_ctx, &dic) < 0) {
            LOGE("error init volume filter");
            return -1;
        }
        //Link input to volume filter
        if (avfilter_link(srcs[i], 0, volume_ctx, 0) < 0) {
            LOGE("error link to volume filter");
            return -1;
        }
        //Link volume filter to amix
        if (avfilter_link(volume_ctx, 0, amix_ctx, i) < 0) {
            LOGE("error link to amix filter");
            return -1;
        }
    }

    //Variable speed filter atempo
    AVFilter *atempo = avfilter_get_by_name("atempo");
    AVFilterContext *atempo_ctx = avfilter_graph_alloc_filter(graph, atempo, "atempo");
    av_dict_set(&dic, "tempo", tempo, 0);
    if (avfilter_init_dict(atempo_ctx, &dic) < 0) {
        LOGE("error init atempo filter");
        return -1;
    }
    //Output format
    AVFilter *aformat = avfilter_get_by_name("aformat");
    AVFilterContext *aformat_ctx = avfilter_graph_alloc_filter(graph, aformat, "aformat");
    snprintf(args, sizeof(args), "sample_rates=%d:sample_fmts=%s:channel_layouts=0x%" PRIx64,
             in_sample_rate, sample_fmt, in_ch_layout);
    if (avfilter_init_str(aformat_ctx, args) < 0) {
        LOGE("error init aformat filter");
        return -1;
    }
    //Output sink
    AVFilter *abuffersink = avfilter_get_by_name("abuffersink");
    sink = avfilter_graph_alloc_filter(graph, abuffersink, "sink");
    if (avfilter_init_str(sink, NULL) < 0) {
        LOGE("error init abuffersink filter");
        return -1;
    }
    //Link amix -> atempo -> aformat -> sink
    if (avfilter_link(amix_ctx, 0, atempo_ctx, 0) < 0) {
        LOGE("error link to atempo filter");
        return -1;
    }
    if (avfilter_link(atempo_ctx, 0, aformat_ctx, 0) < 0) {
        LOGE("error link to aformat filter");
        return -1;
    }
    if (avfilter_link(aformat_ctx, 0, sink, 0) < 0) {
        LOGE("error link to abuffersink filter");
        return -1;
    }
    if (avfilter_graph_config(graph, NULL) < 0) {
        LOGE("error config graph");
        return -1;
    }

    return 1;
}

Using the input format information obtained from the decoders, we can initialize the abuffer input filters (the sample rate, sample format, and channel layout must match), then link the volume, amix, and atempo filters. This gives the audio the effects of volume adjustment, mixing, and speed change.

Initialize SwrContext


int AudioPlayer::initSwrContext() {
    LOGI("init swr context");
    swr_ctx = swr_alloc();
    out_sample_fmt = AV_SAMPLE_FMT_S16;
    out_ch_layout = AV_CH_LAYOUT_STEREO;
    out_ch_layout_nb = 2;
    out_sample_rate = in_sample_rate;
    max_audio_frame_size = out_sample_rate * 2;

    swr_alloc_set_opts(swr_ctx, out_ch_layout, out_sample_fmt, out_sample_rate, in_ch_layout,
                       in_sample_fmt, in_sample_rate, 0, NULL);
    if (swr_init(swr_ctx) < 0) {
        LOGE("error init SwrContext");
        return -1;
    }
    return 1;
}

To make the decoded AVFrames playable under OpenSL ES, we fix the output format to 16-bit AV_SAMPLE_FMT_S16, the channel layout to stereo AV_CH_LAYOUT_STEREO with 2 channels, and keep the sample rate the same as the input. The maximum amount of PCM data delivered per buffer callback is sample_rate * 2 bytes.

Initialize OpenSL ES player


int AudioPlayer::createPlayer() {
    //Create and initialize the engine object
    slCreateEngine(&engineObject, 0, NULL, 0, NULL, NULL);
    (*engineObject)->Realize(engineObject, SL_BOOLEAN_FALSE);
    //Get the engine interface
    (*engineObject)->GetInterface(engineObject, SL_IID_ENGINE, &engineItf);
    //Create the output mix through the engine interface
    (*engineItf)->CreateOutputMix(engineItf, &mixObject, 0, 0, 0);
    (*mixObject)->Realize(mixObject, SL_BOOLEAN_FALSE);

    //Player parameters: OpenSL ES expects the sample rate in milliHz
    SLuint32 samplesPerSec = (SLuint32) out_sample_rate * 1000;
    //Buffer queue data source and pcm format, matching the SwrContext output
    SLDataLocator_AndroidSimpleBufferQueue android_queue = {SL_DATALOCATOR_ANDROIDSIMPLEBUFFERQUEUE, 2};
    SLDataFormat_PCM pcm = {SL_DATAFORMAT_PCM,
                            2,//Two channels
                            samplesPerSec,//Sample rate in milliHz
                            SL_PCMSAMPLEFORMAT_FIXED_16,//16-bit samples
                            SL_PCMSAMPLEFORMAT_FIXED_16,//Container size
                            SL_SPEAKER_FRONT_LEFT | SL_SPEAKER_FRONT_RIGHT,//Channel mask
                            SL_BYTEORDER_LITTLEENDIAN};//Byte order

    SLDataSource slDataSource = {&android_queue, &pcm};

    //Output pipeline
    SLDataLocator_OutputMix outputMix = {SL_DATALOCATOR_OUTPUTMIX, mixObject};
    SLDataSink audioSnk = {&outputMix, NULL};

    //Create and initialize the player object through the engine interface
    const SLInterfaceID ids[1] = {SL_IID_BUFFERQUEUE};
    const SLboolean req[1] = {SL_BOOLEAN_TRUE};
    (*engineItf)->CreateAudioPlayer(engineItf, &playerObject, &slDataSource, &audioSnk, 1, ids, req);
    (*playerObject)->Realize(playerObject, SL_BOOLEAN_FALSE);

    //Get the play interface
    (*playerObject)->GetInterface(playerObject, SL_IID_PLAY, &playItf);
    //Get the buffer queue interface
    (*playerObject)->GetInterface(playerObject, SL_IID_BUFFERQUEUE, &bufferQueueItf);

    //Register the buffer callback
    (*bufferQueueItf)->RegisterCallback(bufferQueueItf, _playCallback, this);
    return 1;
}

The pcm format here must be consistent with the parameters set for SwrContext.

Start playback thread and decoding thread


void *_decodeAudio(void *args) {
    AudioPlayer *p = (AudioPlayer *) args;
    p->decodeAudio();
    return NULL;
}

void *_play(void *args) {
    AudioPlayer *p = (AudioPlayer *) args;
    p->setPlaying();
    return NULL;
}

void AudioPlayer::setPlaying() {
    //Set the playing state
    (*playItf)->SetPlayState(playItf, SL_PLAYSTATE_PLAYING);
    //Trigger the first buffer callback
    _playCallback(bufferQueueItf, this);
}

void AudioPlayer::play() {
    isPlay = 1;
    pthread_create(&decodeId, NULL, _decodeAudio, this);
    pthread_create(&playId, NULL, _play, this);
}

In the play method we start the playback and decoding threads with pthread_create. The playback thread sets the playing state through the play interface and then triggers the buffer callback; in the callback we take an AVFrame out of the queue, convert it to pcm, and play it via Enqueue. The decoding thread decodes and filters AVFrames and adds them to the queue.

Buffer callback


void _playCallback(SLAndroidSimpleBufferQueueItf bq, void *context) {
    AudioPlayer *player = (AudioPlayer *) context;
    AVFrame *frame = player->get();
    if (frame) {
        int size = av_samples_get_buffer_size(NULL, player->out_ch_layout_nb, frame->nb_samples,
                                              player->out_sample_fmt, 1);
        if (size > 0) {
            uint8_t *outBuffer = (uint8_t *) av_malloc(player->max_audio_frame_size);
            //swr_convert takes the output capacity in samples per channel; since the
            //input and output sample rates are equal, nb_samples is exactly enough
            swr_convert(player->swr_ctx, &outBuffer, frame->nb_samples,
                        (const uint8_t **) frame->data, frame->nb_samples);
            (*bq)->Enqueue(bq, outBuffer, size);
        }
        av_frame_free(&frame);
    }
}

Decoding filtering


void AudioPlayer::decodeAudio() {
    LOGI("start decode...");
    AVFrame *frame = av_frame_alloc();
    AVPacket *packet = av_packet_alloc();
    int ret, got_frame;
    int index = 0;
    while (isPlay) {
        LOGI("decode frame:%d", index);
        for (int i = 0; i < fileCount; i++) {
            AVFormatContext *fmt_ctx = fmt_ctx_arr[i];
            ret = av_read_frame(fmt_ctx, packet);
            if (ret < 0) {
                LOGE("decode finish");
                goto end;
            }
            if (packet->stream_index != stream_index_arr[i]) continue;//Skip non-audio packets
            ret = avcodec_decode_audio4(codec_ctx_arr[i], frame, &got_frame, packet);
            if (ret < 0) {
                LOGE("error decode packet");
                goto end;
            }
            if (got_frame <= 0) {
                LOGE("decode error or finish");
                goto end;
            }
            ret = av_buffersrc_add_frame(srcs[i], frame);
            if (ret < 0) {
                LOGE("error add frame to filter");
                goto end;
            }
        }
        LOGI("time:%lld,%lld,%lld", frame->pkt_dts, frame->pts, packet->pts);
        while (av_buffersink_get_frame(sink, frame) >= 0) {
            frame->pts = packet->pts;
            LOGI("put frame:%d,%lld", index, frame->pts);
            put(frame);
        }
        index++;
    }
end:
    av_frame_free(&frame);
    av_packet_free(&packet);
}

Note that packets read by av_read_frame are not necessarily from the audio stream, so they must be filtered by the audio stream index. For the AVFrame obtained from av_buffersink_get_frame, we overwrite its pts with the packet's pts so that progress can be tracked (the pts produced by the filter graph does not reflect the current decoding position).

AVFrame storage and retrieval


/**
 * Add an AVFrame to the queue; when the queue holds 5 frames, block and wait
 * @param frame
 * @return
 */
int AudioPlayer::put(AVFrame *frame) {
    AVFrame *out = av_frame_alloc();
    if (av_frame_ref(out, frame) < 0) return -1;//Copy the AVFrame
    pthread_mutex_lock(&mutex);
    while (queue.size() == 5) {
        LOGI("queue is full,wait for put frame:%d", queue.size());
        pthread_cond_wait(&not_full, &mutex);
    }
    queue.push_back(out);
    pthread_cond_signal(&not_empty);//Wake up the consumer
    pthread_mutex_unlock(&mutex);
    return 1;
}

/**
 * Take an AVFrame out of the queue; when the queue is empty, block and wait
 * @return
 */
AVFrame *AudioPlayer::get() {
    AVFrame *out = av_frame_alloc();
    pthread_mutex_lock(&mutex);
    while (isPlay) {
        if (queue.empty()) {
            pthread_cond_wait(&not_empty, &mutex);
        } else {
            AVFrame *src = queue.front();
            if (av_frame_ref(out, src) < 0) {
                pthread_mutex_unlock(&mutex);
                return NULL;
            }
            queue.erase(queue.begin());//Remove the consumed element
            av_frame_free(&src);
            if (queue.size() < 5) pthread_cond_signal(&not_full);//Wake up the producer
            current_time = av_q2d(time_base) * out->pts;
            LOGI("get frame:%d,time:%lf", queue.size(), current_time);
            pthread_mutex_unlock(&mutex);
            return out;
        }
    }
    pthread_mutex_unlock(&mutex);
    return NULL;
}

Using the two condition variables, we implement a producer/consumer model with a buffer capacity of 5 for storing and retrieving AVFrames from the queue.
With the code above we can play multiple audio files mixed at volume 1.0 and speed 1.0; detailed playback control will be covered in the next section.

Project address

Before playing, copy the audio files from assets into the corresponding SD card directory.


Author: Star y
Link: https://www.jianshu.com/p/73b0a0a9bb0d
Source: Jianshu
The copyright belongs to the author. For commercial reprint, please contact the author for authorization. For non-commercial reprint, please indicate the source.



Posted on Wed, 12 Feb 2020 00:46:45 -0800 by tmed