There are quite a few scattered comments online about doing parts of this with FFmpeg, but nothing cohesive, and it seemed a bit hit and miss getting some of the bits to work at first. I've collected together the steps I worked through, along with some of the useful references that helped along the way. It's not complete by any means, but it should be a good starting point to build upon.
ffmpeg -i XX.mp4 -vf
"
color=c=black:s=720x576:d=10 [pre] ;
color=c=black:s=720x576:d=30 [post] ;
[pre] [in] [post]
concat=n=3" -an -vcodec mpeg2video -pix_fmt yuv422p -s 720x576 -aspect
16:9 -r 25 -minrate 30000k -maxrate 30000k -b 30000k output.mpg
This gives the starting point, using the -vf option to set up a concat video filter. The -vf option only allows a single input file into the filter graph, but allows the definition of some processing options inside the function. The post goes on to look at some of the challenges the author was facing, mostly related to differences between the input file resolution and aspect ratio. I played around with this in a much more simplified manner using an MXF sample input file as follows:
ffmpeg -i XX.mxf -vf
"
color=c=black:s=1920x1080:d=10 [pre] ;
color=c=black:s=1920x1080:d=30 [post] ;
[pre] [in] [post]
concat=n=3" -y output.mxf
In my case I'm using the same output codec as the input, hence the simpler command line without additional parameters. The -y option means overwrite any existing output file, which I've used just for ease. The key to understanding this is the concat filter, which took me quite a bit of work.
As a note, I've laid this out on multiple lines for readability, but it needs to be on a single command line to work.
Concat Video Filter
The concat video filter is the key to this operation, stitching together the three components. There are a couple of concepts to explore, so I'll take them bit by bit. Some of the referenced links use these concepts early in their examples, so it might be worthwhile skipping the references at first, reading through to the end, and then dipping into them as needed to explore more details once you have the full context.
ffmpeg -i opening.mkv -i
episode.mkv -i ending.mkv \
-filter_complex '[0:0] [0:1]
[1:0] [1:1] [2:0] [2:1] concat=n=3:v=1:a=1 [v] [a]' \
-map '[v]' -map '[a]' output.mkv
Notice in this example the \ is used to spread the command line over multiple lines, as might be used in a Linux script (this doesn't work on Windows). In this case there are multiple input files, and the -filter_complex option is used instead. As most of the examples use -filter_complex rather than -vf, I'll use that from now on. I had a number of problems getting this to work initially, which I'll describe as I go through.
In this case the concat filter has a few more options: concat=n=3:v=1:a=1
- concat means use the media concatenate (joining) filter.
- n gives the total count of input segments.
- v gives the number of video streams in each segment (0 = no video).
- a gives the number of audio streams in each segment (0 = no audio).
Some clues to understanding how this works are given by this nice little diagram indicating how the inputs, streams and outputs are mapped together:

                      Video    Audio
                      Stream   Stream
input_file_1 ---->    [0:1]    [0:0]
                        |        |
input_file_2 ---->    [1:1]    [1:0]
                        |        |
                    "concat" filter
                        |        |
                       [v]      [a]
                        |        |
                      "map"    "map"
                        |        |
Output_file  <------------------
Along with the following description:
ffmpeg -i input_1 -i input_2
-filter_complex "[0:1] [0:0] [1:1] [1:0]
concat=n=2:v=1:a=1 [v] [a]"
-map [v] -map [a] output_file
The above command uses:
- Two input files are specified: "-i input_1" and "-i input_2".
- The "concat" filter is used in the "-filter_complex" option to concatenate 2 segments of input streams.
- "[0:1] [0:0] [1:1] [1:0]" provides a list of input streams to the "concat" filter. "[0:1]" refers to the first (index 0:) input file and the second (index :1) stream, and so on.
- "concat=n=2:v=1:a=1" specifies the "concat" filter and its arguments: "n=2" specifies 2 segments of input streams; "v=1" specifies 1 video stream in each segment; "a=1" specifies 1 audio stream in each segment.
- "[v] [a]" defines link labels for the 2 streams coming out of the "concat" filter.
- "-map [v]" forces the stream labelled [v] to go to the output file.
- "-map [a]" forces the stream labelled [a] to go to the output file.
- "output_file" specifies the output file.
Filter_Complex input mapping
Before we get onto the output mapping, let's look at what this input syntax means. I cannot remember quite where I found the information, but basically the n, v and a arguments of the concat filter give the 'dimensions' of its input and output: there will be v+a outputs and n*(v+a) inputs.
The inputs are referenced as follows: [0:1] means input 0, track 1; this can also be written in the typed form [0:v:0], meaning input 0, video track 0.
There need to be n*(v+a) of these, arranged (v+a) at a time in segment order in front of the concat command. For example:
Concat two input video sequences:
"[0:0] [1:0] concat=n=2:v=1:a=0"
Concat two input audio sequences (assuming audio is on the second track):
"[0:1] [1:1] concat=n=2:v=0:a=1"
Concat two input AV sequences (assuming audio is on the second and third tracks):
"[0:0] [0:1] [0:2] [1:0] [1:1] [1:2] concat=n=2:v=1:a=2"
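That n*(v+a) arithmetic is worth checking before running anything. A minimal sketch for the last example above (n=2 segments, each with v=1 video and a=2 audio streams):

```shell
#!/bin/sh
# Dimension check for the concat filter: n segments, each carrying
# v video and a audio streams, consume n*(v+a) input pads and
# produce v+a output pads.
n=2; v=1; a=2
inputs=$(( n * (v + a) ))    # pads listed before "concat" -> 6
outputs=$(( v + a ))         # link labels after "concat"  -> 3
echo "inputs=$inputs outputs=$outputs"
```

Running this prints inputs=6 outputs=3, matching the six [x:y] pads listed in front of the filter.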
Getting this wrong kept producing this cryptic message:
"[AVFilterGraph @ 036e2fc0] No such filter: ''
Error initializing complex filters.
Invalid argument"
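Since the error gives no clue which pad is missing, a rough shell check that counts the [label] pads listed before concat and compares against n*(v+a) would have saved me some head-scratching. A sketch, using one of the fragments from above:

```shell
#!/bin/sh
# Count the "[" pad openers that appear before the concat keyword and
# compare with the expected n*(v+a); a mismatch is what triggers the
# cryptic "No such filter" error.
graph="[0:0] [0:1] [1:0] [1:1] concat=n=2:v=1:a=1"
pads=$(printf '%s' "${graph%%concat*}" | grep -o '\[' | wc -l)
expected=$(( 2 * (1 + 1) ))   # n * (v + a)
echo "pads=$pads expected=$expected"
```

Here both come out as 4, so the graph is well-formed; any difference means a pad has been missed or doubled up.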
Output mapping
Taking the rest of the filter, it is possible to put
mappings after the concat command to identify the outputs:
concat=n=2:v=1:a=1 [v] [a]"
-map [v] -map [a]
These can then be mapped using the map command to the
various output tracks which are created in order, so if you had four audio
tracks it would look something like this:
concat=n=2:v=1:a=4 [v] [a1] [a2] [a3] [a4]"
-map [v] -map [a1] -map [a2] -map [a3] -map [a4]
This can be omitted and seems to work fine with a default
mapping.
Notice in this case that the output tracks have all been named something convenient to understand; likewise these could be written as follows:
concat=n=2:v=1:a=4 [vid] [engl] [engr] [frl] [frr]"
-map [vid] -map [engl] -map [engr] -map [frl] -map [frr]
This is also possible on the input as follows:
"[0:0] [vid1]; [1:0] [vid2]; [vid1] [vid2] concat=n=2:v=1:a=0"
Which is pretty neat and allows a clearer description.
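A complete, hedged sketch of that labelled-input form (the file names are placeholders of mine, not from the original). Some ffmpeg builds reject a bare "[0:0] [vid1]" relabel with the same "No such filter" error seen earlier, so this version hangs each relabelling chain on the pass-through null filter; assembling the graph in a variable also keeps the quoting manageable:

```shell
#!/bin/sh
# Hypothetical filenames; "null" is ffmpeg's video pass-through filter,
# used here only to give each relabelling chain a filter to sit on.
fg='[0:0] null [vid1]; [1:0] null [vid2]; [vid1] [vid2] concat=n=2:v=1:a=0 [joined]'
printf '%s\n' "$fg"
# The full command would then be:
#   ffmpeg -i first.mxf -i second.mxf -filter_complex "$fg" -map '[joined]' -y joined.mxf
```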
Generating Black
Now we've got the groundwork in place, we can create some black video. I found two ways of doing this; first, inside the concat filter:
ffmpeg -i XX.mxf -filter_complex "[0:0] [video]; color=c=black:s=1920x1080:d=30 [black]; [black] [video] concat=n=2:v=1:a=0"
This has a single input file, from which we take the video track, then create a 30-second black video input and feed both into the concat filter to produce a single video stream.
Alternatively, a number of samples show the input stream being created like this:
ffmpeg -i XX.mxf -f lavfi -i "color=c=black:s=1920x1080:d=10" -filter_complex "[0:0] [video]; [1:0] [black]; [black] [video] concat=n=2:v=1:a=0"
This all works fine, so let's now add some audio tracks in. We'll need to generate a matching audio track for the black video: I found when I was first playing with this that the output video files were getting truncated, because the duration of the output only matched the duration of the input video. The way I'll do this is to generate a tone using a sine wave generator with the frequency set to zero, which just saves me explaining this again later.
ffmpeg -i XX.mxf -filter_complex "color=c=black:s=1920x1080:d=10 [black]; sine=frequency=0:sample_rate=48000:d=10 [silence]; [black] [silence] [0:0] [0:1] concat=n=2:v=1:a=1"
And similarly, if we wanted to top and tail with black, it works like this:
ffmpeg -i XX.mxf -filter_complex "color=c=black:s=1920x1080:d=10 [black]; sine=frequency=0:sample_rate=48000:d=10 [silence]; [black] [silence] [0:0] [0:1] [black] [silence] concat=n=3:v=1:a=1"
Or rather, it doesn't! It seems that each labelled stream can only be used once in the mapping... so it's easy enough to modify to this:
ffmpeg -i hottubmxf.mxf -filter_complex "
color=c=black:s=1920x1080:d=10 [preblack];
sine=frequency=0:sample_rate=48000:d=10 [presilence];
color=c=black:s=1920x1080:d=10 [postblack];
sine=frequency=0:sample_rate=48000:d=10 [postsilence];
[preblack] [presilence] [0:0] [0:1] [postblack] [postsilence] concat=n=3:v=1:a=1"
-y output.mxf
Which is a bit more of a faff, but works.
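A hedged alternative I haven't verified against these exact files: ffmpeg's split and asplit filters duplicate a stream, so the black and silence generators only need defining once and each copy feeds concat separately:

```shell
#!/bin/sh
# Build the filtergraph in a variable: split/asplit clone the generated
# black video and silent audio so each copy is consumed exactly once.
fg='color=c=black:s=1920x1080:d=10 [blk]; [blk] split [preblack] [postblack];'
fg="$fg sine=frequency=0:sample_rate=48000:d=10 [sil]; [sil] asplit [presilence] [postsilence];"
fg="$fg [preblack] [presilence] [0:0] [0:1] [postblack] [postsilence] concat=n=3:v=1:a=1"
printf '%s\n' "$fg"
# Invocation would be the same shape as above:
#   ffmpeg -i hottubmxf.mxf -filter_complex "$fg" -y output.mxf
```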
ColourBars and Slates
Adding colour bars is simply a matter of using another generator to replace the black at the front with bars, this time generating a tone as well (note that testsrc actually produces a moving test pattern; the smptebars or smptehdbars sources give traditional colour bars if preferred):
ffmpeg -i hottubmxf.mxf -filter_complex "
testsrc=d=10:s=1920x1080 [prebars];
sine=frequency=1000:sample_rate=48000:d=10 [pretone];
color=c=black:s=1920x1080:d=10 [postblack];
sine=frequency=0:sample_rate=48000:d=10 [postsilence];
[prebars] [pretone] [0:0] [0:1] [postblack] [postsilence] concat=n=3:v=1:a=1"
-y output.mxf
Let's add the black back in at the start as well:
ffmpeg -i hottubmxf.mxf -filter_complex "
testsrc=d=10:s=1920x1080 [prebars];
sine=frequency=1000:sample_rate=48000:d=10 [pretone];
color=c=black:s=1920x1080:d=10 [preblack];
sine=frequency=0:sample_rate=48000:d=10 [presilence];
color=c=black:s=1920x1080:d=10 [postblack];
sine=frequency=0:sample_rate=48000:d=10 [postsilence];
[prebars] [pretone] [preblack] [presilence] [0:0] [0:1] [postblack] [postsilence] concat=n=4:v=1:a=1"
-y output.mxf
Now let's add a title to the black as a slate, which can be done with the drawtext filter:
drawtext=fontfile=OpenSans-Regular.ttf:text='Title of this Video':fontcolor=white:fontsize=24:x=(w-tw)/2:y=(h/PHI)+th
Which I found along with some additional explanation for adding text boxes.
This can be achieved in the ffmpeg filtergraph syntax by adding the filter into the stream. In each of the inputs these filter options can be added as comma-separated items; so for [preblack], let's now call it [slate], which would look like this:
color=c=black:s=1920x1080:d=10,drawtext=fontfile='C\:\\Windows\\Fonts\\arial.ttf':text='Text to write':fontsize=30:fontcolor=white:x=(w-text_w)/2:y=(h-text_h-line_h)/2 [slate];
This puts the text in the middle of the screen. Multiple lines are then easy to add.
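To sketch the multi-line case (the text strings and vertical offsets are my placeholders, not from the original): each extra line is just another comma-chained drawtext with a different y expression. Building the chain up in a shell variable keeps the Windows font-path escaping readable:

```shell
#!/bin/sh
# Two stacked drawtext filters on the black generator; single quotes
# preserve ffmpeg's "C\:\\..." font-path escaping literally.
font='C\:\\Windows\\Fonts\\arial.ttf'
slate="color=c=black:s=1920x1080:d=10"
slate="$slate,drawtext=fontfile='$font':text='Programme Title':fontsize=30:fontcolor=white:x=(w-text_w)/2:y=(h-text_h)/2-40"
slate="$slate,drawtext=fontfile='$font':text='Tape 1 of 1':fontsize=30:fontcolor=white:x=(w-text_w)/2:y=(h-text_h)/2+40"
slate="$slate [slate]"
printf '%s\n' "$slate"
```

The two y expressions offset the lines above and below centre; the finished string drops straight into the -filter_complex graphs above in place of the [preblack] chain.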