This commit implements the perceptual noise substitution AAC extension. This is a proof of concept
implementation, and as such, is not enabled by default. This is the fourth revision of this patch,
made after some problems were pointed out. Any changes made since the previous revisions have been indicated.
In order to extend the encoder to use an additional codebook, the array holding each codebook has been
modified with two additional entries - 13 for the NOISE_BT codebook and 12 which has a placeholder function.
The cost system was modified to skip the 12th entry by using arrays that map its inputs and outputs. It
also does not accept using the 13th codebook for any band which is not marked as containing noise, thereby
restricting its ability to arbitrarily choose it for bands. The use of arrays allows the system to be easily
extended to allow for intensity stereo encoding, which uses additional codebooks.
The 12th entry in the codebook function array points to a function which stops the execution of the program
by calling an assert with an always 'false' argument. It was pointed out in an email discussion with
Claudio Freire that having a 'NULL' entry can result in unexpected behaviour and could be used as
a security hole. There is no danger of this function being called during encoding due to the codebook maps introduced.
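A minimal sketch of the table layout described above. Apart from NOISE_BT, every name here (cost_fn, cost_table, cb_out_map, cb_allowed) is hypothetical, and the cost functions are stand-ins rather than the encoder's real ones.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; only NOISE_BT comes from the commit message. */
#define CB_TOT    14  /* 0..11 regular, 12 placeholder, 13 noise (assumed layout) */
#define NOISE_BT  13
#define CB_USABLE 13  /* number of codebooks the cost search may pick from */

typedef float (*cost_fn)(const float *coefs, int size);

/* Stand-in cost: sum of squares, only so the table has something to point at. */
static float cost_regular(const float *coefs, int size)
{
    float sum = 0.0f;
    for (int i = 0; i < size; i++)
        sum += coefs[i] * coefs[i];
    return sum;
}

/* Stand-in cost for the noise codebook. */
static float cost_noise(const float *coefs, int size)
{
    (void)coefs; (void)size;
    return 0.0f;
}

/* Placeholder for the unused entry 12: abort loudly instead of leaving NULL. */
static float cost_placeholder(const float *coefs, int size)
{
    (void)coefs; (void)size;
    assert(0);
    return 0.0f;
}

static const cost_fn cost_table[CB_TOT] = {
    cost_regular, cost_regular, cost_regular, cost_regular,
    cost_regular, cost_regular, cost_regular, cost_regular,
    cost_regular, cost_regular, cost_regular, cost_regular,
    cost_placeholder, cost_noise,
};

/* Maps a search index (0..12) to a codebook number, skipping entry 12. */
static const unsigned char cb_out_map[CB_USABLE] = {
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, NOISE_BT,
};

/* Returns 1 if the search may use search index idx for this band. */
static int cb_allowed(int idx, int band_is_noise)
{
    int cb = cb_out_map[idx];
    return cb == NOISE_BT ? band_is_noise : 1;
}
```

Because the search only ever iterates over cb_out_map, the placeholder entry is unreachable in normal operation, and the assert only fires if that invariant is broken.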
Another change from version 1 of the patch is the addition of an argument to the encoder, '-aac_pns' to
enable or disable PNS. PNS is currently disabled by default, as it is experimental.
The switch will be removed in the future, when the algorithm to select noise bands has been improved.
The current algorithm simply compares the energy to the threshold (multiplied by a constant) to determine
noise; however, the FFPsyBand structure contains other useful figures that could determine which bands carry noise more accurately.
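The current test could be sketched roughly as follows. BandStats is a stand-in for the two FFPsyBand fields involved, and the direction of the comparison is an assumption inferred from the remark later in this message that larger constants substitute more noise.

```c
/* Stand-in for the energy and threshold figures; the real FFPsyBand has more. */
typedef struct BandStats {
    float energy;
    float threshold;
} BandStats;

/* Returns 1 when the band is treated as noise, i.e. when its energy falls
 * below the threshold scaled by the tuning constant c (assumed direction). */
static int band_is_noise(const BandStats *b, float c)
{
    return b->energy < b->threshold * c;
}
```

With energy 1.0 and threshold 0.5, a constant of 1.2 leaves the band alone while 2.2 marks it as noise.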
Some of the sample files provided triggered an assertion when the parameter to tune the threshold was set to
a value of '2.2'. Claudio Freire reported the problem's source could be in the range of the scalefactor
indices for noise and advised to measure the minimal index and clip anything above the maximum allowed
value. This has been implemented and all the files which used to trigger the assertion now encode without error.
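A sketch of the clipping step as described: find the smallest noise index, then limit every noise index to at most a fixed distance above it. The function name and the spread of 60 are assumptions for illustration, not values taken from the patch.

```c
#include <limits.h>

#define MAX_SF_DIFF 60 /* assumed maximum allowed spread; hypothetical value */

/* Clips the scalefactor indices of noise bands to [min, min + MAX_SF_DIFF],
 * where min is the smallest index found among the noise bands. */
static void clip_noise_sf(int *sf, const int *is_noise, int n)
{
    int min_sf = INT_MAX;

    for (int i = 0; i < n; i++)
        if (is_noise[i] && sf[i] < min_sf)
            min_sf = sf[i];

    if (min_sf == INT_MAX)
        return; /* no noise bands, nothing to clip */

    for (int i = 0; i < n; i++)
        if (is_noise[i] && sf[i] > min_sf + MAX_SF_DIFF)
            sf[i] = min_sf + MAX_SF_DIFF;
}
```

Bands not marked as noise are left untouched.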
The third revision of the patch also removes unneeded variables and comparisons. All of them were
redundant and would have been of little use once the PNS implementation is extended.
The fourth revision moved the clipping of the noise scalefactors outside the second loop of the two-loop
algorithm in order to prevent their redundant calculations. Also, freq_mult has been changed to a float
variable due to the fact that rounding errors can prove to be a problem at low frequencies.
Consideration was given to evaluating the entire expression inline where it is used,
but in the end it was decided that it would be best to change only the type of the
variable. Claudio Freire reported the two problems. There is no change of functionality
(except for low sampling frequencies) so the spectral demonstrations at the end of this commit's message were not updated.
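To illustrate why the type matters at low sampling rates, here is a small comparison. The expression is an assumed stand-in for how freq_mult might be derived from the sample rate and window count; the commit message does not spell it out.

```c
/* Integer version: every division truncates, which hurts at low rates. */
static int freq_mult_int(int sample_rate, int num_windows)
{
    return sample_rate / (1024 / num_windows) / 2;
}

/* Float version: keeps the fractional part of the frequency step. */
static float freq_mult_float(int sample_rate, int num_windows)
{
    return sample_rate / (1024.0f / num_windows) / 2.0f;
}
```

At 8000 Hz with a single window the integer form yields 3 while the float form yields 3.90625, an error of over 20% in the per-bin frequency.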
Finally, the way energy values are converted to scalefactor indices has changed since the first commit,
as per the suggestion of Claudio Freire. This may still have some drawbacks, but unlike the first commit
it works without having redundant offsets and outputs what the decoder expects to have, in terms of the
ranges of the scalefactor indices.
Some spectral comparisons: https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/Original.png (original),
https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/PNS_NO.png (encoded without PNS),
https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/PNS1.2.png (encoded with PNS, const = 1.2),
https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/Difference1.png (spectral difference).
The constant is the value which multiplies the threshold when it gets compared to the energy; larger
values mean more noise will be substituted by PNS values. Example when const = 2.2:
https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/PNS_2.2.png
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit adjusts the initial offset for PNS values, introduced
with commit f7f71b5795d708763eb0c55fe5e2cb051b2b69f4 earlier. This
commit shifts the value in such a way that no further offsets are
required in the aaccoder.c file. Earlier versions of the PNS patch had 2 offsets in both aaccoder and aacenc.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit implements support for writing the noise energy values used in PNS.
The difference between regular scalefactors and noise energy values is that the latter
require a small preamble (NOISE_PRE + energy_value_diff) to be written as the first
noise-containing band. Any following noise energy values use the previous one to
base their "diff" on. Ordinary scalefactors remain unchanged other than that they ignore the noise values.
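The ordering described above can be sketched like this. Instead of writing bits, the sketch records what would be written per band; the Symbol type, the function name, the starting values, and the NOISE_PRE value of 256 are assumptions for illustration.

```c
#define NOISE_PRE 256 /* assumed preamble offset; hypothetical value */

/* One value to be written; preamble is 1 only for the first noise band. */
typedef struct Symbol {
    int preamble;
    int value;
} Symbol;

/* Regular scalefactors and noise energies each keep their own running
 * "previous" value, so neither kind disturbs the other's differences. */
static void encode_band_values(const int *val, const int *is_noise, int n,
                               int sf_start, int noise_start, Symbol *out)
{
    int prev_sf = sf_start;
    int prev_noise = noise_start;
    int first_noise = 1;

    for (int i = 0; i < n; i++) {
        if (is_noise[i]) {
            int diff = val[i] - prev_noise;
            prev_noise = val[i];
            out[i].preamble = first_noise;
            out[i].value = first_noise ? NOISE_PRE + diff : diff;
            first_noise = 0;
        } else {
            out[i].preamble = 0;
            out[i].value = val[i] - prev_sf;
            prev_sf = val[i];
        }
    }
}
```

In a real bitstream the preamble would be written with a fixed bit width and the remaining differences with the usual scalefactor code; that part is left out here.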
This commit should not change anything by itself; the following commits will bring it into use.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Instead, warn that bitrate will be clamped down to the maximum allowed.
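A rough sketch of warn-and-clamp behaviour. The function name is hypothetical, and the limit used here (6144 bits per channel per 1024-sample frame) is an assumption about which maximum applies; the patch itself may compute it differently.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Returns the bitrate to use, warning on stderr if the request was too high.
 * Assumed limit: 6144 bits per channel for each frame of 1024 samples. */
static int64_t clamp_bitrate(int64_t requested, int sample_rate, int channels)
{
    int64_t max_rate = (int64_t)6144 * channels * sample_rate / 1024;

    if (requested > max_rate) {
        fprintf(stderr,
                "Requested bitrate %" PRId64 " too high, clamping to %" PRId64 "\n",
                requested, max_rate);
        return max_rate;
    }
    return requested;
}
```

For 44100 Hz stereo this limit works out to 529200 bits per second.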
The patch is mostly the work of Kamendo2 in issue #2686 and was tested fairly thoroughly within that issue.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This patch fixes a pointer arithmetic bug in adjust_frame_information that resulted in heavily corrupted audio when using M/S encoding. Also, a backup copy of untransformed coefficients has to be kept around or attempts at re-processing the frame (which happens when heavily overspending bits during transients) will result in re-encoding of the coefficients and subsequent corruption of the resulting stream.
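The need for the backup copy can be shown with a small sketch. The Channel type, the helper names, and the tiny frame size are hypothetical; the point is only that the M/S transform is applied in place, so a second pass must start from the saved originals.

```c
#include <string.h>

#define NCOEF 4 /* tiny frame size, for illustration only */

typedef struct Channel {
    float coeffs[NCOEF]; /* working coefficients, transformed in place */
    float backup[NCOEF]; /* untransformed copy kept for re-processing */
} Channel;

static void save_coeffs(Channel *c)
{
    memcpy(c->backup, c->coeffs, sizeof(c->coeffs));
}

static void restore_coeffs(Channel *c)
{
    memcpy(c->coeffs, c->backup, sizeof(c->coeffs));
}

/* In-place mid/side transform: left becomes (L+R)/2, right becomes (L-R)/2. */
static void apply_ms(Channel *l, Channel *r)
{
    for (int i = 0; i < NCOEF; i++) {
        float mid  = (l->coeffs[i] + r->coeffs[i]) * 0.5f;
        float side = mid - r->coeffs[i];
        l->coeffs[i] = mid;
        r->coeffs[i] = side;
    }
}
```

A retry would call restore_coeffs on both channels before calling apply_ms again; skipping the restore transforms already-transformed data, which is the corruption described above.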
A/B testing shows the bug as corrected, but still cannot prove that M/S coding is a win at least in numbers. Limited listening tests do show improvement on M/S encoded samples in lower bitrates, but they're hidden among the other artifacts that remain to be corrected in the encoder.
Some of the regressions flagged in the report do show poor stereo image (but not buggy), so M/S encoding is clearly not good enough yet to be defaulted to auto.
In numbers, Patched against Unpatched, stereo_mode auto:
Files: 114
Bitrates: 6
Tests: 683
Serious Regressions: 0 (0%)
Regressions: 0 (0%)
Improvements: 227 (33%)
Big improvements: 92 (13%)
Worst regression - mybloodrusts.wv - 256k
- StdDev: 28.61 pSNR: -0.43 maxdiff: 1372.00
Best improvement - 60.wv - 384k
- StdDev: -369.57 pSNR: 45.02 maxdiff: -13322.00
Average - StdDev: -80.56 pSNR: 2.49 maxdiff: -8858.00
Patched against Unpatched stereo_mode ms_off shows no difference.
Patched stereo_mode auto vs Unpatched stereo_mode ms_off shows a small average improvement, just not too significant:
Serious Regressions: 0 (0%)
Regressions: 10 (1%)
Improvements: 45 (6%)
Big improvements: 2 (0%)
Worst regression - Illinois.wv - 256k
- StdDev: 33.20 pSNR: -2.03 maxdiff: 477.00
Best improvement - song_of_circomstances.flac - 384k
- StdDev: -3.97 pSNR: 7.61 maxdiff: -826.00
Average - StdDev: -10.25 pSNR: 0.20 maxdiff: -281.00
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Several encoders were multiplying the buffer size by 8, in order to get
a bit size. However, the buffer_size argument is for the byte size of
the buffer. We had experienced crashes encoding prores (Anatoliy) at
size 4096x4096.
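The unit mix-up can be illustrated with a toy bit writer. BitWriter and its helpers are hypothetical and much simpler than the real bitstream writer; the only point is that the init function takes a size in bytes and derives the bit capacity itself.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct BitWriter {
    uint8_t *buf;
    uint8_t *buf_end; /* one past the last writable byte */
    size_t   bit_pos; /* bits written so far */
} BitWriter;

/* size_bytes is a byte count. Passing size_bytes * 8 here would place
 * buf_end far beyond the real buffer and allow writes past its end. */
static void bw_init(BitWriter *bw, uint8_t *buf, size_t size_bytes)
{
    bw->buf     = buf;
    bw->buf_end = buf + size_bytes;
    bw->bit_pos = 0;
}

/* Remaining capacity in bits, derived from the byte size. */
static size_t bw_bits_left(const BitWriter *bw)
{
    return (size_t)(bw->buf_end - bw->buf) * 8 - bw->bit_pos;
}
```

A 16-byte buffer initialised this way reports 128 bits of capacity, so no caller needs to multiply by 8 itself.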
* commit '2df0c32ea12ddfa72ba88309812bfb13b674130f':
lavc: use a separate field for exporting audio encoder padding
Conflicts:
libavcodec/audio_frame_queue.c
libavcodec/avcodec.h
libavcodec/libvorbisenc.c
libavcodec/utils.c
libavcodec/version.h
libavcodec/wmaenc.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Currently, the amount of padding inserted at the beginning by some audio
encoders is exported through AVCodecContext.delay. However
- the term 'delay' is heavily overloaded and can have multiple different
meanings even in the case of audio encoding.
- this field has entirely different meanings, depending on whether the
codec context is used for encoding or decoding (and has yet another
different meaning for video), preventing generic handling of the codec
context.
Therefore, add a new field -- AVCodecContext.initial_padding. It could
conceivably be used for decoding as well at a later point.
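As an illustration of what a dedicated field makes possible, a consumer could drop the leading padding without guessing what 'delay' means. CodecInfo below is a stand-in holding only the one field discussed, not the real AVCodecContext, and drop_initial_padding is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the one field discussed; not the real AVCodecContext. */
typedef struct CodecInfo {
    int initial_padding; /* samples the encoder inserted at the start */
} CodecInfo;

/* Copies the samples that follow the encoder padding into out and
 * returns how many were copied (0 if the input is all padding). */
static size_t drop_initial_padding(const CodecInfo *c, const float *in,
                                   size_t n, float *out)
{
    size_t skip = c->initial_padding > 0 ? (size_t)c->initial_padding : 0;

    if (skip >= n)
        return 0;
    memcpy(out, in + skip, (n - skip) * sizeof(*in));
    return n - skip;
}
```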
This was due to a miscomputation of s->cur_channel, which led to
psy-based encoders using the psy coefficients for the wrong channel.
Signed-off-by: Martin Storsjö <martin@martin.st>
This was due to a miscomputation of s->cur_channel, which led to
psy-based encoders using the psy coefficients for the wrong channel.
Test sample attached on the bug tracker had the peculiar case of all
other channels being silent, so the error was far more noticeable.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '0f24a3ca999a702f83af9307f9f47b6fdeb546a5':
lavc: remove disabled FF_API_OLD_ENCODE_VIDEO cruft
lavc: remove disabled FF_API_OLD_ENCODE_AUDIO cruft
lavc: remove disabled FF_API_OLD_DECODE_AUDIO cruft
Conflicts:
libavcodec/flacenc.c
libavcodec/libgsm.c
libavcodec/utils.c
libavcodec/version.h
The compatibility wrappers are left as they likely are still
in wide use. They will be removed when they break or otherwise
cause work without a volunteer being available.
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Now, nellymoserenc and aacenc no longer depend on dsputil. Independent
of this patch, wmaprodec also does not depend on dsputil, so I removed
it from there also.
This fixes segfault caused by 3d3cf6745e2a5dc9c377244454c3186d75b177fa
when SingleChannelElement.ret was renamed to SingleChannelElement.ret_buf.
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
* commit '3d3cf6745e2a5dc9c377244454c3186d75b177fa':
aacdec: use float planar sample format for output
Conflicts:
libavcodec/aacdec.c
libavcodec/aacsbr.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '381dc1a5ec0925b281c573457c413ae643567086':
fate: ac3: Place E-AC-3 tests and AC-3 tests in different groups
fate: Add shorthands for acodec PCM and ADPCM tests
avconv: Drop unused function argument from do_video_stats()
cmdutils: Conditionally compile libswscale-related bits
aacenc: Drop some unused function arguments
rtsp: Avoid a cast when calling strtol
nut: support textual data
nutenc: verbosely report unsupported negative pts
Conflicts:
cmdutils.c
ffmpeg.c
libavformat/nut.c
libavformat/nutenc.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
wmaenc: use float planar sample format
(e)ac3enc: use planar sample format
aacenc: use planar sample format
adpcmenc: use planar sample format for adpcm_ima_wav and adpcm_ima_qt
adpcmenc: move 'ch' variable to higher scope
adpcmenc: fix 3 instances of variable shadowing
adpcm_ima_wav: simplify encoding
libvorbis: use planar sample format
libmp3lame: use planar sample formats
vorbisenc: use float planar sample format
ffm: do not write or read the audio sample format
parseutils: fix parsing of invalid alpha values
doc/RELEASE_NOTES: update for the 9 release.
smoothstreamingenc: Add a more verbose error message
smoothstreamingenc: Ignore the return value from mkdir
smoothstreamingenc: Try writing a manifest when opening the muxer
smoothstreamingenc: Move the output_chunk_list and write_manifest functions up
smoothstreamingenc: Properly return errors from ism_flush to the caller
smoothstreamingenc: Check the output UrlContext before accessing it
Conflicts:
doc/RELEASE_NOTES
libavcodec/aacenc.c
libavcodec/ac3enc_template.c
libavcodec/wmaenc.c
tests/ref/lavf/ffm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The value used in allocation is based on an estimate of the
maximum size of the spectral coefficients multiplied by 2
and rounded up. The exact or a tighter limit should be
found and used instead. But this issue shouldn't be left
open until someone works on that.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '124134e42455763b28cc346fed1d07017a76e84e':
avopt: Store defaults for AV_OPT_TYPE_CONST in the i64 union member
Conflicts:
libavcodec/aacenc.c
libavcodec/libopenjpegenc.c
libavcodec/options_table.h
libavdevice/bktr.c
libavdevice/v4l2.c
libavdevice/x11grab.c
libavfilter/af_amix.c
libavfilter/vf_drawtext.c
libavformat/movenc.c
libavformat/options_table.h
libavutil/opt.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
mpc8: return more meaningful error codes.
mpc: return more meaningful error codes.
wv,mpc8: don't return apetag data in packets.
rtmp: do not warn about receiving metadata packets
x86: h264dsp: Adjust YASM #ifdefs
x86: yadif: Mark mmxext optimizations as such
h264: convert loop filter strength dsp function to yasm.
Improve descriptiveness of a number of codec and container long names
Conflicts:
libavcodec/flvdec.c
libavcodec/libopenjpegdec.c
libavformat/apetag.c
libavformat/mp3dec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
float_dsp: ppc: add a separate header for Altivec function prototypes
ARM: fix float_dsp breakage from d5a7229
Add a float DSP framework to libavutil
PPC: Move types_altivec.h and util_altivec.h from libavcodec to libavutil
ARM: Move asm.S from libavcodec to libavutil
vc1dsp: mark put/avg_vc1_mspel_mc() always_inline
Merged-by: Michael Niedermayer <michaelni@gmx.at>