Remove early termination in vp9_rd_pick_inter_mode_sub8x8() in
order to complete mode selection for 8x4/4x8 blocks which will
try supertx in a higher level function.
Change-Id: I457505257332f70d9cd8d22db52ad32ff15f7f87
A new set of prediction modes, called copy modes, is implemented
to allow blocks to directly copy motion information from a neighbor.
The motivation is to create regions of arbitrary shapes in which
blocks share same motion parameters and hence to save bits spent on
duplicated side information.
Compression gain:
derf: 0.894%; stdhd: 1.513%.
Change-Id: I5e026b12c902bc6985c199ec38f1b3b67ac7d930
We removed the restriction that transform blocks could not exceed
the size of prediction blocks. Smoothing masks are applied to reduce
discontinuity between prediction blocks in order to realize the
efficiency of large transform.
0.997%/0.895% bit-rate reduction is achieved on derf/stdhd set.
Change-Id: I8db241bab9fe74d864809e95f76b771ee59a2def
The old interintra experiment is slow (speed is 0.2x original codec
at speed 0).
We use best inter mode to skip some reference frame and NEWMV
search when searching best joint mode.
Quality drop: ~0.1% derf. Speed: 0.36x head.
Change-Id: If10453448284f86c14a0a41f20aeaf9ac838fa32
Moving RD-opt related code from vp9_encoder.h to vp9_rdopt.h.
Squashed-Change-Id: I8fab776c8801e19d3f5027ed55a6aa69eee951de
gen_msvs_proj: fix in tree configure under cygwin
strip trailing '/' from paths, this is later converted to '\' which
causes execution errors for obj_int_extract/yasm. vs10+ wasn't affected
by this issue, but make the same change for consistency.
gen_msvs_proj:
+ add missing '"' to obj_int_extract call
unlike gen_msvs_vcproj, the block is duplicated
missed in: 1e3d9b9 build/msvs: fix builds in source dirs with spaces
Squashed-Change-Id: I76208e6cdc66dc5a0a7ffa8aa1edbefe31e4b130
Improve vp9_rb_bytes_read
Squashed-Change-Id: I69eba120eb3d8ec43b5552451c8a9bd009390795
Removing decode_one_iter() function.
When superframe index is available we completely rely on it and use frame
size values from the index.
Squashed-Change-Id: I0011d08b223303a8b912c2bcc8a02b74d0426ee0
iosbuild.sh: Add vpx_config.h and vpx_version.h to VPX.framework.
- Rename build_targets to build_framework
- Add functions for creating the vpx_config shim and obtaining
preproc symbols.
Squashed-Change-Id: Ieca6938b9779077eefa26bf4cfee64286d1840b0
Implemented vp9_denoiser_{alloc,free}()
Squashed-Change-Id: I79eba79f7c52eec19ef2356278597e06620d5e27
Update running avg for VP9 denoiser
Squashed-Change-Id: I9577d648542064052795bf5770428fbd5c276b7b
Changed buf_2ds in vp9 denoiser to YV12 buffers
Changed alloc, free, and running average code as necessary.
Squashed-Change-Id: Ifc4d9ccca462164214019963b3768a457791b9c1
sse4 regular quantize
Squashed-Change-Id: Ibd95df0adf9cc9143006ee9032b4cb2ebfd5dd1b
Modify non-rd intra mode checking
Speed 6 uses small tx size, namely 8x8. max_intra_bsize needs to
be modified accordingly to ensure valid intra mode checking.
Borg test on RTC set showed an overall PSNR gain of 0.335% in speed
-6.
This also changes speed -5 encoding by allowing DC_PRED checking
for block32x32. Borg test on RTC set showed a slight PSNR gain of
0.145%, and no noticeable speed change.
Squashed-Change-Id: I1502978d8fbe265b3bb235db0f9c35ba0703cd45
Implemented COPY_BLOCK case for vp9 denoiser
Squashed-Change-Id: Ie89ad1e3aebbd474e1a0db69c1961b4d1ddcd33e
Improved vp9 denoiser running avg update.
Squashed-Change-Id: Ie0aa41fb7957755544321897b3bb2dd92f392027
Separate rate-distortion modeling for DC and AC coefficients
This is the first step to rework the rate-distortion modeling used
in rtc coding mode. The overall goal is to make the modeling
customized for the statistics encountered in the rtc coding.
This commit makes encoder to perform rate-distortion modeling for
DC and AC coefficients separately. No speed changes observed.
The coding performance for pedestrian_area_1080p is largely
improved:
speed -5, from 79558 b/f, 37.871 dB -> 79598 b/f, 38.600 dB
speed -6, from 79515 b/f, 37.822 dB -> 79544 b/f, 38.130 dB
Overall performance for rtc set at speed -6 is improved by 0.67%.
Squashed-Change-Id: I9153444567e5f75ccdcaac043c2365992c005c0c
Add superframe support for frame parallel decoding.
A superframe is a bunch of frames that bundled as one frame. It is mostly
used to combine one or more non-displayable frames and one displayable frame.
For frame parallel decoding, libvpx decoder will only support decoding one
normal frame or a super frame with superframe index.
If an application pass a superframe without superframe index or a chunk
of displayable frames without superframe index to libvpx decoder, libvpx
will not decode it in frame parallel mode. But libvpx decoder still could
decode it in serial mode.
Squashed-Change-Id: I04c9f2c828373d64e880a8c7bcade5307015ce35
Fixes in VP9 alloc, free, and COPY_FRAME case
Squashed-Change-Id: I1216f17e2206ef521fe219b6d72d8e41d1ba1147
Remove labels from quantize
Use break instead of goto for early exit. Unbreaks Visual Studio
builds.
Squashed-Change-Id: I96dee43a3c82145d4abe0d6a99af6e6e1a3991b5
Added CFLAG for outputting vp9 denoised signal
Squashed-Change-Id: Iab9b4e11cad927f3282e486c203564e1a658f377
Allow key frame more flexibility in mode search
This commit allows the key frame to search through more prediction
modes and more flexible block sizes. No speed change observed. The
coding performance for rtc set is improved by 1.7% for speed -5 and
3.0% for speed -6.
Squashed-Change-Id: Ifd1bc28558017851b210b4004f2d80838938bcc5
VP9 denoiser bugfixes
s/stdint.h/vpx\/vpx_int.h
Added missing 'break;'s
Also included other minor changes, mostly cosmetic.
Squashed-Change-Id: I852bba3e85e794f1d4af854c45c16a23a787e6a3
Don't return value for void functions
Clears "warning: 'return' with a value, in function returning void"
Squashed-Change-Id: I93972610d67e243ec772a1021d2fdfcfc689c8c2
Include type defines
Clears error: unknown type name 'uint8_t'
Squashed-Change-Id: I9b6eff66a5c69bc24aeaeb5ade29255a164ef0e2
Validate error checking code in decoder.
This patch adds a mechanism for insuring error checking on invalid files
by creating a unit test that runs the decoder and tests that the error
code matches what's expected on each frame in the decoder.
Disabled for now as this unit test will segfault with existing code.
Squashed-Change-Id: I896f9686d9ebcbf027426933adfbea7b8c5d956e
Introduce FrameWorker for decoding.
When decoding in serial mode, there will be only
one FrameWorker doing decoding. When decoding in
parallel mode, there will be several FrameWorkers
doing decoding in parallel.
Squashed-Change-Id: If53fc5c49c7a0bf5e773f1ce7008b8a62fdae257
Add back libmkv ebml writer files.
Another project in ChromeOS is using these files. To make libvpx
rolls simpler, add these files back unitl the other project removes
the dependency.
crbug.com/387246 tracking bug to remove dependency.
Squashed-Change-Id: If9c197081c845c4a4e5c5488d4e0190380bcb1e4
Added Test vector that tests more show existing frames.
Squashed-Change-Id: I0ddd7dd55313ee62d231ed4b9040e08c3761b3fe
fix peek_si to enable 1 byte show existing frames.
The test for this is in test vector code ( show existing frames will
fail ). I can't check it in disabled as I'm changing the generic
test code to do this:
https://gerrit.chromium.org/gerrit/#/c/70569/
Squashed-Change-Id: I5ab324f0cb7df06316a949af0f7fc089f4a3d466
Fix bug in error handling that causes segfault
See: https://code.google.com/p/chromium/issues/detail?id=362697
The code properly catches an invalid stream but seg faults instead of
returning an error due to a buffer not having been initialized. This
code fixes that.
Squashed-Change-Id: I695595e742cb08807e1dfb2f00bc097b3eae3a9b
Revert 3 patches from Hangyu to get Chrome to build:
Avoids failures:
MSE_ClearKey/EncryptedMediaTest.Playback_VP9Video_WebM/0
MSE_ClearKey_Prefixed/EncryptedMediaTest.Playback_VP9Video_WebM/0
MSE_ExternalClearKey_Prefixed/EncryptedMediaTest.Playback_VP9Video_WebM/0
MSE_ExternalClearKey/EncryptedMediaTest.Playback_VP9Video_WebM/0
MSE_ExternalClearKeyDecryptOnly/EncryptedMediaTest.Playback_VP9Video_WebM/0
MSE_ExternalClearKeyDecryptOnly_Prefixed/EncryptedMediaTest.Playback_VP9Video_WebM/0
SRC_ExternalClearKey/EncryptedMediaTest.Playback_VP9Video_WebM/0
SRC_ExternalClearKey_Prefixed/EncryptedMediaTest.Playback_VP9Video_WebM/0
SRC_ClearKey_Prefixed/EncryptedMediaTest.Playback_VP9Video_WebM/0
Patches are
This reverts commit 9bc040859b
This reverts commit 6f5aba069a
This reverts commit 9bc040859b
I1f250441 Revert "Refactor the vp9_get_frame code for frame parallel."
Ibfdddce5 Revert "Delay decreasing reference count in frame-parallel decoding."
I00ce6771 Revert "Introduce FrameWorker for decoding."
Need better testing in libvpx for these commits
Squashed-Change-Id: Ifa1f279b0cabf4b47c051ec26018f9301c1e130e
error check vp9 superframe parsing
This patch insures that the last byte of a chunk that contains a
valid superframe marker byte, actually has a proper superframe index.
If not it returns an error.
As part of doing that the file : vp90-2-15-fuzz-flicker.webm now fails
to decode properly and moves to the invalid file test from the test
vector suite.
Squashed-Change-Id: I5f1da7eb37282ec0c6394df5c73251a2df9c1744
Remove unused vp9_init_quant_tables function
This function is not effectively used, hence removed.
Squashed-Change-Id: I2e8e48fa07c7518931690f3b04bae920cb360e49
Actually skip blocks in skip segments in non-rd encoder.
Copy split from macroblock to pick mode context so it doesn't get lost.
Squashed-Change-Id: Ie37aa12558dbe65c4f8076cf808250fffb7f27a8
Add Check for Peek Stream validity to decoder test.
Squashed-Change-Id: I9b745670a9f842582c47e6001dc77480b31fb6a1
Allocate buffers based on correct chroma format
The encoder currently allocates frame buffers before
it establishes what the chroma sub-sampling factor is,
always allocating based on the 4:4:4 format.
This patch detects the chroma format as early as
possible allowing the encoder to allocate buffers of
the correct size.
Future patches will change the encoder to allocate
frame buffers on demand to further reduce the memory
profile of the encoder and rationalize the buffer
management in the encoder and decoder.
Squashed-Change-Id: Ifd41dd96e67d0011719ba40fada0bae74f3a0d57
Fork vp9_rd_pick_inter_mode_sb_seg_skip
Squashed-Change-Id: I549868725b789f0f4f89828005a65972c20df888
Switch active map implementation to segment based.
Squashed-Change-Id: Ibb841a1fa4d08d164cf5461246ec290f582b1f80
Experiment for mid group second arf.
This patch implements a mechanism for inserting a second
arf at the mid position of arf groups.
It is currently disabled by default using the flag multi_arf_enabled.
Results are currently down somewhat in initial testing if
multi-arf is enabled. Most of the loss is attributable to the
fact that code to preserve the previous golden frame
(in the arf buffer) in cases where we are coding an overlay
frame, is currently disabled in the multi-arf case.
Squashed-Change-Id: I1d777318ca09f147db2e8c86d7315fe86168c865
Clean out old CONFIG_MULTIPLE_ARF code.
Remove the old experimental multi arf code that was under
the flag CONFIG_MULTIPLE_ARF.
Squashed-Change-Id: Ib24865abc11691d6ac8cb0434ada1da674368a61
Fix some bugs in multi-arf
Fix some bugs relating to the use of buffers
in the overlay frames.
Fix bug where a mid sequence overlay was
propagating large partition and transform sizes into
the subsequent frame because of :-
sf->last_partitioning_redo_frequency > 1 and
sf->tx_size_search_method == USE_LARGESTALL
Squashed-Change-Id: Ibf9ef39a5a5150f8cbdd2c9275abb0316c67873a
Further dual arf changes: multi_arf_allowed.
Add multi_arf_allowed flag.
Re-initialize buffer indices every kf.
Add some const indicators.
Squashed-Change-Id: If86c39153517c427182691d2d4d4b7e90594be71
Fixed VP9 denoiser COPY_BLOCK case
Now copies the src to the correct location in the running average buffer.
Squashed-Change-Id: I9c83c96dc7a97f42c8df16ab4a9f18b733181f34
Fix test on maximum downscaling limits
There is a normative scaling range of (x1/2, x16)
for VP9. This patch fixes the maximum downscaling
tests that are applied in the convolve function.
The code used a maximum downscaling limit of x1/5
for historic reasons related to the scalable
coding work. Since the downsampling in this
application is non-normative it will revert to
using a separate non-normative scaler.
Squashed-Change-Id: Ide80ed712cee82fe5cb3c55076ac428295a6019f
Add unit test to test user_priv parameter.
Squashed-Change-Id: I6ba6171e43e0a43331ee0a7b698590b143979c44
vp9: check tile column count
the max is 6. there are assumptions throughout the decode regarding
this; fixes a crash with a fuzzed bitstream
$ zzuf -s 5861 -r 0.01:0.05 -b 6- \
< vp90-2-00-quantizer-00.webm.ivf \
| dd of=invalid-vp90-2-00-quantizer-00.webm.ivf.s5861_r01-05_b6-.ivf \
bs=1 count=81883
Squashed-Change-Id: I6af41bb34252e88bc156a4c27c80d505d45f5642
Adjust arf Q limits with multi-arf.
Adjust enforced minimum arf Q deltas for non primary arfs
in the middle of an arf/gf group.
Squashed-Change-Id: Ie8034ffb3ac00f887d74ae1586d4cac91d6cace2
Dual ARF changes: Buffer index selection.
Add indirection to the section of buffer indices.
This is to help simplify things in the future if we
have other codec features that switch indices.
Limit the max GF interval for static sections to fit
the gf_group structures.
Squashed-Change-Id: I38310daaf23fd906004c0e8ee3e99e15570f84cb
Reuse inter prediction result in real-time speed 6
In real-time speed 6, no partition search is done. The inter
prediction results got from picking mode can be reused in the
following encoding process. A speed feature reuse_inter_pred_sby
is added to only enable the resue in speed 6.
This patch doesn't change encoding result. RTC set tests showed
that the encoding speed gain is 2% - 5%.
Squashed-Change-Id: I3884780f64ef95dd8be10562926542528713b92c
Add vp9_ prefix to mv_pred and setup_pred_block functions
Make these two functions accessible by both RD and non-RD coding
modes.
Squashed-Change-Id: Iecb39dbf3d65436286ea3c7ffaa9920d0b3aff85
Replace cpi->common with preset variable cm
This commit replaces a few use cases of cpi->common with preset
variable cm, to avoid unnecessary pointer fetch in the non-RD
coding mode.
Squashed-Change-Id: I4038f1c1a47373b8fd7bc5d69af61346103702f6
[spatial svc]Implement lag in frames for spatial svc
Squashed-Change-Id: I930dced169c9d53f8044d2754a04332138347409
[spatial svc]Don't skip motion search in first pass encoding
Squashed-Change-Id: Ia6bcdaf5a5b80e68176f60d8d00e9b5cf3f9bfe3
decode_test_driver: fix type size warning
like vpx_codec_decode(), vpx_codec_peek_stream_info() takes an unsigned
int, not size_t, parameter for buffer size
Squashed-Change-Id: I4ce0e1fbbde461c2e1b8fcbaac3cd203ed707460
decode_test_driver: check HasFailure() in RunLoop
avoids unnecessary errors due to e.g., read (Next()) failures
Squashed-Change-Id: I70b1d09766456f1c55367d98299b5abd7afff842
Allow lossless breakout in non-rd mode decision.
This is very helpful for large moving windows in screencasts.
Squashed-Change-Id: I91b5f9acb133281ee85ccd8f843e6bae5cadefca
Revert "Revert 3 patches from Hangyu to get Chrome to build:"
This patch reverts the previous revert from Jim and also add a
variable user_priv in the FrameWorker to save the user_priv
passed from the application. In the decoder_get_frame function,
the user_priv will be binded with the img. This change is needed
or it will fail the unit test added here:
https://gerrit.chromium.org/gerrit/#/c/70610/
This reverts commit 9be46e4565.
Squashed-Change-Id: I376d9a12ee196faffdf3c792b59e6137c56132c1
test.mk: remove renamed file
vp90-2-15-fuzz-flicker.webm was renamed in:
c3db2d8 error check vp9 superframe parsing
Squashed-Change-Id: I229dd6ca4c662802c457beea0f7b4128153a65dc
vp9cx.mk: move avx c files outside of x86inc block
same reasoning as:
9f3a0db vp9_rtcd: correct avx2 references
these are all intrinsics, so don't depend on x86inc.asm
Squashed-Change-Id: I915beaef318a28f64bfa5469e5efe90e4af5b827
Dual arf: Name changes.
Cosmetic patch only in response to comments on
previous patches suggesting a couple of name changes
for consistency and clarity.
Squashed-Change-Id: Ida3a359b0d5755345660d304a7697a3a3686b2a3
Make non-RD intra mode search txfm size dependent
This commit fixes the potential issue in the non-RD mode decision
flow that only checks part of the block to estimate the cost. It
was due to the use of fixed transform size, in replacing the
largest transform block size. This commit enables per transform
block cost estimation of the intra prediction mode in the non-RD
mode decision.
Squashed-Change-Id: I14ff92065e193e3e731c2bbf7ec89db676f1e132
Fix quality regression for multi arf off case.
Bug introduced during multiple iterations on: I3831*
gf_group->arf_update_idx[] cannot currently be used
to select the arf buffer index if buffer flipping on overlays
is enabled (still currently the case when multi arf OFF).
Squashed-Change-Id: I4ce9ea08f1dd03ac3ad8b3e27375a91ee1d964dc
Enable real-time version reference motion vector search
This commit enables a fast reference motion vector search scheme.
It checks the nearest top and left neighboring blocks to decide the
most probable predicted motion vector. If it finds the two have
the same motion vectors, it then skip finding exterior range for
the second most probable motion vector, and correspondingly skips
the check for NEARMV.
The runtime of speed -5 goes down
pedestrian at 1080p 29377 ms -> 27783 ms
vidyo at 720p 11830 ms -> 10990 ms
i.e., 6%-8% speed-up.
For rtc set, the compression performance
goes down by about -1.3% for both speed -5 and -6.
Squashed-Change-Id: I2a7794fa99734f739f8b30519ad4dfd511ab91a5
Add const mark to const values in non-RD coding mode
Squashed-Change-Id: I65209fd1e06fc06833f6647cb028b414391a7017
Change-Id: Ic0be67ac9ef48f64a8878a0b8f1b336f136bceac
This patch allows the VP9 encoder to skip the un-necessary
motion search in the first pass. It computes the motion error
of 0,0 motion using the last source frame as the reference,
and skips the further motion search if this error is small.
Borg test shows overall the patch gives PSNR gain (derf -0.001%,
yt 0.341%, hd 0.282%). Individual clips may have PSNR gain or
loss. The best PSNR performance is 7.347% and the worst is -0.662%.
The first pass encoding speedup for slideshow clips is over 30%.
Change-Id: I4cac4dbd911f277ee858e161f3ca652c771344fe
Allow for an option to selectively apply the deblocking loop filter to the denoised
raw block, based on the denoised state (no-filter, filter with zero motion, or filter with non-zero motion)
of the current block and its upper and left denoised block.
This helps to reduce some blocking artifacts from the motion-compensated denoising.
Change-Id: I0ac4e70076df69a98c5391979e739a2681e24ae6
This commit fixes frame header decoding for superframe index, to
prevent out of boundary memory read triggered by fuzz test
vector. It resolves a chromium security violation issue
crbug.com/376802.
The issue was introduced in the change:
Add VPXD_SET_DECRYPTOR support to the VP9 decoder.
cl-id I88f86c8ff9af34e0b6531028b691921b54c2fc48
where the buffer was read before validation check on index offset
applied.
A test vector is added accordingly.
Change-Id: I41c988e776bbdd1033312a668e03a3dbcf44ca99
This patch appears to have introduced non-determinism and/or
mismatch from debug vs release.
This reverts commit 5daef90efc.
Change-Id: I80081e55cfeaaa821b510b58a4e6e6328003c7da
The current decoding scheme will decrease the reference count
of the output frame when finish decoding. Then the application
could copy the frame from the decoder buffer to application buffer.
In frame-parallel decoding, a decoded frame will not be outputted
until several frames later which depends on thread numbers. So
the decoded frame's reference count should be decreased only
after application finish copying the frame out. But due to the
limitation of vpx_codec_get_frame, decoder could not know when
application finish decoding. So use a index last_show_frame to
release the last output frame's reference count.
Change-Id: I403ee0d01148ac1182e5a2d87cf7dcc302b51e63
This commit enables a fast path computational flow for forward
transformation. It checks the sse and variance of prediction
residuals and decides if the quantized coefficients are all
zero, dc only, or more. It then selects the corresponding coding
path in the forward transformation and quantization stage.
It is currently enabled in rtc coding mode. Will do it for rd
coding mode next.
In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps
goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up.
Overall coding performance for rtc set is changed by -0.18%.
Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1
This patch allows the encoder to skip the
un-neccessary motion search in the first pass. It
calculates the error of the zero motion vector using
the last source frame as reference and skips the
further motion search in the first pass if the error
is small.
The encoding speedup of the first pass for slideshow
videos is over 30%. Borg test shows the overall PSNR
performance remain approximately the same (derf -0.009,
hd 0.387, yt 0.021, stdhd 0.065). Individual clips may
have either PSNR gain or loss. The worst PSNR perfomance
is from yt set, with a PSNR loss of -1.1.
Change-Id: I08b2ab110b695e4689573b2567fa531b6457616e
* Only use ZEROMV, disalowing the intra modes that were previously
tested.
* Score rate and distortion as zero.
Change-Id: Ifcf99e272095725f11da1dcd26bd0f850683e680
Really just armv7. This is a convenience target intended to make iOS
development with libvpx easier. Xcode projects with default settings
will fail to build when a framework lacks armv7s support when targetting
iOS7.
Change-Id: I7eb80d52eec25501febc0d2c3c0b4ed964b8ed5b
In non frame-parallel decoding, this works the same way as
current decoding scheme. Every time after decoder finish
decoding a frame, it will swap the current mode info pointer
and previous mode info pointer if the decoded frame needs
to be shown. Both mode info pointer and previous mode info
pointer are from mode info arrays.
In frame-parallel decoding, this will become more complicated
as current frame's mode info pointer will be shared with next
frame as previous mode info pointer. But when one decoder
thread finishes decoding one frame and starts to work on next
available frame, it needs to retain the decoded frame's mode
info pointers until next frame finishes decoding. The mode info
index will serve this purpose. The decoder will use different
buffer in the mode info arrays and use the other buffer to save
previous decoded frame’s mode info.
Change-Id: If11d57d8eb0ee38c8876158e5482177fcb229428
tests failing under Win32/Win64
+ dct16x16_test: add missing avx2 functions (partially disabled)
exercises the forward transforms
no idct/iht implementations, so the c-code is used
Change-Id: I04f64a457fa0828a00f32b5c9fe4f55294f21f61
In non-rd real-time mode, choosing smaller transform size in
encoding gives better video quality and good speed gain than
choosing larger transform size. This patch set tx size search
method to ALLOW_8X8, which is better than using 4x4 or other
larger sizes.
Borg tests on rtc set at speed 6 showed significant gain on quality.
PSNR gain: 11.034% and SSIM gain: 15.466%.
The speed gain is 5% - 12% for <720p clips, and 2% - 7% for
720p clips.
Change-Id: If4dc74ed2df359346b059f47fb73b4a0193ec548
Use of stack frame variable "fps" beyond the lifetime of the function.
fps is sent as a paremeter to output_stats and stored in the
packet holding this encoded frame. This has scope beyond the
lifetime of the calling function.
This reverts commit 3f95a230c7
Change-Id: Icd8e14b3d7dd733590ada12e619b9dce95b6b0f5
* changes:
gen_msvs_*proj.sh: strip SRC_PATH_BARE from obj names
*.mk: pass SRC_PATH_BARE to all GEN_VCPROJ invocations
build/msvs: fix builds in source dirs with spaces
When this compiler flag is enabled, the encoder will write a denoised,
uncompressed, version of the input to denoised.yuv.
Change-Id: Ie0247f76b23219d95fe97dd70f23e097d742c249
This commit enables unit test for SSSE3 16x16 inverse 2D-DCT with
10 non-zero coefficients. It includes a new test condition to
cover the potential overflow issue due to extremely coarse quantization.
Change-Id: I945e16f05dfbe19500f0da5f15990feba8e26d99
The SSSE3 implementation might find a potential overflow issue in
its second 1-D transform, if all input residual pixels are close to
255. This commit fixes the issue and re-enables the unit test on
the SSSE3 version.
Change-Id: I0520478abdab7afd3ff2842516bec951111e9b3c
This commit reworks the unit test for 8x8 forward/inverse
transformation. It adds extreme input value test to detect overflow
issues in the intermediate steps.
It temporarily disables unit test for the SSSE3 version, which
showed overflow failure in the new test conditions.
Change-Id: I7caf10bba4b6db031add65d8c0eb99426b38aa42
Right now there is just one place to check: xd->lossless and for the first
pass there is a function is_lossless_requested().
Change-Id: I949a6834e64ce51e422e2892f097f2b871b5429a
In Aq mode 2 for kf/arf/gf the segment q delta
is calculated and then applied by re-quantization without
going through the rd loop again. If the base Q != 0
but the segment Q == 0 (lossless) this can could give rise
to a situation where we have an illegal combination of
transform size and Q. (Q == 0 requires that all blocks
are coded 4x4 WHT).
Change-Id: I241a58c6494ed442e9e4630070b0cde0fb99ae45
...when configured below the path containing spaces. configuring outside
the path containing spaces still won't work due to issues with the
makefiles, e.g.,
/path with spaces/git
/path with spaces/build1
/build2
configure/make in build1 will work, build2 will not
Change-Id: Ie4a1f313596d7457cadd67476ac1dbd3273ad46e
As a side-effect, the sad unit tests for VP8 and VP9
had to be separated.
Fixes a bug in original patch:
(https://gerrit.chromium.org/gerrit/#/c/70163/8)
that was reverted due to a nightly test failure.
Change-Id: Ia2a4e9e278fd3c89d6c3c82fcc6381320ec2a8a6
This commit applies quantization process with coarse quantization
step size to the forward transform coefficients and tests all the
inverse 16x16 DCT and ADST implementation versions with the
dequantized coefficients as input, to verify that the outcomes
match the prototype.
Change-Id: I68034a6126b45192c87d8c642155290e89bff8fa
In frame parallel decoding mode, there will be still several frames inside
the decoder when application stop calling vpx_codec_decode to decode frames.
The application will need to keep calling vpx_codec_get_frame to get all the
remaining decoded frames in the decoder.
Change-Id: I2ce8260a91282f045bb9a6093ff8a606b1990f14
This commit added a call to set speed feature before initializing
motion search, fixed the problem where unintialized search method
is used before its value being set.
Change-Id: I537e4612bf0d00fd6f51396fd222d4b3bd6fde58
By enabling the OUTPUT_YUV_SRC compiler flag, the encoder will write the raw
input to bd.yuv.
The functionality was mostly implemented, but in its previous state did not
compile.
Change-Id: Ia331ad0f4c6e6f9f51e8d42cd33ba8cc146b3dbf
Making this consistent with intra mode masks: you need to specify
allowed inter/intra modes to use.
Change-Id: Iaecd28bf79047259707d8e7a59a57bb7b856383e
SEG_LEVEL_SKIP requires the block size to be at least 8x8. Attempting to
use it on smaller partitions causes the decoder to reject the bitstream.
Change-Id: Ia7188cdf8ae5ac1df6bd29f3f80dbb0610e1f7b1
An overflow issue could potentially happen in the second round 1-D
transform of the SSSE3 full inverse 16x16 2D-DCT. This commit fixes
this issue.
Change-Id: Ia19e4888fda1cc929a28a5f89a5beec612d628dc
Now match the "C" version of "Fix to reduce block
artifacts from vp8 temporal denoiser."
(see change id Id9b56e59e33f3c22e79d2f89f763bdde246fdf3f)
Change-Id: I99e569bb6af4ae3532621127e12bf917a48ba08e
In the current logic, if the sse for zero motion is smaller
than the sse for new_mv (i.e., best_sse), we may still end up
using the non-zero mv for denoising (if the magnitude of new_mv is above threshold).
This can happen for very noisy content, and can lead to artifacts.
This change ensures that we always use zero_mv (over new_mv) for
denoisng if sse_zero_mv <= best_sse.
Change-Id: I8ef9294d837b077013b77a46c9a71d17c648b48a
This commit enables SSSE3 implementation of the inverse 2D-DCT
with only first 10 coefficients non-zero. It reduces the runtime
of SSE2 version from 745 cycles to 538 cycles, i.e., 27% speed-up.
Change-Id: I18ba4128859b09c704a6ee361d69a86c09fe8dfe
This code dates from the ancient past and
applied an error score weighting based on pixel
brightness. This not seem to be providing any
benefit metrics wise and could be making some
visual issues in dark frames worse.
The field is left in place in the FIRSTPASS_STATS data
structure in this patch, pending changes to unit tests that
use a pre-defined first pass file.
Change-Id: Id50f04205230234858e7548ce523f11acaf3567d
The subpixel SSSE3 was fixed in this patch:
https://gerrit.chromium.org/gerrit/#/c/70283/
So the equivalent AVX2 is fixed accordingly.
Change-Id: Ieebbc1949c99d34b12b8b47692df71aca5001f3a
The previous change only tunes forward transform. It doesn't affect
the neon implementation of the inverse transform. Hence turn the
unit test on.
Change-Id: I4f0f43783b98814d1eee53182209f9669d538140
This commit enables the SSSE3 implementation of full inverse 16x16
2D-DCT. The unit runtime goes down from 1642 cycles to 1519 cycles,
about 7% speed-up.
Change-Id: I14d2fdf9da1fb4ed1e5db7ce24f77a1bfc8ea90d
The intepolation filter functions can be better tested withe extreme
values, especially given the optimization functions are prone to
overflow signed 16 bit intermediate value when operation order is
wrong.
Change-Id: I712142b0bc1e5969c692c0486a57ffa37c9742b5
Further changes to first pass allocation for gf/arf groups.
Three variables removed from TWO_PASS structure as only
now used locally. Dont adjust gf_group_bits in the post
encode update as this will no longer have any effect.
Change-Id: Iff89b225db923fc856f5d2aedbc899f1d7d68b55
In 8-tap filtering, to guarantee the intermediate results fit in
16 bits, the order of accumulating the products needs to be done
correctly, and the largest product should be added last. This
patch fixed the problem using the method in commit "Correct ssse3
8/16-pixel wide sub-pixel filter calculation".
Change-Id: I79d0ad60c057b15011ece84cda9648eee0809423
Restructuring to allocate the bits for each frame in
a GF group at the time the group is defined.
At the moment the allocation closely mirrors what
we had before.
Also changes the default rate adjustment method to
LONG_TERM_VBR_CORRECTION.
Change-Id: Ie5793c46c6b9c888cead5d8790792efd7d60b7c1
Fixes a bug introduced in
https://gerrit.chromium.org/gerrit/#/c/69779/13, where
uninitialized frame buffers due to corrupt and short
buffer sizes, may cause a crash.
This patch fixes the currently failing
video/processing/static_image/vp8_convert_test
Change-Id: I1b09e21482f292c11a2bfb4e570aef1d643410a7
As mismatchs were found between the intrinsic version and c only. The
commit temporarily revert to use the matching assembly version to
allow further investigation.
Change-Id: I08436c47d4888b562c0eac8e8856d90a831442df
Use the appropriate subblock offset mode info rather than the parent
block base, when filling mbmi in the pc tree in nonrd_use_partition.
This mimics what is done in the vertical case and what is done for
both cases in nonrd_pick_partition.
This change has little practical effect at the moment since in speed 5
rt horizontal and vertical partitions are currently only used unpaired
at edges of the picture.
Change-Id: I4632f66ca84086dac56c7d36b45ddbe38a06f42a
This did the same correction as the one in commit "Correct ssse3
8/16-pixel wide sub-pixel filter calculation" to avoid saturation
during filtering.
Change-Id: Ife9aa3f62daf9114eb24fe38f7baa3c3f361b2d6
If we are already saving a lot in bits from the target (maximum)
bitrate in the constrained quality mode, allow the quantizer
to go lower than the cq level. This hopefully will solve issues
with getting too low a bitrate and consequently poor quality for
certain videos in cq mode.
Change-Id: I1c4e8b0171fcf58f95198b3add85eea5f3c8f19f
If the denoiser filter causes too big a change in the absolute pixel difference
(between source and denoised signal), the block is not denoised, which can cause
visual block artifacts. This change applies a second adjustment to the temporal filter
to effectively allow for a (weaker) denoising for such blocks (which can keep
the absolute differnence within the tolerance range in most cases).
This helps to reduce some of the block artifacts from the denoising.
The additional cost of re-applying the filter to this set of blocks is low,
as the percentage of blocks per frame (with too big a change in absolute pixel difference)
is typically small, 2-5%.
Change-Id: Id9b56e59e33f3c22e79d2f89f763bdde246fdf3f
Renames all x86_64 specific assembly files to consistently
end in _x86_64.asm. This will be useful for build systems to
handle these files differently.
All new 64-bit specific assembly files should use the new
naming convention.
Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536
This commit changed to enable the encoder to adjust motion dection
speed threshold based on picture size. In addition, cpu-used 1 now
does a partition search every other frame instead of every third
frame for low resolution inputs.
The change has no quality/speed impact for 720p and above. Test
showed the change increase encoding time by between 3% to 6% for
cpu-used 2 encodiong of 360p sequences. It also has a compression
gain about .3%.
For cpu-used 2, the change resolved some very disturbing visual
artifacts in certain sequences when large block partitionings and
transforms are used as a result of copying the partition from a
previous frame.
Change-Id: Ic7fd22508cdb811d4ca935655adbf20109286cfa
The final goal is eventually to get rid of both itxm_add and fwd_txm4x4.
This patch does it in the decoder.
Change-Id: Ibb3db57efbcbb1ac387c6742538a9fcf2c6f24a5
The current decode_tiles decodes the frame one tile by one tile
and then loopfilter the whole frame or use another worker thread to
do loopfiltering.
|------|------|------|------|
|Tile1-|Tile2-|Tile3-|Tile4-|
|------|------|------|------|
For example, if a tile video has one row and four cols, decode_tiles
will decode the Tile1, then Tile2, then Tile3, then Tile4.
And during decode each tile, decode_tile will decode row by row in
each tile.
For frame parallel decoding, decode_tiles will decode video in row order
across the tiles. So the order will be:
"Decode 1st row of Tile1" -> "Decode 1st row of Tile2"
-> "Decode 1st row of Tile3" -> "Decode 1st row of Tile4"
-> "Decode 2nd row of Tile1" -> "Decode 2nd row of Tile2"
-> "Decode 2nd row of Tile3" -> "Decode 2nd row of Tile4"-> "loopfilter 1st row"
Change-Id: I2211f9adc6d142fbf411d491031203cb8a6dbf6b
This commit modifies the x86inc to allow explicit local buffer
allocation and the corresponding stack pointer adjustment.
Change-Id: I3cb2174e0242b5869a4ba0ca0cd240ee066836c3
MMX variance code in vp8 was reading out of bounds..
TODO(JBB): The best fix would involve removing duplicate library
functions between vp8 and vp9...
Change-Id: I5722853a6a58d3b55257ff385fa54c773bf98ded
This commit adjusts the forward 16x16 DCT computation steps to
simplify the register level operations. It fixes the corresponding
sse2 version accordingly.
Change-Id: I72a9c25b8ca9442fc5e113f47cd701ae55aa7f08
Added a skipping test in non-rd inter-mode. After interpolation
prediction step, the residuals are tested to see if they will be
quantized to 0 based on modeling between spatial domain and
frequency domain.
Set static-thresh to 800 for >=720p and 300 for <720p, rtc set
tests showed
1. Speed 5, psnr: -0.514%; ssim: -1.748%;
speedup on related clips: 5% -11%
2. Speed 6, psbr: -0.628%; ssim: -1.637%;
speedup on related clips: 4% - 9%
Change-Id: I62fbf26bc043ecd2b584f255f1a4ee5ab52bfcf3
Use VPX_TEST_NAME instead of the script name sans path and extension
when reporting test results when the variable is not empty.
Also: Clean up some style nits while I'm at it.
Change-Id: I0319745a3b7a90d0f307e55c5108fea2204187cd
The commit changed to use memset for initialiazation of non-trivial
strucutures, where initialization using {0} caused warnings. Also,
removed {0} initializations where appropriate initialization calls
are in place.
Change-Id: Ifd03e34aa80688e382124eb889c0fc1ec43c48e6
Make all post-processor code conditionally
compilable based on the CONFIG_VP9_POSTPROC
macro.
Also, remove the vizualization code from VP9
since it is out of date and will not compile.
Change-Id: I1e9e13a09ecd43e9a3f3704c175ae8cd258ababd
disabled by default, enable with:
--enable-experimental --enable-spatial-svc
this disables vp9_spatial_svc_encoder and svc_test, further work is
needed to remove internal lib references
Change-Id: I6a487ecbf07eb98843a99d96e17f08f960b63088
vp9_block_error_sse2 can only handle 16 bytes at a time but
the function requires to handle a sequence of 32 bytes at a time
so each 16 bytes is handled in a different register.
With AVX2 optimization the 32 bytes can be handled in one register instead
of two in the SSE2
The vp9_block_error was optimized by 85%.
The user level was optimized by 1.2%
Change-Id: Ia8fffe60e61eff7432a5fbd538757894f6c319fd
New name: vpx_temporal_svc_encoder.c
Also, update comment to note that example supports VP8 and VP9.
Change-Id: I6fffab81296f918ebca740192a5c609593852dff
The various motion search functions share a
common function prototype. In the case of
vp9_full_range_search() two of the parameters
are not needed.
Change-Id: I0e190af54a3b3f276409f20e8ec55912f9b0b798
Simplify the calculation of KF bitrate in similar way
to previous patch for GF/arf.
This has no impact on derf or std hd sets but gives a
small net gain of ~0.1% for yt and yt-hd sets.
Change-Id: Ida64ac1428d9c2a62adb67056fadbf0180eff030
The variation in boost calculation for gf and arf groups
is not significant enough to justify the extra complexity.
Also removed some other spurious code that no longer
has much material impact.
The handling of the rare case, where the boost bits
number is less than the number of bits a that would
be allocated if a frame was not boosted, will be dealt
with in a subsequent patch.
This change actually helps on all sets a little by
~0.1% - 0.2% with slightly bigger gains on SSIM.
Change-Id: Id42c1ac22a80a8c4993cfa0e51bc733eb9ed4f75
As a side-effect, the max_sad check is removed from the
C-implementation of VP8, for consistency with VP9, and to
ensure that the SAD tests common to VP8/VP9 pass.
That will make the VP8 C implementation of sad a little slower
but given that is rarely used in practice, the impact will be
minimal.
Change-Id: I7f43089fdea047fbf1862e40c21e4715c30f07ca
This reverts commit 81ad047ee5.
Revert "VP8 for ARMv8 by using NEON intrinsics 15"
This reverts commit 727af7cebe.
This exposes a bug in gcc 4.9 regarding register allocation. Will reland
when 4.9 is fixed.
Change-Id: I2d8a04e4edde93719280e41550f4c0765608ec4d
The warning messages complained that there are unused arguments
in a few prediction modes. This structure was designed on purpose,
such that a wrapper function can cover all prediction mode cases
and make them readily accessible as an pointer array.
This commit silences such warnings.
Change-Id: I7036b6bdb70747e5327d8f6fceb154f100abc4c0
Allow slightly larger minq-maxq range for P frames. This improves
the compression performance of speed -5 for rtc set by 2.7% in psnr.
Change-Id: I438653d52d0fe51111509c6092e2334bac2de0cf
+ the remnants in the build system & README
the documentation that required php was removed in:
50fa585 Removing examples code generation and making them static.
Change-Id: Ibf00dca9ab2715fc21e8de358807b63d1445662c
Inline loopfilter has been already handled in vp9_decode_frame().
Collecting all similar code in one place now.
Change-Id: I358a0280fc7c2b27cca520bc1e8c16c4eb6491dd
Re-factor duplicate code.
Add two pass check for use of section_intra_rating as
it is un-initialised in the 1 pass and rt case.
Change-Id: I93120796f07961b8a21fb26e1a9f0d3d13949994
One of a series of changes to clean up two pass
allocation as precursor to support for multiple arf
or boosted frames per GF/ARF group.
This change pulls out the calculation of the total bits
allocated to a GF/ARF group into a function, to aid
readability and reduce the line count for define_gf_group().
This change should have no material impact on output.
Change-Id: I716fba08e26f9ddde3257e7d9b188453791883a3
This commit enables a chessboard pattern for partition search. All
the black blocks run regular partition search ranging from 8x8 to
32x32. The rest white blocks take the nearby blocks' information
to adaptively decide the effective search range.
The compression performance for rtc set at speed -5 is down by 1.5%.
For pedestrian 1080p at speed -5, the runtime goes from 41594 ms to
39697 ms, i.e., about 5% faster.
Change-Id: Ia4b96e237abfaada487c743bca08fe1afd298685
The test vector has segment enabled with different quantizer used for
different segments for bot the first frame(key) frame and the rest of
non-key frames.
Change-Id: I7e21122183050ee046219caba483c18cbc34afe7
Fixes the idecoder in the case where:
cm->error_resilient_mode == 0, and
cm->frame_parallel_decoding_mode == 0, but
new_fb->corrupted == 1.
The assert in debug_check_frame_counts fails to
take into account the case of a corrupt frame.
Change-Id: Idf318a68458cc88d65d6f3f408a10d8ffe87e43f
* changes:
Turn on unit tests for SSSE3 8x8 forward and inverse 2D-DCT
Change eob threshold for partial inverse 8x8 2D-DCT to 12
SSSE3 8x8 inverse 2D-DCT with first 10 coeffs non-zero
tx_mode supercedes whatever mechanism is used to push for 16x16
allowing for the use of the 4x4 transform.
Change-Id: I6c3f05ab9fe52050e40cc6303de9334653763289
We only used two members from that struct: max_threads and inv_tile_order.
Moving them directly to VP9Decoder struct.
Change-Id: If696a4e5b5b41868a55f3cc971e1d7c1dd9d5f69
This reverts commit 4725ab7e51.
The constants are necessary to avoid breakage in vs9 builds:
warning C4180: qualifier applied to function type has no meaning; ignored
error C2436: 'f2_' : member function or nested class in constructor initializer list
while compiling class template member function 'std::tr1::tuple<T0,T1,T2,T3,T4,T5,T6,T7,T8,T9>::tuple(const int &,const int &,unsigned int (__cdecl &))'
..\test\variance_test.cc : see reference to class template instantiation 'std::tr1::tuple<T0,T1,T2,T3,T4,T5,T6,T7,T8,T9>' being compiled
Change-Id: Ia218b74fc473d40f02fee84cb7009adfbe82e5a7
The scanning order has the first 12 coefficients of the 8x8 2D-DCT
sitting in the top left 4x4 block. Hence the partial inverse 8x8
2D-DCT allows to handle cases with eob below 12.
The overall runtime of the inverse 8x8 2D-DCT unit is reduced from
166 cycles (using SSE2) to 150 cycles (using SSSE3).
Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2
vp9_is_upper_layer_key_frame() definition does not match declaration--
it was missing the second const.
Change-Id: I71312579eb443be1924b8b06d8b3177c3dcb40f3
This commit enables ssse3 assembly implementation of the 8x8
inverse 2D-DCT with only first 10 coefficients non-zero. The
average runtime for this unit goes down from 198 cycles to 129
cycles (34.8% faster).
Change-Id: Ie7fa4386f6d3a2fe0d47a2eb26fc2a6bbc592ac7
Merged minq tables for arf and gf cases.
These tables were almost the same and for
VBR the arf table was not used at all.
Change-Id: Ie3c87e91dab613cf06f6945ac1ace0e0e4213d34
Small adjustment to the active Q range calculations.
These changes should slightly extend the available Q range
for KF/GF/ARF and narrow it for other frames.
The results for this change in isolation are broadly positive
for SSIM and average PSNR and slightly up but mixed for opsnr.
derf +0.293% opsnr, +1.286% SSIM
std-hd + 0.528% opsnr, + 1.746% SSIM
yt +0.056% opsnr, +0.457% SSIM
yt-hd -0.147% opsnr, + 0.226% SSIM
Change-Id: If065280342027ecc5d44b49fc1d440dfef041002
The test vector is produced to have a single key frame, with segment
map enabled and transmitted. Yet no segment feature is active.
Change-Id: I365d62f00d05c07098b9a76fc8d3a991e427ec1a
Includes changes that are not compatible with VS windows builds.
Amongst other things stdint.h is not supported in VS.
This reverts commit 89fbf3de50.
Change-Id: Ifa86d7df250578d1ada9b539c9ff12ed0c523cdd
When the variance is far less than sse, the block is considered to
be under light change. All the energy is compacted into DC coeff
and can be coded at low cost. In such situation, switch the rate-
distortion modeling from sse+var based back to variance based.
Note that this is a temporary solution to handle the rare situations
where the scene light changes.
Change-Id: I1ee0fe2b9eda6b5fac40152e1841bf23f4d229fd
This reverts commit c500fc22c1
There is an issue with gcc 4.6 in the Android NDK:
loopfiltersimpleverticaledge_neon.c: In function 'vp8_loop_filter_bvs_neon':
loopfiltersimpleverticaledge_neon.c:176:1: error: insn does not satisfy its constraints:
Change-Id: I95b6509d12f075890308914cc691b813d2e5cd9f
This reverts commit a5d79f43b9
There is an issue with gcc 4.6 in the Android NDK:
loopfilter_neon.c: In function 'vp8_loop_filter_vertical_edge_y_neon':
loopfilter_neon.c:394:1: error: insn does not satisfy its constraints:
Change-Id: I2b8c6ee3fa595c152ac3a5c08dd79bd9770c7b52
Pulling libwebm from upstream
Changes from upstream:
249629d make Mkv(Reader|Writer)(FILE*) explicit
7f3cda4 mkvparser: fix a bunch of windows warnings
5c06178 Merge "clang-format on mkvparser.[ch]pp"
4df111e clang-format on mkvparser.[ch]pp
7b24501 clang-format re-run.
c6767b9 Change AlignTrailingComments to false in .clang-format
9097a06 Merge "muxer: Reject file if TrackType is never specified"
eddf974 Merge "clang-format on mkvmuxertypes.hpp and webmids.hpp"
def325c muxer: Reject file if TrackType is never specified
41f869c Merge "clang-format on webvttparser.(cc|h)"
fd0be37 clang-format on webvttparser.(cc|h)
207d8a1 Merge "clang-format on mkvmuxerutil.[ch]pp"
02429eb Merge "clang-format on mkvwriter.[ch]pp"
0cf7b1b Merge "clang-format on mkvreader.[ch]pp"
2e80fed Merge "clang-format on sample.cpp"
3402e12 Merge "clang-format on sample_muxer.cpp"
1a685db Merge "clang-format on sample_muxer_metadata.(cc|h)"
6634c7f Merge "clang-format on vttreader.cc"
7566004 Merge "clang-format on vttdemux.cc"
9915b84 clang-format on mkvreader.[ch]pp
7437254 clang-format on mkvmuxertypes.hpp and webmids.hpp
0d5a98c clang-format on sample_muxer.cpp
e3485c9 clang-format on vttdemux.cc
46cc823 clang-format on dumpvtt.cc
5218bd2 clang-format on vttreader.cc
1a0130d clang-format on sample_muxer_metadata.(cc|h)
867f189 clang-format on sample.cpp
4c7bec5 clang-format on mkvwriter.[ch]pp
9ead078 clang-format on mkvmuxerutil.[ch]pp
fb6b6e6 clang-format on mkvmuxer.[ch]pp
ce77592 Update .clang-format to allow short functions in one line
0a24fe4 Merge "Add support for DateUTC and DefaultDuration in MKV Muxer."
11d5b66 Merge "Add .clang-format"
a1a3b14 Add .clang-format
0fcec38 Add support for DateUTC and DefaultDuration in MKV Muxer.
Change-Id: Ia0ed161ffc3d63c2eba8ed145707ffe543617976
In a future release we plan to remove the
option of setting the ARNR filter type.
This patch marks this control as being deprecated
as advance warning that it will be removed from
the API at some point.
Change-Id: I5dcca804b44c7c93b1a10da7d69d19ba6061869c
Added macro to conditionally compile some of the
post-processing functions only when CONFIG_POSTPROC
is defined.
This was causing the build for the generic-gnu target
to fail.
Change-Id: Ibfa447feceb7a0528135025f105be48f97e9965c
The rounding of the ARNR filter output prior to
normalization by the filter strength was incorrect
when strength = 0.
In this case 1 << (strength - 1) would not create the
required rounding of 0, rather it would outrange. This
patch fixes this issue.
Change-Id: I771809ba34d6052b17d34c870ea11ff67b418dab
This commit enables SSSE3 version full inverse 8x8 2D-DCT and
reconstruction. It makes the runtime of vp9_idct8x8_64_add down
from 256 cycles (SSE2) to 246 cycles.
Change-Id: I0600feac894d6a443a3c9d18daf34156d4e225c3
The microsoft build tools explicitly disallow building for arm in
the "desktop" target configuration; one has to target "Windows
Store" apps (aka WinRT/Metro) or Windows Phone. In Visual Studio
2012, one could just pick the v110_wp80 toolset which made the
vcxproj files buildable. In Visual Studio 2013, picking the v120_wp81
toolset isn't enough - one has to configure the vcxproj files
as an "AppContainerApplication". This has the implication that
you can't just build a plain .exe (such as the examples) - an .exe
project would need to have an AppxManifest file. Therefore we can
only build the library itself.
If loaded into Visual Studio for Windows (the Windows Store/Phone
version of Visual Studio, not the Desktop one), the obj_int_extract
project is omitted since it's treated as incompatible. Building
from the command line with msbuild works fine though.
The armv7-win32-vs12 target was added as part of a638bdf4 even
though actual use of it hadn't been tested.
Change-Id: Iee8088252cf790317aeb6b417d29058225f1f629
The getenv function doesn't exist there. In Visual Studio 2012,
the function still existed in the link libraries even though
it was hidden in the headers, but in the 2013 version it has been
removed from the link libraries as well.
Change-Id: Iea6289a698fa1788e906f5aabb6fddda3675815b
Disable register checks when vp9 is not configured. Soon vp8 assembly
will move to intrinsics, obviating this check.
This will still run the check when vp9 is enabled.
Change-Id: I90f50d22cb8c15e9c07f2c8e830e08de7fce0689
This does not do the full toolchain setup like the arm builds. It only
allows for ndk-builds. See the instructions in tests/android/README or
the webm jnin bindings project:
https://chromium.googlesource.com/webm/bindings/+/master/JNI/README.Android
Because this support is not quite polished, the build targets must be
forced. Please use
--force-target=x86-android-gcc --disable-ssse3 --disable-sse4_1 --disable-avx2
--force-target-mips-android-gcc
Change-Id: Ie2b6623f71ac816e3965c39bf97097e9d30b6e94
When ARNR filtering is disabled, by setting
arnr_max_frames=0, mode_skip_mask was being set to
-1 for the ARF frame resulting in no mode being
selected for the block.
The intent is to restrict the reference frame to the
previous ARF frame and the mode to one of ZEROMV,
NEARMV or NEARESTMV.
Change-Id: Ifc3920b153142cd01d422910c94d2f20ffb6f129
On balance Deb's modified rate control for VBR seems
to be outperforming especially on some low motion YT
clips so I have switched this to be the default mode for
now.
Change-Id: I0713d430cad6425ac5c48fccdf332e12814ee44a
Used horizonal add instructions instead of adding
byte lanes. The encoder performance improved by
~4% for the test clip used.
Change-Id: Iaddd10403fcffb5b3f53b1f591ab2fe0ff002c08
This patch did a cleanup following the commit "Save NEON registers
in VP8 NEON functions". The pushing/poping of callee-saved NEON
registers was moved into individual NEON functions. Therefore,
we don't need to save those registers at the beginning of codec.
The related code was removed.
Change-Id: I5648166514fc9beffb780aa138495597731f49ea
Assembly implementation of ssse3 8x8 forward 2D-DCT. The current
version is turned on only for x86_64. The average unit runtime
goes from 157 cycles down to 136 cycles, i.e., about 12.8% faster.
This translates into about 1.5% speed-up for pedestrian_area 1080p
at speed 2.
Change-Id: I0f12435857e9425ed7ce12541344dfa16837f4f4
This reverts commit 59e733ca81.
Hold off removing arnr_type to give users the opportunity
to change their script files to handle its deprecation. A
follow-up patch will mark the control for setting arnr_type
as deprecated and it will be removed completely in a later
revision of the code.
Change-Id: I8b817c744e144d3714234a4cd4309816d0c7e3e8
The recent compiler can generate optimized code that uses NEON registers
for various operations besides floating-point operations. Therefore,
only saving callee-saved registers d8 - d15 at the beginning of the
encoder/decoder is not enough anymore. This patch added register saving
code in VP8 NEON functions that use those registers.
Change-Id: Ie9e44f5188cf410990c8aaaac68faceee9dffd31
dist is broken in msvs currently due to a dependency on libs.mk which in
turn depends on the rest of the source tree, not just the examples
Change-Id: I3e313ceeae81eb29ef4bfb099d89756b43583eaa
This member of VP9_COMP seemed unnecessary since it
only shadowed VP9EncoderConfig.key_freq that is
accessible through VP9_COMP.
Change-Id: Ib751bb1cf1b0b3c50a2a527d7c34f6829dd6fee3
There are a few tests which read/write directly to/from WebM files. They should
be disabled when --disable-webm-io is passed.
Change-Id: Ibac4732e27c66da33082151ba6e6993eaa9a1efd
The global variables used in vpxdec.sh and vpxenc.sh have become useful
elsewhere: Define them in tools_common.sh instead.
Change-Id: I5b8dbd2e88c8d6b2f46c5c55d7711fa154c12b6a
The encoder was not handling requests to place keyframes at
fixed intervals, i.e. kf_min_dist == kf_max_dist, correctly.
In this case when looking to place the next keyframe it was
accumulating stats all the way up to the end of the firstpass
file. This patch corrects this behavior.
Change-Id: I948ad9f1d7faa0c05861df588136cce3bb61d7e7
This commit introduces a chessboard pattern search for the prediction
filter type search. It runs extensive search in alternate blocks and
allows the rest blocks to refer coding decisions of their nearby
neighbors.
For pedestrian 1080p at 4000 kbps, the runtime of speed -5 goes down
from 43990 ms to 42200 ms. The overall compression performance for
RTC set is changed by -1.37%.
Change-Id: Icfe220c49451cda796f0ca91d935c9ed01e56c9d
ARNR filtering is now forced to be centered on the ARF
frame and the other two options have been removed.
The other modes of constructing the ARNR frame were
not used and there does not seem to be any good
reason to maintain them.
This is purely an encoder-side change.
Change-Id: Ic772636d23f280752973852b9740083532a49de2
This patch fixed errors reported in Issue 746: "dr memory VP8
encode errors" and Issue 745: "dr memory VP8 decode errors".
The "UNINITIALIZED READ" errors were fixed in x86 assembly
code. The list of files fixed is
vp8_intra_pred_uv_tm_sse2
vp8_intra_pred_uv_tm_ssse3
vp8_intra_pred_uv_ho_mmx2
vp8_intra_pred_uv_ho_ssse3
vp8_intra_pred_y_tm_sse2
vp8_intra_pred_y_tm_ssse3
vp8_intra_pred_y_ho_sse2
Change-Id: Ib6df7bf1d442077fe534edfd90e50ad16fadacdd
For speed 3 and above, such search is only allowed at speed 3.
The change helped cif and stdhd set by 1.2% and .7% in compression,
but increased the encoding time by around 5%.
Change-Id: Ifa4832327f1c1bef3decb032ceb769cbf50e059f
Adds test code to verify that supplemental superframe information
that precedes the normal superframe information will not break
decoding.
Change-Id: Ia252b887d7ee138f51dc9a778376ff739402c455
The end_useage parameter is confusingly named since it
now actually defines the rate control method used.
Change-Id: I98912caabfe556b7af0b939a645d1336409e4d71
This commit enables a background detection approach for adaptive
quantizer control. It combines the cyclic refresh pattern and the
background information to determine the segment id for adaptive
quantizer selection, prior to the non-RD mode decision process.
It hence allows proper quantization information update for a more
precise rate-distortion modeling in the non-RD mode decision.
The compression performance of speed -5 for rtc set is improved
by 2.5%, at no speed change.
Change-Id: Ic3713e8ed9185b403b5b1679d19dabd57506d452
1. We didn't scale source image in lower layers so that
the stats are incorrect.
2. We didn't extend borders for re-constructed image.
Change-Id: Ia8d7bafbdb695ffa7f504e171f9449812e7bb0a3
Remove duplicate WebM parsing code in test/webm_video_source.h and linking it
against webmdec.c which does the exact same thing.
Change-Id: Ib7152eecde890fca58be42028cab18c9cb54221c
To make direct side by side testing this patch combines two
VBR corrections schemes to allow more direct side by side testing.
(The other patch was by Debargha chg id I0cd1f7...)
Change-Id: I271c45e5c4ccf8de8305589000218b80d9dc3a25
The background detection only tracks luma component. This commits
removes the frame buffer pointer retrieval for chroma components.
Change-Id: I098bd2950f5e5829ed5dc2b48568167248da7fad
This patch sets up a quad_tree structure (pc_tree) for holding all of
pick_mode_context data we use at any square block size during encoding
or picking modes. That includes contexts for 2 horizontal and 2 vertical
splits, one none, and pointers to 4 sub pc_tree nodes corresponding
to split. It also includes a pointer to the current chosen partitioning.
This replaces code that held an index for every level in the pick
modes array including: sb_index, mb_index,
b_index, ab_index.
These were used as stateful indexes that pointed to the current pick mode
contexts you had at each level stored in the following arrays
array ab4x4_context[][][],
sb8x4_context[][][], sb4x8_context[][][], sb8x8_context[][][],
sb8x16_context[][][], sb16x8_context[][][], mb_context[][], sb32x16[][],
sb16x32[], sb32_context[], sb32x64_context[], sb64x32_context[],
sb64_context
and the partitioning that had been stored in the following:
b_partitioning, mb_partitioning, sb_partitioning, and sb64_partitioning.
Prior to this patch before doing an encode you had to set the appropriate
index for your block size ( switch statement), update it ( up to 3
lookups for the index array value) and then make your call into a recursive
function at which point you'd have to call get_context which then
had to do a switch statement based on the blocksize, and then up to 3
lookups based upon the block size to find the context to use.
With the new code the context for the block size is passed around directly
avoiding the extraneous switch statements and multi dimensional array
look ups that were listed above. At any level in the search all of the
contexts are local to the pc_tree you are working on (in?).
In addition in most places code that used to call sub functions and
then check if the block size was 4x4 and index was > 0 and return
now don't preferring instead to call the right none function on the inside.
Change-Id: I06e39318269d9af2ce37961b3f95e181b57f5ed9
There is no need to initialize source/dst frame buffers at frame
level. These will be done at block coding stage. This commit hence
removes the redundant operations.
Change-Id: I11d9f2556058c6205c8e58ed53e31f78622c41b7
Add code to monitor over and under spend and
apply limited correction to the data rate of subsequent
frames. To prevent the problem of starvation or overspend
on individual frames (especially near the end of a clip) the
maximum adjustment on a single frame is limited to a %
of its un-modified allocation.
Change-Id: I6e1ca035ab8afb0c98eac4392115d0752d9cbd7f
This commit compares the current original frame to the previous
original frame at 64x64 block level and decides if the entire
block belongs to background area. If it is in the background area,
skip non-RD partition search and copy the partition types of the
collocated block in the previous frame.
For vidyo1 in the rtc set, this makes the speed -5 coding speed
about 8% faster. The overall compression performance is down by
1.37% for rtc set.
Change-Id: Iccf920562fcc88f21d377fb6a44c547c8689b7ea
This commit added a check of reference frame to make sure that pre
buffer pointers are initialized only when necessary and make them
to 0 if ref frame is intra, hence those buffer should never be used.
Change-Id: Ieb474fcd9feb759f02e2f9c282b7348a8fa31117
Delete code relating to the old VP8_TUNE_SSIM flag
as this code does not currently work and is largely made
redundant in VP9 by the various AQ modes.
Change-Id: I71f28e1f680573d296422254489000678552b17b
Remove duplicate rd_thresh code introduced when vp9_rd_pick_inter_mode_sub8x8()
was forked from vp9_rd_pick_inter_mode_sb().
Change-Id: I3c9b7143d182e1f28b29c16518eaca81dc2ecfed
Fix rate control bug whereby the rate factor heuristics
were being updated on arf overlays causing a rate surge
for a few frames followed by a corrective drop.
This fix eliminates many of the overshoot problems that
we were seeing on hard clips (even without applying
stricter vbr rate control) and also helps quality on
almost all clips with some hard clips improving by >5%.
Overall quality results measured at speed 2.
Derf +1.78% opsnr , +2.44% SSIM
Stdhd +2.41% opsnr, +2.85% SSIM
Change-Id: I2369df6295c2705963fa6307877f6acb304bcc39
Fix return values for webm_read_frame so that we can distinguish between
error and end of stream. 0 - Success, 1 - End of stream, -1 error.
Change-Id: Ic35d0c7d7a166e027711a3d2300ecdda25a5d0cc
We don't use declarations from this file. The real declarations
(differently named) are in vp9_rtcd_defs.pl, e.g. vp9_full_search_sad.
Change-Id: I73cbf064305710ba20747233cfdbe67366f069a0
Added command line flags "resize-width" & "resize-height"
to allow the user to specify the frame size to encode at.
These two flags are ignored if the "resize-allowed" switch
is not set to 1.
All frames in the clip are then encoded at this size, which
must be smaller than the raw frame size.
Change-Id: I3d64bd9303d5c0bd678461a866a1ea621700d744
A previous path improved speed 2 quality a little but
more extensive testing showed that it slowed encode
by a few %.
The change will have a similar effect for speed 3 but
should not impact speeds 4+;
This experiment should reverse that and give a speed
up at the cost of a small quality loss.
Borg results pending.
Change-Id: I4493fc1541aaf44587f1a41ff219f7088da9252c
Both values are already checked as command line arguments:
RANGE_CHECK_HI(cfg, g_lag_in_frames, MAX_LAG_BUFFERS);
RANGE_CHECK_HI(extra_cfg, sharpness, 7);
Change-Id: I584798d587152d88dfd517c210054b466f4e5f8a
With a more approriate one vp9_setup_src_planes() as only src buffer
pointers need to be initialized here.
Change-Id: I40fac4d8b2d39eb7d0c65b9b6afab45138a13936
This increases the range of Q values available to
normal inter frames to allow encoder a better chance
to hit the target rate.
Change-Id: I33cd96469a46577fdcea631e26d3355710909e6d
The limits applied under the flag
"LIMIT_QRANGE_FOR_ALTREF_AND_KEY"
behaved in an undesirable way if the gap between
active_worst_quality and active_best_quality was
changed.
In this patch, the adjustment is made using the
vp9_compute_qdelta_by_rate() function and fixed
rate multiplier values. Hence it is not impacted by
the relative value of active_best_quality.
Change-Id: I93b3308e04ade1e4eb5af63edf64f91cd3700249
The cause is because VP9 encoding use vp8_vpxyv12_extendframeborders_neon
on arm which only extend boarder size 32. But VP9's border size is 160
Change-Id: I1ff7e945344a658af862beb1197925e677e8ff57
Problem has been introduced recently with the cleanup patch
I0816ec12ec0a6f21d0f25f10c214b5fd327afc6c
Change-Id: Iaacb956a6039eb5826b82618dc03be32053fb892
This commit changed the initialization of best_mode_index to -1 to make
sure it is not mistakenly used for mode masking.
Change-Id: I75b05db51466070dd23c4ee57a4d4b40764dc019
In mode selection loop, once mode_index pass mode_skip_start, all
modes with a different reference frame from current best mode are
masked out using mode_skip_mask.
However, the setting of mode_skip_mask may use an invalid mode if
there is no mode tested yet. This commit fixes the issue by making
sure a mode has been tested and selected. Otherwise, no mode will be
masked out because of their reference frame.
Change-Id: Ib0009e8a96836a65cf5347440fff8a2e1a67f29f
This patch fixed the uninitialized read errors in Issue 748:
"dr memory VP9 encode errors". In vp9_convolve_avg_sse2,
when width is 4, pavgb reads 8 bytes from dst buffer that is
out of range. An error is reported although the data is not
actually used later. This issue was resolved by preventing
uninitialized reads.
Change-Id: I109a54910aa47139cb13119de86f2062cff207df
The macosx release of clang v5.0 identifies itself as:
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
This version of clang uses the older _mm_broadcastsi128_si256, like
v3.3, as given away in the LLVM svn version above.
Change-Id: I4d6d59d5454efd57d2ae9e75f5eb7486af7cbd0c
Calculate the difference variance between last source frame and
current source frame. The variance is calculated at 16x16 block
level. The variances are compared to several thresholds to decide
final partition sizes.
An adaptive strategy is implemented to decide using
SOURCE_VAR_BASED_PARTITION or FIXED_PARTITION based on motions
in the video. The switching test is done once every
search_type_check_frequency frames.
The selection of source_var_thresh needs to be investigated
further later.
RTC set Borg test showed 0.424% overall psnr gain, and 0.357%
ssim gain. For clips with large enough static area, the
encoding speedup is around 2% to 15%.
Change-Id: Id7d268f1d8cbca7fb8026aa4a53b3c77459dc156
This commit allows the non-RD mode decision flow to select
prediction filter type in NEWMV mode. It provides 8.14% compression
performance gains in both settings of AQ=0 and 3. The current speed
impact is about 5% to 10% slower.
Change-Id: Id66ecebf77abd8f90fb3f6a066c0e8dfb4bf1c42
Adds some high-level hooks for profile 2 before further
progress on the implementation.
According to the definitiion in this patch:
1. Profile 2 only supports 10 or 12 bit color but not 8
2. Profile 2 supports all color sampling modes: 444, 422 and 420,
and alpha plane.
3. Profile 3 is currently undefined.
Please consider the definition carefully and suggest modifications
to the definition as needed.
Change-Id: I5b284fc679e54ac5aee171af72fa7994cfd28995
There was a bug with the decoder that if you started the decoder
with more threads than the first frame had tile columns. Afterwards
tried to decode a frame with more tile columns than the first frame,
the decoder would hang. E.g. run vpxdec --threads=4. The first frame
had two tile columns, then the next key frame had 4 tile columns, the
decoder would hang. If you started with 4 tiles and switched to 2
tiles the decoder would be fine. The issue is that the worker the thread
loop is using is stale.
I added a test vector "vp90-2-14-resize-848x480-1280x720.webm" that
exhibited the bug.
Change-Id: I7bdd47241a52ac0fe1c693a609bc779257e94229
Copy up to a certain bsize, otherwise set to a fixed bsize.
This helsp to reduce artifact near moving boundary caused by full partition
copy without checking motion of super-block.
This artifact can occur at speeds 3,4 in real-time mode.
Issue: https://code.google.com/p/webm/issues/detail?id=738.
Change-Id: I05812521fd38816a467f72eb6a951cae4c227931
ARF overlays now use the same rate correction factor as regular inter
frames, further testing would be needed to see if it makes sense to use
a completely separate rate correction factor for ARF overlays.
$ vpxenc --cpu-used=5 --fps=50/1 --target-bitrate=2000
parkjoy.y4m -o out.webm
=> Before: 3356 kb/s
=> After: 2271 kb/s
Change-Id: I73e4defa615ba7a8a2bdb845864f4b1721cbbffe
This commit estimates the motion vector rate cost right after full
pixel motion search. It combines this and the mode cost and compares
the corresponding rate-distortion cost. If it is already above the
current best one, skip the rest sub-pixel motion search and modeling
process. For pedestrian_area 1080p at 4000 kpbs, the speed -5 runtime
goes down from 39425 ms -> 38399 ms.
Change-Id: If4cd7119fd6c266798d5cf1d19d19ab425e52a26
Tests the basics (first confirms feature is available in vpx_config.h):
- VP8 decode (in IVF file).
- VP9 decode (in WebM file).
- VP8 encode (to IVF and WebM).
- VP9 encode (to IVF and WebM).
- VP9 lossless encode (to IVF, currently disabled due to failure).
- Pipe input (to vpxdec and vpxenc).
Test data path and path to vpx{dec,enc} have been parameterized. In
addition:
- Supports disabling tests (test names prefixed with DISABLED_ are not
run by default).
- Supports filtering tests.
vpxdec.sh: Tests vpxdec.
vpxenc.sh: Tests vpxenc.
tools_common.sh: Common test functions.
Change-Id: I0612c88b8dd6049a05bbbc79a317a0cca61733a5
Reinstates this macro and truns it on in order to avoid issues
due to some frames at the end starving in harder videos.
A more acceptable solution is in the works.
Change-Id: I3c46148e86fa6114e3fed245246fb3686a9e6700
This commit slightly increases the bit allocation for key frame. This
improves speed -5 coding performance by 2.77% with aq-mode=0 and by
2.78% with aq-mode=3.
Change-Id: Iaa3e777f80b9706306606af06e89852bac146659
in some configurations MSVS will define _off_t / off_t in wchar.h; the
former is used locally while the latter is for compatibility. this
change overrides off_t as in the past and sets _OFF_T_DEFINED to prevent
a clash in types.
Change-Id: I9b0e6db586a0a2729b545d93edfc56570d2fcf97
For real-time mode under cbr, this increases the gain (5-10%)
for speed 5 (none/little change for 6), on vc-clips.
Change-Id: I9b38beeb3c820de22c43a0ba53a9456168dd24ba
Turns off the DISABLE_RC_LONG_TERM_MEM macro and makes other changes
in the way the bits are updated, to make 2-pass rate control track
target bitrates closer.
Change-Id: I5f3be4b11c2908e6a9a9a1dd4fcf4e65531c44d8
This commit reduces the frequency of frames using finer quantizer
in non-RD coding flow, and slightly tune up the quantizer resolution
when used. It provides 1.7% compression gains in speed -5 at no speed
difference.
Change-Id: I430249a51260a841a0402666e5ec1566e4f7d5a6
Temporary revert.
Problems with conflicting definitions of type off_t
in MSVC builds that need resolving.
c:\Program Files (x86)\
Microsoft Visual Studio 9.0\VC\include\wchar.h(479) :
"error C2371: 'off_t' : redefinition; different basic types
c:\on2experimental\libvpx\tools_common.h(26) :
see declaration of 'off_t'"
This reverts commit 92a4c59112.
Change-Id: I535e40a18842a92e3e6e0b29e5fba66313010803
The new tolerance is a little higher than before (especially
for kf/gf/arf) so this change gives an encode speed up
for some clips up for speeds 0-2.
Change-Id: I63f7d6c9cc11c7f58742f41e250dcd3eab1741eb
This code/setting was actually not used (since speed features were not set on first frame,
until a recent change) and should be removed.
In CBR mode, the q value for the first frame can be controlled by setting
the target size via the parameters rc_buf_initial_sz (and max_intra_size_pct).
Change-Id: I65afc64972b36c449bd5a8c25800e65da5389066
While encoding a frame, its last frame source can be used to give
acurate motion information. This patch prevents last frame to be
overwritten so that it is available during current frame encoding.
The last source is scaled when it is necessary. cpi->Last_Source
points to the last source frame.
Change-Id: I0e1ef5e9e1d2badf9d0c7a1a44a7ed5b24c09425
Use a crude correction factor to correct for
lower compression efficiency at higher encode
speeds when estimating the max Q for the
clip.
Change-Id: I5ae377647f4adf5e91d700a8791fb3b8f70efc73
This commit optimizes the bit allocation for the non-RD coding flow.
It applies slightly better quantizer to the frames, where all blocks
run a non-RD partition search. Such frames typically have better
rate-distortion trade off, thus improving the reconstruction quality
for next few frames reference at reasonably low increment in rate
cost.
The coding performance for rtc set at speed -5 with error-resilient
tuned on and rate control set as cbr is improved by 19.58%. It improved
the coding speed by about 10% for a portion of local test clips.
Change-Id: I9d56696cd4359dc8136ca10aff10fff05aaa2686
For very large size video image, the scaling calculation may need use
value beyond the range of int. This commit upgrade the value to 64bit
to make sure the calculation do not wrap around INT_MAX.
The change corrected the decoder behavior.
The bug affects only very large resolution video because the scaling
calculation was sufficient for image size smaller than 2^13.
This resolves issue:
https://code.google.com/p/webm/issues/detail?id=750
Change-Id: I2d2ed303ca6482f31f819f3c07d6d3e98ef3adc5
This commit adjusted the speed steps in rt mode to make the steps
more evenly spaced on speed and quality, specifically:
1. Merged 3 and 4 into one single step 3 and removed confilicting
features.
2. Move 8, 7, 6, 5 to be 7, 6, 5, 4 repsectively.
Change-Id: I38d56d61531f3561d772aef953c411c8fb38c063
Root cause is the different default register length between x86
and x64 platform. Change spatial_layer_id to long long.
Change-Id: If1a5972365c7a59f7e76cb4fd714610f3d48a8ff
Small speed gain for speed 1.
Quality is generally a little up for speed 2.
Speed 3 was similar to speed 4 but now positioned more
evenly between 2 and 4 speed and quality wise.
(opsnr +5.6% ssim +8.25% across all sets)
Speed 4 is a little slower than before but sizable quality gains.
(opsnr +3.7% ssim +6.8% across all sets)
The code has been cleaned up a bit so that for each incremental
speed step changes over the previous speed step are applied.
This makes it easier to see what is changing from one setting to
the next.
Change-Id: I2d98d0d6230af23486adaec01908f58942a7cdeb
Allow tx search for ARF and GF helps quality but a little slower.
Setting subpel_iters_per_step to 1 improves encode speed.
Setting sf->mode_skip_start = 10 improves speed.
Initial local results suggest overall impact on quality is neutral
but encode is up to 15% faster.
Change-Id: Ibde02cae6626a44c10a1da0cefe888afbb51f037
Removes a TODO. Changed meaning of some parameters
(target-max-percent refresh and starting index) to be
defined relative to superblock. Also, modify turn-off condition.
Change-Id: I5e55f372b7079c24f9cdac0b06fa34620dbf456b
This commit enables the non-RD mode decision coding flow to
adaptively apply partition search in non-refresh frame, when the
collocated block in previous frame suggests there might be a motion
activity. It refactors the update_state_rt() function to support
buffer swap of mode_info struct, thereby unifying the encoding
stage across various non-RD coding modes.
It provides 5% compression performance gains in speed -6 for rtc
test set, at about 12% speed slow down.
Change-Id: Iefa374aed5a11c4b7ff9a3ed36a98ea8bd184edb
Fix so that vp9_update_segment_aq() will use the correct (i..e, chosen)
encoding mode (from ctx struct) in update_state.
Change-Id: Icc4b66f3935fad5ec4516a4d57e843d12c365e64
This commit allows the recursive non-RD partition search to early
terminate sub search tree when the cumulative rate-distortion is
already above the best available.
Change-Id: Ifdbcbb4bee229f47fde3033200829577c9f1fc1d
This commit added a speed feature to make the logic of calculating
skip_recode on a block level more explicit. This also enable the
feature to be enabled at speed 5 where the previous logic is too
conservative, help gain back the lost speed for --rt(-5).
Change-Id: Ieb37ca3e85c2e7bda343486edf13d5f5395f2233
There were a few conflicts between the new non-RD partition search
and recent clean-up patches, which were not caught by git control.
This commit fixed these issues.
Change-Id: Ieebefbd6c19d81d0d13e3c568877d5cce2ab7797
This commit takes out the if statements on using adaptive motion
search flag. This feature is automatically enabled in non-RD mode
decision flow for variable partition types search.
Change-Id: I5a25cf9109d84d07aa61b3e02c8d32dda1e90cb0
This patch fixed WebRTC Issue 3020: "Uninit error at
vp8_mbpost_proc_down_xmm". The first 8 values in d were not initialized,
but was accessed. This patch fixed c code as well as mmx and sse2 code.
Change-Id: Iaa5b41a4ed3bea971b15fb826ce34b7ab4e36fb1
This commit enables a recursive partition type search for non-RD
mode decisions. It allows the encoder to choose variable block
sizes in a 64x64 block based on rate-distortion modeling.
It improves speed -6 coding efficiency for rtc set by 2.4%. Most
of the gains come from 32-40dB range, where many sequences gain
about 5% to 20%. Local tests suggest there is no speed change.
Change-Id: I06300016e500a21652812b7b3b081db39a783d66
This commit reformats non-RD coding flow layout to allow mode
decision with fixed and variable block sizes.
Change-Id: I2cdd3bb9f26c499ee4a9849004fd925cdd195d09
2 functions were optimized for avx2 by using full 256 bit register
In order to handle 32 elements in parallel instead of only 16 in parallel:
1. vp9_sad32x32x4d
2. vp9_sad64x64x4d
The function level gain is 66% and the user level gain is ~1%.
Change-Id: I4efbb3bc7d8bc03b64b6c98f5cd5c4a9dd3212cb
Fixed dr memory errors reported in Issue 736:
https://code.google.com/p/webm/issues/detail?id=736
All elements in left_col buffer need to be initialized to ensure
the correctness of SIMD operations in x86 optimized code.
Change-Id: I8e7f26ab45cca8099c1f9342bcf852f828bda7e4
The third pred_mv is stored in x->pred_mv[ref_frame]. This commit make
sure the correct mv is read.
Change-Id: Ibed24daf36703a63f0394c87b2381ee1d2eb7910
In non-rd pick_mode code, added mode skipping according to
thresholds. Used rd thresholds now, but we can modified them
later for real-time case.
RTC set borg test showed a 0.095% PSNR gain. For different rtc
clips, the real-time(speed 7) encoder speedup is 2% - 10%.
Change-Id: Ic72535c96b891092c662453be32d3168f7e34dcc
The flag x->skip_recode interacts badly with
the cpi->sf.use_nonrd_pick_mode and
cpi->sf.skip_encode_sb speed settings.
Restricting the use of the skip_decode flag when
these other speed choices are in use helps quality
for speeds 3 and 4 by a large amount with only a
small impact on speed.
Average improvmentes for 2 pass speed 4:
Derf +8.8%
Yt + 10.53%
Std-Hd +6.95%
yt-hd + 22.95%
Change-Id: I8010876d8012042a11077c92e69d813c3dfa58eb
This commit changed how q is validated in lossless mode. With this
commit, when --lossless=1 is specificed at commandline, --min-q and
--max-q are now ignored. This is to make the option non-ambiguious.
Change-Id: I33e85690460537509d33be75d6a3597be4affc09
One of the tests for real-time mode is failing at speed 6.
Introduced recently, will enable again when fixed.
Change-Id: I8f42de6a3eca226c9aa5c5e1fab98d629993c087
Instead of hardcoding a certain indentation, use the regexp to
provide similar indentation for the new line as well.
Change-Id: Iacb2621b35ce7e1aa3980c1603b8e3ab02d98a35
This is an initial attempt to allow variable block size partition
in non-RD coding flow. It tests 8x8, 16x16 and 32x32 block size per
64x64 block, all using non-RD mode decision and the associated rate
distortion costs from modeling, then selects the best block size to
encode the entire 64x64 block. Such operations are triggered every
other 3 frames. The blocks of intermediate frames will reuse the
collocated block's partition type.
It improves the compression performance by 13.2%. Note that the gains
are not evenly distributed. For many hard clips, the compression
performance is improved by 20% to 28%. Local speed test shows that
it will also increase runtime by 50%, as compared to speed -7. It is
now enabled in speed -6 setting.
Change-Id: Ib4fb8659d21621c9075b3c369ddaa9ecb0a4b204
This makes sure that labels for data symbols directly after
functions get properly 4-byte-aligned (when the source is assembled
in thumb mode).
Previously, if declaring a data symbol directly after a function, the
symbol could end up pointing to the unaligned address (if the total
size of the thumb function didn't end up being a multiple of 4). The
data in the symbol itself ended up aligned, but the symbol pointed to
the preceding unaligned position.
That is, a source file looking like this:
---
...
ENDP
symbol
DCD 0x12345678
---
could end up being assembled into
symbol:
xxxxx2: 0000
xxxxx4: 5678
xxxxx6: 1234
(This doesn't happen if the symbol label is on the same line as the
DCD directive.)
By adding an ALIGN 4 directly after the ENDP we make sure the symbol
itself gets aligned properly.
This isn't an issue with the original, untranslated arm source,
since it only is built in arm mode where all instructions are 4 byte,
and since the gnu assembler automatically adds the padding before the
symbol even in thumb mode.
Change-Id: Iadbeebd656b0197e423e79a12a7d3ef8859cf445
1. Save stats for each spatial layer
2. Add frame buffer management for svc first pass rc
3. Set default spatial layer to 1
4. Flush encoder at the end of stream in test app
This only supports spatial svc.
Change-Id: Ia89cfa87bb6394e6c0405b921d86c426d0a0c9ae
<=sse2 isn't strictly necessary on x86_64, but this is more consistent
with the rest of the flags and should be harmless
Change-Id: Ice0f1d1c4c7510ee90af2a62dbd3d6508db63487
inheritance should be public; also correct placement of ClearSystemState
as the base class doesn't inherit from testing
Change-Id: I0f41330fccc62a70b8dd40d66bbd829b9d98cf84
This reverts commit 89025585cd.
This check breaks BSD builds and isn't useful through the configure
process. The README describes the build environment requirements (GNU
make).
Change-Id: I25f8a9c1640909412ab405dbd09a1c4d93e5a511
The use of uninitialized skip flag will trigger inconsistency in
coding statistics, when alternate RD and non-RD coding modes are
enabled. This commit fixes this issue and removes unnecessary if
statements from update_state_rt.
Change-Id: I7d549dcb0e3ef48b999e5bbc78174ba84502cfcf
The comment made it look like the condition code was dropped from
the extra add instruction, while it actually was handled properly.
Thus, the comment was misleading while the code itself did the right
thing.
Also clarify the comment indicating that we use the full three-operand
form of the add instruction.
Change-Id: I2c1ac6ac4fedf262d104ea30a6c005febc74de9c
Adding a --(enable|disable)-webm-io flag to control WebM container input and
output support. For now, enabling WebM IO by default only when there is a C++
compiler. Doing so because eventually we will move WebM IO to libwebm and it
is built using C++.
Change-Id: I210ac36c23528e382ed41d3c4322291720481492
The block coding skip flags are assigned in the normal RD mode
decision loop. They are then used in the final encoding stage.
In the non-RD mode decision, the forward transform and quantization
stages are replaced by modeling based on SSE and variance of
prediction residues. This commit applies reset to this array in
the non-RD coding mode.
Change-Id: I66584669b035e9c8ac23e95047849ff277472742
This commit moves the position where rdmult is saved to make sure it
is the correct value. Prior, an uninitialized value may be saved and
restored.
This addresses issue:
https://code.google.com/p/webm/issues/detail?id=733
Change-Id: I436407f289169bc63da3c5a6bf609bed16cb71b5
This commit unifies the non-RD partition use cases for both fixed
and variable block sizes. Deprecate and remove the separate function
for fixed partition type only.
Change-Id: I2b6cb945e90c1566f985adcebc4d0757480a8004
This commit allows the non-RD mode decision process to return the
rate and distortion costs associated with the selected mode.
Change-Id: Ibe0f67d323f65839fd9cb0a726c1219bf7b55da9
Brings back most of Jim's previous patch for choosing
partitioning based on variance while making it compatible
with the current state of the code. Also adds a
nonrd_use_partition() function to recursively encode for any
arbitrary sb_type decisions within a 64x64 block; and
includes some refactoring.
Currently, when the VAR_BASED_PARTITIONING mode is turned on
for speed 7, there is a 10+% speed-up observed.
Experiments/improvements with this new partitioning method
will be conducted subsequently.
Change-Id: Ie6f43bfbde30583e941f450bf07c3b48828c9571
This commit adjusts the rate-distortion modeling for non-RD mode
decision. It puts more weights on energy from AC coefficients when
estimating the cost. The coding performance for rtc testset is
improved by 0.72%.
Change-Id: Ifa6ff11311a513ec2b10586589e82a9a21f6c61c
Assign interp_kernel value in MACROBLOCKD. This will be used to
select prediction filter coefficient sets and generate motion
compensated prediction.
Change-Id: I28c8dfb2dae6566f6939bb328aca5875c94bee65
Reimplementing sub8x8-reading of intra block modes in
read_intra_frame_mode_info() and read_intra_block_mode_info(). Code looks
more readable as well.
Change-Id: Ia42fc7d0dad708bc0c7a8bff1f8b37809b843f40
Clean-ups include
a. redundant code in rt -5 speed feature settings
b. code that guarantees square block availability in
rd_auto_partition_range()
Change-Id: Ic7b04d45b6dc15c461e0edbbb4e78aec20348291
The block size used for non-RD mode decision in FIXED_PARTITION
setting was uninitialized. This commit fixes it by setting block
size to be BLOCK_16X16.
Change-Id: Ief04c9f1ab668de69297d9ab3dc15e2fa0bc4e95
Neon code unit test is failing now due to save/restore neon register
operations are not done inside this function, but outside of it. Disable
it now until VP8 neon code get cleaned up.
Bug: 725
Change-Id: Id1ff1ef50a0e894b41c820a310ff8ba31ef12d18
translates to TreatWarningAsError (/WX)
setting this via the CL environment variable is not possible due to the
/WX- default which is used on the command line
Change-Id: I0b42a9d3ca9eba6af82c25b8e434baa2fcb00156
Adds a fast diamond search which is about 5% faster than FAST_HEX
with only a 0.1% drop in psnr when turned on for both speeds 5 and 7.
This search is turned on for speed 7.
Change-Id: I497630aa88a5148926086bb3038e7975e5f4eb98
This commit allows the non-RD mode decision to skip mode RD modelling
check, if the motion vector associated with the current mode is
same as that of NEARESTMV mode. This makes speed -7 about 2% faster.
Previous change that converts cost metric from SAD to model based RD
value makes the codec 6% slower at speed -7.
Change-Id: I30cfec5452f606a671b8432a2f7f0c94fbb49fc8
For some dimensions, neon code ends up in a dead loop inside.
This will fix the unit test failure in svc_test on ARM.
Change-Id: Ie6098bfaefd86bcf3616a3d0c2c3ff0b154222b5
The rate-distortion model in non-RD coding mode is only applied to
luma component. This commit removed a few redundant addition steps.
Change-Id: Id8edc0a47c2dbef8deba43debe2c95db39454de3
This commit replaces SAD cost with modeled rate-distortion cost
for non-RD mode decision. It translates the prediction residual
SSE into estimate rate and reconstruction distorion costs, hence
capturing the quantization setting effect. The compression
performance of speed -7 for rtc set is improved by 14.79%.
Change-Id: Ifda014eb0501d13109fe7f92680bf1410b463632
fixes issue #711
specifying a multiword CC, e.g., CC='gcc -m32', would cause the failure
under dash
reported in
https://bugs.gentoo.org/show_bug.cgi?id=498136
patch by floppymaster at gmail dot com
Change-Id: I2ba246f765646161538622739961ec0f6c2d8c2d
clang on macosx does not support -Wunused-but-set-variable; adding the flag
causes additional warnings about the flag. As a more generalized fix, use
-Werror when checking compiler flag support in order to avoid using
unsupported warning flags.
Change-Id: I2529862e211f880d56491eac3b9fa90fff1aa5c3
clang reports gcc-4.2.1 in e.g., 3.3, 3.4; add a specific clang version
check for _mm256_broadcastsi128_si256
fixes issue #720
Change-Id: I5c8e3c27fdea05d8a5b050e8cb74894b595f4709
prevents out of tree build failures when the source tree has already
been configured; modeled after a similar check in autoconf
Change-Id: I627eb7243576f4d753141dfcb4ed4e34544d03a7
Configuration logging is passed through pr, but nothing configure
does actually requires pr. Use cat instead.
Change-Id: I451217882a329c2bfb8942ac86ac624a7feef670
The only difference between two examples was usage of VPX_EFLAG_FORCE_KF
flag for frame encoding. Moving this functionality into simple_encoder
with additional command line option.
Change-Id: Ia3c4209be073eeb541d4ac6b41bd0f12812f6676
avoid building x86inc.asm, x86_abi_support.asm and vpx_config.asm as
they provide no symbols themselves
fixes:
warning LNK4221: This object file does not define any previously
undefined public symbols, so it will not be used by any link operation
that consumes this library
Change-Id: Iecfe03aa76efbfc07c2af5b91ba5405634e45f1d
Set speed features before running frame encoding. This avoids
redundant RD threshold calculation in key frame coding.
Change-Id: If8e3cf2c02976baa59b310c1c23af9eea0c46e36
* Remove all non-DC intra modes for BLOCK_32x32 and up
* Remove all intra modes for blocks bigger than BLOCK_32x32
* Remove ZEROMV for BLOCK_32x32 and up
* Only consider NEARESTMV for blocks bigger than BLOCK_32x32
Change-Id: Ia18351a238213e2f072f9e481d622949346a245f
+ nestegg_track_codec_data
quiets uint64_t -> size_t warnings
the sizes used are previously validated against their associated LIMIT_*
values
Change-Id: Ie574a3a7496d0143bd58b778145c27f38dd6a4da
- Change type of encrypt_buffer() offset argument to ptrdiff_t, and change the
type of the size argument to size_t.
- Update size argument encrypt_buffer() in vp8_boolcoder_test.c with
same.
Change-Id: Ie29c7c82c73318bee01b89c6fb4c4e1442eef03c
The core motion estimation fucntions all return sad now consistently.
The only exception is vp9_full_pixel_diamond(), however the core diamond
and refining search routines called from vp9_full_pixel_diamond() also
return SAD. If variance of pred error + mv cost is desired it must be
calculated explicitly outside these functions. For very fast encoding,
hopefully this will eliminate some redundant computations.
Also suggests reimplementing FAST_HEX with the vp9_pattern_search
framework. It is not exactly the same as the existing FAST_HEX, but
performance is slightly better and speed is very similar. Enables
removing a lot of duplicate code.
Change-Id: I152736393438c25bdf7e96b37cbb8ce330f4f94a
significantly speeds up file generation.
the goal of this change is to convert rtcd.sh to perl as directly as
possible to allow for simple comparison. future changes can make it more
perl-like.
---
Linux
[CREATE] vpx_scale_rtcd.h
real 0m0.485s -> 0m0.022s
[CREATE] vp8_rtcd.h
real 0m4.619s -> 0m0.060s
[CREATE] vp9_rtcd.h
real 0m10.102s -> 0m0.087s
Windows
[CREATE] vpx_scale_rtcd.h
real 0m8.360s -> 0m0.080s
[CREATE] vp8_rtcd.h
real 1m8.083s -> 0m0.160s
[CREATE] vp9_rtcd.h
real 2m6.489s -> 0m0.233s
Change-Id: Idfb71188206c91237d6a3c3a81dfe00d103f11ee
* speed improvment of 30 percent achieved
* multiplies and adds remain the same
* non-arithmetic instructions minimized by hand, by:
-expanding 2 pass loop
-removing irrelivant "shuffles"
-combining last two rounding steps
* further improvments may be possible
Change-Id: Idec2c3f52910c48e6a0e0f9aefed5cae31b0b8c0
This patch adds a new speed feature which doesn't do the rather
expensive entropy context lookup or save to the table, while
doing costing.
The speed up on desktop36p.y4m is around 10% other clips much less.
On the RTC test set this was + 1% in overall datarate.
Change-Id: Ia5144bbf45270671e7be9c8e4055369909e2f738
This gets more accurate mode hit stats. It's also the first step to
handling ZEROMV not being allowed more intelligently.
Change-Id: I5de6734507b5177bf73e9ddbad923f218c39f3e4
intra_y_mode_mask is already enforced for the sub8x8 case.
intra_uv_mode_mask is already enforced for all sizes.
Change-Id: Ia9dd14701cb49873c2e8f24eb5f8b255eaf76a1f
The function has evolved over time, now only calls vp9_rtcd(), so this
commit removes the function and changes to call vp9_rtcd() directly.
Change-Id: I8cfa6190daa4b28f6f3d1e11bb3a07f9c95322bf
Optimizing 2 functions to process 32 elements in parallel instead of 16:
1. vp9_sub_pixel_avg_variance64x64
2. vp9_sub_pixel_avg_variance32x32
both of those function were calling vp9_sub_pixel_avg_variance16xh_ssse3
instead of calling that function, it calls vp9_sub_pixel_avg_variance32xh_avx2
that is written in avx2 and process 32 elements in parallel.
This Optimization gave 80% function level gain and 2% user level gain
Change-Id: Iea694654e1b7612dc6ed11e2626208c2179502c8
ne_read_block/ne_find_cue_position_for_track/nestegg_get_cue_point
in the use of ne_map_track_number_to_index
+ add a check to ensure it doesn't exceed the type bounds
fixes:
./third_party/nestegg/src/nestegg.c|1322| warning C4244: 'function' :
conversion from 'uint64_t' to 'unsigned int', possible loss of data
Change-Id: I3703d739dcf9a2d4d8e2b704e957e5e3fd80dca0
different_ref_found is always equal to one (if calculated) because
ref_frame[0] != ref_frame[1] for each mi-block.
Change-Id: Ibd7625b7b29dec2fd3c40edbc3de1169abb78585
2. Add read/write for RC stats file
The two pass RC for svc does not work yet. This is just the first
step. We need further development to make it working.
Change-Id: I8ef0e177dff0b5ed3c97a916beea5123717cc6f2
Adds a speed 8 to VP9 where only the nearestmv (0 mv) is searched.
This seems to be about the same speed as vp8 speed 5.
Adds a new speed feature to disable inter modes based on a mask for
each blocksize.
Adds code for having lower complexity motion search methods
in nonrd pick mode function, even though speed 7 still uses DIAMOND
search for now.
Also uses HEX search for speed 6 rather than FAST_HEX which improves
psnr by 0.56% without any noticeable speed drop (tested on gipsmotion).
Change-Id: Ic13176572dbd3aed5884a26786940a4b1bbd8a75
For blocks at frame boundary, the selected block size sometimes needs
to be smaller than that was first given. This commit forces such block
size change only between square blocks, so as to avoid the potential
use case containing 32x16 + 16x8 + 16x8, for 1080p sequences.
Local test suggested no visible coding speed difference. Borg test
reveals no difference in terms of compression performance.
Change-Id: Ie8de87f3c6febc3acf11b4cbfdf2077f9f6def52
This commit checks if the motion vector associated with the current
mode has been computed in previous mode tests. If possible, skip the
redundant reference block generation and SAD calculation in the
non-RD mode decision process.
For test sequence pedestrian_area 1080p, the runtime goes from
24261 ms to 23770 ms. This does not change compression performance.
The speed-up is mostly around places with consistent motion.
Change-Id: I97be63c6a2d07c57be26b3c600fbda3803adddda
previously the scale functions would always be include regardless of the
CONFIG_SPATIAL_RESAMPLING setting.
Change-Id: Ifbccf47b20689b5dd61bb3ddccd5c013297b4e05
Added fast HEX search while doing non-rd partition picking to
speed up the encoder.
Borg test(speed 7) on rtc set showed 1.8% overall PSNR loss.
Encoder speedup was 5% - 15% for different rtc clips.
Change-Id: I9c83026eabc70b69fcc747c90369ec60bfa3ca24
In handle_inter_mode, the reference frames are set in refs buffer.
One can use refs buffer directly to avoid redundant fetch.
Change-Id: I811d408cae52dcd5e053dd4bfe69550eb6a2ff56
Instead of using source variance, this patch uses variance of the
frame difference between the source and the current frame to make
fixed size partition decisions. Also disables adjusting partitioning
if variance based or fixed size partitioning is used.
The latter change improves the speed substantially for speed 6, so
that speed 7 is now less than 3x the speed of speed 6. But speed
6 is 48% better in psnr on the rtc set compared to speed 7.
As compared to speed 5,
speed 6 is -37% in psnr at about 2.5x the speed,
speed 7 is -55% in psnr at about 7x the speed.
Change-Id: If61d80431d3e04ed304ac05832e773cdb2c0a578
Begin() will be called twice with 2-pass encodes, invoking
y4m_input_open which allocates memory; close the old instance first.
Change-Id: Id252a21d286ca9ae998bd87599d43aeb8d7d77aa
FwdTrans8x8HT is disabled as the tests currently fail.
note not all functions have NEON implementations:
- fdct8x8/fht8x8
Change-Id: I028bdec9a21eaaee2c5865470ab179aac403540e
Trans4x4HT is disabled as the tests currently fail.
note not all functions have NEON implementations:
- fdct4x4/fht4x4
Change-Id: I26f8724bf2a9ea01d59205a1c57119ed25d043bc
There was a bug in the previous code that if GOLDEN was better than
LAST neither would be used. LAST would get turned off due to superior
GOLDEN quality then all GOLDEN modes would get skipped.
Change-Id: I173f3720451707dab7b2cbbe8b8e6a047089bde7
Removing all copies of identical vp8_mse2psnr/vp9_mse2psnr functions.
Using vpx_sse_to_psnr() instead in all places.
Change-Id: I15beef9834d43d8fc8a8a7a2d1fc5de3d658fed8
Usage of encode_b_args is unnecessary because encode_block_pass1() doesn't
use them. That's why optimize_init_b() call is also not required.
Change-Id: Ib6cfe4916c2ca85749c90bb0adcba6fea592f9ac
As Yunqing suggested, this commit makes non-RD mode decision always run
sub-pixel motion search in NEWMV mode. The compression performance
gains becomes fairly significant after we enabled sub-pixel accuracy
motion compensated prediction to calculate SAD cost.
For test sequences pedestrian_area at 1080p and vidyo1 at 720p, the
runtime goes slower by 5%. For rtc test set, the compression performance
is improved by 21.20%.
Change-Id: I38cbfdd5c53d79423e1fafb3154f8ddeed63bbf0
The commit change to use partitions sizes directly from last frame
for frames directly where last frame selects partitions sizes based
on coding efficiency.
On --rt --cpu-used=-5, the change hurts compression by 4% but reduces
encoding time by ~20%
Change-Id: Ia68665e5c8489b7bfcf5fac7768332fba88928e6
Modify existing test to also check the case of dropping
(i.e., skip decoding) a consecutive list of frames.
Change-Id: Ia8c1195559f952e86e6697996931d3a920c05ae3
The only difference between two examples was a setting of
g_error_resilient flag in error-resilient example. Moving this
functionality into simple_encoder with additional command line option.
Change-Id: I0245793320125926e1bf208cc1e87aef39ca478d
This commit builds the actual prediction block in sub-pixel accuracy
and uses which to calculate SAD for non-RD mode decision. In the trail
run on pedestrian_area at 1080p, rtc speed -7 runtime goes from
23495 ms -> 25107 ms (7% slower). The compression performance is
improved by 20.57% for rtc test set.
Change-Id: I438589cd103fe99f1b50c2d1939ac6ca43fa0157
Use a set of dedicated variables to buffer the current best mode
in non-RD mode decision. This allows to use mode_info for more
complicated test in the non-RD process.
Change-Id: I6024c9feb0662afd3eb29f7017f7b5a5446f303f
Adds a method for determining a fixed size partition based on
variance of a 64x64 SB. This method is added to rtc speed 6.
Also fixes a bug in rtc_use_partition() and includes some
refactoring related to partitioning search, and some cosmetics.
Currently compared to speed 5, the coding efficiency of speed 6
is -19% and that of speed 7 is -55%, in cbr mode.
Change-Id: I057e04125a8b765906bb7d4bf7a36d1e575de7c6
The optimizer did something funny with the code around
line 1412. Before the call to encode_sb split_dist was
set properly but after it was adjusted and converted to
a negative.
https://code.google.com/p/webm/issues/detail?id=714
Change-Id: I9a7631d5325ade2dc28c1030653a23eecec8721b
Change dx_time data type to int64_t to prevent
test time overflow when decoding long video.
Change-Id: I3dd5e324a246843e07e635fd25c50e71e385ed70
Signed-off-by: James Yu <james.yu@linaro.org>
If sf->disable_split_mask is DISABLE_ALL_SPLIT, disable
sf->adaptive_pred_interp_filter to avoid unnecessary operations.
Change-Id: Icb59174b2f4e9a3c3c16a696deb8018e5bd999eb
Moves the existing speed 6 to speed 7 and adds an
intermediate level 6 which is roughly in between
speeds 6 and 7 in both speed and coding efficiency.
Also includes some minor fixes/adjustments.
Change-Id: I98befc4d82d750e79fe426c457c4a2571f6b6cc7
If show_existing_frame indicates that the decoder should
display an existing (previously decoded) frame, add a
check to make sure that the signaled buffer does contain
a valid decoded frame.
Change-Id: Iac8c686b321827414d69a3f2d0467566911bcba2
for ABSDATA mode, so segment loop filter level always fall in valid
range for both Absolute and delta modes.
Change-Id: If90df3411479533dbdab63f8ae088d2f5dd174a9
The qindex for a segment was not clamped in ABSDATA mode, which may
cause invalid memory access if an ill-formed stream has a negative
value in ABSDATA mode. This commit added clamp to make sure qindex
for a segment always fall into valid range.
Change-Id: I0a74d00f4ef40aec7edaeca1d03c8645e23ab08c
Skip coefficient cost update in non-RD mode decision setting. Allow
periodical mode and motion vector cost update. Currently every other
8 frames. The increment runtime is a constant number. Hence more
visible for CIF resolution, while negligible for 1080p.
Speed -6 compression performance for rtc set is improved by 4.5%.
Change-Id: I27e0ad7c521fcc2af1d825582cbdd1a27ac4c323
This commit makes a refactoring of the rtc_use_partition. It allows
the encoder to take a preferred block size for non-RD mode decision.
The boundary blocks are handled such that smaller block sizes that
fit in the boundary size will be used instread.
In rtc mode, the coding performance of speed -6 for pedestrian_1080p
goes from
158980 b/f, 38.934 dB, 22721 ms to
159008 b/f, 40.064 dB, 23721 ms.
For rtc set, the speed -6 compression performance is improved by
26%. Still about 2dB behind speed -5 at this point.
Change-Id: If0944f0880eaf1ad340bc325d97cea8d0f9dd53f
- Update the vcxproj generator to pass the path to the batch file.
- Update the batch file the take the path to obj_int_extract.exe as arg
2.
Fixes this warning:
warning MSB8012: TargetPath does not match Linker's OutputFile property
value.
Change-Id: I5825f1d1d79f370aeb295bbd2aeb08b22c0e73ab
* Reduce the number of short cirtcuit checks by pre-computing and combining like checks.
* Postpone non-trivial initializations until after the shortcircuits are evaluated.
* Add some consts and const pointers.
No change to the actual results of the call or output of the encoder.
Change-Id: Ie44c4702aec6e08cfe0b8b0ba3cd6b57206478d1
This commit enables the use of DC, vertical, and horizontal intra
prediction mode in rtc non-RD mode decision. When the best cost value
of inter modes is above a given threshold, the encoder runs the
above three intra modes and selects the one that has minimum
prediction residual in terms of SAD.
This together with recent changes on non-RD mode decision and coding
control improves compression performance of speed -6 by
derf 91%
yt 61%
hd 46%
stdhd 52%
In terms of encoding speed, it is about 3 times faster than speed -5.
Change-Id: I6b483bfd0307e6482bb22a6676ae4e25a52b1310
When non-RD coding decision is used in rtc mode, the alt reference
is not used for inter frame prediction. This commit disabled alt ref
option whenever speed -6 is used.
Change-Id: I0b33ca03661de1db2d9bef1bcbff848cd4c9396f
Set TargetName for library builds instead of changing the value of
OutputFile.
This fixes the following warnings:
warning MSB8012: TargetPath does not match Library's OutputFile property
value.
Change-Id: I4320b6d9ea922d3a15b9823c7c6694ee33edbf45
Current setting was specific to 1 layer case.
rc_target_bitrate is total bitrate for whole stream,
so set it to ts_target_bitrate for highest/top temporal layer.
Change-Id: I83de73364956fa21c0a7c971c9f390d4840457e6
Use unsigned int instead of uint64_t for duration and deadline
arguments to functions get_frame_stats() and encode_frame().
Change-Id: I1f26a7afc38ae89916b2c67415ced26fdc9d53e7
In the first coding run of a 64x64 block, check the coding mode
for each 8x8 block. Will need a second annealing stage to decide
the partition size to be encoded.
Change-Id: Ida9417805ff3358979b0c0429d4099c023c88866
In good quality mode motion search, the best matches are normally
found after searching in a large area. In real time mode, to make
encoding fast, a center-biased fast HEX search is used, which
converges quickly most of the time. A 4-point diamond search is
also carried out as the following refining search, which gives more
precise results, and maintains good motion search quality.
At speed 5, the borg test on rtc set showed an overall PSNR loss of
0.936%. The encoding speed gain is 4% - 5%.
Change-Id: I42cd68bb56a09ca1b86293c99d5f7312225ca7ae
Run sub-pixel motion search when NEWMV gives lower rate-distortion
cost. This improves coding performance of derf set by 8%, std-hd by
2.2%.
Change-Id: Ife50f7fda8463927784fe59a41cc439c833e941a
- Rename and make static
s/vp9_compute_qdelta_by_rate/compute_qdelta_by_rate/
- Make base_q_index an integer.
- Add a cast.
Change-Id: Iea8d1397fd2717e7373b182ec51f5db960ef2cca
If MINGW_HAS_SECURE_API is defined, we don't need to declare strtok_s, but we still need strtok_r define.
Change-Id: I7cf781bb58f991a2bdce6a2ccf5082f6924579a3
these were incorrectly stripped in:
50fa585 Removing examples code generation and making them static.
Change-Id: Idb475ad5b303634311e9f616604312cb925cc6a9
Optimizing 2 functions to process 32 elements in parallel instead of 16:
1. vp9_sub_pixel_variance64x64
2. vp9_sub_pixel_variance32x32
both of those function were calling vp9_sub_pixel_variance16xh_ssse3
instead of calling that function, it calls vp9_sub_pixel_variance32xh_avx2
that is written in avx2 and process 32 elements in parallel.
This Optimization gave 70% function level gain and 2% user level gain
Change-Id: I4f5cb386b346ff6c878a094e1c3b37e418e50bde
Optimizing all SSSE3 assembly for convolution:
1. vp9_filter_block1d4_h8_sse2
2. vp9_filter_block1d8_h8_sse2
3. vp9_filter_block1d16_h8_sse2
4. vp9_filter_block1d4_v8_sse2
5. vp9_filter_block1d8_v8_sse2
6. vp9_filter_block1d16_v8_sse2
my optimization include:
-processing 2x8 elements in one 128 bit register instead of processing
8 elements in one 128 bit register.
-removing unecessary loads.
This optimization gives between 2.4% user level gain for 480p input
and 1.6% user level gain for 720p.
This Optimization is done only for 64 bit
Change-Id: Ic07fce2f9360329b4f2d956efda1480ae958766b
the remainder of the documentation will not be included in the output
unless the file itself is documented
Change-Id: I5a83a6c41cdfbf2976da288e4b70bd04002725f2
Only use layered average size if number_temporal_layers > 1.
Also removed unneeded commented-out line, and change some parameter
setting in vpx_temporal_scalable_patterns.c
Change-Id: Ic86e43e7daf0313e8c5a4aba1497299158111955
Added support for external frame buffers to libvpx's VP9 decoder.
If the external frame buffer functions are set then libvpx will
call the get function whenever it needs a new frame buffer to
decode a frame into. And it will call the release function
whenever there are no more references to that buffer.
Change-Id: Id2934d005f606af6e052fb6db0d5b7c02f567522
Prior to this commit, both encoder and decoder reset mode/mv info from
previous frame in error resilient mode to ensure bitstreams are able to
decode when there is loss of frame in decoder side. However, this is
not necessary. This commit changed to remove the reset, so encoder can
continue to use mode/mv/partition information from previously encoded
frame without affecting decodeablilty under loss of frame.
Change-Id: I0279f862900dc647fb471ae3389770bb1b9f454f
Silence signed/unsigned mismatch warnings by adding casts where
ts_number_layers does not match the sign of the variable to which
it is being compared.
Change-Id: Iab25e18c877d158b2b2b417de7da94669648b2fa
In rtc coding mode, the encoder is running non-RD mode decision. It
does not need dual buffer swap as was the case in the RD mode. This
commit initializes the internal buffer pointers outside the block
coding loop for rtc mode.
Change-Id: Ie076705c60d6b7919217e3f1dfd49e7db5064ac2
The functionalities of set_offsets() are subsumed in later
set_partitioning() and rtc_use_partition() functions, hence removed.
Change-Id: Ie514b13cb66c2379f13d0be9b1da4c12ca4581e5
Flipping arf on and off too often is hurting some clips.
This change makes no difference for 50-75% of our test
clips but helps some by a big margin. (eg. std-hd crew
by 6% and one of the YT and YT-hd clips by 14%)
Average improvements for 2 pass, speed 2 (psnr,ssim)
are as follows:-
derf 0.165%, 0.210%
yt 1.210%, 1.464%
yt-hd 1.189%, 1.471%
std-hd 1.031%, 0.886%
Change-Id: I121fe66cfb4a62d384b23b484a7d648789641969
Two convolve functions were optimized for AVX2:
1. vp9_filter_block1d16_h8
2. vp9_filter_block1d16_v8
vp9_filter_block1d16_v8 was optimized for AVX2 by reducing the number of
loop strides by half, two strides were processed in parallel.
vp9_filter_block1d16_v8 was also optimized in the same way also some of the
loads were being done outside of the loop and by that preventing redundant
loads.
This Optimization gives 43% function level gain and 1.3% user level gain.
Now can be compiled in Windows
Change-Id: I2714124cfb0c14a77d7a0ce126a20db92ffbf92c
Use size_t for DecodeFrame()'s size arg, and cast only
at the vpx_codec_decode() call site. This silences warnings that
appear in svc_test.cc when building with vs2013.
Change-Id: I2cf39f02a45732c752097f07b0c7ad414b1517d8
This function initializes the predictor buffer pointers and
calculates reference motion vectors. It is only needed in the settings
of inter frame coding. Hence removing it from the key frame coding
branch in rtc_use_partition.
Change-Id: Ic4e16c7467a5f32be4e0bf619ef9d57afb4a7075
Turns on AVX when the final characters of .c and .cc file names preceding the
.c and .cc file extension contain the substrings avx or avx2. This silences
many MSVC warnings issued during compilation files that use AVX.
Change-Id: I82bda394af7a688679abab2a50dd7e10b3cb0c7a
This function is deprecated after the re-design of partition search
that runs big block size, then four-way split, followed by
rectangular block sizes. This commit removes the related functions.
Change-Id: I417549c8e0fa3cf35bd29816b805dd4e7c3660c6
The function rd_pick_reference_frame can be deprecated. Its use was
subsumed by the adaptive motion search control.
Change-Id: Icb0c2fa335f0f06fa7b79a71f972d9fa54d750db
Removes certain cases of feedback of active_worst_quality,
and removes it from the RATE_CONTROL structure. Now active
worst quality is expected to be computed locally in the
q picking function during the encode.
Making temporal filter strength depend on avg_frame_qindex
rather than on active_worst_quality actually improves
performance esp. for yt.
derf: +0.038%
yt: +0.359%
Change-Id: I1fe5a343034b55af9322289165321f00ac0827b1
In real time encoding, we enable encode_breakout to make encoding
fast. A speed feature "use_encode_breakout" is defined to set
encode_breakout thresholds for different speeds.
However, currently, static_thresh is an encoder option. The encode_
breakout can be turned off if user sets static_thresh=0 specifically.
The rtc set borg test result: (need to set --static_thresh=1)
speed -5, psnr loss -3.543%;
speed -4, psnr loss -2.358%;
speed -3, psnr loss -0.771%.
Encoding speed test:
speed -5, 11% - 60% speedup;
speed -4, 5.5% - 28% speedup;
speed -3, 0.8% - 7% speedup.
Change-Id: Icde592ffbe77eac7446f872a2e9eb2051733677b
Aq 1 only updates segment map on kf and arf and
only uses 3 segments. With these settings AQ1 is
+ for most clips in SSIM but negative in psnr.
However, the penalty in PSNR is much less than
previously.
Old version aq1 average results for std hd
-20.899% psnr, -5.809% SSIM
New version aq1 for std hd
-3.57% psnr, +1.23% SSIM
Aq2 Now uses only 2 segments and rd.
This mode is still slightly negative for most clips on
psnr and SSIM but seems to have a much bigger visual
impact on several problem clips than aq mode 1.
Old results for std hd:
-2.578% psnr, -1.151% SSIM
New results for std hd:
-1.561% psnr, -0.85% SSIM
Change-Id: I94f57f8a73121629ce598fb921aad761c1450e1c
Some parameter changes and fixes on one-pass rate control.
derfraw300 is now only 10% below 2-pass speed 0 rate control.
Change-Id: I1940eef8a5a035dc18e71b880d5e00cabd1f01b9
This CL changes libvpx to call a function when a frame buffer
is needed for decode. Libvpx will call a release callback when
no other frames reference the frame buffer. This CL adds a
default implementation of the frame buffer callbacks. Currently
only VP9 is supported. A future CL will add support for
applications to supply their own frame buffer callbacks.
Change-Id: I1405a320118f1cdd95f80c670d52b085a62cb10d
Function encode_rtc_frame_internal() and encode_frame_internal() only
differed by a couple of speed features, this commit relocation those
difference into the setup of speed features and merged two functions
into one to remove duplication.
It also fixed a subtle bug super_fast_rtc was used before it was
initialized.
Change-Id: I234a5a1d11a4450930e5b4943dbab434208d5030
-Properly set the average frame size for each layer.
-Allow each layer to update its average/last Q stats after encoding.
-Initialize for some layer context variables.
Change-Id: Iaa37d144fcf4f30ff4283a4e8db8b9ca8bf4c815
A bug was reported in Issue 702: "SIGILL (Illegal instruction) when
transcoding with vp9 - using FFmpeg". It was reproduced and fixed.
Change-Id: Ie32c149a89af02856084aeaf289e848a905c7700
Bitwise OR operation doesn't guarantee any subexpression evaluation order.
Just reading one bit now and ignoring the next one. For reference look at
vp9_decode_frame() implementation.
Change-Id: I4971686929838ae5ded8f43a38a2934db5e1d462
Fixes some of the parameters for 1-pass non-cbr mode.
Also includes some cleanups, inlcuding refactoring of the
recode_loop options.
Results on derfraw300 improve by about 5-6%, so that the one-pass
mode is now 13% below the 2-pass mode in speed 0.
Change-Id: I844cc2638694c7574f3be00d41d60b23dc1016f0
this ensures both are properly initialized when calling _dealloc().
+ check the arrays before access
Change-Id: I789af39b41c271b5cb3c029526581b4d9903b895
This patch adds a buffer-based rate control for temporal layers,
under CBR mode.
Added vpx_temporal_scalable_patters.c encoder for testing temporal
layers, for both vp9 and vp8 (replaces the old vp8_scalable_patterns).
Updated datarate unittest with tests for temporal layer rate-targeting.
Change-Id: I8900a854288b9354d9c697cfeb0243a9fd6790b1
Update the local makefile to build all the files and the test
application by default to simplify build verification.
Change-Id: Ic10141ea14c85110ff7507447d16297b77d296e9
Right now only IVF format is supported which is enough for example code.
Other formats like y4m, webm, raw yuv will be supported later.
Change-Id: I34c6f20731c1851947587ca5c589d7856b675164
since
50fa585 Removing examples code generation and making them static.
the examples have been c files, not generated from text. this removes
GEN_EXAMPLES and replaces it with EXAMPLES, building the source directly
rather than copying it to the build folder
Change-Id: I5445bc49553419e3d2430963517d2c18cdba1f82
Inlcudes a number cleanups:
1. Moves the one-pass pre-encode parameter setting functions
to vp9_ratectrl.c
2. Deprecates per_frame_bandwidth in RATE_CONTROL structure
3. Removes target_bandwidth in cpi structure since it is not used.
4. Various renaming of functions
There is no bit-stream change in 2-pass, one-pass cbr and one-pass
vbr modes.
Change-Id: Ifd9916bf4d485b7d04c5f52044ffe6703254ccbd
This isn't strictly necessary, but makes the file more consistent
with the other arm assembly source files.
Change-Id: I245c9677d89e0ab3f31991e473764858af35b180
Avoid substitution of substrings by using \b to make sure the
substituted strings are at word boundaries.
This is an adaption of the corresponding changes to ads2gas.pl
from 7ebcaeb0fa.
Change-Id: I52160e8ba0373d4779d5fc3b0c384ca5c51c7b13
remove example files that have been tracked since:
50fa585 Removing examples code generation and making them static.
Change-Id: I9dd2e1588003918286d455c5e58a43393b176a84
older versions of visual studio did not include the trailing \. this
moves the objects to their intended location: the project subdirectory
Change-Id: I244479cdebf6b3f03bed6dbfca82e7fb4542f0de
CONFIG_USE_X86INC is available to every makefile, there's no need to
duplicate its value with USE_X86INC
Change-Id: Id12bd5f09cba78abba56ab5a8f56351562e5b8b6
avoid wrapping msvc includes with extern "C"; this breaks some visual
studio builds of the (c++) tests.
Change-Id: Ie8062d55d4f4c049f6cd360a36da6a67607df132
git diff adds the following line to diffs:
\ No newline at end of file
which interferes with diff.py parsing. diff.py only looks for '+', '-'
and ' ' at the beginning of the line.
Issue seen on https://gerrit.chromium.org/gerrit/68611
Change-Id: I0d7b4485c470e0b409f2c9cddde6c9aceba0152e
Fixes rate control partially in one-pass non-cbr case to achieve a
bitrate close to the one desired. Previous version was way off at
the high bitrate end.
Also includes several one-pass rate control cleanups and refactoring.
On derfraw300, one-pass encoding is now 19% off from two-pass speed
0 encoding, down from 35%.
Change-Id: I6f0dcdb7f8aa85a7e7cd3a3155d4f9d2a4d2f4f4
This patch added ssse3 optimization of bilinear sub-pixel filters.
The real time encoder was speeded up by ~1%.
Change-Id: Ie82e98976f411183cb8c61ab8d2ba0276e55a338
Moved a few features with low impact on compression form -5 to -4 and
increased adaptive_rd_thresh for -5.
Change-Id: Ib1b748168cc6ed7684ae4818499f3a536ae76253
The new implementation disagrees when the argument is equal to 2**n but
that is never called in practice and based on how it is used the new
implementation is correct in that case.
Change-Id: Ifbac4ad87d459fe6bd2fd0f400c0340f96617342
This avoids calls to get_unsigned_bits() with constants and
replaces hard to trace loops with simpler structures.
Change-Id: Ic1afc5a17d7df5bcfc85b76efda316b0bf118467
Using bilinear filters could speed up the codec in real-time mode.
This patch added sse2 optimizations of bilinear filters that
operate on different-sized blocks.
Tests showed that the real-time encoder was speeded up by 3%.
Change-Id: If99a7ee4385fcc225c3ee7445d962d5752e57c3f
This patch adds a buffer-based rate control for temporal layers,
under CBR mode.
Added vpx_temporal_scalable_patters.c encoder for testing temporal
layers, for both vp9 and vp8 (replaces the old vp8_scalable_patterns).
Updated datarate unittest with tests for temporal layer rate-targeting.
Change-Id: I9cb6cce2494390ae6096ee17774af7fb9308bde7
the public typedef already includes a const, quiets
'same type qualifier used more than once' warnings
Change-Id: Ib118b3b116fba59d4c6ead84d85b26e5d3ed363d
As pointed out by Dmitry and James, "partial" is a Microsoft-
specific c++ keyword, and it is renamed.
Change-Id: Ia0fc11ceb89e54b3195287f89f7e26edbbe9beb8
This commit added a logic to prevent the inter_filter type from being
changed if the default interp_filter mode is not switchable. Also, it
sets the default interp_filter to BILINEAR at very and super fast rtc
encoding modes
Change-Id: Ic41e6d31de29795a4ce536ec79afb01cab6daad3
--rt --cpu-used=-5 uses the progressive rtc mode
--rt --cpu-used=-6 uses the new super fast rtc mode
Change-Id: Id6469ca996100cdf794a0e42d76430161f22f976
Implemented parallel loopfiltering, which uses existing tile-
decoding threads. Each thread works on one row, and when that row
is loopfiltered, it moves to next unattended row. To ensure the
correct filtering order, threads are synchronized and one
superblock is filtered only if the superblocks it depends on are
filtered already.
To reduce synchronization overhead and speed up the decoder, we use
nsync > 1 for high resolution.
Performance tests:
1. on desktop:
8-tile 4k video using 8 threads, speedup: 70% - 80%
4-tile HD video using 4 threads, speedup: ~35%
2. on mobile device(Nexus 7):
4-tile 1080p video using 4 threads, speedup: 18% - 25%
4-tile 1080p video using 2 threads, speedup: 10% - 15%
Change-Id: If54b4a11960dd706c22d5ad145ad94156031f36a
* Avoid unnecessary type erasure
* Prune unused/duplicate fields from struct rdcost_block_args
* Make struct rdcost_block_args a local
Change-Id: I4f1fd4837ccd028bbfe727191ee8d69f0463b7e5
When showing a previously decoded frame, i.e. when
show_existing_frame=1, the update of the
last_show_frame flag must be disabled.
This is to ensure that the last_show_frame flag
reflects the state of the flag for the immediately
previously decoded frame rather then the value that
was forced to ensure that a previously decoded frame
would be displayed.
This patch also adds a test vector to verify that the
display_existing_frame flag works as expected. Code
for generating the test vector can be found in this
patch:
https://gerrit.chromium.org/gerrit/#/c/68581/
(Bug originally reported by Alexander Voronov
<ru.xalba@gmail.com>).
Change-Id: I731d288fba02088959f7fcc87707137fffc6acf5
Added a constant to represent the minimum KF boost
rather than using the magic number 2000 in the code.
Change-Id: I9428b61f47d26312caff81c6f9ae8587df004791
Some changes in 1ca1186 were mistakenly reverted by a later merge,
this commit re-instated the chanages from values to enums.
Change-Id: Ia6b01c31da584a1f612996e6432612c1295b9eaf
obj_int_extract.bat
this project and target still need some work to allow for concurrent
builds to succeed from the command line.
Change-Id: Ieb3bddc54636e77519083c48573909616257eb23
In this new mode, the size range is strictly determined by the min
and max partition size in neighborhood blocks.
Niklas720 encoding time at cpu-used -5 goes from 56250ms to 50676ms,
a 10% reduction.
Change-Id: I316b0e2ac967ff3fad57b28d69c0ec80b7d8b34e
Includes a few fixes and clean-ups that adds the ability
to use alt-ref frames in one-pass mode.
Whether alt-refs are actually used or not is controlled by a
macro USE_ALTREF_FOR_ONE_PASS in vp9_firstpass.c.
This first cut seems to improve derf by 15+% in 1-pass mode.
But further experiments with parameters are underway.
Change-Id: I78254421435478003367c788c7930d2dc4ee2816
This patch only works if the video is a width and height that are both
a multiple of 32.. It sets every partition to 16x16, and does INTRADC
only on the first frame and ZEROMV on every other frame. It always does
does the largest possible transform, and loop filter level is set to 4.
Was ~20% faster than speed -5 of vp8
Now 20% slower but adds motion search ( every block ), nearest, near
and zeromv
The SVC test was changed because - while this realtime mode produces
bad quality albeit quickly, it isn't obeying all the rules it should
about which frames are available.
Change-Id: I235c0b22573957986d41497dfb84568ec1dec8c7
Adds a stand-alone resize_util app for testing. The app
will not be built in the shared library configurations
so as not to require the APIs to be exposed.
Change-Id: I4718c8bff1abf4e57c2ab2d84be8738fc0048200
Adds multiple filters in the 0.5-1.0 range in the last stage
of the resize functions to prevent over-smoothing/aliasing
Change-Id: I1a615adb16f0df5095790945c94b28b4d6a6fc48
SSE for a 64x64 block with 3 planes can go as high as 3*2^28. So left
shift by 4 may overflow 32 bit int.
Change-Id: I63c84aa56894788bb987299badabbd7cc6fd0be6
The sum of squared mv components can go beyond int range for large
input resolution. This commit changed the type to int64 to avoid
overflow.
Change-Id: Ib21ea2817845cea1435f893064e6417c79c5bc64
New API is supposed to be used from example code. Current implementation
only supports IVF containers (will be extended to Y4M).
Change-Id: Ib7da87237690b1a28297bdf03bc41c6836a84b7e
Adds an arbitrary-size resize library for use in scaling of input
frames in a non-normative manner in the vp9 encoder. The method
used is as follows:
Downsampling - Uses a 8 tap filter for factor of 2 decimation upto
a size just higher than the desired size. Then interpolates pixels
at a precision of 1/32 pel using a set of 8-tap filters.
Upsampling - Interpolates pixels at a precision of 1/32 pel using
a set of 8-tap filters.
There is no assembly optimization yet.
Change-Id: Ib5b81e174fc139da322bb97c8214d52289d60d8a
Encoder's boarder is still 160, while decoder's boarder will be 32.
With on demand and separate boarder buffer for boarder extension.
The decoder's boarder does not need to to 160 anymore.
Change-Id: I93d5aaff15a33a2213e9761eaa37c5f2870747db
This commit explicitly enforces the effective motion vector range
in the motion search stage. The range needs to be the intersection
of UMV border, effective absolute motion vector value range, and
the target search area.
Change-Id: I1cf7c563e02b1086040dad6c1f4f6be1538635a6
When showing a previously decoded frame, we need to
explicitly set the show_frame flag.
For the current frame being decoded this flag is
explicitly set in the frame header.
This should fix WebM Issue 696:
http://code.google.com/p/webm/issues/detail?id=696
Change-Id: I5751a809813f88d2ca6f62c47c3878475ff9ba8d
The affect on quality was minimal. Less than .1%, various sets
yt ( +.15%), derf (-.1%), hd ( -.1% ), std hd(-.15%)...
The affect on speed of encode at speed -5 was substantial ( ~3% ).
Change-Id: I8903346fbae0c35f5b9ea20f81fdd239ae81247d
This commit deprecates the use of best_mv from encoding and bit-stream
writing stages. It hence removes the definition from MACROBLOCKD.
Change-Id: I8e5302775a2aa4a18900726df407bff881f2dfb1
Handle the non-420 case and set uv_width.
This is needed to get the correct colorspace information out of
vp9e_get_preview().
Change-Id: I62ce118cd7082708d812deb0843c1be87582e0fe
This commit removes the use of best_mv in the decoding process. This
variable can be replaced with nearest_mv. It saves a few cycles on
assigning the values for best_mv.
Change-Id: Ic183f9c1fb615c54efd7e6ccfedcf09d493435e4
This commit setups a test framework for real-time coding. It enables
a light motion search for non-RD mode decision purpose.
Change-Id: I8bec656331539e963c2b685a70e43e0ae32a6e9d
Calculate the skip_coeff as part of the encode process, rather than
checking the eobs after the fact with another pass.
Change-Id: Ib41b139e96a97dee30e4b993b4cc53d86337128d
Fixes assert that fails occasionally on small values of
max-key frame intervals. Also, adds a small change on
updating frames_to_key for frame drops.
Change-Id: Icc2b33b25e3e4ced7e49f8db73e0a887ef9c99e0
Applies an upper limit on burst bitrate for any
frame. This is to insure that typical encodes for YT
do not produce frames that are so large that they
risk stalling HW implementations. Such frames
could also cause playback problems in SW.
For now the limit is set at 250 bits per MB for 1080P
and larger (with the 1080P limit used for smaller frames).
Setting maxQ, constant quality mode or targeting a
very high bandwidth will have precedence over this limit.
Change-Id: Ie6f776c38b06ac7cec043d034085f4b79ee46a38
Putting appropriate check to open_output_file() and close_output_file().
Before that the output file has been opened twice during two-pass encoding.
Change-Id: I290cecf00513d6a31ca3f45bc20fef7efcb10190
File type check inside ivf_read_frame() is not necessary (it is done
before this function get called).
Change-Id: Iede8feb358d25878b340473d85c3b01d701fc624
There are three contributors to the definition of how the
display size is set:
(1) display width/height set in the container.
(2) display size (optional in the frame header)
(3) decoded frame size (from the frame header)
This patch modifies the way that vpxdec defines the display
size to give preference to these three criteria in the order
given above. If the container sets a non-zero size, it is
used, otherwise the display size specified in the first
decoded frame is used (if specified), with the raw
decoded frame size of the first frame used as a last resort.
The display size set in frames other than the first is
always ignored in this implementation.
Change-Id: I7e98d817d3f5894d559dd2aeb0a6cb1959b9092b
Reference frame masking helped good quality mode to gain about 5% in
encoding speed, this commit enable it for rt mode to gain the speed
improvement.
In addition, this commit move the speed feature setup to a separate
function.
Change-Id: I015e8f78bbb21dd43ae183b9b9355bea2ccda9c5
To reduce pulsing we now allow an arf just before forced key frames
and at the end of a clip or section (which may be stitched to
another clip or section). However, this does not make sense for
key frames arising from real scene cuts.
Change from original patch reflects other recent changes in regard
to alignment of gf/arf and kf groups.
Change-Id: I074a91d1207e9b3e28085af982f6718aa599775f
Introducing calc_psnr() which calculates psnr between two yv12 buffers.
Previously we incorrectly used width/height instead of
crop_width/crop_height to calculate number of samples -- fixed.
Change-Id: Iecda01980555de55ad347e0276e6641c793fa56c
Added comments to explain what the various speed features do, and removed
1 that was clearly unused.
Change-Id: Icd37a536072ddafedbfaefcecbe48979f6d10faf
This funtion initializes buffer pointers and first stage motion vector
prediction. It will be needed by both regular rate-distortion
optimization loop and the non-RD mode decision. Hence move its
declaration in vp9_rdopt.h
Change-Id: I64e8b6316c9d05f20756a62721533a2e4d158235
Filter out files ending in _neon.c and append .neon so the Android build
system knows to apply -mfpu=neon
Change-Id: Ib67277e5920bfcaeda7c4aa16cd1001b11d59305
This commit allows encoder to compare the SAD cost associated with
the best motion vector predictor, per frame. If one reference frame
has this cost more than 4 times of the best SAD cost given by other
reference frames, skip NEARESTMV, NEARMV, ZEROMV mode check of this
reference frame.
This setting is turned on in speed 2 and above. Compression quality
change in speed 2:
derf -0.014%
yt -0.097%
hd -0.023%
stdhd 0.046%
It reduces the speed 2 runtime of test sequences:
pedestrian_area_1080p 4000 kbps 310763 ms -> 303595 ms
bluesky_1080p 6000 kbps 259852 ms -> 251920 ms
Change-Id: I7f59cf79503d51836d61d56d50dc5bdf0e502e22
Under a configuration change, where the bitrate suddenly decreases,
the buffer level may be larger than maximum allowed (for that first frame to be encoded after change_config).
This change keeps it clipped to its maximum level.
Change-Id: I4d0b5b3d1fd8148600dd39e02bd630c9464baba5
1. Made speed choices to be progressive
2. Adjusted rt speed settings to achieve better speed/quality
Overall, rt-5 gained 2.5% in compression/quality, encoding time of 720p
niklas clip goes from 137,052ms to 121,874ms
Change-Id: Ia6e7e1e15225395a868a2f1059c3db8e266e1600
This commit further optimizes SSE2 operations in the second 1-D
inverse 16x16 DCT, with (<10) non-zero coefficients. The average
runtime of this module goes down from 779 cycles -> 725 cycles.
Change-Id: Iac31b123640d9b1e8f906e770702936b71f0ba7f
Optimizing all SSSE3 assembly for convolution:
1. vp9_filter_block1d4_h8_sse2
2. vp9_filter_block1d8_h8_sse2
3. vp9_filter_block1d16_h8_sse2
4. vp9_filter_block1d4_v8_sse2
5. vp9_filter_block1d8_v8_sse2
6. vp9_filter_block1d16_v8_sse2
my optimization include:
-processing 2x8 elements in one 128 bit register instead of processing
8 elements in one 128 bit register.
-removing unecessary loads.
This optimization gives between 2.4% user level gain for 480p input
and 1.6% user level gain for 720p.
This Optimization done only for 64bit.
Change-Id: Icb586dc0c938b56699864fcee6c52fd43b36b969
Move the code outside the conditions. The test sources themselves are
also required for Visual Studio.
Change-Id: Id5e93ebc7369e1807eba0b9dc4f7d0f18033d794
This commit is the first patch optimizing SSE2 implementation of inverse
16x16 DCT with <10 non-zero coefficients. It focused on the first 1-D (row)
transformation. It exploits the fact that only top-left 4x4 block contains
non-zero coefficients, in a 2-D inverse 16x16 DCT with <10 coeffients.
The average runtime of idct16x16_10 unit is reduced from
883 cycles -> 779 cycles (12% faster).
For pedestrian_area_1080p 300 frames at 4000 kbps, the speed 2 runtime goes
down from 310651 ms -> 305910 ms. The decoding speed goes up from
80.37 fps -> 80.87 fps.
Change-Id: Ic6f3ac5a637a76c07ba73ddaafe318a699fea645
Optimizing the variance functions: vp9_variance16x16, vp9_variance32x32,
vp9_variance64x64, vp9_variance32x16, vp9_variance64x32,
vp9_mse16x16 by migrating to AVX2
some of the functions were optimized by processing 32 elements instead of 16.
some of the functions were optimized by processing 2 loop strides of 16
elements in a single 256 bit register
This optimization gives between 2.4% - 2.7% user level performance gain
and 42% function level gain.
Change-Id: I265ae08a2b0196057a224a86450153ef3aebd85d
Fix miss alignment of the frames contributing to the
error score and bit allocation for gf/arf groups.
Initial results slightly +.
Change-Id: Ie508bdcfdac52e592d48e1f13e01b3551b523deb
Some cleanups on frames_to_key, frames_since_key.
Also removes the unused fixed_q parameters in vp9.
Change-Id: If8743a32c71de30a8d17136477b53d607a7acda8
The previous implementation stops motion vector prediction test when
the zero motion vector appears for the second time. This commit fixes
it by simply skipping the second time check on zero mv and continuing
on to next mv candidate.
It slightly improves stdhd in speed 2 by 0.06% on average. Most static
sequences are not affected. A few hard ones, like jet, ped, and riverbed
were improved by 0.1 - 0.2%.
Change-Id: Ia8d4e2ffb7136669e8ad1fb24ea6e8fdd6b9a3c1
also:
o remove dead code, create_dummy_frame
o Fix a bug in command line handling that caused a segfault if wrong
number of arguments were given.
Change-Id: I78f026aee4e363967b750e6cde0982659c558a1f
The feature undergoes prior assumption that the recursive partition
size search from 4x4 to 64x64, hence utilizing information from small
blocks to determine early termination in large block rate-distortion
optimization search. The current codebase is now going from top down.
The previous function might go with not properly initialized values,
hence removed.
Tested on pedestrian_area_1080p at 4000 kbps running under speed 2.
No visible difference in runtime observed.
Change-Id: I553df415c6191413762db7ae34e8790c71d8118e
This patch sets frame types correctly in the new
vp9_get_second_pass_params() function called prior
to encode_frame_to_data_rate() function, so that the
latter function can just work with what is passed to
it. This will allow multiple vp9_get_second_pass_params()
to be created for various encode strategies without
messing with the core encode function.
There is no difference in derf and yt. stdhd/hd are pending.
Change-Id: I70dfb97e9f497e9cee04052e0e8e0c2892eab0c3
This commit adds input/output ports for IDCT8_1D macro function to
provide more flexibility in variable use. It allows to skip several
buffer swap operations.
Change-Id: I21f3450509537322293043b3281bfd3949868677
Adding RefBuffer to simplify reference buffer management. The struct has a
pointer to image data and scale factors relative to the current frame.
Change-Id: If38eb1491ff687cc11428aee339f3e052e2c5d9e
This commit merges the initial buffer swap operations in idct8_1d_sse2
into the array transpose step, hence reducing number of instructions
therein.
Change-Id: I219f6f50813390d2ec3ee37eecf2a4a2b44ae479
This commit optimizes the SSE2 implmentation of idct8x8_10. It exploits
the fact that only top-left 4x4 block contains non-zero coefficients,
and hence reduces the instructions needed.
The runtime of idct8x8_10_sse2 goes down from 216 to 198 CPU cycles,
estimated by averaging over 100000 runs. For pedestrian_area_1080p 300
frames coded at 4000kbps, the average decoding speed goes up from
79.3 fps to 79.7 fps.
Change-Id: I6d277bbaa3ec9e1562667906975bae06904cb180
In two pass encodes bits are allocated to each frame
according to a modified error score for the frame as a
fraction of the modified error score for the clip or section.
Previously a minimum rate per frame was reserved and
subtracted from the bits allocatable by the two pass code.
The vbr max section rate was enforced by clipping the
actual number of bits allocated.
In this patch the min and max vbr rates are enforced
instead by clipping the modified error scores for each frame
rather than the number of bits allocated.
Small gains for all test sets (psnr and SSIM) ranging from
~ +0.05 for YT psnr up to ~ +0.25 for Std-hd SSIM.
Change-Id: Iae27d70bdd3944e3f0cceaf225bad2e8802833de
vpx_codec_vp9x_cx is not used internally. Experimental flag from
vp9_extracfg is also not really used. YUV 4:4:4 just works after these
changes (you have to specify --profile=1 for the encoder).
Change-Id: Ib1c8461d0d19d159827e005efe868f891eea0140
This commit takes a preliminary attempt to refine the motion search
control. It detects the SAD associated with mv predictor per reference
frame, and based on which to determine whether the encoder wants to
reduce the motion search range (if the predicted mv provides fairly
small SAD), or to skip the current reference frame (if there exists
another ref frame that gives much smaller SAD cost).
This feature is turned on in the settings of speed 1 and above.
In speed 1, compression performance changed
derf -0.018%
yt -0.043%
hd -0.045%
stdhd -0.281%
speed-up
pedestrian_area_1080p at 4000 kbps 100 frames
199651ms -> 188846ms (5.5% speed-up)
blue_sky_1080p at 6000 kbps
443531ms -> 415239ms (6.3% speed-up)
In speed 2, compression performance changed
derf -0.026%
yt -0.090%
hd -0.055%
stdhd -0.210%
speed-up
pedstrian 113949ms -> 108855ms (4.5% speed-up)
blue_sky 271057ms -> 257322ms (5% speed-up)
Change-Id: I1b74ea28278c94fea329d971d706d573983d810d
Buffer the SSE of prediction residuals in the rate-distortion
optimization loop of a given block. This information would be used
for later encoding control.
Change-Id: If4e63f3462490513c48be9407d3327c8dd438367
Moving back to scale_factors struct. We don't need anymore x_offset_q4 and
y_offset_q4 because both values are calculated locally inside vp9_scale_mv
function.
Change-Id: I78a2122ba253c428a14558bda0e78ece738d2b5b
Before mv scaling it is required to calculate x_offset_q4/y_offset_q4
by calling set_scaled_offsets(). Now offset configuration can not be
missed because it happens just before scale_mv().
Change-Id: I7dd1a85b85811a6cc67c46c9b01e6ccbbb06ce3a
Various cleanups and streamlining of interfaces as precursor
to further advancements in rate control.
Pre-encode parameter setting for different use cases:
One-pass, first of 2-pass, second of 2-pass, and Svc
are separated out.
There is no change in output with this change.
Change-Id: Ied8ca7d84d610993776aa30ef263fe20452e0e3e
Take account of the fact that the overlay frame is usually
very cheap so distribute target bits among the other frames.
Change-Id: I120685122e8cbbe75da8d07d02932f7877059867
This will hurt metrics in some cases (particularly for static
clips at low data rates where there is extra overhead, but it
helps smooth transitions around forced key frames between
stitched kf sections.
Change-Id: I7e1026ae0de6c77bba863061e115136d7f283cc0
Slightly reduces the mean tendency to undershoot target
rate in vbr, especially when using the memory less mode
and when recodes are disabled.
The effect is primarily at low q.
Change-Id: I59a593b99522cc7da31b4134d1c8a65f5b7b7c53
This commit reworks the prediction filter rate-distortion cost update
process consistent for all block sizes.
Change-Id: I5874349ab38df380240f96c2d4ef924072bab68d
From frame 2, the lpf deltas are all cleared for for even frames, and
a set of values are set and used for odd frames. The intention is to
exercise decoding code around lpf delta/update decoding.
Change-Id: Ic9ff1bc2c2a023f4805852f8573398f2ec2249d7
Guard against incorrect size values moving *data past data_end.
Check read length against the difference of the buffers.
Change-Id: Ie0b54e2db517fd41a0f3ceb23402ee44839a4739
lf deltas are later setup in function vp9_setup_past_independence(),
so this commit removed the redundant copy. Also renamed a function
to better align the behavior of the funciton.
Change-Id: I5d28c2f5b12b3d31817e14296ed4605c1fd5c98c
MV struct was ussed to indicate the postition of a MI_BLOCK with row
and col components. The expression was confusing, this commit added a
new stucture "POSITION" with row and col component to better describe
the position of a mi_block.
Change-Id: I59fdd4b45010fe7d85a8db22a55503265c4f5b2b
Various cleanups and refactoring.
Removes feedback of active worst qaulity and uses last_q
instead to make the interface cleaner. Active worst quality
is now decided only once for a frame being coded in the
beginning based on last_q and other stats. Also, adds other
cleaups on last_q to store also the last_q for altref frames,
and reduces the altref interval a little.
The output does change a little.
derfraw300: +0.224% (global psnr)
stdhdraw250: +0.442% (global psnr)
Change-Id: Ie634cdc032697044c472dd0fe79c109b3e7f9767
The added vector was encoded with aq mode on, with the intent to
exercise the decode code around segment feature.
Change-Id: Iedcb7261e87d3e11b25ecf031d3a69385271148e
Properly handle the rd_filter_cache update, when early termination
or skip prediction filter type check is triggered.
Change-Id: Ie7b9a75fed3358f45ffd15817f2b36670c14eb2d
Increased threshold(t) for interp filter search. This sped up the
encoder with some PSNR loss.
Borg tests were ran at speed 2.
t = 100, PSNR loss:
-0.710%(derf); -0.561%(stdhd); -0.647%(youtube)
speedup:
9%(derf); 3%(stdhd); 5.7%(youtube)
t = 500, PSNR loss:
-1.687%(derf); -1.665%(stdhd); -1.664%(youtube)
speedup:
18%(derf); 10%(stdhd); 8%(youtube)
Change-Id: I180e3657c1e156aaa88dc7c437f8bcbd19f5caba
Corrected a typo that set rc_2pass_vbr_minsection_pct to
two different values on consecutive lines. Second line
should have set rc_2pass_vbr_maxsection_pct.
Change-Id: Ie07ac67cd5455afe556bef34da8127304db9c97c
This commit enables an adaptive prediction filter type selection
for sub8x8 block sizes. In speed 1, it re-uses the filter type of
collocated 8x8 block if it is tested in the rate-distortion optimization
loop, for the sub8x8 blocks. Otherwise, it runs the normal test
over all the three filter types. In speed 2, it re-uses the 8x8
block's prediction filter type, if available. Otherwise, force it
to be EIGHTTAP.
Compression and speed performance wise:
speed 1
derf -0.266%
yt -0.138%
bus at 2000 kbps: 33766ms -> 30451ms (10% speed-up)
football at 600 kbps: 48173ms -> 43786ms (9% speed-up)
speed 2
derf -0.026%
yt +0.134%
bus at 2000 kbps: 18973ms -> 17698ms (6% speed-up)
football at 600 kbps: 26748ms -> 25096ms (6% speed-up)
Change-Id: I77e097533b969fd3472147225fa79fc98095d342
Making overall logic more clear, moving "hacked" calculation of base filter
array pointer to get_filter_base() function.
Change-Id: Ibbd38a9f937e48d35bbbfef3ad933ab36664cccb
Trying to make encode_sb() more similar to write_modes_sb() and
decode_mode_sb() because essentially all branching logic should be the
same.
Change-Id: Ib7dec7b48fce29418142abad4d1dcfdb1c770735
This commit constrains the maximal motion search range for sub8x8
blocks to be [-1023, 1023], in the unit of full pixel.
Change-Id: I955b60649364ab410f2453cafd46a496f2fcb43e
There were two problems with the format string in
the conditionally compiled print statement. It referred
to a variable that is no longer available and it used
incorrect format specifiers.
Change-Id: I315e22bea2691bb535a2e33f5ca206fc55287a37
Adds a hook that derived test classes can implement to be notified
before every call to decode a frame.
Change-Id: Iefa836459cf3e5d7df9ee27f8198daf82b1be088
reorder the tiles based on size and their presumed complexity. this
minimizes the cases where the main thread is waiting on a worker to
complete.
Change-Id: Ie80642c6a1d64ece884f41683d23a3708ab38e0c
In evaluating partition split case, Wrong partition size is used in
calling partition_plane_context(). This commit change to use the
correct sub partition size. The incorrect partition size used were
causing an ASAN error in unit test.
Change-Id: Iab695b764bc51cc61580075f2ae4001421132362
Clean up and simplification of both estimate_max_q
variants and only call once per clip/section.
This leads to a more constrained range of Q values
across a clip / section.
Average gains across all 4 test sets:-
PSNR ~0.5% SSIM ~0.3%
Change-Id: If77d5f7bb50939a464e117724f4da5b001c62d70
For VP9, lossless coding is enabled by passing 0 for both min_q and
max_q. This is a valid configuration, and should not be warned.
Change-Id: Idd117579cd89cd14c0723b1d7e482067ac12b401
In lossless coding, distortion is always 0. Early exit based on this
metric was incorrect.
This CL also changed to use best_rd instead of distortion as the metric
for easly exit as requested by Jim.
Change-Id: I8ef3e407ac03b4abc3283b273f936a68fad5c2ab
Add a full range motion search for regular block sizes. This runs
exhaustive search within the given reference area. This commit further
optimizes the search process by combining 4 points test into one
pipeline, which gives 30% speed-up as compared to run each individual
point at a time.
This full range search serves as a best possible motion search reference.
When replacing the diamond search with full range search, the speed 0
runtime of bus CIF at 2000 kbps goes from 153872ms to 623051ms. The
compression performance compared to speed 0 setting gains 0.585% for
derf set.
Change-Id: Ieef1225216b0b86b4ac4872fa7fb9e18bf2eabb3
Removed an adaptive rate correction factor that was having
a negative impact on quality in many clips. This factor
was influencing the Q range available to each frame
independently of the bits allocated to each.
Average results with DISABLE_RC_LONG_TERM_MEM.
derf +0.199, -0.059.
yt +3.957, +3.798
std hd +1.577, +2.140
yt hd +4.127, +4.513
Average results without DISABLE_RC_LONG_TERM_MEM
derf -0.628, -0.665
yt +3.432, +3.015
std hd -0.105, +0.153
yt hd +3.432, +3.015
Change-Id: I45bab6b606f49a442e7b27a6d631f3ffd843bbce
Includes various cleanups.
Streamlines the interfaces so that all rate control state
updates happen in the vp9_rc_postencode_update() function.
This will hopefully make it easier to support multiple
rate control schemes.
Removes some unnecessary code, which in rare cases can casue
a difference in the constrained quality mode output, but
other than that there is no bitstream change yet.
Change-Id: I3198cc37249932feea1e3691c0b2650e7b0c22fc
Removed calls to vp9_update_mode_info_border since
they immediately followed code that initialized the
entire buffer to 0.
Change-Id: Ife06794daa20439a0b607a83a87f88df59afac40
- Disable mode info update in case where current frame is coded
as "show existing frame".
- Should fix issue 676.
Change-Id: Ibee681850eb307f982da6528d3e31cb94f881c08
Both single frame and compound inter motion search run with luma
component only. Hence removing the block size mapping therein.
Change-Id: I217488e702432ae9fa0e95bf6f516ebb36b5c79b
The old code would start in a mixed state, where all the reference
frames were pointing to frame buffer 0, but the reference counts
were 0. This is why we needed special code for the first frame.
Change-Id: I734961012917654ff8c0c8b317aac00ab75ded1a
Using get_plane_block_size() instead of manipulation with subsampling
values, calculating all required values only once without redundant calls
to b_width_log2().
Change-Id: I00303f2a0926f9c4cb17f34591adda60615f8919
Modifications to the spatial scalable encoder to match
changes made to the scaling code in the decoder.
In particular, the use of a dummy first frame was removed
now that the decoder is able to handle a smaller first
frame.
SvcTest.FirstFrameHasLayers unit test re-enabled.
Change-Id: Ic2e91fbe4eadf95895569947670d36d68abaf458
Jingning saw bitstream change with this patch. It could be true
that (mask_16x16_0 & 1) is 1, but (mask_16x16_1 & 1) is 0 in some
edge cases.
This reverts commit 8f05e70340.
Change-Id: I0a529435ce816a1e14653eb510d5090de276070a
In the decoder we don't need to save eobs, we can pass eob as an argument.
That's why removing eob arrays from VP9Decompressor and TileWorkerData,
and moving eob pointer from macroblockd_plane to macroblock_plane.
Change-Id: I8eb919acc837acfb3abdd8319af63d1bbca8217a
This commit makes the coefficient tree initialized prior to token
initialization, where the coefficient costs are filled out according
to the probabilities associated with coefficient value categories.
Change-Id: If4e89c3923058376f8382c683fe4a225a4a38af3
This commit fixes the intra prediction reference source selection
in the settings of skip_encode. Use original boundary pixels as
prediction reference, when the inverse transform and reconstruction
are skipped in the per block size rate-distortion optimization loop.
Change-Id: I36081aa30aa46e203e0e6f4e8a420fd08269469a
Its last remaining caller can be passed its results directly without any
additional work. Also, it's not non-4:2:0 safe.
Change-Id: Ia5089ba5f7f66c7617270483c619c9271aefd868
The performance gain of idct16x16_10_add_sse2 function is not
noticeable. However since both functions use the IDCT16_1D,
idct16x16_10_add_sse2 should be modified as well.
Tested with: park_joy_420_720p50.y4m
Change-Id: I02b957e36fcf997c677d15baf496533895271bff
This commit fixes the use of uv_intra_estimate by properly restoring
the mode_info struct required by rd_pick_intra_sbuv_mode.
Change-Id: I6a156d79533c4e2e60dfd3b8c5bb0a42a8eca280
The difference with the old code is that originally the whole token_cache
was initialized with zeros at the beginning of decode_coefs() function.
Now we set several zero values explicitly with "token_cache[scan[c]] = 0".
Change-Id: I88cc5031f01d13012d1a4491739c36cb44f9401e
Removing goto and using while loop instead, renaming seg_eob to max_eob,
moving eob token counter increment.
Change-Id: Idcc4b3a45e4f313596a71776aef56691a6647e5f
E.g. disable vertical partioning for 4:2:2. Until we come up with something
better to do with the chroma block size, this prevents an assert error.
Change-Id: I9394fb3f14ec1343abc3ad4769de208e6278f285
This fixes issue 667.
In the case where the frame was an odd number of pixels
wide or high, the border was being extended by one col
or row too far.
The calculation of color plane dimensions was modified
to use those already computed at the time the frame
buffer was allocated.
Also freed the temporary scaling buffer in vpxdec to
prevent a memory leak.
Change-Id: I195bc81d84c0fc5d8260c1232200d62399e4b51f
Considering a horizontal edge, if mask_16x16 is 1 for an even-
indexed 8x8 block, then mask_16x16 is 1 for next 8x8 block in
same row. Similiar to a verticle edge, if mask_16x16 is 1 for
an even-rowed 8x8 block, then mask_16x16 is 1 for the 8x8 block
right below it in next raw. Based on that, the mask_16x16 checking
can be simplified to save cycles. The corresponding 8-pixel
vp9_mb_lpf_horizontal_edge code can also be removed.
Change-Id: Ic3fe7a5674322239208cbe2731dc3216ce2084f3
We only need qcoeff buffers in the encoder. Reducing TileWorkerData struct
and VP9Decompressor struct sizes by 24K.
Change-Id: Id148868461f7ffa3d3dd634b371503ae9c57e207
Renaming treed_read() to consistent vp9_read_tree() and moving it from
deleted vp9_treereader.h to vp9_dboolhuff.h file.
Change-Id: Iedd8655acbe25e4fcf62b79e5a13bdea69b6b004
Added the test vector provided by Attila, which caught the bug in
Issue 661 "Decoder produces mismatched outputs with ssse3 enabled
and disabled"
vp90-hantro-stream-001.ivf
size: 320x180; 20 frames
Change-Id: Ic0d2b57ac7596ecb938dd55abc8c706fc2dd6d8f
vp9_idct32x32_34_add_sse2:
speedup: 1.472
IDCT32_1D_34 and MULTIPLICATION_AND_ADD_2 are optimized
based on the fact that Only upper-left 8x8 has
non-zero values.
vp9_idct32x32_1024_add_sse2:
speedup: 1.032
Tested with: park_joy_420_720p50.y4m
Change-Id: I8670ce547552b48695049de298e2fc46ce28dfbc
- Add command line args that allow display of warnings without prompting
for user input.
- Extend warning code to make it somewhat scalable.
Change-Id: I2bad8f9315f6eed120c2e1bbe0a2a5ede15fbf35
The idea here is to allow "in frame" adjustment of the final Q
value used to encode each SB64, using segmentation.
There is also adjustment of the rd mult in regions of overspend.
Activated using aq_mode=2
Change-Id: I2f140cd898c9f877c32cd6d2e667f5e11ada4b1c
The decoder will construct inter predictor using lazy border extension,
while the encoder, going with multiple runs of motion search in the rate-
distortion optimization loop for each block, does border extension at
frame level. This commit makes separate the inter predictors for encoder
and decoder, respectively.
Change-Id: Ieca2fecba3a7201a6d64ef9f219e5d91e50559c3
When calling check_initial_width through vp9_set_size_literal
the function was defaulting to using non-subsampled chroma.
This patch changes the default to assume sampled chroma as
an interim solution until complete support for other
color formats is added.
Change-Id: Id8e7e919b350e3473dfdf7551af6fd0716478b04
This commit takes out vp9_extend_frame_borders from
vp9_setup_scale_factors.
The refactoring is for the preparation of the use of lazy border
extension at decoder. This makes it necessary to handle border
extension separately at encoder/decoder. The use of
vp9_extend_frame_borders will be removed, when lazy border extension
is ready.
Change-Id: Ia3baba3d179d5f11eee1634f19b3b319d2a59186
The decoder ignored the display width & height
specified in the frame header.
This patch adds a control, VP9D_GET_DISPLAY_SIZE, to
allow the application to obtain the display width and
height from the frame header.
vpxdec has been modified to scale the output frame to
this size.
Should the request for the display size fail vpxdec will
use the native width and height of the raw decoded
frame instead.
Change-Id: I25db04407426dac730263720c75a7dd6400af68a
This patch followed "Add filter_selectively_vert_row2 to enable
parallel loopfiltering" commit, and added x86 SSE2 optimization
to do 16-pixel filtering in parallel. For other optimizations
(neon and dspr2), current 16-pixel functions were done by calling
8-pixel functions twice, and real 16-pixel functions could be added
later.
Decoder speedup:
tulip clip: 2% speed gain;
old_town_cross: 1.2% speed gain;
bus: 2% speed gain.
Change-Id: I4818a0c72f84b34f5fe678e496cf4a10238574b7
This fixes issue 667.
In the case where the frame was an odd number of pixels
wide or high, the border was being extended by one col
or row too far.
The calculation of color plane dimensions was modified
to use those already computed at the time the frame
buffer was allocated.
Also freed the temporary scaling buffer in vpxdec to
prevent a memory leak.
Change-Id: Ied04bdcdfd77469731408c05da205db1a6f89bf5
Moves all rate control variables to a separate structure,
removes some currently unused variables,
moves some rate control functions to vp9_ratectrl.c,
and splits the encode_frame_to_data_rate function.
Change-Id: I4ed54c24764b3b6de2dd676484f01473724ab52b
- Rename the struct to VpxEncoderConfig.
- The idea behind this is to enable checking the global settings against
stream specific settings in source files other than vpxenc.c.
Change-Id: Ic736cbb714845b9466acb34671780d65b83ad1a8
Although no mismatch was indicated for 8/16 wide sub-pixel filters
in issue 661, they had similar problems that could cause mismatch
potentially. This patch fixed calculations in HORIZx8/16
and VERTx8/16.
Change-Id: Ib85412d690bea5609a51f0e50e7c858406b8ff9e
Separate the rounding and right shift operations of forward transform
from those of inverse transform. Take out the assertion check from
inverse transforms. If the transform coefficients were constructed to
cause intermediate steps of inverse transform overflow, the codec will
just let it overflow without breaking the decoding flow.
Change-Id: Ia7ce15dfd1a73b4abbaa78cbc74ec718523c5b1b
In commit "3d50da5397d20abc932d81453b26cde758293a40", the stack
pointer was modified while aligning the stack, and it needed to
be pop out at the end.
Change-Id: I39e4adc6b8aa3379854dd264d41aa6f0f15c7953
This patch fixed issue 661: "Decoder produces mismatched outputs
with ssse3 enabled and disabled." In sub-pixel filters, a pixel
value was multiplied by a filter coefficient, and the results
were added up. The order of adding up these multiplications had to
be arranged carefully to prevent incorrect overflowing.
Change-Id: Ia78663dfe74a2d46900f1c6fb07c21fac273892f
VS2010 only supports avx. There is currently no avx code
in libvpx so don't create a special case for it.
Change-Id: I39a11410367712b98bc6122c5a42fabffcdb94cf
Using for loop based on max_tx_size instead of separate checks. Combining
build_coeff_contexts() with update_coef_probs().
Change-Id: Ie335a7db29830677fbc14478a9c190d3c1068665
Modifications are done to reduce the total clock cycle.
Speedup: 1.2
Tested with: park_joy_420_720p50.y4m
Change-Id: Ia36b87e62e2f80a5fadaf5628729aedc80f38f3f
Both functions have no relation to motion vectors, so moving them from
vp9_findnearmv.h to vp9_blockd.h.
Change-Id: I74f524267886ab0fff4a2da793a10c906ed0f43a
Added filter_selectively_vert_row2 to be ready for parallel
loopfiltering in vertical direction. This change did 2-row
filtering at a time. If 2 vertically adjacent 8x8 blocks do same
type of filtering, we can do 16-pixel filtering in parallel.
Next, we need to provide 16-pixel loopfiltering functions in c
and optimized versions for codec speedup.
Change-Id: Idf97bbdd70566e55bd30e1fd25cb8544e33291be
Add support to do 16 pixel horizontal filtering in Neon.
Nexus devices saw about 0.5% decode speed increase.
Change-Id: I2993f6c2d49f31fa74976879eeaa289fd3f4e15d
This function is called from vp9_setup_past_independence() which is called
before the modified piece of code. Moving reset of inter_mode_probs into
vp9_init_mbmode_probs() for consistency.
Change-Id: Ib188e8798e1fbe15407fd501406761b746fdda95
Although no mismatch was indicated for 8/16 wide sub-pixel filters
in issue 661, they had similar problems that could cause mismatch
potentially. This patch fixed calculations in HORIZx8/16
and VERTx8/16.
Change-Id: I169961c9d40a20340995b7d22aafc89ccf30bfca
This CL fixes an overcite with the AVX2 support CL previously
merged (Change-Id: Idc03f3fca4bf2d0afd33631ea1d3caf8fc34ec29) that
prevented runtime execution of AVX2 code in WebM.
Background:
Starting with the Sandybridge processor, the CPUID instruction was
enhanced to add various extended feature flag enumeration leaves.
Reading these leaves requires an additional input value for the CPUID
instruction which is stored in ECX. This change adds this second input
value for all ARCH_X86 and ARCH_x86_64 targets to the CPUID macros,
allowing checks of EBX bit 5 for AVX2 support. This capability will be
required moving forward to check for future processor features.
Change-Id: Ie9d872bc9ff68dad4b6578e4544e4dfd0ae26c36
In commit "3d50da5397d20abc932d81453b26cde758293a40", the stack
pointer was modified while aligning the stack, and it needed to
be pop out at the end.
Change-Id: I062971e195f1f2ab9d0ab5fb84dcf215a0fcaa67
There are many places in handle_inter_mode that need to restore the
dst buffer pointers, due to buffer pointer swap and early rd search
breakout. This commit wraps these operations into an inline function
for clean-up.
Change-Id: I0462e8c41c8bc3cd8db07395489cac03d8e5be54
This patch fixed issue 661: "Decoder produces mismatched outputs
with ssse3 enabled and disabled." In sub-pixel filters, a pixel
value was multiplied by a filter coefficient, and the results
were added up. The order of adding up these multiplications had to
be arranged carefully to prevent incorrect overflowing.
Change-Id: Id08af4200fea9e1b896fc40157b8651c2c7e80f2
Reversing bit order of partition_context_lookup, and modifying accordingly
update_partition_context() and partition_plane_context().
Change-Id: I64a11f1a94962a3bf217de2f50698cb781db71a5
- Move it to webmdec.c and webmdec.h.
- Also, tidy up obvious style nits in the vicinity of code I was
already touching.
Change-Id: Ie2898d06e73c1e9030d9c8d465b73ee7edc3c02a
This rebase is a better implementation of the previous ones.
Modifications are done to reduce the total clock cycle.
Speedup: 1.341
Compiled with -O3
Tested with: park_joy_420_720p50.y4m
Change-Id: I940eaf283f60597ca0d9d2e13d518878d55ff02d
Commit a4a5a210 enabled lossless coding, but the commit incorrectly
disabled the usage of skip in encoder even when skip should be used.
This commit make sure that skip is enabled even in lossless mode.
Change-Id: I276954f952c6ac68f17a316ebc72f09001228a08
VS2010 only supports avx. There is currently no avx code
in libvpx so don't create a special case for it.
Change-Id: Iacb10ea4762155412e04f23904b4324d01451fbd
Since they used in encoder only. This commit also re-order includes
for the files that include vp9_extend.h
Change-Id: I929fc113f2135d3198cd1fc6a17434e5a2f8a459
Explicitly constrain the upper limit of motion search range (in the
unit of full pixel) to be [-1023, +1023]. It is intended to control
the effective motion search range for 4K sequences.
Change-Id: I645539c70885eec0f155781f439d97d333336e88
Refactored IVF frame reading code out into ivf_read_frame(). Forgot
to actually make the function call in read_frame().
Change-Id: Ie9f6917e70bd26d0352a761932465c60a29a1f81
This patch followed "Rewrite filter_selectively_horiz for parallel
loopfiltering" commit, and added x86 SSE2 optimization to do
16-pixel filtering in parallel. Also, corrected the declaration
of aligned arrays. For 8-pixel-in-parallel case, improved the
calculation of the masks and filters. Updated the threshold loading
since the thresholds were already duplicated. Updated neon C functions
to call neon loopfilters twice.
Using tulip clip, tests showed it gave a ~1.5% decoder speed gain.
Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35
Separate the rounding and right shift operations of forward transform
from those of inverse transform. Take out the assertion check from
inverse transforms. If the transform coefficients were constructed to
cause intermediate steps of inverse transform overflow, the codec will
just let it overflow without breaking the decoding flow.
Change-Id: I73cfc3706c4e840fc543a77cbc4cdb0b05d07730
on arm until we implenment real vp9_idct32x32_34_add_neon.
This issue is due to commit 47665452f0
Merge "Add 32x32 idct function for eob<=34 case".
Change-Id: I56b5f0abc20e7dd1bba521f78a995e85d65ea296
Upstream changes to account for differences in clang
syntax for Chromium iOS builds.
Since most of these are incompatible with XCode clang,
hide them behind a flag.
Change-Id: Idafcbcd4eb01b1ada6277da2d2edfd6c04b579fd
- Move IVF reading support into ivfdec.c and ivfdec.h
- Move IVF writing support into ivfenc.c and ivfenc.h
- Removed IVF writing code from the SVC example in favor of ivfenc.
Change-Id: I70adf6240d0320fdd232d8546ed573f0f68dd793
* Change from thumb mode to arm mode improves test time significantly
* Direct inclusion of test.mk allows for unit test configuration via
configure script
Change-Id: Id58d3ba8289374528756a672459d8334afe20e2a
Simplifies the code by implementing band mapping with static arrays.
A lot of the code complexity introduced in a previous patch
disappears.
Change-Id: Ia3fac36e594fb5ad2d55ae141c58bba4c55c2d28
This commit enables the unit tests for 4x4 DCT and ADST transforms.
It covers tests of round-trip error check, coefficient match check,
coefficient overflow check, and inverse accuracy check.
Change-Id: Ibfea928ee48f0ebc088b7fdb0bf2d89a14161299
The switch to the rate-correction damping factor
in https://gerrit.chromium.org/gerrit/#/c/67536/ was not conditioned on CBR mode.
Change-Id: I2326704e8ac030a4f7b592dd3fedb94c7dd0644d
The step that sums three input samples could potentially cause the
intermediate result go beyond 16 bit limit, when operating as the
second 1-D transform. This commit fixes the issue.
Change-Id: Iaf512449ac2d25ddd8a806d760afab362c62a516
Overall change (using dual buffer scheme for superblocks of both inter
and intra modes) reduces speed 2 runtime:
bluesky_1080p at 6000kbps: 263553ms -> 257441ms
riverbed_1080p at 8000kbps: 233230ms -> 225308ms.
Change-Id: Idf8d70f768a4b0d97b2a8506372c57b7b4022119
Match any whitespace instead of individual spaces. The macro
definitions in vp9/common/arm/neon/vp9_short_idct32x32_1_add_neon.asm
triggered this and treated spaces as arguments leading to lines like:
$8vld1$8.$88$8 {$8q8$8}, [$$89$8], $$8stride$8
Change-Id: I2d5718aba4614e4fd7b702e15c2a1bd80e656bd2
These changes are to support automated regressions of vpx on android
new file: test/android/Android.mk
new file: test/android/README
new file: test/android/get_files.py
Change-Id: I52c8e9daf3676a3561badbe710ec3a16fed72abd
As Jim suggested, 1D array was used to store filter levels instead
of 2D array. This used shift_y in setup_mask directly, and saved
few cycles.
Change-Id: If61ab298784861f1806b1cd396d4e4e2e0f097b9
Implements scan order to band map with arrays in both the encoder
and decoder to remove conditional statements.
Encoding seems to be about 1% faster at speed 0, tested on football.
Decoding seems to be about 0.5-1% faster on a set of 25 videos.
Change-Id: Idb233ca0b9e0efd790e30880642e8717e1c5c8dd
This commit enables the dual buffer rate-distortion optimization
and encoding scheme. It stacks the original transform coefficients,
quantized levels, and reconstructed coefficients, in the rate-
distortion optimization search process, hence eliminates the need
to re-run residual generation, forward transform, and quantization
in the encoding stage.
Change-Id: I011bfad3a59a380a869ee552e91dae0394ec492e
Added loop filter mask checking, and made the caller function
ready for implementation of parallel loopfiltering in horizontal
direction.
Next, we need to go through the loopfilter functions (both c and
optimized versions), and provide 16-byte wide loopfiltering for
each filter type.
Change-Id: Ifef47e7ef9086ebc2fd6ca7ede8f27c9bbf79e66
Allocate memory space of dual buffer sets that store the coeff, qcoeff,
dqcoeff, and eobs. Connect the pointers of macroblock_plane and
macroblockd_plane to the actual buffer in use accordingly.
Change-Id: I2f0b5f482ca879fae39095013eaf8901db20a5a4
Make the macroblockd_plane contain dynamic buffer pointers instead
static pointers to the memory space allocated therein. The decoder
uses the buffer allocated in pbi, while encoder will use a dual
buffer approach for rate-distortion optimization search.
Change-Id: Ie6f24be2dcda35df7c15b4014e5ccf236fb3f76c
We only used "ib" to call get_scan() function, which in turn calls
get_tx_type_4x4() function. The latter one only needs block index if
bsize < BLOCK_8X8 -- under that condition raster_block == block.
Change-Id: I697306a0c3cf937acdd4f5e623d4367c5acc0b2f
This commit fixes the assignment of mode_info pointer per tile. It
makes recognition of tiles in both row and column formats and properly
arrange the use of mode_info.
The bug was first introduced in
I6226456dd11f275fa991e4a7a930549da6675915
https://gerrit.chromium.org/gerrit/#/c/67492/
Change-Id: Ie12cd209f53241513728c461ee3d7b9599ddb860
Inlining set_contexts_on_border() into set_contexts(). The only difference
is the additional check that "has_eob != 0" in addition to
"xd->mb_to_right_edge < 0" and "xd->mb_to_right_edge < 0". If has_eob == 0
then memset does the right thing and works faster.
Change-Id: I5206f767d729f758b14c667592b7034df4837d0e
This patch continued the work done in "Rewrite loop_filter_info_n
struct"(commit:00dbd369c70270428d56da6d15ea5486fc821c52) to further
improve loopfilter function.
1. Instead of storing pointers to thresholds, store loopfilter
levels within 64x64 SB;
2. Since loopfilter levels are already calculated in setup_mask,
we don't need call build_lfi to look up them again. Just save
loopfilter levels in setup_mask.
3. Reorganized and simplified filter_block_plane().
Tests showed a ~0.8% decoder speedup.
Change-Id: I723c7779738bbc2afcb9afa2c6f78580ee6c3af7
This to make sure that prediction residue always get coded in lossless
mode.
This commit also fixed lossless unit test
Change-Id: I537726ee55328d4e4cf0a0196393a67e12bfcde1
The new expression is much more logical than previous one. Surprisingly
both expressions give exactly the same set of dependent values
-- have_top, have_left, have_right -- in vp9_predict_intra_block.
Change-Id: I63eb1b592b8c37883b3a0dbb1f3daa271e446109
This patch fixed the issue reported in "Issue 655: remove textrel's
from 32-bit vp9 encoder". The set of vp9_subpel_variance functions
that used x86inc.asm ABI didn't build correctly for 32bit PIC. The
fix was carefully done under the situation that there was not
enough registers.
After the change, we got
$ eu-findtextrel libvpx.so
eu-findtextrel: no text relocations reported in 'libvpx.so'
Change-Id: I1b176311dedaf48eaee0a1e777588043c97cea82
The term x represents macroblock pointer across encode_block. Change
the two local variable names to avoid confusion.
Change-Id: Ic732e73023525d673c0a678ed2708ac1edf5a3f9
Now tile decoding consists of two stages:
1. Find tile buffer start and its size, put this info into tile_buffers.
2. Decode each tile based on information from tile_buffers.
It seems that stage 1 can also be reused by multithreaded tile decoder.
Change-Id: If0cdaefdd6d10bb41c63561346c9ae4cfac081dd
It is more logical to use dqcoeff buffer to put there *dequantized*
transform coefficients (inside inverse_transform_block and
decode_coefs functions). Dequantization happens inside WRITE_COEF_CONTINUE
macro.
qcoeff buffer should be only used in the encoder for *quantized*
transform coefficients.
Change-Id: Ifd54bef272bbf5311ced6669c4f1079f998af5d7
SVC multiple layer per frame encoding is invoked with vpx_svc_init and
vpx_svc_encode. These interfaces are designed to be invoked from ffmpeg.
Additional improvements:
- make dummy frame handling a bit more explicit
- fixed bug with single layer encodes
- track individual frame sizes and psnrs instead of averages
- parameterized quantizer, 16th scalefactors, more logging,
- enabled single layer encodes to generate baseline
- include new mode for 3 layer I frame with 5 total layers
Change-Id: I46cfa600d102e208c6af8acd6132e0cc25cda8d4
I'm sure I could do more, but I don't know how long this code has to
live. I think this at least makes the code a little easier to read and
understand.
Change-Id: I6ca76357f89468d4851a3d1826e7aefa498e51d1
This is mainly a clean up patchset. It moves the WebM writing support
out of vpxenc and into its own source file. Changes to tools_common and
vpxdec result from relocation of shared bits of code.
Change-Id: Iee55d3285f56e0a548f791094fb14c5ac5346a26
Removing special case handling from vp9_tree_probs_from_distribution(),
tree_merge_probs(), and vp9_tokens_from_tree_offset() functions. Replacing
inter_mode_offset() function with macro INTER_OFFSET which is used now for
vp9_inter_mode_tree definition.
Change-Id: Iff75a1499d460beb949ece543389c8754deaf178
Removes stack-alocation of token_cache in decode_coefs function
Seems to achieve about 1% decode speed improvement as tested on
25 480p videos.
Change-Id: I8e7eb3361fa09d9654dfad0677a6d606701fdc6e
The compound inter prediction could potentially run with initial
motion vectors of invalid value and check the mv_cost, which triggers
overheap read. This commit resolves this issue by forcing a motion
vector value check for compound inter modes of both superblock and
sub8x8 block sizes.
Change-Id: I4f4fc19ce83c8272782bc382f12c82a3f03212fc
We only update partition_probs for inter frames but they are constant
for key frames. It is not necessary to have constants inside frame
context and copy them every time. This change reduces FRAME_CONTEXT size
by at least 48 bytes.
Change-Id: If70a53be51043f37fe7d113853217937710932a7
Removed:
goldfreq, avg_encode_time, avg_pick_mode_time,
cpu_freq, interquantizer
member variables from VP9_COMP since they are no longer
used in the code.
Change-Id: I010a82c217d0da03c3f53d1858d3462190c12dcf
Removed three members from the VP9_COMP data structure:
inter_zz_count, gf_bad_count, gf_update_recommended.
These were part of the VP8 real-time mode implementation
that was removed from the initial VP9 codecbase.
Change-Id: I866b083b88ef02c74837277d50ce532ca88492f3
This commit fixes the use case of plane_block_idx, which determines
the plane (Y/U/V) index based on block index. When block idx >= 4 in
sub8x8 block loop, it should be of chroma components.
Change-Id: I072705aa7b35445524ac607089ca8ce54b7ba478
We don't have to calculate 'new' probability in convert_distribution()
because it is enough to calculate only 'new' counters which could be used
to calculate probability if necessary. That's why removing a lot of unused
temporary probability arrays and reducing number of get_binary_prob()
calls.
Change-Id: I4e14eb7203d1ace61bbddefd6b9b6326be83ba63
When a frame is dropped due to |buffer_level| < 0 for a given temporal layer,
the buffer level for the upper temporal layers was not updated (in calc_pframe_target_size()).
This change fixes that.
Also, use the layer per-frame-bandwidth for updating the buffer level
of the higher layers when a frame is dropped.
Change-Id: I660c23f3229b47e9d124a950b480314b4307c5a8
1. Reduced the size memset based on eob for 32x32 transform. The reset
of non-zero coefficient should probably go into where they are read in
inverse transform functions. (TODO)
2. Removed a redundant level of indirection.
vp9_iht4x4_add() checks transform type and call vp9_iht4x4_16_add()
for tranforms other than DCT_DCT. In this case, the DCT_DCT case
has been already handled here.
Change-Id: Iacbc77da761f0b308df5acea0f20c9add9f33d20
The change doesn't affect the bitstream. It changes the order or function
calls and affects how we reconstruct intra- and inter-blocks. Speed up is
about 1...1.5%.
For intra-blocks:
Before:
for each transform block read tokens
for each transform block do prediction
for each transform block do inverse transform
Now:
for each transform block
read tokens
do prediction
do inverse transform
For inter-blocks:
Before:
for each transform block read tokens
for each transform block do inverse transform
Now:
for each transform block
read tokens
do inverse transform
Change-Id: I12a79bf1aa5a18c351b8010369bd3ff1deae1570
This CL contains two AVX2 optimized loop filter functions,
mb_lpf_horizontal_edge_w_avx2_8 and mb_lpf_horizontal_edge_w_avx2_16.
Change-Id: I604e4fe6e99752b7800c2ea98721d97f7e0b931b
-Don't reduce maxQ for gold/alt in CBR mode.
-Fix to min/maxQ for first/initial key frame.
-Add more speeds to datarate test and reduce the starting bitrate for test.
Change-Id: Id2a333d76dd3f6a51b322ca984588e2a22159c58
This commit makes zcoeff_blk cache the case where the entire block
is quantized to be zero (without applying zero-forcing) in the rate-
distortion optimization loop, and skip the forward DCT, quantization,
inverse DCT, and reconstruction process in the encode_block stage.
It now works for all the block sizes, including sub8x8 blocks.
Change-Id: I5ae60a9c436ba3637d11666733554bec4580ef98
Both decode_modes_sb and decode_modes_b had conditions to immediately
return at the beginning. Eliminating these conditions here and calling
these functions only to do a real work. Also unrolling loop for
PARTITION_SPLIT.
Change-Id: I2fc41cb74ac491f045a2f04fe68d30ff4aaa555d
"<< SUBPEL_BITS" needs to be added in the calculation. Call
set_scaled_offsets() to calculate x_offset_q4 and y_offset_q4.
Change-Id: Ied130ea771510e918f51cd1dc3abe57f4c0962b5
factorizes the code in decode_tiles(). reading the offsets backwards
wasn't doing anything to prove tile independence
Change-Id: I0395d3c77205852ebdc55efedc68291e93cef85c
Warning was: "implicit conversion from enumeration type 'VPX_SCALING_MODE'
(aka 'enum vpx_scaling_mode_1d') to different enumeration type
'VPX_SCALING'".
Change-Id: I45689e439a8775bc1e7534d0ea1ff7c729f2c7f5
"keyframe" variable in the current code actually means that previous
frame is a keyframe because cm->frame_type has not been initialized
in read_uncompressed_header.
Change-Id: I5645b0816c70abdef5dfc70113018d06276dac77
The clamp operation may not affect the values of the final assigned mv
where compiler may make use of strict aliasing rule to optimize out the
clamp operation. This change made the code segments to better comply
the strict aliasing rule.
Change-Id: I24502ff18bd4f9e62507a879cc8760a91a0fd07e
It is enough to check just block type: intra or inter. Intra block implies
intra prediction mode, and inter block implies inter mode.
Change-Id: I3cf98731a3935f670a3cd8e2b2443483eb944be4
When building with new versions of Clang we encounter some issues. Work
around them by adding -fno-strict-aliasing when we detect Clang.
https://code.google.com/p/webm/issues/detail?id=603
Change-Id: I8e945a18a7215bcc627e7a1ee110078413259cc7
replaces use of cur_tile_mi_(row|col)_(start|end) by VP9_COMMON, making
it less stateful and more reusable for parallel tile decoding
Change-Id: I1df09382b4567a0e5f4434825d47c79afe2399be
Adding these functions to encapsulate tx_type check. Changing TX_TYPE to
int to match the declaration in vo9_rtch.h.
Change-Id: I6f3a2df6e35595ca73b6aaa9e3909ee7bc3fd16f
Restructured the storing of loopfilter information. Deleted
loop_filter_info struct and reduced copying happened in every
superblock.
Tests showed a 0.5% ~ 0.8% decoder speed gain.
Change-Id: Ie6a8e46bae71dc3a3cd8c6054f5de540b8e0ef5e
update_partition_context / partition_plane_context: this will allow for
separate storage to be used in tile decoding
Change-Id: Ie0bc393531ab7e9d2ce35c95111849b294aad4ed
This is required in order to build libvpx on OS X Mavericks where gcc
compiler is deleted, clang (3.3) is the default now.
Using unmodified source files from gtest-1.7.0/fused-src folder.
Change-Id: I3d5f7278149c904e48737327daf7097a8bb0b390
When only upper-left 8x8 area has non-zero dct coefficients, we
could skip 1D IDCT for 9th to 32th rows to save operations. This
function is called when eob <= 34.
Change-Id: I9684b75947bdde346cfe3720f08a953aa7a13fb5
If the webm file did not have a Cues then vpxdec would fail
when creating a y4m file. If there is no Cues element print
out a warning and set fps to 30.
Change-Id: Ieea7040265dfdac7dff4ccf917c6f756160a96bc
set_active_map()
set_roi_map()
The APIs need be implemented and tested later, to insure consistency
with VP9 codec internals
Change-Id: I198124ee318f0883b58d1d36cea3c7ccd742a57e
For consistency with idct function names. Renames:
vp9_short_fdct4x4 -> vp9_fdct4x4
vp9_short_walsh4x4 -> vp9_fwht4x4
Change-Id: Id15497cc1270acca626447d846f0ce9199770f58
Splitting setup_inter_inter function into is_compound_prediction_allowed
and setup_compound_prediction. Moving setup_compound_prediction call
into read_comp_pred from read_uncompressed_header.
We should do the same in the encoder as well.
Change-Id: I40d75fdc4a221b2f7705df00d23a4b3fe79987c3
The encode_block for pass 1 takes simpler functionalities and can
save a few branches. The main reason is to make encode_block only
used after running rate-distortion optimization search in pass 2,
hence allowing dual buffer stack approach later.
Change-Id: I9e549ffb758e554fe185e48a07d6e0e01e475bcf
Use a flag variable to determine if coded in inter mode, thus avoiding
multiple inter mode checks in super_block_yrd.
Change-Id: I0ef998b2811c38e185a2e0583f0f636cee45d2cf
Assign the pointer to mode_info stream per tile. Remove the use of
tile_col in the decoding modules.
Change-Id: I7df87086708a3d92c5e20e86bcfb04e458ff47a6
This move is done to have all compressed header reading functions in one
place. Moved functions:
read_switchable_interp_probs
read_inter_mode_probs
read_comp_pred_mode
read_comp_pred
update_mv
read_mv_probs
Change-Id: I2aebb57d2826d03d11bf2f8fbbfc3a9978c4f9fb
The ref's scale_factors are set at frame level, and then copied for
each partition block. Since the struct members are mostly constant,
this patch separated the constant and non-constant members, and
reduced struct copying. This gave 0.5% ~ 1.4% decoder speed gain.
Change-Id: I94043bf5a6995c8042da52e5c661818dfa6f6d4c
The pointer was asigned only once with vp9_regular_quantize_b_4x4, calling
this function directly now. Also removing unused declarations:
prototype_quantize_block
prototype_quantize_block_pair
prototype_quantize_mb
vp9_regular_quantize_b_4x4_pair
vp9_regular_quantize_b_8x8
Change-Id: I14325bc2f082336820671eafbc06126651b79f73
This commit uses left_available flag to decide if the left mode_info
struct is available for left_block_mode. As discussed with James
Zern (jzern@), this prevents the codec from fetching mode_info from
blocks in the left tile, which although effectively not used might
present concerns for multi-threaded tile decoding.
This is NOT a bit-stream change.
Change-Id: I1dc8cf1bcbf056688eee27c7bc5706ac4b4e0125
Simple modification to reduce number of cycles in the
function.
Original function number of cycles: 973
Modified function number of cycles: 835
Improvment factor: 1.165
Tested with: park_joy_420_720p50.y4m
Change-Id: Ic5857272ea3aafe21d5ef9a69258d78c688f69bd
This reverts commit a82001b1cf, reversing
changes made to f6d870f7ae.
This commit breaks windows builds and needs some work to fix those and
some additional comments.
Change-Id: Ic0b0228e36704b127e5e399ce59db26182cfffe7
Just making fdct consistent with iht/idct/fht functions which all use
stride (# of elements) as input argument.
Change-Id: I0ba3c52513a5fdd194f1e7e2901092671398985b
We used set_partition_seg_context() only before calls to:
1. update_partition_context()
2. partition_plane_context()
Moving these functions from vp9_blockd.h to vp9_onyxc_int.h and
inlining set_partition_seg_context into them. After that it is not
necessary to have {above, left}_seg_context fields in MACROBLOCKD struture,
so removing them also.
Change-Id: I4723f59e1c8f3788432b7f51185d8d747b3a97f9
missed one in vp9_detokenize.c in the last
+ add some asserts in vp9_decode_frame() to catch regressions
Change-Id: Ide67505114ee17efdafb13694aed0c09039e5a16
replace VP9D_COMP usage with the (slightly) more targeted
VP9_COMMON/MACROBLCKD/struct segmentation structures.
Change-Id: Iabb3616e231417b0e17b7e4b384ea63167a81745
This 2-pass rate control setting allocates bits based
on first pass stats to each kf group, gf group and individual
frame but does not correct the bits left and allocation after
each frame.
In other words it recommends a bit allocation for each frame
but does not try and correct any over or under spend on a
frame over the remainder of the clip. This reduces the accuracy
of rate control in terms of hitting an average bitrate but prevents
problems that may arise because early frames either use to many
or too few bits. This mode is currently more inclined to undershoot
than overshoot (particularly at higher data rates).
Also minor changes to rate of adaption when recode loop is not
enabled.
This mode is currently enabled by default for VBR.
It gives the following % performance gains.
derf +0.467, +1.072
yt 2.962, 2.645
stdhd 1.682, 1.595,
yt-hd 2.3, 2.174
Change-Id: I3c84a9bf8884e5b345698ff0e19187f792c2f3a0
Delta reduced because of concern about popping on some
very hard clips.
Also allow some frame recode at speed 2 for kf/gf/arf.
Change-Id: Ib47dff42da41aa6eec83b7285fcaaca24abb851e
Renames for consistency with other constants:
NUM_FRAME_TYPES -> FRAME_TYPES
NUM_PARTITION_CONTEXTS -> PARTITION_CONTEXTS
Change-Id: I3db30acb2868eb0a424237c831087b2e264ec47f
This patch fixed a bug that caused 32bit PIC build mismatch. The
stack pointer was modified after "GET_GOT". Loading left pointer
from a hard-coded position gave wrong result.
Change-Id: Iea0aec6f917b12a6b3393ffc986bad74510248cc
Commit "d207 intra prediction ssse3 using bytes" caused mismatch
while building 32bit PIC code. Disabled these SSSE3 functions
until we fix the bug.
Change-Id: Ic444e531d3d4058092fe6eab09006b44fcb18e4c
This commit makes the buffer allocation of zcoeff_blk array in
pick_mode_context block size aware. It calculates the number of
4x4 blocks in the partition and assigns the memory space accordingly.
This process (and the uninitialization) is done once for each encoding
pass. It allows memory copy of smaller buffer when possible.
For football at 600kbps, the runtimes improve by about 1%:
speed 1, 45961ms -> 45472ms
speed 2, 23863ms -> 23598ms
Change-Id: Id2ca24906fa89f46fa5fe742ec4b8efc2a61f877
in most cases at least the left column was a harmless race as it was
left unused later in the code.
Change-Id: I43211df66fb157c6feecf08c681add4fcf18b644
Just making fdct consistent with iht/idct/fht functions which all use
stride (# of elements) as input argument.
Change-Id: Ibc944952a192e6c7b2b6a869ec2894c01da82ed1
Just making fdct consistent with iht/idct/fht functions which all use
stride (# of elements) as input argument.
Change-Id: I2d95fdcbba96aaa0ed24a80870cb38f53487a97d
That makes decoder and encoder (only bitstream writing part) a little bit
simpler and faster. Moving get_sb_index() function to the encoder.
Change-Id: Ie91aaeefd69c84b085948267b33556a7666c6278
coef_counts is now in cpi->mb, instead of cpi. The commit corrected the
mis-use and enable succefual build.
Change-Id: I0e77909d34571cfd2560c66b46b1f8fa0cd1a6b4
Making this change in order to move allow_high_precision_mv field
from MACROBLOCKD structure to VP9_COMMON (because it is a frame level
flag).
Change-Id: I1d006ba36d938e0caf4d40fa051e2e38df9c1108
Just making fdct consistent with iht/idct/fht functions which all use
stride (# of elements) as input argument.
Change-Id: Id623c5113262655fa50f7c9d6cec9a91fcb20bb4
cherry-picked from:
commit 988b70844e03efcfcc075a9bc25d846670494f36
Author: Pascal Massimino <pascal.massimino@gmail.com>
Date: Fri Aug 2 11:15:16 2013 -0700
add WebPWorkerExecute() for convenient bypass
This is mainly for re-using the worker structs without using the
thread.
Change-Id: I8e1be29e53874ef425b15c192fb68036b4c0a359
Original source:
http://git.chromium.org/webm/libwebp.git
100644 blob c0d318aee628fdf9ba4876451a28aa978f1066b8 src/utils/thread.c
100644 blob c2b92c9fe353f8e514f78922f3d237204a9cbc66 src/utils/thread.h
Change-Id: I13fe92b1e94062bb99fdeeb7cb0b4b0575d27793
* changes:
Use a separate MODE_INFO stream for each tile column
Get rid of "this_mi", use "mi_8x8[0]" everywhere instead
Make the static_segmentation feature work again
First pass does not produce compressed data, therefore encode/decode
match check is not initialized.
Change-Id: I1971a6747337872a850987cc70ba267bd0f1d564
The only case where they were intentionally pointing to different
structures was in mbgraph, and this didn't have the expected behavior
because both of these pointers are used interchangeably through the code
Change-Id: I979251782f90885fe962305bcc845bc05907f80c
Moving code that gets band_translate array from get_scan_and_band()
function to get_band_translate() function. Renaming get_scan_and_band() to
get_scan().
Change-Id: I43047c205a1ca2a6e24be44db39dc04b7a385008
This should be similar to what x264 does with --aq-mode 1.
It works well with clips like parkjoy and touhou
(http://x264.nl/developers/Dark_Shikari/LosslessTouhou.mkv).
At low bitrates, the segmentation signaling overhead may negate the
benefits of this feature.
(PGW) Default changed to feature OFF to allow provisional merge.
Change-Id: I938abf9bb487e1d4ad3b0264ea03d9826275c70b
Updated the encoder to handle frames that are coded
intra-only. Intra-only frames must be non-showable,
that is, the "show frame" flag must be set to 0 in
the frame header.
Tested by forcing the ARF frames to be coded intra-
only.
Note: The rate control code will need to be modified
to account for intra-only frames better than they
are currently handled.
Change-Id: I6a9dd5337deddcecc599d3a44a7431909ed21079
Remove the semicolon in the definition of vp9_zero macro. Make all
the use cases of vp9_zero of consistent format.
Change-Id: Ibaf9751e8595872b12766381a93d185a4d90df8f
The commit added check to make sure no invalid memory access even when
the decoder instance is never initialized.
Change-Id: I4da343d0b3c78c27777ac7f5ce7688562c69f0c5
For bad input data, the decoder may access the array out of bounds. The
commit added clamp to prevent such out of bound access
Change-Id: I0a1cfd9b8786ea7113a998053c76605c963b077a
Use the zcoeff_blk buffer of PICK_MODE_CONTEXT to store the indexes
of all-zero-coeff block of the current best mode. Remove the temporary
buffer best_zcoeff_blk defined in the rate-distortion optimization
loop. This improves the speed performance by about 0.5% in all speed
settings.
Change-Id: Ie3e15988ddfa581eafa2e19a8228d3fe4a46095c
This commit moves token_cache buffer into macroblock struct, instead
of defining as a local variable in cost_coeffs. This avoids repeatedly
re-allocating memory space in the rate-distortion optimization loop.
The runtime at speed 0 reduces:
bus 2000kbps, 161692ms to 159951ms
football 600kbps, 229505ms to 225821ms
Change-Id: If7da6b0b6d8c5138a16271a33c4548fba33d8840
"-no-prec-div" option helps codec performance, so it was added back.
"-no-intel-extensions" was added to suppress link warning #10237.
option '-use-asm' is deprecated and removed.
Tested icc 32bit build and 64bit build.
Change-Id: I736ec2619857efd425ef76338dc52f8fbc0bcc7e
Using TREE_SIZE for the following trees:
vp9_intra_mode_tree
vp9_inter_mode_tree
vp9_partition_tree
vp9_switchable_interp_tree
vp9_mv_joint_tree
vp9_mv_class_tree
vp9_mv_class0_tree
vp9_mv_fp_tree
Change-Id: I0212bb4c1ee6648249f68517e28a67a56591ee1b
Values of MODE_UPDATE_PROB and VP9_COEF_UPDATE_PROB are equal, so replacing
them with one constant. Inlining appropriate arguments for functions:
vp9_cond_prob_diff_update (encoder)
vp9_diff_update_prob (decoder)
Change-Id: I1255a1cb477743b799b3bfbbcd8de6b32b067338
Converts the constant rddiv parameter to 128 (from 100) and
implements RDCOST with bit-shift rather than multiplication.
Other parameters are also adjusted to roughly keep the same
balance between Rate and Distortion.
There is a slight speed-up of about 0.5-1% (at speed 0) as
testted on football_cif.
There is a slight change in performance due to small change
in the parameters.
derfraw300: +0.033%
stdhdraw250; +0.102%
Change-Id: I70ac69f58fa71c83108f68fe41796cd19d1fc760
The commit changes to mask available intra prediction modes for test
based on prediction block size.
With this patch, encoding time of CpuUsed 2 reduces from 10% to 20% for
HD clips with a compression drop of 0.2%
Change-Id: I65f320f1237c0f5ae3a355bf7caf447f55625455
When the codec in VBR (or cq) mode hits its max q limits and is
struggling to hit a target bandwidth, the bit target per frame collapses.
In the first instance normal frames cap out at the maximum allowed
Q and then the ARF and GFs do the same. This latter behavior is not
generally desirable as GFs and ARFs are only effective from a quality
and data rate perspective if they have at lease some level of -Q delta
compared to the surrounding frames.
In this patch I define a separate max Q for GFs and ARFs that is
derived from but somewhat lower than that defined for normal frames.
In effect there is a minimum Q delta that will always be available for
GFs and ARFs regardless of the target rate and MAXQ setting.
This may of course mean that the absolute lowest rate obtainable for
a given clip is somewhat higher.
Change-Id: I268868b28401900d0cd87e51e609cd3b784ab54a
We have two SSE2-optimized functions for idct4_1d:
vp9_idct4_1d_sse2 <-- removing this one
idct4_1d_sse2
vp9_idct4_1d_sse2 was used only by the following functions which already
have SSE2 optimized variants:
vp9_idct4x4_16_add_c -> vp9_idct4x4_16_add_see2
idct8_1d -> vp9_idct8x8_{16, 10, 1}_see2
vp9_short_iht4x4_add_c -> vp9_short_iht4x4_add_see2
Change-Id: Ib0a7f6d1373dbaf7a4a41208cd9d0671fdf15edb
To ensure fast encoding/decoding on devices without ssse3 support,
SSE2 optimization of sub-pixel filters was done. Test using 1080p
clip showed the decoder speeds were ~70fps with ssse3 filters, ~60fps
with sse2 filters, and ~15fps with c filters.
Change-Id: Ie2088f87d83a889fba80a613e4d0e287aadd785c
Renames:
fdct4_1d -> fdct4
fadst4_1d -> fadst4
fdct8_1d -> fdct8
fadst8_1d -> fadst8
fdct16_1d -> fdct16
fadst16_1d -> fadst16
"_1d" suffix is redundant, so removing it. The same will happen with idct
in the next change sets.
Change-Id: Ibf421cd2f569146c6079269df7a31819c098265e
This commit re-designs the per transformed block rate-distortion
costs tracking buffers. It removes redundant buffer usage, makes
the needed context memory allocation per VP9_COMP instance and
reuses the same buffer sets inside the rate-distortion optimization
search loop, thereby avoiding repeatedly requiring memory space.
It reduces speed 0 runtime:
bus at 2000 kbps from 166763ms to 158967ms,
football at 600 kbps from 246614ms to 234257ms.
Both about 5% speed-up. Local tests suggest about 2% to 5% speed-up
for speed 1 and 2 settings. This does not change compression
performance.
Change-Id: I363514c5276b5cf9a38c7251088ffc6ab7f9a4c3
Increases these parameters.
There is a small efficiency gain.
Change-Id: Ie5f0ddb39c907d335e0dafa5eb112365a81f4542
derfraw300: +0.091%
stdhdraw250: +0.238%
The intra mode distortion adjustment for skip_encode feature was
broken in the refactoring cc91851. This commit fixes it and tunes
the distortion models used therein.
Change-Id: I0d676e82f8e855536a90cf9b3e3fdefafcd886c6
snprintf is not supported by MSVC, the commit replace it with the msvc
variant _snprintf to enable build.
Change-Id: I686943a78c289bae6b486a5e75effad5f86c24de
Use b_mode_info to store the inter prediction mode of sub8x8 block,
in replacement of the use of partition_info. Remove redundant buffer
update for partition_info. For bus_cif at 2000 kbps, this seem to make
speed 0 about 1% faster.
Change-Id: Id1b3be45e75a24fb4b42335ac480c23e440978f6
When all coefficients are zeros, skip the corresponding 1-D inverse
transform. This practice has been used in the SSE2 implementation of
inverse 32x32 DCT. This commit imports this algorithm into the C code.
Change-Id: I0f58bfcb183a569fab85d524d5d9cf8ae8653f86
We already have itxm_add member in MACROBLOCKD structure. Both
inv_txm4x4_1_add and inv_txm4x4_add are just its special cases for
different eob values. But eob logic is already implemented in
vp9_iwht4x4_add and vp9_idct4x4_add (that's why also removing
inverse_transform_b_4x4_add).
Change-Id: I80bec9b6f7d40c5e5033c613faca5c819c3e6326
For CpuUsed 1 & 2, this commit allow to skip retangular partition check
when NONE is better than SPLIT. It also changed to allow such logic
on alt ref frame coding rather than use square partition all them. The
change has gain compressio about .3% on yt and ythd for both 1&2, It
helped .6% compression on cif and stdhd for both CpuUsed 1&2.
Change-Id: I814b653baf89f59acd20e042629a12938a1bd4e5
Now we have entropy code separate from scan/iscan code. The next step
in future is to move iscan code from common part to the encoder.
Change-Id: Id9732f7d80aec00af35c1d58d1137c4c96c91451
A new set of MSVC warnings were introduced by change
I3f36d3f7cd8d15195a6e2fafd1777cdaf9ecb847
In particular MSVC does not like:-
typedef const int16_t subpel_kernel[SUBPEL_TAPS];
struct subpix_fn_table {
const subpel_kernel *filter_x;
const subpel_kernel *filter_y;
};
causes new warning in MSVC.
warning C4114: same type qualifier used more than once
Change-Id: Iae596fd13aadf36169faf00c68eabe9a32a9b156
This commit allows sub8x8 intra modes test in the rate-distortion
loop for hd sequences in speed 1 and 2.
For sequence y90n of hd set at 8000 kbps, speed 2 runtime goes
from 207s to 210s. For ped_1080p at 3000 kbps, speed 2 runtim goes
from 336s to 337s. Both are running with 300 frames.
This improves compression performance by 0.24% for stdhd and 0.32%
for hd.
Change-Id: I173ca38a6411565ae6cfadd184c42b2070c5de1f
The idea is to have the following names for each transform size:
vp9_idct4x4_add
vp9_idct4x4_1_add
vp9_idct4x4_10_add
vp9_idct4x4_16_add
vp9_idct8x8_add
vp9_idct8x8_1_add
vp9_idct8x8_10_add
vp9_idct8x8_64_add
etc for 16x16, 32x32
The actual list of renames in this patch:
vp9_idct_add_lossless -> vp9_iwht4x4_add
vp9_short_iwalsh4x4_add -> vp9_iwht4x4_16_add
vp9_short_iwalsh4x4_1_add -> vp9_iwht4x4_1_add
vp9_idct_add -> vp9_idct4x4_add
vp9_short_idct4x4_add -> vp9_idct4x4_16_add
vp9_short_idct4x4_1_add -> vp9_idct4x4_1_add
Change-Id: I6f43f7437c68dd30cdd05d72e213765578ed30b1
Speed 4 still does not give a big gain over speed 3.
This just cleans it up a little from the last patch and comments
out features that do not seem to be giving much benefit.
Change-Id: I5f366e6160e1dbe5dc45cf5eb90cc02712baa1b6
Allow selective masking of individual split modes rather than
just a single on / off flag.
For speed 2 recovers the large speed loss seen for some derf
clips in change Ie6bdfa0a370148dd60bd800961077f7e97e67dd4
and a small quality gain.
For speed 1 10 % speed increase observed locally on some derf clips
for minimal quality change.
Change-Id: If86191087b93cbc05351c26c60c7933e2149e485
Moving INTERPOLATIONFILTERTYPE enum and subpix_fn_table struct to
vp9_filter.h. Adding convenient typedef for subpel kernels.
Function vp9_setup_interp_filters() besides setting xd->subpix.filter_x &
xd->subpix.filter_y has a side effect of also setting scale factors. This
is not required inside decode_modes_b() because scale factors have been
already set by set_ref() calls. That's why replacing
vp9_setup_interp_filters() call with newly created vp9_get_filter_kernel()
call. The behavior of vp9_setup_interp_filters() is unchanged (it
is used from the encoder).
Change-Id: I3f36d3f7cd8d15195a6e2fafd1777cdaf9ecb847
This commit removes the redundant second reference frame check in
the rate-distortion optimization loop for sub8x8 blocks.
Change-Id: I13a57a6f624c4a9bcef02ff2a867fa30d8b44a93
This commit defines b_mode_info as a struct type. This will allow
us to further remove the use of PARTITION_INFO in the encoding process.
Change-Id: I975b0f7d557b5e0f66545a61b472def76b671cce
This commit separates the rate-distortion optimization loop of
superblocks from that of sub8x8 blocks. This allows better design
rate-distortion optimization search loop for each setting. It also
removes the use of SPLITMV and I4X4_PRED therein.
No performance change in speed 0 settings. For bus@CIF at 2000kbps,
the speed 1 runtime goes from 48009ms to 43894ms (about 10% faster).
The overall compression performance on derf changed by -0.021%.
Speed 2 runtime goes from 27114ms to 28700ms (6% slower), while the
overall coding efficiency goes up by 1.629% for derf, 1.236% for yt.
Change-Id: Ie6bdfa0a370148dd60bd800961077f7e97e67dd4
In subpixel filters, prefetched source data, unrolled loops,
and interleaved instructions.
In HORIZx4, integrated the idea in Scott's CL (commit:
d22a504d11), which was suggested by
Erik/Tamar from Intel. Further tweaking was done to combine row 0,
2, and row 1, 3 in registers to do more 2-row-in-1 operations until
the last add.
Test showed a ~2% decoder speedup.
Change-Id: Ib53d04ede8166c38c3dc744da8c6f737ce26a0e3
Substantial reworking of the speed vs quality trade offs for
speed 1 and 2.
In this patch I am attempting to freeze the "quality" meaning of
speeds 1 and 2 relative to speed 0 so that in future we can
better evaluate progress.
I am targeting :
Speed 1 quality ~-5% vs speed 0.
Speed 2 quality ~-10% vs speed 0
It is inevitable that quality will still fluctuate a little as we adjust
settings and add new features, but we will attempt to keep as
close as possible to these values. Above speed 2 things will remain
a bit more fluid for now.
In this patch speed 1 is approximately 4-5x as fast as speed 0. This
is similar to before but the quality hit is a lot less. Likewise speed 2
is approximately 2x as fast as speed 1 but is similar in quality to the
previous speed 1 configuration.
Also slight change to behavior of FLAG_EARLY_TERMINATE to insure
all reference frames get at least one rd test. Important for very low
variance regions.
WIP :- Added a new speed level with old speed 4 becoming speed 5.
Speed 3 and 4 tradeoffs still WIP
Change-Id: Ic7a38dd7b5b63ab1501f9352411972f480ac6264
This commit causes use last partition to consider whether a 64x64 has
motion that might make a new partitioning worth while.
Change-Id: I3a57bedef4f3cd961fadbfa96651c206fa36da4a
Simplify the k_cvtlo_epi16 and k_cvthi_epi16 to only two
instructions. Then inlined them.
quoting from intel MMX_App_Compute_16bit_Vector.pdf
"The PMADDWD instruction multiplies four
pairs of 16-bit numbers and produces partial sums of the results
and can do so once per clock (with a three-clock latency)."
so I am assuming that there will be three clock overhead after the
last _mm_madd_pi16 command.
Even with the overhead the number of clocks in general should be
smaller. I am not sure though becasue I could not find information
about number of clocks required for instructions in k_cvtlo_epi16
and k_cvthi_epi16. I will run a test and compare the execution time.
Change-Id: Ieda4aa338f69ad3dd196ac6e7892da3cf1b47ea7
Moving functions from vp9_idct_blk to vp9_idct because these functions are
used from both encoder and decoder. Removing duplicated code from
vp9_encodemb.c and reusing existing functions.
Change-Id: Ia0a6782f8c4c409efb891651b871dd4bf22d5fe8
The codec should effectively run with motion vector of range (-2048, 2047)
in full pixels, for sequences of 1080p and below. Add assertions to clarify
this behavior.
Change-Id: Ia0cac28249f587d8f8882205228fa480263ab313
Moving out decode_tokens function calls and adding decode_blocks boolean
variable. We only have to decode if eobtotal > 0, i.e. we have at least one
non-zero coefficient. Also inlining and remove vp9_set_pred_flag_mbskip
function.
Change-Id: I7be38b12ee8206faf0beea2bbf4d52be42575b03
The declaration of the bilinear filters specified an alignment clause
in the implementation file but not in the header. This turned out
to be harmless, but it did cause linker warnings to be emitted when
building on Windows.
The (extern) declaration in the header was changed, to match the
declaration in the implementation.
Change-Id: I44be89b1572fe9a50fa47a42e4db9128c4897b04
Interleaved the instructions, reduced register dependency, and
prefetched the source data. This improved the decoder speed
by 0.6% - 2%.
Change-Id: I568067aa0c629b2e58219326899c82aedf7eccca
byte version of ronalds d153 ssse3 optimizations for
4x4 and 8x8
(commit: fc91a2a112238a1aee568f3b840585de4e928fca)
Change-Id: Iec4426032311483f615fd9e0dceba3ee85ddebd7
Make encoder skip rectangular partition check in speed 1 and above,
when early termination was triggered in partition split.
Thanks Guillaume (gmartres@) for catching this issue.
This change makes bus_cif at 2000kbps speed 1 runtime goes down from
25612ms to 23438ms (about 9% speed-up), at the expense of -0.235%
performance down.
Change-Id: I98613fad081a261d30d5fa206f934ca70601c180
We don't need these functions anymore. The only one which was actually
used is vp9_add_constant_residual_32x32. Addition of
vp9_short_idct32x32_1_add eliminates this single usage. SSE2 optimized
version of vp9_short_idct32x32_1_add will be added in the next patch set,
right now it is only C implementation. Now we have all idct functions
implemented in a consistent manner.
Change-Id: I63df79a13cf62aa2c9360a7a26933c100f9ebda3
The code now takes into account temporal and spatial
information to determine the partition size range, but the
frequency counts have been removed.
The net effect is similar in quality but about 10% faster.
Change-Id: I39a513fb79cec9177b73b2a7218f0da70963ae95
This patch deletes the variance based speed three partitioning.
Speed 3 now uses the same partitioning method as speed 2
but with some stricter conditions.
The speed and quality are now somewhere between speeds 2 and 4
whereas before it was worse in both than speed 4.
Change-Id: Ia142e7007299d79db3ceee6ca8670540db6f7a41
apparently we are going to have trouble completely removing lint issue in this file.
It needs a bit more work. We need to include vpx_config.h to know whether
we need to have multi threading . and that means vpx_config.h has to come
before the system headers. ( a violation )
Change-Id: I023feeab1bf5643b79dccc3b80a4a9ad42689e7b
Signed-off-by: Jim Bankoski <jimbankoski@google.com>
Include the whitespace after the first argument's comma in the
optional first argument group.
This fixes a minor style regression in the converted output
since 2a233dd31.
Change-Id: I254f4aaff175e2d728d9b6f3c12ede03846adcf1
Both vp9_init_mbmode_probs() and vp9_zero(cm->ref_frame_sign_bias) are
called inside vp9_setup_past_independence() which called in any case for
encoder/decoder after VP9_COMMON struct creation.
Change-Id: I3724d1a4fb8060101ff0290dd6a158f0b5c57bb4
Replace current code which corrupts the stack by
duplicate of vp8 code to save and restore neon
registers.
Change-Id: Ibb0220b9aa985d10533befa0a455ebce57a2891a
Some small changes to the quantizer mapping functions.
Also includes some cleanups.
Change-Id: I9dea29b24015f6e6697012a0e4d8983049d8e5c7
Results:
derfraw300: +0.106%
stdhdraw250: +0.139%
Don't divide RDMULT and RDDIV by 100 when RDMULT > 1000. This was
probably done to avoid overflow when the rd cost was stored in a 32 bits
integer but this is not the case anymore. This change will make it easier
to support multiple quantizers per frame.
derf compression gain at speed 0: 0.037%
Change-Id: Ibeeb9b7cfa1a132a7af41bc90fc07a3bba0857f6
Jenkins warns on left shift of negative numbers and non-aligned read
of int. This commit fixed the two issues.
Change-Id: I389a7fb6a572c643902e40a4c10fefef94500d2c
- full ASM version, no more C gateway file.
- integrate combine-add with last step of 2nd pass.
- remove a few push/pop pairs.
- some instruction reordering to hide latency.
Change-Id: Ic9d9933c908b65d1bf7ba8fd47b524cda808c9c6
Both first pass and mbgraph search use block size 16x16 for motion
estimation. This commit put a limit of motion vector range. The
effective range allows the entire 16x16 with required subpel
interpolation input to be completely outside image border, but
not any further away from image border.
Change-Id: Id70a5ed08be49e70959f064859d72adc7d775d08
INT64_MAX may be assigned as RDCOST when RDCSOST computation is skipped
for speed, this commit to prevent INT64_MAX from being used as real
RDCOST in transform size decision.
Change-Id: I89a945134191bbdea1f1431ade70424ac079eaac
After change of MI context storage , mi_8x8[] pointer may be null for
a block outside of image border. The commit changes to access the data
only after validation of mi_row and mi_col.
Change-Id: I039c4eb486a228ea9d8e5f35ab9ae6717d718bf3
The probability model used to code prediction mode is conditioned
on the immediate above and left 8x8 blocks' prediction modes. When
the above/left block is coded in sub8x8 mode, we use the prediction
mode of the bottom-right sub8x8 block as the reference to generate
the context.
This commit moves the update of mbmi.mode out of the sub8x8 decoding
loop, hence removing redundant update steps and keeping the bottom-
right block's mode for the decoding process of next blocks.
Change-Id: I1e8d749684d201c1a1151697621efa5d569218b6
39c7b01d accidently reverted the row/col initialization, which broke
mv clamps, which is dependent on the sites for valid motion vector
range. This commit fixed the issue.
Change-Id: Ibcce0226e0360b1ef483fe760b2e33f1af4bf494
Added hiding global symbols for macho32 and macho64 in x86inc.asm.
This was done to fix exported symbol issue in Chrome build.
Change-Id: I08d5c559b985b82f655b537469fee125615e78c0
This commit enables forcing all coefficients zero per transformed
block, when its rate-distortion cost is lower than regular coeff
quantization.
The overall performance improvement (including its parent patch on
calculating rd cost per transformed block) at speed 1:
derf: 0.298%
yt: 0.452%
hd: 0.741%
stdhd: 0.006%
Change-Id: I66005fe0fd7af192c3eba32e02fd6d77952accb5
Adds modeled functions to decide the qp for altref frames in constant q
mode similar to other functions in use in bitrate mode.
Also turns on the constrained quality mode (end-usage=2) option which
was turned off before. Basic testing shows the mode works in principle,
to cap bitrate to the target-bitrate specified, while allowing lower
bitrate depending on the cq-level specified. The mode will need to be
improved over time.
Results for constant quality vs bitrate control mode:
derfraw300/fullderfraw: +3.0% at constant quality over bitrate control.
fullstdhdraw: +4.341%
stdhdraw250: +5.361%
Change-Id: If5027c9ec66c8e88d33e47062c6cb84a07b1cda9
This commit makes the rate-distortion optimization loop evaluate
the rd costs of regular quantization and all zero coeffs, per
transformed block. It improves speed 1 compression performance:
derf: 0.245%
yt: 0.515%
For a large partition that consists multiple transformed blocks,
this allows more flexibility to selectively force a portion of
them coded as all zero coeffs, as well be continued in the next
patches.
Change-Id: I211518be4179747b57375696f017d1160cc91851
The sub8x8 blocks has its own motion vector reference scheme. The
mv_pred is only used blocks of sizes 8x8 and above, to find the
starting point for motion search.
This change does not change any coding behavior. It makes the
encoding process slightly faster. (0.5% speed-up for local test on
speed 1.)
Change-Id: I746ee6ef0eac19aa3621be014afa12be8d82cbb9
The fake token EOSB may cause invaild memory read in pack token, this
commit reworked the loop to avoid such invalid read.
Change-Id: I37fdfce869b44a7f90003f82a02f84c45472a457
Now the same regexp that previously handled cases such as
"ldr r1, [r2, -r3]" also can handle the first operand being omitted
as in "pld [r2, -r3]".
This fixes building vp9_convolve8*neon.asm in thumb mode (and thus,
for Windows Phone as well).
Change-Id: I20c1c3f2bfb2587fb5fa523b863972a7fe30d8ff
Current x86inc.asm didn't handle 32bit PIC build properly.
TEXTRELs were seen in the library built. The PIC macros from
libvpx's x86_abi_support.asm was used to fix this problem.
The assembly code was modified to use the macros.
Notes: We need this fix in for decoder building. Functions in
encoder will be fixed later.
Change-Id: Ifa548d37b1d0bc7d0528db75009cc18cd5eb1838
This commit cleans up the second reference check in the
rate-distortion optimization loop of sub8x8 blocks.
Change-Id: Ife68feaa4cddbfad2878c9b44d3012788d634f97
Modified the resize unit test so that it optionally
writes the encoded bitstream to file. The macro
WRITE_COMPRESSED_STREAM should be set to 1 to enable
output of the test bitstream; it is set to 0 by default.
Change-Id: I7d436b1942f935da97db6d84574a98d379f57fb1
The sub8x8 check can be directly inferred from block_idx, hence
removed from the arguments if get_sub_block_mv.
Change-Id: Ib766d57e81248fb92df0f6d9b163e6c77b933ccd
This commit reworked the unit test for 8x8 forward transform. It
allows scalability to cover various implemented versions.
Change-Id: I5594bd3e2307bb5bec764eaffd8860caa260e432
Jenkins was failing to detect the case where an existing
file is recreated with new content. In this case, thinking
that the file already existed, Jenkins did not re-copy the
file as it should have.
By adding the file test-data.sha1 as a dependendency to
the LIBVPX_TEST_DATA build target the files will be
recopied if the MD5 of an existing file changes.
This could be further improved to only copy files that
have changed rather than copying the whole set as done in
this patch.
(Thanks to jzern@ who diagnozed ithe problem and suggested
this fix).
Change-Id: Icea7c61a95189bc639fec83020c28c70da5b2b41
Reduce the delta loop filter for blocks that are cyclicly refreshed.
This helps to reduce the dot artifacts that may happen
when zero_mv blocks are repeatedly loop-filtered.
This change, along with the fix in:
https://gerrit.chromium.org/gerrit/#/c/40409/
helps to reduce this artifact, but cannot remove the dot artifacts completely.
Change-Id: I44675e7a0f59295b648a3b7d4956fb301231a97f
2013-01-11 16:46:09 -08:00
1150 changed files with 172722 additions and 196443 deletions
Description: SmartyPants is a web publishing utility that translates plain ASCII punctuation characters into “smart” typographic punctuation HTML entities. This plugin <strong>replace the default WordPress Texturize algorithm</strong> for the content and the title of your posts, the comments body and author name, and everywhere else Texturize normally apply. Based on the original Perl version by <a href="http://daringfireball.net/">John Gruber</a>.
Version: 1.5.1e
Author: Michel Fortin
Author URI: http://www.michelf.com/
*/
if(isset($wp_version)){
# Remove default Texturize filter that would conflict with SmartyPants.
report_error(TYPE_ERROR, 'Language file contains a $language_data[\'CASE_KEYWORDS\'] specification which is neither of GESHI_CAPS_NO_CHANGE, GESHI_CAPS_LOWER nor GESHI_CAPS_UPPER!');
}
if(!isset($language_data['KEYWORDS'])) {
report_error(TYPE_ERROR, 'Language file contains no $language_data[\'KEYWORDS\'] structure to check!');
} else if (!is_array($language_data['KEYWORDS'])) {
report_error(TYPE_ERROR, 'Language file contains a $language_data[\'KEYWORDS\'] structure which is not an array!');
} else {
foreach($language_data['KEYWORDS'] as $kw_key => $kw_value) {
if(!is_integer($kw_key)) {
report_error(TYPE_WARNING, "Language file contains an key '$kw_key' in \$language_data['KEYWORDS'] that is not integer!");
} else if (!is_array($kw_value)) {
report_error(TYPE_ERROR, "Language file contains a \$language_data['CASE_SENSITIVE']['$kw_value'] structure which is not an array!");
}
}
}
if(!isset($language_data['SYMBOLS'])) {
report_error(TYPE_ERROR, 'Language file contains no $language_data[\'SYMBOLS\'] structure to check!');
} else if (!is_array($language_data['SYMBOLS'])) {
report_error(TYPE_ERROR, 'Language file contains a $language_data[\'SYMBOLS\'] structure which is not an array!');
}
if(!isset($language_data['CASE_SENSITIVE'])) {
report_error(TYPE_ERROR, 'Language file contains no $language_data[\'CASE_SENSITIVE\'] structure to check!');
} else if (!is_array($language_data['CASE_SENSITIVE'])) {
report_error(TYPE_ERROR, 'Language file contains a $language_data[\'CASE_SENSITIVE\'] structure which is not an array!');
} else {
foreach($language_data['CASE_SENSITIVE'] as $cs_key => $cs_value) {
if(!is_integer($cs_key)) {
report_error(TYPE_WARNING, "Language file contains an key '$cs_key' in \$language_data['CASE_SENSITIVE'] that is not integer!");
} else if (!is_bool($cs_value)) {
report_error(TYPE_ERROR, "Language file contains a Case Sensitivity specification for \$language_data['CASE_SENSITIVE']['$cs_value'] which is not a boolean!");
}
}
}
if(!isset($language_data['URLS'])) {
report_error(TYPE_ERROR, 'Language file contains no $language_data[\'URLS\'] structure to check!');
} else if (!is_array($language_data['URLS'])) {
report_error(TYPE_ERROR, 'Language file contains a $language_data[\'URLS\'] structure which is not an array!');
} else {
foreach($language_data['URLS'] as $url_key => $url_value) {
if(!is_integer($url_key)) {
report_error(TYPE_WARNING, "Language file contains an key '$url_key' in \$language_data['URLS'] that is not integer!");
} else if (!is_string($url_value)) {
report_error(TYPE_ERROR, "Language file contains a Documentation URL specification for \$language_data['URLS']['$url_value'] which is not a string!");
} else if (preg_match('#&([^;]*(=|$))#U', $url_value)) {
report_error(TYPE_ERROR, "Language file contains unescaped ampersands (&) in \$language_data['URLS']!");
}
}
}
if(!isset($language_data['OOLANG'])) {
report_error(TYPE_ERROR, 'Language file contains no $language_data[\'OOLANG\'] specification!');
} else if (!is_int($language_data['OOLANG']) && !is_bool($language_data['OOLANG'])) {
report_error(TYPE_ERROR, 'Language file contains a $language_data[\'OOLANG\'] specification which is neither boolean nor integer!');
} else if (false !== $language_data['OOLANG'] &&
true !== $language_data['OOLANG'] &&
2 !== $language_data['OOLANG']) {
report_error(TYPE_ERROR, 'Language file contains a $language_data[\'OOLANG\'] specification which is neither of false, true or 2!');
}
if(!isset($language_data['OBJECT_SPLITTERS'])) {
report_error(TYPE_ERROR, 'Language file contains no $language_data[\'OBJECT_SPLITTERS\'] structure to check!');
} else if (!is_array($language_data['OBJECT_SPLITTERS'])) {
report_error(TYPE_ERROR, 'Language file contains a $language_data[\'OBJECT_SPLITTERS\'] structure which is not an array!');
}
if(!isset($language_data['REGEXPS'])) {
report_error(TYPE_ERROR, 'Language file contains no $language_data[\'REGEXPS\'] structure to check!');
} else if (!is_array($language_data['REGEXPS'])) {
report_error(TYPE_ERROR, 'Language file contains a $language_data[\'REGEXPS\'] structure which is not an array!');
report_error(TYPE_ERROR, 'Language file contains a $language_data[\'STRICT_MODE_APPLIES\'] specification which is neither of GESHI_MAYBE, GESHI_ALWAYS nor GESHI_NEVER!');
}
if(!isset($language_data['SCRIPT_DELIMITERS'])) {
report_error(TYPE_ERROR, 'Language file contains no $language_data[\'SCRIPT_DELIMITERS\'] structure to check!');
} else if (!is_array($language_data['SCRIPT_DELIMITERS'])) {
report_error(TYPE_ERROR, 'Language file contains a $language_data[\'SCRIPT_DELIMITERS\'] structure which is not an array!');
report_error(TYPE_ERROR, "Language file contains no \$language_data['CASE_SENSITIVE'] specification for keyword group $key!");
}
if(!isset($language_data['URLS'][$key])) {
report_error(TYPE_ERROR, "Language file contains no \$language_data['URLS'] specification for keyword group $key!");
}
if(empty($keywords)) {
report_error(TYPE_WARNING, "Language file contains an empty keyword list in \$language_data['KEYWORDS'] for group $key!");
}
foreach($keywords as $id => $kw) {
if(!is_string($kw)) {
report_error(TYPE_WARNING, "Language file contains an non-string entry at \$language_data['KEYWORDS'][$key][$id]!");
} else if (!strlen($kw)) {
report_error(TYPE_ERROR, "Language file contains an empty string entry at \$language_data['KEYWORDS'][$key][$id]!");
} else if (preg_match('/^([\(\)\{\}\[\]\^=.,:;\-+\*\/%\$\"\'\?]|&[\w#]\w*;)+$/i', $kw)) {
report_error(TYPE_NOTICE, "Language file contains an keyword ('$kw') at \$language_data['KEYWORDS'][$key][$id] which seems to be better suited for the symbols section!");
report_error(TYPE_WARNING, "Language file contains an superfluous \$language_data['CASE_SENSITIVE'] specification for non-existing keyword group $key!");
}
}
foreach($language_data['URLS'] as $key => $keywords) {
if(!isset($language_data['KEYWORDS'][$key])) {
report_error(TYPE_WARNING, "Language file contains an superfluous \$language_data['URLS'] specification for non-existing keyword group $key!");
}
}
foreach($language_data['STYLES']['KEYWORDS'] as $key => $keywords) {
if(!isset($language_data['KEYWORDS'][$key])) {
report_error(TYPE_WARNING, "Language file contains an superfluous \$language_data['STYLES']['KEYWORDS'] specification for non-existing keyword group $key!");
}
}
foreach($language_data['COMMENT_SINGLE'] as $ck => $cv) {
if(!is_int($ck)) {
report_error(TYPE_WARNING, "Language file contains an key '$ck' in \$language_data['COMMENT_SINGLE'] that is not integer!");
}
if(!is_string($cv)) {
report_error(TYPE_WARNING, "Language file contains an non-string entry at \$language_data['COMMENT_SINGLE'][$ck]!");
report_error(TYPE_NOTICE, "Language file contains an superfluous \$language_data['STYLES']['COMMENTS'] specification for Single Line or Regular-Expression Comment key $ck!");
}
}
if (isset($language_data['STYLES']['STRINGS']['HARD'])) {
if (empty($language_data['HARDQUOTE'])) {
report_error(TYPE_NOTICE, "Language file contains superfluous \$language_data['STYLES']['STRINGS'] specification for key 'HARD', but no 'HARDQUOTE's are defined!");
report_error(TYPE_WARNING, "Language file contains a regular expression with an unescaped match for a pipe character '|' which needs escaping as '<PIPE>' instead at \$language_data['REGEXPS'][$rk]!");
}
}
} elseif(is_array($rv)) {
if(!isset($rv[GESHI_SEARCH])) {
report_error(TYPE_ERROR, "Language file contains no GESHI_SEARCH entry in extended regular expression at \$language_data['REGEXPS'][$rk]!");
} elseif(!is_string($rv[GESHI_SEARCH])) {
report_error(TYPE_ERROR, "Language file contains a GESHI_SEARCH entry in extended regular expression at \$language_data['REGEXPS'][$rk] which is not a string!");
report_error(TYPE_WARNING, "Language file contains a regular expression with an unescaped match for a pipe character '|' which needs escaping as '<PIPE>' instead at \$language_data['REGEXPS'][$rk]!");
}
}
if(!isset($rv[GESHI_REPLACE])) {
report_error(TYPE_WARNING, "Language file contains no GESHI_REPLACE entry in extended regular expression at \$language_data['REGEXPS'][$rk]!");
} elseif(!is_string($rv[GESHI_REPLACE])) {
report_error(TYPE_ERROR, "Language file contains a GESHI_REPLACE entry in extended regular expression at \$language_data['REGEXPS'][$rk] which is not a string!");
}
if(!isset($rv[GESHI_MODIFIERS])) {
report_error(TYPE_WARNING, "Language file contains no GESHI_MODIFIERS entry in extended regular expression at \$language_data['REGEXPS'][$rk]!");
} elseif(!is_string($rv[GESHI_MODIFIERS])) {
report_error(TYPE_ERROR, "Language file contains a GESHI_MODIFIERS entry in extended regular expression at \$language_data['REGEXPS'][$rk] which is not a string!");
}
if(!isset($rv[GESHI_BEFORE])) {
report_error(TYPE_WARNING, "Language file contains no GESHI_BEFORE entry in extended regular expression at \$language_data['REGEXPS'][$rk]!");
} elseif(!is_string($rv[GESHI_BEFORE])) {
report_error(TYPE_ERROR, "Language file contains a GESHI_BEFORE entry in extended regular expression at \$language_data['REGEXPS'][$rk] which is not a string!");
}
if(!isset($rv[GESHI_AFTER])) {
report_error(TYPE_WARNING, "Language file contains no GESHI_AFTER entry in extended regular expression at \$language_data['REGEXPS'][$rk]!");
} elseif(!is_string($rv[GESHI_AFTER])) {
report_error(TYPE_ERROR, "Language file contains a GESHI_AFTER entry in extended regular expression at \$language_data['REGEXPS'][$rk] which is not a string!");
}
} else {
report_error(TYPE_WARNING, "Language file contains an non-string and non-array entry at \$language_data['REGEXPS'][$rk]!");
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodenable_classes">GeSHi::enable_classes()</a> in geshi.php</div>
<divclass="index-item-description">Sets whether CSS classes should be used to highlight the source. Default is off, calling this method with no arguments will turn it on</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodenable_multiline_span">GeSHi::enable_multiline_span()</a> in geshi.php</div>
<divclass="index-item-description">Sets wether spans and other HTML markup generated by GeSHi can span over multiple lines or not. Defaults to true to reduce overhead.</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodenable_strict_mode">GeSHi::enable_strict_mode()</a> in geshi.php</div>
<divclass="index-item-description">Enables/disables strict highlighting. Default is off, calling this method without parameters will turn it on. See documentation for more details on strict mode and where to use it.</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodget_language_name_from_extension">GeSHi::get_language_name_from_extension()</a> in geshi.php</div>
<divclass="index-item-description">Given a file extension, this method returns either a valid geshi language name, or the empty string if it couldn't be found</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodget_stylesheet">GeSHi::get_stylesheet()</a> in geshi.php</div>
<divclass="index-item-description">Returns a stylesheet for the highlighted code. If $economy mode is true, we only return the stylesheet declarations that matter for this code block instead of the whole thing</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_brackets_highlighting">GeSHi::set_brackets_highlighting()</a> in geshi.php</div>
<divclass="index-item-description">Turns highlighting on/off for brackets</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_brackets_style">GeSHi::set_brackets_style()</a> in geshi.php</div>
<divclass="index-item-description">Sets the styles for brackets. If $preserve_defaults is true, then styles are merged with the default styles, with the user defined styles having priority</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_comments_highlighting">GeSHi::set_comments_highlighting()</a> in geshi.php</div>
<divclass="index-item-description">Turns highlighting on/off for comment groups</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_comments_style">GeSHi::set_comments_style()</a> in geshi.php</div>
<divclass="index-item-description">Sets the styles for comment groups. If $preserve_defaults is true, then styles are merged with the default styles, with the user defined styles having priority</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_escape_characters_highlighting">GeSHi::set_escape_characters_highlighting()</a> in geshi.php</div>
<divclass="index-item-description">Turns highlighting on/off for escaped characters</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_escape_characters_style">GeSHi::set_escape_characters_style()</a> in geshi.php</div>
<divclass="index-item-description">Sets the styles for escaped characters. If $preserve_defaults is true, then styles are merged with the default styles, with the user defined styles having priority</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_highlight_lines_extra_style">GeSHi::set_highlight_lines_extra_style()</a> in geshi.php</div>
<divclass="index-item-description">Sets the style for extra-highlighted lines</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_keyword_group_highlighting">GeSHi::set_keyword_group_highlighting()</a> in geshi.php</div>
<divclass="index-item-description">Turns highlighting on/off for a keyword group</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_keyword_group_style">GeSHi::set_keyword_group_style()</a> in geshi.php</div>
<divclass="index-item-description">Sets the style for a keyword group. If $preserve_defaults is true, then styles are merged with the default styles, with the user defined styles having priority</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_language_path">GeSHi::set_language_path()</a> in geshi.php</div>
<divclass="index-item-description">Sets the path to the directory containing the language files. Note that this path is relative to the directory of the script that included geshi.php, NOT geshi.php itself.</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_numbers_style">GeSHi::set_numbers_style()</a> in geshi.php</div>
<divclass="index-item-description">Sets the styles for numbers. If $preserve_defaults is true, then styles are merged with the default styles, with the user defined styles having priority</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_overall_class">GeSHi::set_overall_class()</a> in geshi.php</div>
<divclass="index-item-description">Sets the overall classname for this block of code. This class can then be used in a stylesheet to style this object's output</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_overall_id">GeSHi::set_overall_id()</a> in geshi.php</div>
<divclass="index-item-description">Sets the overall id for this block of code. This id can then be used in a stylesheet to style this object's output</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_overall_style">GeSHi::set_overall_style()</a> in geshi.php</div>
<divclass="index-item-description">Sets the styles for the code that will be outputted when this object is parsed. The style should be a string of valid stylesheet declarations</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_regexps_style">GeSHi::set_regexps_style()</a> in geshi.php</div>
<divclass="index-item-description">Sets the styles for regexps. If $preserve_defaults is true, then styles are merged with the default styles, with the user defined styles having priority</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_strings_style">GeSHi::set_strings_style()</a> in geshi.php</div>
<divclass="index-item-description">Sets the styles for strings. If $preserve_defaults is true, then styles are merged with the default styles, with the user defined styles having priority</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_symbols_style">GeSHi::set_symbols_style()</a> in geshi.php</div>
<divclass="index-item-description">Sets the styles for symbols. If $preserve_defaults is true, then styles are merged with the default styles, with the user defined styles having priority</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_url_for_keyword_group">GeSHi::set_url_for_keyword_group()</a> in geshi.php</div>
<divclass="index-item-description">Sets the base URL to be used for keywords</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_use_language_tab_width">GeSHi::set_use_language_tab_width()</a> in geshi.php</div>
<divclass="index-item-description">Sets whether or not to use tab-stop width specifed by language</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodenable_classes">GeSHi::enable_classes()</a> in geshi.php</div>
<divclass="index-item-description">Sets whether CSS classes should be used to highlight the source. Default is off, calling this method with no arguments will turn it on</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodenable_multiline_span">GeSHi::enable_multiline_span()</a> in geshi.php</div>
<divclass="index-item-description">Sets wether spans and other HTML markup generated by GeSHi can span over multiple lines or not. Defaults to true to reduce overhead.</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodenable_strict_mode">GeSHi::enable_strict_mode()</a> in geshi.php</div>
<divclass="index-item-description">Enables/disables strict highlighting. Default is off, calling this method without parameters will turn it on. See documentation for more details on strict mode and where to use it.</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodget_language_name_from_extension">GeSHi::get_language_name_from_extension()</a> in geshi.php</div>
<divclass="index-item-description">Given a file extension, this method returns either a valid geshi language name, or the empty string if it couldn't be found</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodget_stylesheet">GeSHi::get_stylesheet()</a> in geshi.php</div>
<divclass="index-item-description">Returns a stylesheet for the highlighted code. If $economy mode is true, we only return the stylesheet declarations that matter for this code block instead of the whole thing</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_brackets_highlighting">GeSHi::set_brackets_highlighting()</a> in geshi.php</div>
<divclass="index-item-description">Turns highlighting on/off for brackets</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_brackets_style">GeSHi::set_brackets_style()</a> in geshi.php</div>
<divclass="index-item-description">Sets the styles for brackets. If $preserve_defaults is true, then styles are merged with the default styles, with the user defined styles having priority</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_comments_highlighting">GeSHi::set_comments_highlighting()</a> in geshi.php</div>
<divclass="index-item-description">Turns highlighting on/off for comment groups</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_comments_style">GeSHi::set_comments_style()</a> in geshi.php</div>
<divclass="index-item-description">Sets the styles for comment groups. If $preserve_defaults is true, then styles are merged with the default styles, with the user defined styles having priority</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_escape_characters_highlighting">GeSHi::set_escape_characters_highlighting()</a> in geshi.php</div>
<divclass="index-item-description">Turns highlighting on/off for escaped characters</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_escape_characters_style">GeSHi::set_escape_characters_style()</a> in geshi.php</div>
<divclass="index-item-description">Sets the styles for escaped characters. If $preserve_defaults is true, then styles are merged with the default styles, with the user defined styles having priority</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_highlight_lines_extra_style">GeSHi::set_highlight_lines_extra_style()</a> in geshi.php</div>
<divclass="index-item-description">Sets the style for extra-highlighted lines</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_keyword_group_highlighting">GeSHi::set_keyword_group_highlighting()</a> in geshi.php</div>
<divclass="index-item-description">Turns highlighting on/off for a keyword group</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_keyword_group_style">GeSHi::set_keyword_group_style()</a> in geshi.php</div>
<divclass="index-item-description">Sets the style for a keyword group. If $preserve_defaults is true, then styles are merged with the default styles, with the user defined styles having priority</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_language_path">GeSHi::set_language_path()</a> in geshi.php</div>
<divclass="index-item-description">Sets the path to the directory containing the language files. Note that this path is relative to the directory of the script that included geshi.php, NOT geshi.php itself.</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_numbers_style">GeSHi::set_numbers_style()</a> in geshi.php</div>
<divclass="index-item-description">Sets the styles for numbers. If $preserve_defaults is true, then styles are merged with the default styles, with the user defined styles having priority</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_overall_class">GeSHi::set_overall_class()</a> in geshi.php</div>
<divclass="index-item-description">Sets the overall classname for this block of code. This class can then be used in a stylesheet to style this object's output</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_overall_id">GeSHi::set_overall_id()</a> in geshi.php</div>
<divclass="index-item-description">Sets the overall id for this block of code. This id can then be used in a stylesheet to style this object's output</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_overall_style">GeSHi::set_overall_style()</a> in geshi.php</div>
<divclass="index-item-description">Sets the styles for the code that will be outputted when this object is parsed. The style should be a string of valid stylesheet declarations</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_regexps_style">GeSHi::set_regexps_style()</a> in geshi.php</div>
<divclass="index-item-description">Sets the styles for regexps. If $preserve_defaults is true, then styles are merged with the default styles, with the user defined styles having priority</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_strings_style">GeSHi::set_strings_style()</a> in geshi.php</div>
<divclass="index-item-description">Sets the styles for strings. If $preserve_defaults is true, then styles are merged with the default styles, with the user defined styles having priority</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_symbols_style">GeSHi::set_symbols_style()</a> in geshi.php</div>
<divclass="index-item-description">Sets the styles for symbols. If $preserve_defaults is true, then styles are merged with the default styles, with the user defined styles having priority</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_url_for_keyword_group">GeSHi::set_url_for_keyword_group()</a> in geshi.php</div>
<divclass="index-item-description">Sets the base URL to be used for keywords</div>
<divclass="index-item-details"><ahref="geshi/core/GeSHi.html#methodset_use_language_tab_width">GeSHi::set_use_language_tab_width()</a> in geshi.php</div>
<divclass="index-item-description">Sets whether or not to use tab-stop width specifed by language</div>
<pclass="description"><p>The GeSHi class for Generic Syntax Highlighting. Please refer to the documentation at http://qbnz.com/highlighter/documentation.php for more information about how to use this class.</p><p>For changes, release notes, TODOs etc, see the relevant files in the docs/ directory.</p><p>This file is part of GeSHi.</p><p>GeSHi is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.</p><p>GeSHi is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.</p><p>You should have received a copy of the GNU General Public License along with GeSHi; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA</p></p>
<!-- ========== Info from phpDoc block ========= -->
<pclass="short-description">Use a "table" to surround the source:</p>
<pclass="description"><p><table><thead><tr><td colspan="2">$header</td></tr></thead><tbody><tr><td><pre>$linenumbers</pre></td><td><pre>$code></pre></td></tr></tbody><tfooter><tr><td colspan="2">$footer</td></tr></tfoot></table></p><p>this is essentially only a workaround for Firefox, see sf#1651996 or take a look at https://bugzilla.mozilla.org/show_bug.cgi?id=365805</p></p>
<ulclass="tags">
<li><spanclass="field">note:</span> when linenumbers are disabled this is essentially the same as GESHI_HEADER_PRE</li>
<spanclass="var-name">$string</span><spanclass="var-description">: The code to highlight</span></li>
<li>
<spanclass="var-type">string</span>
<spanclass="var-name">$language</span><spanclass="var-description">: The language to highlight the code in</span></li>
<li>
<spanclass="var-type">string</span>
<spanclass="var-name">$path</span><spanclass="var-description">: The path to the language files. You can leave this blank if you need as from version 1.0.7 the path should be automatically detected</span></li>
<li>
<spanclass="var-type">boolean</span>
<spanclass="var-name">$return</span><spanclass="var-description">: Whether to return the result or to echo</span></li>
</ul>
</div>
</div>
</div>
<pclass="notes"id="credit">
Documentation generated on Thu, 25 Dec 2008 14:34:34 +0100 by <ahref="http://www.phpdoc.org"target="_blank">phpDocumentor 1.4.2</a>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.