Commit Graph

1338 Commits

Author SHA1 Message Date
Thijs Vermeir
8942f70cdf Fix documentation typos
Change-Id: I97124670926433bf1593c91660d8b8f8482ea9ce
2011-04-30 09:34:59 +02:00
Ronald S. Bultje
5a23352c03 Make hor UV predict ~2x faster (73 vs 132 cycles) using SSSE3.
Change-Id: I658a1df7d825f820573cb2d11ad402f9d2791035
2011-04-29 11:52:09 -07:00
Yaowu Xu
57ad189129 changed configure option name to reduce confusion
Renamed configure option "enable-psnr" to "enable-internal-stats" to
better reflect the purpose of the option and eliminate the confusion
reported in http://code.google.com/p/webm/issues/detail?id=35

Change-Id: If72df6fdb9f1e33dab1329240ba4d8911d2f1f7a
2011-04-29 09:39:05 -07:00
Yunqing Wang
dfa9e2c5ea Merge "Use insertion sort instead of quick sort" 2011-04-29 08:27:58 -07:00
Scott LaVarnway
1b2abc5f49 Merge "Consolidated build inter predictors" 2011-04-29 07:13:49 -07:00
James Berry
f10732554b bug fix removed inline from recon_wrapper_sse2.c
removed inline from recon_wrapper_sse2.c to build
for visual stuido

Change-Id: I74a3482950448e2cdb30e9cd7087145b440d8a22
2011-04-28 15:12:00 -04:00
Scott LaVarnway
219ba87a93 Merge "Use psadbw to get the sum of bytes in a line." 2011-04-28 07:58:20 -07:00
Scott LaVarnway
ccd6f7ed77 Consolidated build inter predictors
Code cleanup.

Change-Id: Ic8b0167851116c64ddf08e8a3d302fb09ab61146
2011-04-28 10:53:59 -04:00
Ronald S. Bultje
1e7ded69cf Use psadbw to get the sum of bytes in a line.
Thanks Jason for pointing that out on #vp8. ;-).

Change-Id: I5330a753e752a8704b78a409597472628e0b26a5
2011-04-27 13:49:21 -07:00
Scott LaVarnway
2e102855f4 Removed unused code in reconinter
The skip flag is never set by the encoder for SPLITMV.

Change-Id: I5ae6457edb3a1193cb5b05a6d61772c13b1dc506
2011-04-27 15:25:32 -04:00
John Koleszar
085fb4b737 Merge "SSE2/SSSE3 optimizations for build_predictors_mbuv{,_s}()." 2011-04-27 12:02:55 -07:00
Ronald S. Bultje
1083fe4999 SSE2/SSSE3 optimizations for build_predictors_mbuv{,_s}().
decoding

before
10.425
10.432
10.423
=10.426

after:
10.405
10.416
10.398
=10.406, 0.2% faster

encoding

before
14.252
14.331
14.250
14.223
14.241
14.220
14.221
=14.248

after
14.095
14.090
14.085
14.095
14.064
14.081
14.089
=14.086, 1.1% faster

Change-Id: I483d3d8f0deda8ad434cea76e16028380722aee2
2011-04-27 11:31:27 -07:00
Yunqing Wang
5abafcc381 Use insertion sort instead of quick sort
Insertion sort performs better for sorting small arrays. In real-
time encoding (speed=-5), test on test set showed 1.7% performance
gain with 0% PSNR change in average.

Change-Id: Ie02eaa6fed662866a937299194c590d41b25bc3d
2011-04-27 13:53:28 -04:00
John Koleszar
64355ecad3 Merge "Speed up VP8DX_BOOL_DECODER_FILL" 2011-04-27 09:03:45 -07:00
John Koleszar
f8ffecb176 Merge "Update VP8DX_BOOL_DECODER_FILL to better detect EOS" 2011-04-27 09:03:24 -07:00
John Koleszar
5e1fd41357 Speed up VP8DX_BOOL_DECODER_FILL
The end-of-buffer check is hoisted out of the inner loop. Gives
about 0.5% improvement on x86_64.

Change-Id: I8e3ed08af7d33468c5c749af36c2dfa19677f971
2011-04-27 10:25:03 -04:00
John Koleszar
9594370e0c Update VP8DX_BOOL_DECODER_FILL to better detect EOS
Allow more reliable detection of truncated bitstreams by being more
precise with the count of "virtual" bits in the value buffer.
Specifically, the VP8_LOTS_OF_BITS value is accumulated into count,
rather than being assigned, which was losing the prior value,
increasing the required tolerance when testing for the error condition.

Change-Id: Ib5172eaa57323b939c439fff8a8ab5fa38da9b69
2011-04-27 10:24:39 -04:00
John Koleszar
db5057c742 Refactor calc_iframe_target_size
Combine calc_iframe_target_size, previously only used for forced
keyframes, with calc_auto_iframe_target_size, which handled most
keyframes.

Change-Id: I227051361cf46727caa5cd2b155752d2c9789364
2011-04-26 16:55:35 -04:00
John Koleszar
81d2206ff8 Move pick_frame_size() to ratectrl.c
This is a first step in cleaning up the redundancies between
vp8_calc_{auto_,}iframe_target_size. The pick_frame_size() function is
moved to ratectrl.c, and made to be the primary interface. This means
that the various calc_*_target_size functions can be made private.

Change-Id: I66a9a62a5f9c23c818015e03f92f3757bf3bb5c8
2011-04-26 16:49:54 -04:00
Scott LaVarnway
0da77a840b Merge "Test vector mismatch fix" 2011-04-26 10:12:37 -07:00
Scott LaVarnway
7a2b9c50a3 Test vector mismatch fix
Fixed test vector mismatch that was introduced
in the "Removed dc_diff from MB_MODE_INFO"
(Ie2b9cdf9e0f4e8b932bbd36e0878c05bffd28931)

Change-Id: I98fa509b418e757b5cdc4baa71202f4168dc14ec
2011-04-26 09:37:19 -04:00
Johann
d5c46bdfc0 Merge "remove simpler_lpf" 2011-04-25 14:51:07 -07:00
Johann
01527e743f remove simpler_lpf
the decision to run the regular or simple loopfilter is made outside the
function and managed with pointers

stop tracking the option in two places. use filter_type exclusively

Change-Id: I39d7b5d1352885efc632c0a94aaf56b72cc2fe15
2011-04-25 17:37:41 -04:00
John Koleszar
fd6da3b2e7 Fix duplicate vp8_compute_frame_size_bounds
Likely introduced by a bad automatic merge from gerrit.

Change-Id: I0c6dd6ec18809cf9492f524d283fa4a3a8f4088b
2011-04-25 14:30:57 -04:00
John Koleszar
1f32b1489c Merge "Remove unused functions" 2011-04-25 11:05:00 -07:00
John Koleszar
47bc1c7013 Remove unused functions
Remove estimate_min_frame_size() and calc_low_ss_err(), as they are
never referenced.

Change-Id: I3293363c14ef70b79c4678ca27aa65b345077726
2011-04-25 13:54:23 -04:00
John Koleszar
cfbfd39de8 Merge "Change rc undershoot/overshoot semantics" 2011-04-25 10:49:32 -07:00
John Koleszar
76557e34d2 Merge "Limit size of initial keyframe in one-pass." 2011-04-25 10:48:13 -07:00
John Koleszar
d9f898ab6d Merge "Add rc_max_intra_bitrate_pct control" 2011-04-25 10:47:57 -07:00
John Koleszar
454cbc96b7 Limit size of initial keyframe in one-pass.
Rather than using a default size of 1/2 or 3/2 seconds for the first
frame, use a fraction of the initial buffer level to give the
application some control.

This will likely undergo further refinement as size limits on key
frames are currently under discussion on codec-devel@, but this gives
much better behavior for small buffer sizes as a starting point.

Change-Id: Ieba55b86517b81e51e6f0a9fe27aabba295acab0
2011-04-25 13:47:20 -04:00
John Koleszar
aa926fbd27 Add rc_max_intra_bitrate_pct control
Adds a control to limit the maximum size of a keyframe, as a function of
the per-frame bitrate. See this thread[1] for more detailed discussion:

[1]: http://groups.google.com/a/webmproject.org/group/codec-devel/browse_thread/thread/271b944a5e47ca38

Change-Id: I7337707642eb8041d1e593efc2edfdf66db02a94
2011-04-25 13:47:14 -04:00
John Koleszar
2089b2cee5 Merge "bug fix possible keyframe context divide by zero" 2011-04-25 09:35:12 -07:00
James Berry
8d5ce819dd bug fix possible keyframe context divide by zero
vp8_adjust_key_frame_context() divides by
estimate_keyframe_frequency() which can
return 0 in the case where --kf-max-dist=0.

Change-Id: Idfc59653478a0073187cd2aa420e98a321103daa
2011-04-25 12:16:36 -04:00
Johann
aeca599087 Merge "keep values in registers during quantization" 2011-04-25 06:52:38 -07:00
Scott LaVarnway
c36b6d4d01 Merge "Removed unnecessary frame type checks" 2011-04-25 06:45:43 -07:00
Scott LaVarnway
5b67329747 Merge "Removed dc_diff from MB_MODE_INFO" 2011-04-25 06:45:32 -07:00
Ronald S. Bultje
496bcbb0de Fix overflow in temporal_filter_apply_sse2().
The accumulator array is an integer array, so use paddd instead of paddw
to add values to it. Fixes overflows when using large --arnr-maxframes
(>8) values.

Change-Id: Iad83794caa02400a65f3ab5760f2517e082d66ae
2011-04-22 10:00:38 -04:00
John Koleszar
73c3d32705 Merge "Remove unused kf rate variables" 2011-04-21 16:54:14 -07:00
Adrian Grange
d2a6eb4b1e Corrected format specifiers in debug print statements
The arguments to these fprintfs are int not long int so
the format specifier should be "%d" and not "%ld". This
was writing garbage in the linux build.

Change-Id: I3d2aa8a448d52e6dc08858d825bf394929b47cf3
2011-04-21 15:45:57 -07:00
Johann
508ae1b3d5 keep values in registers during quantization
add an sse4 quantizer so we can use pinsrw/pextrw and keep values in xmm
registers instead of proxying through the stack. and as long as we're
bumping up, use some ssse3 instructions in the EOB detection (see ssse3
fast quantizer)
pick up about a percent on 32bit and about two on 64bit.

Change-Id: If15abba0e8b037a1d231c0edf33501545c9d9363
2011-04-21 15:47:55 -04:00
Scott LaVarnway
6f6cd3abb9 Removed unnecessary frame type checks
ref_frame is set to INTRA_FRAME for keyframes.  The B_PRED
mode is only used in intra frames.

Change-Id: I9bac8bec7c736300d47994f3cb570329edf11ec0
2011-04-21 14:59:42 -04:00
Scott LaVarnway
3698c1f620 Removed dc_diff from MB_MODE_INFO
The dc_diff flag is used to skip loopfiltering.  Instead
of setting this flag in the decoder/encoder, we now check
for this condition in the loopfilter.

Change-Id: Ie2b9cdf9e0f4e8b932bbd36e0878c05bffd28931
2011-04-21 14:38:36 -04:00
Scott LaVarnway
7a49accd0b Removed force_no_skip
force_no_skip is always set to zero.

Change-Id: I89b61c5e0bee34627a9c07c05f3517e1db76af77
2011-04-20 15:45:12 -04:00
Scott LaVarnway
09c933ea80 Removed redundant checks of the mode_info_context flags
Code cleanup.  The build inter predictor functions are
redundantly checking the mode_info_context for either
INTRA_FRAME or SPLITMV.

Change-Id: I4d58c3a5192a4c2cec5c24ab1caf608bf13aebfb
2011-04-20 14:06:40 -04:00
Attila Nagy
43464e94ed Do not copy data between encoder reference buffers.
Golden and ALT reference buffers were refreshed by copying from
the new buffer. Replaced this by index manipulation.
Also moved all the reference frame updates to one function for
easier tracking.

Change-Id: Icd3e534e7e2c8c5567168d222e6a64a96aae24a1
2011-04-20 15:26:55 +03:00
John Koleszar
ad6a8ca58b Remove unused kf rate variables
Remove tot_key_frame_bits and prior_key_frame_size[] as they were
tracked but never used. Remove intra_frame_target, as it was only
used to initialize prior_key_frame_size.

Refactor vp8_adjust_key_frame_context() some to remove unnecessary
calculations.

Change-Id: Icbc2c83d2b90e184be03e6f9679e678f3a4bce8f
2011-04-19 16:14:57 -04:00
Johann
4a2b684ef4 modify SAVE_XMM for potential 64bit use
the win64 abi requires saving and restoring xmm6:xmm15. currently
SAVE_XMM and RESTORE XMM only allow for saving xmm6:xmm7. allow
specifying the highest register used and if the stack is unaligned.

Change-Id: Ica5699622ffe3346d3a486f48eef0206c51cf867
2011-04-19 10:42:45 -04:00
Johann
a9b465c5c9 Merge "Add save/restore xmm registers in x86 assembly code" 2011-04-19 06:32:10 -07:00
Johann
c7cfde42a9 Add save/restore xmm registers in x86 assembly code
Went through the code and fixed it. Verified on Windows.

Where possible, remove dependencies on xmm[67]

Current code relies on pushing rbp to the stack to get 16 byte
alignment. This broke when rbp wasn't pushed
(vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned
memory accesses. Revisit this and the offsets in
vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM.

Change-Id: I5f940994d3ebfd977c3d68446cef20fd78b07877
2011-04-18 16:30:38 -04:00
Yunqing Wang
48438d6016 Merge "Use sub-pixel search's SSE in mode selection" 2011-04-18 13:20:04 -07:00
Yunqing Wang
b8f0b59985 Use sub-pixel search's SSE in mode selection
Passed SSE from sub-pixel search back to pick_inter_mode
function, which is compared with the encode_breakout to
see if we could skip evaluating the remaining modes.

Change-Id: I4a86442834f0d1b880a19e21ea52d17d505f941d
2011-04-18 16:12:28 -04:00
Yunqing Wang
d5069b5af0 Merge "Handle long delay between video frames in multi-thread decoder(issue 312)" 2011-04-18 10:11:41 -07:00
Johann
cd103a5721 Merge "store quant_shift as an unsigned char" 2011-04-18 10:03:40 -07:00
Yaowu Xu
c619f6cb0f Merge "fixed an overflow in ssim calculation" 2011-04-18 07:44:34 -07:00
Scott LaVarnway
e1a8b6c8d5 Removed unused timers
Change-Id: I209803b9dbed2b2f6d02258fd7a3963a6645f4ab
2011-04-18 09:09:57 -04:00
Yunqing Wang
8ba58951e9 Handle long delay between video frames in multi-thread decoder(issue 312)
This is reported by m...@hesotech.de (see issue 312):
"The decoder causes an access violation
when you decode the first frame, then make a pause of about
60 seconds and then decode further frames. But only if
vpx_codec_dec_cfg_t.threads> 1.

This is caused by a timeout of WaitForSingleObject.
When I change the definition of VPXINFINITE to INFINITE(0xFFFFFFFF),
the problem is solved."

Reproduced the crash and verified the changes on Windows platform.
This brings the behavior inline with the other platforms using sem_wait().

Change-Id: I27b32f90bce05846ef2684b50f7a88f292299da1
2011-04-15 17:27:26 -04:00
Johann
d889035fe6 Merge "remove dead code, add missing RESTORE_XMM" 2011-04-15 13:32:54 -07:00
Johann
f64f425a50 remove executable bit
source files are not executable

Change-Id: Id2c7294695a22217468426423979f68f02d82340
2011-04-15 13:43:24 -04:00
Adrian Grange
0d2abe3084 Merge "Fix usage of value returned by vp8_pick_intra4x4mby_modes" 2011-04-15 08:37:19 -07:00
Yunqing Wang
1312a7a2e2 Merge "Reduce unnecessary distortion computation" 2011-04-15 08:17:03 -07:00
Johann
487c0299c9 remove dead code, add missing RESTORE_XMM
vp8_filter_block1d16_h4_ssse3 was never called

because UNSHADOW_ARGS moves the stack by 'mov rsp, rbp', the issue was
masked. however, if/when win64 used those registers for persistant data,
issues could/will arise.

Change-Id: I56d6effca0aeba1f86082689771cb10145d39651
2011-04-15 10:11:53 -04:00
John Koleszar
a3399291ad Fix off-by-one in copy_and_extend_plane
Should only copy h lines, not h+1.

Change-Id: I802a85686635900459c6dc79596189033e5298d8
2011-04-15 08:44:39 -04:00
Yunqing Wang
918fb5487e Reduce unnecessary distortion computation
In vp8_pick_inter_mode(), for NEWMV mode, use the error result got
from motion search as distortion. This helps performance in real-
time mode.

Change-Id: I398c4e46cc5381f7d874e748cf78827ef0e0860c
2011-04-14 15:53:33 -04:00
John Koleszar
63f15987a5 Merge "Refactor lookahead ring buffer" 2011-04-14 12:35:01 -07:00
Fritz Koenig
e749ae510f Merge "Use consistent delimiters." 2011-04-14 11:56:18 -07:00
Adrian Grange
8608de1c6f Fix usage of value returned by vp8_pick_intra4x4mby_modes
The value of distortion2 returned by vp8_pick_intra4x4mby_modes
was being overwritten by the value returned by get16x16prederror
before it was tested.

Change-Id: If00e80332b272c5545c3a7e381c8041e8319b41a
2011-04-14 10:50:00 -07:00
Fritz Koenig
33cefd6f6e Use consistent delimiters.
opsnr.stt file was using \t for delimiters on everything
except between VPXSSIM and Time.

Change-Id: I6284c4e40c05ff642bf4b0170dca062c279a42df
2011-04-13 15:06:17 -07:00
Adrian Grange
8861174624 Fixed use of early breakout in vp8_pick_intra4x4mby_modes
Index i is used to detect early breakout from the first loop, but
its value is lost due to reuse in the second for loop. I moved
the position of the second loop and did some format cleanup.

Change-Id: I02780eae1bd89df4b6c000fb8a018b0837aac2e5
2011-04-13 12:56:46 -07:00
John Koleszar
88841f1059 Refactor lookahead ring buffer
This patch cleans up the source buffer storage and copy mechanism to
allow access through a standard push/pop/peek interface. This approach
also avoids an extra copy in the case where the source is not a
multiple of 16, fixing issue #102.

Change-Id: I05808c39f5743625cb4c7af54cc841b9b10fdbd9
2011-04-13 14:26:45 -04:00
Johann
70f30aa95d store quant_shift as an unsigned char
in encodframe.c, quant_shift is set to 0 or 1 in vp8cx_invert_quant

only use 8 bits to store this, instead of 16. will allow saving an
xmm register in an updated version of the regular quantize

Change-Id: Ie88c47fe2aff5af0283dab1147fb2791e4b12f90
2011-04-13 13:50:12 -04:00
John Koleszar
c99f9d7abf Change rc undershoot/overshoot semantics
This patch changes the rc_undershoot_pct and rc_overshoot_pct controls
to set the "aggressiveness" of rate adaptation, by limiting the
amount of difference between the target buffer level and the actual
buffer level which is applied to the target frame rate for this frame.

This patch was initially provided by arosenberg at logitech.com as
an attachment to issue #270. It was modified to separate these controls
from the other unrelated modifications in that patch, as well as to
use the pre-existing variables rather than introducing new ones.

Change-Id: Id542e3f5667dd92d857d5eabf29878f2fd730a62
2011-04-12 20:49:33 -04:00
John Koleszar
538f110407 Merge "Bugfix for error accumulator stats" 2011-04-12 06:59:00 -07:00
John Koleszar
e689a27d62 Bugfix for error accumulator stats
Previous to commit de4e9e3, there was an early return in the alt-ref
case that was inadvertantly removed when the function was refactored
to return void. This patch restores the prior behavior.

Change-Id: I783ffd594a4690297e2742f99526fd7ad67698b2
2011-04-12 08:47:33 -04:00
John Koleszar
fd09009227 Merge "Fix encoder range check for frame width and height" 2011-04-12 05:34:12 -07:00
Attila Nagy
1aadcedcfb Fix encoder range check for frame width and height
14 bits available in the bistream => valid range [1..16383]
Removed unused local vars.

Change-Id: Icf3385e47a9fa13af70053129c2248671f285583
2011-04-12 15:07:37 +03:00
Yunqing Wang
4fd81a99f8 Set cpu_used range to [-16, 16] in real-time mode
Remove encoding speed limitation in real-time mode.

Change-Id: Ib5e35d8bb522b2a25f3e4ad5cfe2788ebebb3617
2011-04-11 15:55:04 -04:00
Yunqing Wang
d1abe62d1c Define RDCOST only once
Clean up the code.

Change-Id: I7db048efa4d972b528d553a7921bc45979621129
2011-04-11 11:53:56 -04:00
John Koleszar
a9ce3e3834 Remove unused files
Change-Id: I36ca3f2f4620358033da34daf764f0b388dacd08
2011-04-11 10:34:40 -04:00
Yunqing Wang
4b43167ad1 Fix input MV for full search
Input MV needs to be modified to full-pixel precision.

Change-Id: Ic5d78e41bf27077e325024332b9fe89f76c44f0c
2011-04-08 16:29:41 -04:00
Johann Koenig
6e156a4cd7 Merge "use asm_offsets with vp8_fast_quantize_b_sse3" 2011-04-08 10:05:47 -07:00
John Koleszar
921a32a306 Merge "Error accumulator stats bug." 2011-04-08 08:20:32 -07:00
Paul Wilkins
de4e9e3b44 Error accumulator stats bug.
The error accumulator stats values cpi->prediction_error and
cpi->intra_error were being populated with rd values not
distortion values.

These are only "currently" used in a limited way for RT compress
key frame detection.

Change-Id: I2702ba1cab6e49ab8dc096ba75b6b34ab3573021
2011-04-08 14:21:36 +01:00
Jim Bankoski
d4cdb683a4 fixed an overflow in ssim calculation
This commit fixed an overflow in ssim calculation, added register
save and restore to make sure assembly code working for x64 platform.
It also changed the sampling points to every 4x4 instead of 8x8 and
adjusted the constants in SSIM calculation to match the scale of
previous VPXSSIM.

Change-Id: Ia4dbb8c69eac55812f4662c88ab4653b6720537b
2011-04-07 14:25:25 -07:00
Johann Koenig
08702002e8 use asm_offsets with vp8_fast_quantize_b_sse3
on the same order as the sse2 fast quantize change: ~2%
except for 32bit. only a slight improvment there.

Change-Id: Iff80e5f1ce7e646eebfdc8871405458ff911986b
2011-04-07 16:40:05 -04:00
James Berry
aec5487cdd Use correct 32 bit comparisons for SAD breakout.
Rax updated to eax to avoid uninitialized memory
usage.

Change-Id: Iedb953f104329ede2a786fc648a47f1be2f3798a
2011-04-07 15:08:03 -04:00
Johann
2de858b9fc Merge "use asm_offsets with vp8_fast_quantize_b_sse2" 2011-04-06 10:53:55 -07:00
Yunqing Wang
9e9f61a317 Merge "Minor modification" 2011-04-06 06:12:13 -07:00
Yunqing Wang
02423b2e92 Minor modification
A small change.

Change-Id: I2e7726e58370a95d0319361f4f6ad231138d1328
2011-04-06 09:08:47 -04:00
Johann
c32e0ecc59 use asm_offsets with vp8_fast_quantize_b_sse2
on the same order as the regular quantize change: ~2%

Change-Id: I5c9eec18e89ae7345dd96945cb740e6f349cee86
2011-04-04 16:23:29 -04:00
Scott LaVarnway
f212a98ee7 Fixed unused variable warnings for firstpass.c
Change-Id: I8378a9a541ade2f098359a7b20fa08e6c1596d80
2011-04-04 14:18:31 -04:00
John Koleszar
91036996ac Merge "Slightly simplify vp8_decode_mb_tokens." 2011-04-04 08:58:25 -07:00
Johann
610dd90288 Merge "tweak vp8_regular_quantize_b_sse2" 2011-04-04 08:56:25 -07:00
Gaute Strokkenes
15f03c2f13 Slightly simplify vp8_decode_mb_tokens.
Change-Id: I0058ba7dcfc50a3374b712197639ac337f8726be
2011-04-04 16:47:22 +01:00
Yunqing Wang
f5c0d95e8c Merge "Use full-pixel MV in mvsadcost calculation" 2011-04-04 08:40:51 -07:00
Yunqing Wang
3d6815817c Use full-pixel MV in mvsadcost calculation
MV sad cost error is only used in full-pixel motion search,
which only need full-pixel resolution instead of quarter-pixel
resolution. This change reduced mvsadcost table size, and
removed unneccessary pamameter passing since this table is
constant once it is generated.

Change-Id: I9f931e55f6abc3c99011321f1dfb2f3562e6f6b0
2011-04-01 16:41:58 -04:00
Johann
8520b5c785 tweak vp8_regular_quantize_b_sse2
rather than look up rc in the zig zag table, embed it in the macro. this
also allows us to shuffle some values in the macro and keep *d in rsi

gains of about the same order as the obj_int_extract implementation: ~2%

Change-Id: Ib7252dd10eee66e0af8b0e567426122781dc053d
2011-04-01 09:58:23 -04:00
Johann
ba11e24d47 Merge "Wrapper function removed from vp8_subtract_b_neon function call" 2011-04-01 05:47:21 -07:00
Tero Rintaluoma
cec76a36d6 Wrapper function removed from vp8_subtract_b_neon function call
Address calculations moved from encodemb_arm.c file to neon
optimized assembly function to save cycles in function calls.
 - vp8_subtract_b_neon_func replaced with vp8_subtract_b_neon
   that contains all needed address calculations
 - unnecessary file encodemb_arm.c removed
 - consistent with ARMv6 optimized version

Change-Id: I6cbc1a2670b56c2077f59995fcf8f70786b4990b
2011-04-01 10:06:44 +03:00
Johann
9d138379a2 Merge "ARMv6 optimized subtract functions" 2011-03-31 08:40:10 -07:00
Attila Nagy
297b27655e Runtime detection of available processor cores.
Detect the number of available cores and limit the thread allocation
accordingly. On decoder side limit the number of threads to the max
number of token partition.

Core detetction works on Windows and
Posix platforms, which define _SC_NPROCESSORS_ONLN or _SC_NPROC_ONLN.

Change-Id: I76cbe37c18d3b8035e508b7a1795577674efc078
2011-03-31 10:23:01 +03:00
Attila Nagy
7d335868df Fix: lpf semaphore was signaled in single threaded run
After picking filter level, post the loopfilter semaphore
just when multiple threads are in use.

Change-Id: If7bfb64601d906adef703f454dafc25e978b93c6
2011-03-30 15:55:29 +03:00
Johann
0e43668546 Merge "Half pixel variance further optimized for ARMv6" 2011-03-29 12:14:54 -07:00
Yunqing Wang
534ea700bd Merge "Fix a crash while enabling shared (--enable-shared)" 2011-03-29 09:04:22 -07:00
Yunqing Wang
b843aa4eda Fix a crash while enabling shared (--enable-shared)
Fixed a bug in SSSE3 sub-pixel filter functions.

Change-Id: I2e2126652970eb78307ffcefcace1efd5966fb0a
2011-03-29 11:31:06 -04:00
Johann
f0c22a3f33 use GLOBAL correctly on 32bit shared libraries
http://code.google.com/p/webm/issues/detail?id=309

Change-Id: I6fce9e2f74bc09a9f258df7f91ab599812324e8c
2011-03-29 11:27:03 -04:00
Tero Rintaluoma
6fdc9aa79f ARMv6 optimized subtract functions
Adds following ARMv6 optimized functions to encoder:
  - vp8_subtract_b_armv6
  - vp8_subtract_mby_armv6
  - vp8_subtract_mbuv_armv6

Gives 1-5% speed-up depending on input sequence and encoding
parameters. Functions have one stall cycle inside the loop body
on Cortex pipeline.

Change-Id: I19cca5408b9861b96f378e818eefeb3855238639
2011-03-29 16:52:00 +03:00
Johann
4be062bbc3 add asm_enc_offsets.c for all targets
now that we need asm_enc_offsets.c for x86 and arm and it is
harmless to build it for other targets, add it unconditionally

Change-Id: I320c5220afd94fee2b98bda9ff4e5e34c67062f3
2011-03-28 10:43:47 -04:00
Tero Rintaluoma
f5e433464b Half pixel variance further optimized for ARMv6
Half pixel interpolations optimized in variance calculations. Separate
function calls to vp8_filter_block2d_bil_x_pass_armv6 are avoided.On
average, performance improvement is 6-7% for VGA@30fps sequences.

Change-Id: Idb5f118a9d51548e824719d2cfe5be0fa6996628
2011-03-28 09:51:51 +03:00
Johann
beaafefcf1 Merge "use asm_offsets with vp8_regular_quantize_b_sse2" 2011-03-24 11:06:36 -07:00
Johann
8edaf6e2f2 use asm_offsets with vp8_regular_quantize_b_sse2
remove helper function and avoid shadowing all the arguments to the
stack on 64bit systems

when running with --good --cpu-used=0:
~2% on linux x86 and x86_64
~2% on win32 x86 msys and visual studio
more on darwin10 x86_64
significantly more on
x86_64-win64-vs9

Change-Id: Ib7be12edf511fbf2922f191afd5b33b19a0c4ae6
2011-03-24 13:34:48 -04:00
Johann
4cde2ab765 Merge "ARMv6 optimized fdct4x4" 2011-03-23 07:52:51 -07:00
Yunqing Wang
73065b67e4 Merge "Fix multithreaded encoding for 1 MB wide frame" 2011-03-21 07:41:31 -07:00
John Koleszar
2cbd962088 Remove unused vp8_get4x4sse_cs_mmx declaration
This declaration did not match the prototype_sad() prototype, but was
unused in this translation unit, so it is removed instead. Fixes
issue 290.

Change-Id: I168854f88a85f73ca9aaf61d1e5dc0f43fc3fdb3
2011-03-21 07:53:53 -04:00
John Koleszar
769c74c0ac Merge "Increase static linkage, remove unused functions" 2011-03-21 04:51:51 -07:00
Tero Rintaluoma
a61785b6a1 ARMv6 optimized fdct4x4
Optimized fdct4x4 (8x4) for ARMv6 instruction set.
  - No interlocks in Cortex-A8 pipeline
  - One interlock cycle in ARM11 pipeline
  - About 2.16 times faster than current C-code compiled with -O3

Change-Id: I60484ecd144365da45bb68a960d30196b59952b8
2011-03-21 13:33:45 +02:00
Attila Nagy
bfe803bda3 Fix multithreaded encoding for 1 MB wide frame
Thread synchronization was not correct when frame width was 1 MB.
Number of allocated encoding threads is limited by the sync_range.
There is no point having more because each thread lags sync_range MBs
behind the thread processing the row above.

http://code.google.com/p/webm/issues/detail?id=302

Change-Id: Icaf67a883beecc5ebf2f11e9be47b6997fdf6f26
2011-03-18 12:35:30 +02:00
John Koleszar
429dc676b1 Increase static linkage, remove unused functions
A large number of functions were defined with external linkage, even
though they were only used from within one file. This patch changes
their linkage to static and removes the vp8_ prefix from their names,
which should make it more obvious to the reader that the function is
contained within the current translation unit. Functions that were
not referenced were removed.

These symbols were identified by:

  $ nm -A libvpx.a | sort -k3 | uniq -c -f2 | grep ' [A-Z] ' \
    | sort | grep '^ *1 '

Change-Id: I59609f58ab65312012c047036ae1e0634f795779
2011-03-17 20:53:47 -04:00
Ralph Giles
185557344a Set bounds from the array when iterating mmaps.
The mmap allocation code in vp8_dx_iface.c was inconsistent.
The static array vp8_mem_req_segs defines two descriptors,
but only the first is real. The second is a sentinel and
isn't actually allocated, so vpx_codec_alg_priv is declared
with mmaps[NELEMENTS(vp8_mem_req_segs)-1]. Some functions
use this reduced upper bound when iterating though the mmap
array, but these two functions did not.

Instead, this commit calls NELEMENTS(...->mmaps) to directly
query the bounds of the dereferenced array.

This fixes an array-bounds warning from gcc 4.6 on
vp8_xma_set_mmap.

Change-Id: I918e2721b401d134c1a9764c978912bdb3188be1
2011-03-17 14:52:05 -07:00
Ralph Giles
de5182eef3 Remove commented-out VP6 code from vp8_finalize_mmaps
Change-Id: I48642c380353043bed96026f56de5908fcee270a
2011-03-17 14:51:31 -07:00
John Koleszar
8431e768c9 Merge "Fix "used uninitialized" warning in vp8_pack_bitstream()" 2011-03-17 14:25:04 -07:00
John Koleszar
de50520a8c apple: include proper mach primatives
Fixes implicit declaration warning for 'mach_task_self'. This change
is an update to Change I9991dedd1ccfddc092eca86705ecbc3b764b799d,
which fixed this issue for the decoder but not the encoder.

Change-Id: I9df033e81f9520c4f975b7a7cf6c643d12e87c96
2011-03-16 13:59:32 -04:00
Attila Nagy
71bcd9f1af Add vp8_variance8x8_armv6 and vp8_sub_pixel_variance8x8_armv6 functions
Change-Id: I08edaffc62514907fa5e90e1689269e467c857f5
2011-03-15 15:50:44 +02:00
John Koleszar
8c48c943e7 Merge "Fix an unused variable warning." 2011-03-14 14:13:53 -07:00
Johann
d0ec28b3d3 Merge "Add vp8_mse16x16_armv6 function" 2011-03-14 12:47:42 -07:00
Attila Nagy
e54dcfe88d Add vp8_mse16x16_armv6 function
Change-Id: I77e9f2f521a71089228f96e2db72524189364ffb
2011-03-14 14:38:31 +02:00
Johann
3788b3564c Merge "Move build_intra_predictors_mby to RTCD framework" 2011-03-11 10:23:48 -08:00
John Koleszar
27972d2c1d Move build_intra_predictors_mby to RTCD framework
The vp8_build_intra_predictors_mby and vp8_build_intra_predictors_mby_s
functions had global function pointers rather than using the RTCD
framework. This can show up as a potential data race with tools such as
helgrind. See https://bugzilla.mozilla.org/show_bug.cgi?id=640935
for an example.

Change-Id: I29c407f828ac2bddfc039f852f138de5de888534
2011-03-11 13:04:50 -05:00
Johann
5c60a646f3 Merge "ARMv6 optimized quantization" 2011-03-11 08:29:00 -08:00
John Koleszar
75051c8b59 Merge "Only enable ssim_opt.asm on X86_64" 2011-03-11 08:28:05 -08:00
John Koleszar
5db0eeea21 Only enable ssim_opt.asm on X86_64
Fix compiling on 32 bit x86.

Change-Id: I6210573e1d9287ac49acbe3d7e5181e309316107
2011-03-11 11:27:08 -05:00
Paul Wilkins
6e73748492 Clean up of vp8_init_config()
Clean up vp8_init_config() a bit and remove null pointer case,
as this code can't be called any more and is not an adequate
trap anyway, as a null pointer would cause exceptions before
hitting the test.

Change-Id: I937c00167cc039b3aa3f645f29c319d58ae8d3ee
2011-03-11 11:06:51 -05:00
John Koleszar
170b87390e Merge "1 Pass CQ and VBR bug fixes" 2011-03-11 08:06:09 -08:00
Paul Wilkins
2ae91fbef0 1 Pass CQ and VBR bug fixes
Issue 291 highlighted  the fact that CQ mode was not working
as expected in 1 pass mode,

This commit fixes that specific problem but in so doing I also
uncovered an overflow issue in the VBR code for 1 pass and
some data values not being correctly initialized.

For some clips (particularly short clips), the resulting
improvement is dramatic.

Change-Id: Ieefd6c6e4776eb8f1b0550dbfdfb72f86b33c960
2011-03-11 10:59:34 -05:00
John Koleszar
e34e417d94 Merge "Fix incorrect macroblock counts in twopass rate control" 2011-03-11 06:06:04 -08:00
Yunqing Wang
3c9dd6c3ef Merge "Align SAD output array to be 16-byte aligned" 2011-03-11 05:56:02 -08:00
John Koleszar
c5c5dcd0be Merge "vp8cx - psnr converted to call assemblerized sse" 2011-03-11 05:54:00 -08:00
John Koleszar
29c46b64a2 Merge "vp8cx- alternate ssim function with optimizations" 2011-03-11 05:53:41 -08:00
Jim Bankoski
3dc382294b vp8cx - psnr converted to call assemblerized sse
Change-Id: Ie388d4618c44b131f96b9fe526618b457f020dfa
2011-03-11 08:51:22 -05:00
Jim Bankoski
3f6f7289aa vp8cx- alternate ssim function with optimizations
Change-Id: I91921b0a90dbaddc7010380b038955be347964b3
2011-03-11 08:51:21 -05:00
Yunqing Wang
b2aa401776 Align SAD output array to be 16-byte aligned
Use aligned store.

Change-Id: Icab4c0c53da811d0c52bb7e8134927f249ba2499
2011-03-11 08:24:23 -05:00
Yunqing Wang
76ec21928c Merge "Encoder loopfilter running in its own thread" 2011-03-11 04:55:05 -08:00
Attila Nagy
9c836daf65 Fix "used uninitialized" warning in vp8_pack_bitstream()
Change-Id: Iadcbdba717439f47a2c24e65fd69a3a1464174b5
2011-03-11 12:36:28 +02:00
Attila Nagy
3ae2465788 Encoder loopfilter running in its own thread
In multithreaded mode the loopfilter is running in its own thread (filter level
calculation and frame filtering). Filtering is mostly done in parallel with the
bitstream packing. Before starting the packing the loopfilter level has
to be calculated. Also any needed reference frame copying is done in the
filter thread.

Currently the encoder will create n+1 threads, where n > 1 is the number of
threads specified by application  and 1 is the extra filter thread. With n = 1
the encoder runs in single thread mode. There will never be more than n threads
running concurrently.

Change-Id: I4fb29b559a40275d6d3babb8727245c40fba931b
2011-03-11 10:52:51 +02:00
Tero Rintaluoma
7ab08e1fee ARMv6 optimized quantization
Adds new ARMv6 optimized function vp8_fast_quantize_b_armv6
to the encoder.

Change-Id: I40277ec8f82e8a6cbc453cf295a0cc9b2504b21e
2011-03-11 10:48:42 +02:00
Adrian Grange
6daacdb785 Added missing format specifier in print statement
Printout of firstpass stats for frame had one fewer
format specifiers than arguments.

Change-Id: I5a42c85aa79c471e1a70afd75e24a91546b7a1cd
2011-03-10 12:43:49 -08:00
Adrian Grange
ed40ff9e2d Removed firstpass motion map
The firstpass motion map consists of an 8-bit flag for
each MB indicating how strongly the firstpass code
believes it should be filtered during the second pass
ARNR filtering.

For long or large format material the motion map can
become extremely large and hamper the operation of
the encoding process.

This change removes the motion map altogether, leaving
the second pass to rely on the magnitude of the motion
compensated error to determine the filter weight to
use for the MB during ARNR filtering.

Tests on the derf set indicate that the effect of this
change is neutral, with some small wins and losses. The
motion map has therefore been removed based on
a cost/benefit evaluation.

Change-Id: I53e07d236f5ce09a6f0c54e7c4ffbb490fb870f6
2011-03-10 11:32:48 -08:00
James Berry
f3e9e2a0f8 Fix incorrect macroblock counts in twopass rate control
The previous calculation of macroblock count (w*h)/256
is not correct when the width/height are not multiples of
16. Use the precalculated macroblock count from
cpi->common instead. This manifested itself as a divide
by zero when the number of pixels was less than 256.
num_mbs updated in estimate_max_q, estimate_q,
 estimate_kf_group_q, and estimate_cq

Change-Id: I92ff98587864c801b1ee5485cfead964673a9973
2011-03-10 13:33:06 -05:00
Yunqing Wang
a0306ea660 Merge "Add vp8_sub_pixel_variance16x8_ssse3 function" 2011-03-09 12:26:37 -08:00
John Koleszar
c5a049babd Merge branch 'bali'
Change-Id: Icf18b4981afb12ef255fca431d4ba45860dd22c9
2011-03-09 14:11:54 -05:00
John Koleszar
5c24071504 Add missing filter.h to build system
Missing file causes 'make dist' to not include a complete copy of the
source.

Change-Id: I3f55aeb5a86d0e81234e4e4588cb8086ba4cfc4a
2011-03-09 13:43:31 -05:00
Yunqing Wang
7b8e7f0f3a Add vp8_sub_pixel_variance16x8_ssse3 function
Added SSSE3 function

Change-Id: I8c304c92458618d93fda3a2f62bd09ccb63e75ad
2011-03-09 12:33:21 -05:00
Yunqing Wang
4561109a69 Remove unused functions
Removed some unused functions

Change-Id: Ifdfc27453e53cfc75997b38492901d193a16b245
2011-03-09 10:45:03 -05:00
Yunqing Wang
7966dd5287 Merge "Improve SSE2 half-pixel filter funtions" 2011-03-09 07:23:06 -08:00
John Koleszar
fa836faede Merge "Configuration updates:Making a clear distinction between Init and Change" 2011-03-09 05:07:11 -08:00
Ralph Giles
56efffdcd1 Fix an unused variable warning.
Move the update of the loopfilter info to the same block where it
is used. GCC 4.5 is not able trace the initialization of the local
filter_info across the other calls between the two conditionals on
pbi->common and issues an uninitialized variable warning.

Change-Id: Ie4487b3714a096b3fb21608f6b0c74e745e3c6fc
2011-03-08 14:56:15 -08:00
Yunqing Wang
419f638910 Improve SSE2 half-pixel filter funtions
Rewrote these functions to process 16 pixels once instead of 8.

Change-Id: Ic67e80124467a446a3df4cfecfb76a4248602adb
2011-03-08 16:25:06 -05:00
Yunqing Wang
859abd6b5d Merge "Add zero offset checking in SSE2 sub-pixel filter function" 2011-03-08 12:26:58 -08:00
Yunqing Wang
8432a1729f Add zero offset checking in SSE2 sub-pixel filter function
Skip filter at zero offset.

Change-Id: I95fc7e211869bc0ab5bcfb7ab2e3259d1c0ccf38
2011-03-08 15:22:07 -05:00
Yunqing Wang
e8f7b0f7f5 Merge "Write SSSE3 sub-pixel filter function" 2011-03-08 10:58:30 -08:00
Yunqing Wang
244e2e1451 Write SSSE3 sub-pixel filter function
1. Process 16 pixels at one time instead of 8.
2. Add check for both xoffset =0 and yoffset=0, which happens
   during motion search.
This change gave encoder 1%~3% performance gain.

Change-Id: Idaa39506b48f4f8b2fbbeb45aae8226fa32afb3e
2011-03-08 13:29:01 -05:00
Ralph Giles
e6948bf0f9 Fix a multi-line format-string warning.
GCC 4.5 and 4.6 both issue a warning about the multi-line format
string introduced in bc9c30a0, which also changed the whitespace
in the associated stt file by line-wrapping the long format string.

Instead, use multiple string constants, which the compiler will
concatenate. This maintains the original formatting, but remains
legible within the standard line length.

Change-Id: I27c9f92d46be82d408105a3a4091f145f677e00e
2011-03-08 07:14:12 -08:00
Paul Wilkins
de87c420ef Corrected minor typos.
Change-Id: Icc9f12bd1e1bdaf51256dc8a90d08aa9be89ef34
2011-03-08 14:46:22 +00:00
Paul Wilkins
0eccee4378 Merge changes I00c3e823,If8bca004
* changes:
  Improved key frame detection.
  Improved KF insertion after fades to still.
2011-03-08 06:40:11 -08:00
John Koleszar
5d1d9911cb correct zbin boost for splitmv mode
Disable zbin boost in SPLITMV mode as intended. Was incorrectly looking
at vp8_ref_frame_order instead of vp8_mode_order when comparing against
SPLITMV. This condition should have always been false, as SPLITMV is
not in the range of valid reference frames.

Change-Id: I0408cc7595eff68f00efef6d008e79f5b60d14bf
2011-03-07 20:58:37 -05:00
Paul Wilkins
bc9c30a003 Improved key frame detection.
In some cases where clips have been encoded with
borders (eg. some wide-screen content where there is a
border top and bottom and slide shows containing portrait
format photographs (border left and right)) key frames were
not being correctly detected.

The new code looks to measure cases where a portion of
the image can be coded equally easily using intra or inter
modes and where the resulting error score is also very low.
These "neutral" areas are then discounted in the key frame
detection code.

Change-Id: I00c3e8230772b8213cdc08020e1990cf83b780d8
2011-03-07 15:58:07 +00:00
Paul Wilkins
9fc8cb39aa Improved KF insertion after fades to still.
This code extends what was previously done for GFs, to pick
cases where insertion of a key frame after a fade (or other
transition or complex motion)  followed by a still section, will
be beneficial and will reduce the number of forced key frames.

Change-Id: If8bca00457f0d5f83dc3318a587f61c17d90f135
2011-03-07 15:11:09 +00:00
John Koleszar
0bc31f1887 Merge "Fixing divide by zero" 2011-03-04 05:40:33 -08:00
John Koleszar
fb37eda3e2 Merge "Fix drastic undershoot in long form content" 2011-03-04 05:39:40 -08:00
John Koleszar
eed2ce58e3 Merge "Fix counter of fixed keyframe distance" 2011-03-04 05:28:38 -08:00
Mikhal Shemer
84f7f20985 Configuration updates:Making a clear distinction between Init and Change
Change-Id: I7b2fb326e1aabc08b032177a7b914a5b8bb7376f
2011-03-03 10:35:09 -08:00
Mikhal Shemer
1de99a2a81 Fixing divide by zero
Change-Id: I9d8a98a2f7ed1e3116d0bae35164618c41998bac
2011-03-03 10:33:36 -08:00
John Koleszar
36be4f7f06 Fix drastic undershoot in long form content
When the modified_error_left accumulator exceeds INT_MAX, an incorrect
cast to int resulted in a negative value, causing the rate control to
allocate no bits to that keyframe group, leading to severe undershoot
and subsequent poor quality.

This error was exposed by the recent change to the rolling target and
actual spend accumulators in commit 305be4e4 which fixed them to
actually calculate the average value rather than be re-initialized
on every frame to the average per-frame bitrate. When this bug was
triggered, the target bitrate could be 0, so the rolling target
becomes small, which causes the undershoot. The code prior to 305be4e4
did not exhibit this behavior because the rolling target was always
set to a reasonable value and was independent of the actual target
bitrate. With this patch, the actual target bitrate is calculated
correctly, and the rate control tracks as expected.

This cast was likely added to silence a compiler warning on a comparison
between a double (modified_error_left) and an int (0). Instead, this
patch removes the cast and changes the comparison to be against 0.0,
which should prevent the warning from reoccuring.

This fixes issue #289. Special thanks to gnafu for his efforts in
reporting and debugging this fix.

Change-Id: Ie5cc1a7b516c578a76c3a50c892a6f04a11621fe
2011-03-02 22:52:27 -05:00
Johann
6f5189c044 Merge "ARMv6 optimized half pixel variance calculations" 2011-03-02 05:48:46 -08:00
Yunqing Wang
cfaee9f7c6 Merge "Add prefetch before variance calculation" 2011-02-28 11:42:28 -08:00
Scott LaVarnway
3e6d476ac3 Merge "Avoid double copying of key frames into alt and golden buffer" 2011-02-28 10:16:33 -08:00
Yunqing Wang
d96ba65a23 Add prefetch before variance calculation
This improved encoding performance by 0.5% (good, speed 1) to
1.5% (good, speed 5).

Change-Id: I843d72a0d68a90b5f694adf770943e4a4618f50e
2011-02-28 11:25:55 -05:00
Johann
31dab574cc Merge "Remove a second check for invalid ptr in vp8_get_compressed_data" 2011-02-25 11:44:18 -08:00
Johann
e4fa638653 Merge "Remove temporal alt ref from realtime only build" 2011-02-25 06:55:17 -08:00
Johann
1fae7018a8 Merge "Handle mem allocation failure in vp8e_init" 2011-02-25 06:55:10 -08:00
Attila Nagy
d8fc974ac0 Avoid double copying of key frames into alt and golden buffer
Change-Id: I726976a297a593a35ed6cba3c660e372562f7b27
2011-02-25 09:03:16 +02:00
Attila Nagy
6da2018789 Remove a second check for invalid ptr in vp8_get_compressed_data
Check is done first when function si entered.

Change-Id: Ief0d0cbd4860aaf492b78728f8d22f24029b1174
2011-02-25 08:41:13 +02:00
Scott LaVarnway
861175ef00 Removed vp8_block2type
and used defines instead.

Change-Id: Idb56e0295d004793f406dfd2d8d8c546aad62e03
2011-02-24 14:35:18 -05:00
Scott LaVarnway
d53492bba4 Merge "Revisited rd_pick_intra4x4block" 2011-02-24 11:25:21 -08:00
Scott LaVarnway
658454a04c Revisited rd_pick_intra4x4block
Removed unnecessary copies.  No noticeable speed gains.


Change-Id: I996c50c23fedd06d54ee7a3e762cbf559cc4a9d1
2011-02-24 13:31:47 -05:00
Paul Wilkins
b862c108dd Overflow of frame error accumulators.
This fixes an overflow problem in the frame error accumulators.

The overflow condition is extreme but did trigger when Frank B.
coded some high motion interlaced HD content.

The observed effect was a catastrophic  breakdown of the rate
control leading to massive undershoot and poor bit allocation.

All the error values should really be unsigned but I will look at this
separately.

Change-Id: I9745f5c5ca2783620426b66b568b2088b579151f
2011-02-24 15:49:41 +00:00
Tero Rintaluoma
8ae92aef66 ARMv6 optimized half pixel variance calculations
Adds following ARMv6 optimized functions to the encoder:
 - vp8_variance_halfpixvar16x16_h_armv6
 - vp8_variance_halfpixvar16x16_v_armv6
 - vp8_variance_halfpixvar16x16_hv_armv6

Change-Id: I1e9c2af7acd2a51b72b3845beecd990db4bebd29
2011-02-23 13:27:27 +02:00
Attila Nagy
e6db21ecc4 Handle mem allocation failure in vp8e_init
Change-Id: I0d0445c57eb0889082f83de1948852d57b38fefb
2011-02-23 12:36:03 +02:00
Attila Nagy
7af0d906e3 Remove temporal alt ref from realtime only build
It is not used in realtime mode. Reduces memory footprint.

Change-Id: I7f163225762368df5457cfd413050161d3704a3f
2011-02-22 12:53:32 +02:00
Johann
945dad277d Revert "use unaligned load"
This reverts commit f50f2fd2a7.

Change Ib7506e3e aligns the buffer

Change-Id: Ie0f8bd3e57cfdfef81d39638a1451458ebbae2e0
2011-02-18 10:23:02 -05:00
John Koleszar
c764c2a20f Merge "clean up unused files" 2011-02-18 06:33:05 -08:00
John Koleszar
3ed8fe8778 remove unused vp8_predict_dc function
Change-Id: I64fa47889c54cfed094a674c49ef0996d49bdd42
2011-02-18 09:12:20 -05:00
John Koleszar
cbf923b12c clean up unused files
Removed a number of files that were unused or little-used.

Change-Id: If9ae5e5b11390077581a9a879e8a0defe709f5da
2011-02-18 09:09:49 -05:00
John Koleszar
d371ca93e5 cosmetic: remove unnecessary scope
Clean up some unnecessary scoping around pick_filter_level.

Change-Id: Ic57fa33e3fcae37fe6beae977e5743783399d5af
2011-02-18 08:46:07 -05:00
John Koleszar
597d02b508 Merge "Dont pick encoder filter level when loopfilter is disabled." 2011-02-18 05:26:23 -08:00
Attila Nagy
fb5a692d27 Reinitialize quantizer only when any delta is changing
No need to reinitialize for base Q changes.

Change-Id: Ie76ec21dd3c5582d5183dbed75ed73a1eed3e291
2011-02-18 14:23:37 +02:00
Attila Nagy
c6ef75690f Dont pick encoder filter level when loopfilter is disabled.
Change-Id: I58154faf4f3ece24f9927a5c3ab7e830e0887fb6
2011-02-18 08:53:00 +02:00
John Koleszar
b2ae57f1b6 Merge "Use endian-neutral bitstream packing/unpacking" 2011-02-17 12:34:16 -08:00
John Koleszar
562f1470ce Use endian-neutral bitstream packing/unpacking
Eliminate unnecessary checks on target endianness and associated
macros.

Change-Id: I1d4e6a9dcee9bfc8940c8196838d31ed31b0e4aa
2011-02-17 15:20:53 -05:00
John Koleszar
ac10665ad8 Merge "Removed unused vp8_recon_intra4x4mb function" 2011-02-17 11:30:13 -08:00
Scott LaVarnway
07f7b66fae Removed unused vp8_recon_intra4x4mb function
Change-Id: I4a328ce152d9dbe6b0d1606d1b523e8e7bfb468e
2011-02-17 13:34:38 -05:00
John Koleszar
c351aa7f1b Merge "Fix relative include paths" 2011-02-17 04:13:44 -08:00
Yunqing Wang
da9402fbf6 Merge "Allocate source buffers to be multiples of 16" 2011-02-16 11:35:06 -08:00
Yunqing Wang
da227b901d Allocate source buffers to be multiples of 16
Currently, when the video frame width is not multiples of 16, the
source buffer has a stride of non-multiples of 16, which forces
an unaligned load in SAD function and hurts the performance. To
avoid that, this change allocates source buffers to be multiples
of 16.

Change-Id: Ib7506e3eb2cea06657d56be5a899f38dfe3eeb39
2011-02-16 12:57:17 -05:00
Johann
0c2cfff9b0 Merge "ARMv6 optimized sad16x16" 2011-02-16 05:22:38 -08:00
James Zern
0030303b69 Remove redundant ptr checks in calls to vpx_free
vpx_free if used contains this check. If replaced, well behaved free
will behave similarly.

Change-Id: I25483aaa8b39255b9a8cf388d6e5eaa20a908ae1
2011-02-15 12:43:35 -08:00
Yunqing Wang
7725a7eb56 Merge "Improve vp8_sad16x16_sse3 function" 2011-02-14 14:09:25 -08:00
Yaowu Xu
27dad21548 Merge "Improved vp8_rd_pick_intra_mbuv_mode" 2011-02-14 13:58:12 -08:00
Scott LaVarnway
94d4fee08f Improved vp8_rd_pick_intra_mbuv_mode
Eliminated unnecessary calculations. Very small change
to performance.

Change-Id: Ib7213d43c64e36955177c4d47950ff472266f822
2011-02-14 16:34:33 -05:00
Yunqing Wang
2debd5b5f7 Improve vp8_sad16x16_sse3 function
In real-time mode, vp8_sad16x16 function is called heavily in
motion search part. Improvement of this function gives 1.2%
encoding performance gain (real-time mode, tulip clip).

Change-Id: I23c401fc40c061f732a9767e8d383737a179bd58
2011-02-14 16:23:49 -05:00
Yaowu Xu
404e998eb7 Merge "mem leak fix for cpi->tplist" 2011-02-14 11:29:22 -08:00
James Berry
d3dfcde0f7 mem leak fix for cpi->tplist
checks added to make sure that cpi->tplist
is freed correctly in vp8_dealloc_compressor_data
and vp8_alloc_compressor_data.

Change-Id: I66149dbbd25c958800ad94f4379d723191d9680d
2011-02-14 14:02:52 -05:00
Scott LaVarnway
d419b93e3e Improved rd_pick_intra4x4block
Eliminated unnecessary calculations.  Improved performance
by 10% on keyframes and 1.6% overall for the test clip used.

Change-Id: I87671b26af5e2cc439e81d0fee3b15c7cd2a3309
2011-02-14 13:32:58 -05:00
Johann
0ff10bb1f7 Merge "remove assembly detokenizer" 2011-02-14 05:10:16 -08:00
Johann
bb6bcbccda remove assembly detokenizer
hasn't been kept up to date. remove it to avoid confusion.

Change-Id: I52ffde19b59fec5c7a381299ca2e85cb38330be7
2011-02-11 11:09:00 -05:00
Yunqing Wang
353246bd60 Merge "Add improved_mv_pred flag in real-time mode" 2011-02-11 07:20:17 -08:00
Yunqing Wang
9d0b2cbbce Add improved_mv_pred flag in real-time mode
As mentioned in check-in "Improve motion search in real-time mode",
MV prediction calculation causes speed loss for speed 7 and above.
This change added a flag to turn off this calculation for speed>6
in real-time mode.

Change-Id: I9f4ae5a8bf449222d1784b54e7d315fc8347b2d1
2011-02-11 09:59:41 -05:00
Tero Rintaluoma
1ef86980b9 ARMv6 optimized sad16x16
Adds a new ARMv6 optimized function vp8_sad16x16_armv6 to encoder.

Change-Id: Ibbd7edb8b25cb7a5b522d391b1e9a690fe150e57
2011-02-11 11:14:07 +02:00
Yaowu Xu
4f8a166058 Merge "Redefining good quality speed settings" 2011-02-10 21:38:19 -08:00
Yunqing Wang
6f53e59641 Merge "Improve motion search in real-time mode" 2011-02-10 12:42:44 -08:00
John Koleszar
02321de0f2 Fix relative include paths
Allow compiling without adding vp8/{common,encoder,decoder} to the
include paths.

Change-Id: Ifeb5dac351cdfadcd659736f5158b315a0030b6c
2011-02-10 15:09:44 -05:00
Yunqing Wang
41e6eceb28 Improve motion search in real-time mode
Applied better MV prediction in real-time mode, which improves
the encoding quality.

Used quarter-pixel search instead of iterative sub-pixel search
for speed >=5 to improve encoding performance.

Tests on the test set showed:
1. For speed=-5, quality improvement: 1.7% on AvgPSNR and 2.1%
on SSIM, performance improvement: 3.6% (This counts in the
performance lose caused by MV prediction calculation in "Improve
MV prediction in vp8_pick_inter_mode() for speed>3").
2. For speed=-8, quality improvement: 2.1% on AvgPSNR and 2.5%
on SSIM. but, 6.9% performance decrease because of MV prediction
calculation. This should be improved later.

Change-Id: I349a96c452bd691081d8c8e3e54419e7f477bebd
2011-02-10 13:40:24 -05:00
Johann
7d8199f0c3 Merge "Adds armv6 optimized variance calculation" 2011-02-10 06:06:46 -08:00
Scott LaVarnway
19054ab6da Redefining good quality speed settings
Created a new speed 1 which is in the middle of the old
speed 0 and speed 1. (for both quality and performance)

Change-Id: I4802133cdb43f359ca787646c090899679dd5d84
2011-02-09 17:18:28 -05:00
James Berry
fffa2a61d7 fixed stride in vp8_temporal_filter_predictors_mb_c
stride would not be calculated correctly for material
with odd sized frame widths.

Change-Id: I1710f6aef9ebb93d36249c9239c68c5baa9791f8
2011-02-09 16:55:39 -05:00
John Koleszar
c2b43164bd Merge "correct cost for implicit bit in mvs" 2011-02-09 11:20:12 -08:00
John Koleszar
9954d05ca6 correct cost for implicit bit in mvs
Use 0xFFF0 vice 240 (0xF0) for determining whether the sometimes
implicit bit 3 will be transmitted. This is consistent with the decoder
and encode_mvcomponent().

Change-Id: Ic1304d0ab56844bed8236edd1c5243a6767fc6b1
2011-02-09 12:50:17 -05:00
John Koleszar
a39b5af10b Merge "Put more code under #if CONFIG_MULTITHREAD." 2011-02-09 08:31:36 -08:00
Gaute Strokkenes
315e3c2518 Put more code under #if CONFIG_MULTITHREAD.
Change-Id: Icf4b692099d7d249fe3553852b1022b027b28e4b
2011-02-09 11:21:18 -05:00
Scott LaVarnway
85e79ce288 Merge "Added early breakout for vp8_rd_pick_intra4x4mby_modes" 2011-02-09 07:55:04 -08:00
John Koleszar
c96031da69 Merge "vp8e_get_preview fixed for resized frames" 2011-02-09 07:41:40 -08:00
Tero Rintaluoma
cb14764fab Adds armv6 optimized variance calculation
Adds vp8_sub_pixel_variance16x16_armv6 function to encoder. Integrates
ARMv6 optimized bilinear interpolations from vp8/common/arm/armv6
and adds new assembly file for variance16x16 calculation.
 - vp8_filter_block2d_bil_first_pass_armv6   (integrated)
 - vp8_filter_block2d_bil_second_pass_armv6  (integrated)
 - vp8_variance16x16_armv6 (new)
 - bilinearfilter_arm.h (new)
Change-Id: I18a8331ce7d031ceedd6cd415ecacb0c8f3392db
2011-02-09 10:23:43 -05:00
Johann
e5aaac24bb clean up bilinear filter
make reference version of bilinear_filters short.
use reference versions of bilinear_filters and sub_pel_filters when
possible.

recognize that Width was being passed into
filter_block2d_bil_first_pass multiple times. ARM version had already
fixed this. propegate to C.

change references to src_pixels_per_line to src_pitch and standardize on
src/dst (instead of input/output).

recognize that first_pass is only run in the verticle and second_pass
only horizontal. ARM version had already fixed this. propegate to C

Change-Id: I292d376d239a9a7ca37ec2bf03cc0720606983e2
2011-02-08 17:42:54 -05:00
Scott LaVarnway
13db80c282 Added early breakout for vp8_rd_pick_intra4x4mby_modes
Improved performance of good quality, speed 0 (3% average)
with no average quality loss.

Change-Id: Ica34473f99bd74260eaebde6b132185e09e3c09d
2011-02-08 16:50:43 -05:00
Johann
40dcae9c2e clarify *_offsets.asm differences
it's difficult to mux the *_offsets.c files because of header conflicts.
make three instead, name them consistently and partititon the contents
to allow building them as required.

Change-Id: I8f9768c09279f934f44b6c5b0ec363f7943bb796
2011-02-08 16:35:43 -05:00
James Berry
ddacf1cf69 vp8e_get_preview fixed for resized frames
preview_img d_w and d_h along with w and h
would not be updated for resized frames.

now uses sd.y_width and sd.y_height

Change-Id: I52241de4cc1de5e73f865e668bd70a7cbd954390
2011-02-08 14:27:00 -05:00
Andoni Morales Alastruey
48140167cd Fix counter of fixed keyframe distance
When the keyframe distance is fixed the first interval has the right
distance but, the next ones have kf_distance + 1.

Change-Id: I44f1190fe7146124bd07660a5e0ef08829e3ae07
2011-02-07 18:30:04 +01:00
Johann
3273c7b679 move one of the offset files
common/arm/vpx_asm_offsets moves up a level. prepare for muxing with
encoder/arm/vpx_vp8_enc_asm_offsets

Change-Id: I89a04a5235447e66571995c9d9b4b6edcb038e24
2011-02-07 11:35:30 -05:00
John Koleszar
adaf2b697c Merge "remove unused dboolhuff code" 2011-02-07 05:36:26 -08:00
Yunqing Wang
58d2e70fc5 Fix link error in real-time mode
make vp8_mv_pred() and vp8_cal_sad() available in real-time mode.

Change-Id: I71dbae241b486ba943458dcbae552ec4a51689d3
2011-02-07 08:21:14 -05:00
Johann
bb9c95ea53 remove unused dboolhuff code
we were holding on to this "just in case." purge it instead

Change-Id: I77a367b36d0821d731019f2566ecfffdae1d4b8a
2011-02-04 16:00:00 -05:00
Yunqing Wang
350ffe8dae Merge "Improve MV prediction in vp8_pick_inter_mode() for speed>3" 2011-02-04 10:10:15 -08:00
John Koleszar
63fc44dfa5 correct quantizer initialization
The encoder was not correctly catching transitions in the quantizer
deltas. If a delta_q was set, then the quantizer would be reinitialized
on every frame, but if they transitioned to 0, the quantizer would
not be reinitialized, leading to a encode-decode mismatch.

This bug was triggered by commit 999e155, which sets a Y2 delta Q
for very low base Q levels.

Change-Id: Ia6733464a55ee4ff2edbb82c0873980d345446f5
2011-02-04 11:37:47 -05:00
John Koleszar
6bf7e2cc37 Merge "Remove duplicate loopfilter parameters." 2011-02-04 07:07:45 -08:00
Gaute Strokkenes
ffc6aeef14 Remove duplicate loopfilter parameters.
Change-Id: I0d41415e3961c2c9492d342290c1999f9d02e6d8
2011-02-04 14:55:02 +00:00
John Koleszar
c0a9cbebe1 Merge "Delay auto key frame insertion in realtime configuration" 2011-02-04 05:16:15 -08:00
Gaute Strokkenes
bf5f585b0d Make vp8_adjust_mb_lf_value return the updated value rather than
manipulating it in situ via a pointer.

Change-Id: If4a87a4eccd84f39577c0e91e171245f4954c5cf
2011-02-03 19:24:16 +00:00
Scott LaVarnway
4aa12b6c5f Merge "Zero out block mv when an intra mode is selected" 2011-02-03 07:16:52 -08:00
Yunqing Wang
a870315629 Merge "Improved encoder threading" 2011-02-03 05:44:57 -08:00
Attila Nagy
e5904f2d5e Delay auto key frame insertion in realtime configuration
Whe auto keyframe insertion is enabled and conditions are right (scene change)
the encoder can decide to insert a key frame and does a re-encoding. This can
introduce extra latency. In RT mode we do not do the re-encoding of the current
frame but force the next frame to key frame.

Change-Id: I15c175fa845ac4c1a1f18bea3676e154669522a7
2011-02-02 13:54:40 +02:00
Scott LaVarnway
07a7c08aef Zero out block mv when an intra mode is selected
instead of each time mode is tested.

Change-Id: Ief0f5586dafde54cc14d348dcecdacb182e7c1d5
2011-02-01 12:55:51 -05:00
Scott LaVarnway
a5ecaca6a7 Removed unnecessary B_MODE_INFO memset.
Change-Id: I2bcef6a8e47f88542861fd1356631ca934e2a0e7
2011-02-01 11:35:08 -05:00
Scott LaVarnway
b18df82e1d Moved rd calculation into vp8_pick_intra4x4mby_modes
Then removed unnecessary code.

Change-Id: I142658815d843c9396b07881dbdd8d387c43c90e
2011-02-01 11:26:04 -05:00
Scott LaVarnway
4e7e79f770 Removed intra_modes from vp8cx_encode_intra_macro_block
Restructured function in order to eliminate the prediction
modes save/restore.  Code cleanup also.

Change-Id: I816e3b910de64d0f0f0ddc2398805c63263191e8
2011-02-01 10:05:35 -05:00
Attila Nagy
385c2a76d1 Improved encoder threading
Reduce the number of sync points by letting each thread
continue imediatly with a new MB row.
Better multicore scaling, improves performance by 5-20% on ARM multicore.

Change-Id: Ic97e4d1c4886a842c85dd3539a93cb217188ed1b
2011-02-01 12:17:58 +02:00
Scott LaVarnway
9e7fec216e Removed prediction_error accumulation
from vp8cx_encode_intra_macro_block.  prediction_error is used when
deciding if a frame should be a keyframe.  After reviewing this with
Yaowu, it was pointed out that vp8cx_encode_intra_macro_block
is only called for keyframes, so the accumulation is unnecessary.

Change-Id: Id79dc81b80d4f5d124f3a0dba1b923887e2e1ec8
2011-01-31 19:53:02 -05:00
Scott LaVarnway
317f0da91e Removed last_auto_filter_prediction_error
last_auto_filter_prediction_error is not really used.

Change-Id: Ic6e56c4076bbd250ef783ee1be46964c85f62864
2011-01-31 19:41:09 -05:00
Scott LaVarnway
4a15e55793 Possible bug in vp8cx_encode_intra_macro_block
vp8_pick_intra4x4mby_modes uses the passed in distortion
for an early breakout.  The best distortion was never saved
and the distortion for TM_PRED was always used.

Change-Id: Idbaf73027408a4bba26601713725191a5d7b325e
2011-01-31 17:43:18 -05:00
Scott LaVarnway
60fde4d342 Merge "Performance improvement of first pass" 2011-01-31 13:02:23 -08:00
Yaowu Xu
6d19d40718 Merge "change the threshold of DC check for encode breakout" 2011-01-31 11:00:46 -08:00
John Koleszar
f6214d1db8 Merge "validate min_q against max_q" 2011-01-31 07:33:55 -08:00
John Koleszar
2d03f073a7 validate min_q against max_q
min_q is required to be <= max_q.

Change-Id: I28eccf96df3b52a94913762b54c4fbe0d021ce5e
2011-01-31 10:33:00 -05:00
Adrian Grange
408a8adc15 Merge "Changed condition for using RD in Intra Mode" 2011-01-31 02:18:40 -08:00
Yaowu Xu
8f279596cb change the threshold of DC check for encode breakout
Previously, the DC check is to make sure there is no code-able
DC shift for quantizer Q0, which has been verified rather
conservative. This commit changes the criteria to have two
components, DC and AC, to address the conservativeness. First,
it checks if all AC energy is enough to contribute a single
non-zero quantized AC coefficient. Second, for DC, the decision
to skip further considers two possible scenarios: 1. There is
no code-able 2nd order DC coefficient at all; 2 The residue is
relatively flat, but the uniform DC change is very small, i.e.
less than 1/2 gray level per pixel.

Comparing to previous criteria, the new criteria is about 10%
to 15% faster in encoding time with a very small quality loss.
(threshold ~1000 and quality range 33db-45db)

It should be noted that this commit enables "automatic" static
threshold for encodebreakout if a non-zero small value is passed
in to encoder.

Change-Id: I0f77719a1ac2c2dfddbd950d84920df374515ce3
2011-01-28 09:43:23 -08:00
Johann
f3cb9ae459 Merge "Adds "armvX-none-rvct" targets" 2011-01-28 09:03:58 -08:00
Yunqing Wang
7cbe684ef5 Improve MV prediction in vp8_pick_inter_mode() for speed>3
Applied same method used in vp8_rd_pick_inter_mode() to improve
the accuracy of MV prediction.

Change-Id: Ia50ae26208b18482695601f32febd99fe89fbc17
2011-01-28 10:00:20 -05:00
Adrian Grange
e9f513d74a Changed condition for using RD in Intra Mode
The condition for using RD when selecting the intra coding mode
for a MB is that the RD flag is set AND we're not in real-time
mode.

Previously the code used RD if either the RD flag was set OR
we were not using real-time mode.

Change-Id: Ic711151298468a3f99babad39ba8375f66d55a08
2011-01-28 14:47:36 +00:00
Paul Wilkins
dcb23e2aaa Inconsistent distortion metric in vp8_rd_pick_intra_mbuv_mode
This function was using a variance metric compared to and SSE metric in
other places (eg. vp8_rd_inter_uv)

Change-Id: I9109fcc5a13bca9db1d7ead500fe14999ab233eb
2011-01-28 13:13:30 +00:00
Tero Rintaluoma
11a222f5d9 Adds "armvX-none-rvct" targets
Adds following targets to configure script to support RVCT compilation
without operating system support (for Profiler or bare metal images).
 - armv5te-none-rvct
 - armv6-none-rvct
 - armv7-none-rvct

To strip OS specific parts from the code "os_support"-config was added
to script and CONFIG_OS_SUPPORT flag is used in the code to exclude OS
specific parts such as OS specific includes and function calls for
timers and threads etc. This was done to enable RVCT compilation for
profiling purposes or running the image on bare metal target with
Lauterbach.

Removed separate AREA directives for READONLY data in armv6 and neon
assembly files to fix the RVCT compilation. Otherwise
"ldr <reg>, =label" syntax would have been needed to prevent linker
errors. This syntax is not supported by older gnu assemblers.

Change-Id: I14f4c68529e8c27397502fbc3010a54e505ddb43
2011-01-28 12:47:39 +02:00
Johann
73207a1d8b warning: pointer targets differ in signedness
vp8/encoder/rdopt.c:728: warning: pointer targets in passing argument 3
of 'macro_block_yrd' differ in signedness
vp8/encoder/rdopt.c:541: note: expected 'int *' but argument is of type
'unsigned int *'

distortion is signed when calling macro_block_yrd is both other cases,
as well as for RDCOST

Change-Id: I5e22358b7da76a116f498793253aac8099cb3461
2011-01-27 11:53:26 -05:00
Johann
27000ed6d9 clean up implicit declaration warnings for neon
Change-Id: I6ca2d89f355839c4c770773c09fc69dcea7c1406
warning: implicit declaration of function
  'vp8_variance_halfpixvar16x16_[h|v|hv]_neon'
  'vp8_sub_pixel_variance16x16_neon_func'
2011-01-27 11:31:59 -05:00
Scott LaVarnway
8a5c255b3d Merge "Removed unused members from VP8_COMP" 2011-01-27 08:12:22 -08:00
Yunqing Wang
bb30ffc4dc Merge "Remove copies of same functions" 2011-01-27 08:11:26 -08:00
Yunqing Wang
3ee4e1e79f Merge "Refine motion vector prediction for NEWMV mode" 2011-01-27 08:10:53 -08:00
Scott LaVarnway
3c18a2bb2e Performance improvement of first pass
Improved the performance of the first pass only
(~6% on 720p test clip) by making use of LUT instead of the
float calculations.  Might try a SIMD version later.
Also started to make use of int_mv instead of
MV.

Change-Id: If2a217c7d6b59cd2c25c5553e0ca7e0502403af8
2011-01-26 16:42:56 -05:00
Yunqing Wang
cac54404b9 Remove copies of same functions
Reduce the code size.

Change-Id: I2e1998557a3c8776e262c442fd758c25e17aff7a
2011-01-26 15:37:00 -05:00
Scott LaVarnway
c4887da39c Removed unused members from VP8_COMP
Change-Id: I8f3f2642b02975fbdb14982984a29821f80d30d3
2011-01-26 15:07:17 -05:00
Paul Wilkins
35bb74a6bd Rationalize vp8_rd_pick_intra16x16mby_mode()
Use the function macro_block_yrd() to calculate error and distortion
in keeping with what is done for inter frames.

The old code was using a variance metric for once case and an
SSE function for measuring distortion in the other case.

The function vp8_encode_intra16x16mbyrd() is no longer used.

Change-Id: Ic228cb00a78ff637f4365b43f58fbe5a9273d36f
2011-01-26 18:46:34 +00:00
Paul Wilkins
e8e09d33df Merge "Correction to buffer update for non-viewable frames." 2011-01-26 09:33:48 -08:00
Yaowu Xu
82266a1ac9 Merge "cap the best quantizer for 2nd order DC" 2011-01-26 09:27:11 -08:00
John Koleszar
be3e0ff7c3 Merge "Adds vpx_vp8_enc_asm_offsets.c.o to OBJS-yes list" 2011-01-26 07:29:19 -08:00
Attila Nagy
0def48b60f Adds vpx_vp8_enc_asm_offsets.c.o to OBJS-yes list
Change-Id: Ibd6e3bc82471839904b1086b499efc55f7c5cbaf
2011-01-26 17:06:09 +02:00
Paul Wilkins
a3f71ccff6 Correction to buffer update for non-viewable frames.
The code previously tested cpi->common.refresh_alt_ref_frame
but there are situations where this flag may be set for viewable frames.

The correct test should be !cm->show_frame.

Change-Id: Ia1a600622992a4a68fe1d38ac23bf6b34b133688
2011-01-26 12:52:31 +00:00
Paul Wilkins
2caa36aa4f Merge "Fix for incorrect variable declaration." 2011-01-26 01:53:53 -08:00
Yaowu Xu
999e155f55 cap the best quantizer for 2nd order DC
This commit also removes artificial RDMULT cap for low quantizers.
The intention is to address some abnormal behavior of mode selections
at the low quantizer end, where many macroblocks were coded with
SPLITMV with all partitions using same motion vector including (0,0).
This change improves the compression quality substantially for high
quality encodings in both PSNR and SSIM terms. Overall effect on
mid/low rate range is also positive for all metrics, but smaller
in magnitude.

Change-Id: I864b29c4bd9ff610d2545fa94a19cc7e80c02667
2011-01-25 22:26:18 -08:00
Fritz Koenig
53d8e9dc97 Fix for incorrect variable declaration.
Commit 336aa0b7da incorrectly
declared current_pos as and int, when it should have been
a FIRSTPASS_STATS pointer.

Change-Id: I0a51c7a86ebba8546c95dd5d9d1c1143d4613e40
2011-01-25 15:41:41 -08:00
Johann
907e98fbb5 Merge "update sse2 regular quantizer" 2011-01-25 13:40:28 -08:00
Johann
58f19cc697 Merge "move new neon subpixel function" 2011-01-25 13:09:05 -08:00
Yunqing Wang
dcaaadd8ed Refine motion vector prediction for NEWMV mode
Adjust checking points in motion vector prediction to better cover
possible movements, and get a better prediction. Tests on test
clips showed a 0.1% improvement in SSIM, and no change in PSNR
and performance.

Change-Id: Ifdab05d35e10faea1445c61bb73debf888c9d2f8
2011-01-25 15:54:34 -05:00
Johann
af7d23c9b4 Merge "Fix issue 262, vp8cx_pack_tokens_into_partitions_armv5" 2011-01-25 12:49:52 -08:00
Johann
2168a94495 move new neon subpixel function
previously wasn't guarded with ifdef ARMV7, causing a link error with
ARMV6

Change-Id: I0526858be0b5f49b2bf11e9090180b2a6c48926d
2011-01-25 15:48:37 -05:00
Yunqing Wang
4e149bb447 Merge "Modify calling of NEON code in sub-pixel search" 2011-01-25 09:54:23 -08:00
Attila Nagy
3bf235a4c9 Fix issue 262, vp8cx_pack_tokens_into_partitions_armv5
http://code.google.com/p/webm/issues/detail?id=262
Function was asuming that partitions have equal amount of mb_rows,
which is not always true.

Change-Id: I59ed40117fd408392a85c633beeb5340ed2f4b25
2011-01-25 15:55:02 +02:00
Paul Wilkins
a69c18980f Merge "Incorrect bit allocation in forced KF groups." 2011-01-25 05:32:26 -08:00
Paul Wilkins
336aa0b7da Incorrect bit allocation in forced KF groups.
The old 2 pass code estimated error distribution when coding a
forced (by interval) key frame. The result of this was that in some
cases, when allocating bits at the GF group level within a KF
group there was either a glut of bits or starvation of bits at the end
of the KF group.

Added code to rescan and get the correct data once the position of
a forced key frame has been determined.

Change-Id: I0c811675ef3f9e4109d14bd049d7641682ffcf11
2011-01-25 12:29:06 +00:00
Scott LaVarnway
0ee525d6de Added vp8_update_zbin_extra
vp8cx_mb_init_quantizer was being called for every mode checked
in vp8_rd_pick_inter_mode.  zbin_extra is the only value that
really needs to be recalculated.  This calculation is disabled
when using the fast quantizer for mode selection.
This gave a small performance boost (~.5% to 1%).
Note: This needs to be verified with segmentation_enabled.

Change-Id: I62716a870b3c82b4a998bdf95130ff0b02106f1e
2011-01-24 11:00:56 -05:00
Yunqing Wang
d3e9409bb0 Merge "Modify sub-pixel filters to eliminate unnecessary calculations" 2011-01-21 11:07:17 -08:00
Yunqing Wang
0822a62f40 Modify sub-pixel filters to eliminate unnecessary calculations
In sub-pixel calculation, xoffset and yoffset mostly take some
specific values. Modified sub-pixel filter functions according to
these possible values to improve performance.

Change-Id: I83083570af8b00ff65093467914fbb97a4e9ea21
2011-01-21 13:59:27 -05:00
Paul Wilkins
0cdfef1e22 Modified static scene check.
Added code to scan ahead a few frames when we see what
we think is a static scene in the two pass GF loop to see if the
conditions persist.

Moved calculation of decay rate out into a fuunction.

Change-Id: I6e9c67e01ec9f555144deafc8ae67ef25bffb449
2011-01-21 17:52:00 +00:00
Paul Wilkins
8064583d26 Further work to reduce pulsing.
These changes are specifically targeted at fade transitions to
static scenes. Here we want to place a GF/ARF immediately
after the fade and prevent an ARF just  before the fade.

Also some code lines and comment lines shortened to 80 chars
while I was there.

Change-Id: Iefdc09a4fa7b265048fc017246b73e138693950f
2011-01-20 18:01:20 +00:00
Adrian Grange
815e1e9fe4 Fixed use of motion percentage in KF/GF group calc
In both vp8_find_next_key_frame and define_gf_group,
motion_pct was initialised at the top of the loop before
next_frame stats had been read in.

This fix sets motion_pct after next_frame stats have
been read.

Change-Id: I8c0bebf372ef8aa97b97fd35b42973d1d831ee73
2011-01-20 13:13:33 +00:00
Paul Wilkins
06e7320c3e Merge "First pass loop bug." 2011-01-19 08:33:34 -08:00
Paul Wilkins
e867516843 First pass loop bug.
Incorrect value loop_decay_rate used in GF loop.

The intent was to test the  cumulative value decay_accumulator.

Change-Id: I62928c63eb09f4f6936a45ebd1c23784d1c9681b
2011-01-19 15:50:22 +00:00
John Koleszar
2f0331c90c Merge "Implement error tracking in the decoder" 2011-01-19 05:51:00 -08:00
Henrik Lundin
67fb3a5155 Implement error tracking in the decoder
A new vpx_codec_control called VP8D_GET_FRAME_CORRUPTED. The output
from the function is non-zero if the last decoded frame contains
corruption due to packet losses.

The decoder is also modified to accept encoded frames of zero length.
A zero length frame indicates to the decoder that one or more frames
have been completely lost. This will mark the last decoded reference
buffer as corrupted. The data pointer can be NULL if the length is
zero.

Change-Id: Ic5902c785a281c6e05329deea958554b7a6c75ce
2011-01-19 09:53:21 +01:00
John Koleszar
f97f2b1bb6 Merge "fix last frame buffer copy logic regression" 2011-01-18 12:54:57 -08:00
Yunqing Wang
ce6c954d2e Modify calling of NEON code in sub-pixel search
In vp8_find_best_sub_pixel_step_iteratively(), many times xoffset
and yoffset are specific values - (4,0) (0,4) and (4,4). Modified
code to call simplified NEON version at these specific offsets to
help with the performance.

Change-Id: Iaf896a0f7aae4697bd36a49e182525dd1ef1ab4d
2011-01-18 14:19:52 -05:00
Jim Bankoski
edcf74c6ad vp8e -removed undefined max call
Change-Id: I42a86b0488f44115f09551fc5ad6d711fd470f0d
2011-01-18 11:21:32 -05:00
Paul Wilkins
d6d5d43708 Merge "Further CQ, Key frame and ARF changes" 2011-01-18 08:04:46 -08:00
Paul Wilkins
57136a268a Further CQ, Key frame and ARF changes
This code fixes a bug in the calculation of
the minimum Q for alt ref frames.

It also allows an extended gf/arf interval for sections
of clips that completely static (or nearly so).

Change-Id: I1a21aaa16d4f0578e5f99b13bebd78d59403c73b
2011-01-18 15:19:05 +00:00
Attila Nagy
cb791aaa2f Fix encoder real-time only configuration.
Remove allocation/deallocation of stats storage.
Remove full search functions in machine specific encoder inits.
Remove last pass validation in  validate_config.

Change-Id: I7f29be69273981a4fef6e80ecdb6217c68cbad4e
2011-01-18 08:19:21 -05:00
Paul Wilkins
339c512762 Fix CQ range and experimental KF sizing changes.
The CQ level was not using the q_trans[] array to convert
to a 0-127 range as per min and maxq

Experimental change to try and match the reconstruction
error for forced key frames approximately to that of the
previous frame by means of the recode loop. Though this
may cause extra recodes and the recode behavior has not
been optimized, it can only happen on forced key frames.

Change-Id: I1f7e42d526f1b1cb556dd461eff1a692bd1b5b2f
2011-01-17 17:24:45 +00:00
Johann
15f9bea73b update sse2 regular quantizer
about ~5% gain on 32bit. disabled for 64bit

unset executable bit on ssse3 version (cosmetic)

Change-Id: I1a5860839eb294ce4261f819caea2dcfa78e57ca
2011-01-14 14:26:10 -05:00
Paul Wilkins
a1a4d23797 Merge "KF/GF Pulsing" 2011-01-14 09:20:37 -08:00
Paul Wilkins
3aafb47729 Merge "Testing of modes with Alt Ref frame" 2011-01-14 07:26:37 -08:00
Paul Wilkins
8f711db4e8 Merge "Experimental change to help with ARNR problem." 2011-01-14 07:26:01 -08:00
Paul Wilkins
415371c9d9 Testing of modes with Alt Ref frame
Previously when a frame was being overlaid on a previously coded
alt ref frame we only checked the alt ref 0,0 mode. Where there is
a possibility that the alt ref buffer is a filtered frame we should allow
the other prediction modes as normal or at the least allow use of
the last frame buffer.

Change-Id: I4d6227223d125c96b4f3066ec6ec9484fee7768c
2011-01-14 15:20:45 +00:00
Adrian Grange
2c1b06e672 ARNR filter pointer update bug fix
In cases where the frame width is not a multiple of 16 the
ARNR filter would go wrong.

In vp8_temporal_filter_iterate_c when updating pointers
at the end of a row of MBs,  the image size was
incorrectly used rather than using Num_MBs_In_Row
times 16 (Y) or 8 (U,V).

This worked when width is multiple of 16 but failed
otherwise.

Change-Id: I008919062715bd3d17c7aa2562ab58d1cb37053a
2011-01-14 15:04:39 +00:00
Paul Wilkins
72e22b0bb8 Experimental change to help with ARNR problem.
Allow use of other reference frames for the ARF overlay frame
when ARNR filtering is enabled

Change-Id: Icd6a9fb38977a88fbe7cc9b9c18198eb454c0273
2011-01-14 12:07:12 +00:00
Paul Wilkins
c8338ebf7a KF/GF Pulsing
This change is designed to try and reduce pulsing effects when moving
with a complex transition like a fade, into an easy or static section in
an otherwise difficult clip in CQ mode.

The active CQ level is relaxed down to the user entered level for frames that
are generating less than the passed in minimum bandwidth.

Change-Id: Id6d8b551daad4f489c087bd742bc95418a95f3f0
2011-01-14 11:37:26 +00:00
Scott LaVarnway
b082790c7d Merge "Moved ref frame calculations" 2011-01-13 06:59:28 -08:00
Paul Wilkins
eda7d538bf One pass rate control correction.
Fixed discrepancy cpi->ni_frames vs cm->current_video_frame > 150.

Make one pass path explicit.

There is still scope for some odd behaviour around the transition
point at cpi->ni_frames > 150.

Change-Id: Icdee130fe6e2a832206d30e45bf65963edd7a74d
2011-01-13 12:51:41 +00:00
Paul Wilkins
55acda98f7 Limit key frame quantizer for forced key frames.
Where a key frame occurs because of a minimum interval
selected by the user, then these forced key frames ideally need
to be more closely matched in quality to the surrounding frame.

Change-Id: Ia55b1f047e77dc7fbd78379c45869554f25b3df7
2011-01-12 17:43:59 +00:00
Scott LaVarnway
96fd758ea9 Moved ref frame calculations
Moved ref frame calculations to outside of the
mode_index loop.

Change-Id: I06103fc7e8af88b54b84443acf6691d29b1272ac
2011-01-11 15:00:00 -05:00
Yunqing Wang
6ff2b0883a Merge "Add no_skip_block4x4_search flag in SPLITMV mode" 2011-01-11 08:34:24 -08:00
Johann
e88d7ab245 Merge "use unaligned load" 2011-01-11 08:25:22 -08:00
Johann
f50f2fd2a7 use unaligned load
source buffer is not guaranteed to be aligned for odd size buffers

Change-Id: Id0b1fd40ba3bd6c994bcfada788feccd2b53c5a9
2011-01-11 11:22:29 -05:00
Yunqing Wang
1546e6a8c9 Add no_skip_block4x4_search flag in SPLITMV mode
Add a flag to always enable block4x4 search for speed=0 (good
quality) to guarantee no quality loss for speed0.

Change-Id: Ie04bbc25f7e6a33a7bfa30e05775d33148731c81
2011-01-11 09:50:13 -05:00
Henrik Lundin
48c28fc42c Remove unused local variables
Removing unused local variables causing compiler warnings in
Visual Studio.

Change-Id: I0e2096303be1fdbc01428a6e57cca9796bb32c8a
2011-01-11 15:22:19 +01:00
Yunqing Wang
3675b2291c Fix bug in motion search
The maximum possible MV in 1/8 pel units is (1<<11), which could
cause mvcost out of its range that is 1023. Change maximum
possible MV in 1/8 pel units to (1<<11)-8 will fix this problem.

Change-Id: I5788ed1de773f66658c14f225fb4ab5b1679b74b
2011-01-10 16:16:59 -05:00
Paul Wilkins
cf7c4732e5 Two Pass VBR change
Further experiment with restriction of the Q range.

This uses the average non KF/GF/ARF quantizer,  instead
of just relying on the initial value. It is not such a strong constraint
but there may be a reduced risk of rate misses.

Change-Id: I424fe782a37a2f4e18c70805e240db55bfaa25ec
2011-01-10 16:41:53 +00:00
Paul Wilkins
405499d835 Revert BASE_ERRPERMB
Constant value reverted pending more tests
on different video formats.

Change-Id: I07d11a0e0185e60724698c835416caf2e0774e61
2011-01-10 16:02:51 +00:00
Paul Wilkins
c28b10adeb Merge "CQ Mode" 2011-01-07 11:05:56 -08:00
Paul Wilkins
e0846c9c8c CQ Mode
The merge includes hooks to for CQ mode and other code
changes merged from the test branch.

CQ mode attempts to maintain a more stable quantizer within a clip
whilst also trying to adhere to a guidline maximum bitrate.

The existing target data rate parameter is used to specify the
guideline maximum bitrate.

A new parameter allows the user to specify a target CQ level.

For normal (non kf/gf/arf) frames, the quantizer will not drop BELOW the
user specified value (0-63). However, in some cases the encoder may
choose to impose a target CQ that is above that specified by the user,
if it estimates that consistent use of the target value is not compatible
with guideline maximum bitrate.

Change-Id: I2221f9eecae8cc3c431d36caf83503941b25e4c1
2011-01-07 18:46:29 +00:00
Paul Wilkins
ba976eaa9b Merge "Limit Q variability in two pass." 2011-01-07 09:32:29 -08:00
Paul Wilkins
3af3593c8e Limit Q variability in two pass.
In two pass encoding each frame is given an active
Q range to work with. This change limits how much this
Q range can be altered over time from the initial estimate
made for the clip as a whole.

There is some danger this could lead to overshoot or undershoot
in some corner cases but it helps considerably in regard to
clips where either there is a glut or famine of bits in some sections,
particularly near the end of a clip.

Change-Id: I34fcd1af31d2ee3d5444f93e334645254043026e
2011-01-07 17:23:50 +00:00
Paul Wilkins
f7e2f1fedf Merge "Disable some features for first pass." 2011-01-07 08:34:27 -08:00
Scott LaVarnway
dd314351e6 Merge "Removed cpi->target_bits_per_mb" 2011-01-07 06:46:45 -08:00
Scott LaVarnway
6dbdfe3422 Removed cpi->target_bits_per_mb
cpi->target_bits_per_mb is currently not being used,
so delete it.  Also removed other unused code in rdopt.c.

Change-Id: I98449f9030bcd2f15451d9b7a3b9b93dd1409923
2011-01-07 09:41:13 -05:00
Johann
8b0cf5f79d x86 sse2 temporal_filter_apply
count can be reduced to short because the max number of filtered frames
is set to 15. the max value for any frame is 32 (modifier = 16,
filter_weight = 2). 15*32 = 480 which requires 9 bits

this function goes from about 7000 us / 1000 iterations for the C code
to < 275 us / 1000 iterations for sse2 for block_size = 16 and from
about 1800 us / 1000 iters to < 100 us / 1000 iters for block_size = 8

Change-Id: I64a32607f58a2d33c39286f468b04ccd457d9e6e
2011-01-06 14:00:30 -05:00
John Koleszar
1942eeb886 fix last frame buffer copy logic regression
Commit 0ce3901 introduced a change in the frame buffer copy logic where
the NEW frame could be copied to the ARF or GF buffer through the
copy_buffer_to_{arf,gf}==1 flags, if the LAST frame was not being
refreshed. This is not correct. The intent of the
copy_buffer_to_{arf,gf}==1 flag is to copy the LAST buffer. To copy the
NEW buffer, the refresh_{alt_ref,golden}_frame flag should be used.

The original buffer copy logic is fairly convoluted. For example:

    if (cm->refresh_last_frame)
    {
        vp8_swap_yv12_buffer(&cm->last_frame, &cm->new_frame);

        cm->frame_to_show = &cm->last_frame;
    }
    else
    {
        cm->frame_to_show = &cm->new_frame;
    }
    ...
    if (cm->copy_buffer_to_arf)
    {
        if (cm->copy_buffer_to_arf == 1)
        {
            if (cm->refresh_last_frame)
                vp8_yv12_copy_frame_ptr(&cm->new_frame, &cm->alt_ref_frame);
            else
                vp8_yv12_copy_frame_ptr(&cm->last_frame, &cm->alt_ref_frame);
        }
        else if (cm->copy_buffer_to_arf == 2)
            vp8_yv12_copy_frame_ptr(&cm->golden_frame, &cm->alt_ref_frame);
    }

Effectively, if refresh_last_frame, then new and last are swapped, so
when "new" is copied to ARF, it's equivalent to copying LAST to ARF. If
not refresh_last_frame, then LAST is copied to ARF. So LAST is copied to
ARF in both cases.

Commit 0ce3901 removed the first buffer swap but kept the
refresh_last_frame?new:last behavior, changing the sense since the first
swap wasn't done to the more readable refresh_last_frame?last:new, but
this logic is not correct when !refresh_last_frame.

This commit restores the correct behavior from v0.9.1 and prior. This
case is missing from the test vector set.

Change-Id: I8369fc13a37ae882e31a8a104da808a08bc8428f
2011-01-06 13:07:42 -05:00
Paul Wilkins
431dac08d1 Disable some features for first pass.
The following features don't make sense for the first
pass in its current form and have a significant impact on its
speed (up to 50%).

Slow quantizer, slow dct and trellis optimization.

Change-Id: Id9943f6765ffbd71fc0084ec7dfbc9d376fd6fcd
2011-01-06 17:10:07 +00:00
Paul Wilkins
b095d9df3c Adjustment to boost calculation in two pass.
Calculate a minimum intra value to be used in determining the
IIratio scores used in two pass, second pass.

This is to make sure sections that are low complexity" in the
intra domain are still boosted appropriately for KF/GF/ARF.

For now I have commented out the Q based adjustment of
KF boost.

Change-Id: I15deb09c5bd9b53180a2ddd3e5f575b2aba244b3
2011-01-04 18:11:28 +00:00
Scott LaVarnway
de4e8185e9 Fixed encoder crash when mult-threading is enabled.
Happens in real-time mode.  Will happen in good quality, speed 1.

Change-Id: I3e5b68827b1a5798d0431b088a709256d1ce2c95
2010-12-29 16:41:22 -05:00
Yunqing Wang
a864678cdb Always update last_frame_type
Scott pointed out that last_frame_type only gets updated while
loopfilter exists. Since last_frame_type is also needed in
motion search now, it needs to be updated every frame.

Change-Id: I9203532fd67361588d4024628d9ddb8e391ad912
2010-12-29 10:28:35 -05:00
Scott LaVarnway
3fb4abf3d1 Merge "Use the fast quantizer for inter mode selection" 2010-12-28 11:56:11 -08:00
Scott LaVarnway
516ea8460b Use the fast quantizer for inter mode selection
Use the fast quantizer for inter mode selection and the
regular quantizer for the rest of the encode for good quality,
speed 1.  Both performance and quality were improved.  The
quality gains will make up for the quality loss mentioned in
I9dc089007ca08129fb6c11fe7692777ebb8647b0.

Change-Id: Ia90bc9cf326a7c65d60d31fa32f6465ab6984d21
2010-12-28 14:51:46 -05:00
Yunqing Wang
bf53ec492d Adjust MV borders for SPLITMV mode
Add limits to avoid MV going out of range.

Change-Id: I8a5deb40bf393488d29f694b5a56804d578e68b5
2010-12-28 13:23:07 -05:00
Yunqing Wang
e463b95b4e Merge "Modify motion estimation for SPLITMV mode" 2010-12-28 08:12:26 -08:00
Yunqing Wang
a5a8d92976 Modify motion estimation for SPLITMV mode
1. Search for block8x16/block16x8 uses block8x8's search results.
2. Check block4x4 only if block8x8 is chosen. (This hurts quality,
   which will be improved in another check-in.)
3. In block4x4 search, the previous block's result is used as
   MV predictor for next block.

This change improves performance.

Change-Id: I9dc089007ca08129fb6c11fe7692777ebb8647b0
2010-12-28 10:34:42 -05:00
Yaowu Xu
0f5264b584 adjusted sad_per_bit to correlate with quantizer
Re-calibrated sad_per_bit16 and sad_per_bit4 tables to linearly
correlated to quantizer values, these two variables are used in
motion search for costing motion vectors. This change has an small
positive effect on compression.

Change-Id: Ic9b5ea6fb8d5078ef663ba4899db019cc51f4166
2010-12-23 22:59:38 -08:00
Johann
20b855c33e improve integer version of filter
the lookup table is based on floating point calculations (see source)

by moving the *3 before the downshift and adding the rounding bit, the
delta (LUT - integer) goes from:
______________________________________
__ 1__ 1______________________________
__ 1__ 1______________________________
____ 1______ 1________________________
____ 1 2__ 2 1________________________
______ 1 1 2__ 2__ 2__ 2 1 1__________
________ 1 1 2 2__ 1 2 3 1 2__ 2__ 2__
to:
__-1__-1______________________________
______________________________________
____-1______-1________________________
______________________________________
________-1______________-1____________
______________________________________

it's important to be able to use the integer version because the LUT
more or less precludes SIMD optimizations

Change-Id: I45a81127dc7b72a06fba951649135d9d918386c0
2010-12-22 11:33:59 -05:00
Johann
4b6219cb33 temporal filter naming changes
be more consistant with the naming pattern, especially wrt rtcd

Change-Id: I3df50686a09f1dab0a9620b5adbb8a1577b40f2f
2010-12-22 11:32:15 -05:00
Johann
092b5bef37 abstract apply_temporal_filter
allow for optimized versions of apply_temporal_filter
(now vp8_apply_temporal_filter_c)

the function was previously declared as static and appears to have been
inlined. with this change, that's no longer possible. performance takes
a small hit.

the declaration for vp8_cx_temp_filter_c was moved to onyx_if.c because
of a circular dependency. for rtcd, temporal_filter.h holds the
definition for the rtcd table, so it needs to be included by onyx_int.h.
however, onyx_int.h holds the definition for VP8_COMP which is needed
for the function prototype. blah.

Change-Id: I499c055fdc652ac4659c21c5a55fe10ceb7e95e3
2010-12-22 11:31:54 -05:00
Jim Bankoski
6cb708d501 Merge "Add psnr/ssim tuning option" 2010-12-20 09:32:13 -08:00
John Koleszar
c49f49b113 propagate user private data on decode
The pointer passed in the user_priv argument to vpx_codec_decode()
should be propagated through to the corresponding output frame and
made available in the image's user_priv member. Fixes issue #252

Change-Id: I182746a6882c8549fb146b4a4fdb64f1789eb750
2010-12-17 11:34:02 -05:00
John Koleszar
fc6ce744a6 Merge "Inform caller of decoder about updated references" 2010-12-17 07:08:21 -08:00
John Koleszar
b0da9b399d Add psnr/ssim tuning option
Add a new encoder control, VP8E_SET_TUNING, to allow the application
to inform the encoder that the material will benefit from certain
tuning. Expose this control as the --tune option to vpxenc. The args
helper is expanded to support enumerated arguments by name or value.

Two tunings are provided by this patch, PSNR (default) and SSIM.
Activity masking is made dependent on setting --tune=ssim, as the
current implementation hurts speed (10%) and PSNR (2.7% avg,
10% peak) too much for it to be a default yet.

Change-Id: I110d969381c4805347ff5a0ffaf1a14ca1965257
2010-12-17 10:01:05 -05:00
Henrik Lundin
2a87491fb0 Inform caller of decoder about updated references
Inform the caller of the decoder if a decoded frame updated last,
golden, or altref frames, required for realtime communication
proposed in document VP8 RTP payload format.

Added a new vpx_codec_control called VP8D_GET_LAST_REF_UPDATES, to be
called after vpx_codec_decode. The control will indicate which of the
reference frames that were updated by setting the 3 LSBs in the input
int (pointer).

Change-Id: Iac9db60dac414356c7ffa0b0fede88cb91e11bd7
2010-12-17 14:43:13 +01:00
Scott LaVarnway
64baa8df2e Changed segmentation check order
In SPLITMV, the 8x8 segment will be checked first.  If the 8x8 rd
is better than the best, we check the other segments.  Otherwise
bail.  Adjustments to the thresh_mult were necessary to make
up for the initial quality loss.
The performance improved by 20% (average) for good quality,
speed 0 and speed 1, while the overall quality remained the same.

Change-Id: I717aef401323c8a254fba3e9777d2a316c774cc3
2010-12-16 17:01:27 -05:00
Scott LaVarnway
81cdeb7117 Adjusted breakout RD for SPLITMV
vp8_rd_pick_best_mbsegmentation looks at y only.  The new
breakout does not include the frame cost, the prob_skip_false
cost, or the uv rate.  Performance improved by a few percent
and the quality remained the same.

Change-Id: I94ff013998ac51e8ecce7130870f7b6600758e15
2010-12-16 09:38:02 -05:00
Yunqing Wang
4fbd0227f5 Merge "Fix a bug in motion search code(2)" 2010-12-15 08:10:34 -08:00
Yunqing Wang
08706a3ea7 Fix a bug in motion search code(2)
This fix added MV range checks for NEWMV mode as suggested by Jim.
To reduce unnecessary MV range checks, I tried Yaowu's suggestion.
Update UMV borders in NEWMV mode to also cover MV range check.
Also, in this way, every MV that is valid gets checked in diamond
search function.

Change-Id: I95a89ce0daf6f178c454448f13d4249f19b30f3a
2010-12-14 17:39:25 -05:00
Yaowu Xu
3ac73173a4 Merge "fix a bug that "optimize" flag is not set for sub-threads" 2010-12-14 13:32:04 -08:00
Yunqing Wang
23aa13d92c Merge "Fix a bug in motion search code" 2010-12-14 13:25:34 -08:00
Yunqing Wang
7fb0f86863 Fix a bug in motion search code
The MV's range is 256. Since the new motion search uses a different
starting MV than the center ref MV, a MV range checking needs to
be done to avoid corruption.

Change-Id: I8ae0721d1bd203639e13891e2e54a2e87276f306
2010-12-14 13:59:38 -05:00
Yaowu Xu
64f3d91579 fix a bug that "optimize" flag is not set for sub-threads
The flag for quantization optimization was not properly propagated to
mb row encoding threads.

Change-Id: Ic561599c35acd94cd5698c9b314bccd596ac2deb
2010-12-14 10:12:21 -08:00
Johann
825adc464f shrink TOKENEXTRA and vp8_extra_bit_struct
Per John's previous change, shrink TOKENEXTRA from 20 to 8 bytes
original: b7b1e6fb
reverted: 41f4458a

Also drop unused field from vp8_extra_bit_struct

Update ARM ASM to deal with this change. In particular, Extra is signed
and needs to be sign-extended when loaded.

Change-Id: Ibd0ddc058432bc7bb09222d6ce4ef77e93a30b41
2010-12-14 10:32:50 -05:00
John Koleszar
41f4458a03 Revert "Reduce size of TOKENEXTRA struct"
This reverts commit b7b1e6fb55. Previous
fix is incomplete, breaks ARM. Itchy submit finger.

Change-Id: I939dc0d3bf4173cf951c1d152338ab6ea2184bb9
2010-12-13 17:12:51 -05:00
John Koleszar
3809d7bbd9 Merge "remove unused temporal preproc code" 2010-12-13 13:57:59 -08:00
John Koleszar
398aa81849 Merge "Reduce size of TOKENEXTRA struct" 2010-12-13 13:57:55 -08:00
John Koleszar
b1aa54ab26 remove unused temporal preproc code
This code is unused, as the current preproc implementation uses the
same spatial filter that postproc uses.

Change-Id: Ia06d5664917d67283f279e2480016bebed602ea7
2010-12-13 16:47:59 -05:00
John Koleszar
b7b1e6fb55 Reduce size of TOKENEXTRA struct
Change the size of structure elements to reduce memory utilization.
Removed the 'section' member entirely, as it is set but never read.

Change-Id: Iad043830392fb4168cb3cd6075fb0eb70c7f691c
2010-12-13 16:37:37 -05:00
Yaowu Xu
97a86c5b13 fix a bug in multithreaded encoding with active_map enabled
Added the initialization of the pointer to active map. Also added the
same logic for cyclic refresh in mbrow encoding threads.

Change-Id: Ic48d0849dc706b27fba72d07dcc498075725663d
2010-12-10 10:48:30 -08:00
Fritz Koenig
0ced701487 Merge "vp8 fast quantizer sse2 optimizations for eob." 2010-12-10 09:25:04 -08:00
Fritz Koenig
e0cf330cde vp8 fast quantizer sse2 optimizations for eob.
Changed the end of block computation to use pmaxw.  Removed
additional pushing and popping of registers that was not needed.

Change-Id: I08cb9b424513cd8a2c7ad8cea53b4e2adc66ef98
2010-12-09 15:00:30 -08:00
John Koleszar
cb9698951c fix uninitialized read in encode breakout
Change I3430820 performed an uninitialized read when
encode_breakout == 0, since the sum and sse wouldn't be set:

   if(x->encode_breakout)
       VARIANCE_INVOKE(..., get16x16var)(..., &sum, &sse);
   if (cpi->active_map_enabled && x->active_ptr[0] == 0) {
       ...
   } else if (sse < x->encode_breakout)

Change-Id: I915eb76d1227b4b6d1137a0dedf2c143860098a2
2010-12-09 16:05:26 -05:00
Paul Wilkins
c63fc881e1 Correct q_low and q_high limits for the recode loop
Corrected the initial Q range limits for the recode loop
to reflect the current allowed range for the frame.

In experimental work on constrained quality this bug was
causing unnecessary recodes.

Change-Id: I7e256fbfa681293b0223fe21ec329933d76c229f
2010-12-09 15:02:04 +00:00
Yaowu Xu
160f3c7e9e Merge "vp8e - static threshold play" 2010-12-08 13:08:04 -08:00
Yaowu Xu
d88da98614 Merge "vp8e - remove unnecessary variance calc" 2010-12-08 09:19:22 -08:00
Jim Bankoski
718c19711a vp8e - static threshold play
Realized no need for new assembly code sum is already
calculated.

Change-Id: Ie2d94feb4b7c1f77c5359bca29b66228e41638c9
2010-12-07 16:07:23 -05:00
Scott LaVarnway
f661fa1f24 Merge "vp8_rd_pick_best_mbsegmentation code restructure" 2010-12-07 07:53:12 -08:00
Yaowu Xu
062980cc48 Merge "adjust RDMULT for UV plane in quantization RDO" 2010-12-06 22:04:45 -08:00
Yaowu Xu
7c03a1c308 adjust RDMULT for UV plane in quantization RDO
This patch adds a weighting factor on RDMULT for UV blocks. The change
has an overall gain about 0.5% based on ssim, between 0.1 and 0.2% by
psnr numbers.

Change-Id: I97781b077ce3bb7e34241b03268491917e8d1d72
2010-12-06 20:53:59 -08:00
Yunqing Wang
9520f4b3cc Fix a memory leak problem in encoder
Deallocating the buffers before re-allocating them.

The fix passed James Berry's test program for memory
leak check.

Change-Id: I18c3cf665412c0e313a523e3d435106c03ca438d
2010-12-06 17:21:37 -05:00
Scott LaVarnway
2fa5d5a26d vp8_rd_pick_best_mbsegmentation code restructure
Moved the code from the segmentation loop into a function
which is now called for each segment. This will allow us
to change the segment order checking more easily.

Change-Id: I9510d26f0acae5a73043fcca8f1984b121d3e052
2010-12-06 16:42:52 -05:00
Scott LaVarnway
d283d9bb30 Merge "Improve MV prediction accuracy to achieve performance gain" 2010-12-06 09:41:09 -08:00
Patrik Westin
8534071de0 Fix for manual Golden frame frequency
When auto_golden wasn't set it forced all frames to be a golden
frame. Now the manual configured frequency is adhered to.

Change-Id: I360acac9bc487db0d9c4d4da6ee41f70c227c539
2010-12-06 09:53:41 -05:00
Paul Wilkins
ccb0348473 Merge "Change to inter_minq table." 2010-12-04 02:06:33 -08:00
Paul Wilkins
cec6a596b5 Change to inter_minq table.
The inter_minq table controls the range of quantizers available
for a particular frame in two pass relative to a max Q value.

The changes reduces the range somewhat. The effect of this
was a small increase (0.3% average) in psnr for the test set
but it should also help encode speed somewhat for higher
quality modes as it will reduce the number of iterations in the
recode loop.

The change damps the range of quantizers available locally
within a section of a clip and should therefore help keep quality
more uniform. If there is systematic overshoot or undershoot the
range can shift gradually to accommodate. However, there is
some increased risk of overshoot or undershoot against the target
bit rate in VBR mode and this risk will be more pronounced for short
clips.

The change damps the range of quantizers available locally
within a section of a clip and should therefore help keep quality
more uniform. If there is systematic overshoot or undershoot the
range can shift gradually to accommodate. However, there is
some increased risk of overshoot or undershoot against the
target bit rate in VBR mode and this risk will be more
pronounced for short clips.

Change-Id: I84465567d49ae767c6c73ff2a2aac30c895adb52
2010-12-04 10:04:12 +00:00
Yunqing Wang
c3bbb29164 Improve MV prediction accuracy to achieve performance gain
Add vp8_mv_pred() to better predict starting MV for NEWMV
mode in vp8_rd_pick_inter_mode(). Set different search
ranges according to MV prediction accuracy, which improves
encoder performance without hurting the quality. Also,
as Yaowu suggested, using diamond search result as full
search starting point and therefore adjusting(reducing)
full search range helps the performance.

Change-Id: Ie4a3c8df87e697c1f4f6e2ddb693766bba1b77b6
2010-12-03 15:23:35 -05:00
John Koleszar
5e76dfcc70 Merge 'Add simple version of activity masking.'
Merge commit 'refs/changes/79/779/2' of
    https://review.webmproject.org/p/libvpx

Conflicts:
	vp8/encoder/encodeintra.c
	vp8/encoder/encodemb.c

Change-Id: Id607063fabe92d99eeb3c380e8ca670b01bfb3ef
2010-12-03 13:30:50 -05:00
Fritz Koenig
9c8ad79fdc Set refresh_alt_ref_frame on keyframe encode.
On a keyframe alt ref and golden are refreshed.  The flag was
not being set and so on the frame after a keyframe, motion
search would occur on the alt ref frame.  This is not necessary
because the alt ref frame identical to the last frame in this
scenario.

Handle corner case where a forward alt-ref frame is put
directly after a keyframe.

Change-Id: I9be4cf290d694f8cf2f9a31852014b5ccf1504d3
2010-12-01 12:48:22 -08:00
Jim Bankoski
3430820bbe vp8e - remove unnecessary variance calc
only do the variance calculation if necessary
( eg needed for breakout test)
2010-11-27 14:02:59 -05:00
Pascal Massimino
fd9f9dc054 allow dimensions as low as 1 pixel
remove warning comment in vpxenc.c: in case of 1x1 picture,
detect_bytes will be equal to '3' and we'll fall back to
RAW_TYPE.
fix read_frame() by tracking the pre-read buffer length
in the struct detect

Change-Id: If1ed86ee5260dcdbc8f9d10da6cbb84a4cc2f151
2010-11-24 16:44:33 -08:00
John Koleszar
78cbe51bc3 Merge changes I3aed713e,I9ef7f56e,Ic18c60df
* changes:
  vp8_set_maps: remove hard-coded width/height
  vp8mt_alloc_temp_buffers: make prototype return void
  Disable compile warning for ERROR macro
2010-11-23 12:38:20 -08:00
Paul Wilkins
ad6150f769 Recalibration of bits per MB tables
The baseline bits per MB prediction tables have been
re calibrated based on the assumption that bits per mb
is inversely proportional to the quantizer level.

Change-Id: Ibd355c7acac4b8053dda1baf1032fe35f11da7f7
2010-11-22 13:17:35 +00:00
Paul Wilkins
1753f0d208 Merge "Added extra two pass stats gathering." 2010-11-22 04:11:20 -08:00
Paul Wilkins
70b885a0e8 Added extra two pass stats gathering.
Added code to record spend so far against planed budget.

Change-Id: I5a3335346fa1771b2b1219df9f6127f9993d2594
2010-11-19 14:12:33 -05:00
Pascal Massimino
ed5ab7fa49 remove warning
was having: "vp8/encoder/onyx_if.c:5365: warning: comparison of unsigned expression >= 0 is always true"
2010-11-17 16:50:02 -08:00
Scott LaVarnway
9a6740af80 Merge "Removed unnecessary checks." 2010-11-17 11:28:22 -08:00
Scott LaVarnway
f7670acc68 Removed unnecessary checks.
macro_block_yrd and vp8_rdcost_mby are not called for SPLITMV.

Change-Id: I2224d3c8725df526d48426447482768d543752f1
2010-11-17 14:25:48 -05:00
Paul Wilkins
f874391e02 Replaced recode loop test with a function call
Replaced existing code to decide if a frame recode is required
with a function call. This is to simplify addition of extra clauses
that may be needed for the planned constrained quality mode.

Also fixed a bug where by alt ref not considered in the test.

Change-Id: I3d40bb21abe3e19f8456761e6849deb171738b60
2010-11-17 15:12:04 +00:00
John Koleszar
8d94796cad vp8mt_alloc_temp_buffers: make prototype return void
This function was never called in a context expecting a return value,
the return value was always a constant, and the !CONFIG_MULTITHREAD
path didn't have a return statement, which caused a compiler warning.
This patch changes the function to return void instead.

Fixes issue #231

Change-Id: I9ef7f56e54418b7265026c54fc4ed5660c1418d1
2010-11-17 09:13:57 -05:00
John Koleszar
79e2b1f39b Disable compile warning for ERROR macro
The ERROR macro collides wiith the MS SDK on Windows. Since we're not
making any win32 calls in this function, just #undef it first to take
ownership.

Change-Id: Ic18c60dfa3a33c52e6c49d3f4f8d3e7e3ac3341d
2010-11-17 09:08:51 -05:00
Fritz Koenig
99d02c0f9f Merge "Comments for alt ref flags." 2010-11-16 16:11:39 -08:00
Fritz Koenig
69ee697fef Comments for alt ref flags.
Clarify what the alt ref flags do when encoding.

Change-Id: I71f78e0f42edae633fb91840f29dfbe64362c44c
2010-11-16 15:16:24 -08:00
Yaowu Xu
d49da085c0 correct errors in token alphabet descriptions
There were a few errors in the comment section that describe VP8 token
alphabet table.

Change-Id: Ie6728a0e08bc3798893221b60408d5b201064bdc
2010-11-16 10:51:43 -08:00
Fritz Koenig
e180255375 Remove stack shadowing for x86-x64 for SAD functions.
x86-64 passes arguments in registers.  There is no need to push
them to the stack before using them.

This fixes 15acc84f10 where ebx
was not getting preserved on x86.

Change-Id: I1214b5f818a0201f75ab6ad7d5c6f448e09b16c2
2010-11-15 10:56:02 -08:00
Paul Wilkins
f4709d2895 Merge "Bad cost tables used in ARNR filtering." 2010-11-15 09:55:35 -08:00
Paul Wilkins
373f5c3144 Bad cost tables used in ARNR filtering.
The use of incorrect mv costing tables in the ARNR sub-pel
filtering code led to corruption of the altref buffer in some cases,
particularly at low data rates.

The average gain from this fix is about 0.3% but there are a few
extreme cases where nasty and visible artifacts manifested and
for these few data points the improvement is > 10%.

PGW and AWG

Change-Id: I95cc02b196a433e71d0d2bd2b933fe68ed31e796
2010-11-15 17:47:12 +00:00
Yaowu Xu
73189f21b3 Merge "make rdmult adaptive for intra in quantizer RDO" 2010-11-15 09:22:45 -08:00
Yaowu Xu
ef2f27f10e make rdmult adaptive for intra in quantizer RDO
This intends to correct the tendency that VP8 aggressively favors rate
on intra coded frames. Experiments tested different numbers in [0, 1]
and found 9/16 overall provided about 2-4% gains for all-intra coded
clips based on vpx-ssim metric. The impact on regular encoded clips
is much smaller but positive overall. Overall impact on psnr is also
positive even though very small.

Change-Id: If808553aaaa87fdd44691f9787820ac9856d9f8a
2010-11-11 11:33:35 -08:00
John Koleszar
0a49747b01 quantizer: fix assertion in fast quantizer path
The fast quantizer assembly code has not been updated to match the new
exact quantizer, which was made the default in commit 6adbe09.
Specifically, they are not aware of the potential for the coefficient
to be scaled, which results in the quantized result exceeding the range
of the DCT. This patch restores the previous behavior of using the
non-shifted coefficients when in the fast quantizer code path, but
unfortunately requires rebuilding the tables when switching between the
two.

Change-Id: I0a33f5b3850335011a06906f49fafed54dda9546
2010-11-11 13:05:20 -05:00
Fritz Koenig
58083cb34d Revert "Remove stack shadowing for x86-64"
This reverts commit 15acc84f10.

Change-Id: Ia640be8cbc134432914849c1750f62575ea084e6
2010-11-11 08:20:02 -08:00
Paul Wilkins
213f7b0907 Merge "Relax rate control for last few frames" 2010-11-11 02:39:20 -08:00
Fritz Koenig
9b1ece2cca Merge "Remove stack shadowing for x86-64" 2010-11-10 14:36:10 -08:00
Fritz Koenig
5f0e0617ba FDCT optimizations.
Fixed up the fdct for mmx and 8x4 sse2 to match them
most recent changes.

Change-Id: Ibee2d6c536fe14dcf75cd6eb1c73f4848a56d719
2010-11-10 14:34:02 -08:00
Fritz Koenig
647df00f30 postproc : Re-work posproc calling to allow more flags.
Debugging in postproc needs more flags to allow for specific
block types to be turned on or off in the visualizations.

Must be enabled with --enable-postproc-visualizer during
configuration time.

Change-Id: Ia74f357ddc3ad4fb8082afd3a64f62384e4fcb2d
2010-11-10 14:14:46 -08:00
Paul Wilkins
513f8e6814 Relax rate control for last few frames
VBR rate control can become very noisy for the last few frames.
If there are a few bits to spare or a small overshoot then the
target rate and hence quantizer may start to fluctuate wildly.

This patch prevents further adjustment of the active Q limits for
the last few frames.

Patch also removes some redundant variables and makes one small bug fix.

Change-Id: Ic167831bec79acc9f0d7e4698bcc4bb188840c45
2010-11-10 10:09:45 +00:00
Paul Wilkins
6adbe09058 Tuning for the more exact quantizer.
Small changes to the default zero bin and rounding tables.
Though the tables are currently the same for the Y1 and Y2 cases
I have left them as separate tables in case we want to tune this later.

There is now some adjustment of the zbin based on the prediction mode.
Previously this was restricted to an adjustment for gf/arf 0,0 MV.

The exact quantizer now marginal outperforms and is the default.

The overall average gain is about 0.5%

Change-Id: I5e4353f3d5326dde4e86823684b236a1e9ea7f47
2010-11-10 09:52:58 +00:00
John Koleszar
458f4fedd2 Merge "improve average framerate calculation" 2010-11-09 08:52:16 -08:00
John Koleszar
4d1b0d2a2d Merge commit 'fix integer promotion bug in partition size check'
Change-Id: I4081917b46013fa8f4218cade8bd12cb2d013aee
2010-11-05 16:49:32 -04:00
John Koleszar
9fb80f7170 fix integer promotion bug in partition size check
The check '(user_data_end - partition < partition_size)' must be
evaluated as a signed comparison, but because partition_size was
unsigned, the LHS was promoted to unsigned, causing an incorrect
result on 32-bit. Instead, check the upper and lower bounds of
the segment separately.

Change-Id: I6266aba7fd7de084268712a3d2a81424ead7aa06
2010-11-05 14:52:53 -04:00
John Koleszar
f7e187d362 improve average framerate calculation
Change Ice204e86 identified a problem with bitrate undershoot due to
low precision in the timestamps passed to the library. This patch
takes a different approach by calculating the duration of this frame
and passing it to the library, rather than using a fixed duration
and letting the library average it out with higher precision
timestamps. This part of the fix only applies to vpxenc.

This patch also attempts to fix the problem for generic applications
that may have made the same mistake vpxenc did. Instead of
calculating this frame's duration by the difference of this frame's
and the last frame's start time, we use the end times instead. This
allows the framerate calculation to scavenge "unclaimed" time from
the last frame. For instance:

  start |  end  | calculated duration
  ======+=======+====================
    0ms    33ms   33ms
   33ms    66ms   33ms
   66ms    99ms   33ms
  100ms   133ms   34ms

Change-Id: I92be4b3518e0bd530e97f90e69e75330a4c413fc
2010-11-05 08:42:46 -04:00
Fritz Koenig
0e7b60617f postproc : Update visualizations.
Change color reference frame to blend the macro block edge.
This helps with layering of visualizations.

Add block coloring for intra prediction modes.

Change-Id: Icefe0e189e26719cd6937cebd6727efac0b4d278
2010-11-04 10:35:02 -07:00
Fritz Koenig
0a29bd9793 postproc : Fix display of motion vectors.
Split motion vectors were all being treated as 4x4
blocks.  Now correctly handle 16x8, 8x16, 8x8, 4x4
blocks.

Change-Id: Icf345c5e69b5e374e12456877ed7c41213ad88cc
2010-11-02 13:29:13 -07:00
Scott LaVarnway
b8f43aec66 Merge "SSSE3 version of fast quantizer" 2010-11-02 06:27:29 -07:00
Fritz Koenig
90c505f218 Merge "postproc : Added SPLITMV visualization, fix line constrain." 2010-11-01 14:41:41 -07:00
Fritz Koenig
9f61a83bf9 postproc : Added SPLITMV visualization, fix line constrain.
Now draw 16 vectors for SPLITMV mode.

Fixed constrain line to block divide by zero issues.

Blend block was not centering the shaded area correctly.

Change-Id: I1edabd8b4e553aac8d980f7b45c80159e9202434
2010-11-01 13:27:13 -07:00
Scott LaVarnway
ff4a71f4c2 SSSE3 version of fast quantizer
(test clip: tulip)
For good quality mode with speed=1, this gave the encoder
a small (2 - 3%) performance boost.

Change-Id: I8a1d4269465944ac0819986c2f0be4b0a2ee0b35
2010-11-01 16:24:15 -04:00
Scott LaVarnway
dcee88ea37 Finding first label
Using tables for the label count and label offset.

Change-Id: Iac3d5b292c37341a881be0af282f5cac3b3e01eb
2010-10-29 10:01:04 -04:00
Yunqing Wang
6614563b8f Save XMM registers in asm functions
XMM6/7 are used in these functions, and need to be saved.

Change-Id: I3dfaddaf2a69cd4bf8e8735c7064b17bac5a14e5
2010-10-28 16:59:03 -04:00
Yunqing Wang
f57fc7bcc6 Merge "Fix full-search SAD function crash in Visual Studio" 2010-10-28 13:46:35 -07:00
Yunqing Wang
7e3a1e7361 Fix full-search SAD function crash in Visual Studio
Unlike GCC, Visual Studio compiler doesn't allocate SAD output
array 16-byte aligned, which causes crash in visual studio.

Change-Id: Ia755cf5a807f12929bda8db94032bb3c9d0c2362
2010-10-28 15:26:58 -04:00
Timothy B. Terriberry
c4d7e5e67e Eliminate more warnings.
This eliminates a large set of warnings exposed by the Mozilla build
 system (Use of C++ comments in ISO C90 source, commas at the end of
 enum lists, a couple incomplete initializers, and signed/unsigned
 comparisons).
It also eliminates many (but not all) of the warnings expose by newer
 GCC versions and _FORTIFY_SOURCE (e.g., calling fread and fwrite
 without checking the return values).
There are a few spurious warnings left on my system:

../vp8/encoder/encodemb.c:274:9: warning: 'sz' may be used
 uninitialized in this function
gcc seems to be unable to figure out that the value shortcut doesn't
 change between the two if blocks that test it here.

../vp8/encoder/onyx_if.c:5314:5: warning: comparison of unsigned
 expression >= 0 is always true
../vp8/encoder/onyx_if.c:5319:5: warning: comparison of unsigned
 expression >= 0 is always true
This is true, so far as it goes, but it's comparing against an enum, and the C
 standard does not mandate that enums be unsigned, so the checks can't be
 removed.

Change-Id: Iaf689ae3e3d0ddc5ade00faa474debe73b8d3395
2010-10-27 18:08:04 -07:00
Fritz Koenig
a097e18964 postproc: Tweaks to line drawing and blending.
Turned down the blending level to make colored blocks obscure
the video less.
Not blending the entire block to give distinction to macro
block edges.
Added configuration so that macro block blending function can
be optimized.
Change to constrain line as to when dx and dy are computed.
Now draw two lines to form an arrow.

Change-Id: Id3ef0fdeeab2949a6664b2c63e2a3e1a89503f6c
2010-10-27 13:20:03 -07:00
Yunqing Wang
71ecb5d7d9 Full search SAD function optimization in SSE4.1
Use mpsadbw, and calculate 8 sad at once. Function list:
vp8_sad16x16x8_sse4
vp8_sad16x8x8_sse4
vp8_sad8x16x8_sse4
vp8_sad8x8x8_sse4
vp8_sad4x4x8_sse4

(test clip: tulip)
For best quality mode, this gave encoder a 5% performance boost.
For good quality mode with speed=1, this gave encoder a 3%
performance boost.

Change-Id: I083b5a39d39144f88dcbccbef95da6498e490134
2010-10-27 13:36:31 -04:00
John Koleszar
a0ae3682aa Fix half-pixel variance RTCD functions
This patch fixes the system dependent entries for the half-pixel
variance functions in both the RTCD and non-RTCD cases:

  - The generic C versions of these functions are now correct.
    Before all three cases called the hv code.

  - Wire up the ARM functions in RTCD mode

  - Created stubs for x86 to call the optimized subpixel functions
    with the correct parameters, rather than falling back to C
    code.

Change-Id: I1d937d074d929e0eb93aacb1232cc5e0ad1c6184
2010-10-27 13:00:30 -04:00
Johann
927f29a644 Merge "fix implicit declarations" 2010-10-27 09:59:28 -07:00
Johann
787733d855 Merge "RTCD build is bringing old errors to light" 2010-10-27 09:59:01 -07:00
Fritz Koenig
cf127474d8 vpxdec : Change --pp-debug-info to be a bit field.
This allows multiple post processor debug levels to be overlayed.
i.e. can show colored reference blocks and visual motion vectors.

Change-Id: Ic4a1df438445b9f5780fe73adb3126e803472e53
2010-10-27 09:53:37 -07:00
Fritz Koenig
36ff6a6743 Merge "postproc: Add mode and refrence frame visualizers." 2010-10-27 09:04:39 -07:00
Johann
b90a072f10 fix implicit declarations
ARM used to explicitly remove this file from the build. With the RTCD
changes, that's no longer possible. These errors also exist for x86 w/o
RTCD, but that's not the default configuration

Change-Id: I3e10e5553ddf3278e8d3c9365ca6fb84f52f5066
2010-10-27 11:21:02 -04:00
Johann
abcf36c758 RTCD build is bringing old errors to light
needs to be _recon_ not _recon_recon_

Change-Id: I7a8b9ddcb4fb72c2b723c563932c9ea52ff15982
2010-10-27 10:47:48 -04:00
John Koleszar
1747207700 Merge "Add half-pixel variance RTCD functions" 2010-10-26 20:05:02 -07:00
John Koleszar
1320e54d95 Merge "make vp8_recon16x16mb{,y} RTCD functions" 2010-10-26 20:02:57 -07:00
John Koleszar
87e17737e9 Merge "make arm hex search the generic implementation" 2010-10-26 20:02:37 -07:00
John Koleszar
53f64a7736 Merge "arm: move unrolled loops back to generic code" 2010-10-26 20:02:18 -07:00
John Koleszar
9fdd90c9aa Merge "arm: remove duplicate functions" 2010-10-26 20:01:54 -07:00
John Koleszar
209d82ad72 Add half-pixel variance RTCD functions
NEON has optimized 16x16 half-pixel variance functions, but they
were not part of the RTCD framework. Add these functions to RTCD,
so that other platforms can make use of this optimization in the
future and special-case ARM code can be removed.

A number of functions were taking two variance functions as
parameters. These functions were changed to take a single
parameter, a pointer to a struct containing all the variance
functions for that block size. This provides additional flexibility
for calling additional variance functions (the half-pixel special
case, for example) and by initializing the table for all block sizes,
we don't have to construct this function pointer table for each
macroblock.

Change-Id: I78289ff36b2715f9a7aa04d5f6fbe3d23acdc29c
2010-10-26 20:00:56 -07:00
Fritz Koenig
a0ccc97d8a postproc: Add mode and refrence frame visualizers.
Post process option to color the block for either the mode
of the macro block, or the frame that the macro block references.

Change-Id: Ie498175497f2d20e3319924d352dc4ddc16f4134
2010-10-26 16:00:14 -07:00
John Koleszar
d6c67f02c9 make vp8_recon16x16mb{,y} RTCD functions
ARM NEON has a platform specific version of vp8_recon16x16mb, though
it's just a stub to extract the various parameters from the
MACROBLOCKD struct and pass them to vp8_recon16x16mb_neon(). Using
that function's prototype directly will be a better long term solution,
but it's quite an invasive change.

Change-Id: I04273149e2ade34749e2d09e7edb0c396e1dd620
2010-10-26 13:23:36 -04:00
John Koleszar
96cf6588de make arm hex search the generic implementation
The ARM version of vp8_hex_search() is a faster implementation
of the same algorithm. Since it doesn't use any ARM specific
code, it can be made the default implementation. This removes
a linking error.

Change-Id: I77d10f2c16b2515bff4522c350004e03b7659934
2010-10-26 10:46:31 -04:00
John Koleszar
1e7c05e0b4 Merge "add missing GET_GOT/RESTORE_GOT pairs" 2010-10-26 07:05:21 -07:00
John Koleszar
19638c2309 arm: move unrolled loops back to generic code
Some of the ARM functions differed from their generic counterparts
only by unrolling their loops. Since this change may be useful
on other platforms, or might even supercede the looped version
in the generic case, move it back to the generic file.

This code is left under #if ARCH_ARM for now, but it may be worth
considering a different (possibly new) conditional for these. If
it turns out that this should be runtime selectable, these
functions will have to move to the RTCD infrastructure. Don't want
to take that step at this time without more profile data.

Change-Id: I4612fdbc606fbebba4971a690fb743ad184ff15f
2010-10-26 09:51:35 -04:00
John Koleszar
d330a5876b arm: remove duplicate functions
These functions were true duplicates of functions present in the
generic code. This fixes some of the link errors when building
with --enable-shared --enable-pic.

Change-Id: Idff26599d510d954e439207883607ad6b74df20c
2010-10-26 09:37:44 -04:00
Jim Bankoski
0a5a638c60 Merge commit 'refs/changes/09/809/1' of https://review.webmproject.org/p/libvpx 2010-10-26 07:34:57 -04:00
John Koleszar
b523dd51bd add missing GET_GOT/RESTORE_GOT pairs
These functions made global references but did not set up the GOT,
causing compilation failures in PIC mode.

Change-Id: Iac473bf46733f87eb2e001cd736af4acf73fa51d
2010-10-25 23:45:02 -04:00
Fritz Koenig
1d70aaf08b Merge "Debug option for drawing motion vectors." 2010-10-25 15:40:22 -07:00
Fritz Koenig
d1a4cce809 Debug option for drawing motion vectors.
Postproc level that uses Bresenham's line algorithm
to draw motion vectors onto the postproc buffer.

Change-Id: I34c7daa324f2bdfee71e84fcb1c50b90fa06f6fb
2010-10-25 15:39:04 -07:00
Johann
a3b002fc90 Merge "quiet compiler" 2010-10-25 13:26:55 -07:00
Martin Ettl
c3fd2c4ea7 Fix leaked file descriptor with ENTROPY_STATS
cppcheck found a leaked file descriptor in the debugging code
enabled by defining ENTROPY_STATS. Fixes issue #60.

Change-Id: I0c1d0669cb94d44fed77860f97b82763be06b7cb
2010-10-25 13:16:39 -04:00
Johann
385865f820 quiet compiler
clean up compiler warnings, man in the yellow hat warnings, and start to
remove unused #includes

Change-Id: I6267e98d9b3024b6fb1ef2732b29067a33cb96f6
2010-10-25 10:07:35 -04:00
Johann
1376f061da reuse common loopfilter code
there were four versions for the regular and
macroblock loopfilters:
horizontal [y|uv]
vertical [y|uv]

this moves all the common code into 2 functions:
vp8_loop_filter_neon
vp8_mbloop_filter_neon

this provides no gain in performance. there's a bit
of jitter, but it trends down ~0.25-0.5%. however,
this is a huge gain maintenance. also, there is the
potential to drop some stack usage in the macroblock
loopfilter.

Change-Id: I91506f07d2f449631ff67ad6f1b3f3be63b81a92
2010-10-25 09:48:50 -04:00
Timothy B. Terriberry
b71962fdc9 Add runtime CPU detection support for ARM.
The primary goal is to allow a binary to be built which supports
 NEON, but can fall back to non-NEON routines, since some Android
 devices do not have NEON, even if they are otherwise ARMv7 (e.g.,
 Tegra).
The configure-generated flags HAVE_ARMV7, etc., are used to decide
 which versions of each function to build, and when
 CONFIG_RUNTIME_CPU_DETECT is enabled, the correct version is chosen
 at run time.
In order for this to work, the CFLAGS must be set to something
 appropriate (e.g., without -mfpu=neon for ARMv7, and with
 appropriate -march and -mcpu for even earlier configurations), or
 the native C code will not be able to run.
The ASFLAGS must remain set for the most advanced instruction set
 required at build time, since the ARM assembler will refuse to emit
 them otherwise.
I have not attempted to make any changes to configure to do this
 automatically.
Doing so will probably require the addition of new configure options.

Many of the hooks for RTCD on ARM were already there, but a lot of
 the code had bit-rotted, and a good deal of the ARM-specific code
 is not integrated into the RTCD structs at all.
I did not try to resolve the latter, merely to add the minimal amount
 of protection around them to allow RTCD to work.
Those functions that were called based on an ifdef at the calling
 site were expanded to check the RTCD flags at that site, but they
 should be added to an RTCD struct somewhere in the future.
The functions invoked with global function pointers still are, but
 these should be moved into an RTCD struct for thread safety (I
 believe every platform currently supported has atomic pointer
 stores, but this is not guaranteed).

The encoder's boolhuff functions did not even have _c and armv7
 suffixes, and the correct version was resolved at link time.
The token packing functions did have appropriate suffixes, but the
 version was selected with a define, with no associated RTCD struct.
However, for both of these, the only armv7 instruction they actually
 used was rbit, and this was completely superfluous, so I reworked
 them to avoid it.
The only non-ARMv4 instruction remaining in them is clz, which is
 ARMv5 (not even ARMv5TE is required).
Considering that there are no ARM-specific configs which are not at
 least ARMv5TE, I did not try to detect these at runtime, and simply
 enable them for ARMv5 and above.

Finally, the NEON register saving code was completely non-reentrant,
 since it saved the registers to a global, static variable.
I moved the storage for this onto the stack.
A single binary built with this code was tested on an ARM11 (ARMv6)
 and a Cortex A8 (ARMv7 w/NEON), for both the encoder and decoder,
 and produced identical output, while using the correct accelerated
 functions on each.
I did not test on any earlier processors.

Change-Id: I45cbd63a614f4554c3b325c45d46c0806f009eaa
2010-10-25 09:23:29 -04:00
Johann
e81e30c25d isolate new temporal filtering code
onyx_if is getting pretty big. split out the temporal code to make it
easier to look at.

Change-Id: I207c3a94c90e91b32e3ea5e1836a53b7a990fabd
2010-10-25 09:11:03 -04:00
John Koleszar
3b9e72b210 Merge "Improve handling of invalid frames."
Change-Id: Icef5226a70260607c190126c1c0cc28b796e759c
2010-10-22 11:54:49 -04:00
Timothy B. Terriberry
09bcc1f710 Improve handling of invalid frames.
The code was not checking for frame sizes smaller than 3 bytes, and the
 partition size checks might have failed if the input buffer was within
 16MB of the top of the heap.
In addition, the reference count on the current frame buffer was not
 being decremented on error, so after a small number of errors, no new
 frame buffer could be found and it would run off the list of them.

Change-Id: I0c60dba6adb1e2a29df39754f72a56ab6c776b46
2010-10-22 11:50:56 -04:00
Timothy B. Terriberry
8f75ea6b5c Convert [4][4] matrices to [16] arrays.
Most of the code that actually uses these matrices indexes them as
 if they were a single contiguous array, and coverity produces
 reports about the resulting accesses that overflow the static
 bounds of the first row.
This is perfectly legal in C, but converting them to actual [16]
 arrays should eliminate the report, and removes a good deal of
 extraneous indexing and address operators from the code.

Change-Id: Ibda479e2232b3e51f9edf3b355b8640520fdbf23
2010-10-21 17:04:30 -07:00
Frank Galligan
45e6494177 Change altref times to preceding pts+1.
Change the pts of the altref frame to be as close as possible to the
pts of the preceding frame and still be strictly increasing.

Change-Id: Iae3033a4c89ae5a9d0e5c4198e9196e5f3ee57c7
2010-10-21 14:11:58 -04:00
John Koleszar
1ee3ebcd66 Merge "Move firstpass motion map to stats packet" 2010-10-21 11:09:02 -07:00
John Koleszar
bb7dd5b1ba Move firstpass motion map to stats packet
The first implementation of the firstpass motion map for motion
compensated temporal filtering created a file, fpmotionmap.stt,
in the current working directory. This was not safe for multiple
encoder instances. This patch merges this data into the first pass
stats packet interface, so that it is handled like the other
(numerical) firstpass stats.

The new stats packet is defined as follows:
    Numerical Stats (16 doubles) -- 128 bytes
    Motion Map                   -- 1 byte / Macroblock
    Padding                      -- to align packet to 8 bytes

The fpmotionmap.stt file can still be generated for debugging
purposes in the same way that the textual version of the stats
are available (defining OUTPUT_FPF in firstpass.c)

Change-Id: I083ffbfd95e7d6a42bb4039ba0e81f678c8183ca
2010-10-21 14:04:20 -04:00
Yunqing Wang
4cefb4434f Add MMWORD PTR/XMMWORD PTR in subtract_sse2.asm
Change-Id: Ia649b500ef020225d8bbf611799d0f47658dc2ac
2010-10-21 13:42:24 -04:00
Yunqing Wang
31752f2f41 Merge "Rewrite vp8_short_walsh4x4_sse2()" 2010-10-21 10:31:23 -07:00
Yunqing Wang
0918747520 Merge "Add SSE2 subtract functions" 2010-10-21 10:30:27 -07:00
Fritz Koenig
15acc84f10 Remove stack shadowing for x86-64
x86-64 passes most arguments in registers.  There is no need to
push them to the stack before using them.

Change-Id: I13c683f1358782682ecafaf1df3fb0af23b978ea
2010-10-21 10:28:08 -07:00
Yunqing Wang
fc94ffcea4 Rewrite vp8_short_walsh4x4_sse2()
This rewriting reflects changes made in commit "Improve the
accuracy of forward walsh-hadamard transform". Since this function
is not called much, only a small encoder performance gain (~0.5% )
is seen.

Change-Id: Ie9df58a43028a11fd5b115c4bbe3141f7596578b
2010-10-21 13:02:55 -04:00
John Koleszar
bdf469c91e Merge "Update arnr strength range form 1-6 to 0-6." 2010-10-19 20:20:31 -07:00
Frank Galligan
15542721ee Update arnr strength range form 1-6 to 0-6.
Change-Id: I8eb49c56f7509f0a8074d440e8345b9e3344b85b
2010-10-19 20:18:13 -07:00
Yaowu Xu
fc2f8dafaf Merge "fixed a typo that mis-used Y plane stride for UV blocks." 2010-10-19 16:23:31 -07:00
Yaowu Xu
b9fe6d4da4 Merge "change to make use of more trellis quantization" 2010-10-19 08:11:52 -07:00
Yunqing Wang
4db2076594 Add SSE2 subtract functions
Instead of doing 8-bit data unpack and 16-bit subtraction, use
psubb to do 16 8-bit subtractions and pcmpgtb to preserve the
sign information. This does not bring noticable gain since
these functions are not called frequently.

Change-Id: I90a0dfaa3db9d422e4ada324076596ffb178548e
2010-10-18 14:15:15 -04:00
Johann
ce1ce992ce copy compiler warning fixes
generic version got fixed, but not the arm version. fixes:
vp8/encoder/arm/mcomp_arm.c: In function 'vp8_full_search_sadx3':
vp8/encoder/arm/mcomp_arm.c:1208: warning: pointer targets in passing
argument 5 of 'fn_ptr->sdx3f' differ in signedness
vp8/encoder/arm/mcomp_arm.c:1208: note: expected 'unsigned int *' but
argument is of type 'int *'

and another unsigned change to keep the files similar

Change-Id: I1b6255dc3a03b90394a791ee0d15d8167d9454db
2010-10-18 13:23:39 -04:00
Johann
963bcd6c87 remove dead code
vp8_diamond_search_sadx4 isn't used in arm because there is no
corrosponding sdx4df as in x86. rather than keep it in sync with
../mcomp.c, delete it

vp8_hex_search had the original, more readable/understandable code if`d
out. it's also available in ../mcomp.c, so remove the dead copy

Change-Id: Ia42aa6e23b3a2e88040f467280befec091ec080e
2010-10-15 15:37:09 -04:00
Yaowu Xu
2e53e9e53f change to make use of more trellis quantization
when a subsequent frame is encoded as an alt reference frame, it is
unlikely that any mb in current frame will be used as reference for
future frames, so we can enable quantization optimization even when
the RD constant is slightly rate-biased. The change has an overall
benefit between 0.1% to 0.2% bit savings on the test sets based on
vpxssim scores.

Change-Id: I9aa7bc5cd573ea84e3ee655d2834c18c4460ceea
2010-10-15 10:14:34 -07:00
Jim Bankoski
39f41a4f36 safety check to avoid divide by 0s 2010-10-14 16:19:06 -04:00
Yunqing Wang
a2b598a2f9 Merge "Fix one gcc compiler warning" 2010-10-14 12:20:25 -07:00
Yunqing Wang
7804befb55 Fix one gcc compiler warning
../libvpx/vp8/encoder/bitstream.c: In function ‘pack_inter_mode_mvs’:
../libvpx/vp8/encoder/bitstream.c:1026: warning: array subscript has type ‘char’

Change-Id: Ic77491e0a172fa1821e5b3e914d0dc41fe87c00f
2010-10-14 15:15:35 -04:00
Yunqing Wang
7f31d987f0 Merge "Improve bounds checking in vp8_diamond_search_sadx4()" 2010-10-14 11:29:24 -07:00
Yunqing Wang
d6da7b8ea1 Improve bounds checking in vp8_diamond_search_sadx4()
In order to know if all 4/8 neighbor points are within the bounds,
4 bounds checking are enough instead of checking 4 bounds for
each points (16/32 checkings). This improvement reduces cost of
vp8_diamond_search_sadx4() by 30%, and gives encoder a 1.5%
performance gain (test options: 1 pass, good, speed=4).

Change-Id: Ie8da29d18a6ecfc9829e74ac02f6fa70e042331a
2010-10-14 11:06:37 -04:00
Fritz Koenig
1dc0ca1340 Fix compiler warning about vp8_fast_quantize_b_impl_ssse2.
Typo had function defined as _ssse2 and prototyped as _sse2.

Change-Id: If9f19da1a83cff40774a90cf936d601c0bf1b7fe
2010-10-13 17:08:13 -07:00
Fritz Koenig
92df4a06d2 Correct QWORD usage in assembly files
QWORD was being undefined because it was being used
incorrectly.

Change-Id: I3610cefa3d6f0da4054316760f78b9694cde3876
2010-10-13 16:57:57 -07:00
John Koleszar
136857475e Centralize mb skip state calculation
This patch moves the scattered updates to the mb skip state
(mode_info_context->mbmi.mb_skip_coeff) to vp8_tokenize_mb. Recent
changes to the quantizer exposed a bug where if a macroblock
could be coded as a skip but isn't, the encoder would run the
loopfilter but the decoder wouldn't, causing a reference buffer
mismatch.

The loopfilter is controlled by a flag called dc_diff. The decoder
looks at the number of decoded coefficients when setting this flag.
The encoder sets this flag based on the skip state, since any
skippable macroblock should be transmitted as a skip. The coefficient
optimization pass (vp8_optimize_b()) could change the coefficients
such that a block that was not a skip becomes one. The encoder was
not updating the skip state in this situation for intra coded blocks.

The underlying issue predates it, but this bug was recently triggered
by enabling trellis quantization on the Y2 block in commit dcd29e3,
and by changing the quantizer range control in commit 305be4e.

Change-Id: I5cce5da0dbc2d22f7d79ee48149f01e868a64802
2010-10-12 09:03:19 -04:00
John Koleszar
acff1627b8 Merge "Add const qualifiers to variance/SAD functions." 2010-10-12 05:44:20 -07:00
Timothy B. Terriberry
8d0f7a01e6 Add simple version of activity masking.
This uses MB variance to change the RDO weight for mode decision
 and quantization.
Activity is normalized against the average for the frame, which is
 currently tracked using feed-forward statistics.
This could also be used to adjust the quantizer for the entire
 frame, but that requires more extensive rate control changes.
This does not yet attempt to adapt the quantizer within the frame,
 but the signaling cost means that will likely only be useful at
 very high rates.

Change-Id: I26cd7c755cac3ff33cfe0688b1da50b2b87b9c93
2010-10-12 08:41:03 -04:00
Timothy B. Terriberry
f4a8594492 Add const qualifiers to variance/SAD functions.
These functions should never change their input, and there's no
 reason not to declare that.
This allows them to be passed static const data.

Change-Id: Ia49fe4b01e80e9afcb24b4844817694d4da5995c
2010-10-12 08:40:54 -04:00
John Koleszar
037345eb69 Merge "Move vp8_strict_quantize_b inside EXACT_QUANT #define." 2010-10-12 05:34:30 -07:00
John Koleszar
fc018e0d92 Merge "Remove INTRARDOPT #define and intra_rd_opt option." 2010-10-12 05:33:22 -07:00
Timothy B. Terriberry
82c4339885 Move vp8_strict_quantize_b inside EXACT_QUANT #define.
There is currently no inexact version of this function, so do not
 even compile it without EXACT_QUANT.
This will prevent someone from inadvertently trying to use it without
 the proper EXACT_QUANT setup.

Change-Id: Ia13491e0128afb281c05c9222ee5987101e4010d
2010-10-11 13:51:35 -07:00
Timothy B. Terriberry
dd08db9315 Remove INTRARDOPT #define and intra_rd_opt option.
This is just eliminating some cruft.
Although a number of variables are declared only when INTRARDOPT
 is defined, they are used elsewhere without that protection, and
 no longer just for intra RDO.
The intra_rd_opt flag was hard-coded to 1 and never checked.

Change-Id: I83a81554ecee8053e7b4ccd8aa04e18fa60f8e4f
2010-10-11 11:53:57 -07:00
Scott LaVarnway
6b1b28a83c Merge "Added vp8_fast_quantize_b_sse2" 2010-10-11 09:34:48 -07:00
Yunqing Wang
7e6f7b579a Remove unused file in encoder
Remove vp8/encoder/x86/csystemdependent.c

Change-Id: I7c590dcd07b68704d463a1452f62f29ffb1402f4
2010-10-07 12:08:08 -04:00
Scott LaVarnway
d860f685b8 Added vp8_fast_quantize_b_sse2
Moved vp8_fast_quantize_b_sse from quantize_mmx.asm into
quantize_sse2.asm and renamed.  Updated the assembly code to
match the C version.

Change-Id: I1766d9e1ca60e173f65badc0ca0c160c2b51b200
2010-10-07 11:43:19 -04:00
Yaowu Xu
d338d14c6b optimize fast_quantizer c version
As the zbin and rounding constants are normalized, rounding effectively
does the zbinning, therefore the zbin operation can be removed. In
addition, the memset on the two arrays are no longer necessary.

Change-Id: If39c353c42d7e052296cb65322e5218810b5cc4c
2010-10-06 13:28:36 -07:00
Paul Wilkins
2931b05ac5 Merge "Tune effect of motion on KF/GF boost in two pass;" 2010-10-05 06:58:24 -07:00
Jan Kratochvil
1fc294116a nasm: movhps compatibility QWORD->MMWORD
Filed for nasm as:
https://sourceforge.net/tracker/?func=detail&atid=106208&aid=3081103&group_id=6208

nasm just does not accept any size parameter for movhps:
1.asm:2: error: mismatch in operand sizes

Some parts of libvpx already use MMWORD for movhps and MMWORD is
defined-out so it is compatible both with yasm and nasm.

Provide nasm compatibility. No binary change by this patch with yasm on
{x86_64,i686}-fedora13-linux-gnu.

Change-Id: I4008a317ca87ec07c9ada958fcdc10a0cb589bbc
2010-10-04 20:47:19 -04:00
Jan Kratochvil
5cdc3a4c29 nasm: address labels 'rel label' vice 'wrt rip'
nasm does not support `label wrt rip', it requires `rel label'. It is
still fully compatible with yasm.

Provide nasm compatibility. No binary change by this patch with yasm on
{x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.

Change-Id: I488773a4e930a56e43b0cc72d867ee5291215f50
2010-10-04 19:47:54 -04:00
Jan Kratochvil
e114f699f6 nasm: match instruction length (movd/movq) to parameters
nasm requires the instruction length (movd/movq) to match to its
parameters. I find it more clear to really use 64bit instructions when
we use 64bit registers in the assembly.

Provide nasm compatibility. No binary change by this patch with yasm on
{x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.

Change-Id: Id9b1a5cdfb1bc05697e523c317a296df43d42a91
2010-10-04 23:36:29 +02:00
Yaowu Xu
49fdb7c41e fixed a typo that mis-used Y plane stride for UV blocks.
Raised by Lei Yang, the Y plane stride was used for UV blocks.
This is clearly a typo. But as the comments in the code suggested
that this port of code has not been used yet, so the typo should
not have created any damage yet.

Change-Id: Iea895edc17469a51c803a8cc6d0fce65a1a7fc2f
2010-10-04 11:31:14 -07:00
Yaowu Xu
2d4ef37507 Merge "enable trellis quantization for 2nd order blocks" 2010-10-04 10:41:20 -07:00
Paul Wilkins
788c0eb54e Tune effect of motion on KF/GF boost in two pass;
This code adjust the impact of the amount and speed of motion
on GF and KF boost.

Sections with lots of slow motion will tend to have a
somewhat bigger boost and sections with fast motion may
have less.

There is a knock on effect to the selection of the active
quantizer range.

This will likely require further tuning but helps with a couple
of particularly bad edge cases.

Change-Id: Ic2449cda7305672b69acf42fc0a845b77ac98d40
2010-10-02 17:31:46 +01:00
Yaowu Xu
dcd29e369f enable trellis quantization for 2nd order blocks
Experimented with different value for Y2_RD_MULT ranging f[1, 32],
without adapting the value to MB coding mode/frame type/Q value,
4 works out best among all values, providing overall 0.1% coding
gain on the test set.

Change-Id: I6b2583a8aa5db5e7e5c65c646301909c0c58f876
2010-10-02 06:20:33 -07:00
Johann
f143a81191 Merge "Fix valgrind errors in the NEON loop filters." 2010-10-01 06:18:53 -07:00
Adrian Grange
999bc00301 Made temporal filter default to use centered mode
If temporal filtering is enabled but a filter type is not specified
centered filter mode is used by default.

Change-Id: I87306f267c1390074c806c506a69b4ba914d92a2
2010-10-01 10:14:01 +01:00
Timothy B. Terriberry
a465076e02 Fix valgrind errors in the NEON loop filters.
Like the ARMv6 code, these functions were accessing values below
 the stack pointer, which can be corrupted by signal delivery at
 any time.
2010-09-30 20:40:45 -07:00
John Koleszar
0faa8a0861 Merge "Rename mode_ref_lf_test_function" 2010-09-30 10:26:31 -07:00
John Koleszar
a047fee606 Merge "Fix loopfilter delta zero transitions" 2010-09-30 10:26:10 -07:00
Adrian Grange
8ee7284d60 Changed defaults & range checking for AltRef params
Modified the range checking of parameters used in the
AltRef temporal filter (arnr-max-frames, arnr-strength,
arnr-type) and default values for each of them.

Change-Id: Ib261028d501b9523f6e44cb4790cc52167b6e92b
2010-09-30 10:06:09 +01:00
John Koleszar
7e5e31516c Rename mode_ref_lf_test_function
This function graduated from being a test func to something that's on
by default. Rename it and remove some spurious comments that confuse
its status.

Change-Id: I689695a3ad29c35e9a72a43ec93766733ac6c20b
2010-09-29 13:53:14 -04:00
Fritz Koenig
439b2ecd74 Merge "Optimizations on the loopfilters." 2010-09-29 10:47:01 -07:00
John Koleszar
b9be7a464f Fix loopfilter delta zero transitions
Loopfilter deltas are initialized to zero on keyframes in the decoder.
The values then persist from the previous frame unless an update bit
is set in the bitstream. This data is not included in the entropy
data saved by the 'refresh entropy' bit in the bitstream, so it is
effectively an additional contextual element beyond the 3 ref-frames
and the entropy data.

The encoder was treating this delta update bit as update-if-nonzero,
meaning that the value would be refreshed even if it hadn't changed,
and more significantly, if the correct value for the delta changed
to zero, the update wouldn't be sent, and the decoder would preserve
the last (presumably non-zero) value.

This patch updates the encoder to send an update only if the value
has changed from the previously transmitted value. It also forces the
value to be transmitted in error resilient mode, to account for lost
context in the event of lost frames.

Change-Id: I56671d5b42965d0166ac226765dbfce3e5301868
2010-09-29 13:04:04 -04:00
Paul Wilkins
7288cdf79d Change to coefficient optimization rules.
Allow coefficient optimization for good quality speed 0.

Change-Id: Id0cb363df6823c6798671584fbba097916a7df2c
2010-09-29 13:22:05 +01:00
Adrian Grange
4f92b96bdb Merge "Moved row-specific computation of MV bounds out of col loop" 2010-09-29 05:13:41 -07:00
Adrian Grange
0e7c45b391 Moved row-specific computation of MV bounds out of col loop
Moved the bounds computation on vertical MV component out
of the loop that processes MBs within a MB row.
2010-09-29 13:03:07 +01:00
Paul Wilkins
ff3068d6da Control of active min quantizer for two pass.
Create  look up tables for controlling the active quantizer range.
Some initial tuning to improve quality circa 0.5% on test set.
Clean up of some stats output code

Change-Id: Ia698a8525f8b8129a503cadace3ee73fe888f543
2010-09-29 12:03:19 +01:00
Fritz Koenig
0964ef0e71 Optimizations on the loopfilters.
- Scheduling for Atom processors
- Combining of macros to allow for better interleaving
- Change from multiplies to adds for main filter
- Use of movhps/movlps to fill xmm registers without
  shifting and orring

Change-Id: I0b3500a5f58abf7085253ec92d64c8a96723040b
2010-09-28 12:01:34 -07:00
Adrian Grange
47fc8f2683 Enabled AltRef motion map creation
Enabled the first-pass encode to output the
map of macroblock coding modes required by
the AltRef filter.
2010-09-28 16:52:19 +01:00
Adrian Grange
0090328164 Merge "Made AltRef filter adaptive & added motion compensation" 2010-09-28 08:34:44 -07:00
Adrian Grange
1b2f8308e4 Made AltRef filter adaptive & added motion compensation
Modified AltRef temporal filter to adapt filter length based
on macroblock coding modes selected during first-pass
encode.

Also added sub-pixel motion compensation to the AltRef
filter.
2010-09-28 15:23:41 +01:00
Timothy B. Terriberry
18dc92fd66 Add 4-tap version of 2nd-pass ARMv6 MC filter.
The existing code applied a 6-tap filter with 0's on either end.
We're already paying the branch penalty to avoid computing the two
 extra columns needed as input to this filter.
We might as well save time computing the filter as well.
This reduces the inner loop from 21 instructions to 16, the number
 of loads per iteration from 4 to 1, and the number of multiplies
 from 7 to 4.
The gain in overall decoding performance, however, is small (less
 than 1%).

This change also means we now valgrind clean on ARMv6, which is
 its real purpose.
The errors reported here were valgrind's fault (it does not detect
 that 0 times an uninitialized value is initialized), but Julian
 Seward says it would slow down valgrind considerably to make such
 checks.
Speeding up libvpx rather, even by a small amount, seems a much
 better idea if only to enable proper valgrind checking of the
 rest of the codec.

Change-Id: Ifb376ea195e086b60f61daf1097d8910c4d8ff16
2010-09-27 18:25:45 -07:00
Paul Wilkins
305be4e417 Badly placed initialization of rolling rate monitors.
This affects control of the active quantizer range.

Change-Id: I30511fc81ac9f75ff20d9f1372382423d56739da
2010-09-27 12:50:55 -04:00
John Koleszar
2b521ab551 move reconintra_mt to decoder (fixup)
Missed the .h file in the move.

Change-Id: Ib408183fbb4d019fd46394b362f89ca6ea9d10bc
2010-09-27 12:48:31 -04:00
John Koleszar
9fdcdc511d Merge "disable compilation of debugging code" 2010-09-27 07:00:03 -07:00
Johann
063be9b82a Merge "combine max values and compare once" 2010-09-27 06:39:20 -07:00
Timothy B. Terriberry
e2795e9978 Fix valgrind errors in vp8_sixtap_predict8x4_armv6().
This function was accessing values below the stack pointer, which
 can be corrupted by signal delivery at any time.

Change-Id: I92945b30817562eb0340f289e74c108da72aeaca
2010-09-24 14:34:18 -07:00
Johann
f30e8dd7bd combine max values and compare once
previous implementation compared each set of values to limit and then
&'d them together, requiring a compare and & for each value.

this does the accumulation first, requiring only one compare

Change-Id: Ia5e3a1a50e47699c88470b8c41964f92a0dc1323
2010-09-24 15:42:50 -04:00
John Koleszar
dbd57c2663 Merge "move reconintra_mt to decoder (for now)" 2010-09-24 08:46:35 -07:00
John Koleszar
8ca779aba8 disable compilation of debugging code
This patch avoids compiling some debugging code in onyx_if.c. The most
significant fix is to avoid generating code for vp8_write_yuv_frame,
which is never called. Some other code was removed by the dead code
elimination performed by the compiler, and this patch does it with the
preprocessor instead. There are advantages both ways.

Change-Id: I044fd43179d2e947553f0d6f2cad5b40907ac458
2010-09-24 11:42:22 -04:00
Yunqing Wang
aab0f5b121 Merge "Adjust multi-thread sync ranges according to image sizes" 2010-09-24 08:34:07 -07:00
John Koleszar
48e76ff4fd move reconintra_mt to decoder (for now)
reconintra_mt.c is only required for building the decoder right now.
It could definitely be used for the encoder in the future, but it
currently depends on decoder only data structures. (onyxd_int.h,
VP8D_COMP, etc). Move it from common/ to decoder/ until the
necessary changes to the common multithread code are complete.

This patch is needed to build with --disable-vp8-decoder.

Change-Id: I568c52221a2b309234d269675cba97131ce35c86
2010-09-24 11:23:06 -04:00
John Koleszar
329aaaf453 Merge "Add getter functions for the interface data symbols" 2010-09-24 05:39:48 -07:00
John Koleszar
fa7a55bb04 Add getter functions for the interface data symbols
Having these symbols be available as functions rather than data is
occasionally more convenient. Implemented this way rather than a
get-codec-by-id style to avoid creating a link-time dependency
between the encoder and the decoder.

Fixes issue #169

Change-Id: I319f281277033a5e7e3ee3b092b9a87cce2f463d
2010-09-23 14:58:43 -04:00
Yunqing Wang
8db5da2906 Adjust multi-thread sync ranges according to image sizes
In multi-threaded decoder, set different sync ranges for
different video resolutions.

Change-Id: Iea48fd36f51919e0152c8ed3b1f10e1b723c0ca7
2010-09-23 13:53:09 -04:00
Johann
7fed3832e7 Remove dead code
The new loopfilter was originally introduced as an experimental change.
It's permanent now.

Change-Id: I25dbedb6ceff3e9f9c04e18bb29f84c3ecb7e546
2010-09-22 11:07:34 -04:00
John Koleszar
cdd2066687 unset execute bit on c source
Change-Id: I6625ee41f8872908cb015ce0729e1c7a105b5217
2010-09-21 19:48:06 -04:00
John Koleszar
6f4c0435d1 Merge "Don't reset mb clamping state during splitmv decoding" 2010-09-21 09:06:59 -07:00
John Koleszar
4d391e8ed2 Don't reset mb clamping state during splitmv decoding
The MV decoding changes in c5fb0eb introduced a bug where the
macroblock clamping state was reset for each partition, so if an
earlier partition needed clamping but a subsequent one didn't,
the MB wouldn't receive clamping. Instead, the state is only
set during splitmv decoding, never cleared.

Change-Id: I224fe258493405ee0f6a04596acdb622c475e845
2010-09-21 11:58:48 -04:00
John Koleszar
015cfcafbd Merge "Add high limit check for unsigned parameters" 2010-09-21 05:36:46 -07:00
Yunqing Wang
a23ccf8f8c Merge "Restructure multi-threaded decoder" 2010-09-21 05:00:30 -07:00
Fritz Koenig
b7dc9398f2 Use movq instead of movdqu.
Movdqu is more expensive (throughput, uops) than movq.  Minimal
impact for newer big cores, but ~2.25% gain on Atom.

Change-Id: I62c80bb1cc01d8a91c350c4c7719462809a4ef7f
2010-09-20 11:34:26 -07:00
Fritz Koenig
1c906448cc Merge "Better choice of instruction filter mask comparision." 2010-09-20 11:01:51 -07:00
Johann
6cf2b4aa0e Merge "reorder data to use wider instructions" 2010-09-20 10:47:33 -07:00
Johann
9c9afbab85 Merge "Update NEON wide idcts" 2010-09-20 10:47:22 -07:00
Fritz Koenig
8eae7fe7e8 Better choice of instruction filter mask comparision.
Use pmaxub instead of a combination of psubusb/por to
determine if any comparisons go over the limit.

Change-Id: I3f0bd7d2aabe5fee9ba6620508e2b60605abcb82
2010-09-20 10:20:38 -07:00
Guillermo Ballester Valor
236906863a Add high limit check for unsigned parameters
The patch related with issue #55 (5a72620) fixed some warnings, but the
fix was not optimal. It actually was a trick to confuse compiler rather
than a fix.

This patch fixes it by creating a new macro used when needed just a high
limit check for an unsigned.

Change-Id: I94b322e0f7fb07604b3b1df1f9321185f48cfcb5
2010-09-20 10:03:05 -04:00
Johann
022323bf85 reorder data to use wider instructions
the previous commit laid the groundwork by doing two sets of idcts
together. this moved that further by grouping the interesting data
(q[0], q+16[0]) together to allow using wider instructions. also
managed to drop a few instructions by recognizing that the constant
for sinpi8sqrt2 could be downshifted all the time which avoided a
dowshift as well as workarounds for a function which only accepted
signed data

looks like a modest gain for performance: at qcif, went from ~180
fps to ~183
Change-Id: I842673f3080b8239e026cc9b50346dbccbab4adf
2010-09-17 16:47:39 -04:00
Yunqing Wang
f857a85088 Restructure multi-threaded decoder
On each MB, loopfiltering is done right after MB decoding. This
combines two loops in multi-threaded code into one, which reduces
number of synchronizations to half.

The above-row/left-col data are saved in temp buffers for
next-row/next MB decoding.

Tests on 4-core gLucid machine showed 10% decoder performance
gain with threads=4 (tulip clip). Testing on other platforms
isn't done yet.

Change-Id: Id18ea7c1e84965dabea65d4c01ca5bc056ddeac9
2010-09-17 09:56:05 -04:00
John Koleszar
9100073e8d cleanup: remove unused xprintf
These files aren't currently used, and we can get them back if we
need them.

Change-Id: I62aa3bff828e491a80c80eeb84a7c44903df29b5
2010-09-16 13:14:12 -04:00
John Koleszar
147b125b15 Reduce size of tokenizer tables
This patch reduces the size of the global tables maintained by the
tokenizer to 16k from 80k-96k. See issue #177.

Change-Id: If0275d5f28389af11ac83c5d929d1157cde90fbe
2010-09-16 10:00:04 -04:00
Fritz Koenig
769f2424cc Removed unnecessary pxor.
There is no need to make sure that the lower byte of the
register is 0 because the downshift by 11 overwrites that byte.

Change-Id: I89cbf004b2ff532a2c68e0dc399c45a49cdad5a1
2010-09-13 18:34:34 -07:00
Fritz Koenig
71a1c19754 Merge "Make block access to frame buffer sequential" 2010-09-13 11:04:22 -07:00
Fritz Koenig
a65cd3def0 Make block access to frame buffer sequential
Sequentially accessing memory from a low address to a high
address should make it easier for the processor to predict
the cache.

Change-Id: I1921ce996bdd547144fe864fea6435f527f5842d
2010-09-10 16:27:28 -07:00
Scott LaVarnway
a32ded1d5f Merge "Improved subset block search" 2010-09-09 11:51:29 -07:00
Scott LaVarnway
c5fb0eb8d9 Improved subset block search
Improved the subset block search and fill.  (about 3% improvement for
32 bit)  Modified/merged the code in order to create
vp8_read_mb_modes_mv which can decode the modes/mvs on a macroblock
level. This will allow the decode loop (in the future) to decode
modes/mvs on a frame, row, or mb level.

Change-Id: If637d994b508792f846d39b5d44a7bf9aa5cddf3
2010-09-09 14:42:48 -04:00
Johann
14ba764219 Update NEON wide idcts
Expand 93c32a55 which used SSE2 instructions to do two
idct/dequant/recons at a time to NEON. Initial working
commit. More work needs to be put into rearranging and
interlacing the data to take advantage of quadword
operations, which is when we'll hopefully see a much
better boost

Change-Id: I86d59d96f15e0d0f9710253e2c098ac2ff2865d1
2010-09-09 14:08:12 -04:00
John Koleszar
edcbb1c199 Fix GF interval for non-lagged ARFs
When ARFs are enabled in non-lagged compress modes, the GF interval
was being reset to zero. Non-lagged ARF updates were enabled in commit
63ccfbd, but this incorrect GF interval caused a quality regression.

Change-Id: I615c3b493f4ce2127044f4e68d0bcb07d6b730c3
2010-09-09 13:18:54 -04:00
Fritz Koenig
6d90f867e4 Merge branch 'master' of git://review.webmproject.org/libvpx 2010-09-09 08:54:21 -07:00
John Koleszar
c2140b8af1 Use WebM in copyright notice for consistency
Changes 'The VP8 project' to 'The WebM project', for consistency
with other webmproject.org repositories.

Fixes issue #97.

Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
2010-09-09 10:01:21 -04:00
Jim Bankoski
69ae8f475d Skip unnecessary search of identical frames
vp8_get_compressed_data() was defeating logic in
encode_frame_to_datarate() that determined the reference buffers to
search and forcing all frames to be eligible to search. In cases
where buffers have identical contents, this is unnecessary extra
work.

Change-Id: I9e667ac39128ae32dc455a3db4c62e3efce6f114
2010-09-08 11:31:34 -04:00
Jim Bankoski
63ccfbd545 Enable ARFs for non-lagged compress
ARFs were explicitly disabled except in lagged compress mode. New
ARF logic allows for the ARF buffer to hold an older golden frame,
which does not require lagged compress.

Change-Id: I1dff82b6f53e8311f1e0514b1794ae05919d5f79
2010-09-08 11:26:13 -04:00
Fritz Koenig
3fb37162a8 Bilinear subpixel optimizations for ssse3.
Used pmaddubsw for multiply and add of two filter taps
at once for 16x16 and 8x8 blocks.

Change-Id: Idccf2d6e094561624407b109fa7e80ba799355ea
2010-09-07 17:19:40 -07:00
Scott LaVarnway
0de458f6b9 Reduced the size of MB_MODE_INFO
Moved partition_bmi and partition_count out of MB_MODE_INFO and
placed into MACROBLOCK.  Also reduced the size of other members
of the MB_MODE_INFO struct.  For 1080p, the memory was reduced
by 1,209,516 bytes.  The decoder performance appeared to improve
by 3% for the clip used.
Note:  The main goal for this change is to improve the decoder
performance.  The encoder will be revisited at a later date for
further structure cleanup.

Change-Id: I4733621292ee9cc3fffa4046cb3fd4d99bd14613
2010-09-03 16:43:23 -04:00
John Koleszar
4496db45e3 Whitespace: nuke CRLFs
Change-Id: I8b9fdf9875a8fcff4cb49a3357ce44f18108c2e7
2010-09-02 13:33:01 -04:00
James Zern
76640f85da encoder: remove postproc dependency
Remove the dependency on postproc.c for the encoder in general, the only
unchecked need for it is when CONFIG_PSNR is enabled. All other cases
are already wrapped in CONFIG_POSTPROC. In the CONFIG_PSNR case the file
will still be included.

Additionally, when VP8_SET_POSTPROC is used with the encoder when post
processing has been disabled an error will be returned.

This addresses issue #153.

Change-Id: Ia6dfe20167f7077734a6058cbd1d794550346089
2010-09-02 11:52:37 -04:00
John Koleszar
7a3e0a1d93 Merge "added separate rounding/zbin constants for 2nd order" 2010-09-02 08:42:29 -07:00
John Koleszar
9398be0f46 Merge "Disable frame dropping by default" 2010-09-02 08:41:46 -07:00
Yaowu Xu
fca129203a added separate rounding/zbin constants for 2nd order
This allows experiments of using different rounding and
zerobin constants for 2nd order blocks.

Change-Id: Idd829adba3edd1f713c66151a8d29bb245e33a71
2010-09-02 10:27:03 -04:00
John Koleszar
23216211bc Disable frame dropping by default
This is not the behavior that most users expect.

Change-Id: I226126ea400c22cf1f7918e80ea7fe0771c569cb
2010-09-02 09:32:03 -04:00
Frank Galligan
d45e55015e Fix rare deadlock before loop filter
There was an extremely rare deadlock that happened when one thread
was waiting to start the loop filter on frame n while the other
threads were starting to work on frame n+1.

Change-Id: Icc94f728b3b6663405435640d9a2996735ba19ef
2010-09-01 22:01:21 -04:00
Paul Wilkins
18c902f8a4 Merge "Improved Force Key Frame Behaviour" 2010-09-01 02:45:12 -07:00
Yunqing Wang
0e78efad0b Replace sleep(0) calls in multi-threaded decoder
This is a workaround for gLucid problem.

Change-Id: I188a016a07e4c2ea212444c5a6284ff3c48a5caa
2010-08-31 20:37:11 -04:00
Paul Wilkins
c239a1b67c Improved Force Key Frame Behaviour
These changes improve the behaviour of the code with
forced key frames sent in by a calling application.

The sizing of the frames is still suboptimal for two pass in
particular but the behaviour is much better than it was.

Change-Id: I35fae610c67688ccc69d11f385e87dfc884e65a1
2010-08-31 14:32:40 -04:00
Johann
0b94f5d6e8 followup arm patch
make the arm asm detokenizer work with the new structures

Change-Id: I7cd92c2a018ec24032bb1cfd1bb9739bc84b444a
2010-08-31 11:41:10 -04:00
Scott LaVarnway
e85e631504 Changed above and left context data layout
The main reason for the change was to reduce cycles in the token
decoder. (~1.5% gain for 32 bit)  This layout should be more
cache friendly.

As a result of this change, the encoder had to be updated.

Change-Id: Id5e804169d8889da0378b3a519ac04dabd28c837
Note: dixie uses a similar layout
2010-08-31 11:24:30 -04:00
John Koleszar
aaad6d1b54 Merge "Fix harmless off-by-1 error." 2010-08-30 12:40:42 -07:00
John Koleszar
674e477b81 Merge "increase rate control buffer level precision" 2010-08-30 07:49:35 -07:00
Timothy B. Terriberry
7a8e0a2935 Fix harmless off-by-1 error.
The memory being zeroed in vp8_update_mode_info_border() was just
 allocated with calloc, and so the entire function is actually
 redundant, but it should be made correct in case someone expects
 it to actually work in the future.

Change-Id: If7a84e489157ab34ab77ec6e2fe034fb71cf8c79
2010-08-27 16:07:54 -07:00
Johann
5c244398e1 clean up compiler warnings
did a test compile with clang and got rid of some warnings that have
been annoying me for a while:
vp8/decoder/detokenize.c: In function 'vp8_init_detokenizer':
vp8/decoder/detokenize.c:121: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:122: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:123: warning: assignment from incompatible pointer type
vp8/decoder/detokenize.c:124: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:125: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:128: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:129: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:130: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:131: warning: assignment discards qualifiers from pointer target type

Change-Id: I78ddab176fe47cbeed30379709dc7bab01c0c2e4
2010-08-24 18:23:16 -04:00
Johann
d73217ab17 update structures
mbmi and eob moved in previous commits

Change-Id: I30a2eba36addf89ee50b406ad4afdd059a832711
2010-08-23 13:44:56 -04:00
Fritz Koenig
93c32a55c2 Rework idct calling structure.
Moving the eob structure allows for a non-struct based
function to handle decoding an entire mb of
idct/dequant/recon data.  This allows for SIMD functions
to idct/dequant/recon multiple blocks at once.

SSE2 implementation gives 3% gain on Atom.

Change-Id: I8a8f3efd546ea4e0535f517d94f347cfb737c9c2
2010-08-23 08:58:54 -07:00
John Koleszar
8e7ebacb19 increase rate control buffer level precision
The external API exposes the RC initial/optimal/full buffer level in
milliseconds, but this value was truncated internally to seconds. This
patch allows the use of the full precision during the conversion from
time to bits.

Change-Id: If8dd2a87614c05747f81432cbe75dd9e6ed2f04e
2010-08-20 11:04:48 -04:00
Jim Bankoski
b0660457fe Revert "Removed ssse3 sixtap code"
This reverts commit 6ea5bb85cd.
2010-08-19 15:58:27 -04:00
Johann
52852da7c9 cleanup simple loop filter
move some things around, reorder some instructions

constant 0 is used several times. load it once per call in horiz,
once per loop in vert.

separate saturating instructions to avoid stalls.

just use one usub8 call to set GE flags, rather than uqsub8 followed by
usub8 w/ 0

document some stalls for further consideration

Change-Id: Ic3877e0ddbe314bb8a17fd5db73501a7d64570ec
2010-08-19 13:37:40 -04:00
Johann
a522be2941 Merge "fix armv6 simpleloop filter" 2010-08-19 08:31:57 -07:00
Johann
467a0b99ab fix armv6 simpleloop filter
test cases were causing a crash because the count was being read
incorrectly. after fixing that, noticed that the output was not
matching. fixed that.

Change-Id: Idb0edb887736bd566a3cf6d4aa1a03ea8d20eb27
2010-08-19 11:29:21 -04:00
Scott LaVarnway
6ea5bb85cd Removed ssse3 sixtap code
Change-Id: I0f20fbb898ee31eb94a143471aa6f1ca17a229a4
2010-08-18 15:34:09 -04:00
John Koleszar
496cf8cc48 Merge "store more vars than we removed" 2010-08-16 07:54:48 -07:00
Johann
c75f3993c0 store more vars than we removed
only saved r4-11+lr, but were storing r4-r12+lr

Change-Id: If77df1998af50e9badee7d99ef53543046434675
2010-08-16 10:32:15 -04:00
John Koleszar
9aa498b82a arm: fix missing dependency with --enable-shared
The C version of the dequant/idct/add function depends on the C
version of the IDCT, but this isn't compiled in on ARM. Since this
code has asm version, we can just remove this file to eliminate the
link error.

Change-Id: I21de74d89d3765a1db2da27292b20727c53178e9
2010-08-16 09:34:34 -04:00
John Koleszar
80d3923a78 move segmentation_common to encoder
vp8_update_gf_useage_maps() is only used by the encoder. This patch
fixes the ability to build in decode-only or encode-only
configurations.

Change-Id: I3a5211428e539886ba998e09e8abd747ac55c9aa
2010-08-13 14:54:24 -04:00
Johann
9602799cd9 framework for assembly version of the detokenizer
adds a compile time option: --enable-arm-asm-detok which pulls in
vp8/decoder/arm/detokenize.asm

currently about break even speed wise, but changes are pending to
the fill code (branch and load 3 bytes versus conditionally always
load one) and the error handling. Currently it doesn't handle zero
runs or overrunning the buffer.

this is really just so i don't have to rebase my changes all the
time to run benchmarks - now just need to replace one file!

Change-Id: I56d0e2354dc0ca3811bffd0e88fe1f952fa6c797
2010-08-12 16:39:56 -04:00
Johann
633646b73b update structure
mode_info_context->mbmi no longer gets copied up a level

Change-Id: Icd2d27d381909721326c34594a1ccdc26d48a995
2010-08-12 16:37:55 -04:00
Johann
1ec7981c34 remove unused definition
asm_offsets contains some definitions which are no longer used. this
was one of them. v6 build works now

Change-Id: If370cfa8acd145de4fead2d9a11b048fccc090df
2010-08-12 16:37:55 -04:00
Scott LaVarnway
9c7a0090e0 Removed unnecessary MB_MODE_INFO copies
These copies occurred for each macroblock in the encoder and decoder.
Thetemp MB_MODE_INFO mbmi was removed from MACROBLOCKD.  As a result,
a large number compile errors had to be fixed.

Change-Id: I4cf0ffae3ce244f6db04a4c217d52dd256382cf3
2010-08-12 16:25:43 -04:00
Scott LaVarnway
f5615b6149 Merge "Finished vp8_sixtap_predict4x4_ssse3 function" 2010-08-11 12:23:24 -07:00
John Koleszar
d22e2968a8 cosmetics: add missing 2D array braces
Silences compile warning.

Change-Id: I4b207d97f8570fe29aa2710e4ce4f02e7e43b57a
2010-08-11 13:55:38 -04:00
John Koleszar
392a958274 avoid negative array subscript warnings
The mv_ref and sub_mv_ref token encodings are indexed from NEARESTMV
and LEFT4X4, respectively, rather than being zero-based like the
other token encodings.

Change-Id: I3699c3f84111209ecfb91097c4b900773e9a3ad5
2010-08-11 13:49:12 -04:00
Scott LaVarnway
b07e5b6fa1 Finished vp8_sixtap_predict4x4_ssse3 function
Added vp8_filter_block1d4_h6_ssse3 and vp8_filter_block1d4_v6_ssse3
assembly routines.  Also removed unused assembly.

Change-Id: I01c1021835f2edda9da706822345f217087ca0d0
2010-08-11 13:49:00 -04:00
Johann
c0ba42d3c0 rename DETOK_[AL]
everything else uses lowercase detok

Change-Id: I9671e2e90eb2961208dfa81c00b3accb5749ec04
2010-08-11 13:36:35 -04:00
Scott LaVarnway
99f46d62d9 Moved gf_active code to encoder only
The gf_active code is only used by the encoder, so it was moved from
common and decoder.

Change-Id: Iada15acd5b2b33ff70c34668ca87d4cfd0d05025
2010-08-11 11:54:25 -04:00
Yaowu Xu
c404fa42ac Removed duplicate functions
Change-Id: Ie587972ccefd3c762b8cdf8ef39345cd22924b9b
2010-08-10 21:45:34 -07:00
Yaowu Xu
3b95a46c55 Normalize quantizer's zero bin and rounding factors
This patch changes a few numbers in the two constant arrays
for quantizer's zerobin and rounding factors, in general to
make the sum of the two factors for any Q to be 128.  While
it might be beneficial to calibrate the two arrays for best
quantizer performance, it is not the purpose of this patch.
Normalizing the two arrays will enable quick optimization
of the current faster quantizer, i.e .zerobin check can be
removed.

Change-Id: If9abfd7929bf4b8e9ecd64a79d817c6728c820bd
2010-08-10 21:12:04 -07:00
Timothy B. Terriberry
8fa38096a3 Add trellis quantization.
Replace the exponential search for optimal rounding during
 quantization with a linear Viterbi trellis and enable it
 by default when using --best.
Right now this operates on top of the output of the adaptive
 zero-bin quantizer in vp8_regular_quantize_b() and gives a small
 gain.
It can be tested as a replacement for that quantizer by
 enabling the call to vp8_strict_quantize_b(), which uses
 normal rounding and no zero bin offset.
Ultimately, the quantizer will have to become a function of lambda
 in order to take advantage of activity masking, since there is
 limited ability to change the quantization factor itself.
However, currently vp8_strict_quantize_b() plus the trellis
 quantizer (which is lambda-dependent) loses to
 vp8_regular_quantize_b() alone (which is not) on my test clip.

Patch Set 3:

Fix an issue related to the cost evaluation of successor
states when a coefficient is reduced to zero. With this
issue fixed, now the trellis search almost exactly matches
the exponential search.

Patch Set 2:

Overall, the goal of this patch set is to make "trellis"
search to produce encodings that match the exponential
search version. There are three main differences between
Patch Set 2 and 1:
a. Patch set 1 did not properly account for the scale of
2nd order error, so patch set 2 disable it all together
for 2nd blocks.
b. Patch set 1 was not consistent on when to enable the
the quantization optimization. Patch set 2 restore the
condition to be consistent.
c. Patch set 1 checks quantized level L-1, and L for any
input coefficient was quantized to L. Patch set 2 limits
the candidate coefficient to those that were rounded up
to L. It is worth noting here that a strategy to check
L and L+1 for coefficients that were truncated down to L
might work.

(a and b get trellis quant to basically match the exponential
search on all mid/low rate encodings on cif set, without
a, b, trellis quant can hurt the psnr by 0.2 to .3db at
200kbps for some cif clips)
(c gets trellis quant  to match the exponential search
to match at Q0 encoding, without c, trellis quant can be
1.5 to 2db lower for encodings with fixed Q at 0 on most
derf cif clips)

Change-Id:	Ib1a043b665d75fbf00cb0257b7c18e90eebab95e
2010-08-10 20:58:24 -07:00
Scott LaVarnway
e4fe866949 Added ssse3 version of sixtap filters
Improved decoder performance by 9% for the clip used.

Change-Id: I8fc5609213b7bef10248372595dc85b29f9895b9
2010-08-10 17:33:49 -04:00
Yunqing Wang
ba2e107d28 First modification of multi-thread decoder
This is the first modification of VP8 multi-thread decoder, which uses
same threads to decode macroblocks and then do loopfiltering for each
frame.

Inspired by Rob Clark, synchronization was done on every 8 macroblocks
instead of every macroblock to reduce lock contention.

Comparing with the original code, this implementation gave about 15%-
20% performance gain while decoding my test clips on a Core2 Quad
platform (Linux).

The work is not done yet.

Test on other platforms are needed.

Change-Id: Ice9ddb0b511af1359b9f71e65066143c04fef3b5
2010-08-10 14:09:57 -04:00
John Koleszar
618c7d27a0 Mark loopfilter C functions as static
Clang defaults to C99 mode, and inline works differently in C99.
(gcc, on the other hand, defaults to a special gnu-style inlining,
which uses different syntax.)   Making the functions static makes sure
clang doesn't decide to discard a function because it's too large to
inline.

Thanks to eli.friedman for the patch.

Fixes http://code.google.com/p/webm/issues/detail?id=114

Change-Id: If3c1c3c176eb855a584a60007237283b0cc631a4
2010-08-09 09:36:44 -04:00
John Koleszar
cfb204eaf7 Merge "Issue 150: Fixing linker warning in extend.c." 2010-08-02 09:35:05 -07:00
Jan Kratochvil
0e8f108fb0 nasm: avoid space before the :data symbol type.
global label:data
           ^^

Provide nasm compatibility.  No binary change by this patch with yasm
on {x86_64,i686}-fedora13-linux-gnu.  Few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.

Change-Id:	I10f17eb1e4d4a718d4ebd1d0ccddc807c365e021
2010-08-02 09:20:42 -04:00
Jan Kratochvil
0327d3df90 nasm: end labels with colon (':')
Labels should end by colon (':'), nasm requires it.

Provide nasm compatibility.  No binary change by this patch with yasm
on {x86_64,i686}-fedora13-linux-gnu.  Few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.

Change-Id: I0b2ec6f01afb061d92841887affb5ca0084f936f
2010-08-02 09:20:03 -04:00
Jan Kratochvil
c8134bc54a nasm: use OWORD vs DQWORD
nasm knows only OWORD.  yasm knows both OWORD and DQWORD.

Provide nasm compatibility.  No binary change by this patch with yasm on
{x86_64,i686}-fedora13-linux-gnu.  Few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.

Change-Id: I62151390089e90df9a7667822fa594ac20b00e78
2010-08-02 09:17:14 -04:00
John Koleszar
675298216d Merge "Replace pinsrw (SSE) with MMX instructions" 2010-08-02 06:16:26 -07:00
Philip Jägenstedt
7d243701d9 Replace pinsrw (SSE) with MMX instructions
Fixes http://code.google.com/p/webm/issues/detail?id=136

Change-Id:	I5a3e294061644a1a9718e8ba4a39548ede25cc42
2010-08-02 09:15:45 -04:00
John Koleszar
38a20e030f apple: include proper mach primatives
Fixes implicit declaration warning for 'mach_task_self'.

Patch courtesy of timeless at gmail.com

Change-Id: I9991dedd1ccfddc092eca86705ecbc3b764b799d
2010-07-29 17:04:44 -04:00
Yaowu Xu
c2a8d8b54c Merge "Enable the switch between two versions of quantizer" 2010-07-29 07:17:40 -07:00
Frank Galligan
062e6c1886 Removed two unused global variables.
Removed the global variables vp8_an and vp8_cd. vp8_an was causing problems
because it was increasing the .bss by 1572864 bytes.

Change-Id: I6c12e294133c7fb6e770c0e4536d8287a5720a87
2010-07-28 17:25:09 -04:00
Yaowu Xu
f95c80b60f Enable the switch between two versions of quantizer
To facilitate more testing related to quantizer and rate
control, the old version quantizer is added back. old and
new quantizer can be switched back and forth by define or
un-define the macro "EXACT_QUANT".

Change-Id: Ia77e687622421550f10e9d65a9884128a79a65ff
2010-07-28 10:51:34 -07:00
John Koleszar
aa82363c46 Merge "msvs: fix install of codec sources" 2010-07-27 11:21:42 -07:00
Johann
a570bbd418 x86/sse2: disable asm quantizer
follow up to Change I0e51492d: neon: disable asm quantizer

Now x86 doesn't segfault with --disable-runtime-cpu-detect and -p=2

Change-Id: I8ca127bb299198efebbcbd5a661e81788361933f
2010-07-27 12:54:43 -04:00
Johann
b9a038a5ed Fix build w/o RTCD
So many places to update ...

Change-Id: Ide957b40cc833f99c2d1849acade6850fbf7585d
2010-07-27 11:56:19 -04:00
John Koleszar
d8009c077a neon: disable asm quantizer
The assembly version of the quantizer has not been updated to match
the new exact quantizer introduced in commit e04e2935. That commit tried
to disable this code but missed the non-RTCD case.

Thanks to David Baker <david.baker at openmarket.com> for isolating the
issue and testing this fix.

Change-Id: I0e51492dc6f8e44d2c10b587427448bf94135c65
2010-07-27 11:16:19 -04:00
Fritz Koenig
1743f9486b Merge "update arm idct functions" 2010-07-26 06:05:39 -07:00
Fritz Koenig
3de8a95831 Merge changes I896fe6f9,I90d8b167
* changes:
  Change the x86 idct functions to do reconstruction at the same time
  Combine idct and reconstruction steps
2010-07-26 06:05:30 -07:00
Johann
56f5a9a060 update arm idct functions
Jeff Muizelaar posted some changes to the idct/reconstruction c code.
This is the equivalent update for the arm assembly.

This shows a good boost on v6, and a minor boost on neon.
Here are some numbers for highway in qcif, 2641 frames:
HEAD neon: ~161 fps
new neon:  ~162 fps
HEAD v6:   ~102 fps
new v6:    ~106 fps

The following functions have been updated for armv6 and neon:
vp8_dc_only_idct_add
vp8_dequant_idct_add
vp8_dequant_dc_idct_add

Conflicts:

	vp8/decoder/arm/armv6/dequantdcidct_v6.asm
	vp8/decoder/arm/armv6/dequantidct_v6.asm

Resolved by removing these files. When I rewrote the functions, I also
moved the files to dequant_dc_idct_v6.asm/dequant_idct_v6.asm

Change-Id: Ie3300df824d52474eca1a5134cf22d8b7809a5d4
2010-07-26 08:55:19 -04:00
Justin Lebar
1d8277f8e8 Issue 150: Fixing linker warning in extend.c. 2010-07-23 16:42:25 -07:00
Fredrik Söderquist
2add72d9bc Don't dereference ctx->priv if it hasn't been setup correctly. 2010-07-23 19:13:50 -04:00
Fredrik Söderquist
eafcf918a0 Only touch ctx->priv if vp8_mmap_alloc succeeded. 2010-07-23 19:13:34 -04:00
Jeff Muizelaar
98fcccfe97 Change the x86 idct functions to do reconstruction at the same time
Change-Id: I896fe6f9664e6849c7cee2cc6bb4e045eb42540f
2010-07-23 15:21:36 -04:00
Jeff Muizelaar
b2fa74ac18 Combine idct and reconstruction steps
This moves the prediction step before the idct and combines the idct and
reconstruction steps into a single step. Combining them seems to give an
overall decoder performance improvement of about 1%.

Change-Id: I90d8b167ec70d79c7ba2ee484106a78b3d16e318
2010-07-23 15:21:36 -04:00
Fritz Koenig
0ce3901282 Swap alt/gold/new/last frame buffer ptrs instead of copying.
At the end of the decode, frame buffers were being copied.
The frames are not updated after the copy, they are just
for reference on later frames.  This change allows multiple
references to the same frame buffer instead of copying it.

Changes needed to be made to the encoder to handle this.  The
encoder is still doing frame buffer copies in similar places
where pointer reference could be done.

Change-Id: I7c38be4d23979cc49b5f17241ca3a78703803e66
2010-07-23 14:53:59 -04:00
Paul Wilkins
68cf24310b Merge commit 'refs/changes/51/351/1' of ssh://review.webmproject.org:29418/libvpx into KfRateBugMerged 2010-07-23 17:45:26 +01:00
Yaowu Xu
f5cf8553a2 Merge "Make the quantizer exact." 2010-07-23 09:26:26 -07:00
Paul Wilkins
9404c7db6d Rate control bug with long key frame interval.
In two pass encodes, the calculation of the number of bits
allocated to a KF group had the potential to overflow for high data
rates if the interval is very long.

We observed the problem in one test clip where there was one
section where there was an 8000 frame gap between key frames.

Change-Id: Ic48eb86271775d7573b4afd166b567b64f25b787
2010-07-23 17:01:12 +01:00
Timothy B. Terriberry
e04e293522 Make the quantizer exact.
This replaces the approximate division-by-multiplication in the
 quantizer with an exact one that costs just one add and one
 shift extra.
The asm versions have not been updated in this patch, and thus
 have been disabled, since the new method requires different
 multipliers which are not compatible with the old method.

Change-Id: I53ac887af0f969d906e464c88b1f4be69c6b1206
2010-07-23 08:48:01 -07:00
Paul Wilkins
d576690ba1 80 character line length on Arnr LUT
Tweaked table to fit to 80 characters.

Change-Id: Ie6ba80e0b31b33e23d2bf78599abe223369fcefb
2010-07-23 16:47:54 +01:00
Fritz Koenig
08eed049d4 Remove CONFIG_NEW_TOKENS files.
These files were out of date and no longer maintained.
Token decoding has implemented the no-crash code which
is incompatible with this arm assembly code.

Change-Id: Ibf729886c56fca48181af60b44bda896c30023fc
2010-07-22 19:00:21 -04:00
John Koleszar
4d86ef3534 msvs: fix install of codec sources
The libs.mk file must be installed for the vpx.vcproj file to be
generated. It was being installed, but not in the src/ directory as
expected.

Also missed include files yasm.rules, quantize_x86.h

Change-Id: Ic1a6f836e953bfc954d6e42a18c102a0114821eb
2010-07-22 18:33:25 -04:00
Johann
160d671e34 Merge "limit range checking code for L[k] to CONFIG_DEBUG. patch by timeless@gmail.com" 2010-07-21 12:59:39 -07:00
Yaowu Xu
7a89d4c3d4 Merge "Improve the accuracy of forward walsh-hadamard transform" 2010-07-19 07:50:26 -07:00
Paul Wilkins
0ba32632cd ARNR Lookup Table.
Change submitted for Adrian Grange. Convert threshold
calculation in ARNR filter to a lookup table.

Change-Id: I12a4bbb96b9ce6231ce2a6ecc2d295610d49e7ec
2010-07-19 14:46:42 +01:00
Paul Wilkins
02277b8aa3 Parameter limit change.
Change maximum ARNR filter width to 15.

Change-Id: I3b72450ea08e96287445ec18810630ee2292954c
2010-07-19 14:39:43 +01:00