Commit Graph

550 Commits

Author SHA1 Message Date
Aron Rosenberg
eeb8117303 Fix semaphore emulation on Windows
The existing emulation of posix semaphores on Windows uses SetEvent()
and WaitForSingleObject(), which implements a binary semaphore, not a
counting semaphore as implemented by posix. This causes deadlock when
used with the expected posix semantics. Instead, this patch uses the
CreateSemaphore() and ReleaseSemaphore() calls (introduced in Windows
2000) which have the expected behavior.

This patch also reverts commit eb16f00, which split a semaphore that
was being used with counting semantics into two binary semaphores.
That commit is unnecessary with corrected emulation.

Change-Id: If400771536a27af4b0c3a31aa4c4e9ced89ce6a0
2011-05-06 00:13:59 -04:00
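The difference between the two emulations can be illustrated with a small single-threaded model. This is only a sketch: the types and functions below (event_model, counting_model, event_wakeups, counting_wakeups) are hypothetical and are neither libvpx code nor Windows API calls. They only model how many posts are remembered.

```c
#include <stdbool.h>

/* Hypothetical model of an auto-reset event: it remembers only
 * whether it was signaled, not how many times. */
typedef struct { bool signaled; } event_model;

static void event_post(event_model *e) { e->signaled = true; }

/* Returns true if a wait would succeed without blocking. */
static bool event_try_wait(event_model *e) {
    if (!e->signaled) return false;
    e->signaled = false;
    return true;
}

/* Hypothetical model of a counting semaphore: every post is kept. */
typedef struct { unsigned count; } counting_model;

static void counting_post(counting_model *s) { s->count++; }

static bool counting_try_wait(counting_model *s) {
    if (s->count == 0) return false;
    s->count--;
    return true;
}

/* Post n times, then count how many waits succeed without blocking. */
unsigned event_wakeups(unsigned n) {
    event_model e = { false };
    unsigned woken = 0;
    for (unsigned i = 0; i < n; i++) event_post(&e);
    while (event_try_wait(&e)) woken++;
    return woken;
}

unsigned counting_wakeups(unsigned n) {
    counting_model s = { 0 };
    unsigned woken = 0;
    for (unsigned i = 0; i < n; i++) counting_post(&s);
    while (counting_try_wait(&s)) woken++;
    return woken;
}
```

In this model, three posts to the event wake only one waiter, while three posts to the counting semaphore wake three. A caller that expects posix semantics would leave two waiters blocked in the first case, which is the kind of deadlock the commit message describes.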
Yunqing Wang
eb16f00cf2 Fix rare hang in multi-thread encoder on Windows
This patch is to fix a rare hang in multi-thread encoder that was
only seen on Windows. Thanks for John's help in debugging the
problem. More testing is needed.

Change-Id: Idb11c6d344c2082362a032b34c5a602a1eea62fc
2011-05-05 10:42:29 -04:00
Yunqing Wang
aeb86d615c Merge "Runtime detection of available processor cores." 2011-05-05 04:59:54 -07:00
Yunqing Wang
3fbade23a2 Merge "Modify HEX search" 2011-05-03 11:59:32 -07:00
Yunqing Wang
04ec930abc Modify HEX search
Changed 8-neighbor searching to 4-neighbor searching, and continued
searching until the center point is the best match.

Tests on the test set showed a 1.3% encoding speed improvement as well as
a 0.1% PSNR and SSIM improvement at speed=-5 (rt mode).

Will continue to improve it.

Change-Id: If4993b1907dd742b906fd3f86fee77cc5932ee9a
2011-05-03 14:26:33 -04:00
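The refinement described in this commit, checking four neighbors and repeating until the center is the best match, can be sketched as follows. This is a minimal illustration under stated assumptions: the point type, the cost_fn callback, example_cost and refine_4_neighbor are hypothetical names, the cost is a toy function instead of a block SAD, and search-range clamping is left out.

```c
typedef struct { int row, col; } point;   /* hypothetical position type */
typedef unsigned (*cost_fn)(point p);      /* hypothetical match cost   */

/* Toy cost for illustration: squared distance to (3, -2). */
unsigned example_cost(point p) {
    int dr = p.row - 3;
    int dc = p.col + 2;
    return (unsigned)(dr * dr + dc * dc);
}

/* Move to the best of the 4 neighbors until none beats the center,
 * or until max_iters steps have been taken. */
point refine_4_neighbor(point center, cost_fn cost, int max_iters) {
    static const point offsets[4] = { {-1, 0}, {1, 0}, {0, -1}, {0, 1} };
    unsigned best = cost(center);

    for (int iter = 0; iter < max_iters; iter++) {
        int best_idx = -1;
        for (int i = 0; i < 4; i++) {
            point cand = { center.row + offsets[i].row,
                           center.col + offsets[i].col };
            unsigned c = cost(cand);
            if (c < best) {
                best = c;
                best_idx = i;
            }
        }
        if (best_idx < 0) break;           /* center is the best match */
        center.row += offsets[best_idx].row;
        center.col += offsets[best_idx].col;
    }
    return center;
}
```

Each step evaluates four candidates instead of eight, which is where a speed gain would come from; the loop compensates by continuing until no neighbor improves on the center.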
Yaowu Xu
e9465daee3 Merge "change to use fast ssim code for internal ssim calculations" 2011-05-03 11:20:52 -07:00
Yaowu Xu
6c565fada0 change to use fast ssim code for internal ssim calculations
The commit also removed the slow ssim calculation that uses a 7x7
kernel, and revised the comments to better describe how sample ssim
values are computed and averaged

Change-Id: I1d874073cddca00f3c997f4b9a9a3db0aa212276
2011-05-03 08:36:17 -07:00
Yaowu Xu
57ad189129 changed configure option name to reduce confusion
Renamed configure option "enable-psnr" to "enable-internal-stats" to
better reflect the purpose of the option and eliminate the confusion
reported in http://code.google.com/p/webm/issues/detail?id=35

Change-Id: If72df6fdb9f1e33dab1329240ba4d8911d2f1f7a
2011-04-29 09:39:05 -07:00
Yunqing Wang
dfa9e2c5ea Merge "Use insertion sort instead of quick sort" 2011-04-29 08:27:58 -07:00
Scott LaVarnway
ccd6f7ed77 Consolidated build inter predictors
Code cleanup.

Change-Id: Ic8b0167851116c64ddf08e8a3d302fb09ab61146
2011-04-28 10:53:59 -04:00
Scott LaVarnway
2e102855f4 Removed unused code in reconinter
The skip flag is never set by the encoder for SPLITMV.

Change-Id: I5ae6457edb3a1193cb5b05a6d61772c13b1dc506
2011-04-27 15:25:32 -04:00
John Koleszar
085fb4b737 Merge "SSE2/SSSE3 optimizations for build_predictors_mbuv{,_s}()." 2011-04-27 12:02:55 -07:00
Ronald S. Bultje
1083fe4999 SSE2/SSSE3 optimizations for build_predictors_mbuv{,_s}().
decoding

before
10.425
10.432
10.423
=10.426

after:
10.405
10.416
10.398
=10.406, 0.2% faster

encoding

before
14.252
14.331
14.250
14.223
14.241
14.220
14.221
=14.248

after
14.095
14.090
14.085
14.095
14.064
14.081
14.089
=14.086, 1.1% faster

Change-Id: I483d3d8f0deda8ad434cea76e16028380722aee2
2011-04-27 11:31:27 -07:00
Yunqing Wang
5abafcc381 Use insertion sort instead of quick sort
Insertion sort performs better for sorting small arrays. In real-time
encoding (speed=-5), tests on the test set showed a 1.7% performance
gain with 0% PSNR change on average.

Change-Id: Ie02eaa6fed662866a937299194c590d41b25bc3d
2011-04-27 13:53:28 -04:00
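For reference, a plain insertion sort looks like the sketch below. The commit message does not say which array is sorted or what its element type is, so the int element type and the function name insertion_sort are assumptions made for illustration.

```c
#include <stddef.h>

/* Sort a[0..n-1] in ascending order. For very small n the simple
 * inner loop has little overhead compared to a general quick sort. */
void insertion_sort(int *a, size_t n) {
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        /* Shift larger elements one slot to the right. */
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
}
```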
John Koleszar
db5057c742 Refactor calc_iframe_target_size
Combine calc_iframe_target_size, previously only used for forced
keyframes, with calc_auto_iframe_target_size, which handled most
keyframes.

Change-Id: I227051361cf46727caa5cd2b155752d2c9789364
2011-04-26 16:55:35 -04:00
John Koleszar
81d2206ff8 Move pick_frame_size() to ratectrl.c
This is a first step in cleaning up the redundancies between
vp8_calc_{auto_,}iframe_target_size. The pick_frame_size() function is
moved to ratectrl.c, and made to be the primary interface. This means
that the various calc_*_target_size functions can be made private.

Change-Id: I66a9a62a5f9c23c818015e03f92f3757bf3bb5c8
2011-04-26 16:49:54 -04:00
Johann
d5c46bdfc0 Merge "remove simpler_lpf" 2011-04-25 14:51:07 -07:00
Johann
01527e743f remove simpler_lpf
the decision to run the regular or simple loopfilter is made outside the
function and managed with pointers

stop tracking the option in two places. use filter_type exclusively

Change-Id: I39d7b5d1352885efc632c0a94aaf56b72cc2fe15
2011-04-25 17:37:41 -04:00
John Koleszar
fd6da3b2e7 Fix duplicate vp8_compute_frame_size_bounds
Likely introduced by a bad automatic merge from gerrit.

Change-Id: I0c6dd6ec18809cf9492f524d283fa4a3a8f4088b
2011-04-25 14:30:57 -04:00
John Koleszar
1f32b1489c Merge "Remove unused functions" 2011-04-25 11:05:00 -07:00
John Koleszar
47bc1c7013 Remove unused functions
Remove estimate_min_frame_size() and calc_low_ss_err(), as they are
never referenced.

Change-Id: I3293363c14ef70b79c4678ca27aa65b345077726
2011-04-25 13:54:23 -04:00
John Koleszar
cfbfd39de8 Merge "Change rc undershoot/overshoot semantics" 2011-04-25 10:49:32 -07:00
John Koleszar
76557e34d2 Merge "Limit size of initial keyframe in one-pass." 2011-04-25 10:48:13 -07:00
John Koleszar
d9f898ab6d Merge "Add rc_max_intra_bitrate_pct control" 2011-04-25 10:47:57 -07:00
John Koleszar
454cbc96b7 Limit size of initial keyframe in one-pass.
Rather than using a default size of 1/2 or 3/2 seconds for the first
frame, use a fraction of the initial buffer level to give the
application some control.

This will likely undergo further refinement as size limits on key
frames are currently under discussion on codec-devel@, but this gives
much better behavior for small buffer sizes as a starting point.

Change-Id: Ieba55b86517b81e51e6f0a9fe27aabba295acab0
2011-04-25 13:47:20 -04:00
John Koleszar
aa926fbd27 Add rc_max_intra_bitrate_pct control
Adds a control to limit the maximum size of a keyframe, as a function of
the per-frame bitrate. See this thread[1] for more detailed discussion:

[1]: http://groups.google.com/a/webmproject.org/group/codec-devel/browse_thread/thread/271b944a5e47ca38

Change-Id: I7337707642eb8041d1e593efc2edfdf66db02a94
2011-04-25 13:47:14 -04:00
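The arithmetic behind a limit expressed "as a function of the per-frame bitrate" might look like the sketch below. It assumes the control is a percentage of the average per-frame bit budget, so that 100 means one average frame; that interpretation, and the helper names per_frame_bits and clamp_keyframe_target, are assumptions and are not taken from the patch.

```c
#include <stdint.h>

/* Average bit budget per frame; returns 0 for an invalid frame rate. */
uint64_t per_frame_bits(uint64_t target_bitrate_bps, unsigned fps) {
    if (fps == 0) return 0;
    return target_bitrate_bps / fps;
}

/* Clamp a keyframe target to max_intra_pct percent of the per-frame
 * budget (assumed meaning of the percentage). */
uint64_t clamp_keyframe_target(uint64_t target_bits,
                               uint64_t frame_budget_bits,
                               unsigned max_intra_pct) {
    uint64_t limit = frame_budget_bits * max_intra_pct / 100;
    return target_bits > limit ? limit : target_bits;
}
```

With a 600000 bit/s target at 30 frames per second, the per-frame budget is 20000 bits, so a 300 percent limit caps a keyframe at 60000 bits.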
John Koleszar
2089b2cee5 Merge "bug fix possible keyframe context divide by zero" 2011-04-25 09:35:12 -07:00
James Berry
8d5ce819dd bug fix possible keyframe context divide by zero
vp8_adjust_key_frame_context() divides by
estimate_keyframe_frequency() which can
return 0 in the case where --kf-max-dist=0.

Change-Id: Idfc59653478a0073187cd2aa420e98a321103daa
2011-04-25 12:16:36 -04:00
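The commit message does not say how the division is protected. One common guard is to clamp the divisor to at least 1, sketched here with a hypothetical helper (bits_per_interval_frame is not a libvpx function):

```c
/* Spread total_bits over an estimated keyframe interval. The estimate
 * may be 0, for example when the maximum keyframe distance is 0, so it
 * is clamped before dividing. */
int bits_per_interval_frame(int total_bits, int estimated_kf_frequency) {
    if (estimated_kf_frequency < 1)
        estimated_kf_frequency = 1;
    return total_bits / estimated_kf_frequency;
}
```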
Johann
aeca599087 Merge "keep values in registers during quantization" 2011-04-25 06:52:38 -07:00
Scott LaVarnway
5b67329747 Merge "Removed dc_diff from MB_MODE_INFO" 2011-04-25 06:45:32 -07:00
Ronald S. Bultje
496bcbb0de Fix overflow in temporal_filter_apply_sse2().
The accumulator array is an integer array, so use paddd instead of paddw
to add values to it. Fixes overflows when using large --arnr-maxframes
(>8) values.

Change-Id: Iad83794caa02400a65f3ab5760f2517e082d66ae
2011-04-22 10:00:38 -04:00
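A scalar model shows why the lane width matters: paddw adds 16-bit lanes and wraps on overflow, while paddd adds 32-bit lanes. The functions below are illustrative only; the values are arbitrary and are not taken from the temporal filter.

```c
#include <stddef.h>
#include <stdint.h>

/* Models accumulation in a 16-bit lane: the sum wraps modulo 65536. */
uint16_t accumulate16(const uint16_t *v, size_t n) {
    uint16_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = (uint16_t)(acc + v[i]);
    return acc;
}

/* Models accumulation in a 32-bit lane: no wrap for these inputs. */
uint32_t accumulate32(const uint16_t *v, size_t n) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}
```

Summing ten values of 8000 gives 80000 in the 32-bit accumulator, but 14464 (80000 modulo 65536) in the 16-bit one. The more frames are accumulated, the more likely the narrow lane is to wrap.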
John Koleszar
73c3d32705 Merge "Remove unused kf rate variables" 2011-04-21 16:54:14 -07:00
Adrian Grange
d2a6eb4b1e Corrected format specifiers in debug print statements
The arguments to these fprintfs are int not long int so
the format specifier should be "%d" and not "%ld". This
was writing garbage in the linux build.

Change-Id: I3d2aa8a448d52e6dc08858d825bf394929b47cf3
2011-04-21 15:45:57 -07:00
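The rule behind this fix is that the conversion specifier must match the promoted argument type. On platforms where long is wider than int, "%ld" with an int argument reads more bytes than were passed, and the behavior is undefined. The two helpers below are hypothetical and show the two consistent options:

```c
#include <stddef.h>
#include <stdio.h>

/* The argument is an int, so the conversion is %d. */
int format_int(char *buf, size_t size, int value) {
    return snprintf(buf, size, "%d", value);
}

/* If %ld is kept, the argument must really be a long: cast it. */
int format_as_long(char *buf, size_t size, int value) {
    return snprintf(buf, size, "%ld", (long)value);
}
```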
Johann
508ae1b3d5 keep values in registers during quantization
add an sse4 quantizer so we can use pinsrw/pextrw and keep values in xmm
registers instead of proxying through the stack. and as long as we're
bumping up, use some ssse3 instructions in the EOB detection (see ssse3
fast quantizer)
pick up about a percent on 32bit and about two on 64bit.

Change-Id: If15abba0e8b037a1d231c0edf33501545c9d9363
2011-04-21 15:47:55 -04:00
Scott LaVarnway
3698c1f620 Removed dc_diff from MB_MODE_INFO
The dc_diff flag is used to skip loopfiltering.  Instead
of setting this flag in the decoder/encoder, we now check
for this condition in the loopfilter.

Change-Id: Ie2b9cdf9e0f4e8b932bbd36e0878c05bffd28931
2011-04-21 14:38:36 -04:00
Scott LaVarnway
7a49accd0b Removed force_no_skip
force_no_skip is always set to zero.

Change-Id: I89b61c5e0bee34627a9c07c05f3517e1db76af77
2011-04-20 15:45:12 -04:00
Scott LaVarnway
09c933ea80 Removed redundant checks of the mode_info_context flags
Code cleanup.  The build inter predictor functions are
redundantly checking the mode_info_context for either
INTRA_FRAME or SPLITMV.

Change-Id: I4d58c3a5192a4c2cec5c24ab1caf608bf13aebfb
2011-04-20 14:06:40 -04:00
John Koleszar
ad6a8ca58b Remove unused kf rate variables
Remove tot_key_frame_bits and prior_key_frame_size[] as they were
tracked but never used. Remove intra_frame_target, as it was only
used to initialize prior_key_frame_size.

Refactor vp8_adjust_key_frame_context() some to remove unnecessary
calculations.

Change-Id: Icbc2c83d2b90e184be03e6f9679e678f3a4bce8f
2011-04-19 16:14:57 -04:00
Johann
4a2b684ef4 modify SAVE_XMM for potential 64bit use
the win64 abi requires saving and restoring xmm6:xmm15. currently
SAVE_XMM and RESTORE_XMM only allow for saving xmm6:xmm7. allow
specifying the highest register used and if the stack is unaligned.

Change-Id: Ica5699622ffe3346d3a486f48eef0206c51cf867
2011-04-19 10:42:45 -04:00
Johann
a9b465c5c9 Merge "Add save/restore xmm registers in x86 assembly code" 2011-04-19 06:32:10 -07:00
Johann
c7cfde42a9 Add save/restore xmm registers in x86 assembly code
Went through the code and fixed it. Verified on Windows.

Where possible, remove dependencies on xmm[67]

Current code relies on pushing rbp to the stack to get 16 byte
alignment. This broke when rbp wasn't pushed
(vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned
memory accesses. Revisit this and the offsets in
vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM.

Change-Id: I5f940994d3ebfd977c3d68446cef20fd78b07877
2011-04-18 16:30:38 -04:00
Yunqing Wang
48438d6016 Merge "Use sub-pixel search's SSE in mode selection" 2011-04-18 13:20:04 -07:00
Yunqing Wang
b8f0b59985 Use sub-pixel search's SSE in mode selection
Passed SSE from sub-pixel search back to pick_inter_mode
function, which is compared with the encode_breakout to
see if we could skip evaluating the remaining modes.

Change-Id: I4a86442834f0d1b880a19e21ea52d17d505f941d
2011-04-18 16:12:28 -04:00
Johann
cd103a5721 Merge "store quant_shift as an unsigned char" 2011-04-18 10:03:40 -07:00
Yaowu Xu
c619f6cb0f Merge "fixed an overflow in ssim calculation" 2011-04-18 07:44:34 -07:00
Adrian Grange
0d2abe3084 Merge "Fix usage of value returned by vp8_pick_intra4x4mby_modes" 2011-04-15 08:37:19 -07:00
Yunqing Wang
1312a7a2e2 Merge "Reduce unnecessary distortion computation" 2011-04-15 08:17:03 -07:00
Yunqing Wang
918fb5487e Reduce unnecessary distortion computation
In vp8_pick_inter_mode(), for NEWMV mode, use the error result obtained
from motion search as the distortion. This helps performance in
real-time mode.

Change-Id: I398c4e46cc5381f7d874e748cf78827ef0e0860c
2011-04-14 15:53:33 -04:00
John Koleszar
63f15987a5 Merge "Refactor lookahead ring buffer" 2011-04-14 12:35:01 -07:00
Fritz Koenig
e749ae510f Merge "Use consistent delimiters." 2011-04-14 11:56:18 -07:00
Adrian Grange
8608de1c6f Fix usage of value returned by vp8_pick_intra4x4mby_modes
The value of distortion2 returned by vp8_pick_intra4x4mby_modes
was being overwritten by the value returned by get16x16prederror
before it was tested.

Change-Id: If00e80332b272c5545c3a7e381c8041e8319b41a
2011-04-14 10:50:00 -07:00
Fritz Koenig
33cefd6f6e Use consistent delimiters.
opsnr.stt file was using \t for delimiters on everything
except between VPXSSIM and Time.

Change-Id: I6284c4e40c05ff642bf4b0170dca062c279a42df
2011-04-13 15:06:17 -07:00
Adrian Grange
8861174624 Fixed use of early breakout in vp8_pick_intra4x4mby_modes
Index i is used to detect early breakout from the first loop, but
its value is lost due to reuse in the second for loop. I moved
the position of the second loop and did some format cleanup.

Change-Id: I02780eae1bd89df4b6c000fb8a018b0837aac2e5
2011-04-13 12:56:46 -07:00
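The pattern behind this bug can be reproduced in a few lines. Both functions below are hypothetical and are not the libvpx code. The commit fixed the problem by moving the second loop; the sketch instead records the result before the index is reused, which is an equivalent alternative.

```c
#include <stdbool.h>

/* Buggy pattern: i is meant to show whether the first loop broke out
 * early, but the second loop reuses it and always leaves i == n. */
bool all_within_limit_buggy(const int *costs, int n, int limit, int *scratch) {
    int i;
    for (i = 0; i < n; i++)
        if (costs[i] > limit) break;
    for (i = 0; i < n; i++)      /* clobbers the breakout position */
        scratch[i] = 0;
    return i == n;               /* always true */
}

/* Fixed pattern: capture the outcome before i is reused. */
bool all_within_limit_fixed(const int *costs, int n, int limit, int *scratch) {
    int i;
    for (i = 0; i < n; i++)
        if (costs[i] > limit) break;
    bool completed = (i == n);
    for (i = 0; i < n; i++)
        scratch[i] = 0;
    return completed;
}
```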
John Koleszar
88841f1059 Refactor lookahead ring buffer
This patch cleans up the source buffer storage and copy mechanism to
allow access through a standard push/pop/peek interface. This approach
also avoids an extra copy in the case where the source is not a
multiple of 16, fixing issue #102.

Change-Id: I05808c39f5743625cb4c7af54cc841b9b10fdbd9
2011-04-13 14:26:45 -04:00
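A push/pop/peek interface over a ring buffer can be sketched as below. This is a simplified model, not the libvpx lookahead code: the names (lookahead_model, la_push, la_pop, la_peek), the fixed capacity and the int payload are all assumptions, and the real queue stores source frames.

```c
#include <stdbool.h>
#include <stddef.h>

#define LA_CAPACITY 4   /* hypothetical queue depth */

typedef struct {
    int entries[LA_CAPACITY];   /* stand-in for queued source frames */
    size_t read_idx;
    size_t write_idx;
    size_t count;
} lookahead_model;

void la_init(lookahead_model *q) {
    q->read_idx = q->write_idx = q->count = 0;
}

/* Append an entry; fails when the queue is full. */
bool la_push(lookahead_model *q, int frame_id) {
    if (q->count == LA_CAPACITY) return false;
    q->entries[q->write_idx] = frame_id;
    q->write_idx = (q->write_idx + 1) % LA_CAPACITY;
    q->count++;
    return true;
}

/* Remove the oldest entry; fails when the queue is empty. */
bool la_pop(lookahead_model *q, int *out) {
    if (q->count == 0) return false;
    *out = q->entries[q->read_idx];
    q->read_idx = (q->read_idx + 1) % LA_CAPACITY;
    q->count--;
    return true;
}

/* Read the entry `index` positions after the oldest one, without
 * removing it. */
bool la_peek(const lookahead_model *q, size_t index, int *out) {
    if (index >= q->count) return false;
    *out = q->entries[(q->read_idx + index) % LA_CAPACITY];
    return true;
}
```

Keeping the indices behind three functions means callers never touch the storage directly, which is what allows the storage and copy strategy to change without touching them.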
Johann
70f30aa95d store quant_shift as an unsigned char
in encodeframe.c, quant_shift is set to 0 or 1 in vp8cx_invert_quant

only use 8 bits to store this, instead of 16. will allow saving an
xmm register in an updated version of the regular quantize

Change-Id: Ie88c47fe2aff5af0283dab1147fb2791e4b12f90
2011-04-13 13:50:12 -04:00
John Koleszar
c99f9d7abf Change rc undershoot/overshoot semantics
This patch changes the rc_undershoot_pct and rc_overshoot_pct controls
to set the "aggressiveness" of rate adaptation, by limiting the
amount of difference between the target buffer level and the actual
buffer level which is applied to the target frame rate for this frame.

This patch was initially provided by arosenberg at logitech.com as
an attachment to issue #270. It was modified to separate these controls
from the other unrelated modifications in that patch, as well as to
use the pre-existing variables rather than introducing new ones.

Change-Id: Id542e3f5667dd92d857d5eabf29878f2fd730a62
2011-04-12 20:49:33 -04:00
John Koleszar
538f110407 Merge "Bugfix for error accumulator stats" 2011-04-12 06:59:00 -07:00
John Koleszar
e689a27d62 Bugfix for error accumulator stats
Previous to commit de4e9e3, there was an early return in the alt-ref
case that was inadvertently removed when the function was refactored
to return void. This patch restores the prior behavior.

Change-Id: I783ffd594a4690297e2742f99526fd7ad67698b2
2011-04-12 08:47:33 -04:00
Yunqing Wang
4fd81a99f8 Set cpu_used range to [-16, 16] in real-time mode
Remove encoding speed limitation in real-time mode.

Change-Id: Ib5e35d8bb522b2a25f3e4ad5cfe2788ebebb3617
2011-04-11 15:55:04 -04:00
Yunqing Wang
d1abe62d1c Define RDCOST only once
Clean up the code.

Change-Id: I7db048efa4d972b528d553a7921bc45979621129
2011-04-11 11:53:56 -04:00
John Koleszar
a9ce3e3834 Remove unused files
Change-Id: I36ca3f2f4620358033da34daf764f0b388dacd08
2011-04-11 10:34:40 -04:00
Yunqing Wang
4b43167ad1 Fix input MV for full search
Input MV needs to be modified to full-pixel precision.

Change-Id: Ic5d78e41bf27077e325024332b9fe89f76c44f0c
2011-04-08 16:29:41 -04:00
Johann Koenig
6e156a4cd7 Merge "use asm_offsets with vp8_fast_quantize_b_sse3" 2011-04-08 10:05:47 -07:00
John Koleszar
921a32a306 Merge "Error accumulator stats bug." 2011-04-08 08:20:32 -07:00
Paul Wilkins
de4e9e3b44 Error accumulator stats bug.
The error accumulator stats values cpi->prediction_error and
cpi->intra_error were being populated with rd values not
distortion values.

These are only "currently" used in a limited way for RT compress
key frame detection.

Change-Id: I2702ba1cab6e49ab8dc096ba75b6b34ab3573021
2011-04-08 14:21:36 +01:00
Jim Bankoski
d4cdb683a4 fixed an overflow in ssim calculation
This commit fixed an overflow in ssim calculation, added register
save and restore to make sure the assembly code works on x64 platforms.
It also changed the sampling points to every 4x4 instead of 8x8 and
adjusted the constants in SSIM calculation to match the scale of
previous VPXSSIM.

Change-Id: Ia4dbb8c69eac55812f4662c88ab4653b6720537b
2011-04-07 14:25:25 -07:00
Johann Koenig
08702002e8 use asm_offsets with vp8_fast_quantize_b_sse3
on the same order as the sse2 fast quantize change: ~2%
except for 32bit. only a slight improvement there.

Change-Id: Iff80e5f1ce7e646eebfdc8871405458ff911986b
2011-04-07 16:40:05 -04:00
James Berry
aec5487cdd Use correct 32 bit comparisons for SAD breakout.
Rax updated to eax to avoid uninitialized memory
usage.

Change-Id: Iedb953f104329ede2a786fc648a47f1be2f3798a
2011-04-07 15:08:03 -04:00
Johann
2de858b9fc Merge "use asm_offsets with vp8_fast_quantize_b_sse2" 2011-04-06 10:53:55 -07:00
Yunqing Wang
9e9f61a317 Merge "Minor modification" 2011-04-06 06:12:13 -07:00
Yunqing Wang
02423b2e92 Minor modification
A small change.

Change-Id: I2e7726e58370a95d0319361f4f6ad231138d1328
2011-04-06 09:08:47 -04:00
Johann
c32e0ecc59 use asm_offsets with vp8_fast_quantize_b_sse2
on the same order as the regular quantize change: ~2%

Change-Id: I5c9eec18e89ae7345dd96945cb740e6f349cee86
2011-04-04 16:23:29 -04:00
Scott LaVarnway
f212a98ee7 Fixed unused variable warnings for firstpass.c
Change-Id: I8378a9a541ade2f098359a7b20fa08e6c1596d80
2011-04-04 14:18:31 -04:00
Johann
610dd90288 Merge "tweak vp8_regular_quantize_b_sse2" 2011-04-04 08:56:25 -07:00
Yunqing Wang
f5c0d95e8c Merge "Use full-pixel MV in mvsadcost calculation" 2011-04-04 08:40:51 -07:00
Yunqing Wang
3d6815817c Use full-pixel MV in mvsadcost calculation
MV sad cost error is only used in full-pixel motion search,
which only needs full-pixel resolution instead of quarter-pixel
resolution. This change reduced mvsadcost table size, and
removed unnecessary parameter passing since this table is
constant once it is generated.

Change-Id: I9f931e55f6abc3c99011321f1dfb2f3562e6f6b0
2011-04-01 16:41:58 -04:00
Johann
8520b5c785 tweak vp8_regular_quantize_b_sse2
rather than look up rc in the zig zag table, embed it in the macro. this
also allows us to shuffle some values in the macro and keep *d in rsi

gains of about the same order as the obj_int_extract implementation: ~2%

Change-Id: Ib7252dd10eee66e0af8b0e567426122781dc053d
2011-04-01 09:58:23 -04:00
Johann
ba11e24d47 Merge "Wrapper function removed from vp8_subtract_b_neon function call" 2011-04-01 05:47:21 -07:00
Tero Rintaluoma
cec76a36d6 Wrapper function removed from vp8_subtract_b_neon function call
Address calculations moved from encodemb_arm.c file to neon
optimized assembly function to save cycles in function calls.
 - vp8_subtract_b_neon_func replaced with vp8_subtract_b_neon
   that contains all needed address calculations
 - unnecessary file encodemb_arm.c removed
 - consistent with ARMv6 optimized version

Change-Id: I6cbc1a2670b56c2077f59995fcf8f70786b4990b
2011-04-01 10:06:44 +03:00
Johann
9d138379a2 Merge "ARMv6 optimized subtract functions" 2011-03-31 08:40:10 -07:00
Attila Nagy
297b27655e Runtime detection of available processor cores.
Detect the number of available cores and limit the thread allocation
accordingly. On decoder side limit the number of threads to the max
number of token partitions.

Core detection works on Windows and
Posix platforms, which define _SC_NPROCESSORS_ONLN or _SC_NPROC_ONLN.

Change-Id: I76cbe37c18d3b8035e508b7a1795577674efc078
2011-03-31 10:23:01 +03:00
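On the Posix side, detection with the two sysconf names mentioned above can be sketched as follows. This is a minimal sketch and not the code from the patch: the function names detect_core_count and limit_threads are hypothetical, the Windows branch is omitted, and falling back to 1 when detection fails is an assumption.

```c
#include <unistd.h>

/* Return the number of online cores, or 1 if it cannot be determined. */
int detect_core_count(void) {
    long n = -1;
#if defined(_SC_NPROCESSORS_ONLN)
    n = sysconf(_SC_NPROCESSORS_ONLN);
#elif defined(_SC_NPROC_ONLN)
    n = sysconf(_SC_NPROC_ONLN);
#endif
    return n < 1 ? 1 : (int)n;
}

/* Limit a requested thread count to the detected cores, keeping at
 * least one thread. */
int limit_threads(int requested, int cores) {
    int limit = requested < cores ? requested : cores;
    return limit < 1 ? 1 : limit;
}
```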
Attila Nagy
7d335868df Fix: lpf semaphore was signaled in single threaded run
After picking filter level, post the loopfilter semaphore
just when multiple threads are in use.

Change-Id: If7bfb64601d906adef703f454dafc25e978b93c6
2011-03-30 15:55:29 +03:00
Johann
0e43668546 Merge "Half pixel variance further optimized for ARMv6" 2011-03-29 12:14:54 -07:00
Yunqing Wang
534ea700bd Merge "Fix a crash while enabling shared (--enable-shared)" 2011-03-29 09:04:22 -07:00
Yunqing Wang
b843aa4eda Fix a crash while enabling shared (--enable-shared)
Fixed a bug in SSSE3 sub-pixel filter functions.

Change-Id: I2e2126652970eb78307ffcefcace1efd5966fb0a
2011-03-29 11:31:06 -04:00
Johann
f0c22a3f33 use GLOBAL correctly on 32bit shared libraries
http://code.google.com/p/webm/issues/detail?id=309

Change-Id: I6fce9e2f74bc09a9f258df7f91ab599812324e8c
2011-03-29 11:27:03 -04:00
Tero Rintaluoma
6fdc9aa79f ARMv6 optimized subtract functions
Adds following ARMv6 optimized functions to encoder:
  - vp8_subtract_b_armv6
  - vp8_subtract_mby_armv6
  - vp8_subtract_mbuv_armv6

Gives 1-5% speed-up depending on input sequence and encoding
parameters. Functions have one stall cycle inside the loop body
on Cortex pipeline.

Change-Id: I19cca5408b9861b96f378e818eefeb3855238639
2011-03-29 16:52:00 +03:00
Tero Rintaluoma
f5e433464b Half pixel variance further optimized for ARMv6
Half pixel interpolations optimized in variance calculations. Separate
function calls to vp8_filter_block2d_bil_x_pass_armv6 are avoided. On
average, performance improvement is 6-7% for VGA@30fps sequences.

Change-Id: Idb5f118a9d51548e824719d2cfe5be0fa6996628
2011-03-28 09:51:51 +03:00
Johann
beaafefcf1 Merge "use asm_offsets with vp8_regular_quantize_b_sse2" 2011-03-24 11:06:36 -07:00
Johann
8edaf6e2f2 use asm_offsets with vp8_regular_quantize_b_sse2
remove helper function and avoid shadowing all the arguments to the
stack on 64bit systems

when running with --good --cpu-used=0:
~2% on linux x86 and x86_64
~2% on win32 x86 msys and visual studio
more on darwin10 x86_64
significantly more on
x86_64-win64-vs9

Change-Id: Ib7be12edf511fbf2922f191afd5b33b19a0c4ae6
2011-03-24 13:34:48 -04:00
Johann
4cde2ab765 Merge "ARMv6 optimized fdct4x4" 2011-03-23 07:52:51 -07:00
Yunqing Wang
73065b67e4 Merge "Fix multithreaded encoding for 1 MB wide frame" 2011-03-21 07:41:31 -07:00
John Koleszar
2cbd962088 Remove unused vp8_get4x4sse_cs_mmx declaration
This declaration did not match the prototype_sad() prototype, but was
unused in this translation unit, so it is removed instead. Fixes
issue 290.

Change-Id: I168854f88a85f73ca9aaf61d1e5dc0f43fc3fdb3
2011-03-21 07:53:53 -04:00
John Koleszar
769c74c0ac Merge "Increase static linkage, remove unused functions" 2011-03-21 04:51:51 -07:00
Tero Rintaluoma
a61785b6a1 ARMv6 optimized fdct4x4
Optimized fdct4x4 (8x4) for ARMv6 instruction set.
  - No interlocks in Cortex-A8 pipeline
  - One interlock cycle in ARM11 pipeline
  - About 2.16 times faster than current C-code compiled with -O3

Change-Id: I60484ecd144365da45bb68a960d30196b59952b8
2011-03-21 13:33:45 +02:00
Attila Nagy
bfe803bda3 Fix multithreaded encoding for 1 MB wide frame
Thread synchronization was not correct when frame width was 1 MB.
Number of allocated encoding threads is limited by the sync_range.
There is no point having more because each thread lags sync_range MBs
behind the thread processing the row above.

http://code.google.com/p/webm/issues/detail?id=302

Change-Id: Icaf67a883beecc5ebf2f11e9be47b6997fdf6f26
2011-03-18 12:35:30 +02:00
John Koleszar
429dc676b1 Increase static linkage, remove unused functions
A large number of functions were defined with external linkage, even
though they were only used from within one file. This patch changes
their linkage to static and removes the vp8_ prefix from their names,
which should make it more obvious to the reader that the function is
contained within the current translation unit. Functions that were
not referenced were removed.

These symbols were identified by:

  $ nm -A libvpx.a | sort -k3 | uniq -c -f2 | grep ' [A-Z] ' \
    | sort | grep '^ *1 '

Change-Id: I59609f58ab65312012c047036ae1e0634f795779
2011-03-17 20:53:47 -04:00
John Koleszar
8431e768c9 Merge "Fix "used uninitialized" warning in vp8_pack_bitstream()" 2011-03-17 14:25:04 -07:00
Attila Nagy
71bcd9f1af Add vp8_variance8x8_armv6 and vp8_sub_pixel_variance8x8_armv6 functions
Change-Id: I08edaffc62514907fa5e90e1689269e467c857f5
2011-03-15 15:50:44 +02:00
Johann
d0ec28b3d3 Merge "Add vp8_mse16x16_armv6 function" 2011-03-14 12:47:42 -07:00