Commit Graph

1248 Commits

Author SHA1 Message Date
Johann
c7cfde42a9 Add save/restore xmm registers in x86 assembly code
Went through the code and fixed it. Verified on Windows.

Where possible, remove dependencies on xmm[67]

Current code relies on pushing rbp to the stack to get 16 byte
alignment. This broke when rbp wasn't pushed
(vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned
memory accesses. Revisit this and the offsets in
vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM.

Change-Id: I5f940994d3ebfd977c3d68446cef20fd78b07877
2011-04-18 16:30:38 -04:00
Yunqing Wang
48438d6016 Merge "Use sub-pixel search's SSE in mode selection" 2011-04-18 13:20:04 -07:00
Yunqing Wang
b8f0b59985 Use sub-pixel search's SSE in mode selection
Passed SSE from sub-pixel search back to pick_inter_mode
function, which is compared with the encode_breakout to
see if we could skip evaluating the remaining modes.

Change-Id: I4a86442834f0d1b880a19e21ea52d17d505f941d
2011-04-18 16:12:28 -04:00
Yunqing Wang
d5069b5af0 Merge "Handle long delay between video frames in multi-thread decoder(issue 312)" 2011-04-18 10:11:41 -07:00
Johann
cd103a5721 Merge "store quant_shift as an unsigned char" 2011-04-18 10:03:40 -07:00
Yaowu Xu
05d9421e8b Merge "Add spin-wait pause intrinsic for Windows x64 platform." 2011-04-18 09:53:26 -07:00
Yaowu Xu
c619f6cb0f Merge "fixed an overflow in ssim calculation" 2011-04-18 07:44:34 -07:00
Scott LaVarnway
e1a8b6c8d5 Removed unused timers
Change-Id: I209803b9dbed2b2f6d02258fd7a3963a6645f4ab
2011-04-18 09:09:57 -04:00
John Koleszar
8fcb801d15 Merge "added -fomit-frame-pointer flag for gcc builds" 2011-04-18 06:07:57 -07:00
John Koleszar
0ba3fffc3a Merge remote branch 'origin/master' into experimental
Change-Id: I6ee7c49138576326887b32316cffe8d3e48aa044
2011-04-16 00:05:08 -04:00
Yunqing Wang
8ba58951e9 Handle long delay between video frames in multi-thread decoder(issue 312)
This is reported by m...@hesotech.de (see issue 312):
"The decoder causes an access violation
when you decode the first frame, then make a pause of about
60 seconds and then decode further frames. But only if
vpx_codec_dec_cfg_t.threads> 1.

This is caused by a timeout of WaitForSingleObject.
When I change the definition of VPXINFINITE to INFINITE(0xFFFFFFFF),
the problem is solved."

Reproduced the crash and verified the changes on Windows platform.
This brings the behavior inline with the other platforms using sem_wait().

Change-Id: I27b32f90bce05846ef2684b50f7a88f292299da1
2011-04-15 17:27:26 -04:00
Johann
d889035fe6 Merge "remove dead code, add missing RESTORE_XMM" 2011-04-15 13:32:54 -07:00
Scott LaVarnway
9409e38050 added -fomit-frame-pointer flag for gcc builds
According to the docs, this should have been enabled, but
the disassembled output shows otherwise.  This improved
the encode/decode performance.

Change-Id: I45ad7e6d299b89ac3166d7ef7da75b74994344c6
2011-04-15 15:59:21 -04:00
Johann
f64f425a50 remove executable bit
source files are not executable

Change-Id: Id2c7294695a22217468426423979f68f02d82340
2011-04-15 13:43:24 -04:00
Adrian Grange
0d2abe3084 Merge "Fix usage of value returned by vp8_pick_intra4x4mby_modes" 2011-04-15 08:37:19 -07:00
Yunqing Wang
1312a7a2e2 Merge "Reduce unnecessary distortion computation" 2011-04-15 08:17:03 -07:00
Johann
487c0299c9 remove dead code, add missing RESTORE_XMM
vp8_filter_block1d16_h4_ssse3 was never called

because UNSHADOW_ARGS moves the stack by 'mov rsp, rbp', the issue was
masked. however, if/when win64 used those registers for persistant data,
issues could/will arise.

Change-Id: I56d6effca0aeba1f86082689771cb10145d39651
2011-04-15 10:11:53 -04:00
John Koleszar
a3399291ad Fix off-by-one in copy_and_extend_plane
Should only copy h lines, not h+1.

Change-Id: I802a85686635900459c6dc79596189033e5298d8
2011-04-15 08:44:39 -04:00
John Koleszar
b4bb910b57 Merge remote branch 'origin/master' into experimental
Change-Id: Iacd40d38693f433cd25b071fc8420f563b242696
2011-04-15 00:05:09 -04:00
Yunqing Wang
918fb5487e Reduce unnecessary distortion computation
In vp8_pick_inter_mode(), for NEWMV mode, use the error result got
from motion search as distortion. This helps performance in real-
time mode.

Change-Id: I398c4e46cc5381f7d874e748cf78827ef0e0860c
2011-04-14 15:53:33 -04:00
John Koleszar
63f15987a5 Merge "Refactor lookahead ring buffer" 2011-04-14 12:35:01 -07:00
Fritz Koenig
e749ae510f Merge "Use consistent delimiters." 2011-04-14 11:56:18 -07:00
Adrian Grange
8608de1c6f Fix usage of value returned by vp8_pick_intra4x4mby_modes
The value of distortion2 returned by vp8_pick_intra4x4mby_modes
was being overwritten by the value returned by get16x16prederror
before it was tested.

Change-Id: If00e80332b272c5545c3a7e381c8041e8319b41a
2011-04-14 10:50:00 -07:00
Johann
ab48305fb6 Merge "update configure for ios sdk 4.3" 2011-04-14 08:55:22 -07:00
Joshua Bleecher Snyder
5e7a3bb69a update configure for ios sdk 4.3
update for the latest version of the ios sdk. adding
usr/lib/system fixes a missing libcache.dylib issue

make isysroot path more DRY

Change-Id: Ib748ef3dac3cac2e4848fbffa1e9a0112eac826b
2011-04-14 11:22:33 -04:00
Fritz Koenig
33cefd6f6e Use consistent delimiters.
opsnr.stt file was using \t for delimiters on everything
except between VPXSSIM and Time.

Change-Id: I6284c4e40c05ff642bf4b0170dca062c279a42df
2011-04-13 15:06:17 -07:00
Adrian Grange
8861174624 Fixed use of early breakout in vp8_pick_intra4x4mby_modes
Index i is used to detect early breakout from the first loop, but
its value is lost due to reuse in the second for loop. I moved
the position of the second loop and did some format cleanup.

Change-Id: I02780eae1bd89df4b6c000fb8a018b0837aac2e5
2011-04-13 12:56:46 -07:00
John Koleszar
88841f1059 Refactor lookahead ring buffer
This patch cleans up the source buffer storage and copy mechanism to
allow access through a standard push/pop/peek interface. This approach
also avoids an extra copy in the case where the source is not a
multiple of 16, fixing issue #102.

Change-Id: I05808c39f5743625cb4c7af54cc841b9b10fdbd9
2011-04-13 14:26:45 -04:00
Johann
70f30aa95d store quant_shift as an unsigned char
in encodframe.c, quant_shift is set to 0 or 1 in vp8cx_invert_quant

only use 8 bits to store this, instead of 16. will allow saving an
xmm register in an updated version of the regular quantize

Change-Id: Ie88c47fe2aff5af0283dab1147fb2791e4b12f90
2011-04-13 13:50:12 -04:00
John Koleszar
cb3e0aaba3 Merge remote branch 'origin/master' into experimental
Change-Id: I231e4dd65adcf4f5c158e3749880a18b8c36cbe4
2011-04-13 00:05:09 -04:00
John Koleszar
c99f9d7abf Change rc undershoot/overshoot semantics
This patch changes the rc_undershoot_pct and rc_overshoot_pct controls
to set the "aggressiveness" of rate adaptation, by limiting the
amount of difference between the target buffer level and the actual
buffer level which is applied to the target frame rate for this frame.

This patch was initially provided by arosenberg at logitech.com as
an attachment to issue #270. It was modified to separate these controls
from the other unrelated modifications in that patch, as well as to
use the pre-existing variables rather than introducing new ones.

Change-Id: Id542e3f5667dd92d857d5eabf29878f2fd730a62
2011-04-12 20:49:33 -04:00
John Koleszar
538f110407 Merge "Bugfix for error accumulator stats" 2011-04-12 06:59:00 -07:00
John Koleszar
e689a27d62 Bugfix for error accumulator stats
Previous to commit de4e9e3, there was an early return in the alt-ref
case that was inadvertantly removed when the function was refactored
to return void. This patch restores the prior behavior.

Change-Id: I783ffd594a4690297e2742f99526fd7ad67698b2
2011-04-12 08:47:33 -04:00
John Koleszar
fd09009227 Merge "Fix encoder range check for frame width and height" 2011-04-12 05:34:12 -07:00
Attila Nagy
1aadcedcfb Fix encoder range check for frame width and height
14 bits available in the bistream => valid range [1..16383]
Removed unused local vars.

Change-Id: Icf3385e47a9fa13af70053129c2248671f285583
2011-04-12 15:07:37 +03:00
John Koleszar
7ff5084f33 Merge remote branch 'origin/master' into experimental
Change-Id: Ib42656b05f2b099f17fd6c2033bbc3445421150c
2011-04-12 00:05:09 -04:00
Yunqing Wang
4fd81a99f8 Set cpu_used range to [-16, 16] in real-time mode
Remove encoding speed limitation in real-time mode.

Change-Id: Ib5e35d8bb522b2a25f3e4ad5cfe2788ebebb3617
2011-04-11 15:55:04 -04:00
Yunqing Wang
d1abe62d1c Define RDCOST only once
Clean up the code.

Change-Id: I7db048efa4d972b528d553a7921bc45979621129
2011-04-11 11:53:56 -04:00
John Koleszar
a9ce3e3834 Remove unused files
Change-Id: I36ca3f2f4620358033da34daf764f0b388dacd08
2011-04-11 10:34:40 -04:00
Ralph Giles
c9f8aefe26 Remove duplicate ';;' from the configure script.
This gave a syntax error, preventing configure from completing.

Change-Id: I3df765f93c26e5bb3b2aab939d1cd01d6c57d450
2011-04-10 09:30:36 -07:00
John Koleszar
e33241bb13 Merge remote branch 'origin/master' into experimental
Change-Id: I1a58ce4643377bae4cc6bf9c89320251f724ca66
2011-04-09 00:05:08 -04:00
Yunqing Wang
4b43167ad1 Fix input MV for full search
Input MV needs to be modified to full-pixel precision.

Change-Id: Ic5d78e41bf27077e325024332b9fe89f76c44f0c
2011-04-08 16:29:41 -04:00
Johann Koenig
6e156a4cd7 Merge "use asm_offsets with vp8_fast_quantize_b_sse3" 2011-04-08 10:05:47 -07:00
John Koleszar
921a32a306 Merge "Error accumulator stats bug." 2011-04-08 08:20:32 -07:00
Paul Wilkins
de4e9e3b44 Error accumulator stats bug.
The error accumulator stats values cpi->prediction_error and
cpi->intra_error were being populated with rd values not
distortion values.

These are only "currently" used in a limited way for RT compress
key frame detection.

Change-Id: I2702ba1cab6e49ab8dc096ba75b6b34ab3573021
2011-04-08 14:21:36 +01:00
John Koleszar
be3dee8903 Merge remote branch 'origin/master' into experimental
Change-Id: Ib70851b1d801d719edb8f5cd48d2f8fb210d3867
2011-04-08 00:05:08 -04:00
Jim Bankoski
d4cdb683a4 fixed an overflow in ssim calculation
This commit fixed an overflow in ssim calculation, added register
save and restore to make sure assembly code working for x64 platform.
It also changed the sampling points to every 4x4 instead of 8x8 and
adjusted the constants in SSIM calculation to match the scale of
previous VPXSSIM.

Change-Id: Ia4dbb8c69eac55812f4662c88ab4653b6720537b
2011-04-07 14:25:25 -07:00
Johann Koenig
08702002e8 use asm_offsets with vp8_fast_quantize_b_sse3
on the same order as the sse2 fast quantize change: ~2%
except for 32bit. only a slight improvment there.

Change-Id: Iff80e5f1ce7e646eebfdc8871405458ff911986b
2011-04-07 16:40:05 -04:00
James Berry
aec5487cdd Use correct 32 bit comparisons for SAD breakout.
Rax updated to eax to avoid uninitialized memory
usage.

Change-Id: Iedb953f104329ede2a786fc648a47f1be2f3798a
2011-04-07 15:08:03 -04:00
John Koleszar
6e4f6c96b3 Merge remote branch 'origin/master' into experimental
Change-Id: Icee86a4b25e53dc04b508179101b1a782b688f61
2011-04-07 00:05:11 -04:00