96 Commits

Author SHA1 Message Date
Ronald S. Bultje
9bc5f3e3af Change common vp8_ public symbol prefixes to vp9_.
Change-Id: Ic5a5f60e1ff9d9ccae4174160d36529466eeb509
2012-10-31 09:47:32 -07:00
Ronald S. Bultje
0d53fc262c Change decoder vp8_ and vp8dx_ public symbol prefixes to vp9_.
Change-Id: Iedb4c3b4171d8640cc525727b4c3658e2bb400db
2012-10-30 22:07:14 -07:00
Scott LaVarnway
ce811f87c4 Faster 8t filtering
Quickly modified the ssse3 sixtap filters to support eight taps.  For the test
clip used, a 23+% boost in decoder performance was seen.  We can
revisit later and improve further.

Change-Id: I5f59860459e80d6fa23e6cc0fd91296a969f5240
2012-10-25 17:24:50 -07:00
Scott LaVarnway
d3465a5352 Added sse2 instrinsic version of vp8_sad3x16
1.6% boost in decoder performance for the clip used.

Change-Id: I91f3c4573fd3d10afbf18930f279af7ae2223e3a
2012-10-25 12:19:41 -07:00
Scott LaVarnway
9ba2efd034 Added sse2 instrinsic version of vp8_sad16x3
3.7% boost in decoder performance for the clip used.

Change-Id: I74f28486a9352b472b36e21b5eaf30eff35e9199
2012-10-25 12:16:08 -07:00
Scott LaVarnway
f8e9d0104b Fixed the MSVC compiling error with correct cast
Change-Id: Ia904f4ec72500d29f1361ce305d8f3231e592f47
2012-10-23 14:25:01 -07:00
Yaowu Xu
d3b7c850a4 Revert "make the instrinsic code build with MS compilers"
This reverts commit b0e3ca126189123ddec27ebba6aa62290e64adb6.

Change-Id: I9c5aa463461b187160ad01fbc1795ae4f5263b2c
2012-10-23 13:03:35 -07:00
Scott LaVarnway
b0e3ca1261 make the instrinsic code build with MS compilers
to enable build under windows/MSVC

Change-Id: Ida41cc5b3c8d0fec9512c2f5c5feb64e07b44805
2012-10-23 10:52:11 -07:00
Scott LaVarnway
085433c2d0 sse2 intrinsic version of vp8_mbloop_filter_vertical_edge()
First sse2 version of vp8_mbloop_filter_vertical_edge().  For now,
intrinsics are being used until the bitstream is finalized.  This function
will be revisited later for further performance improvements.

For the test clip used, a 34+% decoder performance improvement
was seen.  This will vary depending on material.

Change-Id: I455b438bc8d8af76cf7533ac42eda5f689b21f7c
2012-10-19 15:52:12 -07:00
Scott LaVarnway
992b5e2d95 sse2 intrinsic version of vp8_mbloop_filter_horizontal_edge()
First sse2 version of vp8_mbloop_filter_horizontal_edge().  For now,
intrinsics are being used until the bitstream is finalized.  This function
will be revisited later for further performance improvements.
For the test clip used, a 31+% decoder performance improvement
was seen.  This will vary depending on material.

Change-Id: I03ed3a7182478bdd1f094644ff3e0442625600e7
2012-10-18 14:29:26 -07:00
Scott LaVarnway
15ce6bd62e Removed the loopfilter rtcd invoke macro code
Change-Id: I446b2ffcbe732ffb112dbd97a4799272d4c01a84
2012-10-16 16:19:35 -07:00
Jim Bankoski
7c15c18c5e removed the recon rtcd invoke macro code (unrevert)
This reinstates reverted commit 2113a831575d81faeadd9966e256d58b6b2b1633

Change-Id: I9a9af13497d1e58d4f467e3e083fddf06b1b786c
2012-10-16 12:02:31 -07:00
Jim Bankoski
f9d5f86643 Revert "removed the recon. rtcd invoke macro code"
This reverts commit 2113a831575d81faeadd9966e256d58b6b2b1633
2012-10-13 20:29:04 -07:00
Jim Bankoski
2113a83157 removed the recon. rtcd invoke macro code
Code clean up - removed rtcd

Change-Id: Id963ecf53c370b1d99484ef18d6befeed7e0c748
2012-10-13 18:49:44 -07:00
Jim Bankoski
89f060e88a convert copy16x16 to rtcd
Convert copy16x16 from invoke to rtcd.  The first in a long
string of converts.

Change-Id: I296b0aa32f40e9fb649f7a3cb914a4e5300cad63
2012-10-09 17:09:08 -07:00
Christian Duvivier
63ef9c40a4 SSE2 version of vectorized 8-tap filtering.
About 20% overall encoder speedup (vs. about 30% for sse4 version).

Change-Id: Ibf608a6a1bc94b14ec47e8046d3206b275b5a8bd
2012-08-21 15:26:14 -07:00
Christian Duvivier
525b183910 A few more optimizations, about 1% overall speedup.
Unroll horizontal pass, no more intermediate buffer, faster special transpose.

Change-Id: I05df75be4e5f01420066cdf3c61a2edf35bedb64
2012-08-16 15:03:29 -07:00
Christian Duvivier
9471bc2e9e Merge "First partial snapshot of vectorized 8-tap filtering." into experimental 2012-08-15 18:01:18 -07:00
Christian Duvivier
5a34e0eb89 First partial snapshot of vectorized 8-tap filtering.
About 3.5x faster, 30% overall encoder speedup. Rest of optimizations
will come soon (see TODO section in filter_sse4.c).

Change-Id: If18108048bfd5345fc942e8574e4c7f58e0e86e0
2012-08-15 17:55:06 -07:00
Paul Wilkins
77dc5c65f2 Code clean up.
Further cases of inconsistent naming convention.

Change-Id: Id3411ecec6f01a4c889268a00f0c9fd5a92ea143
2012-08-15 11:00:53 +01:00
Deb Mukherjee
7d0656537b Merging in the sixteenth subpel uv experiment
Merges this experiment in to make it easier to run tests on
filter precision, vectorized implementation etc.

Also removes an experimental filter.

Change-Id: I1e8706bb6d4fc469815123939e9c6e0b5ae945cd
2012-08-08 16:57:43 -07:00
Deb Mukherjee
0ebf548c75 Merging and bug-fix in enhanced_interp experiment
Merged the enhanced_interp experiment.
Found and fixed a bug in the include files framework, whereby
certain encoder files were still using the old INTERP_EXTEND
value of 3 instead of 4. The thresholds for mv range mcomp.c
need a small adjustment to prevent crashes.

The results are more or less unchanged.

Change-Id: Iac5008390f1efc97ce1102fbb5f8989c847fb579
2012-07-31 11:45:31 -07:00
Deb Mukherjee
9984a155d6 Merges several experiments
The following five experiments are merged:

newentropy
newupdate
adaptive_entropy (also includes a couple of parameter changes
                  that improves results a little
                  in common/entropymode.c and encoder/modecosts.c
                  that were not merged from the internal branch)
newintramodes
expanded_coef_context

Change-Id: I8a142a831786ee9dc936f22be1d42a8bced7d270
2012-07-27 12:12:39 -07:00
John Koleszar
c6b9039fd9 Restyle code
Approximate the Google style guide[1] so that that there's a written
document to follow and tools to check compliance[2].

[1]: http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml
[2]: http://google-styleguide.googlecode.com/svn/trunk/cpplint/cpplint.py

Change-Id: Idf40e3d8dddcc72150f6af127b13e5dab838685f
2012-07-17 11:46:03 -07:00
Ronald S. Bultje
9c9d6743d4 Sign-extend input argument so it can be used in pointer arithmetic.
Change-Id: I6cbd4de96f9dcc783cef170bfd7652f6cbee36a2
2012-06-25 14:16:39 -07:00
Daniel Kang
31fd84d80b x86inc: Move x86inc to the correct location.
Change-Id: I6802731a4d15feef5ce62993dc505ded55c40f7e
2012-06-18 13:36:41 -07:00
Daniel Kang
7a00071576 Adds x86inc.asm and update idct/dequant mmx
Updates idct/dequant mmx assembly to work with vpnext instead of vp8.

Also adds x86inc.asm

Change-Id: I6e147d5e89177ae449271e97e50d082eb11b078e
2012-06-12 15:04:03 -07:00
Deb Mukherjee
c5ddb7f016 Adds new Directional Intra prediction modes.
Adds 6 directional intra predictiom modes for 16x16 and 8x8 blocks.

Change-Id: I25eccc0836f28d8d74922e4e9231568a648b47d1
2012-05-15 08:54:50 -07:00
Deb Mukherjee
18e90d744e Supporting high precision 1/8-pel motion vectors
This is the initial patch for supporting 1/8th pel
motion. Currently if we configure with enable-high-precision-mv,
all motion vectors would default to 1/8 pel. Encode and
decode syncs fine with the current code. In the next phase
the code will be refactored so that we can choose the 1/8
pel mode adaptively at a frame/segment/mb level.

Derf results:
http://www.corp.google.com/~debargha/vp8_results/enhinterp_hpmv.html
(about 0.83% better than 8-tap interpoaltion)

Patch 3: Rebased. Also adding 1/16th pel interpolation for U and V

Patch 4: HD results.
http://www.corp.google.com/~debargha/vp8_results/enhinterp_hd_hpmv.html
Seems impressive (unless I am doing something wrong).

Patch 5: Added mmx/sse for bilateral filtering, as well as enforced
use of c-versions of subpel filters with 8-taps and 1/16th pel;
Also redesigned the 8-tap filters to reduce the cut-off in order to
introduce a denoising effect. There is a new configure option
sixteenth-subpel-uv which will use 1/16 th pel interpolation for
uv, if the motion vectors have 1/8 pel accuracy.

With the fixes the results are promising on the derf set. The enhanced
interpolation option with 8-taps alone gives 3% improvement over thei
derf set:
http://www.corp.google.com/~debargha/vp8_results/enhinterpn.html

Results on high precision mv and on the hd set are to follow.

Patch 6: Adding a missing condition for CONFIG_SIXTEENTH_SUBPEL_UV in
vp8/common/x86/x86_systemdependent.c

Patch 7: Cleaning up various debug messages.

Patch 8: Merge conflict

Change-Id: I5b1d844457aefd7414a9e4e0e06c6ed38fd8cc04
2012-02-23 09:25:21 -08:00
John Koleszar
180b0306cc Merge remote branch 'internal/upstream' into HEAD
Conflicts:
	vp8/common/defaultcoefcounts.h
	vp8/common/entropy.c
	vp8/encoder/bitstream.c

Change-Id: Idd4990c80d5b5494ac036254694015fab449bc08
2011-08-25 08:36:19 -04:00
Fritz Koenig
112bd4e2b4 Fix naming of sse2 idct functions.
Prepend idct function names with vp8_
so that under profiling they show up
associated with libvpx.

Change-Id: I4fe357b50236cb7730a4cc00164c0a3487a1d8b4
2011-08-24 10:25:32 -07:00
John Koleszar
67864c5f97 Merge remote branch 'internal/upstream' into HEAD 2011-08-24 00:05:05 -04:00
Johann
85358d04cd Fix data accesses for simple loopfilters
The data that the simple horizontal loopfilter reads is aligned, treat
it accordingly.

For the vertical, we only use the bottom 4 bytes, so don't read in 16
(and incur the penalty for unaligned access).

This shows a small improvement on older processors which have a
significant penalty for unaligned reads.

postproc_mmx.c is unused

Change-Id: I87b29bbc0c3b19ee1ca1de3c4f47332a53087b3d
2011-08-23 20:42:45 -04:00
Fritz Koenig
c5f890af2c Use local labels for jumps/loops in x86 assembly.
Prepend . to local labels in assembly code.  This
allows non unique labels within a file.  Also
makes profiling information more informative
by keeping the function name with the loop name.

Change-Id: I7a983cb3a5ba2413d5dafd0a37936b268fb9e37f
2011-08-23 09:05:29 -07:00
John Koleszar
6901105e99 Merge remote branch 'internal/upstream' into HEAD 2011-07-14 00:05:04 -04:00
Johann
01433c5043 update x86 asm for loopfilter
Change-Id: I1ed739522db7c00c189851c7095c1b64ef6412ce
2011-07-08 09:23:38 -04:00
John Koleszar
5380a2215e Merge remote branch 'internal/upstream' into HEAD 2011-07-02 00:05:10 -04:00
Ronald S. Bultje
c8a23ad3f4 Properly use GET_GOT/RESTORE_GOT when using GLOBAL().
This should fix binaries using PIC on x86-32. Also should
fix issue 343.

Change-Id: I591de3ad68c8a8bb16054bd8f987a75b4e2bad02
2011-06-30 14:04:27 -07:00
John Koleszar
27331e1377 Merge remote branch 'internal/upstream' into HEAD 2011-05-20 00:05:16 -04:00
Scott LaVarnway
914f7c36d7 Merge "Make hor UV predict ~2x faster (73 vs 132 cycles) using SSSE3." 2011-05-19 11:22:01 -07:00
John Koleszar
65b1648f35 Merge remote branch 'internal/upstream' into HEAD 2011-05-11 00:05:07 -04:00
John Koleszar
6edd07d656 Merge remote branch 'internal/upstream-experimental' into HEAD 2011-05-11 00:05:07 -04:00
Johann
df2023a6cb set up Global Offset Table in recon
global values were being referenced, but the GOT was not being set up.
as the GOT is only required for PIC, this issue wasn't caught in the
default configuration.

Change-Id: I8006e53776139362a76f2c80cf9d0f8458602b2f
http://code.google.com/p/webm/issues/detail?id=328
2011-05-10 15:58:56 -04:00
Johann
a7d4d3c550 clean up unused variable warnings
Change-Id: I9467d7a50eac32d8e8f3a2f26db818e47c93c94b
2011-05-09 12:56:20 -04:00
John Koleszar
e2990fcc48 Merge remote branch 'internal/upstream' into HEAD 2011-05-03 00:05:05 -04:00
Thijs Vermeir
8942f70cdf Fix documentation typos
Change-Id: I97124670926433bf1593c91660d8b8f8482ea9ce
2011-04-30 09:34:59 +02:00
Ronald S. Bultje
5a23352c03 Make hor UV predict ~2x faster (73 vs 132 cycles) using SSSE3.
Change-Id: I658a1df7d825f820573cb2d11ad402f9d2791035
2011-04-29 11:52:09 -07:00
John Koleszar
57afffbcbb Merge remote branch 'internal/upstream' into HEAD 2011-04-29 00:05:07 -04:00
James Berry
f10732554b bug fix removed inline from recon_wrapper_sse2.c
removed inline from recon_wrapper_sse2.c to build
for visual stuido

Change-Id: I74a3482950448e2cdb30e9cd7087145b440d8a22
2011-04-28 15:12:00 -04:00
John Koleszar
e1b90ce862 Merge remote branch 'internal/upstream' into HEAD 2011-04-28 00:05:07 -04:00