Compare commits

940 Commits

Author SHA1 Message Date
Johann
0c0a05046d Release v1.6.1 Long Tailed Duck
Change-Id: If27447472417c7ed34238295427ddb9da0561725
2017-01-12 12:27:27 -08:00
Johann Koenig
cabc29ba24 Merge "Add mips dspr2 partial idct tests" 2017-01-09 19:49:02 +00:00
Johann Koenig
8a7847c2c9 Merge "Fix mips dspr2 idct32x32 functions for large coefficient input" 2017-01-09 19:47:47 +00:00
Johann Koenig
bf168b24f5 Merge "Fix mips dspr2 idct16x16 functions for large coefficient input" 2017-01-09 19:47:00 +00:00
Johann Koenig
08d0a7fd0f Merge "Fix mips dspr2 idct8x8 functions for large coefficient input" 2017-01-09 19:46:18 +00:00
Johann Koenig
ab20869221 Merge "Fix mips dspr2 idct4x4 functions for large coefficient input" 2017-01-09 19:45:54 +00:00
Johann Koenig
7b18202e74 Merge "Add mips dspr2 vp9 intrapred tests" 2017-01-09 19:39:13 +00:00
Johann Koenig
9af97fb630 Merge "postproc: vpx_mbpost_proc_across_ip_neon" 2017-01-09 18:17:26 +00:00
Marco Paniconi
ebe0b57c91 Merge "vp9: 1 pass cbr mode: increase threshold for gf_cbr_boost_pct usage." 2017-01-09 17:23:12 +00:00
Kaustubh Raste
6377f9d966 Add mips dspr2 partial idct tests
Change-Id: Idf4003ea6f9a2a42a9f26e156bee73697acb7a37
2017-01-09 17:30:16 +05:30
Kaustubh Raste
50dd3eb62c Fix mips dspr2 idct32x32 functions for large coefficient input
Change-Id: If9da7099f226a27a09cc9e2899eb66a1158909d2
2017-01-09 17:21:09 +05:30
Kaustubh Raste
c06991fce6 Fix mips dspr2 idct16x16 functions for large coefficient input
Change-Id: I9be3d3d040837f658c6314606e28db8c31092a1a
2017-01-09 16:35:28 +05:30
Kaustubh Raste
24d804f79c Fix mips dspr2 idct8x8 functions for large coefficient input
Change-Id: If011dd923bbe976589735d5aa1c3167dda1a3b61
2017-01-09 16:22:19 +05:30
Kaustubh Raste
afd2d797eb Fix mips dspr2 idct4x4 functions for large coefficient input
Change-Id: I06730eec80ca81e0b7436d26232465b79f447e89
2017-01-09 15:28:30 +05:30
Kaustubh Raste
c6ccd1e939 Add mips dspr2 vp9 intrapred tests
Change-Id: I6be8c59ee220af0597bc2d7213f2779ac2e88db9
2017-01-09 14:11:57 +05:30
Hui Su
c7e2bd6298 Merge "Add support for VP9 level targeting" 2017-01-07 00:55:41 +00:00
Johann
4dca923454 postproc: vpx_mbpost_proc_across_ip_neon
The speedup is pretty poor. I would be concerned except the SSE2 is
worse:
Existing SSE2 improvement: 22%
New neon improvement: 35%

BUG=webm:1320

Change-Id: Ied598a261134aa6cbe69f96f58589d2bae17bf62
2017-01-06 16:39:17 -08:00
Marco
f1909d26f8 vp9: 1 pass cbr mode: increase threshold for gf_cbr_boost_pct usage.
Increase the boost threshold below which GOLDEN update will use same
rate correction factor as INTER_NORMAL.

Improves performance when gf_cbr_boost_pct is set (between 0 and 100)
in CBR mode.

Change-Id: I9f54cc18664786a100b13a416b7137ae03bd0cab
2017-01-06 15:37:10 -08:00
Jerome Jiang
316071d79c Merge "vp9: Enable more aggresive short circuit for speed 8." 2017-01-06 22:38:40 +00:00
Marco Paniconi
b632626ec0 Merge "vp9: Add some controls to sample encoder: vpx_temporal_svc_encoder" 2017-01-06 22:34:49 +00:00
Jerome Jiang
b87ebd7af8 Merge "vp9: Compute source sad for every superblock when partition copy is on." 2017-01-06 21:57:27 +00:00
Marco
bf5cdbdf9d vp9: Add some controls to sample encoder: vpx_temporal_svc_encoder
Add the gf boost and frame_parallel controls.
Set as default to off.

Change-Id: Id85fcb16a4fae97f51c09e9ebadb5cdcd510c2f5
2017-01-06 11:34:04 -08:00
Jerome Jiang
267e73446c vp9: Enable more aggresive short circuit for speed 8.
Set short_circuit_low_temp_var to 3 for speed 8 for all res.
No strong visual difference on all clips.

Change-Id: Ia6d9a314291ab1c14d5421bbdd769974083aeb2a
2017-01-06 10:23:34 -08:00
hui su
337ad83e58 Add support for VP9 level targeting
Constraints on encoder config:
-target_bandwidth is no larger than 80% of level bitrate limit
-target_bandwidth * (1 + max_over_shoot_pct) is no larger than
88% of level bitrate limit
-min_gf_interval is no smaller than level limit
-tile_columns is no larger than level limit

Constraints on rate control:
-current frame size plus previous three frames' size is no larger
than the CPB level limit
-current frame size is no larger than 50%/40%/20% of the CPB
level limit if it's a key/alt-ref/other frame.

Change-Id: I84d1a2d6d6e3c82bfd533b3309ce999cfaba2c8b
2017-01-06 10:07:31 -08:00
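The encoder-config constraints listed in 337ad83e58 can be sketched as a single check. This is a minimal sketch under stated assumptions, not the libvpx implementation: the struct and field names below are hypothetical, and the overshoot is assumed to be a fraction (0.10 for 10%) so that it matches the `1 + max_over_shoot_pct` form in the message.

```c
#include <stdbool.h>

/* Hypothetical types for illustration; these are not libvpx structures. */
typedef struct {
  double max_bitrate;   /* level bitrate limit */
  int min_gf_interval;  /* smallest golden-frame interval the level allows */
  int max_tile_columns; /* largest tile-column count the level allows */
} LevelLimits;

typedef struct {
  double target_bandwidth;
  double max_over_shoot; /* assumed to be a fraction, e.g. 0.10 for 10% */
  int min_gf_interval;
  int tile_columns;
} EncoderConfig;

/* Returns true when every config constraint from the commit message holds. */
bool config_fits_level(const EncoderConfig *cfg, const LevelLimits *lim) {
  if (cfg->target_bandwidth > 0.80 * lim->max_bitrate) return false;
  if (cfg->target_bandwidth * (1.0 + cfg->max_over_shoot) >
      0.88 * lim->max_bitrate) {
    return false;
  }
  if (cfg->min_gf_interval < lim->min_gf_interval) return false;
  if (cfg->tile_columns > lim->max_tile_columns) return false;
  return true;
}
```

The rate-control constraints (frame size against the CPB limit) are left out of this sketch because the message does not say how the CPB limit is represented.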
Jerome Jiang
afc8c4836f vp9: Compute source sad for every superblock when partition copy is on.
The source sad could be used to copy the partition without going into
choose_partitioning function to speed up vp9 encoding. Computing source
sad takes little time. Speed test on Android and Linux shows little
encoding time gain (less than 1.4%).

Turned off for now since partition copy is turned off.

Change-Id: I61c9d5b8f22329760cb29a4ee30a7f9c232ce8d3
2017-01-06 17:59:02 +00:00
Linfeng Zhang
2d12a52ff0 Merge "Add high bitdepth 8x8 idct NEON intrinsics" 2017-01-06 16:47:23 +00:00
Linfeng Zhang
90f889a56d Merge "Clean DC only idct NEON intrinsics" 2017-01-06 01:16:19 +00:00
Jerome Jiang
72746c079d vp9: Set short circuit to level 3 for VGA for speed 8.
vp9: Set short circuit to level 3 for VGA for speed 8. Also change the
threshold_32x32 to 5/8*thresholds[1] to reduce the quality regression
this causes on VGA clips.

Change-Id: Ia1590e91e7cb22be78d5b85013387bb1be4272e3
2017-01-04 11:28:31 -08:00
Marco Paniconi
1ca1515dd3 Merge "vp9: 1 pass cbr: allow noise estimation down to 360p." 2017-01-04 17:24:08 +00:00
Marco
768b1f7281 vp9: 1 pass cbr: allow noise estimation down to 360p.
Also adjust some thresholds for noise level setting.

Change-Id: I7e03d7057ef2061c9447728deb9c6aff5d3da4b7
2017-01-03 16:26:22 -08:00
Marco
63a8257fb7 vp9: SVC unittests: fix to use y4m source.
Comment out check on buffer underrun, as it currently fails
on some of the svc tests.

Also cast the update of bits_in_buffer_model_, as this can
go negative now due to the buffer underrun.
This fixes the issue in #1352.

BUG=webm:1350
BUG=webm:1352

Change-Id: Ibd4ef23921daf09e5c15b000aca904aa4573599c
2017-01-03 15:29:04 -08:00
Yunqing Wang
99c573f018 Merge "Fix for out of range motion vector bug in joint motion search" 2017-01-03 17:46:15 +00:00
Ranjit Kumar Tulabandu
b67e1f701f Fix for out of range motion vector bug in joint motion search
Clamped the initial mv in vp9_refining_search_8p_c.

BUG=webm:1354

Change-Id: I47d302b350937e3e6e52e95c983b5fb0b4c64fba
2017-01-03 09:12:32 -08:00
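The fix in b67e1f701f clamps the initial motion vector before the refining search starts. A minimal sketch of such a clamp, assuming hypothetical `Mv` and `MvLimits` types (the real libvpx types and the body of vp9_refining_search_8p_c are not shown in the message):

```c
/* Hypothetical types for illustration; not the libvpx MV structures. */
typedef struct {
  int row;
  int col;
} Mv;

typedef struct {
  int row_min, row_max;
  int col_min, col_max;
} MvLimits;

static int clamp_int(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

/* Force the starting vector into the allowed search window in place. */
void clamp_mv(Mv *mv, const MvLimits *lim) {
  mv->row = clamp_int(mv->row, lim->row_min, lim->row_max);
  mv->col = clamp_int(mv->col, lim->col_min, lim->col_max);
}
```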
Yunqing Wang
ecdb6a00c2 Merge "Make sub-pixel mv search's return value consistent with the return type" 2016-12-29 19:16:01 +00:00
Yunqing Wang
c96a8dcb5b Merge "Bug fix to avoid random crashes during ARNR filtering" 2016-12-29 17:24:24 +00:00
Gabriel Marin
e6b9609fc0 Merge "Remove superfluous conditional on 'shortcut'" 2016-12-29 06:03:43 +00:00
Linfeng Zhang
911bb980b1 Clean DC only idct NEON intrinsics
BUG=webm:1301

Change-Id: Iffc83854218460b3f687f3774e71d45b552382a5
2016-12-28 13:51:44 -08:00
Linfeng Zhang
9b187954df Add high bitdepth 8x8 idct NEON intrinsics
BUG=webm:1301

Change-Id: I56e3bc3aab9214e2debac93796389a7194991084
2016-12-27 16:28:53 -08:00
Yunqing Wang
1d12559b09 Make sub-pixel mv search's return value consistent with the return type
For out-of-range cases, returned UINT_MAX instead of INT_MAX in the
sub-pixel mv search to be consistent with the "uint32_t" return type.

Change-Id: I8e206d771228c13d89bafbbe9f14722c8ecc6a7a
2016-12-27 12:08:38 -08:00
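The point of 1d12559b09 is that the out-of-range sentinel should match the unsigned return type. A minimal sketch, assuming a hypothetical `search_cost` function standing in for the sub-pixel search (the name and parameters are illustrative only):

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for a search that returns an error score. */
uint32_t search_cost(int mv_in_range, uint32_t cost) {
  /* UINT_MAX is the largest value of the unsigned type, so no real cost
   * can compare greater than the sentinel. INT_MAX would sit in the
   * middle of the uint32_t range instead. */
  return mv_in_range ? cost : UINT_MAX;
}

/* Callers test against the same sentinel. */
int is_invalid_cost(uint32_t cost) { return cost == UINT_MAX; }
```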
Ranjit Kumar Tulabandu
7cf13826b7 Bug fix to avoid random crashes during ARNR filtering
The function 'vp9_find_best_sub_pixel_tree_pruned_more' is modified
to return INT_MAX for handling invalid MV cases from UINT32_MAX.

yunqingwang:
patch 3: rebased on top of the tree.
patch 4: The return type of vp9_find_best_sub_pixel_tree* was changed
to uint32_t to fix ubsan warnings. Changing UINT_MAX back to INT_MAX
was not quite right. Patch 4 modified vp9_temporal_filter.c to accept
uint32_t.
(Note: Inconsistency exists in vp9_find_best_sub_pixel_tree*, which
will be fixed in a separate CL.)

Change-Id: Ib1a79dc2aa41ea6335c21669c76883cdbb7e0535
2016-12-27 11:20:08 -08:00
Linfeng Zhang
3c47a0dc6f Merge "Clean idct 8x8 neon functions" 2016-12-27 17:59:28 +00:00
James Zern
78a24171a6 Revert "vp9: SVC unittests: fix to use y4m source."
This reverts commit f0b491a524.

This change results in unsigned integer overflows (as reported by
-fsanitize=integer) in datarate_test.cc,
for many of --gtest_filter=VP9/DatarateOnePassCbrSvc.OnePassCbrSvc*:
unsigned integer overflow: 167198 - 185560 cannot be represented in type
'unsigned long'

As the encoder didn't change, but the input did with the change to
(correctly) use Y4mVideoSource, this revert is merely masking the issue.

BUG=webm:1352

Change-Id: Iecd9a6c83b3fca67c566732a5c92d36193cc2060
2016-12-23 14:18:18 -08:00
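The sanitizer report quoted in 78a24171a6 can be reproduced with the same two numbers. Unsigned subtraction is defined to wrap in C, but -fsanitize=integer still flags it; converting to a signed type first keeps the negative value. This sketch uses fixed 64-bit types as an assumption (the report says 'unsigned long'), and the function names are hypothetical:

```c
#include <stdint.h>

/* Unsigned subtraction wraps modulo 2^64 when the result would be negative. */
uint64_t delta_unsigned(uint64_t a, uint64_t b) { return a - b; }

/* Converting to a signed type first keeps the negative result. */
int64_t delta_signed(uint64_t a, uint64_t b) {
  return (int64_t)a - (int64_t)b;
}
```

With the values from the report, `delta_signed(167198, 185560)` is -18362, while `delta_unsigned` wraps to a value just below 2^64.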
Marco Paniconi
36e767c147 Merge "vp9: SVC unittests: fix to use y4m source." 2016-12-22 17:26:42 +00:00
James Zern
90ceaba3e4 libs.mk/stress.sh,curl: set --retry to 1
provide some resilience for transient errors

Change-Id: I8db3d4eb5ef3cccc235a8c4c0052199c0ce23a27
2016-12-22 08:29:15 -05:00
Marco
f0b491a524 vp9: SVC unittests: fix to use y4m source.
Comment out check on buffer underrun, as it currently fails
on some of the svc tests.

BUG=webm:1350

Change-Id: I73c88b800cdcc06bd2f900f7b7e2a5fd08248065
2016-12-21 22:59:35 -08:00
Linfeng Zhang
6d5a3fe583 Clean idct 8x8 neon functions
BUG=webm:1301

Change-Id: I05f47dca1fddc155c8396e627cfccf6449677307
2016-12-21 14:24:17 -08:00
Marco
e7c453b613 vp9: 1 pass vbr: Skip find_predictors in pickmode when source is altref.
When source frame is altref, we only do zero-mv mode, so we can skip
the find_predictors(). No change in compression.
Small speed gain, ~1%.

Only affects 1 pass vbr with lookahead altref, for ytlive with
the macro flag USE_ALTREF_FOR_ONE_PASS on.

Change-Id: I9318c5da8521f017bf54919cd652438b3a6313d1
2016-12-21 12:12:55 -08:00
Marco Paniconi
b5770a2007 Merge "vp9; Fix to unitest for high noise." 2016-12-21 19:38:00 +00:00
Marco
9ba77ed45b vp9; Fix to unitest for high noise.
Source is y4m, and fix comment.

Change-Id: I1eb84977d42dd0f9009c276b56b3fdb03949bfc2
2016-12-21 10:22:34 -08:00
Marco Paniconi
9ba45fa510 Merge "vp9: Add datarate test for denoiser, for high noise case." 2016-12-21 03:56:13 +00:00
Marco
3fcd595dfb vp9: Add datarate test for denoiser, for high noise case.
Also breakout the denoiser tests, as the denoiser only
runs for real-time speed >=5.

Change-Id: I921b785860c35e9d1ebfad0833673a98490186c2
2016-12-20 16:48:25 -08:00
Jerome Jiang
f27276f44f Merge "vp9: Add feature to copy partition from the last frame." 2016-12-20 21:46:44 +00:00
Gabriel Marin
fce163cd54 Remove superfluous conditional on 'shortcut'
Remove superfluous test. Produces a small improvement in instruction scheduling.
Measured a 1% to 1.5% reduction in execution time for routine vp9_optimize_b
with different compilers.

No change in behavior.

TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225

Change-Id: I2bf248d4c25fc0256147d7a8766ff9108ae9cba3
2016-12-20 12:20:21 -08:00
Kaustubh Raste
8a152a55f7 Merge "Add mips msa vp9 intrapred tests" 2016-12-20 02:27:08 +00:00
Jerome Jiang
1d5ca84df6 vp9: Add feature to copy partition from the last frame.
Add feature to copy partition from the last frame.
The copy is only done under certain conditions that SAD is below threshold.
Feature is currently disabled, until threshold is tuned.
Feature will be initially used for Speed 8 (ARM).

Under extreme case of always copying partition for speed 8:
Encode time is reduced by 5.4% on rtc_derf and 7.8% on rtc.
Overall PSNR reduced by 2.1 on rtc_derf and 0.968 on rtc.

Change-Id: I1bcab515af3088e4d60675758f72613c2d3dc7a5
2016-12-19 16:24:03 -08:00
Gabriel Marin
85aead1790 Merge "Simplify address arithmetic in vp9_optimize_b" 2016-12-19 23:25:39 +00:00
James Zern
80474bf65e Merge "vpx_idct32x32_1024_add_neon: quiet uninitialized warning" 2016-12-19 22:39:01 +00:00
Marco Paniconi
c1f5194842 Merge "vp9 denoiser: Fix the logic for re-evaluating zeromv after denoising." 2016-12-19 21:15:37 +00:00
Gabriel Marin
0549f5aae9 Simplify address arithmetic in vp9_optimize_b
Simplify address arithmetic on token_costs to reduce the number of generated
instructions that are used for address arithmetic inside routine
vp9_optimize_b. It also helps improve instruction scheduling depending on
compiler and optimization level.

Measured a 9.3% reduction in retired instructions and 5.3% reduction in
execution time for this routine with GCC v4.8.4 and optimization flags -O3,
and a reduction of up to 11.6% in execution time with other compilers.

No change in behavior.

TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225

Change-Id: I6098650fb5cd2aa04e014fe6e68ca20761f3a21f
2016-12-19 13:10:04 -08:00
James Zern
a68b36c752 vpx_idct32x32_1024_add_neon: quiet uninitialized warning
relocate the assignment to 'in' outside of the for loop. this quiets a
spurious warning in visual studio builds since:
86e340c enable vpx_idct32x32_1024_add_neon in hbd builds

+ give the variable a more descriptive name

BUG=webm:1294

Change-Id: I5c3da5c7939621477e0fc0ad3a1b2a3045c5bffd
2016-12-19 12:49:44 -08:00
Marco
6e8dbc76ad vp9: With denoising on, only estimate noise level for higher resolns.
Allow it for resolns above 640x360 for now.

Change-Id: I087d0d8173f96b316164fdd4a499110ce2e7a233
2016-12-19 10:05:54 -08:00
Marco
61b569b461 vp9 denoiser: Fix the logic for re-evaluating zeromv after denoising.
Correctly set interp_filter to SWITCHABLE for INTRA mode.
Also reduce threshold on noise level for re-evaluating zeromv.

Change-Id: Id32c01e193209fb380aa07204f0be3babf29f70a
2016-12-19 09:30:16 -08:00
Linfeng Zhang
7e23f895ca Merge "Clean hbd idct 4x4 neon functions and other" 2016-12-19 17:09:26 +00:00
Kaustubh Raste
1f3e079a35 Add mips msa vp9 intrapred tests
Change-Id: I49b91464a87cad8692f4b1477e45e5f567b4fe87
2016-12-19 17:32:38 +05:30
Johann Koenig
9b63cb057a Merge "post proc test: add padding for sse2 tests" 2016-12-17 01:12:34 +00:00
Marco Paniconi
d1eca240fb Merge "vp9: Change condition to enable recheck_zeromv_after_denoising." 2016-12-16 23:53:33 +00:00
Marco
4260a7f2b3 vp9: Change condition to enable recheck_zeromv_after_denoising.
For when denoising enabled: change condition to enable
the recheck_zeromv_after_denoising for only very high noise level.
This is causing an issue, so enabling it for very high noise
to effectively shut it off.

Change-Id: Ic40d6025f3f398338cedd270d17c0ccd9a3daa84
2016-12-16 15:00:21 -08:00
Johann
5993b808f0 post proc test: add padding for sse2 tests
Avoid valgrind warnings for reading out of bounds when the width is not
divisible by 16.

Change-Id: I5670d7cfbbce00874b98cfb7472f99c7936c2c47
2016-12-16 14:06:06 -08:00
Johann
4781a67737 postproc test: disable new down and across test
The new test is causing valgrind failures:
[ RUN      ] SSE2/VpxPostProcDownAndAcrossMbRowTest.CheckCvsAssembly/0
==28923== Invalid read of size 16
==28923==    at 0x724016: ??? (deblock_sse2.asm:146)

Disable during investigation. The test is new but the code is not.

Change-Id: I5521e5fd48a595e3798b833bf7e3cc97b81c1975
2016-12-16 12:19:00 -08:00
Jim Bankoski
318a1ff5ec vp8 : use threading mutex's for tsan only.
To avoid decode performance hit of 2% when running on hyperthreaded
cores.

This patch only uses the mutexes when we are running tsan.

This is safe because 32-bit operations like read and store are atomic
on all the platforms we care about. Tsan warns about race situations,
but in either situation here (read occurs before write, or write
before read) the worst case is that we go around one extra time in the
loop, so the ordering doesn't really matter.

That said, a few other things have been tried:

for instance as per here:
webrtc/base/atomicops.h#52

In this patch they use:
__atomic_load_n(i, __ATOMIC_ACQUIRE);
__atomic_store_n(i, value, __ATOMIC_RELEASE);

This code works on gcc and clang (replacing protected write and read) and
avoids tsan errors, incurring no penalty in performance. In C11 it's
replaced by straight atomic operations.

However, there is no equivalent in the Visual Studio versions we support, as
int32 on all Windows platforms is already atomic. To avoid tsan-like
warnings on Windows we'd need to use interlocked exchange, and the
end result doesn't gain us anything.

Change-Id: I2066e3c7f42641ebb23d53feb1f16f23f85bcf59
2016-12-16 08:50:55 -08:00
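The acquire/release alternative mentioned in 318a1ff5ec can be sketched with the two builtins the message quotes. This is a minimal sketch, not the VP8 code: `shared_row` and both function names are hypothetical, and the builtins are GCC/Clang extensions rather than ISO C.

```c
/* Requires GCC or Clang: __atomic_* are compiler builtins, not ISO C. */

/* Hypothetical progress counter shared between two decoder threads. */
static int shared_row = -1;

/* Writer side: release ordering makes earlier writes visible to a reader
 * that observes the new value with an acquire load. */
void publish_row(int row) {
  __atomic_store_n(&shared_row, row, __ATOMIC_RELEASE);
}

/* Reader side: acquire load pairs with the release store above. */
int read_row(void) { return __atomic_load_n(&shared_row, __ATOMIC_ACQUIRE); }
```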
Marco Paniconi
2b1ec65b5d Merge "vp9: Fix to usage of flag USE_ALTREF_FOR_ONE_PASS" 2016-12-15 19:48:16 +00:00
Johann
41b0888a84 postproc: neon down and across macroblock filter
Implement vpx_post_proc_down_and_across_mb_row in NEON.
Runs about 6-7x faster than C.

BUG=webm:1320

Change-Id: Ic5c7d3552a88cfcf999ec5bf2bd46fee460642c2
2016-12-14 15:11:28 -08:00
Marco
5de798f2b2 vp9: Fix to usage of flag USE_ALTREF_FOR_ONE_PASS
The flag USE_ALTREF_FOR_ONE_PASS allows for alt-ref lookahead
in 1 pass vbr (from https://chromium-review.googlesource.com/#/c/365498).
This change is to make sure this macro flag only has effect if
the config flag cpi->oxcf.enable_auto_altef is also on.

No change in ytlive encoding, as USE_ALTREF_FOR_ONE_PASS is not
yet enabled.

Change-Id: I1a69681e4a15c5244581a3dab4587fca08f02e0f
2016-12-14 15:07:38 -08:00
Linfeng Zhang
c8f25fa5c0 Clean hbd idct 4x4 neon functions and other
BUG=webm:1301

Change-Id: I387b7eae716a7df15c691dc6f368b07602df7342
2016-12-14 11:38:28 -08:00
Yaowu Xu
27e1bacdb3 Change order of operation to avoid ubsan warnings
This commit changes the order of operations to avoid left shifts of
negative numbers.

Change-Id: I607c7eb91658c7a5ef397fc1504721d1b10e3dd6
2016-12-14 09:37:14 -08:00
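Left-shifting a negative signed integer is undefined behavior in C, which is what ubsan reports. The message for 27e1bacdb3 does not show the actual expression, so the following is only an assumed illustration of one way to reorder: shift the non-negative magnitude first, then negate. The function name is hypothetical.

```c
/* Assumes magnitude >= 0 and that magnitude << shift does not overflow.
 * Writing (-magnitude) << shift instead would shift a negative value,
 * which is undefined behavior. */
int negated_scaled(int magnitude, int shift) {
  return -(magnitude << shift);
}
```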
Linfeng Zhang
3dd20456ab Merge "Update idct test code to test 8-bit & high bitdepth simultaneously" 2016-12-14 17:05:34 +00:00
Linfeng Zhang
201dcefafe Update idct test code to test 8-bit & high bitdepth simultaneously
Change-Id: Icc0eb9c0ddf2a13ec832877a089450972134e8ec
2016-12-13 17:25:04 -08:00
James Bankoski
3486abd54a Merge "Reapply 'Amend and improve VP8 multithreading implementation'" 2016-12-14 01:21:50 +00:00
James Zern
86e340c76e enable vpx_idct32x32_1024_add_neon in hbd builds
BUG=webm:1294

Change-Id: Ibdda54e6d1303b0f73bc7bc71417e4041d7618de
2016-12-12 19:28:35 -08:00
Jim Bankoski
85a541a421 Reapply 'Amend and improve VP8 multithreading implementation'
Reapply this patch:
ff0107f Amend and improve VP8 multithreading implementation

Amended the patch to add a unit test, and fix an asan error.

BUG=webm:851

Change-Id: I6572c03256169c64e80248bf5a5e99f59a2fc93c
2016-12-13 02:11:34 +00:00
Linfeng Zhang
5d4aa325a6 Cosmetics by unifying dest_stride to stride in idct
Change-Id: Ie9336a808a3c3592bb4fd5d4ad3839028bfcafba
2016-12-12 15:13:22 -08:00
James Bankoski
282f3b3d78 Merge "vp8: adds multithread testing." 2016-12-10 00:01:32 +00:00
Marco Paniconi
817488be47 Merge "vp9: Fix to crash in svc code." 2016-12-09 23:47:02 +00:00
Jim Bankoski
121e161115 vp8: adds multithread testing.
The test is disabled because of TSAN errors until we resolve
BUG=webm:851

Change-Id: I0b21c8d815bc1ea365da024b1e2ee5e1fc5715c2
2016-12-09 15:05:59 -08:00
Johann
2c24f7178d Move load_and_transpose to transpose_neon.h
Allows for use outside the idcts without pulling in idct_neon.h

Change-Id: I4a94c1af3dac3e1b5bc8296ec9eab0ddcc8cfecf
2016-12-09 12:54:55 -08:00
Marco
076d4bd91a vp9: Fix to crash in svc code.
use_base_mv assumes 2x2 scaling, so fix is to shutoff
this feature unless spatial scale factors are 2.

Added svc unittest for 2 spatial layers with 5x5 scaling,
which generates the issue without this fix.

Also fix some settings in svc unittest:
let the speed setting vary (from 5 to 8), and enable static threshold.

BUG=webm:1344

Change-Id: Idfd0a6c633c21b49a0479601506302cfe974e30e
2016-12-09 08:57:09 -08:00
James Zern
7ba9d31e3f Merge "idct16x16_add_neon: fix arm visual studio builds" 2016-12-09 03:19:16 +00:00
Marco
cd6f742980 vp8 multi_res_encoder: Ajust some settings in sample encoder.
Set #threads to default 1 for all streams, change bit allocation
for 3 temporal layers, and enable denoiser on middle resolution layer.

Change-Id: I4a57adbfdb2c319002b8f3cf359613842dc00d75
2016-12-08 15:27:16 -08:00
James Zern
6defef4ab2 idct16x16_add_neon: fix arm visual studio builds
after:
2d3d95f enable vpx_idct16x16_256_add_neon in hbd builds

reorder INCLUDEs and fix indent of IF/ENDIFs

remove vpx_config.asm to avoid multiple symbol definitions in windows
builds and shift idct_neon.asm.S to the top to allow use of
CONFIG_VP9_HIGHBITDEPTH in the export list.

Change-Id: I0dacfbae62a6ec8fe4a26940c1a52da2dfad2029
2016-12-08 15:17:57 -08:00
Yunqing Wang
880adc3355 Merge "Remove an unused first pass statistic" 2016-12-08 22:46:44 +00:00
Yunqing Wang
394020383d Remove an unused first pass statistic
One of the first pass stats "new_mv_count" is no longer used in VP9,
and is removed. This also makes it easy to implement a multi-threaded
first pass. This change doesn't affect the coding performance, which
has been verified by borg tests.

Change-Id: I4c7c7bf9465fda838eb230814ef0c631c068c903
2016-12-07 15:32:25 -08:00
Marco Paniconi
e4c6f8fde7 Merge "vp9: Fix some TODOs in svc code." 2016-12-07 22:06:01 +00:00
Linfeng Zhang
385599b553 Merge "Update TEST_P(PartialIDctTest, RunQuantCheck)" 2016-12-07 21:05:05 +00:00
Linfeng Zhang
174528de1e Merge "Update idct NEON optimization to not use narrowing saturating shift" 2016-12-07 21:03:21 +00:00
Marco
5778a7c9cb vp9: Fix some TODOs in svc code.
Change-Id: Ie9f441245987ade9dab38af69adf4dd1fb38ca3f
2016-12-07 13:02:48 -08:00
James Zern
f16a0a1aa4 Merge "enable vpx_idct16x16_256_add_neon in hbd builds" 2016-12-07 20:26:44 +00:00
Linfeng Zhang
834feffe08 Update TEST_P(PartialIDctTest, RunQuantCheck)
1. Use correct projections when copying real dct/quant outputs.
2. Remove local random number generator and combine loops.
3. Quantization with minimum allowed step sizes instead of maximum.
   This may generate larger inputs.

Change-Id: I154afc26230c894d564671cff4b8fd5485b69598
2016-12-07 11:34:00 -08:00
Marco Paniconi
17c403d0ab Merge "vp9: Adjust the weight factor for segment rate cost for aq-mode=3." 2016-12-07 19:31:13 +00:00
Linfeng Zhang
018a2adcb1 Update idct NEON optimization to not use narrowing saturating shift
Change-Id: Iae517017217dbacd638d40fcfeeb0f4bba7b8b8b
2016-12-07 10:25:09 -08:00
James Zern
2d3d95f7ac enable vpx_idct16x16_256_add_neon in hbd builds
BUG=webm:1294

Change-Id: Ib421c150b0d29dee0a81390a612bf01a4a28cff1
2016-12-06 18:32:21 -08:00
James Zern
228c9940ea Merge changes Ibad079f2,I7858a0a1
* changes:
  enable vpx_idct16x16_10_add_neon in hbd builds
  idct16x16,NEON: rm output_stride from pass1 fns
2016-12-07 01:40:28 +00:00
James Zern
8befcd0089 enable vpx_idct16x16_10_add_neon in hbd builds
BUG=webm:1294

Change-Id: Ibad079f25e673d4f5181961896a8a8333a51e825
2016-12-06 16:09:19 -08:00
James Zern
af9d7aa9fb idct16x16,NEON: rm output_stride from pass1 fns
vpx_idct16x16_256_add_neon_pass1, vpx_idct16x16_10_add_neon:
this was a constant 8 in all cases meaning the results are stored
contiguously, this allows the number of stores to be reduced.

Change-Id: I7858a0a15a284883ef45c13dfd97c308df9ea09e
2016-12-06 15:13:33 -08:00
Linfeng Zhang
cb339d628f Refine 8-bit 8x8 idct NEON intrinsics
Change-Id: I4ec4ad1928ec2ed87f596f52f097bc52065278dd
2016-12-05 17:50:14 -08:00
Marco
360ac89885 vp9: Adjust the weight factor for segment rate cost for aq-mode=3.
Use the segment weight factor based on the target (cr->percent_refresh)
if it is less than the current estimate (average of past usage and target).
Small improvement at low bitrates.

Change-Id: Iba8fd909e203f94458901366d3a991f7ea854d49
2016-12-05 12:42:56 -08:00
Linfeng Zhang
a8eee97b43 Check in vpx_lpf_vertical_4_dual_neon() assembly
This replaces its C version.

Change-Id: Ie39e9324305fdc0fff610ced608a037e44a85a1a
2016-12-02 15:54:30 -08:00
James Zern
a7fa1314da Merge changes I4afc130e,Iaa64d23f
* changes:
  Add high bitdepth 4x4 idct NEON intrinsics
  Update idct x86 intrinsics to not use saturated add and sub
2016-12-02 04:01:28 +00:00
Linfeng Zhang
17a8cf5cc3 Add high bitdepth 4x4 idct NEON intrinsics
Change-Id: I4afc130effa05b8be2e9f982967216b1beb2ce4b
2016-11-30 13:07:13 -08:00
Linfeng Zhang
264f6e70ec Update idct x86 intrinsics to not use saturated add and sub
Change-Id: Iaa64d23fdb45ca1f235b0ea57e614516e548eca4
2016-11-29 17:06:08 -08:00
James Zern
c6641782c3 idct16x16,NEON,cosmetics: normalize fn signatures
+ remove unused parameters from vpx_idct16x16_10_add_neon_pass2

Change-Id: Ie5912a4abdd308fab589380bca054a2e7234a2c4
2016-11-28 16:46:01 -08:00
James Zern
12566c3d0f Merge changes Ide6d3994,I164cfcbe
* changes:
  enable vpx_idct32x32_135_add_neon in hbd builds
  idct_neon: rename load_tran_low_to_s16 -> ...s16q
2016-11-29 00:12:45 +00:00
James Zern
33ddc645ce Merge "build/make/Android.mk: correct rtcd template var refs" 2016-11-28 23:39:37 +00:00
James Bankoski
68991d7f87 Merge "svc_test: fix two warnings" 2016-11-28 22:27:26 +00:00
Jim Bankoski
27b5cc31e6 svc_test: fix two warnings
Use of possibly uninitialized variable and missing test initializer.

Change-Id: I2192c81c39ef4239cc11a309850c0ee8781ef17e
2016-11-28 12:53:39 -08:00
Jerome Jiang
f68cf8ba19 Cosmetic changes to variable names in deblocker tests.
Change kExpectedOutput to expected_output in function parameters in
the deblocker test.

Change-Id: I5baf8d1285ac47922950887406c7aa519ddc512a
2016-11-28 10:08:12 -08:00
James Zern
a58e0b2a74 build/make/Android.mk: correct rtcd template var refs
the expansion of findstring and rtcd_dep_template_CONFIG_ASM_ABIS needs
to be deferred until the block is parsed as makefile syntax rather than
eval time where rtcd_dep_template_CONFIG_ASM_ABIS will be unset. this
ensures vpx_config.asm is properly created.

Change-Id: I7c38c6c082da78397936467482789dd468adc316
2016-11-24 17:55:16 -08:00
James Zern
120234fa17 Merge changes I6b4cd56e,I88f91b92
* changes:
  Android.mk,armv7: fix idct_neon.asm.S creation
  build/make/Android.mk: set/use qexec appropriately
2016-11-24 07:22:04 +00:00
James Zern
21a1abd8e3 enable vpx_idct32x32_135_add_neon in hbd builds
BUG=webm:1294

Change-Id: Ide6d3994fe01c4320c9d143e6d059b49568048e4
2016-11-23 19:59:43 -08:00
James Zern
568d4b1d63 idct_neon: rename load_tran_low_to_s16 -> ...s16q
BUG=webm:1294

Change-Id: I164cfcbe9bc4511d1d04af9206cf351a0ec2957b
2016-11-23 19:57:48 -08:00
James Zern
d757d7e998 Merge changes Icc4ead05,Ib019964b,I3b5fd3b3,Ieedadee2
* changes:
  Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test
  Refine 8-bit 4x4 idct NEON intrinsics
  Add idct speed test.
  Update partial_idct_test.cc to support high bitdepth
2016-11-24 03:31:25 +00:00
Jerome Jiang
f63eb66ecd Merge "Change C/MSA post proc to match SSE2." 2016-11-24 01:56:34 +00:00
Jerome Jiang
95eb505660 Merge "Cover more filter levels in unit tests for post proc." 2016-11-24 01:56:22 +00:00
James Zern
2c598d0858 Android.mk,armv7: fix idct_neon.asm.S creation
force this to be created before any other .S files. this change
additionally removes the file from the source list as it doesn't need to
be compiled on its own.

Change-Id: I6b4cd56ef6059d08f75f06fb749cddf76e0e165e
2016-11-23 16:49:19 -08:00
James Zern
1136db0db0 build/make/Android.mk: set/use qexec appropriately
commands are echo'd when V=1; libs.mk depends on this variable as well

Change-Id: I88f91b9260f16686cfccdf6bd3f29d246521b62e
2016-11-23 16:46:50 -08:00
Marco
d793950ec8 vp9: Adjust cyclic refresh parameters for low bitrates.
Increase the motion threshold and qp-delta for segment#2 boost.
This can increase the frame-drop at low bitrates, but generally
better spatial quality.

Only affects real-time mode with aq-mode=3, at very low bitrates.

Change-Id: I5ccb784667f70d0c27d369806b93b1f93d5605d1
2016-11-23 12:14:28 -08:00
James Zern
af290bfe3b Merge "use storage.googleapis for testdata download" 2016-11-23 19:27:20 +00:00
Jerome Jiang
97ec6291ee Change C/MSA post proc to match SSE2.
BUG=webm:1321

Change-Id: I719023375dc48cf7d8ed72188853f0f1ccc4ad7f
2016-11-23 10:42:11 -08:00
Jerome Jiang
755fb3d4ec Cover more filter levels in unit tests for post proc.
For some filter levels, C/MSA doesn't match SSE2. Some of the unit tests
are disabled. They will be re-enabled when the C/MSA funcs are fixed.
BUG=webm:1321

Change-Id: Ib16b98b5eecb15d2252aa4ea267b782ee2b27533
2016-11-23 10:31:41 -08:00
Marco Paniconi
8b2cbaefcf Merge "vp9: Use more aggressive skip when short_circuit_low_temp_var = 1." 2016-11-23 18:15:58 +00:00
James Zern
22f7aca097 use storage.googleapis for testdata download
replace downloads.webmproject.org with the canonical
storage.googleapis.com/... form. this appears less likely to fail when
dealing with multiple concurrent connections.

Change-Id: I0dcbd04df9e4057fa851f458b3ef7e3589f1f2f1
2016-11-22 23:03:12 -08:00
James Zern
d7f1d60c51 Merge "avoid redefining WIN32_LEAN_AND_MEAN" 2016-11-23 00:43:22 +00:00
James Zern
198a046d3e Merge "vp9,read_inter_block_mode_info: quiet msan warning" 2016-11-23 00:42:24 +00:00
James Zern
cb22359d02 vp9,read_inter_block_mode_info: quiet msan warning
best_sub8x8[1] won't be used meaningfully when is_compound is false, but
may trigger an msan warning as the value is copied around and later
clamped.

BUG=667044

Change-Id: Icc24c3b72cdb550bebea44d4aaa4ff8bf3fbab56
2016-11-22 15:32:00 -08:00
Linfeng Zhang
05e2b5a59f Merge "Add 32x32 d45 and 8x8, 16x16, 32x32 d135 NEON intra prediction" 2016-11-22 23:20:53 +00:00
James Zern
446d1ee624 avoid redefining WIN32_LEAN_AND_MEAN
fixes redef errors when the macro is supplied elsewhere, e.g., the
command line

Change-Id: Ic15726817a43e30595d50562ef1f077060c193cf
2016-11-22 15:15:53 -08:00
Marco
b6597745f9 vp9: Use more aggressive skip when short_circuit_low_temp_var = 1.
Use the same feature as https://chromium-review.googlesource.com/#/c/411327/,
but allow it to be used for speed  = 6 and 7, where
short_circuit_low_temp_var = 1.

Speed up of ~2-3% for speed 7, with little/no loss in compression.

Change-Id: I263a0f261ad9929034392d68f0153dc6376fdb5f
2016-11-22 14:54:28 -08:00
Jerome Jiang
0966757874 Cosmetic changes to post proc unit tests.
Remove unnecessary "virtual" before some functions. Change *_btm_* in
variable names to *_bottom_*.

Change-Id: Ifd4ce667537617f451cdfed47dd8c48817fd983b
2016-11-22 22:28:17 +00:00
James Zern
7d2690e658 Merge "build/make/Android.mk: use -fPIC w/ENABLE_SHARED=1" 2016-11-22 20:14:12 +00:00
Linfeng Zhang
6cc76ec73f Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test
Change-Id: Icc4ead05506797d12bf134e8790443676fef5c10
2016-11-22 11:35:05 -08:00
James Bankoski
4bb01229cd Merge "vp9-tests : split VpxEncoderThreadTest into two tests." 2016-11-22 19:34:16 +00:00
Linfeng Zhang
974e81d184 Refine 8-bit 4x4 idct NEON intrinsics
Change-Id: Ib019964bfcbce7aec57d8c3583127f9354d3c11f
2016-11-22 11:26:03 -08:00
Linfeng Zhang
45876b4550 Add idct speed test.
Change-Id: I3b5fd3b36cac1fb3a93e27fd8fd0781c91d412ce
2016-11-22 11:19:24 -08:00
Linfeng Zhang
d479c9653e Update partial_idct_test.cc to support high bitdepth
BUG=webm:1301

Change-Id: Ieedadee221ce539e39bf806c41331f749f891a3c
2016-11-22 11:11:58 -08:00
Jim Bankoski
719f39f44e vp9-tests : split VpxEncoderThreadTest into two tests.
VpxEncoderThreadTest was taking a very long time for some runs and
timing out a lot. This is an attempt to split the test into runs
that can be run nightly (speeds 2 through 9) and runs that can
be run weekly (speeds 0-1).

Change-Id: Iee6f61a561006d3a30381dd3b52b9a4dce07a70c
2016-11-22 07:31:04 -08:00
Kaustubh Raste
ecc5998bcf Fix mips dspr2 build warning
Change-Id: Ia8fb3ed124f01384e7896e309c9ff22c05b40719
2016-11-22 17:49:17 +05:30
Yaowu Xu
0ffbb36ddc Add validation of frame_parallel_decoding_mode
This is a boolean value that is written into the bitstream; any value other
than 0 or 1 could have led to unexpected behavior. This commit fixes the
issue by adding validation of the value to make sure it is boolean.

BUG=webm:1339

Change-Id: I2d3e69e8dbefcab9a0db9cb39a91a40ce531c5a1
2016-11-21 10:53:25 -08:00
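A minimal sketch of the kind of check described in the commit above, assuming the flag arrives as an unsigned integer from a caller-supplied configuration. The names sketch_status and validate_boolean_flag are hypothetical and are not taken from libvpx; the actual validation in the commit may be structured differently.

```c
/* Hypothetical status codes for this sketch; not libvpx's error enum. */
enum sketch_status { SKETCH_OK = 0, SKETCH_INVALID_PARAM = 1 };

/* A flag that is written to the bitstream as a single bit must be 0 or 1.
 * Anything else is rejected up front instead of being passed through. */
static enum sketch_status validate_boolean_flag(unsigned int value) {
  return (value <= 1u) ? SKETCH_OK : SKETCH_INVALID_PARAM;
}
```

Rejecting the value at configuration time keeps the out-of-range case away from the code that writes the bit.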
Jingning Han
f473e892f7 Merge "Enable asymptotic closed-loop encoding decision" 2016-11-19 04:12:55 +00:00
Kaustubh Raste
a38e9f412d Merge "Fix SingleLargeCoeff idct test" 2016-11-19 03:37:29 +00:00
James Zern
3d55311062 vpx_temporal_svc_encoder.sh: fix comment (// -> #)
Change-Id: Ib13152a9ff523b1c29e8519e4f7ed01ad9874525
2016-11-18 19:11:55 -08:00
James Zern
7317ce8bd4 build/make/Android.mk: use -fPIC w/ENABLE_SHARED=1
fixes reloc errors like:
R_X86_64_PC32
vpx_dsp/x86/deblock_sse2.o:
requires dynamic R_X86_64_PC32 reloc against 'vpx_rv' which may overflow
at runtime

Change-Id: I218fc0e7c8258197f890d395f335e5a4fe82dccb
2016-11-18 18:54:34 -08:00
James Zern
cbeae53e76 Merge "Clean horizontal intra prediction NEON optimization" 2016-11-19 01:29:37 +00:00
James Zern
7adeccb33d Merge "partial_idct_test: s/SingleLargeCoef/SingleExtremeCoeff/" 2016-11-18 20:02:44 +00:00
Jerome Jiang
23f1bfbd85 Merge "Change *_xmm to *_sse2 in deblocker assembly functions." 2016-11-18 00:23:45 +00:00
Jerome Jiang
de5fd00ec5 Change *_xmm to *_sse2 in deblocker assembly functions.
Some cosmetic changes because xmm is an anachronism.

Change-Id: I436a5b78a3c52776c20d6640939311f2a84a9bc7
2016-11-17 23:38:04 +00:00
James Zern
f6921412d4 partial_idct_test: s/SingleLargeCoef/SingleExtremeCoeff/
tests with 'Large' in the name are reserved for slow running tests which
may not be run on all platforms

Change-Id: I2a7d6dd46b29b50469893e46433844132fb727c2
2016-11-17 12:28:57 -08:00
Marco Paniconi
485a49d0b8 Merge "vpx_temporal_svc_encoder.sh: Run all tests for 1-4 threads for vp8/vp9." 2016-11-17 20:22:32 +00:00
Kaustubh Raste
c56e5dd620 Fix SingleLargeCoeff idct test
Updated idct code to handle single large coefficient (-32768)

Change-Id: Ia13ab1ab434a9a1b9954a5914088977a88841cc7
2016-11-17 11:41:07 +00:00
Jerome Jiang
5d48663e04 Merge "Change C and msa to match results from sse2." 2016-11-17 05:16:27 +00:00
Jerome Jiang
cb1b1b8fef Change C and msa to match results from sse2.
Re-enable the tests to check CvsAssembly.
BUG=webm:1321

Change-Id: Id7f7d74b06c469fb6c8f5d04e91359e9cd9097a6
2016-11-16 17:05:26 -08:00
Marco
2ef2243804 vpx_temporal_svc_encoder.sh: Run all tests for 1-4 threads for vp8/vp9.
Change-Id: I079ee87cb32e36a1486c377c0037945b4bb89626
2016-11-16 14:11:25 -08:00
Jim Bankoski
f667cc7a4e stress.sh: Runs multiple libvpx encodes and decodes in parallel
This runs multiple encodes and decodes of vp8 and vp9 in parallel,
with so many threads that problems with synchronization can show up.

Change-Id: I2b297e7f43d1e741323c7ad9f50a3931ae609f16
2016-11-16 06:59:26 -08:00
James Zern
12fe34516e Merge "build/make/Android.mk: fix cpufeatures import" 2016-11-15 23:41:48 +00:00
James Zern
f09c687ea9 Merge changes I3950c883,I2b679b04
* changes:
  partial_idct_test: use <limits> for int16_min/max
  vpx_timer.h,x86.h: define NOMINMAX for windows.h
2016-11-15 23:41:18 +00:00
Linfeng Zhang
011fdec1e6 Merge "Add high bitdepth intra prediction NEON optimization (mode tm)" 2016-11-15 23:30:48 +00:00
Jerome Jiang
4ddae8f524 Merge "vp9: Speed 8: More aggresive golden skip for low res." 2016-11-15 22:50:58 +00:00
Linfeng Zhang
85c1ee434d Add high bitdepth intra prediction NEON optimization (mode tm)
BUG=webm:1316

Change-Id: Ib014de06836ac12726f4a2c9f0833ec4eb4d233b
2016-11-15 14:19:46 -08:00
Jerome Jiang
360217a233 vp9: Speed 8: More aggresive golden skip for low res.
Add a new, more aggressive short circuit: short_circuit_low_temp_var = 3 to skip
golden of any mode when variance is lower than threshold for low res.
This change only affects speed = 8, low resolution.

Metrics for avgPSNR/SSIM on rtc_derf (low resolution) show loss of
0.27/0.31%.
On Nexus 6, the encoding time is reduced by ~2.3% on average across all
low-res clips.

Visually little change on rtc_derf clips.

Change-Id: Ia8f7366fc2d49181a96733a380b4dbd7390246ec
2016-11-15 13:56:27 -08:00
James Zern
2218a4c292 partial_idct_test: use <limits> for int16_min/max
this removes the need for __STDC_LIMIT_MACROS which is defined in
vpx_integer.h, but may be preceded by earlier includes of stdint.h;
fixes build with the r13 ndk

Change-Id: I3950c8837cf90d5584a20ce370ae370581c2182c
2016-11-15 12:18:38 -08:00
James Zern
0412193bb9 vpx_timer.h,x86.h: define NOMINMAX for windows.h
avoids the definition of min/max macros in headers that may appear in
c++ unit tests. the codebase uses VPXMIN/MAX for this purpose in any
case

Change-Id: I2b679b045d64fb34fd8780f704e3caf10a758d82
2016-11-15 12:18:38 -08:00
James Zern
f938ab5e6a build/make/Android.mk: fix cpufeatures import
use 'android/cpufeatures' rather than 'cpufeatures'; this matches the
documentation, fixes compilation with r12b/r13 and still works with
older ndks.

Change-Id: I2f34233c164e6d4d46428f8905d5502cea4288a2
2016-11-14 13:25:51 -08:00
Jerome Jiang
eff68a3a4d vp9: Speed 8: Turn off 4x4avg for low-res non-key frames.
Change only affects speed = 8 for low resolutions.

Metrics for avgPSNR/SSIM on rtc_derf (low resolutions) show loss of
0.5/0.6%.
On Nexus 6, the encoding time is reduced by ~5.9% on average across all
low-res clips.
Visually little/no change on rtc_derf clips.

Change-Id: I68dd50e558d72dcc1af8317d224bfae5e3bd872d
2016-11-14 11:17:14 -08:00
Jingning Han
44f8ee7258 Enable asymptotic closed-loop encoding decision
This commit enables asymptotic closed-loop encoding decision for
the key frame and alternate reference frame. It follows the regular
rate control scheme, but leaves out additional iteration on the
updated frame level probability model. It is enabled for speed 0.

The compression performance is improved:

lowres 0.2%
midres 0.35%
hdres  0.4%

Change-Id: I905ffa057c9a1ef2e90ef87c9723a6cf7dbe67cb
2016-11-14 09:22:55 -08:00
Linfeng Zhang
a3128ad33a Add high bitdepth intra prediction NEON optimization (h and v)
BUG=webm:1316

Change-Id: I47eeac698a98a31d1af5f72441052302e9fa4f46
2016-11-12 12:00:19 -08:00
Jerome Jiang
186dc40e8e Merge "Add unit tests for post proc." 2016-11-12 04:38:27 +00:00
Jerome Jiang
b282048fe4 Add unit tests for post proc.
Some tests are disabled since C and msa don't match sse2.
BUG=webm:1321

Change-Id: I61f303348e5292844a822612f100dbe006489e3e
2016-11-11 15:17:53 -08:00
Marco Paniconi
b6f6169348 Merge "vp9: Adjust thresholds for limiting cyclic refresh for noisy content." 2016-11-11 17:11:19 +00:00
James Zern
ba016b710a Merge "*ppflags.h: remove unused *_DEBUG_* enum values" 2016-11-10 20:53:22 +00:00
James Zern
80f6b243a7 Merge changes I339088b2,Iaade219e,If142afb1,I4257c4b3
* changes:
  fdct8x8_test: add vpx_idct8x8_64_add_neon in hbd
  fdct4x4_test: add vpx_idct4x4_16_add_neon in hbd
  partial_idct_test,NEON: add missing idct variants
  enable vpx_idct32x32_34_add_neon in hbd builds
2016-11-10 05:02:39 +00:00
James Zern
a1c40a2c1a fdct8x8_test: add vpx_idct8x8_64_add_neon in hbd
this was enabled in:
3ae2597 idct,NEON: add a tran_low_t->s16 load adapter

+ enable it for all NEON configs, both intrinsics and assembly versions
exist

BUG=webm:1294

Change-Id: I339088b2a398200f95658d040034fb9b2a7c8ce0
2016-11-09 20:04:27 -08:00
Linfeng Zhang
40ab0424d4 Add high bitdepth intra prediction NEON optimization (mode d45 and d135)
BUG=webm:1316

Change-Id: I6a330874348df04df24a6d9efdc06f567e04bf8e
2016-11-09 12:04:04 -08:00
James Zern
4807f1584c *ppflags.h: remove unused *_DEBUG_* enum values
usage of the vp8 versions was removed in:
3f72509 vp8: remove VP8_SET_DBG* control support

vp9 had the usage stripped even earlier.

Change-Id: I978142eb6492552cd29c9c6feb1e89acfc5f7b84
2016-11-08 21:09:16 -08:00
James Zern
cfbb599335 fdct4x4_test: add vpx_idct4x4_16_add_neon in hbd
this was enabled in:
3ae2597 idct,NEON: add a tran_low_t->s16 load adapter

+ enable it for all NEON configs, both intrinsics and assembly versions
exist

BUG=webm:1294

Change-Id: Iaade219e9d1de7b69423670d3ea6271b0965e068
2016-11-08 18:29:40 -08:00
James Zern
c344dee463 partial_idct_test,NEON: add missing idct variants
idct4x4 and idct8x8 were universally enabled for high-bitdepth builds
in:
3ae2597 idct,NEON: add a tran_low_t->s16 load adapter

BUG=webm:1294

Change-Id: If142afb169c48728cc4b222e7c41aa4a63f95f0f
2016-11-08 18:29:35 -08:00
James Zern
738c8f23c6 enable vpx_idct32x32_34_add_neon in hbd builds
replace load_and_transpose_s16_8x8() in idct32_6_neon() with a separate
load_tran_low_to_s16() and transpose_s16_8x8(). the combined function is
used in idct32_8_neon() where the input is the correctly sized output
from the earlier stage.

BUG=webm:1294

Change-Id: I4257c4b3a421b2cf5d13651f966eee0680ef98a9
2016-11-08 17:03:36 -08:00
Marco
18794d8ddc vp9: Adjust thresholds for limiting cyclic refresh for noisy content.
For noisy content, be more aggressive in skipping some blocks for
delta-qp to reduce noise pulsing artifact. Also treat frame boundary
case when dimension is not multiple of superblock size/64.

Only affects non-screen content case, and when source noise
is measured to be high (at least level kMedium).

Change-Id: Ib13a2a20ed1ce37ff3c44d95c3ef2635fd695222
2016-11-08 15:50:46 -08:00
Johann
f5141ea45f Refine vp8_refining_search_sadx4 targeting
This uses the same sdx4df pointers as vp8_diamond_search_sadx4 and
should therefore target the same optimizations.

See e4ddf9db6a

Change-Id: Ic298e9b25c34bbe6b7a0799509355b0addb56675
2016-11-08 15:22:44 -08:00
Johann
50b40f114c Optimize idct32x32_135_add for NEON
BUG=webm:1295

Change-Id: I7f80ef4d29813fcb401fc6075babf19e3c195462
2016-11-08 22:06:07 +00:00
Linfeng Zhang
64a5a8fd6f Merge "Add high bitdepth intra prediction NEON optimization (mode dc)" 2016-11-08 16:53:42 +00:00
James Zern
3fdfbcb73d Merge "partial_idct_test: set MinSupportedCoeff for NEON" 2016-11-08 01:05:16 +00:00
Johann Koenig
5c64c01c7c Merge "ads2gas: remove RN stanza" 2016-11-08 00:44:17 +00:00
Johann
271de2c9fb ads2gas: remove RN stanza
The matching on ads2gas_apple.pl is too liberal and catches
CONFIG_EXTERNAL_BUILD and CONFIG_INTERNAL_STATS because they have RN in
the names.

The RN renaming feature is not used in any existing assembly files. It
was used in some armv6 files but they were removed.

Change-Id: Ib65abf1947d3e89f0d1584e2a5de399d24008f95
2016-11-07 16:21:16 -08:00
James Zern
40bcb96abd partial_idct_test: set MinSupportedCoeff for NEON
vpx_idct4x4_16_add_neon fails with INT16_MIN, +1 is all right

BUG=webm:1335

Change-Id: I25830c8ab0782822fc3c9db6cc669c2e65f2700e
2016-11-07 15:47:09 -08:00
Linfeng Zhang
d545c19afa Rename vpx_highbd_idct8x8_10{*}() to vpx_highbd_idct8x8_12{*}()
Also update its trigger threshold from 10 to 12.

Change-Id: Ib8dddd87a5a22a12ca66e7084d342fbb027b0a2f
2016-11-07 09:07:55 -08:00
Linfeng Zhang
a9874961f0 Merge "Replace highbd_dct_const_round_shift with dct_const_round_shift" 2016-11-07 16:55:01 +00:00
Johann Koenig
ac495218fb Merge "idct test: use coeff consistently" 2016-11-06 00:13:05 +00:00
Johann Koenig
a139ecd0c9 Merge "partial_idct_test: Add large coefficient test" 2016-11-06 00:13:00 +00:00
Johann Koenig
5d0a271ded Merge "Update vp9_fdct8x8_quant_ssse3 for highbitdepth" 2016-11-06 00:12:13 +00:00
James Zern
6e179dacd0 Merge "vp9-svc: Add unittest for svc-decoding." 2016-11-05 02:47:11 +00:00
Johann
e851160642 idct test: use coeff consistently
Change-Id: I913a13066993a3315a0ff8310b3cad1572d4cdd7
2016-11-04 18:41:59 -07:00
Johann
9ad3e14015 partial_idct_test: Add large coefficient test
Two functions do not pass this test:
vpx_idct8x8_64_add_ssse3
vpx_idct8x8_12_add_ssse3

The test has been modified to avoid triggering an issue with those
functions but they still must be investigated.

BUG=webm:1332

Change-Id: I52569a81e8e6e0b33c4a4d060d0b69c3fc4f578e
2016-11-04 18:37:58 -07:00
Marco
eefc7d1412 vp9-svc: Add unittest for svc-decoding.
To test the VP9_DECODE_SVC_SPATIAL_LAYER decoder control
introduced in 86b0042.

Change-Id: I3d164a41d7bbab14c0aee80fd890870704a18f6e
2016-11-05 01:29:51 +00:00
Johann
e10c95dc83 Update vp9_fdct8x8_quant_ssse3 for highbitdepth
Borrow transition functions from fdct.h nee vpx_quantize_b_sse2

BUG=webm:1304

Change-Id: I9c88c3eec3ff8bb461411d98c26c3c236ea28ef1
2016-11-05 01:23:07 +00:00
Linfeng Zhang
04c3bf3c85 Replace highbd_dct_const_round_shift with dct_const_round_shift
They are identical.

Change-Id: I1ccaf03c81c3cbf88e82d77ffeb8204f5b063c61
2016-11-04 16:15:02 -07:00
Linfeng Zhang
32326c2f13 Merge "Cosmetics of inv_txfm.c" 2016-11-04 22:40:03 +00:00
Johann Koenig
900ec31bea Merge "Extract high bit depth helper functions" 2016-11-04 21:03:17 +00:00
Linfeng Zhang
b68d8107cb Cosmetics of inv_txfm.c
Unify code of 8-bit and high bitdepth.

Change-Id: I3fe441577af0249030ca3a1ef769eb9030711434
2016-11-04 13:24:41 -07:00
Johann
cf35ffc025 Extract high bit depth helper functions
These can be used in the vp9 fdct as well.

Change-Id: I4f3875e0cba1b8cad209c3a0581e121deba7675e
2016-11-04 18:13:51 +00:00
James Zern
232221b83a Merge "configure: disable tools for armv7-win32-vs1[24]" 2016-11-04 17:48:24 +00:00
Martin Storsjo
34c35b6fb6 Add a missing END directive in idct_neon.asm
This fixes building with MS armasm.

Change-Id: I2629eeed859b775ca667a65ba109f8d1bf7b0e03
2016-11-04 12:21:18 +02:00
Martin Storsjo
c559cc6191 Fix producing vcxproj files with *.S arm assembly files
These cases were leftover in
1ddb4c0362.

Change-Id: Ie058fb6c78580e60205c47a1d314bd66e794cde4
2016-11-04 12:21:13 +02:00
James Zern
90a135854c configure: disable tools for armv7-win32-vs1[24]
this shares the same prohibition as the examples

Change-Id: I17d65e4f26847af8cbb1d1a3c4a114ed021a8b9f
2016-11-03 22:54:35 -07:00
Marco Paniconi
cca774c7df Merge "vp9: Non-rd pickmode: fix logic in reference masking." 2016-11-03 23:12:05 +00:00
Marco
86b0042f44 vp9-svc: Add decoder control to decode up to x spatial layers.
Change-Id: I85536473b8722424785c84c5b5520960b4e5744a
2016-11-03 11:18:00 -07:00
Marco
da9f762e24 vp9: Non-rd pickmode: fix logic in reference masking.
Add condition that usable_ref_frame > LAST.
This is to avoid potentially skipping all last-nonzero mv modes,
if golden is used as a reference but skipped completely for the
current block.

This has no effect currently, as we always consider testing golden
mode for each block.

Change-Id: I3182cf44664081935a90ed43aa7b32e710e60e22
2016-11-03 10:32:57 -07:00
Debargha Mukherjee
f93305aa07 Merge "Speed-up recode loop for extreme bitrate diffs" 2016-11-03 17:04:17 +00:00
Jerome Jiang
cb5a2ac920 Merge "pp_filter_test.cc,cosmetics:adjust name convention" 2016-11-03 04:31:35 +00:00
Jerome Jiang
3e961c09be pp_filter_test.cc,cosmetics:adjust name convention
Change-Id: I81b6fc9b83f0febbb12975aef92768bbd273fd61
2016-11-02 13:50:00 -07:00
Linfeng Zhang
1338c71dfb Clean horizontal intra prediction NEON optimization
Change-Id: I1ef0a5b2655cbc7e1cc2a4a1a72e0eed9aa41f05
2016-11-02 11:43:45 -07:00
Linfeng Zhang
1868582e7d Add 32x32 d45 and 8x8, 16x16, 32x32 d135 NEON intra prediction
Change-Id: I852616794244490123eb615ac750da50265f0fa5
2016-11-02 11:40:37 -07:00
Johann Koenig
5ac7a59a05 Merge "arm idct: move to-be-shared code to header" 2016-11-02 18:09:45 +00:00
Paul Wilkins
295cd3b493 Merge "Fixed bug in formatting of debug stats." 2016-11-02 17:10:07 +00:00
paulwilkins
de76d2e315 Fixed bug in formatting of debug stats.
Fixed formatting bug introduced by the fix to BUG=webm:1322
( Iedc4477aef1746aa0a4f84d88a1156296fd3ba87)

Change-Id: I715ee446c0e8584967ab87ba4e355759dd394187
2016-11-02 09:38:18 +00:00
James Zern
1961a92a94 vp9,tile_worker_hook: correctly set jmp target
vp9_init_macroblockd() resets the error_info to cm's global copy; this
needs to be set to the thread-level target to avoid jumping to the
incorrect stack, resulting in hang or crash.
broken since:
1f4a6c8 vp9/tile_worker_hook: add multiple tile decoding
includes v1.5.0, v1.6.0

BUG=629481

Change-Id: Icbf1696b25ba8c479e845fbf227b3c3ca73542f5
2016-11-01 18:45:50 -07:00
Linfeng Zhang
3b74066b10 Add high bitdepth intra prediction NEON optimization (mode dc)
BUG=webm:1316

Change-Id: I984d6004ea2445e86f213fb6fa4d794a9955af8f
2016-11-01 17:07:36 -07:00
Johann
bf8ab194ee arm idct: move to-be-shared code to header
Change-Id: I67458cd358b4dc4434bbdbfcdd571769561b619e
2016-11-01 15:43:56 -07:00
James Zern
1b275ab898 Merge "idct32x32_1_add_neon: clear a couple conv warnings" 2016-11-01 22:34:59 +00:00
James Zern
9de91855ef Merge changes I08af3a54,If5959a25,I6763e62e
* changes:
  build/make/Android.mk: s/armv8/arm64/
  build/make/Android.mk: fix armeabi-v7a build
  use .S suffix rather than .s for NEON asm
2016-11-01 21:43:13 +00:00
Linfeng Zhang
05ee241493 Add high bitdepth intra prediction optimization speed test
BUG=webm:1316

Change-Id: I99feec867d5b8ea06b43cdd3fcd7c90238f5efdb
2016-11-01 13:57:01 -07:00
Linfeng Zhang
0c88014592 Merge "Refine 8-bit intra prediction NEON optimization (mode tm)" 2016-11-01 19:38:07 +00:00
Linfeng Zhang
cc5f49767a Refine 8-bit intra prediction NEON optimization (mode tm)
Change-Id: I98b9577ec51367df5e5d564bedf7c3ea0606de4c
2016-11-01 09:45:16 -07:00
Paul Wilkins
84dcfced5b Merge "Change to KF boost calculation." 2016-11-01 09:29:30 +00:00
James Zern
7625c803b3 idct32x32_1_add_neon: clear a couple conv warnings
int16_t -> uint8_t

Change-Id: I3c5e0985bc3584dce289c35b5973de24cdc73b76
2016-10-31 18:56:34 -07:00
James Zern
2e076ffe50 build/make/Android.mk: s/armv8/arm64/
the configure target is arm64-android-gcc which generates .mk files of
the same form

Change-Id: I08af3a54ef203b1496d185a0f8c8fe702881a173
2016-10-31 18:35:23 -07:00
James Zern
ae32318170 build/make/Android.mk: fix armeabi-v7a build
vpx_config.asm and idct_neon.asm.S are required since:
3ae2597 idct,NEON: add a tran_low_t->s16 load adapter

Change-Id: If5959a25edb370dd7dcdca71c96e9a5aad0840ce
2016-10-31 18:34:16 -07:00
James Zern
1ddb4c0362 use .S suffix rather than .s for NEON asm
for compatibility with other build systems

Change-Id: I6763e62e3126850ad4f8ad29e388b8dad0bbc4c3
2016-10-31 16:39:05 -07:00
Marco Paniconi
7cf0c000cf Merge "vp9-svc: Fix some stats in vp9_spatial_svc_encoder." 2016-10-31 23:16:36 +00:00
Marco
41ad80f69d vp9-svc: Fix some stats in vp9_spatial_svc_encoder.
Correction to rate control stats output under -rcstat.

Change-Id: I46fa5d2a66ed657121ee3d685608a148bb9a7bb3
2016-10-31 15:24:13 -07:00
James Zern
410d947c5f Merge "idct,NEON: add a tran_low_t->s16 load adapter" 2016-10-31 21:59:12 +00:00
Peter Boström
11b099ea46 Merge "Add vp9_spatial_svc_encoder to .gitignore." 2016-10-31 20:01:11 +00:00
Linfeng Zhang
cde5d5db13 Merge "Refine 8-bit intra prediction NEON optimization (mode h and v)" 2016-10-31 19:57:23 +00:00
Peter Boström
39bcb49909 Add vp9_spatial_svc_encoder to .gitignore.
Change-Id: I3c90d657cca533264dd62bb7749c53a862d0352a
2016-10-31 14:55:56 -04:00
Marco Paniconi
702b3e1ee5 Merge "vp9-svc: Add checks to layer bitrates in vp9_spatial_svc_encoder." 2016-10-31 18:23:14 +00:00
James Zern
3ae25974fd idct,NEON: add a tran_low_t->s16 load adapter
enable idct4x4* and idct8x8* which are compatible for 8-bit decodes in
high-bitdepth mode. the adapter narrows 32-bit input to 16, whether the
expansion can be avoided at all in this case remains a TODO. roughly
matches sse2.

BUG=webm:1294

Change-Id: I3ea94e5a2070dfd509b5de0c555aab4e1f4da036
2016-10-31 11:21:16 -07:00
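The narrowing step described in the commit above can be sketched in plain C, assuming a 32-bit coefficient type and inputs that already fit in 16 bits (the 8-bit decode case the message mentions). The type tran_low_sketch_t and the function load_narrow_to_s16 are hypothetical stand-ins; the actual adapter is written with NEON intrinsics and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit coefficient type in a
 * high-bitdepth build. */
typedef int32_t tran_low_sketch_t;

/* Copies n coefficients, narrowing each from 32 to 16 bits.
 * Assumption: every input value is within the int16_t range, so the
 * cast does not change the value. */
static void load_narrow_to_s16(const tran_low_sketch_t *in, int16_t *out,
                               size_t n) {
  assert(in != NULL && out != NULL);
  for (size_t i = 0; i < n; ++i) {
    out[i] = (int16_t)in[i];
  }
}
```

With the coefficients narrowed this way, a 16-bit transform routine can consume them unchanged, which is the reuse the commit message describes.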
Linfeng Zhang
a347118f3c Refine 8-bit intra prediction NEON optimization (mode h and v)
Change-Id: I45e1454c3a85e081bfa14386e0248f57e2a91854
2016-10-31 10:33:44 -07:00
Marco
e1cdb50298 vp9-svc: Add checks to layer bitrates in vp9_spatial_svc_encoder.
Add some checks to the layer bitrates passed in through the command line.

Change-Id: I16f270035a6034d63313fe3019aa90dca9a3eefb
2016-10-31 10:07:24 -07:00
James Bankoski
fb9fef83c7 Merge "vpxdec.c : don't double count corrupted frames" 2016-10-31 13:58:15 +00:00
Jim Bankoski
30f3017697 vpxdec.c : don't double count corrupted frames
A past patch made it so that every frame that had a decode error
caused a corrupted frame to be counted.  Unfortunately it was possible
to get both a decode error and a corrupt frame for the same frame
and thus double count an error. This code makes that impossible.

Change-Id: Iea973727422a3bf093ffda72fa358a285736048b
2016-10-31 06:09:58 -07:00
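One way to picture the fix described above is a counter that is incremented at most once per frame, whichever failure was seen. This is only a sketch under that assumption; frame_failure_count is a hypothetical helper and does not mirror the actual vpxdec.c code.

```c
/* Returns how much to add to the corrupted-frame counter for one frame.
 * A frame that both failed to decode and was flagged corrupt still
 * contributes exactly 1, so it cannot be counted twice. */
static int frame_failure_count(int decode_failed, int frame_corrupted) {
  return (decode_failed || frame_corrupted) ? 1 : 0;
}
```

Incrementing separately in the decode-error path and in the corrupt-frame path would add 2 for the same frame, which is the double count the commit message refers to.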
James Zern
086aab7e13 tiny_ssim: fix visual studio build
s/inttypes.h/vpx_integer.h/
clear a uint64_t -> double conversion warning

Change-Id: I58d108b083787a754152eb79ef6df61c2c5f95b1
2016-10-29 13:04:07 -07:00
Peter Boström
ae206924a6 Merge "Add temporal-layer support to tiny_ssim." 2016-10-29 01:25:01 +00:00
Johann Koenig
5724e8c4c7 Merge "partial_idct_test: add _add_ test" 2016-10-29 00:34:16 +00:00
Peter Boström
fd4efde489 Add temporal-layer support to tiny_ssim.
Permits skipping 0, 1/2 or 3/4 of the frames, corresponding to
temporal layers 2, 1 and 0 of a 3-temporal-layer encoding. 1/2
corresponds to TL0 in a 2-layer encoding.

Change-Id: I7f6d131f63707e5262fc67d111bfb3a751ede90d
2016-10-28 14:56:05 -04:00
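The skip fractions in the commit above (0, 1/2, 3/4) can be illustrated with a small sketch. It assumes a dyadic 3-temporal-layer pattern in which frame 0 belongs to the base layer; frame_in_layer and count_kept_frames are hypothetical names, not functions from tools/tiny_ssim.

```c
#include <assert.h>

/* Returns 1 if the frame is decodable when only temporal layers
 * 0..layer are kept. Assumption: layer 2 keeps every frame, layer 1
 * every 2nd frame, layer 0 every 4th frame. */
static int frame_in_layer(int frame_index, int layer) {
  assert(layer >= 0 && layer <= 2);
  const int period = 1 << (2 - layer);
  return frame_index % period == 0;
}

/* Counts the frames that would be compared for a given layer. */
static int count_kept_frames(int num_frames, int layer) {
  int kept = 0;
  for (int i = 0; i < num_frames; ++i) {
    kept += frame_in_layer(i, layer);
  }
  return kept;
}
```

Over 8 frames this keeps 8, 4 and 2 frames for layers 2, 1 and 0, that is, it skips 0, 1/2 and 3/4 of the frames.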
Marco Paniconi
7042137e60 Merge "vp9: Updates to SVC sample encoder." 2016-10-28 18:32:41 +00:00
Marco
a8fdb3926e vp9: Updates to SVC sample encoder.
Allow for passing in the layer bitrates at command line.
Fix to allow passing in bitrate for each spatial-temporal layer.

Change to some default values for 1 pass cbr mode:
spatial scale and qp-max/min.

Small fixes to some build warnings.

Change-Id: I3f9a776262712480a6570bb863a835b2fc49935a
2016-10-28 10:59:58 -07:00
Yaowu Xu
9205f54744 Merge "Add tools/tiny_ssim for generating SSIM/PSNR." 2016-10-28 17:21:48 +00:00
Peter Boström
7c75cae74a Add tools/tiny_ssim for generating SSIM/PSNR.
Change-Id: Icc3e5aaa6636ffe17dc9da5f7a80afaccbde509a
2016-10-28 12:39:49 -04:00
Linfeng Zhang
4d305dab34 Merge "Refine 8-bit intra prediction NEON optimization (mode d45 and d135)" 2016-10-28 15:58:01 +00:00
Paul Wilkins
715c65914b Change to KF boost calculation.
This change is a step in a larger change to the way boost and interval are
determined for ARF and Key frames.

This patch contains some plumbing for the general case but focuses on the
key frame boost calculation. This now relies more heavily on the rate at
which the error score increases between the primary and secondary reference
frame. This seems to be less fragile when dealing with different frame sizes.
For example larger image formats tend in the first pass to see a higher
% of intra coded blocks and the use of this number in calculating the frame
decay factor was leading to much lower boost numbers for 4K, for example,
than the same clip coded at 2K.

This change does give overall gains but they are MUCH larger for the 4K Netflix
set. For the 4K Netflix set the average gain is around 3% with some clips > 20%
whereas for the same set at 2K the average gain is 0.5-1%.

In general for small image formats the boost is most often reduced a little whereas
for 4K clips the boost is increased. There are some negative cases such as Akiyo at 352x288
where the reduced boost hurts the metrics, especially for SSIM, even while
the set as a whole improves. This is most notable at very low Q and may be the
subject of a future patch.

Some common code for KF and ARF was separated in this patch for the purposes of
tuning but may later be re-merged if appropriate.

Change-Id: Iaa15ac5a58d2be89181100d95cef6a8dc4b12d0d
2016-10-28 15:35:59 +01:00
Debargha Mukherjee
4f7a59c802 Merge "Force recode if framesize exceeds max allowed size" 2016-10-28 04:21:44 +00:00
Linfeng Zhang
4ae9f5c092 Refine 8-bit intra prediction NEON optimization (mode d45 and d135)
dst += stride behaving better with gcc/clang.
Unroll loops.

Change-Id: I83f85df2bc9f17c6159542f57680b509395db2b1
2016-10-27 14:24:50 -07:00
Debargha Mukherjee
1cd987d922 Speed-up recode loop for extreme bitrate diffs
Adjusts the q adjustment step depending on how far the
projected and target rates differ.

Change-Id: I498d03523ca233a270512ca3972c372daa4ca2a8
2016-10-27 11:08:44 -07:00
Debargha Mukherjee
54e03017b6 Force recode if framesize exceeds max allowed size
Fixes a case where recode is not triggered based on the value
of maxq passed into the recode loop test function.

BUG=b/32375284

Change-Id: I15ad985d0525c68e0443cfaf842440d2754b2266
2016-10-27 09:52:51 -07:00
Johann Koenig
4555c50ecd Merge "partial_idct_test: consolidate block size" 2016-10-27 15:41:44 +00:00
Paul Wilkins
aadfde4687 Merge "Changes to KF boost calculation." 2016-10-27 10:21:23 +00:00
Paul Wilkins
02deeea447 Merge "Removal of a couple of two pass adjustments." 2016-10-27 10:21:06 +00:00
Johann
7994dba6c0 partial_idct_test: add _add_ test
The result of the transform is added to the destination buffers. In the
existing tests the destination buffer is always empty so that portion of
the code was never exercised.

Change-Id: I1858c4fed2274f1b9faf834d2ba4186a4510492a
2016-10-26 21:35:49 -07:00
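Why an empty destination buffer hides the add step can be shown with a per-pixel sketch. It assumes 8-bit pixels and that the sum is clamped to 0..255; the clamp and the name clip_pixel_add_sketch are assumptions of this illustration, since the commit message only states that the transform result is added to the destination.

```c
#include <stdint.h>

/* Adds a signed transform residual to an existing 8-bit pixel and
 * clamps the result to the valid pixel range (assumed 0..255). */
static uint8_t clip_pixel_add_sketch(uint8_t dest, int residual) {
  const int sum = (int)dest + residual;
  if (sum < 0) return 0;
  if (sum > 255) return 255;
  return (uint8_t)sum;
}
```

With dest fixed at 0 the result is just the clamped residual, so an implementation that stored the residual instead of adding it would still pass. A non-zero dest is what makes the addition observable.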
Johann
ed2c240538 partial_idct_test: consolidate block size
Use *input_block_ for sizeof() calculation like the other test

Change-Id: I1e4bd227131662056405af78c5052ad6ef769e9f
2016-10-26 21:35:03 -07:00
Johann
08e0da30ca Refactor partial idct test
Switch to using correctly sized inputs and outputs. This simplifies
adding tests with varying strides.

Change-Id: I716a0d8173dcf6a86d56656ac9d3101b7ec27642
2016-10-26 12:28:18 -07:00
Paul Wilkins
de859676dd Changes to KF boost calculation.
Remove double counting of decay. Limit maximum KF boost.

Change-Id: I0fb2344d0f78b5e95bb899dfad12b0ca84034b2c
2016-10-26 17:53:29 +01:00
paulwilkins
ccd6a8e2fa Removal of a couple of two pass adjustments.
Removed a couple of adjustments that no longer move the needle
much but complicate the process of tuning.

Change-Id: Ie320f5cf155e6aac14a4757ea9ada2cd59f27590
2016-10-26 17:52:37 +01:00
Linfeng Zhang
9c0680bd43 Merge "Refine 8-bit intra prediction NEON optimization (mode dc)" 2016-10-26 16:51:44 +00:00
Johann
9720b58aac Optimize idct32x32_34_add for NEON
Approximately 3 times faster than the 1024 version which was used
previously.

BUG=webm:1295

Change-Id: Id15fb3d096029ec38ef01c53e5f6eb08254347c9
2016-10-25 15:43:58 -07:00
James Zern
98ffc49204 Merge "Update vp9_intrapred_test.cc to support 8-bit" 2016-10-25 21:59:29 +00:00
Yunqing Wang
4b8b1bae52 Merge "Modify the encoder multi-thread unit test" 2016-10-25 20:54:26 +00:00
Yunqing Wang
c327b3f0b0 Modify the encoder multi-thread unit test
Modified the encoder multi-thread test so that it included cpu-used=0 and
frame-parallel=0.

frame_parallel_decoding_mode is 1 by default, which disables probability
updating and gives lower encoding quality. Current VP9 multi-threading
encoder and decoder support probability updating. To test this part, we
should turn it on in the unit test, namely, setting frame-parallel to 0.

Change-Id: Ia1f86e01f0de628f50d819ae31509de3e1b6c755
2016-10-25 11:35:01 -07:00
James Bankoski
f53d3363ac Merge "vpxdec: return fail if frame fails to decode." 2016-10-25 18:34:07 +00:00
Yunqing Wang
c192def8f3 Change 2 motion search counts to be tile data
This patch modified the motion search counts used in:
https://chromium-review.googlesource.com/#/c/305640/

These 2 counts were originally added as thread data, and used to
make decisions in motion search. The tile encoding order can be
inconsistent while using different number of threads, which can
cause bitstream mismatch. Here moved them to tile data to solve
the issue.

BUG=webm:1322

Change-Id: Iedc4477aef1746aa0a4f84d88a1156296fd3ba87
2016-10-25 10:12:41 -07:00
Linfeng Zhang
ce88b8f5c5 Refine 8-bit intra prediction NEON optimization (mode dc)
dst += stride behaving better with gcc/clang
Expanding inline function dc_SIZExSIZE() saves instructions for
vpx_dc_predictor_SIZExSIZE_neon().

Change-Id: Id0ccbd58b6a31df539141fd33bdf28633339150d
2016-10-24 13:18:51 -07:00
Linfeng Zhang
d1c74c149b Update vp9_intrapred_test.cc to support 8-bit
BUG=webm:1316

Change-Id: Ic9309bbeeef52e9d07fb4a4c95c12efa813cbf8c
2016-10-24 13:13:55 -07:00
Jim Bankoski
7ef094c02f vpxdec: return fail if frame fails to decode.
A failure to decode is most likely equivalent to a corrupt
frame for the purpose of returning a failure.

Change-Id: Ie53db2b8130b40b725841f5f7a299d63aa56913d
2016-10-24 12:05:59 -07:00
James Zern
2e6a1976a0 Merge "remove idct32x32*_add_neon.asm" 2016-10-22 02:29:56 +00:00
James Zern
5d91752a98 Merge "vpx_highbd_convolve_copy_neon: use multi reg loads" 2016-10-22 02:28:15 +00:00
Vignesh Venkatasubramanian
9a032fa262 Merge "vp9_bitstream: Encode tiles in parallel" 2016-10-22 02:23:06 +00:00
Vignesh Venkatasubramanian
5deffa1175 vp9_bitstream: Encode tiles in parallel
Re-use the tile worker threads to pack the bitstream in parallel
on a per-tile basis.  Restricting this to real-time only for now
(further testing is needed to ensure this does not make 2-pass
worse in any case).

BUG=webm:1309

Change-Id: I8a80da7c5089b837d0df79a5c49d5e3022dfc8ec
2016-10-21 17:35:03 -07:00
Marco
ee1b3f34c0 vp9: Nonrd variance partition: increase threshold for using 4x4 avg.
In variance partition low resolutions may use variance based on
4x4 average for better partitioning.
Increase the threshold for doing this at speed = 8.

Improves speed by ~5%, with little loss, < 1%, on RTC_derf set.

Change-Id: Ib5ec420832ccff887a06cb5e1d2c73199b093941
2016-10-21 11:51:06 -07:00
James Zern
9dbb3ad396 remove idct32x32*_add_neon.asm
the intrinsics are neutral to ~20% faster on cros/android
devices when using gcc-4.9/clang-3.8.1 and gcc-4.9/clang-3.8.x from the
r13 ndk. neutral results typically came with gcc-4.9 while larger
positive gains were achieved with clang 3.8.x.

BUG=webm:1303

Change-Id: I4d31f9c017944681b881493525d4573a7a5b1e16
2016-10-20 19:47:14 -07:00
Marco
a7d116aa67 vp9: Speed=8 real-time: Keep the bias_golden feature on.
Small/no change in metrics on RTC set, speed increase by 2-3%.

Change-Id: Iee997bd7433e8e508216e9267b1c31c5a9aa5121
2016-10-20 17:03:51 -07:00
Marco Paniconi
32e63efcfb Merge "vp8: Apply gf target-size boost only when refresh_golden_frame = 1." 2016-10-20 22:38:55 +00:00
Vignesh Venkatasubramanian
83ca63582a Add vp9cx_set_ref to .gitignore
Get rid of the 'git status' clutter when building with examples.

Change-Id: I20b715ddfc6c8ccb4993de7ebb2b4ad6df9ea437
2016-10-20 12:07:30 -07:00
Marco
9fdae93858 vp8: Apply gf target-size boost only when refresh_golden_frame = 1.
Change only affects 1 pass cbr, error resilience off.

Change-Id: I68b896b09d722995a71c44331233e97bd862bcfc
2016-10-20 11:32:29 -07:00
James Zern
995a967f19 Merge "third_party: roll libwebm snapshot" 2016-10-19 22:34:25 +00:00
Marco
9624964832 vp8: Adjust threshold to set the gf_noboost flag.
Change only affects 1 pass cbr, with error_resilient off.

Change-Id: Ibf254d8772fa2a8f188c9932d37b2f42362d8003
2016-10-19 12:55:37 -07:00
Marco
ff38b8dfae vp8: Add control for gf boost for 1 pass cbr.
Control already exists for vp9, adding it to vp8.
Usage is only when error_resilient is off.
Added a datarate unittest for non-zero boost.

Change-Id: I4296055ebe2f4f048e8210f344531f6486ac9e35
2016-10-19 09:43:53 -07:00
James Zern
7f31bfeddb Revert "vp9_bitstream: Encode tiles in parallel"
This reverts commit 9e8efa5b18.

this change causes ubsan warnings, failures in
vpxenc_vp9_webm_rt_multithread_tiled

BUG=webm:1309

Change-Id: I020c7be985c771bfff4b3de1afe51cc8edb980da
2016-10-18 22:47:48 -07:00
James Zern
68833c7f85 third_party: roll libwebm snapshot
git log --no-merges 32d5ac4..9732ae9
9732ae9 EbmlElementSize: quiet uint64->int32 conv warning
da04eba SetProjectionPrivate: quiet uint64->size_t conv warning
6db32d5 mkvparser,Projection::Parse: fix int->bool conv
3bb0dfa cosmetics: fix a couple lint warnings
0e179d6 update .clang-format
fc5f88d Fix temp files being left on system.
c04a134 Add support for overriding PixelWidth and PixelHeight.
c0160e0 Add support to explicitly set segment duration.
02bc809 Add support to estimate file duration.
c97e3e7 Add support to output sub-sample encryption information.
26f4344 MakeUID: quiet unused param warning in Android builds
d6af52a Change check to fix compile error.
1720020 webm_parser: Add Mesh value for ProjectionType
78f2c5a webm_parser: Use ./ prefix for includes
da62f65 webm_parser: Remove webm/ prefix from public includes
e15e8f2 webm_parser: Update README build instructions
5023f2b mkvmuxer: Fix Colour::Valid()
cf16204 mkvmuxer_tests: Actually test cue points in the cue point test.
93e9fb3 Validate Colour element values.
8036925 mkvparser_tests: Add Projection element test.
f52d38c mkvparser_tests: Add Colour element test.
826436a mkvparser: minor SeekHead::Entry clean up.
24fb44a mkvmuxer_tests: Add Projection element test.
1e0a8ea mkvmuxer_tests: Add Colour element test.
0278616 mkvmuxer: Colour accessors/mutators.
2346f8f Add mkvparser wrapper functions.
54d6b6b webm_info: Add Projection element support.
65fee06 mkvmuxer_sample: Add support for Projection element.
9a3f2b5 mkvparser_sample: Add support for Projection element.
41e814a mkvparser: Add Projection element support.
483a0ff mkvmuxer: Add Projection element support.
676a713 Add support for the Projection element
725f362 mkvmuxer: Fix memory leak when Colour is set multiple times.
fa182de mkvparser_sample: Add output of audio track codec private size.
8f521f2 mkvparser_tests: Add invalid BlockGroup test.
39137d7 Remove docs saying binary elements default to 0
80685d3 Do not skip over unknown elements at the root level
c147504 Fix legacy Makefile.
58711e8 mkvparser_sample: Fix version info string.
837746f mkvparser_tests: Add invalid block test.
207cd80 Disambiguate sample sources and targets.
a112d71 mkvparser_tests: Refactor invalid file loading code.
5dea33e Disambiguate test source and target names.
125049e parser_tests: Add another truncated chapter string test.
1de8d4c parser_tests: Add truncated chapter string test.
ff8c2b6 parser_tests: Move cue validation to test_util.
4b0690f parser_tests: Add invalid lacing test.
9828e39 mkvmuxer: Set default doc type version to 4.
5495a59 webm_parser: Reference more files in CMakeLists.txt.
0c0ecd0 vpxpes_parser: Add start code emulation prevention support.
639a4bc webm2pes: Remove debug printfs().
9a51102 webm2pes: fflush() in the correct conversion function.
dc7f155 webm2pes: Track total bytes written.
d518128 webm_parser: Enable usage of werror.
e1fe762 webm2pes: Add test for mux/demux of large input.
1b24a79 vpxpes_parser: Read and store PTS when present.
6cf0a0f vpxpes_parser: Store frame payloads.
25d2602 webm_parser: Convert style to match the rest of libwebm
24be76d webm2pes: Replace VpxFrame with VideoFrame.
b451c3b Add a basic video frame storage class.
05c90eb libwebm_util: Clarify error text in superframe parser.
e6415af webm2pes: Make WritePesPacket() a public method.
8f840dd webm2pes: Move frame read out of PES packet write method.
448af97 webm2pes: Restore frame fragmentation support.
f8bb714 cmake: Integrate new parsing API and tests.
cb8ce0b Add a new incremental parsing API
900d322 vpxpes_parser/webm2pes: BCMV and PTS fixes.
4b73545 webm2pes: Add start code emulation prevention.
82903f3 Add column tiles and frame parallel to webm_info
5d91edf style_clean_up: Remove unnecessary parentheses
a95aa4b vp9_level_stats: correct total_uncompressed_bits_ calculation
f46566f mkvreader: Fix shorten-64-to-32 warning in 32 bit builds.
76630ca mkvwriter: Fix shorten-64-to-32 warning in 32 bit builds.
a8ffbd4 webm2pes: Fix format specifier warnings.
faf89d4 Add MaxLumaSampleRate grace percent to stats.
d31e6c9 Fix profile 2 in vp9_header_parser.
bd3ab3a Add flag to estimate last frame's duration to stats.
c182ed9 Fix lint issue in hdr_util.h
cc62ecd Add test for Cluster memory leak
196708a Change MaxLumaSampleRate to be based on frame resolution.
cbd676b mkvmuxer: Fix leak when a Cluster isn't finalized
9a235e0 mkvmuxer: Set doctype to matroska when muxing non-WebM codecs.
47f2843 Add parsing support for new features in CodecPrivate.
e3c9576 Add VP9 level output to webm_info.
5cf549f cmake: Log compiler flag at check time.
bbaaf2d Add class to gather VP9 level stats.
8bb68c2 Add file to parse data from VP9 frames.
296429a Add support to parse VP9 profile.
df3412f Add support for setting VP9 profile and level to sample_muxer.
87832d4 mkvmuxer: Fix Segment::Finalize in kLive mode
6df3e56 mkvmuxerutil.hpp: Add using directives for overloaded size utils.
ec47928 mkvmuxerutil: Revert to using mkvmuxertypes.
a1dc4f2 Fix parsing of VP9 level.
4e3d037 Add support to output Colour elements to webm_info.
d3656fd muxer_tests: ignore iwyu re gtest-message.h
e76dd5e Fix file name in mkvmuxertypes shim.
1be5889 Add temporary include shims at old file locations.
039df94 Add TEST_TMPDIR environment variable

Change-Id: I84bc1401b0aad71ad6727b687f1bede9953a7a08
2016-10-18 18:11:36 -07:00
James Zern
a60dd5c83a Merge "Fix warnings reported by -Wshadow: Part1: vpx_dsp directory" 2016-10-18 22:09:29 +00:00
James Zern
53d8ff6f14 Merge "Revert "third_party: Roll libwebm snapshot."" 2016-10-18 20:06:48 +00:00
Kaustubh Raste
8ff5af773a Merge "Optimize sad_64width_x4d_msa function" 2016-10-18 07:46:02 +00:00
James Zern
171e2ccf99 Revert "third_party: Roll libwebm snapshot."
This reverts commit 808a560be6.

causes build warnings under visual studio

Change-Id: I2e49a75d72469f316e8b01929b783e6f727f756c
2016-10-17 23:24:47 -07:00
Kaustubh Raste
b7310e2aff Optimize sad_64width_x4d_msa function
Reduced HADD_UH_U32 macro calls

Change-Id: Ie089b9a443de516646b46e8f72156aa826ca8cfa
2016-10-18 04:05:33 +00:00
Urvang Joshi
e084e05484 Fix warnings reported by -Wshadow: Part1: vpx_dsp directory
While we are at it:
- Rename some variables to more meaningful names
- Reuse some common consts from a header instead of redefining them.

Change-Id: I75c4248cb75aa54c52111686f139b096dc119328
(cherry picked from aomedia 09eea21)
2016-10-17 19:25:19 -07:00
James Zern
68cd3052ca vpx_highbd_convolve_copy_neon: use multi reg loads
for copy16/32/64

BUG=webm:1299

Change-Id: I5080d736bde7e487c80ef3d7024dda1e96a57eaf
2016-10-17 17:15:03 -07:00
Marco Paniconi
f6980ca68e Merge "vp9: Non-rd variance partition: add condition for 64x64 split." 2016-10-18 00:03:17 +00:00
Linfeng Zhang
b0cc8d5cc6 Merge "add vpx high bitdepth convolve8 NEON intrinsics optimization" 2016-10-17 23:57:14 +00:00
Linfeng Zhang
9c8981c666 add vpx high bitdepth convolve8 NEON intrinsics optimization
BUG=webm:1299

Change-Id: I236bfa0441e357b6ff05add8269a2cfb543924d1
2016-10-17 15:23:54 -07:00
Marco
55a2b67368 vp9: Non-rd variance partition: add condition for 64x64 split.
Add stronger condition for splitting 64x64, for low noise content.
This reduces dragging artifact near moving head.

Little/no change in metrics on RTC set.

Change-Id: I39b38cfd20f2ece53ff49c2aaf76ba9f82761be1
2016-10-17 12:54:27 -07:00
Frank Galligan
808a560be6 third_party: Roll libwebm snapshot.
fc5f88d Fix temp files being left on system.
c04a134 Add support for overriding PixelWidth and PixelHeight.
c0160e0 Add support to explicitly set segment duration.
02bc809 Add support to estimate file duration.
c97e3e7 Add support to output sub-sample encryption information.
26f4344 MakeUID: quiet unused param warning in Android builds
d6af52a Change check to fix compile error.
1720020 webm_parser: Add Mesh value for ProjectionType
78f2c5a webm_parser: Use ./ prefix for includes
da62f65 webm_parser: Remove webm/ prefix from public includes
e15e8f2 webm_parser: Update README build instructions
5023f2b mkvmuxer: Fix Colour::Valid()
cf16204 mkvmuxer_tests: Actually test cue points in the cue point test.
93e9fb3 Validate Colour element values.
8036925 mkvparser_tests: Add Projection element test.
f52d38c mkvparser_tests: Add Colour element test.
826436a mkvparser: minor SeekHead::Entry clean up.
24fb44a mkvmuxer_tests: Add Projection element test.
1e0a8ea mkvmuxer_tests: Add Colour element test.
0278616 mkvmuxer: Colour accessors/mutators.
2346f8f Add mkvparser wrapper functions.
54d6b6b webm_info: Add Projection element support.
65fee06 mkvmuxer_sample: Add support for Projection element.
9a3f2b5 mkvparser_sample: Add support for Projection element.
41e814a mkvparser: Add Projection element support.
483a0ff mkvmuxer: Add Projection element support.
676a713 Add support for the Projection element
725f362 mkvmuxer: Fix memory leak when Colour is set multiple times.
fa182de mkvparser_sample: Add output of audio track codec private size.
8f521f2 mkvparser_tests: Add invalid BlockGroup test.
39137d7 Remove docs saying binary elements default to 0
c147504 Fix legacy Makefile.
80685d3 Do not skip over unknown elements at the root level
58711e8 mkvparser_sample: Fix version info string.
837746f mkvparser_tests: Add invalid block test.
207cd80 Disambiguate sample sources and targets.
a112d71 mkvparser_tests: Refactor invalid file loading code.
5dea33e Disambiguate test source and target names.
125049e parser_tests: Add another truncated chapter string test.
1de8d4c parser_tests: Add truncated chapter string test.
ff8c2b6 parser_tests: Move cue validation to test_util.
4b0690f parser_tests: Add invalid lacing test.
9828e39 mkvmuxer: Set default doc type version to 4.
5495a59 webm_parser: Reference more files in CMakeLists.txt.
0c0ecd0 vpxpes_parser: Add start code emulation prevention support.
639a4bc webm2pes: Remove debug printfs().
9a51102 webm2pes: fflush() in the correct conversion function.
dc7f155 webm2pes: Track total bytes written.
d518128 webm_parser: Enable usage of werror.
e1fe762 webm2pes: Add test for mux/demux of large input.
1b24a79 vpxpes_parser: Read and store PTS when present.
6cf0a0f vpxpes_parser: Store frame payloads.
25d2602 webm_parser: Convert style to match the rest of libwebm
24be76d webm2pes: Replace VpxFrame with VideoFrame.
b451c3b Add a basic video frame storage class.
05c90eb libwebm_util: Clarify error text in superframe parser.
e6415af webm2pes: Make WritePesPacket() a public method.
8f840dd webm2pes: Move frame read out of PES packet write method.
448af97 webm2pes: Restore frame fragmentation support.
f8bb714 cmake: Integrate new parsing API and tests.
cb8ce0b Add a new incremental parsing API
900d322 vpxpes_parser/webm2pes: BCMV and PTS fixes.
4b73545 webm2pes: Add start code emulation prevention.
82903f3 Add column tiles and frame parallel to webm_info
5d91edf style_clean_up: Remove unnecessary parentheses
a95aa4b vp9_level_stats: correct total_uncompressed_bits_ calculation
f46566f mkvreader: Fix shorten-64-to-32 warning in 32 bit builds.
76630ca mkvwriter: Fix shorten-64-to-32 warning in 32 bit builds.
a8ffbd4 webm2pes: Fix format specifier warnings.
faf89d4 Add MaxLumaSampleRate grace percent to stats.
d31e6c9 Fix profile 2 in vp9_header_parser.
bd3ab3a Add flag to estimate last frame's duration to stats.
c182ed9 Fix lint issue in hdr_util.h
cc62ecd Add test for Cluster memory leak
196708a Change MaxLumaSampleRate to be based on frame resolution.
cbd676b mkvmuxer: Fix leak when a Cluster isn't finalized
47f2843 Add parsing support for new features in CodecPrivate.
9a235e0 mkvmuxer: Set doctype to matroska when muxing non-WebM codecs.
e3c9576 Add VP9 level output to webm_info.
bbaaf2d Add class to gather VP9 level stats.
5cf549f cmake: Log compiler flag at check time.
8bb68c2 Add file to parse data from VP9 frames.
df3412f Add support for setting VP9 profile and level to sample_muxer.
296429a Add support to parse VP9 profile.
87832d4 mkvmuxer: Fix Segment::Finalize in kLive mode
6df3e56 mkvmuxerutil.hpp: Add using directives for overloaded size utils.
ec47928 mkvmuxerutil: Revert to using mkvmuxertypes.
4e3d037 Add support to output Colour elements to webm_info.
a1dc4f2 Fix parsing of VP9 level.
039df94 Add TEST_TMPDIR environment variable
d3656fd muxer_tests: ignore iwyu re gtest-message.h
e76dd5e Fix file name in mkvmuxertypes shim.
1be5889 Add temporary include shims at old file locations.

Change-Id: I6a1026814560be80d604a5ecb9b66406a1186dd9
2016-10-17 12:45:05 -07:00
Vignesh Venkatasubramanian
9e8efa5b18 vp9_bitstream: Encode tiles in parallel
Re-use the tile worker threads to pack the bitstream in parallel
on a per-tile basis.  Restricting this to real-time only for now
(further testing is needed to ensure this does not make 2-pass
worse in any case).

BUG=webm:1309

Change-Id: Ia2c982da56697756e12f02643f589189b3271d98
2016-10-17 10:42:03 -07:00
Jerome Jiang
4c3d539baa Merge "VP8: Add realtime speed to datarate_test.cc" 2016-10-15 06:01:41 +00:00
Jerome Jiang
acd21e053a VP8: Add realtime speed to datarate_test.cc
Change-Id: Ia56f0e8dfba20143be3e69666d9184dd3ca5b563
2016-10-14 17:09:27 -07:00
Linfeng Zhang
6c309c1f59 Merge "add vpx_highbd_convolve_{copy,avg}_neon()" 2016-10-14 23:04:59 +00:00
James Bankoski
e49a02b113 Merge "Drop empty frames." 2016-10-14 16:38:56 +00:00
Jim Bankoski
3e21d703ce Drop empty frames.
Change-Id: I2d45a6eb3aaca97eb61e8e7ef9e5114221091244
2016-10-14 06:28:14 -07:00
Linfeng Zhang
f910d14a1a add vpx_highbd_convolve_{copy,avg}_neon()
BUG=webm:1299

Change-Id: Ib87ac466ada63251eb06ae2abd1e13e61e0d1538
2016-10-13 15:21:14 -07:00
Marco
f5b8b473db vp8: Adjust thresholds in VP8/DatarateTestLarge tests.
Fix unit_tests_ubsan failure VP8/DatarateTestLarge.BasicBufferModel.
Failure was triggered by commit: df66f8e8.

Change-Id: I2c49e5cc24094b15063161bab27b09ec7e6f2045
2016-10-13 09:28:40 -07:00
James Zern
1909270f65 Merge "cosmetics,*loopfilter_neon.c: s/tranpose/transpose/" 2016-10-13 07:12:51 +00:00
Vignesh Venkatasubramanian
3e3475321c Merge "vp9_bitstream: Parameterize interp_filter_selected" 2016-10-13 06:33:31 +00:00
Vignesh Venkatasubramanian
769292017b vp9_bitstream: Parameterize interp_filter_selected
Facilitates encoding tiles in parallel.

BUG=webm:1309

Change-Id: I37aa336d47babffc8352188dc767eebdb8a99474
2016-10-12 20:22:03 -07:00
Kaustubh Raste
9e75c01353 Merge "Optimize vpx_mbpost_proc_across_ip_msa function" 2016-10-13 02:12:33 +00:00
Kaustubh Raste
99adf8b22e Merge "Optimize vpx_get4x4sse_cs_msa function" 2016-10-13 02:12:00 +00:00
James Zern
fd270437f0 cosmetics,*loopfilter_neon.c: s/tranpose/transpose/
Change-Id: I267d6a9d715ddb6110f0881c2e820c37fc673fe1
2016-10-12 16:12:56 -07:00
Vignesh Venkatasubramanian
04a6010742 Merge "vp9_bitstream: Parameterize max_mv_magnitude" 2016-10-12 21:52:42 +00:00
Vignesh Venkatasubramanian
d03d1c8cd3 vp9_bitstream: Parameterize max_mv_magnitude
Facilitates encoding tiles in parallel.

BUG=webm:1309

Change-Id: I614a5a492c30b6773c30e7294cd6a6f456e02ab4
2016-10-12 12:50:17 -07:00
Linfeng Zhang
b894d95b32 Merge "[vpx highbd lpf NEON 6/6] vertical 16" 2016-10-12 19:31:39 +00:00
Linfeng Zhang
f664d3f6d5 Merge "[vpx highbd lpf NEON 5/6] horizontal 16" 2016-10-12 19:31:25 +00:00
Linfeng Zhang
3b06acd4e2 Merge "[vpx highbd lpf NEON 4/6] vertical 8" 2016-10-12 19:31:06 +00:00
Linfeng Zhang
01454ec485 [vpx highbd lpf NEON 6/6] vertical 16
BUG=webm:1300

Change-Id: I29d0b482d66f05e278325ddebcf108fbf0b6e222
2016-10-11 22:59:19 -07:00
Linfeng Zhang
27479775c4 [vpx highbd lpf NEON 5/6] horizontal 16
BUG=webm:1300

Change-Id: I21da32d6cfb8a1a6f58bc9756d17f48f13a59a12
2016-10-11 22:59:19 -07:00
Linfeng Zhang
251cbfbec8 [vpx highbd lpf NEON 4/6] vertical 8
BUG=webm:1300

Change-Id: If06b12bc081bab60059b100414dd7018f83ac62d
2016-10-11 22:59:19 -07:00
Kaustubh Raste
56b1be1889 Merge "Optimize vp8 loopfilter msa functions" 2016-10-12 05:44:11 +00:00
James Zern
356f95b423 Merge "[vpx highbd lpf NEON 3/6] horizontal 8" 2016-10-12 05:35:48 +00:00
Linfeng Zhang
96c7206ede [vpx highbd lpf NEON 3/6] horizontal 8
BUG=webm:1300

Change-Id: Ica2379e294be60b7f80fcfcec110dca4c3b59d81
2016-10-12 00:48:31 +00:00
Marco
065ba0c486 vp8: Adjust threshold on VP8/DatarateTestLarge.DenoiserOffOn.
Fix unit_tests_ubsan failure for VP8/DatarateTestLarge.DenoiserOffOn.
Failure was triggered by commit: df66f8e8.

Change-Id: I7cc5bd309e85950cfc5755e01d0eb942d9ca6984
2016-10-11 16:18:14 -07:00
Marco
57c6bf291e 1 pass vbr: Allow for lookahead alt-ref in real-time mode.
For 1 pass vbr real-time mode:
Allow for the usage of alt-ref frame when non-zero lag-in-frames is used.
Use non-filtered alt-ref, and select usage based on fast scene/content
analysis/detection within the lag of frames.

Positive gains on ytlive set: overall avgPSNR ~3-4%.
Several clips are up between 5-14%, a few clips are neutral/small change.

Current speed decrease is about 5-10%.

Use the flag USE_ALTREF_FOR_ONE_PASS to enable this feature
(off by default for now).

Change-Id: I802d2bf3d44f9cf01f6d15c76be9c90192314769
2016-10-11 10:13:17 -07:00
Marco
cdbd89197e vp9: 1 pass vbr: some adjustments to gf interval.
Put limit on gf interval based on lag, and allow
for the adjustment on next gf group also on key frame.

Small/neutral change on ytlive metrics.

Change only affects 1 pass vbr real-time mode.

Change-Id: I339c8f4398848698b6e10fe9482c52ca661b94a5
2016-10-11 08:34:12 -07:00
Marco Paniconi
294a734a5f Merge "vp8: Change default gf behavior for 1 pass cbr." 2016-10-10 23:06:31 +00:00
Linfeng Zhang
57e4cbc632 Merge "[vpx highbd lpf NEON 2/6] vertical 4" 2016-10-10 16:57:55 +00:00
Linfeng Zhang
19046d9963 Merge "[vpx highbd lpf NEON 1/6] horizontal 4" 2016-10-10 16:56:23 +00:00
Kaustubh Raste
3da752fe00 Optimize vpx_mbpost_proc_across_ip_msa function
Removed HADD_SW_S32 calculation

Change-Id: I7384dc881451d197404d09beb7c27b222e1d6875
2016-10-10 18:03:28 +05:30
Kaustubh Raste
d05104b488 Optimize vpx_get4x4sse_cs_msa function
Reuse CALC_MSE_B macro

Change-Id: I39f0a92ac2dbb5fa8628df1a5d556cfdc42a3648
2016-10-10 16:31:57 +05:30
Kaustubh Raste
8b5eddf709 Merge "Optimize vp9 loopfilter msa functions" 2016-10-08 05:05:16 +00:00
Kaustubh Raste
3c2f7eb339 Optimize vp9 loopfilter msa functions
Updated code to process in 8bit as saturation/clipping takes care of
overflow
Removed unused macro

Change-Id: I113df60286fb28b216df800d95b2d3695ef71440
2016-10-07 19:26:26 -07:00
Marco
df66f8e830 vp8: Change default gf behavior for 1 pass cbr.
In 1 pass CBR, with error_resilience off, allow for
special logic to change the default gf behaviour.
In this CL: boost is turned off and the gf period
is set to a multiple of cyclic refresh period.

Change only affects 1 pass CBR mode, i.e., when the flag
gf_update_onepass_cbr is set.

Including the previous change (3ec8e11: to allow cyclic refresh
for error_resilience off), comparing metrics on RTC set for
error_resilience off vs on: avgPSNR/SSIM up by ~6%.

Change-Id: Id5b3fb62a4f04de5a805bd1b418f2b349574e0bc
2016-10-07 11:13:06 -07:00
Vignesh Venkatasubramanian
e83e828998 Merge "write_modes: add MACROBLOCKD as a parameter" 2016-10-07 18:09:09 +00:00
Vignesh Venkatasubramanian
ed50e7710c write_modes: add MACROBLOCKD as a parameter
This will enable bit stream packing of each tile column in
parallel.

BUG=webm:1309

Change-Id: Ie349d8cc5825326218ffda893a50730b2e68ed34
2016-10-07 10:25:02 -07:00
Kaustubh Raste
06a6b28d75 Optimize vp8 loopfilter msa functions
Updated code to process in 8bit as saturation/clipping takes care of overflow

Change-Id: I35fb2c0e702fd91309cc391c5a7745a3b619a64c
2016-10-07 15:48:31 +05:30
James Zern
5e4d2548cf Merge "Fix build failure in libvpx_example_test-multi-target." 2016-10-07 01:53:40 +00:00
Linfeng Zhang
49aa9b1f12 [vpx highbd lpf NEON 2/6] vertical 4
BUG=webm:1300

Change-Id: Ia33a9f2d6c7e2e6b3497ad6f1a09439a85b33983
2016-10-06 14:22:26 -07:00
Linfeng Zhang
7aa27bd62f [vpx highbd lpf NEON 1/6] horizontal 4
BUG=webm:1300

Change-Id: Idf441806e6bf397ff5ecd8776146b3f781f50c40
2016-10-06 14:03:04 -07:00
James Zern
ac00db7948 Merge changes from topic '8bit-hbd-idct'
* changes:
  vpx_dsp/idct*_neon.asm: simplify immediate loads
  enable idct*_1_add_neon in high-bitdepth builds
2016-10-06 19:37:19 +00:00
Marco
c7072ae2f4 Fix build failure in libvpx_example_test-multi-target.
Due to change in command line to sample encoder from:
7eff8f3 Update to vpx_temporal_svc_encoder command line.

This caused the tests in vpx_temporal_svc_encoder.sh to fail.

Change-Id: Ic667da81955ad117d04610af21877fed1d4f188f
2016-10-06 12:22:32 -07:00
Alex Converse
fd918cf9a3 Merge "Remove vpx_realloc()" 2016-10-06 18:42:05 +00:00
Kaustubh Raste
f875267ad0 Merge "Modify vp8 idct msa functions store method" 2016-10-06 02:25:42 +00:00
James Zern
1e1caad165 vpx_dsp/idct*_neon.asm: simplify immediate loads
mov supports 0-65535

Change-Id: I019de0d784836d7bd60e6b36f2cdeefb541cb3fd
2016-10-05 14:28:32 -07:00
James Zern
a6be7ba1aa enable idct*_1_add_neon in high-bitdepth builds
these are compatible as they only load one element of the input so the
larger size of tran_low_t makes no difference in little endian builds.
note the asm is incompatible with big-endian, but there are other points of
failure there so currently it's considered unsupported.

BUG=webm:1294

Change-Id: Icd2665a0699bccae92d1bea43a95b0a83fb17028
2016-10-05 11:14:25 -07:00
Marco Paniconi
efb56ec3ff Revert "Revert "vp8/encoder/onyx_if.c: apply clang-format""
This reverts commit a7456144ce.

Change-Id: I400987fb26a09e9b9ea42c91f48ea12f7bc37356
2016-10-05 17:59:55 +00:00
Alex Converse
3063c37600 Remove vpx_realloc()
It only handles the realloc constraint (preserving low elements) by
serendipity, and we don't actually rely on that behavior anyway.
Meanwhile the calls may do extra copying that gets immediately clobbered
by the callers.

Change-Id: I8dfa89e4a81084b084889c27bd272fdf85184e8d
2016-10-05 10:57:56 -07:00
Marco Paniconi
a7456144ce Revert "vp8/encoder/onyx_if.c: apply clang-format"
This reverts commit 891a87dccd.

Change-Id: I067b3b6a3cfb5bc760166999948b8087d4c5cb80
2016-10-05 15:45:48 +00:00
Kaustubh Raste
68f6f6c4cc Modify vp8 idct msa functions store method
vp8_short_inv_walsh4x4_msa - Optimized to process in short vector type
Updated the functions below to store the exact number of bytes in output rather than a complete vector:
idct4x4_addblk_msa
idct4x4_addconst_msa
dequant_idct4x4_addblk_msa
dequant_idct4x4_addblk_2x_msa
dequant_idct_addconst_2x_msa

Change-Id: Ic1b3752e2421dc7d70a082dcdaab9d140d7e5d9c
2016-10-05 10:12:12 +05:30
clang-format
891a87dccd vp8/encoder/onyx_if.c: apply clang-format
after:
955b3b6 vp8: Allow for cyclic refresh even if error_resilience is off.

Change-Id: Iba189b18c84be8f5140754280c6801cfc387cfcd
2016-10-04 21:12:06 -07:00
Marco
955b3b66bd vp8: Allow for cyclic refresh even if error_resilience is off.
cyclic_refresh was tied to error_resilience mode.
Allow it to be on also for 1 pass CBR mode even if
error_resilience is off.

Another option is to use a new control for this, but prefer to avoid
that for now.

Change-Id: I3625b292ee059a890e31338b514e211bf0ab5c3e
2016-10-04 14:19:49 -07:00
Sarah Parker
8978704970 Merge "Remove rate deviation metric from vp8" 2016-10-04 18:56:14 +00:00
Sarah Parker
d556d435f3 Remove rate deviation metric from vp8
BUG=b/31780679

Change-Id: I2b2a43b154eeacb4f51a11f6362cc535cfe318da
2016-10-04 11:20:55 -07:00
Johann Koenig
3db06394e7 Merge "Connect partial IDCT tests" 2016-10-04 18:01:19 +00:00
Johann
24c0146403 Connect partial IDCT tests
Change-Id: Ie8d5d9123f5a9d39db4ec9c74f77ee979ae4e685
2016-10-04 10:31:01 -07:00
Angie Chiang
5d635365bb Merge "Move highbd txfm input range check from 2d iht transform to 1d idct/iadst" 2016-10-04 16:57:37 +00:00
Kaustubh Raste
0a92dd7319 Merge "Fix vpx_plane_add_noise_msa functionality bit-mismatch" 2016-10-04 06:35:47 +00:00
Angie Chiang
5b073c695b Move highbd txfm input range check from 2d iht transform to 1d idct/iadst
This change will make the highbd txfm input range check more comprehensive.

The 25-bit highbd input range is composed of
12 signal input bits + 7 bits for 2D forward transform amplification + 5 bits for
1D inverse transform amplification + 1 bit for contingency in rounding and quantizing.

BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1286
BUG=https://bugs.chromium.org/p/chromium/issues/detail?id=651625

Change-Id: I04c0796edd7653f8d463fba5dc418132986131e7
2016-10-03 17:21:08 -07:00
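As a rough illustration of the bit budget described in the commit above, the sketch below adds up the four components and checks a coefficient against the resulting range. The names (fits_highbd_input_range and the enum constants) are hypothetical and not taken from libvpx, and treating the 25 bits as a bound on the magnitude is an assumption made for this example.

```c
#include <stdint.h>

/* Bit budget from the commit message; the constant names are hypothetical. */
enum {
  SIGNAL_BITS = 12,      /* signal input bits */
  FWD_2D_GAIN_BITS = 7,  /* 2D forward transform amplification */
  INV_1D_GAIN_BITS = 5,  /* 1D inverse transform amplification */
  CONTINGENCY_BITS = 1,  /* rounding and quantizing contingency */
  HIGHBD_INPUT_BITS =
      SIGNAL_BITS + FWD_2D_GAIN_BITS + INV_1D_GAIN_BITS + CONTINGENCY_BITS
};

/* Returns 1 when |v| < 2^HIGHBD_INPUT_BITS, 0 otherwise.
 * Assumption: the range bounds the magnitude of the coefficient. */
static int fits_highbd_input_range(int64_t v) {
  const int64_t limit = (int64_t)1 << HIGHBD_INPUT_BITS;
  return v > -limit && v < limit;
}
```

A 1D inverse transform could run such a check over its input row or column and bail out early on out-of-range data; whether libvpx does exactly this is not stated in the log.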
James Zern
577221bc87 Merge "invalid_file_test: quiet unused const warning" 2016-10-03 22:51:06 +00:00
James Zern
fb020805f9 Merge "Fix warning when building with GCC 5." 2016-10-03 22:42:52 +00:00
James Zern
c6bc7499d9 Merge "cosmetics,*_neon.c: rm redundant return from void fns" 2016-10-03 22:40:42 +00:00
Kaustubh Raste
6922fc8230 Fix vpx_plane_add_noise_msa functionality bit-mismatch
Change-Id: I04961afb592ae6a67fdcfd8c9066e920dd4b30e7
2016-10-03 18:15:59 +00:00
Marco
7eff8f3b1d Update to vpx_temporal_svc_encoder command line.
Set the number of threads on the command line.

Change-Id: Id0daa2393880c3da2d903c11a793072d3008b34b
2016-10-03 09:49:15 -07:00
James Zern
50b9c467da Merge "vpx_convolve8_neon,load/store*: correct param type" 2016-10-01 23:52:14 +00:00
Geza Lore
0dc12b4a1c Fix warning when building with GCC 5.
These caused the following warning with GCC 5:
     warning: logical not is only applied to the left hand side of
     comparison [-Wlogical-not-parentheses]
     assert(!is_compound == (cm->reference_mode == SINGLE_REFERENCE));

Change-Id: If296aabb2311ceb7d903b395c1549ef81c2cbf9b
(cherry picked from commit c6cf7a6111)
2016-10-01 12:23:15 -07:00
James Zern
fca2196a2e invalid_file_test: quiet unused const warning
with --disable-vp9

Change-Id: I81bd603b02ee5d1b45a50aa9e7534f9da498b0e0
2016-10-01 11:49:02 -07:00
James Zern
c449983c56 vpx_convolve8_neon,load/store*: correct param type
stride/pitch in convolve is expressed with a ptrdiff_t

Change-Id: Ia5a6732dc509f06ccf7035386fa8ae721b4b1a71
2016-10-01 11:03:29 -07:00
Martin Storsjo
9255328f27 Remove a stray END declaration in loopfilter_4_neon.asm
Change-Id: Ic8c359a5677f9c663787aac74f530e886163bc69
2016-10-01 14:12:42 +03:00
James Zern
3c00132181 Merge "vp8,frame_buffers: remove unused use_frame_threads" 2016-10-01 01:35:55 +00:00
Linfeng Zhang
da14d23e44 Merge "Refactor vpx lpf NEON files (step 2/2)" 2016-10-01 00:07:51 +00:00
Linfeng Zhang
edbca72a53 Merge "Refactor vpx lpf NEON files (step 1/2)" 2016-10-01 00:07:31 +00:00
Marco Paniconi
0a9f56f146 Merge "vp9: On change_config() only call update_frame_size if needed." 2016-09-30 21:43:33 +00:00
Marco Paniconi
5e908aff34 Merge "vp9 real-time mode: Change loopfilter speed feature at speed 8." 2016-09-30 21:42:05 +00:00
James Zern
db80c23fd4 cosmetics,*_neon.c: rm redundant return from void fns
+ a couple of 'break's after a return

Change-Id: Ia21f12ebcef98244feb923c17b689fc8115da015
2016-09-30 13:09:57 -07:00
James Zern
b6277a47c7 Merge changes from topic '8bit-hbd-idct'
* changes:
  *idct*_neon.c: add missing rtcd include
  idct,msa/neon: exclude idct files from hbd build
  *rtcd_defs.pl: remove empty specialize calls
2016-09-30 19:36:08 +00:00
James Zern
1396d12103 *idct*_neon.c: add missing rtcd include
+ correct declarations as necessary

BUG=webm:1294

Change-Id: I719602df9a56e79188a78e7f8b31257c6d3cc11d
2016-09-30 11:41:26 -07:00
James Zern
b51c4df93a idct,msa/neon: exclude idct files from hbd build
these functions are incompatible currently and unreferenced in rtcd,
exclude them from the build.

BUG=webm:1294

Change-Id: I7790c195a91e1b142f56c04d2a5e305d9133b896
2016-09-30 11:32:47 -07:00
Linfeng Zhang
ca2fe7a8c7 Refactor vpx lpf NEON files (step 2/2)
Change-Id: I0744407cd3361ff752bd7f6e654b70ab6b41a58f
2016-09-30 09:56:28 -07:00
Linfeng Zhang
4779f5308d Refactor vpx lpf NEON files (step 1/2)
Change-Id: I4016d096d46ca691f3b17199b259b7231e983cfb
2016-09-30 09:48:54 -07:00
Linfeng Zhang
8c744fd978 Merge "Unify loopfilter function names" 2016-09-30 15:58:08 +00:00
Linfeng Zhang
c435b7fbdd Merge "Refine vpx convolve8 NEON intrinsics optimization" 2016-09-30 15:56:31 +00:00
Linfeng Zhang
bde905cba1 Merge "Refine vpx_convolve_copy_neon() and vpx_convolve_avg_neon()" 2016-09-30 15:54:02 +00:00
James Zern
ed62d27c71 *rtcd_defs.pl: remove empty specialize calls
add_proto adds a 'c' specialization

Change-Id: I0ed0c2240d45264b0e0056ce7c8f63f4a00780bc
2016-09-29 20:38:26 -07:00
James Zern
f38616e1a2 vp8,frame_buffers: remove unused use_frame_threads
this was never fully implemented

Change-Id: I4640cf84c40ea2cc9c6c12acf116d39df4b04578
2016-09-29 20:24:15 -07:00
James Zern
39ff0de810 Merge "configure: test for -Wshorten-64-to-32 in non hbd builds" 2016-09-30 03:01:56 +00:00
Johann Koenig
cb4aa6d589 Merge changes I158f631a,I0555f639
* changes:
  vp8: remove mmx functions
  Rename _xmm functions to _sse2
2016-09-30 01:47:41 +00:00
Yunqing Wang
9afe2cf599 Merge "Fix an issue in vp9_first_pass for non-multiple of 16 resolutions" 2016-09-30 00:49:06 +00:00
Linfeng Zhang
7f1f35183a Unify loopfilter function names
Rename vpx_lpf_horizontal_edge_8() to vpx_lpf_horizontal_16().
Rename vpx_lpf_horizontal_edge_16() to vpx_lpf_horizontal_16_dual().

Change-Id: I798ca8fbbd657d06d3db2bfb0fb3321168f49e52
2016-09-29 16:25:42 -07:00
Linfeng Zhang
85a9e48d25 Refine vpx_convolve_copy_neon() and vpx_convolve_avg_neon()
BUG=webm:1290

Change-Id: Ia27e58521eba5a4852b50381c56746fa5767f6d6
2016-09-29 16:19:39 -07:00
Deepa K G
2745f94deb Fix an issue in vp9_first_pass for non-multiple of 16 resolutions
This patch sets the 16x16 src_diff to zero and ensures correct calculation
of this_error for block sizes smaller than 16x16.

Change-Id: I7b7c02d267433c9f22c8ac9b8d5df2f499175172
2016-09-29 16:19:23 -07:00
Johann Koenig
ad55b1d270 Merge changes Ia3e9122f,Id33eb6c8,I956bd8ce
* changes:
  Remove vp8_clear_system_state
  vpx_dsp: clean up rtcd
  vp8: clean up rtcd
2016-09-29 23:16:45 +00:00
James Zern
7b9c86167e Merge "vp9_detokenize,decode_coefs: fix signed int overflow" 2016-09-29 22:36:13 +00:00
Johann
721354fe7f vp8: remove mmx functions
When they have sse2 equivalents.

Change-Id: I158f631a3bcecba57b36093ac10114b1904767a7
2016-09-29 15:25:27 -07:00
Johann
2663b092ae Rename _xmm functions to _sse2
Avoid the extra level of indirection/confusion.

Change-Id: I0555f639d67835df9fb7dac0c75085e9954805f1
2016-09-29 15:23:11 -07:00
Johann
1364cb58b4 Remove vp8_clear_system_state
Use vpx_clear_system_state instead.

Change-Id: Ia3e9122f69a2c690ddd7c7bc54f92ccb9ec18b3e
2016-09-29 13:22:49 -07:00
Marco
e765435293 vp9: On change_config() only call update_frame_size if needed.
change_config() may be called often in real-time application,
to update bitrate/framerate or qp-max/min.
No need to do update_frame_size() unless frame size has changed.

Change-Id: I23a51deade1e03adc91c468f9ffde3235298770c
2016-09-29 13:03:26 -07:00
Marco
d017548be6 vp9 real-time mode: Change loopfilter speed feature at speed 8.
For real-time mode at speed 8: turn off MINIMAL_LF
for non-screen content mode.

Visually better, avgPSNR/SSIM on rtc set go up by ~4-5%.
Speed decrease of about ~3%.

Change-Id: I8eb69330f02e0ceece1507d43cfc8a049a1d8291
2016-09-29 12:59:01 -07:00
Linfeng Zhang
b3cb065ee4 Refine vpx convolve8 NEON intrinsics optimization
BUG=webm:1290

Change-Id: I5d7fce62270f9d76ef9ce98b3d188ad11fb21873
2016-09-29 12:48:59 -07:00
James Zern
691ef20272 Merge changes I11786887,Ia91823ad
* changes:
  vpx_dsp/get_prob: relocate den == 0 test
  vpx_dsp/get_prob: make clip_prob branchless
2016-09-29 19:11:35 +00:00
Johann
7b5a348088 vpx_dsp: clean up rtcd
Remove avx2+ssse3 specialization. Disabling ssse3 now automatically
disables avx2.

Change-Id: Id33eb6c85d1c4ee57128ebe45c995eb15cfcc765
2016-09-29 12:10:07 -07:00
Johann
c7f9d0719d vp8: clean up rtcd
Remove lines which specify the same name for a function.

Change-Id: I956bd8ce2b81a2a8feab5621d28bd2499c2b4c2d
2016-09-29 12:10:01 -07:00
James Zern
450d89034b vp9_detokenize,decode_coefs: fix signed int overflow
when decoding an invalid bitstream with --enable-vp9-highbitdepth

BUG=webm:1297

Change-Id: I401d87033b4293f2ca595bc51678aad9951ecf15
2016-09-28 22:42:03 -07:00
James Zern
93c823e24b vpx_dsp/get_prob: relocate den == 0 test
to get_binary_prob(). the only other caller mode_mv_merge_probs() does
its own test on 0.

BUG=chromium:639712

Change-Id: I1178688706baeca2883f7aadbc254abb219a44ce
2016-09-28 17:42:49 -07:00
James Zern
e094e151de Merge "vp9: fix compilation for g++ 6.2.x" 2016-09-28 23:36:23 +00:00
Johann Koenig
bb27be0dfe Merge "Hook up vp8_diamond_search_sad_sse3" 2016-09-28 20:54:25 +00:00
James Zern
63f7e131fe Merge "vpxdec: avoid memory leaks under most conditions" 2016-09-28 19:35:16 +00:00
James Zern
7481edb33f vpx_dsp/get_prob: make clip_prob branchless
+ inline the function directly as there was only one consumer
(get_prob())

this is an attempt to reduce the number of branches to work around an amd
bug. this change is mildly faster or neutral across x86-64 and arm.

http://support.amd.com/TechDocs/44739_12h_Rev_Gd.pdf
665 Integer Divide Instruction May Cause Unpredictable Behavior

BUG=chromium:639712

Suggested-by: Pascal Massimino <pascal.massimino@gmail.com>
Change-Id: Ia91823aded79aab469dd68095d44300e8df04ed2
2016-09-28 11:51:46 -07:00
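Illustrative note on the commit above: a minimal sketch of how a clamp to [1, 255] can be written without branches. The helper name and the input bound are assumptions for this example, and it relies on a 32-bit int with an arithmetic right shift of negative values (implementation-defined in C, though common in practice). It is not presented as the committed code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp a scaled probability p to [1, 255]
 * without branches. */
static uint8_t clip_prob_branchless(int p) {
  /* Bound chosen for this sketch so the shift below yields 0 or -1. */
  assert(p >= 0 && p <= (1 << 23));
  /* (255 - p) >> 23 is -1 (all ones) when p > 255 and 0 otherwise;
   * (p == 0) lifts a zero up to 1. The final cast keeps the low 8 bits. */
  return (uint8_t)(p | ((255 - p) >> 23) | (p == 0));
}
```

With these assumptions, 0 maps to 1, values in 1..255 pass through, and anything larger maps to 255.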
Tristan Matthews
32c375447c vp9: fix compilation for g++ 6.2.x
Inline function called from test/dct16x16_test.cc wouldn't build due to:
  invalid operands of types ‘__gnu_cxx::__enable_if<true, double>::__type
  {aka double}’ and ‘int’ to binary ‘operator>>’
  return (abs(ref->row) >> 3) < COMPANDED_MVREF_THRESH &&

this converts the test to abs() < COMPANDED_MVREF_THRESH << 3 which
hides the promotion issue.

Regression from commit de993a847f

BUG=webm:1291

Change-Id: I73b5943d07d5b61b709d299114216a2371a8fd62
2016-09-27 23:17:31 -07:00
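Illustrative note on the commit above: for a non-negative integer a, (a >> 3) < T holds exactly when a < (T << 3), so moving the shift onto the threshold does not change the result. The sketch below checks this over a range of values. THRESH is a stand-in value chosen for the example, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define THRESH 8 /* illustrative stand-in for COMPANDED_MVREF_THRESH */

/* Original form: shift the magnitude down, then compare. */
static int small_by_shift(int v) { return (abs(v) >> 3) < THRESH; }

/* Rewritten form: scale the threshold up instead of shifting the value. */
static int small_by_scaled_thresh(int v) { return abs(v) < (THRESH << 3); }

/* Counts inputs in [-limit, limit] where the two forms disagree. */
static int count_mismatches(int limit) {
  int v, mismatches = 0;
  assert(limit >= 0);
  for (v = -limit; v <= limit; ++v) {
    if (small_by_shift(v) != small_by_scaled_thresh(v)) ++mismatches;
  }
  return mismatches;
}
```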
Linfeng Zhang
240726ac85 Merge "Clean convolve_test.cc" 2016-09-28 00:20:28 +00:00
Debargha Mukherjee
74f038e6f8 Merge "Fix for compile error with range checking" 2016-09-28 00:05:37 +00:00
James Zern
06abc1ecd9 configure: test for -Wshorten-64-to-32 in non hbd builds
provides msvc-like warnings for implicit conversions from 64-bit to
32-bit types

--enable-vp9-highbitdepth still requires some work

this also skips CXXFLAGS for now as some work would be needed to clean up
third_party/*.cc or split it from test/*.cc when it comes to flags.

Change-Id: Ic9a095b73286eba5ed39bfc27ff69593748cbbf4
2016-09-27 16:52:21 -07:00
Johann
3a57ce4478 Cast strto[u]l down
Change-Id: I945b2f8754cf484a08e5ba511cfd2d4a44181b08
2016-09-27 15:37:10 -07:00
Johann
e4ddf9db6a Hook up vp8_diamond_search_sad_sse3
The original commit never set any 'specialize' line:
61311e6103

It appears the sadx4 version of the function uses sdx4df calls to speed up
the search. There are no sse3 versions of the sdx4df functions, but
there are sse2 and msa versions.

There is a neon version of vpx_sad16x16x4d but not any of the smaller
versions. Perhaps if they existed this function could be expanded to use
them.

Change-Id: I936d7d6b1a3ff6dcd5a4d2322272708c47cdec13
2016-09-27 15:31:49 -07:00
James Zern
e61d82bd4f vpxdec: avoid memory leaks under most conditions
avoids false positives when fuzzing with ASan+LSan.

Change-Id: I0d23b530ae80e5692b6951fe6e3690ea44159a5a
2016-09-27 14:29:18 -07:00
Johann Koenig
348cff040a Merge changes from topic 'wextra'
* changes:
  Expand -Wextra to more of the library
  mips: clean up wextra warnings
  Add compiler flag -Wsign-compare
  Add compiler warning flag -Wextra and fix related warnings.
2016-09-27 21:13:50 +00:00
Linfeng Zhang
81ff7a065f Clean convolve_test.cc
Combine test MatchesReferenceSubpixelFilter and
MatchesReferenceAveragingSubpixelFilter.

Change-Id: I75f96befbbb118cdc6b8c6001b4cdda8d88fbbd3
2016-09-27 13:36:31 -07:00
Johann
c3a135b5b8 Expand -Wextra to more of the library
Suppress warnings in third_party/.

vp8 -Wclobbered issue is tracked here:
BUG=webm:1246

BUG=webm:1069

Change-Id: I9b94bf546d7b690c26a59ae67967facdce8ec45b
2016-09-27 13:19:27 -07:00
Johann
02fa245d15 mips: clean up wextra warnings
Remove unused zbin variable:
warning: unused parameter ‘zbin’

Use int for loop variables to avoid unsigned conversion:
warning: comparison between signed and unsigned integer expressions

Change-Id: Icea74b870c0ee68a8bf687e796a69392af25a8ad
2016-09-27 13:19:18 -07:00
Urvang Joshi
097b31c7f0 Add compiler flag -Wsign-compare
Also, fix the warnings generated by this flag.

(cherry picked from commit ebeb1155d4fa6d28e2f40c92265245f8df097fcb)

From AOM. Don't actually add -Wsign-compare. It will be covered by
-Wextra.

Switch to vpx_integer.h from df9c9d6d4c43f02c58d4e776c53323788e013cbc

BUG=webm:1069

Change-Id: I1dc6e61caa5d56af4a55b6692ab620bb3144652a
2016-09-27 12:39:36 -07:00
Urvang Joshi
0aa3e2564f Add compiler warning flag -Wextra and fix related warnings.
Note: some of these warnings are enabled by a combination of -Wunused
(added earlier) and -Wextra.

Cherry-picked from AOM 4790a69faaec8f03d65f64ff070f6ab4307dbb16

Expands use of (void)x; on unused variables. AOM only supports one codec
in codec_factory.h

Does not include changes to HandleDecodeResult. AOM removed
invalid_file_test.cc which does use the video parameter.

Does not enable -Wextra yet. There are more issues to fix.

BUG=webm:1069

Change-Id: I322a1366bd4fd6c0dec9e758c2d5e88e003b1cbf
2016-09-27 12:05:01 -07:00
Paul Wilkins
b3ebea5e8a Merge "Limit max arf boost and scale motion breakout for image size." 2016-09-27 14:08:29 +00:00
Peter de Rivaz
8db503063f Fix for compile error with range checking
Current version does not build with options:
  --enable-vp9-highbitdepth --enable-coefficient-range-checking

Change-Id: Ic3285f1a3e0d6be88da7f2cd8fa5a631368dd03b
2016-09-27 09:28:44 +01:00
Marco Paniconi
70240a77b8 Merge "vp9: Reduce frame loopfilter-level for 1 pass cbr." 2016-09-26 22:05:44 +00:00
Johann Koenig
b165451ad5 Merge "Un-Revert "Restore vp8_sixtap_predict4x4_neon"" 2016-09-26 19:11:00 +00:00
Johann Koenig
37798711aa Merge "Use shifted value for sinpi8sqrt2" 2016-09-26 19:10:57 +00:00
Marco
d9fc28c0a1 vp9: Reduce frame loopfilter-level for 1 pass cbr.
Reduce the filt_guess for 1 pass cbr on inter-frames.
This reduces visual artifact seen in rtc clip (jimred.vga),
and improves metrics on rtc set.

Metrics on rtc set for cbr mode overall positive, most clips are up:
Speed 7 rtc: avgPSNR/SSIM up by: ~2.6/3.9%
Speed 8 rtc: avgPSNR/SSIM up by: ~1.3/2.5%

Change-Id: Ia4eccea1c19d65b583516df28823cd756c49464f
2016-09-26 10:12:43 -07:00
Linfeng Zhang
b46243d7ff Merge "Refactor lpf (size 4 and 8) NEON intrinsics optimization" 2016-09-26 16:11:12 +00:00
paulwilkins
0421d8e318 Limit max arf boost and scale motion breakout for image size.
Added a cap on the maximum boost for an arf based on interval length.
Fixed a bug whereby the image size was not accounted for in determining
two of the motion breakout thresholds.

Overall small gains of 0.2-0.4% psnr but on large image format clips with
slow zooms the gain may be as much as 20% or more (e.g. in_to_tree
at 1080P)

Change-Id: Id0a47391203026742daa9c97afac5705fd8c4dfb
2016-09-26 15:38:29 +01:00
Scott LaVarnway
60624aa53a Merge "VP9: token decoder expansion" 2016-09-26 12:06:50 +00:00
James Zern
f8c056a895 Merge "vp9_idct: delete dead TODOs" 2016-09-24 01:47:00 +00:00
Johann
ab0e7a237a Use shifted value for sinpi8sqrt2
The value 35468 changes sign when stored in int16_t:
implicit conversion from 'int' to 'int16_t' (aka 'short')
changes value from 35468 to -30068

This negation requires adding back the original value to compensate.
Shifting the value keeps the value positive and saves a post-vqdmulh
shift.

This technique is used in webp and idct_dequant_full_2x_neon

BUG=b/28027557

Change-Id: I0c5ce09bea170fe08061856c2af6f841a557e0c3
2016-09-23 17:04:18 -07:00
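Illustrative note on the commit above: a scalar model of why halving the constant works with a doubling multiply that returns the high half. The function names are hypothetical, saturation is left out, and this is a sketch of the arithmetic rather than the NEON code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a doubling multiply returning the high 16 bits.
 * The one saturating case (both inputs INT16_MIN) is excluded. */
static int16_t doubling_mulh(int16_t a, int16_t b) {
  assert(!(a == INT16_MIN && b == INT16_MIN));
  return (int16_t)((2 * (int32_t)a * (int32_t)b) >> 16);
}

/* Reference result: (a * 35468) >> 16, computed in 32 bits. */
static int32_t mul_sinpi8sqrt2_ref(int16_t a) {
  return ((int32_t)a * 35468) >> 16;
}

/* 35468 does not fit in int16_t, but 35468 >> 1 == 17734 does. The
 * doubling inside the multiply restores the factor of two, so no
 * compensation step is needed afterwards. */
static int16_t mul_sinpi8sqrt2_shifted(int16_t a) {
  return doubling_mulh(a, 35468 >> 1);
}
```

The constant is even, so halving it loses no precision and both routes compute the same product before the final shift.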
Johann
1d14e42df7 Un-Revert "Restore vp8_sixtap_predict4x4_neon"
This restores d9dce2f48e

Switched to using signed shift-and-narrow. Instead of saturating
negative results to 0, it was saturating them to 255.

BUG=webm:817
BUG=webm:1273

Change-Id: I571095336aa4182e3288b17924fcaaece42b0a49
2016-09-23 14:58:57 -07:00
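Illustrative note on the commit above: a scalar model of how an unsigned saturating shift-and-narrow turns a negative intermediate into 255, while a signed one clamps it to 0. Both helper names are hypothetical and the code only models the arithmetic described in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Signed input: round, shift right, clamp to [0, 255]. */
static uint8_t narrow_signed_sat(int16_t v, int shift) {
  int32_t rounded;
  assert(shift >= 1 && shift <= 8);
  rounded = (int32_t)v + (1 << (shift - 1));
  if (rounded < 0) return 0; /* negative results saturate to 0 */
  rounded >>= shift;
  return (uint8_t)(rounded > 255 ? 255 : rounded);
}

/* Same bits treated as unsigned: a negative int16_t looks like a value
 * >= 32768, so the shifted result exceeds 255 and saturates high. */
static uint8_t narrow_unsigned_sat(uint16_t v, int shift) {
  uint32_t rounded;
  assert(shift >= 1 && shift <= 8);
  rounded = ((uint32_t)v + (1u << (shift - 1))) >> shift;
  return (uint8_t)(rounded > 255 ? 255 : rounded);
}
```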
Scott LaVarnway
87b689f97a VP9: token decoder expansion
This version is based on Change 267683, but does not
use the macros.

Change-Id: I0619fa618decf8bdeef250584d75d70318b5d9a7
2016-09-23 06:24:20 -07:00
Scott LaVarnway
ada850786c Merge "VP9: pass TileWorkerData instead of MACROBLOCKD and vpx_reader." 2016-09-23 11:59:16 +00:00
James Zern
deadda3dea Merge "vpx_idct32x32_34_add_sse2: rm unneeded transposes" 2016-09-23 02:49:26 +00:00
James Zern
a914ffad97 Merge "variance_neon: sync variance*() w/c,sse2" 2016-09-23 02:18:49 +00:00
Scott LaVarnway
7a34f85955 VP9: pass TileWorkerData instead of MACROBLOCKD and vpx_reader.
Change-Id: I869ef0f113c022143b531c44aefa0f1bb267052d
2016-09-22 13:18:36 -07:00
James Zern
fdd1186f97 vpx_idct32x32_34_add_sse2: rm unneeded transposes
this change is neutral to mildly positive across various x86-64
platforms

Change-Id: I28fb5ae598fc1317b7a42c9a846ac5d57d104784
2016-09-21 19:49:25 -07:00
Angie Chiang
99ef84c65a Merge "Detect invalid highbd iht input" 2016-09-22 01:06:38 +00:00
James Zern
e372bfd5ac variance_neon: sync variance*() w/c,sse2
removes some unnecessary casts and adds a few explicit uint32 ones for
larger sizes to quiet -Wshorten-64-to-32 warnings

Change-Id: I63c5fce8e62c426d5cf5c10a66a113c119a43518
2016-09-21 18:04:45 -07:00
James Zern
fcf281b6a1 Merge "vp8: remove VP8_SET_DBG* control support" 2016-09-22 00:43:35 +00:00
Angie Chiang
80338b91d3 Detect invalid highbd iht input
Do nothing in vp9_highbd_iht#x#_##_add_c when input magnitude is beyond
20 bits. Note that the sign bit is not included here.

In the 20 bits, we use 12 bits for input signal, 7 bits for forward
transform amplification, and 1 bit for contingency in rounding and
quantizing

BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1286

Change-Id: I332c6f68df4614fc2e7d2dc4c5bb0d0cff8a245c
2016-09-21 17:15:19 -07:00
Johann
2bed8b6acd Keep vp8 sixtap read within bounds
When filtering it needs 6 pixels: 2 prior to the source, the source, and
3 after the source.

When filtering 16 wide, that means 21. To accomplish this the SSE2 reads
[-2] to [5], [6] to [13], and [14] to [21], a total of 24 bytes (reading
in groups of 8 is easy)

The filter then shifts this last set to the top half of the register and
uses 'or' to combine it with the previous set.

Valgrind detected an issue reading pixels [19], [20] and [21]:
Address 0x7f581c2 is 434 bytes inside a block of size 441 alloc'd

Note: we only need pixels [16], [17], and [18] as context for [15].

To fix this, it now reads 8 bytes starting at [11], which re-loads [11]
through [13], but stops at [18] and does not over-read any values.

This is shifted by 5 and 'or'd with xmm1. Although the lower bits are
not cleared, they overlap directly with [11] through [13], so 'or'
produces the correct results.

Change-Id: I0c89c03afa660fc9b0108ac055d7bd403e493320
2016-09-21 16:17:07 -07:00
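Illustrative note on the commit above: a byte-level model of the fixed load. Two 8-byte reads, one starting at [6] and one starting at [11], are combined with a 5-byte shift and an 'or', so nothing past [18] is touched. The function name is hypothetical and plain arrays stand in for the xmm registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fills reg[0..12] with src[6]..src[18] while reading no byte beyond
 * src[18]. reg models a 16-byte register; lanes 13..15 stay zero. */
static void load_tail_within_bounds(const uint8_t *src, uint8_t reg[16]) {
  uint8_t lo[16] = { 0 };
  uint8_t hi[16] = { 0 };
  int i;
  assert(src != NULL && reg != NULL);
  memcpy(lo, src + 6, 8);      /* [6]..[13] in lanes 0..7 */
  memcpy(hi + 5, src + 11, 8); /* [11]..[18] shifted up by 5 lanes */
  /* Lanes 5..7 hold [11]..[13] in both operands, so 'or' is harmless. */
  for (i = 0; i < 16; ++i) reg[i] = lo[i] | hi[i];
}
```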
Johann
35ebc1cddf predict_test: align dst buffer to 16
On 32 bit machines 'new' does not always appear to allocate sufficiently
aligned buffers, causing intermittent test failures.

Change-Id: I0db4fc73782012e4eef71dc0fb540e74fdbfcebe
2016-09-21 13:35:47 -07:00
James Zern
3f72509587 vp8: remove VP8_SET_DBG* control support
the --enable-postproc-visualizer configure option remains as a no-op as
do the control names and values for compatibility
+ remove the corresponding debug flags from vpxdec: --pp-*

Change-Id: I4a001cd9962b59560d7d6bda6272d4ff32b8d37c
2016-09-20 20:19:36 -07:00
James Zern
cec6433e41 vp9_idct: delete dead TODOs
Change-Id: Icdd5494f557d83026dc078bce37997a76aa288fb
2016-09-20 19:46:27 -07:00
James Zern
b6e686b1ea Merge changes from topic 'Wshorten'
* changes:
  vp8: convert some uses of unsigned long to size_t
  vp8/encoder: quiet some -Wshorten-64-to-32 warnings
2016-09-20 23:17:20 +00:00
James Zern
c31d02615d Merge "variance_avx2: sync variance functions with c-code" 2016-09-20 22:33:39 +00:00
James Zern
2351a73531 Merge "examples: quiet -Wshorten-64-to-32 warnings" 2016-09-20 22:32:58 +00:00
James Zern
feb4313c5f Merge "vp9_rtcd: remove non-existent highbd convolve fns" 2016-09-20 22:07:09 +00:00
Johann Koenig
8478f97105 Merge "Enable ssse3 bilinear tests" 2016-09-20 21:46:50 +00:00
Johann Koenig
18fd69ee91 Merge "Add vp8_bilinear_filter test" 2016-09-20 20:30:48 +00:00
Alex Converse
0d2687ef87 Merge "Code class0 using vpx_read() / vpx_write()." 2016-09-20 19:19:29 +00:00
James Zern
5841929fde vp9_rtcd: remove non-existent highbd convolve fns
these were moved to vpx_dsp

Change-Id: I307b07ae05e2333277d4b7011cba36dcf8409959
2016-09-19 20:01:23 -07:00
James Zern
08b8b6bb8f examples: quiet -Wshorten-64-to-32 warnings
all around usage of strtol/strtoul

Change-Id: If907c89f107a068987aa71ddd93cee9a7389e4cd
2016-09-19 19:02:49 -07:00
James Zern
8281da74b9 vp8: convert some uses of unsigned long to size_t
similar to changes that were done in vp9 for encoded frame size
reporting. has the side-effect of quieting a -Wshorten-64-to-32 warning.

Change-Id: I89f74cb617fc29334ee351dc8dfaa3b8cfd4e5af
2016-09-19 18:35:59 -07:00
James Zern
0ce98b423b vp8/encoder: quiet some -Wshorten-64-to-32 warnings
this code is similar to other existing uses and/or vp9

Change-Id: I56e646931379759d9f7332ea6d746060007c75ee
2016-09-19 18:35:59 -07:00
Linfeng Zhang
761e5ec2f6 Refactor lpf (size 4 and 8) NEON intrinsics optimization
Also check in 8x8 8-bit transpose NEON intrinsics optimization
transpose_u8_8x8()

Change-Id: I32d321cf97ea21eab158ac4896990fc9a51681c4
2016-09-19 16:41:37 -07:00
James Zern
6acd061aad variance_avx2: sync variance functions with c-code
add missing int64 -> uint32 cast; quiets -Wshorten-64-to-32 warnings

Change-Id: I4850b36e18dc8b399108342be4bfe0b684aefb78
2016-09-19 16:19:29 -07:00
Johann Koenig
0695843a21 Merge "Remove -fno-strict-aliasing flag" 2016-09-19 22:49:23 +00:00
Johann
fad70a358b Remove -fno-strict-aliasing flag
The referenced bug was fixed by saving neon registers. That this had any
effect was coincidental.

Both chromium and Android build with clang and neither uses this flag.

Change-Id: I470247d6fd9226fc207b42a187105581a94badc3
2016-09-19 12:16:03 -07:00
Nathan E. Egge
de7f5ce9e5 Code class0 using vpx_read() / vpx_write().
The vp9_mv_class0_tree is a balanced tree with two leaves and can
simply be coded as a boolean with probability class0[0].

Change-Id: If294dac825a5f945371092c74aa8e3f84cd962b6
(cherry picked from commit be8a8ab62ebdd111c6f2e9a33b15630570671eba)
2016-09-19 10:50:39 -07:00
Alex Converse
01e2902521 Zero the whole rd_counts struct rather than each member
Change-Id: I495aa9cec2b2b8f1ae69bdab8b3feeca76358472
2016-09-19 10:04:47 -07:00
James Zern
aa0eb67bf7 loopfilter_mb_neon: remove unused load_8x8()
quiets a -Wunused-function warning for arm targets

Change-Id: I293a7e3d3d7d61d6af2fbedad5e8c25126c418b6
2016-09-17 11:00:31 -07:00
Linfeng Zhang
5d73639d8f Merge "Refactor lpf (size 16) NEON intrinsics optimization" 2016-09-17 00:33:30 +00:00
James Zern
112eb54c1b Merge "vpx_codec_control: return incapable for unmatched control" 2016-09-16 17:30:44 +00:00
Linfeng Zhang
8107368000 Refactor lpf (size 16) NEON intrinsics optimization
Extract shared code so later lpf size 4 and 8 functions can reuse.

Change-Id: Ibb43ef1fd8651bd2e32fcc4c56cf6fa7ca237401
2016-09-16 09:12:13 -07:00
James Zern
33aef48f29 vpx_subpixel_8t_intrin_avx2: tolerate unversioned clang
assume __clang_major__==0 has the latest version of
_mm256_broadcastsi128_si256. fixes builds with custom clang toolchains.

BUG=b/30970831

Change-Id: I90becd56278e4716bd46e2ba9d910af977e8dfa6
2016-09-16 07:14:17 +00:00
James Zern
7a9e476072 Merge changes from topic 'clang-format'
* changes:
  apply clang-format
  .clang-format: update to 3.8.1
2016-09-16 07:11:33 +00:00
Johann
e813c2b416 Enable ssse3 bilinear tests
The code only has issues when xoffset == 0 and yoffset == 0 which
represents a simple copy. Presumably this case does not need to be
handled because the issue has existed since 2010.

BUG=webm:1287

Change-Id: Ic47e2653f3b729e99b40e53d8d2d8d1501edaaa9
2016-09-15 23:16:26 -07:00
Johann
caf9a7841e Add vp8_bilinear_filter test
Build out the sixtap_predict test because the filters are
interchangeable. Add verbose failures and border checking.

Change-Id: I962f50041750dca6f8d0cd35a943424cf82ddcb1
2016-09-15 23:16:19 -07:00
James Zern
6ae58fd55e Merge "Revert "Restore vp8_sixtap_predict4x4_neon"" 2016-09-16 06:13:42 +00:00
Johann Koenig
7795e99296 Revert "Restore vp8_sixtap_predict4x4_neon"
This reverts commit d9dce2f48e.

Appears to be failing the SixtapPredict tests in some configurations and possibly test vectors as well.

Change-Id: Ica6aa83ebac47d0a76e451846e7da67b1c17a7d7
2016-09-16 06:12:49 +00:00
Johann Koenig
fdbe249991 Merge "Restore vp8_bilinear_predict4x4_neon" 2016-09-16 05:33:50 +00:00
Johann Koenig
102eae06e9 Merge "zero structures completely" 2016-09-16 04:41:22 +00:00
Johann
43743b1d3e Restore vp8_bilinear_predict4x4_neon
This function was removed when clang started introducing alignment hints
which caused the 32 bit vld1_lane_u32/vst1_lane_u32 to fail:
https://llvm.org/bugs/show_bug.cgi?id=24421

The load has been rendered safe with an implementation ~indiscernible
performance-wise that uses _u8 and over-reads just a touch.

It is still ~5x faster than C in the unaligned case and doing both
filters.

BUG=webm:892
BUG=webm:1273

Change-Id: Icf7167189391b46202f47233bb585c24c42bcc36
2016-09-15 21:16:11 -07:00
Johann Koenig
7bc0733c27 Merge "Restore vp8_sixtap_predict4x4_neon" 2016-09-16 04:12:08 +00:00
Johann
d5054504a7 zero structures completely
Use vp[89]_zero when possible.

Expand the {} set when neither is available or nearby.

Change-Id: Ifc1f46f60100916cd798bf7be3a10f09321c99bd
2016-09-16 03:54:11 +00:00
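Illustrative note on the commit above: a sketch of zeroing a whole structure with one memset instead of assigning members one by one. The macro name zero_struct and the example_cfg type are hypothetical. The vp[89]_zero helpers named in the message are assumed to follow a similar memset-over-sizeof pattern.

```c
#include <assert.h>
#include <string.h>

/* Illustrative zeroing helper: clears every byte of the object. */
#define zero_struct(dest) memset(&(dest), 0, sizeof(dest))

/* Hypothetical structure used only for this example. */
typedef struct {
  int w;
  int h;
  int counts[4];
} example_cfg;

static example_cfg make_zeroed_cfg(void) {
  example_cfg cfg;
  zero_struct(cfg); /* covers members added later as well */
  return cfg;
}
```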
Johann
1d2aaf58dd vp8 postproc: expand CONFIG_POSTPROC guard
postproc.c is overloaded and used for both postproc and internal stats.
If only --enable-internal-stats is specified there are issues with
non-existent struct members and unused functions.

Change-Id: I82367f1ffce659c3918c9f964dbce94a716fbb89
2016-09-16 03:52:19 +00:00
Johann
f2be831885 altref test: comment out 'pass'
All the other tests which do not use 'pass' (which appears to be almost
all of them) do this.

Cleans -Wextra/-Wunused-parameter:
unused parameter ‘pass’

Change-Id: I1ff3acf3f3d1e831f94dcb00ea36337afe0aefe0
2016-09-15 17:45:47 -07:00
Johann Koenig
c53aacf408 Merge "vp9 frame parallel test: Initialize cfg differently" 2016-09-15 23:46:56 +00:00
Marco
4c1a9fb8db vp9: Small code cleanup.
Remove the experiment LIMIT_QP_ONEPASS_VBR_LAG, as it is not currently
used and there is no plan to use it in the near future.

Change-Id: Ib069f8d7225195be04b765d0ab477510dfba6a3b
2016-09-15 15:17:17 -07:00
clang-format
5f6d143b41 apply clang-format
Change-Id: I501597b7c1e0f0c7ae2aea3ee8073f0a641b3487
2016-09-15 15:07:53 -07:00
James Zern
30b1abd6e6 .clang-format: update to 3.8.1
based on --style=Google with the following differences:
3a4
> # Generated with clang-format 3.8.1
13c14
< AllowShortCaseLabelsOnASingleLine: false
---
> AllowShortCaseLabelsOnASingleLine: true
41c42
< ConstructorInitializerAllOnOneLineOrOnePerLine: true
---
> ConstructorInitializerAllOnOneLineOrOnePerLine: false
44,45c45,46
< Cpp11BracedListStyle: true
< DerivePointerAlignment: true
---
> Cpp11BracedListStyle: false
> DerivePointerAlignment: false
73c74
< PointerAlignment: Left
---
> PointerAlignment: Right
75c76
< SortIncludes:    true
---
> SortIncludes:    false

SortIncludes will likely be enabled in a future commit

Change-Id: I5c404f44081b65354e7f526411c91fbbe31ac5af
2016-09-15 15:05:52 -07:00
Johann
d9dce2f48e Restore vp8_sixtap_predict4x4_neon
This function was removed when clang started introducing alignment hints
which caused the 32 bit vld1_lane_u32/vst1_lane_u32 to fail:
https://llvm.org/bugs/show_bug.cgi?id=24421

The load has been rendered safe with an implementation ~indiscernible
performance-wise that uses _u8 and over-reads just a touch.

The store, when unaligned, has a version that is ~25% slower but safe
when xoffset = 0 (second pass filter only). When the first pass filter
(or both) are in play, the new version is almost identical in speed.

Worst case performance (both filters, unaligned stores) is roughly 3-4x
faster than C.

BUG=webm:817
BUG=webm:1273

Change-Id: I1e490e94453e0872151fe0dafb05557463f6247d
2016-09-15 14:56:47 -07:00
Johann
284cb5314e vp9 frame parallel test: Initialize cfg differently
Use the canonical 'vpx_codec_dec_cfg_t()' as opposed to 'vp9_zero()'
which just hammered everything to 0.

Change-Id: Id820efef700ad92a625797f8fd58e465b15eeca4
2016-09-15 12:19:25 -07:00
Johann Koenig
ee01b78ddd Merge "Documentation for building unit tests for Android" 2016-09-15 19:17:14 +00:00
Johann
a3400f4376 Documentation for building unit tests for Android
BUG=webm:1258

Change-Id: Iea142f7b0df0e047720e8c5362464932de57d564
2016-09-15 19:16:14 +00:00
James Zern
4282d29355 Merge "cosmetics,vp8: join some lines, fix table format" 2016-09-14 00:41:51 +00:00
Johann
4c6819d0fc vp8 decoder: cast decoding_thread_count to int
For some reason allocated_decoding_thread_count is signed, but decoding_thread_count is not.

Cleans -Wextra/-Wsign-compare:
comparison between signed and unsigned integer expressions

Change-Id: Id0ada78100acff27c1c4ed7493c563d13c55cdcd
2016-09-13 14:51:14 -07:00
Johann
75fe2d4409 vp9 frame parallel test: Initialize cfg to 0
Use vp9_zero() to set every element.

Cleans -Wextra/-Wmissing-field-initializers:
missing initializer for member ‘vpx_codec_dec_cfg::w’
missing initializer for member ‘vpx_codec_dec_cfg::h’

Change-Id: I5b41ce7d55a912e29b1d4c3e840cea80e8510fbe
2016-09-13 14:51:14 -07:00
Johann
db32581650 vp9cx_set_ref.c: remove unused 'cfg' parameter
Cleans -Wextra/-Wunused-parameter warning:
warning: unused parameter ‘cfg’

Change-Id: I84eae57a50306cb66c625bb648b0a330678818db
2016-09-13 14:51:06 -07:00
Johann
bce23ab36b webmenc: remove unused 'fps' parameter
Cleans -Wextra/-Wunused-parameter warning:
warning: unused parameter ‘fps’

Change-Id: Ia5f9338f11ae8d0708a87c6d4e7d7e924fc3b19b
2016-09-13 14:25:40 -07:00
James Zern
6eca31be5f vpx_codec_control: return incapable for unmatched control
VPX_CODEC_INCAPABLE rather than the more generic VPX_CODEC_ERROR

Change-Id: Id1ed7fb23a2910192713c6b2389c0b7320201f52
2016-09-09 17:40:10 -07:00
James Zern
a22a455899 cosmetics,vp8: join some lines, fix table format
Change-Id: Idcf3b68f0e59bd74c9d332bbd4a7c1484ddb691a
2016-09-09 16:39:34 -07:00
Marco
421f376568 vp8: Set the skin model to mode 1.
This change was reverted before due to a hangouts encode-time
regression investigation. But since then this change has been
cleared of causing any noticeable regression.

This mode reduces some false detection, and uses the
same model as in vp9.

Change-Id: I9c82a748c5f601d0aca9f61ee218abfbd58c62bd
2016-09-09 09:09:43 -07:00
James Zern
66241b9579 Merge "vp8: Remove TSAN warning around end of encode." 2016-09-09 03:08:18 +00:00
Alexander Potapenko
948a1f51d0 vp8: Remove TSAN warning around end of encode.
Tsan warns when run in one pass and there is a recode
loop.

Change-Id: Ice2ecb2270f09ebd49efbd49c0e4f77d32e23c0f
2016-09-08 14:36:32 +02:00
James Zern
4b0e78bfda Merge "vpx_dsp: added vpx_highbd_idct32x32_1_add_sse2()" 2016-09-08 01:05:18 +00:00
James Zern
bcbc4761fa vpx_mem.c: remove unnecessary inline
these aren't overly speed critical, best to leave it to the compiler.

Change-Id: I231c14abee5b845d7b8e8454832f2feb22c6ce45
2016-09-07 12:49:21 -07:00
Scott LaVarnway
309125b1e7 vpx_dsp: added vpx_highbd_idct32x32_1_add_sse2()
Change-Id: I140d93aebadb0eaf6220881e61a0451450081227
2016-09-07 05:58:29 -07:00
Sarah Parker
c892521b1d Fix missing write to opsnr in internal stats
Change-Id: I21c8ad0b5ed7f8d843cae45c18f5727bceb8f859
2016-09-03 12:15:32 -07:00
James Zern
4a25b59bbd Merge "invalid_file_test: quiet -Wunused-const-variable warnings" 2016-09-03 01:14:55 +00:00
James Zern
e6f0c26268 invalid_file_test: quiet -Wunused-const-variable warnings
present when --disable-vp8(-decoder) or --disable-vp9(-decoder) was used

Change-Id: I31ebb7a55c6f1af3c744982f56b78e80116cc845
2016-09-01 19:54:34 -07:00
James Zern
3d253b0c71 vp8_cx_iface: quiet -Wshorten-64-to-32 warning
set_reference_and_update(): use the correct type for flags,
vpx_enc_frame_flags_t

Change-Id: I257da784537ff18686f6db8665f99af6ea6a86ba
2016-09-01 19:54:00 -07:00
James Zern
d6d3d4ba31 get_cpu_count: quiet -Wshorten-64-to-32 warnings
sysconf returns a long; cast (unsigned) dwNumberOfProcessors to int for
good measure

Change-Id: I1f181d7bd9a060c0898db41f66a5065394afdc4e
2016-09-01 19:54:00 -07:00
Johann Koenig
4d1540f8ce Merge changes from topic 'Wundef'
* changes:
  Enable -Wundef by default
  Define VP8_TEMPORAL_ALT_REF to !CONFIG_REALTIME_ONLY
  Remove CONFIG_DEBUG guards from assert()
  Remove unused function vpx_de_mblock
  Fix -Wundef warning for OUTPUT_FPF
  Fix -Wundef warning for __SANITIZE_ADDRESS__
2016-09-02 01:39:18 +00:00
Yaowu Xu
594e53514b Merge "Fix formatting in internal stats for vp8 and vp9" 2016-09-01 23:55:23 +00:00
Yaowu Xu
454139ae13 Merge "Casts to remove some warnings." 2016-09-01 23:37:04 +00:00
Debargha Mukherjee
a6bc3dfb0f Merge "Refactor uv tx size with lookup arrays" 2016-09-01 16:46:32 +00:00
Paul Wilkins
009116cb6f Merge "Modified resize loop constraints." 2016-09-01 15:59:30 +00:00
paulwilkins
3e9e77008c Casts to remove some warnings.
Added casts to remove warnings:
BUG=webm:1274

Regarding the safety of these casts, they are of two types:

- Normalized bits per (16x16) MB stored in a 32 bit int. This is safe as bits
per MB, even with << 9 normalization, can't overflow 32 bits. Even a raw 12
bit hdr source would only need 29 bits (4+4+12+9), and the encoder
imposes much stricter limits than this on max bit rate.

- Cast as part of variance calculations. There is an internal cast up to 64 bit
for the Sum X Sum calculation, but after normalization (dividing by the number
of points) the result will always be <= the SSE value.

Change-Id: I4e700236ed83d6b2b1955e92e84c3b1978b9eaa0
2016-09-01 16:10:12 +01:00
Johann
4d1c117f5b Enable -Wundef by default
BUG=webm:1069

Change-Id: I43728f9fd007542718a55d5fdcbc63a8d2f86682
2016-08-31 23:01:57 -07:00
Johann
1139f0dbc2 Define VP8_TEMPORAL_ALT_REF to !CONFIG_REALTIME_ONLY
Previously VP8_TEMPORAL_ALT_REF was only defined for non-realtime-only
builds. However, its value was checked with #if, not #ifdef.

Fixes -Wundef warnings.

BUG=webm:1069

Change-Id: If78d8731298f3f0d3662ffa25f973e7adaf67152
2016-08-31 23:01:57 -07:00
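Illustrative note on the commit above: when a macro is always defined to 0 or 1, testing it with #if never evaluates an undefined name, which is what -Wundef warns about. The CONFIG_REALTIME_ONLY value and the function name below are assumptions for this example.

```c
#include <assert.h>

#define CONFIG_REALTIME_ONLY 1 /* example build setting */

/* Defined in every configuration, so "#if" always sees a value. */
#define VP8_TEMPORAL_ALT_REF !CONFIG_REALTIME_ONLY

/* Returns 1 when the temporal alt-ref path would be compiled in. */
static int temporal_alt_ref_enabled(void) {
#if VP8_TEMPORAL_ALT_REF
  return 1;
#else
  return 0;
#endif
}
```

With CONFIG_REALTIME_ONLY set to 1 the condition expands to !1, so the function returns 0.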
Johann
18b6691105 Remove CONFIG_DEBUG guards from assert()
When 'NDEBUG' is set, assert() generates no code.

Change-Id: Icf61cfc1a8f6e5f0770b3626d8c73ae968df1108
2016-08-31 23:01:57 -07:00
Johann
24f534ac90 Remove unused function vpx_de_mblock
vpx_config.h was not included so CONFIG_POSTPROC was never defined.

Change-Id: I777de499823afa286734549a8e7f4a93e7ad97f3
2016-08-31 23:01:45 -07:00
Johann
7b3c2e3269 Fix -Wundef warning for OUTPUT_FPF
BUG=webm:1069

Change-Id: I3d13d07cf0934e6e262c8033bd77d7197d03ce21
2016-08-31 22:59:59 -07:00
Johann
42ccd79b27 Fix -Wundef warning for __SANITIZE_ADDRESS__
BUG=webm:1069

Change-Id: Iad8811939a910a8f31cf5788220712a255ddf36a
2016-08-31 22:59:53 -07:00
Linfeng Zhang
113f9721d1 Merge "Rename test/lpf_8_test.cc to test/lpf_test.cc" 2016-08-31 22:46:07 +00:00
Linfeng Zhang
5399613889 Rename test/lpf_8_test.cc to test/lpf_test.cc
It actually tests lpf functions of all sizes.

Change-Id: Ie31798f90165e6e0c13cbac0e0ab9648ab568bce
2016-08-31 15:16:48 -07:00
Linfeng Zhang
bee7d837ab Update NEON transpose functions.
Unify coding style.

Change-Id: I5826f40c02c882df7353391e0c9dd6cef6bd4b97
2016-08-31 14:58:40 -07:00
Debargha Mukherjee
e6446b4b60 Refactor uv tx size with lookup arrays
Change-Id: Ife6a3d301c5faaba89d16d188d638631083511f7
2016-08-31 13:15:38 -07:00
Linfeng Zhang
3dfba04dec Merge "Update vpx_lpf_vertical_16_dual_neon() intrinsics" 2016-08-31 19:41:25 +00:00
paulwilkins
6fc07a217d Modified resize loop constraints.
Using a tighter resize constraint on undershoot seems to help
results (especially SSIM) as significant undershoot on a frame
seems to have more of a damaging impact than overshoot.

This patch has been tuned so that in local testing using the
derf set it is encode speed neutral for speed setting 2.

Average quality results for speed 2 (psnr, ssim) were as follows:

 lowres   0.039, 0.453
 midres   0.249, 0.853
 hdres    0.159, 0.659
 NetFlix -0.241, 0.360

Change-Id: Ie8d3a0d7d6f7ea89d9965d1821be17f8bda85062
2016-08-31 12:45:49 +01:00
Jim Bankoski
66b2266a22 libyuv: update to de944ed8c74909ea6fbd743a22efe1e55e851b83
Fixes windows build issue:
==> tests::VS10_x64 is broken
         LINK : warning C4742: 'kYvuI601Constants' has different alignment in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': 32 and 2 [.build-x86_64-win64-vs10\vpxdec.vcxproj]
         LINK : warning C4744: 'kYvuI601Constants' has different type in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': '__declspec(align(32)) struct (224 bytes)' and 'struct (224 bytes)' [.build-x86_64-win64-vs10\vpxdec.vcxproj]
         LINK : warning C4742: 'kYuvI601Constants' has different alignment in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': 32 and 2 [.build-x86_64-win64-vs10\vpxdec.vcxproj]
         LINK : warning C4744: 'kYuvI601Constants' has different type in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': '__declspec(align(32)) struct (224 bytes)' and 'struct (224 bytes)' [.build-x86_64-win64-vs10\vpxdec.vcxproj]
         LINK : warning C4742: 'kYvuI601Constants' has different alignment in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': 32 and 2 [.build-x86_64-win64-vs10\vpxenc.vcxproj]
         LINK : warning C4744: 'kYvuI601Constants' has different type in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': '__declspec(align(32)) struct (224 bytes)' and 'struct (224 bytes)' [.build-x86_64-win64-vs10\vpxenc.vcxproj]
         LINK : warning C4742: 'kYuvI601Constants' has different alignment in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': 32 and 2 [.build-x86_64-win64-vs10\vpxenc.vcxproj]
         LINK : warning C4744: 'kYuvI601Constants' has different type in 'third_party\libyuv\source\row_common.cc' and 'third_party\libyuv\source\planar_functions.cc': '__declspec(align(32)) struct (224 bytes)' and 'struct (224 bytes)' [.build-x86_64-win64-vs10\vpxenc.vcxproj]
         LINK : error C2220: warning treated as error - no 'executable' file generated [.build-x86_64-win64-vs10\vpxdec.vcxproj]
         LINK : error C2220: warning treated as error - no 'executable' file generated [.build-x86_64-win64-vs10\vpxenc.vcxproj]

Change-Id: Ic3c4fff9209f5a52ff8f8ff321548d49ba09ec06
2016-08-30 14:24:35 -07:00
Linfeng Zhang
f7cbfed682 Update vpx_lpf_vertical_16_dual_neon() intrinsics
Process 16 samples together.

Change-Id: If6ee8e3377aa2786417f2fc411ba7d87ea8b6799
2016-08-30 11:17:33 -07:00
Paul Wilkins
129814fcb4 Merge "Adjust coefficient optimization and tx_domain rd speed features." 2016-08-30 16:54:40 +00:00
Linfeng Zhang
3a3169be59 Merge "Update vpx_lpf_horizontal_edge_16_neon() intrinsics" 2016-08-29 21:37:07 +00:00
Marco Paniconi
e66cd132f0 Merge "vp8: Move loopfilter synchronization to end of encode_frame call." 2016-08-29 05:52:40 +00:00
Linfeng Zhang
4916515511 Update vpx_lpf_horizontal_edge_16_neon() intrinsics
Process 16 samples together.

Change-Id: I9cfbe04c9d25d8b89f63f48f519e812746db754d
2016-08-27 14:47:48 -07:00
James Zern
3a98508775 Merge "vpx_mem,align_addr: use ~ to create mask" 2016-08-27 21:27:45 +00:00
James Zern
19d881290d vpx_mem,align_addr: use ~ to create mask
removes the need for an intermediate cast to int, which was missing in
the call added in:
69c5ba1 vpx_mem: Refactor code

quiets a visual studio warning:
C4146: unary minus operator applied to unsigned type, result still
unsigned

Change-Id: I76c4003416759c6c76b78f74de7c0d2ba5071216
2016-08-27 11:39:18 -07:00
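Illustrative note on the commit above: a sketch of rounding an address up to a power-of-two alignment with a mask built by ~. The function name is hypothetical. In unsigned arithmetic ~(align - 1) has the same bit pattern as the negated alignment, so no unary minus on an unsigned value is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds addr up to the next multiple of align (a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  /* Add align - 1, then clear the low bits with the inverted mask. */
  return (addr + (align - 1)) & ~(align - 1);
}
```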
James Zern
2917737879 vp9_alt_ref_aq_set_nsegments: harmonize fn signature
Change-Id: I5f232664652a8dc3a71e43b8b1fa05ddb4a84ecc
2016-08-27 11:16:03 -07:00
Yury Gitman
507d272265 Move vp9_alt_ref_aq_private.h to vp9_alt_ref_aq.c
+ add a temporary dummy element to ALT_REF_AQ to avoid a warning about
an empty struct

Change-Id: Ib6e5c39ff62ad96eb4e3686d4882228a42b3843f
2016-08-27 10:53:41 -07:00
James Zern
a19b9b6185 Merge changes Ia81004d6,I74b80fb6,I38fcb62b,I2da9cd5d
* changes:
  vpx_mem: add basic size check
  vpx_mem: normalize function names
  vpx_realloc correction.
  vpx_mem: Refactor code
2016-08-26 23:52:04 +00:00
James Zern
ed11abbc36 Merge changes I353da4a2,I423f2153
* changes:
  vp8_decoder_create_threads: check sem/pthread returns
  vp8_create_decoder_instances: add missing setjmp
2016-08-26 23:48:08 +00:00
Johann Koenig
a70861c435 Merge "Remove halfpix specialization" 2016-08-26 21:28:01 +00:00
James Zern
58a497dc29 Merge "add_noise,vpx_setup_noise: correct 'char_dist' type" 2016-08-26 18:47:39 +00:00
James Bankoski
fcc4f3fa21 Merge "libyuv: update to c244a3e9" 2016-08-26 18:06:06 +00:00
Jingning Han
dd2a475e43 Merge "Fix VS build warnings in vp9_alt_ref_aq files" 2016-08-26 17:19:12 +00:00
Paul Wilkins
badd32d914 Merge "Add ALLOW_RECODE_FIRST speed mode." 2016-08-26 15:46:45 +00:00
Jingning Han
84fccfe475 Fix VS build warnings in vp9_alt_ref_aq files
Change-Id: I5b19ec00a1eb8b148026f665d217c12eb50b614a
2016-08-26 08:43:36 -07:00
paulwilkins
dc42f343ae Add ALLOW_RECODE_FIRST speed mode.
This patch addresses concerns that allowing recodes on the first
frame in each ARF group does not give a good enough speed/quality
trade-off for speed 2. Though the average impact on encode speed is
1-2%, for some hard clips the rise is > 5%. For speed 1 this is less
of an issue, and for speed 0 the previous patch actually improves speed.

Change-Id: Ie1bcefdbfdf846d3f4428590173f621465dffe3a
2016-08-26 11:43:47 +01:00
James Zern
a91fe33c6d Merge "vp8: fix decoder crash with invalid leading keyframes" 2016-08-26 07:01:42 +00:00
Sarah Parker
37e83789f1 Fix formatting in internal stats for vp8 and vp9
This corrects a formatting error introduced in:
I1e9d548ce445d29002f0c59ebfd3957a6f15e702
where spaces were used as delimiters instead of tabs.

The corresponding fix for vp10 is in
Ica3d625d6672b3c47e0e208b45eede29b9004030.

Change-Id: Ibc4eb8fd82e6b926ba259a679dc98557cadba9b1
2016-08-25 17:46:18 -07:00
Marco
b6a5f6f740 vp8: Move loopfilter synchronization to end of encode_frame call.
Allow loopfilter to continue until encode_frame is completed.

Change-Id: I7bbccc3d409e263aab6a6ff24588d8b2a964a96e
2016-08-25 12:37:30 -07:00
Yury Gitman
292d221fed Create interface for the ALT_REF_AQ class
The current commit is just an API template for the rest of the code;
the inner logic will be added later.

Altref frames generate a lot of bitrate and at the same time other
frames refer to them a lot, so it makes sense to apply a special
compensation-based adaptive quantization scheme for altref frames.
E.g., for blocks that are good predictors for the future, apply the
quantizer chosen by rate control, while for bad predictors apply a
worse one.

Change-Id: Iba3f8ec349470673b7249f6a125f6859336a47c8
2016-08-25 10:55:14 -07:00
Yury Gitman
c018032579 Merge "Add --alt-ref-aq=<int> option" 2016-08-25 17:49:41 +00:00
paulwilkins
635ae8bdc1 Adjust coefficient optimization and tx_domain rd speed features.
Previously Tx domain rd was used in all cases above speed 0.
Coefficient optimization was only enabled for best and speed 0.

This patch selectively sets these features at other speed settings
based on block complexity.

For the Netflix and HD sets in particular the quality gains are
large compared to the speed hit. At speed 1 the average psnr
gain in the NF set is > 2.5%, with one clip coming in at 18%
and some points almost 30%. Average gains for the lower
resolution test sets are around 1%.

The gains are biggest at low Q so some further optimization
may be possible.

Change-Id: I340376c7b2a78e5389a34b7ebdc41072808d0576
2016-08-25 15:36:16 +01:00
Jim Bankoski
6d7a9f3e9c libyuv: update to c244a3e9
Fixes color issue when scaling without breaking mingw.

BUG=https://bugs.chromium.org/p/libyuv/issues/detail?id=605
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1252

Change-Id: I09437d93fd65964ad57113274d8c819f3eaf2e57
2016-08-25 06:39:38 -07:00
James Zern
3ddff4503a add_noise,vpx_setup_noise: correct 'char_dist' type
fixes SSE2/AddNoiseTest.CheckCvsAssembly/0 with -funsigned-char.
visibly broken since:
0dc69c7 postproc : fix function parameters for noise functions.
where the types diverged (char vs. int8)
but likely the return changed in:
2ca24b0 postproc - move filling of noise buffer to vpx_dsp.
when multiple implementations were merged.

Change-Id: I176ca1f170217f05ba7872b0c4de63e41949e999
2016-08-24 21:46:26 -07:00
Marco Paniconi
ce634bbf4d Merge "Add datarate tests for encoder multi-threads (vp8 and vp9)." 2016-08-25 03:13:36 +00:00
James Zern
4699aca87f vpx_mem: add basic size check
set a max allocable size to prevent overflows in 32-bit and extremely
large allocation attempts in 64-bit. this could be amended to allow size
or num parameters to be 64-bits with the correct size being used at each
call site.

BUG=webm:819

Change-Id: Ia81004d6c4279680714c4488b4f6cf287ab396a5
2016-08-24 19:22:57 -07:00
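The size check described in 4699aca87f above can be sketched as follows. This is an illustration under stated assumptions: the cap `SKETCH_MAX_ALLOCABLE` and the helper names are hypothetical, and the actual limit used by libvpx is not given in the log.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cap, for illustration only; the real value is not in the log. */
#define SKETCH_MAX_ALLOCABLE ((uint64_t)1 << 30)

/* Returns 1 when nmemb * size stays at or under the cap, 0 otherwise.
 * The comparison uses a division so the product is never formed and
 * therefore cannot overflow. */
static int check_size(uint64_t nmemb, uint64_t size) {
  if (nmemb == 0 || size == 0) return 1;
  if (nmemb > SKETCH_MAX_ALLOCABLE / size) return 0;
  return 1;
}

/* Hypothetical wrapper: refuse oversized requests before calling calloc. */
static void *checked_calloc(size_t nmemb, size_t size) {
  if (!check_size(nmemb, size)) return NULL;
  return calloc(nmemb, size);
}
```

Doing the check in 64-bit arithmetic covers both cases named in the commit message: a 32-bit product that would wrap, and a 64-bit request that is simply too large.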
James Zern
963291217f vpx_mem: normalize function names
use lower case + '_' rather than capital followed by camel case

Change-Id: I74b80fb660d281228e25edc8b6509455ffe2920e
2016-08-24 19:22:56 -07:00
Urvang Joshi
28c6207bcd vpx_realloc correction.
vpx_realloc was allocating 1 byte more than needed every time.
Fixed this, and took this opportunity to do a small refactoring.

Change-Id: I38fcb62b698894acbbab43466c1decd12f906789
(cherry picked from aom: 2a876b4 aom_realloc correction.)
2016-08-24 19:22:52 -07:00
Urvang Joshi
69c5ba1910 vpx_mem: Refactor code
Change-Id: I2da9cd5da48ae97e770bccfd1233bcc70b484688
(cherry picked from aom: 83c95f5 aom_mem: Refactor code)
2016-08-24 19:22:41 -07:00
Marco
dde8004716 Add datarate tests for encoder multi-threads (vp8 and vp9).
Change-Id: I7f9b23026aaee309095cc3f4724125ae319875af
2016-08-24 16:25:36 -07:00
Yury Gitman
d7c20079a6 Add --alt-ref-aq=<int> option
In the future this option will activate adaptive quantization
specifically for altref frames. The encoder will create the adaptive
quantization map on the basis of the similarity of the lookahead
buffers, which is an estimate of future motion compensation performance.

Change-Id: Ia0088b3babb0f9a4899c79d8d819947ba5a03df2
2016-08-24 15:49:25 -07:00
Jacky Chen
5260a6675e Merge "vp9: Refactor set_low_temp_var_flag." 2016-08-24 22:02:53 +00:00
James Zern
a6efe6d437 vp8_decoder_create_threads: check sem/pthread returns
Change-Id: I353da4a2f988ca51d48d0ca91236e8cc0bb48ff5
2016-08-23 19:19:57 -07:00
James Zern
13338a481f vp8_create_decoder_instances: add missing setjmp
vp8_decoder_create_threads() has allocations that expect one is set.

Change-Id: I423f2153a2969c88d48ba45cc9ead4a01443ce65
2016-08-23 18:29:42 -07:00
Johann
d393885af1 Remove halfpix specialization
This function only exists as a shortcut to subpixel variance with
predefined offsets. xoffset = 4 for horizontal, yoffset = 4 for vertical
and both for "hv"

Removing this allows the existing optimizations for the variance
functions to be called. Instead of having only sse2 optimizations, this
gives sse2, ssse3, msa and neon.

BUG=webm:1273

Change-Id: Ieb407b423b91b87d33c4263c6a1ad5e673b0efd6
2016-08-23 17:05:39 -07:00
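The equivalence behind d393885af1 above (half-pixel variance as subpixel variance with offset 4) can be checked with a small scalar sketch. It assumes two-tap bilinear weights of the form (128 - 16 * offset, 16 * offset) with 7 bits of precision; the function names are hypothetical and this is not the libvpx code.

```c
#include <assert.h>
#include <stdint.h>

/* Two-tap bilinear interpolation between neighbours a and b.
 * Assumption: offset is in 0..7 and the weights sum to 128. */
static uint8_t bilinear(uint8_t a, uint8_t b, int offset) {
  const int w1 = 16 * offset;
  const int w0 = 128 - w1;
  return (uint8_t)((a * w0 + b * w1 + 64) >> 7);
}

/* Dedicated half-pixel shortcut: rounded average of the two neighbours. */
static uint8_t half_pixel(uint8_t a, uint8_t b) {
  return (uint8_t)((a + b + 1) >> 1);
}
```

At offset 4 both weights are 64, so (64 * (a + b) + 64) >> 7 reduces to (a + b + 1) >> 1, which is why a separate half-pixel path adds nothing beyond the general subpixel functions.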
James Zern
0f42d1fa85 vp8: fix decoder crash with invalid leading keyframes
decoding the same invalid keyframe twice would result in a crash as the
second time through the decoder would be assumed to have been
initialized as there was no resolution change. in this case the
resolution was itself invalid (0x6), but vp8_peek_si() was only failing
in the case of 0x0.
invalid-vp80-00-comprehensive-018.ivf.2kf_0x6.ivf tests this case by
duplicating the first keyframe and additionally adds a valid one to
ensure decoding can resume without error.

BUG=b/30593765

Change-Id: If0859035908b7870d67a7f3f646b5a080252eb6d
2016-08-23 16:27:52 -07:00
Yury Gitman
c325fb748a Correct CHECK_MEM_ERROR macro (release builds)
The previous macro doesn't work with &cpi->common as a first argument

Change-Id: Iddf7a1f5d56d7abafd9b2b8707aa611d349e7a68
2016-08-23 22:46:04 +00:00
jackychen
8d4c0ec1f1 vp9: Refactor set_low_temp_var_flag.
No need to pass in force_split, since we should use sb_type in the
condition.

Change-Id: Ide27243ef46e017bbb98d676347fc566a6c828f7
2016-08-23 15:11:40 -07:00
Yunqing Wang
f6c5410cd4 Merge "Disable split mode in 4k video encoding" 2016-08-23 15:35:33 +00:00
Yunqing Wang
ef98f49cb0 Disable split mode in 4k video encoding
Disabled the split mode while encoding 4k video to speed
up the encoder.

Borg test result on 4k set:
Overall PSNR: +0.029%; SSIM: +0.009%.
Average encoder speedup at speed 2 is 2.5%.

Change-Id: I1519c658f07c3ac838affbe5aff0ed9b94f3f8f4
2016-08-22 19:46:44 -07:00
Yury Gitman
bf7a02a4cf Correct CHECK_MEM_ERROR macro
The previous macro doesn't work with &cpi->common as a first argument

Change-Id: Ic3f5c49a94cf8b17de6569811b957c963341bb58
2016-08-22 14:25:57 -07:00
Marco Paniconi
f5bd76f5c1 Merge "Revert "vp8: Move loopfilter synchronization to end of encode_frame call."" 2016-08-22 15:46:57 +00:00
Marco Paniconi
de075a95e0 Revert "vp8: Move loopfilter synchronization to end of encode_frame call."
This reverts commit c2fe9acced.

This change breaks the Linux browser test in chromium:
https://build.chromium.org/p/chromium.webrtc/builders/Linux%20Tester

Change-Id: I226782fad480c17a99ec6c785ad93cf4ab88f0ae
2016-08-22 15:46:20 +00:00
Yunqing Wang
37169c0bd4 Merge "Adjust speed features for 4k video encoding" 2016-08-19 23:11:05 +00:00
Yunqing Wang
fe488cceff Adjust speed features for 4k video encoding
Adjusted speed 2 features to speed up 4k video encoding.
BDBR results from borg test:
PSNR: +0.313%; SSIM: +0.268%.
Average speedup: 8.5%

Change-Id: I1e2695a01fb3f3817c1df4480e184c2aed8f2eba
2016-08-19 09:30:32 -07:00
James Zern
149d082377 vp9_pickmode: quiet float conversion warnings
Change-Id: I591e4f958955b3f2edb2f95a83c54cd83c8ef075
2016-08-19 01:28:01 -07:00
James Zern
8b4c31584e vp9_alloc_context_buffers: clear cm->mi* on failure
this fixes a crash in vp9_dec_setup_mi() via
vp9_init_context_buffers() should decoding continue and the decoder
resyncs on a smaller frame

BUG=b/30593752

Change-Id: I9ce8d94abe89bcd058697e8bd8599690e61bd380
2016-08-19 00:18:11 -07:00
Jacky Chen
52db2b1690 Merge "vp9 svc: SVC encoder speed up." 2016-08-18 21:21:29 +00:00
Johann Koenig
33dedd0628 Merge "Remove '-chromium' flag from ads2gas_apple.pl" 2016-08-18 19:54:55 +00:00
JackyChen
8be7e572a7 vp9 svc: SVC encoder speed up.
Bias towards base_mv and skip 1/4 pixel motion search when using base mv.
2~3% speed up for 2 spatial layers, 3~5% speed up for 3 spatial layers.
PSNR loss:
(2 layers) 0.07dB for gips_stationary, 0.04dB for gips_motion;
(3 layers) 0.07dB for gips_stationary, 0.06dB for gips_motion.

Change-Id: I773acbda080c301cabe8cd259f842bcc5b8bc999
2016-08-18 11:25:45 -07:00
Marco Paniconi
1c07abca18 Merge "vp9 non-rd pickmode: Add limit on newmv-last and golden bias." 2016-08-18 18:03:48 +00:00
Marco Paniconi
37a39ac138 Merge "vp8: Move loopfilter synchronization to end of encode_frame call." 2016-08-18 02:46:31 +00:00
Marco
7eb7d6b227 vp9 non-rd pickmode: Add limit on newmv-last and golden bias.
Add option, for newmv-last, to limit the rd-threshold update for early exit,
under a source variance condition.
This can improve visual quality in low texture moving areas,
like forehead/faces.

Also add bias against golden to improve the speed/fps,
with little/negligible loss in quality.

Only affects CBR mode, non-svc, non-screen-content.

Change-Id: I3a5229eee860c71499a6fd464c450b167b07534d
2016-08-17 14:33:44 -07:00
Johann
1b982cc64f Remove '-chromium' flag from ads2gas_apple.pl
The flag was added because Apple clang and Chromium clang disagreed
for certain versions of instructions.

qsubaddx, qaddsubx, ldrneb and ldrneh were used in armv6 assembly
which was removed in d55724fae9

vqshrun was used in some neon assembly but superseded by
dcbfacbb98

.include was used for obj_int_extract/asm_offsets and removed in
6eec73a747

Change-Id: I32f4c9b536d0318482101c0b8e91e42b8f545f18
2016-08-17 14:05:16 -07:00
paulwilkins
af3b0de732 Add casting to fix warning.
Frame bits can safely be stored in an int, but group bits
(kf or arf) use 64 bits.

Change-Id: I0800f2a28070f8749110a95721c116fc56987885
2016-08-17 11:18:07 +01:00
paulwilkins
ab7cd6d068 Add {} to try and keep Jenkins happy.
Change-Id: If1ca3cf83e058317c9751d7da6caa7cd75eb6845
2016-08-17 11:17:36 +01:00
Marco
c2fe9acced vp8: Move loopfilter synchronization to end of encode_frame call.
Change-Id: I5bdfea7f51df1f1fa5d9c1597e96988acce6c2f2
2016-08-16 11:22:23 -07:00
Linfeng Zhang
f9efbad392 NEON asm of vpx_lpf_{horizontal,vertical}_8_dual_neon()
Also expose the NEON intrinsics version.

BUG=webm:1261, webm:1266.

Change-Id: I8c4ae658467dcf66ebf7a75982b2ef712dbb4535
2016-08-16 08:50:57 -07:00
paulwilkins
5d881770e5 Change default recode rule for good speed 0 and best.
Changes the default recode rule for Speed 0 and best quality
from ALLOW_RECODE to ALLOW_RECODE_KFARFGF.

Tested on the NF, hdres, midres and lowres test sets, this setting
when combined with patch I40cb559... now performs "as well" in
metrics terms (in fact it came out a tiny amount better overall)
but encode time is 9.6% faster (measured as the average
from 27 mid rate local encodes on clips in the derf/lowres set).

Change-Id: I8c781c0cdfa3a9929cd9406d15582fce47d6ae3b
2016-08-15 10:52:54 +01:00
paulwilkins
de3b769524 Change to recode rules.
Allow recodes for the first inter frame in each arf group
even when the recode rule is set to ALLOW_RECODE_KFARFGF.

Small gains of 0.05%.

Change-Id: I40cb559d36a2bf0ebf5cf758c3f92e452b480577
2016-08-15 10:52:02 +01:00
Paul Wilkins
fe4dd4f43f Merge "Modified ARF group allocation." 2016-08-15 09:42:30 +00:00
Yunqing Wang
fafec95702 Merge "Fix another motion vector out of range bug" 2016-08-12 23:52:14 +00:00
James Zern
dfcefe06fa Merge "variance_impl_avx2: restore table layout" 2016-08-12 23:02:27 +00:00
James Zern
bd7cfb46fb variance_impl_avx2: restore table layout
disable clang-format for bilinear_filters_avx2

restores the row layout prior to:
099bd7f vpx_dsp: apply clang-format
but keeps the justification used by clang-format

Change-Id: Icf1733a37edb807e74c26b23a93963c03bd08fd7
2016-08-12 11:52:53 -07:00
Linfeng Zhang
f09b5a3328 NEON intrinsics for 4 loopfilter functions
New NEON intrinsics functions:
vpx_lpf_horizontal_edge_8_neon()
vpx_lpf_horizontal_edge_16_neon()
vpx_lpf_vertical_16_neon()
vpx_lpf_vertical_16_dual_neon()

BUG=webm:1262, webm:1263, webm:1264, webm:1265.

Change-Id: I7a2aff2a358b22277429329adec606e08efbc8cb
2016-08-12 09:58:17 -07:00
Yunqing Wang
a413dbe594 Fix another motion vector out of range bug
This patch fixed a motion vector out of range bug:
vpxenc: ../libvpx/vp9/encoder/vp9_mcomp.c:69:
 mv_cost: Assertion `mv->col >= -((1 << (11 + 1 + 2)) - 1) &&
 mv->col < ((1 << (11 + 1 + 2)) - 1)' failed.

For blocks that returned without having full-pixel search, the original
MV limits were not restored, which caused the failure. Moved the set
MV limit function down to fix the bug.

Change-Id: Id7d798fc7214e95c6e4846c588f0233fcf1a4223
2016-08-12 09:27:58 -07:00
Marco
f1e12c1bf3 vp8: Fix denoiser setting in multi-res sample encoder.
Change-Id: I9222f3b252e5ed883659f1a14cd705944ee9da07
2016-08-10 16:22:08 -07:00
paulwilkins
656f4a88cf Modified ARF group allocation.
Small average gains in the range 0.05 - 0.1

Change-Id: I30e85c04be615cc84726427c5057388b20a6ff60
2016-08-10 14:22:01 -07:00
Aleksey Vasenev
343b6b09a1 Align thread entry point stack
_beginthreadex does not align the stack on 16-byte boundary as expected
by gcc.

On x86 targets, the force_align_arg_pointer attribute may be applied to
individual function definitions, generating an alternate prologue and
epilogue that realigns the run-time stack if necessary. This supports
mixing legacy codes that run with a 4-byte aligned stack with modern
codes that keep a 16-byte stack for SSE compatibility.
https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html

Change-Id: Ie4e4ab32948c238fa87054d5664189972ca6708e
Signed-off-by: Aleksey Vasenev <margtu-fivt@ya.ru>
2016-08-10 11:57:34 -07:00
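A minimal sketch of the attribute described in 343b6b09a1 above. The macro name `SKETCH_ALIGN_STACK` and the entry function are hypothetical, and the guard is an assumption chosen so the file still compiles on other compilers and targets.

```c
#include <assert.h>

/* Apply the attribute only with GCC-compatible compilers on 32-bit x86,
 * where a caller such as _beginthreadex may hand over a 4-byte aligned
 * stack. Elsewhere the macro expands to nothing. */
#if defined(__GNUC__) && defined(__i386__)
#define SKETCH_ALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define SKETCH_ALIGN_STACK
#endif

/* Hypothetical thread entry point. With the attribute, the compiler emits
 * a prologue that realigns the stack to 16 bytes before the body runs, so
 * SSE code called from here can rely on aligned spills. */
static SKETCH_ALIGN_STACK unsigned int thread_entry(void *arg) {
  int *counter = (int *)arg;
  *counter += 1;
  return 0;
}
```

Only the function that the OS thread API calls directly needs the attribute; everything it calls inherits the realigned stack.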
James Zern
4916a87bfc Merge changes I1d3edbdb,I8b49fd05
* changes:
  tests: use scoped_ptr for local video source vars
  y4m_test: init members in the constructor
2016-08-10 00:05:58 +00:00
Alex Converse
941fe20336 Merge "Refactor mv limits." 2016-08-09 17:12:50 +00:00
James Zern
475e9d26e0 tests: use scoped_ptr for local video source vars
prevents leak warnings on ASSERT*() failures

Change-Id: I1d3edbdbb18dbbe3b17691971348a8121cf09afa
2016-08-08 14:43:14 -07:00
Yury Gitman
c37d012ada Merge "Add cpi parameter for forcing segmentation update" 2016-08-08 21:29:42 +00:00
James Zern
9e9722bc79 y4m_test: init members in the constructor
prevents use of an uninitialized value in the destructor should the
test fail before tmpfile_ is set.

Change-Id: I8b49fd05f0d05e055fdf653bd46983d30f466a68
2016-08-08 14:27:34 -07:00
Yury Gitman
7a730d5901 Add cpi parameter for forcing segmentation update
Change-Id: I1b0bcb1ffe7604117bfaa0b9989d0e25ff04d28c
2016-08-08 13:20:42 -07:00
James Zern
cfd92dab18 Merge changes from topic 'clang-tidy'
* changes:
  *_perf_test.cc: correct DoDecode signature
  test: apply clang-tidy google-readability-braces-around-statements
2016-08-08 20:12:42 +00:00
Alex Converse
6554333b59 Refactor mv limits.
Change-Id: Ifebdc9ef37850508eb4b8e572fd0f6026ab04987
2016-08-08 11:54:00 -07:00
Yunqing Wang
6a8d4631a8 Merge "Fix a motion vector out of range bug" 2016-08-08 17:59:50 +00:00
James Zern
2c17d54681 *_perf_test.cc: correct DoDecode signature
+ delete unused kMaxPsnr from decode_perf_test.cc

Change-Id: Id93347631e7870491069a8b7c5bb1f6b2828425f
2016-08-05 20:21:02 -07:00
clang-format
9c9d92ae3a test: apply clang-tidy google-readability-braces-around-statements
applied against a x86_64 configure with and without
--enable-vp9-highbitdepth

clang-tidy-3.7.1 \
  -checks='-*,google-readability-braces-around-statements' \
  -header-filter='.*' -fix
+ clang-format afterward

Change-Id: Ia2993ec64cf1eb3505d3bfb39068d9e44cfbce8d
2016-08-05 20:02:28 -07:00
Linfeng Zhang
2d1e63d0c5 Remove duplicates in Loop8Test6Param and Loop8Test9Param
Extract the duplicated data generation code in OperationCheck() of
Loop8Test6Param and Loop8Test9Param, and put in function InitInput().

Change-Id: Ied39ba4ee86b50501cc5d10ebf54f5333c4708f0
2016-08-05 19:51:01 -07:00
James Zern
c12f2f3187 Merge "remove tools/vpx-style.sh" 2016-08-06 01:23:13 +00:00
James Zern
19d2e73dea Merge changes Ice037acb,I806af11b,I344a7dd0,Ib7cb87fa
* changes:
  vp9: normalize vpx_enc_frame_flags_t usage
  args.c: add some explicit casts
  webmdec: quiet -Wshorten-64-to-32 warning
  test/decode_test_driver: rm unused deadline member
2016-08-06 01:20:52 +00:00
Linfeng Zhang
ba42ce64b7 Fix a bug in test/lpf_8_test.cc
This bug is introduced in 36608af524,
where buffer tmp_s is not fully initialized.

Change-Id: I125b966cf054a82bc63c72647cdd463f434eda17
2016-08-05 17:52:10 -07:00
Yunqing Wang
2fb826c4d5 Fix a motion vector out of range bug
This patch fixed a motion vector(MV) out of range bug, which was caused
by not restoring the original values of the MV min/max thresholds after
the sub8x8 full pixel motion search. It occurred rarely and only was seen
while encoding a 4k clip for 200 frames.

BUG=webm:1271

Change-Id: Ibc4e0de80846f297431923cef8a0c80fe8dcc6a5
2016-08-05 15:23:05 -07:00
James Zern
7104833085 vp9: normalize vpx_enc_frame_flags_t usage
quiets -Wshorten-64-to-32 warnings

Change-Id: Ice037acb675d1d81bfedf2dfcfa91a8a29a19dfd
2016-08-04 23:37:49 -07:00
James Zern
d772d55704 args.c: add some explicit casts
values are range checked before returning; quiets -Wshorten-64-to-32
warnings

Change-Id: I806af11b2aaf6760c7ab234a2fe2fdf40e7bece7
2016-08-04 23:37:49 -07:00
James Zern
c79665d0ad webmdec: quiet -Wshorten-64-to-32 warning
track->GetNumber() will fit in an int in well-behaved files

Change-Id: I344a7dd05d04daf3df2d67358ea69f8014a03a5b
2016-08-04 23:37:49 -07:00
James Zern
1b1e40c0b2 test/decode_test_driver: rm unused deadline member
has the side-effect of removing some lint and -Wshorten-64-to-32
warnings

Change-Id: Ib7cb87fa65cd65534096921f243d15288e97256d
2016-08-04 23:36:53 -07:00
James Zern
958ae5af9c remove tools/vpx-style.sh
update ftfy.sh to use clang-format

Change-Id: I8ac740c5b3842beed2b8878fbe506f381f4c57e4
2016-08-04 20:17:09 -07:00
Johann Koenig
57f49db81f Merge changes I6ef79702,Id332c641,I354b5d22,I84438013
* changes:
  Use common transpose for vpx_idct32x32_1024_add_neon
  Use common transpose for vpx_idct8x8_[12|64]_add_neon
  Use common transpose for vp9_iht8x8_add_neon
  Use common transpose for vpx_idct16x16_[10|256]_add_neon
2016-08-04 22:30:47 +00:00
Johann Koenig
17720b60bb Merge "Remove armv6 target" 2016-08-04 22:21:13 +00:00
James Zern
7f7c888c14 Merge "correct break placement" 2016-08-04 22:19:30 +00:00
Johann
0325b95938 Use common transpose for vpx_idct32x32_1024_add_neon
Change-Id: I6ef7970206d588761ebe80005aecd35365ec50ff
2016-08-04 20:13:18 +00:00
Johann
f4e4ce7549 Use common transpose for vpx_idct8x8_[12|64]_add_neon
Change-Id: Id332c641f05336ef9a45e17493ff149fd0a168f0
2016-08-04 20:13:12 +00:00
Johann
7103b5307d Use common transpose for vp9_iht8x8_add_neon
Change-Id: I354b5d22130d76b0eceda0748db1f871f58fa372
2016-08-04 20:13:03 +00:00
Johann
8619203ddc Use common transpose for vpx_idct16x16_[10|256]_add_neon
Change-Id: I84438013f483e82084d33ba9a63c33273d35fcaa
2016-08-04 20:12:53 +00:00
Johann Koenig
b757d89ff9 Merge "Extract neon transpose for re-use" 2016-08-04 20:12:38 +00:00
James Zern
4db9bd324d Merge "vp9_ratectrl.c: apply clang-format" 2016-08-04 20:01:46 +00:00
James Zern
70a7885a65 correct break placement
these should be placed within {}s when present

Change-Id: Ia775fac5373603e77360398f19b07958fb43f476
2016-08-04 13:00:14 -07:00
Johann Koenig
caac87b05b Merge "Don't expand to Q register for 4x4 intrapred" 2016-08-04 19:55:50 +00:00
Johann
d55724fae9 Remove armv6 target
Change-Id: I1fa81cc9cabf362a185fc3a53f1e58de533a41e5
2016-08-04 12:55:06 -07:00
Johann Koenig
476e8fc855 Merge "Pad 'Left' when building under ASan" 2016-08-04 19:27:45 +00:00
Linfeng Zhang
36608af524 Merge "Update Loop8Test{6,9}Param to test filter8() in mb_lpf_vertical_edge_w()" 2016-08-04 19:21:22 +00:00
Johann
377cfa31f0 Extract neon transpose for re-use
Change-Id: I5e1c7f4c80d1c6f7fd582ac468c6eaaa3603a06c
2016-08-04 19:04:25 +00:00
James Zern
374f0ff4a0 Merge changes from topic 'clang-format'
* changes:
  README: add a note about clang-format
  README: update target list
  README: fix typo
2016-08-04 19:03:03 +00:00
clang-format
3a4002b94d vp9_ratectrl.c: apply clang-format
after:
ff0a87c vp9 1pass vbr: Adjustment to gf interval.

Change-Id: I1296e53e601bf0c2b562e3a34082ac45c294a5f1
2016-08-04 11:57:00 -07:00
Johann
df69c751a7 Don't expand to Q register for 4x4 intrapred
The code was expanding to Q registers so that vqrshn could be used, for
vector quad round shift and narrow. If 4 values are added together,
there is a shift by 2. If 8 values, a shift by 3. Since this accounts
for any possibility of overflow, we can skip the narrowing shift.

This allows keeping the values in D registers and casting the 16 bit
value to 8 bits.

Change-Id: I8d9cfa07176271f492c116ffa6a7b351af0b8751
2016-08-04 18:51:46 +00:00
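The overflow argument in df69c751a7 above can be written out in scalar form. This is a sketch with a hypothetical helper name, not the NEON code: it shows that a rounded average of 4 or 8 samples of 8 bits each always fits back into 8 bits, so a plain cast is safe.

```c
#include <assert.h>
#include <stdint.h>

/* Rounded average of (1 << shift) 8-bit samples; shift must be >= 1.
 * The sum is at most 255 << shift, so after adding the rounding term and
 * shifting right by shift the result is at most 255 and needs no
 * saturating narrow. */
static uint8_t dc_average(const uint8_t *samples, int shift) {
  uint16_t sum = 0;
  int i;
  for (i = 0; i < (1 << shift); ++i) sum = (uint16_t)(sum + samples[i]);
  return (uint8_t)((sum + (1 << (shift - 1))) >> shift);
}
```

The worst case for four samples is (4 * 255 + 2) >> 2 = 255, and for eight samples (8 * 255 + 4) >> 3 = 255.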
Linfeng Zhang
bbf4c91f79 Update Loop8Test{6,9}Param to test filter8() in mb_lpf_vertical_edge_w()
One branch of filter8() in mb_lpf_vertical_edge_w() was not tested.

Change-Id: I194202d771d9acd6b4e5e600ee2bae89986b49f3
2016-08-04 11:33:14 -07:00
Marco Paniconi
9fdeeaf411 Merge "vp9 1pass vbr: Adjustment to gf interval." 2016-08-04 17:50:55 +00:00
Yaowu Xu
7a79fa1362 Fix msvc compiler warnings
MSVC 2013 complained about using a 32-bit shift where a 64-bit shift
should be used.

Change-Id: I7a2b165d1a92d3c0a91dd4511b27aba7709b5e55
2016-08-03 18:33:06 -07:00
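The kind of warning fixed in 7a79fa1362 above typically comes from shifting an int literal and then widening the result. The following sketch uses a hypothetical helper and is not taken from the patch; it shows the usual remedy of widening the operand before the shift.

```c
#include <assert.h>
#include <stdint.h>

/* Returns a mask with the low n bits set, for n in 0..62.
 * Casting the literal first makes the shift happen in 64 bits. Writing
 * (1 << n) instead would shift a 32-bit int and only then convert, which
 * is undefined for n >= 32 and is what the compiler warns about. */
static int64_t low_bits_mask(int n) {
  return ((int64_t)1 << n) - 1;
}
```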
James Zern
b51d127c82 Merge "Resolve -Wshorten-64-to-32 warnings in prob.h." 2016-08-04 00:38:08 +00:00
James Zern
15f29ef092 README: add a note about clang-format
Change-Id: I835401e3befffcbc68e7d2bdd2fd556a19948e91
2016-08-03 17:34:03 -07:00
James Zern
77f5c3d2e8 README: update target list
Change-Id: I80293720a5f12bc2449ceaadbb2ad0f924141552
2016-08-03 17:30:45 -07:00
James Zern
5ea8712b82 README: fix typo
Change-Id: I2c3ecc62b1fd1e600b3d70b623c8b11e1e8e4d13
2016-08-03 17:30:45 -07:00
James Zern
068281751c Merge "test: apply clang-format" 2016-08-04 00:27:59 +00:00
James Zern
a412c004e4 Merge "vp9/decoder,vp9/*.[hc]: apply clang-format" 2016-08-04 00:22:59 +00:00
Johann
a7a8e07a44 Pad 'Left' when building under ASan
The neon intrinsics are not able to load just the 4 values that are
used. In vpx_dsp/arm/intrapred_neon.c:dc_4x4 it loads 8 values for both
the 'above' and 'left' computations, but only uses the sum of the first
4 values.

BUG=webm:1268

Change-Id: I937113d7e3a21e25bebde3593de0446bf6b0115a
2016-08-03 16:38:51 -07:00
Marco
ff0a87ce38 vp9 1pass vbr: Adjustment to gf interval.
Increase the minimum distance.
Reduces the overshoot somewhat on some clips,
small gain in avgPSNR (~0.1%) on ytlive set.

Change-Id: Id5ddde20c2907dbdb536e79542eff775019c142b
2016-08-03 15:36:27 -07:00
clang-format
08131055e4 vp9/decoder,vp9/*.[hc]: apply clang-format
Change-Id: Ic38ea06c7b2fb3e8e94a4c0910e82672a1acaea7
2016-08-03 14:29:31 -07:00
Yaowu Xu
85e111b3ba Merge "vp9 svc: Fix a valgrind error." 2016-08-03 20:53:05 +00:00
clang-format
8ff40f8bec vp9/common: apply clang-format
Change-Id: Ie0f150fdcfcbf7c4db52d3a08bc8238ed1c72e3b
2016-08-02 18:27:07 -07:00
clang-format
e0cc52db3f vp9/encoder: apply clang-format
Change-Id: I45d9fb4013f50766b24363a86365e8063e8954c2
2016-08-02 16:47:11 -07:00
JackyChen
f7032713af vp9 svc: Fix a valgrind error.
This error was introduced by the patch:
8ce67d7 vp9 svc: Enable different speed setting for each spatial layer.
To use svc, svc_param should be cleared to 0 at the beginning.

Change-Id: I222f03ddae8a50e84b4690b78263abb742fae91e
2016-08-02 16:16:22 -07:00
Alex Converse
d089ac4dda Resolve -Wshorten-64-to-32 warnings in prob.h.
Change-Id: I1244ee908d81467f0fc8a8fce979fc8077a325b4
2016-08-02 15:40:23 -07:00
Alex Converse
3a04c9c9c4 Merge "Resolve -Wshorten-64-to-32 in variance." 2016-08-02 22:26:55 +00:00
Yaowu Xu
039f9e08f0 change HBD pixel value from uint8_t to uint16_t
This fixes a regression in 10/12 bit encoding results.

Change-Id: I438877352a41aae0a864a8d9979afe4aa2061d81
2016-08-02 11:01:39 -07:00
Yaowu Xu
dc5618f3bb Add pointer conversion for HBD buffers
This fixes a crash in HBD build.

Change-Id: I7f688f50227323e69bba65df0d56f4360f01771b
2016-08-01 15:56:43 -07:00
Alex Converse
004eebed31 Merge "Unfork 8-bit in HBD path in vp9_model_rd_from_var_lapndz callers." 2016-08-01 16:42:39 +00:00
Alex Converse
2c3807b89f Merge "Cache optimizations in optimize_b()." 2016-08-01 16:30:05 +00:00
Alex Converse
e446ffda45 Cache optimizations in optimize_b().
Move best index into the token state. Shrink it down to one byte. This
is more cache friendly (accesses are grouped together) and uses less
total memory.

Results in 4% fewer cycles in optimize_b().

Change-Id: I75db484fb3dc82f59928d54b659d79c80ee40452
2016-07-29 12:06:49 -07:00
Johann Koenig
d4ab234869 Merge "replace by VSTM/VLDM to reduce one of VST1/VLD1" 2016-07-29 14:25:10 +00:00
Min Chen
407c2e2974 replace by VSTM/VLDM to reduce one of VST1/VLD1
Change-Id: I596567570580babb1a52925541d1fd1045c352f5
2016-07-28 23:01:38 +00:00
JackyChen
6fbb4c3061 vp8: Switch skin model to mode 0 to save some cycle.
This change speeds up the vp8 encoder by 1.5% ~ 2% on Linux. There is
not much speed change on Mac.

Change-Id: Id957f19ddd89805baa2af84c5027d52d9a48553f
2016-07-28 13:32:50 -07:00
Jacky Chen
462a7c9f0a Merge "vp9 svc: Enable different speed setting for each spatial layer." 2016-07-28 20:21:30 +00:00
Alex Converse
c0241664aa Resolve -Wshorten-64-to-32 in variance.
The subtrahend is small enough to fit into uint32_t.

Change-Id: Ic4d7128aaa665eaf6b25d562610ba8942c46137f
2016-07-28 10:16:31 -07:00
Alex Converse
4508eb3123 Merge "Fix 64 to 32 narrowing warning." 2016-07-28 16:36:46 +00:00
clang-format
956af1d478 vpx_dsp/x86/quantize_sse2.c: apply clang-format
post:
e429080 .clang-format: disable DerivePointerAlignment

Change-Id: I21a0546668edb2b09660e216d4875a1d2ad24d53
2016-07-27 21:41:18 -07:00
James Zern
6b374abc86 Merge "vp9 denoiser: Derefencing pointer should be after null check." 2016-07-28 00:43:19 +00:00
Alex Converse
335cf67d8b Fix 64 to 32 narrowing warning.
- Solves potential integer overflow on 12-bit
- Fixes Visual Studio build

Change-Id: I26dd660451bbab23040e4123920d59e82585795c
2016-07-27 12:40:23 -07:00
James Zern
341919d038 Merge "vpx_scale: apply clang-format" 2016-07-27 01:59:21 +00:00
clang-format
33e40cb5db test: apply clang-format
Change-Id: I0d9ab85855eb723f653a7bb09b3d0d31dd6cfd2f
2016-07-27 01:58:52 +00:00
JackyChen
47cc64cdf8 vp9 denoiser: Derefencing pointer should be after null check.
BUG=webm:1267

Change-Id: I899fc9e8d784c6eefcbe27945c619845adb7b6f0
2016-07-26 17:31:17 -07:00
James Zern
e4290800b2 .clang-format: disable DerivePointerAlignment
everything outside of third_party should follow 'PointerAlignment:
right' i.e., associate the '*' with the variable

+ add a note about the clang-format that generated this file

Change-Id: I13e3f4f5fb6e22a8fa7fc3d06879c995b7c41a39
2016-07-26 16:46:54 -07:00
clang-format
f4be884466 vpx_scale: apply clang-format
Change-Id: Ia07ba57756f75911d3d06318e1f9b1982e1ca8c5
2016-07-26 15:57:41 -07:00
James Zern
fbf256da41 Merge "vpx_ports: apply clang-format" 2016-07-26 22:54:31 +00:00
Alex Converse
34201e50c1 Unfork 8-bit in HBD path in vp9_model_rd_from_var_lapndz callers.
BUG=b/29583530

Change-Id: Ia88a75f9572e08f228559ab84b8a77efb5aff0af
2016-07-26 21:57:58 +00:00
James Zern
1a3c4f91f6 Merge "vpx_mem: apply clang-format" 2016-07-26 21:19:17 +00:00
James Zern
9f9a8d2aaa Merge "vpx_util: apply clang-format" 2016-07-26 21:18:25 +00:00
Alex Converse
1c85230344 Merge "Only consider visible 4x4s in pixel domain error." 2016-07-26 19:39:54 +00:00
James Zern
7987686397 Merge "register_state_check: simplify Check() methods" 2016-07-26 18:49:18 +00:00
clang-format
6565c17f24 vpx_util: apply clang-format
Change-Id: Ie7eab608e2906b9a2b3533db95292ebc430ad377
2016-07-25 22:33:21 -07:00
James Zern
f8c27d164c register_state_check: simplify Check() methods
- make Check() void as the EXPECT's are sufficient to document failure

cumulatively this has the effect of avoiding reporting incorrect Check()
failures due to earlier test failures.

Change-Id: I2cf775449f18c90c1506b8eadd7067adbc3ea046
2016-07-25 15:14:02 -07:00
jackychen
8ce67d714a vp9 svc: Enable different speed setting for each spatial layer.
This change only affects 1 pass cbr svc mode.

Change-Id: If0da87bb200f7e7762755340c40c8157cc7a16ca
2016-07-25 15:11:43 -07:00
Alex Converse
d6c5ef4557 Only consider visible 4x4s in pixel domain error.
BDRATE change
derf144: -0.327
lowres: -0.048
midres: -0.125
hdres: -0.238

Change-Id: I789aba9870b5c2952373a7dd4fc8ed45590c3c54
2016-07-25 21:44:06 +00:00
clang-format
580f14b68b vpx_ports: apply clang-format
Change-Id: Ice343335a40238fd21490bce0ce2972bdcb87055
2016-07-25 14:29:06 -07:00
clang-format
d7a3b781d3 vpx_mem: apply clang-format
Change-Id: I0440686fc03f1ee02bd0168c91e671a0a2d0056a
2016-07-25 14:17:59 -07:00
clang-format
099bd7f07e vpx_dsp: apply clang-format
Change-Id: I3ea3e77364879928bd916f2b0a7838073ade5975
2016-07-25 14:14:19 -07:00
James Zern
82070ae939 Merge "configure: test for -Wfloat-conversion" 2016-07-25 19:47:38 +00:00
Johann Koenig
5c0f5cdda8 Merge "Fix compilation error under Clang 4.0." 2016-07-25 19:44:05 +00:00
Ivan Krasin
91369fd9b7 Fix compilation error under Clang 4.0.
The LLVM trunk has reached 4.0 and now __clang_major__ is not enough
to distinguish between old XCode Clang and the new 'real' Clang.
Using __apple_build_version__ allows to make this distinction.

BUG=chromium:631144

Change-Id: I0b6e46fddfe4f409c7b7e558bda34872e60ee2d9
2016-07-25 19:18:49 +00:00
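The distinction drawn in 91369fd9b7 above can be sketched with the predefined macros it names. The function below is hypothetical and only illustrates the test; the actual condition used in the patch is not shown in the log.

```c
#include <assert.h>
#include <string.h>

/* Classify the compiler at build time. Apple's Xcode clang defines
 * __apple_build_version__ in addition to __clang__, so version checks on
 * __clang_major__ can be applied separately to each family. */
static const char *compiler_family(void) {
#if defined(__clang__) && defined(__apple_build_version__)
  return "apple-clang";
#elif defined(__clang__)
  return "clang";
#else
  return "other";
#endif
}
```

Version workarounds keyed only on `__clang_major__` conflate the two numbering schemes, which is the problem the commit message describes.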
James Zern
889ed5b158 configure: test for -Wfloat-conversion
supported by clang, gcc-4.9+

Change-Id: I893766de7307fef9a8b68c0cfae137c9d3b0dbe8
2016-07-25 19:05:15 +00:00
James Zern
7aa0c748b3 Merge "vp9: fix frame-level threaded decode shutdown" 2016-07-25 19:00:37 +00:00
Alex Converse
511bf49b7e Merge "Minor skip segment simplification." 2016-07-25 17:50:43 +00:00
Scott LaVarnway
ad5fea03e6 Merge "VP9: get_pred_context_switchable_interp() -- encoder side" 2016-07-25 11:58:24 +00:00
James Zern
54b2071bf4 vp8/decodeframe: fix signed/unsigned comparison
quiets a visual studio warning

Change-Id: Ic7725616bc2cb837e6f79294d4fcff36b67af834
2016-07-23 11:41:52 -07:00
James Zern
f368f86df6 vp9: fix frame-level threaded decode shutdown
Shutdown all threads before reclaiming any memory. The frame-level
parallel decoder may access data from another worker.

BUG=webm:1259

Change-Id: I26856ebd1f77cc4a4545331baa19bbf3e01c4ea4
2016-07-23 10:59:15 -07:00
clang-format
c42d54c3a3 vp8/postproc.c: disable clang-format for RGB_TO_YUV
Change-Id: Id2a936301ec1e3d5648b4f8adbf4e6625002589d
2016-07-23 10:55:44 -07:00
James Zern
18e53642b7 Merge "vpx/: apply clang-format" 2016-07-23 01:27:14 +00:00
James Zern
5cf8eda308 Merge changes I0089e884,Icb0ecb9e
* changes:
  vp8/postproc: fix implicit float conversion
  blockiness_test: fix implicit float conversion
2016-07-23 01:18:32 +00:00
James Zern
256a4af4d1 Merge "resize_test: fix implicit float->int conversion" 2016-07-23 01:17:49 +00:00
James Bankoski
7381e3330f Merge "vp8:fix threading issues" 2016-07-23 00:51:37 +00:00
Jim Bankoski
0fff2fb34c vp8:fix threading issues
1 - stops deallocating before threads are closed.
2 - limits threads to mb_rows when mb_rows < partitions

BUG=webm:851

Change-Id: I7ead53e80cc0f8c2e4c1c53506eff8431de2a37e
2016-07-23 00:50:55 +00:00
James Zern
b2542417cd vp8/postproc: fix implicit float conversion
float->int as reported by -Wfloat-conversion

Change-Id: I0089e8847b218c47526bcfbb0fffd9aad7c5adb3
2016-07-22 16:01:52 -07:00
Yury Gitman
3d3f51262c Add VPX_SWAP macro
Change-Id: I60e233eddef238ad918183392794084673f27d2d
2016-07-22 15:41:25 -07:00
James Zern
5e2791b54d blockiness_test: fix implicit float conversion
float->int as reported by -Wfloat-conversion

Change-Id: Icb0ecb9e2d54edb95813d9f2de34cb6c27b63cbd
2016-07-22 15:35:42 -07:00
Alex Converse
9a62ecbd35 Minor skip segment simplification.
Change-Id: I34863fce1abe94f9539e9a5a6149ae1efb6501bd
2016-07-22 15:31:18 -07:00
Marco Paniconi
53db633349 Merge "vp9 1pass-vbr: Adjust gf setting for nonzero-lag case." 2016-07-22 21:27:05 +00:00
James Zern
325bdddc38 resize_test: fix implicit float->int conversion
Change-Id: I1efc16fa158740a06da719a1ea90c6dd6a182bb4
2016-07-22 13:11:07 -07:00
Marco
c06a4b9df2 vp9 1pass-vbr: Adjust gf setting for nonzero-lag case.
Change-Id: I230c586c6d5ae56ee9a6d37b7d9452351bb4bd80
2016-07-22 11:48:09 -07:00
Paul Wilkins
830fa866a5 Merge "Sample points to reduce encode overhead." 2016-07-22 09:27:34 +00:00
Paul Wilkins
063e4a2914 Merge "Noise energy Experiment in first pass." 2016-07-22 09:27:19 +00:00
clang-format
e3e9fee419 vpx/: apply clang-format
Change-Id: I95922a64568bf289863c1564212b6be5beec36df
2016-07-21 20:49:07 -07:00
Yunqing Wang
4b073bc39a Add back header in threading.h
Added back the header needed in threading.h

Change-Id: I2ce66ad4fe58004997623f6c3f3b8dd11640aa98
2016-07-21 17:26:05 -07:00
Yunqing Wang
930773a1ed Merge "Revert "Amend and improve VP8 multithreading implementation"" 2016-07-21 21:32:55 +00:00
Yunqing Wang
87c6c5224d Revert "Amend and improve VP8 multithreading implementation"
Reverted the patch because of a possible performance issue.

Change-Id: I49944f827ccd38ed194c9f8d9cb9036fa9bf79e1
2016-07-21 12:28:25 -07:00
Scott LaVarnway
c969b2b02b VP9: get_pred_context_switchable_interp() -- encoder side
Change-Id: I7217c90d5cf38c51b76759a2dc4f10070f3a40ac
2016-07-21 11:47:51 -07:00
Alex Converse
18c7f46c12 MinArfFreqTest: Don't leak video on failure.
Change-Id: I250379f0ac8d4929c9032e7343290e2980fc2e77
2016-07-21 11:40:51 -07:00
Alex Converse
92e91bd3a1 Make test encoder test driver less likely to leak on failure.
Individual tests still need to be updated.

Change-Id: Ic433d0f742e13560b136f136b72b2a9973970d78
2016-07-21 11:39:47 -07:00
James Zern
16e069b8bb Merge changes from topic 'clang-tidy'
* changes:
  vp8/onyx_if.c: rework #if's to avoid dangling else's
  vp8/bitstream.c: rework #if to avoid dangling else
2016-07-21 03:05:20 +00:00
Johann
a6cc74b987 Merge remote-tracking branch 'origin/khakicampbell' 2016-07-20 18:49:04 -07:00
Johann
042572177b Release v1.6.0 Khaki Campbell Duck
Change-Id: I08da365dd889093f9919476a02ee96ae9615f140
2016-07-20 18:15:41 -07:00
jackychen
71f9cbcfc8 vp9: Fix the clang warning of unsigned int type.
Change-Id: I6308db16bd626fa5943925471e9171f567669350
2016-07-20 15:58:35 -07:00
Yaowu Xu
297b2a12d6 Fix encoder crashes for odd size input
(cherry picked from commit 98431cde07)

Change-Id: Id5c30c419282369cc8c3280d9a70b34a859a71d8
2016-07-20 15:02:13 -07:00
James Zern
b19f8b1607 vp8/onyx_if.c: rework #if's to avoid dangling else's
Change-Id: Ieda8958a3da1000424fcff91a1315d0049612202
2016-07-20 12:36:09 -07:00
James Zern
77a31eb3c5 vp8/bitstream.c: rework #if to avoid dangling else
Change-Id: I9178ae75876f3df3fa3271314db39830552b9549
2016-07-20 12:36:09 -07:00
James Zern
7bb35db872 Merge changes from topic 'clang-tidy'
* changes:
  vp8/{bitstream,rdopt},y4minput: correct break placement
  y4minput.c: correct empty loop formatting
  vp8: simplify a few #if's
  vp8: remove extra semicolons
2016-07-20 19:34:59 +00:00
Yaowu Xu
690fcd793b Change to call vp9_post_proc_frame()
This commit changes the call in vp9 encoder from vp9_deblock() to
vp9_post_proc_frame() to ensure the data structures used in the call
are properly allocated. This fixes an encoder crash when configured
with --enable-internal-stats.

Change-Id: I2393b336c0f566665336df4f1ba91c405eb56764
2016-07-20 11:01:49 -07:00
James Zern
fd85664ae6 vp8/{bitstream,rdopt},y4minput: correct break placement
these should be placed within {}s when present

Change-Id: If00e9766fa8cb039cc070467f353a468f99460fb
2016-07-19 20:51:25 -07:00
James Zern
1b048e966a y4minput.c: correct empty loop formatting
prefer {}s over ';'

Change-Id: I563fc82717e1deb4f42a40e03dca318c6adaa0c1
2016-07-19 20:46:39 -07:00
James Zern
b5164f55a0 vp8: simplify a few #if's
bitstream.c: asserts are disabled when CONFIG_DEBUG is unset
vp8_dx_iface.c: split |s into 2 statements across #if bounds

Change-Id: I307d1e969134db5c9c0edd7690589b6b29116cbd
2016-07-19 20:45:28 -07:00
James Zern
96797e43b4 vp8: remove extra semicolons
Change-Id: I84e1a293ee033865f82c244e8aaaadfb2fb27e63
2016-07-19 20:44:14 -07:00
clang-format
033dab9ca0 top-level: apply clang-format
Change-Id: Ibd5395bf8956a80f7c0df4d539c7a42c927a1fc7
2016-07-19 14:34:19 -07:00
James Zern
6e336f6e5f Merge "vp8: apply clang-tidy google-readability-braces-around-statements" 2016-07-19 21:24:30 +00:00
clang-tidy
7f3e07f1c8 vp8: apply clang-tidy google-readability-braces-around-statements
applied against an x86_64 configure

clang-tidy-3.7.1 \
  -checks='-*,google-readability-braces-around-statements' \
  -header-filter='.*' -fix
+ clang-format afterward

Change-Id: I6694edeaee89b58b8b3082187e6756561136b459
2016-07-19 12:38:03 -07:00
James Zern
5b55bcc564 Merge "examples: apply clang-format" 2016-07-19 19:21:08 +00:00
James Zern
e3f7991f99 Merge changes Ia6004c08,I1954f9d6
* changes:
  cosmetics: Add a few explanatory comments
  cosmetics: Correct grammar/spelling in comments
2016-07-19 19:12:23 +00:00
Yury Gitman
e4ac882007 cosmetics: Add a few explanatory comments
Change-Id: Ia6004c08e6f5fd269a1bbd4df51ce9b76345150d
2016-07-19 10:39:00 -07:00
Marco Paniconi
e90d4f0a03 Merge "vp9: Allow usage of lookahead for real-time, 1 pass vbr." 2016-07-19 17:17:16 +00:00
Johann Koenig
451211cb01 Merge "Change 'git cl upload' default to --no-squash" 2016-07-19 16:37:42 +00:00
James Zern
ea3d324f13 Merge changes I18982dbf,I15c8976c
* changes:
  build/make/Makefile: add a 'test_*' default target
  build/make/Makefile: remove default suffix rules
2016-07-19 06:09:36 +00:00
Pascal Massimino
7e4740156f Merge "take II: variance_test partial clean-up" 2016-07-19 03:52:55 +00:00
clang-format
ef45540927 examples: apply clang-format
Change-Id: Icc3bbb07c99a31a70030baec7e51b881902a7b5e
2016-07-18 19:04:56 -07:00
James Bankoski
c69cc4ce1f Merge "configure: turn on all unused warnings by default" 2016-07-19 00:57:46 +00:00
James Zern
25085a6ac2 build/make/Makefile: add a 'test_*' default target
allows 'make test_libvpx', etc. some reworking of the makefiles would be
needed to avoid hard coding targets here.

Change-Id: I18982dbf691e7d36ab8bcf5934bab9340687b061
2016-07-18 16:30:58 -07:00
James Zern
23d0f73838 build/make/Makefile: remove default suffix rules
Change-Id: I15c8976c6478bf75ec617398f49461b310ab7569
2016-07-18 16:30:40 -07:00
skal
7d72ebaa5c take II: variance_test partial clean-up
remove some (but not all yet!) tuple mis-use, and revamp the code a lot.
Factorize some common chores into MainTestClass.

Change-Id: Id37b7330eebe80d19b9d12a454f24ff9be6b1116
2016-07-18 16:18:26 -07:00
Marco
05fe0f20a6 vp9: Allow usage of lookahead for real-time, 1 pass vbr.
Allow usage of lookahead for VBR in real-time mode, for 1 pass vbr.

Current usage is for fast checking of future scene cuts/changes,
and adjusting rate control (gf interval and active_worst/target size).

Added unittests (datarate) for 1 pass vbr mode, with non-zero lag.

Added an experimental option to limit QP based on lookahead.

Overall positive gain in metrics on ytlive set:
avgPSNR/SSIM up on average ~1-3%; several clips up by 5, 7%.

Change-Id: I960d57dfc89de121c4824b9a9bf88d2814e74b56
2016-07-18 15:20:17 -07:00
Johann
1afbd88e81 Change 'git cl upload' default to --no-squash
Chromium changed the upstream default to --squash but this conflicts
with libvpx historical defaults.

Change-Id: I80f2f2b48e2ba08e02184b50e6d5f8f5e76fec24
2016-07-18 14:15:24 -07:00
Yury Gitman
bdfdd7d993 cosmetics: Correct grammar/spelling in comments
Change-Id: I1954f9d6e33abff9081fe7a5cf59d5497768e0df
2016-07-18 12:49:00 -07:00
Jim Bankoski
3e04114f3d prepend ++ instead of post in for loops.
Applied the following regex:
search for: (for.*\(.*;.*;) ([a-zA-Z_]*)\+\+\)
replace with: \1 ++\2)

This misses some for loops,
e.g.: for (mb_col = 0; mb_col < oci->mb_cols; mb_col++, mi++)

Change-Id: Icf5f6fb93cced0992e0bb71d2241780f7fb1f0a8
2016-07-18 06:54:50 -07:00
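The search-and-replace quoted in the commit above can be tried in isolation.
What follows is a minimal Python sketch, not the tool the author ran (the
commit does not say which one was used). The pattern and replacement are
copied from the message; the helper name preincrement_for_loops is
hypothetical.

```python
import re

# Pattern and replacement copied verbatim from the commit message.
FOR_POSTINC = re.compile(r"(for.*\(.*;.*;) ([a-zA-Z_]*)\+\+\)")


def preincrement_for_loops(line: str) -> str:
    """Rewrite `for (...; ...; x++)` as `for (...; ...; ++x)` (hypothetical helper)."""
    return FOR_POSTINC.sub(r"\1 ++\2)", line)


# A single post-increment in the loop header is rewritten.
simple = preincrement_for_loops("for (i = 0; i < n; i++) {")

# The multi-increment loop quoted in the commit message does not match the
# pattern, so it comes back unchanged.
missed = preincrement_for_loops(
    "for (mb_col = 0; mb_col < oci->mb_cols; mb_col++, mi++)")

print(simple)
print(missed)
```

Note that the identifier class [a-zA-Z_]* excludes digits, so a loop
variable such as i2 would also be skipped by this pattern.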
James Zern
106a8a1536 Merge "Revert "variance_test partial clean-up"" 2016-07-16 22:01:19 +00:00
James Zern
3d791194f8 vpx_plane_add_noise_c: normalize int types
quiets signed/unsigned mismatch warning

Change-Id: Iaabd7dfff110ba26056258457541f5635d2e85e6
2016-07-16 11:56:55 -07:00
James Zern
090aa88b5a Revert "variance_test partial clean-up"
This reverts commit f993ed5c86.

build warnings under msvc, segfaults with high-bitdepth enabled.

Change-Id: I67502651107830bcadb6ef56d8f2709cccbfdf2b
2016-07-16 11:55:07 -07:00
Pascal Massimino
f44db1487d Merge "sad_test: add some const to methods" 2016-07-16 04:50:45 +00:00
Pascal Massimino
8d681b36c7 Merge "vp9_error_block_test: simplify fn wrapper generation" 2016-07-16 04:41:46 +00:00
Pascal Massimino
5319e83843 sad_test: add some const to methods
Change-Id: I6f2481509b0aa94338ed6185f80c4a6b65532280
2016-07-15 21:07:00 -07:00
Pascal Massimino
22c36dd464 Merge "remove tuple from 'sad_test.cc'" 2016-07-16 03:56:10 +00:00
James Zern
037a50ed36 vp9_error_block_test: simplify fn wrapper generation
Change-Id: I1f1d396b9456e52e863c4c75f23c3d17420668b4
2016-07-15 20:27:45 -07:00
skal
3fc29ae3ee remove tuple from 'sad_test.cc'
+ general clean-up

Change-Id: Ib9dca3d1a3b7f0c1bedef2a26c9ff5ae1c289e8a
2016-07-15 19:52:56 -07:00
clang-format
81a6739533 vp8: apply clang-format
Change-Id: I7605b6678014a5426ceb45c27b54885e0c4e06ed
2016-07-15 19:28:44 -07:00
James Zern
65daa41378 add .clang-format, based on Google style
derived from clang-format 3.7.1; same as used in libaom

Change-Id: I8ea915a41d1f2ea3b0d4e4dab9ebc808e9116f11
2016-07-15 19:26:24 -07:00
James Bankoski
ce6678fdc9 Merge "addnoise : clear out static size for generated noise" 2016-07-16 01:48:07 +00:00
Jim Bankoski
cb957c302a addnoise : clear out static size for generated noise
Change-Id: I5d4343f2da9cd4b01dd37be7a048d159fec109d1
2016-07-15 15:52:45 -07:00
Jim Bankoski
0dfede2e79 configure: turn on all unused warnings by default
Change-Id: I7f6cb446cd3ac57ac39835cf065d9501a66acd5b
2016-07-15 15:26:20 -07:00
Jim Bankoski
da1bda0fb2 vp9_postproc.c : unused variable if not vp9_highbitdepth.
Change-Id: Ib89b128f23767934c40b5add3fcf9dbd875e82f9
2016-07-15 15:04:57 -07:00
James Bankoski
302e425453 Merge "postproc : fix function parameters for noise functions." 2016-07-15 17:33:53 +00:00
Jim Bankoski
0dc69c70f7 postproc : fix function parameters for noise functions.
Change-Id: I582b6307f28bfc987dcf8910379a52c6f679173c
2016-07-15 08:27:34 -07:00
James Zern
60ada7edb4 Merge "variance_test partial clean-up" 2016-07-15 07:31:30 +00:00
skal
f993ed5c86 variance_test partial clean-up
remove some (but not all yet!) tuple mis-use, and revamp the code a lot.
Factorize some common chores into MainTestClass.

Change-Id: Ia14f3924140e8545e4f10d0504475681baae8336
2016-07-14 22:06:44 +00:00
James Zern
6d2b79e3a2 Merge "vp9_intrapred_test: follow-up cleanup" 2016-07-14 19:48:52 +00:00
James Zern
a07bb84215 gtest-all.cc: quiet an unused variable warning
under windows / mingw builds

Change-Id: I93f9a5df77cea0c28d4afb272abcde5a9732e355
2016-07-13 18:36:17 -07:00
skal
3386ca7496 vp9_intrapred_test: follow-up cleanup
address a few comments from ce050afaf3

Change-Id: I5d8fc9dab35c4ee5ec3671134c4eef4ec241e309
2016-07-14 00:40:55 +00:00
James Bankoski
7eec1f31b5 Merge "postproc: noise style fixes." 2016-07-13 22:04:47 +00:00
Hui Su
8dd3bef7ef Merge "Revert "Eliminate isolated and small tail coefficients:"" 2016-07-13 21:30:12 +00:00
Pascal Massimino
8c7751e1c2 Merge "clean-up vp9_intrapred_test" 2016-07-13 21:26:15 +00:00
Yaowu Xu
d6197b621d Merge "Fix encoder crashes for odd size input" 2016-07-13 20:05:09 +00:00
Jim Bankoski
e736691a6d postproc: noise style fixes.
Change-Id: Ifdcb36b8e77b65faeeb10644256e175acb32275d
2016-07-13 12:39:01 -07:00
skal
ce050afaf3 clean-up vp9_intrapred_test
remove tuple and overkill VP9IntraPredBase class.

Change-Id: I85b85bdd33d7fe417895e75f77db219f713dfea3
2016-07-13 18:16:32 +00:00
hui su
248f6ad771 Revert "Eliminate isolated and small tail coefficients:"
This reverts commit ff19cdafdb.

Change-Id: I81f68870ca27a1ff683ee22090530b6997815fb2
2016-07-13 11:14:44 -07:00
Jingning Han
fed14a3e94 Merge "Disable trellis optimization when lossless is on" 2016-07-13 16:01:01 +00:00
James Bankoski
e93f2fdb83 Merge "postproc - move filling of noise buffer to vpx_dsp." 2016-07-13 15:31:17 +00:00
Jim Bankoski
2ca24b0075 postproc - move filling of noise buffer to vpx_dsp.
Change-Id: I63ba35dc0ae9286c9812367a531e01d79a4c1635
2016-07-13 07:35:25 -07:00
Jim Bankoski
b24373fec2 deblock: missing const on extern const.
Change-Id: I0df08f7c431daf939e266f008bf5158b0c97358b
2016-07-13 07:27:29 -07:00
Jim Bankoski
6f424a768e vp9_postproc.c missing extern.
BUG=webm:1256

Change-Id: I5271e71bc53cce033fb906040643dcdd5ccb2381
2016-07-12 17:47:49 -07:00
James Zern
6a3ff0b617 Merge changes from topic 'webp-thread-update'
* changes:
  vpx_thread: use CreateThread for windows phone
  vpx_thread: use WaitForSingleObjectEx if available
  vpx_thread: use InitializeCriticalSectionEx if available
  vpx_thread: use native windows cond var if available
  vpx_thread.[hc]: update webp source reference
2016-07-13 00:08:05 +00:00
Yaowu Xu
98431cde07 Fix encoder crashes for odd size input
Change-Id: Id5c30c419282369cc8c3280d9a70b34a859a71d8
2016-07-12 11:11:26 -07:00
Jacky Chen
19c157afe2 Merge "vp9 svc: Reuse scaled_temp in two stage downscaling." 2016-07-12 17:59:09 +00:00
JackyChen
110a2ddc9b vp9 svc: Reuse scaled_temp in two stage downscaling.
This change eliminates redundant computation in the two stage
downscaling, which saves ~1% encoding time in 3-layer svc encoding.

Change-Id: Ib4b218811b68499a740af1f9b7b5a5445e28d671
2016-07-12 10:09:55 -07:00
Jingning Han
efccbc9fb5 Disable trellis optimization when lossless is on
Disable trellis coefficient optimization when the lossless mode
is turned on.

Change-Id: I9001bf626e86dc3c8c32331ede04fd39036e5f7c
2016-07-12 09:00:16 -07:00
Jim Bankoski
88e6951465 deblock filter : moved from vp8 code branch
The deblocking filters used in vp8 have been moved to vpx_dsp for
use by both vp8 and vp9.

Change-Id: I5209d76edafc894b550f751fc76d3aa6799b392d
2016-07-12 05:53:00 -07:00
James Zern
45ed7effed Merge "remove *debugmodes.c from the default build" 2016-07-11 23:49:29 +00:00
Scott LaVarnway
2e93fcf893 Merge "vp9_rd_pick_intra_mode_sb(): set interp_filter to" 2016-07-11 22:31:06 +00:00
paulwilkins
3a986eac57 Sample points to reduce encode overhead.
Only noise filter sampled points in first pass to reduce
any first pass speed overhead.

Change-Id: Ic80d4400e59146d1c3332336c4350faf28ff8b17
2016-07-11 11:45:52 +01:00
Scott LaVarnway
ed7786869a vp9_rd_pick_intra_mode_sb(): set interp_filter to
SWITCHABLE_FILTERS.  This is a partial fix for the build
issues with Change 357240.

Change-Id: I4e507c196175bae729a4f1397878ec8776b0146c
2016-07-09 09:47:34 -07:00
Yaowu Xu
5adb43b8be Fix non-highbitdepth coding path for HBD build
Change-Id: I38eb42b8d051924a7cd1ccc3421a4057cf6e170f
2016-07-08 11:26:34 -07:00
Marco Paniconi
20946cdd3b Merge "vp9: Adjustment of gfu_boost and af_ratio for 1 pass vbr." 2016-07-08 16:26:06 +00:00
Yaowu Xu
dc008cc17d Merge "Enable HBD support in real time encoding path" 2016-07-07 22:32:48 +00:00
Marco
cc431ad50a vp9: Adjustment of gfu_boost and af_ratio for 1 pass vbr.
Modify the gfu_boost and af_ratio setting based on the
average frame motion level.

Change only affects 1 pass vbr.

Metrics overall positive on ytlive set.
On average up by ~1%, several clips up by 2-4%.

Change-Id: Ic18c49eb2df74cb4986b63cdb11be36d86ab5e8d
2016-07-07 15:18:14 -07:00
Marco Paniconi
a75965fa94 Merge "vp9: Adjustment to mv bias for non-rd pickmode." 2016-07-07 21:07:37 +00:00
Jingning Han
2f28f9072e Enable coeff optimization for intra modes
This further improves the coding performance by
lowres 0.3%
midres 0.5%
hdres  0.6%

Change-Id: I6a03b6da210b9cbc261474bad4a103e0ba021c68
2016-07-07 12:25:41 -07:00
Jingning Han
44354ee7bf Use precise context to estimate coeff rate cost
Use the precise context to estimate the zero token cost in trellis
optimization process. This improves the speed 0 coding performance
by 0.15% for lowres and 0.1% for midres. It improves the speed 1
coding performance by 0.2% for midres and hdres.

Change-Id: I59c7c08702fc79dc4f8534b64ca594da909e2c91
2016-07-07 12:25:33 -07:00
Jingning Han
62aa642d71 Enable uniform quantization with trellis optimization in speed 0
This commit allows the inter prediction residual to use uniform
quantization followed by trellis coefficient optimization in
speed 0. It improves the coding performance by

lowres 0.79%
midres 1.07%
hdres  1.44%

Change-Id: I46ef8cfe042a4ccc7a0055515012cd6cbf5c9619
2016-07-07 12:25:33 -07:00
Jingning Han
541eb78994 Refactor coeff_cost() function
Move the operations that update the context buffers outside this
function. coeff_cost() now takes all inputs as const values and returns
the coefficient cost.

This prepares for the upcoming coefficient optimization CLs.

Change-Id: I850eec6e5470b91ea84646ff26b9231b09f70a0c
2016-07-07 18:09:39 +00:00
Jingning Han
7c1fdf02cd Merge "Support measure distortion in the pixel domain" 2016-07-07 18:09:20 +00:00
Marco
f451b404ea vp9: Adjustment to mv bias for non-rd pickmode.
Replace the existing mv bias with a bias only for
NEWMV, and based on the motion vector difference of
its top/left neighbors.

For cbr non-screen-content mode.

Change-Id: I8a8cf56347cfa23e9ffd8ead69eec8746c8f9e09
2016-07-07 10:33:06 -07:00
Yunqing Wang
9976ff8c79 Merge "Fix Visual Studio build warning" 2016-07-07 16:48:49 +00:00
Yunqing Wang
9ef37860cd Fix Visual Studio build warning
Fixed signed/unsigned mismatch warning.

Change-Id: I1634d0634de752f4b8baa8059e8f3e2891fa53b6
2016-07-07 08:43:57 -07:00
paulwilkins
2580e7d63e Noise energy Experiment in first pass.
Use a measure of noise energy to adjust Q estimate and
arf filter strength.

Gains 0.3-0.5% on Lowres and Netflix sets.
Hdres and Midres neutral.

Change-Id: Ic0de552e7b6763e70eeeaa3651619831b423e151
2016-07-07 14:50:21 +01:00
Paul Wilkins
f037cf80c9 Merge "Add experimental spatial de-noise filter on key frames." 2016-07-07 13:30:07 +00:00
Jingning Han
e357b9efe0 Support measure distortion in the pixel domain
Use pixel domain distortion metric in speed 0. This improves the
compression performance by 0.3% for both low and high resolution
test sets.

Change-Id: I5b5b7115960de73f0b5e5d0c69db305e490e6f1d
2016-07-06 18:25:17 -07:00
Yaowu Xu
884c2ddc48 Enable HBD support in real time encoding path
BUG=webm:1223

Change-Id: If83a613784e3b2a33c9c93f9ad0ba39dd4d23056
2016-07-06 14:18:37 -07:00
Debargha Mukherjee
adbad6092f Merge "Remove decode asserts from better-hw-compatibility" 2016-07-06 20:55:29 +00:00
Jacky Chen
aa6108382e Merge "vp9: Choose the scheme for modeling rd for 32x32 based on skin color." 2016-07-06 20:03:55 +00:00
Debargha Mukherjee
4b6e4e1813 Remove decode asserts from better-hw-compatibility
Safer to have the decoder operate normally and have
better-hw-compatibility only implement encoding changes.
Fixes some test failures.

Change-Id: I0dd70d002e4e893992f0cd59774b9363e6f7fe76
2016-07-06 12:26:38 -07:00
Yunqing Wang
a921444fdb Merge "Modify the name of vp9cx_set_ref example" 2016-07-06 18:55:15 +00:00
JackyChen
2678aefc48 vp9: Choose the scheme for modeling rd for 32x32 based on skin color.
For real time CBR mode, use model_rd_for_sb_y for 32x32 if the sb is
a skin sb to avoid visual regression on the slowly moving face.

Refer to the cl: https://chromium-review.googlesource.com/#/c/356020/

Change-Id: I42c36666b2b474ce5ee274239d52ae8ab400fd46
2016-07-06 11:12:03 -07:00
Min Ye
ff19cdafdb Eliminate isolated and small tail coefficients:
Improve hdres PSNR by 0.696%
Improve midres PSNR by 0.313%
Improve lowres PSNR by 0.142%

Change-Id: Icabde78aa9689f539f6a03ec09f712c20758796c
2016-07-06 11:08:23 -07:00
Yunqing Wang
fba5c354ad Modify the name of vp9cx_set_ref example
Modified the name of vp9cx_set_ref example so that the test script
ran correctly.

Change-Id: I0ab2de66220b0a88b7af7ea1633a088ab78dd9ff
2016-07-06 10:05:51 -07:00
Jingning Han
51aad61c8c Merge "Remove txfrm_block_to_raster_xy() from vp9 encoder" 2016-07-06 16:00:18 +00:00
Yunqing Wang
825bb86044 Merge "Make set_reference control API work in VP9" 2016-07-06 00:08:52 +00:00
Jingning Han
14011f037d Remove txfrm_block_to_raster_xy() from vp9 encoder
The transform block row and column positions are always available
outside the callees. There is no need to re-compute these values
again. This approach has been used by the decoder. This commit
removes txfrm_block_to_raster_xy() function.

Change-Id: I5b90f91a0d8b7c35cfa7d171da9edf8202630108
2016-07-04 18:41:47 -07:00
James Zern
5afa3b9150 Merge "improve vpx_filter_block1d* based on replace paddsw+psrlw to pmulhrsw" 2016-07-02 03:08:33 +00:00
James Zern
3197172405 Merge "Update vpx subpixel 1d filter ssse3 asm" 2016-07-02 03:08:17 +00:00
James Zern
3007081a87 vpx_thread: use CreateThread for windows phone
BUG=b/29583578

original webp change:

commit d2afe974f9d751de144ef09d31255aea13b442c0
Author: James Zern <jzern@google.com>
Date:   Mon Nov 23 20:41:26 2015 -0800

    thread: use CreateThread for windows phone

    _beginthreadex is unavailable for winrt/uwp

    Change-Id: Ie7412a568278ac67f0047f1764e2521193d74d4d

100644 blob 93f7622797f05f6acc1126e8296c481d276e4047 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h

Change-Id: Iade8fff6367b45534986c77ebe61abeb45bce0f8
2016-07-01 19:36:58 -07:00
James Zern
7954e67bb8 vpx_thread: use WaitForSingleObjectEx if available
BUG=b/29583578

original webp change:

commit 0fd0e12bfe83f16ce4f1c038b251ccbc13c62ac2
Author: James Zern <jzern@google.com>
Date:   Mon Nov 23 20:40:26 2015 -0800

    thread: use WaitForSingleObjectEx if available

    Windows XP and up

    Change-Id: Ie1a46a82722b8624437c8aba0aa4566a4b0b3f57

100644 blob d58f74e5523dbc985fc531cf5f0833f1e9157cf0 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h

Change-Id: If165c38b378c6e0c55e17a1b071efd3ec3e7dcdd
2016-07-01 19:36:58 -07:00
James Zern
a48d42a804 vpx_thread: use InitializeCriticalSectionEx if available
BUG=b/29583578

original webp change:

commit 63fadc9ffacc77d4617526a50c696d21d558a70b
Author: James Zern <jzern@google.com>
Date:   Mon Nov 23 20:38:46 2015 -0800

    thread: use InitializeCriticalSectionEx if available

    Windows Vista / Server 2008 and up

    Change-Id: I32c5b4e5384d614c5a821ef511293ff014c67966

100644 blob f84207d89b3a6bb98bfe8f3fa55cad72dfd061ff src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h

Change-Id: I9ce49b3a86857267e504cd8ceab503b7b441d614
2016-07-01 19:36:58 -07:00
James Zern
8cc525e82b vpx_thread: use native windows cond var if available
BUG=b/29583578

original webp change:

commit 110ad5835ecd66995d0e7f66dca1b90dea595f5a
Author: James Zern <jzern@google.com>
Date:   Mon Nov 23 19:49:58 2015 -0800

    thread: use native windows cond var if available

    Vista / Server 2008 and up. no speed difference observed.

    Change-Id: Ice19704777cb679b290dc107a751a0f36dd0c0a9

100644 blob 4fc372b7bc6980a9ed3618c8cce5b67ed7b0f412 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h

Change-Id: Iede7ae8a7184e4b17a4050b33956918fc84e15b5
2016-07-01 19:36:58 -07:00
James Zern
d4d6c58e37 vpx_thread.[hc]: update webp source reference
+ drop the blob hash; the updated reference will be recorded in the
commit message

BUG=b/29583578

Change-Id: Ifabbe52a2f07ac29e1881f5c8a62d7f3eb3c2c04
2016-07-01 19:36:58 -07:00
James Zern
d2b40894be remove *debugmodes.c from the default build
these are debug-only modules that can be added in manually when needed.
leave a reference in vp8_common.mk / vp9_common.mk for easy addition.

quiets -Wmissing-prototypes warning

BUG=b/29584271

Change-Id: Ifc8637d877edfbd562b34dc5c540428bba7951fc
2016-07-01 19:10:04 -07:00
Yunqing Wang
0a075cb39c Make set_reference control API work in VP9
Moved the API patch from NextGenv2. An example was included.
To try it, for example, run the following command:
$ examples/vpx_cx_set_ref vp9 352 288 in.yuv out.ivf 4 30

Change-Id: I4cf8f23b86d7ebd85ffd2630dcfbd799c0b88101
2016-07-01 17:58:02 -07:00
James Zern
3ef9c0ba03 vp8/common/reconintra4x4.c: add missing include
quiets -Wmissing-prototypes warning

BUG=b/29584271

Change-Id: I806e3475ebee579dce0073dd1784a7c2899e7de0
2016-07-01 16:20:42 -07:00
James Bankoski
f5a15f270a Merge "Revert "libyuv: update to 2f101fdb"" 2016-07-01 19:14:53 +00:00
James Bankoski
c5372cf077 Revert "libyuv: update to 2f101fdb"
Compile failures on the Linux platform.

BUG=webm:1253

This reverts commit aa81375d73.

Change-Id: Ibab2c4827bc21518dc03c6e9716b5015cff56fc7
2016-07-01 19:14:28 +00:00
Johann Koenig
e616012d69 Merge changes I59a11921,I296a0b81,I397d7753
* changes:
  configure: remove x86inc.asm distinction
  test: remove x86inc.asm distinction
  vpx_dsp: remove x86inc.asm distinction
2016-07-01 18:13:41 +00:00
James Zern
fbbd3f0d8d Merge "convolve_test: fix byte offsets in hbd build" 2016-07-01 01:54:30 +00:00
Jacky Chen
ee78c541a4 Merge "vp9 postproc: Bug fix and code clean." 2016-06-30 21:59:44 +00:00
James Bankoski
892ebd9760 Merge "libyuv: update to 2f101fdb" 2016-06-30 19:11:30 +00:00
Johann
571f00cb95 configure: remove x86inc.asm distinction
BUG=b:29583530

Change-Id: I59a1192142e89a6a36b906f65a491a734e603617
2016-06-30 11:14:14 -07:00
Johann
0266e70c52 test: remove x86inc.asm distinction
BUG=b:29583530

Change-Id: I296a0b81755e3086bc0a40cb126d0200ff03c095
2016-06-30 11:14:10 -07:00
Johann Koenig
3c41d7358c Merge "vp9: remove x86inc.asm distinction" 2016-06-30 17:42:17 +00:00
Johann Koenig
89771f2c2c Merge "Require x86inc.asm" 2016-06-30 17:41:33 +00:00
Paul Wilkins
1d3f1983b2 Merge "Fix error in get_ul_intra_threshold() for 10/12 bit." 2016-06-30 16:26:14 +00:00
Paul Wilkins
f7c2d2a3de Merge "Fix error in get_smooth_intra_threshold() for 10/12 bit." 2016-06-30 16:25:55 +00:00
Jim Bankoski
aa81375d73 libyuv: update to 2f101fdb
Fixes color issue when scaling without breaking mingw.

BUG=https://bugs.chromium.org/p/libyuv/issues/detail?id=605
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1252

Change-Id: Ifba747feb0c6a08f2b353b820a24c6c145d440ad
2016-06-30 13:25:39 +00:00
paulwilkins
e25d6252a4 Fix error in get_ul_intra_threshold() for 10/12 bit.
The scaling of the threshold for 10 and 12 bit here appears
to be in the wrong direction. For 10 and 12 bit we expect sse
values to be higher and hence the threshold used should be
scaled up not down.

Change-Id: I2678116652b539aef48100e0f22873edd4f5a786
2016-06-30 13:38:57 +01:00
paulwilkins
f9a3d08f1b Fix error in get_smooth_intra_threshold() for 10/12 bit.
This function seems to scale the threshold for testing an
SSE value in the wrong direction for 10 and 12 bit inputs.

Also for a true SSE the scalings should probably be << 4 and 8

Change-Id: Iba8047b3f70d04aa46d9688a824f3d49c1c58e90
2016-06-30 13:34:11 +01:00
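The "<< 4 and 8" remark in the commit above follows from how a sum of
squared errors grows with bit depth: each pixel difference grows by
2^(bit_depth - 8), and squaring doubles that exponent. The Python sketch
below only illustrates this arithmetic. It does not reproduce
get_smooth_intra_threshold(), whose body is not shown here, and both helper
names are hypothetical.

```python
def sse_threshold_shift(bit_depth: int) -> int:
    """Left shift mapping an 8-bit SSE threshold to `bit_depth` (hypothetical helper)."""
    # A pixel difference grows by 2**(bit_depth - 8); squaring doubles the exponent.
    return 2 * (bit_depth - 8)


def scale_sse_threshold(threshold_8bit: int, bit_depth: int) -> int:
    """Scale an 8-bit SSE threshold UP for higher bit depths (hypothetical helper)."""
    return threshold_8bit << sse_threshold_shift(bit_depth)


shifts = {bd: sse_threshold_shift(bd) for bd in (8, 10, 12)}

# Cross-check with made-up pixel differences: the same block at 10 bits has
# every difference multiplied by 4, so its SSE is exactly 16x larger.
diffs_8bit = [3, -1, 0, 5]
sse_8bit = sum(d * d for d in diffs_8bit)
sse_10bit = sum((d * 4) ** 2 for d in diffs_8bit)

print(shifts)
print(sse_8bit, sse_10bit)
```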
Jacky Chen
e85607410e Merge "vp9: Change the scheme for modeling rd for 32x32 on newmv_last mode." 2016-06-30 05:59:46 +00:00
James Zern
f5a6079141 convolve_test: fix byte offsets in hbd build
CONVERT_TO_BYTEPTR(x) was corrected in:
003a9d2 Port metric computation changes from nextgenv2
to use the more common (x) within the expansion. offsets should occur
after converting the pointer to the desired type.

+ factorized some common expressions

Change-Id: I171c3faaa5606d098e984baa9aa74bb36042f57f
2016-06-29 20:39:07 -07:00
Johann
1b833d63d9 vpx_dsp: remove x86inc.asm distinction
BUG=b:29583530

Change-Id: I397d77536b0d3cee0a92cdfe8b76bc4e434d0720
2016-06-29 18:55:58 -07:00
Johann
fe96dbda15 vp9: remove x86inc.asm distinction
BUG=b:29583530

Change-Id: I952da3fc0d4716dec897be0d2e9806af6612722b
2016-06-29 18:55:28 -07:00
Johann
d11c97e8e2 Require x86inc.asm
Force enable x86inc.asm when building for x86. Previously there were
compatibility issues so a flag was added to simplify disabling this
code.

The known issues have been resolved and x86inc.asm is the preferred
abstraction layer (over x86_abi_support.asm).

BUG=b:29583530

Change-Id: Ib935e97b37ffb22d7af72ba0f04564ae6280f1fd
2016-06-29 18:55:12 -07:00
James Zern
0462263765 configure: restore vs_version variable
inadvertently lost in the final patchset of:
078dff7 configure: remove old visual studio support (<2010)

this prevents an empty CONFIG_VS_VERSION and avoids make failure

Change-Id: I529d52eca59329e2715309efd63d80f0e1fed462
2016-06-29 16:57:28 -07:00
JackyChen
5fc2d6cb9f vp9: Change the scheme for modeling rd for 32x32 on newmv_last mode.
For real time CBR mode, use model_rd_for_sb_y for 32x32 if the mode is
newmv last, which is less aggressive in skipping transform and
quantization, to avoid quality regression in some conditions.

Change-Id: Ifa30be587f2a8a4a7f182a172de6ce277c0f8556
2016-06-29 16:28:15 -07:00
James Bankoski
c8f6ed77b9 Merge "Revert "libyuv: update to b8ddb5a2"" 2016-06-29 23:20:25 +00:00
James Bankoski
291033032e Revert "libyuv: update to b8ddb5a2"
This reverts commit b8f83282f8.

The update was to the wrong version and still has:

BUG=webm:1252

Change-Id: I80f3a7c0581ab5e2dd1a84f7840e51d7c362afac
2016-06-29 23:09:10 +00:00
James Zern
3a6a81fc9a Merge changes I9433d858,Iafd05637,If08ce6ca
* changes:
  tests: remove redundant round() definition
  remove visual studio < 2010 workarounds
  configure: remove old visual studio support (<2010)
2016-06-29 23:07:16 +00:00
Yaowu Xu
b458f42966 Merge "Remove effectless initialization" 2016-06-29 22:51:14 +00:00
James Zern
0a64929f19 tests: remove redundant round() definition
use vpx_ports/msvc.h for compatibility

BUG=b/29583530

Change-Id: I9433d8586cd0b790e7f4d697304298feafe801f1
2016-06-29 14:57:47 -07:00
Yaowu Xu
c02a4beed8 Merge "Prevent negative variance" 2016-06-29 20:53:37 +00:00
Linfeng Zhang
6b350766bd Update vpx subpixel 1d filter ssse3 asm
Speed tests show the new vertical filters are slower on Celeron
Chromebooks. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control
which vertical filter code is activated. For now, simply activate the
code that does not degrade on Celeron. Later there should be 2 sets of
vertical filter ssse3 functions, with a jump table choosing between them
based on CPU type.

Change-Id: Iba2f1f2fe059a9d142c396d03a6b8d2d3b981e87
2016-06-29 13:48:41 -07:00
Yaowu Xu
63a37d16f3 Prevent negative variance
Due to rounding, hbd variance may become negative. This commit adds a
check and clamps negative values to 0.

Change-Id: I610d9c8aa2d4eebe7bc5f2c5624a9e3cadad4c94
2016-06-29 11:08:17 -07:00
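The clamp described in the commit above can be sketched as follows. This is a minimal illustration, not the code from the commit: the helper name clamped_variance and its parameters are hypothetical, and the variance formula (sse minus the squared sum divided by the pixel count) is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the library's function: variance of a block of
 * 2^log2_count pixels from a (possibly rounded) sum of squared errors and
 * a (possibly rounded) sum of differences. */
static uint32_t clamped_variance(uint32_t sse, int sum, int log2_count) {
  int64_t var;
  assert(log2_count >= 0 && log2_count < 32);
  /* sum * sum is non-negative, so the right shift is well defined. */
  var = (int64_t)sse - (((int64_t)sum * sum) >> log2_count);
  /* Rounding of sse and sum can push the difference below zero. */
  return (var >= 0) ? (uint32_t)var : 0;
}
```

Doing the subtraction in int64_t keeps the negative intermediate visible, so it can be clamped instead of wrapping to a large unsigned value.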
James Bankoski
f14c323b4c Merge "libyuv: update to b8ddb5a2" 2016-06-29 17:58:40 +00:00
Jim Bankoski
b8f83282f8 libyuv: update to b8ddb5a2
Fixes color issue when scaling without breaking mingw.

BUG=https://bugs.chromium.org/p/libyuv/issues/detail?id=605
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1252

Change-Id: I3920c5664def7ae7a23f60fb160d26d23bc86a27
2016-06-29 17:53:14 +00:00
paulwilkins
be013eb396 Add experimental spatial de-noise filter on key frames.
For forced key frames in particular this helps to make them
blend better with the surrounding frames where noise tends
to be suppressed by a combination of quantization and alt
ref filtering.

Currently disabled by default under an IFDEF flag pending
wider testing.

Change-Id: I971b5cc2b2a4b9e1f11fe06c67ef073f01b25056
2016-06-29 17:25:41 +01:00
Scott LaVarnway
74bb78df82 Merge "VP9: handle_inter_mode()... Use interp_filter" 2016-06-29 11:41:52 +00:00
James Zern
c125f4a594 remove visual studio < 2010 workarounds
BUG=b/29583530

Change-Id: Iafd05637eb65f4da54a9c857e79204a77646858a
2016-06-28 20:58:49 -07:00
James Zern
078dff72ca configure: remove old visual studio support (<2010)
BUG=b/29583530

Change-Id: If08ce6ca352f377ac4db6b9b1909b507bba6d872
2016-06-28 20:40:22 -07:00
jackychen
6b4463dc1f vp9 postproc: Bug fix and code clean.
Bug fix: The crash is caused by not allocating a buffer for prev_mip in
postproc_state. prev_mip in postproc_state is only used for MFQE;
other postproc modules (deblocking, etc.) should not use it.

BUG=webm:1251

Change-Id: I3120d2f50603b4a2d400e92d583960a513953a28
2016-06-28 16:13:44 -07:00
Scott LaVarnway
feb7e9a372 VP9: handle_inter_mode()... Use interp_filter
only if above/left is inter.

Change-Id: I0cc1f926425c021c84536df8271e9ee5f3f87caf
2016-06-28 14:09:59 -07:00
Jacky Chen
d004c64013 Merge "vp9: Increase thr_var for 32x32 blocks in var-based partitioning." 2016-06-28 20:54:06 +00:00
Jacky Chen
4736e5f9d1 Merge "vp9: Move chroma sensitivity check out from choose_partitioning." 2016-06-28 20:53:23 +00:00
Yaowu Xu
43ae6c1e22 Remove effectless initialization
Change-Id: Iec117841a7ecf6f99d2b718057d8646e221c5c64
2016-06-28 12:28:45 -07:00
James Zern
0afe5e405d Merge "*.asm: normalize label format" 2016-06-28 19:22:10 +00:00
jackychen
91038e0eb6 vp9: Move chroma sensitivity check out from choose_partitioning.
Change-Id: Ie78185a30cac4d1841be3708bd23e6505d3733b6
2016-06-28 09:58:51 -07:00
Yaowu Xu
b2d690187e Merge "psnr.c: use int64_t for sum of differences" 2016-06-28 16:55:44 +00:00
Yaowu Xu
d34b49d7b9 psnr.c: use int64_t for sum of differences
Since the values can be negative.

Change-Id: Idda69e9fb47bb34696aeb20170341a0191c5d85e
2016-06-28 09:53:11 -07:00
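A minimal sketch of why a signed 64-bit accumulator is needed, assuming 8-bit samples. The function name sum_of_differences is hypothetical and is not taken from psnr.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: accumulates signed per-sample differences. Each
 * difference lies in [-255, 255], so the running sum can be negative and
 * must be held in a signed type. */
static int64_t sum_of_differences(const uint8_t *a, const uint8_t *b,
                                  size_t n) {
  int64_t sum = 0;
  size_t i;
  assert(a != NULL && b != NULL);
  for (i = 0; i < n; ++i) sum += (int)a[i] - (int)b[i];
  return sum;
}
```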
Parag Salasakar
10b4753179 Merge "mips added p6600 cpu support" 2016-06-28 08:45:01 +00:00
James Zern
f51f67602e *.asm: normalize label format
add a trailing ':', though it's optional with the tools we support, it's
more common to use it to mark a label. this also quiets the
orphan-labels warning with nasm/yasm.

BUG=b/29583530

Change-Id: I46e95255e12026dd542d9838e2dd3fbddf7b56e2
2016-06-27 19:46:57 -07:00
James Bankoski
32ac7cabdf Merge "Revert "libyuv: update to 1b3e4aee47"" 2016-06-27 22:59:11 +00:00
James Bankoski
7f2628152a Revert "libyuv: update to 1b3e4aee47"
This reverts commit 0c6caf187c.

BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1252

Fails mingw_64 builds.

Change-Id: I83e7204bf1be48b499dc32b2597693b95ec49d06
2016-06-27 22:29:52 +00:00
jackychen
8cbd4f8701 vp9: Increase thr_var for 32x32 blocks in var-based partitioning.
For real-time mode, increase variance threshold for 32x32 blocks in
var-based partitioning for resolution >= 720p, so that it is more
likely to stay at 32x32 for high resolution which accelerates the
encoding speed with little/no PSNR drop.

PSNR effect on different speed settings:
speed 8 rtc: 0.02 overall PSNR drop, 0.285% SSIM drop
speed 7 rtc: 0.196% overall PSNR increase, 0.066% SSIM increase
speed 5 rtc_derf: no effect.

Speed up:
gips_motion_WHD, 1mbps: 2.5% faster on speed 7, 2.6% faster on speed8
gips_stat_WHD, 1mbps: 4.6% faster on speed 7, 5.6% faster on speed8

Change-Id: Ie7c33c4d2dd7d09294917e031357fc5476c3a4bb
2016-06-27 14:44:27 -07:00
James Bankoski
71aacf39c7 Merge "libyuv: update to 1b3e4aee47" 2016-06-27 19:32:50 +00:00
Yaowu Xu
7676defca9 Merge "Port metric computation changes from nextgenv2" 2016-06-27 19:18:00 +00:00
Min Chen
b2fb48cfcf improve vpx_filter_block1d* based on replace paddsw+psrlw to pmulhrsw
Change-Id: I14c0c2e54d0b0584df88e9a3f0a256ec096bea6e
2016-06-27 17:50:45 +00:00
Parag Salasakar
7c184c6e1f mips added p6600 cpu support
Removed -funroll-loops

Change-Id: I6684bcac62902c10f945a6dcc4ed803203fcd829
2016-06-27 13:02:55 +05:30
Yaowu Xu
b9ec759bc2 Fix ubsan warnings: vp9/encoder/vp9_pickmode.c
This commit fixes a number of integer out of range issues in HBD builds.

BUG=webm:1219

Change-Id: Ib4192dc74a500e1b86c37a399114c7f6d4ed5185
2016-06-27 05:53:46 +00:00
James Zern
913081ab02 Merge "s/UINT32_MAX/UINT_MAX/" 2016-06-25 21:09:55 +00:00
James Zern
ca88d22f39 s/UINT32_MAX/UINT_MAX/
provides better toolchain compatibility

Change-Id: I8561a6de668a68ff54fe3886a4ee6300f0ae9c04
2016-06-25 12:15:51 -07:00
James Zern
1c0a9f36f1 vp9_pickmode: revert rd modeling change for hbd
Avoids a segfault in high-bitdepth builds.
This restores the condition to its state prior to:
7991241 vp9: Change the scheme for modeling rd for bsize 32x32.

BUG=webm:1250

Change-Id: I6183d5b34cb89dfbf27b7bb589812148a72cd7de
2016-06-25 11:40:26 -07:00
James Zern
cfd5e0221c Revert "Update vpx subpixel 1d filter ssse3 asm"
This reverts commit 1517fb74fd.

Fixes a segfault in windows x64 builds.

Change-Id: I6a6959cd7e64a28376849a9f2b11fc852a7c1fbe
2016-06-25 11:37:20 -07:00
Jacky Chen
168eea5d60 Merge "vp9: Change the scheme for modeling rd for bsize 32x32." 2016-06-25 00:43:40 +00:00
James Zern
922751e059 Merge "datarate_test,DatarateTestLarge: normalize bits type" 2016-06-25 00:36:05 +00:00
Jacky Chen
723e357ead Merge "vp9: Code clean, move low temp var logic out of choose_partitioning." 2016-06-24 22:00:49 +00:00
Jim Bankoski
0c6caf187c libyuv: update to 1b3e4aee47
Color issue when scaling.   https://codereview.chromium.org/2084533006/

Change-Id: I84d74346f754c02a5b770b87b6e0b6885d03bb20
2016-06-24 21:57:48 +00:00
James Zern
b34705f64f Merge "cosmetics: Beautify whitespaces and line wrapping" 2016-06-24 21:51:01 +00:00
James Zern
efad6feb9a Merge "cosmetics: Change few types to their posix version" 2016-06-24 21:50:45 +00:00
James Zern
9e5f355daf Merge "cosmetics: Make few conditions clearer" 2016-06-24 21:50:32 +00:00
Yaowu Xu
003a9d20ad Port metric computation changes from nextgenv2
Change-Id: I4aceffcdf7af59ffeb51984f0345c3a4c7e76a9f
2016-06-24 13:52:50 -07:00
jackychen
dd07443f72 vp9: Code clean, move low temp var logic out of choose_partitioning.
Change-Id: I7093e74131e0964471c9993c1e972b4617c4731d
2016-06-24 13:38:22 -07:00
jackychen
7991241a50 vp9: Change the scheme for modeling rd for bsize 32x32.
For real-time CBR mode, use model_rd_for_sb_y_large instead of
model_rd_for_sb_y for 32x32 block. In the former model, transform
might be skipped more aggressively in some conditions, which speeds
up encoding time with only a little PSNR/SSIM drop on rtc test set.
No obvious visual quality regression.

PSNR effect on different speed settings:
speed 8 rtc:  0.129% overall PSNR drop, 0.137% SSIM drop
speed 7 rtc:  0.135% overall PSNR drop, 0.062% SSIM drop
speed 5 rtc_derf: 0.105% overall PSNR drop, 0.095% SSIM drop

Speed up:
gips_motion_WHD, 1mbps: 3.29% faster on speed 7, 2.56% faster on speed8
gips_stat_WHD, 1mbps: 2.17% faster on speed 7, 1.62% faster on speed8

BUG=webm:1250

Change-Id: I818babce5b8549b4b1a7c3978df8591bffde7173
2016-06-24 12:09:13 -07:00
Marco
b582cf0ea9 vp9-svc: Remove some unneeded code/comment.
Change-Id: I710707296042d8586109760544ef68e40ae486c3
2016-06-24 11:43:11 -07:00
Yury Gitman
67611119b5 cosmetics: Beautify whitespaces and line wrapping
Change-Id: I9afa02cae671bd3527cf344695e53d0cc767f549
2016-06-24 10:18:06 -07:00
Yury Gitman
3b2e2f2f77 cosmetics: Change few types to their posix version
Change-Id: I6d7bc9ed7396e7b0d63ee97bfa473fdea002f9ee
2016-06-24 10:18:06 -07:00
Yury Gitman
79436fadfb cosmetics: Make few conditions clearer
Change-Id: Ib024b3e42efc7ce1af56824a4644fdefcd45b215
2016-06-24 10:17:51 -07:00
Yaowu Xu
7ed1d54ab4 Merge "Revert "vp9: Change the scheme for modeling rd for bsize 32x32."" 2016-06-24 16:05:55 +00:00
Yaowu Xu
26daa30da4 Merge "Rationalize type to avoid integer out of range" 2016-06-24 13:58:36 +00:00
Yaowu Xu
7738bcb350 Rationalize type to avoid integer out of range
BUG=webm:1250

Change-Id: Id5bb2762ca1bf996ba4f9a60eec977a7994c1d94
2016-06-24 13:58:02 +00:00
James Zern
73b11ec876 datarate_test,DatarateTestLarge: normalize bits type
quiets a msvc warning:
conversion from 'const int64_t' to 'size_t', possible loss of data

Change-Id: I90a2ac6b040454dac7434fc9b63b98c42ea127b1
2016-06-23 23:29:26 -07:00
James Zern
d4596485be Revert "vp9: Change the scheme for modeling rd for bsize 32x32."
This reverts commit 5c29ee726e.

Causes segfaults in VP9/EndToEndTestLarge.EndtoEndPSNRTest.

BUG=webm:1250

Change-Id: I8a30e97be30589abdb76820b5c3c37c46cd6cafb
2016-06-23 15:59:25 -07:00
Johann Koenig
57adf3d573 Merge "configure: clean up var style and set_all usage" 2016-06-23 22:59:21 +00:00
Johann
74a61b5ab9 configure: clean up var style and set_all usage
Use quotes whenever possible and {} always for variables.

Replace multiple set_all calls with *able_feature().

Change-Id: If579d3f718bd4133cf1592b4554a8ed00cf9f2d3
2016-06-23 22:15:13 +00:00
Vignesh Venkatasubramanian
692fe74deb Merge "vp9: Fix potential SEGV in decoder_peek_si_internal" 2016-06-23 21:33:13 +00:00
Linfeng Zhang
bdeb5febe4 Merge "Update vpx subpixel 1d filter ssse3 asm" 2016-06-23 19:08:04 +00:00
Johann Koenig
9eeb1f2fc3 Merge "Fail early when android target does not include --sdk-path" 2016-06-23 19:04:52 +00:00
Angie Chiang
424982bc41 Merge "set interp_filter to SWITCHABLE_FILTER for intra block" 2016-06-23 18:56:27 +00:00
Johann Koenig
5e9c5dfdf0 Merge changes Ifddff89d,I827dfe59,Idca7ef45
* changes:
  vp8 machine setup: mark unused variable
  vp8 realtime encoder: mark unused variable
  vp8 error concealment: remove unused variables
2016-06-23 17:55:34 +00:00
Vignesh Venkatasubramanian
aa1c813c43 vp9: Fix potential SEGV in decoder_peek_si_internal
decoder_peek_si_internal could potentially read more bytes than
what actually exists in the input buffer. We check for the buffer
size to be at least 8, but we try to read up to 10 bytes in the
worst case. A well crafted file could thus cause a segfault.
Likely change that introduced this bug was:
https://chromium-review.googlesource.com/#/c/70439 (git hash:
7c43fb6)

BUG=chromium:621095

Change-Id: Id74880cfdded44caaa45bbdbaac859c09d3db752
2016-06-23 09:39:26 -07:00
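The class of bug described above (size check of 8, reads of up to 10 bytes) is avoided by checking against the worst-case read length before touching the buffer. The sketch below is an illustration only: the names SKETCH_MAX_HEADER_BYTES, header_is_readable and read_header_byte are hypothetical, and the actual fix in decoder_peek_si_internal may be structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed worst-case number of header bytes, taken from the commit message. */
#define SKETCH_MAX_HEADER_BYTES 10

/* Hypothetical check: the buffer is usable only if it holds the worst-case
 * number of bytes the parser may read, not merely the common case. */
static int header_is_readable(const uint8_t *data, size_t data_sz) {
  return data != NULL && data_sz >= SKETCH_MAX_HEADER_BYTES;
}

/* Hypothetical accessor: callers must have passed the check above. */
static uint8_t read_header_byte(const uint8_t *data, size_t data_sz,
                                size_t index) {
  assert(header_is_readable(data, data_sz));
  assert(index < SKETCH_MAX_HEADER_BYTES);
  return data[index];
}
```

Keeping the limit in one named constant makes it harder for the size check and the read length to drift apart again.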
Alex Converse
6e4b73125b Merge "vpx_lpf_horizontal_4_sse2: Remove dead load." 2016-06-23 16:20:36 +00:00
Johann
310073868e Fail early when android target does not include --sdk-path
Change-Id: I07e7e63476a2e32e3aae123abdee8b7bbbdc6a8c
2016-06-23 13:48:18 +00:00
Johann Koenig
cc1524aa90 Merge "Add default flags for arm64/armv8 builds" 2016-06-23 13:47:28 +00:00
Johann
6c6eb16bb9 vp8 machine setup: mark unused variable
When building without multithreading and for a non-arm, non-x86 system,
ctx is unused.

Cleans up -Wextra warning:
unused parameter ‘ctx’ [-Werror=unused-parameter]

Change-Id: Ifddff89d2ebd45f7d71e3d415a8f2415dd818957
2016-06-23 13:46:20 +00:00
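One common way to mark a parameter as intentionally unused in a configuration-dependent function is a cast to void. The sketch below assumes a made-up SKETCH_MULTITHREAD switch and struct sketch_ctx; it does not reproduce the vp8 code, which may use a different mechanism.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build switch standing in for a multithreading option. */
#ifndef SKETCH_MULTITHREAD
#define SKETCH_MULTITHREAD 0
#endif

/* Hypothetical context type for illustration only. */
struct sketch_ctx {
  int thread_count;
};

static int machine_specific_config(struct sketch_ctx *ctx) {
#if SKETCH_MULTITHREAD
  assert(ctx != NULL);
  return ctx->thread_count;
#else
  (void)ctx; /* parameter is unused in this configuration */
  return 1;
#endif
}
```

With the switch at its default of 0, the cast silences -Wunused-parameter without changing behavior.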
Johann
3b2c3cb366 vp8 realtime encoder: mark unused variable
'duration' is not used in realtime-only mode:

Cleans up -Wextra warning:
unused parameter 'duration' [-Wunused-parameter]

Change-Id: I827dfe59ebcdc72c5a93fdf7e5aca063433914b1
2016-06-23 13:46:00 +00:00
Johann
55f3740d76 vp8 error concealment: remove unused variables
vp8_conceal_corrupt_mb is an empty function. Remove it entirely.

Cleans up -Wextra warnings:
unused parameter 'mi_stride' [-Wunused-parameter]
unused parameter 'xd' [-Wunused-parameter]

Change-Id: Idca7ef4508fae2b4b76a40d44507522a72ccc2c8
2016-06-22 18:29:03 -07:00
Alex Converse
83db21b2fd vpx_lpf_horizontal_4_sse2: Remove dead load.
Change-Id: I51026c52baa1f0881fcd5b68e1fdf08a2dc0916e
2016-06-22 18:17:41 -07:00
Angie Chiang
d9c417cb49 set interp_filter to SWITCHABLE_FILTER for intra block
In vp9_pick_inter_mode(), instead of using
vp9_get_pred_context_switchable_interp(xd) to assign filter_ref,
we use a less strict condition on assigning filter_ref.
This is to reduce the probability of entering the flow of not
assigning filter_ref and then skipping filter search.

Overall PSNR gain 0.074% for rtc dataset

Details:
Low    Mid     High
0.185% -0.008% -0.082%

Change-Id: Id5c5ab38d3766c213d5681e17b4d1afd1529e676
2016-06-22 17:19:43 -07:00
Alex Converse
b2597527a5 Merge "Repack vp9_token_state." 2016-06-23 00:17:23 +00:00
Jacky Chen
8496390e73 Merge "vp9: Change the scheme for modeling rd for bsize 32x32." 2016-06-22 23:50:46 +00:00
Johann
ac27b062b0 Add default flags for arm64/armv8 builds
Allows building simple targets with sane default flags.

For example, using the Android arm64 toolchain from the NDK:
https://developer.android.com/ndk/guides/standalone_toolchain.html
./build/tools/make-standalone-toolchain.sh --arch=arm64 \
  --platform=android-24 --install-dir=/tmp/arm64
CROSS=/tmp/arm64/bin/aarch64-linux-android- \
  ~/libvpx/configure --target=arm64-linux-gcc --disable-multithread

BUG=webm:1143

Change-Id: I06f5a7564f5382cf1a4bad41aef4308566c53adf
2016-06-22 23:17:17 +00:00
James Zern
527a9fea76 Merge "remove vp10" 2016-06-22 22:35:57 +00:00
Linfeng Zhang
1517fb74fd Update vpx subpixel 1d filter ssse3 asm
Speed tests show the new vertical filters are slower on Celeron
Chromebooks. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control
which vertical filter code is activated. For now, simply activate the
code that does not degrade on Celeron. Later there should be 2 sets of
vertical filter ssse3 functions, with a jump table choosing between them
based on CPU type.

Change-Id: I37e3e9c5694737d9134a6bce6698d3e43f8fc962
2016-06-22 13:15:00 -07:00
jackychen
5c29ee726e vp9: Change the scheme for modeling rd for bsize 32x32.
For real-time CBR mode, use model_rd_for_sb_y_large instead of
model_rd_for_sb_y for 32x32 block. In the former model, transform
might be skipped more aggressively in some conditions, which speeds
up encoding time with only a little PSNR/SSIM drop on rtc test set.
No obvious visual quality regression.

PSNR effect on different speed setting:
speed 8 rtc:  0.129% overall PSNR drop, 0.137% SSIM drop
speed 7 rtc:  0.135% overall PSNR drop, 0.062% SSIM drop
speed 5 rtc_derf: 0.105% overall PSNR drop, 0.095% SSIM drop

Speed up:
gips_motion_WHD, 1mbps: 3.29% faster on speed 7, 2.56% faster on speed8
gips_stat_WHD, 1mbps: 2.17% faster on speed 7, 1.62% faster on speed8

Change-Id: I902f62def225ea01c145d7e5a93497398b8f5edf
2016-06-22 11:17:56 -07:00
Alex Converse
50d3629c61 Repack vp9_token_state.
Reduces size from 32 bytes to 24 bytes on x86_64.

Change-Id: I8a22552343a1fc916117f35267fe6a295250f742
2016-06-20 12:56:32 -07:00
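The size reduction comes from reordering members so that less alignment padding is needed. The two structs below are hypothetical and are not the real vp9_token_state layout. On a typical x86_64 ABI, where int64_t is 8-byte aligned, the first is expected to occupy 32 bytes and the second 24.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: the 4-byte member ahead of the 8-byte member
 * forces 4 bytes of padding, and the trailing byte forces tail padding. */
struct token_state_before {
  int32_t rate;
  int64_t error;
  int32_t next;
  int16_t token;
  int16_t qc;
  uint8_t best_index;
};

/* Hypothetical "after" layout: widest member first, so the smaller members
 * fill what would otherwise be padding. */
struct token_state_after {
  int64_t error;
  int32_t rate;
  int32_t next;
  int16_t token;
  int16_t qc;
  uint8_t best_index;
};

/* Bytes saved per element by the reordering on the current platform. */
static size_t bytes_saved_by_repacking(void) {
  assert(sizeof(struct token_state_after) <=
         sizeof(struct token_state_before));
  return sizeof(struct token_state_before) -
         sizeof(struct token_state_after);
}
```

Exact sizes depend on the platform ABI, so the sketch only asserts that the reordered struct is no larger than the original.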
James Zern
67edc5e83b remove vp10
development has moved to the nextgenv2 branch and a snapshot from here
was used to seed aomedia

BUG=b/29457125

Change-Id: Iedaca11ec7870fb3a4e50b2c9ea0c2b056a0d3c0
2016-06-17 18:26:08 -07:00
1322 changed files with 183939 additions and 204518 deletions

44
.gitignore vendored

@@ -29,36 +29,40 @@
/examples/decode_with_drops
/examples/decode_with_partial_drops
/examples/example_xma
/examples/lossless_encoder
/examples/postproc
/examples/resize_util
/examples/set_maps
/examples/simple_decoder
/examples/simple_encoder
/examples/twopass_encoder
/examples/aom_cx_set_ref
/examples/av1_spatial_scalable_encoder
/examples/aom_temporal_scalable_patterns
/examples/aom_temporal_svc_encoder
/examples/vp8_multi_resolution_encoder
/examples/vp8cx_set_ref
/examples/vp9cx_set_ref
/examples/vp9_lossless_encoder
/examples/vp9_spatial_svc_encoder
/examples/vpx_temporal_svc_encoder
/ivfdec
/ivfdec.dox
/ivfenc
/ivfenc.dox
/libaom.so*
/libaom.ver
/libvpx.so*
/libvpx.ver
/samples.dox
/test_intra_pred_speed
/test_libaom
/aom_api1_migration.dox
/av1_rtcd.h
/aom.pc
/aom_config.c
/aom_config.h
/aom_dsp_rtcd.h
/aom_scale_rtcd.h
/aom_version.h
/aomdec
/aomdec.dox
/aomenc
/aomenc.dox
/test_libvpx
/tools.dox
/tools/*.dox
/tools/tiny_ssim
/vp8_api1_migration.dox
/vp[89x]_rtcd.h
/vpx.pc
/vpx_config.c
/vpx_config.h
/vpx_dsp_rtcd.h
/vpx_scale_rtcd.h
/vpx_version.h
/vpxdec
/vpxdec.dox
/vpxenc
/vpxenc.dox
TAGS


@@ -3,6 +3,7 @@ Aex Converse <aconverse@google.com>
Aex Converse <aconverse@google.com> <alex.converse@gmail.com>
Alexis Ballier <aballier@gentoo.org> <alexis.ballier@gmail.com>
Alpha Lam <hclam@google.com> <hclam@chromium.org>
Daniele Castagna <dcastagna@chromium.org> <dcastagna@google.com>
Deb Mukherjee <debargha@google.com>
Erik Niemeyer <erik.a.niemeyer@intel.com> <erik.a.niemeyer@gmail.com>
Guillaume Martres <gmartres@google.com> <smarter3@gmail.com>
@@ -13,12 +14,15 @@ Jim Bankoski <jimbankoski@google.com>
Johann Koenig <johannkoenig@google.com>
Johann Koenig <johannkoenig@google.com> <johann.koenig@duck.com>
Johann Koenig <johannkoenig@google.com> <johann.koenig@gmail.com>
Johann Koenig <johannkoenig@google.com> <johannkoenig@chromium.org>
John Koleszar <jkoleszar@google.com>
Joshua Litt <joshualitt@google.com> <joshualitt@chromium.org>
Marco Paniconi <marpan@google.com>
Marco Paniconi <marpan@google.com> <marpan@chromium.org>
Pascal Massimino <pascal.massimino@gmail.com>
Paul Wilkins <paulwilkins@google.com>
Peter de Rivaz <peter.derivaz@gmail.com>
Peter de Rivaz <peter.derivaz@gmail.com> <peter.derivaz@argondesign.com>
Ralph Giles <giles@xiph.org> <giles@entropywave.com>
Ralph Giles <giles@xiph.org> <giles@mozilla.com>
Ronald S. Bultje <rsbultje@gmail.com> <rbultje@google.com>
@@ -26,7 +30,8 @@ Sami Pietilä <samipietila@google.com>
Tamar Levy <tamar.levy@intel.com>
Tamar Levy <tamar.levy@intel.com> <levytamar82@gmail.com>
Tero Rintaluoma <teror@google.com> <tero.rintaluoma@on2.com>
Timothy B. Terriberry <tterribe@xiph.org> Tim Terriberry <tterriberry@mozilla.com>
Timothy B. Terriberry <tterribe@xiph.org> <tterriberry@mozilla.com>
Tom Finegan <tomfinegan@google.com>
Tom Finegan <tomfinegan@google.com> <tomfinegan@chromium.org>
Yaowu Xu <yaowu@google.com> <yaowu@xuyaowu.com>
Yaowu Xu <yaowu@google.com> <Yaowu Xu>

31
AUTHORS

@@ -7,6 +7,8 @@ Adam Xu <adam@xuyaowu.com>
Adrian Grange <agrange@google.com>
Aex Converse <aconverse@google.com>
Ahmad Sharif <asharif@google.com>
Aleksey Vasenev <margtu-fivt@ya.ru>
Alexander Potapenko <glider@google.com>
Alexander Voronov <avoronov@graphics.cs.msu.ru>
Alexis Ballier <aballier@gentoo.org>
Alok Ahuja <waveletcoeff@gmail.com>
@@ -24,8 +26,10 @@ changjun.yang <changjun.yang@intel.com>
Charles 'Buck' Krasic <ckrasic@google.com>
chm <chm@rock-chips.com>
Christian Duvivier <cduvivier@google.com>
Daniele Castagna <dcastagna@chromium.org>
Daniel Kang <ddkang@google.com>
Deb Mukherjee <debargha@google.com>
Deepa K G <deepa.kg@ittiam.com>
Dim Temp <dimtemp0@gmail.com>
Dmitry Kovalev <dkovalev@google.com>
Dragan Mrdjan <dmrdjan@mips.com>
@@ -36,6 +40,7 @@ Fabio Pedretti <fabio.ped@libero.it>
Frank Galligan <fgalligan@google.com>
Fredrik Söderquist <fs@opera.com>
Fritz Koenig <frkoenig@google.com>
Gabriel Marin <gmx@chromium.org>
Gaute Strokkenes <gaute.strokkenes@broadcom.com>
Geza Lore <gezalore@gmail.com>
Ghislain MARY <ghislainmary2@gmail.com>
@@ -47,6 +52,7 @@ Hangyu Kuang <hkuang@google.com>
Hanno Böck <hanno@hboeck.de>
Henrik Lundin <hlundin@google.com>
Hui Su <huisu@google.com>
Ivan Krasin <krasin@chromium.org>
Ivan Maltz <ivanmaltz@google.com>
Jacek Caban <cjacek@gmail.com>
Jacky Chen <jackychen@google.com>
@@ -56,16 +62,16 @@ James Zern <jzern@google.com>
Jan Gerber <j@mailb.org>
Jan Kratochvil <jan.kratochvil@redhat.com>
Janne Salonen <jsalonen@google.com>
Jean-Marc Valin <jmvalin@jmvalin.ca>
Jean-Yves Avenard <jyavenard@mozilla.com>
Jeff Faust <jfaust@google.com>
Jeff Muizelaar <jmuizelaar@mozilla.com>
Jeff Petkau <jpet@chromium.org>
Jerome Jiang <jianj@google.com>
Jia Jia <jia.jia@linaro.org>
Jian Zhou <zhoujian@google.com>
Jim Bankoski <jimbankoski@google.com>
Jingning Han <jingning@google.com>
Joey Parrish <joeyparrish@google.com>
Johann Koenig <johannkoenig@chromium.org>
Johann Koenig <johannkoenig@google.com>
John Koleszar <jkoleszar@google.com>
Johnny Klonaris <google@jawknee.com>
@@ -75,8 +81,10 @@ Joshua Litt <joshualitt@google.com>
Julia Robson <juliamrobson@gmail.com>
Justin Clift <justin@salasaga.org>
Justin Lebar <justin.lebar@gmail.com>
Kaustubh Raste <kaustubh.raste@imgtec.com>
KO Myung-Hun <komh@chollian.net>
Lawrence Velázquez <larryv@macports.org>
Linfeng Zhang <linfengz@google.com>
Lou Quillio <louquillio@google.com>
Luca Barbato <lu_zero@gentoo.org>
Makoto Kato <makoto.kt@gmail.com>
@@ -90,9 +98,11 @@ Michael Kohler <michaelkohler@live.com>
Mike Frysinger <vapier@chromium.org>
Mike Hommey <mhommey@mozilla.com>
Mikhal Shemer <mikhal@google.com>
Min Chen <chenm003@gmail.com>
Minghai Shang <minghai@google.com>
Min Ye <yeemmi@google.com>
Morton Jonuschat <yabawock@gmail.com>
Nathan E. Egge <negge@dgql.org>
Nathan E. Egge <negge@mozilla.com>
Nico Weber <thakis@chromium.org>
Parag Salasakar <img.mips1@gmail.com>
Pascal Massimino <pascal.massimino@gmail.com>
@@ -101,17 +111,19 @@ Paul Wilkins <paulwilkins@google.com>
Pavol Rusnak <stick@gk2.sk>
Paweł Hajdan <phajdan@google.com>
Pengchong Jin <pengchong@google.com>
Peter de Rivaz <peter.derivaz@argondesign.com>
Peter Boström <pbos@google.com>
Peter de Rivaz <peter.derivaz@gmail.com>
Philip Jägenstedt <philipj@opera.com>
Priit Laes <plaes@plaes.org>
Rafael Ávila de Espíndola <rafael.espindola@gmail.com>
Rafaël Carré <funman@videolan.org>
Ralph Giles <giles@xiph.org>
Ranjit Kumar Tulabandu <ranjit.tulabandu@ittiam.com>
Rob Bradford <rob@linux.intel.com>
Ronald S. Bultje <rsbultje@gmail.com>
Rui Ueyama <ruiu@google.com>
Sami Pietilä <samipietila@google.com>
Sarah Parker <sarahparker@google.com>
Sasi Inguva <isasi@google.com>
Scott Graham <scottmg@chromium.org>
Scott LaVarnway <slavarnway@google.com>
@@ -121,7 +133,6 @@ Sergey Ulanov <sergeyu@chromium.org>
Shimon Doodkin <helpmepro1@gmail.com>
Shunyao Li <shunyaoli@google.com>
Stefan Holmer <holmer@google.com>
Steinar Midtskogen <stemidts@cisco.com>
Suman Sunkara <sunkaras@google.com>
Taekhyun Kim <takim@nvidia.com>
Takanori MATSUURA <t.matsuu@gmail.com>
@@ -129,16 +140,18 @@ Tamar Levy <tamar.levy@intel.com>
Tao Bai <michaelbai@chromium.org>
Tero Rintaluoma <teror@google.com>
Thijs Vermeir <thijsvermeir@gmail.com>
Thomas Daede <tdaede@mozilla.com>
Thomas Davies <thdavies@cisco.com>
Thomas <thdavies@cisco.com>
Tim Kopp <tkopp@google.com>
Timothy B. Terriberry <tterribe@xiph.org>
Tom Finegan <tomfinegan@google.com>
Tristan Matthews <le.businessman@gmail.com>
Tristan Matthews <tmatth@videolan.org>
Urvang Joshi <urvang@google.com>
Vignesh Venkatasubramanian <vigneshv@google.com>
Yaowu Xu <yaowu@google.com>
Yi Luo <luoyi@google.com>
Yongzhe Wang <yongzhe@google.com>
Yunqing Wang <yunqingwang@google.com>
Yury Gitman <yuryg@google.com>
Zoe Liu <zoeliu@google.com>
Google Inc.
The Mozilla Foundation
The Xiph.Org Foundation


@@ -1,9 +1,49 @@
Next Release
- Incompatible changes:
The AV1 encoder's default keyframe interval changed to 128 from 9999.
2017-01-09 v1.6.1 "Long Tailed Duck"
This release improves upon the VP9 encoder and speeds up the encoding and
decoding processes.
- Upgrading:
This release is ABI compatible with 1.6.0.
- Enhancements:
Faster VP9 encoding and decoding.
High bit depth builds now provide similar speed for 8 bit encode and decode
for x86 targets. Other platforms and higher bit depth improvements are in
progress.
- Bug Fixes:
A variety of fuzzing issues.
2016-07-20 v1.6.0 "Khaki Campbell Duck"
This release improves upon the VP9 encoder and speeds up the encoding and
decoding processes.
- Upgrading:
This release is ABI incompatible with 1.5.0 due to a new 'color_range' enum
in vpx_image and some minor changes to the VP8_COMP structure.
The default key frame interval for VP9 has changed from 128 to 9999.
- Enhancement:
A core focus has been performance for low end Intel processors. SSSE3
instructions such as 'pshufb' have been avoided and instructions have been
reordered to better accommodate the more constrained pipelines.
As a result, devices based on Celeron processors have seen substantial
decoding improvements. From Indian Runner Duck to Javan Whistling Duck,
decoding speed improved between 10 and 30%. Between Javan Whistling Duck
and Khaki Campbell Duck, it improved another 10 to 15%.
While Celeron benefited most, Core-i5 also improved 5% and 10% between the
respective releases.
Realtime performance for WebRTC for both speed and quality has received a
lot of attention.
- Bug Fixes:
A number of fuzzing issues, found variously by Mozilla, Chromium and others,
have been fixed and we strongly recommend updating.
2016-04-07 v0.1.0 "AOMedia Codec 1"
This release is the first Alliance for Open Media codec.
2015-11-09 v1.5.0 "Javan Whistling Duck"
This release improves upon the VP9 encoder and speeds up the encoding and
decoding processes.


@@ -1,270 +0,0 @@
##
## Copyright (c) 2016, Alliance for Open Media. All rights reserved
##
## This source code is subject to the terms of the BSD 2 Clause License and
## the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
## was not distributed with this source code in the LICENSE file, you can
## obtain it at www.aomedia.org/license/software. If the Alliance for Open
## Media Patent License 1.0 was not distributed with this source code in the
## PATENTS file, you can obtain it at www.aomedia.org/license/patent.
##
cmake_minimum_required(VERSION 3.2)
project(AOM C CXX)
set(AOM_ROOT "${CMAKE_CURRENT_SOURCE_DIR}")
set(AOM_CONFIG_DIR "${CMAKE_CURRENT_BINARY_DIR}")
include("${AOM_ROOT}/build/cmake/aom_configure.cmake")
set(AOM_SRCS
"${AOM_CONFIG_DIR}/aom_config.c"
"${AOM_CONFIG_DIR}/aom_config.h"
"${AOM_ROOT}/aom/aom.h"
"${AOM_ROOT}/aom/aom_codec.h"
"${AOM_ROOT}/aom/aom_decoder.h"
"${AOM_ROOT}/aom/aom_encoder.h"
"${AOM_ROOT}/aom/aom_frame_buffer.h"
"${AOM_ROOT}/aom/aom_image.h"
"${AOM_ROOT}/aom/aom_integer.h"
"${AOM_ROOT}/aom/aomcx.h"
"${AOM_ROOT}/aom/aomdx.h"
"${AOM_ROOT}/aom/internal/aom_codec_internal.h"
"${AOM_ROOT}/aom/src/aom_codec.c"
"${AOM_ROOT}/aom/src/aom_decoder.c"
"${AOM_ROOT}/aom/src/aom_encoder.c"
"${AOM_ROOT}/aom/src/aom_image.c")
set(AOM_DSP_SRCS
"${AOM_ROOT}/aom_dsp/aom_convolve.c"
"${AOM_ROOT}/aom_dsp/aom_convolve.h"
"${AOM_ROOT}/aom_dsp/aom_dsp_common.h"
"${AOM_ROOT}/aom_dsp/aom_dsp_rtcd.c"
"${AOM_ROOT}/aom_dsp/aom_filter.h"
"${AOM_ROOT}/aom_dsp/aom_simd.c"
"${AOM_ROOT}/aom_dsp/aom_simd.h"
"${AOM_ROOT}/aom_dsp/aom_simd_inline.h"
"${AOM_ROOT}/aom_dsp/avg.c"
"${AOM_ROOT}/aom_dsp/bitreader.h"
"${AOM_ROOT}/aom_dsp/bitreader_buffer.c"
"${AOM_ROOT}/aom_dsp/bitreader_buffer.h"
"${AOM_ROOT}/aom_dsp/bitwriter.h"
"${AOM_ROOT}/aom_dsp/bitwriter_buffer.c"
"${AOM_ROOT}/aom_dsp/bitwriter_buffer.h"
"${AOM_ROOT}/aom_dsp/blend.h"
"${AOM_ROOT}/aom_dsp/blend_a64_hmask.c"
"${AOM_ROOT}/aom_dsp/blend_a64_mask.c"
"${AOM_ROOT}/aom_dsp/blend_a64_vmask.c"
"${AOM_ROOT}/aom_dsp/dkboolreader.c"
"${AOM_ROOT}/aom_dsp/dkboolreader.h"
"${AOM_ROOT}/aom_dsp/dkboolwriter.c"
"${AOM_ROOT}/aom_dsp/dkboolwriter.h"
"${AOM_ROOT}/aom_dsp/fwd_txfm.c"
"${AOM_ROOT}/aom_dsp/fwd_txfm.h"
"${AOM_ROOT}/aom_dsp/intrapred.c"
"${AOM_ROOT}/aom_dsp/inv_txfm.c"
"${AOM_ROOT}/aom_dsp/inv_txfm.h"
"${AOM_ROOT}/aom_dsp/loopfilter.c"
"${AOM_ROOT}/aom_dsp/prob.c"
"${AOM_ROOT}/aom_dsp/prob.h"
"${AOM_ROOT}/aom_dsp/psnr.c"
"${AOM_ROOT}/aom_dsp/psnr.h"
"${AOM_ROOT}/aom_dsp/quantize.c"
"${AOM_ROOT}/aom_dsp/quantize.h"
"${AOM_ROOT}/aom_dsp/sad.c"
"${AOM_ROOT}/aom_dsp/simd/v128_intrinsics.h"
"${AOM_ROOT}/aom_dsp/simd/v128_intrinsics_c.h"
"${AOM_ROOT}/aom_dsp/simd/v256_intrinsics.h"
"${AOM_ROOT}/aom_dsp/simd/v256_intrinsics_c.h"
"${AOM_ROOT}/aom_dsp/simd/v64_intrinsics.h"
"${AOM_ROOT}/aom_dsp/simd/v64_intrinsics_c.h"
"${AOM_ROOT}/aom_dsp/subtract.c"
"${AOM_ROOT}/aom_dsp/txfm_common.h"
"${AOM_ROOT}/aom_dsp/variance.c"
"${AOM_ROOT}/aom_dsp/variance.h")
set(AOM_MEM_SRCS
"${AOM_ROOT}/aom_mem/aom_mem.c"
"${AOM_ROOT}/aom_mem/aom_mem.h"
"${AOM_ROOT}/aom_mem/include/aom_mem_intrnl.h")
set(AOM_SCALE_SRCS
"${AOM_ROOT}/aom_scale/aom_scale.h"
"${AOM_ROOT}/aom_scale/aom_scale_rtcd.c"
"${AOM_ROOT}/aom_scale/generic/aom_scale.c"
"${AOM_ROOT}/aom_scale/generic/gen_scalers.c"
"${AOM_ROOT}/aom_scale/generic/yv12config.c"
"${AOM_ROOT}/aom_scale/generic/yv12extend.c"
"${AOM_ROOT}/aom_scale/yv12config.h")
# TODO(tomfinegan): Extract aom_ports from aom_util if possible.
set(AOM_UTIL_SRCS
"${AOM_ROOT}/aom_ports/aom_once.h"
"${AOM_ROOT}/aom_ports/aom_timer.h"
"${AOM_ROOT}/aom_ports/bitops.h"
"${AOM_ROOT}/aom_ports/emmintrin_compat.h"
"${AOM_ROOT}/aom_ports/mem.h"
"${AOM_ROOT}/aom_ports/mem_ops.h"
"${AOM_ROOT}/aom_ports/mem_ops_aligned.h"
"${AOM_ROOT}/aom_ports/msvc.h"
"${AOM_ROOT}/aom_ports/system_state.h"
"${AOM_ROOT}/aom_util/aom_thread.c"
"${AOM_ROOT}/aom_util/aom_thread.h"
"${AOM_ROOT}/aom_util/endian_inl.h")
set(AOM_AV1_COMMON_SRCS
"${AOM_ROOT}/av1/av1_iface_common.h"
"${AOM_ROOT}/av1/common/alloccommon.c"
"${AOM_ROOT}/av1/common/alloccommon.h"
"${AOM_ROOT}/av1/common/av1_fwd_txfm.c"
"${AOM_ROOT}/av1/common/av1_fwd_txfm.h"
"${AOM_ROOT}/av1/common/av1_inv_txfm.c"
"${AOM_ROOT}/av1/common/av1_inv_txfm.h"
"${AOM_ROOT}/av1/common/av1_rtcd.c"
"${AOM_ROOT}/av1/common/blockd.c"
"${AOM_ROOT}/av1/common/blockd.h"
"${AOM_ROOT}/av1/common/common.h"
"${AOM_ROOT}/av1/common/common_data.h"
"${AOM_ROOT}/av1/common/convolve.c"
"${AOM_ROOT}/av1/common/convolve.h"
"${AOM_ROOT}/av1/common/debugmodes.c"
"${AOM_ROOT}/av1/common/entropy.c"
"${AOM_ROOT}/av1/common/entropy.h"
"${AOM_ROOT}/av1/common/entropymode.c"
"${AOM_ROOT}/av1/common/entropymode.h"
"${AOM_ROOT}/av1/common/entropymv.c"
"${AOM_ROOT}/av1/common/entropymv.h"
"${AOM_ROOT}/av1/common/enums.h"
"${AOM_ROOT}/av1/common/filter.c"
"${AOM_ROOT}/av1/common/filter.h"
"${AOM_ROOT}/av1/common/frame_buffers.c"
"${AOM_ROOT}/av1/common/frame_buffers.h"
"${AOM_ROOT}/av1/common/idct.c"
"${AOM_ROOT}/av1/common/idct.h"
"${AOM_ROOT}/av1/common/loopfilter.c"
"${AOM_ROOT}/av1/common/loopfilter.h"
"${AOM_ROOT}/av1/common/mv.h"
"${AOM_ROOT}/av1/common/mvref_common.c"
"${AOM_ROOT}/av1/common/mvref_common.h"
"${AOM_ROOT}/av1/common/odintrin.c"
"${AOM_ROOT}/av1/common/odintrin.h"
"${AOM_ROOT}/av1/common/onyxc_int.h"
"${AOM_ROOT}/av1/common/pred_common.c"
"${AOM_ROOT}/av1/common/pred_common.h"
"${AOM_ROOT}/av1/common/quant_common.c"
"${AOM_ROOT}/av1/common/quant_common.h"
"${AOM_ROOT}/av1/common/reconinter.c"
"${AOM_ROOT}/av1/common/reconinter.h"
"${AOM_ROOT}/av1/common/reconintra.c"
"${AOM_ROOT}/av1/common/reconintra.h"
"${AOM_ROOT}/av1/common/scale.c"
"${AOM_ROOT}/av1/common/scale.h"
"${AOM_ROOT}/av1/common/scan.c"
"${AOM_ROOT}/av1/common/scan.h"
"${AOM_ROOT}/av1/common/seg_common.c"
"${AOM_ROOT}/av1/common/seg_common.h"
"${AOM_ROOT}/av1/common/thread_common.c"
"${AOM_ROOT}/av1/common/thread_common.h"
"${AOM_ROOT}/av1/common/tile_common.c"
"${AOM_ROOT}/av1/common/tile_common.h")
set(AOM_AV1_DECODER_SRCS
"${AOM_ROOT}/av1/av1_dx_iface.c"
"${AOM_ROOT}/av1/decoder/decodeframe.c"
"${AOM_ROOT}/av1/decoder/decodeframe.h"
"${AOM_ROOT}/av1/decoder/decodemv.c"
"${AOM_ROOT}/av1/decoder/decodemv.h"
"${AOM_ROOT}/av1/decoder/decoder.c"
"${AOM_ROOT}/av1/decoder/decoder.h"
"${AOM_ROOT}/av1/decoder/detokenize.c"
"${AOM_ROOT}/av1/decoder/detokenize.h"
"${AOM_ROOT}/av1/decoder/dsubexp.c"
"${AOM_ROOT}/av1/decoder/dsubexp.h"
"${AOM_ROOT}/av1/decoder/dthread.c"
"${AOM_ROOT}/av1/decoder/dthread.h")
set(AOM_AV1_ENCODER_SRCS
"${AOM_ROOT}/av1/av1_cx_iface.c"
"${AOM_ROOT}/av1/encoder/aq_complexity.c"
"${AOM_ROOT}/av1/encoder/aq_complexity.h"
"${AOM_ROOT}/av1/encoder/aq_cyclicrefresh.c"
"${AOM_ROOT}/av1/encoder/aq_cyclicrefresh.h"
"${AOM_ROOT}/av1/encoder/aq_variance.c"
"${AOM_ROOT}/av1/encoder/aq_variance.h"
"${AOM_ROOT}/av1/encoder/bitstream.c"
"${AOM_ROOT}/av1/encoder/bitstream.h"
"${AOM_ROOT}/av1/encoder/block.h"
"${AOM_ROOT}/av1/encoder/context_tree.c"
"${AOM_ROOT}/av1/encoder/context_tree.h"
"${AOM_ROOT}/av1/encoder/cost.c"
"${AOM_ROOT}/av1/encoder/cost.h"
"${AOM_ROOT}/av1/encoder/dct.c"
"${AOM_ROOT}/av1/encoder/encodeframe.c"
"${AOM_ROOT}/av1/encoder/encodeframe.h"
"${AOM_ROOT}/av1/encoder/encodemb.c"
"${AOM_ROOT}/av1/encoder/encodemb.h"
"${AOM_ROOT}/av1/encoder/encodemv.c"
"${AOM_ROOT}/av1/encoder/encodemv.h"
"${AOM_ROOT}/av1/encoder/encoder.c"
"${AOM_ROOT}/av1/encoder/encoder.h"
"${AOM_ROOT}/av1/encoder/ethread.c"
"${AOM_ROOT}/av1/encoder/ethread.h"
"${AOM_ROOT}/av1/encoder/extend.c"
"${AOM_ROOT}/av1/encoder/extend.h"
"${AOM_ROOT}/av1/encoder/firstpass.c"
"${AOM_ROOT}/av1/encoder/firstpass.h"
"${AOM_ROOT}/av1/encoder/hybrid_fwd_txfm.c"
"${AOM_ROOT}/av1/encoder/hybrid_fwd_txfm.h"
"${AOM_ROOT}/av1/encoder/lookahead.c"
"${AOM_ROOT}/av1/encoder/lookahead.h"
"${AOM_ROOT}/av1/encoder/mbgraph.c"
"${AOM_ROOT}/av1/encoder/mbgraph.h"
"${AOM_ROOT}/av1/encoder/mcomp.c"
"${AOM_ROOT}/av1/encoder/mcomp.h"
"${AOM_ROOT}/av1/encoder/picklpf.c"
"${AOM_ROOT}/av1/encoder/picklpf.h"
"${AOM_ROOT}/av1/encoder/quantize.c"
"${AOM_ROOT}/av1/encoder/quantize.h"
"${AOM_ROOT}/av1/encoder/ratectrl.c"
"${AOM_ROOT}/av1/encoder/ratectrl.h"
"${AOM_ROOT}/av1/encoder/rd.c"
"${AOM_ROOT}/av1/encoder/rd.h"
"${AOM_ROOT}/av1/encoder/rdopt.c"
"${AOM_ROOT}/av1/encoder/rdopt.h"
"${AOM_ROOT}/av1/encoder/resize.c"
"${AOM_ROOT}/av1/encoder/resize.h"
"${AOM_ROOT}/av1/encoder/segmentation.c"
"${AOM_ROOT}/av1/encoder/segmentation.h"
"${AOM_ROOT}/av1/encoder/speed_features.c"
"${AOM_ROOT}/av1/encoder/speed_features.h"
"${AOM_ROOT}/av1/encoder/subexp.c"
"${AOM_ROOT}/av1/encoder/subexp.h"
"${AOM_ROOT}/av1/encoder/temporal_filter.c"
"${AOM_ROOT}/av1/encoder/temporal_filter.h"
"${AOM_ROOT}/av1/encoder/tokenize.c"
"${AOM_ROOT}/av1/encoder/tokenize.h"
"${AOM_ROOT}/av1/encoder/treewriter.c"
"${AOM_ROOT}/av1/encoder/treewriter.h")
# Targets
add_library(aom_dsp ${AOM_DSP_SRCS})
include_directories(${AOM_ROOT} ${AOM_CONFIG_DIR})
add_library(aom_mem ${AOM_MEM_SRCS})
add_library(aom_scale ${AOM_SCALE_SRCS})
include_directories(${AOM_ROOT} ${AOM_CONFIG_DIR})
add_library(aom_util ${AOM_UTIL_SRCS})
add_library(aom_av1_decoder ${AOM_AV1_DECODER_SRCS})
add_library(aom_av1_encoder ${AOM_AV1_ENCODER_SRCS})
add_library(aom ${AOM_SRCS})
target_link_libraries(aom LINK_PUBLIC
aom_dsp
aom_mem
aom_scale
aom_util
aom_av1_decoder
aom_av1_encoder)
add_executable(simple_decoder examples/simple_decoder.c)
include_directories(${AOM_ROOT})
target_link_libraries(simple_decoder LINK_PUBLIC aom)
add_executable(simple_encoder examples/simple_encoder.c)
include_directories(${AOM_ROOT})
target_link_libraries(simple_encoder LINK_PUBLIC aom)

42
LICENSE

@@ -1,27 +1,31 @@
Copyright (c) 2016, Alliance for Open Media. All rights reserved.
Copyright (c) 2010, The WebM Project authors. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Google, nor the WebM Project, nor the names
of its contributors may be used to endorse or promote products
derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

127
PATENTS

@@ -1,108 +1,23 @@
Alliance for Open Media Patent License 1.0
Additional IP Rights Grant (Patents)
------------------------------------
1. License Terms.
1.1. Patent License. Subject to the terms and conditions of this License, each
Licensor, on behalf of itself and successors in interest and assigns,
grants Licensee a non-sublicensable, perpetual, worldwide, non-exclusive,
no-charge, royalty-free, irrevocable (except as expressly stated in this
License) patent license to its Necessary Claims to make, use, sell, offer
for sale, import or distribute any Implementation.
1.2. Conditions.
1.2.1. Availability. As a condition to the grant of rights to Licensee to make,
sell, offer for sale, import or distribute an Implementation under
Section 1.1, Licensee must make its Necessary Claims available under
this License, and must reproduce this License with any Implementation
as follows:
a. For distribution in source code, by including this License in the
root directory of the source code with its Implementation.
b. For distribution in any other form (including binary, object form,
and/or hardware description code (e.g., HDL, RTL, Gate Level Netlist,
GDSII, etc.)), by including this License in the documentation, legal
notices, and/or other written materials provided with the
Implementation.
1.2.2. Additional Conditions. This license is directly from Licensor to
Licensee. Licensee acknowledges as a condition of benefiting from it
that no rights from Licensor are received from suppliers, distributors,
or otherwise in connection with this License.
1.3. Defensive Termination. If any Licensee, its Affiliates, or its agents
initiates patent litigation or files, maintains, or voluntarily
participates in a lawsuit against another entity or any person asserting
that any Implementation infringes Necessary Claims, any patent licenses
granted under this License directly to the Licensee are immediately
terminated as of the date of the initiation of action unless 1) that suit
was in response to a corresponding suit regarding an Implementation first
brought against an initiating entity, or 2) that suit was brought to
enforce the terms of this License (including intervention in a third-party
action by a Licensee).
1.4. Disclaimers. The Reference Implementation and Specification are provided
"AS IS" and without warranty. The entire risk as to implementing or
otherwise using the Reference Implementation or Specification is assumed
by the implementer and user. Licensor expressly disclaims any warranties
(express, implied, or otherwise), including implied warranties of
merchantability, non-infringement, fitness for a particular purpose, or
title, related to the material. IN NO EVENT WILL LICENSOR BE LIABLE TO
ANY OTHER PARTY FOR LOST PROFITS OR ANY FORM OF INDIRECT, SPECIAL,
INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER FROM ANY CAUSES OF
ACTION OF ANY KIND WITH RESPECT TO THIS LICENSE, WHETHER BASED ON BREACH
OF CONTRACT, TORT (INCLUDING NEGLIGENCE), OR OTHERWISE, AND WHETHER OR
NOT THE OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
2. Definitions.
2.1. Affiliate. "Affiliate" means an entity that directly or indirectly
Controls, is Controlled by, or is under common Control of that party.
2.2. Control. "Control" means direct or indirect control of more than 50% of
the voting power to elect directors of that corporation, or for any other
entity, the power to direct management of such entity.
2.3. Decoder. "Decoder" means any decoder that conforms fully with all
non-optional portions of the Specification.
2.4. Encoder. "Encoder" means any encoder that produces a bitstream that can
be decoded by a Decoder only to the extent it produces such a bitstream.
2.5. Final Deliverable. "Final Deliverable" means the final version of a
deliverable approved by the Alliance for Open Media as a Final
Deliverable.
2.6. Implementation. "Implementation" means any implementation, including the
Reference Implementation, that is an Encoder and/or a Decoder. An
Implementation also includes components of an Implementation only to the
extent they are used as part of an Implementation.
2.7. License. "License" means this license.
2.8. Licensee. "Licensee" means any person or entity who exercises patent
rights granted under this License.
2.9. Licensor. "Licensor" means (i) any Licensee that makes, sells, offers
for sale, imports or distributes any Implementation, or (ii) a person
or entity that has a licensing obligation to the Implementation as a
result of its membership and/or participation in the Alliance for Open
Media working group that developed the Specification.
2.10. Necessary Claims. "Necessary Claims" means all claims of patents or
patent applications, (a) that currently or at any time in the future,
are owned or controlled by the Licensor, and (b) (i) would be an
Essential Claim as defined by the W3C Policy as of February 5, 2004
(https://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential)
as if the Specification was a W3C Recommendation; or (ii) are infringed
by the Reference Implementation.
2.11. Reference Implementation. "Reference Implementation" means an Encoder
and/or Decoder released by the Alliance for Open Media as a Final
Deliverable.
2.12. Specification. "Specification" means the specification designated by
the Alliance for Open Media as a Final Deliverable for which this
License was issued.
"These implementations" means the copyrightable works that implement the WebM
codecs distributed by Google as part of the WebM Project.
Google hereby grants to you a perpetual, worldwide, non-exclusive, no-charge,
royalty-free, irrevocable (except as stated in this section) patent license to
make, have made, use, offer to sell, sell, import, transfer, and otherwise
run, modify and propagate the contents of these implementations of WebM, where
such license applies only to those patent claims, both currently owned by
Google and acquired in the future, licensable by Google that are necessarily
infringed by these implementations of WebM. This grant does not include claims
that would be infringed only as a consequence of further modification of these
implementations. If you or your agent or exclusive licensee institute or order
or agree to the institution of patent litigation or any other patent
enforcement activity against any entity (including a cross-claim or
counterclaim in a lawsuit) alleging that any of these implementations of WebM
or any code incorporated within any of these implementations of WebM
constitute direct or contributory patent infringement, or inducement of
patent infringement, then any patent rights granted to you under this License
for these implementations of WebM shall terminate as of the date such
litigation is filed.

26
README

@@ -1,6 +1,6 @@
README - 23 March 2015
README - 9 January 2017
Welcome to the WebM VP8/AV1 Codec SDK!
Welcome to the WebM VP8/VP9 Codec SDK!
COMPILING THE APPLICATIONS/LIBRARIES:
The build system used is similar to autotools. Building generally consists of
@@ -33,13 +33,13 @@ COMPILING THE APPLICATIONS/LIBRARIES:
$ mkdir build
$ cd build
$ ../libaom/configure <options>
$ ../libvpx/configure <options>
$ make
3. Configuration options
The 'configure' script supports a number of options. The --help option can be
used to get a list of supported options:
$ ../libaom/configure --help
$ ../libvpx/configure --help
4. Cross development
For cross development, the most notable option is the --target option. The
@@ -47,10 +47,9 @@ COMPILING THE APPLICATIONS/LIBRARIES:
--help output of the configure script. As of this writing, the list of
available targets is:
armv6-linux-rvct
armv6-linux-gcc
armv6-none-rvct
arm64-android-gcc
arm64-darwin-gcc
arm64-linux-gcc
armv7-android-gcc
armv7-darwin-gcc
armv7-linux-rvct
@@ -60,6 +59,7 @@ COMPILING THE APPLICATIONS/LIBRARIES:
armv7-win32-vs12
armv7-win32-vs14
armv7s-darwin-gcc
armv8-linux-gcc
mips32-linux-gcc
mips64-linux-gcc
sparc-solaris-gcc
@@ -73,6 +73,7 @@ COMPILING THE APPLICATIONS/LIBRARIES:
x86-darwin12-gcc
x86-darwin13-gcc
x86-darwin14-gcc
x86-darwin15-gcc
x86-iphonesimulator-gcc
x86-linux-gcc
x86-linux-icc
@@ -90,6 +91,7 @@ COMPILING THE APPLICATIONS/LIBRARIES:
x86_64-darwin12-gcc
x86_64-darwin13-gcc
x86_64-darwin14-gcc
x86_64-darwin15-gcc
x86_64-iphonesimulator-gcc
x86_64-linux-gcc
x86_64-linux-icc
@@ -108,7 +110,7 @@ COMPILING THE APPLICATIONS/LIBRARIES:
toolchain, the following command could be used (note, POSIX SH syntax, adapt
to your shell as necessary):
$ CROSS=mipsel-linux-uclibc- ../libaom/configure
$ CROSS=mipsel-linux-uclibc- ../libvpx/configure
In addition, the executables to be invoked can be overridden by specifying the
environment variables: CC, AR, LD, AS, STRIP, NM. Additional flags can be
@@ -119,13 +121,13 @@ COMPILING THE APPLICATIONS/LIBRARIES:
This defaults to config.log. This should give a good indication of what went
wrong. If not, contact us for support.
VP8/AV1 TEST VECTORS:
VP8/VP9 TEST VECTORS:
The test vectors can be downloaded and verified using the build system after
running configure. To specify an alternate directory the
LIBAOM_TEST_DATA_PATH environment variable can be used.
LIBVPX_TEST_DATA_PATH environment variable can be used.
$ ./configure --enable-unit-tests
$ LIBAOM_TEST_DATA_PATH=../-test-data make testdata
$ LIBVPX_TEST_DATA_PATH=../libvpx-test-data make testdata
CODE STYLE:
The coding style used by this project is enforced with clang-format using the
@@ -144,5 +146,5 @@ CODE STYLE:
SUPPORT
This library is an open source project supported by its community. Please
please email webm-discuss@webmproject.org for help.
email webm-discuss@webmproject.org for help.

160
aom/aom.h

@@ -1,160 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
/*!\defgroup aom AOM
* \ingroup codecs
* AOM is aom's newest video compression algorithm that uses motion
* compensated prediction, Discrete Cosine Transform (DCT) coding of the
* prediction error signal and context dependent entropy coding techniques
* based on arithmetic principles. It features:
* - YUV 4:2:0 image format
* - Macro-block based coding (16x16 luma plus two 8x8 chroma)
* - 1/4 (1/8) pixel accuracy motion compensated prediction
* - 4x4 DCT transform
* - 128 level linear quantizer
* - In loop deblocking filter
* - Context-based entropy coding
*
* @{
*/
/*!\file
* \brief Provides controls common to both the AOM encoder and decoder.
*/
#ifndef AOM_AOM_H_
#define AOM_AOM_H_
#include "./aom_codec.h"
#include "./aom_image.h"
#ifdef __cplusplus
extern "C" {
#endif
/*!\brief Control functions
*
* The set of macros define the control functions of AOM interface
*/
enum aom_com_control_id {
/*!\brief pass in an external frame into decoder to be used as reference frame
*/
AOM_SET_REFERENCE = 1,
AOM_COPY_REFERENCE = 2, /**< get a copy of reference frame from the decoder */
AOM_SET_POSTPROC = 3, /**< set the decoder's post processing settings */
AOM_SET_DBG_COLOR_REF_FRAME =
4, /**< set the reference frames to color for each macroblock */
AOM_SET_DBG_COLOR_MB_MODES = 5, /**< set which macro block modes to color */
AOM_SET_DBG_COLOR_B_MODES = 6, /**< set which blocks modes to color */
AOM_SET_DBG_DISPLAY_MV = 7, /**< set which motion vector modes to draw */
/* TODO(jkoleszar): The encoder incorrectly reuses some of these values (5+)
* for its control ids. These should be migrated to something like the
* AOM_DECODER_CTRL_ID_START range next time we're ready to break the ABI.
*/
AV1_GET_REFERENCE = 128, /**< get a pointer to a reference frame */
AOM_COMMON_CTRL_ID_MAX,
AV1_GET_NEW_FRAME_IMAGE = 192, /**< get a pointer to the new frame */
AOM_DECODER_CTRL_ID_START = 256
};
/*!\brief post process flags
*
* The set of macros define AOM decoder post processing flags
*/
enum aom_postproc_level {
AOM_NOFILTERING = 0,
AOM_DEBLOCK = 1 << 0,
AOM_DEMACROBLOCK = 1 << 1,
AOM_ADDNOISE = 1 << 2,
AOM_DEBUG_TXT_FRAME_INFO = 1 << 3, /**< print frame information */
AOM_DEBUG_TXT_MBLK_MODES =
1 << 4, /**< print macro block modes over each macro block */
AOM_DEBUG_TXT_DC_DIFF = 1 << 5, /**< print dc diff for each macro block */
AOM_DEBUG_TXT_RATE_INFO = 1 << 6, /**< print video rate info (encoder only) */
AOM_MFQE = 1 << 10
};
/*!\brief post process flags
*
* This defines a structure that describes the post processing settings. For
* the best objective measure (using the PSNR metric) set post_proc_flag
* to AOM_DEBLOCK and deblocking_level to 1.
*/
typedef struct aom_postproc_cfg {
/*!\brief the types of post processing to be done, should be combination of
* "aom_postproc_level" */
int post_proc_flag;
int deblocking_level; /**< the strength of deblocking, valid range [0, 16] */
int noise_level; /**< the strength of additive noise, valid range [0, 16] */
} aom_postproc_cfg_t;
/*!\brief reference frame type
*
* The set of macros define the type of AOM reference frames
*/
typedef enum aom_ref_frame_type {
AOM_LAST_FRAME = 1,
AOM_GOLD_FRAME = 2,
AOM_ALTR_FRAME = 4
} aom_ref_frame_type_t;
/*!\brief reference frame data struct
*
* Define the data struct to access aom reference frames.
*/
typedef struct aom_ref_frame {
aom_ref_frame_type_t frame_type; /**< which reference frame */
aom_image_t img; /**< reference frame data in image format */
} aom_ref_frame_t;
/*!\brief AV1 specific reference frame data struct
*
* Define the data struct to access av1 reference frames.
*/
typedef struct av1_ref_frame {
int idx; /**< frame index to get (input) */
aom_image_t img; /**< img structure to populate (output) */
} av1_ref_frame_t;
/*!\cond */
/*!\brief aom decoder control function parameter type
*
* defines the data type that each AOM decoder control function requires
*/
AOM_CTRL_USE_TYPE(AOM_SET_REFERENCE, aom_ref_frame_t *)
#define AOM_CTRL_AOM_SET_REFERENCE
AOM_CTRL_USE_TYPE(AOM_COPY_REFERENCE, aom_ref_frame_t *)
#define AOM_CTRL_AOM_COPY_REFERENCE
AOM_CTRL_USE_TYPE(AOM_SET_POSTPROC, aom_postproc_cfg_t *)
#define AOM_CTRL_AOM_SET_POSTPROC
AOM_CTRL_USE_TYPE(AOM_SET_DBG_COLOR_REF_FRAME, int)
#define AOM_CTRL_AOM_SET_DBG_COLOR_REF_FRAME
AOM_CTRL_USE_TYPE(AOM_SET_DBG_COLOR_MB_MODES, int)
#define AOM_CTRL_AOM_SET_DBG_COLOR_MB_MODES
AOM_CTRL_USE_TYPE(AOM_SET_DBG_COLOR_B_MODES, int)
#define AOM_CTRL_AOM_SET_DBG_COLOR_B_MODES
AOM_CTRL_USE_TYPE(AOM_SET_DBG_DISPLAY_MV, int)
#define AOM_CTRL_AOM_SET_DBG_DISPLAY_MV
AOM_CTRL_USE_TYPE(AV1_GET_REFERENCE, av1_ref_frame_t *)
#define AOM_CTRL_AV1_GET_REFERENCE
AOM_CTRL_USE_TYPE(AV1_GET_NEW_FRAME_IMAGE, aom_image_t *)
#define AOM_CTRL_AV1_GET_NEW_FRAME_IMAGE
/*!\endcond */
/*! @} - end defgroup aom */
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_AOM_H_
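
The `aom_postproc_level` values in the header above are single bits that are OR-ed together into `aom_postproc_cfg_t::post_proc_flag`, and both strength fields are documented with a valid range of [0, 16]. The following is a minimal sketch of that usage, not code from the library: it re-declares the relevant enum values and the struct under local names so that it compiles without the AOM headers, and the helpers `clamp_level` and `make_postproc_cfg` are hypothetical names introduced only for illustration.

```c
#include <assert.h>

/* Local mirror of the filtering bits shown in aom.h above, re-declared so
 * this sketch compiles standalone (assumption: values copied from the diff). */
enum postproc_level {
  NOFILTERING = 0,
  DEBLOCK = 1 << 0,
  DEMACROBLOCK = 1 << 1,
  ADDNOISE = 1 << 2
};

/* Local mirror of aom_postproc_cfg_t. */
typedef struct postproc_cfg {
  int post_proc_flag;   /* bitwise OR of postproc_level values */
  int deblocking_level; /* documented valid range [0, 16] */
  int noise_level;      /* documented valid range [0, 16] */
} postproc_cfg_t;

/* Hypothetical helper: clamp a strength into the documented [0, 16] range. */
static int clamp_level(int v) { return v < 0 ? 0 : (v > 16 ? 16 : v); }

/* Hypothetical helper (not an AOM function): build a config from a set of
 * OR-ed flags and two strengths, clamping the strengths. */
static postproc_cfg_t make_postproc_cfg(int flags, int deblock, int noise) {
  postproc_cfg_t cfg;
  cfg.post_proc_flag = flags;
  cfg.deblocking_level = clamp_level(deblock);
  cfg.noise_level = clamp_level(noise);
  /* Sanity check of the post-condition established by clamp_level. */
  assert(cfg.deblocking_level >= 0 && cfg.deblocking_level <= 16);
  assert(cfg.noise_level >= 0 && cfg.noise_level <= 16);
  return cfg;
}
```

With these helpers, the setting the header comment recommends for the best PSNR would be `make_postproc_cfg(DEBLOCK, 1, 0)`. In real code the resulting struct would be passed by pointer through the `AOM_SET_POSTPROC` control, whose parameter type the header declares as `aom_postproc_cfg_t *`.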


@@ -1,42 +0,0 @@
##
## Copyright (c) 2016, Alliance for Open Media. All rights reserved
##
## This source code is subject to the terms of the BSD 2 Clause License and
## the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
## was not distributed with this source code in the LICENSE file, you can
## obtain it at www.aomedia.org/license/software. If the Alliance for Open
## Media Patent License 1.0 was not distributed with this source code in the
## PATENTS file, you can obtain it at www.aomedia.org/license/patent.
##
API_EXPORTS += exports
API_SRCS-$(CONFIG_AV1_ENCODER) += aom.h
API_SRCS-$(CONFIG_AV1_ENCODER) += aomcx.h
API_DOC_SRCS-$(CONFIG_AV1_ENCODER) += aom.h
API_DOC_SRCS-$(CONFIG_AV1_ENCODER) += aomcx.h
API_SRCS-$(CONFIG_AV1_DECODER) += aom.h
API_SRCS-$(CONFIG_AV1_DECODER) += aomdx.h
API_DOC_SRCS-$(CONFIG_AV1_DECODER) += aom.h
API_DOC_SRCS-$(CONFIG_AV1_DECODER) += aomdx.h
API_DOC_SRCS-yes += aom_codec.h
API_DOC_SRCS-yes += aom_decoder.h
API_DOC_SRCS-yes += aom_encoder.h
API_DOC_SRCS-yes += aom_frame_buffer.h
API_DOC_SRCS-yes += aom_image.h
API_SRCS-yes += src/aom_decoder.c
API_SRCS-yes += aom_decoder.h
API_SRCS-yes += src/aom_encoder.c
API_SRCS-yes += aom_encoder.h
API_SRCS-yes += internal/aom_codec_internal.h
API_SRCS-yes += src/aom_codec.c
API_SRCS-yes += src/aom_image.c
API_SRCS-yes += aom_codec.h
API_SRCS-yes += aom_codec.mk
API_SRCS-yes += aom_frame_buffer.h
API_SRCS-yes += aom_image.h
API_SRCS-yes += aom_integer.h


@@ -1,759 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_AOMCX_H_
#define AOM_AOMCX_H_
/*!\defgroup aom_encoder AOMedia AOM/AV1 Encoder
* \ingroup aom
*
* @{
*/
#include "./aom.h"
#include "./aom_encoder.h"
/*!\file
* \brief Provides definitions for using AOM or AV1 encoder algorithm within the
* aom Codec Interface.
*/
#ifdef __cplusplus
extern "C" {
#endif
/*!\name Algorithm interface for AV1
*
* This interface provides the capability to encode raw AV1 streams.
* @{
*/
extern aom_codec_iface_t aom_codec_av1_cx_algo;
extern aom_codec_iface_t *aom_codec_av1_cx(void);
/*!@} - end algorithm interface member group*/
/*
* Algorithm Flags
*/
/*!\brief Don't reference the last frame
*
* When this flag is set, the encoder will not use the last frame as a
* predictor. When not set, the encoder will choose whether to use the
* last frame or not automatically.
*/
#define AOM_EFLAG_NO_REF_LAST (1 << 16)
/*!\brief Don't reference the golden frame
*
* When this flag is set, the encoder will not use the golden frame as a
* predictor. When not set, the encoder will choose whether to use the
* golden frame or not automatically.
*/
#define AOM_EFLAG_NO_REF_GF (1 << 17)
/*!\brief Don't reference the alternate reference frame
*
* When this flag is set, the encoder will not use the alt ref frame as a
* predictor. When not set, the encoder will choose whether to use the
* alt ref frame or not automatically.
*/
#define AOM_EFLAG_NO_REF_ARF (1 << 21)
/*!\brief Don't update the last frame
*
* When this flag is set, the encoder will not update the last frame with
* the contents of the current frame.
*/
#define AOM_EFLAG_NO_UPD_LAST (1 << 18)
/*!\brief Don't update the golden frame
*
* When this flag is set, the encoder will not update the golden frame with
* the contents of the current frame.
*/
#define AOM_EFLAG_NO_UPD_GF (1 << 22)
/*!\brief Don't update the alternate reference frame
*
* When this flag is set, the encoder will not update the alt ref frame with
* the contents of the current frame.
*/
#define AOM_EFLAG_NO_UPD_ARF (1 << 23)
/*!\brief Force golden frame update
*
* When this flag is set, the encoder copies the contents of the current frame
* to the golden frame buffer.
*/
#define AOM_EFLAG_FORCE_GF (1 << 19)
/*!\brief Force alternate reference frame update
*
* When this flag is set, the encoder copies the contents of the current frame
* to the alternate reference frame buffer.
*/
#define AOM_EFLAG_FORCE_ARF (1 << 24)
/*!\brief Disable entropy update
*
* When this flag is set, the encoder will not update its internal entropy
* model based on the entropy of this frame.
*/
#define AOM_EFLAG_NO_UPD_ENTROPY (1 << 20)
/*!\brief AVx encoder control functions
*
* This set of macros define the control functions available for AVx
* encoder interface.
*
* \sa #aom_codec_control
*/
enum aome_enc_control_id {
/*!\brief Codec control function to set which reference frame encoder can use.
*
* Supported in codecs: VP8, AV1
*/
AOME_USE_REFERENCE = 7,
/*!\brief Codec control function to pass an ROI map to encoder.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_ROI_MAP = 8,
/*!\brief Codec control function to pass an Active map to encoder.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_ACTIVEMAP,
/*!\brief Codec control function to set encoder scaling mode.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_SCALEMODE = 11,
/*!\brief Codec control function to set encoder internal speed settings.
*
* Changes in this value influence, among others, the encoder's selection
* of motion estimation methods. Values greater than 0 will increase encoder
* speed at the expense of quality.
*
* \note Valid range for VP8: -16..16
* \note Valid range for AV1: -8..8
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_CPUUSED = 13,
/*!\brief Codec control function to enable automatic setting and use of alt ref frames.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_ENABLEAUTOALTREF,
#if CONFIG_EXT_REFS
/*!\brief Codec control function to enable automatic set and use
* bwd-pred frames.
*
* Supported in codecs: AV1
*/
AOME_SET_ENABLEAUTOBWDREF,
#endif // CONFIG_EXT_REFS
/*!\brief control function to set noise sensitivity
*
* 0: off, 1: OnYOnly, 2: OnYUV,
* 3: OnYUVAggressive, 4: Adaptive
*
* Supported in codecs: VP8
*/
AOME_SET_NOISE_SENSITIVITY,
/*!\brief Codec control function to set sharpness.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_SHARPNESS,
/*!\brief Codec control function to set the threshold for MBs treated static.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_STATIC_THRESHOLD,
/*!\brief Codec control function to set the number of token partitions.
*
* Supported in codecs: VP8
*/
AOME_SET_TOKEN_PARTITIONS,
/*!\brief Codec control function to get last quantizer chosen by the encoder.
*
* Return value uses internal quantizer scale defined by the codec.
*
* Supported in codecs: VP8, AV1
*/
AOME_GET_LAST_QUANTIZER,
/*!\brief Codec control function to get last quantizer chosen by the encoder.
*
* Return value uses the 0..63 scale as used by the rc_*_quantizer config
* parameters.
*
* Supported in codecs: VP8, AV1
*/
AOME_GET_LAST_QUANTIZER_64,
/*!\brief Codec control function to set the max no of frames to create arf.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_ARNR_MAXFRAMES,
/*!\brief Codec control function to set the filter strength for the arf.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_ARNR_STRENGTH,
/*!\deprecated control function to set the filter type to use for the arf. */
AOME_SET_ARNR_TYPE,
/*!\brief Codec control function to set visual tuning.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_TUNING,
/*!\brief Codec control function to set constrained quality level.
*
* \attention For this value to be used aom_codec_enc_cfg_t::g_usage must be
* set to #AOM_CQ.
* \note Valid range: 0..63
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_CQ_LEVEL,
/*!\brief Codec control function to set Max data rate for Intra frames.
*
* This value controls additional clamping on the maximum size of a
* keyframe. It is expressed as a percentage of the average
* per-frame bitrate, with the special (and default) value 0 meaning
* unlimited, or no additional clamping beyond the codec's built-in
* algorithm.
*
* For example, to allocate no more than 4.5 frames worth of bitrate
* to a keyframe, set this to 450.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_MAX_INTRA_BITRATE_PCT,
/*!\brief Codec control function to set reference and update frame flags.
*
* Supported in codecs: VP8
*/
AOME_SET_FRAME_FLAGS,
/*!\brief Codec control function to set max data rate for Inter frames.
*
* This value controls additional clamping on the maximum size of an
* inter frame. It is expressed as a percentage of the average
* per-frame bitrate, with the special (and default) value 0 meaning
* unlimited, or no additional clamping beyond the codec's built-in
* algorithm.
*
* For example, to allow no more than 4.5 frames worth of bitrate
* to an inter frame, set this to 450.
*
* Supported in codecs: AV1
*/
AV1E_SET_MAX_INTER_BITRATE_PCT,
/*!\brief Boost percentage for Golden Frame in CBR mode.
*
* This value controls the amount of boost given to Golden Frame in
* CBR mode. It is expressed as a percentage of the average
* per-frame bitrate, with the special (and default) value 0 meaning
* the feature is off, i.e., no golden frame boost in CBR mode and
* average bitrate target is used.
*
* For example, to allow 100% more bits, i.e., 2X, in a golden frame
* than average frame, set this to 100.
*
* Supported in codecs: AV1
*/
AV1E_SET_GF_CBR_BOOST_PCT,
/*!\brief Codec control function to set encoder screen content mode.
*
* 0: off, 1: On, 2: On with more aggressive rate control.
*
* Supported in codecs: VP8
*/
AOME_SET_SCREEN_CONTENT_MODE,
/*!\brief Codec control function to set lossless encoding mode.
*
* AV1 can operate in lossless encoding mode, in which the bitstream
* produced will be able to decode and reconstruct a perfect copy of
* input source. This control function provides a means to switch encoder
* into lossless coding mode(1) or normal coding mode(0) that may be lossy.
* 0 = lossy coding mode
* 1 = lossless coding mode
*
* By default, encoder operates in normal coding mode (maybe lossy).
*
* Supported in codecs: AV1
*/
AV1E_SET_LOSSLESS,
#if CONFIG_AOM_QM
/*!\brief Codec control function to encode with quantisation matrices.
*
* AOM can operate with default quantisation matrices dependent on
* quantisation level and block type.
* 0 = do not use quantisation matrices
* 1 = use quantisation matrices
*
* By default, the encoder operates without quantisation matrices.
*
* Supported in codecs: AOM
*/
AV1E_SET_ENABLE_QM,
/*!\brief Codec control function to set the min quant matrix flatness.
*
* AOM can operate with different ranges of quantisation matrices.
* As quantisation levels increase, the matrices get flatter. This
* control sets the minimum level of flatness from which the matrices
* are determined.
*
* By default, the encoder sets this minimum at half the available
* range.
*
* Supported in codecs: AOM
*/
AV1E_SET_QM_MIN,
/*!\brief Codec control function to set the max quant matrix flatness.
*
* AOM can operate with different ranges of quantisation matrices.
* As quantisation levels increase, the matrices get flatter. This
* control sets the maximum level of flatness possible.
*
* By default, the encoder sets this maximum at the top of the
* available range.
*
* Supported in codecs: AOM
*/
AV1E_SET_QM_MAX,
#endif
/*!\brief Codec control function to set number of tile columns.
*
* In encoding and decoding, AV1 allows an input image frame to be partitioned
* into separate vertical tile columns, which can be encoded or decoded
* independently. This enables easy implementation of parallel encoding and
* decoding. This control requests the encoder to use column tiles in
* encoding an input frame, with the number of tile columns (in log2 units)
* as the parameter:
* 0 = 1 tile column
* 1 = 2 tile columns
* 2 = 4 tile columns
* .....
* n = 2**n tile columns
* The encoder caps the requested number of tile columns based on image
* size (the minimum width of a tile column is 256 pixels, the maximum
* is 4096).
*
* By default, the value is 0, i.e. one single column tile for entire image.
*
* Supported in codecs: AV1
*/
AV1E_SET_TILE_COLUMNS,
/*!\brief Codec control function to set number of tile rows.
*
* In encoding and decoding, AV1 allows an input image frame to be partitioned
* into separate horizontal tile rows. Tile rows are encoded or decoded
* sequentially. Even though encoding/decoding of later tile rows depends on
* earlier ones, this allows the encoder to output data packets for tile rows
* prior to completely processing all tile rows in a frame, thereby reducing
* the latency in processing between input and output. The parameter
* for this control describes the number of tile rows, which has a valid
* range [0, 2]:
* 0 = 1 tile row
* 1 = 2 tile rows
* 2 = 4 tile rows
*
* By default, the value is 0, i.e. one single row tile for entire image.
*
* Supported in codecs: AV1
*/
AV1E_SET_TILE_ROWS,
/*!\brief Codec control function to enable frame parallel decoding feature.
*
* AV1 has a bitstream feature to reduce decoding dependency between frames
* by turning off backward update of probability context used in encoding
* and decoding. This allows staged parallel processing of more than one
* video frame in the decoder. This control function provides a means to
* turn this feature on or off for bitstreams produced by the encoder.
*
* By default, this feature is off.
*
* Supported in codecs: AV1
*/
AV1E_SET_FRAME_PARALLEL_DECODING,
/*!\brief Codec control function to set adaptive quantization mode.
*
* AV1 has a segment-based feature that allows the encoder to adaptively
* change the quantization parameter for each segment within a frame to
* improve subjective quality. This control makes the encoder operate in
* one of the several AQ modes supported.
*
* By default, the encoder operates with AQ mode 0 (adaptive quantization off).
*
* Supported in codecs: AV1
*/
AV1E_SET_AQ_MODE,
/*!\brief Codec control function to enable/disable periodic Q boost.
*
* One AV1 encoder speed feature is to enable quality boost by lowering
* frame level Q periodically. This control function provides a means to
* turn this feature on or off.
* 0 = off
* 1 = on
*
* By default, the encoder is allowed to use this feature for appropriate
* encoding modes.
*
* Supported in codecs: AV1
*/
AV1E_SET_FRAME_PERIODIC_BOOST,
/*!\brief Codec control function to set noise sensitivity.
*
* 0: off, 1: On(YOnly)
*
* Supported in codecs: AV1
*/
AV1E_SET_NOISE_SENSITIVITY,
/*!\brief Codec control function to set content type.
* \note Valid parameter range:
* AOM_CONTENT_DEFAULT = Regular video content (Default)
* AOM_CONTENT_SCREEN = Screen capture content
*
* Supported in codecs: AV1
*/
AV1E_SET_TUNE_CONTENT,
/*!\brief Codec control function to set color space info.
* \note Valid range: 0..7, default is "UNKNOWN".
* 0 = UNKNOWN,
* 1 = BT_601
* 2 = BT_709
* 3 = SMPTE_170
* 4 = SMPTE_240
* 5 = BT_2020
* 6 = RESERVED
* 7 = SRGB
*
* Supported in codecs: AV1
*/
AV1E_SET_COLOR_SPACE,
/*!\brief Codec control function to set minimum interval between GF/ARF frames
*
* By default the value is set as 4.
*
* Supported in codecs: AV1
*/
AV1E_SET_MIN_GF_INTERVAL,
/*!\brief Codec control function to set maximum interval between GF/ARF frames
*
* By default the value is set as 16.
*
* Supported in codecs: AV1
*/
AV1E_SET_MAX_GF_INTERVAL,
/*!\brief Codec control function to get an Active map back from the encoder.
*
* Supported in codecs: AV1
*/
AV1E_GET_ACTIVEMAP,
/*!\brief Codec control function to set color range bit.
* \note Valid range: 0..1, default is 0
* 0 = Limited range (16..235 or HBD equivalent)
* 1 = Full range (0..255 or HBD equivalent)
*
* Supported in codecs: AV1
*/
AV1E_SET_COLOR_RANGE,
/*!\brief Codec control function to set intended rendering image size.
*
* By default, this is identical to the image size in pixels.
*
* Supported in codecs: AV1
*/
AV1E_SET_RENDER_SIZE,
/*!\brief Codec control function to set target level.
*
* 255: off (default); 0: only keep level stats; 10: target for level 1.0;
* 11: target for level 1.1; ... 62: target for level 6.2
*
* Supported in codecs: AV1
*/
AV1E_SET_TARGET_LEVEL,
/*!\brief Codec control function to get bitstream level.
*
* Supported in codecs: AV1
*/
AV1E_GET_LEVEL,
/*!\brief Codec control function to set intended superblock size.
*
* By default, the superblock size is determined separately for each
* frame by the encoder.
*
* Supported in codecs: AV1
*/
AV1E_SET_SUPERBLOCK_SIZE,
};
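/* Sketch, not part of libaom: one way the capping described for
 * AV1E_SET_TILE_COLUMNS could be computed. The 256 and 4096 pixel limits come
 * from that comment; the helper name clamp_tile_cols_log2 and the exact
 * rounding are assumptions, since the encoder's own clamping code is not
 * shown in this diff.
 */

```c
#include <assert.h>

/* Limits quoted in the AV1E_SET_TILE_COLUMNS comment. */
#define MIN_TILE_WIDTH_PX 256
#define MAX_TILE_WIDTH_PX 4096

/* Hypothetical helper: clamp a requested log2 tile-column count to the range
 * that the frame width allows. */
static int clamp_tile_cols_log2(int frame_width, int requested_log2) {
  int min_log2 = 0;
  int max_log2 = 0;
  assert(frame_width > 0 && requested_log2 >= 0);
  /* Smallest count that keeps every column at or below the maximum width. */
  while ((MAX_TILE_WIDTH_PX << min_log2) < frame_width) ++min_log2;
  /* Largest count that keeps every column at or above the minimum width. */
  while ((frame_width >> (max_log2 + 1)) >= MIN_TILE_WIDTH_PX) ++max_log2;
  if (requested_log2 < min_log2) return min_log2;
  if (requested_log2 > max_log2) return max_log2;
  return requested_log2;
}
```

/* Under these assumptions a 1920-pixel-wide frame accepts at most
 * 2**2 = 4 tile columns, so a request of 6 would be reduced to 2.
 */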
/*!\brief aom 1-D scaling mode
*
* This set of constants defines 1-D aom scaling modes
*/
typedef enum aom_scaling_mode_1d {
AOME_NORMAL = 0,
AOME_FOURFIVE = 1,
AOME_THREEFIVE = 2,
AOME_ONETWO = 3
} AOM_SCALING_MODE;
/*!\brief aom region of interest map
*
* This defines the data structure for the region of interest map
*
*/
typedef struct aom_roi_map {
/*! An id between 0 and 3 for each 16x16 region within a frame. */
unsigned char *roi_map;
unsigned int rows; /**< Number of rows. */
unsigned int cols; /**< Number of columns. */
// TODO(paulwilkins): broken for AV1 which has 8 segments
// q and loop filter deltas for each segment
// (see MAX_MB_SEGMENTS)
int delta_q[4]; /**< Quantizer deltas. */
int delta_lf[4]; /**< Loop filter deltas. */
/*! Static breakout threshold for each segment. */
unsigned int static_threshold[4];
} aom_roi_map_t;
/*!\brief aom active region map
*
* This defines the data structure for the active region map
*
*/
typedef struct aom_active_map {
/*!\brief specifies on (1) or off (0) for each 16x16 region within a frame */
unsigned char *active_map;
unsigned int rows; /**< number of rows */
unsigned int cols; /**< number of cols */
} aom_active_map_t;
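/* Sketch, not part of libaom: sizing an active map for a frame. The struct
 * below is a local mirror of aom_active_map_t so the example compiles on its
 * own; the helper names and the round-up to whole 16x16 blocks are
 * assumptions based on the "each 16x16 region" wording above.
 */

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of aom_active_map_t (illustration only). */
typedef struct example_active_map {
  unsigned char *active_map;
  unsigned int rows;
  unsigned int cols;
} example_active_map_t;

/* Hypothetical helper: allocate a map that covers the frame in 16x16 blocks
 * and mark every block active (1). Returns 0 on success, -1 on failure. */
static int example_active_map_init(example_active_map_t *map,
                                   unsigned int width, unsigned int height) {
  size_t count;
  assert(map != NULL);
  map->rows = (height + 15) / 16; /* round up to whole blocks */
  map->cols = (width + 15) / 16;
  count = (size_t)map->rows * map->cols;
  map->active_map = (unsigned char *)malloc(count);
  if (!map->active_map) return -1;
  memset(map->active_map, 1, count);
  return 0;
}

/* Release the storage owned by the map. */
static void example_active_map_free(example_active_map_t *map) {
  free(map->active_map);
  map->active_map = NULL;
  map->rows = map->cols = 0;
}
```

/* A 1920x1080 frame gives a 120-column by 68-row map under this rounding. */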
/*!\brief aom image scaling mode
*
* This defines the data structure for image scaling mode
*
*/
typedef struct aom_scaling_mode {
AOM_SCALING_MODE h_scaling_mode; /**< horizontal scaling mode */
AOM_SCALING_MODE v_scaling_mode; /**< vertical scaling mode */
} aom_scaling_mode_t;
/*!\brief VP8 token partition mode
*
* This defines VP8 partitioning mode for compressed data, i.e., the number of
* sub-streams in the bitstream. Used for parallelized decoding.
*
*/
typedef enum {
AOM_ONE_TOKENPARTITION = 0,
AOM_TWO_TOKENPARTITION = 1,
AOM_FOUR_TOKENPARTITION = 2,
AOM_EIGHT_TOKENPARTITION = 3
} aome_token_partitions;
/*!\brief AV1 encoder content type */
typedef enum {
AOM_CONTENT_DEFAULT,
AOM_CONTENT_SCREEN,
AOM_CONTENT_INVALID
} aom_tune_content;
/*!\brief VP8 model tuning parameters
*
* Changes the encoder to tune for certain types of input material.
*
*/
typedef enum { AOM_TUNE_PSNR, AOM_TUNE_SSIM } aom_tune_metric;
/*!\cond */
/*!\brief VP8 encoder control function parameter type
*
* Defines the data types that VP8E control functions take. Note that
* additional common controls are defined in aom.h
*
*/
AOM_CTRL_USE_TYPE_DEPRECATED(AOME_USE_REFERENCE, int)
#define AOM_CTRL_AOME_USE_REFERENCE
AOM_CTRL_USE_TYPE(AOME_SET_FRAME_FLAGS, int)
#define AOM_CTRL_AOME_SET_FRAME_FLAGS
AOM_CTRL_USE_TYPE(AOME_SET_ROI_MAP, aom_roi_map_t *)
#define AOM_CTRL_AOME_SET_ROI_MAP
AOM_CTRL_USE_TYPE(AOME_SET_ACTIVEMAP, aom_active_map_t *)
#define AOM_CTRL_AOME_SET_ACTIVEMAP
AOM_CTRL_USE_TYPE(AOME_SET_SCALEMODE, aom_scaling_mode_t *)
#define AOM_CTRL_AOME_SET_SCALEMODE
AOM_CTRL_USE_TYPE(AOME_SET_CPUUSED, int)
#define AOM_CTRL_AOME_SET_CPUUSED
AOM_CTRL_USE_TYPE(AOME_SET_ENABLEAUTOALTREF, unsigned int)
#define AOM_CTRL_AOME_SET_ENABLEAUTOALTREF
#if CONFIG_EXT_REFS
AOM_CTRL_USE_TYPE(AOME_SET_ENABLEAUTOBWDREF, unsigned int)
#define AOM_CTRL_AOME_SET_ENABLEAUTOBWDREF
#endif // CONFIG_EXT_REFS
AOM_CTRL_USE_TYPE(AOME_SET_NOISE_SENSITIVITY, unsigned int)
#define AOM_CTRL_AOME_SET_NOISE_SENSITIVITY
AOM_CTRL_USE_TYPE(AOME_SET_SHARPNESS, unsigned int)
#define AOM_CTRL_AOME_SET_SHARPNESS
AOM_CTRL_USE_TYPE(AOME_SET_STATIC_THRESHOLD, unsigned int)
#define AOM_CTRL_AOME_SET_STATIC_THRESHOLD
AOM_CTRL_USE_TYPE(AOME_SET_TOKEN_PARTITIONS, int) /* aome_token_partitions */
#define AOM_CTRL_AOME_SET_TOKEN_PARTITIONS
AOM_CTRL_USE_TYPE(AOME_SET_ARNR_MAXFRAMES, unsigned int)
#define AOM_CTRL_AOME_SET_ARNR_MAXFRAMES
AOM_CTRL_USE_TYPE(AOME_SET_ARNR_STRENGTH, unsigned int)
#define AOM_CTRL_AOME_SET_ARNR_STRENGTH
AOM_CTRL_USE_TYPE_DEPRECATED(AOME_SET_ARNR_TYPE, unsigned int)
#define AOM_CTRL_AOME_SET_ARNR_TYPE
AOM_CTRL_USE_TYPE(AOME_SET_TUNING, int) /* aom_tune_metric */
#define AOM_CTRL_AOME_SET_TUNING
AOM_CTRL_USE_TYPE(AOME_SET_CQ_LEVEL, unsigned int)
#define AOM_CTRL_AOME_SET_CQ_LEVEL
AOM_CTRL_USE_TYPE(AV1E_SET_TILE_COLUMNS, int)
#define AOM_CTRL_AV1E_SET_TILE_COLUMNS
AOM_CTRL_USE_TYPE(AV1E_SET_TILE_ROWS, int)
#define AOM_CTRL_AV1E_SET_TILE_ROWS
AOM_CTRL_USE_TYPE(AOME_GET_LAST_QUANTIZER, int *)
#define AOM_CTRL_AOME_GET_LAST_QUANTIZER
AOM_CTRL_USE_TYPE(AOME_GET_LAST_QUANTIZER_64, int *)
#define AOM_CTRL_AOME_GET_LAST_QUANTIZER_64
AOM_CTRL_USE_TYPE(AOME_SET_MAX_INTRA_BITRATE_PCT, unsigned int)
#define AOM_CTRL_AOME_SET_MAX_INTRA_BITRATE_PCT
AOM_CTRL_USE_TYPE(AV1E_SET_MAX_INTER_BITRATE_PCT, unsigned int)
#define AOM_CTRL_AV1E_SET_MAX_INTER_BITRATE_PCT
AOM_CTRL_USE_TYPE(AOME_SET_SCREEN_CONTENT_MODE, unsigned int)
#define AOM_CTRL_AOME_SET_SCREEN_CONTENT_MODE
AOM_CTRL_USE_TYPE(AV1E_SET_GF_CBR_BOOST_PCT, unsigned int)
#define AOM_CTRL_AV1E_SET_GF_CBR_BOOST_PCT
AOM_CTRL_USE_TYPE(AV1E_SET_LOSSLESS, unsigned int)
#define AOM_CTRL_AV1E_SET_LOSSLESS
#if CONFIG_AOM_QM
AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_QM, unsigned int)
#define AOM_CTRL_AV1E_SET_ENABLE_QM
AOM_CTRL_USE_TYPE(AV1E_SET_QM_MIN, unsigned int)
#define AOM_CTRL_AV1E_SET_QM_MIN
AOM_CTRL_USE_TYPE(AV1E_SET_QM_MAX, unsigned int)
#define AOM_CTRL_AV1E_SET_QM_MAX
#endif
AOM_CTRL_USE_TYPE(AV1E_SET_FRAME_PARALLEL_DECODING, unsigned int)
#define AOM_CTRL_AV1E_SET_FRAME_PARALLEL_DECODING
AOM_CTRL_USE_TYPE(AV1E_SET_AQ_MODE, unsigned int)
#define AOM_CTRL_AV1E_SET_AQ_MODE
AOM_CTRL_USE_TYPE(AV1E_SET_FRAME_PERIODIC_BOOST, unsigned int)
#define AOM_CTRL_AV1E_SET_FRAME_PERIODIC_BOOST
AOM_CTRL_USE_TYPE(AV1E_SET_NOISE_SENSITIVITY, unsigned int)
#define AOM_CTRL_AV1E_SET_NOISE_SENSITIVITY
AOM_CTRL_USE_TYPE(AV1E_SET_TUNE_CONTENT, int) /* aom_tune_content */
#define AOM_CTRL_AV1E_SET_TUNE_CONTENT
AOM_CTRL_USE_TYPE(AV1E_SET_COLOR_SPACE, int)
#define AOM_CTRL_AV1E_SET_COLOR_SPACE
AOM_CTRL_USE_TYPE(AV1E_SET_MIN_GF_INTERVAL, unsigned int)
#define AOM_CTRL_AV1E_SET_MIN_GF_INTERVAL
AOM_CTRL_USE_TYPE(AV1E_SET_MAX_GF_INTERVAL, unsigned int)
#define AOM_CTRL_AV1E_SET_MAX_GF_INTERVAL
AOM_CTRL_USE_TYPE(AV1E_GET_ACTIVEMAP, aom_active_map_t *)
#define AOM_CTRL_AV1E_GET_ACTIVEMAP
AOM_CTRL_USE_TYPE(AV1E_SET_COLOR_RANGE, int)
#define AOM_CTRL_AV1E_SET_COLOR_RANGE
/*!\brief
*
* TODO(rbultje) : add support of the control in ffmpeg
*/
#define AOM_CTRL_AV1E_SET_RENDER_SIZE
AOM_CTRL_USE_TYPE(AV1E_SET_RENDER_SIZE, int *)
AOM_CTRL_USE_TYPE(AV1E_SET_SUPERBLOCK_SIZE, unsigned int)
#define AOM_CTRL_AV1E_SET_SUPERBLOCK_SIZE
AOM_CTRL_USE_TYPE(AV1E_SET_TARGET_LEVEL, unsigned int)
#define AOM_CTRL_AV1E_SET_TARGET_LEVEL
AOM_CTRL_USE_TYPE(AV1E_GET_LEVEL, int *)
#define AOM_CTRL_AV1E_GET_LEVEL
/*!\endcond */
/*! @} - end defgroup vp8_encoder */
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_AOMCX_H_
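/* Sketch, not part of libaom: the percentage convention used by
 * AOME_SET_MAX_INTRA_BITRATE_PCT and AV1E_SET_MAX_INTER_BITRATE_PCT in the
 * header above. The helper name max_frame_bits is hypothetical, and the
 * encoder's actual rate-control arithmetic is not shown in this diff.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: upper bound, in bits, for a single frame given the
 * average per-frame budget and a percentage control value. A value of 0
 * means "no additional clamping", modelled here as UINT64_MAX. */
static uint64_t max_frame_bits(uint64_t avg_frame_bits, unsigned int pct) {
  if (pct == 0) return UINT64_MAX;
  return avg_frame_bits * pct / 100;
}
```

/* With an average budget of 40000 bits per frame, a setting of 450 caps a
 * frame at 4.5 average frames' worth of bits.
 */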

@@ -1,191 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
/*!\defgroup aom_decoder AOMedia AOM/AV1 Decoder
* \ingroup aom
*
* @{
*/
/*!\file
* \brief Provides definitions for using AOM or AV1 within the aom Decoder
* interface.
*/
#ifndef AOM_AOMDX_H_
#define AOM_AOMDX_H_
#ifdef __cplusplus
extern "C" {
#endif
/* Include controls common to both the encoder and decoder */
#include "./aom.h"
/*!\name Algorithm interface for AV1
*
* This interface provides the capability to decode AV1 streams.
* @{
*/
extern aom_codec_iface_t aom_codec_av1_dx_algo;
extern aom_codec_iface_t *aom_codec_av1_dx(void);
/*!@} - end algorithm interface member group*/
/** Data structure that stores bit accounting for debug
*/
typedef struct Accounting Accounting;
/*!\enum aom_dec_control_id
* \brief AOM decoder control functions
*
* This set of macros define the control functions available for the AOM
* decoder interface.
*
* \sa #aom_codec_control
*/
enum aom_dec_control_id {
/** control function to get info on which reference frames were updated
* by the last decode
*/
AOMD_GET_LAST_REF_UPDATES = AOM_DECODER_CTRL_ID_START,
/** check if the indicated frame is corrupted */
AOMD_GET_FRAME_CORRUPTED,
/** control function to get info on which reference frames were used
* by the last decode
*/
AOMD_GET_LAST_REF_USED,
/** decryption function to decrypt encoded buffer data immediately
* before decoding. Takes an aom_decrypt_init, which contains
* a callback function and opaque context pointer.
*/
AOMD_SET_DECRYPTOR,
// AOMD_SET_DECRYPTOR = AOMD_SET_DECRYPTOR,
/** control function to get the dimensions that the current frame is decoded
* at. This may be different to the intended display size for the frame as
* specified in the wrapper or frame header (see AV1D_GET_DISPLAY_SIZE). */
AV1D_GET_FRAME_SIZE,
/** control function to get the current frame's intended display dimensions
* (as specified in the wrapper or frame header). This may be different to
* the decoded dimensions of this frame (see AV1D_GET_FRAME_SIZE). */
AV1D_GET_DISPLAY_SIZE,
/** control function to get the bit depth of the stream. */
AV1D_GET_BIT_DEPTH,
/** control function to set the byte alignment of the planes in the reference
* buffers. Valid values are powers of 2, from 32 to 1024. A value of 0 sets
* legacy alignment, i.e., the Y plane is aligned to 32 bytes, the U plane
* directly follows the Y plane, and the V plane directly follows the U plane.
* The default value is 0.
*/
AV1_SET_BYTE_ALIGNMENT,
/** control function to invert the decoding order so that it runs from right
* to left. The function is used in a test to confirm the decoding
* independence of tile columns. It may also be used in applications where
* this order of decoding is desired.
*
* TODO(yaowu): Rework the unit test that uses this control, and in a future
* release, this test-only control shall be removed.
*/
AV1_INVERT_TILE_DECODE_ORDER,
/** control function to set the skip loop filter flag. Valid values are
* integers. The decoder will skip the loop filter when its value is set to
* nonzero. If the loop filter is skipped the decoder may accumulate decode
* artifacts. The default value is 0.
*/
AV1_SET_SKIP_LOOP_FILTER,
/** control function to retrieve a pointer to the Accounting struct. When
* compiled without --enable-accounting, this returns AOM_CODEC_INCAPABLE.
* If called before a frame has been decoded, this returns AOM_CODEC_ERROR.
* The caller should ensure that AOM_CODEC_OK is returned before attempting
* to dereference the Accounting pointer.
*/
AV1_GET_ACCOUNTING,
AOM_DECODER_CTRL_ID_MAX,
/** control function to set the range of tile decoding. A value that is
* greater than or equal to zero indicates that only the specified row/column
* is decoded. A value of -1 indicates that the whole row/column is decoded.
* As a special case, if both values are -1, the whole frame is decoded.
*/
AV1_SET_DECODE_TILE_ROW,
AV1_SET_DECODE_TILE_COL
};
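/* Sketch, not part of libaom: a check for the values that the
 * AV1_SET_BYTE_ALIGNMENT comment describes as valid (0 for legacy alignment,
 * otherwise a power of 2 from 32 to 1024). The helper name is hypothetical;
 * how the decoder itself validates the argument is not shown in this diff.
 */

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if value is acceptable according to the
 * AV1_SET_BYTE_ALIGNMENT comment, 0 otherwise. */
static int is_valid_byte_alignment(int value) {
  if (value == 0) return 1; /* legacy alignment */
  if (value < 32 || value > 1024) return 0;
  return (value & (value - 1)) == 0; /* exactly one bit set */
}
```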
/** Decrypt n bytes of data from input -> output, using the decrypt_state
* passed in AOMD_SET_DECRYPTOR.
*/
typedef void (*aom_decrypt_cb)(void *decrypt_state, const unsigned char *input,
unsigned char *output, int count);
/*!\brief Structure to hold decryption state
*
* Defines a structure to hold the decryption state and access function.
*/
typedef struct aom_decrypt_init {
/*! Decrypt callback. */
aom_decrypt_cb decrypt_cb;
/*! Decryption state. */
void *decrypt_state;
} aom_decrypt_init;
/*!\brief A deprecated alias for aom_decrypt_init.
*/
typedef aom_decrypt_init aom_decrypt_init;
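/* Sketch, not part of libaom: a callback with the same shape as
 * aom_decrypt_cb. The XOR "cipher", the state struct, and all example_*
 * names are made up for illustration; a real application would supply its
 * own scheme and pass the callback through AOMD_SET_DECRYPTOR.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Same signature as aom_decrypt_cb, redeclared so the sketch is standalone. */
typedef void (*example_decrypt_cb)(void *decrypt_state,
                                   const unsigned char *input,
                                   unsigned char *output, int count);

/* Hypothetical decryption state: a single-byte XOR key. */
typedef struct example_xor_state {
  unsigned char key;
} example_xor_state;

/* Decrypt count bytes from input into output using the key in the state. */
static void example_xor_decrypt(void *decrypt_state,
                                const unsigned char *input,
                                unsigned char *output, int count) {
  const example_xor_state *state = (const example_xor_state *)decrypt_state;
  int i;
  assert(state != NULL);
  for (i = 0; i < count; ++i)
    output[i] = (unsigned char)(input[i] ^ state->key);
}
```

/* The state pointer plays the role of decrypt_state in aom_decrypt_init: the
 * library hands it back to the callback unchanged on each call.
 */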
/*!\cond */
/*!\brief AOM decoder control function parameter type
*
* Defines the data types that AOMD control functions take. Note that
* additional common controls are defined in aom.h
*
*/
AOM_CTRL_USE_TYPE(AOMD_GET_LAST_REF_UPDATES, int *)
#define AOM_CTRL_AOMD_GET_LAST_REF_UPDATES
AOM_CTRL_USE_TYPE(AOMD_GET_FRAME_CORRUPTED, int *)
#define AOM_CTRL_AOMD_GET_FRAME_CORRUPTED
AOM_CTRL_USE_TYPE(AOMD_GET_LAST_REF_USED, int *)
#define AOM_CTRL_AOMD_GET_LAST_REF_USED
AOM_CTRL_USE_TYPE(AOMD_SET_DECRYPTOR, aom_decrypt_init *)
#define AOM_CTRL_AOMD_SET_DECRYPTOR
// AOM_CTRL_USE_TYPE(AOMD_SET_DECRYPTOR, aom_decrypt_init *)
//#define AOM_CTRL_AOMD_SET_DECRYPTOR
AOM_CTRL_USE_TYPE(AV1D_GET_DISPLAY_SIZE, int *)
#define AOM_CTRL_AV1D_GET_DISPLAY_SIZE
AOM_CTRL_USE_TYPE(AV1D_GET_BIT_DEPTH, unsigned int *)
#define AOM_CTRL_AV1D_GET_BIT_DEPTH
AOM_CTRL_USE_TYPE(AV1D_GET_FRAME_SIZE, int *)
#define AOM_CTRL_AV1D_GET_FRAME_SIZE
AOM_CTRL_USE_TYPE(AV1_INVERT_TILE_DECODE_ORDER, int)
#define AOM_CTRL_AV1_INVERT_TILE_DECODE_ORDER
AOM_CTRL_USE_TYPE(AV1_GET_ACCOUNTING, Accounting **)
#define AOM_CTRL_AV1_GET_ACCOUNTING
AOM_CTRL_USE_TYPE(AV1_SET_DECODE_TILE_ROW, int)
#define AOM_CTRL_AV1_SET_DECODE_TILE_ROW
AOM_CTRL_USE_TYPE(AV1_SET_DECODE_TILE_COL, int)
#define AOM_CTRL_AV1_SET_DECODE_TILE_COL
/*!\endcond */
/*! @} - end defgroup aom_decoder */
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_AOMDX_H_

@@ -1,16 +0,0 @@
text aom_codec_build_config
text aom_codec_control_
text aom_codec_destroy
text aom_codec_err_to_string
text aom_codec_error
text aom_codec_error_detail
text aom_codec_get_caps
text aom_codec_iface_name
text aom_codec_version
text aom_codec_version_extra_str
text aom_codec_version_str
text aom_img_alloc
text aom_img_flip
text aom_img_free
text aom_img_set_rect
text aom_img_wrap

@@ -1,8 +0,0 @@
text aom_codec_dec_init_ver
text aom_codec_decode
text aom_codec_get_frame
text aom_codec_get_stream_info
text aom_codec_peek_stream_info
text aom_codec_register_put_frame_cb
text aom_codec_register_put_slice_cb
text aom_codec_set_frame_buffer_functions

@@ -1,9 +0,0 @@
text aom_codec_enc_config_default
text aom_codec_enc_config_set
text aom_codec_enc_init_multi_ver
text aom_codec_enc_init_ver
text aom_codec_encode
text aom_codec_get_cx_data
text aom_codec_get_global_headers
text aom_codec_get_preview_frame
text aom_codec_set_cx_data_buf

@@ -1,134 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
/*!\file
* \brief Provides the high level interface to wrap codec algorithms.
*
*/
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include "aom/aom_integer.h"
#include "aom/internal/aom_codec_internal.h"
#include "aom_version.h"
#define SAVE_STATUS(ctx, var) (ctx ? (ctx->err = var) : var)
int aom_codec_version(void) { return VERSION_PACKED; }
const char *aom_codec_version_str(void) { return VERSION_STRING_NOSP; }
const char *aom_codec_version_extra_str(void) { return VERSION_EXTRA; }
const char *aom_codec_iface_name(aom_codec_iface_t *iface) {
return iface ? iface->name : "<invalid interface>";
}
const char *aom_codec_err_to_string(aom_codec_err_t err) {
switch (err) {
case AOM_CODEC_OK: return "Success";
case AOM_CODEC_ERROR: return "Unspecified internal error";
case AOM_CODEC_MEM_ERROR: return "Memory allocation error";
case AOM_CODEC_ABI_MISMATCH: return "ABI version mismatch";
case AOM_CODEC_INCAPABLE:
return "Codec does not implement requested capability";
case AOM_CODEC_UNSUP_BITSTREAM:
return "Bitstream not supported by this decoder";
case AOM_CODEC_UNSUP_FEATURE:
return "Bitstream required feature not supported by this decoder";
case AOM_CODEC_CORRUPT_FRAME: return "Corrupt frame detected";
case AOM_CODEC_INVALID_PARAM: return "Invalid parameter";
case AOM_CODEC_LIST_END: return "End of iterated list";
}
return "Unrecognized error code";
}
const char *aom_codec_error(aom_codec_ctx_t *ctx) {
return (ctx) ? aom_codec_err_to_string(ctx->err)
: aom_codec_err_to_string(AOM_CODEC_INVALID_PARAM);
}
const char *aom_codec_error_detail(aom_codec_ctx_t *ctx) {
if (ctx && ctx->err)
return ctx->priv ? ctx->priv->err_detail : ctx->err_detail;
return NULL;
}
aom_codec_err_t aom_codec_destroy(aom_codec_ctx_t *ctx) {
aom_codec_err_t res;
if (!ctx)
res = AOM_CODEC_INVALID_PARAM;
else if (!ctx->iface || !ctx->priv)
res = AOM_CODEC_ERROR;
else {
ctx->iface->destroy((aom_codec_alg_priv_t *)ctx->priv);
ctx->iface = NULL;
ctx->name = NULL;
ctx->priv = NULL;
res = AOM_CODEC_OK;
}
return SAVE_STATUS(ctx, res);
}
aom_codec_caps_t aom_codec_get_caps(aom_codec_iface_t *iface) {
return (iface) ? iface->caps : 0;
}
aom_codec_err_t aom_codec_control_(aom_codec_ctx_t *ctx, int ctrl_id, ...) {
aom_codec_err_t res;
if (!ctx || !ctrl_id)
res = AOM_CODEC_INVALID_PARAM;
else if (!ctx->iface || !ctx->priv || !ctx->iface->ctrl_maps)
res = AOM_CODEC_ERROR;
else {
aom_codec_ctrl_fn_map_t *entry;
res = AOM_CODEC_ERROR;
for (entry = ctx->iface->ctrl_maps; entry && entry->fn; entry++) {
if (!entry->ctrl_id || entry->ctrl_id == ctrl_id) {
va_list ap;
va_start(ap, ctrl_id);
res = entry->fn((aom_codec_alg_priv_t *)ctx->priv, ap);
va_end(ap);
break;
}
}
}
return SAVE_STATUS(ctx, res);
}
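/* Sketch, not part of libaom: the table-driven, variadic dispatch pattern
 * that aom_codec_control_ uses, reduced to a standalone example. All
 * example_* names, the two control ids, and the integer return codes are
 * hypothetical. The wildcard behaviour of a zero ctrl_id entry in the real
 * table is left out.
 */

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

typedef struct example_priv {
  int cpu_used;
} example_priv;

/* Each handler pulls its own argument out of the va_list. */
typedef int (*example_ctrl_fn)(example_priv *priv, va_list ap);

typedef struct example_ctrl_map {
  int ctrl_id;
  example_ctrl_fn fn;
} example_ctrl_map;

enum { EXAMPLE_SET_CPUUSED = 1, EXAMPLE_GET_CPUUSED = 2 };

static int example_set_cpu_used(example_priv *priv, va_list ap) {
  priv->cpu_used = va_arg(ap, int);
  return 0;
}

static int example_get_cpu_used(example_priv *priv, va_list ap) {
  int *out = va_arg(ap, int *);
  if (!out) return -1;
  *out = priv->cpu_used;
  return 0;
}

/* The table ends with an entry whose fn is NULL. */
static const example_ctrl_map example_ctrl_maps[] = {
  { EXAMPLE_SET_CPUUSED, example_set_cpu_used },
  { EXAMPLE_GET_CPUUSED, example_get_cpu_used },
  { -1, NULL }
};

/* Look up ctrl_id and forward the variadic argument to its handler.
 * Returns the handler's result, or -1 if the id is unknown. */
static int example_control(example_priv *priv, int ctrl_id, ...) {
  const example_ctrl_map *entry;
  int res = -1;
  if (!priv || !ctrl_id) return -1;
  for (entry = example_ctrl_maps; entry->fn; entry++) {
    if (entry->ctrl_id == ctrl_id) {
      va_list ap;
      va_start(ap, ctrl_id);
      res = entry->fn(priv, ap);
      va_end(ap);
      break;
    }
  }
  return res;
}
```

/* Because the argument type is only known to the handler, the headers wrap
 * each id with AOM_CTRL_USE_TYPE so that callers get a typed entry point.
 */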
void aom_internal_error(struct aom_internal_error_info *info,
aom_codec_err_t error, const char *fmt, ...) {
va_list ap;
info->error_code = error;
info->has_detail = 0;
if (fmt) {
size_t sz = sizeof(info->detail);
info->has_detail = 1;
va_start(ap, fmt);
vsnprintf(info->detail, sz - 1, fmt, ap);
va_end(ap);
info->detail[sz - 1] = '\0';
}
if (info->setjmp) longjmp(info->jmp, info->error_code);
}
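/* Sketch, not part of libaom: how a caller might arm the jump buffer that
 * aom_internal_error longjmps to. The struct is a simplified stand-in for
 * struct aom_internal_error_info, and example_decode_step and the error
 * value 7 are hypothetical. vsnprintf is given the full buffer size here,
 * since it always terminates the string when the size is non-zero.
 */

```c
#include <assert.h>
#include <setjmp.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef struct example_error_info {
  int error_code;
  int has_detail;
  char detail[80];
  int setjmp_armed; /* non-zero while jmp holds a valid setjmp context */
  jmp_buf jmp;
} example_error_info;

/* Record the error and, if a handler is armed, jump back to it. */
static void example_internal_error(example_error_info *info, int error,
                                   const char *fmt, ...) {
  info->error_code = error;
  info->has_detail = 0;
  if (fmt) {
    va_list ap;
    info->has_detail = 1;
    va_start(ap, fmt);
    vsnprintf(info->detail, sizeof(info->detail), fmt, ap);
    va_end(ap);
  }
  if (info->setjmp_armed) longjmp(info->jmp, info->error_code);
}

/* Hypothetical worker: returns 0 on success or the recorded error code. */
static int example_decode_step(example_error_info *info, int frame_size) {
  assert(info != NULL);
  if (setjmp(info->jmp)) {
    /* Reached via longjmp from example_internal_error. */
    info->setjmp_armed = 0;
    return info->error_code;
  }
  info->setjmp_armed = 1;
  if (frame_size <= 0)
    example_internal_error(info, 7, "bad frame size %d", frame_size);
  info->setjmp_armed = 0;
  return 0;
}
```

/* The error code passed to longjmp must be non-zero for the setjmp branch to
 * be taken, which holds for every aom_codec_err_t value other than
 * AOM_CODEC_OK.
 */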

@@ -1,189 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
/*!\file
* \brief Provides the high level interface to wrap decoder algorithms.
*
*/
#include <string.h>
#include "aom/internal/aom_codec_internal.h"
#define SAVE_STATUS(ctx, var) (ctx ? (ctx->err = var) : var)
static aom_codec_alg_priv_t *get_alg_priv(aom_codec_ctx_t *ctx) {
return (aom_codec_alg_priv_t *)ctx->priv;
}
aom_codec_err_t aom_codec_dec_init_ver(aom_codec_ctx_t *ctx,
aom_codec_iface_t *iface,
const aom_codec_dec_cfg_t *cfg,
aom_codec_flags_t flags, int ver) {
aom_codec_err_t res;
if (ver != AOM_DECODER_ABI_VERSION)
res = AOM_CODEC_ABI_MISMATCH;
else if (!ctx || !iface)
res = AOM_CODEC_INVALID_PARAM;
else if (iface->abi_version != AOM_CODEC_INTERNAL_ABI_VERSION)
res = AOM_CODEC_ABI_MISMATCH;
else if ((flags & AOM_CODEC_USE_POSTPROC) &&
!(iface->caps & AOM_CODEC_CAP_POSTPROC))
res = AOM_CODEC_INCAPABLE;
else if ((flags & AOM_CODEC_USE_ERROR_CONCEALMENT) &&
!(iface->caps & AOM_CODEC_CAP_ERROR_CONCEALMENT))
res = AOM_CODEC_INCAPABLE;
else if ((flags & AOM_CODEC_USE_INPUT_FRAGMENTS) &&
!(iface->caps & AOM_CODEC_CAP_INPUT_FRAGMENTS))
res = AOM_CODEC_INCAPABLE;
else if (!(iface->caps & AOM_CODEC_CAP_DECODER))
res = AOM_CODEC_INCAPABLE;
else {
memset(ctx, 0, sizeof(*ctx));
ctx->iface = iface;
ctx->name = iface->name;
ctx->priv = NULL;
ctx->init_flags = flags;
ctx->config.dec = cfg;
res = ctx->iface->init(ctx, NULL);
if (res) {
ctx->err_detail = ctx->priv ? ctx->priv->err_detail : NULL;
aom_codec_destroy(ctx);
}
}
return SAVE_STATUS(ctx, res);
}
aom_codec_err_t aom_codec_peek_stream_info(aom_codec_iface_t *iface,
const uint8_t *data,
unsigned int data_sz,
aom_codec_stream_info_t *si) {
aom_codec_err_t res;
if (!iface || !data || !data_sz || !si ||
si->sz < sizeof(aom_codec_stream_info_t))
res = AOM_CODEC_INVALID_PARAM;
else {
/* Set default/unknown values */
si->w = 0;
si->h = 0;
res = iface->dec.peek_si(data, data_sz, si);
}
return res;
}
aom_codec_err_t aom_codec_get_stream_info(aom_codec_ctx_t *ctx,
aom_codec_stream_info_t *si) {
aom_codec_err_t res;
if (!ctx || !si || si->sz < sizeof(aom_codec_stream_info_t))
res = AOM_CODEC_INVALID_PARAM;
else if (!ctx->iface || !ctx->priv)
res = AOM_CODEC_ERROR;
else {
/* Set default/unknown values */
si->w = 0;
si->h = 0;
res = ctx->iface->dec.get_si(get_alg_priv(ctx), si);
}
return SAVE_STATUS(ctx, res);
}
aom_codec_err_t aom_codec_decode(aom_codec_ctx_t *ctx, const uint8_t *data,
unsigned int data_sz, void *user_priv,
long deadline) {
aom_codec_err_t res;
/* Sanity checks */
/* NULL data ptr allowed if data_sz is 0 too */
if (!ctx || (!data && data_sz) || (data && !data_sz))
res = AOM_CODEC_INVALID_PARAM;
else if (!ctx->iface || !ctx->priv)
res = AOM_CODEC_ERROR;
else {
res = ctx->iface->dec.decode(get_alg_priv(ctx), data, data_sz, user_priv,
deadline);
}
return SAVE_STATUS(ctx, res);
}
aom_image_t *aom_codec_get_frame(aom_codec_ctx_t *ctx, aom_codec_iter_t *iter) {
aom_image_t *img;
if (!ctx || !iter || !ctx->iface || !ctx->priv)
img = NULL;
else
img = ctx->iface->dec.get_frame(get_alg_priv(ctx), iter);
return img;
}
aom_codec_err_t aom_codec_register_put_frame_cb(aom_codec_ctx_t *ctx,
aom_codec_put_frame_cb_fn_t cb,
void *user_priv) {
aom_codec_err_t res;
if (!ctx || !cb)
res = AOM_CODEC_INVALID_PARAM;
else if (!ctx->iface || !ctx->priv ||
!(ctx->iface->caps & AOM_CODEC_CAP_PUT_FRAME))
res = AOM_CODEC_ERROR;
else {
ctx->priv->dec.put_frame_cb.u.put_frame = cb;
ctx->priv->dec.put_frame_cb.user_priv = user_priv;
res = AOM_CODEC_OK;
}
return SAVE_STATUS(ctx, res);
}
aom_codec_err_t aom_codec_register_put_slice_cb(aom_codec_ctx_t *ctx,
aom_codec_put_slice_cb_fn_t cb,
void *user_priv) {
aom_codec_err_t res;
if (!ctx || !cb)
res = AOM_CODEC_INVALID_PARAM;
else if (!ctx->iface || !ctx->priv ||
!(ctx->iface->caps & AOM_CODEC_CAP_PUT_SLICE))
res = AOM_CODEC_ERROR;
else {
ctx->priv->dec.put_slice_cb.u.put_slice = cb;
ctx->priv->dec.put_slice_cb.user_priv = user_priv;
res = AOM_CODEC_OK;
}
return SAVE_STATUS(ctx, res);
}
aom_codec_err_t aom_codec_set_frame_buffer_functions(
aom_codec_ctx_t *ctx, aom_get_frame_buffer_cb_fn_t cb_get,
aom_release_frame_buffer_cb_fn_t cb_release, void *cb_priv) {
aom_codec_err_t res;
if (!ctx || !cb_get || !cb_release) {
res = AOM_CODEC_INVALID_PARAM;
} else if (!ctx->iface || !ctx->priv ||
!(ctx->iface->caps & AOM_CODEC_CAP_EXTERNAL_FRAME_BUFFER)) {
res = AOM_CODEC_ERROR;
} else {
res = ctx->iface->dec.set_fb_fn(get_alg_priv(ctx), cb_get, cb_release,
cb_priv);
}
return SAVE_STATUS(ctx, res);
}

@@ -1,240 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <stdlib.h>
#include <string.h>
#include "aom/aom_image.h"
#include "aom/aom_integer.h"
#include "aom_mem/aom_mem.h"
static aom_image_t *img_alloc_helper(aom_image_t *img, aom_img_fmt_t fmt,
unsigned int d_w, unsigned int d_h,
unsigned int buf_align,
unsigned int stride_align,
unsigned char *img_data) {
unsigned int h, w, s, xcs, ycs, bps;
unsigned int stride_in_bytes;
int align;
/* Treat buf_align==0 like buf_align==1 */
if (!buf_align) buf_align = 1;
/* Validate buffer alignment (must be power of 2) */
if (buf_align & (buf_align - 1)) goto fail;
/* Treat stride_align==0 like stride_align==1 */
if (!stride_align) stride_align = 1;
/* Validate stride alignment (must be power of 2) */
if (stride_align & (stride_align - 1)) goto fail;
/* Get sample size for this format */
switch (fmt) {
case AOM_IMG_FMT_RGB32:
case AOM_IMG_FMT_RGB32_LE:
case AOM_IMG_FMT_ARGB:
case AOM_IMG_FMT_ARGB_LE: bps = 32; break;
case AOM_IMG_FMT_RGB24:
case AOM_IMG_FMT_BGR24: bps = 24; break;
case AOM_IMG_FMT_RGB565:
case AOM_IMG_FMT_RGB565_LE:
case AOM_IMG_FMT_RGB555:
case AOM_IMG_FMT_RGB555_LE:
case AOM_IMG_FMT_UYVY:
case AOM_IMG_FMT_YUY2:
case AOM_IMG_FMT_YVYU: bps = 16; break;
case AOM_IMG_FMT_I420:
case AOM_IMG_FMT_YV12:
case AOM_IMG_FMT_AOMI420:
case AOM_IMG_FMT_AOMYV12: bps = 12; break;
case AOM_IMG_FMT_I422:
case AOM_IMG_FMT_I440: bps = 16; break;
case AOM_IMG_FMT_I444: bps = 24; break;
case AOM_IMG_FMT_I42016: bps = 24; break;
case AOM_IMG_FMT_I42216:
case AOM_IMG_FMT_I44016: bps = 32; break;
case AOM_IMG_FMT_I44416: bps = 48; break;
default: bps = 16; break;
}
/* Get chroma shift values for this format */
switch (fmt) {
case AOM_IMG_FMT_I420:
case AOM_IMG_FMT_YV12:
case AOM_IMG_FMT_AOMI420:
case AOM_IMG_FMT_AOMYV12:
case AOM_IMG_FMT_I422:
case AOM_IMG_FMT_I42016:
case AOM_IMG_FMT_I42216: xcs = 1; break;
default: xcs = 0; break;
}
switch (fmt) {
case AOM_IMG_FMT_I420:
case AOM_IMG_FMT_I440:
case AOM_IMG_FMT_YV12:
case AOM_IMG_FMT_AOMI420:
case AOM_IMG_FMT_AOMYV12:
case AOM_IMG_FMT_I42016:
case AOM_IMG_FMT_I44016: ycs = 1; break;
default: ycs = 0; break;
}
/* Calculate storage sizes given the chroma subsampling */
align = (1 << xcs) - 1;
w = (d_w + align) & ~align;
align = (1 << ycs) - 1;
h = (d_h + align) & ~align;
s = (fmt & AOM_IMG_FMT_PLANAR) ? w : bps * w / 8;
s = (s + stride_align - 1) & ~(stride_align - 1);
stride_in_bytes = (fmt & AOM_IMG_FMT_HIGHBITDEPTH) ? s * 2 : s;
/* Allocate the new image */
if (!img) {
img = (aom_image_t *)calloc(1, sizeof(aom_image_t));
if (!img) goto fail;
img->self_allocd = 1;
} else {
memset(img, 0, sizeof(aom_image_t));
}
img->img_data = img_data;
if (!img_data) {
const uint64_t alloc_size = (fmt & AOM_IMG_FMT_PLANAR)
? (uint64_t)h * s * bps / 8
: (uint64_t)h * s;
if (alloc_size != (size_t)alloc_size) goto fail;
img->img_data = (uint8_t *)aom_memalign(buf_align, (size_t)alloc_size);
img->img_data_owner = 1;
}
if (!img->img_data) goto fail;
img->fmt = fmt;
img->bit_depth = (fmt & AOM_IMG_FMT_HIGHBITDEPTH) ? 16 : 8;
img->w = w;
img->h = h;
img->x_chroma_shift = xcs;
img->y_chroma_shift = ycs;
img->bps = bps;
/* Calculate strides */
img->stride[AOM_PLANE_Y] = img->stride[AOM_PLANE_ALPHA] = stride_in_bytes;
img->stride[AOM_PLANE_U] = img->stride[AOM_PLANE_V] = stride_in_bytes >> xcs;
/* Default viewport to entire image */
if (!aom_img_set_rect(img, 0, 0, d_w, d_h)) return img;
fail:
aom_img_free(img);
return NULL;
}
aom_image_t *aom_img_alloc(aom_image_t *img, aom_img_fmt_t fmt,
unsigned int d_w, unsigned int d_h,
unsigned int align) {
return img_alloc_helper(img, fmt, d_w, d_h, align, align, NULL);
}
aom_image_t *aom_img_wrap(aom_image_t *img, aom_img_fmt_t fmt, unsigned int d_w,
unsigned int d_h, unsigned int stride_align,
unsigned char *img_data) {
/* By setting buf_align = 1, we don't change buffer alignment in this
* function. */
return img_alloc_helper(img, fmt, d_w, d_h, 1, stride_align, img_data);
}
int aom_img_set_rect(aom_image_t *img, unsigned int x, unsigned int y,
unsigned int w, unsigned int h) {
unsigned char *data;
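/* Compare without computing x + w or y + h, which can wrap around. */
if (x <= img->w && w <= img->w - x && y <= img->h && h <= img->h - y) {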
img->d_w = w;
img->d_h = h;
/* Calculate plane pointers */
if (!(img->fmt & AOM_IMG_FMT_PLANAR)) {
img->planes[AOM_PLANE_PACKED] =
img->img_data + x * img->bps / 8 + y * img->stride[AOM_PLANE_PACKED];
} else {
const int bytes_per_sample =
(img->fmt & AOM_IMG_FMT_HIGHBITDEPTH) ? 2 : 1;
data = img->img_data;
if (img->fmt & AOM_IMG_FMT_HAS_ALPHA) {
img->planes[AOM_PLANE_ALPHA] =
data + x * bytes_per_sample + y * img->stride[AOM_PLANE_ALPHA];
data += img->h * img->stride[AOM_PLANE_ALPHA];
}
img->planes[AOM_PLANE_Y] =
data + x * bytes_per_sample + y * img->stride[AOM_PLANE_Y];
data += img->h * img->stride[AOM_PLANE_Y];
if (!(img->fmt & AOM_IMG_FMT_UV_FLIP)) {
img->planes[AOM_PLANE_U] =
data + (x >> img->x_chroma_shift) * bytes_per_sample +
(y >> img->y_chroma_shift) * img->stride[AOM_PLANE_U];
data += (img->h >> img->y_chroma_shift) * img->stride[AOM_PLANE_U];
img->planes[AOM_PLANE_V] =
data + (x >> img->x_chroma_shift) * bytes_per_sample +
(y >> img->y_chroma_shift) * img->stride[AOM_PLANE_V];
} else {
img->planes[AOM_PLANE_V] =
data + (x >> img->x_chroma_shift) * bytes_per_sample +
(y >> img->y_chroma_shift) * img->stride[AOM_PLANE_V];
data += (img->h >> img->y_chroma_shift) * img->stride[AOM_PLANE_V];
img->planes[AOM_PLANE_U] =
data + (x >> img->x_chroma_shift) * bytes_per_sample +
(y >> img->y_chroma_shift) * img->stride[AOM_PLANE_U];
}
}
return 0;
}
return -1;
}
void aom_img_flip(aom_image_t *img) {
/* Note: In the pointer adjustment calculation, we want the
* rhs to be promoted to a signed type. Section 6.3.1.8 of the ISO C99
* standard indicates that if the adjustment parameter is unsigned, the
* stride parameter will be promoted to unsigned, causing errors when
* the lhs is a larger type than the rhs.
*/
img->planes[AOM_PLANE_Y] += (signed)(img->d_h - 1) * img->stride[AOM_PLANE_Y];
img->stride[AOM_PLANE_Y] = -img->stride[AOM_PLANE_Y];
img->planes[AOM_PLANE_U] += (signed)((img->d_h >> img->y_chroma_shift) - 1) *
img->stride[AOM_PLANE_U];
img->stride[AOM_PLANE_U] = -img->stride[AOM_PLANE_U];
img->planes[AOM_PLANE_V] += (signed)((img->d_h >> img->y_chroma_shift) - 1) *
img->stride[AOM_PLANE_V];
img->stride[AOM_PLANE_V] = -img->stride[AOM_PLANE_V];
img->planes[AOM_PLANE_ALPHA] +=
(signed)(img->d_h - 1) * img->stride[AOM_PLANE_ALPHA];
img->stride[AOM_PLANE_ALPHA] = -img->stride[AOM_PLANE_ALPHA];
}
void aom_img_free(aom_image_t *img) {
if (img) {
if (img->img_data && img->img_data_owner) aom_free(img->img_data);
if (img->self_allocd) free(img);
}
}
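
The size arithmetic in img_alloc_helper above rounds values up to a multiple of a power of two with a mask, both for the stride and for the chroma-subsampled storage dimensions. A minimal sketch of that rounding, assuming hypothetical helper names (sk_round_up_pow2, sk_storage_dim) that are not part of the library:

```c
#include <assert.h>

/* Hypothetical helper, not a libaom function: round value up to the next
 * multiple of align, where align must be a non-zero power of two. */
static unsigned int sk_round_up_pow2(unsigned int value, unsigned int align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (value + align - 1) & ~(align - 1);
}

/* Hypothetical helper: storage dimension for a display dimension d when the
 * chroma plane is subsampled by 2^chroma_shift in that direction. This
 * mirrors "w = (d_w + align) & ~align" with align = (1 << xcs) - 1. */
static unsigned int sk_storage_dim(unsigned int d, unsigned int chroma_shift) {
  return sk_round_up_pow2(d, 1u << chroma_shift);
}
```

For example, a display width of 5 with a chroma shift of 1 is stored as 6 columns, and a 13-byte row with a stride alignment of 8 becomes a 16-byte stride.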

@@ -1,64 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <assert.h>
#include "./aom_config.h"
#include "aom/aom_integer.h"
#include "aom_dsp/ans.h"
#include "aom_dsp/prob.h"
static int find_largest(const aom_cdf_prob *const pdf_tab, int num_syms) {
int largest_idx = -1;
int largest_p = -1;
int i;
for (i = 0; i < num_syms; ++i) {
int p = pdf_tab[i];
if (p > largest_p) {
largest_p = p;
largest_idx = i;
}
}
return largest_idx;
}
void aom_rans_merge_prob8_pdf(aom_cdf_prob *const out_pdf,
const AnsP8 node_prob,
const aom_cdf_prob *const src_pdf, int in_syms) {
int i;
int adjustment = RANS_PRECISION;
const int round_fact = ANS_P8_PRECISION >> 1;
const AnsP8 p1 = ANS_P8_PRECISION - node_prob;
const int out_syms = in_syms + 1;
assert(src_pdf != out_pdf);
out_pdf[0] = node_prob << (RANS_PROB_BITS - ANS_P8_SHIFT);
adjustment -= out_pdf[0];
for (i = 0; i < in_syms; ++i) {
int p = (p1 * src_pdf[i] + round_fact) >> ANS_P8_SHIFT;
p = AOMMIN(p, (int)RANS_PRECISION - in_syms);
p = AOMMAX(p, 1);
out_pdf[i + 1] = p;
adjustment -= p;
}
// Adjust probabilities so they sum to the total probability
if (adjustment > 0) {
i = find_largest(out_pdf, out_syms);
out_pdf[i] += adjustment;
} else {
while (adjustment < 0) {
i = find_largest(out_pdf, out_syms);
--out_pdf[i];
assert(out_pdf[i] > 0);
adjustment++;
}
}
}
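
The merge above scales each source probability by the complement of node_prob, clamps it to at least 1, and then nudges the largest entry until the table sums to RANS_PRECISION again. A self-contained sketch of the same steps, assuming hypothetical sk_ names, plain uint16_t tables, and a node_prob in the range 1..255:

```c
#include <assert.h>
#include <stdint.h>

#define SK_RANS_PRECISION 32768 /* 1 << 15, as RANS_PRECISION in the source */

/* Index of the first largest entry (same tie-breaking as find_largest). */
static int sk_largest(const uint16_t *pdf, int n) {
  int i, best = 0;
  for (i = 1; i < n; ++i)
    if (pdf[i] > pdf[best]) best = i;
  return best;
}

/* Hypothetical re-statement of the merge; out needs in_syms + 1 entries. */
static void sk_merge_pdf(uint16_t *out, int node_prob, const uint16_t *src,
                         int in_syms) {
  const int p1 = 256 - node_prob;
  int i, adjustment = SK_RANS_PRECISION;
  out[0] = (uint16_t)(node_prob << (15 - 8)); /* 8-bit to 15-bit precision */
  adjustment -= out[0];
  for (i = 0; i < in_syms; ++i) {
    int p = (p1 * src[i] + 128) >> 8; /* rounded scaling */
    if (p > SK_RANS_PRECISION - in_syms) p = SK_RANS_PRECISION - in_syms;
    if (p < 1) p = 1; /* every symbol stays codable */
    out[i + 1] = (uint16_t)p;
    adjustment -= p;
  }
  /* Give or take the rounding error from the largest entry. */
  if (adjustment > 0) out[sk_largest(out, in_syms + 1)] += adjustment;
  for (; adjustment < 0; ++adjustment) --out[sk_largest(out, in_syms + 1)];
}

static int sk_pdf_sum(const uint16_t *pdf, int n) {
  int i, sum = 0;
  for (i = 0; i < n; ++i) sum += pdf[i];
  return sum;
}
```

With node_prob = 200 and a source table {1, 32767}, the first scaled entry rounds to 0 and is clamped to 1, so the sum overshoots by one and the largest entry is decremented once.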

@@ -1,44 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_ANS_H_
#define AOM_DSP_ANS_H_
// Constants, types and utilities for Asymmetric Numeral Systems
// http://arxiv.org/abs/1311.2540v2
#include <assert.h>
#include "./aom_config.h"
#include "aom/aom_integer.h"
#include "aom_dsp/prob.h"
#ifdef __cplusplus
extern "C" {
#endif // __cplusplus
typedef uint8_t AnsP8;
#define ANS_P8_PRECISION 256u
#define ANS_P8_SHIFT 8
#define RANS_PROB_BITS 15
#define RANS_PRECISION (1u << RANS_PROB_BITS)
// L_BASE % PRECISION must be 0. Increasing L_BASE beyond 2**15 will cause uabs
// to overflow.
#define L_BASE (RANS_PRECISION)
#define IO_BASE 256
// Range I = { L_BASE, L_BASE + 1, ..., L_BASE * IO_BASE - 1 }
void aom_rans_merge_prob8_pdf(aom_cdf_prob *const out_pdf,
const AnsP8 node_prob,
const aom_cdf_prob *const src_pdf, int in_syms);
#ifdef __cplusplus
} // extern "C"
#endif // __cplusplus
#endif // AOM_DSP_ANS_H_

@@ -1,146 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_ANSREADER_H_
#define AOM_DSP_ANSREADER_H_
// A uABS and rANS decoder implementation of Asymmetric Numeral Systems
// http://arxiv.org/abs/1311.2540v2
#include <assert.h>
#include "./aom_config.h"
#include "aom/aom_integer.h"
#include "aom_dsp/prob.h"
#include "aom_dsp/ans.h"
#include "aom_ports/mem_ops.h"
#if CONFIG_ACCOUNTING
#include "av1/common/accounting.h"
#endif
#ifdef __cplusplus
extern "C" {
#endif // __cplusplus
struct AnsDecoder {
const uint8_t *buf;
int buf_offset;
uint32_t state;
#if CONFIG_ACCOUNTING
Accounting *accounting;
#endif
};
static INLINE int uabs_read(struct AnsDecoder *ans, AnsP8 p0) {
AnsP8 p = ANS_P8_PRECISION - p0;
int s;
unsigned xp, sp;
unsigned state = ans->state;
while (state < L_BASE && ans->buf_offset > 0) {
state = state * IO_BASE + ans->buf[--ans->buf_offset];
}
sp = state * p;
xp = sp / ANS_P8_PRECISION;
s = (sp & 0xFF) >= p0;
if (s)
ans->state = xp;
else
ans->state = state - xp;
return s;
}
static INLINE int uabs_read_bit(struct AnsDecoder *ans) {
int s;
unsigned state = ans->state;
while (state < L_BASE && ans->buf_offset > 0) {
state = state * IO_BASE + ans->buf[--ans->buf_offset];
}
s = (int)(state & 1);
ans->state = state >> 1;
return s;
}
struct rans_dec_sym {
uint8_t val;
aom_cdf_prob prob;
aom_cdf_prob cum_prob; // not-inclusive
};
static INLINE void fetch_sym(struct rans_dec_sym *out, const aom_cdf_prob *cdf,
aom_cdf_prob rem) {
int i;
aom_cdf_prob cum_prob = 0, top_prob;
// TODO(skal): if critical, could be a binary search.
// Or, better, an O(1) alias-table.
for (i = 0; rem >= (top_prob = cdf[i]); ++i) {
cum_prob = top_prob;
}
out->val = i;
out->prob = top_prob - cum_prob;
out->cum_prob = cum_prob;
}
static INLINE int rans_read(struct AnsDecoder *ans, const aom_cdf_prob *tab) {
unsigned rem;
unsigned quo;
struct rans_dec_sym sym;
while (ans->state < L_BASE && ans->buf_offset > 0) {
ans->state = ans->state * IO_BASE + ans->buf[--ans->buf_offset];
}
quo = ans->state / RANS_PRECISION;
rem = ans->state % RANS_PRECISION;
fetch_sym(&sym, tab, rem);
ans->state = quo * sym.prob + rem - sym.cum_prob;
return sym.val;
}
static INLINE int ans_read_init(struct AnsDecoder *const ans,
const uint8_t *const buf, int offset) {
unsigned x;
if (offset < 1) return 1;
ans->buf = buf;
x = buf[offset - 1] >> 6;
if (x == 0) {
ans->buf_offset = offset - 1;
ans->state = buf[offset - 1] & 0x3F;
} else if (x == 1) {
if (offset < 2) return 1;
ans->buf_offset = offset - 2;
ans->state = mem_get_le16(buf + offset - 2) & 0x3FFF;
} else if (x == 2) {
if (offset < 3) return 1;
ans->buf_offset = offset - 3;
ans->state = mem_get_le24(buf + offset - 3) & 0x3FFFFF;
} else if ((buf[offset - 1] & 0xE0) == 0xE0) {
if (offset < 4) return 1;
ans->buf_offset = offset - 4;
ans->state = mem_get_le32(buf + offset - 4) & 0x1FFFFFFF;
} else {
// 110xxxxx implies this byte is a superframe marker
return 1;
}
#if CONFIG_ACCOUNTING
ans->accounting = NULL;
#endif
ans->state += L_BASE;
if (ans->state >= L_BASE * IO_BASE) return 1;
return 0;
}
static INLINE int ans_read_end(struct AnsDecoder *const ans) {
return ans->state == L_BASE;
}
static INLINE int ans_reader_has_error(const struct AnsDecoder *const ans) {
return ans->state < L_BASE && ans->buf_offset == 0;
}
#ifdef __cplusplus
} // extern "C"
#endif // __cplusplus
#endif // AOM_DSP_ANSREADER_H_
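
ans_read_init above decides how many trailing bytes hold the serialized state by looking at the top bits of the last byte. The following sketch isolates that tag check, assuming a hypothetical helper name (sk_state_bytes) and a return value of 0 for the rejected pattern:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the header: number of bytes occupied by
 * the serialized ANS state, derived from the last byte of the buffer.
 *   00xxxxxx -> 1 byte,  01xxxxxx -> 2 bytes,
 *   10xxxxxx -> 3 bytes, 111xxxxx -> 4 bytes,
 *   110xxxxx -> 0 (treated as a superframe marker, i.e. an error). */
static int sk_state_bytes(uint8_t last_byte) {
  const unsigned tag = last_byte >> 6;
  if (tag == 0) return 1;
  if (tag == 1) return 2;
  if (tag == 2) return 3;
  if ((last_byte & 0xE0) == 0xE0) return 4;
  return 0;
}
```

A caller would still need to check that the buffer holds at least that many bytes, as ans_read_init does with its offset tests.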

@@ -1,120 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_ANSWRITER_H_
#define AOM_DSP_ANSWRITER_H_
// A uABS and rANS encoder implementation of Asymmetric Numeral Systems
// http://arxiv.org/abs/1311.2540v2
#include <assert.h>
#include "./aom_config.h"
#include "aom/aom_integer.h"
#include "aom_dsp/ans.h"
#include "aom_dsp/prob.h"
#include "aom_ports/mem_ops.h"
#include "av1/common/odintrin.h"
#if RANS_PRECISION <= OD_DIVU_DMAX
#define ANS_DIVREM(quotient, remainder, dividend, divisor) \
do { \
quotient = OD_DIVU_SMALL((dividend), (divisor)); \
remainder = (dividend) - (quotient) * (divisor); \
} while (0)
#else
#define ANS_DIVREM(quotient, remainder, dividend, divisor) \
do { \
quotient = (dividend) / (divisor); \
remainder = (dividend) % (divisor); \
} while (0)
#endif
#define ANS_DIV8(dividend, divisor) OD_DIVU_SMALL((dividend), (divisor))
#ifdef __cplusplus
extern "C" {
#endif // __cplusplus
struct AnsCoder {
uint8_t *buf;
int buf_offset;
uint32_t state;
};
static INLINE void ans_write_init(struct AnsCoder *const ans,
uint8_t *const buf) {
ans->buf = buf;
ans->buf_offset = 0;
ans->state = L_BASE;
}
static INLINE int ans_write_end(struct AnsCoder *const ans) {
uint32_t state;
assert(ans->state >= L_BASE);
assert(ans->state < L_BASE * IO_BASE);
state = ans->state - L_BASE;
if (state < (1 << 6)) {
ans->buf[ans->buf_offset] = (0x00 << 6) + state;
return ans->buf_offset + 1;
} else if (state < (1 << 14)) {
mem_put_le16(ans->buf + ans->buf_offset, (0x01 << 14) + state);
return ans->buf_offset + 2;
} else if (state < (1 << 22)) {
mem_put_le24(ans->buf + ans->buf_offset, (0x02 << 22) + state);
return ans->buf_offset + 3;
} else if (state < (1 << 29)) {
mem_put_le32(ans->buf + ans->buf_offset, (0x07 << 29) + state);
return ans->buf_offset + 4;
} else {
assert(0 && "State is too large to be serialized");
return ans->buf_offset;
}
}
// uABS with normalization
static INLINE void uabs_write(struct AnsCoder *ans, int val, AnsP8 p0) {
AnsP8 p = ANS_P8_PRECISION - p0;
const unsigned l_s = val ? p : p0;
while (ans->state >= L_BASE / ANS_P8_PRECISION * IO_BASE * l_s) {
ans->buf[ans->buf_offset++] = ans->state % IO_BASE;
ans->state /= IO_BASE;
}
if (!val)
ans->state = ANS_DIV8(ans->state * ANS_P8_PRECISION, p0);
else
ans->state = ANS_DIV8((ans->state + 1) * ANS_P8_PRECISION + p - 1, p) - 1;
}
struct rans_sym {
aom_cdf_prob prob;
aom_cdf_prob cum_prob; // not-inclusive
};
// rANS with normalization
// sym->prob takes the place of l_s from the paper
// RANS_PRECISION is m
static INLINE void rans_write(struct AnsCoder *ans,
const struct rans_sym *const sym) {
const aom_cdf_prob p = sym->prob;
unsigned quot, rem;
while (ans->state >= L_BASE / RANS_PRECISION * IO_BASE * p) {
ans->buf[ans->buf_offset++] = ans->state % IO_BASE;
ans->state /= IO_BASE;
}
ANS_DIVREM(quot, rem, ans->state, p);
ans->state = quot * RANS_PRECISION + rem + sym->cum_prob;
}
#undef ANS_DIV8
#undef ANS_DIVREM
#ifdef __cplusplus
} // extern "C"
#endif // __cplusplus
#endif // AOM_DSP_ANSWRITER_H_
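
rans_write and rans_read are inverses of each other, and because ANS behaves like a stack, symbols come out in the reverse of the order they were written. A self-contained round-trip sketch under stated assumptions: the sk_ names are hypothetical, the CDF is a plain uint16_t array whose last entry equals the precision, and the final state is handed to the decoder directly instead of being serialized by ans_write_end:

```c
#include <assert.h>
#include <stdint.h>

#define SK_PRECISION (1u << 15) /* plays the role of RANS_PRECISION */
#define SK_L_BASE SK_PRECISION
#define SK_IO_BASE 256u

struct sk_ans {
  uint8_t *buf;
  int buf_offset;
  uint32_t state;
};

/* Encode one symbol: renormalize, then push the symbol into the state. */
static void sk_rans_write(struct sk_ans *ans, const uint16_t *cdf, int sym) {
  const uint32_t cum = sym ? cdf[sym - 1] : 0;
  const uint32_t p = cdf[sym] - cum;
  while (ans->state >= SK_L_BASE / SK_PRECISION * SK_IO_BASE * p) {
    ans->buf[ans->buf_offset++] = (uint8_t)(ans->state % SK_IO_BASE);
    ans->state /= SK_IO_BASE;
  }
  ans->state = (ans->state / p) * SK_PRECISION + ans->state % p + cum;
}

/* Decode one symbol: refill the state, then pop the symbol out of it. */
static int sk_rans_read(struct sk_ans *ans, const uint16_t *cdf) {
  uint32_t quo, rem, cum = 0;
  int i = 0;
  while (ans->state < SK_L_BASE && ans->buf_offset > 0) {
    ans->state = ans->state * SK_IO_BASE + ans->buf[--ans->buf_offset];
  }
  quo = ans->state / SK_PRECISION;
  rem = ans->state % SK_PRECISION;
  while (rem >= cdf[i]) cum = cdf[i++]; /* linear search, as in fetch_sym */
  ans->state = quo * (cdf[i] - cum) + rem - cum;
  return i;
}

/* Returns 1 if n symbols survive an encode/decode round trip. */
static int sk_roundtrip_ok(const uint16_t *cdf, const int *syms, int n) {
  uint8_t buf[256];
  struct sk_ans ans = { buf, 0, SK_L_BASE };
  int i;
  if (n > 100) return 0; /* keep well inside the fixed scratch buffer */
  for (i = n - 1; i >= 0; --i) sk_rans_write(&ans, cdf, syms[i]);
  for (i = 0; i < n; ++i)
    if (sk_rans_read(&ans, cdf) != syms[i]) return 0;
  return 1;
}
```

Writing the message back to front is what lets the decoder return it front to back; the same buffer and offset are reused so the decoder consumes bytes from the end, as the reader in this diff does.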

@@ -1,57 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_AOM_CONVOLVE_H_
#define AOM_DSP_AOM_CONVOLVE_H_
#include "./aom_config.h"
#include "aom/aom_integer.h"
#ifdef __cplusplus
extern "C" {
#endif
// Note: Fixed-size intermediate buffers place limits on the parameters
// of some functions. 2D filtering proceeds in 2 steps:
// (1) Interpolate horizontally into an intermediate buffer, temp.
// (2) Interpolate temp vertically to derive the sub-pixel result.
// Deriving the maximum number of rows in the temp buffer (135):
// --Smallest scaling factor is x1/2 ==> y_step_q4 = 32 (Normative).
// --Largest block size is 64x64 pixels.
// --64 rows in the downscaled frame span a distance of (64 - 1) * 32 in the
// original frame (in 1/16th pixel units).
// --Must round-up because block may be located at sub-pixel position.
// --Require an additional SUBPEL_TAPS rows for the 8-tap filter tails.
// --ceil(((64 - 1) * 32 + 15) / 16) + 8 = 135.
#if CONFIG_AV1 && CONFIG_EXT_PARTITION
#define MAX_EXT_SIZE 263
#else
#define MAX_EXT_SIZE 135
#endif // CONFIG_AV1 && CONFIG_EXT_PARTITION
typedef void (*convolve_fn_t)(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4, int w,
int h);
#if CONFIG_AOM_HIGHBITDEPTH
typedef void (*highbd_convolve_fn_t)(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4,
int w, int h, int bd);
#endif
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_AOM_CONVOLVE_H_
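
The row count derived in the header comment above can be checked with a few lines of integer arithmetic. This is a sketch under the assumption that the "round up" step means a ceiling division by 16; the function name sk_max_temp_rows is hypothetical:

```c
#include <assert.h>

/* Hypothetical helper: rows needed in the intermediate buffer for a block of
 * block_size rows, a vertical step of y_step_q4 (1/16th pixel units), and a
 * filter with the given number of taps. */
static int sk_max_temp_rows(int block_size, int y_step_q4, int taps) {
  /* Distance spanned in the source, plus the worst sub-pixel offset (15). */
  const int span_q4 = (block_size - 1) * y_step_q4 + 15;
  /* Ceiling division by 16, then room for the filter tails. */
  return (span_q4 + 15) / 16 + taps;
}
```

With a 64-row block, a step of 32, and 8 taps this gives 135. The same arithmetic with a 128-row block gives 263, which matches the MAX_EXT_SIZE value in the CONFIG_EXT_PARTITION branch.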

@@ -1,426 +0,0 @@
##
## Copyright (c) 2016, Alliance for Open Media. All rights reserved
##
## This source code is subject to the terms of the BSD 2 Clause License and
## the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
## was not distributed with this source code in the LICENSE file, you can
## obtain it at www.aomedia.org/license/software. If the Alliance for Open
## Media Patent License 1.0 was not distributed with this source code in the
## PATENTS file, you can obtain it at www.aomedia.org/license/patent.
##
DSP_SRCS-yes += aom_dsp.mk
DSP_SRCS-yes += aom_dsp_common.h
DSP_SRCS-$(HAVE_MSA) += mips/macros_msa.h
DSP_SRCS-$(ARCH_X86)$(ARCH_X86_64) += x86/synonyms.h
# bit reader
DSP_SRCS-yes += prob.h
DSP_SRCS-yes += prob.c
DSP_SRCS-$(CONFIG_ANS) += ans.h
DSP_SRCS-$(CONFIG_ANS) += ans.c
ifeq ($(CONFIG_ENCODERS),yes)
DSP_SRCS-$(CONFIG_ANS) += answriter.h
DSP_SRCS-yes += bitwriter.h
DSP_SRCS-yes += dkboolwriter.h
DSP_SRCS-yes += dkboolwriter.c
DSP_SRCS-yes += bitwriter_buffer.c
DSP_SRCS-yes += bitwriter_buffer.h
DSP_SRCS-yes += psnr.c
DSP_SRCS-yes += psnr.h
DSP_SRCS-$(CONFIG_ANS) += buf_ans.h
DSP_SRCS-$(CONFIG_ANS) += buf_ans.c
DSP_SRCS-$(CONFIG_INTERNAL_STATS) += ssim.c
DSP_SRCS-$(CONFIG_INTERNAL_STATS) += ssim.h
DSP_SRCS-$(CONFIG_INTERNAL_STATS) += psnrhvs.c
DSP_SRCS-$(CONFIG_INTERNAL_STATS) += fastssim.c
endif
ifeq ($(CONFIG_DECODERS),yes)
DSP_SRCS-$(CONFIG_ANS) += ansreader.h
DSP_SRCS-yes += bitreader.h
DSP_SRCS-yes += dkboolreader.h
DSP_SRCS-yes += dkboolreader.c
DSP_SRCS-yes += bitreader_buffer.c
DSP_SRCS-yes += bitreader_buffer.h
endif
# intra predictions
DSP_SRCS-yes += intrapred.c
ifeq ($(CONFIG_DAALA_EC),yes)
DSP_SRCS-yes += entenc.c
DSP_SRCS-yes += entenc.h
DSP_SRCS-yes += entdec.c
DSP_SRCS-yes += entdec.h
DSP_SRCS-yes += entcode.c
DSP_SRCS-yes += entcode.h
DSP_SRCS-yes += daalaboolreader.c
DSP_SRCS-yes += daalaboolreader.h
DSP_SRCS-yes += daalaboolwriter.c
DSP_SRCS-yes += daalaboolwriter.h
endif
DSP_SRCS-$(HAVE_SSE) += x86/intrapred_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/intrapred_sse2.asm
DSP_SRCS-$(HAVE_SSSE3) += x86/intrapred_ssse3.asm
DSP_SRCS-$(HAVE_SSSE3) += x86/aom_subpixel_8t_ssse3.asm
ifeq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE) += x86/highbd_intrapred_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_intrapred_sse2.asm
endif # CONFIG_AOM_HIGHBITDEPTH
DSP_SRCS-$(HAVE_NEON_ASM) += arm/intrapred_neon_asm$(ASM)
DSP_SRCS-$(HAVE_NEON) += arm/intrapred_neon.c
DSP_SRCS-$(HAVE_MSA) += mips/intrapred_msa.c
DSP_SRCS-$(HAVE_DSPR2) += mips/intrapred4_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/intrapred8_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/intrapred16_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/common_dspr2.h
DSP_SRCS-$(HAVE_DSPR2) += mips/common_dspr2.c
# inter predictions
DSP_SRCS-yes += blend.h
DSP_SRCS-yes += blend_a64_mask.c
DSP_SRCS-yes += blend_a64_hmask.c
DSP_SRCS-yes += blend_a64_vmask.c
DSP_SRCS-$(HAVE_SSE4_1) += x86/blend_sse4.h
DSP_SRCS-$(HAVE_SSE4_1) += x86/blend_a64_mask_sse4.c
DSP_SRCS-$(HAVE_SSE4_1) += x86/blend_a64_hmask_sse4.c
DSP_SRCS-$(HAVE_SSE4_1) += x86/blend_a64_vmask_sse4.c
# interpolation filters
DSP_SRCS-yes += aom_convolve.c
DSP_SRCS-yes += aom_convolve.h
DSP_SRCS-yes += aom_filter.h
DSP_SRCS-$(ARCH_X86)$(ARCH_X86_64) += x86/convolve.h
DSP_SRCS-$(ARCH_X86)$(ARCH_X86_64) += x86/aom_asm_stubs.c
DSP_SRCS-$(HAVE_SSE2) += x86/aom_subpixel_8t_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/aom_subpixel_bilinear_sse2.asm
DSP_SRCS-$(HAVE_SSSE3) += x86/aom_subpixel_8t_ssse3.asm
DSP_SRCS-$(HAVE_SSSE3) += x86/aom_subpixel_bilinear_ssse3.asm
DSP_SRCS-$(HAVE_AVX2) += x86/aom_subpixel_8t_intrin_avx2.c
DSP_SRCS-$(HAVE_SSSE3) += x86/aom_subpixel_8t_intrin_ssse3.c
ifeq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/aom_high_subpixel_8t_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/aom_high_subpixel_bilinear_sse2.asm
endif
DSP_SRCS-$(HAVE_SSE2) += x86/aom_convolve_copy_sse2.asm
ifeq ($(HAVE_NEON_ASM),yes)
DSP_SRCS-yes += arm/aom_convolve_copy_neon_asm$(ASM)
DSP_SRCS-yes += arm/aom_convolve8_avg_neon_asm$(ASM)
DSP_SRCS-yes += arm/aom_convolve8_neon_asm$(ASM)
DSP_SRCS-yes += arm/aom_convolve_avg_neon_asm$(ASM)
DSP_SRCS-yes += arm/aom_convolve_neon.c
else
ifeq ($(HAVE_NEON),yes)
DSP_SRCS-yes += arm/aom_convolve_copy_neon.c
DSP_SRCS-yes += arm/aom_convolve8_avg_neon.c
DSP_SRCS-yes += arm/aom_convolve8_neon.c
DSP_SRCS-yes += arm/aom_convolve_avg_neon.c
DSP_SRCS-yes += arm/aom_convolve_neon.c
endif # HAVE_NEON
endif # HAVE_NEON_ASM
# common (msa)
DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve8_avg_horiz_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve8_avg_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve8_avg_vert_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve8_horiz_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve8_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve8_vert_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve_avg_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve_copy_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve_msa.h
# common (dspr2)
DSP_SRCS-$(HAVE_DSPR2) += mips/convolve_common_dspr2.h
DSP_SRCS-$(HAVE_DSPR2) += mips/convolve2_avg_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/convolve2_avg_horiz_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/convolve2_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/convolve2_horiz_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/convolve2_vert_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/convolve8_avg_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/convolve8_avg_horiz_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/convolve8_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/convolve8_horiz_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/convolve8_vert_dspr2.c
# loop filters
DSP_SRCS-yes += loopfilter.c
DSP_SRCS-$(ARCH_X86)$(ARCH_X86_64) += x86/loopfilter_sse2.c
DSP_SRCS-$(HAVE_AVX2) += x86/loopfilter_avx2.c
DSP_SRCS-$(HAVE_NEON) += arm/loopfilter_neon.c
ifeq ($(HAVE_NEON_ASM),yes)
DSP_SRCS-yes += arm/loopfilter_mb_neon$(ASM)
DSP_SRCS-yes += arm/loopfilter_16_neon$(ASM)
DSP_SRCS-yes += arm/loopfilter_8_neon$(ASM)
DSP_SRCS-yes += arm/loopfilter_4_neon$(ASM)
else
ifeq ($(HAVE_NEON),yes)
DSP_SRCS-yes += arm/loopfilter_16_neon.c
DSP_SRCS-yes += arm/loopfilter_8_neon.c
DSP_SRCS-yes += arm/loopfilter_4_neon.c
endif # HAVE_NEON
endif # HAVE_NEON_ASM
DSP_SRCS-$(HAVE_MSA) += mips/loopfilter_msa.h
DSP_SRCS-$(HAVE_MSA) += mips/loopfilter_16_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/loopfilter_8_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/loopfilter_4_msa.c
DSP_SRCS-$(HAVE_DSPR2) += mips/loopfilter_filters_dspr2.h
DSP_SRCS-$(HAVE_DSPR2) += mips/loopfilter_filters_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/loopfilter_macros_dspr2.h
DSP_SRCS-$(HAVE_DSPR2) += mips/loopfilter_masks_dspr2.h
DSP_SRCS-$(HAVE_DSPR2) += mips/loopfilter_mb_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/loopfilter_mb_horiz_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/loopfilter_mb_vert_dspr2.c
ifeq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_loopfilter_sse2.c
endif # CONFIG_AOM_HIGHBITDEPTH
DSP_SRCS-yes += txfm_common.h
DSP_SRCS-yes += x86/txfm_common_intrin.h
DSP_SRCS-$(HAVE_SSE2) += x86/txfm_common_sse2.h
DSP_SRCS-$(HAVE_MSA) += mips/txfm_macros_msa.h
# forward transform
ifeq ($(CONFIG_AV1),yes)
DSP_SRCS-yes += fwd_txfm.c
DSP_SRCS-yes += fwd_txfm.h
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_txfm_sse2.h
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_txfm_sse2.c
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_dct32_8cols_sse2.c
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_txfm_impl_sse2.h
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_dct32x32_impl_sse2.h
ifeq ($(ARCH_X86_64),yes)
DSP_SRCS-$(HAVE_SSSE3) += x86/fwd_txfm_ssse3_x86_64.asm
endif
DSP_SRCS-$(HAVE_AVX2) += x86/fwd_txfm_avx2.h
DSP_SRCS-$(HAVE_AVX2) += x86/fwd_txfm_avx2.c
DSP_SRCS-$(HAVE_AVX2) += x86/txfm_common_avx2.h
DSP_SRCS-$(HAVE_AVX2) += x86/fwd_dct32x32_impl_avx2.h
DSP_SRCS-$(HAVE_NEON) += arm/fwd_txfm_neon.c
DSP_SRCS-$(HAVE_MSA) += mips/fwd_txfm_msa.h
DSP_SRCS-$(HAVE_MSA) += mips/fwd_txfm_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/fwd_dct32x32_msa.c
endif # CONFIG_AV1
ifeq ($(CONFIG_PVQ),yes)
DSP_SRCS-yes += fwd_txfm.c
DSP_SRCS-yes += fwd_txfm.h
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_txfm_sse2.h
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_txfm_sse2.c
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_txfm_impl_sse2.h
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_dct32x32_impl_sse2.h
ifeq ($(ARCH_X86_64),yes)
DSP_SRCS-$(HAVE_SSSE3) += x86/fwd_txfm_ssse3_x86_64.asm
endif
DSP_SRCS-$(HAVE_AVX2) += x86/fwd_txfm_avx2.c
DSP_SRCS-$(HAVE_AVX2) += x86/fwd_dct32x32_impl_avx2.h
DSP_SRCS-$(HAVE_NEON) += arm/fwd_txfm_neon.c
DSP_SRCS-$(HAVE_MSA) += mips/fwd_txfm_msa.h
DSP_SRCS-$(HAVE_MSA) += mips/fwd_txfm_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/fwd_dct32x32_msa.c
endif # CONFIG_PVQ
# inverse transform
ifeq ($(CONFIG_AV1), yes)
DSP_SRCS-yes += inv_txfm.h
DSP_SRCS-yes += inv_txfm.c
DSP_SRCS-$(HAVE_SSE2) += x86/inv_txfm_sse2.h
DSP_SRCS-$(HAVE_SSE2) += x86/inv_txfm_sse2.c
DSP_SRCS-$(HAVE_SSE2) += x86/inv_wht_sse2.asm
ifeq ($(ARCH_X86_64),yes)
DSP_SRCS-$(HAVE_SSSE3) += x86/inv_txfm_ssse3_x86_64.asm
endif # ARCH_X86_64
ifeq ($(HAVE_NEON_ASM),yes)
DSP_SRCS-yes += arm/save_reg_neon$(ASM)
DSP_SRCS-yes += arm/idct4x4_1_add_neon$(ASM)
DSP_SRCS-yes += arm/idct4x4_add_neon$(ASM)
DSP_SRCS-yes += arm/idct8x8_1_add_neon$(ASM)
DSP_SRCS-yes += arm/idct8x8_add_neon$(ASM)
DSP_SRCS-yes += arm/idct16x16_1_add_neon$(ASM)
DSP_SRCS-yes += arm/idct16x16_add_neon$(ASM)
DSP_SRCS-yes += arm/idct32x32_1_add_neon$(ASM)
DSP_SRCS-yes += arm/idct32x32_add_neon$(ASM)
else
ifeq ($(HAVE_NEON),yes)
DSP_SRCS-yes += arm/idct4x4_1_add_neon.c
DSP_SRCS-yes += arm/idct4x4_add_neon.c
DSP_SRCS-yes += arm/idct8x8_1_add_neon.c
DSP_SRCS-yes += arm/idct8x8_add_neon.c
DSP_SRCS-yes += arm/idct16x16_1_add_neon.c
DSP_SRCS-yes += arm/idct16x16_add_neon.c
DSP_SRCS-yes += arm/idct32x32_1_add_neon.c
DSP_SRCS-yes += arm/idct32x32_add_neon.c
endif # HAVE_NEON
endif # HAVE_NEON_ASM
DSP_SRCS-$(HAVE_NEON) += arm/idct16x16_neon.c
DSP_SRCS-$(HAVE_MSA) += mips/inv_txfm_msa.h
DSP_SRCS-$(HAVE_MSA) += mips/idct4x4_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/idct8x8_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/idct16x16_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/idct32x32_msa.c
ifneq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_DSPR2) += mips/inv_txfm_dspr2.h
DSP_SRCS-$(HAVE_DSPR2) += mips/itrans4_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/itrans8_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/itrans16_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/itrans32_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/itrans32_cols_dspr2.c
endif # CONFIG_AOM_HIGHBITDEPTH
endif # CONFIG_AV1
# quantization
ifneq ($(filter yes,$(CONFIG_AV1_ENCODER)),)
DSP_SRCS-yes += quantize.c
DSP_SRCS-yes += quantize.h
DSP_SRCS-$(HAVE_SSE2) += x86/quantize_sse2.c
ifeq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_quantize_intrin_sse2.c
endif
ifeq ($(ARCH_X86_64),yes)
DSP_SRCS-$(HAVE_SSSE3) += x86/quantize_ssse3_x86_64.asm
DSP_SRCS-$(HAVE_AVX) += x86/quantize_avx_x86_64.asm
endif
# avg
DSP_SRCS-yes += avg.c
DSP_SRCS-$(HAVE_SSE2) += x86/avg_intrin_sse2.c
DSP_SRCS-$(HAVE_NEON) += arm/avg_neon.c
DSP_SRCS-$(HAVE_MSA) += mips/avg_msa.c
DSP_SRCS-$(HAVE_NEON) += arm/hadamard_neon.c
ifeq ($(ARCH_X86_64),yes)
DSP_SRCS-$(HAVE_SSSE3) += x86/avg_ssse3_x86_64.asm
endif
# high bit depth subtract
ifeq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_subtract_sse2.c
endif
endif # CONFIG_AV1_ENCODER
ifeq ($(CONFIG_AV1_ENCODER),yes)
DSP_SRCS-yes += sum_squares.c
DSP_SRCS-$(HAVE_SSE2) += x86/sum_squares_sse2.c
endif # CONFIG_AV1_ENCODER
ifeq ($(CONFIG_ENCODERS),yes)
DSP_SRCS-yes += sad.c
DSP_SRCS-yes += subtract.c
DSP_SRCS-$(HAVE_MEDIA) += arm/sad_media$(ASM)
DSP_SRCS-$(HAVE_NEON) += arm/sad4d_neon.c
DSP_SRCS-$(HAVE_NEON) += arm/sad_neon.c
DSP_SRCS-$(HAVE_NEON) += arm/subtract_neon.c
DSP_SRCS-$(HAVE_MSA) += mips/sad_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/subtract_msa.c
DSP_SRCS-$(HAVE_SSE3) += x86/sad_sse3.asm
DSP_SRCS-$(HAVE_SSSE3) += x86/sad_ssse3.asm
DSP_SRCS-$(HAVE_SSE4_1) += x86/sad_sse4.asm
DSP_SRCS-$(HAVE_AVX2) += x86/sad4d_avx2.c
DSP_SRCS-$(HAVE_AVX2) += x86/sad_avx2.c
ifeq ($(CONFIG_AV1_ENCODER),yes)
ifeq ($(CONFIG_EXT_INTER),yes)
DSP_SRCS-$(HAVE_SSSE3) += x86/masked_sad_intrin_ssse3.c
DSP_SRCS-$(HAVE_SSSE3) += x86/masked_variance_intrin_ssse3.c
endif #CONFIG_EXT_INTER
ifeq ($(CONFIG_MOTION_VAR),yes)
DSP_SRCS-$(HAVE_SSE4_1) += x86/obmc_sad_sse4.c
DSP_SRCS-$(HAVE_SSE4_1) += x86/obmc_variance_sse4.c
endif #CONFIG_MOTION_VAR
endif #CONFIG_AV1_ENCODER
DSP_SRCS-$(HAVE_SSE) += x86/sad4d_sse2.asm
DSP_SRCS-$(HAVE_SSE) += x86/sad_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/sad4d_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/sad_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/subtract_sse2.asm
ifeq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_sad4d_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_sad_sse2.asm
endif # CONFIG_AOM_HIGHBITDEPTH
endif # CONFIG_ENCODERS
ifneq ($(filter yes,$(CONFIG_ENCODERS)),)
DSP_SRCS-yes += variance.c
DSP_SRCS-yes += variance.h
DSP_SRCS-$(HAVE_MEDIA) += arm/bilinear_filter_media$(ASM)
DSP_SRCS-$(HAVE_MEDIA) += arm/subpel_variance_media.c
DSP_SRCS-$(HAVE_MEDIA) += arm/variance_halfpixvar16x16_h_media$(ASM)
DSP_SRCS-$(HAVE_MEDIA) += arm/variance_halfpixvar16x16_hv_media$(ASM)
DSP_SRCS-$(HAVE_MEDIA) += arm/variance_halfpixvar16x16_v_media$(ASM)
DSP_SRCS-$(HAVE_MEDIA) += arm/variance_media$(ASM)
DSP_SRCS-$(HAVE_NEON) += arm/subpel_variance_neon.c
DSP_SRCS-$(HAVE_NEON) += arm/variance_neon.c
DSP_SRCS-$(HAVE_MSA) += mips/variance_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/sub_pixel_variance_msa.c
DSP_SRCS-$(HAVE_SSE) += x86/variance_sse2.c
DSP_SRCS-$(HAVE_SSE2) += x86/variance_sse2.c # Contains SSE2 and SSSE3
DSP_SRCS-$(HAVE_SSE2) += x86/halfpix_variance_sse2.c
DSP_SRCS-$(HAVE_SSE2) += x86/halfpix_variance_impl_sse2.asm
DSP_SRCS-$(HAVE_AVX2) += x86/variance_avx2.c
DSP_SRCS-$(HAVE_AVX2) += x86/variance_impl_avx2.c
ifeq ($(ARCH_X86_64),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/ssim_opt_x86_64.asm
endif # ARCH_X86_64
DSP_SRCS-$(HAVE_SSE) += x86/subpel_variance_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/subpel_variance_sse2.asm # Contains SSE2 and SSSE3
ifeq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_variance_sse2.c
DSP_SRCS-$(HAVE_SSE4_1) += x86/highbd_variance_sse4.c
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_variance_impl_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_subpel_variance_impl_sse2.asm
endif # CONFIG_AOM_HIGHBITDEPTH
endif # CONFIG_ENCODERS
DSP_SRCS-no += $(DSP_SRCS_REMOVE-yes)
DSP_SRCS-yes += aom_dsp_rtcd.c
DSP_SRCS-yes += aom_dsp_rtcd_defs.pl
DSP_SRCS-yes += aom_simd.c
DSP_SRCS-yes += aom_simd.h
DSP_SRCS-yes += aom_simd_inline.h
DSP_SRCS-yes += simd/v64_intrinsics.h
DSP_SRCS-yes += simd/v64_intrinsics_c.h
DSP_SRCS-yes += simd/v128_intrinsics.h
DSP_SRCS-yes += simd/v128_intrinsics_c.h
DSP_SRCS-yes += simd/v256_intrinsics.h
DSP_SRCS-yes += simd/v256_intrinsics_c.h
DSP_SRCS-$(HAVE_SSE2) += simd/v64_intrinsics_x86.h
DSP_SRCS-$(HAVE_SSE2) += simd/v128_intrinsics_x86.h
DSP_SRCS-$(HAVE_SSE2) += simd/v256_intrinsics_x86.h
DSP_SRCS-$(HAVE_NEON) += simd/v64_intrinsics_arm.h
DSP_SRCS-$(HAVE_NEON) += simd/v128_intrinsics_arm.h
DSP_SRCS-$(HAVE_NEON) += simd/v256_intrinsics_arm.h
$(eval $(call rtcd_h_template,aom_dsp_rtcd,aom_dsp/aom_dsp_rtcd_defs.pl))

@@ -1,102 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_AOM_DSP_COMMON_H_
#define AOM_DSP_AOM_DSP_COMMON_H_
#include "./aom_config.h"
#include "aom/aom_integer.h"
#include "aom_ports/mem.h"
#ifdef __cplusplus
extern "C" {
#endif
#ifndef MAX_SB_SIZE
#if CONFIG_AV1 && CONFIG_EXT_PARTITION
#define MAX_SB_SIZE 128
#else
#define MAX_SB_SIZE 64
#endif // CONFIG_AV1 && CONFIG_EXT_PARTITION
#endif // ndef MAX_SB_SIZE
#define AOMMIN(x, y) (((x) < (y)) ? (x) : (y))
#define AOMMAX(x, y) (((x) > (y)) ? (x) : (y))
#define IMPLIES(a, b) (!(a) || (b)) // Logical 'a implies b' (or 'a -> b')
#define IS_POWER_OF_TWO(x) (((x) & ((x)-1)) == 0)
// These can be used to give a hint about branch outcomes.
// This can have an effect, even if your target processor has a
// good branch predictor, as these hints can affect basic block
// ordering by the compiler.
#ifdef __GNUC__
#define LIKELY(v) __builtin_expect(v, 1)
#define UNLIKELY(v) __builtin_expect(v, 0)
#else
#define LIKELY(v) (v)
#define UNLIKELY(v) (v)
#endif
#define AOM_SWAP(type, a, b) \
do { \
type c = (b); \
b = a; \
a = c; \
} while (0)
#if CONFIG_AOM_QM
typedef uint16_t qm_val_t;
#define AOM_QM_BITS 6
#endif
#if CONFIG_AOM_HIGHBITDEPTH
// Note:
// tran_low_t is the datatype used for final transform coefficients.
// tran_high_t is the datatype used for intermediate transform stages.
typedef int64_t tran_high_t;
typedef int32_t tran_low_t;
#else
// Note:
// tran_low_t is the datatype used for final transform coefficients.
// tran_high_t is the datatype used for intermediate transform stages.
typedef int32_t tran_high_t;
typedef int16_t tran_low_t;
#endif // CONFIG_AOM_HIGHBITDEPTH
static INLINE uint8_t clip_pixel(int val) {
return (val > 255) ? 255 : (val < 0) ? 0 : val;
}
static INLINE int clamp(int value, int low, int high) {
return value < low ? low : (value > high ? high : value);
}
static INLINE double fclamp(double value, double low, double high) {
return value < low ? low : (value > high ? high : value);
}
#if CONFIG_AOM_HIGHBITDEPTH
static INLINE uint16_t clip_pixel_highbd(int val, int bd) {
switch (bd) {
case 8:
default: return (uint16_t)clamp(val, 0, 255);
case 10: return (uint16_t)clamp(val, 0, 1023);
case 12: return (uint16_t)clamp(val, 0, 4095);
}
}
#endif // CONFIG_AOM_HIGHBITDEPTH
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_AOM_DSP_COMMON_H_
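The clamping helpers in the header above can be exercised in isolation. The following is a minimal sketch, not the library's code: the `sketch_` names are hypothetical, and the high-bit-depth variant derives the maximum from `bd` instead of using the header's `switch`. It assumes `bd` is 8, 10 or 12, whereas the header falls back to the 8-bit range for any other value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the header's clamp(): same ternary logic. */
static int sketch_clamp(int value, int low, int high) {
  return value < low ? low : (value > high ? high : value);
}

/* Hypothetical variant of clip_pixel_highbd(). Assumption: bd is 8, 10 or
 * 12, so the largest legal sample value is (1 << bd) - 1. */
static uint16_t sketch_clip_pixel_highbd(int val, int bd) {
  assert(bd == 8 || bd == 10 || bd == 12);
  return (uint16_t)sketch_clamp(val, 0, (1 << bd) - 1);
}
```

For the three supported depths the upper bounds are 255, 1023 and 4095, which matches the constants listed in the header's `switch`.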


@@ -1,16 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include "./aom_config.h"
#define RTCD_C
#include "./aom_dsp_rtcd.h"
#include "aom_ports/aom_once.h"
void aom_dsp_rtcd() { once(setup_rtcd_internal); }

File diff suppressed because it is too large


@@ -1,43 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_AOM_FILTER_H_
#define AOM_DSP_AOM_FILTER_H_
#include "aom/aom_integer.h"
#ifdef __cplusplus
extern "C" {
#endif
#define FILTER_BITS 7
#define SUBPEL_BITS 4
#define SUBPEL_MASK ((1 << SUBPEL_BITS) - 1)
#define SUBPEL_SHIFTS (1 << SUBPEL_BITS)
#define SUBPEL_TAPS 8
typedef int16_t InterpKernel[SUBPEL_TAPS];
#define BIL_SUBPEL_BITS 3
#define BIL_SUBPEL_SHIFTS (1 << BIL_SUBPEL_BITS)
// 2 tap bilinear filters
static const uint8_t bilinear_filters_2t[BIL_SUBPEL_SHIFTS][2] = {
{ 128, 0 }, { 112, 16 }, { 96, 32 }, { 80, 48 },
{ 64, 64 }, { 48, 80 }, { 32, 96 }, { 16, 112 },
};
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_AOM_FILTER_H_
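Each row of `bilinear_filters_2t` sums to 128, that is `1 << FILTER_BITS`, so a weighted sum of two pixels can be normalized with a rounding right shift by `FILTER_BITS`. The sketch below illustrates that arithmetic under this assumption. The function name `sketch_bilinear_blend` and the `SKETCH_`/`sketch_` identifiers are hypothetical, and the header itself does not show how callers apply the table.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FILTER_BITS 7

/* Copy of the 2-tap table from the header, under a hypothetical name. */
static const uint8_t sketch_bilinear_2t[8][2] = {
  { 128, 0 }, { 112, 16 }, { 96, 32 }, { 80, 48 },
  { 64, 64 }, { 48, 80 },  { 32, 96 }, { 16, 112 },
};

/* Blend pixels a and b at sub-pixel offset 0..7 (eighths of a pixel).
 * The weights sum to 128, so adding 64 and shifting right by 7 rounds
 * the weighted sum to the nearest integer. */
static uint8_t sketch_bilinear_blend(uint8_t a, uint8_t b, int offset) {
  assert(offset >= 0 && offset < 8);
  const uint8_t *f = sketch_bilinear_2t[offset];
  const int sum = a * f[0] + b * f[1];
  return (uint8_t)((sum + (1 << (SKETCH_FILTER_BITS - 1))) >>
                   SKETCH_FILTER_BITS);
}
```

Offset 0 returns `a` unchanged, and offset 4 gives the rounded midpoint of `a` and `b`.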


@@ -1,13 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
// Set to 1 to add some sanity checks in the fallback C code
const int simd_check = 1;


@@ -1,32 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_AOM_AOM_SIMD_H_
#define AOM_DSP_AOM_AOM_SIMD_H_
#include <stdint.h>
#if defined(_WIN32)
#include <intrin.h>
#endif
#include "./aom_config.h"
#include "./aom_simd_inline.h"
#if HAVE_NEON
#include "simd/v256_intrinsics_arm.h"
#elif HAVE_SSE2
#include "simd/v256_intrinsics_x86.h"
#else
#include "simd/v256_intrinsics.h"
#endif
#endif // AOM_DSP_AOM_AOM_SIMD_H_


@@ -1,21 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_AOM_SIMD_INLINE_H_
#define AOM_DSP_AOM_SIMD_INLINE_H_
#include "aom/aom_integer.h"
#ifndef SIMD_INLINE
#define SIMD_INLINE static AOM_FORCE_INLINE
#endif
#endif // AOM_DSP_AOM_SIMD_INLINE_H_


@@ -1,364 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include <assert.h>
#include "./aom_config.h"
#include "./aom_dsp_rtcd.h"
#include "aom/aom_integer.h"
#include "aom_ports/mem.h"
static INLINE int32x4_t MULTIPLY_BY_Q0(int16x4_t dsrc0, int16x4_t dsrc1,
int16x4_t dsrc2, int16x4_t dsrc3,
int16x4_t dsrc4, int16x4_t dsrc5,
int16x4_t dsrc6, int16x4_t dsrc7,
int16x8_t q0s16) {
int32x4_t qdst;
int16x4_t d0s16, d1s16;
d0s16 = vget_low_s16(q0s16);
d1s16 = vget_high_s16(q0s16);
qdst = vmull_lane_s16(dsrc0, d0s16, 0);
qdst = vmlal_lane_s16(qdst, dsrc1, d0s16, 1);
qdst = vmlal_lane_s16(qdst, dsrc2, d0s16, 2);
qdst = vmlal_lane_s16(qdst, dsrc3, d0s16, 3);
qdst = vmlal_lane_s16(qdst, dsrc4, d1s16, 0);
qdst = vmlal_lane_s16(qdst, dsrc5, d1s16, 1);
qdst = vmlal_lane_s16(qdst, dsrc6, d1s16, 2);
qdst = vmlal_lane_s16(qdst, dsrc7, d1s16, 3);
return qdst;
}
void aom_convolve8_avg_horiz_neon(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, // unused
int y_step_q4, // unused
int w, int h) {
int width;
const uint8_t *s;
uint8_t *d;
uint8x8_t d2u8, d3u8, d24u8, d25u8, d26u8, d27u8, d28u8, d29u8;
uint32x2_t d2u32, d3u32, d6u32, d7u32, d28u32, d29u32, d30u32, d31u32;
uint8x16_t q1u8, q3u8, q12u8, q13u8, q14u8, q15u8;
int16x4_t d16s16, d17s16, d18s16, d19s16, d20s16, d22s16, d23s16;
int16x4_t d24s16, d25s16, d26s16, d27s16;
uint16x4_t d2u16, d3u16, d4u16, d5u16, d16u16, d17u16, d18u16, d19u16;
int16x8_t q0s16;
uint16x8_t q1u16, q2u16, q8u16, q9u16, q10u16, q11u16, q12u16, q13u16;
int32x4_t q1s32, q2s32, q14s32, q15s32;
uint16x8x2_t q0x2u16;
uint8x8x2_t d0x2u8, d1x2u8;
uint32x2x2_t d0x2u32;
uint16x4x2_t d0x2u16, d1x2u16;
uint32x4x2_t q0x2u32;
assert(x_step_q4 == 16);
(void)x_step_q4;
(void)y_step_q4;
(void)filter_y;
q0s16 = vld1q_s16(filter_x);
src -= 3; // adjust for taps
for (; h > 0; h -= 4) { // loop_horiz_v
s = src;
d24u8 = vld1_u8(s);
s += src_stride;
d25u8 = vld1_u8(s);
s += src_stride;
d26u8 = vld1_u8(s);
s += src_stride;
d27u8 = vld1_u8(s);
q12u8 = vcombine_u8(d24u8, d25u8);
q13u8 = vcombine_u8(d26u8, d27u8);
q0x2u16 =
vtrnq_u16(vreinterpretq_u16_u8(q12u8), vreinterpretq_u16_u8(q13u8));
d24u8 = vreinterpret_u8_u16(vget_low_u16(q0x2u16.val[0]));
d25u8 = vreinterpret_u8_u16(vget_high_u16(q0x2u16.val[0]));
d26u8 = vreinterpret_u8_u16(vget_low_u16(q0x2u16.val[1]));
d27u8 = vreinterpret_u8_u16(vget_high_u16(q0x2u16.val[1]));
d0x2u8 = vtrn_u8(d24u8, d25u8);
d1x2u8 = vtrn_u8(d26u8, d27u8);
__builtin_prefetch(src + src_stride * 4);
__builtin_prefetch(src + src_stride * 5);
q8u16 = vmovl_u8(d0x2u8.val[0]);
q9u16 = vmovl_u8(d0x2u8.val[1]);
q10u16 = vmovl_u8(d1x2u8.val[0]);
q11u16 = vmovl_u8(d1x2u8.val[1]);
src += 7;
d16u16 = vget_low_u16(q8u16);
d17u16 = vget_high_u16(q8u16);
d18u16 = vget_low_u16(q9u16);
d19u16 = vget_high_u16(q9u16);
q8u16 = vcombine_u16(d16u16, d18u16); // vswp 17 18
q9u16 = vcombine_u16(d17u16, d19u16);
d20s16 = vreinterpret_s16_u16(vget_low_u16(q10u16));
d23s16 = vreinterpret_s16_u16(vget_high_u16(q10u16)); // vmov 23 21
for (width = w; width > 0; width -= 4, src += 4, dst += 4) { // loop_horiz
s = src;
d28u32 = vld1_dup_u32((const uint32_t *)s);
s += src_stride;
d29u32 = vld1_dup_u32((const uint32_t *)s);
s += src_stride;
d31u32 = vld1_dup_u32((const uint32_t *)s);
s += src_stride;
d30u32 = vld1_dup_u32((const uint32_t *)s);
__builtin_prefetch(src + 64);
d0x2u16 =
vtrn_u16(vreinterpret_u16_u32(d28u32), vreinterpret_u16_u32(d31u32));
d1x2u16 =
vtrn_u16(vreinterpret_u16_u32(d29u32), vreinterpret_u16_u32(d30u32));
d0x2u8 = vtrn_u8(vreinterpret_u8_u16(d0x2u16.val[0]), // d28
vreinterpret_u8_u16(d1x2u16.val[0])); // d29
d1x2u8 = vtrn_u8(vreinterpret_u8_u16(d0x2u16.val[1]), // d31
vreinterpret_u8_u16(d1x2u16.val[1])); // d30
__builtin_prefetch(src + 64 + src_stride);
q14u8 = vcombine_u8(d0x2u8.val[0], d0x2u8.val[1]);
q15u8 = vcombine_u8(d1x2u8.val[1], d1x2u8.val[0]);
q0x2u32 =
vtrnq_u32(vreinterpretq_u32_u8(q14u8), vreinterpretq_u32_u8(q15u8));
d28u8 = vreinterpret_u8_u32(vget_low_u32(q0x2u32.val[0]));
d29u8 = vreinterpret_u8_u32(vget_high_u32(q0x2u32.val[0]));
q12u16 = vmovl_u8(d28u8);
q13u16 = vmovl_u8(d29u8);
__builtin_prefetch(src + 64 + src_stride * 2);
d = dst;
d6u32 = vld1_lane_u32((const uint32_t *)d, d6u32, 0);
d += dst_stride;
d7u32 = vld1_lane_u32((const uint32_t *)d, d7u32, 0);
d += dst_stride;
d6u32 = vld1_lane_u32((const uint32_t *)d, d6u32, 1);
d += dst_stride;
d7u32 = vld1_lane_u32((const uint32_t *)d, d7u32, 1);
d16s16 = vreinterpret_s16_u16(vget_low_u16(q8u16));
d17s16 = vreinterpret_s16_u16(vget_high_u16(q8u16));
d18s16 = vreinterpret_s16_u16(vget_low_u16(q9u16));
d19s16 = vreinterpret_s16_u16(vget_high_u16(q9u16));
d22s16 = vreinterpret_s16_u16(vget_low_u16(q11u16));
d24s16 = vreinterpret_s16_u16(vget_low_u16(q12u16));
d25s16 = vreinterpret_s16_u16(vget_high_u16(q12u16));
d26s16 = vreinterpret_s16_u16(vget_low_u16(q13u16));
d27s16 = vreinterpret_s16_u16(vget_high_u16(q13u16));
q1s32 = MULTIPLY_BY_Q0(d16s16, d17s16, d20s16, d22s16, d18s16, d19s16,
d23s16, d24s16, q0s16);
q2s32 = MULTIPLY_BY_Q0(d17s16, d20s16, d22s16, d18s16, d19s16, d23s16,
d24s16, d26s16, q0s16);
q14s32 = MULTIPLY_BY_Q0(d20s16, d22s16, d18s16, d19s16, d23s16, d24s16,
d26s16, d27s16, q0s16);
q15s32 = MULTIPLY_BY_Q0(d22s16, d18s16, d19s16, d23s16, d24s16, d26s16,
d27s16, d25s16, q0s16);
__builtin_prefetch(src + 64 + src_stride * 3);
d2u16 = vqrshrun_n_s32(q1s32, 7);
d3u16 = vqrshrun_n_s32(q2s32, 7);
d4u16 = vqrshrun_n_s32(q14s32, 7);
d5u16 = vqrshrun_n_s32(q15s32, 7);
q1u16 = vcombine_u16(d2u16, d3u16);
q2u16 = vcombine_u16(d4u16, d5u16);
d2u8 = vqmovn_u16(q1u16);
d3u8 = vqmovn_u16(q2u16);
d0x2u16 = vtrn_u16(vreinterpret_u16_u8(d2u8), vreinterpret_u16_u8(d3u8));
d0x2u32 = vtrn_u32(vreinterpret_u32_u16(d0x2u16.val[0]),
vreinterpret_u32_u16(d0x2u16.val[1]));
d0x2u8 = vtrn_u8(vreinterpret_u8_u32(d0x2u32.val[0]),
vreinterpret_u8_u32(d0x2u32.val[1]));
q1u8 = vcombine_u8(d0x2u8.val[0], d0x2u8.val[1]);
q3u8 = vreinterpretq_u8_u32(vcombine_u32(d6u32, d7u32));
q1u8 = vrhaddq_u8(q1u8, q3u8);
d2u32 = vreinterpret_u32_u8(vget_low_u8(q1u8));
d3u32 = vreinterpret_u32_u8(vget_high_u8(q1u8));
d = dst;
vst1_lane_u32((uint32_t *)d, d2u32, 0);
d += dst_stride;
vst1_lane_u32((uint32_t *)d, d3u32, 0);
d += dst_stride;
vst1_lane_u32((uint32_t *)d, d2u32, 1);
d += dst_stride;
vst1_lane_u32((uint32_t *)d, d3u32, 1);
q8u16 = q9u16;
d20s16 = d23s16;
q11u16 = q12u16;
q9u16 = q13u16;
d23s16 = vreinterpret_s16_u16(vget_high_u16(q11u16));
}
src += src_stride * 4 - w - 7;
dst += dst_stride * 4 - w;
}
return;
}
void aom_convolve8_avg_vert_neon(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, // unused
int x_step_q4, // unused
const int16_t *filter_y, int y_step_q4, int w,
int h) {
int height;
const uint8_t *s;
uint8_t *d;
uint8x8_t d2u8, d3u8;
uint32x2_t d2u32, d3u32, d6u32, d7u32;
uint32x2_t d16u32, d18u32, d20u32, d22u32, d24u32, d26u32;
uint8x16_t q1u8, q3u8;
int16x4_t d16s16, d17s16, d18s16, d19s16, d20s16, d21s16, d22s16;
int16x4_t d24s16, d25s16, d26s16, d27s16;
uint16x4_t d2u16, d3u16, d4u16, d5u16;
int16x8_t q0s16;
uint16x8_t q1u16, q2u16, q8u16, q9u16, q10u16, q11u16, q12u16, q13u16;
int32x4_t q1s32, q2s32, q14s32, q15s32;
assert(y_step_q4 == 16);
(void)x_step_q4;
(void)y_step_q4;
(void)filter_x;
src -= src_stride * 3;
q0s16 = vld1q_s16(filter_y);
for (; w > 0; w -= 4, src += 4, dst += 4) { // loop_vert_h
s = src;
d16u32 = vld1_lane_u32((const uint32_t *)s, d16u32, 0);
s += src_stride;
d16u32 = vld1_lane_u32((const uint32_t *)s, d16u32, 1);
s += src_stride;
d18u32 = vld1_lane_u32((const uint32_t *)s, d18u32, 0);
s += src_stride;
d18u32 = vld1_lane_u32((const uint32_t *)s, d18u32, 1);
s += src_stride;
d20u32 = vld1_lane_u32((const uint32_t *)s, d20u32, 0);
s += src_stride;
d20u32 = vld1_lane_u32((const uint32_t *)s, d20u32, 1);
s += src_stride;
d22u32 = vld1_lane_u32((const uint32_t *)s, d22u32, 0);
s += src_stride;
q8u16 = vmovl_u8(vreinterpret_u8_u32(d16u32));
q9u16 = vmovl_u8(vreinterpret_u8_u32(d18u32));
q10u16 = vmovl_u8(vreinterpret_u8_u32(d20u32));
q11u16 = vmovl_u8(vreinterpret_u8_u32(d22u32));
d18s16 = vreinterpret_s16_u16(vget_low_u16(q9u16));
d19s16 = vreinterpret_s16_u16(vget_high_u16(q9u16));
d22s16 = vreinterpret_s16_u16(vget_low_u16(q11u16));
d = dst;
for (height = h; height > 0; height -= 4) { // loop_vert
d24u32 = vld1_lane_u32((const uint32_t *)s, d24u32, 0);
s += src_stride;
d26u32 = vld1_lane_u32((const uint32_t *)s, d26u32, 0);
s += src_stride;
d26u32 = vld1_lane_u32((const uint32_t *)s, d26u32, 1);
s += src_stride;
d24u32 = vld1_lane_u32((const uint32_t *)s, d24u32, 1);
s += src_stride;
q12u16 = vmovl_u8(vreinterpret_u8_u32(d24u32));
q13u16 = vmovl_u8(vreinterpret_u8_u32(d26u32));
d6u32 = vld1_lane_u32((const uint32_t *)d, d6u32, 0);
d += dst_stride;
d6u32 = vld1_lane_u32((const uint32_t *)d, d6u32, 1);
d += dst_stride;
d7u32 = vld1_lane_u32((const uint32_t *)d, d7u32, 0);
d += dst_stride;
d7u32 = vld1_lane_u32((const uint32_t *)d, d7u32, 1);
d -= dst_stride * 3;
d16s16 = vreinterpret_s16_u16(vget_low_u16(q8u16));
d17s16 = vreinterpret_s16_u16(vget_high_u16(q8u16));
d20s16 = vreinterpret_s16_u16(vget_low_u16(q10u16));
d21s16 = vreinterpret_s16_u16(vget_high_u16(q10u16));
d24s16 = vreinterpret_s16_u16(vget_low_u16(q12u16));
d25s16 = vreinterpret_s16_u16(vget_high_u16(q12u16));
d26s16 = vreinterpret_s16_u16(vget_low_u16(q13u16));
d27s16 = vreinterpret_s16_u16(vget_high_u16(q13u16));
__builtin_prefetch(s);
__builtin_prefetch(s + src_stride);
q1s32 = MULTIPLY_BY_Q0(d16s16, d17s16, d18s16, d19s16, d20s16, d21s16,
d22s16, d24s16, q0s16);
__builtin_prefetch(s + src_stride * 2);
__builtin_prefetch(s + src_stride * 3);
q2s32 = MULTIPLY_BY_Q0(d17s16, d18s16, d19s16, d20s16, d21s16, d22s16,
d24s16, d26s16, q0s16);
__builtin_prefetch(d);
__builtin_prefetch(d + dst_stride);
q14s32 = MULTIPLY_BY_Q0(d18s16, d19s16, d20s16, d21s16, d22s16, d24s16,
d26s16, d27s16, q0s16);
__builtin_prefetch(d + dst_stride * 2);
__builtin_prefetch(d + dst_stride * 3);
q15s32 = MULTIPLY_BY_Q0(d19s16, d20s16, d21s16, d22s16, d24s16, d26s16,
d27s16, d25s16, q0s16);
d2u16 = vqrshrun_n_s32(q1s32, 7);
d3u16 = vqrshrun_n_s32(q2s32, 7);
d4u16 = vqrshrun_n_s32(q14s32, 7);
d5u16 = vqrshrun_n_s32(q15s32, 7);
q1u16 = vcombine_u16(d2u16, d3u16);
q2u16 = vcombine_u16(d4u16, d5u16);
d2u8 = vqmovn_u16(q1u16);
d3u8 = vqmovn_u16(q2u16);
q1u8 = vcombine_u8(d2u8, d3u8);
q3u8 = vreinterpretq_u8_u32(vcombine_u32(d6u32, d7u32));
q1u8 = vrhaddq_u8(q1u8, q3u8);
d2u32 = vreinterpret_u32_u8(vget_low_u8(q1u8));
d3u32 = vreinterpret_u32_u8(vget_high_u8(q1u8));
vst1_lane_u32((uint32_t *)d, d2u32, 0);
d += dst_stride;
vst1_lane_u32((uint32_t *)d, d2u32, 1);
d += dst_stride;
vst1_lane_u32((uint32_t *)d, d3u32, 0);
d += dst_stride;
vst1_lane_u32((uint32_t *)d, d3u32, 1);
d += dst_stride;
q8u16 = q10u16;
d18s16 = d22s16;
d19s16 = d24s16;
q10u16 = q13u16;
d22s16 = d25s16;
}
}
return;
}


@@ -1,331 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include <assert.h>
#include "./aom_config.h"
#include "./aom_dsp_rtcd.h"
#include "aom/aom_integer.h"
#include "aom_ports/mem.h"
static INLINE int32x4_t MULTIPLY_BY_Q0(int16x4_t dsrc0, int16x4_t dsrc1,
int16x4_t dsrc2, int16x4_t dsrc3,
int16x4_t dsrc4, int16x4_t dsrc5,
int16x4_t dsrc6, int16x4_t dsrc7,
int16x8_t q0s16) {
int32x4_t qdst;
int16x4_t d0s16, d1s16;
d0s16 = vget_low_s16(q0s16);
d1s16 = vget_high_s16(q0s16);
qdst = vmull_lane_s16(dsrc0, d0s16, 0);
qdst = vmlal_lane_s16(qdst, dsrc1, d0s16, 1);
qdst = vmlal_lane_s16(qdst, dsrc2, d0s16, 2);
qdst = vmlal_lane_s16(qdst, dsrc3, d0s16, 3);
qdst = vmlal_lane_s16(qdst, dsrc4, d1s16, 0);
qdst = vmlal_lane_s16(qdst, dsrc5, d1s16, 1);
qdst = vmlal_lane_s16(qdst, dsrc6, d1s16, 2);
qdst = vmlal_lane_s16(qdst, dsrc7, d1s16, 3);
return qdst;
}
void aom_convolve8_horiz_neon(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, // unused
int y_step_q4, // unused
int w, int h) {
int width;
const uint8_t *s, *psrc;
uint8_t *d, *pdst;
uint8x8_t d2u8, d3u8, d24u8, d25u8, d26u8, d27u8, d28u8, d29u8;
uint32x2_t d2u32, d3u32, d28u32, d29u32, d30u32, d31u32;
uint8x16_t q12u8, q13u8, q14u8, q15u8;
int16x4_t d16s16, d17s16, d18s16, d19s16, d20s16, d22s16, d23s16;
int16x4_t d24s16, d25s16, d26s16, d27s16;
uint16x4_t d2u16, d3u16, d4u16, d5u16, d16u16, d17u16, d18u16, d19u16;
int16x8_t q0s16;
uint16x8_t q1u16, q2u16, q8u16, q9u16, q10u16, q11u16, q12u16, q13u16;
int32x4_t q1s32, q2s32, q14s32, q15s32;
uint16x8x2_t q0x2u16;
uint8x8x2_t d0x2u8, d1x2u8;
uint32x2x2_t d0x2u32;
uint16x4x2_t d0x2u16, d1x2u16;
uint32x4x2_t q0x2u32;
assert(x_step_q4 == 16);
(void)x_step_q4;
(void)y_step_q4;
(void)filter_y;
q0s16 = vld1q_s16(filter_x);
src -= 3; // adjust for taps
for (; h > 0; h -= 4, src += src_stride * 4,
dst += dst_stride * 4) { // loop_horiz_v
s = src;
d24u8 = vld1_u8(s);
s += src_stride;
d25u8 = vld1_u8(s);
s += src_stride;
d26u8 = vld1_u8(s);
s += src_stride;
d27u8 = vld1_u8(s);
q12u8 = vcombine_u8(d24u8, d25u8);
q13u8 = vcombine_u8(d26u8, d27u8);
q0x2u16 =
vtrnq_u16(vreinterpretq_u16_u8(q12u8), vreinterpretq_u16_u8(q13u8));
d24u8 = vreinterpret_u8_u16(vget_low_u16(q0x2u16.val[0]));
d25u8 = vreinterpret_u8_u16(vget_high_u16(q0x2u16.val[0]));
d26u8 = vreinterpret_u8_u16(vget_low_u16(q0x2u16.val[1]));
d27u8 = vreinterpret_u8_u16(vget_high_u16(q0x2u16.val[1]));
d0x2u8 = vtrn_u8(d24u8, d25u8);
d1x2u8 = vtrn_u8(d26u8, d27u8);
__builtin_prefetch(src + src_stride * 4);
__builtin_prefetch(src + src_stride * 5);
__builtin_prefetch(src + src_stride * 6);
q8u16 = vmovl_u8(d0x2u8.val[0]);
q9u16 = vmovl_u8(d0x2u8.val[1]);
q10u16 = vmovl_u8(d1x2u8.val[0]);
q11u16 = vmovl_u8(d1x2u8.val[1]);
d16u16 = vget_low_u16(q8u16);
d17u16 = vget_high_u16(q8u16);
d18u16 = vget_low_u16(q9u16);
d19u16 = vget_high_u16(q9u16);
q8u16 = vcombine_u16(d16u16, d18u16); // vswp 17 18
q9u16 = vcombine_u16(d17u16, d19u16);
d20s16 = vreinterpret_s16_u16(vget_low_u16(q10u16));
d23s16 = vreinterpret_s16_u16(vget_high_u16(q10u16)); // vmov 23 21
for (width = w, psrc = src + 7, pdst = dst; width > 0;
width -= 4, psrc += 4, pdst += 4) { // loop_horiz
s = psrc;
d28u32 = vld1_dup_u32((const uint32_t *)s);
s += src_stride;
d29u32 = vld1_dup_u32((const uint32_t *)s);
s += src_stride;
d31u32 = vld1_dup_u32((const uint32_t *)s);
s += src_stride;
d30u32 = vld1_dup_u32((const uint32_t *)s);
__builtin_prefetch(psrc + 64);
d0x2u16 =
vtrn_u16(vreinterpret_u16_u32(d28u32), vreinterpret_u16_u32(d31u32));
d1x2u16 =
vtrn_u16(vreinterpret_u16_u32(d29u32), vreinterpret_u16_u32(d30u32));
d0x2u8 = vtrn_u8(vreinterpret_u8_u16(d0x2u16.val[0]), // d28
vreinterpret_u8_u16(d1x2u16.val[0])); // d29
d1x2u8 = vtrn_u8(vreinterpret_u8_u16(d0x2u16.val[1]), // d31
vreinterpret_u8_u16(d1x2u16.val[1])); // d30
__builtin_prefetch(psrc + 64 + src_stride);
q14u8 = vcombine_u8(d0x2u8.val[0], d0x2u8.val[1]);
q15u8 = vcombine_u8(d1x2u8.val[1], d1x2u8.val[0]);
q0x2u32 =
vtrnq_u32(vreinterpretq_u32_u8(q14u8), vreinterpretq_u32_u8(q15u8));
d28u8 = vreinterpret_u8_u32(vget_low_u32(q0x2u32.val[0]));
d29u8 = vreinterpret_u8_u32(vget_high_u32(q0x2u32.val[0]));
q12u16 = vmovl_u8(d28u8);
q13u16 = vmovl_u8(d29u8);
__builtin_prefetch(psrc + 64 + src_stride * 2);
d16s16 = vreinterpret_s16_u16(vget_low_u16(q8u16));
d17s16 = vreinterpret_s16_u16(vget_high_u16(q8u16));
d18s16 = vreinterpret_s16_u16(vget_low_u16(q9u16));
d19s16 = vreinterpret_s16_u16(vget_high_u16(q9u16));
d22s16 = vreinterpret_s16_u16(vget_low_u16(q11u16));
d24s16 = vreinterpret_s16_u16(vget_low_u16(q12u16));
d25s16 = vreinterpret_s16_u16(vget_high_u16(q12u16));
d26s16 = vreinterpret_s16_u16(vget_low_u16(q13u16));
d27s16 = vreinterpret_s16_u16(vget_high_u16(q13u16));
q1s32 = MULTIPLY_BY_Q0(d16s16, d17s16, d20s16, d22s16, d18s16, d19s16,
d23s16, d24s16, q0s16);
q2s32 = MULTIPLY_BY_Q0(d17s16, d20s16, d22s16, d18s16, d19s16, d23s16,
d24s16, d26s16, q0s16);
q14s32 = MULTIPLY_BY_Q0(d20s16, d22s16, d18s16, d19s16, d23s16, d24s16,
d26s16, d27s16, q0s16);
q15s32 = MULTIPLY_BY_Q0(d22s16, d18s16, d19s16, d23s16, d24s16, d26s16,
d27s16, d25s16, q0s16);
__builtin_prefetch(psrc + 60 + src_stride * 3);
d2u16 = vqrshrun_n_s32(q1s32, 7);
d3u16 = vqrshrun_n_s32(q2s32, 7);
d4u16 = vqrshrun_n_s32(q14s32, 7);
d5u16 = vqrshrun_n_s32(q15s32, 7);
q1u16 = vcombine_u16(d2u16, d3u16);
q2u16 = vcombine_u16(d4u16, d5u16);
d2u8 = vqmovn_u16(q1u16);
d3u8 = vqmovn_u16(q2u16);
d0x2u16 = vtrn_u16(vreinterpret_u16_u8(d2u8), vreinterpret_u16_u8(d3u8));
d0x2u32 = vtrn_u32(vreinterpret_u32_u16(d0x2u16.val[0]),
vreinterpret_u32_u16(d0x2u16.val[1]));
d0x2u8 = vtrn_u8(vreinterpret_u8_u32(d0x2u32.val[0]),
vreinterpret_u8_u32(d0x2u32.val[1]));
d2u32 = vreinterpret_u32_u8(d0x2u8.val[0]);
d3u32 = vreinterpret_u32_u8(d0x2u8.val[1]);
d = pdst;
vst1_lane_u32((uint32_t *)d, d2u32, 0);
d += dst_stride;
vst1_lane_u32((uint32_t *)d, d3u32, 0);
d += dst_stride;
vst1_lane_u32((uint32_t *)d, d2u32, 1);
d += dst_stride;
vst1_lane_u32((uint32_t *)d, d3u32, 1);
q8u16 = q9u16;
d20s16 = d23s16;
q11u16 = q12u16;
q9u16 = q13u16;
d23s16 = vreinterpret_s16_u16(vget_high_u16(q11u16));
}
}
return;
}
void aom_convolve8_vert_neon(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, // unused
int x_step_q4, // unused
const int16_t *filter_y, int y_step_q4, int w,
int h) {
int height;
const uint8_t *s;
uint8_t *d;
uint32x2_t d2u32, d3u32;
uint32x2_t d16u32, d18u32, d20u32, d22u32, d24u32, d26u32;
int16x4_t d16s16, d17s16, d18s16, d19s16, d20s16, d21s16, d22s16;
int16x4_t d24s16, d25s16, d26s16, d27s16;
uint16x4_t d2u16, d3u16, d4u16, d5u16;
int16x8_t q0s16;
uint16x8_t q1u16, q2u16, q8u16, q9u16, q10u16, q11u16, q12u16, q13u16;
int32x4_t q1s32, q2s32, q14s32, q15s32;
assert(y_step_q4 == 16);
(void)x_step_q4;
(void)y_step_q4;
(void)filter_x;
src -= src_stride * 3;
q0s16 = vld1q_s16(filter_y);
for (; w > 0; w -= 4, src += 4, dst += 4) { // loop_vert_h
s = src;
d16u32 = vld1_lane_u32((const uint32_t *)s, d16u32, 0);
s += src_stride;
d16u32 = vld1_lane_u32((const uint32_t *)s, d16u32, 1);
s += src_stride;
d18u32 = vld1_lane_u32((const uint32_t *)s, d18u32, 0);
s += src_stride;
d18u32 = vld1_lane_u32((const uint32_t *)s, d18u32, 1);
s += src_stride;
d20u32 = vld1_lane_u32((const uint32_t *)s, d20u32, 0);
s += src_stride;
d20u32 = vld1_lane_u32((const uint32_t *)s, d20u32, 1);
s += src_stride;
d22u32 = vld1_lane_u32((const uint32_t *)s, d22u32, 0);
s += src_stride;
q8u16 = vmovl_u8(vreinterpret_u8_u32(d16u32));
q9u16 = vmovl_u8(vreinterpret_u8_u32(d18u32));
q10u16 = vmovl_u8(vreinterpret_u8_u32(d20u32));
q11u16 = vmovl_u8(vreinterpret_u8_u32(d22u32));
d18s16 = vreinterpret_s16_u16(vget_low_u16(q9u16));
d19s16 = vreinterpret_s16_u16(vget_high_u16(q9u16));
d22s16 = vreinterpret_s16_u16(vget_low_u16(q11u16));
d = dst;
for (height = h; height > 0; height -= 4) { // loop_vert
d24u32 = vld1_lane_u32((const uint32_t *)s, d24u32, 0);
s += src_stride;
d26u32 = vld1_lane_u32((const uint32_t *)s, d26u32, 0);
s += src_stride;
d26u32 = vld1_lane_u32((const uint32_t *)s, d26u32, 1);
s += src_stride;
d24u32 = vld1_lane_u32((const uint32_t *)s, d24u32, 1);
s += src_stride;
q12u16 = vmovl_u8(vreinterpret_u8_u32(d24u32));
q13u16 = vmovl_u8(vreinterpret_u8_u32(d26u32));
d16s16 = vreinterpret_s16_u16(vget_low_u16(q8u16));
d17s16 = vreinterpret_s16_u16(vget_high_u16(q8u16));
d20s16 = vreinterpret_s16_u16(vget_low_u16(q10u16));
d21s16 = vreinterpret_s16_u16(vget_high_u16(q10u16));
d24s16 = vreinterpret_s16_u16(vget_low_u16(q12u16));
d25s16 = vreinterpret_s16_u16(vget_high_u16(q12u16));
d26s16 = vreinterpret_s16_u16(vget_low_u16(q13u16));
d27s16 = vreinterpret_s16_u16(vget_high_u16(q13u16));
__builtin_prefetch(d);
__builtin_prefetch(d + dst_stride);
q1s32 = MULTIPLY_BY_Q0(d16s16, d17s16, d18s16, d19s16, d20s16, d21s16,
d22s16, d24s16, q0s16);
__builtin_prefetch(d + dst_stride * 2);
__builtin_prefetch(d + dst_stride * 3);
q2s32 = MULTIPLY_BY_Q0(d17s16, d18s16, d19s16, d20s16, d21s16, d22s16,
d24s16, d26s16, q0s16);
__builtin_prefetch(s);
__builtin_prefetch(s + src_stride);
q14s32 = MULTIPLY_BY_Q0(d18s16, d19s16, d20s16, d21s16, d22s16, d24s16,
d26s16, d27s16, q0s16);
__builtin_prefetch(s + src_stride * 2);
__builtin_prefetch(s + src_stride * 3);
q15s32 = MULTIPLY_BY_Q0(d19s16, d20s16, d21s16, d22s16, d24s16, d26s16,
d27s16, d25s16, q0s16);
d2u16 = vqrshrun_n_s32(q1s32, 7);
d3u16 = vqrshrun_n_s32(q2s32, 7);
d4u16 = vqrshrun_n_s32(q14s32, 7);
d5u16 = vqrshrun_n_s32(q15s32, 7);
q1u16 = vcombine_u16(d2u16, d3u16);
q2u16 = vcombine_u16(d4u16, d5u16);
d2u32 = vreinterpret_u32_u8(vqmovn_u16(q1u16));
d3u32 = vreinterpret_u32_u8(vqmovn_u16(q2u16));
vst1_lane_u32((uint32_t *)d, d2u32, 0);
d += dst_stride;
vst1_lane_u32((uint32_t *)d, d2u32, 1);
d += dst_stride;
vst1_lane_u32((uint32_t *)d, d3u32, 0);
d += dst_stride;
vst1_lane_u32((uint32_t *)d, d3u32, 1);
d += dst_stride;
q8u16 = q10u16;
d18s16 = d22s16;
d19s16 = d24s16;
q10u16 = q13u16;
d22s16 = d25s16;
}
}
return;
}
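For readers unfamiliar with the intrinsics, the per-pixel arithmetic of the NEON kernels above can be written in scalar form: `MULTIPLY_BY_Q0` accumulates eight tap products, `vqrshrun_n_s32(..., 7)` applies a rounding shift with unsigned saturation, and `vqmovn_u16` saturates to 8 bits. This is a minimal sketch of that arithmetic only. The name `sketch_convolve8_pixel` is hypothetical, it handles one output pixel, and it assumes `src` already points at the leftmost of the eight taps.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar sketch of one 8-tap output pixel. Assumption: src points at the
 * first tap, i.e. three samples before the pixel being filtered. */
static uint8_t sketch_convolve8_pixel(const uint8_t *src,
                                      const int16_t *filter) {
  int32_t sum = 0;
  int k;
  for (k = 0; k < 8; ++k) sum += src[k] * filter[k];
  sum += 64;               /* rounding term for a shift by 7 */
  if (sum < 0) return 0;   /* unsigned saturation of negative results */
  sum >>= 7;
  return (uint8_t)(sum > 255 ? 255 : sum);
}
```

With the identity kernel `{ 0, 0, 0, 128, 0, 0, 0, 0 }` the function returns the fourth sample unchanged, which is a convenient sanity check when comparing against a vectorized implementation.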


@@ -1,145 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_dsp_rtcd.h"
#include "aom/aom_integer.h"
void aom_convolve_avg_neon(const uint8_t *src, // r0
ptrdiff_t src_stride, // r1
uint8_t *dst, // r2
ptrdiff_t dst_stride, // r3
const int16_t *filter_x, int filter_x_stride,
const int16_t *filter_y, int filter_y_stride, int w,
int h) {
uint8_t *d;
uint8x8_t d0u8, d1u8, d2u8, d3u8;
uint32x2_t d0u32, d2u32;
uint8x16_t q0u8, q1u8, q2u8, q3u8, q8u8, q9u8, q10u8, q11u8;
(void)filter_x;
(void)filter_x_stride;
(void)filter_y;
(void)filter_y_stride;
d = dst;
if (w > 32) { // avg64
for (; h > 0; h -= 1) {
q0u8 = vld1q_u8(src);
q1u8 = vld1q_u8(src + 16);
q2u8 = vld1q_u8(src + 32);
q3u8 = vld1q_u8(src + 48);
src += src_stride;
q8u8 = vld1q_u8(d);
q9u8 = vld1q_u8(d + 16);
q10u8 = vld1q_u8(d + 32);
q11u8 = vld1q_u8(d + 48);
d += dst_stride;
q0u8 = vrhaddq_u8(q0u8, q8u8);
q1u8 = vrhaddq_u8(q1u8, q9u8);
q2u8 = vrhaddq_u8(q2u8, q10u8);
q3u8 = vrhaddq_u8(q3u8, q11u8);
vst1q_u8(dst, q0u8);
vst1q_u8(dst + 16, q1u8);
vst1q_u8(dst + 32, q2u8);
vst1q_u8(dst + 48, q3u8);
dst += dst_stride;
}
} else if (w == 32) { // avg32
for (; h > 0; h -= 2) {
q0u8 = vld1q_u8(src);
q1u8 = vld1q_u8(src + 16);
src += src_stride;
q2u8 = vld1q_u8(src);
q3u8 = vld1q_u8(src + 16);
src += src_stride;
q8u8 = vld1q_u8(d);
q9u8 = vld1q_u8(d + 16);
d += dst_stride;
q10u8 = vld1q_u8(d);
q11u8 = vld1q_u8(d + 16);
d += dst_stride;
q0u8 = vrhaddq_u8(q0u8, q8u8);
q1u8 = vrhaddq_u8(q1u8, q9u8);
q2u8 = vrhaddq_u8(q2u8, q10u8);
q3u8 = vrhaddq_u8(q3u8, q11u8);
vst1q_u8(dst, q0u8);
vst1q_u8(dst + 16, q1u8);
dst += dst_stride;
vst1q_u8(dst, q2u8);
vst1q_u8(dst + 16, q3u8);
dst += dst_stride;
}
} else if (w > 8) { // avg16
for (; h > 0; h -= 2) {
q0u8 = vld1q_u8(src);
src += src_stride;
q1u8 = vld1q_u8(src);
src += src_stride;
q2u8 = vld1q_u8(d);
d += dst_stride;
q3u8 = vld1q_u8(d);
d += dst_stride;
q0u8 = vrhaddq_u8(q0u8, q2u8);
q1u8 = vrhaddq_u8(q1u8, q3u8);
vst1q_u8(dst, q0u8);
dst += dst_stride;
vst1q_u8(dst, q1u8);
dst += dst_stride;
}
} else if (w == 8) { // avg8
for (; h > 0; h -= 2) {
d0u8 = vld1_u8(src);
src += src_stride;
d1u8 = vld1_u8(src);
src += src_stride;
d2u8 = vld1_u8(d);
d += dst_stride;
d3u8 = vld1_u8(d);
d += dst_stride;
q0u8 = vcombine_u8(d0u8, d1u8);
q1u8 = vcombine_u8(d2u8, d3u8);
q0u8 = vrhaddq_u8(q0u8, q1u8);
vst1_u8(dst, vget_low_u8(q0u8));
dst += dst_stride;
vst1_u8(dst, vget_high_u8(q0u8));
dst += dst_stride;
}
} else { // avg4
for (; h > 0; h -= 2) {
d0u32 = vld1_lane_u32((const uint32_t *)src, d0u32, 0);
src += src_stride;
d0u32 = vld1_lane_u32((const uint32_t *)src, d0u32, 1);
src += src_stride;
d2u32 = vld1_lane_u32((const uint32_t *)d, d2u32, 0);
d += dst_stride;
d2u32 = vld1_lane_u32((const uint32_t *)d, d2u32, 1);
d += dst_stride;
d0u8 = vrhadd_u8(vreinterpret_u8_u32(d0u32), vreinterpret_u8_u32(d2u32));
d0u32 = vreinterpret_u32_u8(d0u8);
vst1_lane_u32((uint32_t *)dst, d0u32, 0);
dst += dst_stride;
vst1_lane_u32((uint32_t *)dst, d0u32, 1);
dst += dst_stride;
}
}
return;
}
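
For reference, each `vrhaddq_u8` / `vrhadd_u8` call in the function above computes a rounding halving add per byte lane, i.e. `(a + b + 1) >> 1`. Below is a minimal scalar sketch of the same averaging. The helper names `rounding_avg_u8` and `convolve_avg_scalar` are hypothetical and do not appear in the deleted file; the sketch ignores the unused filter arguments of the real function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounding average of two bytes: the per-lane result of NEON vrhadd. */
static uint8_t rounding_avg_u8(uint8_t a, uint8_t b) {
  return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Hypothetical scalar counterpart of the NEON loop:
 * dst[x] = avg(src[x], dst[x]) over a w x h block. */
static void convolve_avg_scalar(const uint8_t *src, ptrdiff_t src_stride,
                                uint8_t *dst, ptrdiff_t dst_stride, int w,
                                int h) {
  int x, y;
  assert(w > 0 && h > 0);
  for (y = 0; y < h; ++y) {
    for (x = 0; x < w; ++x) dst[x] = rounding_avg_u8(src[x], dst[x]);
    src += src_stride;
    dst += dst_stride;
  }
}
```

A scalar version like this is mainly useful as a test oracle for the vectorised paths, since it handles every width with a single loop.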

@@ -1,93 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_dsp_rtcd.h"
#include "aom/aom_integer.h"
void aom_convolve_copy_neon(const uint8_t *src, // r0
ptrdiff_t src_stride, // r1
uint8_t *dst, // r2
ptrdiff_t dst_stride, // r3
const int16_t *filter_x, int filter_x_stride,
const int16_t *filter_y, int filter_y_stride, int w,
int h) {
uint8x8_t d0u8, d2u8;
uint8x16_t q0u8, q1u8, q2u8, q3u8;
(void)filter_x;
(void)filter_x_stride;
(void)filter_y;
(void)filter_y_stride;
if (w > 32) { // copy64
for (; h > 0; h--) {
q0u8 = vld1q_u8(src);
q1u8 = vld1q_u8(src + 16);
q2u8 = vld1q_u8(src + 32);
q3u8 = vld1q_u8(src + 48);
src += src_stride;
vst1q_u8(dst, q0u8);
vst1q_u8(dst + 16, q1u8);
vst1q_u8(dst + 32, q2u8);
vst1q_u8(dst + 48, q3u8);
dst += dst_stride;
}
} else if (w == 32) { // copy32
for (; h > 0; h -= 2) {
q0u8 = vld1q_u8(src);
q1u8 = vld1q_u8(src + 16);
src += src_stride;
q2u8 = vld1q_u8(src);
q3u8 = vld1q_u8(src + 16);
src += src_stride;
vst1q_u8(dst, q0u8);
vst1q_u8(dst + 16, q1u8);
dst += dst_stride;
vst1q_u8(dst, q2u8);
vst1q_u8(dst + 16, q3u8);
dst += dst_stride;
}
} else if (w > 8) { // copy16
for (; h > 0; h -= 2) {
q0u8 = vld1q_u8(src);
src += src_stride;
q1u8 = vld1q_u8(src);
src += src_stride;
vst1q_u8(dst, q0u8);
dst += dst_stride;
vst1q_u8(dst, q1u8);
dst += dst_stride;
}
} else if (w == 8) { // copy8
for (; h > 0; h -= 2) {
d0u8 = vld1_u8(src);
src += src_stride;
d2u8 = vld1_u8(src);
src += src_stride;
vst1_u8(dst, d0u8);
dst += dst_stride;
vst1_u8(dst, d2u8);
dst += dst_stride;
}
} else { // copy4
for (; h > 0; h--) {
*(uint32_t *)dst = *(const uint32_t *)src;
src += src_stride;
dst += dst_stride;
}
}
return;
}

@@ -1,240 +0,0 @@
;
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
;
EXPORT |aom_filter_block2d_bil_first_pass_media|
EXPORT |aom_filter_block2d_bil_second_pass_media|
AREA |.text|, CODE, READONLY ; name this block of code
;-------------------------------------
; r0 unsigned char *src_ptr,
; r1 unsigned short *dst_ptr,
; r2 unsigned int src_pitch,
; r3 unsigned int height,
; stack unsigned int width,
; stack const short *aom_filter
;-------------------------------------
; The output is stored transposed in the output array to make second-pass filtering easier.
|aom_filter_block2d_bil_first_pass_media| PROC
stmdb sp!, {r4 - r11, lr}
ldr r11, [sp, #40] ; aom_filter address
ldr r4, [sp, #36] ; width
mov r12, r3 ; outer-loop counter
add r7, r2, r4 ; preload next row
pld [r0, r7]
sub r2, r2, r4 ; src increment for height loop
ldr r5, [r11] ; load up filter coefficients
mov r3, r3, lsl #1 ; height*2
add r3, r3, #2 ; plus 2 to make output buffer 4-byte aligned since height is actually (height+1)
mov r11, r1 ; save dst_ptr for each row
cmp r5, #128 ; if filter coef = 128, then skip the filter
beq bil_null_1st_filter
|bil_height_loop_1st_v6|
ldrb r6, [r0] ; load source data
ldrb r7, [r0, #1]
ldrb r8, [r0, #2]
mov lr, r4, lsr #2 ; 4-in-parallel loop counter
|bil_width_loop_1st_v6|
ldrb r9, [r0, #3]
ldrb r10, [r0, #4]
pkhbt r6, r6, r7, lsl #16 ; src[1] | src[0]
pkhbt r7, r7, r8, lsl #16 ; src[2] | src[1]
smuad r6, r6, r5 ; apply the filter
pkhbt r8, r8, r9, lsl #16 ; src[3] | src[2]
smuad r7, r7, r5
pkhbt r9, r9, r10, lsl #16 ; src[4] | src[3]
smuad r8, r8, r5
smuad r9, r9, r5
add r0, r0, #4
subs lr, lr, #1
add r6, r6, #0x40 ; round_shift_and_clamp
add r7, r7, #0x40
usat r6, #16, r6, asr #7
usat r7, #16, r7, asr #7
strh r6, [r1], r3 ; result is transposed and stored
add r8, r8, #0x40 ; round_shift_and_clamp
strh r7, [r1], r3
add r9, r9, #0x40
usat r8, #16, r8, asr #7
usat r9, #16, r9, asr #7
strh r8, [r1], r3 ; result is transposed and stored
ldrneb r6, [r0] ; load source data
strh r9, [r1], r3
ldrneb r7, [r0, #1]
ldrneb r8, [r0, #2]
bne bil_width_loop_1st_v6
add r0, r0, r2 ; move to next input row
subs r12, r12, #1
add r9, r2, r4, lsl #1 ; adding back block width
pld [r0, r9] ; preload next row
add r11, r11, #2 ; move over to next column
mov r1, r11
bne bil_height_loop_1st_v6
ldmia sp!, {r4 - r11, pc}
|bil_null_1st_filter|
|bil_height_loop_null_1st|
mov lr, r4, lsr #2 ; loop counter
|bil_width_loop_null_1st|
ldrb r6, [r0] ; load data
ldrb r7, [r0, #1]
ldrb r8, [r0, #2]
ldrb r9, [r0, #3]
strh r6, [r1], r3 ; store it to intermediate buffer
add r0, r0, #4
strh r7, [r1], r3
subs lr, lr, #1
strh r8, [r1], r3
strh r9, [r1], r3
bne bil_width_loop_null_1st
subs r12, r12, #1
add r0, r0, r2 ; move to next input line
add r11, r11, #2 ; move over to next column
mov r1, r11
bne bil_height_loop_null_1st
ldmia sp!, {r4 - r11, pc}
ENDP ; |aom_filter_block2d_bil_first_pass_media|
;---------------------------------
; r0 unsigned short *src_ptr,
; r1 unsigned char *dst_ptr,
; r2 int dst_pitch,
; r3 unsigned int height,
; stack unsigned int width,
; stack const short *aom_filter
;---------------------------------
|aom_filter_block2d_bil_second_pass_media| PROC
stmdb sp!, {r4 - r11, lr}
ldr r11, [sp, #40] ; aom_filter address
ldr r4, [sp, #36] ; width
ldr r5, [r11] ; load up filter coefficients
mov r12, r4 ; outer-loop counter = width, since we work on transposed data matrix
mov r11, r1
cmp r5, #128 ; if filter coef = 128, then skip the filter
beq bil_null_2nd_filter
|bil_height_loop_2nd|
ldr r6, [r0] ; load the data
ldr r8, [r0, #4]
ldrh r10, [r0, #8]
mov lr, r3, lsr #2 ; loop counter
|bil_width_loop_2nd|
pkhtb r7, r6, r8 ; src[1] | src[2]
pkhtb r9, r8, r10 ; src[3] | src[4]
smuad r6, r6, r5 ; apply filter
smuad r8, r8, r5 ; apply filter
subs lr, lr, #1
smuadx r7, r7, r5 ; apply filter
smuadx r9, r9, r5 ; apply filter
add r0, r0, #8
add r6, r6, #0x40 ; round_shift_and_clamp
add r7, r7, #0x40
usat r6, #8, r6, asr #7
usat r7, #8, r7, asr #7
strb r6, [r1], r2 ; the result is transposed back and stored
add r8, r8, #0x40 ; round_shift_and_clamp
strb r7, [r1], r2
add r9, r9, #0x40
usat r8, #8, r8, asr #7
usat r9, #8, r9, asr #7
strb r8, [r1], r2 ; the result is transposed back and stored
ldrne r6, [r0] ; load data
strb r9, [r1], r2
ldrne r8, [r0, #4]
ldrneh r10, [r0, #8]
bne bil_width_loop_2nd
subs r12, r12, #1
add r0, r0, #4 ; update src for next row
add r11, r11, #1
mov r1, r11
bne bil_height_loop_2nd
ldmia sp!, {r4 - r11, pc}
|bil_null_2nd_filter|
|bil_height_loop_null_2nd|
mov lr, r3, lsr #2
|bil_width_loop_null_2nd|
ldr r6, [r0], #4 ; load data
subs lr, lr, #1
ldr r8, [r0], #4
strb r6, [r1], r2 ; store data
mov r7, r6, lsr #16
strb r7, [r1], r2
mov r9, r8, lsr #16
strb r8, [r1], r2
strb r9, [r1], r2
bne bil_width_loop_null_2nd
subs r12, r12, #1
add r0, r0, #4
add r11, r11, #1
mov r1, r11
bne bil_height_loop_null_2nd
ldmia sp!, {r4 - r11, pc}
ENDP ; |aom_filter_block2d_bil_second_pass_media|
END

@@ -1,199 +0,0 @@
/*
* Copyright (c) 2016 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include <arm_neon.h>
#include "./aom_dsp_rtcd.h"
static void hadamard8x8_one_pass(int16x8_t *a0, int16x8_t *a1, int16x8_t *a2,
int16x8_t *a3, int16x8_t *a4, int16x8_t *a5,
int16x8_t *a6, int16x8_t *a7) {
const int16x8_t b0 = vaddq_s16(*a0, *a1);
const int16x8_t b1 = vsubq_s16(*a0, *a1);
const int16x8_t b2 = vaddq_s16(*a2, *a3);
const int16x8_t b3 = vsubq_s16(*a2, *a3);
const int16x8_t b4 = vaddq_s16(*a4, *a5);
const int16x8_t b5 = vsubq_s16(*a4, *a5);
const int16x8_t b6 = vaddq_s16(*a6, *a7);
const int16x8_t b7 = vsubq_s16(*a6, *a7);
const int16x8_t c0 = vaddq_s16(b0, b2);
const int16x8_t c1 = vaddq_s16(b1, b3);
const int16x8_t c2 = vsubq_s16(b0, b2);
const int16x8_t c3 = vsubq_s16(b1, b3);
const int16x8_t c4 = vaddq_s16(b4, b6);
const int16x8_t c5 = vaddq_s16(b5, b7);
const int16x8_t c6 = vsubq_s16(b4, b6);
const int16x8_t c7 = vsubq_s16(b5, b7);
*a0 = vaddq_s16(c0, c4);
*a1 = vsubq_s16(c2, c6);
*a2 = vsubq_s16(c0, c4);
*a3 = vaddq_s16(c2, c6);
*a4 = vaddq_s16(c3, c7);
*a5 = vsubq_s16(c3, c7);
*a6 = vsubq_s16(c1, c5);
*a7 = vaddq_s16(c1, c5);
}
// TODO(johannkoenig): Make a transpose library and dedup with idct. Consider
// reversing transpose order which may make it easier for the compiler to
// reconcile the vtrn.64 moves.
static void transpose8x8(int16x8_t *a0, int16x8_t *a1, int16x8_t *a2,
int16x8_t *a3, int16x8_t *a4, int16x8_t *a5,
int16x8_t *a6, int16x8_t *a7) {
// Swap 64 bit elements. Goes from:
// a0: 00 01 02 03 04 05 06 07
// a1: 08 09 10 11 12 13 14 15
// a2: 16 17 18 19 20 21 22 23
// a3: 24 25 26 27 28 29 30 31
// a4: 32 33 34 35 36 37 38 39
// a5: 40 41 42 43 44 45 46 47
// a6: 48 49 50 51 52 53 54 55
// a7: 56 57 58 59 60 61 62 63
// to:
// a04_lo: 00 01 02 03 32 33 34 35
// a15_lo: 08 09 10 11 40 41 42 43
// a26_lo: 16 17 18 19 48 49 50 51
// a37_lo: 24 25 26 27 56 57 58 59
// a04_hi: 04 05 06 07 36 37 38 39
// a15_hi: 12 13 14 15 44 45 46 47
// a26_hi: 20 21 22 23 52 53 54 55
// a37_hi: 28 29 30 31 60 61 62 63
const int16x8_t a04_lo = vcombine_s16(vget_low_s16(*a0), vget_low_s16(*a4));
const int16x8_t a15_lo = vcombine_s16(vget_low_s16(*a1), vget_low_s16(*a5));
const int16x8_t a26_lo = vcombine_s16(vget_low_s16(*a2), vget_low_s16(*a6));
const int16x8_t a37_lo = vcombine_s16(vget_low_s16(*a3), vget_low_s16(*a7));
const int16x8_t a04_hi = vcombine_s16(vget_high_s16(*a0), vget_high_s16(*a4));
const int16x8_t a15_hi = vcombine_s16(vget_high_s16(*a1), vget_high_s16(*a5));
const int16x8_t a26_hi = vcombine_s16(vget_high_s16(*a2), vget_high_s16(*a6));
const int16x8_t a37_hi = vcombine_s16(vget_high_s16(*a3), vget_high_s16(*a7));
// Swap 32 bit elements resulting in:
// a0246_lo:
// 00 01 16 17 32 33 48 49
// 02 03 18 19 34 35 50 51
// a1357_lo:
// 08 09 24 25 40 41 56 57
// 10 11 26 27 42 43 58 59
// a0246_hi:
// 04 05 20 21 36 37 52 53
// 06 07 22 23 38 39 54 55
// a1357_hi:
// 12 13 28 29 44 45 60 61
// 14 15 30 31 46 47 62 63
const int32x4x2_t a0246_lo =
vtrnq_s32(vreinterpretq_s32_s16(a04_lo), vreinterpretq_s32_s16(a26_lo));
const int32x4x2_t a1357_lo =
vtrnq_s32(vreinterpretq_s32_s16(a15_lo), vreinterpretq_s32_s16(a37_lo));
const int32x4x2_t a0246_hi =
vtrnq_s32(vreinterpretq_s32_s16(a04_hi), vreinterpretq_s32_s16(a26_hi));
const int32x4x2_t a1357_hi =
vtrnq_s32(vreinterpretq_s32_s16(a15_hi), vreinterpretq_s32_s16(a37_hi));
// Swap 16 bit elements resulting in:
// b0:
// 00 08 16 24 32 40 48 56
// 01 09 17 25 33 41 49 57
// b1:
// 02 10 18 26 34 42 50 58
// 03 11 19 27 35 43 51 59
// b2:
// 04 12 20 28 36 44 52 60
// 05 13 21 29 37 45 53 61
// b3:
// 06 14 22 30 38 46 54 62
// 07 15 23 31 39 47 55 63
const int16x8x2_t b0 = vtrnq_s16(vreinterpretq_s16_s32(a0246_lo.val[0]),
vreinterpretq_s16_s32(a1357_lo.val[0]));
const int16x8x2_t b1 = vtrnq_s16(vreinterpretq_s16_s32(a0246_lo.val[1]),
vreinterpretq_s16_s32(a1357_lo.val[1]));
const int16x8x2_t b2 = vtrnq_s16(vreinterpretq_s16_s32(a0246_hi.val[0]),
vreinterpretq_s16_s32(a1357_hi.val[0]));
const int16x8x2_t b3 = vtrnq_s16(vreinterpretq_s16_s32(a0246_hi.val[1]),
vreinterpretq_s16_s32(a1357_hi.val[1]));
*a0 = b0.val[0];
*a1 = b0.val[1];
*a2 = b1.val[0];
*a3 = b1.val[1];
*a4 = b2.val[0];
*a5 = b2.val[1];
*a6 = b3.val[0];
*a7 = b3.val[1];
}
void aom_hadamard_8x8_neon(const int16_t *src_diff, int src_stride,
int16_t *coeff) {
int16x8_t a0 = vld1q_s16(src_diff);
int16x8_t a1 = vld1q_s16(src_diff + src_stride);
int16x8_t a2 = vld1q_s16(src_diff + 2 * src_stride);
int16x8_t a3 = vld1q_s16(src_diff + 3 * src_stride);
int16x8_t a4 = vld1q_s16(src_diff + 4 * src_stride);
int16x8_t a5 = vld1q_s16(src_diff + 5 * src_stride);
int16x8_t a6 = vld1q_s16(src_diff + 6 * src_stride);
int16x8_t a7 = vld1q_s16(src_diff + 7 * src_stride);
hadamard8x8_one_pass(&a0, &a1, &a2, &a3, &a4, &a5, &a6, &a7);
transpose8x8(&a0, &a1, &a2, &a3, &a4, &a5, &a6, &a7);
hadamard8x8_one_pass(&a0, &a1, &a2, &a3, &a4, &a5, &a6, &a7);
// Skip the second transpose because it is not required.
vst1q_s16(coeff + 0, a0);
vst1q_s16(coeff + 8, a1);
vst1q_s16(coeff + 16, a2);
vst1q_s16(coeff + 24, a3);
vst1q_s16(coeff + 32, a4);
vst1q_s16(coeff + 40, a5);
vst1q_s16(coeff + 48, a6);
vst1q_s16(coeff + 56, a7);
}
void aom_hadamard_16x16_neon(const int16_t *src_diff, int src_stride,
int16_t *coeff) {
int i;
/* Rearrange 16x16 to 8x32 and remove stride.
* Top left first. */
aom_hadamard_8x8_neon(src_diff + 0 + 0 * src_stride, src_stride, coeff + 0);
/* Top right. */
aom_hadamard_8x8_neon(src_diff + 8 + 0 * src_stride, src_stride, coeff + 64);
/* Bottom left. */
aom_hadamard_8x8_neon(src_diff + 0 + 8 * src_stride, src_stride, coeff + 128);
/* Bottom right. */
aom_hadamard_8x8_neon(src_diff + 8 + 8 * src_stride, src_stride, coeff + 192);
for (i = 0; i < 64; i += 8) {
const int16x8_t a0 = vld1q_s16(coeff + 0);
const int16x8_t a1 = vld1q_s16(coeff + 64);
const int16x8_t a2 = vld1q_s16(coeff + 128);
const int16x8_t a3 = vld1q_s16(coeff + 192);
const int16x8_t b0 = vhaddq_s16(a0, a1);
const int16x8_t b1 = vhsubq_s16(a0, a1);
const int16x8_t b2 = vhaddq_s16(a2, a3);
const int16x8_t b3 = vhsubq_s16(a2, a3);
const int16x8_t c0 = vaddq_s16(b0, b2);
const int16x8_t c1 = vaddq_s16(b1, b3);
const int16x8_t c2 = vsubq_s16(b0, b2);
const int16x8_t c3 = vsubq_s16(b1, b3);
vst1q_s16(coeff + 0, c0);
vst1q_s16(coeff + 64, c1);
vst1q_s16(coeff + 128, c2);
vst1q_s16(coeff + 192, c3);
coeff += 8;
}
}
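
To make the butterfly in `hadamard8x8_one_pass` easier to follow, here is a scalar sketch of what a single lane undergoes: three add/subtract stages over eight values, with the same output ordering as the NEON code. The name `hadamard8_1d_scalar` is hypothetical, and the sketch assumes inputs small enough that no 16-bit overflow occurs.

```c
#include <stdint.h>

/* Scalar mirror of one lane of hadamard8x8_one_pass (hypothetical helper).
 * Intermediates are kept in int; results are narrowed back to int16_t. */
static void hadamard8_1d_scalar(int16_t a[8]) {
  /* Stage 1: adjacent pairs. */
  const int b0 = a[0] + a[1], b1 = a[0] - a[1];
  const int b2 = a[2] + a[3], b3 = a[2] - a[3];
  const int b4 = a[4] + a[5], b5 = a[4] - a[5];
  const int b6 = a[6] + a[7], b7 = a[6] - a[7];
  /* Stage 2: combine within each half. */
  const int c0 = b0 + b2, c1 = b1 + b3;
  const int c2 = b0 - b2, c3 = b1 - b3;
  const int c4 = b4 + b6, c5 = b5 + b7;
  const int c6 = b4 - b6, c7 = b5 - b7;
  /* Stage 3: combine the halves, same output order as the NEON version. */
  a[0] = (int16_t)(c0 + c4);
  a[1] = (int16_t)(c2 - c6);
  a[2] = (int16_t)(c0 - c4);
  a[3] = (int16_t)(c2 + c6);
  a[4] = (int16_t)(c3 + c7);
  a[5] = (int16_t)(c3 - c7);
  a[6] = (int16_t)(c1 - c5);
  a[7] = (int16_t)(c1 + c5);
}
```

A constant input concentrates all energy in the first output, while a unit impulse in the first position yields a magnitude of 1 in every output, which is a quick sanity check for any reimplementation.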

@@ -1,59 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "aom_dsp/inv_txfm.h"
#include "aom_ports/mem.h"
void aom_idct16x16_1_add_neon(int16_t *input, uint8_t *dest, int dest_stride) {
uint8x8_t d2u8, d3u8, d30u8, d31u8;
uint64x1_t d2u64, d3u64, d4u64, d5u64;
uint16x8_t q0u16, q9u16, q10u16, q11u16, q12u16;
int16x8_t q0s16;
uint8_t *d1, *d2;
int16_t i, j, a1;
int16_t out = dct_const_round_shift(input[0] * cospi_16_64);
out = dct_const_round_shift(out * cospi_16_64);
a1 = ROUND_POWER_OF_TWO(out, 6);
q0s16 = vdupq_n_s16(a1);
q0u16 = vreinterpretq_u16_s16(q0s16);
for (d1 = d2 = dest, i = 0; i < 4; i++) {
for (j = 0; j < 2; j++) {
d2u64 = vld1_u64((const uint64_t *)d1);
d3u64 = vld1_u64((const uint64_t *)(d1 + 8));
d1 += dest_stride;
d4u64 = vld1_u64((const uint64_t *)d1);
d5u64 = vld1_u64((const uint64_t *)(d1 + 8));
d1 += dest_stride;
q9u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d2u64));
q10u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d3u64));
q11u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d4u64));
q12u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d5u64));
d2u8 = vqmovun_s16(vreinterpretq_s16_u16(q9u16));
d3u8 = vqmovun_s16(vreinterpretq_s16_u16(q10u16));
d30u8 = vqmovun_s16(vreinterpretq_s16_u16(q11u16));
d31u8 = vqmovun_s16(vreinterpretq_s16_u16(q12u16));
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d2u8));
vst1_u64((uint64_t *)(d2 + 8), vreinterpret_u64_u8(d3u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d30u8));
vst1_u64((uint64_t *)(d2 + 8), vreinterpret_u64_u8(d31u8));
d2 += dest_stride;
}
}
return;
}

@@ -1,147 +0,0 @@
;
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
EXPORT |aom_idct32x32_1_add_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
;TODO(hkuang): put the following macros in a separate
;file so other idct functions can also use them.
MACRO
LD_16x8 $src, $stride
vld1.8 {q8}, [$src], $stride
vld1.8 {q9}, [$src], $stride
vld1.8 {q10}, [$src], $stride
vld1.8 {q11}, [$src], $stride
vld1.8 {q12}, [$src], $stride
vld1.8 {q13}, [$src], $stride
vld1.8 {q14}, [$src], $stride
vld1.8 {q15}, [$src], $stride
MEND
MACRO
ADD_DIFF_16x8 $diff
vqadd.u8 q8, q8, $diff
vqadd.u8 q9, q9, $diff
vqadd.u8 q10, q10, $diff
vqadd.u8 q11, q11, $diff
vqadd.u8 q12, q12, $diff
vqadd.u8 q13, q13, $diff
vqadd.u8 q14, q14, $diff
vqadd.u8 q15, q15, $diff
MEND
MACRO
SUB_DIFF_16x8 $diff
vqsub.u8 q8, q8, $diff
vqsub.u8 q9, q9, $diff
vqsub.u8 q10, q10, $diff
vqsub.u8 q11, q11, $diff
vqsub.u8 q12, q12, $diff
vqsub.u8 q13, q13, $diff
vqsub.u8 q14, q14, $diff
vqsub.u8 q15, q15, $diff
MEND
MACRO
ST_16x8 $dst, $stride
vst1.8 {q8}, [$dst], $stride
vst1.8 {q9}, [$dst], $stride
vst1.8 {q10},[$dst], $stride
vst1.8 {q11},[$dst], $stride
vst1.8 {q12},[$dst], $stride
vst1.8 {q13},[$dst], $stride
vst1.8 {q14},[$dst], $stride
vst1.8 {q15},[$dst], $stride
MEND
;void aom_idct32x32_1_add_neon(int16_t *input, uint8_t *dest,
; int dest_stride)
;
; r0 int16_t *input
; r1 uint8_t *dest
; r2 int dest_stride
|aom_idct32x32_1_add_neon| PROC
push {lr}
pld [r1]
add r3, r1, #16 ; r3 dest + 16 for second loop
ldrsh r0, [r0]
; generate cospi_16_64 = 11585
mov r12, #0x2d00
add r12, #0x41
; out = dct_const_round_shift(input[0] * cospi_16_64)
mul r0, r0, r12 ; input[0] * cospi_16_64
add r0, r0, #0x2000 ; +(1 << ((DCT_CONST_BITS) - 1))
asr r0, r0, #14 ; >> DCT_CONST_BITS
; out = dct_const_round_shift(out * cospi_16_64)
mul r0, r0, r12 ; out * cospi_16_64
mov r12, r1 ; save dest
add r0, r0, #0x2000 ; +(1 << ((DCT_CONST_BITS) - 1))
asr r0, r0, #14 ; >> DCT_CONST_BITS
; a1 = ROUND_POWER_OF_TWO(out, 6)
add r0, r0, #32 ; + (1 <<((6) - 1))
asrs r0, r0, #6 ; >> 6
bge diff_positive_32_32
diff_negative_32_32
neg r0, r0
usat r0, #8, r0
vdup.u8 q0, r0
mov r0, #4
diff_negative_32_32_loop
sub r0, #1
LD_16x8 r1, r2
SUB_DIFF_16x8 q0
ST_16x8 r12, r2
LD_16x8 r1, r2
SUB_DIFF_16x8 q0
ST_16x8 r12, r2
cmp r0, #2
moveq r1, r3
moveq r12, r3
cmp r0, #0
bne diff_negative_32_32_loop
pop {pc}
diff_positive_32_32
usat r0, #8, r0
vdup.u8 q0, r0
mov r0, #4
diff_positive_32_32_loop
sub r0, #1
LD_16x8 r1, r2
ADD_DIFF_16x8 q0
ST_16x8 r12, r2
LD_16x8 r1, r2
ADD_DIFF_16x8 q0
ST_16x8 r12, r2
cmp r0, #2
moveq r1, r3
moveq r12, r3
cmp r0, #0
bne diff_positive_32_32_loop
pop {pc}
ENDP ; |aom_idct32x32_1_add_neon|
END

@@ -1,141 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_config.h"
#include "aom_dsp/inv_txfm.h"
#include "aom_ports/mem.h"
static INLINE void LD_16x8(uint8_t *d, int d_stride, uint8x16_t *q8u8,
uint8x16_t *q9u8, uint8x16_t *q10u8,
uint8x16_t *q11u8, uint8x16_t *q12u8,
uint8x16_t *q13u8, uint8x16_t *q14u8,
uint8x16_t *q15u8) {
*q8u8 = vld1q_u8(d);
d += d_stride;
*q9u8 = vld1q_u8(d);
d += d_stride;
*q10u8 = vld1q_u8(d);
d += d_stride;
*q11u8 = vld1q_u8(d);
d += d_stride;
*q12u8 = vld1q_u8(d);
d += d_stride;
*q13u8 = vld1q_u8(d);
d += d_stride;
*q14u8 = vld1q_u8(d);
d += d_stride;
*q15u8 = vld1q_u8(d);
return;
}
static INLINE void ADD_DIFF_16x8(uint8x16_t qdiffu8, uint8x16_t *q8u8,
uint8x16_t *q9u8, uint8x16_t *q10u8,
uint8x16_t *q11u8, uint8x16_t *q12u8,
uint8x16_t *q13u8, uint8x16_t *q14u8,
uint8x16_t *q15u8) {
*q8u8 = vqaddq_u8(*q8u8, qdiffu8);
*q9u8 = vqaddq_u8(*q9u8, qdiffu8);
*q10u8 = vqaddq_u8(*q10u8, qdiffu8);
*q11u8 = vqaddq_u8(*q11u8, qdiffu8);
*q12u8 = vqaddq_u8(*q12u8, qdiffu8);
*q13u8 = vqaddq_u8(*q13u8, qdiffu8);
*q14u8 = vqaddq_u8(*q14u8, qdiffu8);
*q15u8 = vqaddq_u8(*q15u8, qdiffu8);
return;
}
static INLINE void SUB_DIFF_16x8(uint8x16_t qdiffu8, uint8x16_t *q8u8,
uint8x16_t *q9u8, uint8x16_t *q10u8,
uint8x16_t *q11u8, uint8x16_t *q12u8,
uint8x16_t *q13u8, uint8x16_t *q14u8,
uint8x16_t *q15u8) {
*q8u8 = vqsubq_u8(*q8u8, qdiffu8);
*q9u8 = vqsubq_u8(*q9u8, qdiffu8);
*q10u8 = vqsubq_u8(*q10u8, qdiffu8);
*q11u8 = vqsubq_u8(*q11u8, qdiffu8);
*q12u8 = vqsubq_u8(*q12u8, qdiffu8);
*q13u8 = vqsubq_u8(*q13u8, qdiffu8);
*q14u8 = vqsubq_u8(*q14u8, qdiffu8);
*q15u8 = vqsubq_u8(*q15u8, qdiffu8);
return;
}
static INLINE void ST_16x8(uint8_t *d, int d_stride, uint8x16_t *q8u8,
uint8x16_t *q9u8, uint8x16_t *q10u8,
uint8x16_t *q11u8, uint8x16_t *q12u8,
uint8x16_t *q13u8, uint8x16_t *q14u8,
uint8x16_t *q15u8) {
vst1q_u8(d, *q8u8);
d += d_stride;
vst1q_u8(d, *q9u8);
d += d_stride;
vst1q_u8(d, *q10u8);
d += d_stride;
vst1q_u8(d, *q11u8);
d += d_stride;
vst1q_u8(d, *q12u8);
d += d_stride;
vst1q_u8(d, *q13u8);
d += d_stride;
vst1q_u8(d, *q14u8);
d += d_stride;
vst1q_u8(d, *q15u8);
return;
}
void aom_idct32x32_1_add_neon(int16_t *input, uint8_t *dest, int dest_stride) {
uint8x16_t q0u8, q8u8, q9u8, q10u8, q11u8, q12u8, q13u8, q14u8, q15u8;
int i, j, dest_stride8;
uint8_t *d;
int16_t a1;
int16_t out = dct_const_round_shift(input[0] * cospi_16_64);
out = dct_const_round_shift(out * cospi_16_64);
a1 = ROUND_POWER_OF_TWO(out, 6);
dest_stride8 = dest_stride * 8;
if (a1 >= 0) { // diff_positive_32_32
a1 = a1 < 0 ? 0 : a1 > 255 ? 255 : a1;
q0u8 = vdupq_n_u8(a1);
for (i = 0; i < 2; i++, dest += 16) { // diff_positive_32_32_loop
d = dest;
for (j = 0; j < 4; j++) {
LD_16x8(d, dest_stride, &q8u8, &q9u8, &q10u8, &q11u8, &q12u8, &q13u8,
&q14u8, &q15u8);
ADD_DIFF_16x8(q0u8, &q8u8, &q9u8, &q10u8, &q11u8, &q12u8, &q13u8,
&q14u8, &q15u8);
ST_16x8(d, dest_stride, &q8u8, &q9u8, &q10u8, &q11u8, &q12u8, &q13u8,
&q14u8, &q15u8);
d += dest_stride8;
}
}
} else { // diff_negative_32_32
a1 = -a1;
a1 = a1 < 0 ? 0 : a1 > 255 ? 255 : a1;
q0u8 = vdupq_n_u8(a1);
for (i = 0; i < 2; i++, dest += 16) { // diff_negative_32_32_loop
d = dest;
for (j = 0; j < 4; j++) {
LD_16x8(d, dest_stride, &q8u8, &q9u8, &q10u8, &q11u8, &q12u8, &q13u8,
&q14u8, &q15u8);
SUB_DIFF_16x8(q0u8, &q8u8, &q9u8, &q10u8, &q11u8, &q12u8, &q13u8,
&q14u8, &q15u8);
ST_16x8(d, dest_stride, &q8u8, &q9u8, &q10u8, &q11u8, &q12u8, &q13u8,
&q14u8, &q15u8);
d += dest_stride8;
}
}
}
return;
}

File diff suppressed because it is too large

@@ -1,47 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "aom_dsp/inv_txfm.h"
#include "aom_ports/mem.h"
void aom_idct4x4_1_add_neon(int16_t *input, uint8_t *dest, int dest_stride) {
uint8x8_t d6u8;
uint32x2_t d2u32 = vdup_n_u32(0);
uint16x8_t q8u16;
int16x8_t q0s16;
uint8_t *d1, *d2;
int16_t i, a1;
int16_t out = dct_const_round_shift(input[0] * cospi_16_64);
out = dct_const_round_shift(out * cospi_16_64);
a1 = ROUND_POWER_OF_TWO(out, 4);
q0s16 = vdupq_n_s16(a1);
// dc_only_idct_add
d1 = d2 = dest;
for (i = 0; i < 2; i++) {
d2u32 = vld1_lane_u32((const uint32_t *)d1, d2u32, 0);
d1 += dest_stride;
d2u32 = vld1_lane_u32((const uint32_t *)d1, d2u32, 1);
d1 += dest_stride;
q8u16 = vaddw_u8(vreinterpretq_u16_s16(q0s16), vreinterpret_u8_u32(d2u32));
d6u8 = vqmovun_s16(vreinterpretq_s16_u16(q8u16));
vst1_lane_u32((uint32_t *)d2, vreinterpret_u32_u8(d6u8), 0);
d2 += dest_stride;
vst1_lane_u32((uint32_t *)d2, vreinterpret_u32_u8(d6u8), 1);
d2 += dest_stride;
}
return;
}
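
The DC-only functions in this diff (`aom_idct4x4_1_add_neon` and its 8x8, 16x16 and 32x32 siblings) all follow the same arithmetic: scale `input[0]` by `cospi_16_64` twice with rounding, apply a final rounding shift, then add the result to every pixel with saturation to 0..255. Below is a scalar sketch of that arithmetic. The constant 11585 and the 14-bit shift are taken from the comments in the `aom_idct32x32_1_add_neon` assembly; the names `round_shift` and `idct_dc_only_add_scalar` are hypothetical. The sketch assumes an arithmetic right shift for negative values, which mainstream compilers provide but ISO C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

#define COSPI_16_64 11585 /* value quoted in the assembly comments */
#define DCT_CONST_BITS 14

/* (x + half) >> bits, i.e. ROUND_POWER_OF_TWO for a signed value. */
static int round_shift(int x, int bits) {
  return (x + (1 << (bits - 1))) >> bits;
}

/* Hypothetical scalar model of the DC-only inverse transform and add.
 * size is the block edge (4, 8, 16, 32); shift is the final rounding
 * shift used by the matching NEON function (4 for 4x4, 5 for 8x8,
 * 6 for 16x16 and 32x32). */
static void idct_dc_only_add_scalar(const int16_t *input, uint8_t *dest,
                                    int dest_stride, int size, int shift) {
  int r, c, a1;
  int out = round_shift(input[0] * COSPI_16_64, DCT_CONST_BITS);
  out = round_shift(out * COSPI_16_64, DCT_CONST_BITS);
  a1 = round_shift(out, shift);
  assert(size > 0 && dest_stride >= size);
  for (r = 0; r < size; ++r) {
    for (c = 0; c < size; ++c) {
      const int v = dest[c] + a1;
      dest[c] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v); /* saturate */
    }
    dest += dest_stride;
  }
}
```

The NEON versions obtain the same saturation either through `vqmovun_s16` after a widening add, or through `vqadd.u8` / `vqsub.u8` on a pre-clamped magnitude as in the 32x32 case.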

@@ -1,146 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "aom_dsp/txfm_common.h"
void aom_idct4x4_16_add_neon(int16_t *input, uint8_t *dest, int dest_stride) {
uint8x8_t d26u8, d27u8;
uint32x2_t d26u32, d27u32;
uint16x8_t q8u16, q9u16;
int16x4_t d16s16, d17s16, d18s16, d19s16, d20s16, d21s16;
int16x4_t d22s16, d23s16, d24s16, d26s16, d27s16, d28s16, d29s16;
int16x8_t q8s16, q9s16, q13s16, q14s16;
int32x4_t q1s32, q13s32, q14s32, q15s32;
int16x4x2_t d0x2s16, d1x2s16;
int32x4x2_t q0x2s32;
uint8_t *d;
d26u32 = d27u32 = vdup_n_u32(0);
q8s16 = vld1q_s16(input);
q9s16 = vld1q_s16(input + 8);
d16s16 = vget_low_s16(q8s16);
d17s16 = vget_high_s16(q8s16);
d18s16 = vget_low_s16(q9s16);
d19s16 = vget_high_s16(q9s16);
d0x2s16 = vtrn_s16(d16s16, d17s16);
d1x2s16 = vtrn_s16(d18s16, d19s16);
q8s16 = vcombine_s16(d0x2s16.val[0], d0x2s16.val[1]);
q9s16 = vcombine_s16(d1x2s16.val[0], d1x2s16.val[1]);
d20s16 = vdup_n_s16((int16_t)cospi_8_64);
d21s16 = vdup_n_s16((int16_t)cospi_16_64);
q0x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(q8s16), vreinterpretq_s32_s16(q9s16));
d16s16 = vget_low_s16(vreinterpretq_s16_s32(q0x2s32.val[0]));
d17s16 = vget_high_s16(vreinterpretq_s16_s32(q0x2s32.val[0]));
d18s16 = vget_low_s16(vreinterpretq_s16_s32(q0x2s32.val[1]));
d19s16 = vget_high_s16(vreinterpretq_s16_s32(q0x2s32.val[1]));
d22s16 = vdup_n_s16((int16_t)cospi_24_64);
// stage 1
d23s16 = vadd_s16(d16s16, d18s16);
d24s16 = vsub_s16(d16s16, d18s16);
q15s32 = vmull_s16(d17s16, d22s16);
q1s32 = vmull_s16(d17s16, d20s16);
q13s32 = vmull_s16(d23s16, d21s16);
q14s32 = vmull_s16(d24s16, d21s16);
q15s32 = vmlsl_s16(q15s32, d19s16, d20s16);
q1s32 = vmlal_s16(q1s32, d19s16, d22s16);
d26s16 = vqrshrn_n_s32(q13s32, 14);
d27s16 = vqrshrn_n_s32(q14s32, 14);
d29s16 = vqrshrn_n_s32(q15s32, 14);
d28s16 = vqrshrn_n_s32(q1s32, 14);
q13s16 = vcombine_s16(d26s16, d27s16);
q14s16 = vcombine_s16(d28s16, d29s16);
// stage 2
q8s16 = vaddq_s16(q13s16, q14s16);
q9s16 = vsubq_s16(q13s16, q14s16);
d16s16 = vget_low_s16(q8s16);
d17s16 = vget_high_s16(q8s16);
d18s16 = vget_high_s16(q9s16); // vswp d18 d19
d19s16 = vget_low_s16(q9s16);
d0x2s16 = vtrn_s16(d16s16, d17s16);
d1x2s16 = vtrn_s16(d18s16, d19s16);
q8s16 = vcombine_s16(d0x2s16.val[0], d0x2s16.val[1]);
q9s16 = vcombine_s16(d1x2s16.val[0], d1x2s16.val[1]);
q0x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(q8s16), vreinterpretq_s32_s16(q9s16));
d16s16 = vget_low_s16(vreinterpretq_s16_s32(q0x2s32.val[0]));
d17s16 = vget_high_s16(vreinterpretq_s16_s32(q0x2s32.val[0]));
d18s16 = vget_low_s16(vreinterpretq_s16_s32(q0x2s32.val[1]));
d19s16 = vget_high_s16(vreinterpretq_s16_s32(q0x2s32.val[1]));
// do the transform on columns
// stage 1
d23s16 = vadd_s16(d16s16, d18s16);
d24s16 = vsub_s16(d16s16, d18s16);
q15s32 = vmull_s16(d17s16, d22s16);
q1s32 = vmull_s16(d17s16, d20s16);
q13s32 = vmull_s16(d23s16, d21s16);
q14s32 = vmull_s16(d24s16, d21s16);
q15s32 = vmlsl_s16(q15s32, d19s16, d20s16);
q1s32 = vmlal_s16(q1s32, d19s16, d22s16);
d26s16 = vqrshrn_n_s32(q13s32, 14);
d27s16 = vqrshrn_n_s32(q14s32, 14);
d29s16 = vqrshrn_n_s32(q15s32, 14);
d28s16 = vqrshrn_n_s32(q1s32, 14);
q13s16 = vcombine_s16(d26s16, d27s16);
q14s16 = vcombine_s16(d28s16, d29s16);
// stage 2
q8s16 = vaddq_s16(q13s16, q14s16);
q9s16 = vsubq_s16(q13s16, q14s16);
q8s16 = vrshrq_n_s16(q8s16, 4);
q9s16 = vrshrq_n_s16(q9s16, 4);
d = dest;
d26u32 = vld1_lane_u32((const uint32_t *)d, d26u32, 0);
d += dest_stride;
d26u32 = vld1_lane_u32((const uint32_t *)d, d26u32, 1);
d += dest_stride;
d27u32 = vld1_lane_u32((const uint32_t *)d, d27u32, 1);
d += dest_stride;
d27u32 = vld1_lane_u32((const uint32_t *)d, d27u32, 0);
q8u16 = vaddw_u8(vreinterpretq_u16_s16(q8s16), vreinterpret_u8_u32(d26u32));
q9u16 = vaddw_u8(vreinterpretq_u16_s16(q9s16), vreinterpret_u8_u32(d27u32));
d26u8 = vqmovun_s16(vreinterpretq_s16_u16(q8u16));
d27u8 = vqmovun_s16(vreinterpretq_s16_u16(q9u16));
d = dest;
vst1_lane_u32((uint32_t *)d, vreinterpret_u32_u8(d26u8), 0);
d += dest_stride;
vst1_lane_u32((uint32_t *)d, vreinterpret_u32_u8(d26u8), 1);
d += dest_stride;
vst1_lane_u32((uint32_t *)d, vreinterpret_u32_u8(d27u8), 1);
d += dest_stride;
vst1_lane_u32((uint32_t *)d, vreinterpret_u32_u8(d27u8), 0);
return;
}

@@ -1,62 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "aom_dsp/inv_txfm.h"
#include "aom_ports/mem.h"
void aom_idct8x8_1_add_neon(int16_t *input, uint8_t *dest, int dest_stride) {
uint8x8_t d2u8, d3u8, d30u8, d31u8;
uint64x1_t d2u64, d3u64, d4u64, d5u64;
uint16x8_t q0u16, q9u16, q10u16, q11u16, q12u16;
int16x8_t q0s16;
uint8_t *d1, *d2;
int16_t i, a1;
int16_t out = dct_const_round_shift(input[0] * cospi_16_64);
out = dct_const_round_shift(out * cospi_16_64);
a1 = ROUND_POWER_OF_TWO(out, 5);
q0s16 = vdupq_n_s16(a1);
q0u16 = vreinterpretq_u16_s16(q0s16);
d1 = d2 = dest;
for (i = 0; i < 2; i++) {
d2u64 = vld1_u64((const uint64_t *)d1);
d1 += dest_stride;
d3u64 = vld1_u64((const uint64_t *)d1);
d1 += dest_stride;
d4u64 = vld1_u64((const uint64_t *)d1);
d1 += dest_stride;
d5u64 = vld1_u64((const uint64_t *)d1);
d1 += dest_stride;
q9u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d2u64));
q10u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d3u64));
q11u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d4u64));
q12u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d5u64));
d2u8 = vqmovun_s16(vreinterpretq_s16_u16(q9u16));
d3u8 = vqmovun_s16(vreinterpretq_s16_u16(q10u16));
d30u8 = vqmovun_s16(vreinterpretq_s16_u16(q11u16));
d31u8 = vqmovun_s16(vreinterpretq_s16_u16(q12u16));
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d2u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d3u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d30u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d31u8));
d2 += dest_stride;
}
return;
}
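
For comparison with the intrinsics above, here is a minimal scalar sketch of the same DC-only 8x8 inverse transform. It assumes `cospi_16_64` is 11585 and the rounding shift is 14 bits, as in the codec's `txfm_common.h`. The names `idct8x8_dc_add_ref`, `round_shift`, and `clip_pixel` are hypothetical helpers introduced for illustration only and are not part of the library.

```c
#include <stdint.h>

/* Assumed to match txfm_common.h: cos(pi/4) scaled by 2^14. */
#define COSPI_16_64 11585
#define DCT_CONST_BITS 14

/* Rounding right shift that floors toward negative infinity, written
 * without relying on implementation-defined shifts of negative values. */
static int round_shift(int x, int n) {
  const int biased = x + (1 << (n - 1));
  return biased >= 0 ? biased >> n : -((-biased + (1 << n) - 1) >> n);
}

/* Clamp to the 8-bit pixel range, mirroring the vqmovun_s16 saturation. */
static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical scalar reference: only input[0] is used, so every pixel of
 * the 8x8 block receives the same offset a1. */
void idct8x8_dc_add_ref(const int16_t *input, uint8_t *dest, int dest_stride) {
  int r, c, a1;
  int out = round_shift(input[0] * COSPI_16_64, DCT_CONST_BITS);
  out = round_shift(out * COSPI_16_64, DCT_CONST_BITS);
  a1 = round_shift(out, 5);
  for (r = 0; r < 8; ++r) {
    for (c = 0; c < 8; ++c) {
      dest[r * dest_stride + c] = clip_pixel(dest[r * dest_stride + c] + a1);
    }
  }
}
```

The NEON version processes four rows per loop iteration and relies on `vqmovun_s16` for the clamp. The sketch trades that throughput for readability and may be useful as a test oracle.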

@@ -1,509 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_config.h"
#include "aom_dsp/txfm_common.h"
static INLINE void TRANSPOSE8X8(int16x8_t *q8s16, int16x8_t *q9s16,
int16x8_t *q10s16, int16x8_t *q11s16,
int16x8_t *q12s16, int16x8_t *q13s16,
int16x8_t *q14s16, int16x8_t *q15s16) {
int16x4_t d16s16, d17s16, d18s16, d19s16, d20s16, d21s16, d22s16, d23s16;
int16x4_t d24s16, d25s16, d26s16, d27s16, d28s16, d29s16, d30s16, d31s16;
int32x4x2_t q0x2s32, q1x2s32, q2x2s32, q3x2s32;
int16x8x2_t q0x2s16, q1x2s16, q2x2s16, q3x2s16;
d16s16 = vget_low_s16(*q8s16);
d17s16 = vget_high_s16(*q8s16);
d18s16 = vget_low_s16(*q9s16);
d19s16 = vget_high_s16(*q9s16);
d20s16 = vget_low_s16(*q10s16);
d21s16 = vget_high_s16(*q10s16);
d22s16 = vget_low_s16(*q11s16);
d23s16 = vget_high_s16(*q11s16);
d24s16 = vget_low_s16(*q12s16);
d25s16 = vget_high_s16(*q12s16);
d26s16 = vget_low_s16(*q13s16);
d27s16 = vget_high_s16(*q13s16);
d28s16 = vget_low_s16(*q14s16);
d29s16 = vget_high_s16(*q14s16);
d30s16 = vget_low_s16(*q15s16);
d31s16 = vget_high_s16(*q15s16);
*q8s16 = vcombine_s16(d16s16, d24s16); // vswp d17, d24
*q9s16 = vcombine_s16(d18s16, d26s16); // vswp d19, d26
*q10s16 = vcombine_s16(d20s16, d28s16); // vswp d21, d28
*q11s16 = vcombine_s16(d22s16, d30s16); // vswp d23, d30
*q12s16 = vcombine_s16(d17s16, d25s16);
*q13s16 = vcombine_s16(d19s16, d27s16);
*q14s16 = vcombine_s16(d21s16, d29s16);
*q15s16 = vcombine_s16(d23s16, d31s16);
q0x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(*q8s16), vreinterpretq_s32_s16(*q10s16));
q1x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(*q9s16), vreinterpretq_s32_s16(*q11s16));
q2x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(*q12s16), vreinterpretq_s32_s16(*q14s16));
q3x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(*q13s16), vreinterpretq_s32_s16(*q15s16));
q0x2s16 = vtrnq_s16(vreinterpretq_s16_s32(q0x2s32.val[0]), // q8
vreinterpretq_s16_s32(q1x2s32.val[0])); // q9
q1x2s16 = vtrnq_s16(vreinterpretq_s16_s32(q0x2s32.val[1]), // q10
vreinterpretq_s16_s32(q1x2s32.val[1])); // q11
q2x2s16 = vtrnq_s16(vreinterpretq_s16_s32(q2x2s32.val[0]), // q12
vreinterpretq_s16_s32(q3x2s32.val[0])); // q13
q3x2s16 = vtrnq_s16(vreinterpretq_s16_s32(q2x2s32.val[1]), // q14
vreinterpretq_s16_s32(q3x2s32.val[1])); // q15
*q8s16 = q0x2s16.val[0];
*q9s16 = q0x2s16.val[1];
*q10s16 = q1x2s16.val[0];
*q11s16 = q1x2s16.val[1];
*q12s16 = q2x2s16.val[0];
*q13s16 = q2x2s16.val[1];
*q14s16 = q3x2s16.val[0];
*q15s16 = q3x2s16.val[1];
return;
}
static INLINE void IDCT8x8_1D(int16x8_t *q8s16, int16x8_t *q9s16,
int16x8_t *q10s16, int16x8_t *q11s16,
int16x8_t *q12s16, int16x8_t *q13s16,
int16x8_t *q14s16, int16x8_t *q15s16) {
int16x4_t d0s16, d1s16, d2s16, d3s16;
int16x4_t d8s16, d9s16, d10s16, d11s16, d12s16, d13s16, d14s16, d15s16;
int16x4_t d16s16, d17s16, d18s16, d19s16, d20s16, d21s16, d22s16, d23s16;
int16x4_t d24s16, d25s16, d26s16, d27s16, d28s16, d29s16, d30s16, d31s16;
int16x8_t q0s16, q1s16, q2s16, q3s16, q4s16, q5s16, q6s16, q7s16;
int32x4_t q2s32, q3s32, q5s32, q6s32, q8s32, q9s32;
int32x4_t q10s32, q11s32, q12s32, q13s32, q15s32;
d0s16 = vdup_n_s16((int16_t)cospi_28_64);
d1s16 = vdup_n_s16((int16_t)cospi_4_64);
d2s16 = vdup_n_s16((int16_t)cospi_12_64);
d3s16 = vdup_n_s16((int16_t)cospi_20_64);
d16s16 = vget_low_s16(*q8s16);
d17s16 = vget_high_s16(*q8s16);
d18s16 = vget_low_s16(*q9s16);
d19s16 = vget_high_s16(*q9s16);
d20s16 = vget_low_s16(*q10s16);
d21s16 = vget_high_s16(*q10s16);
d22s16 = vget_low_s16(*q11s16);
d23s16 = vget_high_s16(*q11s16);
d24s16 = vget_low_s16(*q12s16);
d25s16 = vget_high_s16(*q12s16);
d26s16 = vget_low_s16(*q13s16);
d27s16 = vget_high_s16(*q13s16);
d28s16 = vget_low_s16(*q14s16);
d29s16 = vget_high_s16(*q14s16);
d30s16 = vget_low_s16(*q15s16);
d31s16 = vget_high_s16(*q15s16);
q2s32 = vmull_s16(d18s16, d0s16);
q3s32 = vmull_s16(d19s16, d0s16);
q5s32 = vmull_s16(d26s16, d2s16);
q6s32 = vmull_s16(d27s16, d2s16);
q2s32 = vmlsl_s16(q2s32, d30s16, d1s16);
q3s32 = vmlsl_s16(q3s32, d31s16, d1s16);
q5s32 = vmlsl_s16(q5s32, d22s16, d3s16);
q6s32 = vmlsl_s16(q6s32, d23s16, d3s16);
d8s16 = vqrshrn_n_s32(q2s32, 14);
d9s16 = vqrshrn_n_s32(q3s32, 14);
d10s16 = vqrshrn_n_s32(q5s32, 14);
d11s16 = vqrshrn_n_s32(q6s32, 14);
q4s16 = vcombine_s16(d8s16, d9s16);
q5s16 = vcombine_s16(d10s16, d11s16);
q2s32 = vmull_s16(d18s16, d1s16);
q3s32 = vmull_s16(d19s16, d1s16);
q9s32 = vmull_s16(d26s16, d3s16);
q13s32 = vmull_s16(d27s16, d3s16);
q2s32 = vmlal_s16(q2s32, d30s16, d0s16);
q3s32 = vmlal_s16(q3s32, d31s16, d0s16);
q9s32 = vmlal_s16(q9s32, d22s16, d2s16);
q13s32 = vmlal_s16(q13s32, d23s16, d2s16);
d14s16 = vqrshrn_n_s32(q2s32, 14);
d15s16 = vqrshrn_n_s32(q3s32, 14);
d12s16 = vqrshrn_n_s32(q9s32, 14);
d13s16 = vqrshrn_n_s32(q13s32, 14);
q6s16 = vcombine_s16(d12s16, d13s16);
q7s16 = vcombine_s16(d14s16, d15s16);
d0s16 = vdup_n_s16((int16_t)cospi_16_64);
q2s32 = vmull_s16(d16s16, d0s16);
q3s32 = vmull_s16(d17s16, d0s16);
q13s32 = vmull_s16(d16s16, d0s16);
q15s32 = vmull_s16(d17s16, d0s16);
q2s32 = vmlal_s16(q2s32, d24s16, d0s16);
q3s32 = vmlal_s16(q3s32, d25s16, d0s16);
q13s32 = vmlsl_s16(q13s32, d24s16, d0s16);
q15s32 = vmlsl_s16(q15s32, d25s16, d0s16);
d0s16 = vdup_n_s16((int16_t)cospi_24_64);
d1s16 = vdup_n_s16((int16_t)cospi_8_64);
d18s16 = vqrshrn_n_s32(q2s32, 14);
d19s16 = vqrshrn_n_s32(q3s32, 14);
d22s16 = vqrshrn_n_s32(q13s32, 14);
d23s16 = vqrshrn_n_s32(q15s32, 14);
*q9s16 = vcombine_s16(d18s16, d19s16);
*q11s16 = vcombine_s16(d22s16, d23s16);
q2s32 = vmull_s16(d20s16, d0s16);
q3s32 = vmull_s16(d21s16, d0s16);
q8s32 = vmull_s16(d20s16, d1s16);
q12s32 = vmull_s16(d21s16, d1s16);
q2s32 = vmlsl_s16(q2s32, d28s16, d1s16);
q3s32 = vmlsl_s16(q3s32, d29s16, d1s16);
q8s32 = vmlal_s16(q8s32, d28s16, d0s16);
q12s32 = vmlal_s16(q12s32, d29s16, d0s16);
d26s16 = vqrshrn_n_s32(q2s32, 14);
d27s16 = vqrshrn_n_s32(q3s32, 14);
d30s16 = vqrshrn_n_s32(q8s32, 14);
d31s16 = vqrshrn_n_s32(q12s32, 14);
*q13s16 = vcombine_s16(d26s16, d27s16);
*q15s16 = vcombine_s16(d30s16, d31s16);
q0s16 = vaddq_s16(*q9s16, *q15s16);
q1s16 = vaddq_s16(*q11s16, *q13s16);
q2s16 = vsubq_s16(*q11s16, *q13s16);
q3s16 = vsubq_s16(*q9s16, *q15s16);
*q13s16 = vsubq_s16(q4s16, q5s16);
q4s16 = vaddq_s16(q4s16, q5s16);
*q14s16 = vsubq_s16(q7s16, q6s16);
q7s16 = vaddq_s16(q7s16, q6s16);
d26s16 = vget_low_s16(*q13s16);
d27s16 = vget_high_s16(*q13s16);
d28s16 = vget_low_s16(*q14s16);
d29s16 = vget_high_s16(*q14s16);
d16s16 = vdup_n_s16((int16_t)cospi_16_64);
q9s32 = vmull_s16(d28s16, d16s16);
q10s32 = vmull_s16(d29s16, d16s16);
q11s32 = vmull_s16(d28s16, d16s16);
q12s32 = vmull_s16(d29s16, d16s16);
q9s32 = vmlsl_s16(q9s32, d26s16, d16s16);
q10s32 = vmlsl_s16(q10s32, d27s16, d16s16);
q11s32 = vmlal_s16(q11s32, d26s16, d16s16);
q12s32 = vmlal_s16(q12s32, d27s16, d16s16);
d10s16 = vqrshrn_n_s32(q9s32, 14);
d11s16 = vqrshrn_n_s32(q10s32, 14);
d12s16 = vqrshrn_n_s32(q11s32, 14);
d13s16 = vqrshrn_n_s32(q12s32, 14);
q5s16 = vcombine_s16(d10s16, d11s16);
q6s16 = vcombine_s16(d12s16, d13s16);
*q8s16 = vaddq_s16(q0s16, q7s16);
*q9s16 = vaddq_s16(q1s16, q6s16);
*q10s16 = vaddq_s16(q2s16, q5s16);
*q11s16 = vaddq_s16(q3s16, q4s16);
*q12s16 = vsubq_s16(q3s16, q4s16);
*q13s16 = vsubq_s16(q2s16, q5s16);
*q14s16 = vsubq_s16(q1s16, q6s16);
*q15s16 = vsubq_s16(q0s16, q7s16);
return;
}
void aom_idct8x8_64_add_neon(int16_t *input, uint8_t *dest, int dest_stride) {
uint8_t *d1, *d2;
uint8x8_t d0u8, d1u8, d2u8, d3u8;
uint64x1_t d0u64, d1u64, d2u64, d3u64;
int16x8_t q8s16, q9s16, q10s16, q11s16, q12s16, q13s16, q14s16, q15s16;
uint16x8_t q8u16, q9u16, q10u16, q11u16;
q8s16 = vld1q_s16(input);
q9s16 = vld1q_s16(input + 8);
q10s16 = vld1q_s16(input + 16);
q11s16 = vld1q_s16(input + 24);
q12s16 = vld1q_s16(input + 32);
q13s16 = vld1q_s16(input + 40);
q14s16 = vld1q_s16(input + 48);
q15s16 = vld1q_s16(input + 56);
TRANSPOSE8X8(&q8s16, &q9s16, &q10s16, &q11s16, &q12s16, &q13s16, &q14s16,
&q15s16);
IDCT8x8_1D(&q8s16, &q9s16, &q10s16, &q11s16, &q12s16, &q13s16, &q14s16,
&q15s16);
TRANSPOSE8X8(&q8s16, &q9s16, &q10s16, &q11s16, &q12s16, &q13s16, &q14s16,
&q15s16);
IDCT8x8_1D(&q8s16, &q9s16, &q10s16, &q11s16, &q12s16, &q13s16, &q14s16,
&q15s16);
q8s16 = vrshrq_n_s16(q8s16, 5);
q9s16 = vrshrq_n_s16(q9s16, 5);
q10s16 = vrshrq_n_s16(q10s16, 5);
q11s16 = vrshrq_n_s16(q11s16, 5);
q12s16 = vrshrq_n_s16(q12s16, 5);
q13s16 = vrshrq_n_s16(q13s16, 5);
q14s16 = vrshrq_n_s16(q14s16, 5);
q15s16 = vrshrq_n_s16(q15s16, 5);
d1 = d2 = dest;
d0u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d1u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d2u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d3u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
q8u16 = vaddw_u8(vreinterpretq_u16_s16(q8s16), vreinterpret_u8_u64(d0u64));
q9u16 = vaddw_u8(vreinterpretq_u16_s16(q9s16), vreinterpret_u8_u64(d1u64));
q10u16 = vaddw_u8(vreinterpretq_u16_s16(q10s16), vreinterpret_u8_u64(d2u64));
q11u16 = vaddw_u8(vreinterpretq_u16_s16(q11s16), vreinterpret_u8_u64(d3u64));
d0u8 = vqmovun_s16(vreinterpretq_s16_u16(q8u16));
d1u8 = vqmovun_s16(vreinterpretq_s16_u16(q9u16));
d2u8 = vqmovun_s16(vreinterpretq_s16_u16(q10u16));
d3u8 = vqmovun_s16(vreinterpretq_s16_u16(q11u16));
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d0u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d1u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d2u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d3u8));
d2 += dest_stride;
q8s16 = q12s16;
q9s16 = q13s16;
q10s16 = q14s16;
q11s16 = q15s16;
d0u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d1u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d2u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d3u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
q8u16 = vaddw_u8(vreinterpretq_u16_s16(q8s16), vreinterpret_u8_u64(d0u64));
q9u16 = vaddw_u8(vreinterpretq_u16_s16(q9s16), vreinterpret_u8_u64(d1u64));
q10u16 = vaddw_u8(vreinterpretq_u16_s16(q10s16), vreinterpret_u8_u64(d2u64));
q11u16 = vaddw_u8(vreinterpretq_u16_s16(q11s16), vreinterpret_u8_u64(d3u64));
d0u8 = vqmovun_s16(vreinterpretq_s16_u16(q8u16));
d1u8 = vqmovun_s16(vreinterpretq_s16_u16(q9u16));
d2u8 = vqmovun_s16(vreinterpretq_s16_u16(q10u16));
d3u8 = vqmovun_s16(vreinterpretq_s16_u16(q11u16));
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d0u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d1u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d2u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d3u8));
d2 += dest_stride;
return;
}
void aom_idct8x8_12_add_neon(int16_t *input, uint8_t *dest, int dest_stride) {
uint8_t *d1, *d2;
uint8x8_t d0u8, d1u8, d2u8, d3u8;
int16x4_t d10s16, d11s16, d12s16, d13s16, d16s16;
int16x4_t d26s16, d27s16, d28s16, d29s16;
uint64x1_t d0u64, d1u64, d2u64, d3u64;
int16x8_t q0s16, q1s16, q2s16, q3s16, q4s16, q5s16, q6s16, q7s16;
int16x8_t q8s16, q9s16, q10s16, q11s16, q12s16, q13s16, q14s16, q15s16;
uint16x8_t q8u16, q9u16, q10u16, q11u16;
int32x4_t q9s32, q10s32, q11s32, q12s32;
q8s16 = vld1q_s16(input);
q9s16 = vld1q_s16(input + 8);
q10s16 = vld1q_s16(input + 16);
q11s16 = vld1q_s16(input + 24);
q12s16 = vld1q_s16(input + 32);
q13s16 = vld1q_s16(input + 40);
q14s16 = vld1q_s16(input + 48);
q15s16 = vld1q_s16(input + 56);
TRANSPOSE8X8(&q8s16, &q9s16, &q10s16, &q11s16, &q12s16, &q13s16, &q14s16,
&q15s16);
// First transform rows
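// stage 1
// vqrdmulhq_s16 computes (2 * a * b + (1 << 15)) >> 16 per lane, so doubling
// each constant makes it equivalent to dct_const_round_shift(a * cospi).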
q0s16 = vdupq_n_s16((int16_t)cospi_28_64 * 2);
q1s16 = vdupq_n_s16((int16_t)cospi_4_64 * 2);
q4s16 = vqrdmulhq_s16(q9s16, q0s16);
q0s16 = vdupq_n_s16(-(int16_t)cospi_20_64 * 2);
q7s16 = vqrdmulhq_s16(q9s16, q1s16);
q1s16 = vdupq_n_s16((int16_t)cospi_12_64 * 2);
q5s16 = vqrdmulhq_s16(q11s16, q0s16);
q0s16 = vdupq_n_s16((int16_t)cospi_16_64 * 2);
q6s16 = vqrdmulhq_s16(q11s16, q1s16);
// stage 2 & stage 3 - even half
q1s16 = vdupq_n_s16((int16_t)cospi_24_64 * 2);
q9s16 = vqrdmulhq_s16(q8s16, q0s16);
q0s16 = vdupq_n_s16((int16_t)cospi_8_64 * 2);
q13s16 = vqrdmulhq_s16(q10s16, q1s16);
q15s16 = vqrdmulhq_s16(q10s16, q0s16);
// stage 3 - odd half
q0s16 = vaddq_s16(q9s16, q15s16);
q1s16 = vaddq_s16(q9s16, q13s16);
q2s16 = vsubq_s16(q9s16, q13s16);
q3s16 = vsubq_s16(q9s16, q15s16);
// stage 2 - odd half
q13s16 = vsubq_s16(q4s16, q5s16);
q4s16 = vaddq_s16(q4s16, q5s16);
q14s16 = vsubq_s16(q7s16, q6s16);
q7s16 = vaddq_s16(q7s16, q6s16);
d26s16 = vget_low_s16(q13s16);
d27s16 = vget_high_s16(q13s16);
d28s16 = vget_low_s16(q14s16);
d29s16 = vget_high_s16(q14s16);
d16s16 = vdup_n_s16((int16_t)cospi_16_64);
q9s32 = vmull_s16(d28s16, d16s16);
q10s32 = vmull_s16(d29s16, d16s16);
q11s32 = vmull_s16(d28s16, d16s16);
q12s32 = vmull_s16(d29s16, d16s16);
q9s32 = vmlsl_s16(q9s32, d26s16, d16s16);
q10s32 = vmlsl_s16(q10s32, d27s16, d16s16);
q11s32 = vmlal_s16(q11s32, d26s16, d16s16);
q12s32 = vmlal_s16(q12s32, d27s16, d16s16);
d10s16 = vqrshrn_n_s32(q9s32, 14);
d11s16 = vqrshrn_n_s32(q10s32, 14);
d12s16 = vqrshrn_n_s32(q11s32, 14);
d13s16 = vqrshrn_n_s32(q12s32, 14);
q5s16 = vcombine_s16(d10s16, d11s16);
q6s16 = vcombine_s16(d12s16, d13s16);
// stage 4
q8s16 = vaddq_s16(q0s16, q7s16);
q9s16 = vaddq_s16(q1s16, q6s16);
q10s16 = vaddq_s16(q2s16, q5s16);
q11s16 = vaddq_s16(q3s16, q4s16);
q12s16 = vsubq_s16(q3s16, q4s16);
q13s16 = vsubq_s16(q2s16, q5s16);
q14s16 = vsubq_s16(q1s16, q6s16);
q15s16 = vsubq_s16(q0s16, q7s16);
TRANSPOSE8X8(&q8s16, &q9s16, &q10s16, &q11s16, &q12s16, &q13s16, &q14s16,
&q15s16);
IDCT8x8_1D(&q8s16, &q9s16, &q10s16, &q11s16, &q12s16, &q13s16, &q14s16,
&q15s16);
q8s16 = vrshrq_n_s16(q8s16, 5);
q9s16 = vrshrq_n_s16(q9s16, 5);
q10s16 = vrshrq_n_s16(q10s16, 5);
q11s16 = vrshrq_n_s16(q11s16, 5);
q12s16 = vrshrq_n_s16(q12s16, 5);
q13s16 = vrshrq_n_s16(q13s16, 5);
q14s16 = vrshrq_n_s16(q14s16, 5);
q15s16 = vrshrq_n_s16(q15s16, 5);
d1 = d2 = dest;
d0u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d1u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d2u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d3u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
q8u16 = vaddw_u8(vreinterpretq_u16_s16(q8s16), vreinterpret_u8_u64(d0u64));
q9u16 = vaddw_u8(vreinterpretq_u16_s16(q9s16), vreinterpret_u8_u64(d1u64));
q10u16 = vaddw_u8(vreinterpretq_u16_s16(q10s16), vreinterpret_u8_u64(d2u64));
q11u16 = vaddw_u8(vreinterpretq_u16_s16(q11s16), vreinterpret_u8_u64(d3u64));
d0u8 = vqmovun_s16(vreinterpretq_s16_u16(q8u16));
d1u8 = vqmovun_s16(vreinterpretq_s16_u16(q9u16));
d2u8 = vqmovun_s16(vreinterpretq_s16_u16(q10u16));
d3u8 = vqmovun_s16(vreinterpretq_s16_u16(q11u16));
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d0u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d1u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d2u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d3u8));
d2 += dest_stride;
q8s16 = q12s16;
q9s16 = q13s16;
q10s16 = q14s16;
q11s16 = q15s16;
d0u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d1u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d2u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d3u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
q8u16 = vaddw_u8(vreinterpretq_u16_s16(q8s16), vreinterpret_u8_u64(d0u64));
q9u16 = vaddw_u8(vreinterpretq_u16_s16(q9s16), vreinterpret_u8_u64(d1u64));
q10u16 = vaddw_u8(vreinterpretq_u16_s16(q10s16), vreinterpret_u8_u64(d2u64));
q11u16 = vaddw_u8(vreinterpretq_u16_s16(q11s16), vreinterpret_u8_u64(d3u64));
d0u8 = vqmovun_s16(vreinterpretq_s16_u16(q8u16));
d1u8 = vqmovun_s16(vreinterpretq_s16_u16(q9u16));
d2u8 = vqmovun_s16(vreinterpretq_s16_u16(q10u16));
d3u8 = vqmovun_s16(vreinterpretq_s16_u16(q11u16));
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d0u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d1u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d2u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d3u8));
d2 += dest_stride;
return;
}
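
The first pass of `aom_idct8x8_12_add_neon` multiplies by doubled constants with `vqrdmulhq_s16`, whereas `IDCT8x8_1D` uses a widening multiply followed by `vqrshrn_n_s32(..., 14)`. Below is a scalar sketch of why the two agree, assuming the usual 14-bit cosine constants (for example `cospi_16_64 = 11585`). The names `floor_shift`, `sqrdmulh_model`, `dct_round_shift_model`, and `models_agree` are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Floor division by 2^n for signed values, avoiding implementation-defined
 * right shifts of negative numbers. */
static int64_t floor_shift(int64_t x, int n) {
  return x >= 0 ? x >> n : -((-x + ((int64_t)1 << n) - 1) >> n);
}

/* Scalar model of one lane of vqrdmulhq_s16: saturating rounding doubling
 * multiply returning the high half, sat16((2 * a * b + 2^15) >> 16). */
int16_t sqrdmulh_model(int16_t a, int16_t b) {
  int64_t r = floor_shift(2 * (int64_t)a * b + (1 << 15), 16);
  if (r > INT16_MAX) r = INT16_MAX;
  if (r < INT16_MIN) r = INT16_MIN;
  return (int16_t)r;
}

/* Scalar model of dct_const_round_shift(a * c): (a * c + 2^13) >> 14. */
int32_t dct_round_shift_model(int16_t a, int16_t c) {
  return (int32_t)floor_shift((int64_t)a * c + (1 << 13), 14);
}

/* Returns 1 if the doubled-constant form matches the 14-bit rounding shift
 * for every 16-bit input a, and 0 otherwise. */
int models_agree(int16_t c) {
  int a;
  for (a = INT16_MIN; a <= INT16_MAX; ++a) {
    if (sqrdmulh_model((int16_t)a, (int16_t)(2 * c)) !=
        dct_round_shift_model((int16_t)a, c)) {
      return 0;
    }
  }
  return 1;
}
```

The identity holds because (4ac + 2^15) / 2^16 equals (ac + 2^13) / 2^14 before flooring. It is only usable while the doubled constant still fits in `int16_t`, which is the case for every constant in that pass if the assumed table values are right.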

@@ -1,819 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_config.h"
#include "./aom_dsp_rtcd.h"
#include "aom/aom_integer.h"
//------------------------------------------------------------------------------
// DC 4x4
// 'do_above' and 'do_left' facilitate branch removal when inlined.
static INLINE void dc_4x4(uint8_t *dst, ptrdiff_t stride, const uint8_t *above,
const uint8_t *left, int do_above, int do_left) {
uint16x8_t sum_top;
uint16x8_t sum_left;
uint8x8_t dc0;
if (do_above) {
const uint8x8_t A = vld1_u8(above); // top row
const uint16x4_t p0 = vpaddl_u8(A); // cascading summation of the top
const uint16x4_t p1 = vpadd_u16(p0, p0);
sum_top = vcombine_u16(p1, p1);
}
if (do_left) {
const uint8x8_t L = vld1_u8(left); // left border
const uint16x4_t p0 = vpaddl_u8(L); // cascading summation of the left
const uint16x4_t p1 = vpadd_u16(p0, p0);
sum_left = vcombine_u16(p1, p1);
}
if (do_above && do_left) {
const uint16x8_t sum = vaddq_u16(sum_left, sum_top);
dc0 = vrshrn_n_u16(sum, 3);
} else if (do_above) {
dc0 = vrshrn_n_u16(sum_top, 2);
} else if (do_left) {
dc0 = vrshrn_n_u16(sum_left, 2);
} else {
dc0 = vdup_n_u8(0x80);
}
{
const uint8x8_t dc = vdup_lane_u8(dc0, 0);
int i;
for (i = 0; i < 4; ++i) {
vst1_lane_u32((uint32_t *)(dst + i * stride), vreinterpret_u32_u8(dc), 0);
}
}
}
void aom_dc_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
dc_4x4(dst, stride, above, left, 1, 1);
}
void aom_dc_left_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
(void)above;
dc_4x4(dst, stride, NULL, left, 0, 1);
}
void aom_dc_top_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
(void)left;
dc_4x4(dst, stride, above, NULL, 1, 0);
}
void aom_dc_128_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
(void)above;
(void)left;
dc_4x4(dst, stride, NULL, NULL, 0, 0);
}
//------------------------------------------------------------------------------
// DC 8x8
// 'do_above' and 'do_left' facilitate branch removal when inlined.
static INLINE void dc_8x8(uint8_t *dst, ptrdiff_t stride, const uint8_t *above,
const uint8_t *left, int do_above, int do_left) {
uint16x8_t sum_top;
uint16x8_t sum_left;
uint8x8_t dc0;
if (do_above) {
const uint8x8_t A = vld1_u8(above); // top row
const uint16x4_t p0 = vpaddl_u8(A); // cascading summation of the top
const uint16x4_t p1 = vpadd_u16(p0, p0);
const uint16x4_t p2 = vpadd_u16(p1, p1);
sum_top = vcombine_u16(p2, p2);
}
if (do_left) {
const uint8x8_t L = vld1_u8(left); // left border
const uint16x4_t p0 = vpaddl_u8(L); // cascading summation of the left
const uint16x4_t p1 = vpadd_u16(p0, p0);
const uint16x4_t p2 = vpadd_u16(p1, p1);
sum_left = vcombine_u16(p2, p2);
}
if (do_above && do_left) {
const uint16x8_t sum = vaddq_u16(sum_left, sum_top);
dc0 = vrshrn_n_u16(sum, 4);
} else if (do_above) {
dc0 = vrshrn_n_u16(sum_top, 3);
} else if (do_left) {
dc0 = vrshrn_n_u16(sum_left, 3);
} else {
dc0 = vdup_n_u8(0x80);
}
{
const uint8x8_t dc = vdup_lane_u8(dc0, 0);
int i;
for (i = 0; i < 8; ++i) {
vst1_u8(dst + i * stride, dc);
}
}
}
void aom_dc_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
dc_8x8(dst, stride, above, left, 1, 1);
}
void aom_dc_left_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
(void)above;
dc_8x8(dst, stride, NULL, left, 0, 1);
}
void aom_dc_top_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
(void)left;
dc_8x8(dst, stride, above, NULL, 1, 0);
}
void aom_dc_128_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
(void)above;
(void)left;
dc_8x8(dst, stride, NULL, NULL, 0, 0);
}
//------------------------------------------------------------------------------
// DC 16x16
// 'do_above' and 'do_left' facilitate branch removal when inlined.
static INLINE void dc_16x16(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left,
int do_above, int do_left) {
uint16x8_t sum_top;
uint16x8_t sum_left;
uint8x8_t dc0;
if (do_above) {
const uint8x16_t A = vld1q_u8(above); // top row
const uint16x8_t p0 = vpaddlq_u8(A); // cascading summation of the top
const uint16x4_t p1 = vadd_u16(vget_low_u16(p0), vget_high_u16(p0));
const uint16x4_t p2 = vpadd_u16(p1, p1);
const uint16x4_t p3 = vpadd_u16(p2, p2);
sum_top = vcombine_u16(p3, p3);
}
if (do_left) {
const uint8x16_t L = vld1q_u8(left); // left border
const uint16x8_t p0 = vpaddlq_u8(L); // cascading summation of the left
const uint16x4_t p1 = vadd_u16(vget_low_u16(p0), vget_high_u16(p0));
const uint16x4_t p2 = vpadd_u16(p1, p1);
const uint16x4_t p3 = vpadd_u16(p2, p2);
sum_left = vcombine_u16(p3, p3);
}
if (do_above && do_left) {
const uint16x8_t sum = vaddq_u16(sum_left, sum_top);
dc0 = vrshrn_n_u16(sum, 5);
} else if (do_above) {
dc0 = vrshrn_n_u16(sum_top, 4);
} else if (do_left) {
dc0 = vrshrn_n_u16(sum_left, 4);
} else {
dc0 = vdup_n_u8(0x80);
}
{
const uint8x16_t dc = vdupq_lane_u8(dc0, 0);
int i;
for (i = 0; i < 16; ++i) {
vst1q_u8(dst + i * stride, dc);
}
}
}
void aom_dc_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
dc_16x16(dst, stride, above, left, 1, 1);
}
void aom_dc_left_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above,
const uint8_t *left) {
(void)above;
dc_16x16(dst, stride, NULL, left, 0, 1);
}
void aom_dc_top_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above,
const uint8_t *left) {
(void)left;
dc_16x16(dst, stride, above, NULL, 1, 0);
}
void aom_dc_128_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above,
const uint8_t *left) {
(void)above;
(void)left;
dc_16x16(dst, stride, NULL, NULL, 0, 0);
}
//------------------------------------------------------------------------------
// DC 32x32
// 'do_above' and 'do_left' facilitate branch removal when inlined.
static INLINE void dc_32x32(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left,
int do_above, int do_left) {
uint16x8_t sum_top;
uint16x8_t sum_left;
uint8x8_t dc0;
if (do_above) {
const uint8x16_t A0 = vld1q_u8(above); // top row
const uint8x16_t A1 = vld1q_u8(above + 16);
const uint16x8_t p0 = vpaddlq_u8(A0); // cascading summation of the top
const uint16x8_t p1 = vpaddlq_u8(A1);
const uint16x8_t p2 = vaddq_u16(p0, p1);
const uint16x4_t p3 = vadd_u16(vget_low_u16(p2), vget_high_u16(p2));
const uint16x4_t p4 = vpadd_u16(p3, p3);
const uint16x4_t p5 = vpadd_u16(p4, p4);
sum_top = vcombine_u16(p5, p5);
}
if (do_left) {
const uint8x16_t L0 = vld1q_u8(left); // left border
const uint8x16_t L1 = vld1q_u8(left + 16);
const uint16x8_t p0 = vpaddlq_u8(L0); // cascading summation of the left
const uint16x8_t p1 = vpaddlq_u8(L1);
const uint16x8_t p2 = vaddq_u16(p0, p1);
const uint16x4_t p3 = vadd_u16(vget_low_u16(p2), vget_high_u16(p2));
const uint16x4_t p4 = vpadd_u16(p3, p3);
const uint16x4_t p5 = vpadd_u16(p4, p4);
sum_left = vcombine_u16(p5, p5);
}
if (do_above && do_left) {
const uint16x8_t sum = vaddq_u16(sum_left, sum_top);
dc0 = vrshrn_n_u16(sum, 6);
} else if (do_above) {
dc0 = vrshrn_n_u16(sum_top, 5);
} else if (do_left) {
dc0 = vrshrn_n_u16(sum_left, 5);
} else {
dc0 = vdup_n_u8(0x80);
}
{
const uint8x16_t dc = vdupq_lane_u8(dc0, 0);
int i;
for (i = 0; i < 32; ++i) {
vst1q_u8(dst + i * stride, dc);
vst1q_u8(dst + i * stride + 16, dc);
}
}
}
void aom_dc_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
dc_32x32(dst, stride, above, left, 1, 1);
}
void aom_dc_left_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above,
const uint8_t *left) {
(void)above;
dc_32x32(dst, stride, NULL, left, 0, 1);
}
void aom_dc_top_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above,
const uint8_t *left) {
(void)left;
dc_32x32(dst, stride, above, NULL, 1, 0);
}
void aom_dc_128_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above,
const uint8_t *left) {
(void)above;
(void)left;
dc_32x32(dst, stride, NULL, NULL, 0, 0);
}
// -----------------------------------------------------------------------------
void aom_d45_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
const uint64x1_t A0 = vreinterpret_u64_u8(vld1_u8(above)); // top row
const uint64x1_t A1 = vshr_n_u64(A0, 8);
const uint64x1_t A2 = vshr_n_u64(A0, 16);
const uint8x8_t ABCDEFGH = vreinterpret_u8_u64(A0);
const uint8x8_t BCDEFGH0 = vreinterpret_u8_u64(A1);
const uint8x8_t CDEFGH00 = vreinterpret_u8_u64(A2);
const uint8x8_t avg1 = vhadd_u8(ABCDEFGH, CDEFGH00);
const uint8x8_t avg2 = vrhadd_u8(avg1, BCDEFGH0);
const uint64x1_t avg2_u64 = vreinterpret_u64_u8(avg2);
const uint32x2_t r0 = vreinterpret_u32_u8(avg2);
const uint32x2_t r1 = vreinterpret_u32_u64(vshr_n_u64(avg2_u64, 8));
const uint32x2_t r2 = vreinterpret_u32_u64(vshr_n_u64(avg2_u64, 16));
const uint32x2_t r3 = vreinterpret_u32_u64(vshr_n_u64(avg2_u64, 24));
(void)left;
vst1_lane_u32((uint32_t *)(dst + 0 * stride), r0, 0);
vst1_lane_u32((uint32_t *)(dst + 1 * stride), r1, 0);
vst1_lane_u32((uint32_t *)(dst + 2 * stride), r2, 0);
vst1_lane_u32((uint32_t *)(dst + 3 * stride), r3, 0);
dst[3 * stride + 3] = above[7];
}
void aom_d45_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
static const uint8_t shuffle1[8] = { 1, 2, 3, 4, 5, 6, 7, 7 };
static const uint8_t shuffle2[8] = { 2, 3, 4, 5, 6, 7, 7, 7 };
const uint8x8_t sh_12345677 = vld1_u8(shuffle1);
const uint8x8_t sh_23456777 = vld1_u8(shuffle2);
const uint8x8_t A0 = vld1_u8(above); // top row
const uint8x8_t A1 = vtbl1_u8(A0, sh_12345677);
const uint8x8_t A2 = vtbl1_u8(A0, sh_23456777);
const uint8x8_t avg1 = vhadd_u8(A0, A2);
uint8x8_t row = vrhadd_u8(avg1, A1);
int i;
(void)left;
for (i = 0; i < 7; ++i) {
vst1_u8(dst + i * stride, row);
row = vtbl1_u8(row, sh_12345677);
}
vst1_u8(dst + i * stride, row);
}
void aom_d45_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
const uint8x16_t A0 = vld1q_u8(above); // top row
const uint8x16_t above_right = vld1q_dup_u8(above + 15);
const uint8x16_t A1 = vextq_u8(A0, above_right, 1);
const uint8x16_t A2 = vextq_u8(A0, above_right, 2);
const uint8x16_t avg1 = vhaddq_u8(A0, A2);
uint8x16_t row = vrhaddq_u8(avg1, A1);
int i;
(void)left;
for (i = 0; i < 15; ++i) {
vst1q_u8(dst + i * stride, row);
row = vextq_u8(row, above_right, 1);
}
vst1q_u8(dst + i * stride, row);
}
// -----------------------------------------------------------------------------
void aom_d135_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
const uint8x8_t XABCD_u8 = vld1_u8(above - 1);
const uint64x1_t XABCD = vreinterpret_u64_u8(XABCD_u8);
const uint64x1_t ____XABC = vshl_n_u64(XABCD, 32);
const uint32x2_t zero = vdup_n_u32(0);
const uint32x2_t IJKL = vld1_lane_u32((const uint32_t *)left, zero, 0);
const uint8x8_t IJKL_u8 = vreinterpret_u8_u32(IJKL);
const uint64x1_t LKJI____ = vreinterpret_u64_u8(vrev32_u8(IJKL_u8));
const uint64x1_t LKJIXABC = vorr_u64(LKJI____, ____XABC);
const uint8x8_t KJIXABC_ = vreinterpret_u8_u64(vshr_n_u64(LKJIXABC, 8));
const uint8x8_t JIXABC__ = vreinterpret_u8_u64(vshr_n_u64(LKJIXABC, 16));
const uint8_t D = vget_lane_u8(XABCD_u8, 4);
const uint8x8_t JIXABCD_ = vset_lane_u8(D, JIXABC__, 6);
const uint8x8_t LKJIXABC_u8 = vreinterpret_u8_u64(LKJIXABC);
const uint8x8_t avg1 = vhadd_u8(JIXABCD_, LKJIXABC_u8);
const uint8x8_t avg2 = vrhadd_u8(avg1, KJIXABC_);
const uint64x1_t avg2_u64 = vreinterpret_u64_u8(avg2);
const uint32x2_t r3 = vreinterpret_u32_u8(avg2);
const uint32x2_t r2 = vreinterpret_u32_u64(vshr_n_u64(avg2_u64, 8));
const uint32x2_t r1 = vreinterpret_u32_u64(vshr_n_u64(avg2_u64, 16));
const uint32x2_t r0 = vreinterpret_u32_u64(vshr_n_u64(avg2_u64, 24));
vst1_lane_u32((uint32_t *)(dst + 0 * stride), r0, 0);
vst1_lane_u32((uint32_t *)(dst + 1 * stride), r1, 0);
vst1_lane_u32((uint32_t *)(dst + 2 * stride), r2, 0);
vst1_lane_u32((uint32_t *)(dst + 3 * stride), r3, 0);
}
#if !HAVE_NEON_ASM
void aom_v_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
int i;
uint32x2_t d0u32 = vdup_n_u32(0);
(void)left;
d0u32 = vld1_lane_u32((const uint32_t *)above, d0u32, 0);
for (i = 0; i < 4; i++, dst += stride)
vst1_lane_u32((uint32_t *)dst, d0u32, 0);
}
void aom_v_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
int i;
uint8x8_t d0u8 = vdup_n_u8(0);
(void)left;
d0u8 = vld1_u8(above);
for (i = 0; i < 8; i++, dst += stride) vst1_u8(dst, d0u8);
}
void aom_v_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
int i;
uint8x16_t q0u8 = vdupq_n_u8(0);
(void)left;
q0u8 = vld1q_u8(above);
for (i = 0; i < 16; i++, dst += stride) vst1q_u8(dst, q0u8);
}
void aom_v_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
int i;
uint8x16_t q0u8 = vdupq_n_u8(0);
uint8x16_t q1u8 = vdupq_n_u8(0);
(void)left;
q0u8 = vld1q_u8(above);
q1u8 = vld1q_u8(above + 16);
for (i = 0; i < 32; i++, dst += stride) {
vst1q_u8(dst, q0u8);
vst1q_u8(dst + 16, q1u8);
}
}
void aom_h_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
uint8x8_t d0u8 = vdup_n_u8(0);
uint32x2_t d1u32 = vdup_n_u32(0);
(void)above;
d1u32 = vld1_lane_u32((const uint32_t *)left, d1u32, 0);
d0u8 = vdup_lane_u8(vreinterpret_u8_u32(d1u32), 0);
vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d0u8), 0);
dst += stride;
d0u8 = vdup_lane_u8(vreinterpret_u8_u32(d1u32), 1);
vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d0u8), 0);
dst += stride;
d0u8 = vdup_lane_u8(vreinterpret_u8_u32(d1u32), 2);
vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d0u8), 0);
dst += stride;
d0u8 = vdup_lane_u8(vreinterpret_u8_u32(d1u32), 3);
vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d0u8), 0);
}
void aom_h_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
uint8x8_t d0u8 = vdup_n_u8(0);
uint64x1_t d1u64 = vdup_n_u64(0);
(void)above;
d1u64 = vld1_u64((const uint64_t *)left);
d0u8 = vdup_lane_u8(vreinterpret_u8_u64(d1u64), 0);
vst1_u8(dst, d0u8);
dst += stride;
d0u8 = vdup_lane_u8(vreinterpret_u8_u64(d1u64), 1);
vst1_u8(dst, d0u8);
dst += stride;
d0u8 = vdup_lane_u8(vreinterpret_u8_u64(d1u64), 2);
vst1_u8(dst, d0u8);
dst += stride;
d0u8 = vdup_lane_u8(vreinterpret_u8_u64(d1u64), 3);
vst1_u8(dst, d0u8);
dst += stride;
d0u8 = vdup_lane_u8(vreinterpret_u8_u64(d1u64), 4);
vst1_u8(dst, d0u8);
dst += stride;
d0u8 = vdup_lane_u8(vreinterpret_u8_u64(d1u64), 5);
vst1_u8(dst, d0u8);
dst += stride;
d0u8 = vdup_lane_u8(vreinterpret_u8_u64(d1u64), 6);
vst1_u8(dst, d0u8);
dst += stride;
d0u8 = vdup_lane_u8(vreinterpret_u8_u64(d1u64), 7);
vst1_u8(dst, d0u8);
}
void aom_h_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
int j;
uint8x8_t d2u8 = vdup_n_u8(0);
uint8x16_t q0u8 = vdupq_n_u8(0);
uint8x16_t q1u8 = vdupq_n_u8(0);
(void)above;
q1u8 = vld1q_u8(left);
d2u8 = vget_low_u8(q1u8);
for (j = 0; j < 2; j++, d2u8 = vget_high_u8(q1u8)) {
q0u8 = vdupq_lane_u8(d2u8, 0);
vst1q_u8(dst, q0u8);
dst += stride;
q0u8 = vdupq_lane_u8(d2u8, 1);
vst1q_u8(dst, q0u8);
dst += stride;
q0u8 = vdupq_lane_u8(d2u8, 2);
vst1q_u8(dst, q0u8);
dst += stride;
q0u8 = vdupq_lane_u8(d2u8, 3);
vst1q_u8(dst, q0u8);
dst += stride;
q0u8 = vdupq_lane_u8(d2u8, 4);
vst1q_u8(dst, q0u8);
dst += stride;
q0u8 = vdupq_lane_u8(d2u8, 5);
vst1q_u8(dst, q0u8);
dst += stride;
q0u8 = vdupq_lane_u8(d2u8, 6);
vst1q_u8(dst, q0u8);
dst += stride;
q0u8 = vdupq_lane_u8(d2u8, 7);
vst1q_u8(dst, q0u8);
dst += stride;
}
}
void aom_h_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
int j, k;
uint8x8_t d2u8 = vdup_n_u8(0);
uint8x16_t q0u8 = vdupq_n_u8(0);
uint8x16_t q1u8 = vdupq_n_u8(0);
(void)above;
for (k = 0; k < 2; k++, left += 16) {
q1u8 = vld1q_u8(left);
d2u8 = vget_low_u8(q1u8);
for (j = 0; j < 2; j++, d2u8 = vget_high_u8(q1u8)) {
q0u8 = vdupq_lane_u8(d2u8, 0);
vst1q_u8(dst, q0u8);
vst1q_u8(dst + 16, q0u8);
dst += stride;
q0u8 = vdupq_lane_u8(d2u8, 1);
vst1q_u8(dst, q0u8);
vst1q_u8(dst + 16, q0u8);
dst += stride;
q0u8 = vdupq_lane_u8(d2u8, 2);
vst1q_u8(dst, q0u8);
vst1q_u8(dst + 16, q0u8);
dst += stride;
q0u8 = vdupq_lane_u8(d2u8, 3);
vst1q_u8(dst, q0u8);
vst1q_u8(dst + 16, q0u8);
dst += stride;
q0u8 = vdupq_lane_u8(d2u8, 4);
vst1q_u8(dst, q0u8);
vst1q_u8(dst + 16, q0u8);
dst += stride;
q0u8 = vdupq_lane_u8(d2u8, 5);
vst1q_u8(dst, q0u8);
vst1q_u8(dst + 16, q0u8);
dst += stride;
q0u8 = vdupq_lane_u8(d2u8, 6);
vst1q_u8(dst, q0u8);
vst1q_u8(dst + 16, q0u8);
dst += stride;
q0u8 = vdupq_lane_u8(d2u8, 7);
vst1q_u8(dst, q0u8);
vst1q_u8(dst + 16, q0u8);
dst += stride;
}
}
}
void aom_tm_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
int i;
uint16x8_t q1u16, q3u16;
int16x8_t q1s16;
uint8x8_t d0u8 = vdup_n_u8(0);
uint32x2_t d2u32 = vdup_n_u32(0);
d0u8 = vld1_dup_u8(above - 1);
d2u32 = vld1_lane_u32((const uint32_t *)above, d2u32, 0);
q3u16 = vsubl_u8(vreinterpret_u8_u32(d2u32), d0u8);
for (i = 0; i < 4; i++, dst += stride) {
q1u16 = vdupq_n_u16((uint16_t)left[i]);
q1s16 =
vaddq_s16(vreinterpretq_s16_u16(q1u16), vreinterpretq_s16_u16(q3u16));
d0u8 = vqmovun_s16(q1s16);
vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d0u8), 0);
}
}
void aom_tm_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
int j;
uint16x8_t q0u16, q3u16, q10u16;
int16x8_t q0s16;
uint16x4_t d20u16;
uint8x8_t d0u8, d2u8, d30u8;
d0u8 = vld1_dup_u8(above - 1);
d30u8 = vld1_u8(left);
d2u8 = vld1_u8(above);
q10u16 = vmovl_u8(d30u8);
q3u16 = vsubl_u8(d2u8, d0u8);
d20u16 = vget_low_u16(q10u16);
for (j = 0; j < 2; j++, d20u16 = vget_high_u16(q10u16)) {
q0u16 = vdupq_lane_u16(d20u16, 0);
q0s16 =
vaddq_s16(vreinterpretq_s16_u16(q3u16), vreinterpretq_s16_u16(q0u16));
d0u8 = vqmovun_s16(q0s16);
vst1_u64((uint64_t *)dst, vreinterpret_u64_u8(d0u8));
dst += stride;
q0u16 = vdupq_lane_u16(d20u16, 1);
q0s16 =
vaddq_s16(vreinterpretq_s16_u16(q3u16), vreinterpretq_s16_u16(q0u16));
d0u8 = vqmovun_s16(q0s16);
vst1_u64((uint64_t *)dst, vreinterpret_u64_u8(d0u8));
dst += stride;
q0u16 = vdupq_lane_u16(d20u16, 2);
q0s16 =
vaddq_s16(vreinterpretq_s16_u16(q3u16), vreinterpretq_s16_u16(q0u16));
d0u8 = vqmovun_s16(q0s16);
vst1_u64((uint64_t *)dst, vreinterpret_u64_u8(d0u8));
dst += stride;
q0u16 = vdupq_lane_u16(d20u16, 3);
q0s16 =
vaddq_s16(vreinterpretq_s16_u16(q3u16), vreinterpretq_s16_u16(q0u16));
d0u8 = vqmovun_s16(q0s16);
vst1_u64((uint64_t *)dst, vreinterpret_u64_u8(d0u8));
dst += stride;
}
}
void aom_tm_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
int j, k;
uint16x8_t q0u16, q2u16, q3u16, q8u16, q10u16;
uint8x16_t q0u8, q1u8;
int16x8_t q0s16, q1s16, q8s16, q11s16;
uint16x4_t d20u16;
uint8x8_t d2u8, d3u8, d18u8, d22u8, d23u8;
q0u8 = vld1q_dup_u8(above - 1);
q1u8 = vld1q_u8(above);
q2u16 = vsubl_u8(vget_low_u8(q1u8), vget_low_u8(q0u8));
q3u16 = vsubl_u8(vget_high_u8(q1u8), vget_high_u8(q0u8));
for (k = 0; k < 2; k++, left += 8) {
d18u8 = vld1_u8(left);
q10u16 = vmovl_u8(d18u8);
d20u16 = vget_low_u16(q10u16);
for (j = 0; j < 2; j++, d20u16 = vget_high_u16(q10u16)) {
q0u16 = vdupq_lane_u16(d20u16, 0);
q8u16 = vdupq_lane_u16(d20u16, 1);
q1s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q2u16));
q0s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q3u16));
q11s16 =
vaddq_s16(vreinterpretq_s16_u16(q8u16), vreinterpretq_s16_u16(q2u16));
q8s16 =
vaddq_s16(vreinterpretq_s16_u16(q8u16), vreinterpretq_s16_u16(q3u16));
d2u8 = vqmovun_s16(q1s16);
d3u8 = vqmovun_s16(q0s16);
d22u8 = vqmovun_s16(q11s16);
d23u8 = vqmovun_s16(q8s16);
vst1_u64((uint64_t *)dst, vreinterpret_u64_u8(d2u8));
vst1_u64((uint64_t *)(dst + 8), vreinterpret_u64_u8(d3u8));
dst += stride;
vst1_u64((uint64_t *)dst, vreinterpret_u64_u8(d22u8));
vst1_u64((uint64_t *)(dst + 8), vreinterpret_u64_u8(d23u8));
dst += stride;
q0u16 = vdupq_lane_u16(d20u16, 2);
q8u16 = vdupq_lane_u16(d20u16, 3);
q1s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q2u16));
q0s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q3u16));
q11s16 =
vaddq_s16(vreinterpretq_s16_u16(q8u16), vreinterpretq_s16_u16(q2u16));
q8s16 =
vaddq_s16(vreinterpretq_s16_u16(q8u16), vreinterpretq_s16_u16(q3u16));
d2u8 = vqmovun_s16(q1s16);
d3u8 = vqmovun_s16(q0s16);
d22u8 = vqmovun_s16(q11s16);
d23u8 = vqmovun_s16(q8s16);
vst1_u64((uint64_t *)dst, vreinterpret_u64_u8(d2u8));
vst1_u64((uint64_t *)(dst + 8), vreinterpret_u64_u8(d3u8));
dst += stride;
vst1_u64((uint64_t *)dst, vreinterpret_u64_u8(d22u8));
vst1_u64((uint64_t *)(dst + 8), vreinterpret_u64_u8(d23u8));
dst += stride;
}
}
}
void aom_tm_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
int j, k;
uint16x8_t q0u16, q3u16, q8u16, q9u16, q10u16, q11u16;
uint8x16_t q0u8, q1u8, q2u8;
int16x8_t q12s16, q13s16, q14s16, q15s16;
uint16x4_t d6u16;
uint8x8_t d0u8, d1u8, d2u8, d3u8, d26u8;
q0u8 = vld1q_dup_u8(above - 1);
q1u8 = vld1q_u8(above);
q2u8 = vld1q_u8(above + 16);
q8u16 = vsubl_u8(vget_low_u8(q1u8), vget_low_u8(q0u8));
q9u16 = vsubl_u8(vget_high_u8(q1u8), vget_high_u8(q0u8));
q10u16 = vsubl_u8(vget_low_u8(q2u8), vget_low_u8(q0u8));
q11u16 = vsubl_u8(vget_high_u8(q2u8), vget_high_u8(q0u8));
for (k = 0; k < 4; k++, left += 8) {
d26u8 = vld1_u8(left);
q3u16 = vmovl_u8(d26u8);
d6u16 = vget_low_u16(q3u16);
for (j = 0; j < 2; j++, d6u16 = vget_high_u16(q3u16)) {
q0u16 = vdupq_lane_u16(d6u16, 0);
q12s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q8u16));
q13s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q9u16));
q14s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q10u16));
q15s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q11u16));
d0u8 = vqmovun_s16(q12s16);
d1u8 = vqmovun_s16(q13s16);
d2u8 = vqmovun_s16(q14s16);
d3u8 = vqmovun_s16(q15s16);
q0u8 = vcombine_u8(d0u8, d1u8);
q1u8 = vcombine_u8(d2u8, d3u8);
vst1q_u64((uint64_t *)dst, vreinterpretq_u64_u8(q0u8));
vst1q_u64((uint64_t *)(dst + 16), vreinterpretq_u64_u8(q1u8));
dst += stride;
q0u16 = vdupq_lane_u16(d6u16, 1);
q12s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q8u16));
q13s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q9u16));
q14s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q10u16));
q15s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q11u16));
d0u8 = vqmovun_s16(q12s16);
d1u8 = vqmovun_s16(q13s16);
d2u8 = vqmovun_s16(q14s16);
d3u8 = vqmovun_s16(q15s16);
q0u8 = vcombine_u8(d0u8, d1u8);
q1u8 = vcombine_u8(d2u8, d3u8);
vst1q_u64((uint64_t *)dst, vreinterpretq_u64_u8(q0u8));
vst1q_u64((uint64_t *)(dst + 16), vreinterpretq_u64_u8(q1u8));
dst += stride;
q0u16 = vdupq_lane_u16(d6u16, 2);
q12s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q8u16));
q13s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q9u16));
q14s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q10u16));
q15s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q11u16));
d0u8 = vqmovun_s16(q12s16);
d1u8 = vqmovun_s16(q13s16);
d2u8 = vqmovun_s16(q14s16);
d3u8 = vqmovun_s16(q15s16);
q0u8 = vcombine_u8(d0u8, d1u8);
q1u8 = vcombine_u8(d2u8, d3u8);
vst1q_u64((uint64_t *)dst, vreinterpretq_u64_u8(q0u8));
vst1q_u64((uint64_t *)(dst + 16), vreinterpretq_u64_u8(q1u8));
dst += stride;
q0u16 = vdupq_lane_u16(d6u16, 3);
q12s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q8u16));
q13s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q9u16));
q14s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q10u16));
q15s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q11u16));
d0u8 = vqmovun_s16(q12s16);
d1u8 = vqmovun_s16(q13s16);
d2u8 = vqmovun_s16(q14s16);
d3u8 = vqmovun_s16(q15s16);
q0u8 = vcombine_u8(d0u8, d1u8);
q1u8 = vcombine_u8(d2u8, d3u8);
vst1q_u64((uint64_t *)dst, vreinterpretq_u64_u8(q0u8));
vst1q_u64((uint64_t *)(dst + 16), vreinterpretq_u64_u8(q1u8));
dst += stride;
}
}
}
#endif // !HAVE_NEON_ASM
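
For readers comparing the NEON TM predictors above against plain C, the following is a minimal scalar sketch of the same arithmetic. It assumes the usual TrueMotion rule (left pixel plus above pixel minus the top-left pixel, clamped to 8 bits), which is what the `vsubl_u8` / `vaddq_s16` / `vqmovun_s16` sequence computes. The names `clip_pixel` and `tm_predictor_ref` and the `bs` block-size parameter are illustrative and do not come from the deleted file.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the 8-bit pixel range, like the saturating vqmovun_s16. */
static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical scalar reference (name and bs parameter are assumptions):
 * TM prediction for a bs x bs block. above[-1] is the top-left pixel,
 * matching the vld1_dup_u8(above - 1) load in the NEON versions. */
static void tm_predictor_ref(uint8_t *dst, ptrdiff_t stride, int bs,
                             const uint8_t *above, const uint8_t *left) {
  const int top_left = above[-1];
  for (int r = 0; r < bs; ++r, dst += stride) {
    for (int c = 0; c < bs; ++c) {
      /* The NEON code precomputes (above[c] - top_left) once per block
       * and adds the broadcast left[r] for each row. */
      dst[c] = clip_pixel(left[r] + above[c] - top_left);
    }
  }
}
```

The NEON versions hoist the `above[c] - top_left` difference out of the row loop, so each row costs one broadcast, one add and one saturating narrow per 8 pixels.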


@@ -1,202 +0,0 @@
;
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
;
EXPORT |aom_lpf_horizontal_4_dual_neon|
ARM
AREA ||.text||, CODE, READONLY, ALIGN=2
;void aom_lpf_horizontal_4_dual_neon(uint8_t *s, int p,
; const uint8_t *blimit0,
; const uint8_t *limit0,
; const uint8_t *thresh0,
; const uint8_t *blimit1,
; const uint8_t *limit1,
; const uint8_t *thresh1)
; r0 uint8_t *s,
; r1 int p,
; r2 const uint8_t *blimit0,
; r3 const uint8_t *limit0,
; sp const uint8_t *thresh0,
; sp+4 const uint8_t *blimit1,
; sp+8 const uint8_t *limit1,
; sp+12 const uint8_t *thresh1,
|aom_lpf_horizontal_4_dual_neon| PROC
push {lr}
ldr r12, [sp, #4] ; load thresh0
vld1.8 {d0}, [r2] ; load blimit0 to first half q
vld1.8 {d2}, [r3] ; load limit0 to first half q
add r1, r1, r1 ; double pitch
ldr r2, [sp, #8] ; load blimit1
vld1.8 {d4}, [r12] ; load thresh0 to first half q
ldr r3, [sp, #12] ; load limit1
ldr r12, [sp, #16] ; load thresh1
vld1.8 {d1}, [r2] ; load blimit1 to 2nd half q
sub r2, r0, r1, lsl #1 ; s[-4 * p]
vld1.8 {d3}, [r3] ; load limit1 to 2nd half q
vld1.8 {d5}, [r12] ; load thresh1 to 2nd half q
vpush {d8-d15} ; save neon registers
add r3, r2, r1, lsr #1 ; s[-3 * p]
vld1.u8 {q3}, [r2@64], r1 ; p3
vld1.u8 {q4}, [r3@64], r1 ; p2
vld1.u8 {q5}, [r2@64], r1 ; p1
vld1.u8 {q6}, [r3@64], r1 ; p0
vld1.u8 {q7}, [r2@64], r1 ; q0
vld1.u8 {q8}, [r3@64], r1 ; q1
vld1.u8 {q9}, [r2@64] ; q2
vld1.u8 {q10}, [r3@64] ; q3
sub r2, r2, r1, lsl #1
sub r3, r3, r1, lsl #1
bl aom_loop_filter_neon_16
vst1.u8 {q5}, [r2@64], r1 ; store op1
vst1.u8 {q6}, [r3@64], r1 ; store op0
vst1.u8 {q7}, [r2@64], r1 ; store oq0
vst1.u8 {q8}, [r3@64], r1 ; store oq1
vpop {d8-d15} ; restore neon registers
pop {pc}
ENDP ; |aom_lpf_horizontal_4_dual_neon|
; void aom_loop_filter_neon_16();
; This is a helper function for the loopfilters. The individual functions do the
; necessary load, transpose (if necessary) and store. This function uses
; registers d8-d15, so the calling function must save those registers.
;
; r0-r3, r12 PRESERVE
; q0 blimit
; q1 limit
; q2 thresh
; q3 p3
; q4 p2
; q5 p1
; q6 p0
; q7 q0
; q8 q1
; q9 q2
; q10 q3
;
; Outputs:
; q5 op1
; q6 op0
; q7 oq0
; q8 oq1
|aom_loop_filter_neon_16| PROC
; filter_mask
vabd.u8 q11, q3, q4 ; m1 = abs(p3 - p2)
vabd.u8 q12, q4, q5 ; m2 = abs(p2 - p1)
vabd.u8 q13, q5, q6 ; m3 = abs(p1 - p0)
vabd.u8 q14, q8, q7 ; m4 = abs(q1 - q0)
vabd.u8 q3, q9, q8 ; m5 = abs(q2 - q1)
vabd.u8 q4, q10, q9 ; m6 = abs(q3 - q2)
; only compare the largest value to limit
vmax.u8 q11, q11, q12 ; m7 = max(m1, m2)
vmax.u8 q12, q13, q14 ; m8 = max(m3, m4)
vabd.u8 q9, q6, q7 ; abs(p0 - q0)
vmax.u8 q3, q3, q4 ; m9 = max(m5, m6)
vmov.u8 q10, #0x80
vmax.u8 q15, q11, q12 ; m10 = max(m7, m8)
vcgt.u8 q13, q13, q2 ; (abs(p1 - p0) > thresh)*-1
vcgt.u8 q14, q14, q2 ; (abs(q1 - q0) > thresh)*-1
vmax.u8 q15, q15, q3 ; m11 = max(m10, m9)
vabd.u8 q2, q5, q8 ; a = abs(p1 - q1)
vqadd.u8 q9, q9, q9 ; b = abs(p0 - q0) * 2
veor q7, q7, q10 ; qs0
vcge.u8 q15, q1, q15 ; (limit >= m11)*-1
vshr.u8 q2, q2, #1 ; a = a / 2
veor q6, q6, q10 ; ps0
veor q5, q5, q10 ; ps1
vqadd.u8 q9, q9, q2 ; a = b + a
veor q8, q8, q10 ; qs1
vmov.u16 q4, #3
vsubl.s8 q2, d14, d12 ; ( qs0 - ps0)
vsubl.s8 q11, d15, d13
vcge.u8 q9, q0, q9 ; (blimit >= a)*-1
vqsub.s8 q1, q5, q8 ; filter = clamp(ps1-qs1)
vorr q14, q13, q14 ; hev
vmul.i16 q2, q2, q4 ; 3 * ( qs0 - ps0)
vmul.i16 q11, q11, q4
vand q1, q1, q14 ; filter &= hev
vand q15, q15, q9 ; mask
vmov.u8 q4, #3
vaddw.s8 q2, q2, d2 ; filter + 3 * (qs0 - ps0)
vaddw.s8 q11, q11, d3
vmov.u8 q9, #4
; filter = clamp(filter + 3 * ( qs0 - ps0))
vqmovn.s16 d2, q2
vqmovn.s16 d3, q11
vand q1, q1, q15 ; filter &= mask
vqadd.s8 q2, q1, q4 ; filter2 = clamp(filter+3)
vqadd.s8 q1, q1, q9 ; filter1 = clamp(filter+4)
vshr.s8 q2, q2, #3 ; filter2 >>= 3
vshr.s8 q1, q1, #3 ; filter1 >>= 3
vqadd.s8 q11, q6, q2 ; u = clamp(ps0 + filter2)
vqsub.s8 q0, q7, q1 ; u = clamp(qs0 - filter1)
; outer tap adjustments
vrshr.s8 q1, q1, #1 ; filter = ++filter1 >> 1
veor q7, q0, q10 ; *oq0 = u^0x80
vbic q1, q1, q14 ; filter &= ~hev
vqadd.s8 q13, q5, q1 ; u = clamp(ps1 + filter)
vqsub.s8 q12, q8, q1 ; u = clamp(qs1 - filter)
veor q6, q11, q10 ; *op0 = u^0x80
veor q5, q13, q10 ; *op1 = u^0x80
veor q8, q12, q10 ; *oq1 = u^0x80
bx lr
ENDP ; |aom_loop_filter_neon_16|
END


@@ -1,174 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_dsp_rtcd.h"
#include "./aom_config.h"
#include "aom/aom_integer.h"
static INLINE void loop_filter_neon_16(uint8x16_t qblimit, // blimit
uint8x16_t qlimit, // limit
uint8x16_t qthresh, // thresh
uint8x16_t q3, // p3
uint8x16_t q4, // p2
uint8x16_t q5, // p1
uint8x16_t q6, // p0
uint8x16_t q7, // q0
uint8x16_t q8, // q1
uint8x16_t q9, // q2
uint8x16_t q10, // q3
uint8x16_t *q5r, // p1
uint8x16_t *q6r, // p0
uint8x16_t *q7r, // q0
uint8x16_t *q8r) { // q1
uint8x16_t q1u8, q2u8, q11u8, q12u8, q13u8, q14u8, q15u8;
int16x8_t q2s16, q11s16;
uint16x8_t q4u16;
int8x16_t q0s8, q1s8, q2s8, q11s8, q12s8, q13s8;
int8x8_t d2s8, d3s8;
q11u8 = vabdq_u8(q3, q4);
q12u8 = vabdq_u8(q4, q5);
q13u8 = vabdq_u8(q5, q6);
q14u8 = vabdq_u8(q8, q7);
q3 = vabdq_u8(q9, q8);
q4 = vabdq_u8(q10, q9);
q11u8 = vmaxq_u8(q11u8, q12u8);
q12u8 = vmaxq_u8(q13u8, q14u8);
q3 = vmaxq_u8(q3, q4);
q15u8 = vmaxq_u8(q11u8, q12u8);
q9 = vabdq_u8(q6, q7);
// hevmask
q13u8 = vcgtq_u8(q13u8, qthresh);
q14u8 = vcgtq_u8(q14u8, qthresh);
q15u8 = vmaxq_u8(q15u8, q3);
q2u8 = vabdq_u8(q5, q8);
q9 = vqaddq_u8(q9, q9);
q15u8 = vcgeq_u8(qlimit, q15u8);
// filter() function
// convert to signed
q10 = vdupq_n_u8(0x80);
q8 = veorq_u8(q8, q10);
q7 = veorq_u8(q7, q10);
q6 = veorq_u8(q6, q10);
q5 = veorq_u8(q5, q10);
q2u8 = vshrq_n_u8(q2u8, 1);
q9 = vqaddq_u8(q9, q2u8);
q2s16 = vsubl_s8(vget_low_s8(vreinterpretq_s8_u8(q7)),
vget_low_s8(vreinterpretq_s8_u8(q6)));
q11s16 = vsubl_s8(vget_high_s8(vreinterpretq_s8_u8(q7)),
vget_high_s8(vreinterpretq_s8_u8(q6)));
q9 = vcgeq_u8(qblimit, q9);
q1s8 = vqsubq_s8(vreinterpretq_s8_u8(q5), vreinterpretq_s8_u8(q8));
q14u8 = vorrq_u8(q13u8, q14u8);
q4u16 = vdupq_n_u16(3);
q2s16 = vmulq_s16(q2s16, vreinterpretq_s16_u16(q4u16));
q11s16 = vmulq_s16(q11s16, vreinterpretq_s16_u16(q4u16));
q1u8 = vandq_u8(vreinterpretq_u8_s8(q1s8), q14u8);
q15u8 = vandq_u8(q15u8, q9);
q1s8 = vreinterpretq_s8_u8(q1u8);
q2s16 = vaddw_s8(q2s16, vget_low_s8(q1s8));
q11s16 = vaddw_s8(q11s16, vget_high_s8(q1s8));
q4 = vdupq_n_u8(3);
q9 = vdupq_n_u8(4);
// filter = clamp(filter + 3 * ( qs0 - ps0))
d2s8 = vqmovn_s16(q2s16);
d3s8 = vqmovn_s16(q11s16);
q1s8 = vcombine_s8(d2s8, d3s8);
q1u8 = vandq_u8(vreinterpretq_u8_s8(q1s8), q15u8);
q1s8 = vreinterpretq_s8_u8(q1u8);
q2s8 = vqaddq_s8(q1s8, vreinterpretq_s8_u8(q4));
q1s8 = vqaddq_s8(q1s8, vreinterpretq_s8_u8(q9));
q2s8 = vshrq_n_s8(q2s8, 3);
q1s8 = vshrq_n_s8(q1s8, 3);
q11s8 = vqaddq_s8(vreinterpretq_s8_u8(q6), q2s8);
q0s8 = vqsubq_s8(vreinterpretq_s8_u8(q7), q1s8);
q1s8 = vrshrq_n_s8(q1s8, 1);
q1s8 = vbicq_s8(q1s8, vreinterpretq_s8_u8(q14u8));
q13s8 = vqaddq_s8(vreinterpretq_s8_u8(q5), q1s8);
q12s8 = vqsubq_s8(vreinterpretq_s8_u8(q8), q1s8);
*q8r = veorq_u8(vreinterpretq_u8_s8(q12s8), q10);
*q7r = veorq_u8(vreinterpretq_u8_s8(q0s8), q10);
*q6r = veorq_u8(vreinterpretq_u8_s8(q11s8), q10);
*q5r = veorq_u8(vreinterpretq_u8_s8(q13s8), q10);
return;
}
void aom_lpf_horizontal_4_dual_neon(
uint8_t *s, int p /* pitch */, const uint8_t *blimit0,
const uint8_t *limit0, const uint8_t *thresh0, const uint8_t *blimit1,
const uint8_t *limit1, const uint8_t *thresh1) {
uint8x8_t dblimit0, dlimit0, dthresh0, dblimit1, dlimit1, dthresh1;
uint8x16_t qblimit, qlimit, qthresh;
uint8x16_t q3u8, q4u8, q5u8, q6u8, q7u8, q8u8, q9u8, q10u8;
dblimit0 = vld1_u8(blimit0);
dlimit0 = vld1_u8(limit0);
dthresh0 = vld1_u8(thresh0);
dblimit1 = vld1_u8(blimit1);
dlimit1 = vld1_u8(limit1);
dthresh1 = vld1_u8(thresh1);
qblimit = vcombine_u8(dblimit0, dblimit1);
qlimit = vcombine_u8(dlimit0, dlimit1);
qthresh = vcombine_u8(dthresh0, dthresh1);
s -= (p << 2);
q3u8 = vld1q_u8(s);
s += p;
q4u8 = vld1q_u8(s);
s += p;
q5u8 = vld1q_u8(s);
s += p;
q6u8 = vld1q_u8(s);
s += p;
q7u8 = vld1q_u8(s);
s += p;
q8u8 = vld1q_u8(s);
s += p;
q9u8 = vld1q_u8(s);
s += p;
q10u8 = vld1q_u8(s);
loop_filter_neon_16(qblimit, qlimit, qthresh, q3u8, q4u8, q5u8, q6u8, q7u8,
q8u8, q9u8, q10u8, &q5u8, &q6u8, &q7u8, &q8u8);
s -= (p * 5);
vst1q_u8(s, q5u8);
s += p;
vst1q_u8(s, q6u8);
s += p;
vst1q_u8(s, q7u8);
s += p;
vst1q_u8(s, q8u8);
return;
}
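
The intrinsics above process 16 pixel positions at once, which makes the per-pixel logic hard to follow. The sketch below restates it for a single position in scalar C, following the comments in the assembly version (filter mask, hev mask, inner and outer tap adjustments). All function names here are hypothetical, and the code assumes two's-complement conversion to `int8_t` and arithmetic right shift of negative values, as on common compilers. The NEON code uses saturating unsigned adds for `abs(p0 - q0) * 2 + abs(p1 - q1) / 2`, whereas this sketch uses plain `int` arithmetic.

```c
#include <stdint.h>
#include <stdlib.h>

/* Clamp to the signed 8-bit range, like the vqadd.s8 / vqsub.s8 saturation. */
static int8_t signed_char_clamp(int t) {
  return (int8_t)(t < -128 ? -128 : (t > 127 ? 127 : t));
}

/* Hypothetical helper: returns -1 (all bits set) when every neighbour
 * difference is within limit/blimit, i.e. the edge should be filtered. */
static int8_t filter_mask_ref(uint8_t limit, uint8_t blimit, uint8_t p3,
                              uint8_t p2, uint8_t p1, uint8_t p0, uint8_t q0,
                              uint8_t q1, uint8_t q2, uint8_t q3) {
  int exceeds = 0;
  exceeds |= abs(p3 - p2) > limit;
  exceeds |= abs(p2 - p1) > limit;
  exceeds |= abs(p1 - p0) > limit;
  exceeds |= abs(q1 - q0) > limit;
  exceeds |= abs(q2 - q1) > limit;
  exceeds |= abs(q3 - q2) > limit;
  exceeds |= abs(p0 - q0) * 2 + abs(p1 - q1) / 2 > blimit;
  return (int8_t)(exceeds ? 0 : -1);
}

/* Hypothetical helper: -1 when there is high edge variance, else 0. */
static int8_t hev_mask_ref(uint8_t thresh, uint8_t p1, uint8_t p0, uint8_t q0,
                           uint8_t q1) {
  return (int8_t)((abs(p1 - p0) > thresh || abs(q1 - q0) > thresh) ? -1 : 0);
}

/* Hypothetical scalar counterpart of the 4-tap filter for one position. */
static void filter4_ref(int8_t mask, uint8_t thresh, uint8_t *op1,
                        uint8_t *op0, uint8_t *oq0, uint8_t *oq1) {
  /* XOR with 0x80 converts unsigned pixels to signed, as the veor does. */
  const int8_t ps1 = (int8_t)(*op1 ^ 0x80);
  const int8_t ps0 = (int8_t)(*op0 ^ 0x80);
  const int8_t qs0 = (int8_t)(*oq0 ^ 0x80);
  const int8_t qs1 = (int8_t)(*oq1 ^ 0x80);
  const int8_t hev = hev_mask_ref(thresh, *op1, *op0, *oq0, *oq1);

  /* Outer taps contribute only when hev is set. */
  int8_t filter = (int8_t)(signed_char_clamp(ps1 - qs1) & hev);
  /* Inner taps, gated by the filter mask. */
  filter = (int8_t)(signed_char_clamp(filter + 3 * (qs0 - ps0)) & mask);

  const int8_t filter1 = (int8_t)(signed_char_clamp(filter + 4) >> 3);
  const int8_t filter2 = (int8_t)(signed_char_clamp(filter + 3) >> 3);
  *oq0 = (uint8_t)(signed_char_clamp(qs0 - filter1) ^ 0x80);
  *op0 = (uint8_t)(signed_char_clamp(ps0 + filter2) ^ 0x80);

  /* Outer tap adjustment: rounded half of filter1, skipped when hev is set. */
  filter = (int8_t)(((filter1 + 1) >> 1) & ~hev);
  *oq1 = (uint8_t)(signed_char_clamp(qs1 - filter) ^ 0x80);
  *op1 = (uint8_t)(signed_char_clamp(ps1 + filter) ^ 0x80);
}
```

A caller would compute the mask once per position from the eight pixels straddling the edge and then pass pointers to p1, p0, q0 and q1, which mirrors how the NEON helper receives q3..q10 and writes back q5..q8.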


@@ -1,252 +0,0 @@
;
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
;
EXPORT |aom_lpf_horizontal_4_neon|
EXPORT |aom_lpf_vertical_4_neon|
ARM
AREA ||.text||, CODE, READONLY, ALIGN=2
; Currently aom only works on iterations 8 at a time. The aom loop filter
; works on 16 iterations at a time.
;
; void aom_lpf_horizontal_4_neon(uint8_t *s,
; int p /* pitch */,
; const uint8_t *blimit,
; const uint8_t *limit,
; const uint8_t *thresh)
;
; r0 uint8_t *s,
; r1 int p, /* pitch */
; r2 const uint8_t *blimit,
; r3 const uint8_t *limit,
; sp const uint8_t *thresh,
|aom_lpf_horizontal_4_neon| PROC
push {lr}
vld1.8 {d0[]}, [r2] ; duplicate *blimit
ldr r2, [sp, #4] ; load thresh
add r1, r1, r1 ; double pitch
vld1.8 {d1[]}, [r3] ; duplicate *limit
vld1.8 {d2[]}, [r2] ; duplicate *thresh
sub r2, r0, r1, lsl #1 ; move src pointer down by 4 lines
add r3, r2, r1, lsr #1 ; set to 3 lines down
vld1.u8 {d3}, [r2@64], r1 ; p3
vld1.u8 {d4}, [r3@64], r1 ; p2
vld1.u8 {d5}, [r2@64], r1 ; p1
vld1.u8 {d6}, [r3@64], r1 ; p0
vld1.u8 {d7}, [r2@64], r1 ; q0
vld1.u8 {d16}, [r3@64], r1 ; q1
vld1.u8 {d17}, [r2@64] ; q2
vld1.u8 {d18}, [r3@64] ; q3
sub r2, r2, r1, lsl #1
sub r3, r3, r1, lsl #1
bl aom_loop_filter_neon
vst1.u8 {d4}, [r2@64], r1 ; store op1
vst1.u8 {d5}, [r3@64], r1 ; store op0
vst1.u8 {d6}, [r2@64], r1 ; store oq0
vst1.u8 {d7}, [r3@64], r1 ; store oq1
pop {pc}
ENDP ; |aom_lpf_horizontal_4_neon|
; Currently aom only works on iterations 8 at a time. The aom loop filter
; works on 16 iterations at a time.
;
; void aom_lpf_vertical_4_neon(uint8_t *s,
; int p /* pitch */,
; const uint8_t *blimit,
; const uint8_t *limit,
; const uint8_t *thresh)
;
; r0 uint8_t *s,
; r1 int p, /* pitch */
; r2 const uint8_t *blimit,
; r3 const uint8_t *limit,
; sp const uint8_t *thresh,
|aom_lpf_vertical_4_neon| PROC
push {lr}
vld1.8 {d0[]}, [r2] ; duplicate *blimit
vld1.8 {d1[]}, [r3] ; duplicate *limit
ldr r3, [sp, #4] ; load thresh
sub r2, r0, #4 ; move s pointer down by 4 columns
vld1.8 {d2[]}, [r3] ; duplicate *thresh
vld1.u8 {d3}, [r2], r1 ; load s data
vld1.u8 {d4}, [r2], r1
vld1.u8 {d5}, [r2], r1
vld1.u8 {d6}, [r2], r1
vld1.u8 {d7}, [r2], r1
vld1.u8 {d16}, [r2], r1
vld1.u8 {d17}, [r2], r1
vld1.u8 {d18}, [r2]
;transpose to 8x16 matrix
vtrn.32 d3, d7
vtrn.32 d4, d16
vtrn.32 d5, d17
vtrn.32 d6, d18
vtrn.16 d3, d5
vtrn.16 d4, d6
vtrn.16 d7, d17
vtrn.16 d16, d18
vtrn.8 d3, d4
vtrn.8 d5, d6
vtrn.8 d7, d16
vtrn.8 d17, d18
bl aom_loop_filter_neon
sub r0, r0, #2
;store op1, op0, oq0, oq1
vst4.8 {d4[0], d5[0], d6[0], d7[0]}, [r0], r1
vst4.8 {d4[1], d5[1], d6[1], d7[1]}, [r0], r1
vst4.8 {d4[2], d5[2], d6[2], d7[2]}, [r0], r1
vst4.8 {d4[3], d5[3], d6[3], d7[3]}, [r0], r1
vst4.8 {d4[4], d5[4], d6[4], d7[4]}, [r0], r1
vst4.8 {d4[5], d5[5], d6[5], d7[5]}, [r0], r1
vst4.8 {d4[6], d5[6], d6[6], d7[6]}, [r0], r1
vst4.8 {d4[7], d5[7], d6[7], d7[7]}, [r0]
pop {pc}
ENDP ; |aom_lpf_vertical_4_neon|
; void aom_loop_filter_neon();
; This is a helper function for the loopfilters. The individual functions do the
; necessary load, transpose (if necessary) and store. The function does not use
; registers d8-d15.
;
; Inputs:
; r0-r3, r12 PRESERVE
; d0 blimit
; d1 limit
; d2 thresh
; d3 p3
; d4 p2
; d5 p1
; d6 p0
; d7 q0
; d16 q1
; d17 q2
; d18 q3
;
; Outputs:
; d4 op1
; d5 op0
; d6 oq0
; d7 oq1
|aom_loop_filter_neon| PROC
; filter_mask
vabd.u8 d19, d3, d4 ; m1 = abs(p3 - p2)
vabd.u8 d20, d4, d5 ; m2 = abs(p2 - p1)
vabd.u8 d21, d5, d6 ; m3 = abs(p1 - p0)
vabd.u8 d22, d16, d7 ; m4 = abs(q1 - q0)
vabd.u8 d3, d17, d16 ; m5 = abs(q2 - q1)
vabd.u8 d4, d18, d17 ; m6 = abs(q3 - q2)
; only compare the largest value to limit
vmax.u8 d19, d19, d20 ; m1 = max(m1, m2)
vmax.u8 d20, d21, d22 ; m2 = max(m3, m4)
vabd.u8 d17, d6, d7 ; abs(p0 - q0)
vmax.u8 d3, d3, d4 ; m3 = max(m5, m6)
vmov.u8 d18, #0x80
vmax.u8 d23, d19, d20 ; m1 = max(m1, m2)
; hevmask
vcgt.u8 d21, d21, d2 ; (abs(p1 - p0) > thresh)*-1
vcgt.u8 d22, d22, d2 ; (abs(q1 - q0) > thresh)*-1
vmax.u8 d23, d23, d3 ; m1 = max(m1, m3)
vabd.u8 d28, d5, d16 ; a = abs(p1 - q1)
vqadd.u8 d17, d17, d17 ; b = abs(p0 - q0) * 2
veor d7, d7, d18 ; qs0
vcge.u8 d23, d1, d23 ; (limit >= m1)*-1
; filter() function
; convert to signed
vshr.u8 d28, d28, #1 ; a = a / 2
veor d6, d6, d18 ; ps0
veor d5, d5, d18 ; ps1
vqadd.u8 d17, d17, d28 ; a = b + a
veor d16, d16, d18 ; qs1
vmov.u8 d19, #3
vsub.s8 d28, d7, d6 ; ( qs0 - ps0)
vcge.u8 d17, d0, d17 ; (blimit >= a)*-1
vqsub.s8 d27, d5, d16 ; filter = clamp(ps1-qs1)
vorr d22, d21, d22 ; hevmask
vmull.s8 q12, d28, d19 ; 3 * ( qs0 - ps0)
vand d27, d27, d22 ; filter &= hev
vand d23, d23, d17 ; filter_mask
vaddw.s8 q12, q12, d27 ; filter + 3 * (qs0 - ps0)
vmov.u8 d17, #4
; filter = clamp(filter + 3 * ( qs0 - ps0))
vqmovn.s16 d27, q12
vand d27, d27, d23 ; filter &= mask
vqadd.s8 d28, d27, d19 ; filter2 = clamp(filter+3)
vqadd.s8 d27, d27, d17 ; filter1 = clamp(filter+4)
vshr.s8 d28, d28, #3 ; filter2 >>= 3
vshr.s8 d27, d27, #3 ; filter1 >>= 3
vqadd.s8 d19, d6, d28 ; u = clamp(ps0 + filter2)
vqsub.s8 d26, d7, d27 ; u = clamp(qs0 - filter1)
; outer tap adjustments
vrshr.s8 d27, d27, #1 ; filter = ++filter1 >> 1
veor d6, d26, d18 ; *oq0 = u^0x80
vbic d27, d27, d22 ; filter &= ~hev
vqadd.s8 d21, d5, d27 ; u = clamp(ps1 + filter)
vqsub.s8 d20, d16, d27 ; u = clamp(qs1 - filter)
veor d5, d19, d18 ; *op0 = u^0x80
veor d4, d21, d18 ; *op1 = u^0x80
veor d7, d20, d18 ; *oq1 = u^0x80
bx lr
ENDP ; |aom_loop_filter_neon|
END


@@ -1,250 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_dsp_rtcd.h"
static INLINE void loop_filter_neon(uint8x8_t dblimit, // blimit
uint8x8_t dlimit, // limit
uint8x8_t dthresh, // thresh
uint8x8_t d3u8, // p3
uint8x8_t d4u8, // p2
uint8x8_t d5u8, // p1
uint8x8_t d6u8, // p0
uint8x8_t d7u8, // q0
uint8x8_t d16u8, // q1
uint8x8_t d17u8, // q2
uint8x8_t d18u8, // q3
uint8x8_t *d4ru8, // p1
uint8x8_t *d5ru8, // p0
uint8x8_t *d6ru8, // q0
uint8x8_t *d7ru8) { // q1
uint8x8_t d19u8, d20u8, d21u8, d22u8, d23u8, d27u8, d28u8;
int16x8_t q12s16;
int8x8_t d19s8, d20s8, d21s8, d26s8, d27s8, d28s8;
d19u8 = vabd_u8(d3u8, d4u8);
d20u8 = vabd_u8(d4u8, d5u8);
d21u8 = vabd_u8(d5u8, d6u8);
d22u8 = vabd_u8(d16u8, d7u8);
d3u8 = vabd_u8(d17u8, d16u8);
d4u8 = vabd_u8(d18u8, d17u8);
d19u8 = vmax_u8(d19u8, d20u8);
d20u8 = vmax_u8(d21u8, d22u8);
d3u8 = vmax_u8(d3u8, d4u8);
d23u8 = vmax_u8(d19u8, d20u8);
d17u8 = vabd_u8(d6u8, d7u8);
d21u8 = vcgt_u8(d21u8, dthresh);
d22u8 = vcgt_u8(d22u8, dthresh);
d23u8 = vmax_u8(d23u8, d3u8);
d28u8 = vabd_u8(d5u8, d16u8);
d17u8 = vqadd_u8(d17u8, d17u8);
d23u8 = vcge_u8(dlimit, d23u8);
d18u8 = vdup_n_u8(0x80);
d5u8 = veor_u8(d5u8, d18u8);
d6u8 = veor_u8(d6u8, d18u8);
d7u8 = veor_u8(d7u8, d18u8);
d16u8 = veor_u8(d16u8, d18u8);
d28u8 = vshr_n_u8(d28u8, 1);
d17u8 = vqadd_u8(d17u8, d28u8);
d19u8 = vdup_n_u8(3);
d28s8 = vsub_s8(vreinterpret_s8_u8(d7u8), vreinterpret_s8_u8(d6u8));
d17u8 = vcge_u8(dblimit, d17u8);
d27s8 = vqsub_s8(vreinterpret_s8_u8(d5u8), vreinterpret_s8_u8(d16u8));
d22u8 = vorr_u8(d21u8, d22u8);
q12s16 = vmull_s8(d28s8, vreinterpret_s8_u8(d19u8));
d27u8 = vand_u8(vreinterpret_u8_s8(d27s8), d22u8);
d23u8 = vand_u8(d23u8, d17u8);
q12s16 = vaddw_s8(q12s16, vreinterpret_s8_u8(d27u8));
d17u8 = vdup_n_u8(4);
d27s8 = vqmovn_s16(q12s16);
d27u8 = vand_u8(vreinterpret_u8_s8(d27s8), d23u8);
d27s8 = vreinterpret_s8_u8(d27u8);
d28s8 = vqadd_s8(d27s8, vreinterpret_s8_u8(d19u8));
d27s8 = vqadd_s8(d27s8, vreinterpret_s8_u8(d17u8));
d28s8 = vshr_n_s8(d28s8, 3);
d27s8 = vshr_n_s8(d27s8, 3);
d19s8 = vqadd_s8(vreinterpret_s8_u8(d6u8), d28s8);
d26s8 = vqsub_s8(vreinterpret_s8_u8(d7u8), d27s8);
d27s8 = vrshr_n_s8(d27s8, 1);
d27s8 = vbic_s8(d27s8, vreinterpret_s8_u8(d22u8));
d21s8 = vqadd_s8(vreinterpret_s8_u8(d5u8), d27s8);
d20s8 = vqsub_s8(vreinterpret_s8_u8(d16u8), d27s8);
*d4ru8 = veor_u8(vreinterpret_u8_s8(d21s8), d18u8);
*d5ru8 = veor_u8(vreinterpret_u8_s8(d19s8), d18u8);
*d6ru8 = veor_u8(vreinterpret_u8_s8(d26s8), d18u8);
*d7ru8 = veor_u8(vreinterpret_u8_s8(d20s8), d18u8);
return;
}
void aom_lpf_horizontal_4_neon(uint8_t *src, int pitch, const uint8_t *blimit,
const uint8_t *limit, const uint8_t *thresh) {
int i;
uint8_t *s, *psrc;
uint8x8_t dblimit, dlimit, dthresh;
uint8x8_t d3u8, d4u8, d5u8, d6u8, d7u8, d16u8, d17u8, d18u8;
dblimit = vld1_u8(blimit);
dlimit = vld1_u8(limit);
dthresh = vld1_u8(thresh);
psrc = src - (pitch << 2);
for (i = 0; i < 1; i++) {
s = psrc + i * 8;
d3u8 = vld1_u8(s);
s += pitch;
d4u8 = vld1_u8(s);
s += pitch;
d5u8 = vld1_u8(s);
s += pitch;
d6u8 = vld1_u8(s);
s += pitch;
d7u8 = vld1_u8(s);
s += pitch;
d16u8 = vld1_u8(s);
s += pitch;
d17u8 = vld1_u8(s);
s += pitch;
d18u8 = vld1_u8(s);
loop_filter_neon(dblimit, dlimit, dthresh, d3u8, d4u8, d5u8, d6u8, d7u8,
d16u8, d17u8, d18u8, &d4u8, &d5u8, &d6u8, &d7u8);
s -= (pitch * 5);
vst1_u8(s, d4u8);
s += pitch;
vst1_u8(s, d5u8);
s += pitch;
vst1_u8(s, d6u8);
s += pitch;
vst1_u8(s, d7u8);
}
return;
}
void aom_lpf_vertical_4_neon(uint8_t *src, int pitch, const uint8_t *blimit,
const uint8_t *limit, const uint8_t *thresh) {
int i, pitch8;
uint8_t *s;
uint8x8_t dblimit, dlimit, dthresh;
uint8x8_t d3u8, d4u8, d5u8, d6u8, d7u8, d16u8, d17u8, d18u8;
uint32x2x2_t d2tmp0, d2tmp1, d2tmp2, d2tmp3;
uint16x4x2_t d2tmp4, d2tmp5, d2tmp6, d2tmp7;
uint8x8x2_t d2tmp8, d2tmp9, d2tmp10, d2tmp11;
uint8x8x4_t d4Result;
dblimit = vld1_u8(blimit);
dlimit = vld1_u8(limit);
dthresh = vld1_u8(thresh);
pitch8 = pitch * 8;
for (i = 0; i < 1; i++, src += pitch8) {
s = src - (i + 1) * 4;
d3u8 = vld1_u8(s);
s += pitch;
d4u8 = vld1_u8(s);
s += pitch;
d5u8 = vld1_u8(s);
s += pitch;
d6u8 = vld1_u8(s);
s += pitch;
d7u8 = vld1_u8(s);
s += pitch;
d16u8 = vld1_u8(s);
s += pitch;
d17u8 = vld1_u8(s);
s += pitch;
d18u8 = vld1_u8(s);
d2tmp0 = vtrn_u32(vreinterpret_u32_u8(d3u8), vreinterpret_u32_u8(d7u8));
d2tmp1 = vtrn_u32(vreinterpret_u32_u8(d4u8), vreinterpret_u32_u8(d16u8));
d2tmp2 = vtrn_u32(vreinterpret_u32_u8(d5u8), vreinterpret_u32_u8(d17u8));
d2tmp3 = vtrn_u32(vreinterpret_u32_u8(d6u8), vreinterpret_u32_u8(d18u8));
d2tmp4 = vtrn_u16(vreinterpret_u16_u32(d2tmp0.val[0]),
vreinterpret_u16_u32(d2tmp2.val[0]));
d2tmp5 = vtrn_u16(vreinterpret_u16_u32(d2tmp1.val[0]),
vreinterpret_u16_u32(d2tmp3.val[0]));
d2tmp6 = vtrn_u16(vreinterpret_u16_u32(d2tmp0.val[1]),
vreinterpret_u16_u32(d2tmp2.val[1]));
d2tmp7 = vtrn_u16(vreinterpret_u16_u32(d2tmp1.val[1]),
vreinterpret_u16_u32(d2tmp3.val[1]));
d2tmp8 = vtrn_u8(vreinterpret_u8_u16(d2tmp4.val[0]),
vreinterpret_u8_u16(d2tmp5.val[0]));
d2tmp9 = vtrn_u8(vreinterpret_u8_u16(d2tmp4.val[1]),
vreinterpret_u8_u16(d2tmp5.val[1]));
d2tmp10 = vtrn_u8(vreinterpret_u8_u16(d2tmp6.val[0]),
vreinterpret_u8_u16(d2tmp7.val[0]));
d2tmp11 = vtrn_u8(vreinterpret_u8_u16(d2tmp6.val[1]),
vreinterpret_u8_u16(d2tmp7.val[1]));
d3u8 = d2tmp8.val[0];
d4u8 = d2tmp8.val[1];
d5u8 = d2tmp9.val[0];
d6u8 = d2tmp9.val[1];
d7u8 = d2tmp10.val[0];
d16u8 = d2tmp10.val[1];
d17u8 = d2tmp11.val[0];
d18u8 = d2tmp11.val[1];
loop_filter_neon(dblimit, dlimit, dthresh, d3u8, d4u8, d5u8, d6u8, d7u8,
d16u8, d17u8, d18u8, &d4u8, &d5u8, &d6u8, &d7u8);
d4Result.val[0] = d4u8;
d4Result.val[1] = d5u8;
d4Result.val[2] = d6u8;
d4Result.val[3] = d7u8;
src -= 2;
vst4_lane_u8(src, d4Result, 0);
src += pitch;
vst4_lane_u8(src, d4Result, 1);
src += pitch;
vst4_lane_u8(src, d4Result, 2);
src += pitch;
vst4_lane_u8(src, d4Result, 3);
src += pitch;
vst4_lane_u8(src, d4Result, 4);
src += pitch;
vst4_lane_u8(src, d4Result, 5);
src += pitch;
vst4_lane_u8(src, d4Result, 6);
src += pitch;
vst4_lane_u8(src, d4Result, 7);
}
return;
}
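
The vertical filter differs from the horizontal one only in how pixels are gathered and written back: eight rows of eight bytes starting at `src - 4` are transposed so that each vector holds one of p3..q3 for all eight rows, and the four filtered columns are interleaved back at `src - 2`. The scalar sketch below illustrates that data movement only. The function names and array layout are assumptions made for illustration and are not part of the deleted file.

```c
#include <stdint.h>

/* Hypothetical scalar equivalent of the vtrn_u32 / vtrn_u16 / vtrn_u8
 * sequence: cols[c][r] receives the byte at column c of row r, so
 * cols[0] is p3 and cols[7] is q3 for all eight rows. */
static void gather_columns_ref(const uint8_t *src, int pitch,
                               uint8_t cols[8][8]) {
  const uint8_t *s = src - 4; /* four pixels left of the edge */
  for (int r = 0; r < 8; ++r, s += pitch) {
    for (int c = 0; c < 8; ++c) cols[c][r] = s[c];
  }
}

/* Hypothetical scalar equivalent of the eight vst4_lane_u8 stores:
 * row r receives out[0][r]..out[3][r] (op1, op0, oq0, oq1) at src - 2. */
static void scatter_columns_ref(uint8_t *src, int pitch,
                                const uint8_t out[4][8]) {
  uint8_t *s = src - 2; /* only the two pixels on each side change */
  for (int r = 0; r < 8; ++r, s += pitch) {
    for (int k = 0; k < 4; ++k) s[k] = out[k][r];
  }
}
```

Only four of the eight gathered columns are written back, which is why the store uses a 4-element interleaved lane store rather than a second full transpose.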


@@ -1,430 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_dsp_rtcd.h"
static INLINE void mbloop_filter_neon(uint8x8_t dblimit, // mblimit
uint8x8_t dlimit, // limit
uint8x8_t dthresh, // thresh
uint8x8_t d3u8, // p3
uint8x8_t d4u8, // p2
uint8x8_t d5u8, // p1
uint8x8_t d6u8, // p0
uint8x8_t d7u8, // q0
uint8x8_t d16u8, // q1
uint8x8_t d17u8, // q2
uint8x8_t d18u8, // q3
uint8x8_t *d0ru8, // p2
uint8x8_t *d1ru8, // p1
uint8x8_t *d2ru8, // p0
uint8x8_t *d3ru8, // q0
uint8x8_t *d4ru8, // q1
uint8x8_t *d5ru8) { // q2
uint32_t flat;
uint8x8_t d0u8, d1u8, d2u8, d19u8, d20u8, d21u8, d22u8, d23u8, d24u8;
uint8x8_t d25u8, d26u8, d27u8, d28u8, d29u8, d30u8, d31u8;
int16x8_t q15s16;
uint16x8_t q10u16, q14u16;
int8x8_t d21s8, d24s8, d25s8, d26s8, d28s8, d29s8, d30s8;
d19u8 = vabd_u8(d3u8, d4u8);
d20u8 = vabd_u8(d4u8, d5u8);
d21u8 = vabd_u8(d5u8, d6u8);
d22u8 = vabd_u8(d16u8, d7u8);
d23u8 = vabd_u8(d17u8, d16u8);
d24u8 = vabd_u8(d18u8, d17u8);
d19u8 = vmax_u8(d19u8, d20u8);
d20u8 = vmax_u8(d21u8, d22u8);
d25u8 = vabd_u8(d6u8, d4u8);
d23u8 = vmax_u8(d23u8, d24u8);
d26u8 = vabd_u8(d7u8, d17u8);
d19u8 = vmax_u8(d19u8, d20u8);
d24u8 = vabd_u8(d6u8, d7u8);
d27u8 = vabd_u8(d3u8, d6u8);
d28u8 = vabd_u8(d18u8, d7u8);
d19u8 = vmax_u8(d19u8, d23u8);
d23u8 = vabd_u8(d5u8, d16u8);
d24u8 = vqadd_u8(d24u8, d24u8);
d19u8 = vcge_u8(dlimit, d19u8);
d25u8 = vmax_u8(d25u8, d26u8);
d26u8 = vmax_u8(d27u8, d28u8);
d23u8 = vshr_n_u8(d23u8, 1);
d25u8 = vmax_u8(d25u8, d26u8);
d24u8 = vqadd_u8(d24u8, d23u8);
d20u8 = vmax_u8(d20u8, d25u8);
d23u8 = vdup_n_u8(1);
d24u8 = vcge_u8(dblimit, d24u8);
d21u8 = vcgt_u8(d21u8, dthresh);
d20u8 = vcge_u8(d23u8, d20u8);
d19u8 = vand_u8(d19u8, d24u8);
d23u8 = vcgt_u8(d22u8, dthresh);
d20u8 = vand_u8(d20u8, d19u8);
d22u8 = vdup_n_u8(0x80);
d23u8 = vorr_u8(d21u8, d23u8);
q10u16 = vcombine_u16(vreinterpret_u16_u8(d20u8), vreinterpret_u16_u8(d21u8));
d30u8 = vshrn_n_u16(q10u16, 4);
flat = vget_lane_u32(vreinterpret_u32_u8(d30u8), 0);
if (flat == 0xffffffff) { // Check for all 1's, power_branch_only
d27u8 = vdup_n_u8(3);
d21u8 = vdup_n_u8(2);
q14u16 = vaddl_u8(d6u8, d7u8);
q14u16 = vmlal_u8(q14u16, d3u8, d27u8);
q14u16 = vmlal_u8(q14u16, d4u8, d21u8);
q14u16 = vaddw_u8(q14u16, d5u8);
*d0ru8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d3u8);
q14u16 = vsubw_u8(q14u16, d4u8);
q14u16 = vaddw_u8(q14u16, d5u8);
q14u16 = vaddw_u8(q14u16, d16u8);
*d1ru8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d3u8);
q14u16 = vsubw_u8(q14u16, d5u8);
q14u16 = vaddw_u8(q14u16, d6u8);
q14u16 = vaddw_u8(q14u16, d17u8);
*d2ru8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d3u8);
q14u16 = vsubw_u8(q14u16, d6u8);
q14u16 = vaddw_u8(q14u16, d7u8);
q14u16 = vaddw_u8(q14u16, d18u8);
*d3ru8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d4u8);
q14u16 = vsubw_u8(q14u16, d7u8);
q14u16 = vaddw_u8(q14u16, d16u8);
q14u16 = vaddw_u8(q14u16, d18u8);
*d4ru8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d5u8);
q14u16 = vsubw_u8(q14u16, d16u8);
q14u16 = vaddw_u8(q14u16, d17u8);
q14u16 = vaddw_u8(q14u16, d18u8);
*d5ru8 = vqrshrn_n_u16(q14u16, 3);
} else {
d21u8 = veor_u8(d7u8, d22u8);
d24u8 = veor_u8(d6u8, d22u8);
d25u8 = veor_u8(d5u8, d22u8);
d26u8 = veor_u8(d16u8, d22u8);
d27u8 = vdup_n_u8(3);
d28s8 = vsub_s8(vreinterpret_s8_u8(d21u8), vreinterpret_s8_u8(d24u8));
d29s8 = vqsub_s8(vreinterpret_s8_u8(d25u8), vreinterpret_s8_u8(d26u8));
q15s16 = vmull_s8(d28s8, vreinterpret_s8_u8(d27u8));
d29s8 = vand_s8(d29s8, vreinterpret_s8_u8(d23u8));
q15s16 = vaddw_s8(q15s16, d29s8);
d29u8 = vdup_n_u8(4);
d28s8 = vqmovn_s16(q15s16);
d28s8 = vand_s8(d28s8, vreinterpret_s8_u8(d19u8));
d30s8 = vqadd_s8(d28s8, vreinterpret_s8_u8(d27u8));
d29s8 = vqadd_s8(d28s8, vreinterpret_s8_u8(d29u8));
d30s8 = vshr_n_s8(d30s8, 3);
d29s8 = vshr_n_s8(d29s8, 3);
d24s8 = vqadd_s8(vreinterpret_s8_u8(d24u8), d30s8);
d21s8 = vqsub_s8(vreinterpret_s8_u8(d21u8), d29s8);
d29s8 = vrshr_n_s8(d29s8, 1);
d29s8 = vbic_s8(d29s8, vreinterpret_s8_u8(d23u8));
d25s8 = vqadd_s8(vreinterpret_s8_u8(d25u8), d29s8);
d26s8 = vqsub_s8(vreinterpret_s8_u8(d26u8), d29s8);
if (flat == 0) { // filter_branch_only
*d0ru8 = d4u8;
*d1ru8 = veor_u8(vreinterpret_u8_s8(d25s8), d22u8);
*d2ru8 = veor_u8(vreinterpret_u8_s8(d24s8), d22u8);
*d3ru8 = veor_u8(vreinterpret_u8_s8(d21s8), d22u8);
*d4ru8 = veor_u8(vreinterpret_u8_s8(d26s8), d22u8);
*d5ru8 = d17u8;
return;
}
d21u8 = veor_u8(vreinterpret_u8_s8(d21s8), d22u8);
d24u8 = veor_u8(vreinterpret_u8_s8(d24s8), d22u8);
d25u8 = veor_u8(vreinterpret_u8_s8(d25s8), d22u8);
d26u8 = veor_u8(vreinterpret_u8_s8(d26s8), d22u8);
d23u8 = vdup_n_u8(2);
q14u16 = vaddl_u8(d6u8, d7u8);
q14u16 = vmlal_u8(q14u16, d3u8, d27u8);
q14u16 = vmlal_u8(q14u16, d4u8, d23u8);
d0u8 = vbsl_u8(d20u8, dblimit, d4u8);
q14u16 = vaddw_u8(q14u16, d5u8);
d1u8 = vbsl_u8(d20u8, dlimit, d25u8);
d30u8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d3u8);
q14u16 = vsubw_u8(q14u16, d4u8);
q14u16 = vaddw_u8(q14u16, d5u8);
q14u16 = vaddw_u8(q14u16, d16u8);
d2u8 = vbsl_u8(d20u8, dthresh, d24u8);
d31u8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d3u8);
q14u16 = vsubw_u8(q14u16, d5u8);
q14u16 = vaddw_u8(q14u16, d6u8);
q14u16 = vaddw_u8(q14u16, d17u8);
*d0ru8 = vbsl_u8(d20u8, d30u8, d0u8);
d23u8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d3u8);
q14u16 = vsubw_u8(q14u16, d6u8);
q14u16 = vaddw_u8(q14u16, d7u8);
*d1ru8 = vbsl_u8(d20u8, d31u8, d1u8);
q14u16 = vaddw_u8(q14u16, d18u8);
*d2ru8 = vbsl_u8(d20u8, d23u8, d2u8);
d22u8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d4u8);
q14u16 = vsubw_u8(q14u16, d7u8);
q14u16 = vaddw_u8(q14u16, d16u8);
d3u8 = vbsl_u8(d20u8, d3u8, d21u8);
q14u16 = vaddw_u8(q14u16, d18u8);
d4u8 = vbsl_u8(d20u8, d4u8, d26u8);
d6u8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d5u8);
q14u16 = vsubw_u8(q14u16, d16u8);
q14u16 = vaddw_u8(q14u16, d17u8);
q14u16 = vaddw_u8(q14u16, d18u8);
d5u8 = vbsl_u8(d20u8, d5u8, d17u8);
d7u8 = vqrshrn_n_u16(q14u16, 3);
*d3ru8 = vbsl_u8(d20u8, d22u8, d3u8);
*d4ru8 = vbsl_u8(d20u8, d6u8, d4u8);
*d5ru8 = vbsl_u8(d20u8, d7u8, d5u8);
}
return;
}
void aom_lpf_horizontal_8_neon(uint8_t *src, int pitch, const uint8_t *blimit,
const uint8_t *limit, const uint8_t *thresh) {
int i;
uint8_t *s, *psrc;
uint8x8_t dblimit, dlimit, dthresh;
uint8x8_t d0u8, d1u8, d2u8, d3u8, d4u8, d5u8, d6u8, d7u8;
uint8x8_t d16u8, d17u8, d18u8;
dblimit = vld1_u8(blimit);
dlimit = vld1_u8(limit);
dthresh = vld1_u8(thresh);
psrc = src - (pitch << 2);
for (i = 0; i < 1; i++) {
s = psrc + i * 8;
d3u8 = vld1_u8(s);
s += pitch;
d4u8 = vld1_u8(s);
s += pitch;
d5u8 = vld1_u8(s);
s += pitch;
d6u8 = vld1_u8(s);
s += pitch;
d7u8 = vld1_u8(s);
s += pitch;
d16u8 = vld1_u8(s);
s += pitch;
d17u8 = vld1_u8(s);
s += pitch;
d18u8 = vld1_u8(s);
mbloop_filter_neon(dblimit, dlimit, dthresh, d3u8, d4u8, d5u8, d6u8, d7u8,
d16u8, d17u8, d18u8, &d0u8, &d1u8, &d2u8, &d3u8, &d4u8,
&d5u8);
s -= (pitch * 6);
vst1_u8(s, d0u8);
s += pitch;
vst1_u8(s, d1u8);
s += pitch;
vst1_u8(s, d2u8);
s += pitch;
vst1_u8(s, d3u8);
s += pitch;
vst1_u8(s, d4u8);
s += pitch;
vst1_u8(s, d5u8);
}
return;
}
void aom_lpf_vertical_8_neon(uint8_t *src, int pitch, const uint8_t *blimit,
const uint8_t *limit, const uint8_t *thresh) {
int i;
uint8_t *s;
uint8x8_t dblimit, dlimit, dthresh;
uint8x8_t d0u8, d1u8, d2u8, d3u8, d4u8, d5u8, d6u8, d7u8;
uint8x8_t d16u8, d17u8, d18u8;
uint32x2x2_t d2tmp0, d2tmp1, d2tmp2, d2tmp3;
uint16x4x2_t d2tmp4, d2tmp5, d2tmp6, d2tmp7;
uint8x8x2_t d2tmp8, d2tmp9, d2tmp10, d2tmp11;
uint8x8x4_t d4Result;
uint8x8x2_t d2Result;
dblimit = vld1_u8(blimit);
dlimit = vld1_u8(limit);
dthresh = vld1_u8(thresh);
for (i = 0; i < 1; i++) {
s = src + (i * (pitch << 3)) - 4;
d3u8 = vld1_u8(s);
s += pitch;
d4u8 = vld1_u8(s);
s += pitch;
d5u8 = vld1_u8(s);
s += pitch;
d6u8 = vld1_u8(s);
s += pitch;
d7u8 = vld1_u8(s);
s += pitch;
d16u8 = vld1_u8(s);
s += pitch;
d17u8 = vld1_u8(s);
s += pitch;
d18u8 = vld1_u8(s);
d2tmp0 = vtrn_u32(vreinterpret_u32_u8(d3u8), vreinterpret_u32_u8(d7u8));
d2tmp1 = vtrn_u32(vreinterpret_u32_u8(d4u8), vreinterpret_u32_u8(d16u8));
d2tmp2 = vtrn_u32(vreinterpret_u32_u8(d5u8), vreinterpret_u32_u8(d17u8));
d2tmp3 = vtrn_u32(vreinterpret_u32_u8(d6u8), vreinterpret_u32_u8(d18u8));
d2tmp4 = vtrn_u16(vreinterpret_u16_u32(d2tmp0.val[0]),
vreinterpret_u16_u32(d2tmp2.val[0]));
d2tmp5 = vtrn_u16(vreinterpret_u16_u32(d2tmp1.val[0]),
vreinterpret_u16_u32(d2tmp3.val[0]));
d2tmp6 = vtrn_u16(vreinterpret_u16_u32(d2tmp0.val[1]),
vreinterpret_u16_u32(d2tmp2.val[1]));
d2tmp7 = vtrn_u16(vreinterpret_u16_u32(d2tmp1.val[1]),
vreinterpret_u16_u32(d2tmp3.val[1]));
d2tmp8 = vtrn_u8(vreinterpret_u8_u16(d2tmp4.val[0]),
vreinterpret_u8_u16(d2tmp5.val[0]));
d2tmp9 = vtrn_u8(vreinterpret_u8_u16(d2tmp4.val[1]),
vreinterpret_u8_u16(d2tmp5.val[1]));
d2tmp10 = vtrn_u8(vreinterpret_u8_u16(d2tmp6.val[0]),
vreinterpret_u8_u16(d2tmp7.val[0]));
d2tmp11 = vtrn_u8(vreinterpret_u8_u16(d2tmp6.val[1]),
vreinterpret_u8_u16(d2tmp7.val[1]));
d3u8 = d2tmp8.val[0];
d4u8 = d2tmp8.val[1];
d5u8 = d2tmp9.val[0];
d6u8 = d2tmp9.val[1];
d7u8 = d2tmp10.val[0];
d16u8 = d2tmp10.val[1];
d17u8 = d2tmp11.val[0];
d18u8 = d2tmp11.val[1];
mbloop_filter_neon(dblimit, dlimit, dthresh, d3u8, d4u8, d5u8, d6u8, d7u8,
d16u8, d17u8, d18u8, &d0u8, &d1u8, &d2u8, &d3u8, &d4u8,
&d5u8);
d4Result.val[0] = d0u8;
d4Result.val[1] = d1u8;
d4Result.val[2] = d2u8;
d4Result.val[3] = d3u8;
d2Result.val[0] = d4u8;
d2Result.val[1] = d5u8;
s = src - 3;
vst4_lane_u8(s, d4Result, 0);
s += pitch;
vst4_lane_u8(s, d4Result, 1);
s += pitch;
vst4_lane_u8(s, d4Result, 2);
s += pitch;
vst4_lane_u8(s, d4Result, 3);
s += pitch;
vst4_lane_u8(s, d4Result, 4);
s += pitch;
vst4_lane_u8(s, d4Result, 5);
s += pitch;
vst4_lane_u8(s, d4Result, 6);
s += pitch;
vst4_lane_u8(s, d4Result, 7);
s = src + 1;
vst2_lane_u8(s, d2Result, 0);
s += pitch;
vst2_lane_u8(s, d2Result, 1);
s += pitch;
vst2_lane_u8(s, d2Result, 2);
s += pitch;
vst2_lane_u8(s, d2Result, 3);
s += pitch;
vst2_lane_u8(s, d2Result, 4);
s += pitch;
vst2_lane_u8(s, d2Result, 5);
s += pitch;
vst2_lane_u8(s, d2Result, 6);
s += pitch;
vst2_lane_u8(s, d2Result, 7);
}
return;
}
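
A scalar reading of the flat branch of mbloop_filter_neon can help when checking the NEON arithmetic above. The sketch below is an illustration only and was not part of the deleted file: `flat_filter8_scalar` and `round_shift3` are hypothetical names, and it assumes the running-sum updates mirror the vaddl/vmlal/vaddw/vsubw sequence for a single pixel column.

```c
#include <stdint.h>

/* Hypothetical helper: rounding right shift by 3, the scalar analogue of
 * vqrshrn_n_u16(x, 3). The sums below never exceed 8 * 255, so the result
 * always fits in 8 bits and no saturation is needed. */
static uint8_t round_shift3(unsigned int x) {
  return (uint8_t)((x + 4) >> 3);
}

/* Scalar sketch of the flat branch for one pixel column.
 * in[]  = { p3, p2, p1, p0, q0, q1, q2, q3 }
 * out[] = { p2', p1', p0', q0', q1', q2' } */
static void flat_filter8_scalar(const uint8_t in[8], uint8_t out[6]) {
  const unsigned int p3 = in[0], p2 = in[1], p1 = in[2], p0 = in[3];
  const unsigned int q0 = in[4], q1 = in[5], q2 = in[6], q3 = in[7];

  /* 3*p3 + 2*p2 + p1 + p0 + q0 */
  unsigned int sum = 3 * p3 + 2 * p2 + p1 + p0 + q0;
  out[0] = round_shift3(sum);

  /* Slide the window: drop two taps, add two taps, as the NEON code does. */
  sum = sum - p3 - p2 + p1 + q1;
  out[1] = round_shift3(sum);
  sum = sum - p3 - p1 + p0 + q2;
  out[2] = round_shift3(sum);
  sum = sum - p3 - p0 + q0 + q3;
  out[3] = round_shift3(sum);
  sum = sum - p2 - q0 + q1 + q3;
  out[4] = round_shift3(sum);
  sum = sum - p1 - q1 + q2 + q3;
  out[5] = round_shift3(sum);
}
```

Each output is a weighted sum of eight taps divided by 8 with rounding, so a constant input column passes through unchanged.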

@@ -1,49 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_dsp_rtcd.h"
#include "./aom_config.h"
#include "aom/aom_integer.h"
void aom_lpf_vertical_4_dual_neon(uint8_t *s, int p, const uint8_t *blimit0,
const uint8_t *limit0, const uint8_t *thresh0,
const uint8_t *blimit1, const uint8_t *limit1,
const uint8_t *thresh1) {
aom_lpf_vertical_4_neon(s, p, blimit0, limit0, thresh0);
aom_lpf_vertical_4_neon(s + 8 * p, p, blimit1, limit1, thresh1);
}
#if HAVE_NEON_ASM
void aom_lpf_horizontal_8_dual_neon(
uint8_t *s, int p /* pitch */, const uint8_t *blimit0,
const uint8_t *limit0, const uint8_t *thresh0, const uint8_t *blimit1,
const uint8_t *limit1, const uint8_t *thresh1) {
aom_lpf_horizontal_8_neon(s, p, blimit0, limit0, thresh0);
aom_lpf_horizontal_8_neon(s + 8, p, blimit1, limit1, thresh1);
}
void aom_lpf_vertical_8_dual_neon(uint8_t *s, int p, const uint8_t *blimit0,
const uint8_t *limit0, const uint8_t *thresh0,
const uint8_t *blimit1, const uint8_t *limit1,
const uint8_t *thresh1) {
aom_lpf_vertical_8_neon(s, p, blimit0, limit0, thresh0);
aom_lpf_vertical_8_neon(s + 8 * p, p, blimit1, limit1, thresh1);
}
void aom_lpf_vertical_16_dual_neon(uint8_t *s, int p, const uint8_t *blimit,
const uint8_t *limit,
const uint8_t *thresh) {
aom_lpf_vertical_16_neon(s, p, blimit, limit, thresh);
aom_lpf_vertical_16_neon(s + 8 * p, p, blimit, limit, thresh);
}
#endif // HAVE_NEON_ASM

@@ -1,98 +0,0 @@
;
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
;
EXPORT |aom_sad16x16_media|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
; r0 const unsigned char *src_ptr
; r1 int src_stride
; r2 const unsigned char *ref_ptr
; r3 int ref_stride
|aom_sad16x16_media| PROC
stmfd sp!, {r4-r12, lr}
pld [r0, r1, lsl #0]
pld [r2, r3, lsl #0]
pld [r0, r1, lsl #1]
pld [r2, r3, lsl #1]
mov r4, #0 ; sad = 0;
mov r5, #8 ; loop count
loop
; 1st row
ldr r6, [r0, #0x0] ; load 4 src pixels (1A)
ldr r8, [r2, #0x0] ; load 4 ref pixels (1A)
ldr r7, [r0, #0x4] ; load 4 src pixels (1A)
ldr r9, [r2, #0x4] ; load 4 ref pixels (1A)
ldr r10, [r0, #0x8] ; load 4 src pixels (1B)
ldr r11, [r0, #0xC] ; load 4 src pixels (1B)
usada8 r4, r8, r6, r4 ; calculate sad for 4 pixels
usad8 r8, r7, r9 ; calculate sad for 4 pixels
ldr r12, [r2, #0x8] ; load 4 ref pixels (1B)
ldr lr, [r2, #0xC] ; load 4 ref pixels (1B)
add r0, r0, r1 ; set src pointer to next row
add r2, r2, r3 ; set ref pointer to next row
pld [r0, r1, lsl #1]
pld [r2, r3, lsl #1]
usada8 r4, r10, r12, r4 ; calculate sad for 4 pixels
usada8 r8, r11, lr, r8 ; calculate sad for 4 pixels
ldr r6, [r0, #0x0] ; load 4 src pixels (2A)
ldr r7, [r0, #0x4] ; load 4 src pixels (2A)
add r4, r4, r8 ; add partial sad values
; 2nd row
ldr r8, [r2, #0x0] ; load 4 ref pixels (2A)
ldr r9, [r2, #0x4] ; load 4 ref pixels (2A)
ldr r10, [r0, #0x8] ; load 4 src pixels (2B)
ldr r11, [r0, #0xC] ; load 4 src pixels (2B)
usada8 r4, r6, r8, r4 ; calculate sad for 4 pixels
usad8 r8, r7, r9 ; calculate sad for 4 pixels
ldr r12, [r2, #0x8] ; load 4 ref pixels (2B)
ldr lr, [r2, #0xC] ; load 4 ref pixels (2B)
add r0, r0, r1 ; set src pointer to next row
add r2, r2, r3 ; set ref pointer to next row
usada8 r4, r10, r12, r4 ; calculate sad for 4 pixels
usada8 r8, r11, lr, r8 ; calculate sad for 4 pixels
pld [r0, r1, lsl #1]
pld [r2, r3, lsl #1]
subs r5, r5, #1 ; decrement loop counter
add r4, r4, r8 ; add partial sad values
bne loop
mov r0, r4 ; return sad
ldmfd sp!, {r4-r12, pc}
ENDP
END
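
For comparison with the ARMv6 routine above, here is a plain C reference for the same 16x16 sum of absolute differences. It is a sketch for reading alongside the assembly, not code from the deleted file; the name `sad16x16_ref` is hypothetical. The assembly handles two rows per loop iteration (loop count 8), while the sketch simply walks all 16 rows.

```c
#include <stdint.h>
#include <stdlib.h>

/* Reference SAD over a 16x16 block: sum of |src - ref| for every pixel. */
static unsigned int sad16x16_ref(const uint8_t *src, int src_stride,
                                 const uint8_t *ref, int ref_stride) {
  unsigned int sad = 0;
  for (int r = 0; r < 16; ++r) {
    for (int c = 0; c < 16; ++c) {
      sad += (unsigned int)abs((int)src[c] - (int)ref[c]);
    }
    src += src_stride; /* next source row */
    ref += ref_stride; /* next reference row */
  }
  return sad;
}
```

The usad8/usada8 instructions in the assembly compute the inner absolute-difference sum four bytes at a time.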

@@ -1,39 +0,0 @@
;
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
;
EXPORT |aom_push_neon|
EXPORT |aom_pop_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
|aom_push_neon| PROC
vst1.i64 {d8, d9, d10, d11}, [r0]!
vst1.i64 {d12, d13, d14, d15}, [r0]!
bx lr
ENDP
|aom_pop_neon| PROC
vld1.i64 {d8, d9, d10, d11}, [r0]!
vld1.i64 {d12, d13, d14, d15}, [r0]!
bx lr
ENDP
END

@@ -1,81 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include "./aom_config.h"
#include "./aom_dsp_rtcd.h"
#include "aom/aom_integer.h"
#include "aom_ports/mem.h"
#if HAVE_MEDIA
static const int16_t bilinear_filters_media[8][2] = { { 128, 0 }, { 112, 16 },
{ 96, 32 }, { 80, 48 },
{ 64, 64 }, { 48, 80 },
{ 32, 96 }, { 16, 112 } };
extern void aom_filter_block2d_bil_first_pass_media(
const uint8_t *src_ptr, uint16_t *dst_ptr, uint32_t src_pitch,
uint32_t height, uint32_t width, const int16_t *filter);
extern void aom_filter_block2d_bil_second_pass_media(
const uint16_t *src_ptr, uint8_t *dst_ptr, int32_t src_pitch,
uint32_t height, uint32_t width, const int16_t *filter);
unsigned int aom_sub_pixel_variance8x8_media(
const uint8_t *src_ptr, int src_pixels_per_line, int xoffset, int yoffset,
const uint8_t *dst_ptr, int dst_pixels_per_line, unsigned int *sse) {
uint16_t first_pass[10 * 8];
uint8_t second_pass[8 * 8];
const int16_t *HFilter, *VFilter;
HFilter = bilinear_filters_media[xoffset];
VFilter = bilinear_filters_media[yoffset];
aom_filter_block2d_bil_first_pass_media(src_ptr, first_pass,
src_pixels_per_line, 9, 8, HFilter);
aom_filter_block2d_bil_second_pass_media(first_pass, second_pass, 8, 8, 8,
VFilter);
return aom_variance8x8_media(second_pass, 8, dst_ptr, dst_pixels_per_line,
sse);
}
unsigned int aom_sub_pixel_variance16x16_media(
const uint8_t *src_ptr, int src_pixels_per_line, int xoffset, int yoffset,
const uint8_t *dst_ptr, int dst_pixels_per_line, unsigned int *sse) {
uint16_t first_pass[36 * 16];
uint8_t second_pass[20 * 16];
const int16_t *HFilter, *VFilter;
unsigned int var;
if (xoffset == 4 && yoffset == 0) {
var = aom_variance_halfpixvar16x16_h_media(
src_ptr, src_pixels_per_line, dst_ptr, dst_pixels_per_line, sse);
} else if (xoffset == 0 && yoffset == 4) {
var = aom_variance_halfpixvar16x16_v_media(
src_ptr, src_pixels_per_line, dst_ptr, dst_pixels_per_line, sse);
} else if (xoffset == 4 && yoffset == 4) {
var = aom_variance_halfpixvar16x16_hv_media(
src_ptr, src_pixels_per_line, dst_ptr, dst_pixels_per_line, sse);
} else {
HFilter = bilinear_filters_media[xoffset];
VFilter = bilinear_filters_media[yoffset];
aom_filter_block2d_bil_first_pass_media(
src_ptr, first_pass, src_pixels_per_line, 17, 16, HFilter);
aom_filter_block2d_bil_second_pass_media(first_pass, second_pass, 16, 16,
16, VFilter);
var = aom_variance16x16_media(second_pass, 16, dst_ptr, dst_pixels_per_line,
sse);
}
return var;
}
#endif // HAVE_MEDIA
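
The two filter passes called above are implemented in assembly that this diff does not show. The following C sketch illustrates what a two-pass bilinear filter with these taps typically computes. It rests on assumptions: the rounding term of 64 and the shift by 7 are inferred from the taps summing to 128, the row-major intermediate layout is a guess, and the `_sketch` names are hypothetical.

```c
#include <stdint.h>

/* Same tap pairs as bilinear_filters_media; each pair sums to 128. */
static const int16_t bilinear_taps_sketch[8][2] = {
  { 128, 0 }, { 112, 16 }, { 96, 32 }, { 80, 48 },
  { 64, 64 }, { 48, 80 },  { 32, 96 }, { 16, 112 }
};

/* Horizontal pass (assumed form): blends each pixel with its right
 * neighbour. Reads width + 1 pixels per row and writes height rows. */
static void bil_first_pass_sketch(const uint8_t *src, uint16_t *dst,
                                  int src_stride, int height, int width,
                                  const int16_t *f) {
  for (int r = 0; r < height; ++r) {
    for (int c = 0; c < width; ++c) {
      dst[r * width + c] =
          (uint16_t)((src[c] * f[0] + src[c + 1] * f[1] + 64) >> 7);
    }
    src += src_stride;
  }
}

/* Vertical pass (assumed form): blends each row with the row below it.
 * Reads height + 1 rows of the intermediate buffer. */
static void bil_second_pass_sketch(const uint16_t *src, uint8_t *dst,
                                   int height, int width, const int16_t *f) {
  for (int r = 0; r < height; ++r) {
    for (int c = 0; c < width; ++c) {
      dst[r * width + c] = (uint8_t)((src[r * width + c] * f[0] +
                                      src[(r + 1) * width + c] * f[1] + 64) >>
                                     7);
    }
  }
}
```

This matches the call pattern in aom_sub_pixel_variance8x8_media, where the first pass produces 9 rows so that the second pass has one extra row to blend with.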

@@ -1,185 +0,0 @@
;
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
;
EXPORT |aom_variance_halfpixvar16x16_h_media|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
; r0 unsigned char *src_ptr
; r1 int source_stride
; r2 unsigned char *ref_ptr
; r3 int recon_stride
; stack unsigned int *sse
|aom_variance_halfpixvar16x16_h_media| PROC
stmfd sp!, {r4-r12, lr}
pld [r0, r1, lsl #0]
pld [r2, r3, lsl #0]
mov r8, #0 ; initialize sum = 0
ldr r10, c80808080
mov r11, #0 ; initialize sse = 0
mov r12, #16 ; set loop counter to 16 (=block height)
mov lr, #0 ; constant zero
loop
; 1st 4 pixels
ldr r4, [r0, #0] ; load 4 src pixels
ldr r6, [r0, #1] ; load 4 src pixels with 1 byte offset
ldr r5, [r2, #0] ; load 4 ref pixels
; bilinear interpolation
mvn r6, r6
uhsub8 r4, r4, r6
eor r4, r4, r10
usub8 r6, r4, r5 ; calculate difference
pld [r0, r1, lsl #1]
sel r7, r6, lr ; select bytes with positive difference
usub8 r6, r5, r4 ; calculate difference with reversed operands
pld [r2, r3, lsl #1]
sel r6, r6, lr ; select bytes with negative difference
; calculate partial sums
usad8 r4, r7, lr ; calculate sum of positive differences
usad8 r5, r6, lr ; calculate sum of negative differences
orr r6, r6, r7 ; differences of all 4 pixels
; calculate total sum
adds r8, r8, r4 ; add positive differences to sum
subs r8, r8, r5 ; subtract negative differences from sum
; calculate sse
uxtb16 r5, r6 ; byte (two pixels) to halfwords
uxtb16 r7, r6, ror #8 ; another two pixels to halfwords
smlad r11, r5, r5, r11 ; dual signed multiply, add and accumulate (1)
; 2nd 4 pixels
ldr r4, [r0, #4] ; load 4 src pixels
ldr r6, [r0, #5] ; load 4 src pixels with 1 byte offset
ldr r5, [r2, #4] ; load 4 ref pixels
; bilinear interpolation
mvn r6, r6
uhsub8 r4, r4, r6
eor r4, r4, r10
smlad r11, r7, r7, r11 ; dual signed multiply, add and accumulate (2)
usub8 r6, r4, r5 ; calculate difference
sel r7, r6, lr ; select bytes with positive difference
usub8 r6, r5, r4 ; calculate difference with reversed operands
sel r6, r6, lr ; select bytes with negative difference
; calculate partial sums
usad8 r4, r7, lr ; calculate sum of positive differences
usad8 r5, r6, lr ; calculate sum of negative differences
orr r6, r6, r7 ; differences of all 4 pixels
; calculate total sum
add r8, r8, r4 ; add positive differences to sum
sub r8, r8, r5 ; subtract negative differences from sum
; calculate sse
uxtb16 r5, r6 ; byte (two pixels) to halfwords
uxtb16 r7, r6, ror #8 ; another two pixels to halfwords
smlad r11, r5, r5, r11 ; dual signed multiply, add and accumulate (1)
; 3rd 4 pixels
ldr r4, [r0, #8] ; load 4 src pixels
ldr r6, [r0, #9] ; load 4 src pixels with 1 byte offset
ldr r5, [r2, #8] ; load 4 ref pixels
; bilinear interpolation
mvn r6, r6
uhsub8 r4, r4, r6
eor r4, r4, r10
smlad r11, r7, r7, r11 ; dual signed multiply, add and accumulate (2)
usub8 r6, r4, r5 ; calculate difference
sel r7, r6, lr ; select bytes with positive difference
usub8 r6, r5, r4 ; calculate difference with reversed operands
sel r6, r6, lr ; select bytes with negative difference
; calculate partial sums
usad8 r4, r7, lr ; calculate sum of positive differences
usad8 r5, r6, lr ; calculate sum of negative differences
orr r6, r6, r7 ; differences of all 4 pixels
; calculate total sum
add r8, r8, r4 ; add positive differences to sum
sub r8, r8, r5 ; subtract negative differences from sum
; calculate sse
uxtb16 r5, r6 ; byte (two pixels) to halfwords
uxtb16 r7, r6, ror #8 ; another two pixels to halfwords
smlad r11, r5, r5, r11 ; dual signed multiply, add and accumulate (1)
; 4th 4 pixels
ldr r4, [r0, #12] ; load 4 src pixels
ldr r6, [r0, #13] ; load 4 src pixels with 1 byte offset
ldr r5, [r2, #12] ; load 4 ref pixels
; bilinear interpolation
mvn r6, r6
uhsub8 r4, r4, r6
eor r4, r4, r10
smlad r11, r7, r7, r11 ; dual signed multiply, add and accumulate (2)
usub8 r6, r4, r5 ; calculate difference
add r0, r0, r1 ; set src_ptr to next row
sel r7, r6, lr ; select bytes with positive difference
usub8 r6, r5, r4 ; calculate difference with reversed operands
add r2, r2, r3 ; set ref_ptr to next row
sel r6, r6, lr ; select bytes with negative difference
; calculate partial sums
usad8 r4, r7, lr ; calculate sum of positive differences
usad8 r5, r6, lr ; calculate sum of negative differences
orr r6, r6, r7 ; differences of all 4 pixels
; calculate total sum
add r8, r8, r4 ; add positive differences to sum
sub r8, r8, r5 ; subtract negative differences from sum
; calculate sse
uxtb16 r5, r6 ; byte (two pixels) to halfwords
uxtb16 r7, r6, ror #8 ; another two pixels to halfwords
smlad r11, r5, r5, r11 ; dual signed multiply, add and accumulate (1)
smlad r11, r7, r7, r11 ; dual signed multiply, add and accumulate (2)
subs r12, r12, #1
bne loop
; return stuff
ldr r6, [sp, #40] ; get address of sse
mul r0, r8, r8 ; sum * sum
str r11, [r6] ; store sse
sub r0, r11, r0, lsr #8 ; return (sse - ((sum * sum) >> 8))
ldmfd sp!, {r4-r12, pc}
ENDP
c80808080
DCD 0x80808080
END
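
The "bilinear interpolation" step in the routine above (mvn, uhsub8, eor with 0x80808080) computes a rounded average of two bytes without widening. The C sketch below emulates one byte lane to show why; `avg_round_uhsub8` is a hypothetical name, and the emulation assumes uhsub8 keeps bits 8..1 of the 9-bit difference, as the ARM documentation describes.

```c
#include <stdint.h>

/* Emulates one byte lane of:
 *   mvn    b, b            ; b = 255 - b
 *   uhsub8 a, a, b         ; a = (a - b) >> 1, keeping bits 8..1
 *   eor    a, a, #0x80     ; undo the -128 bias
 * Since a - (255 - b) = (a + b + 1) - 256, halving gives
 * ((a + b + 1) >> 1) - 128, and flipping the top bit adds 128 back. */
static uint8_t avg_round_uhsub8(uint8_t a, uint8_t b) {
  const int not_b = (uint8_t)~b;            /* mvn: 255 - b */
  const int diff = (int)a - not_b;          /* range [-255, 255] */
  const uint8_t halved = (uint8_t)((unsigned int)diff >> 1); /* bits 8..1 */
  return (uint8_t)(halved ^ 0x80);          /* eor with 0x80 */
}
```

The result equals (a + b + 1) >> 1 for every byte pair, which is the half-pixel value the variance code then compares against the reference block.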

@@ -1,225 +0,0 @@
;
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
;
EXPORT |aom_variance_halfpixvar16x16_hv_media|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
; r0 unsigned char *src_ptr
; r1 int source_stride
; r2 unsigned char *ref_ptr
; r3 int recon_stride
; stack unsigned int *sse
|aom_variance_halfpixvar16x16_hv_media| PROC
stmfd sp!, {r4-r12, lr}
pld [r0, r1, lsl #0]
pld [r2, r3, lsl #0]
mov r8, #0 ; initialize sum = 0
ldr r10, c80808080
mov r11, #0 ; initialize sse = 0
mov r12, #16 ; set loop counter to 16 (=block height)
mov lr, #0 ; constant zero
loop
add r9, r0, r1 ; pointer to pixels on the next row
; 1st 4 pixels
ldr r4, [r0, #0] ; load source pixels a, row N
ldr r6, [r0, #1] ; load source pixels b, row N
ldr r5, [r9, #0] ; load source pixels c, row N+1
ldr r7, [r9, #1] ; load source pixels d, row N+1
; x = (a + b + 1) >> 1, interpolate pixels horizontally on row N
mvn r6, r6
uhsub8 r4, r4, r6
eor r4, r4, r10
; y = (c + d + 1) >> 1, interpolate pixels horizontally on row N+1
mvn r7, r7
uhsub8 r5, r5, r7
eor r5, r5, r10
; z = (x + y + 1) >> 1, interpolate half pixel values vertically
mvn r5, r5
uhsub8 r4, r4, r5
ldr r5, [r2, #0] ; load 4 ref pixels
eor r4, r4, r10
usub8 r6, r4, r5 ; calculate difference
pld [r0, r1, lsl #1]
sel r7, r6, lr ; select bytes with positive difference
usub8 r6, r5, r4 ; calculate difference with reversed operands
pld [r2, r3, lsl #1]
sel r6, r6, lr ; select bytes with negative difference
; calculate partial sums
usad8 r4, r7, lr ; calculate sum of positive differences
usad8 r5, r6, lr ; calculate sum of negative differences
orr r6, r6, r7 ; differences of all 4 pixels
; calculate total sum
adds r8, r8, r4 ; add positive differences to sum
subs r8, r8, r5 ; subtract negative differences from sum
; calculate sse
uxtb16 r5, r6 ; byte (two pixels) to halfwords
uxtb16 r7, r6, ror #8 ; another two pixels to halfwords
smlad r11, r5, r5, r11 ; dual signed multiply, add and accumulate (1)
; 2nd 4 pixels
ldr r4, [r0, #4] ; load source pixels a, row N
ldr r6, [r0, #5] ; load source pixels b, row N
ldr r5, [r9, #4] ; load source pixels c, row N+1
smlad r11, r7, r7, r11 ; dual signed multiply, add and accumulate (2)
ldr r7, [r9, #5] ; load source pixels d, row N+1
; x = (a + b + 1) >> 1, interpolate pixels horizontally on row N
mvn r6, r6
uhsub8 r4, r4, r6
eor r4, r4, r10
; y = (c + d + 1) >> 1, interpolate pixels horizontally on row N+1
mvn r7, r7
uhsub8 r5, r5, r7
eor r5, r5, r10
; z = (x + y + 1) >> 1, interpolate half pixel values vertically
mvn r5, r5
uhsub8 r4, r4, r5
ldr r5, [r2, #4] ; load 4 ref pixels
eor r4, r4, r10
usub8 r6, r4, r5 ; calculate difference
sel r7, r6, lr ; select bytes with positive difference
usub8 r6, r5, r4 ; calculate difference with reversed operands
sel r6, r6, lr ; select bytes with negative difference
; calculate partial sums
usad8 r4, r7, lr ; calculate sum of positive differences
usad8 r5, r6, lr ; calculate sum of negative differences
orr r6, r6, r7 ; differences of all 4 pixels
; calculate total sum
add r8, r8, r4 ; add positive differences to sum
sub r8, r8, r5 ; subtract negative differences from sum
; calculate sse
uxtb16 r5, r6 ; byte (two pixels) to halfwords
uxtb16 r7, r6, ror #8 ; another two pixels to halfwords
smlad r11, r5, r5, r11 ; dual signed multiply, add and accumulate (1)
; 3rd 4 pixels
ldr r4, [r0, #8] ; load source pixels a, row N
ldr r6, [r0, #9] ; load source pixels b, row N
ldr r5, [r9, #8] ; load source pixels c, row N+1
smlad r11, r7, r7, r11 ; dual signed multiply, add and accumulate (2)
ldr r7, [r9, #9] ; load source pixels d, row N+1
; x = (a + b + 1) >> 1, interpolate pixels horizontally on row N
mvn r6, r6
uhsub8 r4, r4, r6
eor r4, r4, r10
; y = (c + d + 1) >> 1, interpolate pixels horizontally on row N+1
mvn r7, r7
uhsub8 r5, r5, r7
eor r5, r5, r10
; z = (x + y + 1) >> 1, interpolate half pixel values vertically
mvn r5, r5
uhsub8 r4, r4, r5
ldr r5, [r2, #8] ; load 4 ref pixels
eor r4, r4, r10
usub8 r6, r4, r5 ; calculate difference
sel r7, r6, lr ; select bytes with positive difference
usub8 r6, r5, r4 ; calculate difference with reversed operands
sel r6, r6, lr ; select bytes with negative difference
; calculate partial sums
usad8 r4, r7, lr ; calculate sum of positive differences
usad8 r5, r6, lr ; calculate sum of negative differences
orr r6, r6, r7 ; differences of all 4 pixels
; calculate total sum
add r8, r8, r4 ; add positive differences to sum
sub r8, r8, r5 ; subtract negative differences from sum
; calculate sse
uxtb16 r5, r6 ; byte (two pixels) to halfwords
uxtb16 r7, r6, ror #8 ; another two pixels to halfwords
smlad r11, r5, r5, r11 ; dual signed multiply, add and accumulate (1)
; 4th 4 pixels
ldr r4, [r0, #12] ; load source pixels a, row N
ldr r6, [r0, #13] ; load source pixels b, row N
ldr r5, [r9, #12] ; load source pixels c, row N+1
smlad r11, r7, r7, r11 ; dual signed multiply, add and accumulate (2)
ldr r7, [r9, #13] ; load source pixels d, row N+1
; x = (a + b + 1) >> 1, interpolate pixels horizontally on row N
mvn r6, r6
uhsub8 r4, r4, r6
eor r4, r4, r10
; y = (c + d + 1) >> 1, interpolate pixels horizontally on row N+1
mvn r7, r7
uhsub8 r5, r5, r7
eor r5, r5, r10
; z = (x + y + 1) >> 1, interpolate half pixel values vertically
mvn r5, r5
uhsub8 r4, r4, r5
ldr r5, [r2, #12] ; load 4 ref pixels
eor r4, r4, r10
usub8 r6, r4, r5 ; calculate difference
add r0, r0, r1 ; set src_ptr to next row
sel r7, r6, lr ; select bytes with positive difference
usub8 r6, r5, r4 ; calculate difference with reversed operands
add r2, r2, r3 ; set ref_ptr to next row
sel r6, r6, lr ; select bytes with negative difference
; calculate partial sums
usad8 r4, r7, lr ; calculate sum of positive differences
usad8 r5, r6, lr ; calculate sum of negative differences
orr r6, r6, r7 ; differences of all 4 pixels
; calculate total sum
add r8, r8, r4 ; add positive differences to sum
sub r8, r8, r5 ; subtract negative differences from sum
; calculate sse
uxtb16 r5, r6 ; byte (two pixels) to halfwords
uxtb16 r7, r6, ror #8 ; another two pixels to halfwords
smlad r11, r5, r5, r11 ; dual signed multiply, add and accumulate (1)
subs r12, r12, #1
smlad r11, r7, r7, r11 ; dual signed multiply, add and accumulate (2)
bne loop
; return stuff
ldr r6, [sp, #40] ; get address of sse
mul r0, r8, r8 ; sum * sum
str r11, [r6] ; store sse
sub r0, r11, r0, lsr #8 ; return (sse - ((sum * sum) >> 8))
ldmfd sp!, {r4-r12, pc}
ENDP
c80808080
DCD 0x80808080
END
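
A C reference for the routine above may make the x/y/z comments and the final return expression easier to follow. This is a sketch, not code from the deleted file: the names `avg2` and `halfpix_var16x16_hv_ref` are hypothetical, and it assumes the source buffer has one readable extra column and row, as the assembly's loads at offsets up to 13 and on row N+1 imply.

```c
#include <stdint.h>

/* Rounded average of two pixels: (a + b + 1) >> 1. */
static uint8_t avg2(uint8_t a, uint8_t b) {
  return (uint8_t)((a + b + 1) >> 1);
}

/* Half-pixel (horizontal and vertical) variance of a 16x16 block.
 * Stores the sum of squared errors in *sse and returns
 * sse - ((sum * sum) >> 8), where 256 = 16 * 16 pixels. */
static unsigned int halfpix_var16x16_hv_ref(const uint8_t *src, int src_stride,
                                            const uint8_t *ref, int ref_stride,
                                            unsigned int *sse) {
  int32_t sum = 0;
  uint32_t sq = 0;
  for (int r = 0; r < 16; ++r) {
    const uint8_t *next = src + src_stride; /* row N+1 */
    for (int c = 0; c < 16; ++c) {
      const uint8_t x = avg2(src[c], src[c + 1]);   /* row N, horizontal */
      const uint8_t y = avg2(next[c], next[c + 1]); /* row N+1, horizontal */
      const uint8_t z = avg2(x, y);                 /* vertical */
      const int d = (int)z - (int)ref[c];
      sum += d;
      sq += (uint32_t)(d * d);
    }
    src += src_stride;
    ref += ref_stride;
  }
  *sse = sq;
  /* 64-bit product avoids signed overflow; sum * sum is below 2^32. */
  return sq - (uint32_t)(((int64_t)sum * sum) >> 8);
}
```

The assembly tracks positive and negative differences separately because its byte-wise operations are unsigned; the scalar version can use a signed difference directly.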

@@ -1,187 +0,0 @@
;
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
;
EXPORT |aom_variance_halfpixvar16x16_v_media|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
; r0 unsigned char *src_ptr
; r1 int source_stride
; r2 unsigned char *ref_ptr
; r3 int recon_stride
; stack unsigned int *sse
|aom_variance_halfpixvar16x16_v_media| PROC
stmfd sp!, {r4-r12, lr}
pld [r0, r1, lsl #0]
pld [r2, r3, lsl #0]
mov r8, #0 ; initialize sum = 0
ldr r10, c80808080
mov r11, #0 ; initialize sse = 0
mov r12, #16 ; set loop counter to 16 (=block height)
mov lr, #0 ; constant zero
loop
add r9, r0, r1 ; set src pointer to next row
; 1st 4 pixels
ldr r4, [r0, #0] ; load 4 src pixels
ldr r6, [r9, #0] ; load 4 src pixels from next row
ldr r5, [r2, #0] ; load 4 ref pixels
; bilinear interpolation
mvn r6, r6
uhsub8 r4, r4, r6
eor r4, r4, r10
usub8 r6, r4, r5 ; calculate difference
pld [r0, r1, lsl #1]
sel r7, r6, lr ; select bytes with positive difference
usub8 r6, r5, r4 ; calculate difference with reversed operands
pld [r2, r3, lsl #1]
sel r6, r6, lr ; select bytes with negative difference
; calculate partial sums
usad8 r4, r7, lr ; calculate sum of positive differences
usad8 r5, r6, lr ; calculate sum of negative differences
orr r6, r6, r7 ; differences of all 4 pixels
; calculate total sum
adds r8, r8, r4 ; add positive differences to sum
subs r8, r8, r5 ; subtract negative differences from sum
; calculate sse
uxtb16 r5, r6 ; byte (two pixels) to halfwords
uxtb16 r7, r6, ror #8 ; another two pixels to halfwords
smlad r11, r5, r5, r11 ; dual signed multiply, add and accumulate (1)
; 2nd 4 pixels
ldr r4, [r0, #4] ; load 4 src pixels
ldr r6, [r9, #4] ; load 4 src pixels from next row
ldr r5, [r2, #4] ; load 4 ref pixels
; bilinear interpolation
mvn r6, r6
uhsub8 r4, r4, r6
eor r4, r4, r10
smlad r11, r7, r7, r11 ; dual signed multiply, add and accumulate (2)
usub8 r6, r4, r5 ; calculate difference
sel r7, r6, lr ; select bytes with positive difference
usub8 r6, r5, r4 ; calculate difference with reversed operands
sel r6, r6, lr ; select bytes with negative difference
; calculate partial sums
usad8 r4, r7, lr ; calculate sum of positive differences
usad8 r5, r6, lr ; calculate sum of negative differences
orr r6, r6, r7 ; differences of all 4 pixels
; calculate total sum
add r8, r8, r4 ; add positive differences to sum
sub r8, r8, r5 ; subtract negative differences from sum
; calculate sse
uxtb16 r5, r6 ; byte (two pixels) to halfwords
uxtb16 r7, r6, ror #8 ; another two pixels to halfwords
smlad r11, r5, r5, r11 ; dual signed multiply, add and accumulate (1)
; 3rd 4 pixels
ldr r4, [r0, #8] ; load 4 src pixels
ldr r6, [r9, #8] ; load 4 src pixels from next row
ldr r5, [r2, #8] ; load 4 ref pixels
; bilinear interpolation
mvn r6, r6
uhsub8 r4, r4, r6
eor r4, r4, r10
smlad r11, r7, r7, r11 ; dual signed multiply, add and accumulate (2)
usub8 r6, r4, r5 ; calculate difference
sel r7, r6, lr ; select bytes with positive difference
usub8 r6, r5, r4 ; calculate difference with reversed operands
sel r6, r6, lr ; select bytes with negative difference
; calculate partial sums
usad8 r4, r7, lr ; calculate sum of positive differences
usad8 r5, r6, lr ; calculate sum of negative differences
orr r6, r6, r7 ; differences of all 4 pixels
; calculate total sum
add r8, r8, r4 ; add positive differences to sum
sub r8, r8, r5 ; subtract negative differences from sum
; calculate sse
uxtb16 r5, r6 ; byte (two pixels) to halfwords
uxtb16 r7, r6, ror #8 ; another two pixels to halfwords
smlad r11, r5, r5, r11 ; dual signed multiply, add and accumulate (1)
; 4th 4 pixels
ldr r4, [r0, #12] ; load 4 src pixels
ldr r6, [r9, #12] ; load 4 src pixels from next row
ldr r5, [r2, #12] ; load 4 ref pixels
; bilinear interpolation
mvn r6, r6
uhsub8 r4, r4, r6
eor r4, r4, r10
smlad r11, r7, r7, r11 ; dual signed multiply, add and accumulate (2)
usub8 r6, r4, r5 ; calculate difference
add r0, r0, r1 ; set src_ptr to next row
sel r7, r6, lr ; select bytes with positive difference
usub8 r6, r5, r4 ; calculate difference with reversed operands
add r2, r2, r3 ; set ref_ptr to next row
sel r6, r6, lr ; select bytes with negative difference
; calculate partial sums
usad8 r4, r7, lr ; calculate sum of positive differences
usad8 r5, r6, lr ; calculate sum of negative differences
orr r6, r6, r7 ; differences of all 4 pixels
; calculate total sum
add r8, r8, r4 ; add positive differences to sum
sub r8, r8, r5 ; subtract negative differences from sum
; calculate sse
uxtb16 r5, r6 ; byte (two pixels) to halfwords
uxtb16 r7, r6, ror #8 ; another two pixels to halfwords
smlad r11, r5, r5, r11 ; dual signed multiply, add and accumulate (1)
smlad r11, r7, r7, r11 ; dual signed multiply, add and accumulate (2)
subs r12, r12, #1
bne loop
; return stuff
ldr r6, [sp, #40] ; get address of sse
mul r0, r8, r8 ; sum * sum
str r11, [r6] ; store sse
sub r0, r11, r0, lsr #8 ; return (sse - ((sum * sum) >> 8))
ldmfd sp!, {r4-r12, pc}
ENDP
c80808080
DCD 0x80808080
END
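
Note on the "bilinear interpolation" step above: the mvn / uhsub8 / eor sequence appears to compute, per byte, the rounded average of a source pixel and the pixel one row below it. The sketch below models a single byte lane in portable C and compares it with the plain (a + b + 1) >> 1 formula. The function names are hypothetical and not part of the deleted file; the lane model assumes UHSUB8 keeps bits 8..1 of the 9-bit difference.

```c
#include <stdint.h>

/* Rounded average of two pixels: the value the assembly is assumed to produce. */
uint8_t avg_round(uint8_t a, uint8_t b) {
  return (uint8_t)((a + b + 1) >> 1);
}

/* One byte lane of: mvn b, b ; uhsub8 a, a, b ; eor a, a, #0x80. */
uint8_t avg_via_uhsub8(uint8_t a, uint8_t b) {
  const int nb = 255 - b;   /* mvn: bitwise NOT of one byte */
  const int diff = a - nb;  /* 9-bit signed difference, in [-255, 255] */
  /* uhsub8 keeps the halved difference modulo 256. Adding 512 first keeps
   * the shifted operand non-negative, so the shift is portable C. */
  const int half = ((diff + 512) >> 1) & 0xff;
  return (uint8_t)(half ^ 0x80); /* eor with 0x80 adds 128 modulo 256 */
}
```

Checking all 65536 input pairs is cheap, which makes this identity easy to confirm exhaustively.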

@@ -1,361 +0,0 @@
;
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
;
EXPORT |aom_variance16x16_media|
EXPORT |aom_variance8x8_media|
EXPORT |aom_mse16x16_media|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
; r0 unsigned char *src_ptr
; r1 int source_stride
; r2 unsigned char *ref_ptr
; r3 int recon_stride
; stack unsigned int *sse
|aom_variance16x16_media| PROC
stmfd sp!, {r4-r12, lr}
pld [r0, r1, lsl #0]
pld [r2, r3, lsl #0]
mov r8, #0 ; initialize sum = 0
mov r11, #0 ; initialize sse = 0
mov r12, #16 ; set loop counter to 16 (=block height)
loop16x16
; 1st 4 pixels
ldr r4, [r0, #0] ; load 4 src pixels
ldr r5, [r2, #0] ; load 4 ref pixels
mov lr, #0 ; constant zero
usub8 r6, r4, r5 ; calculate difference
pld [r0, r1, lsl #1]
sel r7, r6, lr ; select bytes with positive difference
usub8 r9, r5, r4 ; calculate difference with reversed operands
pld [r2, r3, lsl #1]
sel r6, r9, lr ; select bytes with negative difference
; calculate partial sums
usad8 r4, r7, lr ; calculate sum of positive differences
usad8 r5, r6, lr ; calculate sum of negative differences
orr r6, r6, r7 ; differences of all 4 pixels
; calculate total sum
adds r8, r8, r4 ; add positive differences to sum
subs r8, r8, r5 ; subtract negative differences from sum
; calculate sse
uxtb16 r5, r6 ; byte (two pixels) to halfwords
uxtb16 r10, r6, ror #8 ; another two pixels to halfwords
smlad r11, r5, r5, r11 ; dual signed multiply, add and accumulate (1)
; 2nd 4 pixels
ldr r4, [r0, #4] ; load 4 src pixels
ldr r5, [r2, #4] ; load 4 ref pixels
smlad r11, r10, r10, r11 ; dual signed multiply, add and accumulate (2)
usub8 r6, r4, r5 ; calculate difference
sel r7, r6, lr ; select bytes with positive difference
usub8 r9, r5, r4 ; calculate difference with reversed operands
sel r6, r9, lr ; select bytes with negative difference
; calculate partial sums
usad8 r4, r7, lr ; calculate sum of positive differences
usad8 r5, r6, lr ; calculate sum of negative differences
orr r6, r6, r7 ; differences of all 4 pixels
; calculate total sum
add r8, r8, r4 ; add positive differences to sum
sub r8, r8, r5 ; subtract negative differences from sum
; calculate sse
uxtb16 r5, r6 ; byte (two pixels) to halfwords
uxtb16 r10, r6, ror #8 ; another two pixels to halfwords
smlad r11, r5, r5, r11 ; dual signed multiply, add and accumulate (1)
; 3rd 4 pixels
ldr r4, [r0, #8] ; load 4 src pixels
ldr r5, [r2, #8] ; load 4 ref pixels
smlad r11, r10, r10, r11 ; dual signed multiply, add and accumulate (2)
usub8 r6, r4, r5 ; calculate difference
sel r7, r6, lr ; select bytes with positive difference
usub8 r9, r5, r4 ; calculate difference with reversed operands
sel r6, r9, lr ; select bytes with negative difference
; calculate partial sums
usad8 r4, r7, lr ; calculate sum of positive differences
usad8 r5, r6, lr ; calculate sum of negative differences
orr r6, r6, r7 ; differences of all 4 pixels
; calculate total sum
add r8, r8, r4 ; add positive differences to sum
sub r8, r8, r5 ; subtract negative differences from sum
; calculate sse
uxtb16 r5, r6 ; byte (two pixels) to halfwords
uxtb16 r10, r6, ror #8 ; another two pixels to halfwords
smlad r11, r5, r5, r11 ; dual signed multiply, add and accumulate (1)
; 4th 4 pixels
ldr r4, [r0, #12] ; load 4 src pixels
ldr r5, [r2, #12] ; load 4 ref pixels
smlad r11, r10, r10, r11 ; dual signed multiply, add and accumulate (2)
usub8 r6, r4, r5 ; calculate difference
add r0, r0, r1 ; set src_ptr to next row
sel r7, r6, lr ; select bytes with positive difference
usub8 r9, r5, r4 ; calculate difference with reversed operands
add r2, r2, r3 ; set ref_ptr to next row
sel r6, r9, lr ; select bytes with negative difference
; calculate partial sums
usad8 r4, r7, lr ; calculate sum of positive differences
usad8 r5, r6, lr ; calculate sum of negative differences
orr r6, r6, r7 ; differences of all 4 pixels
; calculate total sum
add r8, r8, r4 ; add positive differences to sum
sub r8, r8, r5 ; subtract negative differences from sum
; calculate sse
uxtb16 r5, r6 ; byte (two pixels) to halfwords
uxtb16 r10, r6, ror #8 ; another two pixels to halfwords
smlad r11, r5, r5, r11 ; dual signed multiply, add and accumulate (1)
smlad r11, r10, r10, r11 ; dual signed multiply, add and accumulate (2)
subs r12, r12, #1
bne loop16x16
; return stuff
ldr r6, [sp, #40] ; get address of sse
mul r0, r8, r8 ; sum * sum
str r11, [r6] ; store sse
sub r0, r11, r0, lsr #8 ; return (sse - ((sum * sum) >> 8))
ldmfd sp!, {r4-r12, pc}
ENDP
; r0 unsigned char *src_ptr
; r1 int source_stride
; r2 unsigned char *ref_ptr
; r3 int recon_stride
; stack unsigned int *sse
|aom_variance8x8_media| PROC
push {r4-r10, lr}
pld [r0, r1, lsl #0]
pld [r2, r3, lsl #0]
mov r12, #8 ; set loop counter to 8 (=block height)
mov r4, #0 ; initialize sum = 0
mov r5, #0 ; initialize sse = 0
loop8x8
; 1st 4 pixels
ldr r6, [r0, #0x0] ; load 4 src pixels
ldr r7, [r2, #0x0] ; load 4 ref pixels
mov lr, #0 ; constant zero
usub8 r8, r6, r7 ; calculate difference
pld [r0, r1, lsl #1]
sel r10, r8, lr ; select bytes with positive difference
usub8 r9, r7, r6 ; calculate difference with reversed operands
pld [r2, r3, lsl #1]
sel r8, r9, lr ; select bytes with negative difference
; calculate partial sums
usad8 r6, r10, lr ; calculate sum of positive differences
usad8 r7, r8, lr ; calculate sum of negative differences
orr r8, r8, r10 ; differences of all 4 pixels
; calculate total sum
add r4, r4, r6 ; add positive differences to sum
sub r4, r4, r7 ; subtract negative differences from sum
; calculate sse
uxtb16 r7, r8 ; byte (two pixels) to halfwords
uxtb16 r10, r8, ror #8 ; another two pixels to halfwords
smlad r5, r7, r7, r5 ; dual signed multiply, add and accumulate (1)
; 2nd 4 pixels
ldr r6, [r0, #0x4] ; load 4 src pixels
ldr r7, [r2, #0x4] ; load 4 ref pixels
smlad r5, r10, r10, r5 ; dual signed multiply, add and accumulate (2)
usub8 r8, r6, r7 ; calculate difference
add r0, r0, r1 ; set src_ptr to next row
sel r10, r8, lr ; select bytes with positive difference
usub8 r9, r7, r6 ; calculate difference with reversed operands
add r2, r2, r3 ; set ref_ptr to next row
sel r8, r9, lr ; select bytes with negative difference
; calculate partial sums
usad8 r6, r10, lr ; calculate sum of positive differences
usad8 r7, r8, lr ; calculate sum of negative differences
orr r8, r8, r10 ; differences of all 4 pixels
; calculate total sum
add r4, r4, r6 ; add positive differences to sum
sub r4, r4, r7 ; subtract negative differences from sum
; calculate sse
uxtb16 r7, r8 ; byte (two pixels) to halfwords
uxtb16 r10, r8, ror #8 ; another two pixels to halfwords
smlad r5, r7, r7, r5 ; dual signed multiply, add and accumulate (1)
subs r12, r12, #1 ; next row
smlad r5, r10, r10, r5 ; dual signed multiply, add and accumulate (2)
bne loop8x8
; return stuff
ldr r8, [sp, #32] ; get address of sse
mul r1, r4, r4 ; sum * sum
str r5, [r8] ; store sse
sub r0, r5, r1, ASR #6 ; return (sse - ((sum * sum) >> 6))
pop {r4-r10, pc}
ENDP
; r0 unsigned char *src_ptr
; r1 int source_stride
; r2 unsigned char *ref_ptr
; r3 int recon_stride
; stack unsigned int *sse
;
;note: Based on aom_variance16x16_media. In this function, sum is never used.
; So, we can remove this part of calculation.
|aom_mse16x16_media| PROC
push {r4-r9, lr}
pld [r0, r1, lsl #0]
pld [r2, r3, lsl #0]
mov r12, #16 ; set loop counter to 16 (=block height)
mov r4, #0 ; initialize sse = 0
loopmse
; 1st 4 pixels
ldr r5, [r0, #0x0] ; load 4 src pixels
ldr r6, [r2, #0x0] ; load 4 ref pixels
mov lr, #0 ; constant zero
usub8 r8, r5, r6 ; calculate difference
pld [r0, r1, lsl #1]
sel r7, r8, lr ; select bytes with positive difference
usub8 r9, r6, r5 ; calculate difference with reversed operands
pld [r2, r3, lsl #1]
sel r8, r9, lr ; select bytes with negative difference
; calculate partial sums
usad8 r5, r7, lr ; calculate sum of positive differences
usad8 r6, r8, lr ; calculate sum of negative differences
orr r8, r8, r7 ; differences of all 4 pixels
ldr r5, [r0, #0x4] ; load 4 src pixels
; calculate sse
uxtb16 r6, r8 ; byte (two pixels) to halfwords
uxtb16 r7, r8, ror #8 ; another two pixels to halfwords
smlad r4, r6, r6, r4 ; dual signed multiply, add and accumulate (1)
; 2nd 4 pixels
ldr r6, [r2, #0x4] ; load 4 ref pixels
smlad r4, r7, r7, r4 ; dual signed multiply, add and accumulate (2)
usub8 r8, r5, r6 ; calculate difference
sel r7, r8, lr ; select bytes with positive difference
usub8 r9, r6, r5 ; calculate difference with reversed operands
sel r8, r9, lr ; select bytes with negative difference
; calculate partial sums
usad8 r5, r7, lr ; calculate sum of positive differences
usad8 r6, r8, lr ; calculate sum of negative differences
orr r8, r8, r7 ; differences of all 4 pixels
ldr r5, [r0, #0x8] ; load 4 src pixels
; calculate sse
uxtb16 r6, r8 ; byte (two pixels) to halfwords
uxtb16 r7, r8, ror #8 ; another two pixels to halfwords
smlad r4, r6, r6, r4 ; dual signed multiply, add and accumulate (1)
; 3rd 4 pixels
ldr r6, [r2, #0x8] ; load 4 ref pixels
smlad r4, r7, r7, r4 ; dual signed multiply, add and accumulate (2)
usub8 r8, r5, r6 ; calculate difference
sel r7, r8, lr ; select bytes with positive difference
usub8 r9, r6, r5 ; calculate difference with reversed operands
sel r8, r9, lr ; select bytes with negative difference
; calculate partial sums
usad8 r5, r7, lr ; calculate sum of positive differences
usad8 r6, r8, lr ; calculate sum of negative differences
orr r8, r8, r7 ; differences of all 4 pixels
ldr r5, [r0, #0xc] ; load 4 src pixels
; calculate sse
uxtb16 r6, r8 ; byte (two pixels) to halfwords
uxtb16 r7, r8, ror #8 ; another two pixels to halfwords
smlad r4, r6, r6, r4 ; dual signed multiply, add and accumulate (1)
; 4th 4 pixels
ldr r6, [r2, #0xc] ; load 4 ref pixels
smlad r4, r7, r7, r4 ; dual signed multiply, add and accumulate (2)
usub8 r8, r5, r6 ; calculate difference
add r0, r0, r1 ; set src_ptr to next row
sel r7, r8, lr ; select bytes with positive difference
usub8 r9, r6, r5 ; calculate difference with reversed operands
add r2, r2, r3 ; set ref_ptr to next row
sel r8, r9, lr ; select bytes with negative difference
; calculate partial sums
usad8 r5, r7, lr ; calculate sum of positive differences
usad8 r6, r8, lr ; calculate sum of negative differences
orr r8, r8, r7 ; differences of all 4 pixels
subs r12, r12, #1 ; next row
; calculate sse
uxtb16 r6, r8 ; byte (two pixels) to halfwords
uxtb16 r7, r8, ror #8 ; another two pixels to halfwords
smlad r4, r6, r6, r4 ; dual signed multiply, add and accumulate (1)
smlad r4, r7, r7, r4 ; dual signed multiply, add and accumulate (2)
bne loopmse
; return stuff
ldr r1, [sp, #28] ; get address of sse
mov r0, r4 ; return sse
str r4, [r1] ; store sse
pop {r4-r9, pc}
ENDP
END
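
For readers less familiar with the ARMv6 media instructions, the three routines above can be summarized in portable C roughly as follows. This is a reference sketch only: the names ref_sum_sse, ref_variance16x16, ref_variance8x8 and ref_mse16x16 are hypothetical, and the shifts (8 for 16x16, 6 for 8x8) are taken from the return comments in the assembly.

```c
#include <stdint.h>

/* Hypothetical helper: signed sum of differences and sum of squared
 * differences over a w x h block. */
void ref_sum_sse(const uint8_t *src, int src_stride, const uint8_t *ref,
                 int ref_stride, int w, int h, int *sum, unsigned int *sse) {
  *sum = 0;
  *sse = 0;
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      const int d = src[x] - ref[x];
      *sum += d;
      *sse += (unsigned int)(d * d);
    }
    src += src_stride;
    ref += ref_stride;
  }
}

/* variance = sse - ((sum * sum) >> 8), since a 16x16 block has 256 pixels. */
unsigned int ref_variance16x16(const uint8_t *src, int src_stride,
                               const uint8_t *ref, int ref_stride,
                               unsigned int *sse) {
  int sum;
  ref_sum_sse(src, src_stride, ref, ref_stride, 16, 16, &sum, sse);
  return *sse - (unsigned int)(((int64_t)sum * sum) >> 8);
}

/* variance = sse - ((sum * sum) >> 6), since an 8x8 block has 64 pixels. */
unsigned int ref_variance8x8(const uint8_t *src, int src_stride,
                             const uint8_t *ref, int ref_stride,
                             unsigned int *sse) {
  int sum;
  ref_sum_sse(src, src_stride, ref, ref_stride, 8, 8, &sum, sse);
  return *sse - (unsigned int)(((int64_t)sum * sum) >> 6);
}

/* MSE variant: only the sum of squared differences is returned. */
unsigned int ref_mse16x16(const uint8_t *src, int src_stride,
                          const uint8_t *ref, int ref_stride,
                          unsigned int *sse) {
  int sum;
  ref_sum_sse(src, src_stride, ref, ref_stride, 16, 16, &sum, sse);
  return *sse;
}
```

For a block where every source pixel exceeds its reference pixel by the same amount, sse equals (sum * sum) / N, so the variance is zero while the MSE value is not.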

@@ -1,240 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_BITREADER_H_
#define AOM_DSP_BITREADER_H_
#include <assert.h>
#include <limits.h>
#include "./aom_config.h"
#if CONFIG_EC_ADAPT && !CONFIG_EC_MULTISYMBOL
#error "CONFIG_EC_ADAPT is enabled without enabling CONFIG_EC_MULTISYMBOL."
#endif
#include "aom/aomdx.h"
#include "aom/aom_integer.h"
#if CONFIG_ANS
#include "aom_dsp/ansreader.h"
#elif CONFIG_DAALA_EC
#include "aom_dsp/daalaboolreader.h"
#else
#include "aom_dsp/dkboolreader.h"
#endif
#include "aom_dsp/prob.h"
#include "av1/common/odintrin.h"
#if CONFIG_ACCOUNTING
#include "av1/common/accounting.h"
#define ACCT_STR_NAME acct_str
#define ACCT_STR_PARAM , const char *ACCT_STR_NAME
#define ACCT_STR_ARG(s) , s
#else
#define ACCT_STR_PARAM
#define ACCT_STR_ARG(s)
#endif
#define aom_read(r, prob, ACCT_STR_NAME) \
aom_read_(r, prob ACCT_STR_ARG(ACCT_STR_NAME))
#define aom_read_bit(r, ACCT_STR_NAME) \
aom_read_bit_(r ACCT_STR_ARG(ACCT_STR_NAME))
#define aom_read_tree(r, tree, probs, ACCT_STR_NAME) \
aom_read_tree_(r, tree, probs ACCT_STR_ARG(ACCT_STR_NAME))
#define aom_read_literal(r, bits, ACCT_STR_NAME) \
aom_read_literal_(r, bits ACCT_STR_ARG(ACCT_STR_NAME))
#define aom_read_tree_bits(r, tree, probs, ACCT_STR_NAME) \
aom_read_tree_bits_(r, tree, probs ACCT_STR_ARG(ACCT_STR_NAME))
#define aom_read_symbol(r, cdf, nsymbs, ACCT_STR_NAME) \
aom_read_symbol_(r, cdf, nsymbs ACCT_STR_ARG(ACCT_STR_NAME))
#ifdef __cplusplus
extern "C" {
#endif
#if CONFIG_ANS
typedef struct AnsDecoder aom_reader;
#elif CONFIG_DAALA_EC
typedef struct daala_reader aom_reader;
#else
typedef struct aom_dk_reader aom_reader;
#endif
static INLINE int aom_reader_init(aom_reader *r, const uint8_t *buffer,
size_t size, aom_decrypt_cb decrypt_cb,
void *decrypt_state) {
#if CONFIG_ANS
(void)decrypt_cb;
(void)decrypt_state;
assert(size <= INT_MAX);
return ans_read_init(r, buffer, size);
#elif CONFIG_DAALA_EC
(void)decrypt_cb;
(void)decrypt_state;
return aom_daala_reader_init(r, buffer, size);
#else
return aom_dk_reader_init(r, buffer, size, decrypt_cb, decrypt_state);
#endif
}
static INLINE const uint8_t *aom_reader_find_end(aom_reader *r) {
#if CONFIG_ANS
(void)r;
assert(0 && "Use the raw buffer size with ANS");
return NULL;
#elif CONFIG_DAALA_EC
return aom_daala_reader_find_end(r);
#else
return aom_dk_reader_find_end(r);
#endif
}
static INLINE int aom_reader_has_error(aom_reader *r) {
#if CONFIG_ANS
return ans_reader_has_error(r);
#elif CONFIG_DAALA_EC
return aom_daala_reader_has_error(r);
#else
return aom_dk_reader_has_error(r);
#endif
}
// Returns the position in the bit reader in bits.
static INLINE uint32_t aom_reader_tell(const aom_reader *r) {
#if CONFIG_ANS
(void)r;
assert(0 && "aom_reader_tell() is unimplemented for ANS");
return 0;
#elif CONFIG_DAALA_EC
return aom_daala_reader_tell(r);
#else
return aom_dk_reader_tell(r);
#endif
}
// Returns the position in the bit reader in 1/8th bits.
static INLINE uint32_t aom_reader_tell_frac(const aom_reader *r) {
#if CONFIG_ANS
(void)r;
assert(0 && "aom_reader_tell_frac() is unimplemented for ANS");
return 0;
#elif CONFIG_DAALA_EC
return aom_daala_reader_tell_frac(r);
#else
return aom_dk_reader_tell_frac(r);
#endif
}
#if CONFIG_ACCOUNTING
static INLINE void aom_process_accounting(const aom_reader *r ACCT_STR_PARAM) {
if (r->accounting != NULL) {
uint32_t tell_frac;
tell_frac = aom_reader_tell_frac(r);
aom_accounting_record(r->accounting, ACCT_STR_NAME,
tell_frac - r->accounting->last_tell_frac);
r->accounting->last_tell_frac = tell_frac;
}
}
#endif
static INLINE int aom_read_(aom_reader *r, int prob ACCT_STR_PARAM) {
int ret;
#if CONFIG_ANS
ret = uabs_read(r, prob);
#elif CONFIG_DAALA_EC
ret = aom_daala_read(r, prob);
#else
ret = aom_dk_read(r, prob);
#endif
#if CONFIG_ACCOUNTING
if (ACCT_STR_NAME) aom_process_accounting(r, ACCT_STR_NAME);
#endif
return ret;
}
static INLINE int aom_read_bit_(aom_reader *r ACCT_STR_PARAM) {
int ret;
#if CONFIG_ANS
ret = uabs_read_bit(r); // Non trivial optimization at half probability
#else
ret = aom_read(r, 128, NULL); // aom_prob_half
#endif
#if CONFIG_ACCOUNTING
if (ACCT_STR_NAME) aom_process_accounting(r, ACCT_STR_NAME);
#endif
return ret;
}
static INLINE int aom_read_literal_(aom_reader *r, int bits ACCT_STR_PARAM) {
int literal = 0, bit;
for (bit = bits - 1; bit >= 0; bit--) literal |= aom_read_bit(r, NULL) << bit;
#if CONFIG_ACCOUNTING
if (ACCT_STR_NAME) aom_process_accounting(r, ACCT_STR_NAME);
#endif
return literal;
}
static INLINE int aom_read_tree_bits_(aom_reader *r, const aom_tree_index *tree,
const aom_prob *probs ACCT_STR_PARAM) {
aom_tree_index i = 0;
while ((i = tree[i + aom_read(r, probs[i >> 1], NULL)]) > 0) continue;
#if CONFIG_ACCOUNTING
if (ACCT_STR_NAME) aom_process_accounting(r, ACCT_STR_NAME);
#endif
return -i;
}
static INLINE int aom_read_tree_(aom_reader *r, const aom_tree_index *tree,
const aom_prob *probs ACCT_STR_PARAM) {
int ret;
#if CONFIG_DAALA_EC
ret = daala_read_tree_bits(r, tree, probs);
#else
ret = aom_read_tree_bits(r, tree, probs, NULL);
#endif
#if CONFIG_ACCOUNTING
if (ACCT_STR_NAME) aom_process_accounting(r, ACCT_STR_NAME);
#endif
return ret;
}
#if CONFIG_EC_MULTISYMBOL
static INLINE int aom_read_symbol_(aom_reader *r, aom_cdf_prob *cdf,
int nsymbs ACCT_STR_PARAM) {
int ret;
#if CONFIG_RANS
(void)nsymbs;
ret = rans_read(r, cdf);
#elif CONFIG_DAALA_EC
ret = daala_read_symbol(r, cdf, nsymbs);
#else
#error \
"CONFIG_EC_MULTISYMBOL is selected without a valid backing entropy " \
"coder. Enable daala_ec or ans for a valid configuration."
#endif
#if CONFIG_EC_ADAPT
update_cdf(cdf, ret, nsymbs);
#endif
#if CONFIG_ACCOUNTING
if (ACCT_STR_NAME) aom_process_accounting(r, ACCT_STR_NAME);
#endif
return ret;
}
#endif // CONFIG_EC_MULTISYMBOL
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_BITREADER_H_
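
The tree walk in aom_read_tree_bits_ may be easier to follow with the entropy decoder taken out of the picture. In the sketch below a plain array of already-decoded bits stands in for aom_read(); the type tree_index, the function read_tree_from_bits and the three-symbol tree used in the note afterwards are all hypothetical. Positive entries are assumed to index the next pair of nodes, and zero or negative entries are leaves holding the negated symbol.

```c
#include <assert.h>
#include <stdint.h>

typedef int8_t tree_index; /* stands in for aom_tree_index */

/* Walks the tree using bits[*pos], bits[*pos + 1], ... and returns the
 * decoded symbol. *pos is advanced past the consumed bits. */
int read_tree_from_bits(const tree_index *tree, const int *bits, int *pos) {
  tree_index i = 0;
  for (;;) {
    const int bit = bits[(*pos)++];
    assert(bit == 0 || bit == 1);
    i = tree[i + bit];
    if (i <= 0) break; /* reached a leaf */
  }
  return -i;
}
```

With the tree { 0, 2, -1, -2 }, the bit strings "0", "10" and "11" map to symbols 0, 1 and 2 respectively.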

@@ -1,47 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include "./aom_config.h"
#include "./bitreader_buffer.h"
size_t aom_rb_bytes_read(struct aom_read_bit_buffer *rb) {
return (rb->bit_offset + 7) >> 3;
}
int aom_rb_read_bit(struct aom_read_bit_buffer *rb) {
const size_t off = rb->bit_offset;
const size_t p = off >> 3;
const int q = 7 - (int)(off & 0x7);
if (rb->bit_buffer + p < rb->bit_buffer_end) {
const int bit = (rb->bit_buffer[p] >> q) & 1;
rb->bit_offset = off + 1;
return bit;
} else {
rb->error_handler(rb->error_handler_data);
return 0;
}
}
int aom_rb_read_literal(struct aom_read_bit_buffer *rb, int bits) {
int value = 0, bit;
for (bit = bits - 1; bit >= 0; bit--) value |= aom_rb_read_bit(rb) << bit;
return value;
}
int aom_rb_read_signed_literal(struct aom_read_bit_buffer *rb, int bits) {
const int value = aom_rb_read_literal(rb, bits);
return aom_rb_read_bit(rb) ? -value : value;
}
int aom_rb_read_inv_signed_literal(struct aom_read_bit_buffer *rb, int bits) {
const int nbits = sizeof(unsigned) * 8 - bits - 1;
const unsigned value = (unsigned)aom_rb_read_literal(rb, bits + 1) << nbits;
return ((int)value) >> nbits;
}
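
aom_rb_read_inv_signed_literal reads bits + 1 bits and sign-extends them as a two's complement value by shifting left and then right. The right shift of a negative int is implementation-defined in C, so the sketch below shows one portable way to get the same sign extension. The function name sign_extend_field is hypothetical, and the sketch assumes 0 <= bits < 31.

```c
#include <assert.h>

/* Interprets the low (bits + 1) bits of raw as a two's complement number. */
int sign_extend_field(unsigned raw, int bits) {
  assert(bits >= 0 && bits < 31);
  const unsigned sign = 1u << bits;               /* top bit of the field */
  const unsigned field = raw & ((sign << 1) - 1u); /* keep bits + 1 bits */
  /* Flipping the sign bit and subtracting its weight maps
   * [0, 2 * sign) onto [-sign, sign). */
  return (int)(field ^ sign) - (int)sign;
}
```

For bits = 3 the field is four bits wide, so 0xF decodes to -1, 0x8 to -8 and 0x7 to 7.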

@@ -1,48 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_BITREADER_BUFFER_H_
#define AOM_DSP_BITREADER_BUFFER_H_
#include <limits.h>
#include "aom/aom_integer.h"
#ifdef __cplusplus
extern "C" {
#endif
typedef void (*aom_rb_error_handler)(void *data);
struct aom_read_bit_buffer {
const uint8_t *bit_buffer;
const uint8_t *bit_buffer_end;
size_t bit_offset;
void *error_handler_data;
aom_rb_error_handler error_handler;
};
size_t aom_rb_bytes_read(struct aom_read_bit_buffer *rb);
int aom_rb_read_bit(struct aom_read_bit_buffer *rb);
int aom_rb_read_literal(struct aom_read_bit_buffer *rb, int bits);
int aom_rb_read_signed_literal(struct aom_read_bit_buffer *rb, int bits);
int aom_rb_read_inv_signed_literal(struct aom_read_bit_buffer *rb, int bits);
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_BITREADER_BUFFER_H_

@@ -1,179 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_BITWRITER_H_
#define AOM_DSP_BITWRITER_H_
#include <assert.h>
#include "./aom_config.h"
#if CONFIG_EC_ADAPT && !CONFIG_EC_MULTISYMBOL
#error "CONFIG_EC_ADAPT is enabled without enabling CONFIG_EC_MULTISYMBOL"
#endif
#if CONFIG_ANS
#include "aom_dsp/buf_ans.h"
#elif CONFIG_DAALA_EC
#include "aom_dsp/daalaboolwriter.h"
#else
#include "aom_dsp/dkboolwriter.h"
#endif
#include "aom_dsp/prob.h"
#if CONFIG_RD_DEBUG
#include "av1/encoder/cost.h"
#endif
#ifdef __cplusplus
extern "C" {
#endif
#if CONFIG_ANS
typedef struct BufAnsCoder aom_writer;
#elif CONFIG_DAALA_EC
typedef struct daala_writer aom_writer;
#else
typedef struct aom_dk_writer aom_writer;
#endif
typedef struct TOKEN_STATS { int64_t cost; } TOKEN_STATS;
static INLINE void aom_start_encode(aom_writer *bc, uint8_t *buffer) {
#if CONFIG_ANS
(void)bc;
(void)buffer;
assert(0 && "buf_ans requires a more complicated startup procedure");
#elif CONFIG_DAALA_EC
aom_daala_start_encode(bc, buffer);
#else
aom_dk_start_encode(bc, buffer);
#endif
}
static INLINE void aom_stop_encode(aom_writer *bc) {
#if CONFIG_ANS
(void)bc;
assert(0 && "buf_ans requires a more complicated shutdown procedure");
#elif CONFIG_DAALA_EC
aom_daala_stop_encode(bc);
#else
aom_dk_stop_encode(bc);
#endif
}
static INLINE void aom_write(aom_writer *br, int bit, int probability) {
#if CONFIG_ANS
buf_uabs_write(br, bit, probability);
#elif CONFIG_DAALA_EC
aom_daala_write(br, bit, probability);
#else
aom_dk_write(br, bit, probability);
#endif
}
static INLINE void aom_write_record(aom_writer *br, int bit, int probability,
TOKEN_STATS *token_stats) {
aom_write(br, bit, probability);
#if CONFIG_RD_DEBUG
token_stats->cost += av1_cost_bit(probability, bit);
#else
(void)token_stats;
#endif
}
static INLINE void aom_write_bit(aom_writer *w, int bit) {
aom_write(w, bit, 128); // aom_prob_half
}
static INLINE void aom_write_bit_record(aom_writer *w, int bit,
TOKEN_STATS *token_stats) {
aom_write_record(w, bit, 128, token_stats); // aom_prob_half
}
static INLINE void aom_write_literal(aom_writer *w, int data, int bits) {
int bit;
for (bit = bits - 1; bit >= 0; bit--) aom_write_bit(w, 1 & (data >> bit));
}
static INLINE void aom_write_tree_bits(aom_writer *w, const aom_tree_index *tr,
const aom_prob *probs, int bits, int len,
aom_tree_index i) {
do {
const int bit = (bits >> --len) & 1;
aom_write(w, bit, probs[i >> 1]);
i = tr[i + bit];
} while (len);
}
static INLINE void aom_write_tree_bits_record(aom_writer *w,
const aom_tree_index *tr,
const aom_prob *probs, int bits,
int len, aom_tree_index i,
TOKEN_STATS *token_stats) {
do {
const int bit = (bits >> --len) & 1;
aom_write_record(w, bit, probs[i >> 1], token_stats);
i = tr[i + bit];
} while (len);
}
static INLINE void aom_write_tree(aom_writer *w, const aom_tree_index *tree,
const aom_prob *probs, int bits, int len,
aom_tree_index i) {
#if CONFIG_DAALA_EC
daala_write_tree_bits(w, tree, probs, bits, len, i);
#else
aom_write_tree_bits(w, tree, probs, bits, len, i);
#endif
}
static INLINE void aom_write_tree_record(aom_writer *w,
const aom_tree_index *tree,
const aom_prob *probs, int bits,
int len, aom_tree_index i,
TOKEN_STATS *token_stats) {
#if CONFIG_DAALA_EC
(void)token_stats;
daala_write_tree_bits(w, tree, probs, bits, len, i);
#else
aom_write_tree_bits_record(w, tree, probs, bits, len, i, token_stats);
#endif
}
#if CONFIG_EC_MULTISYMBOL
static INLINE void aom_write_symbol(aom_writer *w, int symb, aom_cdf_prob *cdf,
int nsymbs) {
#if CONFIG_RANS
struct rans_sym s;
(void)nsymbs;
assert(cdf);
s.cum_prob = symb > 0 ? cdf[symb - 1] : 0;
s.prob = cdf[symb] - s.cum_prob;
buf_rans_write(w, &s);
#elif CONFIG_DAALA_EC
daala_write_symbol(w, symb, cdf, nsymbs);
#else
#error \
"CONFIG_EC_MULTISYMBOL is selected without a valid backing entropy " \
"coder. Enable daala_ec or ans for a valid configuration."
#endif
#if CONFIG_EC_ADAPT
update_cdf(cdf, symb, nsymbs);
#endif
}
#endif // CONFIG_EC_MULTISYMBOL
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_BITWRITER_H_

@@ -1,43 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <limits.h>
#include <stdlib.h>
#include "./aom_config.h"
#include "./bitwriter_buffer.h"
size_t aom_wb_bytes_written(const struct aom_write_bit_buffer *wb) {
return wb->bit_offset / CHAR_BIT + (wb->bit_offset % CHAR_BIT > 0);
}
void aom_wb_write_bit(struct aom_write_bit_buffer *wb, int bit) {
const int off = (int)wb->bit_offset;
const int p = off / CHAR_BIT;
const int q = CHAR_BIT - 1 - off % CHAR_BIT;
if (q == CHAR_BIT - 1) {
wb->bit_buffer[p] = bit << q;
} else {
wb->bit_buffer[p] &= ~(1 << q);
wb->bit_buffer[p] |= bit << q;
}
wb->bit_offset = off + 1;
}
void aom_wb_write_literal(struct aom_write_bit_buffer *wb, int data, int bits) {
int bit;
for (bit = bits - 1; bit >= 0; bit--) aom_wb_write_bit(wb, (data >> bit) & 1);
}
void aom_wb_write_inv_signed_literal(struct aom_write_bit_buffer *wb, int data,
int bits) {
aom_wb_write_literal(wb, data, bits + 1);
}
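
Both the bit buffer writer above and the matching reader pack bits most significant bit first within each byte. The standalone sketch below models that layout without the aom structs so the byte values can be checked by hand. The names put_bit, put_literal and get_literal are hypothetical, and bounds checking and the error handler are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Stores one bit at *bit_offset, MSB first, then advances the offset. */
void put_bit(uint8_t *buf, size_t *bit_offset, int bit) {
  const size_t p = *bit_offset >> 3;        /* byte index */
  const int q = 7 - (int)(*bit_offset & 7); /* bit position inside the byte */
  buf[p] = (uint8_t)((buf[p] & ~(1u << q)) | ((unsigned)(bit & 1) << q));
  ++*bit_offset;
}

/* Writes the low `bits` bits of data, highest bit first. */
void put_literal(uint8_t *buf, size_t *bit_offset, int data, int bits) {
  for (int bit = bits - 1; bit >= 0; --bit)
    put_bit(buf, bit_offset, (data >> bit) & 1);
}

/* Reads `bits` bits back in the same order. */
int get_literal(const uint8_t *buf, size_t *bit_offset, int bits) {
  int value = 0;
  for (int bit = bits - 1; bit >= 0; --bit) {
    const size_t p = *bit_offset >> 3;
    const int q = 7 - (int)(*bit_offset & 7);
    value |= ((buf[p] >> q) & 1) << bit;
    ++*bit_offset;
  }
  return value;
}
```

Writing the 3-bit literal 5 (101) followed by the 2-bit literal 3 (11) into a zeroed buffer leaves the first byte as 10111000 in binary, which is 0xB8.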

@@ -1,39 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_BITWRITER_BUFFER_H_
#define AOM_DSP_BITWRITER_BUFFER_H_
#include "aom/aom_integer.h"
#ifdef __cplusplus
extern "C" {
#endif
struct aom_write_bit_buffer {
uint8_t *bit_buffer;
size_t bit_offset;
};
size_t aom_wb_bytes_written(const struct aom_write_bit_buffer *wb);
void aom_wb_write_bit(struct aom_write_bit_buffer *wb, int bit);
void aom_wb_write_literal(struct aom_write_bit_buffer *wb, int data, int bits);
void aom_wb_write_inv_signed_literal(struct aom_write_bit_buffer *wb, int data,
int bits);
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_BITWRITER_BUFFER_H_

@@ -1,42 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_BLEND_H_
#define AOM_DSP_BLEND_H_
#include "aom_ports/mem.h"
// Various blending functions and macros.
// See also the aom_blend_* functions in aom_dsp_rtcd.h
// Alpha blending with alpha values from the range [0, 64], where 64
// means use the first input and 0 means use the second input.
#define AOM_BLEND_A64_ROUND_BITS 6
#define AOM_BLEND_A64_MAX_ALPHA (1 << AOM_BLEND_A64_ROUND_BITS) // 64
#define AOM_BLEND_A64(a, v0, v1) \
ROUND_POWER_OF_TWO((a) * (v0) + (AOM_BLEND_A64_MAX_ALPHA - (a)) * (v1), \
AOM_BLEND_A64_ROUND_BITS)
// Alpha blending with alpha values from the range [0, 256], where 256
// means use the first input and 0 means use the second input.
#define AOM_BLEND_A256_ROUND_BITS 8
#define AOM_BLEND_A256_MAX_ALPHA (1 << AOM_BLEND_A256_ROUND_BITS) // 256
#define AOM_BLEND_A256(a, v0, v1) \
ROUND_POWER_OF_TWO((a) * (v0) + (AOM_BLEND_A256_MAX_ALPHA - (a)) * (v1), \
AOM_BLEND_A256_ROUND_BITS)
// Blending by averaging.
#define AOM_BLEND_AVG(v0, v1) ROUND_POWER_OF_TWO((v0) + (v1), 1)
#endif // AOM_DSP_BLEND_H_
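
As a worked example of the blend arithmetic, the sketch below re-creates the A64 and averaging macros under `DEMO_` names. `DEMO_ROUND_POWER_OF_TWO` is an assumed round-to-nearest definition; the real `ROUND_POWER_OF_TWO` lives in `aom_ports/mem.h`, which is not shown here.

```c
#include <assert.h>

/* Assumption: adds half of 2^n before shifting, i.e. rounds to nearest. */
#define DEMO_ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n)-1))) >> (n))

#define DEMO_BLEND_A64_ROUND_BITS 6
#define DEMO_BLEND_A64_MAX_ALPHA (1 << DEMO_BLEND_A64_ROUND_BITS) /* 64 */

/* Weighted sum of v0 and v1 with weights a and (64 - a), rounded. */
#define DEMO_BLEND_A64(a, v0, v1)                                   \
  DEMO_ROUND_POWER_OF_TWO(                                          \
      (a) * (v0) + (DEMO_BLEND_A64_MAX_ALPHA - (a)) * (v1),         \
      DEMO_BLEND_A64_ROUND_BITS)

#define DEMO_BLEND_AVG(v0, v1) DEMO_ROUND_POWER_OF_TWO((v0) + (v1), 1)

/* Function wrappers so the macros can be exercised with plain ints. */
static int demo_blend_a64(int a, int v0, int v1) {
  return DEMO_BLEND_A64(a, v0, v1);
}

static int demo_blend_avg(int v0, int v1) { return DEMO_BLEND_AVG(v0, v1); }
```

With `v0 = 200` and `v1 = 100`, alpha 64 returns 200, alpha 0 returns 100, and alpha 16 returns 125, since (16 * 200 + 48 * 100 + 32) >> 6 = 125.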

@@ -1,71 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <assert.h>
#include "aom/aom_integer.h"
#include "aom_ports/mem.h"
#include "aom_dsp/aom_dsp_common.h"
#include "aom_dsp/blend.h"
#include "./aom_dsp_rtcd.h"
void aom_blend_a64_hmask_c(uint8_t *dst, uint32_t dst_stride,
const uint8_t *src0, uint32_t src0_stride,
const uint8_t *src1, uint32_t src1_stride,
const uint8_t *mask, int h, int w) {
int i, j;
assert(IMPLIES(src0 == dst, src0_stride == dst_stride));
assert(IMPLIES(src1 == dst, src1_stride == dst_stride));
assert(h >= 1);
assert(w >= 1);
assert(IS_POWER_OF_TWO(h));
assert(IS_POWER_OF_TWO(w));
for (i = 0; i < h; ++i) {
for (j = 0; j < w; ++j) {
dst[i * dst_stride + j] = AOM_BLEND_A64(
mask[j], src0[i * src0_stride + j], src1[i * src1_stride + j]);
}
}
}
#if CONFIG_AOM_HIGHBITDEPTH
void aom_highbd_blend_a64_hmask_c(uint8_t *dst_8, uint32_t dst_stride,
const uint8_t *src0_8, uint32_t src0_stride,
const uint8_t *src1_8, uint32_t src1_stride,
const uint8_t *mask, int h, int w, int bd) {
int i, j;
uint16_t *dst = CONVERT_TO_SHORTPTR(dst_8);
const uint16_t *src0 = CONVERT_TO_SHORTPTR(src0_8);
const uint16_t *src1 = CONVERT_TO_SHORTPTR(src1_8);
(void)bd;
assert(IMPLIES(src0 == dst, src0_stride == dst_stride));
assert(IMPLIES(src1 == dst, src1_stride == dst_stride));
assert(h >= 1);
assert(w >= 1);
assert(IS_POWER_OF_TWO(h));
assert(IS_POWER_OF_TWO(w));
assert(bd == 8 || bd == 10 || bd == 12);
for (i = 0; i < h; ++i) {
for (j = 0; j < w; ++j) {
dst[i * dst_stride + j] = AOM_BLEND_A64(
mask[j], src0[i * src0_stride + j], src1[i * src1_stride + j]);
}
}
}
#endif // CONFIG_AOM_HIGHBITDEPTH

@@ -1,145 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <assert.h>
#include "aom/aom_integer.h"
#include "aom_ports/mem.h"
#include "aom_dsp/blend.h"
#include "aom_dsp/aom_dsp_common.h"
#include "./aom_dsp_rtcd.h"
// Blending with alpha mask. Mask values come from the range [0, 64],
// as described for AOM_BLEND_A64 in aom_dsp/blend.h. src0 or src1 can
// be the same as dst, or dst can be different from both sources.
void aom_blend_a64_mask_c(uint8_t *dst, uint32_t dst_stride,
const uint8_t *src0, uint32_t src0_stride,
const uint8_t *src1, uint32_t src1_stride,
const uint8_t *mask, uint32_t mask_stride, int h,
int w, int subh, int subw) {
int i, j;
assert(IMPLIES(src0 == dst, src0_stride == dst_stride));
assert(IMPLIES(src1 == dst, src1_stride == dst_stride));
assert(h >= 1);
assert(w >= 1);
assert(IS_POWER_OF_TWO(h));
assert(IS_POWER_OF_TWO(w));
if (subw == 0 && subh == 0) {
for (i = 0; i < h; ++i) {
for (j = 0; j < w; ++j) {
const int m = mask[i * mask_stride + j];
dst[i * dst_stride + j] = AOM_BLEND_A64(m, src0[i * src0_stride + j],
src1[i * src1_stride + j]);
}
}
} else if (subw == 1 && subh == 1) {
for (i = 0; i < h; ++i) {
for (j = 0; j < w; ++j) {
const int m = ROUND_POWER_OF_TWO(
mask[(2 * i) * mask_stride + (2 * j)] +
mask[(2 * i + 1) * mask_stride + (2 * j)] +
mask[(2 * i) * mask_stride + (2 * j + 1)] +
mask[(2 * i + 1) * mask_stride + (2 * j + 1)],
2);
dst[i * dst_stride + j] = AOM_BLEND_A64(m, src0[i * src0_stride + j],
src1[i * src1_stride + j]);
}
}
} else if (subw == 1 && subh == 0) {
for (i = 0; i < h; ++i) {
for (j = 0; j < w; ++j) {
const int m = AOM_BLEND_AVG(mask[i * mask_stride + (2 * j)],
mask[i * mask_stride + (2 * j + 1)]);
dst[i * dst_stride + j] = AOM_BLEND_A64(m, src0[i * src0_stride + j],
src1[i * src1_stride + j]);
}
}
} else {
for (i = 0; i < h; ++i) {
for (j = 0; j < w; ++j) {
const int m = AOM_BLEND_AVG(mask[(2 * i) * mask_stride + j],
mask[(2 * i + 1) * mask_stride + j]);
dst[i * dst_stride + j] = AOM_BLEND_A64(m, src0[i * src0_stride + j],
src1[i * src1_stride + j]);
}
}
}
}
#if CONFIG_AOM_HIGHBITDEPTH
void aom_highbd_blend_a64_mask_c(uint8_t *dst_8, uint32_t dst_stride,
const uint8_t *src0_8, uint32_t src0_stride,
const uint8_t *src1_8, uint32_t src1_stride,
const uint8_t *mask, uint32_t mask_stride,
int h, int w, int subh, int subw, int bd) {
int i, j;
uint16_t *dst = CONVERT_TO_SHORTPTR(dst_8);
const uint16_t *src0 = CONVERT_TO_SHORTPTR(src0_8);
const uint16_t *src1 = CONVERT_TO_SHORTPTR(src1_8);
(void)bd;
assert(IMPLIES(src0 == dst, src0_stride == dst_stride));
assert(IMPLIES(src1 == dst, src1_stride == dst_stride));
assert(h >= 1);
assert(w >= 1);
assert(IS_POWER_OF_TWO(h));
assert(IS_POWER_OF_TWO(w));
assert(bd == 8 || bd == 10 || bd == 12);
if (subw == 0 && subh == 0) {
for (i = 0; i < h; ++i) {
for (j = 0; j < w; ++j) {
const int m = mask[i * mask_stride + j];
dst[i * dst_stride + j] = AOM_BLEND_A64(m, src0[i * src0_stride + j],
src1[i * src1_stride + j]);
}
}
} else if (subw == 1 && subh == 1) {
for (i = 0; i < h; ++i) {
for (j = 0; j < w; ++j) {
const int m = ROUND_POWER_OF_TWO(
mask[(2 * i) * mask_stride + (2 * j)] +
mask[(2 * i + 1) * mask_stride + (2 * j)] +
mask[(2 * i) * mask_stride + (2 * j + 1)] +
mask[(2 * i + 1) * mask_stride + (2 * j + 1)],
2);
dst[i * dst_stride + j] = AOM_BLEND_A64(m, src0[i * src0_stride + j],
src1[i * src1_stride + j]);
}
}
} else if (subw == 1 && subh == 0) {
for (i = 0; i < h; ++i) {
for (j = 0; j < w; ++j) {
const int m = AOM_BLEND_AVG(mask[i * mask_stride + (2 * j)],
mask[i * mask_stride + (2 * j + 1)]);
dst[i * dst_stride + j] = AOM_BLEND_A64(m, src0[i * src0_stride + j],
src1[i * src1_stride + j]);
}
}
} else {
for (i = 0; i < h; ++i) {
for (j = 0; j < w; ++j) {
const int m = AOM_BLEND_AVG(mask[(2 * i) * mask_stride + j],
mask[(2 * i + 1) * mask_stride + j]);
dst[i * dst_stride + j] = AOM_BLEND_A64(m, src0[i * src0_stride + j],
src1[i * src1_stride + j]);
}
}
}
}
#endif // CONFIG_AOM_HIGHBITDEPTH
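
When `subw == 1 && subh == 1`, each output pixel uses the rounded mean of a 2x2 block of mask values. The sketch below isolates that step; `demo_mask_2x2` is a hypothetical helper, and the rounding is written out as `(sum + 2) >> 2`, which assumes `ROUND_POWER_OF_TWO(x, 2)` rounds to nearest.

```c
#include <assert.h>
#include <stdint.h>

/* Rounded average of the 2x2 mask block that covers output pixel (i, j). */
static int demo_mask_2x2(const uint8_t *mask, uint32_t mask_stride, int i,
                         int j) {
  const int sum = mask[(2 * i) * mask_stride + (2 * j)] +
                  mask[(2 * i + 1) * mask_stride + (2 * j)] +
                  mask[(2 * i) * mask_stride + (2 * j + 1)] +
                  mask[(2 * i + 1) * mask_stride + (2 * j + 1)];
  return (sum + 2) >> 2;
}

/* A single 2x2 mask block with stride 2: 64 + 32 + 16 + 0 = 112. */
static int demo_mask_example(void) {
  static const uint8_t mask[4] = { 64, 32, 16, 0 };
  return demo_mask_2x2(mask, 2, 0, 0);
}
```

Here `demo_mask_example()` evaluates to 28, which would then be passed as the alpha to the A64 blend.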

@@ -1,73 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <assert.h>
#include "aom/aom_integer.h"
#include "aom_ports/mem.h"
#include "aom_dsp/aom_dsp_common.h"
#include "aom_dsp/blend.h"
#include "./aom_dsp_rtcd.h"
void aom_blend_a64_vmask_c(uint8_t *dst, uint32_t dst_stride,
const uint8_t *src0, uint32_t src0_stride,
const uint8_t *src1, uint32_t src1_stride,
const uint8_t *mask, int h, int w) {
int i, j;
assert(IMPLIES(src0 == dst, src0_stride == dst_stride));
assert(IMPLIES(src1 == dst, src1_stride == dst_stride));
assert(h >= 1);
assert(w >= 1);
assert(IS_POWER_OF_TWO(h));
assert(IS_POWER_OF_TWO(w));
for (i = 0; i < h; ++i) {
const int m = mask[i];
for (j = 0; j < w; ++j) {
dst[i * dst_stride + j] = AOM_BLEND_A64(m, src0[i * src0_stride + j],
src1[i * src1_stride + j]);
}
}
}
#if CONFIG_AOM_HIGHBITDEPTH
void aom_highbd_blend_a64_vmask_c(uint8_t *dst_8, uint32_t dst_stride,
const uint8_t *src0_8, uint32_t src0_stride,
const uint8_t *src1_8, uint32_t src1_stride,
const uint8_t *mask, int h, int w, int bd) {
int i, j;
uint16_t *dst = CONVERT_TO_SHORTPTR(dst_8);
const uint16_t *src0 = CONVERT_TO_SHORTPTR(src0_8);
const uint16_t *src1 = CONVERT_TO_SHORTPTR(src1_8);
(void)bd;
assert(IMPLIES(src0 == dst, src0_stride == dst_stride));
assert(IMPLIES(src1 == dst, src1_stride == dst_stride));
assert(h >= 1);
assert(w >= 1);
assert(IS_POWER_OF_TWO(h));
assert(IS_POWER_OF_TWO(w));
assert(bd == 8 || bd == 10 || bd == 12);
for (i = 0; i < h; ++i) {
const int m = mask[i];
for (j = 0; j < w; ++j) {
dst[i * dst_stride + j] = AOM_BLEND_A64(m, src0[i * src0_stride + j],
src1[i * src1_stride + j]);
}
}
}
#endif // CONFIG_AOM_HIGHBITDEPTH

@@ -1,42 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <string.h>
#include "aom_dsp/buf_ans.h"
#include "aom_mem/aom_mem.h"
#include "aom/internal/aom_codec_internal.h"
void aom_buf_ans_alloc(struct BufAnsCoder *c,
struct aom_internal_error_info *error, int size_hint) {
c->error = error;
c->size = size_hint;
AOM_CHECK_MEM_ERROR(error, c->buf, aom_malloc(c->size * sizeof(*c->buf)));
// Initialize to overfull to trigger the assert in write.
c->offset = c->size + 1;
}
void aom_buf_ans_free(struct BufAnsCoder *c) {
aom_free(c->buf);
c->buf = NULL;
c->size = 0;
}
void aom_buf_ans_grow(struct BufAnsCoder *c) {
struct buffered_ans_symbol *new_buf = NULL;
int new_size = c->size * 2;
AOM_CHECK_MEM_ERROR(c->error, new_buf,
aom_malloc(new_size * sizeof(*new_buf)));
memcpy(new_buf, c->buf, c->size * sizeof(*c->buf));
aom_free(c->buf);
c->buf = new_buf;
c->size = new_size;
}

@@ -1,112 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_BUF_ANS_H_
#define AOM_DSP_BUF_ANS_H_
// Buffered forward ANS writer.
// Symbols are written to the writer in forward (decode) order and serialized
// backwards due to ANS's stack-like behavior.
#include <assert.h>
#include "./aom_config.h"
#include "aom/aom_integer.h"
#include "aom_dsp/ans.h"
#include "aom_dsp/answriter.h"
#ifdef __cplusplus
extern "C" {
#endif // __cplusplus
#define ANS_METHOD_UABS 0
#define ANS_METHOD_RANS 1
struct buffered_ans_symbol {
unsigned int method : 1; // one of ANS_METHOD_UABS or ANS_METHOD_RANS
// TODO(aconverse): Should be possible to write this in terms of start for ABS
unsigned int val_start : RANS_PROB_BITS; // Boolean value for ABS
// start in symbol cycle for Rans
unsigned int prob : RANS_PROB_BITS; // Probability of this symbol
};
struct BufAnsCoder {
struct aom_internal_error_info *error;
struct buffered_ans_symbol *buf;
int size;
int offset;
};
void aom_buf_ans_alloc(struct BufAnsCoder *c,
struct aom_internal_error_info *error, int size_hint);
void aom_buf_ans_free(struct BufAnsCoder *c);
void aom_buf_ans_grow(struct BufAnsCoder *c);
static INLINE void buf_ans_write_reset(struct BufAnsCoder *const c) {
c->offset = 0;
}
static INLINE void buf_uabs_write(struct BufAnsCoder *const c, uint8_t val,
AnsP8 prob) {
assert(c->offset <= c->size);
if (c->offset == c->size) {
aom_buf_ans_grow(c);
}
c->buf[c->offset].method = ANS_METHOD_UABS;
c->buf[c->offset].val_start = val;
c->buf[c->offset].prob = prob;
++c->offset;
}
static INLINE void buf_rans_write(struct BufAnsCoder *const c,
const struct rans_sym *const sym) {
assert(c->offset <= c->size);
if (c->offset == c->size) {
aom_buf_ans_grow(c);
}
c->buf[c->offset].method = ANS_METHOD_RANS;
c->buf[c->offset].val_start = sym->cum_prob;
c->buf[c->offset].prob = sym->prob;
++c->offset;
}
static INLINE void buf_ans_flush(const struct BufAnsCoder *const c,
struct AnsCoder *ans) {
int offset;
for (offset = c->offset - 1; offset >= 0; --offset) {
if (c->buf[offset].method == ANS_METHOD_RANS) {
struct rans_sym sym;
sym.prob = c->buf[offset].prob;
sym.cum_prob = c->buf[offset].val_start;
rans_write(ans, &sym);
} else {
uabs_write(ans, (uint8_t)c->buf[offset].val_start,
(AnsP8)c->buf[offset].prob);
}
}
}
static INLINE void buf_uabs_write_bit(struct BufAnsCoder *c, int bit) {
buf_uabs_write(c, bit, 128);
}
static INLINE void buf_uabs_write_literal(struct BufAnsCoder *c, int literal,
int bits) {
int bit;
assert(bits < 31);
for (bit = bits - 1; bit >= 0; bit--)
buf_uabs_write_bit(c, 1 & (literal >> bit));
}
#ifdef __cplusplus
} // extern "C"
#endif // __cplusplus
#endif // AOM_DSP_BUF_ANS_H_
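
The buffering pattern above (append in forward order, double the buffer when full, then replay in reverse at flush time) can be sketched with plain integers in place of ANS symbols. Everything named `demo_` is hypothetical, and error handling uses return codes here instead of `AOM_CHECK_MEM_ERROR`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_buf {
  int *buf;
  int size;   /* capacity in symbols */
  int offset; /* number of symbols written */
};

/* size_hint must be at least 1, otherwise doubling never grows the buffer. */
static int demo_buf_init(struct demo_buf *c, int size_hint) {
  c->buf = (int *)malloc((size_t)size_hint * sizeof(*c->buf));
  c->size = c->buf ? size_hint : 0;
  c->offset = 0;
  return c->buf != NULL;
}

/* Appends one symbol, doubling the buffer first if it is full. */
static int demo_buf_write(struct demo_buf *c, int sym) {
  if (c->offset == c->size) {
    const int new_size = c->size * 2;
    int *new_buf = (int *)malloc((size_t)new_size * sizeof(*new_buf));
    if (!new_buf) return 0;
    memcpy(new_buf, c->buf, (size_t)c->size * sizeof(*c->buf));
    free(c->buf);
    c->buf = new_buf;
    c->size = new_size;
  }
  c->buf[c->offset++] = sym;
  return 1;
}

/* Replays the symbols last-to-first, as buf_ans_flush does. */
static int demo_buf_flush(const struct demo_buf *c, int *out) {
  int n = 0;
  int offset;
  for (offset = c->offset - 1; offset >= 0; --offset) out[n++] = c->buf[offset];
  return n;
}

/* Writes 1..5 into a buffer that starts with room for 2; out needs 5 slots. */
static int demo_roundtrip(int *out) {
  struct demo_buf c;
  int i, n;
  if (!demo_buf_init(&c, 2)) return -1;
  for (i = 1; i <= 5; ++i) {
    if (!demo_buf_write(&c, i)) {
      free(c.buf);
      return -1;
    }
  }
  n = demo_buf_flush(&c, out);
  free(c.buf);
  return n;
}
```

After `demo_roundtrip(out)`, `out` holds 5, 4, 3, 2, 1: the reverse of the write order, which is the order a stack-like coder needs to consume.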

@@ -1,37 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include "aom_dsp/daalaboolreader.h"
int aom_daala_reader_init(daala_reader *r, const uint8_t *buffer, int size) {
if (size && !buffer) {
return 1;
}
r->buffer_end = buffer + size;
r->buffer = buffer;
od_ec_dec_init(&r->ec, buffer, size - 1);
#if CONFIG_ACCOUNTING
r->accounting = NULL;
#endif
return 0;
}
const uint8_t *aom_daala_reader_find_end(daala_reader *r) {
return r->buffer_end;
}
uint32_t aom_daala_reader_tell(const daala_reader *r) {
return od_ec_dec_tell(&r->ec);
}
uint32_t aom_daala_reader_tell_frac(const daala_reader *r) {
return od_ec_dec_tell_frac(&r->ec);
}

@@ -1,87 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_DAALABOOLREADER_H_
#define AOM_DSP_DAALABOOLREADER_H_
#include "aom/aom_integer.h"
#include "aom_dsp/entdec.h"
#include "aom_dsp/prob.h"
#if CONFIG_ACCOUNTING
#include "av1/common/accounting.h"
#endif
#ifdef __cplusplus
extern "C" {
#endif
struct daala_reader {
const uint8_t *buffer;
const uint8_t *buffer_end;
od_ec_dec ec;
#if CONFIG_ACCOUNTING
Accounting *accounting;
#endif
};
typedef struct daala_reader daala_reader;
int aom_daala_reader_init(daala_reader *r, const uint8_t *buffer, int size);
const uint8_t *aom_daala_reader_find_end(daala_reader *r);
uint32_t aom_daala_reader_tell(const daala_reader *r);
uint32_t aom_daala_reader_tell_frac(const daala_reader *r);
static INLINE int aom_daala_read(daala_reader *r, int prob) {
if (prob == 128) {
return od_ec_dec_bits(&r->ec, 1, "aom_bits");
} else {
int p = ((prob << 15) + (256 - prob)) >> 8;
return od_ec_decode_bool_q15(&r->ec, p);
}
}
static INLINE int aom_daala_read_bit(daala_reader *r) {
return aom_daala_read(r, 128);
}
static INLINE int aom_daala_reader_has_error(daala_reader *r) {
return r->ec.error;
}
static INLINE int daala_read_tree_bits(daala_reader *r,
const aom_tree_index *tree,
const aom_prob *probs) {
aom_tree_index i = 0;
do {
aom_cdf_prob cdf[16];
aom_tree_index index[16];
int path[16];
int dist[16];
int nsymbs;
int symb;
nsymbs = tree_to_cdf(tree, probs, i, cdf, index, path, dist);
symb = od_ec_decode_cdf_q15(&r->ec, cdf, nsymbs);
OD_ASSERT(symb >= 0 && symb < nsymbs);
i = index[symb];
} while (i > 0);
return -i;
}
static INLINE int daala_read_symbol(daala_reader *r, const aom_cdf_prob *cdf,
int nsymbs) {
return od_ec_decode_cdf_q15(&r->ec, cdf, nsymbs);
}
#ifdef __cplusplus
} // extern "C"
#endif
#endif
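
Both `aom_daala_read` and `aom_daala_write` rescale an 8-bit probability to 15 bits with the same expression. The sketch below isolates it in a hypothetical helper, `demo_prob_to_q15`. For `prob` in [1, 255] the added term `(256 - prob)` is always less than 256, so the result works out to exactly `prob * 128`.

```c
#include <assert.h>

/* Same expression as in aom_daala_read / aom_daala_write. */
static int demo_prob_to_q15(int prob) {
  return ((prob << 15) + (256 - prob)) >> 8;
}
```

For instance, `demo_prob_to_q15(64)` gives 8192 and `demo_prob_to_q15(255)` gives 32640. The value 128 never reaches this expression in the code above, because that case is routed to the raw-bit path.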

@@ -1,32 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <string.h>
#include "aom_dsp/daalaboolwriter.h"
void aom_daala_start_encode(daala_writer *br, uint8_t *source) {
br->buffer = source;
br->pos = 0;
od_ec_enc_init(&br->ec, 62025);
}
void aom_daala_stop_encode(daala_writer *br) {
uint32_t daala_bytes;
unsigned char *daala_data;
daala_data = od_ec_enc_done(&br->ec, &daala_bytes);
memcpy(br->buffer, daala_data, daala_bytes);
br->pos = daala_bytes;
/* Prevent ec bitstream from being detected as a superframe marker.
Must always be added, so that rawbits knows the exact length of the
bitstream. */
br->buffer[br->pos++] = 0;
od_ec_enc_clear(&br->ec);
}

@@ -1,90 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_DAALABOOLWRITER_H_
#define AOM_DSP_DAALABOOLWRITER_H_
#include "aom_dsp/entenc.h"
#include "aom_dsp/prob.h"
#ifdef __cplusplus
extern "C" {
#endif
struct daala_writer {
unsigned int pos;
uint8_t *buffer;
od_ec_enc ec;
};
typedef struct daala_writer daala_writer;
void aom_daala_start_encode(daala_writer *w, uint8_t *buffer);
void aom_daala_stop_encode(daala_writer *w);
static INLINE void aom_daala_write(daala_writer *w, int bit, int prob) {
if (prob == 128) {
od_ec_enc_bits(&w->ec, bit, 1);
} else {
int p = ((prob << 15) + (256 - prob)) >> 8;
od_ec_encode_bool_q15(&w->ec, bit, p);
}
}
static INLINE void daala_write_tree_bits(daala_writer *w,
const aom_tree_index *tree,
const aom_prob *probs, int bits,
int len, aom_tree_index i) {
aom_tree_index root;
root = i;
do {
aom_cdf_prob cdf[16];
aom_tree_index index[16];
int path[16];
int dist[16];
int nsymbs;
int symb;
int j;
/* Compute the CDF of the binary tree using the given probabilities. */
nsymbs = tree_to_cdf(tree, probs, root, cdf, index, path, dist);
/* Find the symbol to code. */
symb = -1;
for (j = 0; j < nsymbs; j++) {
/* If this symbol codes a leaf node, */
if (index[j] <= 0) {
if (len == dist[j] && path[j] == bits) {
symb = j;
break;
}
} else {
if (len > dist[j] && path[j] == bits >> (len - dist[j])) {
symb = j;
break;
}
}
}
OD_ASSERT(symb != -1);
od_ec_encode_cdf_q15(&w->ec, symb, cdf, nsymbs);
bits &= (1 << (len - dist[symb])) - 1;
len -= dist[symb];
} while (len);
}
static INLINE void daala_write_symbol(daala_writer *w, int symb,
const aom_cdf_prob *cdf, int nsymbs) {
od_ec_encode_cdf_q15(&w->ec, symb, cdf, nsymbs);
}
#ifdef __cplusplus
} // extern "C"
#endif
#endif

@@ -1,180 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_DKBOOLREADER_H_
#define AOM_DSP_DKBOOLREADER_H_
#include <assert.h>
#include <stddef.h>
#include <limits.h>
#include "./aom_config.h"
#if CONFIG_BITSTREAM_DEBUG
#include <assert.h>
#include <stdio.h>
#include "aom_util/debug_util.h"
#endif // CONFIG_BITSTREAM_DEBUG
#include "aom_ports/mem.h"
#include "aom/aomdx.h"
#include "aom/aom_integer.h"
#include "aom_dsp/prob.h"
#if CONFIG_ACCOUNTING
#include "av1/common/accounting.h"
#endif
#ifdef __cplusplus
extern "C" {
#endif
typedef size_t BD_VALUE;
#define BD_VALUE_SIZE ((int)sizeof(BD_VALUE) * CHAR_BIT)
// This is meant to be a large, positive constant that can still be efficiently
// loaded as an immediate (on platforms like ARM, for example).
// Even relatively modest values like 100 would work fine.
#define LOTS_OF_BITS 0x40000000
struct aom_dk_reader {
// Be careful when reordering this struct; it may impact the cache negatively.
BD_VALUE value;
unsigned int range;
int count;
const uint8_t *buffer_start;
const uint8_t *buffer_end;
const uint8_t *buffer;
aom_decrypt_cb decrypt_cb;
void *decrypt_state;
uint8_t clear_buffer[sizeof(BD_VALUE) + 1];
#if CONFIG_ACCOUNTING
Accounting *accounting;
#endif
};
int aom_dk_reader_init(struct aom_dk_reader *r, const uint8_t *buffer,
size_t size, aom_decrypt_cb decrypt_cb,
void *decrypt_state);
void aom_dk_reader_fill(struct aom_dk_reader *r);
const uint8_t *aom_dk_reader_find_end(struct aom_dk_reader *r);
static INLINE uint32_t aom_dk_reader_tell(const struct aom_dk_reader *r) {
const uint32_t bits_read = (r->buffer - r->buffer_start) * CHAR_BIT;
const int count =
(r->count < LOTS_OF_BITS) ? r->count : r->count - LOTS_OF_BITS;
assert(r->buffer >= r->buffer_start);
return bits_read - (count + CHAR_BIT);
}
/*The resolution of fractional-precision bit usage measurements, i.e.,
3 => 1/8th bits.*/
#define DK_BITRES (3)
static INLINE uint32_t aom_dk_reader_tell_frac(const struct aom_dk_reader *r) {
uint32_t num_bits;
uint32_t range;
int l;
int i;
num_bits = aom_dk_reader_tell(r) << DK_BITRES;
range = r->range;
l = 0;
for (i = DK_BITRES; i-- > 0;) {
int b;
range = range * range >> 7;
b = (int)(range >> 8);
l = l << 1 | b;
range >>= b;
}
return num_bits - l;
}
static INLINE int aom_dk_reader_has_error(struct aom_dk_reader *r) {
// Check if we have reached the end of the buffer.
//
// Variable 'count' stores the number of bits in the 'value' buffer, minus
// 8. The top byte is part of the algorithm, and the remainder is buffered
// to be shifted into it. So if count == 8, the top 16 bits of 'value' are
// occupied, 8 for the algorithm and 8 in the buffer.
//
// When reading a byte from the user's buffer, count is filled with 8 and
// one byte is filled into the value buffer. When we reach the end of the
// data, count is additionally filled with LOTS_OF_BITS. So when
// count == LOTS_OF_BITS - 1, the user's data has been exhausted.
//
// Returns 1 if we have tried to decode bits after the end of stream was
// encountered, and 0 if there is no error.
return r->count > BD_VALUE_SIZE && r->count < LOTS_OF_BITS;
}
static INLINE int aom_dk_read(struct aom_dk_reader *r, int prob) {
unsigned int bit = 0;
BD_VALUE value;
BD_VALUE bigsplit;
int count;
unsigned int range;
unsigned int split = (r->range * prob + (256 - prob)) >> CHAR_BIT;
if (r->count < 0) aom_dk_reader_fill(r);
value = r->value;
count = r->count;
bigsplit = (BD_VALUE)split << (BD_VALUE_SIZE - CHAR_BIT);
range = split;
if (value >= bigsplit) {
range = r->range - split;
value = value - bigsplit;
bit = 1;
}
{
register int shift = aom_norm[range];
range <<= shift;
value <<= shift;
count -= shift;
}
r->value = value;
r->count = count;
r->range = range;
#if CONFIG_BITSTREAM_DEBUG
{
int ref_bit, ref_prob;
const int queue_r = bitstream_queue_get_read();
const int frame_idx = bitstream_queue_get_frame_read();
bitstream_queue_pop(&ref_bit, &ref_prob);
if (prob != ref_prob) {
fprintf(
stderr,
"\n *** prob error, frame_idx_r %d prob %d ref_prob %d queue_r %d\n",
frame_idx, prob, ref_prob, queue_r);
assert(0);
}
if ((int)bit != ref_bit) {
fprintf(stderr, "\n *** bit error, frame_idx_r %d bit %d ref_bit %d\n",
frame_idx, bit, ref_bit);
assert(0);
}
}
#endif // CONFIG_BITSTREAM_DEBUG
return bit;
}
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_DKBOOLREADER_H_

@@ -1,44 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <assert.h>
#include "./dkboolwriter.h"
static INLINE void aom_dk_write_bit(aom_dk_writer *w, int bit) {
aom_dk_write(w, bit, 128); // aom_prob_half
}
void aom_dk_start_encode(aom_dk_writer *br, uint8_t *source) {
br->lowvalue = 0;
br->range = 255;
br->count = -24;
br->buffer = source;
br->pos = 0;
aom_dk_write_bit(br, 0);
}
void aom_dk_stop_encode(aom_dk_writer *br) {
int i;
#if CONFIG_BITSTREAM_DEBUG
bitstream_queue_set_skip_write(1);
#endif // CONFIG_BITSTREAM_DEBUG
for (i = 0; i < 32; i++) aom_dk_write_bit(br, 0);
#if CONFIG_BITSTREAM_DEBUG
bitstream_queue_set_skip_write(0);
#endif // CONFIG_BITSTREAM_DEBUG
// Ensure there's no ambiguous collision with any index marker bytes
if ((br->buffer[br->pos - 1] & 0xe0) == 0xc0) br->buffer[br->pos++] = 0;
}

@@ -1,104 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_DKBOOLWRITER_H_
#define AOM_DSP_DKBOOLWRITER_H_
#include "./aom_config.h"
#if CONFIG_BITSTREAM_DEBUG
#include <stdio.h>
#include "aom_util/debug_util.h"
#endif // CONFIG_BITSTREAM_DEBUG
#include "aom_dsp/prob.h"
#include "aom_ports/mem.h"
#ifdef __cplusplus
extern "C" {
#endif
typedef struct aom_dk_writer {
unsigned int lowvalue;
unsigned int range;
int count;
unsigned int pos;
uint8_t *buffer;
} aom_dk_writer;
void aom_dk_start_encode(aom_dk_writer *bc, uint8_t *buffer);
void aom_dk_stop_encode(aom_dk_writer *bc);
static INLINE void aom_dk_write(aom_dk_writer *br, int bit, int probability) {
unsigned int split;
int count = br->count;
unsigned int range = br->range;
unsigned int lowvalue = br->lowvalue;
register int shift;
#if CONFIG_BITSTREAM_DEBUG
// int queue_r = 0;
// int frame_idx_r = 0;
// int queue_w = bitstream_queue_get_write();
// int frame_idx_w = bitstream_queue_get_frame_write();
// if (frame_idx_w == frame_idx_r && queue_w == queue_r) {
// fprintf(stderr, "\n *** bitstream queue at frame_idx_w %d queue_w %d\n",
// frame_idx_w, queue_w);
// }
bitstream_queue_push(bit, probability);
#endif // CONFIG_BITSTREAM_DEBUG
split = 1 + (((range - 1) * probability) >> 8);
range = split;
if (bit) {
lowvalue += split;
range = br->range - split;
}
shift = aom_norm[range];
range <<= shift;
count += shift;
if (count >= 0) {
int offset = shift - count;
if ((lowvalue << (offset - 1)) & 0x80000000) {
int x = br->pos - 1;
while (x >= 0 && br->buffer[x] == 0xff) {
br->buffer[x] = 0;
x--;
}
br->buffer[x] += 1;
}
br->buffer[br->pos++] = (lowvalue >> (24 - offset));
lowvalue <<= offset;
shift = count;
lowvalue &= 0xffffff;
count -= 8;
}
lowvalue <<= shift;
br->count = count;
br->lowvalue = lowvalue;
br->range = range;
}
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_DKBOOLWRITER_H_
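
The reader and writer compute `split` with different-looking expressions, but the two are algebraically the same: `range * prob + (256 - prob)` equals `(range - 1) * prob + 256`, and shifting that right by 8 gives `1 + (((range - 1) * prob) >> 8)`. The hypothetical check below confirms the agreement exhaustively for 8-bit inputs.

```c
#include <assert.h>

/* Split as computed in aom_dk_read. */
static unsigned int demo_reader_split(unsigned int range, int prob) {
  return (range * prob + (256 - prob)) >> 8;
}

/* Split as computed in aom_dk_write. */
static unsigned int demo_writer_split(unsigned int range, int prob) {
  return 1 + (((range - 1) * prob) >> 8);
}

/* Counts (range, prob) pairs on which the two formulas disagree. */
static int demo_count_split_mismatches(void) {
  int mismatches = 0;
  unsigned int range;
  int prob;
  for (range = 1; range <= 255; ++range) {
    for (prob = 1; prob <= 255; ++prob) {
      if (demo_reader_split(range, prob) != demo_writer_split(range, prob)) {
        ++mismatches;
      }
    }
  }
  return mismatches;
}
```

`demo_count_split_mismatches()` returns 0, and with the initial `range` of 255 and `prob` of 128 both formulas give a split of 128.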

@@ -1,80 +0,0 @@
/*
* Copyright (c) 2001-2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifdef HAVE_CONFIG_H
#include "./config.h"
#endif
#include "aom_dsp/entcode.h"
/*CDFs for uniform probability distributions of small sizes (2 through 16,
inclusive).*/
// clang-format off
const uint16_t OD_UNIFORM_CDFS_Q15[135] = {
16384, 32768,
10923, 21845, 32768,
8192, 16384, 24576, 32768,
6554, 13107, 19661, 26214, 32768,
5461, 10923, 16384, 21845, 27307, 32768,
4681, 9362, 14043, 18725, 23406, 28087, 32768,
4096, 8192, 12288, 16384, 20480, 24576, 28672, 32768,
3641, 7282, 10923, 14564, 18204, 21845, 25486, 29127, 32768,
3277, 6554, 9830, 13107, 16384, 19661, 22938, 26214, 29491, 32768,
2979, 5958, 8937, 11916, 14895, 17873, 20852, 23831, 26810, 29789, 32768,
2731, 5461, 8192, 10923, 13653, 16384, 19115, 21845, 24576, 27307, 30037,
32768,
2521, 5041, 7562, 10082, 12603, 15124, 17644, 20165, 22686, 25206, 27727,
30247, 32768,
2341, 4681, 7022, 9362, 11703, 14043, 16384, 18725, 21065, 23406, 25746,
28087, 30427, 32768,
2185, 4369, 6554, 8738, 10923, 13107, 15292, 17476, 19661, 21845, 24030,
26214, 28399, 30583, 32768,
2048, 4096, 6144, 8192, 10240, 12288, 14336, 16384, 18432, 20480, 22528,
24576, 26624, 28672, 30720, 32768
};
// clang-format on
/*Given the current total integer number of bits used and the current value of
rng, computes the fractional number of bits used to OD_BITRES precision.
This is used by od_ec_enc_tell_frac() and od_ec_dec_tell_frac().
nbits_total: The number of whole bits currently used, i.e., the value
returned by od_ec_enc_tell() or od_ec_dec_tell().
rng: The current value of rng from either the encoder or decoder state.
Return: The number of bits scaled by 2**OD_BITRES.
This will always be slightly larger than the exact value (i.e., all
rounding error is in the positive direction).*/
uint32_t od_ec_tell_frac(uint32_t nbits_total, uint32_t rng) {
uint32_t nbits;
int l;
int i;
/*To handle the non-integral number of bits still left in the encoder/decoder
state, we compute the worst-case number of bits of val that must be
encoded to ensure that the value is inside the range for any possible
subsequent bits.
The computation here is independent of val itself (the decoder does not
even track that value), even though the real number of bits used after
od_ec_enc_done() may be 1 smaller if rng is a power of two and the
corresponding trailing bits of val are all zeros.
If we did try to track that special case, then coding a value with a
probability of 1/(1 << n) might sometimes appear to use more than n bits.
This may help explain the surprising result that a newly initialized
encoder or decoder claims to have used 1 bit.*/
nbits = nbits_total << OD_BITRES;
l = 0;
for (i = OD_BITRES; i-- > 0;) {
int b;
rng = rng * rng >> 15;
b = (int)(rng >> 16);
l = l << 1 | b;
rng >>= b;
}
return nbits - l;
}
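
The fractional-bit estimate in od_ec_tell_frac() can be tried in isolation. What follows is a minimal sketch, assuming OD_BITRES is 3 as defined in entcode.h; the names tell_frac_sketch and SKETCH_BITRES are hypothetical and not part of the library. Each loop iteration squares rng (kept in Q15) to extract one more bit of log2(rng / 32768), which is then subtracted from the whole-bit count.

```c
#include <stdint.h>

/* Assumption: mirrors OD_BITRES (3 => 1/8th-bit resolution). */
#define SKETCH_BITRES 3

/* Hypothetical stand-alone copy of the od_ec_tell_frac() logic.
   rng is expected to satisfy 32768 <= rng < 65536, so rng * rng
   fits in 32 bits. */
uint32_t tell_frac_sketch(uint32_t nbits_total, uint32_t rng) {
  uint32_t nbits = nbits_total << SKETCH_BITRES;
  uint32_t l = 0;
  int i;
  for (i = 0; i < SKETCH_BITRES; i++) {
    uint32_t b;
    /* Square in Q15; the result lies in [32768, 131072). */
    rng = rng * rng >> 15;
    /* b is the next fractional bit of log2(rng / 32768). */
    b = rng >> 16;
    l = l << 1 | b;
    /* Renormalize back into [32768, 65536). */
    rng >>= b;
  }
  return nbits - l;
}
```

With rng == 0x8000 nothing is subtracted, so one whole bit is reported as 8 eighth-bits. With rng == 49152 (1.5 * 32768) the loop subtracts 4 eighth-bits, which is 8 * log2(1.5) (about 4.68) rounded down, so the reported usage errs on the high side as the comment above describes.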


@@ -1,105 +0,0 @@
/*
* Copyright (c) 2001-2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#if !defined(_entcode_H)
#define _entcode_H (1)
#include <limits.h>
#include <stddef.h>
#include "av1/common/odintrin.h"
/*Set this flag 1 to enable a "reduced overhead" version of the entropy coder.
This uses a partition function that more accurately follows the input
probability estimates at the expense of some additional CPU cost (though
still an order of magnitude less than a full division).
In classic arithmetic coding, the partition function maps a value x in the
range [0, ft] to a value y in [0, r] with 0 < ft <= r via
y = x*r/ft.
Any deviation from this value increases coding inefficiency.
To avoid divisions, we require ft <= r < 2*ft (enforcing it by shifting up
ft if necessary), and replace that function with
y = x + OD_MINI(x, r - ft).
This counts values of x smaller than r - ft double compared to values larger
than r - ft, which over-estimates the probability of symbols at the start of
the alphabet, and under-estimates the probability of symbols at the end of
the alphabet.
The overall coding inefficiency assuming accurate probability models and
independent symbols is in the 1% range, which is similar to that of CABAC.
To reduce overhead even further, we split this into two cases:
1) r - ft > ft - (r - ft).
That is, we have more values of x that are double-counted than
single-counted.
In this case, we still double-count the first 2*r - 3*ft values of x, but
after that we alternate between single-counting and double-counting for
the rest.
2) r - ft < ft - (r - ft).
That is, we have more values of x that are single-counted than
double-counted.
In this case, we alternate between single-counting and double-counting for
the first 2*(r - ft) values of x, and single-count the rest.
For two equiprobable symbols in different places in the alphabet, this
reduces the maximum ratio of over-estimation to under-estimation from 2:1
for the previous partition function to either 4:3 or 3:2 (for each of the
two cases above, respectively), assuming symbol probabilities significantly
greater than 1/32768.
That reduces the worst-case per-symbol overhead from 1 bit to 0.58 bits.
The resulting function is
e = OD_MAXI(2*r - 3*ft, 0);
y = x + OD_MINI(x, e) + OD_MINI(OD_MAXI(x - e, 0) >> 1, r - ft).
Here, e is a value that is greater than 0 in case 1, and 0 in case 2.
This function is about 3 times as expensive to evaluate as the high-overhead
version, but still an order of magnitude cheaper than a division, since it
is composed only of very simple operations.
Because we want to fit in 16-bit registers and must use unsigned values to do
so, we use saturating subtraction to enforce the maximums with 0.
Enabling this reduces the measured overhead in ectest from 0.805% to 0.621%
(vs. 0.022% for the division-based partition function with r much greater
than ft).
It improves performance on ntt-short-1 by about 0.3%.*/
#define OD_EC_REDUCED_OVERHEAD (1)
/*OPT: od_ec_window must be at least 32 bits, but if you have fast arithmetic
on a larger type, you can speed up the decoder by using it here.*/
typedef uint32_t od_ec_window;
#define OD_EC_WINDOW_SIZE ((int)sizeof(od_ec_window) * CHAR_BIT)
/*Unsigned subtraction with unsigned saturation.
This implementation of the macro is intentionally chosen to increase the
number of common subexpressions in the reduced-overhead partition function.
This matters for C code, but it would not for hardware with a saturating
subtraction instruction.*/
#define OD_SUBSATU(a, b) ((a)-OD_MINI(a, b))
/*The number of bits to use for the range-coded part of unsigned integers.*/
#define OD_EC_UINT_BITS (4)
/*The resolution of fractional-precision bit usage measurements, i.e.,
3 => 1/8th bits.*/
#define OD_BITRES (3)
extern const uint16_t OD_UNIFORM_CDFS_Q15[135];
/*Returns a Q15 CDF for a uniform probability distribution of the given size.
n: The size of the distribution.
This must be at least 2, and no more than 16.*/
#define OD_UNIFORM_CDF_Q15(n) (OD_UNIFORM_CDFS_Q15 + ((n) * ((n)-1) >> 1) - 1)
/*See entcode.c for further documentation.*/
OD_WARN_UNUSED_RESULT uint32_t od_ec_tell_frac(uint32_t nbits_total,
uint32_t rng);
#endif
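
The two division-free partition functions described in the OD_EC_REDUCED_OVERHEAD comment can be compared side by side. This is a sketch only, assuming ft <= r < 2*ft as the comment requires; partition_simple, partition_reduced, and the helper names are hypothetical and use plain 32-bit arithmetic rather than the OD_MINI/OD_SUBSATU macros.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Unsigned subtraction that saturates at 0, like OD_SUBSATU. */
static uint32_t subsat_u32(uint32_t a, uint32_t b) {
  return a - min_u32(a, b);
}

/* High-overhead version: y = x + min(x, r - ft).
   The first r - ft values of x are counted double. */
uint32_t partition_simple(uint32_t x, uint32_t r, uint32_t ft) {
  assert(ft <= r && r - ft < ft && x <= ft);
  return x + min_u32(x, r - ft);
}

/* Reduced-overhead version:
     e = max(2*r - 3*ft, 0)
     y = x + min(x, e) + min(max(x - e, 0) >> 1, r - ft)
   With d = r - ft, e equals the saturating difference 2*d - ft. */
uint32_t partition_reduced(uint32_t x, uint32_t r, uint32_t ft) {
  uint32_t d;
  uint32_t e;
  assert(ft <= r && r - ft < ft && x <= ft);
  d = r - ft;
  e = subsat_u32(2 * d, ft);
  return x + min_u32(x, e) + min_u32(subsat_u32(x, e) >> 1, d);
}
```

Both functions map 0 to 0 and ft to r. For ft = 32768 and r = 40000 (case 2, e = 0), the simple version sends x = 10000 to 17232 while the reduced version sends it to 15000, which is closer to the exact x*r/ft of about 12207.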


@@ -1,494 +0,0 @@
/*
* Copyright (c) 2001-2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifdef HAVE_CONFIG_H
#include "./config.h"
#endif
#include "aom_dsp/entdec.h"
/*A range decoder.
This is an entropy decoder based upon \cite{Mar79}, which is itself a
rediscovery of the FIFO arithmetic code introduced by \cite{Pas76}.
It is very similar to arithmetic encoding, except that encoding is done with
digits in any base, instead of with bits, and so it is faster when using
larger bases (e.g., a byte).
The author claims an average waste of $\frac{1}{2}\log_b(2b)$ bits, where $b$
is the base, longer than the theoretical optimum, but to my knowledge there
is no published justification for this claim.
This only seems true when using near-infinite precision arithmetic so that
the process is carried out with no rounding errors.
An excellent description of implementation details is available at
http://www.arturocampos.com/ac_range.html
A recent work \cite{MNW98} which proposes several changes to arithmetic
encoding for efficiency actually re-discovers many of the principles
behind range encoding, and presents a good theoretical analysis of them.
End of stream is handled by writing out the smallest number of bits that
ensures that the stream will be correctly decoded regardless of the value of
any subsequent bits.
od_ec_dec_tell() can be used to determine how many bits were needed to decode
all the symbols thus far; other data can be packed in the remaining bits of
the input buffer.
@PHDTHESIS{Pas76,
author="Richard Clark Pasco",
title="Source coding algorithms for fast data compression",
school="Dept. of Electrical Engineering, Stanford University",
address="Stanford, CA",
month=May,
year=1976,
URL="http://www.richpasco.org/scaffdc.pdf"
}
@INPROCEEDINGS{Mar79,
author="Martin, G.N.N.",
title="Range encoding: an algorithm for removing redundancy from a digitised
message",
booktitle="Video & Data Recording Conference",
year=1979,
address="Southampton",
month=Jul,
URL="http://www.compressconsult.com/rangecoder/rngcod.pdf.gz"
}
@ARTICLE{MNW98,
author="Alistair Moffat and Radford Neal and Ian H. Witten",
title="Arithmetic Coding Revisited",
journal="{ACM} Transactions on Information Systems",
year=1998,
volume=16,
number=3,
pages="256--294",
month=Jul,
URL="http://researchcommons.waikato.ac.nz/bitstream/handle/10289/78/content.pdf"
}*/
/*This is meant to be a large, positive constant that can still be efficiently
loaded as an immediate (on platforms like ARM, for example).
Even relatively modest values like 100 would work fine.*/
#define OD_EC_LOTS_OF_BITS (0x4000)
static void od_ec_dec_refill(od_ec_dec *dec) {
int s;
od_ec_window dif;
int16_t cnt;
const unsigned char *bptr;
const unsigned char *end;
dif = dec->dif;
cnt = dec->cnt;
bptr = dec->bptr;
end = dec->end;
s = OD_EC_WINDOW_SIZE - 9 - (cnt + 15);
for (; s >= 0 && bptr < end; s -= 8, bptr++) {
OD_ASSERT(s <= OD_EC_WINDOW_SIZE - 8);
dif |= (od_ec_window)bptr[0] << s;
cnt += 8;
}
if (bptr >= end) {
dec->tell_offs += OD_EC_LOTS_OF_BITS - cnt;
cnt = OD_EC_LOTS_OF_BITS;
}
dec->dif = dif;
dec->cnt = cnt;
dec->bptr = bptr;
}
/*Takes updated dif and range values, renormalizes them so that
32768 <= rng < 65536 (reading more bytes from the stream into dif if
necessary), and stores them back in the decoder context.
dif: The new value of dif.
rng: The new value of the range.
ret: The value to return.
Return: ret.
This allows the compiler to jump to this function via a tail-call.*/
static int od_ec_dec_normalize(od_ec_dec *dec, od_ec_window dif, unsigned rng,
int ret) {
int d;
OD_ASSERT(rng <= 65535U);
d = 16 - OD_ILOG_NZ(rng);
dec->cnt -= d;
dec->dif = dif << d;
dec->rng = rng << d;
if (dec->cnt < 0) od_ec_dec_refill(dec);
return ret;
}
/*Initializes the decoder.
buf: The input buffer to use.
storage: The size of the input buffer, in bytes.*/
void od_ec_dec_init(od_ec_dec *dec, const unsigned char *buf,
uint32_t storage) {
dec->buf = buf;
dec->eptr = buf + storage;
dec->end_window = 0;
dec->nend_bits = 0;
dec->tell_offs = 10 - (OD_EC_WINDOW_SIZE - 8);
dec->end = buf + storage;
dec->bptr = buf;
dec->dif = 0;
dec->rng = 0x8000;
dec->cnt = -15;
dec->error = 0;
od_ec_dec_refill(dec);
}
/*Decode a bit that has an fz/ft probability of being a zero.
fz: The probability that the bit is zero, scaled by ft.
ft: The total probability.
This must be at least 16384 and no more than 32768.
Return: The value decoded (0 or 1).*/
int od_ec_decode_bool(od_ec_dec *dec, unsigned fz, unsigned ft) {
od_ec_window dif;
od_ec_window vw;
unsigned r;
int s;
unsigned v;
int ret;
OD_ASSERT(0 < fz);
OD_ASSERT(fz < ft);
OD_ASSERT(16384 <= ft);
OD_ASSERT(ft <= 32768U);
dif = dec->dif;
r = dec->rng;
OD_ASSERT(dif >> (OD_EC_WINDOW_SIZE - 16) < r);
OD_ASSERT(ft <= r);
s = r - ft >= ft;
ft <<= s;
fz <<= s;
OD_ASSERT(r - ft < ft);
#if OD_EC_REDUCED_OVERHEAD
{
unsigned d;
unsigned e;
d = r - ft;
e = OD_SUBSATU(2 * d, ft);
v = fz + OD_MINI(fz, e) + OD_MINI(OD_SUBSATU(fz, e) >> 1, d);
}
#else
v = fz + OD_MINI(fz, r - ft);
#endif
vw = (od_ec_window)v << (OD_EC_WINDOW_SIZE - 16);
ret = dif >= vw;
if (ret) dif -= vw;
r = ret ? r - v : v;
return od_ec_dec_normalize(dec, dif, r, ret);
}
/*Decode a bit that has an fz probability of being a zero in Q15.
This is a simpler, lower overhead version of od_ec_decode_bool() for use when
ft == 32768.
To be decoded properly by this function, symbols cannot have been encoded by
od_ec_encode(), but must have been encoded with one of the equivalent _q15()
or _dyadic() functions instead.
fz: The probability that the bit is zero, scaled by 32768.
Return: The value decoded (0 or 1).*/
int od_ec_decode_bool_q15(od_ec_dec *dec, unsigned fz) {
od_ec_window dif;
od_ec_window vw;
unsigned r;
unsigned r_new;
unsigned v;
int ret;
OD_ASSERT(0 < fz);
OD_ASSERT(fz < 32768U);
dif = dec->dif;
r = dec->rng;
OD_ASSERT(dif >> (OD_EC_WINDOW_SIZE - 16) < r);
OD_ASSERT(32768U <= r);
v = fz * (uint32_t)r >> 15;
vw = (od_ec_window)v << (OD_EC_WINDOW_SIZE - 16);
ret = 0;
r_new = v;
if (dif >= vw) {
r_new = r - v;
dif -= vw;
ret = 1;
}
return od_ec_dec_normalize(dec, dif, r_new, ret);
}
/*Decodes a symbol given a cumulative distribution function (CDF) table.
cdf: The CDF, such that symbol s falls in the range
[s > 0 ? cdf[s - 1] : 0, cdf[s]).
The values must be monotonically non-decreasing, and cdf[nsyms - 1]
must be at least 16384, and no more than 32768.
nsyms: The number of symbols in the alphabet.
This should be at most 16.
Return: The decoded symbol s.*/
int od_ec_decode_cdf(od_ec_dec *dec, const uint16_t *cdf, int nsyms) {
od_ec_window dif;
unsigned r;
unsigned c;
unsigned d;
#if OD_EC_REDUCED_OVERHEAD
unsigned e;
#endif
int s;
unsigned u;
unsigned v;
unsigned q;
unsigned fl;
unsigned fh;
unsigned ft;
int ret;
dif = dec->dif;
r = dec->rng;
OD_ASSERT(dif >> (OD_EC_WINDOW_SIZE - 16) < r);
OD_ASSERT(nsyms > 0);
ft = cdf[nsyms - 1];
OD_ASSERT(16384 <= ft);
OD_ASSERT(ft <= 32768U);
OD_ASSERT(ft <= r);
s = r - ft >= ft;
ft <<= s;
d = r - ft;
OD_ASSERT(d < ft);
c = (unsigned)(dif >> (OD_EC_WINDOW_SIZE - 16));
q = OD_MAXI((int)(c >> 1), (int)(c - d));
#if OD_EC_REDUCED_OVERHEAD
e = OD_SUBSATU(2 * d, ft);
/*The correctness of this inverse partition function is not obvious, but it
was checked exhaustively for all possible values of r, ft, and c.
TODO: It should be possible to optimize this better than the compiler,
given that we do not care about the accuracy of negative results (as we
will not use them).
It would also be nice to get rid of the 32-bit dividend, as it requires a
32x32->64 bit multiply to invert.*/
q = OD_MAXI((int)q, (int)((2 * (int32_t)c + 1 - (int32_t)e) / 3));
#endif
q >>= s;
OD_ASSERT(q < ft >> s);
fl = 0;
ret = 0;
for (fh = cdf[ret]; fh <= q; fh = cdf[++ret]) fl = fh;
OD_ASSERT(fh <= ft >> s);
fl <<= s;
fh <<= s;
#if OD_EC_REDUCED_OVERHEAD
u = fl + OD_MINI(fl, e) + OD_MINI(OD_SUBSATU(fl, e) >> 1, d);
v = fh + OD_MINI(fh, e) + OD_MINI(OD_SUBSATU(fh, e) >> 1, d);
#else
u = fl + OD_MINI(fl, d);
v = fh + OD_MINI(fh, d);
#endif
r = v - u;
dif -= (od_ec_window)u << (OD_EC_WINDOW_SIZE - 16);
return od_ec_dec_normalize(dec, dif, r, ret);
}
/*Decodes a symbol given a cumulative distribution function (CDF) table.
cdf: The CDF, such that symbol s falls in the range
[s > 0 ? cdf[s - 1] : 0, cdf[s]).
The values must be monotonically non-decreasing, and cdf[nsyms - 1]
must be at least 2, and no more than 32768.
nsyms: The number of symbols in the alphabet.
This should be at most 16.
Return: The decoded symbol s.*/
int od_ec_decode_cdf_unscaled(od_ec_dec *dec, const uint16_t *cdf, int nsyms) {
od_ec_window dif;
unsigned r;
unsigned c;
unsigned d;
#if OD_EC_REDUCED_OVERHEAD
unsigned e;
#endif
int s;
unsigned u;
unsigned v;
unsigned q;
unsigned fl;
unsigned fh;
unsigned ft;
int ret;
dif = dec->dif;
r = dec->rng;
OD_ASSERT(dif >> (OD_EC_WINDOW_SIZE - 16) < r);
OD_ASSERT(nsyms > 0);
ft = cdf[nsyms - 1];
OD_ASSERT(2 <= ft);
OD_ASSERT(ft <= 32768U);
s = 15 - OD_ILOG_NZ(ft - 1);
ft <<= s;
OD_ASSERT(ft <= r);
if (r - ft >= ft) {
ft <<= 1;
s++;
}
d = r - ft;
OD_ASSERT(d < ft);
c = (unsigned)(dif >> (OD_EC_WINDOW_SIZE - 16));
q = OD_MAXI((int)(c >> 1), (int)(c - d));
#if OD_EC_REDUCED_OVERHEAD
e = OD_SUBSATU(2 * d, ft);
/*TODO: See TODO above.*/
q = OD_MAXI((int)q, (int)((2 * (int32_t)c + 1 - (int32_t)e) / 3));
#endif
q >>= s;
OD_ASSERT(q < ft >> s);
fl = 0;
ret = 0;
for (fh = cdf[ret]; fh <= q; fh = cdf[++ret]) fl = fh;
OD_ASSERT(fh <= ft >> s);
fl <<= s;
fh <<= s;
#if OD_EC_REDUCED_OVERHEAD
u = fl + OD_MINI(fl, e) + OD_MINI(OD_SUBSATU(fl, e) >> 1, d);
v = fh + OD_MINI(fh, e) + OD_MINI(OD_SUBSATU(fh, e) >> 1, d);
#else
u = fl + OD_MINI(fl, d);
v = fh + OD_MINI(fh, d);
#endif
r = v - u;
dif -= (od_ec_window)u << (OD_EC_WINDOW_SIZE - 16);
return od_ec_dec_normalize(dec, dif, r, ret);
}
/*Decodes a symbol given a cumulative distribution function (CDF) table that
sums to a power of two.
This is a simpler, lower overhead version of od_ec_decode_cdf() for use when
cdf[nsyms - 1] is a power of two.
To be decoded properly by this function, symbols cannot have been encoded by
od_ec_encode(), but must have been encoded with one of the equivalent _q15()
functions instead.
cdf: The CDF, such that symbol s falls in the range
[s > 0 ? cdf[s - 1] : 0, cdf[s]).
The values must be monotonically non-decreasing, and cdf[nsyms - 1]
must be exactly 1 << ftb.
nsyms: The number of symbols in the alphabet.
This should be at most 16.
ftb: The number of bits of precision in the cumulative distribution.
This must be no more than 15.
Return: The decoded symbol s.*/
int od_ec_decode_cdf_unscaled_dyadic(od_ec_dec *dec, const uint16_t *cdf,
int nsyms, unsigned ftb) {
od_ec_window dif;
unsigned r;
unsigned c;
unsigned u;
unsigned v;
int ret;
(void)nsyms;
dif = dec->dif;
r = dec->rng;
OD_ASSERT(dif >> (OD_EC_WINDOW_SIZE - 16) < r);
OD_ASSERT(ftb <= 15);
OD_ASSERT(cdf[nsyms - 1] == 1U << ftb);
OD_ASSERT(32768U <= r);
c = (unsigned)(dif >> (OD_EC_WINDOW_SIZE - 16));
v = 0;
ret = -1;
do {
u = v;
v = cdf[++ret] * (uint32_t)r >> ftb;
} while (v <= c);
OD_ASSERT(v <= r);
r = v - u;
dif -= (od_ec_window)u << (OD_EC_WINDOW_SIZE - 16);
return od_ec_dec_normalize(dec, dif, r, ret);
}
/*Decodes a symbol given a cumulative distribution function (CDF) table in Q15.
This is a simpler, lower overhead version of od_ec_decode_cdf() for use when
cdf[nsyms - 1] == 32768.
To be decoded properly by this function, symbols cannot have been encoded by
od_ec_encode(), but must have been encoded with one of the equivalent _q15()
or _dyadic() functions instead.
cdf: The CDF, such that symbol s falls in the range
[s > 0 ? cdf[s - 1] : 0, cdf[s]).
The values must be monotonically non-decreasing, and cdf[nsyms - 1]
must be 32768.
nsyms: The number of symbols in the alphabet.
This should be at most 16.
Return: The decoded symbol s.*/
int od_ec_decode_cdf_q15(od_ec_dec *dec, const uint16_t *cdf, int nsyms) {
return od_ec_decode_cdf_unscaled_dyadic(dec, cdf, nsyms, 15);
}
/*Extracts a raw unsigned integer with a non-power-of-2 range from the stream.
The integer must have been encoded with od_ec_enc_uint().
ft: The number of integers that can be decoded (one more than the max).
This must be at least 2, and no more than 2**29.
Return: The decoded integer.
uint32_t od_ec_dec_uint(od_ec_dec *dec, uint32_t ft) {
OD_ASSERT(ft >= 2);
OD_ASSERT(ft <= (uint32_t)1 << (25 + OD_EC_UINT_BITS));
if (ft > 1U << OD_EC_UINT_BITS) {
uint32_t t;
int ft1;
int ftb;
ft--;
ftb = OD_ILOG_NZ(ft) - OD_EC_UINT_BITS;
ft1 = (int)(ft >> ftb) + 1;
t = od_ec_decode_cdf_q15(dec, OD_UNIFORM_CDF_Q15(ft1), ft1);
t = t << ftb | od_ec_dec_bits(dec, ftb, "");
if (t <= ft) return t;
dec->error = 1;
return ft;
}
return od_ec_decode_cdf_q15(dec, OD_UNIFORM_CDF_Q15(ft), (int)ft);
}
/*Extracts a sequence of raw bits from the stream.
The bits must have been encoded with od_ec_enc_bits().
ftb: The number of bits to extract.
This must be between 0 and 25, inclusive.
Return: The decoded bits.*/
uint32_t od_ec_dec_bits_(od_ec_dec *dec, unsigned ftb) {
od_ec_window window;
int available;
uint32_t ret;
OD_ASSERT(ftb <= 25);
window = dec->end_window;
available = dec->nend_bits;
if ((unsigned)available < ftb) {
const unsigned char *buf;
const unsigned char *eptr;
buf = dec->buf;
eptr = dec->eptr;
OD_ASSERT(available <= OD_EC_WINDOW_SIZE - 8);
do {
if (eptr <= buf) {
dec->tell_offs += OD_EC_LOTS_OF_BITS - available;
available = OD_EC_LOTS_OF_BITS;
break;
}
window |= (od_ec_window)*--eptr << available;
available += 8;
} while (available <= OD_EC_WINDOW_SIZE - 8);
dec->eptr = eptr;
}
ret = (uint32_t)window & (((uint32_t)1 << ftb) - 1);
window >>= ftb;
available -= ftb;
dec->end_window = window;
dec->nend_bits = available;
return ret;
}
/*Returns the number of bits "used" by the decoded symbols so far.
This same number can be computed in either the encoder or the decoder, and is
suitable for making coding decisions.
Return: The number of bits.
This will always be slightly larger than the exact value (i.e., all
rounding error is in the positive direction).*/
int od_ec_dec_tell(const od_ec_dec *dec) {
return ((dec->end - dec->eptr) + (dec->bptr - dec->buf)) * 8 - dec->cnt -
dec->nend_bits + dec->tell_offs;
}
/*Returns the number of bits "used" by the decoded symbols so far.
This same number can be computed in either the encoder or the decoder, and is
suitable for making coding decisions.
Return: The number of bits scaled by 2**OD_BITRES.
This will always be slightly larger than the exact value (i.e., all
rounding error is in the positive direction).*/
uint32_t od_ec_dec_tell_frac(const od_ec_dec *dec) {
return od_ec_tell_frac(od_ec_dec_tell(dec), dec->rng);
}
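
The symbol search used by od_ec_decode_cdf_unscaled_dyadic() can be illustrated without any decoder state. The sketch below assumes a CDF whose last entry is exactly 1 << ftb and a coded value c that is strictly less than the range r; dyadic_find_symbol and the demo_* wrappers are hypothetical names introduced only for this example.

```c
#include <stdint.h>

/* Hypothetical helper: scales each CDF entry by r / 2**ftb and returns the
   first symbol s whose scaled upper bound v exceeds c, so that u <= c < v.
   Requires c < r, otherwise the loop would run past the end of the table. */
int dyadic_find_symbol(const uint16_t *cdf, unsigned ftb, uint32_t r,
                       uint32_t c, uint32_t *u_out, uint32_t *v_out) {
  uint32_t u;
  uint32_t v = 0;
  int s = -1;
  do {
    u = v;
    v = (uint32_t)cdf[++s] * r >> ftb;
  } while (v <= c);
  *u_out = u;
  *v_out = v;
  return s;
}

/* A uniform 4-symbol Q15 CDF scaled to r = 40000 gives the boundaries
   10000, 20000, 30000, 40000. */
static const uint16_t demo_cdf[4] = { 8192, 16384, 24576, 32768 };

/* Returns the symbol decoded for coded value c (c < 40000). */
int demo_symbol(uint32_t c) {
  uint32_t u;
  uint32_t v;
  return dyadic_find_symbol(demo_cdf, 15, 40000, c, &u, &v);
}

/* Returns the new range v - u after decoding coded value c. */
uint32_t demo_range(uint32_t c) {
  uint32_t u;
  uint32_t v;
  (void)dyadic_find_symbol(demo_cdf, 15, 40000, c, &u, &v);
  return v - u;
}
```

In the real decoder the new range v - u and the reduced dif are then passed to od_ec_dec_normalize(); the sketch stops before that step.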


@@ -1,101 +0,0 @@
/*
* Copyright (c) 2001-2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#if !defined(_entdec_H)
#define _entdec_H (1)
#include <limits.h>
#include "aom_dsp/entcode.h"
#ifdef __cplusplus
extern "C" {
#endif
typedef struct od_ec_dec od_ec_dec;
#if OD_ACCOUNTING
#define OD_ACC_STR , char *acc_str
#define od_ec_dec_bits(dec, ftb, str) od_ec_dec_bits_(dec, ftb, str)
#else
#define OD_ACC_STR
#define od_ec_dec_bits(dec, ftb, str) od_ec_dec_bits_(dec, ftb)
#endif
/*The entropy decoder context.*/
struct od_ec_dec {
/*The start of the current input buffer.*/
const unsigned char *buf;
/*The read pointer for the raw bits.*/
const unsigned char *eptr;
/*Bits that will be read from/written at the end.*/
od_ec_window end_window;
/*Number of valid bits in end_window.*/
int nend_bits;
/*An offset used to keep track of tell after reaching the end of the stream.
This is constant throughout most of the decoding process, but becomes
important once we hit the end of the buffer and stop incrementing pointers
(and instead pretend cnt/nend_bits have lots of bits).*/
int32_t tell_offs;
/*The end of the current input buffer.*/
const unsigned char *end;
/*The read pointer for the entropy-coded bits.*/
const unsigned char *bptr;
/*The difference between the coded value and the low end of the current
range.*/
od_ec_window dif;
/*The number of values in the current range.*/
uint16_t rng;
/*The number of bits of data in the current value.*/
int16_t cnt;
/*Nonzero if an error occurred.*/
int error;
};
/*See entdec.c for further documentation.*/
void od_ec_dec_init(od_ec_dec *dec, const unsigned char *buf, uint32_t storage)
OD_ARG_NONNULL(1) OD_ARG_NONNULL(2);
OD_WARN_UNUSED_RESULT int od_ec_decode_bool(od_ec_dec *dec, unsigned fz,
unsigned ft) OD_ARG_NONNULL(1);
OD_WARN_UNUSED_RESULT int od_ec_decode_bool_q15(od_ec_dec *dec, unsigned fz)
OD_ARG_NONNULL(1);
OD_WARN_UNUSED_RESULT int od_ec_decode_cdf(od_ec_dec *dec, const uint16_t *cdf,
int nsyms) OD_ARG_NONNULL(1)
OD_ARG_NONNULL(2);
OD_WARN_UNUSED_RESULT int od_ec_decode_cdf_q15(od_ec_dec *dec,
const uint16_t *cdf, int nsyms)
OD_ARG_NONNULL(1) OD_ARG_NONNULL(2);
OD_WARN_UNUSED_RESULT int od_ec_decode_cdf_unscaled(od_ec_dec *dec,
const uint16_t *cdf,
int nsyms) OD_ARG_NONNULL(1)
OD_ARG_NONNULL(2);
OD_WARN_UNUSED_RESULT int od_ec_decode_cdf_unscaled_dyadic(od_ec_dec *dec,
const uint16_t *cdf,
int nsyms,
unsigned ftb)
OD_ARG_NONNULL(1) OD_ARG_NONNULL(2);
OD_WARN_UNUSED_RESULT uint32_t od_ec_dec_uint(od_ec_dec *dec, uint32_t ft)
OD_ARG_NONNULL(1);
OD_WARN_UNUSED_RESULT uint32_t od_ec_dec_bits_(od_ec_dec *dec, unsigned ftb)
OD_ARG_NONNULL(1);
OD_WARN_UNUSED_RESULT int od_ec_dec_tell(const od_ec_dec *dec)
OD_ARG_NONNULL(1);
OD_WARN_UNUSED_RESULT uint32_t od_ec_dec_tell_frac(const od_ec_dec *dec)
OD_ARG_NONNULL(1);
#ifdef __cplusplus
} // extern "C"
#endif
#endif
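
od_ec_dec_uint() splits a large integer range into a range-coded high part and raw low bits. The sketch below reproduces only that split, assuming OD_EC_UINT_BITS is 4 and that OD_ILOG_NZ(x) returns the bit length of a nonzero x (an inference from how it is used in od_ec_dec_normalize(), not something shown in this section). The function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: mirrors OD_EC_UINT_BITS. */
#define SKETCH_UINT_BITS 4

/* Bit length of x for x > 0 (assumed to match OD_ILOG_NZ). */
static int ilog_nz(uint32_t x) {
  int n = 0;
  while (x) {
    n++;
    x >>= 1;
  }
  return n;
}

/* Number of raw low bits read for a range of ft values.
   Only meaningful when ft > (1 << SKETCH_UINT_BITS). */
int uint_raw_bits(uint32_t ft) {
  return ilog_nz(ft - 1) - SKETCH_UINT_BITS;
}

/* Alphabet size of the range-coded high part (between 9 and 16). */
int uint_high_symbols(uint32_t ft) {
  return (int)((ft - 1) >> uint_raw_bits(ft)) + 1;
}
```

For ft = 1000 this gives 6 raw bits and a 16-symbol high part, so the value 777 would be coded as high symbol 12 followed by the raw bits of 9. At the documented maximum ft = 2**29 the split yields 25 raw bits, which matches the limit asserted in od_ec_dec_bits_().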


@@ -1,686 +0,0 @@
/*
* Copyright (c) 2001-2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifdef HAVE_CONFIG_H
#include "./config.h"
#endif
#include <stdlib.h>
#include <string.h>
#include "aom_dsp/entenc.h"
/*A range encoder.
See entdec.c and the references for implementation details \cite{Mar79,MNW98}.
@INPROCEEDINGS{Mar79,
author="Martin, G.N.N.",
title="Range encoding: an algorithm for removing redundancy from a digitised
message",
booktitle="Video \& Data Recording Conference",
year=1979,
address="Southampton",
month=Jul,
URL="http://www.compressconsult.com/rangecoder/rngcod.pdf.gz"
}
@ARTICLE{MNW98,
author="Alistair Moffat and Radford Neal and Ian H. Witten",
title="Arithmetic Coding Revisited",
journal="{ACM} Transactions on Information Systems",
year=1998,
volume=16,
number=3,
pages="256--294",
month=Jul,
URL="http://researchcommons.waikato.ac.nz/bitstream/handle/10289/78/content.pdf"
}*/
/*Takes updated low and range values, renormalizes them so that
32768 <= rng < 65536 (flushing bytes from low to the pre-carry buffer if
necessary), and stores them back in the encoder context.
low: The new value of low.
rng: The new value of the range.*/
static void od_ec_enc_normalize(od_ec_enc *enc, od_ec_window low,
unsigned rng) {
int d;
int c;
int s;
c = enc->cnt;
OD_ASSERT(rng <= 65535U);
d = 16 - OD_ILOG_NZ(rng);
s = c + d;
/*TODO: Right now we flush every time we have at least one byte available.
Instead we should use an od_ec_window and flush right before we're about to
shift bits off the end of the window.
For a 32-bit window this is about the same amount of work, but for a 64-bit
window it should be a fair win.*/
if (s >= 0) {
uint16_t *buf;
uint32_t storage;
uint32_t offs;
unsigned m;
buf = enc->precarry_buf;
storage = enc->precarry_storage;
offs = enc->offs;
if (offs + 2 > storage) {
storage = 2 * storage + 2;
buf = (uint16_t *)realloc(buf, sizeof(*buf) * storage);
if (buf == NULL) {
enc->error = -1;
enc->offs = 0;
return;
}
enc->precarry_buf = buf;
enc->precarry_storage = storage;
}
c += 16;
m = (1 << c) - 1;
if (s >= 8) {
OD_ASSERT(offs < storage);
buf[offs++] = (uint16_t)(low >> c);
low &= m;
c -= 8;
m >>= 8;
}
OD_ASSERT(offs < storage);
buf[offs++] = (uint16_t)(low >> c);
s = c + d - 24;
low &= m;
enc->offs = offs;
}
enc->low = low << d;
enc->rng = rng << d;
enc->cnt = s;
}
/*Initializes the encoder.
size: The initial size of the buffer, in bytes.*/
void od_ec_enc_init(od_ec_enc *enc, uint32_t size) {
od_ec_enc_reset(enc);
enc->buf = (unsigned char *)malloc(sizeof(*enc->buf) * size);
enc->storage = size;
if (size > 0 && enc->buf == NULL) {
enc->storage = 0;
enc->error = -1;
}
enc->precarry_buf = (uint16_t *)malloc(sizeof(*enc->precarry_buf) * size);
enc->precarry_storage = size;
if (size > 0 && enc->precarry_buf == NULL) {
enc->precarry_storage = 0;
enc->error = -1;
}
}
/*Reinitializes the encoder.*/
void od_ec_enc_reset(od_ec_enc *enc) {
enc->end_offs = 0;
enc->end_window = 0;
enc->nend_bits = 0;
enc->offs = 0;
enc->low = 0;
enc->rng = 0x8000;
/*This is initialized to -9 so that it crosses zero after we've accumulated
one byte + one carry bit.*/
enc->cnt = -9;
enc->error = 0;
#if OD_MEASURE_EC_OVERHEAD
enc->entropy = 0;
enc->nb_symbols = 0;
#endif
}
/*Frees the buffers used by the encoder.*/
void od_ec_enc_clear(od_ec_enc *enc) {
free(enc->precarry_buf);
free(enc->buf);
}
/*Encodes a symbol given its scaled frequency information.
The frequency information must be discernable by the decoder, assuming it
has read only the previous symbols from the stream.
You can change the frequency information, or even the entire source alphabet,
so long as the decoder can tell from the context of the previously encoded
information that it is supposed to do so as well.
fl: The cumulative frequency of all symbols that come before the one to be
encoded.
fh: The cumulative frequency of all symbols up to and including the one to
be encoded.
Together with fl, this defines the range [fl, fh) in which the decoded
value will fall.
ft: The sum of the frequencies of all the symbols.
This must be at least 16384, and no more than 32768.*/
static void od_ec_encode(od_ec_enc *enc, unsigned fl, unsigned fh,
unsigned ft) {
od_ec_window l;
unsigned r;
int s;
unsigned d;
unsigned u;
unsigned v;
OD_ASSERT(fl < fh);
OD_ASSERT(fh <= ft);
OD_ASSERT(16384 <= ft);
OD_ASSERT(ft <= 32768U);
l = enc->low;
r = enc->rng;
OD_ASSERT(ft <= r);
s = r - ft >= ft;
ft <<= s;
fl <<= s;
fh <<= s;
d = r - ft;
OD_ASSERT(d < ft);
#if OD_EC_REDUCED_OVERHEAD
{
unsigned e;
e = OD_SUBSATU(2 * d, ft);
u = fl + OD_MINI(fl, e) + OD_MINI(OD_SUBSATU(fl, e) >> 1, d);
v = fh + OD_MINI(fh, e) + OD_MINI(OD_SUBSATU(fh, e) >> 1, d);
}
#else
u = fl + OD_MINI(fl, d);
v = fh + OD_MINI(fh, d);
#endif
r = v - u;
l += u;
od_ec_enc_normalize(enc, l, r);
#if OD_MEASURE_EC_OVERHEAD
enc->entropy -= OD_LOG2((double)(fh - fl) / ft);
enc->nb_symbols++;
#endif
}
/*Encodes a symbol given its frequency in Q15.
This is like od_ec_encode() when ft == 32768, but is simpler and has lower
overhead.
Symbols encoded with this function cannot be properly decoded with
od_ec_decode(), and must be decoded with one of the equivalent _q15()
functions instead.
fl: The cumulative frequency of all symbols that come before the one to be
encoded.
fh: The cumulative frequency of all symbols up to and including the one to
be encoded.*/
static void od_ec_encode_q15(od_ec_enc *enc, unsigned fl, unsigned fh) {
od_ec_window l;
unsigned r;
unsigned u;
unsigned v;
OD_ASSERT(fl < fh);
OD_ASSERT(fh <= 32768U);
l = enc->low;
r = enc->rng;
OD_ASSERT(32768U <= r);
u = fl * (uint32_t)r >> 15;
v = fh * (uint32_t)r >> 15;
r = v - u;
l += u;
od_ec_enc_normalize(enc, l, r);
#if OD_MEASURE_EC_OVERHEAD
enc->entropy -= OD_LOG2((double)(fh - fl) / 32768.);
enc->nb_symbols++;
#endif
}
/*Encodes a symbol given its frequency information with an arbitrary scale.
This operates just like od_ec_encode(), but does not require that ft be at
least 16384.
fl: The cumulative frequency of all symbols that come before the one to be
encoded.
fh: The cumulative frequency of all symbols up to and including the one to
be encoded.
ft: The sum of the frequencies of all the symbols.
This must be at least 2 and no more than 32768.*/
static void od_ec_encode_unscaled(od_ec_enc *enc, unsigned fl, unsigned fh,
unsigned ft) {
int s;
OD_ASSERT(fl < fh);
OD_ASSERT(fh <= ft);
OD_ASSERT(2 <= ft);
OD_ASSERT(ft <= 32768U);
s = 15 - OD_ILOG_NZ(ft - 1);
od_ec_encode(enc, fl << s, fh << s, ft << s);
}
/*Encode a bit that has an fz/ft probability of being a zero.
val: The value to encode (0 or 1).
fz: The probability that val is zero, scaled by ft.
ft: The total probability.
This must be at least 16384 and no more than 32768.*/
void od_ec_encode_bool(od_ec_enc *enc, int val, unsigned fz, unsigned ft) {
od_ec_window l;
unsigned r;
int s;
unsigned v;
OD_ASSERT(0 < fz);
OD_ASSERT(fz < ft);
OD_ASSERT(16384 <= ft);
OD_ASSERT(ft <= 32768U);
l = enc->low;
r = enc->rng;
OD_ASSERT(ft <= r);
s = r - ft >= ft;
ft <<= s;
fz <<= s;
OD_ASSERT(r - ft < ft);
#if OD_EC_REDUCED_OVERHEAD
{
unsigned d;
unsigned e;
d = r - ft;
e = OD_SUBSATU(2 * d, ft);
v = fz + OD_MINI(fz, e) + OD_MINI(OD_SUBSATU(fz, e) >> 1, d);
}
#else
v = fz + OD_MINI(fz, r - ft);
#endif
if (val) l += v;
r = val ? r - v : v;
od_ec_enc_normalize(enc, l, r);
#if OD_MEASURE_EC_OVERHEAD
enc->entropy -= OD_LOG2((double)(val ? ft - fz : fz) / ft);
enc->nb_symbols++;
#endif
}
/*Encode a bit that has an fz probability of being a zero in Q15.
This is a simpler, lower overhead version of od_ec_encode_bool() for use when
ft == 32768.
Symbols encoded with this function cannot be properly decoded with
od_ec_decode(), and must be decoded with one of the equivalent _q15()
functions instead.
val: The value to encode (0 or 1).
fz: The probability that val is zero, scaled by 32768.*/
void od_ec_encode_bool_q15(od_ec_enc *enc, int val, unsigned fz) {
od_ec_window l;
unsigned r;
unsigned v;
OD_ASSERT(0 < fz);
OD_ASSERT(fz < 32768U);
l = enc->low;
r = enc->rng;
OD_ASSERT(32768U <= r);
v = fz * (uint32_t)r >> 15;
if (val) l += v;
r = val ? r - v : v;
od_ec_enc_normalize(enc, l, r);
#if OD_MEASURE_EC_OVERHEAD
enc->entropy -= OD_LOG2((double)(val ? 32768 - fz : fz) / 32768.);
enc->nb_symbols++;
#endif
}
/*Encodes a symbol given a cumulative distribution function (CDF) table.
s: The index of the symbol to encode.
cdf: The CDF, such that symbol s falls in the range
[s > 0 ? cdf[s - 1] : 0, cdf[s]).
The values must be monotonically non-decreasing, and the last value
must be at least 16384, and no more than 32768.
nsyms: The number of symbols in the alphabet.
This should be at most 16.*/
void od_ec_encode_cdf(od_ec_enc *enc, int s, const uint16_t *cdf, int nsyms) {
OD_ASSERT(s >= 0);
OD_ASSERT(s < nsyms);
od_ec_encode(enc, s > 0 ? cdf[s - 1] : 0, cdf[s], cdf[nsyms - 1]);
}
/*Encodes a symbol given a cumulative distribution function (CDF) table in Q15.
This is a simpler, lower overhead version of od_ec_encode_cdf() for use when
cdf[nsyms - 1] == 32768.
Symbols encoded with this function cannot be properly decoded with
od_ec_decode(), and must be decoded with one of the equivalent _q15()
functions instead.
s: The index of the symbol to encode.
cdf: The CDF, such that symbol s falls in the range
[s > 0 ? cdf[s - 1] : 0, cdf[s]).
The values must be monotonically non-decreasing, and the last value
must be exactly 32768.
nsyms: The number of symbols in the alphabet.
This should be at most 16.*/
void od_ec_encode_cdf_q15(od_ec_enc *enc, int s, const uint16_t *cdf,
int nsyms) {
(void)nsyms;
OD_ASSERT(s >= 0);
OD_ASSERT(s < nsyms);
OD_ASSERT(cdf[nsyms - 1] == 32768U);
od_ec_encode_q15(enc, s > 0 ? cdf[s - 1] : 0, cdf[s]);
}
/*Encodes a symbol given a cumulative distribution function (CDF) table.
s: The index of the symbol to encode.
cdf: The CDF, such that symbol s falls in the range
[s > 0 ? cdf[s - 1] : 0, cdf[s]).
The values must be monotonically non-decreasing, and the last value
must be at least 2, and no more than 32768.
nsyms: The number of symbols in the alphabet.
This should be at most 16.*/
void od_ec_encode_cdf_unscaled(od_ec_enc *enc, int s, const uint16_t *cdf,
int nsyms) {
OD_ASSERT(s >= 0);
OD_ASSERT(s < nsyms);
od_ec_encode_unscaled(enc, s > 0 ? cdf[s - 1] : 0, cdf[s], cdf[nsyms - 1]);
}
/*Equivalent to od_ec_encode_cdf_q15() with the cdf scaled by
(1 << (15 - ftb)).
s: The index of the symbol to encode.
cdf: The CDF, such that symbol s falls in the range
[s > 0 ? cdf[s - 1] : 0, cdf[s]).
The values must be monotonically non-decreasing, and the last value
must be exactly 1 << ftb.
nsyms: The number of symbols in the alphabet.
This should be at most 16.
ftb: The number of bits of precision in the cumulative distribution.
This must be no more than 15.*/
void od_ec_encode_cdf_unscaled_dyadic(od_ec_enc *enc, int s,
const uint16_t *cdf, int nsyms,
unsigned ftb) {
(void)nsyms;
OD_ASSERT(s >= 0);
OD_ASSERT(s < nsyms);
OD_ASSERT(ftb <= 15);
OD_ASSERT(cdf[nsyms - 1] == 1U << ftb);
od_ec_encode_q15(enc, s > 0 ? cdf[s - 1] << (15 - ftb) : 0,
cdf[s] << (15 - ftb));
}
/*Encodes a raw unsigned integer in the stream.
fl: The integer to encode.
ft: The number of integers that can be encoded (one more than the max).
This must be at least 2, and no more than 2**29.*/
void od_ec_enc_uint(od_ec_enc *enc, uint32_t fl, uint32_t ft) {
OD_ASSERT(ft >= 2);
OD_ASSERT(fl < ft);
OD_ASSERT(ft <= (uint32_t)1 << (25 + OD_EC_UINT_BITS));
if (ft > 1U << OD_EC_UINT_BITS) {
int ft1;
int ftb;
ft--;
ftb = OD_ILOG_NZ(ft) - OD_EC_UINT_BITS;
ft1 = (int)(ft >> ftb) + 1;
od_ec_encode_cdf_q15(enc, (int)(fl >> ftb), OD_UNIFORM_CDF_Q15(ft1), ft1);
od_ec_enc_bits(enc, fl & (((uint32_t)1 << ftb) - 1), ftb);
} else {
od_ec_encode_cdf_q15(enc, (int)fl, OD_UNIFORM_CDF_Q15(ft), (int)ft);
}
}
/*Encodes a sequence of raw bits in the stream.
fl: The bits to encode.
ftb: The number of bits to encode.
This must be between 0 and 25, inclusive.*/
void od_ec_enc_bits(od_ec_enc *enc, uint32_t fl, unsigned ftb) {
od_ec_window end_window;
int nend_bits;
OD_ASSERT(ftb <= 25);
OD_ASSERT(fl < (uint32_t)1 << ftb);
#if OD_MEASURE_EC_OVERHEAD
enc->entropy += ftb;
#endif
end_window = enc->end_window;
nend_bits = enc->nend_bits;
if (nend_bits + ftb > OD_EC_WINDOW_SIZE) {
unsigned char *buf;
uint32_t storage;
uint32_t end_offs;
buf = enc->buf;
storage = enc->storage;
end_offs = enc->end_offs;
if (end_offs + (OD_EC_WINDOW_SIZE >> 3) >= storage) {
unsigned char *new_buf;
uint32_t new_storage;
new_storage = 2 * storage + (OD_EC_WINDOW_SIZE >> 3);
new_buf = (unsigned char *)malloc(sizeof(*new_buf) * new_storage);
if (new_buf == NULL) {
enc->error = -1;
enc->end_offs = 0;
return;
}
OD_COPY(new_buf + new_storage - end_offs, buf + storage - end_offs,
end_offs);
storage = new_storage;
free(buf);
enc->buf = buf = new_buf;
enc->storage = storage;
}
do {
OD_ASSERT(end_offs < storage);
buf[storage - ++end_offs] = (unsigned char)end_window;
end_window >>= 8;
nend_bits -= 8;
} while (nend_bits >= 8);
enc->end_offs = end_offs;
}
OD_ASSERT(nend_bits + ftb <= OD_EC_WINDOW_SIZE);
end_window |= (od_ec_window)fl << nend_bits;
nend_bits += ftb;
enc->end_window = end_window;
enc->nend_bits = nend_bits;
}
/*Overwrites a few bits at the very start of an existing stream, after they
have already been encoded.
This makes it possible to have a few flags up front, where it is easy for
decoders to access them without parsing the whole stream, even if their
values are not determined until late in the encoding process, without having
to buffer all the intermediate symbols in the encoder.
In order for this to work, at least nbits bits must have already been encoded
using probabilities that are an exact power of two.
The encoder can verify the number of encoded bits is sufficient, but cannot
check this latter condition.
val: The bits to encode (in the nbits least-significant bits).
They will be decoded in order from most-significant to least.
nbits: The number of bits to overwrite.
This must be no more than 8.*/
void od_ec_enc_patch_initial_bits(od_ec_enc *enc, unsigned val, int nbits) {
int shift;
unsigned mask;
OD_ASSERT(nbits >= 0);
OD_ASSERT(nbits <= 8);
OD_ASSERT(val < 1U << nbits);
shift = 8 - nbits;
mask = ((1U << nbits) - 1) << shift;
if (enc->offs > 0) {
/*The first byte has been finalized.*/
enc->precarry_buf[0] =
(uint16_t)((enc->precarry_buf[0] & ~mask) | val << shift);
} else if (9 + enc->cnt + (enc->rng == 0x8000) > nbits) {
/*The first byte has yet to be output.*/
enc->low = (enc->low & ~((od_ec_window)mask << (16 + enc->cnt))) |
(od_ec_window)val << (16 + enc->cnt + shift);
} else {
/*The encoder hasn't even encoded nbits of data yet.*/
enc->error = -1;
}
}
#if OD_MEASURE_EC_OVERHEAD
#include <stdio.h>
#endif
/*Indicates that there are no more symbols to encode.
All remaining output bytes are flushed to the output buffer.
od_ec_enc_reset() should be called before using the encoder again.
nbytes: Returns the size of the encoded data in the returned buffer.
Return: A pointer to the start of the final buffer, or NULL if there was an
encoding error.*/
unsigned char *od_ec_enc_done(od_ec_enc *enc, uint32_t *nbytes) {
unsigned char *out;
uint32_t storage;
uint16_t *buf;
uint32_t offs;
uint32_t end_offs;
int nend_bits;
od_ec_window m;
od_ec_window e;
od_ec_window l;
unsigned r;
int c;
int s;
if (enc->error) return NULL;
#if OD_MEASURE_EC_OVERHEAD
{
uint32_t tell;
/* Don't count the 1 bit we lose to raw bits as overhead. */
tell = od_ec_enc_tell(enc) - 1;
fprintf(stderr, "overhead: %f%%\n",
100 * (tell - enc->entropy) / enc->entropy);
fprintf(stderr, "efficiency: %f bits/symbol\n",
(double)tell / enc->nb_symbols);
}
#endif
/*We output the minimum number of bits that ensures that the symbols encoded
thus far will be decoded correctly regardless of the bits that follow.*/
l = enc->low;
r = enc->rng;
c = enc->cnt;
s = 9;
m = 0x7FFF;
e = (l + m) & ~m;
while ((e | m) >= l + r) {
s++;
m >>= 1;
e = (l + m) & ~m;
}
s += c;
offs = enc->offs;
buf = enc->precarry_buf;
if (s > 0) {
unsigned n;
storage = enc->precarry_storage;
if (offs + ((s + 7) >> 3) > storage) {
storage = storage * 2 + ((s + 7) >> 3);
buf = (uint16_t *)realloc(buf, sizeof(*buf) * storage);
if (buf == NULL) {
enc->error = -1;
return NULL;
}
enc->precarry_buf = buf;
enc->precarry_storage = storage;
}
n = (1 << (c + 16)) - 1;
do {
OD_ASSERT(offs < storage);
buf[offs++] = (uint16_t)(e >> (c + 16));
e &= n;
s -= 8;
c -= 8;
n >>= 8;
} while (s > 0);
}
/*Make sure there's enough room for the entropy-coded bits and the raw
bits.*/
out = enc->buf;
storage = enc->storage;
end_offs = enc->end_offs;
e = enc->end_window;
nend_bits = enc->nend_bits;
s = -s;
c = OD_MAXI((nend_bits - s + 7) >> 3, 0);
if (offs + end_offs + c > storage) {
storage = offs + end_offs + c;
out = (unsigned char *)realloc(out, sizeof(*out) * storage);
if (out == NULL) {
enc->error = -1;
return NULL;
}
OD_MOVE(out + storage - end_offs, out + enc->storage - end_offs, end_offs);
enc->buf = out;
enc->storage = storage;
}
/*If we have buffered raw bits, flush them as well.*/
while (nend_bits > s) {
OD_ASSERT(end_offs < storage);
out[storage - ++end_offs] = (unsigned char)e;
e >>= 8;
nend_bits -= 8;
}
*nbytes = offs + end_offs;
/*Perform carry propagation.*/
OD_ASSERT(offs + end_offs <= storage);
out = out + storage - (offs + end_offs);
c = 0;
end_offs = offs;
while (offs-- > 0) {
c = buf[offs] + c;
out[offs] = (unsigned char)c;
c >>= 8;
}
/*Add any remaining raw bits to the last byte.
There is guaranteed to be enough room, because nend_bits <= s.*/
OD_ASSERT(nend_bits <= 0 || end_offs > 0);
if (nend_bits > 0) out[end_offs - 1] |= (unsigned char)e;
/*Note: Unless there's an allocation error, if you keep encoding into the
current buffer and call this function again later, everything will work
just fine (you won't get a new packet out, but you will get a single
buffer with the new data appended to the old).
However, this function is O(N) where N is the amount of data coded so far,
so calling it more than once for a given packet is a bad idea.*/
return out;
}
/*Returns the number of bits "used" by the encoded symbols so far.
This same number can be computed in either the encoder or the decoder, and is
suitable for making coding decisions.
Warning: The value returned by this function can decrease compared to an
earlier call, even after encoding more data, if there is an encoding error
(i.e., a failure to allocate enough space for the output buffer).
Return: The number of bits.
This will always be slightly larger than the exact value (i.e., all
rounding error is in the positive direction).*/
int od_ec_enc_tell(const od_ec_enc *enc) {
/*The 10 here counteracts the offset of -9 baked into cnt, and adds 1 extra
bit, which we reserve for terminating the stream.*/
return (enc->offs + enc->end_offs) * 8 + enc->cnt + enc->nend_bits + 10;
}
/*Returns the number of bits "used" by the encoded symbols so far.
This same number can be computed in either the encoder or the decoder, and is
suitable for making coding decisions.
Warning: The value returned by this function can decrease compared to an
earlier call, even after encoding more data, if there is an encoding error
(i.e., a failure to allocate enough space for the output buffer).
Return: The number of bits scaled by 2**OD_BITRES.
This will always be slightly larger than the exact value (i.e., all
rounding error is in the positive direction).*/
uint32_t od_ec_enc_tell_frac(const od_ec_enc *enc) {
return od_ec_tell_frac(od_ec_enc_tell(enc), enc->rng);
}
/*Saves an entropy coder checkpoint to dst.
This allows an encoder to reverse a series of entropy coder
decisions if it decides that the information would have been
better coded some other way.*/
void od_ec_enc_checkpoint(od_ec_enc *dst, const od_ec_enc *src) {
OD_COPY(dst, src, 1);
}
/*Restores an entropy coder checkpoint saved by od_ec_enc_checkpoint.
This can only be used to restore from checkpoints earlier in the target
state's history: you cannot switch backwards and forwards or otherwise
switch to a state which isn't a causal ancestor of the current state.
Restore is also incompatible with patching the initial bits, as the
changes will remain in the restored version.*/
void od_ec_enc_rollback(od_ec_enc *dst, const od_ec_enc *src) {
unsigned char *buf;
uint32_t storage;
uint16_t *precarry_buf;
uint32_t precarry_storage;
OD_ASSERT(dst->storage >= src->storage);
OD_ASSERT(dst->precarry_storage >= src->precarry_storage);
buf = dst->buf;
storage = dst->storage;
precarry_buf = dst->precarry_buf;
precarry_storage = dst->precarry_storage;
OD_COPY(dst, src, 1);
dst->buf = buf;
dst->storage = storage;
dst->precarry_buf = precarry_buf;
dst->precarry_storage = precarry_storage;
}
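
The Q15 encode path in the file above narrows the coder's range by scaling the two cumulative frequencies by the current range and shifting right by 15. The following is a minimal sketch of just that arithmetic, assuming 32768 <= r <= 65535 and fl < fh <= 32768 as the asserts in od_ec_encode_q15() require. The names q15_split and q15_subdivide are hypothetical and are not part of the library; renormalization is left out.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the library: the result of splitting
   the current range for one symbol. */
typedef struct {
  uint32_t offset; /* amount that would be added to low */
  uint32_t range;  /* new range, before renormalization */
} q15_split;

/* Mirrors the u/v computation in od_ec_encode_q15():
   u = fl * r >> 15, v = fh * r >> 15, new range = v - u.
   With fh <= 32768 and r <= 65535 the products fit in 32 bits. */
static q15_split q15_subdivide(uint32_t r, uint32_t fl, uint32_t fh) {
  q15_split out;
  uint32_t u = fl * r >> 15;
  uint32_t v = fh * r >> 15;
  out.offset = u;
  out.range = v - u;
  return out;
}
```

Because the upper bound of one symbol is the lower bound of the next, adjacent symbols receive adjacent sub-ranges with no gap between them.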

@@ -1,103 +0,0 @@
/*
* Copyright (c) 2001-2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#if !defined(_entenc_H)
#define _entenc_H (1)
#include <stddef.h>
#include "aom_dsp/entcode.h"
#ifdef __cplusplus
extern "C" {
#endif
typedef struct od_ec_enc od_ec_enc;
#define OD_MEASURE_EC_OVERHEAD (0)
/*The entropy encoder context.*/
struct od_ec_enc {
/*Buffered output.
This contains only the raw bits until the final call to od_ec_enc_done(),
where all the arithmetic-coded data gets prepended to it.*/
unsigned char *buf;
/*The size of the buffer.*/
uint32_t storage;
/*The offset at which the last byte containing raw bits was written.*/
uint32_t end_offs;
/*Bits that will be read from/written at the end.*/
od_ec_window end_window;
/*Number of valid bits in end_window.*/
int nend_bits;
/*A buffer for output bytes with their associated carry flags.*/
uint16_t *precarry_buf;
/*The size of the pre-carry buffer.*/
uint32_t precarry_storage;
/*The offset at which the next entropy-coded byte will be written.*/
uint32_t offs;
/*The low end of the current range.*/
od_ec_window low;
/*The number of values in the current range.*/
uint16_t rng;
/*The number of bits of data in the current value.*/
int16_t cnt;
/*Nonzero if an error occurred.*/
int error;
#if OD_MEASURE_EC_OVERHEAD
double entropy;
int nb_symbols;
#endif
};
/*See entenc.c for further documentation.*/
void od_ec_enc_init(od_ec_enc *enc, uint32_t size) OD_ARG_NONNULL(1);
void od_ec_enc_reset(od_ec_enc *enc) OD_ARG_NONNULL(1);
void od_ec_enc_clear(od_ec_enc *enc) OD_ARG_NONNULL(1);
void od_ec_encode_bool(od_ec_enc *enc, int val, unsigned fz, unsigned ft)
OD_ARG_NONNULL(1);
void od_ec_encode_bool_q15(od_ec_enc *enc, int val, unsigned fz_q15)
OD_ARG_NONNULL(1);
void od_ec_encode_cdf(od_ec_enc *enc, int s, const uint16_t *cdf, int nsyms)
OD_ARG_NONNULL(1) OD_ARG_NONNULL(3);
void od_ec_encode_cdf_q15(od_ec_enc *enc, int s, const uint16_t *cdf, int nsyms)
OD_ARG_NONNULL(1) OD_ARG_NONNULL(3);
void od_ec_encode_cdf_unscaled(od_ec_enc *enc, int s, const uint16_t *cdf,
int nsyms) OD_ARG_NONNULL(1) OD_ARG_NONNULL(3);
void od_ec_encode_cdf_unscaled_dyadic(od_ec_enc *enc, int s,
const uint16_t *cdf, int nsyms,
unsigned ftb) OD_ARG_NONNULL(1)
OD_ARG_NONNULL(3);
void od_ec_enc_uint(od_ec_enc *enc, uint32_t fl, uint32_t ft) OD_ARG_NONNULL(1);
void od_ec_enc_bits(od_ec_enc *enc, uint32_t fl, unsigned ftb)
OD_ARG_NONNULL(1);
void od_ec_enc_patch_initial_bits(od_ec_enc *enc, unsigned val, int nbits)
OD_ARG_NONNULL(1);
OD_WARN_UNUSED_RESULT unsigned char *od_ec_enc_done(od_ec_enc *enc,
uint32_t *nbytes)
OD_ARG_NONNULL(1) OD_ARG_NONNULL(2);
OD_WARN_UNUSED_RESULT int od_ec_enc_tell(const od_ec_enc *enc)
OD_ARG_NONNULL(1);
OD_WARN_UNUSED_RESULT uint32_t od_ec_enc_tell_frac(const od_ec_enc *enc)
OD_ARG_NONNULL(1);
void od_ec_enc_checkpoint(od_ec_enc *dst, const od_ec_enc *src);
void od_ec_enc_rollback(od_ec_enc *dst, const od_ec_enc *src);
#ifdef __cplusplus
} // extern "C"
#endif
#endif
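
The checkpoint and rollback declarations above copy the whole encoder context, but a rollback has to keep the live buffers, which may have grown since the checkpoint was taken. The following is a toy sketch of that pattern, assuming a much smaller context with a single owned buffer. toy_enc, toy_checkpoint, toy_rollback and toy_demo are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for od_ec_enc: one owned buffer plus some coder state. */
typedef struct {
  unsigned char *buf; /* owned by the live encoder */
  uint32_t storage;   /* may grow after a checkpoint is taken */
  uint32_t offs;      /* coder state */
  uint32_t low;       /* coder state */
} toy_enc;

/* A checkpoint is a plain struct copy. */
static void toy_checkpoint(toy_enc *dst, const toy_enc *src) { *dst = *src; }

/* A rollback restores the coder state but keeps the live buffer pointer
   and size, since the checkpoint may refer to a smaller, stale buffer. */
static void toy_rollback(toy_enc *dst, const toy_enc *src) {
  unsigned char *buf = dst->buf;
  uint32_t storage = dst->storage;
  *dst = *src;
  dst->buf = buf;
  dst->storage = storage;
}

/* Returns 1 if the state was restored and the grown size was kept. */
static int toy_demo(void) {
  toy_enc live = { NULL, 16, 3, 7 };
  toy_enc saved;
  toy_checkpoint(&saved, &live);
  live.storage = 64; /* pretend the buffer grew */
  live.offs = 9;
  live.low = 99;
  toy_rollback(&live, &saved);
  return live.storage == 64 && live.offs == 3 && live.low == 7;
}
```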

@@ -1,26 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_FWD_TXFM_H_
#define AOM_DSP_FWD_TXFM_H_
#include "aom_dsp/txfm_common.h"
static INLINE tran_high_t fdct_round_shift(tran_high_t input) {
tran_high_t rv = ROUND_POWER_OF_TWO(input, DCT_CONST_BITS);
// TODO(debargha, peter.derivaz): Find new bounds for this assert
// and make the bounds consts.
// assert(INT16_MIN <= rv && rv <= INT16_MAX);
return rv;
}
void aom_fdct32(const tran_high_t *input, tran_high_t *output, int round);
#endif // AOM_DSP_FWD_TXFM_H_
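
fdct_round_shift() above relies on ROUND_POWER_OF_TWO and DCT_CONST_BITS, neither of which is defined in this header. The following is a minimal sketch of a rounding right shift, assuming the macro adds half of the divisor before shifting and that DCT_CONST_BITS is 14. Both are assumptions, and round_shift_sketch and SKETCH_DCT_CONST_BITS are hypothetical names.

```c
#include <stdint.h>

/* Assumed value; the real DCT_CONST_BITS comes from aom_dsp/txfm_common.h. */
#define SKETCH_DCT_CONST_BITS 14

/* Hypothetical stand-in for ROUND_POWER_OF_TWO: add half of 2**bits, then
   shift right by bits. Only non-negative inputs are exercised here, since
   right-shifting a negative value is implementation-defined in C. */
static int64_t round_shift_sketch(int64_t value, int bits) {
  return (value + ((int64_t)1 << (bits - 1))) >> bits;
}
```

With 14 fractional bits, an input of 3 * 16384 + 8192 (exactly 3.5) rounds up to 4, while 3 * 16384 + 8191 rounds down to 3.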

@@ -1,59 +0,0 @@
/*
* Copyright (c) 2015 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include <stdlib.h>
#include "./macros_msa.h"
void aom_plane_add_noise_msa(uint8_t *start_ptr, char *noise,
char blackclamp[16], char whiteclamp[16],
char bothclamp[16], uint32_t width,
uint32_t height, int32_t pitch) {
uint32_t i, j;
for (i = 0; i < height / 2; ++i) {
uint8_t *pos0_ptr = start_ptr + (2 * i) * pitch;
int8_t *ref0_ptr = (int8_t *)(noise + (rand() & 0xff));
uint8_t *pos1_ptr = start_ptr + (2 * i + 1) * pitch;
int8_t *ref1_ptr = (int8_t *)(noise + (rand() & 0xff));
for (j = width / 16; j--;) {
v16i8 temp00_s, temp01_s;
v16u8 temp00, temp01, black_clamp, white_clamp;
v16u8 pos0, ref0, pos1, ref1;
v16i8 const127 = __msa_ldi_b(127);
pos0 = LD_UB(pos0_ptr);
ref0 = LD_UB(ref0_ptr);
pos1 = LD_UB(pos1_ptr);
ref1 = LD_UB(ref1_ptr);
black_clamp = (v16u8)__msa_fill_b(blackclamp[0]);
white_clamp = (v16u8)__msa_fill_b(whiteclamp[0]);
temp00 = (pos0 < black_clamp);
pos0 = __msa_bmnz_v(pos0, black_clamp, temp00);
temp01 = (pos1 < black_clamp);
pos1 = __msa_bmnz_v(pos1, black_clamp, temp01);
XORI_B2_128_UB(pos0, pos1);
temp00_s = __msa_adds_s_b((v16i8)white_clamp, const127);
temp00 = (v16u8)(temp00_s < pos0);
pos0 = (v16u8)__msa_bmnz_v((v16u8)pos0, (v16u8)temp00_s, temp00);
temp01_s = __msa_adds_s_b((v16i8)white_clamp, const127);
temp01 = (temp01_s < pos1);
pos1 = (v16u8)__msa_bmnz_v((v16u8)pos1, (v16u8)temp01_s, temp01);
XORI_B2_128_UB(pos0, pos1);
pos0 += ref0;
ST_UB(pos0, pos0_ptr);
pos1 += ref1;
ST_UB(pos1, pos1_ptr);
pos0_ptr += 16;
pos1_ptr += 16;
ref0_ptr += 16;
ref1_ptr += 16;
}
}
}

@@ -1,31 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include "aom_dsp/mips/common_dspr2.h"
#if HAVE_DSPR2
uint8_t aom_ff_cropTbl_a[256 + 2 * CROP_WIDTH];
uint8_t *aom_ff_cropTbl;
void aom_dsputil_static_init(void) {
int i;
for (i = 0; i < 256; i++) aom_ff_cropTbl_a[i + CROP_WIDTH] = i;
for (i = 0; i < CROP_WIDTH; i++) {
aom_ff_cropTbl_a[i] = 0;
aom_ff_cropTbl_a[i + CROP_WIDTH + 256] = 255;
}
aom_ff_cropTbl = &aom_ff_cropTbl_a[CROP_WIDTH];
}
#endif

@@ -1,226 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include "./aom_config.h"
#if CONFIG_EC_MULTISYMBOL
#include <string.h>
#endif
#include "aom_dsp/prob.h"
#if CONFIG_DAALA_EC
#include "aom_dsp/entcode.h"
#endif
const uint8_t aom_norm[256] = {
0, 7, 6, 6, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
};
static unsigned int tree_merge_probs_impl(unsigned int i,
const aom_tree_index *tree,
const aom_prob *pre_probs,
const unsigned int *counts,
aom_prob *probs) {
const int l = tree[i];
const unsigned int left_count =
(l <= 0) ? counts[-l]
: tree_merge_probs_impl(l, tree, pre_probs, counts, probs);
const int r = tree[i + 1];
const unsigned int right_count =
(r <= 0) ? counts[-r]
: tree_merge_probs_impl(r, tree, pre_probs, counts, probs);
const unsigned int ct[2] = { left_count, right_count };
probs[i >> 1] = mode_mv_merge_probs(pre_probs[i >> 1], ct);
return left_count + right_count;
}
void aom_tree_merge_probs(const aom_tree_index *tree, const aom_prob *pre_probs,
const unsigned int *counts, aom_prob *probs) {
tree_merge_probs_impl(0, tree, pre_probs, counts, probs);
}
#if CONFIG_EC_MULTISYMBOL
typedef struct tree_node tree_node;
struct tree_node {
aom_tree_index index;
uint8_t probs[16];
uint8_t prob;
int path;
int len;
int l;
int r;
aom_cdf_prob pdf;
};
/* Compute the probability of this node in Q24 */
static uint32_t tree_node_prob(tree_node n, int i) {
uint32_t prob;
/* 1.0 in Q24 (1 << 24) */
prob = 16777216;
for (; i < n.len; i++) {
prob = prob * n.probs[i] >> 8;
}
return prob;
}
static int tree_node_cmp(tree_node a, tree_node b) {
int i;
uint32_t pa;
uint32_t pb;
for (i = 0; i < AOMMIN(a.len, b.len) && a.probs[i] == b.probs[i]; i++) {
}
pa = tree_node_prob(a, i);
pb = tree_node_prob(b, i);
return pa > pb ? 1 : pa < pb ? -1 : 0;
}
/* Given a Q15 probability for the subtree rooted at tree[n], this function
computes the probability of each symbol (defined as a node that has no
children). */
static aom_cdf_prob tree_node_compute_probs(tree_node *tree, int n,
aom_cdf_prob pdf) {
if (tree[n].l == 0) {
/* This prevents probability computations in Q15 that underflow from
producing a symbol that has zero probability. */
if (pdf == 0) pdf = 1;
tree[n].pdf = pdf;
return pdf;
} else {
/* We process the smaller probability first. */
if (tree[n].prob < 128) {
aom_cdf_prob lp;
aom_cdf_prob rp;
lp = (((uint32_t)pdf) * tree[n].prob + 128) >> 8;
lp = tree_node_compute_probs(tree, tree[n].l, lp);
rp = tree_node_compute_probs(tree, tree[n].r, lp > pdf ? 0 : pdf - lp);
return lp + rp;
} else {
aom_cdf_prob rp;
aom_cdf_prob lp;
rp = (((uint32_t)pdf) * (256 - tree[n].prob) + 128) >> 8;
rp = tree_node_compute_probs(tree, tree[n].r, rp);
lp = tree_node_compute_probs(tree, tree[n].l, rp > pdf ? 0 : pdf - rp);
return lp + rp;
}
}
}
static int tree_node_extract(tree_node *tree, int n, int symb,
aom_cdf_prob *pdf, aom_tree_index *index,
int *path, int *len) {
if (tree[n].l == 0) {
pdf[symb] = tree[n].pdf;
if (index != NULL) index[symb] = tree[n].index;
if (path != NULL) path[symb] = tree[n].path;
if (len != NULL) len[symb] = tree[n].len;
return symb + 1;
} else {
symb = tree_node_extract(tree, tree[n].l, symb, pdf, index, path, len);
return tree_node_extract(tree, tree[n].r, symb, pdf, index, path, len);
}
}
int tree_to_cdf(const aom_tree_index *tree, const aom_prob *probs,
aom_tree_index root, aom_cdf_prob *cdf, aom_tree_index *index,
int *path, int *len) {
tree_node symb[2 * 16 - 1];
int nodes;
int next[16];
int size;
int nsymbs;
int i;
/* Create the root node with probability 1 in Q15. */
symb[0].index = root;
symb[0].path = 0;
symb[0].len = 0;
symb[0].l = symb[0].r = 0;
nodes = 1;
next[0] = 0;
size = 1;
nsymbs = 1;
while (size > 0 && nsymbs < 16) {
int m;
tree_node n;
aom_tree_index j;
uint8_t prob;
m = 0;
/* Find the internal node with the largest probability. */
for (i = 1; i < size; i++) {
if (tree_node_cmp(symb[next[i]], symb[next[m]]) > 0) m = i;
}
i = next[m];
memmove(&next[m], &next[m + 1], sizeof(*next) * (size - (m + 1)));
size--;
/* Split this symbol into two symbols */
n = symb[i];
j = n.index;
prob = probs[j >> 1];
/* Left */
n.index = tree[j];
n.path <<= 1;
n.len++;
n.probs[n.len - 1] = prob;
symb[nodes] = n;
if (n.index > 0) {
next[size++] = nodes;
}
/* Right */
n.index = tree[j + 1];
n.path += 1;
n.probs[n.len - 1] = 256 - prob;
symb[nodes + 1] = n;
if (n.index > 0) {
next[size++] = nodes + 1;
}
symb[i].prob = prob;
symb[i].l = nodes;
symb[i].r = nodes + 1;
nodes += 2;
nsymbs++;
}
/* Compute the probabilities of each symbol in Q15 */
tree_node_compute_probs(symb, 0, 32768);
/* Extract the cdf, index, path and length */
tree_node_extract(symb, 0, 0, cdf, index, path, len);
/* Convert to CDF */
for (i = 1; i < nsymbs; i++) {
cdf[i] = cdf[i - 1] + cdf[i];
}
return nsymbs;
}
/* This code assumes that tree contains as unique leaf nodes the integer values
0 to len - 1 and produces the forward and inverse mapping tables in ind[]
and inv[] respectively. */
void av1_indices_from_tree(int *ind, int *inv, int len,
const aom_tree_index *tree) {
int i;
int index;
for (i = index = 0; i < TREE_SIZE(len); i++) {
const aom_tree_index j = tree[i];
if (j <= 0) {
inv[index] = -j;
ind[-j] = index++;
}
}
}
#endif
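
The aom_norm table near the top of the file above appears to hold, for each 8-bit value, the number of left shifts needed to bring its highest set bit up to bit 7, with entry 0 stored as 0. The following is a minimal sketch that recomputes an entry under that reading. norm_shift is a hypothetical helper, not a function from the library.

```c
#include <stdint.h>

/* Hypothetical helper: counts the left shifts needed to move the highest
   set bit of x to bit 7. Returns 0 for x == 0, matching the table's first
   entry. */
static uint8_t norm_shift(uint8_t x) {
  uint8_t shift = 0;
  if (x == 0) return 0;
  while ((x & 0x80) == 0) {
    x = (uint8_t)(x << 1);
    shift++;
  }
  return shift;
}
```

Under this reading, values 128 to 255 need no shift, 64 to 127 need one, and 1 needs seven, which agrees with the run lengths in the table.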

@@ -1,158 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_PROB_H_
#define AOM_DSP_PROB_H_
#include "./aom_config.h"
#include "./aom_dsp_common.h"
#include "aom_ports/bitops.h"
#include "aom_ports/mem.h"
#ifdef __cplusplus
extern "C" {
#endif
typedef uint8_t aom_prob;
// TODO(negge): Rename this aom_prob once we remove vpxbool.
typedef uint16_t aom_cdf_prob;
#define MAX_PROB 255
#define aom_prob_half ((aom_prob)128)
typedef int8_t aom_tree_index;
#define TREE_SIZE(leaf_count) (-2 + 2 * (leaf_count))
#define aom_complement(x) (255 - (x))
#define MODE_MV_COUNT_SAT 20
/* We build coding trees compactly in arrays.
Each node of the tree is a pair of aom_tree_indices.
Array index often references a corresponding probability table.
Index <= 0 means done encoding/decoding and value = -Index,
Index > 0 means need another bit, specification at index.
Nonnegative indices are always even; processing begins at node 0. */
typedef const aom_tree_index aom_tree[];
static INLINE aom_prob clip_prob(int p) {
return (p > 255) ? 255 : (p < 1) ? 1 : p;
}
static INLINE aom_prob get_prob(int num, int den) {
return (den == 0) ? 128u : clip_prob(((int64_t)num * 256 + (den >> 1)) / den);
}
static INLINE aom_prob get_binary_prob(int n0, int n1) {
return get_prob(n0, n0 + n1);
}
/* This function assumes prob1 and prob2 are already within [1,255] range. */
static INLINE aom_prob weighted_prob(int prob1, int prob2, int factor) {
return ROUND_POWER_OF_TWO(prob1 * (256 - factor) + prob2 * factor, 8);
}
static INLINE aom_prob merge_probs(aom_prob pre_prob, const unsigned int ct[2],
unsigned int count_sat,
unsigned int max_update_factor) {
const aom_prob prob = get_binary_prob(ct[0], ct[1]);
const unsigned int count = AOMMIN(ct[0] + ct[1], count_sat);
const unsigned int factor = max_update_factor * count / count_sat;
return weighted_prob(pre_prob, prob, factor);
}
// MODE_MV_MAX_UPDATE_FACTOR (128) * count / MODE_MV_COUNT_SAT;
static const int count_to_update_factor[MODE_MV_COUNT_SAT + 1] = {
0, 6, 12, 19, 25, 32, 38, 44, 51, 57, 64,
70, 76, 83, 89, 96, 102, 108, 115, 121, 128
};
static INLINE aom_prob mode_mv_merge_probs(aom_prob pre_prob,
const unsigned int ct[2]) {
const unsigned int den = ct[0] + ct[1];
if (den == 0) {
return pre_prob;
} else {
const unsigned int count = AOMMIN(den, MODE_MV_COUNT_SAT);
const unsigned int factor = count_to_update_factor[count];
const aom_prob prob =
clip_prob(((int64_t)(ct[0]) * 256 + (den >> 1)) / den);
return weighted_prob(pre_prob, prob, factor);
}
}
void aom_tree_merge_probs(const aom_tree_index *tree, const aom_prob *pre_probs,
const unsigned int *counts, aom_prob *probs);
#if CONFIG_EC_MULTISYMBOL
int tree_to_cdf(const aom_tree_index *tree, const aom_prob *probs,
aom_tree_index root, aom_cdf_prob *cdf, aom_tree_index *ind,
int *pth, int *len);
static INLINE void av1_tree_to_cdf(const aom_tree_index *tree,
const aom_prob *probs, aom_cdf_prob *cdf) {
aom_tree_index index[16];
int path[16];
int dist[16];
tree_to_cdf(tree, probs, 0, cdf, index, path, dist);
}
#define av1_tree_to_cdf_1D(tree, probs, cdf, u) \
do { \
int i; \
for (i = 0; i < u; i++) { \
av1_tree_to_cdf(tree, probs[i], cdf[i]); \
} \
} while (0)
#define av1_tree_to_cdf_2D(tree, probs, cdf, v, u) \
do { \
int j; \
int i; \
for (j = 0; j < v; j++) { \
for (i = 0; i < u; i++) { \
av1_tree_to_cdf(tree, probs[j][i], cdf[j][i]); \
} \
} \
} while (0)
void av1_indices_from_tree(int *ind, int *inv, int len,
const aom_tree_index *tree);
#endif
DECLARE_ALIGNED(16, extern const uint8_t, aom_norm[256]);
#if CONFIG_EC_ADAPT
static INLINE void update_cdf(aom_cdf_prob *cdf, int val, int nsymbs) {
const int rate = 4 + get_msb(nsymbs);
int i, diff, tmp;
for (i = 0; i < nsymbs; ++i) {
tmp = (i + 1) << (12 - rate);
cdf[i] -= ((cdf[i] - tmp) >> rate);
}
diff = 32768 - cdf[nsymbs - 1];
for (i = val; i < nsymbs; ++i) {
cdf[i] += diff;
}
}
#endif
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_PROB_H_
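
The header above describes coding trees stored compactly in arrays: each node is a pair of indices, a value <= 0 is a leaf holding the symbol -index, and a value > 0 points at the next pair. The following is a minimal sketch of walking such a tree, assuming the path bits are consumed most-significant first, as the path and len outputs of tree_to_cdf() suggest. tree_index, walk_tree and example_tree are hypothetical names.

```c
#include <stdint.h>

typedef int8_t tree_index; /* stand-in for aom_tree_index */

/* Hypothetical walker: follows len path bits, most-significant first,
   starting at node 0. Bit 0 takes the first entry of the pair, bit 1 the
   second. Returns the leaf symbol, or a negative value if the path ends
   on an internal node. */
static int walk_tree(const tree_index *tree, unsigned path, int len) {
  int i = 0;
  while (len-- > 0) {
    i = tree[i + ((path >> len) & 1)];
    if (i <= 0) break; /* reached a leaf */
  }
  return -i;
}

/* Three-leaf example with TREE_SIZE(3) == 4 entries:
   bit 0 -> symbol 0; bits 1,0 -> symbol 1; bits 1,1 -> symbol 2. */
static const tree_index example_tree[4] = { 0, 2, -1, -2 };
```

Symbol 0 is stored as the value 0, which is why the leaf test is `i <= 0` and not `i < 0`.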

@@ -1,67 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_PSNR_H_
#define AOM_DSP_PSNR_H_
#include "aom_scale/yv12config.h"
#define MAX_PSNR 100.0
#ifdef __cplusplus
extern "C" {
#endif
typedef struct {
double psnr[4]; // total/y/u/v
uint64_t sse[4]; // total/y/u/v
uint32_t samples[4]; // total/y/u/v
} PSNR_STATS;
/*!\brief Converts SSE to PSNR
*
* Converts sum of squared errors (SSE) to peak signal-to-noise ratio (PSNR).
*
* \param[in] samples Number of samples
* \param[in] peak Max sample value
* \param[in] sse Sum of squared errors
*/
double aom_sse_to_psnr(double samples, double peak, double sse);
int64_t aom_get_y_sse_part(const YV12_BUFFER_CONFIG *a,
const YV12_BUFFER_CONFIG *b, int hstart, int width,
int vstart, int height);
int64_t aom_get_y_sse(const YV12_BUFFER_CONFIG *a, const YV12_BUFFER_CONFIG *b);
int64_t aom_get_u_sse(const YV12_BUFFER_CONFIG *a, const YV12_BUFFER_CONFIG *b);
int64_t aom_get_v_sse(const YV12_BUFFER_CONFIG *a, const YV12_BUFFER_CONFIG *b);
#if CONFIG_AOM_HIGHBITDEPTH
int64_t aom_highbd_get_y_sse_part(const YV12_BUFFER_CONFIG *a,
const YV12_BUFFER_CONFIG *b, int hstart,
int width, int vstart, int height);
int64_t aom_highbd_get_y_sse(const YV12_BUFFER_CONFIG *a,
const YV12_BUFFER_CONFIG *b);
int64_t aom_highbd_get_u_sse(const YV12_BUFFER_CONFIG *a,
const YV12_BUFFER_CONFIG *b);
int64_t aom_highbd_get_v_sse(const YV12_BUFFER_CONFIG *a,
const YV12_BUFFER_CONFIG *b);
void aom_calc_highbd_psnr(const YV12_BUFFER_CONFIG *a,
const YV12_BUFFER_CONFIG *b, PSNR_STATS *psnr,
unsigned int bit_depth, unsigned int in_bit_depth);
#endif
void aom_calc_psnr(const YV12_BUFFER_CONFIG *a, const YV12_BUFFER_CONFIG *b,
PSNR_STATS *psnr);
double aom_psnrhvs(const YV12_BUFFER_CONFIG *source,
const YV12_BUFFER_CONFIG *dest, double *phvs_y,
double *phvs_u, double *phvs_v, uint32_t bd, uint32_t in_bd);
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_PSNR_H_
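
The header above only declares the SSE accessors; their definitions are not part of this diff. As a rough illustration of what a per-plane SSE involves, here is a minimal sketch that works on raw 8-bit buffers with explicit strides instead of YV12_BUFFER_CONFIG. The name plane_sse_sketch and its parameter list are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of squared differences over a width x height region of two
 * 8-bit planes. Each plane has its own stride (bytes per row). */
static int64_t plane_sse_sketch(const uint8_t *a, int a_stride,
                                const uint8_t *b, int b_stride,
                                int width, int height) {
  int64_t sse = 0;
  int x, y;
  assert(width >= 0 && height >= 0);
  for (y = 0; y < height; ++y) {
    for (x = 0; x < width; ++x) {
      const int diff = a[x] - b[x];
      sse += diff * diff;
    }
    a += a_stride; /* advance to the next row of each plane */
    b += b_stride;
  }
  return sse;
}
```

A value computed this way, together with the sample count (width * height) and the peak sample value, is the kind of input aom_sse_to_psnr() takes. Under the conventional definition, PSNR is 10 * log10(samples * peak^2 / sse); MAX_PSNR presumably serves as the cap when sse is zero or very small.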


@@ -1,686 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include "aom_dsp/quantize.h"
#include "aom_mem/aom_mem.h"
#if CONFIG_AOM_QM
void aom_quantize_dc(const tran_low_t *coeff_ptr, int n_coeffs, int skip_block,
const int16_t *round_ptr, const int16_t quant,
tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr,
const int16_t dequant_ptr, uint16_t *eob_ptr,
const qm_val_t *qm_ptr, const qm_val_t *iqm_ptr) {
const int rc = 0;
const int coeff = coeff_ptr[rc];
const int coeff_sign = (coeff >> 31);
const int abs_coeff = (coeff ^ coeff_sign) - coeff_sign;
int64_t tmp, eob = -1;
int32_t tmp32;
int dequant =
(dequant_ptr * iqm_ptr[rc] + (1 << (AOM_QM_BITS - 1))) >> AOM_QM_BITS;
memset(qcoeff_ptr, 0, n_coeffs * sizeof(*qcoeff_ptr));
memset(dqcoeff_ptr, 0, n_coeffs * sizeof(*dqcoeff_ptr));
if (!skip_block) {
tmp = clamp(abs_coeff + round_ptr[rc != 0], INT16_MIN, INT16_MAX);
tmp32 = (int32_t)((tmp * qm_ptr[rc] * quant) >> (16 + AOM_QM_BITS));
qcoeff_ptr[rc] = (tmp32 ^ coeff_sign) - coeff_sign;
dqcoeff_ptr[rc] = qcoeff_ptr[rc] * dequant;
if (tmp32) eob = 0;
}
*eob_ptr = eob + 1;
}
#if CONFIG_AOM_HIGHBITDEPTH
void aom_highbd_quantize_dc(const tran_low_t *coeff_ptr, int n_coeffs,
int skip_block, const int16_t *round_ptr,
const int16_t quant, tran_low_t *qcoeff_ptr,
tran_low_t *dqcoeff_ptr, const int16_t dequant_ptr,
uint16_t *eob_ptr, const qm_val_t *qm_ptr,
const qm_val_t *iqm_ptr) {
int eob = -1;
int dequant =
(dequant_ptr * iqm_ptr[0] + (1 << (AOM_QM_BITS - 1))) >> AOM_QM_BITS;
memset(qcoeff_ptr, 0, n_coeffs * sizeof(*qcoeff_ptr));
memset(dqcoeff_ptr, 0, n_coeffs * sizeof(*dqcoeff_ptr));
if (!skip_block) {
const int coeff = coeff_ptr[0];
const int coeff_sign = (coeff >> 31);
const int abs_coeff = (coeff ^ coeff_sign) - coeff_sign;
const int64_t tmp = abs_coeff + round_ptr[0];
const uint32_t abs_qcoeff =
(uint32_t)((tmp * qm_ptr[0] * quant) >> (16 + AOM_QM_BITS));
qcoeff_ptr[0] = (tran_low_t)((abs_qcoeff ^ coeff_sign) - coeff_sign);
dqcoeff_ptr[0] = qcoeff_ptr[0] * dequant;
if (abs_qcoeff) eob = 0;
}
*eob_ptr = eob + 1;
}
#endif
void aom_quantize_dc_32x32(const tran_low_t *coeff_ptr, int skip_block,
const int16_t *round_ptr, const int16_t quant,
tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr,
const int16_t dequant_ptr, uint16_t *eob_ptr,
const qm_val_t *qm_ptr, const qm_val_t *iqm_ptr) {
const int n_coeffs = 1024;
const int rc = 0;
const int coeff = coeff_ptr[rc];
const int coeff_sign = (coeff >> 31);
const int abs_coeff = (coeff ^ coeff_sign) - coeff_sign;
int64_t tmp, eob = -1;
int32_t tmp32;
int dequant;
memset(qcoeff_ptr, 0, n_coeffs * sizeof(*qcoeff_ptr));
memset(dqcoeff_ptr, 0, n_coeffs * sizeof(*dqcoeff_ptr));
if (!skip_block) {
tmp = clamp(abs_coeff + ROUND_POWER_OF_TWO(round_ptr[rc != 0], 1),
INT16_MIN, INT16_MAX);
tmp32 = (int32_t)((tmp * qm_ptr[rc] * quant) >> (15 + AOM_QM_BITS));
qcoeff_ptr[rc] = (tmp32 ^ coeff_sign) - coeff_sign;
dequant =
(dequant_ptr * iqm_ptr[rc] + (1 << (AOM_QM_BITS - 1))) >> AOM_QM_BITS;
dqcoeff_ptr[rc] = (qcoeff_ptr[rc] * dequant) / 2;
if (tmp32) eob = 0;
}
*eob_ptr = eob + 1;
}
#if CONFIG_AOM_HIGHBITDEPTH
void aom_highbd_quantize_dc_32x32(const tran_low_t *coeff_ptr, int skip_block,
const int16_t *round_ptr, const int16_t quant,
tran_low_t *qcoeff_ptr,
tran_low_t *dqcoeff_ptr,
const int16_t dequant_ptr, uint16_t *eob_ptr,
const qm_val_t *qm_ptr,
const qm_val_t *iqm_ptr) {
const int n_coeffs = 1024;
int eob = -1;
int dequant;
memset(qcoeff_ptr, 0, n_coeffs * sizeof(*qcoeff_ptr));
memset(dqcoeff_ptr, 0, n_coeffs * sizeof(*dqcoeff_ptr));
if (!skip_block) {
const int coeff = coeff_ptr[0];
const int coeff_sign = (coeff >> 31);
const int abs_coeff = (coeff ^ coeff_sign) - coeff_sign;
const int64_t tmp = abs_coeff + ROUND_POWER_OF_TWO(round_ptr[0], 1);
const uint32_t abs_qcoeff =
(uint32_t)((tmp * qm_ptr[0] * quant) >> (15 + AOM_QM_BITS));
qcoeff_ptr[0] = (tran_low_t)((abs_qcoeff ^ coeff_sign) - coeff_sign);
dequant =
(dequant_ptr * iqm_ptr[0] + (1 << (AOM_QM_BITS - 1))) >> AOM_QM_BITS;
dqcoeff_ptr[0] = (qcoeff_ptr[0] * dequant) / 2;
if (abs_qcoeff) eob = 0;
}
*eob_ptr = eob + 1;
}
#endif
void aom_quantize_b_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs,
int skip_block, const int16_t *zbin_ptr,
const int16_t *round_ptr, const int16_t *quant_ptr,
const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr,
tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr,
uint16_t *eob_ptr, const int16_t *scan,
const int16_t *iscan, const qm_val_t *qm_ptr,
const qm_val_t *iqm_ptr) {
int i, non_zero_count = (int)n_coeffs, eob = -1;
const int zbins[2] = { zbin_ptr[0], zbin_ptr[1] };
const int nzbins[2] = { zbins[0] * -1, zbins[1] * -1 };
(void)iscan;
memset(qcoeff_ptr, 0, n_coeffs * sizeof(*qcoeff_ptr));
memset(dqcoeff_ptr, 0, n_coeffs * sizeof(*dqcoeff_ptr));
if (!skip_block) {
// Pre-scan pass
for (i = (int)n_coeffs - 1; i >= 0; i--) {
const int rc = scan[i];
const qm_val_t wt = qm_ptr[rc];
const int coeff = coeff_ptr[rc] * wt;
if (coeff < (zbins[rc != 0] << AOM_QM_BITS) &&
coeff > (nzbins[rc != 0] << AOM_QM_BITS))
non_zero_count--;
else
break;
}
// Quantization pass: All coefficients with index >= non_zero_count are
// skippable. Note: non_zero_count can be zero.
for (i = 0; i < non_zero_count; i++) {
const int rc = scan[i];
const qm_val_t wt = qm_ptr[rc];
const int coeff = coeff_ptr[rc];
const int coeff_sign = (coeff >> 31);
const int abs_coeff = (coeff ^ coeff_sign) - coeff_sign;
int dequant;
if (abs_coeff * wt >= (zbins[rc != 0] << AOM_QM_BITS)) {
int32_t tmp32;
int64_t tmp =
clamp(abs_coeff + round_ptr[rc != 0], INT16_MIN, INT16_MAX);
tmp = tmp * wt;
tmp32 = ((((tmp * quant_ptr[rc != 0]) >> 16) + tmp) *
quant_shift_ptr[rc != 0]) >>
(16 + AOM_QM_BITS); // quantization
dequant =
(dequant_ptr[rc != 0] * iqm_ptr[rc] + (1 << (AOM_QM_BITS - 1))) >>
AOM_QM_BITS;
qcoeff_ptr[rc] = (tmp32 ^ coeff_sign) - coeff_sign;
dqcoeff_ptr[rc] = qcoeff_ptr[rc] * dequant;
if (tmp32) eob = i;
}
}
}
*eob_ptr = eob + 1;
}
#if CONFIG_AOM_HIGHBITDEPTH
void aom_highbd_quantize_b_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs,
int skip_block, const int16_t *zbin_ptr,
const int16_t *round_ptr, const int16_t *quant_ptr,
const int16_t *quant_shift_ptr,
tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr,
const int16_t *dequant_ptr, uint16_t *eob_ptr,
const int16_t *scan, const int16_t *iscan,
const qm_val_t *qm_ptr, const qm_val_t *iqm_ptr) {
int i, non_zero_count = (int)n_coeffs, eob = -1;
const int zbins[2] = { zbin_ptr[0], zbin_ptr[1] };
const int nzbins[2] = { zbins[0] * -1, zbins[1] * -1 };
int dequant;
(void)iscan;
memset(qcoeff_ptr, 0, n_coeffs * sizeof(*qcoeff_ptr));
memset(dqcoeff_ptr, 0, n_coeffs * sizeof(*dqcoeff_ptr));
if (!skip_block) {
// Pre-scan pass
for (i = (int)n_coeffs - 1; i >= 0; i--) {
const int rc = scan[i];
const qm_val_t wt = qm_ptr[rc];
const int coeff = coeff_ptr[rc] * wt;
if (coeff < (zbins[rc != 0] << AOM_QM_BITS) &&
coeff > (nzbins[rc != 0] << AOM_QM_BITS))
non_zero_count--;
else
break;
}
// Quantization pass: All coefficients with index >= non_zero_count are
// skippable. Note: non_zero_count can be zero.
for (i = 0; i < non_zero_count; i++) {
const int rc = scan[i];
const int coeff = coeff_ptr[rc];
const qm_val_t wt = qm_ptr[rc];
const int coeff_sign = (coeff >> 31);
const int abs_coeff = (coeff ^ coeff_sign) - coeff_sign;
if (abs_coeff * wt >= (zbins[rc != 0] << AOM_QM_BITS)) {
const int64_t tmp1 = abs_coeff + round_ptr[rc != 0];
const int64_t tmpw = tmp1 * wt;
const int64_t tmp2 = ((tmpw * quant_ptr[rc != 0]) >> 16) + tmpw;
const uint32_t abs_qcoeff =
(uint32_t)((tmp2 * quant_shift_ptr[rc != 0]) >> (16 + AOM_QM_BITS));
qcoeff_ptr[rc] = (tran_low_t)((abs_qcoeff ^ coeff_sign) - coeff_sign);
dequant =
(dequant_ptr[rc != 0] * iqm_ptr[rc] + (1 << (AOM_QM_BITS - 1))) >>
AOM_QM_BITS;
dqcoeff_ptr[rc] = qcoeff_ptr[rc] * dequant;
if (abs_qcoeff) eob = i;
}
}
}
*eob_ptr = eob + 1;
}
#endif
void aom_quantize_b_32x32_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs,
int skip_block, const int16_t *zbin_ptr,
const int16_t *round_ptr, const int16_t *quant_ptr,
const int16_t *quant_shift_ptr,
tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr,
const int16_t *dequant_ptr, uint16_t *eob_ptr,
const int16_t *scan, const int16_t *iscan,
const qm_val_t *qm_ptr, const qm_val_t *iqm_ptr) {
const int zbins[2] = { ROUND_POWER_OF_TWO(zbin_ptr[0], 1),
ROUND_POWER_OF_TWO(zbin_ptr[1], 1) };
const int nzbins[2] = { zbins[0] * -1, zbins[1] * -1 };
int idx = 0;
int idx_arr[1024];
int i, eob = -1;
int dequant;
(void)iscan;
memset(qcoeff_ptr, 0, n_coeffs * sizeof(*qcoeff_ptr));
memset(dqcoeff_ptr, 0, n_coeffs * sizeof(*dqcoeff_ptr));
if (!skip_block) {
// Pre-scan pass
for (i = 0; i < n_coeffs; i++) {
const int rc = scan[i];
const qm_val_t wt = qm_ptr[rc];
const int coeff = coeff_ptr[rc] * wt;
// If the coefficient is out of the base ZBIN range, keep it for
// quantization.
if (coeff >= (zbins[rc != 0] << AOM_QM_BITS) ||
coeff <= (nzbins[rc != 0] << AOM_QM_BITS))
idx_arr[idx++] = i;
}
// Quantization pass: only process the coefficients selected in
// pre-scan pass. Note: idx can be zero.
for (i = 0; i < idx; i++) {
const int rc = scan[idx_arr[i]];
const int coeff = coeff_ptr[rc];
const int coeff_sign = (coeff >> 31);
const qm_val_t wt = qm_ptr[rc];
int64_t tmp;
int tmp32;
int abs_coeff = (coeff ^ coeff_sign) - coeff_sign;
abs_coeff += ROUND_POWER_OF_TWO(round_ptr[rc != 0], 1);
tmp = clamp(abs_coeff, INT16_MIN, INT16_MAX);
tmp = tmp * wt;
tmp32 = ((((tmp * quant_ptr[rc != 0]) >> 16) + tmp) *
quant_shift_ptr[rc != 0]) >>
(15 + AOM_QM_BITS);
qcoeff_ptr[rc] = (tmp32 ^ coeff_sign) - coeff_sign;
dequant =
(dequant_ptr[rc != 0] * iqm_ptr[rc] + (1 << (AOM_QM_BITS - 1))) >>
AOM_QM_BITS;
dqcoeff_ptr[rc] = (qcoeff_ptr[rc] * dequant) / 2;
if (tmp32) eob = idx_arr[i];
}
}
*eob_ptr = eob + 1;
}
#if CONFIG_AOM_HIGHBITDEPTH
void aom_highbd_quantize_b_32x32_c(
const tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block,
const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr,
const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr,
tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr,
const int16_t *scan, const int16_t *iscan, const qm_val_t *qm_ptr,
const qm_val_t *iqm_ptr) {
const int zbins[2] = { ROUND_POWER_OF_TWO(zbin_ptr[0], 1),
ROUND_POWER_OF_TWO(zbin_ptr[1], 1) };
const int nzbins[2] = { zbins[0] * -1, zbins[1] * -1 };
int idx = 0;
int idx_arr[1024];
int i, eob = -1;
int dequant;
(void)iscan;
memset(qcoeff_ptr, 0, n_coeffs * sizeof(*qcoeff_ptr));
memset(dqcoeff_ptr, 0, n_coeffs * sizeof(*dqcoeff_ptr));
if (!skip_block) {
// Pre-scan pass
for (i = 0; i < n_coeffs; i++) {
const int rc = scan[i];
const qm_val_t wt = qm_ptr[rc];
const int coeff = coeff_ptr[rc] * wt;
// If the coefficient is out of the base ZBIN range, keep it for
// quantization.
if (coeff >= (zbins[rc != 0] << AOM_QM_BITS) ||
coeff <= (nzbins[rc != 0] << AOM_QM_BITS))
idx_arr[idx++] = i;
}
// Quantization pass: only process the coefficients selected in
// pre-scan pass. Note: idx can be zero.
for (i = 0; i < idx; i++) {
const int rc = scan[idx_arr[i]];
const int coeff = coeff_ptr[rc];
const int coeff_sign = (coeff >> 31);
const qm_val_t wt = qm_ptr[rc];
const int abs_coeff = (coeff ^ coeff_sign) - coeff_sign;
const int64_t tmp1 =
abs_coeff + ROUND_POWER_OF_TWO(round_ptr[rc != 0], 1);
const int64_t tmpw = tmp1 * wt;
const int64_t tmp2 = ((tmpw * quant_ptr[rc != 0]) >> 16) + tmpw;
const uint32_t abs_qcoeff =
(uint32_t)((tmp2 * quant_shift_ptr[rc != 0]) >> (15 + AOM_QM_BITS));
qcoeff_ptr[rc] = (tran_low_t)((abs_qcoeff ^ coeff_sign) - coeff_sign);
dequant =
(dequant_ptr[rc != 0] * iqm_ptr[rc] + (1 << (AOM_QM_BITS - 1))) >>
AOM_QM_BITS;
dqcoeff_ptr[rc] = (qcoeff_ptr[rc] * dequant) / 2;
if (abs_qcoeff) eob = idx_arr[i];
}
}
*eob_ptr = eob + 1;
}
#endif
#else
void aom_quantize_dc(const tran_low_t *coeff_ptr, int n_coeffs, int skip_block,
const int16_t *round_ptr, const int16_t quant,
tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr,
const int16_t dequant_ptr, uint16_t *eob_ptr) {
const int rc = 0;
const int coeff = coeff_ptr[rc];
const int coeff_sign = (coeff >> 31);
const int abs_coeff = (coeff ^ coeff_sign) - coeff_sign;
int tmp, eob = -1;
memset(qcoeff_ptr, 0, n_coeffs * sizeof(*qcoeff_ptr));
memset(dqcoeff_ptr, 0, n_coeffs * sizeof(*dqcoeff_ptr));
if (!skip_block) {
tmp = clamp(abs_coeff + round_ptr[rc != 0], INT16_MIN, INT16_MAX);
tmp = (tmp * quant) >> 16;
qcoeff_ptr[rc] = (tmp ^ coeff_sign) - coeff_sign;
dqcoeff_ptr[rc] = qcoeff_ptr[rc] * dequant_ptr;
if (tmp) eob = 0;
}
*eob_ptr = eob + 1;
}
#if CONFIG_AOM_HIGHBITDEPTH
void aom_highbd_quantize_dc(const tran_low_t *coeff_ptr, int n_coeffs,
int skip_block, const int16_t *round_ptr,
const int16_t quant, tran_low_t *qcoeff_ptr,
tran_low_t *dqcoeff_ptr, const int16_t dequant_ptr,
uint16_t *eob_ptr) {
int eob = -1;
memset(qcoeff_ptr, 0, n_coeffs * sizeof(*qcoeff_ptr));
memset(dqcoeff_ptr, 0, n_coeffs * sizeof(*dqcoeff_ptr));
if (!skip_block) {
const int coeff = coeff_ptr[0];
const int coeff_sign = (coeff >> 31);
const int abs_coeff = (coeff ^ coeff_sign) - coeff_sign;
const int64_t tmp = abs_coeff + round_ptr[0];
const uint32_t abs_qcoeff = (uint32_t)((tmp * quant) >> 16);
qcoeff_ptr[0] = (tran_low_t)((abs_qcoeff ^ coeff_sign) - coeff_sign);
dqcoeff_ptr[0] = qcoeff_ptr[0] * dequant_ptr;
if (abs_qcoeff) eob = 0;
}
*eob_ptr = eob + 1;
}
#endif
void aom_quantize_dc_32x32(const tran_low_t *coeff_ptr, int skip_block,
const int16_t *round_ptr, const int16_t quant,
tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr,
const int16_t dequant_ptr, uint16_t *eob_ptr) {
const int n_coeffs = 1024;
const int rc = 0;
const int coeff = coeff_ptr[rc];
const int coeff_sign = (coeff >> 31);
const int abs_coeff = (coeff ^ coeff_sign) - coeff_sign;
int tmp, eob = -1;
memset(qcoeff_ptr, 0, n_coeffs * sizeof(*qcoeff_ptr));
memset(dqcoeff_ptr, 0, n_coeffs * sizeof(*dqcoeff_ptr));
if (!skip_block) {
tmp = clamp(abs_coeff + ROUND_POWER_OF_TWO(round_ptr[rc != 0], 1),
INT16_MIN, INT16_MAX);
tmp = (tmp * quant) >> 15;
qcoeff_ptr[rc] = (tmp ^ coeff_sign) - coeff_sign;
dqcoeff_ptr[rc] = qcoeff_ptr[rc] * dequant_ptr / 2;
if (tmp) eob = 0;
}
*eob_ptr = eob + 1;
}
#if CONFIG_AOM_HIGHBITDEPTH
void aom_highbd_quantize_dc_32x32(const tran_low_t *coeff_ptr, int skip_block,
const int16_t *round_ptr, const int16_t quant,
tran_low_t *qcoeff_ptr,
tran_low_t *dqcoeff_ptr,
const int16_t dequant_ptr,
uint16_t *eob_ptr) {
const int n_coeffs = 1024;
int eob = -1;
memset(qcoeff_ptr, 0, n_coeffs * sizeof(*qcoeff_ptr));
memset(dqcoeff_ptr, 0, n_coeffs * sizeof(*dqcoeff_ptr));
if (!skip_block) {
const int coeff = coeff_ptr[0];
const int coeff_sign = (coeff >> 31);
const int abs_coeff = (coeff ^ coeff_sign) - coeff_sign;
const int64_t tmp = abs_coeff + ROUND_POWER_OF_TWO(round_ptr[0], 1);
const uint32_t abs_qcoeff = (uint32_t)((tmp * quant) >> 15);
qcoeff_ptr[0] = (tran_low_t)((abs_qcoeff ^ coeff_sign) - coeff_sign);
dqcoeff_ptr[0] = qcoeff_ptr[0] * dequant_ptr / 2;
if (abs_qcoeff) eob = 0;
}
*eob_ptr = eob + 1;
}
#endif
void aom_quantize_b_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs,
int skip_block, const int16_t *zbin_ptr,
const int16_t *round_ptr, const int16_t *quant_ptr,
const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr,
tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr,
uint16_t *eob_ptr, const int16_t *scan,
const int16_t *iscan) {
int i, non_zero_count = (int)n_coeffs, eob = -1;
const int zbins[2] = { zbin_ptr[0], zbin_ptr[1] };
const int nzbins[2] = { zbins[0] * -1, zbins[1] * -1 };
(void)iscan;
memset(qcoeff_ptr, 0, n_coeffs * sizeof(*qcoeff_ptr));
memset(dqcoeff_ptr, 0, n_coeffs * sizeof(*dqcoeff_ptr));
if (!skip_block) {
// Pre-scan pass
for (i = (int)n_coeffs - 1; i >= 0; i--) {
const int rc = scan[i];
const int coeff = coeff_ptr[rc];
if (coeff < zbins[rc != 0] && coeff > nzbins[rc != 0])
non_zero_count--;
else
break;
}
// Quantization pass: All coefficients with index >= non_zero_count are
// skippable. Note: non_zero_count can be zero.
for (i = 0; i < non_zero_count; i++) {
const int rc = scan[i];
const int coeff = coeff_ptr[rc];
const int coeff_sign = (coeff >> 31);
const int abs_coeff = (coeff ^ coeff_sign) - coeff_sign;
if (abs_coeff >= zbins[rc != 0]) {
int tmp = clamp(abs_coeff + round_ptr[rc != 0], INT16_MIN, INT16_MAX);
tmp = ((((tmp * quant_ptr[rc != 0]) >> 16) + tmp) *
quant_shift_ptr[rc != 0]) >>
16; // quantization
qcoeff_ptr[rc] = (tmp ^ coeff_sign) - coeff_sign;
dqcoeff_ptr[rc] = qcoeff_ptr[rc] * dequant_ptr[rc != 0];
if (tmp) eob = i;
}
}
}
*eob_ptr = eob + 1;
}
#if CONFIG_AOM_HIGHBITDEPTH
void aom_highbd_quantize_b_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs,
int skip_block, const int16_t *zbin_ptr,
const int16_t *round_ptr, const int16_t *quant_ptr,
const int16_t *quant_shift_ptr,
tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr,
const int16_t *dequant_ptr, uint16_t *eob_ptr,
const int16_t *scan, const int16_t *iscan) {
int i, non_zero_count = (int)n_coeffs, eob = -1;
const int zbins[2] = { zbin_ptr[0], zbin_ptr[1] };
const int nzbins[2] = { zbins[0] * -1, zbins[1] * -1 };
(void)iscan;
memset(qcoeff_ptr, 0, n_coeffs * sizeof(*qcoeff_ptr));
memset(dqcoeff_ptr, 0, n_coeffs * sizeof(*dqcoeff_ptr));
if (!skip_block) {
// Pre-scan pass
for (i = (int)n_coeffs - 1; i >= 0; i--) {
const int rc = scan[i];
const int coeff = coeff_ptr[rc];
if (coeff < zbins[rc != 0] && coeff > nzbins[rc != 0])
non_zero_count--;
else
break;
}
// Quantization pass: All coefficients with index >= non_zero_count are
// skippable. Note: non_zero_count can be zero.
for (i = 0; i < non_zero_count; i++) {
const int rc = scan[i];
const int coeff = coeff_ptr[rc];
const int coeff_sign = (coeff >> 31);
const int abs_coeff = (coeff ^ coeff_sign) - coeff_sign;
if (abs_coeff >= zbins[rc != 0]) {
const int64_t tmp1 = abs_coeff + round_ptr[rc != 0];
const int64_t tmp2 = ((tmp1 * quant_ptr[rc != 0]) >> 16) + tmp1;
const uint32_t abs_qcoeff =
(uint32_t)((tmp2 * quant_shift_ptr[rc != 0]) >> 16);
qcoeff_ptr[rc] = (tran_low_t)((abs_qcoeff ^ coeff_sign) - coeff_sign);
dqcoeff_ptr[rc] = qcoeff_ptr[rc] * dequant_ptr[rc != 0];
if (abs_qcoeff) eob = i;
}
}
}
*eob_ptr = eob + 1;
}
#endif
void aom_quantize_b_32x32_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs,
int skip_block, const int16_t *zbin_ptr,
const int16_t *round_ptr, const int16_t *quant_ptr,
const int16_t *quant_shift_ptr,
tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr,
const int16_t *dequant_ptr, uint16_t *eob_ptr,
const int16_t *scan, const int16_t *iscan) {
const int zbins[2] = { ROUND_POWER_OF_TWO(zbin_ptr[0], 1),
ROUND_POWER_OF_TWO(zbin_ptr[1], 1) };
const int nzbins[2] = { zbins[0] * -1, zbins[1] * -1 };
int idx = 0;
int idx_arr[1024];
int i, eob = -1;
(void)iscan;
memset(qcoeff_ptr, 0, n_coeffs * sizeof(*qcoeff_ptr));
memset(dqcoeff_ptr, 0, n_coeffs * sizeof(*dqcoeff_ptr));
if (!skip_block) {
// Pre-scan pass
for (i = 0; i < n_coeffs; i++) {
const int rc = scan[i];
const int coeff = coeff_ptr[rc];
// If the coefficient is out of the base ZBIN range, keep it for
// quantization.
if (coeff >= zbins[rc != 0] || coeff <= nzbins[rc != 0])
idx_arr[idx++] = i;
}
// Quantization pass: only process the coefficients selected in
// pre-scan pass. Note: idx can be zero.
for (i = 0; i < idx; i++) {
const int rc = scan[idx_arr[i]];
const int coeff = coeff_ptr[rc];
const int coeff_sign = (coeff >> 31);
int tmp;
int abs_coeff = (coeff ^ coeff_sign) - coeff_sign;
abs_coeff += ROUND_POWER_OF_TWO(round_ptr[rc != 0], 1);
abs_coeff = clamp(abs_coeff, INT16_MIN, INT16_MAX);
tmp = ((((abs_coeff * quant_ptr[rc != 0]) >> 16) + abs_coeff) *
quant_shift_ptr[rc != 0]) >>
15;
qcoeff_ptr[rc] = (tmp ^ coeff_sign) - coeff_sign;
dqcoeff_ptr[rc] = qcoeff_ptr[rc] * dequant_ptr[rc != 0] / 2;
if (tmp) eob = idx_arr[i];
}
}
*eob_ptr = eob + 1;
}
#if CONFIG_AOM_HIGHBITDEPTH
void aom_highbd_quantize_b_32x32_c(
const tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block,
const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr,
const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr,
tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr,
const int16_t *scan, const int16_t *iscan) {
const int zbins[2] = { ROUND_POWER_OF_TWO(zbin_ptr[0], 1),
ROUND_POWER_OF_TWO(zbin_ptr[1], 1) };
const int nzbins[2] = { zbins[0] * -1, zbins[1] * -1 };
int idx = 0;
int idx_arr[1024];
int i, eob = -1;
(void)iscan;
memset(qcoeff_ptr, 0, n_coeffs * sizeof(*qcoeff_ptr));
memset(dqcoeff_ptr, 0, n_coeffs * sizeof(*dqcoeff_ptr));
if (!skip_block) {
// Pre-scan pass
for (i = 0; i < n_coeffs; i++) {
const int rc = scan[i];
const int coeff = coeff_ptr[rc];
// If the coefficient is out of the base ZBIN range, keep it for
// quantization.
if (coeff >= zbins[rc != 0] || coeff <= nzbins[rc != 0])
idx_arr[idx++] = i;
}
// Quantization pass: only process the coefficients selected in
// pre-scan pass. Note: idx can be zero.
for (i = 0; i < idx; i++) {
const int rc = scan[idx_arr[i]];
const int coeff = coeff_ptr[rc];
const int coeff_sign = (coeff >> 31);
const int abs_coeff = (coeff ^ coeff_sign) - coeff_sign;
const int64_t tmp1 =
abs_coeff + ROUND_POWER_OF_TWO(round_ptr[rc != 0], 1);
const int64_t tmp2 = ((tmp1 * quant_ptr[rc != 0]) >> 16) + tmp1;
const uint32_t abs_qcoeff =
(uint32_t)((tmp2 * quant_shift_ptr[rc != 0]) >> 15);
qcoeff_ptr[rc] = (tran_low_t)((abs_qcoeff ^ coeff_sign) - coeff_sign);
dqcoeff_ptr[rc] = qcoeff_ptr[rc] * dequant_ptr[rc != 0] / 2;
if (abs_qcoeff) eob = idx_arr[i];
}
}
*eob_ptr = eob + 1;
}
#endif
#endif
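
The non-QM aom_quantize_dc() above folds the sign out of the coefficient, rounds, scales by a Q16 multiplier, and then restores the sign. The following is a minimal sketch of that arithmetic on plain int values, with the memset and skip_block handling left out. The names clamp_int and quantize_dc_sketch are hypothetical, and the sign mask is derived with a comparison rather than coeff >> 31, because right-shifting a negative value is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the clamp() helper used in the source. */
static int clamp_int(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

/* Quantizes a single DC coefficient; returns the end-of-block count
 * (1 if the quantized value is non-zero, otherwise 0). */
static int quantize_dc_sketch(int coeff, int round, int quant, int dequant,
                              int *qcoeff, int *dqcoeff) {
  const int sign = coeff < 0 ? -1 : 0;         /* all-ones mask if negative */
  const int abs_coeff = (coeff ^ sign) - sign; /* branch-free absolute value */
  int tmp;
  assert(quant >= 0);
  tmp = clamp_int(abs_coeff + round, INT16_MIN, INT16_MAX);
  tmp = (tmp * quant) >> 16;                   /* quant is a Q16 multiplier */
  *qcoeff = (tmp ^ sign) - sign;               /* restore the sign */
  *dqcoeff = *qcoeff * dequant;
  return tmp != 0;
}
```

With coeff = -100, round = 8, quant = 16384 (0.25 in Q16) and dequant = 4, the sketch gives qcoeff = -27 and dqcoeff = -108, and reports one coded coefficient.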


@@ -1,91 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_QUANTIZE_H_
#define AOM_DSP_QUANTIZE_H_
#include "./aom_config.h"
#include "aom_dsp/aom_dsp_common.h"
#ifdef __cplusplus
extern "C" {
#endif
#if CONFIG_AOM_QM
void aom_quantize_dc(const tran_low_t *coeff_ptr, int n_coeffs, int skip_block,
const int16_t *round_ptr, const int16_t quant_ptr,
tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr,
const int16_t dequant_ptr, uint16_t *eob_ptr,
const qm_val_t *qm_ptr, const qm_val_t *iqm_ptr);
void aom_quantize_dc_32x32(const tran_low_t *coeff_ptr, int skip_block,
const int16_t *round_ptr, const int16_t quant_ptr,
tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr,
const int16_t dequant_ptr, uint16_t *eob_ptr,
const qm_val_t *qm_ptr, const qm_val_t *iqm_ptr);
void aom_quantize_b_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs,
int skip_block, const int16_t *zbin_ptr,
const int16_t *round_ptr, const int16_t *quant_ptr,
const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr,
tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr,
uint16_t *eob_ptr, const int16_t *scan,
const int16_t *iscan, const qm_val_t *qm_ptr,
const qm_val_t *iqm_ptr);
#if CONFIG_AOM_HIGHBITDEPTH
void aom_highbd_quantize_dc(const tran_low_t *coeff_ptr, int n_coeffs,
int skip_block, const int16_t *round_ptr,
const int16_t quant_ptr, tran_low_t *qcoeff_ptr,
tran_low_t *dqcoeff_ptr, const int16_t dequant_ptr,
uint16_t *eob_ptr, const qm_val_t *qm_ptr,
const qm_val_t *iqm_ptr);
void aom_highbd_quantize_dc_32x32(
const tran_low_t *coeff_ptr, int skip_block, const int16_t *round_ptr,
const int16_t quant_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr,
const int16_t dequant_ptr, uint16_t *eob_ptr, const qm_val_t *qm_ptr,
const qm_val_t *iqm_ptr);
void aom_highbd_quantize_b_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs,
int skip_block, const int16_t *zbin_ptr,
const int16_t *round_ptr, const int16_t *quant_ptr,
const int16_t *quant_shift_ptr,
tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr,
const int16_t *dequant_ptr, uint16_t *eob_ptr,
const int16_t *scan, const int16_t *iscan,
const qm_val_t *qm_ptr, const qm_val_t *iqm_ptr);
#endif
#else
void aom_quantize_dc(const tran_low_t *coeff_ptr, int n_coeffs, int skip_block,
const int16_t *round_ptr, const int16_t quant_ptr,
tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr,
const int16_t dequant_ptr, uint16_t *eob_ptr);
void aom_quantize_dc_32x32(const tran_low_t *coeff_ptr, int skip_block,
const int16_t *round_ptr, const int16_t quant_ptr,
tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr,
const int16_t dequant_ptr, uint16_t *eob_ptr);
#if CONFIG_AOM_HIGHBITDEPTH
void aom_highbd_quantize_dc(const tran_low_t *coeff_ptr, int n_coeffs,
int skip_block, const int16_t *round_ptr,
const int16_t quant_ptr, tran_low_t *qcoeff_ptr,
tran_low_t *dqcoeff_ptr, const int16_t dequant_ptr,
uint16_t *eob_ptr);
void aom_highbd_quantize_dc_32x32(const tran_low_t *coeff_ptr, int skip_block,
const int16_t *round_ptr,
const int16_t quant_ptr,
tran_low_t *qcoeff_ptr,
tran_low_t *dqcoeff_ptr,
const int16_t dequant_ptr, uint16_t *eob_ptr);
#endif
#endif
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_QUANTIZE_H_
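
The zbin_ptr argument of aom_quantize_b_c() declared above drives a dead-zone pre-scan in the non-QM C implementation: trailing coefficients in scan order that fall strictly inside (-zbin, zbin) are dropped before any quantization work is done. The following is a minimal sketch of that pre-scan alone, assuming tran_low_t is a 32-bit integer. The name count_kept_coeffs is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Walks the scan order backwards and returns how many leading scan
 * positions still need quantization. Position 0 (DC) and all other
 * positions (AC) use separate zero-bin thresholds. */
static int count_kept_coeffs(const int32_t *coeff, const int16_t *scan,
                             int n_coeffs, int zbin_dc, int zbin_ac) {
  int count = n_coeffs;
  int i;
  assert(n_coeffs >= 0);
  for (i = n_coeffs - 1; i >= 0; --i) {
    const int rc = scan[i];
    const int zbin = rc != 0 ? zbin_ac : zbin_dc;
    if (coeff[rc] < zbin && coeff[rc] > -zbin)
      --count; /* inside the dead zone: quantizes to zero */
    else
      break;   /* first coefficient outside the dead zone ends the scan */
  }
  return count;
}
```

For the coefficients { 50, 3, -20, 2, -1, 0 } in identity scan order with thresholds 10 (DC) and 8 (AC), the last three values are skipped and the function returns 3. The value 3 at position 1 is still visited by the quantization pass, which tests each coefficient against its zero bin again.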


@@ -1,512 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <stdlib.h>
#include "./aom_config.h"
#include "./aom_dsp_rtcd.h"
#include "aom/aom_integer.h"
#include "aom_ports/mem.h"
/* Sum the difference between every corresponding element of the buffers. */
static INLINE unsigned int sad(const uint8_t *a, int a_stride, const uint8_t *b,
int b_stride, int width, int height) {
int y, x;
unsigned int sad = 0;
for (y = 0; y < height; y++) {
for (x = 0; x < width; x++) sad += abs(a[x] - b[x]);
a += a_stride;
b += b_stride;
}
return sad;
}
#define sadMxN(m, n) \
unsigned int aom_sad##m##x##n##_c(const uint8_t *src, int src_stride, \
const uint8_t *ref, int ref_stride) { \
return sad(src, src_stride, ref, ref_stride, m, n); \
} \
unsigned int aom_sad##m##x##n##_avg_c(const uint8_t *src, int src_stride, \
const uint8_t *ref, int ref_stride, \
const uint8_t *second_pred) { \
uint8_t comp_pred[m * n]; \
aom_comp_avg_pred_c(comp_pred, second_pred, m, n, ref, ref_stride); \
return sad(src, src_stride, comp_pred, m, m, n); \
}
// Depending on call sites, consider passing **ref_array to avoid the & in the
// call below and to de-duplicate with the 4D variant below.
#define sadMxNxK(m, n, k) \
void aom_sad##m##x##n##x##k##_c(const uint8_t *src, int src_stride, \
const uint8_t *ref_array, int ref_stride, \
uint32_t *sad_array) { \
int i; \
for (i = 0; i < k; ++i) \
sad_array[i] = \
aom_sad##m##x##n##_c(src, src_stride, &ref_array[i], ref_stride); \
}
// This appears to be equivalent to the above when k == 4 and refs is const
#define sadMxNx4D(m, n) \
void aom_sad##m##x##n##x4d_c(const uint8_t *src, int src_stride, \
const uint8_t *const ref_array[], \
int ref_stride, uint32_t *sad_array) { \
int i; \
for (i = 0; i < 4; ++i) \
sad_array[i] = \
aom_sad##m##x##n##_c(src, src_stride, ref_array[i], ref_stride); \
}
/* clang-format off */
#if CONFIG_AV1 && CONFIG_EXT_PARTITION
// 128x128
sadMxN(128, 128)
sadMxNxK(128, 128, 3)
sadMxNxK(128, 128, 8)
sadMxNx4D(128, 128)
// 128x64
sadMxN(128, 64)
sadMxNx4D(128, 64)
// 64x128
sadMxN(64, 128)
sadMxNx4D(64, 128)
#endif // CONFIG_AV1 && CONFIG_EXT_PARTITION
// 64x64
sadMxN(64, 64)
sadMxNxK(64, 64, 3)
sadMxNxK(64, 64, 8)
sadMxNx4D(64, 64)
// 64x32
sadMxN(64, 32)
sadMxNx4D(64, 32)
// 32x64
sadMxN(32, 64)
sadMxNx4D(32, 64)
// 32x32
sadMxN(32, 32)
sadMxNxK(32, 32, 3)
sadMxNxK(32, 32, 8)
sadMxNx4D(32, 32)
// 32x16
sadMxN(32, 16)
sadMxNx4D(32, 16)
// 16x32
sadMxN(16, 32)
sadMxNx4D(16, 32)
// 16x16
sadMxN(16, 16)
sadMxNxK(16, 16, 3)
sadMxNxK(16, 16, 8)
sadMxNx4D(16, 16)
// 16x8
sadMxN(16, 8)
sadMxNxK(16, 8, 3)
sadMxNxK(16, 8, 8)
sadMxNx4D(16, 8)
// 8x16
sadMxN(8, 16)
sadMxNxK(8, 16, 3)
sadMxNxK(8, 16, 8)
sadMxNx4D(8, 16)
// 8x8
sadMxN(8, 8)
sadMxNxK(8, 8, 3)
sadMxNxK(8, 8, 8)
sadMxNx4D(8, 8)
// 8x4
sadMxN(8, 4)
sadMxNxK(8, 4, 8)
sadMxNx4D(8, 4)
// 4x8
sadMxN(4, 8)
sadMxNxK(4, 8, 8)
sadMxNx4D(4, 8)
// 4x4
sadMxN(4, 4)
sadMxNxK(4, 4, 3)
sadMxNxK(4, 4, 8)
sadMxNx4D(4, 4)
/* clang-format on */
#if CONFIG_AOM_HIGHBITDEPTH
static INLINE
unsigned int highbd_sad(const uint8_t *a8, int a_stride, const uint8_t *b8,
int b_stride, int width, int height) {
int y, x;
unsigned int sad = 0;
const uint16_t *a = CONVERT_TO_SHORTPTR(a8);
const uint16_t *b = CONVERT_TO_SHORTPTR(b8);
for (y = 0; y < height; y++) {
for (x = 0; x < width; x++) sad += abs(a[x] - b[x]);
a += a_stride;
b += b_stride;
}
return sad;
}
static INLINE unsigned int highbd_sadb(const uint8_t *a8, int a_stride,
const uint16_t *b, int b_stride,
int width, int height) {
int y, x;
unsigned int sad = 0;
const uint16_t *a = CONVERT_TO_SHORTPTR(a8);
for (y = 0; y < height; y++) {
for (x = 0; x < width; x++) sad += abs(a[x] - b[x]);
a += a_stride;
b += b_stride;
}
return sad;
}
#define highbd_sadMxN(m, n) \
unsigned int aom_highbd_sad##m##x##n##_c(const uint8_t *src, int src_stride, \
const uint8_t *ref, \
int ref_stride) { \
return highbd_sad(src, src_stride, ref, ref_stride, m, n); \
} \
unsigned int aom_highbd_sad##m##x##n##_avg_c( \
const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, \
const uint8_t *second_pred) { \
uint16_t comp_pred[m * n]; \
aom_highbd_comp_avg_pred_c(comp_pred, second_pred, m, n, ref, ref_stride); \
return highbd_sadb(src, src_stride, comp_pred, m, m, n); \
}
#define highbd_sadMxNxK(m, n, k) \
void aom_highbd_sad##m##x##n##x##k##_c( \
const uint8_t *src, int src_stride, const uint8_t *ref_array, \
int ref_stride, uint32_t *sad_array) { \
int i; \
for (i = 0; i < k; ++i) { \
sad_array[i] = aom_highbd_sad##m##x##n##_c(src, src_stride, \
&ref_array[i], ref_stride); \
} \
}
#define highbd_sadMxNx4D(m, n) \
void aom_highbd_sad##m##x##n##x4d_c(const uint8_t *src, int src_stride, \
const uint8_t *const ref_array[], \
int ref_stride, uint32_t *sad_array) { \
int i; \
for (i = 0; i < 4; ++i) { \
sad_array[i] = aom_highbd_sad##m##x##n##_c(src, src_stride, \
ref_array[i], ref_stride); \
} \
}
/* clang-format off */
#if CONFIG_AV1 && CONFIG_EXT_PARTITION
// 128x128
highbd_sadMxN(128, 128)
highbd_sadMxNxK(128, 128, 3)
highbd_sadMxNxK(128, 128, 8)
highbd_sadMxNx4D(128, 128)
// 128x64
highbd_sadMxN(128, 64)
highbd_sadMxNx4D(128, 64)
// 64x128
highbd_sadMxN(64, 128)
highbd_sadMxNx4D(64, 128)
#endif // CONFIG_AV1 && CONFIG_EXT_PARTITION
// 64x64
highbd_sadMxN(64, 64)
highbd_sadMxNxK(64, 64, 3)
highbd_sadMxNxK(64, 64, 8)
highbd_sadMxNx4D(64, 64)
// 64x32
highbd_sadMxN(64, 32)
highbd_sadMxNx4D(64, 32)
// 32x64
highbd_sadMxN(32, 64)
highbd_sadMxNx4D(32, 64)
// 32x32
highbd_sadMxN(32, 32)
highbd_sadMxNxK(32, 32, 3)
highbd_sadMxNxK(32, 32, 8)
highbd_sadMxNx4D(32, 32)
// 32x16
highbd_sadMxN(32, 16)
highbd_sadMxNx4D(32, 16)
// 16x32
highbd_sadMxN(16, 32)
highbd_sadMxNx4D(16, 32)
// 16x16
highbd_sadMxN(16, 16)
highbd_sadMxNxK(16, 16, 3)
highbd_sadMxNxK(16, 16, 8)
highbd_sadMxNx4D(16, 16)
// 16x8
highbd_sadMxN(16, 8)
highbd_sadMxNxK(16, 8, 3)
highbd_sadMxNxK(16, 8, 8)
highbd_sadMxNx4D(16, 8)
// 8x16
highbd_sadMxN(8, 16)
highbd_sadMxNxK(8, 16, 3)
highbd_sadMxNxK(8, 16, 8)
highbd_sadMxNx4D(8, 16)
// 8x8
highbd_sadMxN(8, 8)
highbd_sadMxNxK(8, 8, 3)
highbd_sadMxNxK(8, 8, 8)
highbd_sadMxNx4D(8, 8)
// 8x4
highbd_sadMxN(8, 4)
highbd_sadMxNxK(8, 4, 8)
highbd_sadMxNx4D(8, 4)
// 4x8
highbd_sadMxN(4, 8)
highbd_sadMxNxK(4, 8, 8)
highbd_sadMxNx4D(4, 8)
// 4x4
highbd_sadMxN(4, 4)
highbd_sadMxNxK(4, 4, 3)
highbd_sadMxNxK(4, 4, 8)
highbd_sadMxNx4D(4, 4)
/* clang-format on */
#endif // CONFIG_AOM_HIGHBITDEPTH
#if CONFIG_AV1 && CONFIG_EXT_INTER
static INLINE
unsigned int masked_sad(const uint8_t *a, int a_stride, const uint8_t *b,
int b_stride, const uint8_t *m, int m_stride,
int width, int height) {
int y, x;
unsigned int sad = 0;
for (y = 0; y < height; y++) {
for (x = 0; x < width; x++) sad += m[x] * abs(a[x] - b[x]);
a += a_stride;
b += b_stride;
m += m_stride;
}
sad = (sad + 31) >> 6;
return sad;
}
#define MASKSADMxN(m, n) \
unsigned int aom_masked_sad##m##x##n##_c( \
const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, \
const uint8_t *msk, int msk_stride) { \
return masked_sad(src, src_stride, ref, ref_stride, msk, msk_stride, m, \
n); \
}
/* clang-format off */
#if CONFIG_EXT_PARTITION
MASKSADMxN(128, 128)
MASKSADMxN(128, 64)
MASKSADMxN(64, 128)
#endif // CONFIG_EXT_PARTITION
MASKSADMxN(64, 64)
MASKSADMxN(64, 32)
MASKSADMxN(32, 64)
MASKSADMxN(32, 32)
MASKSADMxN(32, 16)
MASKSADMxN(16, 32)
MASKSADMxN(16, 16)
MASKSADMxN(16, 8)
MASKSADMxN(8, 16)
MASKSADMxN(8, 8)
MASKSADMxN(8, 4)
MASKSADMxN(4, 8)
MASKSADMxN(4, 4)
/* clang-format on */
#if CONFIG_AOM_HIGHBITDEPTH
static INLINE
unsigned int highbd_masked_sad(const uint8_t *a8, int a_stride,
const uint8_t *b8, int b_stride,
const uint8_t *m, int m_stride, int width,
int height) {
int y, x;
unsigned int sad = 0;
const uint16_t *a = CONVERT_TO_SHORTPTR(a8);
const uint16_t *b = CONVERT_TO_SHORTPTR(b8);
for (y = 0; y < height; y++) {
for (x = 0; x < width; x++) sad += m[x] * abs(a[x] - b[x]);
a += a_stride;
b += b_stride;
m += m_stride;
}
sad = (sad + 31) >> 6;
return sad;
}
#define HIGHBD_MASKSADMXN(m, n) \
unsigned int aom_highbd_masked_sad##m##x##n##_c( \
const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, \
const uint8_t *msk, int msk_stride) { \
return highbd_masked_sad(src, src_stride, ref, ref_stride, msk, \
msk_stride, m, n); \
}
#if CONFIG_EXT_PARTITION
HIGHBD_MASKSADMXN(128, 128)
HIGHBD_MASKSADMXN(128, 64)
HIGHBD_MASKSADMXN(64, 128)
#endif // CONFIG_EXT_PARTITION
HIGHBD_MASKSADMXN(64, 64)
HIGHBD_MASKSADMXN(64, 32)
HIGHBD_MASKSADMXN(32, 64)
HIGHBD_MASKSADMXN(32, 32)
HIGHBD_MASKSADMXN(32, 16)
HIGHBD_MASKSADMXN(16, 32)
HIGHBD_MASKSADMXN(16, 16)
HIGHBD_MASKSADMXN(16, 8)
HIGHBD_MASKSADMXN(8, 16)
HIGHBD_MASKSADMXN(8, 8)
HIGHBD_MASKSADMXN(8, 4)
HIGHBD_MASKSADMXN(4, 8)
HIGHBD_MASKSADMXN(4, 4)
#endif // CONFIG_AOM_HIGHBITDEPTH
#endif // CONFIG_AV1 && CONFIG_EXT_INTER
#if CONFIG_AV1 && CONFIG_MOTION_VAR
// pre: predictor being evaluated
// wsrc: target weighted prediction (pre-multiplied by 4096 to keep precision)
// mask: 2d weights (scaled by 4096)
static INLINE unsigned int obmc_sad(const uint8_t *pre, int pre_stride,
const int32_t *wsrc, const int32_t *mask,
int width, int height) {
int y, x;
unsigned int sad = 0;
for (y = 0; y < height; y++) {
for (x = 0; x < width; x++)
sad += ROUND_POWER_OF_TWO(abs(wsrc[x] - pre[x] * mask[x]), 12);
pre += pre_stride;
wsrc += width;
mask += width;
}
return sad;
}
#define OBMCSADMxN(m, n) \
unsigned int aom_obmc_sad##m##x##n##_c(const uint8_t *ref, int ref_stride, \
const int32_t *wsrc, \
const int32_t *mask) { \
return obmc_sad(ref, ref_stride, wsrc, mask, m, n); \
}
/* clang-format off */
#if CONFIG_EXT_PARTITION
OBMCSADMxN(128, 128)
OBMCSADMxN(128, 64)
OBMCSADMxN(64, 128)
#endif // CONFIG_EXT_PARTITION
OBMCSADMxN(64, 64)
OBMCSADMxN(64, 32)
OBMCSADMxN(32, 64)
OBMCSADMxN(32, 32)
OBMCSADMxN(32, 16)
OBMCSADMxN(16, 32)
OBMCSADMxN(16, 16)
OBMCSADMxN(16, 8)
OBMCSADMxN(8, 16)
OBMCSADMxN(8, 8)
OBMCSADMxN(8, 4)
OBMCSADMxN(4, 8)
OBMCSADMxN(4, 4)
/* clang-format on */
#if CONFIG_AOM_HIGHBITDEPTH
static INLINE
unsigned int highbd_obmc_sad(const uint8_t *pre8, int pre_stride,
const int32_t *wsrc, const int32_t *mask,
int width, int height) {
int y, x;
unsigned int sad = 0;
const uint16_t *pre = CONVERT_TO_SHORTPTR(pre8);
for (y = 0; y < height; y++) {
for (x = 0; x < width; x++)
sad += ROUND_POWER_OF_TWO(abs(wsrc[x] - pre[x] * mask[x]), 12);
pre += pre_stride;
wsrc += width;
mask += width;
}
return sad;
}
#define HIGHBD_OBMCSADMXN(m, n) \
unsigned int aom_highbd_obmc_sad##m##x##n##_c( \
const uint8_t *ref, int ref_stride, const int32_t *wsrc, \
const int32_t *mask) { \
return highbd_obmc_sad(ref, ref_stride, wsrc, mask, m, n); \
}
/* clang-format off */
#if CONFIG_EXT_PARTITION
HIGHBD_OBMCSADMXN(128, 128)
HIGHBD_OBMCSADMXN(128, 64)
HIGHBD_OBMCSADMXN(64, 128)
#endif // CONFIG_EXT_PARTITION
HIGHBD_OBMCSADMXN(64, 64)
HIGHBD_OBMCSADMXN(64, 32)
HIGHBD_OBMCSADMXN(32, 64)
HIGHBD_OBMCSADMXN(32, 32)
HIGHBD_OBMCSADMXN(32, 16)
HIGHBD_OBMCSADMXN(16, 32)
HIGHBD_OBMCSADMXN(16, 16)
HIGHBD_OBMCSADMXN(16, 8)
HIGHBD_OBMCSADMXN(8, 16)
HIGHBD_OBMCSADMXN(8, 8)
HIGHBD_OBMCSADMXN(8, 4)
HIGHBD_OBMCSADMXN(4, 8)
HIGHBD_OBMCSADMXN(4, 4)
/* clang-format on */
#endif // CONFIG_AOM_HIGHBITDEPTH
#endif // CONFIG_AV1 && CONFIG_MOTION_VAR
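
The fixed-point arithmetic in obmc_sad() above can be checked with a small standalone sketch. Everything prefixed `sketch_` or suffixed `_sketch`/`_demo` is hypothetical and exists only for this illustration. The rounding macro is an assumed round-to-nearest definition, since ROUND_POWER_OF_TWO itself is not defined in this file. The target and the weights are both scaled by 4096, so each per-pixel difference is shifted right by 12 bits to bring it back to pixel units.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed definition (not shown in this file): round-to-nearest shift. */
#define SKETCH_ROUND_POWER_OF_TWO(value, n) \
  (((value) + (1 << ((n)-1))) >> (n))

/* Standalone restatement of obmc_sad(): pre is the predictor, wsrc the
 * weighted target (scaled by 4096), mask the 2D weights (scaled by 4096).
 * wsrc and mask are packed, so their stride equals the block width. */
static unsigned int obmc_sad_sketch(const uint8_t *pre, int pre_stride,
                                    const int32_t *wsrc, const int32_t *mask,
                                    int width, int height) {
  unsigned int sad = 0;
  int y, x;
  assert(width > 0 && height > 0);
  for (y = 0; y < height; y++) {
    for (x = 0; x < width; x++)
      sad += SKETCH_ROUND_POWER_OF_TWO(abs(wsrc[x] - pre[x] * mask[x]), 12);
    pre += pre_stride;
    wsrc += width;
    mask += width;
  }
  return sad;
}

/* 4x4 block: predictor 100 everywhere, target 103, full weight 4096.
 * Each pixel contributes |103*4096 - 100*4096| >> 12 (rounded) = 3. */
static unsigned int obmc_sad_demo(void) {
  uint8_t pre[16];
  int32_t wsrc[16], mask[16];
  int i;
  for (i = 0; i < 16; i++) {
    pre[i] = 100;
    mask[i] = 4096;
    wsrc[i] = 103 * 4096;
  }
  return obmc_sad_sketch(pre, 4, wsrc, mask, 4, 4);
}
```

With these inputs obmc_sad_demo() evaluates to 48, that is, 16 pixels with a rounded difference of 3 each.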

@@ -1,259 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef _V128_INTRINSICS_H
#define _V128_INTRINSICS_H
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "./v128_intrinsics_c.h"
#include "./v64_intrinsics.h"
/* Fallback to plain, unoptimised C. */
typedef c_v128 v128;
SIMD_INLINE uint32_t v128_low_u32(v128 a) { return c_v128_low_u32(a); }
SIMD_INLINE v64 v128_low_v64(v128 a) { return c_v128_low_v64(a); }
SIMD_INLINE v64 v128_high_v64(v128 a) { return c_v128_high_v64(a); }
SIMD_INLINE v128 v128_from_64(uint64_t hi, uint64_t lo) {
return c_v128_from_64(hi, lo);
}
SIMD_INLINE v128 v128_from_v64(v64 hi, v64 lo) {
return c_v128_from_v64(hi, lo);
}
SIMD_INLINE v128 v128_from_32(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
return c_v128_from_32(a, b, c, d);
}
SIMD_INLINE v128 v128_load_unaligned(const void *p) {
return c_v128_load_unaligned(p);
}
SIMD_INLINE v128 v128_load_aligned(const void *p) {
return c_v128_load_aligned(p);
}
SIMD_INLINE void v128_store_unaligned(void *p, v128 a) {
c_v128_store_unaligned(p, a);
}
SIMD_INLINE void v128_store_aligned(void *p, v128 a) {
c_v128_store_aligned(p, a);
}
SIMD_INLINE v128 v128_align(v128 a, v128 b, const unsigned int c) {
return c_v128_align(a, b, c);
}
SIMD_INLINE v128 v128_zero() { return c_v128_zero(); }
SIMD_INLINE v128 v128_dup_8(uint8_t x) { return c_v128_dup_8(x); }
SIMD_INLINE v128 v128_dup_16(uint16_t x) { return c_v128_dup_16(x); }
SIMD_INLINE v128 v128_dup_32(uint32_t x) { return c_v128_dup_32(x); }
typedef uint32_t sad128_internal;
SIMD_INLINE sad128_internal v128_sad_u8_init() { return c_v128_sad_u8_init(); }
SIMD_INLINE sad128_internal v128_sad_u8(sad128_internal s, v128 a, v128 b) {
return c_v128_sad_u8(s, a, b);
}
SIMD_INLINE uint32_t v128_sad_u8_sum(sad128_internal s) {
return c_v128_sad_u8_sum(s);
}
typedef uint32_t ssd128_internal;
SIMD_INLINE ssd128_internal v128_ssd_u8_init() { return c_v128_ssd_u8_init(); }
SIMD_INLINE ssd128_internal v128_ssd_u8(ssd128_internal s, v128 a, v128 b) {
return c_v128_ssd_u8(s, a, b);
}
SIMD_INLINE uint32_t v128_ssd_u8_sum(ssd128_internal s) {
return c_v128_ssd_u8_sum(s);
}
SIMD_INLINE int64_t v128_dotp_s16(v128 a, v128 b) {
return c_v128_dotp_s16(a, b);
}
SIMD_INLINE uint64_t v128_hadd_u8(v128 a) { return c_v128_hadd_u8(a); }
SIMD_INLINE v128 v128_or(v128 a, v128 b) { return c_v128_or(a, b); }
SIMD_INLINE v128 v128_xor(v128 a, v128 b) { return c_v128_xor(a, b); }
SIMD_INLINE v128 v128_and(v128 a, v128 b) { return c_v128_and(a, b); }
SIMD_INLINE v128 v128_andn(v128 a, v128 b) { return c_v128_andn(a, b); }
SIMD_INLINE v128 v128_add_8(v128 a, v128 b) { return c_v128_add_8(a, b); }
SIMD_INLINE v128 v128_add_16(v128 a, v128 b) { return c_v128_add_16(a, b); }
SIMD_INLINE v128 v128_sadd_s16(v128 a, v128 b) { return c_v128_sadd_s16(a, b); }
SIMD_INLINE v128 v128_add_32(v128 a, v128 b) { return c_v128_add_32(a, b); }
SIMD_INLINE v128 v128_padd_s16(v128 a) { return c_v128_padd_s16(a); }
SIMD_INLINE v128 v128_sub_8(v128 a, v128 b) { return c_v128_sub_8(a, b); }
SIMD_INLINE v128 v128_ssub_u8(v128 a, v128 b) { return c_v128_ssub_u8(a, b); }
SIMD_INLINE v128 v128_ssub_s8(v128 a, v128 b) { return c_v128_ssub_s8(a, b); }
SIMD_INLINE v128 v128_sub_16(v128 a, v128 b) { return c_v128_sub_16(a, b); }
SIMD_INLINE v128 v128_ssub_s16(v128 a, v128 b) { return c_v128_ssub_s16(a, b); }
SIMD_INLINE v128 v128_sub_32(v128 a, v128 b) { return c_v128_sub_32(a, b); }
SIMD_INLINE v128 v128_abs_s16(v128 a) { return c_v128_abs_s16(a); }
SIMD_INLINE v128 v128_mul_s16(v64 a, v64 b) { return c_v128_mul_s16(a, b); }
SIMD_INLINE v128 v128_mullo_s16(v128 a, v128 b) {
return c_v128_mullo_s16(a, b);
}
SIMD_INLINE v128 v128_mulhi_s16(v128 a, v128 b) {
return c_v128_mulhi_s16(a, b);
}
SIMD_INLINE v128 v128_mullo_s32(v128 a, v128 b) {
return c_v128_mullo_s32(a, b);
}
SIMD_INLINE v128 v128_madd_s16(v128 a, v128 b) { return c_v128_madd_s16(a, b); }
SIMD_INLINE v128 v128_madd_us8(v128 a, v128 b) { return c_v128_madd_us8(a, b); }
SIMD_INLINE v128 v128_avg_u8(v128 a, v128 b) { return c_v128_avg_u8(a, b); }
SIMD_INLINE v128 v128_rdavg_u8(v128 a, v128 b) { return c_v128_rdavg_u8(a, b); }
SIMD_INLINE v128 v128_avg_u16(v128 a, v128 b) { return c_v128_avg_u16(a, b); }
SIMD_INLINE v128 v128_min_u8(v128 a, v128 b) { return c_v128_min_u8(a, b); }
SIMD_INLINE v128 v128_max_u8(v128 a, v128 b) { return c_v128_max_u8(a, b); }
SIMD_INLINE v128 v128_min_s8(v128 a, v128 b) { return c_v128_min_s8(a, b); }
SIMD_INLINE v128 v128_max_s8(v128 a, v128 b) { return c_v128_max_s8(a, b); }
SIMD_INLINE v128 v128_min_s16(v128 a, v128 b) { return c_v128_min_s16(a, b); }
SIMD_INLINE v128 v128_max_s16(v128 a, v128 b) { return c_v128_max_s16(a, b); }
SIMD_INLINE v128 v128_ziplo_8(v128 a, v128 b) { return c_v128_ziplo_8(a, b); }
SIMD_INLINE v128 v128_ziphi_8(v128 a, v128 b) { return c_v128_ziphi_8(a, b); }
SIMD_INLINE v128 v128_ziplo_16(v128 a, v128 b) { return c_v128_ziplo_16(a, b); }
SIMD_INLINE v128 v128_ziphi_16(v128 a, v128 b) { return c_v128_ziphi_16(a, b); }
SIMD_INLINE v128 v128_ziplo_32(v128 a, v128 b) { return c_v128_ziplo_32(a, b); }
SIMD_INLINE v128 v128_ziphi_32(v128 a, v128 b) { return c_v128_ziphi_32(a, b); }
SIMD_INLINE v128 v128_ziplo_64(v128 a, v128 b) { return c_v128_ziplo_64(a, b); }
SIMD_INLINE v128 v128_ziphi_64(v128 a, v128 b) { return c_v128_ziphi_64(a, b); }
SIMD_INLINE v128 v128_zip_8(v64 a, v64 b) { return c_v128_zip_8(a, b); }
SIMD_INLINE v128 v128_zip_16(v64 a, v64 b) { return c_v128_zip_16(a, b); }
SIMD_INLINE v128 v128_zip_32(v64 a, v64 b) { return c_v128_zip_32(a, b); }
SIMD_INLINE v128 v128_unziplo_8(v128 a, v128 b) {
return c_v128_unziplo_8(a, b);
}
SIMD_INLINE v128 v128_unziphi_8(v128 a, v128 b) {
return c_v128_unziphi_8(a, b);
}
SIMD_INLINE v128 v128_unziplo_16(v128 a, v128 b) {
return c_v128_unziplo_16(a, b);
}
SIMD_INLINE v128 v128_unziphi_16(v128 a, v128 b) {
return c_v128_unziphi_16(a, b);
}
SIMD_INLINE v128 v128_unziplo_32(v128 a, v128 b) {
return c_v128_unziplo_32(a, b);
}
SIMD_INLINE v128 v128_unziphi_32(v128 a, v128 b) {
return c_v128_unziphi_32(a, b);
}
SIMD_INLINE v128 v128_unpack_u8_s16(v64 a) { return c_v128_unpack_u8_s16(a); }
SIMD_INLINE v128 v128_unpacklo_u8_s16(v128 a) {
return c_v128_unpacklo_u8_s16(a);
}
SIMD_INLINE v128 v128_unpackhi_u8_s16(v128 a) {
return c_v128_unpackhi_u8_s16(a);
}
SIMD_INLINE v128 v128_pack_s32_s16(v128 a, v128 b) {
return c_v128_pack_s32_s16(a, b);
}
SIMD_INLINE v128 v128_pack_s16_u8(v128 a, v128 b) {
return c_v128_pack_s16_u8(a, b);
}
SIMD_INLINE v128 v128_pack_s16_s8(v128 a, v128 b) {
return c_v128_pack_s16_s8(a, b);
}
SIMD_INLINE v128 v128_unpack_u16_s32(v64 a) { return c_v128_unpack_u16_s32(a); }
SIMD_INLINE v128 v128_unpack_s16_s32(v64 a) { return c_v128_unpack_s16_s32(a); }
SIMD_INLINE v128 v128_unpacklo_u16_s32(v128 a) {
return c_v128_unpacklo_u16_s32(a);
}
SIMD_INLINE v128 v128_unpacklo_s16_s32(v128 a) {
return c_v128_unpacklo_s16_s32(a);
}
SIMD_INLINE v128 v128_unpackhi_u16_s32(v128 a) {
return c_v128_unpackhi_u16_s32(a);
}
SIMD_INLINE v128 v128_unpackhi_s16_s32(v128 a) {
return c_v128_unpackhi_s16_s32(a);
}
SIMD_INLINE v128 v128_shuffle_8(v128 a, v128 pattern) {
return c_v128_shuffle_8(a, pattern);
}
SIMD_INLINE v128 v128_cmpgt_s8(v128 a, v128 b) { return c_v128_cmpgt_s8(a, b); }
SIMD_INLINE v128 v128_cmplt_s8(v128 a, v128 b) { return c_v128_cmplt_s8(a, b); }
SIMD_INLINE v128 v128_cmpeq_8(v128 a, v128 b) { return c_v128_cmpeq_8(a, b); }
SIMD_INLINE v128 v128_cmpgt_s16(v128 a, v128 b) {
return c_v128_cmpgt_s16(a, b);
}
SIMD_INLINE v128 v128_cmplt_s16(v128 a, v128 b) {
return c_v128_cmplt_s16(a, b);
}
SIMD_INLINE v128 v128_cmpeq_16(v128 a, v128 b) { return c_v128_cmpeq_16(a, b); }
SIMD_INLINE v128 v128_shl_8(v128 a, unsigned int c) {
return c_v128_shl_8(a, c);
}
SIMD_INLINE v128 v128_shr_u8(v128 a, unsigned int c) {
return c_v128_shr_u8(a, c);
}
SIMD_INLINE v128 v128_shr_s8(v128 a, unsigned int c) {
return c_v128_shr_s8(a, c);
}
SIMD_INLINE v128 v128_shl_16(v128 a, unsigned int c) {
return c_v128_shl_16(a, c);
}
SIMD_INLINE v128 v128_shr_u16(v128 a, unsigned int c) {
return c_v128_shr_u16(a, c);
}
SIMD_INLINE v128 v128_shr_s16(v128 a, unsigned int c) {
return c_v128_shr_s16(a, c);
}
SIMD_INLINE v128 v128_shl_32(v128 a, unsigned int c) {
return c_v128_shl_32(a, c);
}
SIMD_INLINE v128 v128_shr_u32(v128 a, unsigned int c) {
return c_v128_shr_u32(a, c);
}
SIMD_INLINE v128 v128_shr_s32(v128 a, unsigned int c) {
return c_v128_shr_s32(a, c);
}
SIMD_INLINE v128 v128_shr_n_byte(v128 a, const unsigned int n) {
return c_v128_shr_n_byte(a, n);
}
SIMD_INLINE v128 v128_shl_n_byte(v128 a, const unsigned int n) {
return c_v128_shl_n_byte(a, n);
}
SIMD_INLINE v128 v128_shl_n_8(v128 a, const unsigned int n) {
return c_v128_shl_n_8(a, n);
}
SIMD_INLINE v128 v128_shl_n_16(v128 a, const unsigned int n) {
return c_v128_shl_n_16(a, n);
}
SIMD_INLINE v128 v128_shl_n_32(v128 a, const unsigned int n) {
return c_v128_shl_n_32(a, n);
}
SIMD_INLINE v128 v128_shr_n_u8(v128 a, const unsigned int n) {
return c_v128_shr_n_u8(a, n);
}
SIMD_INLINE v128 v128_shr_n_u16(v128 a, const unsigned int n) {
return c_v128_shr_n_u16(a, n);
}
SIMD_INLINE v128 v128_shr_n_u32(v128 a, const unsigned int n) {
return c_v128_shr_n_u32(a, n);
}
SIMD_INLINE v128 v128_shr_n_s8(v128 a, const unsigned int n) {
return c_v128_shr_n_s8(a, n);
}
SIMD_INLINE v128 v128_shr_n_s16(v128 a, const unsigned int n) {
return c_v128_shr_n_s16(a, n);
}
SIMD_INLINE v128 v128_shr_n_s32(v128 a, const unsigned int n) {
return c_v128_shr_n_s32(a, n);
}
#endif /* _V128_INTRINSICS_H */
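
The v128_sad_u8_init() / v128_sad_u8() / v128_sad_u8_sum() wrappers above follow an init, accumulate, finalise pattern. The sketch below imitates that pattern in plain C so it can be compiled on its own. All `sketch_` names are hypothetical stand-ins, and the per-byte loop is only an assumption about what the C fallback computes, not the actual c_v128_sad_u8() implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit vector: 16 unsigned bytes. */
typedef struct {
  uint8_t u8[16];
} sketch_v128;

/* In the C fallback header the accumulator is a plain uint32_t. */
typedef uint32_t sketch_sad128;

static sketch_v128 sketch_load(const void *p) {
  sketch_v128 v;
  memcpy(v.u8, p, sizeof(v.u8));
  return v;
}

static sketch_sad128 sketch_sad_init(void) { return 0; }

/* Adds the 16 absolute byte differences of a and b to the accumulator. */
static sketch_sad128 sketch_sad_u8(sketch_sad128 s, sketch_v128 a,
                                   sketch_v128 b) {
  int i;
  for (i = 0; i < 16; i++) s += (uint32_t)abs(a.u8[i] - b.u8[i]);
  return s;
}

/* Finalise: for a scalar accumulator this is the identity. */
static uint32_t sketch_sad_sum(sketch_sad128 s) { return s; }

/* SAD of a block 16 pixels wide and `rows` pixels high. */
static uint32_t sad16xh_sketch(const uint8_t *src, int src_stride,
                               const uint8_t *ref, int ref_stride, int rows) {
  sketch_sad128 s = sketch_sad_init();
  int y;
  assert(rows > 0);
  for (y = 0; y < rows; y++) {
    s = sketch_sad_u8(s, sketch_load(src), sketch_load(ref));
    src += src_stride;
    ref += ref_stride;
  }
  return sketch_sad_sum(s);
}

/* Two rows of 16 pixels whose values differ by 2 everywhere. */
static uint32_t sad16x2_demo(void) {
  uint8_t src[32], ref[32];
  int i;
  for (i = 0; i < 32; i++) {
    src[i] = (uint8_t)i;
    ref[i] = (uint8_t)(i + 2);
  }
  return sad16xh_sketch(src, 16, ref, 16, 2);
}
```

Here sad16x2_demo() evaluates to 64 (32 pixels, each differing by 2). Keeping the accumulator type opaque lets a SIMD backend hold partial sums in vector registers and reduce them only once, in the final sum call.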

@@ -1,655 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef _V128_INTRINSICS_H
#define _V128_INTRINSICS_H
#include <arm_neon.h>
#include "./v64_intrinsics_arm.h"
typedef int64x2_t v128;
SIMD_INLINE uint32_t v128_low_u32(v128 a) {
return v64_low_u32(vget_low_s64(a));
}
SIMD_INLINE v64 v128_low_v64(v128 a) { return vget_low_s64(a); }
SIMD_INLINE v64 v128_high_v64(v128 a) { return vget_high_s64(a); }
SIMD_INLINE v128 v128_from_v64(v64 a, v64 b) { return vcombine_s64(b, a); }
SIMD_INLINE v128 v128_from_64(uint64_t a, uint64_t b) {
return vcombine_s64((uint64x1_t)b, (uint64x1_t)a);
}
SIMD_INLINE v128 v128_from_32(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
return vcombine_s64(v64_from_32(c, d), v64_from_32(a, b));
}
SIMD_INLINE v128 v128_load_aligned(const void *p) {
return vreinterpretq_s64_u8(vld1q_u8((const uint8_t *)p));
}
SIMD_INLINE v128 v128_load_unaligned(const void *p) {
return v128_load_aligned(p);
}
SIMD_INLINE void v128_store_aligned(void *p, v128 r) {
vst1q_u8((uint8_t *)p, vreinterpretq_u8_s64(r));
}
SIMD_INLINE void v128_store_unaligned(void *p, v128 r) {
vst1q_u8((uint8_t *)p, vreinterpretq_u8_s64(r));
}
SIMD_INLINE v128 v128_align(v128 a, v128 b, const unsigned int c) {
// The following functions require an immediate.
// Some compilers will check this during optimisation, others won't.
#if __OPTIMIZE__ && !__clang__
return c ? vreinterpretq_s64_s8(
vextq_s8(vreinterpretq_s8_s64(b), vreinterpretq_s8_s64(a), c))
: b;
#else
return c < 8 ? v128_from_v64(v64_align(v128_low_v64(a), v128_high_v64(b), c),
v64_align(v128_high_v64(b), v128_low_v64(b), c))
: v128_from_v64(
v64_align(v128_high_v64(a), v128_low_v64(a), c - 8),
v64_align(v128_low_v64(a), v128_high_v64(b), c - 8));
#endif
}
SIMD_INLINE v128 v128_zero() { return vreinterpretq_s64_u8(vdupq_n_u8(0)); }
SIMD_INLINE v128 v128_ones() { return vreinterpretq_s64_u8(vdupq_n_u8(-1)); }
SIMD_INLINE v128 v128_dup_8(uint8_t x) {
return vreinterpretq_s64_u8(vdupq_n_u8(x));
}
SIMD_INLINE v128 v128_dup_16(uint16_t x) {
return vreinterpretq_s64_u16(vdupq_n_u16(x));
}
SIMD_INLINE v128 v128_dup_32(uint32_t x) {
return vreinterpretq_s64_u32(vdupq_n_u32(x));
}
SIMD_INLINE int64_t v128_dotp_s16(v128 a, v128 b) {
return v64_dotp_s16(vget_high_s64(a), vget_high_s64(b)) +
v64_dotp_s16(vget_low_s64(a), vget_low_s64(b));
}
SIMD_INLINE uint64_t v128_hadd_u8(v128 x) {
uint64x2_t t = vpaddlq_u32(vpaddlq_u16(vpaddlq_u8(vreinterpretq_u8_s64(x))));
return vget_lane_s32(
vreinterpret_s32_u64(vadd_u64(vget_high_u64(t), vget_low_u64(t))), 0);
}
SIMD_INLINE v128 v128_padd_s16(v128 a) {
return vreinterpretq_s64_s32(vpaddlq_s16(vreinterpretq_s16_s64(a)));
}
typedef struct { sad64_internal hi, lo; } sad128_internal;
SIMD_INLINE sad128_internal v128_sad_u8_init() {
sad128_internal s;
s.hi = s.lo = vdupq_n_u16(0);
return s;
}
/* Implementation-dependent return value. Result must be finalised with
v128_sad_u8_sum().
The result for more than 32 v128_sad_u8() calls is undefined. */
SIMD_INLINE sad128_internal v128_sad_u8(sad128_internal s, v128 a, v128 b) {
sad128_internal r;
r.hi = v64_sad_u8(s.hi, vget_high_s64(a), vget_high_s64(b));
r.lo = v64_sad_u8(s.lo, vget_low_s64(a), vget_low_s64(b));
return r;
}
SIMD_INLINE uint32_t v128_sad_u8_sum(sad128_internal s) {
return (uint32_t)(v64_sad_u8_sum(s.hi) + v64_sad_u8_sum(s.lo));
}
typedef struct { ssd64_internal hi, lo; } ssd128_internal;
SIMD_INLINE ssd128_internal v128_ssd_u8_init() {
ssd128_internal s;
s.hi = s.lo = (ssd64_internal)(uint64_t)0;
return s;
}
/* Implementation-dependent return value. Result must be finalised with
* v128_ssd_u8_sum(). */
SIMD_INLINE ssd128_internal v128_ssd_u8(ssd128_internal s, v128 a, v128 b) {
ssd128_internal r;
r.hi = v64_ssd_u8(s.hi, vget_high_s64(a), vget_high_s64(b));
r.lo = v64_ssd_u8(s.lo, vget_low_s64(a), vget_low_s64(b));
return r;
}
SIMD_INLINE uint32_t v128_ssd_u8_sum(ssd128_internal s) {
return (uint32_t)(v64_ssd_u8_sum(s.hi) + v64_ssd_u8_sum(s.lo));
}
SIMD_INLINE v128 v128_or(v128 x, v128 y) { return vorrq_s64(x, y); }
SIMD_INLINE v128 v128_xor(v128 x, v128 y) { return veorq_s64(x, y); }
SIMD_INLINE v128 v128_and(v128 x, v128 y) { return vandq_s64(x, y); }
SIMD_INLINE v128 v128_andn(v128 x, v128 y) { return vbicq_s64(x, y); }
SIMD_INLINE v128 v128_add_8(v128 x, v128 y) {
return vreinterpretq_s64_u8(
vaddq_u8(vreinterpretq_u8_s64(x), vreinterpretq_u8_s64(y)));
}
SIMD_INLINE v128 v128_add_16(v128 x, v128 y) {
return vreinterpretq_s64_s16(
vaddq_s16(vreinterpretq_s16_s64(x), vreinterpretq_s16_s64(y)));
}
SIMD_INLINE v128 v128_sadd_s16(v128 x, v128 y) {
return vreinterpretq_s64_s16(
vqaddq_s16(vreinterpretq_s16_s64(x), vreinterpretq_s16_s64(y)));
}
SIMD_INLINE v128 v128_add_32(v128 x, v128 y) {
return vreinterpretq_s64_u32(
vaddq_u32(vreinterpretq_u32_s64(x), vreinterpretq_u32_s64(y)));
}
SIMD_INLINE v128 v128_sub_8(v128 x, v128 y) {
return vreinterpretq_s64_u8(
vsubq_u8(vreinterpretq_u8_s64(x), vreinterpretq_u8_s64(y)));
}
SIMD_INLINE v128 v128_sub_u8(v128 x, v128 y) {
return vreinterpretq_s64_u8(
vqsubq_u8(vreinterpretq_u8_s64(x), vreinterpretq_u8_s64(y)));
}
SIMD_INLINE v128 v128_sub_16(v128 x, v128 y) {
return vreinterpretq_s64_s16(
vsubq_s16(vreinterpretq_s16_s64(x), vreinterpretq_s16_s64(y)));
}
SIMD_INLINE v128 v128_ssub_s16(v128 x, v128 y) {
return vreinterpretq_s64_s16(
vqsubq_s16(vreinterpretq_s16_s64(x), vreinterpretq_s16_s64(y)));
}
SIMD_INLINE v128 v128_ssub_u8(v128 x, v128 y) {
return vreinterpretq_s64_u8(
vqsubq_u8(vreinterpretq_u8_s64(x), vreinterpretq_u8_s64(y)));
}
SIMD_INLINE v128 v128_ssub_s8(v128 x, v128 y) {
return vreinterpretq_s64_s8(
vqsubq_s8(vreinterpretq_s8_s64(x), vreinterpretq_s8_s64(y)));
}
SIMD_INLINE v128 v128_sub_32(v128 x, v128 y) {
return vreinterpretq_s64_s32(
vsubq_s32(vreinterpretq_s32_s64(x), vreinterpretq_s32_s64(y)));
}
SIMD_INLINE v128 v128_abs_s16(v128 x) {
return vreinterpretq_s64_s16(vabsq_s16(vreinterpretq_s16_s64(x)));
}
SIMD_INLINE v128 v128_mul_s16(v64 a, v64 b) {
return vreinterpretq_s64_s32(
vmull_s16(vreinterpret_s16_s64(a), vreinterpret_s16_s64(b)));
}
SIMD_INLINE v128 v128_mullo_s16(v128 a, v128 b) {
return vreinterpretq_s64_s16(
vmulq_s16(vreinterpretq_s16_s64(a), vreinterpretq_s16_s64(b)));
}
SIMD_INLINE v128 v128_mulhi_s16(v128 a, v128 b) {
return v128_from_v64(v64_mulhi_s16(vget_high_s64(a), vget_high_s64(b)),
v64_mulhi_s16(vget_low_s64(a), vget_low_s64(b)));
}
SIMD_INLINE v128 v128_mullo_s32(v128 a, v128 b) {
return vreinterpretq_s64_s32(
vmulq_s32(vreinterpretq_s32_s64(a), vreinterpretq_s32_s64(b)));
}
SIMD_INLINE v128 v128_madd_s16(v128 a, v128 b) {
return v128_from_v64(v64_madd_s16(vget_high_s64(a), vget_high_s64(b)),
v64_madd_s16(vget_low_s64(a), vget_low_s64(b)));
}
SIMD_INLINE v128 v128_madd_us8(v128 a, v128 b) {
return v128_from_v64(v64_madd_us8(vget_high_s64(a), vget_high_s64(b)),
v64_madd_us8(vget_low_s64(a), vget_low_s64(b)));
}
SIMD_INLINE v128 v128_avg_u8(v128 x, v128 y) {
return vreinterpretq_s64_u8(
vrhaddq_u8(vreinterpretq_u8_s64(x), vreinterpretq_u8_s64(y)));
}
SIMD_INLINE v128 v128_rdavg_u8(v128 x, v128 y) {
return vreinterpretq_s64_u8(
vhaddq_u8(vreinterpretq_u8_s64(x), vreinterpretq_u8_s64(y)));
}
SIMD_INLINE v128 v128_avg_u16(v128 x, v128 y) {
return vreinterpretq_s64_u16(
vrhaddq_u16(vreinterpretq_u16_s64(x), vreinterpretq_u16_s64(y)));
}
SIMD_INLINE v128 v128_min_u8(v128 x, v128 y) {
return vreinterpretq_s64_u8(
vminq_u8(vreinterpretq_u8_s64(x), vreinterpretq_u8_s64(y)));
}
SIMD_INLINE v128 v128_max_u8(v128 x, v128 y) {
return vreinterpretq_s64_u8(
vmaxq_u8(vreinterpretq_u8_s64(x), vreinterpretq_u8_s64(y)));
}
SIMD_INLINE v128 v128_min_s8(v128 x, v128 y) {
return vreinterpretq_s64_s8(
vminq_s8(vreinterpretq_s8_s64(x), vreinterpretq_s8_s64(y)));
}
SIMD_INLINE v128 v128_max_s8(v128 x, v128 y) {
return vreinterpretq_s64_s8(
vmaxq_s8(vreinterpretq_s8_s64(x), vreinterpretq_s8_s64(y)));
}
SIMD_INLINE v128 v128_min_s16(v128 x, v128 y) {
return vreinterpretq_s64_s16(
vminq_s16(vreinterpretq_s16_s64(x), vreinterpretq_s16_s64(y)));
}
SIMD_INLINE v128 v128_max_s16(v128 x, v128 y) {
return vreinterpretq_s64_s16(
vmaxq_s16(vreinterpretq_s16_s64(x), vreinterpretq_s16_s64(y)));
}
SIMD_INLINE v128 v128_ziplo_8(v128 x, v128 y) {
uint8x16x2_t r = vzipq_u8(vreinterpretq_u8_s64(y), vreinterpretq_u8_s64(x));
return vreinterpretq_s64_u8(r.val[0]);
}
SIMD_INLINE v128 v128_ziphi_8(v128 x, v128 y) {
uint8x16x2_t r = vzipq_u8(vreinterpretq_u8_s64(y), vreinterpretq_u8_s64(x));
return vreinterpretq_s64_u8(r.val[1]);
}
SIMD_INLINE v128 v128_zip_8(v64 x, v64 y) {
uint8x8x2_t r = vzip_u8(vreinterpret_u8_s64(y), vreinterpret_u8_s64(x));
return vreinterpretq_s64_u8(vcombine_u8(r.val[0], r.val[1]));
}
SIMD_INLINE v128 v128_ziplo_16(v128 x, v128 y) {
int16x8x2_t r = vzipq_s16(vreinterpretq_s16_s64(y), vreinterpretq_s16_s64(x));
return vreinterpretq_s64_s16(r.val[0]);
}
SIMD_INLINE v128 v128_ziphi_16(v128 x, v128 y) {
int16x8x2_t r = vzipq_s16(vreinterpretq_s16_s64(y), vreinterpretq_s16_s64(x));
return vreinterpretq_s64_s16(r.val[1]);
}
SIMD_INLINE v128 v128_zip_16(v64 x, v64 y) {
uint16x4x2_t r = vzip_u16(vreinterpret_u16_s64(y), vreinterpret_u16_s64(x));
return vreinterpretq_s64_u16(vcombine_u16(r.val[0], r.val[1]));
}
SIMD_INLINE v128 v128_ziplo_32(v128 x, v128 y) {
int32x4x2_t r = vzipq_s32(vreinterpretq_s32_s64(y), vreinterpretq_s32_s64(x));
return vreinterpretq_s64_s32(r.val[0]);
}
SIMD_INLINE v128 v128_ziphi_32(v128 x, v128 y) {
int32x4x2_t r = vzipq_s32(vreinterpretq_s32_s64(y), vreinterpretq_s32_s64(x));
return vreinterpretq_s64_s32(r.val[1]);
}
SIMD_INLINE v128 v128_zip_32(v64 x, v64 y) {
uint32x2x2_t r = vzip_u32(vreinterpret_u32_s64(y), vreinterpret_u32_s64(x));
return vreinterpretq_s64_u32(vcombine_u32(r.val[0], r.val[1]));
}
SIMD_INLINE v128 v128_ziplo_64(v128 a, v128 b) {
return v128_from_v64(vget_low_u64((uint64x2_t)a),
vget_low_u64((uint64x2_t)b));
}
SIMD_INLINE v128 v128_ziphi_64(v128 a, v128 b) {
return v128_from_v64(vget_high_u64((uint64x2_t)a),
vget_high_u64((uint64x2_t)b));
}
SIMD_INLINE v128 v128_unziplo_8(v128 x, v128 y) {
uint8x16x2_t r = vuzpq_u8(vreinterpretq_u8_s64(y), vreinterpretq_u8_s64(x));
return vreinterpretq_s64_u8(r.val[0]);
}
SIMD_INLINE v128 v128_unziphi_8(v128 x, v128 y) {
uint8x16x2_t r = vuzpq_u8(vreinterpretq_u8_s64(y), vreinterpretq_u8_s64(x));
return vreinterpretq_s64_u8(r.val[1]);
}
SIMD_INLINE v128 v128_unziplo_16(v128 x, v128 y) {
uint16x8x2_t r =
vuzpq_u16(vreinterpretq_u16_s64(y), vreinterpretq_u16_s64(x));
return vreinterpretq_s64_u16(r.val[0]);
}
SIMD_INLINE v128 v128_unziphi_16(v128 x, v128 y) {
uint16x8x2_t r =
vuzpq_u16(vreinterpretq_u16_s64(y), vreinterpretq_u16_s64(x));
return vreinterpretq_s64_u16(r.val[1]);
}
SIMD_INLINE v128 v128_unziplo_32(v128 x, v128 y) {
uint32x4x2_t r =
vuzpq_u32(vreinterpretq_u32_s64(y), vreinterpretq_u32_s64(x));
return vreinterpretq_s64_u32(r.val[0]);
}
SIMD_INLINE v128 v128_unziphi_32(v128 x, v128 y) {
uint32x4x2_t r =
vuzpq_u32(vreinterpretq_u32_s64(y), vreinterpretq_u32_s64(x));
return vreinterpretq_s64_u32(r.val[1]);
}
SIMD_INLINE v128 v128_unpack_u8_s16(v64 a) {
return vreinterpretq_s64_u16(vmovl_u8(vreinterpret_u8_s64(a)));
}
SIMD_INLINE v128 v128_unpacklo_u8_s16(v128 a) {
return vreinterpretq_s64_u16(vmovl_u8(vreinterpret_u8_s64(vget_low_s64(a))));
}
SIMD_INLINE v128 v128_unpackhi_u8_s16(v128 a) {
return vreinterpretq_s64_u16(vmovl_u8(vreinterpret_u8_s64(vget_high_s64(a))));
}
SIMD_INLINE v128 v128_pack_s32_s16(v128 a, v128 b) {
return v128_from_v64(
vreinterpret_s64_s16(vqmovn_s32(vreinterpretq_s32_s64(a))),
vreinterpret_s64_s16(vqmovn_s32(vreinterpretq_s32_s64(b))));
}
SIMD_INLINE v128 v128_pack_s16_u8(v128 a, v128 b) {
return v128_from_v64(
vreinterpret_s64_u8(vqmovun_s16(vreinterpretq_s16_s64(a))),
vreinterpret_s64_u8(vqmovun_s16(vreinterpretq_s16_s64(b))));
}
SIMD_INLINE v128 v128_pack_s16_s8(v128 a, v128 b) {
return v128_from_v64(
vreinterpret_s64_s8(vqmovn_s16(vreinterpretq_s16_s64(a))),
vreinterpret_s64_s8(vqmovn_s16(vreinterpretq_s16_s64(b))));
}
SIMD_INLINE v128 v128_unpack_u16_s32(v64 a) {
return vreinterpretq_s64_u32(vmovl_u16(vreinterpret_u16_s64(a)));
}
SIMD_INLINE v128 v128_unpack_s16_s32(v64 a) {
return vreinterpretq_s64_s32(vmovl_s16(vreinterpret_s16_s64(a)));
}
SIMD_INLINE v128 v128_unpacklo_u16_s32(v128 a) {
return vreinterpretq_s64_u32(
vmovl_u16(vreinterpret_u16_s64(vget_low_s64(a))));
}
SIMD_INLINE v128 v128_unpacklo_s16_s32(v128 a) {
return vreinterpretq_s64_s32(
vmovl_s16(vreinterpret_s16_s64(vget_low_s64(a))));
}
SIMD_INLINE v128 v128_unpackhi_u16_s32(v128 a) {
return vreinterpretq_s64_u32(
vmovl_u16(vreinterpret_u16_s64(vget_high_s64(a))));
}
SIMD_INLINE v128 v128_unpackhi_s16_s32(v128 a) {
return vreinterpretq_s64_s32(
vmovl_s16(vreinterpret_s16_s64(vget_high_s64(a))));
}
SIMD_INLINE v128 v128_shuffle_8(v128 x, v128 pattern) {
return v128_from_64(
(uint64_t)vreinterpret_s64_u8(
vtbl2_u8((uint8x8x2_t){ { vget_low_u8(vreinterpretq_u8_s64(x)),
vget_high_u8(vreinterpretq_u8_s64(x)) } },
vreinterpret_u8_s64(vget_high_s64(pattern)))),
(uint64_t)vreinterpret_s64_u8(
vtbl2_u8((uint8x8x2_t){ { vget_low_u8(vreinterpretq_u8_s64(x)),
vget_high_u8(vreinterpretq_u8_s64(x)) } },
vreinterpret_u8_s64(vget_low_s64(pattern)))));
}
SIMD_INLINE v128 v128_cmpgt_s8(v128 x, v128 y) {
return vreinterpretq_s64_u8(
vcgtq_s8(vreinterpretq_s8_s64(x), vreinterpretq_s8_s64(y)));
}
SIMD_INLINE v128 v128_cmplt_s8(v128 x, v128 y) {
return vreinterpretq_s64_u8(
vcltq_s8(vreinterpretq_s8_s64(x), vreinterpretq_s8_s64(y)));
}
SIMD_INLINE v128 v128_cmpeq_8(v128 x, v128 y) {
return vreinterpretq_s64_u8(
vceqq_u8(vreinterpretq_u8_s64(x), vreinterpretq_u8_s64(y)));
}
SIMD_INLINE v128 v128_cmpgt_s16(v128 x, v128 y) {
return vreinterpretq_s64_u16(
vcgtq_s16(vreinterpretq_s16_s64(x), vreinterpretq_s16_s64(y)));
}
SIMD_INLINE v128 v128_cmplt_s16(v128 x, v128 y) {
return vreinterpretq_s64_u16(
vcltq_s16(vreinterpretq_s16_s64(x), vreinterpretq_s16_s64(y)));
}
SIMD_INLINE v128 v128_cmpeq_16(v128 x, v128 y) {
return vreinterpretq_s64_u16(
vceqq_s16(vreinterpretq_s16_s64(x), vreinterpretq_s16_s64(y)));
}
SIMD_INLINE v128 v128_shl_8(v128 a, unsigned int c) {
return (c > 7) ? v128_zero() : vreinterpretq_s64_u8(vshlq_u8(
vreinterpretq_u8_s64(a), vdupq_n_s8(c)));
}
SIMD_INLINE v128 v128_shr_u8(v128 a, unsigned int c) {
return (c > 7) ? v128_zero() : vreinterpretq_s64_u8(vshlq_u8(
vreinterpretq_u8_s64(a), vdupq_n_s8(-c)));
}
SIMD_INLINE v128 v128_shr_s8(v128 a, unsigned int c) {
return (c > 7) ? v128_ones() : vreinterpretq_s64_s8(vshlq_s8(
vreinterpretq_s8_s64(a), vdupq_n_s8(-c)));
}
SIMD_INLINE v128 v128_shl_16(v128 a, unsigned int c) {
return (c > 15) ? v128_zero()
: vreinterpretq_s64_u16(
vshlq_u16(vreinterpretq_u16_s64(a), vdupq_n_s16(c)));
}
SIMD_INLINE v128 v128_shr_u16(v128 a, unsigned int c) {
return (c > 15) ? v128_zero()
: vreinterpretq_s64_u16(
vshlq_u16(vreinterpretq_u16_s64(a), vdupq_n_s16(-c)));
}
SIMD_INLINE v128 v128_shr_s16(v128 a, unsigned int c) {
return (c > 15) ? v128_ones()
: vreinterpretq_s64_s16(
vshlq_s16(vreinterpretq_s16_s64(a), vdupq_n_s16(-c)));
}
SIMD_INLINE v128 v128_shl_32(v128 a, unsigned int c) {
return (c > 31) ? v128_zero()
: vreinterpretq_s64_u32(
vshlq_u32(vreinterpretq_u32_s64(a), vdupq_n_s32(c)));
}
SIMD_INLINE v128 v128_shr_u32(v128 a, unsigned int c) {
return (c > 31) ? v128_zero()
: vreinterpretq_s64_u32(
vshlq_u32(vreinterpretq_u32_s64(a), vdupq_n_s32(-c)));
}
SIMD_INLINE v128 v128_shr_s32(v128 a, unsigned int c) {
return (c > 31) ? v128_ones()
: vreinterpretq_s64_s32(
vshlq_s32(vreinterpretq_s32_s64(a), vdupq_n_s32(-c)));
}
#if __OPTIMIZE__ && !__clang__
SIMD_INLINE v128 v128_shl_n_byte(v128 a, const unsigned int n) {
return n < 8
? v128_from_64(
(uint64_t)vorr_u64(
vshl_n_u64(vreinterpret_u64_s64(vget_high_s64(a)),
n * 8),
vshr_n_u64(vreinterpret_u64_s64(vget_low_s64(a)),
(8 - n) * 8)),
(uint64_t)vshl_n_u64(vreinterpret_u64_s64(vget_low_s64(a)),
n * 8))
: (n == 8 ? v128_from_64(
(uint64_t)vreinterpret_u64_s64(vget_low_s64(a)), 0)
: v128_from_64((uint64_t)vshl_n_u64(
vreinterpret_u64_s64(vget_low_s64(a)),
(n - 8) * 8),
0));
}
SIMD_INLINE v128 v128_shr_n_byte(v128 a, const unsigned int n) {
return n < 8
? v128_from_64(
vshr_n_u64(vreinterpret_u64_s64(vget_high_s64(a)), n * 8),
vorr_u64(
vshr_n_u64(vreinterpret_u64_s64(vget_low_s64(a)), n * 8),
vshl_n_u64(vreinterpret_u64_s64(vget_high_s64(a)),
(8 - n) * 8)))
: (n == 8
? v128_from_64(0, vreinterpret_u64_s64(vget_high_s64(a)))
: v128_from_64(
0, vshr_n_u64(vreinterpret_u64_s64(vget_high_s64(a)),
(n - 8) * 8)));
}
SIMD_INLINE v128 v128_shl_n_8(v128 a, const unsigned int c) {
return vreinterpretq_s64_u8(vshlq_n_u8(vreinterpretq_u8_s64(a), c));
}
SIMD_INLINE v128 v128_shr_n_u8(v128 a, const unsigned int c) {
return vreinterpretq_s64_u8(vshrq_n_u8(vreinterpretq_u8_s64(a), c));
}
SIMD_INLINE v128 v128_shr_n_s8(v128 a, const unsigned int c) {
return vreinterpretq_s64_s8(vshrq_n_s8(vreinterpretq_s8_s64(a), c));
}
SIMD_INLINE v128 v128_shl_n_16(v128 a, const unsigned int c) {
return vreinterpretq_s64_u16(vshlq_n_u16(vreinterpretq_u16_s64(a), c));
}
SIMD_INLINE v128 v128_shr_n_u16(v128 a, const unsigned int c) {
return vreinterpretq_s64_u16(vshrq_n_u16(vreinterpretq_u16_s64(a), c));
}
SIMD_INLINE v128 v128_shr_n_s16(v128 a, const unsigned int c) {
return vreinterpretq_s64_s16(vshrq_n_s16(vreinterpretq_s16_s64(a), c));
}
SIMD_INLINE v128 v128_shl_n_32(v128 a, const unsigned int c) {
return vreinterpretq_s64_u32(vshlq_n_u32(vreinterpretq_u32_s64(a), c));
}
SIMD_INLINE v128 v128_shr_n_u32(v128 a, const unsigned int c) {
return vreinterpretq_s64_u32(vshrq_n_u32(vreinterpretq_u32_s64(a), c));
}
SIMD_INLINE v128 v128_shr_n_s32(v128 a, const unsigned int c) {
return vreinterpretq_s64_s32(vshrq_n_s32(vreinterpretq_s32_s64(a), c));
}
#else
SIMD_INLINE v128 v128_shl_n_byte(v128 a, const unsigned int n) {
if (n < 8)
return v128_from_v64(v64_or(v64_shl_n_byte(v128_high_v64(a), n),
v64_shr_n_byte(v128_low_v64(a), 8 - n)),
v64_shl_n_byte(v128_low_v64(a), n));
else
return v128_from_v64(v64_shl_n_byte(v128_low_v64(a), n - 8), v64_zero());
}
SIMD_INLINE v128 v128_shr_n_byte(v128 a, const unsigned int n) {
if (n < 8)
return v128_from_v64(v64_shr_n_byte(v128_high_v64(a), n),
v64_or(v64_shr_n_byte(v128_low_v64(a), n),
v64_shl_n_byte(v128_high_v64(a), 8 - n)));
else
return v128_from_v64(v64_zero(), v64_shr_n_byte(v128_high_v64(a), n - 8));
}
SIMD_INLINE v128 v128_shl_n_8(v128 a, const unsigned int c) {
return v128_shl_8(a, c);
}
SIMD_INLINE v128 v128_shr_n_u8(v128 a, const unsigned int c) {
return v128_shr_u8(a, c);
}
SIMD_INLINE v128 v128_shr_n_s8(v128 a, const unsigned int c) {
return v128_shr_s8(a, c);
}
SIMD_INLINE v128 v128_shl_n_16(v128 a, const unsigned int c) {
return v128_shl_16(a, c);
}
SIMD_INLINE v128 v128_shr_n_u16(v128 a, const unsigned int c) {
return v128_shr_u16(a, c);
}
SIMD_INLINE v128 v128_shr_n_s16(v128 a, const unsigned int c) {
return v128_shr_s16(a, c);
}
SIMD_INLINE v128 v128_shl_n_32(v128 a, const unsigned int c) {
return v128_shl_32(a, c);
}
SIMD_INLINE v128 v128_shr_n_u32(v128 a, const unsigned int c) {
return v128_shr_u32(a, c);
}
SIMD_INLINE v128 v128_shr_n_s32(v128 a, const unsigned int c) {
return v128_shr_s32(a, c);
}
#endif
#endif /* _V128_INTRINSICS_H */
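The `#else` fallback for `v128_shl_n_byte` and `v128_shr_n_byte` above builds a 128-bit byte shift out of two 64-bit halves. A minimal portable sketch of that composition follows, assuming a hypothetical `demo_v128` struct in place of the NEON `v128` type; the `demo_*` names are illustrative and not part of the library. Unlike the header, the sketch returns early for `n == 0` so that it never performs a 64-bit shift by 64 bits, which is undefined in plain C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for v128: two explicit 64-bit halves. */
typedef struct {
  uint64_t hi;
  uint64_t lo;
} demo_v128;

/* Shift the whole 128-bit value left by n bytes, 0 <= n <= 15. */
static demo_v128 demo_shl_n_byte(demo_v128 a, unsigned int n) {
  demo_v128 r;
  assert(n < 16);
  if (n == 0) return a; /* avoids a 64-bit shift by 64 below */
  if (n < 8) {
    /* Bytes leaving the top of the low half enter the bottom of the high half. */
    r.hi = (a.hi << (n * 8)) | (a.lo >> ((8 - n) * 8));
    r.lo = a.lo << (n * 8);
  } else {
    /* The low half moves entirely into the high half. */
    r.hi = a.lo << ((n - 8) * 8);
    r.lo = 0;
  }
  return r;
}

/* Shift the whole 128-bit value right by n bytes, 0 <= n <= 15. */
static demo_v128 demo_shr_n_byte(demo_v128 a, unsigned int n) {
  demo_v128 r;
  assert(n < 16);
  if (n == 0) return a;
  if (n < 8) {
    /* Bytes leaving the bottom of the high half enter the top of the low half. */
    r.hi = a.hi >> (n * 8);
    r.lo = (a.lo >> (n * 8)) | (a.hi << ((8 - n) * 8));
  } else {
    /* The high half moves entirely into the low half. */
    r.hi = 0;
    r.lo = a.hi >> ((n - 8) * 8);
  }
  return r;
}
```

The three-way split (`n == 0`, `n < 8`, `n >= 8`) keeps every shift count in the range 0 to 56 bits, which is why the sketch needs no wider integer type.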


@@ -1,684 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef _V128_INTRINSICS_C_H
#define _V128_INTRINSICS_C_H
#include <stdio.h>
#include <stdlib.h>
#include "./v64_intrinsics_c.h"
#include "./aom_config.h"
typedef union {
uint8_t u8[16];
uint16_t u16[8];
uint32_t u32[4];
uint64_t u64[2];
int8_t s8[16];
int16_t s16[8];
int32_t s32[4];
int64_t s64[2];
c_v64 v64[2];
} c_v128;
SIMD_INLINE uint32_t c_v128_low_u32(c_v128 a) { return a.u32[0]; }
SIMD_INLINE c_v64 c_v128_low_v64(c_v128 a) { return a.v64[0]; }
SIMD_INLINE c_v64 c_v128_high_v64(c_v128 a) { return a.v64[1]; }
SIMD_INLINE c_v128 c_v128_from_64(uint64_t hi, uint64_t lo) {
c_v128 t;
t.u64[1] = hi;
t.u64[0] = lo;
return t;
}
SIMD_INLINE c_v128 c_v128_from_v64(c_v64 hi, c_v64 lo) {
c_v128 t;
t.v64[1] = hi;
t.v64[0] = lo;
return t;
}
SIMD_INLINE c_v128 c_v128_from_32(uint32_t a, uint32_t b, uint32_t c,
uint32_t d) {
c_v128 t;
t.u32[3] = a;
t.u32[2] = b;
t.u32[1] = c;
t.u32[0] = d;
return t;
}
SIMD_INLINE c_v128 c_v128_load_unaligned(const void *p) {
c_v128 t;
const uint8_t *pp = (const uint8_t *)p;
uint8_t *q = (uint8_t *)&t;
int c;
for (c = 0; c < 16; c++) q[c] = pp[c];
return t;
}
SIMD_INLINE c_v128 c_v128_load_aligned(const void *p) {
if (simd_check && (uintptr_t)p & 15) {
fprintf(stderr, "Error: unaligned v128 load at %p\n", p);
abort();
}
return c_v128_load_unaligned(p);
}
SIMD_INLINE void c_v128_store_unaligned(void *p, c_v128 a) {
uint8_t *pp = (uint8_t *)p;
uint8_t *q = (uint8_t *)&a;
int c;
for (c = 0; c < 16; c++) pp[c] = q[c];
}
SIMD_INLINE void c_v128_store_aligned(void *p, c_v128 a) {
if (simd_check && (uintptr_t)p & 15) {
fprintf(stderr, "Error: unaligned v128 store at %p\n", p);
abort();
}
c_v128_store_unaligned(p, a);
}
SIMD_INLINE c_v128 c_v128_zero() {
c_v128 t;
t.u64[1] = t.u64[0] = 0;
return t;
}
SIMD_INLINE c_v128 c_v128_dup_8(uint8_t x) {
c_v128 t;
t.v64[1] = t.v64[0] = c_v64_dup_8(x);
return t;
}
SIMD_INLINE c_v128 c_v128_dup_16(uint16_t x) {
c_v128 t;
t.v64[1] = t.v64[0] = c_v64_dup_16(x);
return t;
}
SIMD_INLINE c_v128 c_v128_dup_32(uint32_t x) {
c_v128 t;
t.v64[1] = t.v64[0] = c_v64_dup_32(x);
return t;
}
SIMD_INLINE int64_t c_v128_dotp_s16(c_v128 a, c_v128 b) {
return c_v64_dotp_s16(a.v64[1], b.v64[1]) +
c_v64_dotp_s16(a.v64[0], b.v64[0]);
}
SIMD_INLINE uint64_t c_v128_hadd_u8(c_v128 a) {
return c_v64_hadd_u8(a.v64[1]) + c_v64_hadd_u8(a.v64[0]);
}
typedef uint32_t c_sad128_internal;
SIMD_INLINE c_sad128_internal c_v128_sad_u8_init() { return 0; }
/* Implementation dependent return value. Result must be finalised with
v128_sad_u8_sum().
The result for more than 32 v128_sad_u8() calls is undefined. */
SIMD_INLINE c_sad128_internal c_v128_sad_u8(c_sad128_internal s, c_v128 a,
c_v128 b) {
int c;
for (c = 0; c < 16; c++)
s += a.u8[c] > b.u8[c] ? a.u8[c] - b.u8[c] : b.u8[c] - a.u8[c];
return s;
}
SIMD_INLINE uint32_t c_v128_sad_u8_sum(c_sad128_internal s) { return s; }
typedef uint32_t c_ssd128_internal;
SIMD_INLINE c_ssd128_internal c_v128_ssd_u8_init() { return 0; }
/* Implementation dependent return value. Result must be finalised with
* v128_ssd_u8_sum(). */
SIMD_INLINE c_ssd128_internal c_v128_ssd_u8(c_ssd128_internal s, c_v128 a,
c_v128 b) {
int c;
for (c = 0; c < 16; c++) s += (a.u8[c] - b.u8[c]) * (a.u8[c] - b.u8[c]);
return s;
}
SIMD_INLINE uint32_t c_v128_ssd_u8_sum(c_ssd128_internal s) { return s; }
SIMD_INLINE c_v128 c_v128_or(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_or(a.v64[1], b.v64[1]),
c_v64_or(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_xor(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_xor(a.v64[1], b.v64[1]),
c_v64_xor(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_and(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_and(a.v64[1], b.v64[1]),
c_v64_and(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_andn(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_andn(a.v64[1], b.v64[1]),
c_v64_andn(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_add_8(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_add_8(a.v64[1], b.v64[1]),
c_v64_add_8(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_add_16(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_add_16(a.v64[1], b.v64[1]),
c_v64_add_16(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_sadd_s16(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_sadd_s16(a.v64[1], b.v64[1]),
c_v64_sadd_s16(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_add_32(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_add_32(a.v64[1], b.v64[1]),
c_v64_add_32(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_padd_s16(c_v128 a) {
c_v128 t;
t.s32[0] = (int32_t)a.s16[0] + (int32_t)a.s16[1];
t.s32[1] = (int32_t)a.s16[2] + (int32_t)a.s16[3];
t.s32[2] = (int32_t)a.s16[4] + (int32_t)a.s16[5];
t.s32[3] = (int32_t)a.s16[6] + (int32_t)a.s16[7];
return t;
}
SIMD_INLINE c_v128 c_v128_sub_8(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_sub_8(a.v64[1], b.v64[1]),
c_v64_sub_8(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_ssub_u8(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_ssub_u8(a.v64[1], b.v64[1]),
c_v64_ssub_u8(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_ssub_s8(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_ssub_s8(a.v64[1], b.v64[1]),
c_v64_ssub_s8(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_sub_16(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_sub_16(a.v64[1], b.v64[1]),
c_v64_sub_16(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_ssub_s16(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_ssub_s16(a.v64[1], b.v64[1]),
c_v64_ssub_s16(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_sub_32(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_sub_32(a.v64[1], b.v64[1]),
c_v64_sub_32(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_abs_s16(c_v128 a) {
return c_v128_from_v64(c_v64_abs_s16(a.v64[1]), c_v64_abs_s16(a.v64[0]));
}
SIMD_INLINE c_v128 c_v128_mul_s16(c_v64 a, c_v64 b) {
c_v64 lo_bits = c_v64_mullo_s16(a, b);
c_v64 hi_bits = c_v64_mulhi_s16(a, b);
return c_v128_from_v64(c_v64_ziphi_16(hi_bits, lo_bits),
c_v64_ziplo_16(hi_bits, lo_bits));
}
SIMD_INLINE c_v128 c_v128_mullo_s16(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_mullo_s16(a.v64[1], b.v64[1]),
c_v64_mullo_s16(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_mulhi_s16(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_mulhi_s16(a.v64[1], b.v64[1]),
c_v64_mulhi_s16(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_mullo_s32(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_mullo_s32(a.v64[1], b.v64[1]),
c_v64_mullo_s32(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_madd_s16(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_madd_s16(a.v64[1], b.v64[1]),
c_v64_madd_s16(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_madd_us8(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_madd_us8(a.v64[1], b.v64[1]),
c_v64_madd_us8(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_avg_u8(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_avg_u8(a.v64[1], b.v64[1]),
c_v64_avg_u8(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_rdavg_u8(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_rdavg_u8(a.v64[1], b.v64[1]),
c_v64_rdavg_u8(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_avg_u16(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_avg_u16(a.v64[1], b.v64[1]),
c_v64_avg_u16(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_min_u8(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_min_u8(a.v64[1], b.v64[1]),
c_v64_min_u8(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_max_u8(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_max_u8(a.v64[1], b.v64[1]),
c_v64_max_u8(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_min_s8(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_min_s8(a.v64[1], b.v64[1]),
c_v64_min_s8(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_max_s8(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_max_s8(a.v64[1], b.v64[1]),
c_v64_max_s8(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_min_s16(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_min_s16(a.v64[1], b.v64[1]),
c_v64_min_s16(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_max_s16(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_max_s16(a.v64[1], b.v64[1]),
c_v64_max_s16(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_ziplo_8(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_ziphi_8(a.v64[0], b.v64[0]),
c_v64_ziplo_8(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_ziphi_8(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_ziphi_8(a.v64[1], b.v64[1]),
c_v64_ziplo_8(a.v64[1], b.v64[1]));
}
SIMD_INLINE c_v128 c_v128_ziplo_16(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_ziphi_16(a.v64[0], b.v64[0]),
c_v64_ziplo_16(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_ziphi_16(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_ziphi_16(a.v64[1], b.v64[1]),
c_v64_ziplo_16(a.v64[1], b.v64[1]));
}
SIMD_INLINE c_v128 c_v128_ziplo_32(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_ziphi_32(a.v64[0], b.v64[0]),
c_v64_ziplo_32(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_ziphi_32(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_ziphi_32(a.v64[1], b.v64[1]),
c_v64_ziplo_32(a.v64[1], b.v64[1]));
}
SIMD_INLINE c_v128 c_v128_ziplo_64(c_v128 a, c_v128 b) {
return c_v128_from_v64(a.v64[0], b.v64[0]);
}
SIMD_INLINE c_v128 c_v128_ziphi_64(c_v128 a, c_v128 b) {
return c_v128_from_v64(a.v64[1], b.v64[1]);
}
SIMD_INLINE c_v128 c_v128_zip_8(c_v64 a, c_v64 b) {
return c_v128_from_v64(c_v64_ziphi_8(a, b), c_v64_ziplo_8(a, b));
}
SIMD_INLINE c_v128 c_v128_zip_16(c_v64 a, c_v64 b) {
return c_v128_from_v64(c_v64_ziphi_16(a, b), c_v64_ziplo_16(a, b));
}
SIMD_INLINE c_v128 c_v128_zip_32(c_v64 a, c_v64 b) {
return c_v128_from_v64(c_v64_ziphi_32(a, b), c_v64_ziplo_32(a, b));
}
SIMD_INLINE c_v128 _c_v128_unzip_8(c_v128 a, c_v128 b, int mode) {
c_v128 t;
if (mode) {
t.u8[15] = b.u8[15];
t.u8[14] = b.u8[13];
t.u8[13] = b.u8[11];
t.u8[12] = b.u8[9];
t.u8[11] = b.u8[7];
t.u8[10] = b.u8[5];
t.u8[9] = b.u8[3];
t.u8[8] = b.u8[1];
t.u8[7] = a.u8[15];
t.u8[6] = a.u8[13];
t.u8[5] = a.u8[11];
t.u8[4] = a.u8[9];
t.u8[3] = a.u8[7];
t.u8[2] = a.u8[5];
t.u8[1] = a.u8[3];
t.u8[0] = a.u8[1];
} else {
t.u8[15] = a.u8[14];
t.u8[14] = a.u8[12];
t.u8[13] = a.u8[10];
t.u8[12] = a.u8[8];
t.u8[11] = a.u8[6];
t.u8[10] = a.u8[4];
t.u8[9] = a.u8[2];
t.u8[8] = a.u8[0];
t.u8[7] = b.u8[14];
t.u8[6] = b.u8[12];
t.u8[5] = b.u8[10];
t.u8[4] = b.u8[8];
t.u8[3] = b.u8[6];
t.u8[2] = b.u8[4];
t.u8[1] = b.u8[2];
t.u8[0] = b.u8[0];
}
return t;
}
SIMD_INLINE c_v128 c_v128_unziplo_8(c_v128 a, c_v128 b) {
return CONFIG_BIG_ENDIAN ? _c_v128_unzip_8(a, b, 1)
: _c_v128_unzip_8(a, b, 0);
}
SIMD_INLINE c_v128 c_v128_unziphi_8(c_v128 a, c_v128 b) {
return CONFIG_BIG_ENDIAN ? _c_v128_unzip_8(b, a, 0)
: _c_v128_unzip_8(b, a, 1);
}
SIMD_INLINE c_v128 _c_v128_unzip_16(c_v128 a, c_v128 b, int mode) {
c_v128 t;
if (mode) {
t.u16[7] = b.u16[7];
t.u16[6] = b.u16[5];
t.u16[5] = b.u16[3];
t.u16[4] = b.u16[1];
t.u16[3] = a.u16[7];
t.u16[2] = a.u16[5];
t.u16[1] = a.u16[3];
t.u16[0] = a.u16[1];
} else {
t.u16[7] = a.u16[6];
t.u16[6] = a.u16[4];
t.u16[5] = a.u16[2];
t.u16[4] = a.u16[0];
t.u16[3] = b.u16[6];
t.u16[2] = b.u16[4];
t.u16[1] = b.u16[2];
t.u16[0] = b.u16[0];
}
return t;
}
SIMD_INLINE c_v128 c_v128_unziplo_16(c_v128 a, c_v128 b) {
return CONFIG_BIG_ENDIAN ? _c_v128_unzip_16(a, b, 1)
: _c_v128_unzip_16(a, b, 0);
}
SIMD_INLINE c_v128 c_v128_unziphi_16(c_v128 a, c_v128 b) {
return CONFIG_BIG_ENDIAN ? _c_v128_unzip_16(b, a, 0)
: _c_v128_unzip_16(b, a, 1);
}
SIMD_INLINE c_v128 _c_v128_unzip_32(c_v128 a, c_v128 b, int mode) {
c_v128 t;
if (mode) {
t.u32[3] = b.u32[3];
t.u32[2] = b.u32[1];
t.u32[1] = a.u32[3];
t.u32[0] = a.u32[1];
} else {
t.u32[3] = a.u32[2];
t.u32[2] = a.u32[0];
t.u32[1] = b.u32[2];
t.u32[0] = b.u32[0];
}
return t;
}
SIMD_INLINE c_v128 c_v128_unziplo_32(c_v128 a, c_v128 b) {
return CONFIG_BIG_ENDIAN ? _c_v128_unzip_32(a, b, 1)
: _c_v128_unzip_32(a, b, 0);
}
SIMD_INLINE c_v128 c_v128_unziphi_32(c_v128 a, c_v128 b) {
return CONFIG_BIG_ENDIAN ? _c_v128_unzip_32(b, a, 0)
: _c_v128_unzip_32(b, a, 1);
}
SIMD_INLINE c_v128 c_v128_unpack_u8_s16(c_v64 a) {
return c_v128_from_v64(c_v64_unpackhi_u8_s16(a), c_v64_unpacklo_u8_s16(a));
}
SIMD_INLINE c_v128 c_v128_unpacklo_u8_s16(c_v128 a) {
return c_v128_from_v64(c_v64_unpackhi_u8_s16(a.v64[0]),
c_v64_unpacklo_u8_s16(a.v64[0]));
}
SIMD_INLINE c_v128 c_v128_unpackhi_u8_s16(c_v128 a) {
return c_v128_from_v64(c_v64_unpackhi_u8_s16(a.v64[1]),
c_v64_unpacklo_u8_s16(a.v64[1]));
}
SIMD_INLINE c_v128 c_v128_pack_s32_s16(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_pack_s32_s16(a.v64[1], a.v64[0]),
c_v64_pack_s32_s16(b.v64[1], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_pack_s16_u8(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_pack_s16_u8(a.v64[1], a.v64[0]),
c_v64_pack_s16_u8(b.v64[1], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_pack_s16_s8(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_pack_s16_s8(a.v64[1], a.v64[0]),
c_v64_pack_s16_s8(b.v64[1], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_unpack_u16_s32(c_v64 a) {
return c_v128_from_v64(c_v64_unpackhi_u16_s32(a), c_v64_unpacklo_u16_s32(a));
}
SIMD_INLINE c_v128 c_v128_unpack_s16_s32(c_v64 a) {
return c_v128_from_v64(c_v64_unpackhi_s16_s32(a), c_v64_unpacklo_s16_s32(a));
}
SIMD_INLINE c_v128 c_v128_unpacklo_u16_s32(c_v128 a) {
return c_v128_from_v64(c_v64_unpackhi_u16_s32(a.v64[0]),
c_v64_unpacklo_u16_s32(a.v64[0]));
}
SIMD_INLINE c_v128 c_v128_unpacklo_s16_s32(c_v128 a) {
return c_v128_from_v64(c_v64_unpackhi_s16_s32(a.v64[0]),
c_v64_unpacklo_s16_s32(a.v64[0]));
}
SIMD_INLINE c_v128 c_v128_unpackhi_u16_s32(c_v128 a) {
return c_v128_from_v64(c_v64_unpackhi_u16_s32(a.v64[1]),
c_v64_unpacklo_u16_s32(a.v64[1]));
}
SIMD_INLINE c_v128 c_v128_unpackhi_s16_s32(c_v128 a) {
return c_v128_from_v64(c_v64_unpackhi_s16_s32(a.v64[1]),
c_v64_unpacklo_s16_s32(a.v64[1]));
}
SIMD_INLINE c_v128 c_v128_shuffle_8(c_v128 a, c_v128 pattern) {
c_v128 t;
int c;
for (c = 0; c < 16; c++) {
if (pattern.u8[c] & ~15) {
fprintf(stderr, "Undefined v128_shuffle_8 index %d/%d\n", pattern.u8[c],
c);
abort();
}
t.u8[c] = a.u8[CONFIG_BIG_ENDIAN ? 15 - (pattern.u8[c] & 15)
: pattern.u8[c] & 15];
}
return t;
}
SIMD_INLINE c_v128 c_v128_cmpgt_s8(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_cmpgt_s8(a.v64[1], b.v64[1]),
c_v64_cmpgt_s8(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_cmplt_s8(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_cmplt_s8(a.v64[1], b.v64[1]),
c_v64_cmplt_s8(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_cmpeq_8(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_cmpeq_8(a.v64[1], b.v64[1]),
c_v64_cmpeq_8(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_cmpgt_s16(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_cmpgt_s16(a.v64[1], b.v64[1]),
c_v64_cmpgt_s16(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_cmplt_s16(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_cmplt_s16(a.v64[1], b.v64[1]),
c_v64_cmplt_s16(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_cmpeq_16(c_v128 a, c_v128 b) {
return c_v128_from_v64(c_v64_cmpeq_16(a.v64[1], b.v64[1]),
c_v64_cmpeq_16(a.v64[0], b.v64[0]));
}
SIMD_INLINE c_v128 c_v128_shl_n_byte(c_v128 a, const unsigned int n) {
if (n < 8)
return c_v128_from_v64(c_v64_or(c_v64_shl_n_byte(a.v64[1], n),
c_v64_shr_n_byte(a.v64[0], 8 - n)),
c_v64_shl_n_byte(a.v64[0], n));
else
return c_v128_from_v64(c_v64_shl_n_byte(a.v64[0], n - 8), c_v64_zero());
}
SIMD_INLINE c_v128 c_v128_shr_n_byte(c_v128 a, const unsigned int n) {
if (n < 8)
return c_v128_from_v64(c_v64_shr_n_byte(a.v64[1], n),
c_v64_or(c_v64_shr_n_byte(a.v64[0], n),
c_v64_shl_n_byte(a.v64[1], 8 - n)));
else
return c_v128_from_v64(c_v64_zero(), c_v64_shr_n_byte(a.v64[1], n - 8));
}
SIMD_INLINE c_v128 c_v128_align(c_v128 a, c_v128 b, const unsigned int c) {
if (simd_check && c > 15) {
fprintf(stderr, "Error: undefined alignment %u\n", c);
abort();
}
return c ? c_v128_or(c_v128_shr_n_byte(b, c), c_v128_shl_n_byte(a, 16 - c))
: b;
}
SIMD_INLINE c_v128 c_v128_shl_8(c_v128 a, const unsigned int c) {
return c_v128_from_v64(c_v64_shl_8(a.v64[1], c), c_v64_shl_8(a.v64[0], c));
}
SIMD_INLINE c_v128 c_v128_shr_u8(c_v128 a, const unsigned int c) {
return c_v128_from_v64(c_v64_shr_u8(a.v64[1], c), c_v64_shr_u8(a.v64[0], c));
}
SIMD_INLINE c_v128 c_v128_shr_s8(c_v128 a, const unsigned int c) {
return c_v128_from_v64(c_v64_shr_s8(a.v64[1], c), c_v64_shr_s8(a.v64[0], c));
}
SIMD_INLINE c_v128 c_v128_shl_16(c_v128 a, const unsigned int c) {
return c_v128_from_v64(c_v64_shl_16(a.v64[1], c), c_v64_shl_16(a.v64[0], c));
}
SIMD_INLINE c_v128 c_v128_shr_u16(c_v128 a, const unsigned int c) {
return c_v128_from_v64(c_v64_shr_u16(a.v64[1], c),
c_v64_shr_u16(a.v64[0], c));
}
SIMD_INLINE c_v128 c_v128_shr_s16(c_v128 a, const unsigned int c) {
return c_v128_from_v64(c_v64_shr_s16(a.v64[1], c),
c_v64_shr_s16(a.v64[0], c));
}
SIMD_INLINE c_v128 c_v128_shl_32(c_v128 a, const unsigned int c) {
return c_v128_from_v64(c_v64_shl_32(a.v64[1], c), c_v64_shl_32(a.v64[0], c));
}
SIMD_INLINE c_v128 c_v128_shr_u32(c_v128 a, const unsigned int c) {
return c_v128_from_v64(c_v64_shr_u32(a.v64[1], c),
c_v64_shr_u32(a.v64[0], c));
}
SIMD_INLINE c_v128 c_v128_shr_s32(c_v128 a, const unsigned int c) {
return c_v128_from_v64(c_v64_shr_s32(a.v64[1], c),
c_v64_shr_s32(a.v64[0], c));
}
SIMD_INLINE c_v128 c_v128_shl_n_8(c_v128 a, const unsigned int n) {
return c_v128_shl_8(a, n);
}
SIMD_INLINE c_v128 c_v128_shl_n_16(c_v128 a, const unsigned int n) {
return c_v128_shl_16(a, n);
}
SIMD_INLINE c_v128 c_v128_shl_n_32(c_v128 a, const unsigned int n) {
return c_v128_shl_32(a, n);
}
SIMD_INLINE c_v128 c_v128_shr_n_u8(c_v128 a, const unsigned int n) {
return c_v128_shr_u8(a, n);
}
SIMD_INLINE c_v128 c_v128_shr_n_u16(c_v128 a, const unsigned int n) {
return c_v128_shr_u16(a, n);
}
SIMD_INLINE c_v128 c_v128_shr_n_u32(c_v128 a, const unsigned int n) {
return c_v128_shr_u32(a, n);
}
SIMD_INLINE c_v128 c_v128_shr_n_s8(c_v128 a, const unsigned int n) {
return c_v128_shr_s8(a, n);
}
SIMD_INLINE c_v128 c_v128_shr_n_s16(c_v128 a, const unsigned int n) {
return c_v128_shr_s16(a, n);
}
SIMD_INLINE c_v128 c_v128_shr_n_s32(c_v128 a, const unsigned int n) {
return c_v128_shr_s32(a, n);
}
#endif /* _V128_INTRINSICS_C_H */
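The SAD functions in the header above follow an init / accumulate / finalise pattern: the accumulator type is implementation dependent, so callers are expected to read the result only through the `_sum()` function. A minimal sketch of that calling pattern follows, assuming plain byte pointers instead of `c_v128` values; the `demo_*` names and the 16-pixel-wide block helper are hypothetical and only mirror the shape of `c_v128_sad_u8_init()`, `c_v128_sad_u8()` and `c_v128_sad_u8_sum()`.

```c
#include <assert.h>
#include <stdint.h>

/* Opaque-by-convention accumulator; a plain integer in this sketch. */
typedef uint32_t demo_sad_acc;

static demo_sad_acc demo_sad_u8_init(void) { return 0; }

/* Add the sum of absolute differences of one 16-byte row to s. */
static demo_sad_acc demo_sad_u8(demo_sad_acc s, const uint8_t *a,
                                const uint8_t *b) {
  int c;
  for (c = 0; c < 16; c++) s += a[c] > b[c] ? a[c] - b[c] : b[c] - a[c];
  return s;
}

/* Convert the accumulator into the final 32-bit result. */
static uint32_t demo_sad_u8_sum(demo_sad_acc s) { return s; }

/* SAD of a block that is 16 pixels wide and `rows` rows high. */
static uint32_t demo_sad_16xh(const uint8_t *src, int src_stride,
                              const uint8_t *ref, int ref_stride, int rows) {
  demo_sad_acc s = demo_sad_u8_init();
  int r;
  /* The header documents more than 32 accumulate calls as undefined. */
  assert(rows >= 0 && rows <= 32);
  for (r = 0; r < rows; r++)
    s = demo_sad_u8(s, src + r * src_stride, ref + r * ref_stride);
  return demo_sad_u8_sum(s);
}
```

Keeping the accumulator behind its own typedef is what lets a SIMD backend swap in a vector register for the running total without changing callers.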


@@ -1,488 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef _V128_INTRINSICS_H
#define _V128_INTRINSICS_H
#include "./v64_intrinsics_x86.h"
typedef __m128i v128;
SIMD_INLINE uint32_t v128_low_u32(v128 a) {
return (uint32_t)_mm_cvtsi128_si32(a);
}
SIMD_INLINE v64 v128_low_v64(v128 a) {
return _mm_unpacklo_epi64(a, v64_zero());
}
SIMD_INLINE v64 v128_high_v64(v128 a) { return _mm_srli_si128(a, 8); }
SIMD_INLINE v128 v128_from_v64(v64 a, v64 b) {
return _mm_unpacklo_epi64(b, a);
}
SIMD_INLINE v128 v128_from_64(uint64_t a, uint64_t b) {
return v128_from_v64(v64_from_64(a), v64_from_64(b));
}
SIMD_INLINE v128 v128_from_32(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
return _mm_set_epi32(a, b, c, d);
}
SIMD_INLINE v128 v128_load_aligned(const void *p) {
return _mm_load_si128((__m128i *)p);
}
SIMD_INLINE v128 v128_load_unaligned(const void *p) {
#if defined(__SSSE3__)
return (__m128i)_mm_lddqu_si128((__m128i *)p);
#else
return _mm_loadu_si128((__m128i *)p);
#endif
}
SIMD_INLINE void v128_store_aligned(void *p, v128 a) {
_mm_store_si128((__m128i *)p, a);
}
SIMD_INLINE void v128_store_unaligned(void *p, v128 a) {
_mm_storeu_si128((__m128i *)p, a);
}
// The following function requires an immediate.
// Some compilers will check this during optimisation, others won't.
#if __OPTIMIZE__ && !__clang__
#if defined(__SSSE3__)
SIMD_INLINE v128 v128_align(v128 a, v128 b, const unsigned int c) {
return c ? _mm_alignr_epi8(a, b, c) : b;
}
#else
#define v128_align(a, b, c) \
((c) ? _mm_or_si128(_mm_srli_si128(b, c), _mm_slli_si128(a, 16 - (c))) : (b))
#endif
#else
#if defined(__SSSE3__)
#define v128_align(a, b, c) ((c) ? _mm_alignr_epi8(a, b, c) : (b))
#else
#define v128_align(a, b, c) \
((c) ? _mm_or_si128(_mm_srli_si128(b, c), _mm_slli_si128(a, 16 - (c))) : (b))
#endif
#endif
SIMD_INLINE v128 v128_zero() { return _mm_setzero_si128(); }
SIMD_INLINE v128 v128_dup_8(uint8_t x) { return _mm_set1_epi8(x); }
SIMD_INLINE v128 v128_dup_16(uint16_t x) { return _mm_set1_epi16(x); }
SIMD_INLINE v128 v128_dup_32(uint32_t x) { return _mm_set1_epi32(x); }
SIMD_INLINE v128 v128_add_8(v128 a, v128 b) { return _mm_add_epi8(a, b); }
SIMD_INLINE v128 v128_add_16(v128 a, v128 b) { return _mm_add_epi16(a, b); }
SIMD_INLINE v128 v128_sadd_s16(v128 a, v128 b) { return _mm_adds_epi16(a, b); }
SIMD_INLINE v128 v128_add_32(v128 a, v128 b) { return _mm_add_epi32(a, b); }
SIMD_INLINE v128 v128_padd_s16(v128 a) {
return _mm_madd_epi16(a, _mm_set1_epi16(1));
}
SIMD_INLINE v128 v128_sub_8(v128 a, v128 b) { return _mm_sub_epi8(a, b); }
SIMD_INLINE v128 v128_ssub_u8(v128 a, v128 b) { return _mm_subs_epu8(a, b); }
SIMD_INLINE v128 v128_ssub_s8(v128 a, v128 b) { return _mm_subs_epi8(a, b); }
SIMD_INLINE v128 v128_sub_16(v128 a, v128 b) { return _mm_sub_epi16(a, b); }
SIMD_INLINE v128 v128_ssub_s16(v128 a, v128 b) { return _mm_subs_epi16(a, b); }
SIMD_INLINE v128 v128_sub_32(v128 a, v128 b) { return _mm_sub_epi32(a, b); }
SIMD_INLINE v128 v128_abs_s16(v128 a) {
#if defined(__SSSE3__)
return _mm_abs_epi16(a);
#else
return _mm_max_epi16(a, _mm_sub_epi16(_mm_setzero_si128(), a));
#endif
}
SIMD_INLINE v128 v128_ziplo_8(v128 a, v128 b) {
return _mm_unpacklo_epi8(b, a);
}
SIMD_INLINE v128 v128_ziphi_8(v128 a, v128 b) {
return _mm_unpackhi_epi8(b, a);
}
SIMD_INLINE v128 v128_ziplo_16(v128 a, v128 b) {
return _mm_unpacklo_epi16(b, a);
}
SIMD_INLINE v128 v128_ziphi_16(v128 a, v128 b) {
return _mm_unpackhi_epi16(b, a);
}
SIMD_INLINE v128 v128_ziplo_32(v128 a, v128 b) {
return _mm_unpacklo_epi32(b, a);
}
SIMD_INLINE v128 v128_ziphi_32(v128 a, v128 b) {
return _mm_unpackhi_epi32(b, a);
}
SIMD_INLINE v128 v128_ziplo_64(v128 a, v128 b) {
return _mm_unpacklo_epi64(b, a);
}
SIMD_INLINE v128 v128_ziphi_64(v128 a, v128 b) {
return _mm_unpackhi_epi64(b, a);
}
SIMD_INLINE v128 v128_zip_8(v64 a, v64 b) { return _mm_unpacklo_epi8(b, a); }
SIMD_INLINE v128 v128_zip_16(v64 a, v64 b) { return _mm_unpacklo_epi16(b, a); }
SIMD_INLINE v128 v128_zip_32(v64 a, v64 b) { return _mm_unpacklo_epi32(b, a); }
SIMD_INLINE v128 v128_unziphi_8(v128 a, v128 b) {
return _mm_packs_epi16(_mm_srai_epi16(b, 8), _mm_srai_epi16(a, 8));
}
SIMD_INLINE v128 v128_unziplo_8(v128 a, v128 b) {
#if defined(__SSSE3__)
#ifdef __x86_64__
v128 order = _mm_cvtsi64_si128(0x0e0c0a0806040200LL);
#else
v128 order = _mm_set_epi32(0, 0, 0x0e0c0a08, 0x06040200);
#endif
return _mm_unpacklo_epi64(_mm_shuffle_epi8(b, order),
_mm_shuffle_epi8(a, order));
#else
return v128_unziphi_8(_mm_slli_si128(a, 1), _mm_slli_si128(b, 1));
#endif
}
SIMD_INLINE v128 v128_unziphi_16(v128 a, v128 b) {
return _mm_packs_epi32(_mm_srai_epi32(b, 16), _mm_srai_epi32(a, 16));
}
SIMD_INLINE v128 v128_unziplo_16(v128 a, v128 b) {
#if defined(__SSSE3__)
#ifdef __x86_64__
v128 order = _mm_cvtsi64_si128(0x0d0c090805040100LL);
#else
v128 order = _mm_set_epi32(0, 0, 0x0d0c0908, 0x05040100);
#endif
return _mm_unpacklo_epi64(_mm_shuffle_epi8(b, order),
_mm_shuffle_epi8(a, order));
#else
return v128_unziphi_16(_mm_slli_si128(a, 2), _mm_slli_si128(b, 2));
#endif
}
SIMD_INLINE v128 v128_unziphi_32(v128 a, v128 b) {
return _mm_castps_si128(_mm_shuffle_ps(
_mm_castsi128_ps(b), _mm_castsi128_ps(a), _MM_SHUFFLE(3, 1, 3, 1)));
}
SIMD_INLINE v128 v128_unziplo_32(v128 a, v128 b) {
return _mm_castps_si128(_mm_shuffle_ps(
_mm_castsi128_ps(b), _mm_castsi128_ps(a), _MM_SHUFFLE(2, 0, 2, 0)));
}
SIMD_INLINE v128 v128_unpack_u8_s16(v64 a) {
return _mm_unpacklo_epi8(a, _mm_setzero_si128());
}
SIMD_INLINE v128 v128_unpacklo_u8_s16(v128 a) {
return _mm_unpacklo_epi8(a, _mm_setzero_si128());
}
SIMD_INLINE v128 v128_unpackhi_u8_s16(v128 a) {
return _mm_unpackhi_epi8(a, _mm_setzero_si128());
}
SIMD_INLINE v128 v128_pack_s32_s16(v128 a, v128 b) {
return _mm_packs_epi32(b, a);
}
SIMD_INLINE v128 v128_pack_s16_u8(v128 a, v128 b) {
return _mm_packus_epi16(b, a);
}
SIMD_INLINE v128 v128_pack_s16_s8(v128 a, v128 b) {
return _mm_packs_epi16(b, a);
}
SIMD_INLINE v128 v128_unpack_u16_s32(v64 a) {
return _mm_unpacklo_epi16(a, _mm_setzero_si128());
}
SIMD_INLINE v128 v128_unpack_s16_s32(v64 a) {
return _mm_srai_epi32(_mm_unpacklo_epi16(a, a), 16);
}
SIMD_INLINE v128 v128_unpacklo_u16_s32(v128 a) {
return _mm_unpacklo_epi16(a, _mm_setzero_si128());
}
SIMD_INLINE v128 v128_unpacklo_s16_s32(v128 a) {
return _mm_srai_epi32(_mm_unpacklo_epi16(a, a), 16);
}
SIMD_INLINE v128 v128_unpackhi_u16_s32(v128 a) {
return _mm_unpackhi_epi16(a, _mm_setzero_si128());
}
SIMD_INLINE v128 v128_unpackhi_s16_s32(v128 a) {
return _mm_srai_epi32(_mm_unpackhi_epi16(a, a), 16);
}
SIMD_INLINE v128 v128_shuffle_8(v128 x, v128 pattern) {
#if defined(__SSSE3__)
return _mm_shuffle_epi8(x, pattern);
#else
v128 output;
unsigned char *input = (unsigned char *)&x;
unsigned char *index = (unsigned char *)&pattern;
char *selected = (char *)&output;
int counter;
for (counter = 0; counter < 16; counter++) {
selected[counter] = input[index[counter] & 15];
}
return output;
#endif
}
SIMD_INLINE int64_t v128_dotp_s16(v128 a, v128 b) {
v128 r = _mm_madd_epi16(a, b);
#if defined(__SSE4_1__) && defined(__x86_64__)
v128 c = _mm_add_epi64(_mm_cvtepi32_epi64(r),
_mm_cvtepi32_epi64(_mm_srli_si128(r, 8)));
return _mm_cvtsi128_si64(_mm_add_epi64(c, _mm_srli_si128(c, 8)));
#else
return (int64_t)_mm_cvtsi128_si32(r) +
(int64_t)_mm_cvtsi128_si32(_mm_srli_si128(r, 4)) +
(int64_t)_mm_cvtsi128_si32(_mm_srli_si128(r, 8)) +
(int64_t)_mm_cvtsi128_si32(_mm_srli_si128(r, 12));
#endif
}
SIMD_INLINE uint64_t v128_hadd_u8(v128 a) {
v128 t = _mm_sad_epu8(a, _mm_setzero_si128());
return v64_low_u32(v128_low_v64(t)) + v64_low_u32(v128_high_v64(t));
}
typedef v128 sad128_internal;
SIMD_INLINE sad128_internal v128_sad_u8_init() { return _mm_setzero_si128(); }
/* Implementation dependent return value. Result must be finalised with
v128_sad_u8_sum().
The result for more than 32 v128_sad_u8() calls is undefined. */
SIMD_INLINE sad128_internal v128_sad_u8(sad128_internal s, v128 a, v128 b) {
return _mm_add_epi64(s, _mm_sad_epu8(a, b));
}
SIMD_INLINE uint32_t v128_sad_u8_sum(sad128_internal s) {
return v128_low_u32(_mm_add_epi32(s, _mm_unpackhi_epi64(s, s)));
}
typedef v128 ssd128_internal;
SIMD_INLINE ssd128_internal v128_ssd_u8_init() { return _mm_setzero_si128(); }
/* Implementation dependent return value. Result must be finalised with
* v128_ssd_u8_sum(). */
SIMD_INLINE ssd128_internal v128_ssd_u8(ssd128_internal s, v128 a, v128 b) {
v128 l = _mm_sub_epi16(_mm_unpacklo_epi8(a, _mm_setzero_si128()),
_mm_unpacklo_epi8(b, _mm_setzero_si128()));
v128 h = _mm_sub_epi16(_mm_unpackhi_epi8(a, _mm_setzero_si128()),
_mm_unpackhi_epi8(b, _mm_setzero_si128()));
v128 rl = _mm_madd_epi16(l, l);
v128 rh = _mm_madd_epi16(h, h);
v128 c = _mm_cvtsi32_si128(32);
rl = _mm_add_epi32(rl, _mm_srli_si128(rl, 8));
rl = _mm_add_epi32(rl, _mm_srli_si128(rl, 4));
rh = _mm_add_epi32(rh, _mm_srli_si128(rh, 8));
rh = _mm_add_epi32(rh, _mm_srli_si128(rh, 4));
return _mm_add_epi64(
s, _mm_srl_epi64(_mm_sll_epi64(_mm_unpacklo_epi64(rl, rh), c), c));
}
SIMD_INLINE uint32_t v128_ssd_u8_sum(ssd128_internal s) {
return v128_low_u32(_mm_add_epi32(s, _mm_unpackhi_epi64(s, s)));
}
SIMD_INLINE v128 v128_or(v128 a, v128 b) { return _mm_or_si128(a, b); }
SIMD_INLINE v128 v128_xor(v128 a, v128 b) { return _mm_xor_si128(a, b); }
SIMD_INLINE v128 v128_and(v128 a, v128 b) { return _mm_and_si128(a, b); }
SIMD_INLINE v128 v128_andn(v128 a, v128 b) { return _mm_andnot_si128(b, a); }
SIMD_INLINE v128 v128_mul_s16(v64 a, v64 b) {
v64 lo_bits = v64_mullo_s16(a, b);
v64 hi_bits = v64_mulhi_s16(a, b);
return v128_from_v64(v64_ziphi_16(hi_bits, lo_bits),
v64_ziplo_16(hi_bits, lo_bits));
}
SIMD_INLINE v128 v128_mullo_s16(v128 a, v128 b) {
return _mm_mullo_epi16(a, b);
}
SIMD_INLINE v128 v128_mulhi_s16(v128 a, v128 b) {
return _mm_mulhi_epi16(a, b);
}
SIMD_INLINE v128 v128_mullo_s32(v128 a, v128 b) {
#if defined(__SSE4_1__)
return _mm_mullo_epi32(a, b);
#else
return _mm_unpacklo_epi32(
_mm_shuffle_epi32(_mm_mul_epu32(a, b), 8),
_mm_shuffle_epi32(
_mm_mul_epu32(_mm_srli_si128(a, 4), _mm_srli_si128(b, 4)), 8));
#endif
}
SIMD_INLINE v128 v128_madd_s16(v128 a, v128 b) { return _mm_madd_epi16(a, b); }
SIMD_INLINE v128 v128_madd_us8(v128 a, v128 b) {
#if defined(__SSSE3__)
return _mm_maddubs_epi16(a, b);
#else
return _mm_packs_epi32(
_mm_madd_epi16(_mm_unpacklo_epi8(a, _mm_setzero_si128()),
_mm_srai_epi16(_mm_unpacklo_epi8(b, b), 8)),
_mm_madd_epi16(_mm_unpackhi_epi8(a, _mm_setzero_si128()),
_mm_srai_epi16(_mm_unpackhi_epi8(b, b), 8)));
#endif
}
SIMD_INLINE v128 v128_avg_u8(v128 a, v128 b) { return _mm_avg_epu8(a, b); }
SIMD_INLINE v128 v128_rdavg_u8(v128 a, v128 b) {
return _mm_sub_epi8(_mm_avg_epu8(a, b),
_mm_and_si128(_mm_xor_si128(a, b), v128_dup_8(1)));
}
SIMD_INLINE v128 v128_avg_u16(v128 a, v128 b) { return _mm_avg_epu16(a, b); }
SIMD_INLINE v128 v128_min_u8(v128 a, v128 b) { return _mm_min_epu8(a, b); }
SIMD_INLINE v128 v128_max_u8(v128 a, v128 b) { return _mm_max_epu8(a, b); }
SIMD_INLINE v128 v128_min_s8(v128 a, v128 b) {
#if defined(__SSE4_1__)
return _mm_min_epi8(a, b);
#else
v128 mask = _mm_cmplt_epi8(a, b);
return _mm_or_si128(_mm_andnot_si128(mask, b), _mm_and_si128(mask, a));
#endif
}
SIMD_INLINE v128 v128_max_s8(v128 a, v128 b) {
#if defined(__SSE4_1__)
return _mm_max_epi8(a, b);
#else
v128 mask = _mm_cmplt_epi8(b, a);
return _mm_or_si128(_mm_andnot_si128(mask, b), _mm_and_si128(mask, a));
#endif
}
SIMD_INLINE v128 v128_min_s16(v128 a, v128 b) { return _mm_min_epi16(a, b); }
SIMD_INLINE v128 v128_max_s16(v128 a, v128 b) { return _mm_max_epi16(a, b); }
SIMD_INLINE v128 v128_cmpgt_s8(v128 a, v128 b) { return _mm_cmpgt_epi8(a, b); }
SIMD_INLINE v128 v128_cmplt_s8(v128 a, v128 b) { return _mm_cmplt_epi8(a, b); }
SIMD_INLINE v128 v128_cmpeq_8(v128 a, v128 b) { return _mm_cmpeq_epi8(a, b); }
SIMD_INLINE v128 v128_cmpgt_s16(v128 a, v128 b) {
return _mm_cmpgt_epi16(a, b);
}
SIMD_INLINE v128 v128_cmplt_s16(v128 a, v128 b) {
return _mm_cmplt_epi16(a, b);
}
SIMD_INLINE v128 v128_cmpeq_16(v128 a, v128 b) { return _mm_cmpeq_epi16(a, b); }
SIMD_INLINE v128 v128_shl_8(v128 a, unsigned int c) {
return _mm_and_si128(_mm_set1_epi8((uint8_t)(0xff << c)),
_mm_sll_epi16(a, _mm_cvtsi32_si128(c)));
}
SIMD_INLINE v128 v128_shr_u8(v128 a, unsigned int c) {
return _mm_and_si128(_mm_set1_epi8(0xff >> c),
_mm_srl_epi16(a, _mm_cvtsi32_si128(c)));
}
SIMD_INLINE v128 v128_shr_s8(v128 a, unsigned int c) {
__m128i x = _mm_cvtsi32_si128(c + 8);
return _mm_packs_epi16(_mm_sra_epi16(_mm_unpacklo_epi8(a, a), x),
_mm_sra_epi16(_mm_unpackhi_epi8(a, a), x));
}
SIMD_INLINE v128 v128_shl_16(v128 a, unsigned int c) {
return _mm_sll_epi16(a, _mm_cvtsi32_si128(c));
}
SIMD_INLINE v128 v128_shr_u16(v128 a, unsigned int c) {
return _mm_srl_epi16(a, _mm_cvtsi32_si128(c));
}
SIMD_INLINE v128 v128_shr_s16(v128 a, unsigned int c) {
return _mm_sra_epi16(a, _mm_cvtsi32_si128(c));
}
SIMD_INLINE v128 v128_shl_32(v128 a, unsigned int c) {
return _mm_sll_epi32(a, _mm_cvtsi32_si128(c));
}
SIMD_INLINE v128 v128_shr_u32(v128 a, unsigned int c) {
return _mm_srl_epi32(a, _mm_cvtsi32_si128(c));
}
SIMD_INLINE v128 v128_shr_s32(v128 a, unsigned int c) {
return _mm_sra_epi32(a, _mm_cvtsi32_si128(c));
}
/* These intrinsics require immediate values, so we must use #defines
to enforce that. */
#define v128_shl_n_byte(a, c) _mm_slli_si128(a, c)
#define v128_shr_n_byte(a, c) _mm_srli_si128(a, c)
#define v128_shl_n_8(a, c) \
_mm_and_si128(_mm_set1_epi8((uint8_t)(0xff << (c))), _mm_slli_epi16(a, c))
#define v128_shr_n_u8(a, c) \
_mm_and_si128(_mm_set1_epi8(0xff >> (c)), _mm_srli_epi16(a, c))
#define v128_shr_n_s8(a, c) \
_mm_packs_epi16(_mm_srai_epi16(_mm_unpacklo_epi8(a, a), (c) + 8), \
_mm_srai_epi16(_mm_unpackhi_epi8(a, a), (c) + 8))
#define v128_shl_n_16(a, c) _mm_slli_epi16(a, c)
#define v128_shr_n_u16(a, c) _mm_srli_epi16(a, c)
#define v128_shr_n_s16(a, c) _mm_srai_epi16(a, c)
#define v128_shl_n_32(a, c) _mm_slli_epi32(a, c)
#define v128_shr_n_u32(a, c) _mm_srli_epi32(a, c)
#define v128_shr_n_s32(a, c) _mm_srai_epi32(a, c)
#endif /* _V128_INTRINSICS_H */
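
The `v128_rdavg_u8` wrapper in the header above builds a rounding-down byte average from `_mm_avg_epu8`, which rounds up, by subtracting the low bit of `a ^ b`. The following is a minimal scalar sketch of that identity for a single lane, not the header's own code; the names `rdavg_u8_scalar`, `rdavg_mismatches` and `rdavg_self_check` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* One-lane model of the v128_rdavg_u8 trick: start from the rounding-up
   average (the per-byte behaviour of _mm_avg_epu8) and subtract 1 exactly
   when a and b differ in their lowest bit, i.e. when a + b is odd. */
static uint8_t rdavg_u8_scalar(uint8_t a, uint8_t b) {
  uint8_t avg_up = (uint8_t)((a + b + 1) >> 1);
  return (uint8_t)(avg_up - ((a ^ b) & 1));
}

/* Counts the input pairs for which the trick differs from floor((a + b) / 2).
   Hypothetical helper used only to check the identity exhaustively. */
static int rdavg_mismatches(void) {
  int a, b, bad = 0;
  for (a = 0; a < 256; a++)
    for (b = 0; b < 256; b++)
      if (rdavg_u8_scalar((uint8_t)a, (uint8_t)b) != ((a + b) >> 1)) bad++;
  return bad;
}

/* Aborts if any of the 65536 input pairs disagrees with the plain formula. */
static void rdavg_self_check(void) { assert(rdavg_mismatches() == 0); }
```

The correction term works because the rounding-up average exceeds the rounding-down one only when `a + b` is odd, which is exactly when the low bits of `a` and `b` differ.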


@@ -1,274 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef _V256_INTRINSICS_H
#define _V256_INTRINSICS_H
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "./v256_intrinsics_c.h"
#include "./v128_intrinsics.h"
#include "./v64_intrinsics.h"
/* Fallback to plain, unoptimised C. */
typedef c_v256 v256;
SIMD_INLINE uint32_t v256_low_u32(v256 a) { return c_v256_low_u32(a); }
SIMD_INLINE v64 v256_low_v64(v256 a) { return c_v256_low_v64(a); }
SIMD_INLINE v128 v256_low_v128(v256 a) { return c_v256_low_v128(a); }
SIMD_INLINE v128 v256_high_v128(v256 a) { return c_v256_high_v128(a); }
SIMD_INLINE v256 v256_from_v128(v128 hi, v128 lo) {
return c_v256_from_v128(hi, lo);
}
SIMD_INLINE v256 v256_from_64(uint64_t a, uint64_t b, uint64_t c, uint64_t d) {
return c_v256_from_64(a, b, c, d);
}
SIMD_INLINE v256 v256_from_v64(v64 a, v64 b, v64 c, v64 d) {
return c_v256_from_v64(a, b, c, d);
}
SIMD_INLINE v256 v256_load_unaligned(const void *p) {
return c_v256_load_unaligned(p);
}
SIMD_INLINE v256 v256_load_aligned(const void *p) {
return c_v256_load_aligned(p);
}
SIMD_INLINE void v256_store_unaligned(void *p, v256 a) {
c_v256_store_unaligned(p, a);
}
SIMD_INLINE void v256_store_aligned(void *p, v256 a) {
c_v256_store_aligned(p, a);
}
SIMD_INLINE v256 v256_align(v256 a, v256 b, const unsigned int c) {
return c_v256_align(a, b, c);
}
SIMD_INLINE v256 v256_zero() { return c_v256_zero(); }
SIMD_INLINE v256 v256_dup_8(uint8_t x) { return c_v256_dup_8(x); }
SIMD_INLINE v256 v256_dup_16(uint16_t x) { return c_v256_dup_16(x); }
SIMD_INLINE v256 v256_dup_32(uint32_t x) { return c_v256_dup_32(x); }
typedef uint32_t sad256_internal;
SIMD_INLINE sad256_internal v256_sad_u8_init() { return c_v256_sad_u8_init(); }
SIMD_INLINE sad256_internal v256_sad_u8(sad256_internal s, v256 a, v256 b) {
return c_v256_sad_u8(s, a, b);
}
SIMD_INLINE uint32_t v256_sad_u8_sum(sad256_internal s) {
return c_v256_sad_u8_sum(s);
}
typedef uint32_t ssd256_internal;
SIMD_INLINE ssd256_internal v256_ssd_u8_init() { return c_v256_ssd_u8_init(); }
SIMD_INLINE ssd256_internal v256_ssd_u8(ssd256_internal s, v256 a, v256 b) {
return c_v256_ssd_u8(s, a, b);
}
SIMD_INLINE uint32_t v256_ssd_u8_sum(ssd256_internal s) {
return c_v256_ssd_u8_sum(s);
}
SIMD_INLINE int64_t v256_dotp_s16(v256 a, v256 b) {
return c_v256_dotp_s16(a, b);
}
SIMD_INLINE uint64_t v256_hadd_u8(v256 a) { return c_v256_hadd_u8(a); }
SIMD_INLINE v256 v256_or(v256 a, v256 b) { return c_v256_or(a, b); }
SIMD_INLINE v256 v256_xor(v256 a, v256 b) { return c_v256_xor(a, b); }
SIMD_INLINE v256 v256_and(v256 a, v256 b) { return c_v256_and(a, b); }
SIMD_INLINE v256 v256_andn(v256 a, v256 b) { return c_v256_andn(a, b); }
SIMD_INLINE v256 v256_add_8(v256 a, v256 b) { return c_v256_add_8(a, b); }
SIMD_INLINE v256 v256_add_16(v256 a, v256 b) { return c_v256_add_16(a, b); }
SIMD_INLINE v256 v256_sadd_s16(v256 a, v256 b) { return c_v256_sadd_s16(a, b); }
SIMD_INLINE v256 v256_add_32(v256 a, v256 b) { return c_v256_add_32(a, b); }
SIMD_INLINE v256 v256_padd_s16(v256 a) { return c_v256_padd_s16(a); }
SIMD_INLINE v256 v256_sub_8(v256 a, v256 b) { return c_v256_sub_8(a, b); }
SIMD_INLINE v256 v256_ssub_u8(v256 a, v256 b) { return c_v256_ssub_u8(a, b); }
SIMD_INLINE v256 v256_ssub_s8(v256 a, v256 b) { return c_v256_ssub_s8(a, b); }
SIMD_INLINE v256 v256_sub_16(v256 a, v256 b) { return c_v256_sub_16(a, b); }
SIMD_INLINE v256 v256_ssub_s16(v256 a, v256 b) { return c_v256_ssub_s16(a, b); }
SIMD_INLINE v256 v256_sub_32(v256 a, v256 b) { return c_v256_sub_32(a, b); }
SIMD_INLINE v256 v256_abs_s16(v256 a) { return c_v256_abs_s16(a); }
SIMD_INLINE v256 v256_mul_s16(v128 a, v128 b) { return c_v256_mul_s16(a, b); }
SIMD_INLINE v256 v256_mullo_s16(v256 a, v256 b) {
return c_v256_mullo_s16(a, b);
}
SIMD_INLINE v256 v256_mulhi_s16(v256 a, v256 b) {
return c_v256_mulhi_s16(a, b);
}
SIMD_INLINE v256 v256_mullo_s32(v256 a, v256 b) {
return c_v256_mullo_s32(a, b);
}
SIMD_INLINE v256 v256_madd_s16(v256 a, v256 b) { return c_v256_madd_s16(a, b); }
SIMD_INLINE v256 v256_madd_us8(v256 a, v256 b) { return c_v256_madd_us8(a, b); }
SIMD_INLINE v256 v256_avg_u8(v256 a, v256 b) { return c_v256_avg_u8(a, b); }
SIMD_INLINE v256 v256_rdavg_u8(v256 a, v256 b) { return c_v256_rdavg_u8(a, b); }
SIMD_INLINE v256 v256_avg_u16(v256 a, v256 b) { return c_v256_avg_u16(a, b); }
SIMD_INLINE v256 v256_min_u8(v256 a, v256 b) { return c_v256_min_u8(a, b); }
SIMD_INLINE v256 v256_max_u8(v256 a, v256 b) { return c_v256_max_u8(a, b); }
SIMD_INLINE v256 v256_min_s8(v256 a, v256 b) { return c_v256_min_s8(a, b); }
SIMD_INLINE v256 v256_max_s8(v256 a, v256 b) { return c_v256_max_s8(a, b); }
SIMD_INLINE v256 v256_min_s16(v256 a, v256 b) { return c_v256_min_s16(a, b); }
SIMD_INLINE v256 v256_max_s16(v256 a, v256 b) { return c_v256_max_s16(a, b); }
SIMD_INLINE v256 v256_ziplo_8(v256 a, v256 b) { return c_v256_ziplo_8(a, b); }
SIMD_INLINE v256 v256_ziphi_8(v256 a, v256 b) { return c_v256_ziphi_8(a, b); }
SIMD_INLINE v256 v256_ziplo_16(v256 a, v256 b) { return c_v256_ziplo_16(a, b); }
SIMD_INLINE v256 v256_ziphi_16(v256 a, v256 b) { return c_v256_ziphi_16(a, b); }
SIMD_INLINE v256 v256_ziplo_32(v256 a, v256 b) { return c_v256_ziplo_32(a, b); }
SIMD_INLINE v256 v256_ziphi_32(v256 a, v256 b) { return c_v256_ziphi_32(a, b); }
SIMD_INLINE v256 v256_ziplo_64(v256 a, v256 b) { return c_v256_ziplo_64(a, b); }
SIMD_INLINE v256 v256_ziphi_64(v256 a, v256 b) { return c_v256_ziphi_64(a, b); }
SIMD_INLINE v256 v256_ziplo_128(v256 a, v256 b) {
return c_v256_ziplo_128(a, b);
}
SIMD_INLINE v256 v256_ziphi_128(v256 a, v256 b) {
return c_v256_ziphi_128(a, b);
}
SIMD_INLINE v256 v256_zip_8(v128 a, v128 b) { return c_v256_zip_8(a, b); }
SIMD_INLINE v256 v256_zip_16(v128 a, v128 b) { return c_v256_zip_16(a, b); }
SIMD_INLINE v256 v256_zip_32(v128 a, v128 b) { return c_v256_zip_32(a, b); }
SIMD_INLINE v256 v256_unziplo_8(v256 a, v256 b) {
return c_v256_unziplo_8(a, b);
}
SIMD_INLINE v256 v256_unziphi_8(v256 a, v256 b) {
return c_v256_unziphi_8(a, b);
}
SIMD_INLINE v256 v256_unziplo_16(v256 a, v256 b) {
return c_v256_unziplo_16(a, b);
}
SIMD_INLINE v256 v256_unziphi_16(v256 a, v256 b) {
return c_v256_unziphi_16(a, b);
}
SIMD_INLINE v256 v256_unziplo_32(v256 a, v256 b) {
return c_v256_unziplo_32(a, b);
}
SIMD_INLINE v256 v256_unziphi_32(v256 a, v256 b) {
return c_v256_unziphi_32(a, b);
}
SIMD_INLINE v256 v256_unpack_u8_s16(v128 a) { return c_v256_unpack_u8_s16(a); }
SIMD_INLINE v256 v256_unpacklo_u8_s16(v256 a) {
return c_v256_unpacklo_u8_s16(a);
}
SIMD_INLINE v256 v256_unpackhi_u8_s16(v256 a) {
return c_v256_unpackhi_u8_s16(a);
}
SIMD_INLINE v256 v256_pack_s32_s16(v256 a, v256 b) {
return c_v256_pack_s32_s16(a, b);
}
SIMD_INLINE v256 v256_pack_s16_u8(v256 a, v256 b) {
return c_v256_pack_s16_u8(a, b);
}
SIMD_INLINE v256 v256_pack_s16_s8(v256 a, v256 b) {
return c_v256_pack_s16_s8(a, b);
}
SIMD_INLINE v256 v256_unpack_u16_s32(v128 a) {
return c_v256_unpack_u16_s32(a);
}
SIMD_INLINE v256 v256_unpack_s16_s32(v128 a) {
return c_v256_unpack_s16_s32(a);
}
SIMD_INLINE v256 v256_unpacklo_u16_s32(v256 a) {
return c_v256_unpacklo_u16_s32(a);
}
SIMD_INLINE v256 v256_unpacklo_s16_s32(v256 a) {
return c_v256_unpacklo_s16_s32(a);
}
SIMD_INLINE v256 v256_unpackhi_u16_s32(v256 a) {
return c_v256_unpackhi_u16_s32(a);
}
SIMD_INLINE v256 v256_unpackhi_s16_s32(v256 a) {
return c_v256_unpackhi_s16_s32(a);
}
SIMD_INLINE v256 v256_shuffle_8(v256 a, v256 pattern) {
return c_v256_shuffle_8(a, pattern);
}
SIMD_INLINE v256 v256_pshuffle_8(v256 a, v256 pattern) {
return c_v256_pshuffle_8(a, pattern);
}
SIMD_INLINE v256 v256_cmpgt_s8(v256 a, v256 b) { return c_v256_cmpgt_s8(a, b); }
SIMD_INLINE v256 v256_cmplt_s8(v256 a, v256 b) { return c_v256_cmplt_s8(a, b); }
SIMD_INLINE v256 v256_cmpeq_8(v256 a, v256 b) { return c_v256_cmpeq_8(a, b); }
SIMD_INLINE v256 v256_cmpgt_s16(v256 a, v256 b) {
return c_v256_cmpgt_s16(a, b);
}
SIMD_INLINE v256 v256_cmplt_s16(v256 a, v256 b) {
return c_v256_cmplt_s16(a, b);
}
SIMD_INLINE v256 v256_cmpeq_16(v256 a, v256 b) { return c_v256_cmpeq_16(a, b); }
SIMD_INLINE v256 v256_shl_8(v256 a, unsigned int c) {
return c_v256_shl_8(a, c);
}
SIMD_INLINE v256 v256_shr_u8(v256 a, unsigned int c) {
return c_v256_shr_u8(a, c);
}
SIMD_INLINE v256 v256_shr_s8(v256 a, unsigned int c) {
return c_v256_shr_s8(a, c);
}
SIMD_INLINE v256 v256_shl_16(v256 a, unsigned int c) {
return c_v256_shl_16(a, c);
}
SIMD_INLINE v256 v256_shr_u16(v256 a, unsigned int c) {
return c_v256_shr_u16(a, c);
}
SIMD_INLINE v256 v256_shr_s16(v256 a, unsigned int c) {
return c_v256_shr_s16(a, c);
}
SIMD_INLINE v256 v256_shl_32(v256 a, unsigned int c) {
return c_v256_shl_32(a, c);
}
SIMD_INLINE v256 v256_shr_u32(v256 a, unsigned int c) {
return c_v256_shr_u32(a, c);
}
SIMD_INLINE v256 v256_shr_s32(v256 a, unsigned int c) {
return c_v256_shr_s32(a, c);
}
SIMD_INLINE v256 v256_shr_n_byte(v256 a, const unsigned int n) {
return c_v256_shr_n_byte(a, n);
}
SIMD_INLINE v256 v256_shl_n_byte(v256 a, const unsigned int n) {
return c_v256_shl_n_byte(a, n);
}
SIMD_INLINE v256 v256_shl_n_8(v256 a, const unsigned int n) {
return c_v256_shl_n_8(a, n);
}
SIMD_INLINE v256 v256_shl_n_16(v256 a, const unsigned int n) {
return c_v256_shl_n_16(a, n);
}
SIMD_INLINE v256 v256_shl_n_32(v256 a, const unsigned int n) {
return c_v256_shl_n_32(a, n);
}
SIMD_INLINE v256 v256_shr_n_u8(v256 a, const unsigned int n) {
return c_v256_shr_n_u8(a, n);
}
SIMD_INLINE v256 v256_shr_n_u16(v256 a, const unsigned int n) {
return c_v256_shr_n_u16(a, n);
}
SIMD_INLINE v256 v256_shr_n_u32(v256 a, const unsigned int n) {
return c_v256_shr_n_u32(a, n);
}
SIMD_INLINE v256 v256_shr_n_s8(v256 a, const unsigned int n) {
return c_v256_shr_n_s8(a, n);
}
SIMD_INLINE v256 v256_shr_n_s16(v256 a, const unsigned int n) {
return c_v256_shr_n_s16(a, n);
}
SIMD_INLINE v256 v256_shr_n_s32(v256 a, const unsigned int n) {
return c_v256_shr_n_s32(a, n);
}
#endif /* _V256_INTRINSICS_H */
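
The SAD helpers in the header above follow a three-step protocol: `v256_sad_u8_init()` creates an accumulator, `v256_sad_u8()` folds one pair of vectors into it, and `v256_sad_u8_sum()` extracts the final value. A minimal sketch of how a caller might drive that protocol is shown here using plain-C stand-ins, so it compiles without the library headers. The names `sad_acc`, `sad_init`, `sad_accumulate`, `sad_sum` and `block_sad_32xh` are hypothetical, and the 16-call limit is taken from the comment in the plain-C v256 implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t sad_acc; /* hypothetical stand-in for sad256_internal */

/* Step 1: create an empty accumulator. */
static sad_acc sad_init(void) { return 0; }

/* Step 2: add the sum of absolute differences of two 32-byte rows. */
static sad_acc sad_accumulate(sad_acc s, const uint8_t *a, const uint8_t *b) {
  int c;
  for (c = 0; c < 32; c++) s += a[c] > b[c] ? a[c] - b[c] : b[c] - a[c];
  return s;
}

/* Step 3: turn the accumulator into the final scalar result. */
static uint32_t sad_sum(sad_acc s) { return s; }

/* SAD of a block that is 32 bytes wide and `rows` rows tall. */
static uint32_t block_sad_32xh(const uint8_t *src, size_t src_stride,
                               const uint8_t *ref, size_t ref_stride,
                               int rows) {
  sad_acc s = sad_init();
  int r;
  /* The documented contract leaves more than 16 accumulations undefined. */
  assert(rows >= 0 && rows <= 16);
  for (r = 0; r < rows; r++)
    s = sad_accumulate(s, src + (size_t)r * src_stride,
                       ref + (size_t)r * ref_stride);
  return sad_sum(s);
}
```

Keeping the accumulator type opaque lets an optimised backend hold partial sums in a vector register and reduce them only once, in the final `_sum` step.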


@@ -1,17 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef _V256_INTRINSICS_H
#define _V256_INTRINSICS_H
#include "./v256_intrinsics_v128.h"
#endif /* _V256_INTRINSICS_H */
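
The header included above, `v256_intrinsics_v128.h`, is not shown in this part of the comparison. Its name suggests that 256-bit operations are composed from 128-bit ones, which is also how the plain-C implementation in this changeset builds most of its functions (via `c_v256_from_v128`). A minimal sketch of that composition pattern follows, under the assumption that a lane-wise operation can simply be applied to each half; the types `vec128` and `vec256` and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t u8[16]; } vec128;  /* hypothetical 128-bit vector */
typedef struct { vec128 half[2]; } vec256;  /* half[0] = low, half[1] = high */

/* Lane-wise 8-bit addition with wraparound on a 128-bit vector. */
static vec128 vec128_add_8(vec128 a, vec128 b) {
  vec128 t;
  int c;
  for (c = 0; c < 16; c++) t.u8[c] = (uint8_t)(a.u8[c] + b.u8[c]);
  return t;
}

/* Assemble a 256-bit vector from its high and low halves. */
static vec256 vec256_from_halves(vec128 hi, vec128 lo) {
  vec256 t;
  t.half[1] = hi;
  t.half[0] = lo;
  return t;
}

/* Broadcast one byte to all 32 lanes. */
static vec256 vec256_dup_8(uint8_t x) {
  vec256 t;
  memset(&t, x, sizeof(t));
  return t;
}

/* A lane-wise 256-bit operation is the 128-bit one applied to each half. */
static vec256 vec256_add_8(vec256 a, vec256 b) {
  return vec256_from_halves(vec128_add_8(a.half[1], b.half[1]),
                            vec128_add_8(a.half[0], b.half[0]));
}
```

This only works directly for operations whose lanes do not interact across the 128-bit boundary; zips, unzips and full-width shuffles and byte shifts need extra handling.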


@@ -1,701 +0,0 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef _V256_INTRINSICS_C_H
#define _V256_INTRINSICS_C_H
#include <stdio.h>
#include <stdlib.h>
#include "./v128_intrinsics_c.h"
#include "./aom_config.h"
typedef union {
uint8_t u8[32];
uint16_t u16[16];
uint32_t u32[8];
uint64_t u64[4];
int8_t s8[32];
int16_t s16[16];
int32_t s32[8];
int64_t s64[4];
c_v64 v64[4];
c_v128 v128[2];
} c_v256;
SIMD_INLINE uint32_t c_v256_low_u32(c_v256 a) { return a.u32[0]; }
SIMD_INLINE c_v64 c_v256_low_v64(c_v256 a) { return a.v64[0]; }
SIMD_INLINE c_v128 c_v256_low_v128(c_v256 a) { return a.v128[0]; }
SIMD_INLINE c_v128 c_v256_high_v128(c_v256 a) { return a.v128[1]; }
SIMD_INLINE c_v256 c_v256_from_v128(c_v128 hi, c_v128 lo) {
c_v256 t;
t.v128[1] = hi;
t.v128[0] = lo;
return t;
}
SIMD_INLINE c_v256 c_v256_from_64(uint64_t a, uint64_t b, uint64_t c,
uint64_t d) {
c_v256 t;
t.u64[3] = a;
t.u64[2] = b;
t.u64[1] = c;
t.u64[0] = d;
return t;
}
SIMD_INLINE c_v256 c_v256_from_v64(c_v64 a, c_v64 b, c_v64 c, c_v64 d) {
c_v256 t;
t.u64[3] = a.u64;
t.u64[2] = b.u64;
t.u64[1] = c.u64;
t.u64[0] = d.u64;
return t;
}
SIMD_INLINE c_v256 c_v256_load_unaligned(const void *p) {
c_v256 t;
uint8_t *pp = (uint8_t *)p;
uint8_t *q = (uint8_t *)&t;
int c;
for (c = 0; c < 32; c++) q[c] = pp[c];
return t;
}
SIMD_INLINE c_v256 c_v256_load_aligned(const void *p) {
if (simd_check && (uintptr_t)p & 31) {
fprintf(stderr, "Error: unaligned v256 load at %p\n", p);
abort();
}
return c_v256_load_unaligned(p);
}
SIMD_INLINE void c_v256_store_unaligned(void *p, c_v256 a) {
uint8_t *pp = (uint8_t *)p;
uint8_t *q = (uint8_t *)&a;
int c;
for (c = 0; c < 32; c++) pp[c] = q[c];
}
SIMD_INLINE void c_v256_store_aligned(void *p, c_v256 a) {
if (simd_check && (uintptr_t)p & 31) {
fprintf(stderr, "Error: unaligned v256 store at %p\n", p);
abort();
}
c_v256_store_unaligned(p, a);
}
SIMD_INLINE c_v256 c_v256_zero() {
c_v256 t;
t.u64[3] = t.u64[2] = t.u64[1] = t.u64[0] = 0;
return t;
}
SIMD_INLINE c_v256 c_v256_dup_8(uint8_t x) {
c_v256 t;
t.v64[3] = t.v64[2] = t.v64[1] = t.v64[0] = c_v64_dup_8(x);
return t;
}
SIMD_INLINE c_v256 c_v256_dup_16(uint16_t x) {
c_v256 t;
t.v64[3] = t.v64[2] = t.v64[1] = t.v64[0] = c_v64_dup_16(x);
return t;
}
SIMD_INLINE c_v256 c_v256_dup_32(uint32_t x) {
c_v256 t;
t.v64[3] = t.v64[2] = t.v64[1] = t.v64[0] = c_v64_dup_32(x);
return t;
}
SIMD_INLINE int64_t c_v256_dotp_s16(c_v256 a, c_v256 b) {
return c_v128_dotp_s16(a.v128[1], b.v128[1]) +
c_v128_dotp_s16(a.v128[0], b.v128[0]);
}
SIMD_INLINE uint64_t c_v256_hadd_u8(c_v256 a) {
return c_v128_hadd_u8(a.v128[1]) + c_v128_hadd_u8(a.v128[0]);
}
typedef uint32_t c_sad256_internal;
SIMD_INLINE c_sad256_internal c_v256_sad_u8_init() { return 0; }
/* Implementation dependent return value. Result must be finalised with
v256_sad_u8_sum().
The result for more than 16 v256_sad_u8() calls is undefined. */
SIMD_INLINE c_sad256_internal c_v256_sad_u8(c_sad256_internal s, c_v256 a,
c_v256 b) {
int c;
for (c = 0; c < 32; c++)
s += a.u8[c] > b.u8[c] ? a.u8[c] - b.u8[c] : b.u8[c] - a.u8[c];
return s;
}
SIMD_INLINE uint32_t c_v256_sad_u8_sum(c_sad256_internal s) { return s; }
typedef uint32_t c_ssd256_internal;
SIMD_INLINE c_ssd256_internal c_v256_ssd_u8_init() { return 0; }
/* Implementation dependent return value. Result must be finalised with
* v256_ssd_u8_sum(). */
SIMD_INLINE c_ssd256_internal c_v256_ssd_u8(c_ssd256_internal s, c_v256 a,
c_v256 b) {
int c;
for (c = 0; c < 32; c++) s += (a.u8[c] - b.u8[c]) * (a.u8[c] - b.u8[c]);
return s;
}
SIMD_INLINE uint32_t c_v256_ssd_u8_sum(c_ssd256_internal s) { return s; }
SIMD_INLINE c_v256 c_v256_or(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_or(a.v128[1], b.v128[1]),
c_v128_or(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_xor(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_xor(a.v128[1], b.v128[1]),
c_v128_xor(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_and(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_and(a.v128[1], b.v128[1]),
c_v128_and(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_andn(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_andn(a.v128[1], b.v128[1]),
c_v128_andn(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_add_8(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_add_8(a.v128[1], b.v128[1]),
c_v128_add_8(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_add_16(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_add_16(a.v128[1], b.v128[1]),
c_v128_add_16(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_sadd_s16(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_sadd_s16(a.v128[1], b.v128[1]),
c_v128_sadd_s16(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_add_32(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_add_32(a.v128[1], b.v128[1]),
c_v128_add_32(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_padd_s16(c_v256 a) {
c_v256 t;
t.s32[0] = (int32_t)a.s16[0] + (int32_t)a.s16[1];
t.s32[1] = (int32_t)a.s16[2] + (int32_t)a.s16[3];
t.s32[2] = (int32_t)a.s16[4] + (int32_t)a.s16[5];
t.s32[3] = (int32_t)a.s16[6] + (int32_t)a.s16[7];
t.s32[4] = (int32_t)a.s16[8] + (int32_t)a.s16[9];
t.s32[5] = (int32_t)a.s16[10] + (int32_t)a.s16[11];
t.s32[6] = (int32_t)a.s16[12] + (int32_t)a.s16[13];
t.s32[7] = (int32_t)a.s16[14] + (int32_t)a.s16[15];
return t;
}
SIMD_INLINE c_v256 c_v256_sub_8(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_sub_8(a.v128[1], b.v128[1]),
c_v128_sub_8(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_ssub_u8(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_ssub_u8(a.v128[1], b.v128[1]),
c_v128_ssub_u8(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_ssub_s8(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_ssub_s8(a.v128[1], b.v128[1]),
c_v128_ssub_s8(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_sub_16(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_sub_16(a.v128[1], b.v128[1]),
c_v128_sub_16(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_ssub_s16(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_ssub_s16(a.v128[1], b.v128[1]),
c_v128_ssub_s16(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_sub_32(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_sub_32(a.v128[1], b.v128[1]),
c_v128_sub_32(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_abs_s16(c_v256 a) {
return c_v256_from_v128(c_v128_abs_s16(a.v128[1]), c_v128_abs_s16(a.v128[0]));
}
SIMD_INLINE c_v256 c_v256_mul_s16(c_v128 a, c_v128 b) {
c_v128 lo_bits = c_v128_mullo_s16(a, b);
c_v128 hi_bits = c_v128_mulhi_s16(a, b);
return c_v256_from_v128(c_v128_ziphi_16(hi_bits, lo_bits),
c_v128_ziplo_16(hi_bits, lo_bits));
}
SIMD_INLINE c_v256 c_v256_mullo_s16(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_mullo_s16(a.v128[1], b.v128[1]),
c_v128_mullo_s16(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_mulhi_s16(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_mulhi_s16(a.v128[1], b.v128[1]),
c_v128_mulhi_s16(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_mullo_s32(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_mullo_s32(a.v128[1], b.v128[1]),
c_v128_mullo_s32(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_madd_s16(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_madd_s16(a.v128[1], b.v128[1]),
c_v128_madd_s16(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_madd_us8(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_madd_us8(a.v128[1], b.v128[1]),
c_v128_madd_us8(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_avg_u8(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_avg_u8(a.v128[1], b.v128[1]),
c_v128_avg_u8(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_rdavg_u8(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_rdavg_u8(a.v128[1], b.v128[1]),
c_v128_rdavg_u8(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_avg_u16(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_avg_u16(a.v128[1], b.v128[1]),
c_v128_avg_u16(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_min_u8(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_min_u8(a.v128[1], b.v128[1]),
c_v128_min_u8(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_max_u8(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_max_u8(a.v128[1], b.v128[1]),
c_v128_max_u8(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_min_s8(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_min_s8(a.v128[1], b.v128[1]),
c_v128_min_s8(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_max_s8(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_max_s8(a.v128[1], b.v128[1]),
c_v128_max_s8(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_min_s16(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_min_s16(a.v128[1], b.v128[1]),
c_v128_min_s16(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_max_s16(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_max_s16(a.v128[1], b.v128[1]),
c_v128_max_s16(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_ziplo_8(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_ziphi_8(a.v128[0], b.v128[0]),
c_v128_ziplo_8(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_ziphi_8(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_ziphi_8(a.v128[1], b.v128[1]),
c_v128_ziplo_8(a.v128[1], b.v128[1]));
}
SIMD_INLINE c_v256 c_v256_ziplo_16(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_ziphi_16(a.v128[0], b.v128[0]),
c_v128_ziplo_16(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_ziphi_16(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_ziphi_16(a.v128[1], b.v128[1]),
c_v128_ziplo_16(a.v128[1], b.v128[1]));
}
SIMD_INLINE c_v256 c_v256_ziplo_32(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_ziphi_32(a.v128[0], b.v128[0]),
c_v128_ziplo_32(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_ziphi_32(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_ziphi_32(a.v128[1], b.v128[1]),
c_v128_ziplo_32(a.v128[1], b.v128[1]));
}
SIMD_INLINE c_v256 c_v256_ziplo_64(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_ziphi_64(a.v128[0], b.v128[0]),
c_v128_ziplo_64(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_ziphi_64(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_ziphi_64(a.v128[1], b.v128[1]),
c_v128_ziplo_64(a.v128[1], b.v128[1]));
}
SIMD_INLINE c_v256 c_v256_ziplo_128(c_v256 a, c_v256 b) {
return c_v256_from_v128(a.v128[0], b.v128[0]);
}
SIMD_INLINE c_v256 c_v256_ziphi_128(c_v256 a, c_v256 b) {
return c_v256_from_v128(a.v128[1], b.v128[1]);
}
SIMD_INLINE c_v256 c_v256_zip_8(c_v128 a, c_v128 b) {
return c_v256_from_v128(c_v128_ziphi_8(a, b), c_v128_ziplo_8(a, b));
}
SIMD_INLINE c_v256 c_v256_zip_16(c_v128 a, c_v128 b) {
return c_v256_from_v128(c_v128_ziphi_16(a, b), c_v128_ziplo_16(a, b));
}
SIMD_INLINE c_v256 c_v256_zip_32(c_v128 a, c_v128 b) {
return c_v256_from_v128(c_v128_ziphi_32(a, b), c_v128_ziplo_32(a, b));
}
SIMD_INLINE c_v256 _c_v256_unzip_8(c_v256 a, c_v256 b, int mode) {
c_v256 t;
int i;
if (mode) {
for (i = 0; i < 16; i++) {
t.u8[i] = a.u8[i * 2 + 1];
t.u8[i + 16] = b.u8[i * 2 + 1];
}
} else {
for (i = 0; i < 16; i++) {
t.u8[i] = b.u8[i * 2];
t.u8[i + 16] = a.u8[i * 2];
}
}
return t;
}
SIMD_INLINE c_v256 c_v256_unziplo_8(c_v256 a, c_v256 b) {
return CONFIG_BIG_ENDIAN ? _c_v256_unzip_8(a, b, 1)
: _c_v256_unzip_8(a, b, 0);
}
SIMD_INLINE c_v256 c_v256_unziphi_8(c_v256 a, c_v256 b) {
return CONFIG_BIG_ENDIAN ? _c_v256_unzip_8(b, a, 0)
: _c_v256_unzip_8(b, a, 1);
}
SIMD_INLINE c_v256 _c_v256_unzip_16(c_v256 a, c_v256 b, int mode) {
c_v256 t;
int i;
if (mode) {
for (i = 0; i < 8; i++) {
t.u16[i] = a.u16[i * 2 + 1];
t.u16[i + 8] = b.u16[i * 2 + 1];
}
} else {
for (i = 0; i < 8; i++) {
t.u16[i] = b.u16[i * 2];
t.u16[i + 8] = a.u16[i * 2];
}
}
return t;
}
SIMD_INLINE c_v256 c_v256_unziplo_16(c_v256 a, c_v256 b) {
return CONFIG_BIG_ENDIAN ? _c_v256_unzip_16(a, b, 1)
: _c_v256_unzip_16(a, b, 0);
}
SIMD_INLINE c_v256 c_v256_unziphi_16(c_v256 a, c_v256 b) {
return CONFIG_BIG_ENDIAN ? _c_v256_unzip_16(b, a, 0)
: _c_v256_unzip_16(b, a, 1);
}
SIMD_INLINE c_v256 _c_v256_unzip_32(c_v256 a, c_v256 b, int mode) {
c_v256 t;
if (mode) {
t.u32[7] = b.u32[7];
t.u32[6] = b.u32[5];
t.u32[5] = b.u32[3];
t.u32[4] = b.u32[1];
t.u32[3] = a.u32[7];
t.u32[2] = a.u32[5];
t.u32[1] = a.u32[3];
t.u32[0] = a.u32[1];
} else {
t.u32[7] = a.u32[6];
t.u32[6] = a.u32[4];
t.u32[5] = a.u32[2];
t.u32[4] = a.u32[0];
t.u32[3] = b.u32[6];
t.u32[2] = b.u32[4];
t.u32[1] = b.u32[2];
t.u32[0] = b.u32[0];
}
return t;
}
SIMD_INLINE c_v256 c_v256_unziplo_32(c_v256 a, c_v256 b) {
return CONFIG_BIG_ENDIAN ? _c_v256_unzip_32(a, b, 1)
: _c_v256_unzip_32(a, b, 0);
}
SIMD_INLINE c_v256 c_v256_unziphi_32(c_v256 a, c_v256 b) {
return CONFIG_BIG_ENDIAN ? _c_v256_unzip_32(b, a, 0)
: _c_v256_unzip_32(b, a, 1);
}
SIMD_INLINE c_v256 c_v256_unpack_u8_s16(c_v128 a) {
return c_v256_from_v128(c_v128_unpackhi_u8_s16(a), c_v128_unpacklo_u8_s16(a));
}
SIMD_INLINE c_v256 c_v256_unpacklo_u8_s16(c_v256 a) {
return c_v256_from_v128(c_v128_unpackhi_u8_s16(a.v128[0]),
c_v128_unpacklo_u8_s16(a.v128[0]));
}
SIMD_INLINE c_v256 c_v256_unpackhi_u8_s16(c_v256 a) {
return c_v256_from_v128(c_v128_unpackhi_u8_s16(a.v128[1]),
c_v128_unpacklo_u8_s16(a.v128[1]));
}
SIMD_INLINE c_v256 c_v256_pack_s32_s16(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_pack_s32_s16(a.v128[1], a.v128[0]),
c_v128_pack_s32_s16(b.v128[1], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_pack_s16_u8(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_pack_s16_u8(a.v128[1], a.v128[0]),
c_v128_pack_s16_u8(b.v128[1], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_pack_s16_s8(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_pack_s16_s8(a.v128[1], a.v128[0]),
c_v128_pack_s16_s8(b.v128[1], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_unpack_u16_s32(c_v128 a) {
return c_v256_from_v128(c_v128_unpackhi_u16_s32(a),
c_v128_unpacklo_u16_s32(a));
}
SIMD_INLINE c_v256 c_v256_unpack_s16_s32(c_v128 a) {
return c_v256_from_v128(c_v128_unpackhi_s16_s32(a),
c_v128_unpacklo_s16_s32(a));
}
SIMD_INLINE c_v256 c_v256_unpacklo_u16_s32(c_v256 a) {
return c_v256_from_v128(c_v128_unpackhi_u16_s32(a.v128[0]),
c_v128_unpacklo_u16_s32(a.v128[0]));
}
SIMD_INLINE c_v256 c_v256_unpacklo_s16_s32(c_v256 a) {
return c_v256_from_v128(c_v128_unpackhi_s16_s32(a.v128[0]),
c_v128_unpacklo_s16_s32(a.v128[0]));
}
SIMD_INLINE c_v256 c_v256_unpackhi_u16_s32(c_v256 a) {
return c_v256_from_v128(c_v128_unpackhi_u16_s32(a.v128[1]),
c_v128_unpacklo_u16_s32(a.v128[1]));
}
SIMD_INLINE c_v256 c_v256_unpackhi_s16_s32(c_v256 a) {
return c_v256_from_v128(c_v128_unpackhi_s16_s32(a.v128[1]),
c_v128_unpacklo_s16_s32(a.v128[1]));
}
SIMD_INLINE c_v256 c_v256_shuffle_8(c_v256 a, c_v256 pattern) {
c_v256 t;
int c;
for (c = 0; c < 32; c++) {
if (pattern.u8[c] & ~31) {
fprintf(stderr, "Undefined v256_shuffle_8 index %d/%d\n", pattern.u8[c],
c);
abort();
}
t.u8[c] = a.u8[CONFIG_BIG_ENDIAN ? 31 - (pattern.u8[c] & 31)
: pattern.u8[c] & 31];
}
return t;
}
// Pairwise / dual-lane shuffle: shuffle two 128-bit lanes.
SIMD_INLINE c_v256 c_v256_pshuffle_8(c_v256 a, c_v256 pattern) {
return c_v256_from_v128(
c_v128_shuffle_8(c_v256_high_v128(a), c_v256_high_v128(pattern)),
c_v128_shuffle_8(c_v256_low_v128(a), c_v256_low_v128(pattern)));
}
SIMD_INLINE c_v256 c_v256_cmpgt_s8(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_cmpgt_s8(a.v128[1], b.v128[1]),
c_v128_cmpgt_s8(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_cmplt_s8(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_cmplt_s8(a.v128[1], b.v128[1]),
c_v128_cmplt_s8(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_cmpeq_8(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_cmpeq_8(a.v128[1], b.v128[1]),
c_v128_cmpeq_8(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_cmpgt_s16(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_cmpgt_s16(a.v128[1], b.v128[1]),
c_v128_cmpgt_s16(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_cmplt_s16(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_cmplt_s16(a.v128[1], b.v128[1]),
c_v128_cmplt_s16(a.v128[0], b.v128[0]));
}
SIMD_INLINE c_v256 c_v256_cmpeq_16(c_v256 a, c_v256 b) {
return c_v256_from_v128(c_v128_cmpeq_16(a.v128[1], b.v128[1]),
c_v128_cmpeq_16(a.v128[0], b.v128[0]));
}
// Shift the whole 256-bit vector left by n bytes; bytes shifted out of the
// low 128-bit half carry into the high half.
SIMD_INLINE c_v256 c_v256_shl_n_byte(c_v256 a, const unsigned int n) {
  if (n < 16)
    return c_v256_from_v128(c_v128_or(c_v128_shl_n_byte(a.v128[1], n),
                                      c_v128_shr_n_byte(a.v128[0], 16 - n)),
                            c_v128_shl_n_byte(a.v128[0], n));
  else if (n > 16)
    return c_v256_from_v128(c_v128_shl_n_byte(a.v128[0], n - 16),
                            c_v128_zero());
  else
    return c_v256_from_v128(c_v256_low_v128(a), c_v128_zero());
}
// Shift the whole 256-bit vector right by n bytes; bytes shifted out of the
// high 128-bit half carry into the low half.
SIMD_INLINE c_v256 c_v256_shr_n_byte(c_v256 a, const unsigned int n) {
  if (n < 16)
    return c_v256_from_v128(c_v128_shr_n_byte(a.v128[1], n),
                            c_v128_or(c_v128_shr_n_byte(a.v128[0], n),
                                      c_v128_shl_n_byte(a.v128[1], 16 - n)));
  else if (n > 16)
    return c_v256_from_v128(c_v128_zero(),
                            c_v128_shr_n_byte(a.v128[1], n - 16));
  else
    return c_v256_from_v128(c_v128_zero(), c_v256_high_v128(a));
}
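// Treat a:b as one 512-bit value (a in the high half) and return the 256 bits
// starting c bytes above the bottom of b. c must be in the range 0..31.
SIMD_INLINE c_v256 c_v256_align(c_v256 a, c_v256 b, const unsigned int c) {
  if (simd_check && c > 31) {
    fprintf(stderr, "Error: undefined alignment %u\n", c);
    abort();
  }
  return c ? c_v256_or(c_v256_shr_n_byte(b, c), c_v256_shl_n_byte(a, 32 - c))
           : b;
}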
SIMD_INLINE c_v256 c_v256_shl_8(c_v256 a, const unsigned int c) {
return c_v256_from_v128(c_v128_shl_8(a.v128[1], c),
c_v128_shl_8(a.v128[0], c));
}
SIMD_INLINE c_v256 c_v256_shr_u8(c_v256 a, const unsigned int c) {
return c_v256_from_v128(c_v128_shr_u8(a.v128[1], c),
c_v128_shr_u8(a.v128[0], c));
}
SIMD_INLINE c_v256 c_v256_shr_s8(c_v256 a, const unsigned int c) {
return c_v256_from_v128(c_v128_shr_s8(a.v128[1], c),
c_v128_shr_s8(a.v128[0], c));
}
SIMD_INLINE c_v256 c_v256_shl_16(c_v256 a, const unsigned int c) {
return c_v256_from_v128(c_v128_shl_16(a.v128[1], c),
c_v128_shl_16(a.v128[0], c));
}
SIMD_INLINE c_v256 c_v256_shr_u16(c_v256 a, const unsigned int c) {
return c_v256_from_v128(c_v128_shr_u16(a.v128[1], c),
c_v128_shr_u16(a.v128[0], c));
}
SIMD_INLINE c_v256 c_v256_shr_s16(c_v256 a, const unsigned int c) {
return c_v256_from_v128(c_v128_shr_s16(a.v128[1], c),
c_v128_shr_s16(a.v128[0], c));
}
SIMD_INLINE c_v256 c_v256_shl_32(c_v256 a, const unsigned int c) {
return c_v256_from_v128(c_v128_shl_32(a.v128[1], c),
c_v128_shl_32(a.v128[0], c));
}
SIMD_INLINE c_v256 c_v256_shr_u32(c_v256 a, const unsigned int c) {
return c_v256_from_v128(c_v128_shr_u32(a.v128[1], c),
c_v128_shr_u32(a.v128[0], c));
}
SIMD_INLINE c_v256 c_v256_shr_s32(c_v256 a, const unsigned int c) {
return c_v256_from_v128(c_v128_shr_s32(a.v128[1], c),
c_v128_shr_s32(a.v128[0], c));
}
SIMD_INLINE c_v256 c_v256_shl_n_8(c_v256 a, const unsigned int n) {
return c_v256_shl_8(a, n);
}
SIMD_INLINE c_v256 c_v256_shl_n_16(c_v256 a, const unsigned int n) {
return c_v256_shl_16(a, n);
}
SIMD_INLINE c_v256 c_v256_shl_n_32(c_v256 a, const unsigned int n) {
return c_v256_shl_32(a, n);
}
SIMD_INLINE c_v256 c_v256_shr_n_u8(c_v256 a, const unsigned int n) {
return c_v256_shr_u8(a, n);
}
SIMD_INLINE c_v256 c_v256_shr_n_u16(c_v256 a, const unsigned int n) {
return c_v256_shr_u16(a, n);
}
SIMD_INLINE c_v256 c_v256_shr_n_u32(c_v256 a, const unsigned int n) {
return c_v256_shr_u32(a, n);
}
SIMD_INLINE c_v256 c_v256_shr_n_s8(c_v256 a, const unsigned int n) {
return c_v256_shr_s8(a, n);
}
SIMD_INLINE c_v256 c_v256_shr_n_s16(c_v256 a, const unsigned int n) {
return c_v256_shr_s16(a, n);
}
SIMD_INLINE c_v256 c_v256_shr_n_s32(c_v256 a, const unsigned int n) {
return c_v256_shr_s32(a, n);
}
#endif /* _V256_INTRINSICS_C_H */
