Compare commits

1130 Commits

Author SHA1 Message Date
Jingning Han
6515afc6b9 Merge "Add min_tx_size variable to recursive transform block partition system" into nextgenv2 2016-11-08 19:14:33 +00:00
Yaowu Xu
f6e958b604 Merge "Fix the bug that PVQ commit broke dering" into nextgenv2 2016-11-08 18:00:53 +00:00
Angie Chiang
13ea019574 Merge changes Ib9428dc9,Ide04717a,If1dba7d8,I6da97880 into nextgenv2
* changes:
  Merge rd_stats only when it is valid
  Let parentheses in handle_inter_mode be symmetric
  Add RD_STATS into MB_MODE_INFO
  Add txb_coeff_cost_map
2016-11-08 17:42:04 +00:00
Jingning Han
e67b38aa7c Add min_tx_size variable to recursive transform block partition system
Replace max_tx_size with min_tx_size for transform type decision.

Change-Id: I64e39923a67903d52b381bd93eaac33b3400a201
2016-11-08 09:36:54 -08:00
Yushin Cho
48f84dbd1c Fix the bug that PVQ commit broke dering
Since PVQ's max block size equals the max transform size,
daala's definition of OD_BSIZE_MAX was changed from 5 down to 4 to
use AV1's max transform size 32x32. However, dering also uses
OD_BSIZE_MAX and assumes its value is 5, which caused dering
to stop working.

Change-Id: I9d82bb24adc7d57552a8e0a8a7e798e77d96fd4b
2016-11-08 08:15:57 -08:00
Nathan E. Egge
f0481a590f Use --enable-daala_ec by default.
Change-Id: I9e2a8db4e59cb9c109e978e473749ebc4e910148
2016-11-07 21:11:31 -08:00
Brennan Shacklett
e0b5ae8c4e Remove multiple coefficient buffers from PICK_MODE_CONTEXT
This commit is a manual cherry-pick from aom/master:
45592a39d3b00aee4d6bd70da669400017b7a5d8

Only part of the changes apply in nextgenv2

Change-Id: I1e22514c6fe5af556710254278f2f8a5805db999
2016-11-08 03:50:10 +00:00
Tom Finegan
973d4d56fa cmake: Add partial configure.
- Add minimal compiler flag testing.
- Generate aom_config.c and aom_config.h. Note: hard coded
  to generic-gnu values for now.
- Still a work in progress. This will not build anything.

BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=76

Change-Id: Id65b42ea9f4c4f744d788660e2de7234886ce039
2016-11-08 01:51:12 +00:00
Tom Finegan
9b3974ef34 aom_ports: Fix build in Xcode 8.
Use void casts and avoid unused/unnamed parameter warnings.

Change-Id: Id02ec2c613cb1423f693bcc56832ccd9b41d05bd
2016-11-08 01:50:54 +00:00
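The void-cast pattern named in the commit above can be sketched as follows. This is a minimal illustration rather than the code from aom_ports; the struct and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical timer type for illustration; not the real aom_ports struct. */
struct stub_timer {
  long begin;
  long end;
};

/* Stub for builds without an OS timer. The parameter keeps its name and the
 * void cast marks it as intentionally unused, which keeps
 * -Wunused-parameter quiet. */
static void stub_timer_start(struct stub_timer *t) { (void)t; }

/* Always reports zero elapsed time in the stub configuration. */
static long stub_timer_elapsed(struct stub_timer *t) {
  (void)t;
  return 0;
}
```

Keeping the parameter named and casting it to void works in both C and C++ translation units, whereas an unnamed parameter in a function definition is only accepted by C++ and by C23 or later.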
Yaowu Xu
fc1b213af1 Use block_size for max_scan_line in pvq decoding
Change-Id: I642bc205a7d2c4d472385fbeb4323e62e17984b4
2016-11-08 00:55:23 +00:00
Yaowu Xu
856c55e93d Add transform parameter initialization
The initialization of transform parameters was missing, which led to a
crash in the encoder.

Change-Id: I9e35830d5f24e771c845f0d8881671d6b7228c5e
2016-11-08 00:36:30 +00:00
Tristan Matthews
cc37e36683 pvq: drop unused declaration
Change-Id: I5d95bb897d335dc17aa0ae5e873ba7dee46c6fda
2016-11-07 22:14:50 +00:00
Yaowu Xu
f591782085 Fix compiler warning of out-of-bound array access
Change-Id: I00f147cd372cedc5038708b0f23f6fae68918528
2016-11-07 22:14:24 +00:00
Yaowu Xu
dc9720433f Merge "Fix compiler warning of un-used variables" into nextgenv2 2016-11-07 22:03:12 +00:00
Yaowu Xu
4007fa6852 Merge "change to call fwd_txfm()" into nextgenv2 2016-11-07 22:03:01 +00:00
Yaowu Xu
c4c21734d6 Merge "Resolve merge issues with --enable-pvq" into nextgenv2 2016-11-07 20:31:33 +00:00
Yaowu Xu
02d4c3b780 Fix compiler warning of un-used variables
Change-Id: I17d05bbf75a201fd010fc17e2d9bd0db8ef36d41
2016-11-07 19:56:13 +00:00
Yaowu Xu
3442b4b159 change to call fwd_txfm()
The transform functions have been refactored in nextgenv2, this commit
resolves the calls in pvq patch to use this new scheme.

Change-Id: I1b56e75106a3357bb19bd7df2b4ba305eb9ed185
2016-11-07 10:40:41 -08:00
Yaowu Xu
d6ea71cf73 Resolve merge issues with --enable-pvq
This commit resolves some compiling issues due to merge.

Change-Id: I0eef8aa36c404e185e0b0004948a49307c360d3e
2016-11-07 10:35:55 -08:00
Debargha Mukherjee
03dc29bdf3 Merge "Fix bug in bicubic filter in warped_motion.c" into nextgenv2 2016-11-07 17:58:47 +00:00
Yaowu Xu
00a0e010f7 Merge "New experiment: Perceptual Vector Quantization from Daala" into nextgenv2 2016-11-07 16:00:32 +00:00
David Barker
f23bdca6a8 Fix bug in bicubic filter in warped_motion.c
Previously, do_cubic_filter would return results with the
wrong precision if the sample point was exactly aligned to
a pixel.

Change-Id: I40139f9a6701a8e72e691f37bb352f7814a7f306
2016-11-07 13:47:13 +00:00
Yushin Cho
77bba8d30a New experiment: Perceptual Vector Quantization from Daala
PVQ replaces the scalar quantizer and coefficient coding with a new
design originally developed in Daala. It currently depends on the
Daala entropy coder although it could be adapted to work with another
entropy coder if needed:
./configure --enable-experimental --enable-daala_ec --enable-pvq

The version of PVQ in this commit is adapted from the following
revision of Daala:
fb51c1ade6

More information about PVQ:
- https://people.xiph.org/~jm/daala/pvq_demo/
- https://jmvalin.ca/papers/spie_pvq.pdf

The following files are copied as-is from Daala with minimal
adaptations, therefore we disable clang-format on those files
to make it easier to synchronize the AV1 and Daala codebases in the future:
 av1/common/generic_code.c
 av1/common/generic_code.h
 av1/common/laplace_tables.c
 av1/common/partition.c
 av1/common/partition.h
 av1/common/pvq.c
 av1/common/pvq.h
 av1/common/state.c
 av1/common/state.h
 av1/common/zigzag.h
 av1/common/zigzag16.c
 av1/common/zigzag32.c
 av1/common/zigzag4.c
 av1/common/zigzag64.c
 av1/common/zigzag8.c
 av1/decoder/decint.h
 av1/decoder/generic_decoder.c
 av1/decoder/laplace_decoder.c
 av1/decoder/pvq_decoder.c
 av1/decoder/pvq_decoder.h
 av1/encoder/daala_compat_enc.c
 av1/encoder/encint.h
 av1/encoder/generic_encoder.c
 av1/encoder/laplace_encoder.c
 av1/encoder/pvq_encoder.c
 av1/encoder/pvq_encoder.h

Known issues:
- Lossless mode is not supported, '--lossless=1' will give the same result as
'--end-usage=q --cq-level=1'.
- High bit depth is not supported by PVQ.

Change-Id: I1ae0d6517b87f4c1ccea944b2e12dc906979f25e
2016-11-06 22:18:01 -08:00
Angie Chiang
616990d607 Merge rd_stats only when it is valid
Change-Id: Ib9428dc9b6e224fdb5d410368c5b92042c96f68a
2016-11-06 15:25:37 -08:00
Angie Chiang
78a3bc165c Let parentheses in handle_inter_mode be symmetric
Change-Id: Ide04717a8ce2a7c1245f9614485647e296e96abd
2016-11-06 13:01:16 -08:00
Angie Chiang
9a44f5fbc8 Add RD_STATS into MB_MODE_INFO
With RD_STATS in MB_MODE_INFO, we will be able to compare the results
from rate-distortion loop and the results from bitstream packing.

Change-Id: If1dba7d87126577a6f369ac087d4517f7cebb0c5
2016-11-06 12:21:34 -08:00
Angie Chiang
85279f6668 Add txb_coeff_cost_map
The txb_coeff_cost_map is a 16x16 map that records the cost of each
transform block, indexed by the block's location in 4-pixel units, in
the recursive transform experiment.

Change-Id: I6da97880c457680594bca56617084010891beaa2
2016-11-06 11:55:17 -08:00
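The indexing described in the commit above can be sketched as follows. This is a rough illustration under stated assumptions: the container type and function names are hypothetical, and the 16x16 size is assumed to come from a 64x64 pixel area divided into 4-pixel units.

```c
#include <assert.h>
#include <string.h>

/* Assumed for this sketch: a 64x64 area in 4-pixel units gives 16 entries. */
#define TXB_MAP_SIZE 16

/* Hypothetical container used only in this sketch. */
typedef struct {
  int txb_coeff_cost_map[TXB_MAP_SIZE][TXB_MAP_SIZE];
} TxbCostMapSketch;

/* Reset every entry to zero. */
static void txb_map_clear(TxbCostMapSketch *m) { memset(m, 0, sizeof(*m)); }

/* Store the cost of the transform block whose top-left corner is at
 * (pixel_row, pixel_col), measured from the top-left of the mapped area. */
static void txb_map_record(TxbCostMapSketch *m, int pixel_row, int pixel_col,
                           int cost) {
  const int r = pixel_row >> 2; /* pixels to 4-pixel units */
  const int c = pixel_col >> 2;
  assert(r >= 0 && r < TXB_MAP_SIZE && c >= 0 && c < TXB_MAP_SIZE);
  m->txb_coeff_cost_map[r][c] = cost;
}
```

A transform block at pixel offset (8, 12) lands in entry [2][3] of the map.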
Debargha Mukherjee
92447f34df Merge "Increase gm precision from 16 to 32 bit ints" into nextgenv2 2016-11-06 10:08:02 +00:00
Debargha Mukherjee
5f305854e6 Increase gm precision from 16 to 32 bit ints
Change-Id: I7117a6c14dc8438e4225b50bd2d3ebbaa7f850cc
2016-11-05 16:50:08 -07:00
Tristan Matthews
ec994d8bbd accounting_test: fix read of uninitialized data
Only read bits that were actually written.

Change-Id: I6d901123c319a1d92c54f511d3caa56daf882281
2016-11-05 10:49:16 -07:00
Tristan Matthews
4891ef9ae0 boolcoder_test: fix read of uninitialized data
Only read bits that were actually written.

Change-Id: Id62c52b7804cbfb401e6e7388201406bc899ea5d
2016-11-05 10:48:55 -07:00
Tom Finegan
591fc6f1aa aom_ports: Silence warnings in aom_timer.h
When CONFIG_OS_SUPPORT is not enabled the aom_timer timer function
stubs cause unused parameter warnings. This comments out the arg
names and silences the warning.

Change-Id: I97bdbcbebdf081ac5cb2ffd86439028a1e672fa2
2016-11-05 10:48:36 -07:00
Yaowu Xu
deaff66955 Merge "Fix the bool coder test" into nextgenv2 2016-11-05 17:43:54 +00:00
James Zern
99ff89b6fb Merge "rdopt: clear maybe-uninitialized variable warning" into nextgenv2 2016-11-05 03:19:33 +00:00
Sarah Parker
70c4fab569 rdopt: clear maybe-uninitialized variable warning
av1/encoder/rdopt.c:9533 ‘zeromv[1].as_int’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
this was spurious given the logic in the if

Change-Id: I8ddfe7e46d1bf5593cc8624f05c9f181243a87d4
2016-11-04 17:56:23 -07:00
Yushin Cho
3e4fcb4ff4 Fix the bool coder test
Fix the bool coder test not to use a probability of 100%.

Change-Id: I799871cb0c48580edf0ee15a6c9931d27591ec99
(cherry picked from commit 9b79f6a3d6ea398e5d51d3d1dd69cbfb1725370e)
2016-11-04 16:07:00 -07:00
Jingning Han
713b56121a Merge "Clean up write_tx_type()" into nextgenv2 2016-11-04 23:03:53 +00:00
Jingning Han
0880b5466f Merge "Refactor tx_type reader" into nextgenv2 2016-11-04 23:03:47 +00:00
Jingning Han
05abee1530 Merge "Factor out common tx_type writing codes from inter/intra frame" into nextgenv2 2016-11-04 23:03:38 +00:00
Angie Chiang
7a77169a35 Merge changes Ia37f170d,Ie3082db5 into nextgenv2
* changes:
  Record YUV planes' txfm block coeff cost in handle_inter_mode()
  Separate coefficient cost of U/V planes in write_modes_b()
2016-11-04 22:58:58 +00:00
Angie Chiang
59526ead45 Merge changes I3bc782d6,I8359e849,Iae50d0b0,Id1704d88,Ia69f13c4, ... into nextgenv2
* changes:
  Add av1_ prefix on ###_rd_stats functions
  Use init_rd_stats() in encodeframe.c
  Add transform block coefficient cost in RD_STATS for debugging
  Add helper functions to modify RD_STATS
  Add mi_row and mi_col into mbmi to facilitate rd_debug process
  Add token cost comparison in write_modes_b()
2016-11-04 22:43:30 +00:00
James Zern
653fdd6d55 Merge changes I139808f4,I3d97d8db into nextgenv2
* changes:
  warped_motion.c: delete unused filter_4tap[]
  warped_motion.c: quiet float-conversion warnings
2016-11-04 22:34:11 +00:00
Angie Chiang
628d7c915b Record YUV planes' txfm block coeff cost in handle_inter_mode()
Change-Id: Ia37f170d8fd961d78a751d84b9525ab7e973b81a
2016-11-04 11:12:44 -07:00
Angie Chiang
c0feea8a0c Add av1_ prefix on ###_rd_stats functions
Change-Id: I3bc782d68bcd9b52b38210eec9eecb21146fde75
2016-11-04 11:12:44 -07:00
Angie Chiang
75f45814ba Separate coefficient cost of U/V planes in write_modes_b()
Change-Id: Ie3082db5b0fead8c322b2aeede4eff7cd723ea12
2016-11-04 11:12:44 -07:00
Angie Chiang
4695b97030 Use init_rd_stats() in encodeframe.c
Change-Id: I8359e8498efd301ff81eea1d7466d0f3fec5e006
2016-11-04 11:11:27 -07:00
Angie Chiang
d81fdb41e6 Add transform block coefficient cost in RD_STATS for debugging
Change-Id: Iae50d0b0c4f8f383ab4f91d2c1c2fa4e799c7250
2016-11-04 11:11:27 -07:00
Angie Chiang
d7246171b5 Add helper functions to modify RD_STATS
Those functions include
init_rd_stats()
invalid_rd_stats()
merge_rd_stats()

This CL helps simplify the code.

Change-Id: Id1704d883bd21a039b0478a940994ca14184ae1c
2016-11-04 11:11:27 -07:00
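A sketch of what three such helpers might look like is given below. The struct is a hypothetical stand-in for RD_STATS, its field set (rate, dist, sse, skip) is assumed from other entries in this log, and the validity guard in the merge is an assumption of this sketch, not the code from the commit.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for RD_STATS; the field set is an assumption. */
typedef struct {
  int rate;
  int64_t dist;
  int64_t sse;
  int skip;
} RdStatsSketch;

/* Start from an empty, valid accumulator. */
static void sketch_init_rd_stats(RdStatsSketch *s) {
  s->rate = 0;
  s->dist = 0;
  s->sse = 0;
  s->skip = 1;
}

/* Mark the stats as unusable by saturating every cost field. */
static void sketch_invalid_rd_stats(RdStatsSketch *s) {
  s->rate = INT_MAX;
  s->dist = INT64_MAX;
  s->sse = INT64_MAX;
  s->skip = 0;
}

/* In this sketch a saturated rate is the marker for invalid stats. */
static int sketch_rd_stats_valid(const RdStatsSketch *s) {
  return s->rate != INT_MAX;
}

/* Accumulate src into dst. If either side is invalid the result is
 * invalidated instead of summed, so saturated values are never added. */
static void sketch_merge_rd_stats(RdStatsSketch *dst,
                                  const RdStatsSketch *src) {
  if (!sketch_rd_stats_valid(dst) || !sketch_rd_stats_valid(src)) {
    sketch_invalid_rd_stats(dst);
    return;
  }
  dst->rate += src->rate;
  dst->dist += src->dist;
  dst->sse += src->sse;
  dst->skip &= src->skip;
}
```

Grouping the fields this way lets a caller pass one pointer through the search instead of separate rate, distortion, sse, and skip arguments.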
Angie Chiang
394c337754 Add mi_row and mi_col into mbmi to facilitate rd_debug process
Change-Id: Ia69f13c47f2dd34fabd220652691049166a06a68
2016-11-04 11:09:24 -07:00
Angie Chiang
d402282f69 Add token cost comparison in write_modes_b()
This is just a partial implementation.
Compare token cost of pack_mb_tokens/pack_txb_tokens with token cost
from rate-distortion loop. If there is any difference, dump out mode
info.

Change-Id: I46b373ee2522c5047f799f36baf7cec5fbc06f06
2016-11-04 11:09:24 -07:00
Jingning Han
4be3214fec Merge "Properly schedule the transform block recursion order" into nextgenv2 2016-11-04 17:53:53 +00:00
Jingning Han
641b1ad5ad Clean up write_tx_type()
Remove repeated mbmi->tx_size calls.

Change-Id: I3e4e03b69b2efffd860cc1ea34e150f4257bf081
2016-11-04 10:36:20 -07:00
Jingning Han
ab7163db08 Refactor tx_type reader
Factor out common code. Remove repeated mbmi->tx_size calls.

Change-Id: Id5de35e88f1a5f16223eaa06fc2c9f69124061ef
2016-11-04 10:35:34 -07:00
Jingning Han
2a4da9476b Factor out common tx_type writing codes from inter/intra frame
Change-Id: Id2626bd19db2504756d9a1dee709c2d08c79f771
2016-11-04 10:33:12 -07:00
Yue Chen
95a3898cbd Merge "Remove duplicated variables in EXT_INTER" into nextgenv2 2016-11-04 17:11:08 +00:00
Jingning Han
98d6a1f247 Properly schedule the transform block recursion order
This commit replaces the offset-based block index calculation with
an incremental one. It does not change the coding statistics.

Change-Id: I3789294eb45416bd0823e773ec30f05ed41ba0dc
2016-11-04 09:06:49 -07:00
Jingning Han
137b2671eb Fix format issue in handle_inter_mode()
Change-Id: I681fd799cf46991de419cc867ccb649a6990c19d
2016-11-04 08:31:24 -07:00
Debargha Mukherjee
68d695b7ca Merge "Further work on 64x64 fwd/inv transform support" into nextgenv2 2016-11-04 09:32:07 +00:00
Angie Chiang
e89ea0ceb7 Merge "Refactor: Replace rate dist sse skip by RD_STATS in VAR_TX" into nextgenv2 2016-11-04 05:42:59 +00:00
Debargha Mukherjee
21378b8ad0 Merge "Fix bilateral filter asan error for highbitdepth" into nextgenv2 2016-11-04 05:25:49 +00:00
James Zern
5d54c175c2 warped_motion.c: delete unused filter_4tap[]
Change-Id: I139808f492a9e9dcac44a36237b61231ede7edc3
2016-11-03 20:12:20 -07:00
James Zern
4846e446c6 warped_motion.c: quiet float-conversion warnings
Change-Id: I3d97d8db51a5a5d6b2c1cae47492b53ab37100a7
2016-11-03 20:11:06 -07:00
James Zern
005ff81598 Merge "warped_motion: Fix ubsan warning for signed integer overflow" into nextgenv2 2016-11-04 00:58:07 +00:00
James Zern
9371394492 Merge "Fix ubsan divide by zero warning in ransac" into nextgenv2 2016-11-04 00:56:23 +00:00
Sarah Parker
db92635745 warped_motion: Fix ubsan warning for signed integer overflow
Change-Id: Ie698aa02ef56128759c71079e9bfa1af25149644
2016-11-04 00:54:25 +00:00
Angie Chiang
b5dda4887b Refactor: Replace rate dist sse skip by RD_STATS in VAR_TX
This is to facilitate implementation of rd_debug tool; it doesn't change
coding behavior.

Change-Id: I0eb82b31473883ba6652ed11dca09b9ec4530183
2016-11-03 17:51:26 -07:00
Debargha Mukherjee
c57924cb9e Fix bilateral filter asan error for highbitdepth
BUG=webm:1334

Change-Id: I5886eec0a22a8cc056e1bdb493d2faf183816656
2016-11-03 16:23:09 -07:00
James Zern
97a2c675e7 Merge "rdopt,global-motion: Fix -1 indexing ubsan warning" into nextgenv2 2016-11-03 22:59:34 +00:00
Sarah Parker
182953b299 rdopt,global-motion: Fix -1 indexing ubsan warning
Change-Id: I1b3caf3543ab385f39f5f253c9949ad89ea5af7d
2016-11-03 22:58:47 +00:00
Alex Converse
5cb72a2dba Merge "Use TX_SIZES in intra_high_pred_fn declarations" into nextgenv2 2016-11-03 22:13:34 +00:00
Yue Chen
9d3e478e72 Remove duplicated variables in EXT_INTER
Introduced by merge commit 141f7a9

Change-Id: Idd68e09a6cd925d97466eabebe0e4905b5031340
2016-11-03 15:12:42 -07:00
Sarah Parker
b60c138cdf Merge "Make inline function static to fix clang compile error" into nextgenv2 2016-11-03 22:09:46 +00:00
Alex Converse
9613758e71 Merge "Don't use a TX_SIZE as a TX_TYPE" into nextgenv2 2016-11-03 21:44:21 +00:00
Debargha Mukherjee
6a47cff882 Further work on 64x64 fwd/inv transform support
For higher level fwd and inv transform functions.

Change-Id: I91518250a0be7d94aada7519f6c9e7ed024574fb
2016-11-03 14:32:54 -07:00
Sarah Parker
fa75ae0663 Fix ubsan divide by zero warning in ransac
Change-Id: I8c736ff665a27ce8307fd62571b9728333756d7e
2016-11-03 13:03:45 -07:00
Debargha Mukherjee
d65708a375 Merge "Replace hard coded numbers with TX_SIZES macro" into nextgenv2 2016-11-03 19:59:10 +00:00
Jingning Han
a504e77a98 Merge "Fix txb_w/h use case in av1_tx_block_rd_b()" into nextgenv2 2016-11-03 19:31:48 +00:00
Yaowu Xu
565f788de9 Merge "fix build issue with --enable-delta-q" into nextgenv2 2016-11-03 18:45:57 +00:00
Sarah Parker
fb3971e55a Make inline function static to fix clang compile error
Change-Id: I0432b8274a2764ba978dd6c4ed532fb7e4b7b519
2016-11-03 10:50:50 -07:00
Alex Converse
86b56742fb Use TX_SIZES in intra_high_pred_fn declarations
Change-Id: I078bb5244dbff153bcfab226206540ca6cebdad0
2016-11-03 10:28:11 -07:00
Alex Converse
f0ede18718 Don't use a TX_SIZE as a TX_TYPE
Change-Id: I26b02e6578ad2d82aadfe1df2aeb84e6c11a747b
2016-11-03 10:28:05 -07:00
Angie Chiang
2b10128a55 Add rd_debug flag
rd_debug is a debug tool aimed at finding discrepancies between the
rate-distortion loop and bitstream packing.

Change-Id: I751c4121516c5e6368668229c77778880a9dcb9d
2016-11-03 10:25:50 -07:00
Jingning Han
4b47c937d0 Fix txb_w/h use case in av1_tx_block_rd_b()
Match them with block_row/col index.

Change-Id: Idf0f924a093e5312b0a36b765d295e52d033eb5a
2016-11-03 09:20:08 -07:00
Yaowu Xu
5bb8f5b705 fix build issue with --enable-delta-q
BUG=webm:1330

Change-Id: I120ce8ea3581018b232b19ca7ffbb07d3e99d8d0
2016-11-03 09:03:39 -07:00
Debargha Mukherjee
e04fdb2308 Replace hard coded numbers with TX_SIZES macro
Replaces a couple of hard coded numbers with TX_SIZES macro
in common/reconintra.c

Change-Id: I8a2a53ca16bc3ab51409cec340bea55292ff2dee
2016-11-03 08:51:42 -07:00
Jingning Han
1b5bbf8e97 Merge "Refactor recursive transform block partition search" into nextgenv2 2016-11-03 15:41:57 +00:00
Jingning Han
07cfa29031 Merge "Make bit-stream support rectangular tx_size" into nextgenv2 2016-11-03 15:41:46 +00:00
Yaowu Xu
c1ca945ce5 Merge changes from topic 'update_dering' into nextgenv2
* changes:
  Reformatting the deringing code
  Introducing OD_DERING_SIZE_LOG2 constant (3)
  Renaming deringing blockwise write-back functions to make code clearer
  Deringing refactoring: replace last_sbc with simpler dering_left flag
  Getting rid of the od_dering_in type
2016-11-03 14:03:25 +00:00
Yaowu Xu
7036aee1a4 Merge "Refactoring deringed block list code" into nextgenv2 2016-11-03 13:48:58 +00:00
Yaowu Xu
71c72561fa Merge "Deringing line buffer no longer depends on holding OD_DERING_VERY_LARGE" into nextgenv2 2016-11-03 05:02:32 +00:00
Jingning Han
e60d3294ea Merge "Make recursive txfm encoding process support rectangular tx_size" into nextgenv2 2016-11-03 04:36:55 +00:00
Jingning Han
141f7a9757 Merge "Fix a merge bug between dual_filter and sub8x8mc" into nextgenv2 2016-11-03 01:06:39 +00:00
Jingning Han
1e477f9833 Merge "Remove redundant experimental flags from common_data.h" into nextgenv2 2016-11-03 01:04:45 +00:00
Jingning Han
18482fe32d Refactor recursive transform block partition search
This commit refactors the recursive transform block partition
search process to make it support rectangular transform block size
coding.

Change-Id: I0207ae40d83c7eae3cb5d460e403f470747590d3
2016-11-02 17:03:09 -07:00
Jingning Han
f64062f36f Make bit-stream support rectangular tx_size
Allow the transform size writing, reading, and the reconstruction
process to support rectangular transform block size coding.

Change-Id: I57393c73ec60835a088d785ca838d7e3d7eb29a4
2016-11-02 16:24:20 -07:00
Jean-Marc Valin
39d92a071d Reformatting the deringing code
Manually removed the "clang-format off" lines. The rest is done by clang

Change-Id: I88a2028b55a541729b4e8896cdf66b544e9898bb
2016-11-02 15:51:00 -07:00
Jean-Marc Valin
e04650347c Refactoring deringed block list code
Using a struct named dlist rather than an array named bskip. Simplified some
code.

No change in output

Change-Id: Id40d40b19b5d8f2ebafe347590fa1bb8cb80e6e1
2016-11-02 15:51:00 -07:00
Jean-Marc Valin
01b7780154 Deringing line buffer no longer depends on holding OD_DERING_VERY_LARGE
The OD_DERING_VERY_LARGE values are now explicitly copied to the buffer instead
of being read from the line buffer when we're on the edge of the frame. This
will make it possible to make the line buffer 8-bit for non-high-bitdepth.

No change in output

Change-Id: I1a4134d67ac7f8c239f08d73941405c56f01050b
2016-11-02 15:51:00 -07:00
Jean-Marc Valin
e254241ce7 Introducing OD_DERING_SIZE_LOG2 constant (3)
Also cleans up the size of the deringing destination buffer.

No change in output.

Change-Id: I7fc50d862d3906ce809c1031bf0789acdf39cf34
2016-11-02 15:51:00 -07:00
Jean-Marc Valin
58fdec2cbf Renaming deringing blockwise write-back functions to make code clearer
No change in output.

Change-Id: Ifa5df3adce9f24ef6dcd89a5f33a744bfb57194d
2016-11-02 15:51:00 -07:00
Jean-Marc Valin
3544d15130 Deringing refactoring: replace last_sbc with simpler dering_left flag
No change in output.

Change-Id: I1cc2e14b2bb6c343baa7f88348c875085e5863af
2016-11-02 15:51:00 -07:00
Jean-Marc Valin
39ee109333 Getting rid of the od_dering_in type
We no longer need the deringing code to be generic wrt the input depth.

No change in output.

Change-Id: I2db2beb82f1816e611cd2c0438dff217d363de33
2016-11-02 15:51:00 -07:00
Jingning Han
fee498255d Merge "Remove unused get_intra/inter_scan() from scan.h" into nextgenv2 2016-11-02 22:50:59 +00:00
Jingning Han
1a0faab642 Merge "Remove redundant config flags from get_entropy_context" into nextgenv2 2016-11-02 22:50:49 +00:00
Jean-Marc Valin
d95322a35c Now using a single line buffer
No change in output.

Change-Id: I4701a5517fb97889f970acfb0b44cee51c34fd95
2016-11-02 22:50:10 +00:00
Jean-Marc Valin
621e707259 Only copy data from deringed blocks to the line buffer
No change in output

Change-Id: I6ec4a8c635337562170153585e427afd6f9d9a0f
2016-11-02 22:49:54 +00:00
Jean-Marc Valin
50bb32ec87 Splitting out 8->16 block copy code into copy_sb8_16()
No change in output.

Change-Id: I4f0e37a879432e2647b3debe6a2c0c670a79dd6f
2016-11-02 22:49:39 +00:00
Jean-Marc Valin
39b0d2fb14 Eliminate the big superblock row buffer.
Now only buffering three lines across the entire frame and four lines
over the height of one superblock.

No change in output.

Change-Id: I6b99399974e197dc02f2e4ff2e60cdd7fdaa2e43
2016-11-02 22:48:47 +00:00
Jean-Marc Valin
b154a24283 Making deringing buffer only one row of superblocks at a time
This introduces a line buffer that holds the last three lines of each original
row so that the next row can be deringed with the original input of the upper
row.

No change in output

Change-Id: I8fad3bc48745e9ce3e440289f453477a0c5442c0
2016-11-02 22:48:19 +00:00
Jingning Han
a9336328d4 Make recursive txfm encoding process support rectangular tx_size
This commit makes the encoding process of the recursive transform
block partition support both rectangular and square transform block
sizes as the starting point. If the coding block size is rectangular,
it would allow the transform block size to start from the largest
rectangular transform size, and recursively parse down to the selected
coding sizes.

Change-Id: I576628b9166565bada6a918f0a1e67849dfef4cd
2016-11-02 15:48:07 -07:00
Jean-Marc Valin
ca1eb5dc58 Duplicating deringing input superblock copy to make upcoming changes easier
No change in output

Change-Id: Iaa06043dcc31308c83f667424e5a83c2db50ed24
2016-11-02 22:46:53 +00:00
Jean-Marc Valin
8e941780be Using a uniform definition for "bsize" in deringing filter
No change in output

Change-Id: Ia3a1679aa75cb58f4bc6459791e061176eeafd52
2016-11-02 22:46:27 +00:00
Jean-Marc Valin
eab77ea936 Using the copied input for od_dir_find8()
No change in output

Change-Id: Iec1411c35bf175a462eade34e89a4c60eb2a1da4
2016-11-02 15:41:01 -07:00
Yaowu Xu
6285c6674d Merge "Increase deringing horizontal padding to 4 pixels on each side" into nextgenv2 2016-11-02 22:37:39 +00:00
Jean-Marc Valin
471687a9ac Increase deringing horizontal padding to 4 pixels on each side
This makes vectorization easier by having buffer lines be a multiple of 4.

No change in output

Change-Id: I7ec06e03a49554206af0a55aab03daccc411b50f
2016-11-02 22:37:35 +00:00
Yaowu Xu
4cb9a620db Merge "De-sparsifying the deringing output buffer" into nextgenv2 2016-11-02 22:35:05 +00:00
Jean-Marc Valin
82c65fc837 De-sparsifying the deringing output buffer
No change in output

Change-Id: I940203975564aedca8734d6f74b013edb513f517
2016-11-02 22:35:00 +00:00
Yaowu Xu
44f3587459 Merge "No need to store the deringing filter direction variance in an array" into nextgenv2 2016-11-02 22:34:46 +00:00
Jingning Han
46003149e5 Fix a merge bug between dual_filter and sub8x8mc
The function module in inter_predictor() has been changed to
universally support arbitrary block size inter prediction. Hence
sub8x8mc can be a standalone experiment now.

Change-Id: Ie9d87f61fc317b1d114edb4e0bf5544f918ed08e
2016-11-02 14:57:11 -07:00
Jingning Han
d611808324 Remove redundant experimental flags from common_data.h
No coding statistics change.

Change-Id: I88cbb828308b5796a2e87079c2f1bf0dabd99a11
2016-11-02 14:51:23 -07:00
Jingning Han
c104b8f269 Merge "Support rectangular tx_size in the common lib" into nextgenv2 2016-11-02 21:49:41 +00:00
Jingning Han
e714e70f77 Merge "Support rectangular transform block units in the codebase" into nextgenv2 2016-11-02 21:49:18 +00:00
Jingning Han
a834925778 Merge "Make highbd rectangular transform block available in the common lib" into nextgenv2 2016-11-02 21:49:07 +00:00
Jean-Marc Valin
643902d621 No need to store the deringing filter direction variance in an array
No change in output

Change-Id: Ifa5c5d4ed33ff11ea3c56ee5d559c7a40599b3dc
2016-11-02 13:15:42 -07:00
Sarah Parker
fcb2ca6eda Merge "Fix ubsan left shift warnings in warped motion library" into nextgenv2 2016-11-02 19:45:45 +00:00
Jingning Han
653102ab1c Remove unused get_intra/inter_scan() from scan.h
Change-Id: I96fc1da1ce56593ae35ebbc93a668e4ba241234a
2016-11-02 12:00:51 -07:00
Jingning Han
8b5380ac77 Remove redundant config flags from get_entropy_context
The rectangular transform syntax is by default supported, hence
no need to put it under the experimental flag. This does not change
the coding statistics.

Change-Id: I3a147503d973a03400f8a86e11f07c7d754e6234
2016-11-02 11:48:39 -07:00
Jingning Han
9fe31390ca Support rectangular tx_size in the common lib
Change-Id: I4128ab932a967a3d657bb1f95f0fa2af20a06469
2016-11-02 11:48:31 -07:00
Jingning Han
4ba26dc0e1 Support rectangular transform block units in the codebase
Change-Id: I9183851258478a36dc5a4ad2d4faa3d3c8b18bd3
2016-11-02 11:47:48 -07:00
Jingning Han
5238e6eaee Make highbd rectangular transform block available in the common lib
Change-Id: Ief08b23b30b78d640f6d7c702145e5bcf1b37b57
2016-11-02 11:47:48 -07:00
Debargha Mukherjee
deef66db01 Merge "Adding 64x64 forward and inverse transforms" into nextgenv2 2016-11-02 18:40:55 +00:00
Yaowu Xu
1af3d51685 Merge changes I313bde67,I2ddc2d70,Ifb9094c3,I9051ed6e,I5681e332, ... into nextgenv2
* changes:
  Avoid the "initial copy" in the deringing filter
  Only copy the deringed blocks back into the buffer
  Reducing copies in deringing filter
  sb_all_skip_out() now computes a list of deringed blocks
  compute bskip as we go
  Revert "Fix dering filter when using 4:2:2 or 4:4:0 subsampling"
2016-11-02 18:02:56 +00:00
Debargha Mukherjee
67d134772c Adding 64x64 forward and inverse transforms
Change-Id: I213f3111fc0656aecd1303a8b871ecded2b92bc2
2016-11-02 09:48:46 -07:00
Zoe Liu
bd163bc199 Merge "Make a small code clean on handle_inter_mode()" into nextgenv2 2016-11-02 16:39:36 +00:00
Jingning Han
6a503e4110 Merge "Make rectangular transform block available in the common lib" into nextgenv2 2016-11-02 16:17:00 +00:00
Jingning Han
f8a29663be Merge "Simplify tx_size enums" into nextgenv2 2016-11-02 16:16:50 +00:00
Jean-Marc Valin
bcf3580b1e Avoid the "initial copy" in the deringing filter
No change in output

Change-Id: I313bde67e59835f88e3b2e6079b0df2d7ed1a903
2016-11-02 08:23:04 -07:00
Jean-Marc Valin
7618daa555 Only copy the deringed blocks back into the buffer
No change in output

Change-Id: I2ddc2d70c6534e7cfd315d66e838410677f91356
2016-11-02 08:22:58 -07:00
Jean-Marc Valin
cf23aefab5 Reducing copies in deringing filter
Only copy the modified pixels from the first filter back into the input of the
second filter.

Change-Id: Ifb9094c33c876a8c6caa0f68771fc7ef59c78b53
2016-11-02 08:22:51 -07:00
Jean-Marc Valin
3e44bccb50 sb_all_skip_out() now computes a list of deringed blocks
No change in output

Change-Id: I9051ed6e1fbca7d80412ba2b53f7aacbc3ef70eb
2016-11-02 08:22:45 -07:00
Jean-Marc Valin
71466d2288 compute bskip as we go
Change-Id: I5681e3329ad3677296161de59f5ff1236a14f086
2016-11-02 08:22:38 -07:00
Yaowu Xu
3e90f84a34 Revert "Fix dering filter when using 4:2:2 or 4:4:0 subsampling"
This reverts commit 401204a50b.

Change-Id: Id27eadf679b0df2d2ccfab61155be29979b0b6ba
2016-11-02 08:22:02 -07:00
Jingning Han
ec419e0771 Make rectangular transform block available in the common lib
This prepares the integration of rectangular transform block size
with recursive transform block partition system.

Change-Id: Id96aa3790dace15619c665f438241938992d1730
2016-11-01 22:25:54 -07:00
Yaowu Xu
f67e5eec8b Merge "Disable upsampled references for resolutions above 1080p." into nextgenv2 2016-11-02 04:28:51 +00:00
Jingning Han
aad298ffcf Simplify tx_size enums
Remove redundant experimental flag. This does not change the coding
statistics.

Change-Id: I35b3cb04025c5c2d2744312e5efc00d0473c990d
2016-11-01 21:12:55 -07:00
Yi Luo
fb77385fd0 Merge "Remove unused copies of transform related source code" into nextgenv2 2016-11-02 01:43:19 +00:00
Yi Luo
7f6bf9c70d Merge "Hybrid inverse transforms 16x16 AVX2 optimization" into nextgenv2 2016-11-02 01:43:02 +00:00
Jingning Han
9679464e28 Merge "Change TXFM_CONTEXT from TX_SIZE to uint8_t" into nextgenv2 2016-11-02 01:18:19 +00:00
Jingning Han
746e2220b5 Merge "Rework transform block partition context model" into nextgenv2 2016-11-02 01:18:13 +00:00
Thomas Daede
a9e96d4000 Disable upsampled references for resolutions above 1080p.
Upsampled references currently increase the size of references by
64 times. This patch limits the memory used by the encoder to
about 3GB when encoding high bit depth content.

This should be re-evaluated in the future, if doing 8-tap
resampling in the motion search becomes reasonably fast, or if
the upsampled references are reduced in size (by omitting some
subpel positions and interpolating them instead).

Change-Id: I6d84ff0d6202ec46f4fa53e268e68aa808e5df85
2016-11-01 17:39:16 -07:00
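The 64x growth mentioned in the commit above can be checked with simple arithmetic. The sketch below assumes the factor comes from 8x upsampling in each dimension (8 * 8 = 64); the function names are hypothetical and the calculation covers a single plane only, without borders or alignment padding.

```c
#include <stdint.h>

/* Assumed for this sketch: 8x upsampling per dimension, so 8 * 8 = 64. */
#define UPSAMPLE_FACTOR 8

/* Bytes needed for one plane at its native resolution. */
static uint64_t plane_bytes(int width, int height, int bytes_per_sample) {
  return (uint64_t)width * (uint64_t)height * (uint64_t)bytes_per_sample;
}

/* Bytes needed for the same plane after upsampling in both dimensions. */
static uint64_t upsampled_plane_bytes(int width, int height,
                                      int bytes_per_sample) {
  return plane_bytes(width, height, bytes_per_sample) * UPSAMPLE_FACTOR *
         UPSAMPLE_FACTOR;
}
```

For a 1920x1080 plane stored at 2 bytes per sample, this gives 265,420,800 bytes (about 253 MiB) for one upsampled plane, compared with 4,147,200 bytes at native size.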
Urvang Joshi
a5b09216b5 Merge "Revert of "Mark bogus palette color probabilities as zero"." into nextgenv2 2016-11-02 00:31:55 +00:00
Jingning Han
8b9478af1e Change TXFM_CONTEXT from TX_SIZE to uint8_t
Count the transform block partition context in the unit of pixels.

Change-Id: Ibb66f053526ed347ad0274b78db7ac35cc086b0e
2016-11-01 15:44:26 -07:00
Urvang Joshi
eb54e0cde8 Revert of "Mark bogus palette color probabilities as zero".
Reverted commit: f8306bfdc (with some changes).

Reason: This was triggering an assert in debug build because of zero
probability values. So, using an "UNUSED_PROB" macro to replace these to
retain clarity.

Assertion failure can be reproduced as follows:

$ make clean; extra_cflags='-O0 -g -fno-inline' ../../configure
--enable-debug --enable-experimental --enable-palette && make -j 16

$ ./aomenc -D --codec=av1 ~/videos/screen_content_set/gimp.y4m -o
/tmp/foo.webm --tune-content=screen --limit=50

Pass 1/2 frame   50/51      8976B    1436b/f   86169b/s 2902620 us
(17.23 fps)
Pass 2/2 frame   25/0          0B 2933053 us 8.52 fps [ETA  unknown]
aomenc: ../../av1/encoder/cost.c:46: cost: Assertion `prob != 0' failed.
Aborted (core dumped)

Change-Id: I47a76b8f415060909bc8448fae3002857eb61d8e
2016-11-01 15:25:57 -07:00
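The idea behind the placeholder macro in the commit above can be sketched as follows. The value 128 and the checking function are assumptions made for illustration; the only point taken from the log is that unused table entries must be nonzero so that a cost function asserting `prob != 0` never fires.

```c
#include <stdint.h>

/* Placeholder for table entries that are never read. The value 128 is an
 * assumption for this sketch; any nonzero probability avoids the assert. */
#define UNUSED_PROB 128

/* Returns 1 if every entry could safely be passed to a cost function that
 * asserts prob != 0, and 0 otherwise. Hypothetical helper. */
static int probs_are_costable(const uint8_t *probs, int count) {
  for (int i = 0; i < count; ++i) {
    if (probs[i] == 0) return 0;
  }
  return 1;
}
```

A row padded with zeros fails the check, while the same row padded with UNUSED_PROB passes and still reads clearly as "not a real probability".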
Yi Luo
ea1167c33f Remove unused copies of transform related source code
- Library size reduces: 165 kB, 292 kB (HBD).

Change-Id: I50cb630dde326bd2a28c0db4b7e2d53c2fd94a2a
2016-11-01 15:07:46 -07:00
Jingning Han
c8b8936fdc Rework transform block partition context model
This commit allows the partition context model to account for the
maximum transform block size of the coding block.

Change-Id: I22b91e85fff70faa974afd362ce327d3f2eda81d
2016-11-01 15:00:04 -07:00
Zoe Liu
82c8c92cc5 Make a small code clean on handle_inter_mode()
Change-Id: I5fb4898045a481f7996c2ad019d2f741aab08fc7
2016-11-01 14:52:34 -07:00
Yaowu Xu
57a7baf666 Merge "Fix merge issues related --enable-ec-adapt" into nextgenv2 2016-11-01 21:07:18 +00:00
Yaowu Xu
980eb2e9fa Merge "Change to use correct variable in for-loop" into nextgenv2 2016-11-01 21:07:11 +00:00
Yi Luo
7317200002 Hybrid inverse transforms 16x16 AVX2 optimization
- Add unit tests to verify the bit-exact result.
- User level time reduction (EXT_TX):
    encoder: 3.63%
    decoder: 2.36%
- Also add tx_type=V_DCT...H_FLIPADST SSE2 for 16x16 inv txfm.

Change-Id: Idc6d9e8254aa536e5f18a87fa0d37c6bd551c083
2016-11-01 13:38:20 -07:00
Yaowu Xu
8af861bbf1 Fix merge issues related --enable-ec-adapt
1. Avoid compiler warnings.
2. Enable prob_diff_update() required by update_txfm_probs().

Change-Id: I9081b645c55a8432bdaeb600e9ba901c0d0d96f5
2016-11-01 12:36:04 -07:00
Yaowu Xu
ddcdd5b1e5 Merge "Fix a compiler warning with --enable-adapt-scan" into nextgenv2 2016-11-01 18:12:49 +00:00
Yaowu Xu
2ce9707910 Merge "Resolve build issue --enable-aom-qm" into nextgenv2 2016-11-01 18:12:39 +00:00
Yaowu Xu
6043bfdb03 Change to use correct variable in for-loop
Change-Id: I252c2f06dfe256d2d33fd1abc42aaadf50273cc8
2016-11-01 09:54:05 -07:00
Jingning Han
ae81f8b2ab Merge "Make txfm_partition_update support rectangular tx_size" into nextgenv2 2016-11-01 16:51:03 +00:00
Jingning Han
2e4f129b42 Merge "Use get_entropy_context() in select_tx_block" into nextgenv2 2016-11-01 16:50:55 +00:00
Yaowu Xu
b386f0b762 Fix a compiler warning with --enable-adapt-scan
Change-Id: I93b191a522ed3e3ca9a363beab4292f64e869610
2016-11-01 09:40:12 -07:00
Yaowu Xu
a5924740a2 Resolve build issue --enable-aom-qm
Change-Id: I9f52ddb53b39cefd2e0ee7144203e1f3958d01aa
2016-11-01 09:32:03 -07:00
Yaowu Xu
fd601e346c Merge "Rename av1_convolve.[hc] to convolve.[hc]" into nextgenv2 2016-11-01 02:25:19 +00:00
Yaowu Xu
ec040fe23c Merge "cmake support: A starting point." into nextgenv2 2016-11-01 02:25:05 +00:00
Yaowu Xu
0279e91576 Merge "decodemv.c: relocate a function" into nextgenv2 2016-11-01 02:24:51 +00:00
Yaowu Xu
8040aaf9b1 Merge "Fix a bad merge" into nextgenv2 2016-11-01 02:24:38 +00:00
Yaowu Xu
6557ea9fe2 Rename av1_convolve.[hc] to convolve.[hc]
Change-Id: I2047adc4c147201ce0ce3c533fe2861cbff1002c
2016-10-31 17:17:37 -07:00
Jingning Han
7956bd64d7 Make txfm_partition_update support rectangular tx_size
Change-Id: I7d2414a8766141d5109b599271179bc505c772d3
2016-10-31 16:46:30 -07:00
Tom Finegan
fc6f23647d cmake support: A starting point.
Start adding cmake build support. This is based on the generic-gnu
target and will not build anything. It simply produces a project file
(when generating for an IDE) that can be loaded and that allows for
interaction with (most of) the aom sources used in a generic-gnu
build.

Notable missing pieces:
- flag testing
- config generation
- experiment configuration
- enable/disable encoder/decoder
- aomenc/aomdec
- all third party library build integration
- all tests

Change-Id: Iaeda0b03d58591a26a8fb54f63a2aa3b5354e3a6
2016-10-31 16:46:05 -07:00
Yaowu Xu
b24e115bc6 decodemv.c: relocate a function
Change-Id: I932dd9c8b43a20d248c00847b19dff88e6eb11be
2016-10-31 16:45:37 -07:00
Angie Chiang
fd248ab173 Merge "Refactor scan_test.cc" into nextgenv2 2016-10-31 23:31:18 +00:00
Jingning Han
ce059e86fb Use get_entropy_context() in select_tx_block
Replace redundant separate handling to retrieve the context value.

Change-Id: I18dde4599cd08ffe33a78694ec377487609de1b1
2016-10-31 16:27:28 -07:00
Yaowu Xu
e86288d2de Fix a bad merge
Change-Id: I4615e8e64d75b1f4277d2221ec94c5d4f1830aa4
2016-10-31 15:56:38 -07:00
Jingning Han
e29fc1daef Merge "Refactor max_blocks_wide/high computation" into nextgenv2 2016-10-31 22:20:16 +00:00
Jingning Han
609f5c63ac Merge "Remove unused tx_size tables" into nextgenv2 2016-10-31 22:20:08 +00:00
Jingning Han
6511159787 Merge "Replace get_tx2d_size() with direct tx_size_2d[] table access" into nextgenv2 2016-10-31 22:19:59 +00:00
Jingning Han
6491b97350 Merge "Support rectangular tx_size in recursive txfm syntax coding" into nextgenv2 2016-10-31 22:19:46 +00:00
Yaowu Xu
292bd65510 Merge changes from topic 'fix_ec_adapt' into nextgenv2
* changes:
  Reverse order of CLPF and dering
  Refactor: read_tx_size_probs()
  Fix compiling issues with --enable-ec-adapt
  Fixes compilation error on Windows/Visual Studio
2016-10-31 22:18:52 +00:00
Angie Chiang
ec932242b3 Refactor scan_test.cc
Change-Id: I546a955a95d6d43182631ad5e8d1c137c36e9a0c
2016-10-31 13:43:51 -07:00
Steinar Midtskogen
5d56f4d69a Reverse order of CLPF and dering
Low latency:
PSNR YCbCr:     -0.15%      0.11%      0.12%
   PSNRHVS:     -0.25%
      SSIM:     -0.26%
    MSSSIM:     -0.26%
 CIEDE2000:     -0.03%

High latency:
PSNR YCbCr:     -0.18%      0.18%      0.07%
   PSNRHVS:     -0.20%
      SSIM:     -0.21%
    MSSSIM:     -0.21%
 CIEDE2000:     -0.03%

Change-Id: Ieb86d9ba353220de6454bdc15cea825944b6385b
2016-10-31 12:50:11 -07:00
Jingning Han
f65b870e27 Refactor max_blocks_wide/high computation
Factor out common code that shows up in multiple places.

Change-Id: I0a72213a151f74bdad926d59f86f0a28d00968fc
2016-10-31 12:39:36 -07:00
Jingning Han
393a60d208 Remove unused tx_size tables
Change-Id: I04367fb68e8fd027f4b9d945f4001e5ab346d098
2016-10-31 12:39:33 -07:00
Jingning Han
7e9929736c Replace get_tx2d_size() with direct tx_size_2d[] table access
Change-Id: I20040cdb5d9fdbf6c50082e5e17b4cfbd1926b13
2016-10-31 12:39:33 -07:00
Jingning Han
42a0fb369d Support rectangular tx_size in recursive txfm syntax coding
Change-Id: I40aa342ffa5b6effe8b124b94783e5f0bd2f2a81
2016-10-31 12:38:07 -07:00
Jingning Han
a98d80fdaa Merge "Use the actual transform block size for loop filter selection" into nextgenv2 2016-10-31 19:09:07 +00:00
Yaowu Xu
efc7535beb Refactor: read_tx_size_probs()
Change-Id: Ibdedd9b8e0b6646b882bc159856ac7c7e7073149
2016-10-31 09:46:42 -07:00
Yaowu Xu
750955b4c1 mvref_common.c: apply clang-format
Change-Id: I755bfb11a57e92e3a68855a53e95efe526f198fd
2016-10-31 09:13:53 -07:00
Yaowu Xu
1aceffa06c Fix compiling issues with --enable-ec-adapt
Change-Id: I52e2c84ce43d36f78806c54b214f9e5b07c5f0f5
2016-10-31 09:13:53 -07:00
Arild Fuldseth (arilfuld)
59622cf292 Fixes compilation error on Windows/Visual Studio
Change-Id: I32377deb5f1e882370c70449cb8f68f2fdafcbef
2016-10-31 09:13:53 -07:00
Yaowu Xu
09a4265725 Merge "simp-mv-pred integration with ref-mv" into nextgenv2 2016-10-30 19:54:41 +00:00
Yaowu Xu
aa70234e82 Merge "Fix the top-right reference block location" into nextgenv2 2016-10-30 19:54:23 +00:00
Yaowu Xu
ca5e18b750 Merge "Upsample reference frames after size dependent speed features are calculated." into nextgenv2 2016-10-30 19:54:17 +00:00
Yaowu Xu
e7a64cc9ec Merge "Let is_interp_needed always return 1" into nextgenv2 2016-10-30 19:53:40 +00:00
Yaowu Xu
02d33bfeeb Merge "Centralize EC_MULTISYMBOL error checking." into nextgenv2 2016-10-30 19:53:29 +00:00
Yaowu Xu
2a67024991 Merge "EC_ADAPT: disable tests requiring tiles." into nextgenv2 2016-10-30 17:16:34 +00:00
Yaowu Xu
a2d2a1858e Merge "EC_ADAPT: refactor and fix MinArfFreq unit tests." into nextgenv2 2016-10-30 17:16:19 +00:00
Yaowu Xu
99be652acb Merge "Only build aom_read/write_symbol if CONFIG_EC_MULTISYMBOL" into nextgenv2 2016-10-30 17:16:06 +00:00
Yaowu Xu
06a5ea9617 Merge "EC_ADAPT: improved symbol adaptation." into nextgenv2 2016-10-30 17:15:54 +00:00
Yaowu Xu
46fcecc395 Merge "EC_ADAPT: send updates for the correct nodes." into nextgenv2 2016-10-30 17:15:40 +00:00
Yaowu Xu
2ae4214618 Merge "Add ec_multisymbol for common daala_ec and rans code" into nextgenv2 2016-10-30 17:15:28 +00:00
Yaowu Xu
58eeb100ab Merge "Handle entropy coder experiment dependencies" into nextgenv2 2016-10-30 16:22:27 +00:00
Yaowu Xu
28adf035df Merge "Disable the SuperframeTest with --enable-daala_ec." into nextgenv2 2016-10-30 16:22:16 +00:00
Yaowu Xu
cfab447bf8 Merge "Fix ec_adapt+daala_ec test failure" into nextgenv2 2016-10-30 16:22:05 +00:00
Yaowu Xu
eaafb17d41 Merge "Add EC_ADAPT experiment for symbol-adaptive entropy coding." into nextgenv2 2016-10-30 16:21:47 +00:00
Deng
ca8d24d4e1 simp-mv-pred integration with ref-mv
This commit adds the simp-mv-pred experiment. The experiment works on
top of the ref-mv experiment to save memory bandwidth and reduce the
size of the line buffer needed by the ref-mv experiment.

When compared to ref-mv, this experiment showed:
low-delay BDR gain: 0.03%
High-delay BDR gain: 0.01%
memory/memory bandwidth saving: 40%
local memory/gate count saving: 20%

Change-Id: Ic4006e041fc58ede411da83d0d730c464ebe1749
2016-10-29 22:26:48 -07:00
Jingning Han
ea9cf097c9 Fix the top-right reference block location
This commit fixes the top-right reference block location for block
sizes above 8x8. It improves the coding performance of ref-mv:

lowres 0.08%
midres 0.15%

Thanks to jiafeng@ for finding this issue.

Change-Id: I70750fc7b18bf0126d3e07abc1b63ca5a160193e
2016-10-29 22:26:48 -07:00
Thomas Daede
919bd6abd7 Upsample reference frames after size dependent speed features are calculated.
This prevents a crash if the upsample_refs speed feature is
changed as part of set_size_dependent_vars, when the recode
loop is enabled.

Change-Id: I645e389bfe961879dd2001439a34fde2993868d9
2016-10-29 22:26:48 -07:00
Angie Chiang
a69ce1b314 Let is_interp_needed always return 1
This CL will cause
0.122% PSNR drop on lowres dataset
0.059% PSNR drop on midres dataset

However, it will facilitate hardware implementation.

Change-Id: I0a0713acacbfd571509a721337711c021915dd3c
2016-10-29 22:26:48 -07:00
Nathan E. Egge
baaaa16186 Centralize EC_MULTISYMBOL error checking.
The EC_ADAPT experiment cannot work unless EC_MULTISYMBOL is also
 enabled.
This patch replaces all individual checks with a centralized check in
 both the bitreader.h and bitwriter.h.

Change-Id: I418852d95c5012cc074ed65cd24997e08bc2aadd
2016-10-29 22:26:27 -07:00
Thomas Davies
0575e6c2d4 EC_ADAPT: disable tests requiring tiles.
EC_ADAPT is currently not compatible with tiles.

Change-Id: Idd000f0ff23c28e7e4952024eadb55ba0a1da13d
2016-10-29 22:22:19 -07:00
Thomas Davies
6519bebf34 EC_ADAPT: refactor and fix MinArfFreq unit tests.
Ensure that cdfs are synced with pdfs after every
forward update.

Change-Id: I5677f78300156c8622f1728d7a343ff6c3a4ea64
2016-10-29 22:21:32 -07:00
Alex Converse
58c520afe9 Only build aom_read/write_symbol if CONFIG_EC_MULTISYMBOL
Change-Id: If86c7220ac9199a59e605dc43d42cc3db26cf8bd
2016-10-29 17:05:40 -07:00
Thomas Davies
f6c04acaa3 EC_ADAPT: improved symbol adaptation.
Place a floor under symbol probabilities and
modify adaptation rate.

Change-Id: Ic9cf6d9fadfc3bf1f3027bc3d2bb198526441591
2016-10-29 17:05:40 -07:00
Thomas Davies
09ebbfb39f EC_ADAPT: send updates for the correct nodes.
EOB and ZERO token are not currently adapted.

Change-Id: Ie7d657b71fcb157b09e40874fb06a8b7cd95cc70
2016-10-29 17:05:40 -07:00
Alex Converse
aca9feba82 Add ec_multisymbol for common daala_ec and rans code
The new ec_multisymbol experiment supersedes the rans experiment and is
used for multisymbol features that can be backed by either daala_ec or
rans.

This experiment is automatically enabled by ec_adapt and will try to
enable daala_ec or ans (in that order).

Change-Id: Ie75b4002b7a9d7f5f7b4d130c1aacb3dbe97e54f
2016-10-29 17:05:40 -07:00
Alex Converse
242558a21b Handle entropy coder experiment dependencies
Change-Id: I854c53d9379f820b5a78fcb53f9ef09bc6f9d9e7
2016-10-29 17:05:40 -07:00
Yaowu Xu
15c1aa60f3 Disable the SuperframeTest with --enable-daala_ec.
Due to the way the daala entropy coder handles raw bits, the current
test is broken because the buffer length is not known when
aom_reader_init() is called.

Change-Id: I76e93ec0e160e31f286c23f7c9c0094390c6c2d4
2016-10-29 17:05:40 -07:00
Alex Converse
bc0a5bacb5 Fix ec_adapt+daala_ec test failure
AV1/AqSegmentTest.TestNoMisMatchAQ1/6 was failing with this experiment
pair.

BUG=aomedia:70

Change-Id: I8c53a043471a87a98a06687afce2e28891592362
2016-10-29 17:05:40 -07:00
Thomas
9ac5508f32 Add EC_ADAPT experiment for symbol-adaptive entropy coding.
This experiment performs symbol-by-symbol statistics
adaptation for non-binary symbols. It requires either DAALA_EC, or
both RANS and ANS, to be enabled. The adaptation is currently
based on a simple recursive filter and is taken from
Daala. It has an adaptation rate dependent on alphabet size,
taken from Daala. It applies wherever non-binary symbols
are encoded using Cumulative Probability Functions rather
than trees.

Where symbols are adapted, forward updates in the compressed
header are removed.

In the case of RANS, coefficient token values are adapted,
with the exception of the zero token, which remains a
binary symbol. In the case of DAALA_EC, other values
such as inter and intra modes are also adapted, as CDFs are
provided in those cases.

The experiment is configured with:

./configure --enable-experimental --enable-daala-ec --enable-ec-adapt

or

./configure --enable-experimental --enable-ans --enable-rans \
    --enable-ec-adapt

EC_ADAPT is not currently compatible with tiles.

BDR results on Objective-1-fast give a small loss:

PSNR YCbCr:      0.51%      0.49%      0.48%
PSNRHVS:      0.50%
SSIM:      0.50%
MSSSIM:      0.51%
CIEDE2000:      0.50%

Change-Id: I3888718e42616f3fd87144de7f125228446ac984
2016-10-29 16:57:48 -07:00
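The "simple recursive filter" mentioned in the commit message above can be illustrated with a minimal sketch. This is not the Daala or libaom code: the helper name adapt_cdf, the 15-bit scale CDF_TOTAL, and passing the adaptation rate as a plain shift are all assumptions made for illustration. Each CDF entry moves a fraction 2^-rate of the way toward the CDF of a distribution that puts all mass on the symbol just coded.

```c
#include <stdint.h>

#define CDF_TOTAL 32768 /* assumed 15-bit probability scale */

/* Hypothetical helper, not the libaom/Daala implementation.
 * cdf[i] holds P(X <= i) scaled by CDF_TOTAL, so cdf[nsymbs - 1]
 * is always CDF_TOTAL and is never modified. */
static void adapt_cdf(uint16_t *cdf, int nsymbs, int symbol, int rate) {
  for (int i = 0; i < nsymbs - 1; ++i) {
    if (i >= symbol) {
      /* Target value is CDF_TOTAL: move up by 2^-rate of the gap. */
      cdf[i] += (CDF_TOTAL - cdf[i]) >> rate;
    } else {
      /* Target value is 0: move down by 2^-rate of the value. */
      cdf[i] -= cdf[i] >> rate;
    }
  }
}
```

A larger rate gives slower adaptation. The commit ties the rate to the alphabet size; the exact mapping is not stated in the message, so it is left as a parameter here.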
Jingning Han
ee9264c923 Merge "Replace num_4x4_blocks_txsize_loopup table" into nextgenv2 2016-10-29 23:01:26 +00:00
Jingning Han
73d65a49a9 Merge "Refactor rate-distortion optimization of recursive transform partition" into nextgenv2 2016-10-29 23:01:14 +00:00
Jingning Han
9fb1d69e82 Use the actual transform block size for loop filter selection
Parse the recursive transform block partition to fetch the actual
transform size. Use this correct transform size to select the
corresponding loop filter kernel. This slightly improves the coding
performance of recursive transform partition for hdres to 0.14%.

Change-Id: Ibe8bc3fdd0d222a4f1fb8156c56a407bec052b9b
2016-10-29 15:59:55 -07:00
Urvang Joshi
1252f75616 Merge "RANGE_CHECK: "==" || ">" is simply ">="." into nextgenv2 2016-10-28 23:55:01 +00:00
Zoe Liu
9d37fe47a2 Merge "Clean the code in ref frame context decision for ext-refs" into nextgenv2 2016-10-28 23:36:41 +00:00
Jingning Han
32b2028b30 Replace num_4x4_blocks_txsize_loopup table
Unify the transform block size access table in preparation for
2x2 transform integration.

Change-Id: I308def6729e138ae2b2542175206e3225c0cb392
2016-10-28 15:42:44 -07:00
Jingning Han
9fdc42293f Refactor rate-distortion optimization of recursive transform partition
Support rectangular transform block in the rate-distortion cost
estimator.

Change-Id: I99201fcae797c1ed2f2184021a215867eac0288f
2016-10-28 14:48:40 -07:00
Sarah Parker
d722f71ed8 Merge "Bitwise to logical & in rdopt ext tx prune function" into nextgenv2 2016-10-28 21:43:03 +00:00
Urvang Joshi
cd8ab904e1 RANGE_CHECK: "==" || ">" is simply ">=".
Also:
- For unsigned ints, don't check value >= 0 as that is always true.
- Add "-Wlogical-op" warning flag which would have warned that "logical
  'or' of collectively exhaustive tests is always true" before this
  patch.

Change-Id: Idf3bd312464397f2df19256fc69b22f345dc7753
2016-10-28 14:40:29 -07:00
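The simplification described in the commit message above can be sketched as follows. The helper names are hypothetical and are not the RANGE_CHECK macros from the codebase; they only show that the two-test form and the single comparison agree, and that an unsigned value needs only an upper-bound check.

```c
#include <stdbool.h>

/* Hypothetical helpers for illustration only. */

/* Verbose form: two tests joined by a logical OR. */
static bool at_least_verbose(int value, int lo) {
  return value == lo || value > lo;
}

/* Equivalent single comparison. */
static bool at_least(int value, int lo) { return value >= lo; }

/* For an unsigned value, `value >= 0` is always true, so only the
 * upper bound is tested. */
static bool in_range_unsigned(unsigned int value, unsigned int hi) {
  return value <= hi;
}
```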
Yaowu Xu
d64eaf138e Merge "Tile groups: ensure each tile in a TG has a length." into nextgenv2 2016-10-28 21:26:32 +00:00
Yaowu Xu
edd3f9c418 Merge "Fix update_delta_q_probs compile warning" into nextgenv2 2016-10-28 21:26:23 +00:00
Yaowu Xu
efd5725242 Merge "Encode and decode multiple tile groups" into nextgenv2 2016-10-28 21:26:11 +00:00
Sarah Parker
68a26b6b4a Bitwise to logical & in rdopt ext tx prune function
Making this change in case the future implementation changes and the
comparison is no longer between single bits.

Change-Id: I94f474ce7d82febfa23cec65cbe1b9d240b42e02
2016-10-28 13:19:33 -07:00
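The reason given in the commit message above can be shown with a small sketch. The function names are hypothetical and do not come from the rdopt code. For operands that are strictly 0 or 1, bitwise `&` and logical `&&` agree; once an operand can carry other bits, they can differ.

```c
/* Hypothetical helpers for illustration only. */

/* Bitwise AND: non-zero only if the operands share a set bit. */
static int both_set_bitwise(int a, int b) { return (a & b) != 0; }

/* Logical AND: true whenever both operands are non-zero. */
static int both_set_logical(int a, int b) { return a && b; }
```

With a = 2 and b = 1, both operands are non-zero, yet `a & b` is 0 because they share no set bit, while `a && b` is 1.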
Thomas Davies
8fe64a3a23 Tile groups: ensure each tile in a TG has a length.
This ensures TGs can be decoded even if the whole
frame has not been received and the frame length
is not known.

Change-Id: If24837fcc3b5c46554751be792e91100de73e8d6
2016-10-28 13:01:40 -07:00
Jingning Han
be44c5f46f Fix update_delta_q_probs compile warning
Change-Id: Ifb93970ed876ed61259b2f8da739171857c97fda
2016-10-28 13:01:40 -07:00
Debargha Mukherjee
3ff8cb764b Merge "Fix aom_fdct8x8_ssse3 in high bit depth mode" into nextgenv2 2016-10-28 19:31:45 +00:00
Zoe Liu
782c96438c Clean the code in ref frame context decision for ext-refs
For compound mode, it is a sure thing that one of the 2 reference frames
would be either a forward predictive reference, or a backward predictive
reference, and the other would provide a different prediction.

Change-Id: I8d7b40525bec4db0f26ba255c8eefa9f20bd52a3
2016-10-28 12:23:38 -07:00
Urvang Joshi
76bc587f69 Merge "get_palette_color_context: Make code more readable." into nextgenv2 2016-10-28 19:03:26 +00:00
Thomas Davies
80188d1546 Encode and decode multiple tile groups
This is a manual adaptation of the following commit from aom/master:
ce12003d60a1c8d6c65ed07ba165c34062fcbcbd

The original commit message:

A tile group is a set of tiles in scan order.

Each tile group has a version of uncompressed and compressed headers,
identical apart from tile group parameters.
Encoding probability updates takes account of the number of
headers to control overheads.

The decoder supports arbitrary numbers of tile groups with
arbitrary number of tiles. The number of tiles in a TG is
signalled in the uncompressed header for that TG.

The encoder currently only supports a fixed number
of TGs (3, when error resilient mode is on) of equal size
(except possibly for the last one).

The average BDR performance with 3 tile groups versus
anchor with error resilient mode and up to 16 tiles is:

PSNR YCbCr:      3.02%      3.04%      3.05%
PSNRHVS:      3.09%
SSIM:      3.06%
MSSSIM:      3.05%
CIEDE2000:      3.04%

Change-Id: I9b97c5ed733103b9160a3a5d4370de5322c00c0b
2016-10-28 11:52:13 -07:00
Urvang Joshi
79f4fc476d get_palette_color_context: Make code more readable.
For clarity, use separate variables for 'color_ctx_hash' and
'color_ctx' instead of reusing same variables for both.

BUG=webm:1324

Change-Id: I3a516ea54353e1f0737822c613a68da252e30c6e
2016-10-28 09:42:05 -07:00
Angie Chiang
3655dcd4cf Fix tmp_rd type error in handle_inter_mode()
Change-Id: I9398c77c12e9c4caa19a76b92e3035a3135cfd7a
2016-10-28 09:05:27 -07:00
Angie Chiang
349b723f5c Merge "Add unit test for adapt_scan experiment" into nextgenv2 2016-10-28 15:53:59 +00:00
Angie Chiang
6b7255374d Merge "Pass block pixel width/height into av1_predict_intra_block" into nextgenv2 2016-10-28 15:51:30 +00:00
Jingning Han
cb277c0b82 Merge "Refactor recursive transform block partition search" into nextgenv2 2016-10-28 15:50:36 +00:00
Jingning Han
6675bbca0e Merge "Simplify logics in encode_inter_mb_segment" into nextgenv2 2016-10-28 15:50:15 +00:00
Jingning Han
fe8d6c62ce Merge "Refactor recursive transform block decoding" into nextgenv2 2016-10-28 15:49:27 +00:00
Jingning Han
c17b9e00dc Merge "Refactor recursive transform block size decoding" into nextgenv2 2016-10-28 15:49:06 +00:00
Jingning Han
73144260e3 Merge "Remove unused get_tx1d_width/height wrapper" into nextgenv2 2016-10-28 15:48:25 +00:00
Jingning Han
2b0670e10a Merge "Use transform block partition depth count for frame header reset" into nextgenv2 2016-10-28 15:48:11 +00:00
Yaowu Xu
2df83d5c10 Merge "Remove av1_use_hp_mv()" into nextgenv2 2016-10-28 15:26:35 +00:00
Yaowu Xu
c66f264d17 Merge "rans: Use symbol coding for motion vectors" into nextgenv2 2016-10-28 15:26:22 +00:00
David Barker
0602edfbc5 Fix aom_fdct8x8_ssse3 in high bit depth mode
Change-Id: I63e492163ef10e12a842837368c209b8ffc4eee0
2016-10-28 10:13:43 +01:00
Jingning Han
5822404485 Refactor recursive transform block partition search
Use unified transform block size and coding block size map. This
prepares for the integration of 2x2 transform block size and the
rectangular transform block size.

Change-Id: I99f51017d19aef337639b708ee9c7faedcc20935
2016-10-28 05:12:19 +00:00
Jingning Han
c4049db573 Simplify logics in encode_inter_mb_segment
Unify coefficient context used by different experiments. Make
block size and transform block size consistent with the rest of the codebase.

Change-Id: I237336f161d6c473b88c59c48ee68d24b75ce738
2016-10-28 05:12:05 +00:00
Jingning Han
5f61426424 Refactor recursive transform block decoding
Unify the transform block and coding block mapping.

Change-Id: Ifb394809a4aafee6adf2b49a2607036cf13c878e
2016-10-27 22:11:24 -07:00
Jingning Han
65abc314c4 Refactor recursive transform block size decoding
Unify the transform block size to block size mapping.

Change-Id: Ic7359d016cd5965983c4a5476624c09f3123f91c
2016-10-27 22:11:20 -07:00
Yaowu Xu
94df7ab121 Merge "Deringing support for 4:2:2 by not deringing chroma" into nextgenv2 2016-10-28 04:20:17 +00:00
Yaowu Xu
fbf8788d99 Merge "Namespace the idct/iad symbols" into nextgenv2 2016-10-28 04:19:51 +00:00
Angie Chiang
45c198a197 Pass block pixel width/height into av1_predict_intra_block
Change-Id: Ia69bceef24b61b0a222783eba79e7a70bb60edd8
2016-10-27 17:13:50 -07:00
Sarah Parker
243f87ef49 Merge "Cosmetic fixes in global motion experiment" into nextgenv2 2016-10-27 23:23:39 +00:00
Zoe Liu
b99af6e3e9 Merge "A small bug fix in ext-refs on the RD mode selection" into nextgenv2 2016-10-27 22:43:55 +00:00
Alex Converse
6317c88f5a Remove av1_use_hp_mv()
It always returns true since the related misc_fix[1] was merged.

[1] 23e83574b6a5105bdc686c49f2d5909f33ea721f

Change-Id: Ie3af685572a2f0a42d2b9fb9903c1abeea225dfd
2016-10-27 14:33:48 -07:00
Debargha Mukherjee
058e42d399 Merge "Fix clpf and dering signalling when used with ext-partition-types" into nextgenv2 2016-10-27 20:45:07 +00:00
Alex Converse
3fc98e86d1 rans: Use symbol coding for motion vectors
Change-Id: If497b53c3b36e32fb98c99dba2d4a490e226572a
2016-10-27 12:38:43 -07:00
Jean-Marc Valin
c67b895fa4 Deringing support for 4:2:2 by not deringing chroma
No change in output for 4:2:0 and 4:4:4

Change-Id: Ic46753d23a5b5f90b611a3da1a4574870519957c
2016-10-27 12:37:52 -07:00
Luca Barbato
f0f98578df Namespace the idct/iad symbols
Make linking to libvpx and libaom at the same time possible.

Change-Id: I7bab8527a32e446e3d564e6fa5d94ccd056bc63f
2016-10-27 12:36:37 -07:00
Debargha Mukherjee
a5e3bc0fbc Merge "Fix compile error with --enable-ans + --enable-accounting" into nextgenv2 2016-10-27 19:03:22 +00:00
Debargha Mukherjee
030527c54a Merge "Fix dering filter when using 4:2:2 or 4:4:0 subsampling" into nextgenv2 2016-10-27 19:03:04 +00:00
Jingning Han
d4c65cdba4 Remove unused get_tx1d_width/height wrapper
Change-Id: Ie8bc40579720b8c402bbc8b23b6fd3a7a50834bb
2016-10-27 18:49:45 +00:00
Jingning Han
2adcfb19d5 Use transform block partition depth count for frame header reset
Use the transform block partition depth counts to decide whether to
reset the tx_mode at the frame header level. Add a comment to make this
explicit.

Change-Id: I417920b4b61eeb91cde9536336a12deea2d42f79
2016-10-27 18:49:32 +00:00
Angie Chiang
3f8419976d Add unit test for adapt_scan experiment
Change-Id: I90518b7b5c8bb930f5eeef4ce4cbb536139722ca
2016-10-27 11:43:10 -07:00
Angie Chiang
3d41cb339c Merge "Refactor: Localize tmp_rd in handle_inter_mode()" into nextgenv2 2016-10-27 18:27:18 +00:00
Angie Chiang
47d56f4f36 Merge "Sync definition of av1_get_switchable_rate in rd.c/h" into nextgenv2 2016-10-27 18:27:07 +00:00
Yaowu Xu
18ee02b0b9 Merge "Fix two bugs in parallel_deblocking experiment" into nextgenv2 2016-10-27 14:06:07 +00:00
Yaowu Xu
9edd6005fd Merge "fix filtering of uv int4x4 for odd rows" into nextgenv2 2016-10-27 14:05:52 +00:00
Yaowu Xu
d5723e6f09 Merge "Add parallel-deblocking experiment" into nextgenv2 2016-10-27 14:05:39 +00:00
David Barker
f8935c9c92 Fix clpf and dering signalling when used with ext-partition-types
Previously, when ext-partition-types and either clpf or dering were
enabled, the signalling for clpf/dering would not be encoded or decoded,
as the code to do so was inside a #if !CONFIG_EXT_PARTITION_TYPES block.
This caused many tests (eg, AV1/EndToEndTestLarge.EndToEndPSNRTest/0)
to fail with encode/decode mismatches.

Change-Id: If1742deb1812877813b2c3e93a048430f9a504ba
2016-10-27 13:19:01 +00:00
Jingning Han
199502f259 Merge "Support potential 2x2 transform block unit" into nextgenv2 2016-10-27 00:50:02 +00:00
Yaowu Xu
f6f2cfcaa7 Merge "av1/common/filter.h: apply clang-format" into nextgenv2 2016-10-26 23:43:02 +00:00
Yi Luo
400dcc8088 Merge "Fix aom_fdct32x32_avx2 output as CONFIG_AOM_HIGHBITDEPTH=1" into nextgenv2 2016-10-26 22:42:17 +00:00
Jingning Han
607fa6a6ce Support potential 2x2 transform block unit
Make the codec support 2x2 transform block unit for chroma components.

Change-Id: Ic454535bd5620abe88a2e99789160cc4664ee518
2016-10-26 15:38:13 -07:00
Jingning Han
b5a3082190 Merge "Synchronize tx_size counts in the decoder" into nextgenv2 2016-10-26 21:46:18 +00:00
Ryan Lei
6f8c1a78da Fix two bugs in parallel_deblocking experiment
This commit fixes two major bugs in the parallel deblocking experiment.
The first is the missing initialization of the lfm->lfl_uv array for
horizontal filtering. The second is an inconsistent order of
vertical/horizontal filtering of superblocks within a frame between the
encoder and decoder.

BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=45#c2
BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=53#c1

Change-Id: I2df7eb313d49203fb70efe2bdf957b9d7e0bf678
2016-10-26 13:42:31 -07:00
Sarah Parker
7ba8dc1688 Fix ubsan left shift warnings in warped motion library
Change-Id: I14f609664411577706dbe4c099d90f0cfe2f7bb3
2016-10-26 12:58:36 -07:00
Yi Luo
97b29925fe Merge "Fix incorrect merge of forward txfm function declarations" into nextgenv2 2016-10-26 19:22:53 +00:00
Sarah Parker
b3dab4983b Cosmetic fixes in global motion experiment
These are in response to post-commit suggestions made on
If429c93bb90b66fdff0edc07ecd9fc078077d303.

Change-Id: Id29afa158471bd6259bd07ac00812a50bfd0a709
2016-10-26 11:45:50 -07:00
Urvang Joshi
839b07feec Merge changes I56cddcb4,I40c5a652 into nextgenv2
* changes:
  Mark bogus palette color probabilities as zero
  get_palette_color_context: code cleanup
2016-10-26 18:28:56 +00:00
Jingning Han
906be078a5 Synchronize tx_size counts in the decoder
Make both encoder and decoder use depth index for frame count.

Change-Id: I96dddffc0a83ad5e4e2847b15391e01ba01ee502
2016-10-26 11:04:58 -07:00
Angie Chiang
180566d854 Merge "av1/convolve.[hc],av1_convolve_test: add missing copyright" into nextgenv2 2016-10-26 17:51:28 +00:00
Angie Chiang
8e26f768c1 Merge "Use has_subpel_mv_component in av1_is_interp_needed" into nextgenv2 2016-10-26 17:50:54 +00:00
Angie Chiang
65eb2cf78a Sync definition of av1_get_switchable_rate in rd.c/h
Change-Id: I720934e02a15fd6184bdda6c1b8a23d5b02a5284
2016-10-26 10:48:47 -07:00
Sarah Parker
70c0df29da Merge "Revise precision clamping in GM param refinement" into nextgenv2 2016-10-26 17:33:47 +00:00
Yi Luo
133c13d637 Fix incorrect merge of forward txfm function declarations
- Restore the fwd txfm HBD function declarations exposure.

Change-Id: I1e33df6297fd37e242f4b73c8ab97063b9feb7c6
2016-10-26 10:30:53 -07:00
Jingning Han
b0a7130656 Convert tx_size to relative depth to fetch tx_size_cost
Use the relative transform partition depth as index to fetch the
tx_size_cost value.

Change-Id: I7d5119817baa96f23c32828065ff3175bb9f75cf
2016-10-26 17:12:41 +00:00
Jingning Han
8e022edd59 Allow backward probability model update from tx_size=0
Replace enum items with range definitions.

Change-Id: Iba2b7cac657db5fb6177cb5c9e6f40ec0125d926
2016-10-26 17:12:20 +00:00
Jingning Han
e5596d3168 Merge "Add depth to tx_size mapper to bit-stream coding" into nextgenv2 2016-10-26 17:11:56 +00:00
Angie Chiang
c352e79ee6 Merge "Simplify interpolation filter search in handle_inter_mode()" into nextgenv2 2016-10-26 16:51:58 +00:00
Janne Salonen
e8a3dbc0ff fix filtering of uv int4x4 for odd rows
Change-Id: I61f91855430e11da45d4e91ec6d3a8976c461cb7
2016-10-26 09:26:28 -07:00
Ryan Lei
15149484ec Add parallel-deblocking experiment
This commit is a manual cherry-pick from aom/master:
42ff3881ace1564aac9debae86ef37a8deb8d381

Change-Id: I4a3cdb939b7b96a3aa27f6a00da7a0e73222f3f3
2016-10-26 09:20:47 -07:00
Yunqing Wang
e61ec7bc19 Merge "Change 2 motion search counts to be tile data" into nextgenv2 2016-10-26 16:17:42 +00:00
Yaowu Xu
5a1fedfdda av1/common/filter.h: apply clang-format
Change-Id: I37f0d1fbcc6f262ae287290e2e6f5648ad0113c8
2016-10-26 09:14:01 -07:00
Jingning Han
4e1737af64 Add depth to tx_size mapper to bit-stream coding
It serves as a helper function to integrate various transform coding
options.

Change-Id: I64e7d0c88ea10137fa1ff1072d865eb0054c2a25
2016-10-26 15:45:19 +00:00
Sarah Parker
f41a06b231 Revise precision clamping in GM param refinement
This ensures that the parameter refinement never
results in a motion parameter value that exceeds the number
of allotted bits in the bitstream. It accounts for all of
the necessary precision shifts required to make global motion compatible
with the warped motion library. It also accounts for the
zero-centering that is applied to global motion parameters that are
naturally centered around one.
Change-Id: If429c93bb90b66fdff0edc07ecd9fc078077d303
2016-10-25 21:11:39 -07:00
Jingning Han
c83ef8b946 Merge "Refactor transform size coding" into nextgenv2 2016-10-26 01:12:04 +00:00
Angie Chiang
a2b56d3e05 Refactor: Localize tmp_rd in handle_inter_mode()
Change-Id: I01cb5cd544c849be160a9441d141c01a3424d32b
2016-10-25 17:34:59 -07:00
Angie Chiang
b135debcb6 Use has_subpel_mv_component in av1_is_interp_needed
Change-Id: I8980df4512de605aaa6a67c1f05e544f69a12e96
2016-10-25 17:10:19 -07:00
Angie Chiang
75c2209341 Simplify interpolation filter search in handle_inter_mode()
BDRate
ext_interp  lowres -0.001%
dual_filter lowres  0.001%

Change-Id: Ic24165d554c300eaa0188ee8cb88d320b74125aa
2016-10-25 17:10:08 -07:00
Angie Chiang
6421191247 av1/convolve.[hc],av1_convolve_test: add missing copyright
Change-Id: Ie84bdf90c31b12977d32baacfc8086c1fdd96e65
2016-10-25 16:43:43 -07:00
Jingning Han
aae72a69c3 Refactor transform size coding
Introduce the transform block partition depth macro definition.

Change-Id: I218dc77a77c8e967da4d270d4ec0d7691b712a5f
2016-10-25 15:42:30 -07:00
Jingning Han
b2d6a59ad5 Merge "Refactor tx_size use case in block encoding stage" into nextgenv2 2016-10-25 22:29:21 +00:00
Jingning Han
2eded9a3ff Merge "Refactor tokenize_vartx to use aligned transform block size fetch" into nextgenv2 2016-10-25 22:29:03 +00:00
Yunqing Wang
8c1e57c278 Change 2 motion search counts to be tile data
Imported changes from VP9:
https://chromium-review.googlesource.com/#/c/402551/
https://chromium-review.googlesource.com/#/c/403128/

Change-Id: I8570c867190a6fa641926431ce97f7d9d7da3528
2016-10-25 15:25:37 -07:00
Jingning Han
a1730659ec Merge "Use table fetch for block width in block_rd_txfm" into nextgenv2 2016-10-25 22:18:44 +00:00
James Zern
8aa4cbf5d5 Merge "update_state: quiet const warning w/global-motion" into nextgenv2 2016-10-25 22:15:39 +00:00
Yi Luo
0c552dfd82 Fix aom_fdct32x32_avx2 output as CONFIG_AOM_HIGHBITDEPTH=1
- Change FDCT32x32_2D_AVX2 output parameter to tran_low_t.
- Add unit tests for CONFIG_AOM_HIGHBITDEPTH=1.
- Update TODO notes.
BUG=webm:1323

Change-Id: If4766c919a24231fce886de74658b6dd7a011246
2016-10-25 14:33:21 -07:00
Urvang Joshi
d650f276ce Vertical scalers: Use signed int for src/dst pitch in parameters.
This avoids explicitly casting them to 'int' later.
These methods were already called with signed int arguments for pitch,
so this also avoids int -> unsigned int -> int conversion.

Change-Id: I2129f5ceff8f2525a188ee3ae52f9fe7067bd2e3
2016-10-25 13:00:22 -07:00
Angie Chiang
df70d29b72 Merge "Fix unsigned type error in gen_scaler.c" into nextgenv2 2016-10-25 19:43:09 +00:00
Yaowu Xu
15c37a5ae3 Merge "dkboolwriter.c: change copyright notice" into nextgenv2 2016-10-25 19:41:18 +00:00
Yaowu Xu
c2ac0a1d4c Merge "7-bit interpolation filters" into nextgenv2 2016-10-25 19:41:07 +00:00
Yaowu Xu
dece603fdf Merge "Use constrained tokenset with --enable-daala_ec." into nextgenv2 2016-10-25 19:40:51 +00:00
Jingning Han
e71ad1d4a2 Merge "Refactor dist_block() function" into nextgenv2 2016-10-25 19:39:22 +00:00
Angie Chiang
7c7e555ca0 Merge changes I6faedb29,Ic6586114 into nextgenv2
* changes:
  Remove speed feature of ext_interp experiment
  Refactor: handle_inter()
2016-10-25 19:36:49 +00:00
Jingning Han
de953b9d05 Refactor tx_size use case in block encoding stage
Change-Id: I56110d1fc94b335668e6b991442e9083bbaea8ee
2016-10-25 12:36:09 -07:00
Jingning Han
a893936335 Refactor tokenize_vartx to use aligned transform block size fetch
This prepares for the integration of rectangular transform size
into recursive transform block partition.

Change-Id: I164eb43d10afa9bb2f4722de7a48faa770ba4ced
2016-10-25 12:16:21 -07:00
Jingning Han
99e7a8d837 Merge "Refactor tx_size use cases in blockd.c" into nextgenv2 2016-10-25 19:03:29 +00:00
Jingning Han
c598cf853f Use table fetch for block width in block_rd_txfm
Make direct use of block_size_wide to fetch data for stride.

Change-Id: I0d8491e58cf00ea73c764d218cb56408b64d9ee7
2016-10-25 10:47:46 -07:00
Yaowu Xu
b695b1c118 dkboolwriter.c: change copyright notice
Change-Id: I1d9349a07ffd85991fc5673354d3ceff3404b358
2016-10-25 10:32:33 -07:00
Jingning Han
b9c572706d Refactor dist_block() function
Support automatic scale for mapping between transform block size
and pixel block size.

Change-Id: I141b0477a85c0dcc5f99b4e5d880cfccfae6d316
2016-10-25 10:22:17 -07:00
Arild Fuldseth
7acfabbc40 7-bit interpolation filters
Purpose:
-Reduce dynamic range of interpolation filter coefficients from 8
bits to 7 bits.
-Inner product for 8-bit input data can be stored in a 16-bit signed
integer.

Impact on compression efficiency:
-Marginal improvement, typically less than 0.5% BDR.

Change-Id: I58d1408307ae7d2a6f9de8965c5877b258703199
2016-10-25 10:18:55 -07:00
Yaowu Xu
1f112841d2 Merge "Refactor extrabits packing" into nextgenv2 2016-10-25 17:14:57 +00:00
Yaowu Xu
d8dc1fc522 Merge "Linearize extrabits writing." into nextgenv2 2016-10-25 17:14:44 +00:00
Nathan E. Egge
46e8490498 Use constrained tokenset with --enable-daala_ec.
Change-Id: Ia09edf92bf9f7ecacc65c232ac6e656cde236634
2016-10-25 10:13:22 -07:00
Jingning Han
95cff5c979 Refactor tx_size use cases in blockd.c
Use a table to replace the arithmetic computation for mapping between
transform block and pixel number. Support automatic scaling of block
size and transform block size.

Change-Id: I84766850172265d4295f418383dbc5e6e5838ec8
2016-10-25 09:50:07 -07:00
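
The table-based mapping described in this message can be sketched as follows. This is a minimal illustration, not the code in blockd.c: the enum values and the helper names are assumptions, and only the table names tx_size_wide/tx_size_high are taken from a commit subject later in this log. The point it shows is that a shift such as 4 << tx only works for square sizes, while a table also covers a rectangular entry.

```c
#include <assert.h>

/* Hypothetical transform size enum; one rectangular size is included to show
 * why a table generalizes better than a shift. */
typedef enum { TX_4X4, TX_8X8, TX_16X16, TX_32X32, TX_4X8, TX_SIZES_ALL } TxSize;

/* Width and height in pixels, indexed by transform size. */
static const int tx_size_wide[TX_SIZES_ALL] = { 4, 8, 16, 32, 4 };
static const int tx_size_high[TX_SIZES_ALL] = { 4, 8, 16, 32, 8 };

/* Arithmetic mapping: valid for the square sizes only. */
int tx_width_arith(TxSize tx) {
  assert(tx <= TX_32X32);
  return 4 << tx;
}

/* Table mapping: valid for every entry, square or rectangular. */
int tx_width_table(TxSize tx) { return tx_size_wide[tx]; }

/* Number of pixels covered by one transform block. */
int tx_pixels(TxSize tx) { return tx_size_wide[tx] * tx_size_high[tx]; }
```

For the square sizes both mappings agree; for the rectangular entry only the table gives a meaningful answer.
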
Debargha Mukherjee
7f9eb87082 Merge "Fix compile error with --enable-accounting" into nextgenv2 2016-10-25 16:25:42 +00:00
Angie Chiang
d35e12b184 Merge "Refactor: Add macro LOG_SWITCHABLE_FILTERS" into nextgenv2 2016-10-25 16:24:23 +00:00
Angie Chiang
d0aa90ed79 Remove speed feature of ext_interp experiment
This is to facilitate the refactor process

Change-Id: I6faedb29129b47abefe20821dc3f32a43db149d8
2016-10-25 09:22:35 -07:00
Angie Chiang
6305abe114 Refactor: Add macro LOG_SWITCHABLE_FILTERS
Change-Id: I7593ff2f8949d8bc26ca1c8577faaefb09640b59
2016-10-25 09:22:35 -07:00
Angie Chiang
1b131f1c64 Refactor: handle_inter()
Make the parentheses symmetric
Replace interpolation filter mode number by macro

Change-Id: Ic6586114c4cebe920b950e1b3adc8ebc764d4713
2016-10-25 09:22:35 -07:00
Debargha Mukherjee
f8038850b6 Merge "Fix to make intra_only frames decodable out of order" into nextgenv2 2016-10-25 16:21:20 +00:00
Angie Chiang
dc1813ffd9 Fix unsigned type error in gen_scaler.c
Avoid applying unary minus operator on unsigned type

Change-Id: Ibc60541837eef06810f5be0aaa7fef9edcc8f8a4
2016-10-25 09:18:22 -07:00
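
The issue named in this message (and in the matching aom_scale.c fix further down) is that unary minus on an unsigned value is evaluated in unsigned arithmetic and wraps around instead of going negative. A minimal sketch, assuming a pitch-like parameter; the function names are hypothetical and not taken from gen_scaler.c:

```c
#include <assert.h>
#include <limits.h>

/* Returns the negated pitch as a signed value. Converting to int before
 * negating avoids applying unary minus to an unsigned operand. */
int negate_pitch_signed(unsigned int pitch) {
  assert(pitch <= (unsigned int)INT_MAX);
  return -(int)pitch;
}

/* What "-pitch" evaluates to when pitch stays unsigned: the value wraps
 * modulo UINT_MAX + 1. Written as 0u - pitch so it compiles without the
 * warning while producing the same result. */
unsigned int negate_pitch_wrapped(unsigned int pitch) { return 0u - pitch; }
```

With a pitch of 16 the signed version yields -16, while the unsigned expression yields UINT_MAX - 15.
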
Yaowu Xu
4579c5e458 Merge "update_state_supertx: rename a shadowed variable" into nextgenv2 2016-10-25 16:04:33 +00:00
Yaowu Xu
d971eb8521 Merge "Move small fixes and refactoring for obmc pred from AV1" into nextgenv2 2016-10-25 16:03:47 +00:00
Yaowu Xu
3635a832ab Merge "adapt_scan experiment" into nextgenv2 2016-10-25 16:03:40 +00:00
Alex Converse
d8fdfaa4da Refactor extrabits packing
- Eliminate the awkward _av1 suffix/infix in local variable names.
- Lift bitdepth selection out of the token loop.

Change-Id: I26d3397464f7808e0481a804033a93ca4f01f5d5
2016-10-25 08:59:08 -07:00
Alex Converse
81fd890773 Linearize extrabits writing.
The decoder is already linear so changing these tables would just create
a mismatch.

Change-Id: Ib888c0dc273e089c38298f569bb35b6e4c32dd60
2016-10-25 08:59:08 -07:00
Thomas Daede
8ec53b2655 Automatically upgrade profile to match input chroma subsampling.
This is a follow-up to 1195a396f6c53a5bd35559eed957c2aac855f7e.

Change-Id: I4db554e5d88290d55258062e29a1652707d91037
2016-10-25 08:56:55 -07:00
Yaowu Xu
2b33aa903f Remove select_tx_size from struct macroblock
It is no longer used anywhere.

Change-Id: I5d40664373d66821c5382f6155302b8719ce58c0
2016-10-25 08:56:55 -07:00
Guillaume Martres
4e4d3a075b Avoid unnecessary reencode in choose_largest_tx_size
This change is similar to the one done for choose_tx_size_from_rd in
daf841b4a10ece1b6831300d79f271d00f9d027b

It gives a 4% speed-up on bus_cif.y4m with the following settings:
--cpu-used=4 -p 1 --end-usage=q --cq-level=40 --tile-columns=0 --tile-rows=0

Change-Id: Ic54fe4a066a2c0b5f6349d80cd13de8bb8ddcabc
2016-10-25 08:56:55 -07:00
Brennan Shacklett
d4add7aca9 Remove encode_breakout and related speed features
Seems to be dead code

Change-Id: I17b3edc9e82d6a1da172a686522358a6b1a630e9
2016-10-25 08:56:55 -07:00
David Barker
01b16baa5a Fix compile error with --enable-ans + --enable-accounting
Change-Id: I43deba9c80b324c12852750d08c62dc2dd783835
2016-10-25 16:22:24 +01:00
David Barker
d971f40bcc Fix compile error with --enable-accounting
Change-Id: I4b18dbfb013c9805cb23083a68560ab212a0867a
2016-10-25 13:52:07 +01:00
David Barker
401204a50b Fix dering filter when using 4:2:2 or 4:4:0 subsampling
Change-Id: Ifa5bef5123e13df9cad59c7c870b58e18c2ce213
2016-10-25 12:54:59 +01:00
Peter de Rivaz
9d07888350 Fix to make intra_only frames decodable out of order
last_frame_type is not well defined for intra_only frames
if we are decoding them out of order.
This change removes a dependency on last_frame_type for these frames.

Change-Id: I440cac68792714de222e192a0b3e75f6e1aa5e4b
2016-10-25 10:19:57 +01:00
Sarah Parker
4b4e5eefe3 Merge "Extend warp_frame functions to average compound predictions" into nextgenv2 2016-10-25 02:00:48 +00:00
Angie Chiang
ed8cd9a9b4 adapt_scan experiment
Performance improvement
        BDRate
lowres  0.921%
midres  0.730%
hdres   1.019%

Change-Id: I26208d6c0531937bff44de505b4ea355c7852802
2016-10-24 18:24:56 -07:00
Alex Converse
f8306bfdc7 Mark bogus palette color probabilities as zero
It's clearer on inspection that the zero probabilities are unused.

Cherry-picked from aomedia/master: 8134db1

Change-Id: I56cddcb41ba256b7bb921d6a8538405165566dfb
2016-10-24 18:11:59 -07:00
Urvang Joshi
7bc1fa194d Merge changes I2153c57e,I0e291edd into nextgenv2
* changes:
  Palette: Generate encodings automatically from tree.
  Palette + Ext-Intra: shadowed declaration fix.
2016-10-25 01:06:28 +00:00
Urvang Joshi
4f4b68e245 get_palette_color_context: code cleanup
consts, comments and other small readability improvements.

Change-Id: I40c5a652811a796fdb91dc7ca6b108e8871f72d1
2016-10-24 18:03:09 -07:00
Yue Chen
cf6caf7a0c Merge "Fix bugs in SUB8X8_MC" into nextgenv2 2016-10-24 23:16:09 +00:00
Jingning Han
e8a17ba34e Merge "Refactor tx_size to pixel number mapping in reconintra.c" into nextgenv2 2016-10-24 22:24:04 +00:00
Jingning Han
61a50f73cf Merge "Simplify variable defs in av1_encode_block_intra" into nextgenv2 2016-10-24 22:23:59 +00:00
Jingning Han
8d6eaec1d7 Merge "Refactor av1_predict_intra_block tx_size interface" into nextgenv2 2016-10-24 22:23:40 +00:00
Jingning Han
9b0406454d Merge "Add block size in pixels lookup table" into nextgenv2 2016-10-24 22:23:36 +00:00
Angie Chiang
7e213aab0a Merge "Fix unsigned type error in aom_scale.c" into nextgenv2 2016-10-24 21:41:18 +00:00
Urvang Joshi
0b325978d7 Palette: Generate encodings automatically from tree.
Ran some manual sanity checks:
- Verified that the automatically generated encodings match the
  hand-written encodings before the patch.
- Verified that the encoded bitstream before/after this patch is
  identical.

Change-Id: I2153c57e463cff09c1d03d619b432fb1015199c3
2016-10-24 14:37:25 -07:00
Yue Chen
894fcceb87 Move small fixes and refactoring for obmc pred from AV1
Covering commits 1c263e0 and 79d8a07 from AOM codebase

Change-Id: I6400e5f99bbb2ef6584ef232d465e520230c06e0
2016-10-24 14:14:47 -07:00
Urvang Joshi
626591dfa1 Palette + Ext-Intra: shadowed declaration fix.
This shadowed declaration warning was generated when both experiments
are on.

Change-Id: I0e291eddeefabd68c5c3a0e5f8ac87706a82d55a
2016-10-24 14:13:55 -07:00
Jingning Han
7f76d4763d Prevent potential token buffer overflow in format 444
For a 16x16 pixel block, one needs to allocate 16x16 coefficient
tokens, plus up to 16 eob tokens, per plane. This commit increases
the token allocation size to cover the case where all the transform
blocks are of size 4x4 in format 444.

Change-Id: I5755e6a53771053d51163d01ec1d62e670c5009e
2016-10-24 14:08:34 -07:00
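
The token count in this message can be worked through directly. The sketch below is only the arithmetic, under the stated case of 4x4 transforms and three full-resolution planes in 4:4:4; the constant and function names are hypothetical and the encoder's actual allocation macro is not shown in this log.

```c
#include <assert.h>

enum { BLOCK_DIM = 16, MIN_TX_DIM = 4, PLANES_444 = 3 };

/* Tokens needed for one plane of a 16x16 block: one token per coefficient
 * plus one end-of-block token per 4x4 transform block. */
int tokens_per_plane(void) {
  const int coeff_tokens = BLOCK_DIM * BLOCK_DIM;                    /* 256 */
  const int tx_per_side = BLOCK_DIM / MIN_TX_DIM;                    /* 4 */
  const int eob_tokens = tx_per_side * tx_per_side;                  /* 16 */
  return coeff_tokens + eob_tokens;
}

/* In 4:4:4 the chroma planes are not subsampled, so all three planes need
 * the full per-plane amount. */
int tokens_per_block_444(void) { return PLANES_444 * tokens_per_plane(); }
```

That gives 272 tokens per plane and 816 tokens for the block in 4:4:4.
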
Thomas Daede
c0dca3c507 Automatically set internal bit depth to at least the input bit depth.
Upgrade profile if required.

Change-Id: Ieb2b77d2446290a8fc749739247a01e8f0600c55
2016-10-24 14:08:34 -07:00
Jingning Han
63632447ae Merge "Add MAX_VARTX_DEPTH macro" into nextgenv2 2016-10-24 21:01:29 +00:00
Jingning Han
e98c4a10e5 Merge "Simplify the recursive transform block decoding" into nextgenv2 2016-10-24 21:01:17 +00:00
Yaowu Xu
1ca24708a0 Merge "Correct data size estimation for odd size video" into nextgenv2 2016-10-24 20:57:55 +00:00
Debargha Mukherjee
0c78ebb22d Merge "Fix a bug when combining new-quant + supertx" into nextgenv2 2016-10-24 19:53:42 +00:00
Yue Chen
edd2915e21 Fix bugs in SUB8X8_MC
Change-Id: Ia544974f83c6b7f9cdb148eeb13a6d0c6eb4ed24
2016-10-24 12:22:59 -07:00
Yaowu Xu
7e87bef0ff Merge "Increase min size of compressed data" into nextgenv2 2016-10-24 19:21:45 +00:00
Yaowu Xu
23fb2feaa5 Merge "Avoid the use of uninitialized value in ActiveMap encoding route" into nextgenv2 2016-10-24 19:21:29 +00:00
Angie Chiang
10ab157a53 Fix unsigned type error in aom_scale.c
Avoid unary minus operator applied to unsigned type

Change-Id: I6986cd2b0ea236e0129ee94c02275593c287a87d
2016-10-24 11:51:50 -07:00
Yaowu Xu
10d9627ffe Merge "Use the actual inter prediction filter buffer in DRL" into nextgenv2 2016-10-24 18:34:29 +00:00
Yaowu Xu
932ca0ece2 Merge "fdct4x4_test: fix unsigned overflow" into nextgenv2 2016-10-24 18:25:55 +00:00
Debargha Mukherjee
bbd9705802 Merge "Add bit accounting information for deringing" into nextgenv2 2016-10-24 18:14:31 +00:00
Yaowu Xu
02be3ee60a Merge "Use remove some magic numbers in aom_rans_merge_prob8_pdf." into nextgenv2 2016-10-24 18:10:32 +00:00
Jingning Han
d89c72e997 Refactor tx_size to pixel number mapping in reconintra.c
Change-Id: Id66a14a869df8317c5bbb693d14262326fe84206
2016-10-24 11:07:46 -07:00
Jingning Han
62a2b9e197 Simplify variable defs in av1_encode_block_intra
Use direct table access to fetch the block size and transform size
in pixels.

Change-Id: Ia0093d5aed912be24996a06b0567bb2d873ec068
2016-10-24 11:07:27 -07:00
Jingning Han
c4c99da925 Refactor av1_predict_intra_block tx_size interface
Simplify the input arguments. Make direct use of the block size
in the unit of pixels.

Change-Id: Ifec9d90b4b4fa9605f93b4f93b8242f76f898b5f
2016-10-24 11:06:23 -07:00
Yaowu Xu
abc7d81b40 Correct data size estimation for odd size video
Given the largest transform size is 32x32, this commit changes the size
estimation to use dimensions rounded up to a multiple of 32, to avoid
insufficient buffer allocations.

BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=36

Change-Id: I6eab09dc6acdc0f5a6bcadb918d62c4852aae21f
2016-10-24 10:46:32 -07:00
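
Rounding a frame dimension up to a multiple of 32 before estimating a buffer size can be sketched as below. This is an illustration of the rounding only: the helper names are hypothetical and the encoder's real size formula is not given in the message.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TX_DIM 32 /* largest transform is 32x32, per the message */

/* Round v up to the next multiple of MAX_TX_DIM (a power of two). */
int round_up_to_tx(int v) {
  assert(v > 0);
  return (v + MAX_TX_DIM - 1) & ~(MAX_TX_DIM - 1);
}

/* Sample count of one plane after both dimensions are rounded up. */
size_t aligned_plane_samples(int width, int height) {
  return (size_t)round_up_to_tx(width) * (size_t)round_up_to_tx(height);
}
```

For a 33x17 frame the estimate is based on 64x32 samples rather than 33x17, which is what protects odd-sized video from an undersized buffer.
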
Jingning Han
571189c66d Add MAX_VARTX_DEPTH macro
Change-Id: I85532cf88f91f0f0cb4d9cb4b2dbda8a181297bf
2016-10-24 10:38:43 -07:00
Yaowu Xu
416b0d94de Increase min size of compressed data
This commit increases the minimum size for allocated buffer for
compressed data. The old size underestimated the size needed for
small images with width or height less than 64 pixels.

BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=31

Change-Id: Ia12507edc2be1e737ec49c32f64fd2ebf1eab41f
2016-10-24 09:56:09 -07:00
David Barker
d7d78c83e5 Fix a bug when combining new-quant + supertx
Previously, we assumed that av1_init_plane_quantizers is always called with
segment_id == xd->mi[0]->mbmi.segment_id (and use the latter to derive the value
of 'qindex' to use in the quantizer). But this is no longer true when supertx
is enabled. This patch instead remembers the value of 'qindex' derived from
the latest call to av1_init_plane_quantizers and uses that directly.

Change-Id: Ifa1c5bf74cad29942ff79b88ca92c231bc07f336
2016-10-24 17:43:51 +01:00
Jingning Han
6408895e69 Avoid the use of uninitialized value in ActiveMap encoding route
This commit resets the transform size to the maximum possible
value. It avoids out-of-bounds writes when the ActiveMap is
turned on.

Change-Id: I8302dd9a5c9fffaea3edf9ad33f72aa111999737
2016-10-24 09:41:40 -07:00
Angie Chiang
91072e982f Merge "Align frame contexts." into nextgenv2 2016-10-24 16:36:36 +00:00
Jingning Han
72120969bc Use the actual inter prediction filter buffer in DRL
This avoids an encoding segmentation fault in speed 5, due to the
use of uninitialized dummy inter prediction filter buffer in the
dynamic motion vector referencing scheme.

Change-Id: Icd888d46623e8abf34267838135eed8656d552e4
2016-10-24 09:32:41 -07:00
Yaowu Xu
59b969daae fdct4x4_test: fix unsigned overflow
The difference between src and dst will be signed, the error will be
unsigned. The change quiets -fsanitize=integer:
    unsigned integer overflow: 4294967295 * 4294967295

Change-Id: I131cefcc9583ee8a5b98eb5182fd30e9c7237ea0
2016-10-24 09:21:55 -07:00
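
The overflow quoted in this message, 4294967295 * 4294967295, is what a difference of -1 looks like after it has been stored in a 32-bit unsigned type and then squared. A minimal sketch of the signed-difference, unsigned-error split the message describes; the function is hypothetical and not the test's actual code:

```c
#include <stdint.h>

/* Sum of squared differences between two blocks of n 8-bit samples. */
uint32_t sum_squared_error(const uint8_t *src, const uint8_t *dst, int n) {
  uint32_t total = 0;
  int i;
  for (i = 0; i < n; ++i) {
    /* Keep the difference signed: it can be negative. */
    const int diff = (int)src[i] - (int)dst[i];
    /* The square is non-negative, so it is safe to hold as unsigned. */
    const uint32_t error = (uint32_t)(diff * diff);
    total += error;
  }
  return total;
}
```

With 8-bit inputs the difference lies in [-255, 255], so diff * diff cannot overflow an int.
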
David Barker
95e248e7d7 Add bit accounting information for deringing
It seems that when bit accounting was introduced in
https://chromium-review.googlesource.com/#/c/400658/
there was one place which was accidentally skipped, leading to build failures
with --enable-dering. This patch adds the missing information.

Change-Id: I59e1bd6f7e1d4fa58506ee7af307b845c78a7cbe
2016-10-24 16:14:50 +00:00
Alex Converse
8db9faefe8 Use remove some magic numbers in aom_rans_merge_prob8_pdf.
Change-Id: I0cefae17642d7adf1b9bd637ecb81b437629aa0c
2016-10-24 09:05:03 -07:00
Jingning Han
421af3538d Merge "Limit the transform block partition depth" into nextgenv2 2016-10-24 15:57:28 +00:00
Jingning Han
bb22bbf01d Merge "Allow frame level tx_mode switch" into nextgenv2 2016-10-24 15:57:17 +00:00
Jingning Han
57d093793e Merge "Separate intra and inter tx_size counting" into nextgenv2 2016-10-24 15:56:33 +00:00
Jingning Han
a59d71a678 Make set context function aware of rectangular transform block size
Account for the rectangular transform block size in setting the
context data.

Change-Id: Ic30a6a3eaaca4c945e0aab3acbaeb99aa48b0064
2016-10-23 17:46:42 -07:00
Yaowu Xu
d30a563d23 Merge "Add a runtime flag to enable bit accounting." into nextgenv2 2016-10-23 03:15:37 +00:00
Yaowu Xu
d9301c7eb6 Merge "Add a decoder control to retrieve accounting data." into nextgenv2 2016-10-23 03:15:21 +00:00
James Zern
af322e1d71 update_state: quiet const warning w/global-motion
+ add a TODO as this is incompatible with tile-based threading

Change-Id: I057c551a5f19020366c6b85c2e67e8394bb3306f
2016-10-22 12:46:43 -07:00
James Zern
9ca190c690 update_state_supertx: rename a shadowed variable
Change-Id: I0e5fa71a4b7cd03c9e28b434b1ea72b090ca6772
2016-10-22 12:42:43 -07:00
Jingning Han
958077ab0b Merge "Fix comment typo in common_data.h" into nextgenv2 2016-10-22 03:41:05 +00:00
Jingning Han
8dbf0fd6f7 Merge "Refactor tx_size to pixel number mapping in reconintra.c" into nextgenv2 2016-10-22 03:40:53 +00:00
Jingning Han
60e6516e26 Merge "Refactor tx_size step use cases in decoder" into nextgenv2 2016-10-22 03:40:43 +00:00
Jingning Han
88b198d84b Merge "Replace tx_size_1d with tx_size_wide/high" into nextgenv2 2016-10-22 03:40:32 +00:00
Yi Luo
62b6cc0bc9 Merge "Fix avx2 16x16/32x32 fwd txfm coeff output on HBD" into nextgenv2 2016-10-22 01:46:09 +00:00
Yaowu Xu
c06feefbde Merge "Fix compiler warning when CONFIG_ACCOUNTING enabled." into nextgenv2 2016-10-22 01:18:35 +00:00
Jingning Han
bd161f9f6d Merge "Refactor decoder side qcoeff reset" into nextgenv2 2016-10-22 01:07:32 +00:00
Sarah Parker
43d56f32e5 Extend warp_frame functions to average compound predictions
Change-Id: I400e95161d576510423880b5b9923a2307b5eb02
2016-10-21 17:18:48 -07:00
Angie Chiang
a5d96c4a65 Align frame contexts.
This will allow for aligned cdfs and scan orders inside.

Change-Id: I8ebcd64d55e41da20f518a39ae6ef192def70109
2016-10-21 17:15:07 -07:00
Angie Chiang
a1a753c765 Run clang-format on entropymv.c
Change-Id: Ic9f34e32e51f8a8a4426543bae0b92f5fab0792e
2016-10-21 17:13:59 -07:00
Jingning Han
c47fe6c64b Add block size in pixels lookup table
This prepares for the next refactoring to support 2x2 transform
block sizes.

Change-Id: Ia06bc487da34e853ef9323cd13e3d482e819db43
2016-10-21 16:47:08 -07:00
Jingning Han
e7230e9f20 Fix comment typo in common_data.h
"varios" -> "various"

Change-Id: If91a462dc009f701c48c2cfd7965cd71f61f2970
2016-10-21 16:30:10 -07:00
Nathan E. Egge
eb64fc28b6 Add a runtime flag to enable bit accounting.
By default, when building with --enable-accounting the bit accounting
 code will collect statistics for every frame while decoding.
Collecting statistics can slow down decode time and we would eventually
 like to enable the CONFIG_ACCOUNTING flag by default.
This patch adds a runtime flag so that bit accounting statistics are
 only collected when actually needed.

Change-Id: I25d9eaf26ea132d61ace95b952872158c9ac29e7
2016-10-21 23:12:50 +00:00
Nathan E. Egge
c9862e05f5 Add a decoder control to retrieve accounting data.
This decoder control requires AV1 to be compiled with --enable-accounting.
Note that bit accounting data is only available after a frame has been
 decoded.

Change-Id: I8a15213d9f2587638e0edb62932738e985160e03
2016-10-21 16:12:01 -07:00
Nathan E. Egge
ebbd479e18 Fix compiler warning when CONFIG_ACCOUNTING enabled.
ISO C90 forbids mixed declarations and code and the function
 aom_accounting_set_context() was being called before the MB_MODE_INFO
 declaration.

Change-Id: I8619525b1b2fd37753891bd310d9d59c881b8807
2016-10-21 22:57:23 +00:00
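
The C90 rule behind this warning is that every declaration in a block must come before the first statement. A minimal sketch of the corrected ordering, using hypothetical stand-ins (ModeInfo, set_context, read_mode) rather than the real MB_MODE_INFO and aom_accounting_set_context():

```c
/* Hypothetical stand-in for the per-block mode structure. */
typedef struct { int mode; } ModeInfo;

/* Counts calls so the effect of the stand-in is observable. */
int context_calls = 0;

/* Hypothetical stand-in for the accounting call. */
void set_context(int row, int col) {
  (void)row;
  (void)col;
  ++context_calls;
}

int read_mode(const ModeInfo *info, int row, int col) {
  /* Declarations first: valid C90. */
  const ModeInfo *const mi = info;
  /* Statements afterwards. Calling set_context() above the declaration
   * would trigger "ISO C90 forbids mixed declarations and code". */
  set_context(row, col);
  return mi->mode;
}
```

Compiling with a C90 mode and pedantic warnings enabled is one way to check that the ordering is accepted.
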
Nathan E. Egge
5f34b61903 Update class0_fp_cdf and fp_cdf tables once per frame.
Move computing the class0_fp_cdf and fp_cdf tables per coded mv
 symbol to computing it only when the probabilities are updated.

Change-Id: Ib4957c8ab21e6189bcc3817a07b7681dfb343223
2016-10-21 22:56:41 +00:00
Nathan E. Egge
d7b893c667 Update class_cdf table once per frame.
Move computing the class_cdf table per coded mv class symbol to
 computing it only when the probabilities are updated.

Change-Id: I6c4a9075817e8ba2e251f0e82436995f08f2ec5c
2016-10-21 22:55:54 +00:00
Nathan E. Egge
5f7fd7ab5e Update joint_cdf table once per frame.
Move computing the joint_cdf table per coded mv joint symbol to
 computing it only when the probabilities are updated.

Change-Id: If5d195f70e6fad7b60f69606c8386ad5e69657d2
2016-10-21 22:55:31 +00:00
Nathan E. Egge
6ec4d10d3c Update inter_mode_cdf tables once per frame.
Move computing the inter_mode_cdf tables per coded inter mode symbol to
 computing them only when the probabilities are updated.

Change-Id: I7a7b059ee75723cb6f278ed82a20cf34c27915d8
2016-10-21 22:54:50 +00:00
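
The series of "once per frame" changes above shares one idea: derive the CDF table when the probabilities change, not every time a symbol is coded. The sketch below illustrates that idea with a plain frequency table; the struct, the function names, and the use of raw counts instead of the codec's tree probabilities are all assumptions made for illustration.

```c
#include <string.h>

#define NSYMS 4

typedef struct {
  unsigned pmf[NSYMS]; /* per-symbol weights */
  unsigned cdf[NSYMS]; /* cumulative weights, derived from pmf */
  int rebuilds;        /* how many times the CDF was recomputed */
} SymbolModel;

/* Called when the probabilities are updated: the only place the CDF is
 * rebuilt. */
void model_set_probs(SymbolModel *m, const unsigned *pmf) {
  unsigned acc = 0;
  int s;
  memcpy(m->pmf, pmf, sizeof(m->pmf));
  for (s = 0; s < NSYMS; ++s) {
    acc += m->pmf[s];
    m->cdf[s] = acc;
  }
  ++m->rebuilds;
}

/* Called per coded symbol: a plain lookup, no recomputation. */
unsigned model_symbol_cdf(const SymbolModel *m, int symbol) {
  return m->cdf[symbol];
}
```

However many symbols are coded between updates, the rebuild counter only advances when model_set_probs() runs.
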
Yaowu Xu
b808b43b36 Merge "Update uv_mode_cdf tables once per frame." into nextgenv2 2016-10-21 22:53:42 +00:00
Jingning Han
94d5bfccdd Limit the transform block partition depth
Limit the recursive transform block partition depth to 2. For a
32x32 transform block unit, one can maximally go down to 8x8 transform
block size.

Change-Id: I2caa92bb2eee64762b7ecca8920259f7c50fb0aa
2016-10-21 15:44:34 -07:00
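
The depth limit in this message can be checked with a small recursion. The sketch assumes each partition step halves the transform dimension; MAX_VARTX_DEPTH is the macro named in a later commit in this log, the value 2 comes from this message, and the function itself is hypothetical.

```c
#define MAX_VARTX_DEPTH 2 /* depth limit stated in the message */

/* Smallest transform dimension reachable from tx_dim when every level
 * splits, stopping at the depth limit. */
int smallest_tx_dim(int tx_dim, int depth) {
  if (depth == MAX_VARTX_DEPTH) return tx_dim;
  return smallest_tx_dim(tx_dim / 2, depth + 1);
}
```

Starting from a 32x32 unit at depth 0, two splits reach 8x8, which matches the limit described above.
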
Yaowu Xu
e86df524b9 Merge "Update y_mode_cdf tables once per frame." into nextgenv2 2016-10-21 22:44:09 +00:00
Yaowu Xu
a82712b2a6 Merge "Update kf_y_mode_cdf tables once per frame." into nextgenv2 2016-10-21 22:43:57 +00:00
Jingning Han
9777afc392 Allow frame level tx_mode switch
Check the encoding statistics. If all the coding blocks use the
max transform size, skip transform size coding in the frame header.

Change-Id: I31cb16314e87f945d7e95a34a90a5536b3ed82d5
2016-10-21 15:42:50 -07:00
Jingning Han
dc9ad312be Separate intra and inter tx_size counting
Skip counting the inter transform block size distribution for
the intra transform block size coding.

Change-Id: Ifad9d843f57d069d0619a54d66ca18101e1b69f1
2016-10-21 15:40:18 -07:00
Jingning Han
8fd62b75c1 Simplify the recursive transform block decoding
Remove unneeded block index.

Change-Id: Ifceab4985d3ccd65d4c0a110de83a0b457ce5868
2016-10-21 15:31:21 -07:00
Jingning Han
22daaa3aea Refactor tx_size to pixel number mapping in reconintra.c
Change-Id: I1e4a43f5f08b76867240a207c60d7e85a8ffbb74
2016-10-21 15:25:17 -07:00
Jingning Han
2d64f12595 Refactor tx_size step use cases in decoder
Use lookup table to replace the arithmetic computation for transform
block step.

Change-Id: I1318d81bda9d7ffaf9d550acd19354b0615ede36
2016-10-21 15:22:12 -07:00
Jingning Han
5d5cd6a748 Replace tx_size_1d with tx_size_wide/high
This prepares support for both rectangular and 2x2 transform
block sizes.

Change-Id: I3c2d4e317f6b627bb45d2273c278331bd976ee92
2016-10-21 15:18:39 -07:00
Jingning Han
1be1878572 Refactor decoder side qcoeff reset
Allow the decoder to memset partial dequantized coefficient line
to zero.

Change-Id: I1f07dc7bf802958754502c1b5c819cc81e7a08cb
2016-10-21 15:10:23 -07:00
Yaowu Xu
dc3c3a33cb Merge "Pass AV1_COMMON into get_scan" into nextgenv2 2016-10-21 21:51:50 +00:00
Yaowu Xu
b11ee30519 Merge "Decoder performance improvement with daala_ec." into nextgenv2 2016-10-21 21:50:13 +00:00
Sarah Parker
1634b48022 Merge "Fix logical vs bitwise & bug" into nextgenv2 2016-10-21 21:16:54 +00:00
Yi Luo
1a0f27aaa6 Fix avx2 16x16/32x32 fwd txfm coeff output on HBD
Change-Id: Ida036defe5688894a63007a31aa2dd0b3f0b5d59
2016-10-21 14:14:00 -07:00
Jingning Han
dc90bf0737 Merge "Fix unused variable error in intrapred.c" into nextgenv2 2016-10-21 21:11:31 +00:00
Jingning Han
823411ea4d Merge "Refactor tx_size to pixel number in decodeframe.c" into nextgenv2 2016-10-21 19:39:22 +00:00
Nathan E. Egge
380cb1a93c Update uv_mode_cdf tables once per frame.
Move computing the uv_mode_cdf tables per coded intra mode symbol to
 computing them only when the probabilities are updated.

Change-Id: I627b59d30726c913f5d7ba7753cb0446a12655bb
2016-10-21 12:39:04 -07:00
Nathan E. Egge
5710c722af Update y_mode_cdf tables once per frame.
Move computing the y_mode_cdf tables per coded intra mode symbol to
 computing them only when the probabilities are updated.

Change-Id: I8c43d09b8ef5febe2a3ec64bd51d28bd78ea73ed
2016-10-21 12:39:04 -07:00
Nathan E. Egge
3ef926edc2 Update kf_y_mode_cdf tables once per frame.
Move computing the kf_y_mode_cdf tables per coded intra mode symbol to
 computing them only when the probabilities are updated.

Change-Id: I5999447050c2f7d5dbccde80bee05ecd1c5440ab
2016-10-21 12:39:04 -07:00
Nathan E. Egge
5357dcaf71 Decoder performance improvement with daala_ec.
Cherry-pick Daala b5020bee:
 Remove redundant test in od_ec_decode_bool_q15().
Using a test that decodes 100M random binary symbols, making this change
 produced a speed up of 8.81% with gcc-4.9.3 and 3.71% with clang-3.7.1,
 both compiled with -O2.

Change-Id: If6d0077a56121a575ae53bcd4d1d9b7d800a317d
2016-10-21 12:38:30 -07:00
Yaowu Xu
91219941b1 Merge "Use divide by multiply in the ans writer." into nextgenv2 2016-10-21 18:46:29 +00:00
Angie Chiang
ff6d890557 Pass AV1_COMMON into get_scan
This CL will facilitate the adapt_scan experiment.
In the adapt_scan experiment, the dynamic scan order will be stored in
AV1_COMMON

Change-Id: I4763ea931b5e1af54d4f173971befeb01a4db335
2016-10-21 11:46:19 -07:00
Yaowu Xu
65818322ef Merge "Sub8x8 block chroma component inter prediction" into nextgenv2 2016-10-21 18:46:18 +00:00
Angie Chiang
646e52a85a Fix unused variable error in intrapred.c
Change-Id: Icda975cd9b264c1752c3057bce8031791f91c08a
2016-10-21 11:45:31 -07:00
Angie Chiang
b0f9968ac7 Merge "Remove the has_no_coeffs corner case" into nextgenv2 2016-10-21 18:16:45 +00:00
Yaowu Xu
c2c5ec21b6 Merge "Unify set_contexts() function for encoder and decoder" into nextgenv2 2016-10-21 18:00:32 +00:00
Yaowu Xu
2f5b9d66b5 Merge "Add support for v256 intrinsics" into nextgenv2 2016-10-21 18:00:20 +00:00
Jingning Han
3d855c5e75 Refactor tx_size to pixel number in decodeframe.c
Use the table access to retrieve pixel numbers from tx_size.

Change-Id: I9459f2c3292c2f9ddf963f16b79e142de7432031
2016-10-21 10:55:54 -07:00
Yaowu Xu
c76572af16 Merge changes Icfc16070,Ied47a248,I8af087d9,I322a1366,If04580af into nextgenv2
* changes:
  Palette: Use inverse_color_order to find color index faster.
  Rewrite some loops to avoid -Wunsafe-loop-optimizations warnings.
  Remove some useless casts
  Add compiler warning flag -Wextra and fix related warnings.
  Declare some array sizes to be constants (known at compile time).
2016-10-21 17:31:42 +00:00
Yaowu Xu
98a306a1b2 Merge changes I027a4f2a,Ide91d76f into nextgenv2
* changes:
  Add complier warning -wunused.
  angle estimation: Some renames/tweaks to sync with aomedia code.
2016-10-21 17:31:22 +00:00
Yaowu Xu
32d8a496ef Merge "Code class0 using aom_read() / aom_write()." into nextgenv2 2016-10-21 17:25:50 +00:00
Yaowu Xu
f29166deff Merge "Use intra_ext_tx_cdf when coding tx_type." into nextgenv2 2016-10-21 17:25:18 +00:00
Yaowu Xu
c53f8ca6fb Merge "Use MV_FP_SIZE based constant instead of 3." into nextgenv2 2016-10-21 17:19:27 +00:00
Alex Converse
64e2f105a7 Use divide by multiply in the ans writer.
Change-Id: Ide4e9b3a605571ec41c265347217e103df8d0821
2016-10-21 09:54:41 -07:00
Jingning Han
e29ea12fc2 Sub8x8 block chroma component inter prediction
Handle the sub8x8 chroma component at the unit of 2x2/4x2/2x4 level
and use the motion vector inherited from the luma component. This
improves the coding performance:

lowres 0.4%
midres 0.25%
hdres  0.15%

Change-Id: I34dff4218cfa3e5d55e7ed0341f36f4719389f7e
2016-10-21 09:39:34 -07:00
Yaowu Xu
67cf85b883 Merge "Remove duplicate code" into nextgenv2 2016-10-21 16:34:24 +00:00
Jingning Han
a6923f7f97 Unify set_contexts() function for encoder and decoder
Remove the separate implementations of set_contexts() in encoder
and decoder.

Change-Id: I9f6e9b075532faae0f74f885d9443589254258a7
2016-10-21 09:32:28 -07:00
Yi Luo
e4abb97ba3 Merge "Fix the overflow of av1_fht32x32() in 2D DCT_DCT" into nextgenv2 2016-10-21 16:13:18 +00:00
Steinar Midtskogen
045d413ca2 Add support for v256 intrinsics
Change-Id: I1da08afaa945ca1aaf4bf9f50cf649a7feef2e60
2016-10-21 08:55:37 -07:00
Nathan E. Egge
45ea963f0b Code class0 using aom_read() / aom_write().
The av1_mv_class0_tree is a balanced tree with two leaves and can
 simply be coded as a boolean with probability class0[0].
If CLASS0_SIZE is ever changed from 1, this change will need to be
 reverted.

Change-Id: If294dac825a5f945371092c74aa8e3f84cd962b6
2016-10-21 08:34:03 -07:00
Nathan E. Egge
72762a2827 Use intra_ext_tx_cdf when coding tx_type.
When building with --enable-daala_ec, the tx_type for intra blocks can be
 coded using the CDFs that are updated once per frame.
This patch converts a tx_type symbol to be coded with aom_write_symbol()
 and aom_read_symbol() that was missed in f3e8e267.

Change-Id: I34f8fef7525f88e156bbcb78dfc48994367610ce
2016-10-21 08:29:08 -07:00
Nathan E. Egge
ac499f352e Use MV_FP_SIZE based constant instead of 3.
Change-Id: I90ef3b49b499c2ac9c24797467cb4eb194fdf23b
2016-10-21 08:25:33 -07:00
Yaowu Xu
68cb657e92 Remove duplicate code
The duplicate breaks build.

Change-Id: I0f16761c4bcb8563402a664013429403b883c2e1
2016-10-21 08:22:46 -07:00
Yaowu Xu
b97c3a13de Merge "Fix typos" into nextgenv2 2016-10-21 14:44:35 +00:00
Yaowu Xu
23f0604188 Merge "Fix encoder crash when --enable-daala-ec" into nextgenv2 2016-10-21 14:44:26 +00:00
Yaowu Xu
d56df2f9f0 Merge "Pass AV1_COMMON into av1_cost_coeffs" into nextgenv2 2016-10-21 03:20:28 +00:00
Yaowu Xu
432d9071ce Merge "Add adapt_scan APIs and some helping functions" into nextgenv2 2016-10-21 03:20:17 +00:00
Yaowu Xu
361f3fe3b0 Merge "Compute all token encodings from symbol trees." into nextgenv2 2016-10-21 03:20:01 +00:00
Yaowu Xu
360383bfca Merge "decodeframe.c: aom_read_tree_cdf->aom_read_symbol" into nextgenv2 2016-10-21 03:19:14 +00:00
Yaowu Xu
6d5ebbd76c Merge "Encoder/Decoder mismatch fix: need a separate copy of eob_counts." into nextgenv2 2016-10-21 01:23:25 +00:00
Yaowu Xu
c287e271f2 Fix typos
In a previous commit, 5db9743fbb, two
changes that appear to be typos break the build when experiments
are enabled:

../../libvpx/configure --enable-experimental --enable-ref-mv
--enable-ext-intra --enable-ext-refs --enable-ext-interp
--enable-supertx --enable-var-tx --enable-entropy --enable-ext-inter
--enable-ext-tx  --enable-motion-var --enable-dual-filter
--enable-ext-partition --enable-ext-partition-types
--enable-loop-restoration --enable-rect-tx --enable-palette
--enable-aom-highbitdepth --enable-filter-intra --enable-internal-stats
&& make clean && make -j16

This commit fixes the issue.

Change-Id: I9ce5bbc96df326214202868cb0669bd334c86851
2016-10-20 18:19:16 -07:00
Yaowu Xu
e1466ad4e4 Fix encoder crash when --enable-daala-ec
Change-Id: I6855e18d92f693a9789eda7c91a3430566469bdd
2016-10-20 17:56:54 -07:00
Angie Chiang
22ba7514df Pass AV1_COMMON into av1_cost_coeffs
Change-Id: I2043d635e2a7f50f84a541501f28179b797ca326
2016-10-20 17:18:18 -07:00
Angie Chiang
e7d9d1ebeb Merge changes I163874ee,I1424690f into nextgenv2
* changes:
  Add data structure of adpat_scan experiment
  Add adapt_scan experimental flag
2016-10-20 23:52:11 +00:00
Zoe Liu
528d9de543 Merge "Sync with aom branch for ext-refs" into nextgenv2 2016-10-20 22:58:15 +00:00
Urvang Joshi
967ff395b6 Palette: Use inverse_color_order to find color index faster.
Cherry-picked from aomedia/master: b1c3bb5

Change-Id: Icfc16070160fd9763abb1dbf5545103e62b4b9ff
2016-10-20 15:54:33 -07:00
Urvang Joshi
b42827f650 Rewrite some loops to avoid -Wunsafe-loop-optimizations warnings.
For example, loops of the form:
"for (i = 0; i < 1 + max_value; ++i) ..." or
"for (i = 0; i <= max_value; ++i) ..." are possibly infinite loops,
theoretically speaking (even if practically, they aren't).
So the compiler cannot optimize those loops.

When possible, I rewrote such loops to be finite even theoretically.

Cherry-picked from aomedia/master: 4e69284

Change-Id: Ied47a24833b689c0ec011f8645cf1c01856f7c59
2016-10-20 15:53:11 -07:00
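
The loop hazard described in this message comes from an inclusive bound that the counter's type can never exceed. The sketch below shows one possible rewrite, an exclusive bound held in a wider type; it is an assumption about the general technique, since the message does not show the rewritten loops.

```c
/* Counts the iterations of a loop over [0, max_value] inclusive.
 *
 * With an unsigned char counter, "i <= max_value" is always true when
 * max_value is 255, so the loop would never end. Using an int counter and
 * an exclusive bound makes the trip count provably finite. */
unsigned long count_inclusive(unsigned char max_value) {
  unsigned long n = 0;
  const int end = (int)max_value + 1; /* cannot overflow an int */
  int i;
  for (i = 0; i < end; ++i) ++n;
  return n;
}
```

The rewritten loop visits the same values as the inclusive form, including the case where max_value is the largest value of its type.
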
Urvang Joshi
77853e56ea Remove some useless casts
Cherry-picked from aomedia/master: 6796e7f

Change-Id: I8af087d97cadb0c2a9e37a4e4723246cdd397995
2016-10-20 15:51:41 -07:00
Urvang Joshi
d71a231c49 Add compiler warning flag -Wextra and fix related warnings.
Note: some of these warnings are enabled by a combination of -Wunused
(added earlier) and -Wextra.

Cherry-picked from aomedia/master: 4790a69

Change-Id: I322a1366bd4fd6c0dec9e758c2d5e88e003b1cbf
2016-10-20 15:49:16 -07:00
Nathan E. Egge
3c05679017 Compute all token encodings from symbol trees.
The av1_token encodings must match the contents of the aom_tree_index
 structures so generate all encodings from the symbol trees.

Change-Id: I37be9f12c86a02693ae3c3c1d24b00f2abb29bfb
2016-10-20 15:34:08 -07:00
Yaowu Xu
f2581a3a30 decodeframe.c: aom_read_tree_cdf->aom_read_symbol
This was a missed replacement from cherry-pick of:
9ac7a9dc8ced90a28f5b83801a50597dc12e50a7

Change-Id: I9e01d9d7a39bed397500a293bf68dca2746aa917
2016-10-20 15:31:11 -07:00
Yaowu Xu
ec5a1942e2 Merge changes I7d6394e4,Ia8ce1464,If20e8637,Ia9adc46b,I651db25b into nextgenv2
* changes:
  Define SIMD_INLINE using AOM_FORCE_INLINE
  AOM_FORCE_INLINE: fix always_inline attribute
  Free memory allocated by daala_ec encoder.
  Move clpf_sse4_1.c to clpf_sse4.c in agreement with convention
  sync avg_test.cc with aom/master
2016-10-20 22:30:11 +00:00
Yaowu Xu
748d3f5e0f Merge "Fix Visual Studio build." into nextgenv2 2016-10-20 22:29:57 +00:00
Jingning Han
6d51377858 Merge "Offset speed feature setting index" into nextgenv2 2016-10-20 22:16:00 +00:00
Jingning Han
feee3ed5ee Merge "Add tx_size to pixel number map" into nextgenv2 2016-10-20 22:15:53 +00:00
Jingning Han
7ae6ae3497 Merge "Add 2x2 directional intra predictors" into nextgenv2 2016-10-20 22:15:46 +00:00
Debargha Mukherjee
9cce436975 Merge "Fix for AV1.TestTell" into nextgenv2 2016-10-20 22:05:36 +00:00
Urvang Joshi
bffc0b5748 Declare some array sizes to be constants (known at compile time).
This reduces some memcpys and callocs.

Cherry-picked from aomedia/master: 4081013

Change-Id: If04580af4c63892c8af8ac5b405c7d6aabe5af89
2016-10-20 14:58:13 -07:00
Urvang Joshi
3212dda94d Add compiler warning -Wunused.
Cherry-picked from aomedia/master: 953f086c

Note: related fixes were already part of webm/nextgenv2.

Change-Id: I027a4f2a540af5a304b358ddbf293965b4211b9e
2016-10-20 14:58:01 -07:00
Urvang Joshi
da70e7b0fa angle estimation: Some renames/tweaks to sync with aomedia code.
Change-Id: Ide91d76fafe79b2b310ffd5afb7cd5b26b681f78
2016-10-20 14:57:34 -07:00
Urvang Joshi
43e6281f62 Encoder/Decoder mismatch fix: need a separate copy of eob_counts.
The bug was introduced here:
https://chromium-review.googlesource.com/#/c/399975/4/av1/encoder/bitstream.c
In that patch, I had removed the second declaration of a variable of the
same name. But it turns out that the two variables actually had different
types (even though the name was the same).

Now, we keep both variables, but rename one of them -- that fixes the
mismatch. While we are at it, made both variables local as well.

The fix can be verified as follows:
../../libvpx/configure --enable-experimental --enable-supertx
--enable-var-tx --enable-entropy --enable-internal-stats && make clean
&& make -j16

aomenc -o soccer_cif_1000_av1_b8.webm ../soccer_cif.y4m --codec=av1
--limit=50 --skip=0 -p 2 --pass=1 --fpf=soccer_cif_av1.fpf --good
--cpu-used=0 --target-bitrate=1000 --lag-in-frames=25 --min-q=0
--max-q=63 --auto-alt-ref=1 --kf-max-dist=150 --kf-min-dist=0
--drop-frame=0 --static-thresh=0 --bias-pct=50 --minsection-pct=0
--maxsection-pct=2000 --arnr-maxframes=7 --arnr-strength=5 --sharpness=0
--undershoot-pct=100 --overshoot-pct=100 --frame-parallel=0
--tile-columns=0 --profile=0 --test-decode=warn

aomenc -o soccer_cif_1000_av1_b8.webm ../soccer_cif.y4m --codec=av1
--limit=50 --skip=0 -p 2 --pass=2 --fpf=soccer_cif_av1.fpf --good
--cpu-used=0 --target-bitrate=1000 --lag-in-frames=25 --min-q=0
--max-q=63 --auto-alt-ref=1 --kf-max-dist=150 --kf-min-dist=0
--drop-frame=0 --static-thresh=0 --bias-pct=50 --minsection-pct=0
--maxsection-pct=2000 --arnr-maxframes=7 --arnr-strength=5 --sharpness=0
--undershoot-pct=100 --overshoot-pct=100 --frame-parallel=0
--tile-columns=0 --profile=0 --test-decode=warn -v --psnr

Change-Id: Ibd72dbe1f620e6de231513220ee4e190606613ae
2016-10-20 14:51:01 -07:00
Hui Su
c58d95717f Merge "Renaming in filter-intra sse4 code" into nextgenv2 2016-10-20 21:36:42 +00:00
Hui Su
e3642ac688 Merge "Remove av1/common/intra_filters.h" into nextgenv2 2016-10-20 21:35:10 +00:00
Hui Su
475159cb69 Merge "Separate FILTER_INTRA from EXT_INTRA experiment" into nextgenv2 2016-10-20 21:34:33 +00:00
James Zern
d37c22271c Merge "Add matching brace in aomenc.c" into nextgenv2 2016-10-20 19:38:57 +00:00
Sarah Parker
ea16b68986 Fix logical vs bitwise & bug
This was causing one of the global motion parameters to not
be centered at 0.

Change-Id: Ide32e3d177bed5613ab768a19b4e33b37692463a
2016-10-20 12:00:16 -07:00
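A small sketch of the class of bug named in commit ea16b68986. The commit does not show the affected expression, so the functions and the mask below are hypothetical; the point is only that "&&" collapses its result to 0 or 1 while "&" keeps the masked bits.

```c
/* Hypothetical: "&&" yields only 0 or 1, whatever the mask is. */
static int low_bits_logical(int value) { return value && 0x7; }

/* Hypothetical: "&" keeps the low three bits of value. */
static int low_bits_bitwise(int value) { return value & 0x7; }
```

For value 6, the logical form returns 1 and the bitwise form returns 6, so a parameter derived from the wrong one ends up with the wrong range.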
Peter de Rivaz
130ca4d675 Remove the has_no_coeffs corner case
BUG=webm:1277

Change-Id: I052239e8a6c468da8704bdbbb663b59533c01be2
2016-10-20 19:38:26 +01:00
Angie Chiang
648aeb0b1b Add adapt_scan APIs and some helping functions
av1_init_scan_order
initialize data structures related to adaptive scan order

av1_update_scan_prob
update nonzero probabilities from nonzero counts

av1_augment_prob
embed r + c and coeff_idx info with nonzero probabilities.
When sorting the nonzero probabilities, if there is a tie,
the coefficient with smaller r + c will be scanned first

av1_update_sort_order
apply quick sort on nonzero probabilities to obtain a sort order

av1_update_scan_order
apply topological sort on the nonzero probabilities sorting order to
guarantee each to-be-scanned coefficient's upper and left coefficient
will be scanned before the to-be-scanned coefficient.

av1_update_neighbors
For each coeff_idx in scan[], update its above and left neighbors in
neighbors[] accordingly.

Change-Id: I64c4938057daf8e30e48609a00ecc08d2e3062f4
2016-10-20 11:20:40 -07:00
Zoe Liu
6cfaff95b7 Sync with aom branch for ext-refs
Plus a small code clean up. The experiment of EXT_REFS, compared against
the baseline, using Overall PSNR, now obtains a gain on lowres as:
Avg: -5.818; BDRate: -5.653

Compared against the previous EXT_REFS results on lowres, a tiny gain is
obtained as:
Avg: -0.047, BDRate: -0.063

(1) 780952 Add encoder first pass support to bi-prediction in EXT_REFS
(2) f91498 Add pred prob handling for new references in EXT_REFS
(3) e91472 Add decoder support for bi-direct prediction in EXT_REFS
(4) 0dbac9 Add encoder support to new references in EXT_REFS
(5) ad70cc Remove hard-coded number for EXT_REFS
(6) 9c1e2f Add the use of new reference frames at encoder in EXT_REFS
(7) 6d4fde Add the experiment flag of EXT_REFS

Change-Id: I26f7ca45b9ede7579fdb9d0d6a1a91f4334599bd
2016-10-20 10:55:11 -07:00
Angie Chiang
37fb8edd7c Add data structure of adapt_scan experiment
Change-Id: I163874ee64b9c348de2c7cc8e7b2852308734b0e
2016-10-20 10:00:10 -07:00
Yi Luo
157e45a44b Fix the overflow of av1_fht32x32() in 2D DCT_DCT
- Use range check function to avoid DCT_DCT overflow.
  We need to re-develop the column txfm side scaling/rounding. Now,
  we prefer to maintain the current BDRate level.
- Encoder user level time reduction <1% owing to av1_fht32x32_avx2.
- Add MemCheck unit test and fdct32() unit test.

Change-Id: I1e67030f67bc637859798ebe2f6698afffb8531c
2016-10-20 09:22:24 -07:00
Angie Chiang
8c2dc6f591 Add adapt_scan experimental flag
Change-Id: I1424690fa792b960a1cfb78bbcb37da6b9899ee6
2016-10-20 09:19:01 -07:00
Peter de Rivaz
f994855e8e Fix for AV1.TestTell
The tell functions return an unsigned integer.
This causes the AV1.TestTell test case to fail because
-1 is greater than 20 when treated as an unsigned integer.

Change-Id: I9dd1d7eb61260d30d1713a4917159fc6fe8eee42
2016-10-20 16:24:06 +01:00
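A sketch of the unsigned comparison problem that commit f994855e8e describes. The helper names and the tolerance check are assumptions, not the test's real code: a difference of -1 between two unsigned tell values wraps to a huge number and fails a "<= 20" check unless the subtraction is done in a signed type.

```c
#include <stdint.h>

/* Hypothetical check in unsigned arithmetic: when actual < expected the
 * subtraction wraps around, so a difference of -1 compares as larger
 * than any small tolerance. */
static int diff_le_unsigned(uint32_t expected, uint32_t actual, uint32_t tol) {
  return actual - expected <= tol;
}

/* Hypothetical fix: widen to a signed type before subtracting. */
static int diff_le_signed(uint32_t expected, uint32_t actual, uint32_t tol) {
  int64_t diff = (int64_t)actual - (int64_t)expected;
  return diff <= (int64_t)tol;
}
```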
hui su
9ff4134f54 Renaming in filter-intra sse4 code
Change-Id: Iff1786a92d164e6b9cfaf4a59ece79819494276f
2016-10-19 21:41:06 -07:00
hui su
344b643d59 Remove av1/common/intra_filters.h
Use a single header reconintra.h for all intra prediction
related code.

Change-Id: Ib869447f8c482b534c890eab673e81ff830e8d85
2016-10-19 21:41:06 -07:00
hui su
5db9743fbb Separate FILTER_INTRA from EXT_INTRA experiment
Prepare for the av1/nextgenv2 merge.

Coding gain (%):

               lowres     midres
ext-intra       0.69       0.97
filter-intra    0.67       0.83
both            1.05       1.48

Change-Id: Ia24d6fafb3e484c4f92192e0b7eee5e39f4f4ee6
2016-10-19 21:40:49 -07:00
Yaowu Xu
cfc5ac5034 Merge "Partition the ans experiment into 'ans' and 'rans'" into nextgenv2 2016-10-19 22:58:05 +00:00
Jingning Han
775d99f07e Offset speed feature setting index
Change-Id: If201cbd4175842f68e6dcfb0414ff16ca07e0881
2016-10-19 22:55:44 +00:00
hui su
251e151c3d Add matching brace in aomenc.c
Change-Id: Iccb75d5204f0f52f2c7d6e18d1f8223ce10f68ba
2016-10-19 15:31:51 -07:00
Steinar Midtskogen
c38afedb8d Define SIMD_INLINE using AOM_FORCE_INLINE
Change-Id: I7d6394e48e9b6093e5b523387ed250f371ee7fb9
2016-10-19 15:14:27 -07:00
Thomas
e28d92be97 Fix Visual Studio build.
Change-Id: I01608dfd597cc1d2bd4e73918aa29cf9251edb08
2016-10-19 15:14:27 -07:00
Thomas Davies
f693610a1a Step size and arithmetic coding for delta quantization.
Example performance: 1.8% bit rate savings using
the AQ test mode aq-mode=4 :
./aomenc --codec=av1 --ivf --tile-columns=1 --tile-rows=1 \
                 --kf-max-dist=1000 --kf-min-dist=1000 --cpu-used=0 \
                 --passes=1 --threads=1 --lag-in-frames=0 \
                 --end-usage=q --limit=600 --cq-level=42 \
                 --aq-mode=4 --error-resilient=1 out.bits FourPeople_1280x720_60.y4m

Change-Id: Iba01cf2732a57f3c27481ac2a3c8fc37bb9e5533
2016-10-19 15:14:27 -07:00
James Zern
7dec51534f AOM_FORCE_INLINE: fix always_inline attribute
Change-Id: Ia8ce146489713e137004ccf41faf35aa5645b8ae
2016-10-19 15:14:27 -07:00
Arild Fuldseth
07441165fe Support for delta-q at superblock level
Change-Id: I4128af44776d1f361bddc1fdffb75ed2224dbfa5
2016-10-19 15:14:27 -07:00
Nathan E. Egge
e734fcb114 Free memory allocated by daala_ec encoder.
Free the two memory buffers allocated by the daala_ec encoder when
 calling od_ec_enc_clear() from aom_daala_stop_encode().

Change-Id: If20e86374ea29e51ee59111012905e56039dd4cc
2016-10-19 15:14:27 -07:00
Steinar Midtskogen
f250e20d13 Move clpf_sse4_1.c to clpf_sse4.c in agreement with convention
Change-Id: Ia9adc46b8a4d08c5b8e0089ea1a1526df4f1e1dc
2016-10-19 15:14:27 -07:00
Yaowu Xu
fc5176f851 sync avg_test.cc with aom/master
Change-Id: I651db25bee8f83a9fc6dcd35db5007a002f171c0
2016-10-19 15:14:27 -07:00
Yaowu Xu
dc8a2c523f Merge "Always send frame size explicitly" into nextgenv2 2016-10-19 22:00:40 +00:00
Jingning Han
03b3514058 Add 2x2 directional intra predictors
Change-Id: Iaa25269a15231dadeaba0f4836c864fc10e858df
2016-10-19 21:58:09 +00:00
Yaowu Xu
0a3284cbb9 Merge "Fix build issues when --enable-aom-qm" into nextgenv2 2016-10-19 21:56:41 +00:00
Jingning Han
02935f5f1b Add tx_size to pixel number map
Change-Id: I789fa11638f155f1092a1e9260d26c7855d18e37
2016-10-19 14:52:53 -07:00
Yaowu Xu
8057103d2e Merge "Fix decodeframe.c format" into nextgenv2 2016-10-19 21:28:06 +00:00
Yue Chen
0651eced9f Merge "Remove OBMC from the experimental configure list" into nextgenv2 2016-10-19 21:02:15 +00:00
Yaowu Xu
2a813e41ce Merge "Add unit test for delta-q (aq-mode=4)" into nextgenv2 2016-10-19 21:01:03 +00:00
Jingning Han
8f6eb189e6 Fix decodeframe.c format
Change-Id: I2228a3d1778917ac760582fbec3c868be5d9ba1c
2016-10-19 13:48:57 -07:00
Arild Fuldseth
842e9b030f Always send frame size explicitly
This commit sends the frame size explicitly when
error_resilient_mode=1. The purpose is to allow parsing of the
bitstream after a packet loss.

Change-Id: I7d1c010a465aa18914762cc1a3e61db377304c08
2016-10-19 12:35:12 -07:00
Yaowu Xu
0dd046371f Fix build issues when --enable-aom-qm
Change-Id: I1a462675c06c4b2a5f8b4b347f23fec67feccdd0
2016-10-19 12:26:53 -07:00
Alex Converse
ec6fb649da Partition the ans experiment into 'ans' and 'rans'
The (new) ans experiment replaces the bool coder with uABS bools. The
'rans' experiment adds multisymbol coding.

This matches the setup in aom/master.

Change-Id: Ida8372ccabf1e1e9afc45fe66362cda35a491222
2016-10-19 12:03:15 -07:00
Yaowu Xu
870a72d6b5 Merge "Fix failing TestBitIO test with --enable-daala_ec." into nextgenv2 2016-10-19 18:59:20 +00:00
Yaowu Xu
e94767ae97 Merge "Change return type of tell and tell_frac to uint32_t." into nextgenv2 2016-10-19 18:59:08 +00:00
Yue Chen
48877de873 Remove OBMC from the experimental configure list
It was replaced by MOTION_VAR in commit cb60b18

Change-Id: I7ab625eef4dbae2e5585d9fa3b6873aa78b2c254
2016-10-19 18:45:34 +00:00
Arild Fuldseth (arilfuld)
9f28cb8f93 Add unit test for delta-q (aq-mode=4)
Change-Id: Ic529355880b4dbd076a7e46e7b03a49a1ee5f6f0
2016-10-19 11:35:40 -07:00
Urvang Joshi
66b1fcc924 Merge changes I3922dea2,I3bab2848,I21f7478a,Ida5de713,Ib9f0eefe, ... into nextgenv2
* changes:
  Fix warnings reported by -Wshadow: Part4: main directory
  Fix warnings reported by -Wshadow: Part3: test/ directory
  Fix warnings reported by -Wshadow: Part2b: more from av1 directory
  Fix warnings reported by -Wshadow: Part2: av1 directory
  Fix warnings reported by -Wshadow: Part1b: scan_order struct and variable
  Fix warnings reported by -Wshadow: Part1: aom_dsp directory
  Move STAT_TYPE enum to source file.
  Code cleanup: mainly rd_pick_partition and methods called from there.
2016-10-19 18:25:52 +00:00
Nathan E. Egge
e58781d329 Fix failing TestBitIO test with --enable-daala_ec.
Change-Id: I6a885b7c6315261d67a9c2fcde914206b8301f4a
2016-10-19 10:54:40 -07:00
Nathan E. Egge
b244f39627 Change return type of tell and tell_frac to uint32_t.
The bit accounting functions aom_reader_tell() and aom_reader_tell_frac()
 return the number of bits and 1/8th bits respectively.
This patch changes the return type from ptrdiff_t which is signed to
 uint32_t which is unsigned.
The size_t type is not used since we only care about the number of bits
 or 1/8 bits per entropy coder context and we don't expect to code more
 than 512 megabits per tile.

Change-Id: I84a119d1f52829dcbdb66a92656eacca06e42b11
2016-10-19 10:53:52 -07:00
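The 512 megabit figure in commit b244f39627 can be checked with a short calculation, assuming "megabit" means 2^20 bits: a uint32_t counting 1/8 bits holds 2^32 values, which is 2^29 whole bits. The function name is hypothetical.

```c
#include <stdint.h>

/* Number of whole bits representable when a uint32_t counts 1/8 bits:
 * 2^32 eighth-bits divided by 8. */
static uint64_t whole_bits_in_eighth_bit_counter(void) {
  return ((uint64_t)UINT32_MAX + 1) / 8;
}
```

The result is 536870912 bits, which is 512 * 2^20.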
Hui Su
3e908b7f44 Merge "Temporary fix for 4X8 block intra prediction." into nextgenv2 2016-10-19 16:55:20 +00:00
Hui Su
e22a480225 Merge "Fix format in set_offsets()" into nextgenv2 2016-10-19 16:54:30 +00:00
Angie Chiang
d83fc3b8d9 Merge "Add av1_fdct64_new and av1_idct64_new" into nextgenv2 2016-10-19 16:34:24 +00:00
Urvang Joshi
4145bf05ae Fix warnings reported by -Wshadow: Part4: main directory
Now that all warnings are taken care of, add warning flag -Wshadow to
configure.

Note: Enabling this flag for C++ generates some useless warnings about
some function parameters shadowing class member function names. So, only
enabling this warning for C code.

Cherry-picked from aomedia/master: b96cbc4

Change-Id: I3922dea2e6976b16519c4aa4d1bd395c198134f1
2016-10-19 07:56:53 -07:00
Peter de Rivaz
74d0ad844e Fix for var_tx context update
The tx_partition_set_contexts function changes tx_size even
for blocks coded with a rectangular transform.
This causes an internal rd inconsistency when using all of
CONFIG_VAR_TX, CONFIG_RECT_TX, CONFIG_EXT_TX.

Change-Id: Ia45d4a8893b0961534219bb96d9652719038c7a1
2016-10-19 11:43:11 +01:00
Yaowu Xu
caf2023ae1 Reorder includes
Change-Id: I97487bf353471bf9d245cd620780adfb1d3fc2b1
2016-10-19 04:34:49 +00:00
Michael Bebenita
6048d05225 Bit accounting.
This patch adds bit accounting infrastructure to the bit reader API.
When configured with --enable-accounting, every bit reader API
function records the number of bits needed to decode a symbol.
Accounting symbol entries are collected in a global accounting data
structure that can be used to understand exactly where bits are
spent (http://aomanalyzer.org). The data structure is cleared and
reused each frame to reduce memory usage. When configured without
--enable-accounting, bit accounting does not incur any runtime
overhead.

All aom_read_xxx functions now have an additional string parameter
that specifies the symbol name. By default, the ACCT_STR macro is
used (which expands to __func__). For more precise accounting,
these should be replaced with more descriptive names.

Change-Id: Ia2e1343cb842c9391b12b77272587dfbe307a56d
2016-10-19 04:34:29 +00:00
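A sketch of how a default symbol name taken from __func__ behaves, as commit 6048d05225 describes for ACCT_STR. Only the macro name and its expansion come from the commit message; the reader function, the recording variable, and the caller are hypothetical stand-ins for the real aom_read_xxx functions.

```c
#include <string.h>

/* As described in the commit message: the default accounting string is
 * the name of the calling function. */
#define ACCT_STR __func__

/* Hypothetical record of the last symbol name seen by the reader. */
static const char *last_symbol_name;

/* Hypothetical reader: takes the symbol name as an extra string
 * parameter and records it alongside the decoded value. */
static int read_bit_sketch(int bit, const char *acct_str) {
  last_symbol_name = acct_str;
  return bit;
}

/* Hypothetical caller: ACCT_STR expands to "read_skip_flag" here. */
static int read_skip_flag(void) { return read_bit_sketch(1, ACCT_STR); }
```

A caller that wants finer-grained accounting would pass a string literal in place of ACCT_STR.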
Debargha Mukherjee
4bacfcffd0 Merge "Fix ransac random generator seeding" into nextgenv2 2016-10-19 01:39:08 +00:00
Yaowu Xu
321556a557 Merge "Update segment tree_cdf per frame." into nextgenv2 2016-10-19 01:09:52 +00:00
Yaowu Xu
4aec17a7ec Merge "Adds ability to measure with a higher precision the number of bits read per symbol." into nextgenv2 2016-10-19 01:09:41 +00:00
Sarah Parker
cd2750048f Merge "Add clamping to parameter search" into nextgenv2 2016-10-19 00:44:28 +00:00
Sarah Parker
5572486ed7 Merge "Adjust gm costing so GLOBAL_ZERO is treated as regular zeromv" into nextgenv2 2016-10-19 00:44:12 +00:00
Jingning Han
97d854831f Fix format in set_offsets()
Change-Id: I371297e6ee000e6dc01ba1544763cbed429b0e5a
2016-10-18 17:42:09 -07:00
Brennan Shacklett
7523a7ecd6 Temporary fix for 4X8 block intra prediction.
Currently the RD loop traverses 4X8 blocks in inverted N order while
the bitstream stores blocks smaller than 8x8 in Z order. This causes a
discrepancy where the RD loop reads uninitialized data while
performing intra prediction.  As a temporary fix simply disable the
use of the extended right edge for 4X8 blocks, until the bitstream can
be changed to match the logical structure of the blocks.

Change-Id: I44a9e4fc1a15cd551a7b38c3c1227bc5dac77e9a
2016-10-18 17:24:53 -07:00
Urvang Joshi
88a03bb68f Fix warnings reported by -Wshadow: Part3: test/ directory
Cherry-picked from aomedia/master: be029580

Change-Id: I3bab28488388f92f2db20e6af8fc9cf2d7f26015
2016-10-18 17:22:58 -07:00
Urvang Joshi
368fbc955d Fix warnings reported by -Wshadow: Part2b: more from av1 directory
From code only part of nextgenv2 (and not aomedia)

Change-Id: I21f7478a59d525dff23747efe5238ded16b743d2
2016-10-18 17:22:44 -07:00
Urvang Joshi
454280dabf Fix warnings reported by -Wshadow: Part2: av1 directory
While we are at it:
- Rename some variables to more meaningful names
- Reuse some common consts from a header instead of redefining them.

Cherry-picked from aomedia/master: 863b0499

Change-Id: Ida5de713156dc0126a27f90fdd36d29a398a3c88
2016-10-18 17:22:34 -07:00
Urvang Joshi
03f6fdcfca Fix warnings reported by -Wshadow: Part1b: scan_order struct and variable
- Change struct name to all caps SCAN_ORDER to be locally consistent.
- Rename struct pointers to 'scan_order' instead of hard to read short
  names 'so' and 'sc'.

Cherry-picked from aomedia/master: 30abc082

Change-Id: Ib9f0eefe28fa97d23d642b77d7dc8e5f8613177d
2016-10-18 17:22:23 -07:00
Urvang Joshi
fdb60962f4 Fix warnings reported by -Wshadow: Part1: aom_dsp directory
While we are at it:
- Rename some variables to more meaningful names
- Reuse some common consts from a header instead of redefining them.

Cherry-picked from aomedia/master: 09eea2193

Change-Id: I61030e773137ae107d3bd43556c0d5bb26f9dbf8
2016-10-18 17:22:12 -07:00
Urvang Joshi
b5ed35008d Move STAT_TYPE enum to source file.
In the header, all we need is number of stat types, not the names for actual
types.

Removing it avoids names like 'Y', 'U', 'V' and 'ALL' being visible
in all files that include the encoder.h header.

Change-Id: I874a73a3cfe6bcb29aedea102077a52addc49af6
2016-10-18 17:22:00 -07:00
Urvang Joshi
526484482a Code cleanup: mainly rd_pick_partition and methods called from there.
- Const correctness
- Refactoring
- Make variables local when possible etc
- Remove -Wcast-qual to allow explicitly casting away const.

Cherry-picked from aomedia/master: c27fcccc
And then a number of more const correctness changes to make sure other
experiments build OK.

Change-Id: I77c18d99d21218fbdc9b186d7ed3792dc401a0a0
2016-10-18 17:21:27 -07:00
Nathan E. Egge
f627e58e0f Update segment tree_cdf per frame.
Move computing the segmentation_probs.tree_cdf table per symbol to
 computing it only when the probabilities are updated.

Change-Id: I3826418094bbaca4ded87de5ff04d4b27c85e35a
2016-10-18 16:58:48 -07:00
Michael Bebenita
d7baf45ff6 Adds ability to measure with a higher precision the number of bits
read per symbol.

Change-Id: I218abaa5172b769b66dba45050381c0212602668
2016-10-18 16:57:56 -07:00
Sarah Parker
081783dc67 Add clamping to parameter search
This fixes mismatches due to overflowing low precision parameters.

Change-Id: If34e39ca7ab0adc9688d46b0e8ed62cbb6fdaff0
2016-10-18 16:43:54 -07:00
Sarah Parker
ae51dd820d Adjust gm costing so GLOBAL_ZERO is treated as regular zeromv
Change-Id: I1b41146ae844c985566f5f9fdaeb5d4a4a5927b6
2016-10-18 16:18:23 -07:00
Sarah Parker
efa6582235 Fix ransac random generator seeding
Ransac's get_rand_indices originally used rand_r seeded with the
same value every time, producing the same random sequence at every
iteration. This causes the global motion parameters to be slightly
less accurate because ransac cannot improve the model fit after
the first attempt.

Change-Id: Idca2f88468ea21d19ba41ab66e5a2744ee33aade
2016-10-18 16:14:46 -07:00
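A sketch of the seeding problem that commit efa6582235 describes. To stay portable it uses a small linear congruential generator as a stand-in for rand_r, and all names are hypothetical. Resetting the seed inside the function returns the same value on every call, while letting the caller own the seed state advances the sequence.

```c
#include <stdint.h>

/* Stand-in for rand_r: a small LCG that advances the caller's state. */
static uint32_t lcg_next(uint32_t *state) {
  *state = *state * 1103515245u + 12345u;
  return (*state >> 16) & 0x7fff;
}

/* Buggy shape: the seed is reset on every call, so every call returns
 * the same index. */
static uint32_t pick_index_reseeded(uint32_t n) {
  uint32_t seed = 1;
  return lcg_next(&seed) % n;
}

/* Fixed shape: the seed persists across calls, so successive calls walk
 * through the generator's sequence. */
static uint32_t pick_index_persistent(uint32_t *seed, uint32_t n) {
  return lcg_next(seed) % n;
}
```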
Angie Chiang
792519bdef Add av1_fdct64_new and av1_idct64_new
Change-Id: If497816d7f6ee094d40872a2f988c91e90b78d7b
2016-10-18 16:07:56 -07:00
Guillaume Martres
470efbcf01 Remove rd_variance_adjustment
This function is called after `super_block_yrd` and assumes that the dst
buffer is correct but that is no longer always the case after
daf841b4a10ece1b6831300d79f271d00f9d027b since we don't call
`txfm_rd_in_plane` after the RDO loop in `choose_tx_size_from_rd`.
We could fix this by always saving and restoring the dst buffer but
removing `rd_variance_adjustment` is a better solution:
- Getting the dst buffer always right is tricky as demonstrated by the
  fact that it is wrong now, even if we fix it now we could break it later
  and not notice
- Perceptual weighting is a good idea but `rd_variance_adjustment` is the
  wrong approach as it weights both the rate and the distortion:
  to get meaningful units you should only weight the distortion,
  weighting rate means that we pretend some bits cost less than other
  bits, this is not the case. The distortion weighting approach is
  implemented by Daala in `od_compute_dist` and we plan to experiment
  with this in AV1 too.
- Removing `rd_variance_adjustment` improves coding efficiency on all
  metrics, here are the results for objective-1-fast using the Low
  Latency settings:

      PSNR Y:     -0.14%
     PSNRHVS:     -0.17%
        SSIM:     -0.12%
      MSSSIM:     -0.12%
   CIEDE2000:     -0.07%

Change-Id: I74b26b568ee65f56521646b8f30dd53bcd29fce3
2016-10-18 14:40:15 -07:00
Jingning Han
32658e2ab8 Add cb4x4 experimental flag
Experiment on coding block at resolution of 4x4 block.

Change-Id: I6aa201038f00c590747d800edb0a3e76ab1a51e8
2016-10-18 14:30:51 -07:00
Zoe Liu
a6a6dd509d A small bug fix in ext-refs on the RD mode selection
Change-Id: I25f14fec8e806cdf98d904488aaf200169def34d
2016-10-18 13:03:12 -07:00
Yushin Cho
40f1d487ad Remove unused PICK_MODE_CONTEXT::is_coded.
Change-Id: Ibc73b4066dcdee45d32355144124762d26a16a28
2016-10-18 12:54:12 -07:00
Urvang Joshi
8a02d76a93 Remove unused array 'last_frame_seg_map_copy'.
This array was allocated and used to save and restore segmentation map,
however the original segmentation map was never modified between the
calls to save and restore.

Change-Id: Iaf0fbfed733c097e84cf44d2aa6b8f35d2fb456b
2016-10-18 12:54:12 -07:00
Jingning Han
d98a45a6cc Add sub8x8_mc experimental flag
Change-Id: Ifcc329df240c0771172180933a6180b21fd31abe
2016-10-18 12:54:12 -07:00
Yaowu Xu
c2461b5e87 Merge "Remove macroblock::skip_optimize." into nextgenv2 2016-10-18 19:52:50 +00:00
Yaowu Xu
be0d933671 Merge "Skip 4x4 transform if maximum possible transform is 32x32" into nextgenv2 2016-10-18 19:52:42 +00:00
Yaowu Xu
cb61012305 Merge "Take out some early termination speed features" into nextgenv2 2016-10-18 19:48:46 +00:00
Angie Chiang
8c9893be05 Merge "Add experimental tag for 64x64 transform" into nextgenv2 2016-10-18 19:14:56 +00:00
Yushin Cho
e2b403b979 Remove macroblock::skip_optimize.
This is not used since the commit 00cd5de536fd5545d8fb663b2db81c014e3e6a41,
"Remove skip_recode speed feature".

Change-Id: Ic03da6c0095f6285a3889d5d22e8aaa2e6cbfd79
2016-10-18 11:26:11 -07:00
Hui Su
eafb2e62ac Skip 4x4 transform if maximum possible transform is 32x32
On average no compression performance changes. Encoding speed is
increased by 10~20% on some test clips in the derf set.

Change-Id: I9856caaa260303f6f6259686671bed7d51012277
2016-10-18 11:26:11 -07:00
Jingning Han
3f16725ff2 Take out some early termination speed features
Drop some speed features used in speed 2 and above, during the
algorithm development process. This helps simplify the codebase.

Change-Id: I3b2f5560d90b00d2d8fd57c2cb36f6ddd3f228e4
2016-10-18 11:26:11 -07:00
Yaowu Xu
8f7b1d3db9 Merge "Move a statement to match order in aom/master" into nextgenv2 2016-10-18 17:58:33 +00:00
Yaowu Xu
31e76edbfe Merge "Remove stale OD_ACCOUNTING code." into nextgenv2 2016-10-18 17:58:13 +00:00
Debargha Mukherjee
fe3814846b Add experimental tag for 64x64 transform
Change-Id: I65c04006f6e6eb13ceb22efc1c39915cb3c82b82
2016-10-18 10:24:31 -07:00
Yaowu Xu
ee775b13e2 Move a statement to match order in aom/master
Change-Id: Ic11eae36c9c62a20699197847aa3ef9562d4ad7e
2016-10-18 10:00:21 -07:00
Yaowu Xu
85c5566559 Merge "Port aom_reader_tell() support" into nextgenv2 2016-10-18 16:48:57 +00:00
Michael Bebenita
63b44c4c50 Remove stale OD_ACCOUNTING code.
Change-Id: Ie90dd06c387119ccd9c920a328c942477df00bb7
2016-10-18 09:12:06 -07:00
Debargha Mukherjee
d8ff1986d4 Merge "Fix for var_tx entropy context with rect_tx" into nextgenv2 2016-10-18 16:03:37 +00:00
Debargha Mukherjee
3f8b5b903f Merge "Correction to costing rect_tx" into nextgenv2 2016-10-18 16:03:18 +00:00
Michael Bebenita
868fc0b04a Port aom_reader_tell() support
This commit ports the following from aom/master:
4c46278 Add aom_reader_tell() support.
b9c9935 Remove an erroneous declaration.
56c9c3b Fix ANS build.

Change-Id: I59bd910f58c218c649a1de2a7b5fae0397e13cb1
2016-10-18 08:50:05 -07:00
Peter de Rivaz
46fcb05fde Fix for var_tx entropy context with rect_tx
This computation should match the code in encode_block
to increase the accuracy of the rd optimization.

Change-Id: Ibc9d9ab6d88d0c0f3af62e9cc233216aba48a57e
2016-10-18 15:38:01 +01:00
Peter de Rivaz
b85a5a7eac Correction to costing rect_tx
When built with var_tx and ext_tx, select_tx_size_fix_type is used
to compute the cost for using a particular tx_type.
The code indexes the array inter_tx_type_costs at the wrong location
resulting in a zero cost for signalling tx_type for rect_tx blocks.

Change-Id: Iba38be3a0d822109f778f0600b242dfb40359766
2016-10-18 11:55:36 +01:00
Nathan E. Egge
9ac1f7d770 Create aom_cdf_prob type for 16-bit probabilities.
Change-Id: I33899eca44300037816c9f20c965aa8311a1ef52
2016-10-17 20:22:48 -07:00
Nathan E. Egge
45741e9351 Rename daala_read_tree_cdf() to daala_read_symbol().
Change-Id: I35f85bad88c637cea62577c546cdd5ced0e21bd6
2016-10-17 20:22:19 -07:00
Hui Su
abf6fb9967 Merge "Add filter_intra experiment flag" into nextgenv2 2016-10-18 00:54:20 +00:00
Yaowu Xu
40bcdbcf3a Merge "Fix warning when discarding const qualifier." into nextgenv2 2016-10-18 00:50:09 +00:00
Yaowu Xu
65147563a5 Merge "Revert code formatting of OD_UNIFORM_CDFS_Q15." into nextgenv2 2016-10-18 00:49:57 +00:00
Yaowu Xu
9ce9e6d533 Merge "Rename aom_write_tree_cdf() to aom_write_symbol()." into nextgenv2 2016-10-18 00:49:41 +00:00
Yaowu Xu
007fd85007 Merge "Bug fix in super_block_uvrd()." into nextgenv2 2016-10-18 00:49:28 +00:00
Yaowu Xu
b44f53ba26 Merge "Display --bit-depth in -h with highbitdepth enabled." into nextgenv2 2016-10-18 00:49:18 +00:00
Yaowu Xu
79644f615e Merge "Update partition_cdf per frame." into nextgenv2 2016-10-18 00:49:05 +00:00
Yaowu Xu
153df29bbf Merge "Update inter_ext_tx_cdf per frame." into nextgenv2 2016-10-18 00:48:53 +00:00
Yaowu Xu
a6fa5436ff Merge "Update intra_ext_tx_cdf per frame." into nextgenv2 2016-10-18 00:48:41 +00:00
Yaowu Xu
f507ba79fe Merge "Update switchable_interp_cdf once per frame." into nextgenv2 2016-10-18 00:48:26 +00:00
Yue Chen
3fcf53e381 Merge "Refactor motion estimation in MOTION_VAR experiment" into nextgenv2 2016-10-18 00:32:00 +00:00
hui su
ffcf4fb788 Add filter_intra experiment flag
Will break ext-intra into 2 experiments: ext-intra and filter-intra.

Change-Id: Ibf66e9b9d9307fd58a703eada9569b74d171434b
2016-10-17 16:17:16 -07:00
Yue Chen
e9638ccfff Refactor motion estimation in MOTION_VAR experiment
To get ready for pulling AV1 to nextgenv2. Refactoring is done to
make the code structures similar, especially for the motion search
part.

Change-Id: I5d7636394408d97de55394d668540f5627827983
2016-10-17 12:48:10 -07:00
Nathan E. Egge
19698a7084 Fix warning when discarding const qualifier.
Cherry-pick Daala 211c2a41: Clean up EC tell() and tell_frac() functions.
Add a const qualifier to the od_ec_enc and od_ec_dec parameters of
 the od_ec_enc_tell(), od_ec_enc_tell_frac(), od_ec_dec_tell(), and
 od_ec_dec_tell_frac() functions.
Add an OD_WARN_UNUSED_RESULT to od_ec_enc_tell_frac().

Change-Id: Ia50e2fd75e98d8a03d993449d658b695cf56e6fb
2016-10-17 12:16:27 -07:00
Nathan E. Egge
f3035f2bc7 Revert code formatting of OD_UNIFORM_CDFS_Q15.
The formatting of OD_UNIFORM_CDFS_Q15[] in entcode.c is helpful for
 understanding what is contained in the array (e.g., the uniform
 probability distributions of small sizes 2 through 16).
This patch reverts the change made in f4b2926d and adds linter hints to
 ignore the formatting.

Change-Id: I2ad9fe6673b86e6067cb97b40f0f0e69a119cdf5
2016-10-17 12:16:26 -07:00
Nathan E. Egge
56eeaa5daf Rename aom_write_tree_cdf() to aom_write_symbol().
Change-Id: I7c088c55f1c461063976d5bd84ff2026c4f3bc69
2016-10-17 11:54:51 -07:00
Yushin Cho
09de28b4f7 Bug fix in super_block_uvrd().
In super_block_uvrd(), if is_cost_valid == 0, all return parameters,
i.e. rate, distortion, skippable, and sse, are reset.
So, txfm_rd_in_plane() should not be called if is_cost_valid == 0.
Also, the bug causes av1_xform_quant() to see an invalid diff signal
since av1_subtract_plane() is not called in super_block_uvrd().

Change-Id: Iaa06061e2e9aa8876b4611a54f4ae6b8d499332b
2016-10-17 11:25:13 -07:00
Nathan E. Egge
d1b239c0c3 Display --bit-depth in -h with highbitdepth enabled.
Display the -b --bit-depth command line parameter in the -h output of
 aomenc when --config-aom-highbitdepth is enabled.

Change-Id: I76147e38b9985e68b1e642e21be8fd4d8ec4d966
2016-10-17 10:45:24 -07:00
Nathan E. Egge
fba2be692f Update partition_cdf per frame.
Move computing the partition_cdf tables per symbol to
 computing them only when the probabilities are updated.

Change-Id: I442f9230ba00be7f5d0558d7c38d7324ad009ee8
2016-10-17 10:21:06 -07:00
Nathan E. Egge
93878c4243 Update inter_ext_tx_cdf per frame.
Move computing the inter_ext_tx_cdf tables per symbol to
 computing them only when the probabilities are updated.

Change-Id: I5e1e62f8eae8f6b2edbbd378beeb786649502c10
2016-10-17 10:20:53 -07:00
Nathan E. Egge
7c5b4c1665 Update intra_ext_tx_cdf per frame.
Move computing the intra_ext_tx_cdf tables per symbol to
 computing them only when the probabilities are updated.

Change-Id: I26d5e419e103093e98a7d896c196176305b50fc9
2016-10-17 08:47:02 -07:00
Nathan E. Egge
4947c296f7 Update switchable_interp_cdf once per frame.
Move from computing the switchable_interp_cdf per symbol to
 computing once per frame when the probabilities are adapted.

Change-Id: I6571126239f0327e22bb09ee8bad94114291683e
2016-10-17 08:44:57 -07:00
Yaowu Xu
5cb0a7abc9 Replace {} with continue
Change-Id: I2e939e898cc30c2999b47f2789191e08272b1cc0
2016-10-17 08:12:18 -07:00
Yaowu Xu
2bdb9e6344 Merge changes Ie43c599f,Icd0dbed4,Ic04e180b into nextgenv2
* changes:
  Move av1_indices_from_tree() to common code space.
  Add code to compute in-order mappings for tokens.
  Fix bug in av1_tree_to_cdf_2D() macro.
2016-10-14 23:46:48 +00:00
Yaowu Xu
73d702db7f Merge changes I339d0389,I2fa1e87a,If79fa5ae,Icb1a8cb8,Ic76de4a4, ... into nextgenv2
* changes:
  Add missing CONFIG_DAALA_EC declaration.
  Add API for writing trees using a CDF.
  Add macro to build a simple cdf table.
  Use Daala entropy coder to code trees.
  Silence clang-format code review warning.
  Use Daala entropy coder to code bits.
  Clear existing format issue in the codebase
  Add Daala entropy coder.
2016-10-14 23:42:22 +00:00
Yi Luo
1dec26e004 Merge "Zero high 128b YMM registers to avoid SSE-AVX transition penalties" into nextgenv2 2016-10-14 23:13:10 +00:00
Urvang Joshi
03ae55214c Merge "Bugfix: fix the build for CONFIG_FP_MB_STATS" into nextgenv2 2016-10-14 22:11:28 +00:00
Nathan E. Egge
8abf8673e6 Move av1_indices_from_tree() to common code space.
Move the av1_indices_from_tree() function from av1/encoder/treewriter.c
 to aom_dsp/prob.c so that it can be used by both the encoder and
 the decoder.

Change-Id: Ie43c599f425c3503b1ff93f0c77b5033a05b1bb4
2016-10-14 14:59:27 -07:00
Nathan E. Egge
a67c0ff4d7 Add missing CONFIG_DAALA_EC declaration.
Without first including ./aom_config.h in aom_dsp/prob.c, the memmove
 function is implicitly declared, which causes a compiler warning.

Change-Id: I339d0389f10324a1085aba7d6492b2159a14da92
2016-10-14 14:59:27 -07:00
Nathan E. Egge
cfb02ddcad Add code to compute in-order mappings for tokens.
Add av1_indices_from_tree() function that computes a forward and inverse
 mapping of the tree leaf-node symbols to their in-order traversal.
This is necessary because many of the aom_tree binary trees have their
 leaf nodes out of order (i.e., an in-order traversal of a tree with n
 nodes does not start at symbol 0 and go to symbol n - 1), but the CDFs
 created by tree_to_cdf() are indexed in-order.

Change-Id: Icd0dbed4c171a67c9e84a634106c4fdb5b1b3488
2016-10-14 14:59:27 -07:00
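
The in-order mapping described in "Add code to compute in-order mappings for tokens" can be sketched in C as follows. This is a minimal illustration, not the actual av1_indices_from_tree() code. It assumes the libvpx-style tree layout, in which entries come in pairs, a positive entry is the index of the next pair, and a non-positive entry is a leaf holding the negated symbol. The names tree_index and indices_from_tree are placeholders.

```c
#include <stdint.h>

typedef int8_t tree_index; /* placeholder standing in for aom_tree_index */

/*
 * Visits the leaves below the node pair at `root` from left to right.
 * ind[k] receives the symbol found at in-order position k (forward map),
 * inv[s] receives the in-order position of symbol s (inverse map).
 * `next` is the first free in-order position; the updated value is returned.
 */
static int indices_from_tree(int *ind, int *inv, const tree_index *tree,
                             int root, int next) {
  for (int branch = 0; branch < 2; ++branch) {
    const tree_index j = tree[root + branch];
    if (j <= 0) {
      const int symbol = -j; /* leaf: entry is the negated symbol */
      ind[next] = symbol;
      inv[symbol] = next;
      ++next;
    } else {
      next = indices_from_tree(ind, inv, tree, j, next); /* inner node */
    }
  }
  return next;
}
```

For the three-symbol tree { -2, 2, -0, -1 }, the leaves are visited in the order 2, 0, 1, so the forward map is not the identity and a CDF indexed in-order has to be addressed through inv[].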
Nathan E. Egge
44460148b2 Add API for writing trees using a CDF.
Added aom_write_tree_cdf() and aom_read_tree_cdf() function calls to
 bitwriter.h and bitreader.h respectively.
These calls take a multisymbol CDF and an index and directly encode the
 symbol using the enabled entropy coder.
Currently only the daala entropy encoder supports this (enabled with
 --enable-daala_ec) and a compile error is thrown otherwise.

Change-Id: I2fa1e87af4352c94384e0cfdbfd170ac99cf3705
2016-10-14 14:59:27 -07:00
Nathan E. Egge
439c50251f Fix bug in av1_tree_to_cdf_2D() macro.
Change-Id: Ic04e180b09745fab2230d05985770c41deea4fad
2016-10-14 14:59:27 -07:00
Nathan E. Egge
e2ed411836 Add macro to build a simple cdf table.
Add the av1_tree_to_cdf() macro which takes an aom_tree_index tree and
 associated aom_prob probabilities and constructs a daala uint16_t cdf.
The av1_tree_to_cdf_1D() and av1_tree_to_cdf_2D() macros apply av1_tree_to_cdf()
 across 1D and 2D arrays respectively.

Change-Id: If79fa5ae034263f279d7d0842493570885272fb2
2016-10-14 14:59:27 -07:00
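
To make the conversion in "Add macro to build a simple cdf table" concrete, here is a hedged C sketch of turning a binary tree with 8-bit branch probabilities into a 15-bit cumulative table. It is not the av1_tree_to_cdf() macro itself. The assumptions are: the libvpx-style tree layout (positive entry = index of next node pair, non-positive entry = negated leaf symbol), probs[i >> 1] / 256 as the probability of taking the first branch of the pair at index i, and a CDF without a leading zero whose last entry is 32768. All names are placeholders.

```c
#include <stdint.h>

typedef int8_t tree_index; /* placeholder standing in for aom_tree_index */
typedef uint8_t prob8;     /* placeholder standing in for aom_prob */

/*
 * Splits `mass` (probability in Q15) between the two branches of the node
 * pair at `root` and appends one cumulative value per leaf, left to right.
 */
static void walk(const tree_index *tree, const prob8 *probs, int root,
                 uint32_t mass, uint32_t *acc, uint16_t *cdf, int *n) {
  const uint32_t left = (mass * probs[root >> 1]) >> 8;
  const uint32_t share[2] = { left, mass - left };
  for (int branch = 0; branch < 2; ++branch) {
    const tree_index j = tree[root + branch];
    if (j <= 0) {
      *acc += share[branch]; /* leaf: extend the running total */
      cdf[(*n)++] = (uint16_t)*acc;
    } else {
      walk(tree, probs, j, share[branch], acc, cdf, n);
    }
  }
}

/* Fills cdf[] in leaf (in-order) order and returns the number of symbols. */
static int tree_to_cdf(const tree_index *tree, const prob8 *probs,
                       uint16_t *cdf) {
  uint32_t acc = 0;
  int n = 0;
  walk(tree, probs, 0, 32768u, &acc, cdf, &n);
  return n;
}
```

The entries are produced in in-order leaf order, not symbol order. The sketch also does not guard against a leaf ending up with zero mass, which a real entropy coder must avoid.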
Nathan E. Egge
43acafdee2 Use Daala entropy coder to code trees.
When building with --enable-daala_ec, calls to aom_write_tree() and
 aom_read_tree() will convert an aom_tree_index structure with associated
 aom_prob probabilities into a CDF on the fly for use with
 od_ec_encode_cdf_q15().
The number of symbols in the CDF is capped at 16, and trees that contain
 more than 16 leaf nodes are handled by splitting the most likely, i.e.,
 highest probability, symbols first and coding multiple symbols if
 necessary.

ntt-short-1:

         MEDIUM (%) HIGH (%)
    PSNR 0.000227   0.000213
 PSNRHVS 0.000215   0.000205
    SSIM 0.000229   0.000209
FASTSSIM 0.000229   0.000214

subset1:

          RATE (%)  DSNR (dB)
    PSNR -0.00026   0.00002
 PSNRHVS -0.00026   0.00002
    SSIM -0.00026   0.00001
FASTSSIM -0.00026   0.00001

Change-Id: Icb1a8cb854fd81fdd88fbe4bc6761c7eb4757dfe
2016-10-14 14:59:27 -07:00
Nathan E. Egge
0435f0eae6 Silence clang-format code review warning.
Change-Id: Ic76de4a4c0c39924bf04c3c2fa9214d33bcee9fb
2016-10-14 14:59:27 -07:00
Nathan E. Egge
8043cc4018 Use Daala entropy coder to code bits.
When building with --enable-daala_ec, calls to aom_write() and aom_read()
 use the daala entropy coder to write and read bits.
When the probability is exactly 0.5 (128), then raw bits are used.

ntt-short-1:

          MEDIUM (%) HIGH (%)
    PSNR -0.027556  -0.020114
 PSNRHVS -0.027401  -0.020169
    SSIM -0.027587  -0.020151
FASTSSIM -0.027592  -0.020102

subset1:

         RATE (%)  DSNR (dB)
    PSNR 0.03296  -0.00210
 PSNRHVS 0.03537  -0.00281
    SSIM 0.03299  -0.00161
FASTSSIM 0.03458  -0.00111

Change-Id: I48ad8eb40fc895d62d6e241ea8abc02820d573f7
2016-10-14 14:59:27 -07:00
Yaowu Xu
931bc2a714 Clear existing format issue in the codebase
Fix the clang-format warnings in the existing code.

Change-Id: I8e9e781b6f68f41a7fbd0a2116f6b35290d73dc8
2016-10-14 14:59:27 -07:00
Nathan E. Egge
1078dee569 Add Daala entropy coder.
Change-Id: I2849a50163268d58cc5d80aacfec1fd02299ca43
2016-10-14 14:59:27 -07:00
Alex Converse
b60dfc2542 Merge "Switch rANS to 15 bit precision, and adjust L_BASE." into nextgenv2 2016-10-14 21:56:34 +00:00
Alex Converse
62a94a649d Switch rANS to 15 bit precision, and adjust L_BASE.
This causes rANS to operate at the same precision as the Daala EC.

aom/master stats: rans10uabs8lbase12 → rans15uabs8lbase15

objective-1-fast
PSNR YCbCr:      0.01%      0.01%      0.01%
   PSNRHVS:      0.01%
      SSIM:      0.01%
    MSSSIM:      0.01%
 CIEDE2000:      0.01%

subset1
PSNR YCbCr:     -0.01%     -0.00%     -0.00%
   PSNRHVS:     -0.01%
      SSIM:     -0.01%
    MSSSIM:     -0.01%
 CIEDE2000:     -0.01%

(cherry picked from aom/master commit ddbc2e2a68bfc997dc61fca5bcaac3a75245e965)

Change-Id: I6ef0a4f6198784b3712a61af9f105d560a22eaea
2016-10-14 14:05:50 -07:00
Urvang Joshi
74114a3a1e Bugfix: fix the build for CONFIG_FP_MB_STATS
Cherry-picked from aomedia/master: bf6c636

Change-Id: Iea3fb46d23cb94d1152de3a7a40b6a183e78b4d7
2016-10-14 13:42:53 -07:00
Urvang Joshi
b100db7c1d Wrap palette code inside CONFIG_PALETTE flag.
This flag was already added to aomedia/master, so bringing it back to
webm/nextgenv2, as part of an effort to get the two codebases in sync.

Change-Id: I2b933a6a160e4210d1411a9e7978149eb8553205
2016-10-14 13:42:02 -07:00
Yi Luo
e9fde265f7 Zero high 128b YMM registers to avoid SSE-AVX transition penalties
Documents:
- https://software.intel.com/en-us/articles/intel-avx-state-transitions-migrating-sse-code-to-avx
- https://software.intel.com/sites/default/files/m/d/4/1/d/8/11MC12_Avoiding_2BAVX-SSE_2BTransition_2BPenalties_2Brh_2Bfinal.pdf

Change-Id: I90f85fcb15a7a2c49ee068300be6ffe9c68d371c
2016-10-14 12:22:35 -07:00
James Zern
fbabcad67c Merge changes I4850b36e,Ic4d7128a into nextgenv2
* changes:
  variance_avx2: sync variance functions with c-code
  Resolve -Wshorten-64-to-32 in variance.
2016-10-14 19:10:20 +00:00
Yaowu Xu
8d510e2e78 Use "av1" as codec name
Change-Id: I7650f1e96df0bcd53b1733c7967aae52dccf836a
2016-10-14 11:05:54 -07:00
Yaowu Xu
931bf3d6e1 Merge "Revert "Revert "Move CLPF block signals from frame to SB level.""" into nextgenv2 2016-10-14 17:58:20 +00:00
Yi Luo
b9fbf38bff Merge "Delete some redundant function declarations in aom_dsp_rtcd_defs.pl" into nextgenv2 2016-10-14 17:50:37 +00:00
Yaowu Xu
d71be7815d Revert "Revert "Move CLPF block signals from frame to SB level.""
This reverts commit 9b25f30674 to
reinstate the reverted commit with fixes that solved the build issues
when --enable-clpf is used in configure.

Change-Id: I15447cae7fa9b3deb27976345dc3db230a4a7a60
2016-10-14 08:58:49 -07:00
Yaowu Xu
4b71775307 Merge "Revert "Move CLPF block signals from frame to SB level."" into nextgenv2 2016-10-14 15:39:36 +00:00
Yaowu Xu
9b25f30674 Revert "Move CLPF block signals from frame to SB level."
This reverts commit 975350387c.

Change-Id: I9f8e891739352ca2bde4b294e37c85a668f416e0
2016-10-14 15:39:03 +00:00
James Zern
8c64331aa2 variance_avx2: sync variance functions with c-code
add missing int64 -> uint32 cast; quiets -Wshorten-64-to-32 warnings

Change-Id: I4850b36e18dc8b399108342be4bfe0b684aefb78
(cherry picked from commit 6acd061aad)
2016-10-13 20:15:18 -07:00
Alex Converse
2176b7acc2 Resolve -Wshorten-64-to-32 in variance.
The subtrahend is small enough to fit into uint32_t.

Change-Id: Ic4d7128aaa665eaf6b25d562610ba8942c46137f
(cherry picked from commit c0241664aa)
2016-10-13 20:12:20 -07:00
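
The reasoning behind "The subtrahend is small enough to fit into uint32_t" can be illustrated with a short C sketch. It is not the code from variance.c. The function name and parameters are hypothetical, and the block is assumed to hold 2^log2_count pixels. For real pixel data, (sum * sum) / N never exceeds the sum of squares (Cauchy-Schwarz), so if sse fits in 32 bits the subtrahend does too, and the explicit cast only makes the narrowing visible to the compiler.

```c
#include <stdint.h>

/*
 * Variance-style quantity from the sum of squared differences (sse) and
 * the sum of differences (sum) over a block of 2^log2_count pixels.
 * The product is formed in 64 bits; the explicit cast back to uint32_t
 * documents that the shifted value is known to fit, which is what
 * quiets -Wshorten-64-to-32.
 */
static uint32_t variance_from_sums(uint32_t sse, int sum, int log2_count) {
  const int64_t sum_sq = (int64_t)sum * sum;
  return sse - (uint32_t)(sum_sq >> log2_count);
}
```

For example, with sse = 1000, sum = 40 and 16 pixels, the subtrahend is 1600 / 16 = 100 and the result is 900.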
Debargha Mukherjee
078856a4df Merge "Simplify 8x16 and 16x8 inverse transform tests" into nextgenv2 2016-10-14 02:53:38 +00:00
Debargha Mukherjee
089315fc5e Merge "Enable test system to detect transforms misusing 'stride' parameter" into nextgenv2 2016-10-14 02:50:47 +00:00
Debargha Mukherjee
a720f4b3b5 Merge "Add sse2 forward and inverse 16x32 and 32x16 transforms" into nextgenv2 2016-10-14 02:49:20 +00:00
Yue Chen
a48764d05f Merge "Renamings for OBMC experiment" into nextgenv2 2016-10-14 01:33:00 +00:00
Yi Luo
761ae880d7 Delete some redundant function declarations in aom_dsp_rtcd_defs.pl
Change-Id: I4df57a7faba5800c048b2dc469ec31545406f55c
2016-10-13 17:53:45 -07:00
Steinar Midtskogen
975350387c Move CLPF block signals from frame to SB level.
These signals were in the uncompressed frame header (as a temporary
hack), which caused two problems:

* We don't want that header to be duplicated in the slice header
* It was necessary to signal the number of bits to transmit up front

However, the filter size can be 128x128 which is greater than the SB
size, and a decoder wouldn't be able to know whether to read a bit or
not until the final SB of that 128x128 block has been decoded
(depending on whether the 128x128 is all skip or not).  Therefore the
signalling was changed for 128x128 blocks so that every top left SB of
a 128x128 filter block contains a signal regardless of whether the
block is all skip or not.  Also, all the MBs of a 128x128 block are
filtered even if they are skip MBs.  This gives the signal a purpose
even when the 128x128 block is all skip, and it also gives a slight
coding gain as it leaves a way to filter skip blocks, which was
previously forbidden.

Low latency:
PSNR YCbCr:     -0.19%     -0.14%     -0.06%
   PSNRHVS:     -0.15%
      SSIM:     -0.13%
    MSSSIM:     -0.15%
 CIEDE2000:     -0.19%

High latency:
PSNR YCbCr:     -0.03%     -0.01%     -0.09%
   PSNRHVS:      0.04%
      SSIM:      0.00%
    MSSSIM:      0.02%
 CIEDE2000:     -0.02%

Change-Id: I69ba7144d07d388b4f0968f6a53558f480979171
2016-10-13 16:06:10 -07:00
Yue Chen
cb60b185c7 Renamings for OBMC experiment
To get ready for pulling AV1 to nextgenv2
Replace the experimental flag by MOTION_VAR. Rename major variables.

Change-Id: If6cf4f37b9319c46d8f90df551cc7295d66ca205
2016-10-13 15:51:22 -07:00
Steinar Midtskogen
2d5f752ae9 Don't use _mm_cvtsi128_si64 on 32 bit systems
Change-Id: I332afb8d9e35cd60f05915160a5b2e1dc8757de5
2016-10-13 14:35:00 -07:00
Yaowu Xu
410fee8de6 Fix formatting in a few files
Change-Id: Ia5175afe82b142d9e18c01c546610202c630588e
2016-10-13 13:04:29 -07:00
Jean-Marc Valin
a8ce2c9199 Removing some useless loops in deringing filter
No change in the output

Change-Id: I1627feaa163d65da0df90e9dacbc5e39ee755de8
2016-10-13 18:27:25 +00:00
Jean-Marc Valin
209f830d97 Fix deringing level choice for 10-bit and 12-bit
Making sure we never exceed a base level of 63

Change-Id: I821254b8d970446bd40fdd6e4d7073c69760a86d
2016-10-13 18:27:17 +00:00
Jean-Marc Valin
3cfec90d33 Don't dering superblocks that have deringing disabled
Doesn't change the output, but avoids useless deringing with threshold=0

Change-Id: I69f3e54abad2d2493cfbc76c188ad7d190f0aeff
2016-10-13 18:27:03 +00:00
Yaowu Xu
98e9ce923b Merge "Add SSE4.1 code for deringing functions." into nextgenv2 2016-10-13 18:02:59 +00:00
Michael Bebenita
7227b65c4c Add SSE4.1 code for deringing functions.
Change-Id: I363f7fb610a5c86ea9f417e34b57c6373af877e5
2016-10-13 18:02:19 +00:00
Yaowu Xu
3feb89170b Merge "Simpler threshold calculation for the second filter" into nextgenv2 2016-10-13 18:01:45 +00:00
Yaowu Xu
5d2f01284f Merge "Make 4x4 deringing (chroma) use shorter filters" into nextgenv2 2016-10-13 18:01:23 +00:00
Yaowu Xu
fd44e24541 Merge "Removing Daala-specific deringing code" into nextgenv2 2016-10-13 18:01:11 +00:00
Zoe Liu
12cbaac759 Merge "Clean code a bit and fix a couple of small bugs in ext-refs" into nextgenv2 2016-10-13 16:47:03 +00:00
Yaowu Xu
9ffdf48c5a Merge "Use a quantizer-based threshold rather than full search for deringing" into nextgenv2 2016-10-13 16:35:08 +00:00
Yaowu Xu
8ac419f307 Merge changes Ic3a68557,Ib1dbe41a,I0da09270,Ibdbd720d into nextgenv2
* changes:
  Deringing cleanup: remove DERING_REFINEMENT (always on now)
  Don't run the deringing filter on skipped blocks within a superblock
  Don't dering skipped superblocks
  On x86 use _mm_set_epi32 when _mm_cvtsi64_si128 isn't available
2016-10-13 15:54:32 +00:00
Zoe Liu
f0e4669edb Clean code a bit and fix a couple of small bugs in ext-refs
Currently the patch does not have any impact on the RD performance. The
fix could however potentially help on the next step of work, especially
when the extra altref frames allow non-zero temporal filtering strength
and their corresponding OVERLAY frames, i.e. the INTNL_OVERLAY frames
are being added.

Change-Id: I2e07fb3d0aa547a0b5dd05bb4ba865cd46309076
2016-10-13 08:42:51 -07:00
Yaowu Xu
89d3f2fd10 Merge "Sync 2x2 intra predictors" into nextgenv2 2016-10-13 15:20:52 +00:00
David Barker
4f803efac1 Simplify 8x16 and 16x8 inverse transform tests
Change-Id: Ie86aedfb1f3e0d9c0cf58d7183861a0ed0e8ccc8
2016-10-13 16:02:59 +01:00
David Barker
7825022daa Enable test system to detect transforms misusing 'stride' parameter
This would have caught the bug introduced in patch set 1 of
https://chromium-review.googlesource.com/#/c/397378/

Change-Id: I9c6d5d9c4c98aed5ac48c4fb1c4ff4131b0df1d5
2016-10-13 15:50:44 +01:00
Alex Converse
cba3d1f1c3 AnsTest: Replace the dummy distribution
Use constrained token table row 65/256 instead.

Change-Id: I8b442d4c82af8fa9d36ac2de0d73179ed040478d
(cherry picked from commit 47eb9a2ca46821b468903514cd34eaaca2533d45)
2016-10-13 07:04:55 -07:00
Alex Converse
fc4980edb7 Merge changes Ic74d9d88,Ie93b474e,I544989ea,Ic273f7d9,Idfd2d2b3, ... into nextgenv2
* changes:
  Remove custom rans types
  Remove add_token_no_extra.
  Remove unused aom_rans_build_cdf_from_pdf
  Add the tool used to generate the constrained tokenset.
  Remove the starting zero from ANS CDFs.
  Import the aom_read/write_symbol abstractions from aom/master
2016-10-13 14:03:15 +00:00
David Barker
33231d4801 Add sse2 forward and inverse 16x32 and 32x16 transforms
Change-Id: I1241257430f1e08ead1ce0f31db8272b50783102
2016-10-13 14:01:22 +01:00
Debargha Mukherjee
cad8283e55 Merge "Fix a bug in inverse halfright 32x32 transform" into nextgenv2 2016-10-13 08:16:47 +00:00
Alex Converse
9ed1a2ff44 Remove custom rans types
(cherry picked from aom/master commit 11206c60d930be9d29100567aa67f2a65463852a)

Includes renames in a bunch of places not handled by the original
due to differing tree states.

Change-Id: Ic74d9d8850b8c80a51e55e425bbf472a67e2653f
2016-10-13 05:53:58 +00:00
Jingning Han
e3954d8312 Sync 2x2 intra predictors
Add 2x2 DC, V, H, TM intra predictors.

Change-Id: I2a614adde553f821c45bc5a9bf09800a9f0aaa26
2016-10-12 21:04:01 -07:00
Jean-Marc Valin
4713d8d019 Simpler threshold calculation for the second filter
PSNR YCbCr:      0.03%     -0.00%      0.07%
   PSNRHVS:      0.06%
      SSIM:      0.12%
    MSSSIM:      0.09%
 CIEDE2000:      0.05%

Change-Id: I15ef9598a08f6713bc28ab98b0182310433e97ef
2016-10-12 18:17:10 -07:00
Jean-Marc Valin
ea64c342b7 Make 4x4 deringing (chroma) use shorter filters
Avoids blurring chroma for 4:2:0

PSNR YCbCr:      0.03%     -0.31%     -0.29%
   PSNRHVS:      0.02%
      SSIM:      0.03%
    MSSSIM:      0.02%
 CIEDE2000:      0.01%

Change-Id: If744fb902b5f24404479def22b9ca8a19baec722
2016-10-12 18:16:54 -07:00
Jean-Marc Valin
2c616e61e0 Removing Daala-specific deringing code
No point in keeping them in sync now that all the code is reformatted

Change-Id: I8a062253ed6a5f86028cd5a2a922b3c760def6fb
2016-10-12 18:16:23 -07:00
Jean-Marc Valin
6d5a7a924b Use a quantizer-based threshold rather than full search for deringing
objective-1-short results (with deringing enabled):
PSNR YCbCr:      0.08%      0.03%      0.11%
   PSNRHVS:      0.06%
      SSIM:      0.12%
    MSSSIM:      0.08%
 CIEDE2000:      0.05%

Change-Id: Ifcfc42c14c33650dcf879c4d0ddd8688d4d07da1
2016-10-12 18:16:07 -07:00
Alex Converse
4ce69de9a6 Remove add_token_no_extra.
It was a fairly small production optimization for VP9.

Change-Id: Ie93b474ea5b7e63384a7c0b3a56b135462d1471b
(cherry picked from aom/master commit df9bb76b1330de42fe13827df4c72010adb51429)
2016-10-12 17:44:28 -07:00
Alex Converse
d5b9c730ad Remove unused aom_rans_build_cdf_from_pdf
Change-Id: I544989eae45b7dda04250365c3de99f50110a76b
(cherry picked from aom/master commit 06cce842caa5212826d51c2a317de0bdfae74349)
2016-10-12 17:44:14 -07:00
Alex Converse
dacf45facd Add the tool used to generate the constrained tokenset.
The code that generates the raw distribution is based on a MATLAB
program by Debargha Mukherjee, and the algorithm used to quantize the
distribution comes from the ANS Toolkit by Jarek Duda.

Change-Id: Ic273f7d9e43e3ecd999e9e7e04cde57e8559375a
(cherry picked from aom/master commit ef446026aeafa318f9bee182b8c80eb4f1ef5a0a)
2016-10-12 17:41:01 -07:00
Alex Converse
e9f70f8f10 Remove the starting zero from ANS CDFs.
This brings it in line with the Daala CDFs and will make it easier to
share code.

Change-Id: Idfd2d2b33c3b9b2c4e72ce72fb3d8039013448b9
(cherry picked from aom/master commit af98507ca928afe33e9f88fdd2ca168379528d6a)
2016-10-12 17:41:01 -07:00
Alex Converse
a1ac972867 Import the aom_read/write_symbol abstractions from aom/master
Change-Id: I0b255c05108c3b97e74df1b59c34111c9e9a5770
2016-10-12 17:41:01 -07:00
Jean-Marc Valin
e874ce0300 Deringing cleanup: remove DERING_REFINEMENT (always on now)
Change-Id: Ic3a6855799be010e69aeab924b013679282ab191
2016-10-12 17:13:09 -07:00
Jean-Marc Valin
8455cd9fc1 Don't run the deringing filter on skipped blocks within a superblock
No change in metrics

Change-Id: Ib1dbe41a9e1a564dd9a63a33e2a5315ad6bca70c
2016-10-12 17:12:45 -07:00
Jean-Marc Valin
56b0c3c51b Don't dering skipped superblocks
No change in metrics

Change-Id: I0da09270d78c3caf78a32a3157f02c87f2232e3e
2016-10-12 17:12:10 -07:00
Yi Luo
e01484e412 Merge "Hybrid forward transform 32x32 AVX2 optimization" into nextgenv2 2016-10-13 00:08:48 +00:00
Steinar Midtskogen
b074823863 On x86 use _mm_set_epi32 when _mm_cvtsi64_si128 isn't available
Change-Id: Ibdbd720d4f68892da6164a9849e212e759305005
2016-10-12 15:48:13 -07:00
Alex Converse
91e4e604bd Merge changes I3ca2b674,I78afc587,I3ae62181,I5ed91556 into nextgenv2
* changes:
  Unfork ANS decode_coefs
  Remove ZERO_TOKEN from the ANS tokenset
  Drop costing ANS tokens from derived probabilities
  Unfork ANS pack_mb_tokens
2016-10-12 22:25:27 +00:00
Debargha Mukherjee
e52816bf8f Fix a bug in inverse halfright 32x32 transform
Fix a bug in the C implementation of the ihalfright32
transform, in the case that its input and output buffers are the same.
This occurs when it is called by av1_iht32x16_512_add_c.

Change-Id: I61c652e2662178520c0639a2879ae128a9c7ec3f
2016-10-12 14:49:18 -07:00
Yi Luo
fed8e1c06d Hybrid forward transform 32x32 AVX2 optimization
- av1_fht32x32 AVX2 function level time reduction ~89% compared to C.

- av1_fht32x32_avx2() on DCT_DCT improves 42.62% over aom_fdct32x32_avx2()
  But function replacement must go with the corresponding inverse txfm.

- No obvious user level time reduction due to 32x32 TX_TYPE selection.

- Zero high 128b YMM to avoid AVX-SSE transition penalties
  (fix 16x16 case).

- Added 32x32 AVX2 unit tests to verify bit-exactness.

- AVX2 optimization summary:
  On CPU i7-6700, based on 16x16/32x32 fwd txfm optimization results:
  C to AVX2: function level time reduction, ~86-89%.
  SSE2 to AVX2: function level time reduction, ~51%.

Change-Id: Idd0cd8bf066a61c7117140ef15ab6c1f8eb4b036
2016-10-12 14:19:53 -07:00
Hui Su
933bf08cfb Merge "Send allow_screen_content flag for both key and intra only frames" into nextgenv2 2016-10-12 21:13:24 +00:00
Debargha Mukherjee
4282b6bbbb Merge "Refactor expand dry_run types to return coef rate" into nextgenv2 2016-10-12 21:06:41 +00:00
Alex Converse
5e4d00c37e Unfork ANS decode_coefs
This is less code and more like what we have in aom/master.

Change-Id: I3ca2b674e4ad9e2e211d08bb51d78549e8b63a54
2016-10-12 13:23:33 -07:00
Alex Converse
ea7e990fd4 Remove ZERO_TOKEN from the ANS tokenset
This can be re-added after aligning AOM's ANS with nextgenv2's ANS.

This partially reverts commit 3829cd2f2f.

Change-Id: I78afc587f1abfe33ffcd53b3262910cfae135534
2016-10-12 13:15:08 -07:00
Alex Converse
ccf472bc05 Drop costing ANS tokens from derived probabilities
This mimics what's currently done in aom/master. This can be re-added
after aligning AOM's ANS with nextgenv2's ANS.

Change-Id: I3ae62181dd4803694204a234c717a86a15ca8a40
2016-10-12 13:14:21 -07:00
Alex Converse
dc62b0925d Unfork ANS pack_mb_tokens
This is less code and more like what we have in aom/master.

Change-Id: I5ed915563cbfbc6281113c1eb31455f50710ba9f
2016-10-12 13:09:13 -07:00
Jim Bankoski
3265ef3d1d AUTHORS regenerated
script changed to remove extra entities and clang-format bot.

Change-Id: I102cd80fdf4b240e6e4d5172943e49146a601a72
2016-10-12 12:26:05 -07:00
Yaowu Xu
c4d8fea575 Merge "minor updates" into nextgenv2 2016-10-12 19:25:47 +00:00
hui su
24f7b07f2e Send allow_screen_content flag for both key and intra only frames
BUG=webm:1311

Change-Id: I03c1043d17ed4e4ea22002473779a9612884c6c6
2016-10-12 11:45:05 -07:00
Yaowu Xu
c49a6f2a21 Merge "Include fix: use aom_integer.h" into nextgenv2 2016-10-12 18:26:30 +00:00
Yaowu Xu
694419b6a6 Merge "Add compiler flag -Wsign-compare" into nextgenv2 2016-10-12 18:26:22 +00:00
Yaowu Xu
732c188523 Merge "LIBVPX_TEST_DATA_PATH -> LIBAOM_TEST_DATA_PATH" into nextgenv2 2016-10-12 17:56:26 +00:00
Yaowu Xu
f36d0b46d1 minor updates
1. vp8->aom
2. removed no-effect statements and spaces

Change-Id: I367d05ff9bf1b9f3c71c517c45d8049d9d4236ec
2016-10-12 10:50:08 -07:00
Sarah Parker
d2b1fe4a1f Merge "Fix inconsistency in gm parameter write to bitstream" into nextgenv2 2016-10-12 17:32:21 +00:00
Urvang Joshi
f792a72740 Include fix: use aom_integer.h
Change-Id: I98919a04bead417379e555461f67978501f922e7
2016-10-12 08:27:00 -07:00
Urvang Joshi
d3a7576fbc Add compiler flag -Wsign-compare
Also, fix the warnings generated by this flag.

Conflicts:
	examples/aom_cx_set_ref.c

Change-Id: I0451e119c52000aa7c1c55027d53f1da5a02a11f
2016-10-12 08:27:00 -07:00
Yaowu Xu
97aa09f658 LIBVPX_TEST_DATA_PATH -> LIBAOM_TEST_DATA_PATH
This commit renames LIBVPX_TEST_DATA_PATH to LIBAOM_TEST_DATA_PATH,
with a workaround for working with Jenkins environment variables.

Change-Id: If664ce57e25ad2af8121d1b578bf64043f0baa2a
2016-10-12 08:26:44 -07:00
Yaowu Xu
445ae93ec7 Merge "y4m_test: fix segfault if test files are missing" into nextgenv2 2016-10-12 04:26:52 +00:00
Yaowu Xu
6bb9b697be Merge "Remove two files not in use" into nextgenv2 2016-10-12 04:26:36 +00:00
Sarah Parker
689b0caea7 Fix inconsistency in gm parameter write to bitstream
Before this change, gm parameters were being written to the
bitstream for all frames, but only read for inter only frames,
causing a bitstream error.

Change-Id: I63b8e2fdf6358e07cc00718de04cc399809bde37
2016-10-11 19:35:26 -07:00
Tristan Matthews
46940a8e7d y4m_test: fix segfault if test files are missing
Change-Id: I7a04beb83095e5c0821048909f81f45be8b5eee3
2016-10-11 18:20:01 -07:00
Alex Converse
5cca4187fe Merge "Remove -fno-strict-aliasing flag" into nextgenv2 2016-10-11 23:24:39 +00:00
Yaowu Xu
5a9b51c725 Remove two files not in use
test/cx_set_ref.sh: replaced by test/aomcx_set_ref.sh
test/vpxdec.sh: replaced by aomdec.sh

Change-Id: I74136d311eee7666e08ed8f573a17f810992fc52
2016-10-11 15:12:11 -07:00
Yaowu Xu
4a01dca3c6 Merge "change to use aomedia copyright notice" into nextgenv2 2016-10-11 22:11:09 +00:00
Yaowu Xu
058ec6cd56 Merge "Fix missing parentheses in v64_align()" into nextgenv2 2016-10-11 22:10:08 +00:00
Yaowu Xu
f72f844572 Merge "Improve v128 and v64 8 bit shifts for x86" into nextgenv2 2016-10-11 22:09:53 +00:00
Yaowu Xu
c96168987d Merge "Clean up and speed up CLPF clipping" into nextgenv2 2016-10-11 22:09:31 +00:00
Yaowu Xu
afb60c361c Merge "Fix typos in CLPF unit test" into nextgenv2 2016-10-11 22:06:59 +00:00
Yaowu Xu
bd979a16c8 Merge "Make generic SIMD code compile if no native support" into nextgenv2 2016-10-11 22:06:43 +00:00
Debargha Mukherjee
ceebb70197 Refactor expand dry_run types to return coef rate
Adds the functionality to return the rate cost due to
coefficients without doing full search of all modes.
This will be subsequently used in various experiments,
including in new_quant experiment to search quantization
profiles at the superblock level without repeating the
full mode/partition search.

Change-Id: I4aad3f3f0c8b8dfdea38f8f4f094a98283f47f08
2016-10-11 14:55:26 -07:00
Yaowu Xu
53a9745c7a Merge "Bugfix in CLPF RDO. Prevented selection of enable_fb_flag=0." into nextgenv2 2016-10-11 21:54:13 +00:00
Yaowu Xu
1aa6cbc7ea Merge "Bugfix in the CLPF RDO." into nextgenv2 2016-10-11 21:53:56 +00:00
Sarah Parker
4082ff0bf6 Merge "Read mode to mi->bmi for sub 8x8 blocks" into nextgenv2 2016-10-11 21:48:01 +00:00
Yaowu Xu
6e0d64c5fe change to use aomedia copyright notice
Change-Id: Idb2cf2555bcbe04a6650c492a3a714d7d5836b67
2016-10-11 12:36:17 -07:00
Steinar Midtskogen
b066b962a7 Fix missing parentheses in v64_align()
Change-Id: I16469062853c101965f56002be30ebc5823975b1
2016-10-11 12:36:17 -07:00
Steinar Midtskogen
9d6a53b8fd Improve v128 and v64 8 bit shifts for x86
Change-Id: I25dc61bab46895d425ce49f89fceb164bee36906
2016-10-11 12:36:17 -07:00
Steinar Midtskogen
e66fc87c46 Clean up and speed up CLPF clipping
* Move clipping tests from inside to outside loops
* Pass the clipped block size as sizex and sizey to clpf_block() rather
  than passing bs for both
* Make fallback tests to C more accurate

Change-Id: Icdc57540ce21b41a95403fdcc37988a4ebf546c7
2016-10-11 12:36:17 -07:00
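
The first two points of "Clean up and speed up CLPF clipping" can be sketched in C as follows. This is an assumption-laden illustration, not the clpf_block() code. The per-pixel operation is a toy increment, and process_block and its parameters are hypothetical. The point shown is that the block size is clipped against the frame once, before the loops, so the inner loops carry no bounds tests.

```c
#include <stdint.h>

static int min_int(int a, int b) { return a < b ? a : b; }

/*
 * Applies a toy per-pixel operation (add 1) to the bs x bs block whose
 * top-left corner is (x0, y0) in a w x h frame. sizex and sizey are the
 * block dimensions clipped to the frame edge, computed outside the loops.
 */
static void process_block(uint8_t *img, int stride, int w, int h, int x0,
                          int y0, int bs) {
  const int sizex = min_int(bs, w - x0);
  const int sizey = min_int(bs, h - y0);
  for (int y = 0; y < sizey; ++y)
    for (int x = 0; x < sizex; ++x)
      img[(y0 + y) * stride + x0 + x] += 1;
}
```

On a 5x5 frame with bs = 4, the block at (4, 4) is clipped to a single pixel, while the block at (0, 0) keeps its full 4x4 size.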
Steinar Midtskogen
6116141c23 Fix typos in CLPF unit test
Change-Id: Ia69bad44e47509208e3b9d306165d0872d4e92f3
2016-10-11 12:36:16 -07:00
Steinar Midtskogen
ebf209ba82 Make generic SIMD code compile if no native support
Change-Id: I7f691a0ae27f06ef3d727764829a60a8ffc509eb
2016-10-11 12:36:16 -07:00
Steinar Midtskogen
86b19177ab Bugfix in CLPF RDO. Prevented selection of enable_fb_flag=0.
PSNR YCbCr:     -0.01%     -0.06%     -0.17%
   PSNRHVS:      0.01%
      SSIM:      0.03%
    MSSSIM:      0.00%
 CIEDE2000:     -0.05%

Change-Id: I1205c021bfc5cee6f80344fec92aabb529af9bd1
2016-10-11 12:35:48 -07:00
Steinar Midtskogen
2e40cc4ce6 Bugfix in the CLPF RDO.
When CLPF was extended to chroma, the chroma RDO accidentally
discarded the optimal block size found in the luma RDO.

PSNR YCbCr:     -0.25%      0.05%      0.06%
   PSNRHVS:     -0.19%
      SSIM:     -0.36%
    MSSSIM:     -0.23%

Conflicts:
	av1/common/clpf.c

Change-Id: Ie49cd30f9276a311ada88cb2f13d14757617f030
2016-10-11 12:35:10 -07:00
Yaowu Xu
25faa0e9f5 Merge "Move tree writing code into bitwriter.h." into nextgenv2 2016-10-11 19:16:25 +00:00
Yaowu Xu
de005d322a Merge "Remove unused color_sensitivity member from MACROBLOCK." into nextgenv2 2016-10-11 19:16:07 +00:00
Sarah Parker
d7fa8542f6 Read mode to mi->bmi for sub 8x8 blocks
Previously, only the motion vectors were being stored. This caused
a mismatch in the global motion experiment, which needs this
mode information to decide whether or not to use the gm parameters
in reconstruction.

Change-Id: I58cde750ec06587dbfb8d65b07c15a67b7d6b1f6
2016-10-11 11:51:59 -07:00
Yaowu Xu
57aa518c30 Merge "CLPF: Remove redundant function argument." into nextgenv2 2016-10-11 18:44:56 +00:00
Yaowu Xu
80eaf1a120 Merge "Extend CLPF to chroma." into nextgenv2 2016-10-11 18:44:31 +00:00
Yaowu Xu
39b25dfa38 Merge "Remove some dead code in CLPF." into nextgenv2 2016-10-11 18:43:27 +00:00
Yaowu Xu
aaf64c4387 Merge "Print correct info if CLPF unit tests fail." into nextgenv2 2016-10-11 18:42:52 +00:00
Yaowu Xu
443e522b5c Merge "Reduce memory footprint for CLPF encoding." into nextgenv2 2016-10-11 18:42:34 +00:00
Yaowu Xu
a1a7ad0c15 Merge "Make generic SIMD work with clang." into nextgenv2 2016-10-11 18:42:15 +00:00
Yaowu Xu
0bab35bf64 Merge "Fix clang-format warnings in aom_dsp/simd/v64_intrinsics_arm.h" into nextgenv2 2016-10-11 18:41:50 +00:00
Yaowu Xu
a71552421d Merge "Non-normative quality improvements to CLPF." into nextgenv2 2016-10-11 18:41:40 +00:00
Yaowu Xu
038d41045b Merge "Added high bit-depth support in CLPF." into nextgenv2 2016-10-11 18:41:15 +00:00
Yaowu Xu
6fc92c1ccc Merge "Fix a memleak in CLPF." into nextgenv2 2016-10-11 18:41:03 +00:00
Yaowu Xu
a2bbf621f1 Merge "Reduce memory footprint for CLPF decoding." into nextgenv2 2016-10-11 18:40:47 +00:00
Yaowu Xu
4da3ed40a3 Merge "Make CLPF handle frame widths and heights not divisible by 8." into nextgenv2 2016-10-11 18:40:05 +00:00
Yaowu Xu
b5e73bddb0 Merge "CLPF: Don't assume sb size=64 and w&h multiple of 8 + valgrind fix." into nextgenv2 2016-10-11 17:44:12 +00:00
Yaowu Xu
3b161e14b3 Merge "Silence some harmless compiler warnings in CLPF." into nextgenv2 2016-10-11 17:43:23 +00:00
Zoe Liu
d623c4122a Merge "Add a small code clean for show_existing_frame" into nextgenv2 2016-10-11 16:58:17 +00:00
Nathan E. Egge
eeedc633c0 Move tree writing code into bitwriter.h.
Rename av1_write_tree() to aom_write_tree() and move it into bitwriter.h
 to match aom_read_tree() in bitreader.h.

Manually cherry-picked from aom/master:
33a143fa7ac42d62080bfc20468cb76ad26045db

Change-Id: I6c686cdd3e0f179d7e95c5bc6984558b62d46d67
2016-10-11 09:36:01 -07:00
Thomas Daede
debaface95 Remove unused color_sensitivity member from MACROBLOCK.
Conflicts:
	av1/encoder/block.h
	av1/encoder/encodeframe.c

Change-Id: I941e7b9e76380f262b173928d3c5132c5613b3ce
2016-10-11 09:35:39 -07:00
Yaowu Xu
12fcf74c8a Merge "Use derived variable size for memcpy" into nextgenv2 2016-10-11 16:15:43 +00:00
Yaowu Xu
4960f7c3bd Merge "Added generic SIMD support for CLPF." into nextgenv2 2016-10-11 16:05:18 +00:00
Debargha Mukherjee
fb865cf41c Merge "Add sse2 forward / inverse 4x8 and 8x4 transforms" into nextgenv2 2016-10-11 15:50:32 +00:00
Yaowu Xu
c648a9fd83 Use derived variable size for memcpy
Manually cherry-picked from aom/master:
bf2ad75a1723d223c376b93295aa06dd23226937

Change-Id: I99f05e79ec8ad35a49bc124e6dd829ccc7d9cc36
2016-10-10 17:39:29 -07:00
Zoe Liu
5fca72498a Add a small code clean for show_existing_frame
Change-Id: I42dc9f0fdecd3cf3398ab82d6e01dde06bdf7b24
2016-10-10 17:18:57 -07:00
Steinar Midtskogen
ded69f5668 CLPF: Remove redundant function argument.
Change-Id: I31bea3b1f76493060edd7e1bd616a223841d5f77
2016-10-10 15:24:33 -07:00
Steinar Midtskogen
ecf9a0c821 Extend CLPF to chroma.
Objective quality impact (low latency):

PSNR YCbCr:      0.13%     -1.37%     -1.79%
   PSNRHVS:      0.03%
      SSIM:      0.24%
    MSSSIM:      0.10%
 CIEDE2000:     -0.83%

Change-Id: I8ddf0def569286775f0f9d4d4005932766a7fc27
2016-10-10 15:23:38 -07:00
Steinar Midtskogen
9021d09f9a Remove some dead code in CLPF.
av1_clpf_frame() was always called with the same src and dst,
so we only need one argument and the code supporting different
src and dst was removed.

Change-Id: I70919f50e5cfb19c22eb4dff9ee7c0fa2697fad3
2016-10-10 15:23:09 -07:00
Steinar Midtskogen
ee54e5f3c5 Print correct info if CLPF unit tests fail.
Change-Id: Ieac27194f342d8ef9ef98c96ebea9d0c444658cf
2016-10-10 15:21:06 -07:00
Steinar Midtskogen
a8af9126fb Reduce memory footprint for CLPF encoding.
Use in-place filtering, like in the decoder
(see eb5794da1659f87597291d84c2fbdfd89280065d).

Change-Id: If037ead45f5cb3461347a63e0e415954d5dcba8b
2016-10-10 15:20:42 -07:00
Steinar Midtskogen
7b7624e89e Make generic SIMD work with clang.
Change-Id: I2c504a078a7137bea6ba50c5768c1295878e9ea1
2016-10-10 15:18:57 -07:00
Jingning Han
0b44cdcab1 Fix clang-format warnings in aom_dsp/simd/v64_intrinsics_arm.h
Change-Id: I221bf4520d7030133e3b2fea883a995b3d6f6282
2016-10-10 15:18:49 -07:00
Steinar Midtskogen
499deb9def Non-normative quality improvements to CLPF.
BDR improvements:
     PSNR  PSNRHVS SSIM  MSSSIM CIEDE2000 PSNR Cb  PSNR Cr
LL: -0.17% -0.13% -0.11% -0.12%   -0.18%   -0.19%   -0.21%
HL: -0.21% -0.14% -0.15% -0.11%   -0.37%   -0.39%   -0.52%

Change-Id: I58c00a1cc0ddfc3376644f66345e99472482a613
2016-10-10 11:31:50 -07:00
Steinar Midtskogen
3dbd55a6c4 Added high bit-depth support in CLPF.
Change-Id: Ic5eadb323227a820ad876c32d4dc296e05db6ece
2016-10-10 11:27:04 -07:00
Steinar Midtskogen
9351b2f792 Fix a memleak in CLPF.
The memleak appeared in eb5794da1659f87597291d84c2fbdfd89280065d.

Change-Id: Ifdd6d64aafa0d0ce4dfaf1844f594d5f843bf2e0
2016-10-10 11:26:52 -07:00
Steinar Midtskogen
e8224c7ad5 Reduce memory footprint for CLPF decoding.
Instead of having CLPF write to an entire new frame and
copy the result back into the original frame, make the
filter able to work in-place by keeping a buffer of size
frame_width*filter_block_size and delay the write-back
by one filter_block_size row.

This reduces the cycles spent in the filter to ~75%.

Change-Id: I78ca74380c45492daa8935d08d766851edb5fbc1
2016-10-10 11:26:33 -07:00
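A minimal sketch of the delayed write-back idea described in the commit above. It assumes a hypothetical cross-shaped smoothing kernel (cross_avg, not the real CLPF kernel) and a "block" of one pixel row, so the scratch buffer is a single row of frame_width bytes. All names here are illustrative and are not taken from the libaom sources.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cross-shaped kernel (NOT the real CLPF kernel): weighted
 * average of a pixel and its four neighbours, clamped at the frame edges. */
static uint8_t cross_avg(const uint8_t *img, int w, int h, int x, int y) {
  const int c = img[y * w + x];
  const int a = img[(y > 0 ? y - 1 : y) * w + x];     /* above */
  const int b = img[(y < h - 1 ? y + 1 : y) * w + x]; /* below */
  const int l = img[y * w + (x > 0 ? x - 1 : x)];     /* left */
  const int r = img[y * w + (x < w - 1 ? x + 1 : x)]; /* right */
  return (uint8_t)((4 * c + a + b + l + r + 4) >> 3);
}

/* Reference: filter into a separate, full-size destination frame. */
static void filter_copy(const uint8_t *src, uint8_t *dst, int w, int h) {
  for (int y = 0; y < h; ++y)
    for (int x = 0; x < w; ++x) dst[y * w + x] = cross_avg(src, w, h, x, y);
}

/* In-place variant: results for row y are held in a one-row scratch buffer
 * and only written back while row y + 1 is being filtered, so every read
 * still sees unfiltered pixels. Returns 0 on success, -1 on allocation
 * failure. */
static int filter_in_place(uint8_t *img, int w, int h) {
  assert(w > 0 && h > 0);
  uint8_t *pending = (uint8_t *)malloc((size_t)w);
  if (pending == NULL) return -1;
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      /* Reads img[y-1][x], which is overwritten only after this call. */
      const uint8_t v = cross_avg(img, w, h, x, y);
      if (y > 0) img[(y - 1) * w + x] = pending[x]; /* delayed write-back */
      pending[x] = v;
    }
  }
  memcpy(&img[(h - 1) * w], pending, (size_t)w); /* flush the last row */
  free(pending);
  return 0;
}
```

Calling filter_in_place() on a frame should give the same pixels as filter_copy() into a second frame, while needing only one extra row of memory instead of a whole extra frame.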
Steinar Midtskogen
34dac00adc Make CLPF handle frame widths and heights not divisible by 8.
Change-Id: If5eb33b6b090f43ba64c82468576b89eddd872c3
2016-10-10 11:26:15 -07:00
Steinar Midtskogen
f4d41e6330 CLPF: Don't assume sb size=64 and w&h multiple of 8 + valgrind fix.
Change-Id: I518ad9c58973910eb0bdcb377f2d90138208c570
2016-10-10 11:21:23 -07:00
Steinar Midtskogen
2fd70ee124 Silence some harmless compiler warnings in CLPF.
Change-Id: I4a6d84007bc17b89cfd8d8f2440bf2968505bd6a
2016-10-10 11:20:43 -07:00
Steinar Midtskogen
be668e92c3 Added generic SIMD support for CLPF.
Change-Id: Ie03f9a5b0a4c708a586532198d755a1e7509f149
2016-10-10 11:19:37 -07:00
Yaowu Xu
607048d606 Merge "Added generic SIMD library supporting x86 SSE2+ and ARM NEON." into nextgenv2 2016-10-10 18:17:50 +00:00
Yaowu Xu
abe0484cee Merge "New CLPF: New kernel and RDO for strength and block size" into nextgenv2 2016-10-10 18:17:41 +00:00
David Barker
4d03d6fc6f Add sse2 forward / inverse 4x8 and 8x4 transforms
Change-Id: I89ed93fb20cf975c2b463cff58879521ceaa4163
2016-10-10 09:02:45 -07:00
Yi Luo
3a8217f21b Merge "Hybrid forward transforms 16x16 AVX2 optimization" into nextgenv2 2016-10-07 01:52:11 +00:00
Debargha Mukherjee
609453e7e4 Merge "Added sse2 inverse 8x16 and 16x8 transforms" into nextgenv2 2016-10-07 00:03:34 +00:00
Debargha Mukherjee
e4dc5f8dc9 Merge "A bug fix for var-tx" into nextgenv2 2016-10-07 00:02:31 +00:00
Johann
9ed9cedae1 Remove -fno-strict-aliasing flag
The referenced bug was fixed by saving neon registers. That this had any
effect was coincidental.

Both chromium and Android build with clang and neither uses this flag.

Change-Id: I470247d6fd9226fc207b42a187105581a94badc3
(cherry picked from commit fad70a358b)
2016-10-06 15:52:39 -07:00
Yi Luo
e8e8cd8f1b Hybrid forward transforms 16x16 AVX2 optimization
- Unit tests are added for AVX2 SIMD.
- Encoder speed improvement:
  AV1 baseline and EXT_TX, three 1080p sequences at bitrate:
  800 Kbps, 2 Mbps, 6 Mbps, on i7-6700 CPU, average
  user level time reduction: 3.86%.

Change-Id: Ibbd7837ee3a831c6b1e4e471bf6c8d3fa3a19ff4
2016-10-06 15:33:15 -07:00
Alex Converse
24aa59cc51 Fix left shift of negative integer in hbd directional predictors
Change-Id: Id78139ae2dfa2d521bd50618b4a81cf24e09e391
2016-10-06 11:41:47 -07:00
Peter de Rivaz
1baecfeb03 Added sse2 inverse 8x16 and 16x8 transforms
Change-Id: I43628407b11e5c8e6af4df69f2acdc67ac827834
2016-10-06 11:23:14 -07:00
Debargha Mukherjee
29804479b5 Merge "Silence some warnings" into nextgenv2 2016-10-06 18:15:16 +00:00
Debargha Mukherjee
28d924b7b8 A bug fix for var-tx
Fixes a crash with supertx, ext-tx and rect-tx

Change-Id: I6b5f4cfd6e209558541a791be685b55156aa0138
2016-10-06 11:14:27 -07:00
Steinar Midtskogen
a5f8ea1109 Added generic SIMD library supporting x86 SSE2+ and ARM NEON.
Change-Id: I037f4c44f621a7e909b82ccb6a299d41bcbf8607
2016-10-06 16:37:08 +00:00
Steinar Midtskogen
d06588ab18 New CLPF: New kernel and RDO for strength and block size
This commit ports a CLPF change from aom/master by manually
cherry-picking:
7560123c066854aa40c4685625454aea03410b18

Change-Id: I61eb08862a101df74a6b65ece459833401e81117
2016-10-06 09:36:03 -07:00
Jingning Han
3b22d1a875 Merge "Make ref_mv_idx syntax context dependent on block distance only" into nextgenv2 2016-10-06 15:55:40 +00:00
Angie Chiang
9c2d401ca0 Merge "Simplify file dependencies of SIMD implementation of interpolation filters" into nextgenv2 2016-10-05 16:26:26 +00:00
Jingning Han
8205b78552 Make ref_mv_idx syntax context dependent on block distance only
This allows the hardware decoder to start decoding ref_mv_idx
syntax prior to the sorting stage and hide the latency of entropy
decoding. The compression performance change is at about the 0.01% level.

Change-Id: I86b34f31f6c99a36ae2780416175cc0bd90ff492
2016-10-05 09:09:00 -07:00
Debargha Mukherjee
1ae9f2cfab Silence some warnings
Change-Id: I8efb64eac3438484e7a77a8a1db198223fc52bfa
2016-10-04 14:30:16 -07:00
Debargha Mukherjee
cb603790b0 Fix a compiler warning in ext-inter experiment
Change-Id: If36417c1384646da57453344b208e7653a4d31e5
2016-10-04 13:22:21 -07:00
Debargha Mukherjee
1a16a987ee Fix an integer overflow issue in restoration
https://bugs.chromium.org/p/webm/issues/detail?id=1306

Change-Id: Icd11d373ff08954121c097728e4c7791791e223f
2016-10-04 11:50:00 -07:00
Alex Converse
438b1dcb72 Merge "ext_tx: fix a signed overflow" into nextgenv2 2016-10-04 17:24:06 +00:00
Yi Luo
2b47628903 Merge "Fix high bitdepth variance overflow on uint32_t" into nextgenv2 2016-10-04 17:11:07 +00:00
Angie Chiang
b9ba5c251b Simplify file dependencies of SIMD implementation of interpolation filters
This change is similar to the following aom CL:
https://aomedia-review.googlesource.com/#/c/1961/

Move SIMD related functions from filter.c/h to the following files:
av1_convolve_ssse3.c
av1_highbd_convolve_filters_sse4.c

Change the following C files to header files:
av1_highbd_convolve_filters_sse4.c
av1_convolve_filters_ssse3.c

Change-Id: I41a3cc6b0789e632451aeda82f5eb97a4d78e370
2016-10-03 18:43:23 -07:00
Yi Luo
a674ba93fe Fix high bitdepth variance overflow on uint32_t
BUG=webm:1305

Change-Id: I4c56631359e298b99e618c07bcbae9f793c5e2ac
2016-10-03 16:37:00 -07:00
Yi Luo
8e46b860c6 Fix filter type mismatch warning on Visual Studio
- Move filter look-up functions to corresponding optimization modules.

BUG=webm:1296

Change-Id: I87f399609052db2dbc7e5a590afb08b82e3fa89f
2016-10-03 16:24:25 -07:00
Alex Converse
aa77b5168f ext_tx: fix a signed overflow
Change-Id: I9a08bc5da1a84c3d4b8fe2d457bb80406c0bc028
2016-10-03 16:17:24 -07:00
Debargha Mukherjee
bf0431276d Merge "Further changes to new-quant tables" into nextgenv2 2016-10-03 21:10:30 +00:00
Yaowu Xu
0badd0201e Merge "decode_with_drops.sh : make sample test work for av1" into nextgenv2 2016-10-03 19:57:43 +00:00
Yaowu Xu
6f745a2795 Merge "decode_with_drop.sh: vp8->aom" into nextgenv2 2016-10-03 19:57:30 +00:00
Yaowu Xu
f54733ad25 Merge "decode_to_md5_test: fixes and runs quick encode and checks decode" into nextgenv2 2016-10-03 19:57:17 +00:00
Yaowu Xu
4707c75a5b Merge "decode_to_md5.sh: vp8->aom" into nextgenv2 2016-10-03 19:56:59 +00:00
Jim Bankoski
ddb90bb445 decode_with_drops.sh : make sample test work for av1
Change-Id: I4175070840a6561c1cec5f5a50b64e425f3e2926
2016-10-03 11:13:13 -07:00
Yaowu Xu
a7c2e5c3f5 decode_with_drop.sh: vp8->aom
Change-Id: I22dacbc2e4933a60ce7151204af9ee253990ca1f
2016-10-03 11:13:08 -07:00
Jim Bankoski
0d730f95f0 decode_to_md5_test: fixes and runs quick encode and checks decode
This test checks if there's any basic change to the bitstream or default
encoder by running an encode and checking that the md5 from the decode
doesn't change.

Any change to the default encoder or bitstream should be accompanied by
a change to the md5 in this file.

Change-Id: Ibdd5a1442296fd3e946823ec1f43e8ac4e66dd34
2016-10-03 11:12:13 -07:00
Yaowu Xu
8982e20889 decode_to_md5.sh: vp8->aom
Change-Id: I0dcb0643cf83ee99b63336df851cbca749c11b68
2016-10-03 11:11:54 -07:00
Jingning Han
42bc3a9ef3 Sync ref-mv experiment between aom and nextgenv2
Change-Id: I134d276234b3b8aa7df1ab647892b5d739647f4c
2016-10-03 09:02:20 -07:00
Debargha Mukherjee
3c42c09608 Further changes to new-quant tables
Refactor to streamline the number of profiles needed, in
preparation for the next steps.

NO change in performance.

Change-Id: I753b89299897857f3c250c316b4cdc4fedcb90e8
2016-10-01 17:59:28 -07:00
Jingning Han
1f470046aa Merge "Rename aom_write_nmv_probs as av1_write_nmv_probs" into nextgenv2 2016-10-01 01:06:09 +00:00
Yaowu Xu
d59fb48bc7 Add notes for an option
cherry-picked from aom/master:
2b407394907253be68bc497aa978b0adc298bbf8

Change-Id: Ia7b3bfd68e2c31b21f49a429fecc4d0b701b045f
2016-09-30 15:39:53 -07:00
Yaowu Xu
671f2bd3f5 Rename AOM_ENC/DEC_BORDER_IN_PIXELS
Cherry-picked from aom/master:
e2721a65cbfb5b560cd884d60eb17f53539df5f0

Change-Id: I4ade58be91e7bca0cc4f2bed98a43177d7f590a5
2016-09-30 15:17:16 -07:00
Jingning Han
71e4553c3b Clean up av1_adapt_mv_probs format
Change-Id: Ib5226d4fe3dcf916fe8954c7240966e3a32eed31
2016-09-30 17:58:21 +00:00
Jingning Han
5c60cdf23f Sync assign_mv format
Change-Id: I4fea280d72d7e428f2ab0820fd728997d5a903c9
2016-09-30 17:58:06 +00:00
Jingning Han
3b0a3f3ab3 Merge "Set spatial neighbor search resolution 16x16 for block size 64x64" into nextgenv2 2016-09-30 17:57:52 +00:00
Jingning Han
dcf1b40d91 Merge "Search collocated reference block in 16x16 unit" into nextgenv2 2016-09-30 17:45:09 +00:00
Jingning Han
fd0cf16d7f Rename aom_write_nmv_probs as av1_write_nmv_probs
Change-Id: Ia33ce4918d3d40eba331f81909f3f1f0f3ab7a58
2016-09-30 10:34:33 -07:00
Jingning Han
75e513f126 Set spatial neighbor search resolution 16x16 for block size 64x64
When the block width or height is 64 or larger, use a 16x16 block
search step for reference motion vector search in the non-immediate
rows and columns.

Change-Id: If11ce97a9328b879f30ef87115086aa0cd985a2f
2016-09-30 10:00:10 -07:00
Jingning Han
883c63ca57 Search collocated reference block in 16x16 unit
Use 16x16 block resolution for collocated reference motion vector
search.

Change-Id: I1091b5b178e255eb6cc0b994de360994f7661b79
2016-09-30 09:04:21 -07:00
Alex Converse
770911d48c Merge changes I319cb856,Ib009b6b6 into nextgenv2
* changes:
  Remove multi-entropy coder hacks from the treewriter
  Rename rans_dec_lut to rans_lut
2016-09-29 21:54:28 +00:00
Jingning Han
d54e5a04c4 Merge "more ref_mv changes from aom/master" into nextgenv2 2016-09-29 21:46:56 +00:00
Yue Chen
7dc7703bcb Merge "Fix unit test failure for RECT_TX + VAR_TX" into nextgenv2 2016-09-29 21:41:10 +00:00
Yaowu Xu
4306b6e599 more ref_mv changes from aom/master
Change-Id: I9152f898dfacdf3877ed719f193bb1e0dbee0a1a
2016-09-29 12:41:55 -07:00
Yue Chen
235133a22e Fix compiler error for GLOBAL_MOTION+WARPED_MOTION
Fix the logical OR computation in the .mk file. Otherwise, when both
experiments are on, the output of $(filter... will be two 'yes',
which causes a missing library issue.

Change-Id: I53c44e925dc9ea77c7467217c20e4f1bc7e20fc3
2016-09-29 12:12:47 -07:00
Yue Chen
8e87224604 Merge "Move warping model estimation functions to COMMON folder" into nextgenv2 2016-09-29 18:24:32 +00:00
Alex Converse
57aa0f656d Merge changes Ideda50a6,Id2bced5f,If423eeb3 into nextgenv2
* changes:
  Port ANS from aom/master 25aaf40
  Refactor bitreader and bitwriter wrapper.
  Migrate aom/master ANS test from d311d02.
2016-09-29 16:43:12 +00:00
Yue Chen
49587a77f1 Fix unit test failure for RECT_TX + VAR_TX
Disable rect_tx because we only support 4x4 Walsh-Hadamard transform
in lossless mode.

Fixes failure in ./test_libaom --gtest_filter=*Large*ScreencastQ0/1
Configuration: --enable-experimental --enable-var-tx --enable-rect-tx
 --enable-ref-mv --enable-ext_intra --enable-ext_tx --enable-debug
 --disable-optimizations

Change-Id: Ib6b3494c7dcf7182f1cab9b138388d054851a23d
2016-09-29 09:20:52 -07:00
Debargha Mukherjee
485af9e580 Merge "Change non-uniform-quant parameters" into nextgenv2 2016-09-29 16:04:58 +00:00
Debargha Mukherjee
4ee1f71b5c Merge "Update codec name in test environment to match decoder" into nextgenv2 2016-09-29 16:04:37 +00:00
Jingning Han
4d7d2254bc Merge "mvref_common.c: port refactoring from aom/master" into nextgenv2 2016-09-29 15:45:20 +00:00
Jingning Han
3f485e9528 Merge "Remove an intermediate variable" into nextgenv2 2016-09-29 15:45:10 +00:00
Alex Converse
5847de75c2 Remove multi-entropy coder hacks from the treewriter
Change-Id: I319cb856a16ace343359c2aebc449c1d73bdedee
2016-09-28 15:35:12 -07:00
Alex Converse
33590f8c71 Rename rans_dec_lut to rans_lut
It's used in both encoding and decoding. Matches (historical)
implementation in aom/master.

Change-Id: Ib009b6b6023cfe69e99a0a92f3c70f4416fcdb47
2016-09-28 15:35:04 -07:00
Alex Converse
7fe2ae8e88 Port ANS from aom/master 25aaf40
Reconciles the following commits from aom/master to nextgenv2:
- 25aaf40bbc24beeb52de9af7d7624b7d7c6ce9de
- 87073de5693df70eba1c9b9be2b2732ed3b08fb3

Change-Id: Ideda50a6ec75485cb4fa7437c69f4e58d6a2ca73
2016-09-28 12:07:00 -07:00
Alex Converse
018150d01b Clang-format ransac.c
Change-Id: I1679da4fb8832133ab1bcb396f4bed4e5448e504
2016-09-28 12:07:00 -07:00
Nathan E. Egge
e691a24cff Refactor bitreader and bitwriter wrapper.
Move code for reading and writing literals and reading trees to use
just the aom_read_bit() and aom_write_bit() function calls.

Change-Id: Id2bced5f0125a5558030a813c51c3d79e5701873
(cherry picked from aom/master commit bc1ac15846a200272551699d45457039535e56b2)
2016-09-28 12:07:00 -07:00
Alex Converse
5d33cc42c3 Migrate aom/master ANS test from d311d02.
This helps in porting entropy coder changes that happened in aom/master.

Change-Id: If423eeb3da552066cceb88227138ea61d6a20f07
(cherry picked from aom/master commit d311d02da55433d20aad6dd88e0bbb992919988d)
2016-09-28 12:07:00 -07:00
Peter de Rivaz
105fa6d9f2 Update codec name in test environment to match decoder
The codec name is defined in av1_dx_iface.c
This name needs to match kAV1Name in decode_test_driver.cc.
Otherwise the EndtoEndPSNRTest fails when built with --enable-ext-tile
(because we need the IsAV1 function to return true).

Change-Id: I05d5ea5b6fd4bbd49e8bcacd047fb81c27efb3b3
2016-09-28 09:11:41 -07:00
Debargha Mukherjee
9324d38825 Change non-uniform-quant parameters
Also adds hooks to choose different profiles for UV and intra.

Results
lowres: -0.15%
midres: -0.24%

Change-Id: I4af8bc3e9b82b6f8a061dce9f52c89afa6239ae1
2016-09-28 09:09:35 -07:00
Yue Chen
1ab57800f1 Move warping model estimation functions to COMMON folder
These functions will be called by both enc and dec in WARPED_MOTION
experiment.

Change-Id: I4b4a20af111b30822760aee8c9451e9ccbb2dd05
2016-09-27 17:59:45 -07:00
Yi Luo
fda2f1b95a Merge "Add a TODO for aom_highbd_fdct16x16_1_sse2 tests" into nextgenv2 2016-09-27 22:21:54 +00:00
Yaowu Xu
dc035da9b9 mvref_common.c: port refactoring from aom/master
Change-Id: I53cf072f33de957eed6bf6be270218db8ff33af9
2016-09-27 11:59:15 -07:00
Yaowu Xu
439286a6c5 Remove an intermediate variable
This commit changes to use function parameter "len" directly.

Change-Id: I072d165aeca59cfbbcf52c9be3c2a91e3191b980
2016-09-27 10:13:33 -07:00
Yue Chen
6c6ddac3a4 Merge "Fix for compile error with RECT_TX without EXT_TX" into nextgenv2 2016-09-27 06:46:19 +00:00
Alex Converse
c8b229772e Merge changes I13eed9cb,I3b213790,I7232f9ae into nextgenv2
* changes:
  Remove VP10 style bitreader and bitwriter wrappers
  Rename av1_ans_test to match aom/master.
  Migrate bitreader to the interface from aom/master
2016-09-26 22:34:57 +00:00
Yaowu Xu
c7d6eaa5fe Merge "rename pred_mv_s8 to pred_mv" into nextgenv2 2016-09-26 21:12:05 +00:00
Alex Converse
4fb213f31f Remove VP10 style bitreader and bitwriter wrappers
Change-Id: I13eed9cb6950ea4fbdd586d43b73ac0cc2d78d33
2016-09-26 14:02:34 -07:00
Alex Converse
0ad82c6edb Rename av1_ans_test to match aom/master.
Change-Id: I3b2137903a87a1f8169ff45e940575b917c26a6a
2016-09-26 13:15:41 -07:00
Alex Converse
acef60bd2c Migrate bitreader to the interface from aom/master
Change-Id: I7232f9ae3d97e730f66e4b80f550192e3ef7230b
2016-09-26 12:19:11 -07:00
Sarah Parker
f94296dec6 Merge "Add double precision warping for ransac" into nextgenv2 2016-09-26 19:03:52 +00:00
Yaowu Xu
f5bbbfad1d rename pred_mv_s8 to pred_mv
Change-Id: Ib1088c3fc80952074e098385fe5eb81742e7dc59
2016-09-26 09:13:38 -07:00
Yaowu Xu
d9470c20df Merge "minor format fix" into nextgenv2 2016-09-26 15:13:05 +00:00
Yaowu Xu
3bf484efb2 Merge "change to use aomedia copyright notice" into nextgenv2 2016-09-26 15:12:57 +00:00
Peter de Rivaz
a7c814664e Fix for compile error with RECT_TX without EXT_TX
Change-Id: I2f4e3fc877c03a5bee7f7fd1dc50e6a693697647
2016-09-26 14:20:13 +01:00
Alex Converse
71427df526 Merge "enums.h: Combine related #defines into packed enums." into nextgenv2 2016-09-24 00:38:53 +00:00
Yaowu Xu
def1a3d65e minor format fix
Change-Id: Ia4a37d43a7110c84cda6ad317aa7f799e00bde82
2016-09-23 15:37:46 -07:00
Yaowu Xu
5e53c43ec7 change to use aomedia copyright notice
av1/common/allcommon.h
doc.mk

Change-Id: I7e08c9131ab1c0d7e7854f7e70b90397d041143a
2016-09-23 15:37:36 -07:00
Sarah Parker
97fa6da1d2 Add double precision warping for ransac
Change-Id: I32b6e2e6c8454ffb64e4a4ceb87070d175f05fe9
2016-09-23 11:19:27 -07:00
Alex Converse
1d1e0844e9 Merge "Migrate bitwriter to the interface in aom/master" into nextgenv2 2016-09-23 01:18:30 +00:00
Debargha Mukherjee
60b2927d51 Merge "Fix bug in table for UV tx size" into nextgenv2 2016-09-22 18:53:51 +00:00
Debargha Mukherjee
6de06dd3d8 Fix bug in table for UV tx size
Change-Id: I086b79462b0933cf9dc1101ff71cbc71c7da2738
2016-09-22 10:10:20 -07:00
Urvang Joshi
cb586f3ba9 enums.h: Combine related #defines into packed enums.
enums for BLOCK_SIZE, TX_SIZE and PREDICTION_MODE.

Note: These were converted to #defines earlier to save on memory:
https://chromium-review.googlesource.com/#/c/269854/

Instead, we now use the 'packed' attribute (see here:
https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html#Common-Type-Attributes)
to ensure that these enums use the smallest possible integer type,
and so use the least memory when used in structs, arrays, etc.

Change-Id: If1fc136686b28847109c9f3a06f8728165e7e475
2016-09-22 09:44:51 -07:00
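A small sketch of the effect described in the commit above, assuming a GCC or Clang compiler (the 'packed' attribute is a compiler extension, not standard C). The type and enumerator names are hypothetical stand-ins, not the real BLOCK_SIZE, TX_SIZE or PREDICTION_MODE definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Plain enum: the underlying type is typically int (4 bytes on common ABIs). */
typedef enum { PLAIN_4X4, PLAIN_8X8, PLAIN_16X16, PLAIN_32X32 } PlainTxSize;

/* Packed enum (GCC/Clang extension): the compiler picks the smallest
 * integer type that can hold all enumerators, here a single byte. */
typedef enum __attribute__((packed)) {
  PACKED_4X4,
  PACKED_8X8,
  PACKED_16X16,
  PACKED_32X32
} PackedTxSize;

/* The saving shows up once the enum is stored in arrays or structs. */
typedef struct { PlainTxSize tx[16]; } PlainInfo;
typedef struct { PackedTxSize tx[16]; } PackedInfo;

static size_t plain_info_bytes(void) { return sizeof(PlainInfo); }
static size_t packed_info_bytes(void) { return sizeof(PackedInfo); }
```

Compared with #defines, this keeps the type checking and debugger-friendly names of an enum while the packed variant still occupies one byte per element.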
Angie Chiang
6062a8bfee bitstream_debug: build related cleanup
Move experimental config from debug_util.c/h to aom_util.mk to avoid
empty object.

Change-Id: Id7978ed6a342262bddaa4df8b53115e750fa1c2c
2016-09-22 09:37:56 -07:00
Alex Converse
080a2cccba Migrate bitwriter to the interface in aom/master
Change-Id: I73d46229f0feea43cbe933e51da997833cce032b
2016-09-21 11:17:08 -07:00
Debargha Mukherjee
7a9ad9c83f Merge "Misc. refactoring of loop restoration" into nextgenv2 2016-09-21 04:37:17 +00:00
Debargha Mukherjee
5d89a63a7e Misc. refactoring of loop restoration
Streamlines the functions and data structures to make it
easy to add new restore options.

Change-Id: Ib00638a5749e6c38c2455f3e3142b1025e6e0624
2016-09-20 20:46:36 -07:00
Sarah Parker
8f71e396b1 Merge "Fix naming mistake in multiply_mat" into nextgenv2 2016-09-20 23:15:35 +00:00
Alex Converse
3e457ba154 Merge changes I38f40582,Ib7afcffa into nextgenv2
* changes:
  Move ANS to aom_dsp.
  Move and wrap the old vpx boolcoder.
2016-09-20 22:55:18 +00:00
Sarah Parker
8f90d8b59b Fix naming mistake in multiply_mat
This was introduced in a cleanup in
I1e07ccab18558dfdd996547a72a396abe02ed23d

Change-Id: If6ac798d838a1ad392981f4e5970778207c3cb0b
2016-09-20 15:37:15 -07:00
Yi Luo
fbf5681aae Add a TODO for aom_highbd_fdct16x16_1_sse2 tests
- Here the function aom_fdct16x16_1_sse2 is mistakenly tested. It can pass
  AOM_BITS_8, AOM_BITS_10, but not AOM_BITS_12. We should fix this test
  when aom_highbd_fdct16x16_1_sse2 is available.

Change-Id: I5cac6ee5404ff6d833940e1ecc34663b29d7a41c
2016-09-19 16:26:08 -07:00
clang-format
bda8d61ed1 apply clang-format after 5cd2ab9
Change-Id: I186e90d99cd54e66d38159b7cb55a881226b1568
2016-09-19 15:56:08 -07:00
Alex Converse
674e9a7ca6 Merge "Use the aom_writer type rather than the tag in calling code." into nextgenv2 2016-09-19 21:50:56 +00:00
Alex Converse
1ac1ae73dc Move ANS to aom_dsp.
That's where it lives in aom/master.

Change-Id: I38f405827d9c2d0b06ef5f3bfd7cadc35d5991ef
2016-09-19 09:51:27 -07:00
Pascal Massimino
e5868cdba9 Merge "Kludge to keep ANS building while porting from aom/master." into nextgenv2 2016-09-18 07:21:58 +00:00
Alex Converse
e54fd03c5a Use the aom_writer type rather than the tag in calling code.
This makes room for typedefing some other struct to aom_writer.

Change-Id: I1e82de1320da00b3e41c90b14f2df45e7628aa89
(cherry picked from commit d69161f8f1eed602e0e5d21f4e6157b674e30cf6)
2016-09-17 14:56:51 -07:00
Alex Converse
eb00cb289b Move and wrap the old vpx boolcoder.
This should make room for compile time pluggable replacements.

Change-Id: Ib7afcffa93bf664b89a49da21a20138127443292
(cherry picked from commit 9dd0b8982445515d6dddb6342e655b56062a8f7f)
2016-09-17 14:56:51 -07:00
Alex Converse
9264650838 Kludge to keep ANS building while porting from aom/master.
Change-Id: I9e74bdb94c5640aca025b11b6676e8a8c008f47e
2016-09-17 14:56:48 -07:00
Debargha Mukherjee
4c80804e66 Merge "Enable tile-adaptive restoration" into nextgenv2 2016-09-17 19:10:28 +00:00
Debargha Mukherjee
5cd2ab95c9 Enable tile-adaptive restoration
Includes a major refactoring/enhancement to support
tile-adaptive switchable restoration. The framework can be
readily extended to add more restoration schemes in the
future. Also includes various cleanups and fixes.

Specifically the framework allows restoration to be conducted
on tiles such that each tile can be either left unrestored, or
use bilateral or wiener filtering.

There is a modest improvement in coding efficiency (0.1 - 0.2%).

Further enhancements will be added subsequently to improve coding
efficiency and complexity.

Change-Id: I5ebedb04785ce1ef6f324abe209e925c2d6cbe8a
2016-09-17 09:46:28 -07:00
Sarah Parker
f9a961c5d0 Style fixes for global motion experiment
These are in response to a post-commit review in
Ib6664df44090e8cfa4db9f2f9e0556931ccfe5c8

Change-Id: I1e07ccab18558dfdd996547a72a396abe02ed23d
2016-09-16 16:22:24 -07:00
clang-format
67948d312d apply clang-format
Change-Id: If22018f8911d9d7ee99c2127bdfcc56e42b0e2d7
2016-09-15 16:41:21 -07:00
James Zern
964a717acf .clang-format: update to 3.8.1
based on --style=Google with the following differences:
3a4
> # Generated with clang-format 3.8.1
13c14
< AllowShortCaseLabelsOnASingleLine: false
---
> AllowShortCaseLabelsOnASingleLine: true
41c42
< ConstructorInitializerAllOnOneLineOrOnePerLine: true
---
> ConstructorInitializerAllOnOneLineOrOnePerLine: false
44,45c45,46
< Cpp11BracedListStyle: true
< DerivePointerAlignment: true
---
> Cpp11BracedListStyle: false
> DerivePointerAlignment: false
73c74
< PointerAlignment: Left
---
> PointerAlignment: Right
75c76
< SortIncludes:    true
---
> SortIncludes:    false

SortIncludes will likely be enabled in a future commit

Change-Id: I5c404f44081b65354e7f526411c91fbbe31ac5af
(cherry picked from commit 6d84689870e1437b2ebb5df56c672b3249b975bb)
2016-09-15 15:12:14 -07:00
Jingning Han
1aab81843d Sort header files
cherry-picked #ecd07473 from aom/master

Change-Id: Id8f45d9c11406fc301b39801c5228ccd6aa2d5d6
2016-09-09 16:45:02 -07:00
Jim Bankoski
f7f043774b aomdec.sh : Make this test create files if needed to test decoder.
If test files don't already exist it calls aomenc to create them.

cherry-picked #ee9ac321 from aom/master

Change-Id: I0e0f33cb60b3492e9106d6c9e2c51f64f71ebb63
2016-09-09 16:39:21 -07:00
Jim Bankoski
5d105b40c3 simple_encoder: make it so we can run it in tests.
Added a limit parameter, resolving a TODO, so that we can do a
very simple, fast encode in 1 pass.

Change-Id: I265cd912d970d560a0b00b86e6c7ec7b6fef1e7b
2016-09-09 15:54:51 -07:00
Jim Bankoski
e78a964e29 simple_decoder.sh: Support encoding in decode test scripts.
Adding AV1 input files to the test set is not feasible because the
bitstream is in constant flux. Add test input encoding and hook
it up in simple_decoder.sh to start.

cherry-picked #b591df89 from aom/master

Change-Id: Ie4c06a7c458cdc2ab003d27fb92418c77c87fc88
2016-09-09 15:49:56 -07:00
Yaowu Xu
2a88d24907 Merge "Convert to int before adding negative numbers" into nextgenv2 2016-09-09 22:39:32 +00:00
Yaowu Xu
f9490ff58a Merge "Convert "var" to uint64_t" into nextgenv2 2016-09-09 22:39:24 +00:00
Yaowu Xu
ca38a67a5c Merge "twopass_encoder: sample and test script fixed." into nextgenv2 2016-09-09 22:39:16 +00:00
Yaowu Xu
250a52ed06 Merge "set_maps: add back script and fix." into nextgenv2 2016-09-09 22:39:07 +00:00
Yaowu Xu
66c41f9937 Merge "Clarify valid value ranges" into nextgenv2 2016-09-09 22:38:57 +00:00
Yaowu Xu
34ddb7ab1f Merge "change to use correct type" into nextgenv2 2016-09-09 22:38:44 +00:00
Debargha Mukherjee
8e80f422d6 Merge "Add SSE2 versions of av1_fht8x16 and av1_fht16x8" into nextgenv2 2016-09-09 20:51:03 +00:00
Yaowu Xu
8706182376 Convert to int before adding negative numbers
This is to avoid -1 overflowing uint32_t.

cherry-picked #c48106da from aom/master

Change-Id: Ic3d99b1985cdb0a28cc83f8291422f5aba5a5a6d
2016-09-09 12:43:02 -07:00
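A hedged illustration of the class of bug the commit above guards against. The function names and the comparison are hypothetical and do not reproduce the code that was changed; they only show what happens when a negative int is added to a uint32_t.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned arithmetic: the negative adjustment is converted to uint32_t,
 * so 0 + (-1) wraps around to UINT32_MAX and the comparison fails. */
static int below_limit_unsigned(uint32_t count, int adjust, uint32_t limit) {
  return count + (uint32_t)adjust < limit;
}

/* Converting to int first keeps the intermediate value signed, so
 * 0 + (-1) is simply -1. Assumes count and limit fit in an int. */
static int below_limit_signed(uint32_t count, int adjust, uint32_t limit) {
  return (int)count + adjust < (int)limit;
}
```

With count = 0, adjust = -1 and limit = 10, the unsigned version reports that the limit is exceeded while the signed version does not.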
Yaowu Xu
aa8729c55f Convert "var" to uint64_t
This is to avoid overflow at uint32_t.

cherry-picked #000098a0 from aom/master

Change-Id: I549d2d13d0577fd05d57303a438fbc8034755e45
2016-09-09 12:42:12 -07:00
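A sketch of why a 32-bit accumulator can be too small, assuming (this is not stated in the commit) that the value being accumulated is a sum of squared differences over a high bit-depth block. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of n identical squared differences in a 32-bit accumulator.
 * Unsigned wrap-around is well defined in C but gives a wrong total. */
static uint32_t sum_sq_u32(int diff, int n) {
  uint32_t sum = 0;
  for (int i = 0; i < n; ++i) sum += (uint32_t)(diff * diff);
  return sum;
}

/* Same sum in a 64-bit accumulator: no wrap-around for realistic sizes. */
static uint64_t sum_sq_u64(int diff, int n) {
  uint64_t sum = 0;
  for (int i = 0; i < n; ++i) sum += (uint64_t)(diff * diff);
  return sum;
}
```

For 12-bit samples the largest difference is 4095, and a 64x64 block has 4096 pixels, so the true sum can reach 68685926400, which does not fit in 32 bits.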
Jim Bankoski
a65e7beea8 twopass_encoder: sample and test script fixed.
Added a limit function, removed a TODO, and fixed the script so that
it can actually be run on AV1.

cherry-picked #1801d35d from aom/master

Change-Id: Ib8d1d1b5c7dbe0169e4e6c7d89d28801d7699c37
2016-09-09 12:38:37 -07:00
Jim Bankoski
a7a3909f55 set_maps: add back script and fix.
cherry-picked #a5c5f856 from aom/master

Change-Id: Ie50a81063b5e14f4b5f3b5adcb822dba6b3ee93d
2016-09-09 12:33:09 -07:00
Yaowu Xu
6feda0602a Clarify valid value ranges
This commit adds asserts to clarify value ranges in sum computations,
also corrects type conversion used in related calculations.

cherry-picked #738d5b19 from aom/master

Change-Id: Ib6d574ec23e5c28ccd994dac26f973eb3920430d
2016-09-09 11:58:53 -07:00
Yaowu Xu
57d92577d4 change to use correct type
This commit changes to use uint32_t for cost (always non-negative),
and promotes it to int64_t before calculating the savings.

This fixes an integer overflow.

cherry-picked #a3028ddf from aom/master

Change-Id: I71c2580d188cc79d2d8069241d0353cf331b5c83
2016-09-09 11:52:34 -07:00
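A minimal sketch of the promotion described in the commit above, using hypothetical function names. Costs are non-negative and fit in uint32_t, but their difference can be negative, so it is computed in int64_t.

```c
#include <assert.h>
#include <stdint.h>

/* Subtracting in uint32_t first: a negative saving wraps to a huge value. */
static int64_t savings_wrapped(uint32_t old_cost, uint32_t new_cost) {
  return (int64_t)(old_cost - new_cost);
}

/* Promoting each cost to int64_t before subtracting keeps the sign. */
static int64_t savings(uint32_t old_cost, uint32_t new_cost) {
  return (int64_t)old_cost - (int64_t)new_cost;
}
```

With old_cost = 3 and new_cost = 5, the promoted version yields -2, whereas subtracting first yields 4294967294.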
Jim Bankoski
19a06bccdf resize_util.sh : resize util was removed.
The app this script called was removed in this patch:
50cbe24 remove more vp8 and vp9 only code

cherry-picked #1c17dd6f from aom/master

Change-Id: Ib622eff6a3a35c5dab26908b094ace969f128c11
2016-09-09 11:51:13 -07:00
Thomas Daede
ac0a380ea2 Make deadline mode not depend on frame duration.
Backwards compatible with old API.

cherry-picked #02ae3dd3 from aom/master.

Change-Id: I65aa43f84bb9491e8cca73fe444094c2622b0187
2016-09-09 11:50:33 -07:00
Thomas Daede
f56859f198 Fix decoding Daala deringing and CLPF filters with tiling.
cherry-picked #14ed7a61 from aom/master

Change-Id: I077b0e97186bdd292f925e08966a2ca3cf8c250d
2016-09-09 11:47:56 -07:00
Yaowu Xu
af048635bb Change to use correct type
This commit changes to use int instead of unsigned for a variable used in
inverse quantization.

Change-Id: I8f0ff5f80c9e68d52425265ef177357c65ead1e2
2016-09-09 18:47:15 +00:00
Jim Bankoski
23938a73c0 aomenc: Remove tests unsupported in av1.
Change-Id: I9379eedd577c8bfb7b82f1c996e4ee4c62ce686b
2016-09-09 18:46:57 +00:00
Yaowu Xu
3a45d57574 vp8_multi_resolution_encoder.sh: remove file
Change-Id: I3be6480b98cdde4c24b6cdfbebf362072153bcca
2016-09-09 18:46:40 +00:00
Yaowu Xu
1ff9579773 restore vp9 and vpx in libwebm
renaming should not have been applied to third_party.

Change-Id: I95be7ec4b7558298cd49ec4c5d1ed15a17ad222b
2016-09-09 18:46:13 +00:00
Yaowu Xu
70287defe1 Merge "simplify test code" into nextgenv2 2016-09-09 18:45:54 +00:00
Geza Lore
1a800f6539 Add SSE2 versions of av1_fht8x16 and av1_fht16x8
Encoder speedup ~2% with ext-tx + rect-tx

Change-Id: Id56ddf102a887de31d181bde6d8ef8c4f03da945
2016-09-09 11:29:41 -07:00
Debargha Mukherjee
d610d209c9 Merge "Fix some var_tx related rd_costing mismatches" into nextgenv2 2016-09-09 17:46:38 +00:00
Sarah Parker
e51ee021dc Merge "Swap order of affine parameters" into nextgenv2 2016-09-09 17:08:11 +00:00
Yaowu Xu
81fb4cf1ee simplify test code
Change-Id: Ib5491fb8f5dd7edf27c74abdd21b1f0a42aafd1f
2016-09-09 16:40:58 +00:00
Debargha Mukherjee
797cc30f23 Merge "Rd fixes and cleanups" into nextgenv2 2016-09-09 01:52:59 +00:00
James Zern
ea3621ab95 Merge "aom_mem,align_addr: use ~ to create mask" into nextgenv2 2016-09-09 01:41:31 +00:00
Yaowu Xu
fe24b956e9 aom_cx_set_ref: add example showing setting reference frame
Manually cherry-picked from AOM:
16944e59 aom_cx_set_ref: Example showing setting a reference frame
8f4c0ec8 examples.mk - Invalid comment fixed

Change-Id: Ifa87611561b089aebef2c132099baf265c845b10
2016-09-08 17:36:44 -07:00
Yaowu Xu
628d3c5839 variance_impl_avx2.c: align a table for better readability
Change-Id: I8cd99f9807dbfe6f70147615d2fd6775a7d98c16
2016-09-08 17:36:44 -07:00
James Zern
7b9407a81b s/INTERP_FILTER/InterpFilter/
this matches style guidelines and stabilizes successive runs of
clang-format across the tree. remaining types should be addressed in
successive commits.

Change-Id: I6ad3f69cf0a22cb9a9b895b272195f891f71170f
2016-09-09 00:32:31 +00:00
Debargha Mukherjee
096ae4cb68 Rd fixes and cleanups
A minor cleanup and an enhancement to return y_skip correctly
from sub8x8 intra mode search.

Change-Id: I87256d3cc5f57a2fd7b837d461cc1a7f06e01a1b
2016-09-08 15:48:05 -07:00
Peter de Rivaz
c0b4d7ae2c Fix some var_tx related rd_costing mismatches
This makes the code in select_tx_size_fix_type match the
corresponding code in pack_inter_mode_mvs.

Change-Id: I69bcc0dc6fdd733091fafe9188a3f7397e1e613f
2016-09-08 12:04:55 -07:00
James Zern
20b859833c aom_mem,align_addr: use ~ to create mask
removes the need for an intermediate cast to int, which was missing in
the call added in:
73a3fd4 aom_mem: Refactor code

quiets a visual studio warning:
C4146: unary minus operator applied to unsigned type, result still
unsigned

Change-Id: I76c4003416759c6c76b78f74de7c0d2ba5071216
2016-09-08 10:45:15 -07:00
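A sketch of the masking technique named in the commit above, written as a hypothetical helper function rather than the actual align_addr macro. It assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of
 * two). ~(align - 1) clears the low bits directly on the unsigned type,
 * so no unary minus on an unsigned value (MSVC warning C4146) and no
 * intermediate cast to int is needed. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (addr + align - 1) & ~(align - 1);
}
```

For a power-of-two alignment, ~(align - 1) and -align produce the same bit pattern in unsigned arithmetic; the former just avoids the warning.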
James Zern
9fa47587d9 fix 'dist' & other decode-only builds
common/av1_fwd_txfm.[hc] are encode-only; add a TODO to relocate them

Change-Id: I28cf8d0b22632b04066bcb72f3d2252ee7eb153e
2016-09-08 14:53:42 +00:00
James Zern
ba98061af3 av1_inv_txfm_test: fix decode-only build
fdct's are only enabled with --enable-av1-encoder

Change-Id: Iaf1dfdf713f2ecd1d215ba7ec635f353c02fa4d0
2016-09-07 16:33:35 -07:00
Debargha Mukherjee
d125b7a0cd Merge "Parameter adjustments to loop restoration" into nextgenv2 2016-09-07 21:26:34 +00:00
Debargha Mukherjee
035c5f34eb Parameter adjustments to loop restoration
Some minor adjustments to tile size and bilateral filters.

About 0.1% improvement for midres and hdres, very small change for
lowres.

Change-Id: Ia94f68a926867dfd67da1a8795fd8de0ddd8e2d6
2016-09-07 13:51:01 -07:00
Sarah Parker
c4bcb50635 Swap order of affine parameters
This allows for a clean subtraction of 1 along the transform
matrix diagonal and also makes the order of the parameter list
a little more intuitive.

Change-Id: I6a5d754af41b8d1292f241f9b21473160517d24f
2016-09-07 13:41:03 -07:00
Sarah Parker
3410a88373 Merge "Add parameter search to global motion computation" into nextgenv2 2016-09-07 20:39:37 +00:00
Sarah Parker
e3b8ff50f2 Fix hbd naming mistake in warped_motion.h
This changes a remaining VP9_HIGHBITDEPTH to AOM_HIGHBITDEPTH

Change-Id: I35efaf9528de660fb69104792a563dba5c41f329
2016-09-07 12:20:16 -07:00
Debargha Mukherjee
f579555423 Merge "Minor transform code cleanup" into nextgenv2 2016-09-07 16:59:42 +00:00
Debargha Mukherjee
ff4e315d13 Merge "Harmonize and fix coeff context computation" into nextgenv2 2016-09-07 16:54:21 +00:00
Pascal Massimino
4d5fda029c Merge "aom_mem.c: remove unnecessary inline" into nextgenv2 2016-09-07 09:34:37 +00:00
James Zern
cd24516347 aom_mem.c: remove unnecessary inline
these aren't overly speed critical, best to leave it to the compiler. as
a side-effect this fixes Visual Studio compilation (should have been
INLINE)

Change-Id: Ic81fb5ac76bc19c61efb2f1a965c0f79e9e45ebd
2016-09-06 23:36:59 -07:00
James Zern
5d986e5a30 odintrin.h: add missing extern "C"
fixes test linkage

Change-Id: I15a7b32551fddc5e78e3035e9d2e94a57ff9f1d2
2016-09-06 23:31:26 -07:00
Sarah Parker
cda2345787 Merge "Adjust types in hbd error computation to avoid overflow" into nextgenv2 2016-09-07 03:28:07 +00:00
Sarah Parker
ca92da752b Adjust types in hbd error computation to avoid overflow
Change-Id: I8e08ebc8cbb2d1a1f97c8ef0c9237d8dfe0df208
2016-09-06 19:43:01 -07:00
Sarah Parker
ecb0afc838 Add parameter search to global motion computation
Change-Id: I66ea5a819ab54ecb5327eee20f798d7d7f0833d3
2016-09-06 19:33:52 -07:00
Sarah Parker
1d22837fbd Merge "Fix formatting in internal stats for vp10" into nextgenv2 2016-09-07 00:53:14 +00:00
Yaowu Xu
a668cce3b5 Merge "Use AOMedia's Patents and LICENSE files" into nextgenv2 2016-09-07 00:34:34 +00:00
Yaowu Xu
cf92ae9f16 Merge "README.libvpx -> README.libaom" into nextgenv2 2016-09-07 00:34:25 +00:00
Yaowu Xu
0dc4cbb059 sad_avx2.c: add hints for clang-format
Change-Id: I721c52e69395a99b3a0395dc229de1cbb32670e9
2016-09-07 00:29:13 +00:00
Yaowu Xu
151864fcff Use AOMedia's Patents and LICENSE files
Change-Id: Icb53448442a8f341af3799d873e2fd6f3db5fbe2
2016-09-06 16:09:03 -07:00
Yaowu Xu
848668bee7 README.libvpx -> README.libaom
Change-Id: Ie7dd4aeee084ef9520f68663aa566ea32350e227
2016-09-06 14:35:52 -07:00
Urvang Joshi
497f27ed9d aom_realloc correction.
aom_realloc was allocating 1 byte more than needed every time.
Fixed this, and took this opportunity to do a small refactoring.

Change-Id: I38fcb62b698894acbbab43466c1decd12f906789
2016-09-06 21:27:20 +00:00
Urvang Joshi
73a3fd4710 aom_mem: Refactor code
Change-Id: I2da9cd5da48ae97e770bccfd1233bcc70b484688
2016-09-06 21:27:03 +00:00
Yaowu Xu
e14a42a453 Merge "Move CHECK_MEM_ERROR implementation to aom/internal." into nextgenv2 2016-09-06 21:26:52 +00:00
Alex Converse
f5550733e8 Move CHECK_MEM_ERROR implementation to aom/internal.
Allow using it in aom_dsp.

Change-Id: Ide7d58b6d11f8a45d473fc13bf730ba5bccb5516
2016-09-06 21:23:36 +00:00
Debargha Mukherjee
2963054ef6 Harmonize and fix coeff context computation
Change-Id: I75740e221deb3872647bd480ae506ba68800e8c7
2016-09-06 13:23:31 -07:00
Yue Chen
a1e48dccf2 Make RECT_TX(>=8x8) work with VAR_TX
Bitstream syntax:
For a rectangular inter block, 'rect_tx' flag is sent to indicate if
the biggest rect tx is used. If no, continue to decode regular
recursive tx partition.

Change-Id: I127e35cc619b65acb5e9a0717f399cdcdb73fbf0
2016-09-06 11:26:15 -07:00
Sarah Parker
5ebdf40d77 Merge "Add global motion experiment to rdopt" into nextgenv2 2016-09-06 18:07:31 +00:00
Debargha Mukherjee
40da2c899a Merge "Enable rectangular transforms for UV" into nextgenv2 2016-09-06 15:46:21 +00:00
Yaowu Xu
f87b9021f1 Fix a compiler warning of unused variable
Change-Id: I4a2faa32cc0847fe14dd8f40156163f4713055ca
2016-09-06 14:52:49 +00:00
Yaowu Xu
037845507d Avoid re-use same temp variables
In highbd_quantize_intrin_sse2.c.

Change-Id: Iaf6360e456f1fb2f8ff06461afbfecfc0103dda3
2016-09-06 14:52:19 +00:00
Yaowu Xu
34b0ee61b2 quantize.c: int->uint32_t for absolute values
Change-Id: I784f32e0e86d873655e46cf68c5c124a698af361
2016-09-06 14:51:47 +00:00
Yaowu Xu
1f9356a536 aom_dsp: AV1_IADST8x16_1D to AOM_IADST8x16_1D
Change-Id: Iba415ab2d4adb3350b4747a58f69db7d02bbab68
2016-09-06 14:51:32 +00:00
Debargha Mukherjee
2f12340ff0 Enable rectangular transforms for UV
Uses an array to map block sizes, y tx sizes, and subsampling
factors to various transform sizes for UV.

Results improve by 0.1-0.2%

Change-Id: Icb58fd96bc7c01a72cbf1332fe2be4d55a0feedc
2016-09-05 15:06:19 -07:00
Sarah Parker
f97b7860d5 Fix formatting in internal stats for vp10
This corrects a formatting error introduced in:
I1e9d548ce445d29002f0c59ebfd3957a6f15e702
where spaces were used as delimiters instead of tabs.

The corresponding fixes for vp9 and vp8 are in
Ibc4eb8fd82e6b926ba259a679dc98557cadba9b1.

Change-Id: Ica3d625d6672b3c47e0e208b45eede29b9004030
2016-09-03 12:02:01 -07:00
Yaowu Xu
01bade1064 Removed tests and data not in use
Change-Id: If688da3089ad33f18751fa2f8c46b6f5dc708bd2
2016-09-03 00:06:09 +00:00
Urvang Joshi
51dcf564b8 Merge "test_intra_pred_speed.cc : Fix visual studio build." into nextgenv2 2016-09-02 23:12:34 +00:00
Yaowu Xu
ecee7f29d0 Merge "Change to use AOM copyright notice" into nextgenv2 2016-09-02 22:13:24 +00:00
Urvang Joshi
31744ec4f2 test_intra_pred_speed.cc : Fix visual studio build.
Visual studio doesn't like nested macros, apparently. This patch should
fix it.

Change-Id: Ifa56fae5be0b3dfd3fecd88a8a443e39135f96ab
2016-09-02 15:11:59 -07:00
Yaowu Xu
2ab7ff05f1 Change to use AOM copyright notice
Change-Id: I2b2b70e756b7eb9611b7b33b7d5f19b3b30e0a50
2016-09-02 19:52:03 +00:00
Yunqing Wang
99c6637dfa Merge "Remove unused buffer allocation functions" into nextgenv2 2016-09-02 17:52:48 +00:00
Yaowu Xu
0efe92f177 Merge "aomcx_set_ref -> aom_cx_set_ref" into nextgenv2 2016-09-02 17:41:44 +00:00
Yaowu Xu
0764955002 Merge "Change to use aom copyright notice" into nextgenv2 2016-09-02 17:41:21 +00:00
Angie Chiang
0bfe491967 Merge "Add frame info in bitstream debug tool" into nextgenv2 2016-09-02 17:05:48 +00:00
Yunqing Wang
8aa228019c Remove unused buffer allocation functions
Removed unused buffer allocation functions.

Change-Id: Ib779dde9ad6a511d88b7f7cba2604902eff7aa05
2016-09-02 09:23:53 -07:00
Yaowu Xu
890c4f2497 aomcx_set_ref -> aom_cx_set_ref
Change-Id: I60dd645451d6d65465f099a16ac855fb0b5a57a9
2016-09-02 08:54:47 -07:00
Yaowu Xu
9c01aa1b0c Change to use aom copyright notice
This minimizes code differences between AOM master and nextgenv2

Change-Id: If144865bdf3ef0818e7aac11018b9e786444c550
2016-09-02 08:22:07 -07:00
Geza Lore
a1ddae59eb Minor transform code cleanup
- Localize static lookup tables in the sole functions that use them.
- Remove dead high bit-depth IDST functions.
- Apply clang-format

Change-Id: Ibbd7db4259f9ea64d695b2f13f5c118aac8f1cf9
2016-09-02 09:58:09 +01:00
Debargha Mukherjee
a782a3b68f Merge "Some cleanups for unnecessary macros" into nextgenv2 2016-09-02 08:37:36 +00:00
Sarah Parker
e529986568 Add global motion experiment to rdopt
This patch completes the global motion experiment
implementation. It modifies the format of the motion
parameters to use the mv union to facilitate faster
copying and checks for parameters equal to 0 that occur
frequently in rdopt. The rd decisions for the global motion experiment
have also been added to rdopt.
Change-Id: Idfb9f0c6d23e538221763881099c5a2a3891f5a9
2016-09-01 19:51:11 -07:00
Yaowu Xu
9c323bc272 Port two daala_dering changes from AOMedia
03394bd Remove dead code from av1_dering_search.
337b23a Changing the weights of the first CRF filter in deringing

Change-Id: I1216c146dc3f72f24ceec3d3c65c4dd6cd73623e
2016-09-02 00:39:52 +00:00
Yaowu Xu
3b95d59a1b rename two mk files to make naming consistent
av1cx.mk -> av1_cx.mk
av1dx.mk -> av1_dx.mk

Change-Id: I698bd65b933c433066d5dfeb94cee680095508e4
2016-09-02 00:39:32 +00:00
Yaowu Xu
14292bbb10 Merge "Add explicit conversion from int64_t to int" into nextgenv2 2016-09-02 00:39:19 +00:00
Angie Chiang
cb9a9ebd81 Add frame info in bitstream debug tool
Change-Id: Iead3edd8563d7900481eb199e8b003d2d3df075b
2016-09-01 16:24:49 -07:00
Yaowu Xu
9702fcbb16 Add explicit conversion from int64_t to int
The values after right shifts should fit into a 32-bit int. The commit
fixes an MSVC build warning when new-quant is enabled.

Change-Id: Ic89dd86fb981a1206653943658af2b6b2925a676
2016-09-01 22:33:56 +00:00
Yaowu Xu
c8b2fd8022 Merge ".gitignore: correct entries from vpx to aom" into nextgenv2 2016-09-01 22:33:19 +00:00
Urvang Joshi
0d515b29b1 Merge "Add ALT_INTRA experiment." into nextgenv2 2016-09-01 21:45:32 +00:00
Yaowu Xu
75c9abd28f .gitignore: correct entries from vpx to aom
Change-Id: I8af6a9723c31c0f868e9bd75dcc079413a3700c4
2016-09-01 13:57:13 -07:00
Urvang Joshi
340593e530 Add ALT_INTRA experiment.
When the experiment is ON, we use Paeth predictor instead of TM
predictor.

For derf set, this gives about 0.09% improvement overall, and 0.55%
improvement if all frames are forced to be intra-only.

Also, if the EXT_INTRA experiment is also on, the improvement overall
is 0.056%, and improvement if all frames are forced to be intra-only is
0.465%.

Change-Id: Id74e107ede70a8d2107fa14fcb3f44b23a437274
2016-09-01 12:03:20 -07:00
Yaowu Xu
f7ae12d7fd add an explicit conversion from size_t to int
Function ans_read_int() takes an int parameter; this commit uses an
explicit conversion to avoid an MSVC build warning.

Change-Id: Ia405e1d5a86c0f42932fa1da29417ccbf2dd58e7
2016-09-01 08:59:46 -07:00
Yaowu Xu
958303c4c6 Replace inline with INLINE
This fixes msvc build errors.

Change-Id: I1344685e891db61ba569d818e0f2167b2978c299
2016-09-01 08:45:22 -07:00
Debargha Mukherjee
3b52b3ac27 Some cleanups for unnecessary macros
Remove some macros that are no longer necessary for experimentation.

Change-Id: I959bf441c8333607df4aa1ee18841f189ade8112
2016-09-01 00:30:32 -07:00
Yaowu Xu
f883b42cab Port renaming changes from AOMedia
Cherry-Picked the following commits:
0defd8f Changed "WebM" to "AOMedia" & "webm" to "aomedia"
54e6676 Replace "VPx" by "AVx"
5082a36 Change "Vpx" to "Avx"
7df44f1 Replace "Vp9" w/ "Av1"
967f722 Remove kVp9CodecId
828f30c Change "Vp8" to "AOM"
030b5ff AUTHORS regenerated
2524cae Add ref-mv experimental flag
016762b Change copyright notice to AOMedia form
81e5526 Replace vp9 w/ av1
9b94565 Add missing files
fa8ca9f Change "vp9" to "av1"
ec838b7 Convert "vp8" to "aom"
80edfa0 Change "VP9" to "AV1"
d1a11fb Change "vp8" to "aom"
7b58251 Point to WebM test data
dd1a5c8 Replace "VP8" with "AOM"
ff00fc0 Change "VPX" to "AOM"
01dee0b Change "vp10" to "av1" in source code
cebe6f0 Convert "vpx" to "aom"
17b0567 rename vp10*.mk to av1_*.mk
fe5f8a8 rename files vp10_* to av1_*

Change-Id: I6fc3d18eb11fc171e46140c836ad5339cf6c9419
2016-08-31 18:19:03 -07:00
Yaowu Xu
c27fc14b02 Port folder renaming changes from AOM
Manually cherry-picked commits:
ceef058 libvpx->libaom part2
3d26d91 libvpx -> libaom
cfea7dd vp10/ -> av1/
3a8eff7 Fix a build issue for a test
bf4202e Rename vpx to aom

Change-Id: I1b0eb5a40796e3aaf41c58984b4229a439a597dc
2016-08-31 17:26:24 -07:00
Yunqing Wang
b1fb998c46 Merge "Change buffer_alloc_sz and frame_size type to size_t" into nextgenv2 2016-08-31 23:56:02 +00:00
Yunqing Wang
a722a114d6 Change buffer_alloc_sz and frame_size type to size_t
1. Changed buffer_alloc_sz and frame_size type to size_t.
2. Added a TODO for video resolution limits. On 32 bit systems, the maximum
resolution supported in the encoder is 4k(3840x2160). The malloc() would
fail if encoding >4k video on a 32 bit system.

Change-Id: Ibd91b28fd63d1b04e8ac9a5270a17629f239188a
2016-08-31 14:56:21 -07:00
Yunqing Wang
0de5d5d221 Merge "Fix Visual Studio build warnings" into nextgenv2 2016-08-31 18:50:07 +00:00
Yunqing Wang
53db7d0caa Merge "Remove unused buffer allocation functions" into nextgenv2 2016-08-31 18:49:29 +00:00
Zoe Liu
03a11f6ceb Fix a bug in calculating the compound ref frame cost
The previous ext-refs experiment did not account for the cost of the 2nd
reference frame when making mode decisions in compound mode. Measured
with Overall PSNR against the baseline, the results before and after
the fix are:

"ext-refs" before fix: lowres -5.665%  midres: -4.833%
"ext-refs" after fix:  lowres -5.776%  midres: -5.000%
Improvement by the fix: lowres -0.111%  midres: -0.167%

Change-Id: I2eceedf2d4046b169514e049fd01baaf0bbb50c6
2016-08-31 09:43:31 -07:00
Zoe Liu
2033078a18 Merge "Fix a bug in deciding ref frame context in ext-refs" into nextgenv2 2016-08-31 16:42:43 +00:00
Wei-Ting
966e609e95 Make an unmeaningful change to be added into the list
Change-Id: I578589a955bd3f3d7ff61723b574361661453f67
2016-08-30 15:42:32 -07:00
Zoe Liu
27af52300e Fix a bug in deciding ref frame context in ext-refs
Change-Id: Ie58b98baa870c5d2a5b7193f8fe4f84fd7ec6c16
2016-08-30 10:20:04 -07:00
Yunqing Wang
ed07056f1a Fix Visual Studio build warnings
Fixed a list of VS warnings. Warning message:
..\test\vp10_convolve_test.cc(34): warning C4244: 'initializing' : conversion
from 'ptrdiff_t' to 'int', possible loss of data

Change-Id: I9a1d3978a79fbb7b1ac028c5713ac72b6ff99172
2016-08-30 09:40:24 -07:00
Debargha Mukherjee
df73dd0dc3 Merge "clpf experiment build fix" into nextgenv2 2016-08-30 05:45:33 +00:00
Sarah Parker
4dc0f1b186 Implement global motion parameter computation
This computes global motion parameters between 2 frames by
matching corresponding points using FAST features and then
fitting a model using RANSAC.

Change-Id: Ib6664df44090e8cfa4db9f2f9e0556931ccfe5c8
2016-08-29 16:59:43 -07:00
Yunqing Wang
e9947a8d2d Remove unused buffer allocation functions
Removed unused buffer allocation functions.

Change-Id: I5aa265a7698a2d2df736ddb89c6d93a5ee34895b
2016-08-29 15:02:09 -07:00
Debargha Mukherjee
162f5f792b Merge "Tiling in loop restoration + cosmetics" into nextgenv2 2016-08-29 20:46:13 +00:00
Debargha Mukherjee
100846a8ac clpf experiment build fix
Change-Id: I729e14916ecb58b5a75756078ab96a2d340bc0d6
2016-08-29 12:28:00 -07:00
Aamir Anis
e40e6e576a Tiling in loop restoration + cosmetics
Frame can be split into rectangular tiles for application of separate
bilateral or Wiener filters per tile. Some variable names changed for
better readability.

Change-Id: I13ebc4d0b0baf368e524db5ce276f03ed76af9c8
2016-08-29 11:24:11 -07:00
Debargha Mukherjee
8ee5ab9f13 Fix for supertx with rect-tx
Change-Id: I0cc3523a8992f889f8dd203449ceb55f2a422324
2016-08-29 11:16:17 -07:00
Pascal Massimino
04ed7ad57b fix uint32_t <-> size_t mismatch in tests
Change-Id: Ifde4d57957219560e01ebd1657f1c0721f041054
2016-08-29 09:09:09 +02:00
James Zern
f7a865859b Merge "vp10_alloc_context_buffers: clear cm->mi* on failure" into nextgenv2 2016-08-27 16:43:53 +00:00
Jingning Han
003dff6962 Merge "Fix VS build warnings in blend_a64_mask_test.cc" into nextgenv2 2016-08-26 21:47:39 +00:00
Hui Su
976a9b9304 Merge "Remove unnecessary buffer pointers in PICK_MODE_CONTEXT" into nextgenv2 2016-08-26 21:38:14 +00:00
Jingning Han
91ae5d954a Fix VS build warnings in blend_a64_mask_test.cc
Change-Id: Id4c764198549a60d98e5c4a74083972b97da5b81
2016-08-26 11:25:08 -07:00
Debargha Mukherjee
8b7e4dbaf4 Fix compile error in dering
Change-Id: I56890c813de1b366e4ef482d9fc6da81034636ab
2016-08-26 10:48:16 -07:00
Geza Lore
d21982c80f Use rectangular transforms for >= 8x8 blocks
For rectangular blocks between 8x8 and 32x32, we can now code the
transform size as one bigger than the largest square that fits in
the block (e.g., for 16x8, we can code a transform size of 16x16
rather than the previous maximum of 8x8). When this oversized
transform is coded in the bitstream, the codec will use the full
size rectangular transform for that block (e.g., the 16x8 transform
in the above example).

Also fixes a scaling bug in 16x8/8x16 transforms.

Change-Id: I62ce75f1b01c46fe2fbc727ce4abef695f4fcd43
2016-08-25 17:31:51 -07:00
Sarah Parker
a97fd6c43e Merge "Update VP9_PROB_COST_SHIFT to VP10_PROB_COST_SHIFT" into nextgenv2 2016-08-25 18:31:22 +00:00
Wei-ting Lin
4c7e1cd973 Separate EXT_ARFs' frame context index
This commit separates the frame context index of EXT_ARFs from that of
other frame types in the ext-refs setting.

It improves the average RD performance by

0.206% in the lowres, and
0.173% in the midres.

The overall gains for the ext-refs compared to the baseline are

5.665% in the lowres, and
4.883% in the midres.

Change-Id: I6591ad29120880c1aef0bd0b7cf15238c3f3b8f3
2016-08-25 09:31:00 -07:00
Yunqing Wang
167a4efbb5 Merge "Fix motion vector out of range bugs" into nextgenv2 2016-08-25 15:29:20 +00:00
Sarah Parker
6632915485 Update VP9_PROB_COST_SHIFT to VP10_PROB_COST_SHIFT
Change-Id: Ie1416569e73e66518cdb2765d79a2fb3dd570489
2016-08-24 17:25:00 -07:00
Yunqing Wang
90e12eaecb Fix motion vector out of range bugs
2 bugs were fixed in VP9.
https://chromium-review.googlesource.com/#/c/366873/
https://chromium-review.googlesource.com/#/c/368440/
Fixed them in VP10 as well.

Change-Id: I2e53fabc6131ff80ba6dcfd4c73eb76c59b4c474
2016-08-24 17:11:14 -07:00
Urvang Joshi
c691864423 Merge "gitignore: add some entries" into nextgenv2 2016-08-24 22:12:03 +00:00
Urvang Joshi
4b3f980828 Merge "test_intra_pred_speed fix: use dspr2 version when HAVE_DSPR2" into nextgenv2 2016-08-24 22:10:01 +00:00
hui su
71c625d758 Remove unnecessary buffer pointers in PICK_MODE_CONTEXT
Change-Id: I600af6a66dc0e1310c8bfc7c16efa8a82e90856d
2016-08-24 14:18:56 -07:00
Yue Chen
35d4524b5b Merge "Make rectangular txfm in EXT_TX work with VAR_TX" into nextgenv2 2016-08-23 23:54:40 +00:00
Urvang Joshi
f1906e966a Palette code: remove the use of same if condition twice.
rd_pick_palette_intra_sby() is called only when
cpi->common.allow_screen_content_tools is on, so there is no need to
check that again. We just use an assert() instead to still be safe.

Change-Id: I19785c2aac016798c8d331bbe91971b3806b73a8
2016-08-23 15:01:41 -07:00
Urvang Joshi
7e5aa9e7a5 Merge "Rename CONFIG_VPX_HIGHBITDEPTH -> CONFIG_VP9_HIGHBITDEPTH" into nextgenv2 2016-08-23 19:33:53 +00:00
Yue Chen
e57b1a5ea5 Make rectangular txfm in EXT_TX work with VAR_TX
Adapt rectangular txfm experiment to syntax/tokenization/loopfilter
framework of VAR_TX

Change-Id: Idcb005ecf5b3712de3e1cccb0d811ca16d87af24
2016-08-23 12:11:23 -07:00
Urvang Joshi
e4e63b63c0 Rename CONFIG_VPX_HIGHBITDEPTH -> CONFIG_VP9_HIGHBITDEPTH
"vpx-highbitdepth" config doesn't exist.

Change-Id: Ib6d3691454299bb381ecc75b80657fbebf9f59b2
2016-08-23 12:04:18 -07:00
Urvang Joshi
3bcf3f07ac test_intra_pred_speed fix: use dspr2 version when HAVE_DSPR2
Change-Id: Ie7c78e19e077516615c71669022f505f8b3c80ca
2016-08-23 11:29:44 -07:00
Urvang Joshi
3ea2c234fa gitignore: add some entries
Change-Id: I65507c3d132b2b3ba90cf0a7b1c729da7d3de15f
2016-08-23 11:19:17 -07:00
Wei-ting Lin
7fed5044ca Merge "Allow LF_UPDATE type of frames to use BWDREF" into nextgenv2 2016-08-23 18:06:56 +00:00
Debargha Mukherjee
49b85d3965 Missing fixes for rect-tx
Reintroducing some fixes that were dropped inadvertently in
the course of rebasing.

Change-Id: I5f51160c586010590d4bfd5cf225fb21347b0a40
2016-08-23 07:12:51 -07:00
Yaowu Xu
9a89ec5447 Merge "Make type conversion explicit" into nextgenv2 2016-08-23 01:28:05 +00:00
Debargha Mukherjee
ccbefec3d8 Merge "Various rect-tx fixes" into nextgenv2 2016-08-23 01:00:39 +00:00
Yaowu Xu
04fe3499a4 Make type conversion explicit
This fixes two MSVC compiler warnings.

Change-Id: I55ad8833676e20c2c4a55885b99a7a9293d9623f
2016-08-23 00:01:00 +00:00
Yaowu Xu
88849e1395 Merge "Apply clang-format" into nextgenv2 2016-08-23 00:00:48 +00:00
Wei-ting Lin
4e8acca925 Allow LF_UPDATE type of frames to use BWDREF
Originally, only bi-pred type of frames can use BWDREF. When
extra alt-refs are inserted in a gf group, the closest alt-ref
serves as ALTREF for the frames within the corresponding
subgroup. Therefore, the original alt-ref can be used as BWDREF
for the LF_UPDATE type of frames.

This patch further swaps the virtual indices of BWDREF and ALTREF
for those frames whose BWDREF is farther than ALTREF. As a result,
the BWDREF is always the closest backward reference frame, and the
ALTREF is the farther one.

It improves the average RD performance by

0.132% in lowres, and
0.030% in midres.

The overall gains for the ext-refs compared to the baseline are

5.486% in lowres, and
4.666% in midres.

Change-Id: I22e4e5f378f19c4c89196a0a5e9214adb46c3428
2016-08-22 17:00:41 -07:00
Yaowu Xu
c3cc46d8c2 Apply clang-format
Change-Id: Ie283af5f30324f54b4f749becdb48f937584707d
2016-08-22 16:22:10 -07:00
Debargha Mukherjee
44026851c3 Various rect-tx fixes
Change-Id: I02f44713b99284092ecfc50ce7ab268e91d2c6f8
2016-08-22 14:18:40 -07:00
Sarah Parker
3464aff41f Add integerize function back in warped_motion.c
This function was previously unused and removed in
I6bc740e778658d6f81ca54888fc6fa822d3b5ee0. I am adding it back in
with previously suggested fixes.

Change-Id: Iee0afb39170d25895b11d07e71843eae6913efd1
2016-08-22 12:29:26 -07:00
Urvang Joshi
3c7aa7ce2d Merge "Palette: count Y colors only for screen content." into nextgenv2 2016-08-19 23:18:39 +00:00
Urvang Joshi
28ca8554c5 Merge "Handle centroid rounding inside palette.c itself." into nextgenv2 2016-08-19 22:22:52 +00:00
Wei-ting Lin
7417932401 Merge "Insert extra ARFs' in a gf group" into nextgenv2 2016-08-19 22:10:41 +00:00
James Zern
7b2537b5e9 Merge "Fix compiler warnings in rdopt when warped motion is enabled" into nextgenv2 2016-08-19 21:39:59 +00:00
Urvang Joshi
d68c7b6d6d Palette: count Y colors only for screen content.
Change-Id: Id4e12708598100df54bdfcf8cdb248161ab6ef88
2016-08-19 13:02:02 -07:00
Urvang Joshi
f746c103a7 Handle centroid rounding inside palette.c itself.
Mostly refactoring, but a very tiny functional change:
Do all rounding in calc_centroids() itself, instead of rounding in two
places inside palette.c

This gives a slight performance improvement for screen content:
0.078% on average.

Change-Id: I7a0e007d30ebf4e59839483a167123f31a222dd4
2016-08-19 12:23:41 -07:00
Sarah Parker
984f073b8a Fix compiler warnings in rdopt when warped motion is enabled
The previous code was giving:
 unused variable ‘tmp_rate’ [-Wunused-variable]
 unused variable ‘tmp_dist’ [-Wunused-variable]
 ‘rate2_nocoeff’ may be used uninitialized in this function [-Wmaybe-uninitialized]

Change-Id: I26326d0e5ffc141ad548654356a877cd3627cea6
2016-08-19 11:10:44 -07:00
James Zern
e0ab852f0b vp10_alloc_context_buffers: clear cm->mi* on failure
based on:
8b4c315 vp9_alloc_context_buffers: clear cm->mi* on failure

Change-Id: I3438a052721b960ff178cb647780f11bc33571fe
2016-08-19 10:39:46 -07:00
Alex Converse
32c92c97ea Merge "Don't send segment probability updates when the map isn't updated." into nextgenv2 2016-08-19 16:52:11 +00:00
James Zern
b360168783 Merge "apply clang-format" into nextgenv2 2016-08-19 07:31:50 +00:00
Wei-ting Lin
41d5d52d78 Insert extra ARFs' in a gf group
Insert multiple ARFs in a gf group to emulate a multi-layer backward
reference frame structure. At most two extra ARFs are inserted
in a gf group.

It improves the RD performance by 0.317% in Avg in lowres dataset.

Change-Id: I62c32e1b0f25b978484dd113b319bebcd959bf60
2016-08-18 18:21:13 -07:00
Sarah Parker
daa4ba8d19 Disable global motion experiment when incompatible experiments are enabled
This is temporary until the global motion experiment is made to work
with ext_inter and dual_filter.

Change-Id: I73624ca6f536fd98218d7e07bcd7a2c1e6f5aebd
2016-08-18 16:00:38 -07:00
clang-format
21a0c2c9d7 apply clang-format
after:
253c001 Port dering experiment from aom
7208145 Adding 8x16/16x8/32x16/16x32 transforms

Change-Id: Id93e0d7b72a128701d8dec35fc2fac473944d0c1
2016-08-18 15:10:22 -07:00
Alex Converse
fd96aec9c6 Don't send segment probability updates when the map isn't updated.
BUG=webm:1275

Change-Id: I7d4bbaaf2f2146b023e1902fbc535a70e490cf2d
2016-08-18 18:02:01 +00:00
James Zern
0996fc6be3 Merge "fix mips msa build w/CONFIG_EXT_TX" into nextgenv2 2016-08-18 01:44:39 +00:00
Wei-ting Lin
c0235c2c21 Merge "Change the B-frame coding structure." into nextgenv2 2016-08-17 21:15:15 +00:00
Sarah Parker
d4553f5b4d Merge "Switch order of gm parameters for affine model" into nextgenv2 2016-08-17 20:30:52 +00:00
Yi Luo
bfeb90f92a Merge "Delete DCT 64x64 functions to save code size" into nextgenv2 2016-08-17 16:31:28 +00:00
Angie Chiang
688a2ed1f5 Remove __func__
Change-Id: Ibdf1c2d422b9e644eba76fc200c8c10217394036
2016-08-16 18:43:41 -07:00
James Zern
1c25b7f29e fix mips msa build w/CONFIG_EXT_TX
vp10_fht{16x16,8x8,4x4}_msa and the iht were disabled with this config
in:
4ab19ea Fix assertion failures in mips+msa setting

Change-Id: Ic675258b89ca490e8021c887b705c68428925129
2016-08-16 17:30:17 -07:00
Yi Luo
166dd79368 Delete DCT 64x64 functions to save code size
- gcc x86_64 build binary is about 47 KB smaller.

Change-Id: I9e5f41fc9c5c75aec453f8b8567e228a6a6cd71d
2016-08-16 17:16:05 -07:00
Sarah Parker
bec4fbe4be Switch order of gm parameters for affine model
This was originally subtracting 1 from the wrong element in the
parameter set.

Change-Id: I790aafc505f7a8fe7bb00d7d6c62549487a0980f
2016-08-16 15:06:31 -07:00
Wei-ting Lin
b20d0777a8 Change the B-frame coding structure.
Originally we could have a BRF right before an overlay frame (in
display order), which might be unnecessary since we already have a
quality backward reference frame (ARF).
This patch avoids such a coding structure and improves the RD
performance by 0.086% in Avg in the lowres dataset, and 0.153% in
Avg in the midres dataset.

In the lowres dataset, significant gains are obtained for the
following sequences:

mobisode2_240p: 0.563%
keiba_240p: 0.440%
bus_cif: 0.336%
soccer_cif: 0.333%

And the performance drops only in the following four video sequences:

motherdaughter_cif: 0.028%
bqsquare_240p: 0.017%
basketballpass_240p: 0.015%
bowing_cif: 0.006%

Change-Id: Ic94f648ba8e52eb0014933d484fb247610a9ae05
2016-08-16 10:52:24 -07:00
Yaowu Xu
253c001f8f Port dering experiment from aom
Manually cherry-picked:
1579133 Use OD_DIVU for small divisions in temporal_filter.
0312229 Replace divides by small values with multiplies.
9c48eec Removing divisions from od_dir_find8()
0950ed8 Merge "Port active map / cyclic refresh fixes to vp10."
efefdad Port active map / cyclic refresh fixes to vp10.
1eaf748 Port switch to 9-bit rate cost to aom.
0b1606e Only build deringing code when --enable-dering.
e2511e1 Deringing cleanup: don't hardcode the number of levels
8fe5c5d Rename dering_in to od_dering_in to sync with Daala
4eb1380 Makes second filters for 45-degree directions horizontal
7f4c3f5 Removes the superblock variance contribution to the threshold
3dc56f9 Simplifying arithmetic by using multiply+shift
cf2aaba Return 0 explicitly for OD_ILOG(0).
49ca22a Use the Daala implementation of OD_ILOG().
8518724 Fix compiler warning in od_dering.c.
485d6a6 Prevent multiple inclusion of odintrin.h.
51b7a99 Adds the Daala deringing filter as experimental

Note that a few of the changes were already in the libvpx codebase.

Change-Id: I1c32ee7694e5ad22c98b06ff97737cd792cd88ae
2016-08-16 13:47:18 +00:00
Yaowu Xu
0818a7c828 Port commits related to clpf and qm experiments
Manually cherry-picked following commits from AOMedia git repository:
bb2727c Sort includes for "clpf.h"
c297fd0 Add quantisation matrix range parameters.
0527894 Add encoder option and signaling for quant matrix control.
4106232 Turn off trellis coding for quantization matrices.
4017fca Modify tests to allow quantization matrices.
1c122c2 Add quant and dequant functions for new quant matrices.
95a8999 Enable CLPF
f72782b Fix a build issue
73bae50 Add quantisation matrices and selection functions
33208d2 Added support for constrained low pass filter (CLPF)

Change-Id: I60fc1ee1ac40e6b9d1d00affd97547ee5d5dd6be
2016-08-16 13:46:49 +00:00
Sarah Parker
ac917ec262 Merge "Fix dropped const qualifier in new_quant experiment" into nextgenv2 2016-08-16 03:09:54 +00:00
Sarah Parker
28666204ca Fix dropped const qualifier in new_quant experiment
This was causing a compiler warning from -Wcast-qual.

Change-Id: Ie525ffe20be4f38ced68fb0c4141e36400eb0717
2016-08-15 19:26:31 -07:00
James Zern
58b3813cda Merge changes from topic 'clang-format' into nextgenv2
* changes:
  remove tools/vpx-style.sh
  README: add a note about clang-format
2016-08-16 02:00:01 +00:00
Sarah Parker
9142e515e2 Merge "Fix precision bug in warped_motion.c" into nextgenv2 2016-08-15 18:51:37 +00:00
Debargha Mukherjee
7208145722 Adding 8x16/16x8/32x16/16x32 transforms
Adds forward, inverse transforms and scan orders.

Change-Id: Iab6994f4b0ef65e660b714d111b79b1c8172d6a8
2016-08-15 10:33:24 -07:00
Sarah Parker
99adc57976 Fix precision bug in warped_motion.c
The projected coordinates in projectPointsTranslation were
being shifted by the incorrect precision.

Change-Id: If6040bea9e5187020d85c6095d85c7ff5786b7f9
2016-08-12 16:44:05 -07:00
James Zern
7dcd4993bb remove tools/vpx-style.sh
update ftfy.sh to use clang-format

Change-Id: I8ac740c5b3842beed2b8878fbe506f381f4c57e4
(cherry picked from commit 958ae5af9c)
2016-08-12 16:41:19 -07:00
James Zern
92ed0c9146 README: add a note about clang-format
Change-Id: I835401e3befffcbc68e7d2bdd2fd556a19948e91
(cherry picked from commit 15f29ef092)
2016-08-12 16:41:19 -07:00
James Zern
814986b84e Merge "webm{dec,enc}.cc,debug_util.c: apply clang-format" into nextgenv2 2016-08-12 23:40:17 +00:00
James Zern
09e3f49854 Merge "vp10/encoder: apply clang-format" into nextgenv2 2016-08-12 23:36:17 +00:00
clang-format
01f4c71719 webm{dec,enc}.cc,debug_util.c: apply clang-format
top-level *.cc were missed in the original change
debug_util.c was checked in with some warnings

Change-Id: I72999bf94d734ffc127bf6f96a8d17f9c313d5a0
2016-08-12 16:23:55 -07:00
James Zern
4efb9771ff Merge "vp10/common: apply clang-format" into nextgenv2 2016-08-12 23:23:04 +00:00
James Zern
ca502bf018 Merge "vp10_fwd_txfm2d_test: use sizeof(var)" into nextgenv2 2016-08-12 23:01:45 +00:00
clang-format
d9f9a34bb1 vp10/encoder: apply clang-format
Change-Id: I58a42ced5b8a4338524434ff3356850b89aa705a
2016-08-12 15:08:05 -07:00
clang-format
7feae8e84e vp10/common: apply clang-format
Change-Id: I01d8241eba3ccaf4d06c00a51df2d17c126f6f9d
2016-08-12 15:07:08 -07:00
James Zern
26777fca7b Merge "vp10/decoder,vp10/*.[hc]: apply clang-format" into nextgenv2 2016-08-12 22:01:48 +00:00
James Zern
245c5a865b vp10_fwd_txfm2d_test: use sizeof(var)
rather than sizeof(type)

Change-Id: I63755e4ca3810bec2d31013bebcc363c5c9f56ed
2016-08-12 14:58:07 -07:00
James Zern
ea74959b7f Merge "test/: apply clang-format" into nextgenv2 2016-08-12 21:57:05 +00:00
James Zern
79fa2f6eba Merge "reconintra_predictors_test: use new[] operator" into nextgenv2 2016-08-12 21:41:32 +00:00
clang-format
3a826f1d3d test/: apply clang-format
Change-Id: I1138fbeff5f63beb5c0de2c357793da12502d453
2016-08-12 12:40:41 -07:00
Yi Luo
4dc5bd7b71 Apply branch prediction on quantize/quantize_skip functions
- On E5-2680, park_joy_1080p, 5 frames, baseline encoding time
  drops by about 0.8-1.0%.
- Credit goes to Erik Niemeyer (erik.a.niemeyer@intel.com).

Change-Id: I69f191d5a4e4b96a5f9ffd8286e484b69d565c01
2016-08-12 12:37:32 -07:00
Yi Luo
712e66dafa reconintra_predictors_test: use new[] operator
fixes mix of malloc & delete[]

Change-Id: I89a1de0614234bf8b3dbe4aacfe71f75f39d08ff
2016-08-12 12:34:23 -07:00
Yi Luo
454fd586b3 Merge "Optimization for HBD filter intra predictors (SSE4.1)" into nextgenv2 2016-08-12 16:20:54 +00:00
clang-format
8a061d421e vp10/decoder,vp10/*.[hc]: apply clang-format
Change-Id: Ie4d7ecb2f692c1c43eff1242e1f00e7fbae00e57
2016-08-11 20:11:16 -07:00
Yi Luo
8e0360a130 Optimization for HBD filter intra predictors (SSE4.1)
- Add unit tests to verify bit-exactness.
- Speed unit test, function improvement: about 8%-23%.
- On E5-2680, park_joy_1080p_12, 25 frames, --kf-max-dist=1
  encoding time improves from <1% to 3.5%

Change-Id: Ic16368885bb253db0200c3a6db143ab1a0b7fc26
2016-08-11 17:34:51 -07:00
James Zern
9c6a7cabd8 Merge "vpx_mem/: apply clang-format" into nextgenv2 2016-08-12 00:18:19 +00:00
Angie Chiang
d2697fce4e Merge "Bitstream debug tool" into nextgenv2 2016-08-11 23:44:08 +00:00
Debargha Mukherjee
c94b635190 Merge "A fix in optimize_b for new-quant" into nextgenv2 2016-08-11 22:09:33 +00:00
clang-format
031d46c941 vpx_mem/: apply clang-format
Change-Id: Ib21077a85ded17823ab62e0b7fdf663ae3dbc05d
2016-08-11 13:02:30 -07:00
Angie Chiang
4de81ee1f1 Bitstream debug tool
This is a debug tool used to detect bitstream errors. On the encoder side, it
pushes each bit and its probability into a queue before the bit is written into
the arithmetic coder. On the decoder side, whenever a bit is read out from the
arithmetic coder, it also pops the reference bit and probability from the queue.
If the two results do not match, the debug tool reports an error.
This tool can be used to pin down a bitstream error precisely. Combined with
gdb's backtrace, it lets us detect which module causes the bitstream error.

Change-Id: I133a4371fafdd48c488f2ca47f9e395676c401f2
2016-08-11 11:16:04 -07:00
clang-format
05ce850890 vpx_ports/: apply clang-format
Change-Id: I9654530a34a3d0691baeca9d62184cd7b9ac3b4c
2016-08-11 10:52:34 -07:00
Debargha Mukherjee
f4112212da A fix in optimize_b for new-quant
Change-Id: I5a7bd3c2d0c7f6cf714367674f1d75510659b54d
2016-08-11 10:01:54 -07:00
Zoe Liu
cdd4eb0291 Fix a bug in RATE_FACTOR_LEVEL definition for ext-refs
There was a bug in the original setup for RATE_FACTOR_LEVELS, which
caused rate_factor_deltas for GF_ARF_STD to be 2.00 instead of the
intended value of 1.75, and for KF_STD to be 0.00 instead of the
intended value of 2.00.

Nevertheless, simply fixing the bug as in the first patch unexpectedly
dropped the RD performance by 0.143% in Avg bitrate using
Overall PSNR, especially for the following sequences in lowres:

bridge_close_cif: dropped by 1.468%
container_cif: dropped by 2.140%
husky_cif: dropped by 0.826%
motherdaughter_cif: dropped by 0.798%
racehorses_240p: dropped by 0.805%
students_cif: dropped by 1.411%

This indicates that we should boost the value for GF_ARF_STD from
1.75 to at least 2.00. After doing so, while still keeping 2.00 for
KF_STD, the new patch achieves a small gain of 0.15% for the baseline,
and a smaller gain of 0.06% for the ext-refs experiment. Most
sequences keep similar RD performance in lowres, except for the
following ones, which obtain a bigger gain:

(1) Baseline:
container_cif: 1.628%
students_cif: 1.015%

(2) ext-refs
tennis_sif: 1.248%

Change-Id: I992f8f6a3e20f1b71ec52a1ddc969af4968b78d5
2016-08-11 09:47:46 -07:00
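
One way a table such as rate_factor_deltas can end up with GF_ARF_STD at 2.00 and KF_STD at 0.00 is an enum that gains a level while the positional initializer list stays the same length. The sketch below only illustrates that failure mode under stated assumptions; the commit message does not show the actual code. Only `GF_ARF_STD`, `KF_STD` and `RATE_FACTOR_LEVELS` come from the message. The other level names, the extra level, and the values 1.00 and 1.50 are hypothetical.

```c
#include <stddef.h>

/* Hypothetical level list; an experiment is assumed to add INTER_LOW. */
typedef enum {
  INTER_NORMAL = 0,
  INTER_LOW, /* assumed extra level, not named in the commit message */
  INTER_HIGH,
  GF_ARF_LOW,
  GF_ARF_STD,
  KF_STD,
  RATE_FACTOR_LEVELS
} RATE_FACTOR_LEVEL;

/* Positional initializers written for a five-level layout. With six
 * levels, every entry after the first is shifted by one position and
 * the last slot is zero-filled. */
static const double shifted_deltas[RATE_FACTOR_LEVELS] = {
  1.00, 1.00, 1.50, 1.75, 2.00
};

/* C99 designated initializers tie each value to its level, so adding
 * a level cannot silently shift the others. The values here are the
 * "intended" ones quoted in the message (1.75 and 2.00); the rest are
 * illustrative. */
static const double keyed_deltas[RATE_FACTOR_LEVELS] = {
  [INTER_NORMAL] = 1.00,
  [INTER_LOW] = 1.00,
  [INTER_HIGH] = 1.00,
  [GF_ARF_LOW] = 1.50,
  [GF_ARF_STD] = 1.75,
  [KF_STD] = 2.00
};
```

Under these assumptions `shifted_deltas[GF_ARF_STD]` reads 2.00 and `shifted_deltas[KF_STD]` reads 0.00, matching the symptoms in the message. Note that the final patch keeps GF_ARF_STD at 2.00 on purpose, for the RD reasons given above.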
Yaowu Xu
445274d962 Merge "vpx_scale/: apply clang-format" into nextgenv2 2016-08-11 14:33:47 +00:00
clang-format
923d155179 vpx_scale/: apply clang-format
Change-Id: I514654a0704512fb44c7eef5dd045a5767df953a
2016-08-10 23:53:14 -07:00
James Zern
db6a1120a9 Merge "vpx_util/: apply clang-format" into nextgenv2 2016-08-11 04:00:23 +00:00
James Zern
45d1294fdf Merge changes from topic 'clang-format' into nextgenv2
* changes:
  vpx_dsp/: apply clang-format
  vpx/: apply clang-format
  top-level: apply clang-format
  examples: apply clang-format
2016-08-11 03:55:53 +00:00
clang-format
3a992f848a vpx_util/: apply clang-format
Change-Id: I831214d16a5bbfdb86e24dbff8afe4ff4aeebdde
2016-08-10 17:15:04 -07:00
Zoe Liu
4e2d26bd17 Code clean on encoder rate controller
Change-Id: Iec29c00e24ac8c4f24d43142db6ae03f1b3945ac
2016-08-10 15:34:01 -07:00
clang-format
1214cee2f7 vpx_dsp/: apply clang-format
Change-Id: Ia3f96910409be4ae8a4907a2f0dee73b1af8f93d
2016-08-10 12:56:41 -07:00
clang-format
83a5207893 vpx/: apply clang-format
Change-Id: I727b41153cc7929a143e5c370623277558b66e80
2016-08-10 12:42:59 -07:00
clang-format
6c4d83ec9e top-level: apply clang-format
Change-Id: Iac1d97d84518649404e32b136b8fdd840723303c
2016-08-10 12:42:52 -07:00
clang-format
397d964f29 examples: apply clang-format
Change-Id: I06903104bf822819fae39e42fdb6e44d3f9d7787
2016-08-10 12:42:44 -07:00
Urvang Joshi
6dde801818 Palette code: Use built-in qsort() method; create remove_dup() method.
Change-Id: Id816413307334336a9f473540cf9aa0e789ea9e9
2016-08-10 12:10:09 -07:00
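
The combination named in the "Palette code" commit title, sorting with the standard `qsort()` and then removing duplicates, can be sketched as below. This is an assumption-laden illustration, not the patch itself: the helper name `sort_and_remove_dup`, its signature, and the use of `int` elements are hypothetical and may differ from the `remove_dup()` added by the commit.

```c
#include <stdlib.h>

/* Comparison callback for qsort(): ascending order, overflow-safe. */
static int compare_int(const void *a, const void *b) {
  const int x = *(const int *)a;
  const int y = *(const int *)b;
  return (x > y) - (x < y);
}

/* Sorts values[0..n-1] in place, compacts the array so that each
 * distinct value appears once, and returns the number of distinct
 * values. Hypothetical helper for illustration. */
static int sort_and_remove_dup(int *values, int n) {
  int i;
  int unique = 1;
  if (n <= 0) return 0;
  qsort(values, (size_t)n, sizeof(values[0]), compare_int);
  for (i = 1; i < n; ++i) {
    /* After sorting, duplicates are adjacent, so comparing with the
     * last kept element is enough. */
    if (values[i] != values[unique - 1]) values[unique++] = values[i];
  }
  return unique;
}
```

Because duplicates become adjacent after sorting, the removal pass is a single linear scan with no extra buffer.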
Debargha Mukherjee
1da3e129ff Merge "Fix for lossless with rect-tx" into nextgenv2 2016-08-10 19:06:47 +00:00
James Zern
4cfb8309f8 Merge changes I619b365d,I579a9328 into nextgenv2
* changes:
  lossless_test: mark tests as Large
  cpu_speed_test: mark speed 0 as Large
2016-08-10 19:06:16 +00:00
Urvang Joshi
a017c372e5 Merge "Palette code cleanup:" into nextgenv2 2016-08-10 19:05:11 +00:00
Yaowu Xu
d67a8feb93 Change to use proper types
block: from int64_t to int as it is a block index.
sse: from unsigned int to int64_t to reduce type conversion. 

Change-Id: Iec8104ff8a3fd3a77d4e451c12918bd869966c2f
2016-08-10 14:27:12 +00:00
Peter de Rivaz
ffbdc51018 Fix for lossless with rect-tx
Change-Id: Ibb1e5d5137c7717bc6a8683ad78d842c3e5f052e
2016-08-10 12:00:55 +00:00
James Zern
239bb16fef lossless_test: mark tests as Large
Change-Id: I619b365d636737da8b1a322bab3be973de53200d
2016-08-09 20:39:44 -07:00
James Zern
b5818b7722 cpu_speed_test: mark speed 0 as Large
TestTuneScreen / TestScreencastQ0 are the worst offenders

Change-Id: I579a93289aa431afbfea8a280ddcb1011ab1a8cf
2016-08-09 20:32:51 -07:00
Yaowu Xu
c57816fb58 Merge "vp10_highbd_quantize_fp: use const consistently" into nextgenv2 2016-08-10 03:13:42 +00:00
Yaowu Xu
f9efcb345a vp10_highbd_quantize_fp: use const consistently
Remove a few extra ones that are consistent with the definitions, this
fixes some MSVC warnings.

Change-Id: I4b26de4cca71f0ac85667bd641c448b44315941b
2016-08-10 03:13:22 +00:00
James Zern
9df7c2544a Merge "remove SVC" into nextgenv2 2016-08-10 03:07:07 +00:00
James Zern
cc73e1fcd4 remove SVC
spatial/temporal scalability are not supported in VP10 currently.
+ remove the unused vp10/encoder/skin_detection.[hc]

this also enables DatarateTestLarge for VP10 which passes with no
experiments enabled. these were removed previously when only the SVC
tests should have been:
134710a Disable tests not applicable to VP10

Change-Id: I9ee7a0dd5ad3d8cc1e8fd5f0a90260fa43da387c
2016-08-09 18:42:20 -07:00
Sarah Parker
b4d9a2caf3 Merge "Add interface to compute gm parameters in encodeframe" into nextgenv2 2016-08-10 00:37:28 +00:00
James Zern
d44472b646 Merge "remove vp8cx_set_ref.c" into nextgenv2 2016-08-10 00:03:54 +00:00
Sarah Parker
d616a5cee4 Add interface to compute gm parameters in encodeframe
This patch just creates the interface for global motion computation
and calls it from encodeframe. Currently, the function
compute_global_motion_feature_based is empty and the work to do
the actual parameter calculation will be added in a future patch.

Change-Id: Ife142742140079e1c1743b66f180aeb2ecea29ae
2016-08-09 16:00:59 -07:00
Wei-ting Lin
ffdd988427 Merge "Fix a bug for multi_arf_allowed" into nextgenv2 2016-08-09 21:09:57 +00:00
Urvang Joshi
d000020840 Palette code cleanup:
- Avoid some memcpy()s
- Remove indices array
- Make pre_indices array local
- Avoid rounding twice
- Other small simplifications

Change-Id: Iac3236daaad04f21f54054cdd9504de13b942a07
2016-08-09 11:53:34 -07:00
James Zern
2c14c539b3 remove vp8cx_set_ref.c
and the related tests. vpxcx_set_ref is the binary to use for vp10.

Change-Id: I4c4ce7b36b165e6d06b87fd6b53923a1c11e4e6c
2016-08-08 17:14:04 -07:00
James Zern
b869de9856 Merge "configure: test for -Wfloat-conversion" into nextgenv2 2016-08-08 21:48:14 +00:00
Yi Luo
dd2edd0ad5 Merge "Optimization EXT_INTRA's filtered intra predictor (SSE4.1)" into nextgenv2 2016-08-08 20:55:44 +00:00
Sarah Parker
b659281eec Add reconstruction using gm parameters
This patch only includes inter frame reconstruction using gm
parameters when GLOBAL_MOTION and/or VP9_HIGHBITDEPTH are enabled.
GM is not currently used when EXT_INTER or DUAL_FILTER is enabled.
This will be added in a followup patch. For now, these experiments
will take precedence over GLOBAL_MOTION when they are all enabled.

Change-Id: I930ddda529c44d7245dbb56db3c9c5524cf45473
2016-08-08 10:17:05 -07:00
Yi Luo
57c4711b5c Optimization EXT_INTRA's filtered intra predictor (SSE4.1)
- Add unit tests to verify the bit-exact result.
- In speed test, function speed (for each mode/tx_size)
  improves about 23%~35%.
- On E5-2680, park_joy_1080p, 10 frames, --kf-max-dist=1,
  encoding time improves about 1%~2%.

Change-Id: Id89f313d44eea562c02e775a6253dc4df7e046a9
2016-08-08 10:02:36 -07:00
Yue Chen
292ea74fe4 Merge "Speed filter intra mode search in EXT_INTRA experiment" into nextgenv2 2016-08-06 00:17:33 +00:00
James Zern
a9d984830a configure: test for -Wfloat-conversion
supported by clang, gcc-4.9+

Change-Id: I893766de7307fef9a8b68c0cfae137c9d3b0dbe8
(cherry picked from commit 889ed5b158)
2016-08-06 00:02:45 +00:00
James Zern
d609b520fa Merge "warped_motion: remove unused vp10_integerize_model" into nextgenv2 2016-08-06 00:02:15 +00:00
Yue Chen
f6a5c27493 Speed filter intra mode search in EXT_INTRA experiment
(1) Key frame: skip filter intra modes whose directional pred
    version is relatively bad (rd >= 1.125 * best_rd)
(2) Inter frame: do not check filter intra modes if best_intra_rd
    >= 1.25 * best_rd

Encoding time overhead is reduced by:
4.9% (9.2%->4.3%, soccer_cif)
Coding gains drop by 0.021% on lowres and by 0.076% on midres

Change-Id: I29b6f7d3d3dc4b362c6d63bc447e6a429ba5dc66
2016-08-05 23:04:46 +00:00
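
The two pruning rules in the commit message above (thresholds of 1.125 and 1.25 times best_rd) can be written as simple predicates. The sketch below is only an illustration: the function names are hypothetical, RD costs are assumed to be non-negative 64-bit integers, and the multiplications are expressed with shifts (1.125 = 1 + 1/8, 1.25 = 1 + 1/4), which truncates and so only approximates the stated factors. The actual patch may compute the thresholds differently.

```c
#include <stdint.h>

/* Key frame rule: skip a filter intra mode when the RD cost of its
 * directional-prediction counterpart is at least 1.125 * best_rd. */
static int skip_filter_intra_on_key_frame(int64_t directional_rd,
                                          int64_t best_rd) {
  return directional_rd >= best_rd + (best_rd >> 3);
}

/* Inter frame rule: do not check filter intra modes at all when the
 * best intra RD cost is at least 1.25 * best_rd. */
static int skip_filter_intra_on_inter_frame(int64_t best_intra_rd,
                                            int64_t best_rd) {
  return best_intra_rd >= best_rd + (best_rd >> 2);
}
```

For example, with best_rd = 1000 the key frame rule skips modes whose directional cost is 1125 or more, and the inter frame rule skips the search when best_intra_rd is 1250 or more.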
Wei-ting Lin
c0e55de06b Fix a bug for multi_arf_allowed
The ARF Index was wrong when updating the upsampled reference
frame buffer.

Compared to the baseline in which multi_arf_allowed is disabled, the
RD performance drops 2.250% in Avg using Overall PSNR in the derf
dataset. The decrease is especially large for the following
video sequences:

foreman_cif: drops 7.489%
husky_cif: drops 6.421%
soccer_cif: drops 4.850%

However, it has a significant gain in the following video sequences:

container_cif: increases 8.043%
harbour_cif: increases 1.332%

Change-Id: I02472909eb34bd070d7544f57383e72559fa42b3
2016-08-05 14:05:50 -07:00
Urvang Joshi
016a5daa59 Palette code: simpler and faster duplicate removal
Change-Id: I0c1baa5ca73c1f067d69239d3e31d1050b4706d2
2016-08-05 12:33:21 -07:00
James Zern
32427b379c warped_motion: remove unused vp10_integerize_model
this function produces implicit double -> int conversion warnings and
has additional style issues.

Change-Id: I6bc740e778658d6f81ca54888fc6fa822d3b5ee0
2016-08-03 15:52:03 -07:00
1076 changed files with 176186 additions and 146950 deletions

@@ -1,10 +1,11 @@
--- ---
Language: Cpp Language: Cpp
# BasedOnStyle: Google # BasedOnStyle: Google
# Generated with clang-format 3.7.1 # Generated with clang-format 3.8.1
AccessModifierOffset: -1 AccessModifierOffset: -1
AlignAfterOpenBracket: true AlignAfterOpenBracket: Align
AlignConsecutiveAssignments: false AlignConsecutiveAssignments: false
AlignConsecutiveDeclarations: false
AlignEscapedNewlinesLeft: true AlignEscapedNewlinesLeft: true
AlignOperands: true AlignOperands: true
AlignTrailingComments: true AlignTrailingComments: true
@@ -15,10 +16,23 @@ AllowShortFunctionsOnASingleLine: All
AllowShortIfStatementsOnASingleLine: true AllowShortIfStatementsOnASingleLine: true
AllowShortLoopsOnASingleLine: true AllowShortLoopsOnASingleLine: true
AlwaysBreakAfterDefinitionReturnType: None AlwaysBreakAfterDefinitionReturnType: None
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: true AlwaysBreakBeforeMultilineStrings: true
AlwaysBreakTemplateDeclarations: true AlwaysBreakTemplateDeclarations: true
BinPackArguments: true BinPackArguments: true
BinPackParameters: true BinPackParameters: true
BraceWrapping:
AfterClass: false
AfterControlStatement: false
AfterEnum: false
AfterFunction: false
AfterNamespace: false
AfterObjCDeclaration: false
AfterStruct: false
AfterUnion: false
BeforeCatch: false
BeforeElse: false
IndentBraces: false
BreakBeforeBinaryOperators: None BreakBeforeBinaryOperators: None
BreakBeforeBraces: Attach BreakBeforeBraces: Attach
BreakBeforeTernaryOperators: true BreakBeforeTernaryOperators: true
@@ -33,6 +47,13 @@ DerivePointerAlignment: false
DisableFormat: false DisableFormat: false
ExperimentalAutoDetectBinPacking: false ExperimentalAutoDetectBinPacking: false
ForEachMacros: [ foreach, Q_FOREACH, BOOST_FOREACH ] ForEachMacros: [ foreach, Q_FOREACH, BOOST_FOREACH ]
IncludeCategories:
- Regex: '^<.*\.h>'
Priority: 1
- Regex: '^<.*'
Priority: 2
- Regex: '.*'
Priority: 3
IndentCaseLabels: true IndentCaseLabels: true
IndentWidth: 2 IndentWidth: 2
IndentWrappedFunctionNames: false IndentWrappedFunctionNames: false
@@ -51,6 +72,8 @@ PenaltyBreakString: 1000
PenaltyExcessCharacter: 1000000 PenaltyExcessCharacter: 1000000
PenaltyReturnTypeOnItsOwnLine: 200 PenaltyReturnTypeOnItsOwnLine: 200
PointerAlignment: Right PointerAlignment: Right
ReflowComments: true
SortIncludes: false
SpaceAfterCStyleCast: false SpaceAfterCStyleCast: false
SpaceBeforeAssignmentOperators: true SpaceBeforeAssignmentOperators: true
SpaceBeforeParens: ControlStatements SpaceBeforeParens: ControlStatements

41
.gitignore vendored

@@ -29,37 +29,36 @@
/examples/decode_with_drops /examples/decode_with_drops
/examples/decode_with_partial_drops /examples/decode_with_partial_drops
/examples/example_xma /examples/example_xma
/examples/lossless_encoder
/examples/postproc /examples/postproc
/examples/resize_util /examples/resize_util
/examples/set_maps /examples/set_maps
/examples/simple_decoder /examples/simple_decoder
/examples/simple_encoder /examples/simple_encoder
/examples/twopass_encoder /examples/twopass_encoder
/examples/vp8_multi_resolution_encoder /examples/aom_cx_set_ref
/examples/vp8cx_set_ref /examples/av1_spatial_scalable_encoder
/examples/vp9_lossless_encoder /examples/aom_temporal_scalable_patterns
/examples/vp9_spatial_scalable_encoder /examples/aom_temporal_svc_encoder
/examples/vpx_temporal_scalable_patterns
/examples/vpx_temporal_svc_encoder
/ivfdec /ivfdec
/ivfdec.dox /ivfdec.dox
/ivfenc /ivfenc
/ivfenc.dox /ivfenc.dox
/libvpx.so* /libaom.so*
/libvpx.ver /libaom.ver
/samples.dox /samples.dox
/test_intra_pred_speed /test_intra_pred_speed
/test_libvpx /test_libaom
/vp8_api1_migration.dox /aom_api1_migration.dox
/vp[89x]_rtcd.h /av1_rtcd.h
/vpx.pc /aom.pc
/vpx_config.c /aom_config.c
/vpx_config.h /aom_config.h
/vpx_dsp_rtcd.h /aom_dsp_rtcd.h
/vpx_scale_rtcd.h /aom_scale_rtcd.h
/vpx_version.h /aom_version.h
/vpxdec /aomdec
/vpxdec.dox /aomdec.dox
/vpxenc /aomenc
/vpxenc.dox /aomenc.dox
TAGS TAGS

16
AUTHORS

@@ -56,13 +56,16 @@ James Zern <jzern@google.com>
Jan Gerber <j@mailb.org> Jan Gerber <j@mailb.org>
Jan Kratochvil <jan.kratochvil@redhat.com> Jan Kratochvil <jan.kratochvil@redhat.com>
Janne Salonen <jsalonen@google.com> Janne Salonen <jsalonen@google.com>
Jean-Marc Valin <jmvalin@jmvalin.ca>
Jeff Faust <jfaust@google.com> Jeff Faust <jfaust@google.com>
Jeff Muizelaar <jmuizelaar@mozilla.com> Jeff Muizelaar <jmuizelaar@mozilla.com>
Jeff Petkau <jpet@chromium.org> Jeff Petkau <jpet@chromium.org>
Jia Jia <jia.jia@linaro.org> Jia Jia <jia.jia@linaro.org>
Jian Zhou <zhoujian@google.com>
Jim Bankoski <jimbankoski@google.com> Jim Bankoski <jimbankoski@google.com>
Jingning Han <jingning@google.com> Jingning Han <jingning@google.com>
Joey Parrish <joeyparrish@google.com> Joey Parrish <joeyparrish@google.com>
Johann Koenig <johannkoenig@chromium.org>
Johann Koenig <johannkoenig@google.com> Johann Koenig <johannkoenig@google.com>
John Koleszar <jkoleszar@google.com> John Koleszar <jkoleszar@google.com>
Johnny Klonaris <google@jawknee.com> Johnny Klonaris <google@jawknee.com>
@@ -89,6 +92,7 @@ Mike Hommey <mhommey@mozilla.com>
Mikhal Shemer <mikhal@google.com> Mikhal Shemer <mikhal@google.com>
Minghai Shang <minghai@google.com> Minghai Shang <minghai@google.com>
Morton Jonuschat <yabawock@gmail.com> Morton Jonuschat <yabawock@gmail.com>
Nathan E. Egge <negge@dgql.org>
Nico Weber <thakis@chromium.org> Nico Weber <thakis@chromium.org>
Parag Salasakar <img.mips1@gmail.com> Parag Salasakar <img.mips1@gmail.com>
Pascal Massimino <pascal.massimino@gmail.com> Pascal Massimino <pascal.massimino@gmail.com>
@@ -97,6 +101,7 @@ Paul Wilkins <paulwilkins@google.com>
Pavol Rusnak <stick@gk2.sk> Pavol Rusnak <stick@gk2.sk>
Paweł Hajdan <phajdan@google.com> Paweł Hajdan <phajdan@google.com>
Pengchong Jin <pengchong@google.com> Pengchong Jin <pengchong@google.com>
Peter de Rivaz <peter.derivaz@argondesign.com>
Peter de Rivaz <peter.derivaz@gmail.com> Peter de Rivaz <peter.derivaz@gmail.com>
Philip Jägenstedt <philipj@opera.com> Philip Jägenstedt <philipj@opera.com>
Priit Laes <plaes@plaes.org> Priit Laes <plaes@plaes.org>
@@ -107,13 +112,16 @@ Rob Bradford <rob@linux.intel.com>
Ronald S. Bultje <rsbultje@gmail.com> Ronald S. Bultje <rsbultje@gmail.com>
Rui Ueyama <ruiu@google.com> Rui Ueyama <ruiu@google.com>
Sami Pietilä <samipietila@google.com> Sami Pietilä <samipietila@google.com>
Sasi Inguva <isasi@google.com>
Scott Graham <scottmg@chromium.org> Scott Graham <scottmg@chromium.org>
Scott LaVarnway <slavarnway@google.com> Scott LaVarnway <slavarnway@google.com>
Sean McGovern <gseanmcg@gmail.com> Sean McGovern <gseanmcg@gmail.com>
Sergey Kolomenkin <kolomenkin@gmail.com>
Sergey Ulanov <sergeyu@chromium.org> Sergey Ulanov <sergeyu@chromium.org>
Shimon Doodkin <helpmepro1@gmail.com> Shimon Doodkin <helpmepro1@gmail.com>
Shunyao Li <shunyaoli@google.com> Shunyao Li <shunyaoli@google.com>
Stefan Holmer <holmer@google.com> Stefan Holmer <holmer@google.com>
Steinar Midtskogen <stemidts@cisco.com>
Suman Sunkara <sunkaras@google.com> Suman Sunkara <sunkaras@google.com>
Taekhyun Kim <takim@nvidia.com> Taekhyun Kim <takim@nvidia.com>
Takanori MATSUURA <t.matsuu@gmail.com> Takanori MATSUURA <t.matsuu@gmail.com>
@@ -121,14 +129,16 @@ Tamar Levy <tamar.levy@intel.com>
Tao Bai <michaelbai@chromium.org> Tao Bai <michaelbai@chromium.org>
Tero Rintaluoma <teror@google.com> Tero Rintaluoma <teror@google.com>
Thijs Vermeir <thijsvermeir@gmail.com> Thijs Vermeir <thijsvermeir@gmail.com>
Thomas Daede <tdaede@mozilla.com>
Thomas Davies <thdavies@cisco.com>
Thomas <thdavies@cisco.com>
Tim Kopp <tkopp@google.com> Tim Kopp <tkopp@google.com>
Timothy B. Terriberry <tterribe@xiph.org> Timothy B. Terriberry <tterribe@xiph.org>
Tom Finegan <tomfinegan@google.com> Tom Finegan <tomfinegan@google.com>
Tristan Matthews <le.businessman@gmail.com>
Tristan Matthews <tmatth@videolan.org>
Vignesh Venkatasubramanian <vigneshv@google.com> Vignesh Venkatasubramanian <vigneshv@google.com>
Yaowu Xu <yaowu@google.com> Yaowu Xu <yaowu@google.com>
Yongzhe Wang <yongzhe@google.com> Yongzhe Wang <yongzhe@google.com>
Yunqing Wang <yunqingwang@google.com> Yunqing Wang <yunqingwang@google.com>
Zoe Liu <zoeliu@google.com> Zoe Liu <zoeliu@google.com>
Google Inc.
The Mozilla Foundation
The Xiph.Org Foundation

@@ -1,7 +1,9 @@
Next Release Next Release
- Incompatible changes: - Incompatible changes:
The VP9 encoder's default keyframe interval changed to 128 from 9999. The AV1 encoder's default keyframe interval changed to 128 from 9999.
2016-04-07 v0.1.0 "AOMedia Codec 1"
This release is the first Alliance for Open Media codec.
2015-11-09 v1.5.0 "Javan Whistling Duck" 2015-11-09 v1.5.0 "Javan Whistling Duck"
This release improves upon the VP9 encoder and speeds up the encoding and This release improves upon the VP9 encoder and speeds up the encoding and
decoding processes. decoding processes.

270
CMakeLists.txt Normal file

@@ -0,0 +1,270 @@
##
## Copyright (c) 2016, Alliance for Open Media. All rights reserved
##
## This source code is subject to the terms of the BSD 2 Clause License and
## the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
## was not distributed with this source code in the LICENSE file, you can
## obtain it at www.aomedia.org/license/software. If the Alliance for Open
## Media Patent License 1.0 was not distributed with this source code in the
## PATENTS file, you can obtain it at www.aomedia.org/license/patent.
##
cmake_minimum_required(VERSION 3.2)
project(AOM C CXX)
set(AOM_ROOT "${CMAKE_CURRENT_SOURCE_DIR}")
set(AOM_CONFIG_DIR "${CMAKE_CURRENT_BINARY_DIR}")
include("${AOM_ROOT}/build/cmake/aom_configure.cmake")
set(AOM_SRCS
"${AOM_CONFIG_DIR}/aom_config.c"
"${AOM_CONFIG_DIR}/aom_config.h"
"${AOM_ROOT}/aom/aom.h"
"${AOM_ROOT}/aom/aom_codec.h"
"${AOM_ROOT}/aom/aom_decoder.h"
"${AOM_ROOT}/aom/aom_encoder.h"
"${AOM_ROOT}/aom/aom_frame_buffer.h"
"${AOM_ROOT}/aom/aom_image.h"
"${AOM_ROOT}/aom/aom_integer.h"
"${AOM_ROOT}/aom/aomcx.h"
"${AOM_ROOT}/aom/aomdx.h"
"${AOM_ROOT}/aom/internal/aom_codec_internal.h"
"${AOM_ROOT}/aom/src/aom_codec.c"
"${AOM_ROOT}/aom/src/aom_decoder.c"
"${AOM_ROOT}/aom/src/aom_encoder.c"
"${AOM_ROOT}/aom/src/aom_image.c")
set(AOM_DSP_SRCS
"${AOM_ROOT}/aom_dsp/aom_convolve.c"
"${AOM_ROOT}/aom_dsp/aom_convolve.h"
"${AOM_ROOT}/aom_dsp/aom_dsp_common.h"
"${AOM_ROOT}/aom_dsp/aom_dsp_rtcd.c"
"${AOM_ROOT}/aom_dsp/aom_filter.h"
"${AOM_ROOT}/aom_dsp/aom_simd.c"
"${AOM_ROOT}/aom_dsp/aom_simd.h"
"${AOM_ROOT}/aom_dsp/aom_simd_inline.h"
"${AOM_ROOT}/aom_dsp/avg.c"
"${AOM_ROOT}/aom_dsp/bitreader.h"
"${AOM_ROOT}/aom_dsp/bitreader_buffer.c"
"${AOM_ROOT}/aom_dsp/bitreader_buffer.h"
"${AOM_ROOT}/aom_dsp/bitwriter.h"
"${AOM_ROOT}/aom_dsp/bitwriter_buffer.c"
"${AOM_ROOT}/aom_dsp/bitwriter_buffer.h"
"${AOM_ROOT}/aom_dsp/blend.h"
"${AOM_ROOT}/aom_dsp/blend_a64_hmask.c"
"${AOM_ROOT}/aom_dsp/blend_a64_mask.c"
"${AOM_ROOT}/aom_dsp/blend_a64_vmask.c"
"${AOM_ROOT}/aom_dsp/dkboolreader.c"
"${AOM_ROOT}/aom_dsp/dkboolreader.h"
"${AOM_ROOT}/aom_dsp/dkboolwriter.c"
"${AOM_ROOT}/aom_dsp/dkboolwriter.h"
"${AOM_ROOT}/aom_dsp/fwd_txfm.c"
"${AOM_ROOT}/aom_dsp/fwd_txfm.h"
"${AOM_ROOT}/aom_dsp/intrapred.c"
"${AOM_ROOT}/aom_dsp/inv_txfm.c"
"${AOM_ROOT}/aom_dsp/inv_txfm.h"
"${AOM_ROOT}/aom_dsp/loopfilter.c"
"${AOM_ROOT}/aom_dsp/prob.c"
"${AOM_ROOT}/aom_dsp/prob.h"
"${AOM_ROOT}/aom_dsp/psnr.c"
"${AOM_ROOT}/aom_dsp/psnr.h"
"${AOM_ROOT}/aom_dsp/quantize.c"
"${AOM_ROOT}/aom_dsp/quantize.h"
"${AOM_ROOT}/aom_dsp/sad.c"
"${AOM_ROOT}/aom_dsp/simd/v128_intrinsics.h"
"${AOM_ROOT}/aom_dsp/simd/v128_intrinsics_c.h"
"${AOM_ROOT}/aom_dsp/simd/v256_intrinsics.h"
"${AOM_ROOT}/aom_dsp/simd/v256_intrinsics_c.h"
"${AOM_ROOT}/aom_dsp/simd/v64_intrinsics.h"
"${AOM_ROOT}/aom_dsp/simd/v64_intrinsics_c.h"
"${AOM_ROOT}/aom_dsp/subtract.c"
"${AOM_ROOT}/aom_dsp/txfm_common.h"
"${AOM_ROOT}/aom_dsp/variance.c"
"${AOM_ROOT}/aom_dsp/variance.h")
set(AOM_MEM_SRCS
"${AOM_ROOT}/aom_mem/aom_mem.c"
"${AOM_ROOT}/aom_mem/aom_mem.h"
"${AOM_ROOT}/aom_mem/include/aom_mem_intrnl.h")
set(AOM_SCALE_SRCS
"${AOM_ROOT}/aom_scale/aom_scale.h"
"${AOM_ROOT}/aom_scale/aom_scale_rtcd.c"
"${AOM_ROOT}/aom_scale/generic/aom_scale.c"
"${AOM_ROOT}/aom_scale/generic/gen_scalers.c"
"${AOM_ROOT}/aom_scale/generic/yv12config.c"
"${AOM_ROOT}/aom_scale/generic/yv12extend.c"
"${AOM_ROOT}/aom_scale/yv12config.h")
# TODO(tomfinegan): Extract aom_ports from aom_util if possible.
set(AOM_UTIL_SRCS
"${AOM_ROOT}/aom_ports/aom_once.h"
"${AOM_ROOT}/aom_ports/aom_timer.h"
"${AOM_ROOT}/aom_ports/bitops.h"
"${AOM_ROOT}/aom_ports/emmintrin_compat.h"
"${AOM_ROOT}/aom_ports/mem.h"
"${AOM_ROOT}/aom_ports/mem_ops.h"
"${AOM_ROOT}/aom_ports/mem_ops_aligned.h"
"${AOM_ROOT}/aom_ports/msvc.h"
"${AOM_ROOT}/aom_ports/system_state.h"
"${AOM_ROOT}/aom_util/aom_thread.c"
"${AOM_ROOT}/aom_util/aom_thread.h"
"${AOM_ROOT}/aom_util/endian_inl.h")
set(AOM_AV1_COMMON_SRCS
"${AOM_ROOT}/av1/av1_iface_common.h"
"${AOM_ROOT}/av1/common/alloccommon.c"
"${AOM_ROOT}/av1/common/alloccommon.h"
"${AOM_ROOT}/av1/common/av1_fwd_txfm.c"
"${AOM_ROOT}/av1/common/av1_fwd_txfm.h"
"${AOM_ROOT}/av1/common/av1_inv_txfm.c"
"${AOM_ROOT}/av1/common/av1_inv_txfm.h"
"${AOM_ROOT}/av1/common/av1_rtcd.c"
"${AOM_ROOT}/av1/common/blockd.c"
"${AOM_ROOT}/av1/common/blockd.h"
"${AOM_ROOT}/av1/common/common.h"
"${AOM_ROOT}/av1/common/common_data.h"
"${AOM_ROOT}/av1/common/convolve.c"
"${AOM_ROOT}/av1/common/convolve.h"
"${AOM_ROOT}/av1/common/debugmodes.c"
"${AOM_ROOT}/av1/common/entropy.c"
"${AOM_ROOT}/av1/common/entropy.h"
"${AOM_ROOT}/av1/common/entropymode.c"
"${AOM_ROOT}/av1/common/entropymode.h"
"${AOM_ROOT}/av1/common/entropymv.c"
"${AOM_ROOT}/av1/common/entropymv.h"
"${AOM_ROOT}/av1/common/enums.h"
"${AOM_ROOT}/av1/common/filter.c"
"${AOM_ROOT}/av1/common/filter.h"
"${AOM_ROOT}/av1/common/frame_buffers.c"
"${AOM_ROOT}/av1/common/frame_buffers.h"
"${AOM_ROOT}/av1/common/idct.c"
"${AOM_ROOT}/av1/common/idct.h"
"${AOM_ROOT}/av1/common/loopfilter.c"
"${AOM_ROOT}/av1/common/loopfilter.h"
"${AOM_ROOT}/av1/common/mv.h"
"${AOM_ROOT}/av1/common/mvref_common.c"
"${AOM_ROOT}/av1/common/mvref_common.h"
"${AOM_ROOT}/av1/common/odintrin.c"
"${AOM_ROOT}/av1/common/odintrin.h"
"${AOM_ROOT}/av1/common/onyxc_int.h"
"${AOM_ROOT}/av1/common/pred_common.c"
"${AOM_ROOT}/av1/common/pred_common.h"
"${AOM_ROOT}/av1/common/quant_common.c"
"${AOM_ROOT}/av1/common/quant_common.h"
"${AOM_ROOT}/av1/common/reconinter.c"
"${AOM_ROOT}/av1/common/reconinter.h"
"${AOM_ROOT}/av1/common/reconintra.c"
"${AOM_ROOT}/av1/common/reconintra.h"
"${AOM_ROOT}/av1/common/scale.c"
"${AOM_ROOT}/av1/common/scale.h"
"${AOM_ROOT}/av1/common/scan.c"
"${AOM_ROOT}/av1/common/scan.h"
"${AOM_ROOT}/av1/common/seg_common.c"
"${AOM_ROOT}/av1/common/seg_common.h"
"${AOM_ROOT}/av1/common/thread_common.c"
"${AOM_ROOT}/av1/common/thread_common.h"
"${AOM_ROOT}/av1/common/tile_common.c"
"${AOM_ROOT}/av1/common/tile_common.h")
set(AOM_AV1_DECODER_SRCS
"${AOM_ROOT}/av1/av1_dx_iface.c"
"${AOM_ROOT}/av1/decoder/decodeframe.c"
"${AOM_ROOT}/av1/decoder/decodeframe.h"
"${AOM_ROOT}/av1/decoder/decodemv.c"
"${AOM_ROOT}/av1/decoder/decodemv.h"
"${AOM_ROOT}/av1/decoder/decoder.c"
"${AOM_ROOT}/av1/decoder/decoder.h"
"${AOM_ROOT}/av1/decoder/detokenize.c"
"${AOM_ROOT}/av1/decoder/detokenize.h"
"${AOM_ROOT}/av1/decoder/dsubexp.c"
"${AOM_ROOT}/av1/decoder/dsubexp.h"
"${AOM_ROOT}/av1/decoder/dthread.c"
"${AOM_ROOT}/av1/decoder/dthread.h")
set(AOM_AV1_ENCODER_SRCS
"${AOM_ROOT}/av1/av1_cx_iface.c"
"${AOM_ROOT}/av1/encoder/aq_complexity.c"
"${AOM_ROOT}/av1/encoder/aq_complexity.h"
"${AOM_ROOT}/av1/encoder/aq_cyclicrefresh.c"
"${AOM_ROOT}/av1/encoder/aq_cyclicrefresh.h"
"${AOM_ROOT}/av1/encoder/aq_variance.c"
"${AOM_ROOT}/av1/encoder/aq_variance.h"
"${AOM_ROOT}/av1/encoder/bitstream.c"
"${AOM_ROOT}/av1/encoder/bitstream.h"
"${AOM_ROOT}/av1/encoder/block.h"
"${AOM_ROOT}/av1/encoder/context_tree.c"
"${AOM_ROOT}/av1/encoder/context_tree.h"
"${AOM_ROOT}/av1/encoder/cost.c"
"${AOM_ROOT}/av1/encoder/cost.h"
"${AOM_ROOT}/av1/encoder/dct.c"
"${AOM_ROOT}/av1/encoder/encodeframe.c"
"${AOM_ROOT}/av1/encoder/encodeframe.h"
"${AOM_ROOT}/av1/encoder/encodemb.c"
"${AOM_ROOT}/av1/encoder/encodemb.h"
"${AOM_ROOT}/av1/encoder/encodemv.c"
"${AOM_ROOT}/av1/encoder/encodemv.h"
"${AOM_ROOT}/av1/encoder/encoder.c"
"${AOM_ROOT}/av1/encoder/encoder.h"
"${AOM_ROOT}/av1/encoder/ethread.c"
"${AOM_ROOT}/av1/encoder/ethread.h"
"${AOM_ROOT}/av1/encoder/extend.c"
"${AOM_ROOT}/av1/encoder/extend.h"
"${AOM_ROOT}/av1/encoder/firstpass.c"
"${AOM_ROOT}/av1/encoder/firstpass.h"
"${AOM_ROOT}/av1/encoder/hybrid_fwd_txfm.c"
"${AOM_ROOT}/av1/encoder/hybrid_fwd_txfm.h"
"${AOM_ROOT}/av1/encoder/lookahead.c"
"${AOM_ROOT}/av1/encoder/lookahead.h"
"${AOM_ROOT}/av1/encoder/mbgraph.c"
"${AOM_ROOT}/av1/encoder/mbgraph.h"
"${AOM_ROOT}/av1/encoder/mcomp.c"
"${AOM_ROOT}/av1/encoder/mcomp.h"
"${AOM_ROOT}/av1/encoder/picklpf.c"
"${AOM_ROOT}/av1/encoder/picklpf.h"
"${AOM_ROOT}/av1/encoder/quantize.c"
"${AOM_ROOT}/av1/encoder/quantize.h"
"${AOM_ROOT}/av1/encoder/ratectrl.c"
"${AOM_ROOT}/av1/encoder/ratectrl.h"
"${AOM_ROOT}/av1/encoder/rd.c"
"${AOM_ROOT}/av1/encoder/rd.h"
"${AOM_ROOT}/av1/encoder/rdopt.c"
"${AOM_ROOT}/av1/encoder/rdopt.h"
"${AOM_ROOT}/av1/encoder/resize.c"
"${AOM_ROOT}/av1/encoder/resize.h"
"${AOM_ROOT}/av1/encoder/segmentation.c"
"${AOM_ROOT}/av1/encoder/segmentation.h"
"${AOM_ROOT}/av1/encoder/speed_features.c"
"${AOM_ROOT}/av1/encoder/speed_features.h"
"${AOM_ROOT}/av1/encoder/subexp.c"
"${AOM_ROOT}/av1/encoder/subexp.h"
"${AOM_ROOT}/av1/encoder/temporal_filter.c"
"${AOM_ROOT}/av1/encoder/temporal_filter.h"
"${AOM_ROOT}/av1/encoder/tokenize.c"
"${AOM_ROOT}/av1/encoder/tokenize.h"
"${AOM_ROOT}/av1/encoder/treewriter.c"
"${AOM_ROOT}/av1/encoder/treewriter.h")
# Targets
add_library(aom_dsp ${AOM_DSP_SRCS})
include_directories(${AOM_ROOT} ${AOM_CONFIG_DIR})
add_library(aom_mem ${AOM_MEM_SRCS})
add_library(aom_scale ${AOM_SCALE_SRCS})
include_directories(${AOM_ROOT} ${AOM_CONFIG_DIR})
add_library(aom_util ${AOM_UTIL_SRCS})
add_library(aom_av1_decoder ${AOM_AV1_DECODER_SRCS})
add_library(aom_av1_encoder ${AOM_AV1_ENCODER_SRCS})
add_library(aom ${AOM_SRCS})
target_link_libraries(aom LINK_PUBLIC
aom_dsp
aom_mem
aom_scale
aom_util
aom_av1_decoder
aom_av1_encoder)
add_executable(simple_decoder examples/simple_decoder.c)
include_directories(${AOM_ROOT})
target_link_libraries(simple_decoder LINK_PUBLIC aom)
add_executable(simple_encoder examples/simple_encoder.c)
include_directories(${AOM_ROOT})
target_link_libraries(simple_encoder LINK_PUBLIC aom)

42
LICENSE

@@ -1,31 +1,27 @@
Copyright (c) 2010, The WebM Project authors. All rights reserved. Copyright (c) 2016, Alliance for Open Media. All rights reserved.
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are modification, are permitted provided that the following conditions
met: are met:
* Redistributions of source code must retain the above copyright 1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer. notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright 2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the the documentation and/or other materials provided with the
distribution. distribution.
* Neither the name of Google, nor the WebM Project, nor the names
of its contributors may be used to endorse or promote products
derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

127
PATENTS

@@ -1,23 +1,108 @@
Additional IP Rights Grant (Patents) Alliance for Open Media Patent License 1.0
------------------------------------
"These implementations" means the copyrightable works that implement the WebM 1. License Terms.
codecs distributed by Google as part of the WebM Project.
1.1. Patent License. Subject to the terms and conditions of this License, each
Licensor, on behalf of itself and successors in interest and assigns,
grants Licensee a non-sublicensable, perpetual, worldwide, non-exclusive,
no-charge, royalty-free, irrevocable (except as expressly stated in this
License) patent license to its Necessary Claims to make, use, sell, offer
for sale, import or distribute any Implementation.
1.2. Conditions.
1.2.1. Availability. As a condition to the grant of rights to Licensee to make,
sell, offer for sale, import or distribute an Implementation under
Section 1.1, Licensee must make its Necessary Claims available under
this License, and must reproduce this License with any Implementation
as follows:
a. For distribution in source code, by including this License in the
root directory of the source code with its Implementation.
b. For distribution in any other form (including binary, object form,
and/or hardware description code (e.g., HDL, RTL, Gate Level Netlist,
GDSII, etc.)), by including this License in the documentation, legal
notices, and/or other written materials provided with the
Implementation.
1.2.2. Additional Conditions. This license is directly from Licensor to
Licensee. Licensee acknowledges as a condition of benefiting from it
that no rights from Licensor are received from suppliers, distributors,
or otherwise in connection with this License.
1.3. Defensive Termination. If any Licensee, its Affiliates, or its agents
initiates patent litigation or files, maintains, or voluntarily
participates in a lawsuit against another entity or any person asserting
that any Implementation infringes Necessary Claims, any patent licenses
granted under this License directly to the Licensee are immediately
terminated as of the date of the initiation of action unless 1) that suit
was in response to a corresponding suit regarding an Implementation first
brought against an initiating entity, or 2) that suit was brought to
enforce the terms of this License (including intervention in a third-party
action by a Licensee).
1.4. Disclaimers. The Reference Implementation and Specification are provided
"AS IS" and without warranty. The entire risk as to implementing or
otherwise using the Reference Implementation or Specification is assumed
by the implementer and user. Licensor expressly disclaims any warranties
(express, implied, or otherwise), including implied warranties of
merchantability, non-infringement, fitness for a particular purpose, or
title, related to the material. IN NO EVENT WILL LICENSOR BE LIABLE TO
ANY OTHER PARTY FOR LOST PROFITS OR ANY FORM OF INDIRECT, SPECIAL,
INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER FROM ANY CAUSES OF
ACTION OF ANY KIND WITH RESPECT TO THIS LICENSE, WHETHER BASED ON BREACH
OF CONTRACT, TORT (INCLUDING NEGLIGENCE), OR OTHERWISE, AND WHETHER OR
NOT THE OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
2. Definitions.
2.1. Affiliate. "Affiliate" means an entity that directly or indirectly
Controls, is Controlled by, or is under common Control of that party.
2.2. Control. "Control" means direct or indirect control of more than 50% of
the voting power to elect directors of that corporation, or for any other
entity, the power to direct management of such entity.
2.3. Decoder. "Decoder" means any decoder that conforms fully with all
non-optional portions of the Specification.
2.4. Encoder. "Encoder" means any encoder that produces a bitstream that can
be decoded by a Decoder only to the extent it produces such a bitstream.
2.5. Final Deliverable. "Final Deliverable" means the final version of a
deliverable approved by the Alliance for Open Media as a Final
Deliverable.
2.6. Implementation. "Implementation" means any implementation, including the
Reference Implementation, that is an Encoder and/or a Decoder. An
Implementation also includes components of an Implementation only to the
extent they are used as part of an Implementation.
2.7. License. "License" means this license.
2.8. Licensee. "Licensee" means any person or entity who exercises patent
rights granted under this License.
2.9. Licensor. "Licensor" means (i) any Licensee that makes, sells, offers
for sale, imports or distributes any Implementation, or (ii) a person
or entity that has a licensing obligation to the Implementation as a
result of its membership and/or participation in the Alliance for Open
Media working group that developed the Specification.
2.10. Necessary Claims. "Necessary Claims" means all claims of patents or
patent applications, (a) that currently or at any time in the future,
are owned or controlled by the Licensor, and (b) (i) would be an
Essential Claim as defined by the W3C Policy as of February 5, 2004
(https://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential)
as if the Specification was a W3C Recommendation; or (ii) are infringed
by the Reference Implementation.
2.11. Reference Implementation. "Reference Implementation" means an Encoder
and/or Decoder released by the Alliance for Open Media as a Final
Deliverable.
2.12. Specification. "Specification" means the specification designated by
the Alliance for Open Media as a Final Deliverable for which this
License was issued.
Google hereby grants to you a perpetual, worldwide, non-exclusive, no-charge,
royalty-free, irrevocable (except as stated in this section) patent license to
make, have made, use, offer to sell, sell, import, transfer, and otherwise
run, modify and propagate the contents of these implementations of WebM, where
such license applies only to those patent claims, both currently owned by
Google and acquired in the future, licensable by Google that are necessarily
infringed by these implementations of WebM. This grant does not include claims
that would be infringed only as a consequence of further modification of these
implementations. If you or your agent or exclusive licensee institute or order
or agree to the institution of patent litigation or any other patent
enforcement activity against any entity (including a cross-claim or
counterclaim in a lawsuit) alleging that any of these implementations of WebM
or any code incorporated within any of these implementations of WebM
constitute direct or contributory patent infringement, or inducement of
patent infringement, then any patent rights granted to you under this License
for these implementations of WebM shall terminate as of the date such
litigation is filed.

29
README

@@ -1,6 +1,6 @@
 README - 23 March 2015
-Welcome to the WebM VP8/VP9 Codec SDK!
+Welcome to the WebM VP8/AV1 Codec SDK!
 COMPILING THE APPLICATIONS/LIBRARIES:
 The build system used is similar to autotools. Building generally consists of
@@ -33,13 +33,13 @@ COMPILING THE APPLICATIONS/LIBRARIES:
 $ mkdir build
 $ cd build
-$ ../libvpx/configure <options>
+$ ../libaom/configure <options>
 $ make
 3. Configuration options
 The 'configure' script supports a number of options. The --help option can be
 used to get a list of supported options:
-$ ../libvpx/configure --help
+$ ../libaom/configure --help
 4. Cross development
 For cross development, the most notable option is the --target option. The
@@ -108,7 +108,7 @@ COMPILING THE APPLICATIONS/LIBRARIES:
 toolchain, the following command could be used (note, POSIX SH syntax, adapt
 to your shell as necessary):
-$ CROSS=mipsel-linux-uclibc- ../libvpx/configure
+$ CROSS=mipsel-linux-uclibc- ../libaom/configure
 In addition, the executables to be invoked can be overridden by specifying the
 environment variables: CC, AR, LD, AS, STRIP, NM. Additional flags can be
@@ -119,13 +119,28 @@ COMPILING THE APPLICATIONS/LIBRARIES:
 This defaults to config.log. This should give a good indication of what went
 wrong. If not, contact us for support.
-VP8/VP9 TEST VECTORS:
+VP8/AV1 TEST VECTORS:
 The test vectors can be downloaded and verified using the build system after
 running configure. To specify an alternate directory the
-LIBVPX_TEST_DATA_PATH environment variable can be used.
+LIBAOM_TEST_DATA_PATH environment variable can be used.
 $ ./configure --enable-unit-tests
-$ LIBVPX_TEST_DATA_PATH=../libvpx-test-data make testdata
+$ LIBAOM_TEST_DATA_PATH=../-test-data make testdata
+CODE STYLE:
+The coding style used by this project is enforced with clang-format using the
+configuration contained in the .clang-format file in the root of the
+repository.
+Before pushing changes for review you can format your code with:
+# Apply clang-format to modified .c, .h and .cc files
+$ clang-format -i --style=file \
+$(git diff --name-only --diff-filter=ACMR '*.[hc]' '*.cc')
+Check the .clang-format file for the version used to generate it if there is
+any difference between your local formatting and the review system.
+See also: http://clang.llvm.org/docs/ClangFormat.html
 SUPPORT
 This library is an open source project supported by its community. Please

160
aom/aom.h Normal file

@@ -0,0 +1,160 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
/*!\defgroup aom AOM
* \ingroup codecs
* AOM is aom's newest video compression algorithm that uses motion
* compensated prediction, Discrete Cosine Transform (DCT) coding of the
* prediction error signal and context dependent entropy coding techniques
* based on arithmetic principles. It features:
* - YUV 4:2:0 image format
* - Macro-block based coding (16x16 luma plus two 8x8 chroma)
* - 1/4 (1/8) pixel accuracy motion compensated prediction
* - 4x4 DCT transform
* - 128 level linear quantizer
* - In loop deblocking filter
* - Context-based entropy coding
*
* @{
*/
/*!\file
* \brief Provides controls common to both the AOM encoder and decoder.
*/
#ifndef AOM_AOM_H_
#define AOM_AOM_H_
#include "./aom_codec.h"
#include "./aom_image.h"
#ifdef __cplusplus
extern "C" {
#endif
/*!\brief Control functions
*
* This set of macros defines the control functions of the AOM interface.
*/
enum aom_com_control_id {
/*!\brief pass in an external frame into decoder to be used as reference frame
*/
AOM_SET_REFERENCE = 1,
AOM_COPY_REFERENCE = 2, /**< get a copy of reference frame from the decoder */
AOM_SET_POSTPROC = 3, /**< set the decoder's post processing settings */
AOM_SET_DBG_COLOR_REF_FRAME =
4, /**< set the reference frames to color for each macroblock */
AOM_SET_DBG_COLOR_MB_MODES = 5, /**< set which macro block modes to color */
AOM_SET_DBG_COLOR_B_MODES = 6, /**< set which blocks modes to color */
AOM_SET_DBG_DISPLAY_MV = 7, /**< set which motion vector modes to draw */
/* TODO(jkoleszar): The encoder incorrectly reuses some of these values (5+)
* for its control ids. These should be migrated to something like the
* AOM_DECODER_CTRL_ID_START range next time we're ready to break the ABI.
*/
AV1_GET_REFERENCE = 128, /**< get a pointer to a reference frame */
AOM_COMMON_CTRL_ID_MAX,
AV1_GET_NEW_FRAME_IMAGE = 192, /**< get a pointer to the new frame */
AOM_DECODER_CTRL_ID_START = 256
};
/*!\brief post process flags
*
* This set of macros defines the AOM decoder post-processing flags.
*/
enum aom_postproc_level {
AOM_NOFILTERING = 0,
AOM_DEBLOCK = 1 << 0,
AOM_DEMACROBLOCK = 1 << 1,
AOM_ADDNOISE = 1 << 2,
AOM_DEBUG_TXT_FRAME_INFO = 1 << 3, /**< print frame information */
AOM_DEBUG_TXT_MBLK_MODES =
1 << 4, /**< print macro block modes over each macro block */
AOM_DEBUG_TXT_DC_DIFF = 1 << 5, /**< print dc diff for each macro block */
AOM_DEBUG_TXT_RATE_INFO = 1 << 6, /**< print video rate info (encoder only) */
AOM_MFQE = 1 << 10
};
/*!\brief post process flags
*
* This defines a structure that describes the post-processing settings. For
* the best objective measure (using the PSNR metric), set post_proc_flag
* to AOM_DEBLOCK and deblocking_level to 1.
*/
typedef struct aom_postproc_cfg {
/*!\brief the types of post processing to be done; should be a combination of
* "aom_postproc_level" flags */
int post_proc_flag;
int deblocking_level; /**< the strength of deblocking, valid range [0, 16] */
int noise_level; /**< the strength of additive noise, valid range [0, 16] */
} aom_postproc_cfg_t;
/*!\brief reference frame type
*
* This set of macros defines the types of AOM reference frames.
*/
typedef enum aom_ref_frame_type {
AOM_LAST_FRAME = 1,
AOM_GOLD_FRAME = 2,
AOM_ALTR_FRAME = 4
} aom_ref_frame_type_t;
/*!\brief reference frame data struct
*
* Define the data struct to access aom reference frames.
*/
typedef struct aom_ref_frame {
aom_ref_frame_type_t frame_type; /**< which reference frame */
aom_image_t img; /**< reference frame data in image format */
} aom_ref_frame_t;
/*!\brief AV1 specific reference frame data struct
*
* Define the data struct to access av1 reference frames.
*/
typedef struct av1_ref_frame {
int idx; /**< frame index to get (input) */
aom_image_t img; /**< img structure to populate (output) */
} av1_ref_frame_t;
/*!\cond */
/*!\brief aom decoder control function parameter type
*
* Defines the data type that each AOM decoder control function requires.
*/
AOM_CTRL_USE_TYPE(AOM_SET_REFERENCE, aom_ref_frame_t *)
#define AOM_CTRL_AOM_SET_REFERENCE
AOM_CTRL_USE_TYPE(AOM_COPY_REFERENCE, aom_ref_frame_t *)
#define AOM_CTRL_AOM_COPY_REFERENCE
AOM_CTRL_USE_TYPE(AOM_SET_POSTPROC, aom_postproc_cfg_t *)
#define AOM_CTRL_AOM_SET_POSTPROC
AOM_CTRL_USE_TYPE(AOM_SET_DBG_COLOR_REF_FRAME, int)
#define AOM_CTRL_AOM_SET_DBG_COLOR_REF_FRAME
AOM_CTRL_USE_TYPE(AOM_SET_DBG_COLOR_MB_MODES, int)
#define AOM_CTRL_AOM_SET_DBG_COLOR_MB_MODES
AOM_CTRL_USE_TYPE(AOM_SET_DBG_COLOR_B_MODES, int)
#define AOM_CTRL_AOM_SET_DBG_COLOR_B_MODES
AOM_CTRL_USE_TYPE(AOM_SET_DBG_DISPLAY_MV, int)
#define AOM_CTRL_AOM_SET_DBG_DISPLAY_MV
AOM_CTRL_USE_TYPE(AV1_GET_REFERENCE, av1_ref_frame_t *)
#define AOM_CTRL_AV1_GET_REFERENCE
AOM_CTRL_USE_TYPE(AV1_GET_NEW_FRAME_IMAGE, aom_image_t *)
#define AOM_CTRL_AV1_GET_NEW_FRAME_IMAGE
/*!\endcond */
/*! @} - end defgroup aom */
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_AOM_H_
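
The post-processing flags and the `aom_postproc_cfg_t` value ranges documented in aom/aom.h can be tried out without linking the library. The following is a minimal sketch under stated assumptions, not code from this change: it copies a subset of `enum aom_postproc_level` and the struct from the header above so that it compiles on its own, and the two helpers, `demo_psnr_postproc_cfg` and `demo_postproc_cfg_in_range`, are hypothetical names introduced only for illustration.

```c
#include <stdbool.h>

/* Subset of enum aom_postproc_level, copied from aom/aom.h so that this
 * sketch builds without the library headers. */
enum aom_postproc_level {
  AOM_NOFILTERING = 0,
  AOM_DEBLOCK = 1 << 0,
  AOM_DEMACROBLOCK = 1 << 1,
  AOM_ADDNOISE = 1 << 2
};

/* Copied from aom/aom.h. */
typedef struct aom_postproc_cfg {
  int post_proc_flag;   /* combination of aom_postproc_level flags */
  int deblocking_level; /* documented valid range [0, 16] */
  int noise_level;      /* documented valid range [0, 16] */
} aom_postproc_cfg_t;

/* Hypothetical helper: builds the settings the header comment recommends
 * for the best PSNR (AOM_DEBLOCK with deblocking_level 1). */
static aom_postproc_cfg_t demo_psnr_postproc_cfg(void) {
  aom_postproc_cfg_t cfg;
  cfg.post_proc_flag = AOM_DEBLOCK;
  cfg.deblocking_level = 1;
  cfg.noise_level = 0;
  return cfg;
}

/* Hypothetical helper: checks both strength fields against the [0, 16]
 * range stated in the struct documentation. */
static bool demo_postproc_cfg_in_range(const aom_postproc_cfg_t *cfg) {
  return cfg->deblocking_level >= 0 && cfg->deblocking_level <= 16 &&
         cfg->noise_level >= 0 && cfg->noise_level <= 16;
}
```

Going by the `AOM_CTRL_USE_TYPE(AOM_SET_POSTPROC, aom_postproc_cfg_t *)` declaration in the header, an application would presumably hand such a struct to the decoder through `aom_codec_control(&ctx, AOM_SET_POSTPROC, &cfg)`. Flags are combined with bitwise OR, for example `AOM_DEBLOCK | AOM_ADDNOISE`.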

487
aom/aom_codec.h Normal file

@@ -0,0 +1,487 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
/*!\defgroup codec Common Algorithm Interface
* This abstraction allows applications to easily support multiple video
* formats with minimal code duplication. This section describes the interface
* common to all codecs (both encoders and decoders).
* @{
*/
/*!\file
* \brief Describes the codec algorithm interface to applications.
*
* This file describes the interface between an application and a
* video codec algorithm.
*
* An application instantiates a specific codec instance by using
* aom_codec_init() and a pointer to the algorithm's interface structure:
* <pre>
* my_app.c:
* extern aom_codec_iface_t my_codec;
* {
* aom_codec_ctx_t algo;
* res = aom_codec_init(&algo, &my_codec);
* }
* </pre>
*
* Once initialized, the instance is managed using other functions from
* the aom_codec_* family.
*/
#ifndef AOM_AOM_CODEC_H_
#define AOM_AOM_CODEC_H_
#ifdef __cplusplus
extern "C" {
#endif
#include "./aom_integer.h"
#include "./aom_image.h"
/*!\brief Decorator indicating a function is deprecated */
#ifndef DEPRECATED
#if defined(__GNUC__) && __GNUC__
#define DEPRECATED __attribute__((deprecated))
#elif defined(_MSC_VER)
#define DEPRECATED
#else
#define DEPRECATED
#endif
#endif /* DEPRECATED */
#ifndef DECLSPEC_DEPRECATED
#if defined(__GNUC__) && __GNUC__
#define DECLSPEC_DEPRECATED /**< \copydoc #DEPRECATED */
#elif defined(_MSC_VER)
/*!\brief \copydoc #DEPRECATED */
#define DECLSPEC_DEPRECATED __declspec(deprecated)
#else
#define DECLSPEC_DEPRECATED /**< \copydoc #DEPRECATED */
#endif
#endif /* DECLSPEC_DEPRECATED */
/*!\brief Decorator indicating a function is potentially unused */
#ifdef UNUSED
#elif defined(__GNUC__) || defined(__clang__)
#define UNUSED __attribute__((unused))
#else
#define UNUSED
#endif
/*!\brief Decorator indicating that given struct/union/enum is packed */
#ifndef ATTRIBUTE_PACKED
#if defined(__GNUC__) && __GNUC__
#define ATTRIBUTE_PACKED __attribute__((packed))
#elif defined(_MSC_VER)
#define ATTRIBUTE_PACKED
#else
#define ATTRIBUTE_PACKED
#endif
#endif /* ATTRIBUTE_PACKED */
/*!\brief Current ABI version number
*
* \internal
* If this file is altered in any way that changes the ABI, this value
* must be bumped. Examples include, but are not limited to, changing
* types, removing or reassigning enums, adding/removing/rearranging
* fields to structures
*/
#define AOM_CODEC_ABI_VERSION (3 + AOM_IMAGE_ABI_VERSION) /**<\hideinitializer*/
/*!\brief Algorithm return codes */
typedef enum {
/*!\brief Operation completed without error */
AOM_CODEC_OK,
/*!\brief Unspecified error */
AOM_CODEC_ERROR,
/*!\brief Memory operation failed */
AOM_CODEC_MEM_ERROR,
/*!\brief ABI version mismatch */
AOM_CODEC_ABI_MISMATCH,
/*!\brief Algorithm does not have required capability */
AOM_CODEC_INCAPABLE,
/*!\brief The given bitstream is not supported.
*
* The bitstream was unable to be parsed at the highest level. The decoder
* is unable to proceed. This error \ref SHOULD be treated as fatal to the
* stream. */
AOM_CODEC_UNSUP_BITSTREAM,
/*!\brief Encoded bitstream uses an unsupported feature
*
* The decoder does not implement a feature required by the encoder. This
* return code should only be used for features that prevent future
* pictures from being properly decoded. This error \ref MAY be treated as
* fatal to the stream or \ref MAY be treated as fatal to the current GOP.
*/
AOM_CODEC_UNSUP_FEATURE,
/*!\brief The coded data for this stream is corrupt or incomplete
*
* There was a problem decoding the current frame. This return code
* should only be used for failures that prevent future pictures from
* being properly decoded. This error \ref MAY be treated as fatal to the
* stream or \ref MAY be treated as fatal to the current GOP. If decoding
* is continued for the current GOP, artifacts may be present.
*/
AOM_CODEC_CORRUPT_FRAME,
/*!\brief An application-supplied parameter is not valid.
*
*/
AOM_CODEC_INVALID_PARAM,
/*!\brief An iterator reached the end of list.
*
*/
AOM_CODEC_LIST_END
} aom_codec_err_t;
/*! \brief Codec capabilities bitfield
*
* Each codec advertises the capabilities it supports as part of its
* ::aom_codec_iface_t interface structure. Capabilities are extra interfaces
* or functionality, and are not required to be supported.
*
* The available flags are specified by AOM_CODEC_CAP_* defines.
*/
typedef long aom_codec_caps_t;
#define AOM_CODEC_CAP_DECODER 0x1 /**< Is a decoder */
#define AOM_CODEC_CAP_ENCODER 0x2 /**< Is an encoder */
/*! \brief Initialization-time Feature Enabling
*
* Certain codec features must be known at initialization time, to allow for
* proper memory allocation.
*
* The available flags are specified by AOM_CODEC_USE_* defines.
*/
typedef long aom_codec_flags_t;
/*!\brief Codec interface structure.
*
* Contains function pointers and other data private to the codec
* implementation. This structure is opaque to the application.
*/
typedef const struct aom_codec_iface aom_codec_iface_t;
/*!\brief Codec private data structure.
*
* Contains data private to the codec implementation. This structure is opaque
* to the application.
*/
typedef struct aom_codec_priv aom_codec_priv_t;
/*!\brief Iterator
*
* Opaque storage used for iterating over lists.
*/
typedef const void *aom_codec_iter_t;
/*!\brief Codec context structure
*
* All codecs \ref MUST support this context structure fully. In general,
* this data should be considered private to the codec algorithm, and
* not be manipulated or examined by the calling application. Applications
* may reference the 'name' member to get a printable description of the
* algorithm.
*/
typedef struct aom_codec_ctx {
const char *name; /**< Printable interface name */
aom_codec_iface_t *iface; /**< Interface pointers */
aom_codec_err_t err; /**< Last returned error */
const char *err_detail; /**< Detailed info, if available */
aom_codec_flags_t init_flags; /**< Flags passed at init time */
union {
/**< Decoder Configuration Pointer */
const struct aom_codec_dec_cfg *dec;
/**< Encoder Configuration Pointer */
const struct aom_codec_enc_cfg *enc;
const void *raw;
} config; /**< Configuration pointer aliasing union */
aom_codec_priv_t *priv; /**< Algorithm private storage */
} aom_codec_ctx_t;
/*!\brief Bit depth for codec
*
* This enumeration determines the bit depth of the codec.
*/
typedef enum aom_bit_depth {
AOM_BITS_8 = 8, /**< 8 bits */
AOM_BITS_10 = 10, /**< 10 bits */
AOM_BITS_12 = 12, /**< 12 bits */
} aom_bit_depth_t;
/*!\brief Superblock size selection.
*
* Defines the superblock size used for encoding. The superblock size can
* either be fixed at 64x64 or 128x128 pixels, or it can be dynamically
* selected by the encoder for each frame.
*/
typedef enum aom_superblock_size {
AOM_SUPERBLOCK_SIZE_64X64, /**< Always use 64x64 superblocks. */
AOM_SUPERBLOCK_SIZE_128X128, /**< Always use 128x128 superblocks. */
AOM_SUPERBLOCK_SIZE_DYNAMIC /**< Select superblock size dynamically. */
} aom_superblock_size_t;
/*
* Library Version Number Interface
*
* For example, see the following sample return values:
* aom_codec_version() (1<<16 | 2<<8 | 3)
* aom_codec_version_str() "v1.2.3-rc1-16-gec6a1ba"
* aom_codec_version_extra_str() "rc1-16-gec6a1ba"
*/
/*!\brief Return the version information (as an integer)
*
* Returns a packed encoding of the library version number. This will only
* include the major.minor.patch component of the version number. Note that
* this encoded value should be accessed through the macros provided, as the
* encoding may change in the future.
*
*/
int aom_codec_version(void);
#define AOM_VERSION_MAJOR(v) \
((v >> 16) & 0xff) /**< extract major from packed version */
#define AOM_VERSION_MINOR(v) \
((v >> 8) & 0xff) /**< extract minor from packed version */
#define AOM_VERSION_PATCH(v) \
((v >> 0) & 0xff) /**< extract patch from packed version */
/*!\brief Return the version major number */
#define aom_codec_version_major() ((aom_codec_version() >> 16) & 0xff)
/*!\brief Return the version minor number */
#define aom_codec_version_minor() ((aom_codec_version() >> 8) & 0xff)
/*!\brief Return the version patch number */
#define aom_codec_version_patch() ((aom_codec_version() >> 0) & 0xff)
/*!\brief Return the version information (as a string)
*
* Returns a printable string containing the full library version number. This
* may contain additional text following the three digit version number, to
* indicate release candidates, prerelease versions, etc.
*
*/
const char *aom_codec_version_str(void);
/*!\brief Return the version information (as a string)
*
* Returns a printable "extra string". This is the component of the string
* returned by aom_codec_version_str() following the three digit version number.
*
*/
const char *aom_codec_version_extra_str(void);
/*!\brief Return the build configuration
*
* Returns a printable string containing an encoded version of the build
* configuration. This may be useful to aom support.
*
*/
const char *aom_codec_build_config(void);
/*!\brief Return the name for a given interface
*
* Returns a human readable string for name of the given codec interface.
*
* \param[in] iface Interface pointer
*
*/
const char *aom_codec_iface_name(aom_codec_iface_t *iface);
/*!\brief Convert error number to printable string
*
* Returns a human readable string for the given error number. The returned
* string will be one line and will not contain any newline characters.
*
*
* \param[in] err Error number.
*
*/
const char *aom_codec_err_to_string(aom_codec_err_t err);
/*!\brief Retrieve error synopsis for codec context
*
* Returns a human readable string for the last error returned by the
* algorithm. The returned error will be one line and will not contain
* any newline characters.
*
*
* \param[in] ctx Pointer to this instance's context.
*
*/
const char *aom_codec_error(aom_codec_ctx_t *ctx);
/*!\brief Retrieve detailed error information for codec context
*
* Returns a human readable string providing detailed information about
* the last error.
*
* \param[in] ctx Pointer to this instance's context.
*
* \retval NULL
* No detailed information is available.
*/
const char *aom_codec_error_detail(aom_codec_ctx_t *ctx);
/* REQUIRED FUNCTIONS
*
* The following functions are required to be implemented for all codecs.
* They represent the base case functionality expected of all codecs.
*/
/*!\brief Destroy a codec instance
*
* Destroys a codec context, freeing any associated memory buffers.
*
* \param[in] ctx Pointer to this instance's context
*
* \retval #AOM_CODEC_OK
* The codec instance was destroyed.
* \retval #AOM_CODEC_MEM_ERROR
* Memory allocation failed.
*/
aom_codec_err_t aom_codec_destroy(aom_codec_ctx_t *ctx);
/*!\brief Get the capabilities of an algorithm.
*
* Retrieves the capabilities bitfield from the algorithm's interface.
*
* \param[in] iface Pointer to the algorithm interface
*
*/
aom_codec_caps_t aom_codec_get_caps(aom_codec_iface_t *iface);
/*!\brief Control algorithm
*
* This function is used to exchange algorithm specific data with the codec
* instance. This can be used to implement features specific to a particular
* algorithm.
*
* This wrapper function dispatches the request to the helper function
* associated with the given ctrl_id. It tries to call this function
* transparently, but will return #AOM_CODEC_ERROR if the request could not
* be dispatched.
*
* Note that this function should not be used directly. Call the
* #aom_codec_control wrapper macro instead.
*
* \param[in] ctx Pointer to this instance's context
* \param[in] ctrl_id Algorithm specific control identifier
*
* \retval #AOM_CODEC_OK
* The control request was processed.
* \retval #AOM_CODEC_ERROR
* The control request was not processed.
* \retval #AOM_CODEC_INVALID_PARAM
* The data was not valid.
*/
aom_codec_err_t aom_codec_control_(aom_codec_ctx_t *ctx, int ctrl_id, ...);
#if defined(AOM_DISABLE_CTRL_TYPECHECKS) && AOM_DISABLE_CTRL_TYPECHECKS
#define aom_codec_control(ctx, id, data) aom_codec_control_(ctx, id, data)
#define AOM_CTRL_USE_TYPE(id, typ)
#define AOM_CTRL_USE_TYPE_DEPRECATED(id, typ)
#define AOM_CTRL_VOID(id)
#else
/*!\brief aom_codec_control wrapper macro
*
* This macro allows for type safe conversions across the variadic parameter
* to aom_codec_control_().
*
* \internal
* It works by dispatching the call to the control function through a wrapper
* function named with the id parameter.
*/
#define aom_codec_control(ctx, id, data) \
aom_codec_control_##id(ctx, id, data) /**<\hideinitializer*/
/*!\brief aom_codec_control type definition macro
*
* This macro allows for type safe conversions across the variadic parameter
* to aom_codec_control_(). It defines the type of the argument for a given
* control identifier.
*
* \internal
* It defines a static function with
* the correctly typed arguments as a wrapper to the type-unsafe internal
* function.
*/
#define AOM_CTRL_USE_TYPE(id, typ) \
static aom_codec_err_t aom_codec_control_##id(aom_codec_ctx_t *, int, typ) \
UNUSED; \
\
static aom_codec_err_t aom_codec_control_##id(aom_codec_ctx_t *ctx, \
int ctrl_id, typ data) { \
return aom_codec_control_(ctx, ctrl_id, data); \
} /**<\hideinitializer*/
/*!\brief aom_codec_control deprecated type definition macro
*
* Like #AOM_CTRL_USE_TYPE, but indicates that the specified control is
* deprecated and should not be used. Consult the documentation for your
* codec for more information.
*
* \internal
* It defines a static function with the correctly typed arguments as a
* wrapper to the type-unsafe internal function.
*/
#define AOM_CTRL_USE_TYPE_DEPRECATED(id, typ) \
DECLSPEC_DEPRECATED static aom_codec_err_t aom_codec_control_##id( \
aom_codec_ctx_t *, int, typ) DEPRECATED UNUSED; \
\
DECLSPEC_DEPRECATED static aom_codec_err_t aom_codec_control_##id( \
aom_codec_ctx_t *ctx, int ctrl_id, typ data) { \
return aom_codec_control_(ctx, ctrl_id, data); \
} /**<\hideinitializer*/
/*!\brief aom_codec_control void type definition macro
*
* This macro allows for type safe conversions across the variadic parameter
* to aom_codec_control_(). It indicates that a given control identifier takes
* no argument.
*
* \internal
* It defines a static function without a data argument as a wrapper to the
* type-unsafe internal function.
*/
#define AOM_CTRL_VOID(id) \
static aom_codec_err_t aom_codec_control_##id(aom_codec_ctx_t *, int) \
UNUSED; \
\
static aom_codec_err_t aom_codec_control_##id(aom_codec_ctx_t *ctx, \
int ctrl_id) { \
return aom_codec_control_(ctx, ctrl_id); \
} /**<\hideinitializer*/
#endif
/*!@} - end defgroup codec*/
#ifdef __cplusplus
}
#endif
#endif // AOM_AOM_CODEC_H_
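
The `AOM_CTRL_USE_TYPE` / `aom_codec_control` pair in aom/aom_codec.h uses token pasting to generate one typed static wrapper per control id in front of a variadic function. The sketch below reproduces that pattern in isolation so it can be compiled without libaom. Everything prefixed `demo_` or `DEMO_` is a hypothetical stand-in rather than a library name, and the dispatcher body is an assumption made for illustration; the header shown here only declares `aom_codec_control_()`.

```c
#include <stdarg.h>

/* Hypothetical stand-in for aom_codec_err_t, trimmed to two values. */
typedef enum { DEMO_OK = 0, DEMO_ERROR = 1 } demo_err_t;

/* Hypothetical control ids (enum constants, as in aom_com_control_id). */
enum { DEMO_SET_LEVEL = 1, DEMO_GET_NAME = 2 };

/* Hypothetical context holding the state the controls act on. */
typedef struct {
  int level;
  const char *name;
} demo_ctx_t;

/* Type-unsafe variadic dispatcher, playing the role of
 * aom_codec_control_(): the argument type depends on ctrl_id. */
static demo_err_t demo_control_(demo_ctx_t *ctx, int ctrl_id, ...) {
  va_list ap;
  demo_err_t res = DEMO_OK;
  va_start(ap, ctrl_id);
  switch (ctrl_id) {
    case DEMO_SET_LEVEL: ctx->level = va_arg(ap, int); break;
    case DEMO_GET_NAME: *va_arg(ap, const char **) = ctx->name; break;
    default: res = DEMO_ERROR; break;
  }
  va_end(ap);
  return res;
}

/* Generates a typed wrapper named demo_control_<id>; passing an argument
 * of the wrong type to it becomes a compiler diagnostic. */
#define DEMO_CTRL_USE_TYPE(id, typ)                                  \
  static demo_err_t demo_control_##id(demo_ctx_t *ctx, int ctrl_id, \
                                      typ data) {                    \
    return demo_control_(ctx, ctrl_id, data);                        \
  }

DEMO_CTRL_USE_TYPE(DEMO_SET_LEVEL, int)
DEMO_CTRL_USE_TYPE(DEMO_GET_NAME, const char **)

/* Front-end macro: pastes the id into the wrapper name and also passes
 * the id's value on as the ctrl_id argument. */
#define demo_control(ctx, id, data) demo_control_##id(ctx, id, data)
```

A call such as `demo_control(&ctx, DEMO_SET_LEVEL, 7)` expands to `demo_control_DEMO_SET_LEVEL(&ctx, DEMO_SET_LEVEL, 7)`. Because `##` pastes the identifier as written, the scheme relies on callers spelling the control id by name instead of passing a variable.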

42
aom/aom_codec.mk Normal file

@@ -0,0 +1,42 @@
##
## Copyright (c) 2016, Alliance for Open Media. All rights reserved
##
## This source code is subject to the terms of the BSD 2 Clause License and
## the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
## was not distributed with this source code in the LICENSE file, you can
## obtain it at www.aomedia.org/license/software. If the Alliance for Open
## Media Patent License 1.0 was not distributed with this source code in the
## PATENTS file, you can obtain it at www.aomedia.org/license/patent.
##
API_EXPORTS += exports
API_SRCS-$(CONFIG_AV1_ENCODER) += aom.h
API_SRCS-$(CONFIG_AV1_ENCODER) += aomcx.h
API_DOC_SRCS-$(CONFIG_AV1_ENCODER) += aom.h
API_DOC_SRCS-$(CONFIG_AV1_ENCODER) += aomcx.h
API_SRCS-$(CONFIG_AV1_DECODER) += aom.h
API_SRCS-$(CONFIG_AV1_DECODER) += aomdx.h
API_DOC_SRCS-$(CONFIG_AV1_DECODER) += aom.h
API_DOC_SRCS-$(CONFIG_AV1_DECODER) += aomdx.h
API_DOC_SRCS-yes += aom_codec.h
API_DOC_SRCS-yes += aom_decoder.h
API_DOC_SRCS-yes += aom_encoder.h
API_DOC_SRCS-yes += aom_frame_buffer.h
API_DOC_SRCS-yes += aom_image.h
API_SRCS-yes += src/aom_decoder.c
API_SRCS-yes += aom_decoder.h
API_SRCS-yes += src/aom_encoder.c
API_SRCS-yes += aom_encoder.h
API_SRCS-yes += internal/aom_codec_internal.h
API_SRCS-yes += src/aom_codec.c
API_SRCS-yes += src/aom_image.c
API_SRCS-yes += aom_codec.h
API_SRCS-yes += aom_codec.mk
API_SRCS-yes += aom_frame_buffer.h
API_SRCS-yes += aom_image.h
API_SRCS-yes += aom_integer.h

366
aom/aom_decoder.h Normal file

@@ -0,0 +1,366 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_AOM_DECODER_H_
#define AOM_AOM_DECODER_H_
/*!\defgroup decoder Decoder Algorithm Interface
* \ingroup codec
* This abstraction allows applications using this decoder to easily support
* multiple video formats with minimal code duplication. This section describes
* the interface common to all decoders.
* @{
*/
/*!\file
* \brief Describes the decoder algorithm interface to applications.
*
* This file describes the interface between an application and a
* video decoder algorithm.
*
*/
#ifdef __cplusplus
extern "C" {
#endif
#include "./aom_codec.h"
#include "./aom_frame_buffer.h"
/*!\brief Current ABI version number
*
* \internal
* If this file is altered in any way that changes the ABI, this value
* must be bumped. Examples include, but are not limited to, changing
* types, removing or reassigning enums, adding/removing/rearranging
* fields to structures
*/
#define AOM_DECODER_ABI_VERSION \
(3 + AOM_CODEC_ABI_VERSION) /**<\hideinitializer*/
/*! \brief Decoder capabilities bitfield
*
* Each decoder advertises the capabilities it supports as part of its
* ::aom_codec_iface_t interface structure. Capabilities are extra interfaces
* or functionality, and are not required to be supported by a decoder.
*
* The available flags are specified by AOM_CODEC_CAP_* defines.
*/
#define AOM_CODEC_CAP_PUT_SLICE 0x10000 /**< Will issue put_slice callbacks */
#define AOM_CODEC_CAP_PUT_FRAME 0x20000 /**< Will issue put_frame callbacks */
#define AOM_CODEC_CAP_POSTPROC 0x40000 /**< Can postprocess decoded frame */
/*!\brief Can conceal errors due to packet loss */
#define AOM_CODEC_CAP_ERROR_CONCEALMENT 0x80000
/*!\brief Can receive encoded frames one fragment at a time */
#define AOM_CODEC_CAP_INPUT_FRAGMENTS 0x100000
/*!\brief Can support frame-based multi-threading */
#define AOM_CODEC_CAP_FRAME_THREADING 0x200000
/*!\brief Can support external frame buffers */
#define AOM_CODEC_CAP_EXTERNAL_FRAME_BUFFER 0x400000
/*! \brief Initialization-time Feature Enabling
*
* Certain codec features must be known at initialization time, to allow for
* proper memory allocation.
*
* The available flags are specified by AOM_CODEC_USE_* defines.
*/
#define AOM_CODEC_USE_POSTPROC 0x10000 /**< Postprocess decoded frame */
/*!\brief Conceal errors in decoded frames */
#define AOM_CODEC_USE_ERROR_CONCEALMENT 0x20000
/*!\brief The input frame should be passed to the decoder one fragment at a
* time */
#define AOM_CODEC_USE_INPUT_FRAGMENTS 0x40000
/*!\brief Enable frame-based multi-threading */
#define AOM_CODEC_USE_FRAME_THREADING 0x80000
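The capability and initialization flags above live in separate bitfields, so an application would normally check a decoder's AOM_CODEC_CAP_* bits before requesting the corresponding AOM_CODEC_USE_* bits. The following is a minimal sketch of that check, not part of libaom: the helper name `filter_dec_use_flags` is hypothetical, the constants are local copies of the values shown in this header, and pairing each USE flag with the CAP flag of the same name is an assumption based on the naming.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the values shown in aom_decoder.h above. */
#define AOM_CODEC_CAP_POSTPROC 0x40000
#define AOM_CODEC_CAP_ERROR_CONCEALMENT 0x80000
#define AOM_CODEC_CAP_INPUT_FRAGMENTS 0x100000
#define AOM_CODEC_CAP_FRAME_THREADING 0x200000
#define AOM_CODEC_USE_POSTPROC 0x10000
#define AOM_CODEC_USE_ERROR_CONCEALMENT 0x20000
#define AOM_CODEC_USE_INPUT_FRAGMENTS 0x40000
#define AOM_CODEC_USE_FRAME_THREADING 0x80000

/* Hypothetical helper: return the subset of the requested USE flags whose
 * same-named CAP bit is present in caps. */
uint32_t filter_dec_use_flags(uint32_t caps, uint32_t wanted) {
  static const struct {
    uint32_t cap;
    uint32_t use;
  } pairs[] = {
    { AOM_CODEC_CAP_POSTPROC, AOM_CODEC_USE_POSTPROC },
    { AOM_CODEC_CAP_ERROR_CONCEALMENT, AOM_CODEC_USE_ERROR_CONCEALMENT },
    { AOM_CODEC_CAP_INPUT_FRAGMENTS, AOM_CODEC_USE_INPUT_FRAGMENTS },
    { AOM_CODEC_CAP_FRAME_THREADING, AOM_CODEC_USE_FRAME_THREADING },
  };
  const uint32_t known = AOM_CODEC_USE_POSTPROC |
                         AOM_CODEC_USE_ERROR_CONCEALMENT |
                         AOM_CODEC_USE_INPUT_FRAGMENTS |
                         AOM_CODEC_USE_FRAME_THREADING;
  uint32_t out = 0;
  size_t i;

  /* Precondition of this sketch: only the four USE bits above are passed. */
  assert((wanted & ~known) == 0);

  for (i = 0; i < sizeof(pairs) / sizeof(pairs[0]); ++i) {
    if ((wanted & pairs[i].use) && (caps & pairs[i].cap)) out |= pairs[i].use;
  }
  return out;
}
```

The result could then be passed as the `flags` argument of `aom_codec_dec_init`; how the capability bits are obtained from an interface is outside this header and is not shown here.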
/*!\brief Stream properties
*
* This structure is used to query or set properties of the decoded
* stream. Algorithms may extend this structure with data specific
* to their bitstream by setting the sz member appropriately.
*/
typedef struct aom_codec_stream_info {
unsigned int sz; /**< Size of this structure */
unsigned int w; /**< Width (or 0 for unknown/default) */
unsigned int h; /**< Height (or 0 for unknown/default) */
unsigned int is_kf; /**< Current frame is a keyframe */
} aom_codec_stream_info_t;
/* REQUIRED FUNCTIONS
*
* The following functions are required to be implemented for all decoders.
* They represent the base case functionality expected of all decoders.
*/
/*!\brief Initialization Configurations
*
* This structure is used to pass init time configuration options to the
* decoder.
*/
typedef struct aom_codec_dec_cfg {
unsigned int threads; /**< Maximum number of threads to use, default 1 */
unsigned int w; /**< Width */
unsigned int h; /**< Height */
} aom_codec_dec_cfg_t; /**< alias for struct aom_codec_dec_cfg */
/*!\brief Initialize a decoder instance
*
* Initializes a decoder context using the given interface. Applications
* should call the aom_codec_dec_init convenience macro instead of this
* function directly, to ensure that the ABI version number parameter
* is properly initialized.
*
* If the library was configured with --disable-multithread, this call
* is not thread safe and should be guarded with a lock if being used
* in a multithreaded context.
*
* \param[in] ctx Pointer to this instance's context.
* \param[in] iface Pointer to the algorithm interface to use.
* \param[in] cfg Configuration to use, if known. May be NULL.
* \param[in] flags Bitfield of AOM_CODEC_USE_* flags
* \param[in] ver ABI version number. Must be set to
* AOM_DECODER_ABI_VERSION
* \retval #AOM_CODEC_OK
* The decoder algorithm initialized.
* \retval #AOM_CODEC_MEM_ERROR
* Memory allocation failed.
*/
aom_codec_err_t aom_codec_dec_init_ver(aom_codec_ctx_t *ctx,
aom_codec_iface_t *iface,
const aom_codec_dec_cfg_t *cfg,
aom_codec_flags_t flags, int ver);
/*!\brief Convenience macro for aom_codec_dec_init_ver()
*
* Ensures the ABI version parameter is properly set.
*/
#define aom_codec_dec_init(ctx, iface, cfg, flags) \
aom_codec_dec_init_ver(ctx, iface, cfg, flags, AOM_DECODER_ABI_VERSION)
/*!\brief Parse stream info from a buffer
*
* Performs high level parsing of the bitstream. Construction of a decoder
* context is not necessary. Can be used to determine if the bitstream is
* of the proper format, and to extract information from the stream.
*
* \param[in] iface Pointer to the algorithm interface
* \param[in] data Pointer to a block of data to parse
* \param[in] data_sz Size of the data buffer
* \param[in,out] si Pointer to stream info to update. The size member
* \ref MUST be properly initialized, but \ref MAY be
* clobbered by the algorithm. This parameter \ref MAY
* be NULL.
*
* \retval #AOM_CODEC_OK
* Bitstream is parsable and stream information updated
*/
aom_codec_err_t aom_codec_peek_stream_info(aom_codec_iface_t *iface,
const uint8_t *data,
unsigned int data_sz,
aom_codec_stream_info_t *si);
/*!\brief Return information about the current stream.
*
* Returns information about the stream that has been parsed during decoding.
*
* \param[in] ctx Pointer to this instance's context
* \param[in,out] si Pointer to stream info to update. The size member
* \ref MUST be properly initialized, but \ref MAY be
* clobbered by the algorithm. This parameter \ref MAY
* be NULL.
*
* \retval #AOM_CODEC_OK
* Bitstream is parsable and stream information updated
*/
aom_codec_err_t aom_codec_get_stream_info(aom_codec_ctx_t *ctx,
aom_codec_stream_info_t *si);
/*!\brief Decode data
*
* Processes a buffer of coded data. If the processing results in a new
* decoded frame becoming available, PUT_SLICE and PUT_FRAME events may be
* generated, as appropriate. Encoded data \ref MUST be passed in DTS (decode
* time stamp) order. Frames produced will always be in PTS (presentation
* time stamp) order.
* If the decoder is configured with AOM_CODEC_USE_INPUT_FRAGMENTS enabled,
* data and data_sz can contain a fragment of the encoded frame. Fragment
* \#n must contain at least partition \#n, but can also contain subsequent
* partitions (\#n+1 - \#n+i), and if so, fragments \#n+1, .., \#n+i must
* be empty. When no more data is available, this function should be called
* with NULL as data and 0 as data_sz. The memory passed to this function
* must be available until the frame has been decoded.
*
* \param[in] ctx Pointer to this instance's context
* \param[in] data Pointer to this block of new coded data. If
* NULL, an AOM_CODEC_CB_PUT_FRAME event is posted
* for the previously decoded frame.
* \param[in] data_sz Size of the coded data, in bytes.
* \param[in] user_priv Application specific data to associate with
* this frame.
* \param[in] deadline Soft deadline the decoder should attempt to meet,
* in us. Set to zero for unlimited.
*
* \return Returns #AOM_CODEC_OK if the coded data was processed completely
* and future pictures can be decoded without error. Otherwise,
* see the descriptions of the other error codes in ::aom_codec_err_t
* for recoverability capabilities.
*/
aom_codec_err_t aom_codec_decode(aom_codec_ctx_t *ctx, const uint8_t *data,
unsigned int data_sz, void *user_priv,
long deadline);
/*!\brief Decoded frames iterator
*
* Iterates over a list of the frames available for display. The iterator
* storage should be initialized to NULL to start the iteration. Iteration is
* complete when this function returns NULL.
*
* The list of available frames becomes valid upon completion of the
* aom_codec_decode call, and remains valid until the next call to
* aom_codec_decode.
*
* \param[in] ctx Pointer to this instance's context
* \param[in,out] iter Iterator storage, initialized to NULL
*
* \return Returns a pointer to an image, if one is ready for display. Frames
* produced will always be in PTS (presentation time stamp) order.
*/
aom_image_t *aom_codec_get_frame(aom_codec_ctx_t *ctx, aom_codec_iter_t *iter);
/*!\defgroup cap_put_frame Frame-Based Decoding Functions
*
* The following functions are required to be implemented for all decoders
* that advertise the AOM_CODEC_CAP_PUT_FRAME capability. Calling these
* functions for codecs that don't advertise this capability will result in
* an error code being returned, usually AOM_CODEC_ERROR.
* @{
*/
/*!\brief put frame callback prototype
*
* This callback is invoked by the decoder to notify the application of
* the availability of decoded image data.
*/
typedef void (*aom_codec_put_frame_cb_fn_t)(void *user_priv,
const aom_image_t *img);
/*!\brief Register for notification of frame completion.
*
* Registers a given function to be called when a decoded frame is
* available.
*
* \param[in] ctx Pointer to this instance's context
* \param[in] cb Pointer to the callback function
* \param[in] user_priv User's private data
*
* \retval #AOM_CODEC_OK
* Callback successfully registered.
* \retval #AOM_CODEC_ERROR
* Decoder context not initialized, or algorithm not capable of
* posting frame completion.
*/
aom_codec_err_t aom_codec_register_put_frame_cb(aom_codec_ctx_t *ctx,
aom_codec_put_frame_cb_fn_t cb,
void *user_priv);
/*!@} - end defgroup cap_put_frame */
/*!\defgroup cap_put_slice Slice-Based Decoding Functions
*
* The following functions are required to be implemented for all decoders
* that advertise the AOM_CODEC_CAP_PUT_SLICE capability. Calling these
* functions for codecs that don't advertise this capability will result in
* an error code being returned, usually AOM_CODEC_ERROR.
* @{
*/
/*!\brief put slice callback prototype
*
* This callback is invoked by the decoder to notify the application of
* the availability of partially decoded image data.
*/
typedef void (*aom_codec_put_slice_cb_fn_t)(void *user_priv,
const aom_image_t *img,
const aom_image_rect_t *valid,
const aom_image_rect_t *update);
/*!\brief Register for notification of slice completion.
*
* Registers a given function to be called when a decoded slice is
* available.
*
* \param[in] ctx Pointer to this instance's context
* \param[in] cb Pointer to the callback function
* \param[in] user_priv User's private data
*
* \retval #AOM_CODEC_OK
* Callback successfully registered.
* \retval #AOM_CODEC_ERROR
* Decoder context not initialized, or algorithm not capable of
* posting slice completion.
*/
aom_codec_err_t aom_codec_register_put_slice_cb(aom_codec_ctx_t *ctx,
aom_codec_put_slice_cb_fn_t cb,
void *user_priv);
/*!@} - end defgroup cap_put_slice*/
/*!\defgroup cap_external_frame_buffer External Frame Buffer Functions
*
* The following section is required to be implemented for all decoders
* that advertise the AOM_CODEC_CAP_EXTERNAL_FRAME_BUFFER capability.
* Calling this function for codecs that don't advertise this capability
* will result in an error code being returned, usually AOM_CODEC_ERROR.
*
* \note
* Currently this only works with AV1.
* @{
*/
/*!\brief Pass in external frame buffers for the decoder to use.
*
* Registers functions to be called when libaom needs a frame buffer
* to decode the current frame and a function to be called when libaom does
* not internally reference the frame buffer. This set function must
* be called before the first call to decode or libaom will assume the
* default behavior of allocating frame buffers internally.
*
* \param[in] ctx Pointer to this instance's context
* \param[in] cb_get Pointer to the get callback function
* \param[in] cb_release Pointer to the release callback function
* \param[in] cb_priv Callback's private data
*
* \retval #AOM_CODEC_OK
* External frame buffers will be used by libaom.
* \retval #AOM_CODEC_INVALID_PARAM
* One or more of the callbacks were NULL.
* \retval #AOM_CODEC_ERROR
* Decoder context not initialized, or algorithm not capable of
* using external frame buffers.
*
* \note
* When decoding AV1, the application may be required to pass in at least
* #AOM_MAXIMUM_WORK_BUFFERS external frame
* buffers.
*/
aom_codec_err_t aom_codec_set_frame_buffer_functions(
aom_codec_ctx_t *ctx, aom_get_frame_buffer_cb_fn_t cb_get,
aom_release_frame_buffer_cb_fn_t cb_release, void *cb_priv);
/*!@} - end defgroup cap_external_frame_buffer */
/*!@} - end defgroup decoder*/
#ifdef __cplusplus
}
#endif
#endif // AOM_AOM_DECODER_H_

837
aom/aom_encoder.h Normal file

@@ -0,0 +1,837 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_AOM_ENCODER_H_
#define AOM_AOM_ENCODER_H_
/*!\defgroup encoder Encoder Algorithm Interface
* \ingroup codec
* This abstraction allows applications using this encoder to easily support
* multiple video formats with minimal code duplication. This section describes
* the interface common to all encoders.
* @{
*/
/*!\file
* \brief Describes the encoder algorithm interface to applications.
*
* This file describes the interface between an application and a
* video encoder algorithm.
*
*/
#ifdef __cplusplus
extern "C" {
#endif
#include "./aom_codec.h"
/*!\brief Current ABI version number
*
* \internal
* If this file is altered in any way that changes the ABI, this value
* must be bumped. Examples include, but are not limited to, changing
* types, removing or reassigning enums, adding/removing/rearranging
* fields to structures
*/
#define AOM_ENCODER_ABI_VERSION \
(5 + AOM_CODEC_ABI_VERSION) /**<\hideinitializer*/
/*! \brief Encoder capabilities bitfield
*
* Each encoder advertises the capabilities it supports as part of its
* ::aom_codec_iface_t interface structure. Capabilities are extra
* interfaces or functionality, and are not required to be supported
* by an encoder.
*
* The available flags are specified by AOM_CODEC_CAP_* defines.
*/
#define AOM_CODEC_CAP_PSNR 0x10000 /**< Can issue PSNR packets */
/*! Can output one partition at a time. Each partition is returned in its
* own AOM_CODEC_CX_FRAME_PKT, with the FRAME_IS_FRAGMENT flag set for
* every partition but the last. In this mode all frames are always
* returned partition by partition.
*/
#define AOM_CODEC_CAP_OUTPUT_PARTITION 0x20000
/*! Can support input images at greater than 8 bitdepth.
*/
#define AOM_CODEC_CAP_HIGHBITDEPTH 0x40000
/*! \brief Initialization-time Feature Enabling
*
* Certain codec features must be known at initialization time, to allow
* for proper memory allocation.
*
* The available flags are specified by AOM_CODEC_USE_* defines.
*/
#define AOM_CODEC_USE_PSNR 0x10000 /**< Calculate PSNR on each frame */
/*!\brief Make the encoder output one partition at a time. */
#define AOM_CODEC_USE_OUTPUT_PARTITION 0x20000
#define AOM_CODEC_USE_HIGHBITDEPTH 0x40000 /**< Use high bitdepth */
/*!\brief Generic fixed size buffer structure
*
* This structure is able to hold a reference to any fixed size buffer.
*/
typedef struct aom_fixed_buf {
void *buf; /**< Pointer to the data */
size_t sz; /**< Length of the buffer, in chars */
} aom_fixed_buf_t; /**< alias for struct aom_fixed_buf */
/*!\brief Time Stamp Type
*
* An integer, which when multiplied by the stream's time base, provides
* the absolute time of a sample.
*/
typedef int64_t aom_codec_pts_t;
/*!\brief Compressed Frame Flags
*
* This type represents a bitfield containing information about a compressed
* frame that may be useful to an application. The most significant 16 bits
* can be used by an algorithm to provide additional detail, for example to
* support frame types that are codec specific (MPEG-1 D-frames for example)
*/
typedef uint32_t aom_codec_frame_flags_t;
#define AOM_FRAME_IS_KEY 0x1 /**< frame is the start of a GOP */
/*!\brief frame can be dropped without affecting the stream (no future frame
* depends on this one) */
#define AOM_FRAME_IS_DROPPABLE 0x2
/*!\brief frame should be decoded but will not be shown */
#define AOM_FRAME_IS_INVISIBLE 0x4
/*!\brief this is a fragment of the encoded frame */
#define AOM_FRAME_IS_FRAGMENT 0x8
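As a rough illustration of how an application might read these frame flags, the sketch below counts complete frames in a sequence of packet flag words and tests whether a frame is meant to be shown. It relies on the statement earlier in this header that, in output-partition mode, the fragment flag is set on every partition but the last. The helper names `count_complete_frames` and `frame_is_shown` are hypothetical and the constants are local copies of the values above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the values shown in aom_encoder.h above. */
#define AOM_FRAME_IS_KEY 0x1
#define AOM_FRAME_IS_DROPPABLE 0x2
#define AOM_FRAME_IS_INVISIBLE 0x4
#define AOM_FRAME_IS_FRAGMENT 0x8

/* Hypothetical helper: a packet without the fragment flag closes a frame,
 * so counting such packets counts complete frames. */
size_t count_complete_frames(const uint32_t *flags, size_t n) {
  size_t i, frames = 0;
  assert(flags != NULL || n == 0);
  for (i = 0; i < n; ++i) {
    if (!(flags[i] & AOM_FRAME_IS_FRAGMENT)) ++frames;
  }
  return frames;
}

/* Hypothetical helper: non-zero when the frame is meant to be displayed. */
int frame_is_shown(uint32_t flags) {
  return (flags & AOM_FRAME_IS_INVISIBLE) == 0;
}
```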
/*!\brief Error Resilient flags
*
* These flags define which error resilient features to enable in the
* encoder. The flags are specified through the
* aom_codec_enc_cfg::g_error_resilient variable.
*/
typedef uint32_t aom_codec_er_flags_t;
/*!\brief Improve resiliency against losses of whole frames */
#define AOM_ERROR_RESILIENT_DEFAULT 0x1
/*!\brief The frame partitions are independently decodable by the bool decoder,
* meaning that partitions can be decoded even though earlier partitions have
* been lost. Note that intra prediction is still done over the partition
* boundary. */
#define AOM_ERROR_RESILIENT_PARTITIONS 0x2
/*!\brief Encoder output packet variants
*
* This enumeration lists the different kinds of data packets that can be
* returned by calls to aom_codec_get_cx_data(). Algorithms \ref MAY
* extend this list to provide additional functionality.
*/
enum aom_codec_cx_pkt_kind {
AOM_CODEC_CX_FRAME_PKT, /**< Compressed video frame */
AOM_CODEC_STATS_PKT, /**< Two-pass statistics for this frame */
AOM_CODEC_FPMB_STATS_PKT, /**< first pass mb statistics for this frame */
AOM_CODEC_PSNR_PKT, /**< PSNR statistics for this frame */
AOM_CODEC_CUSTOM_PKT = 256 /**< Algorithm extensions */
};
/*!\brief Encoder output packet
*
* This structure contains the different kinds of output data the encoder
* may produce while compressing a frame.
*/
typedef struct aom_codec_cx_pkt {
enum aom_codec_cx_pkt_kind kind; /**< packet variant */
union {
struct {
void *buf; /**< compressed data buffer */
size_t sz; /**< length of compressed data */
/*!\brief time stamp to show frame (in timebase units) */
aom_codec_pts_t pts;
/*!\brief duration to show frame (in timebase units) */
unsigned long duration;
aom_codec_frame_flags_t flags; /**< flags for this frame */
/*!\brief the partition id defines the decoding order of the partitions.
* Only applicable when "output partition" mode is enabled. First
* partition has id 0.*/
int partition_id;
} frame; /**< data for compressed frame packet */
aom_fixed_buf_t twopass_stats; /**< data for two-pass packet */
aom_fixed_buf_t firstpass_mb_stats; /**< first pass mb packet */
struct aom_psnr_pkt {
unsigned int samples[4]; /**< Number of samples, total/y/u/v */
uint64_t sse[4]; /**< sum squared error, total/y/u/v */
double psnr[4]; /**< PSNR, total/y/u/v */
} psnr; /**< data for PSNR packet */
aom_fixed_buf_t raw; /**< data for arbitrary packets */
/* This packet size is fixed to allow codecs to extend this
* interface without having to manage storage for raw packets,
* i.e., if it's smaller than 128 bytes, you can store in the
* packet list directly.
*/
char pad[128 - sizeof(enum aom_codec_cx_pkt_kind)]; /**< fixed sz */
} data; /**< packet data */
} aom_codec_cx_pkt_t; /**< alias for struct aom_codec_cx_pkt */
/*!\brief Rational Number
*
* This structure holds a fractional value.
*/
typedef struct aom_rational {
int num; /**< fraction numerator */
int den; /**< fraction denominator */
} aom_rational_t; /**< alias for struct aom_rational */
/*!\brief Multi-pass Encoding Pass */
enum aom_enc_pass {
AOM_RC_ONE_PASS, /**< Single pass mode */
AOM_RC_FIRST_PASS, /**< First pass of multi-pass mode */
AOM_RC_LAST_PASS /**< Final pass of multi-pass mode */
};
/*!\brief Rate control mode */
enum aom_rc_mode {
AOM_VBR, /**< Variable Bit Rate (VBR) mode */
AOM_CBR, /**< Constant Bit Rate (CBR) mode */
AOM_CQ, /**< Constrained Quality (CQ) mode */
AOM_Q, /**< Constant Quality (Q) mode */
};
/*!\brief Keyframe placement mode.
*
* This enumeration determines whether keyframes are placed automatically by
* the encoder or whether this behavior is disabled. Older releases of this
* SDK were implemented such that AOM_KF_FIXED meant keyframes were disabled.
* This name is confusing for this behavior, so the new symbols to be used
* are AOM_KF_AUTO and AOM_KF_DISABLED.
*/
enum aom_kf_mode {
AOM_KF_FIXED, /**< deprecated, implies AOM_KF_DISABLED */
AOM_KF_AUTO, /**< Encoder determines optimal placement automatically */
AOM_KF_DISABLED = 0 /**< Encoder does not place keyframes. */
};
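Because AOM_KF_DISABLED is explicitly assigned 0, it shares its value with the deprecated AOM_KF_FIXED, and AOM_KF_AUTO is 1. The sketch below uses a local copy of the enum to make that aliasing visible; the helper `kf_placement_is_automatic` is hypothetical and only illustrates that code cannot tell the two zero-valued names apart (a `switch` listing both would not compile, since the case values would be duplicates).

```c
#include <assert.h>

/* Local copy of the enum shown in aom_encoder.h above. */
enum aom_kf_mode {
  AOM_KF_FIXED,       /* 0, deprecated */
  AOM_KF_AUTO,        /* 1 */
  AOM_KF_DISABLED = 0 /* same value as AOM_KF_FIXED */
};

/* Hypothetical helper: non-zero when the encoder chooses keyframe placement. */
int kf_placement_is_automatic(enum aom_kf_mode mode) {
  /* Only two distinct values exist: 0 (fixed/disabled) and 1 (auto). */
  assert(mode == AOM_KF_AUTO || mode == AOM_KF_DISABLED);
  return mode == AOM_KF_AUTO;
}
```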
/*!\brief Encoded Frame Flags
*
* This type indicates a bitfield to be passed to aom_codec_encode(), defining
* per-frame boolean values. By convention, bits common to all codecs will be
* named AOM_EFLAG_*, and bits specific to an algorithm will be named
* /algo/_eflag_*. The lower order 16 bits are reserved for common use.
*/
typedef long aom_enc_frame_flags_t;
#define AOM_EFLAG_FORCE_KF (1 << 0) /**< Force this frame to be a keyframe */
/*!\brief Encoder configuration structure
*
* This structure contains the encoder settings that have common representations
* across all codecs. This doesn't imply that all codecs support all features,
* however.
*/
typedef struct aom_codec_enc_cfg {
/*
* generic settings (g)
*/
/*!\brief Algorithm specific "usage" value
*
* Algorithms may define multiple values for usage, which may convey the
* intent of how the application intends to use the stream. If this value
* is non-zero, consult the documentation for the codec to determine its
* meaning.
*/
unsigned int g_usage;
/*!\brief Maximum number of threads to use
*
* For multi-threaded implementations, use no more than this number of
* threads. The codec may use fewer threads than allowed. The value
* 0 is equivalent to the value 1.
*/
unsigned int g_threads;
/*!\brief Bitstream profile to use
*
* Some codecs support a notion of multiple bitstream profiles. Typically
* this maps to a set of features that are turned on or off. Often the
* profile to use is determined by the features of the intended decoder.
* Consult the documentation for the codec to determine the valid values
* for this parameter, or set to zero for a sane default.
*/
unsigned int g_profile; /**< profile of bitstream to use */
/*!\brief Width of the frame
*
* This value identifies the presentation resolution of the frame,
* in pixels. Note that the frames passed as input to the encoder must
* have this resolution. Frames will be presented by the decoder in this
* resolution, independent of any spatial resampling the encoder may do.
*/
unsigned int g_w;
/*!\brief Height of the frame
*
* This value identifies the presentation resolution of the frame,
* in pixels. Note that the frames passed as input to the encoder must
* have this resolution. Frames will be presented by the decoder in this
* resolution, independent of any spatial resampling the encoder may do.
*/
unsigned int g_h;
/*!\brief Bit-depth of the codec
*
* This value identifies the bit_depth of the codec.
* Only certain bit-depths are supported as identified in the
* aom_bit_depth_t enum.
*/
aom_bit_depth_t g_bit_depth;
/*!\brief Bit-depth of the input frames
*
* This value identifies the bit_depth of the input frames in bits.
* Note that the frames passed as input to the encoder must have
* this bit-depth.
*/
unsigned int g_input_bit_depth;
/*!\brief Stream timebase units
*
* Indicates the smallest interval of time, in seconds, used by the stream.
* For fixed frame rate material, or variable frame rate material where
* frames are timed at a multiple of a given clock (ex: video capture),
* the \ref RECOMMENDED method is to set the timebase to the reciprocal
* of the frame rate (ex: 1001/30000 for 29.970 Hz NTSC). This allows the
* pts to correspond to the frame number, which can be handy. For
* re-encoding video from containers with absolute time timestamps, the
* \ref RECOMMENDED method is to set the timebase to that of the parent
* container or multimedia framework (ex: 1/1000 for ms, as in FLV).
*/
struct aom_rational g_timebase;
/*!\brief Enable error resilient modes.
*
* The error resilient bitfield indicates to the encoder which features
* it should enable to take measures for streaming over lossy or noisy
* links.
*/
aom_codec_er_flags_t g_error_resilient;
/*!\brief Multi-pass Encoding Mode
*
* This value should be set to the current phase for multi-pass encoding.
* For single pass, set to #AOM_RC_ONE_PASS.
*/
enum aom_enc_pass g_pass;
/*!\brief Allow lagged encoding
*
* If set, this value allows the encoder to consume a number of input
* frames before producing output frames. This allows the encoder to
* base decisions for the current frame on future frames. This does
* increase the latency of the encoding pipeline, so it is not appropriate
* in all situations (ex: realtime encoding).
*
* Note that this is a maximum value -- the encoder may produce frames
* sooner than the given limit. Set this value to 0 to disable this
* feature.
*/
unsigned int g_lag_in_frames;
/*
* rate control settings (rc)
*/
/*!\brief Temporal resampling configuration, if supported by the codec.
*
* Temporal resampling allows the codec to "drop" frames as a strategy to
* meet its target data rate. This can cause temporal discontinuities in
* the encoded video, which may appear as stuttering during playback. This
* trade-off is often acceptable, but for many applications is not. It can
* be disabled in these cases.
*
* Note that not all codecs support this feature. All aom AVx codecs do.
* For other codecs, consult the documentation for that algorithm.
*
* This threshold is described as a percentage of the target data buffer.
* When the data buffer falls below this percentage of fullness, a
* dropped frame is indicated. Set the threshold to zero (0) to disable
* this feature.
*/
unsigned int rc_dropframe_thresh;
/*!\brief Enable/disable spatial resampling, if supported by the codec.
*
* Spatial resampling allows the codec to compress a lower resolution
* version of the frame, which is then upscaled by the decoder to the
* correct presentation resolution. This increases visual quality at
* low data rates, at the expense of CPU time on the encoder/decoder.
*/
unsigned int rc_resize_allowed;
/*!\brief Internal coded frame width.
*
* If spatial resampling is enabled this specifies the width of the
* encoded frame.
*/
unsigned int rc_scaled_width;
/*!\brief Internal coded frame height.
*
* If spatial resampling is enabled this specifies the height of the
* encoded frame.
*/
unsigned int rc_scaled_height;
/*!\brief Spatial resampling up watermark.
*
* This threshold is described as a percentage of the target data buffer.
* When the data buffer rises above this percentage of fullness, the
* encoder will step up to a higher resolution version of the frame.
*/
unsigned int rc_resize_up_thresh;
/*!\brief Spatial resampling down watermark.
*
* This threshold is described as a percentage of the target data buffer.
* When the data buffer falls below this percentage of fullness, the
* encoder will step down to a lower resolution version of the frame.
*/
unsigned int rc_resize_down_thresh;
/*!\brief Rate control algorithm to use.
*
* Indicates whether the end usage of this stream is to be streamed over
* a bandwidth constrained link, indicating that Constant Bit Rate (CBR)
* mode should be used, or whether it will be played back on a high
* bandwidth link, as from a local disk, where higher variations in
* bitrate are acceptable.
*/
enum aom_rc_mode rc_end_usage;
/*!\brief Two-pass stats buffer.
*
* A buffer containing all of the stats packets produced in the first
* pass, concatenated.
*/
aom_fixed_buf_t rc_twopass_stats_in;
/*!\brief first pass mb stats buffer.
*
* A buffer containing all of the first pass mb stats packets produced
* in the first pass, concatenated.
*/
aom_fixed_buf_t rc_firstpass_mb_stats_in;
/*!\brief Target data rate
*
* Target bandwidth to use for this stream, in kilobits per second.
*/
unsigned int rc_target_bitrate;
/*
* quantizer settings
*/
/*!\brief Minimum (Best Quality) Quantizer
*
* The quantizer is the most direct control over the quality of the
* encoded image. The range of valid values for the quantizer is codec
* specific. Consult the documentation for the codec to determine the
* values to use. To determine the range programmatically, call
* aom_codec_enc_config_default() with a usage value of 0.
*/
unsigned int rc_min_quantizer;
/*!\brief Maximum (Worst Quality) Quantizer
*
* The quantizer is the most direct control over the quality of the
* encoded image. The range of valid values for the quantizer is codec
* specific. Consult the documentation for the codec to determine the
* values to use. To determine the range programmatically, call
* aom_codec_enc_config_default() with a usage value of 0.
*/
unsigned int rc_max_quantizer;
/*
* bitrate tolerance
*/
/*!\brief Rate control adaptation undershoot control
*
* This value, expressed as a percentage of the target bitrate,
* controls the maximum allowed adaptation speed of the codec.
* This factor controls the maximum amount of bits that can
* be subtracted from the target bitrate in order to compensate
* for prior overshoot.
*
* Valid values in the range 0-1000.
*/
unsigned int rc_undershoot_pct;
/*!\brief Rate control adaptation overshoot control
*
* This value, expressed as a percentage of the target bitrate,
* controls the maximum allowed adaptation speed of the codec.
* This factor controls the maximum amount of bits that can
* be added to the target bitrate in order to compensate for
* prior undershoot.
*
* Valid values in the range 0-1000.
*/
unsigned int rc_overshoot_pct;
/*
* decoder buffer model parameters
*/
/*!\brief Decoder Buffer Size
*
* This value indicates the amount of data that may be buffered by the
* decoding application. Note that this value is expressed in units of
* time (milliseconds). For example, a value of 5000 indicates that the
* client will buffer (at least) 5000ms worth of encoded data. Use the
* target bitrate (#rc_target_bitrate) to convert to bits/bytes, if
* necessary.
*/
unsigned int rc_buf_sz;
/*!\brief Decoder Buffer Initial Size
*
* This value indicates the amount of data that will be buffered by the
* decoding application prior to beginning playback. This value is
* expressed in units of time (milliseconds). Use the target bitrate
* (#rc_target_bitrate) to convert to bits/bytes, if necessary.
*/
unsigned int rc_buf_initial_sz;
/*!\brief Decoder Buffer Optimal Size
*
* This value indicates the amount of data that the encoder should try
* to maintain in the decoder's buffer. This value is expressed in units
* of time (milliseconds). Use the target bitrate (#rc_target_bitrate)
* to convert to bits/bytes, if necessary.
*/
unsigned int rc_buf_optimal_sz;
/*
* 2 pass rate control parameters
*/
/*!\brief Two-pass mode CBR/VBR bias
*
* Bias, expressed on a scale of 0 to 100, for determining target size
* for the current frame. The value 0 indicates the optimal CBR mode
* value should be used. The value 100 indicates the optimal VBR mode
* value should be used. Values in between indicate which way the
* encoder should "lean."
*/
unsigned int rc_2pass_vbr_bias_pct;
/*!\brief Two-pass mode per-GOP minimum bitrate
*
* This value, expressed as a percentage of the target bitrate, indicates
* the minimum bitrate to be used for a single GOP (aka "section")
*/
unsigned int rc_2pass_vbr_minsection_pct;
/*!\brief Two-pass mode per-GOP maximum bitrate
*
* This value, expressed as a percentage of the target bitrate, indicates
* the maximum bitrate to be used for a single GOP (aka "section")
*/
unsigned int rc_2pass_vbr_maxsection_pct;
/*
* keyframing settings (kf)
*/
/*!\brief Keyframe placement mode
*
* This value indicates whether the encoder should place keyframes at a
* fixed interval, or determine the optimal placement automatically
* (as governed by the #kf_min_dist and #kf_max_dist parameters)
*/
enum aom_kf_mode kf_mode;
/*!\brief Keyframe minimum interval
*
* This value, expressed as a number of frames, prevents the encoder from
* placing a keyframe nearer than kf_min_dist to the previous keyframe. At
* least kf_min_dist non-keyframes will be coded before the next
* keyframe. Set kf_min_dist equal to kf_max_dist for a fixed interval.
*/
unsigned int kf_min_dist;
/*!\brief Keyframe maximum interval
*
* This value, expressed as a number of frames, forces the encoder to code
* a keyframe if one has not been coded in the last kf_max_dist frames.
* A value of 0 implies all frames will be keyframes. Set kf_min_dist
* equal to kf_max_dist for a fixed interval.
*/
unsigned int kf_max_dist;
} aom_codec_enc_cfg_t; /**< alias for struct aom_codec_enc_cfg */
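The rc_buf_sz, rc_buf_initial_sz, and rc_buf_optimal_sz fields are given in milliseconds and are converted to bits through the target bitrate. A minimal sketch of that conversion follows. It assumes rc_target_bitrate is expressed in kilobits per second (the unit is not stated in this excerpt), and the helper names are hypothetical, not part of the aom API.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the aom API.
 * Assumes the target bitrate is in kilobits per second, so
 * bits = (kbps * 1000 bits/s) * (ms / 1000 s) = kbps * ms. */
static uint64_t buf_ms_to_bits(unsigned int buf_ms, unsigned int target_kbps) {
  return (uint64_t)buf_ms * (uint64_t)target_kbps;
}

/* Same quantity in whole bytes, rounded down. */
static uint64_t buf_ms_to_bytes(unsigned int buf_ms, unsigned int target_kbps) {
  return buf_ms_to_bits(buf_ms, target_kbps) / 8;
}
```

Under that assumption, rc_buf_sz = 5000 with a 1000 kbps target corresponds to 5,000,000 bits, or 625,000 bytes.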
/*!\brief Initialize an encoder instance
*
* Initializes an encoder context using the given interface. Applications
* should call the aom_codec_enc_init convenience macro instead of this
* function directly, to ensure that the ABI version number parameter
* is properly initialized.
*
* If the library was configured with --disable-multithread, this call
* is not thread safe and should be guarded with a lock if being used
* in a multithreaded context.
*
* \param[in] ctx Pointer to this instance's context.
* \param[in] iface Pointer to the algorithm interface to use.
* \param[in] cfg Configuration to use, if known. May be NULL.
* \param[in] flags Bitfield of AOM_CODEC_USE_* flags
* \param[in] ver ABI version number. Must be set to
* AOM_ENCODER_ABI_VERSION
* \retval #AOM_CODEC_OK
* The encoder algorithm initialized.
* \retval #AOM_CODEC_MEM_ERROR
* Memory allocation failed.
*/
aom_codec_err_t aom_codec_enc_init_ver(aom_codec_ctx_t *ctx,
aom_codec_iface_t *iface,
const aom_codec_enc_cfg_t *cfg,
aom_codec_flags_t flags, int ver);
/*!\brief Convenience macro for aom_codec_enc_init_ver()
*
* Ensures the ABI version parameter is properly set.
*/
#define aom_codec_enc_init(ctx, iface, cfg, flags) \
aom_codec_enc_init_ver(ctx, iface, cfg, flags, AOM_ENCODER_ABI_VERSION)
/*!\brief Initialize multi-encoder instance
*
* Initializes multi-encoder context using the given interface.
* Applications should call the aom_codec_enc_init_multi convenience macro
* instead of this function directly, to ensure that the ABI version number
* parameter is properly initialized.
*
* \param[in] ctx Pointer to this instance's context.
* \param[in] iface Pointer to the algorithm interface to use.
* \param[in] cfg Configuration to use, if known. May be NULL.
* \param[in] num_enc Total number of encoders.
* \param[in] flags Bitfield of AOM_CODEC_USE_* flags
* \param[in] dsf Pointer to down-sampling factors.
* \param[in] ver ABI version number. Must be set to
* AOM_ENCODER_ABI_VERSION
* \retval #AOM_CODEC_OK
* The encoder algorithm initialized.
* \retval #AOM_CODEC_MEM_ERROR
* Memory allocation failed.
*/
aom_codec_err_t aom_codec_enc_init_multi_ver(
aom_codec_ctx_t *ctx, aom_codec_iface_t *iface, aom_codec_enc_cfg_t *cfg,
int num_enc, aom_codec_flags_t flags, aom_rational_t *dsf, int ver);
/*!\brief Convenience macro for aom_codec_enc_init_multi_ver()
*
* Ensures the ABI version parameter is properly set.
*/
#define aom_codec_enc_init_multi(ctx, iface, cfg, num_enc, flags, dsf) \
aom_codec_enc_init_multi_ver(ctx, iface, cfg, num_enc, flags, dsf, \
AOM_ENCODER_ABI_VERSION)
/*!\brief Get a default configuration
*
* Initializes an encoder configuration structure with default values. Supports
* the notion of "usages" so that an algorithm may offer different default
* settings depending on the user's intended goal. This function \ref SHOULD
* be called by all applications to initialize the configuration structure
* before specializing the configuration with application specific values.
*
* \param[in] iface Pointer to the algorithm interface to use.
* \param[out] cfg Configuration buffer to populate.
* \param[in] reserved Must be set to 0 for VP8 and AV1.
*
* \retval #AOM_CODEC_OK
* The configuration was populated.
* \retval #AOM_CODEC_INCAPABLE
* Interface is not an encoder interface.
* \retval #AOM_CODEC_INVALID_PARAM
* A parameter was NULL, or the usage value was not recognized.
*/
aom_codec_err_t aom_codec_enc_config_default(aom_codec_iface_t *iface,
aom_codec_enc_cfg_t *cfg,
unsigned int reserved);
/*!\brief Set or change configuration
*
* Reconfigures an encoder instance according to the given configuration.
*
* \param[in] ctx Pointer to this instance's context
* \param[in] cfg Configuration buffer to use
*
* \retval #AOM_CODEC_OK
* The configuration was applied.
* \retval #AOM_CODEC_INCAPABLE
* Interface is not an encoder interface.
* \retval #AOM_CODEC_INVALID_PARAM
* A parameter was NULL, or the usage value was not recognized.
*/
aom_codec_err_t aom_codec_enc_config_set(aom_codec_ctx_t *ctx,
const aom_codec_enc_cfg_t *cfg);
/*!\brief Get global stream headers
*
* Retrieves a stream level global header packet, if supported by the codec.
*
* \param[in] ctx Pointer to this instance's context
*
* \retval NULL
* Encoder does not support global header
* \retval Non-NULL
* Pointer to buffer containing global header packet
*/
aom_fixed_buf_t *aom_codec_get_global_headers(aom_codec_ctx_t *ctx);
/*!\brief deadline parameter analogous to AVx REALTIME mode. */
#define AOM_DL_REALTIME (1)
/*!\brief deadline parameter analogous to AVx GOOD QUALITY mode. */
#define AOM_DL_GOOD_QUALITY (1000000)
/*!\brief deadline parameter analogous to AVx BEST QUALITY mode. */
#define AOM_DL_BEST_QUALITY (0)
/*!\brief Encode a frame
*
* Encodes a video frame at the given "presentation time." The presentation
* time stamp (PTS) \ref MUST be strictly increasing.
*
* The encoder supports the notion of a soft real-time deadline. Given a
* non-zero value to the deadline parameter, the encoder will make a "best
* effort" guarantee to return before the given time slice expires. It is
* implicit that limiting the available time to encode will degrade the
* output quality. The encoder can be given an unlimited time to produce the
* best possible frame by specifying a deadline of '0'. This deadline
* supersedes the AVx notion of "best quality, good quality, realtime".
* Applications that wish to map these former settings to the new deadline
* based system can use the symbols #AOM_DL_REALTIME, #AOM_DL_GOOD_QUALITY,
* and #AOM_DL_BEST_QUALITY.
*
* When the last frame has been passed to the encoder, this function should
* continue to be called, with the img parameter set to NULL. This will
* signal the end-of-stream condition to the encoder and allow it to encode
* any held buffers. Encoding is complete when aom_codec_encode() is called
* and aom_codec_get_cx_data() returns no data.
*
* \param[in] ctx Pointer to this instance's context
* \param[in] img Image data to encode, NULL to flush.
* \param[in] pts Presentation time stamp, in timebase units.
* \param[in] duration Duration to show frame, in timebase units.
* \param[in] flags Flags to use for encoding this frame.
* \param[in] deadline Time to spend encoding, in microseconds. (0=infinite)
*
* \retval #AOM_CODEC_OK
* The frame was submitted to the encoder without error.
* \retval #AOM_CODEC_INCAPABLE
* Interface is not an encoder interface.
* \retval #AOM_CODEC_INVALID_PARAM
* A parameter was NULL, the image format is unsupported, etc.
*/
aom_codec_err_t aom_codec_encode(aom_codec_ctx_t *ctx, const aom_image_t *img,
aom_codec_pts_t pts, unsigned long duration,
aom_enc_frame_flags_t flags,
unsigned long deadline);
/*!\brief Set compressed data output buffer
*
* Sets the buffer that the codec should output the compressed data
* into. This call effectively sets the buffer pointer returned in the
* next AOM_CODEC_CX_FRAME_PKT packet. Subsequent packets will be
* appended into this buffer. The buffer is preserved across frames,
* so applications must periodically call this function after flushing
* the accumulated compressed data to disk or to the network to reset
* the pointer to the buffer's head.
*
* `pad_before` bytes will be skipped before writing the compressed
* data, and `pad_after` bytes will be appended to the packet. The size
* of the packet will be the sum of the size of the actual compressed
* data, pad_before, and pad_after. The padding bytes will be preserved
* (not overwritten).
*
* Note that calling this function does not guarantee that the returned
* compressed data will be placed into the specified buffer. In the
* event that the encoded data will not fit into the buffer provided,
* the returned packet \ref MAY point to an internal buffer, as it would
* if this call were never used. In this event, the output packet will
* NOT have any padding, and the application must free space and copy it
* to the proper place. This is of particular note in configurations
* that may output multiple packets for a single encoded frame (e.g., lagged
* encoding) or if the application does not reset the buffer periodically.
*
* Applications may restore the default behavior of the codec providing
* the compressed data buffer by calling this function with a NULL
* buffer.
*
* Applications \ref MUSTNOT call this function during iteration of
* aom_codec_get_cx_data().
*
* \param[in] ctx Pointer to this instance's context
* \param[in] buf Buffer to store compressed data into
* \param[in] pad_before Bytes to skip before writing compressed data
* \param[in] pad_after Bytes to skip after writing compressed data
*
* \retval #AOM_CODEC_OK
* The buffer was set successfully.
* \retval #AOM_CODEC_INVALID_PARAM
* A parameter was NULL, the image format is unsupported, etc.
*/
aom_codec_err_t aom_codec_set_cx_data_buf(aom_codec_ctx_t *ctx,
const aom_fixed_buf_t *buf,
unsigned int pad_before,
unsigned int pad_after);
/*!\brief Encoded data iterator
*
* Iterates over a list of data packets to be passed from the encoder to the
* application. The different kinds of packets available are enumerated in
* #aom_codec_cx_pkt_kind.
*
* #AOM_CODEC_CX_FRAME_PKT packets should be passed to the application's
* muxer. Multiple compressed frames may be in the list.
* #AOM_CODEC_STATS_PKT packets should be appended to a global buffer.
*
* The application \ref MUST silently ignore any packet kinds that it does
* not recognize or support.
*
* The data buffers returned from this function are only guaranteed to be
* valid until the application makes another call to any aom_codec_* function.
*
* \param[in] ctx Pointer to this instance's context
* \param[in,out] iter Iterator storage, initialized to NULL
*
* \return Returns a pointer to an output data packet (compressed frame data,
* two-pass statistics, etc.) or NULL to signal end-of-list.
*
*/
const aom_codec_cx_pkt_t *aom_codec_get_cx_data(aom_codec_ctx_t *ctx,
aom_codec_iter_t *iter);
/*!\brief Get Preview Frame
*
* Returns an image that can be used as a preview. Shows the image as it would
* exist at the decompressor. The application \ref MUST NOT write into this
* image buffer.
*
* \param[in] ctx Pointer to this instance's context
*
* \return Returns a pointer to a preview image, or NULL if no image is
* available.
*
*/
const aom_image_t *aom_codec_get_preview_frame(aom_codec_ctx_t *ctx);
/*!@} - end defgroup encoder*/
#ifdef __cplusplus
}
#endif
#endif // AOM_AOM_ENCODER_H_


@@ -1,15 +1,16 @@
/* /*
* Copyright (c) 2014 The WebM project authors. All Rights Reserved. * Copyright (c) 2016, Alliance for Open Media. All rights reserved
* *
* Use of this source code is governed by a BSD-style license * This source code is subject to the terms of the BSD 2 Clause License and
* that can be found in the LICENSE file in the root of the source * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* tree. An additional intellectual property rights grant can be found * was not distributed with this source code in the LICENSE file, you can
* in the file PATENTS. All contributing project authors may * obtain it at www.aomedia.org/license/software. If the Alliance for Open
* be found in the AUTHORS file in the root of the source tree. * Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/ */
#ifndef VPX_VPX_FRAME_BUFFER_H_ #ifndef AOM_AOM_FRAME_BUFFER_H_
#define VPX_VPX_FRAME_BUFFER_H_ #define AOM_AOM_FRAME_BUFFER_H_
/*!\file /*!\file
* \brief Describes the decoder external frame buffer interface. * \brief Describes the decoder external frame buffer interface.
@@ -19,28 +20,28 @@
extern "C" { extern "C" {
#endif #endif
#include "./vpx_integer.h" #include "./aom_integer.h"
/*!\brief The maximum number of work buffers used by libvpx. /*!\brief The maximum number of work buffers used by libaom.
* Support maximum 4 threads to decode video in parallel. * Support maximum 4 threads to decode video in parallel.
* Each thread will use one work buffer. * Each thread will use one work buffer.
* TODO(hkuang): Add support to set number of worker threads dynamically. * TODO(hkuang): Add support to set number of worker threads dynamically.
*/ */
#define VPX_MAXIMUM_WORK_BUFFERS 8 #define AOM_MAXIMUM_WORK_BUFFERS 8
/*!\brief The maximum number of reference buffers that a VP9 encoder may use. /*!\brief The maximum number of reference buffers that an AV1 encoder may use.
*/ */
#define VPX_MAXIMUM_REF_BUFFERS 8 #define AOM_MAXIMUM_REF_BUFFERS 8
/*!\brief External frame buffer /*!\brief External frame buffer
* *
* This structure holds allocated frame buffers used by the decoder. * This structure holds allocated frame buffers used by the decoder.
*/ */
typedef struct vpx_codec_frame_buffer { typedef struct aom_codec_frame_buffer {
uint8_t *data; /**< Pointer to the data buffer */ uint8_t *data; /**< Pointer to the data buffer */
size_t size; /**< Size of data in bytes */ size_t size; /**< Size of data in bytes */
void *priv; /**< Frame's private data */ void *priv; /**< Frame's private data */
} vpx_codec_frame_buffer_t; } aom_codec_frame_buffer_t;
/*!\brief get frame buffer callback prototype /*!\brief get frame buffer callback prototype
* *
@@ -51,17 +52,17 @@ typedef struct vpx_codec_frame_buffer {
* to the allocated size. The application does not need to align the allocated * to the allocated size. The application does not need to align the allocated
* data. The callback is triggered when the decoder needs a frame buffer to * data. The callback is triggered when the decoder needs a frame buffer to
* decode a compressed image into. This function may be called more than once * decode a compressed image into. This function may be called more than once
* for every call to vpx_codec_decode. The application may set fb->priv to * for every call to aom_codec_decode. The application may set fb->priv to
* some data which will be passed back in the ximage and the release function * some data which will be passed back in the ximage and the release function
* call. |fb| is guaranteed to not be NULL. On success the callback must * call. |fb| is guaranteed to not be NULL. On success the callback must
* return 0. On any failure the callback must return a value less than 0. * return 0. On any failure the callback must return a value less than 0.
* *
* \param[in] priv Callback's private data * \param[in] priv Callback's private data
* \param[in] min_size Minimum size in bytes needed by the buffer * \param[in] min_size Minimum size in bytes needed by the buffer
* \param[in,out] fb Pointer to vpx_codec_frame_buffer_t * \param[in,out] fb Pointer to aom_codec_frame_buffer_t
*/ */
typedef int (*vpx_get_frame_buffer_cb_fn_t)( typedef int (*aom_get_frame_buffer_cb_fn_t)(void *priv, size_t min_size,
void *priv, size_t min_size, vpx_codec_frame_buffer_t *fb); aom_codec_frame_buffer_t *fb);
/*!\brief release frame buffer callback prototype /*!\brief release frame buffer callback prototype
* *
@@ -71,13 +72,13 @@ typedef int (*vpx_get_frame_buffer_cb_fn_t)(
* a value less than 0. * a value less than 0.
* *
* \param[in] priv Callback's private data * \param[in] priv Callback's private data
* \param[in] fb Pointer to vpx_codec_frame_buffer_t * \param[in] fb Pointer to aom_codec_frame_buffer_t
*/ */
typedef int (*vpx_release_frame_buffer_cb_fn_t)( typedef int (*aom_release_frame_buffer_cb_fn_t)(void *priv,
void *priv, vpx_codec_frame_buffer_t *fb); aom_codec_frame_buffer_t *fb);
#ifdef __cplusplus #ifdef __cplusplus
} // extern "C" } // extern "C"
#endif #endif
#endif // VPX_VPX_FRAME_BUFFER_H_ #endif // AOM_AOM_FRAME_BUFFER_H_
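The get/release callback contract described in this header can be illustrated with a small pair of callbacks. This is only a sketch: the struct is a local mirror of aom_codec_frame_buffer_t so that it builds without the aom headers, all `example_` names are hypothetical, and it assumes the application simply frees memory on release, where a real application would more likely recycle buffers from a pool.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirror of aom_codec_frame_buffer_t so the sketch compiles alone. */
typedef struct example_frame_buffer {
  uint8_t *data; /* Pointer to the data buffer */
  size_t size;   /* Size of data in bytes */
  void *priv;    /* Frame's private data */
} example_frame_buffer_t;

/* Get callback: provide at least min_size bytes. Returns 0 on success and a
 * negative value on failure. calloc() is used so the memory starts zeroed. */
static int example_get_frame_buffer(void *priv, size_t min_size,
                                    example_frame_buffer_t *fb) {
  uint8_t *data = (uint8_t *)calloc(min_size ? min_size : 1, 1);
  (void)priv; /* no application state is needed in this sketch */
  if (data == NULL) return -1;
  fb->data = data;
  fb->size = min_size;
  fb->priv = NULL;
  return 0;
}

/* Release callback: the decoder no longer needs the buffer. Returns 0 on
 * success. This sketch frees the memory instead of pooling it. */
static int example_release_frame_buffer(void *priv,
                                        example_frame_buffer_t *fb) {
  (void)priv;
  free(fb->data);
  fb->data = NULL;
  fb->size = 0;
  return 0;
}
```

Callbacks written against the real header would use aom_codec_frame_buffer_t in place of the mirror struct, matching the aom_get_frame_buffer_cb_fn_t and aom_release_frame_buffer_cb_fn_t prototypes.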

225
aom/aom_image.h Normal file

@@ -0,0 +1,225 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
/*!\file
* \brief Describes the aom image descriptor and associated operations
*
*/
#ifndef AOM_AOM_IMAGE_H_
#define AOM_AOM_IMAGE_H_
#ifdef __cplusplus
extern "C" {
#endif
/*!\brief Current ABI version number
*
* \internal
* If this file is altered in any way that changes the ABI, this value
* must be bumped. Examples include, but are not limited to, changing
* types, removing or reassigning enums, adding/removing/rearranging
* fields to structures
*/
#define AOM_IMAGE_ABI_VERSION (4) /**<\hideinitializer*/
#define AOM_IMG_FMT_PLANAR 0x100 /**< Image is a planar format. */
#define AOM_IMG_FMT_UV_FLIP 0x200 /**< V plane precedes U in memory. */
#define AOM_IMG_FMT_HAS_ALPHA 0x400 /**< Image has an alpha channel. */
#define AOM_IMG_FMT_HIGHBITDEPTH 0x800 /**< Image uses 16bit framebuffer. */
/*!\brief List of supported image formats */
typedef enum aom_img_fmt {
AOM_IMG_FMT_NONE,
AOM_IMG_FMT_RGB24, /**< 24 bit per pixel packed RGB */
AOM_IMG_FMT_RGB32, /**< 32 bit per pixel packed 0RGB */
AOM_IMG_FMT_RGB565, /**< 16 bit per pixel, 565 */
AOM_IMG_FMT_RGB555, /**< 16 bit per pixel, 555 */
AOM_IMG_FMT_UYVY, /**< UYVY packed YUV */
AOM_IMG_FMT_YUY2, /**< YUYV packed YUV */
AOM_IMG_FMT_YVYU, /**< YVYU packed YUV */
AOM_IMG_FMT_BGR24, /**< 24 bit per pixel packed BGR */
AOM_IMG_FMT_RGB32_LE, /**< 32 bit packed BGR0 */
AOM_IMG_FMT_ARGB, /**< 32 bit packed ARGB, alpha=255 */
AOM_IMG_FMT_ARGB_LE, /**< 32 bit packed BGRA, alpha=255 */
AOM_IMG_FMT_RGB565_LE, /**< 16 bit per pixel, gggbbbbb rrrrrggg */
AOM_IMG_FMT_RGB555_LE, /**< 16 bit per pixel, gggbbbbb 0rrrrrgg */
AOM_IMG_FMT_YV12 =
AOM_IMG_FMT_PLANAR | AOM_IMG_FMT_UV_FLIP | 1, /**< planar YVU */
AOM_IMG_FMT_I420 = AOM_IMG_FMT_PLANAR | 2,
AOM_IMG_FMT_AOMYV12 = AOM_IMG_FMT_PLANAR | AOM_IMG_FMT_UV_FLIP |
3, /**< planar 4:2:0 format with aom color space */
AOM_IMG_FMT_AOMI420 = AOM_IMG_FMT_PLANAR | 4,
AOM_IMG_FMT_I422 = AOM_IMG_FMT_PLANAR | 5,
AOM_IMG_FMT_I444 = AOM_IMG_FMT_PLANAR | 6,
AOM_IMG_FMT_I440 = AOM_IMG_FMT_PLANAR | 7,
AOM_IMG_FMT_444A = AOM_IMG_FMT_PLANAR | AOM_IMG_FMT_HAS_ALPHA | 6,
AOM_IMG_FMT_I42016 = AOM_IMG_FMT_I420 | AOM_IMG_FMT_HIGHBITDEPTH,
AOM_IMG_FMT_I42216 = AOM_IMG_FMT_I422 | AOM_IMG_FMT_HIGHBITDEPTH,
AOM_IMG_FMT_I44416 = AOM_IMG_FMT_I444 | AOM_IMG_FMT_HIGHBITDEPTH,
AOM_IMG_FMT_I44016 = AOM_IMG_FMT_I440 | AOM_IMG_FMT_HIGHBITDEPTH
} aom_img_fmt_t; /**< alias for enum aom_img_fmt */
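The planar formats in aom_img_fmt are built by OR-ing layout flags with a small index, so a format value can be inspected with bit tests. The sketch below copies a few of the values into `EX_`-prefixed macros (hypothetical names) so it builds without the aom headers. The helper functions are illustrative and not part of the aom API.

```c
/* Local copies of values from the enum, so the sketch builds on its own. */
#define EX_IMG_FMT_PLANAR 0x100
#define EX_IMG_FMT_UV_FLIP 0x200
#define EX_IMG_FMT_HAS_ALPHA 0x400
#define EX_IMG_FMT_HIGHBITDEPTH 0x800
#define EX_IMG_FMT_YV12 (EX_IMG_FMT_PLANAR | EX_IMG_FMT_UV_FLIP | 1)
#define EX_IMG_FMT_I420 (EX_IMG_FMT_PLANAR | 2)
#define EX_IMG_FMT_I42016 (EX_IMG_FMT_I420 | EX_IMG_FMT_HIGHBITDEPTH)

/* Nonzero when the format stores each component in its own plane. */
static int ex_fmt_is_planar(int fmt) {
  return (fmt & EX_IMG_FMT_PLANAR) != 0;
}

/* Nonzero when the V plane precedes the U plane in memory. */
static int ex_fmt_uv_swapped(int fmt) {
  return (fmt & EX_IMG_FMT_UV_FLIP) != 0;
}

/* Nonzero when samples are stored in 16-bit containers. */
static int ex_fmt_is_high_bitdepth(int fmt) {
  return (fmt & EX_IMG_FMT_HIGHBITDEPTH) != 0;
}

/* Strip the high-bit-depth flag to recover the 8-bit base layout. */
static int ex_fmt_base(int fmt) { return fmt & ~EX_IMG_FMT_HIGHBITDEPTH; }
```

For instance, the value of AOM_IMG_FMT_I42016 works out to 0x902, and clearing the high-bit-depth flag yields the value of AOM_IMG_FMT_I420 (0x102).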
/*!\brief List of supported color spaces */
typedef enum aom_color_space {
AOM_CS_UNKNOWN = 0, /**< Unknown */
AOM_CS_BT_601 = 1, /**< BT.601 */
AOM_CS_BT_709 = 2, /**< BT.709 */
AOM_CS_SMPTE_170 = 3, /**< SMPTE.170 */
AOM_CS_SMPTE_240 = 4, /**< SMPTE.240 */
AOM_CS_BT_2020 = 5, /**< BT.2020 */
AOM_CS_RESERVED = 6, /**< Reserved */
AOM_CS_SRGB = 7 /**< sRGB */
} aom_color_space_t; /**< alias for enum aom_color_space */
/*!\brief List of supported color range */
typedef enum aom_color_range {
AOM_CR_STUDIO_RANGE = 0, /**< Y [16..235], UV [16..240] */
AOM_CR_FULL_RANGE = 1 /**< YUV/RGB [0..255] */
} aom_color_range_t; /**< alias for enum aom_color_range */
/**\brief Image Descriptor */
typedef struct aom_image {
aom_img_fmt_t fmt; /**< Image Format */
aom_color_space_t cs; /**< Color Space */
aom_color_range_t range; /**< Color Range */
/* Image storage dimensions */
unsigned int w; /**< Stored image width */
unsigned int h; /**< Stored image height */
unsigned int bit_depth; /**< Stored image bit-depth */
/* Image display dimensions */
unsigned int d_w; /**< Displayed image width */
unsigned int d_h; /**< Displayed image height */
/* Image intended rendering dimensions */
unsigned int r_w; /**< Intended rendering image width */
unsigned int r_h; /**< Intended rendering image height */
/* Chroma subsampling info */
unsigned int x_chroma_shift; /**< subsampling order, X */
unsigned int y_chroma_shift; /**< subsampling order, Y */
/* Image data pointers. */
#define AOM_PLANE_PACKED 0 /**< To be used for all packed formats */
#define AOM_PLANE_Y 0 /**< Y (Luminance) plane */
#define AOM_PLANE_U 1 /**< U (Chroma) plane */
#define AOM_PLANE_V 2 /**< V (Chroma) plane */
#define AOM_PLANE_ALPHA 3 /**< A (Transparency) plane */
unsigned char *planes[4]; /**< pointer to the top left pixel for each plane */
int stride[4]; /**< stride between rows for each plane */
int bps; /**< bits per sample (for packed formats) */
/*!\brief The following member may be set by the application to associate
* data with this image.
*/
void *user_priv;
/* The following members should be treated as private. */
unsigned char *img_data; /**< private */
int img_data_owner; /**< private */
int self_allocd; /**< private */
void *fb_priv; /**< Frame buffer data associated with the image. */
} aom_image_t; /**< alias for struct aom_image */
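The x_chroma_shift and y_chroma_shift fields give the chroma subsampling as a power-of-two shift, which determines the chroma plane dimensions from the luma dimensions. A minimal sketch, using a hypothetical helper and assuming the usual convention that 4:2:0 has both shifts equal to 1, 4:2:2 has only the horizontal shift equal to 1, and 4:4:4 has both equal to 0:

```c
/* Hypothetical helper, not part of the aom API: size of one chroma plane
 * dimension given the luma dimension and the subsampling shift. Rounds up
 * so that odd luma sizes are still fully covered. */
static unsigned int ex_chroma_dim(unsigned int luma_dim, unsigned int shift) {
  return (luma_dim + (1u << shift) - 1) >> shift;
}
```

With these assumptions a 1920x1080 4:2:0 image has 960x540 chroma planes, and an odd luma height of 1081 rounds up to 541 chroma rows.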
/**\brief Representation of a rectangle on a surface */
typedef struct aom_image_rect {
unsigned int x; /**< leftmost column */
unsigned int y; /**< topmost row */
unsigned int w; /**< width */
unsigned int h; /**< height */
} aom_image_rect_t; /**< alias for struct aom_image_rect */
/*!\brief Open a descriptor, allocating storage for the underlying image
*
* Returns a descriptor for storing an image of the given format. The
* storage for the descriptor is allocated on the heap.
*
* \param[in] img Pointer to storage for descriptor. If this parameter
* is NULL, the storage for the descriptor will be
* allocated on the heap.
* \param[in] fmt Format for the image
* \param[in] d_w Width of the image
* \param[in] d_h Height of the image
* \param[in] align Alignment, in bytes, of the image buffer and
* each row in the image (stride).
*
* \return Returns a pointer to the initialized image descriptor. If the img
* parameter is non-null, the value of the img parameter will be
* returned.
*/
aom_image_t *aom_img_alloc(aom_image_t *img, aom_img_fmt_t fmt,
unsigned int d_w, unsigned int d_h,
unsigned int align);
/*!\brief Open a descriptor, using existing storage for the underlying image
*
* Returns a descriptor for storing an image of the given format. The
* storage for descriptor has been allocated elsewhere, and a descriptor is
* desired to "wrap" that storage.
*
* \param[in] img Pointer to storage for descriptor. If this parameter
* is NULL, the storage for the descriptor will be
* allocated on the heap.
* \param[in] fmt Format for the image
* \param[in] d_w Width of the image
* \param[in] d_h Height of the image
* \param[in] align Alignment, in bytes, of each row in the image.
* \param[in] img_data Storage to use for the image
*
* \return Returns a pointer to the initialized image descriptor. If the img
* parameter is non-null, the value of the img parameter will be
* returned.
*/
aom_image_t *aom_img_wrap(aom_image_t *img, aom_img_fmt_t fmt, unsigned int d_w,
unsigned int d_h, unsigned int align,
unsigned char *img_data);
/*!\brief Set the rectangle identifying the displayed portion of the image
*
* Updates the displayed rectangle (aka viewport) on the image surface to
* match the specified coordinates and size.
*
* \param[in] img Image descriptor
* \param[in] x leftmost column
* \param[in] y topmost row
* \param[in] w width
* \param[in] h height
*
* \return 0 if the requested rectangle is valid, nonzero otherwise.
*/
int aom_img_set_rect(aom_image_t *img, unsigned int x, unsigned int y,
unsigned int w, unsigned int h);
/*!\brief Flip the image vertically (top for bottom)
*
* Adjusts the image descriptor's pointers and strides to make the image
* be referenced upside-down.
*
* \param[in] img Image descriptor
*/
void aom_img_flip(aom_image_t *img);
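Flipping by adjusting pointers and strides, as the aom_img_flip comment describes, can be sketched for a single plane. The `ex_plane_t` type and both helpers are hypothetical stand-ins for one entry of planes[] and stride[]. This illustrates the general technique and is not the library's implementation.

```c
#include <stddef.h>

/* Hypothetical single-plane view; aom_image_t keeps one pointer and one
 * stride per plane in planes[] and stride[]. */
typedef struct ex_plane {
  unsigned char *data; /* top-left sample of the plane */
  int stride;          /* bytes between consecutive rows, may be negative */
} ex_plane_t;

/* Flip vertically without copying: point at the last row and negate the
 * stride, so walking "down" the plane moves up through memory.
 * rows must be at least 1. */
static void ex_plane_flip(ex_plane_t *p, unsigned int rows) {
  p->data += (ptrdiff_t)(rows - 1) * p->stride;
  p->stride = -p->stride;
}

/* Address of the first sample in the given row of the (possibly flipped)
 * plane. */
static unsigned char *ex_plane_row(const ex_plane_t *p, unsigned int row) {
  return p->data + (ptrdiff_t)row * p->stride;
}
```

Because only the descriptor changes, flipping twice restores the original pointer and stride, and code that walks rows through the stride needs no special case for flipped images.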
/*!\brief Close an image descriptor
*
* Frees all allocated storage associated with an image descriptor.
*
* \param[in] img Image descriptor
*/
void aom_img_free(aom_image_t *img);
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_AOM_IMAGE_H_

64
aom/aom_integer.h Normal file

@@ -0,0 +1,64 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_AOM_INTEGER_H_
#define AOM_AOM_INTEGER_H_
/* get ptrdiff_t, size_t, wchar_t, NULL */
#include <stddef.h>
#if defined(_MSC_VER)
#define AOM_FORCE_INLINE __forceinline
#define AOM_INLINE __inline
#else
#define AOM_FORCE_INLINE __inline__ __attribute__((always_inline))
// TODO(jbb): Allow a way to force inline off for older compilers.
#define AOM_INLINE inline
#endif
#if defined(AOM_EMULATE_INTTYPES)
typedef signed char int8_t;
typedef signed short int16_t;
typedef signed int int32_t;
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
typedef unsigned int uint32_t;
#ifndef _UINTPTR_T_DEFINED
typedef size_t uintptr_t;
#endif
#else
/* Most platforms have the C99 standard integer types. */
#if defined(__cplusplus)
#if !defined(__STDC_FORMAT_MACROS)
#define __STDC_FORMAT_MACROS
#endif
#if !defined(__STDC_LIMIT_MACROS)
#define __STDC_LIMIT_MACROS
#endif
#endif // __cplusplus
#include <stdint.h>
#endif
/* VS2010 defines stdint.h, but not inttypes.h */
#if defined(_MSC_VER) && _MSC_VER < 1800
#define PRId64 "I64d"
#else
#include <inttypes.h>
#endif
#endif // AOM_AOM_INTEGER_H_
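The PRId64 definition at the end of aom_integer.h exists so that 64-bit values can be printed with one format string on every toolchain. A short sketch of that usage, assuming a C99 toolchain where <inttypes.h> is available (code inside the library would include aom_integer.h instead); the helper name is hypothetical:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a 64-bit value portably. PRId64 expands to
 * the conversion specifier that matches int64_t on the current platform. */
static int ex_format_pts(char *out, size_t out_size, int64_t pts) {
  return snprintf(out, out_size, "pts=%" PRId64, pts);
}
```

The string literal concatenation ("pts=%" followed by the macro) is what lets the same source line work whether the macro expands to "I64d" or to the standard library's own definition.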

759
aom/aomcx.h Normal file

@@ -0,0 +1,759 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_AOMCX_H_
#define AOM_AOMCX_H_
/*!\defgroup aom_encoder AOMedia AOM/AV1 Encoder
* \ingroup aom
*
* @{
*/
#include "./aom.h"
#include "./aom_encoder.h"
/*!\file
* \brief Provides definitions for using the AOM or AV1 encoder algorithm within
* the aom Codec Interface.
*/
#ifdef __cplusplus
extern "C" {
#endif
/*!\name Algorithm interface for AV1
*
* This interface provides the capability to encode raw AV1 streams.
* @{
*/
extern aom_codec_iface_t aom_codec_av1_cx_algo;
extern aom_codec_iface_t *aom_codec_av1_cx(void);
/*!@} - end algorithm interface member group*/
/*
* Algorithm Flags
*/
/*!\brief Don't reference the last frame
*
* When this flag is set, the encoder will not use the last frame as a
* predictor. When not set, the encoder will choose whether to use the
* last frame or not automatically.
*/
#define AOM_EFLAG_NO_REF_LAST (1 << 16)
/*!\brief Don't reference the golden frame
*
* When this flag is set, the encoder will not use the golden frame as a
* predictor. When not set, the encoder will choose whether to use the
* golden frame or not automatically.
*/
#define AOM_EFLAG_NO_REF_GF (1 << 17)
/*!\brief Don't reference the alternate reference frame
*
* When this flag is set, the encoder will not use the alt ref frame as a
* predictor. When not set, the encoder will choose whether to use the
* alt ref frame or not automatically.
*/
#define AOM_EFLAG_NO_REF_ARF (1 << 21)
/*!\brief Don't update the last frame
*
* When this flag is set, the encoder will not update the last frame with
* the contents of the current frame.
*/
#define AOM_EFLAG_NO_UPD_LAST (1 << 18)
/*!\brief Don't update the golden frame
*
* When this flag is set, the encoder will not update the golden frame with
* the contents of the current frame.
*/
#define AOM_EFLAG_NO_UPD_GF (1 << 22)
/*!\brief Don't update the alternate reference frame
*
* When this flag is set, the encoder will not update the alt ref frame with
* the contents of the current frame.
*/
#define AOM_EFLAG_NO_UPD_ARF (1 << 23)
/*!\brief Force golden frame update
*
* When this flag is set, the encoder copies the contents of the current frame
* to the golden frame buffer.
*/
#define AOM_EFLAG_FORCE_GF (1 << 19)
/*!\brief Force alternate reference frame update
*
* When this flag is set, the encoder copies the contents of the current frame
* to the alternate reference frame buffer.
*/
#define AOM_EFLAG_FORCE_ARF (1 << 24)
/*!\brief Disable entropy update
*
* When this flag is set, the encoder will not update its internal entropy
* model based on the entropy of this frame.
*/
#define AOM_EFLAG_NO_UPD_ENTROPY (1 << 20)
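The AOM_EFLAG_* values are independent bits, so they are combined with bitwise OR before being passed as the per-frame flags. The sketch below copies the values into `EX_`-prefixed macros (hypothetical names) so it builds without the aom headers, and `ex_frame_flags_t` is an assumed stand-in for aom_enc_frame_flags_t.

```c
/* Local copies of the flag values, so the sketch builds on its own. */
#define EX_EFLAG_NO_REF_GF (1 << 17)
#define EX_EFLAG_NO_UPD_LAST (1 << 18)
#define EX_EFLAG_NO_REF_ARF (1 << 21)
#define EX_EFLAG_NO_UPD_GF (1 << 22)
#define EX_EFLAG_NO_UPD_ARF (1 << 23)

/* Assumed stand-in for aom_enc_frame_flags_t. */
typedef long ex_frame_flags_t;

/* Flags for a frame that leaves all three reference buffers untouched, so
 * no later frame is predicted from it. */
static ex_frame_flags_t ex_flags_no_updates(void) {
  return EX_EFLAG_NO_UPD_LAST | EX_EFLAG_NO_UPD_GF | EX_EFLAG_NO_UPD_ARF;
}

/* Flags for a frame that may predict only from the last frame. */
static ex_frame_flags_t ex_flags_last_only(void) {
  return EX_EFLAG_NO_REF_GF | EX_EFLAG_NO_REF_ARF;
}
```

The two masks can themselves be OR-ed together for a frame that references only the last frame and updates nothing.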
/*!\brief AVx encoder control functions
*
* This set of enumerators defines the control functions available for the AVx
* encoder interface.
*
* \sa #aom_codec_control
*/
enum aome_enc_control_id {
/*!\brief Codec control function to set which reference frame the encoder can use.
*
* Supported in codecs: VP8, AV1
*/
AOME_USE_REFERENCE = 7,
/*!\brief Codec control function to pass an ROI map to encoder.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_ROI_MAP = 8,
/*!\brief Codec control function to pass an Active map to encoder.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_ACTIVEMAP,
/*!\brief Codec control function to set encoder scaling mode.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_SCALEMODE = 11,
/*!\brief Codec control function to set encoder internal speed settings.
*
* Changes in this value influence, among others, the encoder's selection
* of motion estimation methods. Values greater than 0 will increase encoder
* speed at the expense of quality.
*
* \note Valid range for VP8: -16..16
* \note Valid range for AV1: -8..8
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_CPUUSED = 13,
/*!\brief Codec control function to enable automatic setting and use of alt ref frames.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_ENABLEAUTOALTREF,
#if CONFIG_EXT_REFS
/*!\brief Codec control function to enable automatic setting and use of
* bwd-pred frames.
*
* Supported in codecs: AV1
*/
AOME_SET_ENABLEAUTOBWDREF,
#endif // CONFIG_EXT_REFS
/*!\brief control function to set noise sensitivity
*
* 0: off, 1: OnYOnly, 2: OnYUV,
* 3: OnYUVAggressive, 4: Adaptive
*
* Supported in codecs: VP8
*/
AOME_SET_NOISE_SENSITIVITY,
/*!\brief Codec control function to set sharpness.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_SHARPNESS,
/*!\brief Codec control function to set the threshold for MBs treated as static.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_STATIC_THRESHOLD,
/*!\brief Codec control function to set the number of token partitions.
*
* Supported in codecs: VP8
*/
AOME_SET_TOKEN_PARTITIONS,
/*!\brief Codec control function to get last quantizer chosen by the encoder.
*
* Return value uses internal quantizer scale defined by the codec.
*
* Supported in codecs: VP8, AV1
*/
AOME_GET_LAST_QUANTIZER,
/*!\brief Codec control function to get last quantizer chosen by the encoder.
*
* Return value uses the 0..63 scale as used by the rc_*_quantizer config
* parameters.
*
* Supported in codecs: VP8, AV1
*/
AOME_GET_LAST_QUANTIZER_64,
/*!\brief Codec control function to set the maximum number of frames used to create the arf.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_ARNR_MAXFRAMES,
/*!\brief Codec control function to set the filter strength for the arf.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_ARNR_STRENGTH,
/*!\deprecated control function to set the filter type to use for the arf. */
AOME_SET_ARNR_TYPE,
/*!\brief Codec control function to set visual tuning.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_TUNING,
/*!\brief Codec control function to set constrained quality level.
*
* \attention For this value to be used aom_codec_enc_cfg_t::rc_end_usage must
* be set to #AOM_CQ.
* \note Valid range: 0..63
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_CQ_LEVEL,
/*!\brief Codec control function to set Max data rate for Intra frames.
*
* This value controls additional clamping on the maximum size of a
* keyframe. It is expressed as a percentage of the average
* per-frame bitrate, with the special (and default) value 0 meaning
* unlimited, or no additional clamping beyond the codec's built-in
* algorithm.
*
* For example, to allocate no more than 4.5 frames worth of bitrate
* to a keyframe, set this to 450.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_MAX_INTRA_BITRATE_PCT,
/*!\brief Codec control function to set reference and update frame flags.
*
* Supported in codecs: VP8
*/
AOME_SET_FRAME_FLAGS,
/*!\brief Codec control function to set max data rate for Inter frames.
*
* This value controls additional clamping on the maximum size of an
* inter frame. It is expressed as a percentage of the average
* per-frame bitrate, with the special (and default) value 0 meaning
* unlimited, or no additional clamping beyond the codec's built-in
* algorithm.
*
* For example, to allow no more than 4.5 frames worth of bitrate
* to an inter frame, set this to 450.
*
* Supported in codecs: AV1
*/
AV1E_SET_MAX_INTER_BITRATE_PCT,
/*!\brief Boost percentage for Golden Frame in CBR mode.
*
* This value controls the amount of boost given to Golden Frame in
* CBR mode. It is expressed as a percentage of the average
* per-frame bitrate, with the special (and default) value 0 meaning
* the feature is off, i.e., no golden frame boost in CBR mode and
* average bitrate target is used.
*
* For example, to allow 100% more bits, i.e., 2X, in a golden frame
* than in an average frame, set this to 100.
*
* Supported in codecs: AV1
*/
AV1E_SET_GF_CBR_BOOST_PCT,
/*!\brief Codec control function to set encoder screen content mode.
*
* 0: off, 1: On, 2: On with more aggressive rate control.
*
* Supported in codecs: VP8
*/
AOME_SET_SCREEN_CONTENT_MODE,
/*!\brief Codec control function to set lossless encoding mode.
*
* AV1 can operate in lossless encoding mode, in which the bitstream
* produced can be decoded to reconstruct a perfect copy of the
* input source. This control function provides a means to switch the encoder
* into lossless coding mode (1) or normal coding mode (0), which may be lossy.
* 0 = lossy coding mode
* 1 = lossless coding mode
*
* By default, the encoder operates in normal coding mode (which may be lossy).
*
* Supported in codecs: AV1
*/
AV1E_SET_LOSSLESS,
#if CONFIG_AOM_QM
/*!\brief Codec control function to encode with quantisation matrices.
*
* AOM can operate with default quantisation matrices dependent on
* quantisation level and block type.
* 0 = do not use quantisation matrices
* 1 = use quantisation matrices
*
* By default, the encoder operates without quantisation matrices.
*
* Supported in codecs: AOM
*/
AV1E_SET_ENABLE_QM,
/*!\brief Codec control function to set the min quant matrix flatness.
*
* AOM can operate with different ranges of quantisation matrices.
* As quantisation levels increase, the matrices get flatter. This
* control sets the minimum level of flatness from which the matrices
* are determined.
*
* By default, the encoder sets this minimum at half the available
* range.
*
* Supported in codecs: AOM
*/
AV1E_SET_QM_MIN,
/*!\brief Codec control function to set the max quant matrix flatness.
*
* AOM can operate with different ranges of quantisation matrices.
* As quantisation levels increase, the matrices get flatter. This
* control sets the maximum level of flatness possible.
*
* By default, the encoder sets this maximum at the top of the
* available range.
*
* Supported in codecs: AOM
*/
AV1E_SET_QM_MAX,
#endif
/*!\brief Codec control function to set number of tile columns.
*
* In encoding and decoding, AV1 allows an input image frame to be partitioned
* into separate vertical tile columns, which can be encoded or decoded
* independently. This enables easy implementation of parallel encoding and
* decoding. This control requests the encoder to use column tiles in
* encoding an input frame, with the number of tile columns (in log2 units) as
* the parameter:
* 0 = 1 tile column
* 1 = 2 tile columns
* 2 = 4 tile columns
* .....
* n = 2**n tile columns
* The requested tile columns will be capped by the encoder based on image size
* limitations (the minimum width of a tile column is 256 pixels, the maximum
* is 4096).
*
* By default, the value is 0, i.e. a single tile column for the entire image.
*
* Supported in codecs: AV1
*/
AV1E_SET_TILE_COLUMNS,
/*!\brief Codec control function to set number of tile rows.
*
* In encoding and decoding, AV1 allows an input image frame to be partitioned
* into separate horizontal tile rows. Tile rows are encoded or decoded
* sequentially. Even though encoding/decoding of later tile rows depends on
* earlier ones, this allows the encoder to output data packets for tile rows
* prior to completely processing all tile rows in a frame, thereby reducing
* the latency in processing between input and output. The parameter
* for this control describes the number of tile rows, which has a valid
* range [0, 2]:
* 0 = 1 tile row
* 1 = 2 tile rows
* 2 = 4 tile rows
*
* By default, the value is 0, i.e. a single tile row for the entire image.
*
* Supported in codecs: AV1
*/
AV1E_SET_TILE_ROWS,
/*!\brief Codec control function to enable frame parallel decoding feature.
*
* AV1 has a bitstream feature to reduce decoding dependency between frames
* by turning off backward update of probability context used in encoding
* and decoding. This allows staged parallel processing of more than one
* video frame in the decoder. This control function provides a means to
* turn this feature on or off for bitstreams produced by the encoder.
*
* By default, this feature is off.
*
* Supported in codecs: AV1
*/
AV1E_SET_FRAME_PARALLEL_DECODING,
/*!\brief Codec control function to set adaptive quantization mode.
*
* AV1 has a segment-based feature that allows the encoder to adaptively change
* the quantization parameter for each segment within a frame to improve
* subjective quality. This control makes the encoder operate in one of the
* several AQ modes supported.
*
* By default, the encoder operates with AQ mode 0 (adaptive quantization off).
*
* Supported in codecs: AV1
*/
AV1E_SET_AQ_MODE,
/*!\brief Codec control function to enable/disable periodic Q boost.
*
* One AV1 encoder speed feature is to enable quality boost by lowering
* frame level Q periodically. This control function provides a means to
* turn this feature on or off.
* 0 = off
* 1 = on
*
* By default, the encoder is allowed to use this feature for appropriate
* encoding modes.
*
* Supported in codecs: AV1
*/
AV1E_SET_FRAME_PERIODIC_BOOST,
/*!\brief Codec control function to set noise sensitivity.
*
* 0: off, 1: On(YOnly)
*
* Supported in codecs: AV1
*/
AV1E_SET_NOISE_SENSITIVITY,
/*!\brief Codec control function to set content type.
* \note Valid parameter range:
* AOM_CONTENT_DEFAULT = Regular video content (Default)
* AOM_CONTENT_SCREEN = Screen capture content
*
* Supported in codecs: AV1
*/
AV1E_SET_TUNE_CONTENT,
/*!\brief Codec control function to set color space info.
* \note Valid ranges: 0..7, default is "UNKNOWN".
* 0 = UNKNOWN,
* 1 = BT_601
* 2 = BT_709
* 3 = SMPTE_170
* 4 = SMPTE_240
* 5 = BT_2020
* 6 = RESERVED
* 7 = SRGB
*
* Supported in codecs: AV1
*/
AV1E_SET_COLOR_SPACE,
/*!\brief Codec control function to set minimum interval between GF/ARF frames
*
* By default the value is set as 4.
*
* Supported in codecs: AV1
*/
AV1E_SET_MIN_GF_INTERVAL,
/*!\brief Codec control function to set maximum interval between GF/ARF frames
*
* By default the value is set as 16.
*
* Supported in codecs: AV1
*/
AV1E_SET_MAX_GF_INTERVAL,
/*!\brief Codec control function to get an Active map back from the encoder.
*
* Supported in codecs: AV1
*/
AV1E_GET_ACTIVEMAP,
/*!\brief Codec control function to set color range bit.
* \note Valid ranges: 0..1, default is 0
* 0 = Limited range (16..235 or HBD equivalent)
* 1 = Full range (0..255 or HBD equivalent)
*
* Supported in codecs: AV1
*/
AV1E_SET_COLOR_RANGE,
/*!\brief Codec control function to set intended rendering image size.
*
* By default, this is identical to the image size in pixels.
*
* Supported in codecs: AV1
*/
AV1E_SET_RENDER_SIZE,
/*!\brief Codec control function to set target level.
*
* 255: off (default); 0: only keep level stats; 10: target for level 1.0;
* 11: target for level 1.1; ... 62: target for level 6.2
*
* Supported in codecs: AV1
*/
AV1E_SET_TARGET_LEVEL,
/*!\brief Codec control function to get bitstream level.
*
* Supported in codecs: AV1
*/
AV1E_GET_LEVEL,
/*!\brief Codec control function to set intended superblock size.
*
* By default, the superblock size is determined separately for each
* frame by the encoder.
*
* Supported in codecs: AV1
*/
AV1E_SET_SUPERBLOCK_SIZE,
};
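The percentage-based rate controls documented above (AOME_SET_MAX_INTRA_BITRATE_PCT and AV1E_SET_MAX_INTER_BITRATE_PCT) are expressed relative to the average per-frame bitrate. The arithmetic can be sketched as below, assuming the average per-frame budget is simply the target bitrate divided by the frame rate. The helper name `max_frame_bits_from_pct` is hypothetical and is not part of libaom; the encoder's own rate control may compute this differently.

```c
#include <stdint.h>

/* Hypothetical helper (not a libaom function): translate a *_BITRATE_PCT
 * control value into an upper bound on frame size, in bits.
 * Returns 0 to mean "no additional clamping", mirroring the special
 * value 0 described in the control documentation. */
static uint64_t max_frame_bits_from_pct(uint64_t target_bits_per_sec,
                                        uint32_t fps_num, uint32_t fps_den,
                                        uint32_t pct) {
  if (pct == 0 || fps_num == 0) return 0;
  /* Assumed average per-frame budget: bitrate divided by frame rate. */
  const uint64_t avg_bits = target_bits_per_sec * fps_den / fps_num;
  /* pct = 450 allows 4.5 average frames' worth of bits. */
  return avg_bits * pct / 100;
}
```

With a 3 Mbit/s target at 30 frames per second, a setting of 450 corresponds to a bound of 4.5 times the 100,000-bit average frame under these assumptions.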
/*!\brief aom 1-D scaling mode
*
* This set of constants defines 1-D aom scaling modes
*/
typedef enum aom_scaling_mode_1d {
AOME_NORMAL = 0,
AOME_FOURFIVE = 1,
AOME_THREEFIVE = 2,
AOME_ONETWO = 3
} AOM_SCALING_MODE;
/*!\brief aom region of interest map
*
* This defines the data structure for the region of interest map
*
*/
typedef struct aom_roi_map {
/*! An id between 0 and 3 for each 16x16 region within a frame. */
unsigned char *roi_map;
unsigned int rows; /**< Number of rows. */
unsigned int cols; /**< Number of columns. */
// TODO(paulwilkins): broken for AV1 which has 8 segments
// q and loop filter deltas for each segment
// (see MAX_MB_SEGMENTS)
int delta_q[4]; /**< Quantizer deltas. */
int delta_lf[4]; /**< Loop filter deltas. */
/*! Static breakout threshold for each segment. */
unsigned int static_threshold[4];
} aom_roi_map_t;
/*!\brief aom active region map
*
* This defines the data structure for the active region map
*
*/
typedef struct aom_active_map {
/*!\brief specify on (1) or off (0) for each 16x16 region within a frame */
unsigned char *active_map;
unsigned int rows; /**< number of rows */
unsigned int cols; /**< number of cols */
} aom_active_map_t;
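A rough sketch of how an application might size and fill such a map, assuming one byte per 16x16 region stored in row-major order (the layout is an assumption, not stated in the header). The `example_*` names are hypothetical, and the struct is a local stand-in with the same fields as aom_active_map_t so that the sketch compiles without the libaom headers.

```c
#include <stdlib.h>
#include <string.h>

/* Local stand-in with the same fields as aom_active_map_t. */
typedef struct example_active_map {
  unsigned char *active_map; /* one byte per 16x16 region: 1 = on, 0 = off */
  unsigned int rows;
  unsigned int cols;
} example_active_map_t;

/* Hypothetical helper: allocate a map covering a width x height frame with
 * every region marked active. Returns 0 on success, -1 on failure. */
static int example_active_map_init(example_active_map_t *map,
                                   unsigned int width, unsigned int height) {
  map->rows = (height + 15) / 16; /* round up to whole 16x16 regions */
  map->cols = (width + 15) / 16;
  map->active_map = malloc((size_t)map->rows * map->cols);
  if (map->active_map == NULL) return -1;
  memset(map->active_map, 1, (size_t)map->rows * map->cols);
  return 0;
}

/* Hypothetical helper: switch off the region containing pixel (x, y).
 * Assumes row-major storage. */
static void example_active_map_disable(example_active_map_t *map,
                                       unsigned int x, unsigned int y) {
  map->active_map[(y / 16) * map->cols + (x / 16)] = 0;
}
```

In a real application, a struct filled this way would be passed to the encoder through the AOME_SET_ACTIVEMAP control, which takes an aom_active_map_t pointer.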
/*!\brief aom image scaling mode
*
* This defines the data structure for image scaling mode
*
*/
typedef struct aom_scaling_mode {
AOM_SCALING_MODE h_scaling_mode; /**< horizontal scaling mode */
AOM_SCALING_MODE v_scaling_mode; /**< vertical scaling mode */
} aom_scaling_mode_t;
/*!\brief VP8 token partition mode
*
* This defines VP8 partitioning mode for compressed data, i.e., the number of
* sub-streams in the bitstream. Used for parallelized decoding.
*
*/
typedef enum {
AOM_ONE_TOKENPARTITION = 0,
AOM_TWO_TOKENPARTITION = 1,
AOM_FOUR_TOKENPARTITION = 2,
AOM_EIGHT_TOKENPARTITION = 3
} aome_token_partitions;
/*!\brief AV1 encoder content type */
typedef enum {
AOM_CONTENT_DEFAULT,
AOM_CONTENT_SCREEN,
AOM_CONTENT_INVALID
} aom_tune_content;
/*!\brief VP8 model tuning parameters
*
* Changes the encoder to tune for certain types of input material.
*
*/
typedef enum { AOM_TUNE_PSNR, AOM_TUNE_SSIM } aom_tune_metric;
/*!\cond */
/*!\brief VP8 encoder control function parameter type
*
* Defines the data types that VP8E control functions take. Note that
* additional common controls are defined in aom.h
*
*/
AOM_CTRL_USE_TYPE_DEPRECATED(AOME_USE_REFERENCE, int)
#define AOM_CTRL_AOME_USE_REFERENCE
AOM_CTRL_USE_TYPE(AOME_SET_FRAME_FLAGS, int)
#define AOM_CTRL_AOME_SET_FRAME_FLAGS
AOM_CTRL_USE_TYPE(AOME_SET_ROI_MAP, aom_roi_map_t *)
#define AOM_CTRL_AOME_SET_ROI_MAP
AOM_CTRL_USE_TYPE(AOME_SET_ACTIVEMAP, aom_active_map_t *)
#define AOM_CTRL_AOME_SET_ACTIVEMAP
AOM_CTRL_USE_TYPE(AOME_SET_SCALEMODE, aom_scaling_mode_t *)
#define AOM_CTRL_AOME_SET_SCALEMODE
AOM_CTRL_USE_TYPE(AOME_SET_CPUUSED, int)
#define AOM_CTRL_AOME_SET_CPUUSED
AOM_CTRL_USE_TYPE(AOME_SET_ENABLEAUTOALTREF, unsigned int)
#define AOM_CTRL_AOME_SET_ENABLEAUTOALTREF
#if CONFIG_EXT_REFS
AOM_CTRL_USE_TYPE(AOME_SET_ENABLEAUTOBWDREF, unsigned int)
#define AOM_CTRL_AOME_SET_ENABLEAUTOBWDREF
#endif // CONFIG_EXT_REFS
AOM_CTRL_USE_TYPE(AOME_SET_NOISE_SENSITIVITY, unsigned int)
#define AOM_CTRL_AOME_SET_NOISE_SENSITIVITY
AOM_CTRL_USE_TYPE(AOME_SET_SHARPNESS, unsigned int)
#define AOM_CTRL_AOME_SET_SHARPNESS
AOM_CTRL_USE_TYPE(AOME_SET_STATIC_THRESHOLD, unsigned int)
#define AOM_CTRL_AOME_SET_STATIC_THRESHOLD
AOM_CTRL_USE_TYPE(AOME_SET_TOKEN_PARTITIONS, int) /* aome_token_partitions */
#define AOM_CTRL_AOME_SET_TOKEN_PARTITIONS
AOM_CTRL_USE_TYPE(AOME_SET_ARNR_MAXFRAMES, unsigned int)
#define AOM_CTRL_AOME_SET_ARNR_MAXFRAMES
AOM_CTRL_USE_TYPE(AOME_SET_ARNR_STRENGTH, unsigned int)
#define AOM_CTRL_AOME_SET_ARNR_STRENGTH
AOM_CTRL_USE_TYPE_DEPRECATED(AOME_SET_ARNR_TYPE, unsigned int)
#define AOM_CTRL_AOME_SET_ARNR_TYPE
AOM_CTRL_USE_TYPE(AOME_SET_TUNING, int) /* aom_tune_metric */
#define AOM_CTRL_AOME_SET_TUNING
AOM_CTRL_USE_TYPE(AOME_SET_CQ_LEVEL, unsigned int)
#define AOM_CTRL_AOME_SET_CQ_LEVEL
AOM_CTRL_USE_TYPE(AV1E_SET_TILE_COLUMNS, int)
#define AOM_CTRL_AV1E_SET_TILE_COLUMNS
AOM_CTRL_USE_TYPE(AV1E_SET_TILE_ROWS, int)
#define AOM_CTRL_AV1E_SET_TILE_ROWS
AOM_CTRL_USE_TYPE(AOME_GET_LAST_QUANTIZER, int *)
#define AOM_CTRL_AOME_GET_LAST_QUANTIZER
AOM_CTRL_USE_TYPE(AOME_GET_LAST_QUANTIZER_64, int *)
#define AOM_CTRL_AOME_GET_LAST_QUANTIZER_64
AOM_CTRL_USE_TYPE(AOME_SET_MAX_INTRA_BITRATE_PCT, unsigned int)
#define AOM_CTRL_AOME_SET_MAX_INTRA_BITRATE_PCT
AOM_CTRL_USE_TYPE(AV1E_SET_MAX_INTER_BITRATE_PCT, unsigned int)
#define AOM_CTRL_AV1E_SET_MAX_INTER_BITRATE_PCT
AOM_CTRL_USE_TYPE(AOME_SET_SCREEN_CONTENT_MODE, unsigned int)
#define AOM_CTRL_AOME_SET_SCREEN_CONTENT_MODE
AOM_CTRL_USE_TYPE(AV1E_SET_GF_CBR_BOOST_PCT, unsigned int)
#define AOM_CTRL_AV1E_SET_GF_CBR_BOOST_PCT
AOM_CTRL_USE_TYPE(AV1E_SET_LOSSLESS, unsigned int)
#define AOM_CTRL_AV1E_SET_LOSSLESS
#if CONFIG_AOM_QM
AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_QM, unsigned int)
#define AOM_CTRL_AV1E_SET_ENABLE_QM
AOM_CTRL_USE_TYPE(AV1E_SET_QM_MIN, unsigned int)
#define AOM_CTRL_AV1E_SET_QM_MIN
AOM_CTRL_USE_TYPE(AV1E_SET_QM_MAX, unsigned int)
#define AOM_CTRL_AV1E_SET_QM_MAX
#endif
AOM_CTRL_USE_TYPE(AV1E_SET_FRAME_PARALLEL_DECODING, unsigned int)
#define AOM_CTRL_AV1E_SET_FRAME_PARALLEL_DECODING
AOM_CTRL_USE_TYPE(AV1E_SET_AQ_MODE, unsigned int)
#define AOM_CTRL_AV1E_SET_AQ_MODE
AOM_CTRL_USE_TYPE(AV1E_SET_FRAME_PERIODIC_BOOST, unsigned int)
#define AOM_CTRL_AV1E_SET_FRAME_PERIODIC_BOOST
AOM_CTRL_USE_TYPE(AV1E_SET_NOISE_SENSITIVITY, unsigned int)
#define AOM_CTRL_AV1E_SET_NOISE_SENSITIVITY
AOM_CTRL_USE_TYPE(AV1E_SET_TUNE_CONTENT, int) /* aom_tune_content */
#define AOM_CTRL_AV1E_SET_TUNE_CONTENT
AOM_CTRL_USE_TYPE(AV1E_SET_COLOR_SPACE, int)
#define AOM_CTRL_AV1E_SET_COLOR_SPACE
AOM_CTRL_USE_TYPE(AV1E_SET_MIN_GF_INTERVAL, unsigned int)
#define AOM_CTRL_AV1E_SET_MIN_GF_INTERVAL
AOM_CTRL_USE_TYPE(AV1E_SET_MAX_GF_INTERVAL, unsigned int)
#define AOM_CTRL_AV1E_SET_MAX_GF_INTERVAL
AOM_CTRL_USE_TYPE(AV1E_GET_ACTIVEMAP, aom_active_map_t *)
#define AOM_CTRL_AV1E_GET_ACTIVEMAP
AOM_CTRL_USE_TYPE(AV1E_SET_COLOR_RANGE, int)
#define AOM_CTRL_AV1E_SET_COLOR_RANGE
/*!\brief
*
* TODO(rbultje) : add support of the control in ffmpeg
*/
#define AOM_CTRL_AV1E_SET_RENDER_SIZE
AOM_CTRL_USE_TYPE(AV1E_SET_RENDER_SIZE, int *)
AOM_CTRL_USE_TYPE(AV1E_SET_SUPERBLOCK_SIZE, unsigned int)
#define AOM_CTRL_AV1E_SET_SUPERBLOCK_SIZE
AOM_CTRL_USE_TYPE(AV1E_SET_TARGET_LEVEL, unsigned int)
#define AOM_CTRL_AV1E_SET_TARGET_LEVEL
AOM_CTRL_USE_TYPE(AV1E_GET_LEVEL, int *)
#define AOM_CTRL_AV1E_GET_LEVEL
/*!\endcond */
/*! @} - end defgroup vp8_encoder */
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_AOMCX_H_
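
The AV1E_SET_TILE_COLUMNS documentation in this header describes a log2 parameter that the encoder caps according to frame width. The following is a minimal sketch of that relationship, modelling only the minimum tile width (256 pixels in the comment); it ignores the 4096-pixel maximum, and the encoder's actual capping logic may differ. `example_tile_columns` is a hypothetical name, not a libaom function.

```c
/* Hypothetical helper: number of tile columns for a requested log2 value,
 * reduced until no column would be narrower than min_tile_width pixels. */
static unsigned int example_tile_columns(unsigned int frame_width,
                                         unsigned int log2_cols,
                                         unsigned int min_tile_width) {
  if (log2_cols > 31) log2_cols = 31; /* keep the shifts well defined */
  while (log2_cols > 0 && (frame_width >> log2_cols) < min_tile_width) {
    --log2_cols;
  }
  return 1u << log2_cols; /* n maps to 2**n tile columns */
}
```

For a 1920-pixel-wide frame, a request of 2 yields 4 columns of 480 pixels each, while a request of 4 would be reduced to the same 4 columns under this model.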

191
aom/aomdx.h Normal file

@@ -0,0 +1,191 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
/*!\defgroup aom_decoder AOMedia AOM/AV1 Decoder
* \ingroup aom
*
* @{
*/
/*!\file
* \brief Provides definitions for using AOM or AV1 within the aom Decoder
* interface.
*/
#ifndef AOM_AOMDX_H_
#define AOM_AOMDX_H_
#ifdef __cplusplus
extern "C" {
#endif
/* Include controls common to both the encoder and decoder */
#include "./aom.h"
/*!\name Algorithm interface for AV1
*
* This interface provides the capability to decode AV1 streams.
* @{
*/
extern aom_codec_iface_t aom_codec_av1_dx_algo;
extern aom_codec_iface_t *aom_codec_av1_dx(void);
/*!@} - end algorithm interface member group*/
/** Data structure that stores bit accounting for debug
*/
typedef struct Accounting Accounting;
/*!\enum aom_dec_control_id
* \brief AOM decoder control functions
*
* This set of macros defines the control functions available for the AOM
* decoder interface.
*
* \sa #aom_codec_control
*/
enum aom_dec_control_id {
/** control function to get info on which reference frames were updated
* by the last decode
*/
AOMD_GET_LAST_REF_UPDATES = AOM_DECODER_CTRL_ID_START,
/** check if the indicated frame is corrupted */
AOMD_GET_FRAME_CORRUPTED,
/** control function to get info on which reference frames were used
* by the last decode
*/
AOMD_GET_LAST_REF_USED,
/** decryption function to decrypt encoded buffer data immediately
* before decoding. Takes an aom_decrypt_init, which contains
* a callback function and opaque context pointer.
*/
AOMD_SET_DECRYPTOR,
// AOMD_SET_DECRYPTOR = AOMD_SET_DECRYPTOR,
/** control function to get the dimensions that the current frame is decoded
* at. This may be different to the intended display size for the frame as
* specified in the wrapper or frame header (see AV1D_GET_DISPLAY_SIZE). */
AV1D_GET_FRAME_SIZE,
/** control function to get the current frame's intended display dimensions
* (as specified in the wrapper or frame header). This may be different to
* the decoded dimensions of this frame (see AV1D_GET_FRAME_SIZE). */
AV1D_GET_DISPLAY_SIZE,
/** control function to get the bit depth of the stream. */
AV1D_GET_BIT_DEPTH,
/** control function to set the byte alignment of the planes in the reference
* buffers. Valid values are powers of 2, from 32 to 1024. A value of 0 sets
* legacy alignment. I.e. Y plane is aligned to 32 bytes, U plane directly
* follows Y plane, and V plane directly follows U plane. Default value is 0.
*/
AV1_SET_BYTE_ALIGNMENT,
/** control function to invert the tile decoding order so that it runs from
* right to left. The function is used in a test to confirm the decoding
* independence of tile columns. The function may also be used in applications
* where this order of decoding is desired.
*
* TODO(yaowu): Rework the unit test that uses this control, and in a future
* release, this test-only control shall be removed.
*/
AV1_INVERT_TILE_DECODE_ORDER,
/** control function to set the skip loop filter flag. Valid values are
* integers. The decoder will skip the loop filter when its value is set to
* nonzero. If the loop filter is skipped the decoder may accumulate decode
* artifacts. The default value is 0.
*/
AV1_SET_SKIP_LOOP_FILTER,
/** control function to retrieve a pointer to the Accounting struct. When
* compiled without --enable-accounting, this returns AOM_CODEC_INCAPABLE.
* If called before a frame has been decoded, this returns AOM_CODEC_ERROR.
* The caller should ensure that AOM_CODEC_OK is returned before attempting
* to dereference the Accounting pointer.
*/
AV1_GET_ACCOUNTING,
AOM_DECODER_CTRL_ID_MAX,
/** control function to set the range of tile decoding. A value that is
* greater than or equal to zero indicates that only the specific row/column
* is decoded. A value of -1 indicates that the whole row/column is decoded.
* As a special case, if both values are -1, the whole frame is
* decoded.
*/
AV1_SET_DECODE_TILE_ROW,
AV1_SET_DECODE_TILE_COL
};
/** Decrypt n bytes of data from input -> output, using the decrypt_state
* passed in AOMD_SET_DECRYPTOR.
*/
typedef void (*aom_decrypt_cb)(void *decrypt_state, const unsigned char *input,
unsigned char *output, int count);
/*!\brief Structure to hold decryption state
*
* Defines a structure to hold the decryption state and access function.
*/
typedef struct aom_decrypt_init {
/*! Decrypt callback. */
aom_decrypt_cb decrypt_cb;
/*! Decryption state. */
void *decrypt_state;
} aom_decrypt_init;
/*!\brief A deprecated alias for aom_decrypt_init.
*/
typedef aom_decrypt_init aom_decrypt_init;
/*!\cond */
/*!\brief AOM decoder control function parameter type
*
* Defines the data types that AOMD control functions take. Note that
* additional common controls are defined in aom.h
*
*/
AOM_CTRL_USE_TYPE(AOMD_GET_LAST_REF_UPDATES, int *)
#define AOM_CTRL_AOMD_GET_LAST_REF_UPDATES
AOM_CTRL_USE_TYPE(AOMD_GET_FRAME_CORRUPTED, int *)
#define AOM_CTRL_AOMD_GET_FRAME_CORRUPTED
AOM_CTRL_USE_TYPE(AOMD_GET_LAST_REF_USED, int *)
#define AOM_CTRL_AOMD_GET_LAST_REF_USED
AOM_CTRL_USE_TYPE(AOMD_SET_DECRYPTOR, aom_decrypt_init *)
#define AOM_CTRL_AOMD_SET_DECRYPTOR
// AOM_CTRL_USE_TYPE(AOMD_SET_DECRYPTOR, aom_decrypt_init *)
//#define AOM_CTRL_AOMD_SET_DECRYPTOR
AOM_CTRL_USE_TYPE(AV1D_GET_DISPLAY_SIZE, int *)
#define AOM_CTRL_AV1D_GET_DISPLAY_SIZE
AOM_CTRL_USE_TYPE(AV1D_GET_BIT_DEPTH, unsigned int *)
#define AOM_CTRL_AV1D_GET_BIT_DEPTH
AOM_CTRL_USE_TYPE(AV1D_GET_FRAME_SIZE, int *)
#define AOM_CTRL_AV1D_GET_FRAME_SIZE
AOM_CTRL_USE_TYPE(AV1_INVERT_TILE_DECODE_ORDER, int)
#define AOM_CTRL_AV1_INVERT_TILE_DECODE_ORDER
AOM_CTRL_USE_TYPE(AV1_GET_ACCOUNTING, Accounting **)
#define AOM_CTRL_AV1_GET_ACCOUNTING
AOM_CTRL_USE_TYPE(AV1_SET_DECODE_TILE_ROW, int)
#define AOM_CTRL_AV1_SET_DECODE_TILE_ROW
AOM_CTRL_USE_TYPE(AV1_SET_DECODE_TILE_COL, int)
#define AOM_CTRL_AV1_SET_DECODE_TILE_COL
/*!\endcond */
/*! @} - end defgroup aom_decoder */
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_AOMDX_H_
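
The decryption hook declared in this header (aom_decrypt_cb plus the aom_decrypt_init struct) can be illustrated with a toy callback. The sketch below uses local copies of the two types so that it compiles without aomdx.h; the `example_*` names and the single-byte XOR "cipher" are purely illustrative assumptions and provide no real security.

```c
/* Local copies of the callback type and init struct declared in the header. */
typedef void (*example_decrypt_cb)(void *decrypt_state,
                                   const unsigned char *input,
                                   unsigned char *output, int count);
typedef struct example_decrypt_init {
  example_decrypt_cb decrypt_cb;
  void *decrypt_state;
} example_decrypt_init;

/* Illustrative opaque state: a single XOR key byte. Not real encryption. */
typedef struct example_xor_state {
  unsigned char key;
} example_xor_state;

/* Decrypt count bytes from input into output using the caller's state. */
static void example_xor_decrypt(void *decrypt_state,
                                const unsigned char *input,
                                unsigned char *output, int count) {
  const example_xor_state *state = (const example_xor_state *)decrypt_state;
  for (int i = 0; i < count; ++i) {
    output[i] = (unsigned char)(input[i] ^ state->key);
  }
}
```

An application would fill the real aom_decrypt_init with its callback and state pointer and hand it to the decoder through the AOMD_SET_DECRYPTOR control.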

16
aom/exports_com Normal file

@@ -0,0 +1,16 @@
text aom_codec_build_config
text aom_codec_control_
text aom_codec_destroy
text aom_codec_err_to_string
text aom_codec_error
text aom_codec_error_detail
text aom_codec_get_caps
text aom_codec_iface_name
text aom_codec_version
text aom_codec_version_extra_str
text aom_codec_version_str
text aom_img_alloc
text aom_img_flip
text aom_img_free
text aom_img_set_rect
text aom_img_wrap

8
aom/exports_dec Normal file

@@ -0,0 +1,8 @@
text aom_codec_dec_init_ver
text aom_codec_decode
text aom_codec_get_frame
text aom_codec_get_stream_info
text aom_codec_peek_stream_info
text aom_codec_register_put_frame_cb
text aom_codec_register_put_slice_cb
text aom_codec_set_frame_buffer_functions

9
aom/exports_enc Normal file

@@ -0,0 +1,9 @@
text aom_codec_enc_config_default
text aom_codec_enc_config_set
text aom_codec_enc_init_multi_ver
text aom_codec_enc_init_ver
text aom_codec_encode
text aom_codec_get_cx_data
text aom_codec_get_global_headers
text aom_codec_get_preview_frame
text aom_codec_set_cx_data_buf


@@ -0,0 +1,465 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
/*!\file
* \brief Describes the decoder algorithm interface for algorithm
* implementations.
*
* This file defines the private structures and data types that are only
* relevant to implementing an algorithm, as opposed to using it.
*
* To create a decoder algorithm class, an interface structure is put
* into the global namespace:
* <pre>
* my_codec.c:
* aom_codec_iface_t my_codec = {
* "My Codec v1.0",
* AOM_CODEC_ALG_ABI_VERSION,
* ...
* };
* </pre>
*
* An application instantiates a specific decoder instance by using
* aom_codec_init() and a pointer to the algorithm's interface structure:
* <pre>
* my_app.c:
* extern aom_codec_iface_t my_codec;
* {
* aom_codec_ctx_t algo;
* res = aom_codec_init(&algo, &my_codec);
* }
* </pre>
*
* Once initialized, the instance is managed using other functions from
* the aom_codec_* family.
*/
#ifndef AOM_INTERNAL_AOM_CODEC_INTERNAL_H_
#define AOM_INTERNAL_AOM_CODEC_INTERNAL_H_
#include "./aom_config.h"
#include "../aom_decoder.h"
#include "../aom_encoder.h"
#include <stdarg.h>
#ifdef __cplusplus
extern "C" {
#endif
/*!\brief Current ABI version number
*
* \internal
* If this file is altered in any way that changes the ABI, this value
* must be bumped. Examples include, but are not limited to, changing
* types, removing or reassigning enums, adding/removing/rearranging
* fields to structures
*/
#define AOM_CODEC_INTERNAL_ABI_VERSION (5) /**<\hideinitializer*/
typedef struct aom_codec_alg_priv aom_codec_alg_priv_t;
typedef struct aom_codec_priv_enc_mr_cfg aom_codec_priv_enc_mr_cfg_t;
/*!\brief init function pointer prototype
*
* Performs algorithm-specific initialization of the decoder context. This
* function is called by the generic aom_codec_init() wrapper function, so
* plugins implementing this interface may trust the input parameters to be
* properly initialized.
*
* \param[in] ctx Pointer to this instance's context
* \retval #AOM_CODEC_OK
* The input stream was recognized and decoder initialized.
* \retval #AOM_CODEC_MEM_ERROR
* Memory operation failed.
*/
typedef aom_codec_err_t (*aom_codec_init_fn_t)(
aom_codec_ctx_t *ctx, aom_codec_priv_enc_mr_cfg_t *data);
/*!\brief destroy function pointer prototype
*
* Performs algorithm-specific destruction of the decoder context. This
* function is called by the generic aom_codec_destroy() wrapper function,
* so plugins implementing this interface may trust the input parameters
* to be properly initialized.
*
* \param[in] ctx Pointer to this instance's context
* \retval #AOM_CODEC_OK
* The decoder context was destroyed.
* \retval #AOM_CODEC_MEM_ERROR
* Memory operation failed.
*/
typedef aom_codec_err_t (*aom_codec_destroy_fn_t)(aom_codec_alg_priv_t *ctx);
/*!\brief parse stream info function pointer prototype
*
* Performs high level parsing of the bitstream. This function is called by the
* generic aom_codec_peek_stream_info() wrapper function, so plugins
* implementing this interface may trust the input parameters to be properly
* initialized.
*
* \param[in] data Pointer to a block of data to parse
* \param[in] data_sz Size of the data buffer
* \param[in,out] si Pointer to stream info to update. The size member
* \ref MUST be properly initialized, but \ref MAY be
* clobbered by the algorithm. This parameter \ref MAY
* be NULL.
*
* \retval #AOM_CODEC_OK
* Bitstream is parsable and stream information updated
*/
typedef aom_codec_err_t (*aom_codec_peek_si_fn_t)(const uint8_t *data,
unsigned int data_sz,
aom_codec_stream_info_t *si);
/*!\brief Return information about the current stream.
*
* Returns information about the stream that has been parsed during decoding.
*
* \param[in] ctx Pointer to this instance's context
* \param[in,out] si Pointer to stream info to update. The size member
* \ref MUST be properly initialized, but \ref MAY be
* clobbered by the algorithm. This parameter \ref MAY
* be NULL.
*
* \retval #AOM_CODEC_OK
* Bitstream is parsable and stream information updated
*/
typedef aom_codec_err_t (*aom_codec_get_si_fn_t)(aom_codec_alg_priv_t *ctx,
aom_codec_stream_info_t *si);
/*!\brief control function pointer prototype
*
* This function is used to exchange algorithm specific data with the decoder
* instance. This can be used to implement features specific to a particular
* algorithm.
*
* This function is called by the generic aom_codec_control() wrapper
* function, so plugins implementing this interface may trust the input
* parameters to be properly initialized. However, this interface does not
* provide type safety for the exchanged data or assign meanings to the
* control codes. Those details should be specified in the algorithm's
* header file. In particular, the ctrl_id parameter is guaranteed to exist
* in the algorithm's control mapping table, and the data parameter may be NULL.
*
*
* \param[in] ctx Pointer to this instance's context
* \param[in] ctrl_id Algorithm specific control identifier
* \param[in,out] data Data to exchange with algorithm instance.
*
* \retval #AOM_CODEC_OK
* The internal state data was deserialized.
*/
typedef aom_codec_err_t (*aom_codec_control_fn_t)(aom_codec_alg_priv_t *ctx,
va_list ap);
/*!\brief control function pointer mapping
*
* This structure stores the mapping between control identifiers and
* implementing functions. Each algorithm provides a list of these
* mappings. This list is searched by the aom_codec_control() wrapper
* function to determine which function to invoke. The special
* value {0, NULL} is used to indicate end-of-list, and must be
* present. The special value {0, <non-null>} can be used as a catch-all
* mapping. This implies that ctrl_id values chosen by the algorithm
* \ref MUST be non-zero.
*/
typedef const struct aom_codec_ctrl_fn_map {
int ctrl_id;
aom_codec_control_fn_t fn;
} aom_codec_ctrl_fn_map_t;
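The lookup rules described for this table ({0, NULL} terminates the list, {0, <non-null>} acts as a catch-all) can be sketched as follows. This is a simplified model, not the libaom implementation: the handler signature takes a plain int instead of a va_list, and all `example_*` names are hypothetical.

```c
#include <stddef.h>

/* Simplified handler type; the real aom_codec_control_fn_t takes a va_list. */
typedef int (*example_control_fn)(void *ctx, int arg);

typedef struct example_ctrl_fn_map {
  int ctrl_id;
  example_control_fn fn;
} example_ctrl_fn_map;

/* Walk the table until the {0, NULL} terminator. The first entry whose id
 * matches, or whose id is 0 (catch-all), supplies the handler. Returns NULL
 * when nothing handles the id. */
static example_control_fn example_find_control(const example_ctrl_fn_map *map,
                                               int ctrl_id) {
  for (; map->fn != NULL; ++map) {
    if (map->ctrl_id == 0 || map->ctrl_id == ctrl_id) return map->fn;
  }
  return NULL;
}

/* Sample handlers and tables for illustration only. */
static int example_set_speed(void *ctx, int arg) {
  (void)ctx;
  return arg;
}
static int example_fallback(void *ctx, int arg) {
  (void)ctx;
  (void)arg;
  return -1;
}

static const example_ctrl_fn_map example_map[] = {
  { 13, example_set_speed },
  { 0, example_fallback }, /* catch-all: placed after the specific ids */
  { 0, NULL }              /* end-of-list */
};
static const example_ctrl_fn_map example_map_strict[] = {
  { 13, example_set_speed },
  { 0, NULL } /* end-of-list, no catch-all */
};
```

Because the first match wins in this model, a catch-all entry only makes sense after every specific id, which is also why real control ids need to be non-zero.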
/*!\brief decode data function pointer prototype
*
* Processes a buffer of coded data. If the processing results in a new
* decoded frame becoming available, #AOM_CODEC_CB_PUT_SLICE and
* #AOM_CODEC_CB_PUT_FRAME events are generated as appropriate. This
* function is called by the generic aom_codec_decode() wrapper function,
* so plugins implementing this interface may trust the input parameters
* to be properly initialized.
*
* \param[in] ctx Pointer to this instance's context
* \param[in] data Pointer to this block of new coded data. If
* NULL, a #AOM_CODEC_CB_PUT_FRAME event is posted
* for the previously decoded frame.
* \param[in] data_sz Size of the coded data, in bytes.
*
* \return Returns #AOM_CODEC_OK if the coded data was processed completely
* and future pictures can be decoded without error. Otherwise,
* see the descriptions of the other error codes in ::aom_codec_err_t
* for recoverability capabilities.
*/
typedef aom_codec_err_t (*aom_codec_decode_fn_t)(aom_codec_alg_priv_t *ctx,
const uint8_t *data,
unsigned int data_sz,
void *user_priv,
long deadline);
/*!\brief Decoded frames iterator
*
* Iterates over a list of the frames available for display. The iterator
* storage should be initialized to NULL to start the iteration. Iteration is
* complete when this function returns NULL.
*
* The list of available frames becomes valid upon completion of the
* aom_codec_decode call, and remains valid until the next call to
* aom_codec_decode.
*
* \param[in] ctx Pointer to this instance's context
* \param[in,out] iter Iterator storage, initialized to NULL
*
* \return Returns a pointer to an image, if one is ready for display. Frames
* produced will always be in PTS (presentation time stamp) order.
*/
typedef aom_image_t *(*aom_codec_get_frame_fn_t)(aom_codec_alg_priv_t *ctx,
aom_codec_iter_t *iter);
/*!\brief Pass in external frame buffers for the decoder to use.
*
* Registers functions to be called when libaom needs a frame buffer
* to decode the current frame and a function to be called when libaom does
* not internally reference the frame buffer. This set function must
* be called before the first call to decode or libaom will assume the
* default behavior of allocating frame buffers internally.
*
* \param[in] ctx Pointer to this instance's context
* \param[in] cb_get Pointer to the get callback function
* \param[in] cb_release Pointer to the release callback function
* \param[in] cb_priv Callback's private data
*
* \retval #AOM_CODEC_OK
* External frame buffers will be used by libaom.
* \retval #AOM_CODEC_INVALID_PARAM
* One or more of the callbacks were NULL.
* \retval #AOM_CODEC_ERROR
* Decoder context not initialized, or algorithm not capable of
* using external frame buffers.
*
* \note
* When decoding AV1, the application may be required to pass in at least
* #AOM_MAXIMUM_WORK_BUFFERS external frame
* buffers.
*/
typedef aom_codec_err_t (*aom_codec_set_fb_fn_t)(
aom_codec_alg_priv_t *ctx, aom_get_frame_buffer_cb_fn_t cb_get,
aom_release_frame_buffer_cb_fn_t cb_release, void *cb_priv);
typedef aom_codec_err_t (*aom_codec_encode_fn_t)(aom_codec_alg_priv_t *ctx,
const aom_image_t *img,
aom_codec_pts_t pts,
unsigned long duration,
aom_enc_frame_flags_t flags,
unsigned long deadline);
typedef const aom_codec_cx_pkt_t *(*aom_codec_get_cx_data_fn_t)(
aom_codec_alg_priv_t *ctx, aom_codec_iter_t *iter);
typedef aom_codec_err_t (*aom_codec_enc_config_set_fn_t)(
aom_codec_alg_priv_t *ctx, const aom_codec_enc_cfg_t *cfg);
typedef aom_fixed_buf_t *(*aom_codec_get_global_headers_fn_t)(
aom_codec_alg_priv_t *ctx);
typedef aom_image_t *(*aom_codec_get_preview_frame_fn_t)(
aom_codec_alg_priv_t *ctx);
typedef aom_codec_err_t (*aom_codec_enc_mr_get_mem_loc_fn_t)(
const aom_codec_enc_cfg_t *cfg, void **mem_loc);
/*!\brief usage configuration mapping
*
* This structure stores the mapping between usage identifiers and
* configuration structures. Each algorithm provides a list of these
* mappings. This list is searched by the aom_codec_enc_config_default()
* wrapper function to determine which config to return. The special value
* {-1, {0}} is used to indicate end-of-list, and must be present. At least
* one mapping must be present, in addition to the end-of-list.
*
*/
typedef const struct aom_codec_enc_cfg_map {
int usage;
aom_codec_enc_cfg_t cfg;
} aom_codec_enc_cfg_map_t;
/*!\brief Decoder algorithm interface
*
* All decoders \ref MUST expose a variable of this type.
*/
struct aom_codec_iface {
const char *name; /**< Identification String */
int abi_version; /**< Implemented ABI version */
aom_codec_caps_t caps; /**< Decoder capabilities */
aom_codec_init_fn_t init; /**< \copydoc ::aom_codec_init_fn_t */
aom_codec_destroy_fn_t destroy; /**< \copydoc ::aom_codec_destroy_fn_t */
aom_codec_ctrl_fn_map_t *ctrl_maps; /**< \copydoc ::aom_codec_ctrl_fn_map_t */
struct aom_codec_dec_iface {
aom_codec_peek_si_fn_t peek_si; /**< \copydoc ::aom_codec_peek_si_fn_t */
aom_codec_get_si_fn_t get_si; /**< \copydoc ::aom_codec_get_si_fn_t */
aom_codec_decode_fn_t decode; /**< \copydoc ::aom_codec_decode_fn_t */
aom_codec_get_frame_fn_t
get_frame; /**< \copydoc ::aom_codec_get_frame_fn_t */
aom_codec_set_fb_fn_t set_fb_fn; /**< \copydoc ::aom_codec_set_fb_fn_t */
} dec;
struct aom_codec_enc_iface {
int cfg_map_count;
aom_codec_enc_cfg_map_t
*cfg_maps; /**< \copydoc ::aom_codec_enc_cfg_map_t */
aom_codec_encode_fn_t encode; /**< \copydoc ::aom_codec_encode_fn_t */
aom_codec_get_cx_data_fn_t
get_cx_data; /**< \copydoc ::aom_codec_get_cx_data_fn_t */
aom_codec_enc_config_set_fn_t
cfg_set; /**< \copydoc ::aom_codec_enc_config_set_fn_t */
aom_codec_get_global_headers_fn_t
get_glob_hdrs; /**< \copydoc ::aom_codec_get_global_headers_fn_t */
aom_codec_get_preview_frame_fn_t
get_preview; /**< \copydoc ::aom_codec_get_preview_frame_fn_t */
aom_codec_enc_mr_get_mem_loc_fn_t
mr_get_mem_loc; /**< \copydoc ::aom_codec_enc_mr_get_mem_loc_fn_t */
} enc;
};
/*!\brief Callback function pointer / user data pair storage */
typedef struct aom_codec_priv_cb_pair {
union {
aom_codec_put_frame_cb_fn_t put_frame;
aom_codec_put_slice_cb_fn_t put_slice;
} u;
void *user_priv;
} aom_codec_priv_cb_pair_t;
/*!\brief Instance private storage
*
* This structure is allocated by the algorithm's init function. It can be
* extended in one of two ways. First, a second, algorithm specific structure
* can be allocated and the priv member pointed to it. Alternatively, this
* structure can be made the first member of the algorithm specific structure,
* and the pointer cast to the proper type.
*/
struct aom_codec_priv {
const char *err_detail;
aom_codec_flags_t init_flags;
struct {
aom_codec_priv_cb_pair_t put_frame_cb;
aom_codec_priv_cb_pair_t put_slice_cb;
} dec;
struct {
aom_fixed_buf_t cx_data_dst_buf;
unsigned int cx_data_pad_before;
unsigned int cx_data_pad_after;
aom_codec_cx_pkt_t cx_data_pkt;
unsigned int total_encoders;
} enc;
};
/*
* Multi-resolution encoding internal configuration
*/
struct aom_codec_priv_enc_mr_cfg {
unsigned int mr_total_resolutions;
unsigned int mr_encoder_id;
struct aom_rational mr_down_sampling_factor;
void *mr_low_res_mode_info;
};
#undef AOM_CTRL_USE_TYPE
#define AOM_CTRL_USE_TYPE(id, typ) \
static AOM_INLINE typ id##__value(va_list args) { return va_arg(args, typ); }
#undef AOM_CTRL_USE_TYPE_DEPRECATED
#define AOM_CTRL_USE_TYPE_DEPRECATED(id, typ) \
static AOM_INLINE typ id##__value(va_list args) { return va_arg(args, typ); }
#define CAST(id, arg) id##__value(arg)
/* CODEC_INTERFACE convenience macro
*
* By convention, each codec interface is a struct with extern linkage, where
* the symbol is suffixed with _algo. A getter function is also defined to
* return a pointer to the struct, since in some cases it's easier to work
* with text symbols than data symbols (see issue #169). This function has
* the same name as the struct, less the _algo suffix. The CODEC_INTERFACE
* macro is provided to define this getter function automatically.
*/
#define CODEC_INTERFACE(id) \
aom_codec_iface_t *id(void) { return &id##_algo; } \
aom_codec_iface_t id##_algo
/* Internal Utility Functions
*
* The following functions are intended to be used inside algorithms as
* utilities for manipulating aom_codec_* data structures.
*/
struct aom_codec_pkt_list {
unsigned int cnt;
unsigned int max;
struct aom_codec_cx_pkt pkts[1];
};
#define aom_codec_pkt_list_decl(n) \
union { \
struct aom_codec_pkt_list head; \
struct { \
struct aom_codec_pkt_list head; \
struct aom_codec_cx_pkt pkts[n]; \
} alloc; \
}
#define aom_codec_pkt_list_init(m) \
(m)->alloc.head.cnt = 0, \
(m)->alloc.head.max = sizeof((m)->alloc.pkts) / sizeof((m)->alloc.pkts[0])
int aom_codec_pkt_list_add(struct aom_codec_pkt_list *,
const struct aom_codec_cx_pkt *);
const aom_codec_cx_pkt_t *aom_codec_pkt_list_get(
struct aom_codec_pkt_list *list, aom_codec_iter_t *iter);
#include <stdio.h>
#include <setjmp.h>
struct aom_internal_error_info {
aom_codec_err_t error_code;
int has_detail;
char detail[80];
int setjmp;
jmp_buf jmp;
};
#define CLANG_ANALYZER_NORETURN
#if defined(__has_feature)
#if __has_feature(attribute_analyzer_noreturn)
#undef CLANG_ANALYZER_NORETURN
#define CLANG_ANALYZER_NORETURN __attribute__((analyzer_noreturn))
#endif
#endif
void aom_internal_error(struct aom_internal_error_info *info,
aom_codec_err_t error, const char *fmt,
...) CLANG_ANALYZER_NORETURN;
#if CONFIG_DEBUG
#define AOM_CHECK_MEM_ERROR(error_info, lval, expr) \
do { \
lval = (expr); \
if (!lval) \
aom_internal_error(error_info, AOM_CODEC_MEM_ERROR, \
"Failed to allocate " #lval " at %s:%d", __FILE__, \
__LINE__); \
} while (0)
#else
#define AOM_CHECK_MEM_ERROR(error_info, lval, expr) \
do { \
lval = (expr); \
if (!lval) \
aom_internal_error(error_info, AOM_CODEC_MEM_ERROR, \
"Failed to allocate " #lval); \
} while (0)
#endif
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_INTERNAL_AOM_CODEC_INTERNAL_H_
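
The "Instance private storage" comment in the header above describes extending `aom_codec_priv` by making it the first member of an algorithm specific structure and casting the pointer. A minimal standalone sketch of that pattern follows. All names here (`base_priv_sketch`, `my_alg_priv_sketch`, and the three functions) are hypothetical and are not part of libaom; the real structure has more fields.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the shared private storage. */
struct base_priv_sketch {
  const char *err_detail;
  unsigned long init_flags;
};

/* Algorithm specific storage: the shared part must be the FIRST member so
 * that a pointer to the whole struct and a pointer to `base` are the same
 * address. */
struct my_alg_priv_sketch {
  struct base_priv_sketch base;
  int frames_decoded;
};

/* Allocate the extended structure but hand back only the base pointer,
 * which is all the generic wrapper layer needs to know about. */
static struct base_priv_sketch *my_alg_init(unsigned long flags) {
  struct my_alg_priv_sketch *p = calloc(1, sizeof(*p));
  if (!p) return NULL;
  p->base.init_flags = flags;
  return &p->base;
}

/* Inside the algorithm, cast the base pointer back to the full type. */
static int my_alg_count_frame(struct base_priv_sketch *priv) {
  struct my_alg_priv_sketch *p;
  assert(priv != NULL);
  p = (struct my_alg_priv_sketch *)priv;
  return ++p->frames_decoded;
}

/* The base pointer is the address calloc returned, so it can be freed. */
static void my_alg_destroy(struct base_priv_sketch *priv) { free(priv); }
```

The cast is well defined because C guarantees that a pointer to a structure, suitably converted, points to its first member and vice versa. The alternative mentioned in the comment (a separate allocation reached through a pointer member) costs an extra indirection but avoids the cast.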

134
aom/src/aom_codec.c Normal file

@@ -0,0 +1,134 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
/*!\file
* \brief Provides the high level interface to wrap codec algorithms.
*
*/
#include <stdarg.h>
#include <stdlib.h>
#include "aom/aom_integer.h"
#include "aom/internal/aom_codec_internal.h"
#include "aom_version.h"
#define SAVE_STATUS(ctx, var) ((ctx) ? ((ctx)->err = (var)) : (var))
int aom_codec_version(void) { return VERSION_PACKED; }
const char *aom_codec_version_str(void) { return VERSION_STRING_NOSP; }
const char *aom_codec_version_extra_str(void) { return VERSION_EXTRA; }
const char *aom_codec_iface_name(aom_codec_iface_t *iface) {
return iface ? iface->name : "<invalid interface>";
}
const char *aom_codec_err_to_string(aom_codec_err_t err) {
switch (err) {
case AOM_CODEC_OK: return "Success";
case AOM_CODEC_ERROR: return "Unspecified internal error";
case AOM_CODEC_MEM_ERROR: return "Memory allocation error";
case AOM_CODEC_ABI_MISMATCH: return "ABI version mismatch";
case AOM_CODEC_INCAPABLE:
return "Codec does not implement requested capability";
case AOM_CODEC_UNSUP_BITSTREAM:
return "Bitstream not supported by this decoder";
case AOM_CODEC_UNSUP_FEATURE:
return "Bitstream required feature not supported by this decoder";
case AOM_CODEC_CORRUPT_FRAME: return "Corrupt frame detected";
case AOM_CODEC_INVALID_PARAM: return "Invalid parameter";
case AOM_CODEC_LIST_END: return "End of iterated list";
}
return "Unrecognized error code";
}
const char *aom_codec_error(aom_codec_ctx_t *ctx) {
return (ctx) ? aom_codec_err_to_string(ctx->err)
: aom_codec_err_to_string(AOM_CODEC_INVALID_PARAM);
}
const char *aom_codec_error_detail(aom_codec_ctx_t *ctx) {
if (ctx && ctx->err)
return ctx->priv ? ctx->priv->err_detail : ctx->err_detail;
return NULL;
}
aom_codec_err_t aom_codec_destroy(aom_codec_ctx_t *ctx) {
aom_codec_err_t res;
if (!ctx)
res = AOM_CODEC_INVALID_PARAM;
else if (!ctx->iface || !ctx->priv)
res = AOM_CODEC_ERROR;
else {
ctx->iface->destroy((aom_codec_alg_priv_t *)ctx->priv);
ctx->iface = NULL;
ctx->name = NULL;
ctx->priv = NULL;
res = AOM_CODEC_OK;
}
return SAVE_STATUS(ctx, res);
}
aom_codec_caps_t aom_codec_get_caps(aom_codec_iface_t *iface) {
return (iface) ? iface->caps : 0;
}
aom_codec_err_t aom_codec_control_(aom_codec_ctx_t *ctx, int ctrl_id, ...) {
aom_codec_err_t res;
if (!ctx || !ctrl_id)
res = AOM_CODEC_INVALID_PARAM;
else if (!ctx->iface || !ctx->priv || !ctx->iface->ctrl_maps)
res = AOM_CODEC_ERROR;
else {
aom_codec_ctrl_fn_map_t *entry;
res = AOM_CODEC_ERROR;
for (entry = ctx->iface->ctrl_maps; entry && entry->fn; entry++) {
if (!entry->ctrl_id || entry->ctrl_id == ctrl_id) {
va_list ap;
va_start(ap, ctrl_id);
res = entry->fn((aom_codec_alg_priv_t *)ctx->priv, ap);
va_end(ap);
break;
}
}
}
return SAVE_STATUS(ctx, res);
}
void aom_internal_error(struct aom_internal_error_info *info,
aom_codec_err_t error, const char *fmt, ...) {
va_list ap;
info->error_code = error;
info->has_detail = 0;
if (fmt) {
size_t sz = sizeof(info->detail);
info->has_detail = 1;
va_start(ap, fmt);
vsnprintf(info->detail, sz - 1, fmt, ap);
va_end(ap);
info->detail[sz - 1] = '\0';
}
if (info->setjmp) longjmp(info->jmp, info->error_code);
}
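
`aom_internal_error()` above records an error code and an optional formatted detail string, then jumps back to a recovery point if the caller has armed one. The following standalone sketch shows that control flow under simplified assumptions: the struct and function names are hypothetical, the flag is called `armed` rather than `setjmp`, and the error code is a plain `int`.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical, reduced version of the error-info record. */
struct error_info_sketch {
  int error_code;
  int has_detail;
  char detail[80];
  int armed; /* non-zero while `jmp` holds a valid recovery point */
  jmp_buf jmp;
};

/* Record the error and, if a recovery point is armed, jump to it. */
static void report_error(struct error_info_sketch *info, int code,
                         const char *fmt, ...) {
  assert(info != NULL);
  info->error_code = code;
  info->has_detail = 0;
  if (fmt) {
    va_list ap;
    info->has_detail = 1;
    va_start(ap, fmt);
    /* vsnprintf always NUL-terminates when the size is non-zero. */
    vsnprintf(info->detail, sizeof(info->detail), fmt, ap);
    va_end(ap);
  }
  if (info->armed) longjmp(info->jmp, info->error_code);
}

/* Run a fallible step under a recovery point. Returns 0 on success or the
 * reported (non-zero) error code on failure. */
static int run_guarded(struct error_info_sketch *info, int should_fail) {
  if (setjmp(info->jmp)) {
    info->armed = 0; /* we arrived here via longjmp */
    return info->error_code;
  }
  info->armed = 1;
  if (should_fail) report_error(info, 2, "Failed to allocate %s", "buffer");
  info->armed = 0;
  return 0;
}
```

Error codes passed to `longjmp` should be non-zero, since `setjmp` reports a jump with value 0 as 1. When no recovery point is armed, `report_error` simply returns, so callers that rely on it never returning need to arm one first.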

189
aom/src/aom_decoder.c Normal file

@@ -0,0 +1,189 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
/*!\file
* \brief Provides the high level interface to wrap decoder algorithms.
*
*/
#include <string.h>
#include "aom/internal/aom_codec_internal.h"
#define SAVE_STATUS(ctx, var) ((ctx) ? ((ctx)->err = (var)) : (var))
static aom_codec_alg_priv_t *get_alg_priv(aom_codec_ctx_t *ctx) {
return (aom_codec_alg_priv_t *)ctx->priv;
}
aom_codec_err_t aom_codec_dec_init_ver(aom_codec_ctx_t *ctx,
aom_codec_iface_t *iface,
const aom_codec_dec_cfg_t *cfg,
aom_codec_flags_t flags, int ver) {
aom_codec_err_t res;
if (ver != AOM_DECODER_ABI_VERSION)
res = AOM_CODEC_ABI_MISMATCH;
else if (!ctx || !iface)
res = AOM_CODEC_INVALID_PARAM;
else if (iface->abi_version != AOM_CODEC_INTERNAL_ABI_VERSION)
res = AOM_CODEC_ABI_MISMATCH;
else if ((flags & AOM_CODEC_USE_POSTPROC) &&
!(iface->caps & AOM_CODEC_CAP_POSTPROC))
res = AOM_CODEC_INCAPABLE;
else if ((flags & AOM_CODEC_USE_ERROR_CONCEALMENT) &&
!(iface->caps & AOM_CODEC_CAP_ERROR_CONCEALMENT))
res = AOM_CODEC_INCAPABLE;
else if ((flags & AOM_CODEC_USE_INPUT_FRAGMENTS) &&
!(iface->caps & AOM_CODEC_CAP_INPUT_FRAGMENTS))
res = AOM_CODEC_INCAPABLE;
else if (!(iface->caps & AOM_CODEC_CAP_DECODER))
res = AOM_CODEC_INCAPABLE;
else {
memset(ctx, 0, sizeof(*ctx));
ctx->iface = iface;
ctx->name = iface->name;
ctx->priv = NULL;
ctx->init_flags = flags;
ctx->config.dec = cfg;
res = ctx->iface->init(ctx, NULL);
if (res) {
ctx->err_detail = ctx->priv ? ctx->priv->err_detail : NULL;
aom_codec_destroy(ctx);
}
}
return SAVE_STATUS(ctx, res);
}
aom_codec_err_t aom_codec_peek_stream_info(aom_codec_iface_t *iface,
const uint8_t *data,
unsigned int data_sz,
aom_codec_stream_info_t *si) {
aom_codec_err_t res;
if (!iface || !data || !data_sz || !si ||
si->sz < sizeof(aom_codec_stream_info_t))
res = AOM_CODEC_INVALID_PARAM;
else {
/* Set default/unknown values */
si->w = 0;
si->h = 0;
res = iface->dec.peek_si(data, data_sz, si);
}
return res;
}
aom_codec_err_t aom_codec_get_stream_info(aom_codec_ctx_t *ctx,
aom_codec_stream_info_t *si) {
aom_codec_err_t res;
if (!ctx || !si || si->sz < sizeof(aom_codec_stream_info_t))
res = AOM_CODEC_INVALID_PARAM;
else if (!ctx->iface || !ctx->priv)
res = AOM_CODEC_ERROR;
else {
/* Set default/unknown values */
si->w = 0;
si->h = 0;
res = ctx->iface->dec.get_si(get_alg_priv(ctx), si);
}
return SAVE_STATUS(ctx, res);
}
aom_codec_err_t aom_codec_decode(aom_codec_ctx_t *ctx, const uint8_t *data,
unsigned int data_sz, void *user_priv,
long deadline) {
aom_codec_err_t res;
/* Sanity checks */
/* NULL data ptr allowed if data_sz is 0 too */
if (!ctx || (!data && data_sz) || (data && !data_sz))
res = AOM_CODEC_INVALID_PARAM;
else if (!ctx->iface || !ctx->priv)
res = AOM_CODEC_ERROR;
else {
res = ctx->iface->dec.decode(get_alg_priv(ctx), data, data_sz, user_priv,
deadline);
}
return SAVE_STATUS(ctx, res);
}
aom_image_t *aom_codec_get_frame(aom_codec_ctx_t *ctx, aom_codec_iter_t *iter) {
aom_image_t *img;
if (!ctx || !iter || !ctx->iface || !ctx->priv)
img = NULL;
else
img = ctx->iface->dec.get_frame(get_alg_priv(ctx), iter);
return img;
}
aom_codec_err_t aom_codec_register_put_frame_cb(aom_codec_ctx_t *ctx,
aom_codec_put_frame_cb_fn_t cb,
void *user_priv) {
aom_codec_err_t res;
if (!ctx || !cb)
res = AOM_CODEC_INVALID_PARAM;
else if (!ctx->iface || !ctx->priv ||
!(ctx->iface->caps & AOM_CODEC_CAP_PUT_FRAME))
res = AOM_CODEC_ERROR;
else {
ctx->priv->dec.put_frame_cb.u.put_frame = cb;
ctx->priv->dec.put_frame_cb.user_priv = user_priv;
res = AOM_CODEC_OK;
}
return SAVE_STATUS(ctx, res);
}
aom_codec_err_t aom_codec_register_put_slice_cb(aom_codec_ctx_t *ctx,
aom_codec_put_slice_cb_fn_t cb,
void *user_priv) {
aom_codec_err_t res;
if (!ctx || !cb)
res = AOM_CODEC_INVALID_PARAM;
else if (!ctx->iface || !ctx->priv ||
!(ctx->iface->caps & AOM_CODEC_CAP_PUT_SLICE))
res = AOM_CODEC_ERROR;
else {
ctx->priv->dec.put_slice_cb.u.put_slice = cb;
ctx->priv->dec.put_slice_cb.user_priv = user_priv;
res = AOM_CODEC_OK;
}
return SAVE_STATUS(ctx, res);
}
aom_codec_err_t aom_codec_set_frame_buffer_functions(
aom_codec_ctx_t *ctx, aom_get_frame_buffer_cb_fn_t cb_get,
aom_release_frame_buffer_cb_fn_t cb_release, void *cb_priv) {
aom_codec_err_t res;
if (!ctx || !cb_get || !cb_release) {
res = AOM_CODEC_INVALID_PARAM;
} else if (!ctx->iface || !ctx->priv ||
!(ctx->iface->caps & AOM_CODEC_CAP_EXTERNAL_FRAME_BUFFER)) {
res = AOM_CODEC_ERROR;
} else {
res = ctx->iface->dec.set_fb_fn(get_alg_priv(ctx), cb_get, cb_release,
cb_priv);
}
return SAVE_STATUS(ctx, res);
}
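
`aom_codec_dec_init_ver()` above rejects any requested feature flag whose matching capability bit is missing from the interface. The chain of `else if` tests can be read as a table lookup, sketched here in standalone form. The bit values, names, and the table itself are hypothetical and chosen only for illustration; they are not the libaom constants.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_err { SKETCH_OK = 0, SKETCH_INCAPABLE = 1 };

/* Hypothetical capability bits advertised by an interface. */
#define SKETCH_CAP_DECODER 0x1UL
#define SKETCH_CAP_POSTPROC 0x2UL
#define SKETCH_CAP_FRAGMENTS 0x4UL

/* Hypothetical feature flags requested by the application. */
#define SKETCH_USE_POSTPROC 0x100UL
#define SKETCH_USE_FRAGMENTS 0x200UL

/* Each requested flag needs one capability bit. */
struct flag_cap_pair {
  unsigned long use_flag;
  unsigned long cap;
};

static const struct flag_cap_pair kRequired[] = {
  { SKETCH_USE_POSTPROC, SKETCH_CAP_POSTPROC },
  { SKETCH_USE_FRAGMENTS, SKETCH_CAP_FRAGMENTS },
};

/* Returns SKETCH_OK only if the interface is a decoder and every requested
 * flag is backed by its capability bit. */
static enum sketch_err check_init_flags(unsigned long caps,
                                        unsigned long flags) {
  size_t i;
  if (!(caps & SKETCH_CAP_DECODER)) return SKETCH_INCAPABLE;
  for (i = 0; i < sizeof(kRequired) / sizeof(kRequired[0]); ++i) {
    assert(kRequired[i].use_flag && kRequired[i].cap);
    if ((flags & kRequired[i].use_flag) && !(caps & kRequired[i].cap))
      return SKETCH_INCAPABLE;
  }
  return SKETCH_OK;
}
```

The sketch tests the decoder capability first, whereas the function above tests it last. The result is the same because every failing branch returns the same "incapable" status.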

380
aom/src/aom_encoder.c Normal file

@@ -0,0 +1,380 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
/*!\file
* \brief Provides the high level interface to wrap encoder algorithms.
*
*/
#include <limits.h>
#include <string.h>
#include "aom_config.h"
#include "aom/internal/aom_codec_internal.h"
#define SAVE_STATUS(ctx, var) ((ctx) ? ((ctx)->err = (var)) : (var))
static aom_codec_alg_priv_t *get_alg_priv(aom_codec_ctx_t *ctx) {
return (aom_codec_alg_priv_t *)ctx->priv;
}
aom_codec_err_t aom_codec_enc_init_ver(aom_codec_ctx_t *ctx,
aom_codec_iface_t *iface,
const aom_codec_enc_cfg_t *cfg,
aom_codec_flags_t flags, int ver) {
aom_codec_err_t res;
if (ver != AOM_ENCODER_ABI_VERSION)
res = AOM_CODEC_ABI_MISMATCH;
else if (!ctx || !iface || !cfg)
res = AOM_CODEC_INVALID_PARAM;
else if (iface->abi_version != AOM_CODEC_INTERNAL_ABI_VERSION)
res = AOM_CODEC_ABI_MISMATCH;
else if (!(iface->caps & AOM_CODEC_CAP_ENCODER))
res = AOM_CODEC_INCAPABLE;
else if ((flags & AOM_CODEC_USE_PSNR) && !(iface->caps & AOM_CODEC_CAP_PSNR))
res = AOM_CODEC_INCAPABLE;
else if ((flags & AOM_CODEC_USE_OUTPUT_PARTITION) &&
!(iface->caps & AOM_CODEC_CAP_OUTPUT_PARTITION))
res = AOM_CODEC_INCAPABLE;
else {
ctx->iface = iface;
ctx->name = iface->name;
ctx->priv = NULL;
ctx->init_flags = flags;
ctx->config.enc = cfg;
res = ctx->iface->init(ctx, NULL);
if (res) {
ctx->err_detail = ctx->priv ? ctx->priv->err_detail : NULL;
aom_codec_destroy(ctx);
}
}
return SAVE_STATUS(ctx, res);
}
aom_codec_err_t aom_codec_enc_init_multi_ver(
aom_codec_ctx_t *ctx, aom_codec_iface_t *iface, aom_codec_enc_cfg_t *cfg,
int num_enc, aom_codec_flags_t flags, aom_rational_t *dsf, int ver) {
aom_codec_err_t res = AOM_CODEC_OK;
if (ver != AOM_ENCODER_ABI_VERSION)
res = AOM_CODEC_ABI_MISMATCH;
else if (!ctx || !iface || !cfg || (num_enc > 16 || num_enc < 1))
res = AOM_CODEC_INVALID_PARAM;
else if (iface->abi_version != AOM_CODEC_INTERNAL_ABI_VERSION)
res = AOM_CODEC_ABI_MISMATCH;
else if (!(iface->caps & AOM_CODEC_CAP_ENCODER))
res = AOM_CODEC_INCAPABLE;
else if ((flags & AOM_CODEC_USE_PSNR) && !(iface->caps & AOM_CODEC_CAP_PSNR))
res = AOM_CODEC_INCAPABLE;
else if ((flags & AOM_CODEC_USE_OUTPUT_PARTITION) &&
!(iface->caps & AOM_CODEC_CAP_OUTPUT_PARTITION))
res = AOM_CODEC_INCAPABLE;
else {
int i;
void *mem_loc = NULL;
if (!(res = iface->enc.mr_get_mem_loc(cfg, &mem_loc))) {
for (i = 0; i < num_enc; i++) {
aom_codec_priv_enc_mr_cfg_t mr_cfg;
/* Validate down-sampling factor. */
if (dsf->num < 1 || dsf->num > 4096 || dsf->den < 1 ||
dsf->den > dsf->num) {
res = AOM_CODEC_INVALID_PARAM;
break;
}
mr_cfg.mr_low_res_mode_info = mem_loc;
mr_cfg.mr_total_resolutions = num_enc;
mr_cfg.mr_encoder_id = num_enc - 1 - i;
mr_cfg.mr_down_sampling_factor.num = dsf->num;
mr_cfg.mr_down_sampling_factor.den = dsf->den;
/* Force key-frame synchronization: encoders at higher resolutions
* always use the same frame_type chosen by the lowest-resolution
* encoder.
*/
if (mr_cfg.mr_encoder_id) cfg->kf_mode = AOM_KF_DISABLED;
ctx->iface = iface;
ctx->name = iface->name;
ctx->priv = NULL;
ctx->init_flags = flags;
ctx->config.enc = cfg;
res = ctx->iface->init(ctx, &mr_cfg);
if (res) {
const char *error_detail = ctx->priv ? ctx->priv->err_detail : NULL;
/* Destroy current ctx */
ctx->err_detail = error_detail;
aom_codec_destroy(ctx);
/* Destroy already allocated high-level ctx */
while (i) {
ctx--;
ctx->err_detail = error_detail;
aom_codec_destroy(ctx);
i--;
}
}
if (res) break;
ctx++;
cfg++;
dsf++;
}
ctx--;
}
}
return SAVE_STATUS(ctx, res);
}
aom_codec_err_t aom_codec_enc_config_default(aom_codec_iface_t *iface,
aom_codec_enc_cfg_t *cfg,
unsigned int usage) {
aom_codec_err_t res;
aom_codec_enc_cfg_map_t *map;
int i;
if (!iface || !cfg || usage > INT_MAX)
res = AOM_CODEC_INVALID_PARAM;
else if (!(iface->caps & AOM_CODEC_CAP_ENCODER))
res = AOM_CODEC_INCAPABLE;
else {
res = AOM_CODEC_INVALID_PARAM;
for (i = 0; i < iface->enc.cfg_map_count; ++i) {
map = iface->enc.cfg_maps + i;
if (map->usage == (int)usage) {
*cfg = map->cfg;
cfg->g_usage = usage;
res = AOM_CODEC_OK;
break;
}
}
}
return res;
}
#if ARCH_X86 || ARCH_X86_64
/* On X86, disable the x87 unit's internal 80 bit precision for better
* consistency with the SSE unit's 64 bit precision.
*/
#include "aom_ports/x86.h"
#define FLOATING_POINT_INIT() \
do { \
unsigned short x87_orig_mode = x87_set_double_precision();
#define FLOATING_POINT_RESTORE() \
x87_set_control_word(x87_orig_mode); \
} \
while (0)
#else
static void FLOATING_POINT_INIT(void) {}
static void FLOATING_POINT_RESTORE(void) {}
#endif
aom_codec_err_t aom_codec_encode(aom_codec_ctx_t *ctx, const aom_image_t *img,
aom_codec_pts_t pts, unsigned long duration,
aom_enc_frame_flags_t flags,
unsigned long deadline) {
aom_codec_err_t res = AOM_CODEC_OK;
if (!ctx || (img && !duration))
res = AOM_CODEC_INVALID_PARAM;
else if (!ctx->iface || !ctx->priv)
res = AOM_CODEC_ERROR;
else if (!(ctx->iface->caps & AOM_CODEC_CAP_ENCODER))
res = AOM_CODEC_INCAPABLE;
else {
unsigned int num_enc = ctx->priv->enc.total_encoders;
/* Execute in a normalized floating point environment, if the platform
* requires it.
*/
FLOATING_POINT_INIT();
if (num_enc == 1)
res = ctx->iface->enc.encode(get_alg_priv(ctx), img, pts, duration, flags,
deadline);
else {
/* Multi-resolution encoding:
* Encode multi-levels in reverse order. For example,
* if mr_total_resolutions = 3, first encode level 2,
* then encode level 1, and finally encode level 0.
*/
int i;
ctx += num_enc - 1;
if (img) img += num_enc - 1;
for (i = num_enc - 1; i >= 0; i--) {
if ((res = ctx->iface->enc.encode(get_alg_priv(ctx), img, pts, duration,
flags, deadline)))
break;
ctx--;
if (img) img--;
}
ctx++;
}
FLOATING_POINT_RESTORE();
}
return SAVE_STATUS(ctx, res);
}
const aom_codec_cx_pkt_t *aom_codec_get_cx_data(aom_codec_ctx_t *ctx,
aom_codec_iter_t *iter) {
const aom_codec_cx_pkt_t *pkt = NULL;
if (ctx) {
if (!iter)
ctx->err = AOM_CODEC_INVALID_PARAM;
else if (!ctx->iface || !ctx->priv)
ctx->err = AOM_CODEC_ERROR;
else if (!(ctx->iface->caps & AOM_CODEC_CAP_ENCODER))
ctx->err = AOM_CODEC_INCAPABLE;
else
pkt = ctx->iface->enc.get_cx_data(get_alg_priv(ctx), iter);
}
if (pkt && pkt->kind == AOM_CODEC_CX_FRAME_PKT) {
// If the application has specified a destination area for the
// compressed data, and the codec has not placed the data there,
// and it fits, copy it.
aom_codec_priv_t *const priv = ctx->priv;
char *const dst_buf = (char *)priv->enc.cx_data_dst_buf.buf;
if (dst_buf && pkt->data.raw.buf != dst_buf &&
pkt->data.raw.sz + priv->enc.cx_data_pad_before +
priv->enc.cx_data_pad_after <=
priv->enc.cx_data_dst_buf.sz) {
aom_codec_cx_pkt_t *modified_pkt = &priv->enc.cx_data_pkt;
memcpy(dst_buf + priv->enc.cx_data_pad_before, pkt->data.raw.buf,
pkt->data.raw.sz);
*modified_pkt = *pkt;
modified_pkt->data.raw.buf = dst_buf;
modified_pkt->data.raw.sz +=
priv->enc.cx_data_pad_before + priv->enc.cx_data_pad_after;
pkt = modified_pkt;
}
if (dst_buf == pkt->data.raw.buf) {
priv->enc.cx_data_dst_buf.buf = dst_buf + pkt->data.raw.sz;
priv->enc.cx_data_dst_buf.sz -= pkt->data.raw.sz;
}
}
return pkt;
}
aom_codec_err_t aom_codec_set_cx_data_buf(aom_codec_ctx_t *ctx,
const aom_fixed_buf_t *buf,
unsigned int pad_before,
unsigned int pad_after) {
if (!ctx || !ctx->priv) return AOM_CODEC_INVALID_PARAM;
if (buf) {
ctx->priv->enc.cx_data_dst_buf = *buf;
ctx->priv->enc.cx_data_pad_before = pad_before;
ctx->priv->enc.cx_data_pad_after = pad_after;
} else {
ctx->priv->enc.cx_data_dst_buf.buf = NULL;
ctx->priv->enc.cx_data_dst_buf.sz = 0;
ctx->priv->enc.cx_data_pad_before = 0;
ctx->priv->enc.cx_data_pad_after = 0;
}
return AOM_CODEC_OK;
}
const aom_image_t *aom_codec_get_preview_frame(aom_codec_ctx_t *ctx) {
aom_image_t *img = NULL;
if (ctx) {
if (!ctx->iface || !ctx->priv)
ctx->err = AOM_CODEC_ERROR;
else if (!(ctx->iface->caps & AOM_CODEC_CAP_ENCODER))
ctx->err = AOM_CODEC_INCAPABLE;
else if (!ctx->iface->enc.get_preview)
ctx->err = AOM_CODEC_INCAPABLE;
else
img = ctx->iface->enc.get_preview(get_alg_priv(ctx));
}
return img;
}
aom_fixed_buf_t *aom_codec_get_global_headers(aom_codec_ctx_t *ctx) {
aom_fixed_buf_t *buf = NULL;
if (ctx) {
if (!ctx->iface || !ctx->priv)
ctx->err = AOM_CODEC_ERROR;
else if (!(ctx->iface->caps & AOM_CODEC_CAP_ENCODER))
ctx->err = AOM_CODEC_INCAPABLE;
else if (!ctx->iface->enc.get_glob_hdrs)
ctx->err = AOM_CODEC_INCAPABLE;
else
buf = ctx->iface->enc.get_glob_hdrs(get_alg_priv(ctx));
}
return buf;
}
aom_codec_err_t aom_codec_enc_config_set(aom_codec_ctx_t *ctx,
const aom_codec_enc_cfg_t *cfg) {
aom_codec_err_t res;
if (!ctx || !ctx->iface || !ctx->priv || !cfg)
res = AOM_CODEC_INVALID_PARAM;
else if (!(ctx->iface->caps & AOM_CODEC_CAP_ENCODER))
res = AOM_CODEC_INCAPABLE;
else
res = ctx->iface->enc.cfg_set(get_alg_priv(ctx), cfg);
return SAVE_STATUS(ctx, res);
}
int aom_codec_pkt_list_add(struct aom_codec_pkt_list *list,
const struct aom_codec_cx_pkt *pkt) {
if (list->cnt < list->max) {
list->pkts[list->cnt++] = *pkt;
return 0;
}
return 1;
}
const aom_codec_cx_pkt_t *aom_codec_pkt_list_get(
struct aom_codec_pkt_list *list, aom_codec_iter_t *iter) {
const aom_codec_cx_pkt_t *pkt;
if (!(*iter)) {
*iter = list->pkts;
}
pkt = (const aom_codec_cx_pkt_t *)*iter;
if ((size_t)(pkt - list->pkts) < list->cnt)
*iter = pkt + 1;
else
pkt = NULL;
return pkt;
}
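
`aom_codec_pkt_list_add()` and `aom_codec_pkt_list_get()` above implement a fixed-capacity list read through an opaque iterator: the caller starts with a NULL iterator and keeps calling until NULL comes back. A standalone sketch of that protocol follows. It assumes a simplified packet type and a fixed-size array in place of the `aom_codec_pkt_list_decl` union, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced packet type. */
struct pkt_sketch {
  int kind;
  size_t sz;
};

#define LIST_CAP_SKETCH 4

struct pkt_list_sketch {
  unsigned int cnt; /* packets currently stored */
  unsigned int max; /* capacity */
  struct pkt_sketch pkts[LIST_CAP_SKETCH];
};

/* Opaque iterator: NULL means "start from the beginning". */
typedef const void *iter_sketch_t;

static void list_init(struct pkt_list_sketch *list) {
  list->cnt = 0;
  list->max = LIST_CAP_SKETCH;
}

/* Returns 0 on success, 1 if the list is full. */
static int list_add(struct pkt_list_sketch *list,
                    const struct pkt_sketch *pkt) {
  if (list->cnt < list->max) {
    list->pkts[list->cnt++] = *pkt;
    return 0;
  }
  return 1;
}

/* Returns the next packet and advances the iterator, or NULL at the end. */
static const struct pkt_sketch *list_get(struct pkt_list_sketch *list,
                                         iter_sketch_t *iter) {
  const struct pkt_sketch *pkt;
  assert(list != NULL && iter != NULL);
  if (!*iter) *iter = list->pkts;
  pkt = (const struct pkt_sketch *)*iter;
  if ((size_t)(pkt - list->pkts) < list->cnt)
    *iter = pkt + 1;
  else
    pkt = NULL;
  return pkt;
}
```

Because the iterator is only advanced on a successful read, calling `list_get` again after it has returned NULL keeps returning NULL until the caller resets the iterator.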

240
aom/src/aom_image.c Normal file

@@ -0,0 +1,240 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <stdlib.h>
#include <string.h>
#include "aom/aom_image.h"
#include "aom/aom_integer.h"
#include "aom_mem/aom_mem.h"
static aom_image_t *img_alloc_helper(aom_image_t *img, aom_img_fmt_t fmt,
unsigned int d_w, unsigned int d_h,
unsigned int buf_align,
unsigned int stride_align,
unsigned char *img_data) {
unsigned int h, w, s, xcs, ycs, bps;
unsigned int stride_in_bytes;
int align;
/* Treat align==0 like align==1 */
if (!buf_align) buf_align = 1;
/* Validate alignment (must be power of 2) */
if (buf_align & (buf_align - 1)) goto fail;
/* Treat align==0 like align==1 */
if (!stride_align) stride_align = 1;
/* Validate alignment (must be power of 2) */
if (stride_align & (stride_align - 1)) goto fail;
/* Get sample size for this format */
switch (fmt) {
case AOM_IMG_FMT_RGB32:
case AOM_IMG_FMT_RGB32_LE:
case AOM_IMG_FMT_ARGB:
case AOM_IMG_FMT_ARGB_LE: bps = 32; break;
case AOM_IMG_FMT_RGB24:
case AOM_IMG_FMT_BGR24: bps = 24; break;
case AOM_IMG_FMT_RGB565:
case AOM_IMG_FMT_RGB565_LE:
case AOM_IMG_FMT_RGB555:
case AOM_IMG_FMT_RGB555_LE:
case AOM_IMG_FMT_UYVY:
case AOM_IMG_FMT_YUY2:
case AOM_IMG_FMT_YVYU: bps = 16; break;
case AOM_IMG_FMT_I420:
case AOM_IMG_FMT_YV12:
case AOM_IMG_FMT_AOMI420:
case AOM_IMG_FMT_AOMYV12: bps = 12; break;
case AOM_IMG_FMT_I422:
case AOM_IMG_FMT_I440: bps = 16; break;
case AOM_IMG_FMT_I444: bps = 24; break;
case AOM_IMG_FMT_I42016: bps = 24; break;
case AOM_IMG_FMT_I42216:
case AOM_IMG_FMT_I44016: bps = 32; break;
case AOM_IMG_FMT_I44416: bps = 48; break;
default: bps = 16; break;
}
/* Get chroma shift values for this format */
switch (fmt) {
case AOM_IMG_FMT_I420:
case AOM_IMG_FMT_YV12:
case AOM_IMG_FMT_AOMI420:
case AOM_IMG_FMT_AOMYV12:
case AOM_IMG_FMT_I422:
case AOM_IMG_FMT_I42016:
case AOM_IMG_FMT_I42216: xcs = 1; break;
default: xcs = 0; break;
}
switch (fmt) {
case AOM_IMG_FMT_I420:
case AOM_IMG_FMT_I440:
case AOM_IMG_FMT_YV12:
case AOM_IMG_FMT_AOMI420:
case AOM_IMG_FMT_AOMYV12:
case AOM_IMG_FMT_I42016:
case AOM_IMG_FMT_I44016: ycs = 1; break;
default: ycs = 0; break;
}
/* Calculate storage sizes given the chroma subsampling */
align = (1 << xcs) - 1;
w = (d_w + align) & ~align;
align = (1 << ycs) - 1;
h = (d_h + align) & ~align;
s = (fmt & AOM_IMG_FMT_PLANAR) ? w : bps * w / 8;
s = (s + stride_align - 1) & ~(stride_align - 1);
stride_in_bytes = (fmt & AOM_IMG_FMT_HIGHBITDEPTH) ? s * 2 : s;
/* Allocate the new image */
if (!img) {
img = (aom_image_t *)calloc(1, sizeof(aom_image_t));
if (!img) goto fail;
img->self_allocd = 1;
} else {
memset(img, 0, sizeof(aom_image_t));
}
img->img_data = img_data;
if (!img_data) {
const uint64_t alloc_size = (fmt & AOM_IMG_FMT_PLANAR)
? (uint64_t)h * s * bps / 8
: (uint64_t)h * s;
if (alloc_size != (size_t)alloc_size) goto fail;
img->img_data = (uint8_t *)aom_memalign(buf_align, (size_t)alloc_size);
img->img_data_owner = 1;
}
if (!img->img_data) goto fail;
img->fmt = fmt;
img->bit_depth = (fmt & AOM_IMG_FMT_HIGHBITDEPTH) ? 16 : 8;
img->w = w;
img->h = h;
img->x_chroma_shift = xcs;
img->y_chroma_shift = ycs;
img->bps = bps;
/* Calculate strides */
img->stride[AOM_PLANE_Y] = img->stride[AOM_PLANE_ALPHA] = stride_in_bytes;
img->stride[AOM_PLANE_U] = img->stride[AOM_PLANE_V] = stride_in_bytes >> xcs;
/* Default viewport to entire image */
if (!aom_img_set_rect(img, 0, 0, d_w, d_h)) return img;
fail:
aom_img_free(img);
return NULL;
}
aom_image_t *aom_img_alloc(aom_image_t *img, aom_img_fmt_t fmt,
unsigned int d_w, unsigned int d_h,
unsigned int align) {
return img_alloc_helper(img, fmt, d_w, d_h, align, align, NULL);
}
aom_image_t *aom_img_wrap(aom_image_t *img, aom_img_fmt_t fmt, unsigned int d_w,
unsigned int d_h, unsigned int stride_align,
unsigned char *img_data) {
/* By setting buf_align = 1, we don't change buffer alignment in this
* function. */
return img_alloc_helper(img, fmt, d_w, d_h, 1, stride_align, img_data);
}
int aom_img_set_rect(aom_image_t *img, unsigned int x, unsigned int y,
unsigned int w, unsigned int h) {
unsigned char *data;
if (x + w <= img->w && y + h <= img->h) {
img->d_w = w;
img->d_h = h;
/* Calculate plane pointers */
if (!(img->fmt & AOM_IMG_FMT_PLANAR)) {
img->planes[AOM_PLANE_PACKED] =
img->img_data + x * img->bps / 8 + y * img->stride[AOM_PLANE_PACKED];
} else {
const int bytes_per_sample =
(img->fmt & AOM_IMG_FMT_HIGHBITDEPTH) ? 2 : 1;
data = img->img_data;
if (img->fmt & AOM_IMG_FMT_HAS_ALPHA) {
img->planes[AOM_PLANE_ALPHA] =
data + x * bytes_per_sample + y * img->stride[AOM_PLANE_ALPHA];
data += img->h * img->stride[AOM_PLANE_ALPHA];
}
img->planes[AOM_PLANE_Y] =
data + x * bytes_per_sample + y * img->stride[AOM_PLANE_Y];
data += img->h * img->stride[AOM_PLANE_Y];
if (!(img->fmt & AOM_IMG_FMT_UV_FLIP)) {
img->planes[AOM_PLANE_U] =
data + (x >> img->x_chroma_shift) * bytes_per_sample +
(y >> img->y_chroma_shift) * img->stride[AOM_PLANE_U];
data += (img->h >> img->y_chroma_shift) * img->stride[AOM_PLANE_U];
img->planes[AOM_PLANE_V] =
data + (x >> img->x_chroma_shift) * bytes_per_sample +
(y >> img->y_chroma_shift) * img->stride[AOM_PLANE_V];
} else {
img->planes[AOM_PLANE_V] =
data + (x >> img->x_chroma_shift) * bytes_per_sample +
(y >> img->y_chroma_shift) * img->stride[AOM_PLANE_V];
data += (img->h >> img->y_chroma_shift) * img->stride[AOM_PLANE_V];
img->planes[AOM_PLANE_U] =
data + (x >> img->x_chroma_shift) * bytes_per_sample +
(y >> img->y_chroma_shift) * img->stride[AOM_PLANE_U];
}
}
return 0;
}
return -1;
}
void aom_img_flip(aom_image_t *img) {
/* Note: In the pointer adjustment calculation, we want the
* rhs to be promoted to a signed type. Section 6.3.1.8 of the ISO C99
* standard indicates that if the adjustment parameter is unsigned, the
* stride parameter will be promoted to unsigned, causing errors when
* the lhs is a larger type than the rhs.
*/
img->planes[AOM_PLANE_Y] += (signed)(img->d_h - 1) * img->stride[AOM_PLANE_Y];
img->stride[AOM_PLANE_Y] = -img->stride[AOM_PLANE_Y];
img->planes[AOM_PLANE_U] += (signed)((img->d_h >> img->y_chroma_shift) - 1) *
img->stride[AOM_PLANE_U];
img->stride[AOM_PLANE_U] = -img->stride[AOM_PLANE_U];
img->planes[AOM_PLANE_V] += (signed)((img->d_h >> img->y_chroma_shift) - 1) *
img->stride[AOM_PLANE_V];
img->stride[AOM_PLANE_V] = -img->stride[AOM_PLANE_V];
img->planes[AOM_PLANE_ALPHA] +=
(signed)(img->d_h - 1) * img->stride[AOM_PLANE_ALPHA];
img->stride[AOM_PLANE_ALPHA] = -img->stride[AOM_PLANE_ALPHA];
}
void aom_img_free(aom_image_t *img) {
if (img) {
if (img->img_data && img->img_data_owner) aom_free(img->img_data);
if (img->self_allocd) free(img);
}
}
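
The plane-pointer arithmetic in aom_img_set_rect can be hard to follow inline. The sketch below restates it for the simplest case only: 8-bit planar YUV, no alpha plane, no U/V swap. The struct and function names (sketch_layout, sketch_offsets, sketch_rect_offsets) are hypothetical and not part of the library; the sketch returns byte offsets rather than pointers so the numbers can be checked by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for aom_image_t: 8-bit planar YUV with
 * no alpha plane and no U/V swap. Field names are illustrative only. */
typedef struct {
  unsigned int w, h;           /* allocated width and height in pixels */
  unsigned int x_chroma_shift; /* 1 when chroma is halved horizontally */
  unsigned int y_chroma_shift; /* 1 when chroma is halved vertically */
  int stride_y, stride_uv;     /* bytes per row of luma and chroma */
} sketch_layout;

typedef struct {
  size_t y, u, v; /* byte offsets of each plane's viewport origin */
} sketch_offsets;

/* Returns 0 and fills *out when the rectangle fits, -1 otherwise
 * (same return convention as aom_img_set_rect). */
static int sketch_rect_offsets(const sketch_layout *l, unsigned int x,
                               unsigned int y, unsigned int w, unsigned int h,
                               sketch_offsets *out) {
  size_t base = 0;
  const size_t cx = x >> l->x_chroma_shift;
  const size_t cy = y >> l->y_chroma_shift;
  if (x + w > l->w || y + h > l->h) return -1;
  /* Luma: one byte per sample, so x is already a byte offset. */
  out->y = base + x + (size_t)y * (size_t)l->stride_y;
  /* Skip the whole luma plane, then address the subsampled U plane. */
  base += (size_t)l->h * (size_t)l->stride_y;
  out->u = base + cx + cy * (size_t)l->stride_uv;
  /* Skip the whole U plane, then address V the same way. */
  base += (size_t)(l->h >> l->y_chroma_shift) * (size_t)l->stride_uv;
  out->v = base + cx + cy * (size_t)l->stride_uv;
  return 0;
}
```

For a 64x48 4:2:0 layout with strides 64 and 32, a viewport at (16, 8) lands at luma offset 16 + 8 * 64 = 528; the chroma offsets use the halved coordinates (8, 4) on top of the preceding plane sizes.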


@@ -11,22 +11,20 @@
 #include <math.h>
 #include <stdlib.h>
-#include "./vpx_config.h"
-#include "./vpx_dsp_rtcd.h"
-#include "vpx/vpx_integer.h"
-#include "vpx_ports/mem.h"
+#include "./aom_config.h"
+#include "./aom_dsp_rtcd.h"
+#include "aom/aom_integer.h"
+#include "aom_ports/mem.h"
-void vpx_plane_add_noise_c(uint8_t *start, char *noise,
-char blackclamp[16],
-char whiteclamp[16],
-char bothclamp[16],
+void aom_plane_add_noise_c(uint8_t *start, char *noise, char blackclamp[16],
+char whiteclamp[16], char bothclamp[16],
 unsigned int width, unsigned int height, int pitch) {
 unsigned int i, j;
 for (i = 0; i < height; ++i) {
 uint8_t *pos = start + i * pitch;
 char *ref = (char *)(noise + (rand() & 0xff)); // NOLINT
 for (j = 0; j < width; ++j) {
 int v = pos[j];
@@ -45,13 +43,13 @@ static double gaussian(double sigma, double mu, double x) {
 (exp(-(x - mu) * (x - mu) / (2 * sigma * sigma)));
 }
-int vpx_setup_noise(double sigma, int size, char *noise) {
+int aom_setup_noise(double sigma, int size, char *noise) {
 char char_dist[256];
 int next = 0, i, j;
 // set up a 256 entry lookup that matches gaussian distribution
 for (i = -32; i < 32; ++i) {
-const int a_i = (int) (0.5 + 256 * gaussian(sigma, 0, i));
+const int a_i = (int)(0.5 + 256 * gaussian(sigma, 0, i));
 if (a_i) {
 for (j = 0; j < a_i; ++j) {
 char_dist[next + j] = (char)i;

64
aom_dsp/ans.c Normal file

@@ -0,0 +1,64 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <assert.h>
#include "./aom_config.h"
#include "aom/aom_integer.h"
#include "aom_dsp/ans.h"
#include "aom_dsp/prob.h"
static int find_largest(const aom_cdf_prob *const pdf_tab, int num_syms) {
int largest_idx = -1;
int largest_p = -1;
int i;
for (i = 0; i < num_syms; ++i) {
int p = pdf_tab[i];
if (p > largest_p) {
largest_p = p;
largest_idx = i;
}
}
return largest_idx;
}
void aom_rans_merge_prob8_pdf(aom_cdf_prob *const out_pdf,
const AnsP8 node_prob,
const aom_cdf_prob *const src_pdf, int in_syms) {
int i;
int adjustment = RANS_PRECISION;
const int round_fact = ANS_P8_PRECISION >> 1;
const AnsP8 p1 = ANS_P8_PRECISION - node_prob;
const int out_syms = in_syms + 1;
assert(src_pdf != out_pdf);
out_pdf[0] = node_prob << (RANS_PROB_BITS - ANS_P8_SHIFT);
adjustment -= out_pdf[0];
for (i = 0; i < in_syms; ++i) {
int p = (p1 * src_pdf[i] + round_fact) >> ANS_P8_SHIFT;
p = AOMMIN(p, (int)RANS_PRECISION - in_syms);
p = AOMMAX(p, 1);
out_pdf[i + 1] = p;
adjustment -= p;
}
// Adjust probabilities so they sum to the total probability
if (adjustment > 0) {
i = find_largest(out_pdf, out_syms);
out_pdf[i] += adjustment;
} else {
while (adjustment < 0) {
i = find_largest(out_pdf, out_syms);
--out_pdf[i];
assert(out_pdf[i] > 0);
adjustment++;
}
}
}
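
The merge-and-renormalize step in aom_rans_merge_prob8_pdf is easier to check with concrete numbers. The following is a standalone sketch of the same idea, not the library code: the SK_* constants mirror the values defined in ans.h, uint16_t stands in for aom_cdf_prob (an assumption, since that type is defined elsewhere), and sk_pdf_sum is a hypothetical helper added only to verify the result.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror ANS_P8_* and RANS_* from ans.h; the names are hypothetical. */
#define SK_P8_PRECISION 256
#define SK_P8_SHIFT 8
#define SK_RANS_PROB_BITS 15
#define SK_RANS_PRECISION (1 << SK_RANS_PROB_BITS)

/* Index of the largest entry (first one wins on ties). */
static int sk_find_largest(const uint16_t *pdf, int n) {
  int i, best = 0;
  for (i = 1; i < n; ++i)
    if (pdf[i] > pdf[best]) best = i;
  return best;
}

/* Prepends a binary node with 8-bit probability node_prob (assumed 1..255)
 * to src, scaling the remaining symbols by (256 - node_prob) / 256. */
static void sk_merge_prob8_pdf(uint16_t *out, int node_prob,
                               const uint16_t *src, int in_syms) {
  const int p1 = SK_P8_PRECISION - node_prob;
  const int round_fact = SK_P8_PRECISION >> 1;
  int adjustment = SK_RANS_PRECISION;
  int i;
  out[0] = (uint16_t)(node_prob << (SK_RANS_PROB_BITS - SK_P8_SHIFT));
  adjustment -= out[0];
  for (i = 0; i < in_syms; ++i) {
    int p = (p1 * src[i] + round_fact) >> SK_P8_SHIFT;
    if (p > SK_RANS_PRECISION - in_syms) p = SK_RANS_PRECISION - in_syms;
    if (p < 1) p = 1; /* every symbol must stay codable */
    out[i + 1] = (uint16_t)p;
    adjustment -= p;
  }
  /* Push the rounding error onto the largest entry so the total is exact. */
  if (adjustment > 0) {
    out[sk_find_largest(out, in_syms + 1)] += (uint16_t)adjustment;
  } else {
    while (adjustment < 0) {
      --out[sk_find_largest(out, in_syms + 1)];
      ++adjustment;
    }
  }
}

/* Hypothetical checker: sum of all entries. */
static int sk_pdf_sum(const uint16_t *pdf, int n) {
  int i, sum = 0;
  for (i = 0; i < n; ++i) sum += pdf[i];
  return sum;
}
```

With node_prob = 128 and a source PDF of {16384, 8192, 8192}, the output is {16384, 8192, 4096, 4096}, which already sums to 32768, so no adjustment is applied. Inputs that round unevenly exercise the correction loop instead.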

44
aom_dsp/ans.h Normal file

@@ -0,0 +1,44 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_ANS_H_
#define AOM_DSP_ANS_H_
// Constants, types and utilities for Asymmetric Numeral Systems
// http://arxiv.org/abs/1311.2540v2
#include <assert.h>
#include "./aom_config.h"
#include "aom/aom_integer.h"
#include "aom_dsp/prob.h"
#ifdef __cplusplus
extern "C" {
#endif // __cplusplus
typedef uint8_t AnsP8;
#define ANS_P8_PRECISION 256u
#define ANS_P8_SHIFT 8
#define RANS_PROB_BITS 15
#define RANS_PRECISION (1u << RANS_PROB_BITS)
// L_BASE % PRECISION must be 0. Increasing L_BASE beyond 2**15 will cause uabs
// to overflow.
#define L_BASE (RANS_PRECISION)
#define IO_BASE 256
// Range I = { L_BASE, L_BASE + 1, ..., L_BASE * IO_BASE - 1 }
void aom_rans_merge_prob8_pdf(aom_cdf_prob *const out_pdf,
const AnsP8 node_prob,
const aom_cdf_prob *const src_pdf, int in_syms);
#ifdef __cplusplus
} // extern "C"
#endif // __cplusplus
#endif // AOM_DSP_ANS_H_

146
aom_dsp/ansreader.h Normal file

@@ -0,0 +1,146 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_ANSREADER_H_
#define AOM_DSP_ANSREADER_H_
// A uABS and rANS decoder implementation of Asymmetric Numeral Systems
// http://arxiv.org/abs/1311.2540v2
#include <assert.h>
#include "./aom_config.h"
#include "aom/aom_integer.h"
#include "aom_dsp/prob.h"
#include "aom_dsp/ans.h"
#include "aom_ports/mem_ops.h"
#if CONFIG_ACCOUNTING
#include "av1/common/accounting.h"
#endif
#ifdef __cplusplus
extern "C" {
#endif // __cplusplus
struct AnsDecoder {
const uint8_t *buf;
int buf_offset;
uint32_t state;
#if CONFIG_ACCOUNTING
Accounting *accounting;
#endif
};
static INLINE int uabs_read(struct AnsDecoder *ans, AnsP8 p0) {
AnsP8 p = ANS_P8_PRECISION - p0;
int s;
unsigned xp, sp;
unsigned state = ans->state;
while (state < L_BASE && ans->buf_offset > 0) {
state = state * IO_BASE + ans->buf[--ans->buf_offset];
}
sp = state * p;
xp = sp / ANS_P8_PRECISION;
s = (sp & 0xFF) >= p0;
if (s)
ans->state = xp;
else
ans->state = state - xp;
return s;
}
static INLINE int uabs_read_bit(struct AnsDecoder *ans) {
int s;
unsigned state = ans->state;
while (state < L_BASE && ans->buf_offset > 0) {
state = state * IO_BASE + ans->buf[--ans->buf_offset];
}
s = (int)(state & 1);
ans->state = state >> 1;
return s;
}
struct rans_dec_sym {
uint8_t val;
aom_cdf_prob prob;
aom_cdf_prob cum_prob; // not-inclusive
};
static INLINE void fetch_sym(struct rans_dec_sym *out, const aom_cdf_prob *cdf,
aom_cdf_prob rem) {
int i;
aom_cdf_prob cum_prob = 0, top_prob;
// TODO(skal): if critical, could be a binary search.
// Or, better, an O(1) alias-table.
for (i = 0; rem >= (top_prob = cdf[i]); ++i) {
cum_prob = top_prob;
}
out->val = i;
out->prob = top_prob - cum_prob;
out->cum_prob = cum_prob;
}
static INLINE int rans_read(struct AnsDecoder *ans, const aom_cdf_prob *tab) {
unsigned rem;
unsigned quo;
struct rans_dec_sym sym;
while (ans->state < L_BASE && ans->buf_offset > 0) {
ans->state = ans->state * IO_BASE + ans->buf[--ans->buf_offset];
}
quo = ans->state / RANS_PRECISION;
rem = ans->state % RANS_PRECISION;
fetch_sym(&sym, tab, rem);
ans->state = quo * sym.prob + rem - sym.cum_prob;
return sym.val;
}
static INLINE int ans_read_init(struct AnsDecoder *const ans,
const uint8_t *const buf, int offset) {
unsigned x;
if (offset < 1) return 1;
ans->buf = buf;
x = buf[offset - 1] >> 6;
if (x == 0) {
ans->buf_offset = offset - 1;
ans->state = buf[offset - 1] & 0x3F;
} else if (x == 1) {
if (offset < 2) return 1;
ans->buf_offset = offset - 2;
ans->state = mem_get_le16(buf + offset - 2) & 0x3FFF;
} else if (x == 2) {
if (offset < 3) return 1;
ans->buf_offset = offset - 3;
ans->state = mem_get_le24(buf + offset - 3) & 0x3FFFFF;
} else if ((buf[offset - 1] & 0xE0) == 0xE0) {
if (offset < 4) return 1;
ans->buf_offset = offset - 4;
ans->state = mem_get_le32(buf + offset - 4) & 0x1FFFFFFF;
} else {
// 110xxxxx implies this byte is a superframe marker
return 1;
}
#if CONFIG_ACCOUNTING
ans->accounting = NULL;
#endif
ans->state += L_BASE;
if (ans->state >= L_BASE * IO_BASE) return 1;
return 0;
}
static INLINE int ans_read_end(struct AnsDecoder *const ans) {
return ans->state == L_BASE;
}
static INLINE int ans_reader_has_error(const struct AnsDecoder *const ans) {
return ans->state < L_BASE && ans->buf_offset == 0;
}
#ifdef __cplusplus
} // extern "C"
#endif // __cplusplus
#endif // AOM_DSP_ANSREADER_H_
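
ans_read_init infers how many bytes hold the final coder state from the top bits of the last byte: 00 means one byte, 01 two, 10 three, 111 four, and 110 is rejected as a superframe marker. The sketch below isolates that tag layout in two hypothetical helpers (sk_put_state, sk_get_state). It assumes the caller handles the L_BASE offset separately, and it uses plain byte loops instead of the mem_get_le*/mem_put_le* helpers so that it stands alone.

```c
#include <assert.h>
#include <stdint.h>

/* Writes state (already reduced by L_BASE) little-endian at buf, with the
 * length tag in the top bits of the last byte. Returns bytes written, or 0
 * when the state does not fit in 29 bits. */
static int sk_put_state(uint8_t *buf, uint32_t state) {
  uint32_t word;
  int n, i;
  if (state < (1u << 6)) {
    n = 1;
    word = state; /* tag 00 */
  } else if (state < (1u << 14)) {
    n = 2;
    word = (0x01u << 14) + state; /* tag 01 */
  } else if (state < (1u << 22)) {
    n = 3;
    word = (0x02u << 22) + state; /* tag 10 */
  } else if (state < (1u << 29)) {
    n = 4;
    word = (0x07u << 29) + state; /* tag 111 */
  } else {
    return 0;
  }
  for (i = 0; i < n; ++i) buf[i] = (uint8_t)(word >> (8 * i));
  return n;
}

/* Reads the state whose last byte is buf[offset - 1]. Returns the number of
 * bytes consumed, or 0 on a truncated buffer or a 110xxxxx marker byte. */
static int sk_get_state(const uint8_t *buf, int offset, uint32_t *state) {
  static const uint32_t mask[5] = { 0, 0x3Fu, 0x3FFFu, 0x3FFFFFu,
                                    0x1FFFFFFFu };
  uint32_t word = 0;
  unsigned int tag;
  int n, i;
  if (offset < 1) return 0;
  tag = buf[offset - 1] >> 6;
  if (tag == 0) {
    n = 1;
  } else if (tag == 1) {
    n = 2;
  } else if (tag == 2) {
    n = 3;
  } else if ((buf[offset - 1] & 0xE0) == 0xE0) {
    n = 4;
  } else {
    return 0; /* 110xxxxx: not a valid state tag */
  }
  if (offset < n) return 0;
  for (i = 0; i < n; ++i) word |= (uint32_t)buf[offset - n + i] << (8 * i);
  *state = word & mask[n];
  return n;
}
```

Because the tag sits in the last byte, a decoder that only knows where the buffer ends can still recover the state length, which suits ANS since decoding consumes the buffer back to front.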

120
aom_dsp/answriter.h Normal file

@@ -0,0 +1,120 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_ANSWRITER_H_
#define AOM_DSP_ANSWRITER_H_
// A uABS and rANS encoder implementation of Asymmetric Numeral Systems
// http://arxiv.org/abs/1311.2540v2
#include <assert.h>
#include "./aom_config.h"
#include "aom/aom_integer.h"
#include "aom_dsp/ans.h"
#include "aom_dsp/prob.h"
#include "aom_ports/mem_ops.h"
#include "av1/common/odintrin.h"
#if RANS_PRECISION <= OD_DIVU_DMAX
#define ANS_DIVREM(quotient, remainder, dividend, divisor) \
do { \
quotient = OD_DIVU_SMALL((dividend), (divisor)); \
remainder = (dividend) - (quotient) * (divisor); \
} while (0)
#else
#define ANS_DIVREM(quotient, remainder, dividend, divisor) \
do { \
quotient = (dividend) / (divisor); \
remainder = (dividend) % (divisor); \
} while (0)
#endif
#define ANS_DIV8(dividend, divisor) OD_DIVU_SMALL((dividend), (divisor))
#ifdef __cplusplus
extern "C" {
#endif // __cplusplus
struct AnsCoder {
uint8_t *buf;
int buf_offset;
uint32_t state;
};
static INLINE void ans_write_init(struct AnsCoder *const ans,
uint8_t *const buf) {
ans->buf = buf;
ans->buf_offset = 0;
ans->state = L_BASE;
}
static INLINE int ans_write_end(struct AnsCoder *const ans) {
uint32_t state;
assert(ans->state >= L_BASE);
assert(ans->state < L_BASE * IO_BASE);
state = ans->state - L_BASE;
if (state < (1 << 6)) {
ans->buf[ans->buf_offset] = (0x00 << 6) + state;
return ans->buf_offset + 1;
} else if (state < (1 << 14)) {
mem_put_le16(ans->buf + ans->buf_offset, (0x01 << 14) + state);
return ans->buf_offset + 2;
} else if (state < (1 << 22)) {
mem_put_le24(ans->buf + ans->buf_offset, (0x02 << 22) + state);
return ans->buf_offset + 3;
} else if (state < (1 << 29)) {
mem_put_le32(ans->buf + ans->buf_offset, (0x07u << 29) + state);
return ans->buf_offset + 4;
} else {
assert(0 && "State is too large to be serialized");
return ans->buf_offset;
}
}
// uABS with normalization
static INLINE void uabs_write(struct AnsCoder *ans, int val, AnsP8 p0) {
AnsP8 p = ANS_P8_PRECISION - p0;
const unsigned l_s = val ? p : p0;
while (ans->state >= L_BASE / ANS_P8_PRECISION * IO_BASE * l_s) {
ans->buf[ans->buf_offset++] = ans->state % IO_BASE;
ans->state /= IO_BASE;
}
if (!val)
ans->state = ANS_DIV8(ans->state * ANS_P8_PRECISION, p0);
else
ans->state = ANS_DIV8((ans->state + 1) * ANS_P8_PRECISION + p - 1, p) - 1;
}
struct rans_sym {
aom_cdf_prob prob;
aom_cdf_prob cum_prob; // not-inclusive
};
// rANS with normalization
// sym->prob takes the place of l_s from the paper
// RANS_PRECISION is m
static INLINE void rans_write(struct AnsCoder *ans,
const struct rans_sym *const sym) {
const aom_cdf_prob p = sym->prob;
unsigned quot, rem;
while (ans->state >= L_BASE / RANS_PRECISION * IO_BASE * p) {
ans->buf[ans->buf_offset++] = ans->state % IO_BASE;
ans->state /= IO_BASE;
}
ANS_DIVREM(quot, rem, ans->state, p);
ans->state = quot * RANS_PRECISION + rem + sym->cum_prob;
}
#undef ANS_DIV8
#undef ANS_DIVREM
#ifdef __cplusplus
} // extern "C"
#endif // __cplusplus
#endif // AOM_DSP_ANSWRITER_H_
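
rans_write and rans_read are inverses only if symbols are encoded in the reverse of the order in which they will be decoded, because the decoder pops bytes from the end of the buffer. The sketch below demonstrates that round trip in isolation. It is a simplified model, not the library code: the sk_* names are hypothetical, uint16_t stands in for aom_cdf_prob, the CDF is assumed to hold inclusive cumulative counts ending at 32768, the division macros are replaced with plain / and %, and the state is handed straight from encoder to decoder instead of being serialized.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror RANS_PROB_BITS, L_BASE and IO_BASE from ans.h. */
#define SK_PROB_BITS 15
#define SK_PRECISION (1u << SK_PROB_BITS)
#define SK_L_BASE SK_PRECISION
#define SK_IO_BASE 256u

typedef struct {
  uint8_t *buf;
  int buf_offset;
  uint32_t state;
} sk_ans;

/* Encoder step: shrink the state until the symbol fits, then fold it in. */
static void sk_rans_write(sk_ans *c, uint32_t prob, uint32_t cum_prob) {
  while (c->state >= SK_L_BASE / SK_PRECISION * SK_IO_BASE * prob) {
    c->buf[c->buf_offset++] = (uint8_t)(c->state % SK_IO_BASE);
    c->state /= SK_IO_BASE;
  }
  c->state = (c->state / prob) * SK_PRECISION + (c->state % prob) + cum_prob;
}

/* Pull bytes back in (last written first) until the state is in range. */
static void sk_refill(sk_ans *c) {
  while (c->state < SK_L_BASE && c->buf_offset > 0)
    c->state = c->state * SK_IO_BASE + c->buf[--c->buf_offset];
}

/* Decoder step: the low bits select the symbol, the rest is the new state. */
static int sk_rans_read(sk_ans *c, const uint16_t *cdf) {
  uint32_t quo, rem, cum = 0, top;
  int i;
  sk_refill(c);
  quo = c->state / SK_PRECISION;
  rem = c->state % SK_PRECISION;
  for (i = 0; rem >= (top = cdf[i]); ++i) cum = top;
  c->state = quo * (top - cum) + rem - cum;
  return i;
}

/* Encodes syms[n-1]..syms[0], decodes syms[0]..syms[n-1], and returns 1 when
 * every symbol matches and the coder is back at its initial state. */
static int sk_roundtrip(const uint16_t *cdf, const int *syms, int n,
                        uint8_t *buf) {
  sk_ans c;
  int i;
  c.buf = buf;
  c.buf_offset = 0;
  c.state = SK_L_BASE;
  for (i = n - 1; i >= 0; --i) {
    const uint32_t cum = syms[i] ? cdf[syms[i] - 1] : 0;
    sk_rans_write(&c, cdf[syms[i]] - cum, cum);
  }
  for (i = 0; i < n; ++i)
    if (sk_rans_read(&c, cdf) != syms[i]) return 0;
  sk_refill(&c);
  return c.state == SK_L_BASE && c.buf_offset == 0;
}
```

The caller supplies a buffer large enough for the encoded bytes; for a handful of symbols a few dozen bytes is ample. The final check relies on the decoder undoing every encoder step exactly, so the state returns to L_BASE with the buffer fully consumed.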


@@ -1,28 +1,29 @@
/* /*
* Copyright (c) 2013 The WebM project authors. All Rights Reserved. * Copyright (c) 2016, Alliance for Open Media. All rights reserved
* *
* Use of this source code is governed by a BSD-style license * This source code is subject to the terms of the BSD 2 Clause License and
* that can be found in the LICENSE file in the root of the source * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* tree. An additional intellectual property rights grant can be found * was not distributed with this source code in the LICENSE file, you can
* in the file PATENTS. All contributing project authors may * obtain it at www.aomedia.org/license/software. If the Alliance for Open
* be found in the AUTHORS file in the root of the source tree. * Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/ */
#include <assert.h> #include <assert.h>
#include <string.h> #include <string.h>
#include "./vpx_config.h" #include "./aom_config.h"
#include "./vpx_dsp_rtcd.h" #include "./aom_dsp_rtcd.h"
#include "vpx/vpx_integer.h" #include "aom/aom_integer.h"
#include "vpx_dsp/vpx_convolve.h" #include "aom_dsp/aom_convolve.h"
#include "vpx_dsp/vpx_dsp_common.h" #include "aom_dsp/aom_dsp_common.h"
#include "vpx_dsp/vpx_filter.h" #include "aom_dsp/aom_filter.h"
#include "vpx_ports/mem.h" #include "aom_ports/mem.h"
static void convolve_horiz(const uint8_t *src, ptrdiff_t src_stride, static void convolve_horiz(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride, uint8_t *dst, ptrdiff_t dst_stride,
const InterpKernel *x_filters, const InterpKernel *x_filters, int x0_q4,
int x0_q4, int x_step_q4, int w, int h) { int x_step_q4, int w, int h) {
int x, y; int x, y;
src -= SUBPEL_TAPS / 2 - 1; src -= SUBPEL_TAPS / 2 - 1;
for (y = 0; y < h; ++y) { for (y = 0; y < h; ++y) {
@@ -31,8 +32,7 @@ static void convolve_horiz(const uint8_t *src, ptrdiff_t src_stride,
const uint8_t *const src_x = &src[x_q4 >> SUBPEL_BITS]; const uint8_t *const src_x = &src[x_q4 >> SUBPEL_BITS];
const int16_t *const x_filter = x_filters[x_q4 & SUBPEL_MASK]; const int16_t *const x_filter = x_filters[x_q4 & SUBPEL_MASK];
int k, sum = 0; int k, sum = 0;
for (k = 0; k < SUBPEL_TAPS; ++k) for (k = 0; k < SUBPEL_TAPS; ++k) sum += src_x[k] * x_filter[k];
sum += src_x[k] * x_filter[k];
dst[x] = clip_pixel(ROUND_POWER_OF_TWO(sum, FILTER_BITS)); dst[x] = clip_pixel(ROUND_POWER_OF_TWO(sum, FILTER_BITS));
x_q4 += x_step_q4; x_q4 += x_step_q4;
} }
@@ -43,8 +43,8 @@ static void convolve_horiz(const uint8_t *src, ptrdiff_t src_stride,
static void convolve_avg_horiz(const uint8_t *src, ptrdiff_t src_stride, static void convolve_avg_horiz(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride, uint8_t *dst, ptrdiff_t dst_stride,
const InterpKernel *x_filters, const InterpKernel *x_filters, int x0_q4,
int x0_q4, int x_step_q4, int w, int h) { int x_step_q4, int w, int h) {
int x, y; int x, y;
src -= SUBPEL_TAPS / 2 - 1; src -= SUBPEL_TAPS / 2 - 1;
for (y = 0; y < h; ++y) { for (y = 0; y < h; ++y) {
@@ -53,10 +53,9 @@ static void convolve_avg_horiz(const uint8_t *src, ptrdiff_t src_stride,
const uint8_t *const src_x = &src[x_q4 >> SUBPEL_BITS]; const uint8_t *const src_x = &src[x_q4 >> SUBPEL_BITS];
const int16_t *const x_filter = x_filters[x_q4 & SUBPEL_MASK]; const int16_t *const x_filter = x_filters[x_q4 & SUBPEL_MASK];
int k, sum = 0; int k, sum = 0;
for (k = 0; k < SUBPEL_TAPS; ++k) for (k = 0; k < SUBPEL_TAPS; ++k) sum += src_x[k] * x_filter[k];
sum += src_x[k] * x_filter[k]; dst[x] = ROUND_POWER_OF_TWO(
dst[x] = ROUND_POWER_OF_TWO(dst[x] + dst[x] + clip_pixel(ROUND_POWER_OF_TWO(sum, FILTER_BITS)), 1);
clip_pixel(ROUND_POWER_OF_TWO(sum, FILTER_BITS)), 1);
x_q4 += x_step_q4; x_q4 += x_step_q4;
} }
src += src_stride; src += src_stride;
@@ -66,8 +65,8 @@ static void convolve_avg_horiz(const uint8_t *src, ptrdiff_t src_stride,
static void convolve_vert(const uint8_t *src, ptrdiff_t src_stride, static void convolve_vert(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride, uint8_t *dst, ptrdiff_t dst_stride,
const InterpKernel *y_filters, const InterpKernel *y_filters, int y0_q4,
int y0_q4, int y_step_q4, int w, int h) { int y_step_q4, int w, int h) {
int x, y; int x, y;
src -= src_stride * (SUBPEL_TAPS / 2 - 1); src -= src_stride * (SUBPEL_TAPS / 2 - 1);
@@ -89,8 +88,8 @@ static void convolve_vert(const uint8_t *src, ptrdiff_t src_stride,
static void convolve_avg_vert(const uint8_t *src, ptrdiff_t src_stride, static void convolve_avg_vert(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride, uint8_t *dst, ptrdiff_t dst_stride,
const InterpKernel *y_filters, const InterpKernel *y_filters, int y0_q4,
int y0_q4, int y_step_q4, int w, int h) { int y_step_q4, int w, int h) {
int x, y; int x, y;
src -= src_stride * (SUBPEL_TAPS / 2 - 1); src -= src_stride * (SUBPEL_TAPS / 2 - 1);
@@ -102,8 +101,10 @@ static void convolve_avg_vert(const uint8_t *src, ptrdiff_t src_stride,
int k, sum = 0; int k, sum = 0;
for (k = 0; k < SUBPEL_TAPS; ++k) for (k = 0; k < SUBPEL_TAPS; ++k)
sum += src_y[k * src_stride] * y_filter[k]; sum += src_y[k * src_stride] * y_filter[k];
dst[y * dst_stride] = ROUND_POWER_OF_TWO(dst[y * dst_stride] + dst[y * dst_stride] = ROUND_POWER_OF_TWO(
clip_pixel(ROUND_POWER_OF_TWO(sum, FILTER_BITS)), 1); dst[y * dst_stride] +
clip_pixel(ROUND_POWER_OF_TWO(sum, FILTER_BITS)),
1);
y_q4 += y_step_q4; y_q4 += y_step_q4;
} }
++src; ++src;
@@ -111,13 +112,11 @@ static void convolve_avg_vert(const uint8_t *src, ptrdiff_t src_stride,
} }
} }
static void convolve(const uint8_t *src, ptrdiff_t src_stride, static void convolve(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
uint8_t *dst, ptrdiff_t dst_stride, ptrdiff_t dst_stride, const InterpKernel *const x_filters,
const InterpKernel *const x_filters,
int x0_q4, int x_step_q4, int x0_q4, int x_step_q4,
const InterpKernel *const y_filters, const InterpKernel *const y_filters, int y0_q4,
int y0_q4, int y_step_q4, int y_step_q4, int w, int h) {
int w, int h) {
// Note: Fixed size intermediate buffer, temp, places limits on parameters. // Note: Fixed size intermediate buffer, temp, places limits on parameters.
// 2d filtering proceeds in 2 steps: // 2d filtering proceeds in 2 steps:
// (1) Interpolate horizontally into an intermediate buffer, temp. // (1) Interpolate horizontally into an intermediate buffer, temp.
@@ -132,7 +131,7 @@ static void convolve(const uint8_t *src, ptrdiff_t src_stride,
// --((64 - 1) * 32 + 15) >> 4 + 8 = 135. // --((64 - 1) * 32 + 15) >> 4 + 8 = 135.
uint8_t temp[MAX_EXT_SIZE * MAX_SB_SIZE]; uint8_t temp[MAX_EXT_SIZE * MAX_SB_SIZE];
int intermediate_height = int intermediate_height =
(((h - 1) * y_step_q4 + y0_q4) >> SUBPEL_BITS) + SUBPEL_TAPS; (((h - 1) * y_step_q4 + y0_q4) >> SUBPEL_BITS) + SUBPEL_TAPS;
assert(w <= MAX_SB_SIZE); assert(w <= MAX_SB_SIZE);
assert(h <= MAX_SB_SIZE); assert(h <= MAX_SB_SIZE);
@@ -140,12 +139,11 @@ static void convolve(const uint8_t *src, ptrdiff_t src_stride,
assert(y_step_q4 <= 32); assert(y_step_q4 <= 32);
assert(x_step_q4 <= 32); assert(x_step_q4 <= 32);
convolve_horiz(src - src_stride * (SUBPEL_TAPS / 2 - 1), src_stride, convolve_horiz(src - src_stride * (SUBPEL_TAPS / 2 - 1), src_stride, temp,
temp, MAX_SB_SIZE, MAX_SB_SIZE, x_filters, x0_q4, x_step_q4, w,
x_filters, x0_q4, x_step_q4, w, intermediate_height); intermediate_height);
convolve_vert(temp + MAX_SB_SIZE * (SUBPEL_TAPS / 2 - 1), MAX_SB_SIZE, convolve_vert(temp + MAX_SB_SIZE * (SUBPEL_TAPS / 2 - 1), MAX_SB_SIZE, dst,
dst, dst_stride, dst_stride, y_filters, y0_q4, y_step_q4, w, h);
y_filters, y0_q4, y_step_q4, w, h);
} }
static const InterpKernel *get_filter_base(const int16_t *filter) { static const InterpKernel *get_filter_base(const int16_t *filter) {
@@ -158,70 +156,69 @@ static int get_filter_offset(const int16_t *f, const InterpKernel *base) {
return (int)((const InterpKernel *)(intptr_t)f - base); return (int)((const InterpKernel *)(intptr_t)f - base);
} }
void vpx_convolve8_horiz_c(const uint8_t *src, ptrdiff_t src_stride, void aom_convolve8_horiz_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride, uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4, const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4, const int16_t *filter_y, int y_step_q4, int w,
int w, int h) { int h) {
const InterpKernel *const filters_x = get_filter_base(filter_x); const InterpKernel *const filters_x = get_filter_base(filter_x);
const int x0_q4 = get_filter_offset(filter_x, filters_x); const int x0_q4 = get_filter_offset(filter_x, filters_x);
(void)filter_y; (void)filter_y;
(void)y_step_q4; (void)y_step_q4;
convolve_horiz(src, src_stride, dst, dst_stride, filters_x, convolve_horiz(src, src_stride, dst, dst_stride, filters_x, x0_q4, x_step_q4,
x0_q4, x_step_q4, w, h); w, h);
} }
void vpx_convolve8_avg_horiz_c(const uint8_t *src, ptrdiff_t src_stride, void aom_convolve8_avg_horiz_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride, uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4, const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4, const int16_t *filter_y, int y_step_q4, int w,
int w, int h) { int h) {
const InterpKernel *const filters_x = get_filter_base(filter_x); const InterpKernel *const filters_x = get_filter_base(filter_x);
const int x0_q4 = get_filter_offset(filter_x, filters_x); const int x0_q4 = get_filter_offset(filter_x, filters_x);
(void)filter_y; (void)filter_y;
(void)y_step_q4; (void)y_step_q4;
convolve_avg_horiz(src, src_stride, dst, dst_stride, filters_x, convolve_avg_horiz(src, src_stride, dst, dst_stride, filters_x, x0_q4,
x0_q4, x_step_q4, w, h); x_step_q4, w, h);
} }
void vpx_convolve8_vert_c(const uint8_t *src, ptrdiff_t src_stride, void aom_convolve8_vert_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride, uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4, const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4, const int16_t *filter_y, int y_step_q4, int w,
int w, int h) { int h) {
const InterpKernel *const filters_y = get_filter_base(filter_y); const InterpKernel *const filters_y = get_filter_base(filter_y);
const int y0_q4 = get_filter_offset(filter_y, filters_y); const int y0_q4 = get_filter_offset(filter_y, filters_y);
(void)filter_x; (void)filter_x;
(void)x_step_q4; (void)x_step_q4;
convolve_vert(src, src_stride, dst, dst_stride, filters_y, convolve_vert(src, src_stride, dst, dst_stride, filters_y, y0_q4, y_step_q4,
y0_q4, y_step_q4, w, h); w, h);
} }
void vpx_convolve8_avg_vert_c(const uint8_t *src, ptrdiff_t src_stride, void aom_convolve8_avg_vert_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride, uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4, const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4, const int16_t *filter_y, int y_step_q4, int w,
int w, int h) { int h) {
const InterpKernel *const filters_y = get_filter_base(filter_y); const InterpKernel *const filters_y = get_filter_base(filter_y);
const int y0_q4 = get_filter_offset(filter_y, filters_y); const int y0_q4 = get_filter_offset(filter_y, filters_y);
(void)filter_x; (void)filter_x;
(void)x_step_q4; (void)x_step_q4;
convolve_avg_vert(src, src_stride, dst, dst_stride, filters_y, convolve_avg_vert(src, src_stride, dst, dst_stride, filters_y, y0_q4,
y0_q4, y_step_q4, w, h); y_step_q4, w, h);
} }
void vpx_convolve8_c(const uint8_t *src, ptrdiff_t src_stride, void aom_convolve8_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
uint8_t *dst, ptrdiff_t dst_stride, ptrdiff_t dst_stride, const int16_t *filter_x,
const int16_t *filter_x, int x_step_q4, int x_step_q4, const int16_t *filter_y, int y_step_q4,
const int16_t *filter_y, int y_step_q4,
int w, int h) { int w, int h) {
const InterpKernel *const filters_x = get_filter_base(filter_x); const InterpKernel *const filters_x = get_filter_base(filter_x);
const int x0_q4 = get_filter_offset(filter_x, filters_x); const int x0_q4 = get_filter_offset(filter_x, filters_x);
@@ -229,36 +226,35 @@ void vpx_convolve8_c(const uint8_t *src, ptrdiff_t src_stride,
const InterpKernel *const filters_y = get_filter_base(filter_y); const InterpKernel *const filters_y = get_filter_base(filter_y);
const int y0_q4 = get_filter_offset(filter_y, filters_y); const int y0_q4 = get_filter_offset(filter_y, filters_y);
convolve(src, src_stride, dst, dst_stride, convolve(src, src_stride, dst, dst_stride, filters_x, x0_q4, x_step_q4,
filters_x, x0_q4, x_step_q4,
filters_y, y0_q4, y_step_q4, w, h); filters_y, y0_q4, y_step_q4, w, h);
} }
void vpx_convolve8_avg_c(const uint8_t *src, ptrdiff_t src_stride, void aom_convolve8_avg_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
uint8_t *dst, ptrdiff_t dst_stride, ptrdiff_t dst_stride, const int16_t *filter_x,
const int16_t *filter_x, int x_step_q4, int x_step_q4, const int16_t *filter_y, int y_step_q4,
const int16_t *filter_y, int y_step_q4,
int w, int h) { int w, int h) {
/* Fixed size intermediate buffer places limits on parameters. */ /* Fixed size intermediate buffer places limits on parameters. */
DECLARE_ALIGNED(16, uint8_t, temp[MAX_SB_SIZE * MAX_SB_SIZE]); DECLARE_ALIGNED(16, uint8_t, temp[MAX_SB_SIZE * MAX_SB_SIZE]);
assert(w <= MAX_SB_SIZE); assert(w <= MAX_SB_SIZE);
assert(h <= MAX_SB_SIZE); assert(h <= MAX_SB_SIZE);
vpx_convolve8_c(src, src_stride, temp, MAX_SB_SIZE, aom_convolve8_c(src, src_stride, temp, MAX_SB_SIZE, filter_x, x_step_q4,
filter_x, x_step_q4, filter_y, y_step_q4, w, h); filter_y, y_step_q4, w, h);
vpx_convolve_avg_c(temp, MAX_SB_SIZE, dst, dst_stride, aom_convolve_avg_c(temp, MAX_SB_SIZE, dst, dst_stride, NULL, 0, NULL, 0, w,
NULL, 0, NULL, 0, w, h); h);
} }
void vpx_convolve_copy_c(const uint8_t *src, ptrdiff_t src_stride, void aom_convolve_copy_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
uint8_t *dst, ptrdiff_t dst_stride, ptrdiff_t dst_stride, const int16_t *filter_x,
const int16_t *filter_x, int filter_x_stride, int filter_x_stride, const int16_t *filter_y,
const int16_t *filter_y, int filter_y_stride, int filter_y_stride, int w, int h) {
int w, int h) {
int r; int r;
(void)filter_x; (void)filter_x_stride; (void)filter_x;
(void)filter_y; (void)filter_y_stride; (void)filter_x_stride;
(void)filter_y;
(void)filter_y_stride;
for (r = h; r > 0; --r) { for (r = h; r > 0; --r) {
memcpy(dst, src, w); memcpy(dst, src, w);
@@ -267,85 +263,80 @@ void vpx_convolve_copy_c(const uint8_t *src, ptrdiff_t src_stride,
} }
} }
void vpx_convolve_avg_c(const uint8_t *src, ptrdiff_t src_stride, void aom_convolve_avg_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
uint8_t *dst, ptrdiff_t dst_stride, ptrdiff_t dst_stride, const int16_t *filter_x,
const int16_t *filter_x, int filter_x_stride, int filter_x_stride, const int16_t *filter_y,
const int16_t *filter_y, int filter_y_stride, int filter_y_stride, int w, int h) {
int w, int h) {
int x, y; int x, y;
(void)filter_x; (void)filter_x_stride; (void)filter_x;
(void)filter_y; (void)filter_y_stride; (void)filter_x_stride;
(void)filter_y;
(void)filter_y_stride;
for (y = 0; y < h; ++y) { for (y = 0; y < h; ++y) {
for (x = 0; x < w; ++x) for (x = 0; x < w; ++x) dst[x] = ROUND_POWER_OF_TWO(dst[x] + src[x], 1);
dst[x] = ROUND_POWER_OF_TWO(dst[x] + src[x], 1);
src += src_stride; src += src_stride;
dst += dst_stride; dst += dst_stride;
} }
} }
void vpx_scaled_horiz_c(const uint8_t *src, ptrdiff_t src_stride, void aom_scaled_horiz_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
uint8_t *dst, ptrdiff_t dst_stride, ptrdiff_t dst_stride, const int16_t *filter_x,
const int16_t *filter_x, int x_step_q4, int x_step_q4, const int16_t *filter_y, int y_step_q4,
const int16_t *filter_y, int y_step_q4,
int w, int h) { int w, int h) {
vpx_convolve8_horiz_c(src, src_stride, dst, dst_stride, filter_x, x_step_q4, aom_convolve8_horiz_c(src, src_stride, dst, dst_stride, filter_x, x_step_q4,
filter_y, y_step_q4, w, h); filter_y, y_step_q4, w, h);
} }
void vpx_scaled_vert_c(const uint8_t *src, ptrdiff_t src_stride, void aom_scaled_vert_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
uint8_t *dst, ptrdiff_t dst_stride, ptrdiff_t dst_stride, const int16_t *filter_x,
const int16_t *filter_x, int x_step_q4, int x_step_q4, const int16_t *filter_y, int y_step_q4,
const int16_t *filter_y, int y_step_q4,
int w, int h) { int w, int h) {
vpx_convolve8_vert_c(src, src_stride, dst, dst_stride, filter_x, x_step_q4, aom_convolve8_vert_c(src, src_stride, dst, dst_stride, filter_x, x_step_q4,
filter_y, y_step_q4, w, h); filter_y, y_step_q4, w, h);
} }
void vpx_scaled_2d_c(const uint8_t *src, ptrdiff_t src_stride, void aom_scaled_2d_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
uint8_t *dst, ptrdiff_t dst_stride, ptrdiff_t dst_stride, const int16_t *filter_x,
const int16_t *filter_x, int x_step_q4, int x_step_q4, const int16_t *filter_y, int y_step_q4,
const int16_t *filter_y, int y_step_q4,
int w, int h) { int w, int h) {
vpx_convolve8_c(src, src_stride, dst, dst_stride, filter_x, x_step_q4, aom_convolve8_c(src, src_stride, dst, dst_stride, filter_x, x_step_q4,
filter_y, y_step_q4, w, h); filter_y, y_step_q4, w, h);
} }
void vpx_scaled_avg_horiz_c(const uint8_t *src, ptrdiff_t src_stride, void aom_scaled_avg_horiz_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride, uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4, const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4, const int16_t *filter_y, int y_step_q4, int w,
int w, int h) { int h) {
vpx_convolve8_avg_horiz_c(src, src_stride, dst, dst_stride, filter_x, aom_convolve8_avg_horiz_c(src, src_stride, dst, dst_stride, filter_x,
x_step_q4, filter_y, y_step_q4, w, h); x_step_q4, filter_y, y_step_q4, w, h);
} }
void vpx_scaled_avg_vert_c(const uint8_t *src, ptrdiff_t src_stride, void aom_scaled_avg_vert_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride, uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4, const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4, const int16_t *filter_y, int y_step_q4, int w,
int w, int h) { int h) {
vpx_convolve8_avg_vert_c(src, src_stride, dst, dst_stride, filter_x, aom_convolve8_avg_vert_c(src, src_stride, dst, dst_stride, filter_x,
x_step_q4, filter_y, y_step_q4, w, h); x_step_q4, filter_y, y_step_q4, w, h);
} }
void vpx_scaled_avg_2d_c(const uint8_t *src, ptrdiff_t src_stride, void aom_scaled_avg_2d_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
uint8_t *dst, ptrdiff_t dst_stride, ptrdiff_t dst_stride, const int16_t *filter_x,
const int16_t *filter_x, int x_step_q4, int x_step_q4, const int16_t *filter_y, int y_step_q4,
const int16_t *filter_y, int y_step_q4, int w, int h) {
int w, int h) { aom_convolve8_avg_c(src, src_stride, dst, dst_stride, filter_x, x_step_q4,
vpx_convolve8_avg_c(src, src_stride, dst, dst_stride, filter_x, x_step_q4,
filter_y, y_step_q4, w, h); filter_y, y_step_q4, w, h);
} }
#if CONFIG_VP9_HIGHBITDEPTH #if CONFIG_AOM_HIGHBITDEPTH
static void highbd_convolve_horiz(const uint8_t *src8, ptrdiff_t src_stride, static void highbd_convolve_horiz(const uint8_t *src8, ptrdiff_t src_stride,
uint8_t *dst8, ptrdiff_t dst_stride, uint8_t *dst8, ptrdiff_t dst_stride,
const InterpKernel *x_filters, const InterpKernel *x_filters, int x0_q4,
int x0_q4, int x_step_q4, int x_step_q4, int w, int h, int bd) {
int w, int h, int bd) {
int x, y; int x, y;
uint16_t *src = CONVERT_TO_SHORTPTR(src8); uint16_t *src = CONVERT_TO_SHORTPTR(src8);
uint16_t *dst = CONVERT_TO_SHORTPTR(dst8); uint16_t *dst = CONVERT_TO_SHORTPTR(dst8);
@@ -356,8 +347,7 @@ static void highbd_convolve_horiz(const uint8_t *src8, ptrdiff_t src_stride,
const uint16_t *const src_x = &src[x_q4 >> SUBPEL_BITS]; const uint16_t *const src_x = &src[x_q4 >> SUBPEL_BITS];
const int16_t *const x_filter = x_filters[x_q4 & SUBPEL_MASK]; const int16_t *const x_filter = x_filters[x_q4 & SUBPEL_MASK];
int k, sum = 0; int k, sum = 0;
for (k = 0; k < SUBPEL_TAPS; ++k) for (k = 0; k < SUBPEL_TAPS; ++k) sum += src_x[k] * x_filter[k];
sum += src_x[k] * x_filter[k];
dst[x] = clip_pixel_highbd(ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd); dst[x] = clip_pixel_highbd(ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd);
x_q4 += x_step_q4; x_q4 += x_step_q4;
} }
@@ -368,9 +358,8 @@ static void highbd_convolve_horiz(const uint8_t *src8, ptrdiff_t src_stride,
static void highbd_convolve_avg_horiz(const uint8_t *src8, ptrdiff_t src_stride, static void highbd_convolve_avg_horiz(const uint8_t *src8, ptrdiff_t src_stride,
uint8_t *dst8, ptrdiff_t dst_stride, uint8_t *dst8, ptrdiff_t dst_stride,
const InterpKernel *x_filters, const InterpKernel *x_filters, int x0_q4,
int x0_q4, int x_step_q4, int x_step_q4, int w, int h, int bd) {
int w, int h, int bd) {
int x, y; int x, y;
uint16_t *src = CONVERT_TO_SHORTPTR(src8); uint16_t *src = CONVERT_TO_SHORTPTR(src8);
uint16_t *dst = CONVERT_TO_SHORTPTR(dst8); uint16_t *dst = CONVERT_TO_SHORTPTR(dst8);
@@ -381,10 +370,10 @@ static void highbd_convolve_avg_horiz(const uint8_t *src8, ptrdiff_t src_stride,
const uint16_t *const src_x = &src[x_q4 >> SUBPEL_BITS]; const uint16_t *const src_x = &src[x_q4 >> SUBPEL_BITS];
const int16_t *const x_filter = x_filters[x_q4 & SUBPEL_MASK]; const int16_t *const x_filter = x_filters[x_q4 & SUBPEL_MASK];
int k, sum = 0; int k, sum = 0;
for (k = 0; k < SUBPEL_TAPS; ++k) for (k = 0; k < SUBPEL_TAPS; ++k) sum += src_x[k] * x_filter[k];
sum += src_x[k] * x_filter[k]; dst[x] = ROUND_POWER_OF_TWO(
dst[x] = ROUND_POWER_OF_TWO(dst[x] + dst[x] + clip_pixel_highbd(ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd),
clip_pixel_highbd(ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd), 1); 1);
x_q4 += x_step_q4; x_q4 += x_step_q4;
} }
src += src_stride; src += src_stride;
@@ -394,9 +383,8 @@ static void highbd_convolve_avg_horiz(const uint8_t *src8, ptrdiff_t src_stride,
static void highbd_convolve_vert(const uint8_t *src8, ptrdiff_t src_stride, static void highbd_convolve_vert(const uint8_t *src8, ptrdiff_t src_stride,
uint8_t *dst8, ptrdiff_t dst_stride, uint8_t *dst8, ptrdiff_t dst_stride,
const InterpKernel *y_filters, const InterpKernel *y_filters, int y0_q4,
int y0_q4, int y_step_q4, int w, int h, int y_step_q4, int w, int h, int bd) {
int bd) {
int x, y; int x, y;
uint16_t *src = CONVERT_TO_SHORTPTR(src8); uint16_t *src = CONVERT_TO_SHORTPTR(src8);
uint16_t *dst = CONVERT_TO_SHORTPTR(dst8); uint16_t *dst = CONVERT_TO_SHORTPTR(dst8);
@@ -409,8 +397,8 @@ static void highbd_convolve_vert(const uint8_t *src8, ptrdiff_t src_stride,
int k, sum = 0; int k, sum = 0;
for (k = 0; k < SUBPEL_TAPS; ++k) for (k = 0; k < SUBPEL_TAPS; ++k)
sum += src_y[k * src_stride] * y_filter[k]; sum += src_y[k * src_stride] * y_filter[k];
dst[y * dst_stride] = clip_pixel_highbd( dst[y * dst_stride] =
ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd); clip_pixel_highbd(ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd);
y_q4 += y_step_q4; y_q4 += y_step_q4;
} }
++src; ++src;
@@ -420,9 +408,8 @@ static void highbd_convolve_vert(const uint8_t *src8, ptrdiff_t src_stride,
static void highbd_convolve_avg_vert(const uint8_t *src8, ptrdiff_t src_stride, static void highbd_convolve_avg_vert(const uint8_t *src8, ptrdiff_t src_stride,
uint8_t *dst8, ptrdiff_t dst_stride, uint8_t *dst8, ptrdiff_t dst_stride,
const InterpKernel *y_filters, const InterpKernel *y_filters, int y0_q4,
int y0_q4, int y_step_q4, int w, int h, int y_step_q4, int w, int h, int bd) {
int bd) {
int x, y; int x, y;
uint16_t *src = CONVERT_TO_SHORTPTR(src8); uint16_t *src = CONVERT_TO_SHORTPTR(src8);
uint16_t *dst = CONVERT_TO_SHORTPTR(dst8); uint16_t *dst = CONVERT_TO_SHORTPTR(dst8);
@@ -435,8 +422,10 @@ static void highbd_convolve_avg_vert(const uint8_t *src8, ptrdiff_t src_stride,
int k, sum = 0; int k, sum = 0;
for (k = 0; k < SUBPEL_TAPS; ++k) for (k = 0; k < SUBPEL_TAPS; ++k)
sum += src_y[k * src_stride] * y_filter[k]; sum += src_y[k * src_stride] * y_filter[k];
dst[y * dst_stride] = ROUND_POWER_OF_TWO(dst[y * dst_stride] + dst[y * dst_stride] = ROUND_POWER_OF_TWO(
clip_pixel_highbd(ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd), 1); dst[y * dst_stride] +
clip_pixel_highbd(ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd),
1);
y_q4 += y_step_q4; y_q4 += y_step_q4;
} }
++src; ++src;
@@ -446,11 +435,9 @@ static void highbd_convolve_avg_vert(const uint8_t *src8, ptrdiff_t src_stride,
static void highbd_convolve(const uint8_t *src, ptrdiff_t src_stride, static void highbd_convolve(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride, uint8_t *dst, ptrdiff_t dst_stride,
const InterpKernel *const x_filters, const InterpKernel *const x_filters, int x0_q4,
int x0_q4, int x_step_q4, int x_step_q4, const InterpKernel *const y_filters,
const InterpKernel *const y_filters, int y0_q4, int y_step_q4, int w, int h, int bd) {
int y0_q4, int y_step_q4,
int w, int h, int bd) {
// Note: Fixed size intermediate buffer, temp, places limits on parameters. // Note: Fixed size intermediate buffer, temp, places limits on parameters.
// 2d filtering proceeds in 2 steps: // 2d filtering proceeds in 2 steps:
// (1) Interpolate horizontally into an intermediate buffer, temp. // (1) Interpolate horizontally into an intermediate buffer, temp.
@@ -465,7 +452,7 @@ static void highbd_convolve(const uint8_t *src, ptrdiff_t src_stride,
// --((64 - 1) * 32 + 15) >> 4 + 8 = 135. // --((64 - 1) * 32 + 15) >> 4 + 8 = 135.
uint16_t temp[MAX_EXT_SIZE * MAX_SB_SIZE]; uint16_t temp[MAX_EXT_SIZE * MAX_SB_SIZE];
int intermediate_height = int intermediate_height =
(((h - 1) * y_step_q4 + y0_q4) >> SUBPEL_BITS) + SUBPEL_TAPS; (((h - 1) * y_step_q4 + y0_q4) >> SUBPEL_BITS) + SUBPEL_TAPS;
assert(w <= MAX_SB_SIZE); assert(w <= MAX_SB_SIZE);
assert(h <= MAX_SB_SIZE); assert(h <= MAX_SB_SIZE);
@@ -473,31 +460,28 @@ static void highbd_convolve(const uint8_t *src, ptrdiff_t src_stride,
assert(x_step_q4 <= 32); assert(x_step_q4 <= 32);
highbd_convolve_horiz(src - src_stride * (SUBPEL_TAPS / 2 - 1), src_stride, highbd_convolve_horiz(src - src_stride * (SUBPEL_TAPS / 2 - 1), src_stride,
CONVERT_TO_BYTEPTR(temp), MAX_SB_SIZE, CONVERT_TO_BYTEPTR(temp), MAX_SB_SIZE, x_filters, x0_q4,
x_filters, x0_q4, x_step_q4, w, x_step_q4, w, intermediate_height, bd);
intermediate_height, bd);
highbd_convolve_vert( highbd_convolve_vert(
CONVERT_TO_BYTEPTR(temp) + MAX_SB_SIZE * (SUBPEL_TAPS / 2 - 1), MAX_SB_SIZE, CONVERT_TO_BYTEPTR(temp) + MAX_SB_SIZE * (SUBPEL_TAPS / 2 - 1),
dst, dst_stride, MAX_SB_SIZE, dst, dst_stride, y_filters, y0_q4, y_step_q4, w, h, bd);
y_filters, y0_q4, y_step_q4, w, h, bd);
} }
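
The inner loop of the horizontal pass above steps through the source in Q4 fixed point: the upper bits of `x_q4` select the starting sample and the low four bits select one of 16 sub-pixel kernels. Below is a minimal single-row sketch of that stepping, not the library's implementation. The constant values (`SUBPEL_BITS` 4, `SUBPEL_TAPS` 8, `FILTER_BITS` 7), the `ROUND_POWER_OF_TWO` definition, and the `sketch_*` / `SketchKernel` names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; the names mirror the macros used in the diff. */
#define SUBPEL_BITS 4
#define SUBPEL_MASK ((1 << SUBPEL_BITS) - 1)
#define SUBPEL_TAPS 8
#define FILTER_BITS 7
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))

/* Hypothetical stand-in for InterpKernel: one 8-tap kernel. */
typedef int16_t SketchKernel[SUBPEL_TAPS];

/* Clamp a filtered value to the range of a bd-bit pixel. */
static uint16_t sketch_clip(int val, int bd) {
  const int max = (1 << bd) - 1;
  return (uint16_t)(val < 0 ? 0 : (val > max ? max : val));
}

/* Filter one row of w output pixels.
 * src must provide SUBPEL_TAPS readable samples from every start position. */
static void sketch_convolve_row(const uint16_t *src, uint16_t *dst,
                                const SketchKernel *filters, int x0_q4,
                                int x_step_q4, int w, int bd) {
  int x_q4 = x0_q4;
  assert(w > 0 && x_step_q4 > 0);
  for (int x = 0; x < w; ++x) {
    /* Integer part picks the first tap, fractional part picks the kernel. */
    const uint16_t *src_x = &src[x_q4 >> SUBPEL_BITS];
    const int16_t *filter = filters[x_q4 & SUBPEL_MASK];
    int sum = 0;
    for (int k = 0; k < SUBPEL_TAPS; ++k) sum += src_x[k] * filter[k];
    dst[x] = sketch_clip(ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd);
    x_q4 += x_step_q4;
  }
}
```

With a step of 16 every output uses kernel 0 (no sub-pixel shift); a step of 8 alternates between kernel 0 and the half-pel kernel 8, which corresponds to 2x upscaling along the row.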
void aom_highbd_convolve8_horiz_c(const uint8_t *src, ptrdiff_t src_stride,
void vpx_highbd_convolve8_horiz_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride, uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4, const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4, const int16_t *filter_y, int y_step_q4, int w,
int w, int h, int bd) { int h, int bd) {
const InterpKernel *const filters_x = get_filter_base(filter_x); const InterpKernel *const filters_x = get_filter_base(filter_x);
const int x0_q4 = get_filter_offset(filter_x, filters_x); const int x0_q4 = get_filter_offset(filter_x, filters_x);
(void)filter_y; (void)filter_y;
(void)y_step_q4; (void)y_step_q4;
highbd_convolve_horiz(src, src_stride, dst, dst_stride, filters_x, highbd_convolve_horiz(src, src_stride, dst, dst_stride, filters_x, x0_q4,
x0_q4, x_step_q4, w, h, bd); x_step_q4, w, h, bd);
} }
void vpx_highbd_convolve8_avg_horiz_c(const uint8_t *src, ptrdiff_t src_stride, void aom_highbd_convolve8_avg_horiz_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride, uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4, const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4, const int16_t *filter_y, int y_step_q4,
@@ -507,25 +491,25 @@ void vpx_highbd_convolve8_avg_horiz_c(const uint8_t *src, ptrdiff_t src_stride,
(void)filter_y; (void)filter_y;
(void)y_step_q4; (void)y_step_q4;
highbd_convolve_avg_horiz(src, src_stride, dst, dst_stride, filters_x, highbd_convolve_avg_horiz(src, src_stride, dst, dst_stride, filters_x, x0_q4,
x0_q4, x_step_q4, w, h, bd); x_step_q4, w, h, bd);
} }
void vpx_highbd_convolve8_vert_c(const uint8_t *src, ptrdiff_t src_stride, void aom_highbd_convolve8_vert_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride, uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4, const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4, const int16_t *filter_y, int y_step_q4, int w,
int w, int h, int bd) { int h, int bd) {
const InterpKernel *const filters_y = get_filter_base(filter_y); const InterpKernel *const filters_y = get_filter_base(filter_y);
const int y0_q4 = get_filter_offset(filter_y, filters_y); const int y0_q4 = get_filter_offset(filter_y, filters_y);
(void)filter_x; (void)filter_x;
(void)x_step_q4; (void)x_step_q4;
highbd_convolve_vert(src, src_stride, dst, dst_stride, filters_y, highbd_convolve_vert(src, src_stride, dst, dst_stride, filters_y, y0_q4,
y0_q4, y_step_q4, w, h, bd); y_step_q4, w, h, bd);
} }
void vpx_highbd_convolve8_avg_vert_c(const uint8_t *src, ptrdiff_t src_stride, void aom_highbd_convolve8_avg_vert_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride, uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4, const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4, const int16_t *filter_y, int y_step_q4,
@@ -535,45 +519,42 @@ void vpx_highbd_convolve8_avg_vert_c(const uint8_t *src, ptrdiff_t src_stride,
(void)filter_x; (void)filter_x;
(void)x_step_q4; (void)x_step_q4;
highbd_convolve_avg_vert(src, src_stride, dst, dst_stride, filters_y, highbd_convolve_avg_vert(src, src_stride, dst, dst_stride, filters_y, y0_q4,
y0_q4, y_step_q4, w, h, bd); y_step_q4, w, h, bd);
} }
void vpx_highbd_convolve8_c(const uint8_t *src, ptrdiff_t src_stride, void aom_highbd_convolve8_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride, uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4, const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4, const int16_t *filter_y, int y_step_q4, int w,
int w, int h, int bd) { int h, int bd) {
const InterpKernel *const filters_x = get_filter_base(filter_x); const InterpKernel *const filters_x = get_filter_base(filter_x);
const int x0_q4 = get_filter_offset(filter_x, filters_x); const int x0_q4 = get_filter_offset(filter_x, filters_x);
const InterpKernel *const filters_y = get_filter_base(filter_y); const InterpKernel *const filters_y = get_filter_base(filter_y);
const int y0_q4 = get_filter_offset(filter_y, filters_y); const int y0_q4 = get_filter_offset(filter_y, filters_y);
highbd_convolve(src, src_stride, dst, dst_stride, highbd_convolve(src, src_stride, dst, dst_stride, filters_x, x0_q4, x_step_q4,
filters_x, x0_q4, x_step_q4,
filters_y, y0_q4, y_step_q4, w, h, bd); filters_y, y0_q4, y_step_q4, w, h, bd);
} }
void vpx_highbd_convolve8_avg_c(const uint8_t *src, ptrdiff_t src_stride, void aom_highbd_convolve8_avg_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride, uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4, const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4, const int16_t *filter_y, int y_step_q4, int w,
int w, int h, int bd) { int h, int bd) {
// Fixed size intermediate buffer places limits on parameters. // Fixed size intermediate buffer places limits on parameters.
DECLARE_ALIGNED(16, uint16_t, temp[MAX_SB_SIZE * MAX_SB_SIZE]); DECLARE_ALIGNED(16, uint16_t, temp[MAX_SB_SIZE * MAX_SB_SIZE]);
assert(w <= MAX_SB_SIZE); assert(w <= MAX_SB_SIZE);
assert(h <= MAX_SB_SIZE); assert(h <= MAX_SB_SIZE);
vpx_highbd_convolve8_c(src, src_stride, aom_highbd_convolve8_c(src, src_stride, CONVERT_TO_BYTEPTR(temp), MAX_SB_SIZE,
CONVERT_TO_BYTEPTR(temp), MAX_SB_SIZE,
filter_x, x_step_q4, filter_y, y_step_q4, w, h, bd); filter_x, x_step_q4, filter_y, y_step_q4, w, h, bd);
vpx_highbd_convolve_avg_c(CONVERT_TO_BYTEPTR(temp), MAX_SB_SIZE, aom_highbd_convolve_avg_c(CONVERT_TO_BYTEPTR(temp), MAX_SB_SIZE, dst,
dst, dst_stride, dst_stride, NULL, 0, NULL, 0, w, h, bd);
NULL, 0, NULL, 0, w, h, bd);
} }
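
The `_avg` variants above blend the new prediction into whatever is already in `dst` using `ROUND_POWER_OF_TWO(dst + value, 1)`, i.e. a round-half-up average of two pixels. A minimal sketch of that blend for one row follows; the macro definition and the `sketch_avg_row` helper are assumptions for illustration rather than code taken from the library.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definition: add half of the divisor, then shift. */
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))

/* Hypothetical helper: average a new prediction row into dst in place. */
static void sketch_avg_row(const uint16_t *pred, uint16_t *dst, int w) {
  assert(w > 0);
  for (int x = 0; x < w; ++x) {
    /* (dst + pred + 1) >> 1, so ties round upward. */
    dst[x] = (uint16_t)ROUND_POWER_OF_TWO(dst[x] + pred[x], 1);
  }
}
```

Because both inputs are already clipped to the bit depth, the average cannot exceed the valid pixel range, so no second clip is needed after the blend.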
void vpx_highbd_convolve_copy_c(const uint8_t *src8, ptrdiff_t src_stride, void aom_highbd_convolve_copy_c(const uint8_t *src8, ptrdiff_t src_stride,
uint8_t *dst8, ptrdiff_t dst_stride, uint8_t *dst8, ptrdiff_t dst_stride,
const int16_t *filter_x, int filter_x_stride, const int16_t *filter_x, int filter_x_stride,
const int16_t *filter_y, int filter_y_stride, const int16_t *filter_y, int filter_y_stride,
@@ -594,7 +575,7 @@ void vpx_highbd_convolve_copy_c(const uint8_t *src8, ptrdiff_t src_stride,
} }
} }
void vpx_highbd_convolve_avg_c(const uint8_t *src8, ptrdiff_t src_stride, void aom_highbd_convolve_avg_c(const uint8_t *src8, ptrdiff_t src_stride,
uint8_t *dst8, ptrdiff_t dst_stride, uint8_t *dst8, ptrdiff_t dst_stride,
const int16_t *filter_x, int filter_x_stride, const int16_t *filter_x, int filter_x_stride,
const int16_t *filter_y, int filter_y_stride, const int16_t *filter_y, int filter_y_stride,


@@ -1,17 +1,18 @@
/* /*
* Copyright (c) 2013 The WebM project authors. All Rights Reserved. * Copyright (c) 2016, Alliance for Open Media. All rights reserved
* *
* Use of this source code is governed by a BSD-style license * This source code is subject to the terms of the BSD 2 Clause License and
* that can be found in the LICENSE file in the root of the source * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* tree. An additional intellectual property rights grant can be found * was not distributed with this source code in the LICENSE file, you can
* in the file PATENTS. All contributing project authors may * obtain it at www.aomedia.org/license/software. If the Alliance for Open
* be found in the AUTHORS file in the root of the source tree. * Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/ */
#ifndef VPX_DSP_VPX_CONVOLVE_H_ #ifndef AOM_DSP_AOM_CONVOLVE_H_
#define VPX_DSP_VPX_CONVOLVE_H_ #define AOM_DSP_AOM_CONVOLVE_H_
#include "./vpx_config.h" #include "./aom_config.h"
#include "vpx/vpx_integer.h" #include "aom/aom_integer.h"
#ifdef __cplusplus #ifdef __cplusplus
extern "C" { extern "C" {
@@ -29,19 +30,19 @@ extern "C" {
// --Must round-up because block may be located at sub-pixel position. // --Must round-up because block may be located at sub-pixel position.
// --Require an additional SUBPEL_TAPS rows for the 8-tap filter tails. // --Require an additional SUBPEL_TAPS rows for the 8-tap filter tails.
// --((64 - 1) * 32 + 15) >> 4 + 8 = 135. // --((64 - 1) * 32 + 15) >> 4 + 8 = 135.
#if CONFIG_VP10 && CONFIG_EXT_PARTITION #if CONFIG_AV1 && CONFIG_EXT_PARTITION
# define MAX_EXT_SIZE 263 #define MAX_EXT_SIZE 263
#else #else
# define MAX_EXT_SIZE 135 #define MAX_EXT_SIZE 135
#endif // CONFIG_VP10 && CONFIG_EXT_PARTITION #endif // CONFIG_AV1 && CONFIG_EXT_PARTITION
typedef void (*convolve_fn_t)(const uint8_t *src, ptrdiff_t src_stride, typedef void (*convolve_fn_t)(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride, uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4, const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4, const int16_t *filter_y, int y_step_q4, int w,
int w, int h); int h);
#if CONFIG_VP9_HIGHBITDEPTH #if CONFIG_AOM_HIGHBITDEPTH
typedef void (*highbd_convolve_fn_t)(const uint8_t *src, ptrdiff_t src_stride, typedef void (*highbd_convolve_fn_t)(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride, uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4, const int16_t *filter_x, int x_step_q4,
@@ -53,4 +54,4 @@ typedef void (*highbd_convolve_fn_t)(const uint8_t *src, ptrdiff_t src_stride,
} // extern "C" } // extern "C"
#endif #endif
#endif // VPX_DSP_VPX_CONVOLVE_H_ #endif // AOM_DSP_AOM_CONVOLVE_H_
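
The `MAX_EXT_SIZE` values in this header follow from the worst-case row count of the intermediate buffer. The sketch below reproduces that arithmetic under stated assumptions: `SUBPEL_BITS` is 4, `SUBPEL_TAPS` is 8, the largest step is 32, the largest sub-pixel offset is 15, and the `CONFIG_EXT_PARTITION` case uses 128-row blocks (inferred from the value 263, not stated in the diff). The function names are hypothetical. Read as C, the expression in the comment truncates and yields 134; the documented 135 results from rounding the Q4 position up, as the "Must round-up" note says.

```c
#include <assert.h>

/* Assumed values; the names mirror the macros used in the diff. */
#define SUBPEL_BITS 4
#define SUBPEL_TAPS 8

/* Row count as computed in highbd_convolve(): truncating shift. */
static int sketch_intermediate_height(int h, int y_step_q4, int y0_q4) {
  assert(h > 0);
  return (((h - 1) * y_step_q4 + y0_q4) >> SUBPEL_BITS) + SUBPEL_TAPS;
}

/* Upper bound with the Q4 position rounded up, as the header comment asks. */
static int sketch_max_ext_size(int max_h, int max_step_q4, int max_y0_q4) {
  const int span_q4 = (max_h - 1) * max_step_q4 + max_y0_q4;
  assert(max_h > 0);
  return ((span_q4 + (1 << SUBPEL_BITS) - 1) >> SUBPEL_BITS) + SUBPEL_TAPS;
}
```

For 64-row blocks the rounded-up bound is 135 and for 128-row blocks it is 263, matching the two `#define`s, while the truncating formula used at run time needs at most 134 and 262 rows respectively.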


@@ -1,15 +1,17 @@
## ##
## Copyright (c) 2015 The WebM project authors. All Rights Reserved. ## Copyright (c) 2016, Alliance for Open Media. All rights reserved
## ##
## Use of this source code is governed by a BSD-style license ## This source code is subject to the terms of the BSD 2 Clause License and
## that can be found in the LICENSE file in the root of the source ## the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
## tree. An additional intellectual property rights grant can be found ## was not distributed with this source code in the LICENSE file, you can
## in the file PATENTS. All contributing project authors may ## obtain it at www.aomedia.org/license/software. If the Alliance for Open
## be found in the AUTHORS file in the root of the source tree. ## Media Patent License 1.0 was not distributed with this source code in the
## PATENTS file, you can obtain it at www.aomedia.org/license/patent.
## ##
DSP_SRCS-yes += vpx_dsp.mk
DSP_SRCS-yes += vpx_dsp_common.h DSP_SRCS-yes += aom_dsp.mk
DSP_SRCS-yes += aom_dsp_common.h
DSP_SRCS-$(HAVE_MSA) += mips/macros_msa.h DSP_SRCS-$(HAVE_MSA) += mips/macros_msa.h
@@ -18,14 +20,20 @@ DSP_SRCS-$(ARCH_X86)$(ARCH_X86_64) += x86/synonyms.h
# bit reader # bit reader
DSP_SRCS-yes += prob.h DSP_SRCS-yes += prob.h
DSP_SRCS-yes += prob.c DSP_SRCS-yes += prob.c
DSP_SRCS-$(CONFIG_ANS) += ans.h
DSP_SRCS-$(CONFIG_ANS) += ans.c
ifeq ($(CONFIG_ENCODERS),yes) ifeq ($(CONFIG_ENCODERS),yes)
DSP_SRCS-$(CONFIG_ANS) += answriter.h
DSP_SRCS-yes += bitwriter.h DSP_SRCS-yes += bitwriter.h
DSP_SRCS-yes += bitwriter.c DSP_SRCS-yes += dkboolwriter.h
DSP_SRCS-yes += dkboolwriter.c
DSP_SRCS-yes += bitwriter_buffer.c DSP_SRCS-yes += bitwriter_buffer.c
DSP_SRCS-yes += bitwriter_buffer.h DSP_SRCS-yes += bitwriter_buffer.h
DSP_SRCS-yes += psnr.c DSP_SRCS-yes += psnr.c
DSP_SRCS-yes += psnr.h DSP_SRCS-yes += psnr.h
DSP_SRCS-$(CONFIG_ANS) += buf_ans.h
DSP_SRCS-$(CONFIG_ANS) += buf_ans.c
DSP_SRCS-$(CONFIG_INTERNAL_STATS) += ssim.c DSP_SRCS-$(CONFIG_INTERNAL_STATS) += ssim.c
DSP_SRCS-$(CONFIG_INTERNAL_STATS) += ssim.h DSP_SRCS-$(CONFIG_INTERNAL_STATS) += ssim.h
DSP_SRCS-$(CONFIG_INTERNAL_STATS) += psnrhvs.c DSP_SRCS-$(CONFIG_INTERNAL_STATS) += psnrhvs.c
@@ -33,8 +41,10 @@ DSP_SRCS-$(CONFIG_INTERNAL_STATS) += fastssim.c
endif endif
ifeq ($(CONFIG_DECODERS),yes) ifeq ($(CONFIG_DECODERS),yes)
DSP_SRCS-$(CONFIG_ANS) += ansreader.h
DSP_SRCS-yes += bitreader.h DSP_SRCS-yes += bitreader.h
DSP_SRCS-yes += bitreader.c DSP_SRCS-yes += dkboolreader.h
DSP_SRCS-yes += dkboolreader.c
DSP_SRCS-yes += bitreader_buffer.c DSP_SRCS-yes += bitreader_buffer.c
DSP_SRCS-yes += bitreader_buffer.h DSP_SRCS-yes += bitreader_buffer.h
endif endif
@@ -42,15 +52,28 @@ endif
# intra predictions # intra predictions
DSP_SRCS-yes += intrapred.c DSP_SRCS-yes += intrapred.c
ifeq ($(CONFIG_DAALA_EC),yes)
DSP_SRCS-yes += entenc.c
DSP_SRCS-yes += entenc.h
DSP_SRCS-yes += entdec.c
DSP_SRCS-yes += entdec.h
DSP_SRCS-yes += entcode.c
DSP_SRCS-yes += entcode.h
DSP_SRCS-yes += daalaboolreader.c
DSP_SRCS-yes += daalaboolreader.h
DSP_SRCS-yes += daalaboolwriter.c
DSP_SRCS-yes += daalaboolwriter.h
endif
DSP_SRCS-$(HAVE_SSE) += x86/intrapred_sse2.asm DSP_SRCS-$(HAVE_SSE) += x86/intrapred_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/intrapred_sse2.asm DSP_SRCS-$(HAVE_SSE2) += x86/intrapred_sse2.asm
DSP_SRCS-$(HAVE_SSSE3) += x86/intrapred_ssse3.asm DSP_SRCS-$(HAVE_SSSE3) += x86/intrapred_ssse3.asm
DSP_SRCS-$(HAVE_SSSE3) += x86/vpx_subpixel_8t_ssse3.asm DSP_SRCS-$(HAVE_SSSE3) += x86/aom_subpixel_8t_ssse3.asm
ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes) ifeq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE) += x86/highbd_intrapred_sse2.asm DSP_SRCS-$(HAVE_SSE) += x86/highbd_intrapred_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_intrapred_sse2.asm DSP_SRCS-$(HAVE_SSE2) += x86/highbd_intrapred_sse2.asm
endif # CONFIG_VP9_HIGHBITDEPTH endif # CONFIG_AOM_HIGHBITDEPTH
DSP_SRCS-$(HAVE_NEON_ASM) += arm/intrapred_neon_asm$(ASM) DSP_SRCS-$(HAVE_NEON_ASM) += arm/intrapred_neon_asm$(ASM)
DSP_SRCS-$(HAVE_NEON) += arm/intrapred_neon.c DSP_SRCS-$(HAVE_NEON) += arm/intrapred_neon.c
@@ -63,8 +86,6 @@ DSP_SRCS-$(HAVE_DSPR2) += mips/common_dspr2.h
DSP_SRCS-$(HAVE_DSPR2) += mips/common_dspr2.c DSP_SRCS-$(HAVE_DSPR2) += mips/common_dspr2.c
# inter predictions # inter predictions
ifeq ($(CONFIG_VP10),yes)
DSP_SRCS-yes += blend.h DSP_SRCS-yes += blend.h
DSP_SRCS-yes += blend_a64_mask.c DSP_SRCS-yes += blend_a64_mask.c
DSP_SRCS-yes += blend_a64_hmask.c DSP_SRCS-yes += blend_a64_hmask.c
@@ -73,54 +94,52 @@ DSP_SRCS-$(HAVE_SSE4_1) += x86/blend_sse4.h
DSP_SRCS-$(HAVE_SSE4_1) += x86/blend_a64_mask_sse4.c DSP_SRCS-$(HAVE_SSE4_1) += x86/blend_a64_mask_sse4.c
DSP_SRCS-$(HAVE_SSE4_1) += x86/blend_a64_hmask_sse4.c DSP_SRCS-$(HAVE_SSE4_1) += x86/blend_a64_hmask_sse4.c
DSP_SRCS-$(HAVE_SSE4_1) += x86/blend_a64_vmask_sse4.c DSP_SRCS-$(HAVE_SSE4_1) += x86/blend_a64_vmask_sse4.c
endif #CONFIG_VP10
# interpolation filters # interpolation filters
DSP_SRCS-yes += vpx_convolve.c DSP_SRCS-yes += aom_convolve.c
DSP_SRCS-yes += vpx_convolve.h DSP_SRCS-yes += aom_convolve.h
DSP_SRCS-yes += vpx_filter.h DSP_SRCS-yes += aom_filter.h
DSP_SRCS-$(ARCH_X86)$(ARCH_X86_64) += x86/convolve.h DSP_SRCS-$(ARCH_X86)$(ARCH_X86_64) += x86/convolve.h
DSP_SRCS-$(ARCH_X86)$(ARCH_X86_64) += x86/vpx_asm_stubs.c DSP_SRCS-$(ARCH_X86)$(ARCH_X86_64) += x86/aom_asm_stubs.c
DSP_SRCS-$(HAVE_SSE2) += x86/vpx_subpixel_8t_sse2.asm DSP_SRCS-$(HAVE_SSE2) += x86/aom_subpixel_8t_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/vpx_subpixel_bilinear_sse2.asm DSP_SRCS-$(HAVE_SSE2) += x86/aom_subpixel_bilinear_sse2.asm
DSP_SRCS-$(HAVE_SSSE3) += x86/vpx_subpixel_8t_ssse3.asm DSP_SRCS-$(HAVE_SSSE3) += x86/aom_subpixel_8t_ssse3.asm
DSP_SRCS-$(HAVE_SSSE3) += x86/vpx_subpixel_bilinear_ssse3.asm DSP_SRCS-$(HAVE_SSSE3) += x86/aom_subpixel_bilinear_ssse3.asm
DSP_SRCS-$(HAVE_AVX2) += x86/vpx_subpixel_8t_intrin_avx2.c DSP_SRCS-$(HAVE_AVX2) += x86/aom_subpixel_8t_intrin_avx2.c
DSP_SRCS-$(HAVE_SSSE3) += x86/vpx_subpixel_8t_intrin_ssse3.c DSP_SRCS-$(HAVE_SSSE3) += x86/aom_subpixel_8t_intrin_ssse3.c
ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes) ifeq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/vpx_high_subpixel_8t_sse2.asm DSP_SRCS-$(HAVE_SSE2) += x86/aom_high_subpixel_8t_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/vpx_high_subpixel_bilinear_sse2.asm DSP_SRCS-$(HAVE_SSE2) += x86/aom_high_subpixel_bilinear_sse2.asm
endif endif
DSP_SRCS-$(HAVE_SSE2) += x86/aom_convolve_copy_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/vpx_convolve_copy_sse2.asm
ifeq ($(HAVE_NEON_ASM),yes) ifeq ($(HAVE_NEON_ASM),yes)
DSP_SRCS-yes += arm/vpx_convolve_copy_neon_asm$(ASM) DSP_SRCS-yes += arm/aom_convolve_copy_neon_asm$(ASM)
DSP_SRCS-yes += arm/vpx_convolve8_avg_neon_asm$(ASM) DSP_SRCS-yes += arm/aom_convolve8_avg_neon_asm$(ASM)
DSP_SRCS-yes += arm/vpx_convolve8_neon_asm$(ASM) DSP_SRCS-yes += arm/aom_convolve8_neon_asm$(ASM)
DSP_SRCS-yes += arm/vpx_convolve_avg_neon_asm$(ASM) DSP_SRCS-yes += arm/aom_convolve_avg_neon_asm$(ASM)
DSP_SRCS-yes += arm/vpx_convolve_neon.c DSP_SRCS-yes += arm/aom_convolve_neon.c
else else
ifeq ($(HAVE_NEON),yes) ifeq ($(HAVE_NEON),yes)
DSP_SRCS-yes += arm/vpx_convolve_copy_neon.c DSP_SRCS-yes += arm/aom_convolve_copy_neon.c
DSP_SRCS-yes += arm/vpx_convolve8_avg_neon.c DSP_SRCS-yes += arm/aom_convolve8_avg_neon.c
DSP_SRCS-yes += arm/vpx_convolve8_neon.c DSP_SRCS-yes += arm/aom_convolve8_neon.c
DSP_SRCS-yes += arm/vpx_convolve_avg_neon.c DSP_SRCS-yes += arm/aom_convolve_avg_neon.c
DSP_SRCS-yes += arm/vpx_convolve_neon.c DSP_SRCS-yes += arm/aom_convolve_neon.c
endif # HAVE_NEON endif # HAVE_NEON
endif # HAVE_NEON_ASM endif # HAVE_NEON_ASM
# common (msa) # common (msa)
DSP_SRCS-$(HAVE_MSA) += mips/vpx_convolve8_avg_horiz_msa.c DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve8_avg_horiz_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/vpx_convolve8_avg_msa.c DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve8_avg_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/vpx_convolve8_avg_vert_msa.c DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve8_avg_vert_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/vpx_convolve8_horiz_msa.c DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve8_horiz_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/vpx_convolve8_msa.c DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve8_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/vpx_convolve8_vert_msa.c DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve8_vert_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/vpx_convolve_avg_msa.c DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve_avg_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/vpx_convolve_copy_msa.c DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve_copy_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/vpx_convolve_msa.h DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve_msa.h
# common (dspr2) # common (dspr2)
DSP_SRCS-$(HAVE_DSPR2) += mips/convolve_common_dspr2.h DSP_SRCS-$(HAVE_DSPR2) += mips/convolve_common_dspr2.h
@@ -167,15 +186,37 @@ DSP_SRCS-$(HAVE_DSPR2) += mips/loopfilter_mb_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/loopfilter_mb_horiz_dspr2.c DSP_SRCS-$(HAVE_DSPR2) += mips/loopfilter_mb_horiz_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/loopfilter_mb_vert_dspr2.c DSP_SRCS-$(HAVE_DSPR2) += mips/loopfilter_mb_vert_dspr2.c
ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes) ifeq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_loopfilter_sse2.c DSP_SRCS-$(HAVE_SSE2) += x86/highbd_loopfilter_sse2.c
endif # CONFIG_VP9_HIGHBITDEPTH endif # CONFIG_AOM_HIGHBITDEPTH
DSP_SRCS-yes += txfm_common.h DSP_SRCS-yes += txfm_common.h
DSP_SRCS-yes += x86/txfm_common_intrin.h
DSP_SRCS-$(HAVE_SSE2) += x86/txfm_common_sse2.h DSP_SRCS-$(HAVE_SSE2) += x86/txfm_common_sse2.h
DSP_SRCS-$(HAVE_MSA) += mips/txfm_macros_msa.h DSP_SRCS-$(HAVE_MSA) += mips/txfm_macros_msa.h
# forward transform # forward transform
ifeq ($(CONFIG_VP10),yes) ifeq ($(CONFIG_AV1),yes)
DSP_SRCS-yes += fwd_txfm.c
DSP_SRCS-yes += fwd_txfm.h
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_txfm_sse2.h
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_txfm_sse2.c
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_dct32_8cols_sse2.c
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_txfm_impl_sse2.h
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_dct32x32_impl_sse2.h
ifeq ($(ARCH_X86_64),yes)
DSP_SRCS-$(HAVE_SSSE3) += x86/fwd_txfm_ssse3_x86_64.asm
endif
DSP_SRCS-$(HAVE_AVX2) += x86/fwd_txfm_avx2.h
DSP_SRCS-$(HAVE_AVX2) += x86/fwd_txfm_avx2.c
DSP_SRCS-$(HAVE_AVX2) += x86/txfm_common_avx2.h
DSP_SRCS-$(HAVE_AVX2) += x86/fwd_dct32x32_impl_avx2.h
DSP_SRCS-$(HAVE_NEON) += arm/fwd_txfm_neon.c
DSP_SRCS-$(HAVE_MSA) += mips/fwd_txfm_msa.h
DSP_SRCS-$(HAVE_MSA) += mips/fwd_txfm_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/fwd_dct32x32_msa.c
endif # CONFIG_AV1_ENCODER
ifeq ($(CONFIG_PVQ),yes)
DSP_SRCS-yes += fwd_txfm.c DSP_SRCS-yes += fwd_txfm.c
DSP_SRCS-yes += fwd_txfm.h DSP_SRCS-yes += fwd_txfm.h
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_txfm_sse2.h DSP_SRCS-$(HAVE_SSE2) += x86/fwd_txfm_sse2.h
@@ -191,10 +232,10 @@ DSP_SRCS-$(HAVE_NEON) += arm/fwd_txfm_neon.c
DSP_SRCS-$(HAVE_MSA) += mips/fwd_txfm_msa.h DSP_SRCS-$(HAVE_MSA) += mips/fwd_txfm_msa.h
DSP_SRCS-$(HAVE_MSA) += mips/fwd_txfm_msa.c DSP_SRCS-$(HAVE_MSA) += mips/fwd_txfm_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/fwd_dct32x32_msa.c DSP_SRCS-$(HAVE_MSA) += mips/fwd_dct32x32_msa.c
endif # CONFIG_VP10_ENCODER endif # CONFIG_PVQ
# inverse transform # inverse transform
ifeq ($(CONFIG_VP10), yes) ifeq ($(CONFIG_AV1), yes)
DSP_SRCS-yes += inv_txfm.h DSP_SRCS-yes += inv_txfm.h
DSP_SRCS-yes += inv_txfm.c DSP_SRCS-yes += inv_txfm.c
DSP_SRCS-$(HAVE_SSE2) += x86/inv_txfm_sse2.h DSP_SRCS-$(HAVE_SSE2) += x86/inv_txfm_sse2.h
@@ -234,23 +275,23 @@ DSP_SRCS-$(HAVE_MSA) += mips/idct8x8_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/idct16x16_msa.c DSP_SRCS-$(HAVE_MSA) += mips/idct16x16_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/idct32x32_msa.c DSP_SRCS-$(HAVE_MSA) += mips/idct32x32_msa.c
ifneq ($(CONFIG_VP9_HIGHBITDEPTH),yes) ifneq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_DSPR2) += mips/inv_txfm_dspr2.h DSP_SRCS-$(HAVE_DSPR2) += mips/inv_txfm_dspr2.h
DSP_SRCS-$(HAVE_DSPR2) += mips/itrans4_dspr2.c DSP_SRCS-$(HAVE_DSPR2) += mips/itrans4_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/itrans8_dspr2.c DSP_SRCS-$(HAVE_DSPR2) += mips/itrans8_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/itrans16_dspr2.c DSP_SRCS-$(HAVE_DSPR2) += mips/itrans16_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/itrans32_dspr2.c DSP_SRCS-$(HAVE_DSPR2) += mips/itrans32_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/itrans32_cols_dspr2.c DSP_SRCS-$(HAVE_DSPR2) += mips/itrans32_cols_dspr2.c
endif # CONFIG_VP9_HIGHBITDEPTH endif # CONFIG_AOM_HIGHBITDEPTH
endif # CONFIG_VP10 endif # CONFIG_AV1
# quantization # quantization
ifneq ($(filter yes,$(CONFIG_VP10_ENCODER)),) ifneq ($(filter yes,$(CONFIG_AV1_ENCODER)),)
DSP_SRCS-yes += quantize.c DSP_SRCS-yes += quantize.c
DSP_SRCS-yes += quantize.h DSP_SRCS-yes += quantize.h
DSP_SRCS-$(HAVE_SSE2) += x86/quantize_sse2.c DSP_SRCS-$(HAVE_SSE2) += x86/quantize_sse2.c
ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes) ifeq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_quantize_intrin_sse2.c DSP_SRCS-$(HAVE_SSE2) += x86/highbd_quantize_intrin_sse2.c
endif endif
ifeq ($(ARCH_X86_64),yes) ifeq ($(ARCH_X86_64),yes)
@@ -269,17 +310,17 @@ DSP_SRCS-$(HAVE_SSSE3) += x86/avg_ssse3_x86_64.asm
endif endif
# high bit depth subtract # high bit depth subtract
ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes) ifeq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_subtract_sse2.c DSP_SRCS-$(HAVE_SSE2) += x86/highbd_subtract_sse2.c
endif endif
endif # CONFIG_VP10_ENCODER endif # CONFIG_AV1_ENCODER
ifeq ($(CONFIG_VP10_ENCODER),yes) ifeq ($(CONFIG_AV1_ENCODER),yes)
DSP_SRCS-yes += sum_squares.c DSP_SRCS-yes += sum_squares.c
DSP_SRCS-$(HAVE_SSE2) += x86/sum_squares_sse2.c DSP_SRCS-$(HAVE_SSE2) += x86/sum_squares_sse2.c
endif # CONFIG_VP10_ENCODER endif # CONFIG_AV1_ENCODER
ifeq ($(CONFIG_ENCODERS),yes) ifeq ($(CONFIG_ENCODERS),yes)
DSP_SRCS-yes += sad.c DSP_SRCS-yes += sad.c
@@ -299,16 +340,16 @@ DSP_SRCS-$(HAVE_SSE4_1) += x86/sad_sse4.asm
DSP_SRCS-$(HAVE_AVX2) += x86/sad4d_avx2.c DSP_SRCS-$(HAVE_AVX2) += x86/sad4d_avx2.c
DSP_SRCS-$(HAVE_AVX2) += x86/sad_avx2.c DSP_SRCS-$(HAVE_AVX2) += x86/sad_avx2.c
ifeq ($(CONFIG_VP10_ENCODER),yes) ifeq ($(CONFIG_AV1_ENCODER),yes)
ifeq ($(CONFIG_EXT_INTER),yes) ifeq ($(CONFIG_EXT_INTER),yes)
DSP_SRCS-$(HAVE_SSSE3) += x86/masked_sad_intrin_ssse3.c DSP_SRCS-$(HAVE_SSSE3) += x86/masked_sad_intrin_ssse3.c
DSP_SRCS-$(HAVE_SSSE3) += x86/masked_variance_intrin_ssse3.c DSP_SRCS-$(HAVE_SSSE3) += x86/masked_variance_intrin_ssse3.c
endif #CONFIG_EXT_INTER endif #CONFIG_EXT_INTER
ifeq ($(CONFIG_OBMC),yes) ifeq ($(CONFIG_MOTION_VAR),yes)
DSP_SRCS-$(HAVE_SSE4_1) += x86/obmc_sad_sse4.c DSP_SRCS-$(HAVE_SSE4_1) += x86/obmc_sad_sse4.c
DSP_SRCS-$(HAVE_SSE4_1) += x86/obmc_variance_sse4.c DSP_SRCS-$(HAVE_SSE4_1) += x86/obmc_variance_sse4.c
endif #CONFIG_OBMC endif #CONFIG_MOTION_VAR
endif #CONFIG_VP10_ENCODER endif #CONFIG_AV1_ENCODER
DSP_SRCS-$(HAVE_SSE) += x86/sad4d_sse2.asm DSP_SRCS-$(HAVE_SSE) += x86/sad4d_sse2.asm
DSP_SRCS-$(HAVE_SSE) += x86/sad_sse2.asm DSP_SRCS-$(HAVE_SSE) += x86/sad_sse2.asm
@@ -316,10 +357,10 @@ DSP_SRCS-$(HAVE_SSE2) += x86/sad4d_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/sad_sse2.asm DSP_SRCS-$(HAVE_SSE2) += x86/sad_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/subtract_sse2.asm DSP_SRCS-$(HAVE_SSE2) += x86/subtract_sse2.asm
ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes) ifeq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_sad4d_sse2.asm DSP_SRCS-$(HAVE_SSE2) += x86/highbd_sad4d_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_sad_sse2.asm DSP_SRCS-$(HAVE_SSE2) += x86/highbd_sad_sse2.asm
endif # CONFIG_VP9_HIGHBITDEPTH endif # CONFIG_AOM_HIGHBITDEPTH
endif # CONFIG_ENCODERS endif # CONFIG_ENCODERS
@@ -353,17 +394,33 @@ endif # ARCH_X86_64
DSP_SRCS-$(HAVE_SSE) += x86/subpel_variance_sse2.asm DSP_SRCS-$(HAVE_SSE) += x86/subpel_variance_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/subpel_variance_sse2.asm # Contains SSE2 and SSSE3 DSP_SRCS-$(HAVE_SSE2) += x86/subpel_variance_sse2.asm # Contains SSE2 and SSSE3
ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes) ifeq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_variance_sse2.c DSP_SRCS-$(HAVE_SSE2) += x86/highbd_variance_sse2.c
DSP_SRCS-$(HAVE_SSE4_1) += x86/highbd_variance_sse4.c DSP_SRCS-$(HAVE_SSE4_1) += x86/highbd_variance_sse4.c
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_variance_impl_sse2.asm DSP_SRCS-$(HAVE_SSE2) += x86/highbd_variance_impl_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_subpel_variance_impl_sse2.asm DSP_SRCS-$(HAVE_SSE2) += x86/highbd_subpel_variance_impl_sse2.asm
endif # CONFIG_VP9_HIGHBITDEPTH endif # CONFIG_AOM_HIGHBITDEPTH
endif # CONFIG_ENCODERS endif # CONFIG_ENCODERS
DSP_SRCS-no += $(DSP_SRCS_REMOVE-yes) DSP_SRCS-no += $(DSP_SRCS_REMOVE-yes)
DSP_SRCS-yes += vpx_dsp_rtcd.c DSP_SRCS-yes += aom_dsp_rtcd.c
DSP_SRCS-yes += vpx_dsp_rtcd_defs.pl DSP_SRCS-yes += aom_dsp_rtcd_defs.pl
$(eval $(call rtcd_h_template,vpx_dsp_rtcd,vpx_dsp/vpx_dsp_rtcd_defs.pl)) DSP_SRCS-yes += aom_simd.c
DSP_SRCS-yes += aom_simd.h
DSP_SRCS-yes += aom_simd_inline.h
DSP_SRCS-yes += simd/v64_intrinsics.h
DSP_SRCS-yes += simd/v64_intrinsics_c.h
DSP_SRCS-yes += simd/v128_intrinsics.h
DSP_SRCS-yes += simd/v128_intrinsics_c.h
DSP_SRCS-yes += simd/v256_intrinsics.h
DSP_SRCS-yes += simd/v256_intrinsics_c.h
DSP_SRCS-$(HAVE_SSE2) += simd/v64_intrinsics_x86.h
DSP_SRCS-$(HAVE_SSE2) += simd/v128_intrinsics_x86.h
DSP_SRCS-$(HAVE_SSE2) += simd/v256_intrinsics_x86.h
DSP_SRCS-$(HAVE_NEON) += simd/v64_intrinsics_arm.h
DSP_SRCS-$(HAVE_NEON) += simd/v128_intrinsics_arm.h
DSP_SRCS-$(HAVE_NEON) += simd/v256_intrinsics_arm.h
$(eval $(call rtcd_h_template,aom_dsp_rtcd,aom_dsp/aom_dsp_rtcd_defs.pl))

102
aom_dsp/aom_dsp_common.h Normal file

@@ -0,0 +1,102 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_AOM_DSP_COMMON_H_
#define AOM_DSP_AOM_DSP_COMMON_H_
#include "./aom_config.h"
#include "aom/aom_integer.h"
#include "aom_ports/mem.h"
#ifdef __cplusplus
extern "C" {
#endif
#ifndef MAX_SB_SIZE
#if CONFIG_AV1 && CONFIG_EXT_PARTITION
#define MAX_SB_SIZE 128
#else
#define MAX_SB_SIZE 64
#endif // CONFIG_AV1 && CONFIG_EXT_PARTITION
#endif // ndef MAX_SB_SIZE
#define AOMMIN(x, y) (((x) < (y)) ? (x) : (y))
#define AOMMAX(x, y) (((x) > (y)) ? (x) : (y))
#define IMPLIES(a, b) (!(a) || (b)) // Logical 'a implies b' (or 'a -> b')
#define IS_POWER_OF_TWO(x) (((x) & ((x)-1)) == 0)
// These can be used to give a hint about branch outcomes.
// This can have an effect, even if your target processor has a
// good branch predictor, as these hints can affect basic block
// ordering by the compiler.
#ifdef __GNUC__
#define LIKELY(v) __builtin_expect(v, 1)
#define UNLIKELY(v) __builtin_expect(v, 0)
#else
#define LIKELY(v) (v)
#define UNLIKELY(v) (v)
#endif
#define AOM_SWAP(type, a, b) \
do { \
type c = (b); \
b = a; \
a = c; \
} while (0)
#if CONFIG_AOM_QM
typedef uint16_t qm_val_t;
#define AOM_QM_BITS 6
#endif
#if CONFIG_AOM_HIGHBITDEPTH
// Note:
// tran_low_t is the datatype used for final transform coefficients.
// tran_high_t is the datatype used for intermediate transform stages.
typedef int64_t tran_high_t;
typedef int32_t tran_low_t;
#else
// Note:
// tran_low_t is the datatype used for final transform coefficients.
// tran_high_t is the datatype used for intermediate transform stages.
typedef int32_t tran_high_t;
typedef int16_t tran_low_t;
#endif // CONFIG_AOM_HIGHBITDEPTH
static INLINE uint8_t clip_pixel(int val) {
return (val > 255) ? 255 : (val < 0) ? 0 : val;
}
static INLINE int clamp(int value, int low, int high) {
return value < low ? low : (value > high ? high : value);
}
static INLINE double fclamp(double value, double low, double high) {
return value < low ? low : (value > high ? high : value);
}
#if CONFIG_AOM_HIGHBITDEPTH
static INLINE uint16_t clip_pixel_highbd(int val, int bd) {
switch (bd) {
case 8:
default: return (uint16_t)clamp(val, 0, 255);
case 10: return (uint16_t)clamp(val, 0, 1023);
case 12: return (uint16_t)clamp(val, 0, 4095);
}
}
#endif // CONFIG_AOM_HIGHBITDEPTH
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_AOM_DSP_COMMON_H_
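
Review note (illustrative sketch, not part of the commit): the clamping helpers in aom_dsp_common.h can be exercised in isolation. The block below copies clamp() and clip_pixel_highbd() from the header so it compiles without aom_config.h. The names clip_pixel_generic() and check_clip_helpers() are hypothetical and exist only for this note. It suggests that the switch is equivalent to clamping against (1 << bd) - 1 for the three supported bit depths, and that any other bd falls through to the 8-bit case.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of clamp() from aom_dsp_common.h. */
static int clamp(int value, int low, int high) {
  return value < low ? low : (value > high ? high : value);
}

/* Local copy of clip_pixel_highbd() from aom_dsp_common.h. */
static uint16_t clip_pixel_highbd(int val, int bd) {
  switch (bd) {
    case 8:
    default: return (uint16_t)clamp(val, 0, 255);
    case 10: return (uint16_t)clamp(val, 0, 1023);
    case 12: return (uint16_t)clamp(val, 0, 4095);
  }
}

/* Hypothetical alternative: derive the upper bound from the bit depth.
 * Only meant for bd in {8, 10, 12}; unlike the switch, it does not
 * fall back to 8-bit for other values. */
static uint16_t clip_pixel_generic(int val, int bd) {
  return (uint16_t)clamp(val, 0, (1 << bd) - 1);
}

/* Hypothetical self-check comparing the two forms. */
static void check_clip_helpers(void) {
  const int depths[3] = { 8, 10, 12 };
  const int samples[5] = { -5, 0, 300, 1200, 5000 };
  int i, j;
  for (i = 0; i < 3; ++i) {
    for (j = 0; j < 5; ++j) {
      assert(clip_pixel_highbd(samples[j], depths[i]) ==
             clip_pixel_generic(samples[j], depths[i]));
    }
  }
  /* An unsupported depth is treated as 8-bit by the switch. */
  assert(clip_pixel_highbd(300, 9) == 255);
}
```

Calling check_clip_helpers() from a test harness is enough to confirm the equivalence. The switch form in the commit has the advantage of a defined result for unexpected bit depths.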

16
aom_dsp/aom_dsp_rtcd.c Normal file

@@ -0,0 +1,16 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include "./aom_config.h"
#define RTCD_C
#include "./aom_dsp_rtcd.h"
#include "aom_ports/aom_once.h"
void aom_dsp_rtcd() { once(setup_rtcd_internal); }

1950
aom_dsp/aom_dsp_rtcd_defs.pl Normal file

File diff suppressed because it is too large

43
aom_dsp/aom_filter.h Normal file

@@ -0,0 +1,43 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_AOM_FILTER_H_
#define AOM_DSP_AOM_FILTER_H_
#include "aom/aom_integer.h"
#ifdef __cplusplus
extern "C" {
#endif
#define FILTER_BITS 7
#define SUBPEL_BITS 4
#define SUBPEL_MASK ((1 << SUBPEL_BITS) - 1)
#define SUBPEL_SHIFTS (1 << SUBPEL_BITS)
#define SUBPEL_TAPS 8
typedef int16_t InterpKernel[SUBPEL_TAPS];
#define BIL_SUBPEL_BITS 3
#define BIL_SUBPEL_SHIFTS (1 << BIL_SUBPEL_BITS)
// 2 tap bilinear filters
static const uint8_t bilinear_filters_2t[BIL_SUBPEL_SHIFTS][2] = {
{ 128, 0 }, { 112, 16 }, { 96, 32 }, { 80, 48 },
{ 64, 64 }, { 48, 80 }, { 32, 96 }, { 16, 112 },
};
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_AOM_FILTER_H_
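
Review note (illustrative sketch, not part of the commit): each row of bilinear_filters_2t sums to 128, which is 1 << FILTER_BITS, so a filtered sample can be normalized with a 7-bit right shift. The block below shows one way the table might be applied to a pair of neighboring pixels. The function name bilinear_2t() and the round-to-nearest step are assumptions made for this note; the header itself only defines the table and constants.

```c
#include <assert.h>
#include <stdint.h>

#define FILTER_BITS 7
#define BIL_SUBPEL_BITS 3
#define BIL_SUBPEL_SHIFTS (1 << BIL_SUBPEL_BITS)

/* Copy of the 2-tap table from aom_filter.h. */
static const uint8_t bilinear_filters_2t[BIL_SUBPEL_SHIFTS][2] = {
  { 128, 0 }, { 112, 16 }, { 96, 32 }, { 80, 48 },
  { 64, 64 }, { 48, 80 },  { 32, 96 }, { 16, 112 },
};

/* Hypothetical helper: blend pixels a and b at a 1/8-pel offset (0..7).
 * Adds half of the filter weight before shifting to round to nearest. */
static uint8_t bilinear_2t(uint8_t a, uint8_t b, int offset) {
  const uint8_t *f = bilinear_filters_2t[offset];
  const int sum = a * f[0] + b * f[1];
  return (uint8_t)((sum + (1 << (FILTER_BITS - 1))) >> FILTER_BITS);
}

/* Hypothetical self-check: every row of the table sums to 1 << FILTER_BITS,
 * and offset 0 returns the first pixel unchanged. */
static void check_bilinear_table(void) {
  int i;
  for (i = 0; i < BIL_SUBPEL_SHIFTS; ++i) {
    assert(bilinear_filters_2t[i][0] + bilinear_filters_2t[i][1] ==
           (1 << FILTER_BITS));
  }
  assert(bilinear_2t(100, 200, 0) == 100);
}
```

Because the weights always sum to 128, the result of bilinear_2t() stays within the range spanned by the two inputs, so no additional clipping is needed in this sketch.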

13
aom_dsp/aom_simd.c Normal file

@@ -0,0 +1,13 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
// Set to 1 to add some sanity checks in the fallback C code
const int simd_check = 1;

32
aom_dsp/aom_simd.h Normal file

@@ -0,0 +1,32 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_AOM_AOM_SIMD_H_
#define AOM_DSP_AOM_AOM_SIMD_H_
#include <stdint.h>
#if defined(_WIN32)
#include <intrin.h>
#endif
#include "./aom_config.h"
#include "./aom_simd_inline.h"
#if HAVE_NEON
#include "simd/v256_intrinsics_arm.h"
#elif HAVE_SSE2
#include "simd/v256_intrinsics_x86.h"
#else
#include "simd/v256_intrinsics.h"
#endif
#endif // AOM_DSP_AOM_AOM_SIMD_H_

21
aom_dsp/aom_simd_inline.h Normal file

@@ -0,0 +1,21 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_AOM_SIMD_INLINE_H_
#define AOM_DSP_AOM_SIMD_INLINE_H_
#include "aom/aom_integer.h"
#ifndef SIMD_INLINE
#define SIMD_INLINE static AOM_FORCE_INLINE
#endif
#endif // AOM_DSP_AOM_SIMD_INLINE_H_


@@ -1,31 +1,27 @@
/* /*
* Copyright (c) 2014 The WebM project authors. All Rights Reserved. * Copyright (c) 2016, Alliance for Open Media. All rights reserved
* *
* Use of this source code is governed by a BSD-style license * This source code is subject to the terms of the BSD 2 Clause License and
* that can be found in the LICENSE file in the root of the source * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* tree. An additional intellectual property rights grant can be found * was not distributed with this source code in the LICENSE file, you can
* in the file PATENTS. All contributing project authors may * obtain it at www.aomedia.org/license/software. If the Alliance for Open
* be found in the AUTHORS file in the root of the source tree. * Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/ */
#include <arm_neon.h> #include <arm_neon.h>
#include <assert.h> #include <assert.h>
#include "./vpx_config.h" #include "./aom_config.h"
#include "./vpx_dsp_rtcd.h" #include "./aom_dsp_rtcd.h"
#include "vpx/vpx_integer.h" #include "aom/aom_integer.h"
#include "vpx_ports/mem.h" #include "aom_ports/mem.h"
static INLINE int32x4_t MULTIPLY_BY_Q0( static INLINE int32x4_t MULTIPLY_BY_Q0(int16x4_t dsrc0, int16x4_t dsrc1,
int16x4_t dsrc0, int16x4_t dsrc2, int16x4_t dsrc3,
int16x4_t dsrc1, int16x4_t dsrc4, int16x4_t dsrc5,
int16x4_t dsrc2, int16x4_t dsrc6, int16x4_t dsrc7,
int16x4_t dsrc3, int16x8_t q0s16) {
int16x4_t dsrc4,
int16x4_t dsrc5,
int16x4_t dsrc6,
int16x4_t dsrc7,
int16x8_t q0s16) {
int32x4_t qdst; int32x4_t qdst;
int16x4_t d0s16, d1s16; int16x4_t d0s16, d1s16;
@@ -43,17 +39,12 @@ static INLINE int32x4_t MULTIPLY_BY_Q0(
return qdst; return qdst;
} }
void vpx_convolve8_avg_horiz_neon( void aom_convolve8_avg_horiz_neon(const uint8_t *src, ptrdiff_t src_stride,
const uint8_t *src, uint8_t *dst, ptrdiff_t dst_stride,
ptrdiff_t src_stride, const int16_t *filter_x, int x_step_q4,
uint8_t *dst, const int16_t *filter_y, // unused
ptrdiff_t dst_stride, int y_step_q4, // unused
const int16_t *filter_x, int w, int h) {
int x_step_q4,
const int16_t *filter_y, // unused
int y_step_q4, // unused
int w,
int h) {
int width; int width;
const uint8_t *s; const uint8_t *s;
uint8_t *d; uint8_t *d;
@@ -74,9 +65,13 @@ void vpx_convolve8_avg_horiz_neon(
assert(x_step_q4 == 16); assert(x_step_q4 == 16);
(void)x_step_q4;
(void)y_step_q4;
(void)filter_y;
q0s16 = vld1q_s16(filter_x); q0s16 = vld1q_s16(filter_x);
src -= 3; // adjust for taps src -= 3; // adjust for taps
for (; h > 0; h -= 4) { // loop_horiz_v for (; h > 0; h -= 4) { // loop_horiz_v
s = src; s = src;
d24u8 = vld1_u8(s); d24u8 = vld1_u8(s);
@@ -90,8 +85,8 @@ void vpx_convolve8_avg_horiz_neon(
q12u8 = vcombine_u8(d24u8, d25u8); q12u8 = vcombine_u8(d24u8, d25u8);
q13u8 = vcombine_u8(d26u8, d27u8); q13u8 = vcombine_u8(d26u8, d27u8);
q0x2u16 = vtrnq_u16(vreinterpretq_u16_u8(q12u8), q0x2u16 =
vreinterpretq_u16_u8(q13u8)); vtrnq_u16(vreinterpretq_u16_u8(q12u8), vreinterpretq_u16_u8(q13u8));
d24u8 = vreinterpret_u8_u16(vget_low_u16(q0x2u16.val[0])); d24u8 = vreinterpret_u8_u16(vget_low_u16(q0x2u16.val[0]));
d25u8 = vreinterpret_u8_u16(vget_high_u16(q0x2u16.val[0])); d25u8 = vreinterpret_u8_u16(vget_high_u16(q0x2u16.val[0]));
d26u8 = vreinterpret_u8_u16(vget_low_u16(q0x2u16.val[1])); d26u8 = vreinterpret_u8_u16(vget_low_u16(q0x2u16.val[1]));
@@ -116,10 +111,8 @@ void vpx_convolve8_avg_horiz_neon(
q9u16 = vcombine_u16(d17u16, d19u16); q9u16 = vcombine_u16(d17u16, d19u16);
d20s16 = vreinterpret_s16_u16(vget_low_u16(q10u16)); d20s16 = vreinterpret_s16_u16(vget_low_u16(q10u16));
d23s16 = vreinterpret_s16_u16(vget_high_u16(q10u16)); // vmov 23 21 d23s16 = vreinterpret_s16_u16(vget_high_u16(q10u16)); // vmov 23 21
for (width = w; for (width = w; width > 0; width -= 4, src += 4, dst += 4) { // loop_horiz
width > 0;
width -= 4, src += 4, dst += 4) { // loop_horiz
s = src; s = src;
d28u32 = vld1_dup_u32((const uint32_t *)s); d28u32 = vld1_dup_u32((const uint32_t *)s);
s += src_stride; s += src_stride;
@@ -131,10 +124,10 @@ void vpx_convolve8_avg_horiz_neon(
__builtin_prefetch(src + 64); __builtin_prefetch(src + 64);
d0x2u16 = vtrn_u16(vreinterpret_u16_u32(d28u32), d0x2u16 =
vreinterpret_u16_u32(d31u32)); vtrn_u16(vreinterpret_u16_u32(d28u32), vreinterpret_u16_u32(d31u32));
d1x2u16 = vtrn_u16(vreinterpret_u16_u32(d29u32), d1x2u16 =
vreinterpret_u16_u32(d30u32)); vtrn_u16(vreinterpret_u16_u32(d29u32), vreinterpret_u16_u32(d30u32));
d0x2u8 = vtrn_u8(vreinterpret_u8_u16(d0x2u16.val[0]), // d28 d0x2u8 = vtrn_u8(vreinterpret_u8_u16(d0x2u16.val[0]), // d28
vreinterpret_u8_u16(d1x2u16.val[0])); // d29 vreinterpret_u8_u16(d1x2u16.val[0])); // d29
d1x2u8 = vtrn_u8(vreinterpret_u8_u16(d0x2u16.val[1]), // d31 d1x2u8 = vtrn_u8(vreinterpret_u8_u16(d0x2u16.val[1]), // d31
@@ -144,8 +137,8 @@ void vpx_convolve8_avg_horiz_neon(
q14u8 = vcombine_u8(d0x2u8.val[0], d0x2u8.val[1]); q14u8 = vcombine_u8(d0x2u8.val[0], d0x2u8.val[1]);
q15u8 = vcombine_u8(d1x2u8.val[1], d1x2u8.val[0]); q15u8 = vcombine_u8(d1x2u8.val[1], d1x2u8.val[0]);
q0x2u32 = vtrnq_u32(vreinterpretq_u32_u8(q14u8), q0x2u32 =
vreinterpretq_u32_u8(q15u8)); vtrnq_u32(vreinterpretq_u32_u8(q14u8), vreinterpretq_u32_u8(q15u8));
d28u8 = vreinterpret_u8_u32(vget_low_u32(q0x2u32.val[0])); d28u8 = vreinterpret_u8_u32(vget_low_u32(q0x2u32.val[0]));
d29u8 = vreinterpret_u8_u32(vget_high_u32(q0x2u32.val[0])); d29u8 = vreinterpret_u8_u32(vget_high_u32(q0x2u32.val[0]));
@@ -173,14 +166,14 @@ void vpx_convolve8_avg_horiz_neon(
d26s16 = vreinterpret_s16_u16(vget_low_u16(q13u16)); d26s16 = vreinterpret_s16_u16(vget_low_u16(q13u16));
d27s16 = vreinterpret_s16_u16(vget_high_u16(q13u16)); d27s16 = vreinterpret_s16_u16(vget_high_u16(q13u16));
q1s32 = MULTIPLY_BY_Q0(d16s16, d17s16, d20s16, d22s16, q1s32 = MULTIPLY_BY_Q0(d16s16, d17s16, d20s16, d22s16, d18s16, d19s16,
d18s16, d19s16, d23s16, d24s16, q0s16); d23s16, d24s16, q0s16);
q2s32 = MULTIPLY_BY_Q0(d17s16, d20s16, d22s16, d18s16, q2s32 = MULTIPLY_BY_Q0(d17s16, d20s16, d22s16, d18s16, d19s16, d23s16,
d19s16, d23s16, d24s16, d26s16, q0s16); d24s16, d26s16, q0s16);
q14s32 = MULTIPLY_BY_Q0(d20s16, d22s16, d18s16, d19s16, q14s32 = MULTIPLY_BY_Q0(d20s16, d22s16, d18s16, d19s16, d23s16, d24s16,
d23s16, d24s16, d26s16, d27s16, q0s16); d26s16, d27s16, q0s16);
q15s32 = MULTIPLY_BY_Q0(d22s16, d18s16, d19s16, d23s16, q15s32 = MULTIPLY_BY_Q0(d22s16, d18s16, d19s16, d23s16, d24s16, d26s16,
d24s16, d26s16, d27s16, d25s16, q0s16); d27s16, d25s16, q0s16);
__builtin_prefetch(src + 64 + src_stride * 3); __builtin_prefetch(src + 64 + src_stride * 3);
@@ -195,8 +188,7 @@ void vpx_convolve8_avg_horiz_neon(
d2u8 = vqmovn_u16(q1u16); d2u8 = vqmovn_u16(q1u16);
d3u8 = vqmovn_u16(q2u16); d3u8 = vqmovn_u16(q2u16);
d0x2u16 = vtrn_u16(vreinterpret_u16_u8(d2u8), d0x2u16 = vtrn_u16(vreinterpret_u16_u8(d2u8), vreinterpret_u16_u8(d3u8));
vreinterpret_u16_u8(d3u8));
d0x2u32 = vtrn_u32(vreinterpret_u32_u16(d0x2u16.val[0]), d0x2u32 = vtrn_u32(vreinterpret_u32_u16(d0x2u16.val[0]),
vreinterpret_u32_u16(d0x2u16.val[1])); vreinterpret_u32_u16(d0x2u16.val[1]));
d0x2u8 = vtrn_u8(vreinterpret_u8_u32(d0x2u32.val[0]), d0x2u8 = vtrn_u8(vreinterpret_u8_u32(d0x2u32.val[0]),
@@ -231,17 +223,12 @@ void vpx_convolve8_avg_horiz_neon(
return; return;
} }
void vpx_convolve8_avg_vert_neon( void aom_convolve8_avg_vert_neon(const uint8_t *src, ptrdiff_t src_stride,
const uint8_t *src, uint8_t *dst, ptrdiff_t dst_stride,
ptrdiff_t src_stride, const int16_t *filter_x, // unused
uint8_t *dst, int x_step_q4, // unused
ptrdiff_t dst_stride, const int16_t *filter_y, int y_step_q4, int w,
const int16_t *filter_x, // unused int h) {
int x_step_q4, // unused
const int16_t *filter_y,
int y_step_q4,
int w,
int h) {
int height; int height;
const uint8_t *s; const uint8_t *s;
uint8_t *d; uint8_t *d;
@@ -258,6 +245,10 @@ void vpx_convolve8_avg_vert_neon(
assert(y_step_q4 == 16); assert(y_step_q4 == 16);
(void)x_step_q4;
(void)y_step_q4;
(void)filter_x;
src -= src_stride * 3; src -= src_stride * 3;
q0s16 = vld1q_s16(filter_y); q0s16 = vld1q_s16(filter_y);
for (; w > 0; w -= 4, src += 4, dst += 4) { // loop_vert_h for (; w > 0; w -= 4, src += 4, dst += 4) { // loop_vert_h
@@ -277,8 +268,8 @@ void vpx_convolve8_avg_vert_neon(
d22u32 = vld1_lane_u32((const uint32_t *)s, d22u32, 0); d22u32 = vld1_lane_u32((const uint32_t *)s, d22u32, 0);
s += src_stride; s += src_stride;
q8u16 = vmovl_u8(vreinterpret_u8_u32(d16u32)); q8u16 = vmovl_u8(vreinterpret_u8_u32(d16u32));
q9u16 = vmovl_u8(vreinterpret_u8_u32(d18u32)); q9u16 = vmovl_u8(vreinterpret_u8_u32(d18u32));
q10u16 = vmovl_u8(vreinterpret_u8_u32(d20u32)); q10u16 = vmovl_u8(vreinterpret_u8_u32(d20u32));
q11u16 = vmovl_u8(vreinterpret_u8_u32(d22u32)); q11u16 = vmovl_u8(vreinterpret_u8_u32(d22u32));
@@ -319,20 +310,20 @@ void vpx_convolve8_avg_vert_neon(
__builtin_prefetch(s); __builtin_prefetch(s);
__builtin_prefetch(s + src_stride); __builtin_prefetch(s + src_stride);
q1s32 = MULTIPLY_BY_Q0(d16s16, d17s16, d18s16, d19s16, q1s32 = MULTIPLY_BY_Q0(d16s16, d17s16, d18s16, d19s16, d20s16, d21s16,
d20s16, d21s16, d22s16, d24s16, q0s16); d22s16, d24s16, q0s16);
__builtin_prefetch(s + src_stride * 2); __builtin_prefetch(s + src_stride * 2);
__builtin_prefetch(s + src_stride * 3); __builtin_prefetch(s + src_stride * 3);
q2s32 = MULTIPLY_BY_Q0(d17s16, d18s16, d19s16, d20s16, q2s32 = MULTIPLY_BY_Q0(d17s16, d18s16, d19s16, d20s16, d21s16, d22s16,
d21s16, d22s16, d24s16, d26s16, q0s16); d24s16, d26s16, q0s16);
__builtin_prefetch(d); __builtin_prefetch(d);
__builtin_prefetch(d + dst_stride); __builtin_prefetch(d + dst_stride);
q14s32 = MULTIPLY_BY_Q0(d18s16, d19s16, d20s16, d21s16, q14s32 = MULTIPLY_BY_Q0(d18s16, d19s16, d20s16, d21s16, d22s16, d24s16,
d22s16, d24s16, d26s16, d27s16, q0s16); d26s16, d27s16, q0s16);
__builtin_prefetch(d + dst_stride * 2); __builtin_prefetch(d + dst_stride * 2);
__builtin_prefetch(d + dst_stride * 3); __builtin_prefetch(d + dst_stride * 3);
q15s32 = MULTIPLY_BY_Q0(d19s16, d20s16, d21s16, d22s16, q15s32 = MULTIPLY_BY_Q0(d19s16, d20s16, d21s16, d22s16, d24s16, d26s16,
d24s16, d26s16, d27s16, d25s16, q0s16); d27s16, d25s16, q0s16);
d2u16 = vqrshrun_n_s32(q1s32, 7); d2u16 = vqrshrun_n_s32(q1s32, 7);
d3u16 = vqrshrun_n_s32(q2s32, 7); d3u16 = vqrshrun_n_s32(q2s32, 7);


@@ -1,11 +1,14 @@
; ;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license ; This source code is subject to the terms of the BSD 2 Clause License and
; that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. An additional intellectual property rights grant can be found ; was not distributed with this source code in the LICENSE file, you can
; in the file PATENTS. All contributing project authors may ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; be found in the AUTHORS file in the root of the source tree. ; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
; ;
@@ -14,11 +17,11 @@
; w%4 == 0 ; w%4 == 0
; h%4 == 0 ; h%4 == 0
; taps == 8 ; taps == 8
; VP9_FILTER_WEIGHT == 128 ; AV1_FILTER_WEIGHT == 128
; VP9_FILTER_SHIFT == 7 ; AV1_FILTER_SHIFT == 7
EXPORT |vpx_convolve8_avg_horiz_neon| EXPORT |aom_convolve8_avg_horiz_neon|
EXPORT |vpx_convolve8_avg_vert_neon| EXPORT |aom_convolve8_avg_vert_neon|
ARM ARM
REQUIRE8 REQUIRE8
PRESERVE8 PRESERVE8
@@ -49,7 +52,7 @@
; sp[]int w ; sp[]int w
; sp[]int h ; sp[]int h
|vpx_convolve8_avg_horiz_neon| PROC |aom_convolve8_avg_horiz_neon| PROC
push {r4-r10, lr} push {r4-r10, lr}
sub r0, r0, #3 ; adjust for taps sub r0, r0, #3 ; adjust for taps
@@ -72,7 +75,7 @@
mov r10, r6 ; w loop counter mov r10, r6 ; w loop counter
vpx_convolve8_avg_loop_horiz_v aom_convolve8_avg_loop_horiz_v
vld1.8 {d24}, [r0], r1 vld1.8 {d24}, [r0], r1
vld1.8 {d25}, [r0], r1 vld1.8 {d25}, [r0], r1
vld1.8 {d26}, [r0], r1 vld1.8 {d26}, [r0], r1
@@ -95,7 +98,7 @@ vpx_convolve8_avg_loop_horiz_v
add r0, r0, #3 add r0, r0, #3
vpx_convolve8_avg_loop_horiz aom_convolve8_avg_loop_horiz
add r5, r0, #64 add r5, r0, #64
vld1.32 {d28[]}, [r0], r1 vld1.32 {d28[]}, [r0], r1
@@ -164,20 +167,20 @@ vpx_convolve8_avg_loop_horiz
vmov q9, q13 vmov q9, q13
subs r6, r6, #4 ; w -= 4 subs r6, r6, #4 ; w -= 4
bgt vpx_convolve8_avg_loop_horiz bgt aom_convolve8_avg_loop_horiz
; outer loop ; outer loop
mov r6, r10 ; restore w counter mov r6, r10 ; restore w counter
add r0, r0, r9 ; src += src_stride * 4 - w add r0, r0, r9 ; src += src_stride * 4 - w
add r2, r2, r12 ; dst += dst_stride * 4 - w add r2, r2, r12 ; dst += dst_stride * 4 - w
subs r7, r7, #4 ; h -= 4 subs r7, r7, #4 ; h -= 4
bgt vpx_convolve8_avg_loop_horiz_v bgt aom_convolve8_avg_loop_horiz_v
pop {r4-r10, pc} pop {r4-r10, pc}
ENDP ENDP
|vpx_convolve8_avg_vert_neon| PROC |aom_convolve8_avg_vert_neon| PROC
push {r4-r8, lr} push {r4-r8, lr}
; adjust for taps ; adjust for taps
@@ -193,7 +196,7 @@ vpx_convolve8_avg_loop_horiz
lsl r1, r1, #1 lsl r1, r1, #1
lsl r3, r3, #1 lsl r3, r3, #1
vpx_convolve8_avg_loop_vert_h aom_convolve8_avg_loop_vert_h
mov r4, r0 mov r4, r0
add r7, r0, r1, asr #1 add r7, r0, r1, asr #1
mov r5, r2 mov r5, r2
@@ -213,7 +216,7 @@ vpx_convolve8_avg_loop_vert_h
vmovl.u8 q10, d20 vmovl.u8 q10, d20
vmovl.u8 q11, d22 vmovl.u8 q11, d22
vpx_convolve8_avg_loop_vert aom_convolve8_avg_loop_vert
; always process a 4x4 block at a time ; always process a 4x4 block at a time
vld1.u32 {d24[0]}, [r7], r1 vld1.u32 {d24[0]}, [r7], r1
vld1.u32 {d26[0]}, [r4], r1 vld1.u32 {d26[0]}, [r4], r1
@@ -278,13 +281,13 @@ vpx_convolve8_avg_loop_vert
vmov d22, d25 vmov d22, d25
subs r12, r12, #4 ; h -= 4 subs r12, r12, #4 ; h -= 4
bgt vpx_convolve8_avg_loop_vert bgt aom_convolve8_avg_loop_vert
; outer loop ; outer loop
add r0, r0, #4 add r0, r0, #4
add r2, r2, #4 add r2, r2, #4
subs r6, r6, #4 ; w -= 4 subs r6, r6, #4 ; w -= 4
bgt vpx_convolve8_avg_loop_vert_h bgt aom_convolve8_avg_loop_vert_h
pop {r4-r8, pc} pop {r4-r8, pc}


@@ -1,31 +1,27 @@
/* /*
* Copyright (c) 2014 The WebM project authors. All Rights Reserved. * Copyright (c) 2016, Alliance for Open Media. All rights reserved
* *
* Use of this source code is governed by a BSD-style license * This source code is subject to the terms of the BSD 2 Clause License and
* that can be found in the LICENSE file in the root of the source * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* tree. An additional intellectual property rights grant can be found * was not distributed with this source code in the LICENSE file, you can
* in the file PATENTS. All contributing project authors may * obtain it at www.aomedia.org/license/software. If the Alliance for Open
* be found in the AUTHORS file in the root of the source tree. * Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/ */
#include <arm_neon.h> #include <arm_neon.h>
#include <assert.h> #include <assert.h>
#include "./vpx_config.h" #include "./aom_config.h"
#include "./vpx_dsp_rtcd.h" #include "./aom_dsp_rtcd.h"
#include "vpx/vpx_integer.h" #include "aom/aom_integer.h"
#include "vpx_ports/mem.h" #include "aom_ports/mem.h"
static INLINE int32x4_t MULTIPLY_BY_Q0( static INLINE int32x4_t MULTIPLY_BY_Q0(int16x4_t dsrc0, int16x4_t dsrc1,
int16x4_t dsrc0, int16x4_t dsrc2, int16x4_t dsrc3,
int16x4_t dsrc1, int16x4_t dsrc4, int16x4_t dsrc5,
int16x4_t dsrc2, int16x4_t dsrc6, int16x4_t dsrc7,
int16x4_t dsrc3, int16x8_t q0s16) {
int16x4_t dsrc4,
int16x4_t dsrc5,
int16x4_t dsrc6,
int16x4_t dsrc7,
int16x8_t q0s16) {
int32x4_t qdst; int32x4_t qdst;
int16x4_t d0s16, d1s16; int16x4_t d0s16, d1s16;
@@ -43,17 +39,12 @@ static INLINE int32x4_t MULTIPLY_BY_Q0(
return qdst; return qdst;
} }
void vpx_convolve8_horiz_neon( void aom_convolve8_horiz_neon(const uint8_t *src, ptrdiff_t src_stride,
const uint8_t *src, uint8_t *dst, ptrdiff_t dst_stride,
ptrdiff_t src_stride, const int16_t *filter_x, int x_step_q4,
uint8_t *dst, const int16_t *filter_y, // unused
ptrdiff_t dst_stride, int y_step_q4, // unused
const int16_t *filter_x, int w, int h) {
int x_step_q4,
const int16_t *filter_y, // unused
int y_step_q4, // unused
int w,
int h) {
int width; int width;
const uint8_t *s, *psrc; const uint8_t *s, *psrc;
uint8_t *d, *pdst; uint8_t *d, *pdst;
@@ -74,12 +65,15 @@ void vpx_convolve8_horiz_neon(
assert(x_step_q4 == 16); assert(x_step_q4 == 16);
(void)x_step_q4;
(void)y_step_q4;
(void)filter_y;
q0s16 = vld1q_s16(filter_x); q0s16 = vld1q_s16(filter_x);
src -= 3; // adjust for taps src -= 3; // adjust for taps
for (; h > 0; h -= 4, for (; h > 0; h -= 4, src += src_stride * 4,
src += src_stride * 4, dst += dst_stride * 4) { // loop_horiz_v
dst += dst_stride * 4) { // loop_horiz_v
s = src; s = src;
d24u8 = vld1_u8(s); d24u8 = vld1_u8(s);
s += src_stride; s += src_stride;
@@ -92,8 +86,8 @@ void vpx_convolve8_horiz_neon(
q12u8 = vcombine_u8(d24u8, d25u8); q12u8 = vcombine_u8(d24u8, d25u8);
q13u8 = vcombine_u8(d26u8, d27u8); q13u8 = vcombine_u8(d26u8, d27u8);
q0x2u16 = vtrnq_u16(vreinterpretq_u16_u8(q12u8), q0x2u16 =
vreinterpretq_u16_u8(q13u8)); vtrnq_u16(vreinterpretq_u16_u8(q12u8), vreinterpretq_u16_u8(q13u8));
d24u8 = vreinterpret_u8_u16(vget_low_u16(q0x2u16.val[0])); d24u8 = vreinterpret_u8_u16(vget_low_u16(q0x2u16.val[0]));
d25u8 = vreinterpret_u8_u16(vget_high_u16(q0x2u16.val[0])); d25u8 = vreinterpret_u8_u16(vget_high_u16(q0x2u16.val[0]));
d26u8 = vreinterpret_u8_u16(vget_low_u16(q0x2u16.val[1])); d26u8 = vreinterpret_u8_u16(vget_low_u16(q0x2u16.val[1]));
@@ -105,8 +99,8 @@ void vpx_convolve8_horiz_neon(
__builtin_prefetch(src + src_stride * 5); __builtin_prefetch(src + src_stride * 5);
__builtin_prefetch(src + src_stride * 6); __builtin_prefetch(src + src_stride * 6);
q8u16 = vmovl_u8(d0x2u8.val[0]); q8u16 = vmovl_u8(d0x2u8.val[0]);
q9u16 = vmovl_u8(d0x2u8.val[1]); q9u16 = vmovl_u8(d0x2u8.val[1]);
q10u16 = vmovl_u8(d1x2u8.val[0]); q10u16 = vmovl_u8(d1x2u8.val[0]);
q11u16 = vmovl_u8(d1x2u8.val[1]); q11u16 = vmovl_u8(d1x2u8.val[1]);
@@ -119,8 +113,7 @@ void vpx_convolve8_horiz_neon(
d20s16 = vreinterpret_s16_u16(vget_low_u16(q10u16)); d20s16 = vreinterpret_s16_u16(vget_low_u16(q10u16));
d23s16 = vreinterpret_s16_u16(vget_high_u16(q10u16)); // vmov 23 21 d23s16 = vreinterpret_s16_u16(vget_high_u16(q10u16)); // vmov 23 21
for (width = w, psrc = src + 7, pdst = dst; for (width = w, psrc = src + 7, pdst = dst; width > 0;
width > 0;
width -= 4, psrc += 4, pdst += 4) { // loop_horiz width -= 4, psrc += 4, pdst += 4) { // loop_horiz
s = psrc; s = psrc;
d28u32 = vld1_dup_u32((const uint32_t *)s); d28u32 = vld1_dup_u32((const uint32_t *)s);
@@ -133,10 +126,10 @@ void vpx_convolve8_horiz_neon(
__builtin_prefetch(psrc + 64); __builtin_prefetch(psrc + 64);
d0x2u16 = vtrn_u16(vreinterpret_u16_u32(d28u32), d0x2u16 =
vreinterpret_u16_u32(d31u32)); vtrn_u16(vreinterpret_u16_u32(d28u32), vreinterpret_u16_u32(d31u32));
d1x2u16 = vtrn_u16(vreinterpret_u16_u32(d29u32), d1x2u16 =
vreinterpret_u16_u32(d30u32)); vtrn_u16(vreinterpret_u16_u32(d29u32), vreinterpret_u16_u32(d30u32));
d0x2u8 = vtrn_u8(vreinterpret_u8_u16(d0x2u16.val[0]), // d28 d0x2u8 = vtrn_u8(vreinterpret_u8_u16(d0x2u16.val[0]), // d28
vreinterpret_u8_u16(d1x2u16.val[0])); // d29 vreinterpret_u8_u16(d1x2u16.val[0])); // d29
d1x2u8 = vtrn_u8(vreinterpret_u8_u16(d0x2u16.val[1]), // d31 d1x2u8 = vtrn_u8(vreinterpret_u8_u16(d0x2u16.val[1]), // d31
@@ -146,8 +139,8 @@ void vpx_convolve8_horiz_neon(
q14u8 = vcombine_u8(d0x2u8.val[0], d0x2u8.val[1]); q14u8 = vcombine_u8(d0x2u8.val[0], d0x2u8.val[1]);
q15u8 = vcombine_u8(d1x2u8.val[1], d1x2u8.val[0]); q15u8 = vcombine_u8(d1x2u8.val[1], d1x2u8.val[0]);
q0x2u32 = vtrnq_u32(vreinterpretq_u32_u8(q14u8), q0x2u32 =
vreinterpretq_u32_u8(q15u8)); vtrnq_u32(vreinterpretq_u32_u8(q14u8), vreinterpretq_u32_u8(q15u8));
d28u8 = vreinterpret_u8_u32(vget_low_u32(q0x2u32.val[0])); d28u8 = vreinterpret_u8_u32(vget_low_u32(q0x2u32.val[0]));
d29u8 = vreinterpret_u8_u32(vget_high_u32(q0x2u32.val[0])); d29u8 = vreinterpret_u8_u32(vget_high_u32(q0x2u32.val[0]));
@@ -166,14 +159,14 @@ void vpx_convolve8_horiz_neon(
d26s16 = vreinterpret_s16_u16(vget_low_u16(q13u16)); d26s16 = vreinterpret_s16_u16(vget_low_u16(q13u16));
d27s16 = vreinterpret_s16_u16(vget_high_u16(q13u16)); d27s16 = vreinterpret_s16_u16(vget_high_u16(q13u16));
q1s32 = MULTIPLY_BY_Q0(d16s16, d17s16, d20s16, d22s16, q1s32 = MULTIPLY_BY_Q0(d16s16, d17s16, d20s16, d22s16, d18s16, d19s16,
d18s16, d19s16, d23s16, d24s16, q0s16); d23s16, d24s16, q0s16);
q2s32 = MULTIPLY_BY_Q0(d17s16, d20s16, d22s16, d18s16, q2s32 = MULTIPLY_BY_Q0(d17s16, d20s16, d22s16, d18s16, d19s16, d23s16,
d19s16, d23s16, d24s16, d26s16, q0s16); d24s16, d26s16, q0s16);
q14s32 = MULTIPLY_BY_Q0(d20s16, d22s16, d18s16, d19s16, q14s32 = MULTIPLY_BY_Q0(d20s16, d22s16, d18s16, d19s16, d23s16, d24s16,
d23s16, d24s16, d26s16, d27s16, q0s16); d26s16, d27s16, q0s16);
q15s32 = MULTIPLY_BY_Q0(d22s16, d18s16, d19s16, d23s16, q15s32 = MULTIPLY_BY_Q0(d22s16, d18s16, d19s16, d23s16, d24s16, d26s16,
d24s16, d26s16, d27s16, d25s16, q0s16); d27s16, d25s16, q0s16);
__builtin_prefetch(psrc + 60 + src_stride * 3); __builtin_prefetch(psrc + 60 + src_stride * 3);
@@ -188,8 +181,7 @@ void vpx_convolve8_horiz_neon(
d2u8 = vqmovn_u16(q1u16); d2u8 = vqmovn_u16(q1u16);
d3u8 = vqmovn_u16(q2u16); d3u8 = vqmovn_u16(q2u16);
d0x2u16 = vtrn_u16(vreinterpret_u16_u8(d2u8), d0x2u16 = vtrn_u16(vreinterpret_u16_u8(d2u8), vreinterpret_u16_u8(d3u8));
vreinterpret_u16_u8(d3u8));
d0x2u32 = vtrn_u32(vreinterpret_u32_u16(d0x2u16.val[0]), d0x2u32 = vtrn_u32(vreinterpret_u32_u16(d0x2u16.val[0]),
vreinterpret_u32_u16(d0x2u16.val[1])); vreinterpret_u32_u16(d0x2u16.val[1]));
d0x2u8 = vtrn_u8(vreinterpret_u8_u32(d0x2u32.val[0]), d0x2u8 = vtrn_u8(vreinterpret_u8_u32(d0x2u32.val[0]),
@@ -217,17 +209,12 @@ void vpx_convolve8_horiz_neon(
return; return;
} }
void vpx_convolve8_vert_neon( void aom_convolve8_vert_neon(const uint8_t *src, ptrdiff_t src_stride,
const uint8_t *src, uint8_t *dst, ptrdiff_t dst_stride,
ptrdiff_t src_stride, const int16_t *filter_x, // unused
uint8_t *dst, int x_step_q4, // unused
ptrdiff_t dst_stride, const int16_t *filter_y, int y_step_q4, int w,
const int16_t *filter_x, // unused int h) {
int x_step_q4, // unused
const int16_t *filter_y,
int y_step_q4,
int w,
int h) {
int height; int height;
const uint8_t *s; const uint8_t *s;
uint8_t *d; uint8_t *d;
@@ -242,6 +229,10 @@ void vpx_convolve8_vert_neon(
assert(y_step_q4 == 16); assert(y_step_q4 == 16);
(void)x_step_q4;
(void)y_step_q4;
(void)filter_x;
src -= src_stride * 3; src -= src_stride * 3;
q0s16 = vld1q_s16(filter_y); q0s16 = vld1q_s16(filter_y);
for (; w > 0; w -= 4, src += 4, dst += 4) { // loop_vert_h for (; w > 0; w -= 4, src += 4, dst += 4) { // loop_vert_h
@@ -261,8 +252,8 @@ void vpx_convolve8_vert_neon(
d22u32 = vld1_lane_u32((const uint32_t *)s, d22u32, 0); d22u32 = vld1_lane_u32((const uint32_t *)s, d22u32, 0);
s += src_stride; s += src_stride;
q8u16 = vmovl_u8(vreinterpret_u8_u32(d16u32)); q8u16 = vmovl_u8(vreinterpret_u8_u32(d16u32));
q9u16 = vmovl_u8(vreinterpret_u8_u32(d18u32)); q9u16 = vmovl_u8(vreinterpret_u8_u32(d18u32));
q10u16 = vmovl_u8(vreinterpret_u8_u32(d20u32)); q10u16 = vmovl_u8(vreinterpret_u8_u32(d20u32));
q11u16 = vmovl_u8(vreinterpret_u8_u32(d22u32)); q11u16 = vmovl_u8(vreinterpret_u8_u32(d22u32));
@@ -294,20 +285,20 @@ void vpx_convolve8_vert_neon(
__builtin_prefetch(d); __builtin_prefetch(d);
__builtin_prefetch(d + dst_stride); __builtin_prefetch(d + dst_stride);
q1s32 = MULTIPLY_BY_Q0(d16s16, d17s16, d18s16, d19s16, q1s32 = MULTIPLY_BY_Q0(d16s16, d17s16, d18s16, d19s16, d20s16, d21s16,
d20s16, d21s16, d22s16, d24s16, q0s16); d22s16, d24s16, q0s16);
__builtin_prefetch(d + dst_stride * 2); __builtin_prefetch(d + dst_stride * 2);
__builtin_prefetch(d + dst_stride * 3); __builtin_prefetch(d + dst_stride * 3);
q2s32 = MULTIPLY_BY_Q0(d17s16, d18s16, d19s16, d20s16, q2s32 = MULTIPLY_BY_Q0(d17s16, d18s16, d19s16, d20s16, d21s16, d22s16,
d21s16, d22s16, d24s16, d26s16, q0s16); d24s16, d26s16, q0s16);
__builtin_prefetch(s); __builtin_prefetch(s);
__builtin_prefetch(s + src_stride); __builtin_prefetch(s + src_stride);
q14s32 = MULTIPLY_BY_Q0(d18s16, d19s16, d20s16, d21s16, q14s32 = MULTIPLY_BY_Q0(d18s16, d19s16, d20s16, d21s16, d22s16, d24s16,
d22s16, d24s16, d26s16, d27s16, q0s16); d26s16, d27s16, q0s16);
__builtin_prefetch(s + src_stride * 2); __builtin_prefetch(s + src_stride * 2);
__builtin_prefetch(s + src_stride * 3); __builtin_prefetch(s + src_stride * 3);
q15s32 = MULTIPLY_BY_Q0(d19s16, d20s16, d21s16, d22s16, q15s32 = MULTIPLY_BY_Q0(d19s16, d20s16, d21s16, d22s16, d24s16, d26s16,
d24s16, d26s16, d27s16, d25s16, q0s16); d27s16, d25s16, q0s16);
d2u16 = vqrshrun_n_s32(q1s32, 7); d2u16 = vqrshrun_n_s32(q1s32, 7);
d3u16 = vqrshrun_n_s32(q2s32, 7); d3u16 = vqrshrun_n_s32(q2s32, 7);


@@ -1,11 +1,14 @@
; ;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license ; This source code is subject to the terms of the BSD 2 Clause License and
; that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. An additional intellectual property rights grant can be found ; was not distributed with this source code in the LICENSE file, you can
; in the file PATENTS. All contributing project authors may ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; be found in the AUTHORS file in the root of the source tree. ; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
; ;
@@ -14,11 +17,11 @@
; w%4 == 0 ; w%4 == 0
; h%4 == 0 ; h%4 == 0
; taps == 8 ; taps == 8
; VP9_FILTER_WEIGHT == 128 ; AV1_FILTER_WEIGHT == 128
; VP9_FILTER_SHIFT == 7 ; AV1_FILTER_SHIFT == 7
EXPORT |vpx_convolve8_horiz_neon| EXPORT |aom_convolve8_horiz_neon|
EXPORT |vpx_convolve8_vert_neon| EXPORT |aom_convolve8_vert_neon|
ARM ARM
REQUIRE8 REQUIRE8
PRESERVE8 PRESERVE8
@@ -49,7 +52,7 @@
; sp[]int w ; sp[]int w
; sp[]int h ; sp[]int h
|vpx_convolve8_horiz_neon| PROC |aom_convolve8_horiz_neon| PROC
push {r4-r10, lr} push {r4-r10, lr}
sub r0, r0, #3 ; adjust for taps sub r0, r0, #3 ; adjust for taps
@@ -72,7 +75,7 @@
mov r10, r6 ; w loop counter mov r10, r6 ; w loop counter
vpx_convolve8_loop_horiz_v aom_convolve8_loop_horiz_v
vld1.8 {d24}, [r0], r1 vld1.8 {d24}, [r0], r1
vld1.8 {d25}, [r0], r1 vld1.8 {d25}, [r0], r1
vld1.8 {d26}, [r0], r1 vld1.8 {d26}, [r0], r1
@@ -95,7 +98,7 @@ vpx_convolve8_loop_horiz_v
add r0, r0, #3 add r0, r0, #3
vpx_convolve8_loop_horiz aom_convolve8_loop_horiz
add r5, r0, #64 add r5, r0, #64
vld1.32 {d28[]}, [r0], r1 vld1.32 {d28[]}, [r0], r1
@@ -153,20 +156,20 @@ vpx_convolve8_loop_horiz
vmov q9, q13 vmov q9, q13
subs r6, r6, #4 ; w -= 4 subs r6, r6, #4 ; w -= 4
bgt vpx_convolve8_loop_horiz bgt aom_convolve8_loop_horiz
; outer loop ; outer loop
mov r6, r10 ; restore w counter mov r6, r10 ; restore w counter
add r0, r0, r9 ; src += src_stride * 4 - w add r0, r0, r9 ; src += src_stride * 4 - w
add r2, r2, r12 ; dst += dst_stride * 4 - w add r2, r2, r12 ; dst += dst_stride * 4 - w
subs r7, r7, #4 ; h -= 4 subs r7, r7, #4 ; h -= 4
bgt vpx_convolve8_loop_horiz_v bgt aom_convolve8_loop_horiz_v
pop {r4-r10, pc} pop {r4-r10, pc}
ENDP ENDP
|vpx_convolve8_vert_neon| PROC |aom_convolve8_vert_neon| PROC
push {r4-r8, lr} push {r4-r8, lr}
; adjust for taps ; adjust for taps
@@ -182,7 +185,7 @@ vpx_convolve8_loop_horiz
lsl r1, r1, #1 lsl r1, r1, #1
lsl r3, r3, #1 lsl r3, r3, #1
vpx_convolve8_loop_vert_h aom_convolve8_loop_vert_h
mov r4, r0 mov r4, r0
add r7, r0, r1, asr #1 add r7, r0, r1, asr #1
mov r5, r2 mov r5, r2
@@ -202,7 +205,7 @@ vpx_convolve8_loop_vert_h
vmovl.u8 q10, d20 vmovl.u8 q10, d20
vmovl.u8 q11, d22 vmovl.u8 q11, d22
vpx_convolve8_loop_vert aom_convolve8_loop_vert
; always process a 4x4 block at a time ; always process a 4x4 block at a time
vld1.u32 {d24[0]}, [r7], r1 vld1.u32 {d24[0]}, [r7], r1
vld1.u32 {d26[0]}, [r4], r1 vld1.u32 {d26[0]}, [r4], r1
@@ -256,13 +259,13 @@ vpx_convolve8_loop_vert
vmov d22, d25 vmov d22, d25
subs r12, r12, #4 ; h -= 4 subs r12, r12, #4 ; h -= 4
bgt vpx_convolve8_loop_vert bgt aom_convolve8_loop_vert
; outer loop ; outer loop
add r0, r0, #4 add r0, r0, #4
add r2, r2, #4 add r2, r2, #4
subs r6, r6, #4 ; w -= 4 subs r6, r6, #4 ; w -= 4
bgt vpx_convolve8_loop_vert_h bgt aom_convolve8_loop_vert_h
pop {r4-r8, pc} pop {r4-r8, pc}


@@ -1,46 +1,45 @@
/* /*
* Copyright (c) 2014 The WebM project authors. All Rights Reserved. * Copyright (c) 2016, Alliance for Open Media. All rights reserved
* *
* Use of this source code is governed by a BSD-style license * This source code is subject to the terms of the BSD 2 Clause License and
* that can be found in the LICENSE file in the root of the source * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* tree. An additional intellectual property rights grant can be found * was not distributed with this source code in the LICENSE file, you can
* in the file PATENTS. All contributing project authors may * obtain it at www.aomedia.org/license/software. If the Alliance for Open
* be found in the AUTHORS file in the root of the source tree. * Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/ */
#include <arm_neon.h> #include <arm_neon.h>
#include "./vpx_dsp_rtcd.h" #include "./aom_dsp_rtcd.h"
#include "vpx/vpx_integer.h" #include "aom/aom_integer.h"
void vpx_convolve_avg_neon( void aom_convolve_avg_neon(const uint8_t *src, // r0
const uint8_t *src, // r0 ptrdiff_t src_stride, // r1
ptrdiff_t src_stride, // r1 uint8_t *dst, // r2
uint8_t *dst, // r2 ptrdiff_t dst_stride, // r3
ptrdiff_t dst_stride, // r3 const int16_t *filter_x, int filter_x_stride,
const int16_t *filter_x, const int16_t *filter_y, int filter_y_stride, int w,
int filter_x_stride, int h) {
const int16_t *filter_y,
int filter_y_stride,
int w,
int h) {
uint8_t *d; uint8_t *d;
uint8x8_t d0u8, d1u8, d2u8, d3u8; uint8x8_t d0u8, d1u8, d2u8, d3u8;
uint32x2_t d0u32, d2u32; uint32x2_t d0u32, d2u32;
uint8x16_t q0u8, q1u8, q2u8, q3u8, q8u8, q9u8, q10u8, q11u8; uint8x16_t q0u8, q1u8, q2u8, q3u8, q8u8, q9u8, q10u8, q11u8;
(void)filter_x; (void)filter_x_stride; (void)filter_x;
(void)filter_y; (void)filter_y_stride; (void)filter_x_stride;
(void)filter_y;
(void)filter_y_stride;
d = dst; d = dst;
if (w > 32) { // avg64 if (w > 32) { // avg64
for (; h > 0; h -= 1) { for (; h > 0; h -= 1) {
q0u8 = vld1q_u8(src); q0u8 = vld1q_u8(src);
q1u8 = vld1q_u8(src + 16); q1u8 = vld1q_u8(src + 16);
q2u8 = vld1q_u8(src + 32); q2u8 = vld1q_u8(src + 32);
q3u8 = vld1q_u8(src + 48); q3u8 = vld1q_u8(src + 48);
src += src_stride; src += src_stride;
q8u8 = vld1q_u8(d); q8u8 = vld1q_u8(d);
q9u8 = vld1q_u8(d + 16); q9u8 = vld1q_u8(d + 16);
q10u8 = vld1q_u8(d + 32); q10u8 = vld1q_u8(d + 32);
q11u8 = vld1q_u8(d + 48); q11u8 = vld1q_u8(d + 48);
d += dst_stride; d += dst_stride;
@@ -133,8 +132,7 @@ void vpx_convolve_avg_neon(
d2u32 = vld1_lane_u32((const uint32_t *)d, d2u32, 1); d2u32 = vld1_lane_u32((const uint32_t *)d, d2u32, 1);
d += dst_stride; d += dst_stride;
d0u8 = vrhadd_u8(vreinterpret_u8_u32(d0u32), d0u8 = vrhadd_u8(vreinterpret_u8_u32(d0u32), vreinterpret_u8_u32(d2u32));
vreinterpret_u8_u32(d2u32));
d0u32 = vreinterpret_u32_u8(d0u8); d0u32 = vreinterpret_u32_u8(d0u8);
vst1_lane_u32((uint32_t *)dst, d0u32, 0); vst1_lane_u32((uint32_t *)dst, d0u32, 0);
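Review note: for readers less familiar with NEON, `vrhadd_u8` in the hunk above is a rounding halving add, so the averaging path amounts to `dst = (dst + src + 1) >> 1` per pixel. Below is a minimal scalar sketch of that behaviour, assuming 8-bit pixels. The name `convolve_avg_scalar` is hypothetical and is not part of this change.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference (not in the patch): average src into dst
 * with rounding, which is what vrhadd_u8 computes lane by lane. */
static void convolve_avg_scalar(const uint8_t *src, ptrdiff_t src_stride,
                                uint8_t *dst, ptrdiff_t dst_stride, int w,
                                int h) {
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      /* Operands are promoted to int, so the sum cannot overflow. */
      dst[x] = (uint8_t)((dst[x] + src[x] + 1) >> 1);
    }
    src += src_stride;
    dst += dst_stride;
  }
}
```

A scalar version like this can serve as a comparison target when checking the renamed NEON entry point against known inputs.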


@@ -1,21 +1,24 @@
; ;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license ; This source code is subject to the terms of the BSD 2 Clause License and
; that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. An additional intellectual property rights grant can be found ; was not distributed with this source code in the LICENSE file, you can
; in the file PATENTS. All contributing project authors may ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; be found in the AUTHORS file in the root of the source tree. ; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
; ;
EXPORT |vpx_convolve_avg_neon| ;
EXPORT |aom_convolve_avg_neon|
ARM ARM
REQUIRE8 REQUIRE8
PRESERVE8 PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2 AREA ||.text||, CODE, READONLY, ALIGN=2
|vpx_convolve_avg_neon| PROC |aom_convolve_avg_neon| PROC
push {r4-r6, lr} push {r4-r6, lr}
ldrd r4, r5, [sp, #32] ldrd r4, r5, [sp, #32]
mov r6, r2 mov r6, r2


@@ -1,33 +1,32 @@
/* /*
* Copyright (c) 2014 The WebM project authors. All Rights Reserved. * Copyright (c) 2016, Alliance for Open Media. All rights reserved
* *
* Use of this source code is governed by a BSD-style license * This source code is subject to the terms of the BSD 2 Clause License and
* that can be found in the LICENSE file in the root of the source * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* tree. An additional intellectual property rights grant can be found * was not distributed with this source code in the LICENSE file, you can
* in the file PATENTS. All contributing project authors may * obtain it at www.aomedia.org/license/software. If the Alliance for Open
* be found in the AUTHORS file in the root of the source tree. * Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/ */
#include <arm_neon.h> #include <arm_neon.h>
#include "./vpx_dsp_rtcd.h" #include "./aom_dsp_rtcd.h"
#include "vpx/vpx_integer.h" #include "aom/aom_integer.h"
void vpx_convolve_copy_neon( void aom_convolve_copy_neon(const uint8_t *src, // r0
const uint8_t *src, // r0 ptrdiff_t src_stride, // r1
ptrdiff_t src_stride, // r1 uint8_t *dst, // r2
uint8_t *dst, // r2 ptrdiff_t dst_stride, // r3
ptrdiff_t dst_stride, // r3 const int16_t *filter_x, int filter_x_stride,
const int16_t *filter_x, const int16_t *filter_y, int filter_y_stride, int w,
int filter_x_stride, int h) {
const int16_t *filter_y,
int filter_y_stride,
int w,
int h) {
uint8x8_t d0u8, d2u8; uint8x8_t d0u8, d2u8;
uint8x16_t q0u8, q1u8, q2u8, q3u8; uint8x16_t q0u8, q1u8, q2u8, q3u8;
(void)filter_x; (void)filter_x_stride; (void)filter_x;
(void)filter_y; (void)filter_y_stride; (void)filter_x_stride;
(void)filter_y;
(void)filter_y_stride;
if (w > 32) { // copy64 if (w > 32) { // copy64
for (; h > 0; h--) { for (; h > 0; h--) {


@@ -1,21 +1,24 @@
; ;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license ; This source code is subject to the terms of the BSD 2 Clause License and
; that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. An additional intellectual property rights grant can be found ; was not distributed with this source code in the LICENSE file, you can
; in the file PATENTS. All contributing project authors may ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; be found in the AUTHORS file in the root of the source tree. ; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
; ;
EXPORT |vpx_convolve_copy_neon| ;
EXPORT |aom_convolve_copy_neon|
ARM ARM
REQUIRE8 REQUIRE8
PRESERVE8 PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2 AREA ||.text||, CODE, READONLY, ALIGN=2
|vpx_convolve_copy_neon| PROC |aom_convolve_copy_neon| PROC
push {r4-r5, lr} push {r4-r5, lr}
ldrd r4, r5, [sp, #28] ldrd r4, r5, [sp, #28]


@@ -0,0 +1,66 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <assert.h>
#include "./aom_dsp_rtcd.h"
#include "aom_dsp/aom_dsp_common.h"
#include "aom_ports/mem.h"
void aom_convolve8_neon(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
ptrdiff_t dst_stride, const int16_t *filter_x,
int x_step_q4, const int16_t *filter_y, int y_step_q4,
int w, int h) {
/* Given our constraints: w <= 64, h <= 64, taps == 8 we can reduce the
* maximum buffer size to 64 * 64 + 7 (+ 1 to make it divisible by 4).
*/
DECLARE_ALIGNED(8, uint8_t, temp[64 * 72]);
// Account for the vertical phase needing 3 lines prior and 4 lines post
int intermediate_height = h + 7;
assert(y_step_q4 == 16);
assert(x_step_q4 == 16);
/* Filter starting 3 lines back. The neon implementation will ignore the
* given height and filter a multiple of 4 lines. Since this goes in to
* the temp buffer which has lots of extra room and is subsequently discarded
* this is safe if somewhat less than ideal.
*/
aom_convolve8_horiz_neon(src - src_stride * 3, src_stride, temp, 64, filter_x,
x_step_q4, filter_y, y_step_q4, w,
intermediate_height);
/* Step into the temp buffer 3 lines to get the actual frame data */
aom_convolve8_vert_neon(temp + 64 * 3, 64, dst, dst_stride, filter_x,
x_step_q4, filter_y, y_step_q4, w, h);
}
void aom_convolve8_avg_neon(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4, int w,
int h) {
DECLARE_ALIGNED(8, uint8_t, temp[64 * 72]);
int intermediate_height = h + 7;
assert(y_step_q4 == 16);
assert(x_step_q4 == 16);
/* This implementation has the same issues as above. In addition, we only want
* to average the values after both passes.
*/
aom_convolve8_horiz_neon(src - src_stride * 3, src_stride, temp, 64, filter_x,
x_step_q4, filter_y, y_step_q4, w,
intermediate_height);
aom_convolve8_avg_vert_neon(temp + 64 * 3, 64, dst, dst_stride, filter_x,
x_step_q4, filter_y, y_step_q4, w, h);
}
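Review note: the two wrappers above run a horizontal pass into a 64-wide temporary buffer and then a vertical pass over it. The horizontal pass starts 3 rows above the block and covers `h + 7` rows, because the 8-tap vertical filter needs 3 rows before and 4 rows after each output row. With `h <= 64` that is at most 71 rows, and since the NEON routine filters a multiple of 4 rows, 72 rows are reserved, which gives `temp[64 * 72]`. A scalar sketch of the same structure follows. All `sketch_*` names are hypothetical, and the rounding `(sum + 64) >> 7` is an assumption based on the filter weight of 128 and shift of 7 stated in the assembly header.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TAPS 8
#define SKETCH_FILTER_BITS 7 /* taps are assumed to sum to 128 */

static uint8_t sketch_clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* One 8-tap pass. step is 1 for a horizontal pass and the source stride
 * for a vertical pass. Reads 3 samples before and 4 after each output. */
static void sketch_filter_pass(const uint8_t *src, ptrdiff_t src_stride,
                               uint8_t *dst, ptrdiff_t dst_stride,
                               const int16_t *filter, ptrdiff_t step, int w,
                               int h) {
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      int sum = 0;
      for (int k = 0; k < SKETCH_TAPS; ++k) {
        sum += src[x + (k - 3) * step] * filter[k];
      }
      /* Assumes an arithmetic right shift for negative sums. */
      dst[x] = sketch_clip_pixel((sum + 64) >> SKETCH_FILTER_BITS);
    }
    src += src_stride;
    dst += dst_stride;
  }
}

/* Hypothetical scalar counterpart of the two-pass wrapper (w, h <= 64). */
static void sketch_convolve8(const uint8_t *src, ptrdiff_t src_stride,
                             uint8_t *dst, ptrdiff_t dst_stride,
                             const int16_t *filter_x, const int16_t *filter_y,
                             int w, int h) {
  uint8_t temp[64 * 72];
  const int intermediate_height = h + 7; /* 3 rows above, 4 rows below */

  /* Horizontal pass, starting 3 rows above the block. */
  sketch_filter_pass(src - src_stride * 3, src_stride, temp, 64, filter_x, 1,
                     w, intermediate_height);
  /* Vertical pass, stepping 3 rows into temp to reach the block itself. */
  sketch_filter_pass(temp + 64 * 3, 64, dst, dst_stride, filter_y, 64, w, h);
}
```

Unlike the NEON routine, this sketch writes exactly `h + 7` intermediate rows, so it does not rely on the extra padding row.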


@@ -1,20 +1,21 @@
/* /*
* Copyright (c) 2015 The WebM project authors. All Rights Reserved. * Copyright (c) 2016, Alliance for Open Media. All rights reserved
* *
* Use of this source code is governed by a BSD-style license * This source code is subject to the terms of the BSD 2 Clause License and
* that can be found in the LICENSE file in the root of the source * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* tree. An additional intellectual property rights grant can be found * was not distributed with this source code in the LICENSE file, you can
* in the file PATENTS. All contributing project authors may * obtain it at www.aomedia.org/license/software. If the Alliance for Open
* be found in the AUTHORS file in the root of the source tree. * Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/ */
#include <arm_neon.h> #include <arm_neon.h>
#include <assert.h> #include <assert.h>
#include "./vpx_dsp_rtcd.h" #include "./aom_dsp_rtcd.h"
#include "./vpx_config.h" #include "./aom_config.h"
#include "vpx/vpx_integer.h" #include "aom/aom_integer.h"
static INLINE unsigned int horizontal_add_u16x8(const uint16x8_t v_16x8) { static INLINE unsigned int horizontal_add_u16x8(const uint16x8_t v_16x8) {
const uint32x4_t a = vpaddlq_u16(v_16x8); const uint32x4_t a = vpaddlq_u16(v_16x8);
@@ -24,7 +25,7 @@ static INLINE unsigned int horizontal_add_u16x8(const uint16x8_t v_16x8) {
return vget_lane_u32(c, 0); return vget_lane_u32(c, 0);
} }
unsigned int vpx_avg_4x4_neon(const uint8_t *s, int p) { unsigned int aom_avg_4x4_neon(const uint8_t *s, int p) {
uint16x8_t v_sum; uint16x8_t v_sum;
uint32x2_t v_s0 = vdup_n_u32(0); uint32x2_t v_s0 = vdup_n_u32(0);
uint32x2_t v_s1 = vdup_n_u32(0); uint32x2_t v_s1 = vdup_n_u32(0);
@@ -36,7 +37,7 @@ unsigned int vpx_avg_4x4_neon(const uint8_t *s, int p) {
return (horizontal_add_u16x8(v_sum) + 8) >> 4; return (horizontal_add_u16x8(v_sum) + 8) >> 4;
} }
unsigned int vpx_avg_8x8_neon(const uint8_t *s, int p) { unsigned int aom_avg_8x8_neon(const uint8_t *s, int p) {
uint8x8_t v_s0 = vld1_u8(s); uint8x8_t v_s0 = vld1_u8(s);
const uint8x8_t v_s1 = vld1_u8(s + p); const uint8x8_t v_s1 = vld1_u8(s + p);
uint16x8_t v_sum = vaddl_u8(v_s0, v_s1); uint16x8_t v_sum = vaddl_u8(v_s0, v_s1);
@@ -64,7 +65,7 @@ unsigned int vpx_avg_8x8_neon(const uint8_t *s, int p) {
// coeff: 16 bits, dynamic range [-32640, 32640]. // coeff: 16 bits, dynamic range [-32640, 32640].
// length: value range {16, 64, 256, 1024}. // length: value range {16, 64, 256, 1024}.
int vpx_satd_neon(const int16_t *coeff, int length) { int aom_satd_neon(const int16_t *coeff, int length) {
const int16x4_t zero = vdup_n_s16(0); const int16x4_t zero = vdup_n_s16(0);
int32x4_t accum = vdupq_n_s32(0); int32x4_t accum = vdupq_n_s32(0);
@@ -89,7 +90,7 @@ int vpx_satd_neon(const int16_t *coeff, int length) {
} }
} }
void vpx_int_pro_row_neon(int16_t hbuf[16], uint8_t const *ref, void aom_int_pro_row_neon(int16_t hbuf[16], uint8_t const *ref,
const int ref_stride, const int height) { const int ref_stride, const int height) {
int i; int i;
uint16x8_t vec_sum_lo = vdupq_n_u16(0); uint16x8_t vec_sum_lo = vdupq_n_u16(0);
@@ -142,7 +143,7 @@ void vpx_int_pro_row_neon(int16_t hbuf[16], uint8_t const *ref,
vst1q_s16(hbuf, vreinterpretq_s16_u16(vec_sum_hi)); vst1q_s16(hbuf, vreinterpretq_s16_u16(vec_sum_hi));
} }
int16_t vpx_int_pro_col_neon(uint8_t const *ref, const int width) { int16_t aom_int_pro_col_neon(uint8_t const *ref, const int width) {
int i; int i;
uint16x8_t vec_sum = vdupq_n_u16(0); uint16x8_t vec_sum = vdupq_n_u16(0);
@@ -158,7 +159,7 @@ int16_t vpx_int_pro_col_neon(uint8_t const *ref, const int width) {
// ref, src = [0, 510] - max diff = 16-bits // ref, src = [0, 510] - max diff = 16-bits
// bwl = {2, 3, 4}, width = {16, 32, 64} // bwl = {2, 3, 4}, width = {16, 32, 64}
int vpx_vector_var_neon(int16_t const *ref, int16_t const *src, const int bwl) { int aom_vector_var_neon(int16_t const *ref, int16_t const *src, const int bwl) {
int width = 4 << bwl; int width = 4 << bwl;
int32x4_t sse = vdupq_n_s32(0); int32x4_t sse = vdupq_n_s32(0);
int16x8_t total = vdupq_n_s16(0); int16x8_t total = vdupq_n_s16(0);
@@ -198,27 +199,24 @@ int vpx_vector_var_neon(int16_t const *ref, int16_t const *src, const int bwl) {
} }
} }
void vpx_minmax_8x8_neon(const uint8_t *a, int a_stride, void aom_minmax_8x8_neon(const uint8_t *a, int a_stride, const uint8_t *b,
const uint8_t *b, int b_stride, int b_stride, int *min, int *max) {
int *min, int *max) {
// Load and concatenate. // Load and concatenate.
const uint8x16_t a01 = vcombine_u8(vld1_u8(a), const uint8x16_t a01 = vcombine_u8(vld1_u8(a), vld1_u8(a + a_stride));
vld1_u8(a + a_stride)); const uint8x16_t a23 =
const uint8x16_t a23 = vcombine_u8(vld1_u8(a + 2 * a_stride), vcombine_u8(vld1_u8(a + 2 * a_stride), vld1_u8(a + 3 * a_stride));
vld1_u8(a + 3 * a_stride)); const uint8x16_t a45 =
const uint8x16_t a45 = vcombine_u8(vld1_u8(a + 4 * a_stride), vcombine_u8(vld1_u8(a + 4 * a_stride), vld1_u8(a + 5 * a_stride));
vld1_u8(a + 5 * a_stride)); const uint8x16_t a67 =
const uint8x16_t a67 = vcombine_u8(vld1_u8(a + 6 * a_stride), vcombine_u8(vld1_u8(a + 6 * a_stride), vld1_u8(a + 7 * a_stride));
vld1_u8(a + 7 * a_stride));
const uint8x16_t b01 = vcombine_u8(vld1_u8(b), const uint8x16_t b01 = vcombine_u8(vld1_u8(b), vld1_u8(b + b_stride));
vld1_u8(b + b_stride)); const uint8x16_t b23 =
const uint8x16_t b23 = vcombine_u8(vld1_u8(b + 2 * b_stride), vcombine_u8(vld1_u8(b + 2 * b_stride), vld1_u8(b + 3 * b_stride));
vld1_u8(b + 3 * b_stride)); const uint8x16_t b45 =
const uint8x16_t b45 = vcombine_u8(vld1_u8(b + 4 * b_stride), vcombine_u8(vld1_u8(b + 4 * b_stride), vld1_u8(b + 5 * b_stride));
vld1_u8(b + 5 * b_stride)); const uint8x16_t b67 =
const uint8x16_t b67 = vcombine_u8(vld1_u8(b + 6 * b_stride), vcombine_u8(vld1_u8(b + 6 * b_stride), vld1_u8(b + 7 * b_stride));
vld1_u8(b + 7 * b_stride));
// Absolute difference. // Absolute difference.
const uint8x16_t ab01_diff = vabdq_u8(a01, b01); const uint8x16_t ab01_diff = vabdq_u8(a01, b01);
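Review note: `aom_avg_4x4_neon` earlier in this file returns `(horizontal_add_u16x8(v_sum) + 8) >> 4`, which is the sum of 16 pixels divided by 16 with round-to-nearest. A minimal scalar sketch of that result, useful as a reference when testing the renamed function. The name `avg_4x4_scalar` is hypothetical and not part of this change.

```c
#include <stdint.h>

/* Hypothetical scalar reference: rounded mean of a 4x4 block.
 * s points at the top-left pixel, p is the row pitch in bytes. */
static unsigned int avg_4x4_scalar(const uint8_t *s, int p) {
  unsigned int sum = 0;
  for (int y = 0; y < 4; ++y) {
    for (int x = 0; x < 4; ++x) {
      sum += s[y * p + x];
    }
  }
  return (sum + 8) >> 4; /* +8 rounds to nearest before dividing by 16 */
}
```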


@@ -1,16 +1,19 @@
; ;
; Copyright (c) 2010 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license ; This source code is subject to the terms of the BSD 2 Clause License and
; that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. An additional intellectual property rights grant can be found ; was not distributed with this source code in the LICENSE file, you can
; in the file PATENTS. All contributing project authors may ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; be found in the AUTHORS file in the root of the source tree. ; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
; ;
EXPORT |vpx_filter_block2d_bil_first_pass_media| EXPORT |aom_filter_block2d_bil_first_pass_media|
EXPORT |vpx_filter_block2d_bil_second_pass_media| EXPORT |aom_filter_block2d_bil_second_pass_media|
AREA |.text|, CODE, READONLY ; name this block of code AREA |.text|, CODE, READONLY ; name this block of code
@@ -20,13 +23,13 @@
; r2 unsigned int src_pitch, ; r2 unsigned int src_pitch,
; r3 unsigned int height, ; r3 unsigned int height,
; stack unsigned int width, ; stack unsigned int width,
; stack const short *vpx_filter ; stack const short *aom_filter
;------------------------------------- ;-------------------------------------
; The output is transposed stored in output array to make it easy for second pass filtering. ; The output is transposed stored in output array to make it easy for second pass filtering.
|vpx_filter_block2d_bil_first_pass_media| PROC |aom_filter_block2d_bil_first_pass_media| PROC
stmdb sp!, {r4 - r11, lr} stmdb sp!, {r4 - r11, lr}
ldr r11, [sp, #40] ; vpx_filter address ldr r11, [sp, #40] ; aom_filter address
ldr r4, [sp, #36] ; width ldr r4, [sp, #36] ; width
mov r12, r3 ; outer-loop counter mov r12, r3 ; outer-loop counter
@@ -134,7 +137,7 @@
ldmia sp!, {r4 - r11, pc} ldmia sp!, {r4 - r11, pc}
ENDP ; |vpx_filter_block2d_bil_first_pass_media| ENDP ; |aom_filter_block2d_bil_first_pass_media|
;--------------------------------- ;---------------------------------
@@ -143,12 +146,12 @@
; r2 int dst_pitch, ; r2 int dst_pitch,
; r3 unsigned int height, ; r3 unsigned int height,
; stack unsigned int width, ; stack unsigned int width,
; stack const short *vpx_filter ; stack const short *aom_filter
;--------------------------------- ;---------------------------------
|vpx_filter_block2d_bil_second_pass_media| PROC |aom_filter_block2d_bil_second_pass_media| PROC
stmdb sp!, {r4 - r11, lr} stmdb sp!, {r4 - r11, lr}
ldr r11, [sp, #40] ; vpx_filter address ldr r11, [sp, #40] ; aom_filter address
ldr r4, [sp, #36] ; width ldr r4, [sp, #36] ; width
ldr r5, [r11] ; load up filter coefficients ldr r5, [r11] ; load up filter coefficients
@@ -232,6 +235,6 @@
bne bil_height_loop_null_2nd bne bil_height_loop_null_2nd
ldmia sp!, {r4 - r11, pc} ldmia sp!, {r4 - r11, pc}
ENDP ; |vpx_filter_block2d_second_pass_media| ENDP ; |aom_filter_block2d_second_pass_media|
END END


@@ -1,19 +1,20 @@
/* /*
* Copyright (c) 2015 The WebM project authors. All Rights Reserved. * Copyright (c) 2016, Alliance for Open Media. All rights reserved
* *
* Use of this source code is governed by a BSD-style license * This source code is subject to the terms of the BSD 2 Clause License and
* that can be found in the LICENSE file in the root of the source * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* tree. An additional intellectual property rights grant can be found * was not distributed with this source code in the LICENSE file, you can
* in the file PATENTS. All contributing project authors may * obtain it at www.aomedia.org/license/software. If the Alliance for Open
* be found in the AUTHORS file in the root of the source tree. * Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/ */
#include <arm_neon.h> #include <arm_neon.h>
#include "./vpx_config.h" #include "./aom_config.h"
#include "vpx_dsp/txfm_common.h" #include "aom_dsp/txfm_common.h"
void vpx_fdct8x8_neon(const int16_t *input, int16_t *final_output, int stride) { void aom_fdct8x8_neon(const int16_t *input, int16_t *final_output, int stride) {
int i; int i;
// stage 1 // stage 1
int16x8_t input_0 = vshlq_n_s16(vld1q_s16(&input[0 * stride]), 2); int16x8_t input_0 = vshlq_n_s16(vld1q_s16(&input[0 * stride]), 2);
@@ -52,10 +53,10 @@ void vpx_fdct8x8_neon(const int16_t *input, int16_t *final_output, int stride) {
v_t2_hi = vmlal_n_s16(v_t2_hi, vget_high_s16(v_x3), (int16_t)cospi_8_64); v_t2_hi = vmlal_n_s16(v_t2_hi, vget_high_s16(v_x3), (int16_t)cospi_8_64);
v_t3_lo = vmlsl_n_s16(v_t3_lo, vget_low_s16(v_x2), (int16_t)cospi_8_64); v_t3_lo = vmlsl_n_s16(v_t3_lo, vget_low_s16(v_x2), (int16_t)cospi_8_64);
v_t3_hi = vmlsl_n_s16(v_t3_hi, vget_high_s16(v_x2), (int16_t)cospi_8_64); v_t3_hi = vmlsl_n_s16(v_t3_hi, vget_high_s16(v_x2), (int16_t)cospi_8_64);
v_t0_lo = vmulq_n_s32(v_t0_lo, cospi_16_64); v_t0_lo = vmulq_n_s32(v_t0_lo, (int32_t)cospi_16_64);
v_t0_hi = vmulq_n_s32(v_t0_hi, cospi_16_64); v_t0_hi = vmulq_n_s32(v_t0_hi, (int32_t)cospi_16_64);
v_t1_lo = vmulq_n_s32(v_t1_lo, cospi_16_64); v_t1_lo = vmulq_n_s32(v_t1_lo, (int32_t)cospi_16_64);
v_t1_hi = vmulq_n_s32(v_t1_hi, cospi_16_64); v_t1_hi = vmulq_n_s32(v_t1_hi, (int32_t)cospi_16_64);
{ {
const int16x4_t a = vrshrn_n_s32(v_t0_lo, DCT_CONST_BITS); const int16x4_t a = vrshrn_n_s32(v_t0_lo, DCT_CONST_BITS);
const int16x4_t b = vrshrn_n_s32(v_t0_hi, DCT_CONST_BITS); const int16x4_t b = vrshrn_n_s32(v_t0_hi, DCT_CONST_BITS);
@@ -131,14 +132,14 @@ void vpx_fdct8x8_neon(const int16_t *input, int16_t *final_output, int stride) {
// 14 15 16 17 54 55 56 57 // 14 15 16 17 54 55 56 57
// 24 25 26 27 64 65 66 67 // 24 25 26 27 64 65 66 67
// 34 35 36 37 74 75 76 77 // 34 35 36 37 74 75 76 77
const int32x4x2_t r02_s32 = vtrnq_s32(vreinterpretq_s32_s16(out_0), const int32x4x2_t r02_s32 =
vreinterpretq_s32_s16(out_2)); vtrnq_s32(vreinterpretq_s32_s16(out_0), vreinterpretq_s32_s16(out_2));
const int32x4x2_t r13_s32 = vtrnq_s32(vreinterpretq_s32_s16(out_1), const int32x4x2_t r13_s32 =
vreinterpretq_s32_s16(out_3)); vtrnq_s32(vreinterpretq_s32_s16(out_1), vreinterpretq_s32_s16(out_3));
const int32x4x2_t r46_s32 = vtrnq_s32(vreinterpretq_s32_s16(out_4), const int32x4x2_t r46_s32 =
vreinterpretq_s32_s16(out_6)); vtrnq_s32(vreinterpretq_s32_s16(out_4), vreinterpretq_s32_s16(out_6));
const int32x4x2_t r57_s32 = vtrnq_s32(vreinterpretq_s32_s16(out_5), const int32x4x2_t r57_s32 =
vreinterpretq_s32_s16(out_7)); vtrnq_s32(vreinterpretq_s32_s16(out_5), vreinterpretq_s32_s16(out_7));
const int16x8x2_t r01_s16 = const int16x8x2_t r01_s16 =
vtrnq_s16(vreinterpretq_s16_s32(r02_s32.val[0]), vtrnq_s16(vreinterpretq_s16_s32(r02_s32.val[0]),
vreinterpretq_s16_s32(r13_s32.val[0])); vreinterpretq_s16_s32(r13_s32.val[0]));
@@ -170,7 +171,7 @@ void vpx_fdct8x8_neon(const int16_t *input, int16_t *final_output, int stride) {
} }
} // for } // for
{ {
// from vpx_dct_sse2.c // from aom_dct_sse2.c
// Post-condition (division by two) // Post-condition (division by two)
// division of two 16 bits signed numbers using shifts // division of two 16 bits signed numbers using shifts
// n / 2 = (n - (n >> 15)) >> 1 // n / 2 = (n - (n >> 15)) >> 1
@@ -202,7 +203,7 @@ void vpx_fdct8x8_neon(const int16_t *input, int16_t *final_output, int stride) {
} }
} }
void vpx_fdct8x8_1_neon(const int16_t *input, int16_t *output, int stride) { void aom_fdct8x8_1_neon(const int16_t *input, int16_t *output, int stride) {
int r; int r;
int16x8_t sum = vld1q_s16(&input[0]); int16x8_t sum = vld1q_s16(&input[0]);
for (r = 1; r < 8; ++r) { for (r = 1; r < 8; ++r) {

View File

@@ -10,11 +10,10 @@
#include <arm_neon.h> #include <arm_neon.h>
#include "./vpx_dsp_rtcd.h" #include "./aom_dsp_rtcd.h"
static void hadamard8x8_one_pass(int16x8_t *a0, int16x8_t *a1, static void hadamard8x8_one_pass(int16x8_t *a0, int16x8_t *a1, int16x8_t *a2,
int16x8_t *a2, int16x8_t *a3, int16x8_t *a3, int16x8_t *a4, int16x8_t *a5,
int16x8_t *a4, int16x8_t *a5,
int16x8_t *a6, int16x8_t *a7) { int16x8_t *a6, int16x8_t *a7) {
const int16x8_t b0 = vaddq_s16(*a0, *a1); const int16x8_t b0 = vaddq_s16(*a0, *a1);
const int16x8_t b1 = vsubq_s16(*a0, *a1); const int16x8_t b1 = vsubq_s16(*a0, *a1);
@@ -47,9 +46,8 @@ static void hadamard8x8_one_pass(int16x8_t *a0, int16x8_t *a1,
// TODO(johannkoenig): Make a transpose library and dedup with idct. Consider // TODO(johannkoenig): Make a transpose library and dedup with idct. Consider
// reversing transpose order which may make it easier for the compiler to // reversing transpose order which may make it easier for the compiler to
// reconcile the vtrn.64 moves. // reconcile the vtrn.64 moves.
static void transpose8x8(int16x8_t *a0, int16x8_t *a1, static void transpose8x8(int16x8_t *a0, int16x8_t *a1, int16x8_t *a2,
int16x8_t *a2, int16x8_t *a3, int16x8_t *a3, int16x8_t *a4, int16x8_t *a5,
int16x8_t *a4, int16x8_t *a5,
int16x8_t *a6, int16x8_t *a7) { int16x8_t *a6, int16x8_t *a7) {
// Swap 64 bit elements. Goes from: // Swap 64 bit elements. Goes from:
// a0: 00 01 02 03 04 05 06 07 // a0: 00 01 02 03 04 05 06 07
@@ -91,14 +89,14 @@ static void transpose8x8(int16x8_t *a0, int16x8_t *a1,
// a1657_hi: // a1657_hi:
// 12 13 28 29 44 45 60 61 // 12 13 28 29 44 45 60 61
// 14 15 30 31 46 47 62 63 // 14 15 30 31 46 47 62 63
const int32x4x2_t a0246_lo = vtrnq_s32(vreinterpretq_s32_s16(a04_lo), const int32x4x2_t a0246_lo =
vreinterpretq_s32_s16(a26_lo)); vtrnq_s32(vreinterpretq_s32_s16(a04_lo), vreinterpretq_s32_s16(a26_lo));
const int32x4x2_t a1357_lo = vtrnq_s32(vreinterpretq_s32_s16(a15_lo), const int32x4x2_t a1357_lo =
vreinterpretq_s32_s16(a37_lo)); vtrnq_s32(vreinterpretq_s32_s16(a15_lo), vreinterpretq_s32_s16(a37_lo));
const int32x4x2_t a0246_hi = vtrnq_s32(vreinterpretq_s32_s16(a04_hi), const int32x4x2_t a0246_hi =
vreinterpretq_s32_s16(a26_hi)); vtrnq_s32(vreinterpretq_s32_s16(a04_hi), vreinterpretq_s32_s16(a26_hi));
const int32x4x2_t a1357_hi = vtrnq_s32(vreinterpretq_s32_s16(a15_hi), const int32x4x2_t a1357_hi =
vreinterpretq_s32_s16(a37_hi)); vtrnq_s32(vreinterpretq_s32_s16(a15_hi), vreinterpretq_s32_s16(a37_hi));
// Swap 16 bit elements resulting in: // Swap 16 bit elements resulting in:
// b0: // b0:
@@ -132,7 +130,7 @@ static void transpose8x8(int16x8_t *a0, int16x8_t *a1,
*a7 = b3.val[1]; *a7 = b3.val[1];
} }
void vpx_hadamard_8x8_neon(const int16_t *src_diff, int src_stride, void aom_hadamard_8x8_neon(const int16_t *src_diff, int src_stride,
int16_t *coeff) { int16_t *coeff) {
int16x8_t a0 = vld1q_s16(src_diff); int16x8_t a0 = vld1q_s16(src_diff);
int16x8_t a1 = vld1q_s16(src_diff + src_stride); int16x8_t a1 = vld1q_s16(src_diff + src_stride);
@@ -161,19 +159,19 @@ void vpx_hadamard_8x8_neon(const int16_t *src_diff, int src_stride,
vst1q_s16(coeff + 56, a7); vst1q_s16(coeff + 56, a7);
} }
void vpx_hadamard_16x16_neon(const int16_t *src_diff, int src_stride, void aom_hadamard_16x16_neon(const int16_t *src_diff, int src_stride,
int16_t *coeff) { int16_t *coeff) {
int i; int i;
/* Rearrange 16x16 to 8x32 and remove stride. /* Rearrange 16x16 to 8x32 and remove stride.
* Top left first. */ * Top left first. */
vpx_hadamard_8x8_neon(src_diff + 0 + 0 * src_stride, src_stride, coeff + 0); aom_hadamard_8x8_neon(src_diff + 0 + 0 * src_stride, src_stride, coeff + 0);
/* Top right. */ /* Top right. */
vpx_hadamard_8x8_neon(src_diff + 8 + 0 * src_stride, src_stride, coeff + 64); aom_hadamard_8x8_neon(src_diff + 8 + 0 * src_stride, src_stride, coeff + 64);
/* Bottom left. */ /* Bottom left. */
vpx_hadamard_8x8_neon(src_diff + 0 + 8 * src_stride, src_stride, coeff + 128); aom_hadamard_8x8_neon(src_diff + 0 + 8 * src_stride, src_stride, coeff + 128);
/* Bottom right. */ /* Bottom right. */
vpx_hadamard_8x8_neon(src_diff + 8 + 8 * src_stride, src_stride, coeff + 192); aom_hadamard_8x8_neon(src_diff + 8 + 8 * src_stride, src_stride, coeff + 192);
for (i = 0; i < 64; i += 8) { for (i = 0; i < 64; i += 8) {
const int16x8_t a0 = vld1q_s16(coeff + 0); const int16x8_t a0 = vld1q_s16(coeff + 0);

View File

@@ -1,28 +1,31 @@
; ;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license and patent ; This source code is subject to the terms of the BSD 2 Clause License and
; grant that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. All contributing project authors may be found in the AUTHORS ; was not distributed with this source code in the LICENSE file, you can
; file in the root of the source tree. ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
; ;
EXPORT |vpx_idct16x16_1_add_neon|
EXPORT |aom_idct16x16_1_add_neon|
ARM ARM
REQUIRE8 REQUIRE8
PRESERVE8 PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2 AREA ||.text||, CODE, READONLY, ALIGN=2
;void vpx_idct16x16_1_add_neon(int16_t *input, uint8_t *dest, ;void aom_idct16x16_1_add_neon(int16_t *input, uint8_t *dest,
; int dest_stride) ; int dest_stride)
; ;
; r0 int16_t input ; r0 int16_t input
; r1 uint8_t *dest ; r1 uint8_t *dest
; r2 int dest_stride) ; r2 int dest_stride)
|vpx_idct16x16_1_add_neon| PROC |aom_idct16x16_1_add_neon| PROC
ldrsh r0, [r0] ldrsh r0, [r0]
; generate cospi_16_64 = 11585 ; generate cospi_16_64 = 11585
@@ -193,6 +196,6 @@
vst1.64 {d31}, [r12], r2 vst1.64 {d31}, [r12], r2
bx lr bx lr
ENDP ; |vpx_idct16x16_1_add_neon| ENDP ; |aom_idct16x16_1_add_neon|
END END

View File

@@ -0,0 +1,59 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "aom_dsp/inv_txfm.h"
#include "aom_ports/mem.h"
void aom_idct16x16_1_add_neon(int16_t *input, uint8_t *dest, int dest_stride) {
uint8x8_t d2u8, d3u8, d30u8, d31u8;
uint64x1_t d2u64, d3u64, d4u64, d5u64;
uint16x8_t q0u16, q9u16, q10u16, q11u16, q12u16;
int16x8_t q0s16;
uint8_t *d1, *d2;
int16_t i, j, a1;
int16_t out = dct_const_round_shift(input[0] * cospi_16_64);
out = dct_const_round_shift(out * cospi_16_64);
a1 = ROUND_POWER_OF_TWO(out, 6);
q0s16 = vdupq_n_s16(a1);
q0u16 = vreinterpretq_u16_s16(q0s16);
for (d1 = d2 = dest, i = 0; i < 4; i++) {
for (j = 0; j < 2; j++) {
d2u64 = vld1_u64((const uint64_t *)d1);
d3u64 = vld1_u64((const uint64_t *)(d1 + 8));
d1 += dest_stride;
d4u64 = vld1_u64((const uint64_t *)d1);
d5u64 = vld1_u64((const uint64_t *)(d1 + 8));
d1 += dest_stride;
q9u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d2u64));
q10u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d3u64));
q11u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d4u64));
q12u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d5u64));
d2u8 = vqmovun_s16(vreinterpretq_s16_u16(q9u16));
d3u8 = vqmovun_s16(vreinterpretq_s16_u16(q10u16));
d30u8 = vqmovun_s16(vreinterpretq_s16_u16(q11u16));
d31u8 = vqmovun_s16(vreinterpretq_s16_u16(q12u16));
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d2u8));
vst1_u64((uint64_t *)(d2 + 8), vreinterpret_u64_u8(d3u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d30u8));
vst1_u64((uint64_t *)(d2 + 8), vreinterpret_u64_u8(d31u8));
d2 += dest_stride;
}
}
return;
}
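
Review note: the DC-only path above can be modelled in scalar C, which is handy when checking the intrinsics against a reference. The sketch below is not part of the commit. All `sketch_` names are hypothetical, `11585` is the `cospi_16_64` value quoted in the assembly comments, and the 14-bit shift is an assumed value for `DCT_CONST_BITS` (it matches the shift used by the butterfly code in this change). It also assumes that right-shifting a negative `int` is an arithmetic shift, which holds on mainstream compilers but is implementation-defined in C.

```c
#include <stdint.h>

/* Assumed constants, see the note above. */
#define SKETCH_COSPI_16_64 11585
#define SKETCH_DCT_CONST_BITS 14

/* Rounding right shift: add half of the divisor, then shift. */
static int sketch_round_shift(int value, int bits) {
  return (value + (1 << (bits - 1))) >> bits;
}

/* Clamp an intermediate sum to the 8-bit pixel range. */
static uint8_t sketch_clip_pixel(int value) {
  return (uint8_t)(value < 0 ? 0 : (value > 255 ? 255 : value));
}

/* Hypothetical scalar model of a DC-only inverse transform: only input[0]
 * is read, scaled twice by cospi_16_64 (once per 1-D pass), rounded by 6
 * bits, and the resulting constant is added to every pixel of the
 * size x size destination block. */
static void sketch_idct_dc_add(const int16_t *input, uint8_t *dest,
                               int dest_stride, int size) {
  int out = sketch_round_shift(input[0] * SKETCH_COSPI_16_64,
                               SKETCH_DCT_CONST_BITS);
  int a1, r, c;
  out = sketch_round_shift(out * SKETCH_COSPI_16_64, SKETCH_DCT_CONST_BITS);
  a1 = sketch_round_shift(out, 6);
  for (r = 0; r < size; ++r) {
    for (c = 0; c < size; ++c) {
      uint8_t *px = &dest[r * dest_stride + c];
      *px = sketch_clip_pixel(*px + a1);
    }
  }
}
```

Calling `sketch_idct_dc_add(coeffs, dest, stride, 16)` on a copy of the destination block and comparing it with the output of `aom_idct16x16_1_add_neon` is one way to cross-check the NEON version, under the assumptions listed above.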

View File

@@ -1,17 +1,20 @@
; ;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license ; This source code is subject to the terms of the BSD 2 Clause License and
; that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. An additional intellectual property rights grant can be found ; was not distributed with this source code in the LICENSE file, you can
; in the file PATENTS. All contributing project authors may ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; be found in the AUTHORS file in the root of the source tree. ; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
; ;
EXPORT |vpx_idct16x16_256_add_neon_pass1| ;
EXPORT |vpx_idct16x16_256_add_neon_pass2|
EXPORT |vpx_idct16x16_10_add_neon_pass1| EXPORT |aom_idct16x16_256_add_neon_pass1|
EXPORT |vpx_idct16x16_10_add_neon_pass2| EXPORT |aom_idct16x16_256_add_neon_pass2|
EXPORT |aom_idct16x16_10_add_neon_pass1|
EXPORT |aom_idct16x16_10_add_neon_pass2|
ARM ARM
REQUIRE8 REQUIRE8
PRESERVE8 PRESERVE8
@@ -36,7 +39,7 @@
MEND MEND
AREA Block, CODE, READONLY ; name this block of code AREA Block, CODE, READONLY ; name this block of code
;void |vpx_idct16x16_256_add_neon_pass1|(int16_t *input, ;void |aom_idct16x16_256_add_neon_pass1|(int16_t *input,
; int16_t *output, int output_stride) ; int16_t *output, int output_stride)
; ;
; r0 int16_t input ; r0 int16_t input
@@ -46,7 +49,7 @@
; idct16 stage1 - stage6 on all the elements loaded in q8-q15. The output ; idct16 stage1 - stage6 on all the elements loaded in q8-q15. The output
; will be stored back into q8-q15 registers. This function will touch q0-q7 ; will be stored back into q8-q15 registers. This function will touch q0-q7
; registers and use them as buffer during calculation. ; registers and use them as buffer during calculation.
|vpx_idct16x16_256_add_neon_pass1| PROC |aom_idct16x16_256_add_neon_pass1| PROC
; TODO(hkuang): Find a better way to load the elements. ; TODO(hkuang): Find a better way to load the elements.
; load elements of 0, 2, 4, 6, 8, 10, 12, 14 into q8 - q15 ; load elements of 0, 2, 4, 6, 8, 10, 12, 14 into q8 - q15
@@ -273,9 +276,9 @@
vst1.64 {d31}, [r1], r2 vst1.64 {d31}, [r1], r2
bx lr bx lr
ENDP ; |vpx_idct16x16_256_add_neon_pass1| ENDP ; |aom_idct16x16_256_add_neon_pass1|
;void vpx_idct16x16_256_add_neon_pass2(int16_t *src, ;void aom_idct16x16_256_add_neon_pass2(int16_t *src,
; int16_t *output, ; int16_t *output,
; int16_t *pass1Output, ; int16_t *pass1Output,
; int16_t skip_adding, ; int16_t skip_adding,
@@ -292,7 +295,7 @@
; idct16 stage1 - stage7 on all the elements loaded in q8-q15. The output ; idct16 stage1 - stage7 on all the elements loaded in q8-q15. The output
; will be stored back into q8-q15 registers. This function will touch q0-q7 ; will be stored back into q8-q15 registers. This function will touch q0-q7
; registers and use them as buffer during calculation. ; registers and use them as buffer during calculation.
|vpx_idct16x16_256_add_neon_pass2| PROC |aom_idct16x16_256_add_neon_pass2| PROC
push {r3-r9} push {r3-r9}
; TODO(hkuang): Find a better way to load the elements. ; TODO(hkuang): Find a better way to load the elements.
@@ -784,9 +787,9 @@ skip_adding_dest
end_idct16x16_pass2 end_idct16x16_pass2
pop {r3-r9} pop {r3-r9}
bx lr bx lr
ENDP ; |vpx_idct16x16_256_add_neon_pass2| ENDP ; |aom_idct16x16_256_add_neon_pass2|
;void |vpx_idct16x16_10_add_neon_pass1|(int16_t *input, ;void |aom_idct16x16_10_add_neon_pass1|(int16_t *input,
; int16_t *output, int output_stride) ; int16_t *output, int output_stride)
; ;
; r0 int16_t input ; r0 int16_t input
@@ -796,7 +799,7 @@ end_idct16x16_pass2
; idct16 stage1 - stage6 on all the elements loaded in q8-q15. The output ; idct16 stage1 - stage6 on all the elements loaded in q8-q15. The output
; will be stored back into q8-q15 registers. This function will touch q0-q7 ; will be stored back into q8-q15 registers. This function will touch q0-q7
; registers and use them as buffer during calculation. ; registers and use them as buffer during calculation.
|vpx_idct16x16_10_add_neon_pass1| PROC |aom_idct16x16_10_add_neon_pass1| PROC
; TODO(hkuang): Find a better way to load the elements. ; TODO(hkuang): Find a better way to load the elements.
; load elements of 0, 2, 4, 6, 8, 10, 12, 14 into q8 - q15 ; load elements of 0, 2, 4, 6, 8, 10, 12, 14 into q8 - q15
@@ -905,9 +908,9 @@ end_idct16x16_pass2
vst1.64 {d31}, [r1], r2 vst1.64 {d31}, [r1], r2
bx lr bx lr
ENDP ; |vpx_idct16x16_10_add_neon_pass1| ENDP ; |aom_idct16x16_10_add_neon_pass1|
;void vpx_idct16x16_10_add_neon_pass2(int16_t *src, ;void aom_idct16x16_10_add_neon_pass2(int16_t *src,
; int16_t *output, ; int16_t *output,
; int16_t *pass1Output, ; int16_t *pass1Output,
; int16_t skip_adding, ; int16_t skip_adding,
@@ -924,7 +927,7 @@ end_idct16x16_pass2
; idct16 stage1 - stage7 on all the elements loaded in q8-q15. The output ; idct16 stage1 - stage7 on all the elements loaded in q8-q15. The output
; will be stored back into q8-q15 registers. This function will touch q0-q7 ; will be stored back into q8-q15 registers. This function will touch q0-q7
; registers and use them as buffer during calculation. ; registers and use them as buffer during calculation.
|vpx_idct16x16_10_add_neon_pass2| PROC |aom_idct16x16_10_add_neon_pass2| PROC
push {r3-r9} push {r3-r9}
; TODO(hkuang): Find a better way to load the elements. ; TODO(hkuang): Find a better way to load the elements.
@@ -1175,5 +1178,5 @@ end_idct16x16_pass2
end_idct10_16x16_pass2 end_idct10_16x16_pass2
pop {r3-r9} pop {r3-r9}
bx lr bx lr
ENDP ; |vpx_idct16x16_10_add_neon_pass2| ENDP ; |aom_idct16x16_10_add_neon_pass2|
END END

File diff suppressed because it is too large

View File

@@ -0,0 +1,152 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include "aom_dsp/aom_dsp_common.h"
void aom_idct16x16_256_add_neon_pass1(const int16_t *input, int16_t *output,
int output_stride);
void aom_idct16x16_256_add_neon_pass2(const int16_t *src, int16_t *output,
int16_t *pass1Output, int16_t skip_adding,
uint8_t *dest, int dest_stride);
void aom_idct16x16_10_add_neon_pass1(const int16_t *input, int16_t *output,
int output_stride);
void aom_idct16x16_10_add_neon_pass2(const int16_t *src, int16_t *output,
int16_t *pass1Output, int16_t skip_adding,
uint8_t *dest, int dest_stride);
#if HAVE_NEON_ASM
/* For ARM NEON, d8-d15 are callee-saved registers, and need to be saved. */
extern void aom_push_neon(int64_t *store);
extern void aom_pop_neon(int64_t *store);
#endif // HAVE_NEON_ASM
void aom_idct16x16_256_add_neon(const int16_t *input, uint8_t *dest,
int dest_stride) {
#if HAVE_NEON_ASM
int64_t store_reg[8];
#endif
int16_t pass1_output[16 * 16] = { 0 };
int16_t row_idct_output[16 * 16] = { 0 };
#if HAVE_NEON_ASM
// save d8-d15 register values.
aom_push_neon(store_reg);
#endif
/* Parallel idct on the upper 8 rows */
// First pass processes even elements 0, 2, 4, 6, 8, 10, 12, 14 and saves the
// stage 6 result in pass1_output.
aom_idct16x16_256_add_neon_pass1(input, pass1_output, 8);
// Second pass processes odd elements 1, 3, 5, 7, 9, 11, 13, 15 and combines
// with result in pass1(pass1_output) to calculate final result in stage 7
// which will be saved into row_idct_output.
aom_idct16x16_256_add_neon_pass2(input + 1, row_idct_output, pass1_output, 0,
dest, dest_stride);
/* Parallel idct on the lower 8 rows */
// First pass processes even elements 0, 2, 4, 6, 8, 10, 12, 14 and saves the
// stage 6 result in pass1_output.
aom_idct16x16_256_add_neon_pass1(input + 8 * 16, pass1_output, 8);
// Second pass processes odd elements 1, 3, 5, 7, 9, 11, 13, 15 and combines
// with result in pass1(pass1_output) to calculate final result in stage 7
// which will be saved into row_idct_output.
aom_idct16x16_256_add_neon_pass2(input + 8 * 16 + 1, row_idct_output + 8,
pass1_output, 0, dest, dest_stride);
/* Parallel idct on the left 8 columns */
// First pass processes even elements 0, 2, 4, 6, 8, 10, 12, 14 and saves the
// stage 6 result in pass1_output.
aom_idct16x16_256_add_neon_pass1(row_idct_output, pass1_output, 8);
// Second pass processes odd elements 1, 3, 5, 7, 9, 11, 13, 15 and combines
// with result in pass1(pass1_output) to calculate final result in stage 7.
// Then add the result to the destination data.
aom_idct16x16_256_add_neon_pass2(row_idct_output + 1, row_idct_output,
pass1_output, 1, dest, dest_stride);
/* Parallel idct on the right 8 columns */
// First pass processes even elements 0, 2, 4, 6, 8, 10, 12, 14 and saves the
// stage 6 result in pass1_output.
aom_idct16x16_256_add_neon_pass1(row_idct_output + 8 * 16, pass1_output, 8);
// Second pass processes odd elements 1, 3, 5, 7, 9, 11, 13, 15 and combines
// with result in pass1(pass1_output) to calculate final result in stage 7.
// Then add the result to the destination data.
aom_idct16x16_256_add_neon_pass2(row_idct_output + 8 * 16 + 1,
row_idct_output + 8, pass1_output, 1,
dest + 8, dest_stride);
#if HAVE_NEON_ASM
// restore d8-d15 register values.
aom_pop_neon(store_reg);
#endif
return;
}
void aom_idct16x16_10_add_neon(const int16_t *input, uint8_t *dest,
int dest_stride) {
#if HAVE_NEON_ASM
int64_t store_reg[8];
#endif
int16_t pass1_output[16 * 16] = { 0 };
int16_t row_idct_output[16 * 16] = { 0 };
#if HAVE_NEON_ASM
// save d8-d15 register values.
aom_push_neon(store_reg);
#endif
/* Parallel idct on the upper 8 rows */
// First pass processes even elements 0, 2, 4, 6, 8, 10, 12, 14 and saves the
// stage 6 result in pass1_output.
aom_idct16x16_10_add_neon_pass1(input, pass1_output, 8);
// Second pass processes odd elements 1, 3, 5, 7, 9, 11, 13, 15 and combines
// with result in pass1(pass1_output) to calculate final result in stage 7
// which will be saved into row_idct_output.
aom_idct16x16_10_add_neon_pass2(input + 1, row_idct_output, pass1_output, 0,
dest, dest_stride);
/* Skip Parallel idct on the lower 8 rows as they are all 0s */
/* Parallel idct on the left 8 columns */
// First pass processes even elements 0, 2, 4, 6, 8, 10, 12, 14 and saves the
// stage 6 result in pass1_output.
aom_idct16x16_256_add_neon_pass1(row_idct_output, pass1_output, 8);
// Second pass processes odd elements 1, 3, 5, 7, 9, 11, 13, 15 and combines
// with result in pass1(pass1_output) to calculate final result in stage 7.
// Then add the result to the destination data.
aom_idct16x16_256_add_neon_pass2(row_idct_output + 1, row_idct_output,
pass1_output, 1, dest, dest_stride);
/* Parallel idct on the right 8 columns */
// First pass processes even elements 0, 2, 4, 6, 8, 10, 12, 14 and saves the
// stage 6 result in pass1_output.
aom_idct16x16_256_add_neon_pass1(row_idct_output + 8 * 16, pass1_output, 8);
// Second pass processes odd elements 1, 3, 5, 7, 9, 11, 13, 15 and combines
// with result in pass1(pass1_output) to calculate final result in stage 7.
// Then add the result to the destination data.
aom_idct16x16_256_add_neon_pass2(row_idct_output + 8 * 16 + 1,
row_idct_output + 8, pass1_output, 1,
dest + 8, dest_stride);
#if HAVE_NEON_ASM
// restore d8-d15 register values.
aom_pop_neon(store_reg);
#endif
return;
}

View File

@@ -1,13 +1,16 @@
; ;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license and patent ; This source code is subject to the terms of the BSD 2 Clause License and
; grant that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. All contributing project authors may be found in the AUTHORS ; was not distributed with this source code in the LICENSE file, you can
; file in the root of the source tree. ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
; ;
EXPORT |vpx_idct32x32_1_add_neon|
EXPORT |aom_idct32x32_1_add_neon|
ARM ARM
REQUIRE8 REQUIRE8
PRESERVE8 PRESERVE8
@@ -64,14 +67,14 @@
vst1.8 {q15},[$dst], $stride vst1.8 {q15},[$dst], $stride
MEND MEND
;void vpx_idct32x32_1_add_neon(int16_t *input, uint8_t *dest, ;void aom_idct32x32_1_add_neon(int16_t *input, uint8_t *dest,
; int dest_stride) ; int dest_stride)
; ;
; r0 int16_t input ; r0 int16_t input
; r1 uint8_t *dest ; r1 uint8_t *dest
; r2 int dest_stride ; r2 int dest_stride
|vpx_idct32x32_1_add_neon| PROC |aom_idct32x32_1_add_neon| PROC
push {lr} push {lr}
pld [r1] pld [r1]
add r3, r1, #16 ; r3 dest + 16 for second loop add r3, r1, #16 ; r3 dest + 16 for second loop
@@ -140,5 +143,5 @@ diff_positive_32_32_loop
bne diff_positive_32_32_loop bne diff_positive_32_32_loop
pop {pc} pop {pc}
ENDP ; |vpx_idct32x32_1_add_neon| ENDP ; |aom_idct32x32_1_add_neon|
END END

View File

@@ -0,0 +1,141 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_config.h"
#include "aom_dsp/inv_txfm.h"
#include "aom_ports/mem.h"
static INLINE void LD_16x8(uint8_t *d, int d_stride, uint8x16_t *q8u8,
uint8x16_t *q9u8, uint8x16_t *q10u8,
uint8x16_t *q11u8, uint8x16_t *q12u8,
uint8x16_t *q13u8, uint8x16_t *q14u8,
uint8x16_t *q15u8) {
*q8u8 = vld1q_u8(d);
d += d_stride;
*q9u8 = vld1q_u8(d);
d += d_stride;
*q10u8 = vld1q_u8(d);
d += d_stride;
*q11u8 = vld1q_u8(d);
d += d_stride;
*q12u8 = vld1q_u8(d);
d += d_stride;
*q13u8 = vld1q_u8(d);
d += d_stride;
*q14u8 = vld1q_u8(d);
d += d_stride;
*q15u8 = vld1q_u8(d);
return;
}
static INLINE void ADD_DIFF_16x8(uint8x16_t qdiffu8, uint8x16_t *q8u8,
uint8x16_t *q9u8, uint8x16_t *q10u8,
uint8x16_t *q11u8, uint8x16_t *q12u8,
uint8x16_t *q13u8, uint8x16_t *q14u8,
uint8x16_t *q15u8) {
*q8u8 = vqaddq_u8(*q8u8, qdiffu8);
*q9u8 = vqaddq_u8(*q9u8, qdiffu8);
*q10u8 = vqaddq_u8(*q10u8, qdiffu8);
*q11u8 = vqaddq_u8(*q11u8, qdiffu8);
*q12u8 = vqaddq_u8(*q12u8, qdiffu8);
*q13u8 = vqaddq_u8(*q13u8, qdiffu8);
*q14u8 = vqaddq_u8(*q14u8, qdiffu8);
*q15u8 = vqaddq_u8(*q15u8, qdiffu8);
return;
}
static INLINE void SUB_DIFF_16x8(uint8x16_t qdiffu8, uint8x16_t *q8u8,
uint8x16_t *q9u8, uint8x16_t *q10u8,
uint8x16_t *q11u8, uint8x16_t *q12u8,
uint8x16_t *q13u8, uint8x16_t *q14u8,
uint8x16_t *q15u8) {
*q8u8 = vqsubq_u8(*q8u8, qdiffu8);
*q9u8 = vqsubq_u8(*q9u8, qdiffu8);
*q10u8 = vqsubq_u8(*q10u8, qdiffu8);
*q11u8 = vqsubq_u8(*q11u8, qdiffu8);
*q12u8 = vqsubq_u8(*q12u8, qdiffu8);
*q13u8 = vqsubq_u8(*q13u8, qdiffu8);
*q14u8 = vqsubq_u8(*q14u8, qdiffu8);
*q15u8 = vqsubq_u8(*q15u8, qdiffu8);
return;
}
static INLINE void ST_16x8(uint8_t *d, int d_stride, uint8x16_t *q8u8,
uint8x16_t *q9u8, uint8x16_t *q10u8,
uint8x16_t *q11u8, uint8x16_t *q12u8,
uint8x16_t *q13u8, uint8x16_t *q14u8,
uint8x16_t *q15u8) {
vst1q_u8(d, *q8u8);
d += d_stride;
vst1q_u8(d, *q9u8);
d += d_stride;
vst1q_u8(d, *q10u8);
d += d_stride;
vst1q_u8(d, *q11u8);
d += d_stride;
vst1q_u8(d, *q12u8);
d += d_stride;
vst1q_u8(d, *q13u8);
d += d_stride;
vst1q_u8(d, *q14u8);
d += d_stride;
vst1q_u8(d, *q15u8);
return;
}
void aom_idct32x32_1_add_neon(int16_t *input, uint8_t *dest, int dest_stride) {
uint8x16_t q0u8, q8u8, q9u8, q10u8, q11u8, q12u8, q13u8, q14u8, q15u8;
int i, j, dest_stride8;
uint8_t *d;
int16_t a1;
int16_t out = dct_const_round_shift(input[0] * cospi_16_64);
out = dct_const_round_shift(out * cospi_16_64);
a1 = ROUND_POWER_OF_TWO(out, 6);
dest_stride8 = dest_stride * 8;
if (a1 >= 0) { // diff_positive_32_32
a1 = a1 < 0 ? 0 : a1 > 255 ? 255 : a1;
q0u8 = vdupq_n_u8(a1);
for (i = 0; i < 2; i++, dest += 16) { // diff_positive_32_32_loop
d = dest;
for (j = 0; j < 4; j++) {
LD_16x8(d, dest_stride, &q8u8, &q9u8, &q10u8, &q11u8, &q12u8, &q13u8,
&q14u8, &q15u8);
ADD_DIFF_16x8(q0u8, &q8u8, &q9u8, &q10u8, &q11u8, &q12u8, &q13u8,
&q14u8, &q15u8);
ST_16x8(d, dest_stride, &q8u8, &q9u8, &q10u8, &q11u8, &q12u8, &q13u8,
&q14u8, &q15u8);
d += dest_stride8;
}
}
} else { // diff_negative_32_32
a1 = -a1;
a1 = a1 < 0 ? 0 : a1 > 255 ? 255 : a1;
q0u8 = vdupq_n_u8(a1);
for (i = 0; i < 2; i++, dest += 16) { // diff_negative_32_32_loop
d = dest;
for (j = 0; j < 4; j++) {
LD_16x8(d, dest_stride, &q8u8, &q9u8, &q10u8, &q11u8, &q12u8, &q13u8,
&q14u8, &q15u8);
SUB_DIFF_16x8(q0u8, &q8u8, &q9u8, &q10u8, &q11u8, &q12u8, &q13u8,
&q14u8, &q15u8);
ST_16x8(d, dest_stride, &q8u8, &q9u8, &q10u8, &q11u8, &q12u8, &q13u8,
&q14u8, &q15u8);
d += dest_stride8;
}
}
}
return;
}
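
Review note: the function above branches on the sign of `a1` because `vqaddq_u8` and `vqsubq_u8` operate on unsigned bytes only, so a signed DC offset is applied as a clamped magnitude plus a choice of saturating add or subtract. The scalar sketch below illustrates that idea and is not part of the commit; the `sketch_` names are hypothetical and the helpers are plain C stand-ins for one lane of the NEON saturating operations.

```c
#include <stdint.h>

/* One-lane model of an unsigned saturating add: results above 255 stick at 255. */
static uint8_t sketch_sat_add_u8(uint8_t a, uint8_t b) {
  const unsigned sum = (unsigned)a + (unsigned)b;
  return (uint8_t)(sum > 255u ? 255u : sum);
}

/* One-lane model of an unsigned saturating subtract: results below 0 stick at 0. */
static uint8_t sketch_sat_sub_u8(uint8_t a, uint8_t b) {
  return (uint8_t)(a > b ? a - b : 0);
}

/* Apply a signed DC offset to one pixel using only unsigned saturating
 * arithmetic: clamp the magnitude to 0..255, then add or subtract it
 * depending on the sign of the offset. */
static uint8_t sketch_add_signed_dc(uint8_t pixel, int a1) {
  if (a1 >= 0) {
    const uint8_t magnitude = (uint8_t)(a1 > 255 ? 255 : a1);
    return sketch_sat_add_u8(pixel, magnitude);
  } else {
    const int negated = -a1;
    const uint8_t magnitude = (uint8_t)(negated > 255 ? 255 : negated);
    return sketch_sat_sub_u8(pixel, magnitude);
  }
}
```

For every 8-bit pixel this gives the same result as clamping `pixel + a1` to the range 0..255, which is why the magnitude can be broadcast once and reused for the whole 32x32 block.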

View File

@@ -1,11 +1,14 @@
; ;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license ; This source code is subject to the terms of the BSD 2 Clause License and
; that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. An additional intellectual property rights grant can be found ; was not distributed with this source code in the LICENSE file, you can
; in the file PATENTS. All contributing project authors may ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; be found in the AUTHORS file in the root of the source tree. ; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
; ;
;TODO(cd): adjust these constant to be able to use vqdmulh for faster ;TODO(cd): adjust these constant to be able to use vqdmulh for faster
@@ -43,7 +46,7 @@ cospi_30_64 EQU 1606
cospi_31_64 EQU 804 cospi_31_64 EQU 804
EXPORT |vpx_idct32x32_1024_add_neon| EXPORT |aom_idct32x32_1024_add_neon|
ARM ARM
REQUIRE8 REQUIRE8
PRESERVE8 PRESERVE8
@@ -288,7 +291,7 @@ cospi_31_64 EQU 804
MEND MEND
; -------------------------------------------------------------------------- ; --------------------------------------------------------------------------
;void vpx_idct32x32_1024_add_neon(int16_t *input, uint8_t *dest, int dest_stride); ;void aom_idct32x32_1024_add_neon(int16_t *input, uint8_t *dest, int dest_stride);
; ;
; r0 int16_t *input, ; r0 int16_t *input,
; r1 uint8_t *dest, ; r1 uint8_t *dest,
@@ -303,7 +306,7 @@ cospi_31_64 EQU 804
; r9 dest + 15 * dest_stride, descending (14, 13, 12, ...) ; r9 dest + 15 * dest_stride, descending (14, 13, 12, ...)
; r10 dest + 16 * dest_stride, ascending (17, 18, 19, ...) ; r10 dest + 16 * dest_stride, ascending (17, 18, 19, ...)
|vpx_idct32x32_1024_add_neon| PROC |aom_idct32x32_1024_add_neon| PROC
; This function does one pass of idct32x32 transform. ; This function does one pass of idct32x32 transform.
; ;
; This is done by transposing the input and then doing a 1d transform on ; This is done by transposing the input and then doing a 1d transform on
@@ -1295,5 +1298,5 @@ idct32_bands_end_2nd_pass
vpop {d8-d15} vpop {d8-d15}
pop {r4-r11} pop {r4-r11}
bx lr bx lr
ENDP ; |vpx_idct32x32_1024_add_neon| ENDP ; |aom_idct32x32_1024_add_neon|
END END

View File

@@ -0,0 +1,686 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_config.h"
#include "aom_dsp/txfm_common.h"
#define LOAD_FROM_TRANSPOSED(prev, first, second) \
q14s16 = vld1q_s16(trans_buf + first * 8); \
q13s16 = vld1q_s16(trans_buf + second * 8);
#define LOAD_FROM_OUTPUT(prev, first, second, qA, qB) \
qA = vld1q_s16(out + first * 32); \
qB = vld1q_s16(out + second * 32);
#define STORE_IN_OUTPUT(prev, first, second, qA, qB) \
vst1q_s16(out + first * 32, qA); \
vst1q_s16(out + second * 32, qB);
#define STORE_COMBINE_CENTER_RESULTS(r10, r9) \
__STORE_COMBINE_CENTER_RESULTS(r10, r9, stride, q6s16, q7s16, q8s16, q9s16);
static INLINE void __STORE_COMBINE_CENTER_RESULTS(uint8_t *p1, uint8_t *p2,
int stride, int16x8_t q6s16,
int16x8_t q7s16,
int16x8_t q8s16,
int16x8_t q9s16) {
int16x4_t d8s16, d9s16, d10s16, d11s16;
d8s16 = vld1_s16((int16_t *)p1);
p1 += stride;
d11s16 = vld1_s16((int16_t *)p2);
p2 -= stride;
d9s16 = vld1_s16((int16_t *)p1);
d10s16 = vld1_s16((int16_t *)p2);
q7s16 = vrshrq_n_s16(q7s16, 6);
q8s16 = vrshrq_n_s16(q8s16, 6);
q9s16 = vrshrq_n_s16(q9s16, 6);
q6s16 = vrshrq_n_s16(q6s16, 6);
q7s16 = vreinterpretq_s16_u16(
vaddw_u8(vreinterpretq_u16_s16(q7s16), vreinterpret_u8_s16(d9s16)));
q8s16 = vreinterpretq_s16_u16(
vaddw_u8(vreinterpretq_u16_s16(q8s16), vreinterpret_u8_s16(d10s16)));
q9s16 = vreinterpretq_s16_u16(
vaddw_u8(vreinterpretq_u16_s16(q9s16), vreinterpret_u8_s16(d11s16)));
q6s16 = vreinterpretq_s16_u16(
vaddw_u8(vreinterpretq_u16_s16(q6s16), vreinterpret_u8_s16(d8s16)));
d9s16 = vreinterpret_s16_u8(vqmovun_s16(q7s16));
d10s16 = vreinterpret_s16_u8(vqmovun_s16(q8s16));
d11s16 = vreinterpret_s16_u8(vqmovun_s16(q9s16));
d8s16 = vreinterpret_s16_u8(vqmovun_s16(q6s16));
vst1_s16((int16_t *)p1, d9s16);
p1 -= stride;
vst1_s16((int16_t *)p2, d10s16);
p2 += stride;
vst1_s16((int16_t *)p1, d8s16);
vst1_s16((int16_t *)p2, d11s16);
return;
}
#define STORE_COMBINE_EXTREME_RESULTS(r7, r6) \
; \
__STORE_COMBINE_EXTREME_RESULTS(r7, r6, stride, q4s16, q5s16, q6s16, q7s16);
static INLINE void __STORE_COMBINE_EXTREME_RESULTS(uint8_t *p1, uint8_t *p2,
int stride, int16x8_t q4s16,
int16x8_t q5s16,
int16x8_t q6s16,
int16x8_t q7s16) {
int16x4_t d4s16, d5s16, d6s16, d7s16;
d4s16 = vld1_s16((int16_t *)p1);
p1 += stride;
d7s16 = vld1_s16((int16_t *)p2);
p2 -= stride;
d5s16 = vld1_s16((int16_t *)p1);
d6s16 = vld1_s16((int16_t *)p2);
q5s16 = vrshrq_n_s16(q5s16, 6);
q6s16 = vrshrq_n_s16(q6s16, 6);
q7s16 = vrshrq_n_s16(q7s16, 6);
q4s16 = vrshrq_n_s16(q4s16, 6);
q5s16 = vreinterpretq_s16_u16(
vaddw_u8(vreinterpretq_u16_s16(q5s16), vreinterpret_u8_s16(d5s16)));
q6s16 = vreinterpretq_s16_u16(
vaddw_u8(vreinterpretq_u16_s16(q6s16), vreinterpret_u8_s16(d6s16)));
q7s16 = vreinterpretq_s16_u16(
vaddw_u8(vreinterpretq_u16_s16(q7s16), vreinterpret_u8_s16(d7s16)));
q4s16 = vreinterpretq_s16_u16(
vaddw_u8(vreinterpretq_u16_s16(q4s16), vreinterpret_u8_s16(d4s16)));
d5s16 = vreinterpret_s16_u8(vqmovun_s16(q5s16));
d6s16 = vreinterpret_s16_u8(vqmovun_s16(q6s16));
d7s16 = vreinterpret_s16_u8(vqmovun_s16(q7s16));
d4s16 = vreinterpret_s16_u8(vqmovun_s16(q4s16));
vst1_s16((int16_t *)p1, d5s16);
p1 -= stride;
vst1_s16((int16_t *)p2, d6s16);
p2 += stride;
vst1_s16((int16_t *)p2, d7s16);
vst1_s16((int16_t *)p1, d4s16);
return;
}
#define DO_BUTTERFLY_STD(const_1, const_2, qA, qB) \
DO_BUTTERFLY(q14s16, q13s16, const_1, const_2, qA, qB);
static INLINE void DO_BUTTERFLY(int16x8_t q14s16, int16x8_t q13s16,
int16_t first_const, int16_t second_const,
int16x8_t *qAs16, int16x8_t *qBs16) {
int16x4_t d30s16, d31s16;
int32x4_t q8s32, q9s32, q10s32, q11s32, q12s32, q15s32;
int16x4_t dCs16, dDs16, dAs16, dBs16;
dCs16 = vget_low_s16(q14s16);
dDs16 = vget_high_s16(q14s16);
dAs16 = vget_low_s16(q13s16);
dBs16 = vget_high_s16(q13s16);
d30s16 = vdup_n_s16(first_const);
d31s16 = vdup_n_s16(second_const);
q8s32 = vmull_s16(dCs16, d30s16);
q10s32 = vmull_s16(dAs16, d31s16);
q9s32 = vmull_s16(dDs16, d30s16);
q11s32 = vmull_s16(dBs16, d31s16);
q12s32 = vmull_s16(dCs16, d31s16);
q8s32 = vsubq_s32(q8s32, q10s32);
q9s32 = vsubq_s32(q9s32, q11s32);
q10s32 = vmull_s16(dDs16, d31s16);
q11s32 = vmull_s16(dAs16, d30s16);
q15s32 = vmull_s16(dBs16, d30s16);
q11s32 = vaddq_s32(q12s32, q11s32);
q10s32 = vaddq_s32(q10s32, q15s32);
*qAs16 = vcombine_s16(vqrshrn_n_s32(q8s32, 14), vqrshrn_n_s32(q9s32, 14));
*qBs16 = vcombine_s16(vqrshrn_n_s32(q11s32, 14), vqrshrn_n_s32(q10s32, 14));
return;
}
static INLINE void idct32_transpose_pair(int16_t *input, int16_t *t_buf) {
int16_t *in;
int i;
const int stride = 32;
int16x4_t d16s16, d17s16, d18s16, d19s16, d20s16, d21s16, d22s16, d23s16;
int16x4_t d24s16, d25s16, d26s16, d27s16, d28s16, d29s16, d30s16, d31s16;
int16x8_t q8s16, q9s16, q10s16, q11s16, q12s16, q13s16, q14s16, q15s16;
int32x4x2_t q0x2s32, q1x2s32, q2x2s32, q3x2s32;
int16x8x2_t q0x2s16, q1x2s16, q2x2s16, q3x2s16;
for (i = 0; i < 4; i++, input += 8) {
in = input;
q8s16 = vld1q_s16(in);
in += stride;
q9s16 = vld1q_s16(in);
in += stride;
q10s16 = vld1q_s16(in);
in += stride;
q11s16 = vld1q_s16(in);
in += stride;
q12s16 = vld1q_s16(in);
in += stride;
q13s16 = vld1q_s16(in);
in += stride;
q14s16 = vld1q_s16(in);
in += stride;
q15s16 = vld1q_s16(in);
d16s16 = vget_low_s16(q8s16);
d17s16 = vget_high_s16(q8s16);
d18s16 = vget_low_s16(q9s16);
d19s16 = vget_high_s16(q9s16);
d20s16 = vget_low_s16(q10s16);
d21s16 = vget_high_s16(q10s16);
d22s16 = vget_low_s16(q11s16);
d23s16 = vget_high_s16(q11s16);
d24s16 = vget_low_s16(q12s16);
d25s16 = vget_high_s16(q12s16);
d26s16 = vget_low_s16(q13s16);
d27s16 = vget_high_s16(q13s16);
d28s16 = vget_low_s16(q14s16);
d29s16 = vget_high_s16(q14s16);
d30s16 = vget_low_s16(q15s16);
d31s16 = vget_high_s16(q15s16);
q8s16 = vcombine_s16(d16s16, d24s16); // vswp d17, d24
q9s16 = vcombine_s16(d18s16, d26s16); // vswp d19, d26
q10s16 = vcombine_s16(d20s16, d28s16); // vswp d21, d28
q11s16 = vcombine_s16(d22s16, d30s16); // vswp d23, d30
q12s16 = vcombine_s16(d17s16, d25s16);
q13s16 = vcombine_s16(d19s16, d27s16);
q14s16 = vcombine_s16(d21s16, d29s16);
q15s16 = vcombine_s16(d23s16, d31s16);
q0x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(q8s16), vreinterpretq_s32_s16(q10s16));
q1x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(q9s16), vreinterpretq_s32_s16(q11s16));
q2x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(q12s16), vreinterpretq_s32_s16(q14s16));
q3x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(q13s16), vreinterpretq_s32_s16(q15s16));
q0x2s16 = vtrnq_s16(vreinterpretq_s16_s32(q0x2s32.val[0]), // q8
vreinterpretq_s16_s32(q1x2s32.val[0])); // q9
q1x2s16 = vtrnq_s16(vreinterpretq_s16_s32(q0x2s32.val[1]), // q10
vreinterpretq_s16_s32(q1x2s32.val[1])); // q11
q2x2s16 = vtrnq_s16(vreinterpretq_s16_s32(q2x2s32.val[0]), // q12
vreinterpretq_s16_s32(q3x2s32.val[0])); // q13
q3x2s16 = vtrnq_s16(vreinterpretq_s16_s32(q2x2s32.val[1]), // q14
vreinterpretq_s16_s32(q3x2s32.val[1])); // q15
vst1q_s16(t_buf, q0x2s16.val[0]);
t_buf += 8;
vst1q_s16(t_buf, q0x2s16.val[1]);
t_buf += 8;
vst1q_s16(t_buf, q1x2s16.val[0]);
t_buf += 8;
vst1q_s16(t_buf, q1x2s16.val[1]);
t_buf += 8;
vst1q_s16(t_buf, q2x2s16.val[0]);
t_buf += 8;
vst1q_s16(t_buf, q2x2s16.val[1]);
t_buf += 8;
vst1q_s16(t_buf, q3x2s16.val[0]);
t_buf += 8;
vst1q_s16(t_buf, q3x2s16.val[1]);
t_buf += 8;
}
return;
}
static INLINE void idct32_bands_end_1st_pass(int16_t *out, int16x8_t q2s16,
int16x8_t q3s16, int16x8_t q6s16,
int16x8_t q7s16, int16x8_t q8s16,
int16x8_t q9s16, int16x8_t q10s16,
int16x8_t q11s16, int16x8_t q12s16,
int16x8_t q13s16, int16x8_t q14s16,
int16x8_t q15s16) {
int16x8_t q0s16, q1s16, q4s16, q5s16;
STORE_IN_OUTPUT(17, 16, 17, q6s16, q7s16);
STORE_IN_OUTPUT(17, 14, 15, q8s16, q9s16);
LOAD_FROM_OUTPUT(15, 30, 31, q0s16, q1s16);
q4s16 = vaddq_s16(q2s16, q1s16);
q5s16 = vaddq_s16(q3s16, q0s16);
q6s16 = vsubq_s16(q3s16, q0s16);
q7s16 = vsubq_s16(q2s16, q1s16);
STORE_IN_OUTPUT(31, 30, 31, q6s16, q7s16);
STORE_IN_OUTPUT(31, 0, 1, q4s16, q5s16);
LOAD_FROM_OUTPUT(1, 12, 13, q0s16, q1s16);
q2s16 = vaddq_s16(q10s16, q1s16);
q3s16 = vaddq_s16(q11s16, q0s16);
q4s16 = vsubq_s16(q11s16, q0s16);
q5s16 = vsubq_s16(q10s16, q1s16);
LOAD_FROM_OUTPUT(13, 18, 19, q0s16, q1s16);
q8s16 = vaddq_s16(q4s16, q1s16);
q9s16 = vaddq_s16(q5s16, q0s16);
q6s16 = vsubq_s16(q5s16, q0s16);
q7s16 = vsubq_s16(q4s16, q1s16);
STORE_IN_OUTPUT(19, 18, 19, q6s16, q7s16);
STORE_IN_OUTPUT(19, 12, 13, q8s16, q9s16);
LOAD_FROM_OUTPUT(13, 28, 29, q0s16, q1s16);
q4s16 = vaddq_s16(q2s16, q1s16);
q5s16 = vaddq_s16(q3s16, q0s16);
q6s16 = vsubq_s16(q3s16, q0s16);
q7s16 = vsubq_s16(q2s16, q1s16);
STORE_IN_OUTPUT(29, 28, 29, q6s16, q7s16);
STORE_IN_OUTPUT(29, 2, 3, q4s16, q5s16);
LOAD_FROM_OUTPUT(3, 10, 11, q0s16, q1s16);
q2s16 = vaddq_s16(q12s16, q1s16);
q3s16 = vaddq_s16(q13s16, q0s16);
q4s16 = vsubq_s16(q13s16, q0s16);
q5s16 = vsubq_s16(q12s16, q1s16);
LOAD_FROM_OUTPUT(11, 20, 21, q0s16, q1s16);
q8s16 = vaddq_s16(q4s16, q1s16);
q9s16 = vaddq_s16(q5s16, q0s16);
q6s16 = vsubq_s16(q5s16, q0s16);
q7s16 = vsubq_s16(q4s16, q1s16);
STORE_IN_OUTPUT(21, 20, 21, q6s16, q7s16);
STORE_IN_OUTPUT(21, 10, 11, q8s16, q9s16);
LOAD_FROM_OUTPUT(11, 26, 27, q0s16, q1s16);
q4s16 = vaddq_s16(q2s16, q1s16);
q5s16 = vaddq_s16(q3s16, q0s16);
q6s16 = vsubq_s16(q3s16, q0s16);
q7s16 = vsubq_s16(q2s16, q1s16);
STORE_IN_OUTPUT(27, 26, 27, q6s16, q7s16);
STORE_IN_OUTPUT(27, 4, 5, q4s16, q5s16);
LOAD_FROM_OUTPUT(5, 8, 9, q0s16, q1s16);
q2s16 = vaddq_s16(q14s16, q1s16);
q3s16 = vaddq_s16(q15s16, q0s16);
q4s16 = vsubq_s16(q15s16, q0s16);
q5s16 = vsubq_s16(q14s16, q1s16);
LOAD_FROM_OUTPUT(9, 22, 23, q0s16, q1s16);
q8s16 = vaddq_s16(q4s16, q1s16);
q9s16 = vaddq_s16(q5s16, q0s16);
q6s16 = vsubq_s16(q5s16, q0s16);
q7s16 = vsubq_s16(q4s16, q1s16);
STORE_IN_OUTPUT(23, 22, 23, q6s16, q7s16);
STORE_IN_OUTPUT(23, 8, 9, q8s16, q9s16);
LOAD_FROM_OUTPUT(9, 24, 25, q0s16, q1s16);
q4s16 = vaddq_s16(q2s16, q1s16);
q5s16 = vaddq_s16(q3s16, q0s16);
q6s16 = vsubq_s16(q3s16, q0s16);
q7s16 = vsubq_s16(q2s16, q1s16);
STORE_IN_OUTPUT(25, 24, 25, q6s16, q7s16);
STORE_IN_OUTPUT(25, 6, 7, q4s16, q5s16);
return;
}
static INLINE void idct32_bands_end_2nd_pass(
int16_t *out, uint8_t *dest, int stride, int16x8_t q2s16, int16x8_t q3s16,
int16x8_t q6s16, int16x8_t q7s16, int16x8_t q8s16, int16x8_t q9s16,
int16x8_t q10s16, int16x8_t q11s16, int16x8_t q12s16, int16x8_t q13s16,
int16x8_t q14s16, int16x8_t q15s16) {
uint8_t *r6 = dest + 31 * stride;
uint8_t *r7 = dest /* + 0 * stride*/;
uint8_t *r9 = dest + 15 * stride;
uint8_t *r10 = dest + 16 * stride;
int str2 = stride << 1;
int16x8_t q0s16, q1s16, q4s16, q5s16;
STORE_COMBINE_CENTER_RESULTS(r10, r9);
r10 += str2;
r9 -= str2;
LOAD_FROM_OUTPUT(17, 30, 31, q0s16, q1s16)
q4s16 = vaddq_s16(q2s16, q1s16);
q5s16 = vaddq_s16(q3s16, q0s16);
q6s16 = vsubq_s16(q3s16, q0s16);
q7s16 = vsubq_s16(q2s16, q1s16);
STORE_COMBINE_EXTREME_RESULTS(r7, r6);
r7 += str2;
r6 -= str2;
LOAD_FROM_OUTPUT(31, 12, 13, q0s16, q1s16)
q2s16 = vaddq_s16(q10s16, q1s16);
q3s16 = vaddq_s16(q11s16, q0s16);
q4s16 = vsubq_s16(q11s16, q0s16);
q5s16 = vsubq_s16(q10s16, q1s16);
LOAD_FROM_OUTPUT(13, 18, 19, q0s16, q1s16)
q8s16 = vaddq_s16(q4s16, q1s16);
q9s16 = vaddq_s16(q5s16, q0s16);
q6s16 = vsubq_s16(q5s16, q0s16);
q7s16 = vsubq_s16(q4s16, q1s16);
STORE_COMBINE_CENTER_RESULTS(r10, r9);
r10 += str2;
r9 -= str2;
LOAD_FROM_OUTPUT(19, 28, 29, q0s16, q1s16)
q4s16 = vaddq_s16(q2s16, q1s16);
q5s16 = vaddq_s16(q3s16, q0s16);
q6s16 = vsubq_s16(q3s16, q0s16);
q7s16 = vsubq_s16(q2s16, q1s16);
STORE_COMBINE_EXTREME_RESULTS(r7, r6);
r7 += str2;
r6 -= str2;
LOAD_FROM_OUTPUT(29, 10, 11, q0s16, q1s16)
q2s16 = vaddq_s16(q12s16, q1s16);
q3s16 = vaddq_s16(q13s16, q0s16);
q4s16 = vsubq_s16(q13s16, q0s16);
q5s16 = vsubq_s16(q12s16, q1s16);
LOAD_FROM_OUTPUT(11, 20, 21, q0s16, q1s16)
q8s16 = vaddq_s16(q4s16, q1s16);
q9s16 = vaddq_s16(q5s16, q0s16);
q6s16 = vsubq_s16(q5s16, q0s16);
q7s16 = vsubq_s16(q4s16, q1s16);
STORE_COMBINE_CENTER_RESULTS(r10, r9);
r10 += str2;
r9 -= str2;
LOAD_FROM_OUTPUT(21, 26, 27, q0s16, q1s16)
q4s16 = vaddq_s16(q2s16, q1s16);
q5s16 = vaddq_s16(q3s16, q0s16);
q6s16 = vsubq_s16(q3s16, q0s16);
q7s16 = vsubq_s16(q2s16, q1s16);
STORE_COMBINE_EXTREME_RESULTS(r7, r6);
r7 += str2;
r6 -= str2;
LOAD_FROM_OUTPUT(27, 8, 9, q0s16, q1s16)
q2s16 = vaddq_s16(q14s16, q1s16);
q3s16 = vaddq_s16(q15s16, q0s16);
q4s16 = vsubq_s16(q15s16, q0s16);
q5s16 = vsubq_s16(q14s16, q1s16);
LOAD_FROM_OUTPUT(9, 22, 23, q0s16, q1s16)
q8s16 = vaddq_s16(q4s16, q1s16);
q9s16 = vaddq_s16(q5s16, q0s16);
q6s16 = vsubq_s16(q5s16, q0s16);
q7s16 = vsubq_s16(q4s16, q1s16);
STORE_COMBINE_CENTER_RESULTS(r10, r9);
LOAD_FROM_OUTPUT(23, 24, 25, q0s16, q1s16)
q4s16 = vaddq_s16(q2s16, q1s16);
q5s16 = vaddq_s16(q3s16, q0s16);
q6s16 = vsubq_s16(q3s16, q0s16);
q7s16 = vsubq_s16(q2s16, q1s16);
STORE_COMBINE_EXTREME_RESULTS(r7, r6);
return;
}
void aom_idct32x32_1024_add_neon(int16_t *input, uint8_t *dest, int stride) {
int i, idct32_pass_loop;
int16_t trans_buf[32 * 8];
int16_t pass1[32 * 32];
int16_t pass2[32 * 32];
int16_t *out;
int16x8_t q0s16, q1s16, q2s16, q3s16, q4s16, q5s16, q6s16, q7s16;
int16x8_t q8s16, q9s16, q10s16, q11s16, q12s16, q13s16, q14s16, q15s16;
for (idct32_pass_loop = 0, out = pass1; idct32_pass_loop < 2;
idct32_pass_loop++,
input = pass1, // the input of pass2 is the result of pass1
out = pass2) {
for (i = 0; i < 4; i++, input += 32 * 8, out += 8) { // idct32_bands_loop
idct32_transpose_pair(input, trans_buf);
// -----------------------------------------
// BLOCK A: 16-19,28-31
// -----------------------------------------
// generate 16,17,30,31
// part of stage 1
LOAD_FROM_TRANSPOSED(0, 1, 31)
DO_BUTTERFLY_STD(cospi_31_64, cospi_1_64, &q0s16, &q2s16)
LOAD_FROM_TRANSPOSED(31, 17, 15)
DO_BUTTERFLY_STD(cospi_15_64, cospi_17_64, &q1s16, &q3s16)
// part of stage 2
q4s16 = vaddq_s16(q0s16, q1s16);
q13s16 = vsubq_s16(q0s16, q1s16);
q6s16 = vaddq_s16(q2s16, q3s16);
q14s16 = vsubq_s16(q2s16, q3s16);
// part of stage 3
DO_BUTTERFLY_STD(cospi_28_64, cospi_4_64, &q5s16, &q7s16)
// generate 18,19,28,29
// part of stage 1
LOAD_FROM_TRANSPOSED(15, 9, 23)
DO_BUTTERFLY_STD(cospi_23_64, cospi_9_64, &q0s16, &q2s16)
LOAD_FROM_TRANSPOSED(23, 25, 7)
DO_BUTTERFLY_STD(cospi_7_64, cospi_25_64, &q1s16, &q3s16)
// part of stage 2
q13s16 = vsubq_s16(q3s16, q2s16);
q3s16 = vaddq_s16(q3s16, q2s16);
q14s16 = vsubq_s16(q1s16, q0s16);
q2s16 = vaddq_s16(q1s16, q0s16);
// part of stage 3
DO_BUTTERFLY_STD(-cospi_4_64, -cospi_28_64, &q1s16, &q0s16)
// part of stage 4
q8s16 = vaddq_s16(q4s16, q2s16);
q9s16 = vaddq_s16(q5s16, q0s16);
q10s16 = vaddq_s16(q7s16, q1s16);
q15s16 = vaddq_s16(q6s16, q3s16);
q13s16 = vsubq_s16(q5s16, q0s16);
q14s16 = vsubq_s16(q7s16, q1s16);
STORE_IN_OUTPUT(0, 16, 31, q8s16, q15s16)
STORE_IN_OUTPUT(31, 17, 30, q9s16, q10s16)
// part of stage 5
DO_BUTTERFLY_STD(cospi_24_64, cospi_8_64, &q0s16, &q1s16)
STORE_IN_OUTPUT(30, 29, 18, q1s16, q0s16)
// part of stage 4
q13s16 = vsubq_s16(q4s16, q2s16);
q14s16 = vsubq_s16(q6s16, q3s16);
// part of stage 5
DO_BUTTERFLY_STD(cospi_24_64, cospi_8_64, &q4s16, &q6s16)
STORE_IN_OUTPUT(18, 19, 28, q4s16, q6s16)
// -----------------------------------------
// BLOCK B: 20-23,24-27
// -----------------------------------------
// generate 20,21,26,27
// part of stage 1
LOAD_FROM_TRANSPOSED(7, 5, 27)
DO_BUTTERFLY_STD(cospi_27_64, cospi_5_64, &q0s16, &q2s16)
LOAD_FROM_TRANSPOSED(27, 21, 11)
DO_BUTTERFLY_STD(cospi_11_64, cospi_21_64, &q1s16, &q3s16)
// part of stage 2
q13s16 = vsubq_s16(q0s16, q1s16);
q0s16 = vaddq_s16(q0s16, q1s16);
q14s16 = vsubq_s16(q2s16, q3s16);
q2s16 = vaddq_s16(q2s16, q3s16);
// part of stage 3
DO_BUTTERFLY_STD(cospi_12_64, cospi_20_64, &q1s16, &q3s16)
// generate 22,23,24,25
// part of stage 1
LOAD_FROM_TRANSPOSED(11, 13, 19)
DO_BUTTERFLY_STD(cospi_19_64, cospi_13_64, &q5s16, &q7s16)
LOAD_FROM_TRANSPOSED(19, 29, 3)
DO_BUTTERFLY_STD(cospi_3_64, cospi_29_64, &q4s16, &q6s16)
// part of stage 2
q14s16 = vsubq_s16(q4s16, q5s16);
q5s16 = vaddq_s16(q4s16, q5s16);
q13s16 = vsubq_s16(q6s16, q7s16);
q6s16 = vaddq_s16(q6s16, q7s16);
// part of stage 3
DO_BUTTERFLY_STD(-cospi_20_64, -cospi_12_64, &q4s16, &q7s16)
// part of stage 4
q10s16 = vaddq_s16(q7s16, q1s16);
q11s16 = vaddq_s16(q5s16, q0s16);
q12s16 = vaddq_s16(q6s16, q2s16);
q15s16 = vaddq_s16(q4s16, q3s16);
// part of stage 6
LOAD_FROM_OUTPUT(28, 16, 17, q14s16, q13s16)
q8s16 = vaddq_s16(q14s16, q11s16);
q9s16 = vaddq_s16(q13s16, q10s16);
q13s16 = vsubq_s16(q13s16, q10s16);
q11s16 = vsubq_s16(q14s16, q11s16);
STORE_IN_OUTPUT(17, 17, 16, q9s16, q8s16)
LOAD_FROM_OUTPUT(16, 30, 31, q14s16, q9s16)
q8s16 = vsubq_s16(q9s16, q12s16);
q10s16 = vaddq_s16(q14s16, q15s16);
q14s16 = vsubq_s16(q14s16, q15s16);
q12s16 = vaddq_s16(q9s16, q12s16);
STORE_IN_OUTPUT(31, 30, 31, q10s16, q12s16)
// part of stage 7
DO_BUTTERFLY_STD(cospi_16_64, cospi_16_64, &q13s16, &q14s16)
STORE_IN_OUTPUT(31, 25, 22, q14s16, q13s16)
q13s16 = q11s16;
q14s16 = q8s16;
DO_BUTTERFLY_STD(cospi_16_64, cospi_16_64, &q13s16, &q14s16)
STORE_IN_OUTPUT(22, 24, 23, q14s16, q13s16)
// part of stage 4
q14s16 = vsubq_s16(q5s16, q0s16);
q13s16 = vsubq_s16(q6s16, q2s16);
DO_BUTTERFLY_STD(-cospi_8_64, -cospi_24_64, &q5s16, &q6s16);
q14s16 = vsubq_s16(q7s16, q1s16);
q13s16 = vsubq_s16(q4s16, q3s16);
DO_BUTTERFLY_STD(-cospi_8_64, -cospi_24_64, &q0s16, &q1s16);
// part of stage 6
LOAD_FROM_OUTPUT(23, 18, 19, q14s16, q13s16)
q8s16 = vaddq_s16(q14s16, q1s16);
q9s16 = vaddq_s16(q13s16, q6s16);
q13s16 = vsubq_s16(q13s16, q6s16);
q1s16 = vsubq_s16(q14s16, q1s16);
STORE_IN_OUTPUT(19, 18, 19, q8s16, q9s16)
LOAD_FROM_OUTPUT(19, 28, 29, q8s16, q9s16)
q14s16 = vsubq_s16(q8s16, q5s16);
q10s16 = vaddq_s16(q8s16, q5s16);
q11s16 = vaddq_s16(q9s16, q0s16);
q0s16 = vsubq_s16(q9s16, q0s16);
STORE_IN_OUTPUT(29, 28, 29, q10s16, q11s16)
// part of stage 7
DO_BUTTERFLY_STD(cospi_16_64, cospi_16_64, &q13s16, &q14s16)
STORE_IN_OUTPUT(29, 20, 27, q13s16, q14s16)
DO_BUTTERFLY(q0s16, q1s16, cospi_16_64, cospi_16_64, &q1s16, &q0s16);
STORE_IN_OUTPUT(27, 21, 26, q1s16, q0s16)
// -----------------------------------------
// BLOCK C: 8-11,12-15
// -----------------------------------------
// generate 8,9,14,15
// part of stage 2
LOAD_FROM_TRANSPOSED(3, 2, 30)
DO_BUTTERFLY_STD(cospi_30_64, cospi_2_64, &q0s16, &q2s16)
LOAD_FROM_TRANSPOSED(30, 18, 14)
DO_BUTTERFLY_STD(cospi_14_64, cospi_18_64, &q1s16, &q3s16)
// part of stage 3
q13s16 = vsubq_s16(q0s16, q1s16);
q0s16 = vaddq_s16(q0s16, q1s16);
q14s16 = vsubq_s16(q2s16, q3s16);
q2s16 = vaddq_s16(q2s16, q3s16);
// part of stage 4
DO_BUTTERFLY_STD(cospi_24_64, cospi_8_64, &q1s16, &q3s16)
// generate 10,11,12,13
// part of stage 2
LOAD_FROM_TRANSPOSED(14, 10, 22)
DO_BUTTERFLY_STD(cospi_22_64, cospi_10_64, &q5s16, &q7s16)
LOAD_FROM_TRANSPOSED(22, 26, 6)
DO_BUTTERFLY_STD(cospi_6_64, cospi_26_64, &q4s16, &q6s16)
// part of stage 3
q14s16 = vsubq_s16(q4s16, q5s16);
q5s16 = vaddq_s16(q4s16, q5s16);
q13s16 = vsubq_s16(q6s16, q7s16);
q6s16 = vaddq_s16(q6s16, q7s16);
// part of stage 4
DO_BUTTERFLY_STD(-cospi_8_64, -cospi_24_64, &q4s16, &q7s16)
// part of stage 5
q8s16 = vaddq_s16(q0s16, q5s16);
q9s16 = vaddq_s16(q1s16, q7s16);
q13s16 = vsubq_s16(q1s16, q7s16);
q14s16 = vsubq_s16(q3s16, q4s16);
q10s16 = vaddq_s16(q3s16, q4s16);
q15s16 = vaddq_s16(q2s16, q6s16);
STORE_IN_OUTPUT(26, 8, 15, q8s16, q15s16)
STORE_IN_OUTPUT(15, 9, 14, q9s16, q10s16)
// part of stage 6
DO_BUTTERFLY_STD(cospi_16_64, cospi_16_64, &q1s16, &q3s16)
STORE_IN_OUTPUT(14, 13, 10, q3s16, q1s16)
q13s16 = vsubq_s16(q0s16, q5s16);
q14s16 = vsubq_s16(q2s16, q6s16);
DO_BUTTERFLY_STD(cospi_16_64, cospi_16_64, &q1s16, &q3s16)
STORE_IN_OUTPUT(10, 11, 12, q1s16, q3s16)
// -----------------------------------------
// BLOCK D: 0-3,4-7
// -----------------------------------------
// generate 4,5,6,7
// part of stage 3
LOAD_FROM_TRANSPOSED(6, 4, 28)
DO_BUTTERFLY_STD(cospi_28_64, cospi_4_64, &q0s16, &q2s16)
LOAD_FROM_TRANSPOSED(28, 20, 12)
DO_BUTTERFLY_STD(cospi_12_64, cospi_20_64, &q1s16, &q3s16)
// part of stage 4
q13s16 = vsubq_s16(q0s16, q1s16);
q0s16 = vaddq_s16(q0s16, q1s16);
q14s16 = vsubq_s16(q2s16, q3s16);
q2s16 = vaddq_s16(q2s16, q3s16);
// part of stage 5
DO_BUTTERFLY_STD(cospi_16_64, cospi_16_64, &q1s16, &q3s16)
// generate 0,1,2,3
// part of stage 4
LOAD_FROM_TRANSPOSED(12, 0, 16)
DO_BUTTERFLY_STD(cospi_16_64, cospi_16_64, &q5s16, &q7s16)
LOAD_FROM_TRANSPOSED(16, 8, 24)
DO_BUTTERFLY_STD(cospi_24_64, cospi_8_64, &q14s16, &q6s16)
// part of stage 5
q4s16 = vaddq_s16(q7s16, q6s16);
q7s16 = vsubq_s16(q7s16, q6s16);
q6s16 = vsubq_s16(q5s16, q14s16);
q5s16 = vaddq_s16(q5s16, q14s16);
// part of stage 6
q8s16 = vaddq_s16(q4s16, q2s16);
q9s16 = vaddq_s16(q5s16, q3s16);
q10s16 = vaddq_s16(q6s16, q1s16);
q11s16 = vaddq_s16(q7s16, q0s16);
q12s16 = vsubq_s16(q7s16, q0s16);
q13s16 = vsubq_s16(q6s16, q1s16);
q14s16 = vsubq_s16(q5s16, q3s16);
q15s16 = vsubq_s16(q4s16, q2s16);
// part of stage 7
LOAD_FROM_OUTPUT(12, 14, 15, q0s16, q1s16)
q2s16 = vaddq_s16(q8s16, q1s16);
q3s16 = vaddq_s16(q9s16, q0s16);
q4s16 = vsubq_s16(q9s16, q0s16);
q5s16 = vsubq_s16(q8s16, q1s16);
LOAD_FROM_OUTPUT(15, 16, 17, q0s16, q1s16)
q8s16 = vaddq_s16(q4s16, q1s16);
q9s16 = vaddq_s16(q5s16, q0s16);
q6s16 = vsubq_s16(q5s16, q0s16);
q7s16 = vsubq_s16(q4s16, q1s16);
if (idct32_pass_loop == 0) {
idct32_bands_end_1st_pass(out, q2s16, q3s16, q6s16, q7s16, q8s16, q9s16,
q10s16, q11s16, q12s16, q13s16, q14s16,
q15s16);
} else {
idct32_bands_end_2nd_pass(out, dest, stride, q2s16, q3s16, q6s16, q7s16,
q8s16, q9s16, q10s16, q11s16, q12s16, q13s16,
q14s16, q15s16);
dest += 8;
}
}
}
return;
}
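
For readers following the `DO_BUTTERFLY` helper above, the per-lane arithmetic can be sketched in scalar C. This is only an illustration, not code from the commit: `sat16`, `round_shift14`, and `butterfly_lane` are hypothetical names, and the sketch assumes that `>>` on a negative `int32_t` is an arithmetic shift (true on mainstream compilers, but implementation-defined in C). `x14` and `x13` stand for one lane of `q14s16` and `q13s16`.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate a 32-bit value to the int16_t range, mirroring the narrowing
 * step of vqrshrn_n_s32. */
static int16_t sat16(int32_t v) {
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return (int16_t)v;
}

/* Rounding right shift by 14 bits. Assumes arithmetic shift for negatives. */
static int32_t round_shift14(int32_t v) { return (v + (1 << 13)) >> 14; }

/* One lane of the butterfly (hypothetical scalar equivalent):
 *   a = round((x14 * c1 - x13 * c2) / 2^14)
 *   b = round((x14 * c2 + x13 * c1) / 2^14)
 * where c1 and c2 play the role of first_const and second_const. */
static void butterfly_lane(int16_t x14, int16_t x13, int16_t c1, int16_t c2,
                           int16_t *a, int16_t *b) {
  assert(a && b);
  *a = sat16(round_shift14((int32_t)x14 * c1 - (int32_t)x13 * c2));
  *b = sat16(round_shift14((int32_t)x14 * c2 + (int32_t)x13 * c1));
}
```

The NEON version computes the same two expressions for eight lanes at once, splitting each 128-bit register into low and high halves because `vmull_s16` widens four lanes at a time.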


@@ -1,28 +1,31 @@
; ;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license and patent ; This source code is subject to the terms of the BSD 2 Clause License and
; grant that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. All contributing project authors may be found in the AUTHORS ; was not distributed with this source code in the LICENSE file, you can
; file in the root of the source tree. ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
; ;
EXPORT |vpx_idct4x4_1_add_neon|
EXPORT |aom_idct4x4_1_add_neon|
ARM ARM
REQUIRE8 REQUIRE8
PRESERVE8 PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2 AREA ||.text||, CODE, READONLY, ALIGN=2
;void vpx_idct4x4_1_add_neon(int16_t *input, uint8_t *dest, ;void aom_idct4x4_1_add_neon(int16_t *input, uint8_t *dest,
; int dest_stride) ; int dest_stride)
; ;
; r0 int16_t input ; r0 int16_t input
; r1 uint8_t *dest ; r1 uint8_t *dest
; r2 int dest_stride) ; r2 int dest_stride)
|vpx_idct4x4_1_add_neon| PROC |aom_idct4x4_1_add_neon| PROC
ldrsh r0, [r0] ldrsh r0, [r0]
; generate cospi_16_64 = 11585 ; generate cospi_16_64 = 11585
@@ -63,6 +66,6 @@
vst1.32 {d7[1]}, [r12] vst1.32 {d7[1]}, [r12]
bx lr bx lr
ENDP ; |vpx_idct4x4_1_add_neon| ENDP ; |aom_idct4x4_1_add_neon|
END END


@@ -0,0 +1,47 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "aom_dsp/inv_txfm.h"
#include "aom_ports/mem.h"
void aom_idct4x4_1_add_neon(int16_t *input, uint8_t *dest, int dest_stride) {
uint8x8_t d6u8;
uint32x2_t d2u32 = vdup_n_u32(0);
uint16x8_t q8u16;
int16x8_t q0s16;
uint8_t *d1, *d2;
int16_t i, a1;
int16_t out = dct_const_round_shift(input[0] * cospi_16_64);
out = dct_const_round_shift(out * cospi_16_64);
a1 = ROUND_POWER_OF_TWO(out, 4);
q0s16 = vdupq_n_s16(a1);
// dc_only_idct_add
d1 = d2 = dest;
for (i = 0; i < 2; i++) {
d2u32 = vld1_lane_u32((const uint32_t *)d1, d2u32, 0);
d1 += dest_stride;
d2u32 = vld1_lane_u32((const uint32_t *)d1, d2u32, 1);
d1 += dest_stride;
q8u16 = vaddw_u8(vreinterpretq_u16_s16(q0s16), vreinterpret_u8_u32(d2u32));
d6u8 = vqmovun_s16(vreinterpretq_s16_u16(q8u16));
vst1_lane_u32((uint32_t *)d2, vreinterpret_u32_u8(d6u8), 0);
d2 += dest_stride;
vst1_lane_u32((uint32_t *)d2, vreinterpret_u32_u8(d6u8), 1);
d2 += dest_stride;
}
return;
}
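
As a cross-check for the DC-only path above, here is a minimal scalar sketch of what `aom_idct4x4_1_add_neon` appears to compute. It is an illustration under stated assumptions rather than the library's reference code: `round_shift`, `clip_pixel`, and `idct4x4_dc_add_ref` are hypothetical names, the constant 11585 is the `cospi_16_64` value quoted in the assembly comments, and right shifts of negative values are assumed to be arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define COSPI_16_64 11585 /* value quoted in the assembly comments */

/* Rounding right shift; assumes arithmetic shift for negative values. */
static int32_t round_shift(int32_t v, int bits) {
  return (v + (1 << (bits - 1))) >> bits;
}

/* Clamp to the 8-bit pixel range, as vqmovun_s16 does when narrowing. */
static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical scalar equivalent of the DC-only 4x4 inverse transform:
 * scale the DC coefficient twice by cospi_16_64, round by 4 bits, then add
 * the resulting constant to every pixel of the 4x4 block. */
static void idct4x4_dc_add_ref(const int16_t *input, uint8_t *dest,
                               int stride) {
  assert(input && dest);
  int16_t out = (int16_t)round_shift(input[0] * COSPI_16_64, 14);
  out = (int16_t)round_shift(out * COSPI_16_64, 14);
  const int a1 = round_shift(out, 4);
  for (int r = 0; r < 4; ++r) {
    for (int c = 0; c < 4; ++c) {
      dest[r * stride + c] = clip_pixel(dest[r * stride + c] + a1);
    }
  }
}
```

The NEON code reaches the same result two rows at a time: `vaddw_u8` widens the pixels and adds the broadcast constant, and `vqmovun_s16` performs the clamp.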


@@ -1,14 +1,17 @@
; ;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license ; This source code is subject to the terms of the BSD 2 Clause License and
; that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. An additional intellectual property rights grant can be found ; was not distributed with this source code in the LICENSE file, you can
; in the file PATENTS. All contributing project authors may ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; be found in the AUTHORS file in the root of the source tree. ; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
; ;
EXPORT |vpx_idct4x4_16_add_neon| ;
EXPORT |aom_idct4x4_16_add_neon|
ARM ARM
REQUIRE8 REQUIRE8
PRESERVE8 PRESERVE8
@@ -16,13 +19,13 @@
AREA ||.text||, CODE, READONLY, ALIGN=2 AREA ||.text||, CODE, READONLY, ALIGN=2
AREA Block, CODE, READONLY ; name this block of code AREA Block, CODE, READONLY ; name this block of code
;void vpx_idct4x4_16_add_neon(int16_t *input, uint8_t *dest, int dest_stride) ;void aom_idct4x4_16_add_neon(int16_t *input, uint8_t *dest, int dest_stride)
; ;
; r0 int16_t input ; r0 int16_t input
; r1 uint8_t *dest ; r1 uint8_t *dest
; r2 int dest_stride) ; r2 int dest_stride)
|vpx_idct4x4_16_add_neon| PROC |aom_idct4x4_16_add_neon| PROC
; The 2D transform is done with two passes which are actually pretty ; The 2D transform is done with two passes which are actually pretty
; similar. We first transform the rows. This is done by transposing ; similar. We first transform the rows. This is done by transposing
@@ -185,6 +188,6 @@
vst1.32 {d26[1]}, [r1], r2 vst1.32 {d26[1]}, [r1], r2
vst1.32 {d26[0]}, [r1] ; no post-increment vst1.32 {d26[0]}, [r1] ; no post-increment
bx lr bx lr
ENDP ; |vpx_idct4x4_16_add_neon| ENDP ; |aom_idct4x4_16_add_neon|
END END


@@ -0,0 +1,146 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "aom_dsp/txfm_common.h"
void aom_idct4x4_16_add_neon(int16_t *input, uint8_t *dest, int dest_stride) {
uint8x8_t d26u8, d27u8;
uint32x2_t d26u32, d27u32;
uint16x8_t q8u16, q9u16;
int16x4_t d16s16, d17s16, d18s16, d19s16, d20s16, d21s16;
int16x4_t d22s16, d23s16, d24s16, d26s16, d27s16, d28s16, d29s16;
int16x8_t q8s16, q9s16, q13s16, q14s16;
int32x4_t q1s32, q13s32, q14s32, q15s32;
int16x4x2_t d0x2s16, d1x2s16;
int32x4x2_t q0x2s32;
uint8_t *d;
d26u32 = d27u32 = vdup_n_u32(0);
q8s16 = vld1q_s16(input);
q9s16 = vld1q_s16(input + 8);
d16s16 = vget_low_s16(q8s16);
d17s16 = vget_high_s16(q8s16);
d18s16 = vget_low_s16(q9s16);
d19s16 = vget_high_s16(q9s16);
d0x2s16 = vtrn_s16(d16s16, d17s16);
d1x2s16 = vtrn_s16(d18s16, d19s16);
q8s16 = vcombine_s16(d0x2s16.val[0], d0x2s16.val[1]);
q9s16 = vcombine_s16(d1x2s16.val[0], d1x2s16.val[1]);
d20s16 = vdup_n_s16((int16_t)cospi_8_64);
d21s16 = vdup_n_s16((int16_t)cospi_16_64);
q0x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(q8s16), vreinterpretq_s32_s16(q9s16));
d16s16 = vget_low_s16(vreinterpretq_s16_s32(q0x2s32.val[0]));
d17s16 = vget_high_s16(vreinterpretq_s16_s32(q0x2s32.val[0]));
d18s16 = vget_low_s16(vreinterpretq_s16_s32(q0x2s32.val[1]));
d19s16 = vget_high_s16(vreinterpretq_s16_s32(q0x2s32.val[1]));
d22s16 = vdup_n_s16((int16_t)cospi_24_64);
// stage 1
d23s16 = vadd_s16(d16s16, d18s16);
d24s16 = vsub_s16(d16s16, d18s16);
q15s32 = vmull_s16(d17s16, d22s16);
q1s32 = vmull_s16(d17s16, d20s16);
q13s32 = vmull_s16(d23s16, d21s16);
q14s32 = vmull_s16(d24s16, d21s16);
q15s32 = vmlsl_s16(q15s32, d19s16, d20s16);
q1s32 = vmlal_s16(q1s32, d19s16, d22s16);
d26s16 = vqrshrn_n_s32(q13s32, 14);
d27s16 = vqrshrn_n_s32(q14s32, 14);
d29s16 = vqrshrn_n_s32(q15s32, 14);
d28s16 = vqrshrn_n_s32(q1s32, 14);
q13s16 = vcombine_s16(d26s16, d27s16);
q14s16 = vcombine_s16(d28s16, d29s16);
// stage 2
q8s16 = vaddq_s16(q13s16, q14s16);
q9s16 = vsubq_s16(q13s16, q14s16);
d16s16 = vget_low_s16(q8s16);
d17s16 = vget_high_s16(q8s16);
d18s16 = vget_high_s16(q9s16); // vswp d18 d19
d19s16 = vget_low_s16(q9s16);
d0x2s16 = vtrn_s16(d16s16, d17s16);
d1x2s16 = vtrn_s16(d18s16, d19s16);
q8s16 = vcombine_s16(d0x2s16.val[0], d0x2s16.val[1]);
q9s16 = vcombine_s16(d1x2s16.val[0], d1x2s16.val[1]);
q0x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(q8s16), vreinterpretq_s32_s16(q9s16));
d16s16 = vget_low_s16(vreinterpretq_s16_s32(q0x2s32.val[0]));
d17s16 = vget_high_s16(vreinterpretq_s16_s32(q0x2s32.val[0]));
d18s16 = vget_low_s16(vreinterpretq_s16_s32(q0x2s32.val[1]));
d19s16 = vget_high_s16(vreinterpretq_s16_s32(q0x2s32.val[1]));
// do the transform on columns
// stage 1
d23s16 = vadd_s16(d16s16, d18s16);
d24s16 = vsub_s16(d16s16, d18s16);
q15s32 = vmull_s16(d17s16, d22s16);
q1s32 = vmull_s16(d17s16, d20s16);
q13s32 = vmull_s16(d23s16, d21s16);
q14s32 = vmull_s16(d24s16, d21s16);
q15s32 = vmlsl_s16(q15s32, d19s16, d20s16);
q1s32 = vmlal_s16(q1s32, d19s16, d22s16);
d26s16 = vqrshrn_n_s32(q13s32, 14);
d27s16 = vqrshrn_n_s32(q14s32, 14);
d29s16 = vqrshrn_n_s32(q15s32, 14);
d28s16 = vqrshrn_n_s32(q1s32, 14);
q13s16 = vcombine_s16(d26s16, d27s16);
q14s16 = vcombine_s16(d28s16, d29s16);
// stage 2
q8s16 = vaddq_s16(q13s16, q14s16);
q9s16 = vsubq_s16(q13s16, q14s16);
q8s16 = vrshrq_n_s16(q8s16, 4);
q9s16 = vrshrq_n_s16(q9s16, 4);
d = dest;
d26u32 = vld1_lane_u32((const uint32_t *)d, d26u32, 0);
d += dest_stride;
d26u32 = vld1_lane_u32((const uint32_t *)d, d26u32, 1);
d += dest_stride;
d27u32 = vld1_lane_u32((const uint32_t *)d, d27u32, 1);
d += dest_stride;
d27u32 = vld1_lane_u32((const uint32_t *)d, d27u32, 0);
q8u16 = vaddw_u8(vreinterpretq_u16_s16(q8s16), vreinterpret_u8_u32(d26u32));
q9u16 = vaddw_u8(vreinterpretq_u16_s16(q9s16), vreinterpret_u8_u32(d27u32));
d26u8 = vqmovun_s16(vreinterpretq_s16_u16(q8u16));
d27u8 = vqmovun_s16(vreinterpretq_s16_u16(q9u16));
d = dest;
vst1_lane_u32((uint32_t *)d, vreinterpret_u32_u8(d26u8), 0);
d += dest_stride;
vst1_lane_u32((uint32_t *)d, vreinterpret_u32_u8(d26u8), 1);
d += dest_stride;
vst1_lane_u32((uint32_t *)d, vreinterpret_u32_u8(d27u8), 1);
d += dest_stride;
vst1_lane_u32((uint32_t *)d, vreinterpret_u32_u8(d27u8), 0);
return;
}
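
The two "stage 1 / stage 2" groups in the function above apply the same 4-point inverse DCT, first to rows and then to columns. A scalar sketch of one such 1-D pass follows. It is an assumption-labelled illustration: `idct4_ref` and `round_shift14` are hypothetical names, the constants are the usual 14-bit fixed-point cosine values (`cospi_8_64` = 15137, `cospi_16_64` = 11585, `cospi_24_64` = 6270), and the 16-bit wraparound and saturation corner cases of the NEON code are ignored.

```c
#include <assert.h>
#include <stdint.h>

#define COSPI_8_64 15137
#define COSPI_16_64 11585
#define COSPI_24_64 6270

/* Rounding right shift by 14 bits; assumes arithmetic shift for negatives. */
static int32_t round_shift14(int32_t v) { return (v + (1 << 13)) >> 14; }

/* Hypothetical scalar 4-point inverse DCT, one row or one column. */
static void idct4_ref(const int16_t *in, int16_t *out) {
  assert(in && out);
  /* stage 1: even half uses cospi_16_64, odd half is a rotation */
  const int32_t s0 = round_shift14((in[0] + in[2]) * COSPI_16_64);
  const int32_t s1 = round_shift14((in[0] - in[2]) * COSPI_16_64);
  const int32_t s2 = round_shift14(in[1] * COSPI_24_64 - in[3] * COSPI_8_64);
  const int32_t s3 = round_shift14(in[1] * COSPI_8_64 + in[3] * COSPI_24_64);
  /* stage 2: final add/subtract butterflies */
  out[0] = (int16_t)(s0 + s3);
  out[1] = (int16_t)(s1 + s2);
  out[2] = (int16_t)(s1 - s2);
  out[3] = (int16_t)(s0 - s3);
}
```

In the NEON code the subtraction `q9s16 = q13s16 - q14s16` yields outputs 3 and 2 in that order, which is why the row pass swaps the halves of `q9s16` (the `vswp d18 d19` comment) and the final stores write `d27u8` lane 1 before lane 0.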


@@ -1,28 +1,31 @@
; ;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license and patent ; This source code is subject to the terms of the BSD 2 Clause License and
; grant that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. All contributing project authors may be found in the AUTHORS ; was not distributed with this source code in the LICENSE file, you can
; file in the root of the source tree. ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
; ;
EXPORT |vpx_idct8x8_1_add_neon|
EXPORT |aom_idct8x8_1_add_neon|
ARM ARM
REQUIRE8 REQUIRE8
PRESERVE8 PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2 AREA ||.text||, CODE, READONLY, ALIGN=2
;void vpx_idct8x8_1_add_neon(int16_t *input, uint8_t *dest, ;void aom_idct8x8_1_add_neon(int16_t *input, uint8_t *dest,
; int dest_stride) ; int dest_stride)
; ;
; r0 int16_t input ; r0 int16_t input
; r1 uint8_t *dest ; r1 uint8_t *dest
; r2 int dest_stride) ; r2 int dest_stride)
|vpx_idct8x8_1_add_neon| PROC |aom_idct8x8_1_add_neon| PROC
ldrsh r0, [r0] ldrsh r0, [r0]
; generate cospi_16_64 = 11585 ; generate cospi_16_64 = 11585
@@ -83,6 +86,6 @@
vst1.64 {d31}, [r12], r2 vst1.64 {d31}, [r12], r2
bx lr bx lr
ENDP ; |vpx_idct8x8_1_add_neon| ENDP ; |aom_idct8x8_1_add_neon|
END END


@@ -0,0 +1,62 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "aom_dsp/inv_txfm.h"
#include "aom_ports/mem.h"
void aom_idct8x8_1_add_neon(int16_t *input, uint8_t *dest, int dest_stride) {
uint8x8_t d2u8, d3u8, d30u8, d31u8;
uint64x1_t d2u64, d3u64, d4u64, d5u64;
uint16x8_t q0u16, q9u16, q10u16, q11u16, q12u16;
int16x8_t q0s16;
uint8_t *d1, *d2;
int16_t i, a1;
int16_t out = dct_const_round_shift(input[0] * cospi_16_64);
out = dct_const_round_shift(out * cospi_16_64);
a1 = ROUND_POWER_OF_TWO(out, 5);
q0s16 = vdupq_n_s16(a1);
q0u16 = vreinterpretq_u16_s16(q0s16);
d1 = d2 = dest;
for (i = 0; i < 2; i++) {
d2u64 = vld1_u64((const uint64_t *)d1);
d1 += dest_stride;
d3u64 = vld1_u64((const uint64_t *)d1);
d1 += dest_stride;
d4u64 = vld1_u64((const uint64_t *)d1);
d1 += dest_stride;
d5u64 = vld1_u64((const uint64_t *)d1);
d1 += dest_stride;
q9u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d2u64));
q10u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d3u64));
q11u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d4u64));
q12u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d5u64));
d2u8 = vqmovun_s16(vreinterpretq_s16_u16(q9u16));
d3u8 = vqmovun_s16(vreinterpretq_s16_u16(q10u16));
d30u8 = vqmovun_s16(vreinterpretq_s16_u16(q11u16));
d31u8 = vqmovun_s16(vreinterpretq_s16_u16(q12u16));
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d2u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d3u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d30u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d31u8));
d2 += dest_stride;
}
return;
}
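
The DC-only path above can be cross-checked against a plain scalar version. The following is a minimal sketch, not part of the commit: the names `idct8x8_dc_add_scalar`, `round_shift` and `clip_pixel` are hypothetical, the constant 11585 is the `cospi_16_64` value quoted in the assembly comments, and the shift of 14 is assumed from the `vqrshrn_n_s32(..., 14)` calls used elsewhere in this change. It also assumes `>>` on negative `int` values is an arithmetic shift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed constants: cospi_16_64 = 11585 and a 14-bit rounding shift. */
#define COSPI_16_64 11585
#define DCT_CONST_BITS 14

/* Rounding right shift; assumes an arithmetic shift for negative values. */
static int round_shift(int value, int bits) {
  return (value + (1 << (bits - 1))) >> bits;
}

/* Clamp to the 8-bit pixel range, mirroring what vqmovun_s16 does per lane. */
static uint8_t clip_pixel(int value) {
  return (uint8_t)(value < 0 ? 0 : (value > 255 ? 255 : value));
}

/* Hypothetical scalar counterpart of aom_idct8x8_1_add_neon: only input[0]
 * is read, and the resulting offset is added to every pixel of the block. */
static void idct8x8_dc_add_scalar(const int16_t *input, uint8_t *dest,
                                  int dest_stride) {
  int row, col, out, a1;
  assert(input != NULL && dest != NULL);
  out = round_shift(input[0] * COSPI_16_64, DCT_CONST_BITS);
  out = round_shift(out * COSPI_16_64, DCT_CONST_BITS);
  a1 = round_shift(out, 5);
  for (row = 0; row < 8; ++row) {
    for (col = 0; col < 8; ++col) {
      dest[col] = clip_pixel(dest[col] + a1);
    }
    dest += dest_stride;
  }
}
```

Comparing this against the NEON function over a range of DC values and destination contents is one way to check the port; the NEON loop handles four rows per iteration, which is why it runs only twice.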

@@ -1,15 +1,18 @@
; ;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license ; This source code is subject to the terms of the BSD 2 Clause License and
; that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. An additional intellectual property rights grant can be found ; was not distributed with this source code in the LICENSE file, you can
; in the file PATENTS. All contributing project authors may ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; be found in the AUTHORS file in the root of the source tree. ; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
; ;
EXPORT |vpx_idct8x8_64_add_neon| ;
EXPORT |vpx_idct8x8_12_add_neon|
EXPORT |aom_idct8x8_64_add_neon|
EXPORT |aom_idct8x8_12_add_neon|
ARM ARM
REQUIRE8 REQUIRE8
PRESERVE8 PRESERVE8
@@ -198,13 +201,13 @@
MEND MEND
AREA Block, CODE, READONLY ; name this block of code AREA Block, CODE, READONLY ; name this block of code
;void vpx_idct8x8_64_add_neon(int16_t *input, uint8_t *dest, int dest_stride) ;void aom_idct8x8_64_add_neon(int16_t *input, uint8_t *dest, int dest_stride)
; ;
; r0 int16_t input ; r0 int16_t input
; r1 uint8_t *dest ; r1 uint8_t *dest
; r2 int dest_stride) ; r2 int dest_stride)
|vpx_idct8x8_64_add_neon| PROC |aom_idct8x8_64_add_neon| PROC
push {r4-r9} push {r4-r9}
vpush {d8-d15} vpush {d8-d15}
vld1.s16 {q8,q9}, [r0]! vld1.s16 {q8,q9}, [r0]!
@@ -308,15 +311,15 @@
vpop {d8-d15} vpop {d8-d15}
pop {r4-r9} pop {r4-r9}
bx lr bx lr
ENDP ; |vpx_idct8x8_64_add_neon| ENDP ; |aom_idct8x8_64_add_neon|
;void vpx_idct8x8_12_add_neon(int16_t *input, uint8_t *dest, int dest_stride) ;void aom_idct8x8_12_add_neon(int16_t *input, uint8_t *dest, int dest_stride)
; ;
; r0 int16_t input ; r0 int16_t input
; r1 uint8_t *dest ; r1 uint8_t *dest
; r2 int dest_stride) ; r2 int dest_stride)
|vpx_idct8x8_12_add_neon| PROC |aom_idct8x8_12_add_neon| PROC
push {r4-r9} push {r4-r9}
vpush {d8-d15} vpush {d8-d15}
vld1.s16 {q8,q9}, [r0]! vld1.s16 {q8,q9}, [r0]!
@@ -514,6 +517,6 @@
vpop {d8-d15} vpop {d8-d15}
pop {r4-r9} pop {r4-r9}
bx lr bx lr
ENDP ; |vpx_idct8x8_12_add_neon| ENDP ; |aom_idct8x8_12_add_neon|
END END

@@ -0,0 +1,509 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_config.h"
#include "aom_dsp/txfm_common.h"
static INLINE void TRANSPOSE8X8(int16x8_t *q8s16, int16x8_t *q9s16,
int16x8_t *q10s16, int16x8_t *q11s16,
int16x8_t *q12s16, int16x8_t *q13s16,
int16x8_t *q14s16, int16x8_t *q15s16) {
int16x4_t d16s16, d17s16, d18s16, d19s16, d20s16, d21s16, d22s16, d23s16;
int16x4_t d24s16, d25s16, d26s16, d27s16, d28s16, d29s16, d30s16, d31s16;
int32x4x2_t q0x2s32, q1x2s32, q2x2s32, q3x2s32;
int16x8x2_t q0x2s16, q1x2s16, q2x2s16, q3x2s16;
d16s16 = vget_low_s16(*q8s16);
d17s16 = vget_high_s16(*q8s16);
d18s16 = vget_low_s16(*q9s16);
d19s16 = vget_high_s16(*q9s16);
d20s16 = vget_low_s16(*q10s16);
d21s16 = vget_high_s16(*q10s16);
d22s16 = vget_low_s16(*q11s16);
d23s16 = vget_high_s16(*q11s16);
d24s16 = vget_low_s16(*q12s16);
d25s16 = vget_high_s16(*q12s16);
d26s16 = vget_low_s16(*q13s16);
d27s16 = vget_high_s16(*q13s16);
d28s16 = vget_low_s16(*q14s16);
d29s16 = vget_high_s16(*q14s16);
d30s16 = vget_low_s16(*q15s16);
d31s16 = vget_high_s16(*q15s16);
*q8s16 = vcombine_s16(d16s16, d24s16); // vswp d17, d24
*q9s16 = vcombine_s16(d18s16, d26s16); // vswp d19, d26
*q10s16 = vcombine_s16(d20s16, d28s16); // vswp d21, d28
*q11s16 = vcombine_s16(d22s16, d30s16); // vswp d23, d30
*q12s16 = vcombine_s16(d17s16, d25s16);
*q13s16 = vcombine_s16(d19s16, d27s16);
*q14s16 = vcombine_s16(d21s16, d29s16);
*q15s16 = vcombine_s16(d23s16, d31s16);
q0x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(*q8s16), vreinterpretq_s32_s16(*q10s16));
q1x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(*q9s16), vreinterpretq_s32_s16(*q11s16));
q2x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(*q12s16), vreinterpretq_s32_s16(*q14s16));
q3x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(*q13s16), vreinterpretq_s32_s16(*q15s16));
q0x2s16 = vtrnq_s16(vreinterpretq_s16_s32(q0x2s32.val[0]), // q8
vreinterpretq_s16_s32(q1x2s32.val[0])); // q9
q1x2s16 = vtrnq_s16(vreinterpretq_s16_s32(q0x2s32.val[1]), // q10
vreinterpretq_s16_s32(q1x2s32.val[1])); // q11
q2x2s16 = vtrnq_s16(vreinterpretq_s16_s32(q2x2s32.val[0]), // q12
vreinterpretq_s16_s32(q3x2s32.val[0])); // q13
q3x2s16 = vtrnq_s16(vreinterpretq_s16_s32(q2x2s32.val[1]), // q14
vreinterpretq_s16_s32(q3x2s32.val[1])); // q15
*q8s16 = q0x2s16.val[0];
*q9s16 = q0x2s16.val[1];
*q10s16 = q1x2s16.val[0];
*q11s16 = q1x2s16.val[1];
*q12s16 = q2x2s16.val[0];
*q13s16 = q2x2s16.val[1];
*q14s16 = q3x2s16.val[0];
*q15s16 = q3x2s16.val[1];
return;
}
static INLINE void IDCT8x8_1D(int16x8_t *q8s16, int16x8_t *q9s16,
int16x8_t *q10s16, int16x8_t *q11s16,
int16x8_t *q12s16, int16x8_t *q13s16,
int16x8_t *q14s16, int16x8_t *q15s16) {
int16x4_t d0s16, d1s16, d2s16, d3s16;
int16x4_t d8s16, d9s16, d10s16, d11s16, d12s16, d13s16, d14s16, d15s16;
int16x4_t d16s16, d17s16, d18s16, d19s16, d20s16, d21s16, d22s16, d23s16;
int16x4_t d24s16, d25s16, d26s16, d27s16, d28s16, d29s16, d30s16, d31s16;
int16x8_t q0s16, q1s16, q2s16, q3s16, q4s16, q5s16, q6s16, q7s16;
int32x4_t q2s32, q3s32, q5s32, q6s32, q8s32, q9s32;
int32x4_t q10s32, q11s32, q12s32, q13s32, q15s32;
d0s16 = vdup_n_s16((int16_t)cospi_28_64);
d1s16 = vdup_n_s16((int16_t)cospi_4_64);
d2s16 = vdup_n_s16((int16_t)cospi_12_64);
d3s16 = vdup_n_s16((int16_t)cospi_20_64);
d16s16 = vget_low_s16(*q8s16);
d17s16 = vget_high_s16(*q8s16);
d18s16 = vget_low_s16(*q9s16);
d19s16 = vget_high_s16(*q9s16);
d20s16 = vget_low_s16(*q10s16);
d21s16 = vget_high_s16(*q10s16);
d22s16 = vget_low_s16(*q11s16);
d23s16 = vget_high_s16(*q11s16);
d24s16 = vget_low_s16(*q12s16);
d25s16 = vget_high_s16(*q12s16);
d26s16 = vget_low_s16(*q13s16);
d27s16 = vget_high_s16(*q13s16);
d28s16 = vget_low_s16(*q14s16);
d29s16 = vget_high_s16(*q14s16);
d30s16 = vget_low_s16(*q15s16);
d31s16 = vget_high_s16(*q15s16);
q2s32 = vmull_s16(d18s16, d0s16);
q3s32 = vmull_s16(d19s16, d0s16);
q5s32 = vmull_s16(d26s16, d2s16);
q6s32 = vmull_s16(d27s16, d2s16);
q2s32 = vmlsl_s16(q2s32, d30s16, d1s16);
q3s32 = vmlsl_s16(q3s32, d31s16, d1s16);
q5s32 = vmlsl_s16(q5s32, d22s16, d3s16);
q6s32 = vmlsl_s16(q6s32, d23s16, d3s16);
d8s16 = vqrshrn_n_s32(q2s32, 14);
d9s16 = vqrshrn_n_s32(q3s32, 14);
d10s16 = vqrshrn_n_s32(q5s32, 14);
d11s16 = vqrshrn_n_s32(q6s32, 14);
q4s16 = vcombine_s16(d8s16, d9s16);
q5s16 = vcombine_s16(d10s16, d11s16);
q2s32 = vmull_s16(d18s16, d1s16);
q3s32 = vmull_s16(d19s16, d1s16);
q9s32 = vmull_s16(d26s16, d3s16);
q13s32 = vmull_s16(d27s16, d3s16);
q2s32 = vmlal_s16(q2s32, d30s16, d0s16);
q3s32 = vmlal_s16(q3s32, d31s16, d0s16);
q9s32 = vmlal_s16(q9s32, d22s16, d2s16);
q13s32 = vmlal_s16(q13s32, d23s16, d2s16);
d14s16 = vqrshrn_n_s32(q2s32, 14);
d15s16 = vqrshrn_n_s32(q3s32, 14);
d12s16 = vqrshrn_n_s32(q9s32, 14);
d13s16 = vqrshrn_n_s32(q13s32, 14);
q6s16 = vcombine_s16(d12s16, d13s16);
q7s16 = vcombine_s16(d14s16, d15s16);
d0s16 = vdup_n_s16((int16_t)cospi_16_64);
q2s32 = vmull_s16(d16s16, d0s16);
q3s32 = vmull_s16(d17s16, d0s16);
q13s32 = vmull_s16(d16s16, d0s16);
q15s32 = vmull_s16(d17s16, d0s16);
q2s32 = vmlal_s16(q2s32, d24s16, d0s16);
q3s32 = vmlal_s16(q3s32, d25s16, d0s16);
q13s32 = vmlsl_s16(q13s32, d24s16, d0s16);
q15s32 = vmlsl_s16(q15s32, d25s16, d0s16);
d0s16 = vdup_n_s16((int16_t)cospi_24_64);
d1s16 = vdup_n_s16((int16_t)cospi_8_64);
d18s16 = vqrshrn_n_s32(q2s32, 14);
d19s16 = vqrshrn_n_s32(q3s32, 14);
d22s16 = vqrshrn_n_s32(q13s32, 14);
d23s16 = vqrshrn_n_s32(q15s32, 14);
*q9s16 = vcombine_s16(d18s16, d19s16);
*q11s16 = vcombine_s16(d22s16, d23s16);
q2s32 = vmull_s16(d20s16, d0s16);
q3s32 = vmull_s16(d21s16, d0s16);
q8s32 = vmull_s16(d20s16, d1s16);
q12s32 = vmull_s16(d21s16, d1s16);
q2s32 = vmlsl_s16(q2s32, d28s16, d1s16);
q3s32 = vmlsl_s16(q3s32, d29s16, d1s16);
q8s32 = vmlal_s16(q8s32, d28s16, d0s16);
q12s32 = vmlal_s16(q12s32, d29s16, d0s16);
d26s16 = vqrshrn_n_s32(q2s32, 14);
d27s16 = vqrshrn_n_s32(q3s32, 14);
d30s16 = vqrshrn_n_s32(q8s32, 14);
d31s16 = vqrshrn_n_s32(q12s32, 14);
*q13s16 = vcombine_s16(d26s16, d27s16);
*q15s16 = vcombine_s16(d30s16, d31s16);
q0s16 = vaddq_s16(*q9s16, *q15s16);
q1s16 = vaddq_s16(*q11s16, *q13s16);
q2s16 = vsubq_s16(*q11s16, *q13s16);
q3s16 = vsubq_s16(*q9s16, *q15s16);
*q13s16 = vsubq_s16(q4s16, q5s16);
q4s16 = vaddq_s16(q4s16, q5s16);
*q14s16 = vsubq_s16(q7s16, q6s16);
q7s16 = vaddq_s16(q7s16, q6s16);
d26s16 = vget_low_s16(*q13s16);
d27s16 = vget_high_s16(*q13s16);
d28s16 = vget_low_s16(*q14s16);
d29s16 = vget_high_s16(*q14s16);
d16s16 = vdup_n_s16((int16_t)cospi_16_64);
q9s32 = vmull_s16(d28s16, d16s16);
q10s32 = vmull_s16(d29s16, d16s16);
q11s32 = vmull_s16(d28s16, d16s16);
q12s32 = vmull_s16(d29s16, d16s16);
q9s32 = vmlsl_s16(q9s32, d26s16, d16s16);
q10s32 = vmlsl_s16(q10s32, d27s16, d16s16);
q11s32 = vmlal_s16(q11s32, d26s16, d16s16);
q12s32 = vmlal_s16(q12s32, d27s16, d16s16);
d10s16 = vqrshrn_n_s32(q9s32, 14);
d11s16 = vqrshrn_n_s32(q10s32, 14);
d12s16 = vqrshrn_n_s32(q11s32, 14);
d13s16 = vqrshrn_n_s32(q12s32, 14);
q5s16 = vcombine_s16(d10s16, d11s16);
q6s16 = vcombine_s16(d12s16, d13s16);
*q8s16 = vaddq_s16(q0s16, q7s16);
*q9s16 = vaddq_s16(q1s16, q6s16);
*q10s16 = vaddq_s16(q2s16, q5s16);
*q11s16 = vaddq_s16(q3s16, q4s16);
*q12s16 = vsubq_s16(q3s16, q4s16);
*q13s16 = vsubq_s16(q2s16, q5s16);
*q14s16 = vsubq_s16(q1s16, q6s16);
*q15s16 = vsubq_s16(q0s16, q7s16);
return;
}
void aom_idct8x8_64_add_neon(int16_t *input, uint8_t *dest, int dest_stride) {
uint8_t *d1, *d2;
uint8x8_t d0u8, d1u8, d2u8, d3u8;
uint64x1_t d0u64, d1u64, d2u64, d3u64;
int16x8_t q8s16, q9s16, q10s16, q11s16, q12s16, q13s16, q14s16, q15s16;
uint16x8_t q8u16, q9u16, q10u16, q11u16;
q8s16 = vld1q_s16(input);
q9s16 = vld1q_s16(input + 8);
q10s16 = vld1q_s16(input + 16);
q11s16 = vld1q_s16(input + 24);
q12s16 = vld1q_s16(input + 32);
q13s16 = vld1q_s16(input + 40);
q14s16 = vld1q_s16(input + 48);
q15s16 = vld1q_s16(input + 56);
TRANSPOSE8X8(&q8s16, &q9s16, &q10s16, &q11s16, &q12s16, &q13s16, &q14s16,
&q15s16);
IDCT8x8_1D(&q8s16, &q9s16, &q10s16, &q11s16, &q12s16, &q13s16, &q14s16,
&q15s16);
TRANSPOSE8X8(&q8s16, &q9s16, &q10s16, &q11s16, &q12s16, &q13s16, &q14s16,
&q15s16);
IDCT8x8_1D(&q8s16, &q9s16, &q10s16, &q11s16, &q12s16, &q13s16, &q14s16,
&q15s16);
q8s16 = vrshrq_n_s16(q8s16, 5);
q9s16 = vrshrq_n_s16(q9s16, 5);
q10s16 = vrshrq_n_s16(q10s16, 5);
q11s16 = vrshrq_n_s16(q11s16, 5);
q12s16 = vrshrq_n_s16(q12s16, 5);
q13s16 = vrshrq_n_s16(q13s16, 5);
q14s16 = vrshrq_n_s16(q14s16, 5);
q15s16 = vrshrq_n_s16(q15s16, 5);
d1 = d2 = dest;
d0u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d1u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d2u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d3u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
q8u16 = vaddw_u8(vreinterpretq_u16_s16(q8s16), vreinterpret_u8_u64(d0u64));
q9u16 = vaddw_u8(vreinterpretq_u16_s16(q9s16), vreinterpret_u8_u64(d1u64));
q10u16 = vaddw_u8(vreinterpretq_u16_s16(q10s16), vreinterpret_u8_u64(d2u64));
q11u16 = vaddw_u8(vreinterpretq_u16_s16(q11s16), vreinterpret_u8_u64(d3u64));
d0u8 = vqmovun_s16(vreinterpretq_s16_u16(q8u16));
d1u8 = vqmovun_s16(vreinterpretq_s16_u16(q9u16));
d2u8 = vqmovun_s16(vreinterpretq_s16_u16(q10u16));
d3u8 = vqmovun_s16(vreinterpretq_s16_u16(q11u16));
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d0u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d1u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d2u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d3u8));
d2 += dest_stride;
q8s16 = q12s16;
q9s16 = q13s16;
q10s16 = q14s16;
q11s16 = q15s16;
d0u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d1u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d2u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d3u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
q8u16 = vaddw_u8(vreinterpretq_u16_s16(q8s16), vreinterpret_u8_u64(d0u64));
q9u16 = vaddw_u8(vreinterpretq_u16_s16(q9s16), vreinterpret_u8_u64(d1u64));
q10u16 = vaddw_u8(vreinterpretq_u16_s16(q10s16), vreinterpret_u8_u64(d2u64));
q11u16 = vaddw_u8(vreinterpretq_u16_s16(q11s16), vreinterpret_u8_u64(d3u64));
d0u8 = vqmovun_s16(vreinterpretq_s16_u16(q8u16));
d1u8 = vqmovun_s16(vreinterpretq_s16_u16(q9u16));
d2u8 = vqmovun_s16(vreinterpretq_s16_u16(q10u16));
d3u8 = vqmovun_s16(vreinterpretq_s16_u16(q11u16));
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d0u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d1u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d2u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d3u8));
d2 += dest_stride;
return;
}
void aom_idct8x8_12_add_neon(int16_t *input, uint8_t *dest, int dest_stride) {
uint8_t *d1, *d2;
uint8x8_t d0u8, d1u8, d2u8, d3u8;
int16x4_t d10s16, d11s16, d12s16, d13s16, d16s16;
int16x4_t d26s16, d27s16, d28s16, d29s16;
uint64x1_t d0u64, d1u64, d2u64, d3u64;
int16x8_t q0s16, q1s16, q2s16, q3s16, q4s16, q5s16, q6s16, q7s16;
int16x8_t q8s16, q9s16, q10s16, q11s16, q12s16, q13s16, q14s16, q15s16;
uint16x8_t q8u16, q9u16, q10u16, q11u16;
int32x4_t q9s32, q10s32, q11s32, q12s32;
q8s16 = vld1q_s16(input);
q9s16 = vld1q_s16(input + 8);
q10s16 = vld1q_s16(input + 16);
q11s16 = vld1q_s16(input + 24);
q12s16 = vld1q_s16(input + 32);
q13s16 = vld1q_s16(input + 40);
q14s16 = vld1q_s16(input + 48);
q15s16 = vld1q_s16(input + 56);
TRANSPOSE8X8(&q8s16, &q9s16, &q10s16, &q11s16, &q12s16, &q13s16, &q14s16,
&q15s16);
// First transform rows
// stage 1
q0s16 = vdupq_n_s16((int16_t)cospi_28_64 * 2);
q1s16 = vdupq_n_s16((int16_t)cospi_4_64 * 2);
q4s16 = vqrdmulhq_s16(q9s16, q0s16);
q0s16 = vdupq_n_s16(-(int16_t)cospi_20_64 * 2);
q7s16 = vqrdmulhq_s16(q9s16, q1s16);
q1s16 = vdupq_n_s16((int16_t)cospi_12_64 * 2);
q5s16 = vqrdmulhq_s16(q11s16, q0s16);
q0s16 = vdupq_n_s16((int16_t)cospi_16_64 * 2);
q6s16 = vqrdmulhq_s16(q11s16, q1s16);
// stage 2 & stage 3 - even half
q1s16 = vdupq_n_s16((int16_t)cospi_24_64 * 2);
q9s16 = vqrdmulhq_s16(q8s16, q0s16);
q0s16 = vdupq_n_s16((int16_t)cospi_8_64 * 2);
q13s16 = vqrdmulhq_s16(q10s16, q1s16);
q15s16 = vqrdmulhq_s16(q10s16, q0s16);
// stage 3 - odd half
q0s16 = vaddq_s16(q9s16, q15s16);
q1s16 = vaddq_s16(q9s16, q13s16);
q2s16 = vsubq_s16(q9s16, q13s16);
q3s16 = vsubq_s16(q9s16, q15s16);
// stage 2 - odd half
q13s16 = vsubq_s16(q4s16, q5s16);
q4s16 = vaddq_s16(q4s16, q5s16);
q14s16 = vsubq_s16(q7s16, q6s16);
q7s16 = vaddq_s16(q7s16, q6s16);
d26s16 = vget_low_s16(q13s16);
d27s16 = vget_high_s16(q13s16);
d28s16 = vget_low_s16(q14s16);
d29s16 = vget_high_s16(q14s16);
d16s16 = vdup_n_s16((int16_t)cospi_16_64);
q9s32 = vmull_s16(d28s16, d16s16);
q10s32 = vmull_s16(d29s16, d16s16);
q11s32 = vmull_s16(d28s16, d16s16);
q12s32 = vmull_s16(d29s16, d16s16);
q9s32 = vmlsl_s16(q9s32, d26s16, d16s16);
q10s32 = vmlsl_s16(q10s32, d27s16, d16s16);
q11s32 = vmlal_s16(q11s32, d26s16, d16s16);
q12s32 = vmlal_s16(q12s32, d27s16, d16s16);
d10s16 = vqrshrn_n_s32(q9s32, 14);
d11s16 = vqrshrn_n_s32(q10s32, 14);
d12s16 = vqrshrn_n_s32(q11s32, 14);
d13s16 = vqrshrn_n_s32(q12s32, 14);
q5s16 = vcombine_s16(d10s16, d11s16);
q6s16 = vcombine_s16(d12s16, d13s16);
// stage 4
q8s16 = vaddq_s16(q0s16, q7s16);
q9s16 = vaddq_s16(q1s16, q6s16);
q10s16 = vaddq_s16(q2s16, q5s16);
q11s16 = vaddq_s16(q3s16, q4s16);
q12s16 = vsubq_s16(q3s16, q4s16);
q13s16 = vsubq_s16(q2s16, q5s16);
q14s16 = vsubq_s16(q1s16, q6s16);
q15s16 = vsubq_s16(q0s16, q7s16);
TRANSPOSE8X8(&q8s16, &q9s16, &q10s16, &q11s16, &q12s16, &q13s16, &q14s16,
&q15s16);
IDCT8x8_1D(&q8s16, &q9s16, &q10s16, &q11s16, &q12s16, &q13s16, &q14s16,
&q15s16);
q8s16 = vrshrq_n_s16(q8s16, 5);
q9s16 = vrshrq_n_s16(q9s16, 5);
q10s16 = vrshrq_n_s16(q10s16, 5);
q11s16 = vrshrq_n_s16(q11s16, 5);
q12s16 = vrshrq_n_s16(q12s16, 5);
q13s16 = vrshrq_n_s16(q13s16, 5);
q14s16 = vrshrq_n_s16(q14s16, 5);
q15s16 = vrshrq_n_s16(q15s16, 5);
d1 = d2 = dest;
d0u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d1u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d2u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d3u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
q8u16 = vaddw_u8(vreinterpretq_u16_s16(q8s16), vreinterpret_u8_u64(d0u64));
q9u16 = vaddw_u8(vreinterpretq_u16_s16(q9s16), vreinterpret_u8_u64(d1u64));
q10u16 = vaddw_u8(vreinterpretq_u16_s16(q10s16), vreinterpret_u8_u64(d2u64));
q11u16 = vaddw_u8(vreinterpretq_u16_s16(q11s16), vreinterpret_u8_u64(d3u64));
d0u8 = vqmovun_s16(vreinterpretq_s16_u16(q8u16));
d1u8 = vqmovun_s16(vreinterpretq_s16_u16(q9u16));
d2u8 = vqmovun_s16(vreinterpretq_s16_u16(q10u16));
d3u8 = vqmovun_s16(vreinterpretq_s16_u16(q11u16));
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d0u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d1u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d2u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d3u8));
d2 += dest_stride;
q8s16 = q12s16;
q9s16 = q13s16;
q10s16 = q14s16;
q11s16 = q15s16;
d0u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d1u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d2u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d3u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
q8u16 = vaddw_u8(vreinterpretq_u16_s16(q8s16), vreinterpret_u8_u64(d0u64));
q9u16 = vaddw_u8(vreinterpretq_u16_s16(q9s16), vreinterpret_u8_u64(d1u64));
q10u16 = vaddw_u8(vreinterpretq_u16_s16(q10s16), vreinterpret_u8_u64(d2u64));
q11u16 = vaddw_u8(vreinterpretq_u16_s16(q11s16), vreinterpret_u8_u64(d3u64));
d0u8 = vqmovun_s16(vreinterpretq_s16_u16(q8u16));
d1u8 = vqmovun_s16(vreinterpretq_s16_u16(q9u16));
d2u8 = vqmovun_s16(vreinterpretq_s16_u16(q10u16));
d3u8 = vqmovun_s16(vreinterpretq_s16_u16(q11u16));
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d0u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d1u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d2u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d3u8));
d2 += dest_stride;
return;
}
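
The butterfly in `IDCT8x8_1D` is easier to follow one lane at a time. The sketch below is a scalar rendering of that data flow for a single 8-point vector and is not part of the commit. The name `idct8_1d_scalar` and the `kCospi*` constants are hypothetical; the values are assumed to equal round(16384 * cos(k * pi / 64)), which agrees with the 11585 quoted for `cospi_16_64`. Intermediates stay in 32 bits with no saturation, so results can differ from the NEON code (which narrows with `vqrshrn_n_s32` and adds in 16 bits) for out-of-range inputs. It also assumes `>>` on negative values is an arithmetic shift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values of cospi_k_64 = round(16384 * cos(k * pi / 64)). */
enum {
  kCospi4 = 16069,
  kCospi8 = 15137,
  kCospi12 = 13623,
  kCospi16 = 11585,
  kCospi20 = 9102,
  kCospi24 = 6270,
  kCospi28 = 3196
};

/* Rounding shift by 14 bits, the scalar analogue of vqrshrn_n_s32(x, 14)
 * without the saturation. */
static int32_t round_shift14(int32_t x) { return (x + (1 << 13)) >> 14; }

/* Hypothetical scalar 8-point inverse DCT following the register flow of
 * IDCT8x8_1D: in[0..7] play the role of q8..q15 on entry. */
static void idct8_1d_scalar(const int16_t in[8], int16_t out[8]) {
  assert(in != NULL && out != NULL);
  {
    /* Odd half, first rotations: (in1, in7) and (in5, in3). */
    const int32_t s4 = round_shift14(in[1] * kCospi28 - in[7] * kCospi4);
    const int32_t s7 = round_shift14(in[1] * kCospi4 + in[7] * kCospi28);
    const int32_t s5 = round_shift14(in[5] * kCospi12 - in[3] * kCospi20);
    const int32_t s6 = round_shift14(in[5] * kCospi20 + in[3] * kCospi12);
    /* Even half: (in0, in4) scaled by cospi_16_64, (in2, in6) rotated. */
    const int32_t a = round_shift14((in[0] + in[4]) * kCospi16);
    const int32_t b = round_shift14((in[0] - in[4]) * kCospi16);
    const int32_t c = round_shift14(in[2] * kCospi24 - in[6] * kCospi8);
    const int32_t d = round_shift14(in[2] * kCospi8 + in[6] * kCospi24);
    const int32_t e0 = a + d, e1 = b + c, e2 = b - c, e3 = a - d;
    /* Odd half, second stage: sums pass through, differences are rotated. */
    const int32_t o4 = s4 + s5, d45 = s4 - s5;
    const int32_t o7 = s7 + s6, d76 = s7 - s6;
    const int32_t o5 = round_shift14((d76 - d45) * kCospi16);
    const int32_t o6 = round_shift14((d76 + d45) * kCospi16);
    /* Final butterfly combining even and odd halves. */
    out[0] = (int16_t)(e0 + o7);
    out[1] = (int16_t)(e1 + o6);
    out[2] = (int16_t)(e2 + o5);
    out[3] = (int16_t)(e3 + o4);
    out[4] = (int16_t)(e3 - o4);
    out[5] = (int16_t)(e2 - o5);
    out[6] = (int16_t)(e1 - o6);
    out[7] = (int16_t)(e0 - o7);
  }
}
```

Applying this to each row, transposing, applying it again, and then rounding by 5 bits before the add-and-clip step would mirror the sequence of `TRANSPOSE8X8` and `IDCT8x8_1D` calls in `aom_idct8x8_64_add_neon`.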

@@ -1,26 +1,26 @@
/* /*
* Copyright (c) 2014 The WebM project authors. All Rights Reserved. * Copyright (c) 2016, Alliance for Open Media. All rights reserved
* *
* Use of this source code is governed by a BSD-style license * This source code is subject to the terms of the BSD 2 Clause License and
* that can be found in the LICENSE file in the root of the source * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* tree. An additional intellectual property rights grant can be found * was not distributed with this source code in the LICENSE file, you can
* in the file PATENTS. All contributing project authors may * obtain it at www.aomedia.org/license/software. If the Alliance for Open
* be found in the AUTHORS file in the root of the source tree. * Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/ */
#include <arm_neon.h> #include <arm_neon.h>
#include "./vpx_config.h" #include "./aom_config.h"
#include "./vpx_dsp_rtcd.h" #include "./aom_dsp_rtcd.h"
#include "vpx/vpx_integer.h" #include "aom/aom_integer.h"
//------------------------------------------------------------------------------ //------------------------------------------------------------------------------
// DC 4x4 // DC 4x4
// 'do_above' and 'do_left' facilitate branch removal when inlined. // 'do_above' and 'do_left' facilitate branch removal when inlined.
static INLINE void dc_4x4(uint8_t *dst, ptrdiff_t stride, static INLINE void dc_4x4(uint8_t *dst, ptrdiff_t stride, const uint8_t *above,
const uint8_t *above, const uint8_t *left, const uint8_t *left, int do_above, int do_left) {
int do_above, int do_left) {
uint16x8_t sum_top; uint16x8_t sum_top;
uint16x8_t sum_left; uint16x8_t sum_left;
uint8x8_t dc0; uint8x8_t dc0;
@@ -33,7 +33,7 @@ static INLINE void dc_4x4(uint8_t *dst, ptrdiff_t stride,
} }
if (do_left) { if (do_left) {
const uint8x8_t L = vld1_u8(left); // left border const uint8x8_t L = vld1_u8(left); // left border
const uint16x4_t p0 = vpaddl_u8(L); // cascading summation of the left const uint16x4_t p0 = vpaddl_u8(L); // cascading summation of the left
const uint16x4_t p1 = vpadd_u16(p0, p0); const uint16x4_t p1 = vpadd_u16(p0, p0);
sum_left = vcombine_u16(p1, p1); sum_left = vcombine_u16(p1, p1);
@@ -54,29 +54,29 @@ static INLINE void dc_4x4(uint8_t *dst, ptrdiff_t stride,
const uint8x8_t dc = vdup_lane_u8(dc0, 0); const uint8x8_t dc = vdup_lane_u8(dc0, 0);
int i; int i;
for (i = 0; i < 4; ++i) { for (i = 0; i < 4; ++i) {
vst1_lane_u32((uint32_t*)(dst + i * stride), vreinterpret_u32_u8(dc), 0); vst1_lane_u32((uint32_t *)(dst + i * stride), vreinterpret_u32_u8(dc), 0);
} }
} }
} }
void vpx_dc_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride, void aom_dc_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
dc_4x4(dst, stride, above, left, 1, 1); dc_4x4(dst, stride, above, left, 1, 1);
} }
void vpx_dc_left_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride, void aom_dc_left_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
(void)above; (void)above;
dc_4x4(dst, stride, NULL, left, 0, 1); dc_4x4(dst, stride, NULL, left, 0, 1);
} }
void vpx_dc_top_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride, void aom_dc_top_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
(void)left; (void)left;
dc_4x4(dst, stride, above, NULL, 1, 0); dc_4x4(dst, stride, above, NULL, 1, 0);
} }
void vpx_dc_128_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride, void aom_dc_128_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
(void)above; (void)above;
(void)left; (void)left;
@@ -87,9 +87,8 @@ void vpx_dc_128_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
// DC 8x8 // DC 8x8
// 'do_above' and 'do_left' facilitate branch removal when inlined. // 'do_above' and 'do_left' facilitate branch removal when inlined.
static INLINE void dc_8x8(uint8_t *dst, ptrdiff_t stride, static INLINE void dc_8x8(uint8_t *dst, ptrdiff_t stride, const uint8_t *above,
const uint8_t *above, const uint8_t *left, const uint8_t *left, int do_above, int do_left) {
int do_above, int do_left) {
uint16x8_t sum_top; uint16x8_t sum_top;
uint16x8_t sum_left; uint16x8_t sum_left;
uint8x8_t dc0; uint8x8_t dc0;
@@ -103,7 +102,7 @@ static INLINE void dc_8x8(uint8_t *dst, ptrdiff_t stride,
} }
if (do_left) { if (do_left) {
const uint8x8_t L = vld1_u8(left); // left border const uint8x8_t L = vld1_u8(left); // left border
const uint16x4_t p0 = vpaddl_u8(L); // cascading summation of the left const uint16x4_t p0 = vpaddl_u8(L); // cascading summation of the left
const uint16x4_t p1 = vpadd_u16(p0, p0); const uint16x4_t p1 = vpadd_u16(p0, p0);
const uint16x4_t p2 = vpadd_u16(p1, p1); const uint16x4_t p2 = vpadd_u16(p1, p1);
@@ -125,29 +124,29 @@ static INLINE void dc_8x8(uint8_t *dst, ptrdiff_t stride,
const uint8x8_t dc = vdup_lane_u8(dc0, 0); const uint8x8_t dc = vdup_lane_u8(dc0, 0);
int i; int i;
for (i = 0; i < 8; ++i) { for (i = 0; i < 8; ++i) {
vst1_u32((uint32_t*)(dst + i * stride), vreinterpret_u32_u8(dc)); vst1_u32((uint32_t *)(dst + i * stride), vreinterpret_u32_u8(dc));
} }
} }
} }
void vpx_dc_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride, void aom_dc_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
dc_8x8(dst, stride, above, left, 1, 1); dc_8x8(dst, stride, above, left, 1, 1);
} }
void vpx_dc_left_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride, void aom_dc_left_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
(void)above; (void)above;
dc_8x8(dst, stride, NULL, left, 0, 1); dc_8x8(dst, stride, NULL, left, 0, 1);
} }
void vpx_dc_top_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride, void aom_dc_top_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
(void)left; (void)left;
dc_8x8(dst, stride, above, NULL, 1, 0); dc_8x8(dst, stride, above, NULL, 1, 0);
} }
void vpx_dc_128_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride, void aom_dc_128_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
(void)above; (void)above;
(void)left; (void)left;
@@ -167,7 +166,7 @@ static INLINE void dc_16x16(uint8_t *dst, ptrdiff_t stride,
if (do_above) { if (do_above) {
const uint8x16_t A = vld1q_u8(above); // top row const uint8x16_t A = vld1q_u8(above); // top row
const uint16x8_t p0 = vpaddlq_u8(A); // cascading summation of the top const uint16x8_t p0 = vpaddlq_u8(A); // cascading summation of the top
const uint16x4_t p1 = vadd_u16(vget_low_u16(p0), vget_high_u16(p0)); const uint16x4_t p1 = vadd_u16(vget_low_u16(p0), vget_high_u16(p0));
const uint16x4_t p2 = vpadd_u16(p1, p1); const uint16x4_t p2 = vpadd_u16(p1, p1);
const uint16x4_t p3 = vpadd_u16(p2, p2); const uint16x4_t p3 = vpadd_u16(p2, p2);
@@ -203,26 +202,26 @@ static INLINE void dc_16x16(uint8_t *dst, ptrdiff_t stride,
} }
} }
void vpx_dc_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride, void aom_dc_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
dc_16x16(dst, stride, above, left, 1, 1); dc_16x16(dst, stride, above, left, 1, 1);
} }
void vpx_dc_left_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride, void aom_dc_left_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *above,
const uint8_t *left) { const uint8_t *left) {
(void)above; (void)above;
dc_16x16(dst, stride, NULL, left, 0, 1); dc_16x16(dst, stride, NULL, left, 0, 1);
} }
void vpx_dc_top_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride, void aom_dc_top_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *above,
const uint8_t *left) { const uint8_t *left) {
(void)left; (void)left;
dc_16x16(dst, stride, above, NULL, 1, 0); dc_16x16(dst, stride, above, NULL, 1, 0);
} }
void vpx_dc_128_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride, void aom_dc_128_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *above,
const uint8_t *left) { const uint8_t *left) {
(void)above; (void)above;
@@ -286,26 +285,26 @@ static INLINE void dc_32x32(uint8_t *dst, ptrdiff_t stride,
} }
} }
void vpx_dc_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride, void aom_dc_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
dc_32x32(dst, stride, above, left, 1, 1); dc_32x32(dst, stride, above, left, 1, 1);
} }
void vpx_dc_left_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride, void aom_dc_left_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *above,
const uint8_t *left) { const uint8_t *left) {
(void)above; (void)above;
dc_32x32(dst, stride, NULL, left, 0, 1); dc_32x32(dst, stride, NULL, left, 0, 1);
} }
void vpx_dc_top_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride, void aom_dc_top_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *above,
const uint8_t *left) { const uint8_t *left) {
(void)left; (void)left;
dc_32x32(dst, stride, above, NULL, 1, 0); dc_32x32(dst, stride, above, NULL, 1, 0);
} }
void vpx_dc_128_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride, void aom_dc_128_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *above,
const uint8_t *left) { const uint8_t *left) {
(void)above; (void)above;
@@ -315,7 +314,7 @@ void vpx_dc_128_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
// ----------------------------------------------------------------------------- // -----------------------------------------------------------------------------
void vpx_d45_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride, void aom_d45_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
const uint64x1_t A0 = vreinterpret_u64_u8(vld1_u8(above)); // top row const uint64x1_t A0 = vreinterpret_u64_u8(vld1_u8(above)); // top row
const uint64x1_t A1 = vshr_n_u64(A0, 8); const uint64x1_t A1 = vshr_n_u64(A0, 8);
@@ -338,7 +337,7 @@ void vpx_d45_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
dst[3 * stride + 3] = above[7]; dst[3 * stride + 3] = above[7];
} }
void vpx_d45_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride, void aom_d45_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
static const uint8_t shuffle1[8] = { 1, 2, 3, 4, 5, 6, 7, 7 }; static const uint8_t shuffle1[8] = { 1, 2, 3, 4, 5, 6, 7, 7 };
static const uint8_t shuffle2[8] = { 2, 3, 4, 5, 6, 7, 7, 7 }; static const uint8_t shuffle2[8] = { 2, 3, 4, 5, 6, 7, 7, 7 };
@@ -358,7 +357,7 @@ void vpx_d45_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
vst1_u8(dst + i * stride, row); vst1_u8(dst + i * stride, row);
} }
void vpx_d45_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride, void aom_d45_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
const uint8x16_t A0 = vld1q_u8(above); // top row const uint8x16_t A0 = vld1q_u8(above); // top row
const uint8x16_t above_right = vld1q_dup_u8(above + 15); const uint8x16_t above_right = vld1q_dup_u8(above + 15);
@@ -377,7 +376,7 @@ void vpx_d45_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
// ----------------------------------------------------------------------------- // -----------------------------------------------------------------------------
void vpx_d135_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride, void aom_d135_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
const uint8x8_t XABCD_u8 = vld1_u8(above - 1); const uint8x8_t XABCD_u8 = vld1_u8(above - 1);
const uint64x1_t XABCD = vreinterpret_u64_u8(XABCD_u8); const uint64x1_t XABCD = vreinterpret_u64_u8(XABCD_u8);
@@ -407,7 +406,7 @@ void vpx_d135_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
#if !HAVE_NEON_ASM #if !HAVE_NEON_ASM
void vpx_v_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride, void aom_v_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
int i; int i;
uint32x2_t d0u32 = vdup_n_u32(0); uint32x2_t d0u32 = vdup_n_u32(0);
@@ -418,29 +417,27 @@ void vpx_v_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
vst1_lane_u32((uint32_t *)dst, d0u32, 0); vst1_lane_u32((uint32_t *)dst, d0u32, 0);
} }
void vpx_v_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride, void aom_v_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
int i; int i;
uint8x8_t d0u8 = vdup_n_u8(0); uint8x8_t d0u8 = vdup_n_u8(0);
(void)left; (void)left;
d0u8 = vld1_u8(above); d0u8 = vld1_u8(above);
for (i = 0; i < 8; i++, dst += stride) for (i = 0; i < 8; i++, dst += stride) vst1_u8(dst, d0u8);
vst1_u8(dst, d0u8);
} }
void vpx_v_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride, void aom_v_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
int i; int i;
uint8x16_t q0u8 = vdupq_n_u8(0); uint8x16_t q0u8 = vdupq_n_u8(0);
(void)left; (void)left;
q0u8 = vld1q_u8(above); q0u8 = vld1q_u8(above);
for (i = 0; i < 16; i++, dst += stride) for (i = 0; i < 16; i++, dst += stride) vst1q_u8(dst, q0u8);
vst1q_u8(dst, q0u8);
} }
void vpx_v_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride, void aom_v_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
int i; int i;
uint8x16_t q0u8 = vdupq_n_u8(0); uint8x16_t q0u8 = vdupq_n_u8(0);
@@ -455,7 +452,7 @@ void vpx_v_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
} }
} }
void vpx_h_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride, void aom_h_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
uint8x8_t d0u8 = vdup_n_u8(0); uint8x8_t d0u8 = vdup_n_u8(0);
uint32x2_t d1u32 = vdup_n_u32(0); uint32x2_t d1u32 = vdup_n_u32(0);
@@ -476,7 +473,7 @@ void vpx_h_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d0u8), 0); vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d0u8), 0);
} }
void vpx_h_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride, void aom_h_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
uint8x8_t d0u8 = vdup_n_u8(0); uint8x8_t d0u8 = vdup_n_u8(0);
uint64x1_t d1u64 = vdup_n_u64(0); uint64x1_t d1u64 = vdup_n_u64(0);
@@ -509,7 +506,7 @@ void vpx_h_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
vst1_u8(dst, d0u8); vst1_u8(dst, d0u8);
} }
void vpx_h_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride, void aom_h_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
int j; int j;
uint8x8_t d2u8 = vdup_n_u8(0); uint8x8_t d2u8 = vdup_n_u8(0);
@@ -547,7 +544,7 @@ void vpx_h_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
} }
} }
void vpx_h_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride, void aom_h_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
int j, k; int j, k;
uint8x8_t d2u8 = vdup_n_u8(0); uint8x8_t d2u8 = vdup_n_u8(0);
@@ -595,7 +592,7 @@ void vpx_h_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
} }
} }
void vpx_tm_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride, void aom_tm_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
int i; int i;
uint16x8_t q1u16, q3u16; uint16x8_t q1u16, q3u16;
@@ -608,14 +605,14 @@ void vpx_tm_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
q3u16 = vsubl_u8(vreinterpret_u8_u32(d2u32), d0u8); q3u16 = vsubl_u8(vreinterpret_u8_u32(d2u32), d0u8);
for (i = 0; i < 4; i++, dst += stride) { for (i = 0; i < 4; i++, dst += stride) {
q1u16 = vdupq_n_u16((uint16_t)left[i]); q1u16 = vdupq_n_u16((uint16_t)left[i]);
q1s16 = vaddq_s16(vreinterpretq_s16_u16(q1u16), q1s16 =
vreinterpretq_s16_u16(q3u16)); vaddq_s16(vreinterpretq_s16_u16(q1u16), vreinterpretq_s16_u16(q3u16));
d0u8 = vqmovun_s16(q1s16); d0u8 = vqmovun_s16(q1s16);
vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d0u8), 0); vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d0u8), 0);
} }
} }
void vpx_tm_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride, void aom_tm_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
int j; int j;
uint16x8_t q0u16, q3u16, q10u16; uint16x8_t q0u16, q3u16, q10u16;
@@ -631,33 +628,33 @@ void vpx_tm_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
d20u16 = vget_low_u16(q10u16); d20u16 = vget_low_u16(q10u16);
for (j = 0; j < 2; j++, d20u16 = vget_high_u16(q10u16)) { for (j = 0; j < 2; j++, d20u16 = vget_high_u16(q10u16)) {
q0u16 = vdupq_lane_u16(d20u16, 0); q0u16 = vdupq_lane_u16(d20u16, 0);
q0s16 = vaddq_s16(vreinterpretq_s16_u16(q3u16), q0s16 =
vreinterpretq_s16_u16(q0u16)); vaddq_s16(vreinterpretq_s16_u16(q3u16), vreinterpretq_s16_u16(q0u16));
d0u8 = vqmovun_s16(q0s16); d0u8 = vqmovun_s16(q0s16);
vst1_u64((uint64_t *)dst, vreinterpret_u64_u8(d0u8)); vst1_u64((uint64_t *)dst, vreinterpret_u64_u8(d0u8));
dst += stride; dst += stride;
q0u16 = vdupq_lane_u16(d20u16, 1); q0u16 = vdupq_lane_u16(d20u16, 1);
q0s16 = vaddq_s16(vreinterpretq_s16_u16(q3u16), q0s16 =
vreinterpretq_s16_u16(q0u16)); vaddq_s16(vreinterpretq_s16_u16(q3u16), vreinterpretq_s16_u16(q0u16));
d0u8 = vqmovun_s16(q0s16); d0u8 = vqmovun_s16(q0s16);
vst1_u64((uint64_t *)dst, vreinterpret_u64_u8(d0u8)); vst1_u64((uint64_t *)dst, vreinterpret_u64_u8(d0u8));
dst += stride; dst += stride;
q0u16 = vdupq_lane_u16(d20u16, 2); q0u16 = vdupq_lane_u16(d20u16, 2);
q0s16 = vaddq_s16(vreinterpretq_s16_u16(q3u16), q0s16 =
vreinterpretq_s16_u16(q0u16)); vaddq_s16(vreinterpretq_s16_u16(q3u16), vreinterpretq_s16_u16(q0u16));
d0u8 = vqmovun_s16(q0s16); d0u8 = vqmovun_s16(q0s16);
vst1_u64((uint64_t *)dst, vreinterpret_u64_u8(d0u8)); vst1_u64((uint64_t *)dst, vreinterpret_u64_u8(d0u8));
dst += stride; dst += stride;
q0u16 = vdupq_lane_u16(d20u16, 3); q0u16 = vdupq_lane_u16(d20u16, 3);
q0s16 = vaddq_s16(vreinterpretq_s16_u16(q3u16), q0s16 =
vreinterpretq_s16_u16(q0u16)); vaddq_s16(vreinterpretq_s16_u16(q3u16), vreinterpretq_s16_u16(q0u16));
d0u8 = vqmovun_s16(q0s16); d0u8 = vqmovun_s16(q0s16);
vst1_u64((uint64_t *)dst, vreinterpret_u64_u8(d0u8)); vst1_u64((uint64_t *)dst, vreinterpret_u64_u8(d0u8));
dst += stride; dst += stride;
} }
} }
void vpx_tm_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride, void aom_tm_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
int j, k; int j, k;
uint16x8_t q0u16, q2u16, q3u16, q8u16, q10u16; uint16x8_t q0u16, q2u16, q3u16, q8u16, q10u16;
@@ -677,14 +674,14 @@ void vpx_tm_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
for (j = 0; j < 2; j++, d20u16 = vget_high_u16(q10u16)) { for (j = 0; j < 2; j++, d20u16 = vget_high_u16(q10u16)) {
q0u16 = vdupq_lane_u16(d20u16, 0); q0u16 = vdupq_lane_u16(d20u16, 0);
q8u16 = vdupq_lane_u16(d20u16, 1); q8u16 = vdupq_lane_u16(d20u16, 1);
q1s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16), q1s16 =
vreinterpretq_s16_u16(q2u16)); vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q2u16));
q0s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16), q0s16 =
vreinterpretq_s16_u16(q3u16)); vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q3u16));
q11s16 = vaddq_s16(vreinterpretq_s16_u16(q8u16), q11s16 =
vreinterpretq_s16_u16(q2u16)); vaddq_s16(vreinterpretq_s16_u16(q8u16), vreinterpretq_s16_u16(q2u16));
q8s16 = vaddq_s16(vreinterpretq_s16_u16(q8u16), q8s16 =
vreinterpretq_s16_u16(q3u16)); vaddq_s16(vreinterpretq_s16_u16(q8u16), vreinterpretq_s16_u16(q3u16));
d2u8 = vqmovun_s16(q1s16); d2u8 = vqmovun_s16(q1s16);
d3u8 = vqmovun_s16(q0s16); d3u8 = vqmovun_s16(q0s16);
d22u8 = vqmovun_s16(q11s16); d22u8 = vqmovun_s16(q11s16);
@@ -698,14 +695,14 @@ void vpx_tm_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
q0u16 = vdupq_lane_u16(d20u16, 2); q0u16 = vdupq_lane_u16(d20u16, 2);
q8u16 = vdupq_lane_u16(d20u16, 3); q8u16 = vdupq_lane_u16(d20u16, 3);
q1s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16), q1s16 =
vreinterpretq_s16_u16(q2u16)); vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q2u16));
q0s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16), q0s16 =
vreinterpretq_s16_u16(q3u16)); vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q3u16));
q11s16 = vaddq_s16(vreinterpretq_s16_u16(q8u16), q11s16 =
vreinterpretq_s16_u16(q2u16)); vaddq_s16(vreinterpretq_s16_u16(q8u16), vreinterpretq_s16_u16(q2u16));
q8s16 = vaddq_s16(vreinterpretq_s16_u16(q8u16), q8s16 =
vreinterpretq_s16_u16(q3u16)); vaddq_s16(vreinterpretq_s16_u16(q8u16), vreinterpretq_s16_u16(q3u16));
d2u8 = vqmovun_s16(q1s16); d2u8 = vqmovun_s16(q1s16);
d3u8 = vqmovun_s16(q0s16); d3u8 = vqmovun_s16(q0s16);
d22u8 = vqmovun_s16(q11s16); d22u8 = vqmovun_s16(q11s16);
@@ -720,7 +717,7 @@ void vpx_tm_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
} }
} }
void vpx_tm_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride, void aom_tm_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) { const uint8_t *above, const uint8_t *left) {
int j, k; int j, k;
uint16x8_t q0u16, q3u16, q8u16, q9u16, q10u16, q11u16; uint16x8_t q0u16, q3u16, q8u16, q9u16, q10u16, q11u16;
@@ -742,10 +739,10 @@ void vpx_tm_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
d6u16 = vget_low_u16(q3u16); d6u16 = vget_low_u16(q3u16);
for (j = 0; j < 2; j++, d6u16 = vget_high_u16(q3u16)) { for (j = 0; j < 2; j++, d6u16 = vget_high_u16(q3u16)) {
q0u16 = vdupq_lane_u16(d6u16, 0); q0u16 = vdupq_lane_u16(d6u16, 0);
q12s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16), q12s16 =
vreinterpretq_s16_u16(q8u16)); vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q8u16));
q13s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16), q13s16 =
vreinterpretq_s16_u16(q9u16)); vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q9u16));
q14s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16), q14s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q10u16)); vreinterpretq_s16_u16(q10u16));
q15s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16), q15s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
@@ -761,10 +758,10 @@ void vpx_tm_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
dst += stride; dst += stride;
q0u16 = vdupq_lane_u16(d6u16, 1); q0u16 = vdupq_lane_u16(d6u16, 1);
q12s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16), q12s16 =
vreinterpretq_s16_u16(q8u16)); vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q8u16));
q13s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16), q13s16 =
vreinterpretq_s16_u16(q9u16)); vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q9u16));
q14s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16), q14s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q10u16)); vreinterpretq_s16_u16(q10u16));
q15s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16), q15s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
@@ -780,10 +777,10 @@ void vpx_tm_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
dst += stride; dst += stride;
q0u16 = vdupq_lane_u16(d6u16, 2); q0u16 = vdupq_lane_u16(d6u16, 2);
q12s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16), q12s16 =
vreinterpretq_s16_u16(q8u16)); vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q8u16));
q13s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16), q13s16 =
vreinterpretq_s16_u16(q9u16)); vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q9u16));
q14s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16), q14s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q10u16)); vreinterpretq_s16_u16(q10u16));
q15s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16), q15s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
@@ -799,10 +796,10 @@ void vpx_tm_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
dst += stride; dst += stride;
q0u16 = vdupq_lane_u16(d6u16, 3); q0u16 = vdupq_lane_u16(d6u16, 3);
q12s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16), q12s16 =
vreinterpretq_s16_u16(q8u16)); vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q8u16));
q13s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16), q13s16 =
vreinterpretq_s16_u16(q9u16)); vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q9u16));
q14s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16), q14s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q10u16)); vreinterpretq_s16_u16(q10u16));
q15s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16), q15s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),

@@ -1,32 +1,35 @@
; ;
; Copyright (c) 2014 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license ; This source code is subject to the terms of the BSD 2 Clause License and
; that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. An additional intellectual property rights grant can be found ; was not distributed with this source code in the LICENSE file, you can
; in the file PATENTS. All contributing project authors may ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; be found in the AUTHORS file in the root of the source tree. ; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
; ;
EXPORT |vpx_v_predictor_4x4_neon| ;
EXPORT |vpx_v_predictor_8x8_neon|
EXPORT |vpx_v_predictor_16x16_neon| EXPORT |aom_v_predictor_4x4_neon|
EXPORT |vpx_v_predictor_32x32_neon| EXPORT |aom_v_predictor_8x8_neon|
EXPORT |vpx_h_predictor_4x4_neon| EXPORT |aom_v_predictor_16x16_neon|
EXPORT |vpx_h_predictor_8x8_neon| EXPORT |aom_v_predictor_32x32_neon|
EXPORT |vpx_h_predictor_16x16_neon| EXPORT |aom_h_predictor_4x4_neon|
EXPORT |vpx_h_predictor_32x32_neon| EXPORT |aom_h_predictor_8x8_neon|
EXPORT |vpx_tm_predictor_4x4_neon| EXPORT |aom_h_predictor_16x16_neon|
EXPORT |vpx_tm_predictor_8x8_neon| EXPORT |aom_h_predictor_32x32_neon|
EXPORT |vpx_tm_predictor_16x16_neon| EXPORT |aom_tm_predictor_4x4_neon|
EXPORT |vpx_tm_predictor_32x32_neon| EXPORT |aom_tm_predictor_8x8_neon|
EXPORT |aom_tm_predictor_16x16_neon|
EXPORT |aom_tm_predictor_32x32_neon|
ARM ARM
REQUIRE8 REQUIRE8
PRESERVE8 PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2 AREA ||.text||, CODE, READONLY, ALIGN=2
;void vpx_v_predictor_4x4_neon(uint8_t *dst, ptrdiff_t y_stride, ;void aom_v_predictor_4x4_neon(uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above, ; const uint8_t *above,
; const uint8_t *left) ; const uint8_t *left)
; r0 uint8_t *dst ; r0 uint8_t *dst
@@ -34,16 +37,16 @@
; r2 const uint8_t *above ; r2 const uint8_t *above
; r3 const uint8_t *left ; r3 const uint8_t *left
|vpx_v_predictor_4x4_neon| PROC |aom_v_predictor_4x4_neon| PROC
vld1.32 {d0[0]}, [r2] vld1.32 {d0[0]}, [r2]
vst1.32 {d0[0]}, [r0], r1 vst1.32 {d0[0]}, [r0], r1
vst1.32 {d0[0]}, [r0], r1 vst1.32 {d0[0]}, [r0], r1
vst1.32 {d0[0]}, [r0], r1 vst1.32 {d0[0]}, [r0], r1
vst1.32 {d0[0]}, [r0], r1 vst1.32 {d0[0]}, [r0], r1
bx lr bx lr
ENDP ; |vpx_v_predictor_4x4_neon| ENDP ; |aom_v_predictor_4x4_neon|
;void vpx_v_predictor_8x8_neon(uint8_t *dst, ptrdiff_t y_stride, ;void aom_v_predictor_8x8_neon(uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above, ; const uint8_t *above,
; const uint8_t *left) ; const uint8_t *left)
; r0 uint8_t *dst ; r0 uint8_t *dst
@@ -51,7 +54,7 @@
; r2 const uint8_t *above ; r2 const uint8_t *above
; r3 const uint8_t *left ; r3 const uint8_t *left
|vpx_v_predictor_8x8_neon| PROC |aom_v_predictor_8x8_neon| PROC
vld1.8 {d0}, [r2] vld1.8 {d0}, [r2]
vst1.8 {d0}, [r0], r1 vst1.8 {d0}, [r0], r1
vst1.8 {d0}, [r0], r1 vst1.8 {d0}, [r0], r1
@@ -62,9 +65,9 @@
vst1.8 {d0}, [r0], r1 vst1.8 {d0}, [r0], r1
vst1.8 {d0}, [r0], r1 vst1.8 {d0}, [r0], r1
bx lr bx lr
ENDP ; |vpx_v_predictor_8x8_neon| ENDP ; |aom_v_predictor_8x8_neon|
;void vpx_v_predictor_16x16_neon(uint8_t *dst, ptrdiff_t y_stride, ;void aom_v_predictor_16x16_neon(uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above, ; const uint8_t *above,
; const uint8_t *left) ; const uint8_t *left)
; r0 uint8_t *dst ; r0 uint8_t *dst
@@ -72,7 +75,7 @@
; r2 const uint8_t *above ; r2 const uint8_t *above
; r3 const uint8_t *left ; r3 const uint8_t *left
|vpx_v_predictor_16x16_neon| PROC |aom_v_predictor_16x16_neon| PROC
vld1.8 {q0}, [r2] vld1.8 {q0}, [r2]
vst1.8 {q0}, [r0], r1 vst1.8 {q0}, [r0], r1
vst1.8 {q0}, [r0], r1 vst1.8 {q0}, [r0], r1
@@ -91,9 +94,9 @@
vst1.8 {q0}, [r0], r1 vst1.8 {q0}, [r0], r1
vst1.8 {q0}, [r0], r1 vst1.8 {q0}, [r0], r1
bx lr bx lr
ENDP ; |vpx_v_predictor_16x16_neon| ENDP ; |aom_v_predictor_16x16_neon|
;void vpx_v_predictor_32x32_neon(uint8_t *dst, ptrdiff_t y_stride, ;void aom_v_predictor_32x32_neon(uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above, ; const uint8_t *above,
; const uint8_t *left) ; const uint8_t *left)
; r0 uint8_t *dst ; r0 uint8_t *dst
@@ -101,7 +104,7 @@
; r2 const uint8_t *above ; r2 const uint8_t *above
; r3 const uint8_t *left ; r3 const uint8_t *left
|vpx_v_predictor_32x32_neon| PROC |aom_v_predictor_32x32_neon| PROC
vld1.8 {q0, q1}, [r2] vld1.8 {q0, q1}, [r2]
mov r2, #2 mov r2, #2
loop_v loop_v
@@ -124,9 +127,9 @@ loop_v
subs r2, r2, #1 subs r2, r2, #1
bgt loop_v bgt loop_v
bx lr bx lr
ENDP ; |vpx_v_predictor_32x32_neon| ENDP ; |aom_v_predictor_32x32_neon|
;void vpx_h_predictor_4x4_neon(uint8_t *dst, ptrdiff_t y_stride, ;void aom_h_predictor_4x4_neon(uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above, ; const uint8_t *above,
; const uint8_t *left) ; const uint8_t *left)
; r0 uint8_t *dst ; r0 uint8_t *dst
@@ -134,7 +137,7 @@ loop_v
; r2 const uint8_t *above ; r2 const uint8_t *above
; r3 const uint8_t *left ; r3 const uint8_t *left
|vpx_h_predictor_4x4_neon| PROC |aom_h_predictor_4x4_neon| PROC
vld1.32 {d1[0]}, [r3] vld1.32 {d1[0]}, [r3]
vdup.8 d0, d1[0] vdup.8 d0, d1[0]
vst1.32 {d0[0]}, [r0], r1 vst1.32 {d0[0]}, [r0], r1
@@ -145,9 +148,9 @@ loop_v
vdup.8 d0, d1[3] vdup.8 d0, d1[3]
vst1.32 {d0[0]}, [r0], r1 vst1.32 {d0[0]}, [r0], r1
bx lr bx lr
ENDP ; |vpx_h_predictor_4x4_neon| ENDP ; |aom_h_predictor_4x4_neon|
;void vpx_h_predictor_8x8_neon(uint8_t *dst, ptrdiff_t y_stride, ;void aom_h_predictor_8x8_neon(uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above, ; const uint8_t *above,
; const uint8_t *left) ; const uint8_t *left)
; r0 uint8_t *dst ; r0 uint8_t *dst
@@ -155,7 +158,7 @@ loop_v
; r2 const uint8_t *above ; r2 const uint8_t *above
; r3 const uint8_t *left ; r3 const uint8_t *left
|vpx_h_predictor_8x8_neon| PROC |aom_h_predictor_8x8_neon| PROC
vld1.64 {d1}, [r3] vld1.64 {d1}, [r3]
vdup.8 d0, d1[0] vdup.8 d0, d1[0]
vst1.64 {d0}, [r0], r1 vst1.64 {d0}, [r0], r1
@@ -174,9 +177,9 @@ loop_v
vdup.8 d0, d1[7] vdup.8 d0, d1[7]
vst1.64 {d0}, [r0], r1 vst1.64 {d0}, [r0], r1
bx lr bx lr
ENDP ; |vpx_h_predictor_8x8_neon| ENDP ; |aom_h_predictor_8x8_neon|
;void vpx_h_predictor_16x16_neon(uint8_t *dst, ptrdiff_t y_stride, ;void aom_h_predictor_16x16_neon(uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above, ; const uint8_t *above,
; const uint8_t *left) ; const uint8_t *left)
; r0 uint8_t *dst ; r0 uint8_t *dst
@@ -184,7 +187,7 @@ loop_v
; r2 const uint8_t *above ; r2 const uint8_t *above
; r3 const uint8_t *left ; r3 const uint8_t *left
|vpx_h_predictor_16x16_neon| PROC |aom_h_predictor_16x16_neon| PROC
vld1.8 {q1}, [r3] vld1.8 {q1}, [r3]
vdup.8 q0, d2[0] vdup.8 q0, d2[0]
vst1.8 {q0}, [r0], r1 vst1.8 {q0}, [r0], r1
@@ -219,9 +222,9 @@ loop_v
vdup.8 q0, d3[7] vdup.8 q0, d3[7]
vst1.8 {q0}, [r0], r1 vst1.8 {q0}, [r0], r1
bx lr bx lr
ENDP ; |vpx_h_predictor_16x16_neon| ENDP ; |aom_h_predictor_16x16_neon|
;void vpx_h_predictor_32x32_neon(uint8_t *dst, ptrdiff_t y_stride, ;void aom_h_predictor_32x32_neon(uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above, ; const uint8_t *above,
; const uint8_t *left) ; const uint8_t *left)
; r0 uint8_t *dst ; r0 uint8_t *dst
@@ -229,7 +232,7 @@ loop_v
; r2 const uint8_t *above ; r2 const uint8_t *above
; r3 const uint8_t *left ; r3 const uint8_t *left
|vpx_h_predictor_32x32_neon| PROC |aom_h_predictor_32x32_neon| PROC
sub r1, r1, #16 sub r1, r1, #16
mov r2, #2 mov r2, #2
loop_h loop_h
@@ -285,9 +288,9 @@ loop_h
subs r2, r2, #1 subs r2, r2, #1
bgt loop_h bgt loop_h
bx lr bx lr
ENDP ; |vpx_h_predictor_32x32_neon| ENDP ; |aom_h_predictor_32x32_neon|
;void vpx_tm_predictor_4x4_neon (uint8_t *dst, ptrdiff_t y_stride, ;void aom_tm_predictor_4x4_neon (uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above, ; const uint8_t *above,
; const uint8_t *left) ; const uint8_t *left)
; r0 uint8_t *dst ; r0 uint8_t *dst
@@ -295,7 +298,7 @@ loop_h
; r2 const uint8_t *above ; r2 const uint8_t *above
; r3 const uint8_t *left ; r3 const uint8_t *left
|vpx_tm_predictor_4x4_neon| PROC |aom_tm_predictor_4x4_neon| PROC
; Load ytop_left = above[-1]; ; Load ytop_left = above[-1];
sub r12, r2, #1 sub r12, r2, #1
vld1.u8 {d0[]}, [r12] vld1.u8 {d0[]}, [r12]
@@ -331,9 +334,9 @@ loop_h
vst1.32 {d0[0]}, [r0], r1 vst1.32 {d0[0]}, [r0], r1
vst1.32 {d1[0]}, [r0], r1 vst1.32 {d1[0]}, [r0], r1
bx lr bx lr
ENDP ; |vpx_tm_predictor_4x4_neon| ENDP ; |aom_tm_predictor_4x4_neon|
;void vpx_tm_predictor_8x8_neon (uint8_t *dst, ptrdiff_t y_stride, ;void aom_tm_predictor_8x8_neon (uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above, ; const uint8_t *above,
; const uint8_t *left) ; const uint8_t *left)
; r0 uint8_t *dst ; r0 uint8_t *dst
@@ -341,7 +344,7 @@ loop_h
; r2 const uint8_t *above ; r2 const uint8_t *above
; r3 const uint8_t *left ; r3 const uint8_t *left
|vpx_tm_predictor_8x8_neon| PROC |aom_tm_predictor_8x8_neon| PROC
; Load ytop_left = above[-1]; ; Load ytop_left = above[-1];
sub r12, r2, #1 sub r12, r2, #1
vld1.8 {d0[]}, [r12] vld1.8 {d0[]}, [r12]
@@ -403,9 +406,9 @@ loop_h
vst1.64 {d3}, [r0], r1 vst1.64 {d3}, [r0], r1
bx lr bx lr
ENDP ; |vpx_tm_predictor_8x8_neon| ENDP ; |aom_tm_predictor_8x8_neon|
;void vpx_tm_predictor_16x16_neon (uint8_t *dst, ptrdiff_t y_stride, ;void aom_tm_predictor_16x16_neon (uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above, ; const uint8_t *above,
; const uint8_t *left) ; const uint8_t *left)
; r0 uint8_t *dst ; r0 uint8_t *dst
@@ -413,7 +416,7 @@ loop_h
; r2 const uint8_t *above ; r2 const uint8_t *above
; r3 const uint8_t *left ; r3 const uint8_t *left
|vpx_tm_predictor_16x16_neon| PROC |aom_tm_predictor_16x16_neon| PROC
; Load ytop_left = above[-1]; ; Load ytop_left = above[-1];
sub r12, r2, #1 sub r12, r2, #1
vld1.8 {d0[]}, [r12] vld1.8 {d0[]}, [r12]
@@ -496,9 +499,9 @@ loop_16x16_neon
bgt loop_16x16_neon bgt loop_16x16_neon
bx lr bx lr
ENDP ; |vpx_tm_predictor_16x16_neon| ENDP ; |aom_tm_predictor_16x16_neon|
;void vpx_tm_predictor_32x32_neon (uint8_t *dst, ptrdiff_t y_stride, ;void aom_tm_predictor_32x32_neon (uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above, ; const uint8_t *above,
; const uint8_t *left) ; const uint8_t *left)
; r0 uint8_t *dst ; r0 uint8_t *dst
@@ -506,7 +509,7 @@ loop_16x16_neon
; r2 const uint8_t *above ; r2 const uint8_t *above
; r3 const uint8_t *left ; r3 const uint8_t *left
|vpx_tm_predictor_32x32_neon| PROC |aom_tm_predictor_32x32_neon| PROC
; Load ytop_left = above[-1]; ; Load ytop_left = above[-1];
sub r12, r2, #1 sub r12, r2, #1
vld1.8 {d0[]}, [r12] vld1.8 {d0[]}, [r12]
@@ -625,6 +628,6 @@ loop_32x32_neon
bgt loop_32x32_neon bgt loop_32x32_neon
bx lr bx lr
ENDP ; |vpx_tm_predictor_32x32_neon| ENDP ; |aom_tm_predictor_32x32_neon|
END END
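
Note for readers of the assembly above: the V, H and TM predictors it implements can be written as a scalar C sketch. This is illustrative only and is not code from this change. The `_ref` function names, the `clip_pixel` helper and the block-size parameter `bs` are assumptions introduced here. The TM formula follows the "Load ytop_left = above[-1]" comments and the subtract, add, saturating-narrow sequence in the NEON code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp to the 8-bit pixel range, mirroring the
 * saturating narrow (vqmovun) used by the NEON versions. */
static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Vertical: every row of the block is a copy of the row above it. */
static void v_predictor_ref(uint8_t *dst, ptrdiff_t stride, int bs,
                            const uint8_t *above, const uint8_t *left) {
  (void)left;
  for (int r = 0; r < bs; ++r, dst += stride)
    for (int c = 0; c < bs; ++c) dst[c] = above[c];
}

/* Horizontal: every pixel in row r takes the value left[r]. */
static void h_predictor_ref(uint8_t *dst, ptrdiff_t stride, int bs,
                            const uint8_t *above, const uint8_t *left) {
  (void)above;
  for (int r = 0; r < bs; ++r, dst += stride)
    for (int c = 0; c < bs; ++c) dst[c] = left[r];
}

/* TM: left[r] + above[c] - above[-1], clamped to [0, 255]. */
static void tm_predictor_ref(uint8_t *dst, ptrdiff_t stride, int bs,
                             const uint8_t *above, const uint8_t *left) {
  const int top_left = above[-1];
  for (int r = 0; r < bs; ++r, dst += stride)
    for (int c = 0; c < bs; ++c)
      dst[c] = clip_pixel(left[r] + above[c] - top_left);
}
```

A scalar version like this is mainly useful as a bit-exactness reference when testing the renamed `aom_*_neon` entry points against their C counterparts.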

@@ -1,19 +1,22 @@
; ;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license ; This source code is subject to the terms of the BSD 2 Clause License and
; that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. An additional intellectual property rights grant can be found ; was not distributed with this source code in the LICENSE file, you can
; in the file PATENTS. All contributing project authors may ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; be found in the AUTHORS file in the root of the source tree. ; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
; ;
EXPORT |vpx_lpf_horizontal_4_dual_neon| ;
EXPORT |aom_lpf_horizontal_4_dual_neon|
ARM ARM
AREA ||.text||, CODE, READONLY, ALIGN=2 AREA ||.text||, CODE, READONLY, ALIGN=2
;void vpx_lpf_horizontal_4_dual_neon(uint8_t *s, int p, ;void aom_lpf_horizontal_4_dual_neon(uint8_t *s, int p,
; const uint8_t *blimit0, ; const uint8_t *blimit0,
; const uint8_t *limit0, ; const uint8_t *limit0,
; const uint8_t *thresh0, ; const uint8_t *thresh0,
@@ -29,7 +32,7 @@
; sp+8 const uint8_t *limit1, ; sp+8 const uint8_t *limit1,
; sp+12 const uint8_t *thresh1, ; sp+12 const uint8_t *thresh1,
|vpx_lpf_horizontal_4_dual_neon| PROC |aom_lpf_horizontal_4_dual_neon| PROC
push {lr} push {lr}
ldr r12, [sp, #4] ; load thresh0 ldr r12, [sp, #4] ; load thresh0
@@ -66,7 +69,7 @@
sub r2, r2, r1, lsl #1 sub r2, r2, r1, lsl #1
sub r3, r3, r1, lsl #1 sub r3, r3, r1, lsl #1
bl vpx_loop_filter_neon_16 bl aom_loop_filter_neon_16
vst1.u8 {q5}, [r2@64], r1 ; store op1 vst1.u8 {q5}, [r2@64], r1 ; store op1
vst1.u8 {q6}, [r3@64], r1 ; store op0 vst1.u8 {q6}, [r3@64], r1 ; store op0
@@ -76,9 +79,9 @@
vpop {d8-d15} ; restore neon registers vpop {d8-d15} ; restore neon registers
pop {pc} pop {pc}
ENDP ; |vpx_lpf_horizontal_4_dual_neon| ENDP ; |aom_lpf_horizontal_4_dual_neon|
; void vpx_loop_filter_neon_16(); ; void aom_loop_filter_neon_16();
; This is a helper function for the loopfilters. The individual functions do the ; This is a helper function for the loopfilters. The individual functions do the
; necessary load, transpose (if necessary) and store. This function uses ; necessary load, transpose (if necessary) and store. This function uses
; registers d8-d15, so the calling function must save those registers. ; registers d8-d15, so the calling function must save those registers.
@@ -101,7 +104,7 @@
; q6 op0 ; q6 op0
; q7 oq0 ; q7 oq0
; q8 oq1 ; q8 oq1
|vpx_loop_filter_neon_16| PROC |aom_loop_filter_neon_16| PROC
; filter_mask ; filter_mask
vabd.u8 q11, q3, q4 ; m1 = abs(p3 - p2) vabd.u8 q11, q3, q4 ; m1 = abs(p3 - p2)
@@ -194,6 +197,6 @@
veor q8, q12, q10 ; *oq1 = u^0x80 veor q8, q12, q10 ; *oq1 = u^0x80
bx lr bx lr
ENDP ; |vpx_loop_filter_neon_16| ENDP ; |aom_loop_filter_neon_16|
END END

@@ -0,0 +1,174 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_dsp_rtcd.h"
#include "./aom_config.h"
#include "aom/aom_integer.h"
static INLINE void loop_filter_neon_16(uint8x16_t qblimit, // blimit
uint8x16_t qlimit, // limit
uint8x16_t qthresh, // thresh
uint8x16_t q3, // p3
uint8x16_t q4, // p2
uint8x16_t q5, // p1
uint8x16_t q6, // p0
uint8x16_t q7, // q0
uint8x16_t q8, // q1
uint8x16_t q9, // q2
uint8x16_t q10, // q3
uint8x16_t *q5r, // p1
uint8x16_t *q6r, // p0
uint8x16_t *q7r, // q0
uint8x16_t *q8r) { // q1
uint8x16_t q1u8, q2u8, q11u8, q12u8, q13u8, q14u8, q15u8;
int16x8_t q2s16, q11s16;
uint16x8_t q4u16;
int8x16_t q0s8, q1s8, q2s8, q11s8, q12s8, q13s8;
int8x8_t d2s8, d3s8;
q11u8 = vabdq_u8(q3, q4);
q12u8 = vabdq_u8(q4, q5);
q13u8 = vabdq_u8(q5, q6);
q14u8 = vabdq_u8(q8, q7);
q3 = vabdq_u8(q9, q8);
q4 = vabdq_u8(q10, q9);
q11u8 = vmaxq_u8(q11u8, q12u8);
q12u8 = vmaxq_u8(q13u8, q14u8);
q3 = vmaxq_u8(q3, q4);
q15u8 = vmaxq_u8(q11u8, q12u8);
q9 = vabdq_u8(q6, q7);
// aom_hevmask
q13u8 = vcgtq_u8(q13u8, qthresh);
q14u8 = vcgtq_u8(q14u8, qthresh);
q15u8 = vmaxq_u8(q15u8, q3);
q2u8 = vabdq_u8(q5, q8);
q9 = vqaddq_u8(q9, q9);
q15u8 = vcgeq_u8(qlimit, q15u8);
// aom_filter() function
// convert to signed
q10 = vdupq_n_u8(0x80);
q8 = veorq_u8(q8, q10);
q7 = veorq_u8(q7, q10);
q6 = veorq_u8(q6, q10);
q5 = veorq_u8(q5, q10);
q2u8 = vshrq_n_u8(q2u8, 1);
q9 = vqaddq_u8(q9, q2u8);
q2s16 = vsubl_s8(vget_low_s8(vreinterpretq_s8_u8(q7)),
vget_low_s8(vreinterpretq_s8_u8(q6)));
q11s16 = vsubl_s8(vget_high_s8(vreinterpretq_s8_u8(q7)),
vget_high_s8(vreinterpretq_s8_u8(q6)));
q9 = vcgeq_u8(qblimit, q9);
q1s8 = vqsubq_s8(vreinterpretq_s8_u8(q5), vreinterpretq_s8_u8(q8));
q14u8 = vorrq_u8(q13u8, q14u8);
q4u16 = vdupq_n_u16(3);
q2s16 = vmulq_s16(q2s16, vreinterpretq_s16_u16(q4u16));
q11s16 = vmulq_s16(q11s16, vreinterpretq_s16_u16(q4u16));
q1u8 = vandq_u8(vreinterpretq_u8_s8(q1s8), q14u8);
q15u8 = vandq_u8(q15u8, q9);
q1s8 = vreinterpretq_s8_u8(q1u8);
q2s16 = vaddw_s8(q2s16, vget_low_s8(q1s8));
q11s16 = vaddw_s8(q11s16, vget_high_s8(q1s8));
q4 = vdupq_n_u8(3);
q9 = vdupq_n_u8(4);
// aom_filter = clamp(aom_filter + 3 * ( qs0 - ps0))
d2s8 = vqmovn_s16(q2s16);
d3s8 = vqmovn_s16(q11s16);
q1s8 = vcombine_s8(d2s8, d3s8);
q1u8 = vandq_u8(vreinterpretq_u8_s8(q1s8), q15u8);
q1s8 = vreinterpretq_s8_u8(q1u8);
q2s8 = vqaddq_s8(q1s8, vreinterpretq_s8_u8(q4));
q1s8 = vqaddq_s8(q1s8, vreinterpretq_s8_u8(q9));
q2s8 = vshrq_n_s8(q2s8, 3);
q1s8 = vshrq_n_s8(q1s8, 3);
q11s8 = vqaddq_s8(vreinterpretq_s8_u8(q6), q2s8);
q0s8 = vqsubq_s8(vreinterpretq_s8_u8(q7), q1s8);
q1s8 = vrshrq_n_s8(q1s8, 1);
q1s8 = vbicq_s8(q1s8, vreinterpretq_s8_u8(q14u8));
q13s8 = vqaddq_s8(vreinterpretq_s8_u8(q5), q1s8);
q12s8 = vqsubq_s8(vreinterpretq_s8_u8(q8), q1s8);
*q8r = veorq_u8(vreinterpretq_u8_s8(q12s8), q10);
*q7r = veorq_u8(vreinterpretq_u8_s8(q0s8), q10);
*q6r = veorq_u8(vreinterpretq_u8_s8(q11s8), q10);
*q5r = veorq_u8(vreinterpretq_u8_s8(q13s8), q10);
return;
}
void aom_lpf_horizontal_4_dual_neon(
uint8_t *s, int p /* pitch */, const uint8_t *blimit0,
const uint8_t *limit0, const uint8_t *thresh0, const uint8_t *blimit1,
const uint8_t *limit1, const uint8_t *thresh1) {
uint8x8_t dblimit0, dlimit0, dthresh0, dblimit1, dlimit1, dthresh1;
uint8x16_t qblimit, qlimit, qthresh;
uint8x16_t q3u8, q4u8, q5u8, q6u8, q7u8, q8u8, q9u8, q10u8;
dblimit0 = vld1_u8(blimit0);
dlimit0 = vld1_u8(limit0);
dthresh0 = vld1_u8(thresh0);
dblimit1 = vld1_u8(blimit1);
dlimit1 = vld1_u8(limit1);
dthresh1 = vld1_u8(thresh1);
qblimit = vcombine_u8(dblimit0, dblimit1);
qlimit = vcombine_u8(dlimit0, dlimit1);
qthresh = vcombine_u8(dthresh0, dthresh1);
s -= (p << 2);  // step back four rows so s points at the p3 row
q3u8 = vld1q_u8(s);
s += p;
q4u8 = vld1q_u8(s);
s += p;
q5u8 = vld1q_u8(s);
s += p;
q6u8 = vld1q_u8(s);
s += p;
q7u8 = vld1q_u8(s);
s += p;
q8u8 = vld1q_u8(s);
s += p;
q9u8 = vld1q_u8(s);
s += p;
q10u8 = vld1q_u8(s);
loop_filter_neon_16(qblimit, qlimit, qthresh, q3u8, q4u8, q5u8, q6u8, q7u8,
q8u8, q9u8, q10u8, &q5u8, &q6u8, &q7u8, &q8u8);
s -= (p * 5);  // rewind from the q3 row to the p1 row, the first row written back
vst1q_u8(s, q5u8);
s += p;
vst1q_u8(s, q6u8);
s += p;
vst1q_u8(s, q7u8);
s += p;
vst1q_u8(s, q8u8);
return;
}
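
Note for readers of the intrinsics above: what `loop_filter_neon_16` does in each of its 16 lanes can be sketched as a scalar C function for one pixel column. This is a reading aid derived from the intrinsics in this file, not code from this change. The names `loop_filter4_ref` and `clamp_s8` are assumptions introduced here, and the sketch assumes `>>` on a negative `int` is an arithmetic shift, as it is on common compilers.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: saturate to the signed 8-bit range, mirroring the
 * vqadd/vqsub/vqmovn saturation in the NEON code. */
static int clamp_s8(int v) { return v < -128 ? -128 : (v > 127 ? 127 : v); }

static void loop_filter4_ref(uint8_t blimit, uint8_t limit, uint8_t thresh,
                             uint8_t p3, uint8_t p2, uint8_t *p1, uint8_t *p0,
                             uint8_t *q0, uint8_t *q1, uint8_t q2,
                             uint8_t q3) {
  /* Filter mask: all neighbouring differences must be within limit. */
  const int d[6] = { abs(p3 - p2),   abs(p2 - *p1), abs(*p1 - *p0),
                     abs(*q1 - *q0), abs(q2 - *q1), abs(q3 - q2) };
  int max_d = 0;
  for (int i = 0; i < 6; ++i)
    if (d[i] > max_d) max_d = d[i];
  /* Edge term: |p0 - q0| * 2 + |p1 - q1| / 2, saturated like vqaddq_u8. */
  int edge = abs(*p0 - *q0) * 2 + abs(*p1 - *q1) / 2;
  if (edge > 255) edge = 255;
  const int mask = (max_d <= limit) && (edge <= blimit);
  /* High edge variance: either inner difference exceeds thresh. */
  const int hev = (abs(*p1 - *p0) > thresh) || (abs(*q1 - *q0) > thresh);

  /* XOR with 0x80 in the NEON code is a shift to the signed range. */
  const int ps1 = *p1 - 128, ps0 = *p0 - 128;
  const int qs0 = *q0 - 128, qs1 = *q1 - 128;

  int filter = hev ? clamp_s8(ps1 - qs1) : 0; /* outer taps only if hev */
  filter = clamp_s8(filter + 3 * (qs0 - ps0));
  if (!mask) filter = 0;

  const int filter1 = clamp_s8(filter + 4) >> 3;
  const int filter2 = clamp_s8(filter + 3) >> 3;
  *q0 = (uint8_t)(clamp_s8(qs0 - filter1) + 128);
  *p0 = (uint8_t)(clamp_s8(ps0 + filter2) + 128);

  /* Outer tap adjustment: rounding halve of filter1, skipped when hev. */
  filter = hev ? 0 : ((filter1 + 1) >> 1);
  *q1 = (uint8_t)(clamp_s8(qs1 - filter) + 128);
  *p1 = (uint8_t)(clamp_s8(ps1 + filter) + 128);
}
```

For example, a small step of 100 on the p side and 108 on the q side, with `blimit = 30`, `limit = 10` and `thresh = 20`, is smoothed to 102, 103, 105, 106 across p1, p0, q0, q1. A large step such as 50 against 200 fails the mask test and is left untouched.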

@@ -1,23 +1,26 @@
; ;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license ; This source code is subject to the terms of the BSD 2 Clause License and
; that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. An additional intellectual property rights grant can be found ; was not distributed with this source code in the LICENSE file, you can
; in the file PATENTS. All contributing project authors may ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; be found in the AUTHORS file in the root of the source tree. ; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
; ;
EXPORT |vpx_lpf_horizontal_4_neon| ;
EXPORT |vpx_lpf_vertical_4_neon|
EXPORT |aom_lpf_horizontal_4_neon|
EXPORT |aom_lpf_vertical_4_neon|
ARM ARM
AREA ||.text||, CODE, READONLY, ALIGN=2 AREA ||.text||, CODE, READONLY, ALIGN=2
; Currently vpx only works on iterations 8 at a time. The vp8 loop filter ; Currently aom only works on iterations 8 at a time. The aom loop filter
; works on 16 iterations at a time. ; works on 16 iterations at a time.
; ;
; void vpx_lpf_horizontal_4_neon(uint8_t *s, ; void aom_lpf_horizontal_4_neon(uint8_t *s,
; int p /* pitch */, ; int p /* pitch */,
; const uint8_t *blimit, ; const uint8_t *blimit,
; const uint8_t *limit, ; const uint8_t *limit,
@@ -28,7 +31,7 @@
; r2 const uint8_t *blimit, ; r2 const uint8_t *blimit,
; r3 const uint8_t *limit, ; r3 const uint8_t *limit,
; sp const uint8_t *thresh, ; sp const uint8_t *thresh,
|vpx_lpf_horizontal_4_neon| PROC |aom_lpf_horizontal_4_neon| PROC
push {lr} push {lr}
vld1.8 {d0[]}, [r2] ; duplicate *blimit vld1.8 {d0[]}, [r2] ; duplicate *blimit
@@ -53,7 +56,7 @@
sub r2, r2, r1, lsl #1 sub r2, r2, r1, lsl #1
sub r3, r3, r1, lsl #1 sub r3, r3, r1, lsl #1
bl vpx_loop_filter_neon bl aom_loop_filter_neon
vst1.u8 {d4}, [r2@64], r1 ; store op1 vst1.u8 {d4}, [r2@64], r1 ; store op1
vst1.u8 {d5}, [r3@64], r1 ; store op0 vst1.u8 {d5}, [r3@64], r1 ; store op0
@@ -61,12 +64,12 @@
vst1.u8 {d7}, [r3@64], r1 ; store oq1 vst1.u8 {d7}, [r3@64], r1 ; store oq1
pop {pc} pop {pc}
ENDP ; |vpx_lpf_horizontal_4_neon| ENDP ; |aom_lpf_horizontal_4_neon|
; Currently vpx only works on iterations 8 at a time. The vp8 loop filter ; Currently aom only works on iterations 8 at a time. The aom loop filter
; works on 16 iterations at a time. ; works on 16 iterations at a time.
; ;
; void vpx_lpf_vertical_4_neon(uint8_t *s, ; void aom_lpf_vertical_4_neon(uint8_t *s,
; int p /* pitch */, ; int p /* pitch */,
; const uint8_t *blimit, ; const uint8_t *blimit,
; const uint8_t *limit, ; const uint8_t *limit,
@@ -77,7 +80,7 @@
; r2 const uint8_t *blimit, ; r2 const uint8_t *blimit,
; r3 const uint8_t *limit, ; r3 const uint8_t *limit,
; sp const uint8_t *thresh, ; sp const uint8_t *thresh,
|vpx_lpf_vertical_4_neon| PROC |aom_lpf_vertical_4_neon| PROC
push {lr} push {lr}
vld1.8 {d0[]}, [r2] ; duplicate *blimit vld1.8 {d0[]}, [r2] ; duplicate *blimit
@@ -113,7 +116,7 @@
vtrn.8 d7, d16 vtrn.8 d7, d16
vtrn.8 d17, d18 vtrn.8 d17, d18
bl vpx_loop_filter_neon bl aom_loop_filter_neon
sub r0, r0, #2 sub r0, r0, #2
@@ -128,9 +131,9 @@
vst4.8 {d4[7], d5[7], d6[7], d7[7]}, [r0] vst4.8 {d4[7], d5[7], d6[7], d7[7]}, [r0]
pop {pc} pop {pc}
ENDP ; |vpx_lpf_vertical_4_neon| ENDP ; |aom_lpf_vertical_4_neon|
; void vpx_loop_filter_neon(); ; void aom_loop_filter_neon();
; This is a helper function for the loopfilters. The individual functions do the ; This is a helper function for the loopfilters. The individual functions do the
; necessary load, transpose (if necessary) and store. The function does not use ; necessary load, transpose (if necessary) and store. The function does not use
; registers d8-d15. ; registers d8-d15.
@@ -154,7 +157,7 @@
; d5 op0 ; d5 op0
; d6 oq0 ; d6 oq0
; d7 oq1 ; d7 oq1
|vpx_loop_filter_neon| PROC |aom_loop_filter_neon| PROC
; filter_mask ; filter_mask
vabd.u8 d19, d3, d4 ; m1 = abs(p3 - p2) vabd.u8 d19, d3, d4 ; m1 = abs(p3 - p2)
vabd.u8 d20, d4, d5 ; m2 = abs(p2 - p1) vabd.u8 d20, d4, d5 ; m2 = abs(p2 - p1)
@@ -244,6 +247,6 @@
veor d7, d20, d18 ; *oq1 = u^0x80 veor d7, d20, d18 ; *oq1 = u^0x80
bx lr bx lr
ENDP ; |vpx_loop_filter_neon| ENDP ; |aom_loop_filter_neon|
END END
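
The `filter_mask` comments in the helper above (`m1 = abs(p3 - p2)`, `m2 = abs(p2 - p1)`, ...) describe per-pixel comparisons that may be easier to follow in scalar C. The following is a minimal sketch under stated assumptions: the function names and the `px` layout (p3, p2, p1, p0, q0, q1, q2, q3) are hypothetical, and the NEON code uses saturating adds where this sketch uses plain `int` arithmetic.

```c
#include <stdint.h>
#include <stdlib.h>

static inline int max_int(int a, int b) { return a > b ? a : b; }

/* px holds p3, p2, p1, p0, q0, q1, q2, q3 for one pixel position.
 * Returns -1 (all bits set) if the edge should be filtered, else 0. */
static inline int8_t filter_mask_scalar(uint8_t limit, uint8_t blimit,
                                        const uint8_t px[8]) {
  int m = 0;
  /* Largest step between neighbours on the same side of the edge. */
  for (int i = 0; i < 7; ++i) {
    if (i == 3) continue; /* skip p0/q0, handled by the blimit test */
    m = max_int(m, abs(px[i] - px[i + 1]));
  }
  /* Step across the edge: 2 * |p0 - q0| + |p1 - q1| / 2. */
  const int edge = abs(px[3] - px[4]) * 2 + abs(px[2] - px[5]) / 2;
  return (m <= limit && edge <= blimit) ? -1 : 0;
}

/* High edge variance: -1 if |p1 - p0| or |q1 - q0| exceeds thresh. */
static inline int8_t hev_mask_scalar(uint8_t thresh, const uint8_t px[8]) {
  return (abs(px[2] - px[3]) > thresh || abs(px[5] - px[4]) > thresh) ? -1
                                                                      : 0;
}
```

The two results correspond to the all-ones or all-zeros byte lanes that the `vcge`/`vcgt` instructions produce, which is why the filter can apply them with plain AND and BIC operations.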


@@ -0,0 +1,250 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_dsp_rtcd.h"
static INLINE void loop_filter_neon(uint8x8_t dblimit, // flimit
uint8x8_t dlimit, // limit
uint8x8_t dthresh, // thresh
uint8x8_t d3u8, // p3
uint8x8_t d4u8, // p2
uint8x8_t d5u8, // p1
uint8x8_t d6u8, // p0
uint8x8_t d7u8, // q0
uint8x8_t d16u8, // q1
uint8x8_t d17u8, // q2
uint8x8_t d18u8, // q3
uint8x8_t *d4ru8, // p1
uint8x8_t *d5ru8, // p0
uint8x8_t *d6ru8, // q0
uint8x8_t *d7ru8) { // q1
uint8x8_t d19u8, d20u8, d21u8, d22u8, d23u8, d27u8, d28u8;
int16x8_t q12s16;
int8x8_t d19s8, d20s8, d21s8, d26s8, d27s8, d28s8;
d19u8 = vabd_u8(d3u8, d4u8);
d20u8 = vabd_u8(d4u8, d5u8);
d21u8 = vabd_u8(d5u8, d6u8);
d22u8 = vabd_u8(d16u8, d7u8);
d3u8 = vabd_u8(d17u8, d16u8);
d4u8 = vabd_u8(d18u8, d17u8);
d19u8 = vmax_u8(d19u8, d20u8);
d20u8 = vmax_u8(d21u8, d22u8);
d3u8 = vmax_u8(d3u8, d4u8);
d23u8 = vmax_u8(d19u8, d20u8);
d17u8 = vabd_u8(d6u8, d7u8);
d21u8 = vcgt_u8(d21u8, dthresh);
d22u8 = vcgt_u8(d22u8, dthresh);
d23u8 = vmax_u8(d23u8, d3u8);
d28u8 = vabd_u8(d5u8, d16u8);
d17u8 = vqadd_u8(d17u8, d17u8);
d23u8 = vcge_u8(dlimit, d23u8);
d18u8 = vdup_n_u8(0x80);
d5u8 = veor_u8(d5u8, d18u8);
d6u8 = veor_u8(d6u8, d18u8);
d7u8 = veor_u8(d7u8, d18u8);
d16u8 = veor_u8(d16u8, d18u8);
d28u8 = vshr_n_u8(d28u8, 1);
d17u8 = vqadd_u8(d17u8, d28u8);
d19u8 = vdup_n_u8(3);
d28s8 = vsub_s8(vreinterpret_s8_u8(d7u8), vreinterpret_s8_u8(d6u8));
d17u8 = vcge_u8(dblimit, d17u8);
d27s8 = vqsub_s8(vreinterpret_s8_u8(d5u8), vreinterpret_s8_u8(d16u8));
d22u8 = vorr_u8(d21u8, d22u8);
q12s16 = vmull_s8(d28s8, vreinterpret_s8_u8(d19u8));
d27u8 = vand_u8(vreinterpret_u8_s8(d27s8), d22u8);
d23u8 = vand_u8(d23u8, d17u8);
q12s16 = vaddw_s8(q12s16, vreinterpret_s8_u8(d27u8));
d17u8 = vdup_n_u8(4);
d27s8 = vqmovn_s16(q12s16);
d27u8 = vand_u8(vreinterpret_u8_s8(d27s8), d23u8);
d27s8 = vreinterpret_s8_u8(d27u8);
d28s8 = vqadd_s8(d27s8, vreinterpret_s8_u8(d19u8));
d27s8 = vqadd_s8(d27s8, vreinterpret_s8_u8(d17u8));
d28s8 = vshr_n_s8(d28s8, 3);
d27s8 = vshr_n_s8(d27s8, 3);
d19s8 = vqadd_s8(vreinterpret_s8_u8(d6u8), d28s8);
d26s8 = vqsub_s8(vreinterpret_s8_u8(d7u8), d27s8);
d27s8 = vrshr_n_s8(d27s8, 1);
d27s8 = vbic_s8(d27s8, vreinterpret_s8_u8(d22u8));
d21s8 = vqadd_s8(vreinterpret_s8_u8(d5u8), d27s8);
d20s8 = vqsub_s8(vreinterpret_s8_u8(d16u8), d27s8);
*d4ru8 = veor_u8(vreinterpret_u8_s8(d21s8), d18u8);
*d5ru8 = veor_u8(vreinterpret_u8_s8(d19s8), d18u8);
*d6ru8 = veor_u8(vreinterpret_u8_s8(d26s8), d18u8);
*d7ru8 = veor_u8(vreinterpret_u8_s8(d20s8), d18u8);
return;
}
void aom_lpf_horizontal_4_neon(uint8_t *src, int pitch, const uint8_t *blimit,
const uint8_t *limit, const uint8_t *thresh) {
int i;
uint8_t *s, *psrc;
uint8x8_t dblimit, dlimit, dthresh;
uint8x8_t d3u8, d4u8, d5u8, d6u8, d7u8, d16u8, d17u8, d18u8;
dblimit = vld1_u8(blimit);
dlimit = vld1_u8(limit);
dthresh = vld1_u8(thresh);
psrc = src - (pitch << 2);
for (i = 0; i < 1; i++) {
s = psrc + i * 8;
d3u8 = vld1_u8(s);
s += pitch;
d4u8 = vld1_u8(s);
s += pitch;
d5u8 = vld1_u8(s);
s += pitch;
d6u8 = vld1_u8(s);
s += pitch;
d7u8 = vld1_u8(s);
s += pitch;
d16u8 = vld1_u8(s);
s += pitch;
d17u8 = vld1_u8(s);
s += pitch;
d18u8 = vld1_u8(s);
loop_filter_neon(dblimit, dlimit, dthresh, d3u8, d4u8, d5u8, d6u8, d7u8,
d16u8, d17u8, d18u8, &d4u8, &d5u8, &d6u8, &d7u8);
s -= (pitch * 5);
vst1_u8(s, d4u8);
s += pitch;
vst1_u8(s, d5u8);
s += pitch;
vst1_u8(s, d6u8);
s += pitch;
vst1_u8(s, d7u8);
}
return;
}
void aom_lpf_vertical_4_neon(uint8_t *src, int pitch, const uint8_t *blimit,
const uint8_t *limit, const uint8_t *thresh) {
int i, pitch8;
uint8_t *s;
uint8x8_t dblimit, dlimit, dthresh;
uint8x8_t d3u8, d4u8, d5u8, d6u8, d7u8, d16u8, d17u8, d18u8;
uint32x2x2_t d2tmp0, d2tmp1, d2tmp2, d2tmp3;
uint16x4x2_t d2tmp4, d2tmp5, d2tmp6, d2tmp7;
uint8x8x2_t d2tmp8, d2tmp9, d2tmp10, d2tmp11;
uint8x8x4_t d4Result;
dblimit = vld1_u8(blimit);
dlimit = vld1_u8(limit);
dthresh = vld1_u8(thresh);
pitch8 = pitch * 8;
for (i = 0; i < 1; i++, src += pitch8) {
s = src - (i + 1) * 4;
d3u8 = vld1_u8(s);
s += pitch;
d4u8 = vld1_u8(s);
s += pitch;
d5u8 = vld1_u8(s);
s += pitch;
d6u8 = vld1_u8(s);
s += pitch;
d7u8 = vld1_u8(s);
s += pitch;
d16u8 = vld1_u8(s);
s += pitch;
d17u8 = vld1_u8(s);
s += pitch;
d18u8 = vld1_u8(s);
d2tmp0 = vtrn_u32(vreinterpret_u32_u8(d3u8), vreinterpret_u32_u8(d7u8));
d2tmp1 = vtrn_u32(vreinterpret_u32_u8(d4u8), vreinterpret_u32_u8(d16u8));
d2tmp2 = vtrn_u32(vreinterpret_u32_u8(d5u8), vreinterpret_u32_u8(d17u8));
d2tmp3 = vtrn_u32(vreinterpret_u32_u8(d6u8), vreinterpret_u32_u8(d18u8));
d2tmp4 = vtrn_u16(vreinterpret_u16_u32(d2tmp0.val[0]),
vreinterpret_u16_u32(d2tmp2.val[0]));
d2tmp5 = vtrn_u16(vreinterpret_u16_u32(d2tmp1.val[0]),
vreinterpret_u16_u32(d2tmp3.val[0]));
d2tmp6 = vtrn_u16(vreinterpret_u16_u32(d2tmp0.val[1]),
vreinterpret_u16_u32(d2tmp2.val[1]));
d2tmp7 = vtrn_u16(vreinterpret_u16_u32(d2tmp1.val[1]),
vreinterpret_u16_u32(d2tmp3.val[1]));
d2tmp8 = vtrn_u8(vreinterpret_u8_u16(d2tmp4.val[0]),
vreinterpret_u8_u16(d2tmp5.val[0]));
d2tmp9 = vtrn_u8(vreinterpret_u8_u16(d2tmp4.val[1]),
vreinterpret_u8_u16(d2tmp5.val[1]));
d2tmp10 = vtrn_u8(vreinterpret_u8_u16(d2tmp6.val[0]),
vreinterpret_u8_u16(d2tmp7.val[0]));
d2tmp11 = vtrn_u8(vreinterpret_u8_u16(d2tmp6.val[1]),
vreinterpret_u8_u16(d2tmp7.val[1]));
d3u8 = d2tmp8.val[0];
d4u8 = d2tmp8.val[1];
d5u8 = d2tmp9.val[0];
d6u8 = d2tmp9.val[1];
d7u8 = d2tmp10.val[0];
d16u8 = d2tmp10.val[1];
d17u8 = d2tmp11.val[0];
d18u8 = d2tmp11.val[1];
loop_filter_neon(dblimit, dlimit, dthresh, d3u8, d4u8, d5u8, d6u8, d7u8,
d16u8, d17u8, d18u8, &d4u8, &d5u8, &d6u8, &d7u8);
d4Result.val[0] = d4u8;
d4Result.val[1] = d5u8;
d4Result.val[2] = d6u8;
d4Result.val[3] = d7u8;
src -= 2;
vst4_lane_u8(src, d4Result, 0);
src += pitch;
vst4_lane_u8(src, d4Result, 1);
src += pitch;
vst4_lane_u8(src, d4Result, 2);
src += pitch;
vst4_lane_u8(src, d4Result, 3);
src += pitch;
vst4_lane_u8(src, d4Result, 4);
src += pitch;
vst4_lane_u8(src, d4Result, 5);
src += pitch;
vst4_lane_u8(src, d4Result, 6);
src += pitch;
vst4_lane_u8(src, d4Result, 7);
}
return;
}
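
The vertical variant above loads eight rows starting four bytes left of the edge and then runs three rounds of `vtrn` so that each vector holds one column (p3 through q3) instead of one row. What that transpose achieves can be sketched in scalar C; `transpose_8x8_scalar` is a hypothetical helper for illustration, not part of this change.

```c
#include <stdint.h>

/* Transpose an 8x8 block of bytes: out[c][r] = src[r][c].
 * When src points at (edge - 4), out[0] is the p3 column and out[7] the q3
 * column, each holding one byte per row. */
static inline void transpose_8x8_scalar(const uint8_t *src, int pitch,
                                        uint8_t out[8][8]) {
  for (int r = 0; r < 8; ++r) {
    for (int c = 0; c < 8; ++c) {
      out[c][r] = src[r * pitch + c];
    }
  }
}
```

After the transpose the same row-oriented filter kernel can be reused for vertical edges, which is why both entry points call `loop_filter_neon` with identical arguments.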


@@ -1,23 +1,26 @@
; ;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license ; This source code is subject to the terms of the BSD 2 Clause License and
; that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. An additional intellectual property rights grant can be found ; was not distributed with this source code in the LICENSE file, you can
; in the file PATENTS. All contributing project authors may ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; be found in the AUTHORS file in the root of the source tree. ; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
; ;
EXPORT |vpx_lpf_horizontal_8_neon| ;
EXPORT |vpx_lpf_vertical_8_neon|
EXPORT |aom_lpf_horizontal_8_neon|
EXPORT |aom_lpf_vertical_8_neon|
ARM ARM
AREA ||.text||, CODE, READONLY, ALIGN=2 AREA ||.text||, CODE, READONLY, ALIGN=2
; Currently vpx only works on iterations 8 at a time. The vp8 loop filter ; Currently aom only works on iterations 8 at a time. The aom loop filter
; works on 16 iterations at a time. ; works on 16 iterations at a time.
; ;
; void vpx_lpf_horizontal_8_neon(uint8_t *s, int p, ; void aom_lpf_horizontal_8_neon(uint8_t *s, int p,
; const uint8_t *blimit, ; const uint8_t *blimit,
; const uint8_t *limit, ; const uint8_t *limit,
; const uint8_t *thresh) ; const uint8_t *thresh)
@@ -26,7 +29,7 @@
; r2 const uint8_t *blimit, ; r2 const uint8_t *blimit,
; r3 const uint8_t *limit, ; r3 const uint8_t *limit,
; sp const uint8_t *thresh, ; sp const uint8_t *thresh,
|vpx_lpf_horizontal_8_neon| PROC |aom_lpf_horizontal_8_neon| PROC
push {r4-r5, lr} push {r4-r5, lr}
vld1.8 {d0[]}, [r2] ; duplicate *blimit vld1.8 {d0[]}, [r2] ; duplicate *blimit
@@ -51,7 +54,7 @@
sub r3, r3, r1, lsl #1 sub r3, r3, r1, lsl #1
sub r2, r2, r1, lsl #2 sub r2, r2, r1, lsl #2
bl vpx_mbloop_filter_neon bl aom_mbloop_filter_neon
vst1.u8 {d0}, [r2@64], r1 ; store op2 vst1.u8 {d0}, [r2@64], r1 ; store op2
vst1.u8 {d1}, [r3@64], r1 ; store op1 vst1.u8 {d1}, [r3@64], r1 ; store op1
@@ -62,9 +65,9 @@
pop {r4-r5, pc} pop {r4-r5, pc}
ENDP ; |vpx_lpf_horizontal_8_neon| ENDP ; |aom_lpf_horizontal_8_neon|
; void vpx_lpf_vertical_8_neon(uint8_t *s, ; void aom_lpf_vertical_8_neon(uint8_t *s,
; int pitch, ; int pitch,
; const uint8_t *blimit, ; const uint8_t *blimit,
; const uint8_t *limit, ; const uint8_t *limit,
@@ -75,7 +78,7 @@
; r2 const uint8_t *blimit, ; r2 const uint8_t *blimit,
; r3 const uint8_t *limit, ; r3 const uint8_t *limit,
; sp const uint8_t *thresh, ; sp const uint8_t *thresh,
|vpx_lpf_vertical_8_neon| PROC |aom_lpf_vertical_8_neon| PROC
push {r4-r5, lr} push {r4-r5, lr}
vld1.8 {d0[]}, [r2] ; duplicate *blimit vld1.8 {d0[]}, [r2] ; duplicate *blimit
@@ -114,7 +117,7 @@
sub r2, r0, #3 sub r2, r0, #3
add r3, r0, #1 add r3, r0, #1
bl vpx_mbloop_filter_neon bl aom_mbloop_filter_neon
;store op2, op1, op0, oq0 ;store op2, op1, op0, oq0
vst4.8 {d0[0], d1[0], d2[0], d3[0]}, [r2], r1 vst4.8 {d0[0], d1[0], d2[0], d3[0]}, [r2], r1
@@ -137,9 +140,9 @@
vst2.8 {d4[7], d5[7]}, [r3] vst2.8 {d4[7], d5[7]}, [r3]
pop {r4-r5, pc} pop {r4-r5, pc}
ENDP ; |vpx_lpf_vertical_8_neon| ENDP ; |aom_lpf_vertical_8_neon|
; void vpx_mbloop_filter_neon(); ; void aom_mbloop_filter_neon();
; This is a helper function for the loopfilters. The individual functions do the ; This is a helper function for the loopfilters. The individual functions do the
; necessary load, transpose (if necessary) and store. The function does not use ; necessary load, transpose (if necessary) and store. The function does not use
; registers d8-d15. ; registers d8-d15.
@@ -165,7 +168,7 @@
; d3 oq0 ; d3 oq0
; d4 oq1 ; d4 oq1
; d5 oq2 ; d5 oq2
|vpx_mbloop_filter_neon| PROC |aom_mbloop_filter_neon| PROC
; filter_mask ; filter_mask
vabd.u8 d19, d3, d4 ; m1 = abs(p3 - p2) vabd.u8 d19, d3, d4 ; m1 = abs(p3 - p2)
vabd.u8 d20, d4, d5 ; m2 = abs(p2 - p1) vabd.u8 d20, d4, d5 ; m2 = abs(p2 - p1)
@@ -420,6 +423,6 @@ filter_branch_only
bx lr bx lr
ENDP ; |vpx_mbloop_filter_neon| ENDP ; |aom_mbloop_filter_neon|
END END


@@ -0,0 +1,430 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_dsp_rtcd.h"
static INLINE void mbloop_filter_neon(uint8x8_t dblimit, // mblimit
uint8x8_t dlimit, // limit
uint8x8_t dthresh, // thresh
uint8x8_t d3u8, // p3
uint8x8_t d4u8, // p2
uint8x8_t d5u8, // p1
uint8x8_t d6u8, // p0
uint8x8_t d7u8, // q0
uint8x8_t d16u8, // q1
uint8x8_t d17u8, // q2
uint8x8_t d18u8, // q3
uint8x8_t *d0ru8, // p2
uint8x8_t *d1ru8, // p1
uint8x8_t *d2ru8, // p0
uint8x8_t *d3ru8, // q0
uint8x8_t *d4ru8, // q1
uint8x8_t *d5ru8) { // q2
uint32_t flat;
uint8x8_t d0u8, d1u8, d2u8, d19u8, d20u8, d21u8, d22u8, d23u8, d24u8;
uint8x8_t d25u8, d26u8, d27u8, d28u8, d29u8, d30u8, d31u8;
int16x8_t q15s16;
uint16x8_t q10u16, q14u16;
int8x8_t d21s8, d24s8, d25s8, d26s8, d28s8, d29s8, d30s8;
d19u8 = vabd_u8(d3u8, d4u8);
d20u8 = vabd_u8(d4u8, d5u8);
d21u8 = vabd_u8(d5u8, d6u8);
d22u8 = vabd_u8(d16u8, d7u8);
d23u8 = vabd_u8(d17u8, d16u8);
d24u8 = vabd_u8(d18u8, d17u8);
d19u8 = vmax_u8(d19u8, d20u8);
d20u8 = vmax_u8(d21u8, d22u8);
d25u8 = vabd_u8(d6u8, d4u8);
d23u8 = vmax_u8(d23u8, d24u8);
d26u8 = vabd_u8(d7u8, d17u8);
d19u8 = vmax_u8(d19u8, d20u8);
d24u8 = vabd_u8(d6u8, d7u8);
d27u8 = vabd_u8(d3u8, d6u8);
d28u8 = vabd_u8(d18u8, d7u8);
d19u8 = vmax_u8(d19u8, d23u8);
d23u8 = vabd_u8(d5u8, d16u8);
d24u8 = vqadd_u8(d24u8, d24u8);
d19u8 = vcge_u8(dlimit, d19u8);
d25u8 = vmax_u8(d25u8, d26u8);
d26u8 = vmax_u8(d27u8, d28u8);
d23u8 = vshr_n_u8(d23u8, 1);
d25u8 = vmax_u8(d25u8, d26u8);
d24u8 = vqadd_u8(d24u8, d23u8);
d20u8 = vmax_u8(d20u8, d25u8);
d23u8 = vdup_n_u8(1);
d24u8 = vcge_u8(dblimit, d24u8);
d21u8 = vcgt_u8(d21u8, dthresh);
d20u8 = vcge_u8(d23u8, d20u8);
d19u8 = vand_u8(d19u8, d24u8);
d23u8 = vcgt_u8(d22u8, dthresh);
d20u8 = vand_u8(d20u8, d19u8);
d22u8 = vdup_n_u8(0x80);
d23u8 = vorr_u8(d21u8, d23u8);
q10u16 = vcombine_u16(vreinterpret_u16_u8(d20u8), vreinterpret_u16_u8(d21u8));
d30u8 = vshrn_n_u16(q10u16, 4);
flat = vget_lane_u32(vreinterpret_u32_u8(d30u8), 0);
if (flat == 0xffffffff) { // Check for all 1's, power_branch_only
d27u8 = vdup_n_u8(3);
d21u8 = vdup_n_u8(2);
q14u16 = vaddl_u8(d6u8, d7u8);
q14u16 = vmlal_u8(q14u16, d3u8, d27u8);
q14u16 = vmlal_u8(q14u16, d4u8, d21u8);
q14u16 = vaddw_u8(q14u16, d5u8);
*d0ru8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d3u8);
q14u16 = vsubw_u8(q14u16, d4u8);
q14u16 = vaddw_u8(q14u16, d5u8);
q14u16 = vaddw_u8(q14u16, d16u8);
*d1ru8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d3u8);
q14u16 = vsubw_u8(q14u16, d5u8);
q14u16 = vaddw_u8(q14u16, d6u8);
q14u16 = vaddw_u8(q14u16, d17u8);
*d2ru8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d3u8);
q14u16 = vsubw_u8(q14u16, d6u8);
q14u16 = vaddw_u8(q14u16, d7u8);
q14u16 = vaddw_u8(q14u16, d18u8);
*d3ru8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d4u8);
q14u16 = vsubw_u8(q14u16, d7u8);
q14u16 = vaddw_u8(q14u16, d16u8);
q14u16 = vaddw_u8(q14u16, d18u8);
*d4ru8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d5u8);
q14u16 = vsubw_u8(q14u16, d16u8);
q14u16 = vaddw_u8(q14u16, d17u8);
q14u16 = vaddw_u8(q14u16, d18u8);
*d5ru8 = vqrshrn_n_u16(q14u16, 3);
} else {
d21u8 = veor_u8(d7u8, d22u8);
d24u8 = veor_u8(d6u8, d22u8);
d25u8 = veor_u8(d5u8, d22u8);
d26u8 = veor_u8(d16u8, d22u8);
d27u8 = vdup_n_u8(3);
d28s8 = vsub_s8(vreinterpret_s8_u8(d21u8), vreinterpret_s8_u8(d24u8));
d29s8 = vqsub_s8(vreinterpret_s8_u8(d25u8), vreinterpret_s8_u8(d26u8));
q15s16 = vmull_s8(d28s8, vreinterpret_s8_u8(d27u8));
d29s8 = vand_s8(d29s8, vreinterpret_s8_u8(d23u8));
q15s16 = vaddw_s8(q15s16, d29s8);
d29u8 = vdup_n_u8(4);
d28s8 = vqmovn_s16(q15s16);
d28s8 = vand_s8(d28s8, vreinterpret_s8_u8(d19u8));
d30s8 = vqadd_s8(d28s8, vreinterpret_s8_u8(d27u8));
d29s8 = vqadd_s8(d28s8, vreinterpret_s8_u8(d29u8));
d30s8 = vshr_n_s8(d30s8, 3);
d29s8 = vshr_n_s8(d29s8, 3);
d24s8 = vqadd_s8(vreinterpret_s8_u8(d24u8), d30s8);
d21s8 = vqsub_s8(vreinterpret_s8_u8(d21u8), d29s8);
d29s8 = vrshr_n_s8(d29s8, 1);
d29s8 = vbic_s8(d29s8, vreinterpret_s8_u8(d23u8));
d25s8 = vqadd_s8(vreinterpret_s8_u8(d25u8), d29s8);
d26s8 = vqsub_s8(vreinterpret_s8_u8(d26u8), d29s8);
if (flat == 0) { // filter_branch_only
*d0ru8 = d4u8;
*d1ru8 = veor_u8(vreinterpret_u8_s8(d25s8), d22u8);
*d2ru8 = veor_u8(vreinterpret_u8_s8(d24s8), d22u8);
*d3ru8 = veor_u8(vreinterpret_u8_s8(d21s8), d22u8);
*d4ru8 = veor_u8(vreinterpret_u8_s8(d26s8), d22u8);
*d5ru8 = d17u8;
return;
}
d21u8 = veor_u8(vreinterpret_u8_s8(d21s8), d22u8);
d24u8 = veor_u8(vreinterpret_u8_s8(d24s8), d22u8);
d25u8 = veor_u8(vreinterpret_u8_s8(d25s8), d22u8);
d26u8 = veor_u8(vreinterpret_u8_s8(d26s8), d22u8);
d23u8 = vdup_n_u8(2);
q14u16 = vaddl_u8(d6u8, d7u8);
q14u16 = vmlal_u8(q14u16, d3u8, d27u8);
q14u16 = vmlal_u8(q14u16, d4u8, d23u8);
d0u8 = vbsl_u8(d20u8, dblimit, d4u8);
q14u16 = vaddw_u8(q14u16, d5u8);
d1u8 = vbsl_u8(d20u8, dlimit, d25u8);
d30u8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d3u8);
q14u16 = vsubw_u8(q14u16, d4u8);
q14u16 = vaddw_u8(q14u16, d5u8);
q14u16 = vaddw_u8(q14u16, d16u8);
d2u8 = vbsl_u8(d20u8, dthresh, d24u8);
d31u8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d3u8);
q14u16 = vsubw_u8(q14u16, d5u8);
q14u16 = vaddw_u8(q14u16, d6u8);
q14u16 = vaddw_u8(q14u16, d17u8);
*d0ru8 = vbsl_u8(d20u8, d30u8, d0u8);
d23u8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d3u8);
q14u16 = vsubw_u8(q14u16, d6u8);
q14u16 = vaddw_u8(q14u16, d7u8);
*d1ru8 = vbsl_u8(d20u8, d31u8, d1u8);
q14u16 = vaddw_u8(q14u16, d18u8);
*d2ru8 = vbsl_u8(d20u8, d23u8, d2u8);
d22u8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d4u8);
q14u16 = vsubw_u8(q14u16, d7u8);
q14u16 = vaddw_u8(q14u16, d16u8);
d3u8 = vbsl_u8(d20u8, d3u8, d21u8);
q14u16 = vaddw_u8(q14u16, d18u8);
d4u8 = vbsl_u8(d20u8, d4u8, d26u8);
d6u8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d5u8);
q14u16 = vsubw_u8(q14u16, d16u8);
q14u16 = vaddw_u8(q14u16, d17u8);
q14u16 = vaddw_u8(q14u16, d18u8);
d5u8 = vbsl_u8(d20u8, d5u8, d17u8);
d7u8 = vqrshrn_n_u16(q14u16, 3);
*d3ru8 = vbsl_u8(d20u8, d22u8, d3u8);
*d4ru8 = vbsl_u8(d20u8, d6u8, d4u8);
*d5ru8 = vbsl_u8(d20u8, d7u8, d5u8);
}
return;
}
void aom_lpf_horizontal_8_neon(uint8_t *src, int pitch, const uint8_t *blimit,
const uint8_t *limit, const uint8_t *thresh) {
int i;
uint8_t *s, *psrc;
uint8x8_t dblimit, dlimit, dthresh;
uint8x8_t d0u8, d1u8, d2u8, d3u8, d4u8, d5u8, d6u8, d7u8;
uint8x8_t d16u8, d17u8, d18u8;
dblimit = vld1_u8(blimit);
dlimit = vld1_u8(limit);
dthresh = vld1_u8(thresh);
psrc = src - (pitch << 2);
for (i = 0; i < 1; i++) {
s = psrc + i * 8;
d3u8 = vld1_u8(s);
s += pitch;
d4u8 = vld1_u8(s);
s += pitch;
d5u8 = vld1_u8(s);
s += pitch;
d6u8 = vld1_u8(s);
s += pitch;
d7u8 = vld1_u8(s);
s += pitch;
d16u8 = vld1_u8(s);
s += pitch;
d17u8 = vld1_u8(s);
s += pitch;
d18u8 = vld1_u8(s);
mbloop_filter_neon(dblimit, dlimit, dthresh, d3u8, d4u8, d5u8, d6u8, d7u8,
d16u8, d17u8, d18u8, &d0u8, &d1u8, &d2u8, &d3u8, &d4u8,
&d5u8);
s -= (pitch * 6);
vst1_u8(s, d0u8);
s += pitch;
vst1_u8(s, d1u8);
s += pitch;
vst1_u8(s, d2u8);
s += pitch;
vst1_u8(s, d3u8);
s += pitch;
vst1_u8(s, d4u8);
s += pitch;
vst1_u8(s, d5u8);
}
return;
}
void aom_lpf_vertical_8_neon(uint8_t *src, int pitch, const uint8_t *blimit,
const uint8_t *limit, const uint8_t *thresh) {
int i;
uint8_t *s;
uint8x8_t dblimit, dlimit, dthresh;
uint8x8_t d0u8, d1u8, d2u8, d3u8, d4u8, d5u8, d6u8, d7u8;
uint8x8_t d16u8, d17u8, d18u8;
uint32x2x2_t d2tmp0, d2tmp1, d2tmp2, d2tmp3;
uint16x4x2_t d2tmp4, d2tmp5, d2tmp6, d2tmp7;
uint8x8x2_t d2tmp8, d2tmp9, d2tmp10, d2tmp11;
uint8x8x4_t d4Result;
uint8x8x2_t d2Result;
dblimit = vld1_u8(blimit);
dlimit = vld1_u8(limit);
dthresh = vld1_u8(thresh);
for (i = 0; i < 1; i++) {
s = src + (i * (pitch << 3)) - 4;
d3u8 = vld1_u8(s);
s += pitch;
d4u8 = vld1_u8(s);
s += pitch;
d5u8 = vld1_u8(s);
s += pitch;
d6u8 = vld1_u8(s);
s += pitch;
d7u8 = vld1_u8(s);
s += pitch;
d16u8 = vld1_u8(s);
s += pitch;
d17u8 = vld1_u8(s);
s += pitch;
d18u8 = vld1_u8(s);
d2tmp0 = vtrn_u32(vreinterpret_u32_u8(d3u8), vreinterpret_u32_u8(d7u8));
d2tmp1 = vtrn_u32(vreinterpret_u32_u8(d4u8), vreinterpret_u32_u8(d16u8));
d2tmp2 = vtrn_u32(vreinterpret_u32_u8(d5u8), vreinterpret_u32_u8(d17u8));
d2tmp3 = vtrn_u32(vreinterpret_u32_u8(d6u8), vreinterpret_u32_u8(d18u8));
d2tmp4 = vtrn_u16(vreinterpret_u16_u32(d2tmp0.val[0]),
vreinterpret_u16_u32(d2tmp2.val[0]));
d2tmp5 = vtrn_u16(vreinterpret_u16_u32(d2tmp1.val[0]),
vreinterpret_u16_u32(d2tmp3.val[0]));
d2tmp6 = vtrn_u16(vreinterpret_u16_u32(d2tmp0.val[1]),
vreinterpret_u16_u32(d2tmp2.val[1]));
d2tmp7 = vtrn_u16(vreinterpret_u16_u32(d2tmp1.val[1]),
vreinterpret_u16_u32(d2tmp3.val[1]));
d2tmp8 = vtrn_u8(vreinterpret_u8_u16(d2tmp4.val[0]),
vreinterpret_u8_u16(d2tmp5.val[0]));
d2tmp9 = vtrn_u8(vreinterpret_u8_u16(d2tmp4.val[1]),
vreinterpret_u8_u16(d2tmp5.val[1]));
d2tmp10 = vtrn_u8(vreinterpret_u8_u16(d2tmp6.val[0]),
vreinterpret_u8_u16(d2tmp7.val[0]));
d2tmp11 = vtrn_u8(vreinterpret_u8_u16(d2tmp6.val[1]),
vreinterpret_u8_u16(d2tmp7.val[1]));
d3u8 = d2tmp8.val[0];
d4u8 = d2tmp8.val[1];
d5u8 = d2tmp9.val[0];
d6u8 = d2tmp9.val[1];
d7u8 = d2tmp10.val[0];
d16u8 = d2tmp10.val[1];
d17u8 = d2tmp11.val[0];
d18u8 = d2tmp11.val[1];
mbloop_filter_neon(dblimit, dlimit, dthresh, d3u8, d4u8, d5u8, d6u8, d7u8,
d16u8, d17u8, d18u8, &d0u8, &d1u8, &d2u8, &d3u8, &d4u8,
&d5u8);
d4Result.val[0] = d0u8;
d4Result.val[1] = d1u8;
d4Result.val[2] = d2u8;
d4Result.val[3] = d3u8;
d2Result.val[0] = d4u8;
d2Result.val[1] = d5u8;
s = src - 3;
vst4_lane_u8(s, d4Result, 0);
s += pitch;
vst4_lane_u8(s, d4Result, 1);
s += pitch;
vst4_lane_u8(s, d4Result, 2);
s += pitch;
vst4_lane_u8(s, d4Result, 3);
s += pitch;
vst4_lane_u8(s, d4Result, 4);
s += pitch;
vst4_lane_u8(s, d4Result, 5);
s += pitch;
vst4_lane_u8(s, d4Result, 6);
s += pitch;
vst4_lane_u8(s, d4Result, 7);
s = src + 1;
vst2_lane_u8(s, d2Result, 0);
s += pitch;
vst2_lane_u8(s, d2Result, 1);
s += pitch;
vst2_lane_u8(s, d2Result, 2);
s += pitch;
vst2_lane_u8(s, d2Result, 3);
s += pitch;
vst2_lane_u8(s, d2Result, 4);
s += pitch;
vst2_lane_u8(s, d2Result, 5);
s += pitch;
vst2_lane_u8(s, d2Result, 6);
s += pitch;
vst2_lane_u8(s, d2Result, 7);
}
return;
}
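
In the flat branch of `mbloop_filter_neon` above, the six outputs come from one running 16-bit sum that is updated with two subtractions and two additions per output, then narrowed with a rounding shift by 3. A scalar sketch of that running-sum pattern follows; `filter8_flat_scalar` and the `px` layout (p3, p2, p1, p0, q0, q1, q2, q3) are hypothetical names chosen for illustration.

```c
#include <stdint.h>

/* Rounding shift right by 3, the scalar analogue of vqrshrn_n_u16(x, 3). */
static inline uint8_t round_shift3(int sum) {
  return (uint8_t)((sum + 4) >> 3);
}

/* px = p3, p2, p1, p0, q0, q1, q2, q3; out = op2, op1, op0, oq0, oq1, oq2. */
static inline void filter8_flat_scalar(const uint8_t px[8], uint8_t out[6]) {
  const int p3 = px[0], p2 = px[1], p1 = px[2], p0 = px[3];
  const int q0 = px[4], q1 = px[5], q2 = px[6], q3 = px[7];

  int sum = 3 * p3 + 2 * p2 + p1 + p0 + q0; /* window for op2 */
  out[0] = round_shift3(sum);
  sum += p1 + q1 - p3 - p2; /* slide the window to op1 */
  out[1] = round_shift3(sum);
  sum += p0 + q2 - p3 - p1; /* op0 */
  out[2] = round_shift3(sum);
  sum += q0 + q3 - p3 - p0; /* oq0 */
  out[3] = round_shift3(sum);
  sum += q1 + q3 - p2 - q0; /* oq1 */
  out[4] = round_shift3(sum);
  sum += q2 + q3 - p1 - q1; /* oq2 */
  out[5] = round_shift3(sum);
}
```

Each window always has total weight 8, so a constant input is returned unchanged and a hard step is turned into an even ramp.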


@@ -1,16 +1,19 @@
; ;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license ; This source code is subject to the terms of the BSD 2 Clause License and
; that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. An additional intellectual property rights grant can be found ; was not distributed with this source code in the LICENSE file, you can
; in the file PATENTS. All contributing project authors may ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; be found in the AUTHORS file in the root of the source tree. ; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
; ;
EXPORT |vpx_lpf_horizontal_edge_8_neon| ;
EXPORT |vpx_lpf_horizontal_edge_16_neon|
EXPORT |vpx_lpf_vertical_16_neon| EXPORT |aom_lpf_horizontal_edge_8_neon|
EXPORT |aom_lpf_horizontal_edge_16_neon|
EXPORT |aom_lpf_vertical_16_neon|
ARM ARM
AREA ||.text||, CODE, READONLY, ALIGN=2 AREA ||.text||, CODE, READONLY, ALIGN=2
@@ -55,7 +58,7 @@ h_count
vld1.u8 {d14}, [r8@64], r1 ; q6 vld1.u8 {d14}, [r8@64], r1 ; q6
vld1.u8 {d15}, [r8@64], r1 ; q7 vld1.u8 {d15}, [r8@64], r1 ; q7
bl vpx_wide_mbfilter_neon bl aom_wide_mbfilter_neon
tst r7, #1 tst r7, #1
beq h_mbfilter beq h_mbfilter
@@ -118,7 +121,7 @@ h_next
ENDP ; |mb_lpf_horizontal_edge| ENDP ; |mb_lpf_horizontal_edge|
; void vpx_lpf_horizontal_edge_8_neon(uint8_t *s, int pitch, ; void aom_lpf_horizontal_edge_8_neon(uint8_t *s, int pitch,
; const uint8_t *blimit, ; const uint8_t *blimit,
; const uint8_t *limit, ; const uint8_t *limit,
; const uint8_t *thresh) ; const uint8_t *thresh)
@@ -127,12 +130,12 @@ h_next
; r2 const uint8_t *blimit, ; r2 const uint8_t *blimit,
; r3 const uint8_t *limit, ; r3 const uint8_t *limit,
; sp const uint8_t *thresh ; sp const uint8_t *thresh
|vpx_lpf_horizontal_edge_8_neon| PROC |aom_lpf_horizontal_edge_8_neon| PROC
mov r12, #1 mov r12, #1
b mb_lpf_horizontal_edge b mb_lpf_horizontal_edge
ENDP ; |vpx_lpf_horizontal_edge_8_neon| ENDP ; |aom_lpf_horizontal_edge_8_neon|
; void vpx_lpf_horizontal_edge_16_neon(uint8_t *s, int pitch, ; void aom_lpf_horizontal_edge_16_neon(uint8_t *s, int pitch,
; const uint8_t *blimit, ; const uint8_t *blimit,
; const uint8_t *limit, ; const uint8_t *limit,
; const uint8_t *thresh) ; const uint8_t *thresh)
@@ -141,12 +144,12 @@ h_next
; r2 const uint8_t *blimit, ; r2 const uint8_t *blimit,
; r3 const uint8_t *limit, ; r3 const uint8_t *limit,
; sp const uint8_t *thresh ; sp const uint8_t *thresh
|vpx_lpf_horizontal_edge_16_neon| PROC |aom_lpf_horizontal_edge_16_neon| PROC
mov r12, #2 mov r12, #2
b mb_lpf_horizontal_edge b mb_lpf_horizontal_edge
ENDP ; |vpx_lpf_horizontal_edge_16_neon| ENDP ; |aom_lpf_horizontal_edge_16_neon|
; void vpx_lpf_vertical_16_neon(uint8_t *s, int p, ; void aom_lpf_vertical_16_neon(uint8_t *s, int p,
; const uint8_t *blimit, ; const uint8_t *blimit,
; const uint8_t *limit, ; const uint8_t *limit,
; const uint8_t *thresh) ; const uint8_t *thresh)
@@ -155,7 +158,7 @@ h_next
; r2 const uint8_t *blimit, ; r2 const uint8_t *blimit,
; r3 const uint8_t *limit, ; r3 const uint8_t *limit,
; sp const uint8_t *thresh, ; sp const uint8_t *thresh,
|vpx_lpf_vertical_16_neon| PROC |aom_lpf_vertical_16_neon| PROC
push {r4-r8, lr} push {r4-r8, lr}
vpush {d8-d15} vpush {d8-d15}
ldr r4, [sp, #88] ; load thresh ldr r4, [sp, #88] ; load thresh
@@ -205,7 +208,7 @@ h_next
vtrn.8 d12, d13 vtrn.8 d12, d13
vtrn.8 d14, d15 vtrn.8 d14, d15
bl vpx_wide_mbfilter_neon bl aom_wide_mbfilter_neon
tst r7, #1 tst r7, #1
beq v_mbfilter beq v_mbfilter
@@ -308,9 +311,9 @@ v_end
vpop {d8-d15} vpop {d8-d15}
pop {r4-r8, pc} pop {r4-r8, pc}
ENDP ; |vpx_lpf_vertical_16_neon| ENDP ; |aom_lpf_vertical_16_neon|
; void vpx_wide_mbfilter_neon(); ; void aom_wide_mbfilter_neon();
; This is a helper function for the loopfilters. The individual functions do the ; This is a helper function for the loopfilters. The individual functions do the
; necessary load, transpose (if necessary) and store. ; necessary load, transpose (if necessary) and store.
; ;
@@ -334,7 +337,7 @@ v_end
; d13 q5 ; d13 q5
; d14 q6 ; d14 q6
; d15 q7 ; d15 q7
|vpx_wide_mbfilter_neon| PROC |aom_wide_mbfilter_neon| PROC
mov r7, #0 mov r7, #0
; filter_mask ; filter_mask
@@ -630,6 +633,6 @@ v_end
vbif d3, d14, d17 ; oq6 |= q6 & ~(f2 & f & m) vbif d3, d14, d17 ; oq6 |= q6 & ~(f2 & f & m)
bx lr bx lr
ENDP ; |vpx_wide_mbfilter_neon| ENDP ; |aom_wide_mbfilter_neon|
END END


@@ -0,0 +1,49 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_dsp_rtcd.h"
#include "./aom_config.h"
#include "aom/aom_integer.h"
void aom_lpf_vertical_4_dual_neon(uint8_t *s, int p, const uint8_t *blimit0,
const uint8_t *limit0, const uint8_t *thresh0,
const uint8_t *blimit1, const uint8_t *limit1,
const uint8_t *thresh1) {
aom_lpf_vertical_4_neon(s, p, blimit0, limit0, thresh0);
aom_lpf_vertical_4_neon(s + 8 * p, p, blimit1, limit1, thresh1);
}
#if HAVE_NEON_ASM
void aom_lpf_horizontal_8_dual_neon(
uint8_t *s, int p /* pitch */, const uint8_t *blimit0,
const uint8_t *limit0, const uint8_t *thresh0, const uint8_t *blimit1,
const uint8_t *limit1, const uint8_t *thresh1) {
aom_lpf_horizontal_8_neon(s, p, blimit0, limit0, thresh0);
aom_lpf_horizontal_8_neon(s + 8, p, blimit1, limit1, thresh1);
}
void aom_lpf_vertical_8_dual_neon(uint8_t *s, int p, const uint8_t *blimit0,
const uint8_t *limit0, const uint8_t *thresh0,
const uint8_t *blimit1, const uint8_t *limit1,
const uint8_t *thresh1) {
aom_lpf_vertical_8_neon(s, p, blimit0, limit0, thresh0);
aom_lpf_vertical_8_neon(s + 8 * p, p, blimit1, limit1, thresh1);
}
void aom_lpf_vertical_16_dual_neon(uint8_t *s, int p, const uint8_t *blimit,
const uint8_t *limit,
const uint8_t *thresh) {
aom_lpf_vertical_16_neon(s, p, blimit, limit, thresh);
aom_lpf_vertical_16_neon(s + 8 * p, p, blimit, limit, thresh);
}
#endif // HAVE_NEON_ASM


@@ -1,25 +1,26 @@
/* /*
* Copyright (c) 2015 The WebM project authors. All Rights Reserved. * Copyright (c) 2016, Alliance for Open Media. All rights reserved
* *
* Use of this source code is governed by a BSD-style license * This source code is subject to the terms of the BSD 2 Clause License and
* that can be found in the LICENSE file in the root of the source * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* tree. An additional intellectual property rights grant can be found * was not distributed with this source code in the LICENSE file, you can
* in the file PATENTS. All contributing project authors may * obtain it at www.aomedia.org/license/software. If the Alliance for Open
* be found in the AUTHORS file in the root of the source tree. * Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/ */
#include <arm_neon.h> #include <arm_neon.h>
#include "./vpx_config.h" #include "./aom_config.h"
#include "./vpx_dsp_rtcd.h" #include "./aom_dsp_rtcd.h"
#include "vpx/vpx_integer.h" #include "aom/aom_integer.h"
static INLINE unsigned int horizontal_long_add_16x8(const uint16x8_t vec_lo, static INLINE unsigned int horizontal_long_add_16x8(const uint16x8_t vec_lo,
const uint16x8_t vec_hi) { const uint16x8_t vec_hi) {
const uint32x4_t vec_l_lo = vaddl_u16(vget_low_u16(vec_lo), const uint32x4_t vec_l_lo =
vget_high_u16(vec_lo)); vaddl_u16(vget_low_u16(vec_lo), vget_high_u16(vec_lo));
const uint32x4_t vec_l_hi = vaddl_u16(vget_low_u16(vec_hi), const uint32x4_t vec_l_hi =
vget_high_u16(vec_hi)); vaddl_u16(vget_low_u16(vec_hi), vget_high_u16(vec_hi));
const uint32x4_t a = vaddq_u32(vec_l_lo, vec_l_hi); const uint32x4_t a = vaddq_u32(vec_l_lo, vec_l_hi);
const uint64x2_t b = vpaddlq_u32(a); const uint64x2_t b = vpaddlq_u32(a);
const uint32x2_t c = vadd_u32(vreinterpret_u32_u64(vget_low_u64(b)), const uint32x2_t c = vadd_u32(vreinterpret_u32_u64(vget_low_u64(b)),
@@ -33,8 +34,7 @@ static INLINE unsigned int horizontal_long_add_16x8(const uint16x8_t vec_lo,
static void sad_neon_64(const uint8x16_t vec_src_00, static void sad_neon_64(const uint8x16_t vec_src_00,
const uint8x16_t vec_src_16, const uint8x16_t vec_src_16,
const uint8x16_t vec_src_32, const uint8x16_t vec_src_32,
const uint8x16_t vec_src_48, const uint8x16_t vec_src_48, const uint8_t *ref,
const uint8_t *ref,
uint16x8_t *vec_sum_ref_lo, uint16x8_t *vec_sum_ref_lo,
uint16x8_t *vec_sum_ref_hi) { uint16x8_t *vec_sum_ref_hi) {
const uint8x16_t vec_ref_00 = vld1q_u8(ref); const uint8x16_t vec_ref_00 = vld1q_u8(ref);
@@ -63,8 +63,7 @@ static void sad_neon_64(const uint8x16_t vec_src_00,
// Calculate the absolute difference of 32 bytes from vec_src_00, vec_src_16, // Calculate the absolute difference of 32 bytes from vec_src_00, vec_src_16,
// and ref. Accumulate partial sums in vec_sum_ref_lo and vec_sum_ref_hi. // and ref. Accumulate partial sums in vec_sum_ref_lo and vec_sum_ref_hi.
static void sad_neon_32(const uint8x16_t vec_src_00, static void sad_neon_32(const uint8x16_t vec_src_00,
const uint8x16_t vec_src_16, const uint8x16_t vec_src_16, const uint8_t *ref,
const uint8_t *ref,
uint16x8_t *vec_sum_ref_lo, uint16x8_t *vec_sum_ref_lo,
uint16x8_t *vec_sum_ref_hi) { uint16x8_t *vec_sum_ref_hi) {
const uint8x16_t vec_ref_00 = vld1q_u8(ref); const uint8x16_t vec_ref_00 = vld1q_u8(ref);
@@ -80,8 +79,8 @@ static void sad_neon_32(const uint8x16_t vec_src_00,
vget_high_u8(vec_ref_16)); vget_high_u8(vec_ref_16));
} }
void vpx_sad64x64x4d_neon(const uint8_t *src, int src_stride, void aom_sad64x64x4d_neon(const uint8_t *src, int src_stride,
const uint8_t* const ref[4], int ref_stride, const uint8_t *const ref[4], int ref_stride,
uint32_t *res) { uint32_t *res) {
int i; int i;
uint16x8_t vec_sum_ref0_lo = vdupq_n_u16(0); uint16x8_t vec_sum_ref0_lo = vdupq_n_u16(0);
@@ -126,8 +125,8 @@ void vpx_sad64x64x4d_neon(const uint8_t *src, int src_stride,
res[3] = horizontal_long_add_16x8(vec_sum_ref3_lo, vec_sum_ref3_hi); res[3] = horizontal_long_add_16x8(vec_sum_ref3_lo, vec_sum_ref3_hi);
} }
void vpx_sad32x32x4d_neon(const uint8_t *src, int src_stride, void aom_sad32x32x4d_neon(const uint8_t *src, int src_stride,
const uint8_t* const ref[4], int ref_stride, const uint8_t *const ref[4], int ref_stride,
uint32_t *res) { uint32_t *res) {
int i; int i;
uint16x8_t vec_sum_ref0_lo = vdupq_n_u16(0); uint16x8_t vec_sum_ref0_lo = vdupq_n_u16(0);
@@ -148,14 +147,14 @@ void vpx_sad32x32x4d_neon(const uint8_t *src, int src_stride,
const uint8x16_t vec_src_00 = vld1q_u8(src); const uint8x16_t vec_src_00 = vld1q_u8(src);
const uint8x16_t vec_src_16 = vld1q_u8(src + 16); const uint8x16_t vec_src_16 = vld1q_u8(src + 16);
sad_neon_32(vec_src_00, vec_src_16, ref0, sad_neon_32(vec_src_00, vec_src_16, ref0, &vec_sum_ref0_lo,
&vec_sum_ref0_lo, &vec_sum_ref0_hi); &vec_sum_ref0_hi);
sad_neon_32(vec_src_00, vec_src_16, ref1, sad_neon_32(vec_src_00, vec_src_16, ref1, &vec_sum_ref1_lo,
&vec_sum_ref1_lo, &vec_sum_ref1_hi); &vec_sum_ref1_hi);
sad_neon_32(vec_src_00, vec_src_16, ref2, sad_neon_32(vec_src_00, vec_src_16, ref2, &vec_sum_ref2_lo,
&vec_sum_ref2_lo, &vec_sum_ref2_hi); &vec_sum_ref2_hi);
sad_neon_32(vec_src_00, vec_src_16, ref3, sad_neon_32(vec_src_00, vec_src_16, ref3, &vec_sum_ref3_lo,
&vec_sum_ref3_lo, &vec_sum_ref3_hi); &vec_sum_ref3_hi);
src += src_stride; src += src_stride;
ref0 += ref_stride; ref0 += ref_stride;
@@ -170,8 +169,8 @@ void vpx_sad32x32x4d_neon(const uint8_t *src, int src_stride,
res[3] = horizontal_long_add_16x8(vec_sum_ref3_lo, vec_sum_ref3_hi); res[3] = horizontal_long_add_16x8(vec_sum_ref3_lo, vec_sum_ref3_hi);
} }
void vpx_sad16x16x4d_neon(const uint8_t *src, int src_stride, void aom_sad16x16x4d_neon(const uint8_t *src, int src_stride,
const uint8_t* const ref[4], int ref_stride, const uint8_t *const ref[4], int ref_stride,
uint32_t *res) { uint32_t *res) {
int i; int i;
uint16x8_t vec_sum_ref0_lo = vdupq_n_u16(0); uint16x8_t vec_sum_ref0_lo = vdupq_n_u16(0);
@@ -195,20 +194,20 @@ void vpx_sad16x16x4d_neon(const uint8_t *src, int src_stride,
const uint8x16_t vec_ref2 = vld1q_u8(ref2); const uint8x16_t vec_ref2 = vld1q_u8(ref2);
const uint8x16_t vec_ref3 = vld1q_u8(ref3); const uint8x16_t vec_ref3 = vld1q_u8(ref3);
vec_sum_ref0_lo = vabal_u8(vec_sum_ref0_lo, vget_low_u8(vec_src), vec_sum_ref0_lo =
vget_low_u8(vec_ref0)); vabal_u8(vec_sum_ref0_lo, vget_low_u8(vec_src), vget_low_u8(vec_ref0));
vec_sum_ref0_hi = vabal_u8(vec_sum_ref0_hi, vget_high_u8(vec_src), vec_sum_ref0_hi = vabal_u8(vec_sum_ref0_hi, vget_high_u8(vec_src),
vget_high_u8(vec_ref0)); vget_high_u8(vec_ref0));
vec_sum_ref1_lo = vabal_u8(vec_sum_ref1_lo, vget_low_u8(vec_src), vec_sum_ref1_lo =
vget_low_u8(vec_ref1)); vabal_u8(vec_sum_ref1_lo, vget_low_u8(vec_src), vget_low_u8(vec_ref1));
vec_sum_ref1_hi = vabal_u8(vec_sum_ref1_hi, vget_high_u8(vec_src), vec_sum_ref1_hi = vabal_u8(vec_sum_ref1_hi, vget_high_u8(vec_src),
vget_high_u8(vec_ref1)); vget_high_u8(vec_ref1));
vec_sum_ref2_lo = vabal_u8(vec_sum_ref2_lo, vget_low_u8(vec_src), vec_sum_ref2_lo =
vget_low_u8(vec_ref2)); vabal_u8(vec_sum_ref2_lo, vget_low_u8(vec_src), vget_low_u8(vec_ref2));
vec_sum_ref2_hi = vabal_u8(vec_sum_ref2_hi, vget_high_u8(vec_src), vec_sum_ref2_hi = vabal_u8(vec_sum_ref2_hi, vget_high_u8(vec_src),
vget_high_u8(vec_ref2)); vget_high_u8(vec_ref2));
vec_sum_ref3_lo = vabal_u8(vec_sum_ref3_lo, vget_low_u8(vec_src), vec_sum_ref3_lo =
vget_low_u8(vec_ref3)); vabal_u8(vec_sum_ref3_lo, vget_low_u8(vec_src), vget_low_u8(vec_ref3));
vec_sum_ref3_hi = vabal_u8(vec_sum_ref3_hi, vget_high_u8(vec_src), vec_sum_ref3_hi = vabal_u8(vec_sum_ref3_hi, vget_high_u8(vec_src),
vget_high_u8(vec_ref3)); vget_high_u8(vec_ref3));


@@ -1,15 +1,18 @@
; ;
; Copyright (c) 2011 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license ; This source code is subject to the terms of the BSD 2 Clause License and
; that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. An additional intellectual property rights grant can be found ; was not distributed with this source code in the LICENSE file, you can
; in the file PATENTS. All contributing project authors may ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; be found in the AUTHORS file in the root of the source tree. ; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
; ;
EXPORT |vpx_sad16x16_media| EXPORT |aom_sad16x16_media|
ARM ARM
REQUIRE8 REQUIRE8
@@ -21,7 +24,7 @@
; r1 int src_stride ; r1 int src_stride
; r2 const unsigned char *ref_ptr ; r2 const unsigned char *ref_ptr
; r3 int ref_stride ; r3 int ref_stride
|vpx_sad16x16_media| PROC |aom_sad16x16_media| PROC
stmfd sp!, {r4-r12, lr} stmfd sp!, {r4-r12, lr}
pld [r0, r1, lsl #0] pld [r0, r1, lsl #0]


@@ -1,127 +1,119 @@
/* /*
* Copyright (c) 2014 The WebM project authors. All Rights Reserved. * Copyright (c) 2016, Alliance for Open Media. All rights reserved
* *
* Use of this source code is governed by a BSD-style license * This source code is subject to the terms of the BSD 2 Clause License and
* that can be found in the LICENSE file in the root of the source * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* tree. An additional intellectual property rights grant can be found * was not distributed with this source code in the LICENSE file, you can
* in the file PATENTS. All contributing project authors may * obtain it at www.aomedia.org/license/software. If the Alliance for Open
* be found in the AUTHORS file in the root of the source tree. * Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/ */
#include <arm_neon.h> #include <arm_neon.h>
#include "./vpx_config.h" #include "./aom_config.h"
#include "vpx/vpx_integer.h" #include "aom/aom_integer.h"
unsigned int vpx_sad8x16_neon( unsigned int aom_sad8x16_neon(unsigned char *src_ptr, int src_stride,
unsigned char *src_ptr, unsigned char *ref_ptr, int ref_stride) {
int src_stride, uint8x8_t d0, d8;
unsigned char *ref_ptr, uint16x8_t q12;
int ref_stride) { uint32x4_t q1;
uint8x8_t d0, d8; uint64x2_t q3;
uint16x8_t q12; uint32x2_t d5;
uint32x4_t q1; int i;
uint64x2_t q3;
uint32x2_t d5;
int i;
d0 = vld1_u8(src_ptr);
src_ptr += src_stride;
d8 = vld1_u8(ref_ptr);
ref_ptr += ref_stride;
q12 = vabdl_u8(d0, d8);
for (i = 0; i < 15; i++) {
d0 = vld1_u8(src_ptr); d0 = vld1_u8(src_ptr);
src_ptr += src_stride; src_ptr += src_stride;
d8 = vld1_u8(ref_ptr); d8 = vld1_u8(ref_ptr);
ref_ptr += ref_stride; ref_ptr += ref_stride;
q12 = vabdl_u8(d0, d8); q12 = vabal_u8(q12, d0, d8);
}
for (i = 0; i < 15; i++) { q1 = vpaddlq_u16(q12);
d0 = vld1_u8(src_ptr); q3 = vpaddlq_u32(q1);
src_ptr += src_stride; d5 = vadd_u32(vreinterpret_u32_u64(vget_low_u64(q3)),
d8 = vld1_u8(ref_ptr); vreinterpret_u32_u64(vget_high_u64(q3)));
ref_ptr += ref_stride;
q12 = vabal_u8(q12, d0, d8);
}
q1 = vpaddlq_u16(q12); return vget_lane_u32(d5, 0);
q3 = vpaddlq_u32(q1);
d5 = vadd_u32(vreinterpret_u32_u64(vget_low_u64(q3)),
vreinterpret_u32_u64(vget_high_u64(q3)));
return vget_lane_u32(d5, 0);
} }
unsigned int vpx_sad4x4_neon( unsigned int aom_sad4x4_neon(unsigned char *src_ptr, int src_stride,
unsigned char *src_ptr, unsigned char *ref_ptr, int ref_stride) {
int src_stride, uint8x8_t d0, d8;
unsigned char *ref_ptr, uint16x8_t q12;
int ref_stride) { uint32x2_t d1;
uint8x8_t d0, d8; uint64x1_t d3;
uint16x8_t q12; int i;
uint32x2_t d1;
uint64x1_t d3;
int i;
d0 = vld1_u8(src_ptr);
src_ptr += src_stride;
d8 = vld1_u8(ref_ptr);
ref_ptr += ref_stride;
q12 = vabdl_u8(d0, d8);
for (i = 0; i < 3; i++) {
d0 = vld1_u8(src_ptr); d0 = vld1_u8(src_ptr);
src_ptr += src_stride; src_ptr += src_stride;
d8 = vld1_u8(ref_ptr); d8 = vld1_u8(ref_ptr);
ref_ptr += ref_stride; ref_ptr += ref_stride;
q12 = vabdl_u8(d0, d8); q12 = vabal_u8(q12, d0, d8);
}
for (i = 0; i < 3; i++) { d1 = vpaddl_u16(vget_low_u16(q12));
d0 = vld1_u8(src_ptr); d3 = vpaddl_u32(d1);
src_ptr += src_stride;
d8 = vld1_u8(ref_ptr);
ref_ptr += ref_stride;
q12 = vabal_u8(q12, d0, d8);
}
d1 = vpaddl_u16(vget_low_u16(q12)); return vget_lane_u32(vreinterpret_u32_u64(d3), 0);
d3 = vpaddl_u32(d1);
return vget_lane_u32(vreinterpret_u32_u64(d3), 0);
} }
unsigned int vpx_sad16x8_neon( unsigned int aom_sad16x8_neon(unsigned char *src_ptr, int src_stride,
unsigned char *src_ptr, unsigned char *ref_ptr, int ref_stride) {
int src_stride, uint8x16_t q0, q4;
unsigned char *ref_ptr, uint16x8_t q12, q13;
int ref_stride) { uint32x4_t q1;
uint8x16_t q0, q4; uint64x2_t q3;
uint16x8_t q12, q13; uint32x2_t d5;
uint32x4_t q1; int i;
uint64x2_t q3;
uint32x2_t d5;
int i;
q0 = vld1q_u8(src_ptr);
src_ptr += src_stride;
q4 = vld1q_u8(ref_ptr);
ref_ptr += ref_stride;
q12 = vabdl_u8(vget_low_u8(q0), vget_low_u8(q4));
q13 = vabdl_u8(vget_high_u8(q0), vget_high_u8(q4));
for (i = 0; i < 7; i++) {
q0 = vld1q_u8(src_ptr); q0 = vld1q_u8(src_ptr);
src_ptr += src_stride; src_ptr += src_stride;
q4 = vld1q_u8(ref_ptr); q4 = vld1q_u8(ref_ptr);
ref_ptr += ref_stride; ref_ptr += ref_stride;
q12 = vabdl_u8(vget_low_u8(q0), vget_low_u8(q4)); q12 = vabal_u8(q12, vget_low_u8(q0), vget_low_u8(q4));
q13 = vabdl_u8(vget_high_u8(q0), vget_high_u8(q4)); q13 = vabal_u8(q13, vget_high_u8(q0), vget_high_u8(q4));
}
for (i = 0; i < 7; i++) { q12 = vaddq_u16(q12, q13);
q0 = vld1q_u8(src_ptr); q1 = vpaddlq_u16(q12);
src_ptr += src_stride; q3 = vpaddlq_u32(q1);
q4 = vld1q_u8(ref_ptr); d5 = vadd_u32(vreinterpret_u32_u64(vget_low_u64(q3)),
ref_ptr += ref_stride; vreinterpret_u32_u64(vget_high_u64(q3)));
q12 = vabal_u8(q12, vget_low_u8(q0), vget_low_u8(q4));
q13 = vabal_u8(q13, vget_high_u8(q0), vget_high_u8(q4));
}
q12 = vaddq_u16(q12, q13); return vget_lane_u32(d5, 0);
q1 = vpaddlq_u16(q12);
q3 = vpaddlq_u32(q1);
d5 = vadd_u32(vreinterpret_u32_u64(vget_low_u64(q3)),
vreinterpret_u32_u64(vget_high_u64(q3)));
return vget_lane_u32(d5, 0);
} }
static INLINE unsigned int horizontal_long_add_16x8(const uint16x8_t vec_lo, static INLINE unsigned int horizontal_long_add_16x8(const uint16x8_t vec_lo,
const uint16x8_t vec_hi) { const uint16x8_t vec_hi) {
const uint32x4_t vec_l_lo = vaddl_u16(vget_low_u16(vec_lo), const uint32x4_t vec_l_lo =
vget_high_u16(vec_lo)); vaddl_u16(vget_low_u16(vec_lo), vget_high_u16(vec_lo));
const uint32x4_t vec_l_hi = vaddl_u16(vget_low_u16(vec_hi), const uint32x4_t vec_l_hi =
vget_high_u16(vec_hi)); vaddl_u16(vget_low_u16(vec_hi), vget_high_u16(vec_hi));
const uint32x4_t a = vaddq_u32(vec_l_lo, vec_l_hi); const uint32x4_t a = vaddq_u32(vec_l_lo, vec_l_hi);
const uint64x2_t b = vpaddlq_u32(a); const uint64x2_t b = vpaddlq_u32(a);
const uint32x2_t c = vadd_u32(vreinterpret_u32_u64(vget_low_u64(b)), const uint32x2_t c = vadd_u32(vreinterpret_u32_u64(vget_low_u64(b)),
@@ -136,7 +128,7 @@ static INLINE unsigned int horizontal_add_16x8(const uint16x8_t vec_16x8) {
return vget_lane_u32(c, 0); return vget_lane_u32(c, 0);
} }
unsigned int vpx_sad64x64_neon(const uint8_t *src, int src_stride, unsigned int aom_sad64x64_neon(const uint8_t *src, int src_stride,
const uint8_t *ref, int ref_stride) { const uint8_t *ref, int ref_stride) {
int i; int i;
uint16x8_t vec_accum_lo = vdupq_n_u16(0); uint16x8_t vec_accum_lo = vdupq_n_u16(0);
@@ -172,7 +164,7 @@ unsigned int vpx_sad64x64_neon(const uint8_t *src, int src_stride,
return horizontal_long_add_16x8(vec_accum_lo, vec_accum_hi); return horizontal_long_add_16x8(vec_accum_lo, vec_accum_hi);
} }
unsigned int vpx_sad32x32_neon(const uint8_t *src, int src_stride, unsigned int aom_sad32x32_neon(const uint8_t *src, int src_stride,
const uint8_t *ref, int ref_stride) { const uint8_t *ref, int ref_stride) {
int i; int i;
uint16x8_t vec_accum_lo = vdupq_n_u16(0); uint16x8_t vec_accum_lo = vdupq_n_u16(0);
@@ -197,7 +189,7 @@ unsigned int vpx_sad32x32_neon(const uint8_t *src, int src_stride,
return horizontal_add_16x8(vaddq_u16(vec_accum_lo, vec_accum_hi)); return horizontal_add_16x8(vaddq_u16(vec_accum_lo, vec_accum_hi));
} }
unsigned int vpx_sad16x16_neon(const uint8_t *src, int src_stride, unsigned int aom_sad16x16_neon(const uint8_t *src, int src_stride,
const uint8_t *ref, int ref_stride) { const uint8_t *ref, int ref_stride) {
int i; int i;
uint16x8_t vec_accum_lo = vdupq_n_u16(0); uint16x8_t vec_accum_lo = vdupq_n_u16(0);
@@ -208,15 +200,15 @@ unsigned int vpx_sad16x16_neon(const uint8_t *src, int src_stride,
const uint8x16_t vec_ref = vld1q_u8(ref); const uint8x16_t vec_ref = vld1q_u8(ref);
src += src_stride; src += src_stride;
ref += ref_stride; ref += ref_stride;
vec_accum_lo = vabal_u8(vec_accum_lo, vget_low_u8(vec_src), vec_accum_lo =
vget_low_u8(vec_ref)); vabal_u8(vec_accum_lo, vget_low_u8(vec_src), vget_low_u8(vec_ref));
vec_accum_hi = vabal_u8(vec_accum_hi, vget_high_u8(vec_src), vec_accum_hi =
vget_high_u8(vec_ref)); vabal_u8(vec_accum_hi, vget_high_u8(vec_src), vget_high_u8(vec_ref));
} }
return horizontal_add_16x8(vaddq_u16(vec_accum_lo, vec_accum_hi)); return horizontal_add_16x8(vaddq_u16(vec_accum_lo, vec_accum_hi));
} }
unsigned int vpx_sad8x8_neon(const uint8_t *src, int src_stride, unsigned int aom_sad8x8_neon(const uint8_t *src, int src_stride,
const uint8_t *ref, int ref_stride) { const uint8_t *ref, int ref_stride) {
int i; int i;
uint16x8_t vec_accum = vdupq_n_u16(0); uint16x8_t vec_accum = vdupq_n_u16(0);
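
For reference, the NEON kernels in this file all compute a sum of absolute differences: they accumulate per-row absolute differences into 16-bit lanes (`vabdl_u8` / `vabal_u8`) and then reduce the lanes horizontally. A minimal scalar sketch of the same quantity, which may be handy when checking a SIMD variant, could look like the following. The name `sad_ref_c` and its `width` / `height` parameters are hypothetical and do not appear in the diff.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference (not part of the diff): sum of absolute
 * differences over a width x height block. */
static unsigned int sad_ref_c(const uint8_t *src, int src_stride,
                              const uint8_t *ref, int ref_stride, int width,
                              int height) {
  unsigned int sad = 0;
  int r, c;
  for (r = 0; r < height; ++r) {
    /* uint8_t operands promote to int, so the difference cannot wrap. */
    for (c = 0; c < width; ++c) sad += (unsigned int)abs(src[c] - ref[c]);
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}
```

Calling it with `width = height = 16` should, under these assumptions, give the value that `aom_sad16x16_neon` is expected to return for the same buffers.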


@@ -0,0 +1,39 @@
;
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
;
EXPORT |aom_push_neon|
EXPORT |aom_pop_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
|aom_push_neon| PROC
vst1.i64 {d8, d9, d10, d11}, [r0]!
vst1.i64 {d12, d13, d14, d15}, [r0]!
bx lr
ENDP
|aom_pop_neon| PROC
vld1.i64 {d8, d9, d10, d11}, [r0]!
vld1.i64 {d12, d13, d14, d15}, [r0]!
bx lr
ENDP
END


@@ -0,0 +1,81 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include "./aom_config.h"
#include "./aom_dsp_rtcd.h"
#include "aom/aom_integer.h"
#include "aom_ports/mem.h"
#if HAVE_MEDIA
static const int16_t bilinear_filters_media[8][2] = { { 128, 0 }, { 112, 16 },
{ 96, 32 }, { 80, 48 },
{ 64, 64 }, { 48, 80 },
{ 32, 96 }, { 16, 112 } };
extern void aom_filter_block2d_bil_first_pass_media(
const uint8_t *src_ptr, uint16_t *dst_ptr, uint32_t src_pitch,
uint32_t height, uint32_t width, const int16_t *filter);
extern void aom_filter_block2d_bil_second_pass_media(
const uint16_t *src_ptr, uint8_t *dst_ptr, int32_t src_pitch,
uint32_t height, uint32_t width, const int16_t *filter);
unsigned int aom_sub_pixel_variance8x8_media(
const uint8_t *src_ptr, int src_pixels_per_line, int xoffset, int yoffset,
const uint8_t *dst_ptr, int dst_pixels_per_line, unsigned int *sse) {
uint16_t first_pass[10 * 8];
uint8_t second_pass[8 * 8];
const int16_t *HFilter, *VFilter;
HFilter = bilinear_filters_media[xoffset];
VFilter = bilinear_filters_media[yoffset];
aom_filter_block2d_bil_first_pass_media(src_ptr, first_pass,
src_pixels_per_line, 9, 8, HFilter);
aom_filter_block2d_bil_second_pass_media(first_pass, second_pass, 8, 8, 8,
VFilter);
return aom_variance8x8_media(second_pass, 8, dst_ptr, dst_pixels_per_line,
sse);
}
unsigned int aom_sub_pixel_variance16x16_media(
const uint8_t *src_ptr, int src_pixels_per_line, int xoffset, int yoffset,
const uint8_t *dst_ptr, int dst_pixels_per_line, unsigned int *sse) {
uint16_t first_pass[36 * 16];
uint8_t second_pass[20 * 16];
const int16_t *HFilter, *VFilter;
unsigned int var;
if (xoffset == 4 && yoffset == 0) {
var = aom_variance_halfpixvar16x16_h_media(
src_ptr, src_pixels_per_line, dst_ptr, dst_pixels_per_line, sse);
} else if (xoffset == 0 && yoffset == 4) {
var = aom_variance_halfpixvar16x16_v_media(
src_ptr, src_pixels_per_line, dst_ptr, dst_pixels_per_line, sse);
} else if (xoffset == 4 && yoffset == 4) {
var = aom_variance_halfpixvar16x16_hv_media(
src_ptr, src_pixels_per_line, dst_ptr, dst_pixels_per_line, sse);
} else {
HFilter = bilinear_filters_media[xoffset];
VFilter = bilinear_filters_media[yoffset];
aom_filter_block2d_bil_first_pass_media(
src_ptr, first_pass, src_pixels_per_line, 17, 16, HFilter);
aom_filter_block2d_bil_second_pass_media(first_pass, second_pass, 16, 16,
16, VFilter);
var = aom_variance16x16_media(second_pass, 16, dst_ptr, dst_pixels_per_line,
sse);
}
return var;
}
#endif // HAVE_MEDIA
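
The functions above apply a two-pass bilinear filter and then hand the filtered block to a plain variance routine: the horizontal pass produces one extra row (9 rows for an 8x8 block) so the vertical pass can read the row below each output. A minimal scalar sketch of that flow for 8x8 follows. The names `kBilinearTaps`, `bilinear_tap` and `sub_pixel_variance8x8_ref` are hypothetical. The round-to-nearest 7-bit shift is an assumption based on each tap pair summing to 128, and the 8-bit intermediate buffer is a simplification (the media version uses a `uint16_t` first-pass buffer).

```c
#include <stdint.h>

/* Same tap pairs as the tables in the diff; each pair sums to 128 (1 << 7). */
static const uint8_t kBilinearTaps[8][2] = {
  { 128, 0 }, { 112, 16 }, { 96, 32 }, { 80, 48 },
  { 64, 64 }, { 48, 80 },  { 32, 96 }, { 16, 112 },
};

/* Two-tap filter; the +64 rounding term is an assumption, not from the diff. */
static uint8_t bilinear_tap(uint8_t a, uint8_t b, const uint8_t *taps) {
  return (uint8_t)((a * taps[0] + b * taps[1] + 64) >> 7);
}

/* Hypothetical scalar reference for an 8x8 block. Reads 9 rows and 9 columns
 * of src, even when an offset is 0 (the second tap is then multiplied by 0). */
static unsigned int sub_pixel_variance8x8_ref(const uint8_t *src,
                                              int src_stride, int xoffset,
                                              int yoffset, const uint8_t *dst,
                                              int dst_stride,
                                              unsigned int *sse) {
  uint8_t first_pass[9 * 8];
  const uint8_t *hf = kBilinearTaps[xoffset];
  const uint8_t *vf = kBilinearTaps[yoffset];
  int sum = 0;
  unsigned int sq = 0;
  int r, c;

  /* Pass 1: horizontal filter over 9 rows. */
  for (r = 0; r < 9; ++r)
    for (c = 0; c < 8; ++c)
      first_pass[r * 8 + c] =
          bilinear_tap(src[r * src_stride + c], src[r * src_stride + c + 1], hf);

  /* Pass 2: vertical filter, then accumulate the difference against dst. */
  for (r = 0; r < 8; ++r) {
    for (c = 0; c < 8; ++c) {
      const int p =
          bilinear_tap(first_pass[r * 8 + c], first_pass[(r + 1) * 8 + c], vf);
      const int d = p - dst[r * dst_stride + c];
      sum += d;
      sq += (unsigned int)(d * d);
    }
  }

  *sse = sq;
  /* variance * N = sse - sum^2 / N, with N = 64 pixels. */
  return sq - (unsigned int)(((int64_t)sum * sum) / 64);
}
```

Offsets index the tap table in eighths of a pixel, so `xoffset == 4` selects the `{ 64, 64 }` half-pixel pair that the 16x16 media function special-cases.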


@@ -1,31 +1,26 @@
/* /*
* Copyright (c) 2014 The WebM project authors. All Rights Reserved. * Copyright (c) 2016, Alliance for Open Media. All rights reserved
* *
* Use of this source code is governed by a BSD-style license * This source code is subject to the terms of the BSD 2 Clause License and
* that can be found in the LICENSE file in the root of the source * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* tree. An additional intellectual property rights grant can be found * was not distributed with this source code in the LICENSE file, you can
* in the file PATENTS. All contributing project authors may * obtain it at www.aomedia.org/license/software. If the Alliance for Open
* be found in the AUTHORS file in the root of the source tree. * Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/ */
#include <arm_neon.h> #include <arm_neon.h>
#include "./vpx_dsp_rtcd.h" #include "./aom_dsp_rtcd.h"
#include "./vpx_config.h" #include "./aom_config.h"
#include "vpx_ports/mem.h" #include "aom_ports/mem.h"
#include "vpx/vpx_integer.h" #include "aom/aom_integer.h"
#include "vpx_dsp/variance.h" #include "aom_dsp/variance.h"
static const uint8_t bilinear_filters[8][2] = { static const uint8_t bilinear_filters[8][2] = {
{ 128, 0, }, { 128, 0 }, { 112, 16 }, { 96, 32 }, { 80, 48 },
{ 112, 16, }, { 64, 64 }, { 48, 80 }, { 32, 96 }, { 16, 112 },
{ 96, 32, },
{ 80, 48, },
{ 64, 64, },
{ 48, 80, },
{ 32, 96, },
{ 16, 112, },
}; };
static void var_filter_block2d_bil_w8(const uint8_t *src_ptr, static void var_filter_block2d_bil_w8(const uint8_t *src_ptr,
@@ -79,74 +74,61 @@ static void var_filter_block2d_bil_w16(const uint8_t *src_ptr,
} }
} }
unsigned int vpx_sub_pixel_variance8x8_neon(const uint8_t *src, unsigned int aom_sub_pixel_variance8x8_neon(const uint8_t *src, int src_stride,
int src_stride, int xoffset, int yoffset,
int xoffset, const uint8_t *dst, int dst_stride,
int yoffset,
const uint8_t *dst,
int dst_stride,
unsigned int *sse) { unsigned int *sse) {
DECLARE_ALIGNED(16, uint8_t, temp2[8 * 8]); DECLARE_ALIGNED(16, uint8_t, temp2[8 * 8]);
DECLARE_ALIGNED(16, uint8_t, fdata3[9 * 8]); DECLARE_ALIGNED(16, uint8_t, fdata3[9 * 8]);
var_filter_block2d_bil_w8(src, fdata3, src_stride, 1, var_filter_block2d_bil_w8(src, fdata3, src_stride, 1, 9, 8,
9, 8,
bilinear_filters[xoffset]); bilinear_filters[xoffset]);
var_filter_block2d_bil_w8(fdata3, temp2, 8, 8, 8, var_filter_block2d_bil_w8(fdata3, temp2, 8, 8, 8, 8,
8, bilinear_filters[yoffset]); bilinear_filters[yoffset]);
return vpx_variance8x8_neon(temp2, 8, dst, dst_stride, sse); return aom_variance8x8_neon(temp2, 8, dst, dst_stride, sse);
} }
unsigned int vpx_sub_pixel_variance16x16_neon(const uint8_t *src, unsigned int aom_sub_pixel_variance16x16_neon(const uint8_t *src,
int src_stride, int src_stride, int xoffset,
int xoffset, int yoffset, const uint8_t *dst,
int yoffset,
const uint8_t *dst,
int dst_stride, int dst_stride,
unsigned int *sse) { unsigned int *sse) {
DECLARE_ALIGNED(16, uint8_t, temp2[16 * 16]); DECLARE_ALIGNED(16, uint8_t, temp2[16 * 16]);
DECLARE_ALIGNED(16, uint8_t, fdata3[17 * 16]); DECLARE_ALIGNED(16, uint8_t, fdata3[17 * 16]);
var_filter_block2d_bil_w16(src, fdata3, src_stride, 1, var_filter_block2d_bil_w16(src, fdata3, src_stride, 1, 17, 16,
17, 16,
bilinear_filters[xoffset]); bilinear_filters[xoffset]);
var_filter_block2d_bil_w16(fdata3, temp2, 16, 16, 16, var_filter_block2d_bil_w16(fdata3, temp2, 16, 16, 16, 16,
16, bilinear_filters[yoffset]); bilinear_filters[yoffset]);
return vpx_variance16x16_neon(temp2, 16, dst, dst_stride, sse); return aom_variance16x16_neon(temp2, 16, dst, dst_stride, sse);
} }
unsigned int vpx_sub_pixel_variance32x32_neon(const uint8_t *src, unsigned int aom_sub_pixel_variance32x32_neon(const uint8_t *src,
int src_stride, int src_stride, int xoffset,
int xoffset, int yoffset, const uint8_t *dst,
int yoffset,
const uint8_t *dst,
int dst_stride, int dst_stride,
unsigned int *sse) { unsigned int *sse) {
DECLARE_ALIGNED(16, uint8_t, temp2[32 * 32]); DECLARE_ALIGNED(16, uint8_t, temp2[32 * 32]);
DECLARE_ALIGNED(16, uint8_t, fdata3[33 * 32]); DECLARE_ALIGNED(16, uint8_t, fdata3[33 * 32]);
var_filter_block2d_bil_w16(src, fdata3, src_stride, 1, var_filter_block2d_bil_w16(src, fdata3, src_stride, 1, 33, 32,
33, 32,
bilinear_filters[xoffset]); bilinear_filters[xoffset]);
var_filter_block2d_bil_w16(fdata3, temp2, 32, 32, 32, var_filter_block2d_bil_w16(fdata3, temp2, 32, 32, 32, 32,
32, bilinear_filters[yoffset]); bilinear_filters[yoffset]);
return vpx_variance32x32_neon(temp2, 32, dst, dst_stride, sse); return aom_variance32x32_neon(temp2, 32, dst, dst_stride, sse);
} }
unsigned int vpx_sub_pixel_variance64x64_neon(const uint8_t *src, unsigned int aom_sub_pixel_variance64x64_neon(const uint8_t *src,
int src_stride, int src_stride, int xoffset,
int xoffset, int yoffset, const uint8_t *dst,
int yoffset,
const uint8_t *dst,
int dst_stride, int dst_stride,
unsigned int *sse) { unsigned int *sse) {
DECLARE_ALIGNED(16, uint8_t, temp2[64 * 64]); DECLARE_ALIGNED(16, uint8_t, temp2[64 * 64]);
DECLARE_ALIGNED(16, uint8_t, fdata3[65 * 64]); DECLARE_ALIGNED(16, uint8_t, fdata3[65 * 64]);
var_filter_block2d_bil_w16(src, fdata3, src_stride, 1, var_filter_block2d_bil_w16(src, fdata3, src_stride, 1, 65, 64,
65, 64,
bilinear_filters[xoffset]); bilinear_filters[xoffset]);
var_filter_block2d_bil_w16(fdata3, temp2, 64, 64, 64, var_filter_block2d_bil_w16(fdata3, temp2, 64, 64, 64, 64,
64, bilinear_filters[yoffset]); bilinear_filters[yoffset]);
return vpx_variance64x64_neon(temp2, 64, dst, dst_stride, sse); return aom_variance64x64_neon(temp2, 64, dst, dst_stride, sse);
} }


@@ -0,0 +1,80 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_config.h"
#include "aom/aom_integer.h"
void aom_subtract_block_neon(int rows, int cols, int16_t *diff,
ptrdiff_t diff_stride, const uint8_t *src,
ptrdiff_t src_stride, const uint8_t *pred,
ptrdiff_t pred_stride) {
int r, c;
if (cols > 16) {
for (r = 0; r < rows; ++r) {
for (c = 0; c < cols; c += 32) {
const uint8x16_t v_src_00 = vld1q_u8(&src[c + 0]);
const uint8x16_t v_src_16 = vld1q_u8(&src[c + 16]);
const uint8x16_t v_pred_00 = vld1q_u8(&pred[c + 0]);
const uint8x16_t v_pred_16 = vld1q_u8(&pred[c + 16]);
const uint16x8_t v_diff_lo_00 =
vsubl_u8(vget_low_u8(v_src_00), vget_low_u8(v_pred_00));
const uint16x8_t v_diff_hi_00 =
vsubl_u8(vget_high_u8(v_src_00), vget_high_u8(v_pred_00));
const uint16x8_t v_diff_lo_16 =
vsubl_u8(vget_low_u8(v_src_16), vget_low_u8(v_pred_16));
const uint16x8_t v_diff_hi_16 =
vsubl_u8(vget_high_u8(v_src_16), vget_high_u8(v_pred_16));
vst1q_s16(&diff[c + 0], vreinterpretq_s16_u16(v_diff_lo_00));
vst1q_s16(&diff[c + 8], vreinterpretq_s16_u16(v_diff_hi_00));
vst1q_s16(&diff[c + 16], vreinterpretq_s16_u16(v_diff_lo_16));
vst1q_s16(&diff[c + 24], vreinterpretq_s16_u16(v_diff_hi_16));
}
diff += diff_stride;
pred += pred_stride;
src += src_stride;
}
} else if (cols > 8) {
for (r = 0; r < rows; ++r) {
const uint8x16_t v_src = vld1q_u8(&src[0]);
const uint8x16_t v_pred = vld1q_u8(&pred[0]);
const uint16x8_t v_diff_lo =
vsubl_u8(vget_low_u8(v_src), vget_low_u8(v_pred));
const uint16x8_t v_diff_hi =
vsubl_u8(vget_high_u8(v_src), vget_high_u8(v_pred));
vst1q_s16(&diff[0], vreinterpretq_s16_u16(v_diff_lo));
vst1q_s16(&diff[8], vreinterpretq_s16_u16(v_diff_hi));
diff += diff_stride;
pred += pred_stride;
src += src_stride;
}
} else if (cols > 4) {
for (r = 0; r < rows; ++r) {
const uint8x8_t v_src = vld1_u8(&src[0]);
const uint8x8_t v_pred = vld1_u8(&pred[0]);
const uint16x8_t v_diff = vsubl_u8(v_src, v_pred);
vst1q_s16(&diff[0], vreinterpretq_s16_u16(v_diff));
diff += diff_stride;
pred += pred_stride;
src += src_stride;
}
} else {
for (r = 0; r < rows; ++r) {
for (c = 0; c < cols; ++c) diff[c] = src[c] - pred[c];
diff += diff_stride;
pred += pred_stride;
src += src_stride;
}
}
}
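
The vector paths above subtract with `vsubl_u8`, which yields unsigned 16-bit lanes, and then store through `vreinterpretq_s16_u16`. This works because the widened subtraction wraps modulo 2^16, so reinterpreting the bits as signed recovers the true difference in the range [-255, 255]. A minimal scalar sketch of one lane, assuming a two's-complement `int16_t` (which `<stdint.h>` requires); `widen_sub_as_s16` is a hypothetical helper, not part of the diff:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical one-lane model of vsubl_u8 followed by vreinterpretq_s16_u16. */
static int16_t widen_sub_as_s16(uint8_t a, uint8_t b) {
  /* Widen to 16 bits and subtract; the result wraps modulo 2^16. */
  const uint16_t wide = (uint16_t)((uint16_t)a - (uint16_t)b);
  int16_t out;
  /* Reinterpret the bit pattern as signed without a value conversion. */
  memcpy(&out, &wide, sizeof(out));
  return out;
}
```

For example, `widen_sub_as_s16(10, 200)` produces the bit pattern 0xFF42, which reads back as -190, the same value the scalar fallback `src[c] - pred[c]` stores.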


@@ -1,15 +1,18 @@
; ;
; Copyright (c) 2011 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license ; This source code is subject to the terms of the BSD 2 Clause License and
; that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. An additional intellectual property rights grant can be found ; was not distributed with this source code in the LICENSE file, you can
; in the file PATENTS. All contributing project authors may ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; be found in the AUTHORS file in the root of the source tree. ; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
; ;
EXPORT |vpx_variance_halfpixvar16x16_h_media| EXPORT |aom_variance_halfpixvar16x16_h_media|
ARM ARM
REQUIRE8 REQUIRE8
@@ -22,7 +25,7 @@
; r2 unsigned char *ref_ptr ; r2 unsigned char *ref_ptr
; r3 int recon_stride ; r3 int recon_stride
; stack unsigned int *sse ; stack unsigned int *sse
|vpx_variance_halfpixvar16x16_h_media| PROC |aom_variance_halfpixvar16x16_h_media| PROC
stmfd sp!, {r4-r12, lr} stmfd sp!, {r4-r12, lr}


@@ -1,15 +1,18 @@
; ;
; Copyright (c) 2011 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license ; This source code is subject to the terms of the BSD 2 Clause License and
; that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. An additional intellectual property rights grant can be found ; was not distributed with this source code in the LICENSE file, you can
; in the file PATENTS. All contributing project authors may ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; be found in the AUTHORS file in the root of the source tree. ; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
; ;
EXPORT |vpx_variance_halfpixvar16x16_hv_media| EXPORT |aom_variance_halfpixvar16x16_hv_media|
ARM ARM
REQUIRE8 REQUIRE8
@@ -22,7 +25,7 @@
; r2 unsigned char *ref_ptr ; r2 unsigned char *ref_ptr
; r3 int recon_stride ; r3 int recon_stride
; stack unsigned int *sse ; stack unsigned int *sse
|vpx_variance_halfpixvar16x16_hv_media| PROC |aom_variance_halfpixvar16x16_hv_media| PROC
stmfd sp!, {r4-r12, lr} stmfd sp!, {r4-r12, lr}


@@ -1,15 +1,18 @@
; ;
; Copyright (c) 2011 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license ; This source code is subject to the terms of the BSD 2 Clause License and
; that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. An additional intellectual property rights grant can be found ; was not distributed with this source code in the LICENSE file, you can
; in the file PATENTS. All contributing project authors may ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; be found in the AUTHORS file in the root of the source tree. ; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
; ;
EXPORT |vpx_variance_halfpixvar16x16_v_media| EXPORT |aom_variance_halfpixvar16x16_v_media|
ARM ARM
REQUIRE8 REQUIRE8
@@ -22,7 +25,7 @@
; r2 unsigned char *ref_ptr ; r2 unsigned char *ref_ptr
; r3 int recon_stride ; r3 int recon_stride
; stack unsigned int *sse ; stack unsigned int *sse
|vpx_variance_halfpixvar16x16_v_media| PROC |aom_variance_halfpixvar16x16_v_media| PROC
stmfd sp!, {r4-r12, lr} stmfd sp!, {r4-r12, lr}


@@ -1,17 +1,20 @@
; ;
; Copyright (c) 2011 The WebM project authors. All Rights Reserved. ; Copyright (c) 2016, Alliance for Open Media. All rights reserved
; ;
; Use of this source code is governed by a BSD-style license ; This source code is subject to the terms of the BSD 2 Clause License and
; that can be found in the LICENSE file in the root of the source ; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; tree. An additional intellectual property rights grant can be found ; was not distributed with this source code in the LICENSE file, you can
; in the file PATENTS. All contributing project authors may ; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; be found in the AUTHORS file in the root of the source tree. ; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
; ;
EXPORT |vpx_variance16x16_media| EXPORT |aom_variance16x16_media|
EXPORT |vpx_variance8x8_media| EXPORT |aom_variance8x8_media|
EXPORT |vpx_mse16x16_media| EXPORT |aom_mse16x16_media|
ARM ARM
REQUIRE8 REQUIRE8
@@ -24,7 +27,7 @@
; r2 unsigned char *ref_ptr ; r2 unsigned char *ref_ptr
; r3 int recon_stride ; r3 int recon_stride
; stack unsigned int *sse ; stack unsigned int *sse
|vpx_variance16x16_media| PROC |aom_variance16x16_media| PROC
stmfd sp!, {r4-r12, lr} stmfd sp!, {r4-r12, lr}
@@ -157,7 +160,7 @@ loop16x16
; r2 unsigned char *ref_ptr ; r2 unsigned char *ref_ptr
; r3 int recon_stride ; r3 int recon_stride
; stack unsigned int *sse ; stack unsigned int *sse
|vpx_variance8x8_media| PROC |aom_variance8x8_media| PROC
push {r4-r10, lr} push {r4-r10, lr}
@@ -241,10 +244,10 @@ loop8x8
; r3 int recon_stride ; r3 int recon_stride
; stack unsigned int *sse ; stack unsigned int *sse
; ;
;note: Based on vpx_variance16x16_media. In this function, sum is never used. ;note: Based on aom_variance16x16_media. In this function, sum is never used.
; So, we can remove this part of calculation. ; So, we can remove this part of calculation.
|vpx_mse16x16_media| PROC |aom_mse16x16_media| PROC
push {r4-r9, lr} push {r4-r9, lr}

400
aom_dsp/arm/variance_neon.c Normal file

@@ -0,0 +1,400 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_dsp_rtcd.h"
#include "./aom_config.h"
#include "aom/aom_integer.h"
#include "aom_ports/mem.h"
static INLINE int horizontal_add_s16x8(const int16x8_t v_16x8) {
const int32x4_t a = vpaddlq_s16(v_16x8);
const int64x2_t b = vpaddlq_s32(a);
const int32x2_t c = vadd_s32(vreinterpret_s32_s64(vget_low_s64(b)),
vreinterpret_s32_s64(vget_high_s64(b)));
return vget_lane_s32(c, 0);
}
static INLINE int horizontal_add_s32x4(const int32x4_t v_32x4) {
const int64x2_t b = vpaddlq_s32(v_32x4);
const int32x2_t c = vadd_s32(vreinterpret_s32_s64(vget_low_s64(b)),
vreinterpret_s32_s64(vget_high_s64(b)));
return vget_lane_s32(c, 0);
}
// w * h must be less than 2048 or local variable v_sum may overflow.
static void variance_neon_w8(const uint8_t *a, int a_stride, const uint8_t *b,
int b_stride, int w, int h, uint32_t *sse,
int *sum) {
int i, j;
int16x8_t v_sum = vdupq_n_s16(0);
int32x4_t v_sse_lo = vdupq_n_s32(0);
int32x4_t v_sse_hi = vdupq_n_s32(0);
for (i = 0; i < h; ++i) {
for (j = 0; j < w; j += 8) {
const uint8x8_t v_a = vld1_u8(&a[j]);
const uint8x8_t v_b = vld1_u8(&b[j]);
const uint16x8_t v_diff = vsubl_u8(v_a, v_b);
const int16x8_t sv_diff = vreinterpretq_s16_u16(v_diff);
v_sum = vaddq_s16(v_sum, sv_diff);
v_sse_lo =
vmlal_s16(v_sse_lo, vget_low_s16(sv_diff), vget_low_s16(sv_diff));
v_sse_hi =
vmlal_s16(v_sse_hi, vget_high_s16(sv_diff), vget_high_s16(sv_diff));
}
a += a_stride;
b += b_stride;
}
*sum = horizontal_add_s16x8(v_sum);
*sse = (unsigned int)horizontal_add_s32x4(vaddq_s32(v_sse_lo, v_sse_hi));
}
void aom_get8x8var_neon(const uint8_t *a, int a_stride, const uint8_t *b,
int b_stride, unsigned int *sse, int *sum) {
variance_neon_w8(a, a_stride, b, b_stride, 8, 8, sse, sum);
}
void aom_get16x16var_neon(const uint8_t *a, int a_stride, const uint8_t *b,
int b_stride, unsigned int *sse, int *sum) {
variance_neon_w8(a, a_stride, b, b_stride, 16, 16, sse, sum);
}
unsigned int aom_variance8x8_neon(const uint8_t *a, int a_stride,
const uint8_t *b, int b_stride,
unsigned int *sse) {
int sum;
variance_neon_w8(a, a_stride, b, b_stride, 8, 8, sse, &sum);
return *sse - (((int64_t)sum * sum) >> 6); // >> 6 = / (8 * 8)
}
unsigned int aom_variance16x16_neon(const uint8_t *a, int a_stride,
const uint8_t *b, int b_stride,
unsigned int *sse) {
int sum;
variance_neon_w8(a, a_stride, b, b_stride, 16, 16, sse, &sum);
return *sse - (((int64_t)sum * sum) >> 8); // >> 8 = / (16 * 16)
}
unsigned int aom_variance32x32_neon(const uint8_t *a, int a_stride,
const uint8_t *b, int b_stride,
unsigned int *sse) {
int sum;
variance_neon_w8(a, a_stride, b, b_stride, 32, 32, sse, &sum);
return *sse - (((int64_t)sum * sum) >> 10); // >> 10 = / (32 * 32)
}
unsigned int aom_variance32x64_neon(const uint8_t *a, int a_stride,
const uint8_t *b, int b_stride,
unsigned int *sse) {
int sum1, sum2;
uint32_t sse1, sse2;
variance_neon_w8(a, a_stride, b, b_stride, 32, 32, &sse1, &sum1);
variance_neon_w8(a + (32 * a_stride), a_stride, b + (32 * b_stride), b_stride,
32, 32, &sse2, &sum2);
*sse = sse1 + sse2;
sum1 += sum2;
return *sse - (((int64_t)sum1 * sum1) >> 11); // >> 11 = / (32 * 64)
}
unsigned int aom_variance64x32_neon(const uint8_t *a, int a_stride,
const uint8_t *b, int b_stride,
unsigned int *sse) {
int sum1, sum2;
uint32_t sse1, sse2;
variance_neon_w8(a, a_stride, b, b_stride, 64, 16, &sse1, &sum1);
variance_neon_w8(a + (16 * a_stride), a_stride, b + (16 * b_stride), b_stride,
64, 16, &sse2, &sum2);
*sse = sse1 + sse2;
sum1 += sum2;
return *sse - (((int64_t)sum1 * sum1) >> 11); // >> 11 = / (64 * 32)
}
unsigned int aom_variance64x64_neon(const uint8_t *a, int a_stride,
const uint8_t *b, int b_stride,
unsigned int *sse) {
int sum1, sum2;
uint32_t sse1, sse2;
variance_neon_w8(a, a_stride, b, b_stride, 64, 16, &sse1, &sum1);
variance_neon_w8(a + (16 * a_stride), a_stride, b + (16 * b_stride), b_stride,
64, 16, &sse2, &sum2);
sse1 += sse2;
sum1 += sum2;
variance_neon_w8(a + (16 * 2 * a_stride), a_stride, b + (16 * 2 * b_stride),
b_stride, 64, 16, &sse2, &sum2);
sse1 += sse2;
sum1 += sum2;
variance_neon_w8(a + (16 * 3 * a_stride), a_stride, b + (16 * 3 * b_stride),
b_stride, 64, 16, &sse2, &sum2);
*sse = sse1 + sse2;
sum1 += sum2;
return *sse - (((int64_t)sum1 * sum1) >> 12); // >> 12 = / (64 * 64)
}
unsigned int aom_variance16x8_neon(const unsigned char *src_ptr,
int source_stride,
const unsigned char *ref_ptr,
int recon_stride, unsigned int *sse) {
int i;
int16x4_t d22s16, d23s16, d24s16, d25s16, d26s16, d27s16, d28s16, d29s16;
uint32x2_t d0u32, d10u32;
int64x1_t d0s64, d1s64;
uint8x16_t q0u8, q1u8, q2u8, q3u8;
uint16x8_t q11u16, q12u16, q13u16, q14u16;
int32x4_t q8s32, q9s32, q10s32;
int64x2_t q0s64, q1s64, q5s64;
q8s32 = vdupq_n_s32(0);
q9s32 = vdupq_n_s32(0);
q10s32 = vdupq_n_s32(0);
for (i = 0; i < 4; i++) {
q0u8 = vld1q_u8(src_ptr);
src_ptr += source_stride;
q1u8 = vld1q_u8(src_ptr);
src_ptr += source_stride;
__builtin_prefetch(src_ptr);
q2u8 = vld1q_u8(ref_ptr);
ref_ptr += recon_stride;
q3u8 = vld1q_u8(ref_ptr);
ref_ptr += recon_stride;
__builtin_prefetch(ref_ptr);
q11u16 = vsubl_u8(vget_low_u8(q0u8), vget_low_u8(q2u8));
q12u16 = vsubl_u8(vget_high_u8(q0u8), vget_high_u8(q2u8));
q13u16 = vsubl_u8(vget_low_u8(q1u8), vget_low_u8(q3u8));
q14u16 = vsubl_u8(vget_high_u8(q1u8), vget_high_u8(q3u8));
d22s16 = vreinterpret_s16_u16(vget_low_u16(q11u16));
d23s16 = vreinterpret_s16_u16(vget_high_u16(q11u16));
q8s32 = vpadalq_s16(q8s32, vreinterpretq_s16_u16(q11u16));
q9s32 = vmlal_s16(q9s32, d22s16, d22s16);
q10s32 = vmlal_s16(q10s32, d23s16, d23s16);
d24s16 = vreinterpret_s16_u16(vget_low_u16(q12u16));
d25s16 = vreinterpret_s16_u16(vget_high_u16(q12u16));
q8s32 = vpadalq_s16(q8s32, vreinterpretq_s16_u16(q12u16));
q9s32 = vmlal_s16(q9s32, d24s16, d24s16);
q10s32 = vmlal_s16(q10s32, d25s16, d25s16);
d26s16 = vreinterpret_s16_u16(vget_low_u16(q13u16));
d27s16 = vreinterpret_s16_u16(vget_high_u16(q13u16));
q8s32 = vpadalq_s16(q8s32, vreinterpretq_s16_u16(q13u16));
q9s32 = vmlal_s16(q9s32, d26s16, d26s16);
q10s32 = vmlal_s16(q10s32, d27s16, d27s16);
d28s16 = vreinterpret_s16_u16(vget_low_u16(q14u16));
d29s16 = vreinterpret_s16_u16(vget_high_u16(q14u16));
q8s32 = vpadalq_s16(q8s32, vreinterpretq_s16_u16(q14u16));
q9s32 = vmlal_s16(q9s32, d28s16, d28s16);
q10s32 = vmlal_s16(q10s32, d29s16, d29s16);
}
q10s32 = vaddq_s32(q10s32, q9s32);
q0s64 = vpaddlq_s32(q8s32);
q1s64 = vpaddlq_s32(q10s32);
d0s64 = vadd_s64(vget_low_s64(q0s64), vget_high_s64(q0s64));
d1s64 = vadd_s64(vget_low_s64(q1s64), vget_high_s64(q1s64));
q5s64 = vmull_s32(vreinterpret_s32_s64(d0s64), vreinterpret_s32_s64(d0s64));
vst1_lane_u32((uint32_t *)sse, vreinterpret_u32_s64(d1s64), 0);
d10u32 = vshr_n_u32(vreinterpret_u32_s64(vget_low_s64(q5s64)), 7);
d0u32 = vsub_u32(vreinterpret_u32_s64(d1s64), d10u32);
return vget_lane_u32(d0u32, 0);
}
unsigned int aom_variance8x16_neon(const unsigned char *src_ptr,
int source_stride,
const unsigned char *ref_ptr,
int recon_stride, unsigned int *sse) {
int i;
uint8x8_t d0u8, d2u8, d4u8, d6u8;
int16x4_t d22s16, d23s16, d24s16, d25s16;
uint32x2_t d0u32, d10u32;
int64x1_t d0s64, d1s64;
uint16x8_t q11u16, q12u16;
int32x4_t q8s32, q9s32, q10s32;
int64x2_t q0s64, q1s64, q5s64;
q8s32 = vdupq_n_s32(0);
q9s32 = vdupq_n_s32(0);
q10s32 = vdupq_n_s32(0);
for (i = 0; i < 8; i++) {
d0u8 = vld1_u8(src_ptr);
src_ptr += source_stride;
d2u8 = vld1_u8(src_ptr);
src_ptr += source_stride;
__builtin_prefetch(src_ptr);
d4u8 = vld1_u8(ref_ptr);
ref_ptr += recon_stride;
d6u8 = vld1_u8(ref_ptr);
ref_ptr += recon_stride;
__builtin_prefetch(ref_ptr);
q11u16 = vsubl_u8(d0u8, d4u8);
q12u16 = vsubl_u8(d2u8, d6u8);
d22s16 = vreinterpret_s16_u16(vget_low_u16(q11u16));
d23s16 = vreinterpret_s16_u16(vget_high_u16(q11u16));
q8s32 = vpadalq_s16(q8s32, vreinterpretq_s16_u16(q11u16));
q9s32 = vmlal_s16(q9s32, d22s16, d22s16);
q10s32 = vmlal_s16(q10s32, d23s16, d23s16);
d24s16 = vreinterpret_s16_u16(vget_low_u16(q12u16));
d25s16 = vreinterpret_s16_u16(vget_high_u16(q12u16));
q8s32 = vpadalq_s16(q8s32, vreinterpretq_s16_u16(q12u16));
q9s32 = vmlal_s16(q9s32, d24s16, d24s16);
q10s32 = vmlal_s16(q10s32, d25s16, d25s16);
}
q10s32 = vaddq_s32(q10s32, q9s32);
q0s64 = vpaddlq_s32(q8s32);
q1s64 = vpaddlq_s32(q10s32);
d0s64 = vadd_s64(vget_low_s64(q0s64), vget_high_s64(q0s64));
d1s64 = vadd_s64(vget_low_s64(q1s64), vget_high_s64(q1s64));
q5s64 = vmull_s32(vreinterpret_s32_s64(d0s64), vreinterpret_s32_s64(d0s64));
vst1_lane_u32((uint32_t *)sse, vreinterpret_u32_s64(d1s64), 0);
d10u32 = vshr_n_u32(vreinterpret_u32_s64(vget_low_s64(q5s64)), 7);
d0u32 = vsub_u32(vreinterpret_u32_s64(d1s64), d10u32);
return vget_lane_u32(d0u32, 0);
}
unsigned int aom_mse16x16_neon(const unsigned char *src_ptr, int source_stride,
const unsigned char *ref_ptr, int recon_stride,
unsigned int *sse) {
int i;
int16x4_t d22s16, d23s16, d24s16, d25s16, d26s16, d27s16, d28s16, d29s16;
int64x1_t d0s64;
uint8x16_t q0u8, q1u8, q2u8, q3u8;
int32x4_t q7s32, q8s32, q9s32, q10s32;
uint16x8_t q11u16, q12u16, q13u16, q14u16;
int64x2_t q1s64;
q7s32 = vdupq_n_s32(0);
q8s32 = vdupq_n_s32(0);
q9s32 = vdupq_n_s32(0);
q10s32 = vdupq_n_s32(0);
for (i = 0; i < 8; i++) { // mse16x16_neon_loop
q0u8 = vld1q_u8(src_ptr);
src_ptr += source_stride;
q1u8 = vld1q_u8(src_ptr);
src_ptr += source_stride;
q2u8 = vld1q_u8(ref_ptr);
ref_ptr += recon_stride;
q3u8 = vld1q_u8(ref_ptr);
ref_ptr += recon_stride;
q11u16 = vsubl_u8(vget_low_u8(q0u8), vget_low_u8(q2u8));
q12u16 = vsubl_u8(vget_high_u8(q0u8), vget_high_u8(q2u8));
q13u16 = vsubl_u8(vget_low_u8(q1u8), vget_low_u8(q3u8));
q14u16 = vsubl_u8(vget_high_u8(q1u8), vget_high_u8(q3u8));
d22s16 = vreinterpret_s16_u16(vget_low_u16(q11u16));
d23s16 = vreinterpret_s16_u16(vget_high_u16(q11u16));
q7s32 = vmlal_s16(q7s32, d22s16, d22s16);
q8s32 = vmlal_s16(q8s32, d23s16, d23s16);
d24s16 = vreinterpret_s16_u16(vget_low_u16(q12u16));
d25s16 = vreinterpret_s16_u16(vget_high_u16(q12u16));
q9s32 = vmlal_s16(q9s32, d24s16, d24s16);
q10s32 = vmlal_s16(q10s32, d25s16, d25s16);
d26s16 = vreinterpret_s16_u16(vget_low_u16(q13u16));
d27s16 = vreinterpret_s16_u16(vget_high_u16(q13u16));
q7s32 = vmlal_s16(q7s32, d26s16, d26s16);
q8s32 = vmlal_s16(q8s32, d27s16, d27s16);
d28s16 = vreinterpret_s16_u16(vget_low_u16(q14u16));
d29s16 = vreinterpret_s16_u16(vget_high_u16(q14u16));
q9s32 = vmlal_s16(q9s32, d28s16, d28s16);
q10s32 = vmlal_s16(q10s32, d29s16, d29s16);
}
q7s32 = vaddq_s32(q7s32, q8s32);
q9s32 = vaddq_s32(q9s32, q10s32);
q10s32 = vaddq_s32(q7s32, q9s32);
q1s64 = vpaddlq_s32(q10s32);
d0s64 = vadd_s64(vget_low_s64(q1s64), vget_high_s64(q1s64));
vst1_lane_u32((uint32_t *)sse, vreinterpret_u32_s64(d0s64), 0);
return vget_lane_u32(vreinterpret_u32_s64(d0s64), 0);
}
unsigned int aom_get4x4sse_cs_neon(const unsigned char *src_ptr,
int source_stride,
const unsigned char *ref_ptr,
int recon_stride) {
int16x4_t d22s16, d24s16, d26s16, d28s16;
int64x1_t d0s64;
uint8x8_t d0u8, d1u8, d2u8, d3u8, d4u8, d5u8, d6u8, d7u8;
int32x4_t q7s32, q8s32, q9s32, q10s32;
uint16x8_t q11u16, q12u16, q13u16, q14u16;
int64x2_t q1s64;
d0u8 = vld1_u8(src_ptr);
src_ptr += source_stride;
d4u8 = vld1_u8(ref_ptr);
ref_ptr += recon_stride;
d1u8 = vld1_u8(src_ptr);
src_ptr += source_stride;
d5u8 = vld1_u8(ref_ptr);
ref_ptr += recon_stride;
d2u8 = vld1_u8(src_ptr);
src_ptr += source_stride;
d6u8 = vld1_u8(ref_ptr);
ref_ptr += recon_stride;
d3u8 = vld1_u8(src_ptr);
src_ptr += source_stride;
d7u8 = vld1_u8(ref_ptr);
ref_ptr += recon_stride;
q11u16 = vsubl_u8(d0u8, d4u8);
q12u16 = vsubl_u8(d1u8, d5u8);
q13u16 = vsubl_u8(d2u8, d6u8);
q14u16 = vsubl_u8(d3u8, d7u8);
d22s16 = vget_low_s16(vreinterpretq_s16_u16(q11u16));
d24s16 = vget_low_s16(vreinterpretq_s16_u16(q12u16));
d26s16 = vget_low_s16(vreinterpretq_s16_u16(q13u16));
d28s16 = vget_low_s16(vreinterpretq_s16_u16(q14u16));
q7s32 = vmull_s16(d22s16, d22s16);
q8s32 = vmull_s16(d24s16, d24s16);
q9s32 = vmull_s16(d26s16, d26s16);
q10s32 = vmull_s16(d28s16, d28s16);
q7s32 = vaddq_s32(q7s32, q8s32);
q9s32 = vaddq_s32(q9s32, q10s32);
q9s32 = vaddq_s32(q7s32, q9s32);
q1s64 = vpaddlq_s32(q9s32);
d0s64 = vadd_s64(vget_low_s64(q1s64), vget_high_s64(q1s64));
return vget_lane_u32(vreinterpret_u32_s64(d0s64), 0);
}
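
Reviewer note: the NEON variance kernels above all compute `sse - sum * sum / (w * h)`, with the division written as a right shift because every block size is a power of two. A minimal scalar sketch of that formula follows, which may be useful as a reference when checking a SIMD path. The name `variance_ref` is hypothetical and is not part of this change; the parameter order only mirrors `variance_neon_w8`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference, not part of the patch.
 * Returns sse - sum^2 / (w * h) and stores the raw sse through *sse. */
static unsigned int variance_ref(const uint8_t *a, int a_stride,
                                 const uint8_t *b, int b_stride, int w, int h,
                                 unsigned int *sse) {
  int64_t sum = 0; /* signed sum of differences */
  uint64_t sq = 0; /* sum of squared differences */
  int i, j;
  assert(w > 0 && h > 0);
  for (i = 0; i < h; ++i) {
    for (j = 0; j < w; ++j) {
      const int diff = a[j] - b[j];
      sum += diff;
      sq += (uint64_t)(diff * diff);
    }
    a += a_stride;
    b += b_stride;
  }
  *sse = (unsigned int)sq;
  /* sum * sum is non-negative, so for power-of-two w * h this division
   * matches the right shift used in the NEON functions. */
  return (unsigned int)(sq - (uint64_t)((sum * sum) / (w * h)));
}
```

Using 64-bit accumulators here sidesteps the 16-bit `v_sum` limit noted in the comment on `variance_neon_w8`, at the cost of speed.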


@@ -1,31 +1,34 @@
/* /*
* Copyright (c) 2014 The WebM project authors. All Rights Reserved. * Copyright (c) 2016, Alliance for Open Media. All rights reserved
* *
* Use of this source code is governed by a BSD-style license * This source code is subject to the terms of the BSD 2 Clause License and
* that can be found in the LICENSE file in the root of the source * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* tree. An additional intellectual property rights grant can be found * was not distributed with this source code in the LICENSE file, you can
* in the file PATENTS. All contributing project authors may * obtain it at www.aomedia.org/license/software. If the Alliance for Open
* be found in the AUTHORS file in the root of the source tree. * Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/ */
#include <stdlib.h> #include <stdlib.h>
#include "./vpx_dsp_rtcd.h" #include "./aom_dsp_rtcd.h"
#include "vpx_ports/mem.h" #include "aom_ports/mem.h"
unsigned int vpx_avg_8x8_c(const uint8_t *src, int stride) { unsigned int aom_avg_8x8_c(const uint8_t *src, int stride) {
int i, j; int i, j;
int sum = 0; int sum = 0;
for (i = 0; i < 8; ++i, src += stride) for (i = 0; i < 8; ++i, src += stride)
for (j = 0; j < 8; sum += src[j], ++j) {} for (j = 0; j < 8; sum += src[j], ++j) {
}
return ROUND_POWER_OF_TWO(sum, 6); return ROUND_POWER_OF_TWO(sum, 6);
} }
unsigned int vpx_avg_4x4_c(const uint8_t *src, int stride) { unsigned int aom_avg_4x4_c(const uint8_t *src, int stride) {
int i, j; int i, j;
int sum = 0; int sum = 0;
for (i = 0; i < 4; ++i, src += stride) for (i = 0; i < 4; ++i, src += stride)
for (j = 0; j < 4; sum += src[j], ++j) {} for (j = 0; j < 4; sum += src[j], ++j) {
}
return ROUND_POWER_OF_TWO(sum, 4); return ROUND_POWER_OF_TWO(sum, 4);
} }
@@ -64,7 +67,7 @@ static void hadamard_col8(const int16_t *src_diff, int src_stride,
// The order of the output coeff of the hadamard is not important. For // The order of the output coeff of the hadamard is not important. For
// optimization purposes the final transpose may be skipped. // optimization purposes the final transpose may be skipped.
void vpx_hadamard_8x8_c(const int16_t *src_diff, int src_stride, void aom_hadamard_8x8_c(const int16_t *src_diff, int src_stride,
int16_t *coeff) { int16_t *coeff) {
int idx; int idx;
int16_t buffer[64]; int16_t buffer[64];
@@ -80,21 +83,21 @@ void vpx_hadamard_8x8_c(const int16_t *src_diff, int src_stride,
for (idx = 0; idx < 8; ++idx) { for (idx = 0; idx < 8; ++idx) {
hadamard_col8(tmp_buf, 8, coeff); // tmp_buf: 12 bit hadamard_col8(tmp_buf, 8, coeff); // tmp_buf: 12 bit
// dynamic range [-2040, 2040] // dynamic range [-2040, 2040]
coeff += 8; // coeff: 15 bit coeff += 8; // coeff: 15 bit
// dynamic range [-16320, 16320] // dynamic range [-16320, 16320]
++tmp_buf; ++tmp_buf;
} }
} }
// In place 16x16 2D Hadamard transform // In place 16x16 2D Hadamard transform
void vpx_hadamard_16x16_c(const int16_t *src_diff, int src_stride, void aom_hadamard_16x16_c(const int16_t *src_diff, int src_stride,
int16_t *coeff) { int16_t *coeff) {
int idx; int idx;
for (idx = 0; idx < 4; ++idx) { for (idx = 0; idx < 4; ++idx) {
// src_diff: 9 bit, dynamic range [-255, 255] // src_diff: 9 bit, dynamic range [-255, 255]
const int16_t *src_ptr = src_diff + (idx >> 1) * 8 * src_stride const int16_t *src_ptr =
+ (idx & 0x01) * 8; src_diff + (idx >> 1) * 8 * src_stride + (idx & 0x01) * 8;
vpx_hadamard_8x8_c(src_ptr, src_stride, coeff + idx * 64); aom_hadamard_8x8_c(src_ptr, src_stride, coeff + idx * 64);
} }
// coeff: 15 bit, dynamic range [-16320, 16320] // coeff: 15 bit, dynamic range [-16320, 16320]
@@ -109,8 +112,8 @@ void vpx_hadamard_16x16_c(const int16_t *src_diff, int src_stride,
int16_t b2 = (a2 + a3) >> 1; // [-16320, 16320] int16_t b2 = (a2 + a3) >> 1; // [-16320, 16320]
int16_t b3 = (a2 - a3) >> 1; int16_t b3 = (a2 - a3) >> 1;
coeff[0] = b0 + b2; // 16 bit, [-32640, 32640] coeff[0] = b0 + b2; // 16 bit, [-32640, 32640]
coeff[64] = b1 + b3; coeff[64] = b1 + b3;
coeff[128] = b0 - b2; coeff[128] = b0 - b2;
coeff[192] = b1 - b3; coeff[192] = b1 - b3;
@@ -120,11 +123,10 @@ void vpx_hadamard_16x16_c(const int16_t *src_diff, int src_stride,
// coeff: 16 bits, dynamic range [-32640, 32640]. // coeff: 16 bits, dynamic range [-32640, 32640].
// length: value range {16, 64, 256, 1024}. // length: value range {16, 64, 256, 1024}.
int vpx_satd_c(const int16_t *coeff, int length) { int aom_satd_c(const int16_t *coeff, int length) {
int i; int i;
int satd = 0; int satd = 0;
for (i = 0; i < length; ++i) for (i = 0; i < length; ++i) satd += abs(coeff[i]);
satd += abs(coeff[i]);
// satd: 26 bits, dynamic range [-32640 * 1024, 32640 * 1024] // satd: 26 bits, dynamic range [-32640 * 1024, 32640 * 1024]
return satd; return satd;
@@ -132,7 +134,7 @@ int vpx_satd_c(const int16_t *coeff, int length) {
// Integer projection onto row vectors. // Integer projection onto row vectors.
// height: value range {16, 32, 64}. // height: value range {16, 32, 64}.
void vpx_int_pro_row_c(int16_t hbuf[16], const uint8_t *ref, void aom_int_pro_row_c(int16_t hbuf[16], const uint8_t *ref,
const int ref_stride, const int height) { const int ref_stride, const int height) {
int idx; int idx;
const int norm_factor = height >> 1; const int norm_factor = height >> 1;
@@ -140,8 +142,7 @@ void vpx_int_pro_row_c(int16_t hbuf[16], const uint8_t *ref,
int i; int i;
hbuf[idx] = 0; hbuf[idx] = 0;
// hbuf[idx]: 14 bit, dynamic range [0, 16320]. // hbuf[idx]: 14 bit, dynamic range [0, 16320].
for (i = 0; i < height; ++i) for (i = 0; i < height; ++i) hbuf[idx] += ref[i * ref_stride];
hbuf[idx] += ref[i * ref_stride];
// hbuf[idx]: 9 bit, dynamic range [0, 510]. // hbuf[idx]: 9 bit, dynamic range [0, 510].
hbuf[idx] /= norm_factor; hbuf[idx] /= norm_factor;
++ref; ++ref;
@@ -149,20 +150,18 @@ void vpx_int_pro_row_c(int16_t hbuf[16], const uint8_t *ref,
} }
// width: value range {16, 32, 64}. // width: value range {16, 32, 64}.
int16_t vpx_int_pro_col_c(const uint8_t *ref, const int width) { int16_t aom_int_pro_col_c(const uint8_t *ref, const int width) {
int idx; int idx;
int16_t sum = 0; int16_t sum = 0;
// sum: 14 bit, dynamic range [0, 16320] // sum: 14 bit, dynamic range [0, 16320]
for (idx = 0; idx < width; ++idx) for (idx = 0; idx < width; ++idx) sum += ref[idx];
sum += ref[idx];
return sum; return sum;
} }
// ref: [0 - 510] // ref: [0 - 510]
// src: [0 - 510] // src: [0 - 510]
// bwl: {2, 3, 4} // bwl: {2, 3, 4}
int vpx_vector_var_c(const int16_t *ref, const int16_t *src, int aom_vector_var_c(const int16_t *ref, const int16_t *src, const int bwl) {
const int bwl) {
int i; int i;
int width = 4 << bwl; int width = 4 << bwl;
int sse = 0, mean = 0, var; int sse = 0, mean = 0, var;
@@ -178,57 +177,56 @@ int vpx_vector_var_c(const int16_t *ref, const int16_t *src,
return var; return var;
} }
void vpx_minmax_8x8_c(const uint8_t *src, int src_stride, void aom_minmax_8x8_c(const uint8_t *src, int src_stride, const uint8_t *ref,
const uint8_t *ref, int ref_stride, int ref_stride, int *min, int *max) {
int *min, int *max) {
int i, j; int i, j;
*min = 255; *min = 255;
*max = 0; *max = 0;
for (i = 0; i < 8; ++i, src += src_stride, ref += ref_stride) { for (i = 0; i < 8; ++i, src += src_stride, ref += ref_stride) {
for (j = 0; j < 8; ++j) { for (j = 0; j < 8; ++j) {
int diff = abs(src[j]-ref[j]); int diff = abs(src[j] - ref[j]);
*min = diff < *min ? diff : *min; *min = diff < *min ? diff : *min;
*max = diff > *max ? diff : *max; *max = diff > *max ? diff : *max;
} }
} }
} }
#if CONFIG_VP9_HIGHBITDEPTH #if CONFIG_AOM_HIGHBITDEPTH
unsigned int vpx_highbd_avg_8x8_c(const uint8_t *src, int stride) { unsigned int aom_highbd_avg_8x8_c(const uint8_t *src, int stride) {
int i, j; int i, j;
int sum = 0; int sum = 0;
const uint16_t* s = CONVERT_TO_SHORTPTR(src); const uint16_t *s = CONVERT_TO_SHORTPTR(src);
for (i = 0; i < 8; ++i, s += stride) for (i = 0; i < 8; ++i, s += stride)
for (j = 0; j < 8; sum += s[j], ++j) {} for (j = 0; j < 8; sum += s[j], ++j) {
}
return ROUND_POWER_OF_TWO(sum, 6); return ROUND_POWER_OF_TWO(sum, 6);
} }
unsigned int vpx_highbd_avg_4x4_c(const uint8_t *src, int stride) { unsigned int aom_highbd_avg_4x4_c(const uint8_t *src, int stride) {
int i, j; int i, j;
int sum = 0; int sum = 0;
const uint16_t* s = CONVERT_TO_SHORTPTR(src); const uint16_t *s = CONVERT_TO_SHORTPTR(src);
for (i = 0; i < 4; ++i, s+=stride) for (i = 0; i < 4; ++i, s += stride)
for (j = 0; j < 4; sum += s[j], ++j) {} for (j = 0; j < 4; sum += s[j], ++j) {
}
return ROUND_POWER_OF_TWO(sum, 4); return ROUND_POWER_OF_TWO(sum, 4);
} }
void vpx_highbd_minmax_8x8_c(const uint8_t *s8, int p, const uint8_t *d8, void aom_highbd_minmax_8x8_c(const uint8_t *s8, int p, const uint8_t *d8,
int dp, int *min, int *max) { int dp, int *min, int *max) {
int i, j; int i, j;
const uint16_t* s = CONVERT_TO_SHORTPTR(s8); const uint16_t *s = CONVERT_TO_SHORTPTR(s8);
const uint16_t* d = CONVERT_TO_SHORTPTR(d8); const uint16_t *d = CONVERT_TO_SHORTPTR(d8);
*min = 255; *min = 255;
*max = 0; *max = 0;
for (i = 0; i < 8; ++i, s += p, d += dp) { for (i = 0; i < 8; ++i, s += p, d += dp) {
for (j = 0; j < 8; ++j) { for (j = 0; j < 8; ++j) {
int diff = abs(s[j]-d[j]); int diff = abs(s[j] - d[j]);
*min = diff < *min ? diff : *min; *min = diff < *min ? diff : *min;
*max = diff > *max ? diff : *max; *max = diff > *max ? diff : *max;
} }
} }
} }
#endif // CONFIG_VP9_HIGHBITDEPTH #endif // CONFIG_AOM_HIGHBITDEPTH
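
Reviewer note: the averaging helpers in this file divide the block sum by the pixel count with rounding rather than truncation. The sketch below shows that behaviour in isolation. `ROUND_POW2` and `avg_8x8_ref` are hypothetical names, and the macro body is an assumption about what `ROUND_POWER_OF_TWO` in `aom_ports/mem.h` does (add half the divisor, then shift); the header itself is not shown in this diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed rounding shift: add 2^(n-1) before shifting right by n. */
#define ROUND_POW2(value, n) (((value) + (1 << ((n) - 1))) >> (n))

/* Hypothetical stand-in for aom_avg_8x8_c: rounded mean of an 8x8 block. */
static unsigned int avg_8x8_ref(const uint8_t *src, int stride) {
  int i, j;
  int sum = 0;
  assert(src != NULL);
  for (i = 0; i < 8; ++i, src += stride) {
    for (j = 0; j < 8; ++j) sum += src[j];
  }
  return (unsigned int)ROUND_POW2(sum, 6); /* 6 = log2(8 * 8) */
}
```

With this definition a block sum of 32 rounds up to an average of 1, while a sum of 31 rounds down to 0.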

240
aom_dsp/bitreader.h Normal file

@@ -0,0 +1,240 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_BITREADER_H_
#define AOM_DSP_BITREADER_H_
#include <assert.h>
#include <limits.h>
#include "./aom_config.h"
#if CONFIG_EC_ADAPT && !CONFIG_EC_MULTISYMBOL
#error "CONFIG_EC_ADAPT is enabled without enabling CONFIG_EC_MULTISYMBOL."
#endif
#include "aom/aomdx.h"
#include "aom/aom_integer.h"
#if CONFIG_ANS
#include "aom_dsp/ansreader.h"
#elif CONFIG_DAALA_EC
#include "aom_dsp/daalaboolreader.h"
#else
#include "aom_dsp/dkboolreader.h"
#endif
#include "aom_dsp/prob.h"
#include "av1/common/odintrin.h"
#if CONFIG_ACCOUNTING
#include "av1/common/accounting.h"
#define ACCT_STR_NAME acct_str
#define ACCT_STR_PARAM , const char *ACCT_STR_NAME
#define ACCT_STR_ARG(s) , s
#else
#define ACCT_STR_PARAM
#define ACCT_STR_ARG(s)
#endif
#define aom_read(r, prob, ACCT_STR_NAME) \
aom_read_(r, prob ACCT_STR_ARG(ACCT_STR_NAME))
#define aom_read_bit(r, ACCT_STR_NAME) \
aom_read_bit_(r ACCT_STR_ARG(ACCT_STR_NAME))
#define aom_read_tree(r, tree, probs, ACCT_STR_NAME) \
aom_read_tree_(r, tree, probs ACCT_STR_ARG(ACCT_STR_NAME))
#define aom_read_literal(r, bits, ACCT_STR_NAME) \
aom_read_literal_(r, bits ACCT_STR_ARG(ACCT_STR_NAME))
#define aom_read_tree_bits(r, tree, probs, ACCT_STR_NAME) \
aom_read_tree_bits_(r, tree, probs ACCT_STR_ARG(ACCT_STR_NAME))
#define aom_read_symbol(r, cdf, nsymbs, ACCT_STR_NAME) \
aom_read_symbol_(r, cdf, nsymbs ACCT_STR_ARG(ACCT_STR_NAME))
#ifdef __cplusplus
extern "C" {
#endif
#if CONFIG_ANS
typedef struct AnsDecoder aom_reader;
#elif CONFIG_DAALA_EC
typedef struct daala_reader aom_reader;
#else
typedef struct aom_dk_reader aom_reader;
#endif
static INLINE int aom_reader_init(aom_reader *r, const uint8_t *buffer,
size_t size, aom_decrypt_cb decrypt_cb,
void *decrypt_state) {
#if CONFIG_ANS
(void)decrypt_cb;
(void)decrypt_state;
assert(size <= INT_MAX);
return ans_read_init(r, buffer, size);
#elif CONFIG_DAALA_EC
(void)decrypt_cb;
(void)decrypt_state;
return aom_daala_reader_init(r, buffer, size);
#else
return aom_dk_reader_init(r, buffer, size, decrypt_cb, decrypt_state);
#endif
}
static INLINE const uint8_t *aom_reader_find_end(aom_reader *r) {
#if CONFIG_ANS
(void)r;
assert(0 && "Use the raw buffer size with ANS");
return NULL;
#elif CONFIG_DAALA_EC
return aom_daala_reader_find_end(r);
#else
return aom_dk_reader_find_end(r);
#endif
}
static INLINE int aom_reader_has_error(aom_reader *r) {
#if CONFIG_ANS
return ans_reader_has_error(r);
#elif CONFIG_DAALA_EC
return aom_daala_reader_has_error(r);
#else
return aom_dk_reader_has_error(r);
#endif
}
// Returns the position in the bit reader in bits.
static INLINE uint32_t aom_reader_tell(const aom_reader *r) {
#if CONFIG_ANS
(void)r;
assert(0 && "aom_reader_tell() is unimplemented for ANS");
return 0;
#elif CONFIG_DAALA_EC
return aom_daala_reader_tell(r);
#else
return aom_dk_reader_tell(r);
#endif
}
// Returns the position in the bit reader in 1/8th bits.
static INLINE uint32_t aom_reader_tell_frac(const aom_reader *r) {
#if CONFIG_ANS
(void)r;
assert(0 && "aom_reader_tell_frac() is unimplemented for ANS");
return 0;
#elif CONFIG_DAALA_EC
return aom_daala_reader_tell_frac(r);
#else
return aom_dk_reader_tell_frac(r);
#endif
}
#if CONFIG_ACCOUNTING
static INLINE void aom_process_accounting(const aom_reader *r ACCT_STR_PARAM) {
if (r->accounting != NULL) {
uint32_t tell_frac;
tell_frac = aom_reader_tell_frac(r);
aom_accounting_record(r->accounting, ACCT_STR_NAME,
tell_frac - r->accounting->last_tell_frac);
r->accounting->last_tell_frac = tell_frac;
}
}
#endif
static INLINE int aom_read_(aom_reader *r, int prob ACCT_STR_PARAM) {
int ret;
#if CONFIG_ANS
ret = uabs_read(r, prob);
#elif CONFIG_DAALA_EC
ret = aom_daala_read(r, prob);
#else
ret = aom_dk_read(r, prob);
#endif
#if CONFIG_ACCOUNTING
if (ACCT_STR_NAME) aom_process_accounting(r, ACCT_STR_NAME);
#endif
return ret;
}
static INLINE int aom_read_bit_(aom_reader *r ACCT_STR_PARAM) {
int ret;
#if CONFIG_ANS
ret = uabs_read_bit(r); // Non trivial optimization at half probability
#else
ret = aom_read(r, 128, NULL); // aom_prob_half
#endif
#if CONFIG_ACCOUNTING
if (ACCT_STR_NAME) aom_process_accounting(r, ACCT_STR_NAME);
#endif
return ret;
}
static INLINE int aom_read_literal_(aom_reader *r, int bits ACCT_STR_PARAM) {
int literal = 0, bit;
for (bit = bits - 1; bit >= 0; bit--) literal |= aom_read_bit(r, NULL) << bit;
#if CONFIG_ACCOUNTING
if (ACCT_STR_NAME) aom_process_accounting(r, ACCT_STR_NAME);
#endif
return literal;
}
static INLINE int aom_read_tree_bits_(aom_reader *r, const aom_tree_index *tree,
const aom_prob *probs ACCT_STR_PARAM) {
aom_tree_index i = 0;
while ((i = tree[i + aom_read(r, probs[i >> 1], NULL)]) > 0) continue;
#if CONFIG_ACCOUNTING
if (ACCT_STR_NAME) aom_process_accounting(r, ACCT_STR_NAME);
#endif
return -i;
}
static INLINE int aom_read_tree_(aom_reader *r, const aom_tree_index *tree,
const aom_prob *probs ACCT_STR_PARAM) {
int ret;
#if CONFIG_DAALA_EC
ret = daala_read_tree_bits(r, tree, probs);
#else
ret = aom_read_tree_bits(r, tree, probs, NULL);
#endif
#if CONFIG_ACCOUNTING
if (ACCT_STR_NAME) aom_process_accounting(r, ACCT_STR_NAME);
#endif
return ret;
}
#if CONFIG_EC_MULTISYMBOL
static INLINE int aom_read_symbol_(aom_reader *r, aom_cdf_prob *cdf,
int nsymbs ACCT_STR_PARAM) {
int ret;
#if CONFIG_RANS
(void)nsymbs;
ret = rans_read(r, cdf);
#elif CONFIG_DAALA_EC
ret = daala_read_symbol(r, cdf, nsymbs);
#else
#error \
"CONFIG_EC_MULTISYMBOL is selected without a valid backing entropy " \
"coder. Enable daala_ec or ans for a valid configuration."
#endif
#if CONFIG_EC_ADAPT
update_cdf(cdf, ret, nsymbs);
#endif
#if CONFIG_ACCOUNTING
if (ACCT_STR_NAME) aom_process_accounting(r, ACCT_STR_NAME);
#endif
return ret;
}
#endif // CONFIG_EC_MULTISYMBOL
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_BITREADER_H_
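
Note on aom_read_tree_bits_(): the loop walks a token tree in which positive entries are indices of further nodes and non-positive entries are negated leaf symbols. The following is a minimal sketch of that walk, not the library code. It assumes a hypothetical stub_reader that replays fixed bits in place of the arithmetic decoder, and a tree_index typedef standing in for aom_tree_index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int8_t tree_index; /* assumption: stand-in for aom_tree_index */

/* Hypothetical bit source that replays preset bits; the real reader would
 * arithmetic-decode each bit using the supplied probability. */
typedef struct {
  const int *bits;
  size_t pos;
} stub_reader;

static int stub_read(stub_reader *r, int prob) {
  (void)prob; /* ignored by the stub */
  return r->bits[r->pos++];
}

/* Same traversal as aom_read_tree_bits_(): follow tree[i + bit] until a
 * non-positive entry (a negated leaf symbol) is reached. */
static int read_tree(stub_reader *r, const tree_index *tree,
                     const uint8_t *probs) {
  tree_index i = 0;
  while ((i = tree[i + stub_read(r, probs[i >> 1])]) > 0) continue;
  return -i;
}

/* Three-symbol tree: bit 0 -> symbol 0; bits 1,0 -> 1; bits 1,1 -> 2. */
static int decode_bits(int b0, int b1) {
  static const tree_index tree[4] = { -0, 2, -1, -2 };
  static const uint8_t probs[2] = { 128, 128 };
  const int bits[2] = { b0, b1 };
  stub_reader r = { bits, 0 };
  return read_tree(&r, tree, probs);
}
```

Each internal node occupies two consecutive tree entries, which is why the probability for node i is looked up at probs[i >> 1].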


@@ -0,0 +1,47 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include "./aom_config.h"
#include "./bitreader_buffer.h"
size_t aom_rb_bytes_read(struct aom_read_bit_buffer *rb) {
return (rb->bit_offset + 7) >> 3;
}
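// Reads one bit, most significant bit first within each byte. On overrun the
// error handler is called, bit_offset is left unchanged, and 0 is returned.
int aom_rb_read_bit(struct aom_read_bit_buffer *rb) {
const size_t off = rb->bit_offset;
const size_t p = off >> 3; // byte index
const int q = 7 - (int)(off & 0x7); // shift that selects the bit
if (rb->bit_buffer + p < rb->bit_buffer_end) {
const int bit = (rb->bit_buffer[p] >> q) & 1;
rb->bit_offset = off + 1;
return bit;
} else {
rb->error_handler(rb->error_handler_data);
return 0;
}
}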
int aom_rb_read_literal(struct aom_read_bit_buffer *rb, int bits) {
int value = 0, bit;
for (bit = bits - 1; bit >= 0; bit--) value |= aom_rb_read_bit(rb) << bit;
return value;
}
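// Reads a bits-wide magnitude followed by a sign bit (1 means negative).
int aom_rb_read_signed_literal(struct aom_read_bit_buffer *rb, int bits) {
const int value = aom_rb_read_literal(rb, bits);
return aom_rb_read_bit(rb) ? -value : value;
}
// Reads a (bits + 1)-bit two's complement value. The raw field is shifted to
// the top of an unsigned word, then shifted back down as an int to
// sign-extend it; this relies on arithmetic right shift of negative values.
int aom_rb_read_inv_signed_literal(struct aom_read_bit_buffer *rb, int bits) {
const int nbits = sizeof(unsigned) * 8 - bits - 1;
const unsigned value = (unsigned)aom_rb_read_literal(rb, bits + 1) << nbits;
return ((int)value) >> nbits;
}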
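
Note on the literal readers above: a minimal sketch of the MSB-first read and of the (bits + 1)-bit two's complement decode. The mini_rb type and mini_* functions are hypothetical stand-ins, the error handler is replaced by a plain "return 0" past the end, and the sign extension uses the xor-and-subtract form instead of the shift pair in the source. That is a swapped-in formulation, chosen because it does not depend on how the compiler right-shifts negative values; it should produce the same result on the usual two's complement targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced reader: no error handler, for illustration only. */
typedef struct {
  const uint8_t *buf;
  const uint8_t *end;
  size_t bit_offset;
} mini_rb;

/* MSB-first bit read; returns 0 once the buffer is exhausted. */
static int mini_read_bit(mini_rb *rb) {
  const size_t p = rb->bit_offset >> 3;
  const int q = 7 - (int)(rb->bit_offset & 7);
  if (rb->buf + p >= rb->end) return 0;
  rb->bit_offset++;
  return (rb->buf[p] >> q) & 1;
}

/* Assembles `bits` bits into an integer, first bit read is the highest. */
static int mini_read_literal(mini_rb *rb, int bits) {
  int value = 0, bit;
  for (bit = bits - 1; bit >= 0; bit--) value |= mini_read_bit(rb) << bit;
  return value;
}

/* Reads bits + 1 raw bits and sign-extends them from bit position `bits`. */
static int mini_read_inv_signed(mini_rb *rb, int bits) {
  const int raw = mini_read_literal(rb, bits + 1);
  const int sign = 1 << bits;
  return (raw ^ sign) - sign;
}

/* Helpers that decode from a single byte. */
static int demo_literal3(uint8_t byte) {
  mini_rb rb = { &byte, &byte + 1, 0 };
  return mini_read_literal(&rb, 3);
}

static int demo_inv_signed3(uint8_t byte) {
  mini_rb rb = { &byte, &byte + 1, 0 };
  return mini_read_inv_signed(&rb, 3);
}
```

With the byte 0xB4 (binary 1011 0100), the first three bits 101 read as the literal 5, and the first four bits 1011 read as the signed value -5.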


@@ -0,0 +1,48 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_BITREADER_BUFFER_H_
#define AOM_DSP_BITREADER_BUFFER_H_
#include <limits.h>
#include "aom/aom_integer.h"
#ifdef __cplusplus
extern "C" {
#endif
typedef void (*aom_rb_error_handler)(void *data);
struct aom_read_bit_buffer {
const uint8_t *bit_buffer;
const uint8_t *bit_buffer_end;
size_t bit_offset;
void *error_handler_data;
aom_rb_error_handler error_handler;
};
size_t aom_rb_bytes_read(struct aom_read_bit_buffer *rb);
int aom_rb_read_bit(struct aom_read_bit_buffer *rb);
int aom_rb_read_literal(struct aom_read_bit_buffer *rb, int bits);
int aom_rb_read_signed_literal(struct aom_read_bit_buffer *rb, int bits);
int aom_rb_read_inv_signed_literal(struct aom_read_bit_buffer *rb, int bits);
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_BITREADER_BUFFER_H_
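
Note on the error_handler callback: a minimal sketch of how a caller might wire it up, assuming a handler that simply counts overruns. demo_rb, count_error and demo_read_bit are hypothetical names that mirror the declarations above; real callers may prefer a handler that aborts decoding instead of returning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*demo_error_handler)(void *data);

/* Mirrors the field layout of struct aom_read_bit_buffer. */
typedef struct {
  const uint8_t *bit_buffer;
  const uint8_t *bit_buffer_end;
  size_t bit_offset;
  void *error_handler_data;
  demo_error_handler error_handler;
} demo_rb;

/* Assumed handler: increments the int counter passed as user data. */
static void count_error(void *data) { ++*(int *)data; }

/* Same control flow as aom_rb_read_bit(): read in range, else report. */
static int demo_read_bit(demo_rb *rb) {
  const size_t off = rb->bit_offset;
  const size_t p = off >> 3;
  const int q = 7 - (int)(off & 0x7);
  if (rb->bit_buffer + p < rb->bit_buffer_end) {
    rb->bit_offset = off + 1;
    return (rb->bit_buffer[p] >> q) & 1;
  }
  rb->error_handler(rb->error_handler_data);
  return 0;
}

/* Attempts `reads` bit reads from a one-byte buffer; returns error count. */
static int demo_overrun_count(int reads) {
  static const uint8_t data[1] = { 0xFF };
  int errors = 0;
  int i;
  demo_rb rb = { data, data + 1, 0, &errors, count_error };
  for (i = 0; i < reads; i++) (void)demo_read_bit(&rb);
  return errors;
}
```

Because bit_offset is not advanced on failure, every read past the end reports again, so ten reads from an eight-bit buffer yield two handler calls.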

aom_dsp/bitwriter.h Normal file

@@ -0,0 +1,179 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_BITWRITER_H_
#define AOM_DSP_BITWRITER_H_
#include <assert.h>
#include "./aom_config.h"
#if CONFIG_EC_ADAPT && !CONFIG_EC_MULTISYMBOL
#error "CONFIG_EC_ADAPT is enabled without enabling CONFIG_EC_MULTISYMBOL"
#endif
#if CONFIG_ANS
#include "aom_dsp/buf_ans.h"
#elif CONFIG_DAALA_EC
#include "aom_dsp/daalaboolwriter.h"
#else
#include "aom_dsp/dkboolwriter.h"
#endif
#include "aom_dsp/prob.h"
#if CONFIG_RD_DEBUG
#include "av1/encoder/cost.h"
#endif
#ifdef __cplusplus
extern "C" {
#endif
#if CONFIG_ANS
typedef struct BufAnsCoder aom_writer;
#elif CONFIG_DAALA_EC
typedef struct daala_writer aom_writer;
#else
typedef struct aom_dk_writer aom_writer;
#endif
typedef struct TOKEN_STATS { int64_t cost; } TOKEN_STATS;
static INLINE void aom_start_encode(aom_writer *bc, uint8_t *buffer) {
#if CONFIG_ANS
(void)bc;
(void)buffer;
assert(0 && "buf_ans requires a more complicated startup procedure");
#elif CONFIG_DAALA_EC
aom_daala_start_encode(bc, buffer);
#else
aom_dk_start_encode(bc, buffer);
#endif
}
static INLINE void aom_stop_encode(aom_writer *bc) {
#if CONFIG_ANS
(void)bc;
assert(0 && "buf_ans requires a more complicated shutdown procedure");
#elif CONFIG_DAALA_EC
aom_daala_stop_encode(bc);
#else
aom_dk_stop_encode(bc);
#endif
}
static INLINE void aom_write(aom_writer *br, int bit, int probability) {
#if CONFIG_ANS
buf_uabs_write(br, bit, probability);
#elif CONFIG_DAALA_EC
aom_daala_write(br, bit, probability);
#else
aom_dk_write(br, bit, probability);
#endif
}
static INLINE void aom_write_record(aom_writer *br, int bit, int probability,
TOKEN_STATS *token_stats) {
aom_write(br, bit, probability);
#if CONFIG_RD_DEBUG
token_stats->cost += av1_cost_bit(probability, bit);
#else
(void)token_stats;
#endif
}
static INLINE void aom_write_bit(aom_writer *w, int bit) {
aom_write(w, bit, 128); // aom_prob_half
}
static INLINE void aom_write_bit_record(aom_writer *w, int bit,
TOKEN_STATS *token_stats) {
aom_write_record(w, bit, 128, token_stats); // aom_prob_half
}
static INLINE void aom_write_literal(aom_writer *w, int data, int bits) {
int bit;
for (bit = bits - 1; bit >= 0; bit--) aom_write_bit(w, 1 & (data >> bit));
}
static INLINE void aom_write_tree_bits(aom_writer *w, const aom_tree_index *tr,
const aom_prob *probs, int bits, int len,
aom_tree_index i) {
do {
const int bit = (bits >> --len) & 1;
aom_write(w, bit, probs[i >> 1]);
i = tr[i + bit];
} while (len);
}
static INLINE void aom_write_tree_bits_record(aom_writer *w,
const aom_tree_index *tr,
const aom_prob *probs, int bits,
int len, aom_tree_index i,
TOKEN_STATS *token_stats) {
do {
const int bit = (bits >> --len) & 1;
aom_write_record(w, bit, probs[i >> 1], token_stats);
i = tr[i + bit];
} while (len);
}
static INLINE void aom_write_tree(aom_writer *w, const aom_tree_index *tree,
const aom_prob *probs, int bits, int len,
aom_tree_index i) {
#if CONFIG_DAALA_EC
daala_write_tree_bits(w, tree, probs, bits, len, i);
#else
aom_write_tree_bits(w, tree, probs, bits, len, i);
#endif
}
static INLINE void aom_write_tree_record(aom_writer *w,
const aom_tree_index *tree,
const aom_prob *probs, int bits,
int len, aom_tree_index i,
TOKEN_STATS *token_stats) {
#if CONFIG_DAALA_EC
(void)token_stats;
daala_write_tree_bits(w, tree, probs, bits, len, i);
#else
aom_write_tree_bits_record(w, tree, probs, bits, len, i, token_stats);
#endif
}
#if CONFIG_EC_MULTISYMBOL
static INLINE void aom_write_symbol(aom_writer *w, int symb, aom_cdf_prob *cdf,
int nsymbs) {
#if CONFIG_RANS
struct rans_sym s;
(void)nsymbs;
assert(cdf);
s.cum_prob = symb > 0 ? cdf[symb - 1] : 0;
s.prob = cdf[symb] - s.cum_prob;
buf_rans_write(w, &s);
#elif CONFIG_DAALA_EC
daala_write_symbol(w, symb, cdf, nsymbs);
#else
#error \
"CONFIG_EC_MULTISYMBOL is selected without a valid backing entropy " \
"coder. Enable daala_ec or ans for a valid configuration."
#endif
#if CONFIG_EC_ADAPT
update_cdf(cdf, symb, nsymbs);
#endif
}
#endif // CONFIG_EC_MULTISYMBOL
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_BITWRITER_H_
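
Note on the CONFIG_RANS branch of aom_write_symbol(): the cdf array holds cumulative frequencies, so a symbol's own frequency is the difference between its entry and the previous one. A minimal sketch of that derivation follows. cdf_prob and demo_sym are hypothetical stand-ins for aom_cdf_prob and struct rans_sym, and the example CDF values are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t cdf_prob; /* assumption: stand-in for aom_cdf_prob */

/* Hypothetical struct carrying the two fields the source fills in. */
typedef struct {
  cdf_prob prob;     /* frequency of this symbol */
  cdf_prob cum_prob; /* total frequency of all lower symbols */
} demo_sym;

/* Same arithmetic as the CONFIG_RANS branch of aom_write_symbol(). */
static demo_sym sym_from_cdf(const cdf_prob *cdf, int symb) {
  demo_sym s;
  s.cum_prob = symb > 0 ? cdf[symb - 1] : 0;
  s.prob = (cdf_prob)(cdf[symb] - s.cum_prob);
  return s;
}

/* Illustrative three-symbol CDF with frequencies 8192, 16384, 8192. */
static const cdf_prob demo_cdf[3] = { 8192, 24576, 32768 };
```

For the middle symbol this gives cum_prob 8192 and prob 16384; the per-symbol frequencies sum to the final CDF entry.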


@@ -0,0 +1,43 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <limits.h>
#include <stdlib.h>
#include "./aom_config.h"
#include "./bitwriter_buffer.h"
size_t aom_wb_bytes_written(const struct aom_write_bit_buffer *wb) {
return wb->bit_offset / CHAR_BIT + (wb->bit_offset % CHAR_BIT > 0);
}
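// Writes one bit, most significant bit first within each byte. The first bit
// written to a byte overwrites the whole byte, clearing any stale contents.
void aom_wb_write_bit(struct aom_write_bit_buffer *wb, int bit) {
const int off = (int)wb->bit_offset;
const int p = off / CHAR_BIT; // byte index
const int q = CHAR_BIT - 1 - off % CHAR_BIT; // shift that selects the bit
if (q == CHAR_BIT - 1) {
wb->bit_buffer[p] = bit << q;
} else {
wb->bit_buffer[p] &= ~(1 << q);
wb->bit_buffer[p] |= bit << q;
}
wb->bit_offset = off + 1;
}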
void aom_wb_write_literal(struct aom_write_bit_buffer *wb, int data, int bits) {
int bit;
for (bit = bits - 1; bit >= 0; bit--) aom_wb_write_bit(wb, (data >> bit) & 1);
}
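// Writes data as a (bits + 1)-bit two's complement field; counterpart of
// aom_rb_read_inv_signed_literal().
void aom_wb_write_inv_signed_literal(struct aom_write_bit_buffer *wb, int data,
int bits) {
aom_wb_write_literal(wb, data, bits + 1);
}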
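
Note on the write path above: a minimal sketch of MSB-first packing, under the assumption of a reduced mini_wb type (hypothetical, same two fields as struct aom_write_bit_buffer). It differs from the source in one detail, stated plainly: the literal writer casts to unsigned before shifting, so negative inputs do not rely on the compiler's right-shift behaviour for signed values.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced writer with the same two fields as the source. */
typedef struct {
  uint8_t *buf;
  size_t bit_offset;
} mini_wb;

/* MSB-first bit write; the first bit of a byte overwrites the byte. */
static void mini_write_bit(mini_wb *wb, int bit) {
  const size_t p = wb->bit_offset / CHAR_BIT;
  const int q = CHAR_BIT - 1 - (int)(wb->bit_offset % CHAR_BIT);
  if (q == CHAR_BIT - 1) {
    wb->buf[p] = (uint8_t)(bit << q);
  } else {
    wb->buf[p] &= (uint8_t)~(1 << q);
    wb->buf[p] |= (uint8_t)(bit << q);
  }
  wb->bit_offset++;
}

/* Writes the low `bits` bits of data, highest bit first. */
static void mini_write_literal(mini_wb *wb, int data, int bits) {
  int bit;
  for (bit = bits - 1; bit >= 0; bit--)
    mini_write_bit(wb, (int)(((unsigned)data >> bit) & 1u));
}

/* Number of bytes touched so far, rounding a partial byte up. */
static size_t mini_bytes_written(const mini_wb *wb) {
  return wb->bit_offset / CHAR_BIT + (wb->bit_offset % CHAR_BIT > 0);
}

/* Packs the 3-bit literal 5 and then -5 as a 4-bit two's complement field
 * into a buffer pre-filled with 0xFF; returns the first byte. */
static uint8_t demo_pack(size_t *bytes_out) {
  uint8_t storage[2] = { 0xFF, 0xFF };
  mini_wb wb = { storage, 0 };
  mini_write_literal(&wb, 5, 3);
  mini_write_literal(&wb, -5, 4);
  *bytes_out = mini_bytes_written(&wb);
  return storage[0];
}
```

The seven written bits are 101 followed by 1011, and the unwritten low bit ends up 0 because the first write to the byte cleared it, so the byte is 0xB6 even though the buffer started as 0xFF.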


@@ -0,0 +1,39 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_BITWRITER_BUFFER_H_
#define AOM_DSP_BITWRITER_BUFFER_H_
#include "aom/aom_integer.h"
#ifdef __cplusplus
extern "C" {
#endif
struct aom_write_bit_buffer {
uint8_t *bit_buffer;
size_t bit_offset;
};
size_t aom_wb_bytes_written(const struct aom_write_bit_buffer *wb);
void aom_wb_write_bit(struct aom_write_bit_buffer *wb, int bit);
void aom_wb_write_literal(struct aom_write_bit_buffer *wb, int data, int bits);
void aom_wb_write_inv_signed_literal(struct aom_write_bit_buffer *wb, int data,
int bits);
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_BITWRITER_BUFFER_H_

Some files were not shown because too many files have changed in this diff