Compare commits

2637 Commits

Author SHA1 Message Date
Jingning Han
6515afc6b9 Merge "Add min_tx_size variable to recursive transform block partition system" into nextgenv2 2016-11-08 19:14:33 +00:00
Yaowu Xu
f6e958b604 Merge "Fix the bug that PVQ commit broke dering" into nextgenv2 2016-11-08 18:00:53 +00:00
Angie Chiang
13ea019574 Merge changes Ib9428dc9,Ide04717a,If1dba7d8,I6da97880 into nextgenv2
* changes:
  Merge rd_stats only when it is valid
  Let parentheses in handle_inter_mode be symmetric
  Add RD_STATS into MB_MODE_INFO
  Add txb_coeff_cost_map
2016-11-08 17:42:04 +00:00
Jingning Han
e67b38aa7c Add min_tx_size variable to recursive transform block partition system
Replace max_tx_size with min_tx_size for transform type decision.

Change-Id: I64e39923a67903d52b381bd93eaac33b3400a201
2016-11-08 09:36:54 -08:00
Yushin Cho
48f84dbd1c Fix the bug that PVQ commit broke dering
Since PVQ's max block size equals the max transform size,
daala's definition of OD_BSIZE_MAX was changed from 5 down to 4 to
use AV1's max transform size 32x32. However, dering also uses
OD_BSIZE_MAX and assumes its value is 5, which caused dering
to stop working.

Change-Id: I9d82bb24adc7d57552a8e0a8a7e798e77d96fd4b
2016-11-08 08:15:57 -08:00
Nathan E. Egge
f0481a590f Use --enable-daala_ec by default.
Change-Id: I9e2a8db4e59cb9c109e978e473749ebc4e910148
2016-11-07 21:11:31 -08:00
Brennan Shacklett
e0b5ae8c4e Remove multiple coefficient buffers from PICK_MODE_CONTEXT
This commit is a manual cherry-pick from aom/master:
45592a39d3b00aee4d6bd70da669400017b7a5d8

Only part of the changes apply in nextgenv2

Change-Id: I1e22514c6fe5af556710254278f2f8a5805db999
2016-11-08 03:50:10 +00:00
Tom Finegan
973d4d56fa cmake: Add partial configure.
- Add minimal compiler flag testing.
- Generate aom_config.c and aom_config.h. Note: hard coded
  to generic-gnu values for now.
- Still a work in progress. This will not build anything.

BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=76

Change-Id: Id65b42ea9f4c4f744d788660e2de7234886ce039
2016-11-08 01:51:12 +00:00
Tom Finegan
9b3974ef34 aom_ports: Fix build in Xcode 8.
Use void casts and avoid unused/unnamed parameter warnings.

Change-Id: Id02ec2c613cb1423f693bcc56832ccd9b41d05bd
2016-11-08 01:50:54 +00:00
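The void-cast idiom named in this commit can be sketched as follows. This is a minimal illustration only: the function name `timer_stub_elapsed` and its parameter are hypothetical and are not taken from aom_ports.

```c
#include <stdint.h>

/* Hypothetical stub (not the real aom_ports code): the parameter is
 * intentionally unused, for example when no OS timer support is built in. */
static int64_t timer_stub_elapsed(const void *timer) {
  (void)timer; /* void cast: marks the parameter as used, so no warning */
  return 0;
}
```

The cast has no runtime effect; it only tells the compiler that ignoring the argument is deliberate.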
Yaowu Xu
fc1b213af1 Use block_size for max_scan_line in pvq decoding
Change-Id: I642bc205a7d2c4d472385fbeb4323e62e17984b4
2016-11-08 00:55:23 +00:00
Yaowu Xu
856c55e93d Add transform parameter initialization
The initialization of transform parameters was missing, which led to a
crash in the encoder.

Change-Id: I9e35830d5f24e771c845f0d8881671d6b7228c5e
2016-11-08 00:36:30 +00:00
Tristan Matthews
cc37e36683 pvq: drop unused declaration
Change-Id: I5d95bb897d335dc17aa0ae5e873ba7dee46c6fda
2016-11-07 22:14:50 +00:00
Yaowu Xu
f591782085 Fix compiler warning of out-of-bound array access
Change-Id: I00f147cd372cedc5038708b0f23f6fae68918528
2016-11-07 22:14:24 +00:00
Yaowu Xu
dc9720433f Merge "Fix compiler warning of un-used variables" into nextgenv2 2016-11-07 22:03:12 +00:00
Yaowu Xu
4007fa6852 Merge "change to call fwd_txfm()" into nextgenv2 2016-11-07 22:03:01 +00:00
Yaowu Xu
c4c21734d6 Merge "Resolve merge issues with --enable-pvq" into nextgenv2 2016-11-07 20:31:33 +00:00
Yaowu Xu
02d4c3b780 Fix compiler warning of un-used variables
Change-Id: I17d05bbf75a201fd010fc17e2d9bd0db8ef36d41
2016-11-07 19:56:13 +00:00
Yaowu Xu
3442b4b159 change to call fwd_txfm()
The transform functions have been refactored in nextgenv2, this commit
resolves the calls in pvq patch to use this new scheme.

Change-Id: I1b56e75106a3357bb19bd7df2b4ba305eb9ed185
2016-11-07 10:40:41 -08:00
Yaowu Xu
d6ea71cf73 Resolve merge issues with --enable-pvq
This commit resolves some compiling issues due to merge.

Change-Id: I0eef8aa36c404e185e0b0004948a49307c360d3e
2016-11-07 10:35:55 -08:00
Debargha Mukherjee
03dc29bdf3 Merge "Fix bug in bicubic filter in warped_motion.c" into nextgenv2 2016-11-07 17:58:47 +00:00
Yaowu Xu
00a0e010f7 Merge "New experiment: Perceptual Vector Quantization from Daala" into nextgenv2 2016-11-07 16:00:32 +00:00
David Barker
f23bdca6a8 Fix bug in bicubic filter in warped_motion.c
Previously, do_cubic_filter would return results with the
wrong precision if the sample point was exactly aligned to
a pixel.

Change-Id: I40139f9a6701a8e72e691f37bb352f7814a7f306
2016-11-07 13:47:13 +00:00
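The kind of precision mismatch described in this commit can be sketched with a small fixed-point cubic interpolator. This is not the actual do_cubic_filter from warped_motion.c: the Catmull-Rom weights, the 1/8-sample step, the name `cubic_fixed`, and `CUBIC_PREC_BITS` are all assumptions chosen for illustration.

```c
#include <stdint.h>

#define CUBIC_PREC_BITS 10 /* hypothetical: results are scaled by 1 << 10 */

/* Catmull-Rom interpolation between p[1] and p[2] at offset k/8 (k in 0..7).
 * The result is fixed point, scaled by 1 << CUBIC_PREC_BITS. */
static int32_t cubic_fixed(const int32_t p[4], int k) {
  if (k == 0) {
    /* Pixel-aligned sample: the shortcut must still return the value at the
     * full output precision. Returning plain p[1] here would give a result
     * at the wrong scale compared with the general path below. */
    return p[1] * (1 << CUBIC_PREC_BITS);
  }
  const int32_t k2 = k * k;
  const int32_t k3 = k2 * k;
  /* Integer Catmull-Rom weights for t = k/8; they sum to 1024. */
  const int32_t w0 = -k3 + 16 * k2 - 64 * k;
  const int32_t w1 = 3 * k3 - 40 * k2 + 1024;
  const int32_t w2 = -3 * k3 + 32 * k2 + 64 * k;
  const int32_t w3 = k3 - 8 * k2;
  return w0 * p[0] + w1 * p[1] + w2 * p[2] + w3 * p[3];
}
```

With this layout a caller can apply the same rounding shift to every result, whether or not the sample point fell exactly on a pixel.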
Yushin Cho
77bba8d30a New experiment: Perceptual Vector Quantization from Daala
PVQ replaces the scalar quantizer and coefficient coding with a new
design originally developed in Daala. It currently depends on the
Daala entropy coder although it could be adapted to work with another
entropy coder if needed:
./configure --enable-experimental --enable-daala_ec --enable-pvq

The version of PVQ in this commit is adapted from the following
revision of Daala:
fb51c1ade6

More information about PVQ:
- https://people.xiph.org/~jm/daala/pvq_demo/
- https://jmvalin.ca/papers/spie_pvq.pdf

The following files are copied as-is from Daala with minimal
adaptations, therefore we disable clang-format on those files
to make it easier to synchronize the AV1 and Daala codebases in the future:
 av1/common/generic_code.c
 av1/common/generic_code.h
 av1/common/laplace_tables.c
 av1/common/partition.c
 av1/common/partition.h
 av1/common/pvq.c
 av1/common/pvq.h
 av1/common/state.c
 av1/common/state.h
 av1/common/zigzag.h
 av1/common/zigzag16.c
 av1/common/zigzag32.c
 av1/common/zigzag4.c
 av1/common/zigzag64.c
 av1/common/zigzag8.c
 av1/decoder/decint.h
 av1/decoder/generic_decoder.c
 av1/decoder/laplace_decoder.c
 av1/decoder/pvq_decoder.c
 av1/decoder/pvq_decoder.h
 av1/encoder/daala_compat_enc.c
 av1/encoder/encint.h
 av1/encoder/generic_encoder.c
 av1/encoder/laplace_encoder.c
 av1/encoder/pvq_encoder.c
 av1/encoder/pvq_encoder.h

Known issues:
- Lossless mode is not supported, '--lossless=1' will give the same result as
'--end-usage=q --cq-level=1'.
- High bit depth is not supported by PVQ.

Change-Id: I1ae0d6517b87f4c1ccea944b2e12dc906979f25e
2016-11-06 22:18:01 -08:00
Angie Chiang
616990d607 Merge rd_stats only when it is valid
Change-Id: Ib9428dc9b6e224fdb5d410368c5b92042c96f68a
2016-11-06 15:25:37 -08:00
Angie Chiang
78a3bc165c Let parentheses in handle_inter_mode be symmetric
Change-Id: Ide04717a8ce2a7c1245f9614485647e296e96abd
2016-11-06 13:01:16 -08:00
Angie Chiang
9a44f5fbc8 Add RD_STATS into MB_MODE_INFO
With RD_STATS in MB_MODE_INFO, we will be able to compare the results
from rate-distortion loop and the results from bitstream packing.

Change-Id: If1dba7d87126577a6f369ac087d4517f7cebb0c5
2016-11-06 12:21:34 -08:00
Angie Chiang
85279f6668 Add txb_coeff_cost_map
The txb_coeff_cost_map is a 16x16 map which records each single
transform block's cost from the transform block's location in 4-pixel
unit in recursive transform experiment.

Change-Id: I6da97880c457680594bca56617084010891beaa2
2016-11-06 11:55:17 -08:00
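The indexing described in this commit (a 16x16 map addressed in 4-pixel units) can be sketched as below. The container type and function names are hypothetical; only the map dimensions and the 4-pixel unit come from the commit message.

```c
#include <string.h>

#define TXB_COEFF_COST_MAP_SIZE 16 /* 16x16 entries, as stated in the commit */

/* Hypothetical container for the map. */
typedef struct {
  int txb_coeff_cost_map[TXB_COEFF_COST_MAP_SIZE][TXB_COEFF_COST_MAP_SIZE];
} CostMapSketch;

static void cost_map_clear(CostMapSketch *m) { memset(m, 0, sizeof(*m)); }

/* Record the cost of the transform block whose top-left corner is at
 * (pixel_row, pixel_col) inside the coding block.
 * Returns 0 on success, -1 if the position falls outside the map. */
static int cost_map_record(CostMapSketch *m, int pixel_row, int pixel_col,
                           int cost) {
  if (pixel_row < 0 || pixel_col < 0) return -1;
  const int r = pixel_row >> 2; /* convert pixels to 4-pixel units */
  const int c = pixel_col >> 2;
  if (r >= TXB_COEFF_COST_MAP_SIZE || c >= TXB_COEFF_COST_MAP_SIZE) return -1;
  m->txb_coeff_cost_map[r][c] = cost;
  return 0;
}
```

A 16x16 map in 4-pixel units covers a 64x64 pixel area, so a transform block at pixel position (8, 12) lands in entry [2][3].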
Debargha Mukherjee
92447f34df Merge "Increase gm precision from 16 to 32 bit ints" into nextgenv2 2016-11-06 10:08:02 +00:00
Debargha Mukherjee
5f305854e6 Increase gm precision from 16 to 32 bit ints
Change-Id: I7117a6c14dc8438e4225b50bd2d3ebbaa7f850cc
2016-11-05 16:50:08 -07:00
Tristan Matthews
ec994d8bbd accounting_test: fix read of uninitialized data
Only read bits that were actually written.

Change-Id: I6d901123c319a1d92c54f511d3caa56daf882281
2016-11-05 10:49:16 -07:00
Tristan Matthews
4891ef9ae0 boolcoder_test: fix read of uninitialized data
Only read bits that were actually written.

Change-Id: Id62c52b7804cbfb401e6e7388201406bc899ea5d
2016-11-05 10:48:55 -07:00
Tom Finegan
591fc6f1aa aom_ports: Silence warnings in aom_timer.h
When CONFIG_OS_SUPPORT is not enabled the aom_timer timer function
stubs cause unused parameter warnings. This comments out the arg
names and silences the warning.

Change-Id: I97bdbcbebdf081ac5cb2ffd86439028a1e672fa2
2016-11-05 10:48:36 -07:00
Yaowu Xu
deaff66955 Merge "Fix the bool coder test" into nextgenv2 2016-11-05 17:43:54 +00:00
James Zern
99ff89b6fb Merge "rdopt: clear maybe-uninitialized variable warning" into nextgenv2 2016-11-05 03:19:33 +00:00
Sarah Parker
70c4fab569 rdopt: clear maybe-uninitialized variable warning
av1/encoder/rdopt.c:9533 ‘zeromv[1].as_int’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
this was spurious given the logic in the if

Change-Id: I8ddfe7e46d1bf5593cc8624f05c9f181243a87d4
2016-11-04 17:56:23 -07:00
Yushin Cho
3e4fcb4ff4 Fix the bool coder test
Fix the bool coder test not to use a probability of 100%.

Change-Id: I799871cb0c48580edf0ee15a6c9931d27591ec99
(cherry picked from commit 9b79f6a3d6ea398e5d51d3d1dd69cbfb1725370e)
2016-11-04 16:07:00 -07:00
Jingning Han
713b56121a Merge "Clean up write_tx_type()" into nextgenv2 2016-11-04 23:03:53 +00:00
Jingning Han
0880b5466f Merge "Refactor tx_type reader" into nextgenv2 2016-11-04 23:03:47 +00:00
Jingning Han
05abee1530 Merge "Factor out common tx_type writing codes from inter/intra frame" into nextgenv2 2016-11-04 23:03:38 +00:00
Angie Chiang
7a77169a35 Merge changes Ia37f170d,Ie3082db5 into nextgenv2
* changes:
  Record YUV planes' txfm block coeff cost in handle_inter_mode()
  Separate coefficient cost of U/V planes in write_modes_b()
2016-11-04 22:58:58 +00:00
Angie Chiang
59526ead45 Merge changes I3bc782d6,I8359e849,Iae50d0b0,Id1704d88,Ia69f13c4, ... into nextgenv2
* changes:
  Add av1_ prefix on ###_rd_stats functions
  Use init_rd_stats() in encodeframe.c
  Add transform block coefficient cost in RD_STATS for debugging
  Add helper functions to modify RD_STATS
  Add mi_row and mi_col into mbmi to facilitate rd_debug process
  Add token cost comparison in write_modes_b()
2016-11-04 22:43:30 +00:00
James Zern
653fdd6d55 Merge changes I139808f4,I3d97d8db into nextgenv2
* changes:
  warped_motion.c: delete unused filter_4tap[]
  warped_motion.c: quiet float-conversion warnings
2016-11-04 22:34:11 +00:00
Angie Chiang
628d7c915b Record YUV planes' txfm block coeff cost in handle_inter_mode()
Change-Id: Ia37f170d8fd961d78a751d84b9525ab7e973b81a
2016-11-04 11:12:44 -07:00
Angie Chiang
c0feea8a0c Add av1_ prefix on ###_rd_stats functions
Change-Id: I3bc782d68bcd9b52b38210eec9eecb21146fde75
2016-11-04 11:12:44 -07:00
Angie Chiang
75f45814ba Separate coefficient cost of U/V planes in write_modes_b()
Change-Id: Ie3082db5b0fead8c322b2aeede4eff7cd723ea12
2016-11-04 11:12:44 -07:00
Angie Chiang
4695b97030 Use init_rd_stats() in encodeframe.c
Change-Id: I8359e8498efd301ff81eea1d7466d0f3fec5e006
2016-11-04 11:11:27 -07:00
Angie Chiang
d81fdb41e6 Add transform block coefficient cost in RD_STATS for debugging
Change-Id: Iae50d0b0c4f8f383ab4f91d2c1c2fa4e799c7250
2016-11-04 11:11:27 -07:00
Angie Chiang
d7246171b5 Add helper functions to modify RD_STATS
Those functions include
init_rd_stats()
invalid_rd_stats()
merge_rd_stats()

This CL helps simplify the code.

Change-Id: Id1704d883bd21a039b0478a940994ca14184ae1c
2016-11-04 11:11:27 -07:00
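A minimal sketch of what the three helpers might look like is given below. The struct layout (rate, dist, sse, skip) is an assumption based on the field names mentioned elsewhere in this log, the sentinel values and the `rd_stats_is_valid` helper are hypothetical, and the real implementation may differ. The merge step also reflects the later commit "Merge rd_stats only when it is valid".

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical layout; the real RD_STATS may hold more fields. */
typedef struct {
  int rate;
  int64_t dist;
  int64_t sse;
  int skip;
} RdStatsSketch;

/* Start from an empty, valid accumulator. */
static void init_rd_stats(RdStatsSketch *s) {
  s->rate = 0;
  s->dist = 0;
  s->sse = 0;
  s->skip = 1;
}

/* Mark the stats as unusable with sentinel values (assumed convention). */
static void invalid_rd_stats(RdStatsSketch *s) {
  s->rate = INT_MAX;
  s->dist = INT64_MAX;
  s->sse = INT64_MAX;
  s->skip = 0;
}

/* Hypothetical helper: stats are valid unless rate holds the sentinel. */
static int rd_stats_is_valid(const RdStatsSketch *s) {
  return s->rate != INT_MAX;
}

/* Accumulate src into dst; an invalid operand makes the result invalid. */
static void merge_rd_stats(RdStatsSketch *dst, const RdStatsSketch *src) {
  if (!rd_stats_is_valid(dst) || !rd_stats_is_valid(src)) {
    invalid_rd_stats(dst);
    return;
  }
  dst->rate += src->rate;
  dst->dist += src->dist;
  dst->sse += src->sse;
  dst->skip &= src->skip;
}
```

Checking validity before adding avoids integer overflow on the sentinel values, which is one plausible reason to merge only valid stats.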
Angie Chiang
394c337754 Add mi_row and mi_col into mbmi to facilitate rd_debug process
Change-Id: Ia69f13c47f2dd34fabd220652691049166a06a68
2016-11-04 11:09:24 -07:00
Angie Chiang
d402282f69 Add token cost comparison in write_modes_b()
This is just a partial implementation.
Compare token cost of pack_mb_tokens/pack_txb_tokens with token cost
from rate-distortion loop. If there is any difference, dump out mode
info.

Change-Id: I46b373ee2522c5047f799f36baf7cec5fbc06f06
2016-11-04 11:09:24 -07:00
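The comparison described in this commit can be sketched as a small check that reports a mismatch between the two token costs. The struct, its fields, and the function name are hypothetical; only the idea of comparing the packing cost with the rate-distortion cost and dumping mode info on a difference comes from the commit message.

```c
#include <stdio.h>

/* Hypothetical subset of per-block mode info used for debugging. */
typedef struct {
  int mi_row;
  int mi_col;
  int rd_token_cost; /* token cost recorded during the rate-distortion loop */
} ModeInfoSketch;

/* Compare the cost seen at bitstream packing time with the cost recorded in
 * the rate-distortion loop. Returns 1 and dumps the block position on a
 * mismatch, 0 otherwise. */
static int check_token_cost(const ModeInfoSketch *mi, int packed_token_cost,
                            FILE *out) {
  if (mi->rd_token_cost == packed_token_cost) return 0;
  fprintf(out, "token cost mismatch at mi_row %d mi_col %d: rd %d, pack %d\n",
          mi->mi_row, mi->mi_col, mi->rd_token_cost, packed_token_cost);
  return 1;
}
```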
Jingning Han
4be3214fec Merge "Properly schedule the transform block recursion order" into nextgenv2 2016-11-04 17:53:53 +00:00
Jingning Han
641b1ad5ad Clean up write_tx_type()
Remove repeated mbmi->tx_size calls.

Change-Id: I3e4e03b69b2efffd860cc1ea34e150f4257bf081
2016-11-04 10:36:20 -07:00
Jingning Han
ab7163db08 Refactor tx_type reader
Factor out common code. Remove repeated mbmi->tx_size calls.

Change-Id: Id5de35e88f1a5f16223eaa06fc2c9f69124061ef
2016-11-04 10:35:34 -07:00
Jingning Han
2a4da9476b Factor out common tx_type writing codes from inter/intra frame
Change-Id: Id2626bd19db2504756d9a1dee709c2d08c79f771
2016-11-04 10:33:12 -07:00
Yue Chen
95a3898cbd Merge "Remove duplicated variables in EXT_INTER" into nextgenv2 2016-11-04 17:11:08 +00:00
Jingning Han
98d6a1f247 Properly schedule the transform block recursion order
This commit replaces the offset-based block index calculation with an
incremental one. It does not change the coding statistics.

Change-Id: I3789294eb45416bd0823e773ec30f05ed41ba0dc
2016-11-04 09:06:49 -07:00
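The difference between the two indexing schemes can be sketched on a uniform quad-tree split. This is an illustration under stated assumptions, not the codec's actual recursion: the function names are hypothetical, sizes are side lengths in 4x4 units, and both `size` and `leaf` are assumed to be powers of two.

```c
/* Offset-based: each child computes its block index from the parent's
 * index plus a fixed offset. */
static void walk_offset(int size, int leaf, int block, int *out, int *n) {
  if (size <= leaf) {
    out[(*n)++] = block;
    return;
  }
  const int half = size / 2;
  const int step = half * half; /* 4x4 units covered by one child */
  for (int i = 0; i < 4; ++i)
    walk_offset(half, leaf, block + i * step, out, n);
}

/* Incremental: a running index advances by the area of each transform
 * block as it is visited. */
static void walk_incremental(int size, int leaf, int *block, int *out,
                             int *n) {
  if (size <= leaf) {
    out[(*n)++] = *block;
    *block += size * size;
    return;
  }
  for (int i = 0; i < 4; ++i)
    walk_incremental(size / 2, leaf, block, out, n);
}
```

On this uniform split both walks visit the same indices in the same order, which is consistent with the commit's statement that the coding statistics do not change.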
Jingning Han
137b2671eb Fix format issue in handle_inter_mode()
Change-Id: I681fd799cf46991de419cc867ccb649a6990c19d
2016-11-04 08:31:24 -07:00
Debargha Mukherjee
68d695b7ca Merge "Further work on 64x64 fwd/inv transform support" into nextgenv2 2016-11-04 09:32:07 +00:00
Angie Chiang
e89ea0ceb7 Merge "Refactor: Replace rate dist sse skip by RD_STATS in VAR_TX" into nextgenv2 2016-11-04 05:42:59 +00:00
Debargha Mukherjee
21378b8ad0 Merge "Fix bilateral filter asan error for highbitdepth" into nextgenv2 2016-11-04 05:25:49 +00:00
James Zern
5d54c175c2 warped_motion.c: delete unused filter_4tap[]
Change-Id: I139808f492a9e9dcac44a36237b61231ede7edc3
2016-11-03 20:12:20 -07:00
James Zern
4846e446c6 warped_motion.c: quiet float-conversion warnings
Change-Id: I3d97d8db51a5a5d6b2c1cae47492b53ab37100a7
2016-11-03 20:11:06 -07:00
James Zern
005ff81598 Merge "warped_motion: Fix ubsan warning for signed integer overflow" into nextgenv2 2016-11-04 00:58:07 +00:00
James Zern
9371394492 Merge "Fix ubsan divide by zero warning in ransac" into nextgenv2 2016-11-04 00:56:23 +00:00
Sarah Parker
db92635745 warped_motion: Fix ubsan warning for signed integer overflow
Change-Id: Ie698aa02ef56128759c71079e9bfa1af25149644
2016-11-04 00:54:25 +00:00
Angie Chiang
b5dda4887b Refactor: Replace rate dist sse skip by RD_STATS in VAR_TX
This is to facilitate implementation of rd_debug tool; it doesn't change
coding behavior.

Change-Id: I0eb82b31473883ba6652ed11dca09b9ec4530183
2016-11-03 17:51:26 -07:00
Debargha Mukherjee
c57924cb9e Fix bilateral filter asan error for highbitdepth
BUG=webm:1334

Change-Id: I5886eec0a22a8cc056e1bdb493d2faf183816656
2016-11-03 16:23:09 -07:00
James Zern
97a2c675e7 Merge "rdopt,global-motion: Fix -1 indexing ubsan warning" into nextgenv2 2016-11-03 22:59:34 +00:00
Sarah Parker
182953b299 rdopt,global-motion: Fix -1 indexing ubsan warning
Change-Id: I1b3caf3543ab385f39f5f253c9949ad89ea5af7d
2016-11-03 22:58:47 +00:00
Alex Converse
5cb72a2dba Merge "Use TX_SIZES in intra_high_pred_fn declarations" into nextgenv2 2016-11-03 22:13:34 +00:00
Yue Chen
9d3e478e72 Remove duplicated variables in EXT_INTER
Introduced by merge commit 141f7a9

Change-Id: Idd68e09a6cd925d97466eabebe0e4905b5031340
2016-11-03 15:12:42 -07:00
Sarah Parker
b60c138cdf Merge "Make inline function static to fix clang compile error" into nextgenv2 2016-11-03 22:09:46 +00:00
Alex Converse
9613758e71 Merge "Don't use a TX_SIZE as a TX_TYPE" into nextgenv2 2016-11-03 21:44:21 +00:00
Debargha Mukherjee
6a47cff882 Further work on 64x64 fwd/inv transform support
For higher level fwd and inv transform functions.

Change-Id: I91518250a0be7d94aada7519f6c9e7ed024574fb
2016-11-03 14:32:54 -07:00
Sarah Parker
fa75ae0663 Fix ubsan divide by zero warning in ransac
Change-Id: I8c736ff665a27ce8307fd62571b9728333756d7e
2016-11-03 13:03:45 -07:00
Debargha Mukherjee
d65708a375 Merge "Replace hard coded numbers with TX_SIZES macro" into nextgenv2 2016-11-03 19:59:10 +00:00
Jingning Han
a504e77a98 Merge "Fix txb_w/h use case in av1_tx_block_rd_b()" into nextgenv2 2016-11-03 19:31:48 +00:00
Yaowu Xu
565f788de9 Merge "fix build issue with --enable-delta-q" into nextgenv2 2016-11-03 18:45:57 +00:00
Sarah Parker
fb3971e55a Make inline function static to fix clang compile error
Change-Id: I0432b8274a2764ba978dd6c4ed532fb7e4b7b519
2016-11-03 10:50:50 -07:00
Alex Converse
86b56742fb Use TX_SIZES in intra_high_pred_fn declarations
Change-Id: I078bb5244dbff153bcfab226206540ca6cebdad0
2016-11-03 10:28:11 -07:00
Alex Converse
f0ede18718 Don't use a TX_SIZE as a TX_TYPE
Change-Id: I26b02e6578ad2d82aadfe1df2aeb84e6c11a747b
2016-11-03 10:28:05 -07:00
Angie Chiang
2b10128a55 Add rd_debug flag
rd_debug is a debug tool aimed at finding discrepancies between the
rate-distortion loop and bitstream packing.

Change-Id: I751c4121516c5e6368668229c77778880a9dcb9d
2016-11-03 10:25:50 -07:00
Jingning Han
4b47c937d0 Fix txb_w/h use case in av1_tx_block_rd_b()
Match them with block_row/col index.

Change-Id: Idf0f924a093e5312b0a36b765d295e52d033eb5a
2016-11-03 09:20:08 -07:00
Yaowu Xu
5bb8f5b705 fix build issue with --enable-delta-q
BUG=webm:1330

Change-Id: I120ce8ea3581018b232b19ca7ffbb07d3e99d8d0
2016-11-03 09:03:39 -07:00
Debargha Mukherjee
e04fdb2308 Replace hard coded numbers with TX_SIZES macro
Replaces a couple of hard coded numbers with TX_SIZES macro
in common/reconintra.c

Change-Id: I8a2a53ca16bc3ab51409cec340bea55292ff2dee
2016-11-03 08:51:42 -07:00
Jingning Han
1b5bbf8e97 Merge "Refactor recursive transform block partition search" into nextgenv2 2016-11-03 15:41:57 +00:00
Jingning Han
07cfa29031 Merge "Make bit-stream support rectangular tx_size" into nextgenv2 2016-11-03 15:41:46 +00:00
Yaowu Xu
c1ca945ce5 Merge changes from topic 'update_dering' into nextgenv2
* changes:
  Reformatting the deringing code
  Introducing OD_DERING_SIZE_LOG2 constant (3)
  Renaming deringing blockwise write-back functions to make code clearer
  Deringing refactoring: replace last_sbc with simpler dering_left flag
  Getting rid of the od_dering_in type
2016-11-03 14:03:25 +00:00
Yaowu Xu
7036aee1a4 Merge "Refactoring deringed block list code" into nextgenv2 2016-11-03 13:48:58 +00:00
Yaowu Xu
71c72561fa Merge "Deringing line buffer no longer depends on holding OD_DERING_VERY_LARGE" into nextgenv2 2016-11-03 05:02:32 +00:00
Jingning Han
e60d3294ea Merge "Make recursive txfm encoding process support rectangular tx_size" into nextgenv2 2016-11-03 04:36:55 +00:00
Jingning Han
141f7a9757 Merge "Fix a merge bug between dual_filter and sub8x8mc" into nextgenv2 2016-11-03 01:06:39 +00:00
Jingning Han
1e477f9833 Merge "Remove redundant experimental flags from common_data.h" into nextgenv2 2016-11-03 01:04:45 +00:00
Jingning Han
18482fe32d Refactor recursive transform block partition search
This commit refactors the recursive transform block partition
search process to make it support rectangular transform block size
coding.

Change-Id: I0207ae40d83c7eae3cb5d460e403f470747590d3
2016-11-02 17:03:09 -07:00
Jingning Han
f64062f36f Make bit-stream support rectangular tx_size
Allow the transform size writing, reading, and the reconstruction
process to support rectangular transform block size coding.

Change-Id: I57393c73ec60835a088d785ca838d7e3d7eb29a4
2016-11-02 16:24:20 -07:00
Jean-Marc Valin
39d92a071d Reformatting the deringing code
Manually removed the "clang-format off" lines. The rest is done by clang

Change-Id: I88a2028b55a541729b4e8896cdf66b544e9898bb
2016-11-02 15:51:00 -07:00
Jean-Marc Valin
e04650347c Refactoring deringed block list code
Using a struct named dlist rather than an array named bskip. Simplified some
code.

No change in output

Change-Id: Id40d40b19b5d8f2ebafe347590fa1bb8cb80e6e1
2016-11-02 15:51:00 -07:00
Jean-Marc Valin
01b7780154 Deringing line buffer no longer depends on holding OD_DERING_VERY_LARGE
The OD_DERING_VERY_LARGE values are now explicitly copied to the buffer instead
of being read from the line buffer when we're on the edge of the frame. This
will make it possible to make the line buffer 8-bit for non-high-bitdepth.

No change in output

Change-Id: I1a4134d67ac7f8c239f08d73941405c56f01050b
2016-11-02 15:51:00 -07:00
Jean-Marc Valin
e254241ce7 Introducing OD_DERING_SIZE_LOG2 constant (3)
Also cleans up the size of the deringing destination buffer.

No change in output.

Change-Id: I7fc50d862d3906ce809c1031bf0789acdf39cf34
2016-11-02 15:51:00 -07:00
Jean-Marc Valin
58fdec2cbf Renaming deringing blockwise write-back functions to make code clearer
No change in output.

Change-Id: Ifa5df3adce9f24ef6dcd89a5f33a744bfb57194d
2016-11-02 15:51:00 -07:00
Jean-Marc Valin
3544d15130 Deringing refactoring: replace last_sbc with simpler dering_left flag
No change in output.

Change-Id: I1cc2e14b2bb6c343baa7f88348c875085e5863af
2016-11-02 15:51:00 -07:00
Jean-Marc Valin
39ee109333 Getting rid of the od_dering_in type
We no longer need the deringing code to be generic wrt the input depth.

No change in output.

Change-Id: I2db2beb82f1816e611cd2c0438dff217d363de33
2016-11-02 15:51:00 -07:00
Jingning Han
fee498255d Merge "Remove unused get_intra/inter_scan() from scan.h" into nextgenv2 2016-11-02 22:50:59 +00:00
Jingning Han
1a0faab642 Merge "Remove redundant config flags from get_entropy_context" into nextgenv2 2016-11-02 22:50:49 +00:00
Jean-Marc Valin
d95322a35c Now using a single line buffer
No change in output.

Change-Id: I4701a5517fb97889f970acfb0b44cee51c34fd95
2016-11-02 22:50:10 +00:00
Jean-Marc Valin
621e707259 Only copy data from deringed blocks to the line buffer
No change in output

Change-Id: I6ec4a8c635337562170153585e427afd6f9d9a0f
2016-11-02 22:49:54 +00:00
Jean-Marc Valin
50bb32ec87 Splitting out 8->16 block copy code into copy_sb8_16()
No change in output.

Change-Id: I4f0e37a879432e2647b3debe6a2c0c670a79dd6f
2016-11-02 22:49:39 +00:00
Jean-Marc Valin
39b0d2fb14 Eliminate the big superblock row buffer.
Now only buffering three lines across the entire frame and four lines
over the height of one superblock.

No change in output.

Change-Id: I6b99399974e197dc02f2e4ff2e60cdd7fdaa2e43
2016-11-02 22:48:47 +00:00
Jean-Marc Valin
b154a24283 Making deringing buffer only one row of superblocks at a time
This introduces a line buffer that holds the last three lines of each original
row so that the next row can be deringed with the original input of the upper
row.

No change in output

Change-Id: I8fad3bc48745e9ce3e440289f453477a0c5442c0
2016-11-02 22:48:19 +00:00
Jingning Han
a9336328d4 Make recursive txfm encoding process support rectangular tx_size
This commit makes the encoding process of the recursive transform
block partition support both rectangular and square transform block
sizes as the starting point. If the coding block size is rectangular,
it allows the transform block size to start from the largest
rectangular transform size and recursively parse down to the selected
coding sizes.

Change-Id: I576628b9166565bada6a918f0a1e67849dfef4cd
2016-11-02 15:48:07 -07:00
Jean-Marc Valin
ca1eb5dc58 Duplicating deringing input superblock copy to make upcoming changes easier
No change in output

Change-Id: Iaa06043dcc31308c83f667424e5a83c2db50ed24
2016-11-02 22:46:53 +00:00
Jean-Marc Valin
8e941780be Using a uniform definition for "bsize" in deringing filter
No change in output

Change-Id: Ia3a1679aa75cb58f4bc6459791e061176eeafd52
2016-11-02 22:46:27 +00:00
Jean-Marc Valin
eab77ea936 Using the copied input for od_dir_find8()
No change in output

Change-Id: Iec1411c35bf175a462eade34e89a4c60eb2a1da4
2016-11-02 15:41:01 -07:00
Yaowu Xu
6285c6674d Merge "Increase deringing horizontal padding to 4 pixels on each side" into nextgenv2 2016-11-02 22:37:39 +00:00
Jean-Marc Valin
471687a9ac Increase deringing horizontal padding to 4 pixels on each side
This makes vectorization easier by having buffer lines be a multiple of 4.

No change in output

Change-Id: I7ec06e03a49554206af0a55aab03daccc411b50f
2016-11-02 22:37:35 +00:00
Yaowu Xu
4cb9a620db Merge "De-sparsifying the deringing output buffer" into nextgenv2 2016-11-02 22:35:05 +00:00
Jean-Marc Valin
82c65fc837 De-sparsifying the deringing output buffer
No change in output

Change-Id: I940203975564aedca8734d6f74b013edb513f517
2016-11-02 22:35:00 +00:00
Yaowu Xu
44f3587459 Merge "No need to store the deringing filter direction variance in an array" into nextgenv2 2016-11-02 22:34:46 +00:00
Jingning Han
46003149e5 Fix a merge bug between dual_filter and sub8x8mc
The function module in inter_predictor() has been changed to
universally support arbitrary block size inter prediction. Hence
sub8x8mc can be a standalone experiment now.

Change-Id: Ie9d87f61fc317b1d114edb4e0bf5544f918ed08e
2016-11-02 14:57:11 -07:00
Jingning Han
d611808324 Remove redundant experimental flags from common_data.h
No coding statistics change.

Change-Id: I88cbb828308b5796a2e87079c2f1bf0dabd99a11
2016-11-02 14:51:23 -07:00
Jingning Han
c104b8f269 Merge "Support rectangular tx_size in the common lib" into nextgenv2 2016-11-02 21:49:41 +00:00
Jingning Han
e714e70f77 Merge "Support rectangular transform block units in the codebase" into nextgenv2 2016-11-02 21:49:18 +00:00
Jingning Han
a834925778 Merge "Make highbd rectangular transform block available in the common lib" into nextgenv2 2016-11-02 21:49:07 +00:00
Jean-Marc Valin
643902d621 No need to store the deringing filter direction variance in an array
No change in output

Change-Id: Ifa5c5d4ed33ff11ea3c56ee5d559c7a40599b3dc
2016-11-02 13:15:42 -07:00
Sarah Parker
fcb2ca6eda Merge "Fix ubsan left shift warnings in warped motion library" into nextgenv2 2016-11-02 19:45:45 +00:00
Jingning Han
653102ab1c Remove unused get_intra/inter_scan() from scan.h
Change-Id: I96fc1da1ce56593ae35ebbc93a668e4ba241234a
2016-11-02 12:00:51 -07:00
Jingning Han
8b5380ac77 Remove redundant config flags from get_entropy_context
The rectangular transform syntax is by default supported, hence
no need to put it under the experimental flag. This does not change
the coding statistics.

Change-Id: I3a147503d973a03400f8a86e11f07c7d754e6234
2016-11-02 11:48:39 -07:00
Jingning Han
9fe31390ca Support rectangular tx_size in the common lib
Change-Id: I4128ab932a967a3d657bb1f95f0fa2af20a06469
2016-11-02 11:48:31 -07:00
Jingning Han
4ba26dc0e1 Support rectangular transform block units in the codebase
Change-Id: I9183851258478a36dc5a4ad2d4faa3d3c8b18bd3
2016-11-02 11:47:48 -07:00
Jingning Han
5238e6eaee Make highbd rectangular transform block available in the common lib
Change-Id: Ief08b23b30b78d640f6d7c702145e5bcf1b37b57
2016-11-02 11:47:48 -07:00
Debargha Mukherjee
deef66db01 Merge "Adding 64x64 forward and inverse transforms" into nextgenv2 2016-11-02 18:40:55 +00:00
Yaowu Xu
1af3d51685 Merge changes I313bde67,I2ddc2d70,Ifb9094c3,I9051ed6e,I5681e332, ... into nextgenv2
* changes:
  Avoid the "initial copy" in the deringing filter
  Only copy the deringed blocks back into the buffer
  Reducing copies in deringing filter
  sb_all_skip_out() now computes a list of deringed blocks
  compute bskip as we go
  Revert "Fix dering filter when using 4:2:2 or 4:4:0 subsampling"
2016-11-02 18:02:56 +00:00
Debargha Mukherjee
67d134772c Adding 64x64 forward and inverse transforms
Change-Id: I213f3111fc0656aecd1303a8b871ecded2b92bc2
2016-11-02 09:48:46 -07:00
Zoe Liu
bd163bc199 Merge "Make a small code clean on handle_inter_mode()" into nextgenv2 2016-11-02 16:39:36 +00:00
Jingning Han
6a503e4110 Merge "Make rectangular transform block available in the common lib" into nextgenv2 2016-11-02 16:17:00 +00:00
Jingning Han
f8a29663be Merge "Simplify tx_size enums" into nextgenv2 2016-11-02 16:16:50 +00:00
Jean-Marc Valin
bcf3580b1e Avoid the "initial copy" in the deringing filter
No change in output

Change-Id: I313bde67e59835f88e3b2e6079b0df2d7ed1a903
2016-11-02 08:23:04 -07:00
Jean-Marc Valin
7618daa555 Only copy the deringed blocks back into the buffer
No change in output

Change-Id: I2ddc2d70c6534e7cfd315d66e838410677f91356
2016-11-02 08:22:58 -07:00
Jean-Marc Valin
cf23aefab5 Reducing copies in deringing filter
Only copy the modified pixels from the first filter back into the input of the
second filter.

Change-Id: Ifb9094c33c876a8c6caa0f68771fc7ef59c78b53
2016-11-02 08:22:51 -07:00
Jean-Marc Valin
3e44bccb50 sb_all_skip_out() now computes a list of deringed blocks
No change in output

Change-Id: I9051ed6e1fbca7d80412ba2b53f7aacbc3ef70eb
2016-11-02 08:22:45 -07:00
Jean-Marc Valin
71466d2288 compute bskip as we go
Change-Id: I5681e3329ad3677296161de59f5ff1236a14f086
2016-11-02 08:22:38 -07:00
Yaowu Xu
3e90f84a34 Revert "Fix dering filter when using 4:2:2 or 4:4:0 subsampling"
This reverts commit 401204a50bea8b21e5fcd721cd5db513b8f70e72.

Change-Id: Id27eadf679b0df2d2ccfab61155be29979b0b6ba
2016-11-02 08:22:02 -07:00
Jingning Han
ec419e0771 Make rectangular transform block available in the common lib
This prepares the integration of rectangular transform block size
with recursive transform block partition system.

Change-Id: Id96aa3790dace15619c665f438241938992d1730
2016-11-01 22:25:54 -07:00
Yaowu Xu
f67e5eec8b Merge "Disable upsampled references for resolutions above 1080p." into nextgenv2 2016-11-02 04:28:51 +00:00
Jingning Han
aad298ffcf Simplify tx_size enums
Remove redundant experimental flag. This does not change the coding
statistics.

Change-Id: I35b3cb04025c5c2d2744312e5efc00d0473c990d
2016-11-01 21:12:55 -07:00
Yi Luo
fb77385fd0 Merge "Remove unused copies of transform related source code" into nextgenv2 2016-11-02 01:43:19 +00:00
Yi Luo
7f6bf9c70d Merge "Hybrid inverse transforms 16x16 AVX2 optimization" into nextgenv2 2016-11-02 01:43:02 +00:00
Jingning Han
9679464e28 Merge "Change TXFM_CONTEXT from TX_SIZE to uint8_t" into nextgenv2 2016-11-02 01:18:19 +00:00
Jingning Han
746e2220b5 Merge "Rework transform block partition context model" into nextgenv2 2016-11-02 01:18:13 +00:00
Thomas Daede
a9e96d4000 Disable upsampled references for resolutions above 1080p.
Upsampled references currently increase the size of references by
64 times. This patch limits the memory used by the encoder to
about 3GB when encoding high bit depth content.

This should be re-evaluated in the future, if doing 8-tap
resampling in the motion search becomes reasonably fast, or if
the upsampled references are reduced in size (by omitting some
subpel positions and interpolating them instead).

Change-Id: I6d84ff0d6202ec46f4fa53e268e68aa808e5df85
2016-11-01 17:39:16 -07:00
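The 64x growth quoted in this commit can be put into numbers with a small calculation. The sketch below assumes the factor comes from 8x upsampling in each dimension, which is an interpretation rather than something the commit states; the function names are hypothetical and only a single plane is counted.

```c
#include <stdint.h>

/* Bytes needed for one plane of width x height samples. */
static uint64_t plane_bytes(uint64_t width, uint64_t height,
                            uint64_t bytes_per_sample) {
  return width * height * bytes_per_sample;
}

/* Assumption: the reference is upsampled 8x horizontally and 8x vertically,
 * which gives the 64x size increase mentioned in the commit. */
static uint64_t upsampled_plane_bytes(uint64_t width, uint64_t height,
                                      uint64_t bytes_per_sample) {
  return plane_bytes(width * 8, height * 8, bytes_per_sample);
}
```

For a 1920x1080 luma plane stored at 2 bytes per sample (high bit depth), this gives 4,147,200 bytes before upsampling and 265,420,800 bytes after, per reference plane.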
Urvang Joshi
a5b09216b5 Merge "Revert of "Mark bogus palette color probabilities as zero"." into nextgenv2 2016-11-02 00:31:55 +00:00
Jingning Han
8b9478af1e Change TXFM_CONTEXT from TX_SIZE to uint8_t
Count the transform block partition context in the unit of pixels.

Change-Id: Ibb66f053526ed347ad0274b78db7ac35cc086b0e
2016-11-01 15:44:26 -07:00
Urvang Joshi
eb54e0cde8 Revert of "Mark bogus palette color probabilities as zero".
Reverted commit: f8306bfdc (with some changes).

Reason: This was triggering an assert in debug build because of zero
probability values. So, using an "UNUSED_PROB" macro to replace these to
retain clarity.

Assertion failure can be reproduced as follows:

$ make clean; extra_cflags='-O0 -g -fno-inline' ../../configure
--enable-debug --enable-experimental --enable-palette && make -j 16

$ ./aomenc -D --codec=av1 ~/videos/screen_content_set/gimp.y4m -o
/tmp/foo.webm --tune-content=screen --limit=50

Pass 1/2 frame   50/51      8976B    1436b/f   86169b/s 2902620 us
(17.23 fps)
Pass 2/2 frame   25/0          0B 2933053 us 8.52 fps [ETA  unknown]
aomenc: ../../av1/encoder/cost.c:46: cost: Assertion `prob != 0' failed.
Aborted (core dumped)

Change-Id: I47a76b8f415060909bc8448fae3002857eb61d8e
2016-11-01 15:25:57 -07:00
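Why a zero probability trips the assertion, and how a nonzero placeholder avoids it, can be sketched as follows. The cost function here is a stand-in and not the one in av1/encoder/cost.c, and the value 128 for `UNUSED_PROB` is an assumption made for this example.

```c
#include <assert.h>

/* Assumed placeholder value for table entries that are never used; any
 * nonzero value would avoid the assertion below. */
#define UNUSED_PROB 128

/* Stand-in cost: the number of doublings needed for prob (out of 256) to
 * reach 256, i.e. ceil(log2(256 / prob)) bits. */
static int approx_cost_bits(int prob) {
  /* A zero probability has no finite cost: the loop below would never end.
   * This check is slightly stricter than `prob != 0` so that negative
   * inputs are rejected too. */
  assert(prob > 0);
  int bits = 0;
  while (prob < 256) {
    prob *= 2;
    ++bits;
  }
  return bits;
}
```

Using a named macro instead of a bare nonzero literal keeps it visible in the tables that the entry is not a real probability.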
Yi Luo
ea1167c33f Remove unused copies of transform related source code
- Library size reduces: 165 kB, 292 kB (HBD).

Change-Id: I50cb630dde326bd2a28c0db4b7e2d53c2fd94a2a
2016-11-01 15:07:46 -07:00
Jingning Han
c8b8936fdc Rework transform block partition context model
This commit allows the partition context model to account for the
maximum transform block size of the coding block.

Change-Id: I22b91e85fff70faa974afd362ce327d3f2eda81d
2016-11-01 15:00:04 -07:00
Zoe Liu
82c8c92cc5 Make a small code clean on handle_inter_mode()
Change-Id: I5fb4898045a481f7996c2ad019d2f741aab08fc7
2016-11-01 14:52:34 -07:00
Yaowu Xu
57a7baf666 Merge "Fix merge issues related --enable-ec-adapt" into nextgenv2 2016-11-01 21:07:18 +00:00
Yaowu Xu
980eb2e9fa Merge "Change to use correct variable in for-loop" into nextgenv2 2016-11-01 21:07:11 +00:00
Yi Luo
7317200002 Hybrid inverse transforms 16x16 AVX2 optimization
- Add unit tests to verify the bit-exact result.
- User level time reduction (EXT_TX):
    encoder: 3.63%
    decoder: 2.36%
- Also add tx_type=V_DCT...H_FLIPADST SSE2 for 16x16 inv txfm.

Change-Id: Idc6d9e8254aa536e5f18a87fa0d37c6bd551c083
2016-11-01 13:38:20 -07:00
Yaowu Xu
8af861bbf1 Fix merge issues related --enable-ec-adapt
1. Avoid compiler warnings.
2. Enable prob_diff_update() required by update_txfm_probs().

Change-Id: I9081b645c55a8432bdaeb600e9ba901c0d0d96f5
2016-11-01 12:36:04 -07:00
Yaowu Xu
ddcdd5b1e5 Merge "Fix a compiler warning with --enable-adapt-scan" into nextgenv2 2016-11-01 18:12:49 +00:00
Yaowu Xu
2ce9707910 Merge "Resolve build issue --enable-aom-qm" into nextgenv2 2016-11-01 18:12:39 +00:00
Yaowu Xu
6043bfdb03 Change to use correct variable in for-loop
Change-Id: I252c2f06dfe256d2d33fd1abc42aaadf50273cc8
2016-11-01 09:54:05 -07:00
Jingning Han
ae81f8b2ab Merge "Make txfm_partition_update support rectangular tx_size" into nextgenv2 2016-11-01 16:51:03 +00:00
Jingning Han
2e4f129b42 Merge "Use get_entropy_context() in select_tx_block" into nextgenv2 2016-11-01 16:50:55 +00:00
Yaowu Xu
b386f0b762 Fix a compiler warning with --enable-adapt-scan
Change-Id: I93b191a522ed3e3ca9a363beab4292f64e869610
2016-11-01 09:40:12 -07:00
Yaowu Xu
a5924740a2 Resolve build issue --enable-aom-qm
Change-Id: I9f52ddb53b39cefd2e0ee7144203e1f3958d01aa
2016-11-01 09:32:03 -07:00
Yaowu Xu
fd601e346c Merge "Rename av1_convolve.[hc] to convolve.[hc]" into nextgenv2 2016-11-01 02:25:19 +00:00
Yaowu Xu
ec040fe23c Merge "cmake support: A starting point." into nextgenv2 2016-11-01 02:25:05 +00:00
Yaowu Xu
0279e91576 Merge "decodemv.c: relocate a function" into nextgenv2 2016-11-01 02:24:51 +00:00
Yaowu Xu
8040aaf9b1 Merge "Fix a bad merge" into nextgenv2 2016-11-01 02:24:38 +00:00
Yaowu Xu
6557ea9fe2 Rename av1_convolve.[hc] to convolve.[hc]
Change-Id: I2047adc4c147201ce0ce3c533fe2861cbff1002c
2016-10-31 17:17:37 -07:00
Jingning Han
7956bd64d7 Make txfm_partition_update support rectangular tx_size
Change-Id: I7d2414a8766141d5109b599271179bc505c772d3
2016-10-31 16:46:30 -07:00
Tom Finegan
fc6f23647d cmake support: A starting point.
Start adding cmake build support. This is based on the generic-gnu
target and will not build anything. It simply produces a project file
(when generating for an IDE) that can be loaded and that allows for
interaction with (most of) the aom sources used in a generic-gnu
build.

Notable missing pieces:
- flag testing
- config generation
- experiment configuration
- enable/disable encoder/decoder
- aomenc/aomdec
- all third party library build integration
- all tests

Change-Id: Iaeda0b03d58591a26a8fb54f63a2aa3b5354e3a6
2016-10-31 16:46:05 -07:00
Yaowu Xu
b24e115bc6 decodemv.c: relocate a function
Change-Id: I932dd9c8b43a20d248c00847b19dff88e6eb11be
2016-10-31 16:45:37 -07:00
Angie Chiang
fd248ab173 Merge "Refactor scan_test.cc" into nextgenv2 2016-10-31 23:31:18 +00:00
Jingning Han
ce059e86fb Use get_entropy_context() in select_tx_block
Replace redundant separate handling to retrieve the context value.

Change-Id: I18dde4599cd08ffe33a78694ec377487609de1b1
2016-10-31 16:27:28 -07:00
Yaowu Xu
e86288d2de Fix a bad merge
Change-Id: I4615e8e64d75b1f4277d2221ec94c5d4f1830aa4
2016-10-31 15:56:38 -07:00
Jingning Han
e29fc1daef Merge "Refactor max_blocks_wide/high computation" into nextgenv2 2016-10-31 22:20:16 +00:00
Jingning Han
609f5c63ac Merge "Remove unused tx_size tables" into nextgenv2 2016-10-31 22:20:08 +00:00
Jingning Han
6511159787 Merge "Replace get_tx2d_size() with direct tx_size_2d[] table access" into nextgenv2 2016-10-31 22:19:59 +00:00
Jingning Han
6491b97350 Merge "Support rectangular tx_size in recursive txfm syntax coding" into nextgenv2 2016-10-31 22:19:46 +00:00
Yaowu Xu
292bd65510 Merge changes from topic 'fix_ec_adapt' into nextgenv2
* changes:
  Reverse order of CLPF and dering
  Refactor: read_tx_size_probs()
  Fix compiling issues with --enable-ec-adapt
  Fixes compilation error on Windows/Visual Studio
2016-10-31 22:18:52 +00:00
Angie Chiang
ec932242b3 Refactor scan_test.cc
Change-Id: I546a955a95d6d43182631ad5e8d1c137c36e9a0c
2016-10-31 13:43:51 -07:00
Steinar Midtskogen
5d56f4d69a Reverse order of CLPF and dering
Low latency:
PSNR YCbCr:     -0.15%      0.11%      0.12%
   PSNRHVS:     -0.25%
      SSIM:     -0.26%
    MSSSIM:     -0.26%
 CIEDE2000:     -0.03%

High latency:
PSNR YCbCr:     -0.18%      0.18%      0.07%
   PSNRHVS:     -0.20%
      SSIM:     -0.21%
    MSSSIM:     -0.21%
 CIEDE2000:     -0.03%

Change-Id: Ieb86d9ba353220de6454bdc15cea825944b6385b
2016-10-31 12:50:11 -07:00
Jingning Han
f65b870e27 Refactor max_blocks_wide/high computation
Factor out common code that shows up in multiple places.

Change-Id: I0a72213a151f74bdad926d59f86f0a28d00968fc
2016-10-31 12:39:36 -07:00
Jingning Han
393a60d208 Remove unused tx_size tables
Change-Id: I04367fb68e8fd027f4b9d945f4001e5ab346d098
2016-10-31 12:39:33 -07:00
Jingning Han
7e9929736c Replace get_tx2d_size() with direct tx_size_2d[] table access
Change-Id: I20040cdb5d9fdbf6c50082e5e17b4cfbd1926b13
2016-10-31 12:39:33 -07:00
Jingning Han
42a0fb369d Support rectangular tx_size in recursive txfm syntax coding
Change-Id: I40aa342ffa5b6effe8b124b94783e5f0bd2f2a81
2016-10-31 12:38:07 -07:00
Jingning Han
a98d80fdaa Merge "Use the actual transform block size for loop filter selection" into nextgenv2 2016-10-31 19:09:07 +00:00
Yaowu Xu
efc7535beb Refactor: read_tx_size_probs()
Change-Id: Ibdedd9b8e0b6646b882bc159856ac7c7e7073149
2016-10-31 09:46:42 -07:00
Yaowu Xu
750955b4c1 mvref_common.c: apply clang-format
Change-Id: I755bfb11a57e92e3a68855a53e95efe526f198fd
2016-10-31 09:13:53 -07:00
Yaowu Xu
1aceffa06c Fix compiling issues with --enable-ec-adapt
Change-Id: I52e2c84ce43d36f78806c54b214f9e5b07c5f0f5
2016-10-31 09:13:53 -07:00
Arild Fuldseth (arilfuld)
59622cf292 Fixes compilation error on Windows/Visual Studio
Change-Id: I32377deb5f1e882370c70449cb8f68f2fdafcbef
2016-10-31 09:13:53 -07:00
Yaowu Xu
09a4265725 Merge "simp-mv-pred integration with ref-mv" into nextgenv2 2016-10-30 19:54:41 +00:00
Yaowu Xu
aa70234e82 Merge "Fix the top-right reference block location" into nextgenv2 2016-10-30 19:54:23 +00:00
Yaowu Xu
ca5e18b750 Merge "Upsample reference frames after size dependent speed features are calculated." into nextgenv2 2016-10-30 19:54:17 +00:00
Yaowu Xu
e7a64cc9ec Merge "Let is_interp_needed always return 1" into nextgenv2 2016-10-30 19:53:40 +00:00
Yaowu Xu
02d33bfeeb Merge "Centralize EC_MULTISYMBOL error checking." into nextgenv2 2016-10-30 19:53:29 +00:00
Yaowu Xu
2a67024991 Merge "EC_ADAPT: disable tests requiring tiles." into nextgenv2 2016-10-30 17:16:34 +00:00
Yaowu Xu
a2d2a1858e Merge "EC_ADAPT: refactor and fix MinArfFreq unit tests." into nextgenv2 2016-10-30 17:16:19 +00:00
Yaowu Xu
99be652acb Merge "Only build aom_read/write_symbol if CONFIG_EC_MULTISYMBOL" into nextgenv2 2016-10-30 17:16:06 +00:00
Yaowu Xu
06a5ea9617 Merge "EC_ADAPT: improved symbol adaptation." into nextgenv2 2016-10-30 17:15:54 +00:00
Yaowu Xu
46fcecc395 Merge "EC_ADAPT: send updates for the correct nodes." into nextgenv2 2016-10-30 17:15:40 +00:00
Yaowu Xu
2ae4214618 Merge "Add ec_multisymbol for common daala_ec and rans code" into nextgenv2 2016-10-30 17:15:28 +00:00
Yaowu Xu
58eeb100ab Merge "Handle entropy coder experiment dependencies" into nextgenv2 2016-10-30 16:22:27 +00:00
Yaowu Xu
28adf035df Merge "Disable the SuperframeTest with --enable-daala_ec." into nextgenv2 2016-10-30 16:22:16 +00:00
Yaowu Xu
cfab447bf8 Merge "Fix ec_adapt+daala_ec test failure" into nextgenv2 2016-10-30 16:22:05 +00:00
Yaowu Xu
eaafb17d41 Merge "Add EC_ADAPT experiment for symbol-adaptive entropy coding." into nextgenv2 2016-10-30 16:21:47 +00:00
Deng
ca8d24d4e1 simp-mv-pred integration with ref-mv
This commit adds simp-mv-pred experiment. The experiment is to work on
top of ref-mv experiment to save memory bandwidth and reduce the size
of line buffer needed in ref-mv experiment.

When compared to ref-mv, this experiment showed:
low-delay BDR gain: 0.03%
High-delay BDR gain: 0.01%
memory/memory bandwidth saving: 40%
local memory/gate count saving: 20%

Change-Id: Ic4006e041fc58ede411da83d0d730c464ebe1749
2016-10-29 22:26:48 -07:00
Jingning Han
ea9cf097c9 Fix the top-right reference block location
This commit fixes the top-right reference block location for block
sizes above 8x8. It improves the coding performance of ref-mv:

lowres 0.08%
midres 0.15%

Thanks to jiafeng@ for finding this issue.

Change-Id: I70750fc7b18bf0126d3e07abc1b63ca5a160193e
2016-10-29 22:26:48 -07:00
Thomas Daede
919bd6abd7 Upsample reference frames after size dependent speed features are calculated.
This prevents a crash if the upsample_refs speed feature is
changed as part of set_size_dependent_vars, when the recode
loop is enabled.

Change-Id: I645e389bfe961879dd2001439a34fde2993868d9
2016-10-29 22:26:48 -07:00
Angie Chiang
a69ce1b314 Let is_interp_needed always return 1
This CL will cause
0.122% PSNR drop on lowres dataset
0.059% PSNR drop on midres dataset

However, it will facilitate hardware implementation.

Change-Id: I0a0713acacbfd571509a721337711c021915dd3c
2016-10-29 22:26:48 -07:00
Nathan E. Egge
baaaa16186 Centralize EC_MULTISYMBOL error checking.
The EC_ADAPT experiment cannot work unless EC_MULTISYMBOL is also
 enabled.
This patch replaces all individual checks with a centralized check in
 both the bitreader.h and bitwriter.h.

Change-Id: I418852d95c5012cc074ed65cd24997e08bc2aadd
2016-10-29 22:26:27 -07:00
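The centralized check described in baaaa16186 can be sketched as a single preprocessor guard in a shared header. This is a minimal illustration, not the code from the patch: the macro values are defined locally as stand-ins for the generated configuration, and the helper ec_adapt_enabled() is hypothetical.

```c
#include <assert.h>

/* Stand-ins for the generated config macros (assumption: a real build
 * would take these from the configure step instead). */
#ifndef CONFIG_EC_ADAPT
#define CONFIG_EC_ADAPT 1
#endif
#ifndef CONFIG_EC_MULTISYMBOL
#define CONFIG_EC_MULTISYMBOL 1
#endif

/* One dependency check in one place, instead of a copy at every use site. */
#if CONFIG_EC_ADAPT && !CONFIG_EC_MULTISYMBOL
#error "CONFIG_EC_ADAPT requires CONFIG_EC_MULTISYMBOL."
#endif

/* Hypothetical helper: code below the guard may rely on the dependency. */
static int ec_adapt_enabled(void) {
  /* If EC_ADAPT is on, the guard above guarantees EC_MULTISYMBOL is on too. */
  assert(!CONFIG_EC_ADAPT || CONFIG_EC_MULTISYMBOL);
  return CONFIG_EC_ADAPT;
}
```

With the guard in a header that both the reader and the writer include, an invalid experiment combination fails at compile time with one message.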
Thomas Davies
0575e6c2d4 EC_ADAPT: disable tests requiring tiles.
EC_ADAPT is currently not compatible with tiles.

Change-Id: Idd000f0ff23c28e7e4952024eadb55ba0a1da13d
2016-10-29 22:22:19 -07:00
Thomas Davies
6519bebf34 EC_ADAPT: refactor and fix MinArfFreq unit tests.
Ensure that cdfs are synced with pdfs after every
forward update.

Change-Id: I5677f78300156c8622f1728d7a343ff6c3a4ea64
2016-10-29 22:21:32 -07:00
Alex Converse
58c520afe9 Only build aom_read/write_symbol if CONFIG_EC_MULTISYMBOL
Change-Id: If86c7220ac9199a59e605dc43d42cc3db26cf8bd
2016-10-29 17:05:40 -07:00
Thomas Davies
f6c04acaa3 EC_ADAPT: improved symbol adaptation.
Place a floor under symbol probabilities and
modify adaptation rate.

Change-Id: Ic9cf6d9fadfc3bf1f3027bc3d2bb198526441591
2016-10-29 17:05:40 -07:00
Thomas Davies
09ebbfb39f EC_ADAPT: send updates for the correct nodes.
EOB and ZERO token are not currently adapted.

Change-Id: Ie7d657b71fcb157b09e40874fb06a8b7cd95cc70
2016-10-29 17:05:40 -07:00
Alex Converse
aca9feba82 Add ec_multisymbol for common daala_ec and rans code
The new ec_multisymbol experiment supersedes the rans experiment and is
used for multisymbol features that can be backed by either daala_ec or
rans.

This experiment is automatically enabled by ec_adapt and will try to
enable daala_ec or ans (in that order).

Change-Id: Ie75b4002b7a9d7f5f7b4d130c1aacb3dbe97e54f
2016-10-29 17:05:40 -07:00
Alex Converse
242558a21b Handle entropy coder experiment dependencies
Change-Id: I854c53d9379f820b5a78fcb53f9ef09bc6f9d9e7
2016-10-29 17:05:40 -07:00
Yaowu Xu
15c1aa60f3 Disable the SuperframeTest with --enable-daala_ec.
Due to the way the daala entropy coder handles raw bits, the current
test is broken because the buffer length is not known when
aom_reader_init() is called.

Change-Id: I76e93ec0e160e31f286c23f7c9c0094390c6c2d4
2016-10-29 17:05:40 -07:00
Alex Converse
bc0a5bacb5 Fix ec_adapt+daala_ec test failure
AV1/AqSegmentTest.TestNoMisMatchAQ1/6 was failing with this experiment
pair.

BUG=aomedia:70

Change-Id: I8c53a043471a87a98a06687afce2e28891592362
2016-10-29 17:05:40 -07:00
Thomas
9ac5508f32 Add EC_ADAPT experiment for symbol-adaptive entropy coding.
This experiment performs symbol-by-symbol statistics
adaptation for non-binary symbols. It requires DAALA_EC or
RANS and ANS to be enabled. The adaptation is currently
based on a simple recursive filter and is taken from
Daala. It has an adaptation rate dependent on alphabet size,
taken from Daala. It applies wherever non-binary symbols
are encoded using Cumulative Probability Functions rather
than trees.

Where symbols are adapted, forward updates in the compressed
header are removed.

In the case of RANS, coefficient token values are adapted,
with the exception of the zero token, which remains a
binary symbol. In the case of DAALA_EC, other values
such as inter and intra modes are also adapted, as CDFs are
provided in those cases.

The experiment is configured with:

./configure --enable-experimental --enable-daala-ec --enable-ec-adapt

or

./configure --enable-experimental --enable-ans --enable-rans \
    --enable-ec-adapt

EC_ADAPT is not currently compatible with tiles.

BDR results on Objective-1-fast give a small loss:

PSNR YCbCr:      0.51%      0.49%      0.48%
PSNRHVS:      0.50%
SSIM:      0.50%
MSSSIM:      0.51%
CIEDE2000:      0.50%

Change-Id: I3888718e42616f3fd87144de7f125228446ac984
2016-10-29 16:57:48 -07:00
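The "simple recursive filter" adaptation described in 9ac5508f32 can be sketched roughly as follows. This is an illustrative approximation only, not the Daala or AV1 code: the function name adapt_cdf, the 15-bit total and the caller-supplied rate parameter are all assumptions. The commit says the rate depends on alphabet size but does not give the mapping, so it is left to the caller here.

```c
#include <assert.h>
#include <stdint.h>

#define CDF_TOTAL 32768 /* assumed probability scale (15 bits) */

/* cdf[i] holds the cumulative probability of symbols 0..i, so
 * cdf[nsymbs - 1] == CDF_TOTAL. After coding `symbol`, move every entry a
 * fraction 2^-rate toward the CDF of a distribution that puts all mass on
 * `symbol` (0 below the symbol, CDF_TOTAL from the symbol upward). */
static void adapt_cdf(uint16_t *cdf, int nsymbs, int symbol, int rate) {
  assert(nsymbs > 1 && symbol >= 0 && symbol < nsymbs && rate > 0);
  for (int i = 0; i < nsymbs - 1; ++i) {
    if (i >= symbol) {
      /* Target is CDF_TOTAL: close part of the remaining gap. */
      cdf[i] = (uint16_t)(cdf[i] + ((CDF_TOTAL - cdf[i]) >> rate));
    } else {
      /* Target is 0: decay toward zero. */
      cdf[i] = (uint16_t)(cdf[i] - (cdf[i] >> rate));
    }
  }
  /* The last entry stays at CDF_TOTAL by construction. */
}
```

Because the encoder and decoder apply the same update after every symbol, no forward probability updates are needed for the adapted symbols, which matches the removal of those updates from the compressed header. The probability floor added later in f6c04acaa3 is not modelled in this sketch.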
Jingning Han
ee9264c923 Merge "Replace num_4x4_blocks_txsize_loopup table" into nextgenv2 2016-10-29 23:01:26 +00:00
Jingning Han
73d65a49a9 Merge "Refactor rate-distortion optimization of recursive transform partition" into nextgenv2 2016-10-29 23:01:14 +00:00
Jingning Han
9fb1d69e82 Use the actual transform block size for loop filter selection
Parse the recursive transform block partition to fetch the actual
transform size. Use this correct transform size to select the
corresponding loop filter kernel. This slightly improves the coding
performance of recursive transform partition for hdres to 0.14%.

Change-Id: Ibe8bc3fdd0d222a4f1fb8156c56a407bec052b9b
2016-10-29 15:59:55 -07:00
Urvang Joshi
1252f75616 Merge "RANGE_CHECK: "==" || ">" is simply ">="." into nextgenv2 2016-10-28 23:55:01 +00:00
Zoe Liu
9d37fe47a2 Merge "Clean the code in ref frame context decision for ext-refs" into nextgenv2 2016-10-28 23:36:41 +00:00
Jingning Han
32b2028b30 Replace num_4x4_blocks_txsize_loopup table
Unify the transform block size access table in preparation for
2x2 transform integration.

Change-Id: I308def6729e138ae2b2542175206e3225c0cb392
2016-10-28 15:42:44 -07:00
Jingning Han
9fdc42293f Refactor rate-distortion optimization of recursive transform partition
Support rectangular transform block in the rate-distortion cost
estimator.

Change-Id: I99201fcae797c1ed2f2184021a215867eac0288f
2016-10-28 14:48:40 -07:00
Sarah Parker
d722f71ed8 Merge "Bitwise to logical & in rdopt ext tx prune function" into nextgenv2 2016-10-28 21:43:03 +00:00
Urvang Joshi
cd8ab904e1 RANGE_CHECK: "==" || ">" is simply ">=".
Also:
- For unsigned ints, don't check value >= 0 as that is always true.
- Add "-Wlogical-op" warning flag which would have warned that "logical
  'or' of collectively exhaustive tests is always true" before this
  patch.

Change-Id: Idf3bd312464397f2df19256fc69b22f345dc7753
2016-10-28 14:40:29 -07:00
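The simplification named in cd8ab904e1 can be illustrated with plain functions. The names below are hypothetical and stand in for whatever expression the RANGE_CHECK macro actually expands to, which is not shown in the commit message.

```c
#include <assert.h>

/* Before: two tests joined by a logical OR. */
static int at_least_old(int value, int lo) {
  return value == lo || value > lo;
}

/* After: a single comparison with identical results. */
static int at_least_new(int value, int lo) { return value >= lo; }

/* For an unsigned value, "value >= 0" is always true, so only the upper
 * bound needs to be tested. */
static int in_range_unsigned(unsigned int value, unsigned int hi) {
  return value <= hi;
}
```

Both forms of the lower-bound test return the same result for every input; the second form is easier to read and does not hide a redundant comparison from the compiler.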
Yaowu Xu
d64eaf138e Merge "Tile groups: ensure each tile in a TG has a length." into nextgenv2 2016-10-28 21:26:32 +00:00
Yaowu Xu
edd3f9c418 Merge "Fix update_delta_q_probs compile warning" into nextgenv2 2016-10-28 21:26:23 +00:00
Yaowu Xu
efd5725242 Merge "Encode and decode multiple tile groups" into nextgenv2 2016-10-28 21:26:11 +00:00
Sarah Parker
68a26b6b4a Bitwise to logical & in rdopt ext tx prune function
Making this change in case the future implementation changes and the
comparison is no longer between single bits.

Change-Id: I94f474ce7d82febfa23cec65cbe1b9d240b42e02
2016-10-28 13:19:33 -07:00
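The concern behind 68a26b6b4a is that bitwise and logical AND only agree while both operands are 0 or 1. A small sketch, with hypothetical function names that do not come from the rdopt code:

```c
#include <assert.h>

/* Bitwise AND: nonzero only if the operands share a set bit. */
static int both_set_bitwise(int a, int b) { return (a & b) != 0; }

/* Logical AND: true whenever both operands are nonzero. */
static int both_set_logical(int a, int b) { return a && b; }
```

For single-bit flags the two agree, for example (1, 1) gives 1 in both cases. For wider values they can differ: with a = 2 and b = 1 the bitwise form yields 0 because no bit is shared, while the logical form yields 1.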
Thomas Davies
8fe64a3a23 Tile groups: ensure each tile in a TG has a length.
This ensures TGs can be decoded even if the whole
frame has not been received and the frame length
is not known.

Change-Id: If24837fcc3b5c46554751be792e91100de73e8d6
2016-10-28 13:01:40 -07:00
Jingning Han
be44c5f46f Fix update_delta_q_probs compile warning
Change-Id: Ifb93970ed876ed61259b2f8da739171857c97fda
2016-10-28 13:01:40 -07:00
Debargha Mukherjee
3ff8cb764b Merge "Fix aom_fdct8x8_ssse3 in high bit depth mode" into nextgenv2 2016-10-28 19:31:45 +00:00
Zoe Liu
782c96438c Clean the code in ref frame context decision for ext-refs
For compound mode, it is a sure thing that one of the 2 reference frames
would be either a forward predictive reference, or a backward predictive
reference, and the other would provide a different prediction.

Change-Id: I8d7b40525bec4db0f26ba255c8eefa9f20bd52a3
2016-10-28 12:23:38 -07:00
Urvang Joshi
76bc587f69 Merge "get_palette_color_context: Make code more readable." into nextgenv2 2016-10-28 19:03:26 +00:00
Thomas Davies
80188d1546 Encode and decode multiple tile groups
This is a manual adaptation of the following commit from aom/master:
ce12003d60a1c8d6c65ed07ba165c34062fcbcbd

The original commit message:

A tile group is a set of tiles in scan order.

Each tile group has a version of uncompressed and compressed headers,
identical apart from tile group parameters.
Encoding probability updates takes account of the number of
headers to control overheads.

The decoder supports arbitrary numbers of tile groups with
arbitrary number of tiles. The number of tiles in a TG is
signalled in the uncompressed header for that TG.

The encoder currently only supports a fixed number
of TGs (3, when error resilient mode is on) of equal size
(except possibly for the last one).

The average BDR performance with 3 tile groups versus
anchor with error resilient mode and up to 16 tiles is:

PSNR YCbCr:      3.02%      3.04%      3.05%
PSNRHVS:      3.09%
SSIM:      3.06%
MSSSIM:      3.05%
CIEDE2000:      3.04%

Change-Id: I9b97c5ed733103b9160a3a5d4370de5322c00c0b
2016-10-28 11:52:13 -07:00
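The fixed split described in 80188d1546 (a set number of tile groups of equal size, except possibly the last) could look roughly like the sketch below. The function tiles_in_group() is hypothetical, and rounding the group size up is an assumption; the commit message does not say how the encoder rounds.

```c
#include <assert.h>

/* Number of tiles in tile group `tg_index` when `num_tiles` tiles, taken in
 * scan order, are split into `num_tgs` groups of equal size, with the last
 * group taking whatever remains. */
static int tiles_in_group(int num_tiles, int num_tgs, int tg_index) {
  assert(num_tiles > 0 && num_tgs > 0);
  assert(tg_index >= 0 && tg_index < num_tgs);
  /* Assumed rounding: ceiling division for the common group size. */
  const int per_group = (num_tiles + num_tgs - 1) / num_tgs;
  const int start = tg_index * per_group;
  if (start >= num_tiles) return 0; /* more groups than needed */
  const int remaining = num_tiles - start;
  return remaining < per_group ? remaining : per_group;
}
```

With 16 tiles and 3 tile groups this gives groups of 6, 6 and 4 tiles. Under the scheme in the commit, each of these counts would be signalled in the uncompressed header of its tile group.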
Urvang Joshi
79f4fc476d get_palette_color_context: Make code more readable.
For clarity, use separate variables for 'color_ctx_hash' and
'color_ctx' instead of reusing same variables for both.

BUG=webm:1324

Change-Id: I3a516ea54353e1f0737822c613a68da252e30c6e
2016-10-28 09:42:05 -07:00
Angie Chiang
3655dcd4cf Fix tmp_rd type error in handle_inter_mode()
Change-Id: I9398c77c12e9c4caa19a76b92e3035a3135cfd7a
2016-10-28 09:05:27 -07:00
Angie Chiang
349b723f5c Merge "Add unit test for adapt_scan experiment" into nextgenv2 2016-10-28 15:53:59 +00:00
Angie Chiang
6b7255374d Merge "Pass block pixel width/height into av1_predict_intra_block" into nextgenv2 2016-10-28 15:51:30 +00:00
Jingning Han
cb277c0b82 Merge "Refactor recursive transform block partition search" into nextgenv2 2016-10-28 15:50:36 +00:00
Jingning Han
6675bbca0e Merge "Simplify logics in encode_inter_mb_segment" into nextgenv2 2016-10-28 15:50:15 +00:00
Jingning Han
fe8d6c62ce Merge "Refactor recursive transform block decoding" into nextgenv2 2016-10-28 15:49:27 +00:00
Jingning Han
c17b9e00dc Merge "Refactor recursive transform block size decoding" into nextgenv2 2016-10-28 15:49:06 +00:00
Jingning Han
73144260e3 Merge "Remove unused get_tx1d_width/height wrapper" into nextgenv2 2016-10-28 15:48:25 +00:00
Jingning Han
2b0670e10a Merge "Use transform block partition depth count for frame header reset" into nextgenv2 2016-10-28 15:48:11 +00:00
Yaowu Xu
2df83d5c10 Merge "Remove av1_use_hp_mv()" into nextgenv2 2016-10-28 15:26:35 +00:00
Yaowu Xu
c66f264d17 Merge "rans: Use symbol coding for motion vectors" into nextgenv2 2016-10-28 15:26:22 +00:00
David Barker
0602edfbc5 Fix aom_fdct8x8_ssse3 in high bit depth mode
Change-Id: I63e492163ef10e12a842837368c209b8ffc4eee0
2016-10-28 10:13:43 +01:00
Jingning Han
5822404485 Refactor recursive transform block partition search
Use unified transform block size and coding block size map. This
prepares for the integration of 2x2 transform block size and the
rectangular transform block size.

Change-Id: I99f51017d19aef337639b708ee9c7faedcc20935
2016-10-28 05:12:19 +00:00
Jingning Han
c4049db573 Simplify logics in encode_inter_mb_segment
Unify coefficient context used by different experiments. Make
block size and transform block size consistent with the rest of the codebase.

Change-Id: I237336f161d6c473b88c59c48ee68d24b75ce738
2016-10-28 05:12:05 +00:00
Jingning Han
5f61426424 Refactor recursive transform block decoding
Unify the transform block and coding block mapping.

Change-Id: Ifb394809a4aafee6adf2b49a2607036cf13c878e
2016-10-27 22:11:24 -07:00
Jingning Han
65abc314c4 Refactor recursive transform block size decoding
Unify the transform block size to block size mapping.

Change-Id: Ic7359d016cd5965983c4a5476624c09f3123f91c
2016-10-27 22:11:20 -07:00
Yaowu Xu
94df7ab121 Merge "Deringing support for 4:2:2 by not deringing chroma" into nextgenv2 2016-10-28 04:20:17 +00:00
Yaowu Xu
fbf8788d99 Merge "Namespace the idct/iad symbols" into nextgenv2 2016-10-28 04:19:51 +00:00
Angie Chiang
45c198a197 Pass block pixel width/height into av1_predict_intra_block
Change-Id: Ia69bceef24b61b0a222783eba79e7a70bb60edd8
2016-10-27 17:13:50 -07:00
Sarah Parker
243f87ef49 Merge "Cosmetic fixes in global motion experiment" into nextgenv2 2016-10-27 23:23:39 +00:00
Zoe Liu
b99af6e3e9 Merge "A small bug fix in ext-refs on the RD mode selection" into nextgenv2 2016-10-27 22:43:55 +00:00
Alex Converse
6317c88f5a Remove av1_use_hp_mv()
It always returns true since the related misc_fix[1] was merged.

[1] 23e83574b6a5105bdc686c49f2d5909f33ea721f

Change-Id: Ie3af685572a2f0a42d2b9fb9903c1abeea225dfd
2016-10-27 14:33:48 -07:00
Debargha Mukherjee
058e42d399 Merge "Fix clpf and dering signalling when used with ext-partition-types" into nextgenv2 2016-10-27 20:45:07 +00:00
Alex Converse
3fc98e86d1 rans: Use symbol coding for motion vectors
Change-Id: If497b53c3b36e32fb98c99dba2d4a490e226572a
2016-10-27 12:38:43 -07:00
Jean-Marc Valin
c67b895fa4 Deringing support for 4:2:2 by not deringing chroma
No change in output for 4:2:0 and 4:4:4

Change-Id: Ic46753d23a5b5f90b611a3da1a4574870519957c
2016-10-27 12:37:52 -07:00
Luca Barbato
f0f98578df Namespace the idct/iad symbols
Make linking to libvpx and libaom at the same time possible.

Change-Id: I7bab8527a32e446e3d564e6fa5d94ccd056bc63f
2016-10-27 12:36:37 -07:00
Debargha Mukherjee
a5e3bc0fbc Merge "Fix compile error with --enable-ans + --enable-accounting" into nextgenv2 2016-10-27 19:03:22 +00:00
Debargha Mukherjee
030527c54a Merge "Fix dering filter when using 4:2:2 or 4:4:0 subsampling" into nextgenv2 2016-10-27 19:03:04 +00:00
Jingning Han
d4c65cdba4 Remove unused get_tx1d_width/height wrapper
Change-Id: Ie8bc40579720b8c402bbc8b23b6fd3a7a50834bb
2016-10-27 18:49:45 +00:00
Jingning Han
2adcfb19d5 Use transform block partition depth count for frame header reset
Use the transform block partition depth counts to decide whether to
reset the tx_mode at frame header level. Add a comment to make this
explicit.

Change-Id: I417920b4b61eeb91cde9536336a12deea2d42f79
2016-10-27 18:49:32 +00:00
Angie Chiang
3f8419976d Add unit test for adapt_scan experiment
Change-Id: I90518b7b5c8bb930f5eeef4ce4cbb536139722ca
2016-10-27 11:43:10 -07:00
Angie Chiang
3d41cb339c Merge "Refactor: Localize tmp_rd in handle_inter_mode()" into nextgenv2 2016-10-27 18:27:18 +00:00
Angie Chiang
47d56f4f36 Merge "Sync definition of av1_get_switchable_rate in rd.c/h" into nextgenv2 2016-10-27 18:27:07 +00:00
Yaowu Xu
18ee02b0b9 Merge "Fix two bugs in parallel_deblocking experiment" into nextgenv2 2016-10-27 14:06:07 +00:00
Yaowu Xu
9edd6005fd Merge "fix filtering of uv int4x4 for odd rows" into nextgenv2 2016-10-27 14:05:52 +00:00
Yaowu Xu
d5723e6f09 Merge "Add parallel-deblocking experiment" into nextgenv2 2016-10-27 14:05:39 +00:00
David Barker
f8935c9c92 Fix clpf and dering signalling when used with ext-partition-types
Previously, when ext-partition-types and either clpf or dering were
enabled, the signalling for clpf/dering would not be encoded or decoded,
as the code to do so was inside a #if !CONFIG_EXT_PARTITION_TYPES block.
This caused many tests (eg, AV1/EndToEndTestLarge.EndToEndPSNRTest/0)
to fail with encode/decode mismatches.

Change-Id: If1742deb1812877813b2c3e93a048430f9a504ba
2016-10-27 13:19:01 +00:00
Jingning Han
199502f259 Merge "Support potential 2x2 transform block unit" into nextgenv2 2016-10-27 00:50:02 +00:00
Yaowu Xu
f6f2cfcaa7 Merge "av1/common/filter.h: apply clang-format" into nextgenv2 2016-10-26 23:43:02 +00:00
Yi Luo
400dcc8088 Merge "Fix aom_fdct32x32_avx2 output as CONFIG_AOM_HIGHBITDEPTH=1" into nextgenv2 2016-10-26 22:42:17 +00:00
Jingning Han
607fa6a6ce Support potential 2x2 transform block unit
Make the codec support 2x2 transform block unit for chroma components.

Change-Id: Ic454535bd5620abe88a2e99789160cc4664ee518
2016-10-26 15:38:13 -07:00
Jingning Han
b5a3082190 Merge "Synchronize tx_size counts in the decoder" into nextgenv2 2016-10-26 21:46:18 +00:00
Ryan Lei
6f8c1a78da Fix two bugs in parallel_deblocking experiment
This commit fixes two major bugs in parallel deblocking experiment, the
first one is missing initialization of lfm->lfl_uv array for horizontal
filtering. The second one is inconsistent order of vertical/horizontal
filtering of superblocks within a frame between encoder and decoder.

BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=45#c2
BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=53#c1

Change-Id: I2df7eb313d49203fb70efe2bdf957b9d7e0bf678
2016-10-26 13:42:31 -07:00
Sarah Parker
7ba8dc1688 Fix ubsan left shift warnings in warped motion library
Change-Id: I14f609664411577706dbe4c099d90f0cfe2f7bb3
2016-10-26 12:58:36 -07:00
Yi Luo
97b29925fe Merge "Fix incorrect merge of forward txfm function declarations" into nextgenv2 2016-10-26 19:22:53 +00:00
Sarah Parker
b3dab4983b Cosmetic fixes in global motion experiment
These are in response to post-commit suggestions made on
If429c93bb90b66fdff0edc07ecd9fc078077d303.

Change-Id: Id29afa158471bd6259bd07ac00812a50bfd0a709
2016-10-26 11:45:50 -07:00
Urvang Joshi
839b07feec Merge changes I56cddcb4,I40c5a652 into nextgenv2
* changes:
  Mark bogus palette color probabilities as zero
  get_palette_color_context: code cleanup
2016-10-26 18:28:56 +00:00
Jingning Han
906be078a5 Synchronize tx_size counts in the decoder
Make both encoder and decoder use depth index for frame count.

Change-Id: I96dddffc0a83ad5e4e2847b15391e01ba01ee502
2016-10-26 11:04:58 -07:00
Angie Chiang
180566d854 Merge "av1/convolve.[hc],av1_convolve_test: add missing copyright" into nextgenv2 2016-10-26 17:51:28 +00:00
Angie Chiang
8e26f768c1 Merge "Use has_subpel_mv_component in av1_is_interp_needed" into nextgenv2 2016-10-26 17:50:54 +00:00
Angie Chiang
65eb2cf78a Sync definition of av1_get_switchable_rate in rd.c/h
Change-Id: I720934e02a15fd6184bdda6c1b8a23d5b02a5284
2016-10-26 10:48:47 -07:00
Sarah Parker
70c0df29da Merge "Revise precision clamping in GM param refinement" into nextgenv2 2016-10-26 17:33:47 +00:00
Yi Luo
133c13d637 Fix incorrect merge of forward txfm function declarations
- Restore the fwd txfm HBD function declarations exposure.

Change-Id: I1e33df6297fd37e242f4b73c8ab97063b9feb7c6
2016-10-26 10:30:53 -07:00
Jingning Han
b0a7130656 Convert tx_size to relative depth to fetch tx_size_cost
Use the relative transform partition depth as index to fetch the
tx_size_cost value.

Change-Id: I7d5119817baa96f23c32828065ff3175bb9f75cf
2016-10-26 17:12:41 +00:00
Jingning Han
8e022edd59 Allow backward probability model update from tx_size=0
Replace enum items with range definitions.

Change-Id: Iba2b7cac657db5fb6177cb5c9e6f40ec0125d926
2016-10-26 17:12:20 +00:00
Jingning Han
e5596d3168 Merge "Add depth to tx_size mapper to bit-stream coding" into nextgenv2 2016-10-26 17:11:56 +00:00
Angie Chiang
c352e79ee6 Merge "Simplify interpolation filter search in handle_inter_mode()" into nextgenv2 2016-10-26 16:51:58 +00:00
Janne Salonen
e8a3dbc0ff fix filtering of uv int4x4 for odd rows
Change-Id: I61f91855430e11da45d4e91ec6d3a8976c461cb7
2016-10-26 09:26:28 -07:00
Ryan Lei
15149484ec Add parallel-deblocking experiment
This commit is a manual cherry-pick from aom/master:
42ff3881ace1564aac9debae86ef37a8deb8d381

Change-Id: I4a3cdb939b7b96a3aa27f6a00da7a0e73222f3f3
2016-10-26 09:20:47 -07:00
Yunqing Wang
e61ec7bc19 Merge "Change 2 motion search counts to be tile data" into nextgenv2 2016-10-26 16:17:42 +00:00
Yaowu Xu
5a1fedfdda av1/common/filter.h: apply clang-format
Change-Id: I37f0d1fbcc6f262ae287290e2e6f5648ad0113c8
2016-10-26 09:14:01 -07:00
Jingning Han
4e1737af64 Add depth to tx_size mapper to bit-stream coding
It serves as a helper function to integrate various transform coding
options.

Change-Id: I64e7d0c88ea10137fa1ff1072d865eb0054c2a25
2016-10-26 15:45:19 +00:00
Sarah Parker
f41a06b231 Revise precision clamping in GM param refinement
This ensures that the parameter refinement never
results in a motion parameter value that exceeds the number
of allotted bits in the bitstream. It accounts for all of
the necessary precision shifts required to make global motion compatible
with the warped motion library. It also accounts for the
zero-centering that is applied to global motion parameters that are
naturally centered around one.
Change-Id: If429c93bb90b66fdff0edc07ecd9fc078077d303
2016-10-25 21:11:39 -07:00
Jingning Han
c83ef8b946 Merge "Refactor transform size coding" into nextgenv2 2016-10-26 01:12:04 +00:00
Angie Chiang
a2b56d3e05 Refactor: Localize tmp_rd in handle_inter_mode()
Change-Id: I01cb5cd544c849be160a9441d141c01a3424d32b
2016-10-25 17:34:59 -07:00
Angie Chiang
b135debcb6 Use has_subpel_mv_component in av1_is_interp_needed
Change-Id: I8980df4512de605aaa6a67c1f05e544f69a12e96
2016-10-25 17:10:19 -07:00
Angie Chiang
75c2209341 Simplify interpolation filter search in handle_inter_mode()
BDRate
ext_interp  lowres -0.001%
dual_filter lowres  0.001%

Change-Id: Ic24165d554c300eaa0188ee8cb88d320b74125aa
2016-10-25 17:10:08 -07:00
Angie Chiang
6421191247 av1/convolve.[hc],av1_convolve_test: add missing copyright
Change-Id: Ie84bdf90c31b12977d32baacfc8086c1fdd96e65
2016-10-25 16:43:43 -07:00
Jingning Han
aae72a69c3 Refactor transform size coding
Introduce the transform block partition depth macro definition.

Change-Id: I218dc77a77c8e967da4d270d4ec0d7691b712a5f
2016-10-25 15:42:30 -07:00
Jingning Han
b2d6a59ad5 Merge "Refactor tx_size use case in block encoding stage" into nextgenv2 2016-10-25 22:29:21 +00:00
Jingning Han
2eded9a3ff Merge "Refactor tokenize_vartx to use aligned transform block size fetch" into nextgenv2 2016-10-25 22:29:03 +00:00
Yunqing Wang
8c1e57c278 Change 2 motion search counts to be tile data
Imported changes from VP9:
https://chromium-review.googlesource.com/#/c/402551/
https://chromium-review.googlesource.com/#/c/403128/

Change-Id: I8570c867190a6fa641926431ce97f7d9d7da3528
2016-10-25 15:25:37 -07:00
Jingning Han
a1730659ec Merge "Use table fetch for block width in block_rd_txfm" into nextgenv2 2016-10-25 22:18:44 +00:00
James Zern
8aa4cbf5d5 Merge "update_state: quiet const warning w/global-motion" into nextgenv2 2016-10-25 22:15:39 +00:00
Yi Luo
0c552dfd82 Fix aom_fdct32x32_avx2 output as CONFIG_AOM_HIGHBITDEPTH=1
- Change FDCT32x32_2D_AVX2 output parameter to tran_low_t.
- Add unit tests for CONFIG_AOM_HIGHBITDEPTH=1.
- Update TODO notes.
BUG=webm:1323

Change-Id: If4766c919a24231fce886de74658b6dd7a011246
2016-10-25 14:33:21 -07:00
Urvang Joshi
d650f276ce Vertical scalers: Use signed int for src/dst pitch in parameters.
This avoids explicitly casting them to 'int' later.
These methods were already called with signed int arguments for pitch,
so this also avoids int -> unsigned int -> int conversion.

Change-Id: I2129f5ceff8f2525a188ee3ae52f9fe7067bd2e3
2016-10-25 13:00:22 -07:00
Angie Chiang
df70d29b72 Merge "Fix unsigned type error in gen_scaler.c" into nextgenv2 2016-10-25 19:43:09 +00:00
Yaowu Xu
15c37a5ae3 Merge "dkboolwriter.c: change copyright notice" into nextgenv2 2016-10-25 19:41:18 +00:00
Yaowu Xu
c2ac0a1d4c Merge "7-bit interpolation filters" into nextgenv2 2016-10-25 19:41:07 +00:00
Yaowu Xu
dece603fdf Merge "Use constrained tokenset with --enable-daala_ec." into nextgenv2 2016-10-25 19:40:51 +00:00
Jingning Han
e71ad1d4a2 Merge "Refactor dist_block() function" into nextgenv2 2016-10-25 19:39:22 +00:00
Angie Chiang
7c7e555ca0 Merge changes I6faedb29,Ic6586114 into nextgenv2
* changes:
  Remove speed feature of ext_interp experiment
  Refactor: handle_inter()
2016-10-25 19:36:49 +00:00
Jingning Han
de953b9d05 Refactor tx_size use case in block encoding stage
Change-Id: I56110d1fc94b335668e6b991442e9083bbaea8ee
2016-10-25 12:36:09 -07:00
Jingning Han
a893936335 Refactor tokenize_vartx to use aligned transform block size fetch
This prepares for the integration of rectangular transform size
into recursive transform block partition.

Change-Id: I164eb43d10afa9bb2f4722de7a48faa770ba4ced
2016-10-25 12:16:21 -07:00
Jingning Han
99e7a8d837 Merge "Refactor tx_size use cases in blockd.c" into nextgenv2 2016-10-25 19:03:29 +00:00
Jingning Han
c598cf853f Use table fetch for block width in block_rd_txfm
Make direct use of block_size_wide to fetch data for stride.

Change-Id: I0d8491e58cf00ea73c764d218cb56408b64d9ee7
2016-10-25 10:47:46 -07:00
Yaowu Xu
b695b1c118 dkboolwriter.c: change copyright notice
Change-Id: I1d9349a07ffd85991fc5673354d3ceff3404b358
2016-10-25 10:32:33 -07:00
Jingning Han
b9c572706d Refactor dist_block() function
Support automatic scale for mapping between transform block size
and pixel block size.

Change-Id: I141b0477a85c0dcc5f99b4e5d880cfccfae6d316
2016-10-25 10:22:17 -07:00
Arild Fuldseth
7acfabbc40 7-bit interpolation filters
Purpose:
-Reduce dynamic range of interpolation filter coefficients from 8
bits to 7 bits.
-Inner product for 8-bit input data can be stored in a 16-bit signed
integer.

Impact on compression efficiency:
-Marginal improvement, typically less than 0.5% BDR.

Change-Id: I58d1408307ae7d2a6f9de8965c5877b258703199
2016-10-25 10:18:55 -07:00
Yaowu Xu
1f112841d2 Merge "Refactor extrabits packing" into nextgenv2 2016-10-25 17:14:57 +00:00
Yaowu Xu
d8dc1fc522 Merge "Linearize extrabits writing." into nextgenv2 2016-10-25 17:14:44 +00:00
Nathan E. Egge
46e8490498 Use constrained tokenset with --enable-daala_ec.
Change-Id: Ia09edf92bf9f7ecacc65c232ac6e656cde236634
2016-10-25 10:13:22 -07:00
Jingning Han
95cff5c979 Refactor tx_size use cases in blockd.c
Use table to replace the arithmetic computation for mapping between
transform block and pixel number. Support automatic scale of block
size and transform block size.

Change-Id: I84766850172265d4295f418383dbc5e6e5838ec8
2016-10-25 09:50:07 -07:00
Debargha Mukherjee
7f9eb87082 Merge "Fix compile error with --enable-accounting" into nextgenv2 2016-10-25 16:25:42 +00:00
Angie Chiang
d35e12b184 Merge "Refactor: Add macro LOG_SWITCHABLE_FILTERS" into nextgenv2 2016-10-25 16:24:23 +00:00
Angie Chiang
d0aa90ed79 Remove speed feature of ext_interp experiment
This is to facilitate the refactor process

Change-Id: I6faedb29129b47abefe20821dc3f32a43db149d8
2016-10-25 09:22:35 -07:00
Angie Chiang
6305abe114 Refactor: Add macro LOG_SWITCHABLE_FILTERS
Change-Id: I7593ff2f8949d8bc26ca1c8577faaefb09640b59
2016-10-25 09:22:35 -07:00
Angie Chiang
1b131f1c64 Refactor: handle_inter()
Make the parentheses symmetric
Replace interpolation filter mode number by macro

Change-Id: Ic6586114c4cebe920b950e1b3adc8ebc764d4713
2016-10-25 09:22:35 -07:00
Debargha Mukherjee
f8038850b6 Merge "Fix to make intra_only frames decodable out of order" into nextgenv2 2016-10-25 16:21:20 +00:00
Angie Chiang
dc1813ffd9 Fix unsigned type error in gen_scaler.c
Avoid applying unary minus operator on unsigned type

Change-Id: Ibc60541837eef06810f5be0aaa7fef9edcc8f8a4
2016-10-25 09:18:22 -07:00
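The unary-minus problem described above can be sketched as follows. This is a minimal illustration, not the code from gen_scaler.c; the helper names `negate_pitch` and `negate_unsigned` are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: negate a pitch that is held in an unsigned variable
 * by converting to a signed type first, so the result is truly negative. */
static int negate_pitch(unsigned int pitch) {
  assert(pitch <= (unsigned int)INT_MAX); /* value must fit in int */
  return -(int)pitch;
}

/* For contrast: negating while still unsigned wraps modulo UINT_MAX + 1.
 * `0u - pitch` yields the same value as `-pitch` on an unsigned operand. */
static unsigned int negate_unsigned(unsigned int pitch) {
  return 0u - pitch;
}
```

Converting before negating keeps the arithmetic in the signed domain, which is the behaviour pointer-offset code generally expects.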
Yaowu Xu
4579c5e458 Merge "update_state_supertx: rename a shadowed variable" into nextgenv2 2016-10-25 16:04:33 +00:00
Yaowu Xu
d971eb8521 Merge "Move small fixes and refactoring for obmc pred from AV1" into nextgenv2 2016-10-25 16:03:47 +00:00
Yaowu Xu
3635a832ab Merge "adapt_scan experiment" into nextgenv2 2016-10-25 16:03:40 +00:00
Alex Converse
d8fdfaa4da Refactor extrabits packing
- Eliminate the awkward _av1 suffix/infix in local variable names.
- Lift bitdepth selection out of the token loop.

Change-Id: I26d3397464f7808e0481a804033a93ca4f01f5d5
2016-10-25 08:59:08 -07:00
Alex Converse
81fd890773 Linearize extrabits writing.
The decoder is already linear so changing these tables would just create
a mismatch.

Change-Id: Ib888c0dc273e089c38298f569bb35b6e4c32dd60
2016-10-25 08:59:08 -07:00
Thomas Daede
8ec53b2655 Automatically upgrade profile to match input chroma subsampling.
This is a follow-up to 1195a396f6c53a5bd35559eed957c2aac855f7e.

Change-Id: I4db554e5d88290d55258062e29a1652707d91037
2016-10-25 08:56:55 -07:00
Yaowu Xu
2b33aa903f Remove select_tx_size from struct macroblock
It is no longer used anywhere.

Change-Id: I5d40664373d66821c5382f6155302b8719ce58c0
2016-10-25 08:56:55 -07:00
Guillaume Martres
4e4d3a075b Avoid unnecessary reencode in choose_largest_tx_size
This change is similar to the one done for choose_tx_size_from_rd in
daf841b4a10ece1b6831300d79f271d00f9d027b

It gives a 4% speed-up on bus_cif.y4m with the following settings:
--cpu-used=4 -p 1 --end-usage=q --cq-level=40 --tile-columns=0 --tile-rows=0

Change-Id: Ic54fe4a066a2c0b5f6349d80cd13de8bb8ddcabc
2016-10-25 08:56:55 -07:00
Brennan Shacklett
d4add7aca9 Remove encode_breakout and related speed features
Seems to be dead code

Change-Id: I17b3edc9e82d6a1da172a686522358a6b1a630e9
2016-10-25 08:56:55 -07:00
David Barker
01b16baa5a Fix compile error with --enable-ans + --enable-accounting
Change-Id: I43deba9c80b324c12852750d08c62dc2dd783835
2016-10-25 16:22:24 +01:00
David Barker
d971f40bcc Fix compile error with --enable-accounting
Change-Id: I4b18dbfb013c9805cb23083a68560ab212a0867a
2016-10-25 13:52:07 +01:00
David Barker
401204a50b Fix dering filter when using 4:2:2 or 4:4:0 subsampling
Change-Id: Ifa5bef5123e13df9cad59c7c870b58e18c2ce213
2016-10-25 12:54:59 +01:00
Peter de Rivaz
9d07888350 Fix to make intra_only frames decodable out of order
last_frame_type is not well defined for intra_only frames
if we are decoding them out of order.
This change removes a dependency on last_frame_type for these frames.

Change-Id: I440cac68792714de222e192a0b3e75f6e1aa5e4b
2016-10-25 10:19:57 +01:00
Sarah Parker
4b4e5eefe3 Merge "Extend warp_frame functions to average compound predictions" into nextgenv2 2016-10-25 02:00:48 +00:00
Angie Chiang
ed8cd9a9b4 adapt_scan experiment
Performance improvement
        BDRate
lowres  0.921%
midres  0.730%
hdres   1.019%

Change-Id: I26208d6c0531937bff44de505b4ea355c7852802
2016-10-24 18:24:56 -07:00
Alex Converse
f8306bfdc7 Mark bogus palette color probabilities as zero
It's clearer on inspection that the zero probabilities are unused.

Cherry-picked from aomedia/master: 8134db1

Change-Id: I56cddcb41ba256b7bb921d6a8538405165566dfb
2016-10-24 18:11:59 -07:00
Urvang Joshi
7bc1fa194d Merge changes I2153c57e,I0e291edd into nextgenv2
* changes:
  Palette: Generate encodings automatically from tree.
  Palette + Ext-Intra: shadowed declaration fix.
2016-10-25 01:06:28 +00:00
Urvang Joshi
4f4b68e245 get_palette_color_context: code cleanup
consts, comments and other small readability improvements.

Change-Id: I40c5a652811a796fdb91dc7ca6b108e8871f72d1
2016-10-24 18:03:09 -07:00
Yue Chen
cf6caf7a0c Merge "Fix bugs in SUB8X8_MC" into nextgenv2 2016-10-24 23:16:09 +00:00
Jingning Han
e8a17ba34e Merge "Refactor tx_size to pixel number mapping in reconintra.c" into nextgenv2 2016-10-24 22:24:04 +00:00
Jingning Han
61a50f73cf Merge "Simplify variable defs in av1_encode_block_intra" into nextgenv2 2016-10-24 22:23:59 +00:00
Jingning Han
8d6eaec1d7 Merge "Refactor av1_predict_intra_block tx_size interface" into nextgenv2 2016-10-24 22:23:40 +00:00
Jingning Han
9b0406454d Merge "Add block size in pixels lookup table" into nextgenv2 2016-10-24 22:23:36 +00:00
Angie Chiang
7e213aab0a Merge "Fix unsigned type error in aom_scale.c" into nextgenv2 2016-10-24 21:41:18 +00:00
Urvang Joshi
0b325978d7 Palette: Generate encodings automatically from tree.
Ran some manual sanity checks:
- Verified that the automatically generated encodings match the
  hand-written encodings before the patch.
- Verified that the encoded bitstream before/after this patch is
  identical.

Change-Id: I2153c57e463cff09c1d03d619b432fb1015199c3
2016-10-24 14:37:25 -07:00
Yue Chen
894fcceb87 Move small fixes and refactoring for obmc pred from AV1
Covering commits 1c263e0 and 79d8a07 from AOM codebase

Change-Id: I6400e5f99bbb2ef6584ef232d465e520230c06e0
2016-10-24 14:14:47 -07:00
Urvang Joshi
626591dfa1 Palette + Ext-Intra: shadowed declaration fix.
This shadowed declaration warning was generated when both experiments
are on.

Change-Id: I0e291eddeefabd68c5c3a0e5f8ac87706a82d55a
2016-10-24 14:13:55 -07:00
Jingning Han
7f76d4763d Prevent potential token buffer overflow in format 444
For a 16x16 pixel block, one needs to allocate 16x16 coefficient
tokens, plus up to 16 eob tokens, per plane. This commit increases
the token allocation size to cover the case where all the transform
blocks are of size 4x4 in format 444.

Change-Id: I5755e6a53771053d51163d01ec1d62e670c5009e
2016-10-24 14:08:34 -07:00
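The token-count arithmetic in the message above can be worked through with a small sketch. The function names and the `NUM_PLANES_444` constant are assumptions made for illustration; they are not taken from the encoder.

```c
#include <assert.h>

/* Assumption: in 4:4:4 all three planes (Y, U, V) are full resolution. */
#define NUM_PLANES_444 3

/* Worst-case tokens for one plane of a block: one coefficient token per
 * pixel, plus one end-of-block (EOB) token per transform block. */
static int max_tokens_per_plane(int block_w, int block_h, int min_tx) {
  int coeff_tokens;
  int eob_tokens;
  assert(min_tx > 0 && block_w % min_tx == 0 && block_h % min_tx == 0);
  coeff_tokens = block_w * block_h;
  eob_tokens = (block_w / min_tx) * (block_h / min_tx);
  return coeff_tokens + eob_tokens;
}

/* Worst case across all planes when no plane is subsampled. */
static int max_tokens_444(int block_w, int block_h, int min_tx) {
  return NUM_PLANES_444 * max_tokens_per_plane(block_w, block_h, min_tx);
}
```

For a 16x16 block split entirely into 4x4 transforms this gives 256 + 16 = 272 tokens per plane, or 816 across three planes.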
Thomas Daede
c0dca3c507 Automatically set internal bit depth to at least the input bit depth.
Upgrade profile if required.

Change-Id: Ieb2b77d2446290a8fc749739247a01e8f0600c55
2016-10-24 14:08:34 -07:00
Jingning Han
63632447ae Merge "Add MAX_VARTX_DEPTH macro" into nextgenv2 2016-10-24 21:01:29 +00:00
Jingning Han
e98c4a10e5 Merge "Simplify the recursive transform block decoding" into nextgenv2 2016-10-24 21:01:17 +00:00
Yaowu Xu
1ca24708a0 Merge "Correct data size estimation for odd size video" into nextgenv2 2016-10-24 20:57:55 +00:00
Debargha Mukherjee
0c78ebb22d Merge "Fix a bug when combining new-quant + supertx" into nextgenv2 2016-10-24 19:53:42 +00:00
Yue Chen
edd2915e21 Fix bugs in SUB8X8_MC
Change-Id: Ia544974f83c6b7f9cdb148eeb13a6d0c6eb4ed24
2016-10-24 12:22:59 -07:00
Yaowu Xu
7e87bef0ff Merge "Increase min size of compressed data" into nextgenv2 2016-10-24 19:21:45 +00:00
Yaowu Xu
23fb2feaa5 Merge "Avoid the use of uninitialized value in ActiveMap encoding route" into nextgenv2 2016-10-24 19:21:29 +00:00
Angie Chiang
10ab157a53 Fix unsigned type error in aom_scale.c
Avoid unary minus operator applied to unsigned type

Change-Id: I6986cd2b0ea236e0129ee94c02275593c287a87d
2016-10-24 11:51:50 -07:00
Yaowu Xu
10d9627ffe Merge "Use the actual inter prediction filter buffer in DRL" into nextgenv2 2016-10-24 18:34:29 +00:00
Yaowu Xu
932ca0ece2 Merge "fdct4x4_test: fix unsigned overflow" into nextgenv2 2016-10-24 18:25:55 +00:00
Debargha Mukherjee
bbd9705802 Merge "Add bit accounting information for deringing" into nextgenv2 2016-10-24 18:14:31 +00:00
Yaowu Xu
02be3ee60a Merge "Use remove some magic numbers in aom_rans_merge_prob8_pdf." into nextgenv2 2016-10-24 18:10:32 +00:00
Jingning Han
d89c72e997 Refactor tx_size to pixel number mapping in reconintra.c
Change-Id: Id66a14a869df8317c5bbb693d14262326fe84206
2016-10-24 11:07:46 -07:00
Jingning Han
62a2b9e197 Simplify variable defs in av1_encode_block_intra
Use direct table access to fetch the block size and transform size
in pixels.

Change-Id: Ia0093d5aed912be24996a06b0567bb2d873ec068
2016-10-24 11:07:27 -07:00
Jingning Han
c4c99da925 Refactor av1_predict_intra_block tx_size interface
Simplify the input arguments. Make direct use of the block size
in the unit of pixels.

Change-Id: Ifec9d90b4b4fa9605f93b4f93b8242f76f898b5f
2016-10-24 11:06:23 -07:00
Yaowu Xu
abc7d81b40 Correct data size estimation for odd size video
Given that the largest transform size is 32x32, this commit bases the size
estimation on dimensions rounded up to multiples of 32, to avoid
insufficient buffer allocations.

BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=36

Change-Id: I6eab09dc6acdc0f5a6bcadb918d62c4852aae21f
2016-10-24 10:46:32 -07:00
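Rounding a dimension up to a multiple of 32 can be sketched as below. The helper name `align_up_32` is hypothetical, and the actual change may compute the estimate differently.

```c
/* Round a frame dimension up to the next multiple of 32 (the largest
 * transform size named in the commit message). Works because 32 is a
 * power of two: adding 31 and clearing the low five bits rounds up. */
static unsigned int align_up_32(unsigned int size) {
  return (size + 31u) & ~31u;
}
```

With this, a 33-pixel-wide frame is budgeted as 64 pixels wide, so a 32x32 transform at the right edge still fits in the allocation.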
Jingning Han
571189c66d Add MAX_VARTX_DEPTH macro
Change-Id: I85532cf88f91f0f0cb4d9cb4b2dbda8a181297bf
2016-10-24 10:38:43 -07:00
Yaowu Xu
416b0d94de Increase min size of compressed data
This commit increases the minimum size for allocated buffer for
compressed data. The old size underestimated the size needed for
small images with width or height less than 64 pixels.

BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=31

Change-Id: Ia12507edc2be1e737ec49c32f64fd2ebf1eab41f
2016-10-24 09:56:09 -07:00
David Barker
d7d78c83e5 Fix a bug when combining new-quant + supertx
Previously, we assumed that av1_init_plane_quantizers is always called with
segment_id == xd->mi[0]->mbmi.segment_id (and use the latter to derive the value
of 'qindex' to use in the quantizer). But this is no longer true when supertx
is enabled. This patch instead remembers the value of 'qindex' derived from
the latest call to av1_init_plane_quantizers and uses that directly.

Change-Id: Ifa1c5bf74cad29942ff79b88ca92c231bc07f336
2016-10-24 17:43:51 +01:00
Jingning Han
6408895e69 Avoid the use of uninitialized value in ActiveMap encoding route
This commit resets the transform size to be the maximum possible
value. It avoids out-of-boundary writing when the ActiveMap is
turned on.

Change-Id: I8302dd9a5c9fffaea3edf9ad33f72aa111999737
2016-10-24 09:41:40 -07:00
Angie Chiang
91072e982f Merge "Align frame contexts." into nextgenv2 2016-10-24 16:36:36 +00:00
Jingning Han
72120969bc Use the actual inter prediction filter buffer in DRL
This avoids an encoding segmentation fault in speed 5, due to the
use of uninitialized dummy inter prediction filter buffer in the
dynamic motion vector referencing scheme.

Change-Id: Icd888d46623e8abf34267838135eed8656d552e4
2016-10-24 09:32:41 -07:00
Yaowu Xu
59b969daae fdct4x4_test: fix unsigned overflow
The difference between src and dst will be signed; the error will be
unsigned. The change quiets -fsanitize=integer:
    unsigned integer overflow: 4294967295 * 4294967295

Change-Id: I131cefcc9583ee8a5b98eb5182fd30e9c7237ea0
2016-10-24 09:21:55 -07:00
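The overflow reported above, 4294967295 * 4294967295, is what results when a difference of -1 is held in a 32-bit unsigned variable and then squared. A minimal sketch of the signed-difference pattern follows; the function name and the 8-bit sample type are assumptions, not the test's actual code.

```c
#include <stdint.h>

/* Squared error between two samples. The difference is taken in a signed
 * type, so a negative difference stays small in magnitude; only the
 * non-negative square is stored as unsigned. */
static uint32_t squared_error(uint8_t src, uint8_t dst) {
  const int diff = (int)src - (int)dst; /* range: -255..255 */
  return (uint32_t)(diff * diff);       /* at most 65025, no overflow */
}
```

Had `diff` been declared as `uint32_t`, `src = 1, dst = 2` would give 4294967295, and squaring it is the wraparound the sanitizer flagged.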
David Barker
95e248e7d7 Add bit accounting information for deringing
It seems that when bit accounting was introduced in
https://chromium-review.googlesource.com/#/c/400658/
there was one place which was accidentally skipped, leading to build failures
with --enable-dering. This patch adds the missing information.

Change-Id: I59e1bd6f7e1d4fa58506ee7af307b845c78a7cbe
2016-10-24 16:14:50 +00:00
Alex Converse
8db9faefe8 Use remove some magic numbers in aom_rans_merge_prob8_pdf.
Change-Id: I0cefae17642d7adf1b9bd637ecb81b437629aa0c
2016-10-24 09:05:03 -07:00
Jingning Han
421af3538d Merge "Limit the transform block partition depth" into nextgenv2 2016-10-24 15:57:28 +00:00
Jingning Han
bb22bbf01d Merge "Allow frame level tx_mode switch" into nextgenv2 2016-10-24 15:57:17 +00:00
Jingning Han
57d093793e Merge "Separate intra and inter tx_size counting" into nextgenv2 2016-10-24 15:56:33 +00:00
Jingning Han
a59d71a678 Make set context function aware of rectangular transform block size
Account for the rectangular transform block size in setting the
context data.

Change-Id: Ic30a6a3eaaca4c945e0aab3acbaeb99aa48b0064
2016-10-23 17:46:42 -07:00
Yaowu Xu
d30a563d23 Merge "Add a runtime flag to enable bit accounting." into nextgenv2 2016-10-23 03:15:37 +00:00
Yaowu Xu
d9301c7eb6 Merge "Add a decoder control to retrieve accounting data." into nextgenv2 2016-10-23 03:15:21 +00:00
James Zern
af322e1d71 update_state: quiet const warning w/global-motion
+ add a TODO as this is incompatible with tile-based threading

Change-Id: I057c551a5f19020366c6b85c2e67e8394bb3306f
2016-10-22 12:46:43 -07:00
James Zern
9ca190c690 update_state_supertx: rename a shadowed variable
Change-Id: I0e5fa71a4b7cd03c9e28b434b1ea72b090ca6772
2016-10-22 12:42:43 -07:00
Jingning Han
958077ab0b Merge "Fix comment typo in common_data.h" into nextgenv2 2016-10-22 03:41:05 +00:00
Jingning Han
8dbf0fd6f7 Merge "Refactor tx_size to pixel number mapping in reconintra.c" into nextgenv2 2016-10-22 03:40:53 +00:00
Jingning Han
60e6516e26 Merge "Refactor tx_size step use cases in decoder" into nextgenv2 2016-10-22 03:40:43 +00:00
Jingning Han
88b198d84b Merge "Replace tx_size_1d with tx_size_wide/high" into nextgenv2 2016-10-22 03:40:32 +00:00
Yi Luo
62b6cc0bc9 Merge "Fix avx2 16x16/32x32 fwd txfm coeff output on HBD" into nextgenv2 2016-10-22 01:46:09 +00:00
Yaowu Xu
c06feefbde Merge "Fix compiler warning when CONFIG_ACCOUNTING enabled." into nextgenv2 2016-10-22 01:18:35 +00:00
Jingning Han
bd161f9f6d Merge "Refactor decoder side qcoeff reset" into nextgenv2 2016-10-22 01:07:32 +00:00
Sarah Parker
43d56f32e5 Extend warp_frame functions to average compound predictions
Change-Id: I400e95161d576510423880b5b9923a2307b5eb02
2016-10-21 17:18:48 -07:00
Angie Chiang
a5d96c4a65 Align frame contexts.
This will allow for aligned cdfs and scan orders inside.

Change-Id: I8ebcd64d55e41da20f518a39ae6ef192def70109
2016-10-21 17:15:07 -07:00
Angie Chiang
a1a753c765 Run clang-format on entropymv.c
Change-Id: Ic9f34e32e51f8a8a4426543bae0b92f5fab0792e
2016-10-21 17:13:59 -07:00
Jingning Han
c47fe6c64b Add block size in pixels lookup table
This prepares for the next refactoring to support 2x2 transform
block sizes.

Change-Id: Ia06bc487da34e853ef9323cd13e3d482e819db43
2016-10-21 16:47:08 -07:00
Jingning Han
e7230e9f20 Fix comment typo in common_data.h
"varios" -> "various"

Change-Id: If91a462dc009f701c48c2cfd7965cd71f61f2970
2016-10-21 16:30:10 -07:00
Nathan E. Egge
eb64fc28b6 Add a runtime flag to enable bit accounting.
By default, when building with --enable-accounting the bit accounting
 code will collect statistics for every frame while decoding.
Collecting statistics can slow down decode time and we would eventually
 like to enable the CONFIG_ACCOUNTING flag by default.
This patch adds a runtime flag so that bit accounting statistics are
 only collected when actually needed.

Change-Id: I25d9eaf26ea132d61ace95b952872158c9ac29e7
2016-10-21 23:12:50 +00:00
Nathan E. Egge
c9862e05f5 Add a decoder control to retrieve accounting data.
This decoder control requires AV1 to be compiled with --enable-accounting.
Note that bit accounting data is only available after a frame has been
 decoded.

Change-Id: I8a15213d9f2587638e0edb62932738e985160e03
2016-10-21 16:12:01 -07:00
Nathan E. Egge
ebbd479e18 Fix compiler warning when CONFIG_ACCOUNTING enabled.
ISO C90 forbids mixed declarations and code and the function
 aom_accounting_set_context() was being called before the MB_MODE_INFO
 declaration.

Change-Id: I8619525b1b2fd37753891bd310d9d59c881b8807
2016-10-21 22:57:23 +00:00
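The C90 rule mentioned above requires every declaration in a block to precede the first statement. A minimal sketch, with `set_context_stub` as a hypothetical stand-in for a call such as aom_accounting_set_context():

```c
/* Hypothetical stand-in for an accounting call; it only counts calls. */
static int context_calls = 0;

static void set_context_stub(int row, int col) {
  (void)row;
  (void)col;
  ++context_calls;
}

static int decode_block_stub(int row, int col) {
  int mode_info;              /* declarations come first under C90 */
  set_context_stub(row, col); /* statements follow all declarations */
  mode_info = row * 8 + col;
  return mode_info;
}
```

Placing the call before `int mode_info;` is what triggers the "ISO C90 forbids mixed declarations and code" warning under compilers run in strict C90 mode.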
Nathan E. Egge
5f34b61903 Update class0_fp_cdf and fp_cdf tables once per frame.
Move computing the class0_fp_cdf and fp_cdf tables per coded mv
 symbol to computing it only when the probabilities are updated.

Change-Id: Ib4957c8ab21e6189bcc3817a07b7681dfb343223
2016-10-21 22:56:41 +00:00
Nathan E. Egge
d7b893c667 Update class_cdf table once per frame.
Move computing the class_cdf table per coded mv class symbol to
 computing it only when the probabilities are updated.

Change-Id: I6c4a9075817e8ba2e251f0e82436995f08f2ec5c
2016-10-21 22:55:54 +00:00
Nathan E. Egge
5f7fd7ab5e Update joint_cdf table once per frame.
Move computing the joint_cdf table per coded mv joint symbol to
 computing it only when the probabilities are updated.

Change-Id: If5d195f70e6fad7b60f69606c8386ad5e69657d2
2016-10-21 22:55:31 +00:00
Nathan E. Egge
6ec4d10d3c Update inter_mode_cdf tables once per frame.
Move computing the inter_mode_cdf tables per coded inter mode symbol to
 computing them only when the probabilities are updated.

Change-Id: I7a7b059ee75723cb6f278ed82a20cf34c27915d8
2016-10-21 22:54:50 +00:00
Yaowu Xu
b808b43b36 Merge "Update uv_mode_cdf tables once per frame." into nextgenv2 2016-10-21 22:53:42 +00:00
Jingning Han
94d5bfccdd Limit the transform block partition depth
Limit the recursive transform block partition depth to 2. For a
32x32 transform block unit, one can maximally go down to 8x8 transform
block size.

Change-Id: I2caa92bb2eee64762b7ecca8920259f7c50fb0aa
2016-10-21 15:44:34 -07:00
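The depth limit above amounts to halving the transform size at most twice. A small sketch, assuming the MAX_VARTX_DEPTH macro added in a later commit holds the value 2 (the value is inferred from this message, not confirmed) and using a hypothetical function name:

```c
/* Assumed value, inferred from "partition depth to 2". */
#define MAX_VARTX_DEPTH 2

/* Smallest square transform size (in pixels) reachable from a given
 * starting size when each recursive split halves width and height. */
static int smallest_tx_size(int largest_tx_size) {
  int size = largest_tx_size;
  int depth;
  for (depth = 0; depth < MAX_VARTX_DEPTH; ++depth) size >>= 1;
  return size;
}
```

Starting from 32x32, two splits give 16x16 and then 8x8, matching the limit stated in the message.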
Yaowu Xu
e86df524b9 Merge "Update y_mode_cdf tables once per frame." into nextgenv2 2016-10-21 22:44:09 +00:00
Yaowu Xu
a82712b2a6 Merge "Update kf_y_mode_cdf tables once per frame." into nextgenv2 2016-10-21 22:43:57 +00:00
Jingning Han
9777afc392 Allow frame level tx_mode switch
Check the encoding statistics. If all the coding blocks use the
max transform size, skip transform size coding in the frame header.

Change-Id: I31cb16314e87f945d7e95a34a90a5536b3ed82d5
2016-10-21 15:42:50 -07:00
Jingning Han
dc9ad312be Separate intra and inter tx_size counting
Skip counting the inter transform block size distribution for
the intra transform block size coding.

Change-Id: Ifad9d843f57d069d0619a54d66ca18101e1b69f1
2016-10-21 15:40:18 -07:00
Jingning Han
8fd62b75c1 Simplify the recursive transform block decoding
Remove unneeded block index.

Change-Id: Ifceab4985d3ccd65d4c0a110de83a0b457ce5868
2016-10-21 15:31:21 -07:00
Jingning Han
22daaa3aea Refactor tx_size to pixel number mapping in reconintra.c
Change-Id: I1e4a43f5f08b76867240a207c60d7e85a8ffbb74
2016-10-21 15:25:17 -07:00
Jingning Han
2d64f12595 Refactor tx_size step use cases in decoder
Use lookup table to replace the arithmetic computation for transform
block step.

Change-Id: I1318d81bda9d7ffaf9d550acd19354b0615ede36
2016-10-21 15:22:12 -07:00
Jingning Han
5d5cd6a748 Replace tx_size_1d with tx_size_wide/high
This prepares support for both rectangular and 2x2 transform
block sizes.

Change-Id: I3c2d4e317f6b627bb45d2273c278331bd976ee92
2016-10-21 15:18:39 -07:00
Jingning Han
1be1878572 Refactor decoder side qcoeff reset
Allow the decoder to memset partial dequantized coefficient line
to zero.

Change-Id: I1f07dc7bf802958754502c1b5c819cc81e7a08cb
2016-10-21 15:10:23 -07:00
Yaowu Xu
dc3c3a33cb Merge "Pass AV1_COMMON into get_scan" into nextgenv2 2016-10-21 21:51:50 +00:00
Yaowu Xu
b11ee30519 Merge "Decoder performance improvement with daala_ec." into nextgenv2 2016-10-21 21:50:13 +00:00
Sarah Parker
1634b48022 Merge "Fix logical vs bitwise & bug" into nextgenv2 2016-10-21 21:16:54 +00:00
Yi Luo
1a0f27aaa6 Fix avx2 16x16/32x32 fwd txfm coeff output on HBD
Change-Id: Ida036defe5688894a63007a31aa2dd0b3f0b5d59
2016-10-21 14:14:00 -07:00
Jingning Han
dc90bf0737 Merge "Fix unused variable error in intrapred.c" into nextgenv2 2016-10-21 21:11:31 +00:00
Jingning Han
823411ea4d Merge "Refactor tx_size to pixel number in decodeframe.c" into nextgenv2 2016-10-21 19:39:22 +00:00
Nathan E. Egge
380cb1a93c Update uv_mode_cdf tables once per frame.
Move computing the uv_mode_cdf tables per coded intra mode symbol to
 computing them only when the probabilities are updated.

Change-Id: I627b59d30726c913f5d7ba7753cb0446a12655bb
2016-10-21 12:39:04 -07:00
Nathan E. Egge
5710c722af Update y_mode_cdf tables once per frame.
Move computing the y_mode_cdf tables per coded intra mode symbol to
 computing them only when the probabilities are updated.

Change-Id: I8c43d09b8ef5febe2a3ec64bd51d28bd78ea73ed
2016-10-21 12:39:04 -07:00
Nathan E. Egge
3ef926edc2 Update kf_y_mode_cdf tables once per frame.
Move computing the kf_y_mode_cdf tables per coded intra mode symbol to
 computing them only when the probabilities are updated.

Change-Id: I5999447050c2f7d5dbccde80bee05ecd1c5440ab
2016-10-21 12:39:04 -07:00
Nathan E. Egge
5357dcaf71 Decoder performance improvement with daala_ec.
Cherry-pick Daala b5020bee:
 Remove redundant test in od_ec_decode_bool_q15().
Using a test that decodes 100M random binary symbols, making this change
 produced a speed up of 8.81% with gcc-4.9.3 and 3.71% with clang-3.7.1,
 both compiled with -O2.

Change-Id: If6d0077a56121a575ae53bcd4d1d9b7d800a317d
2016-10-21 12:38:30 -07:00
Yaowu Xu
91219941b1 Merge "Use divide by multiply in the ans writer." into nextgenv2 2016-10-21 18:46:29 +00:00
Angie Chiang
ff6d890557 Pass AV1_COMMON into get_scan
This CL will facilitate the adapt_scan experiment.
In the adapt_scan experiment, the dynamic scan order will be stored in
AV1_COMMON

Change-Id: I4763ea931b5e1af54d4f173971befeb01a4db335
2016-10-21 11:46:19 -07:00
Yaowu Xu
65818322ef Merge "Sub8x8 block chroma component inter prediction" into nextgenv2 2016-10-21 18:46:18 +00:00
Angie Chiang
646e52a85a Fix unused variable error in intrapred.c
Change-Id: Icda975cd9b264c1752c3057bce8031791f91c08a
2016-10-21 11:45:31 -07:00
Angie Chiang
b0f9968ac7 Merge "Remove the has_no_coeffs corner case" into nextgenv2 2016-10-21 18:16:45 +00:00
Yaowu Xu
c2c5ec21b6 Merge "Unify set_contexts() function for encoder and decoder" into nextgenv2 2016-10-21 18:00:32 +00:00
Yaowu Xu
2f5b9d66b5 Merge "Add support for v256 intrinsics" into nextgenv2 2016-10-21 18:00:20 +00:00
Jingning Han
3d855c5e75 Refactor tx_size to pixel number in decodeframe.c
Use the table access to retrieve pixel numbers from tx_size.

Change-Id: I9459f2c3292c2f9ddf963f16b79e142de7432031
2016-10-21 10:55:54 -07:00
Yaowu Xu
c76572af16 Merge changes Icfc16070,Ied47a248,I8af087d9,I322a1366,If04580af into nextgenv2
* changes:
  Palette: Use inverse_color_order to find color index faster.
  Rewrite some loops to avoid -Wunsafe-loop-optimizations warnings.
  Remove some useless casts
  Add compiler warning flag -Wextra and fix related warnings.
  Declare some array sizes to be constants (known at compile time).
2016-10-21 17:31:42 +00:00
Yaowu Xu
98a306a1b2 Merge changes I027a4f2a,Ide91d76f into nextgenv2
* changes:
  Add complier warning -wunused.
  angle estimation: Some renames/tweaks to sync with aomedia code.
2016-10-21 17:31:22 +00:00
Yaowu Xu
32d8a496ef Merge "Code class0 using aom_read() / aom_write()." into nextgenv2 2016-10-21 17:25:50 +00:00
Yaowu Xu
f29166deff Merge "Use intra_ext_tx_cdf when coding tx_type." into nextgenv2 2016-10-21 17:25:18 +00:00
Yaowu Xu
c53f8ca6fb Merge "Use MV_FP_SIZE based constant instead of 3." into nextgenv2 2016-10-21 17:19:27 +00:00
Alex Converse
64e2f105a7 Use divide by multiply in the ans writer.
Change-Id: Ide4e9b3a605571ec41c265347217e103df8d0821
2016-10-21 09:54:41 -07:00
Jingning Han
e29ea12fc2 Sub8x8 block chroma component inter prediction
Handle the sub8x8 chroma component at the unit of 2x2/4x2/2x4 level
and use the motion vector inherited from the luma component. This
improves the coding performance:

lowres 0.4%
midres 0.25%
hdres  0.15%

Change-Id: I34dff4218cfa3e5d55e7ed0341f36f4719389f7e
2016-10-21 09:39:34 -07:00
Yaowu Xu
67cf85b883 Merge "Remove duplicate code" into nextgenv2 2016-10-21 16:34:24 +00:00
Jingning Han
a6923f7f97 Unify set_contexts() function for encoder and decoder
Remove the separate implementations of set_contexts() in encoder
and decoder.

Change-Id: I9f6e9b075532faae0f74f885d9443589254258a7
2016-10-21 09:32:28 -07:00
Yi Luo
e4abb97ba3 Merge "Fix the overflow of av1_fht32x32() in 2D DCT_DCT" into nextgenv2 2016-10-21 16:13:18 +00:00
Steinar Midtskogen
045d413ca2 Add support for v256 intrinsics
Change-Id: I1da08afaa945ca1aaf4bf9f50cf649a7feef2e60
2016-10-21 08:55:37 -07:00
Nathan E. Egge
45ea963f0b Code class0 using aom_read() / aom_write().
The av1_mv_class0_tree is a balanced tree with two leaves and can
 simply be coded as a boolean with probability class0[0].
If CLASS0_SIZE is ever changed from 1, this change will need to be
 reverted.

Change-Id: If294dac825a5f945371092c74aa8e3f84cd962b6
2016-10-21 08:34:03 -07:00
Nathan E. Egge
72762a2827 Use intra_ext_tx_cdf when coding tx_type.
When building with --enable-daala_ec, the tx_type for intra blocks can be
 coded using the CDFs that are updated once per frame.
This patch converts a tx_type symbol to be coded with aom_write_symbol()
 and aom_read_symbol() that was missed in f3e8e267.

Change-Id: I34f8fef7525f88e156bbcb78dfc48994367610ce
2016-10-21 08:29:08 -07:00
Nathan E. Egge
ac499f352e Use MV_FP_SIZE based constant instead of 3.
Change-Id: I90ef3b49b499c2ac9c24797467cb4eb194fdf23b
2016-10-21 08:25:33 -07:00
Yaowu Xu
68cb657e92 Remove duplicate code
The duplicate breaks the build.

Change-Id: I0f16761c4bcb8563402a664013429403b883c2e1
2016-10-21 08:22:46 -07:00
Yaowu Xu
b97c3a13de Merge "Fix typos" into nextgenv2 2016-10-21 14:44:35 +00:00
Yaowu Xu
23f0604188 Merge "Fix encoder crash when --enable-daala-ec" into nextgenv2 2016-10-21 14:44:26 +00:00
Yaowu Xu
d56df2f9f0 Merge "Pass AV1_COMMON into av1_cost_coeffs" into nextgenv2 2016-10-21 03:20:28 +00:00
Yaowu Xu
432d9071ce Merge "Add adapt_scan APIs and some helping functions" into nextgenv2 2016-10-21 03:20:17 +00:00
Yaowu Xu
361f3fe3b0 Merge "Compute all token encodings from symbol trees." into nextgenv2 2016-10-21 03:20:01 +00:00
Yaowu Xu
360383bfca Merge "decodeframe.c: aom_read_tree_cdf->aom_read_symbol" into nextgenv2 2016-10-21 03:19:14 +00:00
Yaowu Xu
6d5ebbd76c Merge "Encoder/Decoder mismatch fix: need a separate copy of eob_counts." into nextgenv2 2016-10-21 01:23:25 +00:00
Yaowu Xu
c287e271f2 Fix typos
In a previous commit, 5db9743fbbe500bb802b5e5f5eb4e495621e29f7, two
changes that appear to be typos break the build when experiments
are enabled:

../../libvpx/configure --enable-experimental --enable-ref-mv
--enable-ext-intra --enable-ext-refs --enable-ext-interp
--enable-supertx --enable-var-tx --enable-entropy --enable-ext-inter
--enable-ext-tx  --enable-motion-var --enable-dual-filter
--enable-ext-partition --enable-ext-partition-types
--enable-loop-restoration --enable-rect-tx --enable-palette
--enable-aom-highbitdepth --enable-filter-intra --enable-internal-stats
&& make clean && make -j16

This commit fixes the issue.

Change-Id: I9ce5bbc96df326214202868cb0669bd334c86851
2016-10-20 18:19:16 -07:00
Yaowu Xu
e1466ad4e4 Fix encoder crash when --enable-daala-ec
Change-Id: I6855e18d92f693a9789eda7c91a3430566469bdd
2016-10-20 17:56:54 -07:00
Angie Chiang
22ba7514df Pass AV1_COMMON into av1_cost_coeffs
Change-Id: I2043d635e2a7f50f84a541501f28179b797ca326
2016-10-20 17:18:18 -07:00
Angie Chiang
e7d9d1ebeb Merge changes I163874ee,I1424690f into nextgenv2
* changes:
  Add data structure of adpat_scan experiment
  Add adapt_scan experimental flag
2016-10-20 23:52:11 +00:00
Zoe Liu
528d9de543 Merge "Sync with aom branch for ext-refs" into nextgenv2 2016-10-20 22:58:15 +00:00
Urvang Joshi
967ff395b6 Palette: Use inverse_color_order to find color index faster.
Cherry-picked from aomedia/master: b1c3bb5

Change-Id: Icfc16070160fd9763abb1dbf5545103e62b4b9ff
2016-10-20 15:54:33 -07:00
Urvang Joshi
b42827f650 Rewrite some loops to avoid -Wunsafe-loop-optimizations warnings.
For example, loops of the form:
"for (i = 0; i < 1 + max_value; ++i) ..." or
"for (i = 0; i <= max_value; ++i) ..." are possibly infinite loops,
theoretically speaking (even if, in practice, they aren't).
So the compiler cannot optimize those loops.

When possible, I rewrote such loops to be finite even theoretically.

Cherry-picked from aomedia/master: 4e69284

Change-Id: Ied47a24833b689c0ec011f8645cf1c01856f7c59
2016-10-20 15:53:11 -07:00
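The loop concern above can be illustrated with a sketch. If the index has the same narrow type as `max_value`, then `i <= max_value` never becomes false when `max_value` is the type's maximum. The function below is a hypothetical example of one way to make the trip count provably finite; it is not the rewrite used in the commit.

```c
#include <stdint.h>

/* Visit every value in 0..max_value inclusive and return the trip count.
 * The index is a wider signed int, so it can step past max_value and end
 * the loop even when max_value is 255. With a uint8_t index,
 * `i <= max_value` would always hold for max_value == 255. */
static uint32_t count_inclusive(uint8_t max_value) {
  uint32_t count = 0;
  int i;
  for (i = 0; i <= (int)max_value; ++i) ++count;
  return count;
}
```

Because the bound is at most 255 and the index is an `int`, the compiler can prove the loop terminates after at most 256 iterations.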
Urvang Joshi
77853e56ea Remove some useless casts
Cherry-picked from aomedia/master: 6796e7f

Change-Id: I8af087d97cadb0c2a9e37a4e4723246cdd397995
2016-10-20 15:51:41 -07:00
Urvang Joshi
d71a231c49 Add compiler warning flag -Wextra and fix related warnings.
Note: some of these warnings are enabled by a combination of -Wunused
(added earlier) and -Wextra.

Cherry-picked from aomedia/master: 4790a69

Change-Id: I322a1366bd4fd6c0dec9e758c2d5e88e003b1cbf
2016-10-20 15:49:16 -07:00
Nathan E. Egge
3c05679017 Compute all token encodings from symbol trees.
The av1_token encodings must match the contents of the aom_tree_index
 structures so generate all encodings from the symbol trees.

Change-Id: I37be9f12c86a02693ae3c3c1d24b00f2abb29bfb
2016-10-20 15:34:08 -07:00
Yaowu Xu
f2581a3a30 decodeframe.c: aom_read_tree_cdf->aom_read_symbol
This was a missed replacement from cherry-pick of:
9ac7a9dc8ced90a28f5b83801a50597dc12e50a7

Change-Id: I9e01d9d7a39bed397500a293bf68dca2746aa917
2016-10-20 15:31:11 -07:00
Yaowu Xu
ec5a1942e2 Merge changes I7d6394e4,Ia8ce1464,If20e8637,Ia9adc46b,I651db25b into nextgenv2
* changes:
  Define SIMD_INLINE using AOM_FORCE_INLINE
  AOM_FORCE_INLINE: fix always_inline attribute
  Free memory allocated by daala_ec encoder.
  Move clpf_sse4_1.c to clpf_sse4.c in agreement with convention
  sync avg_test.cc with aom/master
2016-10-20 22:30:11 +00:00
Yaowu Xu
748d3f5e0f Merge "Fix Visual Studio build." into nextgenv2 2016-10-20 22:29:57 +00:00
Jingning Han
6d51377858 Merge "Offset speed feature setting index" into nextgenv2 2016-10-20 22:16:00 +00:00
Jingning Han
feee3ed5ee Merge "Add tx_size to pixel number map" into nextgenv2 2016-10-20 22:15:53 +00:00
Jingning Han
7ae6ae3497 Merge "Add 2x2 directional intra predictors" into nextgenv2 2016-10-20 22:15:46 +00:00
Debargha Mukherjee
9cce436975 Merge "Fix for AV1.TestTell" into nextgenv2 2016-10-20 22:05:36 +00:00
Urvang Joshi
bffc0b5748 Declare some array sizes to be constants (known at compile time).
This reduces some memcpys and callocs.

Cherry-picked from aomedia/master: 4081013

Change-Id: If04580af4c63892c8af8ac5b405c7d6aabe5af89
2016-10-20 14:58:13 -07:00
Urvang Joshi
3212dda94d Add compiler warning -Wunused.
Cherry-picked from aomedia/master: 953f086c

Note: related fixes were already part of webm/nextgenv2.

Change-Id: I027a4f2a540af5a304b358ddbf293965b4211b9e
2016-10-20 14:58:01 -07:00
Urvang Joshi
da70e7b0fa angle estimation: Some renames/tweaks to sync with aomedia code.
Change-Id: Ide91d76fafe79b2b310ffd5afb7cd5b26b681f78
2016-10-20 14:57:34 -07:00
Urvang Joshi
43e6281f62 Encoder/Decoder mismatch fix: need a separate copy of eob_counts.
The bug was introduced here:
https://chromium-review.googlesource.com/#/c/399975/4/av1/encoder/bitstream.c
In that patch, I had removed the 2nd declaration of a variable of the same
name. But it turns out that the two variables actually had different
types (even though the name was the same).

Now, we keep both variables, but rename one of them -- that fixes the
mismatch. While we are at it, made both variables local as well.

The fix can be verified as follows:
../../libvpx/configure --enable-experimental --enable-supertx
--enable-var-tx --enable-entropy --enable-internal-stats && make clean
&& make -j16

aomenc -o soccer_cif_1000_av1_b8.webm ../soccer_cif.y4m --codec=av1
--limit=50 --skip=0 -p 2 --pass=1 --fpf=soccer_cif_av1.fpf --good
--cpu-used=0 --target-bitrate=1000 --lag-in-frames=25 --min-q=0
--max-q=63 --auto-alt-ref=1 --kf-max-dist=150 --kf-min-dist=0
--drop-frame=0 --static-thresh=0 --bias-pct=50 --minsection-pct=0
--maxsection-pct=2000 --arnr-maxframes=7 --arnr-strength=5 --sharpness=0
--undershoot-pct=100 --overshoot-pct=100 --frame-parallel=0
--tile-columns=0 --profile=0 --test-decode=warn

aomenc -o soccer_cif_1000_av1_b8.webm ../soccer_cif.y4m --codec=av1
--limit=50 --skip=0 -p 2 --pass=2 --fpf=soccer_cif_av1.fpf --good
--cpu-used=0 --target-bitrate=1000 --lag-in-frames=25 --min-q=0
--max-q=63 --auto-alt-ref=1 --kf-max-dist=150 --kf-min-dist=0
--drop-frame=0 --static-thresh=0 --bias-pct=50 --minsection-pct=0
--maxsection-pct=2000 --arnr-maxframes=7 --arnr-strength=5 --sharpness=0
--undershoot-pct=100 --overshoot-pct=100 --frame-parallel=0
--tile-columns=0 --profile=0 --test-decode=warn -v --psnr

Change-Id: Ibd72dbe1f620e6de231513220ee4e190606613ae
2016-10-20 14:51:01 -07:00
Hui Su
c58d95717f Merge "Renaming in filter-intra sse4 code" into nextgenv2 2016-10-20 21:36:42 +00:00
Hui Su
e3642ac688 Merge "Remove av1/common/intra_filters.h" into nextgenv2 2016-10-20 21:35:10 +00:00
Hui Su
475159cb69 Merge "Seperate FILTER_INTRA from EXT_INTRA experiment" into nextgenv2 2016-10-20 21:34:33 +00:00
James Zern
d37c22271c Merge "Add matching brace in aomenc.c" into nextgenv2 2016-10-20 19:38:57 +00:00
Sarah Parker
ea16b68986 Fix logical vs bitwise & bug
This was causing one of the global motion parameters to not
be centered at 0.
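
The class of bug can be sketched as follows. The function names are hypothetical and this is not the code touched by the commit; it only shows why the two operators are not interchangeable:

```c
/* Hypothetical: keep only the bits of value selected by mask. */
static int masked_bitwise(int value, int mask) { return value & mask; }

/* Look-alike with the logical operator: the result is only ever 0 or 1,
 * regardless of which bits are set in value or mask. */
static int masked_logical(int value, int mask) { return value && mask; }
```

For value 6 and mask 3, the bitwise form yields 2 while the logical form yields 1.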

Change-Id: Ide32e3d177bed5613ab768a19b4e33b37692463a
2016-10-20 12:00:16 -07:00
Peter de Rivaz
130ca4d675 Remove the has_no_coeffs corner case
BUG=webm:1277

Change-Id: I052239e8a6c468da8704bdbbb663b59533c01be2
2016-10-20 19:38:26 +01:00
Angie Chiang
648aeb0b1b Add adapt_scan APIs and some helping functions
av1_init_scan_order
initialize data structures related to adaptive scan order

av1_update_scan_prob
update nonzero probabilities from nonzero counts

av1_augment_prob
embed r + c and coeff_idx info with nonzero probabilities.
When sorting the nonzero probabilities, if there is a tie,
the coefficient with smaller r + c will be scanned first

av1_update_sort_order
apply quick sort on nonzero probabilities to obtain a sort order

av1_update_scan_order
apply topological sort on the nonzero probabilities sorting order to
guarantee each to-be-scanned coefficient's upper and left coefficient
will be scanned before the to-be-scanned coefficient.

av1_update_neighbors
For each coeff_idx in scan[], update its above and left neighbors in
neighbors[] accordingly.
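
A minimal sketch of the tie-breaking rule described for av1_augment_prob, assuming a plain struct and a qsort comparator rather than the embedded representation the patch uses. The CoeffEntry type, its fields, and the final tie-break on coeff_idx are assumptions made for illustration:

```c
#include <stdlib.h>

/* Hypothetical record: one coefficient position and its nonzero probability. */
typedef struct {
  unsigned prob; /* nonzero probability, higher is scanned earlier */
  int row;
  int col;
  int coeff_idx; /* raster index of the coefficient */
} CoeffEntry;

/* Orders by descending probability; ties go to the smaller row + col.
 * The last comparison on coeff_idx is an assumption that keeps the order
 * deterministic, since qsort is not guaranteed to be stable. */
static int compare_entries(const void *a, const void *b) {
  const CoeffEntry *x = (const CoeffEntry *)a;
  const CoeffEntry *y = (const CoeffEntry *)b;
  const int dx = x->row + x->col;
  const int dy = y->row + y->col;
  if (x->prob != y->prob) return x->prob > y->prob ? -1 : 1;
  if (dx != dy) return dx < dy ? -1 : 1;
  return (x->coeff_idx > y->coeff_idx) - (x->coeff_idx < y->coeff_idx);
}

static void sort_entries(CoeffEntry *entries, size_t count) {
  qsort(entries, count, sizeof(*entries), compare_entries);
}
```

The topological pass of av1_update_scan_order would then run on this order; it is not shown here.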

Change-Id: I64c4938057daf8e30e48609a00ecc08d2e3062f4
2016-10-20 11:20:40 -07:00
Zoe Liu
6cfaff95b7 Sync with aom branch for ext-refs
Plus a small code clean up. The experiment of EXT_REFS, compared against
the baseline, using Overall PSNR, now obtains a gain on lowres as:
Avg: -5.818; BDRate: -5.653

Compared against the previous EXT_REFS results on lowres, a tiny gain is
obtained as:
Avg: -0.047, BDRate: -0.063

(1) 780952 Add encoder first pass support to bi-prediction in EXT_REFS
(2) f91498 Add pred prob handling for new references in EXT_REFS
(3) e91472 Add decoder support for bi-direct prediction in EXT_REFS
(4) 0dbac9 Add encoder support to new references in EXT_REFS
(5) ad70cc Remove hard-coded number for EXT_REFS
(6) 9c1e2f Add the use of new reference frames at encoder in EXT_REFS
(7) 6d4fde Add the experiment flag of EXT_REFS

Change-Id: I26f7ca45b9ede7579fdb9d0d6a1a91f4334599bd
2016-10-20 10:55:11 -07:00
Angie Chiang
37fb8edd7c Add data structure of adapt_scan experiment
Change-Id: I163874ee64b9c348de2c7cc8e7b2852308734b0e
2016-10-20 10:00:10 -07:00
Yi Luo
157e45a44b Fix the overflow of av1_fht32x32() in 2D DCT_DCT
- Use range check function to avoid DCT_DCT overflow.
  We need to re-develop the column txfm side scaling/rounding. Now,
  we prefer to maintain the current BDRate level.
- Encoder user level time reduction <1% owing to av1_fht32x32_avx2.
- Add MemCheck unit test and fdct32() unit test.

Change-Id: I1e67030f67bc637859798ebe2f6698afffb8531c
2016-10-20 09:22:24 -07:00
Angie Chiang
8c2dc6f591 Add adapt_scan experimental flag
Change-Id: I1424690fa792b960a1cfb78bbcb37da6b9899ee6
2016-10-20 09:19:01 -07:00
Peter de Rivaz
f994855e8e Fix for AV1.TestTell
The tell functions return an unsigned integer.
This causes the AV1.TestTell test case to fail because
-1 is greater than 20 when treated as an unsigned integer.
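
The pitfall can be reproduced with a small sketch. The function names and the idea of comparing a difference against a limit are assumptions for illustration, not the actual test code:

```c
#include <stdint.h>

/* Unsigned form: if actual < expected, the subtraction wraps around to a
 * very large value, so a difference of "-1" fails a "<= 20" check. */
static int within_limit_unsigned(uint32_t actual, uint32_t expected,
                                 uint32_t limit) {
  return actual - expected <= limit;
}

/* Signed form: widen before subtracting so the difference can be negative. */
static int within_limit_signed(uint32_t actual, uint32_t expected,
                               int32_t limit) {
  const int64_t diff = (int64_t)actual - (int64_t)expected;
  return diff <= limit;
}
```

With actual 99 and expected 100, the unsigned check reports failure while the signed one passes.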

Change-Id: I9dd1d7eb61260d30d1713a4917159fc6fe8eee42
2016-10-20 16:24:06 +01:00
hui su
9ff4134f54 Renaming in filter-intra sse4 code
Change-Id: Iff1786a92d164e6b9cfaf4a59ece79819494276f
2016-10-19 21:41:06 -07:00
hui su
344b643d59 Remove av1/common/intra_filters.h
Use a single header reconintra.h for all intra prediction
related code.

Change-Id: Ib869447f8c482b534c890eab673e81ff830e8d85
2016-10-19 21:41:06 -07:00
hui su
5db9743fbb Seperate FILTER_INTRA from EXT_INTRA experiment
Prepare for the av1/nextgenv2 merge.

Coding gain (%):

               lowres     midres
ext-intra       0.69       0.97
filter-intra    0.67       0.83
both            1.05       1.48

Change-Id: Ia24d6fafb3e484c4f92192e0b7eee5e39f4f4ee6
2016-10-19 21:40:49 -07:00
Yaowu Xu
cfc5ac5034 Merge "Partition the ans experiment into 'ans' and 'rans'" into nextgenv2 2016-10-19 22:58:05 +00:00
Jingning Han
775d99f07e Offset speed feature setting index
Change-Id: If201cbd4175842f68e6dcfb0414ff16ca07e0881
2016-10-19 22:55:44 +00:00
hui su
251e151c3d Add matching brace in aomenc.c
Change-Id: Iccb75d5204f0f52f2c7d6e18d1f8223ce10f68ba
2016-10-19 15:31:51 -07:00
Steinar Midtskogen
c38afedb8d Define SIMD_INLINE using AOM_FORCE_INLINE
Change-Id: I7d6394e48e9b6093e5b523387ed250f371ee7fb9
2016-10-19 15:14:27 -07:00
Thomas
e28d92be97 Fix Visual Studio build.
Change-Id: I01608dfd597cc1d2bd4e73918aa29cf9251edb08
2016-10-19 15:14:27 -07:00
Thomas Davies
f693610a1a Step size and arithmetic coding for delta quantization.
Example performance: 1.8% bit rate savings using
the AQ test mode aq-mode=4 :
./aomenc --codec=av1 --ivf --tile-columns=1 --tile-rows=1 \
                 --kf-max-dist=1000 --kf-min-dist=1000 --cpu-used=0 \
                 --passes=1 --threads=1 --lag-in-frames=0 \
                 --end-usage=q --limit=600 --cq-level=42 \
                 --aq-mode=4 --error-resilient=1 out.bits FourPeople_1280x720_60.y4m

Change-Id: Iba01cf2732a57f3c27481ac2a3c8fc37bb9e5533
2016-10-19 15:14:27 -07:00
James Zern
7dec51534f AOM_FORCE_INLINE: fix always_inline attribute
Change-Id: Ia8ce146489713e137004ccf41faf35aa5645b8ae
2016-10-19 15:14:27 -07:00
Arild Fuldseth
07441165fe Support for delta-q at superblock level
Change-Id: I4128af44776d1f361bddc1fdffb75ed2224dbfa5
2016-10-19 15:14:27 -07:00
Nathan E. Egge
e734fcb114 Free memory allocated by daala_ec encoder.
Free the two memory buffers allocated by the daala_ec encoder when
 calling od_ec_enc_clear() from aom_daala_stop_encode().

Change-Id: If20e86374ea29e51ee59111012905e56039dd4cc
2016-10-19 15:14:27 -07:00
Steinar Midtskogen
f250e20d13 Move clpf_sse4_1.c to clpf_sse4.c in agreement with convention
Change-Id: Ia9adc46b8a4d08c5b8e0089ea1a1526df4f1e1dc
2016-10-19 15:14:27 -07:00
Yaowu Xu
fc5176f851 sync avg_test.cc with aom/master
Change-Id: I651db25bee8f83a9fc6dcd35db5007a002f171c0
2016-10-19 15:14:27 -07:00
Yaowu Xu
dc8a2c523f Merge "Always send frame size explicitly" into nextgenv2 2016-10-19 22:00:40 +00:00
Jingning Han
03b3514058 Add 2x2 directional intra predictors
Change-Id: Iaa25269a15231dadeaba0f4836c864fc10e858df
2016-10-19 21:58:09 +00:00
Yaowu Xu
0a3284cbb9 Merge "Fix build issues when --enable-aom-qm" into nextgenv2 2016-10-19 21:56:41 +00:00
Jingning Han
02935f5f1b Add tx_size to pixel number map
Change-Id: I789fa11638f155f1092a1e9260d26c7855d18e37
2016-10-19 14:52:53 -07:00
Yaowu Xu
8057103d2e Merge "Fix decodeframe.c format" into nextgenv2 2016-10-19 21:28:06 +00:00
Yue Chen
0651eced9f Merge "Remove OBMC from the experimental configure list" into nextgenv2 2016-10-19 21:02:15 +00:00
Yaowu Xu
2a813e41ce Merge "Add unit test for delta-q (aq-mode=4)" into nextgenv2 2016-10-19 21:01:03 +00:00
Jingning Han
8f6eb189e6 Fix decodeframe.c format
Change-Id: I2228a3d1778917ac760582fbec3c868be5d9ba1c
2016-10-19 13:48:57 -07:00
Arild Fuldseth
842e9b030f Always send frame size explicitly
This commit sends the frame size explicitly when
error_resilient_mode=1. The purpose is to allow parsing of the bitstream
after a packet loss.

Change-Id: I7d1c010a465aa18914762cc1a3e61db377304c08
2016-10-19 12:35:12 -07:00
Yaowu Xu
0dd046371f Fix build issues when --enable-aom-qm
Change-Id: I1a462675c06c4b2a5f8b4b347f23fec67feccdd0
2016-10-19 12:26:53 -07:00
Alex Converse
ec6fb649da Partition the ans experiment into 'ans' and 'rans'
The (new) ans experiment replaces the bool coder with uABS bools. The
'rans' experiment adds multisymbol coding.

This matches the setup in aom/master.

Change-Id: Ida8372ccabf1e1e9afc45fe66362cda35a491222
2016-10-19 12:03:15 -07:00
Yaowu Xu
870a72d6b5 Merge "Fix failing TestBitIO test with --enable-daala_ec." into nextgenv2 2016-10-19 18:59:20 +00:00
Yaowu Xu
e94767ae97 Merge "Change return type of tell and tell_frac to uint32_t." into nextgenv2 2016-10-19 18:59:08 +00:00
Yue Chen
48877de873 Remove OBMC from the experimental configure list
It was replaced by MOTION_VAR in commit cb60b18

Change-Id: I7ab625eef4dbae2e5585d9fa3b6873aa78b2c254
2016-10-19 18:45:34 +00:00
Arild Fuldseth (arilfuld)
9f28cb8f93 Add unit test for delta-q (aq-mode=4)
Change-Id: Ic529355880b4dbd076a7e46e7b03a49a1ee5f6f0
2016-10-19 11:35:40 -07:00
Urvang Joshi
66b1fcc924 Merge changes I3922dea2,I3bab2848,I21f7478a,Ida5de713,Ib9f0eefe, ... into nextgenv2
* changes:
  Fix warnings reported by -Wshadow: Part4: main directory
  Fix warnings reported by -Wshadow: Part3: test/ directory
  Fix warnings reported by -Wshadow: Part2b: more from av1 directory
  Fix warnings reported by -Wshadow: Part2: av1 directory
  Fix warnings reported by -Wshadow: Part1b: scan_order struct and variable
  Fix warnings reported by -Wshadow: Part1: aom_dsp directory
  Move STAT_TYPE enum to source file.
  Code cleanup: mainly rd_pick_partition and methods called from there.
2016-10-19 18:25:52 +00:00
Nathan E. Egge
e58781d329 Fix failing TestBitIO test with --enable-daala_ec.
Change-Id: I6a885b7c6315261d67a9c2fcde914206b8301f4a
2016-10-19 10:54:40 -07:00
Nathan E. Egge
b244f39627 Change return type of tell and tell_frac to uint32_t.
The bit accounting functions aom_reader_tell() and aom_reader_tell_frac()
 return the number of bits and 1/8th bits respectively.
This patch changes the return type from ptrdiff_t which is signed to
 uint32_t which is unsigned.
The size_t type is not used since we only care about the number of bits
 or 1/8 bits per entropy coder context and we don't expect to code more
 than 512 megabits per tile.

Change-Id: I84a119d1f52829dcbdb66a92656eacca06e42b11
2016-10-19 10:53:52 -07:00
Hui Su
3e908b7f44 Merge "Temporary fix for 4X8 block intra prediction." into nextgenv2 2016-10-19 16:55:20 +00:00
Hui Su
e22a480225 Merge "Fix format in set_offsets()" into nextgenv2 2016-10-19 16:54:30 +00:00
Angie Chiang
d83fc3b8d9 Merge "Add av1_fdct64_new and av1_idct64_new" into nextgenv2 2016-10-19 16:34:24 +00:00
Urvang Joshi
4145bf05ae Fix warnings reported by -Wshadow: Part4: main directory
Now that all warnings are taken care of, add warning flag -Wshadow to
configure.

Note: Enabling this flag for C++ generates some useless warnings about
some function parameters shadowing class member function names. So, only
enabling this warning for C code.

Cherry-picked from aomedia/master: b96cbc4

Change-Id: I3922dea2e6976b16519c4aa4d1bd395c198134f1
2016-10-19 07:56:53 -07:00
Peter de Rivaz
74d0ad844e Fix for var_tx context update
The tx_partition_set_contexts function changes tx_size even
for blocks coded with a rectangular transform.
This causes an internal rd inconsistency when using all of
CONFIG_VAR_TX, CONFIG_RECT_TX, CONFIG_EXT_TX.

Change-Id: Ia45d4a8893b0961534219bb96d9652719038c7a1
2016-10-19 11:43:11 +01:00
Yaowu Xu
caf2023ae1 Reorder includes
Change-Id: I97487bf353471bf9d245cd620780adfb1d3fc2b1
2016-10-19 04:34:49 +00:00
Michael Bebenita
6048d05225 Bit accounting.
This patch adds bit accounting infrastructure to the bit reader API.
When configured with --enable-accounting, every bit reader API
function records the number of bits necessary to decode a symbol.
Accounting symbol entries are collected in a global accounting data
structure that can be used to understand exactly where bits are
spent (http://aomanalyzer.org). The data structure is cleared and
reused each frame to reduce memory usage. When configured without
--enable-accounting, bit accounting does not incur any runtime
overhead.

All aom_read_xxx functions now have an additional string parameter
that specifies the symbol name. By default, the ACCT_STR macro is
used (which expands to __func__). For more precise accounting,
these should be replaced with more descriptive names.

Change-Id: Ia2e1343cb842c9391b12b77272587dfbe307a56d
2016-10-19 04:34:29 +00:00
Debargha Mukherjee
4bacfcffd0 Merge "Fix ransac random generator seeding" into nextgenv2 2016-10-19 01:39:08 +00:00
Yaowu Xu
321556a557 Merge "Update segment tree_cdf per frame." into nextgenv2 2016-10-19 01:09:52 +00:00
Yaowu Xu
4aec17a7ec Merge "Adds ability to measure with a higher precision the number of bits read per symbol." into nextgenv2 2016-10-19 01:09:41 +00:00
Sarah Parker
cd2750048f Merge "Add clamping to parameter search" into nextgenv2 2016-10-19 00:44:28 +00:00
Sarah Parker
5572486ed7 Merge "Adjust gm costing so GLOBAL_ZERO is treated as regular zeromv" into nextgenv2 2016-10-19 00:44:12 +00:00
Jingning Han
97d854831f Fix format in set_offsets()
Change-Id: I371297e6ee000e6dc01ba1544763cbed429b0e5a
2016-10-18 17:42:09 -07:00
Brennan Shacklett
7523a7ecd6 Temporary fix for 4X8 block intra prediction.
Currently the RD loop traverses 4X8 blocks in inverted N order while
the bitstream stores blocks smaller than 8x8 in Z order. This causes a
discrepancy where the RD loop reads uninitialized data while
performing intra prediction.  As a temporary fix simply disable the
use of the extended right edge for 4X8 blocks, until the bitstream can
be changed to match the logical structure of the blocks.

Change-Id: I44a9e4fc1a15cd551a7b38c3c1227bc5dac77e9a
2016-10-18 17:24:53 -07:00
Urvang Joshi
88a03bb68f Fix warnings reported by -Wshadow: Part3: test/ directory
Cherry-picked from aomedia/master: be029580

Change-Id: I3bab28488388f92f2db20e6af8fc9cf2d7f26015
2016-10-18 17:22:58 -07:00
Urvang Joshi
368fbc955d Fix warnings reported by -Wshadow: Part2b: more from av1 directory
From code that is only part of nextgenv2 (and not aomedia)

Change-Id: I21f7478a59d525dff23747efe5238ded16b743d2
2016-10-18 17:22:44 -07:00
Urvang Joshi
454280dabf Fix warnings reported by -Wshadow: Part2: av1 directory
While we are at it:
- Rename some variables to more meaningful names
- Reuse some common consts from a header instead of redefining them.

Cherry-picked from aomedia/master: 863b0499

Change-Id: Ida5de713156dc0126a27f90fdd36d29a398a3c88
2016-10-18 17:22:34 -07:00
Urvang Joshi
03f6fdcfca Fix warnings reported by -Wshadow: Part1b: scan_order struct and variable
- Change struct name to all caps SCAN_ORDER to be locally consistent.
- Rename struct pointers to 'scan_order' instead of hard to read short
  names 'so' and 'sc'.

Cherry-picked from aomedia/master: 30abc082

Change-Id: Ib9f0eefe28fa97d23d642b77d7dc8e5f8613177d
2016-10-18 17:22:23 -07:00
Urvang Joshi
fdb60962f4 Fix warnings reported by -Wshadow: Part1: aom_dsp directory
While we are at it:
- Rename some variables to more meaningful names
- Reuse some common consts from a header instead of redefining them.

Cherry-picked from aomedia/master: 09eea2193

Change-Id: I61030e773137ae107d3bd43556c0d5bb26f9dbf8
2016-10-18 17:22:12 -07:00
Urvang Joshi
b5ed35008d Move STAT_TYPE enum to source file.
In the header, all we need is number of stat types, not the names for actual
types.

Removing it avoids names like 'Y', 'U', 'V' and 'ALL' being visible
in all files that include the encoder.h header.

Change-Id: I874a73a3cfe6bcb29aedea102077a52addc49af6
2016-10-18 17:22:00 -07:00
Urvang Joshi
526484482a Code cleanup: mainly rd_pick_partition and methods called from there.
- Const correctness
- Refactoring
- Make variables local when possible etc
- Remove -Wcast-qual to allow explicitly casting away const.

Cherry-picked from aomedia/master: c27fcccc
And then a number of more const correctness changes to make sure other
experiments build OK.

Change-Id: I77c18d99d21218fbdc9b186d7ed3792dc401a0a0
2016-10-18 17:21:27 -07:00
Nathan E. Egge
f627e58e0f Update segment tree_cdf per frame.
Move computing the segmentation_probs.tree_cdf table per symbol to
 computing it only when the probabilities are updated.

Change-Id: I3826418094bbaca4ded87de5ff04d4b27c85e35a
2016-10-18 16:58:48 -07:00
Michael Bebenita
d7baf45ff6 Adds ability to measure with a higher precision the number of bits
read per symbol.

Change-Id: I218abaa5172b769b66dba45050381c0212602668
2016-10-18 16:57:56 -07:00
Sarah Parker
081783dc67 Add clamping to parameter search
This fixes mismatches due to overflowing low precision parameters.

Change-Id: If34e39ca7ab0adc9688d46b0e8ed62cbb6fdaff0
2016-10-18 16:43:54 -07:00
Sarah Parker
ae51dd820d Adjust gm costing so GLOBAL_ZERO is treated as regular zeromv
Change-Id: I1b41146ae844c985566f5f9fdaeb5d4a4a5927b6
2016-10-18 16:18:23 -07:00
Sarah Parker
efa6582235 Fix ransac random generator seeding
Ransac's get_rand_indices originally used rand_r seeded with the
same value every time, producing the same random sequence at every
iteration. This causes the global motion parameters to be slightly
less accurate because ransac cannot improve the model fit after
the first attempt.
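
The seeding problem can be sketched as below. To stay portable, the sketch uses a small linear congruential generator as a stand-in for rand_r; next_random, draw_reseeded and draw_persistent are hypothetical names, not functions from the codebase:

```c
#include <stdint.h>

/* Stand-in for rand_r(): advances *state and returns a 15-bit value. */
static unsigned next_random(uint32_t *state) {
  *state = *state * 1103515245u + 12345u;
  return (*state >> 16) & 0x7fff;
}

/* Buggy pattern: the seed is reset on every call, so every call (and thus
 * every iteration of the caller) sees the same "random" value. */
static unsigned draw_reseeded(void) {
  uint32_t seed = 1;
  return next_random(&seed);
}

/* Fixed pattern: the caller owns the state, so it keeps advancing
 * from one call to the next. */
static unsigned draw_persistent(uint32_t *seed) { return next_random(seed); }
```

Keeping the state outside the helper is what lets later iterations sample different index sets.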

Change-Id: Idca2f88468ea21d19ba41ab66e5a2744ee33aade
2016-10-18 16:14:46 -07:00
Angie Chiang
792519bdef Add av1_fdct64_new and av1_idct64_new
Change-Id: If497816d7f6ee094d40872a2f988c91e90b78d7b
2016-10-18 16:07:56 -07:00
Guillaume Martres
470efbcf01 Remove rd_variance_adjustment
This function is called after `super_block_yrd` and assumes that the dst
buffer is correct but that is no longer always the case after
daf841b4a10ece1b6831300d79f271d00f9d027b since we don't call
`txfm_rd_in_plane` after the RDO loop in `choose_tx_size_from_rd`.
We could fix this by always saving and restoring the dst buffer but
removing `rd_variance_adjustment` is a better solution:
- Getting the dst buffer always right is tricky, as demonstrated by the
  fact that it is wrong now. Even if we fix it now, we could break it
  later and not notice.
- Perceptual weighting is a good idea, but `rd_variance_adjustment` is the
  wrong approach as it weights both the rate and the distortion.
  To get meaningful units you should only weight the distortion;
  weighting the rate means that we pretend some bits cost less than other
  bits, which is not the case. The distortion weighting approach is
  implemented by Daala in `od_compute_dist` and we plan to experiment
  with this in AV1 too.
- Removing `rd_variance_adjustment` improves coding efficiency on all
  metrics, here are the results for objective-1-fast using the Low
  Latency settings:

      PSNR Y:     -0.14%
     PSNRHVS:     -0.17%
        SSIM:     -0.12%
      MSSSIM:     -0.12%
   CIEDE2000:     -0.07%

Change-Id: I74b26b568ee65f56521646b8f30dd53bcd29fce3
2016-10-18 14:40:15 -07:00
Jingning Han
32658e2ab8 Add cb4x4 experimental flag
Experiment with coding blocks at a resolution of 4x4.

Change-Id: I6aa201038f00c590747d800edb0a3e76ab1a51e8
2016-10-18 14:30:51 -07:00
Zoe Liu
a6a6dd509d A small bug fix in ext-refs on the RD mode selection
Change-Id: I25f14fec8e806cdf98d904488aaf200169def34d
2016-10-18 13:03:12 -07:00
Yushin Cho
40f1d487ad Remove unused PICK_MODE_CONTEXT::is_coded.
Change-Id: Ibc73b4066dcdee45d32355144124762d26a16a28
2016-10-18 12:54:12 -07:00
Urvang Joshi
8a02d76a93 Remove unused array 'last_frame_seg_map_copy'.
This array was allocated and used to save and restore segmentation map,
however the original segmentation map was never modified between the
calls to save and restore.

Change-Id: Iaf0fbfed733c097e84cf44d2aa6b8f35d2fb456b
2016-10-18 12:54:12 -07:00
Jingning Han
d98a45a6cc Add sub8x8_mc experimental flag
Change-Id: Ifcc329df240c0771172180933a6180b21fd31abe
2016-10-18 12:54:12 -07:00
Yaowu Xu
c2461b5e87 Merge "Remove macroblock::skip_optimize." into nextgenv2 2016-10-18 19:52:50 +00:00
Yaowu Xu
be0d933671 Merge "Skip 4x4 transform if maximum possible transform is 32x32" into nextgenv2 2016-10-18 19:52:42 +00:00
Yaowu Xu
cb61012305 Merge "Take out some early termination speed features" into nextgenv2 2016-10-18 19:48:46 +00:00
Angie Chiang
8c9893be05 Merge "Add experimental tag for 64x64 transform" into nextgenv2 2016-10-18 19:14:56 +00:00
Yushin Cho
e2b403b979 Remove macroblock::skip_optimize.
This is not used since the commit 00cd5de536fd5545d8fb663b2db81c014e3e6a41,
"Remove skip_recode speed feature".

Change-Id: Ic03da6c0095f6285a3889d5d22e8aaa2e6cbfd79
2016-10-18 11:26:11 -07:00
Hui Su
eafb2e62ac Skip 4x4 transform if maximum possible transform is 32x32
On average no compression performance changes. Encoding speed is
increased by 10~20% on some test clips in the derf set.

Change-Id: I9856caaa260303f6f6259686671bed7d51012277
2016-10-18 11:26:11 -07:00
Jingning Han
3f16725ff2 Take out some early termination speed features
Drop some speed features used in speed 2 and above, during the
algorithm development process. This helps simplify the codebase.

Change-Id: I3b2f5560d90b00d2d8fd57c2cb36f6ddd3f228e4
2016-10-18 11:26:11 -07:00
Yaowu Xu
8f7b1d3db9 Merge "Move a statement to match order in aom/master" into nextgenv2 2016-10-18 17:58:33 +00:00
Yaowu Xu
31e76edbfe Merge "Remove stale OD_ACCOUNTING code." into nextgenv2 2016-10-18 17:58:13 +00:00
Debargha Mukherjee
fe3814846b Add experimental tag for 64x64 transform
Change-Id: I65c04006f6e6eb13ceb22efc1c39915cb3c82b82
2016-10-18 10:24:31 -07:00
Yaowu Xu
ee775b13e2 Move a statement to match order in aom/master
Change-Id: Ic11eae36c9c62a20699197847aa3ef9562d4ad7e
2016-10-18 10:00:21 -07:00
Yaowu Xu
85c5566559 Merge "Port aom_reader_tell() support" into nextgenv2 2016-10-18 16:48:57 +00:00
Michael Bebenita
63b44c4c50 Remove stale OD_ACCOUNTING code.
Change-Id: Ie90dd06c387119ccd9c920a328c942477df00bb7
2016-10-18 09:12:06 -07:00
Debargha Mukherjee
d8ff1986d4 Merge "Fix for var_tx entropy context with rect_tx" into nextgenv2 2016-10-18 16:03:37 +00:00
Debargha Mukherjee
3f8b5b903f Merge "Correction to costing rect_tx" into nextgenv2 2016-10-18 16:03:18 +00:00
Michael Bebenita
868fc0b04a Port aom_reader_tell() support
This commit ports the following from aom/master:
4c46278 Add aom_reader_tell() support.
b9c9935 Remove an erroneous declaration.
56c9c3b Fix ANS build.

Change-Id: I59bd910f58c218c649a1de2a7b5fae0397e13cb1
2016-10-18 08:50:05 -07:00
Peter de Rivaz
46fcb05fde Fix for var_tx entropy context with rect_tx
This computation should match the code in encode_block
to increase the accuracy of the rd optimization.

Change-Id: Ibc9d9ab6d88d0c0f3af62e9cc233216aba48a57e
2016-10-18 15:38:01 +01:00
Peter de Rivaz
b85a5a7eac Correction to costing rect_tx
When built with var_tx and ext_tx, select_tx_size_fix_type is used
to compute the cost for using a particular tx_type.
The code indexes the array inter_tx_type_costs at the wrong location
resulting in a zero cost for signalling tx_type for rect_tx blocks.

Change-Id: Iba38be3a0d822109f778f0600b242dfb40359766
2016-10-18 11:55:36 +01:00
Nathan E. Egge
9ac1f7d770 Create aom_cdf_prob type for 16-bit probabilities.
Change-Id: I33899eca44300037816c9f20c965aa8311a1ef52
2016-10-17 20:22:48 -07:00
Nathan E. Egge
45741e9351 Rename daala_read_tree_cdf() to daala_read_symbol().
Change-Id: I35f85bad88c637cea62577c546cdd5ced0e21bd6
2016-10-17 20:22:19 -07:00
Hui Su
abf6fb9967 Merge "Add filter_intra experiment flag" into nextgenv2 2016-10-18 00:54:20 +00:00
Yaowu Xu
40bcdbcf3a Merge "Fix warning when discarding const qualifier." into nextgenv2 2016-10-18 00:50:09 +00:00
Yaowu Xu
65147563a5 Merge "Revert code formatting of OD_UNIFORM_CDFS_Q15." into nextgenv2 2016-10-18 00:49:57 +00:00
Yaowu Xu
9ce9e6d533 Merge "Rename aom_write_tree_cdf() to aom_write_symbol()." into nextgenv2 2016-10-18 00:49:41 +00:00
Yaowu Xu
007fd85007 Merge "Bug fix in super_block_uvrd()." into nextgenv2 2016-10-18 00:49:28 +00:00
Yaowu Xu
b44f53ba26 Merge "Display --bit-depth in -h with highbitdepth enabled." into nextgenv2 2016-10-18 00:49:18 +00:00
Yaowu Xu
79644f615e Merge "Update partition_cdf per frame." into nextgenv2 2016-10-18 00:49:05 +00:00
Yaowu Xu
153df29bbf Merge "Update inter_ext_tx_cdf per frame." into nextgenv2 2016-10-18 00:48:53 +00:00
Yaowu Xu
a6fa5436ff Merge "Update intra_ext_tx_cdf per frame." into nextgenv2 2016-10-18 00:48:41 +00:00
Yaowu Xu
f507ba79fe Merge "Update switchable_interp_cdf once per frame." into nextgenv2 2016-10-18 00:48:26 +00:00
Yue Chen
3fcf53e381 Merge "Refactor motion estimation in MOTION_VAR experiment" into nextgenv2 2016-10-18 00:32:00 +00:00
hui su
ffcf4fb788 Add filter_intra experiment flag
Will break ext-intra into 2 experiments: ext-intra and filter-intra.

Change-Id: Ibf66e9b9d9307fd58a703eada9569b74d171434b
2016-10-17 16:17:16 -07:00
Yue Chen
e9638ccfff Refactor motion estimation in MOTION_VAR experiment
To get ready for pulling AV1 to nextgenv2. Refactoring is done to
make the code structures similar, especially for the motion search
part.

Change-Id: I5d7636394408d97de55394d668540f5627827983
2016-10-17 12:48:10 -07:00
Nathan E. Egge
19698a7084 Fix warning when discarding const qualifier.
Cherry-pick Daala 211c2a41: Clean up EC tell() and tell_frac() functions.
Add a const qualifier to the od_ec_enc and od_ec_dec parameters of
 the od_ec_enc_tell(), od_ec_enc_tell_frac(), od_ec_dec_tell(), and
 od_ec_dec_tell_frac() functions.
Add an OD_WARN_UNUSED_RESULT to od_ec_enc_tell_frac().

Change-Id: Ia50e2fd75e98d8a03d993449d658b695cf56e6fb
2016-10-17 12:16:27 -07:00
Nathan E. Egge
f3035f2bc7 Revert code formatting of OD_UNIFORM_CDFS_Q15.
The formatting of OD_UNIFORM_CDFS_Q15[] in entcode.c is helpful for
 understanding what is contained in the array (e.g., the uniform
 probability distributions of small sizes 2 through 16).
This patch reverts the change made in f4b2926d and adds linter hints to
 ignore the formatting.

Change-Id: I2ad9fe6673b86e6067cb97b40f0f0e69a119cdf5
2016-10-17 12:16:26 -07:00
Nathan E. Egge
56eeaa5daf Rename aom_write_tree_cdf() to aom_write_symbol().
Change-Id: I7c088c55f1c461063976d5bd84ff2026c4f3bc69
2016-10-17 11:54:51 -07:00
Yushin Cho
09de28b4f7 Bug fix in super_block_uvrd().
In super_block_uvrd(), if is_cost_valid == 0, all return parameters,
i.e. rate, distortion, skippable, and sse, are reset.
So txfm_rd_in_plane() should not be called if is_cost_valid == 0.
Also, the bug causes av1_xform_quant() to see an invalid diff signal
since av1_subtract_plane() is not called in super_block_uvrd().

Change-Id: Iaa06061e2e9aa8876b4611a54f4ae6b8d499332b
2016-10-17 11:25:13 -07:00
Nathan E. Egge
d1b239c0c3 Display --bit-depth in -h with highbitdepth enabled.
Display the -b --bit-depth command line parameter in the -h output of
 aomenc when --config-aom-highbitdepth is enabled.

Change-Id: I76147e38b9985e68b1e642e21be8fd4d8ec4d966
2016-10-17 10:45:24 -07:00
Nathan E. Egge
fba2be692f Update partition_cdf per frame.
Move computing the partition_cdf tables per symbol to
 computing them only when the probabilities are updated.

Change-Id: I442f9230ba00be7f5d0558d7c38d7324ad009ee8
2016-10-17 10:21:06 -07:00
Nathan E. Egge
93878c4243 Update inter_ext_tx_cdf per frame.
Move computing the inter_ext_tx_cdf tables per symbol to
 computing them only when the probabilities are updated.

Change-Id: I5e1e62f8eae8f6b2edbbd378beeb786649502c10
2016-10-17 10:20:53 -07:00
Nathan E. Egge
7c5b4c1665 Update intra_ext_tx_cdf per frame.
Move computing the intra_ext_tx_cdf tables per symbol to
 computing them only when the probabilities are updated.

Change-Id: I26d5e419e103093e98a7d896c196176305b50fc9
2016-10-17 08:47:02 -07:00
Nathan E. Egge
4947c296f7 Update switchable_interp_cdf once per frame.
Move from computing the switchable_interp_cdf per symbol to
 computing once per frame when the probabilities are adapted.

Change-Id: I6571126239f0327e22bb09ee8bad94114291683e
2016-10-17 08:44:57 -07:00
Yaowu Xu
5cb0a7abc9 Replace {} with continue
Change-Id: I2e939e898cc30c2999b47f2789191e08272b1cc0
2016-10-17 08:12:18 -07:00
Yaowu Xu
2bdb9e6344 Merge changes Ie43c599f,Icd0dbed4,Ic04e180b into nextgenv2
* changes:
  Move av1_indices_from_tree() to common code space.
  Add code to compute in-order mappings for tokens.
  Fix bug in av1_tree_to_cdf_2D() macro.
2016-10-14 23:46:48 +00:00
Yaowu Xu
73d702db7f Merge changes I339d0389,I2fa1e87a,If79fa5ae,Icb1a8cb8,Ic76de4a4, ... into nextgenv2
* changes:
  Add missing CONFIG_DAALA_EC declaration.
  Add API for writing trees using a CDF.
  Add macro to build a simple cdf table.
  Use Daala entropy coder to code trees.
  Silence clang-format code review warning.
  Use Daala entropy coder to code bits.
  Clear existing format issue in the codebase
  Add Daala entropy coder.
2016-10-14 23:42:22 +00:00
Yi Luo
1dec26e004 Merge "Zero high 128b YMM registers to avoid SSE-AVX transition penalties" into nextgenv2 2016-10-14 23:13:10 +00:00
Urvang Joshi
03ae55214c Merge "Bugfix: fix the build for CONFIG_FP_MB_STATS" into nextgenv2 2016-10-14 22:11:28 +00:00
Nathan E. Egge
8abf8673e6 Move av1_indices_from_tree() to common code space.
Move the av1_indices_from_tree() function from av1/encoder/treewriter.c
 to aom_dsp/prob.c so that it can be used by both the encoder and
 the decoder.

Change-Id: Ie43c599f425c3503b1ff93f0c77b5033a05b1bb4
2016-10-14 14:59:27 -07:00
Nathan E. Egge
a67c0ff4d7 Add missing CONFIG_DAALA_EC declaration.
Without first including ./aom_config.h in aom_dsp/prob.c the memmove
 function is implicitly declared and causes a compiler warning.

Change-Id: I339d0389f10324a1085aba7d6492b2159a14da92
2016-10-14 14:59:27 -07:00
Nathan E. Egge
cfb02ddcad Add code to compute in-order mappings for tokens.
Add av1_indices_from_tree() function that computes a forward and inverse
 mapping of the tree leaf-node symbols to their in-order traversal.
This is necessary because many of the aom_tree binary trees have their
 leaf nodes out of order (i.e., an in-order traversal of a tree with n
 leaves does not start at symbol 0 and go to symbol n - 1), but the CDFs
 created by tree_to_cdf() are indexed in-order.

Change-Id: Icd0dbed4c171a67c9e84a634106c4fdb5b1b3488
2016-10-14 14:59:27 -07:00
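The forward and inverse mappings described in the commit above can be illustrated with a short Python sketch. It is not the libaom C code. It assumes the usual aom_tree_index layout, in which tree[i] and tree[i + 1] are the two children of node i, a positive entry is the index of another node, and a non-positive entry is a leaf holding the negated symbol. The name indices_from_tree and the return shape are hypothetical.

```python
def indices_from_tree(tree):
    """Return (index, inverse) for a tree in the assumed aom_tree_index layout.

    index[symbol] is the position of the symbol in an in-order walk of the
    leaves; inverse[position] is the symbol found at that position.
    """
    order = []

    def walk(i):
        # Visit the left child, then the right child, of the node at i.
        for child in (tree[i], tree[i + 1]):
            if child > 0:
                walk(child)           # internal node: descend
            else:
                order.append(-child)  # leaf: the symbol is the negated entry

    walk(0)
    index = [0] * len(order)
    for position, symbol in enumerate(order):
        index[symbol] = position
    return index, order


# Hypothetical 3-symbol tree whose leaves appear in the order 2, 0, 1.
example_tree = [-2, 2, 0, -1]
index, inverse = indices_from_tree(example_tree)
```

For this example tree, index is [1, 2, 0] and inverse is [2, 0, 1], so a CDF indexed in-order can be addressed by symbol through index.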
Nathan E. Egge
44460148b2 Add API for writing trees using a CDF.
Added aom_write_tree_cdf() and aom_read_tree_cdf() function calls to
 bitwriter.h and bitreader.h respectively.
These calls take a multisymbol CDF and an index and directly encode the
 symbol using the enabled entropy coder.
Currently only the daala entropy encoder supports this (enabled with
 --enable-daala_ec) and a compile error is thrown otherwise.

Change-Id: I2fa1e87af4352c94384e0cfdbfd170ac99cf3705
2016-10-14 14:59:27 -07:00
Nathan E. Egge
439c50251f Fix bug in av1_tree_to_cdf_2D() macro.
Change-Id: Ic04e180b09745fab2230d05985770c41deea4fad
2016-10-14 14:59:27 -07:00
Nathan E. Egge
e2ed411836 Add macro to build a simple cdf table.
Add the av1_tree_to_cdf() macro which takes an aom_tree_index tree and
 associated aom_prob probabilities and constructs a daala uint16_t cdf.
The av1_tree_to_cdf_1D() and av1_tree_to_cdf_2D() apply av1_tree_to_cdf()
 across 1D and 2D arrays respectively.

Change-Id: If79fa5ae034263f279d7d0842493570885272fb2
2016-10-14 14:59:27 -07:00
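As a rough illustration of what a tree-to-CDF conversion involves, the Python sketch below multiplies the branch probabilities along each root-to-leaf path and accumulates the leaf masses in in-order position. It is not the av1_tree_to_cdf() macro itself. The assumptions are: probs[i >> 1] is the 8-bit probability (out of 256) of taking the first branch at node i, the CDF is scaled to 32768 and has no leading zero, and rounding is plain truncation. The function name tree_to_cdf is hypothetical, and a real coder would also have to keep every symbol's frequency non-zero.

```python
from fractions import Fraction


def tree_to_cdf(tree, probs, total=32768):
    """Build an in-order cumulative distribution for the tree's leaves."""
    masses = []

    def walk(i, mass):
        # Probability of taking the first (0) branch at this node.
        p0 = Fraction(probs[i >> 1], 256)
        for child, m in ((tree[i], mass * p0), (tree[i + 1], mass * (1 - p0))):
            if child > 0:
                walk(child, m)    # internal node: keep multiplying
            else:
                masses.append(m)  # leaf: record its total probability

    walk(0, Fraction(1))

    cdf, acc = [], Fraction(0)
    for m in masses:
        acc += m
        cdf.append(int(acc * total))  # truncate to an integer count
    return cdf


# Same hypothetical tree as before, with probabilities 128/256 and 64/256.
cdf = tree_to_cdf([-2, 2, 0, -1], [128, 64])
```

Here the leaf masses are 1/2, 1/8 and 3/8, so cdf is [16384, 20480, 32768].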
Nathan E. Egge
43acafdee2 Use Daala entropy coder to code trees.
When building with --enable-daala_ec, calls to aom_write_tree() and
 aom_read_tree() will convert an aom_tree_index structure with associated
 aom_prob probabilities into a CDF on the fly for use with
 od_ec_encode_cdf_q15().
The number of symbols in the CDF is capped at 16, and trees that contain
 more than 16 leaf nodes are handled by splitting the most likely, i.e.,
 highest probability symbols, first and coding multiple symbols if
 necessary.

ntt-short-1:

         MEDIUM (%) HIGH (%)
    PSNR 0.000227   0.000213
 PSNRHVS 0.000215   0.000205
    SSIM 0.000229   0.000209
FASTSSIM 0.000229   0.000214

subset1:

          RATE (%)  DSNR (dB)
    PSNR -0.00026   0.00002
 PSNRHVS -0.00026   0.00002
    SSIM -0.00026   0.00001
FASTSSIM -0.00026   0.00001

Change-Id: Icb1a8cb854fd81fdd88fbe4bc6761c7eb4757dfe
2016-10-14 14:59:27 -07:00
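One way to read the "split the most likely first" rule is as a greedy expansion from the root, sketched below in Python. This is only an interpretation of the commit message, not the libaom implementation: the function name split_most_likely, the tie-breaking, and the use of exact fractions are all assumptions. Entries left in the result with a positive node index stand for subtrees that would need a further coding step.

```python
from fractions import Fraction


def split_most_likely(tree, probs, max_symbols=16):
    """Expand the tree from the root, always splitting the most probable
    unexpanded subtree, until max_symbols entries exist or only leaves remain.

    Returns a list of (probability, entry) pairs in in-order position, where
    entry <= 0 is a leaf (negated symbol) and entry > 0 is an unexpanded node.
    """
    def children(i, mass):
        p0 = Fraction(probs[i >> 1], 256)
        return [(mass * p0, tree[i]), (mass * (1 - p0), tree[i + 1])]

    frontier = children(0, Fraction(1))
    while len(frontier) < max_symbols:
        internal = [k for k, (_, entry) in enumerate(frontier) if entry > 0]
        if not internal:
            break  # every entry is a leaf; nothing left to split
        k = max(internal, key=lambda j: frontier[j][0])
        mass, node = frontier[k]
        frontier[k:k + 1] = children(node, mass)  # replace node by its children
    return frontier


# Hypothetical 4-leaf tree, capped at 3 symbols to show a leftover subtree.
small_tree = [2, 4, 0, -1, -2, -3]
first_stage = split_most_likely(small_tree, [192, 128, 128], max_symbols=3)
```

With the cap set to 3, the more probable subtree (mass 3/4) is split into leaves 0 and 1, while the subtree at node 4 stays as a single symbol to be refined by a second coded symbol.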
Nathan E. Egge
0435f0eae6 Silence clang-format code review warning.
Change-Id: Ic76de4a4c0c39924bf04c3c2fa9214d33bcee9fb
2016-10-14 14:59:27 -07:00
Nathan E. Egge
8043cc4018 Use Daala entropy coder to code bits.
When building with --enable-daala_ec, calls to aom_write() and aom_read()
 use the daala entropy coder to write and read bits.
When the probability is exactly 0.5 (128), then raw bits are used.

ntt-short-1:

          MEDIUM (%) HIGH (%)
    PSNR -0.027556  -0.020114
 PSNRHVS -0.027401  -0.020169
    SSIM -0.027587  -0.020151
FASTSSIM -0.027592  -0.020102

subset1:

         RATE (%)  DSNR (dB)
    PSNR 0.03296  -0.00210
 PSNRHVS 0.03537  -0.00281
    SSIM 0.03299  -0.00161
FASTSSIM 0.03458  -0.00111

Change-Id: I48ad8eb40fc895d62d6e241ea8abc02820d573f7
2016-10-14 14:59:27 -07:00
Yaowu Xu
931bc2a714 Clear existing format issue in the codebase
Fix the clang-format warnings in the existing code.

Change-Id: I8e9e781b6f68f41a7fbd0a2116f6b35290d73dc8
2016-10-14 14:59:27 -07:00
Nathan E. Egge
1078dee569 Add Daala entropy coder.
Change-Id: I2849a50163268d58cc5d80aacfec1fd02299ca43
2016-10-14 14:59:27 -07:00
Alex Converse
b60dfc2542 Merge "Switch rANS to 15 bit precision, and adjust L_BASE." into nextgenv2 2016-10-14 21:56:34 +00:00
Alex Converse
62a94a649d Switch rANS to 15 bit precision, and adjust L_BASE.
This causes rANS to operate at the same precision as the Daala EC.

aom/master stats: rans10uabs8lbase12 → rans15uabs8lbase15

objective-1-fast
PSNR YCbCr:      0.01%      0.01%      0.01%
   PSNRHVS:      0.01%
      SSIM:      0.01%
    MSSSIM:      0.01%
 CIEDE2000:      0.01%

subset1
PSNR YCbCr:     -0.01%     -0.00%     -0.00%
   PSNRHVS:     -0.01%
      SSIM:     -0.01%
    MSSSIM:     -0.01%
 CIEDE2000:     -0.01%

(cherry picked from aom/master commit ddbc2e2a68bfc997dc61fca5bcaac3a75245e965)

Change-Id: I6ef0a4f6198784b3712a61af9f105d560a22eaea
2016-10-14 14:05:50 -07:00
Urvang Joshi
74114a3a1e Bugfix: fix the build for CONFIG_FP_MB_STATS
Cherry-picked from aomedia/master: bf6c636

Change-Id: Iea3fb46d23cb94d1152de3a7a40b6a183e78b4d7
2016-10-14 13:42:53 -07:00
Urvang Joshi
b100db7c1d Wrap palette code inside CONFIG_PALETTE flag.
This flag was already added to aomedia/master, so bringing it back to
webm/nextgenv2, as part of an effort to get the two codebases in sync.

Change-Id: I2b933a6a160e4210d1411a9e7978149eb8553205
2016-10-14 13:42:02 -07:00
Yi Luo
e9fde265f7 Zero high 128b YMM registers to avoid SSE-AVX transition penalties
Documents:
- https://software.intel.com/en-us/articles/intel-avx-state-transitions-migrating-sse-code-to-avx
- https://software.intel.com/sites/default/files/m/d/4/1/d/8/11MC12_Avoiding_2BAVX-SSE_2BTransition_2BPenalties_2Brh_2Bfinal.pdf

Change-Id: I90f85fcb15a7a2c49ee068300be6ffe9c68d371c
2016-10-14 12:22:35 -07:00
James Zern
fbabcad67c Merge changes I4850b36e,Ic4d7128a into nextgenv2
* changes:
  variance_avx2: sync variance functions with c-code
  Resolve -Wshorten-64-to-32 in variance.
2016-10-14 19:10:20 +00:00
Yaowu Xu
8d510e2e78 Use "av1" as codec name
Change-Id: I7650f1e96df0bcd53b1733c7967aae52dccf836a
2016-10-14 11:05:54 -07:00
Yaowu Xu
931bf3d6e1 Merge "Revert "Revert "Move CLPF block signals from frame to SB level.""" into nextgenv2 2016-10-14 17:58:20 +00:00
Yi Luo
b9fbf38bff Merge "Delete some redundant function declarations in aom_dsp_rtcd_defs.pl" into nextgenv2 2016-10-14 17:50:37 +00:00
Yaowu Xu
d71be7815d Revert "Revert "Move CLPF block signals from frame to SB level.""
This reverts commit 9b25f3067485b32442e13964df098903736c3fd8 to
reinstate the reverted commit with fixes that solved the build issues
when --enable-clpf is used in configure.

Change-Id: I15447cae7fa9b3deb27976345dc3db230a4a7a60
2016-10-14 08:58:49 -07:00
Yaowu Xu
4b71775307 Merge "Revert "Move CLPF block signals from frame to SB level."" into nextgenv2 2016-10-14 15:39:36 +00:00
Yaowu Xu
9b25f30674 Revert "Move CLPF block signals from frame to SB level."
This reverts commit 975350387ce0b55bf5af8cb944f6a242b72251ff.

Change-Id: I9f8e891739352ca2bde4b294e37c85a668f416e0
2016-10-14 15:39:03 +00:00
James Zern
8c64331aa2 variance_avx2: sync variance functions with c-code
add missing int64 -> uint32 cast; quiets -Wshorten-64-to-32 warnings

Change-Id: I4850b36e18dc8b399108342be4bfe0b684aefb78
(cherry picked from commit 6acd061aad8cf62000cc9117390d0c94581a8591)
2016-10-13 20:15:18 -07:00
Alex Converse
2176b7acc2 Resolve -Wshorten-64-to-32 in variance.
The subtrahend is small enough to fit into uint32_t.

Change-Id: Ic4d7128aaa665eaf6b25d562610ba8942c46137f
(cherry picked from commit c0241664aac3a1805db9bd8e09e071ac326531e0)
2016-10-13 20:12:20 -07:00
Debargha Mukherjee
078856a4df Merge "Simplify 8x16 and 16x8 inverse transform tests" into nextgenv2 2016-10-14 02:53:38 +00:00
Debargha Mukherjee
089315fc5e Merge "Enable test system to detect transforms misusing 'stride' parameter" into nextgenv2 2016-10-14 02:50:47 +00:00
Debargha Mukherjee
a720f4b3b5 Merge "Add sse2 forward and inverse 16x32 and 32x16 transforms" into nextgenv2 2016-10-14 02:49:20 +00:00
Yue Chen
a48764d05f Merge "Renamings for OBMC experiment" into nextgenv2 2016-10-14 01:33:00 +00:00
Yi Luo
761ae880d7 Delete some redundant function declarations in aom_dsp_rtcd_defs.pl
Change-Id: I4df57a7faba5800c048b2dc469ec31545406f55c
2016-10-13 17:53:45 -07:00
Steinar Midtskogen
975350387c Move CLPF block signals from frame to SB level.
These signals were in the uncompressed frame header (as a temporary
hack), which caused two problems:

* We don't want that header to be duplicated in the slice header
* It was necessary to signal the number of bits to transmit up front

However, the filter size can be 128x128 which is greater than the SB
size, and a decoder wouldn't be able to know whether to read a bit or
not until the final SB of that 128x128 block has been decoded
(depending on whether the 128x128 is all skip or not).  Therefore the
signalling was changed for 128x128 blocks so that every top left SB of
a 128x128 filter block contains a signal regardless of whether the
block is all skip or not.  Also, all the MBs of a 128x128 block are
filtered even if they are skip MBs.  This gives the signal a purpose
even when the 128x128 block is all skip, and it also gives a slight
coding gain as it leaves a way to filter skip blocks, which was
previously forbidden.

Low latency:
PSNR YCbCr:     -0.19%     -0.14%     -0.06%
   PSNRHVS:     -0.15%
      SSIM:     -0.13%
    MSSSIM:     -0.15%
 CIEDE2000:     -0.19%

High latency:
PSNR YCbCr:     -0.03%     -0.01%     -0.09%
   PSNRHVS:      0.04%
      SSIM:      0.00%
    MSSSIM:      0.02%
 CIEDE2000:     -0.02%

Change-Id: I69ba7144d07d388b4f0968f6a53558f480979171
2016-10-13 16:06:10 -07:00
Yue Chen
cb60b185c7 Renamings for OBMC experiment
To get ready for pulling AV1 to nextgenv2
Replace the experimental flag by MOTION_VAR. Rename major variables.

Change-Id: If6cf4f37b9319c46d8f90df551cc7295d66ca205
2016-10-13 15:51:22 -07:00
Steinar Midtskogen
2d5f752ae9 Don't use _mm_cvtsi128_si64 on 32 bit systems
Change-Id: I332afb8d9e35cd60f05915160a5b2e1dc8757de5
2016-10-13 14:35:00 -07:00
Yaowu Xu
410fee8de6 Fix formatting in a few files
Change-Id: Ia5175afe82b142d9e18c01c546610202c630588e
2016-10-13 13:04:29 -07:00
Jean-Marc Valin
a8ce2c9199 Removing some useless loops in deringing filter
No change in the output

Change-Id: I1627feaa163d65da0df90e9dacbc5e39ee755de8
2016-10-13 18:27:25 +00:00
Jean-Marc Valin
209f830d97 Fix deringing level choice for 10-bit and 12-bit
Making sure we never exceed a base level of 63

Change-Id: I821254b8d970446bd40fdd6e4d7073c69760a86d
2016-10-13 18:27:17 +00:00
Jean-Marc Valin
3cfec90d33 Don't dering superblocks that have deringing disabled
Doesn't change the output, but avoids useless deringing with threshold=0

Change-Id: I69f3e54abad2d2493cfbc76c188ad7d190f0aeff
2016-10-13 18:27:03 +00:00
Yaowu Xu
98e9ce923b Merge "Add SSE4.1 code for deringing functions." into nextgenv2 2016-10-13 18:02:59 +00:00
Michael Bebenita
7227b65c4c Add SSE4.1 code for deringing functions.
Change-Id: I363f7fb610a5c86ea9f417e34b57c6373af877e5
2016-10-13 18:02:19 +00:00
Yaowu Xu
3feb89170b Merge "Simpler threshold calculation for the second filter" into nextgenv2 2016-10-13 18:01:45 +00:00
Yaowu Xu
5d2f01284f Merge "Make 4x4 deringing (chroma) use shorter filters" into nextgenv2 2016-10-13 18:01:23 +00:00
Yaowu Xu
fd44e24541 Merge "Removing Daala-specific deringing code" into nextgenv2 2016-10-13 18:01:11 +00:00
Zoe Liu
12cbaac759 Merge "Clean code a bit and fix a couple of small bugs in ext-refs" into nextgenv2 2016-10-13 16:47:03 +00:00
Yaowu Xu
9ffdf48c5a Merge "Use a quantizer-based threshold rather than full search for deringing" into nextgenv2 2016-10-13 16:35:08 +00:00
Yaowu Xu
8ac419f307 Merge changes Ic3a68557,Ib1dbe41a,I0da09270,Ibdbd720d into nextgenv2
* changes:
  Deringing cleanup: remove DERING_REFINEMENT (always on now)
  Don't run the deringing filter on skipped blocks within a superblock
  Don't dering skipped superblocks
  On x86 use _mm_set_epi32 when _mm_cvtsi64_si128 isn't available
2016-10-13 15:54:32 +00:00
Zoe Liu
f0e4669edb Clean code a bit and fix a couple of small bugs in ext-refs
Currently the patch has no impact on RD performance. However, the
fix may help in the next step of work, especially when the extra
altref frames allow non-zero temporal filtering strength and their
corresponding OVERLAY frames, i.e. the INTNL_OVERLAY frames, are added.

Change-Id: I2e07fb3d0aa547a0b5dd05bb4ba865cd46309076
2016-10-13 08:42:51 -07:00
Yaowu Xu
89d3f2fd10 Merge "Sync 2x2 intra predictors" into nextgenv2 2016-10-13 15:20:52 +00:00
David Barker
4f803efac1 Simplify 8x16 and 16x8 inverse transform tests
Change-Id: Ie86aedfb1f3e0d9c0cf58d7183861a0ed0e8ccc8
2016-10-13 16:02:59 +01:00
David Barker
7825022daa Enable test system to detect transforms misusing 'stride' parameter
This would have caught the bug introduced in patch set 1 of
https://chromium-review.googlesource.com/#/c/397378/

Change-Id: I9c6d5d9c4c98aed5ac48c4fb1c4ff4131b0df1d5
2016-10-13 15:50:44 +01:00
Alex Converse
cba3d1f1c3 AnsTest: Replace the dummy distribution
Use constrained token table row 65/256 instead.

Change-Id: I8b442d4c82af8fa9d36ac2de0d73179ed040478d
(cherry picked from commit 47eb9a2ca46821b468903514cd34eaaca2533d45)
2016-10-13 07:04:55 -07:00
Alex Converse
fc4980edb7 Merge changes Ic74d9d88,Ie93b474e,I544989ea,Ic273f7d9,Idfd2d2b3, ... into nextgenv2
* changes:
  Remove custom rans types
  Remove add_token_no_extra.
  Remove unused aom_rans_build_cdf_from_pdf
  Add the tool used to generate the constrained tokenset.
  Remove the starting zero from ANS CDFs.
  Import the aom_read/write_symbol abstractions from aom/master
2016-10-13 14:03:15 +00:00
David Barker
33231d4801 Add sse2 forward and inverse 16x32 and 32x16 transforms
Change-Id: I1241257430f1e08ead1ce0f31db8272b50783102
2016-10-13 14:01:22 +01:00
Debargha Mukherjee
cad8283e55 Merge "Fix a bug in inverse halfright 32x32 transform" into nextgenv2 2016-10-13 08:16:47 +00:00
Alex Converse
9ed1a2ff44 Remove custom rans types
(cherry picked from aom/master commit 11206c60d930be9d29100567aa67f2a65463852a)

Includes renames in a bunch of places not handled by the original
due to differing tree states.

Change-Id: Ic74d9d8850b8c80a51e55e425bbf472a67e2653f
2016-10-13 05:53:58 +00:00
Jingning Han
e3954d8312 Sync 2x2 intra predictors
Add 2x2 DC, V, H, TM intra predictors.

Change-Id: I2a614adde553f821c45bc5a9bf09800a9f0aaa26
2016-10-12 21:04:01 -07:00
Jean-Marc Valin
4713d8d019 Simpler threshold calculation for the second filter
PSNR YCbCr:      0.03%     -0.00%      0.07%
   PSNRHVS:      0.06%
      SSIM:      0.12%
    MSSSIM:      0.09%
 CIEDE2000:      0.05%

Change-Id: I15ef9598a08f6713bc28ab98b0182310433e97ef
2016-10-12 18:17:10 -07:00
Jean-Marc Valin
ea64c342b7 Make 4x4 deringing (chroma) use shorter filters
Avoids blurring chroma for 4:2:0

PSNR YCbCr:      0.03%     -0.31%     -0.29%
   PSNRHVS:      0.02%
      SSIM:      0.03%
    MSSSIM:      0.02%
 CIEDE2000:      0.01%

Change-Id: If744fb902b5f24404479def22b9ca8a19baec722
2016-10-12 18:16:54 -07:00
Jean-Marc Valin
2c616e61e0 Removing Daala-specific deringing code
No point in keeping them in sync now that all the code is reformatted

Change-Id: I8a062253ed6a5f86028cd5a2a922b3c760def6fb
2016-10-12 18:16:23 -07:00
Jean-Marc Valin
6d5a7a924b Use a quantizer-based threshold rather than full search for deringing
objective-1-short results (with deringing enabled):
PSNR YCbCr:      0.08%      0.03%      0.11%
   PSNRHVS:      0.06%
      SSIM:      0.12%
    MSSSIM:      0.08%
 CIEDE2000:      0.05%

Change-Id: Ifcfc42c14c33650dcf879c4d0ddd8688d4d07da1
2016-10-12 18:16:07 -07:00
Alex Converse
4ce69de9a6 Remove add_token_no_extra.
It was a fairly small production optimization for VP9.

Change-Id: Ie93b474ea5b7e63384a7c0b3a56b135462d1471b
(cherry picked from aom/master commit df9bb76b1330de42fe13827df4c72010adb51429)
2016-10-12 17:44:28 -07:00
Alex Converse
d5b9c730ad Remove unused aom_rans_build_cdf_from_pdf
Change-Id: I544989eae45b7dda04250365c3de99f50110a76b
(cherry picked from aom/master commit 06cce842caa5212826d51c2a317de0bdfae74349)
2016-10-12 17:44:14 -07:00
Alex Converse
dacf45facd Add the tool used to generate the constrained tokenset.
The code that generates the raw distribution is based on a MATLAB
program by Debargha Mukherjee, and the algorithm used to quantize the
distribution comes from the ANS Toolkit by Jarek Duda.

Change-Id: Ic273f7d9e43e3ecd999e9e7e04cde57e8559375a
(cherry picked from aom/master commit ef446026aeafa318f9bee182b8c80eb4f1ef5a0a)
2016-10-12 17:41:01 -07:00
Alex Converse
e9f70f8f10 Remove the starting zero from ANS CDFs.
This brings it in line with the Daala CDFs and will make it easier to
share code.

Change-Id: Idfd2d2b33c3b9b2c4e72ce72fb3d8039013448b9
(cherry picked from aom/master commit af98507ca928afe33e9f88fdd2ca168379528d6a)
2016-10-12 17:41:01 -07:00
Alex Converse
a1ac972867 Import the aom_read/write_symbol abstractions from aom/master
Change-Id: I0b255c05108c3b97e74df1b59c34111c9e9a5770
2016-10-12 17:41:01 -07:00
Jean-Marc Valin
e874ce0300 Deringing cleanup: remove DERING_REFINEMENT (always on now)
Change-Id: Ic3a6855799be010e69aeab924b013679282ab191
2016-10-12 17:13:09 -07:00
Jean-Marc Valin
8455cd9fc1 Don't run the deringing filter on skipped blocks within a superblock
No change in metrics

Change-Id: Ib1dbe41a9e1a564dd9a63a33e2a5315ad6bca70c
2016-10-12 17:12:45 -07:00
Jean-Marc Valin
56b0c3c51b Don't dering skipped superblocks
No change in metrics

Change-Id: I0da09270d78c3caf78a32a3157f02c87f2232e3e
2016-10-12 17:12:10 -07:00
Yi Luo
e01484e412 Merge "Hybrid forward transform 32x32 AVX2 optimization" into nextgenv2 2016-10-13 00:08:48 +00:00
Steinar Midtskogen
b074823863 On x86 use _mm_set_epi32 when _mm_cvtsi64_si128 isn't available
Change-Id: Ibdbd720d4f68892da6164a9849e212e759305005
2016-10-12 15:48:13 -07:00
Alex Converse
91e4e604bd Merge changes I3ca2b674,I78afc587,I3ae62181,I5ed91556 into nextgenv2
* changes:
  Unfork ANS decode_coefs
  Remove ZERO_TOKEN from the ANS tokenset
  Drop costing ANS tokens from derived probabilities
  Unfork ANS pack_mb_tokens
2016-10-12 22:25:27 +00:00
Debargha Mukherjee
e52816bf8f Fix a bug in inverse halfright 32x32 transform
Fix a bug in the C implementation of the ihalfright32
transform, in the case that its input and output buffers are the same.
This occurs when it is called by av1_iht32x16_512_add_c.

Change-Id: I61c652e2662178520c0639a2879ae128a9c7ec3f
2016-10-12 14:49:18 -07:00
Yi Luo
fed8e1c06d Hybrid forward transform 32x32 AVX2 optimization
- av1_fht32x32 AVX2 function level time reduction ~89% compared to C.

- av1_fht32x32_avx2() on DCT_DCT improves 42.62% over aom_fdct32x32_avx2()
  But function replacement must go with the corresponding inverse txfm.

- No obvious user level time reduction due to 32x32 TX_TYPE selection.

- Zero high 128b YMM to avoid AVX-SSE transition penalties
  (fix 16x16 case).

- Added 32x32 AVX2 unit tests to verify bitexact.

- AVX2 optimization summary:
  On CPU i7-6700, based on 16x16/32x32 fwd txfm optimization results:
  C to AVX2: function level time reduction, ~86-89%.
  SSE2 to AVX2: function level time reduction, ~51%.

Change-Id: Idd0cd8bf066a61c7117140ef15ab6c1f8eb4b036
2016-10-12 14:19:53 -07:00
Hui Su
933bf08cfb Merge "Send allow_screen_content flag for both key and intra only frames" into nextgenv2 2016-10-12 21:13:24 +00:00
Debargha Mukherjee
4282b6bbbb Merge "Refactor expand dry_run types to return coef rate" into nextgenv2 2016-10-12 21:06:41 +00:00
Alex Converse
5e4d00c37e Unfork ANS decode_coefs
This is less code and more like what we have in aom/master.

Change-Id: I3ca2b674e4ad9e2e211d08bb51d78549e8b63a54
2016-10-12 13:23:33 -07:00
Alex Converse
ea7e990fd4 Remove ZERO_TOKEN from the ANS tokenset
This can be re-added after aligning AOM's ANS with nextgenv2's ANS.

This partially reverts commit 3829cd2f2f9904572019aa047d068baeee843767.

Change-Id: I78afc587f1abfe33ffcd53b3262910cfae135534
2016-10-12 13:15:08 -07:00
Alex Converse
ccf472bc05 Drop costing ANS tokens from derived probabilities
This mimics what's currently done in aom/master. This can be re-added
after aligning AOM's ANS with nextgenv2's ANS.

Change-Id: I3ae62181dd4803694204a234c717a86a15ca8a40
2016-10-12 13:14:21 -07:00
Alex Converse
dc62b0925d Unfork ANS pack_mb_tokens
This is less code and more like what we have in aom/master.

Change-Id: I5ed915563cbfbc6281113c1eb31455f50710ba9f
2016-10-12 13:09:13 -07:00
Jim Bankoski
3265ef3d1d AUTHORS regenerated
script changed to remove extra entities and clang-format bot.

Change-Id: I102cd80fdf4b240e6e4d5172943e49146a601a72
2016-10-12 12:26:05 -07:00
Yaowu Xu
c4d8fea575 Merge "minor updates" into nextgenv2 2016-10-12 19:25:47 +00:00
hui su
24f7b07f2e Send allow_screen_content flag for both key and intra only frames
BUG=webm:1311

Change-Id: I03c1043d17ed4e4ea22002473779a9612884c6c6
2016-10-12 11:45:05 -07:00
Yaowu Xu
c49a6f2a21 Merge "Include fix: use aom_integer.h" into nextgenv2 2016-10-12 18:26:30 +00:00
Yaowu Xu
694419b6a6 Merge "Add compiler flag -Wsign-compare" into nextgenv2 2016-10-12 18:26:22 +00:00
Yaowu Xu
732c188523 Merge "LIBVPX_TEST_DATA_PATH -> LIBAOM_TEST_DATA_PATH" into nextgenv2 2016-10-12 17:56:26 +00:00
Yaowu Xu
f36d0b46d1 minor updates
1. vp8->aom
2. removed no-effect statements and spaces

Change-Id: I367d05ff9bf1b9f3c71c517c45d8049d9d4236ec
2016-10-12 10:50:08 -07:00
Sarah Parker
d2b1fe4a1f Merge "Fix inconsistency in gm parameter write to bitstream" into nextgenv2 2016-10-12 17:32:21 +00:00
Urvang Joshi
f792a72740 Include fix: use aom_integer.h
Change-Id: I98919a04bead417379e555461f67978501f922e7
2016-10-12 08:27:00 -07:00
Urvang Joshi
d3a7576fbc Add compiler flag -Wsign-compare
Also, fix the warnings generated by this flag.

Conflicts:
	examples/aom_cx_set_ref.c

Change-Id: I0451e119c52000aa7c1c55027d53f1da5a02a11f
2016-10-12 08:27:00 -07:00
Yaowu Xu
97aa09f658 LIBVPX_TEST_DATA_PATH -> LIBAOM_TEST_DATA_PATH
This commit renames LIBVPX_TEST_DATA_PATH to LIBAOM_TEST_DATA_PATH,
with a workaround for working with Jenkins environment variables.

Change-Id: If664ce57e25ad2af8121d1b578bf64043f0baa2a
2016-10-12 08:26:44 -07:00
Yaowu Xu
445ae93ec7 Merge "y4m_test: fix segfault if test files are missing" into nextgenv2 2016-10-12 04:26:52 +00:00
Yaowu Xu
6bb9b697be Merge "Remove two files not in use" into nextgenv2 2016-10-12 04:26:36 +00:00
Sarah Parker
689b0caea7 Fix inconsistency in gm parameter write to bitstream
Before this change, gm parameters were being written to the
bitstream for all frames, but only read for inter only frames,
causing a bitstream error.

Change-Id: I63b8e2fdf6358e07cc00718de04cc399809bde37
2016-10-11 19:35:26 -07:00
Tristan Matthews
46940a8e7d y4m_test: fix segfault if test files are missing
Change-Id: I7a04beb83095e5c0821048909f81f45be8b5eee3
2016-10-11 18:20:01 -07:00
Alex Converse
5cca4187fe Merge "Remove -fno-strict-aliasing flag" into nextgenv2 2016-10-11 23:24:39 +00:00
Yaowu Xu
5a9b51c725 Remove two files not in use
test/cx_set_ref.sh: replaced by test/aomcx_set_ref.sh
test/vpxdec.sh: replaced by aomdec.sh

Change-Id: I74136d311eee7666e08ed8f573a17f810992fc52
2016-10-11 15:12:11 -07:00
Yaowu Xu
4a01dca3c6 Merge "change to use aomedia copyright notice" into nextgenv2 2016-10-11 22:11:09 +00:00
Yaowu Xu
058ec6cd56 Merge "Fix missing parentheses in v64_align()" into nextgenv2 2016-10-11 22:10:08 +00:00
Yaowu Xu
f72f844572 Merge "Improve v128 and v64 8 bit shifts for x86" into nextgenv2 2016-10-11 22:09:53 +00:00
Yaowu Xu
c96168987d Merge "Clean up and speed up CLPF clipping" into nextgenv2 2016-10-11 22:09:31 +00:00
Yaowu Xu
afb60c361c Merge "Fix typos in CLPF unit test" into nextgenv2 2016-10-11 22:06:59 +00:00
Yaowu Xu
bd979a16c8 Merge "Make generic SIMD code compile if no native support" into nextgenv2 2016-10-11 22:06:43 +00:00
Debargha Mukherjee
ceebb70197 Refactor expand dry_run types to return coef rate
Adds the functionality to return the rate cost due to
coefficients without doing full search of all modes.
This will be subsequently used in various experiments,
including in new_quant experiment to search quantization
profiles at the superblock level without repeating the
full mode/partition search.

Change-Id: I4aad3f3f0c8b8dfdea38f8f4f094a98283f47f08
2016-10-11 14:55:26 -07:00
Yaowu Xu
53a9745c7a Merge "Bugfix in CLPF RDO. Prevented selection of enable_fb_flag=0." into nextgenv2 2016-10-11 21:54:13 +00:00
Yaowu Xu
1aa6cbc7ea Merge "Bugfix in the CLPF RDO." into nextgenv2 2016-10-11 21:53:56 +00:00
Sarah Parker
4082ff0bf6 Merge "Read mode to mi->bmi for sub 8x8 blocks" into nextgenv2 2016-10-11 21:48:01 +00:00
Yaowu Xu
6e0d64c5fe change to use aomedia copyright notice
Change-Id: Idb2cf2555bcbe04a6650c492a3a714d7d5836b67
2016-10-11 12:36:17 -07:00
Steinar Midtskogen
b066b962a7 Fix missing parentheses in v64_align()
Change-Id: I16469062853c101965f56002be30ebc5823975b1
2016-10-11 12:36:17 -07:00
Steinar Midtskogen
9d6a53b8fd Improve v128 and v64 8 bit shifts for x86
Change-Id: I25dc61bab46895d425ce49f89fceb164bee36906
2016-10-11 12:36:17 -07:00
Steinar Midtskogen
e66fc87c46 Clean up and speed up CLPF clipping
* Move clipping tests from inside to outside loops
* Let sizex and sizey to clpf_block() be the clipped block size rather
  than both just bs
* Make fallback tests to C more accurate

Change-Id: Icdc57540ce21b41a95403fdcc37988a4ebf546c7
2016-10-11 12:36:17 -07:00
Steinar Midtskogen
6116141c23 Fix typos in CLPF unit test
Change-Id: Ia69bad44e47509208e3b9d306165d0872d4e92f3
2016-10-11 12:36:16 -07:00
Steinar Midtskogen
ebf209ba82 Make generic SIMD code compile if no native support
Change-Id: I7f691a0ae27f06ef3d727764829a60a8ffc509eb
2016-10-11 12:36:16 -07:00
Steinar Midtskogen
86b19177ab Bugfix in CLPF RDO. Prevented selection of enable_fb_flag=0.
PSNR YCbCr:     -0.01%     -0.06%     -0.17%
   PSNRHVS:      0.01%
      SSIM:      0.03%
    MSSSIM:      0.00%
 CIEDE2000:     -0.05%

Change-Id: I1205c021bfc5cee6f80344fec92aabb529af9bd1
2016-10-11 12:35:48 -07:00
Steinar Midtskogen
2e40cc4ce6 Bugfix in the CLPF RDO.
When CLPF was extended to chroma, the chroma RDO accidentally
discarded the optimal block size found in the luma RDO.

PSNR YCbCr:     -0.25%      0.05%      0.06%
   PSNRHVS:     -0.19%
      SSIM:     -0.36%
    MSSSIM:     -0.23%

Conflicts:
	av1/common/clpf.c

Change-Id: Ie49cd30f9276a311ada88cb2f13d14757617f030
2016-10-11 12:35:10 -07:00
Yaowu Xu
25faa0e9f5 Merge "Move tree writing code into bitwriter.h." into nextgenv2 2016-10-11 19:16:25 +00:00
Yaowu Xu
de005d322a Merge "Remove unused color_sensitivity member from MACROBLOCK." into nextgenv2 2016-10-11 19:16:07 +00:00
Sarah Parker
d7fa8542f6 Read mode to mi->bmi for sub 8x8 blocks
Previously, only the motion vectors were being stored. This caused
a mismatch in the global motion experiment, which needs this
mode information to decide whether or not to use the gm parameters
in reconstruction.

Change-Id: I58cde750ec06587dbfb8d65b07c15a67b7d6b1f6
2016-10-11 11:51:59 -07:00
Yaowu Xu
57aa518c30 Merge "CLPF: Remove redundant function argument." into nextgenv2 2016-10-11 18:44:56 +00:00
Yaowu Xu
80eaf1a120 Merge "Extend CLPF to chroma." into nextgenv2 2016-10-11 18:44:31 +00:00
Yaowu Xu
39b25dfa38 Merge "Remove some dead code in CLPF." into nextgenv2 2016-10-11 18:43:27 +00:00
Yaowu Xu
aaf64c4387 Merge "Print correct info if CLPF unit tests fail." into nextgenv2 2016-10-11 18:42:52 +00:00
Yaowu Xu
443e522b5c Merge "Reduce memory footprint for CLPF encoding." into nextgenv2 2016-10-11 18:42:34 +00:00
Yaowu Xu
a1a7ad0c15 Merge "Make generic SIMD work with clang." into nextgenv2 2016-10-11 18:42:15 +00:00
Yaowu Xu
0bab35bf64 Merge "Fix clang-format warnings in aom_dsp/simd/v64_intrinsics_arm.h" into nextgenv2 2016-10-11 18:41:50 +00:00
Yaowu Xu
a71552421d Merge "Non-normative quality improvements to CLPF." into nextgenv2 2016-10-11 18:41:40 +00:00
Yaowu Xu
038d41045b Merge "Added high bit-depth support in CLPF." into nextgenv2 2016-10-11 18:41:15 +00:00
Yaowu Xu
6fc92c1ccc Merge "Fix a memleak in CLPF." into nextgenv2 2016-10-11 18:41:03 +00:00
Yaowu Xu
a2bbf621f1 Merge "Reduce memory footprint for CLPF decoding." into nextgenv2 2016-10-11 18:40:47 +00:00
Yaowu Xu
4da3ed40a3 Merge "Make CLPF handle frame widths and heights not divisible by 8." into nextgenv2 2016-10-11 18:40:05 +00:00
Yaowu Xu
b5e73bddb0 Merge "CLPF: Don't assume sb size=64 and w&h multiple of 8 + valgrind fix." into nextgenv2 2016-10-11 17:44:12 +00:00
Yaowu Xu
3b161e14b3 Merge "Silence some harmless compiler warnings in CLPF." into nextgenv2 2016-10-11 17:43:23 +00:00
Zoe Liu
d623c4122a Merge "Add a small code clean for show_existing_frame" into nextgenv2 2016-10-11 16:58:17 +00:00
Nathan E. Egge
eeedc633c0 Move tree writing code into bitwriter.h.
Rename av1_write_tree() to aom_write_tree() and move it into bitwriter.h
 to match aom_read_tree() in bitreader.h.

Manually cherry-picked from aom/master:
33a143fa7ac42d62080bfc20468cb76ad26045db

Change-Id: I6c686cdd3e0f179d7e95c5bc6984558b62d46d67
2016-10-11 09:36:01 -07:00
Thomas Daede
debaface95 Remove unused color_sensitivity member from MACROBLOCK.
Conflicts:
	av1/encoder/block.h
	av1/encoder/encodeframe.c

Change-Id: I941e7b9e76380f262b173928d3c5132c5613b3ce
2016-10-11 09:35:39 -07:00
Yaowu Xu
12fcf74c8a Merge "Use derived variable size for memcpy" into nextgenv2 2016-10-11 16:15:43 +00:00
Yaowu Xu
4960f7c3bd Merge "Added generic SIMD support for CLPF." into nextgenv2 2016-10-11 16:05:18 +00:00
Debargha Mukherjee
fb865cf41c Merge "Add sse2 forward / inverse 4x8 and 8x4 transforms" into nextgenv2 2016-10-11 15:50:32 +00:00
Yaowu Xu
c648a9fd83 Use derived variable size for memcpy
Manually cherry-picked from aom/master:
bf2ad75a1723d223c376b93295aa06dd23226937

Change-Id: I99f05e79ec8ad35a49bc124e6dd829ccc7d9cc36
2016-10-10 17:39:29 -07:00
Zoe Liu
5fca72498a Add a small code clean for show_existing_frame
Change-Id: I42dc9f0fdecd3cf3398ab82d6e01dde06bdf7b24
2016-10-10 17:18:57 -07:00
Steinar Midtskogen
ded69f5668 CLPF: Remove redundant function argument.
Change-Id: I31bea3b1f76493060edd7e1bd616a223841d5f77
2016-10-10 15:24:33 -07:00
Steinar Midtskogen
ecf9a0c821 Extend CLPF to chroma.
Objective quality impact (low latency):

PSNR YCbCr:      0.13%     -1.37%     -1.79%
   PSNRHVS:      0.03%
      SSIM:      0.24%
    MSSSIM:      0.10%
 CIEDE2000:     -0.83%

Change-Id: I8ddf0def569286775f0f9d4d4005932766a7fc27
2016-10-10 15:23:38 -07:00
Steinar Midtskogen
9021d09f9a Remove some dead code in CLPF.
av1_clpf_frame() was always called with the same src and dst,
so we only need one argument and the code supporting different
src and dst was removed.

Change-Id: I70919f50e5cfb19c22eb4dff9ee7c0fa2697fad3
2016-10-10 15:23:09 -07:00
Steinar Midtskogen
ee54e5f3c5 Print correct info if CLPF unit tests fail.
Change-Id: Ieac27194f342d8ef9ef98c96ebea9d0c444658cf
2016-10-10 15:21:06 -07:00
Steinar Midtskogen
a8af9126fb Reduce memory footprint for CLPF encoding.
Use in-place filtering, like in the decoder
(see eb5794da1659f87597291d84c2fbdfd89280065d).

Change-Id: If037ead45f5cb3461347a63e0e415954d5dcba8b
2016-10-10 15:20:42 -07:00
Steinar Midtskogen
7b7624e89e Make generic SIMD work with clang.
Change-Id: I2c504a078a7137bea6ba50c5768c1295878e9ea1
2016-10-10 15:18:57 -07:00
Jingning Han
0b44cdcab1 Fix clang-format warnings in aom_dsp/simd/v64_intrinsics_arm.h
Change-Id: I221bf4520d7030133e3b2fea883a995b3d6f6282
2016-10-10 15:18:49 -07:00
Steinar Midtskogen
499deb9def Non-normative quality improvements to CLPF.
BDR improvements:
     PSNR  PSNRHVS SSIM  MSSSIM CIEDE2000 PSNR Cb  PSNR Cr
LL: -0.17% -0.13% -0.11% -0.12%   -0.18%   -0.19%   -0.21%
HL: -0.21% -0.14% -0.15% -0.11%   -0.37%   -0.39%   -0.52%

Change-Id: I58c00a1cc0ddfc3376644f66345e99472482a613
2016-10-10 11:31:50 -07:00
Steinar Midtskogen
3dbd55a6c4 Added high bit-depth support in CLPF.
Change-Id: Ic5eadb323227a820ad876c32d4dc296e05db6ece
2016-10-10 11:27:04 -07:00
Steinar Midtskogen
9351b2f792 Fix a memleak in CLPF.
The memleak appeared in eb5794da1659f87597291d84c2fbdfd89280065d.

Change-Id: Ifdd6d64aafa0d0ce4dfaf1844f594d5f843bf2e0
2016-10-10 11:26:52 -07:00
Steinar Midtskogen
e8224c7ad5 Reduce memory footprint for CLPF decoding.
Instead of having CLPF write to an entire new frame and
copy the result back into the original frame, make the
filter able to work in-place by keeping a buffer of size
frame_width*filter_block_size and delay the write-back
by one filter_block_size row.

This reduces the cycles spent in the filter to ~75%.

Change-Id: I78ca74380c45492daa8935d08d766851edb5fbc1
2016-10-10 11:26:33 -07:00
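
The delayed write-back described in e8224c7ad5 can be sketched as follows. This is a minimal illustration, not the CLPF code: the kernel (`filter_row`, a 3-tap vertical average) and all function names are hypothetical, and the sketch briefly holds the block row being computed alongside the pending one, whereas the commit describes a single buffer of frame_width*filter_block_size.

```python
def filter_row(frame, y):
    """3-tap vertical average with edge clamping (stand-in kernel, not CLPF)."""
    h = len(frame)
    above = frame[max(y - 1, 0)]
    below = frame[min(y + 1, h - 1)]
    return [(a + 2 * c + b + 2) // 4 for a, c, b in zip(above, frame[y], below)]


def filter_copy(frame):
    """Reference behaviour: filter into an entire new frame."""
    return [filter_row(frame, y) for y in range(len(frame))]


def filter_in_place(frame, block_rows):
    """Filter `frame` in place, delaying write-back by one block row."""
    pending = None  # (first_row_index, filtered_rows) not yet written back
    for y0 in range(0, len(frame), block_rows):
        y1 = min(y0 + block_rows, len(frame))
        # The previous block row has not been written back yet, so the
        # kernel still reads unfiltered pixels above and below this block.
        current = [filter_row(frame, y) for y in range(y0, y1)]
        if pending is not None:
            start, rows = pending
            frame[start:start + len(rows)] = rows
        pending = (y0, current)
    if pending is not None:
        start, rows = pending
        frame[start:start + len(rows)] = rows
    return frame
```

The key point is ordering: a block row is written back only after the next block row has been filtered, so no filter input is ever an already-filtered pixel.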
Steinar Midtskogen
34dac00adc Make CLPF handle frame widths and heights not divisible by 8.
Change-Id: If5eb33b6b090f43ba64c82468576b89eddd872c3
2016-10-10 11:26:15 -07:00
Steinar Midtskogen
f4d41e6330 CLPF: Don't assume sb size=64 and w&h multiple of 8 + valgrind fix.
Change-Id: I518ad9c58973910eb0bdcb377f2d90138208c570
2016-10-10 11:21:23 -07:00
Steinar Midtskogen
2fd70ee124 Silence some harmless compiler warnings in CLPF.
Change-Id: I4a6d84007bc17b89cfd8d8f2440bf2968505bd6a
2016-10-10 11:20:43 -07:00
Steinar Midtskogen
be668e92c3 Added generic SIMD support for CLPF.
Change-Id: Ie03f9a5b0a4c708a586532198d755a1e7509f149
2016-10-10 11:19:37 -07:00
Yaowu Xu
607048d606 Merge "Added generic SIMD library supporting x86 SSE2+ and ARM NEON." into nextgenv2 2016-10-10 18:17:50 +00:00
Yaowu Xu
abe0484cee Merge "New CLPF: New kernel and RDO for strength and block size" into nextgenv2 2016-10-10 18:17:41 +00:00
David Barker
4d03d6fc6f Add sse2 forward / inverse 4x8 and 8x4 transforms
Change-Id: I89ed93fb20cf975c2b463cff58879521ceaa4163
2016-10-10 09:02:45 -07:00
Yi Luo
3a8217f21b Merge "Hybrid forward transforms 16x16 AVX2 optimization" into nextgenv2 2016-10-07 01:52:11 +00:00
Debargha Mukherjee
609453e7e4 Merge "Added sse2 inverse 8x16 and 16x8 transforms" into nextgenv2 2016-10-07 00:03:34 +00:00
Debargha Mukherjee
e4dc5f8dc9 Merge "A bug fix for var-tx" into nextgenv2 2016-10-07 00:02:31 +00:00
Johann
9ed9cedae1 Remove -fno-strict-aliasing flag
The referenced bug was fixed by saving neon registers. That this had any
effect was coincidental.

Both chromium and Android build with clang and neither uses this flag.

Change-Id: I470247d6fd9226fc207b42a187105581a94badc3
(cherry picked from commit fad70a358b9ab832f5f2ece1609936b80b649c71)
2016-10-06 15:52:39 -07:00
Yi Luo
e8e8cd8f1b Hybrid forward transforms 16x16 AVX2 optimization
- Unit tests are added for AVX2 SIMD.
- Encoder speed improvement:
  AV1 baseline and EXT_TX, three 1080p sequences at bitrate:
  800 Kbps, 2 Mbps, 6 Mbps, on i7-6700 CPU, average
  user level time reduction: 3.86%.

Change-Id: Ibbd7837ee3a831c6b1e4e471bf6c8d3fa3a19ff4
2016-10-06 15:33:15 -07:00
Alex Converse
24aa59cc51 Fix left shift of negative integer in hbd directional predictors
Change-Id: Id78139ae2dfa2d521bd50618b4a81cf24e09e391
2016-10-06 11:41:47 -07:00
Peter de Rivaz
1baecfeb03 Added sse2 inverse 8x16 and 16x8 transforms
Change-Id: I43628407b11e5c8e6af4df69f2acdc67ac827834
2016-10-06 11:23:14 -07:00
Debargha Mukherjee
29804479b5 Merge "Silence some warnings" into nextgenv2 2016-10-06 18:15:16 +00:00
Debargha Mukherjee
28d924b7b8 A bug fix for var-tx
Fixes a crash with supertx, ext-tx and rect-tx

Change-Id: I6b5f4cfd6e209558541a791be685b55156aa0138
2016-10-06 11:14:27 -07:00
Steinar Midtskogen
a5f8ea1109 Added generic SIMD library supporting x86 SSE2+ and ARM NEON.
Change-Id: I037f4c44f621a7e909b82ccb6a299d41bcbf8607
2016-10-06 16:37:08 +00:00
Steinar Midtskogen
d06588ab18 New CLPF: New kernel and RDO for strength and block size
This commit ports a CLPF change from aom/master by manually
cherry-picking:
7560123c066854aa40c4685625454aea03410b18

Change-Id: I61eb08862a101df74a6b65ece459833401e81117
2016-10-06 09:36:03 -07:00
Jingning Han
3b22d1a875 Merge "Make ref_mv_idx syntax context dependent on block distance only" into nextgenv2 2016-10-06 15:55:40 +00:00
Angie Chiang
9c2d401ca0 Merge "Simplify file dependencies of SIMD implementation of interpolation filters" into nextgenv2 2016-10-05 16:26:26 +00:00
Jingning Han
8205b78552 Make ref_mv_idx syntax context dependent on block distance only
This allows the hardware decoder to start decoding ref_mv_idx
syntax prior to the sorting stage and hide the latency of entropy
decoding. The compression performance change is at about the 0.01% level.

Change-Id: I86b34f31f6c99a36ae2780416175cc0bd90ff492
2016-10-05 09:09:00 -07:00
Debargha Mukherjee
1ae9f2cfab Silence some warnings
Change-Id: I8efb64eac3438484e7a77a8a1db198223fc52bfa
2016-10-04 14:30:16 -07:00
Debargha Mukherjee
cb603790b0 Fix a compiler warning in ext-inter experiment
Change-Id: If36417c1384646da57453344b208e7653a4d31e5
2016-10-04 13:22:21 -07:00
Debargha Mukherjee
1a16a987ee Fix an integer overflow issue in restoration
https://bugs.chromium.org/p/webm/issues/detail?id=1306

Change-Id: Icd11d373ff08954121c097728e4c7791791e223f
2016-10-04 11:50:00 -07:00
Alex Converse
438b1dcb72 Merge "ext_tx: fix a signed overflow" into nextgenv2 2016-10-04 17:24:06 +00:00
Yi Luo
2b47628903 Merge "Fix high bitdepth variance overflow on uint32_t" into nextgenv2 2016-10-04 17:11:07 +00:00
Angie Chiang
b9ba5c251b Simplify file dependencies of SIMD implementation of interpolation filters
This is similar to the following aom CL:
https://aomedia-review.googlesource.com/#/c/1961/

Move SIMD-related functions from filter.c/h to the following files:
av1_convolve_ssse3.c
av1_highbd_convolve_filters_sse4.c

Change the following C files to header files:
av1_highbd_convolve_filters_sse4.c
av1_convolve_filters_ssse3.c

Change-Id: I41a3cc6b0789e632451aeda82f5eb97a4d78e370
2016-10-03 18:43:23 -07:00
Yi Luo
a674ba93fe Fix high bitdepth variance overflow on uint32_t
BUG=webm:1305

Change-Id: I4c56631359e298b99e618c07bcbae9f793c5e2ac
2016-10-03 16:37:00 -07:00
Yi Luo
8e46b860c6 Fix filter type mismatch warning on Visual Studio
- Move filter look-up functions to corresponding optimization modules.

BUG=webm:1296

Change-Id: I87f399609052db2dbc7e5a590afb08b82e3fa89f
2016-10-03 16:24:25 -07:00
Alex Converse
aa77b5168f ext_tx: fix a signed overflow
Change-Id: I9a08bc5da1a84c3d4b8fe2d457bb80406c0bc028
2016-10-03 16:17:24 -07:00
Debargha Mukherjee
bf0431276d Merge "Further changes to new-quant tables" into nextgenv2 2016-10-03 21:10:30 +00:00
Yaowu Xu
0badd0201e Merge "decode_with_drops.sh : make sample test work for av1" into nextgenv2 2016-10-03 19:57:43 +00:00
Yaowu Xu
6f745a2795 Merge "decode_with_drop.sh: vp8->aom" into nextgenv2 2016-10-03 19:57:30 +00:00
Yaowu Xu
f54733ad25 Merge "decode_to_md5_test: fixes and runs quick encode and checks decode" into nextgenv2 2016-10-03 19:57:17 +00:00
Yaowu Xu
4707c75a5b Merge "decode_to_md5.sh: vp8->aom" into nextgenv2 2016-10-03 19:56:59 +00:00
Jim Bankoski
ddb90bb445 decode_with_drops.sh : make sample test work for av1
Change-Id: I4175070840a6561c1cec5f5a50b64e425f3e2926
2016-10-03 11:13:13 -07:00
Yaowu Xu
a7c2e5c3f5 decode_with_drop.sh: vp8->aom
Change-Id: I22dacbc2e4933a60ce7151204af9ee253990ca1f
2016-10-03 11:13:08 -07:00
Jim Bankoski
0d730f95f0 decode_to_md5_test: fixes and runs quick encode and checks decode
This test checks if there's any basic change to the bitstream or default
encoder by running an encode and checking that the md5 from the decode
doesn't change.

Any change to the default encoder or bitstream should be accompanied by
a change to the md5 in this file.

Change-Id: Ibdd5a1442296fd3e946823ec1f43e8ac4e66dd34
2016-10-03 11:12:13 -07:00
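
The check described in 0d730f95f0 amounts to comparing an MD5 of the decoded output against a stored value. A minimal sketch, assuming decoded frames are available as byte strings; the function names and the golden value below are illustrative and are not taken from the test script:

```python
import hashlib


def frames_md5(frames):
    """MD5 over the concatenation of all decoded frame buffers."""
    md5 = hashlib.md5()
    for frame in frames:
        md5.update(frame)
    return md5.hexdigest()


def matches_golden(frames, golden_md5):
    """True if the decode output is unchanged relative to the stored MD5."""
    return frames_md5(frames) == golden_md5
```

Any intentional change to the bitstream or default encoder changes the digest, which is why the stored MD5 has to be updated in the same commit.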
Yaowu Xu
8982e20889 decode_to_md5.sh: vp8->aom
Change-Id: I0dcb0643cf83ee99b63336df851cbca749c11b68
2016-10-03 11:11:54 -07:00
Jingning Han
42bc3a9ef3 Sync ref-mv experiment between aom and nextgenv2
Change-Id: I134d276234b3b8aa7df1ab647892b5d739647f4c
2016-10-03 09:02:20 -07:00
Debargha Mukherjee
3c42c09608 Further changes to new-quant tables
Refactor to streamline the number of profiles needed, in
preparation for the next steps.

NO change in performance.

Change-Id: I753b89299897857f3c250c316b4cdc4fedcb90e8
2016-10-01 17:59:28 -07:00
Jingning Han
1f470046aa Merge "Rename aom_write_nmv_probs as av1_write_nmv_probs" into nextgenv2 2016-10-01 01:06:09 +00:00
Yaowu Xu
d59fb48bc7 Add notes for an option
cherry-picked from aom/master:
2b407394907253be68bc497aa978b0adc298bbf8

Change-Id: Ia7b3bfd68e2c31b21f49a429fecc4d0b701b045f
2016-09-30 15:39:53 -07:00
Yaowu Xu
671f2bd3f5 Rename AOM_ENC/DEC_BORDER_IN_PIXELS
Cherry-picked from aom/master:
e2721a65cbfb5b560cd884d60eb17f53539df5f0

Change-Id: I4ade58be91e7bca0cc4f2bed98a43177d7f590a5
2016-09-30 15:17:16 -07:00
Jingning Han
71e4553c3b Clean up av1_adapt_mv_probs format
Change-Id: Ib5226d4fe3dcf916fe8954c7240966e3a32eed31
2016-09-30 17:58:21 +00:00
Jingning Han
5c60cdf23f Sync assign_mv format
Change-Id: I4fea280d72d7e428f2ab0820fd728997d5a903c9
2016-09-30 17:58:06 +00:00
Jingning Han
3b0a3f3ab3 Merge "Set spatial neighbor search resolution 16x16 for block size 64x64" into nextgenv2 2016-09-30 17:57:52 +00:00
Jingning Han
dcf1b40d91 Merge "Search collocated reference block in 16x16 unit" into nextgenv2 2016-09-30 17:45:09 +00:00
Jingning Han
fd0cf16d7f Rename aom_write_nmv_probs as av1_write_nmv_probs
Change-Id: Ia33ce4918d3d40eba331f81909f3f1f0f3ab7a58
2016-09-30 10:34:33 -07:00
Jingning Han
75e513f126 Set spatial neighbor search resolution 16x16 for block size 64x64
When the block has width/height above or equal to 64, use 16x16
block search step for reference motion vector search in the non-
immediate rows and columns.

Change-Id: If11ce97a9328b879f30ef87115086aa0cd985a2f
2016-09-30 10:00:10 -07:00
Jingning Han
883c63ca57 Search collocated reference block in 16x16 unit
Use 16x16 block resolution for collocated reference motion vector
search.

Change-Id: I1091b5b178e255eb6cc0b994de360994f7661b79
2016-09-30 09:04:21 -07:00
Alex Converse
770911d48c Merge changes I319cb856,Ib009b6b6 into nextgenv2
* changes:
  Remove multi-entropy coder hacks from the treewriter
  Rename rans_dec_lut to rans_lut
2016-09-29 21:54:28 +00:00
Jingning Han
d54e5a04c4 Merge "more ref_mv changes from aom/master" into nextgenv2 2016-09-29 21:46:56 +00:00
Yue Chen
7dc7703bcb Merge "Fix unit test failure for RECT_TX + VAR_TX" into nextgenv2 2016-09-29 21:41:10 +00:00
Yaowu Xu
4306b6e599 more ref_mv changes from aom/master
Change-Id: I9152f898dfacdf3877ed719f193bb1e0dbee0a1a
2016-09-29 12:41:55 -07:00
Yue Chen
235133a22e Fix compiler error for GLOBAL_MOTION+WARPED_MOTION
Fix the logical OR computation in the .mk file. Otherwise, when both
experiments are on, the output of $(filter... will be two 'yes',
which causes a missing-library issue.

Change-Id: I53c44e925dc9ea77c7467217c20e4f1bc7e20fc3
2016-09-29 12:12:47 -07:00
Yue Chen
8e87224604 Merge "Move warping model estimation functions to COMMON folder" into nextgenv2 2016-09-29 18:24:32 +00:00
Alex Converse
57aa0f656d Merge changes Ideda50a6,Id2bced5f,If423eeb3 into nextgenv2
* changes:
  Port ANS from aom/master 25aaf40
  Refactor bitreader and bitwriter wrapper.
  Migrate aom/master ANS test from d311d02.
2016-09-29 16:43:12 +00:00
Yue Chen
49587a77f1 Fix unit test failure for RECT_TX + VAR_TX
Disable rect_tx because we only support 4x4 Walsh-Hadamard transform
in lossless mode.

Fixes failure in ./test_libaom --gtest_filter=*Large*ScreencastQ0/1
Configuration: --enable-experimental --enable-var-tx --enable-rect-tx
 --enable-ref-mv --enable-ext_intra --enable-ext_tx --enable-debug
 --disable-optimizations

Change-Id: Ib6b3494c7dcf7182f1cab9b138388d054851a23d
2016-09-29 09:20:52 -07:00
Debargha Mukherjee
485af9e580 Merge "Change non-uniform-quant parameters" into nextgenv2 2016-09-29 16:04:58 +00:00
Debargha Mukherjee
4ee1f71b5c Merge "Update codec name in test environment to match decoder" into nextgenv2 2016-09-29 16:04:37 +00:00
Jingning Han
4d7d2254bc Merge "mvref_common.c: port refactoring from aom/master" into nextgenv2 2016-09-29 15:45:20 +00:00
Jingning Han
3f485e9528 Merge "Remove an intermediate variable" into nextgenv2 2016-09-29 15:45:10 +00:00
Alex Converse
5847de75c2 Remove multi-entropy coder hacks from the treewriter
Change-Id: I319cb856a16ace343359c2aebc449c1d73bdedee
2016-09-28 15:35:12 -07:00
Alex Converse
33590f8c71 Rename rans_dec_lut to rans_lut
It's used in both encoding and decoding. Matches (historical)
implementation in aom/master.

Change-Id: Ib009b6b6023cfe69e99a0a92f3c70f4416fcdb47
2016-09-28 15:35:04 -07:00
Alex Converse
7fe2ae8e88 Port ANS from aom/master 25aaf40
Reconciles the following commits from aom/master to nextgenv2:
- 25aaf40bbc24beeb52de9af7d7624b7d7c6ce9de
- 87073de5693df70eba1c9b9be2b2732ed3b08fb3

Change-Id: Ideda50a6ec75485cb4fa7437c69f4e58d6a2ca73
2016-09-28 12:07:00 -07:00
Alex Converse
018150d01b Clang-format ransac.c
Change-Id: I1679da4fb8832133ab1bcb396f4bed4e5448e504
2016-09-28 12:07:00 -07:00
Nathan E. Egge
e691a24cff Refactor bitreader and bitwriter wrapper.
Move code for reading and writing literals and reading trees to use
just the aom_read_bit() and aom_write_bit() function calls.

Change-Id: Id2bced5f0125a5558030a813c51c3d79e5701873
(cherry picked from aom/master commit bc1ac15846a200272551699d45457039535e56b2)
2016-09-28 12:07:00 -07:00
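
The idea in e691a24cff, building literal reads and writes on top of single-bit calls, can be sketched as below. The `ListBitReader` and `ListBitWriter` classes are hypothetical stand-ins for the entropy coder (they just store raw bits), and the MSB-first bit order is an assumption of this sketch.

```python
class ListBitWriter:
    """Toy bit sink; stands in for a writer exposing a write_bit() call."""

    def __init__(self):
        self.bits = []

    def write_bit(self, bit):
        self.bits.append(bit & 1)


class ListBitReader:
    """Toy bit source; stands in for a reader exposing a read_bit() call."""

    def __init__(self, bits):
        self._bits = iter(bits)

    def read_bit(self):
        return next(self._bits)


def write_literal(writer, value, nbits):
    """Write `value` as `nbits` bits, most significant bit first."""
    for shift in range(nbits - 1, -1, -1):
        writer.write_bit((value >> shift) & 1)


def read_literal(reader, nbits):
    """Read `nbits` bits, most significant bit first, into an integer."""
    value = 0
    for _ in range(nbits):
        value = (value << 1) | reader.read_bit()
    return value
```

Because the literal helpers only call the one-bit primitives, swapping the underlying coder does not require touching them.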
Alex Converse
5d33cc42c3 Migrate aom/master ANS test from d311d02.
This helps in porting entropy coder changes that happened in aom/master.

Change-Id: If423eeb3da552066cceb88227138ea61d6a20f07
(cherry picked from aom/master commit d311d02da55433d20aad6dd88e0bbb992919988d)
2016-09-28 12:07:00 -07:00
Peter de Rivaz
105fa6d9f2 Update codec name in test environment to match decoder
The codec name is defined in av1_dx_iface.c
This name needs to match kAV1Name in decode_test_driver.cc.
Otherwise the EndtoEndPSNRTest fails when built with --enable-ext-tile
(because we need the IsAV1 function to return true).

Change-Id: I05d5ea5b6fd4bbd49e8bcacd047fb81c27efb3b3
2016-09-28 09:11:41 -07:00
Debargha Mukherjee
9324d38825 Change non-uniform-quant parameters
Also adds hooks to choose different profiles for UV and intra.

Results
lowres: -0.15%
midres: -0.24%

Change-Id: I4af8bc3e9b82b6f8a061dce9f52c89afa6239ae1
2016-09-28 09:09:35 -07:00
Yue Chen
1ab57800f1 Move warping model estimation functions to COMMON folder
These functions will be called by both enc and dec in WARPED_MOTION
experiment.

Change-Id: I4b4a20af111b30822760aee8c9451e9ccbb2dd05
2016-09-27 17:59:45 -07:00
Yi Luo
fda2f1b95a Merge "Add a TODO for aom_highbd_fdct16x16_1_sse2 tests" into nextgenv2 2016-09-27 22:21:54 +00:00
Yaowu Xu
dc035da9b9 mvref_common.c: port refactoring from aom/master
Change-Id: I53cf072f33de957eed6bf6be270218db8ff33af9
2016-09-27 11:59:15 -07:00
Yaowu Xu
439286a6c5 Remove an intermediate variable
This commit changes to use function parameter "len" directly.

Change-Id: I072d165aeca59cfbbcf52c9be3c2a91e3191b980
2016-09-27 10:13:33 -07:00
Yue Chen
6c6ddac3a4 Merge "Fix for compile error with RECT_TX without EXT_TX" into nextgenv2 2016-09-27 06:46:19 +00:00
Alex Converse
c8b229772e Merge changes I13eed9cb,I3b213790,I7232f9ae into nextgenv2
* changes:
  Remove VP10 style bitreader and bitwriter wrappers
  Rename av1_ans_test to match aom/master.
  Migrate bitreader to the interface from aom/master
2016-09-26 22:34:57 +00:00
Yaowu Xu
c7d6eaa5fe Merge "rename pred_mv_s8 to pred_mv" into nextgenv2 2016-09-26 21:12:05 +00:00
Alex Converse
4fb213f31f Remove VP10 style bitreader and bitwriter wrappers
Change-Id: I13eed9cb6950ea4fbdd586d43b73ac0cc2d78d33
2016-09-26 14:02:34 -07:00
Alex Converse
0ad82c6edb Rename av1_ans_test to match aom/master.
Change-Id: I3b2137903a87a1f8169ff45e940575b917c26a6a
2016-09-26 13:15:41 -07:00
Alex Converse
acef60bd2c Migrate bitreader to the interface from aom/master
Change-Id: I7232f9ae3d97e730f66e4b80f550192e3ef7230b
2016-09-26 12:19:11 -07:00
Sarah Parker
f94296dec6 Merge "Add double precision warping for ransac" into nextgenv2 2016-09-26 19:03:52 +00:00
Yaowu Xu
f5bbbfad1d rename pred_mv_s8 to pred_mv
Change-Id: Ib1088c3fc80952074e098385fe5eb81742e7dc59
2016-09-26 09:13:38 -07:00
Yaowu Xu
d9470c20df Merge "minor format fix" into nextgenv2 2016-09-26 15:13:05 +00:00
Yaowu Xu
3bf484efb2 Merge "change to use aomedia copyright notice" into nextgenv2 2016-09-26 15:12:57 +00:00
Peter de Rivaz
a7c814664e Fix for compile error with RECT_TX without EXT_TX
Change-Id: I2f4e3fc877c03a5bee7f7fd1dc50e6a693697647
2016-09-26 14:20:13 +01:00
Alex Converse
71427df526 Merge "enums.h: Combine related #defines into packed enums." into nextgenv2 2016-09-24 00:38:53 +00:00
Yaowu Xu
def1a3d65e minor format fix
Change-Id: Ia4a37d43a7110c84cda6ad317aa7f799e00bde82
2016-09-23 15:37:46 -07:00
Yaowu Xu
5e53c43ec7 change to use aomedia copyright notice
av1/common/allcommon.h
doc.mk

Change-Id: I7e08c9131ab1c0d7e7854f7e70b90397d041143a
2016-09-23 15:37:36 -07:00
Sarah Parker
97fa6da1d2 Add double precision warping for ransac
Change-Id: I32b6e2e6c8454ffb64e4a4ceb87070d175f05fe9
2016-09-23 11:19:27 -07:00
Alex Converse
1d1e0844e9 Merge "Migrate bitwriter to the interface in aom/master" into nextgenv2 2016-09-23 01:18:30 +00:00
Debargha Mukherjee
60b2927d51 Merge "Fix bug in table for UV tx size" into nextgenv2 2016-09-22 18:53:51 +00:00
Debargha Mukherjee
6de06dd3d8 Fix bug in table for UV tx size
Change-Id: I086b79462b0933cf9dc1101ff71cbc71c7da2738
2016-09-22 10:10:20 -07:00
Urvang Joshi
cb586f3ba9 enums.h: Combine related #defines into packed enums.
enums for BLOCK_SIZE, TX_SIZE and PREDICTION_MODE.

Note: These were converted to #defines earlier to save on memory:
https://chromium-review.googlesource.com/#/c/269854/

But we, instead, use attribute 'packed' (see here:
https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html#Common-Type-Attributes)
to ensure that these enums use the smallest possible integer type,
and so use smallest memory when used in structs/arrays etc.

Change-Id: If1fc136686b28847109c9f3a06f8728165e7e475
2016-09-22 09:44:51 -07:00
Angie Chiang
6062a8bfee bitstream_debug: build related cleanup
Move experimental config from debug_util.c/h to aom_util.mk to avoid
empty object.

Change-Id: Id7978ed6a342262bddaa4df8b53115e750fa1c2c
2016-09-22 09:37:56 -07:00
Alex Converse
080a2cccba Migrate bitwriter to the interface in aom/master
Change-Id: I73d46229f0feea43cbe933e51da997833cce032b
2016-09-21 11:17:08 -07:00
Debargha Mukherjee
7a9ad9c83f Merge "Misc. refactoring of loop restoration" into nextgenv2 2016-09-21 04:37:17 +00:00
Debargha Mukherjee
5d89a63a7e Misc. refactoring of loop restoration
Streamlines the functions and data structures to make it
easy to add new restore options.

Change-Id: Ib00638a5749e6c38c2455f3e3142b1025e6e0624
2016-09-20 20:46:36 -07:00
Sarah Parker
8f71e396b1 Merge "Fix naming mistake in multiply_mat" into nextgenv2 2016-09-20 23:15:35 +00:00
Alex Converse
3e457ba154 Merge changes I38f40582,Ib7afcffa into nextgenv2
* changes:
  Move ANS to aom_dsp.
  Move and wrap the old vpx boolcoder.
2016-09-20 22:55:18 +00:00
Sarah Parker
8f90d8b59b Fix naming mistake in multiply_mat
This was introduced in a cleanup in
I1e07ccab18558dfdd996547a72a396abe02ed23d

Change-Id: If6ac798d838a1ad392981f4e5970778207c3cb0b
2016-09-20 15:37:15 -07:00
Yi Luo
fbf5681aae Add a TODO for aom_highbd_fdct16x16_1_sse2 tests
- Here the function aom_fdct16x16_1_sse2 is mistakenly tested. It can pass
  AOM_BITS_8, AOM_BITS_10, but not AOM_BITS_12. We should fix this test
  when aom_highbd_fdct16x16_1_sse2 is available.

Change-Id: I5cac6ee5404ff6d833940e1ecc34663b29d7a41c
2016-09-19 16:26:08 -07:00
clang-format
bda8d61ed1 apply clang-format after 5cd2ab9
Change-Id: I186e90d99cd54e66d38159b7cb55a881226b1568
2016-09-19 15:56:08 -07:00
Alex Converse
674e9a7ca6 Merge "Use the aom_writer type rather than the tag in calling code." into nextgenv2 2016-09-19 21:50:56 +00:00
Alex Converse
1ac1ae73dc Move ANS to aom_dsp.
That's where it lives in aom/master.

Change-Id: I38f405827d9c2d0b06ef5f3bfd7cadc35d5991ef
2016-09-19 09:51:27 -07:00
Pascal Massimino
e5868cdba9 Merge "Kludge to keep ANS building while porting from aom/master." into nextgenv2 2016-09-18 07:21:58 +00:00
Alex Converse
e54fd03c5a Use the aom_writer type rather than the tag in calling code.
This makes room for typedefing some other struct to aom_writer.

Change-Id: I1e82de1320da00b3e41c90b14f2df45e7628aa89
(cherry picked from commit d69161f8f1eed602e0e5d21f4e6157b674e30cf6)
2016-09-17 14:56:51 -07:00
Alex Converse
eb00cb289b Move and wrap the old vpx boolcoder.
This should make room for compile time pluggable replacements.

Change-Id: Ib7afcffa93bf664b89a49da21a20138127443292
(cherry picked from commit 9dd0b8982445515d6dddb6342e655b56062a8f7f)
2016-09-17 14:56:51 -07:00
Alex Converse
9264650838 Kludge to keep ANS building while porting from aom/master.
Change-Id: I9e74bdb94c5640aca025b11b6676e8a8c008f47e
2016-09-17 14:56:48 -07:00
Debargha Mukherjee
4c80804e66 Merge "Enable tile-adaptive restoration" into nextgenv2 2016-09-17 19:10:28 +00:00
Debargha Mukherjee
5cd2ab95c9 Enable tile-adaptive restoration
Includes a major refactoring/enhancement to support
tile-adaptive switchable restoration. The framework can be
readily extended to add more restoration schemes in the
future. Also includes various cleanups and fixes.

Specifically the framework allows restoration to be conducted
on tiles such that each tile can be either left unrestored, or
use bilateral or wiener filtering.

There is a modest improvement in coding efficiency (0.1 - 0.2%).

Further enhancements will be added subsequently to improve coding
efficiency and complexity.

Change-Id: I5ebedb04785ce1ef6f324abe209e925c2d6cbe8a
2016-09-17 09:46:28 -07:00
Sarah Parker
f9a961c5d0 Style fixes for global motion experiment
These are in response to a post-commit review in
Ib6664df44090e8cfa4db9f2f9e0556931ccfe5c8

Change-Id: I1e07ccab18558dfdd996547a72a396abe02ed23d
2016-09-16 16:22:24 -07:00
clang-format
67948d312d apply clang-format
Change-Id: If22018f8911d9d7ee99c2127bdfcc56e42b0e2d7
2016-09-15 16:41:21 -07:00
James Zern
964a717acf .clang-format: update to 3.8.1
based on --style=Google with the following differences:
3a4
> # Generated with clang-format 3.8.1
13c14
< AllowShortCaseLabelsOnASingleLine: false
---
> AllowShortCaseLabelsOnASingleLine: true
41c42
< ConstructorInitializerAllOnOneLineOrOnePerLine: true
---
> ConstructorInitializerAllOnOneLineOrOnePerLine: false
44,45c45,46
< Cpp11BracedListStyle: true
< DerivePointerAlignment: true
---
> Cpp11BracedListStyle: false
> DerivePointerAlignment: false
73c74
< PointerAlignment: Left
---
> PointerAlignment: Right
75c76
< SortIncludes:    true
---
> SortIncludes:    false

SortIncludes will likely be enabled in a future commit

Change-Id: I5c404f44081b65354e7f526411c91fbbe31ac5af
(cherry picked from commit 6d84689870e1437b2ebb5df56c672b3249b975bb)
2016-09-15 15:12:14 -07:00
Jingning Han
1aab81843d Sort header files
cherry-picked #ecd07473 from aom/master

Change-Id: Id8f45d9c11406fc301b39801c5228ccd6aa2d5d6
2016-09-09 16:45:02 -07:00
Jim Bankoski
f7f043774b aomdec.sh : Make this test create files if needed to test decoder.
If test files don't already exist it calls aomenc to create them.

cherry-picked #ee9ac321 from aom/master

Change-Id: I0e0f33cb60b3492e9106d6c9e2c51f64f71ebb63
2016-09-09 16:39:21 -07:00
Jim Bankoski
5d105b40c3 simple_encoder: make it so we can run it in tests.
Added a limit parameter, resolving a TODO, so that we
can do a very simple, fast encode in 1 pass.

Change-Id: I265cd912d970d560a0b00b86e6c7ec7b6fef1e7b
2016-09-09 15:54:51 -07:00
Jim Bankoski
e78a964e29 simple_decoder.sh: Support encoding in decode test scripts.
Adding AV1 input files to the test set is not feasible because the
bitstream is in constant flux. Add test input encoding and hook
it up in simple_decoder.sh to start.

cherry-picked #b591df89 from aom/master

Change-Id: Ie4c06a7c458cdc2ab003d27fb92418c77c87fc88
2016-09-09 15:49:56 -07:00
Yaowu Xu
2a88d24907 Merge "Convert to int before adding negative numbers" into nextgenv2 2016-09-09 22:39:32 +00:00
Yaowu Xu
f9490ff58a Merge "Convert "var" to uint64_t" into nextgenv2 2016-09-09 22:39:24 +00:00
Yaowu Xu
ca38a67a5c Merge "twopass_encoder: sample and test script fixed." into nextgenv2 2016-09-09 22:39:16 +00:00
Yaowu Xu
250a52ed06 Merge "set_maps: add back script and fix." into nextgenv2 2016-09-09 22:39:07 +00:00
Yaowu Xu
66c41f9937 Merge "Clarify valid value ranges" into nextgenv2 2016-09-09 22:38:57 +00:00
Yaowu Xu
34ddb7ab1f Merge "change to use correct type" into nextgenv2 2016-09-09 22:38:44 +00:00
Debargha Mukherjee
8e80f422d6 Merge "Add SSE2 versions of av1_fht8x16 and av1_fht16x8" into nextgenv2 2016-09-09 20:51:03 +00:00
Yaowu Xu
8706182376 Convert to int before adding negative numbers
This is to avoid -1 overflowing uint32_t.

cherry-picked #c48106da from aom/master

Change-Id: Ic3d99b1985cdb0a28cc83f8291422f5aba5a5a6d
2016-09-09 12:43:02 -07:00
Yaowu Xu
aa8729c55f Convert "var" to uint64_t
This is to avoid overflow at uint32_t.

cherry-picked #000098a0 from aom/master

Change-Id: I549d2d13d0577fd05d57303a438fbc8034755e45
2016-09-09 12:42:12 -07:00
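
Commit aa8729c55f does not show the computation involved, so the following is only an illustration of the kind of overflow a uint32_t accumulator can hit. It assumes a sum of squared differences over a 64x64 block at 12-bit depth; the block size, bit depth, and function names are assumptions of the sketch, and Python integers are masked to emulate fixed-width C types.

```python
U32_MASK = 0xFFFFFFFF
U64_MASK = 0xFFFFFFFFFFFFFFFF


def sse_u32(diffs):
    """Sum of squares with a 32-bit accumulator (wraps like uint32_t)."""
    acc = 0
    for d in diffs:
        acc = (acc + d * d) & U32_MASK
    return acc


def sse_u64(diffs):
    """Sum of squares with a 64-bit accumulator (wraps like uint64_t)."""
    acc = 0
    for d in diffs:
        acc = (acc + d * d) & U64_MASK
    return acc


# Worst case assumed here: every one of 64*64 pixels differs by 4095.
worst_case = [4095] * (64 * 64)
```

For `worst_case` the true sum exceeds 2**32, so `sse_u32` returns a wrapped value while `sse_u64` returns the exact one; for small inputs the two agree.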
Jim Bankoski
a65e7beea8 twopass_encoder: sample and test script fixed.
Added a limit function, removed a TODO, and fixed the script so that
it can actually be run on av1.

cherry-picked #1801d35d from aom/master

Change-Id: Ib8d1d1b5c7dbe0169e4e6c7d89d28801d7699c37
2016-09-09 12:38:37 -07:00
Jim Bankoski
a7a3909f55 set_maps: add back script and fix.
cherry-picked #a5c5f856 from aom/master

Change-Id: Ie50a81063b5e14f4b5f3b5adcb822dba6b3ee93d
2016-09-09 12:33:09 -07:00
Yaowu Xu
6feda0602a Clarify valid value ranges
This commit adds asserts to clarify value ranges in sum computations,
also corrects type conversion used in related calculations.

cherry-picked #738d5b19 from aom/master

Change-Id: Ib6d574ec23e5c28ccd994dac26f973eb3920430d
2016-09-09 11:58:53 -07:00
Yaowu Xu
57d92577d4 change to use correct type
This commit changes to use uint32_t for cost (always non-negative),
and promote to int64_t before calculation of the savings.

This fixes an integer overflow.

cherry-picked #a3028ddf from aom/master

Change-Id: I71c2580d188cc79d2d8069241d0353cf331b5c83
2016-09-09 11:52:34 -07:00
Jim Bankoski
19a06bccdf resize_util.sh : resize util was removed.
The app this script called was removed in this patch.
50cbe24 remove more vp8 and vp9 only code

cherry-picked #1c17dd6f from aom/master

Change-Id: Ib622eff6a3a35c5dab26908b094ace969f128c11
2016-09-09 11:51:13 -07:00
Thomas Daede
ac0a380ea2 Make deadline mode not depend on frame duration.
Backwards compatible with old API.

cherry-picked #02ae3dd3 from aom/master.

Change-Id: I65aa43f84bb9491e8cca73fe444094c2622b0187
2016-09-09 11:50:33 -07:00
Thomas Daede
f56859f198 Fix decoding Daala deringing and CLPF filters with tiling.
cherry-picked #14ed7a61 from aom/master

Change-Id: I077b0e97186bdd292f925e08966a2ca3cf8c250d
2016-09-09 11:47:56 -07:00
Yaowu Xu
af048635bb Change to use correct type
This commit changes to use int instead of unsigned for a variable used in
inverse quantization.

Change-Id: I8f0ff5f80c9e68d52425265ef177357c65ead1e2
2016-09-09 18:47:15 +00:00
Jim Bankoski
23938a73c0 aomenc: Remove tests unsupported in av1.
Change-Id: I9379eedd577c8bfb7b82f1c996e4ee4c62ce686b
2016-09-09 18:46:57 +00:00
Yaowu Xu
3a45d57574 vp8_multi_resolution_encoder.sh: remove file
Change-Id: I3be6480b98cdde4c24b6cdfbebf362072153bcca
2016-09-09 18:46:40 +00:00
Yaowu Xu
1ff9579773 restore vp9 and vpx in libwebm
renaming should not have been applied to third_party.

Change-Id: I95be7ec4b7558298cd49ec4c5d1ed15a17ad222b
2016-09-09 18:46:13 +00:00
Yaowu Xu
70287defe1 Merge "simplify test code" into nextgenv2 2016-09-09 18:45:54 +00:00
Geza Lore
1a800f6539 Add SSE2 versions of av1_fht8x16 and av1_fht16x8
Encoder speedup ~2% with ext-tx + rect-tx

Change-Id: Id56ddf102a887de31d181bde6d8ef8c4f03da945
2016-09-09 11:29:41 -07:00
Debargha Mukherjee
d610d209c9 Merge "Fix some var_tx related rd_costing mismatches" into nextgenv2 2016-09-09 17:46:38 +00:00
Sarah Parker
e51ee021dc Merge "Swap order of affine parameters" into nextgenv2 2016-09-09 17:08:11 +00:00
Yaowu Xu
81fb4cf1ee simplify test code
Change-Id: Ib5491fb8f5dd7edf27c74abdd21b1f0a42aafd1f
2016-09-09 16:40:58 +00:00
Debargha Mukherjee
797cc30f23 Merge "Rd fixes and cleanups" into nextgenv2 2016-09-09 01:52:59 +00:00
James Zern
ea3621ab95 Merge "aom_mem,align_addr: use ~ to create mask" into nextgenv2 2016-09-09 01:41:31 +00:00
Yaowu Xu
fe24b956e9 aom_cx_set_ref: add example showing setting reference frame
Manually cherry-picked from AOM:
16944e59 aom_cx_set_ref: Example showing setting a reference frame
8f4c0ec8 examples.mk - Invalid comment fixed

Change-Id: Ifa87611561b089aebef2c132099baf265c845b10
2016-09-08 17:36:44 -07:00
Yaowu Xu
628d3c5839 variance_impl_avx2.c: align a table for better readability
Change-Id: I8cd99f9807dbfe6f70147615d2fd6775a7d98c16
2016-09-08 17:36:44 -07:00
James Zern
7b9407a81b s/INTERP_FILTER/InterpFilter/
this matches style guidelines and stabilizes successive runs of
clang-format across the tree. remaining types should be addressed in
successive commits.

Change-Id: I6ad3f69cf0a22cb9a9b895b272195f891f71170f
2016-09-09 00:32:31 +00:00
Debargha Mukherjee
096ae4cb68 Rd fixes and cleanups
A minor cleanup and an enhancement to return y_skip correctly
from sub8x8 intra mode search.

Change-Id: I87256d3cc5f57a2fd7b837d461cc1a7f06e01a1b
2016-09-08 15:48:05 -07:00
Peter de Rivaz
c0b4d7ae2c Fix some var_tx related rd_costing mismatches
This makes the code in select_tx_size_fix_type match the
corresponding code in pack_inter_mode_mvs.

Change-Id: I69bcc0dc6fdd733091fafe9188a3f7397e1e613f
2016-09-08 12:04:55 -07:00
James Zern
20b859833c aom_mem,align_addr: use ~ to create mask
removes the need for an intermediate cast to int, which was missing in
the call added in:
73a3fd4 aom_mem: Refactor code

quiets a visual studio warning:
C4146: unary minus operator applied to unsigned type, result still
unsigned

Change-Id: I76c4003416759c6c76b78f74de7c0d2ba5071216
2016-09-08 10:45:15 -07:00
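
The mask construction mentioned in 20b859833c can be illustrated as follows. For a power-of-two alignment, `~(align - 1)` and `-align` have the same two's-complement bit pattern, so the `~` form yields the same mask without negating an unsigned value. The function names here are hypothetical, and the 64-bit width is an assumption used only to emulate a fixed-width address.

```python
ADDR_BITS = 64
ADDR_MASK = (1 << ADDR_BITS) - 1


def mask_from_not(align):
    """Alignment mask built with bitwise NOT, truncated to ADDR_BITS."""
    return ~(align - 1) & ADDR_MASK


def mask_from_neg(align):
    """Alignment mask built with unary minus, truncated to ADDR_BITS."""
    return -align & ADDR_MASK


def align_up(addr, align):
    """Round `addr` up to the next multiple of `align` (a power of two)."""
    assert align > 0 and align & (align - 1) == 0, "align must be a power of two"
    return (addr + align - 1) & mask_from_not(align)
```

For example, `align_up(13, 8)` rounds up to 16, while an already aligned address such as 16 is returned unchanged.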
James Zern
9fa47587d9 fix 'dist' & other decode-only builds
common/av1_fwd_txfm.[hc] are encode-only; add a TODO to relocate them

Change-Id: I28cf8d0b22632b04066bcb72f3d2252ee7eb153e
2016-09-08 14:53:42 +00:00
James Zern
ba98061af3 av1_inv_txfm_test: fix decode-only build
fdct's are only enabled with --enable-av1-encoder

Change-Id: Iaf1dfdf713f2ecd1d215ba7ec635f353c02fa4d0
2016-09-07 16:33:35 -07:00
Debargha Mukherjee
d125b7a0cd Merge "Parameter adjustments to loop restoration" into nextgenv2 2016-09-07 21:26:34 +00:00
Debargha Mukherjee
035c5f34eb Parameter adjustments to loop restoration
Some minor adjustments to tile size and bilateral filters.

About 0.1% improvement for midres and hdres, very small change for
lowres.

Change-Id: Ia94f68a926867dfd67da1a8795fd8de0ddd8e2d6
2016-09-07 13:51:01 -07:00
Sarah Parker
c4bcb50635 Swap order of affine parameters
This allows for a clean subtraction of 1 along the transform
matrix diagonal and also makes the order of the parameter list
a little more intuitive.

Change-Id: I6a5d754af41b8d1292f241f9b21473160517d24f
2016-09-07 13:41:03 -07:00
Sarah Parker
3410a88373 Merge "Add parameter search to global motion computation" into nextgenv2 2016-09-07 20:39:37 +00:00
Sarah Parker
e3b8ff50f2 Fix hbd naming mistake in warped_motion.h
This changes a remaining VP9_HIGHBITDEPTH to AOM_HIGHBITDEPTH

Change-Id: I35efaf9528de660fb69104792a563dba5c41f329
2016-09-07 12:20:16 -07:00
Debargha Mukherjee
f579555423 Merge "Minor transform code cleanup" into nextgenv2 2016-09-07 16:59:42 +00:00
Debargha Mukherjee
ff4e315d13 Merge "Harmonize and fix coeff context computation" into nextgenv2 2016-09-07 16:54:21 +00:00
Pascal Massimino
4d5fda029c Merge "aom_mem.c: remove unnecessary inline" into nextgenv2 2016-09-07 09:34:37 +00:00
James Zern
cd24516347 aom_mem.c: remove unnecessary inline
these aren't overly speed critical, best to leave it to the compiler. as
a side-effect this fixes Visual Studio compilation (should have been
INLINE)

Change-Id: Ic81fb5ac76bc19c61efb2f1a965c0f79e9e45ebd
2016-09-06 23:36:59 -07:00
James Zern
5d986e5a30 odintrin.h: add missing extern "C"
fixes test linkage

Change-Id: I15a7b32551fddc5e78e3035e9d2e94a57ff9f1d2
2016-09-06 23:31:26 -07:00
Sarah Parker
cda2345787 Merge "Adjust types in hbd error computation to avoid overflow" into nextgenv2 2016-09-07 03:28:07 +00:00
Sarah Parker
ca92da752b Adjust types in hbd error computation to avoid overflow
Change-Id: I8e08ebc8cbb2d1a1f97c8ef0c9237d8dfe0df208
2016-09-06 19:43:01 -07:00
Sarah Parker
ecb0afc838 Add parameter search to global motion computation
Change-Id: I66ea5a819ab54ecb5327eee20f798d7d7f0833d3
2016-09-06 19:33:52 -07:00
Sarah Parker
1d22837fbd Merge "Fix formatting in internal stats for vp10" into nextgenv2 2016-09-07 00:53:14 +00:00
Yaowu Xu
a668cce3b5 Merge "Use AOMedia's Patents and LICENSE files" into nextgenv2 2016-09-07 00:34:34 +00:00
Yaowu Xu
cf92ae9f16 Merge "README.libvpx -> README.libaom" into nextgenv2 2016-09-07 00:34:25 +00:00
Yaowu Xu
0dc4cbb059 sad_avx2.c: add hints for clang-foramt
Change-Id: I721c52e69395a99b3a0395dc229de1cbb32670e9
2016-09-07 00:29:13 +00:00
Yaowu Xu
151864fcff Use AOMedia's Patents and LICENSE files
Change-Id: Icb53448442a8f341af3799d873e2fd6f3db5fbe2
2016-09-06 16:09:03 -07:00
Yaowu Xu
848668bee7 README.libvpx -> README.libaom
Change-Id: Ie7dd4aeee084ef9520f68663aa566ea32350e227
2016-09-06 14:35:52 -07:00
Urvang Joshi
497f27ed9d aom_realloc correction.
aom_realloc was allocating 1 byte more than needed every time.
Fixed this, and took this opportunity to do a small refactoring.

Change-Id: I38fcb62b698894acbbab43466c1decd12f906789
2016-09-06 21:27:20 +00:00
Urvang Joshi
73a3fd4710 aom_mem: Refactor code
Change-Id: I2da9cd5da48ae97e770bccfd1233bcc70b484688
2016-09-06 21:27:03 +00:00
Yaowu Xu
e14a42a453 Merge "Move CHECK_MEM_ERROR implementation to aom/internal." into nextgenv2 2016-09-06 21:26:52 +00:00
Alex Converse
f5550733e8 Move CHECK_MEM_ERROR implementation to aom/internal.
Allow using it in aom_dsp.

Change-Id: Ide7d58b6d11f8a45d473fc13bf730ba5bccb5516
2016-09-06 21:23:36 +00:00
Debargha Mukherjee
2963054ef6 Harmonize and fix coeff context computation
Change-Id: I75740e221deb3872647bd480ae506ba68800e8c7
2016-09-06 13:23:31 -07:00
Yue Chen
a1e48dccf2 Make RECT_TX(>=8x8) work with VAR_TX
Bitstream syntax:
For a rectangular inter block, 'rect_tx' flag is sent to indicate if
the biggest rect tx is used. If no, continue to decode regular
recursive tx partition.

Change-Id: I127e35cc619b65acb5e9a0717f399cdcdb73fbf0
2016-09-06 11:26:15 -07:00
Sarah Parker
5ebdf40d77 Merge "Add global motion experiment to rdopt" into nextgenv2 2016-09-06 18:07:31 +00:00
Debargha Mukherjee
40da2c899a Merge "Enable rectangular transforms for UV" into nextgenv2 2016-09-06 15:46:21 +00:00
Yaowu Xu
f87b9021f1 Fix a compiler warning of unused variable
Change-Id: I4a2faa32cc0847fe14dd8f40156163f4713055ca
2016-09-06 14:52:49 +00:00
Yaowu Xu
037845507d Avoid re-use same temp variables
In highbd_quantize_intrin_sse2.c.

Change-Id: Iaf6360e456f1fb2f8ff06461afbfecfc0103dda3
2016-09-06 14:52:19 +00:00
Yaowu Xu
34b0ee61b2 quantize.c: int->uint32_t for absolute values
Change-Id: I784f32e0e86d873655e46cf68c5c124a698af361
2016-09-06 14:51:47 +00:00
Yaowu Xu
1f9356a536 aom_dsp: AV1_IADST8x16_1D to AOM_IADST8x16_1D
Change-Id: Iba415ab2d4adb3350b4747a58f69db7d02bbab68
2016-09-06 14:51:32 +00:00
Debargha Mukherjee
2f12340ff0 Enable rectangular transforms for UV
Uses an array to map block sizes, y tx sizes, and subsampling
factors to various transform sizes for UV.

Results improve by 0.1-0.2%

Change-Id: Icb58fd96bc7c01a72cbf1332fe2be4d55a0feedc
2016-09-05 15:06:19 -07:00
Sarah Parker
f97b7860d5 Fix formatting in internal stats for vp10
This corrects a formatting error introduced in:
I1e9d548ce445d29002f0c59ebfd3957a6f15e702
where spaces were used as delimiters instead of tabs.

The corresponding fixes for vp9 and vp8 are in
Ibc4eb8fd82e6b926ba259a679dc98557cadba9b1.

Change-Id: Ica3d625d6672b3c47e0e208b45eede29b9004030
2016-09-03 12:02:01 -07:00
Yaowu Xu
01bade1064 Removed tests and data not in use
Change-Id: If688da3089ad33f18751fa2f8c46b6f5dc708bd2
2016-09-03 00:06:09 +00:00
Urvang Joshi
51dcf564b8 Merge "test_intra_pred_speed.cc : Fix visual studio build." into nextgenv2 2016-09-02 23:12:34 +00:00
Yaowu Xu
ecee7f29d0 Merge "Change to use AOM copyright notice" into nextgenv2 2016-09-02 22:13:24 +00:00
Urvang Joshi
31744ec4f2 test_intra_pred_speed.cc : Fix visual studio build.
Visual studio doesn't like nested macros, apparently. This patch should
fix it.

Change-Id: Ifa56fae5be0b3dfd3fecd88a8a443e39135f96ab
2016-09-02 15:11:59 -07:00
Yaowu Xu
2ab7ff05f1 Change to use AOM copyright notice
Change-Id: I2b2b70e756b7eb9611b7b33b7d5f19b3b30e0a50
2016-09-02 19:52:03 +00:00
Yunqing Wang
99c6637dfa Merge "Remove unused buffer allocation functions" into nextgenv2 2016-09-02 17:52:48 +00:00
Yaowu Xu
0efe92f177 Merge "aomcx_set_ref -> aom_cx_set_ref" into nextgenv2 2016-09-02 17:41:44 +00:00
Yaowu Xu
0764955002 Merge "Change to use aom copyright notice" into nextgenv2 2016-09-02 17:41:21 +00:00
Angie Chiang
0bfe491967 Merge "Add frame info in bitstream debug tool" into nextgenv2 2016-09-02 17:05:48 +00:00
Yunqing Wang
8aa228019c Remove unused buffer allocation functions
Removed unused buffer allocation functions.

Change-Id: Ib779dde9ad6a511d88b7f7cba2604902eff7aa05
2016-09-02 09:23:53 -07:00
Yaowu Xu
890c4f2497 aomcx_set_ref -> aom_cx_set_ref
Change-Id: I60dd645451d6d65465f099a16ac855fb0b5a57a9
2016-09-02 08:54:47 -07:00
Yaowu Xu
9c01aa1b0c Change to use aom copyright notice
This minimizes code differences between AOM master and nextgenv2

Change-Id: If144865bdf3ef0818e7aac11018b9e786444c550
2016-09-02 08:22:07 -07:00
Geza Lore
a1ddae59eb Minor transform code cleanup
- Localize static lookup tables in the sole functions that use them.
- Remove dead high bit-depth IDST functions.
- Apply clang-format

Change-Id: Ibbd7db4259f9ea64d695b2f13f5c118aac8f1cf9
2016-09-02 09:58:09 +01:00
Debargha Mukherjee
a782a3b68f Merge "Some cleanups for unnecessary macros" into nextgenv2 2016-09-02 08:37:36 +00:00
Sarah Parker
e529986568 Add global motion experiment to rdopt
This patch completes the global motion experiment
implementation. It modifies the format of the motion
parameters to use the mv union to facilitate faster
copying and checks for parameters equal to 0 that occur
frequently in rdopt. The rd decisions for the global motion experiment
have also been added to rdopt.
Change-Id: Idfb9f0c6d23e538221763881099c5a2a3891f5a9
2016-09-01 19:51:11 -07:00
Yaowu Xu
9c323bc272 Port two daala_dering changes from AOMedia
03394bd Remove dead code from av1_dering_search.
337b23a Changing the weights of the first CRF filter in deringing

Change-Id: I1216c146dc3f72f24ceec3d3c65c4dd6cd73623e
2016-09-02 00:39:52 +00:00
Yaowu Xu
3b95d59a1b rename two mk files to make naming consistent
av1cx.mk -> av1_cx.mk
av1dx.mk -> av1_dx.mk

Change-Id: I698bd65b933c433066d5dfeb94cee680095508e4
2016-09-02 00:39:32 +00:00
Yaowu Xu
14292bbb10 Merge "Add explict conversion from int64_t to int" into nextgenv2 2016-09-02 00:39:19 +00:00
Angie Chiang
cb9a9ebd81 Add frame info in bitstream debug tool
Change-Id: Iead3edd8563d7900481eb199e8b003d2d3df075b
2016-09-01 16:24:49 -07:00
Yaowu Xu
9702fcbb16 Add explict conversion from int64_t to int
The values after right shifts should fit into 32bit int. The commit
fixes MSVC build warning when new-quant is enabled.

Change-Id: Ic89dd86fb981a1206653943658af2b6b2925a676
2016-09-01 22:33:56 +00:00
Yaowu Xu
c8b2fd8022 Merge ".gitignore: corrent entries from vpx to aom" into nextgenv2 2016-09-01 22:33:19 +00:00
Urvang Joshi
0d515b29b1 Merge "Add ALT_INTRA experiment." into nextgenv2 2016-09-01 21:45:32 +00:00
Yaowu Xu
75c9abd28f .gitignore: corrent entries from vpx to aom
Change-Id: I8af6a9723c31c0f868e9bd75dcc079413a3700c4
2016-09-01 13:57:13 -07:00
Urvang Joshi
340593e530 Add ALT_INTRA experiment.
When the experiment is ON, we use Paeth predictor instead of TM
predictor.

For derf set, this gives about 0.09% improvement overall, and 0.55%
improvement if all frames are forced to be intra-only.

Also, if the EXT_INTRA experiment is also on, the improvement overall
is 0.056%, and improvement if all frames are forced to be intra-only is
0.465%.

Change-Id: Id74e107ede70a8d2107fa14fcb3f44b23a437274
2016-09-01 12:03:20 -07:00
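The ALT_INTRA commit above swaps the TM predictor for a Paeth predictor. As a rough illustration of the difference, here is a minimal sketch of both per-pixel rules for 8-bit samples. The function names and the tie-breaking order (left, then above, then above-left) are assumptions taken from the classic Paeth filter, not from the patch itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Paeth rule: of left, above and above-left, return the neighbour that is
 * closest to base = left + above - above_left.
 * Ties prefer left, then above (assumed order). */
static uint8_t paeth_pixel(uint8_t left, uint8_t above, uint8_t above_left) {
  const int base = left + above - above_left;
  const int d_left = abs(base - left);
  const int d_above = abs(base - above);
  const int d_above_left = abs(base - above_left);
  if (d_left <= d_above && d_left <= d_above_left) return left;
  if (d_above <= d_above_left) return above;
  return above_left;
}

/* TM rule for comparison: the same base value, clamped to the 8-bit range. */
static uint8_t tm_pixel(uint8_t left, uint8_t above, uint8_t above_left) {
  const int base = left + above - above_left;
  return (uint8_t)(base < 0 ? 0 : base > 255 ? 255 : base);
}

/* Fill a bw x bh block from the row above, the column to the left and the
 * above-left corner sample. */
static void paeth_predict(uint8_t *dst, int stride, int bw, int bh,
                          const uint8_t *above, const uint8_t *left,
                          uint8_t above_left) {
  assert(bw > 0 && bh > 0 && stride >= bw);
  for (int r = 0; r < bh; ++r) {
    for (int c = 0; c < bw; ++c) {
      dst[r * stride + c] = paeth_pixel(left[r], above[c], above_left);
    }
  }
}
```

The Paeth output is always one of the three neighbouring samples, whereas TM can produce a new value that then has to be clamped.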
Yaowu Xu
f7ae12d7fd add an explicit conversion from size_t to int
Function ans_read_int() takes an int parameter; this commit uses an
explicit conversion to avoid an MSVC build warning.

Change-Id: Ia405e1d5a86c0f42932fa1da29417ccbf2dd58e7
2016-09-01 08:59:46 -07:00
Yaowu Xu
958303c4c6 Replace inline with INLINE
This fixes msvc build errors.

Change-Id: I1344685e891db61ba569d818e0f2167b2978c299
2016-09-01 08:45:22 -07:00
Debargha Mukherjee
3b52b3ac27 Some cleanups for unnecessary macros
Remove some macros that are no longer necessary for experimentation.

Change-Id: I959bf441c8333607df4aa1ee18841f189ade8112
2016-09-01 00:30:32 -07:00
Yaowu Xu
f883b42cab Port renaming changes from AOMedia
Cherry-Picked the following commits:
0defd8f Changed "WebM" to "AOMedia" & "webm" to "aomedia"
54e6676 Replace "VPx" by "AVx"
5082a36 Change "Vpx" to "Avx"
7df44f1 Replace "Vp9" w/ "Av1"
967f722 Remove kVp9CodecId
828f30c Change "Vp8" to "AOM"
030b5ff AUTHORS regenerated
2524cae Add ref-mv experimental flag
016762b Change copyright notice to AOMedia form
81e5526 Replace vp9 w/ av1
9b94565 Add missing files
fa8ca9f Change "vp9" to "av1"
ec838b7 Convert "vp8" to "aom"
80edfa0 Change "VP9" to "AV1"
d1a11fb Change "vp8" to "aom"
7b58251 Point to WebM test data
dd1a5c8 Replace "VP8" with "AOM"
ff00fc0 Change "VPX" to "AOM"
01dee0b Change "vp10" to "av1" in source code
cebe6f0 Convert "vpx" to "aom"
17b0567 rename vp10*.mk to av1_*.mk
fe5f8a8 rename files vp10_* to av1_*

Change-Id: I6fc3d18eb11fc171e46140c836ad5339cf6c9419
2016-08-31 18:19:03 -07:00
Yaowu Xu
c27fc14b02 Port folder renaming changes from AOM
Manually cherry-picked commits:
ceef058 libvpx->libaom part2
3d26d91 libvpx -> libaom
cfea7dd vp10/ -> av1/
3a8eff7 Fix a build issue for a test
bf4202e Rename vpx to aom

Change-Id: I1b0eb5a40796e3aaf41c58984b4229a439a597dc
2016-08-31 17:26:24 -07:00
Yunqing Wang
b1fb998c46 Merge "Change buffer_alloc_sz and frame_size type to size_t" into nextgenv2 2016-08-31 23:56:02 +00:00
Yunqing Wang
a722a114d6 Change buffer_alloc_sz and frame_size type to size_t
1. Changed buffer_alloc_sz and frame_size type to size_t.
2. Added a TODO for video resolution limits. On 32-bit systems, the maximum
resolution supported in the encoder is 4k (3840x2160). The malloc() would
fail if encoding >4k video on a 32-bit system.

Change-Id: Ibd91b28fd63d1b04e8ac9a5270a17629f239188a
2016-08-31 14:56:21 -07:00
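The size_t change above concerns buffer sizes that can exceed what a narrower type holds. A minimal sketch of an overflow-checked size computation follows; frame_bytes and its parameters are hypothetical and are not the encoder's actual allocation code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute width * height * bytes_per_pixel in size_t.
 * Returns 1 and stores the result in *out on success, or 0 if any
 * dimension is zero or the product would not fit in size_t. */
static int frame_bytes(size_t width, size_t height, size_t bytes_per_pixel,
                       size_t *out) {
  assert(out != NULL);
  if (width == 0 || height == 0 || bytes_per_pixel == 0) return 0;
  if (width > SIZE_MAX / height) return 0; /* width * height overflows */
  const size_t pixels = width * height;
  if (pixels > SIZE_MAX / bytes_per_pixel) return 0;
  *out = pixels * bytes_per_pixel;
  return 1;
}
```

Checking before multiplying lets a caller reject an oversized frame up front instead of passing a wrapped-around size to malloc().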
Yunqing Wang
0de5d5d221 Merge "Fix Visual Studio build warnings" into nextgenv2 2016-08-31 18:50:07 +00:00
Yunqing Wang
53db7d0caa Merge "Remove unused buffer allocation functions" into nextgenv2 2016-08-31 18:49:29 +00:00
Zoe Liu
03a11f6ceb Fix a bug in calculating the compound ref frame cost
The previous ext-refs experiment did not include the cost of the 2nd
reference frame in the compound-mode decision. Measured in Overall PSNR
against the baseline, the ext-refs RD performance before and after the
fix is:

"ext-refs" before fix:  lowres: -5.665%  midres: -4.833%
"ext-refs" after fix:   lowres: -5.776%  midres: -5.000%
Improvement by the fix: lowres: -0.111%  midres: -0.167%

Change-Id: I2eceedf2d4046b169514e049fd01baaf0bbb50c6
2016-08-31 09:43:31 -07:00
Zoe Liu
2033078a18 Merge "Fix a bug in deciding ref frame context in ext-refs" into nextgenv2 2016-08-31 16:42:43 +00:00
Wei-Ting
966e609e95 Make an unmeaningful change to be added into the list
Change-Id: I578589a955bd3f3d7ff61723b574361661453f67
2016-08-30 15:42:32 -07:00
Zoe Liu
27af52300e Fix a bug in deciding ref frame context in ext-refs
Change-Id: Ie58b98baa870c5d2a5b7193f8fe4f84fd7ec6c16
2016-08-30 10:20:04 -07:00
Yunqing Wang
ed07056f1a Fix Visual Studio build warnings
Fixed a list of VS warnings. Warning message:
..\test\vp10_convolve_test.cc(34): warning C4244: 'initializing' : conversion
from 'ptrdiff_t' to 'int', possible loss of data

Change-Id: I9a1d3978a79fbb7b1ac028c5713ac72b6ff99172
2016-08-30 09:40:24 -07:00
Debargha Mukherjee
df73dd0dc3 Merge "clpf experiment build fix" into nextgenv2 2016-08-30 05:45:33 +00:00
Sarah Parker
4dc0f1b186 Implement global motion parameter computation
This computes global motion parameters between 2 frames by
matching corresponding points using FAST features and then
fitting a model using RANSAC.

Change-Id: Ib6664df44090e8cfa4db9f2f9e0556931ccfe5c8
2016-08-29 16:59:43 -07:00
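The global motion commit above fits a model to matched feature points with RANSAC. The sketch below shows that fitting step for the simplest case, a pure translation, and assumes the point matches are already available (FAST detection and matching are not shown). All names, the fixed-seed generator and the threshold are illustrative assumptions; the actual patch may handle richer motion models and may refit on the inlier set.

```c
#include <assert.h>
#include <stdint.h>

/* One correspondence: (x, y) in the source frame maps to (xp, yp). */
typedef struct {
  double x, y, xp, yp;
} Match;

/* Small deterministic LCG so that runs are repeatable. */
static uint32_t lcg_next(uint32_t *state) {
  *state = *state * 1664525u + 1013904223u;
  return *state >> 16;
}

/* Count matches whose residual under translation (tx, ty) is within thresh. */
static int count_inliers(const Match *m, int n, double tx, double ty,
                         double thresh) {
  int count = 0;
  for (int i = 0; i < n; ++i) {
    const double ex = m[i].x + tx - m[i].xp;
    const double ey = m[i].y + ty - m[i].yp;
    if (ex * ex + ey * ey <= thresh * thresh) ++count;
  }
  return count;
}

/* RANSAC for a translation model. A single match is a minimal sample, so
 * each iteration draws one match, derives (tx, ty) from it and keeps the
 * hypothesis with the most inliers. Returns the best inlier count. */
static int ransac_translation(const Match *m, int n, int iters, double thresh,
                              double *tx, double *ty) {
  assert(n > 0 && iters > 0);
  uint32_t rng = 1;
  int best = 0;
  for (int it = 0; it < iters; ++it) {
    const Match *s = &m[lcg_next(&rng) % (uint32_t)n];
    const double cand_tx = s->xp - s->x;
    const double cand_ty = s->yp - s->y;
    const int inliers = count_inliers(m, n, cand_tx, cand_ty, thresh);
    if (inliers > best) {
      best = inliers;
      *tx = cand_tx;
      *ty = cand_ty;
    }
  }
  return best;
}
```

Because bad matches rarely agree with each other, a hypothesis drawn from an outlier gathers few inliers and is discarded, which is what makes the estimate robust to mismatched features.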
Yunqing Wang
e9947a8d2d Remove unused buffer allocation functions
Removed unused buffer allocation functions.

Change-Id: I5aa265a7698a2d2df736ddb89c6d93a5ee34895b
2016-08-29 15:02:09 -07:00
Debargha Mukherjee
162f5f792b Merge "Tiling in loop restoration + cosmetics" into nextgenv2 2016-08-29 20:46:13 +00:00
Debargha Mukherjee
100846a8ac clpf experiment build fix
Change-Id: I729e14916ecb58b5a75756078ab96a2d340bc0d6
2016-08-29 12:28:00 -07:00
Aamir Anis
e40e6e576a Tiling in loop restoration + cosmetics
Frame can be split into rectangular tiles for application of separate
bilateral or Wiener filters per tile. Some variable names changed for
better readability.

Change-Id: I13ebc4d0b0baf368e524db5ce276f03ed76af9c8
2016-08-29 11:24:11 -07:00
Debargha Mukherjee
8ee5ab9f13 Fix for supertx with rect-tx
Change-Id: I0cc3523a8992f889f8dd203449ceb55f2a422324
2016-08-29 11:16:17 -07:00
Pascal Massimino
04ed7ad57b fix uint32_t <-> size_t mismatch in tests
Change-Id: Ifde4d57957219560e01ebd1657f1c0721f041054
2016-08-29 09:09:09 +02:00
James Zern
f7a865859b Merge "vp10_alloc_context_buffers: clear cm->mi* on failure" into nextgenv2 2016-08-27 16:43:53 +00:00
Jingning Han
003dff6962 Merge "Fix VS build warnings in blend_a64_mask_test.cc" into nextgenv2 2016-08-26 21:47:39 +00:00
Hui Su
976a9b9304 Merge "Remove unnecessary buffer pointers in PICK_MODE_CONTEXT" into nextgenv2 2016-08-26 21:38:14 +00:00
Jingning Han
91ae5d954a Fix VS build warnings in blend_a64_mask_test.cc
Change-Id: Id4c764198549a60d98e5c4a74083972b97da5b81
2016-08-26 11:25:08 -07:00
Debargha Mukherjee
8b7e4dbaf4 Fix compile error in dering
Change-Id: I56890c813de1b366e4ef482d9fc6da81034636ab
2016-08-26 10:48:16 -07:00
Geza Lore
d21982c80f Use rectangular transforms for >= 8x8 blocks
For rectangular blocks between 8x8 and 32x32, we can now code the
transform size as one bigger than the largest square that fits in
the block (e.g., for 16x8, we can code a transform size of 16x16
rather than the previous maximum of 8x8). When this oversized
transform is coded in the bitstream, the codec will use the full
size rectangular transform for that block (e.g., the 16x8 transform
in the above example).

Also fixes a scaling bug in 16x8/8x16 transforms.

Change-Id: I62ce75f1b01c46fe2fbc727ce4abef695f4fcd43
2016-08-25 17:31:51 -07:00
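The signalling rule described in the rectangular transform commit above can be sketched as a small mapping from the coded square size to the transform actually applied. TxDims and effective_tx are hypothetical names, and sizes are plain pixel counts rather than the codec's TX_SIZE enum.

```c
#include <assert.h>

typedef struct {
  int w, h;
} TxDims;

/* coded_size is the side length of the square transform size read from the
 * bitstream. For a 2:1 rectangular block, a coded size one step larger than
 * the largest square that fits is interpreted as "use the full-block
 * rectangular transform". */
static TxDims effective_tx(int block_w, int block_h, int coded_size) {
  assert(block_w > 0 && block_h > 0 && coded_size > 0);
  const int min_side = block_w < block_h ? block_w : block_h;
  TxDims t = { coded_size, coded_size };
  if (block_w != block_h && coded_size == 2 * min_side) {
    t.w = block_w;
    t.h = block_h;
  }
  return t;
}
```

For a 16x8 block, a coded size of 16 yields a 16x8 transform while a coded size of 8 still yields 8x8, so no extra syntax element is needed in this scheme.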
Sarah Parker
a97fd6c43e Merge "Update VP9_PROB_COST_SHIFT to VP10_PROB_COST_SHIFT" into nextgenv2 2016-08-25 18:31:22 +00:00
Wei-ting Lin
4c7e1cd973 Separate EXT_ARFs' frame context index
This commit separates the frame context index of EXT_ARFs from other
frame types in the ext-refs setting.

It improves the average RD performance by

0.206% in the lowres, and
0.173% in the midres.

The overall gains for the ext-refs compared to the baseline are

5.665% in the lowres, and
4.883% in the midres.

Change-Id: I6591ad29120880c1aef0bd0b7cf15238c3f3b8f3
2016-08-25 09:31:00 -07:00
Yunqing Wang
167a4efbb5 Merge "Fix motion vector out of range bugs" into nextgenv2 2016-08-25 15:29:20 +00:00
Sarah Parker
6632915485 Update VP9_PROB_COST_SHIFT to VP10_PROB_COST_SHIFT
Change-Id: Ie1416569e73e66518cdb2765d79a2fb3dd570489
2016-08-24 17:25:00 -07:00
Yunqing Wang
90e12eaecb Fix motion vector out of range bugs
2 bugs were fixed in VP9.
https://chromium-review.googlesource.com/#/c/366873/
https://chromium-review.googlesource.com/#/c/368440/
Fixed them in VP10 as well.

Change-Id: I2e53fabc6131ff80ba6dcfd4c73eb76c59b4c474
2016-08-24 17:11:14 -07:00
Urvang Joshi
c691864423 Merge "gitignore: add some entries" into nextgenv2 2016-08-24 22:12:03 +00:00
Urvang Joshi
4b3f980828 Merge "test_intra_pred_speed fix: use dspr2 version when HAVE_DSPR2" into nextgenv2 2016-08-24 22:10:01 +00:00
hui su
71c625d758 Remove unnecessary buffer pointers in PICK_MODE_CONTEXT
Change-Id: I600af6a66dc0e1310c8bfc7c16efa8a82e90856d
2016-08-24 14:18:56 -07:00
Yue Chen
35d4524b5b Merge "Make rectangular txfm in EXT_TX work with VAR_TX" into nextgenv2 2016-08-23 23:54:40 +00:00
Urvang Joshi
f1906e966a Palette code: remove the use of same if condition twice.
rd_pick_palette_intra_sby() is called only when
cpi->common.allow_screen_content_tools is on, so there is no need to check
that again. We use an assert() instead to still be safe.

Change-Id: I19785c2aac016798c8d331bbe91971b3806b73a8
2016-08-23 15:01:41 -07:00
Urvang Joshi
7e5aa9e7a5 Merge "Rename CONFIG_VPX_HIGHBITDEPTH -> CONFIG_VP9_HIGHBITDEPTH" into nextgenv2 2016-08-23 19:33:53 +00:00
Yue Chen
e57b1a5ea5 Make rectangular txfm in EXT_TX work with VAR_TX
Adapt rectangular txfm experiment to syntax/tokenization/loopfilter
framework of VAR_TX

Change-Id: Idcb005ecf5b3712de3e1cccb0d811ca16d87af24
2016-08-23 12:11:23 -07:00
Urvang Joshi
e4e63b63c0 Rename CONFIG_VPX_HIGHBITDEPTH -> CONFIG_VP9_HIGHBITDEPTH
"vpx-highbitdepth" config doesn't exist.

Change-Id: Ib6d3691454299bb381ecc75b80657fbebf9f59b2
2016-08-23 12:04:18 -07:00
Urvang Joshi
3bcf3f07ac test_intra_pred_speed fix: use dspr2 version when HAVE_DSPR2
Change-Id: Ie7c78e19e077516615c71669022f505f8b3c80ca
2016-08-23 11:29:44 -07:00
Urvang Joshi
3ea2c234fa gitignore: add some entries
Change-Id: I65507c3d132b2b3ba90cf0a7b1c729da7d3de15f
2016-08-23 11:19:17 -07:00
Wei-ting Lin
7fed5044ca Merge "Allow LF_UPDATE type of frames to use BWDREF" into nextgenv2 2016-08-23 18:06:56 +00:00
Debargha Mukherjee
49b85d3965 Missing fixes for rect-tx
Reintroducing some fixes that were dropped inadvertently in
the course of rebasing.

Change-Id: I5f51160c586010590d4bfd5cf225fb21347b0a40
2016-08-23 07:12:51 -07:00
Yaowu Xu
9a89ec5447 Merge "Make type conversion explicit" into nextgenv2 2016-08-23 01:28:05 +00:00
Debargha Mukherjee
ccbefec3d8 Merge "Various rect-tx fixes" into nextgenv2 2016-08-23 01:00:39 +00:00
Yaowu Xu
04fe3499a4 Make type conversion explicit
This fixes two MSVC compiler warnings.

Change-Id: I55ad8833676e20c2c4a55885b99a7a9293d9623f
2016-08-23 00:01:00 +00:00
Yaowu Xu
88849e1395 Merge "Apply clang-format" into nextgenv2 2016-08-23 00:00:48 +00:00
Wei-ting Lin
4e8acca925 Allow LF_UPDATE type of frames to use BWDREF
Originally, only bi-pred type of frames can use BWDREF. When
extra alt-refs are inserted in a gf group, the closest alt-ref
serves as ALTREF for the frames within the corresponding
subgroup. Therefore, the original alt-ref can be used as BWDREF
for the LF_UPDATE type of frames.

This patch further swaps the virtual indices of BWDREF and ALTREF
for those frames whose BWDREF is farther than ALTREF. As a result,
the BWDREF is always the closest backward reference frame, and the
ALTREF is the farther one.

It improves the average RD performance by

0.132% in lowres, and
0.030% in midres.

The overall gains for the ext-refs compared to the baseline are

5.486% in lowres, and
4.666% in midres.

Change-Id: I22e4e5f378f19c4c89196a0a5e9214adb46c3428
2016-08-22 17:00:41 -07:00
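The BWDREF/ALTREF swap described in the commit above amounts to ordering two backward references by their distance from the current frame. A minimal sketch, using display-order frame numbers instead of the encoder's virtual buffer indices (the struct and function names are hypothetical):

```c
#include <assert.h>

typedef struct {
  int bwdref; /* display-order number of the frame used as BWDREF */
  int altref; /* display-order number of the frame used as ALTREF */
} BackwardRefs;

/* Both candidates must lie after the current frame in display order.
 * The nearer one becomes BWDREF, the farther one ALTREF. */
static BackwardRefs order_backward_refs(int cur, int ref_a, int ref_b) {
  assert(ref_a > cur && ref_b > cur);
  BackwardRefs r;
  if (ref_a <= ref_b) {
    r.bwdref = ref_a;
    r.altref = ref_b;
  } else {
    r.bwdref = ref_b;
    r.altref = ref_a;
  }
  return r;
}
```

For example, with the current frame at 5 and backward references at 16 and 8, frame 8 becomes BWDREF and frame 16 becomes ALTREF regardless of the order in which they are passed.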
Yaowu Xu
c3cc46d8c2 Apply clang-format
Change-Id: Ie283af5f30324f54b4f749becdb48f937584707d
2016-08-22 16:22:10 -07:00
Debargha Mukherjee
44026851c3 Various rect-tx fixes
Change-Id: I02f44713b99284092ecfc50ce7ab268e91d2c6f8
2016-08-22 14:18:40 -07:00
Sarah Parker
3464aff41f Add integerize function back in warped_motion.c
This function was previously unused and removed in
I6bc740e778658d6f81ca54888fc6fa822d3b5ee0. I am adding it back in
with previously suggested fixes.

Change-Id: Iee0afb39170d25895b11d07e71843eae6913efd1
2016-08-22 12:29:26 -07:00
Urvang Joshi
3c7aa7ce2d Merge "Palette: count Y colors only for screen content." into nextgenv2 2016-08-19 23:18:39 +00:00
Urvang Joshi
28ca8554c5 Merge "Handle centroid rounding inside palette.c itself." into nextgenv2 2016-08-19 22:22:52 +00:00
Wei-ting Lin
7417932401 Merge "Insert extra ARFs' in a gf group" into nextgenv2 2016-08-19 22:10:41 +00:00
James Zern
7b2537b5e9 Merge "Fix compiler warnings in rdopt when warped motion is enabled" into nextgenv2 2016-08-19 21:39:59 +00:00
Urvang Joshi
d68c7b6d6d Palette: count Y colors only for screen content.
Change-Id: Id4e12708598100df54bdfcf8cdb248161ab6ef88
2016-08-19 13:02:02 -07:00
Urvang Joshi
f746c103a7 Handle centroid rounding inside palette.c itself.
Mostly refactoring, but a very tiny functional change:
Do all rounding in calc_centroids() itself, instead of rounding in two
places inside palette.c

This gives a slight performance improvement for screen content:
0.078% on average.

Change-Id: I7a0e007d30ebf4e59839483a167123f31a222dd4
2016-08-19 12:23:41 -07:00
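The palette commit above moves all rounding into calc_centroids(). The sketch below shows one way a k-means centroid update can round exactly once, for one-dimensional non-negative integer samples. The signature, the integer types and the handling of empty clusters are assumptions, not the actual palette.c code.

```c
#include <assert.h>

/* data[i] is a sample and idx[i] in [0, k) is its cluster assignment.
 * Each non-empty cluster's centroid becomes the mean of its samples,
 * rounded to nearest in a single step. Empty clusters keep their previous
 * centroid (an assumption made for this sketch). */
static void calc_centroids_rounded(const int *data, const int *idx, int n,
                                   int k, int *centroids) {
  assert(n >= 0 && k > 0);
  for (int j = 0; j < k; ++j) {
    long sum = 0;
    int count = 0;
    for (int i = 0; i < n; ++i) {
      if (idx[i] == j) {
        sum += data[i];
        ++count;
      }
    }
    /* Adding count / 2 before dividing rounds to nearest for sums >= 0. */
    if (count > 0) centroids[j] = (int)((sum + count / 2) / count);
  }
}
```

Rounding in one place keeps callers from applying a second, slightly different rounding to the same value.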
Sarah Parker
984f073b8a Fix compiler warnings in rdopt when warped motion is enabled
The previous code was giving:
 unused variable ‘tmp_rate’ [-Wunused-variable]
 unused variable ‘tmp_dist’ [-Wunused-variable]
 ‘rate2_nocoeff’ may be used uninitialized in this function [-Wmaybe-uninitialized]

Change-Id: I26326d0e5ffc141ad548654356a877cd3627cea6
2016-08-19 11:10:44 -07:00
James Zern
e0ab852f0b vp10_alloc_context_buffers: clear cm->mi* on failure
based on:
8b4c315 vp9_alloc_context_buffers: clear cm->mi* on failure

Change-Id: I3438a052721b960ff178cb647780f11bc33571fe
2016-08-19 10:39:46 -07:00
Alex Converse
32c92c97ea Merge "Don't send segment probability updates when the map isn't updated." into nextgenv2 2016-08-19 16:52:11 +00:00
James Zern
b360168783 Merge "apply clang-format" into nextgenv2 2016-08-19 07:31:50 +00:00
Wei-ting Lin
41d5d52d78 Insert extra ARFs' in a gf group
Insert multiple ARFs in a gf group to emulate a multi-layer backward
reference frame structure. At most two extra ARFs are inserted
in a gf group.

It improves the RD performance by 0.317% in Avg in lowres dataset.

Change-Id: I62c32e1b0f25b978484dd113b319bebcd959bf60
2016-08-18 18:21:13 -07:00
Sarah Parker
daa4ba8d19 Disable global motion experiment when incompatible experiments are enabled
This is temporary until the global motion experiment is made to work
with ext_inter and dual_filter.

Change-Id: I73624ca6f536fd98218d7e07bcd7a2c1e6f5aebd
2016-08-18 16:00:38 -07:00
clang-format
21a0c2c9d7 apply clang-format
after:
253c001 Port dering experiment from aom
7208145 Adding 8x16/16x8/32x16/16x32 transforms

Change-Id: Id93e0d7b72a128701d8dec35fc2fac473944d0c1
2016-08-18 15:10:22 -07:00
Alex Converse
fd96aec9c6 Don't send segment probability updates when the map isn't updated.
BUG=webm:1275

Change-Id: I7d4bbaaf2f2146b023e1902fbc535a70e490cf2d
2016-08-18 18:02:01 +00:00
James Zern
0996fc6be3 Merge "fix mips msa build w/CONFIG_EXT_TX" into nextgenv2 2016-08-18 01:44:39 +00:00
Wei-ting Lin
c0235c2c21 Merge "Change the B-frame coding structure." into nextgenv2 2016-08-17 21:15:15 +00:00
Sarah Parker
d4553f5b4d Merge "Switch order of gm parameters for affine model" into nextgenv2 2016-08-17 20:30:52 +00:00
Yi Luo
bfeb90f92a Merge "Delete DCT 64x64 functions to save code size" into nextgenv2 2016-08-17 16:31:28 +00:00
Angie Chiang
688a2ed1f5 Remove __func__
Change-Id: Ibdf1c2d422b9e644eba76fc200c8c10217394036
2016-08-16 18:43:41 -07:00
James Zern
1c25b7f29e fix mips msa build w/CONFIG_EXT_TX
vp10_fht{16x16,8x8,4x4}_msa and the iht were disabled with this config
in:
4ab19ea Fix assertion failures in mips+msa setting

Change-Id: Ic675258b89ca490e8021c887b705c68428925129
2016-08-16 17:30:17 -07:00
Yi Luo
166dd79368 Delete DCT 64x64 functions to save code size
- gcc x86_64 build binary is about 47 KB smaller.

Change-Id: I9e5f41fc9c5c75aec453f8b8567e228a6a6cd71d
2016-08-16 17:16:05 -07:00
Sarah Parker
bec4fbe4be Switch order of gm parameters for affine model
This was originally subtracting 1 from the wrong element in the
parameter set.

Change-Id: I790aafc505f7a8fe7bb00d7d6c62549487a0980f
2016-08-16 15:06:31 -07:00
Wei-ting Lin
b20d0777a8 Change the B-frame coding structure.
Originally we could have a BRF right before an overlay frame (in
display order), which might be unnecessary since we already have a
quality backward reference frame (ARF).
This patch avoids such a coding structure and improves the RD
performance by 0.086% in Avg in the lowres dataset, and 0.153% in
Avg in the midres dataset.

In the lowres dataset, significant gains are obtained for the
following sequences:

mobisode2_240p: 0.563%
keiba_240p: 0.440%
bus_cif: 0.336%
soccer_cif: 0.333%

And the performance drops only in the following four video sequences:

motherdaughter_cif: 0.028%
bqsquare_240p: 0.017%
basketballpass_240p: 0.015%
bowing_cif: 0.006%

Change-Id: Ic94f648ba8e52eb0014933d484fb247610a9ae05
2016-08-16 10:52:24 -07:00
Yaowu Xu
253c001f8f Port dering experiment from aom
Manually cherry-picked:
1579133 Use OD_DIVU for small divisions in temporal_filter.
0312229 Replace divides by small values with multiplies.
9c48eec Removing divisions from od_dir_find8()
0950ed8 Merge "Port active map / cyclic refresh fixes to vp10."
efefdad Port active map / cyclic refresh fixes to vp10.
1eaf748 Port switch to 9-bit rate cost to aom.
0b1606e Only build deringing code when --enable-dering.
e2511e1 Deringing cleanup: don't hardcode the number of levels
8fe5c5d Rename dering_in to od_dering_in to sync with Daala
4eb1380 Makes second filters for 45-degree directions horizontal
7f4c3f5 Removes the superblock variance contribution to the threshold
3dc56f9 Simplifying arithmetic by using multiply+shift
cf2aaba Return 0 explicitly for OD_ILOG(0).
49ca22a Use the Daala implementation of OD_ILOG().
8518724 Fix compiler warning in od_dering.c.
485d6a6 Prevent multiple inclusion of odintrin.h.
51b7a99 Adds the Daala deringing filter as experimental

Note that a few of the changes were already in the libvpx codebase.

Change-Id: I1c32ee7694e5ad22c98b06ff97737cd792cd88ae
2016-08-16 13:47:18 +00:00
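Several of the cherry-picked dering changes above replace divisions by small values with a multiply and a shift. The sketch below illustrates the general technique for a bounded range. It is not the Daala OD_DIVU implementation, and the names, the shift and the range limits are assumptions chosen so that the result is exact.

```c
#include <assert.h>
#include <stdint.h>

#define DIV_SHIFT 24

/* Reciprocal of d in fixed point: ceil(2^24 / d). In practice this would be
 * precomputed once per divisor, for example in a lookup table. */
static uint32_t recip(uint32_t d) {
  assert(d >= 1 && d <= 256);
  return (uint32_t)((((uint64_t)1 << DIV_SHIFT) + d - 1) / d);
}

/* Returns x / d using a multiply and a shift, where m = recip(d).
 * Exact for x < 2^16 and 1 <= d <= 256: the rounding error introduced by
 * the ceiling in recip() stays below one unit in the quotient. */
static uint32_t div_by_mul(uint32_t x, uint32_t m) {
  assert(x < (1u << 16));
  return (uint32_t)(((uint64_t)x * m) >> DIV_SHIFT);
}
```

The exactness argument relies on the stated bounds: with x below 2^16 and d at most 2^8, the accumulated error x * (m * d - 2^24) stays below 2^24, so the floor of the product matches the true quotient.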
Yaowu Xu
0818a7c828 Port commits related to clpf and qm experiments
Manually cherry-picked following commits from AOMedia git repository:
bb2727c Sort includess for "clpf.h"
c297fd0 Add quantisation matrix range parameters.
0527894 Add encoder option and signaling for quant matrix control.
4106232 Turn off trellis coding for quantization matrices.
4017fca Modify tests to allow quantization matrices.
1c122c2 Add quant and dequant functions for new quant matrices.
95a8999 Enable CLPF
f72782b Fix a build issue
73bae50 Add quantisation matrices and selection functions
33208d2 Added support for constrained low pass filter (CLPF)

Change-Id: I60fc1ee1ac40e6b9d1d00affd97547ee5d5dd6be
2016-08-16 13:46:49 +00:00
Sarah Parker
ac917ec262 Merge "Fix dropped const qualifier in new_quant experiment" into nextgenv2 2016-08-16 03:09:54 +00:00
Sarah Parker
28666204ca Fix dropped const qualifier in new_quant experiment
This was causing a compiler warning from -Wcast-qual.

Change-Id: Ie525ffe20be4f38ced68fb0c4141e36400eb0717
2016-08-15 19:26:31 -07:00
James Zern
58b3813cda Merge changes from topic 'clang-format' into nextgenv2
* changes:
  remove tools/vpx-style.sh
  README: add a note about clang-format
2016-08-16 02:00:01 +00:00
Sarah Parker
9142e515e2 Merge "Fix precision bug in warped_motion.c" into nextgenv2 2016-08-15 18:51:37 +00:00
Debargha Mukherjee
7208145722 Adding 8x16/16x8/32x16/16x32 transforms
Adds forward, inverse transforms and scan orders.

Change-Id: Iab6994f4b0ef65e660b714d111b79b1c8172d6a8
2016-08-15 10:33:24 -07:00
Sarah Parker
99adc57976 Fix precision bug in warped_motion.c
The projected coordinates in projectPointsTranslation were
being shifted by the incorrect precision.

Change-Id: If6040bea9e5187020d85c6095d85c7ff5786b7f9
2016-08-12 16:44:05 -07:00
James Zern
7dcd4993bb remove tools/vpx-style.sh
update ftfy.sh to use clang-format

Change-Id: I8ac740c5b3842beed2b8878fbe506f381f4c57e4
(cherry picked from commit 958ae5af9c892e5328ec0363d1a69afbfe0e0907)
2016-08-12 16:41:19 -07:00
James Zern
92ed0c9146 README: add a note about clang-format
Change-Id: I835401e3befffcbc68e7d2bdd2fd556a19948e91
(cherry picked from commit 15f29ef0922c6584dea69ac75367edb2f6934bb3)
2016-08-12 16:41:19 -07:00
James Zern
814986b84e Merge "webm{dec,enc}.cc,debug_util.c: apply clang-format" into nextgenv2 2016-08-12 23:40:17 +00:00
James Zern
09e3f49854 Merge "vp10/encoder: apply clang-format" into nextgenv2 2016-08-12 23:36:17 +00:00
clang-format
01f4c71719 webm{dec,enc}.cc,debug_util.c: apply clang-format
top-level *.cc were missed in the original change
debug_util.c was checked in with some warnings

Change-Id: I72999bf94d734ffc127bf6f96a8d17f9c313d5a0
2016-08-12 16:23:55 -07:00
James Zern
4efb9771ff Merge "vp10/common: apply clang-format" into nextgenv2 2016-08-12 23:23:04 +00:00
James Zern
ca502bf018 Merge "vp10_fwd_txfm2d_test: use sizeof(var)" into nextgenv2 2016-08-12 23:01:45 +00:00
clang-format
d9f9a34bb1 vp10/encoder: apply clang-format
Change-Id: I58a42ced5b8a4338524434ff3356850b89aa705a
2016-08-12 15:08:05 -07:00
clang-format
7feae8e84e vp10/common: apply clang-format
Change-Id: I01d8241eba3ccaf4d06c00a51df2d17c126f6f9d
2016-08-12 15:07:08 -07:00
James Zern
26777fca7b Merge "vp10/decoder,vp10/*.[hc]: apply clang-format" into nextgenv2 2016-08-12 22:01:48 +00:00
James Zern
245c5a865b vp10_fwd_txfm2d_test: use sizeof(var)
rather than sizeof(type)

Change-Id: I63755e4ca3810bec2d31013bebcc363c5c9f56ed
2016-08-12 14:58:07 -07:00
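The sizeof(var) change above is a small robustness idiom. A brief sketch of why it helps, using a hypothetical coefficient buffer rather than the test's actual variables:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TXFM_SIZE 64 /* hypothetical buffer length */

typedef struct {
  int32_t coeff[TXFM_SIZE];
} CoeffBuf;

static void clear_coeffs(CoeffBuf *buf) {
  assert(buf != NULL);
  /* sizeof(buf->coeff) follows the array's declared type and length.
   * Writing sizeof(int16_t) * TXFM_SIZE here would compile but clear only
   * half of the array, and nothing would flag the mismatch. */
  memset(buf->coeff, 0, sizeof(buf->coeff));
}
```

If the element type or length of coeff changes later, the memset size changes with it and no call site has to be updated.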
James Zern
ea74959b7f Merge "test/: apply clang-format" into nextgenv2 2016-08-12 21:57:05 +00:00
James Zern
79fa2f6eba Merge "reconintra_predictors_test: use new[] operator" into nextgenv2 2016-08-12 21:41:32 +00:00
clang-format
3a826f1d3d test/: apply clang-format
Change-Id: I1138fbeff5f63beb5c0de2c357793da12502d453
2016-08-12 12:40:41 -07:00
Yi Luo
4dc5bd7b71 Apply branch prediction on quantize/quantize_skip functions
- On E5-2680, park_joy_1080p, 5 frames, baseline encoding time
  drops by about 0.8-1.0%.
- Credit goes to Erik Niemeyer (erik.a.niemeyer@intel.com).

Change-Id: I69f191d5a4e4b96a5f9ffd8286e484b69d565c01
2016-08-12 12:37:32 -07:00
Yi Luo
712e66dafa reconintra_predictors_test: use new[] operator
fixes mix of malloc & delete[]

Change-Id: I89a1de0614234bf8b3dbe4aacfe71f75f39d08ff
2016-08-12 12:34:23 -07:00
Yi Luo
454fd586b3 Merge "Optimization for HBD filter intra predictors (SSE4.1)" into nextgenv2 2016-08-12 16:20:54 +00:00
clang-format
8a061d421e vp10/decoder,vp10/*.[hc]: apply clang-format
Change-Id: Ie4d7ecb2f692c1c43eff1242e1f00e7fbae00e57
2016-08-11 20:11:16 -07:00
Yi Luo
8e0360a130 Optimization for HBD filter intra predictors (SSE4.1)
- Add unit tests to verify bit-exactness.
- Speed unit test, function improvement: about 8%-23%.
- On E5-2680, park_joy_1080p_12, 25 frames, --kf-max-dist=1
  encoding time improves from <1% to 3.5%

Change-Id: Ic16368885bb253db0200c3a6db143ab1a0b7fc26
2016-08-11 17:34:51 -07:00
James Zern
9c6a7cabd8 Merge "vpx_mem/: apply clang-format" into nextgenv2 2016-08-12 00:18:19 +00:00
Angie Chiang
d2697fce4e Merge "Bitstream debug tool" into nextgenv2 2016-08-11 23:44:08 +00:00
Debargha Mukherjee
c94b635190 Merge "A fix in optimize_b for new-quant" into nextgenv2 2016-08-11 22:09:33 +00:00
clang-format
031d46c941 vpx_mem/: apply clang-format
Change-Id: Ib21077a85ded17823ab62e0b7fdf663ae3dbc05d
2016-08-11 13:02:30 -07:00
Angie Chiang
4de81ee1f1 Bitstream debug tool
This is a debug tool used to detect bitstream errors. On the encoder side, it
pushes each bit and probability into a queue before the bit is written into the
arithmetic coder. On the decoder side, whenever a bit is read out from the
arithmetic coder, it pops the reference bit and probability from the queue as
well. If the two results do not match, this debug tool will report an error.
This tool can be used to pin down a bitstream error precisely. Combined with
gdb's backtrace, it shows which module causes the bitstream error.

Change-Id: I133a4371fafdd48c488f2ca47f9e395676c401f2
2016-08-11 11:16:04 -07:00
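The push/pop check described in the "Bitstream debug tool" message can be sketched as below. This is a minimal illustration, not the code from the commit: the names debug_push, debug_check, DebugEntry and DEBUG_QUEUE_SIZE are hypothetical, and a fixed-size ring buffer is assumed for the queue.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical names throughout; this is not the code from the commit. */
#define DEBUG_QUEUE_SIZE 1024

typedef struct {
  int bit;  /* the coded symbol, 0 or 1 */
  int prob; /* the probability used to code it */
} DebugEntry;

static DebugEntry debug_queue[DEBUG_QUEUE_SIZE];
static int debug_head = 0; /* next entry to pop */
static int debug_tail = 0; /* next free slot */

/* Encoder side: record the bit and its probability before they are coded.
 * Returns 0 on success, -1 if the queue is full. */
int debug_push(int bit, int prob) {
  int next = (debug_tail + 1) % DEBUG_QUEUE_SIZE;
  assert(bit == 0 || bit == 1);
  if (next == debug_head) return -1;
  debug_queue[debug_tail].bit = bit;
  debug_queue[debug_tail].prob = prob;
  debug_tail = next;
  return 0;
}

/* Decoder side: pop the reference entry and compare it with what was just
 * decoded. Returns 0 on a match, -1 on a mismatch or an empty queue. */
int debug_check(int bit, int prob) {
  DebugEntry ref;
  if (debug_head == debug_tail) return -1;
  ref = debug_queue[debug_head];
  debug_head = (debug_head + 1) % DEBUG_QUEUE_SIZE;
  if (ref.bit != bit || ref.prob != prob) {
    fprintf(stderr, "bitstream mismatch: got (%d, %d), expected (%d, %d)\n",
            bit, prob, ref.bit, ref.prob);
    return -1;
  }
  return 0;
}
```

Setting a breakpoint on the mismatch branch and asking gdb for a backtrace would then point at the module that read the offending bit, which is the workflow the message describes.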
clang-format
05ce850890 vpx_ports/: apply clang-format
Change-Id: I9654530a34a3d0691baeca9d62184cd7b9ac3b4c
2016-08-11 10:52:34 -07:00
Debargha Mukherjee
f4112212da A fix in optimize_b for new-quant
Change-Id: I5a7bd3c2d0c7f6cf714367674f1d75510659b54d
2016-08-11 10:01:54 -07:00
Zoe Liu
cdd4eb0291 Fix a bug in RATE_FACTOR_LEVEL definition for ext-refs
There was a bug in the original setup for RATE_FACTOR_LEVELS, which
resulted in rate_factor_deltas for GF_ARF_STD being 2.00 instead of the
intended value of 1.75, and for KF_STD being 0.00 instead of the
intended value of 2.00.

Nevertheless, when simply fixing the bug as in the first patch, the RD
performance unexpectedly dropped by 0.143% in Avg bitrate using
Overall PSNR, especially for the following sequences in lowres:

bridge_close_cif: dropped by 1.468%
container_cif: dropped by 2.140%
husky_cif: dropped by 0.826%
motherdaughter_cif: dropped by 0.798%
rasehorses_240p: dropped by 0.805%
students_cif: dropped by 1.411%

This indicates that we should raise the value for GF_ARF_STD from
1.75 to at least 2.00. With that change, while still keeping 2.00 for
KF_STD, the new patch achieves a small gain of 0.15% for the baseline,
and a smaller gain of 0.06% for the ext-refs experiment. Most
sequences keep similar RD performance in lowres, except for the
following ones that obtain a bigger gain:

(1) Baseline:
container_cif: 1.628%
students_cif: 1.015%

(2) ext-refs
tennis_sif: 1.248%

Change-Id: I992f8f6a3e20f1b71ec52a1ddc969af4968b78d5
2016-08-11 09:47:46 -07:00
Yaowu Xu
445274d962 Merge "vpx_scale/: apply clang-format" into nextgenv2 2016-08-11 14:33:47 +00:00
clang-format
923d155179 vpx_scale/: apply clang-format
Change-Id: I514654a0704512fb44c7eef5dd045a5767df953a
2016-08-10 23:53:14 -07:00
James Zern
db6a1120a9 Merge "vpx_util/: apply clang-format" into nextgenv2 2016-08-11 04:00:23 +00:00
James Zern
45d1294fdf Merge changes from topic 'clang-format' into nextgenv2
* changes:
  vpx_dsp/: apply clang-format
  vpx/: apply clang-format
  top-level: apply clang-format
  examples: apply clang-format
2016-08-11 03:55:53 +00:00
clang-format
3a992f848a vpx_util/: apply clang-format
Change-Id: I831214d16a5bbfdb86e24dbff8afe4ff4aeebdde
2016-08-10 17:15:04 -07:00
Zoe Liu
4e2d26bd17 Code clean on encoder rate controller
Change-Id: Iec29c00e24ac8c4f24d43142db6ae03f1b3945ac
2016-08-10 15:34:01 -07:00
clang-format
1214cee2f7 vpx_dsp/: apply clang-format
Change-Id: Ia3f96910409be4ae8a4907a2f0dee73b1af8f93d
2016-08-10 12:56:41 -07:00
clang-format
83a5207893 vpx/: apply clang-format
Change-Id: I727b41153cc7929a143e5c370623277558b66e80
2016-08-10 12:42:59 -07:00
clang-format
6c4d83ec9e top-level: apply clang-format
Change-Id: Iac1d97d84518649404e32b136b8fdd840723303c
2016-08-10 12:42:52 -07:00
clang-format
397d964f29 examples: apply clang-format
Change-Id: I06903104bf822819fae39e42fdb6e44d3f9d7787
2016-08-10 12:42:44 -07:00
Urvang Joshi
6dde801818 Palette code: Use built-in qsort() method; create remove_dup() method.
Change-Id: Id816413307334336a9f473540cf9aa0e789ea9e9
2016-08-10 12:10:09 -07:00
Debargha Mukherjee
1da3e129ff Merge "Fix for lossless with rect-tx" into nextgenv2 2016-08-10 19:06:47 +00:00
James Zern
4cfb8309f8 Merge changes I619b365d,I579a9328 into nextgenv2
* changes:
  lossless_test: mark tests as Large
  cpu_speed_test: mark speed 0 as Large
2016-08-10 19:06:16 +00:00
Urvang Joshi
a017c372e5 Merge "Palette code cleanup:" into nextgenv2 2016-08-10 19:05:11 +00:00
Yaowu Xu
d67a8feb93 Change to use proper types
block: from int64_t to int as it is a block index.
sse: from unsigned int to int64_t to reduce type conversion. 

Change-Id: Iec8104ff8a3fd3a77d4e451c12918bd869966c2f
2016-08-10 14:27:12 +00:00
Peter de Rivaz
ffbdc51018 Fix for lossless with rect-tx
Change-Id: Ibb1e5d5137c7717bc6a8683ad78d842c3e5f052e
2016-08-10 12:00:55 +00:00
James Zern
239bb16fef lossless_test: mark tests as Large
Change-Id: I619b365d636737da8b1a322bab3be973de53200d
2016-08-09 20:39:44 -07:00
James Zern
b5818b7722 cpu_speed_test: mark speed 0 as Large
TestTuneScreen / TestScreencastQ0 are the worst offenders

Change-Id: I579a93289aa431afbfea8a280ddcb1011ab1a8cf
2016-08-09 20:32:51 -07:00
Yaowu Xu
c57816fb58 Merge "vp10_highbd_quantize_fp: use const consistently" into nextgenv2 2016-08-10 03:13:42 +00:00
Yaowu Xu
f9efcb345a vp10_highbd_quantize_fp: use const consistently
Remove a few extra ones so that the declarations are consistent with the
definitions; this fixes some MSVC warnings.

Change-Id: I4b26de4cca71f0ac85667bd641c448b44315941b
2016-08-10 03:13:22 +00:00
James Zern
9df7c2544a Merge "remove SVC" into nextgenv2 2016-08-10 03:07:07 +00:00
James Zern
cc73e1fcd4 remove SVC
spatial/temporal scalability are not supported in VP10 currently.
+ remove the unused vp10/encoder/skin_detection.[hc]

this also enables DatarateTestLarge for VP10 which passes with no
experiments enabled. these were removed previously when only the SVC
tests should have been:
134710a Disable tests not applicable to VP10

Change-Id: I9ee7a0dd5ad3d8cc1e8fd5f0a90260fa43da387c
2016-08-09 18:42:20 -07:00
Sarah Parker
b4d9a2caf3 Merge "Add interface to compute gm parameters in encodeframe" into nextgenv2 2016-08-10 00:37:28 +00:00
James Zern
d44472b646 Merge "remove vp8cx_set_ref.c" into nextgenv2 2016-08-10 00:03:54 +00:00
Sarah Parker
d616a5cee4 Add interface to compute gm parameters in encodeframe
This patch just creates the interface for global motion computation
and calls it from encodeframe. Currently, the function
compute_global_motion_feature_based is empty and the work to do
the actual parameter calculation will be added in a future patch.

Change-Id: Ife142742140079e1c1743b66f180aeb2ecea29ae
2016-08-09 16:00:59 -07:00
Wei-ting Lin
ffdd988427 Merge "Fix a bug for multi_arf_allowed" into nextgenv2 2016-08-09 21:09:57 +00:00
Urvang Joshi
d000020840 Palette code cleanup:
- Avoid some memcpy()s
- Remove indices array
- Make pre_indices array local
- Avoid rounding twice
- Other small simplifications

Change-Id: Iac3236daaad04f21f54054cdd9504de13b942a07
2016-08-09 11:53:34 -07:00
James Zern
2c14c539b3 remove vp8cx_set_ref.c
and the related tests. vpxcx_set_ref is the binary to use for vp10.

Change-Id: I4c4ce7b36b165e6d06b87fd6b53923a1c11e4e6c
2016-08-08 17:14:04 -07:00
James Zern
b869de9856 Merge "configure: test for -Wfloat-conversion" into nextgenv2 2016-08-08 21:48:14 +00:00
Yi Luo
dd2edd0ad5 Merge "Optimization EXT_INTRA's filtered intra predictor (SSE4.1)" into nextgenv2 2016-08-08 20:55:44 +00:00
Sarah Parker
b659281eec Add reconstruction using gm parameters
This patch only includes inter frame reconstruction using gm
parameters when GLOBAL_MOTION and/or VP9_HIGHBITDEPTH are enabled.
GM is not currently used when EXT_INTER or DUAL_FILTER is enabled.
This will be added in a followup patch. For now, these experiments
will take precedence over GLOBAL_MOTION when they are all enabled.

Change-Id: I930ddda529c44d7245dbb56db3c9c5524cf45473
2016-08-08 10:17:05 -07:00
Yi Luo
57c4711b5c Optimization EXT_INTRA's filtered intra predictor (SSE4.1)
- Add unit tests to verify the bit-exact result.
- In speed test, function speed (for each mode/tx_size)
  improves about 23%~35%.
- On E5-2680, park_joy_1080p, 10 frames, --kf-max-dist=1,
  encoding time improves about 1%~2%.

Change-Id: Id89f313d44eea562c02e775a6253dc4df7e046a9
2016-08-08 10:02:36 -07:00
Yue Chen
292ea74fe4 Merge "Speed filter intra mode search in EXT_INTRA experiment" into nextgenv2 2016-08-06 00:17:33 +00:00
James Zern
a9d984830a configure: test for -Wfloat-conversion
supported by clang, gcc-4.9+

Change-Id: I893766de7307fef9a8b68c0cfae137c9d3b0dbe8
(cherry picked from commit 889ed5b158fc280927f2de9172d48245c3b735a7)
2016-08-06 00:02:45 +00:00
James Zern
d609b520fa Merge "warped_motion: remove unused vp10_integerize_model" into nextgenv2 2016-08-06 00:02:15 +00:00
Yue Chen
f6a5c27493 Speed filter intra mode search in EXT_INTRA experiment
(1) Key frame: skip filter intra modes whose directional pred
    version is relatively bad (rd >= 1.125 * best_rd)
(2) Inter frame: do not check filter intra modes if best_intra_rd
    >= 1.25 * best_rd

Encoding time overhead is reduced by:
4.9% (9.2%->4.3%, soccer_cif)
Coding gains drop by 0.021% on lowres and by 0.076% on midres

Change-Id: I29b6f7d3d3dc4b362c6d63bc447e6a429ba5dc66
2016-08-05 23:04:46 +00:00
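The two pruning rules in the "Speed filter intra mode search" message are simple threshold tests. A minimal sketch follows, assuming RD costs are non-negative 64-bit integers well below INT64_MAX; the function names are hypothetical and the integer shifts only approximate the 1.125 and 1.25 factors (exactly so when best_rd is a multiple of 8 or 4).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; the commit itself may structure this differently. */

/* Key frames: skip a filter intra mode when the directional predictor it
 * corresponds to has rd >= 1.125 * best_rd. Since 1.125 == 1 + 1/8, the
 * threshold is computed as best_rd + best_rd / 8 (truncating). */
int skip_filter_intra_on_key_frame(int64_t directional_rd, int64_t best_rd) {
  assert(directional_rd >= 0);
  assert(best_rd >= 0 && best_rd <= INT64_MAX / 2); /* avoid overflow */
  return directional_rd >= best_rd + (best_rd >> 3);
}

/* Inter frames: do not check filter intra modes at all when
 * best_intra_rd >= 1.25 * best_rd, i.e. best_rd + best_rd / 4. */
int skip_filter_intra_on_inter_frame(int64_t best_intra_rd, int64_t best_rd) {
  assert(best_intra_rd >= 0);
  assert(best_rd >= 0 && best_rd <= INT64_MAX / 2); /* avoid overflow */
  return best_intra_rd >= best_rd + (best_rd >> 2);
}
```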
Wei-ting Lin
c0e55de06b Fix a bug for multi_arf_allowed
The ARF Index was wrong when updating the upsampled reference
frame buffer.

Compared to the baseline in which multi_arf_allowed is disabled, the
RD performance drops 2.250% in Avg using Overall PSNR on the derf
dataset. The performance decrease is especially large in the following
video sequences:

foreman_cif: drops 7.489%
husky_cif: drops 6.421%
soccer_cif: drops 4.850%

However, it has a significant gain in the following video sequences:

container_cif: increases 8.043%
harbour_cif: increases 1.332%

Change-Id: I02472909eb34bd070d7544f57383e72559fa42b3
2016-08-05 14:05:50 -07:00
Urvang Joshi
016a5daa59 Palette code: simpler and faster duplicate removal
Change-Id: I0c1baa5ca73c1f067d69239d3e31d1050b4706d2
2016-08-05 12:33:21 -07:00
Zoe Liu
cbed16b8b3 Merge "Code refactoring on Macros related to ref frame numbers" into nextgenv2 2016-08-05 16:59:36 +00:00
Urvang Joshi
9faf9148ec Merge "Make palette code faster: replace nested for loops by a single memcpy()." into nextgenv2 2016-08-05 00:20:11 +00:00
Yaowu Xu
476e44a689 Merge "Replace variants of 'vp8' and 'vp9' with 'vpx'" into nextgenv2 2016-08-04 23:48:59 +00:00
Yaowu Xu
59589d0c4a Merge changes Ic5ddba3c,Ibe7a3248 into nextgenv2
* changes:
  Fix a number of msvc warnings
  Reduce number of frames in lossless tests
2016-08-04 23:27:35 +00:00
Yaowu Xu
fe291b647e Replace variants of 'vp8' and 'vp9' with 'vpx'
Change-Id: Id6cb96b0b15efdda63348d8bfe59fc0533c85ba1
2016-08-04 22:20:38 +00:00
Urvang Joshi
a0a23b7f0c Make palette code faster: replace nested for loops by a single memcpy().
Change-Id: Ia14df45a35c98d680822454fbb8d1763884c1852
2016-08-04 15:01:19 -07:00
Yaowu Xu
56a91f139d Fix a number of msvc warnings
Change-Id: Ic5ddba3ca0c87245617b6dbc78c0f13dc952ce8b
2016-08-04 21:42:56 +00:00
Yaowu Xu
a1b6507e2b Reduce number of frames in lossless tests
This is to reduce the time necessary for these tests.

Change-Id: Ibe7a3248a6c45baf575af85fdffcffc557dd054b
2016-08-04 14:36:29 -07:00
James Zern
6679608386 vp10_inv_txfm2d_test: normalize max_error type
quiets double -> int conversion warning

Change-Id: Ic860d187bc77e18b277eef28310feee1899cdbe6
2016-08-04 12:24:20 -07:00
James Zern
0d5ac98d1c Merge ".clang-format: disable DerivePointerAlignment" into nextgenv2 2016-08-04 19:21:38 +00:00
Yaowu Xu
2b2ee5cdf6 Merge "more cleanup of vp8 and vp9" into nextgenv2 2016-08-04 18:56:46 +00:00
Zoe Liu
1af28f0230 Code refactoring on Macros related to ref frame numbers
We have renamed the following macros to avoid name confusion:

REFS_PER_FRAME --> INTER_REFS_PER_FRAME
(= ALTREF_FRAME - LAST_FRAME + 1)
MAX_REF_FRAMES --> TOTAL_REFS_PER_FRAME
(= ALTREF_FRAME - INTRA_FRAME + 1)

INTER_REFS_PER_FRAME specifies the maximum number of reference frames
that each Inter frame may use.
TOTAL_REFS_PER_FRAME is equal to INTER_REFS_PER_FRAME + 1, which
also counts the INTRA_FRAME.

Further, at the encoder side, since REF_FRAMES specifies the maximum
number of reference frames that the encoder may store, REF_FRAMES
is usually larger than INTER_REFS_PER_FRAME. For example, in the
ext-refs experiment, REF_FRAMES == 8, which allows the encoder to
store a maximum of 8 reference frames in the buffer, but
INTER_REFS_PER_FRAME equals 6, which allows each Inter frame to
use up to 6 of the 8 buffered frames as its references.
Hence, in order to explore the possibility of storing more reference
frames in future patches, we modified a couple of array sizes to
accommodate the case where the number of buffered reference frames is
not always equal to the number of references being used
by each Inter frame.

Change-Id: I19e42ef608946cc76ebfd3e965a05f4b9b93a0b3
2016-08-04 11:21:28 -07:00
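The relationship between the renamed macros in the "Code refactoring on Macros" message can be illustrated as follows. The enum ordering and the LAST2/LAST3/BWDREF labels are assumptions made so the ext-refs numbers from the message (6 and 8) come out; count_distinct_buffers is a hypothetical helper, not a function from the codebase.

```c
#include <assert.h>

/* Assumed ordering of the reference frame labels with ext-refs enabled. */
enum {
  INTRA_FRAME = 0,
  LAST_FRAME = 1,
  LAST2_FRAME = 2,
  LAST3_FRAME = 3,
  GOLDEN_FRAME = 4,
  BWDREF_FRAME = 5,
  ALTREF_FRAME = 6
};

/* Definitions as given in the commit message. */
#define INTER_REFS_PER_FRAME (ALTREF_FRAME - LAST_FRAME + 1)
#define TOTAL_REFS_PER_FRAME (ALTREF_FRAME - INTRA_FRAME + 1)

/* Number of reference buffers the encoder may store (8 in ext-refs,
 * per the commit message). */
#define REF_FRAMES 8

/* Hypothetical helper: given the buffer slot each inter reference points
 * at, count how many distinct buffers are in use. The input is sized by
 * INTER_REFS_PER_FRAME, the scratch array by REF_FRAMES, which shows why
 * the two sizes must not be conflated. */
int count_distinct_buffers(const int *ref_to_buffer) {
  int used[REF_FRAMES] = { 0 };
  int i, count = 0;
  for (i = 0; i < INTER_REFS_PER_FRAME; ++i) {
    assert(ref_to_buffer[i] >= 0 && ref_to_buffer[i] < REF_FRAMES);
    if (!used[ref_to_buffer[i]]) {
      used[ref_to_buffer[i]] = 1;
      ++count;
    }
  }
  return count;
}
```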
James Zern
32427b379c warped_motion: remove unused vp10_integerize_model
this function produces implicit double -> int conversion warnings and
has additional style issues.

Change-Id: I6bc740e778658d6f81ca54888fc6fa822d3b5ee0
2016-08-03 15:52:03 -07:00
Sarah Parker
108df24d2a Merge "Adjust gm parameter computation to avoid mismatch" into nextgenv2 2016-08-03 21:42:30 +00:00
Yaowu Xu
d9e73a32fc Merge "Cherry pick from AOMedia" into nextgenv2 2016-08-03 19:25:22 +00:00
Yaowu Xu
efe198fb04 Merge "Cherry pick from AOM:" into nextgenv2 2016-08-03 19:25:17 +00:00
Yaowu Xu
a3cff08259 more cleanup of vp8 and vp9
Change-Id: Ic90ebe6136f4b75645ba699d49c0bcb3764ddccf
2016-08-03 12:20:33 -07:00
Sarah Parker
aa810c002c Adjust gm parameter computation to avoid mismatch
The gm parameters need to have WARPED_PRECISION_BITS precision
until they are written to the bitstream because functions in
reconinter use these parameters before they are written to
the bitstream. Previously, the parameters weren't being converted
to WARPED_PRECISION_BITS until they were read from the bitstream
which causes an encode/decode mismatch.

Change-Id: I31e76e9d6f7d24df21af287a72f8c01f1997304d
2016-08-03 12:20:12 -07:00
Yue Chen
0acb76d8eb Merge "Fix a bug and a function name in EXT_INTRA experiment" into nextgenv2 2016-08-03 17:04:15 +00:00
Yaowu Xu
b06147de6b Cherry pick from AOMedia
5b5fbad VP9LfSync->VP10LfSync
b752848 vp8_yv12 -> vpx_yv12
e5068cd VP9->VPX for reference frame flags

Change-Id: Ia36860499c81a5aca8cd6190e7370ec404c0df0f
2016-08-02 16:24:41 -07:00
Yue Chen
31dab60888 Fix a bug and a function name in EXT_INTRA experiment
(1) Apply ALLOW_FILTER_INTRA_MODES flag to the correct place, otherwise
there are bitstream mismatches when it is 0.
(2) Rename pick_ext_intra_iframe() to pick_ext_intra_interframe().

Change-Id: Ic88c930de1d3f819750f0892df52bde55ae32a91
2016-08-02 16:12:49 -07:00
Yaowu Xu
8bf837f153 Cherry pick from AOM:
68e7e4d0 Remove VP9_CAP_POSTPROC
0738390c Remove vp9_temporal denoise
b89861a4 Remove vp9-postproc

Change-Id: I4ecaa0ac83a519c8174a494378fc23df610ff2a8
2016-08-02 15:29:50 -07:00
Yaowu Xu
134710af32 Disable tests not applicable to VP10
As VP10 does not support multiple layers yet, we disable the ported
tests from VP9.

Change-Id: Ib7577c27e402ede481213b7a64ebee7576a025a5
2016-08-01 13:32:40 -07:00
Hui Su
0594c7f5b1 Merge "Use all possible intra ref pixels for blocks on frame boundary" into nextgenv2 2016-08-01 17:29:09 +00:00
Yaowu Xu
de42ab22bd Merge "Cherry pick renaming changes from AOMedia branch" into nextgenv2 2016-08-01 17:16:07 +00:00
Yaowu Xu
22fda38fa4 Merge "Rename files with vp9_ prefix" into nextgenv2 2016-08-01 17:15:50 +00:00
Yaowu Xu
d4c4724090 Cherry pick renaming changes from AOMedia branch
Manually cherry-picked the following changes:
8c8d16de vp9 -> vpx in names
75b57d39 VP9_ -> VPX_ in function names
761a7088 VP9_INTERP_EXTEND -> VPX_INTERP_EXTEND
4273a52c VP9->VPX in border pixel macros
03568c31 VP9_FRAME_MARKER -> VPX_FRAME_MARKER
2334f51d VP9->VPX in fdct function names

Change-Id: Icc18dbf4b416dd0fa21033b3e19ab8a47c893508
2016-07-29 13:31:32 -07:00
hui su
a4daf360ca Use all possible intra ref pixels for blocks on frame boundary
Tested on lowres and midres, performance impact is neutral.

BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1269

Change-Id: Idaccaf7da7b460e6201efd436c084be66b5d4ebd
2016-07-29 10:06:50 -07:00
Yaowu Xu
ac86d3d668 Merge "More vp8/vp9 clean up" into nextgenv2 2016-07-29 16:50:33 +00:00
Yaowu Xu
5eee90730b Rename files with vp9_ prefix
Change-Id: I9c51ae3a2af698efe32288b807f881385e19822b
2016-07-29 16:45:08 +00:00
Yaowu Xu
3fa28d51af More vp8/vp9 clean up
Change-Id: I8101de20e873c19d03c7fd2977bc22003e395807
2016-07-28 18:22:47 -07:00
hui su
f67ff4a5b7 Fix a missing space
Change-Id: I6a9ab351012b731308d6b4fc78c09744c6afb61a
2016-07-28 09:55:17 -07:00
Jingning Han
3ecf31dbdf Merge "Separate frame context index for different frame types" into nextgenv2 2016-07-28 16:39:45 +00:00
Yaowu Xu
3bd709fafe Remove vp8, vp9 folders
Change-Id: I09b8acd22d031ece52e1fee18b998349bf1cf06b
2016-07-28 14:33:21 +00:00
Jingning Han
8915eb8e9a Separate frame context index for different frame types
This commit makes the encoder use a different frame context index
for different frame types. In the baseline setting, it sets the
frame context index of the overlay frame to be different from other
regular inter frames. In the ext-refs setting, it further allows
the backward reference frame to use a different index.

It improves the compression performance for both settings.

Baseline
lowres  0.12%

ext-refs
lowres  0.50%
midres  0.56%

Change-Id: I7c63ddec9fc296c56a86353cf2c661a740b97a97
2016-07-27 15:35:15 -07:00
James Zern
725f7f9d57 .clang-format: disable DerivePointerAlignment
everything outside of third_party should follow 'PointerAlignment:
right' i.e., associate the '*' with the variable

+ add a note about the clang-format that generated this file

Change-Id: I13e3f4f5fb6e22a8fa7fc3d06879c995b7c41a39
(cherry picked from commit e4290800b21478a3f9548c58b4f15c5ba5393073)
2016-07-26 19:51:45 -07:00
Hui Su
b124b243d4 Merge "ext-intra: squeeze the derivative table" into nextgenv2 2016-07-26 21:53:50 +00:00
hui su
831e009970 ext-intra: squeeze the derivative table
Reduce its size from 270x2 to 90.

Change-Id: Icaacc4258e43cdc19c6f06598fee17c3dd06d7e9
2016-07-26 12:58:22 -07:00
Yaowu Xu
6d6291b119 Merge "Fix compilation error under Clang 4.0." into nextgenv2 2016-07-26 17:21:58 +00:00
Yaowu Xu
077168bb16 Merge "MinArfFreqTest: Don't leak video on failure." into nextgenv2 2016-07-26 17:21:46 +00:00
Yaowu Xu
d8f83fcfd6 Merge "blockiness_test: fix implicit float conversion" into nextgenv2 2016-07-26 17:21:37 +00:00
Yaowu Xu
73eb764a35 Merge "resize_test: fix implicit float->int conversion" into nextgenv2 2016-07-26 17:21:30 +00:00
Yaowu Xu
8e3766f967 Merge "Add VPX_SWAP macro" into nextgenv2 2016-07-26 17:21:22 +00:00
Yaowu Xu
abb842d45c Merge "Make test encoder test driver less likely to leak on failure." into nextgenv2 2016-07-26 17:21:05 +00:00
Yunqing Wang
2cd670d1c8 Merge "Combine vpxcx_set_ref example for VP9 and VP10" into nextgenv2 2016-07-26 04:16:21 +00:00
Ivan Krasin
eb66904d7c Fix compilation error under Clang 4.0.
The LLVM trunk has reached 4.0 and now __clang_major__ is not enough
to distinguish between the old Xcode Clang and the new 'real' Clang.
Using __apple_build_version__ makes this distinction possible.

BUG=chromium:631144

Change-Id: I0b6e46fddfe4f409c7b7e558bda34872e60ee2d9
2016-07-26 04:12:08 +00:00
Alex Converse
12ca90d383 MinArfFreqTest: Don't leak video on failure.
Change-Id: I250379f0ac8d4929c9032e7343290e2980fc2e77
2016-07-26 04:11:56 +00:00
James Zern
74e230336f blockiness_test: fix implicit float conversion
float->int as reported by -Wfloat-conversion

Change-Id: Icb0ecb9e2d54edb95813d9f2de34cb6c27b63cbd
(cherry picked from commit 5e2791b54da066cc8543c066813373c9274ff53f)
2016-07-26 04:11:46 +00:00
James Zern
60be793eb6 resize_test: fix implicit float->int conversion
Change-Id: I1efc16fa158740a06da719a1ea90c6dd6a182bb4
(cherry picked from commit 325bdddc38ad15cf7dd2bc618461a13cfb006812)
2016-07-26 04:11:37 +00:00
Yury Gitman
a8de3c0c89 Add VPX_SWAP macro
Change-Id: I60e233eddef238ad918183392794084673f27d2d
2016-07-26 04:08:06 +00:00
Alex Converse
1c1bc94899 Make test encoder test driver less likely to leak on failure.
Individual tests still need to be updated.

Change-Id: Ic433d0f742e13560b136f136b72b2a9973970d78
2016-07-26 04:07:42 +00:00
James Zern
e0c265e4cb y4minput.c: correct empty loop formatting
prefer {}s over ';'

Change-Id: I563fc82717e1deb4f42a40e03dca318c6adaa0c1
2016-07-26 04:05:00 +00:00
James Zern
57303cb783 build/make/Makefile: add a 'test_*' default target
allows 'make test_libvpx', etc. some reworking of the makefiles would be
needed to avoid hard coding targets here.

Change-Id: I18982dbf691e7d36ab8bcf5934bab9340687b061
(cherry picked from commit 25085a6ac21fc8e2341b92e2f1f14d5a7fef30c6)
2016-07-26 04:04:28 +00:00
James Zern
ec53ba7dad build/make/Makefile: remove default suffix rules
Change-Id: I15c8976c6478bf75ec617398f49461b310ab7569
2016-07-26 04:02:58 +00:00
Yaowu Xu
3a9e8797c0 Merge "take II: variance_test partial clean-up" into nextgenv2 2016-07-26 04:02:37 +00:00
skal
8dbbcda9c7 take II: variance_test partial clean-up
remove some (but not all yet!) tuple mis-use, and revamp the code a lot.
Factorize some common chores into MainTestClass.

Change-Id: Id37b7330eebe80d19b9d12a454f24ff9be6b1116
2016-07-25 18:01:05 -07:00
Yunqing Wang
2a5a3f6bed Combine vpxcx_set_ref example for VP9 and VP10
Modified the example so that the test script worked correctly. Also
included minor fixes.

Change-Id: If97525dd9b85004a07e91c384132eadae35cdbf8
2016-07-25 15:09:13 -07:00
Yaowu Xu
230c24caaf Merge "inline->INLINE: vpx_dsp/x86/blend_a64_vmask_sse4.c" into nextgenv2 2016-07-25 20:48:53 +00:00
Yunqing Wang
464724c4d2 Merge "Adjust adaptive_rd_thresh speed feature" into nextgenv2 2016-07-25 18:53:28 +00:00
Yaowu Xu
53fb766d2a inline->INLINE: vpx_dsp/x86/blend_a64_vmask_sse4.c
This fixes the build using MSVC.

Change-Id: I7788e28fd4f0c6ff3d405c4b4a2ff59eda3ba8b6
2016-07-25 10:43:12 -07:00
James Zern
3f5a2a8ee2 vp10/ratectrl.c: fix implicit float conversion
float->int as reported by -Wfloat-conversion

Change-Id: I3c55990821d922bda7a7600c00ae8d5dcc3cee94
2016-07-22 18:08:23 -07:00
James Zern
d2fa9fe853 rd_pick_palette_intra_sbuv: fix implicit float conv
float->int as reported by -Wfloat-conversion

Change-Id: I9e3d6ce9dbb0689f214afc8d5950f209275e883d
2016-07-22 18:08:23 -07:00
Sarah Parker
d2e11e9705 Fix compiler errors in warped_motion.c
A few functions and variables were named incorrectly

Change-Id: Ib32fa459c51b9e9aad8bb107e1b689a96d98b368
2016-07-22 15:26:28 -07:00
James Zern
19a95f0e07 Merge "Restore vp10_default_scan_orders[]" into nextgenv2 2016-07-22 22:24:58 +00:00
Yunqing Wang
b171dcb1ee Adjust adaptive_rd_thresh speed feature
Set adaptive_rd_thresh to 0 at speed 0. This allows a thorough mode
search, and eliminates a blocking artifact seen in an encoder test.

Borg test:
1. lowres
Overall PSNR: -0.135%; SSIM: -0.293%;
2. hdres
Overall PSNR: -0.122%; SSIM: -0.208%;
Encoder speed tests: 2% - 6% slower.

Change-Id: Ie7601cb8824df8f6f2ae0b2942bd938600f76990
2016-07-22 15:13:04 -07:00
Sarah Parker
da30900381 Restore vp10_default_scan_orders[]
vp10_default_scan_orders was removed in:
e5848de Rectangular transforms 4x8 & 8x4
This fixes compiler error in vp10_quantize_test.cc
Change-Id: I1b8a637e011f9426c3b41e61e00e3babc80defba
2016-07-22 13:39:40 -07:00
James Zern
e96c20b14d add .clang-format, based on Google style
derived from clang-format 3.7.1; same as used in libaom

Change-Id: I8ea915a41d1f2ea3b0d4e4dab9ebc808e9116f11
2016-07-22 12:14:42 -07:00
Yaowu Xu
44aac61c13 Add and remove explicit type conversions
Change-Id: I8b791fda7c64a0363549add99dc9fcae3b29beae
2016-07-22 10:04:06 -07:00
Yaowu Xu
3826383ca1 Fix compiling issues
Change-Id: I530348b12a1c039842ce4e33d21046fe63878f19
2016-07-22 09:43:22 -07:00
Sarah Parker
0ea035f8b7 Merge "Add affine model to global motion" into nextgenv2 2016-07-21 23:17:33 +00:00
Debargha Mukherjee
a7cfdd9457 Merge "Rectangular transforms 4x8 & 8x4" into nextgenv2 2016-07-21 21:54:12 +00:00
Sarah Parker
e9bd26b826 Add affine model to global motion
Change-Id: I9cd355a3ea344ef66a61028efa25d94f54e7e2bd
2016-07-21 14:50:18 -07:00
Debargha Mukherjee
e5848dea5a Rectangular transforms 4x8 & 8x4
Added a new experiment rect-tx to be used in conjunction with ext-tx.
[rect-tx is a temporary config flag and will eventually be
merged into ext-tx once it works correctly with all other
experiments].

Added 4x8 and 8x4 transforms for use initially with rectangular
sub8x8 y blocks as part of this experiment.

There is about a -0.2% BDRATE improvement on lowres, others pending.

When var-tx is on rectangular transforms are currently not used.
That will be enabled in a subsequent patch.

Change-Id: Iaf3f88ede2740ffe6a0ffb1ef5fc01a16cd0283a
2016-07-21 10:46:41 -07:00
Yaowu Xu
ff3f35c16c Merge "Cleanup x86inc leftovers" into nextgenv2 2016-07-21 02:51:27 +00:00
Yi Luo
f9c01c7b76 Merge "HBD fast path quantization speed improvement" into nextgenv2 2016-07-20 22:48:52 +00:00
Yi Luo
b2663a8a67 HBD fast path quantization speed improvement
- HBD encoder speed improvement (SSE4.1):
  Enable CONFIG_VP9_HIGHBITDEPTH, on Xeon E5-2680,
  50 frames, park_joy_1080p, 12-bit,
  Encoding time reduces from 4846481 to 4177471 (ms)
- Add unit test to verify bit-exact and EOB calculation

Change-Id: I08e8ef3549ddad5ab36d86e78557df3b288537ea
2016-07-20 14:11:10 -07:00
Yaowu Xu
dec16abf7f Cleanup x86inc leftovers
Change-Id: I732d6942d56042a79c4259f775a6045fa95fbeff
2016-07-20 12:32:19 -07:00
Pascal Massimino
c03268b4b5 make signatures match for vp10_init_plane_quantizers
Change-Id: I1efbc91d0ca9183fe34692315307c00c4b346e73
2016-07-20 06:27:36 -07:00
Sarah Parker
0be4d3b11f Merge "Change order of warped motion parameters" into nextgenv2 2016-07-19 21:27:59 +00:00
Zoe Liu
054689b2bf A small refactor on the rate controller
Change-Id: Ie39e16de2457dd201121c62967e4ddaf5a05c33a
2016-07-19 13:55:14 -07:00
James Zern
95fef21468 Merge "fix vp10_convolve() signatures" into nextgenv2 2016-07-19 19:53:08 +00:00
sarahparker
78ea3b3e3e Change order of warped motion parameters
This makes it easier to interface between global motion and warped motion

Change-Id: I850e0a383969a1973f03fb207f100713cda6bb51
2016-07-19 11:04:29 -07:00
Hui Su
096d8ace8e Merge "Extra round of subpel MV search around second best full-pixel MV" into nextgenv2 2016-07-19 16:55:11 +00:00
Wei-ting Lin
3c13124e9b Merge "Allow OVERLAY frames to use the show_exsiting_frame flag" into nextgenv2 2016-07-19 04:28:08 +00:00
Sarah Parker
5fa46c0b60 Add global motion parameters to compressed header
Currently nothing is implemented to compute GM parameters, this
just adds the capability to send them in the bitstream if they
were computed. Still need to implement the reconstruction
based on the parameters in reconinter.

Change-Id: I72aea3c6a9de9f5a40f96da76c82b54a52781fe2
2016-07-18 17:24:07 -07:00
Wei-ting Lin
ccc9e7cfc6 Allow OVERLAY frames to use the show_exsiting_frame flag
An ARF with a zero-strength temporal filter can be reused by setting
show_existing_frame = 1, and in this case there is no need to
refresh the reference frame buffer. However, we used the flag
"refresh_golden_frame" as the identifier for the starting point of a gf
group.

A new flag "is_arf_filter_off" is used to record whether the filter with
strength zero is used.

Change-Id: I25971a760f6e1638d5147fe30488c48125512b1a
2016-07-18 17:15:33 -07:00
Yaowu Xu
681ba36414 Merge "Merge changes from libvpx/master by cherry-pick" into nextgenv2 2016-07-18 22:43:40 +00:00
Sarah Parker
e03af51203 Merge "Add buf0, width, height fields to buf_2d" into nextgenv2 2016-07-18 22:40:37 +00:00
Jingning Han
a555207b20 Merge "Align the quantizers for inter/inter modes in the first pass coding" into nextgenv2 2016-07-18 21:54:05 +00:00
hui su
9a4702417a Extra round of subpel MV search around second best full-pixel MV
Keep track of the best and second best full pixel motion vector
candidates, and do subpel search around both of them.

Compression improvement:
lowres 0.22%   midres 0.23%   hdres 0.18%

No noticeable encoding speed changes observed on lowres test clips.

Change-Id: I5f4df2a03d1db061cfdfdba6138b27e9ea91f089
2016-07-18 12:25:24 -07:00
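The bookkeeping behind "keep track of the best and second best full pixel motion vector candidates" can be sketched as below. This is only an illustration under assumed names (Mv, BestTwo, best_two_init, best_two_update are hypothetical); the subpel refinement that would then run around both candidates is not shown.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical types; not the structures used by the encoder. */
typedef struct {
  int row;
  int col;
} Mv;

typedef struct {
  Mv best_mv;
  unsigned int best_cost;
  Mv second_mv;
  unsigned int second_cost;
} BestTwo;

/* Start with both slots empty (cost == UINT_MAX means "no candidate"). */
void best_two_init(BestTwo *b) {
  assert(b != 0);
  b->best_mv.row = b->best_mv.col = 0;
  b->second_mv.row = b->second_mv.col = 0;
  b->best_cost = UINT_MAX;
  b->second_cost = UINT_MAX;
}

/* Offer one full-pixel candidate. A new best demotes the old best to
 * second place; otherwise the candidate may still replace the second. */
void best_two_update(BestTwo *b, int row, int col, unsigned int cost) {
  assert(b != 0);
  if (cost < b->best_cost) {
    b->second_mv = b->best_mv;
    b->second_cost = b->best_cost;
    b->best_mv.row = row;
    b->best_mv.col = col;
    b->best_cost = cost;
  } else if (cost < b->second_cost) {
    b->second_mv.row = row;
    b->second_mv.col = col;
    b->second_cost = cost;
  }
}
```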
Zoe Liu
e7869b7168 Correct the experiment names for ext-refs
Change-Id: I83a2b22d12e4573453e2ad866c7ceb430ff062c6
2016-07-18 11:28:31 -07:00
Sarah Parker
166c3250a3 Add buf0, width, height fields to buf_2d
These are needed for the warping function in the global motion
experiment.

Change-Id: Iaab176d0c0b90f6b938e2bac48b24c07e87e3cd9
2016-07-18 11:04:56 -07:00
Johann
2967bf355e Merge changes from libvpx/master by cherry-pick
This commit brings in all up-to-date changes from master that are
applicable to nextgenv2. Due to the removal of VP10 code in master,
we had to cherry-pick the following commits to get those changes:

Add default flags for arm64/armv8 builds

Allows building simple targets with sane default flags.

For example, using the Android arm64 toolchain from the NDK:
https://developer.android.com/ndk/guides/standalone_toolchain.html
./build/tools/make-standalone-toolchain.sh --arch=arm64 \
  --platform=android-24 --install-dir=/tmp/arm64
CROSS=/tmp/arm64/bin/aarch64-linux-android- \
  ~/libvpx/configure --target=arm64-linux-gcc --disable-multithread

BUG=webm:1143

vpx_lpf_horizontal_4_sse2: Remove dead load.

Change-Id: I51026c52baa1f0881fcd5b68e1fdf08a2dc0916e

Fail early when android target does not include --sdk-path

Change-Id: I07e7e63476a2e32e3aae123abdee8b7bbbdc6a8c

configure: clean up var style and set_all usage

Use quotes whenever possible and {} always for variables.

Replace multiple set_all calls with *able_feature().

Conflicts:
	build/make/configure.sh

vp9-svc: Remove some unneeded code/comment.

datarate_test,DatarateTestLarge: normalize bits type

quiets a msvc warning:
conversion from 'const int64_t' to 'size_t', possible loss of data

mips added p6600 cpu support

Removed -funroll-loops

psnr.c: use int64_t for sum of differences

Since the values can be negative.

*.asm: normalize label format

add a trailing ':', though it's optional with the tools we support, it's
more common to use it to mark a label. this also quiets the
orphan-labels warning with nasm/yasm.

BUG=b/29583530

Prevent negative variance

Due to rounding, hbd variance may become negative. This commit adds a
check and clamps negative values to 0.
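
A minimal sketch of such a clamp, assuming the usual variance = sse - sum^2 / n form. The function name clamped_variance is hypothetical, and the rounding steps that make the inputs inconsistent are not shown in the message and are not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper. If sse and sum were rounded independently before
 * this point, sse can end up smaller than sum^2 / n, so the difference is
 * clamped at 0 instead of being returned as a negative (or wrapped) value. */
uint32_t clamped_variance(uint64_t sse, int64_t sum, int n) {
  int64_t var;
  assert(n > 0);
  var = (int64_t)sse - (sum * sum) / n;
  return var >= 0 ? (uint32_t)var : 0;
}
```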

configure: remove old visual studio support (<2010)

BUG=b/29583530

Conflicts:
	configure

configure: restore vs_version variable

inadvertently lost in the final patchset of:
078dff7 configure: remove old visual studio support (<2010)

this prevents an empty CONFIG_VS_VERSION and avoids make failure

Require x86inc.asm

Force enable x86inc.asm when building for x86. Previously there were
compatibility issues so a flag was added to simplify disabling this
code.

The known issues have been resolved and x86inc.asm is the preferred
abstraction layer (over x86_abi_support.asm).

BUG=b:29583530

convolve_test: fix byte offsets in hbd build

CONVERT_TO_BYTEPTR(x) was corrected in:
003a9d2 Port metric computation changes from nextgenv2
to use the more common (x) within the expansion. offsets should occur
after converting the pointer to the desired type.

+ factorized some common expressions

Conflicts:
	test/convolve_test.cc

vpx_dsp: remove x86inc.asm distinction

BUG=b:29583530

Conflicts:
	vpx_dsp/vpx_dsp.mk
	vpx_dsp/vpx_dsp_rtcd_defs.pl
	vpx_dsp/x86/highbd_variance_sse2.c
	vpx_dsp/x86/variance_sse2.c

test: remove x86inc.asm distinction

BUG=b:29583530

Conflicts:
	test/vp9_subtract_test.cc

configure: remove x86inc.asm distinction

BUG=b:29583530

Change-Id: I59a1192142e89a6a36b906f65a491a734e603617

Update vpx subpixel 1d filter ssse3 asm

Speed tests show the new vertical filters have a degradation on Celeron
Chromebooks. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control
which vertical filter code is activated. For now, simply activate the code
without degradation on Celeron. Later there should be 2 sets of vertical
filter ssse3 functions, with a jump table choosing between them based on
CPU type.

Improve vpx_filter_block1d* by replacing paddsw+psrlw with pmulhrsw

Make set_reference control API work in VP9

Moved the API patch from NextGenv2. An example was included.
To try it, for example, run the following command:
$ examples/vpx_cx_set_ref vp9 352 288 in.yuv out.ivf 4 30

Conflicts:
	examples.mk
	examples/vpx_cx_set_ref.c
	test/cx_set_ref.sh
	vp9/decoder/vp9_decoder.c

deblock filter : moved from vp8 code branch

The deblocking filters used in vp8 have been moved to vpx_dsp for
use by both vp8 and vp9.

vpx_thread.[hc]: update webp source reference

+ drop the blob hash; the reference will be updated in the
commit message

BUG=b/29583578

vpx_thread: use native windows cond var if available

BUG=b/29583578

original webp change:

commit 110ad5835ecd66995d0e7f66dca1b90dea595f5a
Author: James Zern <jzern@google.com>
Date:   Mon Nov 23 19:49:58 2015 -0800

    thread: use native windows cond var if available

    Vista / Server 2008 and up. no speed difference observed.

100644 blob 4fc372b7bc6980a9ed3618c8cce5b67ed7b0f412 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h

vpx_thread: use InitializeCriticalSectionEx if available

BUG=b/29583578

original webp change:

commit 63fadc9ffacc77d4617526a50c696d21d558a70b
Author: James Zern <jzern@google.com>
Date:   Mon Nov 23 20:38:46 2015 -0800

    thread: use InitializeCriticalSectionEx if available

    Windows Vista / Server 2008 and up

100644 blob f84207d89b3a6bb98bfe8f3fa55cad72dfd061ff src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h

vpx_thread: use WaitForSingleObjectEx if available

BUG=b/29583578

original webp change:

commit 0fd0e12bfe83f16ce4f1c038b251ccbc13c62ac2
Author: James Zern <jzern@google.com>
Date:   Mon Nov 23 20:40:26 2015 -0800

    thread: use WaitForSingleObjectEx if available

    Windows XP and up

100644 blob d58f74e5523dbc985fc531cf5f0833f1e9157cf0 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h

vpx_thread: use CreateThread for windows phone

BUG=b/29583578

original webp change:

commit d2afe974f9d751de144ef09d31255aea13b442c0
Author: James Zern <jzern@google.com>
Date:   Mon Nov 23 20:41:26 2015 -0800

    thread: use CreateThread for windows phone

    _beginthreadex is unavailable for winrt/uwp

    Change-Id: Ie7412a568278ac67f0047f1764e2521193d74d4d

100644 blob 93f7622797f05f6acc1126e8296c481d276e4047 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h

vp9_postproc.c missing extern.

BUG=webm:1256

deblock: missing const on extern const.

postproc - move filling of noise buffer to vpx_dsp.

Fix encoder crashes for odd size input

clean-up vp9_intrapred_test

remove tuple and overkill VP9IntraPredBase class.

postproc: noise style fixes.

gtest-all.cc: quiet an unused variable warning

under windows / mingw builds

vp9_intrapred_test: follow-up cleanup

address a few comments from ce050afaf3e288895c3bee4160336e2d2133b6ea

Change-Id: I3eece7efa9335f4210303993ef6c1857ad5c29c8
2016-07-18 10:31:10 -07:00
Jingning Han
2ad40b89b3 Align the quantizers for intra/inter modes in the first pass coding
Use regular extended zero bin quantizer for both inter and intra
modes in the first pass. This doesn't affect lowres and midres
significantly, but would bring back 0.9% coding gains for hdres.

Change-Id: Ifa5977fa7b141fc5be595c0f3a4fc81a93f6606f
2016-07-18 10:16:03 -07:00
skal
87c2db8296 fix vp10_convolve() signatures
fortunately, the call site was calling the function with
the correct parameter order.

Change-Id: Ia48099c18288a2416c8b9a7062d2b8d417fd07df
2016-07-18 15:35:18 +00:00
Yaowu Xu
06c297bd1c Merge "Merge branch 'master' into nextgenv2" into nextgenv2 2016-07-15 04:45:53 +00:00
Yaowu Xu
6fe07a207b Merge branch 'master' into nextgenv2
Change-Id: Ia3c0f2103fd997613d9f16156795028f89f63265
2016-07-14 16:05:48 -07:00
Sarah Parker
010d4a8a93 Merge "Add new_quant quantization in rdopt for 4x4 blocks and intra" into nextgenv2 2016-07-14 22:15:33 +00:00
Debargha Mukherjee
5f8ea94c1f Remove unused zcoeff_blk
from PICK_MODE_CONTEXT and MACROBLOCK

Change-Id: I42f98ce51871948244bdcaaaeb3d0191622116ae
2016-07-14 12:36:03 -07:00
Pascal Massimino
b90dbc3da9 Merge "Fix highbd obmc_variance unit test" into nextgenv2 2016-07-14 18:59:38 +00:00
Sarah Parker
a6aed6e4b3 Add new_quant quantization in rdopt for 4x4 blocks and intra
Originally the uniform quantization function was not being
replaced with the new_quant version in rdopt when new_quant
is turned on. This fixes the bug.

Change-Id: I593793bb909e1e1a6f89544eeca6783fe0576f25
2016-07-14 11:25:13 -07:00
Jingning Han
a387b19619 Fix highbd obmc_variance unit test
Fix the compiling errors in highbd obmc_variance unit test.

Change-Id: Id1bdfd50aeaff996e54067d5e9b369a5fd2d87a8
2016-07-14 10:12:03 -07:00
Hui Su
0c68db43ea Merge "Refactor codes about motion search" into nextgenv2 2016-07-14 00:13:47 +00:00
Jingning Han
75b3224a42 Merge "Fix highbd inter prediction filter sse4 overwriting issue" into nextgenv2 2016-07-13 21:35:29 +00:00
Jingning Han
edbbce8e61 Fix highbd inter prediction filter sse4 overwriting issue
Properly handle the case where the height is an integer multiple
of 4.

Change-Id: I11ac188c13f78db20902e2e333c60ce76ce837c5
2016-07-13 12:51:02 -07:00
Yue Chen
f2b34c3ad8 Merge "Optimize and cleanup obmc predictor and rd search." into nextgenv2 2016-07-13 18:40:49 +00:00
hui su
581636d767 Refactor codes about motion search
1. Add "best_mv" in MACROBLOCK to store the best motion vector
during motion search, so that we don't need to pass its pointer
to various motion search functions.

2. Declare some functions as static when possible.

3. Fix some indents.

Change-Id: I0778146c0866cbc55e245988c59222577ea8260e
2016-07-13 10:12:37 -07:00
Geza Lore
4c4f04ac11 Optimize and cleanup obmc predictor and rd search.
Use vpx_blend_a64_hmask and vpx_blend_a64_vmask to speed up
computing the obmc predictor. Clean up calc_target_weighted_pred.

Encoder speedup: 1.3%
Decoder speedup: 6.5%

Change-Id: I0c774fe53d22399e92a10d1daf3af0010d88d2c5
2016-07-13 16:54:20 +00:00
Geza Lore
ebc2d34cd9 Add SSE4.1 vpx_obmc_variance* implementations and cosmetics
Speedup for these functions: 4x
Also include some cosmetic changes to SAD functions

Change-Id: I344c32c795492507ae08742f52d035a13f583799
2016-07-12 21:04:46 -07:00
Pascal Massimino
6de0e97d97 Merge "Clean up FunctionEquivalenceTest." into nextgenv2 2016-07-13 03:09:52 +00:00
Geza Lore
a3f7ddc347 Clean up FunctionEquivalenceTest.
remove use of tuple in favor of struct.

Change-Id: If3b1aa5c2fc3cfe1446fff7a8fd270f2ca85fedf
2016-07-12 17:01:19 -07:00
Aamir Anis
15aaa601bd Merge "Fix for loop filter selection procedure" into nextgenv2 2016-07-12 23:56:37 +00:00
Aamir Anis
8575709f97 Fix for loop filter selection procedure
Fixed best error reported by loop filter selection, this value is used
during loop restoration to pick best mode. Baseline remains unchanged,
change in BDRate for loop restoration experiment:
-0.628 -> -0.625 for lowres,
-1.262 -> -1.283 for highres.

Change-Id: I69ef1608bc232b250ac46f59e31fdbed1a999dcd
2016-07-12 15:01:07 -07:00
Yi Luo
fde48c980a Merge "HBD convolution filtering (10/12 taps) SSE4.1 optimization" into nextgenv2 2016-07-12 19:28:48 +00:00
Yi Luo
8cacca73bf HBD convolution filtering (10/12 taps) SSE4.1 optimization
- For experiment EXT_INTERP under high bit depth.
- Add unit test to verify bit-exact.
- Speed performance improvement:
  On Xeon E5-2680, park_joy_1080p_12.y4m, 50 frames, encoding time
  drops from 6682503 ms to 5390270 ms.

Change-Id: Iea4debf5414f3accf1eb5672abeab56a0539ac77
2016-07-12 10:13:30 -07:00
Geza Lore
c804e0df05 Cleanup obmc_sad function prototypes.
Name 'wsrc', 'mask' and 'pre' explicitly, rather than
using 'b', 'm' and 'a'.

Change-Id: Iaee6d1ac1211b0b05b47cf98b50570089b12d600
2016-07-12 13:23:33 +01:00
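A minimal sketch of how the three renamed arguments could interact in an OBMC SAD, assuming `wsrc` is a pre-weighted source, `mask` holds per-pixel weights for the prediction `pre`, both are contiguous arrays, and the weights carry 12 fractional bits. The function name `obmc_sad_sketch` and the 12-bit scaling are assumptions for illustration, not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical OBMC SAD: sum of rounded |wsrc - pre * mask| over a block.
 * Assumes wsrc and mask are contiguous (stride == width) and scaled by 2^12. */
static unsigned int obmc_sad_sketch(const uint8_t *pre, int pre_stride,
                                    const int32_t *wsrc, const int32_t *mask,
                                    int width, int height) {
  assert(width > 0 && height > 0);
  unsigned int sad = 0;
  for (int i = 0; i < height; ++i) {
    for (int j = 0; j < width; ++j) {
      const int32_t diff = abs(wsrc[j] - pre[j] * mask[j]);
      /* Round to nearest while removing the assumed 12 fractional bits. */
      sad += (unsigned int)((diff + (1 << 11)) >> 12);
    }
    pre += pre_stride;
    wsrc += width;
    mask += width;
  }
  return sad;
}
```

With explicit names, the roles of the weighted source, the weights, and the prediction are visible at the call site, which is the readability gain the commit describes.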
James Zern
b8a28fbb3a Merge changes from topic 'missing-proto' into nextgenv2
* changes:
  vp10/encoder/rdopt.c: make a function static
  vp10/encoder/rd.c: make a function static
  vp10_convolve_ssse3.c: make some functions static
  vp10/encoder/bitstream.[hc]: correct a prototype
  vp10/common/idct.h: add some missing prototypes
  highbd_quantize_intrin_sse2.c: add missing rtcd include
  vp10: add some missing includes
2016-07-12 02:39:24 +00:00
Yue Chen
4ff6d13771 Merge "Cosmetics for vp10/common/vp10_rtcd_defs.pl" into nextgenv2 2016-07-12 01:21:33 +00:00
James Zern
849e990779 vp10/encoder/rdopt.c: make a function static
+ remove vp10_ prefix

quiets a -Wmissing-prototypes warning

BUG=b/29584271

Change-Id: I8821c38009b90296280f9b14233e73c92076e81f
2016-07-11 16:52:11 -07:00
James Zern
0baa08336a vp10/encoder/rd.c: make a function static
+ remove vp10_ prefix

quiets a -Wmissing-prototypes warning

BUG=b/29584271

Change-Id: I6b5d71f8120a6d1fee4c782beb4c6d6eef980f65
2016-07-11 16:52:10 -07:00
James Zern
08bd57ef0d vp10_convolve_ssse3.c: make some functions static
quiets -Wmissing-prototypes warnings

BUG=b/29584271

Change-Id: I4d2eb7f4b45d7b829421976641b3212bcf29e7dd
2016-07-11 16:52:10 -07:00
James Zern
3c127f2e36 vp10/encoder/bitstream.[hc]: correct a prototype
quiets a -Wmissing-prototypes warning

BUG=b/29584271

Change-Id: I91aba2a75dccd6752bdf91837564c2aa45817c09
2016-07-11 16:52:09 -07:00
James Zern
9bf5a1ab46 vp10/common/idct.h: add some missing prototypes
quiets the warning of the same name

BUG=b/29584271

Change-Id: I220cd58e1060f77e3910472fed1b167add3a08f8
2016-07-11 16:52:08 -07:00
James Zern
e046f5efef highbd_quantize_intrin_sse2.c: add missing rtcd include
quiets -Wmissing-prototypes warnings

BUG=b/29584271

Change-Id: Iff5214df0d1781810afbfc20bfaf664f109e2f29
2016-07-11 16:52:08 -07:00
James Zern
bc4341fd94 vp10: add some missing includes
quiets some -Wmissing-prototypes warnings

BUG=b/29584271

Change-Id: I9174728459fcabb6d9ac0028ae58029e52c0da92
2016-07-11 16:52:07 -07:00
Yue Chen
68e19472c1 Cosmetics for vp10/common/vp10_rtcd_defs.pl
Change-Id: Iaf8c6f0b1e340f0406df2871a3dc2ded19b7009a
2016-07-11 23:41:30 +00:00
Debargha Mukherjee
5041ff4921 Merge "Add a few branch hints to vp10_optimize_b." into nextgenv2 2016-07-11 22:30:33 +00:00
Debargha Mukherjee
6770c7361e Merge "Optimize and cleanup supertx predictor." into nextgenv2 2016-07-11 22:30:16 +00:00
Debargha Mukherjee
6bbadfb303 Merge "Improve vpx_blend_* functions." into nextgenv2 2016-07-11 19:30:04 +00:00
Geza Lore
cd489264e1 Optimize and cleanup supertx predictor.
Use vpx_blend_a64_hmask and vpx_blend_a64_vmask to speed up
computing the supertx predictor.

Decoder speedup of up to 4% has been observed.

Change-Id: I255a5ba4cc24f78dc905d25b6e2f7fbafac13253
2016-07-11 18:14:21 +00:00
Geza Lore
bfa59b4a5f Improve vpx_blend_* functions.
- Made source buffers pointers to const.
- Renamed vpx_blend_mask6b to vpx_blend_a64_mask. This is more
  indicative that the function does alpha blending. The 6, or 6b
  suffix was misleading, as the max mask value (64) does not fit into
  6 bits.
- Added VPX_BLEND_* macros to use when needing to blend scalars.
- Use VPX_BLEND_A256 in combine_interintra to be more explicit about
  the operation being done.
- Added versions of vpx_blend_a64_* which take 1D horizontal/vertical
  masks directly and apply them to all rows/columns
  (vpx_blend_a64_hmask and vpx_blend_a64_vmask). The SSE4.1 optimized
  horizontal version now falls back on the 2D version. This can be
  improved upon if it shows up high enough in a profile.
- All vpx_blend_a64_* functions now support block sizes down to 1x1
  (ie: a single pixel). This is for usage convenience. The SSE4.1
  optimized versions fall back on the C implementation if
  w <= 2 or h <= 2. This can again be improved if it becomes hot code.

Change-Id: I13ab3835146ffafe3e1d74d8e9cf64a5abe4144d
2016-07-11 19:05:17 +01:00
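A minimal sketch of the alpha blend this commit describes, assuming mask values run from 0 to 64 inclusive and the result is rounded to nearest. The names `blend_a64` and `blend_a64_hmask_sketch` are illustrative only and are not the library's actual symbols or signatures.

```c
#include <assert.h>
#include <stdint.h>

/* The maximum alpha is 64, which needs 7 bits; hence a "6b" suffix misleads. */
#define BLEND_A64_MAX_ALPHA 64
#define BLEND_A64_ROUND_BITS 6

/* Blend two values: alpha weights v0, (64 - alpha) weights v1. */
static uint8_t blend_a64(int alpha, int v0, int v1) {
  assert(alpha >= 0 && alpha <= BLEND_A64_MAX_ALPHA);
  const int sum = alpha * v0 + (BLEND_A64_MAX_ALPHA - alpha) * v1;
  return (uint8_t)((sum + (1 << (BLEND_A64_ROUND_BITS - 1))) >>
                   BLEND_A64_ROUND_BITS);
}

/* Apply a 1D horizontal mask (one alpha per column) to every row. */
static void blend_a64_hmask_sketch(uint8_t *dst, int dst_stride,
                                   const uint8_t *src0, int src0_stride,
                                   const uint8_t *src1, int src1_stride,
                                   const uint8_t *mask, int h, int w) {
  for (int i = 0; i < h; ++i) {
    for (int j = 0; j < w; ++j) {
      dst[i * dst_stride + j] = blend_a64(mask[j], src0[i * src0_stride + j],
                                          src1[i * src1_stride + j]);
    }
  }
}
```

A vertical-mask variant would index the mask by row (`mask[i]`) instead of by column. Because the loops run for any `h` and `w`, a plain C version like this handles blocks down to 1x1.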
Pascal Massimino
e5fb2d4e93 remove ROUNDZ_* macros in favor of just ROUND_* ones
Change-Id: I263088be8d71018deb9cc6a9d2c66307770b824d
2016-07-11 06:27:41 -07:00
Geza Lore
1178f71d99 Merge "Fix unused warning without ext-interp" into nextgenv2 2016-07-11 11:29:17 +00:00
Debargha Mukherjee
5d28183fcf Merge "Refactor and clean up on blend_mask6" into nextgenv2 2016-07-09 06:50:32 +00:00
Yue Chen
5b25323c25 Merge "Fix assertion failures in mips+msa setting" into nextgenv2 2016-07-09 01:07:27 +00:00
Yue Chen
4ab19eac62 Fix assertion failures in mips+msa setting
Call the C functions directly. When EXT_TX is enabled, hybrid transforms
other than combinations of DCT/ADST have not been implemented, which
causes assertion failures in the switch statements in vp10_fhtnxn_msa() and
vp10_ihtnxn_nxn_add_msa().

BUG=webm:1239

Change-Id: I2379a07e5406f9489edcd2f3205682f679c9b091
2016-07-08 17:13:52 -07:00
Jingning Han
9c4b041a80 Merge "Properly reset rate and distortion value for zero pred residual case" into nextgenv2 2016-07-08 22:21:27 +00:00
Debargha Mukherjee
72ef6d7704 Refactor and clean up on blend_mask6
Change-Id: Ie9188471e7dc07ab9c95b22f258b1662e895c533
2016-07-08 15:02:57 -07:00
Jingning Han
985dd03ff7 Merge "Integrate ext-interp into dual filter framework" into nextgenv2 2016-07-08 18:25:14 +00:00
Geza Lore
0b9b3d8643 Add a few branch hints to vp10_optimize_b.
vp10_optimize_b now takes between 40% to 60% of the TOTAL runtime
of the encoder, depending on bit-rate. It also contains 2/3 to 3/4
of the mispredicted branch instructions in the whole program.

Adding a few branch hints makes vp10_optimize_b around 2-5% faster
(depending on bit-rate) when compiled with gcc/clang.

Change-Id: I1572733e18b4166bc10591b958c5018a9561fa2b
2016-07-08 19:20:35 +01:00
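A minimal sketch of what a branch hint looks like with gcc/clang, assuming the `__builtin_expect` builtin is what "branch hints" refers to here. The macro names `LIKELY`/`UNLIKELY` and the function `count_nonzero` are hypothetical and do not come from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hint macros: expand to __builtin_expect on gcc/clang, no-ops elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x) __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x) (x)
#define UNLIKELY(x) (x)
#endif

/* Count nonzero coefficients; zero is assumed to be the common case, so the
 * nonzero branch is marked unlikely. */
static size_t count_nonzero(const int *coeffs, size_t n) {
  assert(coeffs != NULL || n == 0);
  size_t count = 0;
  for (size_t i = 0; i < n; ++i) {
    if (UNLIKELY(coeffs[i] != 0)) ++count;
  }
  return count;
}
```

The hint changes only how the compiler lays out the code, never the result, so it pays off only where profiling shows a heavily skewed branch.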
Sarah Parker
6c56def33e Merge "Make new_quant bin widths to be uniform" into nextgenv2 2016-07-08 17:40:55 +00:00
Jingning Han
e3a2aeb05d Integrate ext-interp into dual filter framework
The combination of the two experiments improves the compression
performance gains:

lowres 2.5%
midres 2.1%

Change-Id: Id26c0a9474ce08893aa1d946365c7ff850fab57a
2016-07-08 16:38:59 +00:00
Jingning Han
1bf039ccd5 Properly reset rate and distortion value for zero pred residual case
When the prediction residuals are all zero, reset the coeff rate
cost and the distortion value to be zero. This change doesn't affect
lowres set significantly, but improves several clips in the midres
set, like sintel_480p and mobisode2_480p, by a few percent. The
average performance for midres set is improved by 0.2%.

Change-Id: Idd5ebf2652e556a1b1c569fe3c48dacef3f11c32
2016-07-08 09:09:18 -07:00
Geza Lore
bb5059ff9b Fix unused warning without ext-interp
Change-Id: Ibb63c492eb8278d115262b8fc3cbc761c406b107
2016-07-08 15:48:02 +01:00
Jingning Han
7c393d097f Merge "Fix ioc in trellis optimization with hbd" into nextgenv2 2016-07-08 01:11:17 +00:00
Sarah Parker
88faa2b348 Make new_quant bin widths to be uniform
Change-Id: Iceeca8ecbc43919b43189352a307479d666d1dad
2016-07-07 16:22:32 -07:00
Debargha Mukherjee
c6f9b7f4ee Merge "RD costing fix in loop-restoration expt" into nextgenv2 2016-07-07 22:47:58 +00:00
Debargha Mukherjee
51957b4162 Merge "Remove redundant code in new_quant" into nextgenv2 2016-07-07 21:55:38 +00:00
Debargha Mukherjee
fc3ce72674 Merge "Clean up build_wedge_inter_predictor_from_buf" into nextgenv2 2016-07-07 20:05:12 +00:00
Debargha Mukherjee
aab64cdddc RD costing fix in loop-restoration expt
Change-Id: I8dbc1002f5d6bf8f2409db8c6be4346f1df0590c
2016-07-07 12:54:54 -07:00
Jingning Han
07d35de056 Fix ioc in trellis optimization with hbd
Use int64_t type for distortion. This avoids integer overflow
issues in the trellis optimization function in high bit-depth
settings.

Change-Id: I550c3ca9f11a3191ef8638a152887018cd476141
2016-07-07 12:00:38 -07:00
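A minimal sketch of why a 64-bit accumulator matters for distortion at high bit depth, assuming distortion is a sum of squared coefficient differences. The function name `block_error_sketch` is hypothetical, and this is not the trellis code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of squared coefficient differences, accumulated in 64 bits so that
 * large high-bit-depth coefficients cannot overflow the total. */
static int64_t block_error_sketch(const int32_t *coeff,
                                  const int32_t *dqcoeff, int n) {
  assert(n >= 0);
  int64_t error = 0;
  for (int i = 0; i < n; ++i) {
    const int64_t diff = (int64_t)coeff[i] - dqcoeff[i];
    error += diff * diff;
  }
  return error;
}
```

For example, 1024 coefficients that each differ by 32768 sum to 2^40, far beyond what a 32-bit `int` can represent.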
Debargha Mukherjee
a85e84599b Remove redundant code in new_quant
Change-Id: Ie2534c7c0cc3fc59e7389b55cb066f2b347d846e
2016-07-07 11:55:20 -07:00
Geza Lore
e6f8c17ac5 Remove various testing utilities.
test/assertion_helpers.h
test/randomise.{cc,h}
test/snapshot.h

Modify blend_mask6_test.cc not to rely on these.

Change-Id: I88b8933fe0a729a606797e5cd421795a544c612d
2016-07-07 16:22:07 +01:00
Geza Lore
fc28be3b23 Clean up build_wedge_inter_predictor_from_buf
Change-Id: I715f8ffa3e81056a74ca8ac94793009afb781221
2016-07-07 13:12:57 +01:00
Debargha Mukherjee
fabc0ed7ad Merge "Reinstate tests for wedge partition selection optimizations." into nextgenv2 2016-07-07 05:55:07 +00:00
Debargha Mukherjee
9303d428a2 Merge "Add tests for vpx_sum_squares_i16." into nextgenv2 2016-07-07 05:54:45 +00:00
Yue Chen
c7a92f2cad Merge "Add SSE4.1 vpx_obmc_sad* implementations." into nextgenv2 2016-07-07 01:12:20 +00:00
James Zern
abf05c3e60 Merge changes I9433d858,Iafd05637,If08ce6ca into nextgenv2
* changes:
  tests: remove redundant round() definition
  remove visual studio < 2010 workarounds
  configure: remove old visual studio support (<2010)
2016-07-06 23:25:35 +00:00
Yue Chen
87aec58f11 Merge "Refactoring in preparation for OBMC optimizations." into nextgenv2 2016-07-06 22:44:27 +00:00
Geza Lore
aacdf98c9a Add SSE4.1 vpx_obmc_sad* implementations.
Speedup for these functions: 4x

Change-Id: I21baa04f53c6ab308ea3edf3ebacc62970e97454
2016-07-06 19:46:13 +00:00
hui su
4a18771330 mcomp.c: rename variables and remove unnecessary codes
Change-Id: I4ad4061875fa1c8f3801efbcdcb0da47e7c032a5
2016-07-06 10:25:46 -07:00
Geza Lore
471362f61f Add tests for vpx_sum_squares_i16.
Change-Id: I529c34d5bfa85719cb6499a9a3c9d907eccccd56
2016-07-06 15:14:59 +01:00
Geza Lore
2791d9db1e Reinstate tests for wedge partition selection optimizations.
This reinstates the tests from commit
efda2831e5f758b4f350679b5c55c0b9282449b0 with the appropriate
fixes for 32 bit x86 builds.

Change-Id: Ib331906c5b448ca964895ee9cbfd4266f67d1089
2016-07-06 15:09:46 +01:00
Geza Lore
007aa7dd65 Refactoring in preparation for OBMC optimizations.
- Use int32_t instead of int in vpx_obmc{variance,sad} functions
- Remove weighted_src and obmc mask strides and assume contiguous
  buffers. These inputs can always be packed as contiguous arrays.

Change-Id: I74c09b3fb3337f13d39e13a9cb61e140536f345d
2016-07-04 16:57:17 +01:00
Wei-ting Lin
f9e38a7bb9 Remove reference frame buffer update for show_existing_frame
Originally we needed to send the refresh flag and the virtual indices
mapping for the reference frame buffer update for show_existing_frame to
have the BWDREF_FRAME replace the LAST_FRAME.

To avoid sending this information, we update the virtual indices
of the reference frame buffer after the last_bipred_frame is encoded,
so the decoder receives the updated reference mapping
at the next non-show-existing frame.

As a result, we can save 4 bytes per show-existing frame, and get 0.12,
0.2, and 0.07 BDRATE improvement in lowres, derf, and midres test set
respectively.

Change-Id: I63d41ee6ea99884798f0778b789d2701e2f2d3e0
2016-07-01 09:26:54 -07:00
Geza Lore
b04ea832a4 Minor cleanup of inter mode search.
Change-Id: I523a3b30eb80fc6c6ed83638fdb82cf65c22b2e5
2016-07-01 09:00:05 +01:00
Debargha Mukherjee
e5e37e310b Merge "Reject ext-inter compound modes based on modelled RD." into nextgenv2 2016-06-30 18:18:53 +00:00
Jingning Han
3d8cde6618 Merge "Remove unused BITDEPTH_10 definition" into nextgenv2 2016-06-30 16:26:25 +00:00
Jingning Han
fce34bc8ac Merge "Fix shift value in dist_block with hbd" into nextgenv2 2016-06-30 16:26:18 +00:00
Debargha Mukherjee
2756b2f004 Merge "Various cosmetics on the new_quant experiment" into nextgenv2 2016-06-30 16:03:49 +00:00
Geza Lore
532304e468 Reject ext-inter compound modes based on modelled RD.
Reject ext-inter compound modes before doing full rate distortion
evaluation, if the corresponding single reference modes had a lower
modelled RD.

ext-inter speedup up to TBD.

Coding performance: TBD

Change-Id: I358bfb879c5ebe5e7afbf6f540cc784f8de14857
2016-06-30 09:56:17 +01:00
James Zern
6bbb8b79eb tests: remove redundant round() definition
use vpx_ports/msvc.h for compatibility

BUG=b/29583530

Change-Id: I9433d8586cd0b790e7f4d697304298feafe801f1
(cherry picked from commit 0a64929f19cc1ce89f993aa5c9d61a29679eb961)
2016-06-29 17:11:11 -07:00
James Zern
26f6b216a1 remove visual studio < 2010 workarounds
BUG=b/29583530

Change-Id: Iafd05637eb65f4da54a9c857e79204a77646858a
(cherry picked from commit c125f4a594815ad63b50e4b684ada4b44c545932)
2016-06-29 17:11:10 -07:00
James Zern
f8876a22c1 configure: remove old visual studio support (<2010)
BUG=b/29583530

Change-Id: If08ce6ca352f377ac4db6b9b1909b507bba6d872
(cherry picked from commit 078dff72ca7bff079cb3c56d98c588c6ea6d2814)
(cherry picked from commit 046226376533b610ddc700f14409f195aa6abd51)
2016-06-29 17:10:36 -07:00
Jingning Han
e6c8e35dec Remove unused BITDEPTH_10 definition
Change-Id: Ic11f32db352e1ff7b3ed140654ee1a6016ba516f
2016-06-29 16:43:54 -07:00
Jingning Han
49222c3718 Fix shift value in dist_block with hbd
This offset value related to the bit depth has been taken care of
inside the function vp10_highbd_block_error.

Change-Id: I58dd8a53380ba4529d59837e56a951bc81a2962e
2016-06-29 16:42:23 -07:00
Debargha Mukherjee
0eefe6edb4 Remove use_quant_fp speed feature
Change-Id: I22f1299545d4c75d80e72d479be66f66ea142ef1
2016-06-29 13:58:53 -07:00
Debargha Mukherjee
a35597fc7f Various cosmetics on the new_quant experiment
Also extends quant profiles to include quality range.

Change-Id: Ia96e45b6425e1d42ca61fc401f63d4fd7214e448
2016-06-29 13:18:52 -07:00
Debargha Mukherjee
d5ddb56721 Merge "Remove skip_txfm optimization." into nextgenv2 2016-06-29 17:52:39 +00:00
Sarah Parker
ac626ee7ac Merge "Fix compiler warnings in yv12extend.c" into nextgenv2 2016-06-29 03:29:13 +00:00
Sarah Parker
cbb7c65794 Merge "Fix compiler warnings in vp10_convolve_optimz_test.cc" into nextgenv2 2016-06-29 02:03:10 +00:00
Sarah Parker
29f36f7e37 Fix compiler warnings in yv12extend.c
Change-Id: I1f6f5b8861c4081b1f4b85c531c5d7ef0cb67bab
2016-06-28 17:42:44 -07:00
Sarah Parker
9576374952 Fix compiler warnings in vp10_convolve_optimz_test.cc
Change-Id: I11b717e1652dff440a54f6977527d544b0c5ed29
2016-06-28 17:13:03 -07:00
Alex Converse
0dc56b6a15 ethread_test: Remove vp10 as test parameter.
Change-Id: I043418cde5a2562520ff37cdf81436abc2c9821a
2016-06-28 14:32:15 -07:00
Geza Lore
92922be83c Remove skip_txfm optimization.
Commit 0d6980d7a1caa592058f8d5d618b012c160772f7 removed some use
of the skip_txfm optimization, and the rest are not productive.

This optimization is currently only used with --good
and --cpu-used >= 3, but its overhead is higher than the
speedup it yields.

Removing this, and subsequently simplifying model_rd_for_sb yields
a net encoder speedup:
--cpu-used=0    ~1.5% faster
--cpu-used=3    ~2.0% faster

The code simplification is also significant.

Change-Id: I1dd668c32de15a2e912c59c42379d0f9e1032ff8
2016-06-28 10:03:03 +01:00
Sarah Parker
7458f11766 Merge "Quantization fix for new-quant/var-tx" into nextgenv2 2016-06-28 02:21:35 +00:00
Hui Su
487bdac2f9 Merge "Rename the initial MV search candidate" into nextgenv2 2016-06-28 00:37:48 +00:00
Hui Su
3426f11459 Merge "Refactor vp10_pattern_search" into nextgenv2 2016-06-28 00:24:02 +00:00
Debargha Mukherjee
f3dfa0c36a Quantization fix for new-quant/var-tx
Also use the fp quantizer consistently

lowres: -0.07 BDRATE improvement

Change-Id: I9174f6ad54a74d38541004b99cb3689d0c09be55
2016-06-27 17:22:09 -07:00
Hui Su
ab9940293e Merge "Fix a bug in vp10_pattern_search()" into nextgenv2 2016-06-27 23:09:10 +00:00
Jingning Han
3667d62e06 Merge "Disable trellis optimized quantization in the first-pass" into nextgenv2 2016-06-27 21:34:42 +00:00
Yi Luo
dd2064a0ac Merge "Fix bugs in convolution filter optimization" into nextgenv2 2016-06-27 21:33:45 +00:00
Jingning Han
813201e174 Disable trellis optimized quantization in the first-pass
This resolves the use of uninitialized value in the first-pass
encoding.

Change-Id: I78bc19214a1bfde5c5641424550cbbe4e52cae99
2016-06-27 12:46:07 -07:00
Sarah Parker
cc16c5d805 Merge "Add multiple quantization profiles to new_quant experiment" into nextgenv2 2016-06-27 18:46:25 +00:00
Yi Luo
8404253f81 Fix bugs in convolution filter optimization
- Fix the over-writing bug in horizontal filtering when width = 2.
- Fix 10-tap vertical filtering which no longer reads one row of
  pixel above the block.
- Fix 10-tap filter zero padding.
- Encoder speed slows down by ~4.0% compared to:
  81ad953 Convolution vertical filter SSSE3 optimization

Change-Id: I9bb294a4529300081c29bf284e6bc6eb081cc536
2016-06-27 10:23:38 -07:00
hui su
09a77e1052 Rename the initial MV search candidate
Its old name "ref_mv" is confusing.

Change-Id: I7ac8f346c468bcf3c0e7692582d423fb7a1f113a
2016-06-24 21:47:00 -07:00
hui su
19f3eaa657 Refactor vp10_pattern_search
Combine it with vp10_pattern_search_sad

Change-Id: I47a3b34dfefad9fc8abd23fcc197f6aea3419873
2016-06-24 21:44:59 -07:00
hui su
15f0bf47e7 Fix a bug in vp10_pattern_search()
Should use sub-pel MV instead of full-pixel MV as input parameter
to calc_int_cost_list().

Change-Id: I054d94220a090ca54c8d24df265193ee345cd439
2016-06-24 21:17:00 -07:00
Debargha Mukherjee
9f2167aede Merge "Turn on ActiveMapRefreshTest for Vp10" into nextgenv2 2016-06-25 00:32:21 +00:00
Sarah Parker
fbe6fb2773 Add multiple quantization profiles to new_quant experiment
Add the ability to pick between 3 quantization profiles.
The profile is chosen based on the entropy context at the
block level.

Change-Id: Iaea0485798441b7d635962c2563f3a477f582dac
2016-06-24 16:16:13 -07:00
Debargha Mukherjee
cf0cdfc55e Turn on ActiveMapRefreshTest for Vp10
Also reduce number of frames coded for VP10.

Change-Id: I7de908861620b6f4f08513516110fd584660d994
2016-06-24 12:55:03 -07:00
Yi Luo
2003cd8011 Merge "Change register loading to fix stack overflow issue" into nextgenv2 2016-06-24 18:47:21 +00:00
Yi Luo
08184e32de Change register loading to fix stack overflow issue
- Use _mm_loadl_epi64 instead of _mm_loadu_si128 for
  uint16_t temp2[4 * 4] buffer.
- Refer to:
  d0de89a remove vpx_highbd_1[02]_sub_pixel_variance4x4_sse4_1
BUG=webm:1242

Change-Id: Ieff555c8dd8070937f27f4ec8535b77e1ed5b8b2
2016-06-24 10:39:49 -07:00
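A minimal sketch of the load-width issue, under the assumption that the problem was a 16-byte load on the last row of a 4x4 `uint16_t` buffer reading 8 bytes past its end. The helper `load_row4_u16` is hypothetical, and the non-SSE2 fallback exists only to keep the sketch portable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy one row of four 16-bit pixels out of a 4x4 (16-element) buffer. */
static void load_row4_u16(const uint16_t *buf, int row, uint16_t out[4]) {
  assert(row >= 0 && row < 4);
#if defined(__SSE2__)
  /* _mm_loadl_epi64 reads exactly 8 bytes. A 16-byte _mm_loadu_si128 on
   * row 3 would read 8 bytes beyond a 32-byte buffer. */
  const __m128i v = _mm_loadl_epi64((const __m128i *)(buf + 4 * row));
  _mm_storel_epi64((__m128i *)out, v);
#else
  memcpy(out, buf + 4 * row, 4 * sizeof(*out));
#endif
}
```

Row 3 starts at byte offset 24 of the 32-byte buffer, so an 8-byte load ends exactly at the buffer boundary.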
Jingning Han
bf89ee7109 Merge "Use uniform quantizer for sub8x8 block coding" into nextgenv2 2016-06-24 17:07:46 +00:00
Jingning Han
0353e8d6b8 Merge "Refactor sub8x8 block transform and quantization process" into nextgenv2 2016-06-24 17:07:35 +00:00
Jingning Han
45c9add28c Merge "Make recursive txfm partitioning use uniform quantizer" into nextgenv2 2016-06-24 03:18:11 +00:00
Jingning Han
d100948ca2 Use uniform quantizer for sub8x8 block coding
Use the trellis optimization based uniform quantizer to encode
sub8x8 blocks.

Change-Id: Ibbf7791b0aa430b7c67ef38eac3af6379578f56d
2016-06-23 17:01:00 -07:00
Jingning Han
bbe1b2217b Refactor sub8x8 block transform and quantization process
This commit refactors the transform and quantization process for
sub8x8 blocks and unifies the related functions.

Change-Id: I005f61f3eb49eec44f947b906c4e308cab9935a2
2016-06-23 16:56:05 -07:00
Jingning Han
db78fa9abb Make recursive txfm partitioning use uniform quantizer
Replace the expanded zero-bin quantizer with uniform quantizer in
the recursive transform block partitioning scheme. This improves
the compression performance by 0.4% for lowres.

Change-Id: I1c32ce9ebba0f0760e36a2c5bd20f2f5887ea5b4
2016-06-23 15:23:43 -07:00
Yi Luo
2f52813ebd Merge "Convolution vertical filter SSSE3 optimization" into nextgenv2 2016-06-23 22:01:19 +00:00
Jingning Han
dde6e2b1d3 Merge "Enforce trellis optimization for 1-pass encoding" into nextgenv2 2016-06-23 21:38:03 +00:00
Yi Luo
81ad95363a Convolution vertical filter SSSE3 optimization
- Apply 8-pixel vertical filtering direction parallelism.
- Add unit tests to verify bit exact.
- Encoder speed improves ~29% (enable EXT_INTERP) on Xeon E5-2680.
- Combinational cycle count of vp10_convolve() drops from 26.06%
  to 6.73%.

Change-Id: Ic1ae48f8fb1909991577947a8c00d07832737e57
2016-06-23 12:56:47 -07:00
Jingning Han
8aca4c3495 Enforce trellis optimization for 1-pass encoding
This fixes the unit test failure in the 1-pass settings of
EndToEndTestLarge.EndtoEndPSNRTest

bug=webm:1243

Change-Id: I7667c341f7c063f7ffb83786446bbbd1e498c1aa
2016-06-23 12:18:28 -07:00
Yi Luo
76ff9b3097 Merge "Fix input buffer initialization in convolution filter test" into nextgenv2 2016-06-23 18:43:03 +00:00
Jingning Han
95d99ae1f0 Merge "Refactor reference frame type defs" into nextgenv2 2016-06-23 16:51:25 +00:00
Alex Converse
8505967430 Merge "Reject sub8x8 partitions with SEG_LVL_SKIP." into nextgenv2 2016-06-23 16:50:04 +00:00
Alex Converse
5a019e148e Reject sub8x8 partitions with SEG_LVL_SKIP.
Change-Id: I2503f163464862dc3a7a3141d43d3f07c81b33d2
2016-06-23 16:20:51 +00:00
Geza Lore
c9e7675c1a Force SIMPLE_TRANSLATION motion for SEG_LVL_SKIP blocks.
Change-Id: Ib8ac19f25d06351b8aabed742aa0be66e28ec4d4
2016-06-23 14:09:16 +01:00
Jingning Han
b605de074d Refactor reference frame type defs
Move the reference frame type definitions to common/enums.h file.
Replace hard coded numbers.
Combine repeated definitions.

Change-Id: I288e079a03e448014cc181bcdb3f88ee8ec8d139
2016-06-22 12:34:44 -07:00
Yi Luo
f26a48bd52 Fix input buffer initialization in convolution filter test
Change-Id: I70c0da96a81463d752e88b134b6fde012bd5823d
2016-06-22 11:46:16 -07:00
Zoe Liu
cb2c037c06 Remove unnecessary macros
Change-Id: Id0975947b4e7b76b2c2464905f3b9a29245946c2
2016-06-22 10:25:40 -07:00
Yue Chen
883e3b840f Merge "(Cosmetics) Remove unnecessary new parameters in obmc experiment" into nextgenv2 2016-06-22 17:22:47 +00:00
Debargha Mukherjee
78842b2870 Merge "Reinstate "Optimize wedge partition selection." without tests." into nextgenv2 2016-06-22 16:59:40 +00:00
Yue Chen
02596589e7 (Cosmetics) Remove unnecessary new parameters in obmc experiment
pred_variance in obmc experiment is equivalent to recon_variance in
baseline

Change-Id: Iba8fb9bd973898be5a0d87a507ceaf65c75bdc51
2016-06-22 06:24:32 +00:00
Jingning Han
c797e709a2 Merge "Fix uninitialized context use case in supertx and var-tx" into nextgenv2 2016-06-22 05:47:45 +00:00
Jingning Han
7272750702 Merge "Make drl support bi-directional reference frames" into nextgenv2 2016-06-22 05:47:34 +00:00
Hui Su
09d7d76b21 Merge "Remove an unnecessary if()" into nextgenv2 2016-06-22 04:10:41 +00:00
Hui Su
467bb16215 Merge "Skip optimizing larger coefficients in trellis quant module" into nextgenv2 2016-06-22 04:10:21 +00:00
James Zern
5d14586392 Merge "remove vpx_highbd_1[02]_sub_pixel_variance4x4_sse4_1" into nextgenv2 2016-06-22 03:13:31 +00:00
Jingning Han
d26815569f Fix uninitialized context use case in supertx and var-tx
This commit fixes the use of uninitialized context values in the
combination of supertx and var-tx.

Change-Id: I2d36badf5c9806ea402ce3e19515cc299e6b79e8
2016-06-22 00:46:22 +00:00
Jingning Han
c2195c5b7e Make drl support bi-directional reference frames
This commit refactors the reference frame structure used in the
dynamic motion vector referencing system, and makes it support
the bi-directional reference frames. This resolves unit test
failure (enc/dec mismatch) when both are turned on.

The compression performance (ref-mv + ext-refs) is improved by
0.2% for lowres.

Change-Id: I233624d8fccc1f69e82295f94de984ff056365dc
2016-06-21 17:39:30 -07:00
Debargha Mukherjee
997b491272 Merge "Add 1D version of vpx_sum_squares_i16" into nextgenv2 2016-06-22 00:33:18 +00:00
Alex Converse
27d3905cae Merge "Cleanup dist_block()" into nextgenv2 2016-06-22 00:16:10 +00:00
hui su
9981cb8b0f Remove an unnecessary if()
The condition of this if() is always true.

Change-Id: I251715d519414d1a3d0a78eb3d025df11d913298
2016-06-21 14:56:11 -07:00
hui su
e067755930 Skip optimizing larger coefficients in trellis quant module
This achieves a few percent speed increase without hurting
compression performance.

Change-Id: I040e9bb69274f7de843bdd15926a5c924b30a731
2016-06-21 14:55:52 -07:00
Geza Lore
135d663159 Reinstate "Optimize wedge partition selection." without tests.
This reinstates commit efda2831e5f758b4f350679b5c55c0b9282449b0
without the tests and with fixes for 32 bit x86 builds.

Change-Id: I34be4fe1e8a67686d26ba256fd7efe0eb6a569e8
2016-06-21 20:31:50 +01:00
Geza Lore
52141c9111 Add 1D version of vpx_sum_squares_i16
Change-Id: I1829f931749a26aec38c896b609c5a2640d6dfaf
2016-06-21 20:31:50 +01:00
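A minimal sketch of what a 1D sum of squares over `int16_t` values computes, assuming a contiguous array and a 64-bit result. The name `sum_squares_i16_1d` and its signature are illustrative and are not the signature added by this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of squares of n contiguous int16_t values, accumulated in 64 bits. */
static uint64_t sum_squares_i16_1d(const int16_t *src, uint32_t n) {
  assert(src != NULL || n == 0);
  uint64_t total = 0;
  for (uint32_t i = 0; i < n; ++i) {
    const int64_t v = src[i];
    total += (uint64_t)(v * v);
  }
  return total;
}
```

A 1D form suits callers whose data is already packed contiguously, since no stride bookkeeping is needed.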
Debargha Mukherjee
7f929d292d Merge "Always respect tile bounds in calc_target_weighted_pred." into nextgenv2 2016-06-21 18:33:40 +00:00
Debargha Mukherjee
db328a6b18 Merge "Fix false uninitialized warnings (GCC 5+)." into nextgenv2 2016-06-21 17:12:13 +00:00
Geza Lore
78bd14b38d Always respect tile bounds in calc_target_weighted_pred.
The tile boundaries should now be respected even between tile rows,
regardless of whether ext-tile is used or not.

Change-Id: I5a39fd274451114a4264215f97f12be2c908016d
2016-06-21 17:56:29 +01:00
Jingning Han
02b8212be8 Merge "Handle two identical states in the trellis chain" into nextgenv2 2016-06-21 16:04:31 +00:00
Geza Lore
7de2ba3eae Fix false uninitialized warnings (GCC 5+).
Change-Id: Ia00c754ddaf22bb7f1dfcd20106db6293bf4b070
2016-06-21 12:54:17 +01:00
Jingning Han
5223a4b405 Handle two identical states in the trellis chain
When the next two states are identical, skip repeated cost table
fetch and multiplication operations. This makes the trellis unit
about 5% faster.

Change-Id: I0dbf7ad0a5732044e4e45dd59e9431a251c678f2
2016-06-20 16:59:28 -07:00
Yue Chen
474ea305ea Merge "Fix RDO issue of obmc + speed feature fast_inter_tx_type_search" into nextgenv2 2016-06-20 21:57:24 +00:00
Yi Luo
f1a50db2d1 Merge "Convolution horizontal filter SSSE3 optimization" into nextgenv2 2016-06-20 20:06:02 +00:00
Yi Luo
229690a95c Convolution horizontal filter SSSE3 optimization
- Apply signal direction/4-pixel vertical/8-pixel vertical
  parallelism.
- Add unit test to verify the bit exact result.
- Overall encoding time improves ~24% on Xeon E5-2680 CPU.

Change-Id: I104dcbfd43451476fee1f94cd16ca5f965878e59
2016-06-20 11:10:30 -07:00
Jingning Han
a5bcf03030 Merge "Use precise rate estimate for zero_token" into nextgenv2 2016-06-20 16:47:33 +00:00
Jingning Han
899a989d3a Merge "Optimize the use case of token_cost table" into nextgenv2 2016-06-20 16:47:20 +00:00
Debargha Mukherjee
dc5431ad4b Merge "Turn on AqSegment tests for VP10" into nextgenv2 2016-06-20 16:47:13 +00:00
Yue Chen
1273c39c03 Fix RDO issue of obmc + speed feature fast_inter_tx_type_search
Change-Id: I86a967ad2d824ca7877626eed9eb11f0e057b22d
2016-06-20 16:38:12 +00:00
Yue Chen
b37c279ab5 Merge "Re-enable ActiveMapTest for VP10" into nextgenv2 2016-06-20 16:37:53 +00:00
Jingning Han
86b7d39a83 Merge "Fix unit test failure in obmc exp" into nextgenv2 2016-06-18 22:38:53 +00:00
James Zern
d0de89a12a remove vpx_highbd_1[02]_sub_pixel_variance4x4_sse4_1
these cause ASan errors in VP10/EndToEndTestLarge.EndtoEndPSNRTest

BUG=webm:1242

Change-Id: I0334e3b255b14e18f61970c3721ae748dc79727b
2016-06-17 19:46:20 -07:00
Jingning Han
887f020691 Fix unit test failure in obmc exp
Properly restore the rate cost in the inner search loop of obmc
prediction. This avoids unexpected encoding behavior. It fixes
the unit test failure in obmc experiment:

AltRefForcedKeyTestLarge.Frame1IsKey/2

Change-Id: I667b219dfcf2f2c63d9d984900ed3cfd10c354bd
2016-06-17 17:44:03 -07:00
Yue Chen
0c73623d2c Merge "Make variance based partitioning compatible with SEG_LVL_SKIP" into nextgenv2 2016-06-18 00:19:42 +00:00
Jingning Han
8293759056 Merge "Skip restore token_cache value" into nextgenv2 2016-06-17 21:59:53 +00:00
Geza Lore
dc5ae1e34c Merge "Fix warnings from gtest under GCC 5 or newer." into nextgenv2 2016-06-17 20:10:14 +00:00
Zoe Liu
5805a14ca6 Merge bi-predictive frames to EXT_REFS
This patch removed the experiment of BIDIR_PRED and merged the feature
into the experiment of EXT_REFS:

(1) Each frame now has up to 6 reference frames, namely
    LAST_FRAME, LAST2_FRAME, LAST3_FRAME, GOLDEN_FRAME, (forward) and
    BWDREF_FRAME, ALTREF_FRAME (backward);
    LAST4_FRAME has been removed;
(2) First pass still keeps the 8 updates:
    KF_UPDATE, LF_UPDATE, GF_UPDATE, ARF_UPDATE, OVERLAY_UPDATE, and
    BRF_UPDATE, LAST_BIPRED_UPDATE, BI_PRED_UPDATE;
(3) show_existing_frame==1 is supported in the experiment of EXT_REFS;
(4) New encoding modes are added for both single-ref and compound cases,
    through the use of the 2 extra forward references (LAST2 & LAST3)
    and the 1 extra backward reference (BWDREF).

RD performance wise, using Overall PSNR: Avg/BDRate
        Bipred only      Prev EXT_REFS    Current EXT_REFS with bipred
lowres: -3.474/-3.324    -1.748/-1.586    -4.613/-4.387
derflr: -2.097/-1.353    -1.439/-1.215    -3.120/-2.252
midres: -2.129/-1.901    -1.345/-1.185    -2.898/-2.636

If BFG_INTERVAL in vp10/encoder/firstpass.h is changed from 2 to 3, i.e. to
use 2 bi-predictive frames rather than 1, a further improvement may be
obtained:
                 Current EXT_REFS with bipred
        1 bi-predictive frame    2 bi-predictive frames
lowres: -4.613/-4.387            -4.675/-4.465
derflr: -3.120/-2.252            -3.333/-2.516
midres: -2.898/-2.636            -3.406/-3.095

Change-Id: Ib06fe9ea0a5cfd7418a1d79b978ee9d80bf191cb
2016-06-17 12:43:39 -07:00
Geza Lore
7172e97abe Re-enable ActiveMapTest for VP10
Change-Id: I030fdde966b9911712eca131d095015afd9b0d8a
2016-06-17 20:33:58 +01:00
Geza Lore
169431b84a Make variance based partitioning compatible with SEG_LVL_SKIP
Inter blocks that have SEG_LVL_SKIP active must be at least 8x8 in
size for bitstream conformance (see read_inter_block_mode_info in
decodemv.c).

This patch makes the variance based partitioning scheme stop at 8x8
blocks in inter frames. This satisfies the SEG_LVL_SKIP constraint
and is more in line with the original implementation of this function
(before it got extended for 128x128 superblocks).

BUG=webm:1234

Change-Id: I1fdd894569a9c0817713a77daabe4c8b8e1d00c0
2016-06-17 20:31:05 +01:00
Jingning Han
019b750867 Use precise rate estimate for zero_token
This commit takes the precise rate estimate for zero_token rate
cost update. It improves the compression performance:

lowres 0.15%
midres 0.23%

Change-Id: I36761079f75ce43c814f8c663667e359d4ac2cd4
2016-06-17 10:57:30 -07:00
Jingning Han
90ea281f29 Optimize the use case of token_cost table
Reduce the cache footprint of the token_costs table.

Change-Id: Ie989e60c6479ac3251cadaac9c7e795ccba52f4e
2016-06-17 10:15:34 -07:00
Geza Lore
4c83fdd3d7 Fix warnings from gtest under GCC 5 or newer.
Change-Id: I9661f2fe9d315dccae69caa70d929b5d9d93b7db
2016-06-17 15:36:06 +01:00
Jingning Han
019dbb4cdc Merge "Rework table access operations in vp10_optimize_b function" into nextgenv2 2016-06-17 00:25:50 +00:00
Alex Converse
cf7968dcea Cleanup dist_block()
Change-Id: Iff0c0548924efd5a01c3a301cc5b4cdfda42e87e
2016-06-16 15:27:22 -07:00
Jingning Han
c187429865 Skip restore token_cache value
The trellis optimization runs backward, so there is no need
to restore the token_cache values that are behind the current node
in the scan order.

Change-Id: I4da8a2e3f78bf9630e6667c85d8c387c5d94de9a
2016-06-16 15:18:46 -07:00
Yue Chen
6ab5dbee4f Merge "Make supertx skip bits observe segment level coding." into nextgenv2 2016-06-16 22:03:45 +00:00
Jingning Han
37bf29b916 Rework table access operations in vp10_optimize_b function
Localize table access. This provides another 10% speed-up to
the unit.

Change-Id: Ib902121f412f78e2bd501b9799c8c64462f803b5
2016-06-16 14:33:16 -07:00
Debargha Mukherjee
907b5124fd Merge "Change supertx syntax order." into nextgenv2 2016-06-16 18:25:05 +00:00
Zoe Liu
5201280f70 Disable the unit test of ArfFreq for BIDIR_PRED
The test in arf_freq assumes that any no-show frame is an ALTREF_FRAME and
then calculates the minimum run between two consecutive ALTREF_FRAMEs
based on this assumption. As a BWDREF_FRAME is also a no-show frame, and
the minimum run between two consecutive BWDREF_FRAMEs may be anywhere
between 1 and the golden frame group interval, this test does not apply to
the experiment of BIDIR_PRED.

Change-Id: I70efb2c691fdc18601dbb8a7735ac2f27817e75a
2016-06-16 09:45:57 -07:00
Geza Lore
7f6518a4b7 Make supertx skip bits observe segment level coding.
Change-Id: Id918d502c8f89e236bcb51949d7ad34efa017321
2016-06-16 17:41:46 +01:00
Geza Lore
984cc04a22 Change supertx syntax order.
Move the supertx skip bit and transform type past the recursive
prediction blocks. This is in preparation for using the segment level
skip feature for supertx blocks.

Change-Id: I8319414b0734144a9264e8a4a60940b6716b12a8
2016-06-16 17:41:39 +01:00
Zoe Liu
a0d122079d Merge "Fix the superframe unit test for BIDIR_PRED" into nextgenv2 2016-06-16 16:15:07 +00:00
Debargha Mukherjee
567ee69b24 Turn on AqSegment tests for VP10
Also shortens the test and changes some of the parameters.

Change-Id: Ieda4aeffa55550fbb9e4235f735c383ef6baf32c
2016-06-16 07:26:39 -07:00
Geza Lore
8192010e32 Plug leak of variance tree.
The speed features can change per frame, so remove condition on
releasing the variance tree.

Change-Id: I651c87a1504266d737e6d98f14fd3ed30d84e01d
2016-06-16 13:21:03 +01:00
Debargha Mukherjee
43115f6878 Merge "Use correct size load in vpx_avg_4x4_sse2." into nextgenv2 2016-06-16 12:15:47 +00:00
Debargha Mukherjee
f9fc898d56 Merge "Split some slower tests based on cpu-used" into nextgenv2 2016-06-16 11:46:36 +00:00
Geza Lore
ffa9173378 Use correct size load in vpx_avg_4x4_sse2.
The old version used 64 bit loads, and then ignored the top half
of the result. This can cause asan failures if we read past the end
of a buffer. Switched to using 32 bit loads instead.

Change-Id: I57da127a26f869fb4b4f700b55408f6dc2fbbc1a
2016-06-16 11:49:24 +01:00
Debargha Mukherjee
6abddf37f8 Split some slower tests based on cpu-used
Change-Id: Idf84475fe06666d5c73c9d86dfc5c23bef170086
2016-06-15 23:14:51 -07:00
James Zern
94e84bbc07 cosmetics,test.mk: fix a typo
Change-Id: Ib74a494e1cf50a356f51e8185e19ca66fcb896a2
2016-06-15 20:33:04 -07:00
James Zern
fba6f748e8 rename vp9_end_to_end_test.cc -> end_to_end_test.cc
this is shared between vp9/10

BUG=webm:1235

Change-Id: I2f44b15268a33453a1c1e0c691d4fc1fc12d0263
2016-06-15 18:30:22 -07:00
James Zern
2710f76692 vp9_end_to_end_test: enable in vp10-only builds
this file is shared between vp9 & vp10; this makes it available in the
presence of --disable-vp9

BUG=webm:1235

Change-Id: Iaf060c3c09afd2c7df69995b0c01589f78d4945e
2016-06-15 18:28:30 -07:00
Zoe Liu
1aa674b588 Fix the superframe unit test for BIDIR_PRED
Change-Id: I2ef8e479893403581711abc020509c6863c2035d
2016-06-15 17:18:26 -07:00
Sarah Parker
50c5921517 Add EndToEndTestLarge for VP10 non-highbitdepth
The current test case is only run for vp9 and vp10 when HBD
is enabled. This was mistakenly removed in:

d53f9a3 Enable VP10 HBD PSNR checking unit test

Change-Id: I88b8168ad1efd805d759238a037653a2901bf50d
2016-06-15 19:45:24 +00:00
Jingning Han
d10161eafc Merge "Refactor trellis optimization process" into nextgenv2 2016-06-15 19:16:30 +00:00
Debargha Mukherjee
918ad13b56 Merge "Pick up correct dequant for segment with new-quant." into nextgenv2 2016-06-15 16:51:38 +00:00
Debargha Mukherjee
00b6ad1ebf Merge "Disable loop restoration when LPF_PICK_MINIMAL_LPF." into nextgenv2 2016-06-15 16:50:55 +00:00
Debargha Mukherjee
00f1580a14 Merge "Remove magic number from traversal (CYCLIC_REFRESH_AQ)." into nextgenv2 2016-06-15 16:50:15 +00:00
Debargha Mukherjee
e20a29d3b0 Merge "Select segment based loopfilter strength for supertx blocks." into nextgenv2 2016-06-15 16:49:34 +00:00
Debargha Mukherjee
343fe016ae Merge "Remove now superfluous argument from predict_b_extend." into nextgenv2 2016-06-15 16:48:13 +00:00
Debargha Mukherjee
52c71f749d Merge "Rework supertx segment handling and adaptive quantization." into nextgenv2 2016-06-15 16:47:58 +00:00
Debargha Mukherjee
b2aabeffc4 Merge "Minor refactor of decode_block for supertx." into nextgenv2 2016-06-15 16:47:28 +00:00
Debargha Mukherjee
3b84c803d8 Merge "Re-initialise quantiser after changing segment." into nextgenv2 2016-06-15 16:45:57 +00:00
Debargha Mukherjee
1b735da7d5 Merge "Refactor variance aq." into nextgenv2 2016-06-15 16:45:21 +00:00
Debargha Mukherjee
46f048e397 Merge "Pass segment id explicitly to quantizer init." into nextgenv2 2016-06-15 16:44:37 +00:00
Debargha Mukherjee
095c88e470 Merge "Fix estimate_wedge_sign with high bit-depth." into nextgenv2 2016-06-15 16:43:51 +00:00
Jingning Han
c457fc3553 Merge "Rework transform quantization pipeline" into nextgenv2 2016-06-15 16:07:10 +00:00
Jingning Han
08bf788ebd Merge "Refactor the trellis optimization process" into nextgenv2 2016-06-15 16:07:03 +00:00
Jingning Han
e9c44a76a2 Refactor trellis optimization process
This commit refactors the trellis coefficient optimization process.
It saves multiplications used to generate the final dequantized
coefficients. It also removes two memset operations on quantized
and dequantized coefficient sets.

The trellis coefficient optimization is on average running over
10% faster.

Change-Id: If3aa26d2a706c3012bf2b7ac059bf1825250e81f
2016-06-15 09:06:13 -07:00
Geza Lore
d3df694fa8 Fix estimate_wedge_sign with high bit-depth.
This fixes some crashes in
VP10/EndToEndTestLarge.EndtoEndPSNRTest/ with high bit-depth and
ext-inter.

Change-Id: I10f0f08e1be4bd5c388616074d4aa3f91a2fda7a
2016-06-15 10:52:29 +01:00
Hui Su
1f493d1ff8 Merge "Speed up ext-intra inter frame encoding" into nextgenv2 2016-06-15 02:46:06 +00:00
Hui Su
e7fb03c8ae Merge "ext-intra: refactor rd loop in interframe" into nextgenv2 2016-06-15 02:46:00 +00:00
Jingning Han
1faf288798 Rework transform quantization pipeline
This commit reworks the transform and quantization unit. It enables
the use of adaptive quantization for intra modes. This further
improves the compression performance:
lowres 0.36%
midres 0.79%
hdres  0.73%

The key frame coding performance is improved:
lowres 1.7%
midres 1.9%
hdres  3.3%

The overall coding gains are:
lowres 1.1%
midres 1.8%
hdres  2.3%

Change-Id: Iaec1a3a4c1d5eac883ab526ed076d957060479dd
2016-06-14 16:32:04 -07:00
Hui Su
69f6fd2134 Merge "Fix rate cost calculation for ext-intra" into nextgenv2 2016-06-14 23:11:25 +00:00
Hui Su
703585b244 Merge "Handle intra modes when tx type speed feature is enabled" into nextgenv2 2016-06-14 22:37:19 +00:00
Jingning Han
48f5125749 Merge "Fix enc/dec mismatch in non-420 settings" into nextgenv2 2016-06-14 21:54:08 +00:00
hui su
8c3b3d3686 Handle intra modes when tx type speed feature is enabled
Change-Id: I9dc156214f3b3ded33ab30d558124b3151548161
2016-06-14 13:46:53 -07:00
hui su
8f9c9b28a8 Speed up ext-intra inter frame encoding
Skip filter intra mode search when regular intra modes have large
rd cost.

Encoding speed improvement:  8%.

Compression performance drop: 0.02%  / 0.09%  / 0.03% on
                              lowres / midres / hdres

Change-Id: I94d3e48781bff6ae6895a54f271dd65c959bb976
2016-06-14 13:46:17 -07:00
hui su
70566f0563 ext-intra: refactor rd loop in interframe
Move filter intra modes search to the end, after regular
mode search.

On average no performance changes.

Change-Id: I9293c8fdf706ebf831fbd61c6bb81959790f4848
2016-06-14 13:46:17 -07:00
hui su
7fa61d7d51 Fix rate cost calculation for ext-intra
It was broken by commit 8ee640f979.

Change-Id: I26b9eba810c74849b0805e64da2d269ab0685cb9
2016-06-14 13:46:17 -07:00
Jingning Han
a116ab7092 Merge "Make tx_type speed feature default" into nextgenv2 2016-06-14 20:45:03 +00:00
Geza Lore
1e46c740c3 Pick up correct dequant for segment with new-quant.
Change-Id: Id73500305a7cc581c11461c9ddb1b22dd8f5d8f4
2016-06-14 16:47:17 +01:00
Geza Lore
3cf3ce949f Disable loop restoration when LPF_PICK_MINIMAL_LPF.
The speed feature sf->lpf_pick == LPF_PICK_MINIMAL_LPF is used
to disable loop filtering. This did not work with the loop-restoration
experiment, but now it is respected.

Note that this speed feature is only used in real-time cpu-used >= 8
settings.

Change-Id: I193723c9ac5f802ec31d8c8b4d37650796e065fd
2016-06-14 16:07:51 +01:00
Geza Lore
58168f5bf4 Remove magic number from traversal (CYCLIC_REFRESH_AQ).
mi->stride now depends on the maximum superblock size, and hence
the constant 8 padding is no longer appropriate. Traverse the array
using mi->stride instead.

Change-Id: I8e84b9fe1728f6663f8c10765fe32206375f1e71
2016-06-14 16:07:51 +01:00
Geza Lore
44b91a0e76 Select segment based loopfilter strength for supertx blocks.
Segment based loopfilter strength for supertx coded blocks is now
selected based on the minimum of all segment IDs within a supertx
coded block (same as the quantiser settings).

Change-Id: Ib056bd0d05f6a1d3b512a76deb4e2ad4db0f7dc4
2016-06-14 16:07:51 +01:00
Geza Lore
7faae780a5 Remove now superfluous argument from predict_b_extend.
Change-Id: I7a76756842af9ce806c6e0e1f98f294af748e8bd
2016-06-14 16:07:50 +01:00
Geza Lore
7dd90c9d22 Rework supertx segment handling and adaptive quantization.
Segment level quantizer settings for supertx coded blocks are now
selected based on the minimum of all segment IDs within a supertx
coded block.

This also fixes the 3 adaptive quantization modes with supertx.

Change-Id: Ib5db099539d4f82f240e1d745d6e5264f8b34cde
2016-06-14 16:07:50 +01:00
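The selection rule described in the commit above (use the minimum of all segment IDs inside a supertx coded block) can be sketched roughly as follows. This is an illustration only, not the code from the patch: the helper name min_segment_id, the flat uint8_t segment map and the stride parameter are all assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical helper (not from the patch): return the smallest segment ID
 * found in a rows x cols block of a segment map that is stored row by row
 * with the given stride. */
static int min_segment_id(const uint8_t *seg_map, int stride, int rows,
                          int cols) {
  int min_id = seg_map[0];
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      const int id = seg_map[r * stride + c];
      if (id < min_id) min_id = id;
    }
  }
  return min_id;
}
```

Walking the map with an explicit stride rather than a fixed padding constant mirrors the idea in "Remove magic number from traversal (CYCLIC_REFRESH_AQ)".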
Geza Lore
32992fa0b1 Minor refactor of decode_block for supertx.
Exit early from function when supertx is used, rather than putting
the bulk of the function body in a single conditional.

Change-Id: I41f388a45bd46e4a6ee1c51f26782ed9bddff4e5
2016-06-14 16:07:50 +01:00
Geza Lore
9e95919414 Re-initialise quantiser after changing segment.
When using VARIANCE_AQ, we can change the segment assignment after
initialising the quantiser in set_offsets, so re-initialise it when
we do so.

Change-Id: I1f168553aaf0ade419f0d4bf05820cd591b87659
2016-06-14 16:07:50 +01:00
Geza Lore
d60523bc28 Refactor variance aq.
Explicitly signal when the segment map is being refreshed when
using VARIANCE_AQ. This simplifies decisions about when the segment id
needs to be set from the previous segment map vs based on the current
variance.

Change-Id: Ieb12c950e9cfbc3f53f4d184880071dea805563c
2016-06-14 16:07:50 +01:00
Geza Lore
2a588555bb Pass segment id explicitly to quantizer init.
This is purely refactoring in preparation for fixing supertx segment
handling.

Change-Id: I74bcae34241fdf2b592e1cd45b67af77b9e16c9a
2016-06-14 16:07:37 +01:00
Debargha Mukherjee
ed3034e066 Merge "A crash fix for supertx / ext-inter combination." into nextgenv2 2016-06-14 14:44:55 +00:00
Jingning Han
a4ea8fd8b8 Fix enc/dec mismatch in non-420 settings
This commit makes the dual filter experiment work with non-420
settings. It fixes unit test failure in EndToEndTestLarge.

Change-Id: I04f7afdee78f91389d9ff72947efa152098af930
2016-06-14 00:21:48 +00:00
James Zern
05bd964adc Merge "Revert "Add 1D version of vpx_sum_squares_i16"" into nextgenv2 2016-06-14 00:04:57 +00:00
Debargha Mukherjee
902ee5060c A crash fix for supertx / ext-inter combination.
Change-Id: I9860376c98aa3b25f5bf86ed13d4a7631fa6b153
2016-06-13 13:57:30 -07:00
Jingning Han
a9a8c5993b Refactor the trellis optimization process
Speed up the trellis optimization unit by 10%.

Change-Id: If055f6c0589a405c008d2900bb8fbc11b1246f66
2016-06-13 12:19:57 -07:00
Jingning Han
04f26783c4 Make tx_type speed feature default
Revisit the compression performance and complexity trade-off after
making the SIMD version of trellis optimizations. Before that,
reduce the transform-quantization function calls temporarily. This
would cause about a 0.3% performance drop for the lowres set.

Change-Id: I16917a6bd5c44ec6cd8cd0b59f3c336c4fd96dd2
2016-06-13 12:19:54 -07:00
James Zern
d2ca083c9f Merge "active_map_refresh_test: fix missing file w/vp10-only" into nextgenv2 2016-06-13 19:08:05 +00:00
Jingning Han
4588676cfb Merge "Trellis based adaptive quantization" into nextgenv2 2016-06-13 17:36:19 +00:00
James Zern
a8ba2eb3d3 active_map_refresh_test: fix missing file w/vp10-only
Change-Id: I6413b7622a3c8524ec0409e087cf7c92f79e4f2d
2016-06-11 09:49:02 -07:00
Debargha Mukherjee
81f8b3f31c Merge "Some refactoring to support warped motion mode" into nextgenv2 2016-06-10 23:18:39 +00:00
Zoe Liu
1d1286bfb4 Fix one typo in the comment
Change-Id: Ie98fd60426b18980ec85572f3cfc9ce0b97a5361
2016-06-10 15:58:30 -07:00
Alex Converse
11ce75968f Merge "Turn on ActiveMapTest speeds [0,5) with all experiments." into nextgenv2 2016-06-10 21:52:57 +00:00
Jingning Han
25ca322957 Trellis based adaptive quantization
This commit combines uniform quantizer with trellis based coefficient
level optimization. It improves the codebase compression performance:

lowres 0.8%
midres 1.0%
hdres  1.6%

Note that the current trellis optimization unit is written in C, which
makes the overall quantization process slower. A number of
optimizations will follow.

Change-Id: Id441dd238e4844409d0f08f82604be777f3f5282
2016-06-10 12:56:14 -07:00
Debargha Mukherjee
03be30ba3e Some refactoring to support warped motion mode
Change-Id: I15d54a3ae48b2b33082668116792c6595bdb3ddb
2016-06-10 12:04:18 -07:00
Debargha Mukherjee
8b118faa61 Merge "Adds higher precision for homography model 3rd row" into nextgenv2 2016-06-10 18:47:17 +00:00
Sarah Parker
e14f61b924 Merge "Move new quant experiment from nextgen" into nextgenv2 2016-06-10 17:26:33 +00:00
Jingning Han
b77dfccf00 Merge "Add MIN_TX_SIZE definition" into nextgenv2 2016-06-10 16:04:38 +00:00
Sarah Parker
a21afd421b Move new quant experiment from nextgen
This experiment implements non-uniform quantization where
the width of the bins increases gradually to more closely
match a Laplacian distribution of the coefficients.

Performance Gain:
derflr: 0.15%
hevcmr: 0.675%

Change-Id: I25234244e3bcd94b87c1f77cf682190b61c8ef94
2016-06-10 08:06:22 -07:00
James Zern
5e831c548f Revert "Add 1D version of vpx_sum_squares_i16"
This reverts commit f19700fe52850d051e505ec1b085f25060f7d054.

This crashes in SSE2/SumSquares2DTest.RandomValues/0 under x86 due to
alignment issues

Change-Id: I135d83ba6a7894c09d7c7a139b7eaf876416b40c
2016-06-09 23:42:15 -07:00
James Zern
667db87a1b Merge "Revert "Optimize wedge partition selection."" into nextgenv2 2016-06-10 03:49:29 +00:00
Angie Chiang
95340fccb3 Revert "Optimize wedge partition selection."
This reverts commit efda2831e5f758b4f350679b5c55c0b9282449b0.

This commit causes segmentation fault at SSE2/SumSquares2DTest.RandomValues/0

Change-Id: I171937e4daf6f15323e8206418773deb03bd8c53
2016-06-09 19:17:37 -07:00
Sarah Parker
9d924a0c4a Fix vp9_end_to_end_test for vp10 HBD
This test is failing when no experiments are turned on. PSNR is
31.96 when the threshold is 32.

broken since:
0d6980d Remove swap buffer speed feature

Change-Id: I3c29815b40d5282c37f52f4345b56992f8558b2e
2016-06-09 18:47:47 -07:00
Debargha Mukherjee
b0bdc3c1a8 Merge "Add warped motion config flag" into nextgenv2 2016-06-09 22:36:07 +00:00
Debargha Mukherjee
bcf4e0aba8 Add warped motion config flag
Change-Id: I4b5e1251dc215073384e168a6f845ae059d6c4f2
2016-06-09 13:58:56 -07:00
Aamir Anis
de2a20b411 Merge "Updated loop restoration" into nextgenv2 2016-06-09 20:57:09 +00:00
Alex Converse
587b8a11d0 Turn on ActiveMapTest speeds [0,5) with all experiments.
Change-Id: I7da9e6a85648aa69e5e20d825b717d51e3c6809c
2016-06-09 13:51:00 -07:00
Alex Converse
d279cadbe0 Port active map / cyclic refresh fixes to VP10.
Bring commits 575e81f and 3d6b8a6 to VP10. These changes predate
the creation of the active map cyclic refresh test.

BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1224

Change-Id: I3559b6933ffa5649926a4b214e45ed0fae523a25
2016-06-09 16:52:43 +00:00
Debargha Mukherjee
560a15e62d Adds higher precision for homography model 3rd row
Also adds a function to integerize a double model.

Change-Id: Ie09b3e165492cf66ab81fe25d4bc2422a5e6defd
2016-06-09 04:12:57 -07:00
Jingning Han
f59bf76eef Merge "Take out skip_recode speed feature" into nextgenv2 2016-06-08 21:46:55 +00:00
Jingning Han
cedf90a9d6 Merge "Remove swap buffer speed feature" into nextgenv2 2016-06-08 19:45:54 +00:00
Jingning Han
68cd946994 Add MIN_TX_SIZE definition
Change-Id: I399d601d40827ac383a6687cbeaec59e9a9c63e4
2016-06-08 11:29:02 -07:00
Jingning Han
025fa11c75 Take out skip_recode speed feature
The assumption doesn't hold true in the current codebase. Remove
this speed feature to simplify the codebase.

Change-Id: I9b69f484c9b7cd612b825047cc5b2fce63ee0af7
2016-06-08 18:27:36 +00:00
Jingning Han
0d6980d7a1 Remove swap buffer speed feature
The inter prediction residual can undergo different transform types
during the rate-distortion optimization search. The assumption used
in this speed feature no longer holds true. This commit removes the
related code to clean up the codebase and clear out unit test
failure in higher speed setting.

Change-Id: I7f7cd4df2345ed3e607c9fae75b38cd2dbde0cac
2016-06-08 11:27:00 -07:00
Jingning Han
b48eb90023 Merge "Add tx type speed feature to recursive transform block partitioning" into nextgenv2 2016-06-07 23:44:01 +00:00
Jingning Han
0b7f864213 Merge "Rework the tx type speed feature" into nextgenv2 2016-06-07 23:42:43 +00:00
Angie Chiang
d9410d2d43 Merge "Move #if out of TEST_P in vp10_fwd/inv_txfm2d_test.cc" into nextgenv2 2016-06-07 22:02:28 +00:00
Zoe Liu
ba61de387b Merge "Fix a RD performance bug in bipredictive frames" into nextgenv2 2016-06-07 21:34:56 +00:00
Alex Converse
b47cc0fceb Merge "Turn ActiveMapTest back on." into nextgenv2 2016-06-07 20:28:25 +00:00
Debargha Mukherjee
31da10b41f Merge "Pick up top left mbmi for supertx decode." into nextgenv2 2016-06-07 20:23:44 +00:00
Debargha Mukherjee
ac232300a2 Merge "Zero segment counter before accumulating." into nextgenv2 2016-06-07 20:23:20 +00:00
Alex Converse
7e26f01342 Turn ActiveMapTest back on.
If it's creating problems with some experiments, disable it under the
actual conditions where it doesn't work and file a bug.

Change-Id: Iab9f4bfe42ea926d49d371918da25f9a8938a20f
2016-06-07 11:59:15 -07:00
Jingning Han
33dafdb58b Add tx type speed feature to recursive transform block partitioning
Change-Id: I45440a72b4287d98cbe21b72defc67138a8eb953
2016-06-07 11:34:30 -07:00
Geza Lore
6279d1293d Pick up top left mbmi for supertx decode.
This ensures using the correct segment_id downstream in
reconstruct_inter_block.

Change-Id: Ia8b6ec60de51fa2e26c326d3c537abb18aea75ae
2016-06-07 19:34:11 +01:00
Jingning Han
9a858e868c Rework the tx type speed feature
This commit re-works the transform type speed feature. It moves
the transform type selection outside of the coding mode loop. This
avoids repeated motion search if the best prediction mode is
chosen as NEWMV. It improves the speed performance for clips that
contain more motion activities.

For mobile_cif at 1000 kbps, this makes the baseline encoding 7%
faster and makes the encoding with dynamic motion vector referencing
scheme enabled 10% faster.

Change-Id: I93e2714b3e461303372c4b66a4134ee212faffd1
2016-06-07 11:32:27 -07:00
Zoe Liu
5414abb4a0 Fix a RD performance bug in bipredictive frames
This patch makes sure that BWDREF_FRAME is used for the
encoding of both types of bipredictive frames, namely
LAST_BIPRED_UPDATE and BIPRED_UPDATE. To achieve this, the
updates to cpi->ref_frame_flags have been moved to before
the encoding of a frame, instead of being handled after
the encoding of a frame.

RD performance has improved slightly, by approximately 0.17%
compared to before this patch:

lowres: Avg -3.474; BDRate -3.324
derflr: Avg -2.097; BDRate -1.353

Change-Id: I0aa19afd752293e345489fbff104c4351ca5498c
2016-06-07 09:45:10 -07:00
Geza Lore
f304d5c8e7 Zero segment counter before accumulating.
The segment counts are computed as part of packing the bitstream,
so they might have been computed already in the recode loop. Zero
the accumulator to avoid double counting.

This fixes some encoder/decoder mismatches.

Change-Id: Ib7816034cbbb1db41101116b706302b02fad3a2c
2016-06-07 17:02:03 +01:00
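The double counting fixed by the commit above can be illustrated with a small sketch. It assumes a flat segment map and a per-segment counter array; count_segments and the MAX_SEGMENTS value of 8 are hypothetical names chosen for the example, not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SEGMENTS 8 /* assumed number of segments for this sketch */

/* Hypothetical helper: count how many map entries fall in each segment.
 * The counters are zeroed first, so calling this again (for example on a
 * second pass of a recode loop) does not add to the earlier totals. */
static void count_segments(const uint8_t *seg_map, size_t n,
                           unsigned counts[MAX_SEGMENTS]) {
  memset(counts, 0, MAX_SEGMENTS * sizeof(counts[0]));
  for (size_t i = 0; i < n; ++i) {
    if (seg_map[i] < MAX_SEGMENTS) counts[seg_map[i]]++;
  }
}
```

Without the memset, a second call over the same map would double every count, which is the kind of mismatch the commit message describes.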
Debargha Mukherjee
d3180b8b97 Merge "Fix build failure happened in reconinter.c" into nextgenv2 2016-06-07 14:22:25 +00:00
Debargha Mukherjee
13155e7725 Merge "Optimize wedge partition selection." into nextgenv2 2016-06-07 09:50:13 +00:00
Debargha Mukherjee
24a04f9048 Merge "Fix decoder crash with supertx" into nextgenv2 2016-06-07 09:46:48 +00:00
Aamir Anis
99d9a8fe30 Updated loop restoration
1. Wiener restoration filter now has normalization and evaluation of
quantization procedure.
2. Corrected scaling of bits in RD cost computation.
3. Changed dynamic range and number of bits for Wiener filter.
Observed gains: Overall 0.58% for low_res, 0.7% for mid_res sequences.

Change-Id: I8928b3ea493bfe1790926b00388d6c4bafc08e19
2016-06-06 15:49:52 -07:00
Angie Chiang
2250c6b07b Fix build failure happened in reconinter.c
Change-Id: Ifd5ed91e4e91238fb53a202c8d76c11fbb9ccf7c
2016-06-06 14:41:14 -07:00
Angie Chiang
f67196b2ed Move #if out of TEST_P in vp10_fwd/inv_txfm2d_test.cc
Change-Id: I1d5b2408f27a1e277574c2238f1e49e884596309
2016-06-06 12:45:54 -07:00
Jingning Han
3713949b6d Merge "Make ref-mv experiment support ActiveMap" into nextgenv2 2016-06-06 16:06:41 +00:00
Geza Lore
efda2831e5 Optimize wedge partition selection.
We can optimize wedge partition selection by pre-computing the
residuals of the 2 underlying predictors, and then blend these
to compute the sse of the compound predictor, without actually
having to compute and subtract the compound predictor.

Similarly we can pre-compute a proxy array which we can use to
cheaply check which mask sign would have lower sse.

Details are in wedge_utils.c.

Mathematically these are equivalence transformations, but due to the
finite precision the encoder output will be perturbed, though on
average this should make 0% difference.

ext-inter gains about 4.5% speedup.

Change-Id: Ib2657c3209ae161b4090b58b4b6c392641bf2792
2016-06-06 14:43:10 +01:00
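The equivalence that the commit above relies on can be sketched as follows. This is not the code in wedge_utils.c. It assumes a mask with weights in 0..64 and keeps the blend unrounded, so both paths agree exactly; in the real encoder the compound predictor is rounded, which is why the commit expects slightly perturbed output. All function names and MASK_MAX are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MASK_MAX 64 /* assumed mask precision for this sketch */

/* Residual of one predictor against the source: r[i] = src[i] - pred[i]. */
static void compute_residual(const uint8_t *src, const uint8_t *pred,
                             int16_t *r, size_t n) {
  for (size_t i = 0; i < n; ++i) r[i] = (int16_t)(src[i] - pred[i]);
}

/* Direct route: form the (unrounded) compound predictor, subtract it from
 * the source, and accumulate squares. The result is scaled by MASK_MAX^2. */
static uint64_t sse_direct(const uint8_t *src, const uint8_t *p0,
                           const uint8_t *p1, const uint8_t *mask, size_t n) {
  uint64_t sse = 0;
  for (size_t i = 0; i < n; ++i) {
    const int64_t blend =
        (int64_t)mask[i] * p0[i] + (int64_t)(MASK_MAX - mask[i]) * p1[i];
    const int64_t d = (int64_t)MASK_MAX * src[i] - blend;
    sse += (uint64_t)(d * d);
  }
  return sse;
}

/* Residual route: blend the two pre-computed residuals with the same mask.
 * No compound predictor is built. Same MASK_MAX^2 scaling as above. */
static uint64_t sse_from_residuals(const int16_t *r0, const int16_t *r1,
                                   const uint8_t *mask, size_t n) {
  uint64_t sse = 0;
  for (size_t i = 0; i < n; ++i) {
    const int64_t d =
        (int64_t)mask[i] * r0[i] + (int64_t)(MASK_MAX - mask[i]) * r1[i];
    sse += (uint64_t)(d * d);
  }
  return sse;
}
```

The two residual arrays are computed once per block, after which each candidate mask needs only the blend-and-square loop.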
Geza Lore
6c4306c27d Fix decoder crash with supertx
xd->plane[0].n4_h and xd->plane[0].n4_w are not set at that point
when using supertx.

While this fixes the immediate crash described in the referenced
bug report, there are still issues in the ref-mv experiment that
cause these tests to fail, so they are kept disabled.

BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1230

Change-Id: Ibf8ef02847a903f8d10e6be28e16694db10c75af
2016-06-06 09:58:11 +01:00
Debargha Mukherjee
b85d0adadf Merge "Always include the cost of tx size in rate for Y." into nextgenv2 2016-06-03 22:57:17 +00:00
Debargha Mukherjee
33c57e6223 Merge "Check if sub8x8 rd stats are valid before reusing them." into nextgenv2 2016-06-03 22:38:56 +00:00
Debargha Mukherjee
fc61d92bf8 Merge "Compute rate of partition type accurately for edge blocks." into nextgenv2 2016-06-03 22:37:33 +00:00
Jingning Han
27d8a948c1 Make ref-mv experiment support ActiveMap
Reset the ref_mv_idx and predicted motion vector when the coding
block belongs to skip segment.

Change-Id: I5746ab315a436b829b64a1a25121989d3c11c995
2016-06-03 15:04:18 -07:00
Geza Lore
b87078d51e Always include the cost of tx size in rate for Y.
The transform can only be skipped if both Y and U/V can be skipped, so
we always include the cost of tx size in the rate for Y. This will
get later subtracted if the transform is actually skipped.

Change-Id: I136a223e5596f18b69bb9f743e7e08438183a215
2016-06-03 11:51:35 -07:00
Geza Lore
d9870c32a9 Check if sub8x8 rd stats are valid before reusing them.
Change-Id: I5d49f15a07de58c226d4003b4691e001abf1f3f8
2016-06-03 11:47:34 -07:00
Geza Lore
8ee640f979 Compute cost of UV mode accurately for intra blocks.
We used to cache the cost of the UV mode from the search with a
different previously tried Y mode, but the UV mode is contexted
on the Y mode, so caching the cost is inaccurate.

Change-Id: Ib003510afb6fc9befb7808b67b0be64f1c0a0804
2016-06-03 11:13:51 -07:00
Geza Lore
1354c6942c Compute rate of partition type accurately for edge blocks.
This patch factors in the different partition coding syntax used for
right and bottom edge blocks when doing RD search.

Change-Id: I2f31650512b6a4a7a2c03352414693aff6fbf87b
2016-06-03 06:43:34 -07:00
Debargha Mukherjee
353930d212 Merge "Add 1D version of vpx_sum_squares_i16" into nextgenv2 2016-06-03 13:27:50 +00:00
Debargha Mukherjee
5590c48937 Merge "Move template specializations into .cc from .h" into nextgenv2 2016-06-03 13:27:43 +00:00
Debargha Mukherjee
cfa03374f8 Merge "Factor out x86 SIMD intrinsic synonyms" into nextgenv2 2016-06-03 13:27:29 +00:00
Debargha Mukherjee
1e160ce559 Merge "Factor out model_rd_from_sse" into nextgenv2 2016-06-03 13:27:22 +00:00
Debargha Mukherjee
cbf51c5ba0 Merge "Pre-compute and use contiguous wedge masks." into nextgenv2 2016-06-03 13:27:02 +00:00
Geza Lore
f19700fe52 Add 1D version of vpx_sum_squares_i16
Change-Id: I0d7bda2fe6f995a9e88a9f66540b4979b3f7fab1
2016-06-03 09:34:55 +01:00
Geza Lore
5a69ee0e11 Move template specializations into .cc from .h
Change-Id: I6d8775c1fa228fde25016a401e3c22a8e3da42f9
2016-06-03 09:34:55 +01:00
Geza Lore
9ebca46933 Factor out x86 SIMD intrinsic synonyms
Change-Id: Idc4ac3ccd2ba19087cdb74a3e4a6774ac50386aa
2016-06-03 09:34:55 +01:00
Geza Lore
73bc3119be Factor out model_rd_from_sse
Change-Id: Ia60ff0ecc8d083870fadbfe07d494d1e2c080489
2016-06-03 09:34:55 +01:00
Geza Lore
ab29978e9f Pre-compute and use contiguous wedge masks.
This is purely a refactoring patch and has no functional effect.

Uses of these masks can be arranged such that all input blocks are
contiguous in memory (stride == block width). In this case 1D versions
of operations can be used. 1D vector operations have superior performance
over 2D block equivalents as they are more processor cache friendly and
they can do away with a second loop overhead.

Change-Id: I2b76c9888aea2c857cc497e8a4b2841fd3dad54e
2016-06-03 00:16:22 -07:00
Debargha Mukherjee
17c4f1c7f5 Merge "Use standard rounding in combine_interintra." into nextgenv2 2016-06-02 19:29:16 +00:00
Debargha Mukherjee
7534a15c3a Merge "Warped motion functions added" into nextgenv2 2016-06-02 19:28:03 +00:00
Geza Lore
888e90e823 Use standard rounding in combine_interintra.
Use the same rounding method that is used throughout the codebase,
where the halfway value is rounded up rather than down.

Change-Id: I04e92850bc69a7d7a07b06e3d2ce97f6f2ada321
2016-06-02 16:26:05 +01:00
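The rounding convention described in 888e90e823 can be sketched as follows. This is only an illustration under the assumption that the rounding is a right shift by n >= 1 bits; the helper names are hypothetical and do not come from the codebase.

```c
#include <stdint.h>

/* Divide by 2^n, rounding the halfway value up (e.g. 2.5 -> 3) by adding
 * half of the divisor before shifting. Assumes n >= 1. */
static int32_t round_shift_half_up(int32_t value, int n) {
  return (value + (1 << (n - 1))) >> n;
}

/* Contrasting variant that rounds the halfway value down (2.5 -> 2). */
static int32_t round_shift_half_down(int32_t value, int n) {
  return (value + (1 << (n - 1)) - 1) >> n;
}
```

The two variants differ only on exact halfway inputs: for value 5 and n = 1 the first returns 3 and the second returns 2, while for value 7 and n = 2 both return 2.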
Alex Converse
380c4ee32d Merge "segmentation: Don't use uninitialized probability data." into nextgenv2 2016-06-01 17:50:37 +00:00
Alex Converse
6bae20ca43 Merge "Replace some vpxbool calls with entropy coder agnostic calls." into nextgenv2 2016-05-31 23:58:00 +00:00
Alex Converse
7a6cb59dbb segmentation: Don't use uninitialized probability data.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1224

Change-Id: I17b76fcf0d8c191850350d5aa50dcc007b8b0cdc
2016-05-31 16:42:29 -07:00
Hui Su
afaefc89eb Merge "ext-intra: speed up keyframe encoding" into nextgenv2 2016-05-31 23:21:03 +00:00
Hui Su
118167a47d Merge "Add a speed feature for inter tx type search" into nextgenv2 2016-05-31 23:20:57 +00:00
Hui Su
60b52a1334 Merge "Add a speed feature for intra tx type search" into nextgenv2 2016-05-31 23:20:52 +00:00
James Zern
1d9cf262f7 Merge "vp10_inv_txfm2d_test: fix memory leak" into nextgenv2 2016-05-31 23:19:47 +00:00
Alex Converse
aee0091161 Replace some vpxbool calls with entropy coder agnostic calls.
Change-Id: Ifbcd0714fcf994c43b69255185456c7a255df66c
2016-05-31 15:42:19 -07:00
Debargha Mukherjee
faf3c2cd38 Warped motion functions added
Change-Id: I5064ef1421e17c3ecafe70e7ff1fc7db0c16cc8f
2016-05-31 14:03:23 -07:00
hui su
fa933553da ext-intra: speed up keyframe encoding
130% speed increase for keyframe encoding, with 0.4%
compression loss.

When kf-max-dist=150, 1.5% speed increase with 0.03%
compression loss.

Change-Id: I4cf7314ab95b9eb6dd17f314aca8955522c82676
2016-05-31 10:34:44 -07:00
hui su
f523d7b540 Add a speed feature for inter tx type search
Separate prediction mode and tx type search for inter
modes. Enabled for speed >=1.

baseline:
speed increase     40%
compression drop   0.30%/0.29% on lowres/midres

ext-tx:
speed increase    160%
compression drop  1.08%/0.95% on lowres/midres

Change-Id: Ieb34b1ee80df6980d16e26a5783e08cc0deae55b
2016-05-31 10:34:35 -07:00
hui su
38e6dd71bb Add a speed feature for intra tx type search
Add a speed feature to separate prediction mode and tx type search
for intra modes: search for best intra prediction mode with fixed
default tx type first, then choose the best tx type for the
selected mode.

Coding performance drop:
baseline
  lowres 0.10% midres 0.08% hdres 0.14%
with ext-tx
  lowres 0.14% midres 0.25% hdres 0.20%

Speed improvement is 20% for baseline and 17% for ext-tx.

It is turned on for speed >= 1.

Change-Id: Ia5e8d39e8a4e2e42c521bfde938f8b6a98ab24f9
2016-05-31 10:33:56 -07:00
Zoe Liu
e89ca180c2 Make the bi-predictive frame group interval adjustable
This is for the bidir-pred experiment. Previously the length of the
bi-predictive frame group interval is fixed at 2, i.e. one
bi-predictive frame may be inserted every other frame. This patch
makes the length adjustable, i.e. any positive number may be
specified, but the use of the backward ref will be turned off if the
bi-predictive frame group interval is larger than the golden frame
group.

Further, an additional rate factor level has been added: INTER_LOW,
which applies to LAST_BIPRED_UPDATE frames that are not used as
references.

Change-Id: I5514d34a64dd486bbb5756c2d0612946f598a789
2016-05-28 16:46:45 -07:00
Hui Su
6fd7f7dd3e Merge "ext-intra: refactor mode info. writing and reading" into nextgenv2 2016-05-28 04:34:59 +00:00
James Zern
5d237f0986 vp10_inv_txfm2d_test: fix memory leak
input_, ref_input_ and output_ were being allocated with new[] followed
by vpx_memalign, remove the former

Change-Id: Ia16d0f9b9317042a24445095ad3c284f4e7bb481
2016-05-26 20:04:59 -07:00
Hui Su
e717ece4ab Merge "Add a quick path in build_intra_predictors" into nextgenv2 2016-05-26 22:12:53 +00:00
hui su
e5f47d4334 ext-intra: refactor mode info. writing and reading
No performance changes.

Change-Id: I001068330ea217a993aee9b79d7ffead0d23100e
2016-05-26 14:56:40 -07:00
Hui Su
88eaf5d6ce Merge "Skip unnecessary calculations in ext-intra" into nextgenv2 2016-05-26 18:03:02 +00:00
hui su
bad6e169bf Add a quick path in build_intra_predictors
For the cases where no reference data is available.

Change-Id: Ibf1ac9b7073acc2c7fc44da893f3d608dc74bc1e
2016-05-25 15:21:57 -07:00
Yi Luo
469d002f4e Merge "Integrate HBD inverse HT flip types sse4.1 optimization" into nextgenv2 2016-05-25 21:35:14 +00:00
Yi Luo
bfe4c0ae07 Integrate HBD inverse HT flip types sse4.1 optimization
- tx_size: 4x4, 8x8, 16x16.
- tx_type: FLIPADST_DCT, DCT_FLIPADST, FLIPADST_FLIPADST,
  ADST_FLIPADST, FLIPADST_ADST.
- Encoder speed improvement:
  park_joy_1080p_12: ~11%, crowd_run_1080p_12: ~7%.
- Add unit test cases for bit-exact against C.

Change-Id: Ia69d069031fa76c4625e845bfbfe7e6f6ed6e841
2016-05-25 12:32:10 -07:00
James Zern
008f27e70a Merge "add vp10 ActiveMap/ActiveMapRefreshTest" into nextgenv2 2016-05-25 19:05:02 +00:00
Yi Luo
cb507ff29a Merge "HBD inverse HT 8x8 and 16x16 sse4.1 optimization" into nextgenv2 2016-05-24 22:06:07 +00:00
Zoe Liu
cf5083d4cd Added an experiment "bidir_pred" for backward prediction
Major parts have been implemented as follows:
(1) Added BRF_UPDATE, LASTNRF_UPDATE, and NRF_UPDATE in firstpass.c;
(2) Added the handling for the scenario of
"cpi->common.show_existing_frame == 1" at the encoder;
(3) Added a new reference frame of BWDREF_FRAME;
(4) Have bwd-ref work with upsampled references.

Note that when the "ext_refs" experiment is turned on, this experiment
is currently turned off automatically.

RD performance in Overall PSNR has been improved, compared against the
VP10 baseline:

lowres: Avg -3.312; BDRate -3.154
derflr: Avg -1.927; BDRate -1.176
midres: Avg -2.149; BDRate -2.001
hdres : Avg -0.567; BDRate -0.588

Change-Id: I4c06ff51cc20194bffbd4d2346e57ba3dcf6b62c
2016-05-24 13:55:57 -07:00
Yi Luo
28cdee448d HBD inverse HT 8x8 and 16x16 sse4.1 optimization
- Covers tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Encoding speed improves ~27% on crowd_run_1080p_12.
- Merge 4x4, 8x8, 16x16 unit tests in one test file.

Change-Id: I058ef5254d068a9523a826480c78ebbdd231824c
2016-05-24 12:55:30 -07:00
Debargha Mukherjee
89f5b6a0b6 Merge "Remove redundant memcpy from wedge predictor." into nextgenv2 2016-05-24 17:33:54 +00:00
Debargha Mukherjee
416da08102 Merge "Pick up bit-depth from the right place" into nextgenv2 2016-05-24 17:33:22 +00:00
Geza Lore
2935b4db0e Remove redundant memcpy from wedge predictor.
Removing redundant calls to memcpy from
build_wedge_inter_predictor_from_buf yields a net 4% encoder speedup
with ext-inter only. The output is identical.

Change-Id: If97d4e323a5c8aca90c84a25a72085e006b05446
2016-05-24 11:31:18 +01:00
Geza Lore
62b6331753 Pick up bit-depth from the right place
Change-Id: Icbdb036d7927b77b84bd78e8348ec8b5be88df08
2016-05-24 11:08:23 +01:00
hui su
4a741a5d5c Skip unnecessary calculations in ext-intra
Around 5% speedup.

Change-Id: I1c552e4e58fbf5637c0b5a97dd2cc4f83a1ca201
2016-05-23 17:24:19 -07:00
Zoe Liu
a63147ae77 Fix --test-decode=warn to test mismatch
This patch always compares the most recent show frames between
the encoder and the decoder to test the mismatch.

Change-Id: I68a91ad0996a598231450debfd616e24992419b5
2016-05-23 17:01:53 -07:00
Debargha Mukherjee
fb65f9b54b Merge "Add optimized vpx_blend_mask6" into nextgenv2 2016-05-23 23:43:52 +00:00
Geza Lore
a661bc87c4 Add optimized vpx_blend_mask6
This is to replace vp10/common/reconinter.c:build_masked_compound.
Functionality is equivalent, but the interface is slightly more
generic.

Total encoder speedup with ext-inter: ~7.5%

Change-Id: Iee18b83ae324ffc9c7f7dc16d4b2b06adb4d4305
2016-05-23 16:28:58 +01:00
Debargha Mukherjee
fa5022978d Merge "Wedge refactoring to handle signs better" into nextgenv2 2016-05-20 23:19:39 +00:00
Jingning Han
8c9f6c5531 Merge "Clear redundant condition check from vp10_ext_tile_test.cc" into nextgenv2 2016-05-20 22:10:41 +00:00
Debargha Mukherjee
e5de2ad632 Wedge refactoring to handle signs better
Mostly refactoring. Handles signs better though results are
more or less neutral.

Change-Id: If499537c8f8da4f34d104ebfda072eb4c85fb12f
2016-05-20 14:12:52 -07:00
Yaowu Xu
93921097a6 Merge "Properly handle the filter extension in highbd setting" into nextgenv2 2016-05-20 20:00:51 +00:00
Yaowu Xu
17611f2f73 Merge "Fix build when vp8 is disabled" into nextgenv2 2016-05-20 19:59:48 +00:00
Yaowu Xu
7fd0e1b991 Merge "Port change to highbitdepth code path" into nextgenv2 2016-05-20 19:59:41 +00:00
Yaowu Xu
0924bcd824 Fix build when vp8 is disabled
Change-Id: Ie1765f086b10d0f7c4d72961d238dfe0d6056dc2
2016-05-20 11:33:07 -07:00
Yaowu Xu
ba794ea356 Port change to highbitdepth code path
This fixes the encoder crash when configured with both highbitdepth
and dual-filter.

Change-Id: Ie06cc528094f4b31b7fc0ba75e7b15cae031d707
2016-05-20 11:30:37 -07:00
Hui Su
83713e7059 Merge "Use standard rounding in intra filters." into nextgenv2 2016-05-20 16:36:04 +00:00
Jingning Han
f1c283f4de Merge "Rework sub8x8 chroma component inter predictor" into nextgenv2 2016-05-20 15:50:32 +00:00
Jingning Han
7488ae014b Merge "Remove unused private variables from vp10_inv_txfm2d_test.cc" into nextgenv2 2016-05-20 01:23:25 +00:00
Yaowu Xu
a9fc1cc257 Fix a build issue
When both obmc and dual_filter are enabled.

Change-Id: I56b127573a6cca31469bb357cf7a6a9c3df64071
2016-05-19 14:24:41 -07:00
Yue Chen
a33e3d12cb Merge "Fix obmc + ext-interp interference" into nextgenv2 2016-05-19 18:08:52 +00:00
Jingning Han
d84a2e7dc0 Properly handle the filter extension in highbd setting
This commit makes the filter extension in highbd aware of the
dual filter and ext-interp experiments to prevent enc/dec mismatch
when both experiments are turned on.

Change-Id: I11ac1f041bd5f73d61e839d6386d9c5d008da3f7
2016-05-19 09:59:48 -07:00
Jingning Han
e816401a81 Clear redundant condition check from vp10_ext_tile_test.cc
Change-Id: I74e9df9e314e49b931c23a81d14f5a9e143b0b7d
2016-05-19 09:31:18 -07:00
Jingning Han
7d5ccccd47 Remove unused private variables from vp10_inv_txfm2d_test.cc
Change-Id: Ie933d754aca649bdf17cd679b9a31239bf413b63
2016-05-19 09:21:13 -07:00
Yi Luo
5fec33012e Merge "Fix to conform Google's coding convention" into nextgenv2 2016-05-19 16:07:01 +00:00
Jingning Han
0f513752a0 Rework sub8x8 chroma component inter predictor
This commit makes the sub8x8 chroma component inter predictor
operate at 2x2 block level. This allows one to use the actual motion
vector associated with each individual pixel block. It improves the
compression performance:

lowres  0.40%
midres  0.25%
hdres   0.15%

Change-Id: Ia40e07cc7fde463dbf660018850e024932136c4f
2016-05-19 09:03:57 -07:00
Jingning Han
936ed0804d Merge "Account sub8x8 block reference filter type for prob context" into nextgenv2 2016-05-19 15:04:31 +00:00
Jingning Han
300083da27 Merge "Re-structure probability model context for prediction filter type" into nextgenv2 2016-05-19 15:04:18 +00:00
Geza Lore
fa63b5514a Use standard rounding in intra filters.
Use the same rounding method that is used throughout the codebase,
where the halfway value is rounded up rather than down.

Change-Id: Ie969ed7eb9fcc88a93a90c7e274fd82f336c7e4d
2016-05-19 13:16:42 +01:00
Geza Lore
009bd1153e Fix obmc + ext-interp interference
With ext-interp, a switchable interpolation filter is coded iff the
motion vector uses fractional pixel movement (ie, true subpixel
movement). With ext-interp and obmc enabled at the same time, the RD
search proceeds as:
1. Do motion search
2. Do interpolation filter search iff subpixel motion, otherwise use
   EIGHTTAP_REGULAR
3. Evaluate obmc=0
4. Evaluate obmc=1 - This involves another motion search

If the motion search in step 4 yields an integer motion vector, while
the search in step 1 did not, then an interp_filter value other than
EIGHTTAP_REGULAR is invalid, and will cause an assertion failure
at output time, or a mismatch if not using --enable-debug.

The fix sets the interp_filter to EIGHTTAP_REGULAR if obmc=1 is picked
with an integer motion vector.

Change-Id: I4685d1ad537f41d833dc9eb64845956b67886cca
2016-05-19 11:30:07 +01:00
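The fix described in 009bd1153e can be sketched as below. The types, the `OTHER_FILTER` enumerator, and the function names are hypothetical; only `EIGHTTAP_REGULAR` is named in the commit message. The sketch also assumes motion vectors are stored in 1/8-pel units, so that the low three bits of each component hold the fractional part.

```c
#include <stdint.h>

/* Assumption: motion vector components are in 1/8-pel units. */
typedef struct {
  int16_t row;
  int16_t col;
} Mv;

typedef enum {
  EIGHTTAP_REGULAR = 0, /* named in the commit message */
  OTHER_FILTER = 1      /* hypothetical stand-in for any other filter */
} InterpFilter;

/* True if either component has a fractional (subpixel) part. */
static int mv_has_subpel(Mv mv) {
  return (mv.row & 7) != 0 || (mv.col & 7) != 0;
}

/* If obmc=1 was picked and its motion vector is integer, the filter found
 * by the earlier search is no longer valid, so fall back to
 * EIGHTTAP_REGULAR. Otherwise keep the searched filter. */
static InterpFilter fix_filter_after_obmc(int obmc_picked, Mv obmc_mv,
                                          InterpFilter searched) {
  if (obmc_picked && !mv_has_subpel(obmc_mv)) return EIGHTTAP_REGULAR;
  return searched;
}
```

With these definitions, an obmc motion vector of (8, -16) is integer and forces `EIGHTTAP_REGULAR`, while (8, 3) keeps whatever filter the search selected.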
Yi Luo
346d2449f0 Fix to conform Google's coding convention
- Confirm input coeff buffer is 16-byte aligned.
- sizeof() prefer variable name instead of type.
- Fix function name (Capital first letter then Pascal case).
- Long base class name uses a newline (with colon and 4 space indent).
- Remove an unnecessary reference function variable.
- Method declaration precedes variable declaration in class definition.

Change-Id: I317f7e679926b5219f58c5f7d14512e94985e7fe
2016-05-18 18:15:53 -07:00
Zoe Liu
011f020447 Refactor on getting upsampled reference frame
Reused a function that has been used in getting the normal
reference frames.

Change-Id: Ic4f7dac5c396d689a72699ab79fd580747f8bd65
2016-05-18 16:00:23 -07:00
Jingning Han
9161464f6c Account sub8x8 block reference filter type for prob context
If a reference block is coded with sub8x8 block size, and if it
has sub-pixel level motion vectors, its prediction filter type
should be used as context information.

The coding performance gains of dual filter type coding scheme are
lowres  0.57%
hdres   0.88%

Change-Id: I68b98f2518d02f11c29d0256aeb45b2580fe5cac
2016-05-18 12:35:31 -07:00
Angie Chiang
b0612009bd Merge "Turn on flip in inverse txfm2d" into nextgenv2 2016-05-18 18:37:41 +00:00
Angie Chiang
6f28581b26 Turn on flip in inverse txfm2d
Fix build failure
Reduce txfm test time

Change-Id: Ieaf6b27f3a272d06286f817f01230413fa8adcf6
2016-05-18 11:26:57 -07:00
James Zern
8c125eaf28 Merge "vpx_dsp/*.[hc]: add missing vpx_dsp_rtcd.h include" into nextgenv2 2016-05-18 18:25:00 +00:00
Jingning Han
27d44a1843 Re-structure probability model context for prediction filter type
This commit reworks the probability model contexts used in the
prediction filter type entropy coding.

Change-Id: I7abc68cb469248d0d7ca1046da3c086ecb7b066a
2016-05-18 11:11:43 -07:00
Yi Luo
18ecb16c30 Merge "Integrate HBD row/column flip fwd txfm SSE4.1 optimization" into nextgenv2 2016-05-18 17:45:45 +00:00
Debargha Mukherjee
f1ddf6eb04 Merge "Reducing computation of interintra modes" into nextgenv2 2016-05-18 17:21:15 +00:00
James Zern
297c752106 vpx_dsp/*.[hc]: add missing vpx_dsp_rtcd.h include
Change-Id: I103be7eee36492f8619144ce8325bc916d4975c7
(cherry picked from commit 2184692c07c15290b424d538ae942d5f60eb7df8)
2016-05-18 05:39:49 +00:00
Yi Luo
1d307368a9 Integrate HBD row/column flip fwd txfm SSE4.1 optimization
- Integrate 5 flip transform types for each 4x4, 8x8, and 16x16
  block, for experiment, EXT_TX.
- Encoder speed improves about 12%-15%.
- Update the unit tests for bit-exact result against C.

Change-Id: Idf27c87f1e516ca5b66c7b70142477a115404ccb
2016-05-18 03:48:01 +00:00
Jingning Han
9f55543c06 Merge "Silence compiler warnings in unsigned int" into nextgenv2 2016-05-18 01:18:46 +00:00
Jingning Han
436f78fab7 Silence compiler warnings in unsigned int
Add suffix u to clarify the unsigned int constant when the value
is above 2^31.

Change-Id: Ic712096285b7bf37deaeb5ad1b6b117fc0d67093
2016-05-17 16:46:42 -07:00
Debargha Mukherjee
049dbe7786 Reducing computation of interintra modes
Use model for interintra mode search.
Speed-up about 5-10% with about 0.04 drop in efficiency.

lowres: -2.60%

Change-Id: I825bf0ba8a46eb7f19fc528c25b8df066fb8ea95
2016-05-17 07:28:06 -07:00
James Zern
a81a75184c Merge "vp10/rdopt,rd_pick_intra4x4block: port tsan fix from vp9" into nextgenv2 2016-05-17 03:04:00 +00:00
James Zern
8eba4ac46e vp10/rdopt,rd_pick_intra4x4block: port tsan fix from vp9
minus the non-existent nonrd portion. original change:

commit d642294b1c57a5adacb1038ff45766c38bae8a6d
Author: Jingning Han <jingning@google.com>
Date:   Thu Feb 11 12:36:49 2016 -0800

    Fix tsan error in VP9 sub8x8 intra mode search

    This commit fixes issue 1141. The issue was triggered in multi-tile
    encoding. The change properly saves and restores the block context
    information in the real-time mode selection process. It removes
    several redundant memcpy operations in sub8x8 intra block mode
    search.

    Change-Id: I35c9ad197f4bd500ec39b5fc833f052f19eee010

Change-Id: Idfa38c54c9e645479f6870d46f71fb1e91c071da
2016-05-16 17:20:29 -07:00
Jingning Han
4677e1a718 Unify the per directional filter type system for compound modes
For the current stage, we assume a single prediction filter type
per direction in the settings of compound inter prediction modes.

Change-Id: I12a1afdd364b93fcee870bd11ad01fc40ab48cff
2016-05-16 14:41:08 -07:00
Jingning Han
d567e14e81 Enable per motion component filter type selection
Change-Id: I73823fc94f296d225dece7156de71b30bae3fcb7
2016-05-16 14:38:43 -07:00
Jingning Han
c4e7fde68a Merge "Properly handle 2D filter boundary extension" into nextgenv2 2016-05-16 21:34:28 +00:00
Yi Luo
ceabb00704 Merge "HBD inverse HT 4x4 SSE4.1 optimization" into nextgenv2 2016-05-16 21:15:08 +00:00
Debargha Mukherjee
250c6af087 Merge "Various wedge enhancements" into nextgenv2 2016-05-16 21:11:35 +00:00
Debargha Mukherjee
fb8ea1736b Various wedge enhancements
Increases number of wedges for smaller block and removes
wedge coding mode for blocks larger than 32x32.

Also adds various other enhancements for subsequent experimentation,
including adding provision for multiple smoothing functions
(though one is used currently), adds a speed feature that decides
the sign for interinter wedges using a fast mechanism, and refactors
wedge representations.

lowres: -2.651% BDRATE

Most of the gain is due to increase in codebook size for 8x8 - 16x16.

Change-Id: I50669f558c8d0d45e5a6f70aca4385a185b58b5b
2016-05-16 12:41:47 -07:00
Jingning Han
14dd5538e9 Properly handle 2D filter boundary extension
The amount of border extension needed in the first stage inter
filtering is decided by the length of the second stage filter
kernel.

Change-Id: Icddbc58c02234d5df09ff0eeebcf166ffe689203
2016-05-16 11:49:27 -07:00
Angie Chiang
af87a7b0ca Merge changes I6aa75c66,Id5f0fade,I368d365e,Ibaf7b00b into nextgenv2
* changes:
  Refactor and add flip unit test to vp10_inv_txfm2d_test.cc
  Add flip feature to vp10_inv_txfm2d.c
  add unit test for highbd flip transform
  Refactor vp10_fwd_txfm2d_test.cc
2016-05-16 18:12:18 +00:00
Angie Chiang
fdaad9f673 Refactor and add flip unit test to vp10_inv_txfm2d_test.cc
Change-Id: I6aa75c66429a0178852cf8df88f16eaa8e36b629
2016-05-13 12:30:51 -07:00
Angie Chiang
909bbe734a Add flip feature to vp10_inv_txfm2d.c
Change-Id: Id5f0fade42749d2bed5553eda0d690af22b6c5b1
2016-05-13 12:21:58 -07:00
Angie Chiang
6a75253311 add unit test for highbd flip transform
Change-Id: I368d365ee0f58373bc399b615febd790addb2c36
2016-05-13 12:20:06 -07:00
Angie Chiang
716f1bd46c Refactor vp10_fwd_txfm2d_test.cc
Change-Id: Ibaf7b00bfe247df3e665ea3a0241667cb130e16c
2016-05-13 12:13:31 -07:00
Yi Luo
a3a69b400c HBD inverse HT 4x4 SSE4.1 optimization
- Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Encoder overall instruction count drops 2.91%.
- Decoder overall instruction count drops 1.01%.
- Add unit test to test bit-exact result against C.

Change-Id: I908c9e0e5106c58f67dd72d28760e6c9ce54278e
2016-05-13 12:08:43 -07:00
Jingning Han
09ed43ed56 Add static to memcpy_short_addr
Change-Id: I4a713784395bf13aaba2f7b175a6e93d50373a2f
2016-05-12 16:16:52 -07:00
Jingning Han
5b573d650a Fix vp10_inv_txfm2d.round_trip test failure
Avoid accessing transform type that is not 2D-DCT if the transform
size is 64x64. This fixes an assert failure in this unit test.

Change-Id: I0dee865ea0925f5743b8a25c2f90eb6522b4d272
2016-05-12 16:09:02 -07:00
Jingning Han
cacd634791 Fix vp10_get_inv_txfm_64x64_cfg
Add a missing break statement to prevent unintended behavior.

Change-Id: I54ecc95d4a35d4011e85af5635c94015cc944331
2016-05-12 16:07:23 -07:00
Jingning Han
ddee66f2e4 Refactor the inter predictor for supertx
This commit unifies the inter predictor used in supertx at both
encoder and decoder sides. It removes the redundant decoder
implementations related to border extension.

Change-Id: I03985cee52604a518394232fa9258ce057af9c00
2016-05-12 14:43:24 -07:00
Jingning Han
7ac38a7143 Merge "Unify the use of inter predictor in encoder and decoder" into nextgenv2 2016-05-12 21:27:07 +00:00
Jingning Han
d6c881358b Unify the use of inter predictor in encoder and decoder
This commit unifies the inter predictor used in the encoder and
decoder side for super-tx experiment. This resolves an enc/dec
mismatch found in nextgenv2 nightly-run unit test.

Change-Id: I16ab8d6063edf9d2fba79473f470f1a592cc10a0
2016-05-12 12:52:30 -07:00
Yunqing Wang
e7ebe26dd5 Merge "Add decoder APIs and unit tests in tile-coding experiment" into nextgenv2 2016-05-12 19:05:58 +00:00
Angie Chiang
1e587ae616 Merge "Add flip option for vp10_fwd_txfm2d_#x#_c" into nextgenv2 2016-05-12 18:08:28 +00:00
Jingning Han
aad8c94fb7 Merge "Fix highbd masked variance function declaration" into nextgenv2 2016-05-12 01:29:38 +00:00
Yunqing Wang
8e5e338727 Add decoder APIs and unit tests in tile-coding experiment
In the tile-coding experiment,
1. In tile decoder, added 2 set control APIs:
   VP10_SET_DECODE_TILE_ROW and VP10_SET_DECODE_TILE_COL. It allowed
   users to set the range of decoding at frame level.
2. Added a unit test while tile-coding experiment is on. It tested
   both tile encoder and decoder to make sure the encoded frame
   can be decoded as a whole frame or as independent tiles.

Change-Id: I73fd0632b685047cb9376008127cde72efa3fb2b
2016-05-11 16:47:26 -07:00
James Zern
18112f6724 add vp10 ActiveMap/ActiveMapRefreshTest
currently disabled as they result in ASan errors

BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1224

Change-Id: I9c80910adc5dc2cd6eccb3030d33043df53e7ec5
2016-05-11 16:33:29 -07:00
Jingning Han
5538c2cb00 Fix highbd masked variance function declaration
Fix the variable type mismatch between highbd_calc_masked_var_t and
the actual function definition. This clears the related compiler
warnings in highbd with ext-inter experiment.

Change-Id: I0423318b16c867ed84700084ad21ca6e42edb321
2016-05-11 15:52:58 -07:00
Debargha Mukherjee
55598d1b3d Merge "Cost wedge sign/index properly in rdopt." into nextgenv2 2016-05-11 20:55:40 +00:00
Angie Chiang
f8629918a8 Merge "Remove vp10_fwd_txfm2d_sse4_test.cc" into nextgenv2 2016-05-11 19:04:03 +00:00
Geza Lore
c1b739014f Cost wedge sign/index properly in rdopt.
Lowres improves by about 0.1%

lowres: -2.164 BDRATE

Change-Id: I393bbb92700bfbb8763ace424f4edc2d672a74b4
2016-05-11 11:59:10 -07:00
Debargha Mukherjee
c590c590e6 Merge "Adjust smoothing function for wedge to be sharper" into nextgenv2 2016-05-11 18:02:08 +00:00
Yue Chen
372e12b959 Merge "Add single motion search for OBMC predictor" into nextgenv2 2016-05-11 17:20:32 +00:00
Debargha Mukherjee
81abbc203e Adjust smoothing function for wedge to be sharper
Improves performance by 0.2%

lowres: -2.052% BDRATE

Also increases precision of the shift parameters (for further
investigation into different wedge shifts).

Change-Id: I59fcab9baa002e52a6487ed8d617185840a678ed
2016-05-11 09:35:43 -07:00
Geza Lore
0778f05cab Compute end of frame precisely with selective tile decoding.
Change-Id: I0ee480d437411bebe240bedff204682833efb131
2016-05-11 11:11:14 +01:00
Yue Chen
370f203a40 Add single motion search for OBMC predictor
Weighted single motion search is implemented for obmc predictor.
When NEWMV mode is used, to determine the MV for the current block,
we run weighted motion search to compare the weighted prediction
with (source - weighted prediction using neighbors' MVs), in which
the distortion is the actual prediction error of obmc prediction.

Coding gain: 0.404/0.425/0.366 for lowres/midres/hdres
Speed impact: +14% encoding time
              (obmc w/o mv search 13%-> obmc w/ mv search 27%)

Change-Id: Id7ad3fc6ba295b23d9c53c8a16a4ac1677ad835c
2016-05-10 18:27:45 -07:00
Angie Chiang
1954fa390f Add flip option for vp10_fwd_txfm2d_#x#_c
Will add unit test to test/vp10_fwd_txfm2d_test.cc later

Change-Id: I626900c67fca4eee2ad0ae1828188527a04a5362
2016-05-10 18:14:57 -07:00
Angie Chiang
b5331459c2 Remove vp10_fwd_txfm2d_sse4_test.cc
Functions vp10_fwd_txfm2d_#x#_sse4_1 tested in this file
will be tested in vp10_fhts#x#_test.cc
Remove this to avoid duplication

Change-Id: Iaf21ab85b9a164fcf2a4574b3e13217e43b6255e
2016-05-10 17:06:40 -07:00
Yaowu Xu
dc73c3332e Merge "Move count buffers from stack to heap" into nextgenv2 2016-05-10 23:58:59 +00:00
Jingning Han
005564813d Merge "Remove unused highbd_fdct32x32 function" into nextgenv2 2016-05-10 23:16:41 +00:00
Jingning Han
4b639fcf43 Merge "Remove unused highbd_ihalfcenter32_c function" into nextgenv2 2016-05-10 23:16:35 +00:00
Jingning Han
dd4352b5cd Merge "Fix high bit-depth build with ext-inter and dual-filter exps" into nextgenv2 2016-05-10 23:16:26 +00:00
Yaowu Xu
102cdf94ed Move count buffers from stack to heap
This fixes the stack overflow issue on MSVC build.

Change-Id: Icb0a78e5992a097d2192979ec2432546eaa452dd
2016-05-10 14:49:26 -07:00
Jingning Han
5cf3408ba1 Remove unused highbd_fdct32x32 function
The encoder is using vp10_fwd_txfm2d_32x32 now.

Change-Id: I719f18ec0b065f1e062d01fd300533dd2f17c712
2016-05-10 14:33:34 -07:00
Jingning Han
6b9a507f82 Remove unused highbd_ihalfcenter32_c function
Change-Id: I4390fcbdf353d79dadc021d83d40891e518997dc
2016-05-10 14:27:16 -07:00
Jingning Han
f28550d348 Fix high bit-depth build with ext-inter and dual-filter exps
Change-Id: Ie4a884899d73cafea439ecab6ff4de54652b8c28
2016-05-10 14:17:13 -07:00
Sarah Parker
f6acf8ad7c Move new quant experiment in quant_common.c from nextgen
NEW_QUANT allows bin widths to be modified as a factor of the nominal
quantization step size. This adds functions to get dequantization
values based on the dequantization offset and 3 knots for a single
quantization profile.

Change-Id: I41f10599997e943cb3391c7a0847d8485b9d8b43
2016-05-10 13:40:37 -07:00
Debargha Mukherjee
3fbe6e5e49 Merge "Wedge rd improvements" into nextgenv2 2016-05-10 20:34:00 +00:00
Debargha Mukherjee
447032eb32 Wedge rd improvements
Improves speed by about 10-15% by combining y-only rd with
modeling function in a better way.
Also, coding efficiency improves by about 0.1%

lowres: -1.805% BDRATE with ext-inter

Change-Id: I6ef1f8942ec6806252f3fcf749ae4f30dffe42b1
2016-05-10 11:47:48 -07:00
Alex Converse
fcc7edd48f Merge "Fix some ans const warnings." into nextgenv2 2016-05-10 18:26:53 +00:00
Yaowu Xu
bf692e853d Merge "Fix build without dual-filter" into nextgenv2 2016-05-10 18:10:44 +00:00
Yaowu Xu
0f1ee1caeb Merge "Remove "const" for parameters passed by value" into nextgenv2 2016-05-10 18:05:27 +00:00
Debargha Mukherjee
03009b2e9e Merge "Use multiple tiles in V10 tile independence tests." into nextgenv2 2016-05-10 18:01:08 +00:00
Debargha Mukherjee
61ac55314d Merge "Break tile row dependencies." into nextgenv2 2016-05-10 18:00:34 +00:00
Debargha Mukherjee
a53337740d Merge "Fix interintra predictor buffer overflow." into nextgenv2 2016-05-10 17:59:33 +00:00
Yaowu Xu
fc9deb6b0c Remove "const" for parameters passed by value
This commit removes const from parameters that are passed by value
for consistency in code style.

Change-Id: I2947c4e9cc6e809c4b9b4c162046e45127b8a41c
2016-05-10 09:30:44 -07:00
Yunqing Wang
a2676565d3 Merge "Refine VP10 REFRESH_FRAME_CONTEXT_MODE" into nextgenv2 2016-05-10 15:53:48 +00:00
Yi Luo
73d28a4068 Merge "Change inverse HT function argument from TXFM_2D_CFG* to int" into nextgenv2 2016-05-10 15:38:11 +00:00
Yaowu Xu
9e41323f13 Merge "Make type conversions explicit" into nextgenv2 2016-05-10 14:33:05 +00:00
Geza Lore
559e8d8e50 Fix build without dual-filter
Change-Id: I91946940c1540c9f935161da89155ed304055fda
2016-05-10 13:12:07 +01:00
Geza Lore
d29062c4da Use multiple tiles in V10 tile independence tests.
Change-Id: I6e5c1cbe1bf40d2f7a0d8bd821cac8ce626ce3b8
2016-05-10 13:09:54 +01:00
Geza Lore
9ab9438fbb Break tile row dependencies.
When not using ext-tile, there were still dependencies between tile
rows due to various tools (eg intra predictors) relying on the above
row or above mode info, which can be in the above tile. This is now
broken (the same way as it was when ext-tile is enabled) by fixing
the appropriate predicates.

Change-Id: I107dd0d8481775a792f14e05cfbbd761f16cdc1e
2016-05-10 13:09:47 +01:00
Geza Lore
e9d2e36264 Fix interintra predictor buffer overflow.
When constructing the intra predictor for rectangular interintra blocks,
the last row/column of the first square is copied back into the source
image (which is the current reconstructed image buffer) before
predicting the second square. The code used to use the height instead
of width for vertical rectangles, and vice versa for horizontal
rectangles, leading to overwriting the block on the right/below. This
leads to an encode/decode mismatch if the right/below block is in a
different tile and is encoded before the current block, which did happen
with multi-threaded encoding tests. This is now fixed.

Change-Id: I073a2a447a98b842b1394d72cc774a78cb296921
2016-05-10 09:53:29 +01:00
Geza Lore
e0dcab9d0c Print mismatch location for failing tests.
Change-Id: Ied6929bf5ac41ca25ee4df4ef19edada5bf1e8cd
2016-05-10 09:53:29 +01:00
Yi Luo
cd8cfb8675 Change inverse HT function argument from TXFM_2D_CFG* to int
This change has no performance impact. It prepares the proper
function interface for better performance optimization.

Change-Id: I12e2f2deaf7f3adc603de0a74852116468c762f6
2016-05-09 18:34:16 -07:00
Yi Luo
6f3e71606f Merge "HBD hybrid transform 16x16 SSE4.1 optimization" into nextgenv2 2016-05-09 23:58:05 +00:00
Alex Converse
a2db5815c0 Merge "Fix ans+ref_mv build." into nextgenv2 2016-05-09 23:55:48 +00:00
Jingning Han
0a91b2da26 Merge "Fix unit test failure due to ext-inter and dual filter" into nextgenv2 2016-05-09 23:54:07 +00:00
Alex Converse
6e14846c9f Fix some ans const warnings.
Change-Id: I508d497803d0c1085aa6a8b26d7a574cb27dd6e2
2016-05-09 16:50:03 -07:00
Sarah Parker
d119a5f5c8 Merge "Edit ext-tx so it isn't doing redundant prunes" into nextgenv2 2016-05-09 22:57:37 +00:00
Alex Converse
8c9da4e943 Fix ans+ref_mv build.
Use vp10_read/write instead of vpx_read/write.

Change-Id: I2b7f17e9cdbea14ff48f4bd9776dd3e6aff17a2b
2016-05-09 15:35:58 -07:00
Alex Converse
afad52c670 Merge "Remove the ANS rename on pack_mb_tokens()." into nextgenv2 2016-05-09 22:16:35 +00:00
Jingning Han
6b8acc2868 Merge "Fix dual filter type for high bit-depth" into nextgenv2 2016-05-09 22:06:09 +00:00
Yunqing Wang
484ba02435 Refine VP10 REFRESH_FRAME_CONTEXT_MODE
In VP10, REFRESH_FRAME_CONTEXT_OFF mode is only set when the error
resilient mode is on. Instead of being used to decide how to update
the frame contexts, it is used to decide whether or not to reset the
frame contexts.

To verify, ran borg test on lowres set. The result is neutral.
Overall PSNR: -0.006%; SSIM: -0.006%.

Change-Id: Ic48265cf7488e80c6f5aab3eef7ba1c273506419
2016-05-09 14:20:50 -07:00
Sarah Parker
f546383b73 Edit ext-tx so it isn't doing redundant prunes
The original pruning function was not taking into account
that certain tx sizes/block sizes use a reduced tx set.

Prune 1: -0.3% performance drop, 20% speedup on foreman video
Prune 2: -0.48% performance drop, 30% speedup on foreman video

Change-Id: I557e919d97a89f787b47b3c8579a080db57f91d0
2016-05-09 13:35:42 -07:00
Zoe Liu
b9d0d3f4c7 Turn on the use of upsampled refs for ext-refs
Without this patch, the experiment of ext-refs showed almost no coding
gains compared to the baseline. This is because when ext-refs is on, the
use of upsampled reference is off.

With this patch, the ext-refs experiment works with the upsampled
references and shows coding gains in Overall PSNR as follows, with ~5%
slow down for encoding time:

lowres: Avg - -0.965;  BDRate - -0.844
derflr: Avg - -0.847;  BDRate - -0.669

Note that the previous patch a912c6ec314d816767a4c3eb4e5e1bddcc4c1186
that "Make LAST_FRAME always point to the newly coded frame in ext-refs"
made ext-refs work with the upsampled references.

Change-Id: Id79248d71760109fb9198af4f45718b17455555f
2016-05-09 13:34:08 -07:00
Alex Converse
4d22cc1578 Remove the ANS rename on pack_mb_tokens().
This fixes the ans+var_tx combination.

Change-Id: I4c34edb1deac4475c97ce1907c1d6bdf23ce3fc0
2016-05-09 12:02:01 -07:00
Yi Luo
412ad22f46 HBD hybrid transform 16x16 SSE4.1 optimization
- Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Update vp10_fht16x16_test.cc to do bit-exact test against
  latest C version.
- HBD encoder speed improves ~1.8%.

Change-Id: Icfc799a212e5289bcf6cedcae3722032133a2bc6
2016-05-09 11:07:01 -07:00
Jingning Han
1215793007 Fix unit test failure due to ext-inter and dual filter
Make the inter predictor use the right filter type to avoid
enc/dec mismatch.

Change-Id: I2aa416d50450188ec2057dca3338fa258314e562
2016-05-09 16:41:57 +00:00
Geza Lore
1d2d1e752e Merge "Add SSE2 versions of 128x128 vpx_sad*" into nextgenv2 2016-05-09 10:30:59 +00:00
Geza Lore
edf6a708c1 Merge "Unbreak VP9 threading tests." into nextgenv2 2016-05-09 10:30:46 +00:00
Jingning Han
9de916eb20 Fix dual filter type for high bit-depth
This commit fixes the compiler error in high bit-depth inter
predictor when dual filter type experiment is turned on.

Change-Id: I404a76a246477f2fcffc38a3275007d5dfe229cd
2016-05-09 02:14:48 +00:00
Jingning Han
df56fcdf52 Merge "Reduce sizes of some of the tests" into nextgenv2 2016-05-09 02:14:38 +00:00
Yaowu Xu
98c59c98ba Make type conversions explicit
This eliminates MSVC compiler warnings.

Change-Id: Id6ace2586ed7c6248366905b133448fe8ecbd53d
2016-05-07 20:33:40 +00:00
Yaowu Xu
569101bed8 Merge "Make parameter types consistent" into nextgenv2 2016-05-07 20:33:25 +00:00
Yaowu Xu
1a6ec3c756 Merge "Change initializations to be compatible with MSVC" into nextgenv2 2016-05-07 20:33:12 +00:00
Zoe Liu
a912c6ec31 Make LAST_FRAME always point to the newly coded frame in ext-refs
This patch changes the encoder only for the ext-refs experiment. For
each newly coded frame to refresh the LAST_FRAME, the decoder is
notified that the LAST4_FRAME is to be refreshed, and reads out the
updated reference frame buffer virtual indexes for the next coded
frame in a way that:
LAST4_FRAME => LAST_FRAME,
LAST_FRAME  => LAST2_FRAME,
LAST2_FRAME => LAST3_FRAME, and
LAST3_FRAME => LAST4_FRAME.

Compared against the original ext-refs experiment in TOT, a small gain
is achieved in overall PSNR:
lowres Avg: -0.154
lowres BDRate: -0.044

Change-Id: I648810c146a3cd915b408274a9373b7d38324864
2016-05-07 00:27:51 -07:00
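
The index rotation described in the commit above can be sketched roughly as follows. This is only an illustration under stated assumptions: the enum values and the function `rotate_last_refs` are hypothetical names, not taken from the codebase, and the real encoder tracks more state than a four-entry map.

```c
/* Hypothetical slot names for illustration; not the codebase's own enum. */
enum { LAST = 0, LAST2 = 1, LAST3 = 2, LAST4 = 3, NUM_LAST_REFS = 4 };

/* map[slot] holds the virtual buffer index that the slot points at.
 * The newly coded frame has just been written into the buffer that LAST4
 * pointed at, so that buffer becomes LAST and every other slot ages by one:
 * LAST -> LAST2, LAST2 -> LAST3, LAST3 -> LAST4. */
void rotate_last_refs(int map[NUM_LAST_REFS]) {
  const int refreshed = map[LAST4];
  for (int i = LAST4; i > LAST; --i) map[i] = map[i - 1];
  map[LAST] = refreshed;
}
```

With this scheme only one buffer is overwritten per coded frame; the other references are renamed rather than copied.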
Jingning Han
bd33326372 Dual prediction filter type for motion compensated reference
Make the bit-stream level support per direction filter type coding
for motion compensated reference.

Change-Id: I61a2360b301075f6734cfd9711b7ae68f214174d
2016-05-07 03:03:04 +00:00
Debargha Mukherjee
a5c4dcb553 Reduce sizes of some of the tests
Change-Id: I846410bd61253d0271c6315d266c6edc2808621d
2016-05-06 17:23:01 -07:00
Yi Luo
7c5fd6aadc Merge "Normalize naming/testing convention in vp10_fht8x8_test.cc" into nextgenv2 2016-05-06 23:48:17 +00:00
Yaowu Xu
ad841b7dac Make parameter types consistent
This fixes compiler warnings from MSVC.

Change-Id: Iaac0e994869561371295578a893f766493ce0544
2016-05-06 23:39:46 +00:00
Yi Luo
fcf54fb7df Normalize naming/testing convention in vp10_fht8x8_test.cc
Use clear and correct type/function names.
Add ASM_REGISTER_STATE_CHECK wrapper for SSE4.1 function.
Conform to the EXPECT_EQ(expected, actual) macro convention.

Change-Id: I26c6430bea98a4fcb9727eb411b86a3b7abce933
2016-05-06 14:32:12 -07:00
Yaowu Xu
33993798e1 Change initializations to be compatible with MSVC
Change-Id: If5473dadc40d3caea61953fbd112a01939dc1183
2016-05-06 14:20:15 -07:00
Yaowu Xu
71d4e444c1 Merge "Change initializations of variables with type "int_mv"" into nextgenv2 2016-05-06 20:21:39 +00:00
Jingning Han
eb366a0312 Merge "Clean up ext-interp experiment" into nextgenv2 2016-05-06 17:16:56 +00:00
Alex Converse
d5a5bc6522 Merge "Rename pick_filter_intra." into nextgenv2 2016-05-06 16:57:53 +00:00
Yaowu Xu
824a8b228d Change initializations of variables with type "int_mv"
This is to make MSVC happy and eliminate build errors.

Change-Id: Ic81e7c7516923913e6e7a652b691953e4a1af8aa
2016-05-06 16:52:12 +00:00
Yaowu Xu
3be8c5fc64 Merge "Replace inline with INLINE" into nextgenv2 2016-05-06 16:48:54 +00:00
Geza Lore
a0e1c23277 Add SSE2 versions of 128x128 vpx_sad*
Encoder speedup with all experiments enabled approx 15%.

Change-Id: Ib3c771d8da00989ddc9112b71b48ce7c5594e91a
2016-05-06 14:18:00 +01:00
Geza Lore
855b6d7a56 Unbreak VP9 threading tests.
Change-Id: If4eb7094986513ee2e49f7456a2248ad1c54d833
2016-05-06 13:27:52 +01:00
Yaowu Xu
f2512710d5 Replace inline with INLINE
This fixes build issues under MSVC

Change-Id: I6db6a43cba2e8ddb099b676f1ae019fe2742f366
2016-05-05 18:28:04 -07:00
Alex Converse
130cccba8d Rename pick_filter_intra.
The word 'pick' is usually used in functions that make decisions where
the bitstream allows multiple legal choices, and not to limit the
bitstream format itself.

Change-Id: Ia60709c29e004475e1aa8861aefded27ebaf4712
2016-05-05 17:06:54 -07:00
Jingning Han
8b084b683c Clean up ext-interp experiment
Remove the unused sub-experiments within the ext-interp experiment.

Change-Id: I716e3392412d02c56f9395a86c9cab02f580fa59
2016-05-05 16:29:21 -07:00
Jingning Han
2041979a09 Remove misc-fixes flag from the experimental list
This flag is not in effect in the codebase. The related contents
have been merged.

Change-Id: I7125ccbedf39e4683d117ecb72ffdd7547c23fc4
2016-05-05 16:11:32 -07:00
Sarah Parker
9dfe45a84d Merge "Add 1D tx set that corresponds to reduced ext tx inter sets" into nextgenv2 2016-05-05 23:06:14 +00:00
Sarah Parker
867f664e17 Add config flag for new_quant experiment
Change-Id: I3575f688ad473d9750a16c7dae74f5f97d026b26
2016-05-05 22:15:23 +00:00
Zoe Liu
adaa685215 Merge "Add the experiment of bidir-pred" into nextgenv2 2016-05-05 22:04:57 +00:00
Zoe Liu
9e974b86de Add the experiment of bidir-pred
This experiment will implement the use of a backward prediction
reference without temporal filtering. No overlay frame will be
transmitted; instead, the show_existing_frame flag will be turned
on.

Change-Id: I361a3004344e2ca6b63723f660635c0d790ee036
2016-05-05 14:32:48 -07:00
Jingning Han
4862363e6a Merge "Remove a redundant variable definition from sub8x8 RD loop" into nextgenv2 2016-05-05 21:17:31 +00:00
Debargha Mukherjee
47fd87e77f Merge "Fix mismatch with ext-interp." into nextgenv2 2016-05-05 21:06:51 +00:00
Debargha Mukherjee
8dcb06fd20 Merge "Refactor supertx rd search" into nextgenv2 2016-05-05 21:06:18 +00:00
Jingning Han
cf51217148 Remove a redundant variable definition from sub8x8 RD loop
Change-Id: I464cbb75fbd3872f66ca024dd803605542a9d887
2016-05-05 12:41:05 -07:00
Geza Lore
1502d9e44a Fix mismatch with ext-interp.
The encoder signals the interp filter type in the frame header if all
blocks use the same filter (see bitstream.c:fix_interp_filter). This
decision is made based on the counts, but with ext-interp, the counts
are actually only incremented for blocks that fail vp10_is_interp_needed
(see for example encodeframe.c:update_state), otherwise a default value
is used (EIGHTTAP_REGULAR). The decoder however first checks if the
interp filter is signaled at the frame level, and uses that filter type
for all blocks, even if the default value should have been used.

This patch makes the decoder first check with vp10_is_interp_needed
to see if the default value should be used and then checks the frame
level signaling, which reconciles the difference between encoder and
decoder.

Change-Id: I87857ade42dea06b0d5ec2a029e9219268334dbb
2016-05-05 18:24:21 +01:00
Geza Lore
688f9ed6c3 Improve multi-threaded encode/decode test.
The test used to test that multi-threaded encode/decode resulted in
the same reconstructed image as single-threaded encode/decode. This
however did not mean that the multi-threaded encoder produced the same
bitstream as the single-threaded encoder, as the multi-threaded encoder
could use different forward probability updates and still produce a
bitstream that is suboptimal but yields the same reconstructed image.
The test now asserts that the bitstream is the same as well as the
reconstructed image. Also added more cpu-use values for testing VP10.

Change-Id: I324ed33a702c488b39e077f750d81a1ad1d7ea87
2016-05-05 18:23:31 +01:00
Geza Lore
a905c45c77 Refactor supertx rd search
General code cleanup, but also use the same supertx condition for
ext-partition-types as for conventional partitions.

Change-Id: If86eb18b3c07b9c60434eec2c98b97ce93665b67
2016-05-05 18:20:12 +01:00
Geza Lore
c9cb346e56 Merge "Fix vp10_accumulate_frame_counts once and for all." into nextgenv2 2016-05-05 10:19:41 +00:00
Yaowu Xu
8502727a59 Merge "Change to call build_masked_compound_highbd()" into nextgenv2 2016-05-05 04:09:26 +00:00
Jingning Han
8b2a708b19 Merge "Refactor intra filter type context fetch function" into nextgenv2 2016-05-05 03:32:06 +00:00
Yaowu Xu
f0c7e76717 Change to call build_masked_compound_highbd()
from combine_interintra_highbd(). This fixes a crash in encoder in
highbitdepth build.

Change-Id: I0aa4cc30200703ff21e9990163bb26ace41aabbc
2016-05-04 15:58:15 -07:00
Jingning Han
928d72f365 Refactor intra filter type context fetch function
Factor out common code from vp10_get_pred_context_intra_interp().
This prevents a potential invalid access of pointers xd->left_mbmi
and xd->above_mbmi.

The coding statistics are identical.

Change-Id: I72dbf9380da7359b997bbe925010faab8e9e7f8d
2016-05-04 15:48:27 -07:00
Yaowu Xu
357c5387d7 Remove the use of non-declared "plane"
The variable is not defined, and the called function does not need it
either.

Change-Id: Ia601c03231afc0ae68a10ae1f35e8fc4121c3d28
2016-05-04 12:39:37 -07:00
Yaowu Xu
0d7dc0cae1 Change to use proper type in vp10_token_state
"qc" in vp10_token_state is used to save quantized coefficients, this
commit changes the type from short to tran_low_t to properly reflect
the value range for highbitdepth build.

This fixes an out-of-range bug when optimize_b is used in highbitdepth
build.

Change-Id: I914c6fd3d3f4b9d061f9ed7cc5f08a883ab59dcd
2016-05-04 11:59:10 -07:00
Sarah Parker
3da61efe3b Add 1D tx set that corresponds to reduced ext tx inter sets
This is the set of 1D transforms that are used in each
ext_tx_used_inter set. The 1D sets will help speed up
the ext tx pruning functions.

Change-Id: Ib46ad26be2df60b3bfcd2f22d96e7f38ae286df5
2016-05-04 11:42:32 -07:00
Geza Lore
c959151fa2 Fix vp10_accumulate_frame_counts once and for all.
This ensures the multi-threaded and single-threaded encoder/decoder
always use the same probability contexts.

Change-Id: I6f1e7c6bd8808c390c1dc0a628ae97db3acedf6d
2016-05-04 11:32:40 +01:00
Debargha Mukherjee
e536a1cc07 Merge "Compute end of frame data precisely with ext-tile." into nextgenv2 2016-05-03 22:42:42 +00:00
Debargha Mukherjee
4f5045299e Merge "Refactoring and uv fix for wedge" into nextgenv2 2016-05-03 22:36:24 +00:00
Debargha Mukherjee
6c056de8a9 Merge "Test tile row independence." into nextgenv2 2016-05-03 21:50:29 +00:00
Debargha Mukherjee
e88ce7442a Merge "Configure tiles in tests when using ext-tile." into nextgenv2 2016-05-03 21:49:48 +00:00
Jingning Han
9f35cafaa1 Merge "Replace hard coded values in mv_has_subpel" into nextgenv2 2016-05-03 19:25:19 +00:00
Geza Lore
cba70d29ba Compute end of frame data precisely with ext-tile.
Decoding superframes correctly requires computing the end of the
frame contents in the bitstream precisely. This patch enables
ext-tile to do so.

Also extended superframe_test to test with multiple tiles if using
ext-tile.

Change-Id: I04bb8cde8755a3d764ee3c36aa8b7a6c5c9db742
2016-05-03 19:03:00 +01:00
Geza Lore
1982d67774 Test tile row independence.
Tile rows should now be independent, so make pbi->inv_tile_order
invert the decoding order of tile rows as well as tile columns.
This should improve test coverage. Also added more tile configurations
to the tile_independence_tests.

Change-Id: I14b0f2fa9241c1acaf9e2a07071952cb33feca77
2016-05-03 19:03:00 +01:00
Geza Lore
67a2ff7f90 Configure tiles in tests when using ext-tile.
With ext-tile enabled, the encoder test driver needs to configure the
tile sizes with different values to encode using a single tile, and to
decode all tiles. This should fix most unit test failures.

Change-Id: I0a0d26737414669791f3bd8d80c537db09f06072
2016-05-03 19:02:11 +01:00
Jingning Han
c19ab86202 Merge "Add dual_filter into the experimental flag list" into nextgenv2 2016-05-03 17:29:31 +00:00
Yi Luo
40b524c58e Merge "Enable VP10 HBD PSNR checking unit test" into nextgenv2 2016-05-03 16:20:23 +00:00
Jingning Han
39775e1099 Add dual_filter into the experimental flag list
Change-Id: I3f240fb46716b70e7c0ddd40660f55e1285875cd
2016-05-03 09:18:34 -07:00
Jingning Han
113f8d8746 Replace hard coded values in mv_has_subpel
Change-Id: Id437740c2db1a3a56c1ad29d8b51bb763c044c1d
2016-05-03 09:08:06 -07:00
Debargha Mukherjee
3407785536 Refactoring and uv fix for wedge
lowres: -1.72%

Change-Id: I4c883097caac72fab8e01945454579891617145e
2016-05-03 08:02:08 -07:00
Angie Chiang
47af3efc6e Merge "Fix vp10 highbd psnr calculation" into nextgenv2 2016-05-03 01:29:51 +00:00
Yi Luo
bb6984a833 Merge "Add the 64-bit CPU cycle count utility function" into nextgenv2 2016-05-03 00:51:28 +00:00
Yi Luo
681150f6d3 Add the 64-bit CPU cycle count utility function
Change-Id: Ie87245bbdf5735bc9729199eeb07899d81dbf267
(cherry picked from commit b547a2f38cfab81f6cbe392b9eb48ab0c12b80cf)
2016-05-02 23:16:42 +00:00
Angie Chiang
cf26c03ce9 Fix vp10 highbd psnr calculation
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1209
Change-Id: Ia189ec4c1cc9bbcb79a45d1904567d943e2b3bb6
2016-05-02 15:37:32 -07:00
Yi Luo
d53f9a398a Enable VP10 HBD PSNR checking unit test
We enable this unit test to protect the high bit depth picture
quality from degrading under VP10 optimization and new experimental
tools.

Change-Id: I6297a44cf01954773e06549ab2a68c319fc848a8
2016-05-02 11:29:58 -07:00
Yue Chen
326975ada3 Merge "Bug fixes for obmc/ext-inter/ext-tile experiment" into nextgenv2 2016-05-02 18:09:08 +00:00
Yi Luo
9be7075f61 Merge "HBD hybrid transform 8x8 SSE4.1 optimization" into nextgenv2 2016-05-02 17:34:50 +00:00
Jingning Han
2a39296d57 Merge "Fix encoder failure in segmentation mode" into nextgenv2 2016-05-02 15:39:12 +00:00
Yue Chen
c1d473849e Bug fixes for obmc/ext-inter/ext-tile experiment
Fix 1: in ext-inter + obmc config, properly identify if the left
predictor used for obmc is a compound one in the case that the
neighbor uses wedgeinterinter pred and we will dump the ALTREF part.
This will fix the seg fault in unit test:
VP10/AltRefForcedKeyTestLarge.Frame1IsKey/0

Fix 2: in ext-tile + obmc experiment, handle the case that the
above block does not fit in the same row tile with the current one,
so as to prevent potential crashes.

Change-Id: I1c177d4f4ad15e10d11d8756e146496437753eea
2016-04-29 19:03:39 -07:00
Jingning Han
e729d28c08 Fix encoder failure in segmentation mode
This commit fixes an encoder segmentation fault in the codebase, when
the segmentation feature is turned on. The issue was introduced in

5cce322 Porting ext_partition experiment from nextgen

Change-Id: Ifb4c06c5a6976114a8bd061d40d0338a136abaaf
2016-04-29 17:59:26 -07:00
Yi Luo
299c5fc202 HBD hybrid transform 8x8 SSE4.1 optimization
- Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Update bit-exact unit test against current C version.
- HBD encoder speed improves ~3.8%.

Change-Id: Ie13925ba11214eef2b5326814940638507bf68ec
2016-04-29 17:04:52 -07:00
Debargha Mukherjee
88fe7871be Refactor wedge generation
Change-Id: I2ec4f562e28a4673477e20186f9d6167b24b76b8
2016-04-28 17:51:21 -07:00
Debargha Mukherjee
cf3ee22597 Merge "Make the backward updates work with bitshifts" into nextgenv2 2016-04-28 20:25:53 +00:00
Debargha Mukherjee
e4bf50b9b9 Make the backward updates work with bitshifts
Removes integer divides from backward updates for VP10.
Currently this is put in as part of the entropy experiment.
Coding efficiency change is in the noise level.

Change-Id: I5b3c0ab6169ee6d82d0ca1778e264fd4577cdd32
2016-04-28 11:51:18 -07:00
Debargha Mukherjee
7ff7943455 Brings back near-near compound mode into ext-inter
lowres: improves by 0.1%

Change-Id: I245019916bf47c6e24bc8c3953b86715ab0193c9
2016-04-28 11:34:13 -07:00
Geza Lore
bf93b38561 Fix some mismatches when using ext-interp.
With ext-interp, write_switchable_interp_filter calls
vp10_is_interp_needed, which needs access to the reference frame
buffers to check if they are scaled. The ref frame buffer pointer
at this point used to be uninitialized in the encoder, resulting in a
bitstream syntax mismatch when the encoder/decoder did not read/write
the interp filter element consistently.

Change-Id: Ie0be2a19cbfcb5639a751aa857458e91c23b8fe3
2016-04-28 18:09:54 +01:00
Alex Converse
b93a1e6b44 Merge changes Ia2dd6bb1,Id1220b03 into nextgenv2
* changes:
  transform tests: Avoid #if inside INSTANTIATE_TEST_CASE_P
  variance_test: Avoid #if inside INSTANTIATE_TEST_CASE_P
2016-04-27 23:59:14 +00:00
Alex Converse
4aafcd220d Merge "convolve_test: Avoid #if inside INSTANTIATE_TEST_CASE_P" into nextgenv2 2016-04-27 23:47:30 +00:00
Alex Converse
13c3757067 Merge "buf_ans: Misc cleanup." into nextgenv2 2016-04-27 23:02:39 +00:00
Hui Su
338c9e704a Merge "ext-intra: completely remove floating point operations" into nextgenv2 2016-04-27 22:00:22 +00:00
Alex Converse
2e520f2768 transform tests: Avoid #if inside INSTANTIATE_TEST_CASE_P
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1200

Change-Id: Ia2dd6bb1ca2dff4422753af4a00156a12e488ed0
2016-04-27 14:39:38 -07:00
Alex Converse
25de2e15a9 variance_test: Avoid #if inside INSTANTIATE_TEST_CASE_P
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1200

Change-Id: Id1220b03e7be931e298848de381fdbce911e4a73
2016-04-27 14:39:37 -07:00
Alex Converse
f03e238f6b convolve_test: Avoid #if inside INSTANTIATE_TEST_CASE_P
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1200

Change-Id: I7f7754e7d1288b103a4510303d10afc68a7d8ca8
2016-04-27 14:39:31 -07:00
Alex Converse
38dfee802f Merge "Fix vp10 txfm on MSVC 2015." into nextgenv2 2016-04-27 21:38:31 +00:00
Alex Converse
fc838a04be buf_ans: Misc cleanup.
Change-Id: I18a3ef2ee6cdda57abcd27683b30b4e3136182c0
2016-04-27 14:10:15 -07:00
Debargha Mukherjee
bd76fc0492 Merge "Turn skip recode off temporarily for ref-mv" into nextgenv2 2016-04-27 20:43:32 +00:00
Alex Converse
97673cb128 Fix vp10 txfm on MSVC 2015.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1187

Change-Id: Ied6d3d003ed6ab9cf4f03cdd1d0037ae755254f4
2016-04-27 19:40:02 +00:00
hui su
6e39af3697 ext-intra: completely remove floating point operations
No performance changes

Change-Id: Ia489041253423ddf8ebc7e2d41fbfb9e138109f0
2016-04-27 12:08:38 -07:00
Jingning Han
22a68fb047 Merge "Fix compound mv costing for ref-mv." into nextgenv2 2016-04-27 17:15:16 +00:00
Debargha Mukherjee
a671241a6e Turn skip recode off temporarily for ref-mv
To fix tests in VP10/AltRefForcedKeyTestLarge.Frame1IsKey/*

Change-Id: I2f4f9fea515c9935d57006b709a9dd524f174b25
2016-04-27 09:55:30 -07:00
Debargha Mukherjee
bc982cc994 Merge "Initialize dummy variables." into nextgenv2 2016-04-27 16:36:24 +00:00
Geza Lore
264d5c446e Fix compound mv costing for ref-mv.
I believe this is necessary for computing the correct rate,
when not doing joint_motion_search.

Change-Id: I7634d6d7a5e6f0a6998edb4d577dd047d80df3c8
2016-04-27 13:37:29 +01:00
Geza Lore
d29ec48504 Initialize dummy variables.
Valgrind flags these up as needed by handle_inter_mode.
Initializing fixes some assertion failures in the unit tests with
only ref-mv enabled.

Change-Id: I4d56c356692745dbecd9f790cdbb8dbfbaf72d55
2016-04-27 13:35:12 +01:00
Geza Lore
4e177393f0 Fix ext-tile without ext-partition.
Default case (when ext-partition was not configured) was incorrect
in encoder tile size initialization.

BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1197

Change-Id: Ibe57cb1dc16b9fa300573816fc16d2d2f6849fc6
2016-04-27 11:14:48 +01:00
Yue Chen
88bb103f75 Merge "Optimization for EXT_INTER + OBMC" into nextgenv2 2016-04-27 06:29:38 +00:00
Yue Chen
3ac12aecc5 Optimization for EXT_INTER + OBMC
Remove the restriction that the neighboring predictor cannot be
used in obmc prediction if it is an interintra or wedgeinterinter
block. The inter predictor of the interintra block, or the first
inter predictor(using LAST or GOLDEN frame) of the wedgeinterinter
block will be exploited in obmc prediction.

Coding gain: 0.248% (2.833%->3.081%) lowres

Change-Id: I4ac0368b9d2f2956f266b30c1ac97db8bafa0742
2016-04-26 16:50:10 -07:00
Debargha Mukherjee
e5b8a01fd5 Merge "Reduce intra transform set" into nextgenv2 2016-04-26 23:32:16 +00:00
Yue Chen
02e941d371 Merge "Remove double counting for mv costs" into nextgenv2 2016-04-26 21:40:13 +00:00
Jingning Han
2f2448aec9 Merge "Rework motion vector precision limit" into nextgenv2 2016-04-26 21:31:10 +00:00
Yue Chen
34177e673d Remove double counting for mv costs
The bug is introduced by commit 1a0352d, in which mv costs are
counted twice in joint_motion_search() in ext_inter experiment.

Change-Id: Ibace453df999d3c2e781d73f1f0912038fee2d4e
2016-04-26 13:01:52 -07:00
Hui Su
3f7a709676 Merge "ext-intra: get rid of some floating operations." into nextgenv2 2016-04-26 18:53:33 +00:00
Jingning Han
8678ab4c55 Rework motion vector precision limit
This commit enables 1/8 luma component motion vector precision
for all motion vector cases. It improves the compression performance
of lowres by 0.13% and hdres by 0.49%.

Change-Id: Iccfc85e8ee1c0154dfbd18f060344f1e3db5dc18
2016-04-26 10:14:26 -07:00
Debargha Mukherjee
8851acc5ed Reduce intra transform set
Reduce transform set for intra for 8x8 and smaller to 7 from 12.
Also fixes an issue with prob updates.

Encoder speed-up of about 8-10%.

Coding efficiency changes very little.
lowres: -2.996 (from -3.055 before)
midres: -2.482 (from -2.552 before)

Change-Id: I4ba50ff967521b33c748fe423bd92f7cf4105ebc
2016-04-26 10:10:55 -07:00
Hui Su
1e93a3e64e Merge "Keep track of zcoeff_blk in tx size/type search" into nextgenv2 2016-04-26 16:41:50 +00:00
hui su
ad50c226e6 ext-intra: get rid of some floating operations.
No performance changes.

Change-Id: Idd4043090fec09e57520bc970ed2e39e6f7e1a5e
2016-04-25 14:44:42 -07:00
Debargha Mukherjee
022da62579 Merge "Clear X87 register state before using double." into nextgenv2 2016-04-25 21:42:23 +00:00
Yi Luo
333ff883e1 Merge "HBD hybrid transform 4x4 SSE4.1 optimization" into nextgenv2 2016-04-25 19:43:35 +00:00
Geza Lore
23c4116ebb Clear X87 register state before using double.
MMX and X87 floating point instructions cannot be mixed freely on
the 32 bit x86 architecture.

This fixes a lot of unit tests in the 32bit build with
--enable-ext-intra.

BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1196

Change-Id: I0e1c3565f4b9cb4fc2d716e94d9c40e68b36fac8
2016-04-25 10:30:20 -07:00
Alex Converse
d4fe243cdf Merge "Raise the probability resolution for rANS tokens to 10-bits per symbol" into nextgenv2 2016-04-25 17:11:16 +00:00
Yi Luo
a4593f17ca HBD hybrid transform 4x4 SSE4.1 optimization
- Optimization on tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Overall encoder speed improves ~4.5%-6%.
- Update bit-exact unit test against current C version.

Change-Id: If751c030612245b1c2470200c9570cf40d655504
2016-04-25 09:53:09 -07:00
Jingning Han
b4cbe54ed6 Merge "Fix out-of-bound memory access in loop filter" into nextgenv2 2016-04-25 16:13:04 +00:00
Jingning Han
221c09aa99 Merge "Refactor sub-pixel motion search" into nextgenv2 2016-04-25 16:12:51 +00:00
James Zern
0aa6435c45 vp10/rdopt: quiet unused variable warning
when CONFIG_REF_MV and CONFIG_EXT_INTER are enabled

Change-Id: I17fa2b5fe0e1878333099cc5fa2b1ee36636b4d3
2016-04-23 16:59:45 +00:00
Yue Chen
8ce563bf92 Merge "Fix EXT_INTER unit test failure in 32-bit builds" into nextgenv2 2016-04-23 16:51:08 +00:00
Jingning Han
004c7fa668 Fix out-of-bound memory access in loop filter
This commit fixes an out-of-bound memory access case in the
loop filter mask setting. This issue was introduced in

10232ed Refactor loopfilter level arrays to 2D.
https://chromium-review.googlesource.com/#/c/336645/

Change-Id: I7101a4a79b9ecfdd8ec5ef13a0b314cc95f48d12
2016-04-22 22:57:14 -07:00
Yue Chen
6daf1a460e Fix EXT_INTER unit test failure in 32-bit builds
Align new buffers that are used in interintra and wedgeinterinter prediction.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1196

Change-Id: I1ef49fdf13c79a22cf8a1737e3d3052da0a92dfe
2016-04-22 22:37:13 -07:00
Jingning Han
cdf989adb7 Silence compiler above-boundary warnings
Change-Id: I6d806f92e8d38d5b0b01bc8e0fd97bd8839c84df
2016-04-22 15:49:34 -07:00
Alex Converse
1f57aa38cd Raise the probability resolution for rANS tokens to 10-bits per symbol
Change-Id: I397b5a9371c85d1df401d261143c985623e9def6
2016-04-22 15:48:11 -07:00
Jingning Han
77d451ecca Refactor sub-pixel motion search
Unify the rate cost used in the motion estimation process.

Change-Id: I8e52ca9f29eee3469553433302b62fb02a038919
2016-04-22 22:27:09 +00:00
Jingning Han
0dccc85c98 Replace left shift with multiplications
This avoids the potential risk in left shift of negative numbers.

Change-Id: I7aecb499ee6ce7342b172adc4741de5c6c107a24
2016-04-22 20:56:02 +00:00
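
The risk mentioned in the commit above is that, in C, left-shifting a negative signed value has undefined behavior, whereas multiplying by a power of two is well defined as long as the result does not overflow. A minimal sketch of the replacement pattern follows; `scale_pow2` is a hypothetical helper, not a function from the codebase, and it assumes `bits` is in the range 0 to 30 and that the product fits in an `int`.

```c
/* Scale a signed value by 2^bits without shifting a negative operand.
 * Hypothetical helper for illustration only.
 * Assumes 0 <= bits <= 30 and that the product fits in an int. */
int scale_pow2(int value, int bits) {
  /* `value << bits` would be undefined for negative `value`;
   * the shift below is applied only to the positive constant 1. */
  return value * (1 << bits);
}
```

Compilers typically emit the same shift instruction for the multiplication, so the change is about defined behavior rather than speed.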
Jingning Han
3f6ec144e5 Fix an enc/dec mismatch issue in ext-inter experiment
This commit fixes an encoding decision process issue that could
trigger enc/dec mismatch in the ext-inter experiment.

Change-Id: I6f10d1fd2fd1aa04e51df04c39a65cf72ac66c42
2016-04-22 20:49:29 +00:00
Yi Luo
cf7f00691f Change hybrid transform function argument from TXFM_2D_CFG* to int
Unit tests show that manually developed SSE4.1 code would perform ~30%
better if the TXFM_2D_CFG configuration is set at a lower level. This
change only updates the function signature. There is no performance
impact.

Change-Id: I62692bd50a21ffc8a944bbd6c155c0a2020ad77b
2016-04-21 18:37:21 -07:00
Alex Converse
8f2fa04181 Unbreak the non-var_tx build.
Change-Id: I76cc3d88122de42f035fbf6508bdf3fd7c995012
2016-04-21 13:27:19 -07:00
Debargha Mukherjee
53968c3917 Merge "Fix uninitialized blk_skip for VAR TX." into nextgenv2 2016-04-21 19:56:17 +00:00
Alex Converse
67058089b4 Merge "Move ZERO_TOKEN into the ANS coef tokenset." into nextgenv2 2016-04-21 18:08:04 +00:00
Angie Chiang
7d598d658c Merge "relax txfm test error constraint" into nextgenv2 2016-04-20 02:17:40 +00:00
Alex Converse
fcea1485bb Merge "Store ANS token CDFs in the FRAME_CONTEXT rather than in a global table." into nextgenv2 2016-04-19 23:29:14 +00:00
Alex Converse
3829cd2f2f Move ZERO_TOKEN into the ANS coef tokenset.
Change-Id: I87943e027437543ab31fa3ae1aa8b2de3a063ae5
2016-04-19 15:29:47 -07:00
Jingning Han
1a0352d18e Merge "Handle zero motion vector residual" into nextgenv2 2016-04-19 21:20:08 +00:00
Hui Su
1e02f2e8a4 Merge "Adjust optimize_b RD parameters" into nextgenv2 2016-04-19 21:18:50 +00:00
Hui Su
ee8c72d95a Merge "Enable optimize_b for intra blocks" into nextgenv2 2016-04-19 21:18:37 +00:00
Angie Chiang
aea0cc5041 Merge "Change the naming of txfm#d_test" into nextgenv2 2016-04-19 19:52:03 +00:00
Angie Chiang
218dfbd547 Change the naming of txfm#d_test
Change-Id: I151b18b38f7a000fb6e431cd42675ac4e7e9e3ca
2016-04-19 11:59:00 -07:00
hui su
7dffb43267 Keep track of zcoeff_blk in tx size/type search
Prevent potential problems when per transform block
zero forcing is re-enabled (a To-Do).

Change-Id: I03b0ab2a86d88058441f2ca18994cfd2e6329898
2016-04-19 11:45:11 -07:00
Yue Chen
feb2184c4e Merge "Remove an unsuccessful adaption of overlap sizes in obmc experiment" into nextgenv2 2016-04-19 18:41:02 +00:00
hui su
ad59b08f76 Adjust optimize_b RD parameters
Coding gain:
lowres  0.44%
midres  0.24%
hdres   0.32%

Change-Id: Ie558203b2b2bf5c16cd49b114df3d696c4f35049
2016-04-19 09:54:08 -07:00
hui su
e43c21112d Enable optimize_b for intra blocks
Coding gain:
lowres  0.05%
midres  0.10%
hdres   0.18%

Change-Id: I508b150c02588f911a8ddddfe73c770f0819fe10
2016-04-19 09:50:45 -07:00
Alex Converse
6ca364606b Store ANS token CDFs in the FRAME_CONTEXT rather than in a global table.
This will facilitate bringing the zero node into the token set while
allowing its probability to vary independently.

Change-Id: I57b44c0fce44debb8e612021e44713b229d1b3cf
2016-04-19 09:39:48 -07:00
Alex Converse
ab759be8d9 Merge "Use an exponential growth approach for the ANS reversal buffer." into nextgenv2 2016-04-19 16:39:18 +00:00
Geza Lore
7aa95be980 Fix uninitialized blk_skip for VAR TX.
x->blk_skip used to be uninitialized (leftover from encoding the
previous block), if cm->tx_mode != TX_MODE_SELECT (which is used with
higher --cpu-used or --rt options). This resulted in degraded coding
performance when using cm->tx_mode != TX_MODE_SELECT.

This fixes the VP10/EndToEndTestLarge.EndtoEndPSNRTest/40 unit test.

Also fixed an edge effect where encode_block in encodemb.c used the
formal width of the block (without cropping at the right edge), to
look up blk_skip, while select_tx_block in rdopt.c used the cropped
width to set blk_skip.

Change-Id: I76d0f49ac5ab3ab54203573e0d7fcfcc1c6aa10d
2016-04-19 17:00:20 +01:00
Yaowu Xu
efc6aa0c97 Merge "Merge branch 'master' into nextgenv2" into nextgenv2 2016-04-19 15:43:58 +00:00
Geza Lore
8d64b53dc8 Revert "Fix uninitialized blk_skip for VAR TX."
This reverts commit e7b89d88354708790211ff3949fdc705a4fa1672.
2016-04-19 15:41:56 +01:00
Geza Lore
e7b89d8835 Fix uninitialized blk_skip for VAR TX.
x->blk_skip used to be uninitialized (leftover from encoding the
previous block), if cm->tx_mode != TX_MODE_SELECT (which is used with
higher --cpu-used or --rt options). This resulted in degraded coding
performance when using cm->tx_mode != TX_MODE_SELECT.

This fixes the VP10/EndToEndTestLarge.EndtoEndPSNRTest/40 unit test.

Change-Id: If39062927446798c626fc93694b4e6a4f35fa5da
2016-04-19 14:22:48 +01:00
Jingning Han
ec2ffda599 Handle zero motion vector residual
This commit handles the zero motion vector residuals for single
and compound reference modes, respectively. It improves the coding
performance by 0.13% with no additional encoding complexity.

Change-Id: I16075a836025bd2746da2ff4698fb9261e4b08c1
2016-04-18 18:14:01 -07:00
Yi Luo
3cf1a082e0 Merge "Disable HBD 4x4 DCT_DCT HT test" into nextgenv2 2016-04-18 23:07:25 +00:00
Yue Chen
c0fd271932 Remove an unsuccessful adaption of overlap sizes in obmc experiment
We removed this adaption, which intended to reduce the size of
overlapped region if the neighboring block is a non-skip one. Thus,
now the width/height of the overlapping region is fixed as a half of
the current block.

Performance improvement (lowres/midres): 0.111%/0.102%

Change-Id: Ife75dad9d4eb355c78a05178b50cc015c442884f
2016-04-18 15:27:59 -07:00
Yaowu Xu
ed04e82a04 Merge branch 'master' into nextgenv2
Conflicts:
	vp10/common/scan.c
	vp9/common/vp9_pred_common.c
	vp9/decoder/vp9_decoder.c

Change-Id: Id559d98ea676da15d60ed464ddb6c48d3eed1111
2016-04-18 15:15:05 -07:00
Jingning Han
2aa6117bda Refactor transform selection process
This commit re-arranges the transform type and size selection
process. It removes an unnecessary rate-distortion cost computation
step. Local experiments show that this speeds up the encoding
process by 6% for both the baseline and the ext-intra experiment.

Change-Id: Iab3b86a63a1e9e55548466791ed5d29a0575c1e7
2016-04-18 19:45:56 +00:00
Jingning Han
c5449d3eb7 Merge "Refactor rd_variance_adjustment function" into nextgenv2 2016-04-18 19:45:45 +00:00
Angie Chiang
caf066f845 Merge changes I67543d36,I763f2924 into nextgenv2
* changes:
  Reduce shift in txfm8x8
  Let txfm's constant bit be the same for each stage
2016-04-18 19:40:33 +00:00
Yi Luo
dd04329367 Disable HBD 4x4 DCT_DCT HT test
- HBD HT unit tests will be modified to test against new algorithm.

Change-Id: Iba58eeb21a45612685c93c98d7c846dab25e6638
2016-04-18 12:24:31 -07:00
Angie Chiang
d72560e10d Merge "Fit adst/dct's stage range into 32-bit in bd12" into nextgenv2 2016-04-18 18:40:28 +00:00
Angie Chiang
cf3ef18fc4 Merge "Remove double operation from tx_size selection" into nextgenv2 2016-04-18 18:11:36 +00:00
Yi Luo
a431a93cb1 Merge "Improvement on hybrid transform 4x4 DCT_DCT SSE4.1 optimization" into nextgenv2 2016-04-18 18:04:06 +00:00
Angie Chiang
6de4a77df3 Remove double operation from tx_size selection
This CL fixes the bug
rdopt.c:1687: choose_tx_size_from_rd: Assertion
`mbmi->tx_type == DCT_DCT' failed

It is caused by
1) mmx register access before double operation
2) different compiler behaviors
code:
  int64_t a = INT64_MAX;
  double b = 1. * INT64_MAX;
  printf("a < b: %d\n", a < b);
result:
  a < b: 0

code:
  --target=x86-linux-gcc
  int64_t a = INT64_MAX;
  double b = 1. * INT64_MAX;
  printf("a < b: %d\n", a < b);
result:
  a < b: 1

I remove the double operation and test it with EXT_TX experiment.
The psnr change is around 0.05%, which is considered as noise level.

Change-Id: If8935c70c8603617fcfa8571accd30ccdda786a0
2016-04-18 11:00:13 -07:00
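The comparison quoted in commit 6de4a77df3 above can be reproduced in isolation. The sketch below is an illustration, not code from the patch: the function names are hypothetical, and it assumes IEEE-754 doubles with round-to-nearest. INT64_MAX (2^63 - 1) has no exact 64-bit double representation and rounds up to 2^63, so the outcome of `a < b` depends on whether the compiler compares rounded doubles or keeps one operand at higher precision. That is one plausible reading of the "different compiler behaviors" the message mentions.

```c
#include <stdint.h>

/* Returns 1 if INT64_MAX, once stored as a 64-bit double, equals 2^63.
 * The volatile store forces rounding to double even on targets that
 * keep intermediates in wider registers. */
int int64_max_rounds_to_2_pow_63(void) {
  volatile double d = (double)INT64_MAX;
  return d == 9223372036854775808.0; /* 2^63 */
}

/* Returns 1 if a and b are still distinct after conversion to double. */
int distinct_as_double(int64_t a, int64_t b) {
  volatile double da = (double)a;
  volatile double db = (double)b;
  return da != db;
}

/* Integer-only comparison: exact for every pair of int64_t values. */
int less_than_exact(int64_t a, int64_t b) { return a < b; }
```

Keeping the rate-distortion comparison in integer arithmetic, as the commit describes, avoids the rounding step entirely.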
Jingning Han
c8312daad1 Refactor rd_variance_adjustment function
Compute the reconstruction variance in the prediction mode search.

Change-Id: Id9c7635a9c9f5383e61c0e427e95234211834301
2016-04-18 09:37:34 -07:00
Yue Chen
16a99e967c Merge "Optimization for EXT_INTER + OBMC combination" into nextgenv2 2016-04-17 18:54:33 +00:00
Yue Chen
321794c4d5 Optimization for EXT_INTER + OBMC combination
In the rd loop, check the perf of obmc, whose mv is copied from regular
inter predictor, when wedge interinter is better than regular inter
(previously it will force allow_obmc = 0). The condition of the early
termination before this step is relaxed to avoid skipping too many obmc
predictions. The rates of the overhead are properly calculated for these tools.

The logic of the bitstream syntax:
(a single ref) the interintra flag is sent first, only if it is 0, we
send the obmc flag;
(compound refs) the obmc flag is sent first, only if it is 0, we send
the wedge interinter flag

Coding gain
lowres: 0.428% (2.287%->2.715%)

Change-Id: I5f3a34640b398e313cbf84235c9fe2073eb2173f
2016-04-15 17:03:20 -07:00
Yi Luo
71fa2b2218 Merge "Fix an unaligned memory allocation in HT 4x4 speed test" into nextgenv2 2016-04-15 23:56:21 +00:00
Angie Chiang
e7f64756a1 Merge "remove redundant header" into nextgenv2 2016-04-15 22:44:33 +00:00
Angie Chiang
8de8499cc9 remove redundant header
Change-Id: Ib0e880c341adebb238f43a6caeb661e2094e7a93
2016-04-15 15:34:05 -07:00
Angie Chiang
1b0092a76e relax txfm test error constraint
The error increases because we reduce the const bit
of txfm

Change-Id: I0235a3fdb7dc6a4c0cd1c8cebb369df2a5071b94
2016-04-15 15:25:53 -07:00
Yi Luo
f53ecc21b0 Fix an unaligned memory allocation in HT 4x4 speed test
- Allocate 16-byte aligned memory.
- Disable speed test in unit tests.

Change-Id: Ibef734f4b9d39ad50e9b2e8e0a5d74565d57b409
2016-04-15 14:59:31 -07:00
Yi Luo
f095ea7dd6 Improvement on hybrid transform 4x4 DCT_DCT SSE4.1 optimization
- Implemented Angie's new fwd txfm algorithm.
- Improves ~100% over the last 64-bit version; 3 times faster than
  the original C code.
- Passed bit-exact unit test.

Change-Id: Ica30b9768706604a6d69fe42da778441f0f5f02e
2016-04-15 14:16:30 -07:00
Jingning Han
4d503d1043 Remove duplicated TxfmFunc declarations
Change-Id: If3876610a1fbce0988cc21ea917596bbb467df93
2016-04-15 12:03:21 -07:00
Zoe Liu
9638ee1f4e Merge "Fix segfault with --cpu-used >= 3 and ext-refs." into nextgenv2 2016-04-15 16:41:15 +00:00
Geza Lore
77d197e635 Fix segfault with --cpu-used >= 3 and ext-refs.
With ext-ref enabled, it is possible that when trying to encode the
first true ALTREF frame after a keyframe, the previous ALTREF frame
(alias for the keyframe) is the same as one of the new LAST{2,3,4}
reference frames, and hence cpi->ref_frame_flags will have the ALTREF
bit clear, as computed by get_ref_frame_flags in encoder.c.

sf->alt_ref_search_fp forces the previous ALTREF frame to
be used as the only possible reference when encoding a new ALTREF
frame, but due to cpi->ref_frame_flags, some buffers will not be
initialized (see rdopt.c:7689 yv12_mb), leading to a segfault.

get_ref_frame_flags in encoder.c has been changed to prefer to keep
the LAST frame, then the ALTREF frame, then any of the LAST{2,3,4}
frames and then the GOLDEN frame in that order of preference in case
any of them are the same. This avoids the segfault and behaves the
same for the baseline.

Change-Id: I4da1991667614009da5d3061a6316c0d5dbc6c0c
2016-04-15 11:17:22 +01:00
Angie Chiang
0a715add2e Reduce shift in txfm8x8
Change-Id: I67543d365cbef3c3e113f01660ae8cb744cc556d
2016-04-14 19:12:22 -07:00
Angie Chiang
dfa532cc2a Let txfm's constant bit be the same for each stage
Change-Id: I763f2924afca526db371231bca18b38879bdf793
2016-04-14 15:46:54 -07:00
Angie Chiang
02d23fbbf4 Fit adst/dct's stage range into 32-bit in bd12
Change-Id: Ie428c6f0655873de3e77e844a2f2e4203cf47dff
2016-04-14 15:44:05 -07:00
Jingning Han
019683e963 Merge "Clean up motion vector precision check in the encoding process" into nextgenv2 2016-04-14 20:55:51 +00:00
Jingning Han
79bef030f2 Merge "Apply motion vector precision check to candidate mv" into nextgenv2 2016-04-14 20:55:45 +00:00
Jingning Han
03a468f9ac Merge "Enable mode conversion in sub8x8 block" into nextgenv2 2016-04-14 19:01:15 +00:00
Alex Converse
031fd260f1 Merge "Disable the TestSuperframeIndexIsOptional test with ANS." into nextgenv2 2016-04-14 18:57:28 +00:00
Jingning Han
6af8f63d96 Clean up motion vector precision check in the encoding process
Remove unnecessary motion vector precision check in the encoding
process.

Change-Id: Ica32933c7d138f499f36b1dedec14c894b27d85a
2016-04-14 11:37:19 -07:00
Jingning Han
525995a3d9 Apply motion vector precision check to candidate mv
This avoids repeatedly checking the candidate motion vector
precision level at the decoder end. The compression performance
varies at 0.01% level.

Change-Id: I4a88e95decd900d0cac9a0c2e70ba43ef7ecac38
2016-04-14 09:44:41 -07:00
Jingning Han
cd39224cff Merge "Speed up dynamic motion vector referencing system" into nextgenv2 2016-04-14 16:16:43 +00:00
Hui Su
436a6cc4e7 Merge "ext-tx: use raster scan order for identity transform" into nextgenv2 2016-04-13 23:52:35 +00:00
Jingning Han
885a81f468 Merge "Fix a few mis-use cases of MAX_MV_REF_CANDIDATES" into nextgenv2 2016-04-13 23:44:25 +00:00
Angie Chiang
716f0ea3cf Merge changes I92819356,I50b5a313,I807e60c6,I8a8df9fd into nextgenv2
* changes:
  Branch dct to new implementation for bd12
  Change dct32x32's range
  Fit dct's stage range into 32-bit when bitdepth is 12
  Pass tx_type into get_tx_scale
2016-04-13 23:24:41 +00:00
Alex Converse
91c985fc28 Merge "Convert some vpx boolcoder calls back to vp10 generic calls." into nextgenv2 2016-04-13 23:04:17 +00:00
Hui Su
85a3f5b740 Merge "Speed-up in tx_size search" into nextgenv2 2016-04-13 23:02:21 +00:00
Jingning Han
9a1a8f1d8e Speed up dynamic motion vector referencing system
Skip transform type search in modes with ref_mv_idx > 0. This
brings down the additional encoding time cost due to the DMR system
from 32% to 17%, at minimal coding performance regression.

Change-Id: Ie82e1d2831a313c6f1e47f7da221b51345023eb3
2016-04-13 15:51:36 -07:00
Jingning Han
f33a0a8215 Fix a few mis-use cases of MAX_MV_REF_CANDIDATES
Fix several use cases where MAX_MV_REF_CANDIDATES is mixed up with
is_compound flag to avoid potential coding interruption.

Change-Id: Ifdee1ef8a81ef6d1c155315c6c6a3074aa7a8b5e
2016-04-13 15:16:55 -07:00
Alex Converse
5d2b0f93b9 Use an exponential growth approach for the ANS reversal buffer.
Memory constrained hardware can window the data via our standard windowing
mechanism, tiles.

Change-Id: Ib1cfd157604a8c9d9f9a9f2b0ba3bc2fd0643082
2016-04-13 15:16:29 -07:00
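The "exponential growth" approach named in commit 5d2b0f93b9 above is commonly implemented by doubling the capacity whenever the buffer fills, which keeps the amortized cost per appended byte constant. The following is a generic sketch of that technique; the type and function names are hypothetical and are not taken from the ANS code.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
  uint8_t *data;
  size_t size;     /* bytes in use */
  size_t capacity; /* bytes allocated */
} GrowBuf;

/* Smallest capacity >= needed reachable by doubling from `capacity`. */
size_t next_capacity(size_t capacity, size_t needed) {
  size_t cap = capacity ? capacity : 1;
  while (cap < needed) cap *= 2;
  return cap;
}

/* Appends one byte, doubling the allocation when full.
 * Returns 0 on success, -1 if reallocation fails (buffer left intact). */
int growbuf_push(GrowBuf *b, uint8_t v) {
  if (b->size == b->capacity) {
    size_t cap = next_capacity(b->capacity, b->size + 1);
    uint8_t *p = realloc(b->data, cap);
    if (!p) return -1;
    b->data = p;
    b->capacity = cap;
  }
  b->data[b->size++] = v;
  return 0;
}
```

A caller starts from a zero-initialized `GrowBuf` and releases `data` with `free` when done.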
Alex Converse
c3688e398c Disable the TestSuperframeIndexIsOptional test with ANS.
Change-Id: Id55a741e2015c4e01d156d3fe5319498b016b9cf
2016-04-13 14:58:40 -07:00
Jingning Han
e07dbaa2f5 Enable mode conversion in sub8x8 block
Convert the newmv mode into reference motion vector modes.

Change-Id: I51bd2543dafb70345c1340fba700b44f67f20853
2016-04-13 14:35:54 -07:00
Zoe Liu
1d043d56da Merge "Make ext-refs respect encoding flags." into nextgenv2 2016-04-13 19:31:30 +00:00
Debargha Mukherjee
200e50568b Merge "Fix 2 warning when building with GCC 5." into nextgenv2 2016-04-13 19:28:25 +00:00
hui su
6a7ddd84bb Speed-up in tx_size search
Do not consider 4x4 transform when the maximum possible transform
size is 32x32.

Overall encoding speed is increased by more than 10%. Compression
performance is neutral on lowres, midres, and hdres.

Change-Id: Ifac61c3c9f4b0ab392bffd4d1faa373d91014cf1
2016-04-13 10:19:00 -07:00
hui su
b72aa72a90 ext-tx: use raster scan order for identity transform
coding gain of ext-tx:
screen_content 12.73% -> 13.05%

Change-Id: I5fc8cf0db84c3e56dd3cb7675e1d81c9c575bc57
2016-04-13 09:42:43 -07:00
Alex Converse
70bd058352 Merge "Fix the tree diagram comment." into nextgenv2 2016-04-13 16:14:18 +00:00
Geza Lore
c50aaf3049 Make ext-refs respect encoding flags.
The VP8_EFLAG_NO_UPD_LAST and VP8_EFLAG_NO_REF_LAST flags can be
passed to the encoder to signal that it should not update/reference
the LAST ref frame when encoding the current frame. With
--enable-ext-refs turned on, the new LAST2 LAST3 and LAST4 ref frames
could still be used or updated, which causes the
  VP10/ErrorResilienceTestLarge.DropFramesWithoutRecovery/{0,1,2}
tests to fail.

With this patch, if --enable-ext-refs is used, then
VP8_EFLAG_NO_UPD_LAST and VP8_EFLAG_NO_REF_LAST also applies to the
new LAST2 LAST3 and LAST4 ref frames, as well as the LAST ref frame.

Change-Id: If482b1c09bbaf914eca8e0348a2367bff261661d
2016-04-13 12:03:58 +01:00
Geza Lore
c6cf7a6111 Fix 2 warning when building with GCC 5.
These caused the following warning with GCC 5:
     warning: logical not is only applied to the left hand side of
     comparison [-Wlogical-not-parentheses]
     assert(!is_compound == (cm->reference_mode == SINGLE_REFERENCE));

Change-Id: If296aabb2311ceb7d903b395c1549ef81c2cbf9b
2016-04-13 10:49:52 +01:00
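The warning quoted in commit c6cf7a6111 above comes from operator precedence: in C, `!a == b` parses as `(!a) == b`, not `!(a == b)`. The sketch below, with hypothetical function names, spells out both readings so the difference is visible. Writing the parentheses explicitly is the usual way to state the intent and silence -Wlogical-not-parentheses.

```c
/* What `!a == b` means to the compiler: negate a first, then compare. */
int not_then_compare(int a, int b) { return (!a) == b; }

/* The other reading a reviewer might assume: negate the comparison. */
int compare_then_not(int a, int b) { return !(a == b); }
```

For a = 2, b = 1 the two functions disagree: the first compares 0 with 1, the second negates the result of 2 == 1.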
Alex Converse
c1729d12b8 Merge "ANS: Remove extra buffer size checks causing a false decode error." into nextgenv2 2016-04-13 01:37:05 +00:00
Alex Converse
af56299119 Fix the tree diagram comment.
Clear up a multiline comment warning and clarify the comment.

Change-Id: Ie0277b4ed4a088a9751e6998f2aeae57d302e6d4
2016-04-12 16:57:08 -07:00
Hui Su
9e8cad3be7 Merge "Add vp10_ prefix to full_to_model_counts and fill_token_costs" into nextgenv2 2016-04-12 23:38:47 +00:00
Alex Converse
493a585273 ANS: Remove extra buffer size checks causing a false decode error.
The minimal ans partition size is now one byte. This is checked in
ans_read_init().

The read_is_valid() condition is handled by setup_token_decoder().

Change-Id: I7b202b896630bc4285532208bf7cf84567afe158
2016-04-12 15:19:30 -07:00
Yi Luo
6db95602e4 Merge "Optimized HBD block subtraction for all block sizes" into nextgenv2 2016-04-12 21:22:32 +00:00
Debargha Mukherjee
ec1365a0c9 Merge "Extend variance based partitioning to 128x128 superblocks" into nextgenv2 2016-04-12 19:42:35 +00:00
Yi Luo
0f80b1f754 Optimized HBD block subtraction for all block sizes
- Interface function takes a local MxN function to call based on the
  block size.
- Repetition call (w/o cache line miss) shows improvement:
  ~63% - ~340%.
- Overall encoder speed improvement: ~0.9%.

Change-Id: Ieff8f3d192415c61d6d58d8b99bb2a722004823f
2016-04-12 12:04:43 -07:00
hui su
0792748646 Add vp10_ prefix to full_to_model_counts and fill_token_costs
Change-Id: I5e6c644fb09f7a80c88142dfdfa05cf5be260241
2016-04-12 11:06:47 -07:00
Angie Chiang
027d12b7d6 Merge changes I359aa49c,Ic8ca5afb into nextgenv2
* changes:
  Generalize txfm scale in highbd quantizer
  Parameterize transform scale for quantizer
2016-04-12 18:02:05 +00:00
Alex Converse
e7224b7866 Convert some vpx boolcoder calls back to vp10 generic calls.
Change-Id: I362f753ff42d4c4fb94df2419cdaad423d7a4229
2016-04-12 11:00:52 -07:00
Geza Lore
61af8981b0 Extend variance based partitioning to 128x128 superblocks
Change-Id: I41edf266d5540a9b070a5e65bc397dd3da210507
2016-04-12 09:40:11 +01:00
Debargha Mukherjee
648538959d Merge "Use reduced transform set for 16x16" into nextgenv2 2016-04-11 23:32:29 +00:00
Debargha Mukherjee
c4da5d500e Use reduced transform set for 16x16
Speed increase for ext-tx by 20% for a BDRATE drop of 0.26%.
The ext-tx expt becomes -2.66% BDRATE (reduced from -2.92%) for
the lowres set.

It turns out that reducing the set of transforms for intra from
12 to 5 makes very little difference in coding performance (~0.04%).
Most of the performance drop comes from the reduction in transform
set for inter. Currently there is a provision to control that with
a macro.

Change-Id: I7de05527bf72f96acc1e0ab8a74a849da0a141e5
2016-04-11 13:04:41 -07:00
Sarah Parker
33ccd0f85e Merge "Fix prune one and two to make compatible with new transforms" into nextgenv2 2016-04-11 17:12:28 +00:00
Yi Luo
fd367c243e Merge "Some cosmetic improvements since HBD variance 4x4 optimization" into nextgenv2 2016-04-11 16:04:02 +00:00
Yi Luo
4c792f2814 Merge "Add unit tests for HBD variance 4x4 SSE4.1 optimization" into nextgenv2 2016-04-11 16:03:38 +00:00
Debargha Mukherjee
9930a00ed7 Merge "Refactor PC_TREE root handling." into nextgenv2 2016-04-09 13:33:53 +00:00
Debargha Mukherjee
38b26b0dc3 Merge "Make subpel masked motion work with upsampled refs" into nextgenv2 2016-04-09 13:30:09 +00:00
Hui Su
5e558121fe Merge "Changes to scan order neighbors" into nextgenv2 2016-04-08 21:18:16 +00:00
Hui Su
7fcf94eb23 Merge "Reformat scan order neighbors" into nextgenv2 2016-04-08 20:57:50 +00:00
Sarah Parker
19e3c6415c Fix prune one and two to make compatible with new transforms
Update svm parameters with training data using new transforms
and remove DST from pruning functions.

Change-Id: I7bd1c4744455d571c1ecfb4cea14c25ac291f002
2016-04-08 11:47:48 -07:00
Yi Luo
c249689b48 Add unit tests for HBD variance 4x4 SSE4.1 optimization
Change-Id: Ib4ceb53cbbc35ecb4e55d74bed30282310075004
2016-04-08 11:16:30 -07:00
hui su
f94d699c09 Changes to scan order neighbors
-Fix some bugs in row_scan and col_scan. In some cases, the above
or left neighbor was not considered even though it is available.

-When above or left neighbor is not available, try using the
top-left, top-right or bottom-left neighbor.

Compression improvement:
lowres   0.20%
midres   0.16%
hdres    0.20%

Change-Id: If521665589c7f29277b8e9223f21f4a8bf3fef39
2016-04-08 11:08:57 -07:00
hui su
b76118b736 Reformat scan order neighbors
Change-Id: Iafcd080612012b08f3cbff45335c12f434543f38
2016-04-08 10:50:13 -07:00
Yi Luo
e5f4e8eab9 Some cosmetic improvements since HBD variance 4x4 optimization
Change-Id: I414c1fabd2e3a9b1d9daa8a90f85a0bace8bd3cd
2016-04-08 10:32:13 -07:00
Geza Lore
f2be4f6058 Refactor PC_TREE root handling.
Change-Id: Id8b16c1b18bd6f909e72aae3fd582dd3503c88c6
2016-04-08 17:01:00 +01:00
Debargha Mukherjee
c485b10416 Make subpel masked motion work with upsampled refs
Change-Id: Id483354e73e983793370b55a1a6a1f2dcd137dc9
2016-04-08 08:54:58 -07:00
Alex Converse
bb0e692151 Convert palette from double to float.
About 20% less time spent coding in vp10_k_means().

Change-Id: I5cf7605cde869a269776197bace70de353b07d83
2016-04-07 15:17:30 -07:00
Alex Converse
d1327aec1b Add roundf and lroundf replacements for VS < 2013.
Change-Id: I25678279ab44672acf680bf04d9c551156e2904b
2016-04-07 15:17:30 -07:00
Geza Lore
454989ff32 Make superblock size variable at the frame level.
The uncompressed frame header contains a bit to signal whether the
frame is encoded using 64x64 or 128x128 superblocks. This can vary
between any 2 frames.

vpxenc gained the --sb-size={64,128,dynamic} option, which allows the
configuration of the superblock size used (default is dynamic). 64/128
will force the encoder to always use the specified superblock size.
Dynamic would enable the encoder to choose the sb size for each
frame, but this is not implemented yet (dynamic does the same as 128
for now).

Constraints on tile sizes depend on the superblock size, the following
is a summary of the current bitstream syntax and semantics:

If both --enable-ext-tile is OFF and --enable-ext-partition is OFF:
     The tile coding in this case is the same as VP9. In particular,
     tiles have a minimum width of 256 pixels and a maximum width of
     4096 pixels. The tile width must be multiples of 64 pixels
     (except for the rightmost tile column). There can be a maximum
     of 64 tile columns and 4 tile rows.

If --enable-ext-tile is OFF and --enable-ext-partition is ON:
     Same constraints as above, except that tile width must be
     multiples of 128 pixels (except for the rightmost tile column).

There is no change in the bitstream syntax used for coding the tile
configuration if --enable-ext-tile is OFF.

If --enable-ext-tile is ON and --enable-ext-partition is OFF:
     This is the new large scale tile coding configuration. The
     minimum/maximum tile width and height are 64/4096 pixels. Tile
     width and height must be multiples of 64 pixels. The uncompressed
     header contains two 6 bit fields that hold the tile width/height
     in units of 64 pixels. The maximum number of tile rows/columns
     is only limited by the maximum frame size of 65536x65536 pixels
     that can be coded in the bitstream. This yields a maximum of
     1024x1024 tile rows and columns (of 64x64 tiles in a 65536x65536
     frame).

If both --enable-ext-tile is ON and --enable-ext-partition is ON:
     Same applies as above, except that in the bitstream the 2 fields
     containing the tile width/height are in units of the superblock
     size, and the superblock size itself is also coded in the bitstream.
     If the uncompressed header signals the use of 64x64 superblocks,
     then the tile width/height fields are 6 bits wide and are in units
     of 64 pixels. If the uncompressed header signals the use of 128x128
     superblocks, then the tile width/height fields are 5 bits wide and
     are in units of 128 pixels.

The above is a summary of the bitstream. The user interface to vpxenc
(and the equivalent encoder API) behaves as follows:

If --enable-ext-tile is OFF:
     No change in the user interface. --tile-columns and --tile-rows
     specify the base 2 logarithm of the desired number of tile columns
     and tile rows. The actual number of tile rows and tile columns,
     and the particular tile width and tile height are computed by the
     codec ensuring all of the above constraints are respected.

If --enable-ext-tile is ON, but --enable-ext-partition is OFF:
     No change in the user interface. --tile-columns and --tile-rows
     specify the WIDTH and HEIGHT of the tiles in units of 64 pixels.
     The valid values are in the range [1, 64] (which corresponds to
     [64, 4096] pixels in increments of 64).

If both --enable-ext-tile is ON and --enable-ext-partition is ON:
     If --sb-size=64 (default):
         The user interface is the same as in the previous point.
         --tile-columns and --tile-rows specify tile WIDTH and HEIGHT,
         in units of 64 pixels, in the range [1, 64] (which corresponds
         to [64, 4096] pixels in increments of 64).
     If --sb-size=128 or --sb-size=dynamic:
         --tile-columns and --tile-rows specify tile WIDTH and HEIGHT,
         in units of 128 pixels in the range [1, 32] (which corresponds
         to [128, 4096] pixels in increments of 128).

Change-Id: Idc9beee1ad12ff1634e83671985d14c680f9179a
2016-04-07 10:34:25 +01:00
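The tile-size arithmetic described in commit 454989ff32 above, for the --enable-ext-tile ON case, can be checked with a short sketch. The function names and the 4096-pixel constant name are assumptions made for illustration. The ranges ([1, 64] units of 64 pixels, [1, 32] units of 128 pixels) and the 65536-pixel frame limit are taken from the commit message.

```c
#define MAX_TILE_PIXELS 4096

/* Converts a --tile-columns / --tile-rows value, given in superblock
 * units, to pixels. Returns -1 if sb_size is not 64 or 128, or if
 * units falls outside [1, 4096 / sb_size]. */
int tile_size_in_pixels(int units, int sb_size) {
  if (sb_size != 64 && sb_size != 128) return -1;
  if (units < 1 || units > MAX_TILE_PIXELS / sb_size) return -1;
  return units * sb_size;
}

/* Number of tiles needed to cover frame_dim pixels along one axis;
 * the last tile may be partial. */
int tiles_needed(int frame_dim, int tile_dim) {
  return (frame_dim + tile_dim - 1) / tile_dim;
}
```

With 64-pixel tiles, a 65536-pixel dimension needs 65536 / 64 = 1024 tiles, which matches the 1024x1024 maximum stated in the message.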
Angie Chiang
6161f35037 Fix compile error of vp10_fwd_txfm2d_sse4_test.cc
Change-Id: I14a0821631e404d59a1121f7517f97de8d6f790f
2016-04-06 16:26:04 -07:00
Angie Chiang
24f9e42f69 Merge "Test c version in vp10_fwd_txfm2d_test.cc" into nextgenv2 2016-04-06 22:18:23 +00:00
Angie Chiang
6e16582dfd Merge "Put vp10_txfm_test.h into libvpx_test namespace" into nextgenv2 2016-04-06 21:58:08 +00:00
Angie Chiang
29a06a1733 Put vp10_txfm_test.h into libvpx_test namespace
Change-Id: I32ff059143d777fa4518d8e404ff16c890c7fecb
2016-04-06 10:29:27 -07:00
Julia Robson
4300e50cce Fixing assertion in *Large unit tests
In certain cases the code was subtracting the obmc cost
despite it not having been added previously.
For example with ref_mv, supertx, ext_inter, obmc & ext_refs
enabled the following test was failing but now passes:
"VP10/ArfFreqTestLarge.MinArfFreqTest/33"

Change-Id: I966853f34c18d5a1d4c7a56fa201c1b02973fc88
2016-04-06 11:22:10 +01:00
Julia Robson
0f05873c89 Fixing error when building with ref_mv experiment
Code was using variable that was only defined when
var_tx was also enabled

Change-Id: Ide02ff99b433bfc5c95b71e700c66562020cedae
2016-04-06 11:22:10 +01:00
Debargha Mukherjee
de3d15bb2c Merge "Refactoring and cosmetic changes to ext-inter expt" into nextgenv2 2016-04-06 01:19:06 +00:00
Angie Chiang
01862037c8 Test c version in vp10_fwd_txfm2d_test.cc
Change-Id: I6d9a5f790e984d76d0a5af4ca32ccc1c582805ee
2016-04-05 18:03:13 -07:00
Debargha Mukherjee
0fc82ea1cf Refactoring and cosmetic changes to ext-inter expt
Change-Id: Icd457480744b7734b3c412c9fed43be738373334
2016-04-05 15:16:18 -07:00
Yi Luo
c3e07b22c4 Fix high bit depth mask and variance reference function
- Use bitwise AND (&) instead of logical AND (&&) to
  generate correct testing input.
- Fix variance reference function to be consistent with
  our codebase implementation.
- Refer to the following issue:
https://bugs.chromium.org/p/webm/issues/detail?id=1166

Change-Id: I8c1ebb03e22dc9e1dcd96bdf935fc126cee71307
2016-04-05 12:58:06 -07:00
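The `&` versus `&&` mix-up fixed in commit c3e07b22c4 above matters because `&&` collapses its result to 0 or 1, so a test input "masked" with it is no longer a pixel value. The sketch below is a hypothetical illustration, not the test code itself. It masks a 16-bit sample to a given bit depth both ways.

```c
#include <stdint.h>

/* Intended behavior: keep only the low `bit_depth` bits of v. */
uint16_t mask_bitwise(uint16_t v, int bit_depth) {
  return (uint16_t)(v & ((1u << bit_depth) - 1u));
}

/* The buggy form: && yields 1 if both operands are nonzero, else 0. */
uint16_t mask_logical(uint16_t v, int bit_depth) {
  return (uint16_t)(v && ((1u << bit_depth) - 1u));
}
```

For a 10-bit depth, `mask_bitwise(0xFFFF, 10)` keeps 0x03FF, while `mask_logical(0xFFFF, 10)` yields only 1.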
Yi Luo
e6b089be43 Merge "Optimized HBD 4x4 variance calculation" into nextgenv2 2016-04-05 18:41:09 +00:00
Geza Lore
8917146a12 Fix supertx with ext-tile.
Change-Id: Ic2135c3812be009085c7c8e8dc15ee2ba618a67e
2016-04-05 12:49:55 +01:00
Angie Chiang
ff8c490b9a Branch dct to new implementation for bd12
Change-Id: I9281935653aacce22ac3100f79fb956c249e2bf3
2016-04-04 12:40:10 -07:00
Yi Luo
250935cab3 Optimized HBD 4x4 variance calculation
vpx_highbd_8/10/12_variance4x4_sse4_1 improves performance ~7%-11%.

Change-Id: Ida22bb2a2f7a58037cfd73e186d4f6267a960c02
2016-04-04 11:28:59 -07:00
Angie Chiang
f1060f5bc4 Change dct32x32's range
Bitdepth 10/12:
Fit coefficient range into 32 bits
Fit coefficient * const range into 32 bits

Bitdepth 8:
Fit coefficient range into 16 bits
Fit coefficient * constant range into 32 bits

Change-Id: I50b5a3132e8a9f5155c971ab0f6eb52876d2b5ca
2016-04-04 11:21:11 -07:00
Angie Chiang
39b3c025fa Fit dct's stage range into 32-bit when bitdepth is 12
Change-Id: I807e60c6dcacc50c087adcbdb1df022f8541efc5
2016-04-04 11:13:44 -07:00
Geza Lore
f0290cd127 Refactor get_partition to be universal.
Change-Id: I3a2fe4073bb94c5afc24d9274e6edcdb3aed934f
2016-04-04 15:22:25 +01:00
Geza Lore
e0dbfdeedc Minor refactoring of partition type processing.
Change-Id: Idcb1e94298d4b7d8832d285548ec2d2ced4b2988
2016-04-04 14:51:10 +01:00
Angie Chiang
75ae90f7a9 Pass tx_type into get_tx_scale
Change-Id: I8a8df9fdefa492f66cf2cd29b0b081ad69b5d85e
2016-04-01 12:53:10 -07:00
Alex Converse
d649065ea3 Merge "Remove duplicate ans parameter in bitstream functions." into nextgenv2 2016-04-01 18:35:11 +00:00
Alex Converse
c961bcc594 Merge "ANS experiment: Use ANS everywhere." into nextgenv2 2016-04-01 18:34:50 +00:00
Alex Converse
0f68d80420 Remove duplicate ans parameter in bitstream functions.
Change-Id: Icd459209dae328f90c9a875259fe5d201b2a4e45
2016-04-01 11:33:06 -07:00
Alex Converse
fb9186d68d ANS experiment: Use ANS everywhere.
Use ANS for all entropy coded data in VP10 including the compressed header and
modes and motion vectors. ANS tokens continue to be used for DCT tokens.

Change-Id: Idf709a747150601e4d95d81ecfb3dc7253d349df
2016-04-01 11:32:31 -07:00
Debargha Mukherjee
2fba8189de Merge "Loopfilter fix" into nextgenv2 2016-04-01 17:48:09 +00:00
Angie Chiang
9f879b3c5f Merge "change vp10_fwd_txfm2d_#x#_sse2 to vp10_fwd_txfm2d_#x#_sse4_1" into nextgenv2 2016-04-01 17:25:23 +00:00
Angie Chiang
2c2b9bd455 Merge "Remove redundant code from vp10_fwd_txfm2d.c" into nextgenv2 2016-04-01 17:25:13 +00:00
Angie Chiang
1b755039c6 Merge "Simplify rounding in vp10_[fwd/inv]_txfm[1/2]d_#x#" into nextgenv2 2016-04-01 17:24:50 +00:00
Angie Chiang
0a9eedfbef Merge "Add vp10_fwd_txfm2d_sse2" into nextgenv2 2016-04-01 17:24:34 +00:00
Debargha Mukherjee
f7457f5e89 Loopfilter fix
Fixes mismatch introduced in
https://chromium-review.googlesource.com/#/c/336645

Change-Id: I15cded221c18dbf87b5029bc464e975d5c7c40e3
2016-03-31 19:57:42 -07:00
Yaowu Xu
a416d5bd2d Fix a build issue
Change-Id: Ifdb32c487632098496bf59fcc76c518f8f0426d2
2016-03-31 16:06:24 -07:00
Debargha Mukherjee
2a6389bb8b Merge "Fix interpolation values and decouple interintra" into nextgenv2 2016-03-31 21:47:10 +00:00
Debargha Mukherjee
2be211e971 Fix interpolation values and decouple interintra
Decouples interintra modes and probability models from regular
intra modes, to enable creating/optimizing new interintra modes.
Also, fixes interpolation values for 128x128 interintra and obmc.

Change-Id: I5c2016db49b8f029164e5fe84c6274d4e02ff90e
2016-03-31 12:12:51 -07:00
Debargha Mukherjee
6d3fc82b7f Merge changes Id20526d0,Iee08d975 into nextgenv2
* changes:
  Refactor loopfilter level arrays to 2D.
  Rename MI_BLOCK_SIZE and MI_MASK macros.
2016-03-31 18:48:20 +00:00
Yi Luo
48aa76ac35 Fixed HBD variance unit test on incorrect bit mask operation
- Change logical AND (&&) to bitwise AND (&).
- Disable failed unit tests for next-step fixing.
https://bugs.chromium.org/p/webm/issues/detail?id=1166

Change-Id: I75cf8d114fa995731ae1cf4c782b134781274a1f
2016-03-31 09:17:59 -07:00
Jingning Han
aae7e0f6a4 Merge "Refactor the sub8x8 block motion search control" into nextgenv2 2016-03-31 15:50:38 +00:00
Geza Lore
10232eda8e Refactor loopfilter level arrays to 2D.
Change-Id: Id20526d0b6d1371dc9f45cb8b5f24b6974da7bc4
2016-03-31 15:52:12 +01:00
Geza Lore
511da8cbe5 Rename MI_BLOCK_SIZE and MI_MASK macros.
Rename MI_BLOCK_SIZE.* -> MAX_MIB_SIZE.* (MIB is for MI Block).
Rename MI_MASK.* -> MAX_MIB_MASK.*

There are no functional changes.

This is in preparation for coding the superblock size at the frame
level, which will require some of these constants to become variables.
The new names better reflect future semantics, and hence make the code
clearer.

Change-Id: Iee08d97554cf4cc16a5dc166a3ffd1ab91529992
2016-03-31 09:57:41 +01:00
Alex Converse
615482af92 Merge "Use write_modes_b_wrapper throughout." into nextgenv2 2016-03-31 06:04:03 +00:00
Hui Su
cce6688c31 Merge "Set block size upper bound for Palette mode" into nextgenv2 2016-03-31 00:23:11 +00:00
Geza Lore
9d288bf698 Use write_modes_b_wrapper throughout.
Change-Id: Ifbef3aa6e6b0dbc3701a9ef91b8b685a918d84f4
2016-03-30 22:26:54 +00:00
Angie Chiang
c7c40d2329 Generalize txfm scale in highbd quantizer
Change-Id: I359aa49c09b244e0d44ebd09442e365a3d22556c
2016-03-30 15:25:26 -07:00
Angie Chiang
25520d8dc3 change vp10_fwd_txfm2d_#x#_sse2 to vp10_fwd_txfm2d_#x#_sse4_1
The speed performance for running 20k times is as follows

Notice that the vp10_highbd_fdct#x#_sse2 version is
16-bit version plus range check

The rest are 32-bit version

vp10_fwd_txfm2d_4x4_c (2 ms)
vp10_fwd_txfm2d_8x8_c (9 ms)
vp10_fwd_txfm2d_16x16_c (45 ms)
vp10_fwd_txfm2d_32x32_c (233 ms)

vp10_fwd_txfm2d_4x4_sse4_1 (2 ms)
vp10_fwd_txfm2d_8x8_sse4_1 (3 ms)
vp10_fwd_txfm2d_16x16_sse4_1 (16 ms)
vp10_fwd_txfm2d_32x32_sse4_1 (80 ms)

vp10_highbd_fdct4x4_c (1 ms)
vp10_highbd_fdct8x8_c (3 ms)
vp10_highbd_fdct16x16_c (17 ms)
highbd_fdct32x32_c (160 ms)

vp10_highbd_fdct4x4_sse2 (0 ms)
vp10_highbd_fdct8x8_sse2 (2 ms)
vp10_highbd_fdct16x16_sse2 (8 ms)
highbd_fdct32x32_sse2 (105 ms)

Change-Id: I24daf1e0d4d66e91e4ce61ef71cefa7b70ee90ce
2016-03-30 15:25:26 -07:00
Angie Chiang
c75f64780b Remove redundant code from vp10_fwd_txfm2d.c
Change-Id: I87ae5e93957616c0f5160a4f679e42f77092c33f
2016-03-30 15:25:26 -07:00
Angie Chiang
f2b311f580 Simplify rounding in vp10_[fwd/inv]_txfm[1/2]d_#x#
Change-Id: I24ce46e157dc5b9c0d75000a1a48e9c136ed4ee1
2016-03-30 15:25:26 -07:00
Angie Chiang
11d2bb5429 Add vp10_fwd_txfm2d_sse2
Change-Id: Idfbe3c7f5a7eb799c03968171006f21bf3d96091
2016-03-30 15:25:26 -07:00
Angie Chiang
64413a6ca7 Parameterize transform scale for quantizer
This is to facilitate changing transform scale later

Change-Id: Ic8ca5afba57d2489ebd191ccc40c1b31605a0d8c
2016-03-30 15:25:26 -07:00
hui su
cbb8be769d Set block size upper bound for Palette mode
Avoid buffer overflow in case of such new experiments as
128 x 128 superblock size.

Change-Id: Ib775f3925a85fc87227c0ddd9b6a6110a12ef196
2016-03-30 14:39:44 -07:00
Debargha Mukherjee
8d3a4aa891 Some fixes/speed-ups on inter-intra part of ext-inter
Fixes an issue with rectangular inter-intra blocks.
Includes various other refactoring and cleanups to enable fast mixing
of inter and intra predictors.
Uses only the best single inter reference so far for the inter-intra
search.

About 30% speed-up with a 0.1% hit in performance.

This is part one of overhauling the ext-inter experiment. To be
continued in subsequent patches.

Change-Id: Id10ee100c78c6e00009a3a4f930a4435ef403a95
2016-03-30 14:39:29 -07:00
Debargha Mukherjee
91707ac79e Merge "Extend superblock size to 128x128 pixels." into nextgenv2 2016-03-30 20:55:32 +00:00
Geza Lore
552d5cd715 Extend superblock size to 128x128 pixels.
If --enable-ext-partition is used at build time, the superblock size
(sometimes also referred to as coding unit (CU) size) is extended to
128x128 pixels.

Change-Id: Ie09cec6b7e8d765b7555ff5d80974aab60803f3a
2016-03-30 18:23:06 +01:00
Debargha Mukherjee
e467627f33 Merge "Fix for ext_interp experiment" into nextgenv2 2016-03-30 14:44:39 +00:00
Debargha Mukherjee
cd1d01b96a Merge "Add new config flag for global motion experiment" into nextgenv2 2016-03-30 14:36:45 +00:00
Jingning Han
b6238b413e Refactor the sub8x8 block motion search control
Change-Id: Ia340e66e0a61403070adf8e4f18f00eab143f8f7
2016-03-29 09:53:55 -07:00
Hui Su
aa6f5724ec Merge "Palette mode: record selected transform type" into nextgenv2 2016-03-29 16:23:07 +00:00
Alex Converse
21ce8b9671 Merge "Force the VPX boolcoder trees in the ANS test." into nextgenv2 2016-03-29 16:23:00 +00:00
Yaowu Xu
37241e6f95 Merge "Merge branch 'masterbase' into nextgenv2" into nextgenv2 2016-03-29 16:05:53 +00:00
Julia Robson
068e799459 Fix for ext_interp experiment
Amends previous commit to also handle subsampling correctly.
Change ID of prev commit: I6b07e6cf9b287ba4b5bd6599af4a7412e50b3bdc

Was causing occasional failures for 422 streams due to accessing
elements beyond the extent of the bmi array.

Change-Id: I37ebabf4c01ca84bcd1851428172bdf753805d98
2016-03-29 16:09:49 +01:00
hui su
4ab00912c4 Palette mode: record selected transform type
Change-Id: I4c3d3224571176ac924d79ddfaba56990fc4000e
2016-03-28 20:43:59 -07:00
Jingning Han
78ee83125b Merge "Fix a rdcost computation issue in sub8x8 block mode search" into nextgenv2 2016-03-29 00:51:01 +00:00
Yaowu Xu
c810740c36 Merge branch 'masterbase' into nextgenv2
Conflicts:
	vp9/encoder/vp9_encoder.c
	vpx_dsp/x86/convolve.h

Change-Id: I60c3532936bedd796a75dfe78245a95ec21e2e55
2016-03-28 17:44:28 -07:00
Jingning Han
7279a4748f Merge "Rename run_rd_check to run_mv_search" into nextgenv2 2016-03-28 23:10:48 +00:00
Jingning Han
b534987110 Merge "Rework the predicted motion vector for sub8x8 block" into nextgenv2 2016-03-28 23:10:35 +00:00
Jingning Han
d133524e7c Fix a rdcost computation issue in sub8x8 block mode search
Compute the rate-distortion cost for sub8x8 blocks with integer
motion vectors.

Change-Id: I7dc034fcc4bec3850f26d1f9ae0595c91df1137e
2016-03-28 23:09:53 +00:00
Jingning Han
59d45d603b Rename run_rd_check to run_mv_search
Improve the readability in the related rate-distortion optimization
search control function of sub8x8 blocks.

Change-Id: I7f7456bf40a98aa5146abfe0488cda745b84d899
2016-03-28 21:59:10 +00:00
Jingning Han
0586460938 Rework the predicted motion vector for sub8x8 block
This commit makes the sub8x8 block use its nearest neighbor's
motion vector as the predicted motion vector for NEWMV mode. It improves
the coding performance by 0.12%.

Change-Id: I99e56715b327573ce7e8a26e3515a4984dadfd98
2016-03-28 14:58:17 -07:00
Angie Chiang
4144a11552 Merge "Use vp10_[fwd/inv]_txfm2d_add_32x32 for bd 10" into nextgenv2 2016-03-28 19:20:48 +00:00
Alex Converse
7b22c1d433 Force the VPX boolcoder trees in the ANS test.
Change-Id: I282f958c35aabcdfaf1077f8909c56c999420937
2016-03-28 12:10:08 -07:00
Yunqing Wang
89a8174fec Merge "Make set_reference control API work in VP9 and VP10" into nextgenv2 2016-03-28 18:33:03 +00:00
Hui Su
14f2d03b4b Merge "Fix assertion fail in build_intra_predictors" into nextgenv2 2016-03-28 18:14:47 +00:00
Angie Chiang
33833aefdd Merge "Use vp10_[fwd/inv]_txfm2d_add_#x# for bd 10" into nextgenv2 2016-03-28 18:11:47 +00:00
Angie Chiang
46b234478f Use vp10_[fwd/inv]_txfm2d_add_32x32 for bd 10
Change-Id: I996c48a90d7d71b52594a91a35cb8712c7fc212e
2016-03-28 11:08:40 -07:00
Alex Converse
72e29c3a73 Merge changes I3c72a2d8,I9905f3a8 into nextgenv2
* changes:
  Add pluggable bitwriters.
  Add pluggable bitreaders.
2016-03-28 16:59:18 +00:00
Yunqing Wang
9aaa3c933c Make set_reference control API work in VP9 and VP10
Moved the API patch from NextGen to NextGenv2 and also added this
API to VP10. An example was included. To try it, for example, run
the following command:
$ examples/vpx_cx_set_ref vp10 352 288 in.yuv out.ivf 4 30

Change-Id: Ib56bc3d365e530cfc8d859a13ddbf4c007907b81
2016-03-28 09:55:24 -07:00
Hui Su
9ab6e589b0 Merge "Fixes for Palette mode" into nextgenv2 2016-03-28 16:44:34 +00:00
hui su
f24b91c9e1 Fix assertion fail in build_intra_predictors
Change-Id: Id6683b9593b52aa0d159f8f013782d9e0bd07206
2016-03-28 09:37:54 -07:00
Yi Luo
80bc1d1d90 Merge "8x8/16x16 HT types V_DCT to H_FLIPADST SSE2 optimization" into nextgenv2 2016-03-28 15:39:24 +00:00
hui su
8a128c2a72 Fixes for Palette mode
This patch fixes 2 issues in Palette mode:
1. More memory is needed in PALETTE_BUFFER for 444 video format.
2. A merge issue caused by
https://chromium-review.googlesource.com/#/c/333940/7

Change-Id: I2aedc7dfdfb6b66fbd600189ec6e1e2cc6120d40
2016-03-25 18:16:44 -07:00
Alex Converse
9859dde47b Merge "Use speed 2 on superframe test." into nextgenv2 2016-03-26 00:49:06 +00:00
Alex Converse
0aef392f1b Use speed 2 on superframe test.
No need to avoid shortcuts when all we are testing is the superframe
syntax. Decreases the run time of the VP10 version of the test from 22
seconds to 3 seconds on my machine.

Change-Id: If0c3551cbb8af8b803e02629e803e5f09da76cd1
2016-03-26 00:48:21 +00:00
Alex Converse
297c91a857 Merge "Fix failing test: VP10/SuperframeTest.TestSuperframeIndexIsOptional/0" into nextgenv2 2016-03-26 00:14:21 +00:00
Yi Luo
770bf71503 8x8/16x16 HT types V_DCT to H_FLIPADST SSE2 optimization
- Wrote function: fidtx8_sse2() and fidtx16_sse2().
- Turned on vp10_fht8x8_sse2()/vp10_fht16x16_sse2() for new types.
- Updated 8x8/16x16 unit tests for accuracy/speed.
- Running 20K times with random numbers and getting through
  tx type from V_DCT to H_FLIPADST, SSE2 speed improvement:
  8x8: ~131%
  16x16: ~66%

Change-Id: Ibbb707e932a08fec3b1f423a7dab280a1d696c9a
2016-03-25 16:48:19 -07:00
Alex Converse
c5d118f103 Fix failing test: VP10/SuperframeTest.TestSuperframeIndexIsOptional/0
Failing since: 66f2f65 Merge MISC_FIXES

Change-Id: I8135d6a9d74972c595f1b6294fb842e68f91e50e
2016-03-25 16:07:00 -07:00
Yue Chen
e63792e5cf A major speed up for obmc experiment
Skip checking obmc when regular inter predictor is not so good (the
rd-cost for Y residual is greater than the total rd of the best mode
so far.)

Performance change compared to full rd search:
  +0.006% lowres, -0.056% midres
Encoding time :
  1.14X baseline (was 1.42X)

Change-Id: I11350f955a20e1a2331be458537a915e09fbedf3
2016-03-25 14:06:52 -07:00
Alex Converse
d5c6b83431 Merge "Fix memory leak and sloppiness around the uncompressed ANS buffer." into nextgenv2 2016-03-25 20:08:01 +00:00
Yunqing Wang
916bdfd9ac Merge "Recover tile coding performance" into nextgenv2 2016-03-25 19:18:58 +00:00
Alex Converse
30097af4ea Fix memory leak and sloppiness around the uncompressed ANS buffer.
Change-Id: Ic9ed1f88f5550b69a45a0fdc71aae5864db7e178
2016-03-25 11:11:07 -07:00
Alex Converse
65bea98d74 Add pluggable bitwriters.
This will make the code change for a pure ANS experiment manageable.

Change-Id: I3c72a2d8e75afa2cc8e56992ee91f4760202f4d4
2016-03-25 11:02:41 -07:00
Alex Converse
efd566ff93 Add pluggable bitreaders.
This will make the code change for a pure ANS experiment manageable.

Change-Id: I9905f3a89f492a4346860463a72fa8c52aac4c8e
2016-03-25 11:02:41 -07:00
Hui Su
f9d77d66e6 Merge "Speed up ext-intra" into nextgenv2 2016-03-25 17:52:33 +00:00
Yunqing Wang
bdcc14051b Recover tile coding performance
After porting tile coding from VP9 to VP10, some performance
degradation was seen because of the difference between VP9 and
VP10 baseline. This patch disabled some features in VP10 while
tile coding is turned on. Also, an encoder control API was added
back for this use case.

Change-Id: I8f736db8388408a8cc35320a2f80abb02906571c
2016-03-25 09:05:25 -07:00
hui su
c85a68123f Speed up ext-intra
Skip filtered intra modes search in inter frame when DC mode is
worse than the best mode so far.

With ext-intra enabled, the overall speed is increased by 20~40%;
performance drop is 0.03% on lowres and 0.05% on midres.

Change-Id: I75d2503b067cf5e46e3533b97fb01497e125baa7
2016-03-24 21:43:18 -07:00
Yi Luo
07a1fd413f Merge "4x4 hybrid transform type V_DCT to H_FLIPADST SSE2 optimization" into nextgenv2 2016-03-25 00:07:17 +00:00
Yi Luo
4970388c23 4x4 hybrid transform type V_DCT to H_FLIPADST SSE2 optimization
- Added function fidtx4_sse2().
- Turned on vp10_fht4x4_sse2() for these tx types.
- Updated 4x4 unit test for speed/accuracy.
- 4x4 Unit test passed.
- Running 20K times with random numbers for tx type from
  V_DCT to H_FLIPADST, SSE2 against C, speed improves ~46%.

Change-Id: I828088b7f98dc0f5939a72e3fcd6cb0b8d8dd8bf
2016-03-24 15:09:18 -07:00
Jingning Han
9cb3664c0e Fix compiling error in highbd transform unit test
Change-Id: Id09e1913c1ac965b78df2e67471807019c89f8ab
2016-03-24 21:30:43 +00:00
Jingning Han
fa76102929 Merge "Fix an enc/dec mismatch issue in DRL experiment" into nextgenv2 2016-03-24 19:02:13 +00:00
Jingning Han
4823dc364e Fix an enc/dec mismatch issue in DRL experiment
This was broken due to the leakage between consecutive CLs.

Change-Id: I08ba8c67a42871d9488729ed854845641aa7ca30
2016-03-24 09:48:54 -07:00
Geza Lore
490ba1ad25 Port large scale tile coding features from nextgen.
If configured with --enable-ext-tile, the codec uses an alternative
tile coding syntax in the bitstream. Changes include:
 - The maximum number of tile rows and columns is extended to 1024
   each.
 - The minimum tile width/height is 64 pixels (1 superblock).
 - A tile copy mode is added where a tile directly reuses the coded
   data of a previous tile.
 - The meaning of the tile-columns and tile-rows codec parameters is
   overloaded to mean tile-width and tile-height in units of 64
   pixels.
 - All tiles should now be independent, including rows within the
   same columns, so large scale parallel, or independent decoding is
   possible.
 - vpxdec also gained the options to decode only a particular tile,
   tile row, or tile column.

Changes without --enable-ext-tile:
 - All tiles should now be independent, including rows within the
   same columns, so large scale parallel, or independent decoding is
   possible.
 - vpxenc default tile configuration changed to use 1 tile column.

Change-Id: I0cd08ad550967ac18622dae5e98ad23d581cb33e
2016-03-24 09:26:05 +00:00
Angie Chiang
b4334460cb Merge "Call vp10_fwd_txfm_4x4 in encode_inter_mb_segment" into nextgenv2 2016-03-24 00:38:04 +00:00
Yi Luo
ea94451f20 Merge "Misc. updates for highbd changes" into nextgenv2 2016-03-23 22:43:47 +00:00
Sarah Parker
091f0804e4 Add new config flag for global motion experiment
Change-Id: I312af6af911cd0f52357745324b74e56a8d08d70
2016-03-23 15:24:44 -07:00
Yi Luo
659c2c98e1 Misc. updates for highbd changes
- Use Makefile to control the build for highbd_fwd_txfm_sse4.c.
- Fixed hybrid transform (HT) types due to recent update.
- Added new unit test cases for highbd HT.

Change-Id: Ifd768a9b429a8c21ed40c1de8152fb5ac71e2f90
2016-03-23 12:10:52 -07:00
Jingning Han
1fcb5fc755 Refactor motion vector residual coding process
This commit separates the predicted motion vector from the nearestmv
motion vector in the coding process for both regular and sub8x8
block sizes.

Change-Id: I703490513b0194e6669ebf719352db015facb3e1
2016-03-23 12:10:38 -07:00
Angie Chiang
d9a0cbb1b7 Use vp10_[fwd/inv]_txfm2d_add_#x# for bd 10
Change-Id: Ie35bdbd7aafae693e3106d7ccbbdd8e65ee8800c
2016-03-23 12:05:12 -07:00
Angie Chiang
2b93fde9da Call vp10_fwd_txfm_4x4 in encode_inter_mb_segment
Change-Id: Ieabe5534e5f4fb3f2d751a3cfc682208b3913715
2016-03-23 11:43:45 -07:00
Yi Luo
deb33056d1 Merge "Highbd fht4x4 SSE4.1 optimization for DCT_DCT mode" into nextgenv2 2016-03-23 18:30:40 +00:00
Hui Su
daf2fb42e6 Merge "Add "entropy" experiment" into nextgenv2 2016-03-23 17:50:57 +00:00
Alex Converse
a06e39a945 Merge "Add buf_ans.h to the Makefile." into nextgenv2 2016-03-23 16:27:13 +00:00
Alex Converse
b5454b245a Merge "Add some ANS helpers needed to replace the vpx bool coder with pure ANS." into nextgenv2 2016-03-23 16:21:58 +00:00
Hui Su
13501fe45f Merge "Small speed up for super_block_uvrd" into nextgenv2 2016-03-23 16:16:46 +00:00
Yi Luo
977dccd12c Highbd fht4x4 SSE4.1 optimization for DCT_DCT mode
- Setup function vp10_highbd_fht4x4_sse4_1 for highbd SSE4.1
  intrinsics optimization.
- Wrote SSE4.1 functions: load_buffer_4x4(), write_buffer_4x4(),
  and fdct4x4_sse4_1().
- Used logical right shift to avoid coeff memory write/read.
- Turned on vp10_highbd_fht4x4_sse4_1 for DCT_DCT mode only.
- Improved overall encoding performance >2.3% for 50 frames
  sequence, park_joy_1080p_12.y4m, in which, --input-bit-depth=12,
  --bit-depth=12, 50 frames.
- Unit test passed.

Change-Id: Idd6dc6e472cbbf235f0ade4f66fbe859a860a004
2016-03-23 09:13:45 -07:00
Debargha Mukherjee
7a3bae768e Merge "Porting ext_partition experiment from nextgen" into nextgenv2 2016-03-23 04:58:38 +00:00
Debargha Mukherjee
a61e506200 Make the tile independence test shorter
Uses 15 frames instead of 30. Also only test speed 0 for VP10.

Change-Id: Icace050edd974622d83bdf843058f63bd8d3a84b
2016-03-22 18:00:04 -07:00
Alex Converse
6b9cb8c489 Add some ANS helpers needed to replace the vpx bool coder with pure ANS.
Change-Id: I32b63fca020c410cef16e93379b4e6e281ccbccd
2016-03-22 16:23:23 -07:00
Yue Chen
2613b5e9d6 Merge "Refactor prediction functions of OBMC" into nextgenv2 2016-03-22 21:06:16 +00:00
Julia Robson
5cce322a09 Porting ext_partition experiment from nextgen
This has been ported under ext_partition_types because it is due
to be combined with the coding_unit_size experiment which is
already being ported under ext_partition

Change-Id: I47af869ae123ddf0aa99160dac644059d14266ee
2016-03-22 12:29:01 -07:00
Alex Converse
b00c09026c Wrap write_modes functions with macros to avoid ifdefs at all the callsites.
Change-Id: I5a960bf63ec404f0fbfe6a404f436ef4122a219d
2016-03-22 10:02:23 -07:00
Angie Chiang
9d380d8872 Merge "mv vp10_fwd_txfm2d_#x# into vp10_rtcd.h" into nextgenv2 2016-03-22 01:07:56 +00:00
Angie Chiang
063e965d7d Merge "Passing TXFM_TYPE instead of func pointer" into nextgenv2 2016-03-22 01:07:42 +00:00
Yue Chen
b5083af67a Merge "Refactor transform type-size search function" into nextgenv2 2016-03-22 00:58:44 +00:00
Jingning Han
4df51c8de4 Merge "Refactor sub8x8 reference motion vector search function" into nextgenv2 2016-03-22 00:07:45 +00:00
Jingning Han
bfdcccd8a1 Merge "Rework the DRL syntax entropy coding system" into nextgenv2 2016-03-22 00:07:36 +00:00
Yue Chen
2e3f77316d Refactor prediction functions of OBMC
Merge the functions that generate prediction by above/left predictors
for the encoder and the decoder.

Change-Id: I57e53a8f2eb8d3028c4ed0c9abdcbf00503f95a0
2016-03-21 17:04:13 -07:00
Yue Chen
7c1f6d1862 Refactor transform type-size search function
Decompose choose_tx_size_from_rd into three functions that determine
the transform coding rd at different stages. Besides the original
function, txfm_yrd() calculates the rd for fixed size and type.
choose_tx_size_fix_type() fixes the type and searches for the size.
It can enable other experiments to do restricted tx searches so as to
reduce the impact on speed.
Similar refactoring is done for select_tx_type_yrd() in VAR_TX.

Performance change in baseline is trivial:
0.014/0.001/-0.020 for lowres/midres/hdres.

Change-Id: I2ecbf6066329be088ec1bfb69013b657b14b8afe
2016-03-21 16:12:05 -07:00
Alex Converse
e6a136e864 Add buf_ans.h to the Makefile.
Change-Id: I6339912d240a1c2c3aa2f7303e7ca4d9721a29f4
2016-03-21 14:13:26 -07:00
Yaowu Xu
cbfc15b11b Merge "Properly set rate_nocoef when palette mode is used" into nextgenv2 2016-03-21 20:44:17 +00:00
Debargha Mukherjee
c28dbdf665 Merge "Adds 1D transforms for ADST/FlipADST to make 16" into nextgenv2 2016-03-21 20:40:21 +00:00
Alex Converse
d324c6b025 Write MB tokens using the forward buffered ANS writer.
This allows sharing more code paths with the rest of the code and allows
for easier compatibility with the other experiments.

Change-Id: Id288b533805a4d0657ec2f17542f2e6ad23ebdb4
2016-03-21 18:43:14 +00:00
Alex Converse
109ef96a5f Merge "Add a placeholder forward buffered ANS coder." into nextgenv2 2016-03-21 18:41:32 +00:00
Debargha Mukherjee
1b17559327 Adds 1D transforms for ADST/FlipADST to make 16
Makes a set of 16 transforms total, adding all 1D
combinations of ADST and FlipADST, and removing all DST
transforms.

lowres, midres both improve by about 0.1% and hdres by
-0.378% in BDRATE but with fewer transforms that are also
simpler.

Further experiments to continue later.

Change-Id: I7348a4c0e12078fdea5ae3a2d36a89a319ffcc6e
2016-03-21 11:19:36 -07:00
Yaowu Xu
c96c3fa2b3 Properly set rate_nocoef when palette mode is used
Change-Id: Iff04c82b3d3b5cf2c7700717c3c3d678bbbb9f9b
2016-03-21 11:07:53 -07:00
Angie Chiang
abd447e339 mv vp10_fwd_txfm2d_#x# into vp10_rtcd.h
Change-Id: Iad7352698786791b0fd7c005a7edfd1724b71599
2016-03-21 10:51:54 -07:00
Angie Chiang
40ef86f27d Passing TXFM_TYPE instead of func pointer
This is to facilitate sse2 implementation

Change-Id: Id2f53e83c5508c4445d9b1bba00a649cb4da6b74
2016-03-21 10:50:59 -07:00
Jingning Han
66df6e7c7f Refactor sub8x8 reference motion vector search function
Rework the interface to allow the codec to store the reference motion
vector list information for the coding process.

Change-Id: I47e26587f6c0808655e4626f316ec7614a7ad8ed
2016-03-21 10:02:08 -07:00
Jingning Han
5c9d315572 Rework the DRL syntax entropy coding system
This commit re-designs the probability model for the syntax elements
of the dynamic motion vector referencing system.

Change-Id: Icfb8203c7e8f64e10e99f5890e25e6f6b15fe5d1
2016-03-21 09:52:33 -07:00
Jingning Han
4914ae4622 Merge "Enable dynamic motion vector referencing for newmv mode" into nextgenv2 2016-03-19 00:40:04 +00:00
Debargha Mukherjee
6074403d2d Merge "Add ext_partition_types config option" into nextgenv2 2016-03-18 23:51:55 +00:00
Debargha Mukherjee
3c065ac46a Merge "Refactor bsse and skip_txfm in MACROBLOCK." into nextgenv2 2016-03-18 23:51:40 +00:00
Debargha Mukherjee
05029a47a1 Merge "Refactor save_context restore_context in rd_pick_partition." into nextgenv2 2016-03-18 23:51:06 +00:00
Debargha Mukherjee
0ac48f8f65 Merge "Refactor mbmi->inter_tx_size to 2D array." into nextgenv2 2016-03-18 23:50:25 +00:00
Sarah Parker
0adb805db9 Merge "Remove prune three from speed features" into nextgenv2 2016-03-18 21:29:24 +00:00
Sarah Parker
fab5454a16 Remove prune three from speed features
Not getting good results for this feature, will try again
when transforms are frozen.

Change-Id: Id12396786cb9369ad34d0bd845f7beba3a037726
2016-03-18 13:06:40 -07:00
Alex Converse
44ce668063 Add a placeholder forward buffered ANS coder.
This buffered ANS coder supports coding the symbols in forward (decode)
order. Rather than windowing or growing the buffer, right now this
coder merely asserts that the buffer will never overflow.

This approach should allow ANS to be used as a drop in replacement for
other entropy coders rather than requiring complicated reversal logic
throughout the codebase.

Change-Id: I6689271233d0e22fea94c51950415dad5af96598
2016-03-18 19:33:45 +00:00
Yaowu Xu
42e5c2ad8a Two minor logic fixes
Change-Id: I1d5624fb2f34f87a55613036851034ec7c2d0b76
2016-03-18 11:48:19 -07:00
Jingning Han
93748c3e4f Enable dynamic motion vector referencing for newmv mode
This commit enables the dynamic motion vector predictor for NEWMV
mode. It allows the codec to select the best motion vector predictor
in a rate-distortion optimization framework for motion vector
residual coding. The compression performance is improved:
lowres  0.14%
midres  0.27%
hdres   0.24%

Change-Id: I6a601c74eb6cb0b71a613336d40363359f2edecd
2016-03-18 09:51:37 -07:00
hui su
30d2d9783e Small speed up for super_block_uvrd
Early termination if U plane RD cost is large enough.

No notable compression performance changes.

Change-Id: Ieeefc5859cb55d94391b502b4bd840bc8bcb2578
2016-03-18 09:28:10 -07:00
Geza Lore
f8cfb72a32 Refactor bsse and skip_txfm in MACROBLOCK.
Simple refactoring to 2 dimensional arrays, in preparation for 128
wide superblocks.

Change-Id: I40d447bd9fbd4f755534ea3cc82fc8f4676cea07
2016-03-18 15:30:10 +00:00
Geza Lore
efe7d4e5a2 Refactor mbmi->inter_tx_size to 2D array.
This is in preparation of increasing the superblock size.

Change-Id: I9197e397399fbe8aec1178a45ea0337dd90412d7
2016-03-18 15:30:09 +00:00
Julia Robson
8bc8f2711a Add ext_partition_types config option
Change-Id: I91d280a64d1da77be31004ea91d08d1a70529e42
2016-03-18 15:22:42 +00:00
Hui Su
cb61bfa695 Merge "Minor bug fix in ext-intra experiment" into nextgenv2 2016-03-18 05:01:56 +00:00
hui su
507c55b227 Minor bug fix in ext-intra experiment
No performance change observed from borg tests.

Change-Id: I20c232c5dde8cfc84452a4c7185389bd8b812ffd
2016-03-17 19:37:21 -07:00
Sarah Parker
99235fd908 Merge "Fix hbd segfault for prune_one and prune_two" into nextgenv2 2016-03-17 22:40:47 +00:00
Angie Chiang
cb3072ca4a Refactor: call inv_txfm_add
Change-Id: I52c209a5db1b4a6525c04b70291a08ab5a68c6fe
2016-03-17 11:19:00 -07:00
Angie Chiang
6b55b8dc93 Refactor: Merge inverse_transform_block_inter/intra
Merge inverse_transform_block_inter and inverse_transform_block_intra
into inverse_transform_block

Change-Id: I0f561830e639e3bf8d831c28a7c784cb0c4c8b09
2016-03-17 11:19:00 -07:00
Angie Chiang
e8bcce1255 Merge "add dct 64x64 transform" into nextgenv2 2016-03-17 18:15:11 +00:00
Sarah Parker
79dd1ee907 Fix hbd segfault for prune_one and prune_two
Change-Id: I71da102550aa7d81961e5f10c71058d5fa8dc6c4
2016-03-17 09:58:29 -07:00
Geza Lore
24659dbca3 Refactor save_context restore_context in rd_pick_partition.
This is a cosmetic patch that removes a great deal of conditional
compilation around CONFIG_VAR_TX from the partition search function.

Change-Id: I9dcef9d4fe6847b793c77bdf565a5cacbdfacd59
2016-03-17 15:53:59 +00:00
Yaowu Xu
51f951292a Merge "Properly save and restore skip related variables" into nextgenv2 2016-03-17 14:41:03 +00:00
Angie Chiang
ed2514a22c add dct 64x64 transform
Change-Id: I131c4d1216cd156e520b8a91c4438c2d3c6602cb
2016-03-16 19:37:21 -07:00
Yunqing Wang
a129905c70 Merge "Optimize HBD up-sampled prediction functions" into nextgenv2 2016-03-16 23:52:08 +00:00
Yaowu Xu
4fcc4f816e Properly save and restore skip related variables
Change-Id: Id52744e140585c08d047fb395b75ac9318a0b4c3
2016-03-16 16:28:52 -07:00
hui su
83b47af18d Add "entropy" experiment
This patch added two features to improve entropy coding efficiency
for coefficient tokens.

1. Choose 1 of 4 default probability tables based on q-index for
key-frames.
It is ported from nextgen branch:
https://chromium-review.googlesource.com/#/c/280586/

2. Do backward update after each superblock (64X64) row using
subframe token counts.

Coding gain: 0.1% on lowres; 0.42% on midres; 0.36% on hdres.
Much larger gain for key-frames: 2.6%, 2.3%, 1.7%.

Design doc: go/huisu-entropy

Change-Id: Ia3b6a615636be09247d70e4c520405637561532b
2016-03-16 11:55:50 -07:00
Angie Chiang
f330444a09 Merge "remove filter_cache" into nextgenv2 2016-03-16 17:21:53 +00:00
Geza Lore
c2005c578b Factor out zeroing above and left context.
Change-Id: I6e5d8cff869c7415a924f845c9e6ccaabe2b7a9b
2016-03-16 13:08:29 +00:00
Geza Lore
8735c17ceb Fix build without supertx.
Change-Id: Ib60821487710f6cf06aaa8ddcbdd5487ba1cbe4f
2016-03-16 13:08:13 +00:00
Yaowu Xu
fced640a0a Merge "Properly set the estimate of rate_nocoef" into nextgenv2 2016-03-15 23:23:12 +00:00
Yaowu Xu
2becffaef5 Properly set the estimate of rate_nocoef
This commit fixes the computation of rate_nocoef for the situation where
rate_y is left uninitialized at INT_MAX because x->skip is true.

Change-Id: If3dde4e4ee16667f4408067d3bb3084f916272f1
2016-03-15 15:00:53 -07:00
Angie Chiang
b6fef12481 remove filter_cache
PSNR test
        lowres  hdres
lowbd   -0.013  0.067
highbd  -0.044  0.039

Change-Id: Iefdb1e966bd004b2027456778185b675e8fb9b81
2016-03-15 14:40:59 -07:00
Hui Su
c67dd39adc Merge "Add "entropy" experiment flag" into nextgenv2 2016-03-15 20:46:09 +00:00
Debargha Mukherjee
dcbbb81605 Merge "Refactor 1D transforms" into nextgenv2 2016-03-15 19:08:07 +00:00
hui su
08d7f44c39 Add "entropy" experiment flag
For experiments to improve compression efficiency of entropy coding.

Change-Id: Idf30dc554bd7eea3a79d21d18515c3e6b8f64b26
2016-03-15 11:44:01 -07:00
Debargha Mukherjee
cb37db126e Merge "Fix copy/zero macros." into nextgenv2 2016-03-15 17:45:31 +00:00
Jingning Han
b00aa8f216 Merge "Turn off 32x32 transform type selection" into nextgenv2 2016-03-15 16:59:37 +00:00
Geza Lore
a301f5e03d Fix copy/zero macros.
Change-Id: I2df3b6ecd35406ee05c2aa4e49be779e73e1bdc6
2016-03-15 10:57:58 +00:00
Debargha Mukherjee
9b88762b17 Refactor 1D transforms
In preparation for adding more 1D variants with ADST/FlipADST/etc.

BDRATE actually improves by 0.21% on lowres.

Change-Id: I2fa4720c69fe001fa666119a284dfc6b17fffab2
2016-03-14 22:30:09 -07:00
Yunqing Wang
5f5552d846 Optimize HBD up-sampled prediction functions
Optimized 2 up-sampled reference prediction functions in high-bit
depth case. This reduced the HBD encoding time by 3%.

Change-Id: I8663ffb5234f5e70168c0fc9ca676309fe8e98f2
2016-03-14 19:04:33 -07:00
Hui Su
429d304bc1 Merge "Fix typos in unit tests" into nextgenv2 2016-03-15 01:15:06 +00:00
Yue Chen
66e6fb84de Merge "Speed up rd selection in OBMC experiment" into nextgenv2 2016-03-15 00:14:06 +00:00
hui su
b1a3871542 Fix typos in unit tests
Change-Id: Idff52b337ab2d494c0c26e0d2c71ab3ee8208691
2016-03-14 16:32:41 -07:00
Yue Chen
b5f8b70ce5 Speed up rd selection in OBMC experiment
Instead of testing all interpfilter-BMC/OBMC combinations, we choose
the best interpolation filter based on regular inter prediction.

Reduction in encoding time: ~10%
Drop in performance gain: 0.08% lowres, 0.04% midres

Change-Id: Ifc19097a918ac76b529db9af4c60e2c70e93f7ad
2016-03-14 15:36:44 -07:00
Jingning Han
a2c87a3dda Turn off 32x32 transform type selection
Temporarily disable transform type selection for 32x32 transform
block size. This speeds up the encoding process. For bus at CIF
150 frames, the encoding time goes from 896s -> 762s (11% faster).
The compression performance for lowres set is improved by 0.15%,
and -0.029% for hdres.

Change-Id: If239b272970eb302150bec13b8cf192fbe045332
2016-03-14 11:31:03 -07:00
Yunqing Wang
91b8236cdd Merge "Add high-precision sub-pixel search as a speed feature" into nextgenv2 2016-03-12 02:26:36 +00:00
Angie Chiang
46cd6ee9bd Merge "Fix sub8x8 interpolation full pixel bug" into nextgenv2 2016-03-12 01:45:27 +00:00
Yunqing Wang
e6e2d886d3 Add high-precision sub-pixel search as a speed feature
Using the up-sampled reference frames in sub-pixel motion search is
enabled as a speed feature for good-quality mode speed 0 and speed 1.

Change-Id: Ieb454bf8c646ddb99e87bd64c8e74dbd78d84a50
2016-03-11 16:32:11 -08:00
Debargha Mukherjee
e38e2ad86e Merge "Fix an overflow in highbitdepth loop restoration" into nextgenv2 2016-03-11 21:48:37 +00:00
Angie Chiang
c0f708c03a Merge "convolve8 sse2 test" into nextgenv2 2016-03-11 19:57:30 +00:00
Hui Su
f0e0a7e7e9 Merge "Complete (mostly) migration of palette mode" into nextgenv2 2016-03-11 19:52:41 +00:00
Hui Su
571072b84b Merge "Fix a bug in ext-intra experiment" into nextgenv2 2016-03-11 19:52:34 +00:00
Debargha Mukherjee
7ea59de69c Fix an overflow in highbitdepth loop restoration
Change-Id: Ie20cd35a4c96443c0de234d2cf097187a70ec8dd
2016-03-11 11:48:24 -08:00
Hui Su
f7fbc54bd1 Merge "Fix compiler warnings" into nextgenv2 2016-03-11 19:47:39 +00:00
hui su
8102aeb368 Fix a bug in ext-intra experiment
Change-Id: I6fab352eb1f7d9c5dc783a4d4d878b6b42838ca2
2016-03-11 10:23:51 -08:00
hui su
8fce4b8543 Fix compiler warnings
Change-Id: I00314ec296e8368f1239a556b3a55feac9cec7ae
2016-03-11 10:13:08 -08:00
Jingning Han
68d9a14e9f Merge "Enable hybrid 1-D/2-D transform coding for highbd setting" into nextgenv2 2016-03-11 18:09:11 +00:00
hui su
78b0bd0a0d Complete (mostly) migration of palette mode
Coding gain on screen_content is 12.2% (was 6.6%).

Some features such as frame-level color buffer, adaptive
entropy coding, are coming in future patches.

Change-Id: I2658cf5ec0cbb02cff685475759f3b68c9807697
2016-03-11 09:56:21 -08:00
Sarah Parker
09368fcf99 Filling in speed feature functions for ext tx search
Filled in prune one and prune two. Prune three is still
being experimented with.

Change-Id: Ic07f828c448e86cacb0369aa3a9a0feb2edae054
2016-03-10 14:08:13 -08:00
Debargha Mukherjee
ce4b35d510 Merge "Adds compound wedge prediction modes" into nextgenv2 2016-03-10 17:44:45 +00:00
Jingning Han
c453ae53d0 Enable hybrid 1-D/2-D transform coding for highbd setting
This commit enables the hybrid 1-D/2-D transform coding scheme for
high bit-depth setting. It improves the compression performance of
ext-tx experiment by 0.98% for lowres_all set.

Change-Id: Ic27f5037f2c36b095a93b9f15dbae34bdcdf00aa
2016-03-10 08:58:07 -08:00
Debargha Mukherjee
f34deab243 Adds compound wedge prediction modes
Incorporates wedge compound prediction modes.

Change-Id: Ie73b54b629105b9dcc5f3763be87f35b09ad2ec7
2016-03-10 07:19:54 -08:00
Jingning Han
ccc809f30c Merge "Fix an assertion condition in transform type search" into nextgenv2 2016-03-10 00:20:30 +00:00
Yi Luo
431e35913e Merge "Implemented DST 16x16 SSE2 intrinsics optimization" into nextgenv2 2016-03-09 22:27:44 +00:00
Jingning Han
240ae9729e Merge "Add horizontal and vertical scan order for 1-D transform" into nextgenv2 2016-03-09 20:47:06 +00:00
Angie Chiang
836e83c49f Fix sub8x8 interpolation full pixel bug
Change-Id: I5df744dc6b21ed9dbbf6ddf38004f2a9e88b7d00
2016-03-09 11:15:19 -08:00
Jingning Han
02734b6457 Fix an assertion condition in transform type search
Change-Id: I442475e559be2acdc1c2a3e5ca021b3de77adda5
2016-03-09 19:07:23 +00:00
Jingning Han
e0413094fb Add horizontal and vertical scan order for 1-D transform
This commit enables the 1-D transform to use Manhattan grid vertical
and horizontal scan order for transform coefficient entropy coding.

Enabled in inter prediction mode, the hybrid 1D/2D transform coding
scheme outperforms the 2D-DCT based coding system used in VP9 by
lowres_all  1.7%
hdres_all   1.4%

As one coding option, in addition to the existing 17 other transform
types in ext-tx experiment, the 1D/2D hybrid transform improves
the coding gains:
lowres_all  2.2% -> 3.0%

Change-Id: I9cefa9d9e38224546d0afd67feecd9f8d4a16ab0
2016-03-09 10:58:23 -08:00
hui su
954e560f9e Refactor entropy coding of transform size
No performance change.

Change-Id: If35125fed909d89235b303514f77a33183bb36b3
2016-03-08 16:46:00 -08:00
Yi Luo
50a164a1f6 Implemented DST 16x16 SSE2 intrinsics optimization
- Implemented fdst16_sse2(), fdst16_8col() against C version: fdst16().
- Turned on 7 DST related hybrid txfm types in vp10_fht16x16_sse2().
- Replaced vp10_fht10x10_c() with vp10_fht16x16_sse2() in
  fwd_txfm_16x16().
- Added vp10_fht16x16_sse2() unit test against C version:
  vp10_fht16x16_c() (--gtest_filter=*VP10Trans16x16*).
- Unit test passed.
- Speed improvement: 2.4%, 3.2%, 3.2%, for city_cif.y4m, garden_sif.y4m,
  and mobile_cif.y4m.

Change-Id: Ib30a67ce5d5964bef143d588d0f8fa438be8901f
2016-03-08 14:56:38 -08:00
Debargha Mukherjee
e3d001ea43 Merge "Adds an ext-tile config flag to host new tiling" into nextgenv2 2016-03-08 17:14:45 +00:00
Yaowu Xu
de661cdbc5 Merge "Fix several MSVC compiler warning/errors" into nextgenv2 2016-03-08 16:44:17 +00:00
Debargha Mukherjee
86088511bd Adds an ext-tile config flag to host new tiling
Change-Id: I6c3bf5545c42030b484a8aaf434b63bd409a0487
2016-03-08 07:18:07 -08:00
Yaowu Xu
28eb784e46 Fix several MSVC compiler warning/errors
Change-Id: Iccaacee9b7a66b016b5747a3902c236888ad4ba1
2016-03-07 17:00:03 -08:00
Yi Luo
cf9c95c32c Merge "Added vp10_fht8x8_sse2() unit test" into nextgenv2 2016-03-08 00:30:06 +00:00
Yue Chen
043b698a87 Merge "Calculate the distortion in pixel domain for sub8x8 rd selection" into nextgenv2 2016-03-08 00:13:46 +00:00
Yue Chen
ef8f7c1211 Calculate the distortion in pixel domain for sub8x8 rd selection
Pixel domain distortion calculation is enabled for the rd loop of
inter sub8x8 and intra 4x4 cases.

Coding gain: 0.124% derflr, 0.122% derfhd

Change-Id: I43b47fe81b4f5ccc1c66bc626bd310c413a1ed87
2016-03-07 14:49:22 -08:00
Yi Luo
6ab062124d Added vp10_fht8x8_sse2() unit test
- Inherited base class TransformTestBase to derived class VP10Trans8x8HT.
- Employed RunCoeffCheck() to test vp10_fht8x8_sse2() against C reference
  function vp10_fht8x8_c().
- fdst8_sse2() related seven hybrid transform cases are covered in this
  test.
- Test passed (4 test cases w/o EXT_TX; 16 test cases with EXT_TX).

Change-Id: Id9a9b308c707164a120d9ceb2c30e572026fb1d0
2016-03-07 14:25:07 -08:00
Alex Converse
76d4fdd391 Merge "ANS: Switch from PDFs to CDFs." into nextgenv2 2016-03-07 20:51:45 +00:00
Debargha Mukherjee
c7d77b32dc Merge "Extend convolution functions to 128x128 for ext-partition." into nextgenv2 2016-03-07 19:54:45 +00:00
Debargha Mukherjee
6adfba7c0f Merge "Make sharp filter 10 tap and makes sharp2 sharper" into nextgenv2 2016-03-07 19:51:42 +00:00
Yi Luo
42c08a3f52 Merge "Added vp10_fht4x4_sse2() unit test" into nextgenv2 2016-03-07 19:18:34 +00:00
Jingning Han
79c5a533cd Merge "Hybrid 1-D/2-D transform coding" into nextgenv2 2016-03-07 19:15:44 +00:00
Jingning Han
a8dc9694a4 Hybrid 1-D/2-D transform coding
This commit enables a hybrid 1-D/2-D transform coding scheme and
the accompanying entropy coding system. It currently uses hybrid
1-D/2-D DCT transform coding. It provides coding performance gains:

lowres_all  0.55%
hdres_all   0.43%

Change-Id: I2b30dcafd21eb2bb3371f6e854cbab440a4dfa78
2016-03-07 09:27:46 -08:00
Sarah Parker
df3849370a Merge "Adding speed feature interface for ext tx search" into nextgenv2 2016-03-07 16:32:55 +00:00
Debargha Mukherjee
1815961469 Merge "Add 128 pixel variance and SAD functions" into nextgenv2 2016-03-07 16:02:05 +00:00
Geza Lore
938b8dfc73 Extend convolution functions to 128x128 for ext-partition.
Change-Id: I7f7e26cd1d58eb38417200550c6fbf4108c9f942
2016-03-07 11:39:27 +00:00
Hui Su
5e5bef6c18 Merge "Cleanup in get_uv_tx_size" into nextgenv2 2016-03-05 07:42:26 +00:00
hui su
c3c1c6f405 Cleanup in get_uv_tx_size
Change-Id: Ia2aa7558f9f53da7dff970b30fe0a94958159ffb
2016-03-04 16:53:19 -08:00
Yue Chen
10cdeab42a Fix a bug in obmc prediction
For left side obmc, the input of the mask function is corrected as
the column coordinate.
Also, minor fixes for a compiler warning.

Change-Id: Ia981ef443d5b0285a93d73e5c7ab83f8c3a23464
2016-03-04 15:54:14 -08:00
Yi Luo
267f73a1f7 Added vp10_fht4x4_sse2() unit test
Inherited class TransformTestBase to derived class VP10Trans4x4HT.
Employed RunCoeffCheck() to test vp10_fht4x4_sse2() against
C reference vp10_fht4x4_c().
fdst4_sse2() related seven hybrid transform cases are covered
 in this test.
Wrote a header file for test base class. Some modification to
make sure the base class can be used for 8x8, 16x16, 32x32 cases.
All related tests passed.

Change-Id: I6b19a39d3ea30b657847781e78e73b829998a57a
2016-03-04 14:19:30 -08:00
Sarah Parker
2ca7d42e7e Adding speed feature interface for ext tx search
This sets up the interface for 3 speed features that progressively
eliminate a greater number of transforms in ext tx using
pre-trained support vector machines.
Each speed feature still needs to be implemented.

Change-Id: Ia508aeadc0cffdc080fb227f357a5d1dfbca08e2
2016-03-04 10:27:21 -08:00
Jingning Han
351ca31238 Merge "Apply mv precision check to reference mv candidate" into nextgenv2 2016-03-04 16:54:27 +00:00
Jingning Han
04cb49385e Merge "Properly restore transform block skip flag in RD search" into nextgenv2 2016-03-03 23:30:58 +00:00
Jingning Han
7174d637e8 Properly restore transform block skip flag in RD search
This commit fixes an encoding issue related to var-tx and ref-mv
experiments that causes the codec to use random values for transform
block skip flag.

Change-Id: I8daa6d6b88ea45b5bbeb81b43dd0eeff545c8e5a
2016-03-03 13:52:49 -08:00
Yi Luo
6231b6b077 Merge "Fixed a computation bug in fdct16_sse2()" into nextgenv2 2016-03-03 20:05:36 +00:00
Debargha Mukherjee
7d2618bc70 Make sharp filter 10 tap and makes sharp2 sharper
There is a ~0.1% gain.

Various experiments with different kinds of windowing functions to
follow.

Change-Id: I0787fddca53607ab39e53f919066839301938e68
2016-03-03 12:01:55 -08:00
Geza Lore
697bf5beff Add 128 pixel variance and SAD functions
Change-Id: I8fde245b32c9e586683a28aa6925da0b83850b39
2016-03-03 10:24:29 +00:00
Alex Converse
6bbbe31656 ANS: Switch from PDFs to CDFs.
Make the RANS implementation operate on cumulative distribution
functions rather than individual probability distribution functions.
CDFs have shown themselves more flexible to work with.

Reduces decoding memory usage from scaling O(num_distributions *
symbol_resolution) to O(num_distributions).

No bitstream change. This is purely an implementation change.
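
As a rough illustration of the PDF-to-CDF idea, the sketch below accumulates
per-symbol frequencies into a cumulative table and finds a symbol by scanning
that table. The function names and the 16-bit table type are assumptions made
for this example; they are not taken from the ANS code in this change.

```c
#include <assert.h>
#include <stdint.h>

/* Build a cumulative table: cdf[i] = pdf[0] + ... + pdf[i].
 * Hypothetical helper, for illustration only. */
static void pdf_to_cdf(const uint16_t *pdf, int num_symbols, uint16_t *cdf) {
  uint16_t running = 0;
  assert(num_symbols > 0);
  for (int i = 0; i < num_symbols; ++i) {
    running = (uint16_t)(running + pdf[i]);
    cdf[i] = running;
  }
}

/* Return the smallest symbol whose cumulative count exceeds `slot`.
 * Hypothetical helper, for illustration only. */
static int cdf_lookup(const uint16_t *cdf, int num_symbols, uint16_t slot) {
  assert(slot < cdf[num_symbols - 1]);
  for (int i = 0; i < num_symbols; ++i) {
    if (slot < cdf[i]) return i;
  }
  return num_symbols - 1; /* not reached when the precondition holds */
}
```

With frequencies {64, 128, 32, 32} the table becomes {64, 192, 224, 256}, so
slots 0..63 map to symbol 0, 64..191 to symbol 1, and so on. The lookup only
needs the per-distribution table, not one entry per possible slot value.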

Change-Id: I4e18d3a0a3d37a36a61487c3d778f9d088b0b374
2016-03-03 09:32:54 +00:00
Jingning Han
13fb7c1b88 Apply mv precision check to reference mv candidate
This allows the codec to use effective motion vector as the candidate
to produce the reference motion vector list.

Change-Id: Ib90be705fe28200c13376d6d7741800a61f13043
2016-03-02 20:14:07 -08:00
Yi Luo
68d6a5073a Fixed a computation bug in fdct16_sse2()
fdct16_sse2() was not bit-exact with the C reference, fdct16().
The inconsistency was found by writing a unit test for
vp10_fht16x16_sse2(). Since the unit test needs a pending
change to the inherited base class, I will commit this unit
test after making a header file for this base class.
Passed the uncommitted unit test: vp10_fht16x16_test.cc.

Change-Id: If2b617883c633a3ea90c19e1d018240c8007102b
2016-03-02 15:20:12 -08:00
hui su
ebc6e058db Fix a bug in vp10_predict_intra_block
Avoid mistakenly setting "have_right" as 0 for UV channel in blocks
of width no larger than 8.

Change-Id: Ic2b031e32f967a23fd118a052bf9edd7d5a3abe6
2016-03-02 11:22:09 -08:00
Debargha Mukherjee
339ef0ce7a Merge "Adds masked variance and sad functions for wedge" into nextgenv2 2016-03-02 03:28:39 +00:00
Debargha Mukherjee
1d69ceee5c Adds masked variance and sad functions for wedge
Adds masked variance and sad functions needed for wedge
prediction modes to come.

Change-Id: I25b231bbc345e6a494316abb0a7d5cd5586a3a54
2016-03-01 17:28:56 -08:00
Yaowu Xu
9425616615 Merge "Fix a unused function warning with var_tx on" into nextgenv2 2016-03-02 01:11:17 +00:00
Hui Su
90fe1cffbf Merge "Fix a couple of minor bugs in vp10_has_right and vp10_has_bottom" into nextgenv2 2016-03-02 00:33:38 +00:00
Yunqing Wang
84f982080a Minor fix in header files
Move functions to be included in extern "C".

Change-Id: If57fa5eb7955763cf99e6839dde4d7221fad75ea
2016-03-01 13:16:03 -08:00
Yaowu Xu
3d89d059dc Merge "Fix an overflow issue for HBD" into nextgenv2 2016-03-01 19:22:48 +00:00
Yaowu Xu
0cfa89c0eb Fix a unused function warning with var_tx on
Change-Id: I1e65d7e1586d8c7c65bb150b1a928cf3adf97366
2016-03-01 11:05:48 -08:00
hui su
935a837c01 Fix a couple of minor bugs in vp10_has_right and vp10_has_bottom
The above-right and left-bottom pixels were sometimes not used even
though they are available. Results on lowres_all and hdres_all are
mostly neutral.

Change-Id: Ic13533dd498442ad5592b83bb5fabf053cc8e8f0
2016-03-01 10:09:04 -08:00
Yaowu Xu
5c613ea881 Fix an overflow issue for HBD
The sum of squared value of a block can overflow 32bit, this commit
changes to use int64_t to avoid the overflow issue.
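
A minimal sketch of the overflow, assuming 12-bit samples stored as uint16_t
and a 64x64 block (the function name is hypothetical and not from this
change): 4096 samples of value 4095 give 4096 * 4095 * 4095 = 68685926400,
which does not fit in 32 bits, so the accumulator has to be 64 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of squared samples, accumulated in 64 bits so that large
 * high-bit-depth blocks cannot wrap around. Hypothetical helper. */
static int64_t sum_squares_i64(const uint16_t *src, size_t count) {
  int64_t sum = 0;
  assert(src != NULL || count == 0);
  for (size_t i = 0; i < count; ++i) {
    const int64_t v = src[i]; /* widen before multiplying */
    sum += v * v;
  }
  return sum;
}
```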

Change-Id: I78fcd6999634f186f86d649cfce85d97a993d040
2016-03-01 09:44:04 -08:00
Angie Chiang
7667733991 Update obmc counts in multithread mode
Change-Id: I0743e00dad9d36a87870c480922f5ae904bd5c9d
2016-02-29 17:09:02 -08:00
Yunqing Wang
342a368fd4 Do sub-pixel motion search in up-sampled reference frames
Up-sampled the reference frames by a factor of 8 in each dimension
using the 8-tap interpolation filter. In sub-pixel motion search, use
the up-sampled reference frames to find the best matching blocks. This
largely improved the motion search precision and thus improved
the compression quality. There was no change on the decoder side.

Borg test and speed test results:
1. On derflr set,
Overall PSNR gain: 1.306%, and SSIM gain: 1.512%.
Average speed loss on derf set was 6.0%.
2. On stdhd set,
Overall PSNR gain: 0.754%, and SSIM gain: 0.814%.
On hevchd set,
Overall PSNR gain: 0.465%, and SSIM gain: 0.527%.
Speed loss on HD clips was 3.5%.

Change-Id: I300ebaafff57e88914f3dedc8784cb21d316b04f
2016-02-29 12:14:47 -08:00
Debargha Mukherjee
db084506d8 A build fix and some other cosmetic changes
Fixes some issues introduced by a merge of two patches.
Also decouples the temporal interpolation filter from the switchable
filters for now for ease of experimentation with both separately.

Change-Id: If1c7c08adf00e0cf818fe8d0d3656c26ea65eb32
2016-02-29 10:20:52 -08:00
Debargha Mukherjee
48589e8d07 Merge "Some refactoring and cleanups of interp filter" into nextgenv2 2016-02-29 15:55:48 +00:00
Hui Su
95428a5926 Merge "Fix compiler warnings" into nextgenv2 2016-02-27 05:04:02 +00:00
Jingning Han
0fc0c1a32d Merge "Enable improved temporal filter in ext-interp experiment" into nextgenv2 2016-02-27 01:22:15 +00:00
Jingning Han
dca86af8f4 Merge "Unify frame border extension operation" into nextgenv2 2016-02-27 01:22:03 +00:00
Debargha Mukherjee
bab2912b5e Some refactoring and cleanups of interp filter
Includes various cosmetic changes and refactoring including
naming the sharp filters differently (since they are no longer
8-tap).

Change-Id: Ida5a19ca0daa9f6a64a6734394c685b2a4a2564a
2016-02-26 15:42:49 -08:00
Jingning Han
95d35a4a0b Enable improved temporal filter in ext-interp experiment
It improves the coding performance by 0.3%.

Change-Id: I9703abd705ceacdf9e7424428e5120253cadcc18
2016-02-26 21:59:51 +00:00
Jingning Han
d1d11fc6dd Unify frame border extension operation
This commit unifies the encoder and decoder border extension and
motion compensated prediction process. Remove the decoder specific
flow to simplify the development flow.

Change-Id: I9c43bbe6d7c017e6da2db6a62c5bf3d0af7ccfce
2016-02-26 13:58:53 -08:00
hui su
4aeabf1b0d Fix compiler warnings
Change-Id: Id7240260cec471a3f8d0986b9c8df06efda925f9
2016-02-26 13:52:49 -08:00
Geza Lore
7ded038af5 Port interintra experiment from nextgen.
The interintra experiment, which combines an inter prediction and an
intra prediction, has been ported from the nextgen branch. The
experiment is merged into ext_inter, so there is no separate configure
option to enable it.

Change-Id: I0cc20cefd29e9b77ab7bbbb709abc11512320325
2016-02-26 13:01:51 -08:00
Debargha Mukherjee
3287f5519e Merge "Hooks to use 32x32 masked transforms for ext-tx" into nextgenv2 2016-02-26 20:54:37 +00:00
Yi Luo
b347c3c5e5 Merge "Implemented DST 8x8 with SSE2 intrinsics." into nextgenv2 2016-02-26 19:10:00 +00:00
Jingning Han
2b7196a8bb Merge "Use sharp filter for alter reference frame generation" into nextgenv2 2016-02-26 16:24:59 +00:00
Jingning Han
83ecafbd95 Merge "Enable context based motion vector entropy coding" into nextgenv2 2016-02-26 16:24:49 +00:00
Yaowu Xu
a570cefcf8 Merge "Extend vpxssim to handle more HBD combinations" into nextgenv2 2016-02-26 15:57:40 +00:00
Jingning Han
72eda13e50 Use sharp filter for alter reference frame generation
This commit uses the 12-tap sharp filter to generate the alternate
reference frame. It improves the compression performance by
derf    0.45%
hevcmr  0.35%
stdhd   0.79%

No encoding time change is observed.

Change-Id: Ia5dc26d5aae6b9b0cb782e5a28dc5066eeeb2ec8
2016-02-25 14:20:38 -08:00
Hui Su
1226e734a0 Merge "Add test for screen content coding tools in end to end test" into nextgenv2 2016-02-25 03:47:03 +00:00
Angie Chiang
8878fa4f9a convolve8 sse2 test
This experiment shows that when the frame size is 64x64, the speeds of
vpx_highbd_convolve8_sse2 and vpx_convolve8_sse2 are similar.
However, when the frame size becomes 1024x1024,
vpx_highbd_convolve8_sse2 is around 50% slower than vpx_convolve8_sse2.
We think the bottleneck is memory IO.

VP10ConvolveTest.vpx_highbd_convolve8_sse2_speed_8_64
VP10ConvolveTest.vpx_highbd_convolve8_sse2_speed_8_64 (17 ms)
VP10ConvolveTest.vpx_highbd_convolve8_sse2_speed_16_64
VP10ConvolveTest.vpx_highbd_convolve8_sse2_speed_16_64 (42 ms)
VP10ConvolveTest.vpx_highbd_convolve8_sse2_speed_32_64
VP10ConvolveTest.vpx_highbd_convolve8_sse2_speed_32_64 (139 ms)
VP10ConvolveTest.vpx_highbd_convolve8_sse2_speed_64_64
VP10ConvolveTest.vpx_highbd_convolve8_sse2_speed_64_64 (499 ms)

VP10ConvolveTest.vpx_convolve8_sse2_speed_l_8_64
VP10ConvolveTest.vpx_convolve8_sse2_speed_l_8_64 (16 ms)
VP10ConvolveTest.vpx_convolve8_sse2_speed_l_16_64
VP10ConvolveTest.vpx_convolve8_sse2_speed_l_16_64 (40 ms)
VP10ConvolveTest.vpx_convolve8_sse2_speed_l_32_64
VP10ConvolveTest.vpx_convolve8_sse2_speed_l_32_64 (130 ms)
VP10ConvolveTest.vpx_convolve8_sse2_speed_l_64_64
VP10ConvolveTest.vpx_convolve8_sse2_speed_l_64_64 (485 ms)

VP10ConvolveTest.vpx_highbd_convolve8_sse2_speed_8_1024
VP10ConvolveTest.vpx_highbd_convolve8_sse2_speed_8_1024 (32 ms)
VP10ConvolveTest.vpx_highbd_convolve8_sse2_speed_16_1024
VP10ConvolveTest.vpx_highbd_convolve8_sse2_speed_16_1024 (61 ms)
VP10ConvolveTest.vpx_highbd_convolve8_sse2_speed_32_1024
VP10ConvolveTest.vpx_highbd_convolve8_sse2_speed_32_1024 (196 ms)
VP10ConvolveTest.vpx_highbd_convolve8_sse2_speed_64_1024
VP10ConvolveTest.vpx_highbd_convolve8_sse2_speed_64_1024 (694 ms)

VP10ConvolveTest.vpx_convolve8_sse2_speed_l_8_1024
VP10ConvolveTest.vpx_convolve8_sse2_speed_l_8_1024 (21 ms)
VP10ConvolveTest.vpx_convolve8_sse2_speed_l_16_1024
VP10ConvolveTest.vpx_convolve8_sse2_speed_l_16_1024 (44 ms)
VP10ConvolveTest.vpx_convolve8_sse2_speed_l_32_1024
VP10ConvolveTest.vpx_convolve8_sse2_speed_l_32_1024 (138 ms)
VP10ConvolveTest.vpx_convolve8_sse2_speed_l_64_1024
VP10ConvolveTest.vpx_convolve8_sse2_speed_l_64_1024 (491 ms)

Change-Id: I3131a031e0380e8eae748cfcccc6cbb961d05943
2016-02-24 17:01:20 -08:00
hui su
827e1b3fef Add test for screen content coding tools in end to end test
Test screen content coding tools (currently only palette) at
speed 1 and two-pass.

Change-Id: I3c467aee1cd9c366c65a3abfdccfafa0416b59b7
2016-02-24 15:27:07 -08:00
Yi Luo
0353f596e9 Implemented DST 8x8 with SSE2 intrinsics.
Implemented fdst8_sse2() function against C version: fdst8().
Added seven DST related hybrid transform types in vp10_fht8x8_sse2().
Replaced vp10_fht8x8_c() with vp10_fht8x8_sse2() in fwd_txfm_8x8().
Speedup: 18.1%, 11.5%, 22.0% based on speed test from
city_cif.y4m, garden_sif.y4m, mobile_cif.y4m.

Change-Id: Ia4aa1ea44c7a33e494f64ce843037f8703f975e3
2016-02-24 14:58:01 -08:00
Debargha Mukherjee
da2d4a7afc Hooks to use 32x32 masked transforms for ext-tx
Adds hooks to use 32x32 ext-tx. Also adds scan orders for the masked
transforms for 32x32.
Make macro USE_MSKTX_FOR_32X32 1 in blockd.h to support 32x32 masked
transforms for ext-tx.

Change-Id: Ie6564830266651fcafae2d536c274dafd664ce17
2016-02-24 13:08:37 -08:00
Debargha Mukherjee
389efb289e Adds an utility macro ROUNDZ_POWER_OF_TWO
This macro works for the shift parameter being 0.
The ROUND_POWER_OF_TWO macro does not.
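
A minimal sketch of why the shift-by-zero case needs care, assuming the
existing macro adds half of 2^n before shifting (the macro bodies below are
assumptions for illustration, not copied from the tree). With n == 0 the
inner shift count would be -1, which is undefined behaviour in C, so the
zero-safe variant returns the value unchanged in that case.

```c
#include <assert.h>

/* Assumed form of the rounding macro: add half of 2^n, then shift.
 * For n == 0 the inner shift count is -1, which C leaves undefined. */
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))

/* Assumed form of the zero-safe variant: nothing to round when n == 0. */
#define ROUNDZ_POWER_OF_TWO(value, n) \
  ((n) ? ROUND_POWER_OF_TWO(value, n) : (value))

/* Hypothetical wrapper so the macro can be exercised with a runtime shift. */
static int round_shift(int value, int shift) {
  assert(shift >= 0 && shift < 31);
  return ROUNDZ_POWER_OF_TWO(value, shift);
}
```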

Change-Id: I8434d2933892e09bbc0d2dafc934d0c3637df347
2016-02-24 12:35:29 -08:00
Hui Su
aa703adb46 Merge "Fix some compiler warnings." into nextgenv2 2016-02-24 20:28:37 +00:00
Debargha Mukherjee
ad574d4008 Merge "Some fixes in reconintra" into nextgenv2 2016-02-24 20:25:25 +00:00
hui su
8537826eb4 Fix some compiler warnings.
"taking the absolute value of unsigned type 'unsigned int' has no effect"

Change-Id: Iea1f67c2a3171a98ca89d5dc7192a5508d086c16
2016-02-24 11:17:33 -08:00
Yaowu Xu
aa6c754635 Merge remote-tracking branch 'webm/master' into nextgenv2 2016-02-24 10:53:17 -08:00
Debargha Mukherjee
3ef0db078e Some fixes in reconintra
Change-Id: I0b0fa7c9853ce12d39ee21829686b308154b2c61
2016-02-24 10:49:35 -08:00
Debargha Mukherjee
557cb9a879 Merge "Rename above and left offset variables." into nextgenv2 2016-02-24 18:48:22 +00:00
Debargha Mukherjee
c1e51beba6 Merge "Experiment to use image domain dist in baseline." into nextgenv2 2016-02-24 18:30:50 +00:00
Geza Lore
44dba01f3e Rename above and left offset variables.
These variable names were legacy from a previous version of this
function and in the current version they were confusingly backwards.

Change-Id: I4f6c1628f296fd5b650fd9c5e2d56d7daf66a3f6
2016-02-24 17:39:48 +00:00
Jingning Han
47bc2a5741 Enable context based motion vector entropy coding
This commit enables a context based motion vector entropy coding
conditioned on dynamic reference motion vector list. This (along with
the previous CL) improves the coding gains due to dynamic motion
vector referencing based entropy coding:
derf   0.1%
hevcmr 0.2%
stdhd  0.7%
hevchr 0.4%

No encoding time change was observed.

Change-Id: I179c723844079195f6952a12582996a3ca9e9914
2016-02-24 09:02:32 -08:00
Yue Chen
02e734168c Merge "Optimizing obmc rd decision by checking the real rd cost" into nextgenv2 2016-02-23 23:05:06 +00:00
Yue Chen
a614262edb Optimizing obmc rd decision by checking the real rd cost
Instead of using model_rd_for_sb() to estimate the cost and make the
decision on bmc/obmc, we use super_block_yrd/uvrd() to calculate and
compare the real rd costs of bmc and obmc.

Average bit-rate reduction(%) of obmc experiment:
derflr/derfhd/hevcmr/hevchd
2.353/TBD/TBD/TBD
Before the optimization, the coding gain was:
1.582/1.109/1.600/1.164

Note: there is still an unidentified bug, because the performance at
low bit rates drops considerably compared to the previous version.

Change-Id: I8dbee04a272190f10516a3953c1ae690f8136766
2016-02-23 14:16:12 -08:00
hui su
875aa1c58c Fix palette mode
It was broken by commit 3787b17439d00d3684646e97c18d231860dae8b9

Change-Id: I2be0f6243e8716b9ca4c0321de381419108d1abe
2016-02-23 12:01:23 -08:00
Alex Converse
05f33142f5 Merge "Port "Better workaround for Bug 1089." to vp10 (nextgenv2)." into nextgenv2 2016-02-23 17:53:57 +00:00
Geza Lore
3c4b56c4dd Experiment to use image domain dist in baseline.
Change-Id: Ib29f510289716b5ab5c7d74d32a450c190308a83
2016-02-23 09:35:40 -08:00
Yaowu Xu
272dbaa13f Merge "Cleanup psnr.h" into nextgenv2 2016-02-23 17:13:34 +00:00
Angie Chiang
5340d1424d Merge "Merge 12sharp filter into ext-interp" into nextgenv2 2016-02-23 01:26:23 +00:00
Yaowu Xu
ec6b8d8b76 Merge "Add shift stage in FASTSSIM computation" into nextgenv2 2016-02-23 00:43:18 +00:00
Angie Chiang
e4af6a42a7 Merge 12sharp filter into ext-interp
Change-Id: I7df48e7f3b57f212798ef4be86f28aed928fc3e0
2016-02-22 16:26:38 -08:00
Yaowu Xu
eeaf8e6b6c Extend vpxssim to handle more HBD combinations
Change-Id: I38426d946b74c9090a265d34b89e2db6693927c2
2016-02-22 16:09:08 -08:00
Yaowu Xu
38cfc45e07 Cleanup psnr.h
Change-Id: Id026e72ee655ee5bd645a89e378da0d462be367d
2016-02-22 15:37:40 -08:00
Angie Chiang
a90f8b8c8f Merge "sync dec/enc mv clamp behavior in prediction" into nextgenv2 2016-02-22 23:30:10 +00:00
Yaowu Xu
d1c5cd4a30 Add shift stage in FASTSSIM computation
This commit adds a shift stage for FASTSSIM computation when the source
bit depth is different from the working bit depth, to make sure metric
results are calculated at a bit_depth consistent with the source.

Change-Id: I997799634076ef7b00fd051710544681ed536185
2016-02-22 14:58:10 -08:00
Yaowu Xu
be5d1693fb Merge "Add shift stage for PSNRHVS computation" into nextgenv2 2016-02-22 21:56:00 +00:00
Angie Chiang
e9336e4dfc sync dec/enc mv clamp behavior in prediction
Change-Id: I12ce1da18b3db7bd2f36e0424a264b3c36fbed61
2016-02-22 11:36:03 -08:00
Angie Chiang
94493e606d Merge "Fix 12 TAP convolution bug" into nextgenv2 2016-02-22 19:03:06 +00:00
Yaowu Xu
af3a8381ef Merge "Move psnrhvs function declaration to psnr.h" into nextgenv2 2016-02-22 18:46:39 +00:00
Yaowu Xu
9bce2f76fb Merge "Extend HBDMetricTest" into nextgenv2 2016-02-22 18:46:26 +00:00
Yaowu Xu
195bf52bca Add shift stage for PSNRHVS computation
This commit adds the ability to shift down the working buffer when
the source bit_depth is different from the working bit_depth. It does
so by shifting down to be consistent with the source bit_depth.

Change-Id: Idfdbfc614d73fe445d62e35e642cc7d75e9dc4ff
2016-02-22 10:22:42 -08:00
Alex Converse
9fce131de8 Port "Better workaround for Bug 1089." to vp10 (nextgenv2).
Don't initialize first pass costs for a number of symbols where first
pass probabilities aren't initialized.

As a side effect, an illegal read in the ANS experiment is fixed.

https://bugs.chromium.org/p/webm/issues/detail?id=1089

Change-Id: I97438c357bd88f52f5a15c697031cf0c3cc8f510
2016-02-22 10:19:03 -08:00
Jingning Han
404c512786 Merge "Unify motion vector cost system" into nextgenv2 2016-02-22 17:38:00 +00:00
Jingning Han
a10814e11e Merge "Account context based prob model for motion vector cost estimate" into nextgenv2 2016-02-22 17:37:42 +00:00
Jingning Han
1f984a5a63 Merge "Vectorize motion vector probability models" into nextgenv2 2016-02-22 17:37:29 +00:00
Jingning Han
682dad0ec7 Merge "Store predicted motion vectors" into nextgenv2 2016-02-22 17:14:05 +00:00
Yaowu Xu
6e695da2d9 Move psnrhvs function declaration to psnr.h
From "ssim.h"

Change-Id: Ie53378794149ef8a844b4eb47ad4f08579de4b60
2016-02-22 08:38:49 -08:00
Jingning Han
fec5988657 Unify motion vector cost system
This commit unifies the motion vector cost buffers for full pixel
and sub-pixel motion search. The new motion vector coding system
provides 0.5% coding gains for 720p and above sequences and 0.2%
for lower resolution sets.

Change-Id: I927ec81eadc39d11a3c12b375221a1ddd2e8bf24
2016-02-21 22:21:28 -08:00
Yaowu Xu
f6a7b17a35 Extend HBDMetricTest
This commit extends the HBDMetricTests to handle testing for metric
computation where input source depth is different from working bit
depth.

Change-Id: I5d11101cc9603a3fd09e8439816bb982a0f1b654
2016-02-20 21:19:18 -08:00
Angie Chiang
1e403064b9 Fix 12 TAP convolution bug
Previously, we did 12-tap interpolation even when there was no sub-pixel
offset. This could cause a bug because the decoder doesn't extend the
border when there is no sub-pixel offset. In this situation, if we still
do interpolation, we will access the border extension, which doesn't
exist, and cause a memory error.

Change-Id: I55b879722f0a10c5d13261bd9617a75c826a2418
2016-02-19 19:31:38 -08:00
Jingning Han
03c01bc3c0 Account context based prob model for motion vector cost estimate
This commit accounts for the context based probability model for
motion vector cost estimate in rate-distortion optimization.

Change-Id: Ia068a9395dcb4ecc348f128b17b8d24734660b83
2016-02-19 16:32:51 -08:00
Yi Luo
961668c91c Merge "Initial SSE2 function fdst4_sse2()." into nextgenv2 2016-02-20 00:31:31 +00:00
Jingning Han
df59bb8986 Vectorize motion vector probability models
This commit converts the scalar motion vector probability model
into vector format for later precise estimate.

Change-Id: I7008d047ecc1b9577aa8442b4db2df312be869dc
2016-02-19 16:20:41 -08:00
Jingning Han
876c8b03e6 Store predicted motion vectors
Change-Id: I51307a217eeba14dbdaa2522be474530316a4faa
2016-02-19 14:25:34 -08:00
Yi Luo
5456aee6fc Initial SSE2 function fdst4_sse2().
Applied DST sse2 to 4x4 transform.

Fixed DST coefficient packing to satisfy 4x4 transpose requirement.

Change-Id: I9164714c77049523dbbc9e145ebb10d7911fba9d
2016-02-19 11:13:37 -08:00
Yaowu Xu
5712456bd9 Merge "Properly normalize HBD sse computation" into nextgenv2 2016-02-19 02:26:47 +00:00
Yaowu Xu
0c0f3efdeb Properly normalize HBD sse computation
This fixes a bug in HBD sum of squared error computation introduced
in  #abd00505d1c658cc106bad51369197270a299f92.

Change-Id: I9d4e8627eb8ea491bac44794c40c7f1e6ba135dc
2016-02-18 15:42:19 -08:00
Hui Su
286480de9b Merge "Speed-up for ext-intra" into nextgenv2 2016-02-18 23:12:45 +00:00
Debargha Mukherjee
9a019bce84 Merge "cost_coeff speed improvements" into nextgenv2 2016-02-18 19:31:18 +00:00
hui su
c4b69eb0eb Speed-up for ext-intra
-Avoid unnecessary calculations
-Use SIMD when possible

Encoder is about 5% faster with the extra intra prediction angles
enabled.

Change-Id: I131056befe327cedab217ad4a40d5f2a11318acc
2016-02-18 10:50:57 -08:00
Julia Robson
c6eba0b47a cost_coeff speed improvements
Preliminary tests indicated that these changes make cost_coeffs
approximately 20% faster which is a 2% improvement overall

Change-Id: Iaf013ba75884415cd824e98349f654ffb1c3ef33
2016-02-18 13:18:39 +00:00
Yaowu Xu
acc4addb60 Merge "Add tests for Highbitdepth PSNR metric computations" into nextgenv2 2016-02-18 01:01:00 +00:00
Yaowu Xu
7823fbb45c Merge "Move PSNR related functions into vpx_dsp/psnr.c" into nextgenv2 2016-02-18 01:00:54 +00:00
Yaowu Xu
9fb593d0fc Add tests for Highbitdepth PSNR metric computations
Change-Id: I07324155f73bbdbe25bb7a7ccd587ebf9010ac7a
2016-02-17 21:28:22 +00:00
Yaowu Xu
7538501ad1 Move PSNR related functions into vpx_dsp/psnr.c
This puts all metric computation in one place and also gets
rid of duplicate code between vp9 and vp10.

Change-Id: I24a2707d183a2419cd18a8343010adae185ffcd4
2016-02-17 13:05:34 -08:00
Jingning Han
dd1391a005 Merge "Fix enc/dec mismatch in dynamic mv referenceing experiment" into nextgenv2 2016-02-17 19:03:14 +00:00
Debargha Mukherjee
35d9eadf08 Merge "Extends ext-tx to support 32x32 masked transforms" into nextgenv2 2016-02-17 18:33:10 +00:00
Debargha Mukherjee
7485498773 Extends ext-tx to support 32x32 masked transforms
Adds new 32x32 masked 1-d transforms that combine 1-D length-16
DCT with length-16 identity transforms.

To be continued in subsequent patches.

Change-Id: I0b4f66492d44c079b3c3b531ba48a97201de1484
2016-02-17 09:31:34 -08:00
Jingning Han
95247be0bf Fix enc/dec mismatch in dynamic mv referenceing experiment
This commit fixes an enc/dec mismatch in the dynamic motion vector
referencing experiment introduced in 837ef00.

Change-Id: I9fbe116fce118a80ef0f96bf41ce1f802547c2ee
2016-02-17 09:29:54 -08:00
Yaowu Xu
6ed7f7a516 Merge branch 'master' into nextgenv2 2016-02-17 07:23:58 -08:00
Yue Chen
907f88c4e6 Fixing a bug in obmc prediction in the rd loop
This bug made the rd loop use one-side obmc (compound of the current
predictor and the predictors of the left mi's, while the above ones
are ignored by mistake) to determine whether to use obmc. This fix
improved the compression performance by ~0.6% on different test sets.

Coding gain (%) of obmc experiment on derflr/derfhd/hevcmr/hevchd:
1.568/TBD/1.628/TBD

Change-Id: I43b239bedf9a8eebfd02315b1b036e140a998140
2016-02-16 14:43:45 -08:00
Debargha Mukherjee
f9c25498eb Merge "Tweak encoding flags for supertx." into nextgenv2 2016-02-16 22:10:30 +00:00
Debargha Mukherjee
907544a328 Merge "Code cleanup: remove redundant DST1 code" into nextgenv2 2016-02-16 19:43:25 +00:00
Geza Lore
c582aacb7a Tweak encoding flags for supertx.
Change-Id: I46f69d3a176897294d33c3f6d30b23c75b6267a8
2016-02-16 11:24:17 -08:00
Debargha Mukherjee
1badceada8 Code cleanup: remove redundant DST1 code
Removes the USE_DST2 flag that was on by default. DST2 performs
slightly better than DST1 and is faster to compute.

Change-Id: Ifb788f3f0a0e1995d7625230cec144b876f01206
2016-02-16 10:36:02 -08:00
Hui Su
0107373234 Merge "Add a speed feature to skip transform type selection" into nextgenv2 2016-02-16 18:31:18 +00:00
Debargha Mukherjee
8cc04ef505 Merge "Further supertx costing fixes." into nextgenv2 2016-02-16 18:02:24 +00:00
Debargha Mukherjee
6f49446dfa Merge "Fix double counting of compound reference bit cost." into nextgenv2 2016-02-16 17:55:49 +00:00
Geza Lore
abd00505d1 Add optimized vpx_sum_squares_2d_i16 for vp10.
Using this we can eliminate a large number of calls to predict intra,
and it is also faster than most of the variance functions it replaces.
This is an equivalence transform so coding performance is unaffected.

Encoder speedup is approx 7% when var_tx, super_tx and ext_tx are all
enabled.

Change-Id: I0d4c83afc4a97a1826f3abd864bd68e41bb504fb
2016-02-15 16:54:52 +00:00
Yue Chen
d1cad9c3f5 Overlapped block motion compensation experiment
In this experiment, an obmc inter prediction mode is enabled for
>= 8X8 inter blocks. When the obmc flag is on, the regular block-
based motion compensation will be refined by using predictors of
the above and left blocks.
Fixed some compatibility issues with vp9_highbitdepth, supertx,
ref_mv, and ext_interp.

Coding gain (%) on derflr/hevcmr/hevchd
OBMC:
1.047/1.022/0.708
OBMC + SUPERTX:
1.652/1.616/1.137
SUPERTX:
0.862/0.779/0.630

Change-Id: I5d8d3c4729c6d3ccb03ec7034563107893103b7f
2016-02-12 13:36:25 -08:00
Alex Converse
a45d5d3f94 Merge "Port switch to 9-bit rate cost to vp10." into nextgenv2 2016-02-12 21:15:35 +00:00
Geza Lore
599003969d Further supertx costing fixes.
Change-Id: I85897168c7fda3fd79daaba985b6607fd7df476b
2016-02-12 11:47:26 -08:00
Jingning Han
18eaf8e6fc Merge "Refactor vp10_drl_idx concept" into nextgenv2 2016-02-12 19:39:44 +00:00
Debargha Mukherjee
937c97faed Merge "Adding loop wiener restoration" into nextgenv2 2016-02-12 19:32:18 +00:00
Debargha Mukherjee
8b0a5b8718 Adding loop wiener restoration
Adds a wiener filter based restoration scheme in loop which can
be optionally selected instead of the bilateral filter.

The LMMSE filter generated per frame is a separable symmetric 7
tap filter. Three parameters for each of horizontal and vertical
filters are transmitted in the bitstream. The fourth parameter
is obtained assuming the sum is normalized to 1.
Also integerizes the bilateral filters, along with other
refactoring necessary in order to support the new switchable
restoration type framework.
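
The tap derivation described above can be sketched as follows. This assumes
the three transmitted parameters are the outer taps and the derived fourth is
the centre tap, and it assumes a 7-bit fixed-point scale where 1.0 == 128.
The names and the precision are hypothetical and not taken from this change.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FILTER_BITS 7 /* assumed precision: 1.0 is represented as 128 */
#define SKETCH_WIENER_TAPS 7

/* Expand three coded taps into a symmetric 7-tap filter whose taps sum
 * to 1.0 in fixed point. Hypothetical helper, for illustration only. */
static void expand_wiener_taps(const int16_t coded[3],
                               int16_t taps[SKETCH_WIENER_TAPS]) {
  const int unity = 1 << SKETCH_FILTER_BITS;
  int sum = 0;
  taps[0] = taps[6] = coded[0];
  taps[1] = taps[5] = coded[1];
  taps[2] = taps[4] = coded[2];
  /* Centre tap absorbs whatever is left so the filter is normalized. */
  taps[3] = (int16_t)(unity - 2 * (coded[0] + coded[1] + coded[2]));
  for (int i = 0; i < SKETCH_WIENER_TAPS; ++i) sum += taps[i];
  assert(sum == unity);
}
```

For example, coded taps {3, -7, 15} give a centre tap of 128 - 2 * 11 = 106.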

derflr: -0.75% BDRATE

[A lot of videos still prefer bilateral, however since many frames
now use the simpler separable filter, the decoding speed is
much better].

Further experiments to follow, related to replacing the bilateral.

Change-Id: I6b1879983d50aab7ec5647340b6aef6b22299636
2016-02-12 09:56:24 -08:00
Yaowu Xu
18b6e9a36f Merge branch 'masterbase' into nextgenv2
Conflicts:
	vp10/encoder/rdopt.c

Change-Id: If720e7f9810378d24bf9fd51a95fd29c3bc5d774
2016-02-12 09:19:30 -08:00
Yaowu Xu
13efa8a089 Merge "Refactor internal stats code" into nextgenv2 2016-02-12 16:37:16 +00:00
Yaowu Xu
1a69cb286f Refactor internal stats code
Also removed the use of postprocessing in computing internal stats.

Change-Id: Ib8fdbdfe7b7ca05cd1a034a373aa7762fa44323c
2016-02-12 07:31:29 -08:00
Yaowu Xu
89a1ab395c Merge "Enable computing PSNRHVS for hbd build" into nextgenv2 2016-02-12 15:24:28 +00:00
Jingning Han
a39e83d743 Refactor vp10_drl_idx concept
Remove the implicit assumption on offsetting the index by 1.

Change-Id: I6f1d391e067d57b7e45b9287e866014dbc16da71
2016-02-11 16:38:13 -08:00
Debargha Mukherjee
c1924b9ff0 Merge "Complete high bitdepth VAR_TX implementation." into nextgenv2 2016-02-12 00:16:18 +00:00
Angie Chiang
368e3d9293 Merge "Refactor: add predict_interp_filter() to simplify the flow in handle_inter_mode" into nextgenv2 2016-02-12 00:16:13 +00:00
Yaowu Xu
bb8ca08816 Enable computing PSNRHVS for hbd build
This commit adds computation of PSNRHVS for the highbitdepth build. It
also adds tests to make sure the calculation of the psnrhvs metric for
10 and 12 bit is correct.

Change-Id: Iac8a8073d2b3e3ba5d368829d770793212fa63b6
2016-02-11 13:17:59 -08:00
Jingning Han
57c83b330e Remove redundant parameters from vp10_txfm_rd_in_plane_supertx()
Change-Id: Icb164403239f88f18fd64de75d4881d33d3ab1cc
2016-02-11 11:53:22 -08:00
Jingning Han
f70134f729 Align rate-distortion cost metric for chroma components
This commit aligns the rate-distortion metrics for both luma and
chroma components in super transform rate-distortion optimization.
It improves the coding gains due to var-tx and supertx experiments
by 0.2% for high resolution test sets.

Change-Id: Ib89d99e29cb5ee27b1f867e301954d4164d8b364
2016-02-11 11:07:09 -08:00
Jingning Han
5c772f38fa Format clean-ups in transform experiments
Change-Id: Ib2843cb03ae452ce9fec3a94c709431ea0202d8b
2016-02-11 11:07:00 -08:00
Alex Converse
b3ad81288f Port switch to 9-bit rate cost to vp10.
Brings the following commits to vp10:
269428e Tie the bit cost scale to a define.
d13385c Switch to 9-bit rate cost constants built on a 256 probability denominator.
ad43a73 Fix a signed overflow in vp9 motion cost.
1c9b091 Fix some integer overflow errors
fac947d Restore previous motion search bit-error scale.

Change-Id: I598ba7ee7efcde18439c31dfa96b86cbf297a580
2016-02-11 09:54:24 -08:00
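A minimal sketch of what a 9-bit rate cost on a 256 probability denominator could look like, as described in b3ad81288f. It assumes the cost of a symbol with probability prob/256 is its information content in bits, scaled by 2^9 and rounded; the names PROB_COST_SHIFT and bit_cost are hypothetical and are not taken from the vp10 sources.

```python
import math

# Hypothetical name: number of fractional bits in the cost scale.
PROB_COST_SHIFT = 9


def bit_cost(prob):
    """Cost, in 1/512-bit units, of coding a symbol with probability prob/256.

    Assumption for illustration: cost = -log2(prob / 256) * 2**9, rounded.
    """
    if not 1 <= prob <= 255:
        raise ValueError("prob must be in 1..255")
    return round(-math.log2(prob / 256) * (1 << PROB_COST_SHIFT))


# A probability of 128/256 costs exactly one bit, i.e. 512 cost units.
one_bit = bit_cost(128)
```

Under this assumption, halving the probability adds exactly 512 to the cost, so one whole bit corresponds to 1 << 9 units.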
Geza Lore
432e875dce Complete high bitdepth VAR_TX implementation.
VAR_TX now works in the high bitdepth configuration.

Change-Id: I4114d7d9ed59c598f1e4d35b8e75876c07074ba7
2016-02-11 10:49:56 +00:00
Yaowu Xu
00380700fb Merge "Enable computing of FastSSIM for HBD build" into nextgenv2 2016-02-11 05:43:55 +00:00
Hui Su
6779be2487 Merge "Refactor rd_pick_intra_angle_" into nextgenv2 2016-02-11 01:44:14 +00:00
Yaowu Xu
c0874f2441 Enable computing of FastSSIM for HBD build
This commit adds the computation of fastSSIM for the highbitdepth build.
It also modifies the hbdmetric test to be more generic and applicable
to fastSSIM.

The 255 used for calculating the ssim constants c1 and c2 does not scale
exactly by 4x and 16x to 1023 and 4095, so the metric test requires a
threshold more tolerant than 0, currently 0.03dB.

Change-Id: I631829da7773de400e77fc36004156e5e126c7e0
2016-02-10 17:11:58 -08:00
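The peak-value mismatch mentioned in c0874f2441 can be checked with a little arithmetic. This sketch assumes the conventional SSIM constants c1 = (0.01 * peak)^2 and c2 = (0.03 * peak)^2; whether the fastSSIM code uses exactly these is an assumption, and the function names are hypothetical.

```python
def peak_mismatch(bits):
    """Return (scaled 8-bit peak, true peak) for the given bit depth."""
    scale = 1 << (bits - 8)        # 4x for 10 bit, 16x for 12 bit
    scaled_peak = 255 * scale      # what scaling the 8-bit peak gives
    true_peak = (1 << bits) - 1    # actual maximum sample value
    return scaled_peak, true_peak


def ssim_constants(peak):
    # Assumed conventional SSIM stabilizing constants for a given peak.
    return (0.01 * peak) ** 2, (0.03 * peak) ** 2


for bits in (10, 12):
    scaled, true = peak_mismatch(bits)
    c1_scaled, _ = ssim_constants(scaled)
    c1_true, _ = ssim_constants(true)
    # The constants differ slightly, so the high bitdepth metric cannot
    # match the 8-bit result bit-exactly.
    print(bits, scaled, true, c1_scaled, c1_true)
```

The scaled peaks come out as 1020 and 4080 rather than 1023 and 4095, which is why a small non-zero tolerance is needed in the test.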
Yue Chen
e25ccffcef Adding the config tag for the OBMC experiment
obmc: We add an obmc prediction mode at superblock level.
When it is enabled, predictors of the above and left blocks
are used to refine the regular block-based motion compensation.

Change-Id: I6310104ea3dfece16d736351e368861471dd1c9b
2016-02-10 15:58:08 -08:00
hui su
5a7c8d8c1d Refactor rd_pick_intra_angle_
Change-Id: I6c78188bdedb52655678c63f6a767567b256a880
2016-02-10 15:41:04 -08:00
Angie Chiang
c0035cc480 Refactor: add predict_interp_filter() to
simplify the flow in handle_inter_mode

Change-Id: Ic7934c0a5d0a79bdf546b4d2d106035449b475a6
2016-02-10 15:32:10 -08:00
hui su
329e340dc5 Add a speed feature to skip transform type selection
Setting FIXED_TX_TYPE to 1 makes the encoder skip the tx_type search,
making it about twice as fast.

This speed feature is off by default; we can turn it on when we
want to quickly test new ideas.

Change-Id: Ieab5807d17fcd54fce3e8ae2f59a18b42eb79408
2016-02-10 15:11:01 -08:00
Yaowu Xu
6eaca90d65 Merge "Add a test for VPXSSIM computation for HBD inputs" into nextgenv2 2016-02-10 22:50:41 +00:00
Angie Chiang
ed5a6bd947 Merge "fix range_check error in vp10_[fwd/inv]_txfm1d.c" into nextgenv2 2016-02-10 21:21:57 +00:00
Yaowu Xu
988f27bfcf Add a test for VPXSSIM computation for HBD inputs
Change-Id: I61dc0f43d073b62d0eab8cd7471c2d76e03379bf
2016-02-10 12:49:19 -08:00
Jingning Han
3c8bd0d3de Merge "Resolve conflict between var-tx and super-tx" into nextgenv2 2016-02-10 18:54:21 +00:00
Jingning Han
4c6c82a2e8 Resolve conflict between var-tx and super-tx
This commit aligns the rate-distortion metric for the recursive
transform block partitioning and the super transform. It resolves
the conflicts between these two experiments. The coding performance
gains of the combined experiments (var-tx + super-tx) have
improved:

derf   0.89%  ->  1.9%
hevcmr 1.06%  ->  1.8%
stdhd  0.29%  ->  1.4%
hevchr 0.80%  ->  2.3%

Change-Id: I7e33994ad70c1b2751435620815f867d82172f41
2016-02-10 09:36:40 -08:00
Jingning Han
ba1bed68ec Merge "Remove redundant void statement" into nextgenv2 2016-02-10 16:50:18 +00:00
Jingning Han
32a6a55b66 Merge "Replace arbitrary number with defs" into nextgenv2 2016-02-10 16:50:05 +00:00
Jingning Han
260da6ca8d Merge "Entropy coding for dynamic ref mv modes" into nextgenv2 2016-02-10 16:49:46 +00:00
Geza Lore
fd5463571e Fix double counting of compound reference bit cost.
These costs are added in separately just before the computed
ref_costs_* are added in the calling functions, so they were
effectively double counted.

Change-Id: Ic941d0243460cc2e750791cfc508e97d8b90e8fd
2016-02-10 13:37:31 +00:00
Jingning Han
875da8c666 Remove redundant void statement
Change-Id: Ia9ccba156a8a3ada6928be53eb6258fb68b6c9f1
2016-02-09 21:27:13 -08:00
Jingning Han
6d5d4395b5 Replace arbitrary number with defs
Change-Id: Ia5a68f26c67d13d3f2dd3b3f8afabb781e2c8f73
2016-02-09 21:27:13 -08:00
Angie Chiang
f98565755a fix range_check error in vp10_[fwd/inv]_txfm1d.c
Change-Id: I7a810323fc33fb6d373c3f5bb4d5d0d33170948c
2016-02-09 16:01:11 -08:00
Angie Chiang
7a8c7853c1 Remove C99 struct init style in fwd/inv config
Change-Id: Ieeb458353af6c903445518eef60328c62ca5c741
2016-02-09 15:38:27 -08:00
Jingning Han
14a7aada68 Merge "Fix partition type costing." into nextgenv2 2016-02-09 22:31:14 +00:00
Debargha Mukherjee
34aa5e1d21 Merge "A variety of fixes for supertx/var-tx rd costing" into nextgenv2 2016-02-09 17:16:23 +00:00
Geza Lore
4ac6727438 Fix partition type costing.
This patch makes rd optimization use the same context for computing
the rate cost of coding the partitioning as the packer actually uses
when emitting it in write_modes_sb.

Change-Id: Idb1427bb2f9c37ab80c6aa182f7ff754ef0595cb
2016-02-09 10:36:53 +00:00
Debargha Mukherjee
e2c1ea9422 A variety of fixes for supertx/var-tx rd costing
Change-Id: I8a3d59378abb1dfa4e614b2975c2db05d4224bd5
2016-02-08 21:46:08 -08:00
Yaowu Xu
27bd9939a3 Merge "Fix a bug in HBD buffer size computation" into nextgenv2 2016-02-09 04:40:46 +00:00
Yaowu Xu
bb5f9e431f Fix a bug in HBD buffer size computation
The value of the use_highbitdepth flag is used to compute the size for
high bit depth buffer allocation, and it should take the value 0 or 1
depending on whether the buffer is used for high bit depth or not.
Previously, the value was set to 8 or 0; this commit fixes the issue
and properly sets the value of this flag to 1 or 0.

This cuts the size of highbitdepth buffer memory allocation to 2/9 of
the size prior to the fix.

Change-Id: I401518b5a6147e5d8a973e54f7ca6bc1892065e0
2016-02-08 18:52:08 -08:00
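The 2/9 figure in bb5f9e431f follows if the allocation scales the 8-bit frame size by (1 + use_highbitdepth). That sizing rule, the function name, and the frame dimensions below are assumptions for illustration rather than the library's actual code.

```python
from fractions import Fraction


def buffer_bytes(samples, use_highbitdepth):
    # Assumed sizing rule: one byte per sample, scaled by (1 + flag).
    return (1 + use_highbitdepth) * samples


# Hypothetical 4:2:0 frame: luma plus two quarter-size chroma planes.
samples = 1920 * 1080 * 3 // 2

before = buffer_bytes(samples, 8)  # flag wrongly set to 8: 9x the 8-bit size
after = buffer_bytes(samples, 1)   # flag correctly set to 1: 2x the 8-bit size
ratio = Fraction(after, before)    # fraction of the old allocation still needed
```

With the flag at 8 the multiplier is 9, with the flag at 1 it is 2, so the fixed allocation is 2/9 of the previous one regardless of the frame size.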
Jingning Han
4958987b2a Entropy coding for dynamic ref mv modes
This commit enables entropy coding for dynamic reference motion
vector modes. The probability model is contexted on the ranking
categories of the reference motion vector candidates.

Change-Id: I09b58d98a409d63ec1a407331e29f8945b7ef17d
2016-02-08 17:05:24 -08:00
Debargha Mukherjee
d46c1f2349 Explicitly set tx_type for sub8x8 blocks
Fixes an issue where the tx_type was not set correctly for
sub8x8 inter and intra blocks. In the current syntax, for
sub8x8 blocks, there is still a single tx_type that is
transmitted. Ideally, this should be searched for the best
rd performance, albeit at the expense of encode speed.
For now, we just set it to DCT_DCT. Previously it was left
incorrectly as what was used for the previous non sub8x8
block.

derflr: BDRATE -0.277%

Change-Id: If76ba903bfbfd4d374cf1ac7d1daee50e92f0edd
2016-02-08 16:22:25 -08:00
Jingning Han
afd73539bb Merge "Enable dynamic ref motion vector mode for compound inter block" into nextgenv2 2016-02-09 00:04:36 +00:00
Yaowu Xu
1c8fa6a5ca Merge "Change to use local variables consistently" into nextgenv2 2016-02-08 23:58:23 +00:00
Yaowu Xu
f8cd25df1d Merge "Remove a flavor of SSIM that is never really used." into nextgenv2 2016-02-08 23:39:04 +00:00
Yaowu Xu
ff70d988c6 Merge "Set a max dB value for PSNR_HVS and FAST_SSIM" into nextgenv2 2016-02-08 23:38:58 +00:00
Yaowu Xu
78bca69138 Merge "Fix msvc compiler warnings" into nextgenv2 2016-02-08 23:38:48 +00:00
Yaowu Xu
e76714f4eb Merge "Normalize fdct8x8 in psnrhvs computation" into nextgenv2 2016-02-08 23:38:41 +00:00
Yaowu Xu
090eaadf20 Change to use local variables consistently
This commit does not change the computation, nor results.

Change-Id: I1a7bb47050220d970f075458b507c5e55d93b22e
2016-02-08 11:38:04 -08:00
Jingning Han
28e0393f23 Enable dynamic ref motion vector mode for compound inter block
This commit enables the dynamic reference motion vector coding mode
for the compound inter blocks.

Change-Id: Ibe78fd8de6989db392cd67a9d81a69d680345ba1
2016-02-08 11:24:15 -08:00
Yaowu Xu
204e77e059 Remove a flavor of SSIM that is never really used.
Change-Id: I61ea7f63acbcfeecd3f7dba5a5a38b980efc802b
2016-02-08 11:22:08 -08:00
Yaowu Xu
efe1b1dbf7 Set a max dB value for PSNR_HVS and FAST_SSIM
Now set at 100.0 instead of infinite

Change-Id: I41bae0c4bd95a26f9819584e7311b7945df1271a
2016-02-08 10:55:25 -08:00
Angie Chiang
355e586f45 Merge "Experiment: use 12 taps for sharp filter" into nextgenv2 2016-02-08 18:33:51 +00:00
Angie Chiang
eb71ef9235 Merge "add convolution function with adjustable length" into nextgenv2 2016-02-08 18:32:38 +00:00
Yaowu Xu
c89bcae560 Merge "Fixed warnings of unused functions" into nextgenv2 2016-02-08 18:18:47 +00:00
Yaowu Xu
3c28b4a8ff Fix msvc compiler warnings
There were a number of compiler warnings:
1. int16_t to uint8_t in recon_intra.c;
2. double to float conversions in psnrhvs.c
3. intptr_t to int in quantize.c
4. size_t to int32_t in decoder.c

Change-Id: Id95423b17779dcfa6cf39d9a90fe8cb8b910f5df
2016-02-08 10:14:08 -08:00
Yaowu Xu
d2ef991f9e Merge "Fix bad merge artifacts" into nextgenv2 2016-02-08 17:25:54 +00:00
Yaowu Xu
ac898d221f Normalize fdct8x8 in psnrhvs computation
This is to match the scale to the fdct8x8 used in original daala
psnrhvs computation.

Change-Id: Ic30b50747ba9c340bcb679f7439640046c69f90a
2016-02-08 17:13:18 +00:00
Angie Chiang
b9d3fbe0c5 Experiment: use 12 taps for sharp filter
Set USE_12_SHARP_FILTER to 1 to turn on the experiment
The psnr percentages increase
       derf   stdhd
lowbd  +0.332 +0.318
highbd +0.476 +0.507

Change-Id: I783c0fc764ee8541645e100453c9b2073924e209
2016-02-05 17:39:37 -08:00
Angie Chiang
d5349112e8 add convolution function with adjustable length
Change-Id: I1a5b1e15a188ef11594d0c6ac0dbd42aac59cfca
2016-02-05 17:33:19 -08:00
Yaowu Xu
4d90ae4b49 Fixed warnings of unused functions
And enabled the warning flag in configure for vp10.

Change-Id: If556d6fac65755af3d6ed7fe71b8eca0ef1b1965
2016-02-05 14:34:09 -08:00
Hui Su
e5bd08185a Merge "Add a speed feature for intra filter search" into nextgenv2 2016-02-05 18:49:24 +00:00
Yaowu Xu
105da4128d Fix bad merge artifacts
Temporarily disable the warning for unused functions for vp10; the
warnings need to be cleaned out before re-enabling the flag for vp10.

Change-Id: I5636f8cd607423f6ea6963db9c2cbd688e30b495
2016-02-05 09:04:41 -08:00
Yaowu Xu
48b2713553 Merge branch 'master' into nextgenv2 2016-02-05 05:00:06 -08:00
Hui Su
9604f69005 Merge "Add 8-tap interpolation filter options for intra prediction" into nextgenv2 2016-02-04 23:06:28 +00:00
Jingning Han
d1c2949eb4 Merge "Generalize the dynamic reference motion vector coding mode" into nextgenv2 2016-02-04 22:28:56 +00:00
Yaowu Xu
027cc8c873 merge per-segment lossless feature in misc_fixes
Change-Id: I56d56781d371c99aa5cdd2db1cbc0a17437723e9
2016-02-04 08:31:16 -08:00
Debargha Mukherjee
9909b5414e Merge "Fix for supertx: ignore skip_txfm optimization" into nextgenv2 2016-02-04 04:28:29 +00:00
hui su
5b618b7cae Add a speed feature for intra filter search
Separate the prediction angle search and filter search.
It can reduce the computation overhead of filter search by as much
as 85%, while keeping more than 50% of the coding gain.

Change-Id: Id152f71e20ebcaca8b429bdd4ca1fbeb646fc6bf
2016-02-03 15:12:06 -08:00
Yaowu Xu
aac86aa093 Merge "Merge branch 'master' into nextgenv2" into nextgenv2 2016-02-03 22:54:25 +00:00
hui su
3b1c766802 Add 8-tap interpolation filter options for intra prediction
BD-rate performance improvement (on top of ext-intra):
derflr  0.22%
hevclr  0.36%
hevcmr  0.48%
hevchr  0.37%
stdhd   0.19%

Average speed impact on some derf clips is about 40% slower (on
top of ext-intra). Speed improvement is a to-do.

Change-Id: I8fe3fe8c5e4f60d0462778adbcc15c84dfbe7a25
2016-02-03 14:19:20 -08:00
Jingning Han
4fb8b217e6 Generalize the dynamic reference motion vector coding mode
This commit generalizes dynamic reference motion vector coding mode
to support multiple candidate modes in the rate-distortion
optimization scheme and to support the selection in the bit-stream
syntax. The maximum number of modes allowed is currently limited to
4. The syntax elements for the dynamic reference motion vector
modes are using binary codes. The scheme supports single reference
frame.

It improves the compression performance
derf   0.135%
hevcmr 0.098%

Change-Id: Id053d6ce76e8365e52727bd0d12d28ce3de2e0e8
2016-02-03 11:57:42 -08:00
Jingning Han
6cfb488500 Merge "Account for zero-forcing operation in selective ref mv mode" into nextgenv2 2016-02-03 19:07:28 +00:00
Jingning Han
590265eaf1 Account for zero-forcing operation in selective ref mv mode
It makes the encoder account for the block zero-forcing operation
when optimizing the mode decisions.

Change-Id: I2c8e243756080b446b8a53a9679f75c4c47148cf
2016-02-03 09:26:35 -08:00
Yaowu Xu
238cc11cce Merge branch 'master' into nextgenv2
Change-Id: Ib050607fa5c0288360ff224fd048445d16568520
2016-02-03 09:26:13 -08:00
Debargha Mukherjee
7cdb078673 Merge "Supertx fix for 422 colour subsampling" into nextgenv2 2016-02-03 17:19:09 +00:00
Geza Lore
17a3d31d85 Fix for supertx: ignore skip_txfm optimization
Change-Id: I80eedb548c449ec43c6b5b88c5493b665606906e
2016-02-03 12:51:39 +00:00
Julia Robson
4fbd678f99 Supertx fix for 422 colour subsampling
Fixes assertion for football_422_4sif.y4m when supertx, var_tx and
ext_tx are all enabled. Problem was after subsampling, the u and v
blocks being encoded were no longer square.

Change-Id: Ie626f30a2e64538d33343a26d5124a79a6f2b985
2016-02-03 12:17:25 +00:00
Jingning Han
67cf8908bc Enable adaptive motion vector referencing mode
This commit allows an adaptive motion vector referencing mode
approach. It checks the available reference motion vector candidate
list and decides the amount of motion vector referencing modes. The
current implementation assumes simple binary coding for the syntax.

The compression performance is improved by
derf   0.11%
hevcmr 0.38%
stdhd  0.09%
hevchr 0.23%

The coding gains due to the new reference motion vector system are
derf   1.0%
hevcmr 1.7%
stdhd  1.4%
hevchr 1.3%

Change-Id: Idf932fc373546fe59c8741f1b933ff656e8dbc3f
2016-02-02 15:15:40 -08:00
Debargha Mukherjee
331b029590 Merge "Fixing an issue of calculating tx cost for SUPERTX+VAR_TX" into nextgenv2 2016-02-02 15:14:39 +00:00
Yaowu Xu
0839d02b4c Merge branch 'master' into nextgenv2 2016-02-02 05:00:05 -08:00
Angie Chiang
392d577c49 Merge "Pass filter type instead of filter array" into nextgenv2 2016-02-02 02:32:16 +00:00
Angie Chiang
10ad97bc55 Pass filter type instead of filter array
Change-Id: I25f2149ddaa332722f7ab82e8f832a253c4b6ab3
2016-02-01 17:03:50 -08:00
Debargha Mukherjee
9953279978 Merge "Refactor to separate restoration from loop filter" into nextgenv2 2016-02-02 00:17:14 +00:00
Yaowu Xu
9568a284ab Fix automerge errors
Change-Id: I24d415bafe617eac894427088d7b2fbe0b7e04d7
2016-02-01 14:03:49 -08:00
Yaowu Xu
3ba144f81d Merge branch 'master' into nextgenv2 2016-02-01 05:00:05 -08:00
Yaowu Xu
8678ecd1ef Merge branch 'master' into nextgenv2 2016-01-31 05:00:05 -08:00
Yaowu Xu
8dc6f3f5c2 Merge branch 'master' into nextgenv2 2016-01-30 05:00:05 -08:00
Debargha Mukherjee
f0a4485e54 Refactor to separate restoration from loop filter
Change-Id: Iab517862d957f3aa2a664e9349d57bbf424febb3
2016-01-29 15:39:23 -08:00
Debargha Mukherjee
af99a61697 Merge "Cosmetic changes to loop restoration" into nextgenv2 2016-01-29 20:57:04 +00:00
Yue Chen
11af20dbeb Fixing an issue of calculating tx cost for SUPERTX+VAR_TX
Update blk_skip in update_state_supertx() and rd_supertx_sb().
Performance (SUPERTX+VAR_TX): TBD
(Eventually will merge update_state() and update_state_supertx())

Change-Id: I34ef982b80151ba2dfba745859cb2ca7b90dc888
2016-01-29 09:31:04 +00:00
Yaowu Xu
646f831fbd Merge branch 'master' into nextgenv2 2016-01-28 07:32:05 -08:00
Debargha Mukherjee
3eb10fcf21 Cosmetic changes to loop restoration
Also adds a normalized filtering function to be used later.

Change-Id: I30e2140e664db635602f26a73b81ce8e008dff5e
2016-01-27 17:33:36 -08:00
Debargha Mukherjee
b831fafceb Merge "Fixes ext-interp experiment" into nextgenv2 2016-01-27 19:16:55 +00:00
Debargha Mukherjee
eef57c1e99 Fixes ext-interp experiment
Fixes integer pel MV usage for the sub8x8 case, which fixes a
rare mismatch issue.

Also adds some other minor missing code related to filter threshes.

Change-Id: I6b07e6cf9b287ba4b5bd6599af4a7412e50b3bdc
2016-01-27 09:24:48 -08:00
Yaowu Xu
35e55871ab Merge branch 'master' into nextgenv2 2016-01-27 05:00:06 -08:00
Geza Lore
5aa655f79a Reintroduce VAR_TX fix
Reintroduce part of
Iaf2b717e6b8626b2b6a03226127221b776b49884
Which was later reverted in
I4c5b40ec63a6f19521191d3c730af87db3c4bc00

Change-Id: If3e5610ba3985ae7b4d952d8e616982465ac667a
2016-01-26 15:27:47 +00:00
Geza Lore
f33755ef1f Always recode SUPERTX blocks.
There is still an assertion failure when tokenizing transform
coefficients for a supertx block, due to the eob not being set
consistently with the coefficients, so we always recode supertx blocks
for now. Also added further PICK_MODE_CONTEXT instances to avoid
potential clash between horizontal/vertical/split partition SUPERTX
trials.

Change-Id: I5f3da1fa0d8d20fc21face170487e1a285fd1cc6
2016-01-26 15:21:07 +00:00
Geza Lore
753dcbe5f9 Merge "Update VAR_TX related context when using SUPERTX." into nextgenv2 2016-01-26 15:20:32 +00:00
Yaowu Xu
f512a311f2 Merge branch 'master' into nextgenv2 2016-01-26 05:00:05 -08:00
Geza Lore
22cfc841d5 Update VAR_TX related context when using SUPERTX.
The encoder did not update left_txfm_context and above_txfm_context in
MACROBLOCKD (used for choosing the probability context for the vartx
split bits) when the supertx bit was set for a block. The decoder, on the
other hand, did update these for supertx blocks. The encoder used these
to compute the context counts, which the packer then uses to adapt its
probabilities. This resulted in the packer and the decoder using
different probabilities.

This patch harmonizes the encoder and the decoder by making the encoder
update the mentioned context for supertx coded blocks.

Change-Id: I3a22132124b1bce2ee501d640ceab374b19e3ca1
2016-01-26 01:18:56 +00:00
Geza Lore
56686b4514 Initialize mbmi->tx_size during rdopt.
This is necessary when using SUPERTX, as the bitstream packer relies on
tx_size being set correctly to decide whether to output the block using
supertx or not.

Change-Id: I79e776b3b810f4a15b9dbc6afdd6fc90c73c8934
2016-01-26 01:18:56 +00:00
Geza Lore
e7c0e157d2 Set inter_tx_size for supertx coded blocks.
The loop filter relies on inter_tx_size in MB_MODE_INFO being set
properly when VAR_TX is enabled. Supertx coded blocks did not set this
previously at all, and the differing garbage values eventually resulted
in a YUV mismatch between encoder and decoder after loop filtering.

This patch fixes this by setting inter_tx_size to the proper supertx
size in both the encoder and the decoder. This should also mean that
loop filtering is done at the proper transform boundaries, even when
supertx or vartx is being used.

Change-Id: I41a564cd6d34ce4a8313ad4efa89d905f5ead731
2016-01-26 01:18:56 +00:00
Debargha Mukherjee
a8122bb957 A minor assert fix for supertx
Change-Id: I532fff64ccaa1f38240ba7ca5ce2f7e1eb531771
2016-01-25 17:15:51 -08:00
Debargha Mukherjee
3e63170fdd Merge "Some supertx fixes" into nextgenv2 2016-01-25 22:21:53 +00:00
Debargha Mukherjee
9a8a6a1b35 Some supertx fixes
Fixes some of the issues introduced by a merge from master.

derflr: -0.893% BDRATE
hevcmr: -1.667% BDRATE

Change-Id: I4c5b40ec63a6f19521191d3c730af87db3c4bc00
2016-01-25 10:36:41 -08:00
Yaowu Xu
33768c24af Merge branch 'master' into nextgenv2 2016-01-24 05:00:05 -08:00
Yue Chen
968bbc7bb2 Adding new compound modes to EXT_INTER experiment
Combinations of different mv modes for two reference frames
are allowed in compound inter modes. 9 options are enabled,
including NEAREST_NEARESTMV, NEAREST_NEARMV, NEAR_NEARESTMV,
NEAREST_NEWMV, NEW_NEARESTMV, NEAR_NEWMV, NEW_NEARMV, ZERO_ZEROMV,
and NEW_NEWMV.
This experiment is mostly ported from the nextgen branch.
It is made compatible with other experiments.

Coding gain of EXT_INTER(derflr/hevcmr/hevchd): 0.533%/0.728%/0.639%

Change-Id: Id47e97284e6481b186870afbad33204b7a33dbb0
2016-01-22 13:52:16 -08:00
Yaowu Xu
53c9ffd06f Fix a merge mistake
Change-Id: I0769ca1abd42521a24078dc9ba7093ccb5e362ee
2016-01-22 10:48:10 -08:00
Debargha Mukherjee
11d3899c60 Merge "A fix for a missing tx_type" into nextgenv2 2016-01-22 18:36:46 +00:00
Yaowu Xu
94322a9f9b Merge branch 'master' into nextgenv2
Change-Id: I0a82fa1cbe4ee7c7831d2d174f140a40d09a06c5
2016-01-22 08:46:06 -08:00
Debargha Mukherjee
4e406f702c A fix for a missing tx_type
Change-Id: I165cd06256175edb7739020379ba3098251f4a7c
2016-01-22 03:29:39 -08:00
Debargha Mukherjee
ade60d4fe9 Merge "Fixes for var_tx when ext_tx is not enabled" into nextgenv2 2016-01-22 06:09:36 +00:00
Debargha Mukherjee
0a5c176011 Merge "Some fixes on tx size/type selection" into nextgenv2 2016-01-22 04:02:11 +00:00
Yue Chen
91f55290ef Merge "Relocate supertx prob update" into nextgenv2 2016-01-22 03:45:38 +00:00
Julia Robson
9fe188e4a5 Fixes for var_tx when ext_tx is not enabled
This patch fixes a couple of issues caused by change-id:
I15d20ce5292b70f0c2b4ba55c1f1318181481596
Changes to the code for when the ext_tx experiment is not enabled
were merged from master but as var_tx does not exist on master
the changes had not been applied to the case when var_tx experiment
is enabled

Change-Id: Iaf2b717e6b8626b2b6a03226127221b776b49884
2016-01-21 17:46:03 -08:00
Debargha Mukherjee
8d69a6e816 Some fixes on tx size/type selection
For ext_tx experiment.

Change-Id: Ie37b9b456b09bde8b606fb978fee4cca8d0326b7
2016-01-21 17:45:30 -08:00
Debargha Mukherjee
b1d49f24b4 Build fix for vp9
Change-Id: I1b93ea826487f9033de70fc4563ade410fe07d74
2016-01-21 17:41:36 -08:00
Yue Chen
fb29aec42c Relocate supertx prob update
Move it from vp10_adapt_intra_frame_probs() to
vp10_adapt_inter_frame_probs() because intra frames do not use
supertx.

Change-Id: I28c7391944848666054d4b990ac17a8ae08aaaee
2016-01-21 17:41:21 -08:00
Debargha Mukherjee
09f3615606 Merge "Loop restoration filter" into nextgenv2 2016-01-21 17:43:39 +00:00
Debargha Mukherjee
0880b42129 Merge "Making the forward transform consistent with high bit depth" into nextgenv2 2016-01-21 15:22:48 +00:00
Debargha Mukherjee
84ca7a9f0f Loop restoration filter
Current implementation is a bilateral filter whose
parameters are transmitted in the bitstream.

derflr: -0.647% BDRATE
hevcmr: -0.794% BDRATE

This is a preliminary patch. Various other variations are to
be investigated next, that will hopefully be less expensive
on the decoder side.

Change-Id: I50634ae8f5014ad0bf7432306348908a349d81e1
2016-01-20 17:59:46 -08:00
Yaowu Xu
a40d486215 Merge branch 'master' into nextgenv2
Change-Id: I4e5dd38caa9608252235265c7f2342b183f99815
2016-01-20 09:24:41 -08:00
Julia Robson
c178b2d192 Making the forward transform consistent with high bit depth
This patch changes the code for 16bit buffers to use the same
optimisation as is used for 8bit buffers. (See change-Id:
I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1 for more information
about the optimisation)

Change-Id: I5f327a13a7b01fc356114a2aa9d1261bf76d8d69
2016-01-20 12:03:16 +00:00
hui su
2778b1cbb9 Fix a bug with ext-intra when skip_recode is enabled
Change-Id: I906945d61254149b315a6de81ac6373ed31791e6
2016-01-19 14:54:31 -08:00
Debargha Mukherjee
63b57c311d Merge "Adding experimental tags for new experiments" into nextgenv2 2016-01-19 16:58:21 +00:00
Yaowu Xu
d32c38c75c Merge branch 'master' into nextgenv2 2016-01-16 05:00:06 -08:00
Yue Chen
1ac858794a EXT_INTER experiment
NEW2MV is enabled, representing a new motion vector predicted from
NEARMV. It is mostly ported from nextgen, where it was named
NEW_INTER.
A few fixes are done for sub8x8 RDO to correct some misused
mv references in the original patch.
A 'bug-fix' for encoding complexity is done, reducing the additional
encoding time from 50% to 20%. In sub8x8 case, the old patch
did motion search for every interpolation filter (vp9 only
searches once). This fix also slightly improves the coding gain.
This experiment has been made compatible with REF_MV and EXT_REFS.

Coding gain (derflr/hevcmr/hevchd): 0.267%/0.542%/0.257%

Change-Id: I9a94c5f292e7454492a877f65072e8aedba087d4
2016-01-15 14:47:02 -08:00
Debargha Mukherjee
6a5a08ee1c Adding experimental tags for new experiments
ext-partition: to hold partition extensions (ex. ext-partition,
ext-coding-unit-size from nextgen)
loop-restore: to hold in-loop restoration filter (ex. loop-postfilter
from nextgen and other Wiener restoration filters)

Change-Id: I71c7f1588f05fb0f2b00f7004a78e90c9cceae3f
2016-01-15 12:55:03 -08:00
Debargha Mukherjee
eee6afe0b9 Fixing some compile issues
Fixes a breakage introduced with the latest merge from master and
cleans up a couple of compiler warnings.

Change-Id: Ia55b39ba78e43f6fe52c54d7f34faa4dd6bbbf26
2016-01-15 11:02:30 -08:00
Yaowu Xu
44da65fb44 Merge branch 'master' into nextgenv2 2016-01-14 13:57:27 -08:00
Jingning Han
c38fc52df4 Merge "Handle single ref mv pair in the candidate list for compound mode" into nextgenv2 2016-01-14 02:55:36 +00:00
Yaowu Xu
727ca802bf Merge "Merge branch 'master' into nextgenv2" into nextgenv2 2016-01-14 00:26:45 +00:00
Jingning Han
3944cfb14d Handle single ref mv pair in the candidate list for compound mode
This commit considers the case where a single reference motion
vector pair is found in the candidate list. It treats this pair
as the effective motion vector for nearestmv mode. This improves
the coding performance by 0.06% for stdhd sets.

Change-Id: I9ce12f456b52912933e05c18c3841a78c26155d2
2016-01-13 16:19:27 -08:00
Yaowu Xu
0367f32ea8 Merge branch 'master' into nextgenv2
Manually resolved the following conflicts:
	vp10/common/blockd.h
	vp10/common/entropy.h
	vp10/common/entropymode.c
	vp10/common/entropymode.h
	vp10/common/enums.h
	vp10/common/thread_common.c
	vp10/decoder/decodeframe.c
	vp10/decoder/decodemv.c
	vp10/encoder/bitstream.c
	vp10/encoder/encodeframe.c
	vp10/encoder/rd.c
	vp10/encoder/rdopt.c

Change-Id: I15d20ce5292b70f0c2b4ba55c1f1318181481596
2016-01-13 13:18:06 -08:00
Debargha Mukherjee
e9db70fb63 Merge "Use specific PICK_MODE_CONTEXT for supertx." into nextgenv2 2016-01-13 19:41:39 +00:00
Jingning Han
177601b561 Merge "Generate compound reference motion vector" into nextgenv2 2016-01-13 19:37:38 +00:00
Jingning Han
97629f5961 Merge "Refactor ref mv stack system" into nextgenv2 2016-01-13 19:36:16 +00:00
Jingning Han
33cc1bd21d Generate compound reference motion vector
This commit allows the codec to add motion vector pairs into
the candidate list. It further improves the compression performance
by 0.1% across derf, hevcmr, stdhd, and hevchr sets without adding
encode/decode time.

Change-Id: I88d36da25a2a89bb506d411844af667081eba98b
2016-01-12 15:28:47 -08:00
Alex Converse
0f840cc3eb Compress the final ANS state.
The '110' prefix on a final byte indicates a superframe marker. Coded
data is not allowed to use this pattern on a final byte.

Code |state - l_base| little endian with the following prefix scheme:
Prefix '00': Single byte coded state.
Prefix '01': Two bytes le coded state.
Prefix '10': Three bytes le coded state.

Change-Id: Ibc953b67675b567394b93de39b7cb22cadc47435
2016-01-12 23:11:17 +00:00
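A minimal sketch of the prefix scheme described in 0f840cc3eb. It assumes the two-bit prefix sits in the top bits of the final byte, leaving 6, 14, or 22 payload bits; L_BASE, write_end, and read_end are hypothetical names and values, not the actual ANS code.

```python
L_BASE = 1 << 10  # hypothetical lower bound of the ANS state


def write_end(state, l_base=L_BASE):
    """Serialize state - l_base as 1 to 3 little-endian bytes with a prefix."""
    v = state - l_base
    if v < 0:
        raise ValueError("state below l_base")
    if v < (1 << 6):
        return bytes([(0b00 << 6) | v])                  # prefix '00', 1 byte
    if v < (1 << 14):
        return ((0b01 << 14) | v).to_bytes(2, "little")  # prefix '01', 2 bytes
    if v < (1 << 22):
        return ((0b10 << 22) | v).to_bytes(3, "little")  # prefix '10', 3 bytes
    raise ValueError("state too large to serialize")


def read_end(buf, l_base=L_BASE):
    """Recover the state from the tail of buf by inspecting the final byte."""
    prefix = buf[-1] >> 6
    if prefix == 0b11:
        raise ValueError("prefix '11' is not a coded state")
    nbytes = prefix + 1
    word = int.from_bytes(buf[-nbytes:], "little")
    payload_mask = (1 << (8 * nbytes - 2)) - 1  # drop the two prefix bits
    return (word & payload_mask) + l_base
```

Because little-endian order places the most significant byte last, the final byte always begins with '00', '01', or '10', so it can never carry the '110' superframe marker pattern.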
Alex Converse
5311ad9e83 Merge "Code DCT tokens with ANS" into nextgenv2 2016-01-12 23:08:36 +00:00
Alex Converse
dff3ea7ab3 Merge "Add an implementation of Asymmetric Numeral Systems (ANS)." into nextgenv2 2016-01-12 23:05:31 +00:00
Geza Lore
e4663e6da1 Use specific PICK_MODE_CONTEXT for supertx.
Change-Id: I402d10dd666ebc4a06ce4472810a7e22f2e056ff
2016-01-12 13:06:42 +00:00
Alex Converse
d1893f64e0 Code DCT tokens with ANS
Change-Id: I452f9675325a5f45bfbbe3e7e135009a125539f1
2016-01-12 09:08:19 +00:00
Alex Converse
9ffcb469fb Add an implementation of Asymmetric Numeral Systems (ANS).
Change-Id: Ie41bc72127e700887566dcc951da9d83a0b94891
2016-01-11 16:26:30 -08:00
Jingning Han
253a200d3b Refactor ref mv stack system
This commit reworks the reference motion vector stack process
and makes it support an extended context set. It unifies the reference
motion vector checking process for row and column scans, as well as
for single block scans.

Change-Id: I68c05cde93cf8b0ca2ef4d1523399f405bd0a337
2016-01-11 12:39:29 -08:00
Debargha Mukherjee
d738ce2b0a Merge "Fixing issue with txfm context when var_tx and supertx are enabled" into nextgenv2 2016-01-11 18:40:25 +00:00
Jingning Han
387a10e3dc Enable context analyzer for inter mode entropy coding
It allows the codec to account for certain corner cases when
processing inter prediction mode entropy coding.

Change-Id: Ied451f4fff26ba579f6556554b8381ff2ccd0003
2016-01-08 10:27:27 -08:00
Yaowu Xu
c557efafeb Merge branch 'master' into nextgenv2 2016-01-08 05:00:05 -08:00
Geza Lore
a167244346 Fix 2 bugs when using both SUPERTX and EXT_TX
Change-Id: Ibcbe470a97880c294600345337054ed9af84de2b
2016-01-07 12:13:35 -08:00
Julia Robson
54fbf7e55f Fixing issue with txfm context when var_tx and supertx are enabled
In the decoder, the txfm context was not being set for supertx
blocks.

Change-Id: Ifa0882bba36bc54bbd9dba3e370317b5531e33d3
2016-01-07 15:23:08 +00:00
Yaowu Xu
c67fca154e Merge branch 'master' into nextgenv2 2016-01-07 05:00:06 -08:00
Debargha Mukherjee
b5480b3594 Merge "Change to rd costing for CONFIG_VAR_TX" into nextgenv2 2016-01-06 23:49:01 +00:00
Yaowu Xu
55118ad061 Merge branch 'master' into nextgenv2
Conflicts:
	vp10/decoder/decodeframe.c
	vp10/encoder/bitstream.c

Change-Id: I743449f49b723d2ce357832619a28b53369d0547
2016-01-06 08:36:18 -08:00
Peter de Rivaz
2f943131df Change to rd costing for CONFIG_VAR_TX
In select_tx_block I believe the rd cost was ignoring the bits
needed to encode the split bit.

Change-Id: Iacbf705b880db9a68967a994406ba90ecf693ab8
2016-01-05 15:35:43 +00:00
Debargha Mukherjee
3787b17439 Super transform - ported from nextgen branch
Various additional changes were made to make the experiment
compatible with misc_fixes.

derflr: +0.979%
hevcmr: +0.865%

Speed-wise with --enable-supertx the encoder is only about 10%
slower than without. Decoding impact is about 30% slowdown.

Note this does not work with ext-tx or var-tx yet. That is
a TODO.

Change-Id: If25af4241a7a9efbd28f58eda3c4f044c7a7ef4b
2016-01-04 22:12:57 -08:00
Hui Su
717be7bcd5 Merge "Use precise rate cost for intra modes in inter frames" into nextgenv2 2016-01-04 18:17:59 +00:00
hui su
1559afda85 Use precise rate cost for intra modes in inter frames
derflr +0.021%
hevclr +0.207%
hevcmr +0.035%
stdhd  +0.042%

Change-Id: Ic750df93bcc0a261a66a9b19d939a5cd61a6b516
2016-01-04 08:35:29 -08:00
Zoe Liu
9581f3d49a Replaced a hard-coded value with the macro
Change-Id: I2aec63d8a600e319d037b764b0609092bce1e483
2015-12-30 17:16:51 -08:00
Yaowu Xu
250213ac7e Merge branch 'master' into nextgenv2 2015-12-29 05:00:05 -08:00
hui su
0681f6f1df ext-intra experiment: exploit left-bottom boundary
ext-intra vs nextgenv2 baseline:
derflr +1.12% (was +1.06%)
hevcmr +2.26% (was +2.15%)

Change-Id: I6cc7612d0d7e81e200aa962988db1ea7680626d7
2015-12-28 10:47:40 -08:00
Yaowu Xu
14b0443792 Merge branch 'master' into nextgenv2 2015-12-23 05:00:05 -08:00
Yaowu Xu
7c6144bc4a Merge branch 'master' into nextgenv2 2015-12-22 05:00:05 -08:00
Zoe Liu
a2832f1b5b Merge "Cleared the EXT_REFS code to make it more legible." into nextgenv2 2015-12-21 18:26:35 +00:00
Debargha Mukherjee
305fac7a19 Merge "Fix for high bitdepth temporal filter" into nextgenv2 2015-12-21 17:41:26 +00:00
Yaowu Xu
f73feedb9e Merge branch 'master' into nextgenv2 2015-12-19 05:00:06 -08:00
Yaowu Xu
7330108009 Merge branch 'master' into nextgenv2 2015-12-18 05:00:05 -08:00
Zoe Liu
a4d0c7148b Cleared the EXT_REFS code to make it more legible.
Change-Id: I309c4e16fd305bcfa590d14f957a9598d23c7ee6
2015-12-17 16:46:28 -08:00
Zoe Liu
ec36a2b061 Restore the flexibility for the new 3 references
For the experiment of EXT_REFS, removed the previous special handling
on the new last 3 references, i.e. LAST2_FRAME, LAST3_FRAME, and
LAST4_FRAME, at the decoder, so that these new last references are
treated the same way as the other 3 references (LAST_FRAME,
GOLDEN_FRAME, and ALTREF_FRAME). Encoder changes have been made
accordingly to realize this flexibility.

Change-Id: Ic6546f9443b4377bb7e7b101bfa3e70a8b8d1c65
2015-12-17 16:34:02 -08:00
Debargha Mukherjee
8b9efaa161 Merge "Replace DST1 in ext_tx experiment with DST2" into nextgenv2 2015-12-16 23:47:28 +00:00
Yaowu Xu
dab7515aa4 Merge branch 'master' into nextgenv2
With a few manual fixes of merge conflicts.

Change-Id: I0dd65ff90f9fa8606e5563f528659e2607b12376
2015-12-16 09:00:57 -08:00
Angie Chiang
d6695b8a0e Merge "Refactor vp10_encode_block_intra" into nextgenv2 2015-12-15 19:59:33 +00:00
Debargha Mukherjee
49d9730f60 Replace DST1 in ext_tx experiment with DST2
The DST2 is implemented by input alternate sign-flip, followed
by DCT, followed by output reversal.
Results are roughly the same, but it should be easier to optimize
the DST2.
[Interestingly, a matrix-multiply implementation is about 0.1%
better].
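
The identity in the description can be checked numerically. A minimal sketch in Python, assuming "DST2" means the type-II DST and using unnormalized floating-point definitions (the codec itself uses scaled integer transforms; the function names here are illustrative and not from the source tree):

```python
import math


def dct2(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi/N * (n + 1/2) * k)."""
    n_pts = len(x)
    return [
        sum(x[n] * math.cos(math.pi / n_pts * (n + 0.5) * k) for n in range(n_pts))
        for k in range(n_pts)
    ]


def dst2_direct(x):
    """Unnormalized DST-II: S[k] = sum_n x[n] * sin(pi/N * (n + 1/2) * (k + 1))."""
    n_pts = len(x)
    return [
        sum(x[n] * math.sin(math.pi / n_pts * (n + 0.5) * (k + 1)) for n in range(n_pts))
        for k in range(n_pts)
    ]


def dst2_via_dct(x):
    """DST-II computed as: alternate sign-flip of the input, DCT-II, output reversal."""
    flipped = [-v if n % 2 else v for n, v in enumerate(x)]  # negate odd-indexed samples
    return dct2(flipped)[::-1]  # reverse the DCT output
```

Both routes agree up to floating-point rounding, which is why an existing optimized DCT can be reused for the DST2.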

Change-Id: If9ae5fdba87767fb0e6c163a62b77ee66a8d3afc
2015-12-15 11:30:48 -08:00
Yaowu Xu
b37e8b0e00 Merge branch 'master' into nextgenv2 2015-12-15 05:00:05 -08:00
Jingning Han
6f1f0d896a Merge "Enable adaptive prediction mode coding" into nextgenv2 2015-12-15 04:38:15 +00:00
Angie Chiang
0919edd4d2 Refactor vp10_encode_block_intra
1) Add VP10_XFORM_QUANT_SKIP_QUANT mode for vp10_xform_quant
2) Let encode_block call vp10_xform_quant so that its code flow
   is clear

Change-Id: I122d5cf6a089f444ae018f3e4bf844be847e17ee
2015-12-11 14:30:24 -08:00
Angie Chiang
30ee689da3 Merge "Refactor vp10_xform_quant" into nextgenv2 2015-12-11 20:29:04 +00:00
Yaowu Xu
f07d73b9bf Merge branch 'master' into nextgenv2
Change-Id: Id0b784b115602e2502b42fa972a5ae210435a3be
2015-12-11 08:58:40 -08:00
Debargha Mukherjee
104636a39a Some fixes from merging MISC_FIXES config
Change-Id: I3f77e952af3c441a50479bb5d278ea0fd6cf62c6
2015-12-10 15:17:33 -08:00
Jingning Han
aa5d53eb17 Enable adaptive prediction mode coding
This commit allows the codec to analyze the reference motion vector
candidate list and adaptively reduce the size of inter prediction
mode set.

Change-Id: Ied6a403843b860d66f26ed485c1825c05c71bdfc
2015-12-10 09:02:32 -08:00
Jingning Han
8edbe4d6db Merge "Allow precise classification for refmv mode context" into nextgenv2 2015-12-10 03:26:24 +00:00
Jingning Han
48365e10bb Merge "Re-design motion compensated prediction mode entropy coding system" into nextgenv2 2015-12-10 03:26:05 +00:00
Jingning Han
0d65cae638 Allow precise classification for refmv mode context
Combine the nearest ref mv count and the total ref mv count for
mode context.

Change-Id: I342a2b126bf7d2d30c344911260d9769a923026b
2015-12-10 02:03:32 +00:00
Jingning Han
1dc18077b8 Re-design motion compensated prediction mode entropy coding system
This commit re-works the entropy coding scheme of the motion
compensated prediction modes. It allows more flexible hyperplane
partition for precise classification.

Change-Id: Iba5035c76691946cf1386b6c495e399c3d9c8fc5
2015-12-09 18:02:20 -08:00
Yaowu Xu
f757782f22 Merge branch 'master' into nextgenv2
Change-Id: I6f8b540854ddc78fc4a2a8045b194a888749a3cb
2015-12-09 08:09:30 -08:00
Peter de Rivaz
5283e57786 Fix for high bitdepth temporal filter
The 8bit temporal filter has changed to use non-local means.
This fix adds the same change into the high bitdepth code.

Change-Id: I3375b13a7d914fc8aa9eb4aac1d2e4d9b74b782f
2015-12-09 15:53:01 +00:00
Debargha Mukherjee
9fbc394036 Merge "Fix for crash when using high bitdepth and var-tx" into nextgenv2 2015-12-08 19:17:10 +00:00
Hui Su
cdffec73e9 Merge "Bring palette back to nextgenv2" into nextgenv2 2015-12-08 17:44:22 +00:00
Peter de Rivaz
22850493b6 Fix for crash when using high bitdepth and var-tx
Change-Id: Ide48fa4312f7828f99290f7a2be878f5673fa716
2015-12-08 15:58:21 +00:00
hui su
c93e5cc3e9 Bring palette back to nextgenv2
It was removed by the master branch merge.

Change-Id: I4b2a524c9e052e41063359afcb4ba22bf78344cf
2015-12-07 18:24:15 -08:00
hui su
bf0ff0907e Miscellaneous changes in reconintra.c
Fix a bug in vp10_has_right;
Some cosmetic changes.

Tiny performance improvement (0.02%~0.04%) on derflr and hevcmr.

Change-Id: Iee829003a20f32d6185a08bab2bd4201806be2b3
2015-12-07 17:08:27 -08:00
Yaowu Xu
69f4930041 Merge branch 'master' into nextgenv2
Conflicts:
	vp10/common/blockd.h
	vp10/common/entropymode.h
	vp10/common/reconintra.c
	vp10/decoder/decodemv.c
	vp10/encoder/bitstream.c
	vp10/encoder/encoder.h
	vp10/encoder/rd.c
	vp10/encoder/rdopt.c
	vp10/encoder/tokenize.h

Change-Id: Ic4891839b6f0474026d6d69821e38edec9632df1
2015-12-07 11:37:14 -08:00
Jingning Han
f5bed806fb Merge "Extend reference motion vector candidate range" into nextgenv2 2015-12-04 06:00:02 +00:00
Angie Chiang
88cae8b422 Refactor vp10_xform_quant
1) Add a facade to the quantize b/fp/dc versions so that their
   interfaces are the same.
2) Merge the vp10_xform_quant b/fp/dc versions into one function so
   that the code flow in encodemb.c is clear

Change-Id: Ib62d6215438fc2d07f4e7e72393f964832d6746f
2015-12-03 15:28:11 -08:00
Angie Chiang
2b3f1d36b3 Merge changes Iea45fd22,If174d8dd,I9f539491 into nextgenv2
* changes:
  Add facade to inverse txfm
  Create hybrid_fwd_txfm.c
  merge txfm_#x#_1 into txfm_#x#
2015-12-03 22:29:03 +00:00
Jingning Han
e15fb2bb99 Extend reference motion vector candidate range
This commit adds top-right corner and collocated block into the
reference motion vector candidate check list.

Change-Id: I892a4c7fb04ddda44e0f9dfe769471252d40c42b
2015-12-03 09:25:20 -08:00
Yaowu Xu
3e2273fcee Merge branch 'master' into nextgenv2 2015-12-03 05:00:05 -08:00
Yaowu Xu
24e2b4c005 Merge branch 'master' into nextgenv2 2015-12-02 05:00:05 -08:00
hui su
d7c8bc77c6 Speed up angle search in intra mode selection
Estimate angle histogram using gradient analysis, then skip those
angles that are unlikely to be chosen.
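
The general idea of pruning by a gradient histogram can be sketched as follows. This is an illustrative Python sketch only: the bin count, the weighting by gradient magnitude, and the `keep_fraction` threshold are assumptions, not the values or functions used in the commit.

```python
import math


def angle_histogram(block, bins=8):
    """Histogram of gradient directions in [0, pi), weighted by gradient magnitude."""
    hist = [0.0] * bins
    for r in range(len(block) - 1):
        for c in range(len(block[0]) - 1):
            dx = block[r][c + 1] - block[r][c]  # horizontal difference
            dy = block[r + 1][c] - block[r][c]  # vertical difference
            if dx == 0 and dy == 0:
                continue  # flat area carries no direction information
            angle = math.atan2(dy, dx) % math.pi
            idx = min(int(angle / math.pi * bins), bins - 1)
            hist[idx] += abs(dx) + abs(dy)
    return hist


def angles_to_search(hist, keep_fraction=0.1):
    """Keep only the bins that hold at least keep_fraction of the total weight."""
    total = sum(hist)
    if total == 0:
        return list(range(len(hist)))  # no gradient information: search every bin
    return [i for i, w in enumerate(hist) if w >= keep_fraction * total]
```

Only the angles in the surviving bins would then go through the full rate-distortion search.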

On ext-intra experiment, turning off filter-intra modes:
for all-key-frame setting, computation overhead is reduced
by about 40%, coding gain dropped from +2.08% to +1.96% (derflr);
with kf-max-dist=150, computation overhead is reduced
by about 60%, coding gain dropped from +0.58% to +0.49% (derflr).

Change-Id: I36687410fb10561b8e1a8eebb1528cf17755bd5b
2015-12-01 11:15:47 -08:00
Yaowu Xu
d1486c3837 Merge branch 'master' into nextgenv2 2015-12-01 05:00:05 -08:00
Angie Chiang
a245d9f88c Add facade to inverse txfm
Add inv_txfm and highbd_inv_txfm as facades of inverse transform
such that the code flow in encodemb.c can be simpler

Change-Id: Iea45fd22dd8b173f8eb3919ca6502636f7bcfcf7
2015-11-25 13:50:40 -08:00
Jingning Han
40cedd6763 Refactor sub8x8 ref motion vector search
Take out an unnecessary use of the mode context array.

Change-Id: I4032ed18464e5ec49a2160bea964bad9b716ee54
2015-11-25 13:42:59 -08:00
Angie Chiang
96baa73ed9 Create hybrid_fwd_txfm.c
Move txfm functions from encodemb to hybrid_fwd_txfm.c
to make encodemb's code flow clear

Change-Id: If174d8ddb490d149c103e5127d30ef19adfbed13
2015-11-25 12:51:25 -08:00
Debargha Mukherjee
63def292eb Merge "Fix ext-tx experiment for highbitdepth" into nextgenv2 2015-11-25 17:33:12 +00:00
Jingning Han
35921b897b Merge "Make sub8x8 block ref mv search consistent to regular blocks" into nextgenv2 2015-11-25 16:47:01 +00:00
Yaowu Xu
49f5903dd2 Merge branch 'master' into nextgenv2 2015-11-25 05:00:05 -08:00
Jingning Han
7f4bab0697 Make sub8x8 block ref mv search consistent to regular blocks
Check all motion vectors in the immediate above and left blocks if
the reference conditions matched.

Change-Id: I8bf33bfcee99e8150232c7681fdeade307024272
2015-11-24 21:58:10 -08:00
Jingning Han
c7b31a5c49 Remove a redundant argument in setup_ref_mv_list
Change-Id: I215071bff55f8ba6347fa985414b40723b4986f5
2015-11-25 02:22:16 +00:00
Jingning Han
731dcd3e6a Merge "Integrate motion vector stack into codec" into nextgenv2 2015-11-25 02:21:53 +00:00
Angie Chiang
30e325a94b merge txfm_#x#_1 into txfm_#x#
Change-Id: I9f539491fe676898246976c91d5ac4804a155803
2015-11-24 18:21:27 -08:00
Jingning Han
e7569225f1 Merge "Refactor tokenization coding tree" into nextgenv2 2015-11-25 01:15:05 +00:00
Jingning Han
e5c57c580a Integrate motion vector stack into codec
This commit ports the motion vector stack from motion field
analyzer to the encoding and decoding pipeline.

Change-Id: Ie283c1e1a15b4c17a1c7c175ce322bf053bb7840
2015-11-25 01:14:44 +00:00
Jingning Han
11bac096f2 Merge "Analyze motion field to produce reference motion vectors" into nextgenv2 2015-11-25 01:14:12 +00:00
Jingning Han
2ec5ed258a Refactor tokenization coding tree
Expand the tokenization tree writing to support per transform block
type coding in next CLs.

Change-Id: I3560e658f89cc500eb49603f95dd2b4e99045f5b
2015-11-24 16:01:51 -08:00
Jingning Han
254d3e172a Analyze motion field to produce reference motion vectors
This commit allows the codec to analyze the motion field in the
available above and left neighboring area to produce a set of
reference motion vectors for each reference frame. These reference
motion vectors are ranked according to the likelihood that they will
be picked.

Change-Id: I82e6cd990a7716848bb7b6f5f2b1829966ff2483
2015-11-24 15:52:55 -08:00
Debargha Mukherjee
6ef5d8c4ed Merge "Reduce transform options for ext-tx experiment" into nextgenv2 2015-11-24 21:30:10 +00:00
Zoe Liu
9c62f9282f Merge "Added 3 more reference frames for inter prediction." into nextgenv2 2015-11-24 19:47:03 +00:00
Debargha Mukherjee
13e0cfb8c7 Fix ext-tx experiment for highbitdepth
Change-Id: I610e18f150d73378283882ae81f5f77c367d2956
2015-11-24 10:38:37 -08:00
Yaowu Xu
ea78294030 Merge branch 'master' into nextgenv2 2015-11-24 05:00:05 -08:00
Debargha Mukherjee
56ab215dad Reduce transform options for ext-tx experiment
Reduces the transform options for INTRA as well as INTER when
transform size is 16x16 to not use any of the DSTs.
Thus, a total of 10 options are used for 16x16, while 4x4
and 8x8 still use 17 options.

derflr/hevchd actually improves a little, while hevcmr drops
a little.

About 10% speed improvement.

Change-Id: I920a182231e052cdd622f8bb67085c16c572cb1e
2015-11-23 12:58:48 -08:00
Jingning Han
c335bfeb56 Move n8_w and n8_h out of experiment flag
These primitive variables are commonly required by many other
experiments as well. The use of n4_w and n4_h was originally
introduced in the vp9 decoder implementation.

Change-Id: I93d701d891e3860f31150031e3b9a2b29a3993d2
2015-11-23 09:46:11 -08:00
Yaowu Xu
c1629ca53b Merge branch 'master' into nextgenv2 2015-11-21 05:00:05 -08:00
Zoe Liu
3ec1601e37 Added 3 more reference frames for inter prediction.
Under the experiment of EXT_REFS: LAST2_FRAME, LAST3_FRAME, and
LAST4_FRAME.

Coding efficiency: derflr +1.601%; hevchr +1.895%
Speed: Encoder slowed down by ~75%

Change-Id: Ifeee5f049c2c1f7cb29bc897622ef88897082ecf
2015-11-20 17:00:24 -08:00
Angie Chiang
6e9ed38d1f Merge "Add vp10_inv_txfm2d" into nextgenv2 2015-11-20 18:22:49 +00:00
Yaowu Xu
c00abfa3ee Merge branch 'master' into nextgenv2 2015-11-20 05:00:05 -08:00
Jingning Han
0c8110efe1 Merge "Add ref-mv experiment flag" into nextgenv2 2015-11-19 23:01:54 +00:00
Hui Su
7dfed5cf21 Merge "Turn off tx type selection for intra blocks by default" into nextgenv2 2015-11-19 19:50:57 +00:00
Jingning Han
fe8ecc843b Add ref-mv experiment flag
Change-Id: Ie2101e362aeb01681313adf67596bc6b479e873e
2015-11-19 11:40:56 -08:00
Yaowu Xu
8c95116d96 Merge branch 'master' into nextgenv2 2015-11-19 05:00:06 -08:00
hui su
d894d34d04 Turn off tx type selection for intra blocks by default
Coding gain on derflr drops to +1.83%.

Change-Id: If68c429f09422a70513d9f1e8e36e10c928e034a
2015-11-18 23:16:25 -08:00
Yaowu Xu
7eeb7671d5 Merge branch 'master' into nextgenv2 2015-11-18 05:00:05 -08:00
Angie Chiang
4fd0ba8f6f Add vp10_inv_txfm2d
Change-Id: Ib63062a52c688e65bae5eb0052ce69d73d96c9c5
2015-11-17 19:53:28 -08:00
Hui Su
4d3cf45992 Merge "Merge MISC_FIXES" into nextgenv2 2015-11-18 01:08:21 +00:00
hui su
66f2f65ef7 Merge MISC_FIXES
Remove MISC_FIXES flags except for the changes to MV precision, which
cause a 0.1% performance drop.

On derflr, the impact is -0.012%.

Change-Id: I0a74e5a212dd0cb827192a318c92a714c9681e45
2015-11-17 15:06:08 -08:00
hui su
af084fbec1 Fix some unused variable warnings
Change-Id: Ia7680ddf00dd50dd66bbb5753bae30b937988800
2015-11-17 10:40:25 -08:00
Jingning Han
5f9e089b1d Merge "Limit the reset range of inter_tx_size array" into nextgenv2 2015-11-16 16:58:08 +00:00
Jingning Han
4ae193eec7 Merge "Alternate reference frame" into nextgenv2 2015-11-16 16:04:14 +00:00
Jingning Han
0f34e35d26 Limit the reset range of inter_tx_size array
Reset the effective range of inter_tx_size, instead of the entire
array in the rate-distortion optimization loop.

Change-Id: Id453fbd6dddfe69f4e451ba8518c083326d5dd53
2015-11-15 20:56:04 -08:00
Yaowu Xu
7c5fe4cbff Merge branch 'master' into nextgenv2 2015-11-14 05:00:06 -08:00
Debargha Mukherjee
e9b18242eb Merge "Adding experiment for supertransform" into nextgenv2 2015-11-13 23:50:21 +00:00
Hui Su
83388fb0af Merge "refactor ext-intra" into nextgenv2 2015-11-13 21:19:27 +00:00
hui su
4aa50c17df refactor ext-intra
Coding gain remains about the same, while overall speed is
substantially increased.

Change-Id: I2989bebcfd21092cd6a02653d4df4a3bf6780874
2015-11-13 12:12:09 -08:00
Debargha Mukherjee
5bbacd9bda Adding experiment for supertransform
Change-Id: Ie43027f7d46c43df137fd4a7f731ff6ccb78fcee
2015-11-13 11:32:17 -08:00
Yaowu Xu
b0ab6a3bbd Merge branch 'master' into nextgenv2 2015-11-13 05:00:05 -08:00
Zoe Liu
49c34da379 Merge "Fixed a few sanity checks." into nextgenv2 2015-11-13 04:55:54 +00:00
Angie Chiang
35ec6d2b88 Merge changes Ifafbd497,I042bba27,Id6fd8558,Id5b79519 into nextgenv2
* changes:
  Add adst_dct config to vp10_inv_txfm2d_cfg
  Add adst_adst config to vp10_inv_txfm2d_cfg
  Add dct_adst config to vp10_inv_txfm2d_cfg
  Add dct_dct config to vp10_inv_txfm2d_cfg
2015-11-12 23:38:44 +00:00
Jingning Han
140182b96c Alternate reference frame
This commit re-designs the alternate reference frame generation
process. It employs a non-local means approach to produce more stable
pixel estimation for the alternate reference frame. It improves
compression performance:
derf   0.5%
hevcmr 0.8%
stdhd  1.3%
hevchr 1.0%

The encoding time at speed 0 is not affected.

Change-Id: Iaa757f0da189ce93812d69617a81bf630d449848
2015-11-12 11:16:59 -08:00
Yaowu Xu
80b30f46a5 Merge branch 'master' into nextgenv2 2015-11-12 05:00:05 -08:00
Angie Chiang
7104079efb Add adst_dct config to vp10_inv_txfm2d_cfg
Change-Id: Ifafbd4974be44685ab2550ed159dbf0411b6f031
2015-11-11 18:02:42 -08:00
Angie Chiang
164ba2a2d8 Add adst_adst config to vp10_inv_txfm2d_cfg
Change-Id: I042bba27540ab2a3d8a00871980295e98f616480
2015-11-11 17:59:22 -08:00
Angie Chiang
db88473ea9 Add dct_adst config to vp10_inv_txfm2d_cfg
Change-Id: Id6fd8558452f64c4ac30d7cb656b659f0587b5d6
2015-11-11 17:55:35 -08:00
Angie Chiang
09c2809a50 Add dct_dct config to vp10_inv_txfm2d_cfg
Change-Id: Id5b795198552443a700413284a1015296e267dcf
2015-11-11 17:51:55 -08:00
Zoe Liu
2096296421 Fixed a few sanity checks.
Change-Id: Ieec4a7be5945dc6de192e2d8292ab978baf47f53
2015-11-11 10:38:17 -08:00
Yaowu Xu
edaf8c4596 Merge branch 'master' into nextgenv2 2015-11-11 05:00:06 -08:00
Yaowu Xu
0e929ef94d Merge "Replace inline with INLINE" into nextgenv2 2015-11-11 01:31:14 +00:00
Yaowu Xu
843e2bad4b Merge "Fix msvc compiling" into nextgenv2 2015-11-11 01:31:06 +00:00
Jingning Han
73e75e9dbc Merge "Fix an encoding failure case when speed features are on" into nextgenv2 2015-11-11 01:25:55 +00:00
Angie Chiang
0694844fae Merge "Add vp10_fwd_txfm2d_test" into nextgenv2 2015-11-11 00:28:35 +00:00
Yaowu Xu
a08bfb778a Replace inline with INLINE
Change-Id: I37b5ed9fef0e97feabd856bd4c1b4c7869991a34
2015-11-10 16:09:09 -08:00
Yaowu Xu
72a6cb62ee Fix msvc compiling
Change-Id: I5abd6d2fd198b3789732e81b23a5bac009af5290
2015-11-10 16:08:09 -08:00
Jingning Han
35b3bd3e3b Fix an encoding failure case when speed features are on
This commit fixes an encoding failure case triggered when early
termination feature is turned on for transform block size search.
It resolves the corresponding enc/dec mismatch issue.

Change-Id: I2c5b7d8b1efe25fe3810e6ed307f4b1865dede49
2015-11-10 16:04:00 -08:00
Yaowu Xu
dcbe42298a Merge "Get test to build with MSVC" into nextgenv2 2015-11-10 23:08:51 +00:00
Angie Chiang
af38f6fca4 Add vp10_fwd_txfm2d_test
Change-Id: Icbc17403430751d3a841f822a190f0c30450d603
2015-11-09 15:18:15 -08:00
Yaowu Xu
4bc259db3d Get test to build with MSVC
Added _USE_MATH_DEFINES so that M_PI resolves to the definition in
math.h when building with MSVC

Change-Id: Idca128910384593a002eb08bae72c739fb998e19
2015-11-09 12:07:25 -08:00
Yaowu Xu
b49ac0b160 Merge branch 'master' into nextgenv2
Change-Id: I8811bfd8fc132b9f515707e795bb6308e4bf263b
2015-11-09 09:52:18 -08:00
Debargha Mukherjee
bc54f9dc00 Merge "Resolve conflicts caused by master branch merging" into nextgenv2 2015-11-06 23:35:07 +00:00
Angie Chiang
c7c69d88af Merge changes I7ca0cc34,I97189d6e,I4e2b51cf,I21158867,I8d73beee into nextgenv2
* changes:
  Add adst_dct config to vp10_fwd_txfm2d_cfg
  Add adst_adst config to vp10_fwd_txfm2d_cfg
  Add dct_adst config to vp10_fwd_txfm2d_cfg
  Add dct_dct config to vp10_fwd_txfm2d_cfg
  Add vp10_fwd_txfm2d_8x8/16x16/32x32
2015-11-06 23:34:56 +00:00
Angie Chiang
e26c712ab2 Merge "Add vp10_fwd_txfm2d_4x4" into nextgenv2 2015-11-06 23:34:35 +00:00
hui su
707cd03658 Resolve conflicts caused by master branch merging
Change-Id: I167e241b789331572581fcb0567ebe535b4b9345
2015-11-06 14:35:08 -08:00
Angie Chiang
45222e5b20 Add adst_dct config to vp10_fwd_txfm2d_cfg
Change-Id: I7ca0cc341ae36ac9f7aa24789f8872161b832b7b
2015-11-06 10:47:46 -08:00
Angie Chiang
786f1af891 Add adst_adst config to vp10_fwd_txfm2d_cfg
Change-Id: I97189d6e917929c756a3f89fe0ab66077a0a5436
2015-11-06 10:47:46 -08:00
Angie Chiang
634d0bdc7c Add dct_adst config to vp10_fwd_txfm2d_cfg
Change-Id: I4e2b51cf5b0dedb9ea1106747edb76835804fffc
2015-11-06 10:47:46 -08:00
Angie Chiang
51c0c35c6a Add dct_dct config to vp10_fwd_txfm2d_cfg
Change-Id: I21158867fb2b762d3632d0664ebe70c68d0953e1
2015-11-06 10:47:46 -08:00
Angie Chiang
f08141c734 Add vp10_fwd_txfm2d_8x8/16x16/32x32
Change-Id: I8d73beee5a619d26f3f8640a6679150d874522c4
2015-11-06 10:47:45 -08:00
Angie Chiang
ff7fe99342 Add vp10_fwd_txfm2d_4x4
Change-Id: I9bca3b1c76b64575366d71ab65ffef7264ce0c9b
2015-11-06 10:39:27 -08:00
Debargha Mukherjee
85514c40ae New interpolation experiment
Adds a new interpolation experiment.

Improves entropy coding to send the filter type only if
the motion vectors have subpel components.
Adds one new 8-tap smooth filter, and tweaks the others.
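
The conditional signalling can be sketched as below. This is a hedged Python illustration: it assumes motion vectors are stored in 1/8-pel units and that the filter is sent when either component is fractional; `write_interp_filter` and the symbol list are hypothetical stand-ins for the bitstream writer, and the value the decoder infers when nothing is sent is not stated in the commit.

```python
SUBPEL_MASK = 7  # assumption: motion vectors in 1/8-pel units, low 3 bits are the fraction


def has_subpel(mv_row, mv_col):
    """True if either motion vector component has a fractional-pel part."""
    return (mv_row & SUBPEL_MASK) != 0 or (mv_col & SUBPEL_MASK) != 0


def write_interp_filter(symbols, mv_row, mv_col, filter_type):
    """Append the filter type to the symbol stream only when interpolation is needed."""
    if has_subpel(mv_row, mv_col):
        symbols.append(filter_type)
```

For an integer-pel motion vector no interpolation takes place, so the filter choice does not affect the prediction and the symbol can be skipped.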

derflr: +0.695%
hevcmr: +0.305%

About 5% encode slowdown. No visible impact for decoding.

Also makes the interpolation framework flexible to support both
strictly interpolating filters as well as non-interpolating
filters that filter integer offsets. This is mainly for
further experimentation and if not found useful the code will
be removed.

Change-Id: I8db9cde56ca916be771fe54a130d608bf10786e6
2015-11-06 09:51:34 -08:00
Hui Su
9b3ad185dc Merge "ext-intra experiment" into nextgenv2 2015-11-06 17:40:49 +00:00
Yaowu Xu
fe4160d8e3 Merge branch 'master' into nextgenv2 2015-11-06 05:00:06 -08:00
Debargha Mukherjee
70e514ce78 Merge "Flip the result of the inverse transform for FLIPADST." into nextgenv2 2015-11-06 09:20:46 +00:00
Debargha Mukherjee
46d2cc5714 Merge "Eliminate copying for FLIPADST in fwd transforms." into nextgenv2 2015-11-06 08:37:25 +00:00
Angie Chiang
444acd771b Add vp10_inv_txfm1d_test
Change-Id: I3b76c0146af7f191cdae31d2b53ab6d51ac791a4
2015-11-04 14:23:56 -08:00
Angie Chiang
b0df5e0f9e Add iadst32
Change-Id: I3a53ee51146d0bd4b0fe4b27c286e8c921f9823b
2015-11-04 14:23:56 -08:00
Angie Chiang
35486a6b88 Add iadst16
Change-Id: I093881aacaf9a070f78cc4eea2e8a6ede8a71792
2015-11-04 14:23:56 -08:00
Angie Chiang
0ca0cc240b Add iadst8
Change-Id: Ia58e4735d7d7bfd2ac55259c32705118c6745c6d
2015-11-04 14:23:56 -08:00
Angie Chiang
ba69089e65 Add iadst4
Change-Id: Ie419b2b1e939a41c30ed609e1ba46f5f6609b2a5
2015-11-04 14:23:56 -08:00
Angie Chiang
7467833401 Add idct32
Change-Id: I75412bdc4bd0d9c90e8b56e02e0e467a2d9957f9
2015-11-04 14:23:56 -08:00
Angie Chiang
d3cee565ad Add idct16
Change-Id: I8e5ba3a3f9b64ccbf038e371525e897774729b06
2015-11-04 14:23:56 -08:00
Angie Chiang
bd9db2f55b Add idct8
Change-Id: I8092a6f229b196c5c8b7dcd2dff8aaf68253e422
2015-11-04 14:23:56 -08:00
Angie Chiang
7d2b7b6944 Add idct4
Change-Id: I1d1b6822452772cec95160491c7bc6d3bba1f5c2
2015-11-04 14:23:56 -08:00
Angie Chiang
b934148fb6 Add vp10_fwd_txfm1d_test
Change-Id: If3bef2be355227cfc2932e4471b84c21c7cd2b90
2015-11-04 14:23:56 -08:00
Angie Chiang
a9253a2029 Add fadst32
Change-Id: I77299f0e39fc7cef91e7e420513dbd05194f320a
2015-11-04 14:23:56 -08:00
Angie Chiang
a7d26f4e80 Add fadst16
Change-Id: I5175e39b5df73646488f74b2a9e4a463ae79d91a
2015-11-04 14:23:56 -08:00
Debargha Mukherjee
12fac1c281 Merge "Fix transform tables in C implementations." into nextgenv2 2015-11-04 21:11:38 +00:00
Angie Chiang
3813c2bc46 Merge "Add fadst8" into nextgenv2 2015-11-04 20:21:08 +00:00
Angie Chiang
498866b699 Merge "Add fadst4" into nextgenv2 2015-11-04 20:20:57 +00:00
Jingning Han
de00c163c7 Merge "Simplify txfm rate-distortion optimization" into nextgenv2 2015-11-04 19:31:03 +00:00
Jingning Han
493d02347c Simplify txfm rate-distortion optimization
This commit refactors the rate-distortion optimization scheme for
transform block coding. When both ext-tx and var-tx experiments
are turned on, the encoding time for bus_cif at 1000 kbps goes down
from 706377 ms to 666503 ms (5.6% speed-up). The coding statistics
remain unchanged.
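
The quoted figure is the relative reduction in encoding time. A small Python check (the helper name is illustrative):

```python
def percent_reduction(before_ms, after_ms):
    """Relative reduction in run time, as a percentage of the original time."""
    return 100.0 * (before_ms - after_ms) / before_ms


# Timings quoted in the commit message for bus_cif at 1000 kbps.
speedup = percent_reduction(706377, 666503)
```

Here `speedup` is about 5.64, which rounds to the 5.6% reported.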

Change-Id: I20835db573725580aad79c16220f799ce01f2093
2015-11-04 10:25:48 -08:00
Geza Lore
4f5108090a Flip the result of the inverse transform for FLIPADST.
When using FLIPADST, the vp10_inv_txfm_add functions used to flip
the destination array, add the result of the inverse transform to it,
and then flip the destination back. This has been replaced by
flipping the result of the inverse transform before adding it to the
destination. Up-Down flipping is done by negating the destination
stride, and starting from the bottom, so it should now be free.
Left-right flipping is done with the usual SSE2 instructions in the
optimized code.
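
The negated-stride trick for the up-down case can be sketched as follows. This is a Python illustration over a flat pixel list; the function names, the 8-bit clamp, and the buffer layout are assumptions rather than the actual vp10_inv_txfm_add code.

```python
def add_residual(dest, dest_offset, dest_stride, residual, size):
    """Add a size x size residual block into a flat pixel buffer, clamping to 8 bits."""
    for row in range(size):
        base = dest_offset + row * dest_stride
        for col in range(size):
            value = dest[base + col] + residual[row * size + col]
            dest[base + col] = max(0, min(255, value))


def add_residual_flipped_ud(dest, dest_offset, dest_stride, residual, size):
    """Up-down flip: start at the block's bottom row and walk with a negated stride."""
    bottom = dest_offset + (size - 1) * dest_stride
    add_residual(dest, bottom, -dest_stride, residual, size)
```

The inner loop is unchanged, so the flip costs nothing beyond computing the starting offset. A left-right flip cannot be expressed this way and needs the columns reordered.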

The C functions match the SSE2 functions as expected, so the C functions
now do the flipping as well when required. Adding this cleanly required
some refactoring of the C functions, but there is no measurable
performance impact when ext-tx is not enabled.

Encode speedup with ext-tx enabled is about 3%.

Change-Id: I5b04e5d720f0b9f0d54fd8607a8764f2314c7234
2015-11-04 17:11:44 +00:00
Yaowu Xu
4aafd01861 Merge branch 'master' into nextgenv2 2015-11-04 05:00:05 -08:00
hui su
be3559ba07 ext-intra experiment
Currently there are two parts in this experiment: extra directional intra
prediction modes and the filter intra modes migrated from the nextgen branch.

Several macros are defined in "blockd.h" to provide controls of the experiment
settings. Setting "DR_ONLY" as 1 (default is 0) means we only use directional
modes, and skip the filter-intra modes; "EXT_INTRA_ANGLES" (default is 128)
defines the number of different angles we want to support; setting
"ANGLE_FAST_SEARCH" as 1 (default is 1) means we use fast sub-optimal search
for the best prediction angle, instead of exhaustive search. The fast search
is about 6 times faster than the exhaustive search, while preserving about
60% of the coding gains.

With extra directional prediction modes (fast search), we observe the following
coding gains (number in parentheses is for all-key-frame setting):
derflr +0.42%  (+1.79%)
hevclr +0.78%  (+2.19%)
hevcmr +1.20%  (+3.49%)
stdhd  +0.56%
Speed-wise, about 110% slower for key frames, and 30% slower overall.

The gains of filter intra modes mostly add up with the gains of directional
modes. The overall coding gain of this experiment:
derflr +0.94%
hevclr +1.46%
hevcmr +1.94%
stdhd  +1.58%

Change-Id: Ida9ad00cdb33aff422d06eb42b4f4e5f25df8a2a
2015-11-03 18:46:02 -08:00
Jingning Han
4101154d5b Merge "Re-work rate-distortion optimization scheme for transform coding" into nextgenv2 2015-11-03 22:47:21 +00:00
hui su
a3a1b2d052 Speed up per-commit test for nextgenv2 branch
The Jenkins per-commit tests need to be expedited as more experiments
are added into the nextgenv2 branch. This patch does the following:

thread test: change the length of test clip from 5 frames to 3 frames;
only test speed 1.
ArfFreq test: marked as "large".

The tests marked as "large" will be moved from the per-commit test
to the nightly test.

Change-Id: I62b373c52b481dcd281e741ebf5098408a97ff4d
2015-11-03 12:27:19 -08:00
Geza Lore
01bb4a318d Eliminate copying for FLIPADST in fwd transforms.
This patch eliminates the copying of data when using FLIPADST forward
transforms, by incorporating the necessary data flipping into the
load_buffer_* functions of the SSE2 optimized forward transforms. The
load_buffer_* functions are normally inlined, so the overhead of copying
the data is removed and the overhead of flipping is minimized. Left to
right flipping is still not free, as the columns need to be shuffled in
registers.

To preserve identity between the C and SSE2 implementations, the
appropriate C implementations now also do the data flipping as part of
the transform, rather than relying on the caller for flipping the input.

Overall speedup is about 1.5-2% in encode on my tests. Note that these
are only the forward transforms. Inverse transforms to come in a later
patch.

There are also a few code hygiene changes:
- Fixed some indents of switch statements.
- DCT_DCT transforms now always use vp10_fht* functions, which dispatch
  to vpx_fdct* for DCT_DCT (some of them used to call vpx_fdct*
  directly, some of them used to call vp10_fht*).

Change-Id: I93439257dc5cd104ac6129cfed45af142fb64574
2015-11-03 17:10:55 +00:00
Geza Lore
2b39bcec29 Fix transform tables in C implementations.
These tables were out of sync with the indexing enum since the
refactoring in commit 4f16f119 (change 303389), due to the removal
of the ext_tx_to_txtype lookup table. This patch just puts them
back in order.

Change-Id: Ieb7d57654f61b99b511d54c9ba09abbd5e8d0d14
2015-11-03 17:10:51 +00:00
Jingning Han
696ee004a5 Re-work rate-distortion optimization scheme for transform coding
This commit re-works the rate-distortion optimization scheme for
transform coding. It improves the overall compression performance.
For derf set, the ext-tx experiment provides 2.27% coding gains,
and the new scheme that integrates multiple transform type selection
and recursive transform block partitioning provides a total of 3.24%
coding gains.

Change-Id: Ia1887c4c44b73dfb915d091d96660a99f09d5cc3
2015-11-03 09:03:53 -08:00
Jingning Han
6d43a53a0c Merge "Incorporate flexible tx type and tx partition in RD scheme" into nextgenv2 2015-11-03 16:43:48 +00:00
Yaowu Xu
2c32861814 Merge branch 'master' into nextgenv2 2015-11-03 05:00:04 -08:00
Jingning Han
4b594d3d00 Incorporate flexible tx type and tx partition in RD scheme
This commit hooks up the rate-distortion optimization system to
fully exploit recursive transform block partition and multiple
transform type. The compression performance of the two experiments
largely adds up. For derf set, ext-tx provides additional 2.1%
coding gains on top of the gains due to recursive transform block
partition (0.69%).

Change-Id: I1091fb9545f74e489a6a2489dc3c12f5abd05043
2015-11-02 17:40:05 -08:00
Jingning Han
506e3b136c Merge "Fix block size computation in coeff token packing" into nextgenv2 2015-11-03 00:28:15 +00:00
Jingning Han
4b0ef55f10 Fix block size computation in coeff token packing
Correctly compute the block size in bit-stream coefficient token
packing. This fixes an enc/dec mismatch at very high bit-rates.

Change-Id: I37bf084731dc660df0c695cad406ddcd0f9eb904
2015-11-02 14:55:55 -08:00
Debargha Mukherjee
51c357083c Merge "Adding placeholders for new expts to be added" into nextgenv2 2015-11-02 20:25:36 +00:00
Jingning Han
dfd054649f Merge "Make loop filter support recursive transform block partitioning" into nextgenv2 2015-11-02 19:06:48 +00:00
Debargha Mukherjee
73e06f33b5 Adding placeholders for new expts to be added
Change-Id: I38952cd55b91f35e5db45bc8e6a20ef25069c464
--ext-refs:   extended references - for multi-ref in nextgen
--ext-inter:  extended inter - for new_inter/copy_mode in nextgen
--ext-interp: for new interpolation
2015-11-02 09:58:58 -08:00
Jingning Han
88b3b23619 Merge "Refactor loop filter mask" into nextgenv2 2015-10-31 18:21:33 +00:00
Jingning Han
365fa8d12d Merge "Fix a switch condition in select_tx_block" into nextgenv2 2015-10-31 18:20:54 +00:00
Yaowu Xu
893468dd04 Merge branch 'master' into nextgenv2 2015-10-31 05:00:04 -07:00
Jingning Han
94266f4f34 Make loop filter support recursive transform block partitioning
This commit allows the loop filter to account for the recursive
transform block partition when selecting the filter and mask.

Change-Id: I62b6c2dcc0497cbe1f264b03c46163f55d2c9752
2015-10-30 15:42:25 -07:00
Jingning Han
6727943ceb Refactor loop filter mask
This commit refactors the loop filter selection process to support
variable transform block sizes based filter mask. It disables the
multi-thread loop filter implementation to simplify the experiments.
The speed impact on speed 0 encoding is negligible.

Change-Id: Ia470b6da9ad833fe6eb72d2cbeda9296b21910ec
2015-10-30 15:25:16 -07:00
Jingning Han
47c7fd984e Fix a switch condition in select_tx_block
Change-Id: I3d90a0286c5ef559b91ad298db97e8990becf85f
2015-10-30 13:01:52 -07:00
Angie Chiang
27a739b650 Add fadst8
Change-Id: Ida4f292b824a29b4ffac8cad6e4042867f427979
2015-10-30 12:18:54 -07:00
Angie Chiang
1ef285bfe9 Add fadst4
Change-Id: I320a1cf82d1705e5ec7fe1870327f70ce8493489
2015-10-30 12:18:08 -07:00
Angie Chiang
0a6cab1dbc Add fdct32
Change-Id: Ifc709b62313cca0101638ed85ddb5c82e5f84fac
2015-10-30 12:07:40 -07:00
Angie Chiang
67e5b45dad Add fdct16
Change-Id: Id155b76d6694ba3fe5184ed2c20d57db4951ebf8
2015-10-30 12:05:24 -07:00
Angie Chiang
3eb719f6b0 Add fdct8
Change-Id: I66b884bc831324b5295c7874aa23f62042157834
2015-10-30 11:24:06 -07:00
Angie Chiang
ba78f80b01 Merge "Add fdct4" into nextgenv2 2015-10-30 17:41:21 +00:00
Jingning Han
b86b76bb4a Merge "Support per transform block skip coding" into nextgenv2 2015-10-30 16:58:03 +00:00
Jingning Han
bfeac5e19c Support per transform block skip coding
Allow the encoder to drop individual transform block coding.

Change-Id: I2c2b2985254cb92baf891f03daa33f067279373b
2015-10-30 08:55:17 -07:00
Yaowu Xu
cca1b39586 Merge branch 'master' into nextgenv2 2015-10-30 05:00:05 -07:00
Jingning Han
366bf3c2b6 Merge "Reset txfm context condition for skip coded blocks" into nextgenv2 2015-10-30 02:15:34 +00:00
Jingning Han
981f09a1f1 Reset txfm context condition for skip coded blocks
If a block has all coefficients quantized to zero, the codec will
assume that it uses the largest transform block size.

Change-Id: Icd4e8e7cdc4b6af6974f87169e50b040ebfe9020
2015-10-29 18:02:37 -07:00
Jingning Han
88b9e90a56 Turn off fixed tx size in frame header
Temporarily turn off the fixed transform size at frame level.

Change-Id: I94a6a3b18893909d33fb7fa91e73ee3568b537b2
2015-10-29 14:30:56 -07:00
Angie Chiang
2e0aa9fb28 Add fdct4
Change-Id: Ic4539cb6c2d421ddffa44f58d3ce21bd797b57c6
2015-10-29 14:05:25 -07:00
Jingning Han
3edad6e887 Enable entropy coding of recursive transform block partition
This commit enables the entropy coding of the recursive transform
block partition syntax.

Change-Id: I0c2509fb7b9822d12a721f9ebf9327fac83c777e
2015-10-29 11:06:46 -07:00
Debargha Mukherjee
bdaa257674 Merge "Refactoring tx-types to add more flexibility" into nextgenv2 2015-10-29 16:51:14 +00:00
Yaowu Xu
fff670456d Merge branch 'master' into nextgenv2 2015-10-29 05:00:05 -07:00
Debargha Mukherjee
8a4292441f Refactoring tx-types to add more flexibility
Allows inter and intra tx_types to have different sets of
transforms for different tx_size/sb_type combinations.

Change-Id: Ic0ac1daef7a9fb15c4210271e4d04cd36e5cec8e
2015-10-28 23:31:32 -07:00
Jingning Han
71c156070c Use precise distortion metric
Rework the rate distortion optimization pipeline. Use precise
distortion metric that accounts for the forward and inverse
transform rounding effect.

Change-Id: Ibe19ce9791ec3547739294cc3012dd9e11f4ea49
2015-10-28 11:47:14 -07:00
Jingning Han
4bfed0b32e Account for variable txfm sizes in coeff token packing
This commit makes the coefficient token packetization process account
for variable transform block sizes supported in a single processing
block. It fixes an enc/dec mismatch issue when var-tx, ext-tx, and
misc-fixes experiments are all turned on.

Change-Id: I2e8946e6f72de567603a568debbadad11196430c
2015-10-28 11:45:31 -07:00
Yaowu Xu
eb7b5f660d Merge branch 'master' into nextgenv2
Change-Id: I63dc39d1ec9ad2e2454da6f5956dcd4367b87190
2015-10-28 08:14:16 -07:00
Jingning Han
f847a16a7c Add tx_type counts in key frame
Properly update the transform type counts in key frame coding at
decoder. It fixes an enc/dec mismatch issue when both ext-tx and
misc-fixes are turned on.

Change-Id: I1e40a77c8d8157d5ff254b072ce474d8dfbaa3ae
2015-10-27 16:51:45 -07:00
Debargha Mukherjee
0526305151 Merge "Accumulate EXT_TX counts for multithread" into nextgenv2 2015-10-27 18:48:25 +00:00
Peter de Rivaz
325b96dcac Accumulate EXT_TX counts for multithread
EXT_TX introduces some new symbols to be decoded.
The encoder counts how many times these are used.
In multithreaded mode, the counts from the worker threads
need to be accumulated into the main thread.

This change means that VP10/VPxEncoderThreadTest now works
with more choices of cpu-used and number of passes.

Change-Id: Ibe7e6a3c58145265f4ead155ff98fb4cb37c3513
2015-10-27 09:41:07 -07:00
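The accumulation step described in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the VP10 code: `TxCounts`, `NUM_TX_TYPES`, and `accumulate_tx_counts` are hypothetical names, and the real counter layout is not shown in this log.

```c
#include <stddef.h>

/* Hypothetical number of transform-type symbols; the real value is not
 * given in this log. */
#define NUM_TX_TYPES 4

/* Hypothetical per-thread symbol counters. */
typedef struct {
  unsigned int tx_type[NUM_TX_TYPES];
} TxCounts;

/* Add every worker's counters into the main thread's counters.
 * Assumed to run after all workers have finished, so no locking is used. */
static void accumulate_tx_counts(TxCounts *main_counts,
                                 const TxCounts *worker_counts,
                                 size_t num_workers) {
  for (size_t w = 0; w < num_workers; ++w) {
    for (size_t i = 0; i < NUM_TX_TYPES; ++i) {
      main_counts->tx_type[i] += worker_counts[w].tx_type[i];
    }
  }
}
```

If the counts stayed in the worker structures only, the main thread would see totals that undercount symbol usage, which is presumably why the threading test was sensitive to this.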
Yaowu Xu
b6da40ad82 Merge branch 'master' into nextgenv2
Change-Id: I0e4030a37354bb23b3aa8be5cc1473770b9e7b06
2015-10-27 08:28:09 -07:00
Jingning Han
236623cf2c Fix early termination flag in recursive transform block search
Properly reset the early termination flag in the recursive transform
block partitioning rate-distortion optimization scheme.

Change-Id: Ibfe918f21f11dcb1ec267c09f954c635305cc95a
2015-10-26 19:41:14 -07:00
Jingning Han
f0dee7765a Fix lossless coding
Use inter_block_yrd as rate-distortion optimization for lossless
coding. This fixes transform coefficient buffer swap use case and
resolves the unit test failure related to lossless coding.

Change-Id: I1512dab5ed5760c31f7de21a06e8d9ed1eb081fa
2015-10-26 22:03:39 +00:00
Debargha Mukherjee
c655b5f5f5 Test fix for VP10
Disables threading test for speeds > 0 and all modes other than
2-pass temporarily.

Change-Id: I098ef2b16f575c039a7f6a21244dd87eee6960ce
2015-10-26 22:03:05 +00:00
Jingning Han
232098a774 Merge "Make transform block partition scheme support use largest txfm setting" into nextgenv2 2015-10-26 19:37:57 +00:00
Jingning Han
01ba752a0d Make transform block partition scheme support use largest txfm setting
This commit properly resets the recursive transform block partition
array in the settings of using largest transform block size at frame
header level. It fixes one of the unit test failures related to the use
of frame level fixed transform block size with 440 color format.

Change-Id: I6750f323e2c2510c080ffc3af82ce2041f4f60b8
2015-10-26 11:13:36 -07:00
Yaowu Xu
c6d31ecce0 Merge branch 'master' into nextgenv2 2015-10-26 08:57:30 -07:00
Jingning Han
3ff3313502 Silence compiler warnings when high bit-depth is turned on
Clear the compiler warnings when both ext-tx and high bit-depth
are turned on.

Change-Id: I2e02f1f29043f2952fe215f8183b5bfd80e16f58
2015-10-23 14:51:16 -07:00
Jingning Han
79fe7246c1 Properly handle non-420 color format in recursive transform scheme
This commit makes the recursive transform block partitioning properly
handle the non-420 color format. It resolves an enc/dec mismatch
issue in that setting when var-tx experiment is turned on.

Change-Id: I48a91de02c11b3153f897d1cca0ae948eec15605
2015-10-23 14:42:01 -07:00
Debargha Mukherjee
f1c4b79d72 Build fix for ext-tx
Change-Id: Ifab43f85f6ae1be6b9f95521f79ba49055353b5f
2015-10-23 21:38:50 +00:00
Jingning Han
48c7de0fce Fix enc/dec mismatch in var-tx experiment
This commit fixes an enc/dec mismatch issue in recursive transform
partitioning experiment due to merge conflict.

Change-Id: I66146ef806c008902c91d54f4f8c7ccf47996b78
2015-10-23 13:58:38 -07:00
Yaowu Xu
ed8bddaac8 Merge "Merge branch 'masterbase' into nextgenv2" into nextgenv2 2015-10-23 01:32:19 +00:00
Yaowu Xu
37d17b6518 Merge branch 'masterbase' into nextgenv2
Conflicts:
	configure

Change-Id: I7f331981e19338451d16030f0ac1179db2e08c4d
2015-10-22 18:31:26 -07:00
Yaowu Xu
df33be4b22 Merge "Fix merge defects" into nextgenv2 2015-10-23 01:28:32 +00:00
Yaowu Xu
5a27b3bb85 Fix merge defects
This commit fixes the merge conflicts between master and nextgenv2 and
disables early termination in choose_tx_size() to avoid test failures.

The test failures are pre-existing; some of the issues were already fixed in
masterbase, so another merge will follow to bring in those fixes.

Change-Id: Ib71889661955e73aedbb4db49d8be70425281dcb
2015-10-22 18:25:41 -07:00
Jingning Han
18aea429af Merge "Reset tx_type in recursive transform block partitioning" into nextgenv2 2015-10-22 21:34:32 +00:00
Jingning Han
d3e5545fa6 Reset tx_type in recursive transform block partitioning
Temporarily reset the transform type in the inter modes when
recursive transform block partitioning is used. This resolves an
enc/dec mismatch issue in nextgenv2 codebase when both var-tx and
ext-tx experiments are turned on.

Change-Id: I2543f0a567243da95b237752d46964b07b669ad9
2015-10-22 14:28:56 -07:00
Yaowu Xu
4ac2ae3a4d Merge branch 'masterbase' into nextgenv2
Conflicts:
	configure
	test/vp9_encoder_parms_get_to_decoder.cc
	vp10/common/blockd.h
	vp10/common/entropymode.c
	vp10/common/entropymode.h
	vp10/common/idct.c
	vp10/decoder/decodeframe.c
	vp10/decoder/decodemv.c
	vp10/encoder/bitstream.c
	vp10/encoder/encodeframe.c
	vp10/encoder/encodemb.c
	vp10/encoder/encoder.c
	vp10/encoder/encoder.h
	vp10/encoder/rd.c
	vp10/encoder/rdopt.c
	vp10/encoder/tokenize.c
	vp10/encoder/tokenize.h
	vp9/decoder/vp9_decodeframe.c
	vp9/decoder/vp9_decoder.h
	vp9/encoder/vp9_aq_cyclicrefresh.c
	vp9/encoder/vp9_encoder.h
	vp9/vp9_cx_iface.c
	vpx/vp8cx.h
	vpx_dsp/x86/vpx_subpixel_8t_intrin_ssse3.c
	vpx_scale/yv12config.h

Change-Id: I604a329d38badec7a11e8ede16ca1404476e9b93
2015-10-22 11:40:44 -07:00
Jingning Han
20484048a9 Fix compiler error with high bit-depth and var-tx
Clear the compiler errors when both high bit-depth and recursive
transform block partition experiments are enabled.

Change-Id: If0b6396851f10c28b4f26350322ccd1ba2fc9aff
2015-10-21 17:38:00 -07:00
Jingning Han
6a9ed8d2b6 Fix forward transform bit range limits
Change-Id: I13c0ecff8c58a0571d9de4bc5fbbebe72533ccdb
2015-10-15 09:09:44 -07:00
Jingning Han
8198be113d Fix a compiler error under ext-tx experiment flag
Change-Id: Ib3df8c10b9df5627358ae3315b05b81fdca60535
2015-10-14 19:46:35 -07:00
Jingning Han
1e48f74d9a Enable early termination in the recursive transform block search
It makes the encoder 5% faster for CIF clips and 12% faster for
1080p clips.

Change-Id: I073408dbb4d50675a79db8794fe73975ac957b91
2015-10-13 21:14:58 +00:00
Jingning Han
3a27961cf3 Refactor recursive transform block scheme
This commit re-designs the recursive transform block partition
rate-distortion optimization framework. It allows the encoder to
improve speed by 10%.

Change-Id: I6dd3a7dd428a530d8012e5c6ddc40e650c8b392b
2015-10-13 21:13:29 +00:00
Jingning Han
2cdc12742d Rate-distortion optimization for recursive transform block coding
This commit enables the rate-distortion optimization for recursive
transform block coding scheme.

Change-Id: Id6a8336ca847bb3af1e94cbfb51db1f4da12d38f
2015-10-13 12:49:03 -07:00
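As a rough illustration of what a rate-distortion decision over a recursive transform block partition can look like, the sketch below compares the cost of coding a block whole against the cost of splitting it into four quadrants. It is a generic quadtree search under stated assumptions, not the encoder's actual routine: `block_cost_fn`, `best_partition_cost`, `toy_cost`, and the flat `split_flag_cost` are all hypothetical.

```c
/* Hypothetical callback returning the combined rate-distortion cost of
 * coding the size x size block at (row, col) with a single transform. */
typedef long (*block_cost_fn)(int row, int col, int size);

/* Return the lower of "code the block whole" and "split into four
 * quadrants and recurse". split_flag_cost is an assumed fixed cost of
 * signalling a split. */
static long best_partition_cost(int row, int col, int size, int min_size,
                                long split_flag_cost, block_cost_fn cost) {
  long no_split = cost(row, col, size);
  if (size <= min_size) return no_split; /* cannot split further */

  int half = size / 2;
  long split = split_flag_cost;
  for (int i = 0; i < 4; ++i) {
    /* i >> 1 selects the quadrant row, i & 1 the quadrant column. */
    split += best_partition_cost(row + (i >> 1) * half, col + (i & 1) * half,
                                 half, min_size, split_flag_cost, cost);
  }
  return split < no_split ? split : no_split;
}

/* Toy cost model for demonstration: an 8x8 block costs 100, anything
 * smaller costs 20. */
static long toy_cost(int row, int col, int size) {
  (void)row;
  (void)col;
  return size == 8 ? 100 : 20;
}
```

With the toy model, splitting an 8x8 block into four 4x4 blocks costs 1 + 4 * 20 = 81, which beats the unsplit cost of 100, so the search chooses the split.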
Jingning Han
a8dad55c82 Make chroma component RD estimate support transform partition
This commit makes the rate-distortion optimization for chroma
component support the recursive transform block coding scheme.

Change-Id: I1bfed6d05b0ebb3905cb625222401e2ccbae10f3
2015-10-08 18:04:03 -07:00
Jingning Han
704985e65a Add decoder support to recursive transform block partition
This commit allows the decoder to recursively parse and rebuild
the pixel blocks.

Change-Id: I510f3a30ae7cdad5b70725c66882b00a0594e96f
2015-10-08 16:16:41 -07:00
Jingning Han
52bb9dd45c Make tokenization process support recursive transform block coding
This commit makes the transform, quantization, tokenization and
their corresponding inverse operations support recursive transform
block coding process.

Change-Id: I71f2ef3a7c2d3db7cfc63c1fd3f1337e8e0360b5
2015-10-08 08:46:02 -07:00
Jingning Han
cffcfdb809 Add support to recursive transform block coding
This commit re-designs the bitstream syntax to support recursive
transform block partition. The initial stage targets the inter
prediction residuals.

Change-Id: I556ab3c68c198387a2fd2d02e2b475e83cd417c3
2015-10-07 19:33:13 -07:00
Jingning Han
056c741ac7 Merge "Use explicit block position in foreach_transformed_block" into nextgenv2 2015-10-08 01:39:37 +00:00
Jingning Han
ebc48efe37 Use explicit block position in foreach_transformed_block
Add the row and column index to the argument list of unit functions
called by foreach_transformed_block wrapper. This avoids the
repeated internal parsing according to the block index.

Change-Id: I42b3578eac258ebaba7a7c74f684de9abab521a6
2015-10-07 16:32:19 -07:00
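The idea in the commit above, handing the row and column to the per-block callback instead of having each callback re-derive them from a block index, can be sketched like this. The names (`visit_fn`, `foreach_tx_block`, `VisitRecord`, `record_visit`) and the grid layout are hypothetical; the actual `foreach_transformed_block` signature is not reproduced in this log.

```c
/* Hypothetical callback type: receives the explicit position of each
 * transform block within the enclosing block. */
typedef void (*visit_fn)(int blk_row, int blk_col, void *arg);

/* Walk a rows x cols grid in steps of `step` and pass the position
 * straight to the callback, so the callback never decodes an index. */
static void foreach_tx_block(int rows, int cols, int step, visit_fn visit,
                             void *arg) {
  for (int r = 0; r < rows; r += step) {
    for (int c = 0; c < cols; c += step) {
      visit(r, c, arg);
    }
  }
}

/* Example callback state: number of visits and the last position seen. */
typedef struct {
  int count;
  int last_row;
  int last_col;
} VisitRecord;

static void record_visit(int blk_row, int blk_col, void *arg) {
  VisitRecord *rec = (VisitRecord *)arg;
  rec->count++;
  rec->last_row = blk_row;
  rec->last_col = blk_col;
}
```

Computing the position once in the iterator keeps the index-to-position arithmetic in a single place rather than repeating it in every unit function.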
hui su
4b447e7b05 Add ext_intra experiment
Experiment for extended/extra intra prediction.

Change-Id: Icfeaeb62bafd69474302d2de36d42c6a077a46c2
2015-10-07 11:26:44 -07:00
Jingning Han
00ca5c1c98 Simplify vp10_xform_quant index parsing
Change-Id: Id7f7a9b2e53fc0074b55d58143f296afad6b844e
2015-10-06 17:19:23 -07:00
Hui Su
cd7c7a9d3b Merge "Refactor ext-tx experiment" into nextgenv2 2015-10-05 19:31:51 +00:00
Hui Su
6a9f0db997 Merge "Extend ext_tx experiment to intra blocks" into nextgenv2 2015-10-05 19:31:33 +00:00
Zoe Liu
8806955dbd Added is_compound_ref() to identify compound prediction
Change-Id: I7e3bf9f181e0cfbebf7afe93dabb03384b595b79
2015-10-02 13:47:15 -07:00
hui su
4f16f11993 Refactor ext-tx experiment
Remove unnecessary transform type lookups and unused codes.

Change-Id: Ib52d26690468996b1501b419d919643c8ea5ecaa
2015-10-02 11:00:51 -07:00
hui su
3fa0129caf Extend ext_tx experiment to intra blocks
ext-tx on derflr +2.30% (was +1.84%)

Change-Id: Ic91565cacc38e7a8e1200d054ed7bf99295fe19e
2015-10-02 09:39:38 -07:00
hui su
f53153db42 Fix a bug induced in 0c2e393c3274da40e228a157a483c03380492092
ext-tx on derflr: +1.841% (was +1.756)

Change-Id: Ic8c59a6fa3c77b0d2a2c493fe8cb758d91b0d886
2015-10-01 15:59:21 -07:00
Hui Su
2858110294 Merge "Add identity transform to ext-tx experiment" into nextgenv2 2015-10-01 18:17:48 +00:00
hui su
2afe7320c8 Add identity transform to ext-tx experiment
ext-tx on derflr: +1.756% (was +1.648)

Change-Id: I8a87970fa589e8f5f96db7aa68ec9b6c98e20188
2015-09-30 18:47:46 -07:00
Yaowu Xu
0f29a024af Move vp10-specific functions to vp10 from vpx_dsp
They are used by VP10 only, are now moved to vp10 and made static.

Change-Id: I4a4d4f1ceae1f7143240629bb94f8daf2733879d
2015-09-30 17:19:08 -07:00
Hui Su
4b7043f804 Merge "ext-tx experiment support in choose_largest_tx_size" into nextgenv2 2015-09-30 15:14:17 +00:00
Debargha Mukherjee
3e8cceb3fc Speed up of DST and the search in ext_tx
Adds an early termination to the ext_tx search, and also
implements the DST transforms more efficiently.

About 4 times faster with the ext-tx experiment.

There is a 0.09% drop in performance on derflr from 1.735% to
1.648%, but worth it with the speedup achieved.

Change-Id: I2ede9d69c557f25e0a76cd5d701cc0e36e825c7c
2015-09-29 19:11:43 -07:00
Yaowu Xu
7c514e2dfd Merged branch 'master' into nextgenv2
Resolved Conflicts in the following files:
        configure
        vp10/common/idct.c
        vp10/encoder/dct.c
        vp10/encoder/encodemb.c
        vp10/encoder/rdopt.c

Change-Id: I4cb3986b0b80de65c722ca29d53a0a57f5a94316
2015-09-29 16:17:32 -07:00
hui su
6c81e37916 ext-tx experiment support in choose_largest_tx_size
Change-Id: Ic161b8b257a02c1c43e515d830c1051d0de074de
2015-09-29 12:09:15 -07:00
hui su
07154b0216 Refactor ext-tx experiment
At this point, ext-tx compared to baseline is +1.735%.

Change-Id: Ia16ac293e2cc87e06a0d898c1d52a8f3495ff814
2015-09-23 09:14:49 -07:00
hui su
8e273b23ad Adjust rd calculation in choose_tx_size_from_rd
Consider the case in which skipping transform coefficients is more
efficient.

derflr +0.13%
hevclr +0.11%
hevcmr +0.14%
hevchr +0.22%

with ext-tx, the impact is -0.02%.

Change-Id: I0aa2965cf9e152396623c2fee62545bd3a3a7f07
2015-09-23 09:13:55 -07:00
hui su
38debe512e Simplify choose_tx_size_from_rd
No impact on performance.

Change-Id: Ib0420b190c9d83ef47b14ea78d5918a6a5078e3a
2015-09-23 09:08:47 -07:00
Debargha Mukherjee
b2ec0a0d1c Merge "Changes to ext-tx probs" into nextgenv2 2015-09-18 00:41:57 +00:00
Debargha Mukherjee
09ff5f2792 Merge remote-tracking branch 'origin/master' into nextgenv2
Periodic merge to get master changes into nextgenv2.

Change-Id: I6f0e4b470f193da03f1a8cb8e6a93ae39395699a
2015-09-17 16:33:18 -07:00
Debargha Mukherjee
c4b4db4b12 Changes to ext-tx probs
Slight improvement in performance.
derflr: +1.828%

Change-Id: I74f5d3743a2b9c27e8b97c266c702dd1a791f73e
2015-09-17 11:04:10 -07:00
Debargha Mukherjee
31341374d7 Inter UV blocks use the same transform type as Y
Extend the ext_tx experiment to make the UV inter blocks use
the same transform type as the extended transform type used
for Y.

derflr: +1.792% (about +0.06)

Change-Id: I4a77e1f7764b2e8b523e28f42ba13559dde4f0ca
2015-09-16 09:55:12 -07:00
Debargha Mukherjee
b8bc026c72 Misc. ext_tx fixes/enhancements
derflr: +1.732% (8-bit)

Change-Id: I9c04c8249646ff96eacacfa1dcb0bd118c04e84a
2015-09-15 10:00:54 -07:00
Debargha Mukherjee
4ce81d666e Comprehensive support for symmetric DST
Creates new hybrid transforms combining symmetric DST with
ADST and DCT. Thus a total of 16 transforms are supported.

derfl: +1.659% (up about 0.2%)

Change-Id: Idde1cecdb59527890bf05da740099c3f6a5b9764
2015-09-10 11:13:59 -07:00
Debargha Mukherjee
ab3042ba3b Some refactoring of EXT_TX
Change-Id: I61359787fdacdeb245e2798031a6e06e4afb83e0
2015-09-09 17:13:22 -07:00
Debargha Mukherjee
9fc691efbe Backport EXT_TX experiment from nextgen
Does not include DST1 yet.

derflr: +1.437 (8-bit internal), +7.243 (12-bit internal)
with --enable-ext-tx

Change-Id: I91f1759fd2de794755eb6384cda52e80e979cb7d
2015-09-09 09:42:51 -07:00
hui su
b3cc3a07b0 Enable ADST for UV channel
derflr +0.202%
hevclf +0.207%
hevcmr +0.095%
hevchr +0.077%

Tested locally on several derf sequences, speed (encoder + decoder)
is slower by less than 1%.

It is part of the EXT_TX experiment, which is to be continued to
explore different transform variants.

Change-Id: I05d44994a62106538a9a241ed8d89bd7c5d14761
2015-08-26 13:25:30 -07:00
Debargha Mukherjee
26addefc34 Merge "Merge remote-tracking branch 'origin/master' into nextgenv2" into nextgenv2 2015-08-26 19:19:24 +00:00
Debargha Mukherjee
a396e2017f Merge remote-tracking branch 'origin/master' into nextgenv2
Merges changes from master to nextgenv2

Change-Id: Ia86490127d01ffde3e376dac2760d84e6b09a2e7
2015-08-25 10:27:02 -07:00
1525 changed files with 222592 additions and 287079 deletions

91
.clang-format Normal file

@ -0,0 +1,91 @@
---
Language: Cpp
# BasedOnStyle: Google
# Generated with clang-format 3.8.1
AccessModifierOffset: -1
AlignAfterOpenBracket: Align
AlignConsecutiveAssignments: false
AlignConsecutiveDeclarations: false
AlignEscapedNewlinesLeft: true
AlignOperands: true
AlignTrailingComments: true
AllowAllParametersOfDeclarationOnNextLine: true
AllowShortBlocksOnASingleLine: false
AllowShortCaseLabelsOnASingleLine: true
AllowShortFunctionsOnASingleLine: All
AllowShortIfStatementsOnASingleLine: true
AllowShortLoopsOnASingleLine: true
AlwaysBreakAfterDefinitionReturnType: None
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: true
AlwaysBreakTemplateDeclarations: true
BinPackArguments: true
BinPackParameters: true
BraceWrapping:
  AfterClass: false
  AfterControlStatement: false
  AfterEnum: false
  AfterFunction: false
  AfterNamespace: false
  AfterObjCDeclaration: false
  AfterStruct: false
  AfterUnion: false
  BeforeCatch: false
  BeforeElse: false
  IndentBraces: false
BreakBeforeBinaryOperators: None
BreakBeforeBraces: Attach
BreakBeforeTernaryOperators: true
BreakConstructorInitializersBeforeComma: false
ColumnLimit: 80
CommentPragmas: '^ IWYU pragma:'
ConstructorInitializerAllOnOneLineOrOnePerLine: false
ConstructorInitializerIndentWidth: 4
ContinuationIndentWidth: 4
Cpp11BracedListStyle: false
DerivePointerAlignment: false
DisableFormat: false
ExperimentalAutoDetectBinPacking: false
ForEachMacros: [ foreach, Q_FOREACH, BOOST_FOREACH ]
IncludeCategories:
  - Regex: '^<.*\.h>'
    Priority: 1
  - Regex: '^<.*'
    Priority: 2
  - Regex: '.*'
    Priority: 3
IndentCaseLabels: true
IndentWidth: 2
IndentWrappedFunctionNames: false
KeepEmptyLinesAtTheStartOfBlocks: false
MacroBlockBegin: ''
MacroBlockEnd: ''
MaxEmptyLinesToKeep: 1
NamespaceIndentation: None
ObjCBlockIndentWidth: 2
ObjCSpaceAfterProperty: false
ObjCSpaceBeforeProtocolList: false
PenaltyBreakBeforeFirstCallParameter: 1
PenaltyBreakComment: 300
PenaltyBreakFirstLessLess: 120
PenaltyBreakString: 1000
PenaltyExcessCharacter: 1000000
PenaltyReturnTypeOnItsOwnLine: 200
PointerAlignment: Right
ReflowComments: true
SortIncludes: false
SpaceAfterCStyleCast: false
SpaceBeforeAssignmentOperators: true
SpaceBeforeParens: ControlStatements
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 2
SpacesInAngles: false
SpacesInContainerLiterals: true
SpacesInCStyleCastParentheses: false
SpacesInParentheses: false
SpacesInSquareBrackets: false
Standard: Auto
TabWidth: 8
UseTab: Never
...

41
.gitignore vendored

@ -29,37 +29,36 @@
/examples/decode_with_drops
/examples/decode_with_partial_drops
/examples/example_xma
/examples/lossless_encoder
/examples/postproc
/examples/resize_util
/examples/set_maps
/examples/simple_decoder
/examples/simple_encoder
/examples/twopass_encoder
/examples/vp8_multi_resolution_encoder
/examples/vp8cx_set_ref
/examples/vp9_lossless_encoder
/examples/vp9_spatial_scalable_encoder
/examples/vpx_temporal_scalable_patterns
/examples/vpx_temporal_svc_encoder
/examples/aom_cx_set_ref
/examples/av1_spatial_scalable_encoder
/examples/aom_temporal_scalable_patterns
/examples/aom_temporal_svc_encoder
/ivfdec
/ivfdec.dox
/ivfenc
/ivfenc.dox
/libvpx.so*
/libvpx.ver
/libaom.so*
/libaom.ver
/samples.dox
/test_intra_pred_speed
/test_libvpx
/vp8_api1_migration.dox
/vp[89x]_rtcd.h
/vpx.pc
/vpx_config.c
/vpx_config.h
/vpx_dsp_rtcd.h
/vpx_scale_rtcd.h
/vpx_version.h
/vpxdec
/vpxdec.dox
/vpxenc
/vpxenc.dox
/test_libaom
/aom_api1_migration.dox
/av1_rtcd.h
/aom.pc
/aom_config.c
/aom_config.h
/aom_dsp_rtcd.h
/aom_scale_rtcd.h
/aom_version.h
/aomdec
/aomdec.dox
/aomenc
/aomenc.dox
TAGS

16
AUTHORS

@ -56,13 +56,16 @@ James Zern <jzern@google.com>
Jan Gerber <j@mailb.org>
Jan Kratochvil <jan.kratochvil@redhat.com>
Janne Salonen <jsalonen@google.com>
Jean-Marc Valin <jmvalin@jmvalin.ca>
Jeff Faust <jfaust@google.com>
Jeff Muizelaar <jmuizelaar@mozilla.com>
Jeff Petkau <jpet@chromium.org>
Jia Jia <jia.jia@linaro.org>
Jian Zhou <zhoujian@google.com>
Jim Bankoski <jimbankoski@google.com>
Jingning Han <jingning@google.com>
Joey Parrish <joeyparrish@google.com>
Johann Koenig <johannkoenig@chromium.org>
Johann Koenig <johannkoenig@google.com>
John Koleszar <jkoleszar@google.com>
Johnny Klonaris <google@jawknee.com>
@ -89,6 +92,7 @@ Mike Hommey <mhommey@mozilla.com>
Mikhal Shemer <mikhal@google.com>
Minghai Shang <minghai@google.com>
Morton Jonuschat <yabawock@gmail.com>
Nathan E. Egge <negge@dgql.org>
Nico Weber <thakis@chromium.org>
Parag Salasakar <img.mips1@gmail.com>
Pascal Massimino <pascal.massimino@gmail.com>
@ -97,6 +101,7 @@ Paul Wilkins <paulwilkins@google.com>
Pavol Rusnak <stick@gk2.sk>
Paweł Hajdan <phajdan@google.com>
Pengchong Jin <pengchong@google.com>
Peter de Rivaz <peter.derivaz@argondesign.com>
Peter de Rivaz <peter.derivaz@gmail.com>
Philip Jägenstedt <philipj@opera.com>
Priit Laes <plaes@plaes.org>
@ -107,13 +112,16 @@ Rob Bradford <rob@linux.intel.com>
Ronald S. Bultje <rsbultje@gmail.com>
Rui Ueyama <ruiu@google.com>
Sami Pietilä <samipietila@google.com>
Sasi Inguva <isasi@google.com>
Scott Graham <scottmg@chromium.org>
Scott LaVarnway <slavarnway@google.com>
Sean McGovern <gseanmcg@gmail.com>
Sergey Kolomenkin <kolomenkin@gmail.com>
Sergey Ulanov <sergeyu@chromium.org>
Shimon Doodkin <helpmepro1@gmail.com>
Shunyao Li <shunyaoli@google.com>
Stefan Holmer <holmer@google.com>
Steinar Midtskogen <stemidts@cisco.com>
Suman Sunkara <sunkaras@google.com>
Taekhyun Kim <takim@nvidia.com>
Takanori MATSUURA <t.matsuu@gmail.com>
@ -121,14 +129,16 @@ Tamar Levy <tamar.levy@intel.com>
Tao Bai <michaelbai@chromium.org>
Tero Rintaluoma <teror@google.com>
Thijs Vermeir <thijsvermeir@gmail.com>
Thomas Daede <tdaede@mozilla.com>
Thomas Davies <thdavies@cisco.com>
Thomas <thdavies@cisco.com>
Tim Kopp <tkopp@google.com>
Timothy B. Terriberry <tterribe@xiph.org>
Tom Finegan <tomfinegan@google.com>
Tristan Matthews <le.businessman@gmail.com>
Tristan Matthews <tmatth@videolan.org>
Vignesh Venkatasubramanian <vigneshv@google.com>
Yaowu Xu <yaowu@google.com>
Yongzhe Wang <yongzhe@google.com>
Yunqing Wang <yunqingwang@google.com>
Zoe Liu <zoeliu@google.com>
Google Inc.
The Mozilla Foundation
The Xiph.Org Foundation


@ -1,7 +1,9 @@
Next Release
- Incompatible changes:
The VP9 encoder's default keyframe interval changed to 128 from 9999.
The AV1 encoder's default keyframe interval changed to 128 from 9999.
2016-04-07 v0.1.0 "AOMedia Codec 1"
This release is the first Alliance for Open Media codec.
2015-11-09 v1.5.0 "Javan Whistling Duck"
This release improves upon the VP9 encoder and speeds up the encoding and
decoding processes.

270
CMakeLists.txt Normal file

@ -0,0 +1,270 @@
##
## Copyright (c) 2016, Alliance for Open Media. All rights reserved
##
## This source code is subject to the terms of the BSD 2 Clause License and
## the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
## was not distributed with this source code in the LICENSE file, you can
## obtain it at www.aomedia.org/license/software. If the Alliance for Open
## Media Patent License 1.0 was not distributed with this source code in the
## PATENTS file, you can obtain it at www.aomedia.org/license/patent.
##
cmake_minimum_required(VERSION 3.2)
project(AOM C CXX)
set(AOM_ROOT "${CMAKE_CURRENT_SOURCE_DIR}")
set(AOM_CONFIG_DIR "${CMAKE_CURRENT_BINARY_DIR}")
include("${AOM_ROOT}/build/cmake/aom_configure.cmake")
set(AOM_SRCS
"${AOM_CONFIG_DIR}/aom_config.c"
"${AOM_CONFIG_DIR}/aom_config.h"
"${AOM_ROOT}/aom/aom.h"
"${AOM_ROOT}/aom/aom_codec.h"
"${AOM_ROOT}/aom/aom_decoder.h"
"${AOM_ROOT}/aom/aom_encoder.h"
"${AOM_ROOT}/aom/aom_frame_buffer.h"
"${AOM_ROOT}/aom/aom_image.h"
"${AOM_ROOT}/aom/aom_integer.h"
"${AOM_ROOT}/aom/aomcx.h"
"${AOM_ROOT}/aom/aomdx.h"
"${AOM_ROOT}/aom/internal/aom_codec_internal.h"
"${AOM_ROOT}/aom/src/aom_codec.c"
"${AOM_ROOT}/aom/src/aom_decoder.c"
"${AOM_ROOT}/aom/src/aom_encoder.c"
"${AOM_ROOT}/aom/src/aom_image.c")
set(AOM_DSP_SRCS
"${AOM_ROOT}/aom_dsp/aom_convolve.c"
"${AOM_ROOT}/aom_dsp/aom_convolve.h"
"${AOM_ROOT}/aom_dsp/aom_dsp_common.h"
"${AOM_ROOT}/aom_dsp/aom_dsp_rtcd.c"
"${AOM_ROOT}/aom_dsp/aom_filter.h"
"${AOM_ROOT}/aom_dsp/aom_simd.c"
"${AOM_ROOT}/aom_dsp/aom_simd.h"
"${AOM_ROOT}/aom_dsp/aom_simd_inline.h"
"${AOM_ROOT}/aom_dsp/avg.c"
"${AOM_ROOT}/aom_dsp/bitreader.h"
"${AOM_ROOT}/aom_dsp/bitreader_buffer.c"
"${AOM_ROOT}/aom_dsp/bitreader_buffer.h"
"${AOM_ROOT}/aom_dsp/bitwriter.h"
"${AOM_ROOT}/aom_dsp/bitwriter_buffer.c"
"${AOM_ROOT}/aom_dsp/bitwriter_buffer.h"
"${AOM_ROOT}/aom_dsp/blend.h"
"${AOM_ROOT}/aom_dsp/blend_a64_hmask.c"
"${AOM_ROOT}/aom_dsp/blend_a64_mask.c"
"${AOM_ROOT}/aom_dsp/blend_a64_vmask.c"
"${AOM_ROOT}/aom_dsp/dkboolreader.c"
"${AOM_ROOT}/aom_dsp/dkboolreader.h"
"${AOM_ROOT}/aom_dsp/dkboolwriter.c"
"${AOM_ROOT}/aom_dsp/dkboolwriter.h"
"${AOM_ROOT}/aom_dsp/fwd_txfm.c"
"${AOM_ROOT}/aom_dsp/fwd_txfm.h"
"${AOM_ROOT}/aom_dsp/intrapred.c"
"${AOM_ROOT}/aom_dsp/inv_txfm.c"
"${AOM_ROOT}/aom_dsp/inv_txfm.h"
"${AOM_ROOT}/aom_dsp/loopfilter.c"
"${AOM_ROOT}/aom_dsp/prob.c"
"${AOM_ROOT}/aom_dsp/prob.h"
"${AOM_ROOT}/aom_dsp/psnr.c"
"${AOM_ROOT}/aom_dsp/psnr.h"
"${AOM_ROOT}/aom_dsp/quantize.c"
"${AOM_ROOT}/aom_dsp/quantize.h"
"${AOM_ROOT}/aom_dsp/sad.c"
"${AOM_ROOT}/aom_dsp/simd/v128_intrinsics.h"
"${AOM_ROOT}/aom_dsp/simd/v128_intrinsics_c.h"
"${AOM_ROOT}/aom_dsp/simd/v256_intrinsics.h"
"${AOM_ROOT}/aom_dsp/simd/v256_intrinsics_c.h"
"${AOM_ROOT}/aom_dsp/simd/v64_intrinsics.h"
"${AOM_ROOT}/aom_dsp/simd/v64_intrinsics_c.h"
"${AOM_ROOT}/aom_dsp/subtract.c"
"${AOM_ROOT}/aom_dsp/txfm_common.h"
"${AOM_ROOT}/aom_dsp/variance.c"
"${AOM_ROOT}/aom_dsp/variance.h")
set(AOM_MEM_SRCS
"${AOM_ROOT}/aom_mem/aom_mem.c"
"${AOM_ROOT}/aom_mem/aom_mem.h"
"${AOM_ROOT}/aom_mem/include/aom_mem_intrnl.h")
set(AOM_SCALE_SRCS
"${AOM_ROOT}/aom_scale/aom_scale.h"
"${AOM_ROOT}/aom_scale/aom_scale_rtcd.c"
"${AOM_ROOT}/aom_scale/generic/aom_scale.c"
"${AOM_ROOT}/aom_scale/generic/gen_scalers.c"
"${AOM_ROOT}/aom_scale/generic/yv12config.c"
"${AOM_ROOT}/aom_scale/generic/yv12extend.c"
"${AOM_ROOT}/aom_scale/yv12config.h")
# TODO(tomfinegan): Extract aom_ports from aom_util if possible.
set(AOM_UTIL_SRCS
"${AOM_ROOT}/aom_ports/aom_once.h"
"${AOM_ROOT}/aom_ports/aom_timer.h"
"${AOM_ROOT}/aom_ports/bitops.h"
"${AOM_ROOT}/aom_ports/emmintrin_compat.h"
"${AOM_ROOT}/aom_ports/mem.h"
"${AOM_ROOT}/aom_ports/mem_ops.h"
"${AOM_ROOT}/aom_ports/mem_ops_aligned.h"
"${AOM_ROOT}/aom_ports/msvc.h"
"${AOM_ROOT}/aom_ports/system_state.h"
"${AOM_ROOT}/aom_util/aom_thread.c"
"${AOM_ROOT}/aom_util/aom_thread.h"
"${AOM_ROOT}/aom_util/endian_inl.h")
set(AOM_AV1_COMMON_SRCS
"${AOM_ROOT}/av1/av1_iface_common.h"
"${AOM_ROOT}/av1/common/alloccommon.c"
"${AOM_ROOT}/av1/common/alloccommon.h"
"${AOM_ROOT}/av1/common/av1_fwd_txfm.c"
"${AOM_ROOT}/av1/common/av1_fwd_txfm.h"
"${AOM_ROOT}/av1/common/av1_inv_txfm.c"
"${AOM_ROOT}/av1/common/av1_inv_txfm.h"
"${AOM_ROOT}/av1/common/av1_rtcd.c"
"${AOM_ROOT}/av1/common/blockd.c"
"${AOM_ROOT}/av1/common/blockd.h"
"${AOM_ROOT}/av1/common/common.h"
"${AOM_ROOT}/av1/common/common_data.h"
"${AOM_ROOT}/av1/common/convolve.c"
"${AOM_ROOT}/av1/common/convolve.h"
"${AOM_ROOT}/av1/common/debugmodes.c"
"${AOM_ROOT}/av1/common/entropy.c"
"${AOM_ROOT}/av1/common/entropy.h"
"${AOM_ROOT}/av1/common/entropymode.c"
"${AOM_ROOT}/av1/common/entropymode.h"
"${AOM_ROOT}/av1/common/entropymv.c"
"${AOM_ROOT}/av1/common/entropymv.h"
"${AOM_ROOT}/av1/common/enums.h"
"${AOM_ROOT}/av1/common/filter.c"
"${AOM_ROOT}/av1/common/filter.h"
"${AOM_ROOT}/av1/common/frame_buffers.c"
"${AOM_ROOT}/av1/common/frame_buffers.h"
"${AOM_ROOT}/av1/common/idct.c"
"${AOM_ROOT}/av1/common/idct.h"
"${AOM_ROOT}/av1/common/loopfilter.c"
"${AOM_ROOT}/av1/common/loopfilter.h"
"${AOM_ROOT}/av1/common/mv.h"
"${AOM_ROOT}/av1/common/mvref_common.c"
"${AOM_ROOT}/av1/common/mvref_common.h"
"${AOM_ROOT}/av1/common/odintrin.c"
"${AOM_ROOT}/av1/common/odintrin.h"
"${AOM_ROOT}/av1/common/onyxc_int.h"
"${AOM_ROOT}/av1/common/pred_common.c"
"${AOM_ROOT}/av1/common/pred_common.h"
"${AOM_ROOT}/av1/common/quant_common.c"
"${AOM_ROOT}/av1/common/quant_common.h"
"${AOM_ROOT}/av1/common/reconinter.c"
"${AOM_ROOT}/av1/common/reconinter.h"
"${AOM_ROOT}/av1/common/reconintra.c"
"${AOM_ROOT}/av1/common/reconintra.h"
"${AOM_ROOT}/av1/common/scale.c"
"${AOM_ROOT}/av1/common/scale.h"
"${AOM_ROOT}/av1/common/scan.c"
"${AOM_ROOT}/av1/common/scan.h"
"${AOM_ROOT}/av1/common/seg_common.c"
"${AOM_ROOT}/av1/common/seg_common.h"
"${AOM_ROOT}/av1/common/thread_common.c"
"${AOM_ROOT}/av1/common/thread_common.h"
"${AOM_ROOT}/av1/common/tile_common.c"
"${AOM_ROOT}/av1/common/tile_common.h")
set(AOM_AV1_DECODER_SRCS
"${AOM_ROOT}/av1/av1_dx_iface.c"
"${AOM_ROOT}/av1/decoder/decodeframe.c"
"${AOM_ROOT}/av1/decoder/decodeframe.h"
"${AOM_ROOT}/av1/decoder/decodemv.c"
"${AOM_ROOT}/av1/decoder/decodemv.h"
"${AOM_ROOT}/av1/decoder/decoder.c"
"${AOM_ROOT}/av1/decoder/decoder.h"
"${AOM_ROOT}/av1/decoder/detokenize.c"
"${AOM_ROOT}/av1/decoder/detokenize.h"
"${AOM_ROOT}/av1/decoder/dsubexp.c"
"${AOM_ROOT}/av1/decoder/dsubexp.h"
"${AOM_ROOT}/av1/decoder/dthread.c"
"${AOM_ROOT}/av1/decoder/dthread.h")
set(AOM_AV1_ENCODER_SRCS
"${AOM_ROOT}/av1/av1_cx_iface.c"
"${AOM_ROOT}/av1/encoder/aq_complexity.c"
"${AOM_ROOT}/av1/encoder/aq_complexity.h"
"${AOM_ROOT}/av1/encoder/aq_cyclicrefresh.c"
"${AOM_ROOT}/av1/encoder/aq_cyclicrefresh.h"
"${AOM_ROOT}/av1/encoder/aq_variance.c"
"${AOM_ROOT}/av1/encoder/aq_variance.h"
"${AOM_ROOT}/av1/encoder/bitstream.c"
"${AOM_ROOT}/av1/encoder/bitstream.h"
"${AOM_ROOT}/av1/encoder/block.h"
"${AOM_ROOT}/av1/encoder/context_tree.c"
"${AOM_ROOT}/av1/encoder/context_tree.h"
"${AOM_ROOT}/av1/encoder/cost.c"
"${AOM_ROOT}/av1/encoder/cost.h"
"${AOM_ROOT}/av1/encoder/dct.c"
"${AOM_ROOT}/av1/encoder/encodeframe.c"
"${AOM_ROOT}/av1/encoder/encodeframe.h"
"${AOM_ROOT}/av1/encoder/encodemb.c"
"${AOM_ROOT}/av1/encoder/encodemb.h"
"${AOM_ROOT}/av1/encoder/encodemv.c"
"${AOM_ROOT}/av1/encoder/encodemv.h"
"${AOM_ROOT}/av1/encoder/encoder.c"
"${AOM_ROOT}/av1/encoder/encoder.h"
"${AOM_ROOT}/av1/encoder/ethread.c"
"${AOM_ROOT}/av1/encoder/ethread.h"
"${AOM_ROOT}/av1/encoder/extend.c"
"${AOM_ROOT}/av1/encoder/extend.h"
"${AOM_ROOT}/av1/encoder/firstpass.c"
"${AOM_ROOT}/av1/encoder/firstpass.h"
"${AOM_ROOT}/av1/encoder/hybrid_fwd_txfm.c"
"${AOM_ROOT}/av1/encoder/hybrid_fwd_txfm.h"
"${AOM_ROOT}/av1/encoder/lookahead.c"
"${AOM_ROOT}/av1/encoder/lookahead.h"
"${AOM_ROOT}/av1/encoder/mbgraph.c"
"${AOM_ROOT}/av1/encoder/mbgraph.h"
"${AOM_ROOT}/av1/encoder/mcomp.c"
"${AOM_ROOT}/av1/encoder/mcomp.h"
"${AOM_ROOT}/av1/encoder/picklpf.c"
"${AOM_ROOT}/av1/encoder/picklpf.h"
"${AOM_ROOT}/av1/encoder/quantize.c"
"${AOM_ROOT}/av1/encoder/quantize.h"
"${AOM_ROOT}/av1/encoder/ratectrl.c"
"${AOM_ROOT}/av1/encoder/ratectrl.h"
"${AOM_ROOT}/av1/encoder/rd.c"
"${AOM_ROOT}/av1/encoder/rd.h"
"${AOM_ROOT}/av1/encoder/rdopt.c"
"${AOM_ROOT}/av1/encoder/rdopt.h"
"${AOM_ROOT}/av1/encoder/resize.c"
"${AOM_ROOT}/av1/encoder/resize.h"
"${AOM_ROOT}/av1/encoder/segmentation.c"
"${AOM_ROOT}/av1/encoder/segmentation.h"
"${AOM_ROOT}/av1/encoder/speed_features.c"
"${AOM_ROOT}/av1/encoder/speed_features.h"
"${AOM_ROOT}/av1/encoder/subexp.c"
"${AOM_ROOT}/av1/encoder/subexp.h"
"${AOM_ROOT}/av1/encoder/temporal_filter.c"
"${AOM_ROOT}/av1/encoder/temporal_filter.h"
"${AOM_ROOT}/av1/encoder/tokenize.c"
"${AOM_ROOT}/av1/encoder/tokenize.h"
"${AOM_ROOT}/av1/encoder/treewriter.c"
"${AOM_ROOT}/av1/encoder/treewriter.h")
# Targets
add_library(aom_dsp ${AOM_DSP_SRCS})
include_directories(${AOM_ROOT} ${AOM_CONFIG_DIR})
add_library(aom_mem ${AOM_MEM_SRCS})
add_library(aom_scale ${AOM_SCALE_SRCS})
include_directories(${AOM_ROOT} ${AOM_CONFIG_DIR})
add_library(aom_util ${AOM_UTIL_SRCS})
add_library(aom_av1_decoder ${AOM_AV1_DECODER_SRCS})
add_library(aom_av1_encoder ${AOM_AV1_ENCODER_SRCS})
add_library(aom ${AOM_SRCS})
target_link_libraries(aom LINK_PUBLIC
aom_dsp
aom_mem
aom_scale
aom_util
aom_av1_decoder
aom_av1_encoder)
add_executable(simple_decoder examples/simple_decoder.c)
include_directories(${AOM_ROOT})
target_link_libraries(simple_decoder LINK_PUBLIC aom)
add_executable(simple_encoder examples/simple_encoder.c)
include_directories(${AOM_ROOT})
target_link_libraries(simple_encoder LINK_PUBLIC aom)

34
LICENSE

@ -1,31 +1,27 @@
Copyright (c) 2010, The WebM Project authors. All rights reserved.
Copyright (c) 2016, Alliance for Open Media. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Google, nor the WebM Project, nor the names
of its contributors may be used to endorse or promote products
derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

127
PATENTS

@ -1,23 +1,108 @@
Additional IP Rights Grant (Patents)
------------------------------------
Alliance for Open Media Patent License 1.0
"These implementations" means the copyrightable works that implement the WebM
codecs distributed by Google as part of the WebM Project.
1. License Terms.
1.1. Patent License. Subject to the terms and conditions of this License, each
Licensor, on behalf of itself and successors in interest and assigns,
grants Licensee a non-sublicensable, perpetual, worldwide, non-exclusive,
no-charge, royalty-free, irrevocable (except as expressly stated in this
License) patent license to its Necessary Claims to make, use, sell, offer
for sale, import or distribute any Implementation.
1.2. Conditions.
1.2.1. Availability. As a condition to the grant of rights to Licensee to make,
sell, offer for sale, import or distribute an Implementation under
Section 1.1, Licensee must make its Necessary Claims available under
this License, and must reproduce this License with any Implementation
as follows:
a. For distribution in source code, by including this License in the
root directory of the source code with its Implementation.
b. For distribution in any other form (including binary, object form,
and/or hardware description code (e.g., HDL, RTL, Gate Level Netlist,
GDSII, etc.)), by including this License in the documentation, legal
notices, and/or other written materials provided with the
Implementation.
1.2.2. Additional Conditions. This license is directly from Licensor to
Licensee. Licensee acknowledges as a condition of benefiting from it
that no rights from Licensor are received from suppliers, distributors,
or otherwise in connection with this License.
1.3. Defensive Termination. If any Licensee, its Affiliates, or its agents
initiates patent litigation or files, maintains, or voluntarily
participates in a lawsuit against another entity or any person asserting
that any Implementation infringes Necessary Claims, any patent licenses
granted under this License directly to the Licensee are immediately
terminated as of the date of the initiation of action unless 1) that suit
was in response to a corresponding suit regarding an Implementation first
brought against an initiating entity, or 2) that suit was brought to
enforce the terms of this License (including intervention in a third-party
action by a Licensee).
1.4. Disclaimers. The Reference Implementation and Specification are provided
"AS IS" and without warranty. The entire risk as to implementing or
otherwise using the Reference Implementation or Specification is assumed
by the implementer and user. Licensor expressly disclaims any warranties
(express, implied, or otherwise), including implied warranties of
merchantability, non-infringement, fitness for a particular purpose, or
title, related to the material. IN NO EVENT WILL LICENSOR BE LIABLE TO
ANY OTHER PARTY FOR LOST PROFITS OR ANY FORM OF INDIRECT, SPECIAL,
INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER FROM ANY CAUSES OF
ACTION OF ANY KIND WITH RESPECT TO THIS LICENSE, WHETHER BASED ON BREACH
OF CONTRACT, TORT (INCLUDING NEGLIGENCE), OR OTHERWISE, AND WHETHER OR
NOT THE OTHER PARTRY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
2. Definitions.
2.1. Affiliate. “Affiliate” means an entity that directly or indirectly
Controls, is Controlled by, or is under common Control of that party.
2.2. Control. “Control” means direct or indirect control of more than 50% of
the voting power to elect directors of that corporation, or for any other
entity, the power to direct management of such entity.
2.3. Decoder. "Decoder" means any decoder that conforms fully with all
non-optional portions of the Specification.
2.4. Encoder. "Encoder" means any encoder that produces a bitstream that can
be decoded by a Decoder only to the extent it produces such a bitstream.
2.5. Final Deliverable. “Final Deliverable” means the final version of a
deliverable approved by the Alliance for Open Media as a Final
Deliverable.
2.6. Implementation. "Implementation" means any implementation, including the
Reference Implementation, that is an Encoder and/or a Decoder. An
Implementation also includes components of an Implementation only to the
extent they are used as part of an Implementation.
2.7. License. “License” means this license.
2.8. Licensee. “Licensee” means any person or entity who exercises patent
rights granted under this License.
2.9. Licensor. "Licensor" means (i) any Licensee that makes, sells, offers
for sale, imports or distributes any Implementation, or (ii) a person
or entity that has a licensing obligation to the Implementation as a
result of its membership and/or participation in the Alliance for Open
Media working group that developed the Specification.
2.10. Necessary Claims. "Necessary Claims" means all claims of patents or
patent applications, (a) that currently or at any time in the future,
are owned or controlled by the Licensor, and (b) (i) would be an
Essential Claim as defined by the W3C Policy as of February 5, 2004
(https://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential)
as if the Specification was a W3C Recommendation; or (ii) are infringed
by the Reference Implementation.
2.11. Reference Implementation. “Reference Implementation” means an Encoder
and/or Decoder released by the Alliance for Open Media as a Final
Deliverable.
2.12. Specification. “Specification” means the specification designated by
the Alliance for Open Media as a Final Deliverable for which this
License was issued.
Google hereby grants to you a perpetual, worldwide, non-exclusive, no-charge,
royalty-free, irrevocable (except as stated in this section) patent license to
make, have made, use, offer to sell, sell, import, transfer, and otherwise
run, modify and propagate the contents of these implementations of WebM, where
such license applies only to those patent claims, both currently owned by
Google and acquired in the future, licensable by Google that are necessarily
infringed by these implementations of WebM. This grant does not include claims
that would be infringed only as a consequence of further modification of these
implementations. If you or your agent or exclusive licensee institute or order
or agree to the institution of patent litigation or any other patent
enforcement activity against any entity (including a cross-claim or
counterclaim in a lawsuit) alleging that any of these implementations of WebM
or any code incorporated within any of these implementations of WebM
constitute direct or contributory patent infringement, or inducement of
patent infringement, then any patent rights granted to you under this License
for these implementations of WebM shall terminate as of the date such
litigation is filed.

34
README

@ -1,6 +1,6 @@
README - 23 March 2015
Welcome to the WebM VP8/VP9 Codec SDK!
Welcome to the WebM VP8/AV1 Codec SDK!
COMPILING THE APPLICATIONS/LIBRARIES:
The build system used is similar to autotools. Building generally consists of
@ -33,13 +33,13 @@ COMPILING THE APPLICATIONS/LIBRARIES:
$ mkdir build
$ cd build
$ ../libvpx/configure <options>
$ ../libaom/configure <options>
$ make
3. Configuration options
The 'configure' script supports a number of options. The --help option can be
used to get a list of supported options:
$ ../libvpx/configure --help
$ ../libaom/configure --help
4. Cross development
For cross development, the most notable option is the --target option. The
@ -79,9 +79,6 @@ COMPILING THE APPLICATIONS/LIBRARIES:
x86-os2-gcc
x86-solaris-gcc
x86-win32-gcc
x86-win32-vs7
x86-win32-vs8
x86-win32-vs9
x86-win32-vs10
x86-win32-vs11
x86-win32-vs12
@ -98,8 +95,6 @@ COMPILING THE APPLICATIONS/LIBRARIES:
x86_64-linux-icc
x86_64-solaris-gcc
x86_64-win64-gcc
x86_64-win64-vs8
x86_64-win64-vs9
x86_64-win64-vs10
x86_64-win64-vs11
x86_64-win64-vs12
@ -113,7 +108,7 @@ COMPILING THE APPLICATIONS/LIBRARIES:
toolchain, the following command could be used (note, POSIX SH syntax, adapt
to your shell as necessary):
$ CROSS=mipsel-linux-uclibc- ../libvpx/configure
$ CROSS=mipsel-linux-uclibc- ../libaom/configure
In addition, the executables to be invoked can be overridden by specifying the
environment variables: CC, AR, LD, AS, STRIP, NM. Additional flags can be
@ -124,13 +119,28 @@ COMPILING THE APPLICATIONS/LIBRARIES:
This defaults to config.log. This should give a good indication of what went
wrong. If not, contact us for support.
VP8/VP9 TEST VECTORS:
VP8/AV1 TEST VECTORS:
The test vectors can be downloaded and verified using the build system after
running configure. To specify an alternate directory the
LIBVPX_TEST_DATA_PATH environment variable can be used.
LIBAOM_TEST_DATA_PATH environment variable can be used.
$ ./configure --enable-unit-tests
$ LIBVPX_TEST_DATA_PATH=../libvpx-test-data make testdata
$ LIBAOM_TEST_DATA_PATH=../libaom-test-data make testdata
CODE STYLE:
The coding style used by this project is enforced with clang-format using the
configuration contained in the .clang-format file in the root of the
repository.
Before pushing changes for review you can format your code with:
# Apply clang-format to modified .c, .h and .cc files
$ clang-format -i --style=file \
$(git diff --name-only --diff-filter=ACMR '*.[hc]' '*.cc')
Check the .clang-format file for the version used to generate it if there is
any difference between your local formatting and the review system.
See also: http://clang.llvm.org/docs/ClangFormat.html
SUPPORT
This library is an open source project supported by its community. Please

160
aom/aom.h Normal file

@ -0,0 +1,160 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
/*!\defgroup aom AOM
* \ingroup codecs
* AOM is aom's newest video compression algorithm that uses motion
* compensated prediction, Discrete Cosine Transform (DCT) coding of the
* prediction error signal and context dependent entropy coding techniques
* based on arithmetic principles. It features:
* - YUV 4:2:0 image format
* - Macro-block based coding (16x16 luma plus two 8x8 chroma)
* - 1/4 (1/8) pixel accuracy motion compensated prediction
* - 4x4 DCT transform
* - 128 level linear quantizer
* - In loop deblocking filter
* - Context-based entropy coding
*
* @{
*/
/*!\file
* \brief Provides controls common to both the AOM encoder and decoder.
*/
#ifndef AOM_AOM_H_
#define AOM_AOM_H_
#include "./aom_codec.h"
#include "./aom_image.h"
#ifdef __cplusplus
extern "C" {
#endif
/*!\brief Control functions
*
* This set of enumerators defines the control functions of the AOM interface.
*/
enum aom_com_control_id {
/*!\brief pass in an external frame into decoder to be used as reference frame
*/
AOM_SET_REFERENCE = 1,
AOM_COPY_REFERENCE = 2, /**< get a copy of reference frame from the decoder */
AOM_SET_POSTPROC = 3, /**< set the decoder's post processing settings */
AOM_SET_DBG_COLOR_REF_FRAME =
4, /**< set the reference frames to color for each macroblock */
AOM_SET_DBG_COLOR_MB_MODES = 5, /**< set which macro block modes to color */
AOM_SET_DBG_COLOR_B_MODES = 6, /**< set which blocks modes to color */
AOM_SET_DBG_DISPLAY_MV = 7, /**< set which motion vector modes to draw */
/* TODO(jkoleszar): The encoder incorrectly reuses some of these values (5+)
* for its control ids. These should be migrated to something like the
* AOM_DECODER_CTRL_ID_START range next time we're ready to break the ABI.
*/
AV1_GET_REFERENCE = 128, /**< get a pointer to a reference frame */
AOM_COMMON_CTRL_ID_MAX,
AV1_GET_NEW_FRAME_IMAGE = 192, /**< get a pointer to the new frame */
AOM_DECODER_CTRL_ID_START = 256
};
/*!\brief post process flags
*
* This set of enumerators defines the AOM decoder post-processing flags.
*/
enum aom_postproc_level {
AOM_NOFILTERING = 0,
AOM_DEBLOCK = 1 << 0,
AOM_DEMACROBLOCK = 1 << 1,
AOM_ADDNOISE = 1 << 2,
AOM_DEBUG_TXT_FRAME_INFO = 1 << 3, /**< print frame information */
AOM_DEBUG_TXT_MBLK_MODES =
1 << 4, /**< print macro block modes over each macro block */
AOM_DEBUG_TXT_DC_DIFF = 1 << 5, /**< print dc diff for each macro block */
AOM_DEBUG_TXT_RATE_INFO = 1 << 6, /**< print video rate info (encoder only) */
AOM_MFQE = 1 << 10
};
/*!\brief post process flags
*
* Defines a structure that describes the post-processing settings. For
* the best objective measure (using the PSNR metric), set post_proc_flag
* to AOM_DEBLOCK and deblocking_level to 1.
*/
typedef struct aom_postproc_cfg {
/*!\brief the types of post processing to be done, should be combination of
* "aom_postproc_level" */
int post_proc_flag;
int deblocking_level; /**< the strength of deblocking, valid range [0, 16] */
int noise_level; /**< the strength of additive noise, valid range [0, 16] */
} aom_postproc_cfg_t;
/*!\brief reference frame type
*
* This set of enumerators defines the types of AOM reference frames.
*/
typedef enum aom_ref_frame_type {
AOM_LAST_FRAME = 1,
AOM_GOLD_FRAME = 2,
AOM_ALTR_FRAME = 4
} aom_ref_frame_type_t;
/*!\brief reference frame data struct
*
* Define the data struct to access aom reference frames.
*/
typedef struct aom_ref_frame {
aom_ref_frame_type_t frame_type; /**< which reference frame */
aom_image_t img; /**< reference frame data in image format */
} aom_ref_frame_t;
/*!\brief AV1 specific reference frame data struct
*
* Define the data struct to access av1 reference frames.
*/
typedef struct av1_ref_frame {
int idx; /**< frame index to get (input) */
aom_image_t img; /**< img structure to populate (output) */
} av1_ref_frame_t;
/*!\cond */
/*!\brief aom decoder control function parameter type
*
* Defines the data type that each AOM decoder control function requires.
*/
AOM_CTRL_USE_TYPE(AOM_SET_REFERENCE, aom_ref_frame_t *)
#define AOM_CTRL_AOM_SET_REFERENCE
AOM_CTRL_USE_TYPE(AOM_COPY_REFERENCE, aom_ref_frame_t *)
#define AOM_CTRL_AOM_COPY_REFERENCE
AOM_CTRL_USE_TYPE(AOM_SET_POSTPROC, aom_postproc_cfg_t *)
#define AOM_CTRL_AOM_SET_POSTPROC
AOM_CTRL_USE_TYPE(AOM_SET_DBG_COLOR_REF_FRAME, int)
#define AOM_CTRL_AOM_SET_DBG_COLOR_REF_FRAME
AOM_CTRL_USE_TYPE(AOM_SET_DBG_COLOR_MB_MODES, int)
#define AOM_CTRL_AOM_SET_DBG_COLOR_MB_MODES
AOM_CTRL_USE_TYPE(AOM_SET_DBG_COLOR_B_MODES, int)
#define AOM_CTRL_AOM_SET_DBG_COLOR_B_MODES
AOM_CTRL_USE_TYPE(AOM_SET_DBG_DISPLAY_MV, int)
#define AOM_CTRL_AOM_SET_DBG_DISPLAY_MV
AOM_CTRL_USE_TYPE(AV1_GET_REFERENCE, av1_ref_frame_t *)
#define AOM_CTRL_AV1_GET_REFERENCE
AOM_CTRL_USE_TYPE(AV1_GET_NEW_FRAME_IMAGE, aom_image_t *)
#define AOM_CTRL_AV1_GET_NEW_FRAME_IMAGE
/*!\endcond */
/*! @} - end defgroup aom */
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_AOM_H_
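
Note on the AOM_CTRL_USE_TYPE lines in aom/aom.h: each one pairs a control id with the argument type it expects, so that a call through aom_codec_control() is checked by the compiler before it reaches the variadic aom_codec_control_(). The sketch below imitates that token-pasting pattern in isolation. Every demo_/DEMO_ name is hypothetical and exists only for illustration; nothing here is part of libaom, and the error handling is simplified to 0 / -1.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical context type; stands in for aom_codec_ctx_t. */
typedef struct demo_ctx {
  int level;
} demo_ctx_t;

/* Hypothetical control id; stands in for an aom_com_control_id value. */
enum { DEMO_SET_LEVEL = 1 };

/* Type-unsafe variadic entry point, analogous to aom_codec_control_().
 * Returns 0 when the request was dispatched, -1 otherwise. */
static int demo_control_(demo_ctx_t *ctx, int ctrl_id, ...) {
  va_list ap;
  int res = -1;
  va_start(ap, ctrl_id);
  if (ctrl_id == DEMO_SET_LEVEL) {
    ctx->level = va_arg(ap, int);
    res = 0;
  }
  va_end(ap);
  return res;
}

/* Defines a typed static wrapper named after the control id. */
#define DEMO_CTRL_USE_TYPE(id, typ)                                      \
  static int demo_control_##id(demo_ctx_t *ctx, int ctrl_id, typ data) { \
    return demo_control_(ctx, ctrl_id, data);                            \
  }

/* Routes the call through the typed wrapper for the given id. */
#define demo_control(ctx, id, data) demo_control_##id(ctx, id, data)

/* Declares that DEMO_SET_LEVEL takes an int argument. */
DEMO_CTRL_USE_TYPE(DEMO_SET_LEVEL, int)
```

With this in place, demo_control(&ctx, DEMO_SET_LEVEL, 7) expands to demo_control_DEMO_SET_LEVEL(&ctx, DEMO_SET_LEVEL, 7), so passing an argument of an incompatible type (for example a struct pointer) draws a compiler diagnostic instead of silently going through the "..." parameter.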

487
aom/aom_codec.h Normal file

@ -0,0 +1,487 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
/*!\defgroup codec Common Algorithm Interface
* This abstraction allows applications to easily support multiple video
* formats with minimal code duplication. This section describes the interface
* common to all codecs (both encoders and decoders).
* @{
*/
/*!\file
* \brief Describes the codec algorithm interface to applications.
*
* This file describes the interface between an application and a
* video codec algorithm.
*
* An application instantiates a specific codec instance by using
* aom_codec_init() and a pointer to the algorithm's interface structure:
* <pre>
* my_app.c:
* extern aom_codec_iface_t my_codec;
* {
* aom_codec_ctx_t algo;
* res = aom_codec_init(&algo, &my_codec);
* }
* </pre>
*
* Once initialized, the instance is managed using other functions from
* the aom_codec_* family.
*/
#ifndef AOM_AOM_CODEC_H_
#define AOM_AOM_CODEC_H_
#ifdef __cplusplus
extern "C" {
#endif
#include "./aom_integer.h"
#include "./aom_image.h"
/*!\brief Decorator indicating a function is deprecated */
#ifndef DEPRECATED
#if defined(__GNUC__) && __GNUC__
#define DEPRECATED __attribute__((deprecated))
#elif defined(_MSC_VER)
#define DEPRECATED
#else
#define DEPRECATED
#endif
#endif /* DEPRECATED */
#ifndef DECLSPEC_DEPRECATED
#if defined(__GNUC__) && __GNUC__
#define DECLSPEC_DEPRECATED /**< \copydoc #DEPRECATED */
#elif defined(_MSC_VER)
/*!\brief \copydoc #DEPRECATED */
#define DECLSPEC_DEPRECATED __declspec(deprecated)
#else
#define DECLSPEC_DEPRECATED /**< \copydoc #DEPRECATED */
#endif
#endif /* DECLSPEC_DEPRECATED */
/*!\brief Decorator indicating a function is potentially unused */
#ifdef UNUSED
#elif defined(__GNUC__) || defined(__clang__)
#define UNUSED __attribute__((unused))
#else
#define UNUSED
#endif
/*!\brief Decorator indicating that given struct/union/enum is packed */
#ifndef ATTRIBUTE_PACKED
#if defined(__GNUC__) && __GNUC__
#define ATTRIBUTE_PACKED __attribute__((packed))
#elif defined(_MSC_VER)
#define ATTRIBUTE_PACKED
#else
#define ATTRIBUTE_PACKED
#endif
#endif /* ATTRIBUTE_PACKED */
/*!\brief Current ABI version number
*
* \internal
* If this file is altered in any way that changes the ABI, this value
* must be bumped. Examples include, but are not limited to, changing
* types, removing or reassigning enums, adding/removing/rearranging
* fields to structures
*/
#define AOM_CODEC_ABI_VERSION (3 + AOM_IMAGE_ABI_VERSION) /**<\hideinitializer*/
/*!\brief Algorithm return codes */
typedef enum {
/*!\brief Operation completed without error */
AOM_CODEC_OK,
/*!\brief Unspecified error */
AOM_CODEC_ERROR,
/*!\brief Memory operation failed */
AOM_CODEC_MEM_ERROR,
/*!\brief ABI version mismatch */
AOM_CODEC_ABI_MISMATCH,
/*!\brief Algorithm does not have required capability */
AOM_CODEC_INCAPABLE,
/*!\brief The given bitstream is not supported.
*
* The bitstream was unable to be parsed at the highest level. The decoder
* is unable to proceed. This error \ref SHOULD be treated as fatal to the
* stream. */
AOM_CODEC_UNSUP_BITSTREAM,
/*!\brief Encoded bitstream uses an unsupported feature
*
* The decoder does not implement a feature required by the encoder. This
* return code should only be used for features that prevent future
* pictures from being properly decoded. This error \ref MAY be treated as
* fatal to the stream or \ref MAY be treated as fatal to the current GOP.
*/
AOM_CODEC_UNSUP_FEATURE,
/*!\brief The coded data for this stream is corrupt or incomplete
*
* There was a problem decoding the current frame. This return code
* should only be used for failures that prevent future pictures from
* being properly decoded. This error \ref MAY be treated as fatal to the
* stream or \ref MAY be treated as fatal to the current GOP. If decoding
* is continued for the current GOP, artifacts may be present.
*/
AOM_CODEC_CORRUPT_FRAME,
/*!\brief An application-supplied parameter is not valid.
*
*/
AOM_CODEC_INVALID_PARAM,
/*!\brief An iterator reached the end of list.
*
*/
AOM_CODEC_LIST_END
} aom_codec_err_t;
/*! \brief Codec capabilities bitfield
*
* Each codec advertises the capabilities it supports as part of its
* ::aom_codec_iface_t interface structure. Capabilities are extra interfaces
* or functionality, and are not required to be supported.
*
* The available flags are specified by AOM_CODEC_CAP_* defines.
*/
typedef long aom_codec_caps_t;
#define AOM_CODEC_CAP_DECODER 0x1 /**< Is a decoder */
#define AOM_CODEC_CAP_ENCODER 0x2 /**< Is an encoder */
/*! \brief Initialization-time Feature Enabling
*
* Certain codec features must be known at initialization time, to allow for
* proper memory allocation.
*
* The available flags are specified by AOM_CODEC_USE_* defines.
*/
typedef long aom_codec_flags_t;
/*!\brief Codec interface structure.
*
* Contains function pointers and other data private to the codec
* implementation. This structure is opaque to the application.
*/
typedef const struct aom_codec_iface aom_codec_iface_t;
/*!\brief Codec private data structure.
*
* Contains data private to the codec implementation. This structure is opaque
* to the application.
*/
typedef struct aom_codec_priv aom_codec_priv_t;
/*!\brief Iterator
*
* Opaque storage used for iterating over lists.
*/
typedef const void *aom_codec_iter_t;
/*!\brief Codec context structure
*
* All codecs \ref MUST support this context structure fully. In general,
* this data should be considered private to the codec algorithm, and
* not be manipulated or examined by the calling application. Applications
* may reference the 'name' member to get a printable description of the
* algorithm.
*/
typedef struct aom_codec_ctx {
const char *name; /**< Printable interface name */
aom_codec_iface_t *iface; /**< Interface pointers */
aom_codec_err_t err; /**< Last returned error */
const char *err_detail; /**< Detailed info, if available */
aom_codec_flags_t init_flags; /**< Flags passed at init time */
union {
/**< Decoder Configuration Pointer */
const struct aom_codec_dec_cfg *dec;
/**< Encoder Configuration Pointer */
const struct aom_codec_enc_cfg *enc;
const void *raw;
} config; /**< Configuration pointer aliasing union */
aom_codec_priv_t *priv; /**< Algorithm private storage */
} aom_codec_ctx_t;
/*!\brief Bit depth for codec
*
* This enumeration determines the bit depth of the codec.
*/
typedef enum aom_bit_depth {
AOM_BITS_8 = 8, /**< 8 bits */
AOM_BITS_10 = 10, /**< 10 bits */
AOM_BITS_12 = 12, /**< 12 bits */
} aom_bit_depth_t;
/*!\brief Superblock size selection.
*
* Defines the superblock size used for encoding. The superblock size can
* either be fixed at 64x64 or 128x128 pixels, or it can be dynamically
* selected by the encoder for each frame.
*/
typedef enum aom_superblock_size {
AOM_SUPERBLOCK_SIZE_64X64, /**< Always use 64x64 superblocks. */
AOM_SUPERBLOCK_SIZE_128X128, /**< Always use 128x128 superblocks. */
AOM_SUPERBLOCK_SIZE_DYNAMIC /**< Select superblock size dynamically. */
} aom_superblock_size_t;
/*
* Library Version Number Interface
*
* For example, see the following sample return values:
* aom_codec_version() (1<<16 | 2<<8 | 3)
* aom_codec_version_str() "v1.2.3-rc1-16-gec6a1ba"
* aom_codec_version_extra_str() "rc1-16-gec6a1ba"
*/
/*!\brief Return the version information (as an integer)
*
* Returns a packed encoding of the library version number. This will only
* include the major.minor.patch component of the version number. Note that
* this encoded value should be accessed through the macros provided, as the
* encoding may change in the future.
*
*/
int aom_codec_version(void);
#define AOM_VERSION_MAJOR(v) \
(((v) >> 16) & 0xff) /**< extract major from packed version */
#define AOM_VERSION_MINOR(v) \
(((v) >> 8) & 0xff) /**< extract minor from packed version */
#define AOM_VERSION_PATCH(v) \
(((v) >> 0) & 0xff) /**< extract patch from packed version */
/*!\brief Return the version major number */
#define aom_codec_version_major() ((aom_codec_version() >> 16) & 0xff)
/*!\brief Return the version minor number */
#define aom_codec_version_minor() ((aom_codec_version() >> 8) & 0xff)
/*!\brief Return the version patch number */
#define aom_codec_version_patch() ((aom_codec_version() >> 0) & 0xff)
/*!\brief Return the version information (as a string)
*
* Returns a printable string containing the full library version number. This
* may contain additional text following the three digit version number, to
* indicate release candidates, prerelease versions, etc.
*
*/
const char *aom_codec_version_str(void);
/*!\brief Return the version information (as a string)
*
* Returns a printable "extra string". This is the component of the string
* returned by aom_codec_version_str() following the three digit version
* number.
*
*/
const char *aom_codec_version_extra_str(void);
/*!\brief Return the build configuration
*
* Returns a printable string containing an encoded version of the build
* configuration. This may be useful to aom support.
*
*/
const char *aom_codec_build_config(void);
/*!\brief Return the name for a given interface
*
* Returns a human readable string for name of the given codec interface.
*
* \param[in] iface Interface pointer
*
*/
const char *aom_codec_iface_name(aom_codec_iface_t *iface);
/*!\brief Convert error number to printable string
*
* Returns a human readable string for the given error number. The returned
* string will be one line and will not contain any newline characters.
*
*
* \param[in] err Error number.
*
*/
const char *aom_codec_err_to_string(aom_codec_err_t err);
/*!\brief Retrieve error synopsis for codec context
*
* Returns a human readable string for the last error returned by the
* algorithm. The returned error will be one line and will not contain
* any newline characters.
*
*
* \param[in] ctx Pointer to this instance's context.
*
*/
const char *aom_codec_error(aom_codec_ctx_t *ctx);
/*!\brief Retrieve detailed error information for codec context
*
* Returns a human readable string providing detailed information about
* the last error.
*
* \param[in] ctx Pointer to this instance's context.
*
* \retval NULL
* No detailed information is available.
*/
const char *aom_codec_error_detail(aom_codec_ctx_t *ctx);
/* REQUIRED FUNCTIONS
*
* The following functions are required to be implemented for all codecs.
* They represent the base case functionality expected of all codecs.
*/
/*!\brief Destroy a codec instance
*
* Destroys a codec context, freeing any associated memory buffers.
*
* \param[in] ctx Pointer to this instance's context
*
* \retval #AOM_CODEC_OK
* The codec instance was destroyed.
* \retval #AOM_CODEC_MEM_ERROR
* Memory allocation failed.
*/
aom_codec_err_t aom_codec_destroy(aom_codec_ctx_t *ctx);
/*!\brief Get the capabilities of an algorithm.
*
* Retrieves the capabilities bitfield from the algorithm's interface.
*
* \param[in] iface Pointer to the algorithm interface
*
*/
aom_codec_caps_t aom_codec_get_caps(aom_codec_iface_t *iface);
/*!\brief Control algorithm
*
* This function is used to exchange algorithm specific data with the codec
* instance. This can be used to implement features specific to a particular
* algorithm.
*
* This wrapper function dispatches the request to the helper function
* associated with the given ctrl_id. It tries to call this function
* transparently, but will return #AOM_CODEC_ERROR if the request could not
* be dispatched.
*
* Note that this function should not be used directly. Call the
* #aom_codec_control wrapper macro instead.
*
* \param[in] ctx Pointer to this instance's context
* \param[in] ctrl_id Algorithm specific control identifier
*
* \retval #AOM_CODEC_OK
* The control request was processed.
* \retval #AOM_CODEC_ERROR
* The control request was not processed.
* \retval #AOM_CODEC_INVALID_PARAM
* The data was not valid.
*/
aom_codec_err_t aom_codec_control_(aom_codec_ctx_t *ctx, int ctrl_id, ...);
#if defined(AOM_DISABLE_CTRL_TYPECHECKS) && AOM_DISABLE_CTRL_TYPECHECKS
#define aom_codec_control(ctx, id, data) aom_codec_control_(ctx, id, data)
#define AOM_CTRL_USE_TYPE(id, typ)
#define AOM_CTRL_USE_TYPE_DEPRECATED(id, typ)
#define AOM_CTRL_VOID(id)
#else
/*!\brief aom_codec_control wrapper macro
*
* This macro allows for type safe conversions across the variadic parameter
* to aom_codec_control_().
*
* \internal
* It works by dispatching the call to the control function through a wrapper
* function named with the id parameter.
*/
#define aom_codec_control(ctx, id, data) \
aom_codec_control_##id(ctx, id, data) /**<\hideinitializer*/
/*!\brief aom_codec_control type definition macro
*
* This macro allows for type safe conversions across the variadic parameter
* to aom_codec_control_(). It defines the type of the argument for a given
* control identifier.
*
* \internal
* It defines a static function with
* the correctly typed arguments as a wrapper to the type-unsafe internal
* function.
*/
#define AOM_CTRL_USE_TYPE(id, typ) \
static aom_codec_err_t aom_codec_control_##id(aom_codec_ctx_t *, int, typ) \
UNUSED; \
\
static aom_codec_err_t aom_codec_control_##id(aom_codec_ctx_t *ctx, \
int ctrl_id, typ data) { \
return aom_codec_control_(ctx, ctrl_id, data); \
} /**<\hideinitializer*/
/*!\brief aom_codec_control deprecated type definition macro
*
* Like #AOM_CTRL_USE_TYPE, but indicates that the specified control is
* deprecated and should not be used. Consult the documentation for your
* codec for more information.
*
* \internal
* It defines a static function with the correctly typed arguments as a
* wrapper to the type-unsafe internal function.
*/
#define AOM_CTRL_USE_TYPE_DEPRECATED(id, typ) \
DECLSPEC_DEPRECATED static aom_codec_err_t aom_codec_control_##id( \
aom_codec_ctx_t *, int, typ) DEPRECATED UNUSED; \
\
DECLSPEC_DEPRECATED static aom_codec_err_t aom_codec_control_##id( \
aom_codec_ctx_t *ctx, int ctrl_id, typ data) { \
return aom_codec_control_(ctx, ctrl_id, data); \
} /**<\hideinitializer*/
/*!\brief aom_codec_control void type definition macro
*
* This macro allows for type safe conversions across the variadic parameter
* to aom_codec_control_(). It indicates that a given control identifier takes
* no argument.
*
* \internal
* It defines a static function without a data argument as a wrapper to the
* type-unsafe internal function.
*/
#define AOM_CTRL_VOID(id) \
static aom_codec_err_t aom_codec_control_##id(aom_codec_ctx_t *, int) \
UNUSED; \
\
static aom_codec_err_t aom_codec_control_##id(aom_codec_ctx_t *ctx, \
int ctrl_id) { \
return aom_codec_control_(ctx, ctrl_id); \
} /**<\hideinitializer*/
#endif
/*!@} - end defgroup codec*/
#ifdef __cplusplus
}
#endif
#endif // AOM_AOM_CODEC_H_
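The typed-wrapper technique used by AOM_CTRL_USE_TYPE can be sketched in isolation. The following is a minimal illustration, not the libaom implementation: every `demo_`/`DEMO_` name is a hypothetical stand-in, and the variadic `demo_control_()` only plays the role that aom_codec_control_() has in the header above.

```c
#include <stdarg.h>

/* Hypothetical stand-ins for illustration; these are not libaom types. */
typedef int demo_err_t;
typedef struct demo_ctx {
  int speed;
} demo_ctx_t;

enum { DEMO_SET_SPEED = 1 };

/* Type-unsafe variadic entry point (the role of aom_codec_control_()). */
static demo_err_t demo_control_(demo_ctx_t *ctx, int ctrl_id, ...) {
  va_list ap;
  demo_err_t res = 0;
  va_start(ap, ctrl_id);
  switch (ctrl_id) {
    case DEMO_SET_SPEED: ctx->speed = va_arg(ap, int); break;
    default: res = 1; break; /* unknown control ID */
  }
  va_end(ap);
  return res;
}

/* Same shape as AOM_CTRL_USE_TYPE: one typed static wrapper per control ID,
 * so the compiler checks the argument type before it reaches the varargs. */
#define DEMO_CTRL_USE_TYPE(id, typ)                                 \
  static demo_err_t demo_control_##id(demo_ctx_t *ctx, int ctrl_id, \
                                      typ data) {                   \
    return demo_control_(ctx, ctrl_id, data);                       \
  }

DEMO_CTRL_USE_TYPE(DEMO_SET_SPEED, int)

/* Dispatch macro: pastes the control ID into the wrapper's name. */
#define demo_control(ctx, id, data) demo_control_##id(ctx, id, data)
```

A call such as `demo_control(&ctx, DEMO_SET_SPEED, 7)` resolves to the typed wrapper, so passing a pointer where an `int` is expected draws a compiler diagnostic instead of silently corrupting the variadic read.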

42
aom/aom_codec.mk Normal file

@ -0,0 +1,42 @@
##
## Copyright (c) 2016, Alliance for Open Media. All rights reserved
##
## This source code is subject to the terms of the BSD 2 Clause License and
## the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
## was not distributed with this source code in the LICENSE file, you can
## obtain it at www.aomedia.org/license/software. If the Alliance for Open
## Media Patent License 1.0 was not distributed with this source code in the
## PATENTS file, you can obtain it at www.aomedia.org/license/patent.
##
API_EXPORTS += exports
API_SRCS-$(CONFIG_AV1_ENCODER) += aom.h
API_SRCS-$(CONFIG_AV1_ENCODER) += aomcx.h
API_DOC_SRCS-$(CONFIG_AV1_ENCODER) += aom.h
API_DOC_SRCS-$(CONFIG_AV1_ENCODER) += aomcx.h
API_SRCS-$(CONFIG_AV1_DECODER) += aom.h
API_SRCS-$(CONFIG_AV1_DECODER) += aomdx.h
API_DOC_SRCS-$(CONFIG_AV1_DECODER) += aom.h
API_DOC_SRCS-$(CONFIG_AV1_DECODER) += aomdx.h
API_DOC_SRCS-yes += aom_codec.h
API_DOC_SRCS-yes += aom_decoder.h
API_DOC_SRCS-yes += aom_encoder.h
API_DOC_SRCS-yes += aom_frame_buffer.h
API_DOC_SRCS-yes += aom_image.h
API_SRCS-yes += src/aom_decoder.c
API_SRCS-yes += aom_decoder.h
API_SRCS-yes += src/aom_encoder.c
API_SRCS-yes += aom_encoder.h
API_SRCS-yes += internal/aom_codec_internal.h
API_SRCS-yes += src/aom_codec.c
API_SRCS-yes += src/aom_image.c
API_SRCS-yes += aom_codec.h
API_SRCS-yes += aom_codec.mk
API_SRCS-yes += aom_frame_buffer.h
API_SRCS-yes += aom_image.h
API_SRCS-yes += aom_integer.h

366
aom/aom_decoder.h Normal file

@ -0,0 +1,366 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_AOM_DECODER_H_
#define AOM_AOM_DECODER_H_
/*!\defgroup decoder Decoder Algorithm Interface
* \ingroup codec
* This abstraction allows applications using this decoder to easily support
* multiple video formats with minimal code duplication. This section describes
* the interface common to all decoders.
* @{
*/
/*!\file
* \brief Describes the decoder algorithm interface to applications.
*
* This file describes the interface between an application and a
* video decoder algorithm.
*
*/
#ifdef __cplusplus
extern "C" {
#endif
#include "./aom_codec.h"
#include "./aom_frame_buffer.h"
/*!\brief Current ABI version number
*
* \internal
* If this file is altered in any way that changes the ABI, this value
* must be bumped. Examples include, but are not limited to, changing
* types, removing or reassigning enums, adding/removing/rearranging
* fields to structures
*/
#define AOM_DECODER_ABI_VERSION \
(3 + AOM_CODEC_ABI_VERSION) /**<\hideinitializer*/
/*! \brief Decoder capabilities bitfield
*
* Each decoder advertises the capabilities it supports as part of its
* ::aom_codec_iface_t interface structure. Capabilities are extra interfaces
* or functionality, and are not required to be supported by a decoder.
*
* The available flags are specified by AOM_CODEC_CAP_* defines.
*/
#define AOM_CODEC_CAP_PUT_SLICE 0x10000 /**< Will issue put_slice callbacks */
#define AOM_CODEC_CAP_PUT_FRAME 0x20000 /**< Will issue put_frame callbacks */
#define AOM_CODEC_CAP_POSTPROC 0x40000 /**< Can postprocess decoded frame */
/*!\brief Can conceal errors due to packet loss */
#define AOM_CODEC_CAP_ERROR_CONCEALMENT 0x80000
/*!\brief Can receive encoded frames one fragment at a time */
#define AOM_CODEC_CAP_INPUT_FRAGMENTS 0x100000
/*!\brief Can support frame-based multi-threading */
#define AOM_CODEC_CAP_FRAME_THREADING 0x200000
/*!\brief Can support external frame buffers */
#define AOM_CODEC_CAP_EXTERNAL_FRAME_BUFFER 0x400000
/*! \brief Initialization-time Feature Enabling
*
* Certain codec features must be known at initialization time, to allow for
* proper memory allocation.
*
* The available flags are specified by AOM_CODEC_USE_* defines.
*/
#define AOM_CODEC_USE_POSTPROC 0x10000 /**< Postprocess decoded frame */
/*!\brief Conceal errors in decoded frames */
#define AOM_CODEC_USE_ERROR_CONCEALMENT 0x20000
/*!\brief The input frame should be passed to the decoder one fragment at a
* time */
#define AOM_CODEC_USE_INPUT_FRAGMENTS 0x40000
/*!\brief Enable frame-based multi-threading */
#define AOM_CODEC_USE_FRAME_THREADING 0x80000
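Note that a capability bit and its matching init-time flag do not share a value: AOM_CODEC_CAP_POSTPROC is 0x40000 while AOM_CODEC_USE_POSTPROC is 0x10000. A minimal sketch of mapping one to the other follows. The constants are copied from the defines above; the helper `demo_pick_init_flags` and the use of `long` for the bitfields are assumptions made for illustration.

```c
/* Values copied from the header above. */
#define AOM_CODEC_CAP_POSTPROC 0x40000
#define AOM_CODEC_USE_POSTPROC 0x10000

/* Hypothetical helper: given a decoder's capability bits, return the
 * init-time flags to request. A feature is requested only when the
 * decoder advertises the corresponding capability. */
static long demo_pick_init_flags(long caps, int want_postproc) {
  long flags = 0;
  if (want_postproc && (caps & AOM_CODEC_CAP_POSTPROC))
    flags |= AOM_CODEC_USE_POSTPROC;
  return flags;
}
```

Testing the capability first avoids passing a USE flag that the decoder cannot honor.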
/*!\brief Stream properties
*
* This structure is used to query or set properties of the decoded
* stream. Algorithms may extend this structure with data specific
* to their bitstream by setting the sz member appropriately.
*/
typedef struct aom_codec_stream_info {
unsigned int sz; /**< Size of this structure */
unsigned int w; /**< Width (or 0 for unknown/default) */
unsigned int h; /**< Height (or 0 for unknown/default) */
unsigned int is_kf; /**< Current frame is a keyframe */
} aom_codec_stream_info_t;
/* REQUIRED FUNCTIONS
*
* The following functions are required to be implemented for all decoders.
* They represent the base case functionality expected of all decoders.
*/
/*!\brief Initialization Configurations
*
* This structure is used to pass init time configuration options to the
* decoder.
*/
typedef struct aom_codec_dec_cfg {
unsigned int threads; /**< Maximum number of threads to use, default 1 */
unsigned int w; /**< Width */
unsigned int h; /**< Height */
} aom_codec_dec_cfg_t; /**< alias for struct aom_codec_dec_cfg */
/*!\brief Initialize a decoder instance
*
* Initializes a decoder context using the given interface. Applications
* should call the aom_codec_dec_init convenience macro instead of this
* function directly, to ensure that the ABI version number parameter
* is properly initialized.
*
* If the library was configured with --disable-multithread, this call
* is not thread safe and should be guarded with a lock if being used
* in a multithreaded context.
*
* \param[in] ctx Pointer to this instance's context.
* \param[in] iface Pointer to the algorithm interface to use.
* \param[in] cfg Configuration to use, if known. May be NULL.
* \param[in] flags Bitfield of AOM_CODEC_USE_* flags
* \param[in] ver ABI version number. Must be set to
* AOM_DECODER_ABI_VERSION
* \retval #AOM_CODEC_OK
* The decoder algorithm initialized.
* \retval #AOM_CODEC_MEM_ERROR
* Memory allocation failed.
*/
aom_codec_err_t aom_codec_dec_init_ver(aom_codec_ctx_t *ctx,
aom_codec_iface_t *iface,
const aom_codec_dec_cfg_t *cfg,
aom_codec_flags_t flags, int ver);
/*!\brief Convenience macro for aom_codec_dec_init_ver()
*
* Ensures the ABI version parameter is properly set.
*/
#define aom_codec_dec_init(ctx, iface, cfg, flags) \
aom_codec_dec_init_ver(ctx, iface, cfg, flags, AOM_DECODER_ABI_VERSION)
/*!\brief Parse stream info from a buffer
*
* Performs high level parsing of the bitstream. Construction of a decoder
* context is not necessary. Can be used to determine if the bitstream is
* of the proper format, and to extract information from the stream.
*
* \param[in] iface Pointer to the algorithm interface
* \param[in] data Pointer to a block of data to parse
* \param[in] data_sz Size of the data buffer
* \param[in,out] si Pointer to stream info to update. The size member
* \ref MUST be properly initialized, but \ref MAY be
* clobbered by the algorithm. This parameter \ref MAY
* be NULL.
*
* \retval #AOM_CODEC_OK
* Bitstream is parsable and stream information updated
*/
aom_codec_err_t aom_codec_peek_stream_info(aom_codec_iface_t *iface,
const uint8_t *data,
unsigned int data_sz,
aom_codec_stream_info_t *si);
/*!\brief Return information about the current stream.
*
* Returns information about the stream that has been parsed during decoding.
*
* \param[in] ctx Pointer to this instance's context
* \param[in,out] si Pointer to stream info to update. The size member
* \ref MUST be properly initialized, but \ref MAY be
* clobbered by the algorithm. This parameter \ref MAY
* be NULL.
*
* \retval #AOM_CODEC_OK
* Bitstream is parsable and stream information updated
*/
aom_codec_err_t aom_codec_get_stream_info(aom_codec_ctx_t *ctx,
aom_codec_stream_info_t *si);
/*!\brief Decode data
*
* Processes a buffer of coded data. If the processing results in a new
* decoded frame becoming available, PUT_SLICE and PUT_FRAME events may be
* generated, as appropriate. Encoded data \ref MUST be passed in DTS (decode
* time stamp) order. Frames produced will always be in PTS (presentation
* time stamp) order.
* If the decoder is configured with AOM_CODEC_USE_INPUT_FRAGMENTS enabled,
* data and data_sz can contain a fragment of the encoded frame. Fragment
* \#n must contain at least partition \#n, but can also contain subsequent
* partitions (\#n+1 - \#n+i), and if so, fragments \#n+1, .., \#n+i must
* be empty. When no more data is available, this function should be called
* with NULL as data and 0 as data_sz. The memory passed to this function
* must be available until the frame has been decoded.
*
* \param[in] ctx Pointer to this instance's context
* \param[in] data Pointer to this block of new coded data. If
* NULL, an AOM_CODEC_CB_PUT_FRAME event is posted
* for the previously decoded frame.
* \param[in] data_sz Size of the coded data, in bytes.
* \param[in] user_priv Application specific data to associate with
* this frame.
* \param[in] deadline Soft deadline the decoder should attempt to meet,
* in us. Set to zero for unlimited.
*
* \return Returns #AOM_CODEC_OK if the coded data was processed completely
* and future pictures can be decoded without error. Otherwise,
* see the descriptions of the other error codes in ::aom_codec_err_t
* for recoverability capabilities.
*/
aom_codec_err_t aom_codec_decode(aom_codec_ctx_t *ctx, const uint8_t *data,
unsigned int data_sz, void *user_priv,
long deadline);
/*!\brief Decoded frames iterator
*
* Iterates over a list of the frames available for display. The iterator
* storage should be initialized to NULL to start the iteration. Iteration is
* complete when this function returns NULL.
*
* The list of available frames becomes valid upon completion of the
* aom_codec_decode call, and remains valid until the next call to
* aom_codec_decode.
*
* \param[in] ctx Pointer to this instance's context
* \param[in,out] iter Iterator storage, initialized to NULL
*
* \return Returns a pointer to an image, if one is ready for display. Frames
* produced will always be in PTS (presentation time stamp) order.
*/
aom_image_t *aom_codec_get_frame(aom_codec_ctx_t *ctx, aom_codec_iter_t *iter);
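The iterator contract described for aom_codec_get_frame() (storage starts as NULL, iteration ends when NULL is returned) can be sketched with a mock. Everything below is a hypothetical stand-in: `demo_get_frame` only imitates the calling convention and says nothing about how libaom tracks frames internally.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the libaom types. */
typedef const void *demo_iter_t;
typedef struct demo_image {
  int id;
} demo_image_t;
typedef struct demo_ctx {
  demo_image_t frames[2];
  size_t n; /* number of frames ready for display */
} demo_ctx_t;

/* Mock frame getter: a NULL iterator starts at the first frame; otherwise
 * the iterator points at the frame returned by the previous call. */
static demo_image_t *demo_get_frame(demo_ctx_t *ctx, demo_iter_t *iter) {
  size_t idx = 0;
  if (*iter != NULL)
    idx = (size_t)((const demo_image_t *)*iter - ctx->frames) + 1;
  if (idx >= ctx->n) return NULL; /* iteration complete */
  *iter = &ctx->frames[idx];
  return &ctx->frames[idx];
}

/* Caller-side loop in the shape the documentation describes. */
static int demo_count_frames(demo_ctx_t *ctx) {
  demo_iter_t iter = NULL; /* must be NULL to start the iteration */
  int count = 0;
  while (demo_get_frame(ctx, &iter) != NULL) ++count;
  return count;
}
```

With the real API the loop body would consume each returned image before the next call to aom_codec_decode(), since the frame list is only valid until then.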
/*!\defgroup cap_put_frame Frame-Based Decoding Functions
*
* The following functions are required to be implemented for all decoders
* that advertise the AOM_CODEC_CAP_PUT_FRAME capability. Calling these
* functions for codecs that don't advertise this capability will result in
* an error code being returned, usually AOM_CODEC_ERROR.
* @{
*/
/*!\brief put frame callback prototype
*
* This callback is invoked by the decoder to notify the application of
* the availability of decoded image data.
*/
typedef void (*aom_codec_put_frame_cb_fn_t)(void *user_priv,
const aom_image_t *img);
/*!\brief Register for notification of frame completion.
*
* Registers a given function to be called when a decoded frame is
* available.
*
* \param[in] ctx Pointer to this instance's context
* \param[in] cb Pointer to the callback function
* \param[in] user_priv User's private data
*
* \retval #AOM_CODEC_OK
* Callback successfully registered.
* \retval #AOM_CODEC_ERROR
* Decoder context not initialized, or algorithm not capable of
* posting frame completion.
*/
aom_codec_err_t aom_codec_register_put_frame_cb(aom_codec_ctx_t *ctx,
aom_codec_put_frame_cb_fn_t cb,
void *user_priv);
/*!@} - end defgroup cap_put_frame */
/*!\defgroup cap_put_slice Slice-Based Decoding Functions
*
* The following functions are required to be implemented for all decoders
* that advertise the AOM_CODEC_CAP_PUT_SLICE capability. Calling these
* functions for codecs that don't advertise this capability will result in
* an error code being returned, usually AOM_CODEC_ERROR.
* @{
*/
/*!\brief put slice callback prototype
*
* This callback is invoked by the decoder to notify the application of
* the availability of partially decoded image data.
*/
typedef void (*aom_codec_put_slice_cb_fn_t)(void *user_priv,
const aom_image_t *img,
const aom_image_rect_t *valid,
const aom_image_rect_t *update);
/*!\brief Register for notification of slice completion.
*
* Registers a given function to be called when a decoded slice is
* available.
*
* \param[in] ctx Pointer to this instance's context
* \param[in] cb Pointer to the callback function
* \param[in] user_priv User's private data
*
* \retval #AOM_CODEC_OK
* Callback successfully registered.
* \retval #AOM_CODEC_ERROR
* Decoder context not initialized, or algorithm not capable of
* posting slice completion.
*/
aom_codec_err_t aom_codec_register_put_slice_cb(aom_codec_ctx_t *ctx,
aom_codec_put_slice_cb_fn_t cb,
void *user_priv);
/*!@} - end defgroup cap_put_slice*/
/*!\defgroup cap_external_frame_buffer External Frame Buffer Functions
*
* The following section is required to be implemented for all decoders
* that advertise the AOM_CODEC_CAP_EXTERNAL_FRAME_BUFFER capability.
* Calling this function for codecs that don't advertise this capability
* will result in an error code being returned, usually AOM_CODEC_ERROR.
*
* \note
* Currently this only works with AV1.
* @{
*/
/*!\brief Pass in external frame buffers for the decoder to use.
*
* Registers functions to be called when libaom needs a frame buffer
* to decode the current frame and a function to be called when libaom does
* not internally reference the frame buffer. This set function must
* be called before the first call to decode or libaom will assume the
* default behavior of allocating frame buffers internally.
*
* \param[in] ctx Pointer to this instance's context
* \param[in] cb_get Pointer to the get callback function
* \param[in] cb_release Pointer to the release callback function
* \param[in] cb_priv Callback's private data
*
* \retval #AOM_CODEC_OK
* External frame buffers will be used by libaom.
* \retval #AOM_CODEC_INVALID_PARAM
* One or more of the callbacks were NULL.
* \retval #AOM_CODEC_ERROR
* Decoder context not initialized, or algorithm not capable of
* using external frame buffers.
*
* \note
* When decoding AV1, the application may be required to pass in at least
* #AOM_MAXIMUM_WORK_BUFFERS external frame
* buffers.
*/
aom_codec_err_t aom_codec_set_frame_buffer_functions(
aom_codec_ctx_t *ctx, aom_get_frame_buffer_cb_fn_t cb_get,
aom_release_frame_buffer_cb_fn_t cb_release, void *cb_priv);
/*!@} - end defgroup cap_external_frame_buffer */
/*!@} - end defgroup decoder*/
#ifdef __cplusplus
}
#endif
#endif // AOM_AOM_DECODER_H_

837
aom/aom_encoder.h Normal file

@ -0,0 +1,837 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_AOM_ENCODER_H_
#define AOM_AOM_ENCODER_H_
/*!\defgroup encoder Encoder Algorithm Interface
* \ingroup codec
* This abstraction allows applications using this encoder to easily support
* multiple video formats with minimal code duplication. This section describes
* the interface common to all encoders.
* @{
*/
/*!\file
* \brief Describes the encoder algorithm interface to applications.
*
* This file describes the interface between an application and a
* video encoder algorithm.
*
*/
#ifdef __cplusplus
extern "C" {
#endif
#include "./aom_codec.h"
/*!\brief Current ABI version number
*
* \internal
* If this file is altered in any way that changes the ABI, this value
* must be bumped. Examples include, but are not limited to, changing
* types, removing or reassigning enums, adding/removing/rearranging
* fields to structures
*/
#define AOM_ENCODER_ABI_VERSION \
(5 + AOM_CODEC_ABI_VERSION) /**<\hideinitializer*/
/*! \brief Encoder capabilities bitfield
*
* Each encoder advertises the capabilities it supports as part of its
* ::aom_codec_iface_t interface structure. Capabilities are extra
* interfaces or functionality, and are not required to be supported
* by an encoder.
*
* The available flags are specified by AOM_CODEC_CAP_* defines.
*/
#define AOM_CODEC_CAP_PSNR 0x10000 /**< Can issue PSNR packets */
/*! Can output one partition at a time. Each partition is returned in its
* own AOM_CODEC_CX_FRAME_PKT, with the FRAME_IS_FRAGMENT flag set for
* every partition but the last. In this mode all frames are always
* returned partition by partition.
*/
#define AOM_CODEC_CAP_OUTPUT_PARTITION 0x20000
/*! Can support input images at bit depths greater than 8.
*/
#define AOM_CODEC_CAP_HIGHBITDEPTH 0x40000
/*! \brief Initialization-time Feature Enabling
*
* Certain codec features must be known at initialization time, to allow
* for proper memory allocation.
*
* The available flags are specified by AOM_CODEC_USE_* defines.
*/
#define AOM_CODEC_USE_PSNR 0x10000 /**< Calculate PSNR on each frame */
/*!\brief Make the encoder output one partition at a time. */
#define AOM_CODEC_USE_OUTPUT_PARTITION 0x20000
#define AOM_CODEC_USE_HIGHBITDEPTH 0x40000 /**< Use high bitdepth */
/*!\brief Generic fixed size buffer structure
*
* This structure is able to hold a reference to any fixed size buffer.
*/
typedef struct aom_fixed_buf {
void *buf; /**< Pointer to the data */
size_t sz; /**< Length of the buffer, in chars */
} aom_fixed_buf_t; /**< alias for struct aom_fixed_buf */
/*!\brief Time Stamp Type
*
* An integer, which when multiplied by the stream's time base, provides
* the absolute time of a sample.
*/
typedef int64_t aom_codec_pts_t;
/*!\brief Compressed Frame Flags
*
* This type represents a bitfield containing information about a compressed
* frame that may be useful to an application. The most significant 16 bits
* can be used by an algorithm to provide additional detail, for example to
* support frame types that are codec specific (MPEG-1 D-frames for example)
*/
typedef uint32_t aom_codec_frame_flags_t;
#define AOM_FRAME_IS_KEY 0x1 /**< frame is the start of a GOP */
/*!\brief frame can be dropped without affecting the stream (no future frame
* depends on this one) */
#define AOM_FRAME_IS_DROPPABLE 0x2
/*!\brief frame should be decoded but will not be shown */
#define AOM_FRAME_IS_INVISIBLE 0x4
/*!\brief this is a fragment of the encoded frame */
#define AOM_FRAME_IS_FRAGMENT 0x8
/*!\brief Error Resilient flags
*
* These flags define which error resilient features to enable in the
* encoder. The flags are specified through the
* aom_codec_enc_cfg::g_error_resilient variable.
*/
typedef uint32_t aom_codec_er_flags_t;
/*!\brief Improve resiliency against losses of whole frames */
#define AOM_ERROR_RESILIENT_DEFAULT 0x1
/*!\brief The frame partitions are independently decodable by the bool decoder,
* meaning that partitions can be decoded even though earlier partitions have
* been lost. Note that intra prediction is still done over the partition
* boundary. */
#define AOM_ERROR_RESILIENT_PARTITIONS 0x2
/*!\brief Encoder output packet variants
*
* This enumeration lists the different kinds of data packets that can be
* returned by calls to aom_codec_get_cx_data(). Algorithms \ref MAY
* extend this list to provide additional functionality.
*/
enum aom_codec_cx_pkt_kind {
AOM_CODEC_CX_FRAME_PKT, /**< Compressed video frame */
AOM_CODEC_STATS_PKT, /**< Two-pass statistics for this frame */
AOM_CODEC_FPMB_STATS_PKT, /**< first pass mb statistics for this frame */
AOM_CODEC_PSNR_PKT, /**< PSNR statistics for this frame */
AOM_CODEC_CUSTOM_PKT = 256 /**< Algorithm extensions */
};
/*!\brief Encoder output packet
*
* This structure contains the different kinds of output data the encoder
* may produce while compressing a frame.
*/
typedef struct aom_codec_cx_pkt {
enum aom_codec_cx_pkt_kind kind; /**< packet variant */
union {
struct {
void *buf; /**< compressed data buffer */
size_t sz; /**< length of compressed data */
/*!\brief time stamp to show frame (in timebase units) */
aom_codec_pts_t pts;
/*!\brief duration to show frame (in timebase units) */
unsigned long duration;
aom_codec_frame_flags_t flags; /**< flags for this frame */
/*!\brief the partition id defines the decoding order of the partitions.
* Only applicable when "output partition" mode is enabled. First
* partition has id 0.*/
int partition_id;
} frame; /**< data for compressed frame packet */
aom_fixed_buf_t twopass_stats; /**< data for two-pass packet */
aom_fixed_buf_t firstpass_mb_stats; /**< first pass mb packet */
struct aom_psnr_pkt {
unsigned int samples[4]; /**< Number of samples, total/y/u/v */
uint64_t sse[4]; /**< sum squared error, total/y/u/v */
double psnr[4]; /**< PSNR, total/y/u/v */
} psnr; /**< data for PSNR packet */
aom_fixed_buf_t raw; /**< data for arbitrary packets */
/* This packet size is fixed to allow codecs to extend this
* interface without having to manage storage for raw packets,
* i.e., if it's smaller than 128 bytes, you can store in the
* packet list directly.
*/
char pad[128 - sizeof(enum aom_codec_cx_pkt_kind)]; /**< fixed sz */
} data; /**< packet data */
} aom_codec_cx_pkt_t; /**< alias for struct aom_codec_cx_pkt */
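The `pad` member fixes a lower bound on the size of the packet union, so a small codec-specific payload can live inside the packet itself. A minimal sketch of that idea follows, assuming a cut-down packet type. All `demo_` names are hypothetical, and copying a payload into `pad` is one reading of the comment above, not a documented libaom helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum demo_pkt_kind { DEMO_FRAME_PKT, DEMO_CUSTOM_PKT = 256 };

/* Cut-down packet with the same padding trick as aom_codec_cx_pkt. */
typedef struct demo_pkt {
  enum demo_pkt_kind kind;
  union {
    struct {
      void *buf;
      size_t sz;
      int64_t pts;
    } frame;
    char pad[128 - sizeof(enum demo_pkt_kind)]; /* fixed-size floor */
  } data;
} demo_pkt_t;

/* Hypothetical helper: store a small custom payload inline.
 * Returns 0 on success, -1 if the payload does not fit in the padding. */
static int demo_pkt_store_inline(demo_pkt_t *pkt, const void *src, size_t n) {
  if (n > sizeof(pkt->data.pad)) return -1;
  memcpy(pkt->data.pad, src, n);
  pkt->kind = DEMO_CUSTOM_PKT;
  return 0;
}
```

Because the union is never smaller than `pad`, adding a new small variant does not change the packet size, which is what lets extensions avoid managing separate storage.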
/*!\brief Rational Number
*
* This structure holds a fractional value.
*/
typedef struct aom_rational {
int num; /**< fraction numerator */
int den; /**< fraction denominator */
} aom_rational_t; /**< alias for struct aom_rational */
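A timestamp of type aom_codec_pts_t becomes an absolute time once multiplied by the stream's time base, which is stored as a rational like the one above. The conversion can be sketched as follows. `demo_rational_t` mirrors the num/den layout, and `demo_pts_to_seconds` is a hypothetical helper, not part of the API.

```c
#include <stdint.h>

/* Mirrors the num/den layout of aom_rational; hypothetical stand-in. */
typedef struct demo_rational {
  int num; /* fraction numerator */
  int den; /* fraction denominator */
} demo_rational_t;

/* seconds = pts * (num / den) */
static double demo_pts_to_seconds(int64_t pts, demo_rational_t timebase) {
  return (double)pts * timebase.num / timebase.den;
}
```

With a time base of 1001/30000 (29.97 Hz NTSC), pts equals the frame number, so frame 30000 falls at 1001 seconds. With 1/1000 the pts is simply milliseconds.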
/*!\brief Multi-pass Encoding Pass */
enum aom_enc_pass {
AOM_RC_ONE_PASS, /**< Single pass mode */
AOM_RC_FIRST_PASS, /**< First pass of multi-pass mode */
AOM_RC_LAST_PASS /**< Final pass of multi-pass mode */
};
/*!\brief Rate control mode */
enum aom_rc_mode {
AOM_VBR, /**< Variable Bit Rate (VBR) mode */
AOM_CBR, /**< Constant Bit Rate (CBR) mode */
AOM_CQ, /**< Constrained Quality (CQ) mode */
AOM_Q, /**< Constant Quality (Q) mode */
};
/*!\brief Keyframe placement mode.
*
* This enumeration determines whether keyframes are placed automatically by
* the encoder or whether this behavior is disabled. Older releases of this
* SDK were implemented such that AOM_KF_FIXED meant keyframes were disabled.
* This name is confusing for this behavior, so the new symbols to be used
* are AOM_KF_AUTO and AOM_KF_DISABLED.
*/
enum aom_kf_mode {
AOM_KF_FIXED, /**< deprecated, implies AOM_KF_DISABLED */
AOM_KF_AUTO, /**< Encoder determines optimal placement automatically */
AOM_KF_DISABLED = 0 /**< Encoder does not place keyframes. */
};
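Because AOM_KF_DISABLED is explicitly assigned 0, it aliases the deprecated AOM_KF_FIXED, and AOM_KF_AUTO keeps the value 1. The sketch below reproduces that layout with hypothetical `DEMO_` names to show that code comparing against either symbol behaves identically.

```c
/* Same enumerator layout as aom_kf_mode; names are hypothetical. */
enum demo_kf_mode {
  DEMO_KF_FIXED,       /* implicit value 0 (deprecated spelling) */
  DEMO_KF_AUTO,        /* implicit value 1 */
  DEMO_KF_DISABLED = 0 /* explicit alias of DEMO_KF_FIXED */
};

/* Hypothetical helper: nonzero when the encoder places keyframes itself. */
static int demo_auto_keyframes(enum demo_kf_mode mode) {
  return mode != DEMO_KF_DISABLED;
}
```

Older callers that still pass the FIXED spelling therefore get the DISABLED behavior without any source change.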
/*!\brief Encoded Frame Flags
*
* This type indicates a bitfield to be passed to aom_codec_encode(), defining
* per-frame boolean values. By convention, bits common to all codecs will be
* named AOM_EFLAG_*, and bits specific to an algorithm will be named
* /algo/_eflag_*. The lower order 16 bits are reserved for common use.
*/
typedef long aom_enc_frame_flags_t;
#define AOM_EFLAG_FORCE_KF (1 << 0) /**< Force this frame to be a keyframe */
/*!\brief Encoder configuration structure
*
* This structure contains the encoder settings that have common representations
* across all codecs. This doesn't imply that all codecs support all features,
* however.
*/
typedef struct aom_codec_enc_cfg {
/*
* generic settings (g)
*/
/*!\brief Algorithm specific "usage" value
*
* Algorithms may define multiple values for usage, which may convey the
* intent of how the application intends to use the stream. If this value
* is non-zero, consult the documentation for the codec to determine its
* meaning.
*/
unsigned int g_usage;
/*!\brief Maximum number of threads to use
*
* For multi-threaded implementations, use no more than this number of
* threads. The codec may use fewer threads than allowed. The value
* 0 is equivalent to the value 1.
*/
unsigned int g_threads;
/*!\brief Bitstream profile to use
*
* Some codecs support a notion of multiple bitstream profiles. Typically
* this maps to a set of features that are turned on or off. Often the
* profile to use is determined by the features of the intended decoder.
* Consult the documentation for the codec to determine the valid values
* for this parameter, or set to zero for a sane default.
*/
unsigned int g_profile; /**< profile of bitstream to use */
/*!\brief Width of the frame
*
* This value identifies the presentation resolution of the frame,
* in pixels. Note that the frames passed as input to the encoder must
* have this resolution. Frames will be presented by the decoder in this
* resolution, independent of any spatial resampling the encoder may do.
*/
unsigned int g_w;
/*!\brief Height of the frame
*
* This value identifies the presentation resolution of the frame,
* in pixels. Note that the frames passed as input to the encoder must
* have this resolution. Frames will be presented by the decoder in this
* resolution, independent of any spatial resampling the encoder may do.
*/
unsigned int g_h;
/*!\brief Bit-depth of the codec
*
* This value identifies the bit depth of the codec. Only certain bit
* depths are supported, as identified in the aom_bit_depth_t enum.
*/
aom_bit_depth_t g_bit_depth;
/*!\brief Bit-depth of the input frames
*
* This value identifies the bit depth of the input frames, in bits.
* Note that the frames passed as input to the encoder must have
* this bit depth.
*/
unsigned int g_input_bit_depth;
/*!\brief Stream timebase units
*
* Indicates the smallest interval of time, in seconds, used by the stream.
* For fixed frame rate material, or variable frame rate material where
* frames are timed at a multiple of a given clock (ex: video capture),
* the \ref RECOMMENDED method is to set the timebase to the reciprocal
* of the frame rate (ex: 1001/30000 for 29.970 Hz NTSC). This allows the
* pts to correspond to the frame number, which can be handy. For
* re-encoding video from containers with absolute time timestamps, the
* \ref RECOMMENDED method is to set the timebase to that of the parent
* container or multimedia framework (ex: 1/1000 for ms, as in FLV).
*/
struct aom_rational g_timebase;
/*!\brief Enable error resilient modes.
*
* The error resilient bitfield indicates to the encoder which features
* it should enable to take measures for streaming over lossy or noisy
* links.
*/
aom_codec_er_flags_t g_error_resilient;
/*!\brief Multi-pass Encoding Mode
*
* This value should be set to the current phase for multi-pass encoding.
* For single pass, set to #AOM_RC_ONE_PASS.
*/
enum aom_enc_pass g_pass;
/*!\brief Allow lagged encoding
*
* If set, this value allows the encoder to consume a number of input
* frames before producing output frames. This allows the encoder to
* base decisions for the current frame on future frames. This does
* increase the latency of the encoding pipeline, so it is not appropriate
* in all situations (ex: realtime encoding).
*
* Note that this is a maximum value -- the encoder may produce frames
* sooner than the given limit. Set this value to 0 to disable this
* feature.
*/
unsigned int g_lag_in_frames;
/*
* rate control settings (rc)
*/
/*!\brief Temporal resampling configuration, if supported by the codec.
*
* Temporal resampling allows the codec to "drop" frames as a strategy to
* meet its target data rate. This can cause temporal discontinuities in
* the encoded video, which may appear as stuttering during playback. This
* trade-off is often acceptable, but for many applications is not. It can
* be disabled in these cases.
*
* Note that not all codecs support this feature. All aom AVx codecs do.
* For other codecs, consult the documentation for that algorithm.
*
* This threshold is described as a percentage of the target data buffer.
* When the data buffer falls below this percentage of fullness, a
* dropped frame is indicated. Set the threshold to zero (0) to disable
* this feature.
*/
unsigned int rc_dropframe_thresh;
/*!\brief Enable/disable spatial resampling, if supported by the codec.
*
* Spatial resampling allows the codec to compress a lower resolution
* version of the frame, which is then upscaled by the decoder to the
* correct presentation resolution. This increases visual quality at
* low data rates, at the expense of CPU time on the encoder/decoder.
*/
unsigned int rc_resize_allowed;
/*!\brief Internal coded frame width.
*
* If spatial resampling is enabled this specifies the width of the
* encoded frame.
*/
unsigned int rc_scaled_width;
/*!\brief Internal coded frame height.
*
* If spatial resampling is enabled this specifies the height of the
* encoded frame.
*/
unsigned int rc_scaled_height;
/*!\brief Spatial resampling up watermark.
*
* This threshold is described as a percentage of the target data buffer.
* When the data buffer rises above this percentage of fullness, the
* encoder will step up to a higher resolution version of the frame.
*/
unsigned int rc_resize_up_thresh;
/*!\brief Spatial resampling down watermark.
*
* This threshold is described as a percentage of the target data buffer.
* When the data buffer falls below this percentage of fullness, the
* encoder will step down to a lower resolution version of the frame.
*/
unsigned int rc_resize_down_thresh;
/*!\brief Rate control algorithm to use.
*
* Indicates whether the end usage of this stream is to be streamed over
* a bandwidth constrained link, indicating that Constant Bit Rate (CBR)
* mode should be used, or whether it will be played back on a high
* bandwidth link, as from a local disk, where higher variations in
* bitrate are acceptable.
*/
enum aom_rc_mode rc_end_usage;
/*!\brief Two-pass stats buffer.
*
* A buffer containing all of the stats packets produced in the first
* pass, concatenated.
*/
aom_fixed_buf_t rc_twopass_stats_in;
/*!\brief first pass mb stats buffer.
*
* A buffer containing all of the first pass mb stats packets produced
* in the first pass, concatenated.
*/
aom_fixed_buf_t rc_firstpass_mb_stats_in;
/*!\brief Target data rate
*
* Target bandwidth to use for this stream, in kilobits per second.
*/
unsigned int rc_target_bitrate;
/*
* quantizer settings
*/
/*!\brief Minimum (Best Quality) Quantizer
*
* The quantizer is the most direct control over the quality of the
* encoded image. The range of valid values for the quantizer is codec
* specific. Consult the documentation for the codec to determine the
* values to use. To determine the range programmatically, call
* aom_codec_enc_config_default() with a usage value of 0.
*/
unsigned int rc_min_quantizer;
/*!\brief Maximum (Worst Quality) Quantizer
*
* The quantizer is the most direct control over the quality of the
* encoded image. The range of valid values for the quantizer is codec
* specific. Consult the documentation for the codec to determine the
* values to use. To determine the range programmatically, call
* aom_codec_enc_config_default() with a usage value of 0.
*/
unsigned int rc_max_quantizer;
/*
* bitrate tolerance
*/
/*!\brief Rate control adaptation undershoot control
*
* This value, expressed as a percentage of the target bitrate,
* controls the maximum allowed adaptation speed of the codec.
* This factor controls the maximum amount of bits that can
* be subtracted from the target bitrate in order to compensate
* for prior overshoot.
*
* Valid values in the range 0-1000.
*/
unsigned int rc_undershoot_pct;
/*!\brief Rate control adaptation overshoot control
*
* This value, expressed as a percentage of the target bitrate,
* controls the maximum allowed adaptation speed of the codec.
* This factor controls the maximum amount of bits that can
* be added to the target bitrate in order to compensate for
* prior undershoot.
*
* Valid values in the range 0-1000.
*/
unsigned int rc_overshoot_pct;
/*
* decoder buffer model parameters
*/
/*!\brief Decoder Buffer Size
*
* This value indicates the amount of data that may be buffered by the
* decoding application. Note that this value is expressed in units of
* time (milliseconds). For example, a value of 5000 indicates that the
* client will buffer (at least) 5000ms worth of encoded data. Use the
* target bitrate (#rc_target_bitrate) to convert to bits/bytes, if
* necessary.
*/
unsigned int rc_buf_sz;
/*!\brief Decoder Buffer Initial Size
*
* This value indicates the amount of data that will be buffered by the
* decoding application prior to beginning playback. This value is
* expressed in units of time (milliseconds). Use the target bitrate
* (#rc_target_bitrate) to convert to bits/bytes, if necessary.
*/
unsigned int rc_buf_initial_sz;
/*!\brief Decoder Buffer Optimal Size
*
* This value indicates the amount of data that the encoder should try
* to maintain in the decoder's buffer. This value is expressed in units
* of time (milliseconds). Use the target bitrate (#rc_target_bitrate)
* to convert to bits/bytes, if necessary.
*/
unsigned int rc_buf_optimal_sz;
/*
* 2 pass rate control parameters
*/
/*!\brief Two-pass mode CBR/VBR bias
*
* Bias, expressed on a scale of 0 to 100, for determining target size
* for the current frame. The value 0 indicates the optimal CBR mode
* value should be used. The value 100 indicates the optimal VBR mode
* value should be used. Values in between indicate which way the
* encoder should "lean."
*/
unsigned int rc_2pass_vbr_bias_pct;
/*!\brief Two-pass mode per-GOP minimum bitrate
*
* This value, expressed as a percentage of the target bitrate, indicates
* the minimum bitrate to be used for a single GOP (aka "section")
*/
unsigned int rc_2pass_vbr_minsection_pct;
/*!\brief Two-pass mode per-GOP maximum bitrate
*
* This value, expressed as a percentage of the target bitrate, indicates
* the maximum bitrate to be used for a single GOP (aka "section")
*/
unsigned int rc_2pass_vbr_maxsection_pct;
/*
* keyframing settings (kf)
*/
/*!\brief Keyframe placement mode
*
* This value indicates whether the encoder should place keyframes at a
* fixed interval, or determine the optimal placement automatically
* (as governed by the #kf_min_dist and #kf_max_dist parameters)
*/
enum aom_kf_mode kf_mode;
/*!\brief Keyframe minimum interval
*
* This value, expressed as a number of frames, prevents the encoder from
* placing a keyframe nearer than kf_min_dist to the previous keyframe. At
* least kf_min_dist non-keyframes will be coded before the next
* keyframe. Set kf_min_dist equal to kf_max_dist for a fixed interval.
*/
unsigned int kf_min_dist;
/*!\brief Keyframe maximum interval
*
* This value, expressed as a number of frames, forces the encoder to code
* a keyframe if one has not been coded in the last kf_max_dist frames.
* A value of 0 implies all frames will be keyframes. Set kf_min_dist
* equal to kf_max_dist for a fixed interval.
*/
unsigned int kf_max_dist;
} aom_codec_enc_cfg_t; /**< alias for struct aom_codec_enc_cfg */
/*!\brief Initialize an encoder instance
*
* Initializes an encoder context using the given interface. Applications
* should call the aom_codec_enc_init convenience macro instead of this
* function directly, to ensure that the ABI version number parameter
* is properly initialized.
*
* If the library was configured with --disable-multithread, this call
* is not thread safe and should be guarded with a lock if being used
* in a multithreaded context.
*
* \param[in] ctx Pointer to this instance's context.
* \param[in] iface Pointer to the algorithm interface to use.
* \param[in] cfg Configuration to use, if known. May be NULL.
* \param[in] flags Bitfield of AOM_CODEC_USE_* flags
* \param[in] ver ABI version number. Must be set to
* AOM_ENCODER_ABI_VERSION
* \retval #AOM_CODEC_OK
* The encoder algorithm initialized.
* \retval #AOM_CODEC_MEM_ERROR
* Memory allocation failed.
*/
aom_codec_err_t aom_codec_enc_init_ver(aom_codec_ctx_t *ctx,
aom_codec_iface_t *iface,
const aom_codec_enc_cfg_t *cfg,
aom_codec_flags_t flags, int ver);
/*!\brief Convenience macro for aom_codec_enc_init_ver()
*
* Ensures the ABI version parameter is properly set.
*/
#define aom_codec_enc_init(ctx, iface, cfg, flags) \
aom_codec_enc_init_ver(ctx, iface, cfg, flags, AOM_ENCODER_ABI_VERSION)
/*!\brief Initialize multi-encoder instance
*
* Initializes multi-encoder context using the given interface.
* Applications should call the aom_codec_enc_init_multi convenience macro
* instead of this function directly, to ensure that the ABI version number
* parameter is properly initialized.
*
* \param[in] ctx Pointer to this instance's context.
* \param[in] iface Pointer to the algorithm interface to use.
* \param[in] cfg Configuration to use, if known. May be NULL.
* \param[in] num_enc Total number of encoders.
* \param[in] flags Bitfield of AOM_CODEC_USE_* flags
* \param[in] dsf Pointer to down-sampling factors.
* \param[in] ver ABI version number. Must be set to
* AOM_ENCODER_ABI_VERSION
* \retval #AOM_CODEC_OK
* The encoder algorithm initialized.
* \retval #AOM_CODEC_MEM_ERROR
* Memory allocation failed.
*/
aom_codec_err_t aom_codec_enc_init_multi_ver(
aom_codec_ctx_t *ctx, aom_codec_iface_t *iface, aom_codec_enc_cfg_t *cfg,
int num_enc, aom_codec_flags_t flags, aom_rational_t *dsf, int ver);
/*!\brief Convenience macro for aom_codec_enc_init_multi_ver()
*
* Ensures the ABI version parameter is properly set.
*/
#define aom_codec_enc_init_multi(ctx, iface, cfg, num_enc, flags, dsf) \
aom_codec_enc_init_multi_ver(ctx, iface, cfg, num_enc, flags, dsf, \
AOM_ENCODER_ABI_VERSION)
/*!\brief Get a default configuration
*
* Initializes an encoder configuration structure with default values. Supports
* the notion of "usages" so that an algorithm may offer different default
* settings depending on the user's intended goal. This function \ref SHOULD
* be called by all applications to initialize the configuration structure
* before specializing the configuration with application specific values.
*
* \param[in] iface Pointer to the algorithm interface to use.
* \param[out] cfg Configuration buffer to populate.
* \param[in] reserved Must be set to 0 for VP8 and AV1.
*
* \retval #AOM_CODEC_OK
* The configuration was populated.
* \retval #AOM_CODEC_INCAPABLE
* Interface is not an encoder interface.
* \retval #AOM_CODEC_INVALID_PARAM
* A parameter was NULL, or the usage value was not recognized.
*/
aom_codec_err_t aom_codec_enc_config_default(aom_codec_iface_t *iface,
aom_codec_enc_cfg_t *cfg,
unsigned int reserved);
/*!\brief Set or change configuration
*
* Reconfigures an encoder instance according to the given configuration.
*
* \param[in] ctx Pointer to this instance's context
* \param[in] cfg Configuration buffer to use
*
* \retval #AOM_CODEC_OK
* The configuration was applied.
* \retval #AOM_CODEC_INCAPABLE
* Interface is not an encoder interface.
* \retval #AOM_CODEC_INVALID_PARAM
* A parameter was NULL, or the usage value was not recognized.
*/
aom_codec_err_t aom_codec_enc_config_set(aom_codec_ctx_t *ctx,
const aom_codec_enc_cfg_t *cfg);
/*!\brief Get global stream headers
*
* Retrieves a stream level global header packet, if supported by the codec.
*
* \param[in] ctx Pointer to this instance's context
*
* \retval NULL
* Encoder does not support global header
* \retval Non-NULL
* Pointer to buffer containing global header packet
*/
aom_fixed_buf_t *aom_codec_get_global_headers(aom_codec_ctx_t *ctx);
/*!\brief deadline parameter analogous to AVx REALTIME mode. */
#define AOM_DL_REALTIME (1)
/*!\brief deadline parameter analogous to AVx GOOD QUALITY mode. */
#define AOM_DL_GOOD_QUALITY (1000000)
/*!\brief deadline parameter analogous to AVx BEST QUALITY mode. */
#define AOM_DL_BEST_QUALITY (0)
/*!\brief Encode a frame
*
* Encodes a video frame at the given "presentation time." The presentation
* time stamp (PTS) \ref MUST be strictly increasing.
*
* The encoder supports the notion of a soft real-time deadline. Given a
* non-zero value to the deadline parameter, the encoder will make a "best
* effort" guarantee to return before the given time slice expires. It is
* implicit that limiting the available time to encode will degrade the
* output quality. The encoder can be given an unlimited time to produce the
* best possible frame by specifying a deadline of '0'. This deadline
* supersedes the AVx notion of "best quality, good quality, realtime".
* Applications that wish to map these former settings to the new deadline
* based system can use the symbols #AOM_DL_REALTIME, #AOM_DL_GOOD_QUALITY,
* and #AOM_DL_BEST_QUALITY.
*
* When the last frame has been passed to the encoder, this function should
* continue to be called, with the img parameter set to NULL. This will
* signal the end-of-stream condition to the encoder and allow it to encode
* any held buffers. Encoding is complete when aom_codec_encode() is called
* and aom_codec_get_cx_data() returns no data.
*
* \param[in] ctx Pointer to this instance's context
* \param[in] img Image data to encode, NULL to flush.
* \param[in] pts Presentation time stamp, in timebase units.
* \param[in] duration Duration to show frame, in timebase units.
* \param[in] flags Flags to use for encoding this frame.
* \param[in] deadline Time to spend encoding, in microseconds. (0=infinite)
*
* \retval #AOM_CODEC_OK
* The call completed successfully.
* \retval #AOM_CODEC_INCAPABLE
* Interface is not an encoder interface.
* \retval #AOM_CODEC_INVALID_PARAM
* A parameter was NULL, the image format is unsupported, etc.
*/
aom_codec_err_t aom_codec_encode(aom_codec_ctx_t *ctx, const aom_image_t *img,
aom_codec_pts_t pts, unsigned long duration,
aom_enc_frame_flags_t flags,
unsigned long deadline);
/*!\brief Set compressed data output buffer
*
* Sets the buffer that the codec should output the compressed data
* into. This call effectively sets the buffer pointer returned in the
* next AOM_CODEC_CX_FRAME_PKT packet. Subsequent packets will be
* appended into this buffer. The buffer is preserved across frames,
* so applications must periodically call this function after flushing
* the accumulated compressed data to disk or to the network to reset
* the pointer to the buffer's head.
*
* `pad_before` bytes will be skipped before writing the compressed
* data, and `pad_after` bytes will be appended to the packet. The size
* of the packet will be the sum of the size of the actual compressed
* data, pad_before, and pad_after. The padding bytes will be preserved
* (not overwritten).
*
* Note that calling this function does not guarantee that the returned
* compressed data will be placed into the specified buffer. In the
* event that the encoded data will not fit into the buffer provided,
* the returned packet \ref MAY point to an internal buffer, as it would
* if this call were never used. In this event, the output packet will
* NOT have any padding, and the application must free space and copy it
* to the proper place. This is of particular note in configurations
* that may output multiple packets for a single encoded frame (e.g., lagged
* encoding) or if the application does not reset the buffer periodically.
*
* Applications may restore the default behavior of the codec providing
* the compressed data buffer by calling this function with a NULL
* buffer.
*
* Applications \ref MUSTNOT call this function during iteration of
* aom_codec_get_cx_data().
*
* \param[in] ctx Pointer to this instance's context
* \param[in] buf Buffer to store compressed data into
* \param[in] pad_before Bytes to skip before writing compressed data
* \param[in] pad_after Bytes to skip after writing compressed data
*
* \retval #AOM_CODEC_OK
* The buffer was set successfully.
* \retval #AOM_CODEC_INVALID_PARAM
* A parameter was NULL, the image format is unsupported, etc.
*/
aom_codec_err_t aom_codec_set_cx_data_buf(aom_codec_ctx_t *ctx,
const aom_fixed_buf_t *buf,
unsigned int pad_before,
unsigned int pad_after);
/*!\brief Encoded data iterator
*
* Iterates over a list of data packets to be passed from the encoder to the
* application. The different kinds of packets available are enumerated in
* #aom_codec_cx_pkt_kind.
*
* #AOM_CODEC_CX_FRAME_PKT packets should be passed to the application's
* muxer. Multiple compressed frames may be in the list.
* #AOM_CODEC_STATS_PKT packets should be appended to a global buffer.
*
* The application \ref MUST silently ignore any packet kinds that it does
* not recognize or support.
*
* The data buffers returned from this function are only guaranteed to be
* valid until the application makes another call to any aom_codec_* function.
*
* \param[in] ctx Pointer to this instance's context
* \param[in,out] iter Iterator storage, initialized to NULL
*
* \return Returns a pointer to an output data packet (compressed frame data,
* two-pass statistics, etc.) or NULL to signal end-of-list.
*
*/
const aom_codec_cx_pkt_t *aom_codec_get_cx_data(aom_codec_ctx_t *ctx,
aom_codec_iter_t *iter);
/*!\brief Get Preview Frame
*
* Returns an image that can be used as a preview. Shows the image as it would
* exist at the decompressor. The application \ref MUST NOT write into this
* image buffer.
*
* \param[in] ctx Pointer to this instance's context
*
* \return Returns a pointer to a preview image, or NULL if no image is
* available.
*
*/
const aom_image_t *aom_codec_get_preview_frame(aom_codec_ctx_t *ctx);
/*!@} - end defgroup encoder*/
#ifdef __cplusplus
}
#endif
#endif // AOM_AOM_ENCODER_H_
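The decoder buffer fields above (rc_buf_sz, rc_buf_initial_sz, rc_buf_optimal_sz) are expressed in milliseconds, and their comments say to use rc_target_bitrate to convert to bits or bytes. A minimal sketch of that conversion follows. The helper names are hypothetical and not part of the aom API, and the sketch assumes 1 kilobit = 1000 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the aom API): converts a buffer size
 * given in milliseconds into bits for a target bitrate in kilobits per
 * second. Assumes 1 kilobit = 1000 bits, so
 *   bits = (ms / 1000) s * (kbps * 1000) bits/s = ms * kbps. */
uint64_t example_buf_ms_to_bits(unsigned int buf_ms,
                                unsigned int target_kbps) {
  return (uint64_t)buf_ms * (uint64_t)target_kbps;
}

/* Same quantity in whole bytes, rounded down. */
uint64_t example_buf_ms_to_bytes(unsigned int buf_ms,
                                 unsigned int target_kbps) {
  return example_buf_ms_to_bits(buf_ms, target_kbps) / 8;
}
```

For example, a 5000 ms buffer at a target of 256 kilobits per second corresponds to 1,280,000 bits, or 160,000 bytes.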


@ -1,15 +1,16 @@
/*
* Copyright (c) 2014 The WebM project authors. All Rights Reserved.
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef VPX_VPX_FRAME_BUFFER_H_
#define VPX_VPX_FRAME_BUFFER_H_
#ifndef AOM_AOM_FRAME_BUFFER_H_
#define AOM_AOM_FRAME_BUFFER_H_
/*!\file
* \brief Describes the decoder external frame buffer interface.
@ -19,28 +20,28 @@
extern "C" {
#endif
#include "./vpx_integer.h"
#include "./aom_integer.h"
/*!\brief The maximum number of work buffers used by libvpx.
/*!\brief The maximum number of work buffers used by libaom.
* Support maximum 4 threads to decode video in parallel.
* Each thread will use one work buffer.
* TODO(hkuang): Add support to set number of worker threads dynamically.
*/
#define VPX_MAXIMUM_WORK_BUFFERS 8
#define AOM_MAXIMUM_WORK_BUFFERS 8
/*!\brief The maximum number of reference buffers that a VP9 encoder may use.
/*!\brief The maximum number of reference buffers that an AV1 encoder may use.
*/
#define VP9_MAXIMUM_REF_BUFFERS 8
#define AOM_MAXIMUM_REF_BUFFERS 8
/*!\brief External frame buffer
*
* This structure holds allocated frame buffers used by the decoder.
*/
typedef struct vpx_codec_frame_buffer {
typedef struct aom_codec_frame_buffer {
uint8_t *data; /**< Pointer to the data buffer */
size_t size; /**< Size of data in bytes */
void *priv; /**< Frame's private data */
} vpx_codec_frame_buffer_t;
} aom_codec_frame_buffer_t;
/*!\brief get frame buffer callback prototype
*
@ -51,17 +52,17 @@ typedef struct vpx_codec_frame_buffer {
* to the allocated size. The application does not need to align the allocated
* data. The callback is triggered when the decoder needs a frame buffer to
* decode a compressed image into. This function may be called more than once
* for every call to vpx_codec_decode. The application may set fb->priv to
* for every call to aom_codec_decode. The application may set fb->priv to
* some data which will be passed back in the ximage and the release function
* call. |fb| is guaranteed to not be NULL. On success the callback must
* return 0. On any failure the callback must return a value less than 0.
*
* \param[in] priv Callback's private data
* \param[in] min_size Minimum size in bytes needed by the buffer
* \param[in,out] fb Pointer to vpx_codec_frame_buffer_t
* \param[in,out] fb Pointer to aom_codec_frame_buffer_t
*/
typedef int (*vpx_get_frame_buffer_cb_fn_t)(
void *priv, size_t min_size, vpx_codec_frame_buffer_t *fb);
typedef int (*aom_get_frame_buffer_cb_fn_t)(void *priv, size_t min_size,
aom_codec_frame_buffer_t *fb);
/*!\brief release frame buffer callback prototype
*
@ -71,13 +72,13 @@ typedef int (*vpx_get_frame_buffer_cb_fn_t)(
* a value less than 0.
*
* \param[in] priv Callback's private data
* \param[in] fb Pointer to vpx_codec_frame_buffer_t
* \param[in] fb Pointer to aom_codec_frame_buffer_t
*/
typedef int (*vpx_release_frame_buffer_cb_fn_t)(
void *priv, vpx_codec_frame_buffer_t *fb);
typedef int (*aom_release_frame_buffer_cb_fn_t)(void *priv,
aom_codec_frame_buffer_t *fb);
#ifdef __cplusplus
} // extern "C"
#endif
#endif // VPX_VPX_FRAME_BUFFER_H_
#endif // AOM_AOM_FRAME_BUFFER_H_
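The callback contract described above (fill in the buffer descriptor, return 0 on success and a negative value on failure) can be sketched as a get/release pair. This is an illustration only: the struct is a local mirror of aom_codec_frame_buffer_t so the sketch compiles on its own, the function names are hypothetical, and zero-filling the allocation with calloc is a conservative assumption rather than something the visible part of the header states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirror of aom_codec_frame_buffer_t (same three fields). */
typedef struct example_frame_buffer {
  uint8_t *data; /* Pointer to the data buffer */
  size_t size;   /* Size of data in bytes */
  void *priv;    /* Frame's private data */
} example_frame_buffer_t;

/* Hypothetical "get" callback: allocates at least min_size zeroed bytes.
 * Returns 0 on success and a negative value on failure. */
int example_get_frame_buffer(void *priv, size_t min_size,
                             example_frame_buffer_t *fb) {
  (void)priv; /* No per-application state in this sketch. */
  assert(fb != NULL);
  size_t alloc_size = min_size ? min_size : 1;
  uint8_t *data = (uint8_t *)calloc(alloc_size, 1);
  if (data == NULL) return -1;
  fb->data = data;
  fb->size = alloc_size; /* Report the size actually allocated. */
  fb->priv = NULL;
  return 0;
}

/* Hypothetical "release" callback: frees the buffer and clears the
 * descriptor. Returns 0 on success. */
int example_release_frame_buffer(void *priv, example_frame_buffer_t *fb) {
  (void)priv;
  assert(fb != NULL);
  free(fb->data);
  fb->data = NULL;
  fb->size = 0;
  return 0;
}
```

A real application would more likely keep a pool of buffers reachable through priv and reuse them across frames instead of allocating on every call.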

225
aom/aom_image.h Normal file

@ -0,0 +1,225 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
/*!\file
* \brief Describes the aom image descriptor and associated operations
*
*/
#ifndef AOM_AOM_IMAGE_H_
#define AOM_AOM_IMAGE_H_
#ifdef __cplusplus
extern "C" {
#endif
/*!\brief Current ABI version number
*
* \internal
* If this file is altered in any way that changes the ABI, this value
* must be bumped. Examples include, but are not limited to, changing
* types, removing or reassigning enums, adding/removing/rearranging
* fields to structures
*/
#define AOM_IMAGE_ABI_VERSION (4) /**<\hideinitializer*/
#define AOM_IMG_FMT_PLANAR 0x100 /**< Image is a planar format. */
#define AOM_IMG_FMT_UV_FLIP 0x200 /**< V plane precedes U in memory. */
#define AOM_IMG_FMT_HAS_ALPHA 0x400 /**< Image has an alpha channel. */
#define AOM_IMG_FMT_HIGHBITDEPTH 0x800 /**< Image uses 16bit framebuffer. */
/*!\brief List of supported image formats */
typedef enum aom_img_fmt {
AOM_IMG_FMT_NONE,
AOM_IMG_FMT_RGB24, /**< 24 bit per pixel packed RGB */
AOM_IMG_FMT_RGB32, /**< 32 bit per pixel packed 0RGB */
AOM_IMG_FMT_RGB565, /**< 16 bit per pixel, 565 */
AOM_IMG_FMT_RGB555, /**< 16 bit per pixel, 555 */
AOM_IMG_FMT_UYVY, /**< UYVY packed YUV */
AOM_IMG_FMT_YUY2, /**< YUYV packed YUV */
AOM_IMG_FMT_YVYU, /**< YVYU packed YUV */
AOM_IMG_FMT_BGR24, /**< 24 bit per pixel packed BGR */
AOM_IMG_FMT_RGB32_LE, /**< 32 bit packed BGR0 */
AOM_IMG_FMT_ARGB, /**< 32 bit packed ARGB, alpha=255 */
AOM_IMG_FMT_ARGB_LE, /**< 32 bit packed BGRA, alpha=255 */
AOM_IMG_FMT_RGB565_LE, /**< 16 bit per pixel, gggbbbbb rrrrrggg */
AOM_IMG_FMT_RGB555_LE, /**< 16 bit per pixel, gggbbbbb 0rrrrrgg */
AOM_IMG_FMT_YV12 =
AOM_IMG_FMT_PLANAR | AOM_IMG_FMT_UV_FLIP | 1, /**< planar YVU */
AOM_IMG_FMT_I420 = AOM_IMG_FMT_PLANAR | 2,
AOM_IMG_FMT_AOMYV12 = AOM_IMG_FMT_PLANAR | AOM_IMG_FMT_UV_FLIP |
3, /**< planar 4:2:0 format with aom color space */
AOM_IMG_FMT_AOMI420 = AOM_IMG_FMT_PLANAR | 4,
AOM_IMG_FMT_I422 = AOM_IMG_FMT_PLANAR | 5,
AOM_IMG_FMT_I444 = AOM_IMG_FMT_PLANAR | 6,
AOM_IMG_FMT_I440 = AOM_IMG_FMT_PLANAR | 7,
AOM_IMG_FMT_444A = AOM_IMG_FMT_PLANAR | AOM_IMG_FMT_HAS_ALPHA | 6,
AOM_IMG_FMT_I42016 = AOM_IMG_FMT_I420 | AOM_IMG_FMT_HIGHBITDEPTH,
AOM_IMG_FMT_I42216 = AOM_IMG_FMT_I422 | AOM_IMG_FMT_HIGHBITDEPTH,
AOM_IMG_FMT_I44416 = AOM_IMG_FMT_I444 | AOM_IMG_FMT_HIGHBITDEPTH,
AOM_IMG_FMT_I44016 = AOM_IMG_FMT_I440 | AOM_IMG_FMT_HIGHBITDEPTH
} aom_img_fmt_t; /**< alias for enum aom_img_fmt */
/*!\brief List of supported color spaces */
typedef enum aom_color_space {
AOM_CS_UNKNOWN = 0, /**< Unknown */
AOM_CS_BT_601 = 1, /**< BT.601 */
AOM_CS_BT_709 = 2, /**< BT.709 */
AOM_CS_SMPTE_170 = 3, /**< SMPTE.170 */
AOM_CS_SMPTE_240 = 4, /**< SMPTE.240 */
AOM_CS_BT_2020 = 5, /**< BT.2020 */
AOM_CS_RESERVED = 6, /**< Reserved */
AOM_CS_SRGB = 7 /**< sRGB */
} aom_color_space_t; /**< alias for enum aom_color_space */
/*!\brief List of supported color ranges */
typedef enum aom_color_range {
AOM_CR_STUDIO_RANGE = 0, /**< Y [16..235], UV [16..240] */
AOM_CR_FULL_RANGE = 1 /**< YUV/RGB [0..255] */
} aom_color_range_t; /**< alias for enum aom_color_range */
/**\brief Image Descriptor */
typedef struct aom_image {
aom_img_fmt_t fmt; /**< Image Format */
aom_color_space_t cs; /**< Color Space */
aom_color_range_t range; /**< Color Range */
/* Image storage dimensions */
unsigned int w; /**< Stored image width */
unsigned int h; /**< Stored image height */
unsigned int bit_depth; /**< Stored image bit-depth */
/* Image display dimensions */
unsigned int d_w; /**< Displayed image width */
unsigned int d_h; /**< Displayed image height */
/* Image intended rendering dimensions */
unsigned int r_w; /**< Intended rendering image width */
unsigned int r_h; /**< Intended rendering image height */
/* Chroma subsampling info */
unsigned int x_chroma_shift; /**< subsampling order, X */
unsigned int y_chroma_shift; /**< subsampling order, Y */
/* Image data pointers. */
#define AOM_PLANE_PACKED 0 /**< To be used for all packed formats */
#define AOM_PLANE_Y 0 /**< Y (Luminance) plane */
#define AOM_PLANE_U 1 /**< U (Chroma) plane */
#define AOM_PLANE_V 2 /**< V (Chroma) plane */
#define AOM_PLANE_ALPHA 3 /**< A (Transparency) plane */
unsigned char *planes[4]; /**< pointer to the top left pixel for each plane */
int stride[4]; /**< stride between rows for each plane */
int bps; /**< bits per sample (for packed formats) */
/*!\brief The following member may be set by the application to associate
* data with this image.
*/
void *user_priv;
/* The following members should be treated as private. */
unsigned char *img_data; /**< private */
int img_data_owner; /**< private */
int self_allocd; /**< private */
void *fb_priv; /**< Frame buffer data associated with the image. */
} aom_image_t; /**< alias for struct aom_image */
/**\brief Representation of a rectangle on a surface */
typedef struct aom_image_rect {
unsigned int x; /**< leftmost column */
unsigned int y; /**< topmost row */
unsigned int w; /**< width */
unsigned int h; /**< height */
} aom_image_rect_t; /**< alias for struct aom_image_rect */
/*!\brief Open a descriptor, allocating storage for the underlying image
*
* Returns a descriptor for storing an image of the given format. The
* storage for the descriptor is allocated on the heap.
*
* \param[in] img Pointer to storage for descriptor. If this parameter
* is NULL, the storage for the descriptor will be
* allocated on the heap.
* \param[in] fmt Format for the image
* \param[in] d_w Width of the image
* \param[in] d_h Height of the image
* \param[in] align Alignment, in bytes, of the image buffer and
* each row in the image(stride).
*
* \return Returns a pointer to the initialized image descriptor. If the img
* parameter is non-null, the value of the img parameter will be
* returned.
*/
aom_image_t *aom_img_alloc(aom_image_t *img, aom_img_fmt_t fmt,
unsigned int d_w, unsigned int d_h,
unsigned int align);
/*!\brief Open a descriptor, using existing storage for the underlying image
*
* Returns a descriptor for storing an image of the given format. The
* storage for the image has been allocated elsewhere, and a descriptor is
* desired to "wrap" that storage.
*
* \param[in] img Pointer to storage for descriptor. If this parameter
* is NULL, the storage for the descriptor will be
* allocated on the heap.
* \param[in] fmt Format for the image
* \param[in] d_w Width of the image
* \param[in] d_h Height of the image
* \param[in] align Alignment, in bytes, of each row in the image.
* \param[in] img_data Storage to use for the image
*
* \return Returns a pointer to the initialized image descriptor. If the img
* parameter is non-null, the value of the img parameter will be
* returned.
*/
aom_image_t *aom_img_wrap(aom_image_t *img, aom_img_fmt_t fmt, unsigned int d_w,
unsigned int d_h, unsigned int align,
unsigned char *img_data);
/*!\brief Set the rectangle identifying the displayed portion of the image
*
* Updates the displayed rectangle (aka viewport) on the image surface to
* match the specified coordinates and size.
*
* \param[in] img Image descriptor
* \param[in] x leftmost column
* \param[in] y topmost row
* \param[in] w width
* \param[in] h height
*
* \return 0 if the requested rectangle is valid, nonzero otherwise.
*/
int aom_img_set_rect(aom_image_t *img, unsigned int x, unsigned int y,
unsigned int w, unsigned int h);
/*!\brief Flip the image vertically (top for bottom)
*
* Adjusts the image descriptor's pointers and strides to make the image
* be referenced upside-down.
*
* \param[in] img Image descriptor
*/
void aom_img_flip(aom_image_t *img);
/*!\brief Close an image descriptor
*
* Frees all allocated storage associated with an image descriptor.
*
* \param[in] img Image descriptor
*/
void aom_img_free(aom_image_t *img);
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_AOM_IMAGE_H_
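The format flags and chroma shift fields in aom_image.h lend themselves to a few small calculations. The sketch below shows one way to test the flag bits and to derive a chroma plane dimension from x_chroma_shift or y_chroma_shift. The helper names and the EX_ constants are hypothetical; the flag values are copied from the header above, and rounding odd dimensions up is an assumption of this sketch.

```c
#include <assert.h>

/* Flag values copied from aom_image.h so the sketch compiles alone. */
#define EX_IMG_FMT_PLANAR 0x100
#define EX_IMG_FMT_HIGHBITDEPTH 0x800

/* Hypothetical helper: size of a chroma plane dimension for a given luma
 * dimension and subsampling shift, rounding up for odd sizes. Does not
 * guard against overflow for luma_dim close to UINT_MAX. */
unsigned int example_chroma_dim(unsigned int luma_dim, unsigned int shift) {
  assert(shift < 8 * sizeof(unsigned int));
  return (luma_dim + (1u << shift) - 1u) >> shift;
}

/* Nonzero if the format value has the planar flag set. */
int example_is_planar(unsigned int fmt) {
  return (fmt & EX_IMG_FMT_PLANAR) != 0;
}

/* 2 for formats that use a 16-bit framebuffer, 1 otherwise. */
unsigned int example_bytes_per_sample(unsigned int fmt) {
  return (fmt & EX_IMG_FMT_HIGHBITDEPTH) ? 2u : 1u;
}
```

With these definitions, AOM_IMG_FMT_I420 (0x102) is planar with one byte per sample, AOM_IMG_FMT_I42016 (0x902) uses two bytes per sample, and a 1921-pixel-wide luma plane with a shift of 1 has a 961-pixel-wide chroma plane.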

64
aom/aom_integer.h Normal file
View File

@ -0,0 +1,64 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_AOM_INTEGER_H_
#define AOM_AOM_INTEGER_H_
/* get ptrdiff_t, size_t, wchar_t, NULL */
#include <stddef.h>
#if defined(_MSC_VER)
#define AOM_FORCE_INLINE __forceinline
#define AOM_INLINE __inline
#else
#define AOM_FORCE_INLINE __inline__ __attribute__((always_inline))
// TODO(jbb): Allow a way to force inline off for older compilers.
#define AOM_INLINE inline
#endif
#if defined(AOM_EMULATE_INTTYPES)
typedef signed char int8_t;
typedef signed short int16_t;
typedef signed int int32_t;
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
typedef unsigned int uint32_t;
#ifndef _UINTPTR_T_DEFINED
typedef size_t uintptr_t;
#endif
#else
/* Most platforms have the C99 standard integer types. */
#if defined(__cplusplus)
#if !defined(__STDC_FORMAT_MACROS)
#define __STDC_FORMAT_MACROS
#endif
#if !defined(__STDC_LIMIT_MACROS)
#define __STDC_LIMIT_MACROS
#endif
#endif // __cplusplus
#include <stdint.h>
#endif
/* VS2010 defines stdint.h, but not inttypes.h */
#if defined(_MSC_VER) && _MSC_VER < 1800
#define PRId64 "I64d"
#else
#include <inttypes.h>
#endif
#endif // AOM_AOM_INTEGER_H_
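The header above ensures the fixed-width integer types and the PRId64 format macro are available. A minimal sketch of how PRId64 is typically used to print a 64-bit value portably follows. The helper name is hypothetical, and the sketch includes the standard headers directly rather than aom_integer.h so that it compiles on its own.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats a signed 64-bit value into buf using the
 * PRId64 macro. Returns the snprintf result: the number of characters the
 * full value needs (excluding the terminating NUL), or a negative value
 * on error. */
int example_format_i64(char *buf, size_t buf_size, int64_t value) {
  return snprintf(buf, buf_size, "%" PRId64, value);
}
```

String concatenation of "%" with PRId64 is what keeps the format string correct regardless of whether int64_t is long or long long on the target platform.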

759
aom/aomcx.h Normal file
View File

@ -0,0 +1,759 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_AOMCX_H_
#define AOM_AOMCX_H_
/*!\defgroup aom_encoder AOMedia AOM/AV1 Encoder
* \ingroup aom
*
* @{
*/
#include "./aom.h"
#include "./aom_encoder.h"
/*!\file
* \brief Provides definitions for using AOM or AV1 encoder algorithm within the
* aom Codec Interface.
*/
#ifdef __cplusplus
extern "C" {
#endif
/*!\name Algorithm interface for AV1
*
* This interface provides the capability to encode raw AV1 streams.
* @{
*/
extern aom_codec_iface_t aom_codec_av1_cx_algo;
extern aom_codec_iface_t *aom_codec_av1_cx(void);
/*!@} - end algorithm interface member group*/
/*
* Algorithm Flags
*/
/*!\brief Don't reference the last frame
*
* When this flag is set, the encoder will not use the last frame as a
* predictor. When not set, the encoder will choose whether to use the
* last frame or not automatically.
*/
#define AOM_EFLAG_NO_REF_LAST (1 << 16)
/*!\brief Don't reference the golden frame
*
* When this flag is set, the encoder will not use the golden frame as a
* predictor. When not set, the encoder will choose whether to use the
* golden frame or not automatically.
*/
#define AOM_EFLAG_NO_REF_GF (1 << 17)
/*!\brief Don't reference the alternate reference frame
*
* When this flag is set, the encoder will not use the alt ref frame as a
* predictor. When not set, the encoder will choose whether to use the
* alt ref frame or not automatically.
*/
#define AOM_EFLAG_NO_REF_ARF (1 << 21)
/*!\brief Don't update the last frame
*
* When this flag is set, the encoder will not update the last frame with
* the contents of the current frame.
*/
#define AOM_EFLAG_NO_UPD_LAST (1 << 18)
/*!\brief Don't update the golden frame
*
* When this flag is set, the encoder will not update the golden frame with
* the contents of the current frame.
*/
#define AOM_EFLAG_NO_UPD_GF (1 << 22)
/*!\brief Don't update the alternate reference frame
*
* When this flag is set, the encoder will not update the alt ref frame with
* the contents of the current frame.
*/
#define AOM_EFLAG_NO_UPD_ARF (1 << 23)
/*!\brief Force golden frame update
*
* When this flag is set, the encoder copies the contents of the current frame
* to the golden frame buffer.
*/
#define AOM_EFLAG_FORCE_GF (1 << 19)
/*!\brief Force alternate reference frame update
*
* When this flag is set, the encoder copies the contents of the current frame
* to the alternate reference frame buffer.
*/
#define AOM_EFLAG_FORCE_ARF (1 << 24)
/*!\brief Disable entropy update
*
* When this flag is set, the encoder will not update its internal entropy
* model based on the entropy of this frame.
*/
#define AOM_EFLAG_NO_UPD_ENTROPY (1 << 20)
/*!\brief AVx encoder control functions
*
* This set of macros define the control functions available for AVx
* encoder interface.
*
* \sa #aom_codec_control
*/
enum aome_enc_control_id {
/*!\brief Codec control function to set which reference frame encoder can use.
*
* Supported in codecs: VP8, AV1
*/
AOME_USE_REFERENCE = 7,
/*!\brief Codec control function to pass an ROI map to encoder.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_ROI_MAP = 8,
/*!\brief Codec control function to pass an Active map to encoder.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_ACTIVEMAP,
/*!\brief Codec control function to set encoder scaling mode.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_SCALEMODE = 11,
/*!\brief Codec control function to set encoder internal speed settings.
*
* Changes in this value influence, among others, the encoder's selection
* of motion estimation methods. Values greater than 0 will increase encoder
* speed at the expense of quality.
*
* \note Valid range for VP8: -16..16
* \note Valid range for AV1: -8..8
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_CPUUSED = 13,
/*!\brief Codec control function to enable automatic setting and use of ARF frames.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_ENABLEAUTOALTREF,
#if CONFIG_EXT_REFS
/*!\brief Codec control function to enable automatic setting and use of
* bwd-pred frames.
*
* Supported in codecs: AV1
*/
AOME_SET_ENABLEAUTOBWDREF,
#endif // CONFIG_EXT_REFS
/*!\brief control function to set noise sensitivity
*
* 0: off, 1: OnYOnly, 2: OnYUV,
* 3: OnYUVAggressive, 4: Adaptive
*
* Supported in codecs: VP8
*/
AOME_SET_NOISE_SENSITIVITY,
/*!\brief Codec control function to set sharpness.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_SHARPNESS,
/*!\brief Codec control function to set the threshold for MBs treated static.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_STATIC_THRESHOLD,
/*!\brief Codec control function to set the number of token partitions.
*
* Supported in codecs: VP8
*/
AOME_SET_TOKEN_PARTITIONS,
/*!\brief Codec control function to get last quantizer chosen by the encoder.
*
* Return value uses internal quantizer scale defined by the codec.
*
* Supported in codecs: VP8, AV1
*/
AOME_GET_LAST_QUANTIZER,
/*!\brief Codec control function to get last quantizer chosen by the encoder.
*
* Return value uses the 0..63 scale as used by the rc_*_quantizer config
* parameters.
*
* Supported in codecs: VP8, AV1
*/
AOME_GET_LAST_QUANTIZER_64,
/*!\brief Codec control function to set the maximum number of frames used to create the ARF.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_ARNR_MAXFRAMES,
/*!\brief Codec control function to set the filter strength for the arf.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_ARNR_STRENGTH,
/*!\deprecated control function to set the filter type to use for the arf. */
AOME_SET_ARNR_TYPE,
/*!\brief Codec control function to set visual tuning.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_TUNING,
/*!\brief Codec control function to set constrained quality level.
*
* \attention For this value to be used aom_codec_enc_cfg_t::g_usage must be
* set to #AOM_CQ.
* \note Valid range: 0..63
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_CQ_LEVEL,
/*!\brief Codec control function to set Max data rate for Intra frames.
*
* This value controls additional clamping on the maximum size of a
* keyframe. It is expressed as a percentage of the average
* per-frame bitrate, with the special (and default) value 0 meaning
* unlimited, or no additional clamping beyond the codec's built-in
* algorithm.
*
* For example, to allocate no more than 4.5 frames worth of bitrate
* to a keyframe, set this to 450.
*
* Supported in codecs: VP8, AV1
*/
AOME_SET_MAX_INTRA_BITRATE_PCT,
/*!\brief Codec control function to set reference and update frame flags.
*
* Supported in codecs: VP8
*/
AOME_SET_FRAME_FLAGS,
/*!\brief Codec control function to set max data rate for Inter frames.
*
* This value controls additional clamping on the maximum size of an
* inter frame. It is expressed as a percentage of the average
* per-frame bitrate, with the special (and default) value 0 meaning
* unlimited, or no additional clamping beyond the codec's built-in
* algorithm.
*
* For example, to allow no more than 4.5 frames worth of bitrate
* to an inter frame, set this to 450.
*
* Supported in codecs: AV1
*/
AV1E_SET_MAX_INTER_BITRATE_PCT,
/*!\brief Boost percentage for Golden Frame in CBR mode.
*
* This value controls the amount of boost given to Golden Frame in
* CBR mode. It is expressed as a percentage of the average
* per-frame bitrate, with the special (and default) value 0 meaning
* the feature is off, i.e., no golden frame boost in CBR mode and
* average bitrate target is used.
*
* For example, to allow 100% more bits, i.e, 2X, in a golden frame
* than average frame, set this to 100.
*
* Supported in codecs: AV1
*/
AV1E_SET_GF_CBR_BOOST_PCT,
/*!\brief Codec control function to set encoder screen content mode.
*
* 0: off, 1: On, 2: On with more aggressive rate control.
*
* Supported in codecs: VP8
*/
AOME_SET_SCREEN_CONTENT_MODE,
/*!\brief Codec control function to set lossless encoding mode.
*
* AV1 can operate in lossless encoding mode, in which the bitstream
* produced can be decoded to reconstruct a perfect copy of the
* input source. This control function provides a means to switch the encoder
* into lossless coding mode (1) or normal coding mode (0), which may be lossy.
* 0 = lossy coding mode
* 1 = lossless coding mode
*
* By default, the encoder operates in normal coding mode (may be lossy).
*
* Supported in codecs: AV1
*/
AV1E_SET_LOSSLESS,
#if CONFIG_AOM_QM
/*!\brief Codec control function to encode with quantisation matrices.
*
* AOM can operate with default quantisation matrices dependent on
* quantisation level and block type.
* 0 = do not use quantisation matrices
* 1 = use quantisation matrices
*
* By default, the encoder operates without quantisation matrices.
*
* Supported in codecs: AOM
*/
AV1E_SET_ENABLE_QM,
/*!\brief Codec control function to set the min quant matrix flatness.
*
* AOM can operate with different ranges of quantisation matrices.
* As quantisation levels increase, the matrices get flatter. This
* control sets the minimum level of flatness from which the matrices
* are determined.
*
* By default, the encoder sets this minimum at half the available
* range.
*
* Supported in codecs: AOM
*/
AV1E_SET_QM_MIN,
/*!\brief Codec control function to set the max quant matrix flatness.
*
* AOM can operate with different ranges of quantisation matrices.
* As quantisation levels increase, the matrices get flatter. This
* control sets the maximum level of flatness possible.
*
* By default, the encoder sets this maximum at the top of the
* available range.
*
* Supported in codecs: AOM
*/
AV1E_SET_QM_MAX,
#endif
/*!\brief Codec control function to set number of tile columns.
*
* In encoding and decoding, AV1 allows an input image frame to be partitioned
* into separated vertical tile columns, which can be encoded or decoded
* independently. This enables easy implementation of parallel encoding and
* decoding. This control requests the encoder to use column tiles in
* encoding an input frame, with number of tile columns (in Log2 unit) as
* the parameter:
* 0 = 1 tile column
* 1 = 2 tile columns
* 2 = 4 tile columns
* .....
* n = 2**n tile columns
* The requested tile columns will be capped by the encoder based on image size
* limitations (the minimum width of a tile column is 256 pixels, the maximum
* is 4096).
*
* By default, the value is 0, i.e. one single column tile for entire image.
*
* Supported in codecs: AV1
*/
AV1E_SET_TILE_COLUMNS,
/*!\brief Codec control function to set number of tile rows.
*
* In encoding and decoding, AV1 allows an input image frame to be partitioned
* into separated horizontal tile rows. Tile rows are encoded or decoded
* sequentially. Even though encoding/decoding of later tile rows depends on
* earlier ones, this allows the encoder to output data packets for tile rows
* prior to completely processing all tile rows in a frame, thereby reducing
* the latency in processing between input and output. The parameter
* for this control describes the number of tile rows, which has a valid
* range [0, 2]:
* 0 = 1 tile row
* 1 = 2 tile rows
* 2 = 4 tile rows
*
* By default, the value is 0, i.e. one single row tile for entire image.
*
* Supported in codecs: AV1
*/
AV1E_SET_TILE_ROWS,
/*!\brief Codec control function to enable frame parallel decoding feature.
*
* AV1 has a bitstream feature to reduce decoding dependency between frames
* by turning off backward update of probability context used in encoding
* and decoding. This allows staged parallel processing of more than one
* video frame in the decoder. This control function provides a means to
* turn this feature on or off for bitstreams produced by the encoder.
*
* By default, this feature is off.
*
* Supported in codecs: AV1
*/
AV1E_SET_FRAME_PARALLEL_DECODING,
/*!\brief Codec control function to set adaptive quantization mode.
*
* AV1 has a segment based feature that allows encoder to adaptively change
* quantization parameter for each segment within a frame to improve the
* subjective quality. This control makes encoder operate in one of the
* several AQ_modes supported.
*
* By default, the encoder operates with AQ_Mode 0 (adaptive quantization off).
*
* Supported in codecs: AV1
*/
AV1E_SET_AQ_MODE,
/*!\brief Codec control function to enable/disable periodic Q boost.
*
* One AV1 encoder speed feature is to enable quality boost by lowering
* frame level Q periodically. This control function provides a means to
* turn on/off this feature.
* 0 = off
* 1 = on
*
* By default, the encoder is allowed to use this feature for appropriate
* encoding modes.
*
* Supported in codecs: AV1
*/
AV1E_SET_FRAME_PERIODIC_BOOST,
/*!\brief Codec control function to set noise sensitivity.
*
* 0: off, 1: On(YOnly)
*
* Supported in codecs: AV1
*/
AV1E_SET_NOISE_SENSITIVITY,
/*!\brief Codec control function to set content type.
* \note Valid parameter range:
* AOM_CONTENT_DEFAULT = Regular video content (Default)
* AOM_CONTENT_SCREEN = Screen capture content
*
* Supported in codecs: AV1
*/
AV1E_SET_TUNE_CONTENT,
/*!\brief Codec control function to set color space info.
* \note Valid ranges: 0..7, default is "UNKNOWN".
* 0 = UNKNOWN,
* 1 = BT_601
* 2 = BT_709
* 3 = SMPTE_170
* 4 = SMPTE_240
* 5 = BT_2020
* 6 = RESERVED
* 7 = SRGB
*
* Supported in codecs: AV1
*/
AV1E_SET_COLOR_SPACE,
/*!\brief Codec control function to set minimum interval between GF/ARF frames
*
* By default the value is set as 4.
*
* Supported in codecs: AV1
*/
AV1E_SET_MIN_GF_INTERVAL,
/*!\brief Codec control function to set maximum interval between GF/ARF frames
*
* By default the value is set as 16.
*
* Supported in codecs: AV1
*/
AV1E_SET_MAX_GF_INTERVAL,
/*!\brief Codec control function to get an Active map back from the encoder.
*
* Supported in codecs: AV1
*/
AV1E_GET_ACTIVEMAP,
/*!\brief Codec control function to set color range bit.
* \note Valid ranges: 0..1, default is 0
* 0 = Limited range (16..235 or HBD equivalent)
* 1 = Full range (0..255 or HBD equivalent)
*
* Supported in codecs: AV1
*/
AV1E_SET_COLOR_RANGE,
/*!\brief Codec control function to set intended rendering image size.
*
* By default, this is identical to the image size in pixels.
*
* Supported in codecs: AV1
*/
AV1E_SET_RENDER_SIZE,
/*!\brief Codec control function to set target level.
*
* 255: off (default); 0: only keep level stats; 10: target for level 1.0;
* 11: target for level 1.1; ... 62: target for level 6.2
*
* Supported in codecs: AV1
*/
AV1E_SET_TARGET_LEVEL,
/*!\brief Codec control function to get bitstream level.
*
* Supported in codecs: AV1
*/
AV1E_GET_LEVEL,
/*!\brief Codec control function to set intended superblock size.
*
* By default, the superblock size is determined separately for each
* frame by the encoder.
*
* Supported in codecs: AV1
*/
AV1E_SET_SUPERBLOCK_SIZE,
};
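The AV1E_SET_TILE_COLUMNS comment above describes a log2 parameter that the encoder caps according to the frame width. The sketch below illustrates that arithmetic only. The helper names are hypothetical, and it works directly in pixels with the 256 and 4096 limits quoted in the comment, which is an assumption; the encoder may round to superblock units instead.

```c
#include <assert.h>

/* Tile-column width limits quoted in the AV1E_SET_TILE_COLUMNS comment. */
enum { MIN_TILE_WIDTH_PX = 256, MAX_TILE_WIDTH_PX = 4096 };

/* Number of tile columns for a log2 parameter n, i.e. 2**n. */
static int tile_cols_from_log2(int log2_cols) {
  assert(log2_cols >= 0 && log2_cols < 31);
  return 1 << log2_cols;
}

/* Hypothetical helper: clamp a requested log2 value to what the frame width
 * allows. Pixel-based simplification, not the encoder's own routine. */
static int clamp_tile_cols_log2(int frame_width, int requested_log2) {
  assert(frame_width > 0);
  int min_log2 = 0;
  int max_log2 = 0;
  /* Smallest n for which a column is at most MAX_TILE_WIDTH_PX wide. */
  while ((frame_width >> min_log2) > MAX_TILE_WIDTH_PX) ++min_log2;
  /* Largest n for which a column is still at least MIN_TILE_WIDTH_PX wide. */
  while ((frame_width >> (max_log2 + 1)) >= MIN_TILE_WIDTH_PX) ++max_log2;
  if (requested_log2 < min_log2) return min_log2;
  if (requested_log2 > max_log2) return max_log2;
  return requested_log2;
}
```

Under these assumptions, a 1920-pixel-wide frame allows at most four tile columns, and an 8192-pixel-wide frame needs at least two.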
/*!\brief aom 1-D scaling mode
*
* This set of constants define 1-D aom scaling modes
*/
typedef enum aom_scaling_mode_1d {
AOME_NORMAL = 0,
AOME_FOURFIVE = 1,
AOME_THREEFIVE = 2,
AOME_ONETWO = 3
} AOM_SCALING_MODE;
/*!\brief aom region of interest map
*
* This defines the data structure for the region of interest map
*
*/
typedef struct aom_roi_map {
/*! An id between 0 and 3 for each 16x16 region within a frame. */
unsigned char *roi_map;
unsigned int rows; /**< Number of rows. */
unsigned int cols; /**< Number of columns. */
// TODO(paulwilkins): broken for AV1 which has 8 segments
// q and loop filter deltas for each segment
// (see MAX_MB_SEGMENTS)
int delta_q[4]; /**< Quantizer deltas. */
int delta_lf[4]; /**< Loop filter deltas. */
/*! Static breakout threshold for each segment. */
unsigned int static_threshold[4];
} aom_roi_map_t;
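The ROI map holds one segment id per 16x16 region, so its dimensions follow from the frame size by rounding up to whole regions. A minimal sketch of filling such a map follows. `roi_map_sketch` is a hypothetical mirror of the map fields of `aom_roi_map_t`, the helpers are not part of libaom, and the row-major layout is an assumption.

```c
#include <stdlib.h>

/* Hypothetical mirror of the map-related fields of aom_roi_map_t. */
typedef struct {
  unsigned char *roi_map; /* one segment id (0..3) per 16x16 region */
  unsigned int rows;
  unsigned int cols;
} roi_map_sketch;

/* Number of 16x16 regions needed to cover `pixels`, rounding up. */
static unsigned int regions_16(unsigned int pixels) {
  return (pixels + 15) / 16;
}

/* Allocates a map with every region in segment 0. Returns 0 on success. */
static int roi_map_init(roi_map_sketch *m, unsigned int width,
                        unsigned int height) {
  m->cols = regions_16(width);
  m->rows = regions_16(height);
  m->roi_map = calloc((size_t)m->rows * m->cols, 1);
  return m->roi_map != NULL ? 0 : -1;
}

/* Assigns `segment` to regions in rows [r0, r1) and columns [c0, c1),
 * clipped to the map. Assumes row-major storage. */
static void roi_map_mark(roi_map_sketch *m, unsigned int r0, unsigned int r1,
                         unsigned int c0, unsigned int c1,
                         unsigned char segment) {
  if (r1 > m->rows) r1 = m->rows;
  if (c1 > m->cols) c1 = m->cols;
  for (unsigned int r = r0; r < r1; ++r)
    for (unsigned int c = c0; c < c1; ++c)
      m->roi_map[(size_t)r * m->cols + c] = segment;
}
```

The caller owns the buffer and releases it with `free` once the map has been handed to the encoder.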
/*!\brief aom active region map
*
* This defines the data structure for the active region map
*
*/
typedef struct aom_active_map {
/*!\brief Specifies on (1) or off (0) for each 16x16 region within a frame */
unsigned char *active_map;
unsigned int rows; /**< number of rows */
unsigned int cols; /**< number of cols */
} aom_active_map_t;
/*!\brief aom image scaling mode
*
* This defines the data structure for image scaling mode
*
*/
typedef struct aom_scaling_mode {
AOM_SCALING_MODE h_scaling_mode; /**< horizontal scaling mode */
AOM_SCALING_MODE v_scaling_mode; /**< vertical scaling mode */
} aom_scaling_mode_t;
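The 1-D scaling modes name a ratio applied independently to width and height. The sketch below shows how a scaled dimension could be derived. The ratios (4/5, 3/5, 1/2) are inferred from the enumerator names, the round-up behaviour is an assumption, and `scaling_mode_sketch` and `scaled_dimension` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical mirror of AOM_SCALING_MODE, in the same order as above. */
typedef enum {
  SKETCH_NORMAL = 0,
  SKETCH_FOURFIVE = 1,
  SKETCH_THREEFIVE = 2,
  SKETCH_ONETWO = 3
} scaling_mode_sketch;

/* Returns `pixels` scaled by the ratio implied by `mode`, rounded up. */
static int scaled_dimension(int pixels, scaling_mode_sketch mode) {
  int num = 1;
  int den = 1;
  assert(pixels >= 0);
  switch (mode) {
    case SKETCH_NORMAL: break;
    case SKETCH_FOURFIVE: num = 4; den = 5; break;
    case SKETCH_THREEFIVE: num = 3; den = 5; break;
    case SKETCH_ONETWO: num = 1; den = 2; break;
  }
  return (pixels * num + den - 1) / den;
}
```

Because horizontal and vertical modes are separate fields, the two dimensions are computed with two independent calls.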
/*!\brief VP8 token partition mode
*
* This defines VP8 partitioning mode for compressed data, i.e., the number of
* sub-streams in the bitstream. Used for parallelized decoding.
*
*/
typedef enum {
AOM_ONE_TOKENPARTITION = 0,
AOM_TWO_TOKENPARTITION = 1,
AOM_FOUR_TOKENPARTITION = 2,
AOM_EIGHT_TOKENPARTITION = 3
} aome_token_partitions;
/*!\brief AV1 encoder content type */
typedef enum {
AOM_CONTENT_DEFAULT,
AOM_CONTENT_SCREEN,
AOM_CONTENT_INVALID
} aom_tune_content;
/*!\brief VP8 model tuning parameters
*
* Changes the encoder to tune for certain types of input material.
*
*/
typedef enum { AOM_TUNE_PSNR, AOM_TUNE_SSIM } aom_tune_metric;
/*!\cond */
/*!\brief VP8 encoder control function parameter type
*
* Defines the data types that VP8E control functions take. Note that
* additional common controls are defined in aom.h
*
*/
AOM_CTRL_USE_TYPE_DEPRECATED(AOME_USE_REFERENCE, int)
#define AOM_CTRL_AOME_USE_REFERENCE
AOM_CTRL_USE_TYPE(AOME_SET_FRAME_FLAGS, int)
#define AOM_CTRL_AOME_SET_FRAME_FLAGS
AOM_CTRL_USE_TYPE(AOME_SET_ROI_MAP, aom_roi_map_t *)
#define AOM_CTRL_AOME_SET_ROI_MAP
AOM_CTRL_USE_TYPE(AOME_SET_ACTIVEMAP, aom_active_map_t *)
#define AOM_CTRL_AOME_SET_ACTIVEMAP
AOM_CTRL_USE_TYPE(AOME_SET_SCALEMODE, aom_scaling_mode_t *)
#define AOM_CTRL_AOME_SET_SCALEMODE
AOM_CTRL_USE_TYPE(AOME_SET_CPUUSED, int)
#define AOM_CTRL_AOME_SET_CPUUSED
AOM_CTRL_USE_TYPE(AOME_SET_ENABLEAUTOALTREF, unsigned int)
#define AOM_CTRL_AOME_SET_ENABLEAUTOALTREF
#if CONFIG_EXT_REFS
AOM_CTRL_USE_TYPE(AOME_SET_ENABLEAUTOBWDREF, unsigned int)
#define AOM_CTRL_AOME_SET_ENABLEAUTOBWDREF
#endif // CONFIG_EXT_REFS
AOM_CTRL_USE_TYPE(AOME_SET_NOISE_SENSITIVITY, unsigned int)
#define AOM_CTRL_AOME_SET_NOISE_SENSITIVITY
AOM_CTRL_USE_TYPE(AOME_SET_SHARPNESS, unsigned int)
#define AOM_CTRL_AOME_SET_SHARPNESS
AOM_CTRL_USE_TYPE(AOME_SET_STATIC_THRESHOLD, unsigned int)
#define AOM_CTRL_AOME_SET_STATIC_THRESHOLD
AOM_CTRL_USE_TYPE(AOME_SET_TOKEN_PARTITIONS, int) /* aome_token_partitions */
#define AOM_CTRL_AOME_SET_TOKEN_PARTITIONS
AOM_CTRL_USE_TYPE(AOME_SET_ARNR_MAXFRAMES, unsigned int)
#define AOM_CTRL_AOME_SET_ARNR_MAXFRAMES
AOM_CTRL_USE_TYPE(AOME_SET_ARNR_STRENGTH, unsigned int)
#define AOM_CTRL_AOME_SET_ARNR_STRENGTH
AOM_CTRL_USE_TYPE_DEPRECATED(AOME_SET_ARNR_TYPE, unsigned int)
#define AOM_CTRL_AOME_SET_ARNR_TYPE
AOM_CTRL_USE_TYPE(AOME_SET_TUNING, int) /* aom_tune_metric */
#define AOM_CTRL_AOME_SET_TUNING
AOM_CTRL_USE_TYPE(AOME_SET_CQ_LEVEL, unsigned int)
#define AOM_CTRL_AOME_SET_CQ_LEVEL
AOM_CTRL_USE_TYPE(AV1E_SET_TILE_COLUMNS, int)
#define AOM_CTRL_AV1E_SET_TILE_COLUMNS
AOM_CTRL_USE_TYPE(AV1E_SET_TILE_ROWS, int)
#define AOM_CTRL_AV1E_SET_TILE_ROWS
AOM_CTRL_USE_TYPE(AOME_GET_LAST_QUANTIZER, int *)
#define AOM_CTRL_AOME_GET_LAST_QUANTIZER
AOM_CTRL_USE_TYPE(AOME_GET_LAST_QUANTIZER_64, int *)
#define AOM_CTRL_AOME_GET_LAST_QUANTIZER_64
AOM_CTRL_USE_TYPE(AOME_SET_MAX_INTRA_BITRATE_PCT, unsigned int)
#define AOM_CTRL_AOME_SET_MAX_INTRA_BITRATE_PCT
AOM_CTRL_USE_TYPE(AV1E_SET_MAX_INTER_BITRATE_PCT, unsigned int)
#define AOM_CTRL_AV1E_SET_MAX_INTER_BITRATE_PCT
AOM_CTRL_USE_TYPE(AOME_SET_SCREEN_CONTENT_MODE, unsigned int)
#define AOM_CTRL_AOME_SET_SCREEN_CONTENT_MODE
AOM_CTRL_USE_TYPE(AV1E_SET_GF_CBR_BOOST_PCT, unsigned int)
#define AOM_CTRL_AV1E_SET_GF_CBR_BOOST_PCT
AOM_CTRL_USE_TYPE(AV1E_SET_LOSSLESS, unsigned int)
#define AOM_CTRL_AV1E_SET_LOSSLESS
#if CONFIG_AOM_QM
AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_QM, unsigned int)
#define AOM_CTRL_AV1E_SET_ENABLE_QM
AOM_CTRL_USE_TYPE(AV1E_SET_QM_MIN, unsigned int)
#define AOM_CTRL_AV1E_SET_QM_MIN
AOM_CTRL_USE_TYPE(AV1E_SET_QM_MAX, unsigned int)
#define AOM_CTRL_AV1E_SET_QM_MAX
#endif
AOM_CTRL_USE_TYPE(AV1E_SET_FRAME_PARALLEL_DECODING, unsigned int)
#define AOM_CTRL_AV1E_SET_FRAME_PARALLEL_DECODING
AOM_CTRL_USE_TYPE(AV1E_SET_AQ_MODE, unsigned int)
#define AOM_CTRL_AV1E_SET_AQ_MODE
AOM_CTRL_USE_TYPE(AV1E_SET_FRAME_PERIODIC_BOOST, unsigned int)
#define AOM_CTRL_AV1E_SET_FRAME_PERIODIC_BOOST
AOM_CTRL_USE_TYPE(AV1E_SET_NOISE_SENSITIVITY, unsigned int)
#define AOM_CTRL_AV1E_SET_NOISE_SENSITIVITY
AOM_CTRL_USE_TYPE(AV1E_SET_TUNE_CONTENT, int) /* aom_tune_content */
#define AOM_CTRL_AV1E_SET_TUNE_CONTENT
AOM_CTRL_USE_TYPE(AV1E_SET_COLOR_SPACE, int)
#define AOM_CTRL_AV1E_SET_COLOR_SPACE
AOM_CTRL_USE_TYPE(AV1E_SET_MIN_GF_INTERVAL, unsigned int)
#define AOM_CTRL_AV1E_SET_MIN_GF_INTERVAL
AOM_CTRL_USE_TYPE(AV1E_SET_MAX_GF_INTERVAL, unsigned int)
#define AOM_CTRL_AV1E_SET_MAX_GF_INTERVAL
AOM_CTRL_USE_TYPE(AV1E_GET_ACTIVEMAP, aom_active_map_t *)
#define AOM_CTRL_AV1E_GET_ACTIVEMAP
AOM_CTRL_USE_TYPE(AV1E_SET_COLOR_RANGE, int)
#define AOM_CTRL_AV1E_SET_COLOR_RANGE
/*!\brief
*
* TODO(rbultje) : add support of the control in ffmpeg
*/
#define AOM_CTRL_AV1E_SET_RENDER_SIZE
AOM_CTRL_USE_TYPE(AV1E_SET_RENDER_SIZE, int *)
AOM_CTRL_USE_TYPE(AV1E_SET_SUPERBLOCK_SIZE, unsigned int)
#define AOM_CTRL_AV1E_SET_SUPERBLOCK_SIZE
AOM_CTRL_USE_TYPE(AV1E_SET_TARGET_LEVEL, unsigned int)
#define AOM_CTRL_AV1E_SET_TARGET_LEVEL
AOM_CTRL_USE_TYPE(AV1E_GET_LEVEL, int *)
#define AOM_CTRL_AV1E_GET_LEVEL
/*!\endcond */
/*! @} - end defgroup vp8_encoder */
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_AOMCX_H_
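The AOME_SET_MAX_INTRA_BITRATE_PCT and AV1E_SET_MAX_INTER_BITRATE_PCT comments in this header express a frame-size cap as a percentage of the average per-frame bitrate. The sketch below works through that arithmetic; it is not the encoder's rate-control code. The function name and the parameters (target bitrate in kbit/s, frame rate as a fraction) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: maximum frame size in bits implied by a percentage
 * cap. A percentage of 0 means "unlimited", as described in the header. */
static uint64_t frame_bits_cap(uint32_t target_kbps, uint32_t fps_num,
                               uint32_t fps_den, unsigned int pct) {
  assert(fps_num > 0 && fps_den > 0);
  if (pct == 0) return UINT64_MAX;
  /* Average bits per frame = bits per second / frames per second. */
  const uint64_t avg_bits = (uint64_t)target_kbps * 1000u * fps_den / fps_num;
  return avg_bits * pct / 100u;
}
```

At 1000 kbit/s and 25 frames per second the average frame is 40000 bits, so a setting of 450 caps a frame at 180000 bits, the 4.5-frame example given in the comment.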

191
aom/aomdx.h Normal file

@ -0,0 +1,191 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
/*!\defgroup aom_decoder AOMedia AOM/AV1 Decoder
* \ingroup aom
*
* @{
*/
/*!\file
* \brief Provides definitions for using AOM or AV1 within the aom Decoder
* interface.
*/
#ifndef AOM_AOMDX_H_
#define AOM_AOMDX_H_
#ifdef __cplusplus
extern "C" {
#endif
/* Include controls common to both the encoder and decoder */
#include "./aom.h"
/*!\name Algorithm interface for AV1
*
* This interface provides the capability to decode AV1 streams.
* @{
*/
extern aom_codec_iface_t aom_codec_av1_dx_algo;
extern aom_codec_iface_t *aom_codec_av1_dx(void);
/*!@} - end algorithm interface member group*/
/** Data structure that stores bit accounting for debug
*/
typedef struct Accounting Accounting;
/*!\enum aom_dec_control_id
* \brief AOM decoder control functions
*
* This set of macros define the control functions available for the AOM
* decoder interface.
*
* \sa #aom_codec_control
*/
enum aom_dec_control_id {
/** control function to get info on which reference frames were updated
* by the last decode
*/
AOMD_GET_LAST_REF_UPDATES = AOM_DECODER_CTRL_ID_START,
/** check if the indicated frame is corrupted */
AOMD_GET_FRAME_CORRUPTED,
/** control function to get info on which reference frames were used
* by the last decode
*/
AOMD_GET_LAST_REF_USED,
/** decryption function to decrypt encoded buffer data immediately
* before decoding. Takes an aom_decrypt_init, which contains
* a callback function and opaque context pointer.
*/
AOMD_SET_DECRYPTOR,
// AOMD_SET_DECRYPTOR = AOMD_SET_DECRYPTOR,
/** control function to get the dimensions that the current frame is decoded
* at. This may be different to the intended display size for the frame as
* specified in the wrapper or frame header (see AV1D_GET_DISPLAY_SIZE). */
AV1D_GET_FRAME_SIZE,
/** control function to get the current frame's intended display dimensions
* (as specified in the wrapper or frame header). This may be different to
* the decoded dimensions of this frame (see AV1D_GET_FRAME_SIZE). */
AV1D_GET_DISPLAY_SIZE,
/** control function to get the bit depth of the stream. */
AV1D_GET_BIT_DEPTH,
/** control function to set the byte alignment of the planes in the reference
* buffers. Valid values are power of 2, from 32 to 1024. A value of 0 sets
* legacy alignment. I.e. Y plane is aligned to 32 bytes, U plane directly
* follows Y plane, and V plane directly follows U plane. Default value is 0.
*/
AV1_SET_BYTE_ALIGNMENT,
/** control function to invert the decoding order so that it runs from right
* to left. The function is used in a test to confirm the decoding
* independence of tile columns. It may also be used in applications where
* this order of decoding is desired.
*
* TODO(yaowu): Rework the unit test that uses this control, and in a future
* release, this test-only control shall be removed.
*/
AV1_INVERT_TILE_DECODE_ORDER,
/** control function to set the skip loop filter flag. Valid values are
* integers. The decoder will skip the loop filter when its value is set to
* nonzero. If the loop filter is skipped the decoder may accumulate decode
* artifacts. The default value is 0.
*/
AV1_SET_SKIP_LOOP_FILTER,
/** control function to retrieve a pointer to the Accounting struct. When
* compiled without --enable-accounting, this returns AOM_CODEC_INCAPABLE.
* If called before a frame has been decoded, this returns AOM_CODEC_ERROR.
* The caller should ensure that AOM_CODEC_OK is returned before attempting
* to dereference the Accounting pointer.
*/
AV1_GET_ACCOUNTING,
AOM_DECODER_CTRL_ID_MAX,
/** control function to set the range of tile decoding. A value that is
* greater than or equal to zero indicates that only the specific row/column
* is decoded. A value of -1 indicates that the whole row/column is decoded.
* As a special case, when both values are -1 the whole frame is decoded.
*/
AV1_SET_DECODE_TILE_ROW,
AV1_SET_DECODE_TILE_COL
};
/** Decrypt n bytes of data from input -> output, using the decrypt_state
* passed in AOMD_SET_DECRYPTOR.
*/
typedef void (*aom_decrypt_cb)(void *decrypt_state, const unsigned char *input,
unsigned char *output, int count);
/*!\brief Structure to hold decryption state
*
* Defines a structure to hold the decryption state and access function.
*/
typedef struct aom_decrypt_init {
/*! Decrypt callback. */
aom_decrypt_cb decrypt_cb;
/*! Decryption state. */
void *decrypt_state;
} aom_decrypt_init;
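A decryptor is supplied as a callback plus an opaque state pointer. The sketch below shows a callback with the same parameter list as `aom_decrypt_cb` and how the init structure could be filled. The types are local, renamed copies for illustration, and the XOR "cipher" and all helper names are hypothetical placeholders, not a real decryption scheme.

```c
/* Local, renamed copies of the callback type and init struct above. */
typedef void (*decrypt_cb_sketch)(void *decrypt_state,
                                  const unsigned char *input,
                                  unsigned char *output, int count);
typedef struct {
  decrypt_cb_sketch decrypt_cb;
  void *decrypt_state;
} decrypt_init_sketch;

/* Hypothetical state: a single key byte. */
typedef struct {
  unsigned char key;
} xor_state;

/* Decrypts `count` bytes from input to output using the state's key. */
static void xor_decrypt(void *decrypt_state, const unsigned char *input,
                        unsigned char *output, int count) {
  const xor_state *st = (const xor_state *)decrypt_state;
  for (int i = 0; i < count; ++i)
    output[i] = (unsigned char)(input[i] ^ st->key);
}

/* Fills the init struct the way an application would before passing it
 * to the AOMD_SET_DECRYPTOR control. */
static decrypt_init_sketch make_xor_decryptor(xor_state *st) {
  decrypt_init_sketch init = { xor_decrypt, st };
  return init;
}
```

The state object must outlive the decoder's use of the callback, since only the pointer is stored.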
/*!\brief A deprecated alias for aom_decrypt_init.
*/
typedef aom_decrypt_init aom_decrypt_init;
/*!\cond */
/*!\brief AOM decoder control function parameter type
*
* Defines the data types that AOMD control functions take. Note that
* additional common controls are defined in aom.h
*
*/
AOM_CTRL_USE_TYPE(AOMD_GET_LAST_REF_UPDATES, int *)
#define AOM_CTRL_AOMD_GET_LAST_REF_UPDATES
AOM_CTRL_USE_TYPE(AOMD_GET_FRAME_CORRUPTED, int *)
#define AOM_CTRL_AOMD_GET_FRAME_CORRUPTED
AOM_CTRL_USE_TYPE(AOMD_GET_LAST_REF_USED, int *)
#define AOM_CTRL_AOMD_GET_LAST_REF_USED
AOM_CTRL_USE_TYPE(AOMD_SET_DECRYPTOR, aom_decrypt_init *)
#define AOM_CTRL_AOMD_SET_DECRYPTOR
// AOM_CTRL_USE_TYPE(AOMD_SET_DECRYPTOR, aom_decrypt_init *)
//#define AOM_CTRL_AOMD_SET_DECRYPTOR
AOM_CTRL_USE_TYPE(AV1D_GET_DISPLAY_SIZE, int *)
#define AOM_CTRL_AV1D_GET_DISPLAY_SIZE
AOM_CTRL_USE_TYPE(AV1D_GET_BIT_DEPTH, unsigned int *)
#define AOM_CTRL_AV1D_GET_BIT_DEPTH
AOM_CTRL_USE_TYPE(AV1D_GET_FRAME_SIZE, int *)
#define AOM_CTRL_AV1D_GET_FRAME_SIZE
AOM_CTRL_USE_TYPE(AV1_INVERT_TILE_DECODE_ORDER, int)
#define AOM_CTRL_AV1_INVERT_TILE_DECODE_ORDER
AOM_CTRL_USE_TYPE(AV1_GET_ACCOUNTING, Accounting **)
#define AOM_CTRL_AV1_GET_ACCOUNTING
AOM_CTRL_USE_TYPE(AV1_SET_DECODE_TILE_ROW, int)
#define AOM_CTRL_AV1_SET_DECODE_TILE_ROW
AOM_CTRL_USE_TYPE(AV1_SET_DECODE_TILE_COL, int)
#define AOM_CTRL_AV1_SET_DECODE_TILE_COL
/*!\endcond */
/*! @} - end defgroup aom_decoder */
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_AOMDX_H_

16
aom/exports_com Normal file

@ -0,0 +1,16 @@
text aom_codec_build_config
text aom_codec_control_
text aom_codec_destroy
text aom_codec_err_to_string
text aom_codec_error
text aom_codec_error_detail
text aom_codec_get_caps
text aom_codec_iface_name
text aom_codec_version
text aom_codec_version_extra_str
text aom_codec_version_str
text aom_img_alloc
text aom_img_flip
text aom_img_free
text aom_img_set_rect
text aom_img_wrap

8
aom/exports_dec Normal file

@ -0,0 +1,8 @@
text aom_codec_dec_init_ver
text aom_codec_decode
text aom_codec_get_frame
text aom_codec_get_stream_info
text aom_codec_peek_stream_info
text aom_codec_register_put_frame_cb
text aom_codec_register_put_slice_cb
text aom_codec_set_frame_buffer_functions

9
aom/exports_enc Normal file

@ -0,0 +1,9 @@
text aom_codec_enc_config_default
text aom_codec_enc_config_set
text aom_codec_enc_init_multi_ver
text aom_codec_enc_init_ver
text aom_codec_encode
text aom_codec_get_cx_data
text aom_codec_get_global_headers
text aom_codec_get_preview_frame
text aom_codec_set_cx_data_buf


@ -0,0 +1,465 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
/*!\file
* \brief Describes the decoder algorithm interface for algorithm
* implementations.
*
* This file defines the private structures and data types that are only
* relevant to implementing an algorithm, as opposed to using it.
*
* To create a decoder algorithm class, an interface structure is put
* into the global namespace:
* <pre>
* my_codec.c:
* aom_codec_iface_t my_codec = {
* "My Codec v1.0",
* AOM_CODEC_ALG_ABI_VERSION,
* ...
* };
* </pre>
*
* An application instantiates a specific decoder instance by using
* aom_codec_init() and a pointer to the algorithm's interface structure:
* <pre>
* my_app.c:
* extern aom_codec_iface_t my_codec;
* {
* aom_codec_ctx_t algo;
* res = aom_codec_init(&algo, &my_codec);
* }
* </pre>
*
* Once initialized, the instance is managed using other functions from
* the aom_codec_* family.
*/
#ifndef AOM_INTERNAL_AOM_CODEC_INTERNAL_H_
#define AOM_INTERNAL_AOM_CODEC_INTERNAL_H_
#include "./aom_config.h"
#include "../aom_decoder.h"
#include "../aom_encoder.h"
#include <stdarg.h>
#ifdef __cplusplus
extern "C" {
#endif
/*!\brief Current ABI version number
*
* \internal
* If this file is altered in any way that changes the ABI, this value
* must be bumped. Examples include, but are not limited to, changing
* types, removing or reassigning enums, adding/removing/rearranging
* fields to structures
*/
#define AOM_CODEC_INTERNAL_ABI_VERSION (5) /**<\hideinitializer*/
typedef struct aom_codec_alg_priv aom_codec_alg_priv_t;
typedef struct aom_codec_priv_enc_mr_cfg aom_codec_priv_enc_mr_cfg_t;
/*!\brief init function pointer prototype
*
* Performs algorithm-specific initialization of the decoder context. This
* function is called by the generic aom_codec_init() wrapper function, so
* plugins implementing this interface may trust the input parameters to be
* properly initialized.
*
* \param[in] ctx Pointer to this instance's context
* \retval #AOM_CODEC_OK
* The input stream was recognized and decoder initialized.
* \retval #AOM_CODEC_MEM_ERROR
* Memory operation failed.
*/
typedef aom_codec_err_t (*aom_codec_init_fn_t)(
aom_codec_ctx_t *ctx, aom_codec_priv_enc_mr_cfg_t *data);
/*!\brief destroy function pointer prototype
*
* Performs algorithm-specific destruction of the decoder context. This
* function is called by the generic aom_codec_destroy() wrapper function,
* so plugins implementing this interface may trust the input parameters
* to be properly initialized.
*
* \param[in] ctx Pointer to this instance's context
* \retval #AOM_CODEC_OK
* The input stream was recognized and decoder initialized.
* \retval #AOM_CODEC_MEM_ERROR
* Memory operation failed.
*/
typedef aom_codec_err_t (*aom_codec_destroy_fn_t)(aom_codec_alg_priv_t *ctx);
/*!\brief parse stream info function pointer prototype
*
* Performs high level parsing of the bitstream. This function is called by the
* generic aom_codec_peek_stream_info() wrapper function, so plugins
* implementing this interface may trust the input parameters to be properly
* initialized.
*
* \param[in] data Pointer to a block of data to parse
* \param[in] data_sz Size of the data buffer
* \param[in,out] si Pointer to stream info to update. The size member
* \ref MUST be properly initialized, but \ref MAY be
* clobbered by the algorithm. This parameter \ref MAY
* be NULL.
*
* \retval #AOM_CODEC_OK
* Bitstream is parsable and stream information updated
*/
typedef aom_codec_err_t (*aom_codec_peek_si_fn_t)(const uint8_t *data,
unsigned int data_sz,
aom_codec_stream_info_t *si);
/*!\brief Return information about the current stream.
*
* Returns information about the stream that has been parsed during decoding.
*
* \param[in] ctx Pointer to this instance's context
* \param[in,out] si Pointer to stream info to update. The size member
* \ref MUST be properly initialized, but \ref MAY be
* clobbered by the algorithm. This parameter \ref MAY
* be NULL.
*
* \retval #AOM_CODEC_OK
* Bitstream is parsable and stream information updated
*/
typedef aom_codec_err_t (*aom_codec_get_si_fn_t)(aom_codec_alg_priv_t *ctx,
aom_codec_stream_info_t *si);
/*!\brief control function pointer prototype
*
* This function is used to exchange algorithm specific data with the decoder
* instance. This can be used to implement features specific to a particular
* algorithm.
*
* This function is called by the generic aom_codec_control() wrapper
* function, so plugins implementing this interface may trust the input
* parameters to be properly initialized. However, this interface does not
* provide type safety for the exchanged data or assign meanings to the
* control codes. Those details should be specified in the algorithm's
* header file. In particular, the ctrl_id parameter is guaranteed to exist
* in the algorithm's control mapping table, and the data parameter may be NULL.
*
*
* \param[in] ctx Pointer to this instance's context
* \param[in] ctrl_id Algorithm specific control identifier
* \param[in,out] data Data to exchange with algorithm instance.
*
* \retval #AOM_CODEC_OK
* The internal state data was deserialized.
*/
typedef aom_codec_err_t (*aom_codec_control_fn_t)(aom_codec_alg_priv_t *ctx,
va_list ap);
/*!\brief control function pointer mapping
*
* This structure stores the mapping between control identifiers and
* implementing functions. Each algorithm provides a list of these
* mappings. This list is searched by the aom_codec_control() wrapper
* function to determine which function to invoke. The special
* value {0, NULL} is used to indicate end-of-list, and must be
* present. The special value {0, <non-null>} can be used as a catch-all
* mapping. This implies that ctrl_id values chosen by the algorithm
* \ref MUST be non-zero.
*/
typedef const struct aom_codec_ctrl_fn_map {
int ctrl_id;
aom_codec_control_fn_t fn;
} aom_codec_ctrl_fn_map_t;
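The mapping list described above is searched linearly: `{0, NULL}` ends the list and `{0, <non-null>}` acts as a catch-all. The sketch below illustrates that lookup rule with simplified types. Handlers take a plain `int` rather than a `va_list`, and every name in it is hypothetical; it is not the `aom_codec_control()` implementation.

```c
#include <stddef.h>

/* Simplified handler type: the real prototype receives a va_list. */
typedef int (*ctrl_fn_sketch)(void *priv, int value);

typedef struct {
  int ctrl_id;
  ctrl_fn_sketch fn;
} ctrl_map_sketch;

/* Returns the first handler whose id matches, or the catch-all entry
 * (ctrl_id == 0 with a non-NULL fn). Returns NULL at end-of-list. */
static ctrl_fn_sketch find_ctrl(const ctrl_map_sketch *map, int ctrl_id) {
  for (const ctrl_map_sketch *e = map; e != NULL && e->fn != NULL; ++e) {
    if (e->ctrl_id == 0 || e->ctrl_id == ctrl_id) return e->fn;
  }
  return NULL;
}

/* Example handlers: one stores a value, one rejects everything. */
static int set_speed(void *priv, int value) {
  *(int *)priv = value;
  return 0;
}
static int reject_all(void *priv, int value) {
  (void)priv;
  (void)value;
  return -1;
}

/* Two example tables, each terminated by {0, NULL}. */
static const ctrl_map_sketch kStrictMap[] = { { 13, set_speed },
                                              { 0, NULL } };
static const ctrl_map_sketch kCatchAllMap[] = { { 13, set_speed },
                                                { 0, reject_all },
                                                { 0, NULL } };
```

Because a catch-all matches every id, it only makes sense as the last entry before the terminator; entries placed after it would never be reached.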
/*!\brief decode data function pointer prototype
*
* Processes a buffer of coded data. If the processing results in a new
* decoded frame becoming available, #AOM_CODEC_CB_PUT_SLICE and
* #AOM_CODEC_CB_PUT_FRAME events are generated as appropriate. This
* function is called by the generic aom_codec_decode() wrapper function,
* so plugins implementing this interface may trust the input parameters
* to be properly initialized.
*
* \param[in] ctx Pointer to this instance's context
* \param[in] data Pointer to this block of new coded data. If
* NULL, a #AOM_CODEC_CB_PUT_FRAME event is posted
* for the previously decoded frame.
* \param[in] data_sz Size of the coded data, in bytes.
*
* \return Returns #AOM_CODEC_OK if the coded data was processed completely
* and future pictures can be decoded without error. Otherwise,
* see the descriptions of the other error codes in ::aom_codec_err_t
* for recoverability capabilities.
*/
typedef aom_codec_err_t (*aom_codec_decode_fn_t)(aom_codec_alg_priv_t *ctx,
const uint8_t *data,
unsigned int data_sz,
void *user_priv,
long deadline);
/*!\brief Decoded frames iterator
*
* Iterates over a list of the frames available for display. The iterator
* storage should be initialized to NULL to start the iteration. Iteration is
* complete when this function returns NULL.
*
* The list of available frames becomes valid upon completion of the
* aom_codec_decode call, and remains valid until the next call to
* aom_codec_decode.
*
* \param[in] ctx Pointer to this instance's context
* \param[in,out] iter Iterator storage, initialized to NULL
*
* \return Returns a pointer to an image, if one is ready for display. Frames
* produced will always be in PTS (presentation time stamp) order.
*/
typedef aom_image_t *(*aom_codec_get_frame_fn_t)(aom_codec_alg_priv_t *ctx,
aom_codec_iter_t *iter);
/*!\brief Pass in external frame buffers for the decoder to use.
*
* Registers functions to be called when libaom needs a frame buffer
* to decode the current frame and a function to be called when libaom does
* not internally reference the frame buffer. This set function must
* be called before the first call to decode or libaom will assume the
* default behavior of allocating frame buffers internally.
*
* \param[in] ctx Pointer to this instance's context
* \param[in] cb_get Pointer to the get callback function
* \param[in] cb_release Pointer to the release callback function
* \param[in] cb_priv Callback's private data
*
* \retval #AOM_CODEC_OK
* External frame buffers will be used by libaom.
* \retval #AOM_CODEC_INVALID_PARAM
* One or more of the callbacks were NULL.
* \retval #AOM_CODEC_ERROR
* Decoder context not initialized, or algorithm not capable of
* using external frame buffers.
*
* \note
* When decoding AV1, the application may be required to pass in at least
* #AOM_MAXIMUM_WORK_BUFFERS external frame
* buffers.
*/
typedef aom_codec_err_t (*aom_codec_set_fb_fn_t)(
aom_codec_alg_priv_t *ctx, aom_get_frame_buffer_cb_fn_t cb_get,
aom_release_frame_buffer_cb_fn_t cb_release, void *cb_priv);
typedef aom_codec_err_t (*aom_codec_encode_fn_t)(aom_codec_alg_priv_t *ctx,
const aom_image_t *img,
aom_codec_pts_t pts,
unsigned long duration,
aom_enc_frame_flags_t flags,
unsigned long deadline);
typedef const aom_codec_cx_pkt_t *(*aom_codec_get_cx_data_fn_t)(
aom_codec_alg_priv_t *ctx, aom_codec_iter_t *iter);
typedef aom_codec_err_t (*aom_codec_enc_config_set_fn_t)(
aom_codec_alg_priv_t *ctx, const aom_codec_enc_cfg_t *cfg);
typedef aom_fixed_buf_t *(*aom_codec_get_global_headers_fn_t)(
aom_codec_alg_priv_t *ctx);
typedef aom_image_t *(*aom_codec_get_preview_frame_fn_t)(
aom_codec_alg_priv_t *ctx);
typedef aom_codec_err_t (*aom_codec_enc_mr_get_mem_loc_fn_t)(
const aom_codec_enc_cfg_t *cfg, void **mem_loc);
/*!\brief usage configuration mapping
*
* This structure stores the mapping between usage identifiers and
* configuration structures. Each algorithm provides a list of these
* mappings. This list is searched by the aom_codec_enc_config_default()
* wrapper function to determine which config to return. The special value
* {-1, {0}} is used to indicate end-of-list, and must be present. At least
* one mapping must be present, in addition to the end-of-list.
*
*/
typedef const struct aom_codec_enc_cfg_map {
int usage;
aom_codec_enc_cfg_t cfg;
} aom_codec_enc_cfg_map_t;
/*!\brief Decoder algorithm interface
*
* All decoders \ref MUST expose a variable of this type.
*/
struct aom_codec_iface {
const char *name; /**< Identification String */
int abi_version; /**< Implemented ABI version */
aom_codec_caps_t caps; /**< Decoder capabilities */
aom_codec_init_fn_t init; /**< \copydoc ::aom_codec_init_fn_t */
aom_codec_destroy_fn_t destroy; /**< \copydoc ::aom_codec_destroy_fn_t */
aom_codec_ctrl_fn_map_t *ctrl_maps; /**< \copydoc ::aom_codec_ctrl_fn_map_t */
struct aom_codec_dec_iface {
aom_codec_peek_si_fn_t peek_si; /**< \copydoc ::aom_codec_peek_si_fn_t */
aom_codec_get_si_fn_t get_si; /**< \copydoc ::aom_codec_get_si_fn_t */
aom_codec_decode_fn_t decode; /**< \copydoc ::aom_codec_decode_fn_t */
aom_codec_get_frame_fn_t
get_frame; /**< \copydoc ::aom_codec_get_frame_fn_t */
aom_codec_set_fb_fn_t set_fb_fn; /**< \copydoc ::aom_codec_set_fb_fn_t */
} dec;
struct aom_codec_enc_iface {
int cfg_map_count;
aom_codec_enc_cfg_map_t
*cfg_maps; /**< \copydoc ::aom_codec_enc_cfg_map_t */
aom_codec_encode_fn_t encode; /**< \copydoc ::aom_codec_encode_fn_t */
aom_codec_get_cx_data_fn_t
get_cx_data; /**< \copydoc ::aom_codec_get_cx_data_fn_t */
aom_codec_enc_config_set_fn_t
cfg_set; /**< \copydoc ::aom_codec_enc_config_set_fn_t */
aom_codec_get_global_headers_fn_t
get_glob_hdrs; /**< \copydoc ::aom_codec_get_global_headers_fn_t */
aom_codec_get_preview_frame_fn_t
get_preview; /**< \copydoc ::aom_codec_get_preview_frame_fn_t */
aom_codec_enc_mr_get_mem_loc_fn_t
mr_get_mem_loc; /**< \copydoc ::aom_codec_enc_mr_get_mem_loc_fn_t */
} enc;
};
/*!\brief Callback function pointer / user data pair storage */
typedef struct aom_codec_priv_cb_pair {
union {
aom_codec_put_frame_cb_fn_t put_frame;
aom_codec_put_slice_cb_fn_t put_slice;
} u;
void *user_priv;
} aom_codec_priv_cb_pair_t;
/*!\brief Instance private storage
*
* This structure is allocated by the algorithm's init function. It can be
* extended in one of two ways. First, a second, algorithm specific structure
* can be allocated and the priv member pointed to it. Alternatively, this
* structure can be made the first member of the algorithm specific structure,
* and the pointer cast to the proper type.
*/
struct aom_codec_priv {
const char *err_detail;
aom_codec_flags_t init_flags;
struct {
aom_codec_priv_cb_pair_t put_frame_cb;
aom_codec_priv_cb_pair_t put_slice_cb;
} dec;
struct {
aom_fixed_buf_t cx_data_dst_buf;
unsigned int cx_data_pad_before;
unsigned int cx_data_pad_after;
aom_codec_cx_pkt_t cx_data_pkt;
unsigned int total_encoders;
} enc;
};
/*
* Multi-resolution encoding internal configuration
*/
struct aom_codec_priv_enc_mr_cfg {
unsigned int mr_total_resolutions;
unsigned int mr_encoder_id;
struct aom_rational mr_down_sampling_factor;
void *mr_low_res_mode_info;
};
#undef AOM_CTRL_USE_TYPE
#define AOM_CTRL_USE_TYPE(id, typ) \
static AOM_INLINE typ id##__value(va_list args) { return va_arg(args, typ); }
#undef AOM_CTRL_USE_TYPE_DEPRECATED
#define AOM_CTRL_USE_TYPE_DEPRECATED(id, typ) \
static AOM_INLINE typ id##__value(va_list args) { return va_arg(args, typ); }
#define CAST(id, arg) id##__value(arg)
/* CODEC_INTERFACE convenience macro
*
* By convention, each codec interface is a struct with extern linkage, where
* the symbol is suffixed with _algo. A getter function is also defined to
* return a pointer to the struct, since in some cases it's easier to work
* with text symbols than data symbols (see issue #169). This function has
* the same name as the struct, less the _algo suffix. The CODEC_INTERFACE
* macro is provided to define this getter function automatically.
*/
#define CODEC_INTERFACE(id) \
aom_codec_iface_t *id(void) { return &id##_algo; } \
aom_codec_iface_t id##_algo
/* Internal Utility Functions
*
* The following functions are intended to be used inside algorithms as
* utilities for manipulating aom_codec_* data structures.
*/
struct aom_codec_pkt_list {
unsigned int cnt;
unsigned int max;
struct aom_codec_cx_pkt pkts[1];
};
#define aom_codec_pkt_list_decl(n) \
union { \
struct aom_codec_pkt_list head; \
struct { \
struct aom_codec_pkt_list head; \
struct aom_codec_cx_pkt pkts[n]; \
} alloc; \
}
#define aom_codec_pkt_list_init(m) \
(m)->alloc.head.cnt = 0, \
(m)->alloc.head.max = sizeof((m)->alloc.pkts) / sizeof((m)->alloc.pkts[0])
int aom_codec_pkt_list_add(struct aom_codec_pkt_list *,
const struct aom_codec_cx_pkt *);
const aom_codec_cx_pkt_t *aom_codec_pkt_list_get(
struct aom_codec_pkt_list *list, aom_codec_iter_t *iter);
#include <stdio.h>
#include <setjmp.h>
struct aom_internal_error_info {
aom_codec_err_t error_code;
int has_detail;
char detail[80];
int setjmp;
jmp_buf jmp;
};
#define CLANG_ANALYZER_NORETURN
#if defined(__has_feature)
#if __has_feature(attribute_analyzer_noreturn)
#undef CLANG_ANALYZER_NORETURN
#define CLANG_ANALYZER_NORETURN __attribute__((analyzer_noreturn))
#endif
#endif
void aom_internal_error(struct aom_internal_error_info *info,
aom_codec_err_t error, const char *fmt,
...) CLANG_ANALYZER_NORETURN;
#if CONFIG_DEBUG
#define AOM_CHECK_MEM_ERROR(error_info, lval, expr) \
do { \
lval = (expr); \
if (!lval) \
aom_internal_error(error_info, AOM_CODEC_MEM_ERROR, \
"Failed to allocate " #lval " at %s:%d", __FILE__, \
__LINE__); \
} while (0)
#else
#define AOM_CHECK_MEM_ERROR(error_info, lval, expr) \
do { \
lval = (expr); \
if (!lval) \
aom_internal_error(error_info, AOM_CODEC_MEM_ERROR, \
"Failed to allocate " #lval); \
} while (0)
#endif
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_INTERNAL_AOM_CODEC_INTERNAL_H_
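
The control mapping described in the header above (a table of `{ctrl_id, fn}` pairs terminated by `{0, NULL}`, searched by a variadic wrapper) can be sketched in isolation. This is a minimal illustration, not the libaom implementation: every `ctl_`/`CTL_` name, the `speed` field, and the two control IDs are hypothetical stand-ins invented for the example.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical stand-ins for the libaom types; not the real definitions. */
typedef enum { CTL_OK = 0, CTL_ERROR, CTL_INVALID_PARAM } ctl_err_t;
typedef struct { int speed; } ctl_priv_t;
typedef ctl_err_t (*ctl_fn_t)(ctl_priv_t *priv, va_list ap);
typedef struct {
  int ctrl_id;
  ctl_fn_t fn;
} ctl_fn_map_t;

/* Control IDs must be non-zero: zero is reserved for the sentinel and
 * for an optional catch-all entry. */
enum { CTL_SET_SPEED = 1, CTL_GET_SPEED = 2 };

static ctl_err_t set_speed(ctl_priv_t *priv, va_list ap) {
  priv->speed = va_arg(ap, int);
  return CTL_OK;
}

static ctl_err_t get_speed(ctl_priv_t *priv, va_list ap) {
  int *out = va_arg(ap, int *);
  if (!out) return CTL_INVALID_PARAM;
  *out = priv->speed;
  return CTL_OK;
}

static const ctl_fn_map_t ctl_maps[] = {
  { CTL_SET_SPEED, set_speed },
  { CTL_GET_SPEED, get_speed },
  { 0, NULL } /* end-of-list sentinel, must be present */
};

/* Walk the table until the sentinel; an entry matches when its id equals
 * ctrl_id, or when its id is 0 with a non-NULL fn (catch-all). */
ctl_err_t ctl_control(ctl_priv_t *priv, int ctrl_id, ...) {
  const ctl_fn_map_t *entry;
  ctl_err_t res = CTL_ERROR;
  if (!priv || !ctrl_id) return CTL_INVALID_PARAM;
  for (entry = ctl_maps; entry->fn; entry++) {
    if (!entry->ctrl_id || entry->ctrl_id == ctrl_id) {
      va_list ap;
      va_start(ap, ctrl_id);
      res = entry->fn(priv, ap);
      va_end(ap);
      break;
    }
  }
  return res;
}
```

A caller would write `ctl_control(&priv, CTL_SET_SPEED, 3)` and later `ctl_control(&priv, CTL_GET_SPEED, &value)`. Because the arguments travel through `va_list`, the compiler cannot check their types, which is why the header comment defers type details to each algorithm's own header.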

134
aom/src/aom_codec.c Normal file

@ -0,0 +1,134 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
/*!\file
* \brief Provides the high level interface to wrap codec algorithms.
*
*/
#include <stdarg.h>
#include <stdlib.h>
#include "aom/aom_integer.h"
#include "aom/internal/aom_codec_internal.h"
#include "aom_version.h"
#define SAVE_STATUS(ctx, var) (ctx ? (ctx->err = var) : var)
int aom_codec_version(void) { return VERSION_PACKED; }
const char *aom_codec_version_str(void) { return VERSION_STRING_NOSP; }
const char *aom_codec_version_extra_str(void) { return VERSION_EXTRA; }
const char *aom_codec_iface_name(aom_codec_iface_t *iface) {
return iface ? iface->name : "<invalid interface>";
}
const char *aom_codec_err_to_string(aom_codec_err_t err) {
switch (err) {
case AOM_CODEC_OK: return "Success";
case AOM_CODEC_ERROR: return "Unspecified internal error";
case AOM_CODEC_MEM_ERROR: return "Memory allocation error";
case AOM_CODEC_ABI_MISMATCH: return "ABI version mismatch";
case AOM_CODEC_INCAPABLE:
return "Codec does not implement requested capability";
case AOM_CODEC_UNSUP_BITSTREAM:
return "Bitstream not supported by this decoder";
case AOM_CODEC_UNSUP_FEATURE:
return "Bitstream required feature not supported by this decoder";
case AOM_CODEC_CORRUPT_FRAME: return "Corrupt frame detected";
case AOM_CODEC_INVALID_PARAM: return "Invalid parameter";
case AOM_CODEC_LIST_END: return "End of iterated list";
}
return "Unrecognized error code";
}
const char *aom_codec_error(aom_codec_ctx_t *ctx) {
return (ctx) ? aom_codec_err_to_string(ctx->err)
: aom_codec_err_to_string(AOM_CODEC_INVALID_PARAM);
}
const char *aom_codec_error_detail(aom_codec_ctx_t *ctx) {
if (ctx && ctx->err)
return ctx->priv ? ctx->priv->err_detail : ctx->err_detail;
return NULL;
}
aom_codec_err_t aom_codec_destroy(aom_codec_ctx_t *ctx) {
aom_codec_err_t res;
if (!ctx)
res = AOM_CODEC_INVALID_PARAM;
else if (!ctx->iface || !ctx->priv)
res = AOM_CODEC_ERROR;
else {
ctx->iface->destroy((aom_codec_alg_priv_t *)ctx->priv);
ctx->iface = NULL;
ctx->name = NULL;
ctx->priv = NULL;
res = AOM_CODEC_OK;
}
return SAVE_STATUS(ctx, res);
}
aom_codec_caps_t aom_codec_get_caps(aom_codec_iface_t *iface) {
return (iface) ? iface->caps : 0;
}
aom_codec_err_t aom_codec_control_(aom_codec_ctx_t *ctx, int ctrl_id, ...) {
aom_codec_err_t res;
if (!ctx || !ctrl_id)
res = AOM_CODEC_INVALID_PARAM;
else if (!ctx->iface || !ctx->priv || !ctx->iface->ctrl_maps)
res = AOM_CODEC_ERROR;
else {
aom_codec_ctrl_fn_map_t *entry;
res = AOM_CODEC_ERROR;
for (entry = ctx->iface->ctrl_maps; entry && entry->fn; entry++) {
if (!entry->ctrl_id || entry->ctrl_id == ctrl_id) {
va_list ap;
va_start(ap, ctrl_id);
res = entry->fn((aom_codec_alg_priv_t *)ctx->priv, ap);
va_end(ap);
break;
}
}
}
return SAVE_STATUS(ctx, res);
}
void aom_internal_error(struct aom_internal_error_info *info,
aom_codec_err_t error, const char *fmt, ...) {
va_list ap;
info->error_code = error;
info->has_detail = 0;
if (fmt) {
size_t sz = sizeof(info->detail);
info->has_detail = 1;
va_start(ap, fmt);
vsnprintf(info->detail, sz - 1, fmt, ap);
va_end(ap);
info->detail[sz - 1] = '\0';
}
if (info->setjmp) longjmp(info->jmp, info->error_code);
}
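
The error path in `aom_internal_error()` above records a code and a formatted detail string, then jumps back to a recovery point if one has been armed. The sketch below shows how a caller might arm and disarm such a recovery point. It is an assumption-laden illustration: the `err_` names, the `armed` field (standing in for the `setjmp` member), and the error code 2 are invented for the example.

```c
#include <setjmp.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for struct aom_internal_error_info. */
typedef struct {
  int error_code;
  int has_detail;
  char detail[80];
  int armed; /* non-zero while jmp holds a valid recovery point */
  jmp_buf jmp;
} err_info_t;

/* Record the error and, if a recovery point is armed, jump to it. */
static void err_raise(err_info_t *info, int error, const char *fmt, ...) {
  info->error_code = error;
  info->has_detail = 0;
  if (fmt) {
    va_list ap;
    info->has_detail = 1;
    va_start(ap, fmt);
    /* vsnprintf always NUL-terminates when the size is non-zero. */
    vsnprintf(info->detail, sizeof(info->detail), fmt, ap);
    va_end(ap);
  }
  if (info->armed) longjmp(info->jmp, info->error_code);
}

/* A deeply nested step that reports failure without returning a code. */
static void err_risky_step(err_info_t *info, int fail) {
  if (fail) err_raise(info, 2, "Failed to allocate %s", "buffer");
}

/* Arm the recovery point, run the work, and disarm on both exits. */
int err_run(err_info_t *info, int fail) {
  if (setjmp(info->jmp)) {
    info->armed = 0;
    return info->error_code;
  }
  info->armed = 1;
  err_risky_step(info, fail);
  info->armed = 0;
  return 0;
}
```

Calling `err_run(&info, 1)` returns the raised code and leaves the message in `info.detail`. The raised code must be non-zero, since `longjmp` with a value of 0 makes `setjmp` return 1 instead.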

189
aom/src/aom_decoder.c Normal file

@ -0,0 +1,189 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
/*!\file
* \brief Provides the high level interface to wrap decoder algorithms.
*
*/
#include <string.h>
#include "aom/internal/aom_codec_internal.h"
#define SAVE_STATUS(ctx, var) (ctx ? (ctx->err = var) : var)
static aom_codec_alg_priv_t *get_alg_priv(aom_codec_ctx_t *ctx) {
return (aom_codec_alg_priv_t *)ctx->priv;
}
aom_codec_err_t aom_codec_dec_init_ver(aom_codec_ctx_t *ctx,
aom_codec_iface_t *iface,
const aom_codec_dec_cfg_t *cfg,
aom_codec_flags_t flags, int ver) {
aom_codec_err_t res;
if (ver != AOM_DECODER_ABI_VERSION)
res = AOM_CODEC_ABI_MISMATCH;
else if (!ctx || !iface)
res = AOM_CODEC_INVALID_PARAM;
else if (iface->abi_version != AOM_CODEC_INTERNAL_ABI_VERSION)
res = AOM_CODEC_ABI_MISMATCH;
else if ((flags & AOM_CODEC_USE_POSTPROC) &&
!(iface->caps & AOM_CODEC_CAP_POSTPROC))
res = AOM_CODEC_INCAPABLE;
else if ((flags & AOM_CODEC_USE_ERROR_CONCEALMENT) &&
!(iface->caps & AOM_CODEC_CAP_ERROR_CONCEALMENT))
res = AOM_CODEC_INCAPABLE;
else if ((flags & AOM_CODEC_USE_INPUT_FRAGMENTS) &&
!(iface->caps & AOM_CODEC_CAP_INPUT_FRAGMENTS))
res = AOM_CODEC_INCAPABLE;
else if (!(iface->caps & AOM_CODEC_CAP_DECODER))
res = AOM_CODEC_INCAPABLE;
else {
memset(ctx, 0, sizeof(*ctx));
ctx->iface = iface;
ctx->name = iface->name;
ctx->priv = NULL;
ctx->init_flags = flags;
ctx->config.dec = cfg;
res = ctx->iface->init(ctx, NULL);
if (res) {
ctx->err_detail = ctx->priv ? ctx->priv->err_detail : NULL;
aom_codec_destroy(ctx);
}
}
return SAVE_STATUS(ctx, res);
}
aom_codec_err_t aom_codec_peek_stream_info(aom_codec_iface_t *iface,
const uint8_t *data,
unsigned int data_sz,
aom_codec_stream_info_t *si) {
aom_codec_err_t res;
if (!iface || !data || !data_sz || !si ||
si->sz < sizeof(aom_codec_stream_info_t))
res = AOM_CODEC_INVALID_PARAM;
else {
/* Set default/unknown values */
si->w = 0;
si->h = 0;
res = iface->dec.peek_si(data, data_sz, si);
}
return res;
}
aom_codec_err_t aom_codec_get_stream_info(aom_codec_ctx_t *ctx,
aom_codec_stream_info_t *si) {
aom_codec_err_t res;
if (!ctx || !si || si->sz < sizeof(aom_codec_stream_info_t))
res = AOM_CODEC_INVALID_PARAM;
else if (!ctx->iface || !ctx->priv)
res = AOM_CODEC_ERROR;
else {
/* Set default/unknown values */
si->w = 0;
si->h = 0;
res = ctx->iface->dec.get_si(get_alg_priv(ctx), si);
}
return SAVE_STATUS(ctx, res);
}
aom_codec_err_t aom_codec_decode(aom_codec_ctx_t *ctx, const uint8_t *data,
unsigned int data_sz, void *user_priv,
long deadline) {
aom_codec_err_t res;
/* Sanity checks */
/* NULL data ptr allowed if data_sz is 0 too */
if (!ctx || (!data && data_sz) || (data && !data_sz))
res = AOM_CODEC_INVALID_PARAM;
else if (!ctx->iface || !ctx->priv)
res = AOM_CODEC_ERROR;
else {
res = ctx->iface->dec.decode(get_alg_priv(ctx), data, data_sz, user_priv,
deadline);
}
return SAVE_STATUS(ctx, res);
}
aom_image_t *aom_codec_get_frame(aom_codec_ctx_t *ctx, aom_codec_iter_t *iter) {
aom_image_t *img;
if (!ctx || !iter || !ctx->iface || !ctx->priv)
img = NULL;
else
img = ctx->iface->dec.get_frame(get_alg_priv(ctx), iter);
return img;
}
aom_codec_err_t aom_codec_register_put_frame_cb(aom_codec_ctx_t *ctx,
aom_codec_put_frame_cb_fn_t cb,
void *user_priv) {
aom_codec_err_t res;
if (!ctx || !cb)
res = AOM_CODEC_INVALID_PARAM;
else if (!ctx->iface || !ctx->priv ||
!(ctx->iface->caps & AOM_CODEC_CAP_PUT_FRAME))
res = AOM_CODEC_ERROR;
else {
ctx->priv->dec.put_frame_cb.u.put_frame = cb;
ctx->priv->dec.put_frame_cb.user_priv = user_priv;
res = AOM_CODEC_OK;
}
return SAVE_STATUS(ctx, res);
}
aom_codec_err_t aom_codec_register_put_slice_cb(aom_codec_ctx_t *ctx,
aom_codec_put_slice_cb_fn_t cb,
void *user_priv) {
aom_codec_err_t res;
if (!ctx || !cb)
res = AOM_CODEC_INVALID_PARAM;
else if (!ctx->iface || !ctx->priv ||
!(ctx->iface->caps & AOM_CODEC_CAP_PUT_SLICE))
res = AOM_CODEC_ERROR;
else {
ctx->priv->dec.put_slice_cb.u.put_slice = cb;
ctx->priv->dec.put_slice_cb.user_priv = user_priv;
res = AOM_CODEC_OK;
}
return SAVE_STATUS(ctx, res);
}
aom_codec_err_t aom_codec_set_frame_buffer_functions(
aom_codec_ctx_t *ctx, aom_get_frame_buffer_cb_fn_t cb_get,
aom_release_frame_buffer_cb_fn_t cb_release, void *cb_priv) {
aom_codec_err_t res;
if (!ctx || !cb_get || !cb_release) {
res = AOM_CODEC_INVALID_PARAM;
} else if (!ctx->iface || !ctx->priv ||
!(ctx->iface->caps & AOM_CODEC_CAP_EXTERNAL_FRAME_BUFFER)) {
res = AOM_CODEC_ERROR;
} else {
res = ctx->iface->dec.set_fb_fn(get_alg_priv(ctx), cb_get, cb_release,
cb_priv);
}
return SAVE_STATUS(ctx, res);
}
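
`aom_codec_dec_init_ver()` above rejects any requested feature flag whose matching capability bit is absent from the interface. The same pattern can be shown in a few lines. The bit values and `cap_`/`CAP_` names here are hypothetical and chosen only for the sketch; they are not the real `AOM_CODEC_CAP_*` or `AOM_CODEC_USE_*` values.

```c
typedef unsigned long cap_caps_t;
typedef unsigned long cap_flags_t;

/* Hypothetical bit assignments, valid for this sketch only. */
#define CAP_DECODER 0x1UL
#define CAP_POSTPROC 0x2UL
#define CAP_ERROR_CONCEALMENT 0x4UL
#define USE_POSTPROC 0x100UL
#define USE_ERROR_CONCEALMENT 0x200UL

typedef enum { CAP_OK = 0, CAP_INCAPABLE } cap_err_t;

/* Return CAP_INCAPABLE when the interface is not a decoder, or when a
 * requested flag has no matching capability bit. */
cap_err_t cap_check_flags(cap_caps_t caps, cap_flags_t flags) {
  if (!(caps & CAP_DECODER)) return CAP_INCAPABLE;
  if ((flags & USE_POSTPROC) && !(caps & CAP_POSTPROC)) return CAP_INCAPABLE;
  if ((flags & USE_ERROR_CONCEALMENT) && !(caps & CAP_ERROR_CONCEALMENT))
    return CAP_INCAPABLE;
  return CAP_OK;
}
```

Checking every flag before touching the context means a failed init leaves the caller's context unmodified, which matches the ordering of the checks in the function above.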

380
aom/src/aom_encoder.c Normal file

@ -0,0 +1,380 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
/*!\file
* \brief Provides the high level interface to wrap encoder algorithms.
*
*/
#include <limits.h>
#include <string.h>
#include "aom_config.h"
#include "aom/internal/aom_codec_internal.h"
#define SAVE_STATUS(ctx, var) (ctx ? (ctx->err = var) : var)
static aom_codec_alg_priv_t *get_alg_priv(aom_codec_ctx_t *ctx) {
return (aom_codec_alg_priv_t *)ctx->priv;
}
aom_codec_err_t aom_codec_enc_init_ver(aom_codec_ctx_t *ctx,
aom_codec_iface_t *iface,
const aom_codec_enc_cfg_t *cfg,
aom_codec_flags_t flags, int ver) {
aom_codec_err_t res;
if (ver != AOM_ENCODER_ABI_VERSION)
res = AOM_CODEC_ABI_MISMATCH;
else if (!ctx || !iface || !cfg)
res = AOM_CODEC_INVALID_PARAM;
else if (iface->abi_version != AOM_CODEC_INTERNAL_ABI_VERSION)
res = AOM_CODEC_ABI_MISMATCH;
else if (!(iface->caps & AOM_CODEC_CAP_ENCODER))
res = AOM_CODEC_INCAPABLE;
else if ((flags & AOM_CODEC_USE_PSNR) && !(iface->caps & AOM_CODEC_CAP_PSNR))
res = AOM_CODEC_INCAPABLE;
else if ((flags & AOM_CODEC_USE_OUTPUT_PARTITION) &&
!(iface->caps & AOM_CODEC_CAP_OUTPUT_PARTITION))
res = AOM_CODEC_INCAPABLE;
else {
ctx->iface = iface;
ctx->name = iface->name;
ctx->priv = NULL;
ctx->init_flags = flags;
ctx->config.enc = cfg;
res = ctx->iface->init(ctx, NULL);
if (res) {
ctx->err_detail = ctx->priv ? ctx->priv->err_detail : NULL;
aom_codec_destroy(ctx);
}
}
return SAVE_STATUS(ctx, res);
}
aom_codec_err_t aom_codec_enc_init_multi_ver(
aom_codec_ctx_t *ctx, aom_codec_iface_t *iface, aom_codec_enc_cfg_t *cfg,
int num_enc, aom_codec_flags_t flags, aom_rational_t *dsf, int ver) {
aom_codec_err_t res = AOM_CODEC_OK;
if (ver != AOM_ENCODER_ABI_VERSION)
res = AOM_CODEC_ABI_MISMATCH;
else if (!ctx || !iface || !cfg || (num_enc > 16 || num_enc < 1))
res = AOM_CODEC_INVALID_PARAM;
else if (iface->abi_version != AOM_CODEC_INTERNAL_ABI_VERSION)
res = AOM_CODEC_ABI_MISMATCH;
else if (!(iface->caps & AOM_CODEC_CAP_ENCODER))
res = AOM_CODEC_INCAPABLE;
else if ((flags & AOM_CODEC_USE_PSNR) && !(iface->caps & AOM_CODEC_CAP_PSNR))
res = AOM_CODEC_INCAPABLE;
else if ((flags & AOM_CODEC_USE_OUTPUT_PARTITION) &&
!(iface->caps & AOM_CODEC_CAP_OUTPUT_PARTITION))
res = AOM_CODEC_INCAPABLE;
else {
int i;
void *mem_loc = NULL;
if (!(res = iface->enc.mr_get_mem_loc(cfg, &mem_loc))) {
for (i = 0; i < num_enc; i++) {
aom_codec_priv_enc_mr_cfg_t mr_cfg;
/* Validate down-sampling factor. */
if (dsf->num < 1 || dsf->num > 4096 || dsf->den < 1 ||
dsf->den > dsf->num) {
res = AOM_CODEC_INVALID_PARAM;
break;
}
mr_cfg.mr_low_res_mode_info = mem_loc;
mr_cfg.mr_total_resolutions = num_enc;
mr_cfg.mr_encoder_id = num_enc - 1 - i;
mr_cfg.mr_down_sampling_factor.num = dsf->num;
mr_cfg.mr_down_sampling_factor.den = dsf->den;
/* Force key-frame synchronization: encoders at higher resolutions
 * always use the same frame_type chosen by the lowest-resolution
 * encoder.
 */
if (mr_cfg.mr_encoder_id) cfg->kf_mode = AOM_KF_DISABLED;
ctx->iface = iface;
ctx->name = iface->name;
ctx->priv = NULL;
ctx->init_flags = flags;
ctx->config.enc = cfg;
res = ctx->iface->init(ctx, &mr_cfg);
if (res) {
const char *error_detail = ctx->priv ? ctx->priv->err_detail : NULL;
/* Destroy current ctx */
ctx->err_detail = error_detail;
aom_codec_destroy(ctx);
/* Destroy already allocated high-level ctx */
while (i) {
ctx--;
ctx->err_detail = error_detail;
aom_codec_destroy(ctx);
i--;
}
}
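if (res) break;
ctx++;
cfg++;
dsf++;
}
/* On success ctx is one past the last initialized context, so step back
 * to it. On failure ctx already points at a valid element of the caller's
 * array and must not be decremented again. */
if (!res) ctx--;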
}
}
return SAVE_STATUS(ctx, res);
}
aom_codec_err_t aom_codec_enc_config_default(aom_codec_iface_t *iface,
aom_codec_enc_cfg_t *cfg,
unsigned int usage) {
aom_codec_err_t res;
aom_codec_enc_cfg_map_t *map;
int i;
if (!iface || !cfg || usage > INT_MAX)
res = AOM_CODEC_INVALID_PARAM;
else if (!(iface->caps & AOM_CODEC_CAP_ENCODER))
res = AOM_CODEC_INCAPABLE;
else {
res = AOM_CODEC_INVALID_PARAM;
for (i = 0; i < iface->enc.cfg_map_count; ++i) {
map = iface->enc.cfg_maps + i;
if (map->usage == (int)usage) {
*cfg = map->cfg;
cfg->g_usage = usage;
res = AOM_CODEC_OK;
break;
}
}
}
return res;
}
#if ARCH_X86 || ARCH_X86_64
/* On X86, disable the x87 unit's internal 80 bit precision for better
* consistency with the SSE unit's 64 bit precision.
*/
#include "aom_ports/x86.h"
#define FLOATING_POINT_INIT() \
do { \
unsigned short x87_orig_mode = x87_set_double_precision();
#define FLOATING_POINT_RESTORE() \
x87_set_control_word(x87_orig_mode); \
} \
while (0)
#else
static void FLOATING_POINT_INIT(void) {}
static void FLOATING_POINT_RESTORE(void) {}
#endif
aom_codec_err_t aom_codec_encode(aom_codec_ctx_t *ctx, const aom_image_t *img,
aom_codec_pts_t pts, unsigned long duration,
aom_enc_frame_flags_t flags,
unsigned long deadline) {
aom_codec_err_t res = AOM_CODEC_OK;
if (!ctx || (img && !duration))
res = AOM_CODEC_INVALID_PARAM;
else if (!ctx->iface || !ctx->priv)
res = AOM_CODEC_ERROR;
else if (!(ctx->iface->caps & AOM_CODEC_CAP_ENCODER))
res = AOM_CODEC_INCAPABLE;
else {
unsigned int num_enc = ctx->priv->enc.total_encoders;
/* Execute in a normalized floating point environment, if the platform
* requires it.
*/
FLOATING_POINT_INIT();
if (num_enc == 1)
res = ctx->iface->enc.encode(get_alg_priv(ctx), img, pts, duration, flags,
deadline);
else {
/* Multi-resolution encoding:
* Encode multi-levels in reverse order. For example,
* if mr_total_resolutions = 3, first encode level 2,
* then encode level 1, and finally encode level 0.
*/
int i;
ctx += num_enc - 1;
if (img) img += num_enc - 1;
for (i = num_enc - 1; i >= 0; i--) {
if ((res = ctx->iface->enc.encode(get_alg_priv(ctx), img, pts, duration,
flags, deadline)))
break;
ctx--;
if (img) img--;
}
ctx++;
}
FLOATING_POINT_RESTORE();
}
return SAVE_STATUS(ctx, res);
}
const aom_codec_cx_pkt_t *aom_codec_get_cx_data(aom_codec_ctx_t *ctx,
aom_codec_iter_t *iter) {
const aom_codec_cx_pkt_t *pkt = NULL;
if (ctx) {
if (!iter)
ctx->err = AOM_CODEC_INVALID_PARAM;
else if (!ctx->iface || !ctx->priv)
ctx->err = AOM_CODEC_ERROR;
else if (!(ctx->iface->caps & AOM_CODEC_CAP_ENCODER))
ctx->err = AOM_CODEC_INCAPABLE;
else
pkt = ctx->iface->enc.get_cx_data(get_alg_priv(ctx), iter);
}
if (pkt && pkt->kind == AOM_CODEC_CX_FRAME_PKT) {
// If the application has specified a destination area for the
// compressed data, and the codec has not placed the data there,
// and it fits, copy it.
aom_codec_priv_t *const priv = ctx->priv;
char *const dst_buf = (char *)priv->enc.cx_data_dst_buf.buf;
if (dst_buf && pkt->data.raw.buf != dst_buf &&
pkt->data.raw.sz + priv->enc.cx_data_pad_before +
priv->enc.cx_data_pad_after <=
priv->enc.cx_data_dst_buf.sz) {
aom_codec_cx_pkt_t *modified_pkt = &priv->enc.cx_data_pkt;
memcpy(dst_buf + priv->enc.cx_data_pad_before, pkt->data.raw.buf,
pkt->data.raw.sz);
*modified_pkt = *pkt;
modified_pkt->data.raw.buf = dst_buf;
modified_pkt->data.raw.sz +=
priv->enc.cx_data_pad_before + priv->enc.cx_data_pad_after;
pkt = modified_pkt;
}
if (dst_buf == pkt->data.raw.buf) {
priv->enc.cx_data_dst_buf.buf = dst_buf + pkt->data.raw.sz;
priv->enc.cx_data_dst_buf.sz -= pkt->data.raw.sz;
}
}
return pkt;
}
aom_codec_err_t aom_codec_set_cx_data_buf(aom_codec_ctx_t *ctx,
const aom_fixed_buf_t *buf,
unsigned int pad_before,
unsigned int pad_after) {
if (!ctx || !ctx->priv) return AOM_CODEC_INVALID_PARAM;
if (buf) {
ctx->priv->enc.cx_data_dst_buf = *buf;
ctx->priv->enc.cx_data_pad_before = pad_before;
ctx->priv->enc.cx_data_pad_after = pad_after;
} else {
ctx->priv->enc.cx_data_dst_buf.buf = NULL;
ctx->priv->enc.cx_data_dst_buf.sz = 0;
ctx->priv->enc.cx_data_pad_before = 0;
ctx->priv->enc.cx_data_pad_after = 0;
}
return AOM_CODEC_OK;
}
const aom_image_t *aom_codec_get_preview_frame(aom_codec_ctx_t *ctx) {
aom_image_t *img = NULL;
if (ctx) {
if (!ctx->iface || !ctx->priv)
ctx->err = AOM_CODEC_ERROR;
else if (!(ctx->iface->caps & AOM_CODEC_CAP_ENCODER))
ctx->err = AOM_CODEC_INCAPABLE;
else if (!ctx->iface->enc.get_preview)
ctx->err = AOM_CODEC_INCAPABLE;
else
img = ctx->iface->enc.get_preview(get_alg_priv(ctx));
}
return img;
}
aom_fixed_buf_t *aom_codec_get_global_headers(aom_codec_ctx_t *ctx) {
aom_fixed_buf_t *buf = NULL;
if (ctx) {
if (!ctx->iface || !ctx->priv)
ctx->err = AOM_CODEC_ERROR;
else if (!(ctx->iface->caps & AOM_CODEC_CAP_ENCODER))
ctx->err = AOM_CODEC_INCAPABLE;
else if (!ctx->iface->enc.get_glob_hdrs)
ctx->err = AOM_CODEC_INCAPABLE;
else
buf = ctx->iface->enc.get_glob_hdrs(get_alg_priv(ctx));
}
return buf;
}
aom_codec_err_t aom_codec_enc_config_set(aom_codec_ctx_t *ctx,
const aom_codec_enc_cfg_t *cfg) {
aom_codec_err_t res;
if (!ctx || !ctx->iface || !ctx->priv || !cfg)
res = AOM_CODEC_INVALID_PARAM;
else if (!(ctx->iface->caps & AOM_CODEC_CAP_ENCODER))
res = AOM_CODEC_INCAPABLE;
else
res = ctx->iface->enc.cfg_set(get_alg_priv(ctx), cfg);
return SAVE_STATUS(ctx, res);
}
int aom_codec_pkt_list_add(struct aom_codec_pkt_list *list,
const struct aom_codec_cx_pkt *pkt) {
if (list->cnt < list->max) {
list->pkts[list->cnt++] = *pkt;
return 0;
}
return 1;
}
const aom_codec_cx_pkt_t *aom_codec_pkt_list_get(
struct aom_codec_pkt_list *list, aom_codec_iter_t *iter) {
const aom_codec_cx_pkt_t *pkt;
if (!(*iter)) {
*iter = list->pkts;
}
pkt = (const aom_codec_cx_pkt_t *)*iter;
if ((size_t)(pkt - list->pkts) < list->cnt)
*iter = pkt + 1;
else
pkt = NULL;
return pkt;
}
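
The packet list helpers that close aom_encoder.c use an opaque iterator that starts as NULL and is advanced by each call until the list is exhausted. The sketch below reproduces that protocol with simplified, hypothetical types (`pkl_pkt_t`, `pkl_list_t`, `PKL_MAX`). It uses a fixed-size array in place of the union-based `aom_codec_pkt_list_decl` storage, so it illustrates the iteration contract rather than the real memory layout.

```c
#include <stddef.h>

/* Hypothetical, simplified packet and iterator types. */
typedef struct {
  int kind;
  size_t sz;
} pkl_pkt_t;
typedef const void *pkl_iter_t;

#define PKL_MAX 4
typedef struct {
  unsigned int cnt; /* packets currently stored */
  unsigned int max; /* capacity of pkts */
  pkl_pkt_t pkts[PKL_MAX];
} pkl_list_t;

void pkl_init(pkl_list_t *list) {
  list->cnt = 0;
  list->max = PKL_MAX;
}

/* Copy pkt into the list; returns 0 on success, 1 when the list is full. */
int pkl_add(pkl_list_t *list, const pkl_pkt_t *pkt) {
  if (list->cnt < list->max) {
    list->pkts[list->cnt++] = *pkt;
    return 0;
  }
  return 1;
}

/* Return the next packet, or NULL at the end. *iter must start as NULL. */
const pkl_pkt_t *pkl_get(pkl_list_t *list, pkl_iter_t *iter) {
  const pkl_pkt_t *pkt;
  if (!*iter) *iter = list->pkts;
  pkt = (const pkl_pkt_t *)*iter;
  if ((size_t)(pkt - list->pkts) < list->cnt)
    *iter = pkt + 1;
  else
    pkt = NULL;
  return pkt;
}
```

A consumer initializes `pkl_iter_t it = NULL;` and loops with `while ((p = pkl_get(&list, &it)) != NULL)`. Keeping the cursor in caller-owned storage lets several independent traversals run over the same list without the list tracking any of them.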

240
aom/src/aom_image.c Normal file

@ -0,0 +1,240 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <stdlib.h>
#include <string.h>
#include "aom/aom_image.h"
#include "aom/aom_integer.h"
#include "aom_mem/aom_mem.h"
static aom_image_t *img_alloc_helper(aom_image_t *img, aom_img_fmt_t fmt,
unsigned int d_w, unsigned int d_h,
unsigned int buf_align,
unsigned int stride_align,
unsigned char *img_data) {
unsigned int h, w, s, xcs, ycs, bps;
unsigned int stride_in_bytes;
int align;
/* Treat align==0 like align==1 */
if (!buf_align) buf_align = 1;
/* Validate alignment (must be power of 2) */
if (buf_align & (buf_align - 1)) goto fail;
/* Treat align==0 like align==1 */
if (!stride_align) stride_align = 1;
/* Validate alignment (must be power of 2) */
if (stride_align & (stride_align - 1)) goto fail;
/* Get sample size for this format */
switch (fmt) {
case AOM_IMG_FMT_RGB32:
case AOM_IMG_FMT_RGB32_LE:
case AOM_IMG_FMT_ARGB:
case AOM_IMG_FMT_ARGB_LE: bps = 32; break;
case AOM_IMG_FMT_RGB24:
case AOM_IMG_FMT_BGR24: bps = 24; break;
case AOM_IMG_FMT_RGB565:
case AOM_IMG_FMT_RGB565_LE:
case AOM_IMG_FMT_RGB555:
case AOM_IMG_FMT_RGB555_LE:
case AOM_IMG_FMT_UYVY:
case AOM_IMG_FMT_YUY2:
case AOM_IMG_FMT_YVYU: bps = 16; break;
case AOM_IMG_FMT_I420:
case AOM_IMG_FMT_YV12:
case AOM_IMG_FMT_AOMI420:
case AOM_IMG_FMT_AOMYV12: bps = 12; break;
case AOM_IMG_FMT_I422:
case AOM_IMG_FMT_I440: bps = 16; break;
case AOM_IMG_FMT_I444: bps = 24; break;
case AOM_IMG_FMT_I42016: bps = 24; break;
case AOM_IMG_FMT_I42216:
case AOM_IMG_FMT_I44016: bps = 32; break;
case AOM_IMG_FMT_I44416: bps = 48; break;
default: bps = 16; break;
}
/* Get chroma shift values for this format */
switch (fmt) {
case AOM_IMG_FMT_I420:
case AOM_IMG_FMT_YV12:
case AOM_IMG_FMT_AOMI420:
case AOM_IMG_FMT_AOMYV12:
case AOM_IMG_FMT_I422:
case AOM_IMG_FMT_I42016:
case AOM_IMG_FMT_I42216: xcs = 1; break;
default: xcs = 0; break;
}
switch (fmt) {
case AOM_IMG_FMT_I420:
case AOM_IMG_FMT_I440:
case AOM_IMG_FMT_YV12:
case AOM_IMG_FMT_AOMI420:
case AOM_IMG_FMT_AOMYV12:
case AOM_IMG_FMT_I42016:
case AOM_IMG_FMT_I44016: ycs = 1; break;
default: ycs = 0; break;
}
/* Calculate storage sizes given the chroma subsampling */
align = (1 << xcs) - 1;
w = (d_w + align) & ~align;
align = (1 << ycs) - 1;
h = (d_h + align) & ~align;
s = (fmt & AOM_IMG_FMT_PLANAR) ? w : bps * w / 8;
s = (s + stride_align - 1) & ~(stride_align - 1);
stride_in_bytes = (fmt & AOM_IMG_FMT_HIGHBITDEPTH) ? s * 2 : s;
/* Allocate the new image */
if (!img) {
img = (aom_image_t *)calloc(1, sizeof(aom_image_t));
if (!img) goto fail;
img->self_allocd = 1;
} else {
memset(img, 0, sizeof(aom_image_t));
}
img->img_data = img_data;
if (!img_data) {
const uint64_t alloc_size = (fmt & AOM_IMG_FMT_PLANAR)
? (uint64_t)h * s * bps / 8
: (uint64_t)h * s;
if (alloc_size != (size_t)alloc_size) goto fail;
img->img_data = (uint8_t *)aom_memalign(buf_align, (size_t)alloc_size);
img->img_data_owner = 1;
}
if (!img->img_data) goto fail;
img->fmt = fmt;
img->bit_depth = (fmt & AOM_IMG_FMT_HIGHBITDEPTH) ? 16 : 8;
img->w = w;
img->h = h;
img->x_chroma_shift = xcs;
img->y_chroma_shift = ycs;
img->bps = bps;
/* Calculate strides */
img->stride[AOM_PLANE_Y] = img->stride[AOM_PLANE_ALPHA] = stride_in_bytes;
img->stride[AOM_PLANE_U] = img->stride[AOM_PLANE_V] = stride_in_bytes >> xcs;
/* Default viewport to entire image */
if (!aom_img_set_rect(img, 0, 0, d_w, d_h)) return img;
fail:
aom_img_free(img);
return NULL;
}
aom_image_t *aom_img_alloc(aom_image_t *img, aom_img_fmt_t fmt,
unsigned int d_w, unsigned int d_h,
unsigned int align) {
return img_alloc_helper(img, fmt, d_w, d_h, align, align, NULL);
}
aom_image_t *aom_img_wrap(aom_image_t *img, aom_img_fmt_t fmt, unsigned int d_w,
unsigned int d_h, unsigned int stride_align,
unsigned char *img_data) {
/* By setting buf_align = 1, we don't change buffer alignment in this
* function. */
return img_alloc_helper(img, fmt, d_w, d_h, 1, stride_align, img_data);
}
int aom_img_set_rect(aom_image_t *img, unsigned int x, unsigned int y,
unsigned int w, unsigned int h) {
unsigned char *data;
if (x + w <= img->w && y + h <= img->h) {
img->d_w = w;
img->d_h = h;
/* Calculate plane pointers */
if (!(img->fmt & AOM_IMG_FMT_PLANAR)) {
img->planes[AOM_PLANE_PACKED] =
img->img_data + x * img->bps / 8 + y * img->stride[AOM_PLANE_PACKED];
} else {
const int bytes_per_sample =
(img->fmt & AOM_IMG_FMT_HIGHBITDEPTH) ? 2 : 1;
data = img->img_data;
if (img->fmt & AOM_IMG_FMT_HAS_ALPHA) {
img->planes[AOM_PLANE_ALPHA] =
data + x * bytes_per_sample + y * img->stride[AOM_PLANE_ALPHA];
data += img->h * img->stride[AOM_PLANE_ALPHA];
}
img->planes[AOM_PLANE_Y] =
data + x * bytes_per_sample + y * img->stride[AOM_PLANE_Y];
data += img->h * img->stride[AOM_PLANE_Y];
if (!(img->fmt & AOM_IMG_FMT_UV_FLIP)) {
img->planes[AOM_PLANE_U] =
data + (x >> img->x_chroma_shift) * bytes_per_sample +
(y >> img->y_chroma_shift) * img->stride[AOM_PLANE_U];
data += (img->h >> img->y_chroma_shift) * img->stride[AOM_PLANE_U];
img->planes[AOM_PLANE_V] =
data + (x >> img->x_chroma_shift) * bytes_per_sample +
(y >> img->y_chroma_shift) * img->stride[AOM_PLANE_V];
} else {
img->planes[AOM_PLANE_V] =
data + (x >> img->x_chroma_shift) * bytes_per_sample +
(y >> img->y_chroma_shift) * img->stride[AOM_PLANE_V];
data += (img->h >> img->y_chroma_shift) * img->stride[AOM_PLANE_V];
img->planes[AOM_PLANE_U] =
data + (x >> img->x_chroma_shift) * bytes_per_sample +
(y >> img->y_chroma_shift) * img->stride[AOM_PLANE_U];
}
}
return 0;
}
return -1;
}
void aom_img_flip(aom_image_t *img) {
/* Note: In the pointer adjustment calculation, we want the
* rhs to be promoted to a signed type. Section 6.3.1.8 of the ISO C99
* standard indicates that if the adjustment parameter is unsigned, the
* stride parameter will be promoted to unsigned, causing errors when
* the lhs is a larger type than the rhs.
*/
img->planes[AOM_PLANE_Y] += (signed)(img->d_h - 1) * img->stride[AOM_PLANE_Y];
img->stride[AOM_PLANE_Y] = -img->stride[AOM_PLANE_Y];
img->planes[AOM_PLANE_U] += (signed)((img->d_h >> img->y_chroma_shift) - 1) *
img->stride[AOM_PLANE_U];
img->stride[AOM_PLANE_U] = -img->stride[AOM_PLANE_U];
img->planes[AOM_PLANE_V] += (signed)((img->d_h >> img->y_chroma_shift) - 1) *
img->stride[AOM_PLANE_V];
img->stride[AOM_PLANE_V] = -img->stride[AOM_PLANE_V];
img->planes[AOM_PLANE_ALPHA] +=
(signed)(img->d_h - 1) * img->stride[AOM_PLANE_ALPHA];
img->stride[AOM_PLANE_ALPHA] = -img->stride[AOM_PLANE_ALPHA];
}
void aom_img_free(aom_image_t *img) {
if (img) {
if (img->img_data && img->img_data_owner) aom_free(img->img_data);
if (img->self_allocd) free(img);
}
}
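
The size arithmetic in img_alloc_helper relies on one idiom: (x + a - 1) & ~(a - 1) rounds x up to a multiple of a, provided a is a power of two. The sketch below isolates that idiom for the planar case. round_up_pow2 and planar_luma_stride are hypothetical names used only for illustration; they are not libaom functions.

```c
#include <assert.h>

/* Hypothetical helper (not part of libaom): round x up to a multiple of
 * `alignment`. The mask trick only works for a nonzero power of two. */
static unsigned int round_up_pow2(unsigned int x, unsigned int alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (x + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: luma stride in bytes for a planar format, following
 * the same steps as the code above: align the width to the horizontal
 * chroma subsampling factor, align the row to stride_align, then double it
 * for 16-bit samples. */
static unsigned int planar_luma_stride(unsigned int d_w, unsigned int xcs,
                                       unsigned int stride_align,
                                       int high_bitdepth) {
  const unsigned int w = round_up_pow2(d_w, 1u << xcs);
  const unsigned int s = round_up_pow2(w, stride_align);
  return high_bitdepth ? s * 2 : s;
}
```

For example, a 353-pixel-wide 4:2:0 image (xcs = 1) with a 32-byte stride alignment gets w = 354 and a 384-byte luma stride, or 768 bytes when the high-bit-depth flag doubles it.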

72
aom_dsp/add_noise.c Normal file

@ -0,0 +1,72 @@
/*
* Copyright (c) 2015 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include <math.h>
#include <stdlib.h>
#include "./aom_config.h"
#include "./aom_dsp_rtcd.h"
#include "aom/aom_integer.h"
#include "aom_ports/mem.h"
void aom_plane_add_noise_c(uint8_t *start, char *noise, char blackclamp[16],
char whiteclamp[16], char bothclamp[16],
unsigned int width, unsigned int height, int pitch) {
unsigned int i, j;
for (i = 0; i < height; ++i) {
uint8_t *pos = start + i * pitch;
char *ref = (char *)(noise + (rand() & 0xff)); // NOLINT
for (j = 0; j < width; ++j) {
int v = pos[j];
v = clamp(v - blackclamp[0], 0, 255);
v = clamp(v + bothclamp[0], 0, 255);
v = clamp(v - whiteclamp[0], 0, 255);
pos[j] = v + ref[j];
}
}
}
static double gaussian(double sigma, double mu, double x) {
return 1 / (sigma * sqrt(2.0 * 3.14159265)) *
(exp(-(x - mu) * (x - mu) / (2 * sigma * sigma)));
}
int aom_setup_noise(double sigma, int size, char *noise) {
char char_dist[256];
int next = 0, i, j;
// Set up a 256-entry lookup table that matches a Gaussian distribution.
for (i = -32; i < 32; ++i) {
const int a_i = (int)(0.5 + 256 * gaussian(sigma, 0, i));
if (a_i) {
for (j = 0; j < a_i; ++j) {
char_dist[next + j] = (char)i;
}
next = next + j;
}
}
// Rounding error - might mean we have fewer than 256 entries.
for (; next < 256; ++next) {
char_dist[next] = 0;
}
for (i = 0; i < size; ++i) {
noise[i] = char_dist[rand() & 0xff]; // NOLINT
}
// Returns the highest non 0 value used in distribution.
return -char_dist[0];
}
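
The three clamp calls in aom_plane_add_noise_c squeeze each pixel into a range that leaves headroom for the noise sample added afterwards. The following is a minimal sketch of that step, assuming bothclamp equals blackclamp + whiteclamp (the caller that fills these arrays is not shown in this diff). clamp_int and squeeze_pixel are hypothetical names.

```c
#include <assert.h>

/* Hypothetical stand-in for the clamp() helper used above. */
static int clamp_int(int value, int low, int high) {
  return value < low ? low : (value > high ? high : value);
}

/* Hypothetical helper mirroring the three clamp steps for one pixel.
 * Assumption: both == black + white. Under that assumption the result
 * lies in [black, 255 - white]. */
static int squeeze_pixel(int v, int black, int white) {
  const int both = black + white; /* assumed relationship, see lead-in */
  assert(black >= 0 && white >= 0 && both <= 255);
  v = clamp_int(v - black, 0, 255); /* saturate the darkest pixels at 0 */
  v = clamp_int(v + both, 0, 255);  /* saturate the brightest at 255 */
  v = clamp_int(v - white, 0, 255); /* shift back into the safe range */
  return v;
}
```

With black = 3 and white = 4, a pixel of 0 becomes 3, a pixel of 255 becomes 251, and mid-range values are unchanged, so adding a noise sample between -3 and 4 cannot wrap an 8-bit value.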

64
aom_dsp/ans.c Normal file

@ -0,0 +1,64 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <assert.h>
#include "./aom_config.h"
#include "aom/aom_integer.h"
#include "aom_dsp/ans.h"
#include "aom_dsp/prob.h"
static int find_largest(const aom_cdf_prob *const pdf_tab, int num_syms) {
int largest_idx = -1;
int largest_p = -1;
int i;
for (i = 0; i < num_syms; ++i) {
int p = pdf_tab[i];
if (p > largest_p) {
largest_p = p;
largest_idx = i;
}
}
return largest_idx;
}
void aom_rans_merge_prob8_pdf(aom_cdf_prob *const out_pdf,
const AnsP8 node_prob,
const aom_cdf_prob *const src_pdf, int in_syms) {
int i;
int adjustment = RANS_PRECISION;
const int round_fact = ANS_P8_PRECISION >> 1;
const AnsP8 p1 = ANS_P8_PRECISION - node_prob;
const int out_syms = in_syms + 1;
assert(src_pdf != out_pdf);
out_pdf[0] = node_prob << (RANS_PROB_BITS - ANS_P8_SHIFT);
adjustment -= out_pdf[0];
for (i = 0; i < in_syms; ++i) {
int p = (p1 * src_pdf[i] + round_fact) >> ANS_P8_SHIFT;
p = AOMMIN(p, (int)RANS_PRECISION - in_syms);
p = AOMMAX(p, 1);
out_pdf[i + 1] = p;
adjustment -= p;
}
// Adjust probabilities so they sum to the total probability
if (adjustment > 0) {
i = find_largest(out_pdf, out_syms);
out_pdf[i] += adjustment;
} else {
while (adjustment < 0) {
i = find_largest(out_pdf, out_syms);
--out_pdf[i];
assert(out_pdf[i] > 0);
adjustment++;
}
}
}
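
The invariant that aom_rans_merge_prob8_pdf maintains is that the output PDF sums to exactly the rANS precision, with every entry at least 1. The sketch below restates the same steps in a self-contained form so the invariant can be checked. It assumes aom_cdf_prob is a 16-bit unsigned type (its definition lives in aom_dsp/prob.h, which is not part of this diff), and all sk_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PROB_BITS 15
#define SK_PRECISION (1 << SK_PROB_BITS)

/* Index of the first largest entry, as in find_largest() above. */
static int sk_find_largest(const uint16_t *pdf, int num_syms) {
  int i, best = -1, best_p = -1;
  for (i = 0; i < num_syms; ++i) {
    if (pdf[i] > best_p) {
      best_p = pdf[i];
      best = i;
    }
  }
  return best;
}

/* Prepend an 8-bit node probability (out of 256) to a 15-bit PDF, scale the
 * remaining entries by (256 - node_prob) / 256, then repair rounding drift
 * on the largest entry so the total is SK_PRECISION again. */
static void sk_merge_prob8_pdf(uint16_t *out_pdf, uint8_t node_prob,
                               const uint16_t *src_pdf, int in_syms) {
  const int p1 = 256 - node_prob;
  int adjustment = SK_PRECISION;
  int i;
  assert(src_pdf != out_pdf);
  out_pdf[0] = (uint16_t)(node_prob << (SK_PROB_BITS - 8));
  adjustment -= out_pdf[0];
  for (i = 0; i < in_syms; ++i) {
    int p = (p1 * src_pdf[i] + 128) >> 8; /* round to nearest */
    if (p > SK_PRECISION - in_syms) p = SK_PRECISION - in_syms;
    if (p < 1) p = 1; /* every symbol must stay codable */
    out_pdf[i + 1] = (uint16_t)p;
    adjustment -= p;
  }
  if (adjustment > 0) {
    out_pdf[sk_find_largest(out_pdf, in_syms + 1)] += (uint16_t)adjustment;
  } else {
    for (; adjustment < 0; ++adjustment)
      --out_pdf[sk_find_largest(out_pdf, in_syms + 1)];
  }
}

/* Sum of a PDF, used to check the invariant. */
static int sk_pdf_sum(const uint16_t *pdf, int num_syms) {
  int i, sum = 0;
  for (i = 0; i < num_syms; ++i) sum += pdf[i];
  return sum;
}
```

Taking the rounding error out of the largest entry keeps the relative distortion small, which is presumably why the source does the same.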

44
aom_dsp/ans.h Normal file

@ -0,0 +1,44 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_ANS_H_
#define AOM_DSP_ANS_H_
// Constants, types and utilities for Asymmetric Numeral Systems
// http://arxiv.org/abs/1311.2540v2
#include <assert.h>
#include "./aom_config.h"
#include "aom/aom_integer.h"
#include "aom_dsp/prob.h"
#ifdef __cplusplus
extern "C" {
#endif // __cplusplus
typedef uint8_t AnsP8;
#define ANS_P8_PRECISION 256u
#define ANS_P8_SHIFT 8
#define RANS_PROB_BITS 15
#define RANS_PRECISION (1u << RANS_PROB_BITS)
// L_BASE % PRECISION must be 0. Increasing L_BASE beyond 2**15 will cause uabs
// to overflow.
#define L_BASE (RANS_PRECISION)
#define IO_BASE 256
// Range I = { L_BASE, L_BASE + 1, ..., L_BASE * IO_BASE - 1 }
void aom_rans_merge_prob8_pdf(aom_cdf_prob *const out_pdf,
const AnsP8 node_prob,
const aom_cdf_prob *const src_pdf, int in_syms);
#ifdef __cplusplus
} // extern "C"
#endif // __cplusplus
#endif // AOM_DSP_ANS_H_

146
aom_dsp/ansreader.h Normal file

@ -0,0 +1,146 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_ANSREADER_H_
#define AOM_DSP_ANSREADER_H_
// A uABS and rANS decoder implementation of Asymmetric Numeral Systems
// http://arxiv.org/abs/1311.2540v2
#include <assert.h>
#include "./aom_config.h"
#include "aom/aom_integer.h"
#include "aom_dsp/prob.h"
#include "aom_dsp/ans.h"
#include "aom_ports/mem_ops.h"
#if CONFIG_ACCOUNTING
#include "av1/common/accounting.h"
#endif
#ifdef __cplusplus
extern "C" {
#endif // __cplusplus
struct AnsDecoder {
const uint8_t *buf;
int buf_offset;
uint32_t state;
#if CONFIG_ACCOUNTING
Accounting *accounting;
#endif
};
static INLINE int uabs_read(struct AnsDecoder *ans, AnsP8 p0) {
AnsP8 p = ANS_P8_PRECISION - p0;
int s;
unsigned xp, sp;
unsigned state = ans->state;
while (state < L_BASE && ans->buf_offset > 0) {
state = state * IO_BASE + ans->buf[--ans->buf_offset];
}
sp = state * p;
xp = sp / ANS_P8_PRECISION;
s = (sp & 0xFF) >= p0;
if (s)
ans->state = xp;
else
ans->state = state - xp;
return s;
}
static INLINE int uabs_read_bit(struct AnsDecoder *ans) {
int s;
unsigned state = ans->state;
while (state < L_BASE && ans->buf_offset > 0) {
state = state * IO_BASE + ans->buf[--ans->buf_offset];
}
s = (int)(state & 1);
ans->state = state >> 1;
return s;
}
struct rans_dec_sym {
uint8_t val;
aom_cdf_prob prob;
aom_cdf_prob cum_prob; // not-inclusive
};
static INLINE void fetch_sym(struct rans_dec_sym *out, const aom_cdf_prob *cdf,
aom_cdf_prob rem) {
int i;
aom_cdf_prob cum_prob = 0, top_prob;
// TODO(skal): if critical, could be a binary search.
// Or, better, an O(1) alias-table.
for (i = 0; rem >= (top_prob = cdf[i]); ++i) {
cum_prob = top_prob;
}
out->val = i;
out->prob = top_prob - cum_prob;
out->cum_prob = cum_prob;
}
static INLINE int rans_read(struct AnsDecoder *ans, const aom_cdf_prob *tab) {
unsigned rem;
unsigned quo;
struct rans_dec_sym sym;
while (ans->state < L_BASE && ans->buf_offset > 0) {
ans->state = ans->state * IO_BASE + ans->buf[--ans->buf_offset];
}
quo = ans->state / RANS_PRECISION;
rem = ans->state % RANS_PRECISION;
fetch_sym(&sym, tab, rem);
ans->state = quo * sym.prob + rem - sym.cum_prob;
return sym.val;
}
static INLINE int ans_read_init(struct AnsDecoder *const ans,
const uint8_t *const buf, int offset) {
unsigned x;
if (offset < 1) return 1;
ans->buf = buf;
x = buf[offset - 1] >> 6;
if (x == 0) {
ans->buf_offset = offset - 1;
ans->state = buf[offset - 1] & 0x3F;
} else if (x == 1) {
if (offset < 2) return 1;
ans->buf_offset = offset - 2;
ans->state = mem_get_le16(buf + offset - 2) & 0x3FFF;
} else if (x == 2) {
if (offset < 3) return 1;
ans->buf_offset = offset - 3;
ans->state = mem_get_le24(buf + offset - 3) & 0x3FFFFF;
} else if ((buf[offset - 1] & 0xE0) == 0xE0) {
if (offset < 4) return 1;
ans->buf_offset = offset - 4;
ans->state = mem_get_le32(buf + offset - 4) & 0x1FFFFFFF;
} else {
// 110xxxxx implies this byte is a superframe marker
return 1;
}
#if CONFIG_ACCOUNTING
ans->accounting = NULL;
#endif
ans->state += L_BASE;
if (ans->state >= L_BASE * IO_BASE) return 1;
return 0;
}
static INLINE int ans_read_end(struct AnsDecoder *const ans) {
return ans->state == L_BASE;
}
static INLINE int ans_reader_has_error(const struct AnsDecoder *const ans) {
return ans->state < L_BASE && ans->buf_offset == 0;
}
#ifdef __cplusplus
} // extern "C"
#endif // __cplusplus
#endif // AOM_DSP_ANSREADER_H_
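
The symbol lookup in fetch_sym is a linear scan over a cumulative distribution: the decoded symbol is the first index whose cumulative upper bound exceeds the remainder. A minimal sketch of that scan follows, assuming 16-bit CDF entries whose last value equals the rANS precision of 32768. sk_sym and sk_fetch_sym are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct sk_sym {
  int val;           /* decoded symbol index */
  uint32_t prob;     /* width of the symbol's slot */
  uint32_t cum_prob; /* start of the slot (exclusive of the symbol itself) */
};

/* Walk the CDF until the upper bound exceeds rem. The caller must ensure
 * rem is smaller than the final CDF entry, otherwise the scan would run
 * past the end of the table. */
static struct sk_sym sk_fetch_sym(const uint16_t *cdf, uint32_t rem) {
  struct sk_sym out;
  uint32_t cum_prob = 0, top_prob;
  int i;
  for (i = 0; rem >= (top_prob = cdf[i]); ++i) cum_prob = top_prob;
  out.val = i;
  out.prob = top_prob - cum_prob;
  out.cum_prob = cum_prob;
  return out;
}
```

With the CDF {8192, 24576, 32768}, a remainder of 8192 falls in the second slot, so the scan returns symbol 1 with prob 16384 and cum_prob 8192.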

120
aom_dsp/answriter.h Normal file

@ -0,0 +1,120 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_ANSWRITER_H_
#define AOM_DSP_ANSWRITER_H_
// A uABS and rANS encoder implementation of Asymmetric Numeral Systems
// http://arxiv.org/abs/1311.2540v2
#include <assert.h>
#include "./aom_config.h"
#include "aom/aom_integer.h"
#include "aom_dsp/ans.h"
#include "aom_dsp/prob.h"
#include "aom_ports/mem_ops.h"
#include "av1/common/odintrin.h"
#if RANS_PRECISION <= OD_DIVU_DMAX
#define ANS_DIVREM(quotient, remainder, dividend, divisor) \
do { \
quotient = OD_DIVU_SMALL((dividend), (divisor)); \
remainder = (dividend) - (quotient) * (divisor); \
} while (0)
#else
#define ANS_DIVREM(quotient, remainder, dividend, divisor) \
do { \
quotient = (dividend) / (divisor); \
remainder = (dividend) % (divisor); \
} while (0)
#endif
#define ANS_DIV8(dividend, divisor) OD_DIVU_SMALL((dividend), (divisor))
#ifdef __cplusplus
extern "C" {
#endif // __cplusplus
struct AnsCoder {
uint8_t *buf;
int buf_offset;
uint32_t state;
};
static INLINE void ans_write_init(struct AnsCoder *const ans,
uint8_t *const buf) {
ans->buf = buf;
ans->buf_offset = 0;
ans->state = L_BASE;
}
static INLINE int ans_write_end(struct AnsCoder *const ans) {
uint32_t state;
assert(ans->state >= L_BASE);
assert(ans->state < L_BASE * IO_BASE);
state = ans->state - L_BASE;
if (state < (1 << 6)) {
ans->buf[ans->buf_offset] = (0x00 << 6) + state;
return ans->buf_offset + 1;
} else if (state < (1 << 14)) {
mem_put_le16(ans->buf + ans->buf_offset, (0x01 << 14) + state);
return ans->buf_offset + 2;
} else if (state < (1 << 22)) {
mem_put_le24(ans->buf + ans->buf_offset, (0x02 << 22) + state);
return ans->buf_offset + 3;
} else if (state < (1 << 29)) {
mem_put_le32(ans->buf + ans->buf_offset, (0x07u << 29) + state);
return ans->buf_offset + 4;
} else {
assert(0 && "State is too large to be serialized");
return ans->buf_offset;
}
}
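
ans_write_end stores the final state with a length tag in the top bits of the last byte, and ans_read_init in ansreader.h reverses it. The sketch below shows that tag scheme as a round trip. It writes the little-endian bytes by hand instead of calling the mem_put_le* and mem_get_le* helpers, and works on the state after L_BASE has been subtracted. sk_put_state and sk_get_state are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Store `state` in 1 to 4 little-endian bytes. The top bits of the last
 * byte carry the tag: 00 -> 1 byte, 01 -> 2 bytes, 10 -> 3 bytes,
 * 111 -> 4 bytes. Returns the byte count, or 0 if state needs > 29 bits. */
static int sk_put_state(uint8_t *buf, uint32_t state) {
  uint32_t v;
  int n, i;
  if (state < (1u << 6)) {
    v = state;
    n = 1;
  } else if (state < (1u << 14)) {
    v = (0x01u << 14) + state;
    n = 2;
  } else if (state < (1u << 22)) {
    v = (0x02u << 22) + state;
    n = 3;
  } else if (state < (1u << 29)) {
    v = (0x07u << 29) + state;
    n = 4;
  } else {
    return 0;
  }
  for (i = 0; i < n; ++i) buf[i] = (uint8_t)(v >> (8 * i));
  return n;
}

/* Read the state back from the END of buf[0..size). Returns the number of
 * bytes consumed, or 0 on error (including the reserved 110 tag). */
static int sk_get_state(const uint8_t *buf, int size, uint32_t *state) {
  static const uint32_t masks[5] = { 0, 0x3F, 0x3FFF, 0x3FFFFF, 0x1FFFFFFF };
  uint32_t v = 0;
  unsigned tag;
  int n, i;
  if (size < 1) return 0;
  tag = buf[size - 1] >> 6;
  if (tag == 0) n = 1;
  else if (tag == 1) n = 2;
  else if (tag == 2) n = 3;
  else if ((buf[size - 1] & 0xE0) == 0xE0) n = 4;
  else return 0; /* 110xxxxx is reserved */
  if (size < n) return 0;
  for (i = 0; i < n; ++i) v |= (uint32_t)buf[size - n + i] << (8 * i);
  *state = v & masks[n];
  return n;
}
```

Because the tag sits in the most significant byte of a little-endian value, it ends up in the last byte written, which is the first byte a decoder reading backwards from the end of the buffer sees.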
// uABS with normalization
static INLINE void uabs_write(struct AnsCoder *ans, int val, AnsP8 p0) {
AnsP8 p = ANS_P8_PRECISION - p0;
const unsigned l_s = val ? p : p0;
while (ans->state >= L_BASE / ANS_P8_PRECISION * IO_BASE * l_s) {
ans->buf[ans->buf_offset++] = ans->state % IO_BASE;
ans->state /= IO_BASE;
}
if (!val)
ans->state = ANS_DIV8(ans->state * ANS_P8_PRECISION, p0);
else
ans->state = ANS_DIV8((ans->state + 1) * ANS_P8_PRECISION + p - 1, p) - 1;
}
struct rans_sym {
aom_cdf_prob prob;
aom_cdf_prob cum_prob; // not-inclusive
};
// rANS with normalization
// sym->prob takes the place of l_s from the paper
// RANS_PRECISION is m
static INLINE void rans_write(struct AnsCoder *ans,
const struct rans_sym *const sym) {
const aom_cdf_prob p = sym->prob;
unsigned quot, rem;
while (ans->state >= L_BASE / RANS_PRECISION * IO_BASE * p) {
ans->buf[ans->buf_offset++] = ans->state % IO_BASE;
ans->state /= IO_BASE;
}
ANS_DIVREM(quot, rem, ans->state, p);
ans->state = quot * RANS_PRECISION + rem + sym->cum_prob;
}
#undef ANS_DIV8
#undef ANS_DIVREM
#ifdef __cplusplus
} // extern "C"
#endif // __cplusplus
#endif // AOM_DSP_ANSWRITER_H_
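
rans_write and rans_read (in ansreader.h) are inverses, but ANS is last-in, first-out: the decoder returns symbols in the reverse of the order they were written. The sketch below is a simplified round trip under stated assumptions: it uses plain division instead of OD_DIVU_SMALL, assumes 16-bit CDF entries, and hands the encoder's final state directly to the decoder instead of serializing it with ans_write_end. All sk_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PRECISION (1u << 15)
#define SK_L_BASE SK_PRECISION
#define SK_IO_BASE 256u

struct sk_coder {
  uint8_t *buf;
  int buf_offset;
  uint32_t state;
};

/* Encode one symbol given its slot width (prob) and slot start (cum_prob).
 * Bytes are flushed first so the new state stays below L_BASE * IO_BASE. */
static void sk_write(struct sk_coder *c, uint32_t prob, uint32_t cum_prob) {
  while (c->state >= SK_L_BASE / SK_PRECISION * SK_IO_BASE * prob) {
    c->buf[c->buf_offset++] = (uint8_t)(c->state % SK_IO_BASE);
    c->state /= SK_IO_BASE;
  }
  c->state = (c->state / prob) * SK_PRECISION + c->state % prob + cum_prob;
}

/* Decode one symbol: refill from the end of the buffer, then invert the
 * encoder's state update. */
static int sk_read(struct sk_coder *c, const uint16_t *cdf) {
  uint32_t quo, rem, cum = 0, top;
  int i;
  while (c->state < SK_L_BASE && c->buf_offset > 0)
    c->state = c->state * SK_IO_BASE + c->buf[--c->buf_offset];
  quo = c->state / SK_PRECISION;
  rem = c->state % SK_PRECISION;
  for (i = 0; rem >= (top = cdf[i]); ++i) cum = top;
  c->state = quo * (top - cum) + rem - cum;
  return i;
}

/* Encode msg in reverse, decode it forwards, and report whether it matches.
 * Each symbol flushes at most two bytes, so 2 * n bytes are enough. */
static int sk_round_trip(const uint16_t *cdf, const uint8_t *msg, int n) {
  uint8_t buf[256];
  struct sk_coder c;
  int i;
  assert(n >= 0 && 2 * n <= (int)sizeof(buf));
  c.buf = buf;
  c.buf_offset = 0;
  c.state = SK_L_BASE;
  for (i = n - 1; i >= 0; --i) {
    const uint32_t lo = msg[i] ? cdf[msg[i] - 1] : 0;
    sk_write(&c, cdf[msg[i]] - lo, lo);
  }
  for (i = 0; i < n; ++i)
    if (sk_read(&c, cdf) != msg[i]) return 0;
  return 1;
}
```

Writing the message back to front is what lets the decoder consume it front to back; a real encoder would typically buffer symbols to achieve the same ordering.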


@ -1,28 +1,29 @@
/*
* Copyright (c) 2013 The WebM project authors. All Rights Reserved.
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <assert.h>
#include <string.h>
#include "./vpx_config.h"
#include "./vpx_dsp_rtcd.h"
#include "vpx/vpx_integer.h"
#include "vpx_dsp/vpx_convolve.h"
#include "vpx_dsp/vpx_dsp_common.h"
#include "vpx_dsp/vpx_filter.h"
#include "vpx_ports/mem.h"
#include "./aom_config.h"
#include "./aom_dsp_rtcd.h"
#include "aom/aom_integer.h"
#include "aom_dsp/aom_convolve.h"
#include "aom_dsp/aom_dsp_common.h"
#include "aom_dsp/aom_filter.h"
#include "aom_ports/mem.h"
static void convolve_horiz(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const InterpKernel *x_filters,
int x0_q4, int x_step_q4, int w, int h) {
const InterpKernel *x_filters, int x0_q4,
int x_step_q4, int w, int h) {
int x, y;
src -= SUBPEL_TAPS / 2 - 1;
for (y = 0; y < h; ++y) {
@ -31,8 +32,7 @@ static void convolve_horiz(const uint8_t *src, ptrdiff_t src_stride,
const uint8_t *const src_x = &src[x_q4 >> SUBPEL_BITS];
const int16_t *const x_filter = x_filters[x_q4 & SUBPEL_MASK];
int k, sum = 0;
for (k = 0; k < SUBPEL_TAPS; ++k)
sum += src_x[k] * x_filter[k];
for (k = 0; k < SUBPEL_TAPS; ++k) sum += src_x[k] * x_filter[k];
dst[x] = clip_pixel(ROUND_POWER_OF_TWO(sum, FILTER_BITS));
x_q4 += x_step_q4;
}
@ -43,8 +43,8 @@ static void convolve_horiz(const uint8_t *src, ptrdiff_t src_stride,
static void convolve_avg_horiz(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const InterpKernel *x_filters,
int x0_q4, int x_step_q4, int w, int h) {
const InterpKernel *x_filters, int x0_q4,
int x_step_q4, int w, int h) {
int x, y;
src -= SUBPEL_TAPS / 2 - 1;
for (y = 0; y < h; ++y) {
@ -53,10 +53,9 @@ static void convolve_avg_horiz(const uint8_t *src, ptrdiff_t src_stride,
const uint8_t *const src_x = &src[x_q4 >> SUBPEL_BITS];
const int16_t *const x_filter = x_filters[x_q4 & SUBPEL_MASK];
int k, sum = 0;
for (k = 0; k < SUBPEL_TAPS; ++k)
sum += src_x[k] * x_filter[k];
dst[x] = ROUND_POWER_OF_TWO(dst[x] +
clip_pixel(ROUND_POWER_OF_TWO(sum, FILTER_BITS)), 1);
for (k = 0; k < SUBPEL_TAPS; ++k) sum += src_x[k] * x_filter[k];
dst[x] = ROUND_POWER_OF_TWO(
dst[x] + clip_pixel(ROUND_POWER_OF_TWO(sum, FILTER_BITS)), 1);
x_q4 += x_step_q4;
}
src += src_stride;
@ -66,8 +65,8 @@ static void convolve_avg_horiz(const uint8_t *src, ptrdiff_t src_stride,
static void convolve_vert(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const InterpKernel *y_filters,
int y0_q4, int y_step_q4, int w, int h) {
const InterpKernel *y_filters, int y0_q4,
int y_step_q4, int w, int h) {
int x, y;
src -= src_stride * (SUBPEL_TAPS / 2 - 1);
@ -89,8 +88,8 @@ static void convolve_vert(const uint8_t *src, ptrdiff_t src_stride,
static void convolve_avg_vert(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const InterpKernel *y_filters,
int y0_q4, int y_step_q4, int w, int h) {
const InterpKernel *y_filters, int y0_q4,
int y_step_q4, int w, int h) {
int x, y;
src -= src_stride * (SUBPEL_TAPS / 2 - 1);
@ -102,8 +101,10 @@ static void convolve_avg_vert(const uint8_t *src, ptrdiff_t src_stride,
int k, sum = 0;
for (k = 0; k < SUBPEL_TAPS; ++k)
sum += src_y[k * src_stride] * y_filter[k];
dst[y * dst_stride] = ROUND_POWER_OF_TWO(dst[y * dst_stride] +
clip_pixel(ROUND_POWER_OF_TWO(sum, FILTER_BITS)), 1);
dst[y * dst_stride] = ROUND_POWER_OF_TWO(
dst[y * dst_stride] +
clip_pixel(ROUND_POWER_OF_TWO(sum, FILTER_BITS)),
1);
y_q4 += y_step_q4;
}
++src;
@ -111,13 +112,11 @@ static void convolve_avg_vert(const uint8_t *src, ptrdiff_t src_stride,
}
}
static void convolve(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const InterpKernel *const x_filters,
static void convolve(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
ptrdiff_t dst_stride, const InterpKernel *const x_filters,
int x0_q4, int x_step_q4,
const InterpKernel *const y_filters,
int y0_q4, int y_step_q4,
int w, int h) {
const InterpKernel *const y_filters, int y0_q4,
int y_step_q4, int w, int h) {
// Note: Fixed size intermediate buffer, temp, places limits on parameters.
// 2d filtering proceeds in 2 steps:
// (1) Interpolate horizontally into an intermediate buffer, temp.
@ -130,19 +129,21 @@ static void convolve(const uint8_t *src, ptrdiff_t src_stride,
// --Must round-up because block may be located at sub-pixel position.
// --Require an additional SUBPEL_TAPS rows for the 8-tap filter tails.
// --(((64 - 1) * 32 + 15) >> 4) + 1 + 8 = 135 (the + 1 rounds up).
uint8_t temp[135 * 64];
uint8_t temp[MAX_EXT_SIZE * MAX_SB_SIZE];
int intermediate_height =
(((h - 1) * y_step_q4 + y0_q4) >> SUBPEL_BITS) + SUBPEL_TAPS;
assert(w <= 64);
assert(h <= 64);
assert(w <= MAX_SB_SIZE);
assert(h <= MAX_SB_SIZE);
assert(y_step_q4 <= 32);
assert(x_step_q4 <= 32);
convolve_horiz(src - src_stride * (SUBPEL_TAPS / 2 - 1), src_stride, temp, 64,
x_filters, x0_q4, x_step_q4, w, intermediate_height);
convolve_vert(temp + 64 * (SUBPEL_TAPS / 2 - 1), 64, dst, dst_stride,
y_filters, y0_q4, y_step_q4, w, h);
convolve_horiz(src - src_stride * (SUBPEL_TAPS / 2 - 1), src_stride, temp,
MAX_SB_SIZE, x_filters, x0_q4, x_step_q4, w,
intermediate_height);
convolve_vert(temp + MAX_SB_SIZE * (SUBPEL_TAPS / 2 - 1), MAX_SB_SIZE, dst,
dst_stride, y_filters, y0_q4, y_step_q4, w, h);
}
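
The fixed-size temp buffer in convolve() is sized from the number of source rows the vertical pass needs. The sketch below computes that row count, assuming SUBPEL_BITS is 4 and SUBPEL_TAPS is 8 (values consistent with the comments above but defined outside this diff). intermediate_rows is a hypothetical name.

```c
#include <assert.h>

#define SK_SUBPEL_BITS 4 /* assumed: positions are in 1/16 pixel units */
#define SK_SUBPEL_TAPS 8 /* assumed: 8-tap interpolation kernels */

/* Rows of horizontally filtered data needed to produce h output rows when
 * stepping y_step_q4 sixteenths of a pixel per row from phase y0_q4. */
static int intermediate_rows(int h, int y_step_q4, int y0_q4) {
  assert(h >= 1);
  assert(y_step_q4 >= 1 && y_step_q4 <= 32);
  assert(y0_q4 >= 0 && y0_q4 < (1 << SK_SUBPEL_BITS));
  return (((h - 1) * y_step_q4 + y0_q4) >> SK_SUBPEL_BITS) + SK_SUBPEL_TAPS;
}
```

For a 64-row block at the maximum step of 32 and phase 15, the expression gives 134 rows. The 135 quoted in the source comment is one row larger, which matches its note about rounding up for sub-pixel positions. Without scaling (step 16, phase 0) the same block needs 71 rows.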
static const InterpKernel *get_filter_base(const int16_t *filter) {
@ -155,70 +156,69 @@ static int get_filter_offset(const int16_t *f, const InterpKernel *base) {
return (int)((const InterpKernel *)(intptr_t)f - base);
}
void vpx_convolve8_horiz_c(const uint8_t *src, ptrdiff_t src_stride,
void aom_convolve8_horiz_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4,
int w, int h) {
const int16_t *filter_y, int y_step_q4, int w,
int h) {
const InterpKernel *const filters_x = get_filter_base(filter_x);
const int x0_q4 = get_filter_offset(filter_x, filters_x);
(void)filter_y;
(void)y_step_q4;
convolve_horiz(src, src_stride, dst, dst_stride, filters_x,
x0_q4, x_step_q4, w, h);
convolve_horiz(src, src_stride, dst, dst_stride, filters_x, x0_q4, x_step_q4,
w, h);
}
void vpx_convolve8_avg_horiz_c(const uint8_t *src, ptrdiff_t src_stride,
void aom_convolve8_avg_horiz_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4,
int w, int h) {
const int16_t *filter_y, int y_step_q4, int w,
int h) {
const InterpKernel *const filters_x = get_filter_base(filter_x);
const int x0_q4 = get_filter_offset(filter_x, filters_x);
(void)filter_y;
(void)y_step_q4;
convolve_avg_horiz(src, src_stride, dst, dst_stride, filters_x,
x0_q4, x_step_q4, w, h);
convolve_avg_horiz(src, src_stride, dst, dst_stride, filters_x, x0_q4,
x_step_q4, w, h);
}
void vpx_convolve8_vert_c(const uint8_t *src, ptrdiff_t src_stride,
void aom_convolve8_vert_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4,
int w, int h) {
const int16_t *filter_y, int y_step_q4, int w,
int h) {
const InterpKernel *const filters_y = get_filter_base(filter_y);
const int y0_q4 = get_filter_offset(filter_y, filters_y);
(void)filter_x;
(void)x_step_q4;
convolve_vert(src, src_stride, dst, dst_stride, filters_y,
y0_q4, y_step_q4, w, h);
convolve_vert(src, src_stride, dst, dst_stride, filters_y, y0_q4, y_step_q4,
w, h);
}
void vpx_convolve8_avg_vert_c(const uint8_t *src, ptrdiff_t src_stride,
void aom_convolve8_avg_vert_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4,
int w, int h) {
const int16_t *filter_y, int y_step_q4, int w,
int h) {
const InterpKernel *const filters_y = get_filter_base(filter_y);
const int y0_q4 = get_filter_offset(filter_y, filters_y);
(void)filter_x;
(void)x_step_q4;
convolve_avg_vert(src, src_stride, dst, dst_stride, filters_y,
y0_q4, y_step_q4, w, h);
convolve_avg_vert(src, src_stride, dst, dst_stride, filters_y, y0_q4,
y_step_q4, w, h);
}
void vpx_convolve8_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4,
void aom_convolve8_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
ptrdiff_t dst_stride, const int16_t *filter_x,
int x_step_q4, const int16_t *filter_y, int y_step_q4,
int w, int h) {
const InterpKernel *const filters_x = get_filter_base(filter_x);
const int x0_q4 = get_filter_offset(filter_x, filters_x);
@ -226,35 +226,35 @@ void vpx_convolve8_c(const uint8_t *src, ptrdiff_t src_stride,
const InterpKernel *const filters_y = get_filter_base(filter_y);
const int y0_q4 = get_filter_offset(filter_y, filters_y);
convolve(src, src_stride, dst, dst_stride,
filters_x, x0_q4, x_step_q4,
convolve(src, src_stride, dst, dst_stride, filters_x, x0_q4, x_step_q4,
filters_y, y0_q4, y_step_q4, w, h);
}
void vpx_convolve8_avg_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4,
void aom_convolve8_avg_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
ptrdiff_t dst_stride, const int16_t *filter_x,
int x_step_q4, const int16_t *filter_y, int y_step_q4,
int w, int h) {
/* Fixed size intermediate buffer places limits on parameters. */
DECLARE_ALIGNED(16, uint8_t, temp[64 * 64]);
assert(w <= 64);
assert(h <= 64);
DECLARE_ALIGNED(16, uint8_t, temp[MAX_SB_SIZE * MAX_SB_SIZE]);
assert(w <= MAX_SB_SIZE);
assert(h <= MAX_SB_SIZE);
vpx_convolve8_c(src, src_stride, temp, 64,
filter_x, x_step_q4, filter_y, y_step_q4, w, h);
vpx_convolve_avg_c(temp, 64, dst, dst_stride, NULL, 0, NULL, 0, w, h);
aom_convolve8_c(src, src_stride, temp, MAX_SB_SIZE, filter_x, x_step_q4,
filter_y, y_step_q4, w, h);
aom_convolve_avg_c(temp, MAX_SB_SIZE, dst, dst_stride, NULL, 0, NULL, 0, w,
h);
}
void vpx_convolve_copy_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int filter_x_stride,
const int16_t *filter_y, int filter_y_stride,
int w, int h) {
void aom_convolve_copy_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
ptrdiff_t dst_stride, const int16_t *filter_x,
int filter_x_stride, const int16_t *filter_y,
int filter_y_stride, int w, int h) {
int r;
(void)filter_x; (void)filter_x_stride;
(void)filter_y; (void)filter_y_stride;
(void)filter_x;
(void)filter_x_stride;
(void)filter_y;
(void)filter_y_stride;
for (r = h; r > 0; --r) {
memcpy(dst, src, w);
@ -263,85 +263,80 @@ void vpx_convolve_copy_c(const uint8_t *src, ptrdiff_t src_stride,
}
}
void vpx_convolve_avg_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int filter_x_stride,
const int16_t *filter_y, int filter_y_stride,
int w, int h) {
void aom_convolve_avg_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
ptrdiff_t dst_stride, const int16_t *filter_x,
int filter_x_stride, const int16_t *filter_y,
int filter_y_stride, int w, int h) {
int x, y;
(void)filter_x; (void)filter_x_stride;
(void)filter_y; (void)filter_y_stride;
(void)filter_x;
(void)filter_x_stride;
(void)filter_y;
(void)filter_y_stride;
for (y = 0; y < h; ++y) {
for (x = 0; x < w; ++x)
dst[x] = ROUND_POWER_OF_TWO(dst[x] + src[x], 1);
for (x = 0; x < w; ++x) dst[x] = ROUND_POWER_OF_TWO(dst[x] + src[x], 1);
src += src_stride;
dst += dst_stride;
}
}
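
The averaging variants combine the new prediction with what is already in dst using a rounded mean. The sketch below shows that one operation, assuming ROUND_POWER_OF_TWO(value, n) adds half of 2^n before shifting right by n; that is the usual definition, but the macro itself is not shown in this diff. round_avg_u8 is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Rounded average of two 8-bit samples: (a + b + 1) >> 1. The sum is
 * formed in int, so it cannot overflow before the shift. */
static uint8_t round_avg_u8(uint8_t a, uint8_t b) {
  const int sum = (int)a + (int)b;
  return (uint8_t)((sum + 1) >> 1);
}
```

Ties round up, so averaging 1 and 2 gives 2, and the result of averaging two values in [0, 255] always fits back in 8 bits.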
void vpx_scaled_horiz_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4,
void aom_scaled_horiz_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
ptrdiff_t dst_stride, const int16_t *filter_x,
int x_step_q4, const int16_t *filter_y, int y_step_q4,
int w, int h) {
vpx_convolve8_horiz_c(src, src_stride, dst, dst_stride, filter_x, x_step_q4,
aom_convolve8_horiz_c(src, src_stride, dst, dst_stride, filter_x, x_step_q4,
filter_y, y_step_q4, w, h);
}
void vpx_scaled_vert_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4,
void aom_scaled_vert_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
ptrdiff_t dst_stride, const int16_t *filter_x,
int x_step_q4, const int16_t *filter_y, int y_step_q4,
int w, int h) {
vpx_convolve8_vert_c(src, src_stride, dst, dst_stride, filter_x, x_step_q4,
aom_convolve8_vert_c(src, src_stride, dst, dst_stride, filter_x, x_step_q4,
filter_y, y_step_q4, w, h);
}
void vpx_scaled_2d_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4,
void aom_scaled_2d_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
ptrdiff_t dst_stride, const int16_t *filter_x,
int x_step_q4, const int16_t *filter_y, int y_step_q4,
int w, int h) {
vpx_convolve8_c(src, src_stride, dst, dst_stride, filter_x, x_step_q4,
aom_convolve8_c(src, src_stride, dst, dst_stride, filter_x, x_step_q4,
filter_y, y_step_q4, w, h);
}
void vpx_scaled_avg_horiz_c(const uint8_t *src, ptrdiff_t src_stride,
void aom_scaled_avg_horiz_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4,
int w, int h) {
vpx_convolve8_avg_horiz_c(src, src_stride, dst, dst_stride, filter_x,
const int16_t *filter_y, int y_step_q4, int w,
int h) {
aom_convolve8_avg_horiz_c(src, src_stride, dst, dst_stride, filter_x,
x_step_q4, filter_y, y_step_q4, w, h);
}
void vpx_scaled_avg_vert_c(const uint8_t *src, ptrdiff_t src_stride,
void aom_scaled_avg_vert_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4,
int w, int h) {
vpx_convolve8_avg_vert_c(src, src_stride, dst, dst_stride, filter_x,
const int16_t *filter_y, int y_step_q4, int w,
int h) {
aom_convolve8_avg_vert_c(src, src_stride, dst, dst_stride, filter_x,
x_step_q4, filter_y, y_step_q4, w, h);
}
void vpx_scaled_avg_2d_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4,
void aom_scaled_avg_2d_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
ptrdiff_t dst_stride, const int16_t *filter_x,
int x_step_q4, const int16_t *filter_y, int y_step_q4,
int w, int h) {
vpx_convolve8_avg_c(src, src_stride, dst, dst_stride, filter_x, x_step_q4,
aom_convolve8_avg_c(src, src_stride, dst, dst_stride, filter_x, x_step_q4,
filter_y, y_step_q4, w, h);
}
#if CONFIG_VP9_HIGHBITDEPTH
#if CONFIG_AOM_HIGHBITDEPTH
static void highbd_convolve_horiz(const uint8_t *src8, ptrdiff_t src_stride,
uint8_t *dst8, ptrdiff_t dst_stride,
const InterpKernel *x_filters,
int x0_q4, int x_step_q4,
int w, int h, int bd) {
const InterpKernel *x_filters, int x0_q4,
int x_step_q4, int w, int h, int bd) {
int x, y;
uint16_t *src = CONVERT_TO_SHORTPTR(src8);
uint16_t *dst = CONVERT_TO_SHORTPTR(dst8);
@ -352,8 +347,7 @@ static void highbd_convolve_horiz(const uint8_t *src8, ptrdiff_t src_stride,
const uint16_t *const src_x = &src[x_q4 >> SUBPEL_BITS];
const int16_t *const x_filter = x_filters[x_q4 & SUBPEL_MASK];
int k, sum = 0;
for (k = 0; k < SUBPEL_TAPS; ++k)
sum += src_x[k] * x_filter[k];
for (k = 0; k < SUBPEL_TAPS; ++k) sum += src_x[k] * x_filter[k];
dst[x] = clip_pixel_highbd(ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd);
x_q4 += x_step_q4;
}
@ -364,9 +358,8 @@ static void highbd_convolve_horiz(const uint8_t *src8, ptrdiff_t src_stride,
static void highbd_convolve_avg_horiz(const uint8_t *src8, ptrdiff_t src_stride,
uint8_t *dst8, ptrdiff_t dst_stride,
const InterpKernel *x_filters,
int x0_q4, int x_step_q4,
int w, int h, int bd) {
const InterpKernel *x_filters, int x0_q4,
int x_step_q4, int w, int h, int bd) {
int x, y;
uint16_t *src = CONVERT_TO_SHORTPTR(src8);
uint16_t *dst = CONVERT_TO_SHORTPTR(dst8);
@ -377,10 +370,10 @@ static void highbd_convolve_avg_horiz(const uint8_t *src8, ptrdiff_t src_stride,
const uint16_t *const src_x = &src[x_q4 >> SUBPEL_BITS];
const int16_t *const x_filter = x_filters[x_q4 & SUBPEL_MASK];
int k, sum = 0;
for (k = 0; k < SUBPEL_TAPS; ++k)
sum += src_x[k] * x_filter[k];
dst[x] = ROUND_POWER_OF_TWO(dst[x] +
clip_pixel_highbd(ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd), 1);
for (k = 0; k < SUBPEL_TAPS; ++k) sum += src_x[k] * x_filter[k];
dst[x] = ROUND_POWER_OF_TWO(
dst[x] + clip_pixel_highbd(ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd),
1);
x_q4 += x_step_q4;
}
src += src_stride;
@ -390,9 +383,8 @@ static void highbd_convolve_avg_horiz(const uint8_t *src8, ptrdiff_t src_stride,
static void highbd_convolve_vert(const uint8_t *src8, ptrdiff_t src_stride,
uint8_t *dst8, ptrdiff_t dst_stride,
const InterpKernel *y_filters,
int y0_q4, int y_step_q4, int w, int h,
int bd) {
const InterpKernel *y_filters, int y0_q4,
int y_step_q4, int w, int h, int bd) {
int x, y;
uint16_t *src = CONVERT_TO_SHORTPTR(src8);
uint16_t *dst = CONVERT_TO_SHORTPTR(dst8);
@ -405,8 +397,8 @@ static void highbd_convolve_vert(const uint8_t *src8, ptrdiff_t src_stride,
int k, sum = 0;
for (k = 0; k < SUBPEL_TAPS; ++k)
sum += src_y[k * src_stride] * y_filter[k];
dst[y * dst_stride] = clip_pixel_highbd(
ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd);
dst[y * dst_stride] =
clip_pixel_highbd(ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd);
y_q4 += y_step_q4;
}
++src;
@ -416,9 +408,8 @@ static void highbd_convolve_vert(const uint8_t *src8, ptrdiff_t src_stride,
static void highbd_convolve_avg_vert(const uint8_t *src8, ptrdiff_t src_stride,
uint8_t *dst8, ptrdiff_t dst_stride,
const InterpKernel *y_filters,
int y0_q4, int y_step_q4, int w, int h,
int bd) {
const InterpKernel *y_filters, int y0_q4,
int y_step_q4, int w, int h, int bd) {
int x, y;
uint16_t *src = CONVERT_TO_SHORTPTR(src8);
uint16_t *dst = CONVERT_TO_SHORTPTR(dst8);
@ -431,8 +422,10 @@ static void highbd_convolve_avg_vert(const uint8_t *src8, ptrdiff_t src_stride,
int k, sum = 0;
for (k = 0; k < SUBPEL_TAPS; ++k)
sum += src_y[k * src_stride] * y_filter[k];
dst[y * dst_stride] = ROUND_POWER_OF_TWO(dst[y * dst_stride] +
clip_pixel_highbd(ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd), 1);
dst[y * dst_stride] = ROUND_POWER_OF_TWO(
dst[y * dst_stride] +
clip_pixel_highbd(ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd),
1);
y_q4 += y_step_q4;
}
++src;
@ -442,11 +435,9 @@ static void highbd_convolve_avg_vert(const uint8_t *src8, ptrdiff_t src_stride,
static void highbd_convolve(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const InterpKernel *const x_filters,
int x0_q4, int x_step_q4,
const InterpKernel *const y_filters,
int y0_q4, int y_step_q4,
int w, int h, int bd) {
const InterpKernel *const x_filters, int x0_q4,
int x_step_q4, const InterpKernel *const y_filters,
int y0_q4, int y_step_q4, int w, int h, int bd) {
// Note: Fixed size intermediate buffer, temp, places limits on parameters.
// 2d filtering proceeds in 2 steps:
// (1) Interpolate horizontally into an intermediate buffer, temp.
@ -459,26 +450,38 @@ static void highbd_convolve(const uint8_t *src, ptrdiff_t src_stride,
// --Must round-up because block may be located at sub-pixel position.
// --Require an additional SUBPEL_TAPS rows for the 8-tap filter tails.
// --((64 - 1) * 32 + 15) >> 4 + 8 = 135.
uint16_t temp[64 * 135];
uint16_t temp[MAX_EXT_SIZE * MAX_SB_SIZE];
int intermediate_height =
(((h - 1) * y_step_q4 + y0_q4) >> SUBPEL_BITS) + SUBPEL_TAPS;
assert(w <= 64);
assert(h <= 64);
assert(w <= MAX_SB_SIZE);
assert(h <= MAX_SB_SIZE);
assert(y_step_q4 <= 32);
assert(x_step_q4 <= 32);
highbd_convolve_horiz(src - src_stride * (SUBPEL_TAPS / 2 - 1),
src_stride, CONVERT_TO_BYTEPTR(temp), 64,
x_filters, x0_q4, x_step_q4, w,
intermediate_height, bd);
highbd_convolve_vert(CONVERT_TO_BYTEPTR(temp) + 64 * (SUBPEL_TAPS / 2 - 1),
64, dst, dst_stride, y_filters, y0_q4, y_step_q4,
w, h, bd);
highbd_convolve_horiz(src - src_stride * (SUBPEL_TAPS / 2 - 1), src_stride,
CONVERT_TO_BYTEPTR(temp), MAX_SB_SIZE, x_filters, x0_q4,
x_step_q4, w, intermediate_height, bd);
highbd_convolve_vert(
CONVERT_TO_BYTEPTR(temp) + MAX_SB_SIZE * (SUBPEL_TAPS / 2 - 1),
MAX_SB_SIZE, dst, dst_stride, y_filters, y0_q4, y_step_q4, w, h, bd);
}
void aom_highbd_convolve8_horiz_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4, int w,
int h, int bd) {
const InterpKernel *const filters_x = get_filter_base(filter_x);
const int x0_q4 = get_filter_offset(filter_x, filters_x);
(void)filter_y;
(void)y_step_q4;
void vpx_highbd_convolve8_horiz_c(const uint8_t *src, ptrdiff_t src_stride,
highbd_convolve_horiz(src, src_stride, dst, dst_stride, filters_x, x0_q4,
x_step_q4, w, h, bd);
}
void aom_highbd_convolve8_avg_horiz_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4,
@ -488,25 +491,25 @@ void vpx_highbd_convolve8_horiz_c(const uint8_t *src, ptrdiff_t src_stride,
(void)filter_y;
(void)y_step_q4;
highbd_convolve_horiz(src, src_stride, dst, dst_stride, filters_x,
x0_q4, x_step_q4, w, h, bd);
highbd_convolve_avg_horiz(src, src_stride, dst, dst_stride, filters_x, x0_q4,
x_step_q4, w, h, bd);
}
void vpx_highbd_convolve8_avg_horiz_c(const uint8_t *src, ptrdiff_t src_stride,
void aom_highbd_convolve8_vert_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4,
int w, int h, int bd) {
const InterpKernel *const filters_x = get_filter_base(filter_x);
const int x0_q4 = get_filter_offset(filter_x, filters_x);
(void)filter_y;
(void)y_step_q4;
const int16_t *filter_y, int y_step_q4, int w,
int h, int bd) {
const InterpKernel *const filters_y = get_filter_base(filter_y);
const int y0_q4 = get_filter_offset(filter_y, filters_y);
(void)filter_x;
(void)x_step_q4;
highbd_convolve_avg_horiz(src, src_stride, dst, dst_stride, filters_x,
x0_q4, x_step_q4, w, h, bd);
highbd_convolve_vert(src, src_stride, dst, dst_stride, filters_y, y0_q4,
y_step_q4, w, h, bd);
}
void vpx_highbd_convolve8_vert_c(const uint8_t *src, ptrdiff_t src_stride,
void aom_highbd_convolve8_avg_vert_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4,
@ -516,57 +519,42 @@ void vpx_highbd_convolve8_vert_c(const uint8_t *src, ptrdiff_t src_stride,
(void)filter_x;
(void)x_step_q4;
highbd_convolve_vert(src, src_stride, dst, dst_stride, filters_y,
y0_q4, y_step_q4, w, h, bd);
highbd_convolve_avg_vert(src, src_stride, dst, dst_stride, filters_y, y0_q4,
y_step_q4, w, h, bd);
}
void vpx_highbd_convolve8_avg_vert_c(const uint8_t *src, ptrdiff_t src_stride,
void aom_highbd_convolve8_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4,
int w, int h, int bd) {
const InterpKernel *const filters_y = get_filter_base(filter_y);
const int y0_q4 = get_filter_offset(filter_y, filters_y);
(void)filter_x;
(void)x_step_q4;
highbd_convolve_avg_vert(src, src_stride, dst, dst_stride, filters_y,
y0_q4, y_step_q4, w, h, bd);
}
void vpx_highbd_convolve8_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4,
int w, int h, int bd) {
const int16_t *filter_y, int y_step_q4, int w,
int h, int bd) {
const InterpKernel *const filters_x = get_filter_base(filter_x);
const int x0_q4 = get_filter_offset(filter_x, filters_x);
const InterpKernel *const filters_y = get_filter_base(filter_y);
const int y0_q4 = get_filter_offset(filter_y, filters_y);
highbd_convolve(src, src_stride, dst, dst_stride,
filters_x, x0_q4, x_step_q4,
highbd_convolve(src, src_stride, dst, dst_stride, filters_x, x0_q4, x_step_q4,
filters_y, y0_q4, y_step_q4, w, h, bd);
}
void vpx_highbd_convolve8_avg_c(const uint8_t *src, ptrdiff_t src_stride,
void aom_highbd_convolve8_avg_c(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4,
int w, int h, int bd) {
const int16_t *filter_y, int y_step_q4, int w,
int h, int bd) {
// Fixed size intermediate buffer places limits on parameters.
DECLARE_ALIGNED(16, uint16_t, temp[64 * 64]);
assert(w <= 64);
assert(h <= 64);
DECLARE_ALIGNED(16, uint16_t, temp[MAX_SB_SIZE * MAX_SB_SIZE]);
assert(w <= MAX_SB_SIZE);
assert(h <= MAX_SB_SIZE);
vpx_highbd_convolve8_c(src, src_stride, CONVERT_TO_BYTEPTR(temp), 64,
aom_highbd_convolve8_c(src, src_stride, CONVERT_TO_BYTEPTR(temp), MAX_SB_SIZE,
filter_x, x_step_q4, filter_y, y_step_q4, w, h, bd);
vpx_highbd_convolve_avg_c(CONVERT_TO_BYTEPTR(temp), 64, dst, dst_stride,
NULL, 0, NULL, 0, w, h, bd);
aom_highbd_convolve_avg_c(CONVERT_TO_BYTEPTR(temp), MAX_SB_SIZE, dst,
dst_stride, NULL, 0, NULL, 0, w, h, bd);
}
void vpx_highbd_convolve_copy_c(const uint8_t *src8, ptrdiff_t src_stride,
void aom_highbd_convolve_copy_c(const uint8_t *src8, ptrdiff_t src_stride,
uint8_t *dst8, ptrdiff_t dst_stride,
const int16_t *filter_x, int filter_x_stride,
const int16_t *filter_y, int filter_y_stride,
@ -587,7 +575,7 @@ void vpx_highbd_convolve_copy_c(const uint8_t *src8, ptrdiff_t src_stride,
}
}
void vpx_highbd_convolve_avg_c(const uint8_t *src8, ptrdiff_t src_stride,
void aom_highbd_convolve_avg_c(const uint8_t *src8, ptrdiff_t src_stride,
uint8_t *dst8, ptrdiff_t dst_stride,
const int16_t *filter_x, int filter_x_stride,
const int16_t *filter_y, int filter_y_stride,

57
aom_dsp/aom_convolve.h Normal file

@ -0,0 +1,57 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_AOM_CONVOLVE_H_
#define AOM_DSP_AOM_CONVOLVE_H_
#include "./aom_config.h"
#include "aom/aom_integer.h"
#ifdef __cplusplus
extern "C" {
#endif
// Note: Fixed size intermediate buffers, place limits on parameters
// of some functions. 2d filtering proceeds in 2 steps:
// (1) Interpolate horizontally into an intermediate buffer, temp.
// (2) Interpolate temp vertically to derive the sub-pixel result.
// Deriving the maximum number of rows in the temp buffer (135):
// --Smallest scaling factor is x1/2 ==> y_step_q4 = 32 (Normative).
// --Largest block size is 64x64 pixels.
// --64 rows in the downscaled frame span a distance of (64 - 1) * 32 in the
// original frame (in 1/16th pixel units).
// --Must round-up because block may be located at sub-pixel position.
// --Require an additional SUBPEL_TAPS rows for the 8-tap filter tails.
// --((64 - 1) * 32 + 15) >> 4 + 8 = 135.
#if CONFIG_AV1 && CONFIG_EXT_PARTITION
#define MAX_EXT_SIZE 263
#else
#define MAX_EXT_SIZE 135
#endif // CONFIG_AV1 && CONFIG_EXT_PARTITION
typedef void (*convolve_fn_t)(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4, int w,
int h);
#if CONFIG_AOM_HIGHBITDEPTH
typedef void (*highbd_convolve_fn_t)(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4,
int w, int h, int bd);
#endif
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_AOM_CONVOLVE_H_
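The buffer-size derivation in the header comment above can be checked with a few lines of C. This is a minimal sketch, not code from the change: `intermediate_rows` is a hypothetical helper that mirrors the `intermediate_height` expression in `highbd_convolve()`, and the two constants are copied from aom_filter.h. With the shift applied before the addition, the worst case comes out as 134 rows for a 64-pixel block and 262 rows for a 128-pixel block, so `MAX_EXT_SIZE` values of 135 and 263 appear to leave one spare row each.

```c
#include <assert.h>

/* Values copied from aom_filter.h in this change. */
#define SUBPEL_BITS 4
#define SUBPEL_TAPS 8

/* Hypothetical helper (not part of the change): number of rows the
 * horizontal pass must write into temp for a block of height h. It mirrors
 * the intermediate_height expression in highbd_convolve(). */
static int intermediate_rows(int h, int y_step_q4, int y0_q4) {
  assert(h > 0);
  assert(y_step_q4 > 0 && y_step_q4 <= 32); /* x1/2 is the smallest scale */
  assert(y0_q4 >= 0 && y0_q4 < (1 << SUBPEL_BITS));
  /* Distance covered in 1/16th-pixel units, converted to whole rows,
   * plus the rows needed by the 8-tap filter tails. */
  return (((h - 1) * y_step_q4 + y0_q4) >> SUBPEL_BITS) + SUBPEL_TAPS;
}
```

Note that the explicit parentheses matter: in C, `+` binds tighter than `>>`, so the comment's `>> 4 + 8` only gives the intended result when read as `(... >> 4) + 8`.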


@ -1,27 +1,39 @@
##
## Copyright (c) 2015 The WebM project authors. All Rights Reserved.
## Copyright (c) 2016, Alliance for Open Media. All rights reserved
##
## Use of this source code is governed by a BSD-style license
## that can be found in the LICENSE file in the root of the source
## tree. An additional intellectual property rights grant can be found
## in the file PATENTS. All contributing project authors may
## be found in the AUTHORS file in the root of the source tree.
## This source code is subject to the terms of the BSD 2 Clause License and
## the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
## was not distributed with this source code in the LICENSE file, you can
## obtain it at www.aomedia.org/license/software. If the Alliance for Open
## Media Patent License 1.0 was not distributed with this source code in the
## PATENTS file, you can obtain it at www.aomedia.org/license/patent.
##
DSP_SRCS-yes += vpx_dsp.mk
DSP_SRCS-yes += vpx_dsp_common.h
DSP_SRCS-yes += aom_dsp.mk
DSP_SRCS-yes += aom_dsp_common.h
DSP_SRCS-$(HAVE_MSA) += mips/macros_msa.h
DSP_SRCS-$(ARCH_X86)$(ARCH_X86_64) += x86/synonyms.h
# bit reader
DSP_SRCS-yes += prob.h
DSP_SRCS-yes += prob.c
DSP_SRCS-$(CONFIG_ANS) += ans.h
DSP_SRCS-$(CONFIG_ANS) += ans.c
ifeq ($(CONFIG_ENCODERS),yes)
DSP_SRCS-$(CONFIG_ANS) += answriter.h
DSP_SRCS-yes += bitwriter.h
DSP_SRCS-yes += bitwriter.c
DSP_SRCS-yes += dkboolwriter.h
DSP_SRCS-yes += dkboolwriter.c
DSP_SRCS-yes += bitwriter_buffer.c
DSP_SRCS-yes += bitwriter_buffer.h
DSP_SRCS-yes += psnr.c
DSP_SRCS-yes += psnr.h
DSP_SRCS-$(CONFIG_ANS) += buf_ans.h
DSP_SRCS-$(CONFIG_ANS) += buf_ans.c
DSP_SRCS-$(CONFIG_INTERNAL_STATS) += ssim.c
DSP_SRCS-$(CONFIG_INTERNAL_STATS) += ssim.h
DSP_SRCS-$(CONFIG_INTERNAL_STATS) += psnrhvs.c
@ -29,8 +41,10 @@ DSP_SRCS-$(CONFIG_INTERNAL_STATS) += fastssim.c
endif
ifeq ($(CONFIG_DECODERS),yes)
DSP_SRCS-$(CONFIG_ANS) += ansreader.h
DSP_SRCS-yes += bitreader.h
DSP_SRCS-yes += bitreader.c
DSP_SRCS-yes += dkboolreader.h
DSP_SRCS-yes += dkboolreader.c
DSP_SRCS-yes += bitreader_buffer.c
DSP_SRCS-yes += bitreader_buffer.h
endif
@ -38,25 +52,28 @@ endif
# intra predictions
DSP_SRCS-yes += intrapred.c
ifeq ($(CONFIG_USE_X86INC),yes)
ifeq ($(CONFIG_DAALA_EC),yes)
DSP_SRCS-yes += entenc.c
DSP_SRCS-yes += entenc.h
DSP_SRCS-yes += entdec.c
DSP_SRCS-yes += entdec.h
DSP_SRCS-yes += entcode.c
DSP_SRCS-yes += entcode.h
DSP_SRCS-yes += daalaboolreader.c
DSP_SRCS-yes += daalaboolreader.h
DSP_SRCS-yes += daalaboolwriter.c
DSP_SRCS-yes += daalaboolwriter.h
endif
DSP_SRCS-$(HAVE_SSE) += x86/intrapred_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/intrapred_sse2.asm
DSP_SRCS-$(HAVE_SSSE3) += x86/intrapred_ssse3.asm
DSP_SRCS-$(HAVE_SSSE3) += x86/vpx_subpixel_8t_ssse3.asm
endif # CONFIG_USE_X86INC
DSP_SRCS-$(HAVE_SSSE3) += x86/aom_subpixel_8t_ssse3.asm
ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes)
ifeq ($(CONFIG_USE_X86INC),yes)
ifeq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE) += x86/highbd_intrapred_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_intrapred_sse2.asm
endif # CONFIG_USE_X86INC
endif # CONFIG_VP9_HIGHBITDEPTH
ifneq ($(filter yes,$(CONFIG_POSTPROC) $(CONFIG_VP9_POSTPROC)),)
DSP_SRCS-yes += add_noise.c
DSP_SRCS-$(HAVE_MSA) += mips/add_noise_msa.c
DSP_SRCS-$(HAVE_SSE2) += x86/add_noise_sse2.asm
endif # CONFIG_POSTPROC
endif # CONFIG_AOM_HIGHBITDEPTH
DSP_SRCS-$(HAVE_NEON_ASM) += arm/intrapred_neon_asm$(ASM)
DSP_SRCS-$(HAVE_NEON) += arm/intrapred_neon.c
@ -68,53 +85,61 @@ DSP_SRCS-$(HAVE_DSPR2) += mips/intrapred16_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/common_dspr2.h
DSP_SRCS-$(HAVE_DSPR2) += mips/common_dspr2.c
# inter predictions
DSP_SRCS-yes += blend.h
DSP_SRCS-yes += blend_a64_mask.c
DSP_SRCS-yes += blend_a64_hmask.c
DSP_SRCS-yes += blend_a64_vmask.c
DSP_SRCS-$(HAVE_SSE4_1) += x86/blend_sse4.h
DSP_SRCS-$(HAVE_SSE4_1) += x86/blend_a64_mask_sse4.c
DSP_SRCS-$(HAVE_SSE4_1) += x86/blend_a64_hmask_sse4.c
DSP_SRCS-$(HAVE_SSE4_1) += x86/blend_a64_vmask_sse4.c
# interpolation filters
DSP_SRCS-yes += vpx_convolve.c
DSP_SRCS-yes += vpx_convolve.h
DSP_SRCS-yes += vpx_filter.h
DSP_SRCS-yes += aom_convolve.c
DSP_SRCS-yes += aom_convolve.h
DSP_SRCS-yes += aom_filter.h
DSP_SRCS-$(ARCH_X86)$(ARCH_X86_64) += x86/convolve.h
DSP_SRCS-$(ARCH_X86)$(ARCH_X86_64) += x86/vpx_asm_stubs.c
DSP_SRCS-$(HAVE_SSE2) += x86/vpx_subpixel_8t_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/vpx_subpixel_bilinear_sse2.asm
DSP_SRCS-$(HAVE_SSSE3) += x86/vpx_subpixel_8t_ssse3.asm
DSP_SRCS-$(HAVE_SSSE3) += x86/vpx_subpixel_bilinear_ssse3.asm
DSP_SRCS-$(HAVE_AVX2) += x86/vpx_subpixel_8t_intrin_avx2.c
DSP_SRCS-$(HAVE_SSSE3) += x86/vpx_subpixel_8t_intrin_ssse3.c
ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/vpx_high_subpixel_8t_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/vpx_high_subpixel_bilinear_sse2.asm
endif
ifeq ($(CONFIG_USE_X86INC),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/vpx_convolve_copy_sse2.asm
DSP_SRCS-$(ARCH_X86)$(ARCH_X86_64) += x86/aom_asm_stubs.c
DSP_SRCS-$(HAVE_SSE2) += x86/aom_subpixel_8t_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/aom_subpixel_bilinear_sse2.asm
DSP_SRCS-$(HAVE_SSSE3) += x86/aom_subpixel_8t_ssse3.asm
DSP_SRCS-$(HAVE_SSSE3) += x86/aom_subpixel_bilinear_ssse3.asm
DSP_SRCS-$(HAVE_AVX2) += x86/aom_subpixel_8t_intrin_avx2.c
DSP_SRCS-$(HAVE_SSSE3) += x86/aom_subpixel_8t_intrin_ssse3.c
ifeq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/aom_high_subpixel_8t_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/aom_high_subpixel_bilinear_sse2.asm
endif
DSP_SRCS-$(HAVE_SSE2) += x86/aom_convolve_copy_sse2.asm
ifeq ($(HAVE_NEON_ASM),yes)
DSP_SRCS-yes += arm/vpx_convolve_copy_neon_asm$(ASM)
DSP_SRCS-yes += arm/vpx_convolve8_avg_neon_asm$(ASM)
DSP_SRCS-yes += arm/vpx_convolve8_neon_asm$(ASM)
DSP_SRCS-yes += arm/vpx_convolve_avg_neon_asm$(ASM)
DSP_SRCS-yes += arm/vpx_convolve_neon.c
DSP_SRCS-yes += arm/aom_convolve_copy_neon_asm$(ASM)
DSP_SRCS-yes += arm/aom_convolve8_avg_neon_asm$(ASM)
DSP_SRCS-yes += arm/aom_convolve8_neon_asm$(ASM)
DSP_SRCS-yes += arm/aom_convolve_avg_neon_asm$(ASM)
DSP_SRCS-yes += arm/aom_convolve_neon.c
else
ifeq ($(HAVE_NEON),yes)
DSP_SRCS-yes += arm/vpx_convolve_copy_neon.c
DSP_SRCS-yes += arm/vpx_convolve8_avg_neon.c
DSP_SRCS-yes += arm/vpx_convolve8_neon.c
DSP_SRCS-yes += arm/vpx_convolve_avg_neon.c
DSP_SRCS-yes += arm/vpx_convolve_neon.c
DSP_SRCS-yes += arm/aom_convolve_copy_neon.c
DSP_SRCS-yes += arm/aom_convolve8_avg_neon.c
DSP_SRCS-yes += arm/aom_convolve8_neon.c
DSP_SRCS-yes += arm/aom_convolve_avg_neon.c
DSP_SRCS-yes += arm/aom_convolve_neon.c
endif # HAVE_NEON
endif # HAVE_NEON_ASM
# common (msa)
DSP_SRCS-$(HAVE_MSA) += mips/vpx_convolve8_avg_horiz_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/vpx_convolve8_avg_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/vpx_convolve8_avg_vert_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/vpx_convolve8_horiz_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/vpx_convolve8_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/vpx_convolve8_vert_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/vpx_convolve_avg_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/vpx_convolve_copy_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/vpx_convolve_msa.h
DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve8_avg_horiz_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve8_avg_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve8_avg_vert_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve8_horiz_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve8_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve8_vert_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve_avg_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve_copy_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/aom_convolve_msa.h
# common (dspr2)
DSP_SRCS-$(HAVE_DSPR2) += mips/convolve_common_dspr2.h
@ -161,15 +186,37 @@ DSP_SRCS-$(HAVE_DSPR2) += mips/loopfilter_mb_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/loopfilter_mb_horiz_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/loopfilter_mb_vert_dspr2.c
ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes)
ifeq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_loopfilter_sse2.c
endif # CONFIG_VP9_HIGHBITDEPTH
endif # CONFIG_AOM_HIGHBITDEPTH
DSP_SRCS-yes += txfm_common.h
DSP_SRCS-yes += x86/txfm_common_intrin.h
DSP_SRCS-$(HAVE_SSE2) += x86/txfm_common_sse2.h
DSP_SRCS-$(HAVE_MSA) += mips/txfm_macros_msa.h
# forward transform
ifneq ($(filter yes,$(CONFIG_VP9_ENCODER) $(CONFIG_VP10_ENCODER)),)
ifeq ($(CONFIG_AV1),yes)
DSP_SRCS-yes += fwd_txfm.c
DSP_SRCS-yes += fwd_txfm.h
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_txfm_sse2.h
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_txfm_sse2.c
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_dct32_8cols_sse2.c
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_txfm_impl_sse2.h
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_dct32x32_impl_sse2.h
ifeq ($(ARCH_X86_64),yes)
DSP_SRCS-$(HAVE_SSSE3) += x86/fwd_txfm_ssse3_x86_64.asm
endif
DSP_SRCS-$(HAVE_AVX2) += x86/fwd_txfm_avx2.h
DSP_SRCS-$(HAVE_AVX2) += x86/fwd_txfm_avx2.c
DSP_SRCS-$(HAVE_AVX2) += x86/txfm_common_avx2.h
DSP_SRCS-$(HAVE_AVX2) += x86/fwd_dct32x32_impl_avx2.h
DSP_SRCS-$(HAVE_NEON) += arm/fwd_txfm_neon.c
DSP_SRCS-$(HAVE_MSA) += mips/fwd_txfm_msa.h
DSP_SRCS-$(HAVE_MSA) += mips/fwd_txfm_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/fwd_dct32x32_msa.c
endif # CONFIG_AV1_ENCODER
ifeq ($(CONFIG_PVQ),yes)
DSP_SRCS-yes += fwd_txfm.c
DSP_SRCS-yes += fwd_txfm.h
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_txfm_sse2.h
@ -177,30 +224,26 @@ DSP_SRCS-$(HAVE_SSE2) += x86/fwd_txfm_sse2.c
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_txfm_impl_sse2.h
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_dct32x32_impl_sse2.h
ifeq ($(ARCH_X86_64),yes)
ifeq ($(CONFIG_USE_X86INC),yes)
DSP_SRCS-$(HAVE_SSSE3) += x86/fwd_txfm_ssse3_x86_64.asm
endif
endif
DSP_SRCS-$(HAVE_AVX2) += x86/fwd_txfm_avx2.c
DSP_SRCS-$(HAVE_AVX2) += x86/fwd_dct32x32_impl_avx2.h
DSP_SRCS-$(HAVE_NEON) += arm/fwd_txfm_neon.c
DSP_SRCS-$(HAVE_MSA) += mips/fwd_txfm_msa.h
DSP_SRCS-$(HAVE_MSA) += mips/fwd_txfm_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/fwd_dct32x32_msa.c
endif # CONFIG_VP9_ENCODER || CONFIG_VP10_ENCODER
endif # CONFIG_PVQ
# inverse transform
ifneq ($(filter yes,$(CONFIG_VP9) $(CONFIG_VP10)),)
ifeq ($(CONFIG_AV1), yes)
DSP_SRCS-yes += inv_txfm.h
DSP_SRCS-yes += inv_txfm.c
DSP_SRCS-$(HAVE_SSE2) += x86/inv_txfm_sse2.h
DSP_SRCS-$(HAVE_SSE2) += x86/inv_txfm_sse2.c
ifeq ($(CONFIG_USE_X86INC),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/inv_wht_sse2.asm
ifeq ($(ARCH_X86_64),yes)
DSP_SRCS-$(HAVE_SSSE3) += x86/inv_txfm_ssse3_x86_64.asm
endif # ARCH_X86_64
endif # CONFIG_USE_X86INC
ifeq ($(HAVE_NEON_ASM),yes)
DSP_SRCS-yes += arm/save_reg_neon$(ASM)
@ -232,31 +275,29 @@ DSP_SRCS-$(HAVE_MSA) += mips/idct8x8_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/idct16x16_msa.c
DSP_SRCS-$(HAVE_MSA) += mips/idct32x32_msa.c
ifneq ($(CONFIG_VP9_HIGHBITDEPTH),yes)
ifneq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_DSPR2) += mips/inv_txfm_dspr2.h
DSP_SRCS-$(HAVE_DSPR2) += mips/itrans4_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/itrans8_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/itrans16_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/itrans32_dspr2.c
DSP_SRCS-$(HAVE_DSPR2) += mips/itrans32_cols_dspr2.c
endif # CONFIG_VP9_HIGHBITDEPTH
endif # CONFIG_VP9 || CONFIG_VP10
endif # CONFIG_AOM_HIGHBITDEPTH
endif # CONFIG_AV1
# quantization
ifneq ($(filter yes, $(CONFIG_VP9_ENCODER) $(CONFIG_VP10_ENCODER)),)
ifneq ($(filter yes,$(CONFIG_AV1_ENCODER)),)
DSP_SRCS-yes += quantize.c
DSP_SRCS-yes += quantize.h
DSP_SRCS-$(HAVE_SSE2) += x86/quantize_sse2.c
ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes)
ifeq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_quantize_intrin_sse2.c
endif
ifeq ($(ARCH_X86_64),yes)
ifeq ($(CONFIG_USE_X86INC),yes)
DSP_SRCS-$(HAVE_SSSE3) += x86/quantize_ssse3_x86_64.asm
DSP_SRCS-$(HAVE_AVX) += x86/quantize_avx_x86_64.asm
endif
endif
# avg
DSP_SRCS-yes += avg.c
@ -265,12 +306,21 @@ DSP_SRCS-$(HAVE_NEON) += arm/avg_neon.c
DSP_SRCS-$(HAVE_MSA) += mips/avg_msa.c
DSP_SRCS-$(HAVE_NEON) += arm/hadamard_neon.c
ifeq ($(ARCH_X86_64),yes)
ifeq ($(CONFIG_USE_X86INC),yes)
DSP_SRCS-$(HAVE_SSSE3) += x86/avg_ssse3_x86_64.asm
endif
# high bit depth subtract
ifeq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_subtract_sse2.c
endif
endif # CONFIG_VP9_ENCODER || CONFIG_VP10_ENCODER
endif # CONFIG_AV1_ENCODER
ifeq ($(CONFIG_AV1_ENCODER),yes)
DSP_SRCS-yes += sum_squares.c
DSP_SRCS-$(HAVE_SSE2) += x86/sum_squares_sse2.c
endif # CONFIG_AV1_ENCODER
ifeq ($(CONFIG_ENCODERS),yes)
DSP_SRCS-yes += sad.c
@ -290,22 +340,31 @@ DSP_SRCS-$(HAVE_SSE4_1) += x86/sad_sse4.asm
DSP_SRCS-$(HAVE_AVX2) += x86/sad4d_avx2.c
DSP_SRCS-$(HAVE_AVX2) += x86/sad_avx2.c
ifeq ($(CONFIG_USE_X86INC),yes)
ifeq ($(CONFIG_AV1_ENCODER),yes)
ifeq ($(CONFIG_EXT_INTER),yes)
DSP_SRCS-$(HAVE_SSSE3) += x86/masked_sad_intrin_ssse3.c
DSP_SRCS-$(HAVE_SSSE3) += x86/masked_variance_intrin_ssse3.c
endif #CONFIG_EXT_INTER
ifeq ($(CONFIG_MOTION_VAR),yes)
DSP_SRCS-$(HAVE_SSE4_1) += x86/obmc_sad_sse4.c
DSP_SRCS-$(HAVE_SSE4_1) += x86/obmc_variance_sse4.c
endif #CONFIG_MOTION_VAR
endif #CONFIG_AV1_ENCODER
DSP_SRCS-$(HAVE_SSE) += x86/sad4d_sse2.asm
DSP_SRCS-$(HAVE_SSE) += x86/sad_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/sad4d_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/sad_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/subtract_sse2.asm
ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes)
ifeq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_sad4d_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_sad_sse2.asm
endif # CONFIG_VP9_HIGHBITDEPTH
endif # CONFIG_USE_X86INC
endif # CONFIG_AOM_HIGHBITDEPTH
endif # CONFIG_ENCODERS
ifneq ($(filter yes,$(CONFIG_ENCODERS) $(CONFIG_POSTPROC) $(CONFIG_VP9_POSTPROC)),)
ifneq ($(filter yes,$(CONFIG_ENCODERS)),)
DSP_SRCS-yes += variance.c
DSP_SRCS-yes += variance.h
@ -332,23 +391,36 @@ ifeq ($(ARCH_X86_64),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/ssim_opt_x86_64.asm
endif # ARCH_X86_64
ifeq ($(CONFIG_USE_X86INC),yes)
DSP_SRCS-$(HAVE_SSE) += x86/subpel_variance_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/subpel_variance_sse2.asm # Contains SSE2 and SSSE3
endif # CONFIG_USE_X86INC
ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes)
ifeq ($(CONFIG_AOM_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_variance_sse2.c
DSP_SRCS-$(HAVE_SSE4_1) += x86/highbd_variance_sse4.c
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_variance_impl_sse2.asm
ifeq ($(CONFIG_USE_X86INC),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_subpel_variance_impl_sse2.asm
endif # CONFIG_USE_X86INC
endif # CONFIG_VP9_HIGHBITDEPTH
endif # CONFIG_ENCODERS || CONFIG_POSTPROC || CONFIG_VP9_POSTPROC
endif # CONFIG_AOM_HIGHBITDEPTH
endif # CONFIG_ENCODERS
DSP_SRCS-no += $(DSP_SRCS_REMOVE-yes)
DSP_SRCS-yes += vpx_dsp_rtcd.c
DSP_SRCS-yes += vpx_dsp_rtcd_defs.pl
DSP_SRCS-yes += aom_dsp_rtcd.c
DSP_SRCS-yes += aom_dsp_rtcd_defs.pl
$(eval $(call rtcd_h_template,vpx_dsp_rtcd,vpx_dsp/vpx_dsp_rtcd_defs.pl))
DSP_SRCS-yes += aom_simd.c
DSP_SRCS-yes += aom_simd.h
DSP_SRCS-yes += aom_simd_inline.h
DSP_SRCS-yes += simd/v64_intrinsics.h
DSP_SRCS-yes += simd/v64_intrinsics_c.h
DSP_SRCS-yes += simd/v128_intrinsics.h
DSP_SRCS-yes += simd/v128_intrinsics_c.h
DSP_SRCS-yes += simd/v256_intrinsics.h
DSP_SRCS-yes += simd/v256_intrinsics_c.h
DSP_SRCS-$(HAVE_SSE2) += simd/v64_intrinsics_x86.h
DSP_SRCS-$(HAVE_SSE2) += simd/v128_intrinsics_x86.h
DSP_SRCS-$(HAVE_SSE2) += simd/v256_intrinsics_x86.h
DSP_SRCS-$(HAVE_NEON) += simd/v64_intrinsics_arm.h
DSP_SRCS-$(HAVE_NEON) += simd/v128_intrinsics_arm.h
DSP_SRCS-$(HAVE_NEON) += simd/v256_intrinsics_arm.h
$(eval $(call rtcd_h_template,aom_dsp_rtcd,aom_dsp/aom_dsp_rtcd_defs.pl))

102
aom_dsp/aom_dsp_common.h Normal file

@ -0,0 +1,102 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_AOM_DSP_COMMON_H_
#define AOM_DSP_AOM_DSP_COMMON_H_
#include "./aom_config.h"
#include "aom/aom_integer.h"
#include "aom_ports/mem.h"
#ifdef __cplusplus
extern "C" {
#endif
#ifndef MAX_SB_SIZE
#if CONFIG_AV1 && CONFIG_EXT_PARTITION
#define MAX_SB_SIZE 128
#else
#define MAX_SB_SIZE 64
#endif // CONFIG_AV1 && CONFIG_EXT_PARTITION
#endif // ndef MAX_SB_SIZE
#define AOMMIN(x, y) (((x) < (y)) ? (x) : (y))
#define AOMMAX(x, y) (((x) > (y)) ? (x) : (y))
#define IMPLIES(a, b) (!(a) || (b)) // Logical 'a implies b' (or 'a -> b')
#define IS_POWER_OF_TWO(x) (((x) & ((x)-1)) == 0)
// These can be used to give a hint about branch outcomes.
// This can have an effect, even if your target processor has a
// good branch predictor, as these hints can affect basic block
// ordering by the compiler.
#ifdef __GNUC__
#define LIKELY(v) __builtin_expect(v, 1)
#define UNLIKELY(v) __builtin_expect(v, 0)
#else
#define LIKELY(v) (v)
#define UNLIKELY(v) (v)
#endif
#define AOM_SWAP(type, a, b) \
do { \
type c = (b); \
b = a; \
a = c; \
} while (0)
#if CONFIG_AOM_QM
typedef uint16_t qm_val_t;
#define AOM_QM_BITS 6
#endif
#if CONFIG_AOM_HIGHBITDEPTH
// Note:
// tran_low_t is the datatype used for final transform coefficients.
// tran_high_t is the datatype used for intermediate transform stages.
typedef int64_t tran_high_t;
typedef int32_t tran_low_t;
#else
// Note:
// tran_low_t is the datatype used for final transform coefficients.
// tran_high_t is the datatype used for intermediate transform stages.
typedef int32_t tran_high_t;
typedef int16_t tran_low_t;
#endif // CONFIG_AOM_HIGHBITDEPTH
static INLINE uint8_t clip_pixel(int val) {
return (val > 255) ? 255 : (val < 0) ? 0 : val;
}
static INLINE int clamp(int value, int low, int high) {
return value < low ? low : (value > high ? high : value);
}
static INLINE double fclamp(double value, double low, double high) {
return value < low ? low : (value > high ? high : value);
}
#if CONFIG_AOM_HIGHBITDEPTH
static INLINE uint16_t clip_pixel_highbd(int val, int bd) {
switch (bd) {
case 8:
default: return (uint16_t)clamp(val, 0, 255);
case 10: return (uint16_t)clamp(val, 0, 1023);
case 12: return (uint16_t)clamp(val, 0, 4095);
}
}
#endif // CONFIG_AOM_HIGHBITDEPTH
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_AOM_DSP_COMMON_H_
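As an illustration of `clip_pixel_highbd()` above, the sketch below restates its switch and compares it with a generic clamp to `(1 << bd) - 1`. The names `clamp_int`, `clip_pixel_highbd_sketch` and `clip_pixel_bd_generic` are hypothetical and exist only for this example. The two forms agree for bit depths 8, 10 and 12; the switch additionally falls back to the 8-bit range for any other `bd`.

```c
#include <assert.h>
#include <stdint.h>

/* Same logic as clamp() in aom_dsp_common.h, renamed for this sketch. */
static int clamp_int(int value, int low, int high) {
  return value < low ? low : (value > high ? high : value);
}

/* Restatement of clip_pixel_highbd(): an unsupported bit depth is treated
 * as 8-bit because the default label shares the case 8 branch. */
static uint16_t clip_pixel_highbd_sketch(int val, int bd) {
  switch (bd) {
    case 8:
    default: return (uint16_t)clamp_int(val, 0, 255);
    case 10: return (uint16_t)clamp_int(val, 0, 1023);
    case 12: return (uint16_t)clamp_int(val, 0, 4095);
  }
}

/* Hypothetical alternative: derive the upper bound from the bit depth.
 * Only meaningful for the depths the original function names. */
static uint16_t clip_pixel_bd_generic(int val, int bd) {
  assert(bd == 8 || bd == 10 || bd == 12);
  return (uint16_t)clamp_int(val, 0, (1 << bd) - 1);
}
```

The switch form keeps the bounds as compile-time constants and defines a result for unexpected depths, which may be why the source spells out each case.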

16
aom_dsp/aom_dsp_rtcd.c Normal file

@ -0,0 +1,16 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include "./aom_config.h"
#define RTCD_C
#include "./aom_dsp_rtcd.h"
#include "aom_ports/aom_once.h"
void aom_dsp_rtcd() { once(setup_rtcd_internal); }

1950
aom_dsp/aom_dsp_rtcd_defs.pl Normal file

File diff suppressed because it is too large

43
aom_dsp/aom_filter.h Normal file

@ -0,0 +1,43 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_AOM_FILTER_H_
#define AOM_DSP_AOM_FILTER_H_
#include "aom/aom_integer.h"
#ifdef __cplusplus
extern "C" {
#endif
#define FILTER_BITS 7
#define SUBPEL_BITS 4
#define SUBPEL_MASK ((1 << SUBPEL_BITS) - 1)
#define SUBPEL_SHIFTS (1 << SUBPEL_BITS)
#define SUBPEL_TAPS 8
typedef int16_t InterpKernel[SUBPEL_TAPS];
#define BIL_SUBPEL_BITS 3
#define BIL_SUBPEL_SHIFTS (1 << BIL_SUBPEL_BITS)
// 2 tap bilinear filters
static const uint8_t bilinear_filters_2t[BIL_SUBPEL_SHIFTS][2] = {
{ 128, 0 }, { 112, 16 }, { 96, 32 }, { 80, 48 },
{ 64, 64 }, { 48, 80 }, { 32, 96 }, { 16, 112 },
};
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_AOM_FILTER_H_
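Each row of `bilinear_filters_2t` sums to 128, that is `1 << FILTER_BITS`, so a weighted sum of two pixels can be scaled back with a rounding shift by `FILTER_BITS`. A minimal sketch of that use follows. The helper `bilinear_interp` is hypothetical and only illustrates how the table is presumably consumed. The table and macros are copied from the header so the block compiles on its own.

```c
#include <stdint.h>

#define FILTER_BITS 7
#define BIL_SUBPEL_BITS 3
#define BIL_SUBPEL_SHIFTS (1 << BIL_SUBPEL_BITS)

/* Copied from aom_filter.h: each pair of weights sums to 128. */
static const uint8_t bilinear_filters_2t[BIL_SUBPEL_SHIFTS][2] = {
  { 128, 0 }, { 112, 16 }, { 96, 32 }, { 80, 48 },
  { 64, 64 }, { 48, 80 },  { 32, 96 }, { 16, 112 },
};

/* Hypothetical helper: blend pixels a and b at a 1/8-pel offset (0..7). */
static uint8_t bilinear_interp(uint8_t a, uint8_t b, int offset) {
  const uint8_t *f = bilinear_filters_2t[offset & (BIL_SUBPEL_SHIFTS - 1)];
  const int sum = a * f[0] + b * f[1];
  /* Add half of the divisor before shifting to round to nearest. */
  return (uint8_t)((sum + (1 << (FILTER_BITS - 1))) >> FILTER_BITS);
}
```

Because the weights are non-negative and sum to 128, the result always stays within the range of the two inputs, so no extra clamp is needed in this sketch.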

13
aom_dsp/aom_simd.c Normal file
@ -0,0 +1,13 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
// Set to 1 to add some sanity checks in the fallback C code
const int simd_check = 1;

32
aom_dsp/aom_simd.h Normal file
@ -0,0 +1,32 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_AOM_AOM_SIMD_H_
#define AOM_DSP_AOM_AOM_SIMD_H_
#include <stdint.h>
#if defined(_WIN32)
#include <intrin.h>
#endif
#include "./aom_config.h"
#include "./aom_simd_inline.h"
#if HAVE_NEON
#include "simd/v256_intrinsics_arm.h"
#elif HAVE_SSE2
#include "simd/v256_intrinsics_x86.h"
#else
#include "simd/v256_intrinsics.h"
#endif
#endif // AOM_DSP_AOM_AOM_SIMD_H_

21
aom_dsp/aom_simd_inline.h Normal file
@ -0,0 +1,21 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_AOM_SIMD_INLINE_H_
#define AOM_DSP_AOM_SIMD_INLINE_H_
#include "aom/aom_integer.h"
#ifndef SIMD_INLINE
#define SIMD_INLINE static AOM_FORCE_INLINE
#endif
#endif // AOM_DSP_AOM_SIMD_INLINE_H_

@ -1,30 +1,26 @@
/*
* Copyright (c) 2014 The WebM project authors. All Rights Reserved.
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include <assert.h>
#include "./vpx_config.h"
#include "./vpx_dsp_rtcd.h"
#include "vpx/vpx_integer.h"
#include "vpx_ports/mem.h"
#include "./aom_config.h"
#include "./aom_dsp_rtcd.h"
#include "aom/aom_integer.h"
#include "aom_ports/mem.h"
static INLINE int32x4_t MULTIPLY_BY_Q0(
int16x4_t dsrc0,
int16x4_t dsrc1,
int16x4_t dsrc2,
int16x4_t dsrc3,
int16x4_t dsrc4,
int16x4_t dsrc5,
int16x4_t dsrc6,
int16x4_t dsrc7,
static INLINE int32x4_t MULTIPLY_BY_Q0(int16x4_t dsrc0, int16x4_t dsrc1,
int16x4_t dsrc2, int16x4_t dsrc3,
int16x4_t dsrc4, int16x4_t dsrc5,
int16x4_t dsrc6, int16x4_t dsrc7,
int16x8_t q0s16) {
int32x4_t qdst;
int16x4_t d0s16, d1s16;
@ -43,17 +39,12 @@ static INLINE int32x4_t MULTIPLY_BY_Q0(
return qdst;
}
void vpx_convolve8_avg_horiz_neon(
const uint8_t *src,
ptrdiff_t src_stride,
uint8_t *dst,
ptrdiff_t dst_stride,
const int16_t *filter_x,
int x_step_q4,
void aom_convolve8_avg_horiz_neon(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, // unused
int y_step_q4, // unused
int w,
int h) {
int w, int h) {
int width;
const uint8_t *s;
uint8_t *d;
@ -74,6 +65,10 @@ void vpx_convolve8_avg_horiz_neon(
assert(x_step_q4 == 16);
(void)x_step_q4;
(void)y_step_q4;
(void)filter_y;
q0s16 = vld1q_s16(filter_x);
src -= 3; // adjust for taps
@ -90,8 +85,8 @@ void vpx_convolve8_avg_horiz_neon(
q12u8 = vcombine_u8(d24u8, d25u8);
q13u8 = vcombine_u8(d26u8, d27u8);
q0x2u16 = vtrnq_u16(vreinterpretq_u16_u8(q12u8),
vreinterpretq_u16_u8(q13u8));
q0x2u16 =
vtrnq_u16(vreinterpretq_u16_u8(q12u8), vreinterpretq_u16_u8(q13u8));
d24u8 = vreinterpret_u8_u16(vget_low_u16(q0x2u16.val[0]));
d25u8 = vreinterpret_u8_u16(vget_high_u16(q0x2u16.val[0]));
d26u8 = vreinterpret_u8_u16(vget_low_u16(q0x2u16.val[1]));
@ -117,9 +112,7 @@ void vpx_convolve8_avg_horiz_neon(
d20s16 = vreinterpret_s16_u16(vget_low_u16(q10u16));
d23s16 = vreinterpret_s16_u16(vget_high_u16(q10u16)); // vmov 23 21
for (width = w;
width > 0;
width -= 4, src += 4, dst += 4) { // loop_horiz
for (width = w; width > 0; width -= 4, src += 4, dst += 4) { // loop_horiz
s = src;
d28u32 = vld1_dup_u32((const uint32_t *)s);
s += src_stride;
@ -131,10 +124,10 @@ void vpx_convolve8_avg_horiz_neon(
__builtin_prefetch(src + 64);
d0x2u16 = vtrn_u16(vreinterpret_u16_u32(d28u32),
vreinterpret_u16_u32(d31u32));
d1x2u16 = vtrn_u16(vreinterpret_u16_u32(d29u32),
vreinterpret_u16_u32(d30u32));
d0x2u16 =
vtrn_u16(vreinterpret_u16_u32(d28u32), vreinterpret_u16_u32(d31u32));
d1x2u16 =
vtrn_u16(vreinterpret_u16_u32(d29u32), vreinterpret_u16_u32(d30u32));
d0x2u8 = vtrn_u8(vreinterpret_u8_u16(d0x2u16.val[0]), // d28
vreinterpret_u8_u16(d1x2u16.val[0])); // d29
d1x2u8 = vtrn_u8(vreinterpret_u8_u16(d0x2u16.val[1]), // d31
@ -144,8 +137,8 @@ void vpx_convolve8_avg_horiz_neon(
q14u8 = vcombine_u8(d0x2u8.val[0], d0x2u8.val[1]);
q15u8 = vcombine_u8(d1x2u8.val[1], d1x2u8.val[0]);
q0x2u32 = vtrnq_u32(vreinterpretq_u32_u8(q14u8),
vreinterpretq_u32_u8(q15u8));
q0x2u32 =
vtrnq_u32(vreinterpretq_u32_u8(q14u8), vreinterpretq_u32_u8(q15u8));
d28u8 = vreinterpret_u8_u32(vget_low_u32(q0x2u32.val[0]));
d29u8 = vreinterpret_u8_u32(vget_high_u32(q0x2u32.val[0]));
@ -173,14 +166,14 @@ void vpx_convolve8_avg_horiz_neon(
d26s16 = vreinterpret_s16_u16(vget_low_u16(q13u16));
d27s16 = vreinterpret_s16_u16(vget_high_u16(q13u16));
q1s32 = MULTIPLY_BY_Q0(d16s16, d17s16, d20s16, d22s16,
d18s16, d19s16, d23s16, d24s16, q0s16);
q2s32 = MULTIPLY_BY_Q0(d17s16, d20s16, d22s16, d18s16,
d19s16, d23s16, d24s16, d26s16, q0s16);
q14s32 = MULTIPLY_BY_Q0(d20s16, d22s16, d18s16, d19s16,
d23s16, d24s16, d26s16, d27s16, q0s16);
q15s32 = MULTIPLY_BY_Q0(d22s16, d18s16, d19s16, d23s16,
d24s16, d26s16, d27s16, d25s16, q0s16);
q1s32 = MULTIPLY_BY_Q0(d16s16, d17s16, d20s16, d22s16, d18s16, d19s16,
d23s16, d24s16, q0s16);
q2s32 = MULTIPLY_BY_Q0(d17s16, d20s16, d22s16, d18s16, d19s16, d23s16,
d24s16, d26s16, q0s16);
q14s32 = MULTIPLY_BY_Q0(d20s16, d22s16, d18s16, d19s16, d23s16, d24s16,
d26s16, d27s16, q0s16);
q15s32 = MULTIPLY_BY_Q0(d22s16, d18s16, d19s16, d23s16, d24s16, d26s16,
d27s16, d25s16, q0s16);
__builtin_prefetch(src + 64 + src_stride * 3);
@ -195,8 +188,7 @@ void vpx_convolve8_avg_horiz_neon(
d2u8 = vqmovn_u16(q1u16);
d3u8 = vqmovn_u16(q2u16);
d0x2u16 = vtrn_u16(vreinterpret_u16_u8(d2u8),
vreinterpret_u16_u8(d3u8));
d0x2u16 = vtrn_u16(vreinterpret_u16_u8(d2u8), vreinterpret_u16_u8(d3u8));
d0x2u32 = vtrn_u32(vreinterpret_u32_u16(d0x2u16.val[0]),
vreinterpret_u32_u16(d0x2u16.val[1]));
d0x2u8 = vtrn_u8(vreinterpret_u8_u32(d0x2u32.val[0]),
@ -231,16 +223,11 @@ void vpx_convolve8_avg_horiz_neon(
return;
}
void vpx_convolve8_avg_vert_neon(
const uint8_t *src,
ptrdiff_t src_stride,
uint8_t *dst,
ptrdiff_t dst_stride,
void aom_convolve8_avg_vert_neon(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, // unused
int x_step_q4, // unused
const int16_t *filter_y,
int y_step_q4,
int w,
const int16_t *filter_y, int y_step_q4, int w,
int h) {
int height;
const uint8_t *s;
@ -258,6 +245,10 @@ void vpx_convolve8_avg_vert_neon(
assert(y_step_q4 == 16);
(void)x_step_q4;
(void)y_step_q4;
(void)filter_x;
src -= src_stride * 3;
q0s16 = vld1q_s16(filter_y);
for (; w > 0; w -= 4, src += 4, dst += 4) { // loop_vert_h
@ -319,20 +310,20 @@ void vpx_convolve8_avg_vert_neon(
__builtin_prefetch(s);
__builtin_prefetch(s + src_stride);
q1s32 = MULTIPLY_BY_Q0(d16s16, d17s16, d18s16, d19s16,
d20s16, d21s16, d22s16, d24s16, q0s16);
q1s32 = MULTIPLY_BY_Q0(d16s16, d17s16, d18s16, d19s16, d20s16, d21s16,
d22s16, d24s16, q0s16);
__builtin_prefetch(s + src_stride * 2);
__builtin_prefetch(s + src_stride * 3);
q2s32 = MULTIPLY_BY_Q0(d17s16, d18s16, d19s16, d20s16,
d21s16, d22s16, d24s16, d26s16, q0s16);
q2s32 = MULTIPLY_BY_Q0(d17s16, d18s16, d19s16, d20s16, d21s16, d22s16,
d24s16, d26s16, q0s16);
__builtin_prefetch(d);
__builtin_prefetch(d + dst_stride);
q14s32 = MULTIPLY_BY_Q0(d18s16, d19s16, d20s16, d21s16,
d22s16, d24s16, d26s16, d27s16, q0s16);
q14s32 = MULTIPLY_BY_Q0(d18s16, d19s16, d20s16, d21s16, d22s16, d24s16,
d26s16, d27s16, q0s16);
__builtin_prefetch(d + dst_stride * 2);
__builtin_prefetch(d + dst_stride * 3);
q15s32 = MULTIPLY_BY_Q0(d19s16, d20s16, d21s16, d22s16,
d24s16, d26s16, d27s16, d25s16, q0s16);
q15s32 = MULTIPLY_BY_Q0(d19s16, d20s16, d21s16, d22s16, d24s16, d26s16,
d27s16, d25s16, q0s16);
d2u16 = vqrshrun_n_s32(q1s32, 7);
d3u16 = vqrshrun_n_s32(q2s32, 7);

@ -1,11 +1,14 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
;
@ -14,11 +17,11 @@
; w%4 == 0
; h%4 == 0
; taps == 8
; VP9_FILTER_WEIGHT == 128
; VP9_FILTER_SHIFT == 7
; AV1_FILTER_WEIGHT == 128
; AV1_FILTER_SHIFT == 7
EXPORT |vpx_convolve8_avg_horiz_neon|
EXPORT |vpx_convolve8_avg_vert_neon|
EXPORT |aom_convolve8_avg_horiz_neon|
EXPORT |aom_convolve8_avg_vert_neon|
ARM
REQUIRE8
PRESERVE8
@ -49,7 +52,7 @@
; sp[]int w
; sp[]int h
|vpx_convolve8_avg_horiz_neon| PROC
|aom_convolve8_avg_horiz_neon| PROC
push {r4-r10, lr}
sub r0, r0, #3 ; adjust for taps
@ -72,7 +75,7 @@
mov r10, r6 ; w loop counter
vpx_convolve8_avg_loop_horiz_v
aom_convolve8_avg_loop_horiz_v
vld1.8 {d24}, [r0], r1
vld1.8 {d25}, [r0], r1
vld1.8 {d26}, [r0], r1
@ -95,7 +98,7 @@ vpx_convolve8_avg_loop_horiz_v
add r0, r0, #3
vpx_convolve8_avg_loop_horiz
aom_convolve8_avg_loop_horiz
add r5, r0, #64
vld1.32 {d28[]}, [r0], r1
@ -164,20 +167,20 @@ vpx_convolve8_avg_loop_horiz
vmov q9, q13
subs r6, r6, #4 ; w -= 4
bgt vpx_convolve8_avg_loop_horiz
bgt aom_convolve8_avg_loop_horiz
; outer loop
mov r6, r10 ; restore w counter
add r0, r0, r9 ; src += src_stride * 4 - w
add r2, r2, r12 ; dst += dst_stride * 4 - w
subs r7, r7, #4 ; h -= 4
bgt vpx_convolve8_avg_loop_horiz_v
bgt aom_convolve8_avg_loop_horiz_v
pop {r4-r10, pc}
ENDP
|vpx_convolve8_avg_vert_neon| PROC
|aom_convolve8_avg_vert_neon| PROC
push {r4-r8, lr}
; adjust for taps
@ -193,7 +196,7 @@ vpx_convolve8_avg_loop_horiz
lsl r1, r1, #1
lsl r3, r3, #1
vpx_convolve8_avg_loop_vert_h
aom_convolve8_avg_loop_vert_h
mov r4, r0
add r7, r0, r1, asr #1
mov r5, r2
@ -213,7 +216,7 @@ vpx_convolve8_avg_loop_vert_h
vmovl.u8 q10, d20
vmovl.u8 q11, d22
vpx_convolve8_avg_loop_vert
aom_convolve8_avg_loop_vert
; always process a 4x4 block at a time
vld1.u32 {d24[0]}, [r7], r1
vld1.u32 {d26[0]}, [r4], r1
@ -278,13 +281,13 @@ vpx_convolve8_avg_loop_vert
vmov d22, d25
subs r12, r12, #4 ; h -= 4
bgt vpx_convolve8_avg_loop_vert
bgt aom_convolve8_avg_loop_vert
; outer loop
add r0, r0, #4
add r2, r2, #4
subs r6, r6, #4 ; w -= 4
bgt vpx_convolve8_avg_loop_vert_h
bgt aom_convolve8_avg_loop_vert_h
pop {r4-r8, pc}

@ -1,30 +1,26 @@
/*
* Copyright (c) 2014 The WebM project authors. All Rights Reserved.
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include <assert.h>
#include "./vpx_config.h"
#include "./vpx_dsp_rtcd.h"
#include "vpx/vpx_integer.h"
#include "vpx_ports/mem.h"
#include "./aom_config.h"
#include "./aom_dsp_rtcd.h"
#include "aom/aom_integer.h"
#include "aom_ports/mem.h"
static INLINE int32x4_t MULTIPLY_BY_Q0(
int16x4_t dsrc0,
int16x4_t dsrc1,
int16x4_t dsrc2,
int16x4_t dsrc3,
int16x4_t dsrc4,
int16x4_t dsrc5,
int16x4_t dsrc6,
int16x4_t dsrc7,
static INLINE int32x4_t MULTIPLY_BY_Q0(int16x4_t dsrc0, int16x4_t dsrc1,
int16x4_t dsrc2, int16x4_t dsrc3,
int16x4_t dsrc4, int16x4_t dsrc5,
int16x4_t dsrc6, int16x4_t dsrc7,
int16x8_t q0s16) {
int32x4_t qdst;
int16x4_t d0s16, d1s16;
@ -43,17 +39,12 @@ static INLINE int32x4_t MULTIPLY_BY_Q0(
return qdst;
}
void vpx_convolve8_horiz_neon(
const uint8_t *src,
ptrdiff_t src_stride,
uint8_t *dst,
ptrdiff_t dst_stride,
const int16_t *filter_x,
int x_step_q4,
void aom_convolve8_horiz_neon(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, // unused
int y_step_q4, // unused
int w,
int h) {
int w, int h) {
int width;
const uint8_t *s, *psrc;
uint8_t *d, *pdst;
@ -74,11 +65,14 @@ void vpx_convolve8_horiz_neon(
assert(x_step_q4 == 16);
(void)x_step_q4;
(void)y_step_q4;
(void)filter_y;
q0s16 = vld1q_s16(filter_x);
src -= 3; // adjust for taps
for (; h > 0; h -= 4,
src += src_stride * 4,
for (; h > 0; h -= 4, src += src_stride * 4,
dst += dst_stride * 4) { // loop_horiz_v
s = src;
d24u8 = vld1_u8(s);
@ -92,8 +86,8 @@ void vpx_convolve8_horiz_neon(
q12u8 = vcombine_u8(d24u8, d25u8);
q13u8 = vcombine_u8(d26u8, d27u8);
q0x2u16 = vtrnq_u16(vreinterpretq_u16_u8(q12u8),
vreinterpretq_u16_u8(q13u8));
q0x2u16 =
vtrnq_u16(vreinterpretq_u16_u8(q12u8), vreinterpretq_u16_u8(q13u8));
d24u8 = vreinterpret_u8_u16(vget_low_u16(q0x2u16.val[0]));
d25u8 = vreinterpret_u8_u16(vget_high_u16(q0x2u16.val[0]));
d26u8 = vreinterpret_u8_u16(vget_low_u16(q0x2u16.val[1]));
@ -119,8 +113,7 @@ void vpx_convolve8_horiz_neon(
d20s16 = vreinterpret_s16_u16(vget_low_u16(q10u16));
d23s16 = vreinterpret_s16_u16(vget_high_u16(q10u16)); // vmov 23 21
for (width = w, psrc = src + 7, pdst = dst;
width > 0;
for (width = w, psrc = src + 7, pdst = dst; width > 0;
width -= 4, psrc += 4, pdst += 4) { // loop_horiz
s = psrc;
d28u32 = vld1_dup_u32((const uint32_t *)s);
@ -133,10 +126,10 @@ void vpx_convolve8_horiz_neon(
__builtin_prefetch(psrc + 64);
d0x2u16 = vtrn_u16(vreinterpret_u16_u32(d28u32),
vreinterpret_u16_u32(d31u32));
d1x2u16 = vtrn_u16(vreinterpret_u16_u32(d29u32),
vreinterpret_u16_u32(d30u32));
d0x2u16 =
vtrn_u16(vreinterpret_u16_u32(d28u32), vreinterpret_u16_u32(d31u32));
d1x2u16 =
vtrn_u16(vreinterpret_u16_u32(d29u32), vreinterpret_u16_u32(d30u32));
d0x2u8 = vtrn_u8(vreinterpret_u8_u16(d0x2u16.val[0]), // d28
vreinterpret_u8_u16(d1x2u16.val[0])); // d29
d1x2u8 = vtrn_u8(vreinterpret_u8_u16(d0x2u16.val[1]), // d31
@ -146,8 +139,8 @@ void vpx_convolve8_horiz_neon(
q14u8 = vcombine_u8(d0x2u8.val[0], d0x2u8.val[1]);
q15u8 = vcombine_u8(d1x2u8.val[1], d1x2u8.val[0]);
q0x2u32 = vtrnq_u32(vreinterpretq_u32_u8(q14u8),
vreinterpretq_u32_u8(q15u8));
q0x2u32 =
vtrnq_u32(vreinterpretq_u32_u8(q14u8), vreinterpretq_u32_u8(q15u8));
d28u8 = vreinterpret_u8_u32(vget_low_u32(q0x2u32.val[0]));
d29u8 = vreinterpret_u8_u32(vget_high_u32(q0x2u32.val[0]));
@ -166,14 +159,14 @@ void vpx_convolve8_horiz_neon(
d26s16 = vreinterpret_s16_u16(vget_low_u16(q13u16));
d27s16 = vreinterpret_s16_u16(vget_high_u16(q13u16));
q1s32 = MULTIPLY_BY_Q0(d16s16, d17s16, d20s16, d22s16,
d18s16, d19s16, d23s16, d24s16, q0s16);
q2s32 = MULTIPLY_BY_Q0(d17s16, d20s16, d22s16, d18s16,
d19s16, d23s16, d24s16, d26s16, q0s16);
q14s32 = MULTIPLY_BY_Q0(d20s16, d22s16, d18s16, d19s16,
d23s16, d24s16, d26s16, d27s16, q0s16);
q15s32 = MULTIPLY_BY_Q0(d22s16, d18s16, d19s16, d23s16,
d24s16, d26s16, d27s16, d25s16, q0s16);
q1s32 = MULTIPLY_BY_Q0(d16s16, d17s16, d20s16, d22s16, d18s16, d19s16,
d23s16, d24s16, q0s16);
q2s32 = MULTIPLY_BY_Q0(d17s16, d20s16, d22s16, d18s16, d19s16, d23s16,
d24s16, d26s16, q0s16);
q14s32 = MULTIPLY_BY_Q0(d20s16, d22s16, d18s16, d19s16, d23s16, d24s16,
d26s16, d27s16, q0s16);
q15s32 = MULTIPLY_BY_Q0(d22s16, d18s16, d19s16, d23s16, d24s16, d26s16,
d27s16, d25s16, q0s16);
__builtin_prefetch(psrc + 60 + src_stride * 3);
@ -188,8 +181,7 @@ void vpx_convolve8_horiz_neon(
d2u8 = vqmovn_u16(q1u16);
d3u8 = vqmovn_u16(q2u16);
d0x2u16 = vtrn_u16(vreinterpret_u16_u8(d2u8),
vreinterpret_u16_u8(d3u8));
d0x2u16 = vtrn_u16(vreinterpret_u16_u8(d2u8), vreinterpret_u16_u8(d3u8));
d0x2u32 = vtrn_u32(vreinterpret_u32_u16(d0x2u16.val[0]),
vreinterpret_u32_u16(d0x2u16.val[1]));
d0x2u8 = vtrn_u8(vreinterpret_u8_u32(d0x2u32.val[0]),
@ -217,16 +209,11 @@ void vpx_convolve8_horiz_neon(
return;
}
void vpx_convolve8_vert_neon(
const uint8_t *src,
ptrdiff_t src_stride,
uint8_t *dst,
ptrdiff_t dst_stride,
void aom_convolve8_vert_neon(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, // unused
int x_step_q4, // unused
const int16_t *filter_y,
int y_step_q4,
int w,
const int16_t *filter_y, int y_step_q4, int w,
int h) {
int height;
const uint8_t *s;
@ -242,6 +229,10 @@ void vpx_convolve8_vert_neon(
assert(y_step_q4 == 16);
(void)x_step_q4;
(void)y_step_q4;
(void)filter_x;
src -= src_stride * 3;
q0s16 = vld1q_s16(filter_y);
for (; w > 0; w -= 4, src += 4, dst += 4) { // loop_vert_h
@ -294,20 +285,20 @@ void vpx_convolve8_vert_neon(
__builtin_prefetch(d);
__builtin_prefetch(d + dst_stride);
q1s32 = MULTIPLY_BY_Q0(d16s16, d17s16, d18s16, d19s16,
d20s16, d21s16, d22s16, d24s16, q0s16);
q1s32 = MULTIPLY_BY_Q0(d16s16, d17s16, d18s16, d19s16, d20s16, d21s16,
d22s16, d24s16, q0s16);
__builtin_prefetch(d + dst_stride * 2);
__builtin_prefetch(d + dst_stride * 3);
q2s32 = MULTIPLY_BY_Q0(d17s16, d18s16, d19s16, d20s16,
d21s16, d22s16, d24s16, d26s16, q0s16);
q2s32 = MULTIPLY_BY_Q0(d17s16, d18s16, d19s16, d20s16, d21s16, d22s16,
d24s16, d26s16, q0s16);
__builtin_prefetch(s);
__builtin_prefetch(s + src_stride);
q14s32 = MULTIPLY_BY_Q0(d18s16, d19s16, d20s16, d21s16,
d22s16, d24s16, d26s16, d27s16, q0s16);
q14s32 = MULTIPLY_BY_Q0(d18s16, d19s16, d20s16, d21s16, d22s16, d24s16,
d26s16, d27s16, q0s16);
__builtin_prefetch(s + src_stride * 2);
__builtin_prefetch(s + src_stride * 3);
q15s32 = MULTIPLY_BY_Q0(d19s16, d20s16, d21s16, d22s16,
d24s16, d26s16, d27s16, d25s16, q0s16);
q15s32 = MULTIPLY_BY_Q0(d19s16, d20s16, d21s16, d22s16, d24s16, d26s16,
d27s16, d25s16, q0s16);
d2u16 = vqrshrun_n_s32(q1s32, 7);
d3u16 = vqrshrun_n_s32(q2s32, 7);

@ -1,11 +1,14 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
;
@ -14,11 +17,11 @@
; w%4 == 0
; h%4 == 0
; taps == 8
; VP9_FILTER_WEIGHT == 128
; VP9_FILTER_SHIFT == 7
; AV1_FILTER_WEIGHT == 128
; AV1_FILTER_SHIFT == 7
EXPORT |vpx_convolve8_horiz_neon|
EXPORT |vpx_convolve8_vert_neon|
EXPORT |aom_convolve8_horiz_neon|
EXPORT |aom_convolve8_vert_neon|
ARM
REQUIRE8
PRESERVE8
@ -49,7 +52,7 @@
; sp[]int w
; sp[]int h
|vpx_convolve8_horiz_neon| PROC
|aom_convolve8_horiz_neon| PROC
push {r4-r10, lr}
sub r0, r0, #3 ; adjust for taps
@ -72,7 +75,7 @@
mov r10, r6 ; w loop counter
vpx_convolve8_loop_horiz_v
aom_convolve8_loop_horiz_v
vld1.8 {d24}, [r0], r1
vld1.8 {d25}, [r0], r1
vld1.8 {d26}, [r0], r1
@ -95,7 +98,7 @@ vpx_convolve8_loop_horiz_v
add r0, r0, #3
vpx_convolve8_loop_horiz
aom_convolve8_loop_horiz
add r5, r0, #64
vld1.32 {d28[]}, [r0], r1
@ -153,20 +156,20 @@ vpx_convolve8_loop_horiz
vmov q9, q13
subs r6, r6, #4 ; w -= 4
bgt vpx_convolve8_loop_horiz
bgt aom_convolve8_loop_horiz
; outer loop
mov r6, r10 ; restore w counter
add r0, r0, r9 ; src += src_stride * 4 - w
add r2, r2, r12 ; dst += dst_stride * 4 - w
subs r7, r7, #4 ; h -= 4
bgt vpx_convolve8_loop_horiz_v
bgt aom_convolve8_loop_horiz_v
pop {r4-r10, pc}
ENDP
|vpx_convolve8_vert_neon| PROC
|aom_convolve8_vert_neon| PROC
push {r4-r8, lr}
; adjust for taps
@ -182,7 +185,7 @@ vpx_convolve8_loop_horiz
lsl r1, r1, #1
lsl r3, r3, #1
vpx_convolve8_loop_vert_h
aom_convolve8_loop_vert_h
mov r4, r0
add r7, r0, r1, asr #1
mov r5, r2
@ -202,7 +205,7 @@ vpx_convolve8_loop_vert_h
vmovl.u8 q10, d20
vmovl.u8 q11, d22
vpx_convolve8_loop_vert
aom_convolve8_loop_vert
; always process a 4x4 block at a time
vld1.u32 {d24[0]}, [r7], r1
vld1.u32 {d26[0]}, [r4], r1
@ -256,13 +259,13 @@ vpx_convolve8_loop_vert
vmov d22, d25
subs r12, r12, #4 ; h -= 4
bgt vpx_convolve8_loop_vert
bgt aom_convolve8_loop_vert
; outer loop
add r0, r0, #4
add r2, r2, #4
subs r6, r6, #4 ; w -= 4
bgt vpx_convolve8_loop_vert_h
bgt aom_convolve8_loop_vert_h
pop {r4-r8, pc}

@ -1,35 +1,34 @@
/*
* Copyright (c) 2014 The WebM project authors. All Rights Reserved.
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./vpx_dsp_rtcd.h"
#include "vpx/vpx_integer.h"
#include "./aom_dsp_rtcd.h"
#include "aom/aom_integer.h"
void vpx_convolve_avg_neon(
const uint8_t *src, // r0
void aom_convolve_avg_neon(const uint8_t *src, // r0
ptrdiff_t src_stride, // r1
uint8_t *dst, // r2
ptrdiff_t dst_stride, // r3
const int16_t *filter_x,
int filter_x_stride,
const int16_t *filter_y,
int filter_y_stride,
int w,
const int16_t *filter_x, int filter_x_stride,
const int16_t *filter_y, int filter_y_stride, int w,
int h) {
uint8_t *d;
uint8x8_t d0u8, d1u8, d2u8, d3u8;
uint32x2_t d0u32, d2u32;
uint8x16_t q0u8, q1u8, q2u8, q3u8, q8u8, q9u8, q10u8, q11u8;
(void)filter_x; (void)filter_x_stride;
(void)filter_y; (void)filter_y_stride;
(void)filter_x;
(void)filter_x_stride;
(void)filter_y;
(void)filter_y_stride;
d = dst;
if (w > 32) { // avg64
@ -133,8 +132,7 @@ void vpx_convolve_avg_neon(
d2u32 = vld1_lane_u32((const uint32_t *)d, d2u32, 1);
d += dst_stride;
d0u8 = vrhadd_u8(vreinterpret_u8_u32(d0u32),
vreinterpret_u8_u32(d2u32));
d0u8 = vrhadd_u8(vreinterpret_u8_u32(d0u32), vreinterpret_u8_u32(d2u32));
d0u32 = vreinterpret_u32_u8(d0u8);
vst1_lane_u32((uint32_t *)dst, d0u32, 0);

@ -1,21 +1,24 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
EXPORT |vpx_convolve_avg_neon|
;
EXPORT |aom_convolve_avg_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
|vpx_convolve_avg_neon| PROC
|aom_convolve_avg_neon| PROC
push {r4-r6, lr}
ldrd r4, r5, [sp, #32]
mov r6, r2

@ -1,33 +1,32 @@
/*
* Copyright (c) 2014 The WebM project authors. All Rights Reserved.
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./vpx_dsp_rtcd.h"
#include "vpx/vpx_integer.h"
#include "./aom_dsp_rtcd.h"
#include "aom/aom_integer.h"
void vpx_convolve_copy_neon(
const uint8_t *src, // r0
void aom_convolve_copy_neon(const uint8_t *src, // r0
ptrdiff_t src_stride, // r1
uint8_t *dst, // r2
ptrdiff_t dst_stride, // r3
const int16_t *filter_x,
int filter_x_stride,
const int16_t *filter_y,
int filter_y_stride,
int w,
const int16_t *filter_x, int filter_x_stride,
const int16_t *filter_y, int filter_y_stride, int w,
int h) {
uint8x8_t d0u8, d2u8;
uint8x16_t q0u8, q1u8, q2u8, q3u8;
(void)filter_x; (void)filter_x_stride;
(void)filter_y; (void)filter_y_stride;
(void)filter_x;
(void)filter_x_stride;
(void)filter_y;
(void)filter_y_stride;
if (w > 32) { // copy64
for (; h > 0; h--) {

@ -1,21 +1,24 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
EXPORT |vpx_convolve_copy_neon|
;
EXPORT |aom_convolve_copy_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
|vpx_convolve_copy_neon| PROC
|aom_convolve_copy_neon| PROC
push {r4-r5, lr}
ldrd r4, r5, [sp, #28]

@ -0,0 +1,66 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <assert.h>
#include "./aom_dsp_rtcd.h"
#include "aom_dsp/aom_dsp_common.h"
#include "aom_ports/mem.h"
void aom_convolve8_neon(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
ptrdiff_t dst_stride, const int16_t *filter_x,
int x_step_q4, const int16_t *filter_y, int y_step_q4,
int w, int h) {
/* Given our constraints: w <= 64, h <= 64, taps == 8 we can reduce the
* maximum buffer size to 64 * (64 + 7) (+ 1 row to make the row count
* divisible by 4), i.e. 64 * 72.
*/
DECLARE_ALIGNED(8, uint8_t, temp[64 * 72]);
// Account for the vertical phase needing 3 lines prior and 4 lines post
int intermediate_height = h + 7;
assert(y_step_q4 == 16);
assert(x_step_q4 == 16);
/* Filter starting 3 lines back. The neon implementation will ignore the
* given height and filter a multiple of 4 lines. Since this goes into
* the temp buffer, which has lots of extra room and is subsequently
* discarded, this is safe if somewhat less than ideal.
*/
aom_convolve8_horiz_neon(src - src_stride * 3, src_stride, temp, 64, filter_x,
x_step_q4, filter_y, y_step_q4, w,
intermediate_height);
/* Step into the temp buffer 3 lines to get the actual frame data */
aom_convolve8_vert_neon(temp + 64 * 3, 64, dst, dst_stride, filter_x,
x_step_q4, filter_y, y_step_q4, w, h);
}
void aom_convolve8_avg_neon(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4, int w,
int h) {
DECLARE_ALIGNED(8, uint8_t, temp[64 * 72]);
int intermediate_height = h + 7;
assert(y_step_q4 == 16);
assert(x_step_q4 == 16);
/* This implementation has the same issues as above. In addition, we only want
* to average the values after both passes.
*/
aom_convolve8_horiz_neon(src - src_stride * 3, src_stride, temp, 64, filter_x,
x_step_q4, filter_y, y_step_q4, w,
intermediate_height);
aom_convolve8_avg_vert_neon(temp + 64 * 3, 64, dst, dst_stride, filter_x,
x_step_q4, filter_y, y_step_q4, w, h);
}
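
Note on the two-pass structure above: the horizontal pass starts 3 rows above the block and produces h + 7 rows into a 64-wide temporary, and the vertical pass then reads from row 3 of that temporary. A minimal scalar sketch of the same idea follows. It is not the NEON code: every `sketch_` name is hypothetical, and the 7-bit filter precision (taps summing to 128) is an assumption that does not appear in this diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants; FILTER_BITS = 7 is an assumption (taps sum to 128). */
enum { SKETCH_TAPS = 8, SKETCH_FILTER_BITS = 7, SKETCH_TEMP_STRIDE = 64 };

static uint8_t sketch_clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* One 1-D pass. `step` is the distance between neighbouring taps:
 * 1 for the horizontal pass, the row stride for the vertical pass.
 * Each output pixel reads taps at offsets -3 .. +4 around its position. */
static void sketch_filter_pass(const uint8_t *src, ptrdiff_t src_stride,
                               uint8_t *dst, ptrdiff_t dst_stride,
                               const int16_t *filter, ptrdiff_t step, int w,
                               int h) {
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      int sum = 0;
      for (int k = 0; k < SKETCH_TAPS; ++k)
        sum += src[x + (k - 3) * step] * filter[k];
      dst[x] = sketch_clip_pixel((sum + (1 << (SKETCH_FILTER_BITS - 1))) >>
                                 SKETCH_FILTER_BITS);
    }
    src += src_stride;
    dst += dst_stride;
  }
}

/* Horizontal pass into a 64 x 72 temporary, then vertical pass out of it. */
static void sketch_convolve8(const uint8_t *src, ptrdiff_t src_stride,
                             uint8_t *dst, ptrdiff_t dst_stride,
                             const int16_t *filter_x, const int16_t *filter_y,
                             int w, int h) {
  uint8_t temp[SKETCH_TEMP_STRIDE * 72];
  const int intermediate_height = h + 7; /* 3 rows above + h + 4 rows below */
  assert(w <= 64 && h <= 64);
  sketch_filter_pass(src - src_stride * 3, src_stride, temp,
                     SKETCH_TEMP_STRIDE, filter_x, 1, w, intermediate_height);
  /* Step 3 rows into the temporary to reach the rows of the actual block. */
  sketch_filter_pass(temp + SKETCH_TEMP_STRIDE * 3, SKETCH_TEMP_STRIDE, dst,
                     dst_stride, filter_y, SKETCH_TEMP_STRIDE, w, h);
}
```

The caller must guarantee at least 3 readable rows and columns before the block and 4 after it, which is the same requirement the `src - src_stride * 3` call above places on its input.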


@ -1,20 +1,21 @@
/*
* Copyright (c) 2015 The WebM project authors. All Rights Reserved.
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include <assert.h>
#include "./vpx_dsp_rtcd.h"
#include "./vpx_config.h"
#include "./aom_dsp_rtcd.h"
#include "./aom_config.h"
#include "vpx/vpx_integer.h"
#include "aom/aom_integer.h"
static INLINE unsigned int horizontal_add_u16x8(const uint16x8_t v_16x8) {
const uint32x4_t a = vpaddlq_u16(v_16x8);
@ -24,7 +25,7 @@ static INLINE unsigned int horizontal_add_u16x8(const uint16x8_t v_16x8) {
return vget_lane_u32(c, 0);
}
unsigned int vpx_avg_4x4_neon(const uint8_t *s, int p) {
unsigned int aom_avg_4x4_neon(const uint8_t *s, int p) {
uint16x8_t v_sum;
uint32x2_t v_s0 = vdup_n_u32(0);
uint32x2_t v_s1 = vdup_n_u32(0);
@ -36,7 +37,7 @@ unsigned int vpx_avg_4x4_neon(const uint8_t *s, int p) {
return (horizontal_add_u16x8(v_sum) + 8) >> 4;
}
unsigned int vpx_avg_8x8_neon(const uint8_t *s, int p) {
unsigned int aom_avg_8x8_neon(const uint8_t *s, int p) {
uint8x8_t v_s0 = vld1_u8(s);
const uint8x8_t v_s1 = vld1_u8(s + p);
uint16x8_t v_sum = vaddl_u8(v_s0, v_s1);
@ -64,7 +65,7 @@ unsigned int vpx_avg_8x8_neon(const uint8_t *s, int p) {
// coeff: 16 bits, dynamic range [-32640, 32640].
// length: value range {16, 64, 256, 1024}.
int vpx_satd_neon(const int16_t *coeff, int length) {
int aom_satd_neon(const int16_t *coeff, int length) {
const int16x4_t zero = vdup_n_s16(0);
int32x4_t accum = vdupq_n_s32(0);
@ -89,7 +90,7 @@ int vpx_satd_neon(const int16_t *coeff, int length) {
}
}
void vpx_int_pro_row_neon(int16_t hbuf[16], uint8_t const *ref,
void aom_int_pro_row_neon(int16_t hbuf[16], uint8_t const *ref,
const int ref_stride, const int height) {
int i;
uint16x8_t vec_sum_lo = vdupq_n_u16(0);
@ -142,7 +143,7 @@ void vpx_int_pro_row_neon(int16_t hbuf[16], uint8_t const *ref,
vst1q_s16(hbuf, vreinterpretq_s16_u16(vec_sum_hi));
}
int16_t vpx_int_pro_col_neon(uint8_t const *ref, const int width) {
int16_t aom_int_pro_col_neon(uint8_t const *ref, const int width) {
int i;
uint16x8_t vec_sum = vdupq_n_u16(0);
@ -158,7 +159,7 @@ int16_t vpx_int_pro_col_neon(uint8_t const *ref, const int width) {
// ref, src = [0, 510] - max diff = 16-bits
// bwl = {2, 3, 4}, width = {16, 32, 64}
int vpx_vector_var_neon(int16_t const *ref, int16_t const *src, const int bwl) {
int aom_vector_var_neon(int16_t const *ref, int16_t const *src, const int bwl) {
int width = 4 << bwl;
int32x4_t sse = vdupq_n_s32(0);
int16x8_t total = vdupq_n_s16(0);
@ -198,27 +199,24 @@ int vpx_vector_var_neon(int16_t const *ref, int16_t const *src, const int bwl) {
}
}
void vpx_minmax_8x8_neon(const uint8_t *a, int a_stride,
const uint8_t *b, int b_stride,
int *min, int *max) {
void aom_minmax_8x8_neon(const uint8_t *a, int a_stride, const uint8_t *b,
int b_stride, int *min, int *max) {
// Load and concatenate.
const uint8x16_t a01 = vcombine_u8(vld1_u8(a),
vld1_u8(a + a_stride));
const uint8x16_t a23 = vcombine_u8(vld1_u8(a + 2 * a_stride),
vld1_u8(a + 3 * a_stride));
const uint8x16_t a45 = vcombine_u8(vld1_u8(a + 4 * a_stride),
vld1_u8(a + 5 * a_stride));
const uint8x16_t a67 = vcombine_u8(vld1_u8(a + 6 * a_stride),
vld1_u8(a + 7 * a_stride));
const uint8x16_t a01 = vcombine_u8(vld1_u8(a), vld1_u8(a + a_stride));
const uint8x16_t a23 =
vcombine_u8(vld1_u8(a + 2 * a_stride), vld1_u8(a + 3 * a_stride));
const uint8x16_t a45 =
vcombine_u8(vld1_u8(a + 4 * a_stride), vld1_u8(a + 5 * a_stride));
const uint8x16_t a67 =
vcombine_u8(vld1_u8(a + 6 * a_stride), vld1_u8(a + 7 * a_stride));
const uint8x16_t b01 = vcombine_u8(vld1_u8(b),
vld1_u8(b + b_stride));
const uint8x16_t b23 = vcombine_u8(vld1_u8(b + 2 * b_stride),
vld1_u8(b + 3 * b_stride));
const uint8x16_t b45 = vcombine_u8(vld1_u8(b + 4 * b_stride),
vld1_u8(b + 5 * b_stride));
const uint8x16_t b67 = vcombine_u8(vld1_u8(b + 6 * b_stride),
vld1_u8(b + 7 * b_stride));
const uint8x16_t b01 = vcombine_u8(vld1_u8(b), vld1_u8(b + b_stride));
const uint8x16_t b23 =
vcombine_u8(vld1_u8(b + 2 * b_stride), vld1_u8(b + 3 * b_stride));
const uint8x16_t b45 =
vcombine_u8(vld1_u8(b + 4 * b_stride), vld1_u8(b + 5 * b_stride));
const uint8x16_t b67 =
vcombine_u8(vld1_u8(b + 6 * b_stride), vld1_u8(b + 7 * b_stride));
// Absolute difference.
const uint8x16_t ab01_diff = vabdq_u8(a01, b01);
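
Note on the averaging and min/max kernels in this file: a scalar reference makes the arithmetic easier to check against the intrinsics. The sketch below mirrors the `(sum + 8) >> 4` rounding visible in the 4x4 average. For the 8x8 min/max it assumes that the remainder of the function, which this hunk cuts off, reduces the absolute differences to one minimum and one maximum, as the output parameters suggest. The `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Rounded mean of a 4x4 block: 16 samples, so add 8 and shift by 4. */
static unsigned int sketch_avg_4x4(const uint8_t *s, int p) {
  unsigned int sum = 0;
  for (int r = 0; r < 4; ++r)
    for (int c = 0; c < 4; ++c) sum += s[r * p + c];
  return (sum + 8) >> 4;
}

/* Smallest and largest absolute difference between two 8x8 blocks
 * (assumed behaviour; the reduction step is not shown in the diff). */
static void sketch_minmax_8x8(const uint8_t *a, int a_stride, const uint8_t *b,
                              int b_stride, int *min, int *max) {
  *min = 255;
  *max = 0;
  for (int r = 0; r < 8; ++r) {
    for (int c = 0; c < 8; ++c) {
      const int diff = abs(a[r * a_stride + c] - b[r * b_stride + c]);
      if (diff < *min) *min = diff;
      if (diff > *max) *max = diff;
    }
  }
}
```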


@ -1,16 +1,19 @@
;
; Copyright (c) 2010 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
;
EXPORT |vpx_filter_block2d_bil_first_pass_media|
EXPORT |vpx_filter_block2d_bil_second_pass_media|
EXPORT |aom_filter_block2d_bil_first_pass_media|
EXPORT |aom_filter_block2d_bil_second_pass_media|
AREA |.text|, CODE, READONLY ; name this block of code
@ -20,13 +23,13 @@
; r2 unsigned int src_pitch,
; r3 unsigned int height,
; stack unsigned int width,
; stack const short *vpx_filter
; stack const short *aom_filter
;-------------------------------------
; The output is stored transposed in the output array to make second-pass filtering easier.
|vpx_filter_block2d_bil_first_pass_media| PROC
|aom_filter_block2d_bil_first_pass_media| PROC
stmdb sp!, {r4 - r11, lr}
ldr r11, [sp, #40] ; vpx_filter address
ldr r11, [sp, #40] ; aom_filter address
ldr r4, [sp, #36] ; width
mov r12, r3 ; outer-loop counter
@ -134,7 +137,7 @@
ldmia sp!, {r4 - r11, pc}
ENDP ; |vpx_filter_block2d_bil_first_pass_media|
ENDP ; |aom_filter_block2d_bil_first_pass_media|
;---------------------------------
@ -143,12 +146,12 @@
; r2 int dst_pitch,
; r3 unsigned int height,
; stack unsigned int width,
; stack const short *vpx_filter
; stack const short *aom_filter
;---------------------------------
|vpx_filter_block2d_bil_second_pass_media| PROC
|aom_filter_block2d_bil_second_pass_media| PROC
stmdb sp!, {r4 - r11, lr}
ldr r11, [sp, #40] ; vpx_filter address
ldr r11, [sp, #40] ; aom_filter address
ldr r4, [sp, #36] ; width
ldr r5, [r11] ; load up filter coefficients
@ -232,6 +235,6 @@
bne bil_height_loop_null_2nd
ldmia sp!, {r4 - r11, pc}
ENDP ; |vpx_filter_block2d_second_pass_media|
ENDP ; |aom_filter_block2d_bil_second_pass_media|
END


@ -1,19 +1,20 @@
/*
* Copyright (c) 2015 The WebM project authors. All Rights Reserved.
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./vpx_config.h"
#include "vpx_dsp/txfm_common.h"
#include "./aom_config.h"
#include "aom_dsp/txfm_common.h"
void vpx_fdct8x8_neon(const int16_t *input, int16_t *final_output, int stride) {
void aom_fdct8x8_neon(const int16_t *input, int16_t *final_output, int stride) {
int i;
// stage 1
int16x8_t input_0 = vshlq_n_s16(vld1q_s16(&input[0 * stride]), 2);
@ -52,10 +53,10 @@ void vpx_fdct8x8_neon(const int16_t *input, int16_t *final_output, int stride) {
v_t2_hi = vmlal_n_s16(v_t2_hi, vget_high_s16(v_x3), (int16_t)cospi_8_64);
v_t3_lo = vmlsl_n_s16(v_t3_lo, vget_low_s16(v_x2), (int16_t)cospi_8_64);
v_t3_hi = vmlsl_n_s16(v_t3_hi, vget_high_s16(v_x2), (int16_t)cospi_8_64);
v_t0_lo = vmulq_n_s32(v_t0_lo, cospi_16_64);
v_t0_hi = vmulq_n_s32(v_t0_hi, cospi_16_64);
v_t1_lo = vmulq_n_s32(v_t1_lo, cospi_16_64);
v_t1_hi = vmulq_n_s32(v_t1_hi, cospi_16_64);
v_t0_lo = vmulq_n_s32(v_t0_lo, (int32_t)cospi_16_64);
v_t0_hi = vmulq_n_s32(v_t0_hi, (int32_t)cospi_16_64);
v_t1_lo = vmulq_n_s32(v_t1_lo, (int32_t)cospi_16_64);
v_t1_hi = vmulq_n_s32(v_t1_hi, (int32_t)cospi_16_64);
{
const int16x4_t a = vrshrn_n_s32(v_t0_lo, DCT_CONST_BITS);
const int16x4_t b = vrshrn_n_s32(v_t0_hi, DCT_CONST_BITS);
@ -131,14 +132,14 @@ void vpx_fdct8x8_neon(const int16_t *input, int16_t *final_output, int stride) {
// 14 15 16 17 54 55 56 57
// 24 25 26 27 64 65 66 67
// 34 35 36 37 74 75 76 77
const int32x4x2_t r02_s32 = vtrnq_s32(vreinterpretq_s32_s16(out_0),
vreinterpretq_s32_s16(out_2));
const int32x4x2_t r13_s32 = vtrnq_s32(vreinterpretq_s32_s16(out_1),
vreinterpretq_s32_s16(out_3));
const int32x4x2_t r46_s32 = vtrnq_s32(vreinterpretq_s32_s16(out_4),
vreinterpretq_s32_s16(out_6));
const int32x4x2_t r57_s32 = vtrnq_s32(vreinterpretq_s32_s16(out_5),
vreinterpretq_s32_s16(out_7));
const int32x4x2_t r02_s32 =
vtrnq_s32(vreinterpretq_s32_s16(out_0), vreinterpretq_s32_s16(out_2));
const int32x4x2_t r13_s32 =
vtrnq_s32(vreinterpretq_s32_s16(out_1), vreinterpretq_s32_s16(out_3));
const int32x4x2_t r46_s32 =
vtrnq_s32(vreinterpretq_s32_s16(out_4), vreinterpretq_s32_s16(out_6));
const int32x4x2_t r57_s32 =
vtrnq_s32(vreinterpretq_s32_s16(out_5), vreinterpretq_s32_s16(out_7));
const int16x8x2_t r01_s16 =
vtrnq_s16(vreinterpretq_s16_s32(r02_s32.val[0]),
vreinterpretq_s16_s32(r13_s32.val[0]));
@ -170,7 +171,7 @@ void vpx_fdct8x8_neon(const int16_t *input, int16_t *final_output, int stride) {
}
} // for
{
// from vpx_dct_sse2.c
// from aom_dct_sse2.c
// Post-condition (division by two)
// division of two 16 bits signed numbers using shifts
// n / 2 = (n - (n >> 15)) >> 1
@ -202,7 +203,7 @@ void vpx_fdct8x8_neon(const int16_t *input, int16_t *final_output, int stride) {
}
}
void vpx_fdct8x8_1_neon(const int16_t *input, int16_t *output, int stride) {
void aom_fdct8x8_1_neon(const int16_t *input, int16_t *output, int stride) {
int r;
int16x8_t sum = vld1q_s16(&input[0]);
for (r = 1; r < 8; ++r) {
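
Note on the post-condition comment in this file ("n / 2 = (n - (n >> 15)) >> 1"): the identity gives truncation toward zero for 16-bit signed values using only a subtract and shifts. A small scalar check is sketched below. The function name is hypothetical, and the code assumes that `>>` on a negative `int` is an arithmetic shift, which the C standard leaves implementation-defined but common compilers provide.

```c
#include <assert.h>
#include <stdint.h>

/* n >> 15 is -1 for negative n and 0 otherwise (arithmetic shift assumed),
 * so negative inputs are nudged up by one before the halving shift. That
 * turns the floor of n / 2 into truncation toward zero. */
static int16_t sketch_div2_by_shift(int16_t n) {
  const int sign = n >> 15;
  return (int16_t)((n - sign) >> 1);
}
```

For example, -3 becomes (-3 + 1) >> 1 = -1, matching C integer division, whereas a plain -3 >> 1 would give -2.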


@ -10,11 +10,10 @@
#include <arm_neon.h>
#include "./vpx_dsp_rtcd.h"
#include "./aom_dsp_rtcd.h"
static void hadamard8x8_one_pass(int16x8_t *a0, int16x8_t *a1,
int16x8_t *a2, int16x8_t *a3,
int16x8_t *a4, int16x8_t *a5,
static void hadamard8x8_one_pass(int16x8_t *a0, int16x8_t *a1, int16x8_t *a2,
int16x8_t *a3, int16x8_t *a4, int16x8_t *a5,
int16x8_t *a6, int16x8_t *a7) {
const int16x8_t b0 = vaddq_s16(*a0, *a1);
const int16x8_t b1 = vsubq_s16(*a0, *a1);
@ -47,9 +46,8 @@ static void hadamard8x8_one_pass(int16x8_t *a0, int16x8_t *a1,
// TODO(johannkoenig): Make a transpose library and dedup with idct. Consider
// reversing transpose order which may make it easier for the compiler to
// reconcile the vtrn.64 moves.
static void transpose8x8(int16x8_t *a0, int16x8_t *a1,
int16x8_t *a2, int16x8_t *a3,
int16x8_t *a4, int16x8_t *a5,
static void transpose8x8(int16x8_t *a0, int16x8_t *a1, int16x8_t *a2,
int16x8_t *a3, int16x8_t *a4, int16x8_t *a5,
int16x8_t *a6, int16x8_t *a7) {
// Swap 64 bit elements. Goes from:
// a0: 00 01 02 03 04 05 06 07
@ -91,14 +89,14 @@ static void transpose8x8(int16x8_t *a0, int16x8_t *a1,
// a1657_hi:
// 12 13 28 29 44 45 60 61
// 14 15 30 31 46 47 62 63
const int32x4x2_t a0246_lo = vtrnq_s32(vreinterpretq_s32_s16(a04_lo),
vreinterpretq_s32_s16(a26_lo));
const int32x4x2_t a1357_lo = vtrnq_s32(vreinterpretq_s32_s16(a15_lo),
vreinterpretq_s32_s16(a37_lo));
const int32x4x2_t a0246_hi = vtrnq_s32(vreinterpretq_s32_s16(a04_hi),
vreinterpretq_s32_s16(a26_hi));
const int32x4x2_t a1357_hi = vtrnq_s32(vreinterpretq_s32_s16(a15_hi),
vreinterpretq_s32_s16(a37_hi));
const int32x4x2_t a0246_lo =
vtrnq_s32(vreinterpretq_s32_s16(a04_lo), vreinterpretq_s32_s16(a26_lo));
const int32x4x2_t a1357_lo =
vtrnq_s32(vreinterpretq_s32_s16(a15_lo), vreinterpretq_s32_s16(a37_lo));
const int32x4x2_t a0246_hi =
vtrnq_s32(vreinterpretq_s32_s16(a04_hi), vreinterpretq_s32_s16(a26_hi));
const int32x4x2_t a1357_hi =
vtrnq_s32(vreinterpretq_s32_s16(a15_hi), vreinterpretq_s32_s16(a37_hi));
// Swap 16 bit elements resulting in:
// b0:
@ -132,7 +130,7 @@ static void transpose8x8(int16x8_t *a0, int16x8_t *a1,
*a7 = b3.val[1];
}
void vpx_hadamard_8x8_neon(const int16_t *src_diff, int src_stride,
void aom_hadamard_8x8_neon(const int16_t *src_diff, int src_stride,
int16_t *coeff) {
int16x8_t a0 = vld1q_s16(src_diff);
int16x8_t a1 = vld1q_s16(src_diff + src_stride);
@ -161,19 +159,19 @@ void vpx_hadamard_8x8_neon(const int16_t *src_diff, int src_stride,
vst1q_s16(coeff + 56, a7);
}
void vpx_hadamard_16x16_neon(const int16_t *src_diff, int src_stride,
void aom_hadamard_16x16_neon(const int16_t *src_diff, int src_stride,
int16_t *coeff) {
int i;
/* Rearrange 16x16 to 8x32 and remove stride.
* Top left first. */
vpx_hadamard_8x8_neon(src_diff + 0 + 0 * src_stride, src_stride, coeff + 0);
aom_hadamard_8x8_neon(src_diff + 0 + 0 * src_stride, src_stride, coeff + 0);
/* Top right. */
vpx_hadamard_8x8_neon(src_diff + 8 + 0 * src_stride, src_stride, coeff + 64);
aom_hadamard_8x8_neon(src_diff + 8 + 0 * src_stride, src_stride, coeff + 64);
/* Bottom left. */
vpx_hadamard_8x8_neon(src_diff + 0 + 8 * src_stride, src_stride, coeff + 128);
aom_hadamard_8x8_neon(src_diff + 0 + 8 * src_stride, src_stride, coeff + 128);
/* Bottom right. */
vpx_hadamard_8x8_neon(src_diff + 8 + 8 * src_stride, src_stride, coeff + 192);
aom_hadamard_8x8_neon(src_diff + 8 + 8 * src_stride, src_stride, coeff + 192);
for (i = 0; i < 64; i += 8) {
const int16x8_t a0 = vld1q_s16(coeff + 0);
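
Note on the 16x16 layout above: the four 8x8 calls read the quadrants in the order top left, top right, bottom left, bottom right and write each result to its own contiguous run of 64 coefficients (offsets 0, 64, 128, 192). The sketch below shows only that addressing. It copies samples instead of transforming them, so it is an illustration of the layout under that simplification, not of the Hadamard transform itself. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Copy each 8x8 quadrant of a strided 16x16 block into 64 contiguous
 * entries, in the same quadrant order as the calls in the diff. The real
 * code transforms each quadrant here instead of copying it. */
static void sketch_gather_quadrants(const int16_t *src_diff, int src_stride,
                                    int16_t *coeff) {
  static const int row0[4] = { 0, 0, 8, 8 };
  static const int col0[4] = { 0, 8, 0, 8 };
  for (int q = 0; q < 4; ++q)
    for (int r = 0; r < 8; ++r)
      for (int c = 0; c < 8; ++c)
        coeff[q * 64 + r * 8 + c] =
            src_diff[(row0[q] + r) * src_stride + col0[q] + c];
}
```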


@ -1,28 +1,31 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license and patent
; grant that can be found in the LICENSE file in the root of the source
; tree. All contributing project authors may be found in the AUTHORS
; file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
EXPORT |vpx_idct16x16_1_add_neon|
EXPORT |aom_idct16x16_1_add_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
;void vpx_idct16x16_1_add_neon(int16_t *input, uint8_t *dest,
;void aom_idct16x16_1_add_neon(int16_t *input, uint8_t *dest,
; int dest_stride)
;
; r0 int16_t *input
; r1 uint8_t *dest
; r2 int dest_stride
|vpx_idct16x16_1_add_neon| PROC
|aom_idct16x16_1_add_neon| PROC
ldrsh r0, [r0]
; generate cospi_16_64 = 11585
@ -193,6 +196,6 @@
vst1.64 {d31}, [r12], r2
bx lr
ENDP ; |vpx_idct16x16_1_add_neon|
ENDP ; |aom_idct16x16_1_add_neon|
END


@ -0,0 +1,59 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "aom_dsp/inv_txfm.h"
#include "aom_ports/mem.h"
void aom_idct16x16_1_add_neon(int16_t *input, uint8_t *dest, int dest_stride) {
uint8x8_t d2u8, d3u8, d30u8, d31u8;
uint64x1_t d2u64, d3u64, d4u64, d5u64;
uint16x8_t q0u16, q9u16, q10u16, q11u16, q12u16;
int16x8_t q0s16;
uint8_t *d1, *d2;
int16_t i, j, a1;
int16_t out = dct_const_round_shift(input[0] * cospi_16_64);
out = dct_const_round_shift(out * cospi_16_64);
a1 = ROUND_POWER_OF_TWO(out, 6);
q0s16 = vdupq_n_s16(a1);
q0u16 = vreinterpretq_u16_s16(q0s16);
for (d1 = d2 = dest, i = 0; i < 4; i++) {
for (j = 0; j < 2; j++) {
d2u64 = vld1_u64((const uint64_t *)d1);
d3u64 = vld1_u64((const uint64_t *)(d1 + 8));
d1 += dest_stride;
d4u64 = vld1_u64((const uint64_t *)d1);
d5u64 = vld1_u64((const uint64_t *)(d1 + 8));
d1 += dest_stride;
q9u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d2u64));
q10u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d3u64));
q11u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d4u64));
q12u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d5u64));
d2u8 = vqmovun_s16(vreinterpretq_s16_u16(q9u16));
d3u8 = vqmovun_s16(vreinterpretq_s16_u16(q10u16));
d30u8 = vqmovun_s16(vreinterpretq_s16_u16(q11u16));
d31u8 = vqmovun_s16(vreinterpretq_s16_u16(q12u16));
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d2u8));
vst1_u64((uint64_t *)(d2 + 8), vreinterpret_u64_u8(d3u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d30u8));
vst1_u64((uint64_t *)(d2 + 8), vreinterpret_u64_u8(d31u8));
d2 += dest_stride;
}
}
return;
}
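
Note on the DC-only path above: when only `input[0]` is nonzero, the inverse transform collapses to one constant that is added to every pixel and clamped to 8 bits. A scalar sketch of that computation follows. The value 11585 for `cospi_16_64` is quoted in the assembly comment earlier in this diff. The 14-bit `DCT_CONST_BITS` and all `sketch_` names are assumptions, and the sketch keeps intermediates in `int` where the NEON file uses `int16_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants: 14-bit transform precision, cospi_16_64 = 11585. */
enum { SKETCH_DCT_CONST_BITS = 14, SKETCH_COSPI_16_64 = 11585 };

/* Round-to-nearest right shift (arithmetic shift assumed for negatives). */
static int sketch_round_shift(int value, int bits) {
  return (value + (1 << (bits - 1))) >> bits;
}

static void sketch_idct16x16_dc_add(const int16_t *input, uint8_t *dest,
                                    int dest_stride) {
  /* Row pass and column pass each scale the DC term by cospi_16_64. */
  int out = sketch_round_shift(input[0] * SKETCH_COSPI_16_64,
                               SKETCH_DCT_CONST_BITS);
  int a1;
  out = sketch_round_shift(out * SKETCH_COSPI_16_64, SKETCH_DCT_CONST_BITS);
  a1 = sketch_round_shift(out, 6); /* final 16x16 normalisation */
  for (int r = 0; r < 16; ++r) {
    for (int c = 0; c < 16; ++c) {
      const int v = dest[r * dest_stride + c] + a1;
      dest[r * dest_stride + c] = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
    }
  }
}
```

With `input[0] = 64` the constant works out to 1, so each pixel rises by one and a pixel already at 255 stays saturated.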


@ -1,17 +1,20 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
EXPORT |vpx_idct16x16_256_add_neon_pass1|
EXPORT |vpx_idct16x16_256_add_neon_pass2|
EXPORT |vpx_idct16x16_10_add_neon_pass1|
EXPORT |vpx_idct16x16_10_add_neon_pass2|
;
EXPORT |aom_idct16x16_256_add_neon_pass1|
EXPORT |aom_idct16x16_256_add_neon_pass2|
EXPORT |aom_idct16x16_10_add_neon_pass1|
EXPORT |aom_idct16x16_10_add_neon_pass2|
ARM
REQUIRE8
PRESERVE8
@ -36,7 +39,7 @@
MEND
AREA Block, CODE, READONLY ; name this block of code
;void |vpx_idct16x16_256_add_neon_pass1|(int16_t *input,
;void |aom_idct16x16_256_add_neon_pass1|(int16_t *input,
; int16_t *output, int output_stride)
;
; r0 int16_t *input
@ -46,7 +49,7 @@
; idct16 stage1 - stage6 on all the elements loaded in q8-q15. The output
; will be stored back into q8-q15 registers. This function will touch q0-q7
; registers and use them as buffer during calculation.
|vpx_idct16x16_256_add_neon_pass1| PROC
|aom_idct16x16_256_add_neon_pass1| PROC
; TODO(hkuang): Find a better way to load the elements.
; load elements of 0, 2, 4, 6, 8, 10, 12, 14 into q8 - q15
@ -273,9 +276,9 @@
vst1.64 {d31}, [r1], r2
bx lr
ENDP ; |vpx_idct16x16_256_add_neon_pass1|
ENDP ; |aom_idct16x16_256_add_neon_pass1|
;void vpx_idct16x16_256_add_neon_pass2(int16_t *src,
;void aom_idct16x16_256_add_neon_pass2(int16_t *src,
; int16_t *output,
; int16_t *pass1Output,
; int16_t skip_adding,
@ -292,7 +295,7 @@
; idct16 stage1 - stage7 on all the elements loaded in q8-q15. The output
; will be stored back into q8-q15 registers. This function will touch q0-q7
; registers and use them as buffer during calculation.
|vpx_idct16x16_256_add_neon_pass2| PROC
|aom_idct16x16_256_add_neon_pass2| PROC
push {r3-r9}
; TODO(hkuang): Find a better way to load the elements.
@ -784,9 +787,9 @@ skip_adding_dest
end_idct16x16_pass2
pop {r3-r9}
bx lr
ENDP ; |vpx_idct16x16_256_add_neon_pass2|
ENDP ; |aom_idct16x16_256_add_neon_pass2|
;void |vpx_idct16x16_10_add_neon_pass1|(int16_t *input,
;void |aom_idct16x16_10_add_neon_pass1|(int16_t *input,
; int16_t *output, int output_stride)
;
; r0 int16_t *input
@ -796,7 +799,7 @@ end_idct16x16_pass2
; idct16 stage1 - stage6 on all the elements loaded in q8-q15. The output
; will be stored back into q8-q15 registers. This function will touch q0-q7
; registers and use them as buffer during calculation.
|vpx_idct16x16_10_add_neon_pass1| PROC
|aom_idct16x16_10_add_neon_pass1| PROC
; TODO(hkuang): Find a better way to load the elements.
; load elements of 0, 2, 4, 6, 8, 10, 12, 14 into q8 - q15
@ -905,9 +908,9 @@ end_idct16x16_pass2
vst1.64 {d31}, [r1], r2
bx lr
ENDP ; |vpx_idct16x16_10_add_neon_pass1|
ENDP ; |aom_idct16x16_10_add_neon_pass1|
;void vpx_idct16x16_10_add_neon_pass2(int16_t *src,
;void aom_idct16x16_10_add_neon_pass2(int16_t *src,
; int16_t *output,
; int16_t *pass1Output,
; int16_t skip_adding,
@ -924,7 +927,7 @@ end_idct16x16_pass2
; idct16 stage1 - stage7 on all the elements loaded in q8-q15. The output
; will be stored back into q8-q15 registers. This function will touch q0-q7
; registers and use them as buffer during calculation.
|vpx_idct16x16_10_add_neon_pass2| PROC
|aom_idct16x16_10_add_neon_pass2| PROC
push {r3-r9}
; TODO(hkuang): Find a better way to load the elements.
@ -1175,5 +1178,5 @@ end_idct16x16_pass2
end_idct10_16x16_pass2
pop {r3-r9}
bx lr
ENDP ; |vpx_idct16x16_10_add_neon_pass2|
ENDP ; |aom_idct16x16_10_add_neon_pass2|
END

File diff suppressed because it is too large


@ -0,0 +1,152 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include "aom_dsp/aom_dsp_common.h"
void aom_idct16x16_256_add_neon_pass1(const int16_t *input, int16_t *output,
int output_stride);
void aom_idct16x16_256_add_neon_pass2(const int16_t *src, int16_t *output,
int16_t *pass1Output, int16_t skip_adding,
uint8_t *dest, int dest_stride);
void aom_idct16x16_10_add_neon_pass1(const int16_t *input, int16_t *output,
int output_stride);
void aom_idct16x16_10_add_neon_pass2(const int16_t *src, int16_t *output,
int16_t *pass1Output, int16_t skip_adding,
uint8_t *dest, int dest_stride);
#if HAVE_NEON_ASM
/* For ARM NEON, d8-d15 are callee-saved registers, and need to be saved. */
extern void aom_push_neon(int64_t *store);
extern void aom_pop_neon(int64_t *store);
#endif // HAVE_NEON_ASM
void aom_idct16x16_256_add_neon(const int16_t *input, uint8_t *dest,
int dest_stride) {
#if HAVE_NEON_ASM
int64_t store_reg[8];
#endif
int16_t pass1_output[16 * 16] = { 0 };
int16_t row_idct_output[16 * 16] = { 0 };
#if HAVE_NEON_ASM
// save d8-d15 register values.
aom_push_neon(store_reg);
#endif
/* Parallel idct on the upper 8 rows */
// First pass processes even elements 0, 2, 4, 6, 8, 10, 12, 14 and saves the
// stage 6 result in pass1_output.
aom_idct16x16_256_add_neon_pass1(input, pass1_output, 8);
// Second pass processes odd elements 1, 3, 5, 7, 9, 11, 13, 15 and combines
// with result in pass1(pass1_output) to calculate final result in stage 7
// which will be saved into row_idct_output.
aom_idct16x16_256_add_neon_pass2(input + 1, row_idct_output, pass1_output, 0,
dest, dest_stride);
/* Parallel idct on the lower 8 rows */
// First pass processes even elements 0, 2, 4, 6, 8, 10, 12, 14 and saves the
// stage 6 result in pass1_output.
aom_idct16x16_256_add_neon_pass1(input + 8 * 16, pass1_output, 8);
// Second pass processes odd elements 1, 3, 5, 7, 9, 11, 13, 15 and combines
// with result in pass1(pass1_output) to calculate final result in stage 7
// which will be saved into row_idct_output.
aom_idct16x16_256_add_neon_pass2(input + 8 * 16 + 1, row_idct_output + 8,
pass1_output, 0, dest, dest_stride);
/* Parallel idct on the left 8 columns */
// First pass processes even elements 0, 2, 4, 6, 8, 10, 12, 14 and saves the
// stage 6 result in pass1_output.
aom_idct16x16_256_add_neon_pass1(row_idct_output, pass1_output, 8);
// Second pass processes odd elements 1, 3, 5, 7, 9, 11, 13, 15 and combines
// with result in pass1(pass1_output) to calculate final result in stage 7.
// Then add the result to the destination data.
aom_idct16x16_256_add_neon_pass2(row_idct_output + 1, row_idct_output,
pass1_output, 1, dest, dest_stride);
/* Parallel idct on the right 8 columns */
// First pass processes even elements 0, 2, 4, 6, 8, 10, 12, 14 and saves the
// stage 6 result in pass1_output.
aom_idct16x16_256_add_neon_pass1(row_idct_output + 8 * 16, pass1_output, 8);
// Second pass processes odd elements 1, 3, 5, 7, 9, 11, 13, 15 and combines
// with result in pass1(pass1_output) to calculate final result in stage 7.
// Then add the result to the destination data.
aom_idct16x16_256_add_neon_pass2(row_idct_output + 8 * 16 + 1,
row_idct_output + 8, pass1_output, 1,
dest + 8, dest_stride);
#if HAVE_NEON_ASM
// restore d8-d15 register values.
aom_pop_neon(store_reg);
#endif
return;
}
void aom_idct16x16_10_add_neon(const int16_t *input, uint8_t *dest,
int dest_stride) {
#if HAVE_NEON_ASM
int64_t store_reg[8];
#endif
int16_t pass1_output[16 * 16] = { 0 };
int16_t row_idct_output[16 * 16] = { 0 };
#if HAVE_NEON_ASM
// save d8-d15 register values.
aom_push_neon(store_reg);
#endif
/* Parallel idct on the upper 8 rows */
// First pass processes even elements 0, 2, 4, 6, 8, 10, 12, 14 and saves the
// stage 6 result in pass1_output.
aom_idct16x16_10_add_neon_pass1(input, pass1_output, 8);
// Second pass processes odd elements 1, 3, 5, 7, 9, 11, 13, 15 and combines
// with result in pass1(pass1_output) to calculate final result in stage 7
// which will be saved into row_idct_output.
aom_idct16x16_10_add_neon_pass2(input + 1, row_idct_output, pass1_output, 0,
dest, dest_stride);
/* Skip Parallel idct on the lower 8 rows as they are all 0s */
/* Parallel idct on the left 8 columns */
// First pass processes even elements 0, 2, 4, 6, 8, 10, 12, 14 and saves the
// stage 6 result in pass1_output.
aom_idct16x16_256_add_neon_pass1(row_idct_output, pass1_output, 8);
// Second pass processes odd elements 1, 3, 5, 7, 9, 11, 13, 15 and combines
// with result in pass1(pass1_output) to calculate final result in stage 7.
// Then add the result to the destination data.
aom_idct16x16_256_add_neon_pass2(row_idct_output + 1, row_idct_output,
pass1_output, 1, dest, dest_stride);
/* Parallel idct on the right 8 columns */
// First pass processes even elements 0, 2, 4, 6, 8, 10, 12, 14 and saves the
// stage 6 result in pass1_output.
aom_idct16x16_256_add_neon_pass1(row_idct_output + 8 * 16, pass1_output, 8);
// Second pass processes odd elements 1, 3, 5, 7, 9, 11, 13, 15 and combines
// with result in pass1(pass1_output) to calculate final result in stage 7.
// Then add the result to the destination data.
aom_idct16x16_256_add_neon_pass2(row_idct_output + 8 * 16 + 1,
row_idct_output + 8, pass1_output, 1,
dest + 8, dest_stride);
#if HAVE_NEON_ASM
// restore d8-d15 register values.
aom_pop_neon(store_reg);
#endif
return;
}
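
Note on the driver functions above: both apply a 1-D transform along one direction of the 16x16 block, store the intermediate in `row_idct_output`, and then apply it along the other direction before adding to `dest`. The generic sketch below shows that separable row-then-column pattern with a caller-supplied 1-D kernel. It deliberately leaves out the even/odd pass split, the 8-wide batching, and the final add to the destination. All names are hypothetical, and the reversal kernel is only a stand-in that makes the result easy to verify.

```c
#include <assert.h>
#include <stdint.h>

enum { SKETCH_N = 16 };

/* Hypothetical 1-D kernel type: transforms SKETCH_N inputs into outputs. */
typedef void (*sketch_txfm_1d)(const int16_t *in, int16_t *out);

/* Apply `txfm` to every row, then to every column of the row results. */
static void sketch_separable_2d(const int16_t *input, int16_t *output,
                                sketch_txfm_1d txfm) {
  int16_t rows[SKETCH_N * SKETCH_N];
  int16_t col_in[SKETCH_N], col_out[SKETCH_N];
  for (int r = 0; r < SKETCH_N; ++r)
    txfm(input + r * SKETCH_N, rows + r * SKETCH_N);
  for (int c = 0; c < SKETCH_N; ++c) {
    for (int r = 0; r < SKETCH_N; ++r) col_in[r] = rows[r * SKETCH_N + c];
    txfm(col_in, col_out);
    for (int r = 0; r < SKETCH_N; ++r) output[r * SKETCH_N + c] = col_out[r];
  }
}

/* Stand-in kernel for testing: reverses its input. */
static void sketch_reverse_1d(const int16_t *in, int16_t *out) {
  for (int i = 0; i < SKETCH_N; ++i) out[i] = in[SKETCH_N - 1 - i];
}
```

Reversing rows and then columns rotates the block by 180 degrees, which gives a simple check that both directions were processed.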


@ -1,13 +1,16 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license and patent
; grant that can be found in the LICENSE file in the root of the source
; tree. All contributing project authors may be found in the AUTHORS
; file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
EXPORT |vpx_idct32x32_1_add_neon|
EXPORT |aom_idct32x32_1_add_neon|
ARM
REQUIRE8
PRESERVE8
@ -64,14 +67,14 @@
vst1.8 {q15},[$dst], $stride
MEND
;void vpx_idct32x32_1_add_neon(int16_t *input, uint8_t *dest,
;void aom_idct32x32_1_add_neon(int16_t *input, uint8_t *dest,
; int dest_stride)
;
; r0 int16_t *input
; r1 uint8_t *dest
; r2 int dest_stride
|vpx_idct32x32_1_add_neon| PROC
|aom_idct32x32_1_add_neon| PROC
push {lr}
pld [r1]
add r3, r1, #16 ; r3 dest + 16 for second loop
@ -140,5 +143,5 @@ diff_positive_32_32_loop
bne diff_positive_32_32_loop
pop {pc}
ENDP ; |vpx_idct32x32_1_add_neon|
ENDP ; |aom_idct32x32_1_add_neon|
END


@ -0,0 +1,141 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_config.h"
#include "aom_dsp/inv_txfm.h"
#include "aom_ports/mem.h"
static INLINE void LD_16x8(uint8_t *d, int d_stride, uint8x16_t *q8u8,
uint8x16_t *q9u8, uint8x16_t *q10u8,
uint8x16_t *q11u8, uint8x16_t *q12u8,
uint8x16_t *q13u8, uint8x16_t *q14u8,
uint8x16_t *q15u8) {
*q8u8 = vld1q_u8(d);
d += d_stride;
*q9u8 = vld1q_u8(d);
d += d_stride;
*q10u8 = vld1q_u8(d);
d += d_stride;
*q11u8 = vld1q_u8(d);
d += d_stride;
*q12u8 = vld1q_u8(d);
d += d_stride;
*q13u8 = vld1q_u8(d);
d += d_stride;
*q14u8 = vld1q_u8(d);
d += d_stride;
*q15u8 = vld1q_u8(d);
return;
}
static INLINE void ADD_DIFF_16x8(uint8x16_t qdiffu8, uint8x16_t *q8u8,
uint8x16_t *q9u8, uint8x16_t *q10u8,
uint8x16_t *q11u8, uint8x16_t *q12u8,
uint8x16_t *q13u8, uint8x16_t *q14u8,
uint8x16_t *q15u8) {
*q8u8 = vqaddq_u8(*q8u8, qdiffu8);
*q9u8 = vqaddq_u8(*q9u8, qdiffu8);
*q10u8 = vqaddq_u8(*q10u8, qdiffu8);
*q11u8 = vqaddq_u8(*q11u8, qdiffu8);
*q12u8 = vqaddq_u8(*q12u8, qdiffu8);
*q13u8 = vqaddq_u8(*q13u8, qdiffu8);
*q14u8 = vqaddq_u8(*q14u8, qdiffu8);
*q15u8 = vqaddq_u8(*q15u8, qdiffu8);
return;
}
static INLINE void SUB_DIFF_16x8(uint8x16_t qdiffu8, uint8x16_t *q8u8,
uint8x16_t *q9u8, uint8x16_t *q10u8,
uint8x16_t *q11u8, uint8x16_t *q12u8,
uint8x16_t *q13u8, uint8x16_t *q14u8,
uint8x16_t *q15u8) {
*q8u8 = vqsubq_u8(*q8u8, qdiffu8);
*q9u8 = vqsubq_u8(*q9u8, qdiffu8);
*q10u8 = vqsubq_u8(*q10u8, qdiffu8);
*q11u8 = vqsubq_u8(*q11u8, qdiffu8);
*q12u8 = vqsubq_u8(*q12u8, qdiffu8);
*q13u8 = vqsubq_u8(*q13u8, qdiffu8);
*q14u8 = vqsubq_u8(*q14u8, qdiffu8);
*q15u8 = vqsubq_u8(*q15u8, qdiffu8);
return;
}
static INLINE void ST_16x8(uint8_t *d, int d_stride, uint8x16_t *q8u8,
uint8x16_t *q9u8, uint8x16_t *q10u8,
uint8x16_t *q11u8, uint8x16_t *q12u8,
uint8x16_t *q13u8, uint8x16_t *q14u8,
uint8x16_t *q15u8) {
vst1q_u8(d, *q8u8);
d += d_stride;
vst1q_u8(d, *q9u8);
d += d_stride;
vst1q_u8(d, *q10u8);
d += d_stride;
vst1q_u8(d, *q11u8);
d += d_stride;
vst1q_u8(d, *q12u8);
d += d_stride;
vst1q_u8(d, *q13u8);
d += d_stride;
vst1q_u8(d, *q14u8);
d += d_stride;
vst1q_u8(d, *q15u8);
return;
}
void aom_idct32x32_1_add_neon(int16_t *input, uint8_t *dest, int dest_stride) {
uint8x16_t q0u8, q8u8, q9u8, q10u8, q11u8, q12u8, q13u8, q14u8, q15u8;
int i, j, dest_stride8;
uint8_t *d;
int16_t a1;
int16_t out = dct_const_round_shift(input[0] * cospi_16_64);
out = dct_const_round_shift(out * cospi_16_64);
a1 = ROUND_POWER_OF_TWO(out, 6);
dest_stride8 = dest_stride * 8;
if (a1 >= 0) { // diff_positive_32_32
a1 = a1 < 0 ? 0 : a1 > 255 ? 255 : a1;
q0u8 = vdupq_n_u8(a1);
for (i = 0; i < 2; i++, dest += 16) { // diff_positive_32_32_loop
d = dest;
for (j = 0; j < 4; j++) {
LD_16x8(d, dest_stride, &q8u8, &q9u8, &q10u8, &q11u8, &q12u8, &q13u8,
&q14u8, &q15u8);
ADD_DIFF_16x8(q0u8, &q8u8, &q9u8, &q10u8, &q11u8, &q12u8, &q13u8,
&q14u8, &q15u8);
ST_16x8(d, dest_stride, &q8u8, &q9u8, &q10u8, &q11u8, &q12u8, &q13u8,
&q14u8, &q15u8);
d += dest_stride8;
}
}
} else { // diff_negative_32_32
a1 = -a1;
a1 = a1 < 0 ? 0 : a1 > 255 ? 255 : a1;
q0u8 = vdupq_n_u8(a1);
for (i = 0; i < 2; i++, dest += 16) { // diff_negative_32_32_loop
d = dest;
for (j = 0; j < 4; j++) {
LD_16x8(d, dest_stride, &q8u8, &q9u8, &q10u8, &q11u8, &q12u8, &q13u8,
&q14u8, &q15u8);
SUB_DIFF_16x8(q0u8, &q8u8, &q9u8, &q10u8, &q11u8, &q12u8, &q13u8,
&q14u8, &q15u8);
ST_16x8(d, dest_stride, &q8u8, &q9u8, &q10u8, &q11u8, &q12u8, &q13u8,
&q14u8, &q15u8);
d += dest_stride8;
}
}
}
return;
}
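
For readers without a NEON target, the DC-only path added above can be approximated in scalar C. This is a minimal sketch, not the libaom reference code: the names round_shift, dc_only_delta and dc_only_add are hypothetical, and the constants (cospi_16_64 = 11585, 14-bit rounding) are assumed from the values used elsewhere in this diff.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants: cospi_16_64 and the 14-bit transform precision. */
#define COSPI_16_64 11585
#define DCT_CONST_BITS 14

/* Round-to-nearest right shift. Assumes >> on a negative int is an
 * arithmetic shift, which holds on common compilers. */
static int round_shift(int x, int n) { return (x + (1 << (n - 1))) >> n; }

/* Per-pixel offset produced by a DC-only inverse transform.
 * final_shift is 6 for the 32x32 case and 4 for the 4x4 case. */
static int dc_only_delta(int16_t dc, int final_shift) {
  int out = round_shift(dc * COSPI_16_64, DCT_CONST_BITS);
  out = round_shift(out * COSPI_16_64, DCT_CONST_BITS);
  return round_shift(out, final_shift);
}

/* Add the offset to a size x size block, clamping each pixel to [0, 255]. */
static void dc_only_add(int16_t dc, uint8_t *dest, int stride, int size,
                        int final_shift) {
  const int a1 = dc_only_delta(dc, final_shift);
  assert(size > 0 && stride >= size);
  for (int r = 0; r < size; ++r) {
    for (int c = 0; c < size; ++c) {
      const int v = dest[r * stride + c] + a1;
      dest[r * stride + c] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
    }
  }
}
```

The NEON version should reach the same result by branching on the sign of a1 and using a saturating unsigned add (vqaddq_u8) or subtract (vqsubq_u8), so it never needs a wider intermediate.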

@@ -1,11 +1,14 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
;
;TODO(cd): adjust these constants to be able to use vqdmulh for faster
@@ -43,7 +46,7 @@ cospi_30_64 EQU 1606
cospi_31_64 EQU 804
EXPORT |vpx_idct32x32_1024_add_neon|
EXPORT |aom_idct32x32_1024_add_neon|
ARM
REQUIRE8
PRESERVE8
@@ -288,7 +291,7 @@ cospi_31_64 EQU 804
MEND
; --------------------------------------------------------------------------
;void vpx_idct32x32_1024_add_neon(int16_t *input, uint8_t *dest, int dest_stride);
;void aom_idct32x32_1024_add_neon(int16_t *input, uint8_t *dest, int dest_stride);
;
; r0 int16_t *input,
; r1 uint8_t *dest,
@@ -303,7 +306,7 @@ cospi_31_64 EQU 804
; r9 dest + 15 * dest_stride, descending (14, 13, 12, ...)
; r10 dest + 16 * dest_stride, ascending (17, 18, 19, ...)
|vpx_idct32x32_1024_add_neon| PROC
|aom_idct32x32_1024_add_neon| PROC
; This function does one pass of idct32x32 transform.
;
; This is done by transposing the input and then doing a 1d transform on
@@ -1295,5 +1298,5 @@ idct32_bands_end_2nd_pass
vpop {d8-d15}
pop {r4-r11}
bx lr
ENDP ; |vpx_idct32x32_1024_add_neon|
ENDP ; |aom_idct32x32_1024_add_neon|
END

@@ -0,0 +1,686 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_config.h"
#include "aom_dsp/txfm_common.h"
#define LOAD_FROM_TRANSPOSED(prev, first, second) \
q14s16 = vld1q_s16(trans_buf + first * 8); \
q13s16 = vld1q_s16(trans_buf + second * 8);
#define LOAD_FROM_OUTPUT(prev, first, second, qA, qB) \
qA = vld1q_s16(out + first * 32); \
qB = vld1q_s16(out + second * 32);
#define STORE_IN_OUTPUT(prev, first, second, qA, qB) \
vst1q_s16(out + first * 32, qA); \
vst1q_s16(out + second * 32, qB);
#define STORE_COMBINE_CENTER_RESULTS(r10, r9) \
__STORE_COMBINE_CENTER_RESULTS(r10, r9, stride, q6s16, q7s16, q8s16, q9s16);
static INLINE void __STORE_COMBINE_CENTER_RESULTS(uint8_t *p1, uint8_t *p2,
int stride, int16x8_t q6s16,
int16x8_t q7s16,
int16x8_t q8s16,
int16x8_t q9s16) {
int16x4_t d8s16, d9s16, d10s16, d11s16;
d8s16 = vld1_s16((int16_t *)p1);
p1 += stride;
d11s16 = vld1_s16((int16_t *)p2);
p2 -= stride;
d9s16 = vld1_s16((int16_t *)p1);
d10s16 = vld1_s16((int16_t *)p2);
q7s16 = vrshrq_n_s16(q7s16, 6);
q8s16 = vrshrq_n_s16(q8s16, 6);
q9s16 = vrshrq_n_s16(q9s16, 6);
q6s16 = vrshrq_n_s16(q6s16, 6);
q7s16 = vreinterpretq_s16_u16(
vaddw_u8(vreinterpretq_u16_s16(q7s16), vreinterpret_u8_s16(d9s16)));
q8s16 = vreinterpretq_s16_u16(
vaddw_u8(vreinterpretq_u16_s16(q8s16), vreinterpret_u8_s16(d10s16)));
q9s16 = vreinterpretq_s16_u16(
vaddw_u8(vreinterpretq_u16_s16(q9s16), vreinterpret_u8_s16(d11s16)));
q6s16 = vreinterpretq_s16_u16(
vaddw_u8(vreinterpretq_u16_s16(q6s16), vreinterpret_u8_s16(d8s16)));
d9s16 = vreinterpret_s16_u8(vqmovun_s16(q7s16));
d10s16 = vreinterpret_s16_u8(vqmovun_s16(q8s16));
d11s16 = vreinterpret_s16_u8(vqmovun_s16(q9s16));
d8s16 = vreinterpret_s16_u8(vqmovun_s16(q6s16));
vst1_s16((int16_t *)p1, d9s16);
p1 -= stride;
vst1_s16((int16_t *)p2, d10s16);
p2 += stride;
vst1_s16((int16_t *)p1, d8s16);
vst1_s16((int16_t *)p2, d11s16);
return;
}
#define STORE_COMBINE_EXTREME_RESULTS(r7, r6) \
; \
__STORE_COMBINE_EXTREME_RESULTS(r7, r6, stride, q4s16, q5s16, q6s16, q7s16);
static INLINE void __STORE_COMBINE_EXTREME_RESULTS(uint8_t *p1, uint8_t *p2,
int stride, int16x8_t q4s16,
int16x8_t q5s16,
int16x8_t q6s16,
int16x8_t q7s16) {
int16x4_t d4s16, d5s16, d6s16, d7s16;
d4s16 = vld1_s16((int16_t *)p1);
p1 += stride;
d7s16 = vld1_s16((int16_t *)p2);
p2 -= stride;
d5s16 = vld1_s16((int16_t *)p1);
d6s16 = vld1_s16((int16_t *)p2);
q5s16 = vrshrq_n_s16(q5s16, 6);
q6s16 = vrshrq_n_s16(q6s16, 6);
q7s16 = vrshrq_n_s16(q7s16, 6);
q4s16 = vrshrq_n_s16(q4s16, 6);
q5s16 = vreinterpretq_s16_u16(
vaddw_u8(vreinterpretq_u16_s16(q5s16), vreinterpret_u8_s16(d5s16)));
q6s16 = vreinterpretq_s16_u16(
vaddw_u8(vreinterpretq_u16_s16(q6s16), vreinterpret_u8_s16(d6s16)));
q7s16 = vreinterpretq_s16_u16(
vaddw_u8(vreinterpretq_u16_s16(q7s16), vreinterpret_u8_s16(d7s16)));
q4s16 = vreinterpretq_s16_u16(
vaddw_u8(vreinterpretq_u16_s16(q4s16), vreinterpret_u8_s16(d4s16)));
d5s16 = vreinterpret_s16_u8(vqmovun_s16(q5s16));
d6s16 = vreinterpret_s16_u8(vqmovun_s16(q6s16));
d7s16 = vreinterpret_s16_u8(vqmovun_s16(q7s16));
d4s16 = vreinterpret_s16_u8(vqmovun_s16(q4s16));
vst1_s16((int16_t *)p1, d5s16);
p1 -= stride;
vst1_s16((int16_t *)p2, d6s16);
p2 += stride;
vst1_s16((int16_t *)p2, d7s16);
vst1_s16((int16_t *)p1, d4s16);
return;
}
#define DO_BUTTERFLY_STD(const_1, const_2, qA, qB) \
DO_BUTTERFLY(q14s16, q13s16, const_1, const_2, qA, qB);
static INLINE void DO_BUTTERFLY(int16x8_t q14s16, int16x8_t q13s16,
int16_t first_const, int16_t second_const,
int16x8_t *qAs16, int16x8_t *qBs16) {
int16x4_t d30s16, d31s16;
int32x4_t q8s32, q9s32, q10s32, q11s32, q12s32, q15s32;
int16x4_t dCs16, dDs16, dAs16, dBs16;
dCs16 = vget_low_s16(q14s16);
dDs16 = vget_high_s16(q14s16);
dAs16 = vget_low_s16(q13s16);
dBs16 = vget_high_s16(q13s16);
d30s16 = vdup_n_s16(first_const);
d31s16 = vdup_n_s16(second_const);
q8s32 = vmull_s16(dCs16, d30s16);
q10s32 = vmull_s16(dAs16, d31s16);
q9s32 = vmull_s16(dDs16, d30s16);
q11s32 = vmull_s16(dBs16, d31s16);
q12s32 = vmull_s16(dCs16, d31s16);
q8s32 = vsubq_s32(q8s32, q10s32);
q9s32 = vsubq_s32(q9s32, q11s32);
q10s32 = vmull_s16(dDs16, d31s16);
q11s32 = vmull_s16(dAs16, d30s16);
q15s32 = vmull_s16(dBs16, d30s16);
q11s32 = vaddq_s32(q12s32, q11s32);
q10s32 = vaddq_s32(q10s32, q15s32);
*qAs16 = vcombine_s16(vqrshrn_n_s32(q8s32, 14), vqrshrn_n_s32(q9s32, 14));
*qBs16 = vcombine_s16(vqrshrn_n_s32(q11s32, 14), vqrshrn_n_s32(q10s32, 14));
return;
}
static INLINE void idct32_transpose_pair(int16_t *input, int16_t *t_buf) {
int16_t *in;
int i;
const int stride = 32;
int16x4_t d16s16, d17s16, d18s16, d19s16, d20s16, d21s16, d22s16, d23s16;
int16x4_t d24s16, d25s16, d26s16, d27s16, d28s16, d29s16, d30s16, d31s16;
int16x8_t q8s16, q9s16, q10s16, q11s16, q12s16, q13s16, q14s16, q15s16;
int32x4x2_t q0x2s32, q1x2s32, q2x2s32, q3x2s32;
int16x8x2_t q0x2s16, q1x2s16, q2x2s16, q3x2s16;
for (i = 0; i < 4; i++, input += 8) {
in = input;
q8s16 = vld1q_s16(in);
in += stride;
q9s16 = vld1q_s16(in);
in += stride;
q10s16 = vld1q_s16(in);
in += stride;
q11s16 = vld1q_s16(in);
in += stride;
q12s16 = vld1q_s16(in);
in += stride;
q13s16 = vld1q_s16(in);
in += stride;
q14s16 = vld1q_s16(in);
in += stride;
q15s16 = vld1q_s16(in);
d16s16 = vget_low_s16(q8s16);
d17s16 = vget_high_s16(q8s16);
d18s16 = vget_low_s16(q9s16);
d19s16 = vget_high_s16(q9s16);
d20s16 = vget_low_s16(q10s16);
d21s16 = vget_high_s16(q10s16);
d22s16 = vget_low_s16(q11s16);
d23s16 = vget_high_s16(q11s16);
d24s16 = vget_low_s16(q12s16);
d25s16 = vget_high_s16(q12s16);
d26s16 = vget_low_s16(q13s16);
d27s16 = vget_high_s16(q13s16);
d28s16 = vget_low_s16(q14s16);
d29s16 = vget_high_s16(q14s16);
d30s16 = vget_low_s16(q15s16);
d31s16 = vget_high_s16(q15s16);
q8s16 = vcombine_s16(d16s16, d24s16); // vswp d17, d24
q9s16 = vcombine_s16(d18s16, d26s16); // vswp d19, d26
q10s16 = vcombine_s16(d20s16, d28s16); // vswp d21, d28
q11s16 = vcombine_s16(d22s16, d30s16); // vswp d23, d30
q12s16 = vcombine_s16(d17s16, d25s16);
q13s16 = vcombine_s16(d19s16, d27s16);
q14s16 = vcombine_s16(d21s16, d29s16);
q15s16 = vcombine_s16(d23s16, d31s16);
q0x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(q8s16), vreinterpretq_s32_s16(q10s16));
q1x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(q9s16), vreinterpretq_s32_s16(q11s16));
q2x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(q12s16), vreinterpretq_s32_s16(q14s16));
q3x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(q13s16), vreinterpretq_s32_s16(q15s16));
q0x2s16 = vtrnq_s16(vreinterpretq_s16_s32(q0x2s32.val[0]), // q8
vreinterpretq_s16_s32(q1x2s32.val[0])); // q9
q1x2s16 = vtrnq_s16(vreinterpretq_s16_s32(q0x2s32.val[1]), // q10
vreinterpretq_s16_s32(q1x2s32.val[1])); // q11
q2x2s16 = vtrnq_s16(vreinterpretq_s16_s32(q2x2s32.val[0]), // q12
vreinterpretq_s16_s32(q3x2s32.val[0])); // q13
q3x2s16 = vtrnq_s16(vreinterpretq_s16_s32(q2x2s32.val[1]), // q14
vreinterpretq_s16_s32(q3x2s32.val[1])); // q15
vst1q_s16(t_buf, q0x2s16.val[0]);
t_buf += 8;
vst1q_s16(t_buf, q0x2s16.val[1]);
t_buf += 8;
vst1q_s16(t_buf, q1x2s16.val[0]);
t_buf += 8;
vst1q_s16(t_buf, q1x2s16.val[1]);
t_buf += 8;
vst1q_s16(t_buf, q2x2s16.val[0]);
t_buf += 8;
vst1q_s16(t_buf, q2x2s16.val[1]);
t_buf += 8;
vst1q_s16(t_buf, q3x2s16.val[0]);
t_buf += 8;
vst1q_s16(t_buf, q3x2s16.val[1]);
t_buf += 8;
}
return;
}
static INLINE void idct32_bands_end_1st_pass(int16_t *out, int16x8_t q2s16,
int16x8_t q3s16, int16x8_t q6s16,
int16x8_t q7s16, int16x8_t q8s16,
int16x8_t q9s16, int16x8_t q10s16,
int16x8_t q11s16, int16x8_t q12s16,
int16x8_t q13s16, int16x8_t q14s16,
int16x8_t q15s16) {
int16x8_t q0s16, q1s16, q4s16, q5s16;
STORE_IN_OUTPUT(17, 16, 17, q6s16, q7s16);
STORE_IN_OUTPUT(17, 14, 15, q8s16, q9s16);
LOAD_FROM_OUTPUT(15, 30, 31, q0s16, q1s16);
q4s16 = vaddq_s16(q2s16, q1s16);
q5s16 = vaddq_s16(q3s16, q0s16);
q6s16 = vsubq_s16(q3s16, q0s16);
q7s16 = vsubq_s16(q2s16, q1s16);
STORE_IN_OUTPUT(31, 30, 31, q6s16, q7s16);
STORE_IN_OUTPUT(31, 0, 1, q4s16, q5s16);
LOAD_FROM_OUTPUT(1, 12, 13, q0s16, q1s16);
q2s16 = vaddq_s16(q10s16, q1s16);
q3s16 = vaddq_s16(q11s16, q0s16);
q4s16 = vsubq_s16(q11s16, q0s16);
q5s16 = vsubq_s16(q10s16, q1s16);
LOAD_FROM_OUTPUT(13, 18, 19, q0s16, q1s16);
q8s16 = vaddq_s16(q4s16, q1s16);
q9s16 = vaddq_s16(q5s16, q0s16);
q6s16 = vsubq_s16(q5s16, q0s16);
q7s16 = vsubq_s16(q4s16, q1s16);
STORE_IN_OUTPUT(19, 18, 19, q6s16, q7s16);
STORE_IN_OUTPUT(19, 12, 13, q8s16, q9s16);
LOAD_FROM_OUTPUT(13, 28, 29, q0s16, q1s16);
q4s16 = vaddq_s16(q2s16, q1s16);
q5s16 = vaddq_s16(q3s16, q0s16);
q6s16 = vsubq_s16(q3s16, q0s16);
q7s16 = vsubq_s16(q2s16, q1s16);
STORE_IN_OUTPUT(29, 28, 29, q6s16, q7s16);
STORE_IN_OUTPUT(29, 2, 3, q4s16, q5s16);
LOAD_FROM_OUTPUT(3, 10, 11, q0s16, q1s16);
q2s16 = vaddq_s16(q12s16, q1s16);
q3s16 = vaddq_s16(q13s16, q0s16);
q4s16 = vsubq_s16(q13s16, q0s16);
q5s16 = vsubq_s16(q12s16, q1s16);
LOAD_FROM_OUTPUT(11, 20, 21, q0s16, q1s16);
q8s16 = vaddq_s16(q4s16, q1s16);
q9s16 = vaddq_s16(q5s16, q0s16);
q6s16 = vsubq_s16(q5s16, q0s16);
q7s16 = vsubq_s16(q4s16, q1s16);
STORE_IN_OUTPUT(21, 20, 21, q6s16, q7s16);
STORE_IN_OUTPUT(21, 10, 11, q8s16, q9s16);
LOAD_FROM_OUTPUT(11, 26, 27, q0s16, q1s16);
q4s16 = vaddq_s16(q2s16, q1s16);
q5s16 = vaddq_s16(q3s16, q0s16);
q6s16 = vsubq_s16(q3s16, q0s16);
q7s16 = vsubq_s16(q2s16, q1s16);
STORE_IN_OUTPUT(27, 26, 27, q6s16, q7s16);
STORE_IN_OUTPUT(27, 4, 5, q4s16, q5s16);
LOAD_FROM_OUTPUT(5, 8, 9, q0s16, q1s16);
q2s16 = vaddq_s16(q14s16, q1s16);
q3s16 = vaddq_s16(q15s16, q0s16);
q4s16 = vsubq_s16(q15s16, q0s16);
q5s16 = vsubq_s16(q14s16, q1s16);
LOAD_FROM_OUTPUT(9, 22, 23, q0s16, q1s16);
q8s16 = vaddq_s16(q4s16, q1s16);
q9s16 = vaddq_s16(q5s16, q0s16);
q6s16 = vsubq_s16(q5s16, q0s16);
q7s16 = vsubq_s16(q4s16, q1s16);
STORE_IN_OUTPUT(23, 22, 23, q6s16, q7s16);
STORE_IN_OUTPUT(23, 8, 9, q8s16, q9s16);
LOAD_FROM_OUTPUT(9, 24, 25, q0s16, q1s16);
q4s16 = vaddq_s16(q2s16, q1s16);
q5s16 = vaddq_s16(q3s16, q0s16);
q6s16 = vsubq_s16(q3s16, q0s16);
q7s16 = vsubq_s16(q2s16, q1s16);
STORE_IN_OUTPUT(25, 24, 25, q6s16, q7s16);
STORE_IN_OUTPUT(25, 6, 7, q4s16, q5s16);
return;
}
static INLINE void idct32_bands_end_2nd_pass(
int16_t *out, uint8_t *dest, int stride, int16x8_t q2s16, int16x8_t q3s16,
int16x8_t q6s16, int16x8_t q7s16, int16x8_t q8s16, int16x8_t q9s16,
int16x8_t q10s16, int16x8_t q11s16, int16x8_t q12s16, int16x8_t q13s16,
int16x8_t q14s16, int16x8_t q15s16) {
uint8_t *r6 = dest + 31 * stride;
uint8_t *r7 = dest /* + 0 * stride*/;
uint8_t *r9 = dest + 15 * stride;
uint8_t *r10 = dest + 16 * stride;
int str2 = stride << 1;
int16x8_t q0s16, q1s16, q4s16, q5s16;
STORE_COMBINE_CENTER_RESULTS(r10, r9);
r10 += str2;
r9 -= str2;
LOAD_FROM_OUTPUT(17, 30, 31, q0s16, q1s16)
q4s16 = vaddq_s16(q2s16, q1s16);
q5s16 = vaddq_s16(q3s16, q0s16);
q6s16 = vsubq_s16(q3s16, q0s16);
q7s16 = vsubq_s16(q2s16, q1s16);
STORE_COMBINE_EXTREME_RESULTS(r7, r6);
r7 += str2;
r6 -= str2;
LOAD_FROM_OUTPUT(31, 12, 13, q0s16, q1s16)
q2s16 = vaddq_s16(q10s16, q1s16);
q3s16 = vaddq_s16(q11s16, q0s16);
q4s16 = vsubq_s16(q11s16, q0s16);
q5s16 = vsubq_s16(q10s16, q1s16);
LOAD_FROM_OUTPUT(13, 18, 19, q0s16, q1s16)
q8s16 = vaddq_s16(q4s16, q1s16);
q9s16 = vaddq_s16(q5s16, q0s16);
q6s16 = vsubq_s16(q5s16, q0s16);
q7s16 = vsubq_s16(q4s16, q1s16);
STORE_COMBINE_CENTER_RESULTS(r10, r9);
r10 += str2;
r9 -= str2;
LOAD_FROM_OUTPUT(19, 28, 29, q0s16, q1s16)
q4s16 = vaddq_s16(q2s16, q1s16);
q5s16 = vaddq_s16(q3s16, q0s16);
q6s16 = vsubq_s16(q3s16, q0s16);
q7s16 = vsubq_s16(q2s16, q1s16);
STORE_COMBINE_EXTREME_RESULTS(r7, r6);
r7 += str2;
r6 -= str2;
LOAD_FROM_OUTPUT(29, 10, 11, q0s16, q1s16)
q2s16 = vaddq_s16(q12s16, q1s16);
q3s16 = vaddq_s16(q13s16, q0s16);
q4s16 = vsubq_s16(q13s16, q0s16);
q5s16 = vsubq_s16(q12s16, q1s16);
LOAD_FROM_OUTPUT(11, 20, 21, q0s16, q1s16)
q8s16 = vaddq_s16(q4s16, q1s16);
q9s16 = vaddq_s16(q5s16, q0s16);
q6s16 = vsubq_s16(q5s16, q0s16);
q7s16 = vsubq_s16(q4s16, q1s16);
STORE_COMBINE_CENTER_RESULTS(r10, r9);
r10 += str2;
r9 -= str2;
LOAD_FROM_OUTPUT(21, 26, 27, q0s16, q1s16)
q4s16 = vaddq_s16(q2s16, q1s16);
q5s16 = vaddq_s16(q3s16, q0s16);
q6s16 = vsubq_s16(q3s16, q0s16);
q7s16 = vsubq_s16(q2s16, q1s16);
STORE_COMBINE_EXTREME_RESULTS(r7, r6);
r7 += str2;
r6 -= str2;
LOAD_FROM_OUTPUT(27, 8, 9, q0s16, q1s16)
q2s16 = vaddq_s16(q14s16, q1s16);
q3s16 = vaddq_s16(q15s16, q0s16);
q4s16 = vsubq_s16(q15s16, q0s16);
q5s16 = vsubq_s16(q14s16, q1s16);
LOAD_FROM_OUTPUT(9, 22, 23, q0s16, q1s16)
q8s16 = vaddq_s16(q4s16, q1s16);
q9s16 = vaddq_s16(q5s16, q0s16);
q6s16 = vsubq_s16(q5s16, q0s16);
q7s16 = vsubq_s16(q4s16, q1s16);
STORE_COMBINE_CENTER_RESULTS(r10, r9);
LOAD_FROM_OUTPUT(23, 24, 25, q0s16, q1s16)
q4s16 = vaddq_s16(q2s16, q1s16);
q5s16 = vaddq_s16(q3s16, q0s16);
q6s16 = vsubq_s16(q3s16, q0s16);
q7s16 = vsubq_s16(q2s16, q1s16);
STORE_COMBINE_EXTREME_RESULTS(r7, r6);
return;
}
void aom_idct32x32_1024_add_neon(int16_t *input, uint8_t *dest, int stride) {
int i, idct32_pass_loop;
int16_t trans_buf[32 * 8];
int16_t pass1[32 * 32];
int16_t pass2[32 * 32];
int16_t *out;
int16x8_t q0s16, q1s16, q2s16, q3s16, q4s16, q5s16, q6s16, q7s16;
int16x8_t q8s16, q9s16, q10s16, q11s16, q12s16, q13s16, q14s16, q15s16;
for (idct32_pass_loop = 0, out = pass1; idct32_pass_loop < 2;
idct32_pass_loop++,
input = pass1, // the input of pass2 is the result of pass1
out = pass2) {
for (i = 0; i < 4; i++, input += 32 * 8, out += 8) { // idct32_bands_loop
idct32_transpose_pair(input, trans_buf);
// -----------------------------------------
// BLOCK A: 16-19,28-31
// -----------------------------------------
// generate 16,17,30,31
// part of stage 1
LOAD_FROM_TRANSPOSED(0, 1, 31)
DO_BUTTERFLY_STD(cospi_31_64, cospi_1_64, &q0s16, &q2s16)
LOAD_FROM_TRANSPOSED(31, 17, 15)
DO_BUTTERFLY_STD(cospi_15_64, cospi_17_64, &q1s16, &q3s16)
// part of stage 2
q4s16 = vaddq_s16(q0s16, q1s16);
q13s16 = vsubq_s16(q0s16, q1s16);
q6s16 = vaddq_s16(q2s16, q3s16);
q14s16 = vsubq_s16(q2s16, q3s16);
// part of stage 3
DO_BUTTERFLY_STD(cospi_28_64, cospi_4_64, &q5s16, &q7s16)
// generate 18,19,28,29
// part of stage 1
LOAD_FROM_TRANSPOSED(15, 9, 23)
DO_BUTTERFLY_STD(cospi_23_64, cospi_9_64, &q0s16, &q2s16)
LOAD_FROM_TRANSPOSED(23, 25, 7)
DO_BUTTERFLY_STD(cospi_7_64, cospi_25_64, &q1s16, &q3s16)
// part of stage 2
q13s16 = vsubq_s16(q3s16, q2s16);
q3s16 = vaddq_s16(q3s16, q2s16);
q14s16 = vsubq_s16(q1s16, q0s16);
q2s16 = vaddq_s16(q1s16, q0s16);
// part of stage 3
DO_BUTTERFLY_STD(-cospi_4_64, -cospi_28_64, &q1s16, &q0s16)
// part of stage 4
q8s16 = vaddq_s16(q4s16, q2s16);
q9s16 = vaddq_s16(q5s16, q0s16);
q10s16 = vaddq_s16(q7s16, q1s16);
q15s16 = vaddq_s16(q6s16, q3s16);
q13s16 = vsubq_s16(q5s16, q0s16);
q14s16 = vsubq_s16(q7s16, q1s16);
STORE_IN_OUTPUT(0, 16, 31, q8s16, q15s16)
STORE_IN_OUTPUT(31, 17, 30, q9s16, q10s16)
// part of stage 5
DO_BUTTERFLY_STD(cospi_24_64, cospi_8_64, &q0s16, &q1s16)
STORE_IN_OUTPUT(30, 29, 18, q1s16, q0s16)
// part of stage 4
q13s16 = vsubq_s16(q4s16, q2s16);
q14s16 = vsubq_s16(q6s16, q3s16);
// part of stage 5
DO_BUTTERFLY_STD(cospi_24_64, cospi_8_64, &q4s16, &q6s16)
STORE_IN_OUTPUT(18, 19, 28, q4s16, q6s16)
// -----------------------------------------
// BLOCK B: 20-23,24-27
// -----------------------------------------
// generate 20,21,26,27
// part of stage 1
LOAD_FROM_TRANSPOSED(7, 5, 27)
DO_BUTTERFLY_STD(cospi_27_64, cospi_5_64, &q0s16, &q2s16)
LOAD_FROM_TRANSPOSED(27, 21, 11)
DO_BUTTERFLY_STD(cospi_11_64, cospi_21_64, &q1s16, &q3s16)
// part of stage 2
q13s16 = vsubq_s16(q0s16, q1s16);
q0s16 = vaddq_s16(q0s16, q1s16);
q14s16 = vsubq_s16(q2s16, q3s16);
q2s16 = vaddq_s16(q2s16, q3s16);
// part of stage 3
DO_BUTTERFLY_STD(cospi_12_64, cospi_20_64, &q1s16, &q3s16)
// generate 22,23,24,25
// part of stage 1
LOAD_FROM_TRANSPOSED(11, 13, 19)
DO_BUTTERFLY_STD(cospi_19_64, cospi_13_64, &q5s16, &q7s16)
LOAD_FROM_TRANSPOSED(19, 29, 3)
DO_BUTTERFLY_STD(cospi_3_64, cospi_29_64, &q4s16, &q6s16)
// part of stage 2
q14s16 = vsubq_s16(q4s16, q5s16);
q5s16 = vaddq_s16(q4s16, q5s16);
q13s16 = vsubq_s16(q6s16, q7s16);
q6s16 = vaddq_s16(q6s16, q7s16);
// part of stage 3
DO_BUTTERFLY_STD(-cospi_20_64, -cospi_12_64, &q4s16, &q7s16)
// part of stage 4
q10s16 = vaddq_s16(q7s16, q1s16);
q11s16 = vaddq_s16(q5s16, q0s16);
q12s16 = vaddq_s16(q6s16, q2s16);
q15s16 = vaddq_s16(q4s16, q3s16);
// part of stage 6
LOAD_FROM_OUTPUT(28, 16, 17, q14s16, q13s16)
q8s16 = vaddq_s16(q14s16, q11s16);
q9s16 = vaddq_s16(q13s16, q10s16);
q13s16 = vsubq_s16(q13s16, q10s16);
q11s16 = vsubq_s16(q14s16, q11s16);
STORE_IN_OUTPUT(17, 17, 16, q9s16, q8s16)
LOAD_FROM_OUTPUT(16, 30, 31, q14s16, q9s16)
q8s16 = vsubq_s16(q9s16, q12s16);
q10s16 = vaddq_s16(q14s16, q15s16);
q14s16 = vsubq_s16(q14s16, q15s16);
q12s16 = vaddq_s16(q9s16, q12s16);
STORE_IN_OUTPUT(31, 30, 31, q10s16, q12s16)
// part of stage 7
DO_BUTTERFLY_STD(cospi_16_64, cospi_16_64, &q13s16, &q14s16)
STORE_IN_OUTPUT(31, 25, 22, q14s16, q13s16)
q13s16 = q11s16;
q14s16 = q8s16;
DO_BUTTERFLY_STD(cospi_16_64, cospi_16_64, &q13s16, &q14s16)
STORE_IN_OUTPUT(22, 24, 23, q14s16, q13s16)
// part of stage 4
q14s16 = vsubq_s16(q5s16, q0s16);
q13s16 = vsubq_s16(q6s16, q2s16);
DO_BUTTERFLY_STD(-cospi_8_64, -cospi_24_64, &q5s16, &q6s16);
q14s16 = vsubq_s16(q7s16, q1s16);
q13s16 = vsubq_s16(q4s16, q3s16);
DO_BUTTERFLY_STD(-cospi_8_64, -cospi_24_64, &q0s16, &q1s16);
// part of stage 6
LOAD_FROM_OUTPUT(23, 18, 19, q14s16, q13s16)
q8s16 = vaddq_s16(q14s16, q1s16);
q9s16 = vaddq_s16(q13s16, q6s16);
q13s16 = vsubq_s16(q13s16, q6s16);
q1s16 = vsubq_s16(q14s16, q1s16);
STORE_IN_OUTPUT(19, 18, 19, q8s16, q9s16)
LOAD_FROM_OUTPUT(19, 28, 29, q8s16, q9s16)
q14s16 = vsubq_s16(q8s16, q5s16);
q10s16 = vaddq_s16(q8s16, q5s16);
q11s16 = vaddq_s16(q9s16, q0s16);
q0s16 = vsubq_s16(q9s16, q0s16);
STORE_IN_OUTPUT(29, 28, 29, q10s16, q11s16)
// part of stage 7
DO_BUTTERFLY_STD(cospi_16_64, cospi_16_64, &q13s16, &q14s16)
STORE_IN_OUTPUT(29, 20, 27, q13s16, q14s16)
DO_BUTTERFLY(q0s16, q1s16, cospi_16_64, cospi_16_64, &q1s16, &q0s16);
STORE_IN_OUTPUT(27, 21, 26, q1s16, q0s16)
// -----------------------------------------
// BLOCK C: 8-11,12-15
// -----------------------------------------
// generate 8,9,14,15
// part of stage 2
LOAD_FROM_TRANSPOSED(3, 2, 30)
DO_BUTTERFLY_STD(cospi_30_64, cospi_2_64, &q0s16, &q2s16)
LOAD_FROM_TRANSPOSED(30, 18, 14)
DO_BUTTERFLY_STD(cospi_14_64, cospi_18_64, &q1s16, &q3s16)
// part of stage 3
q13s16 = vsubq_s16(q0s16, q1s16);
q0s16 = vaddq_s16(q0s16, q1s16);
q14s16 = vsubq_s16(q2s16, q3s16);
q2s16 = vaddq_s16(q2s16, q3s16);
// part of stage 4
DO_BUTTERFLY_STD(cospi_24_64, cospi_8_64, &q1s16, &q3s16)
// generate 10,11,12,13
// part of stage 2
LOAD_FROM_TRANSPOSED(14, 10, 22)
DO_BUTTERFLY_STD(cospi_22_64, cospi_10_64, &q5s16, &q7s16)
LOAD_FROM_TRANSPOSED(22, 26, 6)
DO_BUTTERFLY_STD(cospi_6_64, cospi_26_64, &q4s16, &q6s16)
// part of stage 3
q14s16 = vsubq_s16(q4s16, q5s16);
q5s16 = vaddq_s16(q4s16, q5s16);
q13s16 = vsubq_s16(q6s16, q7s16);
q6s16 = vaddq_s16(q6s16, q7s16);
// part of stage 4
DO_BUTTERFLY_STD(-cospi_8_64, -cospi_24_64, &q4s16, &q7s16)
// part of stage 5
q8s16 = vaddq_s16(q0s16, q5s16);
q9s16 = vaddq_s16(q1s16, q7s16);
q13s16 = vsubq_s16(q1s16, q7s16);
q14s16 = vsubq_s16(q3s16, q4s16);
q10s16 = vaddq_s16(q3s16, q4s16);
q15s16 = vaddq_s16(q2s16, q6s16);
STORE_IN_OUTPUT(26, 8, 15, q8s16, q15s16)
STORE_IN_OUTPUT(15, 9, 14, q9s16, q10s16)
// part of stage 6
DO_BUTTERFLY_STD(cospi_16_64, cospi_16_64, &q1s16, &q3s16)
STORE_IN_OUTPUT(14, 13, 10, q3s16, q1s16)
q13s16 = vsubq_s16(q0s16, q5s16);
q14s16 = vsubq_s16(q2s16, q6s16);
DO_BUTTERFLY_STD(cospi_16_64, cospi_16_64, &q1s16, &q3s16)
STORE_IN_OUTPUT(10, 11, 12, q1s16, q3s16)
// -----------------------------------------
// BLOCK D: 0-3,4-7
// -----------------------------------------
// generate 4,5,6,7
// part of stage 3
LOAD_FROM_TRANSPOSED(6, 4, 28)
DO_BUTTERFLY_STD(cospi_28_64, cospi_4_64, &q0s16, &q2s16)
LOAD_FROM_TRANSPOSED(28, 20, 12)
DO_BUTTERFLY_STD(cospi_12_64, cospi_20_64, &q1s16, &q3s16)
// part of stage 4
q13s16 = vsubq_s16(q0s16, q1s16);
q0s16 = vaddq_s16(q0s16, q1s16);
q14s16 = vsubq_s16(q2s16, q3s16);
q2s16 = vaddq_s16(q2s16, q3s16);
// part of stage 5
DO_BUTTERFLY_STD(cospi_16_64, cospi_16_64, &q1s16, &q3s16)
// generate 0,1,2,3
// part of stage 4
LOAD_FROM_TRANSPOSED(12, 0, 16)
DO_BUTTERFLY_STD(cospi_16_64, cospi_16_64, &q5s16, &q7s16)
LOAD_FROM_TRANSPOSED(16, 8, 24)
DO_BUTTERFLY_STD(cospi_24_64, cospi_8_64, &q14s16, &q6s16)
// part of stage 5
q4s16 = vaddq_s16(q7s16, q6s16);
q7s16 = vsubq_s16(q7s16, q6s16);
q6s16 = vsubq_s16(q5s16, q14s16);
q5s16 = vaddq_s16(q5s16, q14s16);
// part of stage 6
q8s16 = vaddq_s16(q4s16, q2s16);
q9s16 = vaddq_s16(q5s16, q3s16);
q10s16 = vaddq_s16(q6s16, q1s16);
q11s16 = vaddq_s16(q7s16, q0s16);
q12s16 = vsubq_s16(q7s16, q0s16);
q13s16 = vsubq_s16(q6s16, q1s16);
q14s16 = vsubq_s16(q5s16, q3s16);
q15s16 = vsubq_s16(q4s16, q2s16);
// part of stage 7
LOAD_FROM_OUTPUT(12, 14, 15, q0s16, q1s16)
q2s16 = vaddq_s16(q8s16, q1s16);
q3s16 = vaddq_s16(q9s16, q0s16);
q4s16 = vsubq_s16(q9s16, q0s16);
q5s16 = vsubq_s16(q8s16, q1s16);
LOAD_FROM_OUTPUT(15, 16, 17, q0s16, q1s16)
q8s16 = vaddq_s16(q4s16, q1s16);
q9s16 = vaddq_s16(q5s16, q0s16);
q6s16 = vsubq_s16(q5s16, q0s16);
q7s16 = vsubq_s16(q4s16, q1s16);
if (idct32_pass_loop == 0) {
idct32_bands_end_1st_pass(out, q2s16, q3s16, q6s16, q7s16, q8s16, q9s16,
q10s16, q11s16, q12s16, q13s16, q14s16,
q15s16);
} else {
idct32_bands_end_2nd_pass(out, dest, stride, q2s16, q3s16, q6s16, q7s16,
q8s16, q9s16, q10s16, q11s16, q12s16, q13s16,
q14s16, q15s16);
dest += 8;
}
}
}
return;
}

@@ -1,28 +1,31 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license and patent
; grant that can be found in the LICENSE file in the root of the source
; tree. All contributing project authors may be found in the AUTHORS
; file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
EXPORT |vpx_idct4x4_1_add_neon|
EXPORT |aom_idct4x4_1_add_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
;void vpx_idct4x4_1_add_neon(int16_t *input, uint8_t *dest,
;void aom_idct4x4_1_add_neon(int16_t *input, uint8_t *dest,
; int dest_stride)
;
; r0 int16_t *input
; r1 uint8_t *dest
; r2 int dest_stride
|vpx_idct4x4_1_add_neon| PROC
|aom_idct4x4_1_add_neon| PROC
ldrsh r0, [r0]
; generate cospi_16_64 = 11585
@@ -63,6 +66,6 @@
vst1.32 {d7[1]}, [r12]
bx lr
ENDP ; |vpx_idct4x4_1_add_neon|
ENDP ; |aom_idct4x4_1_add_neon|
END

@@ -0,0 +1,47 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "aom_dsp/inv_txfm.h"
#include "aom_ports/mem.h"
void aom_idct4x4_1_add_neon(int16_t *input, uint8_t *dest, int dest_stride) {
uint8x8_t d6u8;
uint32x2_t d2u32 = vdup_n_u32(0);
uint16x8_t q8u16;
int16x8_t q0s16;
uint8_t *d1, *d2;
int16_t i, a1;
int16_t out = dct_const_round_shift(input[0] * cospi_16_64);
out = dct_const_round_shift(out * cospi_16_64);
a1 = ROUND_POWER_OF_TWO(out, 4);
q0s16 = vdupq_n_s16(a1);
// dc_only_idct_add
d1 = d2 = dest;
for (i = 0; i < 2; i++) {
d2u32 = vld1_lane_u32((const uint32_t *)d1, d2u32, 0);
d1 += dest_stride;
d2u32 = vld1_lane_u32((const uint32_t *)d1, d2u32, 1);
d1 += dest_stride;
q8u16 = vaddw_u8(vreinterpretq_u16_s16(q0s16), vreinterpret_u8_u32(d2u32));
d6u8 = vqmovun_s16(vreinterpretq_s16_u16(q8u16));
vst1_lane_u32((uint32_t *)d2, vreinterpret_u32_u8(d6u8), 0);
d2 += dest_stride;
vst1_lane_u32((uint32_t *)d2, vreinterpret_u32_u8(d6u8), 1);
d2 += dest_stride;
}
return;
}
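
The 4x4 DC path above adds a signed offset to unsigned pixels by reinterpreting the offset as uint16 lanes, widening the pixels with vaddw_u8, and narrowing the sum with vqmovun_s16. The following scalar model of that sequence is a sketch only; add_and_saturate and add_offset_row are hypothetical names that do not appear in the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one lane of vaddw_u8 followed by vqmovun_s16. */
static uint8_t add_and_saturate(int16_t delta, uint8_t pixel) {
  /* vaddw_u8 on the reinterpreted offset: addition modulo 2^16. */
  const uint16_t wide = (uint16_t)((uint16_t)delta + pixel);
  /* Read the 16-bit pattern back as a signed value without relying on an
   * implementation-defined narrowing conversion. */
  const int32_t s = wide >= 0x8000u ? (int32_t)wide - 0x10000 : (int32_t)wide;
  /* vqmovun_s16: saturate the signed value into [0, 255]. */
  return (uint8_t)(s < 0 ? 0 : s > 255 ? 255 : s);
}

/* Apply the same offset to n consecutive pixels of one row. */
static void add_offset_row(int16_t delta, uint8_t *row, size_t n) {
  assert(row != NULL);
  for (size_t i = 0; i < n; ++i) row[i] = add_and_saturate(delta, row[i]);
}
```

Because the wrap-around sum is read back as signed before narrowing, a negative offset should behave like a subtraction, which is why this path needs no sign branch, unlike the 32x32 DC-only function.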

@@ -1,14 +1,17 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
EXPORT |vpx_idct4x4_16_add_neon|
;
EXPORT |aom_idct4x4_16_add_neon|
ARM
REQUIRE8
PRESERVE8
@@ -16,13 +19,13 @@
AREA ||.text||, CODE, READONLY, ALIGN=2
AREA Block, CODE, READONLY ; name this block of code
;void vpx_idct4x4_16_add_neon(int16_t *input, uint8_t *dest, int dest_stride)
;void aom_idct4x4_16_add_neon(int16_t *input, uint8_t *dest, int dest_stride)
;
; r0 int16_t *input
; r1 uint8_t *dest
; r2 int dest_stride
|vpx_idct4x4_16_add_neon| PROC
|aom_idct4x4_16_add_neon| PROC
; The 2D transform is done with two passes which are actually pretty
; similar. We first transform the rows. This is done by transposing
@@ -185,6 +188,6 @@
vst1.32 {d26[1]}, [r1], r2
vst1.32 {d26[0]}, [r1] ; no post-increment
bx lr
ENDP ; |vpx_idct4x4_16_add_neon|
ENDP ; |aom_idct4x4_16_add_neon|
END

@@ -0,0 +1,146 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "aom_dsp/txfm_common.h"
void aom_idct4x4_16_add_neon(int16_t *input, uint8_t *dest, int dest_stride) {
uint8x8_t d26u8, d27u8;
uint32x2_t d26u32, d27u32;
uint16x8_t q8u16, q9u16;
int16x4_t d16s16, d17s16, d18s16, d19s16, d20s16, d21s16;
int16x4_t d22s16, d23s16, d24s16, d26s16, d27s16, d28s16, d29s16;
int16x8_t q8s16, q9s16, q13s16, q14s16;
int32x4_t q1s32, q13s32, q14s32, q15s32;
int16x4x2_t d0x2s16, d1x2s16;
int32x4x2_t q0x2s32;
uint8_t *d;
d26u32 = d27u32 = vdup_n_u32(0);
q8s16 = vld1q_s16(input);
q9s16 = vld1q_s16(input + 8);
d16s16 = vget_low_s16(q8s16);
d17s16 = vget_high_s16(q8s16);
d18s16 = vget_low_s16(q9s16);
d19s16 = vget_high_s16(q9s16);
d0x2s16 = vtrn_s16(d16s16, d17s16);
d1x2s16 = vtrn_s16(d18s16, d19s16);
q8s16 = vcombine_s16(d0x2s16.val[0], d0x2s16.val[1]);
q9s16 = vcombine_s16(d1x2s16.val[0], d1x2s16.val[1]);
d20s16 = vdup_n_s16((int16_t)cospi_8_64);
d21s16 = vdup_n_s16((int16_t)cospi_16_64);
q0x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(q8s16), vreinterpretq_s32_s16(q9s16));
d16s16 = vget_low_s16(vreinterpretq_s16_s32(q0x2s32.val[0]));
d17s16 = vget_high_s16(vreinterpretq_s16_s32(q0x2s32.val[0]));
d18s16 = vget_low_s16(vreinterpretq_s16_s32(q0x2s32.val[1]));
d19s16 = vget_high_s16(vreinterpretq_s16_s32(q0x2s32.val[1]));
d22s16 = vdup_n_s16((int16_t)cospi_24_64);
// stage 1
d23s16 = vadd_s16(d16s16, d18s16);
d24s16 = vsub_s16(d16s16, d18s16);
q15s32 = vmull_s16(d17s16, d22s16);
q1s32 = vmull_s16(d17s16, d20s16);
q13s32 = vmull_s16(d23s16, d21s16);
q14s32 = vmull_s16(d24s16, d21s16);
q15s32 = vmlsl_s16(q15s32, d19s16, d20s16);
q1s32 = vmlal_s16(q1s32, d19s16, d22s16);
d26s16 = vqrshrn_n_s32(q13s32, 14);
d27s16 = vqrshrn_n_s32(q14s32, 14);
d29s16 = vqrshrn_n_s32(q15s32, 14);
d28s16 = vqrshrn_n_s32(q1s32, 14);
q13s16 = vcombine_s16(d26s16, d27s16);
q14s16 = vcombine_s16(d28s16, d29s16);
// stage 2
q8s16 = vaddq_s16(q13s16, q14s16);
q9s16 = vsubq_s16(q13s16, q14s16);
d16s16 = vget_low_s16(q8s16);
d17s16 = vget_high_s16(q8s16);
d18s16 = vget_high_s16(q9s16); // vswp d18 d19
d19s16 = vget_low_s16(q9s16);
d0x2s16 = vtrn_s16(d16s16, d17s16);
d1x2s16 = vtrn_s16(d18s16, d19s16);
q8s16 = vcombine_s16(d0x2s16.val[0], d0x2s16.val[1]);
q9s16 = vcombine_s16(d1x2s16.val[0], d1x2s16.val[1]);
q0x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(q8s16), vreinterpretq_s32_s16(q9s16));
d16s16 = vget_low_s16(vreinterpretq_s16_s32(q0x2s32.val[0]));
d17s16 = vget_high_s16(vreinterpretq_s16_s32(q0x2s32.val[0]));
d18s16 = vget_low_s16(vreinterpretq_s16_s32(q0x2s32.val[1]));
d19s16 = vget_high_s16(vreinterpretq_s16_s32(q0x2s32.val[1]));
// do the transform on columns
// stage 1
d23s16 = vadd_s16(d16s16, d18s16);
d24s16 = vsub_s16(d16s16, d18s16);
q15s32 = vmull_s16(d17s16, d22s16);
q1s32 = vmull_s16(d17s16, d20s16);
q13s32 = vmull_s16(d23s16, d21s16);
q14s32 = vmull_s16(d24s16, d21s16);
q15s32 = vmlsl_s16(q15s32, d19s16, d20s16);
q1s32 = vmlal_s16(q1s32, d19s16, d22s16);
d26s16 = vqrshrn_n_s32(q13s32, 14);
d27s16 = vqrshrn_n_s32(q14s32, 14);
d29s16 = vqrshrn_n_s32(q15s32, 14);
d28s16 = vqrshrn_n_s32(q1s32, 14);
q13s16 = vcombine_s16(d26s16, d27s16);
q14s16 = vcombine_s16(d28s16, d29s16);
// stage 2
q8s16 = vaddq_s16(q13s16, q14s16);
q9s16 = vsubq_s16(q13s16, q14s16);
q8s16 = vrshrq_n_s16(q8s16, 4);
q9s16 = vrshrq_n_s16(q9s16, 4);
d = dest;
d26u32 = vld1_lane_u32((const uint32_t *)d, d26u32, 0);
d += dest_stride;
d26u32 = vld1_lane_u32((const uint32_t *)d, d26u32, 1);
d += dest_stride;
d27u32 = vld1_lane_u32((const uint32_t *)d, d27u32, 1);
d += dest_stride;
d27u32 = vld1_lane_u32((const uint32_t *)d, d27u32, 0);
q8u16 = vaddw_u8(vreinterpretq_u16_s16(q8s16), vreinterpret_u8_u32(d26u32));
q9u16 = vaddw_u8(vreinterpretq_u16_s16(q9s16), vreinterpret_u8_u32(d27u32));
d26u8 = vqmovun_s16(vreinterpretq_s16_u16(q8u16));
d27u8 = vqmovun_s16(vreinterpretq_s16_u16(q9u16));
d = dest;
vst1_lane_u32((uint32_t *)d, vreinterpret_u32_u8(d26u8), 0);
d += dest_stride;
vst1_lane_u32((uint32_t *)d, vreinterpret_u32_u8(d26u8), 1);
d += dest_stride;
vst1_lane_u32((uint32_t *)d, vreinterpret_u32_u8(d27u8), 1);
d += dest_stride;
vst1_lane_u32((uint32_t *)d, vreinterpret_u32_u8(d27u8), 0);
return;
}
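
Note on the file above: each of the two passes in aom_idct4x4_16_add_neon applies the same 4-point butterfly ("stage 1" and "stage 2") to four lanes at once. A minimal scalar sketch of one such 1-D pass follows, for reading alongside the intrinsics. The helper names `round_shift14` and `idct4_scalar` are hypothetical, and the constant values for cospi_8_64 and cospi_24_64 are assumed from txfm_common.h (only cospi_16_64 = 11585 appears in this diff). The sketch uses 32-bit intermediates, so it does not model the 16-bit wraparound of `vadd_s16` or the saturation of `vqrshrn_n_s32`.

```c
#include <assert.h>
#include <stdint.h>

/* 14-bit fixed-point cosine constants (assumed values, see note above). */
enum { kCospi8 = 15137, kCospi16 = 11585, kCospi24 = 6270, kDctConstBits = 14 };

/* Round to nearest and drop the fractional bits.
 * Assumes >> on negative values is an arithmetic shift. */
static int32_t round_shift14(int32_t x) {
  return (x + (1 << (kDctConstBits - 1))) >> kDctConstBits;
}

/* One 1-D 4-point inverse DCT pass over a single row or column. */
static void idct4_scalar(const int16_t in[4], int16_t out[4]) {
  assert(in != out); /* the sketch does not support in-place operation */

  /* stage 1: even half uses cospi_16_64, odd half rotates by 8/24 */
  const int32_t s0 = round_shift14((in[0] + in[2]) * kCospi16);
  const int32_t s1 = round_shift14((in[0] - in[2]) * kCospi16);
  const int32_t s2 = round_shift14(in[1] * kCospi24 - in[3] * kCospi8);
  const int32_t s3 = round_shift14(in[1] * kCospi8 + in[3] * kCospi24);

  /* stage 2: final butterfly */
  out[0] = (int16_t)(s0 + s3);
  out[1] = (int16_t)(s1 + s2);
  out[2] = (int16_t)(s1 - s2);
  out[3] = (int16_t)(s0 - s3);
}
```

In the NEON version the row pass and the column pass are separated by a transpose, and the column pass is followed by a rounding shift by 4 before the result is added to the destination pixels.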


@ -1,28 +1,31 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license and patent
; grant that can be found in the LICENSE file in the root of the source
; tree. All contributing project authors may be found in the AUTHORS
; file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
EXPORT |vpx_idct8x8_1_add_neon|
EXPORT |aom_idct8x8_1_add_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
;void vpx_idct8x8_1_add_neon(int16_t *input, uint8_t *dest,
;void aom_idct8x8_1_add_neon(int16_t *input, uint8_t *dest,
; int dest_stride)
;
; r0 int16_t input
; r1 uint8_t *dest
; r2 int dest_stride)
|vpx_idct8x8_1_add_neon| PROC
|aom_idct8x8_1_add_neon| PROC
ldrsh r0, [r0]
; generate cospi_16_64 = 11585
@ -83,6 +86,6 @@
vst1.64 {d31}, [r12], r2
bx lr
ENDP ; |vpx_idct8x8_1_add_neon|
ENDP ; |aom_idct8x8_1_add_neon|
END


@ -0,0 +1,62 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "aom_dsp/inv_txfm.h"
#include "aom_ports/mem.h"
void aom_idct8x8_1_add_neon(int16_t *input, uint8_t *dest, int dest_stride) {
uint8x8_t d2u8, d3u8, d30u8, d31u8;
uint64x1_t d2u64, d3u64, d4u64, d5u64;
uint16x8_t q0u16, q9u16, q10u16, q11u16, q12u16;
int16x8_t q0s16;
uint8_t *d1, *d2;
int16_t i, a1;
int16_t out = dct_const_round_shift(input[0] * cospi_16_64);
out = dct_const_round_shift(out * cospi_16_64);
a1 = ROUND_POWER_OF_TWO(out, 5);
q0s16 = vdupq_n_s16(a1);
q0u16 = vreinterpretq_u16_s16(q0s16);
d1 = d2 = dest;
for (i = 0; i < 2; i++) {
d2u64 = vld1_u64((const uint64_t *)d1);
d1 += dest_stride;
d3u64 = vld1_u64((const uint64_t *)d1);
d1 += dest_stride;
d4u64 = vld1_u64((const uint64_t *)d1);
d1 += dest_stride;
d5u64 = vld1_u64((const uint64_t *)d1);
d1 += dest_stride;
q9u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d2u64));
q10u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d3u64));
q11u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d4u64));
q12u16 = vaddw_u8(q0u16, vreinterpret_u8_u64(d5u64));
d2u8 = vqmovun_s16(vreinterpretq_s16_u16(q9u16));
d3u8 = vqmovun_s16(vreinterpretq_s16_u16(q10u16));
d30u8 = vqmovun_s16(vreinterpretq_s16_u16(q11u16));
d31u8 = vqmovun_s16(vreinterpretq_s16_u16(q12u16));
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d2u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d3u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d30u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d31u8));
d2 += dest_stride;
}
return;
}
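
Note on the file above: aom_idct8x8_1_add_neon handles the DC-only case, where a single scaled value is added to all 64 destination pixels with clipping to 8 bits. The scalar sketch below shows the same arithmetic without intrinsics. The names `round_shift`, `clip_u8` and `idct8x8_dc_add_scalar` are hypothetical; the constant 11585 is cospi_16_64 as stated in the assembly comment, and the 14-bit shift is assumed to be what `dct_const_round_shift` performs.

```c
#include <assert.h>
#include <stdint.h>

/* Round to nearest, then shift right by `bits`.
 * Assumes >> on negative values is an arithmetic shift. */
static int32_t round_shift(int32_t x, int bits) {
  return (x + (1 << (bits - 1))) >> bits;
}

/* Clamp an integer to the 0..255 pixel range (what vqmovun_s16 does per lane). */
static uint8_t clip_u8(int v) {
  if (v < 0) return 0;
  if (v > 255) return 255;
  return (uint8_t)v;
}

/* DC-only 8x8 inverse transform: scale input[0] twice by cospi_16_64,
 * round by 5 bits, and add the result to every pixel of the 8x8 block. */
static void idct8x8_dc_add_scalar(const int16_t *input, uint8_t *dest,
                                  int dest_stride) {
  assert(dest_stride >= 8);
  int32_t out = round_shift(input[0] * 11585, 14);
  out = round_shift(out * 11585, 14);
  const int a1 = round_shift(out, 5);

  for (int r = 0; r < 8; ++r) {
    for (int c = 0; c < 8; ++c) {
      uint8_t *p = dest + r * dest_stride + c;
      *p = clip_u8(*p + a1);
    }
  }
}
```

The NEON version processes four rows per loop iteration, which is why its loop runs only twice.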


@ -1,15 +1,18 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
EXPORT |vpx_idct8x8_64_add_neon|
EXPORT |vpx_idct8x8_12_add_neon|
;
EXPORT |aom_idct8x8_64_add_neon|
EXPORT |aom_idct8x8_12_add_neon|
ARM
REQUIRE8
PRESERVE8
@ -198,13 +201,13 @@
MEND
AREA Block, CODE, READONLY ; name this block of code
;void vpx_idct8x8_64_add_neon(int16_t *input, uint8_t *dest, int dest_stride)
;void aom_idct8x8_64_add_neon(int16_t *input, uint8_t *dest, int dest_stride)
;
; r0 int16_t input
; r1 uint8_t *dest
; r2 int dest_stride)
|vpx_idct8x8_64_add_neon| PROC
|aom_idct8x8_64_add_neon| PROC
push {r4-r9}
vpush {d8-d15}
vld1.s16 {q8,q9}, [r0]!
@ -308,15 +311,15 @@
vpop {d8-d15}
pop {r4-r9}
bx lr
ENDP ; |vpx_idct8x8_64_add_neon|
ENDP ; |aom_idct8x8_64_add_neon|
;void vpx_idct8x8_12_add_neon(int16_t *input, uint8_t *dest, int dest_stride)
;void aom_idct8x8_12_add_neon(int16_t *input, uint8_t *dest, int dest_stride)
;
; r0 int16_t input
; r1 uint8_t *dest
; r2 int dest_stride)
|vpx_idct8x8_12_add_neon| PROC
|aom_idct8x8_12_add_neon| PROC
push {r4-r9}
vpush {d8-d15}
vld1.s16 {q8,q9}, [r0]!
@ -514,6 +517,6 @@
vpop {d8-d15}
pop {r4-r9}
bx lr
ENDP ; |vpx_idct8x8_12_add_neon|
ENDP ; |aom_idct8x8_12_add_neon|
END


@ -0,0 +1,509 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_config.h"
#include "aom_dsp/txfm_common.h"
static INLINE void TRANSPOSE8X8(int16x8_t *q8s16, int16x8_t *q9s16,
int16x8_t *q10s16, int16x8_t *q11s16,
int16x8_t *q12s16, int16x8_t *q13s16,
int16x8_t *q14s16, int16x8_t *q15s16) {
int16x4_t d16s16, d17s16, d18s16, d19s16, d20s16, d21s16, d22s16, d23s16;
int16x4_t d24s16, d25s16, d26s16, d27s16, d28s16, d29s16, d30s16, d31s16;
int32x4x2_t q0x2s32, q1x2s32, q2x2s32, q3x2s32;
int16x8x2_t q0x2s16, q1x2s16, q2x2s16, q3x2s16;
d16s16 = vget_low_s16(*q8s16);
d17s16 = vget_high_s16(*q8s16);
d18s16 = vget_low_s16(*q9s16);
d19s16 = vget_high_s16(*q9s16);
d20s16 = vget_low_s16(*q10s16);
d21s16 = vget_high_s16(*q10s16);
d22s16 = vget_low_s16(*q11s16);
d23s16 = vget_high_s16(*q11s16);
d24s16 = vget_low_s16(*q12s16);
d25s16 = vget_high_s16(*q12s16);
d26s16 = vget_low_s16(*q13s16);
d27s16 = vget_high_s16(*q13s16);
d28s16 = vget_low_s16(*q14s16);
d29s16 = vget_high_s16(*q14s16);
d30s16 = vget_low_s16(*q15s16);
d31s16 = vget_high_s16(*q15s16);
*q8s16 = vcombine_s16(d16s16, d24s16); // vswp d17, d24
*q9s16 = vcombine_s16(d18s16, d26s16); // vswp d19, d26
*q10s16 = vcombine_s16(d20s16, d28s16); // vswp d21, d28
*q11s16 = vcombine_s16(d22s16, d30s16); // vswp d23, d30
*q12s16 = vcombine_s16(d17s16, d25s16);
*q13s16 = vcombine_s16(d19s16, d27s16);
*q14s16 = vcombine_s16(d21s16, d29s16);
*q15s16 = vcombine_s16(d23s16, d31s16);
q0x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(*q8s16), vreinterpretq_s32_s16(*q10s16));
q1x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(*q9s16), vreinterpretq_s32_s16(*q11s16));
q2x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(*q12s16), vreinterpretq_s32_s16(*q14s16));
q3x2s32 =
vtrnq_s32(vreinterpretq_s32_s16(*q13s16), vreinterpretq_s32_s16(*q15s16));
q0x2s16 = vtrnq_s16(vreinterpretq_s16_s32(q0x2s32.val[0]), // q8
vreinterpretq_s16_s32(q1x2s32.val[0])); // q9
q1x2s16 = vtrnq_s16(vreinterpretq_s16_s32(q0x2s32.val[1]), // q10
vreinterpretq_s16_s32(q1x2s32.val[1])); // q11
q2x2s16 = vtrnq_s16(vreinterpretq_s16_s32(q2x2s32.val[0]), // q12
vreinterpretq_s16_s32(q3x2s32.val[0])); // q13
q3x2s16 = vtrnq_s16(vreinterpretq_s16_s32(q2x2s32.val[1]), // q14
vreinterpretq_s16_s32(q3x2s32.val[1])); // q15
*q8s16 = q0x2s16.val[0];
*q9s16 = q0x2s16.val[1];
*q10s16 = q1x2s16.val[0];
*q11s16 = q1x2s16.val[1];
*q12s16 = q2x2s16.val[0];
*q13s16 = q2x2s16.val[1];
*q14s16 = q3x2s16.val[0];
*q15s16 = q3x2s16.val[1];
return;
}
static INLINE void IDCT8x8_1D(int16x8_t *q8s16, int16x8_t *q9s16,
int16x8_t *q10s16, int16x8_t *q11s16,
int16x8_t *q12s16, int16x8_t *q13s16,
int16x8_t *q14s16, int16x8_t *q15s16) {
int16x4_t d0s16, d1s16, d2s16, d3s16;
int16x4_t d8s16, d9s16, d10s16, d11s16, d12s16, d13s16, d14s16, d15s16;
int16x4_t d16s16, d17s16, d18s16, d19s16, d20s16, d21s16, d22s16, d23s16;
int16x4_t d24s16, d25s16, d26s16, d27s16, d28s16, d29s16, d30s16, d31s16;
int16x8_t q0s16, q1s16, q2s16, q3s16, q4s16, q5s16, q6s16, q7s16;
int32x4_t q2s32, q3s32, q5s32, q6s32, q8s32, q9s32;
int32x4_t q10s32, q11s32, q12s32, q13s32, q15s32;
d0s16 = vdup_n_s16((int16_t)cospi_28_64);
d1s16 = vdup_n_s16((int16_t)cospi_4_64);
d2s16 = vdup_n_s16((int16_t)cospi_12_64);
d3s16 = vdup_n_s16((int16_t)cospi_20_64);
d16s16 = vget_low_s16(*q8s16);
d17s16 = vget_high_s16(*q8s16);
d18s16 = vget_low_s16(*q9s16);
d19s16 = vget_high_s16(*q9s16);
d20s16 = vget_low_s16(*q10s16);
d21s16 = vget_high_s16(*q10s16);
d22s16 = vget_low_s16(*q11s16);
d23s16 = vget_high_s16(*q11s16);
d24s16 = vget_low_s16(*q12s16);
d25s16 = vget_high_s16(*q12s16);
d26s16 = vget_low_s16(*q13s16);
d27s16 = vget_high_s16(*q13s16);
d28s16 = vget_low_s16(*q14s16);
d29s16 = vget_high_s16(*q14s16);
d30s16 = vget_low_s16(*q15s16);
d31s16 = vget_high_s16(*q15s16);
q2s32 = vmull_s16(d18s16, d0s16);
q3s32 = vmull_s16(d19s16, d0s16);
q5s32 = vmull_s16(d26s16, d2s16);
q6s32 = vmull_s16(d27s16, d2s16);
q2s32 = vmlsl_s16(q2s32, d30s16, d1s16);
q3s32 = vmlsl_s16(q3s32, d31s16, d1s16);
q5s32 = vmlsl_s16(q5s32, d22s16, d3s16);
q6s32 = vmlsl_s16(q6s32, d23s16, d3s16);
d8s16 = vqrshrn_n_s32(q2s32, 14);
d9s16 = vqrshrn_n_s32(q3s32, 14);
d10s16 = vqrshrn_n_s32(q5s32, 14);
d11s16 = vqrshrn_n_s32(q6s32, 14);
q4s16 = vcombine_s16(d8s16, d9s16);
q5s16 = vcombine_s16(d10s16, d11s16);
q2s32 = vmull_s16(d18s16, d1s16);
q3s32 = vmull_s16(d19s16, d1s16);
q9s32 = vmull_s16(d26s16, d3s16);
q13s32 = vmull_s16(d27s16, d3s16);
q2s32 = vmlal_s16(q2s32, d30s16, d0s16);
q3s32 = vmlal_s16(q3s32, d31s16, d0s16);
q9s32 = vmlal_s16(q9s32, d22s16, d2s16);
q13s32 = vmlal_s16(q13s32, d23s16, d2s16);
d14s16 = vqrshrn_n_s32(q2s32, 14);
d15s16 = vqrshrn_n_s32(q3s32, 14);
d12s16 = vqrshrn_n_s32(q9s32, 14);
d13s16 = vqrshrn_n_s32(q13s32, 14);
q6s16 = vcombine_s16(d12s16, d13s16);
q7s16 = vcombine_s16(d14s16, d15s16);
d0s16 = vdup_n_s16((int16_t)cospi_16_64);
q2s32 = vmull_s16(d16s16, d0s16);
q3s32 = vmull_s16(d17s16, d0s16);
q13s32 = vmull_s16(d16s16, d0s16);
q15s32 = vmull_s16(d17s16, d0s16);
q2s32 = vmlal_s16(q2s32, d24s16, d0s16);
q3s32 = vmlal_s16(q3s32, d25s16, d0s16);
q13s32 = vmlsl_s16(q13s32, d24s16, d0s16);
q15s32 = vmlsl_s16(q15s32, d25s16, d0s16);
d0s16 = vdup_n_s16((int16_t)cospi_24_64);
d1s16 = vdup_n_s16((int16_t)cospi_8_64);
d18s16 = vqrshrn_n_s32(q2s32, 14);
d19s16 = vqrshrn_n_s32(q3s32, 14);
d22s16 = vqrshrn_n_s32(q13s32, 14);
d23s16 = vqrshrn_n_s32(q15s32, 14);
*q9s16 = vcombine_s16(d18s16, d19s16);
*q11s16 = vcombine_s16(d22s16, d23s16);
q2s32 = vmull_s16(d20s16, d0s16);
q3s32 = vmull_s16(d21s16, d0s16);
q8s32 = vmull_s16(d20s16, d1s16);
q12s32 = vmull_s16(d21s16, d1s16);
q2s32 = vmlsl_s16(q2s32, d28s16, d1s16);
q3s32 = vmlsl_s16(q3s32, d29s16, d1s16);
q8s32 = vmlal_s16(q8s32, d28s16, d0s16);
q12s32 = vmlal_s16(q12s32, d29s16, d0s16);
d26s16 = vqrshrn_n_s32(q2s32, 14);
d27s16 = vqrshrn_n_s32(q3s32, 14);
d30s16 = vqrshrn_n_s32(q8s32, 14);
d31s16 = vqrshrn_n_s32(q12s32, 14);
*q13s16 = vcombine_s16(d26s16, d27s16);
*q15s16 = vcombine_s16(d30s16, d31s16);
q0s16 = vaddq_s16(*q9s16, *q15s16);
q1s16 = vaddq_s16(*q11s16, *q13s16);
q2s16 = vsubq_s16(*q11s16, *q13s16);
q3s16 = vsubq_s16(*q9s16, *q15s16);
*q13s16 = vsubq_s16(q4s16, q5s16);
q4s16 = vaddq_s16(q4s16, q5s16);
*q14s16 = vsubq_s16(q7s16, q6s16);
q7s16 = vaddq_s16(q7s16, q6s16);
d26s16 = vget_low_s16(*q13s16);
d27s16 = vget_high_s16(*q13s16);
d28s16 = vget_low_s16(*q14s16);
d29s16 = vget_high_s16(*q14s16);
d16s16 = vdup_n_s16((int16_t)cospi_16_64);
q9s32 = vmull_s16(d28s16, d16s16);
q10s32 = vmull_s16(d29s16, d16s16);
q11s32 = vmull_s16(d28s16, d16s16);
q12s32 = vmull_s16(d29s16, d16s16);
q9s32 = vmlsl_s16(q9s32, d26s16, d16s16);
q10s32 = vmlsl_s16(q10s32, d27s16, d16s16);
q11s32 = vmlal_s16(q11s32, d26s16, d16s16);
q12s32 = vmlal_s16(q12s32, d27s16, d16s16);
d10s16 = vqrshrn_n_s32(q9s32, 14);
d11s16 = vqrshrn_n_s32(q10s32, 14);
d12s16 = vqrshrn_n_s32(q11s32, 14);
d13s16 = vqrshrn_n_s32(q12s32, 14);
q5s16 = vcombine_s16(d10s16, d11s16);
q6s16 = vcombine_s16(d12s16, d13s16);
*q8s16 = vaddq_s16(q0s16, q7s16);
*q9s16 = vaddq_s16(q1s16, q6s16);
*q10s16 = vaddq_s16(q2s16, q5s16);
*q11s16 = vaddq_s16(q3s16, q4s16);
*q12s16 = vsubq_s16(q3s16, q4s16);
*q13s16 = vsubq_s16(q2s16, q5s16);
*q14s16 = vsubq_s16(q1s16, q6s16);
*q15s16 = vsubq_s16(q0s16, q7s16);
return;
}
void aom_idct8x8_64_add_neon(int16_t *input, uint8_t *dest, int dest_stride) {
uint8_t *d1, *d2;
uint8x8_t d0u8, d1u8, d2u8, d3u8;
uint64x1_t d0u64, d1u64, d2u64, d3u64;
int16x8_t q8s16, q9s16, q10s16, q11s16, q12s16, q13s16, q14s16, q15s16;
uint16x8_t q8u16, q9u16, q10u16, q11u16;
q8s16 = vld1q_s16(input);
q9s16 = vld1q_s16(input + 8);
q10s16 = vld1q_s16(input + 16);
q11s16 = vld1q_s16(input + 24);
q12s16 = vld1q_s16(input + 32);
q13s16 = vld1q_s16(input + 40);
q14s16 = vld1q_s16(input + 48);
q15s16 = vld1q_s16(input + 56);
TRANSPOSE8X8(&q8s16, &q9s16, &q10s16, &q11s16, &q12s16, &q13s16, &q14s16,
&q15s16);
IDCT8x8_1D(&q8s16, &q9s16, &q10s16, &q11s16, &q12s16, &q13s16, &q14s16,
&q15s16);
TRANSPOSE8X8(&q8s16, &q9s16, &q10s16, &q11s16, &q12s16, &q13s16, &q14s16,
&q15s16);
IDCT8x8_1D(&q8s16, &q9s16, &q10s16, &q11s16, &q12s16, &q13s16, &q14s16,
&q15s16);
q8s16 = vrshrq_n_s16(q8s16, 5);
q9s16 = vrshrq_n_s16(q9s16, 5);
q10s16 = vrshrq_n_s16(q10s16, 5);
q11s16 = vrshrq_n_s16(q11s16, 5);
q12s16 = vrshrq_n_s16(q12s16, 5);
q13s16 = vrshrq_n_s16(q13s16, 5);
q14s16 = vrshrq_n_s16(q14s16, 5);
q15s16 = vrshrq_n_s16(q15s16, 5);
d1 = d2 = dest;
d0u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d1u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d2u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d3u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
q8u16 = vaddw_u8(vreinterpretq_u16_s16(q8s16), vreinterpret_u8_u64(d0u64));
q9u16 = vaddw_u8(vreinterpretq_u16_s16(q9s16), vreinterpret_u8_u64(d1u64));
q10u16 = vaddw_u8(vreinterpretq_u16_s16(q10s16), vreinterpret_u8_u64(d2u64));
q11u16 = vaddw_u8(vreinterpretq_u16_s16(q11s16), vreinterpret_u8_u64(d3u64));
d0u8 = vqmovun_s16(vreinterpretq_s16_u16(q8u16));
d1u8 = vqmovun_s16(vreinterpretq_s16_u16(q9u16));
d2u8 = vqmovun_s16(vreinterpretq_s16_u16(q10u16));
d3u8 = vqmovun_s16(vreinterpretq_s16_u16(q11u16));
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d0u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d1u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d2u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d3u8));
d2 += dest_stride;
q8s16 = q12s16;
q9s16 = q13s16;
q10s16 = q14s16;
q11s16 = q15s16;
d0u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d1u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d2u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d3u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
q8u16 = vaddw_u8(vreinterpretq_u16_s16(q8s16), vreinterpret_u8_u64(d0u64));
q9u16 = vaddw_u8(vreinterpretq_u16_s16(q9s16), vreinterpret_u8_u64(d1u64));
q10u16 = vaddw_u8(vreinterpretq_u16_s16(q10s16), vreinterpret_u8_u64(d2u64));
q11u16 = vaddw_u8(vreinterpretq_u16_s16(q11s16), vreinterpret_u8_u64(d3u64));
d0u8 = vqmovun_s16(vreinterpretq_s16_u16(q8u16));
d1u8 = vqmovun_s16(vreinterpretq_s16_u16(q9u16));
d2u8 = vqmovun_s16(vreinterpretq_s16_u16(q10u16));
d3u8 = vqmovun_s16(vreinterpretq_s16_u16(q11u16));
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d0u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d1u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d2u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d3u8));
d2 += dest_stride;
return;
}
void aom_idct8x8_12_add_neon(int16_t *input, uint8_t *dest, int dest_stride) {
uint8_t *d1, *d2;
uint8x8_t d0u8, d1u8, d2u8, d3u8;
int16x4_t d10s16, d11s16, d12s16, d13s16, d16s16;
int16x4_t d26s16, d27s16, d28s16, d29s16;
uint64x1_t d0u64, d1u64, d2u64, d3u64;
int16x8_t q0s16, q1s16, q2s16, q3s16, q4s16, q5s16, q6s16, q7s16;
int16x8_t q8s16, q9s16, q10s16, q11s16, q12s16, q13s16, q14s16, q15s16;
uint16x8_t q8u16, q9u16, q10u16, q11u16;
int32x4_t q9s32, q10s32, q11s32, q12s32;
q8s16 = vld1q_s16(input);
q9s16 = vld1q_s16(input + 8);
q10s16 = vld1q_s16(input + 16);
q11s16 = vld1q_s16(input + 24);
q12s16 = vld1q_s16(input + 32);
q13s16 = vld1q_s16(input + 40);
q14s16 = vld1q_s16(input + 48);
q15s16 = vld1q_s16(input + 56);
TRANSPOSE8X8(&q8s16, &q9s16, &q10s16, &q11s16, &q12s16, &q13s16, &q14s16,
&q15s16);
// First transform rows
// stage 1
q0s16 = vdupq_n_s16((int16_t)cospi_28_64 * 2);
q1s16 = vdupq_n_s16((int16_t)cospi_4_64 * 2);
q4s16 = vqrdmulhq_s16(q9s16, q0s16);
q0s16 = vdupq_n_s16(-(int16_t)cospi_20_64 * 2);
q7s16 = vqrdmulhq_s16(q9s16, q1s16);
q1s16 = vdupq_n_s16((int16_t)cospi_12_64 * 2);
q5s16 = vqrdmulhq_s16(q11s16, q0s16);
q0s16 = vdupq_n_s16((int16_t)cospi_16_64 * 2);
q6s16 = vqrdmulhq_s16(q11s16, q1s16);
// stage 2 & stage 3 - even half
q1s16 = vdupq_n_s16((int16_t)cospi_24_64 * 2);
q9s16 = vqrdmulhq_s16(q8s16, q0s16);
q0s16 = vdupq_n_s16((int16_t)cospi_8_64 * 2);
q13s16 = vqrdmulhq_s16(q10s16, q1s16);
q15s16 = vqrdmulhq_s16(q10s16, q0s16);
// stage 3 -odd half
q0s16 = vaddq_s16(q9s16, q15s16);
q1s16 = vaddq_s16(q9s16, q13s16);
q2s16 = vsubq_s16(q9s16, q13s16);
q3s16 = vsubq_s16(q9s16, q15s16);
// stage 2 - odd half
q13s16 = vsubq_s16(q4s16, q5s16);
q4s16 = vaddq_s16(q4s16, q5s16);
q14s16 = vsubq_s16(q7s16, q6s16);
q7s16 = vaddq_s16(q7s16, q6s16);
d26s16 = vget_low_s16(q13s16);
d27s16 = vget_high_s16(q13s16);
d28s16 = vget_low_s16(q14s16);
d29s16 = vget_high_s16(q14s16);
d16s16 = vdup_n_s16((int16_t)cospi_16_64);
q9s32 = vmull_s16(d28s16, d16s16);
q10s32 = vmull_s16(d29s16, d16s16);
q11s32 = vmull_s16(d28s16, d16s16);
q12s32 = vmull_s16(d29s16, d16s16);
q9s32 = vmlsl_s16(q9s32, d26s16, d16s16);
q10s32 = vmlsl_s16(q10s32, d27s16, d16s16);
q11s32 = vmlal_s16(q11s32, d26s16, d16s16);
q12s32 = vmlal_s16(q12s32, d27s16, d16s16);
d10s16 = vqrshrn_n_s32(q9s32, 14);
d11s16 = vqrshrn_n_s32(q10s32, 14);
d12s16 = vqrshrn_n_s32(q11s32, 14);
d13s16 = vqrshrn_n_s32(q12s32, 14);
q5s16 = vcombine_s16(d10s16, d11s16);
q6s16 = vcombine_s16(d12s16, d13s16);
// stage 4
q8s16 = vaddq_s16(q0s16, q7s16);
q9s16 = vaddq_s16(q1s16, q6s16);
q10s16 = vaddq_s16(q2s16, q5s16);
q11s16 = vaddq_s16(q3s16, q4s16);
q12s16 = vsubq_s16(q3s16, q4s16);
q13s16 = vsubq_s16(q2s16, q5s16);
q14s16 = vsubq_s16(q1s16, q6s16);
q15s16 = vsubq_s16(q0s16, q7s16);
TRANSPOSE8X8(&q8s16, &q9s16, &q10s16, &q11s16, &q12s16, &q13s16, &q14s16,
&q15s16);
IDCT8x8_1D(&q8s16, &q9s16, &q10s16, &q11s16, &q12s16, &q13s16, &q14s16,
&q15s16);
q8s16 = vrshrq_n_s16(q8s16, 5);
q9s16 = vrshrq_n_s16(q9s16, 5);
q10s16 = vrshrq_n_s16(q10s16, 5);
q11s16 = vrshrq_n_s16(q11s16, 5);
q12s16 = vrshrq_n_s16(q12s16, 5);
q13s16 = vrshrq_n_s16(q13s16, 5);
q14s16 = vrshrq_n_s16(q14s16, 5);
q15s16 = vrshrq_n_s16(q15s16, 5);
d1 = d2 = dest;
d0u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d1u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d2u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d3u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
q8u16 = vaddw_u8(vreinterpretq_u16_s16(q8s16), vreinterpret_u8_u64(d0u64));
q9u16 = vaddw_u8(vreinterpretq_u16_s16(q9s16), vreinterpret_u8_u64(d1u64));
q10u16 = vaddw_u8(vreinterpretq_u16_s16(q10s16), vreinterpret_u8_u64(d2u64));
q11u16 = vaddw_u8(vreinterpretq_u16_s16(q11s16), vreinterpret_u8_u64(d3u64));
d0u8 = vqmovun_s16(vreinterpretq_s16_u16(q8u16));
d1u8 = vqmovun_s16(vreinterpretq_s16_u16(q9u16));
d2u8 = vqmovun_s16(vreinterpretq_s16_u16(q10u16));
d3u8 = vqmovun_s16(vreinterpretq_s16_u16(q11u16));
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d0u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d1u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d2u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d3u8));
d2 += dest_stride;
q8s16 = q12s16;
q9s16 = q13s16;
q10s16 = q14s16;
q11s16 = q15s16;
d0u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d1u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d2u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
d3u64 = vld1_u64((uint64_t *)d1);
d1 += dest_stride;
q8u16 = vaddw_u8(vreinterpretq_u16_s16(q8s16), vreinterpret_u8_u64(d0u64));
q9u16 = vaddw_u8(vreinterpretq_u16_s16(q9s16), vreinterpret_u8_u64(d1u64));
q10u16 = vaddw_u8(vreinterpretq_u16_s16(q10s16), vreinterpret_u8_u64(d2u64));
q11u16 = vaddw_u8(vreinterpretq_u16_s16(q11s16), vreinterpret_u8_u64(d3u64));
d0u8 = vqmovun_s16(vreinterpretq_s16_u16(q8u16));
d1u8 = vqmovun_s16(vreinterpretq_s16_u16(q9u16));
d2u8 = vqmovun_s16(vreinterpretq_s16_u16(q10u16));
d3u8 = vqmovun_s16(vreinterpretq_s16_u16(q11u16));
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d0u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d1u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d2u8));
d2 += dest_stride;
vst1_u64((uint64_t *)d2, vreinterpret_u64_u8(d3u8));
d2 += dest_stride;
return;
}
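
Note on the file above: the row pass of aom_idct8x8_12_add_neon multiplies by doubled constants such as `cospi_28_64 * 2` using `vqrdmulhq_s16` instead of the widening multiply plus `vqrshrn_n_s32(..., 14)` used in IDCT8x8_1D. The two forms agree because, per lane, the rounding doubling multiply computes (2·a·b + 2^15) >> 16, and with b = 2c this equals (a·c + 2^13) >> 14. The sketch below checks that identity exhaustively in scalar C. It is a model written for illustration, not code from the library: `qrdmulh_s16`, `dct_round_shift14` and `doubled_constant_matches` are hypothetical names, and arithmetic right shift of negative values is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of vqrdmulhq_s16:
 * saturate16((2 * a * b + 2^15) >> 16). */
static int16_t qrdmulh_s16(int16_t a, int16_t b) {
  const int64_t r = (2 * (int64_t)a * b + (1 << 15)) >> 16;
  if (r > INT16_MAX) return INT16_MAX;
  if (r < INT16_MIN) return INT16_MIN;
  return (int16_t)r;
}

/* The 14-bit rounding shift applied after a widening multiply. */
static int32_t dct_round_shift14(int32_t x) {
  return (x + (1 << 13)) >> 14;
}

/* Returns 1 when multiplying by the doubled constant with qrdmulh gives the
 * same result as multiply-then-round-shift for every 16-bit input. */
static int doubled_constant_matches(int16_t cospi) {
  /* the doubled constant must itself fit in 16 bits */
  assert(cospi >= INT16_MIN / 2 && cospi <= INT16_MAX / 2);
  for (int x = INT16_MIN; x <= INT16_MAX; ++x) {
    const int32_t fast = qrdmulh_s16((int16_t)x, (int16_t)(2 * cospi));
    const int32_t ref = dct_round_shift14(x * cospi);
    if (fast != ref) return 0;
  }
  return 1;
}
```

The trick only applies where a butterfly output depends on a single input, which is the case in the 12-coefficient path because the remaining inputs are zero; the stages that combine two products still use the widening multiply-accumulate form.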


@ -1,26 +1,26 @@
/*
* Copyright (c) 2014 The WebM project authors. All Rights Reserved.
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./vpx_config.h"
#include "./vpx_dsp_rtcd.h"
#include "vpx/vpx_integer.h"
#include "./aom_config.h"
#include "./aom_dsp_rtcd.h"
#include "aom/aom_integer.h"
//------------------------------------------------------------------------------
// DC 4x4
// 'do_above' and 'do_left' facilitate branch removal when inlined.
static INLINE void dc_4x4(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left,
int do_above, int do_left) {
static INLINE void dc_4x4(uint8_t *dst, ptrdiff_t stride, const uint8_t *above,
const uint8_t *left, int do_above, int do_left) {
uint16x8_t sum_top;
uint16x8_t sum_left;
uint8x8_t dc0;
@ -59,24 +59,24 @@ static INLINE void dc_4x4(uint8_t *dst, ptrdiff_t stride,
}
}
void vpx_dc_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
void aom_dc_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
dc_4x4(dst, stride, above, left, 1, 1);
}
void vpx_dc_left_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
void aom_dc_left_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
(void)above;
dc_4x4(dst, stride, NULL, left, 0, 1);
}
void vpx_dc_top_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
void aom_dc_top_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
(void)left;
dc_4x4(dst, stride, above, NULL, 1, 0);
}
void vpx_dc_128_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
void aom_dc_128_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
(void)above;
(void)left;
@ -87,9 +87,8 @@ void vpx_dc_128_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
// DC 8x8
// 'do_above' and 'do_left' facilitate branch removal when inlined.
static INLINE void dc_8x8(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left,
int do_above, int do_left) {
static INLINE void dc_8x8(uint8_t *dst, ptrdiff_t stride, const uint8_t *above,
const uint8_t *left, int do_above, int do_left) {
uint16x8_t sum_top;
uint16x8_t sum_left;
uint8x8_t dc0;
@ -130,24 +129,24 @@ static INLINE void dc_8x8(uint8_t *dst, ptrdiff_t stride,
}
}
void vpx_dc_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
void aom_dc_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
dc_8x8(dst, stride, above, left, 1, 1);
}
void vpx_dc_left_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
void aom_dc_left_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
(void)above;
dc_8x8(dst, stride, NULL, left, 0, 1);
}
void vpx_dc_top_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
void aom_dc_top_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
(void)left;
dc_8x8(dst, stride, above, NULL, 1, 0);
}
void vpx_dc_128_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
void aom_dc_128_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
(void)above;
(void)left;
@ -203,26 +202,26 @@ static INLINE void dc_16x16(uint8_t *dst, ptrdiff_t stride,
}
}
void vpx_dc_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
void aom_dc_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
dc_16x16(dst, stride, above, left, 1, 1);
}
void vpx_dc_left_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
void aom_dc_left_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above,
const uint8_t *left) {
(void)above;
dc_16x16(dst, stride, NULL, left, 0, 1);
}
void vpx_dc_top_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
void aom_dc_top_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above,
const uint8_t *left) {
(void)left;
dc_16x16(dst, stride, above, NULL, 1, 0);
}
void vpx_dc_128_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
void aom_dc_128_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above,
const uint8_t *left) {
(void)above;
@ -286,26 +285,26 @@ static INLINE void dc_32x32(uint8_t *dst, ptrdiff_t stride,
}
}
void vpx_dc_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
void aom_dc_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
dc_32x32(dst, stride, above, left, 1, 1);
}
void vpx_dc_left_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
void aom_dc_left_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above,
const uint8_t *left) {
(void)above;
dc_32x32(dst, stride, NULL, left, 0, 1);
}
void vpx_dc_top_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
void aom_dc_top_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above,
const uint8_t *left) {
(void)left;
dc_32x32(dst, stride, above, NULL, 1, 0);
}
void vpx_dc_128_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
void aom_dc_128_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above,
const uint8_t *left) {
(void)above;
@ -315,7 +314,7 @@ void vpx_dc_128_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
// -----------------------------------------------------------------------------
void vpx_d45_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
void aom_d45_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
const uint64x1_t A0 = vreinterpret_u64_u8(vld1_u8(above)); // top row
const uint64x1_t A1 = vshr_n_u64(A0, 8);
@ -338,7 +337,7 @@ void vpx_d45_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
dst[3 * stride + 3] = above[7];
}
void vpx_d45_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
void aom_d45_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
static const uint8_t shuffle1[8] = { 1, 2, 3, 4, 5, 6, 7, 7 };
static const uint8_t shuffle2[8] = { 2, 3, 4, 5, 6, 7, 7, 7 };
@ -358,7 +357,7 @@ void vpx_d45_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
vst1_u8(dst + i * stride, row);
}
void vpx_d45_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
void aom_d45_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
const uint8x16_t A0 = vld1q_u8(above); // top row
const uint8x16_t above_right = vld1q_dup_u8(above + 15);
@ -377,7 +376,7 @@ void vpx_d45_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
// -----------------------------------------------------------------------------
void vpx_d135_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
void aom_d135_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
const uint8x8_t XABCD_u8 = vld1_u8(above - 1);
const uint64x1_t XABCD = vreinterpret_u64_u8(XABCD_u8);
@ -407,7 +406,7 @@ void vpx_d135_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
#if !HAVE_NEON_ASM
void vpx_v_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
void aom_v_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
int i;
uint32x2_t d0u32 = vdup_n_u32(0);
@ -418,29 +417,27 @@ void vpx_v_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
vst1_lane_u32((uint32_t *)dst, d0u32, 0);
}
void vpx_v_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
void aom_v_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
int i;
uint8x8_t d0u8 = vdup_n_u8(0);
(void)left;
d0u8 = vld1_u8(above);
for (i = 0; i < 8; i++, dst += stride)
vst1_u8(dst, d0u8);
for (i = 0; i < 8; i++, dst += stride) vst1_u8(dst, d0u8);
}
void vpx_v_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
void aom_v_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
int i;
uint8x16_t q0u8 = vdupq_n_u8(0);
(void)left;
q0u8 = vld1q_u8(above);
for (i = 0; i < 16; i++, dst += stride)
vst1q_u8(dst, q0u8);
for (i = 0; i < 16; i++, dst += stride) vst1q_u8(dst, q0u8);
}
void vpx_v_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
void aom_v_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
int i;
uint8x16_t q0u8 = vdupq_n_u8(0);
@ -455,7 +452,7 @@ void vpx_v_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
}
}
void vpx_h_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
void aom_h_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
uint8x8_t d0u8 = vdup_n_u8(0);
uint32x2_t d1u32 = vdup_n_u32(0);
@ -476,7 +473,7 @@ void vpx_h_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d0u8), 0);
}
void vpx_h_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
void aom_h_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
uint8x8_t d0u8 = vdup_n_u8(0);
uint64x1_t d1u64 = vdup_n_u64(0);
@ -509,7 +506,7 @@ void vpx_h_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
vst1_u8(dst, d0u8);
}
void vpx_h_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
void aom_h_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
int j;
uint8x8_t d2u8 = vdup_n_u8(0);
@ -547,7 +544,7 @@ void vpx_h_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
}
}
void vpx_h_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
void aom_h_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
int j, k;
uint8x8_t d2u8 = vdup_n_u8(0);
@ -595,7 +592,7 @@ void vpx_h_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
}
}
void vpx_tm_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
void aom_tm_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
int i;
uint16x8_t q1u16, q3u16;
@ -608,14 +605,14 @@ void vpx_tm_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
q3u16 = vsubl_u8(vreinterpret_u8_u32(d2u32), d0u8);
for (i = 0; i < 4; i++, dst += stride) {
q1u16 = vdupq_n_u16((uint16_t)left[i]);
q1s16 = vaddq_s16(vreinterpretq_s16_u16(q1u16),
vreinterpretq_s16_u16(q3u16));
q1s16 =
vaddq_s16(vreinterpretq_s16_u16(q1u16), vreinterpretq_s16_u16(q3u16));
d0u8 = vqmovun_s16(q1s16);
vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d0u8), 0);
}
}
void vpx_tm_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
void aom_tm_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
int j;
uint16x8_t q0u16, q3u16, q10u16;
@ -631,33 +628,33 @@ void vpx_tm_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
d20u16 = vget_low_u16(q10u16);
for (j = 0; j < 2; j++, d20u16 = vget_high_u16(q10u16)) {
q0u16 = vdupq_lane_u16(d20u16, 0);
q0s16 = vaddq_s16(vreinterpretq_s16_u16(q3u16),
vreinterpretq_s16_u16(q0u16));
q0s16 =
vaddq_s16(vreinterpretq_s16_u16(q3u16), vreinterpretq_s16_u16(q0u16));
d0u8 = vqmovun_s16(q0s16);
vst1_u64((uint64_t *)dst, vreinterpret_u64_u8(d0u8));
dst += stride;
q0u16 = vdupq_lane_u16(d20u16, 1);
q0s16 = vaddq_s16(vreinterpretq_s16_u16(q3u16),
vreinterpretq_s16_u16(q0u16));
q0s16 =
vaddq_s16(vreinterpretq_s16_u16(q3u16), vreinterpretq_s16_u16(q0u16));
d0u8 = vqmovun_s16(q0s16);
vst1_u64((uint64_t *)dst, vreinterpret_u64_u8(d0u8));
dst += stride;
q0u16 = vdupq_lane_u16(d20u16, 2);
q0s16 = vaddq_s16(vreinterpretq_s16_u16(q3u16),
vreinterpretq_s16_u16(q0u16));
q0s16 =
vaddq_s16(vreinterpretq_s16_u16(q3u16), vreinterpretq_s16_u16(q0u16));
d0u8 = vqmovun_s16(q0s16);
vst1_u64((uint64_t *)dst, vreinterpret_u64_u8(d0u8));
dst += stride;
q0u16 = vdupq_lane_u16(d20u16, 3);
q0s16 = vaddq_s16(vreinterpretq_s16_u16(q3u16),
vreinterpretq_s16_u16(q0u16));
q0s16 =
vaddq_s16(vreinterpretq_s16_u16(q3u16), vreinterpretq_s16_u16(q0u16));
d0u8 = vqmovun_s16(q0s16);
vst1_u64((uint64_t *)dst, vreinterpret_u64_u8(d0u8));
dst += stride;
}
}
void vpx_tm_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
void aom_tm_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
int j, k;
uint16x8_t q0u16, q2u16, q3u16, q8u16, q10u16;
@ -677,14 +674,14 @@ void vpx_tm_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
for (j = 0; j < 2; j++, d20u16 = vget_high_u16(q10u16)) {
q0u16 = vdupq_lane_u16(d20u16, 0);
q8u16 = vdupq_lane_u16(d20u16, 1);
q1s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q2u16));
q0s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q3u16));
q11s16 = vaddq_s16(vreinterpretq_s16_u16(q8u16),
vreinterpretq_s16_u16(q2u16));
q8s16 = vaddq_s16(vreinterpretq_s16_u16(q8u16),
vreinterpretq_s16_u16(q3u16));
q1s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q2u16));
q0s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q3u16));
q11s16 =
vaddq_s16(vreinterpretq_s16_u16(q8u16), vreinterpretq_s16_u16(q2u16));
q8s16 =
vaddq_s16(vreinterpretq_s16_u16(q8u16), vreinterpretq_s16_u16(q3u16));
d2u8 = vqmovun_s16(q1s16);
d3u8 = vqmovun_s16(q0s16);
d22u8 = vqmovun_s16(q11s16);
@ -698,14 +695,14 @@ void vpx_tm_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
q0u16 = vdupq_lane_u16(d20u16, 2);
q8u16 = vdupq_lane_u16(d20u16, 3);
q1s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q2u16));
q0s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q3u16));
q11s16 = vaddq_s16(vreinterpretq_s16_u16(q8u16),
vreinterpretq_s16_u16(q2u16));
q8s16 = vaddq_s16(vreinterpretq_s16_u16(q8u16),
vreinterpretq_s16_u16(q3u16));
q1s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q2u16));
q0s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q3u16));
q11s16 =
vaddq_s16(vreinterpretq_s16_u16(q8u16), vreinterpretq_s16_u16(q2u16));
q8s16 =
vaddq_s16(vreinterpretq_s16_u16(q8u16), vreinterpretq_s16_u16(q3u16));
d2u8 = vqmovun_s16(q1s16);
d3u8 = vqmovun_s16(q0s16);
d22u8 = vqmovun_s16(q11s16);
@ -720,7 +717,7 @@ void vpx_tm_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
}
}
void vpx_tm_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
void aom_tm_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
int j, k;
uint16x8_t q0u16, q3u16, q8u16, q9u16, q10u16, q11u16;
@ -742,10 +739,10 @@ void vpx_tm_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
d6u16 = vget_low_u16(q3u16);
for (j = 0; j < 2; j++, d6u16 = vget_high_u16(q3u16)) {
q0u16 = vdupq_lane_u16(d6u16, 0);
q12s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q8u16));
q13s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q9u16));
q12s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q8u16));
q13s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q9u16));
q14s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q10u16));
q15s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
@ -761,10 +758,10 @@ void vpx_tm_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
dst += stride;
q0u16 = vdupq_lane_u16(d6u16, 1);
q12s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q8u16));
q13s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q9u16));
q12s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q8u16));
q13s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q9u16));
q14s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q10u16));
q15s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
@ -780,10 +777,10 @@ void vpx_tm_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
dst += stride;
q0u16 = vdupq_lane_u16(d6u16, 2);
q12s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q8u16));
q13s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q9u16));
q12s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q8u16));
q13s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q9u16));
q14s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q10u16));
q15s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
@ -799,10 +796,10 @@ void vpx_tm_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
dst += stride;
q0u16 = vdupq_lane_u16(d6u16, 3);
q12s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q8u16));
q13s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q9u16));
q12s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q8u16));
q13s16 =
vaddq_s16(vreinterpretq_s16_u16(q0u16), vreinterpretq_s16_u16(q9u16));
q14s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
vreinterpretq_s16_u16(q10u16));
q15s16 = vaddq_s16(vreinterpretq_s16_u16(q0u16),
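
Note on the DC predictors earlier in this file's diff: the renamed wrappers call a shared helper as dc_16x16(dst, stride, above, left, do_above, do_left), passing NULL for the edge they ignore. The helper body is not part of the diff, so the following is only a minimal scalar sketch of what such a helper presumably computes. The name dc_predictor_ref, the bs parameter, the rounding rule, and the fill value 128 when neither edge is used are assumptions for illustration, not code from the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference; dc_predictor_ref is not a libaom symbol.
 * Averages the requested edges and fills a bs x bs block with the result. */
static void dc_predictor_ref(uint8_t *dst, ptrdiff_t stride, int bs,
                             const uint8_t *above, const uint8_t *left,
                             int do_above, int do_left) {
  int sum = 0, count = 0, dc, r, c;
  assert(bs > 0);
  if (do_above) {
    for (c = 0; c < bs; ++c) sum += above[c];
    count += bs;
  }
  if (do_left) {
    for (r = 0; r < bs; ++r) sum += left[r];
    count += bs;
  }
  /* Assumed rounding: add half the divisor, then divide. With neither edge
   * available, fall back to mid-grey (128), as the dc_128 name suggests. */
  dc = count ? (sum + count / 2) / count : 128;
  for (r = 0; r < bs; ++r)
    for (c = 0; c < bs; ++c) dst[r * stride + c] = (uint8_t)dc;
}
```

With this sketch, the four wrapper flavours correspond to the flag pairs (1, 1), (0, 1), (1, 0) and (0, 0).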

@ -1,32 +1,35 @@
;
; Copyright (c) 2014 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
EXPORT |vpx_v_predictor_4x4_neon|
EXPORT |vpx_v_predictor_8x8_neon|
EXPORT |vpx_v_predictor_16x16_neon|
EXPORT |vpx_v_predictor_32x32_neon|
EXPORT |vpx_h_predictor_4x4_neon|
EXPORT |vpx_h_predictor_8x8_neon|
EXPORT |vpx_h_predictor_16x16_neon|
EXPORT |vpx_h_predictor_32x32_neon|
EXPORT |vpx_tm_predictor_4x4_neon|
EXPORT |vpx_tm_predictor_8x8_neon|
EXPORT |vpx_tm_predictor_16x16_neon|
EXPORT |vpx_tm_predictor_32x32_neon|
;
EXPORT |aom_v_predictor_4x4_neon|
EXPORT |aom_v_predictor_8x8_neon|
EXPORT |aom_v_predictor_16x16_neon|
EXPORT |aom_v_predictor_32x32_neon|
EXPORT |aom_h_predictor_4x4_neon|
EXPORT |aom_h_predictor_8x8_neon|
EXPORT |aom_h_predictor_16x16_neon|
EXPORT |aom_h_predictor_32x32_neon|
EXPORT |aom_tm_predictor_4x4_neon|
EXPORT |aom_tm_predictor_8x8_neon|
EXPORT |aom_tm_predictor_16x16_neon|
EXPORT |aom_tm_predictor_32x32_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
;void vpx_v_predictor_4x4_neon(uint8_t *dst, ptrdiff_t y_stride,
;void aom_v_predictor_4x4_neon(uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above,
; const uint8_t *left)
; r0 uint8_t *dst
@ -34,16 +37,16 @@
; r2 const uint8_t *above
; r3 const uint8_t *left
|vpx_v_predictor_4x4_neon| PROC
|aom_v_predictor_4x4_neon| PROC
vld1.32 {d0[0]}, [r2]
vst1.32 {d0[0]}, [r0], r1
vst1.32 {d0[0]}, [r0], r1
vst1.32 {d0[0]}, [r0], r1
vst1.32 {d0[0]}, [r0], r1
bx lr
ENDP ; |vpx_v_predictor_4x4_neon|
ENDP ; |aom_v_predictor_4x4_neon|
;void vpx_v_predictor_8x8_neon(uint8_t *dst, ptrdiff_t y_stride,
;void aom_v_predictor_8x8_neon(uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above,
; const uint8_t *left)
; r0 uint8_t *dst
@ -51,7 +54,7 @@
; r2 const uint8_t *above
; r3 const uint8_t *left
|vpx_v_predictor_8x8_neon| PROC
|aom_v_predictor_8x8_neon| PROC
vld1.8 {d0}, [r2]
vst1.8 {d0}, [r0], r1
vst1.8 {d0}, [r0], r1
@ -62,9 +65,9 @@
vst1.8 {d0}, [r0], r1
vst1.8 {d0}, [r0], r1
bx lr
ENDP ; |vpx_v_predictor_8x8_neon|
ENDP ; |aom_v_predictor_8x8_neon|
;void vpx_v_predictor_16x16_neon(uint8_t *dst, ptrdiff_t y_stride,
;void aom_v_predictor_16x16_neon(uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above,
; const uint8_t *left)
; r0 uint8_t *dst
@ -72,7 +75,7 @@
; r2 const uint8_t *above
; r3 const uint8_t *left
|vpx_v_predictor_16x16_neon| PROC
|aom_v_predictor_16x16_neon| PROC
vld1.8 {q0}, [r2]
vst1.8 {q0}, [r0], r1
vst1.8 {q0}, [r0], r1
@ -91,9 +94,9 @@
vst1.8 {q0}, [r0], r1
vst1.8 {q0}, [r0], r1
bx lr
ENDP ; |vpx_v_predictor_16x16_neon|
ENDP ; |aom_v_predictor_16x16_neon|
;void vpx_v_predictor_32x32_neon(uint8_t *dst, ptrdiff_t y_stride,
;void aom_v_predictor_32x32_neon(uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above,
; const uint8_t *left)
; r0 uint8_t *dst
@ -101,7 +104,7 @@
; r2 const uint8_t *above
; r3 const uint8_t *left
|vpx_v_predictor_32x32_neon| PROC
|aom_v_predictor_32x32_neon| PROC
vld1.8 {q0, q1}, [r2]
mov r2, #2
loop_v
@ -124,9 +127,9 @@ loop_v
subs r2, r2, #1
bgt loop_v
bx lr
ENDP ; |vpx_v_predictor_32x32_neon|
ENDP ; |aom_v_predictor_32x32_neon|
;void vpx_h_predictor_4x4_neon(uint8_t *dst, ptrdiff_t y_stride,
;void aom_h_predictor_4x4_neon(uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above,
; const uint8_t *left)
; r0 uint8_t *dst
@ -134,7 +137,7 @@ loop_v
; r2 const uint8_t *above
; r3 const uint8_t *left
|vpx_h_predictor_4x4_neon| PROC
|aom_h_predictor_4x4_neon| PROC
vld1.32 {d1[0]}, [r3]
vdup.8 d0, d1[0]
vst1.32 {d0[0]}, [r0], r1
@ -145,9 +148,9 @@ loop_v
vdup.8 d0, d1[3]
vst1.32 {d0[0]}, [r0], r1
bx lr
ENDP ; |vpx_h_predictor_4x4_neon|
ENDP ; |aom_h_predictor_4x4_neon|
;void vpx_h_predictor_8x8_neon(uint8_t *dst, ptrdiff_t y_stride,
;void aom_h_predictor_8x8_neon(uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above,
; const uint8_t *left)
; r0 uint8_t *dst
@ -155,7 +158,7 @@ loop_v
; r2 const uint8_t *above
; r3 const uint8_t *left
|vpx_h_predictor_8x8_neon| PROC
|aom_h_predictor_8x8_neon| PROC
vld1.64 {d1}, [r3]
vdup.8 d0, d1[0]
vst1.64 {d0}, [r0], r1
@ -174,9 +177,9 @@ loop_v
vdup.8 d0, d1[7]
vst1.64 {d0}, [r0], r1
bx lr
ENDP ; |vpx_h_predictor_8x8_neon|
ENDP ; |aom_h_predictor_8x8_neon|
;void vpx_h_predictor_16x16_neon(uint8_t *dst, ptrdiff_t y_stride,
;void aom_h_predictor_16x16_neon(uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above,
; const uint8_t *left)
; r0 uint8_t *dst
@ -184,7 +187,7 @@ loop_v
; r2 const uint8_t *above
; r3 const uint8_t *left
|vpx_h_predictor_16x16_neon| PROC
|aom_h_predictor_16x16_neon| PROC
vld1.8 {q1}, [r3]
vdup.8 q0, d2[0]
vst1.8 {q0}, [r0], r1
@ -219,9 +222,9 @@ loop_v
vdup.8 q0, d3[7]
vst1.8 {q0}, [r0], r1
bx lr
ENDP ; |vpx_h_predictor_16x16_neon|
ENDP ; |aom_h_predictor_16x16_neon|
;void vpx_h_predictor_32x32_neon(uint8_t *dst, ptrdiff_t y_stride,
;void aom_h_predictor_32x32_neon(uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above,
; const uint8_t *left)
; r0 uint8_t *dst
@ -229,7 +232,7 @@ loop_v
; r2 const uint8_t *above
; r3 const uint8_t *left
|vpx_h_predictor_32x32_neon| PROC
|aom_h_predictor_32x32_neon| PROC
sub r1, r1, #16
mov r2, #2
loop_h
@ -285,9 +288,9 @@ loop_h
subs r2, r2, #1
bgt loop_h
bx lr
ENDP ; |vpx_h_predictor_32x32_neon|
ENDP ; |aom_h_predictor_32x32_neon|
;void vpx_tm_predictor_4x4_neon (uint8_t *dst, ptrdiff_t y_stride,
;void aom_tm_predictor_4x4_neon (uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above,
; const uint8_t *left)
; r0 uint8_t *dst
@ -295,7 +298,7 @@ loop_h
; r2 const uint8_t *above
; r3 const uint8_t *left
|vpx_tm_predictor_4x4_neon| PROC
|aom_tm_predictor_4x4_neon| PROC
; Load ytop_left = above[-1];
sub r12, r2, #1
vld1.u8 {d0[]}, [r12]
@ -331,9 +334,9 @@ loop_h
vst1.32 {d0[0]}, [r0], r1
vst1.32 {d1[0]}, [r0], r1
bx lr
ENDP ; |vpx_tm_predictor_4x4_neon|
ENDP ; |aom_tm_predictor_4x4_neon|
;void vpx_tm_predictor_8x8_neon (uint8_t *dst, ptrdiff_t y_stride,
;void aom_tm_predictor_8x8_neon (uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above,
; const uint8_t *left)
; r0 uint8_t *dst
@ -341,7 +344,7 @@ loop_h
; r2 const uint8_t *above
; r3 const uint8_t *left
|vpx_tm_predictor_8x8_neon| PROC
|aom_tm_predictor_8x8_neon| PROC
; Load ytop_left = above[-1];
sub r12, r2, #1
vld1.8 {d0[]}, [r12]
@ -403,9 +406,9 @@ loop_h
vst1.64 {d3}, [r0], r1
bx lr
ENDP ; |vpx_tm_predictor_8x8_neon|
ENDP ; |aom_tm_predictor_8x8_neon|
;void vpx_tm_predictor_16x16_neon (uint8_t *dst, ptrdiff_t y_stride,
;void aom_tm_predictor_16x16_neon (uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above,
; const uint8_t *left)
; r0 uint8_t *dst
@ -413,7 +416,7 @@ loop_h
; r2 const uint8_t *above
; r3 const uint8_t *left
|vpx_tm_predictor_16x16_neon| PROC
|aom_tm_predictor_16x16_neon| PROC
; Load ytop_left = above[-1];
sub r12, r2, #1
vld1.8 {d0[]}, [r12]
@ -496,9 +499,9 @@ loop_16x16_neon
bgt loop_16x16_neon
bx lr
ENDP ; |vpx_tm_predictor_16x16_neon|
ENDP ; |aom_tm_predictor_16x16_neon|
;void vpx_tm_predictor_32x32_neon (uint8_t *dst, ptrdiff_t y_stride,
;void aom_tm_predictor_32x32_neon (uint8_t *dst, ptrdiff_t y_stride,
; const uint8_t *above,
; const uint8_t *left)
; r0 uint8_t *dst
@ -506,7 +509,7 @@ loop_16x16_neon
; r2 const uint8_t *above
; r3 const uint8_t *left
|vpx_tm_predictor_32x32_neon| PROC
|aom_tm_predictor_32x32_neon| PROC
; Load ytop_left = above[-1];
sub r12, r2, #1
vld1.8 {d0[]}, [r12]
@ -625,6 +628,6 @@ loop_32x32_neon
bgt loop_32x32_neon
bx lr
ENDP ; |vpx_tm_predictor_32x32_neon|
ENDP ; |aom_tm_predictor_32x32_neon|
END
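
For readers comparing the assembly above with the intrinsics version, here is a minimal scalar sketch of the three predictors this file exports. It follows what the diff shows: the V predictor stores the above row into every output row, the H predictor broadcasts one left pixel per row, and the TM predictor adds left[r] to (above[c] - above[-1]) and saturates to 8 bits. The *_ref names and the bs parameter are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the 0..255 pixel range (what vqmovun_s16 does per lane). */
static uint8_t clip_pixel_ref(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Vertical: every row is a copy of the row above the block. */
static void v_predictor_ref(uint8_t *dst, ptrdiff_t stride, int bs,
                            const uint8_t *above) {
  int r, c;
  assert(bs > 0);
  for (r = 0; r < bs; ++r)
    for (c = 0; c < bs; ++c) dst[r * stride + c] = above[c];
}

/* Horizontal: row r is filled with the pixel to its left. */
static void h_predictor_ref(uint8_t *dst, ptrdiff_t stride, int bs,
                            const uint8_t *left) {
  int r, c;
  assert(bs > 0);
  for (r = 0; r < bs; ++r)
    for (c = 0; c < bs; ++c) dst[r * stride + c] = left[r];
}

/* TrueMotion: left[r] + above[c] - top_left, clamped to 8 bits.
 * above[-1] must be readable, as in the assembly's "ytop_left" load. */
static void tm_predictor_ref(uint8_t *dst, ptrdiff_t stride, int bs,
                             const uint8_t *above, const uint8_t *left) {
  const int top_left = above[-1];
  int r, c;
  assert(bs > 0);
  for (r = 0; r < bs; ++r)
    for (c = 0; c < bs; ++c)
      dst[r * stride + c] = clip_pixel_ref(left[r] + above[c] - top_left);
}
```

A scalar version like this is mainly useful as a comparison target when checking that the renamed aom_* entry points still produce the same pixels as before the rename.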

@ -1,19 +1,22 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
EXPORT |vpx_lpf_horizontal_4_dual_neon|
;
EXPORT |aom_lpf_horizontal_4_dual_neon|
ARM
AREA ||.text||, CODE, READONLY, ALIGN=2
;void vpx_lpf_horizontal_4_dual_neon(uint8_t *s, int p,
;void aom_lpf_horizontal_4_dual_neon(uint8_t *s, int p,
; const uint8_t *blimit0,
; const uint8_t *limit0,
; const uint8_t *thresh0,
@ -29,7 +32,7 @@
; sp+8 const uint8_t *limit1,
; sp+12 const uint8_t *thresh1,
|vpx_lpf_horizontal_4_dual_neon| PROC
|aom_lpf_horizontal_4_dual_neon| PROC
push {lr}
ldr r12, [sp, #4] ; load thresh0
@ -66,7 +69,7 @@
sub r2, r2, r1, lsl #1
sub r3, r3, r1, lsl #1
bl vpx_loop_filter_neon_16
bl aom_loop_filter_neon_16
vst1.u8 {q5}, [r2@64], r1 ; store op1
vst1.u8 {q6}, [r3@64], r1 ; store op0
@ -76,9 +79,9 @@
vpop {d8-d15} ; restore neon registers
pop {pc}
ENDP ; |vpx_lpf_horizontal_4_dual_neon|
ENDP ; |aom_lpf_horizontal_4_dual_neon|
; void vpx_loop_filter_neon_16();
; void aom_loop_filter_neon_16();
; This is a helper function for the loopfilters. The individual functions do the
; necessary load, transpose (if necessary) and store. This function uses
; registers d8-d15, so the calling function must save those registers.
@ -101,7 +104,7 @@
; q6 op0
; q7 oq0
; q8 oq1
|vpx_loop_filter_neon_16| PROC
|aom_loop_filter_neon_16| PROC
; filter_mask
vabd.u8 q11, q3, q4 ; m1 = abs(p3 - p2)
@ -194,6 +197,6 @@
veor q8, q12, q10 ; *oq1 = u^0x80
bx lr
ENDP ; |vpx_loop_filter_neon_16|
ENDP ; |aom_loop_filter_neon_16|
END
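
The "filter_mask" stage of the helper above is easier to follow in scalar form. The sketch below models one pixel column under the assumption that it mirrors the vabd/vmax/vqadd/vcge sequence visible in the intrinsics port of the same routine: the largest neighbouring difference is compared against limit, and a saturating 2*|p0-q0| + |p1-q1|/2 is compared against blimit. The high-edge-variance (hev) flag is set when |p1-p0| or |q1-q0| exceeds thresh. All function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int abs_diff_ref(int a, int b) { return a > b ? a - b : b - a; }

/* Unsigned saturating add, modelling vqadd on 8-bit lanes. */
static int sat_add_u8_ref(int a, int b) {
  const int s = a + b;
  return s > 255 ? 255 : s;
}

/* px holds p3, p2, p1, p0, q0, q1, q2, q3 in that order.
 * Returns 1 when the column should be filtered, 0 otherwise. */
static int filter_mask_ref(uint8_t limit, uint8_t blimit,
                           const uint8_t px[8]) {
  int i, m = 0, edge;
  assert(px != NULL);
  for (i = 0; i < 7; ++i) {
    int d;
    if (i == 3) continue; /* the p0/q0 step is covered by the blimit test */
    d = abs_diff_ref(px[i], px[i + 1]);
    if (d > m) m = d;
  }
  edge = sat_add_u8_ref(abs_diff_ref(px[3], px[4]), abs_diff_ref(px[3], px[4]));
  edge = sat_add_u8_ref(edge, abs_diff_ref(px[2], px[5]) >> 1);
  return m <= limit && edge <= blimit;
}

/* Returns 1 when either inner step exceeds thresh (high edge variance). */
static int hev_mask_ref(uint8_t thresh, const uint8_t px[8]) {
  assert(px != NULL);
  return abs_diff_ref(px[2], px[3]) > thresh ||
         abs_diff_ref(px[5], px[4]) > thresh;
}
```

The vector code keeps both results as all-ones or all-zeros byte lanes rather than 0/1 integers, so they can be applied with AND and BIC instead of branches.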

@ -0,0 +1,174 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_dsp_rtcd.h"
#include "./aom_config.h"
#include "aom/aom_integer.h"
static INLINE void loop_filter_neon_16(uint8x16_t qblimit, // blimit
uint8x16_t qlimit, // limit
uint8x16_t qthresh, // thresh
uint8x16_t q3, // p3
uint8x16_t q4, // p2
uint8x16_t q5, // p1
uint8x16_t q6, // p0
uint8x16_t q7, // q0
uint8x16_t q8, // q1
uint8x16_t q9, // q2
uint8x16_t q10, // q3
uint8x16_t *q5r, // p1
uint8x16_t *q6r, // p0
uint8x16_t *q7r, // q0
uint8x16_t *q8r) { // q1
uint8x16_t q1u8, q2u8, q11u8, q12u8, q13u8, q14u8, q15u8;
int16x8_t q2s16, q11s16;
uint16x8_t q4u16;
int8x16_t q0s8, q1s8, q2s8, q11s8, q12s8, q13s8;
int8x8_t d2s8, d3s8;
q11u8 = vabdq_u8(q3, q4);
q12u8 = vabdq_u8(q4, q5);
q13u8 = vabdq_u8(q5, q6);
q14u8 = vabdq_u8(q8, q7);
q3 = vabdq_u8(q9, q8);
q4 = vabdq_u8(q10, q9);
q11u8 = vmaxq_u8(q11u8, q12u8);
q12u8 = vmaxq_u8(q13u8, q14u8);
q3 = vmaxq_u8(q3, q4);
q15u8 = vmaxq_u8(q11u8, q12u8);
q9 = vabdq_u8(q6, q7);
// aom_hevmask
q13u8 = vcgtq_u8(q13u8, qthresh);
q14u8 = vcgtq_u8(q14u8, qthresh);
q15u8 = vmaxq_u8(q15u8, q3);
q2u8 = vabdq_u8(q5, q8);
q9 = vqaddq_u8(q9, q9);
q15u8 = vcgeq_u8(qlimit, q15u8);
// aom_filter() function
// convert to signed
q10 = vdupq_n_u8(0x80);
q8 = veorq_u8(q8, q10);
q7 = veorq_u8(q7, q10);
q6 = veorq_u8(q6, q10);
q5 = veorq_u8(q5, q10);
q2u8 = vshrq_n_u8(q2u8, 1);
q9 = vqaddq_u8(q9, q2u8);
q2s16 = vsubl_s8(vget_low_s8(vreinterpretq_s8_u8(q7)),
vget_low_s8(vreinterpretq_s8_u8(q6)));
q11s16 = vsubl_s8(vget_high_s8(vreinterpretq_s8_u8(q7)),
vget_high_s8(vreinterpretq_s8_u8(q6)));
q9 = vcgeq_u8(qblimit, q9);
q1s8 = vqsubq_s8(vreinterpretq_s8_u8(q5), vreinterpretq_s8_u8(q8));
q14u8 = vorrq_u8(q13u8, q14u8);
q4u16 = vdupq_n_u16(3);
q2s16 = vmulq_s16(q2s16, vreinterpretq_s16_u16(q4u16));
q11s16 = vmulq_s16(q11s16, vreinterpretq_s16_u16(q4u16));
q1u8 = vandq_u8(vreinterpretq_u8_s8(q1s8), q14u8);
q15u8 = vandq_u8(q15u8, q9);
q1s8 = vreinterpretq_s8_u8(q1u8);
q2s16 = vaddw_s8(q2s16, vget_low_s8(q1s8));
q11s16 = vaddw_s8(q11s16, vget_high_s8(q1s8));
q4 = vdupq_n_u8(3);
q9 = vdupq_n_u8(4);
// aom_filter = clamp(aom_filter + 3 * ( qs0 - ps0))
d2s8 = vqmovn_s16(q2s16);
d3s8 = vqmovn_s16(q11s16);
q1s8 = vcombine_s8(d2s8, d3s8);
q1u8 = vandq_u8(vreinterpretq_u8_s8(q1s8), q15u8);
q1s8 = vreinterpretq_s8_u8(q1u8);
q2s8 = vqaddq_s8(q1s8, vreinterpretq_s8_u8(q4));
q1s8 = vqaddq_s8(q1s8, vreinterpretq_s8_u8(q9));
q2s8 = vshrq_n_s8(q2s8, 3);
q1s8 = vshrq_n_s8(q1s8, 3);
q11s8 = vqaddq_s8(vreinterpretq_s8_u8(q6), q2s8);
q0s8 = vqsubq_s8(vreinterpretq_s8_u8(q7), q1s8);
q1s8 = vrshrq_n_s8(q1s8, 1);
q1s8 = vbicq_s8(q1s8, vreinterpretq_s8_u8(q14u8));
q13s8 = vqaddq_s8(vreinterpretq_s8_u8(q5), q1s8);
q12s8 = vqsubq_s8(vreinterpretq_s8_u8(q8), q1s8);
*q8r = veorq_u8(vreinterpretq_u8_s8(q12s8), q10);
*q7r = veorq_u8(vreinterpretq_u8_s8(q0s8), q10);
*q6r = veorq_u8(vreinterpretq_u8_s8(q11s8), q10);
*q5r = veorq_u8(vreinterpretq_u8_s8(q13s8), q10);
return;
}
void aom_lpf_horizontal_4_dual_neon(
uint8_t *s, int p /* pitch */, const uint8_t *blimit0,
const uint8_t *limit0, const uint8_t *thresh0, const uint8_t *blimit1,
const uint8_t *limit1, const uint8_t *thresh1) {
uint8x8_t dblimit0, dlimit0, dthresh0, dblimit1, dlimit1, dthresh1;
uint8x16_t qblimit, qlimit, qthresh;
uint8x16_t q3u8, q4u8, q5u8, q6u8, q7u8, q8u8, q9u8, q10u8;
dblimit0 = vld1_u8(blimit0);
dlimit0 = vld1_u8(limit0);
dthresh0 = vld1_u8(thresh0);
dblimit1 = vld1_u8(blimit1);
dlimit1 = vld1_u8(limit1);
dthresh1 = vld1_u8(thresh1);
qblimit = vcombine_u8(dblimit0, dblimit1);
qlimit = vcombine_u8(dlimit0, dlimit1);
qthresh = vcombine_u8(dthresh0, dthresh1);
s -= (p << 2);
q3u8 = vld1q_u8(s);
s += p;
q4u8 = vld1q_u8(s);
s += p;
q5u8 = vld1q_u8(s);
s += p;
q6u8 = vld1q_u8(s);
s += p;
q7u8 = vld1q_u8(s);
s += p;
q8u8 = vld1q_u8(s);
s += p;
q9u8 = vld1q_u8(s);
s += p;
q10u8 = vld1q_u8(s);
loop_filter_neon_16(qblimit, qlimit, qthresh, q3u8, q4u8, q5u8, q6u8, q7u8,
q8u8, q9u8, q10u8, &q5u8, &q6u8, &q7u8, &q8u8);
s -= (p * 5);
vst1q_u8(s, q5u8);
s += p;
vst1q_u8(s, q6u8);
s += p;
vst1q_u8(s, q7u8);
s += p;
vst1q_u8(s, q8u8);
return;
}
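
As a reading aid for loop_filter_neon_16 above, the following scalar sketch applies the same arithmetic to a single pixel column, once the filter mask and the hev flag are known. It follows the order of operations in the intrinsics: convert to signed by flipping the top bit, build the filter value from (p1 - q1) and 3 * (q0 - p0) with saturation, derive the +4 and +3 variants shifted right by 3, and apply a rounded half of the first variant to p1/q1 only when hev is clear. The names filter4_ref and sat_s8_ref are hypothetical, and mask and hev are plain 0/1 ints here rather than byte lanes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signed 8-bit saturation, modelling vqadd_s8 / vqsub_s8 / vqmovn_s16. */
static int sat_s8_ref(int v) { return v < -128 ? -128 : (v > 127 ? 127 : v); }

/* Filters one column in place. Subtracting 128 is equivalent to the
 * XOR with 0x80 followed by a signed reinterpretation in the vector code.
 * Right shifts of negative values are assumed to be arithmetic here. */
static void filter4_ref(int mask, int hev, uint8_t *p1, uint8_t *p0,
                        uint8_t *q0, uint8_t *q1) {
  int ps1, ps0, qs0, qs1, filter, filter1, filter2;
  assert(p1 != NULL && p0 != NULL && q0 != NULL && q1 != NULL);
  ps1 = *p1 - 128;
  ps0 = *p0 - 128;
  qs0 = *q0 - 128;
  qs1 = *q1 - 128;

  /* Outer-tap term is only used on high-variance edges. */
  filter = hev ? sat_s8_ref(ps1 - qs1) : 0;
  /* filter = clamp(filter + 3 * (qs0 - ps0)), gated by the filter mask. */
  filter = mask ? sat_s8_ref(filter + 3 * (qs0 - ps0)) : 0;

  filter1 = sat_s8_ref(filter + 4) >> 3; /* applied to q0 */
  filter2 = sat_s8_ref(filter + 3) >> 3; /* applied to p0 */
  *q0 = (uint8_t)(sat_s8_ref(qs0 - filter1) + 128);
  *p0 = (uint8_t)(sat_s8_ref(ps0 + filter2) + 128);

  /* Rounded half of filter1 for the outer taps, skipped when hev is set. */
  filter = hev ? 0 : (filter1 + 1) >> 1;
  *q1 = (uint8_t)(sat_s8_ref(qs1 - filter) + 128);
  *p1 = (uint8_t)(sat_s8_ref(ps1 + filter) + 128);
}
```

For example, a column with p1 = p0 = 100 and q0 = q1 = 120, mask set and hev clear, comes out as 104, 107, 112, 116: the step across the edge is smoothed while the outer taps move by a smaller amount.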

@ -1,23 +1,26 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
EXPORT |vpx_lpf_horizontal_4_neon|
EXPORT |vpx_lpf_vertical_4_neon|
;
EXPORT |aom_lpf_horizontal_4_neon|
EXPORT |aom_lpf_vertical_4_neon|
ARM
AREA ||.text||, CODE, READONLY, ALIGN=2
; Currently vpx only works on iterations 8 at a time. The vp8 loop filter
; Currently aom only works on iterations 8 at a time. The aom loop filter
; works on 16 iterations at a time.
;
; void vpx_lpf_horizontal_4_neon(uint8_t *s,
; void aom_lpf_horizontal_4_neon(uint8_t *s,
; int p /* pitch */,
; const uint8_t *blimit,
; const uint8_t *limit,
@ -28,7 +31,7 @@
; r2 const uint8_t *blimit,
; r3 const uint8_t *limit,
; sp const uint8_t *thresh,
|vpx_lpf_horizontal_4_neon| PROC
|aom_lpf_horizontal_4_neon| PROC
push {lr}
vld1.8 {d0[]}, [r2] ; duplicate *blimit
@ -53,7 +56,7 @@
sub r2, r2, r1, lsl #1
sub r3, r3, r1, lsl #1
bl vpx_loop_filter_neon
bl aom_loop_filter_neon
vst1.u8 {d4}, [r2@64], r1 ; store op1
vst1.u8 {d5}, [r3@64], r1 ; store op0
@ -61,12 +64,12 @@
vst1.u8 {d7}, [r3@64], r1 ; store oq1
pop {pc}
ENDP ; |vpx_lpf_horizontal_4_neon|
ENDP ; |aom_lpf_horizontal_4_neon|
; Currently vpx only works on iterations 8 at a time. The vp8 loop filter
; Currently aom only works on iterations 8 at a time. The aom loop filter
; works on 16 iterations at a time.
;
; void vpx_lpf_vertical_4_neon(uint8_t *s,
; void aom_lpf_vertical_4_neon(uint8_t *s,
; int p /* pitch */,
; const uint8_t *blimit,
; const uint8_t *limit,
@ -77,7 +80,7 @@
; r2 const uint8_t *blimit,
; r3 const uint8_t *limit,
; sp const uint8_t *thresh,
|vpx_lpf_vertical_4_neon| PROC
|aom_lpf_vertical_4_neon| PROC
push {lr}
vld1.8 {d0[]}, [r2] ; duplicate *blimit
@ -113,7 +116,7 @@
vtrn.8 d7, d16
vtrn.8 d17, d18
bl vpx_loop_filter_neon
bl aom_loop_filter_neon
sub r0, r0, #2
@ -128,9 +131,9 @@
vst4.8 {d4[7], d5[7], d6[7], d7[7]}, [r0]
pop {pc}
ENDP ; |vpx_lpf_vertical_4_neon|
ENDP ; |aom_lpf_vertical_4_neon|
; void vpx_loop_filter_neon();
; void aom_loop_filter_neon();
; This is a helper function for the loopfilters. The individual functions do the
; necessary load, transpose (if necessary) and store. The function does not use
; registers d8-d15.
@ -154,7 +157,7 @@
; d5 op0
; d6 oq0
; d7 oq1
|vpx_loop_filter_neon| PROC
|aom_loop_filter_neon| PROC
; filter_mask
vabd.u8 d19, d3, d4 ; m1 = abs(p3 - p2)
vabd.u8 d20, d4, d5 ; m2 = abs(p2 - p1)
@ -244,6 +247,6 @@
veor d7, d20, d18 ; *oq1 = u^0x80
bx lr
ENDP ; |vpx_loop_filter_neon|
ENDP ; |aom_loop_filter_neon|
END

@ -0,0 +1,250 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_dsp_rtcd.h"
static INLINE void loop_filter_neon(uint8x8_t dblimit, // flimit
uint8x8_t dlimit, // limit
uint8x8_t dthresh, // thresh
uint8x8_t d3u8, // p3
uint8x8_t d4u8, // p2
uint8x8_t d5u8, // p1
uint8x8_t d6u8, // p0
uint8x8_t d7u8, // q0
uint8x8_t d16u8, // q1
uint8x8_t d17u8, // q2
uint8x8_t d18u8, // q3
uint8x8_t *d4ru8, // p1
uint8x8_t *d5ru8, // p0
uint8x8_t *d6ru8, // q0
uint8x8_t *d7ru8) { // q1
uint8x8_t d19u8, d20u8, d21u8, d22u8, d23u8, d27u8, d28u8;
int16x8_t q12s16;
int8x8_t d19s8, d20s8, d21s8, d26s8, d27s8, d28s8;
d19u8 = vabd_u8(d3u8, d4u8);
d20u8 = vabd_u8(d4u8, d5u8);
d21u8 = vabd_u8(d5u8, d6u8);
d22u8 = vabd_u8(d16u8, d7u8);
d3u8 = vabd_u8(d17u8, d16u8);
d4u8 = vabd_u8(d18u8, d17u8);
d19u8 = vmax_u8(d19u8, d20u8);
d20u8 = vmax_u8(d21u8, d22u8);
d3u8 = vmax_u8(d3u8, d4u8);
d23u8 = vmax_u8(d19u8, d20u8);
d17u8 = vabd_u8(d6u8, d7u8);
d21u8 = vcgt_u8(d21u8, dthresh);
d22u8 = vcgt_u8(d22u8, dthresh);
d23u8 = vmax_u8(d23u8, d3u8);
d28u8 = vabd_u8(d5u8, d16u8);
d17u8 = vqadd_u8(d17u8, d17u8);
d23u8 = vcge_u8(dlimit, d23u8);
d18u8 = vdup_n_u8(0x80);
d5u8 = veor_u8(d5u8, d18u8);
d6u8 = veor_u8(d6u8, d18u8);
d7u8 = veor_u8(d7u8, d18u8);
d16u8 = veor_u8(d16u8, d18u8);
d28u8 = vshr_n_u8(d28u8, 1);
d17u8 = vqadd_u8(d17u8, d28u8);
d19u8 = vdup_n_u8(3);
d28s8 = vsub_s8(vreinterpret_s8_u8(d7u8), vreinterpret_s8_u8(d6u8));
d17u8 = vcge_u8(dblimit, d17u8);
d27s8 = vqsub_s8(vreinterpret_s8_u8(d5u8), vreinterpret_s8_u8(d16u8));
d22u8 = vorr_u8(d21u8, d22u8);
q12s16 = vmull_s8(d28s8, vreinterpret_s8_u8(d19u8));
d27u8 = vand_u8(vreinterpret_u8_s8(d27s8), d22u8);
d23u8 = vand_u8(d23u8, d17u8);
q12s16 = vaddw_s8(q12s16, vreinterpret_s8_u8(d27u8));
d17u8 = vdup_n_u8(4);
d27s8 = vqmovn_s16(q12s16);
d27u8 = vand_u8(vreinterpret_u8_s8(d27s8), d23u8);
d27s8 = vreinterpret_s8_u8(d27u8);
d28s8 = vqadd_s8(d27s8, vreinterpret_s8_u8(d19u8));
d27s8 = vqadd_s8(d27s8, vreinterpret_s8_u8(d17u8));
d28s8 = vshr_n_s8(d28s8, 3);
d27s8 = vshr_n_s8(d27s8, 3);
d19s8 = vqadd_s8(vreinterpret_s8_u8(d6u8), d28s8);
d26s8 = vqsub_s8(vreinterpret_s8_u8(d7u8), d27s8);
d27s8 = vrshr_n_s8(d27s8, 1);
d27s8 = vbic_s8(d27s8, vreinterpret_s8_u8(d22u8));
d21s8 = vqadd_s8(vreinterpret_s8_u8(d5u8), d27s8);
d20s8 = vqsub_s8(vreinterpret_s8_u8(d16u8), d27s8);
*d4ru8 = veor_u8(vreinterpret_u8_s8(d21s8), d18u8);
*d5ru8 = veor_u8(vreinterpret_u8_s8(d19s8), d18u8);
*d6ru8 = veor_u8(vreinterpret_u8_s8(d26s8), d18u8);
*d7ru8 = veor_u8(vreinterpret_u8_s8(d20s8), d18u8);
return;
}
void aom_lpf_horizontal_4_neon(uint8_t *src, int pitch, const uint8_t *blimit,
const uint8_t *limit, const uint8_t *thresh) {
int i;
uint8_t *s, *psrc;
uint8x8_t dblimit, dlimit, dthresh;
uint8x8_t d3u8, d4u8, d5u8, d6u8, d7u8, d16u8, d17u8, d18u8;
dblimit = vld1_u8(blimit);
dlimit = vld1_u8(limit);
dthresh = vld1_u8(thresh);
psrc = src - (pitch << 2);
for (i = 0; i < 1; i++) {
s = psrc + i * 8;
d3u8 = vld1_u8(s);
s += pitch;
d4u8 = vld1_u8(s);
s += pitch;
d5u8 = vld1_u8(s);
s += pitch;
d6u8 = vld1_u8(s);
s += pitch;
d7u8 = vld1_u8(s);
s += pitch;
d16u8 = vld1_u8(s);
s += pitch;
d17u8 = vld1_u8(s);
s += pitch;
d18u8 = vld1_u8(s);
loop_filter_neon(dblimit, dlimit, dthresh, d3u8, d4u8, d5u8, d6u8, d7u8,
d16u8, d17u8, d18u8, &d4u8, &d5u8, &d6u8, &d7u8);
s -= (pitch * 5);
vst1_u8(s, d4u8);
s += pitch;
vst1_u8(s, d5u8);
s += pitch;
vst1_u8(s, d6u8);
s += pitch;
vst1_u8(s, d7u8);
}
return;
}
void aom_lpf_vertical_4_neon(uint8_t *src, int pitch, const uint8_t *blimit,
const uint8_t *limit, const uint8_t *thresh) {
int i, pitch8;
uint8_t *s;
uint8x8_t dblimit, dlimit, dthresh;
uint8x8_t d3u8, d4u8, d5u8, d6u8, d7u8, d16u8, d17u8, d18u8;
uint32x2x2_t d2tmp0, d2tmp1, d2tmp2, d2tmp3;
uint16x4x2_t d2tmp4, d2tmp5, d2tmp6, d2tmp7;
uint8x8x2_t d2tmp8, d2tmp9, d2tmp10, d2tmp11;
uint8x8x4_t d4Result;
dblimit = vld1_u8(blimit);
dlimit = vld1_u8(limit);
dthresh = vld1_u8(thresh);
pitch8 = pitch * 8;
for (i = 0; i < 1; i++, src += pitch8) {
s = src - (i + 1) * 4;
d3u8 = vld1_u8(s);
s += pitch;
d4u8 = vld1_u8(s);
s += pitch;
d5u8 = vld1_u8(s);
s += pitch;
d6u8 = vld1_u8(s);
s += pitch;
d7u8 = vld1_u8(s);
s += pitch;
d16u8 = vld1_u8(s);
s += pitch;
d17u8 = vld1_u8(s);
s += pitch;
d18u8 = vld1_u8(s);
d2tmp0 = vtrn_u32(vreinterpret_u32_u8(d3u8), vreinterpret_u32_u8(d7u8));
d2tmp1 = vtrn_u32(vreinterpret_u32_u8(d4u8), vreinterpret_u32_u8(d16u8));
d2tmp2 = vtrn_u32(vreinterpret_u32_u8(d5u8), vreinterpret_u32_u8(d17u8));
d2tmp3 = vtrn_u32(vreinterpret_u32_u8(d6u8), vreinterpret_u32_u8(d18u8));
d2tmp4 = vtrn_u16(vreinterpret_u16_u32(d2tmp0.val[0]),
vreinterpret_u16_u32(d2tmp2.val[0]));
d2tmp5 = vtrn_u16(vreinterpret_u16_u32(d2tmp1.val[0]),
vreinterpret_u16_u32(d2tmp3.val[0]));
d2tmp6 = vtrn_u16(vreinterpret_u16_u32(d2tmp0.val[1]),
vreinterpret_u16_u32(d2tmp2.val[1]));
d2tmp7 = vtrn_u16(vreinterpret_u16_u32(d2tmp1.val[1]),
vreinterpret_u16_u32(d2tmp3.val[1]));
d2tmp8 = vtrn_u8(vreinterpret_u8_u16(d2tmp4.val[0]),
vreinterpret_u8_u16(d2tmp5.val[0]));
d2tmp9 = vtrn_u8(vreinterpret_u8_u16(d2tmp4.val[1]),
vreinterpret_u8_u16(d2tmp5.val[1]));
d2tmp10 = vtrn_u8(vreinterpret_u8_u16(d2tmp6.val[0]),
vreinterpret_u8_u16(d2tmp7.val[0]));
d2tmp11 = vtrn_u8(vreinterpret_u8_u16(d2tmp6.val[1]),
vreinterpret_u8_u16(d2tmp7.val[1]));
d3u8 = d2tmp8.val[0];
d4u8 = d2tmp8.val[1];
d5u8 = d2tmp9.val[0];
d6u8 = d2tmp9.val[1];
d7u8 = d2tmp10.val[0];
d16u8 = d2tmp10.val[1];
d17u8 = d2tmp11.val[0];
d18u8 = d2tmp11.val[1];
loop_filter_neon(dblimit, dlimit, dthresh, d3u8, d4u8, d5u8, d6u8, d7u8,
d16u8, d17u8, d18u8, &d4u8, &d5u8, &d6u8, &d7u8);
d4Result.val[0] = d4u8;
d4Result.val[1] = d5u8;
d4Result.val[2] = d6u8;
d4Result.val[3] = d7u8;
src -= 2;
vst4_lane_u8(src, d4Result, 0);
src += pitch;
vst4_lane_u8(src, d4Result, 1);
src += pitch;
vst4_lane_u8(src, d4Result, 2);
src += pitch;
vst4_lane_u8(src, d4Result, 3);
src += pitch;
vst4_lane_u8(src, d4Result, 4);
src += pitch;
vst4_lane_u8(src, d4Result, 5);
src += pitch;
vst4_lane_u8(src, d4Result, 6);
src += pitch;
vst4_lane_u8(src, d4Result, 7);
}
return;
}
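
For readers comparing this file with the C reference, the following is a rough scalar sketch of what loop_filter_neon above appears to compute for a single pixel position. The helper names (lpf4_scalar_sketch, sat_s8, asr) and the px[] layout are assumptions made for illustration and are not part of this change; the sketch has not been checked for bit-exactness against the NEON code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Saturate to the int8 range, like vqadd_s8 / vqsub_s8 / vqmovn_s16. */
static int sat_s8(int v) { return v < -128 ? -128 : (v > 127 ? 127 : v); }

/* Arithmetic (floor) right shift that does not depend on how the
 * compiler shifts negative values. */
static int asr(int v, int n) {
  return v >= 0 ? v >> n : -((-v + (1 << n) - 1) >> n);
}

/* Hypothetical helper, not part of the diff.
 * px[0..7] holds p3, p2, p1, p0, q0, q1, q2, q3 for one position on the
 * edge. Only p1, p0, q0, q1 (px[2..5]) are rewritten. */
static void lpf4_scalar_sketch(uint8_t px[8], uint8_t blimit, uint8_t limit,
                               uint8_t thresh) {
  const int p3 = px[0], p2 = px[1], p1 = px[2], p0 = px[3];
  const int q0 = px[4], q1 = px[5], q2 = px[6], q3 = px[7];

  /* The NEON code builds this term with saturating unsigned adds. */
  int edge = abs(p0 - q0) * 2 + abs(p1 - q1) / 2;
  if (edge > 255) edge = 255;

  /* Filter mask: every neighbour difference within limit, edge within blimit. */
  const int mask = abs(p3 - p2) <= limit && abs(p2 - p1) <= limit &&
                   abs(p1 - p0) <= limit && abs(q1 - q0) <= limit &&
                   abs(q2 - q1) <= limit && abs(q3 - q2) <= limit &&
                   edge <= blimit;
  /* High edge variance flag. */
  const int hev = abs(p1 - p0) > thresh || abs(q1 - q0) > thresh;

  /* x ^ 0x80 reinterpreted as int8 equals x - 128. */
  const int ps1 = p1 - 128, ps0 = p0 - 128, qs0 = q0 - 128, qs1 = q1 - 128;

  int f = hev ? sat_s8(ps1 - qs1) : 0;          /* outer taps only if hev */
  f = mask ? sat_s8(f + 3 * (qs0 - ps0)) : 0;   /* inner taps, then mask  */
  const int f1 = asr(sat_s8(f + 4), 3);
  const int f2 = asr(sat_s8(f + 3), 3);
  const int fo = hev ? 0 : asr(f1 + 1, 1);      /* outer adjustment, ~hev */

  px[2] = (uint8_t)(sat_s8(ps1 + fo) + 128);
  px[3] = (uint8_t)(sat_s8(ps0 + f2) + 128);
  px[4] = (uint8_t)(sat_s8(qs0 - f1) + 128);
  px[5] = (uint8_t)(sat_s8(qs1 - fo) + 128);
}
```

The NEON version runs the same arithmetic on eight positions at once, one per lane of each uint8x8_t argument.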


@ -1,23 +1,26 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
EXPORT |vpx_lpf_horizontal_8_neon|
EXPORT |vpx_lpf_vertical_8_neon|
;
EXPORT |aom_lpf_horizontal_8_neon|
EXPORT |aom_lpf_vertical_8_neon|
ARM
AREA ||.text||, CODE, READONLY, ALIGN=2
; Currently vpx only works on iterations 8 at a time. The vp8 loop filter
; Currently aom only works on iterations 8 at a time. The aom loop filter
; works on 16 iterations at a time.
;
; void vpx_lpf_horizontal_8_neon(uint8_t *s, int p,
; void aom_lpf_horizontal_8_neon(uint8_t *s, int p,
; const uint8_t *blimit,
; const uint8_t *limit,
; const uint8_t *thresh)
@ -26,7 +29,7 @@
; r2 const uint8_t *blimit,
; r3 const uint8_t *limit,
; sp const uint8_t *thresh,
|vpx_lpf_horizontal_8_neon| PROC
|aom_lpf_horizontal_8_neon| PROC
push {r4-r5, lr}
vld1.8 {d0[]}, [r2] ; duplicate *blimit
@ -51,7 +54,7 @@
sub r3, r3, r1, lsl #1
sub r2, r2, r1, lsl #2
bl vpx_mbloop_filter_neon
bl aom_mbloop_filter_neon
vst1.u8 {d0}, [r2@64], r1 ; store op2
vst1.u8 {d1}, [r3@64], r1 ; store op1
@ -62,9 +65,9 @@
pop {r4-r5, pc}
ENDP ; |vpx_lpf_horizontal_8_neon|
ENDP ; |aom_lpf_horizontal_8_neon|
; void vpx_lpf_vertical_8_neon(uint8_t *s,
; void aom_lpf_vertical_8_neon(uint8_t *s,
; int pitch,
; const uint8_t *blimit,
; const uint8_t *limit,
@ -75,7 +78,7 @@
; r2 const uint8_t *blimit,
; r3 const uint8_t *limit,
; sp const uint8_t *thresh,
|vpx_lpf_vertical_8_neon| PROC
|aom_lpf_vertical_8_neon| PROC
push {r4-r5, lr}
vld1.8 {d0[]}, [r2] ; duplicate *blimit
@ -114,7 +117,7 @@
sub r2, r0, #3
add r3, r0, #1
bl vpx_mbloop_filter_neon
bl aom_mbloop_filter_neon
;store op2, op1, op0, oq0
vst4.8 {d0[0], d1[0], d2[0], d3[0]}, [r2], r1
@ -137,9 +140,9 @@
vst2.8 {d4[7], d5[7]}, [r3]
pop {r4-r5, pc}
ENDP ; |vpx_lpf_vertical_8_neon|
ENDP ; |aom_lpf_vertical_8_neon|
; void vpx_mbloop_filter_neon();
; void aom_mbloop_filter_neon();
; This is a helper function for the loopfilters. The individual functions do the
; necessary load, transpose (if necessary) and store. The function does not use
; registers d8-d15.
@ -165,7 +168,7 @@
; d3 oq0
; d4 oq1
; d5 oq2
|vpx_mbloop_filter_neon| PROC
|aom_mbloop_filter_neon| PROC
; filter_mask
vabd.u8 d19, d3, d4 ; m1 = abs(p3 - p2)
vabd.u8 d20, d4, d5 ; m2 = abs(p2 - p1)
@ -420,6 +423,6 @@ filter_branch_only
bx lr
ENDP ; |vpx_mbloop_filter_neon|
ENDP ; |aom_mbloop_filter_neon|
END


@ -0,0 +1,430 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_dsp_rtcd.h"
static INLINE void mbloop_filter_neon(uint8x8_t dblimit, // mblimit
uint8x8_t dlimit, // limit
uint8x8_t dthresh, // thresh
uint8x8_t d3u8, // p3
uint8x8_t d4u8, // p2
uint8x8_t d5u8, // p1
uint8x8_t d6u8, // p0
uint8x8_t d7u8, // q0
uint8x8_t d16u8, // q1
uint8x8_t d17u8, // q2
uint8x8_t d18u8, // q3
uint8x8_t *d0ru8, // p2
uint8x8_t *d1ru8, // p1
uint8x8_t *d2ru8, // p0
uint8x8_t *d3ru8, // q0
uint8x8_t *d4ru8, // q1
uint8x8_t *d5ru8) { // q2
uint32_t flat;
uint8x8_t d0u8, d1u8, d2u8, d19u8, d20u8, d21u8, d22u8, d23u8, d24u8;
uint8x8_t d25u8, d26u8, d27u8, d28u8, d29u8, d30u8, d31u8;
int16x8_t q15s16;
uint16x8_t q10u16, q14u16;
int8x8_t d21s8, d24s8, d25s8, d26s8, d28s8, d29s8, d30s8;
d19u8 = vabd_u8(d3u8, d4u8);
d20u8 = vabd_u8(d4u8, d5u8);
d21u8 = vabd_u8(d5u8, d6u8);
d22u8 = vabd_u8(d16u8, d7u8);
d23u8 = vabd_u8(d17u8, d16u8);
d24u8 = vabd_u8(d18u8, d17u8);
d19u8 = vmax_u8(d19u8, d20u8);
d20u8 = vmax_u8(d21u8, d22u8);
d25u8 = vabd_u8(d6u8, d4u8);
d23u8 = vmax_u8(d23u8, d24u8);
d26u8 = vabd_u8(d7u8, d17u8);
d19u8 = vmax_u8(d19u8, d20u8);
d24u8 = vabd_u8(d6u8, d7u8);
d27u8 = vabd_u8(d3u8, d6u8);
d28u8 = vabd_u8(d18u8, d7u8);
d19u8 = vmax_u8(d19u8, d23u8);
d23u8 = vabd_u8(d5u8, d16u8);
d24u8 = vqadd_u8(d24u8, d24u8);
d19u8 = vcge_u8(dlimit, d19u8);
d25u8 = vmax_u8(d25u8, d26u8);
d26u8 = vmax_u8(d27u8, d28u8);
d23u8 = vshr_n_u8(d23u8, 1);
d25u8 = vmax_u8(d25u8, d26u8);
d24u8 = vqadd_u8(d24u8, d23u8);
d20u8 = vmax_u8(d20u8, d25u8);
d23u8 = vdup_n_u8(1);
d24u8 = vcge_u8(dblimit, d24u8);
d21u8 = vcgt_u8(d21u8, dthresh);
d20u8 = vcge_u8(d23u8, d20u8);
d19u8 = vand_u8(d19u8, d24u8);
d23u8 = vcgt_u8(d22u8, dthresh);
d20u8 = vand_u8(d20u8, d19u8);
d22u8 = vdup_n_u8(0x80);
d23u8 = vorr_u8(d21u8, d23u8);
q10u16 = vcombine_u16(vreinterpret_u16_u8(d20u8), vreinterpret_u16_u8(d21u8));
d30u8 = vshrn_n_u16(q10u16, 4);
flat = vget_lane_u32(vreinterpret_u32_u8(d30u8), 0);
if (flat == 0xffffffff) { // Check for all 1's, power_branch_only
d27u8 = vdup_n_u8(3);
d21u8 = vdup_n_u8(2);
q14u16 = vaddl_u8(d6u8, d7u8);
q14u16 = vmlal_u8(q14u16, d3u8, d27u8);
q14u16 = vmlal_u8(q14u16, d4u8, d21u8);
q14u16 = vaddw_u8(q14u16, d5u8);
*d0ru8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d3u8);
q14u16 = vsubw_u8(q14u16, d4u8);
q14u16 = vaddw_u8(q14u16, d5u8);
q14u16 = vaddw_u8(q14u16, d16u8);
*d1ru8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d3u8);
q14u16 = vsubw_u8(q14u16, d5u8);
q14u16 = vaddw_u8(q14u16, d6u8);
q14u16 = vaddw_u8(q14u16, d17u8);
*d2ru8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d3u8);
q14u16 = vsubw_u8(q14u16, d6u8);
q14u16 = vaddw_u8(q14u16, d7u8);
q14u16 = vaddw_u8(q14u16, d18u8);
*d3ru8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d4u8);
q14u16 = vsubw_u8(q14u16, d7u8);
q14u16 = vaddw_u8(q14u16, d16u8);
q14u16 = vaddw_u8(q14u16, d18u8);
*d4ru8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d5u8);
q14u16 = vsubw_u8(q14u16, d16u8);
q14u16 = vaddw_u8(q14u16, d17u8);
q14u16 = vaddw_u8(q14u16, d18u8);
*d5ru8 = vqrshrn_n_u16(q14u16, 3);
} else {
d21u8 = veor_u8(d7u8, d22u8);
d24u8 = veor_u8(d6u8, d22u8);
d25u8 = veor_u8(d5u8, d22u8);
d26u8 = veor_u8(d16u8, d22u8);
d27u8 = vdup_n_u8(3);
d28s8 = vsub_s8(vreinterpret_s8_u8(d21u8), vreinterpret_s8_u8(d24u8));
d29s8 = vqsub_s8(vreinterpret_s8_u8(d25u8), vreinterpret_s8_u8(d26u8));
q15s16 = vmull_s8(d28s8, vreinterpret_s8_u8(d27u8));
d29s8 = vand_s8(d29s8, vreinterpret_s8_u8(d23u8));
q15s16 = vaddw_s8(q15s16, d29s8);
d29u8 = vdup_n_u8(4);
d28s8 = vqmovn_s16(q15s16);
d28s8 = vand_s8(d28s8, vreinterpret_s8_u8(d19u8));
d30s8 = vqadd_s8(d28s8, vreinterpret_s8_u8(d27u8));
d29s8 = vqadd_s8(d28s8, vreinterpret_s8_u8(d29u8));
d30s8 = vshr_n_s8(d30s8, 3);
d29s8 = vshr_n_s8(d29s8, 3);
d24s8 = vqadd_s8(vreinterpret_s8_u8(d24u8), d30s8);
d21s8 = vqsub_s8(vreinterpret_s8_u8(d21u8), d29s8);
d29s8 = vrshr_n_s8(d29s8, 1);
d29s8 = vbic_s8(d29s8, vreinterpret_s8_u8(d23u8));
d25s8 = vqadd_s8(vreinterpret_s8_u8(d25u8), d29s8);
d26s8 = vqsub_s8(vreinterpret_s8_u8(d26u8), d29s8);
if (flat == 0) { // filter_branch_only
*d0ru8 = d4u8;
*d1ru8 = veor_u8(vreinterpret_u8_s8(d25s8), d22u8);
*d2ru8 = veor_u8(vreinterpret_u8_s8(d24s8), d22u8);
*d3ru8 = veor_u8(vreinterpret_u8_s8(d21s8), d22u8);
*d4ru8 = veor_u8(vreinterpret_u8_s8(d26s8), d22u8);
*d5ru8 = d17u8;
return;
}
d21u8 = veor_u8(vreinterpret_u8_s8(d21s8), d22u8);
d24u8 = veor_u8(vreinterpret_u8_s8(d24s8), d22u8);
d25u8 = veor_u8(vreinterpret_u8_s8(d25s8), d22u8);
d26u8 = veor_u8(vreinterpret_u8_s8(d26s8), d22u8);
d23u8 = vdup_n_u8(2);
q14u16 = vaddl_u8(d6u8, d7u8);
q14u16 = vmlal_u8(q14u16, d3u8, d27u8);
q14u16 = vmlal_u8(q14u16, d4u8, d23u8);
d0u8 = vbsl_u8(d20u8, dblimit, d4u8);
q14u16 = vaddw_u8(q14u16, d5u8);
d1u8 = vbsl_u8(d20u8, dlimit, d25u8);
d30u8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d3u8);
q14u16 = vsubw_u8(q14u16, d4u8);
q14u16 = vaddw_u8(q14u16, d5u8);
q14u16 = vaddw_u8(q14u16, d16u8);
d2u8 = vbsl_u8(d20u8, dthresh, d24u8);
d31u8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d3u8);
q14u16 = vsubw_u8(q14u16, d5u8);
q14u16 = vaddw_u8(q14u16, d6u8);
q14u16 = vaddw_u8(q14u16, d17u8);
*d0ru8 = vbsl_u8(d20u8, d30u8, d0u8);
d23u8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d3u8);
q14u16 = vsubw_u8(q14u16, d6u8);
q14u16 = vaddw_u8(q14u16, d7u8);
*d1ru8 = vbsl_u8(d20u8, d31u8, d1u8);
q14u16 = vaddw_u8(q14u16, d18u8);
*d2ru8 = vbsl_u8(d20u8, d23u8, d2u8);
d22u8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d4u8);
q14u16 = vsubw_u8(q14u16, d7u8);
q14u16 = vaddw_u8(q14u16, d16u8);
d3u8 = vbsl_u8(d20u8, d3u8, d21u8);
q14u16 = vaddw_u8(q14u16, d18u8);
d4u8 = vbsl_u8(d20u8, d4u8, d26u8);
d6u8 = vqrshrn_n_u16(q14u16, 3);
q14u16 = vsubw_u8(q14u16, d5u8);
q14u16 = vsubw_u8(q14u16, d16u8);
q14u16 = vaddw_u8(q14u16, d17u8);
q14u16 = vaddw_u8(q14u16, d18u8);
d5u8 = vbsl_u8(d20u8, d5u8, d17u8);
d7u8 = vqrshrn_n_u16(q14u16, 3);
*d3ru8 = vbsl_u8(d20u8, d22u8, d3u8);
*d4ru8 = vbsl_u8(d20u8, d6u8, d4u8);
*d5ru8 = vbsl_u8(d20u8, d7u8, d5u8);
}
return;
}
void aom_lpf_horizontal_8_neon(uint8_t *src, int pitch, const uint8_t *blimit,
const uint8_t *limit, const uint8_t *thresh) {
int i;
uint8_t *s, *psrc;
uint8x8_t dblimit, dlimit, dthresh;
uint8x8_t d0u8, d1u8, d2u8, d3u8, d4u8, d5u8, d6u8, d7u8;
uint8x8_t d16u8, d17u8, d18u8;
dblimit = vld1_u8(blimit);
dlimit = vld1_u8(limit);
dthresh = vld1_u8(thresh);
psrc = src - (pitch << 2);
for (i = 0; i < 1; i++) {
s = psrc + i * 8;
d3u8 = vld1_u8(s);
s += pitch;
d4u8 = vld1_u8(s);
s += pitch;
d5u8 = vld1_u8(s);
s += pitch;
d6u8 = vld1_u8(s);
s += pitch;
d7u8 = vld1_u8(s);
s += pitch;
d16u8 = vld1_u8(s);
s += pitch;
d17u8 = vld1_u8(s);
s += pitch;
d18u8 = vld1_u8(s);
mbloop_filter_neon(dblimit, dlimit, dthresh, d3u8, d4u8, d5u8, d6u8, d7u8,
d16u8, d17u8, d18u8, &d0u8, &d1u8, &d2u8, &d3u8, &d4u8,
&d5u8);
s -= (pitch * 6);
vst1_u8(s, d0u8);
s += pitch;
vst1_u8(s, d1u8);
s += pitch;
vst1_u8(s, d2u8);
s += pitch;
vst1_u8(s, d3u8);
s += pitch;
vst1_u8(s, d4u8);
s += pitch;
vst1_u8(s, d5u8);
}
return;
}
void aom_lpf_vertical_8_neon(uint8_t *src, int pitch, const uint8_t *blimit,
const uint8_t *limit, const uint8_t *thresh) {
int i;
uint8_t *s;
uint8x8_t dblimit, dlimit, dthresh;
uint8x8_t d0u8, d1u8, d2u8, d3u8, d4u8, d5u8, d6u8, d7u8;
uint8x8_t d16u8, d17u8, d18u8;
uint32x2x2_t d2tmp0, d2tmp1, d2tmp2, d2tmp3;
uint16x4x2_t d2tmp4, d2tmp5, d2tmp6, d2tmp7;
uint8x8x2_t d2tmp8, d2tmp9, d2tmp10, d2tmp11;
uint8x8x4_t d4Result;
uint8x8x2_t d2Result;
dblimit = vld1_u8(blimit);
dlimit = vld1_u8(limit);
dthresh = vld1_u8(thresh);
for (i = 0; i < 1; i++) {
s = src + (i * (pitch << 3)) - 4;
d3u8 = vld1_u8(s);
s += pitch;
d4u8 = vld1_u8(s);
s += pitch;
d5u8 = vld1_u8(s);
s += pitch;
d6u8 = vld1_u8(s);
s += pitch;
d7u8 = vld1_u8(s);
s += pitch;
d16u8 = vld1_u8(s);
s += pitch;
d17u8 = vld1_u8(s);
s += pitch;
d18u8 = vld1_u8(s);
d2tmp0 = vtrn_u32(vreinterpret_u32_u8(d3u8), vreinterpret_u32_u8(d7u8));
d2tmp1 = vtrn_u32(vreinterpret_u32_u8(d4u8), vreinterpret_u32_u8(d16u8));
d2tmp2 = vtrn_u32(vreinterpret_u32_u8(d5u8), vreinterpret_u32_u8(d17u8));
d2tmp3 = vtrn_u32(vreinterpret_u32_u8(d6u8), vreinterpret_u32_u8(d18u8));
d2tmp4 = vtrn_u16(vreinterpret_u16_u32(d2tmp0.val[0]),
vreinterpret_u16_u32(d2tmp2.val[0]));
d2tmp5 = vtrn_u16(vreinterpret_u16_u32(d2tmp1.val[0]),
vreinterpret_u16_u32(d2tmp3.val[0]));
d2tmp6 = vtrn_u16(vreinterpret_u16_u32(d2tmp0.val[1]),
vreinterpret_u16_u32(d2tmp2.val[1]));
d2tmp7 = vtrn_u16(vreinterpret_u16_u32(d2tmp1.val[1]),
vreinterpret_u16_u32(d2tmp3.val[1]));
d2tmp8 = vtrn_u8(vreinterpret_u8_u16(d2tmp4.val[0]),
vreinterpret_u8_u16(d2tmp5.val[0]));
d2tmp9 = vtrn_u8(vreinterpret_u8_u16(d2tmp4.val[1]),
vreinterpret_u8_u16(d2tmp5.val[1]));
d2tmp10 = vtrn_u8(vreinterpret_u8_u16(d2tmp6.val[0]),
vreinterpret_u8_u16(d2tmp7.val[0]));
d2tmp11 = vtrn_u8(vreinterpret_u8_u16(d2tmp6.val[1]),
vreinterpret_u8_u16(d2tmp7.val[1]));
d3u8 = d2tmp8.val[0];
d4u8 = d2tmp8.val[1];
d5u8 = d2tmp9.val[0];
d6u8 = d2tmp9.val[1];
d7u8 = d2tmp10.val[0];
d16u8 = d2tmp10.val[1];
d17u8 = d2tmp11.val[0];
d18u8 = d2tmp11.val[1];
mbloop_filter_neon(dblimit, dlimit, dthresh, d3u8, d4u8, d5u8, d6u8, d7u8,
d16u8, d17u8, d18u8, &d0u8, &d1u8, &d2u8, &d3u8, &d4u8,
&d5u8);
d4Result.val[0] = d0u8;
d4Result.val[1] = d1u8;
d4Result.val[2] = d2u8;
d4Result.val[3] = d3u8;
d2Result.val[0] = d4u8;
d2Result.val[1] = d5u8;
s = src - 3;
vst4_lane_u8(s, d4Result, 0);
s += pitch;
vst4_lane_u8(s, d4Result, 1);
s += pitch;
vst4_lane_u8(s, d4Result, 2);
s += pitch;
vst4_lane_u8(s, d4Result, 3);
s += pitch;
vst4_lane_u8(s, d4Result, 4);
s += pitch;
vst4_lane_u8(s, d4Result, 5);
s += pitch;
vst4_lane_u8(s, d4Result, 6);
s += pitch;
vst4_lane_u8(s, d4Result, 7);
s = src + 1;
vst2_lane_u8(s, d2Result, 0);
s += pitch;
vst2_lane_u8(s, d2Result, 1);
s += pitch;
vst2_lane_u8(s, d2Result, 2);
s += pitch;
vst2_lane_u8(s, d2Result, 3);
s += pitch;
vst2_lane_u8(s, d2Result, 4);
s += pitch;
vst2_lane_u8(s, d2Result, 5);
s += pitch;
vst2_lane_u8(s, d2Result, 6);
s += pitch;
vst2_lane_u8(s, d2Result, 7);
}
return;
}
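
The flat branch of mbloop_filter_neon above keeps one running sum in q14u16 and, for each output, drops two taps and adds two new ones before the rounding shift by 3. A minimal scalar sketch of that sliding-window pattern follows; the function name, the index tables, and the in[]/out[] layout are assumptions made for illustration and are not part of this change.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the diff.
 * in[0..7]  = p3, p2, p1, p0, q0, q1, q2, q3
 * out[0..5] = op2, op1, op0, oq0, oq1, oq2 (the fully flat case only). */
static void filter8_running_sum_sketch(const uint8_t in[8], uint8_t out[6]) {
  /* Indices of the taps removed from and added to the sum at each step,
   * in the same order as the vsubw_u8 / vaddw_u8 calls. */
  static const int drop_a[5] = { 0, 0, 0, 1, 2 };
  static const int drop_b[5] = { 1, 2, 3, 4, 5 };
  static const int add_a[5] = { 2, 3, 4, 5, 6 };
  static const int add_b[5] = { 5, 6, 7, 7, 7 };
  int k;

  /* op2 = round((3*p3 + 2*p2 + p1 + p0 + q0) / 8) */
  int sum = 3 * in[0] + 2 * in[1] + in[2] + in[3] + in[4];
  out[0] = (uint8_t)((sum + 4) >> 3);

  for (k = 0; k < 5; ++k) {
    sum += in[add_a[k]] + in[add_b[k]] - in[drop_a[k]] - in[drop_b[k]];
    out[k + 1] = (uint8_t)((sum + 4) >> 3); /* rounding shift, as vqrshrn */
  }
}
```

Because the eight weights always sum to 8, the rounded result never exceeds 255, so the saturation in vqrshrn_n_u16 should not trigger on this path.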


@ -1,16 +1,19 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
EXPORT |vpx_lpf_horizontal_edge_8_neon|
EXPORT |vpx_lpf_horizontal_edge_16_neon|
EXPORT |vpx_lpf_vertical_16_neon|
;
EXPORT |aom_lpf_horizontal_edge_8_neon|
EXPORT |aom_lpf_horizontal_edge_16_neon|
EXPORT |aom_lpf_vertical_16_neon|
ARM
AREA ||.text||, CODE, READONLY, ALIGN=2
@ -55,7 +58,7 @@ h_count
vld1.u8 {d14}, [r8@64], r1 ; q6
vld1.u8 {d15}, [r8@64], r1 ; q7
bl vpx_wide_mbfilter_neon
bl aom_wide_mbfilter_neon
tst r7, #1
beq h_mbfilter
@ -118,7 +121,7 @@ h_next
ENDP ; |mb_lpf_horizontal_edge|
; void vpx_lpf_horizontal_edge_8_neon(uint8_t *s, int pitch,
; void aom_lpf_horizontal_edge_8_neon(uint8_t *s, int pitch,
; const uint8_t *blimit,
; const uint8_t *limit,
; const uint8_t *thresh)
@ -127,12 +130,12 @@ h_next
; r2 const uint8_t *blimit,
; r3 const uint8_t *limit,
; sp const uint8_t *thresh
|vpx_lpf_horizontal_edge_8_neon| PROC
|aom_lpf_horizontal_edge_8_neon| PROC
mov r12, #1
b mb_lpf_horizontal_edge
ENDP ; |vpx_lpf_horizontal_edge_8_neon|
ENDP ; |aom_lpf_horizontal_edge_8_neon|
; void vpx_lpf_horizontal_edge_16_neon(uint8_t *s, int pitch,
; void aom_lpf_horizontal_edge_16_neon(uint8_t *s, int pitch,
; const uint8_t *blimit,
; const uint8_t *limit,
; const uint8_t *thresh)
@ -141,12 +144,12 @@ h_next
; r2 const uint8_t *blimit,
; r3 const uint8_t *limit,
; sp const uint8_t *thresh
|vpx_lpf_horizontal_edge_16_neon| PROC
|aom_lpf_horizontal_edge_16_neon| PROC
mov r12, #2
b mb_lpf_horizontal_edge
ENDP ; |vpx_lpf_horizontal_edge_16_neon|
ENDP ; |aom_lpf_horizontal_edge_16_neon|
; void vpx_lpf_vertical_16_neon(uint8_t *s, int p,
; void aom_lpf_vertical_16_neon(uint8_t *s, int p,
; const uint8_t *blimit,
; const uint8_t *limit,
; const uint8_t *thresh)
@ -155,7 +158,7 @@ h_next
; r2 const uint8_t *blimit,
; r3 const uint8_t *limit,
; sp const uint8_t *thresh,
|vpx_lpf_vertical_16_neon| PROC
|aom_lpf_vertical_16_neon| PROC
push {r4-r8, lr}
vpush {d8-d15}
ldr r4, [sp, #88] ; load thresh
@ -205,7 +208,7 @@ h_next
vtrn.8 d12, d13
vtrn.8 d14, d15
bl vpx_wide_mbfilter_neon
bl aom_wide_mbfilter_neon
tst r7, #1
beq v_mbfilter
@ -308,9 +311,9 @@ v_end
vpop {d8-d15}
pop {r4-r8, pc}
ENDP ; |vpx_lpf_vertical_16_neon|
ENDP ; |aom_lpf_vertical_16_neon|
; void vpx_wide_mbfilter_neon();
; void aom_wide_mbfilter_neon();
; This is a helper function for the loopfilters. The individual functions do the
; necessary load, transpose (if necessary) and store.
;
@ -334,7 +337,7 @@ v_end
; d13 q5
; d14 q6
; d15 q7
|vpx_wide_mbfilter_neon| PROC
|aom_wide_mbfilter_neon| PROC
mov r7, #0
; filter_mask
@ -630,6 +633,6 @@ v_end
vbif d3, d14, d17 ; oq6 |= q6 & ~(f2 & f & m)
bx lr
ENDP ; |vpx_wide_mbfilter_neon|
ENDP ; |aom_wide_mbfilter_neon|
END


@ -0,0 +1,49 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_dsp_rtcd.h"
#include "./aom_config.h"
#include "aom/aom_integer.h"
void aom_lpf_vertical_4_dual_neon(uint8_t *s, int p, const uint8_t *blimit0,
const uint8_t *limit0, const uint8_t *thresh0,
const uint8_t *blimit1, const uint8_t *limit1,
const uint8_t *thresh1) {
aom_lpf_vertical_4_neon(s, p, blimit0, limit0, thresh0);
aom_lpf_vertical_4_neon(s + 8 * p, p, blimit1, limit1, thresh1);
}
#if HAVE_NEON_ASM
void aom_lpf_horizontal_8_dual_neon(
uint8_t *s, int p /* pitch */, const uint8_t *blimit0,
const uint8_t *limit0, const uint8_t *thresh0, const uint8_t *blimit1,
const uint8_t *limit1, const uint8_t *thresh1) {
aom_lpf_horizontal_8_neon(s, p, blimit0, limit0, thresh0);
aom_lpf_horizontal_8_neon(s + 8, p, blimit1, limit1, thresh1);
}
void aom_lpf_vertical_8_dual_neon(uint8_t *s, int p, const uint8_t *blimit0,
const uint8_t *limit0, const uint8_t *thresh0,
const uint8_t *blimit1, const uint8_t *limit1,
const uint8_t *thresh1) {
aom_lpf_vertical_8_neon(s, p, blimit0, limit0, thresh0);
aom_lpf_vertical_8_neon(s + 8 * p, p, blimit1, limit1, thresh1);
}
void aom_lpf_vertical_16_dual_neon(uint8_t *s, int p, const uint8_t *blimit,
const uint8_t *limit,
const uint8_t *thresh) {
aom_lpf_vertical_16_neon(s, p, blimit, limit, thresh);
aom_lpf_vertical_16_neon(s + 8 * p, p, blimit, limit, thresh);
}
#endif // HAVE_NEON_ASM


@ -1,25 +1,26 @@
/*
* Copyright (c) 2015 The WebM project authors. All Rights Reserved.
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./vpx_config.h"
#include "./vpx_dsp_rtcd.h"
#include "vpx/vpx_integer.h"
#include "./aom_config.h"
#include "./aom_dsp_rtcd.h"
#include "aom/aom_integer.h"
static INLINE unsigned int horizontal_long_add_16x8(const uint16x8_t vec_lo,
const uint16x8_t vec_hi) {
const uint32x4_t vec_l_lo = vaddl_u16(vget_low_u16(vec_lo),
vget_high_u16(vec_lo));
const uint32x4_t vec_l_hi = vaddl_u16(vget_low_u16(vec_hi),
vget_high_u16(vec_hi));
const uint32x4_t vec_l_lo =
vaddl_u16(vget_low_u16(vec_lo), vget_high_u16(vec_lo));
const uint32x4_t vec_l_hi =
vaddl_u16(vget_low_u16(vec_hi), vget_high_u16(vec_hi));
const uint32x4_t a = vaddq_u32(vec_l_lo, vec_l_hi);
const uint64x2_t b = vpaddlq_u32(a);
const uint32x2_t c = vadd_u32(vreinterpret_u32_u64(vget_low_u64(b)),
@ -33,8 +34,7 @@ static INLINE unsigned int horizontal_long_add_16x8(const uint16x8_t vec_lo,
static void sad_neon_64(const uint8x16_t vec_src_00,
const uint8x16_t vec_src_16,
const uint8x16_t vec_src_32,
const uint8x16_t vec_src_48,
const uint8_t *ref,
const uint8x16_t vec_src_48, const uint8_t *ref,
uint16x8_t *vec_sum_ref_lo,
uint16x8_t *vec_sum_ref_hi) {
const uint8x16_t vec_ref_00 = vld1q_u8(ref);
@ -63,8 +63,7 @@ static void sad_neon_64(const uint8x16_t vec_src_00,
// Calculate the absolute difference of 32 bytes from vec_src_00, vec_src_16,
// and ref. Accumulate partial sums in vec_sum_ref_lo and vec_sum_ref_hi.
static void sad_neon_32(const uint8x16_t vec_src_00,
const uint8x16_t vec_src_16,
const uint8_t *ref,
const uint8x16_t vec_src_16, const uint8_t *ref,
uint16x8_t *vec_sum_ref_lo,
uint16x8_t *vec_sum_ref_hi) {
const uint8x16_t vec_ref_00 = vld1q_u8(ref);
@ -80,7 +79,7 @@ static void sad_neon_32(const uint8x16_t vec_src_00,
vget_high_u8(vec_ref_16));
}
void vpx_sad64x64x4d_neon(const uint8_t *src, int src_stride,
void aom_sad64x64x4d_neon(const uint8_t *src, int src_stride,
const uint8_t *const ref[4], int ref_stride,
uint32_t *res) {
int i;
@ -126,7 +125,7 @@ void vpx_sad64x64x4d_neon(const uint8_t *src, int src_stride,
res[3] = horizontal_long_add_16x8(vec_sum_ref3_lo, vec_sum_ref3_hi);
}
void vpx_sad32x32x4d_neon(const uint8_t *src, int src_stride,
void aom_sad32x32x4d_neon(const uint8_t *src, int src_stride,
const uint8_t *const ref[4], int ref_stride,
uint32_t *res) {
int i;
@ -148,14 +147,14 @@ void vpx_sad32x32x4d_neon(const uint8_t *src, int src_stride,
const uint8x16_t vec_src_00 = vld1q_u8(src);
const uint8x16_t vec_src_16 = vld1q_u8(src + 16);
sad_neon_32(vec_src_00, vec_src_16, ref0,
&vec_sum_ref0_lo, &vec_sum_ref0_hi);
sad_neon_32(vec_src_00, vec_src_16, ref1,
&vec_sum_ref1_lo, &vec_sum_ref1_hi);
sad_neon_32(vec_src_00, vec_src_16, ref2,
&vec_sum_ref2_lo, &vec_sum_ref2_hi);
sad_neon_32(vec_src_00, vec_src_16, ref3,
&vec_sum_ref3_lo, &vec_sum_ref3_hi);
sad_neon_32(vec_src_00, vec_src_16, ref0, &vec_sum_ref0_lo,
&vec_sum_ref0_hi);
sad_neon_32(vec_src_00, vec_src_16, ref1, &vec_sum_ref1_lo,
&vec_sum_ref1_hi);
sad_neon_32(vec_src_00, vec_src_16, ref2, &vec_sum_ref2_lo,
&vec_sum_ref2_hi);
sad_neon_32(vec_src_00, vec_src_16, ref3, &vec_sum_ref3_lo,
&vec_sum_ref3_hi);
src += src_stride;
ref0 += ref_stride;
@ -170,7 +169,7 @@ void vpx_sad32x32x4d_neon(const uint8_t *src, int src_stride,
res[3] = horizontal_long_add_16x8(vec_sum_ref3_lo, vec_sum_ref3_hi);
}
void vpx_sad16x16x4d_neon(const uint8_t *src, int src_stride,
void aom_sad16x16x4d_neon(const uint8_t *src, int src_stride,
const uint8_t *const ref[4], int ref_stride,
uint32_t *res) {
int i;
@ -195,20 +194,20 @@ void vpx_sad16x16x4d_neon(const uint8_t *src, int src_stride,
const uint8x16_t vec_ref2 = vld1q_u8(ref2);
const uint8x16_t vec_ref3 = vld1q_u8(ref3);
vec_sum_ref0_lo = vabal_u8(vec_sum_ref0_lo, vget_low_u8(vec_src),
vget_low_u8(vec_ref0));
vec_sum_ref0_lo =
vabal_u8(vec_sum_ref0_lo, vget_low_u8(vec_src), vget_low_u8(vec_ref0));
vec_sum_ref0_hi = vabal_u8(vec_sum_ref0_hi, vget_high_u8(vec_src),
vget_high_u8(vec_ref0));
vec_sum_ref1_lo = vabal_u8(vec_sum_ref1_lo, vget_low_u8(vec_src),
vget_low_u8(vec_ref1));
vec_sum_ref1_lo =
vabal_u8(vec_sum_ref1_lo, vget_low_u8(vec_src), vget_low_u8(vec_ref1));
vec_sum_ref1_hi = vabal_u8(vec_sum_ref1_hi, vget_high_u8(vec_src),
vget_high_u8(vec_ref1));
vec_sum_ref2_lo = vabal_u8(vec_sum_ref2_lo, vget_low_u8(vec_src),
vget_low_u8(vec_ref2));
vec_sum_ref2_lo =
vabal_u8(vec_sum_ref2_lo, vget_low_u8(vec_src), vget_low_u8(vec_ref2));
vec_sum_ref2_hi = vabal_u8(vec_sum_ref2_hi, vget_high_u8(vec_src),
vget_high_u8(vec_ref2));
vec_sum_ref3_lo = vabal_u8(vec_sum_ref3_lo, vget_low_u8(vec_src),
vget_low_u8(vec_ref3));
vec_sum_ref3_lo =
vabal_u8(vec_sum_ref3_lo, vget_low_u8(vec_src), vget_low_u8(vec_ref3));
vec_sum_ref3_hi = vabal_u8(vec_sum_ref3_hi, vget_high_u8(vec_src),
vget_high_u8(vec_ref3));
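
As a reference point for the x4d functions in this file, here is a plain scalar sketch of the quantity they compute: the sum of absolute differences between one source block and each of four reference blocks. The function name and the explicit width/height parameters are assumptions made for illustration; the NEON versions fix the block size per function and accumulate in 16-bit lanes with vabal_u8 before widening in horizontal_long_add_16x8.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the diff: scalar SAD of one source
 * block against four reference blocks that share the same stride. */
static void sad_x4d_scalar_sketch(const uint8_t *src, int src_stride,
                                  const uint8_t *const ref[4], int ref_stride,
                                  int width, int height, uint32_t res[4]) {
  int k, r, c;
  for (k = 0; k < 4; ++k) {
    uint32_t sum = 0;
    for (r = 0; r < height; ++r) {
      for (c = 0; c < width; ++c) {
        sum += (uint32_t)abs(src[r * src_stride + c] -
                             ref[k][r * ref_stride + c]);
      }
    }
    res[k] = sum;
  }
}
```

A sketch like this can serve as a comparison oracle when testing the renamed aom_sadNxNx4d_neon entry points, since the rename should not change any result.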


@ -1,15 +1,18 @@
;
; Copyright (c) 2011 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
;
EXPORT |vpx_sad16x16_media|
EXPORT |aom_sad16x16_media|
ARM
REQUIRE8
@ -21,7 +24,7 @@
; r1 int src_stride
; r2 const unsigned char *ref_ptr
; r3 int ref_stride
|vpx_sad16x16_media| PROC
|aom_sad16x16_media| PROC
stmfd sp!, {r4-r12, lr}
pld [r0, r1, lsl #0]


@ -1,24 +1,22 @@
/*
* Copyright (c) 2014 The WebM project authors. All Rights Reserved.
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./vpx_config.h"
#include "./aom_config.h"
#include "vpx/vpx_integer.h"
#include "aom/aom_integer.h"
unsigned int vpx_sad8x16_neon(
unsigned char *src_ptr,
int src_stride,
unsigned char *ref_ptr,
int ref_stride) {
unsigned int aom_sad8x16_neon(unsigned char *src_ptr, int src_stride,
unsigned char *ref_ptr, int ref_stride) {
uint8x8_t d0, d8;
uint16x8_t q12;
uint32x4_t q1;
@ -48,11 +46,8 @@ unsigned int vpx_sad8x16_neon(
return vget_lane_u32(d5, 0);
}
unsigned int vpx_sad4x4_neon(
unsigned char *src_ptr,
int src_stride,
unsigned char *ref_ptr,
int ref_stride) {
unsigned int aom_sad4x4_neon(unsigned char *src_ptr, int src_stride,
unsigned char *ref_ptr, int ref_stride) {
uint8x8_t d0, d8;
uint16x8_t q12;
uint32x2_t d1;
@ -79,11 +74,8 @@ unsigned int vpx_sad4x4_neon(
return vget_lane_u32(vreinterpret_u32_u64(d3), 0);
}
unsigned int vpx_sad16x8_neon(
unsigned char *src_ptr,
int src_stride,
unsigned char *ref_ptr,
int ref_stride) {
unsigned int aom_sad16x8_neon(unsigned char *src_ptr, int src_stride,
unsigned char *ref_ptr, int ref_stride) {
uint8x16_t q0, q4;
uint16x8_t q12, q13;
uint32x4_t q1;
@ -118,10 +110,10 @@ unsigned int vpx_sad16x8_neon(
static INLINE unsigned int horizontal_long_add_16x8(const uint16x8_t vec_lo,
const uint16x8_t vec_hi) {
const uint32x4_t vec_l_lo = vaddl_u16(vget_low_u16(vec_lo),
vget_high_u16(vec_lo));
const uint32x4_t vec_l_hi = vaddl_u16(vget_low_u16(vec_hi),
vget_high_u16(vec_hi));
const uint32x4_t vec_l_lo =
vaddl_u16(vget_low_u16(vec_lo), vget_high_u16(vec_lo));
const uint32x4_t vec_l_hi =
vaddl_u16(vget_low_u16(vec_hi), vget_high_u16(vec_hi));
const uint32x4_t a = vaddq_u32(vec_l_lo, vec_l_hi);
const uint64x2_t b = vpaddlq_u32(a);
const uint32x2_t c = vadd_u32(vreinterpret_u32_u64(vget_low_u64(b)),
@ -136,7 +128,7 @@ static INLINE unsigned int horizontal_add_16x8(const uint16x8_t vec_16x8) {
return vget_lane_u32(c, 0);
}
unsigned int vpx_sad64x64_neon(const uint8_t *src, int src_stride,
unsigned int aom_sad64x64_neon(const uint8_t *src, int src_stride,
const uint8_t *ref, int ref_stride) {
int i;
uint16x8_t vec_accum_lo = vdupq_n_u16(0);
@ -172,7 +164,7 @@ unsigned int vpx_sad64x64_neon(const uint8_t *src, int src_stride,
return horizontal_long_add_16x8(vec_accum_lo, vec_accum_hi);
}
unsigned int vpx_sad32x32_neon(const uint8_t *src, int src_stride,
unsigned int aom_sad32x32_neon(const uint8_t *src, int src_stride,
const uint8_t *ref, int ref_stride) {
int i;
uint16x8_t vec_accum_lo = vdupq_n_u16(0);
@ -197,7 +189,7 @@ unsigned int vpx_sad32x32_neon(const uint8_t *src, int src_stride,
return horizontal_add_16x8(vaddq_u16(vec_accum_lo, vec_accum_hi));
}
unsigned int vpx_sad16x16_neon(const uint8_t *src, int src_stride,
unsigned int aom_sad16x16_neon(const uint8_t *src, int src_stride,
const uint8_t *ref, int ref_stride) {
int i;
uint16x8_t vec_accum_lo = vdupq_n_u16(0);
@ -208,15 +200,15 @@ unsigned int vpx_sad16x16_neon(const uint8_t *src, int src_stride,
const uint8x16_t vec_ref = vld1q_u8(ref);
src += src_stride;
ref += ref_stride;
vec_accum_lo = vabal_u8(vec_accum_lo, vget_low_u8(vec_src),
vget_low_u8(vec_ref));
vec_accum_hi = vabal_u8(vec_accum_hi, vget_high_u8(vec_src),
vget_high_u8(vec_ref));
vec_accum_lo =
vabal_u8(vec_accum_lo, vget_low_u8(vec_src), vget_low_u8(vec_ref));
vec_accum_hi =
vabal_u8(vec_accum_hi, vget_high_u8(vec_src), vget_high_u8(vec_ref));
}
return horizontal_add_16x8(vaddq_u16(vec_accum_lo, vec_accum_hi));
}
unsigned int vpx_sad8x8_neon(const uint8_t *src, int src_stride,
unsigned int aom_sad8x8_neon(const uint8_t *src, int src_stride,
const uint8_t *ref, int ref_stride) {
int i;
uint16x8_t vec_accum = vdupq_n_u16(0);
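For readers checking the renamed `aom_sad*_neon` kernels above, a scalar sum of absolute differences is a handy reference to compare against. The following is a minimal sketch, not code from this change: `sad_ref` and its `w`/`h` parameters are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference (not part of the diff): sum of absolute
 * differences over a w x h block of 8-bit pixels. */
static unsigned int sad_ref(const uint8_t *src, int src_stride,
                            const uint8_t *ref, int ref_stride, int w, int h) {
  unsigned int sad = 0;
  assert(w > 0 && h > 0);
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      /* uint8_t operands promote to int, so the difference may be negative. */
      sad += (unsigned int)abs(src[c] - ref[c]);
    }
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}
```

A NEON kernel such as `aom_sad16x16_neon` should return the same value as `sad_ref(src, src_stride, ref, ref_stride, 16, 16)` for the same inputs.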


@ -0,0 +1,39 @@
;
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
;
EXPORT |aom_push_neon|
EXPORT |aom_pop_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
|aom_push_neon| PROC
vst1.i64 {d8, d9, d10, d11}, [r0]!
vst1.i64 {d12, d13, d14, d15}, [r0]!
bx lr
ENDP
|aom_pop_neon| PROC
vld1.i64 {d8, d9, d10, d11}, [r0]!
vld1.i64 {d12, d13, d14, d15}, [r0]!
bx lr
ENDP
END


@ -0,0 +1,81 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include "./aom_config.h"
#include "./aom_dsp_rtcd.h"
#include "aom/aom_integer.h"
#include "aom_ports/mem.h"
#if HAVE_MEDIA
static const int16_t bilinear_filters_media[8][2] = { { 128, 0 }, { 112, 16 },
{ 96, 32 }, { 80, 48 },
{ 64, 64 }, { 48, 80 },
{ 32, 96 }, { 16, 112 } };
extern void aom_filter_block2d_bil_first_pass_media(
const uint8_t *src_ptr, uint16_t *dst_ptr, uint32_t src_pitch,
uint32_t height, uint32_t width, const int16_t *filter);
extern void aom_filter_block2d_bil_second_pass_media(
const uint16_t *src_ptr, uint8_t *dst_ptr, int32_t src_pitch,
uint32_t height, uint32_t width, const int16_t *filter);
unsigned int aom_sub_pixel_variance8x8_media(
const uint8_t *src_ptr, int src_pixels_per_line, int xoffset, int yoffset,
const uint8_t *dst_ptr, int dst_pixels_per_line, unsigned int *sse) {
uint16_t first_pass[10 * 8];
uint8_t second_pass[8 * 8];
const int16_t *HFilter, *VFilter;
HFilter = bilinear_filters_media[xoffset];
VFilter = bilinear_filters_media[yoffset];
aom_filter_block2d_bil_first_pass_media(src_ptr, first_pass,
src_pixels_per_line, 9, 8, HFilter);
aom_filter_block2d_bil_second_pass_media(first_pass, second_pass, 8, 8, 8,
VFilter);
return aom_variance8x8_media(second_pass, 8, dst_ptr, dst_pixels_per_line,
sse);
}
unsigned int aom_sub_pixel_variance16x16_media(
const uint8_t *src_ptr, int src_pixels_per_line, int xoffset, int yoffset,
const uint8_t *dst_ptr, int dst_pixels_per_line, unsigned int *sse) {
uint16_t first_pass[36 * 16];
uint8_t second_pass[20 * 16];
const int16_t *HFilter, *VFilter;
unsigned int var;
if (xoffset == 4 && yoffset == 0) {
var = aom_variance_halfpixvar16x16_h_media(
src_ptr, src_pixels_per_line, dst_ptr, dst_pixels_per_line, sse);
} else if (xoffset == 0 && yoffset == 4) {
var = aom_variance_halfpixvar16x16_v_media(
src_ptr, src_pixels_per_line, dst_ptr, dst_pixels_per_line, sse);
} else if (xoffset == 4 && yoffset == 4) {
var = aom_variance_halfpixvar16x16_hv_media(
src_ptr, src_pixels_per_line, dst_ptr, dst_pixels_per_line, sse);
} else {
HFilter = bilinear_filters_media[xoffset];
VFilter = bilinear_filters_media[yoffset];
aom_filter_block2d_bil_first_pass_media(
src_ptr, first_pass, src_pixels_per_line, 17, 16, HFilter);
aom_filter_block2d_bil_second_pass_media(first_pass, second_pass, 16, 16,
16, VFilter);
var = aom_variance16x16_media(second_pass, 16, dst_ptr, dst_pixels_per_line,
sse);
}
return var;
}
#endif // HAVE_MEDIA
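The two `aom_filter_block2d_bil_*_pass_media` routines are only declared here, so the arithmetic they perform is not visible in this hunk. As a rough sketch of a two-pass bilinear filter for a single output pixel, assuming round-to-nearest with a 7-bit shift (the taps sum to 128), one could write the following. `bilinear_pixel_ref` and `filter_round` are hypothetical helpers, and the rounding step is an assumption rather than something this diff confirms.

```c
#include <assert.h>
#include <stdint.h>

/* Same eighth-pel tap table as in the diff; each row sums to 128 (1 << 7). */
static const int16_t kBilinearTaps[8][2] = { { 128, 0 }, { 112, 16 },
                                             { 96, 32 }, { 80, 48 },
                                             { 64, 64 }, { 48, 80 },
                                             { 32, 96 }, { 16, 112 } };

/* Weighted average of two samples, rounded to nearest and divided by 128.
 * The "+ 64" rounding term is an assumption, not taken from the diff. */
static unsigned int filter_round(unsigned int a, unsigned int b,
                                 const int16_t *taps) {
  return (a * (unsigned int)taps[0] + b * (unsigned int)taps[1] + 64) >> 7;
}

/* Hypothetical reference: one output pixel of the two-pass filter. The
 * horizontal pass runs on two source rows, then the vertical pass blends
 * the two intermediate values. */
static uint8_t bilinear_pixel_ref(const uint8_t *src, int src_stride,
                                  int xoffset, int yoffset) {
  unsigned int top, bottom;
  assert(xoffset >= 0 && xoffset < 8 && yoffset >= 0 && yoffset < 8);
  top = filter_round(src[0], src[1], kBilinearTaps[xoffset]);
  bottom = filter_round(src[src_stride], src[src_stride + 1],
                        kBilinearTaps[xoffset]);
  return (uint8_t)filter_round(top, bottom, kBilinearTaps[yoffset]);
}
```

Because each output row needs the intermediate row below it, a block of `h` output rows needs `h + 1` first-pass rows. That matches the `9, 8` and `17, 16` height/width arguments passed to the first pass above.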


@ -1,31 +1,26 @@
/*
* Copyright (c) 2014 The WebM project authors. All Rights Reserved.
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./vpx_dsp_rtcd.h"
#include "./vpx_config.h"
#include "./aom_dsp_rtcd.h"
#include "./aom_config.h"
#include "vpx_ports/mem.h"
#include "vpx/vpx_integer.h"
#include "aom_ports/mem.h"
#include "aom/aom_integer.h"
#include "vpx_dsp/variance.h"
#include "aom_dsp/variance.h"
static const uint8_t bilinear_filters[8][2] = {
{ 128, 0, },
{ 112, 16, },
{ 96, 32, },
{ 80, 48, },
{ 64, 64, },
{ 48, 80, },
{ 32, 96, },
{ 16, 112, },
{ 128, 0 }, { 112, 16 }, { 96, 32 }, { 80, 48 },
{ 64, 64 }, { 48, 80 }, { 32, 96 }, { 16, 112 },
};
static void var_filter_block2d_bil_w8(const uint8_t *src_ptr,
@ -79,74 +74,61 @@ static void var_filter_block2d_bil_w16(const uint8_t *src_ptr,
}
}
unsigned int vpx_sub_pixel_variance8x8_neon(const uint8_t *src,
int src_stride,
int xoffset,
int yoffset,
const uint8_t *dst,
int dst_stride,
unsigned int aom_sub_pixel_variance8x8_neon(const uint8_t *src, int src_stride,
int xoffset, int yoffset,
const uint8_t *dst, int dst_stride,
unsigned int *sse) {
DECLARE_ALIGNED(16, uint8_t, temp2[8 * 8]);
DECLARE_ALIGNED(16, uint8_t, fdata3[9 * 8]);
var_filter_block2d_bil_w8(src, fdata3, src_stride, 1,
9, 8,
var_filter_block2d_bil_w8(src, fdata3, src_stride, 1, 9, 8,
bilinear_filters[xoffset]);
var_filter_block2d_bil_w8(fdata3, temp2, 8, 8, 8,
8, bilinear_filters[yoffset]);
return vpx_variance8x8_neon(temp2, 8, dst, dst_stride, sse);
var_filter_block2d_bil_w8(fdata3, temp2, 8, 8, 8, 8,
bilinear_filters[yoffset]);
return aom_variance8x8_neon(temp2, 8, dst, dst_stride, sse);
}
unsigned int vpx_sub_pixel_variance16x16_neon(const uint8_t *src,
int src_stride,
int xoffset,
int yoffset,
const uint8_t *dst,
unsigned int aom_sub_pixel_variance16x16_neon(const uint8_t *src,
int src_stride, int xoffset,
int yoffset, const uint8_t *dst,
int dst_stride,
unsigned int *sse) {
DECLARE_ALIGNED(16, uint8_t, temp2[16 * 16]);
DECLARE_ALIGNED(16, uint8_t, fdata3[17 * 16]);
var_filter_block2d_bil_w16(src, fdata3, src_stride, 1,
17, 16,
var_filter_block2d_bil_w16(src, fdata3, src_stride, 1, 17, 16,
bilinear_filters[xoffset]);
var_filter_block2d_bil_w16(fdata3, temp2, 16, 16, 16,
16, bilinear_filters[yoffset]);
return vpx_variance16x16_neon(temp2, 16, dst, dst_stride, sse);
var_filter_block2d_bil_w16(fdata3, temp2, 16, 16, 16, 16,
bilinear_filters[yoffset]);
return aom_variance16x16_neon(temp2, 16, dst, dst_stride, sse);
}
unsigned int vpx_sub_pixel_variance32x32_neon(const uint8_t *src,
int src_stride,
int xoffset,
int yoffset,
const uint8_t *dst,
unsigned int aom_sub_pixel_variance32x32_neon(const uint8_t *src,
int src_stride, int xoffset,
int yoffset, const uint8_t *dst,
int dst_stride,
unsigned int *sse) {
DECLARE_ALIGNED(16, uint8_t, temp2[32 * 32]);
DECLARE_ALIGNED(16, uint8_t, fdata3[33 * 32]);
var_filter_block2d_bil_w16(src, fdata3, src_stride, 1,
33, 32,
var_filter_block2d_bil_w16(src, fdata3, src_stride, 1, 33, 32,
bilinear_filters[xoffset]);
var_filter_block2d_bil_w16(fdata3, temp2, 32, 32, 32,
32, bilinear_filters[yoffset]);
return vpx_variance32x32_neon(temp2, 32, dst, dst_stride, sse);
var_filter_block2d_bil_w16(fdata3, temp2, 32, 32, 32, 32,
bilinear_filters[yoffset]);
return aom_variance32x32_neon(temp2, 32, dst, dst_stride, sse);
}
unsigned int vpx_sub_pixel_variance64x64_neon(const uint8_t *src,
int src_stride,
int xoffset,
int yoffset,
const uint8_t *dst,
unsigned int aom_sub_pixel_variance64x64_neon(const uint8_t *src,
int src_stride, int xoffset,
int yoffset, const uint8_t *dst,
int dst_stride,
unsigned int *sse) {
DECLARE_ALIGNED(16, uint8_t, temp2[64 * 64]);
DECLARE_ALIGNED(16, uint8_t, fdata3[65 * 64]);
var_filter_block2d_bil_w16(src, fdata3, src_stride, 1,
65, 64,
var_filter_block2d_bil_w16(src, fdata3, src_stride, 1, 65, 64,
bilinear_filters[xoffset]);
var_filter_block2d_bil_w16(fdata3, temp2, 64, 64, 64,
64, bilinear_filters[yoffset]);
return vpx_variance64x64_neon(temp2, 64, dst, dst_stride, sse);
var_filter_block2d_bil_w16(fdata3, temp2, 64, 64, 64, 64,
bilinear_filters[yoffset]);
return aom_variance64x64_neon(temp2, 64, dst, dst_stride, sse);
}
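The reformatted `bilinear_filters` table and the `fdata3` buffer sizes in this file follow two simple patterns that are easy to sanity-check. The sketch below reproduces them under the assumption that the offsets are eighth-pel positions; `bilinear_tap` and `first_pass_bytes` are hypothetical helpers, not functions from the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tap `index` (0 or 1) for eighth-pel `offset` (0..7): the pair is
 * { 128 - 16 * offset, 16 * offset }, so the two taps always sum to 128. */
static uint8_t bilinear_tap(int offset, int index) {
  assert(offset >= 0 && offset < 8);
  assert(index == 0 || index == 1);
  return (uint8_t)(index == 0 ? 128 - 16 * offset : 16 * offset);
}

/* Size of the first-pass buffer for a w x h block: one extra row is kept so
 * the vertical pass can blend row r with row r + 1. */
static size_t first_pass_bytes(int w, int h) {
  assert(w > 0 && h > 0);
  return (size_t)(h + 1) * (size_t)w;
}
```

For example, `first_pass_bytes(8, 8)` gives the `9 * 8` used for `fdata3` in the 8x8 function, and `first_pass_bytes(64, 64)` gives `65 * 64`.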


@ -0,0 +1,80 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_config.h"
#include "aom/aom_integer.h"
void aom_subtract_block_neon(int rows, int cols, int16_t *diff,
ptrdiff_t diff_stride, const uint8_t *src,
ptrdiff_t src_stride, const uint8_t *pred,
ptrdiff_t pred_stride) {
int r, c;
if (cols > 16) {
for (r = 0; r < rows; ++r) {
for (c = 0; c < cols; c += 32) {
const uint8x16_t v_src_00 = vld1q_u8(&src[c + 0]);
const uint8x16_t v_src_16 = vld1q_u8(&src[c + 16]);
const uint8x16_t v_pred_00 = vld1q_u8(&pred[c + 0]);
const uint8x16_t v_pred_16 = vld1q_u8(&pred[c + 16]);
const uint16x8_t v_diff_lo_00 =
vsubl_u8(vget_low_u8(v_src_00), vget_low_u8(v_pred_00));
const uint16x8_t v_diff_hi_00 =
vsubl_u8(vget_high_u8(v_src_00), vget_high_u8(v_pred_00));
const uint16x8_t v_diff_lo_16 =
vsubl_u8(vget_low_u8(v_src_16), vget_low_u8(v_pred_16));
const uint16x8_t v_diff_hi_16 =
vsubl_u8(vget_high_u8(v_src_16), vget_high_u8(v_pred_16));
vst1q_s16(&diff[c + 0], vreinterpretq_s16_u16(v_diff_lo_00));
vst1q_s16(&diff[c + 8], vreinterpretq_s16_u16(v_diff_hi_00));
vst1q_s16(&diff[c + 16], vreinterpretq_s16_u16(v_diff_lo_16));
vst1q_s16(&diff[c + 24], vreinterpretq_s16_u16(v_diff_hi_16));
}
diff += diff_stride;
pred += pred_stride;
src += src_stride;
}
} else if (cols > 8) {
for (r = 0; r < rows; ++r) {
const uint8x16_t v_src = vld1q_u8(&src[0]);
const uint8x16_t v_pred = vld1q_u8(&pred[0]);
const uint16x8_t v_diff_lo =
vsubl_u8(vget_low_u8(v_src), vget_low_u8(v_pred));
const uint16x8_t v_diff_hi =
vsubl_u8(vget_high_u8(v_src), vget_high_u8(v_pred));
vst1q_s16(&diff[0], vreinterpretq_s16_u16(v_diff_lo));
vst1q_s16(&diff[8], vreinterpretq_s16_u16(v_diff_hi));
diff += diff_stride;
pred += pred_stride;
src += src_stride;
}
} else if (cols > 4) {
for (r = 0; r < rows; ++r) {
const uint8x8_t v_src = vld1_u8(&src[0]);
const uint8x8_t v_pred = vld1_u8(&pred[0]);
const uint16x8_t v_diff = vsubl_u8(v_src, v_pred);
vst1q_s16(&diff[0], vreinterpretq_s16_u16(v_diff));
diff += diff_stride;
pred += pred_stride;
src += src_stride;
}
} else {
for (r = 0; r < rows; ++r) {
for (c = 0; c < cols; ++c) diff[c] = src[c] - pred[c];
diff += diff_stride;
pred += pred_stride;
src += src_stride;
}
}
}
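`aom_subtract_block_neon` widens with `vsubl_u8`, which produces unsigned 16-bit lanes, and then reinterprets them as signed before storing. A scalar sketch of why that yields the correct signed residual is shown below; `widening_sub_u8` and `subtract_block_ref` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mimics one lane of vsubl_u8 followed by vreinterpretq_s16_u16: the
 * difference wraps modulo 2^16, and reading those bits as a two's
 * complement int16_t recovers a value in [-255, 255]. */
static int16_t widening_sub_u8(uint8_t a, uint8_t b) {
  const uint16_t wrapped = (uint16_t)(a - b);
  int16_t diff;
  memcpy(&diff, &wrapped, sizeof(diff)); /* bit-level reinterpretation */
  return diff;
}

/* Hypothetical scalar reference with the same parameter order as the NEON
 * function above. */
static void subtract_block_ref(int rows, int cols, int16_t *diff,
                               ptrdiff_t diff_stride, const uint8_t *src,
                               ptrdiff_t src_stride, const uint8_t *pred,
                               ptrdiff_t pred_stride) {
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) diff[c] = widening_sub_u8(src[c], pred[c]);
    diff += diff_stride;
    pred += pred_stride;
    src += src_stride;
  }
}
```

The final `else` branch of the NEON function is already this scalar loop, so the sketch mainly documents that the vector branches compute the same values.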


@ -1,15 +1,18 @@
;
; Copyright (c) 2011 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
;
EXPORT |vpx_variance_halfpixvar16x16_h_media|
EXPORT |aom_variance_halfpixvar16x16_h_media|
ARM
REQUIRE8
@ -22,7 +25,7 @@
; r2 unsigned char *ref_ptr
; r3 int recon_stride
; stack unsigned int *sse
|vpx_variance_halfpixvar16x16_h_media| PROC
|aom_variance_halfpixvar16x16_h_media| PROC
stmfd sp!, {r4-r12, lr}


@ -1,15 +1,18 @@
;
; Copyright (c) 2011 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
;
EXPORT |vpx_variance_halfpixvar16x16_hv_media|
EXPORT |aom_variance_halfpixvar16x16_hv_media|
ARM
REQUIRE8
@ -22,7 +25,7 @@
; r2 unsigned char *ref_ptr
; r3 int recon_stride
; stack unsigned int *sse
|vpx_variance_halfpixvar16x16_hv_media| PROC
|aom_variance_halfpixvar16x16_hv_media| PROC
stmfd sp!, {r4-r12, lr}


@ -1,15 +1,18 @@
;
; Copyright (c) 2011 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
;
EXPORT |vpx_variance_halfpixvar16x16_v_media|
EXPORT |aom_variance_halfpixvar16x16_v_media|
ARM
REQUIRE8
@ -22,7 +25,7 @@
; r2 unsigned char *ref_ptr
; r3 int recon_stride
; stack unsigned int *sse
|vpx_variance_halfpixvar16x16_v_media| PROC
|aom_variance_halfpixvar16x16_v_media| PROC
stmfd sp!, {r4-r12, lr}


@ -1,17 +1,20 @@
;
; Copyright (c) 2011 The WebM project authors. All Rights Reserved.
; Copyright (c) 2016, Alliance for Open Media. All rights reserved
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
; This source code is subject to the terms of the BSD 2 Clause License and
; the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
; was not distributed with this source code in the LICENSE file, you can
; obtain it at www.aomedia.org/license/software. If the Alliance for Open
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
;
EXPORT |vpx_variance16x16_media|
EXPORT |vpx_variance8x8_media|
EXPORT |vpx_mse16x16_media|
EXPORT |aom_variance16x16_media|
EXPORT |aom_variance8x8_media|
EXPORT |aom_mse16x16_media|
ARM
REQUIRE8
@ -24,7 +27,7 @@
; r2 unsigned char *ref_ptr
; r3 int recon_stride
; stack unsigned int *sse
|vpx_variance16x16_media| PROC
|aom_variance16x16_media| PROC
stmfd sp!, {r4-r12, lr}
@ -157,7 +160,7 @@ loop16x16
; r2 unsigned char *ref_ptr
; r3 int recon_stride
; stack unsigned int *sse
|vpx_variance8x8_media| PROC
|aom_variance8x8_media| PROC
push {r4-r10, lr}
@ -241,10 +244,10 @@ loop8x8
; r3 int recon_stride
; stack unsigned int *sse
;
;note: Based on vpx_variance16x16_media. In this function, sum is never used.
;note: Based on aom_variance16x16_media. In this function, sum is never used.
; So, we can remove this part of calculation.
|vpx_mse16x16_media| PROC
|aom_mse16x16_media| PROC
push {r4-r9, lr}

aom_dsp/arm/variance_neon.c

@ -0,0 +1,400 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <arm_neon.h>
#include "./aom_dsp_rtcd.h"
#include "./aom_config.h"
#include "aom/aom_integer.h"
#include "aom_ports/mem.h"
static INLINE int horizontal_add_s16x8(const int16x8_t v_16x8) {
const int32x4_t a = vpaddlq_s16(v_16x8);
const int64x2_t b = vpaddlq_s32(a);
const int32x2_t c = vadd_s32(vreinterpret_s32_s64(vget_low_s64(b)),
vreinterpret_s32_s64(vget_high_s64(b)));
return vget_lane_s32(c, 0);
}
static INLINE int horizontal_add_s32x4(const int32x4_t v_32x4) {
const int64x2_t b = vpaddlq_s32(v_32x4);
const int32x2_t c = vadd_s32(vreinterpret_s32_s64(vget_low_s64(b)),
vreinterpret_s32_s64(vget_high_s64(b)));
return vget_lane_s32(c, 0);
}
// w * h must be less than 2048 or local variable v_sum may overflow.
static void variance_neon_w8(const uint8_t *a, int a_stride, const uint8_t *b,
int b_stride, int w, int h, uint32_t *sse,
int *sum) {
int i, j;
int16x8_t v_sum = vdupq_n_s16(0);
int32x4_t v_sse_lo = vdupq_n_s32(0);
int32x4_t v_sse_hi = vdupq_n_s32(0);
for (i = 0; i < h; ++i) {
for (j = 0; j < w; j += 8) {
const uint8x8_t v_a = vld1_u8(&a[j]);
const uint8x8_t v_b = vld1_u8(&b[j]);
const uint16x8_t v_diff = vsubl_u8(v_a, v_b);
const int16x8_t sv_diff = vreinterpretq_s16_u16(v_diff);
v_sum = vaddq_s16(v_sum, sv_diff);
v_sse_lo =
vmlal_s16(v_sse_lo, vget_low_s16(sv_diff), vget_low_s16(sv_diff));
v_sse_hi =
vmlal_s16(v_sse_hi, vget_high_s16(sv_diff), vget_high_s16(sv_diff));
}
a += a_stride;
b += b_stride;
}
*sum = horizontal_add_s16x8(v_sum);
*sse = (unsigned int)horizontal_add_s32x4(vaddq_s32(v_sse_lo, v_sse_hi));
}
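The overflow note on `variance_neon_w8` can be made concrete with a worst-case estimate. Each of the eight `int16_t` lanes of `v_sum` receives one difference per 8-pixel load, and each difference has magnitude at most 255. The sketch below computes that bound; `v_sum_lane_bound` and `v_sum_fits_int16` are hypothetical helpers, and the estimate assumes the worst case of every difference having the same sign.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case magnitude accumulated in one 16-bit lane of v_sum for a
 * w x h block: (w / 8) loads per row, h rows, at most 255 per load. */
static long v_sum_lane_bound(int w, int h) {
  assert(w > 0 && w % 8 == 0 && h > 0);
  return 255L * (w / 8) * h;
}

/* Nonzero when the worst case still fits in a signed 16-bit lane. */
static int v_sum_fits_int16(int w, int h) {
  return v_sum_lane_bound(w, h) <= INT16_MAX;
}
```

By this estimate a 1024-pixel call (32x32 or 64x16) peaks at 32640 per lane and fits, while a 2048-pixel call would not. This is consistent with the 32x64, 64x32 and 64x64 wrappers further down splitting their work into calls of at most 1024 pixels.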
void aom_get8x8var_neon(const uint8_t *a, int a_stride, const uint8_t *b,
int b_stride, unsigned int *sse, int *sum) {
variance_neon_w8(a, a_stride, b, b_stride, 8, 8, sse, sum);
}
void aom_get16x16var_neon(const uint8_t *a, int a_stride, const uint8_t *b,
int b_stride, unsigned int *sse, int *sum) {
variance_neon_w8(a, a_stride, b, b_stride, 16, 16, sse, sum);
}
unsigned int aom_variance8x8_neon(const uint8_t *a, int a_stride,
const uint8_t *b, int b_stride,
unsigned int *sse) {
int sum;
variance_neon_w8(a, a_stride, b, b_stride, 8, 8, sse, &sum);
return *sse - (((int64_t)sum * sum) >> 6); // >> 6 = / 8 * 8
}
unsigned int aom_variance16x16_neon(const uint8_t *a, int a_stride,
const uint8_t *b, int b_stride,
unsigned int *sse) {
int sum;
variance_neon_w8(a, a_stride, b, b_stride, 16, 16, sse, &sum);
return *sse - (((int64_t)sum * sum) >> 8); // >> 8 = / 16 * 16
}
unsigned int aom_variance32x32_neon(const uint8_t *a, int a_stride,
const uint8_t *b, int b_stride,
unsigned int *sse) {
int sum;
variance_neon_w8(a, a_stride, b, b_stride, 32, 32, sse, &sum);
return *sse - (((int64_t)sum * sum) >> 10); // >> 10 = / 32 * 32
}
unsigned int aom_variance32x64_neon(const uint8_t *a, int a_stride,
const uint8_t *b, int b_stride,
unsigned int *sse) {
int sum1, sum2;
uint32_t sse1, sse2;
variance_neon_w8(a, a_stride, b, b_stride, 32, 32, &sse1, &sum1);
variance_neon_w8(a + (32 * a_stride), a_stride, b + (32 * b_stride), b_stride,
32, 32, &sse2, &sum2);
*sse = sse1 + sse2;
sum1 += sum2;
return *sse - (((int64_t)sum1 * sum1) >> 11); // >> 11 = / 32 * 64
}
unsigned int aom_variance64x32_neon(const uint8_t *a, int a_stride,
const uint8_t *b, int b_stride,
unsigned int *sse) {
int sum1, sum2;
uint32_t sse1, sse2;
variance_neon_w8(a, a_stride, b, b_stride, 64, 16, &sse1, &sum1);
variance_neon_w8(a + (16 * a_stride), a_stride, b + (16 * b_stride), b_stride,
64, 16, &sse2, &sum2);
*sse = sse1 + sse2;
sum1 += sum2;
return *sse - (((int64_t)sum1 * sum1) >> 11); // >> 11 = / 64 * 32
}
unsigned int aom_variance64x64_neon(const uint8_t *a, int a_stride,
const uint8_t *b, int b_stride,
unsigned int *sse) {
int sum1, sum2;
uint32_t sse1, sse2;
variance_neon_w8(a, a_stride, b, b_stride, 64, 16, &sse1, &sum1);
variance_neon_w8(a + (16 * a_stride), a_stride, b + (16 * b_stride), b_stride,
64, 16, &sse2, &sum2);
sse1 += sse2;
sum1 += sum2;
variance_neon_w8(a + (16 * 2 * a_stride), a_stride, b + (16 * 2 * b_stride),
b_stride, 64, 16, &sse2, &sum2);
sse1 += sse2;
sum1 += sum2;
variance_neon_w8(a + (16 * 3 * a_stride), a_stride, b + (16 * 3 * b_stride),
b_stride, 64, 16, &sse2, &sum2);
*sse = sse1 + sse2;
sum1 += sum2;
return *sse - (((int64_t)sum1 * sum1) >> 12); // >> 12 = / 64 * 64
}
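The larger block sizes above accumulate `sse` and `sum` over several strips and only then apply `sse - sum * sum / (w * h)`. A scalar sketch of that pattern, under the assumption that the block is split into two equal strips, is given below. `sums_ref`, `variance_ref` and `variance_split_ref` are hypothetical names; a plain division stands in for the shift, which is equivalent when `w * h` is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Accumulates the sum of differences and the sum of squared differences. */
static void sums_ref(const uint8_t *a, int a_stride, const uint8_t *b,
                     int b_stride, int w, int h, unsigned int *sse, int *sum) {
  *sse = 0;
  *sum = 0;
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      const int d = a[c] - b[c];
      *sum += d;
      *sse += (unsigned int)(d * d);
    }
    a += a_stride;
    b += b_stride;
  }
}

/* Variance computed in a single pass over the whole block. */
static unsigned int variance_ref(const uint8_t *a, int a_stride,
                                 const uint8_t *b, int b_stride, int w, int h,
                                 unsigned int *sse) {
  int sum;
  sums_ref(a, a_stride, b, b_stride, w, h, sse, &sum);
  return *sse - (unsigned int)(((int64_t)sum * sum) / (w * h));
}

/* Same result, but accumulated over a top and a bottom strip and merged
 * before the mean correction is applied, as the 32x64 wrapper does. */
static unsigned int variance_split_ref(const uint8_t *a, int a_stride,
                                       const uint8_t *b, int b_stride, int w,
                                       int h, unsigned int *sse) {
  int sum1, sum2;
  unsigned int sse1, sse2;
  const int half = h / 2;
  assert(h % 2 == 0);
  sums_ref(a, a_stride, b, b_stride, w, half, &sse1, &sum1);
  sums_ref(a + half * a_stride, a_stride, b + half * b_stride, b_stride, w,
           half, &sse2, &sum2);
  *sse = sse1 + sse2;
  sum1 += sum2;
  return *sse - (unsigned int)(((int64_t)sum1 * sum1) / (w * h));
}
```

The key point is that the strips' sums are added before squaring; applying the correction per strip and adding the results would give a different number.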
unsigned int aom_variance16x8_neon(const unsigned char *src_ptr,
int source_stride,
const unsigned char *ref_ptr,
int recon_stride, unsigned int *sse) {
int i;
int16x4_t d22s16, d23s16, d24s16, d25s16, d26s16, d27s16, d28s16, d29s16;
uint32x2_t d0u32, d10u32;
int64x1_t d0s64, d1s64;
uint8x16_t q0u8, q1u8, q2u8, q3u8;
uint16x8_t q11u16, q12u16, q13u16, q14u16;
int32x4_t q8s32, q9s32, q10s32;
int64x2_t q0s64, q1s64, q5s64;
q8s32 = vdupq_n_s32(0);
q9s32 = vdupq_n_s32(0);
q10s32 = vdupq_n_s32(0);
for (i = 0; i < 4; i++) {
q0u8 = vld1q_u8(src_ptr);
src_ptr += source_stride;
q1u8 = vld1q_u8(src_ptr);
src_ptr += source_stride;
__builtin_prefetch(src_ptr);
q2u8 = vld1q_u8(ref_ptr);
ref_ptr += recon_stride;
q3u8 = vld1q_u8(ref_ptr);
ref_ptr += recon_stride;
__builtin_prefetch(ref_ptr);
q11u16 = vsubl_u8(vget_low_u8(q0u8), vget_low_u8(q2u8));
q12u16 = vsubl_u8(vget_high_u8(q0u8), vget_high_u8(q2u8));
q13u16 = vsubl_u8(vget_low_u8(q1u8), vget_low_u8(q3u8));
q14u16 = vsubl_u8(vget_high_u8(q1u8), vget_high_u8(q3u8));
d22s16 = vreinterpret_s16_u16(vget_low_u16(q11u16));
d23s16 = vreinterpret_s16_u16(vget_high_u16(q11u16));
q8s32 = vpadalq_s16(q8s32, vreinterpretq_s16_u16(q11u16));
q9s32 = vmlal_s16(q9s32, d22s16, d22s16);
q10s32 = vmlal_s16(q10s32, d23s16, d23s16);
d24s16 = vreinterpret_s16_u16(vget_low_u16(q12u16));
d25s16 = vreinterpret_s16_u16(vget_high_u16(q12u16));
q8s32 = vpadalq_s16(q8s32, vreinterpretq_s16_u16(q12u16));
q9s32 = vmlal_s16(q9s32, d24s16, d24s16);
q10s32 = vmlal_s16(q10s32, d25s16, d25s16);
d26s16 = vreinterpret_s16_u16(vget_low_u16(q13u16));
d27s16 = vreinterpret_s16_u16(vget_high_u16(q13u16));
q8s32 = vpadalq_s16(q8s32, vreinterpretq_s16_u16(q13u16));
q9s32 = vmlal_s16(q9s32, d26s16, d26s16);
q10s32 = vmlal_s16(q10s32, d27s16, d27s16);
d28s16 = vreinterpret_s16_u16(vget_low_u16(q14u16));
d29s16 = vreinterpret_s16_u16(vget_high_u16(q14u16));
q8s32 = vpadalq_s16(q8s32, vreinterpretq_s16_u16(q14u16));
q9s32 = vmlal_s16(q9s32, d28s16, d28s16);
q10s32 = vmlal_s16(q10s32, d29s16, d29s16);
}
q10s32 = vaddq_s32(q10s32, q9s32);
q0s64 = vpaddlq_s32(q8s32);
q1s64 = vpaddlq_s32(q10s32);
d0s64 = vadd_s64(vget_low_s64(q0s64), vget_high_s64(q0s64));
d1s64 = vadd_s64(vget_low_s64(q1s64), vget_high_s64(q1s64));
q5s64 = vmull_s32(vreinterpret_s32_s64(d0s64), vreinterpret_s32_s64(d0s64));
vst1_lane_u32((uint32_t *)sse, vreinterpret_u32_s64(d1s64), 0);
d10u32 = vshr_n_u32(vreinterpret_u32_s64(vget_low_s64(q5s64)), 7);
d0u32 = vsub_u32(vreinterpret_u32_s64(d1s64), d10u32);
return vget_lane_u32(d0u32, 0);
}
unsigned int aom_variance8x16_neon(const unsigned char *src_ptr,
int source_stride,
const unsigned char *ref_ptr,
int recon_stride, unsigned int *sse) {
int i;
uint8x8_t d0u8, d2u8, d4u8, d6u8;
int16x4_t d22s16, d23s16, d24s16, d25s16;
uint32x2_t d0u32, d10u32;
int64x1_t d0s64, d1s64;
uint16x8_t q11u16, q12u16;
int32x4_t q8s32, q9s32, q10s32;
int64x2_t q0s64, q1s64, q5s64;
q8s32 = vdupq_n_s32(0);
q9s32 = vdupq_n_s32(0);
q10s32 = vdupq_n_s32(0);
for (i = 0; i < 8; i++) {
d0u8 = vld1_u8(src_ptr);
src_ptr += source_stride;
d2u8 = vld1_u8(src_ptr);
src_ptr += source_stride;
__builtin_prefetch(src_ptr);
d4u8 = vld1_u8(ref_ptr);
ref_ptr += recon_stride;
d6u8 = vld1_u8(ref_ptr);
ref_ptr += recon_stride;
__builtin_prefetch(ref_ptr);
q11u16 = vsubl_u8(d0u8, d4u8);
q12u16 = vsubl_u8(d2u8, d6u8);
d22s16 = vreinterpret_s16_u16(vget_low_u16(q11u16));
d23s16 = vreinterpret_s16_u16(vget_high_u16(q11u16));
q8s32 = vpadalq_s16(q8s32, vreinterpretq_s16_u16(q11u16));
q9s32 = vmlal_s16(q9s32, d22s16, d22s16);
q10s32 = vmlal_s16(q10s32, d23s16, d23s16);
d24s16 = vreinterpret_s16_u16(vget_low_u16(q12u16));
d25s16 = vreinterpret_s16_u16(vget_high_u16(q12u16));
q8s32 = vpadalq_s16(q8s32, vreinterpretq_s16_u16(q12u16));
q9s32 = vmlal_s16(q9s32, d24s16, d24s16);
q10s32 = vmlal_s16(q10s32, d25s16, d25s16);
}
q10s32 = vaddq_s32(q10s32, q9s32);
q0s64 = vpaddlq_s32(q8s32);
q1s64 = vpaddlq_s32(q10s32);
d0s64 = vadd_s64(vget_low_s64(q0s64), vget_high_s64(q0s64));
d1s64 = vadd_s64(vget_low_s64(q1s64), vget_high_s64(q1s64));
q5s64 = vmull_s32(vreinterpret_s32_s64(d0s64), vreinterpret_s32_s64(d0s64));
vst1_lane_u32((uint32_t *)sse, vreinterpret_u32_s64(d1s64), 0);
d10u32 = vshr_n_u32(vreinterpret_u32_s64(vget_low_s64(q5s64)), 7);
d0u32 = vsub_u32(vreinterpret_u32_s64(d1s64), d10u32);
return vget_lane_u32(d0u32, 0);
}
unsigned int aom_mse16x16_neon(const unsigned char *src_ptr, int source_stride,
const unsigned char *ref_ptr, int recon_stride,
unsigned int *sse) {
int i;
int16x4_t d22s16, d23s16, d24s16, d25s16, d26s16, d27s16, d28s16, d29s16;
int64x1_t d0s64;
uint8x16_t q0u8, q1u8, q2u8, q3u8;
int32x4_t q7s32, q8s32, q9s32, q10s32;
uint16x8_t q11u16, q12u16, q13u16, q14u16;
int64x2_t q1s64;
q7s32 = vdupq_n_s32(0);
q8s32 = vdupq_n_s32(0);
q9s32 = vdupq_n_s32(0);
q10s32 = vdupq_n_s32(0);
for (i = 0; i < 8; i++) { // mse16x16_neon_loop
q0u8 = vld1q_u8(src_ptr);
src_ptr += source_stride;
q1u8 = vld1q_u8(src_ptr);
src_ptr += source_stride;
q2u8 = vld1q_u8(ref_ptr);
ref_ptr += recon_stride;
q3u8 = vld1q_u8(ref_ptr);
ref_ptr += recon_stride;
q11u16 = vsubl_u8(vget_low_u8(q0u8), vget_low_u8(q2u8));
q12u16 = vsubl_u8(vget_high_u8(q0u8), vget_high_u8(q2u8));
q13u16 = vsubl_u8(vget_low_u8(q1u8), vget_low_u8(q3u8));
q14u16 = vsubl_u8(vget_high_u8(q1u8), vget_high_u8(q3u8));
d22s16 = vreinterpret_s16_u16(vget_low_u16(q11u16));
d23s16 = vreinterpret_s16_u16(vget_high_u16(q11u16));
q7s32 = vmlal_s16(q7s32, d22s16, d22s16);
q8s32 = vmlal_s16(q8s32, d23s16, d23s16);
d24s16 = vreinterpret_s16_u16(vget_low_u16(q12u16));
d25s16 = vreinterpret_s16_u16(vget_high_u16(q12u16));
q9s32 = vmlal_s16(q9s32, d24s16, d24s16);
q10s32 = vmlal_s16(q10s32, d25s16, d25s16);
d26s16 = vreinterpret_s16_u16(vget_low_u16(q13u16));
d27s16 = vreinterpret_s16_u16(vget_high_u16(q13u16));
q7s32 = vmlal_s16(q7s32, d26s16, d26s16);
q8s32 = vmlal_s16(q8s32, d27s16, d27s16);
d28s16 = vreinterpret_s16_u16(vget_low_u16(q14u16));
d29s16 = vreinterpret_s16_u16(vget_high_u16(q14u16));
q9s32 = vmlal_s16(q9s32, d28s16, d28s16);
q10s32 = vmlal_s16(q10s32, d29s16, d29s16);
}
q7s32 = vaddq_s32(q7s32, q8s32);
q9s32 = vaddq_s32(q9s32, q10s32);
q10s32 = vaddq_s32(q7s32, q9s32);
q1s64 = vpaddlq_s32(q10s32);
d0s64 = vadd_s64(vget_low_s64(q1s64), vget_high_s64(q1s64));
vst1_lane_u32((uint32_t *)sse, vreinterpret_u32_s64(d0s64), 0);
return vget_lane_u32(vreinterpret_u32_s64(d0s64), 0);
}
unsigned int aom_get4x4sse_cs_neon(const unsigned char *src_ptr,
int source_stride,
const unsigned char *ref_ptr,
int recon_stride) {
int16x4_t d22s16, d24s16, d26s16, d28s16;
int64x1_t d0s64;
uint8x8_t d0u8, d1u8, d2u8, d3u8, d4u8, d5u8, d6u8, d7u8;
int32x4_t q7s32, q8s32, q9s32, q10s32;
uint16x8_t q11u16, q12u16, q13u16, q14u16;
int64x2_t q1s64;
d0u8 = vld1_u8(src_ptr);
src_ptr += source_stride;
d4u8 = vld1_u8(ref_ptr);
ref_ptr += recon_stride;
d1u8 = vld1_u8(src_ptr);
src_ptr += source_stride;
d5u8 = vld1_u8(ref_ptr);
ref_ptr += recon_stride;
d2u8 = vld1_u8(src_ptr);
src_ptr += source_stride;
d6u8 = vld1_u8(ref_ptr);
ref_ptr += recon_stride;
d3u8 = vld1_u8(src_ptr);
src_ptr += source_stride;
d7u8 = vld1_u8(ref_ptr);
ref_ptr += recon_stride;
q11u16 = vsubl_u8(d0u8, d4u8);
q12u16 = vsubl_u8(d1u8, d5u8);
q13u16 = vsubl_u8(d2u8, d6u8);
q14u16 = vsubl_u8(d3u8, d7u8);
d22s16 = vget_low_s16(vreinterpretq_s16_u16(q11u16));
d24s16 = vget_low_s16(vreinterpretq_s16_u16(q12u16));
d26s16 = vget_low_s16(vreinterpretq_s16_u16(q13u16));
d28s16 = vget_low_s16(vreinterpretq_s16_u16(q14u16));
q7s32 = vmull_s16(d22s16, d22s16);
q8s32 = vmull_s16(d24s16, d24s16);
q9s32 = vmull_s16(d26s16, d26s16);
q10s32 = vmull_s16(d28s16, d28s16);
q7s32 = vaddq_s32(q7s32, q8s32);
q9s32 = vaddq_s32(q9s32, q10s32);
q9s32 = vaddq_s32(q7s32, q9s32);
q1s64 = vpaddlq_s32(q9s32);
d0s64 = vadd_s64(vget_low_s64(q1s64), vget_high_s64(q1s64));
return vget_lane_u32(vreinterpret_u32_s64(d0s64), 0);
}
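
For readers checking the NEON routine above: it loads 8 bytes per row but squares only the low four 16-bit lanes, so the result is the sum of squared differences over a 4x4 block. A minimal scalar sketch of that quantity follows. The name `get4x4sse_cs_ref` is hypothetical and is not part of this change.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference (not in the diff): sum of squared
 * differences between a 4x4 source block and a 4x4 reference block. */
static unsigned int get4x4sse_cs_ref(const uint8_t *src, int src_stride,
                                     const uint8_t *ref, int ref_stride) {
  unsigned int sse = 0;
  for (int r = 0; r < 4; ++r) {
    for (int c = 0; c < 4; ++c) {
      /* uint8_t operands promote to int, so the difference is signed. */
      const int diff = src[c] - ref[c];
      sse += (unsigned int)(diff * diff);
    }
    src += src_stride;
    ref += ref_stride;
  }
  return sse;
}
```

A scalar version like this is mainly useful as an oracle when testing the intrinsic implementation on random blocks.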


@ -1,33 +1,36 @@
/*
* Copyright (c) 2014 The WebM project authors. All Rights Reserved.
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <stdlib.h>
#include "./vpx_dsp_rtcd.h"
#include "vpx_ports/mem.h"
#include "./aom_dsp_rtcd.h"
#include "aom_ports/mem.h"
unsigned int vpx_avg_8x8_c(const uint8_t *s, int p) {
unsigned int aom_avg_8x8_c(const uint8_t *src, int stride) {
int i, j;
int sum = 0;
for (i = 0; i < 8; ++i, s+=p)
for (j = 0; j < 8; sum += s[j], ++j) {}
return (sum + 32) >> 6;
for (i = 0; i < 8; ++i, src += stride)
for (j = 0; j < 8; sum += src[j], ++j) {
}
unsigned int vpx_avg_4x4_c(const uint8_t *s, int p) {
return ROUND_POWER_OF_TWO(sum, 6);
}
unsigned int aom_avg_4x4_c(const uint8_t *src, int stride) {
int i, j;
int sum = 0;
for (i = 0; i < 4; ++i, s+=p)
for (j = 0; j < 4; sum += s[j], ++j) {}
for (i = 0; i < 4; ++i, src += stride)
for (j = 0; j < 4; sum += src[j], ++j) {
}
return (sum + 8) >> 4;
return ROUND_POWER_OF_TWO(sum, 4);
}
// src_diff: first pass, 9 bit, dynamic range [-255, 255]
@ -64,7 +67,7 @@ static void hadamard_col8(const int16_t *src_diff, int src_stride,
// The order of the output coeff of the hadamard is not important. For
// optimization purposes the final transpose may be skipped.
void vpx_hadamard_8x8_c(const int16_t *src_diff, int src_stride,
void aom_hadamard_8x8_c(const int16_t *src_diff, int src_stride,
int16_t *coeff) {
int idx;
int16_t buffer[64];
@ -87,14 +90,14 @@ void vpx_hadamard_8x8_c(const int16_t *src_diff, int src_stride,
}
// In place 16x16 2D Hadamard transform
void vpx_hadamard_16x16_c(const int16_t *src_diff, int src_stride,
void aom_hadamard_16x16_c(const int16_t *src_diff, int src_stride,
int16_t *coeff) {
int idx;
for (idx = 0; idx < 4; ++idx) {
// src_diff: 9 bit, dynamic range [-255, 255]
const int16_t *src_ptr = src_diff + (idx >> 1) * 8 * src_stride
+ (idx & 0x01) * 8;
vpx_hadamard_8x8_c(src_ptr, src_stride, coeff + idx * 64);
const int16_t *src_ptr =
src_diff + (idx >> 1) * 8 * src_stride + (idx & 0x01) * 8;
aom_hadamard_8x8_c(src_ptr, src_stride, coeff + idx * 64);
}
// coeff: 15 bit, dynamic range [-16320, 16320]
@ -120,11 +123,10 @@ void vpx_hadamard_16x16_c(const int16_t *src_diff, int src_stride,
// coeff: 16 bits, dynamic range [-32640, 32640].
// length: value range {16, 64, 256, 1024}.
int vpx_satd_c(const int16_t *coeff, int length) {
int aom_satd_c(const int16_t *coeff, int length) {
int i;
int satd = 0;
for (i = 0; i < length; ++i)
satd += abs(coeff[i]);
for (i = 0; i < length; ++i) satd += abs(coeff[i]);
// satd: 26 bits, dynamic range [-32640 * 1024, 32640 * 1024]
return satd;
@ -132,7 +134,7 @@ int vpx_satd_c(const int16_t *coeff, int length) {
// Integer projection onto row vectors.
// height: value range {16, 32, 64}.
void vpx_int_pro_row_c(int16_t hbuf[16], const uint8_t *ref,
void aom_int_pro_row_c(int16_t hbuf[16], const uint8_t *ref,
const int ref_stride, const int height) {
int idx;
const int norm_factor = height >> 1;
@ -140,8 +142,7 @@ void vpx_int_pro_row_c(int16_t hbuf[16], const uint8_t *ref,
int i;
hbuf[idx] = 0;
// hbuf[idx]: 14 bit, dynamic range [0, 16320].
for (i = 0; i < height; ++i)
hbuf[idx] += ref[i * ref_stride];
for (i = 0; i < height; ++i) hbuf[idx] += ref[i * ref_stride];
// hbuf[idx]: 9 bit, dynamic range [0, 510].
hbuf[idx] /= norm_factor;
++ref;
@ -149,20 +150,18 @@ void vpx_int_pro_row_c(int16_t hbuf[16], const uint8_t *ref,
}
// width: value range {16, 32, 64}.
int16_t vpx_int_pro_col_c(const uint8_t *ref, const int width) {
int16_t aom_int_pro_col_c(const uint8_t *ref, const int width) {
int idx;
int16_t sum = 0;
// sum: 14 bit, dynamic range [0, 16320]
for (idx = 0; idx < width; ++idx)
sum += ref[idx];
for (idx = 0; idx < width; ++idx) sum += ref[idx];
return sum;
}
// ref: [0 - 510]
// src: [0 - 510]
// bwl: {2, 3, 4}
int vpx_vector_var_c(const int16_t *ref, const int16_t *src,
const int bwl) {
int aom_vector_var_c(const int16_t *ref, const int16_t *src, const int bwl) {
int i;
int width = 4 << bwl;
int sse = 0, mean = 0, var;
@ -178,42 +177,44 @@ int vpx_vector_var_c(const int16_t *ref, const int16_t *src,
return var;
}
void vpx_minmax_8x8_c(const uint8_t *s, int p, const uint8_t *d, int dp,
int *min, int *max) {
void aom_minmax_8x8_c(const uint8_t *src, int src_stride, const uint8_t *ref,
int ref_stride, int *min, int *max) {
int i, j;
*min = 255;
*max = 0;
for (i = 0; i < 8; ++i, s += p, d += dp) {
for (i = 0; i < 8; ++i, src += src_stride, ref += ref_stride) {
for (j = 0; j < 8; ++j) {
int diff = abs(s[j]-d[j]);
int diff = abs(src[j] - ref[j]);
*min = diff < *min ? diff : *min;
*max = diff > *max ? diff : *max;
}
}
}
#if CONFIG_VP9_HIGHBITDEPTH
unsigned int vpx_highbd_avg_8x8_c(const uint8_t *s8, int p) {
#if CONFIG_AOM_HIGHBITDEPTH
unsigned int aom_highbd_avg_8x8_c(const uint8_t *src, int stride) {
int i, j;
int sum = 0;
const uint16_t* s = CONVERT_TO_SHORTPTR(s8);
for (i = 0; i < 8; ++i, s+=p)
for (j = 0; j < 8; sum += s[j], ++j) {}
return (sum + 32) >> 6;
const uint16_t *s = CONVERT_TO_SHORTPTR(src);
for (i = 0; i < 8; ++i, s += stride)
for (j = 0; j < 8; sum += s[j], ++j) {
}
unsigned int vpx_highbd_avg_4x4_c(const uint8_t *s8, int p) {
return ROUND_POWER_OF_TWO(sum, 6);
}
unsigned int aom_highbd_avg_4x4_c(const uint8_t *src, int stride) {
int i, j;
int sum = 0;
const uint16_t* s = CONVERT_TO_SHORTPTR(s8);
for (i = 0; i < 4; ++i, s+=p)
for (j = 0; j < 4; sum += s[j], ++j) {}
return (sum + 8) >> 4;
const uint16_t *s = CONVERT_TO_SHORTPTR(src);
for (i = 0; i < 4; ++i, s += stride)
for (j = 0; j < 4; sum += s[j], ++j) {
}
void vpx_highbd_minmax_8x8_c(const uint8_t *s8, int p, const uint8_t *d8,
return ROUND_POWER_OF_TWO(sum, 4);
}
void aom_highbd_minmax_8x8_c(const uint8_t *s8, int p, const uint8_t *d8,
int dp, int *min, int *max) {
int i, j;
const uint16_t *s = CONVERT_TO_SHORTPTR(s8);
@ -228,6 +229,4 @@ void vpx_highbd_minmax_8x8_c(const uint8_t *s8, int p, const uint8_t *d8,
}
}
}
#endif // CONFIG_VP9_HIGHBITDEPTH
#endif // CONFIG_AOM_HIGHBITDEPTH
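
The hunks above replace `(sum + 32) >> 6` and `(sum + 8) >> 4` with `ROUND_POWER_OF_TWO(sum, 6)` and `ROUND_POWER_OF_TWO(sum, 4)`. The sketch below checks that the two spellings agree, assuming the macro adds half of `1 << n` before shifting. The macro body here is an assumption written to match the replaced expressions; the project's own definition lives in its headers and is not shown in this diff.

```c
#include <assert.h>

/* Assumed definition (not shown in the diff): round to nearest by adding
 * half of 2^n before shifting right by n. */
#define ROUND_POWER_OF_TWO_SKETCH(value, n) \
  (((value) + ((1 << (n)) >> 1)) >> (n))

/* Old and new spellings of the 8x8 average (64 pixels, shift by 6). */
static unsigned int avg_8x8_old(int sum) { return (sum + 32) >> 6; }
static unsigned int avg_8x8_new(int sum) {
  return ROUND_POWER_OF_TWO_SKETCH(sum, 6);
}

/* Old and new spellings of the 4x4 average (16 pixels, shift by 4). */
static unsigned int avg_4x4_old(int sum) { return (sum + 8) >> 4; }
static unsigned int avg_4x4_new(int sum) {
  return ROUND_POWER_OF_TWO_SKETCH(sum, 4);
}
```

Under that assumption the change is a pure readability refactor: every 8-bit pixel sum maps to the same average before and after.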

aom_dsp/bitreader.h Normal file

@ -0,0 +1,240 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_BITREADER_H_
#define AOM_DSP_BITREADER_H_
#include <assert.h>
#include <limits.h>
#include "./aom_config.h"
#if CONFIG_EC_ADAPT && !CONFIG_EC_MULTISYMBOL
#error "CONFIG_EC_ADAPT is enabled without enabling CONFIG_EC_MULTISYMBOL."
#endif
#include "aom/aomdx.h"
#include "aom/aom_integer.h"
#if CONFIG_ANS
#include "aom_dsp/ansreader.h"
#elif CONFIG_DAALA_EC
#include "aom_dsp/daalaboolreader.h"
#else
#include "aom_dsp/dkboolreader.h"
#endif
#include "aom_dsp/prob.h"
#include "av1/common/odintrin.h"
#if CONFIG_ACCOUNTING
#include "av1/common/accounting.h"
#define ACCT_STR_NAME acct_str
#define ACCT_STR_PARAM , const char *ACCT_STR_NAME
#define ACCT_STR_ARG(s) , s
#else
#define ACCT_STR_PARAM
#define ACCT_STR_ARG(s)
#endif
#define aom_read(r, prob, ACCT_STR_NAME) \
aom_read_(r, prob ACCT_STR_ARG(ACCT_STR_NAME))
#define aom_read_bit(r, ACCT_STR_NAME) \
aom_read_bit_(r ACCT_STR_ARG(ACCT_STR_NAME))
#define aom_read_tree(r, tree, probs, ACCT_STR_NAME) \
aom_read_tree_(r, tree, probs ACCT_STR_ARG(ACCT_STR_NAME))
#define aom_read_literal(r, bits, ACCT_STR_NAME) \
aom_read_literal_(r, bits ACCT_STR_ARG(ACCT_STR_NAME))
#define aom_read_tree_bits(r, tree, probs, ACCT_STR_NAME) \
aom_read_tree_bits_(r, tree, probs ACCT_STR_ARG(ACCT_STR_NAME))
#define aom_read_symbol(r, cdf, nsymbs, ACCT_STR_NAME) \
aom_read_symbol_(r, cdf, nsymbs ACCT_STR_ARG(ACCT_STR_NAME))
#ifdef __cplusplus
extern "C" {
#endif
#if CONFIG_ANS
typedef struct AnsDecoder aom_reader;
#elif CONFIG_DAALA_EC
typedef struct daala_reader aom_reader;
#else
typedef struct aom_dk_reader aom_reader;
#endif
static INLINE int aom_reader_init(aom_reader *r, const uint8_t *buffer,
size_t size, aom_decrypt_cb decrypt_cb,
void *decrypt_state) {
#if CONFIG_ANS
(void)decrypt_cb;
(void)decrypt_state;
assert(size <= INT_MAX);
return ans_read_init(r, buffer, size);
#elif CONFIG_DAALA_EC
(void)decrypt_cb;
(void)decrypt_state;
return aom_daala_reader_init(r, buffer, size);
#else
return aom_dk_reader_init(r, buffer, size, decrypt_cb, decrypt_state);
#endif
}
static INLINE const uint8_t *aom_reader_find_end(aom_reader *r) {
#if CONFIG_ANS
(void)r;
assert(0 && "Use the raw buffer size with ANS");
return NULL;
#elif CONFIG_DAALA_EC
return aom_daala_reader_find_end(r);
#else
return aom_dk_reader_find_end(r);
#endif
}
static INLINE int aom_reader_has_error(aom_reader *r) {
#if CONFIG_ANS
return ans_reader_has_error(r);
#elif CONFIG_DAALA_EC
return aom_daala_reader_has_error(r);
#else
return aom_dk_reader_has_error(r);
#endif
}
// Returns the position in the bit reader in bits.
static INLINE uint32_t aom_reader_tell(const aom_reader *r) {
#if CONFIG_ANS
(void)r;
assert(0 && "aom_reader_tell() is unimplemented for ANS");
return 0;
#elif CONFIG_DAALA_EC
return aom_daala_reader_tell(r);
#else
return aom_dk_reader_tell(r);
#endif
}
// Returns the position in the bit reader in 1/8th bits.
static INLINE uint32_t aom_reader_tell_frac(const aom_reader *r) {
#if CONFIG_ANS
(void)r;
assert(0 && "aom_reader_tell_frac() is unimplemented for ANS");
return 0;
#elif CONFIG_DAALA_EC
return aom_daala_reader_tell_frac(r);
#else
return aom_dk_reader_tell_frac(r);
#endif
}
#if CONFIG_ACCOUNTING
static INLINE void aom_process_accounting(const aom_reader *r ACCT_STR_PARAM) {
if (r->accounting != NULL) {
uint32_t tell_frac;
tell_frac = aom_reader_tell_frac(r);
aom_accounting_record(r->accounting, ACCT_STR_NAME,
tell_frac - r->accounting->last_tell_frac);
r->accounting->last_tell_frac = tell_frac;
}
}
#endif
static INLINE int aom_read_(aom_reader *r, int prob ACCT_STR_PARAM) {
int ret;
#if CONFIG_ANS
ret = uabs_read(r, prob);
#elif CONFIG_DAALA_EC
ret = aom_daala_read(r, prob);
#else
ret = aom_dk_read(r, prob);
#endif
#if CONFIG_ACCOUNTING
if (ACCT_STR_NAME) aom_process_accounting(r, ACCT_STR_NAME);
#endif
return ret;
}
static INLINE int aom_read_bit_(aom_reader *r ACCT_STR_PARAM) {
int ret;
#if CONFIG_ANS
ret = uabs_read_bit(r); // Non trivial optimization at half probability
#else
ret = aom_read(r, 128, NULL); // aom_prob_half
#endif
#if CONFIG_ACCOUNTING
if (ACCT_STR_NAME) aom_process_accounting(r, ACCT_STR_NAME);
#endif
return ret;
}
static INLINE int aom_read_literal_(aom_reader *r, int bits ACCT_STR_PARAM) {
int literal = 0, bit;
for (bit = bits - 1; bit >= 0; bit--) literal |= aom_read_bit(r, NULL) << bit;
#if CONFIG_ACCOUNTING
if (ACCT_STR_NAME) aom_process_accounting(r, ACCT_STR_NAME);
#endif
return literal;
}
static INLINE int aom_read_tree_bits_(aom_reader *r, const aom_tree_index *tree,
const aom_prob *probs ACCT_STR_PARAM) {
aom_tree_index i = 0;
while ((i = tree[i + aom_read(r, probs[i >> 1], NULL)]) > 0) continue;
#if CONFIG_ACCOUNTING
if (ACCT_STR_NAME) aom_process_accounting(r, ACCT_STR_NAME);
#endif
return -i;
}
static INLINE int aom_read_tree_(aom_reader *r, const aom_tree_index *tree,
const aom_prob *probs ACCT_STR_PARAM) {
int ret;
#if CONFIG_DAALA_EC
ret = daala_read_tree_bits(r, tree, probs);
#else
ret = aom_read_tree_bits(r, tree, probs, NULL);
#endif
#if CONFIG_ACCOUNTING
if (ACCT_STR_NAME) aom_process_accounting(r, ACCT_STR_NAME);
#endif
return ret;
}
#if CONFIG_EC_MULTISYMBOL
static INLINE int aom_read_symbol_(aom_reader *r, aom_cdf_prob *cdf,
int nsymbs ACCT_STR_PARAM) {
int ret;
#if CONFIG_RANS
(void)nsymbs;
ret = rans_read(r, cdf);
#elif CONFIG_DAALA_EC
ret = daala_read_symbol(r, cdf, nsymbs);
#else
#error \
"CONFIG_EC_MULTISYMBOL is selected without a valid backing entropy " \
"coder. Enable daala_ec or ans for a valid configuration."
#endif
#if CONFIG_EC_ADAPT
update_cdf(cdf, ret, nsymbs);
#endif
#if CONFIG_ACCOUNTING
if (ACCT_STR_NAME) aom_process_accounting(r, ACCT_STR_NAME);
#endif
return ret;
}
#endif // CONFIG_EC_MULTISYMBOL
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_BITREADER_H_
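
The tree walk in `aom_read_tree_bits_` above can be hard to follow at a glance: positive entries are indices of the next node pair, and entries that are zero or negative are leaves storing the negated symbol. The sketch below reproduces the loop with a plain array of bits standing in for the entropy decoder. `tree_index`, `bit_source`, `next_bit`, `read_tree_sketch`, and `kTree3` are illustrative names, not part of the header.

```c
#include <assert.h>
#include <stdint.h>

typedef int8_t tree_index; /* assumption: stand-in for aom_tree_index */

/* Hypothetical bit source that replaces the arithmetic decoder. */
struct bit_source {
  const int *bits;
  int pos;
};

static int next_bit(struct bit_source *s) { return s->bits[s->pos++]; }

/* Same control flow as the header: follow tree[i + bit] until the entry
 * is <= 0, then return the negated entry as the decoded symbol. */
static int read_tree_sketch(struct bit_source *s, const tree_index *tree) {
  tree_index i = 0;
  while ((i = tree[i + next_bit(s)]) > 0) continue;
  return -i;
}

/* Three-symbol tree: "0" -> symbol 0, "10" -> symbol 1, "11" -> symbol 2. */
static const tree_index kTree3[4] = { 0, 2, -1, -2 };
```

Symbol 0 is stored as `-0`, which is why the loop condition is `> 0` rather than `>= 0`.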


@ -0,0 +1,47 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include "./aom_config.h"
#include "./bitreader_buffer.h"
size_t aom_rb_bytes_read(struct aom_read_bit_buffer *rb) {
return (rb->bit_offset + 7) >> 3;
}
int aom_rb_read_bit(struct aom_read_bit_buffer *rb) {
const size_t off = rb->bit_offset;
const size_t p = off >> 3;
const int q = 7 - (int)(off & 0x7);
if (rb->bit_buffer + p < rb->bit_buffer_end) {
const int bit = (rb->bit_buffer[p] >> q) & 1;
rb->bit_offset = off + 1;
return bit;
} else {
rb->error_handler(rb->error_handler_data);
return 0;
}
}
int aom_rb_read_literal(struct aom_read_bit_buffer *rb, int bits) {
int value = 0, bit;
for (bit = bits - 1; bit >= 0; bit--) value |= aom_rb_read_bit(rb) << bit;
return value;
}
int aom_rb_read_signed_literal(struct aom_read_bit_buffer *rb, int bits) {
const int value = aom_rb_read_literal(rb, bits);
return aom_rb_read_bit(rb) ? -value : value;
}
int aom_rb_read_inv_signed_literal(struct aom_read_bit_buffer *rb, int bits) {
const int nbits = sizeof(unsigned) * 8 - bits - 1;
const unsigned value = (unsigned)aom_rb_read_literal(rb, bits + 1) << nbits;
return ((int)value) >> nbits;
}
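
`aom_rb_read_inv_signed_literal` above reads `bits + 1` raw bits and sign-extends them by shifting left and then right. The sketch below isolates that step and compares it with an XOR-based variant that avoids right-shifting a negative value. Both function names are hypothetical. The shift version assumes an arithmetic right shift of negative `int` and a wrapping unsigned-to-int conversion, which C leaves implementation-defined but which common compilers provide.

```c
#include <assert.h>

/* Mirrors the shift-based sign extension in the diff. Assumes the compiler
 * implements >> on negative int as an arithmetic shift. */
static int sign_extend_shift(unsigned raw, int bits) {
  const int nbits = (int)(sizeof(unsigned) * 8) - bits - 1;
  const unsigned value = raw << nbits;
  return ((int)value) >> nbits;
}

/* Alternative sketch without shifting a negative value: flip the sign bit,
 * then subtract its weight. Intended for bits < 31. */
static int sign_extend_xor(unsigned raw, int bits) {
  const unsigned sign = 1u << bits;
  raw &= (sign << 1) - 1u; /* keep only the low bits + 1 bits */
  return (int)(raw ^ sign) - (int)sign;
}
```

With `bits = 4` the raw field is 5 bits wide, so `0x1F` decodes to -1 and `0x10` to -16.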


@ -0,0 +1,48 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_BITREADER_BUFFER_H_
#define AOM_DSP_BITREADER_BUFFER_H_
#include <limits.h>
#include "aom/aom_integer.h"
#ifdef __cplusplus
extern "C" {
#endif
typedef void (*aom_rb_error_handler)(void *data);
struct aom_read_bit_buffer {
const uint8_t *bit_buffer;
const uint8_t *bit_buffer_end;
size_t bit_offset;
void *error_handler_data;
aom_rb_error_handler error_handler;
};
size_t aom_rb_bytes_read(struct aom_read_bit_buffer *rb);
int aom_rb_read_bit(struct aom_read_bit_buffer *rb);
int aom_rb_read_literal(struct aom_read_bit_buffer *rb, int bits);
int aom_rb_read_signed_literal(struct aom_read_bit_buffer *rb, int bits);
int aom_rb_read_inv_signed_literal(struct aom_read_bit_buffer *rb, int bits);
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_BITREADER_BUFFER_H_
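
A sketch of how the `error_handler` callback declared above might be used, based on the read path in bitreader_buffer.c: a read past `bit_buffer_end` invokes the handler, returns 0, and leaves `bit_offset` unchanged. The struct and function names with a `_sketch` suffix and the `count_errors` handler are illustrative copies, not the project's symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*rb_error_handler_sketch)(void *data);

/* Illustrative copy of the read bit buffer layout from the header. */
struct rb_sketch {
  const uint8_t *bit_buffer;
  const uint8_t *bit_buffer_end;
  size_t bit_offset;
  void *error_handler_data;
  rb_error_handler_sketch error_handler;
};

/* Hypothetical handler: counts how often a read ran past the buffer. */
static void count_errors(void *data) { ++*(int *)data; }

/* MSB-first single-bit read with the same bounds check as the diff. */
static int rb_read_bit_sketch(struct rb_sketch *rb) {
  const size_t off = rb->bit_offset;
  const size_t p = off >> 3;
  const int q = 7 - (int)(off & 0x7);
  if (rb->bit_buffer + p < rb->bit_buffer_end) {
    rb->bit_offset = off + 1;
    return (rb->bit_buffer[p] >> q) & 1;
  }
  rb->error_handler(rb->error_handler_data);
  return 0;
}
```

Because an overrun still returns 0, callers that need to distinguish a real zero bit from an error have to rely on the handler's side effect.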

aom_dsp/bitwriter.h Normal file

@ -0,0 +1,179 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_BITWRITER_H_
#define AOM_DSP_BITWRITER_H_
#include <assert.h>
#include "./aom_config.h"
#if CONFIG_EC_ADAPT && !CONFIG_EC_MULTISYMBOL
#error "CONFIG_EC_ADAPT is enabled without enabling CONFIG_EC_MULTISYMBOL"
#endif
#if CONFIG_ANS
#include "aom_dsp/buf_ans.h"
#elif CONFIG_DAALA_EC
#include "aom_dsp/daalaboolwriter.h"
#else
#include "aom_dsp/dkboolwriter.h"
#endif
#include "aom_dsp/prob.h"
#if CONFIG_RD_DEBUG
#include "av1/encoder/cost.h"
#endif
#ifdef __cplusplus
extern "C" {
#endif
#if CONFIG_ANS
typedef struct BufAnsCoder aom_writer;
#elif CONFIG_DAALA_EC
typedef struct daala_writer aom_writer;
#else
typedef struct aom_dk_writer aom_writer;
#endif
typedef struct TOKEN_STATS { int64_t cost; } TOKEN_STATS;
static INLINE void aom_start_encode(aom_writer *bc, uint8_t *buffer) {
#if CONFIG_ANS
(void)bc;
(void)buffer;
assert(0 && "buf_ans requires a more complicated startup procedure");
#elif CONFIG_DAALA_EC
aom_daala_start_encode(bc, buffer);
#else
aom_dk_start_encode(bc, buffer);
#endif
}
static INLINE void aom_stop_encode(aom_writer *bc) {
#if CONFIG_ANS
(void)bc;
assert(0 && "buf_ans requires a more complicated shutdown procedure");
#elif CONFIG_DAALA_EC
aom_daala_stop_encode(bc);
#else
aom_dk_stop_encode(bc);
#endif
}
static INLINE void aom_write(aom_writer *br, int bit, int probability) {
#if CONFIG_ANS
buf_uabs_write(br, bit, probability);
#elif CONFIG_DAALA_EC
aom_daala_write(br, bit, probability);
#else
aom_dk_write(br, bit, probability);
#endif
}
static INLINE void aom_write_record(aom_writer *br, int bit, int probability,
TOKEN_STATS *token_stats) {
aom_write(br, bit, probability);
#if CONFIG_RD_DEBUG
token_stats->cost += av1_cost_bit(probability, bit);
#else
(void)token_stats;
#endif
}
static INLINE void aom_write_bit(aom_writer *w, int bit) {
aom_write(w, bit, 128); // aom_prob_half
}
static INLINE void aom_write_bit_record(aom_writer *w, int bit,
TOKEN_STATS *token_stats) {
aom_write_record(w, bit, 128, token_stats); // aom_prob_half
}
static INLINE void aom_write_literal(aom_writer *w, int data, int bits) {
int bit;
for (bit = bits - 1; bit >= 0; bit--) aom_write_bit(w, 1 & (data >> bit));
}
static INLINE void aom_write_tree_bits(aom_writer *w, const aom_tree_index *tr,
const aom_prob *probs, int bits, int len,
aom_tree_index i) {
do {
const int bit = (bits >> --len) & 1;
aom_write(w, bit, probs[i >> 1]);
i = tr[i + bit];
} while (len);
}
static INLINE void aom_write_tree_bits_record(aom_writer *w,
const aom_tree_index *tr,
const aom_prob *probs, int bits,
int len, aom_tree_index i,
TOKEN_STATS *token_stats) {
do {
const int bit = (bits >> --len) & 1;
aom_write_record(w, bit, probs[i >> 1], token_stats);
i = tr[i + bit];
} while (len);
}
static INLINE void aom_write_tree(aom_writer *w, const aom_tree_index *tree,
const aom_prob *probs, int bits, int len,
aom_tree_index i) {
#if CONFIG_DAALA_EC
daala_write_tree_bits(w, tree, probs, bits, len, i);
#else
aom_write_tree_bits(w, tree, probs, bits, len, i);
#endif
}
static INLINE void aom_write_tree_record(aom_writer *w,
const aom_tree_index *tree,
const aom_prob *probs, int bits,
int len, aom_tree_index i,
TOKEN_STATS *token_stats) {
#if CONFIG_DAALA_EC
(void)token_stats;
daala_write_tree_bits(w, tree, probs, bits, len, i);
#else
aom_write_tree_bits_record(w, tree, probs, bits, len, i, token_stats);
#endif
}
#if CONFIG_EC_MULTISYMBOL
static INLINE void aom_write_symbol(aom_writer *w, int symb, aom_cdf_prob *cdf,
int nsymbs) {
#if CONFIG_RANS
struct rans_sym s;
(void)nsymbs;
assert(cdf);
s.cum_prob = symb > 0 ? cdf[symb - 1] : 0;
s.prob = cdf[symb] - s.cum_prob;
buf_rans_write(w, &s);
#elif CONFIG_DAALA_EC
daala_write_symbol(w, symb, cdf, nsymbs);
#else
#error \
"CONFIG_EC_MULTISYMBOL is selected without a valid backing entropy " \
"coder. Enable daala_ec or ans for a valid configuration."
#endif
#if CONFIG_EC_ADAPT
update_cdf(cdf, symb, nsymbs);
#endif
}
#endif // CONFIG_EC_MULTISYMBOL
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_BITWRITER_H_
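
In the `CONFIG_RANS` branch of `aom_write_symbol` above, the symbol's start and width are derived from a cumulative distribution: the start is the previous CDF entry (or 0 for the first symbol) and the width is the difference to the symbol's own entry. A minimal sketch of that conversion follows, with `sym_range` and `range_from_cdf` as hypothetical names and `uint16_t` as an assumed element type.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for the (cum_prob, prob) pair filled in the diff. */
struct sym_range {
  uint16_t cum_prob; /* start of the symbol's interval */
  uint16_t prob;     /* width of the symbol's interval */
};

/* cdf[k] holds the cumulative probability up to and including symbol k. */
static struct sym_range range_from_cdf(const uint16_t *cdf, int symb) {
  struct sym_range s;
  s.cum_prob = symb > 0 ? cdf[symb - 1] : 0;
  s.prob = (uint16_t)(cdf[symb] - s.cum_prob);
  return s;
}
```

For a CDF of `{256, 768, 1024}` the three symbols get widths 256, 512, and 256, which sum to the final CDF entry.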


@ -0,0 +1,43 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <limits.h>
#include <stdlib.h>
#include "./aom_config.h"
#include "./bitwriter_buffer.h"
size_t aom_wb_bytes_written(const struct aom_write_bit_buffer *wb) {
return wb->bit_offset / CHAR_BIT + (wb->bit_offset % CHAR_BIT > 0);
}
void aom_wb_write_bit(struct aom_write_bit_buffer *wb, int bit) {
const int off = (int)wb->bit_offset;
const int p = off / CHAR_BIT;
const int q = CHAR_BIT - 1 - off % CHAR_BIT;
if (q == CHAR_BIT - 1) {
wb->bit_buffer[p] = bit << q;
} else {
wb->bit_buffer[p] &= ~(1 << q);
wb->bit_buffer[p] |= bit << q;
}
wb->bit_offset = off + 1;
}
void aom_wb_write_literal(struct aom_write_bit_buffer *wb, int data, int bits) {
int bit;
for (bit = bits - 1; bit >= 0; bit--) aom_wb_write_bit(wb, (data >> bit) & 1);
}
void aom_wb_write_inv_signed_literal(struct aom_write_bit_buffer *wb, int data,
int bits) {
aom_wb_write_literal(wb, data, bits + 1);
}
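
The writer above packs bits MSB-first and overwrites a whole byte when it writes that byte's first bit, so the output buffer does not need to be zeroed beforehand. The round-trip sketch below mirrors that logic with a matching MSB-first reader. All names (`wb_sketch`, `wb_write_bit`, `wb_write_literal`, `wb_bytes_written`, `rb_read_literal`) are illustrative and simplified relative to the project's functions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

struct wb_sketch {
  uint8_t *buf;
  size_t bit_offset;
};

static void wb_write_bit(struct wb_sketch *wb, int bit) {
  const size_t p = wb->bit_offset / CHAR_BIT;
  const int q = CHAR_BIT - 1 - (int)(wb->bit_offset % CHAR_BIT);
  if (q == CHAR_BIT - 1) {
    /* First bit of a byte: overwrite, which clears the remaining bits. */
    wb->buf[p] = (uint8_t)(bit << q);
  } else {
    /* Later bits: clear the target bit, then set it to the new value. */
    wb->buf[p] = (uint8_t)((wb->buf[p] & ~(1 << q)) | (bit << q));
  }
  wb->bit_offset++;
}

static void wb_write_literal(struct wb_sketch *wb, int data, int bits) {
  for (int bit = bits - 1; bit >= 0; bit--) wb_write_bit(wb, (data >> bit) & 1);
}

/* Number of bytes touched so far, rounding a partial byte up. */
static size_t wb_bytes_written(const struct wb_sketch *wb) {
  return wb->bit_offset / CHAR_BIT + (wb->bit_offset % CHAR_BIT > 0);
}

/* Matching MSB-first reader, without bounds checking for brevity. */
static int rb_read_literal(const uint8_t *buf, size_t *bit_offset, int bits) {
  int value = 0;
  for (int bit = bits - 1; bit >= 0; bit--) {
    const size_t off = (*bit_offset)++;
    value |= ((buf[off >> 3] >> (7 - (int)(off & 7))) & 1) << bit;
  }
  return value;
}
```

Writing a 3-bit value followed by a 6-bit value spans two bytes, which exercises both the overwrite and the read-modify-write paths.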


@ -0,0 +1,39 @@
/*
* Copyright (c) 2016, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
* was not distributed with this source code in the LICENSE file, you can
* obtain it at www.aomedia.org/license/software. If the Alliance for Open
* Media Patent License 1.0 was not distributed with this source code in the
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#ifndef AOM_DSP_BITWRITER_BUFFER_H_
#define AOM_DSP_BITWRITER_BUFFER_H_
#include "aom/aom_integer.h"
#ifdef __cplusplus
extern "C" {
#endif
struct aom_write_bit_buffer {
uint8_t *bit_buffer;
size_t bit_offset;
};
size_t aom_wb_bytes_written(const struct aom_write_bit_buffer *wb);
void aom_wb_write_bit(struct aom_write_bit_buffer *wb, int bit);
void aom_wb_write_literal(struct aom_write_bit_buffer *wb, int data, int bits);
void aom_wb_write_inv_signed_literal(struct aom_write_bit_buffer *wb, int data,
int bits);
#ifdef __cplusplus
} // extern "C"
#endif
#endif // AOM_DSP_BITWRITER_BUFFER_H_

Some files were not shown because too many files have changed in this diff.