Compare commits

..

1129 Commits

Author SHA1 Message Date
Yunqing Wang
fbad09f465 Hide global symbols for macho32/64
Added hiding global symbols for macho32 and macho64 in x86inc.asm.
This was done to fix exported symbol issue in Chrome build.

Change-Id: I08d5c559b985b82f655b537469fee125615e78c0
2013-10-30 13:12:15 -07:00
Dmitry Kovalev
300ae1b668 Correct handling of show_bit in uncompressed header.
"keyframe" variable in the current code actually means that previous
frame is a keyframe because cm->frame_type has not been initialized
in read_uncompressed_header.

Change-Id: I5645b0816c70abdef5dfc70113018d06276dac77
2013-10-29 13:59:35 -07:00
Yaowu Xu
91745bcff9 changed to comply with strict aliasing rule
The clamp operation may not affect the values of the final assigned mv
where compiler may make use of strict aliasing rule to optimize out the
clamp operation. This change made the code segments to better comply
the strict aliasing rule.

Change-Id: I24502ff18bd4f9e62507a879cc8760a91a0fd07e
2013-10-29 11:34:19 -07:00
Yaowu Xu
07b0dab5df Converted assert to error checking
Change-Id: Icb8c677f910f588cc7c97e70f024787fe6789257
2013-10-29 10:28:25 -07:00
Yaowu Xu
518904926d Added checking for invalid size
Change-Id: I9672a61e60a26e2934796f088880ce4cb49605be
2013-10-29 10:28:25 -07:00
Yaowu Xu
225aafcf9e Converted assertion to returning error
Assertion happens for invalid input data, the commit replace the
assertion with returning error.

Change-Id: I1b73ae752d64882d984cd23936efe75a757c2b41
2013-10-29 10:28:25 -07:00
Yaowu Xu
b90d51fee6 Added trap for invalid key frame
Change-Id: I698e8df9b336d38bffe01e656acba00d4003695f
2013-10-29 10:28:25 -07:00
Yaowu Xu
cd4bac3004 Prevent access to invalid pointer
The commit added check to make sure no invalid memory access even when
the decoder instance is never initialized.

Change-Id: I4da343d0b3c78c27777ac7f5ce7688562c69f0c5
2013-10-29 10:28:25 -07:00
Yaowu Xu
2f4a2a1268 Add clamp to prevent out of bound access
For bad input data, the decoder may access the array out of bounds. The
commit added clamp to prevent such out of bound access

Change-Id: I0a1cfd9b8786ea7113a998053c76605c963b077a
2013-10-29 10:28:08 -07:00
Dmitry Kovalev
e36425e637 Adding assign_mv() function to reduce code duplication.
Change-Id: I2b4e5b842c19f64749b18946ad215c0caa57e7b7
2013-10-28 16:38:28 -07:00
Dmitry Kovalev
9281f7c278 Reading diff update flag inside vp9_diff_update_prob.
Change-Id: I5ae659c1bfb132428a7272d094b5287d144ec7c8
2013-10-28 16:37:25 -07:00
Dmitry Kovalev
3680f98b1b BITSTREAM - "update_map" SEMANTICS BROKEN IN 398ddafb62
This patch reverts old commit 398ddafb62
"New way of updating last frame segmentation map.".

Change-Id: Iba730f433c30ed7f5e5449d6768049cbf9a2b2c5
2013-10-28 16:37:21 -07:00
Dmitry Kovalev
8fa0ca3b20 BITSTREAM - RESTORING BILINEAR INTERPOLATION FILTER SUPPORT
Adding appropriate test vector vp90-2-06-bilinear.webm.

Change-Id: Ia3bbf57318e0cc61a1b724fe751e3f9c7e11b337
2013-10-28 16:37:18 -07:00
Jingning Han
42f31923b0 BITSTREAM - CLARIFICATION OF MV SIZE RANGE
The codec should effectively run with motion vector of range (-2048, 2047)
in full pixels, for sequences of 1080p and below. Add assertions to clarify
this behavior.

Change-Id: Ia0cac28249f587d8f8882205228fa480263ab313
2013-10-28 16:37:14 -07:00
Dmitry Kovalev
a8a3966174 Adding read_intra_mode_{y, uv} functions for clarity.
Change-Id: I92fd32476c472e54f52b8d7602a98262b25e6eaf
2013-10-28 16:37:05 -07:00
Yaowu Xu
8ab0a2031b fix build with MSVC
near is a key word, changed to use nearmv instead.

Change-Id: Ib54438c431b2b2521a62fc7b61a9c127dd7bc01e
2013-10-28 16:36:41 -07:00
Dmitry Kovalev
16bb20631b Using array of motion vectors instead of separate variables.
Change-Id: I7380a089105f658257bbb3e30a525da168e76952
2013-10-28 16:36:06 -07:00
Dmitry Kovalev
8f83c132ff New way of updating last frame segmentation map.
Implementing more natural (and faster) way of updating last frame
segmentation map.

Change-Id: I9fefa8f78e77bd7948133b04173da45edc15a17e
2013-10-28 16:32:50 -07:00
Jingning Han
4beb889f99 Remove redundant mode update in sub8x8 decoding
The probability model used to code prediction mode is conditioned
on the immediate above and left 8x8 blocks' prediction modes. When
the above/left block is coded in sub8x8 mode, we use the prediction
mode of the bottom-right sub8x8 block as the reference to generate
the context.

This commit moves the update of mbmi.mode out of the sub8x8 decoding
loop, hence removing redundant update steps and keeping the bottom-
right block's mode for the decoding process of next blocks.

Change-Id: I1e8d749684d201c1a1151697621efa5d569218b6
2013-10-28 16:31:21 -07:00
Yunqing Wang
a7b7f94ae8 Merge "Fix x86inc.asm to build PIC code correctly" 2013-09-18 14:51:31 -07:00
Yunqing Wang
9d901217c6 Fix x86inc.asm to build PIC code correctly
Current x86inc.asm didn't handle 32bit PIC build properly.
TEXTRELs were seen in the library built. The PIC macros from
libvpx's x86_abi_support.asm was used to fix this problem.
The assembly code was modified to use the macros.

Notes: We need this fix in for decoder building. Functions in
encoder will be fixed later.

Change-Id: Ifa548d37b1d0bc7d0528db75009cc18cd5eb1838
2013-09-18 13:45:46 -07:00
Adrian Grange
bb30fff978 Merge "Modified resize unit test to output test vector" 2013-09-18 08:36:00 -07:00
Yaowu Xu
85fd8bdb01 Merge "Silence a bunch of MSVC warnings" 2013-09-17 17:10:58 -07:00
Jingning Han
c437bbcde0 Clean up second ref check in sub8x8 rd loop
This commit cleans up the second reference check in the
rate-distortion optimization loop of sub8x8 blocks.

Change-Id: Ife68feaa4cddbfad2878c9b44d3012788d634f97
2013-09-17 15:59:49 -07:00
Adrian Grange
88c8ff2508 Modified resize unit test to output test vector
Modified the resize unit test so that it optionally
writes the encoded bitstream to file. The macro
WRITE_COMPRESSED_STREAM should be set to 1 to enable
output of the test bitstream; it is set to 0 by default.

Change-Id: I7d436b1942f935da97db6d84574a98d379f57fb1
2013-09-17 15:38:30 -07:00
Yaowu Xu
a783da80e7 Silence a bunch of MSVC warnings
Change-Id: I16633269582a640809dca27572bbe99efa6369fc
2013-09-17 12:08:51 -07:00
Jingning Han
2b3bfaa9ce Remove redundant argument in get_sub_block_mv
The sub8x8 check can be directly inferred from block_idx, hence
removed from the arguments if get_sub_block_mv.

Change-Id: Ib766d57e81248fb92df0f6d9b163e6c77b933ccd
2013-09-17 12:08:45 -07:00
Paul Wilkins
84758960db Merge "Minor clean up." 2013-09-17 03:39:24 -07:00
Paul Wilkins
90a52694f3 Merge "Adjustment to mode_skip_start." 2013-09-17 03:39:15 -07:00
Adrian Grange
f582aa6eda Merge "Fix failure to copy data files if content changes" 2013-09-16 17:20:59 -07:00
Adrian Grange
5b23666e67 Fix failure to copy data files if content changes
Jenkins was failing to detect the case where an existing
file is recreated with new content. In this case, thinking
that the file already existed, Jenkins did not re-copy the
file as it should have.

By adding the file test-data.sha1 as a dependendency to
the LIBVPX_TEST_DATA build target the files will be
recopied if the MD5 of an existing file changes.

This could be further improved to only copy files that
have changed rather than copying the whole set as done in
this patch.

(Thanks to jzern@ who diagnozed ithe problem and suggested
this fix).

Change-Id: Icea7c61a95189bc639fec83020c28c70da5b2b41
2013-09-16 16:14:39 -07:00
hkuang
cbf394574d Merge "Speed up iht8x8 by rearranging instructions. Speed improves from 282% to 302% faster based on assembly-perf." 2013-09-16 14:39:45 -07:00
hkuang
23e1a29fc7 Speed up iht8x8 by rearranging instructions.
Speed improves from 282% to 302% faster based on assembly-perf.

Change-Id: I08c5c1a542d43361611198f750b725e4303d19e2
2013-09-16 14:23:26 -07:00
Yaowu Xu
eeae6f946d fix a problem where an invalid mv used in search
The commit added reset of pred_mv at the beginning of each SB64x64
partition mv search, also limited the usage of pred_mv only when
search on the largest partition is already done. This is to fix
a crash at speed 1/2 encoder where an invalid mv is used in mv
search.

Change-Id: I39010177da76d054e3c90b7899a44feb2e3a5b1b
2013-09-16 12:49:27 -07:00
Paul Wilkins
cb50dc7f33 Minor clean up.
Removed some unused code and minor cleanup
/ reordering.

Change-Id: I4083ae56aeb8edfe9b85aa2f42a16aa28d19da94
2013-09-16 13:45:20 +01:00
Paul Wilkins
3b01778450 Adjustment to mode_skip_start.
Corrected values relating to modified mode order.

Change-Id: I24fccba3af4bc16721d5e7e51888a66305bfa7fe
2013-09-16 13:44:48 +01:00
James Zern
c73e4412b3 Merge "Revert "Improved 8t filters"" 2013-09-13 16:06:27 -07:00
Yaowu Xu
9ae985b23a Merge "Minor adjustment in unit tests" 2013-09-13 15:20:24 -07:00
James Zern
2d58761993 Revert "Improved 8t filters"
This is incompatible with most toolchains other than gcc.

Revert "Deleted #include <inttypes.h>"

This reverts commit 4d018be950.

This reverts commit d22a504d11.

Change-Id: I1751dc6831f4395ee064e6748281418e967e1dcf
2013-09-13 15:13:06 -07:00
Jingning Han
e8a967d960 Merge "Adaptive motion search control" 2013-09-13 14:43:23 -07:00
Jingning Han
c4826c5941 Adaptive motion search control
This commit enables adaptive constraint on motion search range for
smaller partitions, given the motion vectors of collocated larger
partition as a candidate initial search point.

It makes speed 0 runtime of bus at CIF and 2000 kbps goes from
167s down to 162s (3% speed-up), at 0.01dB performance gains. In
the settings of speed 1, this makes the runtime goes from 33687 ms
to 32142 ms (4.5% speed-up), at 0.03dB performance gains.

Compression performance wise, it gains at speed 1:
derf  0.118%
yt    0.237%
hd    0.203%
stdhd 0.438%

Change-Id: Ic8b34c67810d9504a9579bef2825d3fa54b69454
2013-09-13 13:58:10 -07:00
Deb Mukherjee
0c3038234d Merge "Clean up of the search best filter speed feature" 2013-09-13 11:03:59 -07:00
Yaowu Xu
040ffb6326 Minor adjustment in unit tests
The CpuSpeedTest is extended to cover 2pass good quality with CpuUsed
from 0 to 4. The BordersTest is changed to use CpuUsed 1 for faster
turn around.

Change-Id: I005e89adee7fe63af4b1f2a76a3a13ea826feadf
2013-09-13 09:32:16 -07:00
Paul Wilkins
5d8642354e Merge "Fix VP9_mode_order[]" 2013-09-13 09:19:31 -07:00
Scott LaVarnway
8fc95a1b11 Merge "New mode_info_context storage -- undo revert" 2013-09-13 08:56:20 -07:00
Paul Wilkins
1407cf8588 Fix VP9_mode_order[]
Mis-merge of the following change managed to break mode order
and delete two mode options (new alt ref and near alt ref)
It also created a situation where we could test two undefined
modes off the end of the VP9_mode_order[] data structure.
  "clang warnings : remove split and i4x4_pred fake modes"
  "Change Id: I8ef3c*"

Initial testing on Akiyo at speed 2.
101.35	 44.567	 44.447 improves to
96.82	 44.915	 44.815

Approx 0.3-0.4db gain and 2.5% size reduction

Change-Id: Icff813e7c0778d140ad4f0eea18cf1ed203c4e34
2013-09-13 13:33:26 +01:00
Paul Wilkins
9c9a3b2775 Merge "Deleted #include <inttypes.h>" 2013-09-13 01:05:31 -07:00
Jim Bankoski
324ebb704a Merge "fix clang warning in rdopt" 2013-09-12 16:39:05 -07:00
hkuang
86fb12b600 Merge "Add neon optimize iht8x8 which is 282% faster than C." 2013-09-12 15:42:44 -07:00
Christian Duvivier
25655e5794 Merge "First draft of vp9_short_idct32x32_add_neon." 2013-09-12 14:23:00 -07:00
hkuang
182366c736 Add neon optimize iht8x8 which is 282% faster than C.
Change-Id: I963dd4a6e8671957403ccbb9a16ea7de703e3530
2013-09-12 11:49:05 -07:00
Jim Bankoski
9ee9918dad fix clang warning in rdopt
either missed this or it crept back in

Change-Id: I6cc1519d09e558be7250254c25bde2ae720555ea
2013-09-12 06:39:42 -07:00
Jim Bankoski
e7f2aa0fb8 clang warnings : ref frame enum mismatch
Convert from refframe_type_t to VP9_REFFRAME

Change-Id: Iff4043c3fdb3e1c9c2b412bdffd5da8ed913ec13
2013-09-12 06:29:07 -07:00
Jim Bankoski
cddde51ec5 Merge "clang warnings : remove split and i4x4_pred fake modes" 2013-09-12 06:20:45 -07:00
Paul Wilkins
4d018be950 Deleted #include <inttypes.h>
This seems not to be needed and is not supported
in the Windows build.

Change-Id: Iaca3bbf8cca283aee6bc336cb31ba9dd4610322b
2013-09-12 13:43:07 +01:00
Paul Wilkins
66755abff4 Merge "Changes in speed 2 settings" 2013-09-12 02:22:45 -07:00
Jim Bankoski
7fb42d909e clang warnings : remove split and i4x4_pred fake modes
Change-Id: I8ef3c7c0f08f0f1f4ccb8ea4deca4cd8143526ee
2013-09-11 16:34:55 -07:00
Christian Duvivier
6a501462f8 First draft of vp9_short_idct32x32_add_neon.
Lots of TODO which will be taken care in upcoming changes. As is,
about 6x faster than C version.

Change-Id: Ie2557b72fd2d8edca376dbf400a4d173aa5e63e0
2013-09-11 15:19:38 -07:00
Deb Mukherjee
b964646756 Clean up of the search best filter speed feature
Removes this speed feature since it is very slow and unlikely
to be used in practice. This cleanup removes a bunch of unnecessary
complications in the outer encode loop.

Change-Id: I3c66ef1ca924fbfad7dadff297c9e7f652d308a1
2013-09-11 15:16:36 -07:00
Scott LaVarnway
23845947c4 Merge "Improved 8t filters" 2013-09-11 14:34:54 -07:00
Jim Bankoski
d09abfa9f7 Merge "resolve clang issue : implicit convert tx_mode -> tx_size" 2013-09-11 13:40:11 -07:00
Scott LaVarnway
d22a504d11 Improved 8t filters
Reformatted version of a patch submitted by Erik/Tamar
from Intel.  For the test clips used, the decoder
performance improved by ~2%.

Change-Id: Ifbc37ac6311bca9ff1cfefe3f2e9b7f13a4a511b
2013-09-11 13:56:32 -04:00
Deb Mukherjee
69fe840ec4 Changes in speed 2 settings
Propose some changes to the speed 2 settings to improve quality.
In particular, turns off the adjust_thresholds_by_speed feature
which improves results by 6%. Also removes the code for
adjust_thresholds_by_speed since it conflicts with the adaptive
rd thresh feature.

Overall, with this change speed 2 is -15.2% from speed 0 settings,
on derf, which is significantly better than -21.6% down before.

Change-Id: I6e90a563470979eb0c258ec32d6183ed7ce9a505
2013-09-11 10:54:07 -07:00
Scott LaVarnway
ac6093d179 New mode_info_context storage -- undo revert
mode_info_context was stored as a grid of MODE_INFO structs.
The grid now constists of pointers to MODE_INFO structs.  The
MODE_INFO structs are now stored as a stream (decoder only),
eliminating unnecessary copies and is a little more cache
friendly.

Change-Id: I031d376284c6eb98a38ad5595b797f048a6cfc0d
2013-09-11 13:45:44 -04:00
Yunqing Wang
079183c1a8 code cleanup
Removed unused function.

Change-Id: Icb12a09e4d303968be6aec9fae1ef05935913a4f
2013-09-11 09:32:00 -07:00
Jingning Han
65fe7d7605 Merge "Remove redundant condition check in 32x32 quant" 2013-09-10 16:39:18 -07:00
James Zern
db487188b1 Merge "vpx_mem: increase default alignment" 2013-09-10 14:03:31 -07:00
Adrian Grange
321c2fd178 Merge "Enable and fix resize_test for VP9" 2013-09-10 12:46:38 -07:00
Jingning Han
cb24406da5 Merge "Remove the use of uninitialized_safe in encode_sb_" 2013-09-10 12:05:22 -07:00
Jingning Han
5d93feb6ad Remove redundant condition check in 32x32 quant
The c code implementation of 32x32 quantization does the zbin check
of all coefficients prior to the quant/dequant loop, hence removing
the redundant zbin check inside the loop. This only affects the
c code version. SSSE3 version does not separate the zbin check out.

Change-Id: Ic197a7d61d0b25fcac3cc092987651378cb56e4e
2013-09-10 12:04:33 -07:00
Adrian Grange
93ffd371eb Enable and fix resize_test for VP9
Added the resize_test unit test to the VP9 set.

Set g_in_frames = 0 to avoid a problem when the total
number of frames being encoded is smaller than
g_in_frames. In this case the test will not have
access to the encoded frames and will not be able to
compare them for testing for encoder/decoder mismatch.

Change-Id: I0d2ff8ef058de7002c5faa894ed6ea794d5f900b
2013-09-10 12:02:37 -07:00
Deb Mukherjee
3d22d3ae0c Merge "Small tweaks on the constant quality mode" 2013-09-10 11:16:47 -07:00
Deb Mukherjee
09830aa0ea Small tweaks on the constant quality mode
Improves results a little.
derf is now +1.078% over bitrate control.

Change-Id: I4812136f3e67be21d14ec089419976a32a841785
2013-09-10 10:16:19 -07:00
Yunqing Wang
0607abc3dd Stop partition checking when distortion is small
If the current obtained distortion is very small, which happens
for static image case, we pick the current partition type without
further split checking.

This won't affect regular videos. For static videos, we got 10%~12%
encoding speed gain. PSNR was better for some clips, and worse for
others. Overall it was even.

Change-Id: If787a57bedf46fc595ca4f5ded2b0c0a69e9fdef
2013-09-10 10:13:24 -07:00
Yunqing Wang
f6bc783d63 Merge "Modify encode breakout for static frames" 2013-09-10 10:04:30 -07:00
Yunqing Wang
939791a129 Modify encode breakout for static frames
Thank Paul for the suggestions. While turning on static-thresh
for static-image videos, a big jump on bitrate was seen. In this
patch, we detected static frames in the video using first-pass
stats. For different cases, disable encode breakout or reduce
encode breakout threshold to limit the skipping.

More modification need be done to break incorrect partition
picking pattern for static frames while skipping happens.

Change-Id: Ia25f47041af0f04e229c70a0185e12b0ffa6047f
2013-09-10 09:06:03 -07:00
Jingning Han
2873d5608b Merge "Enable accuracy/memory check for 16x16 transforms" 2013-09-10 09:05:34 -07:00
Jingning Han
87bc705fb5 Merge "Rework 16x16 transform unit test" 2013-09-10 09:05:04 -07:00
hkuang
f4a6f936b5 Merge "Speed up idct16x16 by rearrange instructions." 2013-09-10 08:23:57 -07:00
Paul Wilkins
4f660cc018 Modified mode skip functionality.
A previous speed feature skipped modes not used in earlier
partitions but this not longer worked as intended following
changes to the partition coding order and in conjunction
with some other speed features (Especially speed 2 and above).

This modified mode skip feature sets a mask after the first X
modes have been tested in each partition depending on the
reference frame of the current best case.

This patch also makes some changes to the order modes are
tested to fit better with this skip functionality.

Initial testing suggests speed and rd hit count improvements
of up to 20% at speed 1. Quality results. (derf -1.9%, std hd  +0.23%).

Change-Id: Idd8efa656cbc0c28f06d09690984c1f18b1115e1
2013-09-10 13:30:10 +01:00
Paul Wilkins
901c495482 Added extra check to rd_auto_partition_range()
Added check that the returned max and minimum are
valid in bottom and right border cases.

Change-Id: I2d6cdc9b5f04c7d0ff512ddcf3228331e028bf9b
2013-09-10 13:29:23 +01:00
James Zern
563c273738 test/idct_test: add missing vpx_integer.h include
Change-Id: I9de764638ec981bb34fc8e183985d8c285b006fb
2013-09-09 22:20:41 -07:00
hkuang
fc5ec206a7 Speed up idct16x16 by rearrange instructions.
Speed improve from 376% to 400% faster base on assembly-perf.

Change-Id: If0b2eccc39d5793dc101ce9feb7fcadf88396ea2
2013-09-09 18:00:13 -07:00
Jingning Han
37705a3bc5 Enable accuracy/memory check for 16x16 transforms
This commit completes the per coefficient accuracy check and memory
overflow check for SSE2 and other implemented versions of 16x16
transform.

Change-Id: If26a3e4f6ba82ccecc13f0b73cb8f7bb6ac14584
2013-09-09 17:07:55 -07:00
Ivan Maltz
20abe595ec Merge "API extensions and sample app for spacial scalable encoder" 2013-09-09 16:57:01 -07:00
Jingning Han
8f92a7efdb Rework 16x16 transform unit test
This commit refactors the 16x16 transform unit test. It enables the
test on all implemented versions of forward and inverse 16x16 transform
modules.

Change-Id: I0c7d5f3c5fdd5d789a25f73e287aeeaf463b9d69
2013-09-09 16:12:32 -07:00
Ivan Maltz
01b35c3c16 API extensions and sample app for spacial scalable encoder
Sample app: vp9_spatial_scalable_encoder
vpx_codec_control extensions:
  VP9E_SET_SVC
  VP9E_SET_WIDTH, VP9E_SET_HEIGHT, VP9E_SET_LAYER
  VP9E_SET_MIN_Q, VP9E_SET_MAX_Q
expanded buffer size for vp9_convolve

modified setting of initial width in vp9_onyx_if.c so that layer size
can be set prior to initial encode

Default number of layers set to 3 (VPX_SS_DEFAULT_LAYERS)
Number of layers set explicitly in vpx_codec_enc_cfg.ss_number_layers

Change-Id: I2c7a6fe6d665113671337032f7ad032430ac4197
2013-09-09 15:57:56 -07:00
Jingning Han
18c780a0ff Remove the use of uninitialized_safe in encode_sb_
Initialize the probability model context with default value in
encode_sb.

Change-Id: Id826114024dfc21c7ef41aea9f4a0316d4a5cb95
2013-09-09 15:41:16 -07:00
James Zern
c1913c9cf4 Merge "Revert "New mode_info_context storage"" 2013-09-09 14:38:01 -07:00
James Zern
54a03e20dd Revert "New mode_info_context storage"
This reverts commit dae17734ec

Encode crashes, leaks and increases integer overflow errors.

Change-Id: I595aa2649bb8d0b6552ff91652837a74c103fda2
2013-09-09 13:37:01 -07:00
Yaowu Xu
132ef4295a changed to enable vp9_postproc
In configure when internal-stats is enabled, because postprocessing
code is needed for computing stats for enabling internal-stats

Change-Id: I3601dc5a4aa65feb99465452486a21e75eb62c1f
2013-09-09 08:12:56 -07:00
Yaowu Xu
b19126b291 Merge "Reduce the amount of extension in src frames" 2013-09-09 08:09:56 -07:00
Paul Wilkins
740acd6891 Merge "Enable kf restrictions at speed 4" 2013-09-09 05:39:13 -07:00
Yaowu Xu
65c2444e15 Reduce the amount of extension in src frames
The commit changes the border pixel extension from 160 pixel each side
to what is necessary in arnr filter or motion estimation portion, i.e.
16 pixel on top and left side. For right or bottom side, the extension
is changed to either round up image size to multiple of 64 or at least
16 pixels.

Change-Id: Ic05e19b94368c1ab4df568723aae5734e6c3d2c5
2013-09-08 15:51:54 -07:00
Jim Bankoski
9faa7e8186 resolve clang issue : implicit convert tx_mode -> tx_size
Change-Id: Ifc9da470358f58e800e3d0d70a565b61e5f7834a
2013-09-08 07:17:12 -07:00
Jim Bankoski
e378566060 Merge "New mode_info_context storage" 2013-09-08 07:16:25 -07:00
Jingning Han
09bc942b47 Fix overflow issue in 16x16 quantization SSSE3
The 16x16 transform unit test suggested that the peak coefficient
value can reach 32639. This could cause potential overflow issue
in the SSSE3 implmentation of 16x16 block quantization. This commit
fixes this issue by replacing addition with saturated addition.

Change-Id: I6d5bb7c5faad4a927be53292324bd2728690717e
2013-09-06 21:06:10 -07:00
James Zern
fb550ee620 vpx_mem: increase default alignment
this prevents returning an address smaller than the natural heap
alignment from vpx_malloc on e.g., x86_64

Change-Id: I283e858664a8529f28b22060c3815116a7798c0d
2013-09-06 19:11:51 -07:00
Deb Mukherjee
d1268c5921 Merge "Support a constant quality mode in VP9" 2013-09-06 11:22:54 -07:00
Paul Wilkins
f15cdc7451 Enable kf restrictions at speed 4
Change-Id: I453409d3be3f5fe118b15affde45cb52184aef20
2013-09-06 11:16:04 -07:00
Deb Mukherjee
e378a89bd6 Support a constant quality mode in VP9
Adds a new end-usage option for constant quality encoding in vpx. This
first version implemented for VP9, encodes all regular inter frames
using the quality specified in the --cq-level= option, while encoding
all key frames and golden/altref frames at a quality better than that.

The current performance on derfraw300 is +0.910% up from bitrate control,
but achieved without multiple recode loops per frame.

The decision for qp for each altref/golden/key frame will be improved
in subsequent patches based on better use of stats from the first pass.
Further, the qp for regular inter frames may also be varied around the
provided cq-level.

Change-Id: I6c4a2a68563679d60e0616ebcb11698578615fb3
2013-09-06 10:30:53 -07:00
Yaowu Xu
afffa3d9b0 cleanup cpplint warnings
Suggested by James Zern to clear out cpplint warnings for all unit
test code.

Change-Id: I731a3fa4d2a257eb9ef733426ba84286fbd7ea34
2013-09-06 10:13:49 -07:00
Scott LaVarnway
dae17734ec New mode_info_context storage
mode_info_context was stored as a grid of MODE_INFO structs.
The grid now constists of a pointer to a MODE_INFO struct and
a "in the image" flag.  The MODE_INFO structs are now stored
as a stream, eliminating unnecessary copies and is a little
more cache friendly.

For the test clips used, the decoder performance improved
by ~4.3% (1080p) and ~9.7% (720p).

Patch Set 2: Re-encoded clips with latest. Now ~1.7% (1080p)
and 5.9% (720p).

Change-Id: I846f29e88610fce2523ca697a9a9ef2a182e9256
2013-09-06 12:33:34 -04:00
Jim Bankoski
e4e864586c Merge "fix loop filter setup_mask could reach out of bounds issue" 2013-09-06 06:21:28 -07:00
hkuang
3476404912 Merge "Speed up idct8x8 by rearrange instructions. Speed improve from 264% ~ 270% to 280% ~ 300% base on assembly-perf." 2013-09-05 17:37:13 -07:00
Jim Bankoski
736114f44b fix loop filter setup_mask could reach out of bounds issue
Change-Id: Ic8446c4f26b6782a6dc482c19ea73c77646df418
2013-09-05 15:53:31 -07:00
Jingning Han
170be56a74 Merge "Enable 32x32 Transform unit test" 2013-09-05 15:23:27 -07:00
Jingning Han
4ad52a8f18 Enable 32x32 Transform unit test
This commit enabled a full functional test on 32x32 forward/inverse
transform, including round-trip error and memory overflow check. It
tests the prototype functions in C and all other implementations if
applicable.

Change-Id: I9cc50b05abdb4863e7abbcb29209a19b1fe90da7
2013-09-05 14:46:51 -07:00
Jingning Han
1c263d6918 Merge "Use saturated addition in SSSE3 of 32x32 quant" 2013-09-05 14:09:40 -07:00
Jim Bankoski
2156ccaa4a Merge "resolve clang warnings : uninitialized vars in vp9_entropy.h" 2013-09-05 12:55:32 -07:00
Jingning Han
458c2833c0 Use saturated addition in SSSE3 of 32x32 quant
The 32x32 forward transform can potentially reach peak coefficient
value close to 32700, while the rounding factor can go upto 610.
This could cause overflow issue in the SSSE3 implementation of 32x32
quantization process.

This commit resolves this issue by replacing the addition operations
with saturated addition operations in 32x32 block quantization.

Change-Id: Id6b98996458e16c5b6241338ca113c332bef6e70
2013-09-05 12:49:12 -07:00
Jim Bankoski
9fc3d32a50 Merge "faster accounting of inc_mv" 2013-09-05 12:38:56 -07:00
Yaowu Xu
9158b8956f Merge "make bsize requirement for SEG_LVL_SKIP explicit" 2013-09-05 08:15:03 -07:00
Yaowu Xu
7bc775d93d Merge "Added ClearSystemState in a unit test" 2013-09-05 08:14:44 -07:00
Jim Bankoski
2e4ca9d1a5 resolve clang warnings : uninitialized vars in vp9_entropy.h
This helps clear out some of the warnings

Change-Id: Ie7ccaca8fd92542386a7f1b257398e1bdf2f55dc
2013-09-04 18:38:41 -07:00
Jim Bankoski
e8feb2932f Merge "wrap non420 loop filter code in macro" 2013-09-04 17:20:53 -07:00
Paul Wilkins
e5deed06c0 Merge "Attempt to fix speed 4" 2013-09-04 17:19:22 -07:00
Yaowu Xu
1ee66933c1 make bsize requirement for SEG_LVL_SKIP explicit
The segment feature SEG_LVL_SKIP requires the prediction unit size
to be at least BLOCK_8X8. This commit makes the requirement to be
explicit. This is to prevent future encoder implementations from
making wrong choices.

Change-Id: I0127f0bd4c66e130b81f0cb0a8d3dbfe3b2da5c2
2013-09-04 16:32:26 -07:00
hkuang
01c4e04424 Speed up idct8x8 by rearrange instructions.
Speed improve from 264% ~ 270% to 280% ~ 300% base on assembly-perf.

Change-Id: I3e2cc818ec14b432204ff43732f39b6438db685d
2013-09-04 15:57:22 -07:00
Yaowu Xu
e494df1a37 Added ClearSystemState in a unit test
There is another unit test that has been failing randomly on win32
build. Investigation has shown that the failure was caused by simd
register state is not reset appropriately in the fdct8x8 test. This
commit added ClearSystemState() in the teardown of this test, tests
showed it resolved the random failure issue for win32 build.

Related issue: https://code.google.com/p/webm/issues/detail?id=614

Change-Id: I9381d0c1a6f4b855ccaeef1aca8c417ac8c71ee2
2013-09-04 15:07:34 -07:00
Yaowu Xu
72872d3d8c Merge "Fixing problem with invalid delta_q reading." 2013-09-04 14:21:30 -07:00
hkuang
3c05bda058 Merge "Add neon optimize vp9_short_iht4x4_add." 2013-09-04 13:35:09 -07:00
hkuang
3b8614a8f6 Add neon optimize vp9_short_iht4x4_add.
Change-Id: I42c497b68ae1ee645b59c9968ad805db0a43e37e
2013-09-04 12:37:58 -07:00
Dmitry Kovalev
890eee3b47 Fixing problem with invalid delta_q reading.
This is a bitstream change but no currently produces videos should
be affected. https://code.google.com/p/webm/issues/detail?id=610

Change-Id: Ic85a6477df6c201cdf7f70f6bd84607b71f4593c
2013-09-04 11:25:43 -07:00
Yaowu Xu
76a437a31b Merge "Replacing init_dequantizer() with setup_plane_dequants()." 2013-09-04 10:42:12 -07:00
Jim Bankoski
872c6d85c0 Merge "speed up inc_mv_component" 2013-09-04 10:35:51 -07:00
Jim Bankoski
bb2313db28 Merge "make vp9 postproc a config option" 2013-09-04 10:35:26 -07:00
Yunqing Wang
9fd2767200 Merge "Use correct bit cost while static-thresh is on" 2013-09-04 10:26:37 -07:00
Jim Bankoski
c3c21e3c14 wrap non420 loop filter code in macro
Change-Id: I62bca0e7a4bffc1a78b750dbb9df9d2378e92423
2013-09-04 10:24:42 -07:00
Jim Bankoski
79401542f7 make vp9 postproc a config option
Vp9 postproc is disabled for now as its not been shown to help and
may be merged with vp8.

Change-Id: I25620d6cd34c6e10331b18c7b5ef7482e39c6057
2013-09-04 10:02:08 -07:00
Jim Bankoski
532179e845 faster accounting of inc_mv
Moves counting of mv branches to where we have a new mv, instead of after
the whole frame is summed.

Change-Id: I945d9f6d9199ba2443fe816c92d5849340d17bbd
2013-09-04 09:47:57 -07:00
Dmitry Kovalev
d6606d1ea7 Replacing init_dequantizer() with setup_plane_dequants().
Change-Id: Ib67e996b4a6dcb6f481889f5a0d84811a9e3c5d1
2013-09-04 09:22:59 -07:00
Jim Bankoski
5dda1d2394 speed up inc_mv_component
Convert mv_class if statements to look up.  re order to avoid ifs...

Change-Id: I76966a21bf517bb1f9a7957c08c476c7bb3e9a63
2013-09-04 07:11:30 -07:00
James Zern
1cf2272347 Merge "Fix intermediate height in convolve_c" 2013-09-03 15:50:33 -07:00
Paul Wilkins
49317cddad Attempt to fix speed 4
Speed 4 fixed partition size. Use fixed size unless it does not
fit inside image, in which case use the largest size that does.

Change-Id: I250f7a80506750dd82ab355721624a1344247223
2013-09-03 17:46:25 +01:00
Jingning Han
010c0ad0eb Merge "Fix 32x32 forward transform SSE2 version" 2013-09-03 08:58:03 -07:00
Scott LaVarnway
948aaab4ca Merge "Improved mb_lpf_horizontal_edge_w_sse2_8" 2013-09-03 05:44:01 -07:00
Jingning Han
3cf46fa591 Fix 32x32 forward transform SSE2 version
This commit fixed the potential overflow issue in the SSE2
implementation of 32x32 forward DCT. It resolved the corrupted
coded frames in the border of scenes.

Change-Id: If87eef2d46209269f74ef27e7295b6707fbf56f9
2013-08-31 18:47:08 -07:00
Yunqing Wang
0ca7855f67 Use correct bit cost while static-thresh is on
While static-thresh is on, we only need to transmit skip
flag if skip = 1. The cost of skip bit is added to the
total rate cost.

Change-Id: I64e73e482bc297eba22907026298a15fa8cc3920
2013-08-30 15:25:13 -07:00
Paul Wilkins
2b9baca4f0 Merge "Added per pixel inter rd hit count stats" 2013-08-30 08:56:01 -07:00
Jingning Han
e22bb0dc8e Merge "Refactor 16x16 unit tests" 2013-08-30 08:53:19 -07:00
Tero Rintaluoma
e326cecf18 Fix intermediate height in convolve_c
- Intermediate height was not correct i.e. when block size is 4 and
  y_step_q4 is 6. In this case intermediate height was
  (4*6) >> 4 = 1 and vertical interpolation needs two source pixels
  plus 7 extra pixels for taps.
- Also if the current output block is 16x16 and we are using 4x upscaling
  we need only 12 rows after horizontal filtering instead of 16.

  Patch Set 2: Intermediate_height updated after CL 66723
               "Fix bug in convolution functions (filter selection)"

Change-Id: I5a1a1bc2ac9d5edb3a6e0818de618bf318fdd589
2013-08-30 10:31:21 +03:00
Jim Bankoski
1d44fc0c49 Merge "rework filter_block_plane" 2013-08-29 20:11:09 -07:00
Jim Bankoski
bc50961a74 rework filter_block_plane
Change-Id: I55c3b60c4c0f4910d3dfb70e3edaae00cfa8dc4d
2013-08-29 17:00:05 -07:00
Jingning Han
ec4b2742e7 Refactor 16x16 unit tests
Make the new test module comply to the unit test rules.

Change-Id: Id79ff7f03f870973ffbc74f26d64edb418b75299
2013-08-29 16:49:11 -07:00
Jingning Han
c86c5443eb Merge "Fix overflow issue in SSSE3 32x32 quantization" 2013-08-29 16:49:04 -07:00
Paul Wilkins
1f4bf79d65 Added per pixel inter rd hit count stats
Added some code to output normalized rd hit count stats.
In effect this approximates to the average number of rd
operations/tests per pixel for the sequence.

The results are not quite accurate and I have not bothered
to account for partial SB64s at frame edges and for key frames
However they do give some idea of the number of modes /
prediction methods being tested for each pixel across the
different partition sizes. This indicates how much scope their
is for further gains either by reducing the number of partitions
examined or the modes per partition through heuristics.

Patch 3 moved place where count incremented so partial rd
tests that are aborted with INT_MAX return are also counted.

Example numbers for first 50 frames of Akiyo.
Speed 0 ~84.4 rd operations / pixel
Speed 1 ~28.8
Speed 2 ~11.9

Change-Id: Ib956e787e12f7fa8b12d3a1a2f6cda19a65a6cb8
2013-08-30 00:13:51 +01:00
Deb Mukherjee
b6dbf11ed5 Merge "Adds a speed feature for fast 1-loop forw updates" 2013-08-29 15:54:04 -07:00
James Zern
e83e8f0426 Merge changes Ib1e853f9,Ifd75c809,If3e83404
* changes:
  consistently name VP9_COMMON variables #3
  consistently name VP9_COMMON variables #2
  consistently name VP9_COMMON variables #1
2013-08-29 15:50:56 -07:00
Yaowu Xu
ee961599e1 Merge "Fixed potential overflows" 2013-08-29 15:43:26 -07:00
James Zern
d765df2796 consistently name VP9_COMMON variables #3
stragglers

Change-Id: Ib1e853f9a331b7b66639dc34d79568d84d1930f1
2013-08-29 13:27:41 -07:00
James Zern
aa05321262 consistently name VP9_COMMON variables #2
oci -> cm

Change-Id: Ifd75c809d9cc99034d3c2fccc4653a78b3aec21f
2013-08-29 13:25:58 -07:00
James Zern
924d74516a consistently name VP9_COMMON variables #1
pc -> cm

Change-Id: If3e83404f574316fdd3b9aace2487b64efdb66f3
2013-08-29 13:25:57 -07:00
Dmitry Kovalev
e80bf802a9 Merge "Renaming txfm_size to tx_size." 2013-08-29 12:30:18 -07:00
Jingning Han
abff678866 Fix overflow issue in SSSE3 32x32 quantization
The 32x32 quantization process can potentially have the intermediate
stacks over 16-bit range, thereby causing enc/dec mismatch. This commit
fixes this overflow issue in the SSSE3 implementation, as well as the
prototype, of 32x32 quantization.

This fixes issue 607 from webm@googlecode.

Change-Id: I85635e6ca236b90c3dcfc40d449215c7b9caa806
2013-08-29 11:00:54 -07:00
Yaowu Xu
aaa7b44460 Fixed potential overflows
The two arrays are typically initialized to INT64_MAX, if they are not
filled with valid values before the addition, the values can overflow
and lead to wrong results.

Change-Id: I515de22cf3e8f55af4b74bdb2c8eb821a02d3059
2013-08-29 10:26:52 -07:00
Scott LaVarnway
22dc946a7e Improved mb_lpf_horizontal_edge_w_sse2_8
This patch is a reformatted version of optimizations done by
engineers at Intel (Erik/Tamar) who have been providing
performance feedback for VP9.  For the test clips used (720p, 1080p),
up to 1.2% performance improvement was seen.

Change-Id: Ic1a7149098740079d5453b564da6fbfdd0b2f3d2
2013-08-29 08:30:17 -04:00
Dmitry Kovalev
b71807082c Merge "General code cleanup." 2013-08-28 12:57:49 -07:00
Dmitry Kovalev
db20806710 Merge "Removing unnecessary call to vp9_setup_interp_filters." 2013-08-28 12:31:08 -07:00
Dmitry Kovalev
b62ddd5f8b General code cleanup.
Switching from mi_{width, height}_log2 and b_{width, height}_log2 to
num_8x8_blocks_{wide, high} and num_4x4_blocks_{wide, high}. Removing
redundant code, adding const.

Change-Id: Iaab2207590fd24d0b76999071778d1395dc5cd5d
2013-08-28 12:22:37 -07:00
Deb Mukherjee
e02dc84c1a Adds a speed feature for fast 1-loop forw updates
Incorporates a speed feature for fast forward updates of
coefficients. This feature takes 3 values:
0 - use standard 2-loop version
1 - use a 1-loop version
2 - use a 1-loop version with reduced updates

Results: derfraw300 +0.007% (on speed 0) at feature value = 1
                    -0.160% (on speed 0) at feature value = 2

There is substantial speed up at speeds 2 and above for low
resolution sequences where the entropy updates are a big part
of the overall computations.

Change-Id: Ie96fc50777088a5bd441288bca6111e43d03bcae
2013-08-28 10:56:52 -07:00
Dmitry Kovalev
851a2fd72c Renaming txfm_size to tx_size.
Change-Id: I752e374867d459960995b24d197301d65ad535e3
2013-08-27 19:47:53 -07:00
Jingning Han
eb7acb5524 Merge "Fix buf alignment in sub8x8 comp inter-inter pred" 2013-08-27 19:03:12 -07:00
Dmitry Kovalev
1d3f94efe2 Merge "Adding get_entropy_context function." 2013-08-27 17:02:36 -07:00
Frank Galligan
7d058ef86c Merge "Fix winodws warning." 2013-08-27 15:39:58 -07:00
Frank Galligan
f1560ce035 Fix winodws warning.
Const is not needed on the function parameter.

Change-Id: I38c2a7317cb6f42f70bbddfde9a2cd18d65ceb1c
2013-08-27 15:19:55 -07:00
Dmitry Kovalev
a93992e725 Adding get_entropy_context function.
Moving common code from encoder and decoder to this function.

Change-Id: I60fa643fb1ddf7ebbff5e83b6c4710137b0195ef
2013-08-27 14:17:53 -07:00
hkuang
3a679e56b2 Add neon optimize vp9_short_idct16x16_1_add.
Change-Id: Ib9354c1d975d03e8081df20d50b6a77dfe2dc7e5
2013-08-27 14:00:27 -07:00
hkuang
ce04b1aa62 Merge "Add neon optimize vp9_short_idct8x8_1_add." 2013-08-27 12:10:07 -07:00
Dmitry Kovalev
7b95f9bf39 Renaming BLOCK_SIZE_TYPE to BLOCK_SIZE in the encoder.
Change-Id: I62bb07c377f947cb72fac68add7a6b199e42c6b9
2013-08-27 11:05:08 -07:00
Dmitry Kovalev
ba10aed86d Merge "Using num_8x8_* lookup tables instead of mi_*_log2." 2013-08-27 10:49:36 -07:00
Dmitry Kovalev
12e5931a9a Merge "Using existing functions instead of raw expressions." 2013-08-27 10:33:34 -07:00
Dmitry Kovalev
f77c6973a1 Merge "Cleaning up decode_block_intra function." 2013-08-27 10:17:56 -07:00
Dmitry Kovalev
f389ca2acc Merge "Cleaning up model_rd_for_sb_y_tx." 2013-08-27 10:17:10 -07:00
Dmitry Kovalev
bfebe7e927 Merge "Renaming BLOCK_SIZE_TYPE to BLOCK_SIZE in the common/decoder." 2013-08-27 10:15:21 -07:00
Dmitry Kovalev
78e670fcf8 Merge "Renaming D27 to D207." 2013-08-27 10:03:57 -07:00
Jingning Han
2d6aadd7e2 Fix buf alignment in sub8x8 comp inter-inter pred
This commit resolved a mis-alignment issue in compound inter-inter
prediction of sub8x8. This patch follows solution from dkovalev@.

Change-Id: I3cc0cf7e55b84110e0c42ef4b2e6ca7ac3f8f932
2013-08-27 09:28:05 -07:00
Yaowu Xu
45125ee573 Merge "fixed the reading too many bytes" 2013-08-27 09:09:18 -07:00
Yaowu Xu
9482c07953 fixed the reading too many bytes
In subpel_avg_variance functions, code similar to the following

punpkldq m2, [addr]

actually reads 8 bytes. For functions that are supposed to work on
buffers only have less 8 bytes a line, this caused valgrind error
of reading uninitialized memory.

Change-Id: I2a4c079dbdbc747829bd9e2ed85f0018ad2a3a34
2013-08-27 08:39:20 -07:00
Jim Bankoski
3e43e49ffd Merge "Add a test vector that tests color space 444" 2013-08-27 08:33:12 -07:00
Dmitry Kovalev
44b7854c84 Removing unnecessary call to vp9_setup_interp_filters.
vp9_setup_interp_filters before each inter block decoding, it is not
necessary to call it just before the whole frame decoding.

Change-Id: Id1b0ee62f987474e27eafba0013a4896b492c400
2013-08-26 17:25:49 -07:00
hkuang
36e9b82080 Add neon optimize vp9_short_idct8x8_1_add.
Change-Id: I0b15d5e3b0eb97abb9ab5ec08e88b61f8723aaf4
2013-08-26 16:28:57 -07:00
hkuang
ba8fc71979 Merge "Add neon optimize vp9_short_idct4x4_1_add." 2013-08-26 16:26:38 -07:00
Dmitry Kovalev
657ee2d719 Cleaning up model_rd_for_sb_y_tx.
Removing references to plane_block_width and plane_block_height (we are
going to delete the latter ones).

Change-Id: I7982da4d373aebb54d2209dc8886f6192df4d287
2013-08-26 16:18:28 -07:00
hkuang
69384f4fad Add neon optimize vp9_short_idct4x4_1_add.
Change-Id: I6ecb5c4a1a472feb8e84e9f3352b536d5e28a4a5
2013-08-26 15:55:16 -07:00
Jim Bankoski
bbb490f6a3 Merge "Fix Chroma plane md5 check" 2013-08-26 15:27:14 -07:00
Jim Bankoski
a5cb05c45d Add a test vector that tests color space 444
This adds a test vector for 444 color space.

Change-Id: I1e2ac3883211989a062cfafc0e58151b14d294b8
2013-08-26 15:24:35 -07:00
Dmitry Kovalev
242460cb66 Cleaning up decode_block_intra function.
Change-Id: Ia41ea5d526d15fcbc9b56d74079593cf8b2fdf66
2013-08-26 15:24:12 -07:00
Jim Bankoski
af13fbb70f Fix Chroma plane md5 check
Chroma plane MD5 calculation was incorrect for 444 and 422
yuv color spaces.

Change-Id: If985396871a2f57db85108a4355172f9793d3007
2013-08-26 14:26:38 -07:00
Dmitry Kovalev
b25589c6bb Using num_8x8_* lookup tables instead of mi_*_log2.
Change-Id: I8a246b3d056c98be614d05a90bc261e2441ffc10
2013-08-26 14:22:54 -07:00
Yaowu Xu
4505e8accb Merge "Fix the reading of too many input pixels" 2013-08-26 14:01:50 -07:00
Paul Wilkins
aa823f8667 Merge "Changes to adaptive inter rd thresholds." 2013-08-26 12:48:11 -07:00
Yaowu Xu
6c5433c836 Fix the reading of too many input pixels
in VP9_get4x4var_mmx

Change-Id: I4b4a8f45f25ebdfad281f169cc87aba5e2d6f227
2013-08-26 12:35:27 -07:00
Paul Wilkins
642696b678 Merge "Limit Key frame Intra modes checks." 2013-08-26 12:34:56 -07:00
Dmitry Kovalev
45870619f3 Renaming BLOCK_SIZE_TYPE to BLOCK_SIZE in the common/decoder.
Adding temporary "typedef BLOCK_SIZE BLOCK_SIZE_TYPE" which will go away
after encoder's patch.

Change-Id: I06ec6a6f079401439843ec981d1496234fd7775c
2013-08-26 11:33:16 -07:00
Jingning Han
4681197a58 Merge "Temporarily disable SSSE3 quant_32x32" 2013-08-26 11:19:53 -07:00
Dmitry Kovalev
5eed6e2224 Merge "Removing redundant calls to clamp_mv2." 2013-08-26 10:48:37 -07:00
Jingning Han
166dc85bed Temporarily disable SSSE3 quant_32x32
Make the current head working properly, while working on fixing an
issue in the SSSE3 implementation of 32x32 quantization.

Change-Id: Ic029da3fd7f1f5e58bc641341cbd226ec49a16bc
2013-08-26 10:45:59 -07:00
James Zern
66ccf5ddcf Merge "cosmetics: yv12extend add some const" 2013-08-24 18:13:41 -07:00
James Zern
8b970da40d cosmetics: yv12extend add some const
Change-Id: I87f1ce2ceca80d3869dd72ba862329a98eb3e0c2
2013-08-24 12:09:01 -07:00
James Zern
b19babe5e6 Merge "cosmetics: strip 'VP9_' from defines in vp9 only code" 2013-08-23 19:59:16 -07:00
James Zern
55b5a68d72 Merge "yv12extend: name variables consistently" 2013-08-23 19:40:03 -07:00
James Zern
c8ba8c513c cosmetics: strip 'VP9_' from defines in vp9 only code
Change-Id: I481d9bb2fa3ec72b6a83d5f04d545ad8013f295c
2013-08-23 19:16:49 -07:00
James Zern
2c6ba737f8 Merge "vp9: remove unnecessary wait w/threaded loopfilter" 2013-08-23 18:52:10 -07:00
James Zern
5724b7e292 yv12extend: name variables consistently
- s|source -> src
- dest -> dst
- use verbose names in extend_plane dropping the redundant comments

+ light cosmetics:
- join a few lines / assignments
- drop some unnecessary comments & includes

Change-Id: I6d979a85a0223a0a79a22f79a6d9c7512fd04532
2013-08-23 18:45:11 -07:00
Dmitry Kovalev
50ee61db4c Renaming D27 to D207.
I've already renamed d27_predictor to d207_predictor but forgot about the
corresponding constant.

Change-Id: Id312aa80fc5b5a1ab8a709a33418a029552a6857
2013-08-23 17:33:48 -07:00
Dmitry Kovalev
480dd8ffbe Using existing functions instead of raw expressions.
Change-Id: Ifa50b04bac1a6ff2abef989073cbf1f37a89eb50
2013-08-23 17:26:53 -07:00
Dmitry Kovalev
e6c435b506 Merge "Cleanup in mvref_common.{h, c}." 2013-08-23 17:09:49 -07:00
Dmitry Kovalev
7194da2167 Merge "Fixing display size setting problem." 2013-08-23 17:08:51 -07:00
Yaowu Xu
13930cf569 Limit mv range to be based on partition size
Previous change c4048dbd limits the mv search range assuming max block
size of 64x64, this commit change the search range using actual block
size instead.

Change-Id: Ibe07ab02b62bf64bd9f8675d2b997af20a2c7e11
2013-08-23 15:43:57 -07:00
Dmitry Kovalev
cd2cc27af1 Removing redundant calls to clamp_mv2.
We could avoid calling clamp_mv2 because it has been already called
inside vp9_find_best_ref_mvs function.

Change-Id: I08edeaf3e11e98c19e67b9711b2523ca5fb1416e
2013-08-23 15:18:35 -07:00
Yaowu Xu
8e04257bc5 Merge "Added border extension" 2013-08-23 14:43:58 -07:00
Adrian Grange
78debf246b Merge "Fix bug in convolution functions (filter selection)" 2013-08-23 13:41:47 -07:00
Dmitry Kovalev
fb481913f0 Merge "Removing useless calls to setup_{pre, dst}_planes." 2013-08-23 13:37:32 -07:00
Dmitry Kovalev
11e3ac62a5 Fixing display size setting problem.
Fix of https://code.google.com/p/webm/issues/detail?id=608. We could have
used invalid display size equal to the previous frame size (not to the
current frame size).

Change-Id: I91b576be5032e47084214052a1990dc51213e2f0
2013-08-23 13:12:46 -07:00
Dmitry Kovalev
21d8e8590b Cleanup in mvref_common.{h, c}.
Making code more compact, adding consts, removing redundant arguments,
adding do/while(0) for macros.

Change-Id: Ic9ec0bc58cee0910a5450b7fb8cfbf35fa9d0d16
2013-08-23 12:00:30 -07:00
Yaowu Xu
656632b776 Added border extension
To the source buffer to be encoded as an alt ref frame. This is to fix
the problem of using uninitialized memory in encoder.

See https://code.google.com/p/webm/issues/detail?id=605

Change-Id: I97618a2fc207e08abcf5301b734aa9e3ad695e2c
2013-08-23 11:31:28 -07:00
Adrian Grange
3f10831308 Fix bug in convolution functions (filter selection)
(In response to Issue 604:
 https://code.google.com/p/webm/issues/detail?id=604)

There were bugs in the convolution code for two cases:

1. Where the filter table was assumed to be aligned to a
   256 byte boundary. The offset of the pixel in the
   source buffer was computed incorrectly.

2. Where no such alignment assumption was made. An
   incorrect address for the filter table base was used.

To fix both problems, I now assume that the filter table is
256-byte aligned and modify the pixel offset calculation to
match.

A later patch should remove the restriction that the filter
table is aligned to a 256-byte boundary.

There was also a bug in the ConvolveTest unit test
(convolve_test.cc).

(Bug & initial fix suggestion submitted by Tero Rintaluoma
and Sami Pietilä).

Change-Id: I71985551e62846e55e40de9e7e3959d4805baa82
2013-08-23 11:16:08 -07:00
Dmitry Kovalev
1c159c470a Merge "Checking scale factors on access." 2013-08-23 11:05:17 -07:00
James Zern
bef320aa07 Merge "vpx_scale: correct pixel spelling" 2013-08-23 11:01:27 -07:00
hkuang
b85367a608 Merge "Optimise idct4x4: rearrange the instructions a bit to improve instruction scheduling." 2013-08-23 10:08:43 -07:00
Paul Wilkins
aa5b67add0 Changes to adaptive inter rd thresholds.
Values now carried over frame to frame.
Change to algorithm for decreasing threshold after
a hit and to max threshold (now based on speed)

Removed some old commented out code relating to
VP8 adaptive thresholds.

The impact of these changes tested on Akiyo (50 frames)
and measured in terms of unit rd hits is as follows:

Speed 0 84.36 -> 84.67
Speed 1 29.48 -> 22.22
Speed 2 11.76 -> 8.21
Speed 3 12.32 -> 7.21

Encode speed impact is broadly in line with these.

Change-Id: I5b886efee3077a11553fa950d796fd6d00c8cb19
2013-08-23 16:18:45 +01:00
Paul Wilkins
f76f52df61 Limit Key frame Intra modes checks.
Most of the focus so far has been on inter frames.

At high speed settings the key frame is now taking a high %
of the cycles.

This patch puts in some masking to reduce the number
of INTRA modes searched during key frame coding (as already
happens for inter frames) at higher speed settings

TODO: Develop this further with either adaptive rd thresholds
when choosing which intra modes to consider or some other
heuristic.

Impact.
At high speed settings on some clips the key frame was starting
to dominate. In a coding of the first 50 frames of AKIYO at speed
2 limiting the key frame intra modes to DC or TM_PRED resulted in
~30% overall speedup. For Bus the number was lower at ~4-5%.

Change-Id: I7bde68aee04995f9d9beb13a1902143112e341e2
2013-08-23 16:10:30 +01:00
James Zern
735b3a710a vpx_scale: correct pixel spelling
Change-Id: Idcfab16da37134f943a4314674e2e2fcbff3a0f8
2013-08-22 19:39:52 -07:00
Jingning Han
9655c2c7a6 Merge "Fix rectangular partition check flag" 2013-08-22 18:59:18 -07:00
Dmitry Kovalev
33104cdd42 Merge "vp9_encodeframe.c cleanup." 2013-08-22 18:07:35 -07:00
James Zern
711aff9d9d Merge "vp9/encoder: fix last_frame_seg_map mem leak" 2013-08-22 18:04:03 -07:00
James Zern
d843ac5132 Merge "rename LOG2_* defines to *_LOG2" 2013-08-22 18:02:42 -07:00
Jingning Han
84f3b76e1c Fix rectangular partition check flag
Put rectangular partition check flag change according to the rd
costs of NONE and SPLIT partition types under the speed feature.

Change-Id: If681e1e078a8d43d86961ea4b748da5cd1b6c331
2013-08-22 17:15:01 -07:00
Dmitry Kovalev
53f6f8ac93 Merge "check_bsize_coverage cleanup." 2013-08-22 16:18:24 -07:00
hkuang
4205d79273 Merge "Add neon optimize vp9_short_idct10_16x16_add." 2013-08-22 15:57:28 -07:00
hkuang
4082bf9d7c Add neon optimize vp9_short_idct10_16x16_add.
vp9_short_idct10_16x16_add is used to handle the block that only have valid data
at top left 4x4 block. All the other datas are 0. So we could cut many
unnecessary calculations in order to save instructions.

Change-Id: I6e30a3fee1ece5af7f258532416d0bfddd1143f0
2013-08-22 15:53:22 -07:00
Dmitry Kovalev
604022d40b vp9_encodeframe.c cleanup.
Removing unused get_sbuv_perpixel_variance function, using has_second_ref/
is_inter_block functions, organizing includes.

Change-Id: I016de4af12fbbb8b4ece26a70759b2392651b095
2013-08-22 15:50:51 -07:00
Dmitry Kovalev
335b1d360b check_bsize_coverage cleanup.
Change-Id: Ib7803857b35c00e317c9deb8630e777e25eb278f
2013-08-22 15:45:56 -07:00
Dmitry Kovalev
3c42657207 Checking scale factors on access.
It is possible to have invalid scale factors and not access them
during decoding. Error is reported if we really try to use invalid scale
factors.

Change-Id: Ie532d3ea7325ee0c7a6ada08269f804350c80fdf
2013-08-22 15:19:05 -07:00
James Zern
40ae02c247 rename LOG2_* defines to *_LOG2
gets rid of a mix of styles

Change-Id: I3591d312157bc6f53a25438bf047765c671fd8a8
2013-08-22 14:45:24 -07:00
Dmitry Kovalev
13eed79c77 Merge "Adding vp9_is_scaled function." 2013-08-22 14:39:55 -07:00
Dmitry Kovalev
09858c239b Removing useless calls to setup_{pre, dst}_planes.
Comment is wrong, we don't initialize any xd pointers. We only initialize
xd->planes[i]->dst and xd->planes[i]->pre[], which are actually initialized
for every block during the decoding.

Change-Id: If152ea872ebef1f83ca70712fa6f8df1b6855f56
2013-08-22 14:39:05 -07:00
James Zern
a5726ac453 vp9/encoder: fix last_frame_seg_map mem leak
remove duplicate allocation from vp9_create_compressor, it was added to
vp9_alloc_frame_buffers in:

d5bec52 Added resizing & initialization of last frame segment map

Change-Id: I996723226a16a62aff8f9a52ac74e0b73cc98fdf
2013-08-22 14:13:04 -07:00
Dmitry Kovalev
640dea4d9d Adding vp9_is_scaled function.
Change-Id: Ieb7077ca3586b9491912027eed450a4f6fd38d30
2013-08-22 14:04:59 -07:00
Jingning Han
8adc20ce35 Merge "Refactor rd_pick_partition for parameter control" 2013-08-22 13:54:48 -07:00
James Zern
da9a6ac9e7 Merge "vp9_peek_si: add bitstream v1 support" 2013-08-22 13:28:00 -07:00
Jingning Han
01a37177d1 Refactor rd_pick_partition for parameter control
This commit changes the partition search order of superblocks from
{SPLIT, NONE, HORZ, VERT} to {NONE, SPLIT, HORZ, VERT} for
consistency with that of sub8x8 partition search. It enable the use
of early termination in partition search for all block sizes.

For ped_area_1080p 50 frames coded at 4000 kbps, it makes the runtime
goes down from 844305ms -> 818003ms (3% speed-up) at speed 0.

This will further move towards making the in-search partition types
configurable, hence unifying various speed-up approaches.

Some speed 1 and 2 features are turned off during the refactoring
process, including:
disable_split_var_thresh
using_small_partition_info

Stricter constraints are applied to use_square_partition_only for
right/bottom boundary blocks. Will bring back/refine these features
subsequently. At this point, it makes derf set at speed 1 about
0.45% higher in compression performance, and 9% down in run-time.

Change-Id: I3db9f9d1d1a0d6cbe2e50e49bd9eda1cf705f37c
2013-08-22 12:36:02 -07:00
hkuang
610642c130 Optimise idct4x4: rearrange the instructions a bit
to improve instruction scheduling.

Change-Id: I5ea881a6e419f9e8ed4b3b619406403b4de24134
2013-08-22 11:02:22 -07:00
Deb Mukherjee
8b810c7a78 Fixes on feature disabling split based on variance
Adds a couple of minor fixes, which may be absorbed in Jingning's
patch. Thanks to Guillaume for pointing these out.
Also adjusts the thresholds for speed 1 and 2 to 16 and 32
respectively, to keep quality drops small.

Results:
--------
derfraw300:  threshold = 16, psnr -0.082%, speedup 2-3%
             threshold = 32, psnr -0.218%, speedup 5-6%
stdhdraw250: threshold = 16, psnr -0.031%, speedup 2-3%
             threshold = 32, psnr -0.273%, speedup 5-6%

Change-Id: I4b11ae8296cca6c2a9f644be7e40de7c423b8330
2013-08-22 07:05:44 -07:00
Scott LaVarnway
f39bf458e5 Merge "Initialize mb_skip_coeff before picking modes" 2013-08-22 06:26:04 -07:00
Scott LaVarnway
94bfbaa84e Initialize mb_skip_coeff before picking modes
It appears that the above/left mb_skip_coeff used during
the pick modes, is left over from the previously
encode frame.  This patch initializes the flag to the default
value of zero.


Change-Id: Ida4684cc99611d6e3e82628db35ed717e28ce550
2013-08-22 08:51:04 -04:00
Dmitry Kovalev
96a1a59d21 Merge "Using has_second_ref function to simplify the code." 2013-08-22 01:39:14 -07:00
Dmitry Kovalev
a33f178491 Merge "Cleaning up foreach_transformed_block_in_plane." 2013-08-22 01:37:21 -07:00
Dmitry Kovalev
359b571448 Merge "Cleaning up reset_skip_context function." 2013-08-22 01:36:25 -07:00
Dmitry Kovalev
596c51087b Merge "Removing unused foreach_predicted_block function." 2013-08-22 01:35:41 -07:00
Dmitry Kovalev
cb05a451c6 Merge "Cleaning up optimize_init_b function." 2013-08-22 01:35:27 -07:00
Dmitry Kovalev
64c0f5c592 Merge "Cleaning up sum_intra_stats function." 2013-08-22 01:34:39 -07:00
Jingning Han
fcb890d751 Merge "Enable zero coeff check in sub8x8 UV rd loop" 2013-08-21 22:07:00 -07:00
James Zern
ccb6bdca75 configure: fix action expansion
enable|disable -> (enable|disable)_feature

Change-Id: I7494913c78ebe8bb532fa6545e0ae53a79ccd388
2013-08-21 19:00:08 -07:00
James Zern
42ab401fd3 configure: rename enable() to enable_feature()
+ disable() -> disable_feature() for balance

this avoids shadowing the bash builtin 'enable' allowing the scripts to
be linted with checkbashisms

Change-Id: Ia11cf86c92ec25bd14e69427b0ac0a9a61a5f7a5
2013-08-21 18:11:45 -07:00
James Zern
85640f1c9d vp9: remove unnecessary wait w/threaded loopfilter
the final macroblock rows are scheduled in the main thread. prior to
this change one additional macroblock row would be scheduled in the
worker forcing the main thread to wait before finishing.

Change-Id: I05f3168e5c629b898fcebb0d77eb6d6a90d6105e
2013-08-21 17:43:44 -07:00
Dmitry Kovalev
4172d7c584 Cleaning up foreach_transformed_block_in_plane.
Change-Id: I9f45af3894c57f35cb266c255e2b904295d39c34
2013-08-21 17:16:02 -07:00
James Zern
6167355309 vp9_peek_si: add bitstream v1 support
currently protected by CONFIG_NON420 as v1 is still not entirely stable

Change-Id: Id1c5081b04a2c47a842822048b8804be67d23a6d
2013-08-21 17:04:10 -07:00
Dmitry Kovalev
be60924f29 Cleaning up optimize_init_b function.
Change-Id: Ib2c975e1d96deefb7ac4d6b600c8c5388035d111
2013-08-21 16:40:16 -07:00
Dmitry Kovalev
c43da352ab Cleaning up reset_skip_context function.
Change-Id: Ib3e72671eb8da6f2e9767a6de292ec7c7cde6bc7
2013-08-21 16:31:51 -07:00
Dmitry Kovalev
048ccb2849 Cleaning up sum_intra_stats function.
Using size_group_lookup table and better variable names.

Change-Id: I6e67f2ce091845db43ace7d21b7ae31c6f165aec
2013-08-21 16:25:02 -07:00
Dmitry Kovalev
3286abd82e Merge "Adding scale factor check." 2013-08-21 14:11:13 -07:00
Dmitry Kovalev
687891238c Merge "Removing PLANE_TYPE argument from cost_coeffs function." 2013-08-21 14:10:05 -07:00
Deb Mukherjee
a2f7619860 Merge "Make "good" quality 2-pass vpxenc encoding default" 2013-08-21 13:58:49 -07:00
James Zern
ac12f3926b Merge "vp9 rtcd: remove non-existent sad functions" 2013-08-21 13:55:59 -07:00
Dmitry Kovalev
2f1a0a0e2c Removing PLANE_TYPE argument from cost_coeffs function.
We can determine plane_type for another function arguments.

Change-Id: I85331877aedb357632ae916a37b5b15f22c0bb1f
2013-08-21 13:02:28 -07:00
Deb Mukherjee
0d8723f8d5 Make "good" quality 2-pass vpxenc encoding default
Currently, the best quality mode in VP9 is not very well developed,
and unnecessarily makes the encode too slow. Hence the command line
default is changed to "good" quality. Also, the number of passes
default is changed to 2 passes as well, since 1-pass encoding is
not very efficient in VP9.

Besides, a number of VP9 defaults are set to the currently
recommended settings. With these changes, vpxenc
run with --codec=vp9 --kf-max-dist=9999 --cpu-used=0 should
work about the same as our borg results.
Note when the --cpu-used=0 option is dropped there will be a slight
difference in the output, because of a difference in the cpu-used
value for the first pass. Specifically, the default when unspecified
is to use cpu_used=1 for the first pass and cpu_used=0 for the
second pass. But when specified, both passes will use the cpu-used
value specified.

Note that this also changes the default for VP8 as being "good"
but other options stay unchanged.

Change-Id: Ib23c1a05ae2f36ee076c0e34403efbda518c5066
2013-08-21 12:41:26 -07:00
Dmitry Kovalev
27a984fbd3 Removing a lot of duplicated code.
Adding set_contexts contexts function and call it instead of
set_contexts_on_border. Calling txfrm_block_to_raster_xy to get aoff and
loff.

Change-Id: I41897e344afd2cae1f923f4fdbe63daccf6fe80e
2013-08-21 11:55:12 -07:00
Dmitry Kovalev
a3ae4c87fd Adding scale factor check.
We support only [1/16, 2] scale factors, enforcing this now.

Change-Id: I0822eb7cea51720df6814e42d3f35ff340963061
2013-08-21 11:24:47 -07:00
Adrian Grange
ce28d0ca89 Fix typos and minor stylistic cleanup
Change-Id: I32e43474e8651ef2eb181d24860a8f118cfea7bf
2013-08-21 08:45:42 -07:00
Adrian Grange
5b63963573 Merge "Further correct bug in loopfilter initialization" 2013-08-21 07:17:43 -07:00
James Zern
ae455fabd8 vp9 rtcd: remove non-existent sad functions
vp9_sad32x3, vp9_sad3x32

+ remove unnecessary sad include from vp9_findnearmv.c

Change-Id: Idef2a89cadc3fec64eff82ba9be60ffff50b3468
2013-08-20 18:07:53 -07:00
Dmitry Kovalev
90027be251 Removing unused foreach_predicted_block function.
Moving foreach_predicted_block_in_plane function to vp9_reconinter.c
because there is only one usage.

Change-Id: I9852feae43fc3cf809b817fc541d043bc5496209
2013-08-20 17:20:47 -07:00
Dmitry Kovalev
7f814c6bf8 Merge "Passing plane_bsize to foreach_transformed_block_visitor." 2013-08-20 14:25:01 -07:00
Dmitry Kovalev
27de4fe922 Using has_second_ref function to simplify the code.
Updating implementation of vp9_get_pred_context_single_ref_p2 using
has_second_ref function to make code easier to read.

Change-Id: I5ba642712f59861a48aab974e73aa01640d086fe
2013-08-20 14:09:56 -07:00
hkuang
62a2cd9ed2 Merge "Add neon optimize vp9_short_idct10_8x8_add." 2013-08-20 14:06:57 -07:00
Dmitry Kovalev
381d3b8b7d Merge "vp9_filter.{h, c} cleanup + adding SUBPEL_TAPS constant." 2013-08-20 13:46:53 -07:00
Dmitry Kovalev
d19ac4b66d vp9_filter.{h, c} cleanup + adding SUBPEL_TAPS constant.
Change-Id: Ib394ea23f464591dad50b5c65c316701378d06d7
2013-08-20 12:29:57 -07:00
hkuang
37cda6dc4c Add neon optimize vp9_short_idct10_8x8_add.
vp9_short_idct10_8x8_add is used to handle the block that only have valid data
at top left 4x4 block. All the other datas are 0. So we could cut several
unnecessary calculations in order to save instructions.

Change-Id: I34fda95e29082b789aded97c2df193991c2d9195
2013-08-20 11:51:07 -07:00
Jingning Han
1bf1428654 Enable zero coeff check in sub8x8 UV rd loop
Check the minimum rate-distortion cost of regular quantization and
all zero coeffs cases in the sub8x8 inter prediction rd loop for
luma components. Use this as the cumulative rdcost sent to UV rd
estimation.

Change-Id: Ia4bc7700437d5e13d7cdad4cf9ae57ab036d3e97
2013-08-20 10:33:42 -07:00
Deb Mukherjee
246381faf2 Merge "Cleanup/enhancements of switchable filter search" 2013-08-20 10:16:51 -07:00
Dmitry Kovalev
5826407f2a Merge "Moving plane_block_idx from vp9_blockd.h to vp9_quantize.c." 2013-08-20 10:06:22 -07:00
Dmitry Kovalev
5baf510f74 Merge "Adding has_second_ref function." 2013-08-20 10:06:14 -07:00
Dmitry Kovalev
039b0c4c9e Merge "Adding VP9_FILTER_BITS constant." 2013-08-20 10:05:09 -07:00
Deb Mukherjee
2ffe64ad5c Cleanup/enhancements of switchable filter search
Cleans up the switchable filter search logic. Also adds a
speed feature - a variance threshold - to disable filter search
if source variance is lower than this value.

Results: derfraw300
threshold = 16, psnr -0.238%, 4-5% speedup (tested on football)
threshold = 32, psnr -0.381%, 8-9% speedup (tested on football)
threshold = 64, psnr -0.611%, 12-13% speedup (tested on football)
threshold = 96, psnr -0.804%, 16-17% speedup (tested on football)

Based on these results, the threshold is chosen as 16 for speed 1,
32 for speed 2, 64 for speed 3 and 96 for speed 4.

Change-Id: Ib630d39192773b1983d3d349b97973768e170c04
2013-08-20 09:47:04 -07:00
Jingning Han
bb64c9a355 Merge "Enable early termination in uv rd loop" 2013-08-20 09:07:26 -07:00
Jim Bankoski
be5dc2321b Merge "fix the mv_ref_idx issue" 2013-08-20 09:00:57 -07:00
Jim Bankoski
f167433d9c fix the mv_ref_idx issue
The following issue was reported :
https://code.google.com/p/webm/issues/detail?id=601&q=jimbankoski&sort=-id&colspec=ID%20Pri%20mstone%20ReleaseBlock%20Type%20Component%20Status%20Owner%20Summary

This code makes the choice and code cleaner and removes any question
about whether the border needs to be checked.

Change-Id: Ia7aecfb3168e340618805bd318499176c2989597
2013-08-20 08:14:52 -07:00
Paul Wilkins
e8923fe492 Changes to auto partition size selection.
Changes to code to auto select a partition size range
based on data from spatial neighbors.

Now looks at the sb_type in each 8x8 block of above
and left SB64.

The effect on speed 1 is now weaker giving better
quality but less speed gain. Now also used in speed 2.

Change-Id: Iace33a97d5c3498dd2a9a8a4067351941abcbabc
2013-08-20 14:05:39 +01:00
Dmitry Kovalev
2612b99cc7 Adding VP9_FILTER_BITS constant.
Removing VP9_FILTER_WEIGHT, VP9_FILTER_SHIFT, BLOCK_WIDTH_HEIGHT
constants. Using ROUND_POWER_OF_TWO for rounding.

Change-Id: I2e8d6858dcd600a87096138209731137d7decc24
2013-08-20 00:42:25 -07:00
Dmitry Kovalev
d8286dd56d Adding has_second_ref function.
Updating implementation of vp9_get_pred_context_single_ref_p1 using
has_second_ref function to make code easier to read.

Change-Id: Ie8f60403a7195117ceb2c6c43176ca9a9e70b909
2013-08-19 18:39:34 -07:00
Yaowu Xu
c4048dbdd3 Change to limit the mv search range
As the pixel values beyond image border are duplicates of pixels
on edge, the change limits the mv search range, any mv beyond
the limits no longer produce new/different prediction values
as entire block with pixels used for subpel interpolation are
outside image border.

Change-Id: I4c6fdf06e33c1cef1489f5470ce0fb4e5e01fb79
2013-08-19 17:19:36 -07:00
Yaowu Xu
f70330a906 fix a bug when null function pointer is used.
For certain partition size, the function poniter may not be intialized
at all. The patch prevent the call if the pointer is not set.

Change-Id: I78b8c3992b639e8799a16b3c74f0973d07b8b9ac
2013-08-19 17:16:12 -07:00
Dmitry Kovalev
569ca37d09 Moving plane_block_idx from vp9_blockd.h to vp9_quantize.c.
Change-Id: Ib8af21f2e7f603c2fb407e5d15a3bba64b545b49
2013-08-19 16:44:10 -07:00
Jingning Han
3275ad701a Enable early termination in uv rd loop
This commit enables early termination in the rate-distortion
optimization search loop for chroma components. When the cumulative
rd cost is above the current best value, skip the rest per-block
transform/quantization/coeff_cost and continue to the next
prediction mode.

For bus_cif at 2000 kbps, the average run-time goes down from
168546ms -> 164678ms, (2% speed-up) at speed 0
 36197ms ->  34465ms, (4% speed-up) at speed 1

Change-Id: I9d3043864126e62bd0166250d66b3170d520b3c0
2013-08-19 16:31:19 -07:00
Dmitry Kovalev
82d4d9a008 Passing plane_bsize to foreach_transformed_block_visitor.
Updating all foreach_transformed_block_visitor functions to work with
plane block size instead of general block. Removing a lot of duplicated
code.

Change-Id: I6a9069e27528c611f5a648e1da0c5a5fd17f1bb4
2013-08-19 15:47:24 -07:00
Jingning Han
31c97c2bdf Merge "Fix potential use of uninitialized value" 2013-08-19 15:15:58 -07:00
Jingning Han
5dc0b309ab Merge "Fix the returned distortion value in rd_pick_intra" 2013-08-19 14:34:19 -07:00
Dmitry Kovalev
2e3478a593 Using plane_bsize instead of bsize.
This change set is intermediate. The next one will remove all repetitive
plane_bsize calculations, because it will be passed as argument to
foreach_transformed_block_visitor.

Change-Id: Ifc12e0b330e017c6851a28746b3a5460b9bf7f0b
2013-08-19 13:20:21 -07:00
Adrian Grange
5a1a269f67 Further correct bug in loopfilter initialization
The intent was to initialize the deltas for the
segment to the computed value, irrespective of mode
and reference frame if (mode_ref_delta_enabled == 0).

(In response to bug posted by Manjit Hota to codec-devel
and webm-discuss lists)

Change-Id: I10435cb63d0f88359bb4c14f22181878a1988e72
2013-08-19 11:58:52 -07:00
Jingning Han
b34ce04378 Fix potential use of uninitialized value
Initialize the best mode and tx_size values in the rate-distortion
optimization search loop.

Change-Id: Ibfb5c0895691f172abcd4265c23aef4cb99fa8af
2013-08-19 11:15:53 -07:00
Jingning Han
f67919ae86 Fix the returned distortion value in rd_pick_intra
Return the distortion value in vp9_rd_pick_intra_mode_sb as sum of
dist_y and dist_uv. Remove the right shift operation on dist_uv,
and make it consistent with that of vp9_rd_pick_inter_mode_sb.

Change-Id: I9d564e242d9add38e32595d33b0e0dddb1d55e5b
2013-08-16 21:23:22 -07:00
Dmitry Kovalev
26e5b5e25d Removing unused or redundant arguments from *_args structures.
Redundant dst, pre[2] from build_inter_predictors_args, unused cm from
encode_b_args.

Change-Id: I2c476cd328c5c0cca4c78ba451ca6ba2a2c37e2d
2013-08-16 12:51:20 -07:00
Dmitry Kovalev
367cb10fcf Merge "Moving from ss_txfrm_size to tx_size." 2013-08-16 12:46:45 -07:00
Dmitry Kovalev
1462433370 Merge "Renaming d27 predictor to d207." 2013-08-16 12:07:24 -07:00
Johann
d514b778c4 Merge "Reduce the instructions of idct8x8. Also add the saving and restoring of D registers." 2013-08-16 11:30:21 -07:00
Johann
65aa89af1a Merge "Reduce instructions of idct4x4." 2013-08-16 11:28:35 -07:00
Frank Galligan
bdc785e976 Merge "vp9: neon: optimise vp9_wide_mbfilter_neon" 2013-08-16 11:16:48 -07:00
hkuang
df0715204c Reduce instructions of idct4x4.
Change-Id: Ia26a2526804e7e2f656b0051618a615fca8fc79d
2013-08-16 10:54:56 -07:00
hkuang
60ecd60c9a Reduce the instructions of idct8x8. Also add the
saving and restoring of D registers.

Change-Id: Id3630c90fcb160ef939fef55411342608af5f990
2013-08-16 10:32:12 -07:00
Johann
bba68342ce Merge "vp9: neon: use aligned stores in convolve functions" 2013-08-16 10:29:59 -07:00
Adrian Grange
79f4c1b9a4 Fixed typos and formatting
Change-Id: I3814984a624bc64147c57efa74fbdda8eda47262
2013-08-16 09:15:26 -07:00
Adrian Grange
3e340880a8 Merge "Added resizing & initialization of last frame segment map" 2013-08-16 09:07:36 -07:00
Mans Rullgard
4fa93bcef4 vp9: neon: use aligned stores in convolve functions
The destination is block-aligned so it is safe to use aligned
stores.

Change-Id: I38261e4fa40bc60e6472edffece59e372908da7e
2013-08-16 14:25:08 +01:00
Dmitry Kovalev
afd9bd3e3c Moving from ss_txfrm_size to tx_size.
Updating foreach_transformed_block_visitor and corresponding functions
to accept tx_size instead of ss_txfrm_size. List of functions per file:

vp9_decodframe.c
  decode_block
  decode_block_intra

vp9_detokenize.c
  decode_block

vp9_encodemb.c
  optimize_block
  vp9_xform_quant
  vp9_encode_block_intra

vp9_rdopt.c
  dist_block
  rate_block
  block_yrd_txfm

vp9_tokenize.c
  set_entropy_context_b
  tokenize_b
  is_skippable

Change-Id: I351bf563eb36cf34db71c3f06b9bbc9a61b55b73
2013-08-15 17:03:03 -07:00
Jingning Han
5e80a49307 Merge "Refactor rd loop for chroma components" 2013-08-15 16:02:12 -07:00
Adrian Grange
d5bec522da Added resizing & initialization of last frame segment map
When the frame size changes the last frame segment map must
be resized to match and initialized to 0.

Change-Id: Idc10de109f55dbe9af3a6caae355a2974712243d
2013-08-15 15:35:21 -07:00
Dmitry Kovalev
9451e8d37e Merge "Converting code from using ss_txfrm_size to tx_size." 2013-08-15 15:21:09 -07:00
Dmitry Kovalev
939b1e4a8c Merge "Moving segmentation struct from MACROBLOCKD to VP9_COMMON." 2013-08-15 15:14:32 -07:00
Johann
a9aa7d07d0 Merge "vp9: neon: add vp9_convolve_avg_neon" 2013-08-15 14:55:15 -07:00
Johann
63e140eaa7 Merge "vp9: neon: add vp9_convolve_copy_neon" 2013-08-15 14:55:08 -07:00
Jingning Han
68369ca897 Refactor rd loop for chroma components
This commit makes the rate-distortion optimization search of chroma
components consistent across all block sizes. It removes redundant
codes.

Change-Id: I7e76f54d045e8efdd41d84a164c71f55b484471b
2013-08-15 14:54:48 -07:00
Jingning Han
c2ff1882ff Merge "Remove unused RDCOST_8X8 macro" 2013-08-15 13:48:25 -07:00
Jingning Han
ca983f34f7 Merge "Unify luma and chroma rd-cost estimation" 2013-08-15 13:48:15 -07:00
Dmitry Kovalev
bb3b817c1e Converting code from using ss_txfrm_size to tx_size.
Updated function signatures:
  txfrm_block_to_raster_block
  txfrm_block_to_raster_xy
  extend_for_intra
  vp9_optimize_b

Change-Id: I7213f4c4b1b9ec802f90621d5ba61d5e4dac5e0a
2013-08-15 11:44:57 -07:00
Dmitry Kovalev
6f4fa44c42 Using { 0 } for initialization instead of memset.
Change-Id: I4fad357465022d14bfc7e13b348c6da267587314
2013-08-15 11:37:56 -07:00
Dmitry Kovalev
81d7bd50f5 Renaming d27 predictor to d207.
27 degrees intra predictor is actually 207 degrees, so renaming it.

Change-Id: Ife96a910437eb80ccdc0b7a5b7a62c77542ae5be
2013-08-15 11:09:49 -07:00
Mans Rullgard
67e53716e0 vp9: neon: optimise vp9_wide_mbfilter_neon
Break up long dependency chains to improve instruction scheduling.

Change-Id: I0e0cb66943df24af920767bb4167b25c38af9630
2013-08-15 19:07:22 +01:00
James Zern
89a1fcf884 Merge "vp9_dx_iface: check for NULL/0-size input" 2013-08-15 10:59:22 -07:00
James Zern
cefaaa86c7 Merge "vpxenc: open output file after setting pass #" 2013-08-15 10:58:53 -07:00
Dmitry Kovalev
b7616e387e Moving segmentation struct from MACROBLOCKD to VP9_COMMON.
VP9_COMMON is the right place to segmentatation struct because it has
global segmentation parameters, not something specific to macroblock
processing.

Change-Id: Ib9ada0c06c253996eb3b5f6cccf6a323fbbba708
2013-08-15 10:47:48 -07:00
Jingning Han
b0646f9e98 Remove unused RDCOST_8X8 macro
Change-Id: I17c7d7eaa60fe69c543403c340f7c1078bfd339f
2013-08-15 10:40:44 -07:00
Dmitry Kovalev
4d73416099 Merge "Quantization code cleanup." 2013-08-15 10:23:01 -07:00
Deb Mukherjee
24856b6abc Speed feature to skip split partition based on var
Adds a speed feature to disable split partition search based on a
given threshold on the source variance. A tighter threshold derived
from the threshold provided is used to also disable horizontal and
vertical partitions.

Results on derfraw300:
threshold = 16, psnr = -0.057%, speedup ~1% (football)
threshold = 32, psnr = -0.150%, speedup ~4-5% (football)
threshold = 64, psnr = -0.570%, speedup ~10-12% (football)

Results on stdhdraw250:
threshold = 32, psnr = -0.18%, speedup is somewhat more than derf
because of a larger number of smoother blocks at higher resolution.

Based on these results, a threshold of 32 is chosen for speed 1,
and a threshold of 64 is chosen for speeds 2 and above.

Change-Id: If08912fb6c67fd4242d12a0d094783a99f52f6c6
2013-08-15 10:01:45 -07:00
Jingning Han
ec01f52ffa Unify luma and chroma rd-cost estimation
This commit unifies the rate-distortion cost calculation process of
luma and chroma components. It allows early termination to be enabled
later in the rd search loop of chroma components, in consistent with
luma pixels.

Change-Id: I2e52a7c6496176bf2a5e3ef338d34ceb8aad9b3d
2013-08-15 09:41:33 -07:00
Paul Wilkins
1a3641d91b Merge "Renaming in MB_MODE_INFO" 2013-08-15 02:12:48 -07:00
James Zern
adfc54a464 Merge "Get rid of bashisms in armlink_adapter.sh" 2013-08-14 22:22:15 -07:00
Guillaume Martres
eb2fbea621 Get rid of bashisms in armlink_adapter.sh
Change-Id: If3cd84bb873fbad5622172c9b79ad8ae5485235a
2013-08-14 18:57:10 -07:00
James Zern
ab21378a2e Merge "Get rid of bashisms in the main build scripts" 2013-08-14 18:49:52 -07:00
James Zern
20395189cd vp9_dx_iface: check for NULL/0-size input
avoids a crash caused by issue #585

Change-Id: I301595ee0227699b0da6f0dad6d870dd546e94ef
2013-08-14 18:35:22 -07:00
James Zern
8cb09719a3 vpxenc: open output file after setting pass #
write_ivf_file_header would incorrectly skip writing the file header in
the 2nd pass, causing the initial frame header to be overwritten on
close potential causing an overly large frame header to be read and a
crash.

most likely broken since:
9e50ed7 vpxenc: initial implementation of multistream support

fixes issue #585

Change-Id: I7e863e295dd6344c33b3e9c07f9f0394ec496e7b
2013-08-14 18:34:44 -07:00
hkuang
39f42c8713 Merge "Add neon optimize vp9_short_idct16x16_add." 2013-08-14 14:16:20 -07:00
hkuang
cf6beea661 Add neon optimize vp9_short_idct16x16_add.
Change-Id: I27134b9a5cace2bdad53534562c91d829b48838d
2013-08-14 13:52:16 -07:00
Dmitry Kovalev
bb072000e8 foreach_transformed_block_in_plane cleanup, explicit tx_size var.
Making foreach_transformed_block_in_plane more clear (it's not finished
yet). Using explicit tx_size variable consistently instead of
(ss_txfrm_size / 2) or (ss_txfrm_size >> 1) expression.

Change-Id: I1b9bba2c0a9f817fca72c88324bbe6004766fb7d
2013-08-14 11:39:31 -07:00
Dmitry Kovalev
f2c073efaa Adding const to arguments of intra prediction functions.
Adding const to above and left pointers. Cleanup.

Change-Id: I51e195fa2e2923048043fe68b4e38a47ee82cda1
2013-08-14 10:35:56 -07:00
Mans Rullgard
0f1deccf86 vp9: neon: add vp9_convolve_avg_neon
Change-Id: I33cff9ac4f2234558f6f87729f9b2e88a33fbf58
2013-08-14 16:27:55 +01:00
Mans Rullgard
635ba269be vp9: neon: add vp9_convolve_copy_neon
Change-Id: I15adbbda15d1842e9f15f21878a5ffbb75c3c0c9
2013-08-14 16:27:55 +01:00
Paul Wilkins
26fead7ecf Renaming in MB_MODE_INFO
The macro block mode info context originally contained an
entry for each 16x16 macroblock. In VP9 each entry refers
to an 8x8 region not a macro block, so the naming is misleading.

This first stage clean up changes the names of 3 entries in the
structure to remove the mb_ prefix.

TODO clean up the nomenclature more widely in respect of
mbmi and bmi.

Change-Id: Ia7305c6d0cb805dfe8cdc98dad21338f502e49c6
2013-08-14 12:47:52 +01:00
Paul Wilkins
54979b4350 Merge "Honor min_partition_size properly for non-square splits" 2013-08-14 04:45:18 -07:00
Guillaume Martres
3526f1cd5e Get rid of bashisms in the main build scripts
The conversion was done with the help of the checkbashisms script
and https://wiki.ubuntu.com/DashAsBinSh .

Change-Id: Id64ecefb35c8d72302f343cd2ec442e7ef989d47
2013-08-13 18:48:35 -07:00
Guillaume Martres
fc50477082 Honor min_partition_size properly for non-square splits
Don't do vertical or horizontal splits if subsize < min_partition_size,
except for edge blocks where it makes sense.

Change-Id: I479aa66ba1838d227b5de8312d46be184a8d6401
2013-08-13 15:24:03 -07:00
Dmitry Kovalev
bcc8e9d9c6 Merge "Little cleanup inside decode_tile() function." 2013-08-13 14:43:10 -07:00
Guillaume Martres
ecb78b3e0c Merge "Trivial clean up." 2013-08-13 12:40:37 -07:00
Jingning Han
7e0f88b6be Use lookup table to find largest txfm size
Refactor choose_largest_txfm_size_ and make it find the largest
transform size via lookup table.

Change-Id: I685e0396d71111b599d5367ab1b9c934bd5490c8
2013-08-13 10:32:14 -07:00
Dmitry Kovalev
8105ce6dce Merge "Using is_inter_block() instead of repetitive code." 2013-08-13 10:00:01 -07:00
Jingning Han
dc70fbe42d Merge "Refactor model based tx search in super_block_yrd" 2013-08-13 08:48:49 -07:00
Paul Wilkins
5459f68d71 Trivial clean up.
Delete unused / commented out  variable references.

Change-Id: Iaf20c0c3744f89adb296d153b516b5ea41b4f3b4
2013-08-13 13:26:18 +01:00
Paul Wilkins
8e35263bed Merge "Honor min_partition_size properly" 2013-08-13 05:19:51 -07:00
Paul Wilkins
902f9c7cbd Merge "Broken loop filter case." 2013-08-13 01:56:29 -07:00
Jingning Han
39fe235032 Merge "SSE2 high precision 32x32 forward DCT" 2013-08-12 23:03:47 -07:00
Dmitry Kovalev
2c7ae8c29a Little cleanup inside decode_tile() function.
Change-Id: I3ed4beb59371fe21ca3e82253aa98e0cbd5e0630
2013-08-12 18:28:13 -07:00
Johann
4417c04531 Merge "vp9: neon: optimise convolve8_vert functions" 2013-08-12 17:54:47 -07:00
Johann
4cabbca4ce Merge "vp9: neon: optimise convolve8_horiz functions" 2013-08-12 17:54:42 -07:00
Dmitry Kovalev
32006aadd8 Using is_inter_block() instead of repetitive code.
Change-Id: If0b04c476c34fb8c102c9f750d7fe5669a86a532
2013-08-12 17:42:14 -07:00
Jingning Han
78136edcdc SSE2 high precision 32x32 forward DCT
Enable SSE2 implementation of high precision 32x32 forward DCT. The
intermediate stacks are of 32-bits. The run-time goes down from
32126 cycles to 13442 cycles.

Change-Id: Ib5ccafe3176c65bd6f2dbdef790bd47bbc880e56
2013-08-12 16:52:53 -07:00
Jingning Han
14cc7b319f Refactor model based tx search in super_block_yrd
Remove unnecessary conditional branches in model-based transform
size search.

Change-Id: Ic862dc33ed6710a186f6248239dd5f09b5c19981
2013-08-12 16:34:48 -07:00
Dmitry Kovalev
b89eef8f82 Merge "Simplifying vp9_mvref_common.c." 2013-08-12 16:24:22 -07:00
Dmitry Kovalev
b214cd0dab Merge "Removing foreach_predicted_block_uv function." 2013-08-12 15:54:01 -07:00
Dmitry Kovalev
98e3d73e16 Merge "Using MV* instead of int_mv* as argument of vp9_clamp_mv_min_max." 2013-08-12 15:53:25 -07:00
Dmitry Kovalev
1a5e6ffb02 Simplifying vp9_mvref_common.c.
Change-Id: I272df2e33fa05310466acf06c179728514dd7494
2013-08-12 15:52:08 -07:00
Dmitry Kovalev
9d5885b0ab Quantization code cleanup.
Change-Id: I77b42418b852093f79260cbd880533a0bd86678f
2013-08-12 15:23:47 -07:00
Dmitry Kovalev
c66320b3e4 Merge "Entropy context related cleanups." 2013-08-12 15:18:24 -07:00
Dmitry Kovalev
bd1bc1d303 Merge "Making scaling code more clear." 2013-08-12 15:17:26 -07:00
Dmitry Kovalev
9a31d05e24 Removing unused convolve_avg_c function + cleanup.
Change-Id: Id2b126c6456627c25e4041a82e304d0151d951ba
2013-08-12 14:28:00 -07:00
Dmitry Kovalev
1aedfc992a Using MV* instead of int_mv* as argument of vp9_clamp_mv_min_max.
Change-Id: I3c45916a9059f11b41e9d798e34ffee052969a44
2013-08-12 13:56:04 -07:00
Dmitry Kovalev
76d166e413 Removing foreach_predicted_block_uv function.
Adding function build_inter_predictors_for_planes to build inter
predictors for specified planes. This function allows to remove
condition "#if CONFIG_ALPHA" and use MAX_MB_PLANE for general case.
Renaming 'which_mv' local var to 'ref', and 'weight' argument to 'ref'.

Change-Id: I1a97160c9263006929d38953f266bc68e9c56c7d
2013-08-12 13:54:13 -07:00
Dmitry Kovalev
a72e269318 Making scaling code more clear.
Reusing existing functions, using constants instead of magic numbers.

Change-Id: Idc689ffba52c9a8b203fcf26bd67110ecb5635f9
2013-08-12 13:30:26 -07:00
Paul Wilkins
c3b5ef7600 Broken loop filter case.
Loop filter level moved to common but this case missed.

Change-Id: I7fcb557e46ef4ed8e2b5e9c3e82cb042b55bbd7f
2013-08-12 19:46:47 +01:00
Jingning Han
3984b41c87 Fix a compile failure in vp9_get_compressed_data
The lf struct is now with VP9_COMMON, instead of MACROBLOCKD.

Change-Id: Idfdd4f91f78f486078a138322d58bb61e93e1bc9
2013-08-12 11:42:17 -07:00
Dmitry Kovalev
8b0e6035a2 Entropy context related cleanups.
Adding set_skip_context() function used from both encoder and decoder.

Change-Id: Ia22cfad3211a00a63eb294f64f857b78f4aa9b85
2013-08-12 11:24:24 -07:00
Mans Rullgard
ad7021dd6c vp9: neon: optimise convolve8_vert functions
Invert loops to operate vertically in the inner loop.  This allows
removing redundant loads.

Also add preloading of data.

Change-Id: I4fa85c0ab1735bcb1dd6ea58937efac949172bdc
2013-08-12 15:37:48 +01:00
Dmitry Kovalev
097046ae28 Merge "Removing redundant code and function arguments." 2013-08-11 12:20:58 -07:00
Mans Rullgard
b84dc949c8 vp9: neon: optimise convolve8_horiz functions
Each iteration of the horizontal loop reuses 7 of the 11 source
values.  Loading only the 4 new values saves some time.

Also add preload for source data.

Overall 4% faster on Chromebook.

Change-Id: I8f69e749f2b7f79e9734620dcee51dbfcd716b44
2013-08-11 16:21:55 +01:00
Dmitry Kovalev
3c43ec206c Renaming BLOCK_SIZE_TYPES constant to BLOCK_SIZES.
There will be another change set to rename BLOCK_SIZE_TYPE enum to
BLOCK_SIZE.

Change-Id: I8d1dfc873d6186fa5e554262f5169e929978085e
2013-08-09 17:47:32 -07:00
Guillaume Martres
58b07a6f9d Honor min_partition_size properly
It represents the minimum partition size, so don't split if
bsize == min_partition_size .

Change-Id: Id77c32d6afef7d2ddec0368eaae18fb13227d30e
2013-08-09 17:28:33 -07:00
Dmitry Kovalev
67fe9d17cb Removing redundant code and function arguments.
Change-Id: Ia5cdda0f755befcd1e64397452c42cb7031ca574
2013-08-09 17:24:40 -07:00
Dmitry Kovalev
e7c5ca8983 Merge "Inlining 16 as a stride for BLOCK_OFFSET macro." 2013-08-09 17:22:46 -07:00
James Zern
ef101af8ae Merge "vp9_rd_pick_inter_mode_sb: fix uninitialized value" 2013-08-09 17:13:32 -07:00
Dmitry Kovalev
f1559bdeaf Inlining 16 as a stride for BLOCK_OFFSET macro.
Change-Id: I7f23d174eb089e5500f268a10db09648634c1b82
2013-08-09 16:40:05 -07:00
James Zern
f295774d43 vp9_rd_pick_inter_mode_sb: fix uninitialized value
'skippable' can remain unset and negatively affect later decisions

address one aspect of issue #599

Change-Id: Iffdf0ac2e49ac481c27dc27c87fa546d4167bb28
2013-08-09 16:26:22 -07:00
Dmitry Kovalev
125146034e Merge "Using MV struct instead of int[2] array." 2013-08-09 15:33:08 -07:00
Dmitry Kovalev
cd0629fe68 Merge "Removing plane_block_{width, height}_log2by4 functions." 2013-08-09 15:26:51 -07:00
Dmitry Kovalev
ff7df102d9 Merge "Moving loopfilter struct to VP9_COMMON." 2013-08-09 15:23:00 -07:00
Dmitry Kovalev
816d6c989c Moving loopfilter struct to VP9_COMMON.
Loop filter configuration doesn't belong to macroblock, so moving it from
MACROBLOCKD to VP9_COMMON. Also moving the declaration of loopfilter struct
from vp9_blockd.h to vp9_loopfilter.h.

Change-Id: I4b3e34be9623b47cda35f9b1f9951f8c5b1d5d28
2013-08-09 14:41:51 -07:00
Dmitry Kovalev
8ffe85ad00 Moving scale_factors and related code to separate files.
Change-Id: I531829e5aee2a4a7a112d528ecccbddf052d0e74
2013-08-09 14:07:09 -07:00
Scott LaVarnway
ace93a175d Merge "Bug fix: call set_offsets before rd_auto_partition_range" 2013-08-09 12:30:52 -07:00
Dmitry Kovalev
fa0cd61087 Merge "Using buf_2d struct instead of separate buffer and stride vars." 2013-08-09 11:50:58 -07:00
Scott LaVarnway
41251ae558 Bug fix: call set_offsets before rd_auto_partition_range
The set_offsets call is necessary inorder to set the
mode_info_context ptr correctly.

Change-Id: I644910cc5bacc50ee9cd78458843274ad8ee636d
2013-08-09 14:09:49 -04:00
Adrian Grange
0eef1acbef Merge "Correct bug in loopfilter initialization" 2013-08-09 09:51:58 -07:00
Adrian Grange
12eb2d0267 Correct bug in loopfilter initialization
The memset sets 16 bytes rather than the correct size of the
final array dimension (MAX_MODE_LF_DELTAS).

(In response to bug posted by Manjit Hota to codec-devel
and webm-discuss lists)

Change-Id: I8980f5aa71ddc9d7ef57c5b4700bc28ddf8651c7
2013-08-09 09:21:15 -07:00
Yaowu Xu
6ec2b85bad Added lpf level picking using partial frame
Change-Id: I599ab1bd22b5f3f10d5962c609952abdef8ff67a
2013-08-09 07:37:08 -07:00
Yaowu Xu
6a7a4ba753 renamed vp8_yv12_copy_y to vpx_yv12_copy_y
Becuase the routine is used by both vp8 and vp9

Change-Id: I2d35b287b5bc2394865d931a27da61f4ce7edeeb
2013-08-09 07:37:08 -07:00
Yaowu Xu
c7c9901845 added a speed feature on lpf level picking
Change-Id: Id578f8afdeab3702fc8386969f2d832d8f1b5420
2013-08-09 07:36:32 -07:00
Yaowu Xu
e3c92bd21e Merge "fix unit test failure on win32 vs2008 build" 2013-08-09 07:19:59 -07:00
Dmitry Kovalev
6fd2407035 Using buf_2d struct instead of separate buffer and stride vars.
Change-Id: Id5cc3566cc16d1e3030ddb4d1c58459320321dca
2013-08-08 21:25:48 -07:00
Dmitry Kovalev
6a8ec3eac2 General code cleanup.
Removing redundant parenthesis and curly braces. Combining declarations
with initializations. Adding useful intermediate variables instead of
recalculating expressions every time.

Change-Id: I00106f404afd60bfc189905b0fded881684f941a
2013-08-08 21:12:34 -07:00
Yaowu Xu
bc484ebf06 fix unit test failure on win32 vs2008 build
The mix use of double type and simd code caused invalid values stored
in double variables, further caused unit tests to fail. The failures
were only observed on x86-win32-vs9 build with vs2008.

Change-Id: If0131754a3bf217a5ace303b7963e8f5162c34b5
2013-08-08 18:51:51 -07:00
Dmitry Kovalev
ee40e1a637 Merge "Cleanup inside vp9_reconinter.c." 2013-08-08 14:59:38 -07:00
Deb Mukherjee
2158909fc3 Merge "Adds a new subpel motion function" 2013-08-08 12:26:55 -07:00
Dmitry Kovalev
9e3bcdd135 Merge "Removing unneeded intermediate entropy_nodes_adapt var." 2013-08-08 12:16:57 -07:00
Dmitry Kovalev
47fad4c2d7 Using MV struct instead of int[2] array.
Change-Id: Iab951c555037e36b154f319f351c5e67f9abb931
2013-08-08 12:01:56 -07:00
Dmitry Kovalev
ac008f0030 Removing unneeded intermediate entropy_nodes_adapt var.
Change-Id: I541a178d997b4541e0e2d4d5b854e2ed6b113c3a
2013-08-08 11:52:02 -07:00
Deb Mukherjee
1ba91a84ad Adds a new subpel motion function
Adds a new subpel motion estimation function that uses a 2-level
tree-structured decision tree to eliminate redundant computations.
It searches fewer points than iterative search (which can search
the same point multiple times) but has the same quality roughly.

This is made the default setting at speeds 0 and 1, while at
speed 2 and above only a 1-level search is used.

Also includes various cleanups for consistency and redundancy removal.

Results:
derf: +0.012% psnr
stdhd: +0.09% psnr
Speedup of about 2-3%

Change-Id: Iedde4866f5475586dea0f0ba4cb7428fba24eee9
2013-08-08 11:41:49 -07:00
Adrian Grange
83ee80c045 Moved fast motion search level decision to function
Moving this block of code into a function makes the
code easier to read and change.

Change-Id: If4ede570cce1eab1982b188c4d3e4fd3d4db236e
2013-08-08 11:01:44 -07:00
Adrian Grange
aae6a4c895 Simplify & fix potential bug in rd_pick_partition
Different partitionings were not being evaluated against
best_rd and there were unnecessary calls to RDCOST. This
could have resulted in a non-optimal partioning being
selected.

I simplified the variables used to track the rate,
distortion and RD values throughout the function.

Change-Id: Ifa7085ee80d824e86791432a5bc6d8fea5a3e313
2013-08-08 09:55:45 -07:00
Jingning Han
6bfcce8c7a Merge "Use low precision 32x32fdct for encodemb in speed1" 2013-08-07 19:05:14 -07:00
Dmitry Kovalev
61c33d0ad5 Removing plane_block_{width, height}_log2by4 functions.
Change-Id: I040b82b8e32aee272d10cbb021c7ba1c76343d7a
2013-08-07 17:06:33 -07:00
Dmitry Kovalev
a766d8918e Cleanup inside vp9_reconinter.c.
Using block width and block height instead of their logarithms. Using
SUBPEL_BITS and SUBPEL_SHIFTS constants instead of magic numbers.

Change-Id: I4e10e93c907c8a5e1cb27dfe74d1fcdcc4995448
2013-08-07 17:02:28 -07:00
Dmitry Kovalev
82d7c6fb3c Merge "Using only one scale function in scale_factors struct." 2013-08-07 16:32:09 -07:00
Dmitry Kovalev
1492698ed3 Merge "Adding ss_size_lookup table." 2013-08-07 16:08:24 -07:00
Jingning Han
debb9c68c8 Use low precision 32x32fdct for encodemb in speed1
The low precision 32x32 fdct has all the intermediate steps within
16-bit depth, hence allowing faster SSE2 implementation, at the
expense of larger round-trip error. It was used in the rate-distortion
optimization search loop only.

Using the low precision version, in replace of the high precision one,
affects the compression performance by about 0.7% (derf, stdhd) at
speed 0. For speed 1, it makes derf set down by only 0.017%.

Change-Id: I4e7d18fac5bea5317b91c8e7dabae143bc6b5c8b
2013-08-07 15:34:12 -07:00
Dmitry Kovalev
8db2675b97 Adding ss_size_lookup table.
Removing the old one bsize_from_dim_lookup. Now we have a way to determine
block size for plane using its subsampling values (ss_size_lookup). And
then we can find the number of pixels in the block (num_pels_log2_lookup).

Change-Id: I6fc981da2ae093de81741d3d78eaefed11015db9
2013-08-07 15:33:17 -07:00
Dmitry Kovalev
ea2348ca29 Merge "Removing NMS_STATS defines." 2013-08-07 15:28:30 -07:00
Christian Duvivier
78182538d6 Neon version of vp9_short_idct4x4_add.
Change-Id: Idec4cae0cb9b3a29835fd2750d354c1393d47aa4
2013-08-06 18:41:27 -07:00
Deb Mukherjee
296931c817 Merge "Clean ups of the subpel search functions" 2013-08-06 17:28:48 -07:00
Deb Mukherjee
71b43b0ff0 Clean ups of the subpel search functions
Removes some unused code and speed features, and organizes the
interfaces for fractional mv step functions for use in new speed
features to come.

In the process a new speed feature - number of iterations per
step during the subpel search - is exposed.

No change when this parameter is set as the original value of 3.

Results:
subpel_iters_per_step = 3: baseline
subpel_iters_per_step = 2: psnr -0.067%, 1% speedup
subpel_iters_per_step = 1: psnr -0.331%, 3-4% speedup

Change-Id: I2eba8a21f6461be8caf56af04a5337257a5693a8
2013-08-06 17:23:50 -07:00
Dmitry Kovalev
63ec0587c1 Merge "Motion vector code cleanup." 2013-08-06 16:00:01 -07:00
Dmitry Kovalev
1c552e79bd Using only one scale function in scale_factors struct.
Functions scale_mv_q4 and scale_mv_q3_to_q4 were almost identical except
q3->q4 conversion in scale_mv_q3_to_q4. Now q3->q4 conversion happens
directly in vp9_build_inter_predictor.

Also adding useful constants: SUBPEL_BITS and SUBPEL_MASK.

Change-Id: Ia0a6ad2ac07c45fdf95a5139ece6286c035e9639
2013-08-06 15:43:56 -07:00
Jingning Han
2c091f9768 Merge "Place holder for high-precision 32x32 fdct" 2013-08-06 14:47:30 -07:00
Jim Bankoski
5b307886fb variance x86inc guards
also fixed bug in sad calcs

Change-Id: I6571fcbe37556c16ae32be66dc0fd879852aac1d
2013-08-06 14:17:13 -07:00
Jim Bankoski
6eb1254b88 sse3 intrapred x86inc protected
Change-Id: I4a3c83119cdf8a205920034c8019d855d5504605
2013-08-06 14:17:13 -07:00
Deb Mukherjee
fac7c8c9f9 Merge "Flexible support for various pattern searches" 2013-08-06 14:03:27 -07:00
Jim Bankoski
c9126e0b30 sad + miscellaneous updates
Enable use_x86inc as a commandline option.  Fix Bug with sse2 when
x86inc is disabled. Adds Sad asm protection to x86inc protection

Change-Id: Iee0f9dd235ea10e8ace512eb362ba9bebe8c9df6
2013-08-06 12:16:04 -07:00
Dmitry Kovalev
8725ca2ed2 Merge "Inlining vp9_get_pred_probs_switchable_interp function." 2013-08-06 11:57:45 -07:00
Deb Mukherjee
15b5a6a2c7 Flexible support for various pattern searches
Adds a few pattern searches to achieve various tradeoffs
between motion estimation complexity and performance.
The search framework is unified across these searches so that a
common pattern search function is used for all. Besides it will
be easier to experiment with various patterns or combinations
thereof at different scales in the future.

The new pattern search is multi-scale and is capable of using
different patterns at different scales.

The new hex search uses 8 points at the smallest scale
and 6 points at other scales.
Two other pattern searches - big-diamond and square are
also added. Big diamond uses 4 points at the smallest scale and
8 points in diamond shape at the larger scales.
Square is very similar conceptually to the default n-step search
but is somewhat faster since it keeps only one survivor across
all scales.

Psnr/speed-up results on derf300:

hex: -1.6% psnr%, 6-8% speed-up
big-diamond: -0.96% psnr, 4-5% speedup
square: -0.93% psnr, 4-5% speedup

Change-Id: I02a7ef5193f762601e0994e2c99399a3535a43d2
2013-08-06 11:56:39 -07:00
Jingning Han
28566a6cd5 Place holder for high-precision 32x32 fdct
Resolve compile warnings on re-define FDCT32x32_2D template.

Change-Id: Idb3a54ef8d2710ce7245b726379a0e5c875f5cad
2013-08-06 11:44:08 -07:00
Dmitry Kovalev
0c80065694 Inlining vp9_get_pred_probs_switchable_interp function.
There was no benefit having this function. For example, inside
read_switchable_filter_type switchable filter context was calculated twice.

Change-Id: I79cd5bf95cbc0f6d8bf91a2e32289e01b18dcff1
2013-08-06 11:04:31 -07:00
Jingning Han
7d61f8fe53 Merge "Move fdct32x32 SSE2 implementation in separate file." 2013-08-06 10:46:41 -07:00
Jim Bankoski
efc94102f0 Merge "intrapred x86inc guards" 2013-08-06 10:39:19 -07:00
Dmitry Kovalev
a39abe2627 Motion vector code cleanup.
Converting arguments of two functions (clamp_mv_ref, lower_mv_precision)
from int_mv* to MV*. Rewriting is_inside function to make it much shorter.

Change-Id: Ie4c4cf3eccd46707c7df099ec21fb1b61c72fc7a
2013-08-06 10:31:11 -07:00
Dmitry Kovalev
3e51acafec Merge "Finally removing all old block size constants." 2013-08-06 10:30:37 -07:00
Dmitry Kovalev
4a692e4168 Merge "Changing the order switchable filter enum constants." 2013-08-06 10:30:26 -07:00
Dmitry Kovalev
25b7dc08cd Merge "Removing unused functions." 2013-08-06 10:29:57 -07:00
Deb Mukherjee
33afddadb9 Merge "Add variance based mode/skipping" 2013-08-06 10:19:15 -07:00
Christian Duvivier
3d98205fce Move fdct32x32 SSE2 implementation in separate file.
This is in preparation for the SSE2 version of the high-precision
32x32 forward DCT which will share a lot of code with the existing
low precision version used for rate-distortion search.

Change-Id: I7084b6bdfb480b1fabb8493fb14e3f7fcc7888c0
2013-08-06 10:17:11 -07:00
Jim Bankoski
25ec1375c9 intrapred x86inc guards
Change-Id: If0399d8e11f4ebe75a5c91abb8d6a52a7709065b
2013-08-06 09:39:30 -07:00
Jim Bankoski
62c6aa884d block error / x86inc mods
Change-Id: Icb607745634e10b9bac5019d06661ece09fcdb40
2013-08-06 06:23:38 -07:00
Jim Bankoski
a93b115cd6 reworked config for use_x86_inc
Support enabling it or disabling it.  Moved read out to configure.sh
so that its done once instead of in make and in config.

Change-Id: I73a9190cf31de9f03e8a577f478fa522f8c01c8b
2013-08-05 17:35:25 -07:00
James Zern
d115cd8b12 Merge changes I082959ab,Ib6932640
* changes:
  vp9/decoder: threaded row-based loop filter
  vp9/decoder: add thread worker
2013-08-05 16:07:09 -07:00
Dmitry Kovalev
b9c7d04e95 Finally removing all old block size constants.
Change-Id: I3aae21e88b876d53ecc955260479980ffe04ad8d
2013-08-05 15:23:49 -07:00
Jim Bankoski
f4837579d1 fixed script problem with config_force_x86_inc
Change-Id: I226e5094d216b09dc47fa5511a66e2d314608000
2013-08-05 14:48:20 -07:00
Jim Bankoski
a5a7322459 Merge "Begin to restrict x86inc.asm usage" 2013-08-05 14:17:49 -07:00
Deb Mukherjee
8b3faccb9e Add variance based mode/skipping
Adds a speed feature to skip all intra modes other than
DC_PRED if the source variance is small. This feature is
made part of speed 1 and up.

Results on derf300: psnr -0.07%, speedup about 1-2%

Also uses the source variance to fine-tune the early
termination criteria when FLAG_EARLY_TERMINATE is on.
This feature is made part of speed 2 and up.

Results on derf300: psnr -0.52%, speedup about 5-7%

Change-Id: I59e38aa836557cfa5405ae706fc64815cbfe4232
2013-08-05 14:14:01 -07:00
Jim Bankoski
9f988a2edf Merge "cleanups after bw bh code" 2013-08-05 14:02:02 -07:00
James Zern
a0ffa2794b vp9/decoder: threaded row-based loop filter
Currently the only threaded option for vp9 decode. Enabled when the
decoder config thread count is > 1.

Change-Id: I082959abac9e31aa4a38ed9fd68b94680e57f4df
2013-08-05 13:22:04 -07:00
James Zern
183b77d5ab vp9/decoder: add thread worker
vp9/decoder/vp9_thread.[hc]
Original source:
 http://git.chromium.org/webm/libwebp.git
 100644 blob b1615d0fb8d311666b2fa4561076c62d72c2e3ff  src/utils/thread.c
 100644 blob 13a61a4c84194c3374080cbf03d881d3cd6af40d  src/utils/thread.h

Local modifications:
 - s/WebP/VP9/g
 - camelcase functions -> lower with _'s

Change-Id: Ib6932640ee34f8b4782c6fbd15864a59d5d4c5fe
2013-08-05 13:21:13 -07:00
Dmitry Kovalev
3f611555d7 Changing the order switchable filter enum constants.
This changeset allows to remove vp9_switchable_interp and
vp9_switchable_interp_map arrays and make code much clear. Actually we
still have to use these mapping but only inside read_interp_filter_type and
write_interp_filter_type functions.

Change-Id: I4026c6f8c4acefba6c81421b7bacbaa52cc45f50
2013-08-05 12:26:15 -07:00
Jim Bankoski
5d2cb7ead0 cleanups after bw bh code
Cons bw/bh parms that should have been const. Additional formatting.

Change-Id: Icd36a5c9dc17dadd7284315ac0d6fef1a565ca16
2013-08-05 12:15:52 -07:00
Paweł Hajdan
9ab47772bd Update README
- new date
- add VP9 to the title
- update list of available targets

Change-Id: I56263336db393020bac5da8e42fbac3a276ffb1f
2013-08-05 12:13:32 -07:00
Jim Bankoski
c3809f3de5 Begin to restrict x86inc.asm usage
Chromium does not support 32bit builds for Mac which use x86inc.asm.
Make the files which include it work if 64bit or not PIC enabled
starting with vp9_copy_sse2.asm

Consolidate these targets in vp9_rtcd_defs.sh

Change-Id: If18f0b957a611efd085a3ee7d245cf1eb91e8248
2013-08-05 12:07:30 -07:00
Dmitry Kovalev
d007446b3f Replacing long block size enum values with shorter ones (2).
Change-Id: I428c4d42212b757112e3acfe5b81314cfbb5fd6b
2013-08-05 10:51:02 -07:00
Dmitry Kovalev
319867d71c Merge "Cleaning up vp9_build_inter_predictor function." 2013-08-05 01:52:11 -07:00
Dmitry Kovalev
78671e2eff Merge "Replacing "txfm" with "tx" in identifiers." 2013-08-04 02:52:22 -07:00
Jim Bankoski
f703f98757 reworked find_mv_ref
This is an attempt at rewriting vp9_find_mv_refs_idx.   I believe that it gains
about 1-2% decode speed

Change-Id: Ia5359c94ce9bb43b32652890e605e9a385485c1b
2013-08-03 20:25:55 -07:00
Dmitry Kovalev
fe2a201eb1 Replacing "txfm" with "tx" in identifiers.
Consistent names with TX_SIZE, TX_MODE, and TX_MODE.

Change-Id: I79592218bf5a40ace89197a34a06ee7de581ed8d
2013-08-02 17:28:23 -07:00
Dmitry Kovalev
5edc65d00d Removing NMS_STATS defines.
Change-Id: Iabab0e59042a33456df1d449c0d0f01debc00c7c
2013-08-02 17:10:15 -07:00
Dmitry Kovalev
7b50333e8f Merge "Adding is_inter_block function." 2013-08-02 16:54:32 -07:00
Dmitry Kovalev
5f0a52faaf Cleaning up vp9_build_inter_predictor function.
Change-Id: I94f6b4272b95ac101de6d10f048116ba065788b0
2013-08-02 16:53:18 -07:00
Dmitry Kovalev
fec4ec4edd Removing unused functions.
Removed functions:
  model_rd_for_sb_y,
  block_error_sby,
  get_sb_variance

Change-Id: Iec458df180caf6f8eac3605773841a4121dd3a8f
2013-08-02 16:41:09 -07:00
Dmitry Kovalev
603931e291 Merge "Changing function arg type from int_mv* to MV*." 2013-08-02 16:30:06 -07:00
Dmitry Kovalev
a6adc82e78 Merge "Cleanups around allow_high_precision_mv flag." 2013-08-02 16:27:05 -07:00
Dmitry Kovalev
680ec32d18 Adding is_inter_block function.
Using it instead of long unclear verbose check
"mbmi->ref_frame[0] != INTRA_FRAME".

Change-Id: I9c7b4b3797942fa962bf3ba7460fff3084beabe9
2013-08-02 16:25:33 -07:00
Dmitry Kovalev
d4e020c4b1 Merge "Cleaning up set_contexts_on_border function." 2013-08-02 16:22:50 -07:00
Yunqing Wang
d340c114fb Merge "Add more checking to using_small_partition_info" 2013-08-02 15:55:09 -07:00
Dmitry Kovalev
769bcab3f5 Cleaning up set_contexts_on_border function.
Change-Id: I8f21c18b29f54b277fb1c167f278f109d9f3b996
2013-08-02 15:52:26 -07:00
Dmitry Kovalev
25b77e2569 Changing function arg type from int_mv* to MV*.
Change-Id: Ic878d31df2ce783a2c9a8c4bc9ed301ec8ffe25e
2013-08-02 15:26:32 -07:00
Dmitry Kovalev
5d86f3886d Moving struct loop_filter_info from *.h to *.c file.
Change-Id: I3fe90eb40088a5b07bdc7d66d93ffe6ef99943d5
2013-08-02 11:53:49 -07:00
Adrian Grange
60ff123536 Merge "Fixed typos and added a few explanatory comments" 2013-08-02 11:37:47 -07:00
Adrian Grange
075b11f004 Merge "Changed name of rd_pick_intra4x4mby_modes" 2013-08-02 11:36:46 -07:00
Johann
8ff58093f0 Merge "vp9: neon: convolve: replace some insns with simpler equivalents" 2013-08-02 11:28:31 -07:00
Johann
8bebfbf7c5 Merge "vp9: neon: convolve: simplify branching to C fallbacks" 2013-08-02 11:28:25 -07:00
Johann
7d14ce8ba5 Merge "vp9: neon: optimise loads in horiz convolve functions" 2013-08-02 11:28:04 -07:00
Johann
319b7dc283 Merge "vp9: neon: add vp9_mb_lpf_* functions" 2013-08-02 11:27:52 -07:00
Dmitry Kovalev
86053d3ae2 Cleanups around allow_high_precision_mv flag.
Change-Id: Ic07f5f8ffeaedd5b7513b464871f83afc82dcd5c
2013-08-02 11:21:16 -07:00
Dmitry Kovalev
b47153deed Replacing long block size enum values with shorter ones.
Change-Id: I0e9329490828684a4fd46f540d89114cc68e8407
2013-08-02 10:48:27 -07:00
Yunqing Wang
0d68080445 Merge "Comment out 2 unused speed features" 2013-08-02 09:58:46 -07:00
Mans Rullgard
355cb14dc7 vp9: neon: convolve: replace some insns with simpler equivalents
Change-Id: I5d6906772e6e6adf68d7f0fd5b8b5207a64a3a37
2013-08-02 08:11:28 -07:00
Mans Rullgard
2003468df8 vp9: neon: convolve: simplify branching to C fallbacks
Change-Id: Ic7cacd02d6dc9243ad8fc85082c5618a9d1e66dc
2013-08-02 08:11:25 -07:00
Mans Rullgard
5e2e78d024 vp9: neon: optimise loads in horiz convolve functions
Loading to single lanes in multiple registers is expensive since
it requires a read and write of each register which saturates
the register file access.  Loading to single registers followed
by a separate transpose reduces this pressure.

Change-Id: I4cc35887ddbca80e5e635b50d2b1d158de9668ee
2013-08-02 08:11:08 -07:00
Mans Rullgard
d85ae87183 vp9: neon: add vp9_mb_lpf_* functions
Change-Id: I13e0880df234f15abc4cc7c57fe84488d5d46a75
2013-08-02 08:10:50 -07:00
Dmitry Kovalev
d91e9f4e36 Merge "Cleanup: replacing xd->seg with seg, and xd->lf with lf." 2013-08-01 23:17:17 -07:00
Dmitry Kovalev
4144fee9e9 Merge "Cleanup: reusing clamp_mv function." 2013-08-01 23:16:56 -07:00
Jingning Han
555bbd68c7 Merge "Remove unused vp9_short_idct10_32x32_add" 2013-08-01 15:41:35 -07:00
Dmitry Kovalev
741537f3ce Cleanup: replacing xd->seg with seg, and xd->lf with lf.
Change-Id: I73b59d7699a8e7e7acd3bf8041cb6c98ce9ba4bf
2013-08-01 15:38:16 -07:00
Dmitry Kovalev
9f4f001ba5 Merge "Cleanup: removing unused function arguments." 2013-08-01 15:07:12 -07:00
Dmitry Kovalev
422d38bca1 Cleanup: reusing clamp_mv function.
Change-Id: I8715f08a3554bdb557c5f935f1dfbd671f18e766
2013-08-01 15:06:34 -07:00
Dmitry Kovalev
ddf02e323a Merge "Nice looking motion vector clamping functions." 2013-08-01 14:50:14 -07:00
Deb Mukherjee
19d42de3ca Merge "Adds a source variance computation function" 2013-08-01 14:18:43 -07:00
Dmitry Kovalev
0497b8d7cd Merge "vp9_get_pred_context_intra_inter cleanup." 2013-08-01 14:15:53 -07:00
Dmitry Kovalev
ce8dedc353 Cleanup: removing unused function arguments.
Change-Id: I27471768980fc631916069f24bc7c482a5c9ca17
2013-08-01 13:41:38 -07:00
Dmitry Kovalev
b621e2d72e Nice looking motion vector clamping functions.
Removing assign_and_clamp_mv function, making implementation of clamp_mv
and clamp_mv2 more clear and consistent.

Change-Id: Iecd08e1c1bf0379f8314ebe01811f8253f4ade58
2013-08-01 13:40:26 -07:00
Deb Mukherjee
dbea726daf Adds a source variance computation function
Adds a function to compute source variance for various
sb_types to be used for pruning mode and partition searches.
[The existing activity measure function is currently specialized
for only 16x16 MBs and needs to be updated].

Change-Id: I22a41e6f1430184201487326fdbebb9b47e6fc24
2013-08-01 13:01:54 -07:00
Jingning Han
67719abde1 Remove unused vp9_short_idct10_32x32_add
The inverse 32x32 transform detects all zero entries and skips the
computations accordingly per 8 rows in the first 1-D operation. The
function vp9_short_idct10_32x32_add performs differently and is not
used anywhere, hence removed.

Change-Id: Ic4fad422debbde7b6b6ffed47c69fbd4268a906c
2013-08-01 12:45:16 -07:00
Jingning Han
56df76bf1b Merge "Optimize 32x32 2D inverse DCT for speed-up" 2013-08-01 11:53:39 -07:00
Yunqing Wang
215b010f4b Add more checking to using_small_partition_info
If the partition is out of partition size range, we don't
need to process small partition information.

Change-Id: Ice9bfbbdebe1f2ef79271a3aee17de0ed4608376
2013-08-01 11:37:41 -07:00
Yunqing Wang
7965a6ea34 Comment out 2 unused speed features
use_min_partition_size and use_max_partition_size are not used
currently, and could be added back if needed later.

Change-Id: Ib22a9c06b064567a7c1d6d5445567ed77e0d3acc
2013-08-01 11:03:34 -07:00
Dmitry Kovalev
ff4bfa726b Merge "Adding missing const to vp9_extra_bits array." 2013-08-01 10:19:51 -07:00
Adrian Grange
89e73c63c0 Fixed typos and added a few explanatory comments
Change-Id: Ib4e4b41094b54874ee34343dd77c0c131ceed9d2
2013-08-01 09:23:49 -07:00
Adrian Grange
5271d47892 Changed name of rd_pick_intra4x4mby_modes
The function name rd_pick_intra4x4mby_modes is confusing, so
I changed it to rd_pick_intra_sub_8x8_y_modes to better
reflect what the function does. Also added const qualifiers
to some of the input parameters and removed camel-case.

Change-Id: I23d53d4c7af5d79ed8a471acd59a09bbb47add39
2013-08-01 09:23:49 -07:00
Dmitry Kovalev
5b65246a71 Adding missing const to vp9_extra_bits array.
Change-Id: Icd128ab58719e0b9066bdfa66a5d0d427a84d6df
2013-07-31 18:51:18 -07:00
Dmitry Kovalev
fb3e78a73a vp9_get_pred_context_intra_inter cleanup.
Change-Id: I8beeee4c020425175f7d5ec83be86afa7b95da1a
2013-07-31 18:33:04 -07:00
Jingning Han
9d67495f72 Optimize 32x32 2D inverse DCT for speed-up
This commit exploits the sparsity of quantized coefficient matrix.
It detects each 32x8 array and skip the corresponding inverse
transformation if all entries are zero.

For ped1080p at 8000 kbps, this on average reduces the runtime of
32x32 inverse 2D-DCT SSE2 function from 6256 cycles -> 5200
cycles. It makes the overall encoding process about 2% faster at
speed 0. The speed-up is more pronounceable for the decoding process.

Change-Id: If20056c3566bd117642a76f8884c83e8bc8efbcf
2013-07-31 17:13:31 -07:00
Jingning Han
12f5762756 Remove unnecessary arguments in rd_pick_ref_frame
This commit removes redundant arguments passing in the function of
rd_pick_reference_frame. This resolves the clang warnings about
potential use of uninitialized values.

Change-Id: Ic68f949a9f8fcd0a583786b0c75321104ea44739
2013-07-31 17:04:13 -07:00
Dmitry Kovalev
8259cdf298 vp9_decodemv.c cleanup.
Inlining VP9_NMV_UPDATE_PROB constant, consistent local variable names.

Change-Id: I01692501982568fa535882d6b320e3c692f88abb
2013-07-31 15:03:36 -07:00
Dmitry Kovalev
9239e96536 Removing get_mi_{row, col} functions.
Passing mi_row and mi_col parameters to functions explicitly. Removing
unused xd argument from scale_mv function.

Change-Id: Icb4c495ec72d26fb066c14470d3ae0b741fbf18a
2013-07-31 14:06:55 -07:00
Dmitry Kovalev
3be9fd9120 Merge "Removing unused "ishp" arguments." 2013-07-31 12:03:04 -07:00
Dmitry Kovalev
0e0a6f840b Merge "Consistent update for inter_mode probabilities." 2013-07-31 12:02:35 -07:00
Dmitry Kovalev
500ade243a Removing unused "ishp" arguments.
Using different variable names "allow_hp" and "use_hp" instead of "usehp".

Change-Id: I0cd5996ddeb46bd754473b680a993c0aaf8eb879
2013-07-31 11:27:53 -07:00
Jingning Han
ac7bab7575 Merge "Make the use of ref_frame index consistent" 2013-07-31 09:11:37 -07:00
Jingning Han
86c384d398 Make the use of ref_frame index consistent
Refactor the frame buffer referencing in choose_partition and make
it consistent with other places. This means to prevent potential
issues when we extend reference frame buffer.

Change-Id: I5ff33ed5f671e1f4cc7049622212769a9b4578d9
2013-07-30 19:49:36 -07:00
Dmitry Kovalev
8701bc11df Consistent update for inter_mode probabilities.
Using inter-mode counts instead of inter-mode-tree branch counts inside
FRAME_COUNTS structure.

Change-Id: I60dde13af37d06146d7d15543311c1b5044e9e04
2013-07-30 18:06:34 -07:00
Adrian Grange
18cb8bdd20 Merge "Cleanup: remove two stray '+', fix typos." 2013-07-30 13:01:14 -07:00
Adrian Grange
fbd73648dd Merge "Cleanup typos, remove unnecessary lines, replace switch" 2013-07-30 12:59:46 -07:00
Adrian Grange
6f0882d064 Cleanup: remove two stray '+', fix typos.
Change-Id: I9c30e3dbedabe4942439a0ee2f691fb9a04cd03b
2013-07-30 12:15:25 -07:00
Adrian Grange
b30a06b930 Cleanup typos, remove unnecessary lines, replace switch
Removed unnecessary code lines, replaced switch with an if,
fixed spelling errors and formatting.

Change-Id: Ie48aa4604aa0ed48362ca359d792fb21b2ec1dc6
2013-07-30 12:10:32 -07:00
Yaowu Xu
88e48444da Merge "removed duplication" 2013-07-30 09:38:02 -07:00
Yaowu Xu
a15d1f3134 removed duplication
Change-Id: Ica23b66f6664e5a5b168499584f0afffbc54794f
2013-07-30 09:09:14 -07:00
Jingning Han
525745b17a Remove a redundant branching in tokenize_b
The tokenize_b function is only called when output flag is on. Hence
removing the conditional branch on it therein.

Change-Id: Ib709f47f23f39ca05a695faf86fa3377f11f2dd0
2013-07-29 17:08:13 -07:00
Jingning Han
455f2de20b Tune tokenization/detokenization flow for speed-up
This commit optimizes the tokenization and detokenization operational
flow for speed-up. It makes the coding process about 0.3% faster at
speed 0.

Change-Id: I28008df7482874e4b5f237f2d418ff82a249dd56
2013-07-29 16:15:30 -07:00
Jingning Han
b5323ed89a Skip redundant tokenization in rd loop
This commit makes the encoder skip the redundant tokenization process
in the rate-distortion optimization search loop, while updating the
entropy contexts accordingly. It makes the speed 0 encoding process
about 0.5% faster at no performance change.

Change-Id: I34a4155a0b5332afeb45c93a51c7f35a294d685c
2013-07-29 16:09:16 -07:00
Jingning Han
5875d7a4a4 Merge "16x16 inverse 2D-DCT with DC only" 2013-07-29 15:29:25 -07:00
John Koleszar
9c6fafb25b Merge "Remove unnecessary 64 byte alignment" 2013-07-29 15:09:15 -07:00
Jingning Han
a7c4de22e1 16x16 inverse 2D-DCT with DC only
This commit provides special handle on 16x16 inverse 2D-DCT, where
only DC coefficient is quantized to be non-zero value.

Change-Id: I7bf71be7fa13384fab453dc8742b5b50e77a277c
2013-07-29 14:45:53 -07:00
Dmitry Kovalev
828119d6ab Renaming txfm to tx for consistency in some places.
Change-Id: I2a6a646570e2af66315e7c658d00d99f80c4b127
2013-07-29 14:35:55 -07:00
John Koleszar
a31effca75 Remove unnecessary 64 byte alignment
Fixes a warning on MSVS 2012 where the alignment of vp9_default_iscan_8x8
didn't match between its declaration and definition.

Change-Id: I1466a15635f4b22594d705d570b7e399bfb6cf21
2013-07-29 14:02:02 -07:00
Dmitry Kovalev
730a34416f Renaming NB_TXFM_MODES constant to TX_MODES.
Change-Id: I10bf06e3a3d5271221ae6a42a36074d01d493039
2013-07-29 13:38:40 -07:00
Dmitry Kovalev
23391ea835 Renaming TX_SIZE_MAX_SB to TX_SIZES.
Change-Id: I6aa4191935aa93461a07c41b59fdae1eb5f5f107
2013-07-29 12:25:34 -07:00
Jingning Han
decb1b94de Merge "Shortcut 8x8/16x16 inverse 2D-DCT" 2013-07-29 11:04:07 -07:00
Dmitry Kovalev
cc0ff7ecfa Cleanup: replacing xd->mode_info_context with temp variable.
Change-Id: I5a3e83102784cabb918a5404405fcab99c5bb9b6
2013-07-26 19:05:37 -07:00
Ronald S. Bultje
118ccdcd30 Inverse dimension order in token_cost array.
This allows us to increment the position at the band-level only as
we go from one band to the next; more importantly, that allows us to
use an add instead of multiply instruction, and omit the instruction
altogether if the band doesn't change from one coef to the next, thus
being slightly faster (probably more noticeable on systems where a
multiply is expensive, like arm).

Change-Id: I4343fe35b9f9a47fa00b217bdcbf5f91ff96c381
2013-07-26 17:30:04 -07:00
Dmitry Kovalev
35e7e7b614 Merge "vp9_decodemv.c cleanup." 2013-07-26 17:24:34 -07:00
Ronald S. Bultje
6f3054b65d Merge "d45 intra prediction SSSE3 optimizations." 2013-07-26 17:21:09 -07:00
Ronald S. Bultje
dcacce6dd9 Merge "Save pixels instead of coefficients in intra4x4 RD loop." 2013-07-26 17:20:58 -07:00
Ronald S. Bultje
d30c8f41ef Merge "Add best_rd breakout in intra4x4 RD loop." 2013-07-26 17:20:51 -07:00
Jingning Han
38fa487164 Shortcut 8x8/16x16 inverse 2D-DCT
This commit brought back the shortcut implementation of 8x8/16x16
inverse 2D-DCT. When the eob <= 10, it skips the inverse transform
operations on row 4:7/4:15 in the first round. For bus_cif at 1000
kbps, this provides about 2% speed-up at speed 0.

Change-Id: I453e2d72956467d75be4ad8c04b4482ab889d572
2013-07-26 17:19:14 -07:00
Dmitry Kovalev
d42e60d2d8 vp9_decodemv.c cleanup.
Renaming:
  read_intra_mode_info  -> read_intra_frame_mode_info
  read_inter_mode_info  -> read_inter_frame_mode_info
  read_intra_block_part -> read_intra_block_mode_info
  read_inter_block_part -> read_inter_block_mode_info
  read_ref_frame        -> read_ref_frames
  read_reference_frame  -> read_is_inter_block

Using num_4x4_blocks_{wide, high}_lookup instead of bit shifts.

Change-Id: I83c81573b4ef6f53f2f8d24683895014bebfba61
2013-07-26 16:49:49 -07:00
Jingning Han
b9c3dd481a Merge "Special handle on DC only inverse 8x8 2D-DCT" 2013-07-26 16:04:14 -07:00
Dmitry Kovalev
620861dedc Merge "Making read_inter_mode_info function more clear." 2013-07-26 15:47:40 -07:00
hkuang
aaa9755746 Merge "Fix some format error and code error in neon code." 2013-07-26 15:24:28 -07:00
Jingning Han
325e0aa650 Special handle on DC only inverse 8x8 2D-DCT
This commit enables a special handle for the 8x8 inverse 2D-DCT,
where only DC coefficient is quantized to be non-zero. For bus_cif
at 2000 kbps, it provides about 1% speed-up at speed 0.

Change-Id: I2523222359eec26b144cf8fd4c63a4ad63b1b011
2013-07-26 14:16:51 -07:00
hkuang
588b4daf54 Fix some format error and code error in neon code.
Change-Id: I748dee8938dfb19f417f24eed005f3d216f83a82
2013-07-26 14:14:57 -07:00
Dmitry Kovalev
c09b81719f Merge "General cleanups." 2013-07-26 13:59:39 -07:00
Ronald S. Bultje
94b0c6791d d45 intra prediction SSSE3 optimizations.
Change-Id: Ie48035ff4f93c41f8a9b3023e6444fd10432d8fb
2013-07-26 13:30:02 -07:00
Yaowu Xu
4f75a1f4ed Merge "Auto min and max partition size experiment." 2013-07-26 12:10:27 -07:00
Paul Wilkins
fe5e2a91bb Auto min and max partition size experiment.
Speed feature experiment to set an upper and lower
partition size limit based on what has been seen
in spatial neighbors.

This seems to gives quite reasonable speed gains in local
(10-15%) and when used with speed 0 the losses are small
(0.25% derf, 0.35% stdhd). However, for now I am only
enabling it on speed 1 as there may be clashes with the existing
temporal partition selection in speed 2.

Using a tighter min / max around the range derived from the
neighbors increases speed further but at the cost of a
bigger quality loss. However,  I think this spatial method could
be combined with data from either the last frame or a variance
method (or both) to refine the range of minimum and maximum
partition size. I.e. consider the min and max from spatial and
temporal neighbors and the variance recommendation.

Change-Id: I1b96bf8b84368d6aad0c7aa600fe141b4f07435f
2013-07-26 18:30:49 +01:00
Yunqing Wang
52256cdbca Modify static threshold calculation
Used 3 * standard_deviation in internal threshold calculation
instead of fit curve. This actually approached the algorithm
better.
For comparison, similar tests were done:
The overall psnr loss is less than before.
1. derf set:
when static-thresh = 1, psnr loss is 0.329%;
when static-thresh = 500, psnr loss is 0.970%;
2. stdhd set:
when static-thresh = 1, psnr loss is 0.922%;
when static-thresh = 500, psnr loss is 1.307%;

Similar speedup is achieved. For example,
clip            bitrate  static-thresh psnr    time
akiyo(cif)       500        0          48.952  5.077s(50f)
akiyo            500        500        48.866  4.169s(50f)

parkjoy(1080p)   4000       0          30.388  78.20s(30f)
parkjoy          4000       500        30.367  70.85s(30f)

sunflower(1080p) 4000       0          44.402  74.55s(30f)
sunflower        4000       500        44.414  68.69s(30f)

Change-Id: Ic78833642ce1911dbbd1cb6c899a2d7e2dfcc1f3
2013-07-25 19:59:33 -07:00
Dmitry Kovalev
048e9c0991 Making read_inter_mode_info function more clear.
Now read_inter_mode_info calls read_intra_block_part (renamed from
read_intra_block_modes) or read_inter_block_part (just added).

Change-Id: I541badea6b663e0ae692ec158665efb90ed20c03
2013-07-25 15:30:18 -07:00
Johann
67b07c520d Merge "Add const to vp9_accum_mv_refs parameter" 2013-07-25 15:10:52 -07:00
Yunqing Wang
845fd5011c Merge "Add encoding option --static-thresh" 2013-07-25 14:58:00 -07:00
Yunqing Wang
d36852b702 Add encoding option --static-thresh
This option exists in VP8, and it was rewritten in VP9 to support
skipping on different partition levels. After prediction is done,
we can check if the residuals in the partition block will be all
quantized to 0. If this is true, the skip flag is set, and only
prediction data are needed in reconstruction. Based on DCT's energy
conservation property, the skipping check can be estimated in
spatial domain.

The prediction error is calculated and compared to a threshold.
The threshold is determined by the dequant values, and also
adjusted by partition sizes. To be precise, the DC and AC parts
for Y, U, and V planes are checked to decide skipping or not.

Test showed that
1. derf set:
when static-thresh = 1, psnr loss is 0.666%;
when static-thresh = 500, psnr loss is 1.162%;
2. stdhd set:
when static-thresh = 1, psnr loss is 1.249%;
when static-thresh = 500, psnr loss is 1.668%;

For different clips, encoding speedup range is between several
percentage and 20+% when static-thresh <= 500. For example,
clip            bitrate  static-thresh psnr    time
akiyo(cif)       500        0          48.923  5.635s(50f)
akiyo            500        500        48.863  4.402s(50f)

parkjoy(1080p)   4000       0          30.380  77.54s(30f)
parkjoy          4000       500        30.384  69.59s(30f)

sunflower(1080p) 4000       0          44.461  85.2s(30f)
sunflower        4000       500        44.418  78.1s(30f)

Higher static-thresh values give larger speedup with larger
quality loss.

Change-Id: I857031ceb466ff314ab580ac5ec5d18542203c53
2013-07-25 14:28:05 -07:00
Johann
6c8ef8d957 Add const to vp9_accum_mv_refs parameter
Change-Id: I0625d8ffddf590dfecd1bb8b8d6f57ef64b8bf18
2013-07-25 14:25:33 -07:00
Dmitry Kovalev
7131cb0e3d General cleanups.
Removing unused constants, macros, and function declarations. Using
ROUND_POWER_OF_TWO macro, vp9_zero, vp9_copy where possible. Moving
#include from *.h to *.c. Merging for loops for motion vectors.

Change-Id: Ic3bf841764a2bb177128bb3a6d7aa8f68229cd13
2013-07-25 14:13:48 -07:00
Dmitry Kovalev
d53fc9ee4e Merge "Adding lookup table for size group." 2013-07-25 13:57:28 -07:00
Dmitry Kovalev
08fd41ccd7 Adding lookup table for size group.
Change-Id: Ia6144d77ebed66e0739b62e4d673e26a95aa9550
2013-07-25 12:58:54 -07:00
Adrian Grange
e862c6f9eb Merge "Simplify handling of sub-partition motion vectors" 2013-07-25 12:58:38 -07:00
Adrian Grange
6f0f0e4907 Merge "Use local variables rather than structure members" 2013-07-25 12:57:52 -07:00
Dmitry Kovalev
be00d3970d Merge "Removing duplicated code for merging two probabilities." 2013-07-25 12:52:26 -07:00
Dmitry Kovalev
d604914f09 Merge "Removing vp9_adapt_mode_context function." 2013-07-25 12:46:31 -07:00
Jingning Han
d571af76d3 Merge "Make coeff_optimize initialized per-plane" 2013-07-25 12:46:14 -07:00
Dmitry Kovalev
f7ece83141 Merge "Inlining inc_mv_component_count function." 2013-07-25 12:45:23 -07:00
Dmitry Kovalev
9f8335d091 Merge "Removing duplicated PREDICTION_PROBS constant." 2013-07-25 12:45:03 -07:00
Yaowu Xu
51a8458822 Merge "fix a bug where flags are not reset" 2013-07-25 12:18:51 -07:00
Adrian Grange
be700e140a Simplify handling of sub-partition motion vectors
Simplified the code that extracts and uses the motion
vectors for the 4 sub-partitions in rd_pick_partition.

Change-Id: Iaf698ef7ee3aef9edd59015e1ae065dd359b17d9
2013-07-25 11:51:51 -07:00
James Zern
08202e0ae0 Merge "msvs: Generate proper configurations for mixed platforms" 2013-07-25 11:50:05 -07:00
Jingning Han
2f58faffa4 Make coeff_optimize initialized per-plane
This commit makes the initialization of trellis coeff optimization
a per-plane operation, thereby eliminating the redundant steps in
encode_sby and encode_sbuv. It makes the encoder at speed 0 slightly
faster.

Change-Id: Iffe9faca6a109dafc0dd69dc7273cbdec19b17cd
2013-07-25 11:44:29 -07:00
Dmitry Kovalev
778989a097 Removing duplicated PREDICTION_PROBS constant.
Already defined in vp9_seg_common.h.

Change-Id: I5a0e3fa15966b1ebeb77ccd506b55fc231c22342
2013-07-25 11:08:21 -07:00
Dmitry Kovalev
47d61f008f Removing vp9_adapt_mode_context function.
Moving code from vp9_adapt_mode_context to vp9_adapt_mode_probs.

Change-Id: I60829c30b28968cd813551ef3a206dfb98d323c9
2013-07-25 10:48:45 -07:00
Yaowu Xu
3e386aefc2 fix a bug where flags are not reset
The feature that uses small partition results as a measure to skip
mode evaluation at larger partition requires the flags to be reset.
The reset was missing in the code path that calls rd_use_partition().

Change-Id: Ia0a3a0aee1a862b6e2333d596808db7c48033d50
2013-07-25 10:28:38 -07:00
Jingning Han
242157c756 Merge "SSE2 inverse 4x4 2D-DCT with DC only" 2013-07-25 08:49:37 -07:00
Scott LaVarnway
a0e8b45fee Merge "pack_inter_mode_mvs cleanup" 2013-07-25 04:47:56 -07:00
Jingning Han
384e37e32b SSE2 inverse 4x4 2D-DCT with DC only
Add SSE2 implementation to handle the special case of inverse 2D-DCT
where only DC coefficient is non-zero.

Change-Id: I2c6a59e21e5e77b8cf39a4af5eecf4d5ade32e2f
2013-07-24 23:19:56 -07:00
Jingning Han
91fa12429c Merge "Merge vp9_dc_only_idct_add and vp9_short_idct4x4_1" 2013-07-24 23:18:24 -07:00
Dmitry Kovalev
40358dc406 Removing duplicated code for merging two probabilities.
Adding common merge_probs and merge_probs2 functions. Changing ints to
usigned ints in some places.

Change-Id: Icf088ffdea7cf5b95284a128916409bdd53506b0
2013-07-24 17:44:04 -07:00
Dmitry Kovalev
4450fa4cd9 Inlining vp9_init_mode_contexts function.
Change-Id: I21ee76bcae101cc9f6ef1d867622e50b7ae565fc
2013-07-24 17:03:03 -07:00
Jingning Han
d2de1ca37b Merge vp9_dc_only_idct_add and vp9_short_idct4x4_1
They share the same functionality, so merging together.

Change-Id: I98a0386fcee052cb854f9ff90c283c1b844bcb79
2013-07-24 16:51:15 -07:00
Dmitry Kovalev
fcc34796d2 Removing CONFIG_BALANCED_COEFTREE experiment.
Change-Id: I61a8b0101eac3ee2e0621d56151b90c269fd4db4
2013-07-24 15:53:42 -07:00
Dmitry Kovalev
1787b00214 Merge "Adding condition inside get_tx_type_{4x4, 8x8, 16x16}." 2013-07-24 15:23:22 -07:00
Dmitry Kovalev
0064958c71 Inlining inc_mv_component_count function.
Change-Id: Ic99d07a56b1752ec49fc5074b1dd6804b17609a0
2013-07-24 15:03:00 -07:00
Martin Storsjo
feefd81bd7 msvs: Generate proper configurations for mixed platforms
Prior to 73c4e284, the generated .sln files didn't contain any
information about the different configurations when using .vcxproj
project files. The MSVS IDE was able to fill this in just fine when
loaded though.

When building for ARM, the obj_int_extract project still is built
for x86, in order for the build process to be able to use
obj_int_extract.exe.

Now that configuration info is generated, it breaks current ARM
setups, since the configurations generated by gen_msvs_sln.sh only
included configurations from the last parsed project file (as
mentioned in the comment).

In these setups, the MSVS IDE generated a third meta-platform, called
"Mixed Platforms". This meta-platform points to either ARM or
Win32 as platform in each of the individual projects.

When the MSVS IDE generated this automatically, it also included
the original ARM and Win32 platforms as separate choices, but these
can be omitted since they don't make sense.

Change-Id: Ie25226496f91af4bb1ad8eb9ae9ca5bfed0433d7
2013-07-24 23:10:00 +03:00
Dmitry Kovalev
9139ee0908 Adding condition inside get_tx_type_{4x4, 8x8, 16x16}.
Adding plane type check condition because it was always used outside of
get_tx_type_{4x4, 8x8, 16x16}.

Change-Id: I02f0bbfee8063474865bd903eb25b54d26e07230
2013-07-24 12:55:45 -07:00
James Zern
9e29b4cd54 Merge "vp9_find_mv_refs_idx: remove unused split_count" 2013-07-24 12:49:15 -07:00
James Zern
e6c0387edd vp9_find_mv_refs_idx: remove unused split_count
variable was write only

Change-Id: I04b002178f66961836ee08fb60a05b91b54e91d8
2013-07-24 11:51:37 -07:00
Adrian Grange
4cfd36d8fd Use local variables rather than structure members
Although local copies of the mode member variables
(mode, ref_frame) were made, they were not used in
all places. Also, made a local copy of the
second_ref_frame member.

Change-Id: I84d8c822e5cb3d8a02fc3de8a4037ca3fea8bfad
2013-07-24 11:17:44 -07:00
Adrian Grange
a183f17d33 Merge "Correct spelling mistakes" 2013-07-24 09:48:57 -07:00
Ronald S. Bultje
7817d3221f Save pixels instead of coefficients in intra4x4 RD loop.
Prevents doing duplicate IDCTs; encoding of first 50 frames of bus
(speed 0) @ 1500kbps goes from 1min4.0 to 1min3.5, i.e. 0.87% faster
overall.

Change-Id: I2df39e29ed9d5ea5e7d2704a34940ba622832ddd
2013-07-24 09:03:20 -07:00
Ronald S. Bultje
b72ecbb1b9 Add best_rd breakout in intra4x4 RD loop.
Encoding time of first 50 frames of bus (speed 0) @ 1500kbps goes from
1min5.4 to 1min4.0, i.e. 2.2% faster overall.

Change-Id: I8c32f2aff9a649ce7dd49d910dc5ba16b99c3bc6
2013-07-24 09:02:05 -07:00
Adrian Grange
bc8b0529db Correct spelling mistakes
Change-Id: Id4138293efeac4503b2e01ce7a6c150a5abeef77
2013-07-24 07:58:26 -07:00
Ronald S. Bultje
47336afd8d Merge "More optimizations for cost_coeffs()." 2013-07-23 21:36:12 -07:00
Jingning Han
666c266623 Merge "Unify the use of encode_b_args/optimize_block_args" 2013-07-23 18:08:50 -07:00
Dmitry Kovalev
1099a436d3 Moving counts from FRAME_CONTEXT to new struct FRAME_COUNTS.
Counts are separate from frame context. We have several frame contexts but
need only one copy of all counts.

Change-Id: I5279b0321cb450bbea7049adaa9275306a7cef7d
2013-07-23 17:02:08 -07:00
Jingning Han
ab77828b36 Unify the use of encode_b_args/optimize_block_args
The struct optimize_block_args is defined same as encode_b_args.
Remove this redundant definition, and use encode_b_args consistently.

Change-Id: I1703aeeb3bacf92e98a34f4355202712110173d9
2013-07-23 16:04:02 -07:00
Dmitry Kovalev
8d13b0d1df Removing LOW_PRECISION_MV_UPDATE define.
Change-Id: I78d16ee758e1fae0200b746f00031f6d9c6d6ce7
2013-07-23 15:41:45 -07:00
Dmitry Kovalev
a9bbabd94b Merge "Removing vp9_is_interpolating_filter array." 2013-07-23 15:01:19 -07:00
Adrian Grange
719cd35f3a Merge "Rolled-up several for loops into one" 2013-07-23 15:00:06 -07:00
Adrian Grange
646edbc1b2 Rolled-up several for loops into one
Several consecutive for loops executed over the same
index range, so I rolled them into one.

Change-Id: I5cfcc8c38c738478965768409cca9d09adf224e1
2013-07-23 14:32:21 -07:00
Dmitry Kovalev
db7f5d28b9 Removing vp9_is_interpolating_filter array.
All filters are interpolating now, so we don't need this array, all
values from this array are evaluated to true.

Change-Id: I9af6d8219ae0eb984063cd15e4e2296374ae4961
2013-07-23 14:24:39 -07:00
Dmitry Kovalev
2855d8aea1 Merge "Adding update_tx_counts function." 2013-07-23 13:57:59 -07:00
Dmitry Kovalev
0d59d6efcd Merge "Removing MODE_COUNT_TESTING from vp9_entropymode.c." 2013-07-23 13:57:05 -07:00
Jingning Han
825f676ceb Merge "Make xform_quant operations tx_type independent" 2013-07-23 13:40:27 -07:00
Dmitry Kovalev
9c2c17dec7 Merge "Cleanup inside vp9_get_pred_context_tx_size." 2013-07-23 12:45:49 -07:00
Dmitry Kovalev
a97d4ab123 Removing MODE_COUNT_TESTING from vp9_entropymode.c.
Change-Id: I5367bc1d9e660d86879d285a6f146d8a47e62464
2013-07-23 12:37:41 -07:00
Jingning Han
e9e2fe8ec3 Make xform_quant operations tx_type independent
The xform_quant() module is only used by inter modes, hence removing
the redundant switches therein conditioned on tx_type.

Change-Id: Ib87ce5b2f2e4cbf3ceb133a1108afa173c933a3f
2013-07-23 12:37:25 -07:00
James Zern
8dede954c7 Merge "vp9: make some static tables const" 2013-07-23 11:37:01 -07:00
Jingning Han
4ef1d35abf Merge "Skip inverse transform when eob is zero" 2013-07-23 10:31:19 -07:00
James Zern
c3871f8f70 Merge "VP9_COMMON: remove unused temp_scale_frame" 2013-07-23 10:30:55 -07:00
Deb Mukherjee
9360fd3dcf Merge "Diamond search change to accelerate movement" 2013-07-23 10:14:10 -07:00
Jingning Han
0359ad7f9a Skip inverse transform when eob is zero
When all the transform coefficients were quantized to zero, skip
the inverse transform operation. For bus_cif at 1000 kbps, the
runtime goes from 154967ms -> 149842ms, i.e., about 3% speed-up,
at speed 0.

Change-Id: Ic0a813fff5e28972d4888ee42d8747846a6c3cc6
2013-07-23 10:06:41 -07:00
Paul Wilkins
cedd24ec61 Merge "Renaming of segment constants." 2013-07-23 08:16:12 -07:00
Scott LaVarnway
7bc294a3fe pack_inter_mode_mvs cleanup
xd->mode_info_context is set to m prior to this call.

Change-Id: Ibc442529961750c29ccf0c6cae08cb2b0431415f
2013-07-23 10:08:28 -04:00
Jim Bankoski
256ee00093 Merge "clean up bw, bh" 2013-07-23 06:58:28 -07:00
Jim Bankoski
86a9dec73c clean up bw, bh
many structures use bw and bh and they have different meanings.   This cl attempts
to start this clean up and remove unneccessary 2 step look up log and then
shift operations...

also removed partition type multiple operation code in bitstream.c.

Change-Id: I7e03e552bdfc0939738e430862e3073d30fdd5db
2013-07-23 06:51:44 -07:00
Scott LaVarnway
2fd20eb37d Merge "Eliminated prev_mip memsets/memcpys in encoder" 2013-07-23 06:43:52 -07:00
Paul Wilkins
7c134bc0cd Merge "Reworked the auto_mv_step_size speed feature" 2013-07-23 04:49:55 -07:00
Paul Wilkins
32042af14b Renaming of segment constants.
Renamed:
  MAX_MB_SEGMENTS to MAX_SEGMENTS
  MB_SEG_TREE_PROBS to SEG_TREE_PROBS

The minimum unit for segmentation in the segment map
is now 8x8 so it is misleading to use MB_ as macro-block
traditionally refers to a 16x16 region.

Change-Id: I0b55a6f0426bb46dd13435fcfa5bae0a30a7fa22
2013-07-23 12:09:04 +01:00
James Zern
3c8cce353f vp9: make some static tables const
Change-Id: I8bcae51271673da8755c66a51aea005dfe6a3739
2013-07-22 19:19:13 -07:00
Frank Galligan
e88db77892 Merge "Speedup loopfilter neon code." 2013-07-22 17:39:42 -07:00
Dmitry Kovalev
0ad079e583 Cleanup inside vp9_get_pred_context_tx_size.
Using max_txsize_lookup to get max transform size.

Change-Id: If4b39beba3c06a581effd8cab698ea90727dc2c9
2013-07-22 17:18:11 -07:00
James Zern
ab139094ed Merge "VP9_COMMON: drop cur_tile_{row,col}_idx" 2013-07-22 17:12:39 -07:00
Frank Galligan
5af6bf6c43 Speedup loopfilter neon code.
Try and cut down the cycle count by rearranging the instructions
so there are less stalls.

Change-Id: Ic1383335ee0f05e656477d9ee9c179ec231285d5
2013-07-22 17:00:01 -07:00
James Zern
37da8ea693 Merge "vp9: apply loopfilter inline if possible" 2013-07-22 16:32:20 -07:00
Ronald S. Bultje
e20fcd9585 More optimizations for cost_coeffs().
4x4:    163 ->  123 cycles (33% faster)
8x8:    491 ->  399 cycles (23% faster)
16x16: 1889 -> 1763 cycles (7% faster)
32x32: 8311 -> 8180 cycles (1.6% faster)

Overall encoding time of first 50 frames of bus (speed 0) @ 1500kbps
goes from 1min4.33 to 1min3.00, i.e. 2.11% faster.

Change-Id: Ib52d1dbb5649b14de769d3e7a74af67440b5284f
2013-07-22 16:09:09 -07:00
James Zern
38a4412e1b vp9: apply loopfilter inline if possible
excludes tiled content currently

Change-Id: I44155253e8d6771e5e039d663be5f21cc9d0355d
2013-07-22 15:52:10 -07:00
Yunqing Wang
f9e8167ba9 libyuv: fix SSSE3 code in scale.c
This patch was provided by Frank.

Change-Id: Icebcbd96016a51a85dbe5e8a351ab7624ace962b
2013-07-22 15:42:23 -07:00
Dmitry Kovalev
b2fc6fa969 Adding update_tx_counts function.
Moving common encoder/decoder code to update_tx_counts. Also renaming
vp9_get_pred_probs_tx_size to get_tx_probs2 and adding get_tx_probs to
call vp9_get_pred_context_tx_size inside read_selected_tx_size only once
(twice before).

Change-Id: Ia50247f3893de88ef8e9041b0d44be44a40aaa4d
2013-07-22 14:57:43 -07:00
James Zern
746154d905 Merge "filter_block_plane: remove MACROBLOCKD param" 2013-07-22 13:43:34 -07:00
James Zern
0a58f462b8 VP9_COMMON: remove unused temp_scale_frame
Change-Id: I696a0dca1d02d365e283029d1d077710bd5680e0
2013-07-22 13:42:11 -07:00
Dmitry Kovalev
0c5a383b2a Merge "Using update_ct and update_ct2 functions for probability update." 2013-07-22 13:34:30 -07:00
James Zern
ccf6710dc2 VP9_COMMON: drop cur_tile_{row,col}_idx
these were only being written in one location and never read.

Change-Id: If59f3c09aa1485cf89bac0099a8a79e99688b5d1
2013-07-22 13:23:33 -07:00
Yaowu Xu
6261d79206 Merge "fix a build error" 2013-07-22 13:02:15 -07:00
James Zern
32bca36f51 Merge "configure: default configure log to config.log" 2013-07-22 12:55:29 -07:00
James Zern
76db4d599a Merge "VP[89]_COMMON: remove golden/altref frame counts" 2013-07-22 12:55:07 -07:00
Yaowu Xu
fc186dcad6 fix a build error
Change-Id: I3b05687f439ff6a7c426d2c97a6c58c831fa51ac
2013-07-22 12:37:30 -07:00
Jingning Han
416f315e82 Merge "Skip buffer update in sub8x8 rd loop" 2013-07-22 12:08:22 -07:00
Jingning Han
a5a9f5f7f3 Merge "Optimize operation flow in sub8x8 rd loop" 2013-07-22 12:08:15 -07:00
Dmitry Kovalev
8c5ca9ff14 Using update_ct and update_ct2 functions for probability update.
Update logic for both mode and mvref was the same, so using MODE_COUNT_SAT,
MODE_MAX_UPDATE_FACTOR, update_ct, update_ct2 for both cases. Removing
function update_tx_ct because it was identical to update_mode_ct2.

Change-Id: Iff566be27dbd6cde4c2ec04e8d988f207046b8f0
2013-07-22 12:06:43 -07:00
James Zern
1197d6736c Merge "tests: silence a few type related warnings" 2013-07-22 11:50:22 -07:00
James Zern
4a688b26f7 Merge "cosmetics: idct_test.cc: fix formatting" 2013-07-22 11:49:23 -07:00
Deb Mukherjee
a1e2d50be9 Diamond search change to accelerate movement
Optional change in diamond search to continue in the best move
direction until that move turns worse.

This is still WIP since the exact way the new method is to be used is
under investigation. One option is to make it an option in diamond
search and use it only when motion is large.

Overall slightly positive on derfraw300 +0.02%, stdhdraw +0.13%,
but works a lot better for high motion sequences (ex. football : +1%).

Change-Id: If88e01a6021daa0cda934680cdc70be1ee04f798
2013-07-22 11:19:15 -07:00
Paul Wilkins
3798d7a641 Merge "Re-order mode search in rd." 2013-07-22 10:46:04 -07:00
Jingning Han
409e77f2d4 Optimize operation flow in sub8x8 rd loop
Stack the rate-distortion statistics in the sub8x8 rd loop. This allows
the encoder to skip the forward transform, quantization, and coeff cost
estimation, in the sub8x8 rd optimization search, if the motion
vector(s) are of integer pixel value, and have been tested in the
previous prediction filter type rd loops of the same block.

This gives about 2% speed-up for bus_cif at 2000 kpbs, for speed 0.
Its efficacy depends how frequently the motion search will select an
integer motion vector.

Change-Id: Iee15d4283ad4adea05522c1d40b198b127e6dd97
2013-07-22 10:40:33 -07:00
Paul Wilkins
1d189d6464 Re-order mode search in rd.
Mode search order in rd loop changed to better reflect
observed hit counts.

Also some adjustment of the baseline mode rd thresholds
to reflect the order change and observed frequencies.

Change-Id: I47a131cc83e11551df8add6d6d8d413d78d3a63c
2013-07-22 17:21:12 +01:00
Jim Bankoski
9ad604c6fb Merge "fix left over overflow" 2013-07-22 08:51:26 -07:00
Jim Bankoski
2ac8b50cd8 fix left over overflow
This cl fixes issues rbultje brought up. that I somehow neglected when I
submitted yaowu's patch.

Change-Id: I07ad18796317822510b96e951c88d29f194a3c2e
2013-07-22 06:39:39 -07:00
Paul Wilkins
888375d243 Fix build error.
When CONFIG_POSTPROC is set there was a now
invalid reference to cm->filter_level.

Changed to cpi->mb.e_mbd.lf.filter_level in line with
change Iaf5fb71c33719cdfa1b991f671caf071be9ea035

Change-Id: If746e60044903f7ba8d0d346225b3d015226c7d0
2013-07-22 14:01:43 +01:00
Dmitry Kovalev
ee1fe2f750 Merge "Removing pre probabilities from FRAME_CONTEXT." 2013-07-20 22:50:32 -07:00
Dmitry Kovalev
8962d975b2 Merge "Moving all loop filter related variables into new struct." 2013-07-20 22:45:24 -07:00
Dmitry Kovalev
39342db138 Merge "Consistent names for inter mode probabilities and encodings." 2013-07-20 22:40:51 -07:00
Dmitry Kovalev
f66821afbb Merge "Removing frame_type field from MACROBLOCKD struct." 2013-07-20 22:40:06 -07:00
Dmitry Kovalev
2b089f149a Merge "Removing unused static arrays from vp9_reatectrl.c." 2013-07-20 22:39:33 -07:00
Dmitry Kovalev
ad46753378 Merge "Moving vp9_reader into decode_tiles function." 2013-07-20 22:39:22 -07:00
Jingning Han
c725502bf3 Skip buffer update in sub8x8 rd loop
This commit allows the encoder to skip a few buffer update steps in
rd_pick_best_mbsegmentation, when early breakout has been triggered
in the rd_check_segment_txsize. It provides about 1% speed-up for
bus_cif at 2000 kbps, in the settings of speed 0.

Change-Id: Ica034f10a24dec572b397d8389a2b81020ebc0b9
2013-07-20 21:38:12 -07:00
Yaowu Xu
ea284d6281 added checks to prevent rate/distortion overflow
At speed 2, due to the threshold scheme used, it is possible the rate
and distortion assigned with INT_MAX value. The patch added checking
to prevent the INT_MAX value is used in further calculation of RD
scores. The patch also changed the assertion in rd_use_partition() to
be mirror similar assertion in rd_pick_partition().

Change-Id: Idb52c543cc1e10abdf6e6a5d6e9cb535a42214dc
2013-07-19 17:52:50 -07:00
Dmitry Kovalev
7e703de729 Removing pre probabilities from FRAME_CONTEXT.
Using cm->frame_contexts[cm->frame_context_idx] as source of previous
probabilities.

Change-Id: Ie03778acf0e7bebdc3a1f6a51854d4a0712f24a1
2013-07-19 17:33:10 -07:00
Dmitry Kovalev
ee1771ebaa Moving all loop filter related variables into new struct.
Adding loopfilter struct with fields from MACROBLOCKD and VP9Common.
Eventually it will be moved to vp9_loopfilter.h for better code structure.

Change-Id: Iaf5fb71c33719cdfa1b991f671caf071be9ea035
2013-07-19 16:19:10 -07:00
Dmitry Kovalev
f00a237a43 Merge "Fixing problem introduced in one of my previous commits." 2013-07-19 16:14:21 -07:00
Dmitry Kovalev
29f0f79317 Removing unused static arrays from vp9_reatectrl.c.
Removed arrays: kf_boost_seperation_adjustment,
                gf_adjust_table,
                gf_intra_usage_adjustment,
                gf_interval_table.

Change-Id: I62e400cb6e4d039787615169a3779e31ebf95893
2013-07-19 15:55:09 -07:00
Dmitry Kovalev
c3a56ee583 Merge "Moving Scale2Ration function from vp9_onyx.h to vp9_onyx_if.c." 2013-07-19 15:27:24 -07:00
Dmitry Kovalev
2fc927c66a Fixing problem introduced in one of my previous commits.
Changing fc->tx_probs back to fc->pre_tx_probs. This change actually
affects the bitstream but current test vectors work. Chrome branch is not
affected at all. Broken since:

cc662dd Adding struct tx_probs and struct tx_counts to cleanup the code.

Change-Id: I36dd4b3678e902e10aba8dd49b0012eb558c209d
2013-07-19 15:18:43 -07:00
Deb Mukherjee
302698fb12 Reworked the auto_mv_step_size speed feature
This patch modifies the auto_mv_step_size speed feature to
use a combination of the maximum magnitude mv from the last
inter frame, and the maximum magnitude mv for the two reference
mvs with the same reference. For arf frames, the max mav step
for the resolution is used.
The bounds therefore are slightly tighter. The feature is made
a speed 1 feature.

Rebased.

Results (when this feature is turned on over speed 0):
derfraw300: -0.046% psnr, about 5+% speedup
(tested on football: goes from 4m30.760s to 4m17.410s).

Change-Id: If492797a61b0b4b3e58c0b8f86afb880165fc9f6
2013-07-19 15:12:56 -07:00
James Zern
de012cec4f filter_block_plane: remove MACROBLOCKD param
replace with direct use of the plane and MODE_INFO

Change-Id: Icce57bc398a6e3607aedde0573d977e192040696
2013-07-19 14:19:55 -07:00
Morton Jonuschat
0d204f48b5 Merge "Make libvpx compile on OSX 10.9 (Mavericks)" 2013-07-19 12:37:26 -07:00
Dmitry Kovalev
d6e74e0d59 Moving vp9_reader into decode_tiles function.
Change-Id: Ic741054836d6c1b89c4f1c75cc6bd938a7d56723
2013-07-19 12:27:56 -07:00
Morton Jonuschat
fe4a52077f Make libvpx compile on OSX 10.9 (Mavericks)
Change-Id: Ibf2555f1c0d00e91d416eb39201a5a91df7fab27
2013-07-19 21:22:18 +02:00
Dmitry Kovalev
e71a4a77bb Merge "Renaming TXFM_MODE to TX_MODE (like TX_SIZE, TX_TYPE)." 2013-07-19 12:14:32 -07:00
Dmitry Kovalev
97e96bc4e9 Removing frame_type field from MACROBLOCKD struct.
Change-Id: Ia4e83913251c1cdc7aa2abd64bf01ecb1a962119
2013-07-19 11:55:36 -07:00
Dmitry Kovalev
c0eb57406c Renaming TXFM_MODE to TX_MODE (like TX_SIZE, TX_TYPE).
Moving TX_MODE enum to vp9_enums.h. Renaming txfm_mode variables to
tx_mode.

Change-Id: I459d1af6dd928ce7fccdf8ce30b6f1ca057bef92
2013-07-19 11:37:13 -07:00
Dmitry Kovalev
afe43d4089 Removing redundant VP9_COMMON* from function signatures.
Functions: vp9_get_pred_context_switchable_interp,
           vp9_get_pred_context_intra_inter,
           vp9_get_pred_context_single_ref_p1,
           vp9_get_pred_context_single_ref_p2.

Change-Id: I3d6fb8aee23c9062270768e1e6da416dd9bb8f96
2013-07-19 11:20:49 -07:00
Dmitry Kovalev
bc7acb134b Consistent names for inter mode probabilities and encodings.
Renaming vp9_sb_mv_ref_tree to vp9_inter_mode_tree, and
vp9_sb_mv_ref_encoding_array to vp9_inter_mode_encodings.

Change-Id: I0e91fbf81350d3ec5a2599064c74089b5d06133a
2013-07-19 10:40:04 -07:00
James Zern
561fb31d10 Merge "cosmetics: tile_independence_test: fix formatting" 2013-07-19 10:15:56 -07:00
Paul Wilkins
2b8ae5a0f7 Merge "Alignment of THR_MODES to vp9_mode_order[]" 2013-07-19 10:15:26 -07:00
Paul Wilkins
b2b5836a16 Merge "Block index variables in MACROBLOCKD reduced to chars." 2013-07-19 10:14:52 -07:00
hkuang
97dbee00dd Merge "Add neon optimize vp9_short_idct8x8_add." 2013-07-19 08:28:39 -07:00
Paul Wilkins
f3ed9f5523 Alignment of THR_MODES to vp9_mode_order[]
Change-Id: I4032dd0442043543954dcb3724df974b7cc7e515
2013-07-19 11:33:39 +01:00
Paul Wilkins
710d10c521 Block index variables in MACROBLOCKD reduced to chars.
Change-Id: I9a4df095732d561807de01a41dcb1a1960726a3c
2013-07-19 11:32:51 +01:00
Dmitry Kovalev
13253d6121 Merge "Removing kf_{y, uv}_mode_prob arrays from VP9Common." 2013-07-19 01:00:46 -07:00
Yaowu Xu
73ff56b68e Merge "Fix slightly quality drop caused at speed 1." 2013-07-18 18:27:49 -07:00
Dmitry Kovalev
b829a9d63c Merge "Removing unused int_mv32 union." 2013-07-18 17:56:11 -07:00
Ronald S. Bultje
e4686c589e Fix slightly quality drop caused at speed 1.
We would skip the rectangular blocks for sub8x8 partitions because
we would conclude that PARTITION_NONE was better than PARTITION_SPLIT,
however, that conclusion was made before we actually really tested
PARTITION_SPLIT.

Change-Id: I8fa91e59894badc1d8cee3ba8a49e40ae4c4a489
2013-07-18 17:52:08 -07:00
Yaowu Xu
37d901a47a Merge "Add best_rd breakout to keyframe partition selection also." 2013-07-18 17:50:39 -07:00
Yaowu Xu
67fb0679ee Merge "Merge scale_factors and scale_factors_uv." 2013-07-18 17:50:34 -07:00
Yaowu Xu
55b52e32da Merge "Do in-place UV intra mode selection." 2013-07-18 17:50:07 -07:00
Yaowu Xu
51972d1279 Merge "Change break statement in a 2d loop to a return statement." 2013-07-18 17:49:58 -07:00
Dmitry Kovalev
92f4198d52 Merge "Using VP9_REF_NO_SCALE instead of (1 << VP9_REF_SCALE_SHIFT)." 2013-07-18 17:29:05 -07:00
hkuang
d757de744c Add neon optimize vp9_short_idct8x8_add.
Change-Id: Ic32acf3e2939c6d12d9c2bf192a5f5da59705fda
2013-07-18 16:40:41 -07:00
James Zern
104dbbbfd9 tests: silence a few type related warnings
Change-Id: If908328c1dbbb5bd84c57e30fab1cda1804933e4
2013-07-18 16:13:39 -07:00
James Zern
bae311772f cosmetics: tile_independence_test: fix formatting
Change-Id: Ifd48f796fa70fe1dc9b87a6f2bdc715bc0ea5ad3
2013-07-18 16:00:01 -07:00
James Zern
36b882eeb6 cosmetics: idct_test.cc: fix formatting
clang-format -style=Google

Change-Id: Ic85f2cd2a1d65d9cf18a0f8bc515c0a0f5161747
2013-07-18 15:42:06 -07:00
Dmitry Kovalev
0b562b2d3d Using VP9_REF_NO_SCALE instead of (1 << VP9_REF_SCALE_SHIFT).
Change-Id: Ide58a74d31ff948319445a6337d2c05e98720e34
2013-07-18 15:12:46 -07:00
Ronald S. Bultje
cf8988bda8 Merge "Remove motion vectors from PARTITION_INFO." 2013-07-18 15:10:41 -07:00
Dmitry Kovalev
39cea29bb7 Merge "Removing unused mv_bias and check_mv_bounds functions." 2013-07-18 14:41:52 -07:00
Dmitry Kovalev
9f193ac75f Merge "Removing unused members of VP9Decompressor: mbc, prob_skip_false." 2013-07-18 14:41:43 -07:00
James Zern
e636af13e7 configure: default configure log to config.log
this is consistent with autoconf

Change-Id: I1860831693789259ee35d644775653d6a460cc77
2013-07-18 14:17:00 -07:00
Ronald S. Bultje
96e4db2660 Add best_rd breakout to keyframe partition selection also.
Change-Id: I96b8058f6dfecf8aa3e152cdcbfd7e10071fbbc9
2013-07-18 14:10:56 -07:00
Ronald S. Bultje
5ebe503f04 Merge scale_factors and scale_factors_uv.
This prevents a duplicate memcpy of a 128-byte struct every time
set_scale_factors() is called (which is a lot), thus leading to a
decrease from 3.7 MB to 1.85 MB of struct copying per 64x64 block
RD/partition loop.

Overall, this decreases encoding time of the first 50 frames of bus
@ 1500kbps (speed 0) from 1min5.9 to 1min4.9, i.e. about a 1.5%
overall speedup. We can likely get more gains by removing the copy
of the other struct (and replacing it with an indexing) as well.

Change-Id: I3dceb7e79f71e6fe911b11cc994cf89a869dde7a
2013-07-18 14:10:56 -07:00
Ronald S. Bultje
df4b4fab26 Do in-place UV intra mode selection.
This means we only do UV intra mode selection if we find any intra
mode to actually be useful at all; in addition, we only do UV intra
mode selection for the transform sizes that were selected, rather
than all sizes available in this partition.

First 50 frames of bus @ 1500kbps (speed 0) gains about 5% with this
change.

Change-Id: I7b461eb8b803247f57896c5a9505f745b55502b3
2013-07-18 14:10:56 -07:00
Ronald S. Bultje
e54a5782b9 Change break statement in a 2d loop to a return statement.
The break statement only breaks out of the nested loop, not the
top-level loop, so it doesn't always work as intended. Changing it
to a return statement does what's intended.

Change-Id: I585419823b39a04ec8826b1c8a216099b1728ba7
2013-07-18 14:10:56 -07:00
Ronald S. Bultje
2d4929e340 Remove motion vectors from PARTITION_INFO.
The same information already exists in union b_mode_info.

Change-Id: Iac5086b99a3c3cc270380138062bb693e58f9e6d
2013-07-18 14:10:52 -07:00
James Zern
5f30a0c687 VP[89]_COMMON: remove golden/altref frame counts
these are only used in the encoder.
frames_since_golden / frames_till_alt_ref_frame -> VP[89]_COMP

Change-Id: Ie14a6f46987bced685ddb449b85dc261caba6dfe
2013-07-18 14:09:21 -07:00
Dmitry Kovalev
9f3c0e34a9 Moving Scale2Ration function from vp9_onyx.h to vp9_onyx_if.c.
Change-Id: Idfe2a850f72b38f519aea1aac1266d8c3aa813ee
2013-07-18 14:05:06 -07:00
Ronald S. Bultje
9da67da04a Merge "Fix bug where we don't choose any mode in RD selection." 2013-07-18 12:47:50 -07:00
Ronald S. Bultje
247197d57b Fix bug where we don't choose any mode in RD selection.
This could happen during golden overlay frame coding from a previous
alt-ref frame if the special overlay code was triggered.

Change-Id: I3056d0c547cd26903b260ef93c94026e96bd9868
2013-07-18 12:13:15 -07:00
Dmitry Kovalev
3de20f3ea9 Removing unused members of VP9Decompressor: mbc, prob_skip_false.
Change-Id: Id5480a4fd56c184ad046c2192b30d190debb3de0
2013-07-18 11:29:34 -07:00
Dmitry Kovalev
b3c0a5fddb Removing unused mv_bias and check_mv_bounds functions.
Change-Id: I1558fd969d9ad112bf6480bdd16ef87edd396ab5
2013-07-18 11:20:48 -07:00
Johann
aa86196c93 Merge "libvpx: enable building for iOS devices (armv7)" 2013-07-18 10:33:51 -07:00
Ami Fischman
294ecc7e0a libvpx: enable building for iOS devices (armv7)
Allow output of gas syntax assembly directly from obj_int_extract

Change-Id: I33a747e87ef1c590a8766dea17f8cb2497e54591
2013-07-18 10:21:56 -07:00
Frank Galligan
5769713608 Merge "Fix horz loopfilter loops" 2013-07-18 10:21:21 -07:00
Ronald S. Bultje
4f5815290c Merge "Fix bug which skips zeromv even if near/nearest is not 0,0." 2013-07-18 10:06:51 -07:00
Frank Galligan
7fd5d8e6a4 Fix horz loopfilter loops
If count was greater than 1 the src pointer would be off on
the second loop.

Change-Id: I8e09037e68dc4ae92076a8067f7b6dacbbef8263
2013-07-18 09:44:15 -07:00
Ronald S. Bultje
deb7456058 Fix bug which skips zeromv even if near/nearest is not 0,0.
Change-Id: Id4f454831f3f11099f39c30246adeaa52857d08d
2013-07-18 09:35:19 -07:00
Jingning Han
ced3c20165 Use mv_check_bounds in sub8x8 rd loop
Make the use of mv_check_bounds consistent for mvs of both ref_frame[0]
and ref_frame[1].

Change-Id: I1ca24865cc7232ca9cbe5db566c53abad1592211
2013-07-17 17:13:51 -07:00
Dmitry Kovalev
7363e668d5 Removing unused int_mv32 union.
Change-Id: Ie692ed6e5fa1d2122e3a03573914d0fcce842f9e
2013-07-17 17:02:44 -07:00
Dmitry Kovalev
f9f453ec8d Removing kf_{y, uv}_mode_prob arrays from VP9Common.
These arrays have constant values (no any updates). Removing two
corresponding memcpy calls. Making a little cleanup in vp9_entropymode.h
as well: removing redundant 'extern' keyword and moving all function
declarations at the end.

Change-Id: Ia16b38b46aec2e2500f5df29c40a297ae241dede
2013-07-17 16:50:52 -07:00
Ronald S. Bultje
facecd80da Merge "Add a best_yrd shortcut in splitmv mode search." 2013-07-17 16:11:13 -07:00
Ronald S. Bultje
056111c822 Merge "Skip redundant nearest/near/zero encodes in splitmv." 2013-07-17 16:10:51 -07:00
Ronald S. Bultje
0b1eba25b2 Merge "Skip nearest/near/zero redundant encodes." 2013-07-17 16:10:41 -07:00
Ronald S. Bultje
607424449c Merge "Best_rd breakout in rd partition search." 2013-07-17 16:10:22 -07:00
Yunqing Wang
b409c1d91e Merge "Remove unnecessary calling of vp9_init_quantizer()" 2013-07-17 15:47:17 -07:00
Yunqing Wang
3798db88e1 Remove unnecessary calling of vp9_init_quantizer()
vp9_init_quantizer() is called in vp9_create_compressor(), and
should not be called in vp9_set_speed_features().

Change-Id: Ic2f1f4b0531b9d46bb841d7e1d8da9812207dad6
2013-07-17 14:59:00 -07:00
hkuang
7b9a652813 Merge "Remove unnecessary buffer copy in idct4x4." 2013-07-17 14:51:53 -07:00
Yaowu Xu
6ac5b7db2c Merge "changed mode checking order" 2013-07-17 14:44:40 -07:00
Dmitry Kovalev
a7a1e96136 Merge changes Ieffea49e,Idf610746
* changes:
  Removing two unused arguments from vp9_inc_mv signature.
  Changing signature of vp9_get_pred_probs_tx_size.
2013-07-17 14:44:20 -07:00
Dmitry Kovalev
b775081283 Merge "Removing experimental code from vp9_entropymv.c." 2013-07-17 14:43:45 -07:00
Dmitry Kovalev
102632aab1 Merge "Adding read_comp_pred function." 2013-07-17 14:26:34 -07:00
Ronald S. Bultje
c6917528a5 Add a best_yrd shortcut in splitmv mode search.
Encoding of first 50 frames of bus (speed 0) @ 1500kbps goes from
1min6.2 to 1min5.9, i.e. 0.5% faster overall.

Change-Id: I59d8a3b2f0a75010fa041d5e2646c8caac5bd683
2013-07-17 14:21:57 -07:00
hkuang
bd6ce7128c Remove unnecessary buffer copy in idct4x4.
Change-Id: I386066b9bcfb4bffb582e6827af36ca0181f6a83
2013-07-17 14:20:56 -07:00
Ronald S. Bultje
161c995658 Skip redundant nearest/near/zero encodes in splitmv.
Encode of first 50 frames of bus @ 1500kbps (speed 0) goes from
1min7.3 to 1min6.2, i.e. 1.7% faster overall.

Change-Id: I19d2deacfbffadd61d32551cee9586757ab4a987
2013-07-17 13:53:48 -07:00
Yaowu Xu
42facc292d changed mode checking order
Change-Id: Ic4c4b363ed840935e42f495f13ea5e601a56f1b2
2013-07-17 13:43:50 -07:00
Ronald S. Bultje
8fea880b6f Skip nearest/near/zero redundant encodes.
Encode of first 50 frames of bus @ 1500kbps (speed 0) goes from 1min12.8
to 1min7.3, i.e. 8% faster.

Change-Id: Ia22d1c7b687316c553cc60eacae988b24e175b62
2013-07-17 11:33:15 -07:00
Yunqing Wang
10e83b0717 Enable disable_splitmv feature for other speeds
Added disable_splitmv feature at other speed levels. For speed 3 or
above, always turn it on.

Change-Id: Ibb36f0a7ef12a34b4f8d0f9cb6193eab43b34360
2013-07-17 10:25:49 -07:00
Dmitry Kovalev
8452c34551 Removing experimental code from vp9_entropymv.c.
Change-Id: I340d06e3bc32c78358654496503cccd4196cbe2e
2013-07-17 10:25:09 -07:00
Johann
9ca66ec050 Merge "vp9_convolve8_neon placeholder" 2013-07-17 10:09:00 -07:00
Ronald S. Bultje
9f427bfe98 Best_rd breakout in rd partition search.
About 15% faster for bus (speed 0) first 50 frames @ 1500kbps, which
goes from 1min36 to 1min24. Results become slightly better (+0.2% on
derf/yt, +0.4% on hd), probably because of a bugfix for skipmode in
super_block_yrd(). Overall speed change (on derfraw300) is roughly
-13%. This can probably be improved further by caching best_yrd
between partition searches. Also, we might be able to get more
speedups by always doing PARTITION_NONE before PARTITIONS_SPLIT, not
just at the sb8x8 level.

Change-Id: I83736949ebd5b4a3b400ee688d7661913fefc98b
2013-07-17 09:56:46 -07:00
Ronald S. Bultje
83c7e13a6b Do a skip-block check for sub8x8 partitions also.
+0.2% SSIM and glbPSNR on derfraw300.

Change-Id: I9cba0bca55e606a22f557c7732b064f738efe84d
2013-07-17 09:46:47 -07:00
Yunqing Wang
df90d58f4f Speed up motion estimation using small partitions' result(experiment)
Current partition checking starts from small sizes, and then goes up
to large sizes. This experiment uses the small partitions' motion
estimation result, which is already available, to speed up the
large partition's motion estimation. We can decide to skip some
patition checkings if they are unlikely choices. We could use the
motion vector(MV) result as current partition's prediction MV, limit
the search range and reference frame.

Current result at speed 1:
psnr loss: 1.19% for stdhd, 0.287% for derf.
speed gain: 14% for sunflower(hd), 11% for akiyo.

Further improvement will be done later.

Change-Id: I5abfd070e9cace2e91e2a0247d1325df313887ab
2013-07-17 09:11:47 -07:00
Johann
59dc4e9cdd vp9_convolve8_neon placeholder
Call the individually optimized horizontal and vertical functions. This
implementation abuses the temp buffer.

This will be replaced with a custom optimized function.

Over 2x speedup.

Change-Id: I5b908d2a73d264e9810d6022bbff73207a3055dd
2013-07-17 08:39:27 -07:00
Yaowu Xu
a33086f925 Merge "added missed replacement" 2013-07-17 07:46:04 -07:00
Paul Wilkins
d66eab15dd Merge "Move uv intra mode selection in rd loop." 2013-07-17 05:19:26 -07:00
Paul Wilkins
154c34a3ee Merge "Limit transform sizes searched for uv intra." 2013-07-17 03:40:11 -07:00
Paul Wilkins
2ee338ce3b Move uv intra mode selection in rd loop.
Use an estimate based on DC_PRED for intra uv cost
within the rd loop then only do a full uv mode analysis
if an intra mode is chosen.

Significant speed gains in some cases. Currently only
enabled for speed 2 pending speed/quality tests.

Change-Id: Ie851a12400d5483bce47ec0e3ccb8516041e91c0
2013-07-17 11:11:21 +01:00
Paul Wilkins
6c667f0ffe Limit transform sizes searched for uv intra.
Apply limit if search_method == USE_LARGESTALL
to the range of UV tx sizes searched.

Change-Id: I6db29f0dd237285ffc50d75a37e8b68151ad821c
2013-07-17 11:08:55 +01:00
Paul Wilkins
5f4722c75f Merge "Minor cleanup in code to fine uv tx_size." 2013-07-17 02:50:09 -07:00
Dmitry Kovalev
6638b6f63f Merge "Removing MV_GROUP_UPDATE define and corresponding code." 2013-07-16 21:09:00 -07:00
Jingning Han
0b58fa80a0 Merge "Skip redundant motion search in 4x4 level rd loop" 2013-07-16 20:54:25 -07:00
Dmitry Kovalev
851a911158 Adding read_comp_pred function.
Removing old debug code from vp9_decodemv.c.

Change-Id: I51a6d5fe6a2f6583a1555e692bb1ee5a5b315d6c
2013-07-16 20:20:25 -07:00
Jingning Han
a142d6fc93 Skip redundant motion search in 4x4 level rd loop
This commit makes the encoder to perform motion search only once
per reference frame type for each 4x4/4x8/8x4 block. For bus_cif
at 2000 kbps, the runtime goes from 253812ms -> 217817ms
(14% speed-up) for speed 0.

Change-Id: I5f17599ccc8cfaf93ccb4f98fcb6008af6d79e92
2013-07-16 17:21:11 -07:00
Yaowu Xu
a8f9b9c94f added missed replacement
Change-Id: I2bce6f381fef0729b4dd5eb09ccb609f2eddd7ef
2013-07-16 17:12:45 -07:00
Dmitry Kovalev
41ae3d02d4 Removing two unused arguments from vp9_inc_mv signature.
Change-Id: Ieffea49eb7a5e5092f21f8694c546aff69b07c6d
2013-07-16 17:01:08 -07:00
Dmitry Kovalev
5b65a71cdc Changing signature of vp9_get_pred_probs_tx_size.
Removing VP9_COMMON* argument and adding struct tx_probs* instead of
MACROBLOCKD*.

Change-Id: Idf61074631a90ec51eac22c8dcd977f44ac0757c
2013-07-16 16:34:54 -07:00
Dmitry Kovalev
f53d007b9e Merge "Loop filter code cleanup." 2013-07-16 15:55:17 -07:00
Dmitry Kovalev
3997da0d35 Removing MV_GROUP_UPDATE define and corresponding code.
Change-Id: I4884cdc2557d25d50c7c4f7e19b1ad8bdb93cd63
2013-07-16 15:03:00 -07:00
Dmitry Kovalev
9482a0bf10 Cleaning up tile code.
Removing tile_rows and tile_columns from VP9Common, removing redundant
constants MIN_TILE_WIDTH and MAX_TILE_WIDTH, changing signature of
vp9_get_tile_n_bits.

Change-Id: I8ff3104a38179b2c6900df965c144c1d6f602267
2013-07-16 14:47:15 -07:00
Dmitry Kovalev
2de3c8d29b Loop filter code cleanup.
Cosmetic code changes, renaming 'flat' local var to 'mask', removing
unused field 'blim' from loopfilter_info_n and loop_filter_info structs.

Change-Id: I51e6ccf727fe361ad9a08e29e1201aa7abd4987f
2013-07-16 14:39:31 -07:00
James Zern
98e132bde0 Merge changes I40454d26,I892e76d5,I865ab3f9,I4a4bec17,I61c4351e,I37eb3559,I1031c556,I8c8f1f42
* changes:
  delete vp9_loopfilter_sse2.asm
  vp9_loopfilter_intrin_sse2: cosmetics: fix indent
  delete x86/vp9_loopfilter_x86.h
  vp9_loopfilter_intrin_sse2: make some funcs static
  vp9_loopfilter_intrin_sse2: remove unused uv funcs
  vp9_loopfilter: remove uv function typedef
  filter_block_plane: reuse some constants
  vp9_loopfilter.c: make some functions static
2013-07-16 14:25:32 -07:00
James Zern
39ce4b13d5 Merge "use consistent framerate naming" 2013-07-16 14:22:52 -07:00
James Zern
9581eb6e8a use consistent framerate naming
s/frame_rate/framerate/g

Change-Id: I6fc3e088e419c5f46e3a9390dd8a2cad2677a2fc
2013-07-16 14:12:47 -07:00
Jingning Han
5e8e2bf48e Merge "SSE2 16x16 inverse ADST/DCT hybrid transform" 2013-07-16 14:04:04 -07:00
Dmitry Kovalev
5de96b3ce6 Merge "Rewriting vp9_set_pred_flag_{seg_id, mbskip}." 2013-07-16 13:34:42 -07:00
Dmitry Kovalev
85a0d8e85c Merge "Moving vp9_kf_default_bmode_probs to vp9_entropymode.c." 2013-07-16 13:26:53 -07:00
James Zern
50015f6eba delete vp9_loopfilter_sse2.asm
sse2 functions are provided by vp9_loopfilter_intrin_sse2.c

Change-Id: I40454d26034e3ef915eeaf889937fe7d1b519b9b
2013-07-16 13:09:16 -07:00
James Zern
8f4787a383 vp9_loopfilter_intrin_sse2: cosmetics: fix indent
Change-Id: I892e76d5ad1443b2ea0d1a7839fe26afe9c68ffb
2013-07-16 13:09:16 -07:00
James Zern
af58254267 delete x86/vp9_loopfilter_x86.h
also remove prototype_loopfilter{,_block} defines from vp9_loopfilter.h

Change-Id: I865ab3f9436c7b1ca166f76630328abf01389405
2013-07-16 13:09:05 -07:00
James Zern
5baa416b6c Merge "vp9: remove frames_{since,till}.. from MACROBLOCKD" 2013-07-16 13:00:14 -07:00
James Zern
70fe2b3ec3 Merge "Cosmetic changes in 4x4 and 8x8 fdct unit tests" 2013-07-16 12:55:42 -07:00
Jingning Han
d05f66aa10 SSE2 16x16 inverse ADST/DCT hybrid transform
This commit enables SSE2 implementation of 16x16 inverse ADST/DCT
hybrid transform. The runtime goes from 5742 cycles -> 1821 cycles.
This provides about 1% encoding speed-up at speed 0.

Change-Id: I1678d0988bf30b9efd524877705bbb3645edb17b
2013-07-16 12:51:42 -07:00
James Zern
c0562d08f6 Merge "VP[89]_COMMON: remove unused near_boffset" 2013-07-16 12:17:04 -07:00
James Zern
63e914bde4 Merge "VP9_COMMON: remove unused framerate/bitrate" 2013-07-16 12:16:37 -07:00
James Zern
3a7c2665d0 Merge "yv12config: remove YUV_TYPE" 2013-07-16 12:16:04 -07:00
Ronald S. Bultje
58a2005367 Merge "Replace generated quant tables with static lookup tables." 2013-07-16 12:07:17 -07:00
Ronald S. Bultje
e965cccce5 Replace generated quant tables with static lookup tables.
This prevents possible float rounding issues between architectures.

Change-Id: I6ed260aebd49feb4cfb5596a5370c44be5f72167
2013-07-16 12:06:26 -07:00
John Koleszar
cc1aac1b3c Merge "Fix above context pointers" 2013-07-16 11:23:38 -07:00
Jingning Han
5851904744 Merge "SSE2 8x8 inverse ADST/DCT transform" 2013-07-16 11:00:11 -07:00
Dmitry Kovalev
baf0c959c7 Moving vp9_kf_default_bmode_probs to vp9_entropymode.c.
Removing vp9_modelcontext.c.

Change-Id: If2316c58dead2708d9f95b52d9494ba4c1dd7427
2013-07-16 10:54:34 -07:00
Dmitry Kovalev
863138a2ad Rewriting vp9_set_pred_flag_{seg_id, mbskip}.
Making implementation of vp9_set_pred_flag_{seg_id, mbskip} consistent
with vp9_get_segment_id without using confusing sub(a, b) macro. Passing
mi_row and mi_col to functions explicitly instead of replying on
mb_to_right_edge and mb_to_bottom_edge.

Change-Id: I54c1087dd2ba9036f8ba7eb165b073e807d00435
2013-07-16 10:44:48 -07:00
Paul Wilkins
30d2ea45ce Minor cleanup in code to fine uv tx_size.
Change-Id: I94b97a966b5efbc9a243048f1f5ddbbdc4b1846e
2013-07-16 18:27:33 +01:00
John Koleszar
5efd9609e3 Fix above context pointers
In the prior code, the above context pointers used for entropy
decoding were initialized on the first frame, and not updated when
the frame size changed. The per-frame code which initializes the
contexts assumes that the contexts are contiguous, leading to an
incomplete initialization when the frame is smaller. This commit
updates the pointers so that the context is contigous whenever
the frame size changes.

Change-Id: I08b53e3a30c8289491212311682ff1b8028cff6c
2013-07-16 10:26:56 -07:00
Johann
90ebfe621f Merge "vp9_convolve8_[horiz|vert]_avg" 2013-07-16 09:42:52 -07:00
Jingning Han
dd97c62ab8 Merge "Skip inter-coded block reconstruction in rd loop" 2013-07-16 09:03:38 -07:00
Dmitry Kovalev
e8e7620a1f Merge "Removing and moving around constant definitions." 2013-07-16 00:52:53 -07:00
Yaowu Xu
c5b0cd8405 Merge "Change to extend full border only when needed" 2013-07-15 21:35:32 -07:00
Yaowu Xu
5b915ebd92 Change to extend full border only when needed
This is a short term optimization till we work out a decoder
implementation requiring no frame border extension.

Change-Id: I02d15bfde4d926b50a4e58b393d8c4062d1be70f
2013-07-15 20:52:13 -07:00
Dmitry Kovalev
ca75f1255f Removing and moving around constant definitions.
Removing unused and duplicated constants, moving them from *.h to *.c
if possible.

Change-Id: Ief4d6b984a3ca2e9b38504f0d855ed072cf7133f
2013-07-15 19:26:30 -07:00
Dmitry Kovalev
65762849d1 Merge "Consistent naming for loop-filter filters." 2013-07-15 19:21:32 -07:00
Johann
6eae37f45c Merge "Remove print_nmvcounts" 2013-07-15 18:43:41 -07:00
Ronald S. Bultje
b02c4d364f Increase border size from 96 to 160.
This is required because upon downscaling, if a motion vector points
partially into the UMV (e.g. all minus 1 of 64+7 pixels, i.e. 70),
then we can point up to 140 pixels into the larger-resolution (2x)
reference buffer UMV, which means the UMV for reference buffers in
downscaling needs to be 140 rounded up to the nearest multiple of 32,
i.e. 160.

Longer-term, we should probably handle the UMV differently by detecting
edge coverage on-the-fly and using a temporary buffer for edge extensions
instead of adding 160 pixels on all sides of the image (which means a
CIF image uses 3x its own area size for borders).

Change-Id: I5184443e6731cd6721fc6a5d430a53e7d91b4f7e
2013-07-15 17:30:57 -07:00
Ronald S. Bultje
1ff94fea56 Inline vp9_quantize() in xform_quant().
Cycle times:
4x4:    151 to  131 cycles (15% faster)
8x8:    334 to  306 cycles (9% faster)
16x16: 1401 to 1368 cycles (2.5% faster)
32x32: 7403 to 7367 cycles (0.5% faster)

Total encode time of first 50 frames of bus @ 1500kbps (speed 0)
goes from 1min39.2 to 1min38.6, i.e. a 0.67% overall speedup.

Change-Id: I799a49460e5e3fcab01725564dd49c629bfe935f
2013-07-15 17:30:57 -07:00
Ronald S. Bultje
7e684e2009 Merge "Inline xform_quant() in encode_block_intra()." 2013-07-15 17:29:39 -07:00
Frank Galligan
ce1d69aed9 Merge "Neon: Update mbfilter if all vectors follow one branch." 2013-07-15 17:11:55 -07:00
Dmitry Kovalev
e973b4e2d9 Consistent naming for loop-filter filters.
Renaming flatmask4 to flat_mask4, flatmask5 to flat_mask5, hevmask to
hev_mask, filter to filter4, mbfilter to filter8, wide_mbfilter to
filter16.

Change-Id: Ic61c73e59c2eee505257584867aafac99833cea1
2013-07-15 16:01:31 -07:00
Ronald S. Bultje
6fb418741f Inline xform_quant() in encode_block_intra().
Also inline some of the block calculations to assist the compiler to
not do silly things like calculating the same offset (or converting
between raster/transform block offset or block, mi and pixel unit)
many, many, many times.

Cycle times:
4x4:     584 ->   505 cycles (16% faster)
8x8:    1651 ->  1560 cycles (6% faster)
16x16:  7897 ->  7704 cycles (2.5% faster)
32x32: 16096 -> 15852 cycles (1.5% faster)

Overall, this saves about 0.5 seconds (1min49.8 -> 1min49.3) on the
first 50 frames of bus (speed 0) @ 1500kbps, i.e. 0.5% overall.

Change-Id: If3dd62453f8e2ab9d4ee616bc4ea956fb8874b80
2013-07-15 16:00:42 -07:00
Dmitry Kovalev
2c31729839 Code cleanup inside vp9_decodeframe.c.
Removing unused DEC_DEBUG define and dec_debug variable. Changing function
signatures to eliminate code duplication, renaming function
mb_init_dequantizer to init_dequantizer. Also removing redundant curly
braces, and comments.


Change-Id: Ia56ee1b0be5f24abb0e878581845be8a4773c298
2013-07-15 14:47:25 -07:00
Frank Galligan
f4f60f6005 Neon: Update mbfilter if all vectors follow one branch.
Change the mbfilter Neon code from executing both branches if all
vectors follow only one branch.

The code is about 5% faster when executing only one branch and about
1% slower when executing both branches.

-PS5: Remove local stack space from mbfilter.

Change-Id: I6a23f9b318a9f4568a2718b4c9348db988fe2182
2013-07-15 13:08:28 -07:00
Jingning Han
6094bf37c5 Cosmetic changes in 4x4 and 8x8 fdct unit tests
Make the codes consistent with conventions.

Change-Id: Id044ed8382f83a3c3f54f9edd569f00bcd0523db
2013-07-15 11:37:17 -07:00
Jingning Han
043e0f9dad Skip inter-coded block reconstruction in rd loop
Skip the inverse transform and reconstruction of inter-mode coded
blocks in the rate-distortion optimization loop, when skip_encode_sb
feature is turned on. This provides about 1% speed-up at speed 0,
and 1.5% speed-up at speed 1. No performance change in both settings.

Change-Id: I2932718bf4d007163702b61b16b6ff100cf9d007
2013-07-15 11:32:14 -07:00
Jingning Han
faff6ed0fb Skip duplicate block encoding in the rd loop
This speed feature allows the encoder to largely remove the spatial
dependency between blocks inside a 64x64 superblock, thereby removing
the need to repeatedly encode superblocks per partition type in the
rate-distortion optimization loop.

A major challenge lies in the intra modes tested in the rate-distortion
optimization loop. The subsequent blocks do not have access to the
reconstructed boundary pixels without the intermediate coding steps.
This was resolved by using the original pixels for intra prediction
in the rd loop, followed by an appropriately designed distortion
modeling on the quantization parameters. Experiments also suggested
that the performance impact is more discernible at lower bit-rate/psnr
settings. Hence a quantizer dependent threshold is applied to deactivate
skip of block coding.

For bus_cif at 2000 kbps,
speed 0: runtime 269854ms -> 237774ms (12% speed-up) at 0.05dB
         performance loss.

speed 1: runtime 65312ms  -> 61536ms, (7% speed-up) at 0.04dB
         performance loss.

This operation is currently turned on in settings of speed 1.

Change-Id: Ib689741dfff8dd38365d8c1b92860a3e176f56ec
2013-07-15 11:08:58 -07:00
Dmitry Kovalev
1f14bbb624 Merge "Fixing vp9_get_pred_context_comp_ref_p function." 2013-07-15 10:51:42 -07:00
James Zern
04606d7258 vp9_loopfilter_intrin_sse2: make some funcs static
+ drop 'vp9_'

Change-Id: I4a4bec175316aab8f65c3a23bacc8362399a1357
2013-07-13 18:48:00 -07:00
James Zern
dc968d3d45 vp9_loopfilter_intrin_sse2: remove unused uv funcs
vp9_mbloop_filter_horizontal_edge_sse2 /
vp9_mbloop_filter_vertical_edge_uv_sse2

Change-Id: I61c4351ef0cce79fa4156a47ddace781f1566869
2013-07-13 18:44:32 -07:00
James Zern
bd6b79c44d vp9_loopfilter: remove uv function typedef
loop_filter_uvfunction is unused

Change-Id: I37eb3559e9eb2808f1f29dfea429441c94c9df2a
2013-07-13 18:38:28 -07:00
James Zern
9a4e175a64 filter_block_plane: reuse some constants
+ light const application
+ limit scope of params to build_lfi

Change-Id: I1031c556aec160a690921dc10e7aa8a707f43ecd
2013-07-13 18:21:05 -07:00
James Zern
b09d37af0c vp9_loopfilter.c: make some functions static
+ drop 'vp9_'

Change-Id: I8c8f1f421f7fc84d2efb80349cd725de3c9bf6bd
2013-07-13 18:14:03 -07:00
James Zern
dc1d2331f6 vp9: remove frames_{since,till}.. from MACROBLOCKD
frames_since_golden / frames_till_alt_ref_frame are unused.

Change-Id: I348e7689d4d75412cf4de7703d885be942e4a26b
2013-07-13 18:02:11 -07:00
James Zern
04092764f7 VP9_COMMON: remove unused framerate/bitrate
+ VP8_COMMON: place them under CONFIG_POSTPROC_VISUALIZER

Change-Id: I2702d5a3e1134b9c5f7ddc14b4173955a400f2cf
2013-07-12 21:43:23 -07:00
Jingning Han
91365addf8 SSE2 8x8 inverse ADST/DCT transform
This commit enables SSE2 implementation of 8x8 inverse ADST/DCT
transform. The runtime goes from 1216 cycles -> 266 cycles.
For bus_cif at 2000 kbps, the overall runtime reduces from
253707ms -> 248430ms, i.e., 2% speed-up at speed 0.

Change-Id: Ib0372e17e9162d7b11a10d653b1c8be547c878fb
2013-07-12 21:03:16 -07:00
James Zern
ce0324d8dd VP[89]_COMMON: remove unused near_boffset
Change-Id: If9b9ca703b997312df85241a0758d414cfdc5228
2013-07-12 19:41:27 -07:00
Dmitry Kovalev
429070987a Using vp9_copy and vp9_zero instead of custom code.
Change-Id: Id9b6ceeddca3f9b34bfada5c499b1e7a2f42c30b
2013-07-12 18:07:43 -07:00
Dmitry Kovalev
31a68bcdff Fixing vp9_get_pred_context_comp_ref_p function.
Adding missed parenthesis around boolean expressions. Bitstream is changed.
Regenerating test vectors.

Change-Id: I4cc00b761e9473f92f180a9fc3a0c607f0aaae56
2013-07-12 17:46:02 -07:00
Dmitry Kovalev
31403080ff Merge "Removing redundant call to set_mi_row_col." 2013-07-12 17:08:23 -07:00
Dmitry Kovalev
3c94fffdb0 Removing redundant call to set_mi_row_col.
This function is actually called from set_offsets which is called right
before vp9_read_mode_info.

Change-Id: Ibb9d5ad606194bc80eab264fad85b31c9dfd8f77
2013-07-12 16:25:23 -07:00
Johann
a15bebfc0a vp9_convolve8_[horiz|vert]_avg
Super basic conversion from the other implementations. Any changes to
one should be trivial to copy over keep in sync.

Change-Id: I1720b4128e0aba4b2779e3761f6494f8a09d3ea8
2013-07-12 16:21:33 -07:00
Yaowu Xu
cdea4a7c66 Merge "Fix a build issue" 2013-07-12 16:17:22 -07:00
Dmitry Kovalev
aa518af8c7 Merge "Adding struct tx_probs and struct tx_counts to cleanup the code." 2013-07-12 16:02:09 -07:00
Dmitry Kovalev
444c8d4c53 Merge "Making functions read_{inter, intra}_segment_id more similar." 2013-07-12 15:50:02 -07:00
James Zern
c9a2a06c20 Merge "vp9_postproc: remove useless self-assign" 2013-07-12 15:41:41 -07:00
James Zern
4fc6c88e9c yv12config: remove YUV_TYPE
this was never fleshed out in the context of VP8, for which it was
added. for VP9 it has no meaning.

Change-Id: Iba2ecc026d9e947067b96690245d337e51e26eff
2013-07-12 15:25:48 -07:00
Dmitry Kovalev
cc662dd768 Adding struct tx_probs and struct tx_counts to cleanup the code.
Also removing unused declarations from vp9_entropymode.h file.

Change-Id: Ib9c5826db3584a32f6bb3297a76c522b99d83402
2013-07-12 15:22:38 -07:00
Dmitry Kovalev
60969da5cb Merge "Code cleanup in vp9_pred_common.c" 2013-07-12 15:04:07 -07:00
Dmitry Kovalev
db0d603b1c Making functions read_{inter, intra}_segment_id more similar.
Change-Id: I51f9ac910834f2d7aba2be4f7ffbce597e61a144
2013-07-12 14:50:33 -07:00
James Zern
cca973a1ab vp9_postproc: remove useless self-assign
Change-Id: I0bc5d2d8c9fec8be18263b0dc2528886bb5b7b61
2013-07-12 14:17:15 -07:00
Dmitry Kovalev
3ab86adb1e Code cleanup in vp9_pred_common.c
No bitstream changes. Using MB_MODE_INFO temp variables instead of
MODE_INFO variables. Removing redundant curly braces.

Change-Id: Ib9d1bedfbd8af97ecc722ccf697ea8177bbe287c
2013-07-12 14:11:48 -07:00
Yaowu Xu
fb754b182f Fix a build issue
Change-Id: I23a75c495ed7ea917d7f312bef0990e20a6b53d9
2013-07-12 11:38:44 -07:00
James Zern
0195fb53cb vp9: consistent 'log2' variable naming
lg2 -> log2

Change-Id: I0602ddff49e42c9c40c29c084d04b7592b9f8edf
2013-07-12 11:37:43 -07:00
James Zern
37c0a1a8d0 Merge changes I33e76c42,I24aeac1e,If4192b40
* changes:
  vp9_dx_iface: s/vp8/vp9/ where possible
  vp[89]_dx_iface: delete unused function
  vp[89]_dx_iface: factorize vp8_mmap_*()
2013-07-12 11:10:18 -07:00
James Zern
563b4b20a8 vp9_dx_iface: s/vp8/vp9/ where possible
drop 'vp9_' from most static functions unrelated to the codec interface
itself.

Change-Id: I33e76c425bb7373570a57a61662a56d65ab4bdf3
2013-07-12 11:05:39 -07:00
James Zern
2908091342 Merge "msvs-build: use msbuild for vs >= 2005" 2013-07-12 10:59:35 -07:00
Deb Mukherjee
94c481f9f1 Some minor cleanups for efficiency
Implements some of the helper functions more efficiently with
lookups rathers than branches. Modeling function is consolidated
to reduce some computations.

Also merged the two enums BLOCK_SIZE_TYPES and BlockSize into
one because there is no need to keep them separate (even though
the semantics are a little different).

No bitstream or output change.

About 0.5% speedup

Change-Id: I7d71a66e8031ddb340744dc493f22976052b8f9f
2013-07-12 10:22:56 -07:00
Dmitry Kovalev
727631873d Merge "Removing redundant code mostly from vp9_pred_common.{h, c}." 2013-07-12 10:22:30 -07:00
Paul Wilkins
b8ddc9f0d3 Merge "Speed 2 feature adjustment." 2013-07-12 02:14:01 -07:00
James Zern
e202a2be03 vp[89]_dx_iface: delete unused function
static mmap_lkup

Change-Id: I24aeac1eca8453e28d58bc06925e58efc228a0a6
2013-07-11 23:03:24 -07:00
James Zern
b088998e5d vp[89]_dx_iface: factorize vp8_mmap_*()
s/vp8/vpx/ -> vpx_codec_internal.h / vpx_codec.c

Change-Id: If4192b40206276a761b01d44e334fe15bcb81128
2013-07-11 23:01:26 -07:00
Jingning Han
119decdee7 Merge "Cosmetic changes in 16x16 ADST/DCT unit test" 2013-07-11 21:52:39 -07:00
Jingning Han
84c3ac0476 Merge "Remove unnecessary tx_type branch in encode_block" 2013-07-11 21:52:27 -07:00
Dmitry Kovalev
dd150e8ea9 Removing redundant code mostly from vp9_pred_common.{h, c}.
Removing redundant function arguments and curly braces.

Change-Id: I46e02561f33fe02e84a3b19756f03b9504bd6a1b
2013-07-11 18:39:10 -07:00
Johann
e6ab476dd4 Remove print_nmvcounts
For some reason iOS builds take a really long time to sort this
function out.

It's not used anywhere so remove it.

Change-Id: Ia5c8513a0d9c7eb32641cca58ca1c1113e2dd9f4
2013-07-11 17:22:03 -07:00
Ronald S. Bultje
ee09dd9949 Remove unused function block_error().
Change-Id: I78a79fc51c2d7cc3c261f35b569155397f3dc0c4
2013-07-11 17:14:03 -07:00
James Zern
30bac896f9 Merge "vp9: fix peek_si for version==0" 2013-07-11 15:51:39 -07:00
James Zern
5b11e38aa7 Merge "small update to peek_si/get_si documentation" 2013-07-11 15:47:11 -07:00
Dmitry Kovalev
cae3fb7267 Merge "Calling is_inter_mode() instead of custom code." 2013-07-11 15:20:14 -07:00
Jingning Han
dac5891a1a Merge "SSE2 4x4 invserse ADST/DCT transform" 2013-07-11 14:17:23 -07:00
Dmitry Kovalev
8c05e59065 Calling is_inter_mode() instead of custom code.
Change-Id: Iccd4ab95ea51a6d57ed43947f2fd7ad92e8979cf
2013-07-11 14:14:47 -07:00
Dmitry Kovalev
b55ecafda8 Merge "Making vp9_default_nmv_context static." 2013-07-11 13:58:34 -07:00
James Zern
43dc0f8886 small update to peek_si/get_si documentation
correct a doxygen and function reference

Change-Id: I525371d64969aa60c464d0f6a133bc29895d7991
2013-07-11 12:23:28 -07:00
James Zern
7645c9ab34 vp9: fix peek_si for version==0
Change-Id: I6bfec4fa50dfc1a953edb1a2aa8e97e6e896bed6
2013-07-11 12:22:39 -07:00
Dmitry Kovalev
c4ad3273c7 Moving segmentation related vars into separate struct.
Adding segmentation struct to vp9_seg_common.h. Struct members are from
macroblockd and VP9Common structs. Moving segmentation related constants
and enums to vp9_seg_common.h.

Change-Id: I23fabc33f11a359249f5f80d161daf569d02ec03
2013-07-11 11:57:57 -07:00
Dmitry Kovalev
f70c021d36 Merge "Adding write_compressed_header function." 2013-07-11 11:57:17 -07:00
Dmitry Kovalev
802e57535a Merge "Removing unused TOKENEXTRA arg from pick_sb_modes function." 2013-07-11 11:46:06 -07:00
Jingning Han
29c45f31ee Cosmetic changes in 16x16 ADST/DCT unit test
Change-Id: Ic649e9e47d14d6f8cae0c443a425ea533a97ad8d
2013-07-11 11:37:38 -07:00
Johann
158c80cbb0 convolve8 optimizations for neon
Independent horizontal and vertical implementations.

Requires that blocks be built from 4x4 and [xy]_step_q4 == 16

6-10% improvement. CIF improved the least.

Change-Id: I137f5ceae4440adc0960bf88e4453e55a618bcda
2013-07-11 11:08:19 -07:00
hkuang
c9b25dcae4 Add neon optimize vp9_dc_only_idct_add.
Change-Id: Iae84ab945cc9662a0ddd839aa2b9ca59f2ae5423
2013-07-11 10:30:47 -07:00
Jingning Han
b9381b6faf Remove unnecessary tx_type branch in encode_block
The function encode_block is called only by inter-prediction modes,
hence removing the transform type branching there.

Change-Id: I34a3172e28ce2388835efd0f8781922211bff857
2013-07-11 09:11:35 -07:00
Scott LaVarnway
f2a6bcfb18 Eliminated prev_mip memsets/memcpys in encoder
This patch is in experimental but was not merged into master.

This patch swaps ptrs instead of copying and uses the
last show_frame flag instead of setting the entire buffer
to zero.

Change-Id: Ia0950466c8ba301a2a5bf917ff3d07bc1a2c2311
2013-07-11 10:47:28 -04:00
Jim Bankoski
5000cdf0ff Merge "Wide loopfilter 16 pix at a time" 2013-07-11 06:44:02 -07:00
Paul Wilkins
5290eeab88 Speed 2 feature adjustment.
With sf->auto_mv_step_size on it is questionable
whether sf->reduce_first_step_size is worthwhile.
At speed 2 it was not having a big impact.

Even at speed 2 sf->optimize_coefficients = 0 is not
having a big speed imapct so for now I have moved it
down into a higher speed setting.

Change-Id: I8a54de76d486ad37aabce76474889da2768b14c1
2013-07-11 13:59:12 +01:00
Jingning Han
49b6302044 SSE2 4x4 invserse ADST/DCT transform
Enable SSE2 4x4 inverse ADST/DCT transform. The runtime goes from
292 cycles down to 89 cycles. Running bus_cif at 2000 kbps, the
overall runtime of speed 0 goes from 301s to 295s (2% speed-up).

Change-Id: I24098136e7fee7ab2fbf1c11755bdf2ca37f3628
2013-07-10 20:16:02 -07:00
Jingning Han
aedc7c59b1 Merge "Fix tx_type bug in intra4x4 rd loop" 2013-07-10 20:13:25 -07:00
Ronald S. Bultje
decead7336 Replace copy_memNxM functions with a generic copy/avg function.
Change-Id: I3ce849452ed4f08527de9565a9914d5ee36170aa
2013-07-10 18:27:24 -07:00
Ronald S. Bultje
c13e0bcb52 Remove unused fwalsh/fdct x86 SIMD implementations.
Change-Id: Ia942e56cf322821d42ba06178672791eeee2847e
2013-07-10 18:22:51 -07:00
Dmitry Kovalev
ac72ad071d Making vp9_default_nmv_context static.
Change-Id: Ia3d5bd45adf288de11ab59c4728266c93c17e275
2013-07-10 17:44:45 -07:00
Ronald S. Bultje
46997bde88 Merge "Remove unused iwalsh4x4 MMX/SSE2 functions." 2013-07-10 17:08:46 -07:00
Ronald S. Bultje
a7ef456453 Merge "Remove unused 16x3/3x16 sad SSE2 functions." 2013-07-10 17:08:43 -07:00
John Koleszar
64f7a4d8cb Wide loopfilter 16 pix at a time
Where possible, do the 16 pixel wide filter while doing the horizontal
filtering pass. The same approach can be taken for the mbloop_filter
when that's implemented. Doing so on the vertical pass is a little more
involved, but possible.

Change-Id: I010cb505e623464247ae8f67fa25a0cdac091320
2013-07-10 16:32:44 -07:00
James Zern
73c4e28487 msvs-build: use msbuild for vs >= 2005
allows concurrent builds via the /m command line option

Change-Id: I668792ba00276e8626dc175c0a44ddab35fc7114
2013-07-10 16:18:13 -07:00
Dmitry Kovalev
544d8c3316 Removing unused TOKENEXTRA arg from pick_sb_modes function.
Change-Id: I0543e72fa092eef3976b65e16bb597197c364873
2013-07-10 15:57:28 -07:00
Jingning Han
18803f9cc4 Fix tx_type bug in intra4x4 rd loop
This commit fixed the mis-use of the tx_type for inverse transform
in intra4x4 rate-distortion optimization loop. It improves the
overall coding performance.

Change-Id: I7fe9953175b74890357dbcee33c138573766e980
2013-07-10 15:49:49 -07:00
Deb Mukherjee
7494bba66b Merge "Prunes out full-rd computation based on modeled rd" 2013-07-10 15:37:11 -07:00
Dmitry Kovalev
140447db5a Merge "Adding read_compressed_header function." 2013-07-10 15:11:08 -07:00
Dmitry Kovalev
0ac5e4dd58 Adding write_compressed_header function.
Change-Id: Ic5257fa8278e9b6297de230e4fd26a1e23ad2bb7
2013-07-10 15:08:34 -07:00
Jim Bankoski
68ef7a6b8a configure with internal stats not working
Change-Id: I5dea4570cb05df27a522abf6e7b695998654284a
2013-07-10 15:07:53 -07:00
Ronald S. Bultje
3f210f10eb Remove unused iwalsh4x4 MMX/SSE2 functions.
Change-Id: I2d22577911a37ed7d8c7e08cac20764842267652
2013-07-10 14:52:47 -07:00
Ronald S. Bultje
48c53233fd Remove unused 16x3/3x16 sad SSE2 functions.
Change-Id: I30a597c0cc366e34c9a3e2afe32d70e044f95ca4
2013-07-10 14:52:47 -07:00
Ronald S. Bultje
e6f955251f Merge "SSSE3 assembly for 4x4/8x8/16x16/32x32 H intra prediction." 2013-07-10 14:52:23 -07:00
Ronald S. Bultje
6a60249071 Merge "SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 TM intra prediction." 2013-07-10 14:52:19 -07:00
Jim Bankoski
865ca76604 Merge "remove warnings when NDEBUG is set" 2013-07-10 14:39:39 -07:00
Jim Bankoski
6591cf2f7e remove warnings when NDEBUG is set
Change-Id: Ie0cb732fdcb98616a422c4463bff80642248d136
2013-07-10 14:27:20 -07:00
Deb Mukherjee
53ff43adc3 Prunes out full-rd computation based on modeled rd
Adds a speed feature to eliminate full-rd computation if the modeled
rd or rd based on a different parameter in the same mode is already
a lot larger than the best rd yet.

Specifically, only search the sharp and smooth filters if the modeled
rd cost based on the  regular filter is within a certain factor of the
best rd cost so far. Also, skip full-rd computation of non splitmv
inter modes if the modeled rd cost based on pred error is within the
same factor of the best rd cost so far.

Also adds some enhancements in the rd search for splitmv mode to
speed things up by early breakouts. Negligible impact on performance.

Resuts on derfraw300:
psnr:    -0.013% with the splitmv enhancements, -0.24% with the rd
         breakout feature on.
speedup: 6% with splitmv enhancements, 20% with also residual breakout
         (tested on football sequence at 600 Kbps)

Change-Id: I37abc308ea9f110c1679ce649b6a7e73ab1ad5fc
2013-07-10 13:49:49 -07:00
James Zern
82f5935111 Merge "msvc: set a more useful debug format" 2013-07-10 13:02:22 -07:00
James Zern
9a8524d5ba Merge "test_libvpx: disable pthreads in gtest for win targets" 2013-07-10 13:01:52 -07:00
Jingning Han
114423538f SSE2 16x16 ADST/DCT hybrid transform
This commit enables 16x16 ADST/DCT forward hybrid transform using SSE2
operations. It reduces the runtime from 5433 cycles to 1621 cycles, at
no compression performance loss.

Change-Id: I75fd7f1984e9e28846af459f810ff0d6ae125230
2013-07-10 12:14:53 -07:00
Dmitry Kovalev
417df1d42e Merge "Adding encode_tiles function to vp9_bitstream.c." 2013-07-10 11:43:50 -07:00
Yaowu Xu
e52eec490c Merge "Add a feature to reduce chrome intra mode search" 2013-07-10 11:35:47 -07:00
Jingning Han
82c415328c Merge "Add unit test for 16x16 forward ADST/DCT" 2013-07-10 11:16:39 -07:00
Scott LaVarnway
25b4909076 Merge "Bug fix: set frame_parallel_decoding_mode" 2013-07-10 11:09:30 -07:00
John Koleszar
d1f8dd518c Merge "Fix intermediate height in convolve" 2013-07-10 11:04:40 -07:00
Dmitry Kovalev
704afd0c7a Adding read_compressed_header function.
Splitting setup_txfm_mode into read_tx_mode and read_tx_probs.

Change-Id: I5b4fe48698d56490857d32eafcaeb4291f208479
2013-07-10 10:27:50 -07:00
Ronald S. Bultje
44b29a769c Merge "SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 V intra prediction." 2013-07-10 10:24:16 -07:00
Ronald S. Bultje
89810bfd71 Merge "SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 DC intra prediction." 2013-07-10 10:13:16 -07:00
Ronald S. Bultje
4823307720 Merge "Remove memcpy() in handle_inter_mode() filter selection." 2013-07-10 10:13:07 -07:00
Dmitry Kovalev
20986c81b3 Merge "Removing vp9_maskingmv.c and corresponding assembly file." 2013-07-10 10:05:06 -07:00
Jingning Han
cf768b2d80 Add unit test for 16x16 forward ADST/DCT
Unit tests on the functional accuracy of forward ADST/DCT.

Change-Id: I81afff866bdeacbd457b0af96993a035741657f6
2013-07-10 09:40:46 -07:00
Ronald S. Bultje
7fd643264a SSSE3 assembly for 4x4/8x8/16x16/32x32 H intra prediction.
Change-Id: Iad70966b986f65259329070e258f76ef0af816b4
2013-07-10 09:28:03 -07:00
Ronald S. Bultje
8dade638a1 SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 TM intra prediction.
Change-Id: I3441c059214c2956e8261331bbf521525a617a86
2013-07-10 09:28:03 -07:00
Ronald S. Bultje
75b33c68c7 SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 V intra prediction.
Change-Id: I55a6cfa2daba738cbc0c4a02f806893f7e556997
2013-07-10 09:28:03 -07:00
Ronald S. Bultje
92c5d3665d SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 DC intra prediction.
Change-Id: Ibe1690afc5459f3b3beca401e7734fcd03da6dd0
2013-07-10 09:28:03 -07:00
Ronald S. Bultje
b1df674a99 Remove memcpy() in handle_inter_mode() filter selection.
Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from
2min4.9 to 2min3.1, i.e. a 1.4% speedup overall.

Change-Id: I9b25e87974430cb942caa276410bb2eda815bd83
2013-07-10 09:27:56 -07:00
Yaowu Xu
bed27a960a Add a feature to reduce chrome intra mode search
Change-Id: I721ebdeef2b53ce3e5c3eba3f7462ae2103c95a8
2013-07-10 08:59:18 -07:00
Jim Bankoski
863204e64d mi_width_log2 & mi_height_log2
converted to lookup to avoid unnecessary code

Change-Id: I2ee6a01f06984cc2c4ba74b3fffd215318f749d2
2013-07-10 07:26:08 -07:00
Jim Bankoski
6c8170af52 b_width_log2 and b_height_log2 lookups
Replace case statement with lookup.
    Small speed gain at low speed settings but at speed 2+ where the
    number of motion searches etc. falls the impact rises to ~3-4%.

    Change-Id: Idff639b7b302ee65e042b7bf836943ac0a06fad8

Change-Id: I5940719a4a161f8c26ac9a6753f1678494cec644
2013-07-10 07:19:09 -07:00
Jim Bankoski
fb027a7658 removing case statements around prediction entropy coding
Removes SEG_ID
Removes MBSKIP
Removes SWITCHABLE_INTERP
Removes INTRA_INTER
Removes COMP_INTER_INTER
Removes COMP_REF_P
Removes SINGLE_REF_P1
Removes SINGLE_REF_P2
Removes TX_SIZE

Change-Id: Ie4520ae1f65c8cac312432c0616cc80dea5bf34b
2013-07-09 20:10:16 -07:00
Yaowu Xu
059f2929e9 Merge "Revert "Remove memcpy() in handle_inter_mode() filter selection."" 2013-07-09 20:10:06 -07:00
James Zern
c2a4b2059e Merge "remove unused VP8 com/dec asm offsets" 2013-07-09 19:13:49 -07:00
James Zern
dac57fece6 Merge "Remove all asm offset files from VP9" 2013-07-09 19:13:37 -07:00
Dmitry Kovalev
2824048a56 Merge "Loop filter code cleanup." 2013-07-09 18:56:19 -07:00
Yaowu Xu
205efbc153 Revert "Remove memcpy() in handle_inter_mode() filter selection."
This reverts commit fcf7998a47.

Change-Id: Ic6532223faec9f1483b78adb2e37b79c7b1a0efb
2013-07-09 17:42:10 -07:00
James Zern
9718e708f3 msvc: set a more useful debug format
pdb vs. c7; works better with test_libvpx

Change-Id: I67d18e328dd8e7734d3710f3912e9b179d368a62
2013-07-09 17:28:22 -07:00
Yaowu Xu
d3d6ddcee5 Merge "Added a lossless test" 2013-07-09 17:15:09 -07:00
Dmitry Kovalev
d82f459d1a Adding encode_tiles function to vp9_bitstream.c.
Change-Id: Ie44824ec25fd8fdb25d7c8124a9b28c26d802029
2013-07-09 15:59:19 -07:00
Frank Galligan
53971d86ea Merge "Add Neon horizontal and vertical vp9_mbloop_filter" 2013-07-09 15:38:44 -07:00
Yaowu Xu
9ce6de195b Added a lossless test
It does encodings with min and max q set at 0, and check to make sure
output PSNR at MAX_PSNR (100).

Change-Id: Ia2418353cccf6e487204ea4ff874a7e71e55cb3e
2013-07-09 14:40:20 -07:00
James Zern
f89335f7ca remove unused VP8 com/dec asm offsets
Change-Id: Ib3b26ee27f04b2dcbbd32b3127afb45e9f50cfcf
2013-07-09 14:33:49 -07:00
John Koleszar
f0d9f10d24 Remove all asm offset files from VP9
The files are empty and unused.

Change-Id: Ieb4242d14273efdf24149bda33f9591540bba06a
2013-07-09 14:26:53 -07:00
Scott LaVarnway
5900d13183 Merge "Removed unnecessary xd->mode_info_context assignment" 2013-07-09 12:45:32 -07:00
Frank Galligan
198fa6d0a0 Add Neon horizontal and vertical vp9_mbloop_filter
- The vp9 mbfilter C code will branch on flat and mask. This CL
  will perform both branches and combine the data. A later CL will
  perform a check to see if all patch will take one branch.
- These functions are about 1.75 times faster than the C code on
  Nexus 7.

PS #3
- Changed all functions to dub limit, blimit, and thresh from
  vld {dx[]}, freeing up r4-r6.
- Changed code to use vbif to reduce one instruction and free
  up a d register.

Change-Id: I028dae0e434dc9891c3677bdb182e201ffb04777
2013-07-09 12:40:05 -07:00
Dmitry Kovalev
ec68d25521 Merge "Adding update_tx_ct function, removing duplicated code." 2013-07-09 12:26:11 -07:00
Dmitry Kovalev
aeed28f143 Removing vp9_maskingmv.c and corresponding assembly file.
Change-Id: I9842d02d61d78d17dc3449bae8ffbe60f4b3ecb3
2013-07-09 11:22:56 -07:00
Dmitry Kovalev
92a9eaef50 Loop filter code cleanup.
Using MAX_LOOP_FILTER constant instead of number 63.

Change-Id: If91e0c198331b3041e7cd0707a5948479e9209d8
2013-07-09 11:18:09 -07:00
Scott LaVarnway
69d1d1d865 Removed unnecessary xd->mode_info_context assignment
mi is xd->mode_info_context

Change-Id: Ib101be922b695205ec57b5ce1828ba19bde5b41c
2013-07-09 13:41:34 -04:00
Ronald S. Bultje
204d1b7058 Merge "Unbreak lossless." 2013-07-09 09:54:48 -07:00
Ronald S. Bultje
d8fa5d45cc Merge "Make intra prediction pointers RTCD-based." 2013-07-09 09:54:43 -07:00
Ronald S. Bultje
059c0ba5d4 Unbreak lossless.
Change-Id: I8130ec9b5371c65e885f245a5ac73840c23cb4a1
2013-07-09 09:46:37 -07:00
Jim Bankoski
7f960223ea cleanup read_mode_info if (1)
Change-Id: I851af23c787a2d3637d84244b9f75063cbf782f1
2013-07-09 09:04:45 -07:00
Jim Bankoski
c36d502e92 decoder speedup - get-segment-id only if segmentation enabled
Change-Id: I9355f8446660aeb7dfdbc5ee56635c791ac35e95
2013-07-09 08:52:30 -07:00
Yaowu Xu
df5731273f Merge "Fix loopfilter bug" 2013-07-09 01:34:25 -07:00
Dmitry Kovalev
c6c279aff0 Merge "Using mi_cols instead of mb_cols." 2013-07-08 20:09:19 -07:00
Dmitry Kovalev
1c65c580d6 Merge "Refactoring setup_pre_planes function." 2013-07-08 20:08:05 -07:00
Dmitry Kovalev
6254c8d780 Merge "Calling set_partition_seg_context() instead of code duplication." 2013-07-08 20:07:06 -07:00
Ronald S. Bultje
8350e7fe38 Make intra prediction pointers RTCD-based.
This probably has a mildly negative impact on performance, but will
(in future commits - or possibly merged with this one) allow SIMD
implementations of individual intra prediction functions. We may
perhaps want to consider having separate functions per txfm-size
also (i.e. 4x4, 8x8, 16x16 and 32x32 intra prediction functions for
each intra prediction mode), but I haven't played much with that
yet.

Change-Id: Ie739985eee0a3fcbb7aed29ee6910fdb653ea269
2013-07-08 17:25:51 -07:00
Ronald S. Bultje
a5062cc635 Don't call encode_sb() for the final of 4-split subpartitions.
The resulting reconstruction is never used, thus it just wastes CPU
cycles. Reduces encode time of first 50 frames of bus (speed 0) @
1500kbps from 2min2.0 to 2min1.2, i.e. a 0.65% overall speedup.

Change-Id: I74755ca3aadc21e2be220f486259060bd4088c45
2013-07-08 16:22:39 -07:00
Ronald S. Bultje
bd867f1619 Inline vp9_get_mv_joint().
Encode time for first 50 frames of bus (speed 0) @ 1500kbps goes from
2min10.9 to 2min10.5, i.e. 0.3% faster overall, basically because we
prevent the call overhead.

Change-Id: I1eab1a95dd3eae282f9b866f1f0b3dcadff073d5
2013-07-08 16:22:39 -07:00
Ronald S. Bultje
8fde07a3ae Don't recalculate mv_ref costs for each block/partition.
Changes cost_mv_ref() into doing a LUT into pre-calculated cost
arrays instead. Encode time of first 50 frames of bus (speed 0)
@ 1500kbps goes from 2min11.6 to 2min10.9, i.e. 0.5% faster overall.

Change-Id: If186e92c34c201b29cbbc058785a15c9c09e433a
2013-07-08 16:22:39 -07:00
Ronald S. Bultje
5a73254918 Remove unnecessary memset(best_index, 0) from trellis/optimize.
First 50 frames of bus @ 1500kbps (speed 0) goes from 2min12.6 to
2min11.6, i.e. 0.75% overall speedup.

Change-Id: I67054f8146e82a02b6457c51a1c8627a937e5e1e
2013-07-08 16:22:39 -07:00
Ronald S. Bultje
fcf7998a47 Remove memcpy() in handle_inter_mode() filter selection.
Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from
2min4.9 to 2min3.1, i.e. a 1.4% speedup overall.

Change-Id: Ibe8b08d159797504c5d0c5122de1b6da3b6595e0
2013-07-08 16:22:39 -07:00
Ronald S. Bultje
ed995afba1 Make frame-wide filter-type decision fully RD-based.
Overall, on all test sets, this gains about +0.2% on all metrics.
City is a clip where this really hurts (-1.0% on all metrics), I'm
not quite sure why yet. Maybe interesting to look into in the future.

Change-Id: I6f0eecb20e72f0194633270d30bf00d76d9eae78
2013-07-08 16:22:37 -07:00
Dmitry Kovalev
b7559258a4 Using mi_cols instead of mb_cols.
Eliminating usage of mb-units, switching to mi-units. Adding
ALIGN_POWER_OF_TWO macro.

Change-Id: I2491c969f713207c062011878b57e4e531818607
2013-07-08 14:54:04 -07:00
Deb Mukherjee
d9b62160a0 Implements several heuristics to prune mode search
Skips mode searches for intra and compound inter modes depending
on the best mode so far and the reference frames. The various
heuristics to be used are selected by bits from a flag. The
previous direction based intra mode search pruning is also absorbed
in this framework.

Specifically the flags and their impact are:

1) FLAG_SKIP_INTRA_BESTINTER (skip intra mode search for oblique
directional modes and TM_PRED if the best so far is
an inter mode)
derfraw300: -0.15%, 10% speedup

2) FLAG_SKIP_INTRA_DIRMISMATCH (skip D27, D63, D117 and D153
mode search if the best so far is not one of the closest
hor/vert/diagonal directions.
derfraw300: -0.05%, about 9% speedup

3) FLAG_SKIP_COMP_BESTINTRA (skip compound prediction mode
search if the best so far is an intra mode)
derfraw300: -0.06%, about 7-8% speedup

4) FLAG_SKIP_COMP_REFMISMATCH (skip compound prediction search
if the best single ref inter mode does not have the same ref
as one of the two references being tested in the compound mode)
derfraw300: -0.56%, about 10% speedup

Change-Id: I1a736cd29b36325489e7af9f32698d6394b2c495
2013-07-08 12:17:12 -07:00
Jingning Han
a38cf2658a Merge "Refactor SSE2 8x8 functional units" 2013-07-05 11:18:18 -07:00
Tero Rintaluoma
18303b1263 Fix intermediate height in convolve
intermediate_height for horizontal filtering must be at least 8
pixels to be able to do vertical filtering correctly. Currently
it can be less for small block and y_step_q4 sizes.

Change-Id: I2ee28b0591b2041c2fa9844d0ae2ff8a1a59cc21
2013-07-05 14:58:25 +03:00
Paul Wilkins
ef0ca2deaa Merge "Fix to comp_inter_joint_search_thresh feature." 2013-07-04 03:27:00 -07:00
Dmitry Kovalev
bfcef95c45 Adding update_tx_ct function, removing duplicated code.
Change-Id: I8882fe3cd247a5a8304ab8ab2ee9abdb92830133
2013-07-03 18:24:13 -07:00
Dmitry Kovalev
f72e072555 Refactoring setup_pre_planes function.
Removing set_refs, adding set_ref function.

Change-Id: I5635c478b106ae4e57d317f1c83d929644307e63
2013-07-03 17:42:01 -07:00
Dmitry Kovalev
2ce6b23473 Merge "Adding write_skip_coeff function." 2013-07-03 16:33:58 -07:00
Jingning Han
68172dbede Merge "Enable early termination in rd search" 2013-07-03 14:20:41 -07:00
Dmitry Kovalev
430bd0c94a Merge "Replacing 64 / MI_SIZE with MI_BLOCK_SIZE." 2013-07-03 14:16:02 -07:00
Dmitry Kovalev
dda1835dc6 Adding write_skip_coeff function.
Change-Id: I221126f22ab9067348eb0efb8a73b15a8f49c3fd
2013-07-03 13:23:47 -07:00
Yaowu Xu
c86b18894e Merge "Inline a few intra predictors" 2013-07-03 13:21:22 -07:00
Jingning Han
2bd6fe08f8 Enable early termination in rd search
This commit allows encoder to detect the cumulative rate-distortion
cost per transformed block inside a partition. If the cumulative
rd cost is already above the best rd value, it terminates the rest
operations and continue to next prediction mode test.

It reduces the runtime of bus at target bit-rate 2000 from 308 second
to 266 second, i.e., about 13% speed-up at no performance penalty.

Change-Id: I5f15a3d8955d97031d5653006027866a00654e7a
2013-07-03 12:54:18 -07:00
Dmitry Kovalev
2ad62c9312 Calling set_partition_seg_context() instead of code duplication.
Change-Id: I65be6acc54c99688fd1f0c946cec3511514b8555
2013-07-03 11:15:58 -07:00
Dmitry Kovalev
5a21de8418 Replacing 64 / MI_SIZE with MI_BLOCK_SIZE.
Change-Id: I32276552b3ea6dc1dce8e298be114cfe1019b31c
2013-07-03 10:54:50 -07:00
Dmitry Kovalev
60198a595d Merge "Adding write_selected_txfm_size function." 2013-07-03 10:33:55 -07:00
Yaowu Xu
0f02dc2709 Inline a few intra predictors
Change-Id: Ib41f0643fdcc088500e7420708f4e72f1f64c710
2013-07-03 10:20:41 -07:00
Jingning Han
2cb75c9607 Refactor SSE2 8x8 functional units
These serve as building blocks for SSE2 8x8 and 16x16 ADST/DCT
hybrid transform coding.

Change-Id: I4089a754c66e0c986f67d9b8ec4dfb9627ad430d
2013-07-03 10:11:59 -07:00
Ronald S. Bultje
61fe678f36 Merge "Use pmovmskb to skip quantize loops over empty coefficients." 2013-07-03 09:05:48 -07:00
Ronald S. Bultje
98c493a1c0 Merge "Remove unused function vp9_build_inter4x4_predictors_mbuv()." 2013-07-03 09:05:20 -07:00
Paul Wilkins
f58b44ad62 Fix to comp_inter_joint_search_thresh feature.
When this is 0 (BLOCK_SIZE_AB4X4) we want to do
the inter joint search for all sizes.

Change-Id: Id40cd6fe7790e7e1165352b9cef5e12fa8c0bc88
2013-07-03 16:58:34 +01:00
Paul Wilkins
72c5778ec5 Added two new skip experiments.
sf->unused_mode_skip_lvl. Tests modes as normal for all
sizes at or below the given level. At larger sizes it skips
all modes that were not chosen at any smaller size.
Hence setting BLOCK_SIZE_SB64X64 is in effect off.
Setting BLOCK_SIZE_AB4X4 will only consider modes that
were chosen for one or more 4x4 blocks at larger sizes.

sf->reference_masking.
Do a test encode of the NONE partition at one size and create
a reference frame mask based on the best rd choice. In the
full search only allow this reference frame.
Currently it is testing 64x64 and repeats this in the full search.
This does not work well with Jim's Partition code just now and
is disabled by default.

Change-Id: I8f8c52d2ef4a0c08100150b0ea4155d1aaab93dd
2013-07-03 16:56:06 +01:00
Scott LaVarnway
5e6b333902 Bug fix: set frame_parallel_decoding_mode
This patch allows the frame_parallel_decoding_mode flag
to be set instead of returning a codec error.

Change-Id: I4a1631c625723ac8873290d0fd0211074a87d112
2013-07-03 10:25:29 -04:00
Paul Wilkins
b0a2871c35 Merge "Adjust Speed 0 settings." 2013-07-03 02:47:18 -07:00
Dmitry Kovalev
1f6e95e76a Merge "Removing redundant struct from union b_mode_info." 2013-07-02 18:09:31 -07:00
Yaowu Xu
16147d4bcc Merge "Added a speed feature use_square_partition_only" 2013-07-02 17:16:11 -07:00
Dmitry Kovalev
be77f6bbbf Removing redundant struct from union b_mode_info.
Change-Id: I08fc6e474ff2c12cfa065bae4989c724276e2c83
2013-07-02 16:51:57 -07:00
Dmitry Kovalev
edb060a77c Adding write_selected_txfm_size function.
Change-Id: I143b430b7c24a964ccd0ebb75944cf317a072214
2013-07-02 16:41:22 -07:00
Yaowu Xu
0d7b7c09cb Added a speed feature use_square_partition_only
This commit adds a speed feature where only squared partition are
evaluated in partition picking. Enable this feature in cpu-used 2
reduces encoding time by ~30%.

loss of compression:
-0.9% on cif set
-1.23% on stdhd

Change-Id: Ia6fad11210f0b78365abb889f9245604513be5b9
2013-07-02 16:40:15 -07:00
Ronald S. Bultje
e5fb4b61b6 Use pmovmskb to skip quantize loops over empty coefficients.
If none of the 16 coefficients that we quantize per loop iteration
are larger than the zbin, directly skip to the next round of coeffs,
rather than doing a full quantize loop that will eventually result
in 16 zeroes. This incurs a jump cost, but saves a lot of other work.
32x32 quant goes from 1349 -> 1184 cycles. The same approach yielded
no significantly positive results for smaller transforms, so is not
used there (8x8: 103 -> 101 cycles; 16x16: 302 -> 306 cycles).

Change-Id: I8fca17dc2543fc8eed1dbcd5100145e3c3a9b647
2013-07-02 16:34:24 -07:00
Ronald S. Bultje
5b87240230 Remove unused function vp9_build_inter4x4_predictors_mbuv().
Change-Id: Ibfd2def2c088f4bc541a1de25990d73480b53d4b
2013-07-02 16:34:24 -07:00
Jim Bankoski
b0520b61ed new unit test for cpu-speed
Tests q0 ( lossless),  very high bitrate and low bitrates at cpu speed
0, 1 and 2.

Change-Id: I0c5cdca00acd8d01e7b13f124b3b08d4b1ae9f6d
2013-07-02 14:38:03 -07:00
Deb Mukherjee
37501d687c Speed feature to binary search dir intramodes
This speed feature will skip searching the directional intra prediction
modes D63, D117, D27, D153 if the best intra mode so far is not one of
the diagonal, horizontal or vertical directions closest to the respective
directions being tested. In other words, this implements a sort of
binary search in the angular domain.

Speedup: about 9-10%
Results: -0.05% only on derfraw300.

Change-Id: I413584c41f2a3e8dabfbdeb40718c8fc4b1d63a2
2013-07-02 14:07:19 -07:00
Deb Mukherjee
66324d501a Merge "Clean-up in forward update to use mapping tables" 2013-07-02 14:02:53 -07:00
Deb Mukherjee
8d3d2b76f3 Tx size selection enhancements
(1) Refines the modeling function and uses that to add some speed
features. Specifically, intead of using a flag use_largest_txfm as
a speed feature, an enum tx_size_search_method is used, of which
two of the types are USE_FULL_RD and USE_LARGESTALL. Two other
new types are added:
USE_LARGESTINTRA (use largest only for intra)
USE_LARGESTINTRA_MODELINTER (use largest for intra, and model for
inter)

(2) Another change is that the framework for deciding transform type
is simplified to use a heuristic count based method rather than
an rd based method using txfm_cache. In practice the new method
is found to work just as well - with derf only -0.01 down.
The new method is more compatible with the new framework where
certain rd costs are based on full rd and certain others are
based on modeled rd or are not computed. In this patch the existing
rd based method is still kept for use in the USE_FULL_RD mode.
In the other modes, the count based method is used.
However the recommendation is to remove it eventually since the
benefit is limited, and will remove a lot of complications in
the code

(3) Finally a bug is fixed with the existing use_largest_txfm speed feature
that causes mismatches when the lossless mode and 4x4 WH transform is
forced.

Results on derf:
USE_FULL_RD: +0.03% (due to change in the tables), 0% encode time reduction
USE_LARGESTINTRA: -0.21%, 15% encode time reduction (this one is a
pretty good compromise)
USE_LARGESTINTRA_MODELINTER: -0.98%, 22% encode time reduction
(currently the benefit of modeling is limited for txfm size selection,
but keeping this enum as a placeholder) .
USE_LARGESTALL: -1.05%, 27% encode-time reduction (same as existing
use_largest_txfm speed feature).

Change-Id: I4d60a5f9ce78fbc90cddf2f97ed91d8bc0d4f936
2013-07-02 13:54:00 -07:00
Deb Mukherjee
9c20cedd93 Clean-up in forward update to use mapping tables
Uses mapping tables instead of complicated modulo/division
operations for prob mapping for forward updates.

No bit-stream or output change.

Change-Id: Ifd9ce8ac1437835c305c94f64c18273c7a68f546
2013-07-02 12:48:20 -07:00
Dmitry Kovalev
904070ca64 Merge "Removing unused implicit segmentation code." 2013-07-02 11:58:48 -07:00
Ronald S. Bultje
3cc6eb7c00 Merge "Make get_coef_context() branchless." 2013-07-02 11:48:15 -07:00
Dmitry Kovalev
3140c443e4 Merge "Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h." 2013-07-02 11:31:35 -07:00
Dmitry Kovalev
18fd43601c Merge "Additional vp9_decodemv.c cleanup." 2013-07-02 11:31:23 -07:00
Dmitry Kovalev
a3d2e6c98b Removing unused implicit segmentation code.
Change-Id: I8a2983fb14274a6ac53681fa4cd5d4209cbd2905
2013-07-02 11:16:42 -07:00
Yunqing Wang
f4bee75c2b Merge "Add speed feature to disable splitmv" 2013-07-02 10:54:22 -07:00
Yunqing Wang
b12e060b55 Add speed feature to disable splitmv
Added a speed feature in speed 1 to disable splitmv for HD (>=720)
clips. Test result on stdhd set: 0.3% psnr loss and 0.07% ssim
loss. Encoding speedup is 36%.

(For reference: The test result on derf set showed 2% psnr loss
and 1.6% ssim loss. Encoding speedup is 34%. SPLITMV should be
enabled for small resolution videos.)

Change-Id: I54f72b94f506c6d404b47c42e71acaa5374d6ee6
2013-07-02 10:02:34 -07:00
Jingning Han
b91a1586a3 Calculate rd cost per transformed block
Compute the rate-distortion cost per transformed block, and cumulate
the cost through all blocks inside a partition. This allows encoder
to detect if the cumulative rd cost is already above the best rd cost,
thereby enabling early termination in the rate-distortion optimization
search.

Change-Id: I0a856367a9a7b6dd0b466e7b767f54d5018d09ac
2013-07-02 09:58:46 -07:00
Ronald S. Bultje
9df24b41ca Merge "Update quantize SSSE3 SIMD to cover 32x32 transform case also." 2013-07-02 09:38:08 -07:00
Paul Wilkins
1319d9c077 Adjust Speed 0 settings.
Remove the use of sf->comp_inter_joint_search_thresh
from the baseline speed 0. Approx +0.4% on derf.

Change-Id: Icc14db98909830f40e5ac66130d40e78d2e55c71
2013-07-02 15:42:14 +01:00
Paul Wilkins
b7cd01ed73 Revert "New motion threshold factor - speed feature."
This reverts commit 1377278180.
Also fixes a spelling mistake.

Change-Id: I5be8aa4d8d3c0323d4a6f41968a7b2c048949c3f
2013-07-02 15:06:40 +01:00
Yaowu Xu
9e408e3504 fix the mismatch again in cpu_used 2
Change-Id: Icc4f70f0b0f91c9e7d5d00eedd67841afe2f2679
2013-07-01 19:13:18 -07:00
Jim Bankoski
d4158283e7 use partitioning from last frame
This cl converts use partition from last frame to do the following:

if part is none,horz, vert -> try split
if part != none and one of the children is not split - try none


Change-Id: I5b6c659e35f3ac9f11c051b92ba98af6d7e8aa87
Signed-off-by: Jim Bankoski <jimbankoski@google.com>
2013-07-01 18:18:50 -07:00
Dmitry Kovalev
1ac0540296 Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h.
Change-Id: Ia547a5dd7650b771fd00edd673ab9f920270731c
2013-07-01 17:28:08 -07:00
Ronald S. Bultje
26b6318de8 Make get_coef_context() branchless.
This should significantly speedup cost_coeffs(). Basically what the
patch does is to make the neighbour arrays padded by one item to
prevent an eob check in get_coef_context(), then it populates each
col/row scan and left/top edge coefficient with two times the same
neighbour - this prevents a single/double context branch in
get_coef_context(). Lastly, it populates neighbour arrays in pixel
order (rather than scan order), so we don't have to dereference the
scantable to get the correct neighbours.

Total encoding time of first 50 frames of bus (speed 0) at 1500kbps
goes from 2min10.1 to 2min5.3, i.e. a 2.6% overall speed increase.

Change-Id: I42bcd2210fd7bec03767ef0e2945a665b851df56
2013-07-01 16:34:10 -07:00
Dmitry Kovalev
7ed754995d Additional vp9_decodemv.c cleanup.
Change-Id: I5b413bc0884af0bda38c05332d86490103905b3b
2013-07-01 16:14:13 -07:00
Yaowu Xu
ba3b2604f0 Merge "Quantize (64-bit only, for now) SSSE3 SIMD." 2013-07-01 15:58:57 -07:00
Dmitry Kovalev
6411228aca Merge "Removing vp9_modecont.{h, c}." 2013-07-01 14:58:48 -07:00
Dmitry Kovalev
d9db0d96ec Merge "Moving encoder subexp encoding functions to subexp.{h, c}." 2013-07-01 14:58:36 -07:00
Dmitry Kovalev
a4e14d7f9f Merge "Adding vp9_rb_read_signed_literal function." 2013-07-01 14:58:20 -07:00
Dmitry Kovalev
33fffc155e Merge "Inlining decode_atom, decode_sb_intra, and decode_sb." 2013-07-01 14:58:06 -07:00
Dmitry Kovalev
2381d46480 Merge "Cleanup inside vp9_decodemv.c." 2013-07-01 14:50:32 -07:00
Ronald S. Bultje
c8defcfdee Update quantize SSSE3 SIMD to cover 32x32 transform case also.
Encode time of bus (speed 0) 50 frames @ 1500kbps goes from 2min14.4 to
2min10.1, i.e. a 2.3% overall speed increase.

Change-Id: I3699580e74ec26c7d24e03681bc47ba25ee1ee87
2013-07-01 11:36:33 -07:00
Ronald S. Bultje
7353ceab9d Quantize (64-bit only, for now) SSSE3 SIMD.
Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
goes 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
x86-64 only, it needs some minor modifications to be 32bit compatible,
because it uses 15 xmm registers, whereas 32bit only has 8.

Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
2013-07-01 11:36:07 -07:00
Dmitry Kovalev
2ab3bc8871 Removing vp9_modecont.{h, c}.
Moving vp9_default_inter_mode_probs array to vp9_entropymode.c.

Change-Id: I88ebda86ccc07f2a43c6c01d4b37898214cfb6de
2013-07-01 10:17:15 -07:00
Paul Wilkins
7bb436feee Merge "New motion threshold factor - speed feature." 2013-07-01 09:39:02 -07:00
Yaowu Xu
632289b31f fix a mismatch in cpuused 2
Change-Id: I921c9faba6386535aaf717a54301dd346a9b8540
2013-07-01 08:54:50 -07:00
Paul Wilkins
1377278180 New motion threshold factor - speed feature.
Added a speed feature that focuses only on thresholds
for new motion modes.

Moved sf->comp_inter_joint_search_thresh into speed
1.  This has ~+0.4% impact on quality at speed 0 as
our quality reference baseline.

Slight adjustment to baseline thresholds.

Change-Id: I7ebf104f1fe29af77ed4837b2e84be065621bbe5
2013-07-01 12:11:21 +01:00
Dmitry Kovalev
e5e15eb38e Adding vp9_rb_read_signed_literal function.
Change-Id: I30ea91561ffac7e5065ba41b2d3ab7dedb720593
2013-07-01 02:09:36 -07:00
Jingning Han
993942ce0c Merge "Enable SSE2 4x4 ADST/DCT transform" 2013-06-29 15:57:04 -07:00
Christian Duvivier
466e0cf303 SSE2 version of vp9_short_fdct32x32_rd.
43,000 -> 5,750 cycles, about 7.5x faster.

Change-Id: Ibfd92821b9603f4ed9c256e0ececec14fa4565d0
2013-06-29 13:53:00 -07:00
Dmitry Kovalev
bb8ccf1caf Moving encoder subexp encoding functions to subexp.{h, c}.
Change-Id: I83ca53bf6def871f199a382a671f26ad7cbecbca
2013-06-29 11:50:45 -07:00
Ronald S. Bultje
bc70c60b25 Merge "fixed a bug where sse is not populated" 2013-06-29 07:42:41 -07:00
Johann
6098e359f4 Merge "add Neon optimized add constant residual functions" 2013-06-28 19:50:38 -07:00
James Zern
84d08fa9c4 Merge "fix test compile error" 2013-06-28 19:48:05 -07:00
Ronald S. Bultje
a487af8d35 Merge "Inline vp9_get_coef_context() (and remove vp9_ prefix)." 2013-06-28 19:37:11 -07:00
Ronald S. Bultje
7731e53839 Merge "Minor change to prevent one level of dereference in cost_coeffs()." 2013-06-28 19:36:56 -07:00
chm
a83cfd4da1 add Neon optimized add constant residual functions
- Add add_constant_residual_8x8 16x16 32x32 functions
- Tested under RealView debugger enviroment

Change-Id: I5c3a432f651b49bf375de6496353706a33e3e68e
2013-06-28 19:06:51 -07:00
Dmitry Kovalev
d6264f9adf Merge "Cosmetic reordering of FRAME_CONTEXT members." 2013-06-28 18:38:02 -07:00
Dmitry Kovalev
1947828c3d Inlining decode_atom, decode_sb_intra, and decode_sb.
Change-Id: I41711bb994f542c5ba3d0cefd9b2e79db3c2c3a1
2013-06-28 18:34:30 -07:00
James Zern
a63e31e81e fix test compile error
since:
92479d9 Make update_partition_context faster

fixes:
vp9/common/vp9_blockd.h:408:22: error:
non-constant-expression cannot be narrowed from type 'int' to 'char' in
initializer list [-Wc++11-narrowing]
  char pcvalue[2] = {~(0xe << boffset), ~(0xf <<boffset)};
                     ^~~~~~~~~~~~~~~~~

Change-Id: Id5b00b9a72d00a2b314081a23879bd1fa3ce983b
2013-06-28 18:07:37 -07:00
Jingning Han
1109b6b888 Enable SSE2 4x4 ADST/DCT transform
This commit enables SSE2 4x4 foward hybrid transform. The runtime
goes from 249 cycles down to 74 cycles. Overall around 2% speed-up
at no compression performance change.

Change-Id: Iad4d526346e05c7be896466c05500711bb763660
2013-06-28 17:24:43 -07:00
Yaowu Xu
f853e662b7 fixed a bug where sse is not populated
Change-Id: I692d800af1f976c84a76f8bd66864c4b39540abc
2013-06-28 17:10:22 -07:00
Jingning Han
07b72ace70 Merge "Fix switch statement in 8x8 transform" 2013-06-28 16:49:59 -07:00
Dmitry Kovalev
228b8232d3 Cosmetic reordering of FRAME_CONTEXT members.
Change-Id: Id641e5188adf55e53e606e5813ae45feaf7abbd2
2013-06-28 16:16:03 -07:00
Dmitry Kovalev
15fefced7d Cleanup inside vp9_decodemv.c.
Adding read_skip_coeff function. Renaming decode_mv to read_mv for
consistency with another function names. Removing redundant function
arguments. Renaming kfread_modes to read_intra_mode_info, read_mb_modes_mv
to read_inter_mode_info, vp9_decode_mb_mode_mv to vp9_read_mode_info,
vp9_decode_mode_mvs_init to vp9_prepare_read_mode_info. Inlining function
mb_mode_mv_init inside vp9_prepare_read_mode_info.

Change-Id: Ifee05d333da4cd331d4aff40ce41ccd9b70e494a
2013-06-28 15:32:31 -07:00
Dmitry Kovalev
59070f6e3c Merge "Removing CONFIG_DEBUG checks on assertions." 2013-06-28 14:03:28 -07:00
Jingning Han
9def7f72a0 Fix switch statement in 8x8 transform
Change-Id: I7c46354c4983feb5f6202c3ab4a1d9534da7e30f
2013-06-28 13:40:36 -07:00
Ronald S. Bultje
cee3bc6ffa Merge "Some minor optimizations for cost_coeffs()." 2013-06-28 11:54:50 -07:00
Ronald S. Bultje
ec5d09b950 Merge "Make coefficient skip condition an explicit RD choice." 2013-06-28 11:54:28 -07:00
Ronald S. Bultje
d00b8e5f82 Inline vp9_get_coef_context() (and remove vp9_ prefix).
Makes cost_coeffs() a lot faster:
4x4: 236 -> 181 cycles
8x8: 888 -> 588 cycles
16x16: 3550 -> 2483 cycles
32x32: 17392 -> 12010 cycles

Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes
from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup.

Change-Id: I16b8d595946393c8dc661599550b3f37f5718896
2013-06-28 10:40:21 -07:00
Dmitry Kovalev
0345fc3ad9 Merge "Decoder's code cleanup." 2013-06-28 10:38:54 -07:00
Dmitry Kovalev
8e6ce6bb9e Removing CONFIG_DEBUG checks on assertions.
Adding CHECK_MEM_ERROR macro to vp9_common.h and removing two duplicated
ones from vp9_onyx_int.h and vp9_onyxd_int.h.

Change-Id: I916afec61b3019f18193135dac7c35ed0f89b8b6
2013-06-28 10:36:20 -07:00
Ronald S. Bultje
e3ce2b2ab3 Minor change to prevent one level of dereference in cost_coeffs().
4x4: 234 -> 236 cycles
8x8: 878 -> 888 cycles
16x16: 3664 -> 3550 cycles
32x32: 18134 -> 17392 cycles

Change-Id: I37a51bfbb0060a3a54f09c6045c14a989811ed78
2013-06-28 10:29:07 -07:00
Ronald S. Bultje
91d223bd5c Some minor optimizations for cost_coeffs().
Cycle timings for first 3 frames of bus (speed 0) at 1500kbps:
4x4: 298 -> 234 cycles
8x8: 1227 -> 878 cycles
16x16: 23426 -> 18134 cycles
32x32: 4906 -> 3664 cycles

Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes
from 3min0.7 to 2min51.6 seconds, i.e. 5.3% faster.

Change-Id: I68a0e1b530b0563b84a67342cca4b45146077e95
2013-06-28 10:29:02 -07:00
Ronald S. Bultje
af660715c0 Make coefficient skip condition an explicit RD choice.
This commit replaces zrun_zbin_boost, a method of biasing non-zero
coefficients following runs of zero-coefficients to be rounded towards
zero, with an explicit skip-block choice in the RD loop.

The logic is basically that if individual coefficients should be rounded
towards zero (from a RD point of view), the trellis/optimize loop should
take care of it. If whole blocks should be zero (from a RD point of
view), a single RD check is much more efficient than a complete
serialization of the quantization loop.

Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim.
SIMD for quantize will follow in a separate patch. Results for other
test sets pending.

Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4
2013-06-28 10:28:49 -07:00
Yaowu Xu
1b5421f3c5 Merge "Minor cleanups" 2013-06-28 09:56:11 -07:00
Yaowu Xu
64bb996e03 Merge "Optimize partition search order" 2013-06-28 09:29:39 -07:00
Yaowu Xu
8b9eea0a34 Minor cleanups
Change-Id: I379617c1c731a686b3f7e032b8805860c1055b12
2013-06-28 09:19:50 -07:00
Yaowu Xu
1374a06bd8 Optimize partition search order
This commit change the partition search order to allow checking of
rectangular partition to be done after square partitions. It also
added a speed feature to skip rectangular partition check when
NONE is better than SPLIT in RD sense.

This feature roughly speed up encoder by 1.5X with loss on compression
-0.91% on cif set
-0.56% on stdhd set

Change-Id: I0d2d06993041aa9ea9073fcc39c54f73a127dfa4
2013-06-28 07:13:54 -07:00
Ronald S. Bultje
5de08054ae Merge "Fix tile independence with both column tiling and static_thresh set." 2013-06-28 07:01:15 -07:00
James Zern
5ec57c91b7 Merge "variance_test: add missing ClearSystemState..." 2013-06-27 22:25:33 -07:00
Ronald S. Bultje
fd4eed3b08 Fix tile independence with both column tiling and static_thresh set.
Change-Id: I0b2be0ec2c410a527f88b95a44f24ac967b2dac1
2013-06-27 21:56:40 -07:00
Dmitry Kovalev
3231da0a9e Decoder's code cleanup.
Using vp9_set_pred_flag function instead of custom code, adding
decode_tokens function which is now called from decode_atom,
decode_sb_intra, and decode_sb.

Change-Id: Ie163a7106c0241099da9c5fe03069bd71f9d9ff8
2013-06-27 16:15:43 -07:00
Frank Galligan
1d6dc1b702 Add Neon optimized loop filter functions.
- Added vp9_loop_filter_horizontal_edge_neon and
  vp9_loop_filter_vertical_edge_neon.
- The functions are based off the vp8 loopfilter
  functions.
- Matches x86 md5 checksum.

Change-Id: Id1c4dddb03584227e5ecd29f574a6ac27738fdd0
2013-06-27 16:14:45 -07:00
Dmitry Kovalev
a3664258c5 Merge "General cleanup in segmentation-related code." 2013-06-27 14:57:07 -07:00
Dmitry Kovalev
be83ef3104 Merge "Moving subexp encoding functions in separate vp9_dsubexp.c file." 2013-06-27 14:55:18 -07:00
Ronald S. Bultje
7a049be6bf Inline quantize so idiv instruction gets removed from inner loop.
Encoding time of first 50 frames of bus @ 1500kbps (speed 0) goes from
3min15.0 to 3min10.9, i.e. 2.1% faster overall.

Change-Id: If592ee99be09bcd34a7c8498347f44e7305e982c
2013-06-27 09:01:04 -07:00
Paul Wilkins
05ffdf2625 Merge "Auto adapt step size feature." 2013-06-27 02:28:41 -07:00
Paul Wilkins
59af9049d3 Merge "Start adaptive threshold for each mode at max." 2013-06-27 02:28:36 -07:00
Paul Wilkins
5bcf069c6b Merge "Change meaning of cpi->sf.first_step and rename." 2013-06-27 02:28:21 -07:00
Jingning Han
fc1cfd8e32 Merge "Make intra predictor reference buffer configurable" 2013-06-26 19:02:02 -07:00
Jingning Han
4c10515f89 Merge "Make update_partition_context faster" 2013-06-26 19:01:45 -07:00
James Zern
e4d2c255f1 test_libvpx: disable pthreads in gtest for win targets
currently threading is internal to libvpx so thread safety is unneeded
in libgtest -- visual studio builds already operate in this way as they
do not have pthread.h available by default.

this removes an unconditional link to libpthread using $(extralibs)
should libvpx require it.

Change-Id: I2f278b711f533d0f4d8a6c896833e3e2237d1f45
2013-06-26 18:35:11 -07:00
James Zern
e247ab09a6 variance_test: add missing ClearSystemState...
...to recently added SubpelVarianceTest

Change-Id: I8775e39fd5dbfba81ad42b79b47bf6dd6ca8cc0e
2013-06-26 18:32:21 -07:00
Yaowu Xu
896dc47cac Merge "Change to use LUT for mode-to-txfm conversion" 2013-06-26 17:19:47 -07:00
Jingning Han
861cb06c67 Make intra predictor reference buffer configurable
This commit enables configurable reference buffer pointer for intra
predictor. This allows later removal of spatial dependency between
blocks inside a 64x64 superblock in the rate-distortion optimization
loop.

Change-Id: I02418c2077efe19adc86e046a6b49364a980f5b1
2013-06-26 17:17:21 -07:00
Jingning Han
a9e7243d1a Merge "Remove empty function vp9_build_block_offsets" 2013-06-26 17:06:56 -07:00
Jingning Han
92479d9526 Make update_partition_context faster
Use vpx_memset for updating the partition contexts. Thanks to Noah
for pointing out the need of refactoring in this part.

Change-Id: I67fb78429d632298f1cd8a0be346cc76f79392a6
2013-06-26 17:05:51 -07:00
Ronald S. Bultje
b5468155b6 Remove unused macro RDTRUNC_8x8 from encodemb.c.
Change-Id: I0c097567adab24215d807963ccb34810a2afe007
2013-06-26 15:34:01 -07:00
Jingning Han
bd9bac0391 Remove empty function vp9_build_block_offsets
This function is empty, hence is removed.

Change-Id: Ia9d01710806bffe0398a6dc9405f8a5a81b27d74
2013-06-26 14:55:47 -07:00
Yaowu Xu
25fe05fd92 Change to use LUT for mode-to-txfm conversion
Change-Id: Ieb989830f49e6708ee7728eddebf7a2144c37c6f
2013-06-26 14:10:43 -07:00
Jingning Han
3d13a988fd Merge "Fix aligned memory allocation in unit tests" 2013-06-26 12:21:05 -07:00
Jingning Han
9b744ce35b Fix aligned memory allocation in unit tests
Change-Id: I38fac90e0ed25cb747453ab1d6396187cf5ef3b9
2013-06-26 11:59:46 -07:00
Paul Wilkins
f36f66cd37 Merge "fixed a compiling problem with MSVC win32 build" 2013-06-26 11:58:16 -07:00
Paul Wilkins
9f3ab83486 Auto adapt step size feature.
Also tweaks to other features and experiments with
what is on and off at different speed settings.

Change-Id: I3e1d0be0d195216bf17c2ac5df67f34ce0b306b2
2013-06-26 19:48:39 +01:00
James Zern
e4f38c88da test/fdct*: fix some warnings
comment out some unused parameters and adjust the format to avoid:
./test/fdct4x4_test.cc|27| warning C4138: '*/' found outside of comment

Change-Id: I60f93b4c3cd7e8d61f0de80019f3404b40161f03
2013-06-26 11:09:08 -07:00
Dmitry Kovalev
be07485e9a General cleanup in segmentation-related code.
Using consistent function and variable names.

Change-Id: I2deb3fded8797453a2081836c9ce2e79ade06eb7
2013-06-26 10:27:28 -07:00
Dmitry Kovalev
49dee16879 Merge "Using get_plane_block_{width, height} instead of custom code." 2013-06-26 10:23:27 -07:00
Yaowu Xu
60dc7375af fixed a compiling problem with MSVC win32 build
The aligned array in parameter list caused win32 build to report
c2719 error. This commit fixed the issue by make the parameter
type a pointer instead of an array.

Change-Id: I4ed654ce4eba2db4995d9cdc136c68e9a6acc992
2013-06-26 09:33:16 -07:00
Paul Wilkins
689957e3ad Start adaptive threshold for each mode at max.
Each frame we reset all adaptive thresholds to MAX
rather than base. As modes are picked their thresholds
drop down.

Change-Id: Ia37f03a73003c2d9bfcda57edea07205e9a0e5e8
2013-06-26 17:04:47 +01:00
Paul Wilkins
e606cac046 Change meaning of cpi->sf.first_step and rename.
Renamed cpi->sf.first_step to cpi->sf.reduce_first_step_size
and changed its meaning such that it is a delta applied to
reduce the default first step size (>> x) in the motion search
rather than an absolute value.

The default first step size is already changed according to the image
dimensions (smaller for smaller images). cpi->sf.reduce_first_step_size
now applies a further correction from the default.

Change-Id: Ia94e08bc24c67b604831f980909af7e982fcd16d
2013-06-26 17:04:06 +01:00
John Koleszar
2291563ba8 Merge "vpxenc: send usage to stderr" 2013-06-25 22:44:39 -07:00
John Koleszar
8c89e6b2ba Merge ".gitignore: add gcov files" 2013-06-25 22:44:26 -07:00
John Koleszar
8137e24f3d Merge "Move vp9_counts_to_nmv_context to encoder" 2013-06-25 22:44:21 -07:00
John Koleszar
7bbb0633cd Merge "Move vp9_full_to_model_counts to encoder" 2013-06-25 22:44:16 -07:00
John Koleszar
e0d07abd48 Merge "make: add libvpx_test_srcs.txt target" 2013-06-25 22:30:50 -07:00
John Koleszar
743e3393f3 Merge "tests/*source: test file pointer before reading" 2013-06-25 22:29:37 -07:00
John Koleszar
ad9ee2c91b Merge "encode_test_driver: check for fatal failures" 2013-06-25 22:27:39 -07:00
Jingning Han
3cc8c8c3a0 Merge "Refactor intra predictor block" 2013-06-25 19:46:55 -07:00
James Zern
66c7dffd5c tests/*source: test file pointer before reading
if the caller did not abort after an ASSERT failure in Begin()
FillFrame() would segfault.

Change-Id: I2d3f5a0918611bbd081be6f686dea19c56695073
2013-06-25 17:57:52 -07:00
James Zern
1c05e9de2c encode_test_driver: check for fatal failures
Make the base test be:
!(fatal || abort_) removing some redundancy in the encode tests

Change-Id: I8ffaf33fcf9a3030b38ea3e8eb94704cdc2fc920
2013-06-25 17:57:52 -07:00
Jingning Han
d19ea3861d Refactor intra predictor block
Remove vp9_intra4x4_predict(). Use the common intra prediction
function for all block sizes.

Change-Id: Ibd19d51dfa3da8bbdfb79ddeb81530b2e2089560
2013-06-25 16:33:13 -07:00
Dmitry Kovalev
6fb10f2de4 Renaming "nmv" to "mv".
Change-Id: I8299f55c3b930221e52c2237f2ddea65b94fd33b
2013-06-25 15:19:18 -07:00
Dmitry Kovalev
dc0f457c94 Using get_plane_block_{width, height} instead of custom code.
Change-Id: I453ed11b965e857a14c18ea5c0f4a0a48e7dc0d9
2013-06-25 14:11:18 -07:00
Ronald S. Bultje
0441e0a2fc Merge "Only do metrics on cropped (visible) area of picture." 2013-06-25 13:51:18 -07:00
Ronald S. Bultje
1d0ae2e63c Merge "Don't skip right/bottom border pixels in SSIM calculations." 2013-06-25 13:51:04 -07:00
Ronald S. Bultje
c5be54eef3 Merge "Add averaging-SAD functions for 8-point comp-inter motion search." 2013-06-25 13:50:53 -07:00
James Zern
4a7cebe690 make: add libvpx_test_srcs.txt target
same application as libvpx_srcs.txt

Change-Id: I1f096cc3c180d205365663c1aa5533b52561d811
2013-06-25 13:50:30 -07:00
Jingning Han
3f184bce7b Merge "Cosmetic changes in 4x4 fwd transform unit test" 2013-06-25 13:17:23 -07:00
Jingning Han
d52c359d43 Merge "Tune the rounding operations in 8x8 ADST/DCT sse2" 2013-06-25 13:17:05 -07:00
James Zern
4c0f283886 Merge "I420VideoSource: normalize framerate types" 2013-06-25 12:57:49 -07:00
Ronald S. Bultje
450c7b57a8 Only do metrics on cropped (visible) area of picture.
The part where we align it by 8 or 16 is an implementation detail that
shouldn't matter to the outside world.

Change-Id: I9edd6f08b51b31c839c0ea91f767640bccb08d53
2013-06-25 12:57:28 -07:00
Ronald S. Bultje
44f349df62 Don't skip right/bottom border pixels in SSIM calculations.
Change-Id: I75acb55ade54bef6ad7703ed5e691581fa2f8fe1
2013-06-25 12:57:28 -07:00
Ronald S. Bultje
c24d922396 Add averaging-SAD functions for 8-point comp-inter motion search.
Makes first 50 frames of bus @ 1500kbps encode from 3min22.7 to 3min18.2,
i.e. 2.3% faster. In addition, use the sub_pixel_avg functions to calc
the variance of the averaging predictor. This is slightly suboptimal
because the function is subpixel-position-aware, but it will (at least
for the SSE2 version) not actually use a bilinear filter for a full-pixel
position, thus leading to approximately the same performance compared to
if we implemented an actual average-aware full-pixel variance function.
That gains another 0.3 seconds (i.e. encode time goes to 3min17.4), thus
leading to a total gain of 2.7%.

Change-Id: I3f059d2b04243921868cfed2568d4fa65d7b5acd
2013-06-25 12:57:28 -07:00
James Zern
9d95993115 Merge "intrapred_test: add virtual dtor to IntraPredBase" 2013-06-25 12:56:40 -07:00
Jingning Han
0084e61d5f Tune the rounding operations in 8x8 ADST/DCT sse2
Improve the round-trip precision to meet the unit test setttings.

Change-Id: I303febae56b4b990ea3798b8ebed94c0510ecf79
2013-06-25 12:02:26 -07:00
Ronald S. Bultje
becf1691c4 Merge "Add SAD unit tests for all rectangular sizes." 2013-06-25 12:00:41 -07:00
Ronald S. Bultje
5ebe47747d Merge "Don't re-allocate comp_pred buffers for each call to comp motion search." 2013-06-25 12:00:36 -07:00
Dmitry Kovalev
9467571777 Moving subexp encoding functions in separate vp9_dsubexp.c file.
Change-Id: Idbb2ea80f764fa830fe2ddcfc54ef7fe232f05a8
2013-06-25 11:53:17 -07:00
Dmitry Kovalev
5ae096778e Merge "Removing unused code." 2013-06-25 11:50:55 -07:00
Jingning Han
29b6e73c2c Cosmetic changes in 4x4 fwd transform unit test
Change-Id: I7a9ea03b92160f1052e56665b19a155211ee241f
2013-06-25 11:39:19 -07:00
Jingning Han
cd6932db77 Merge "Add 8x8 dct/adst unit tests" 2013-06-25 11:21:17 -07:00
Yaowu Xu
c2e3ee13e7 Merge "Changed size of mb_mode_context to 8 bits" 2013-06-25 10:44:47 -07:00
Scott LaVarnway
855e23ce8c Merge "Small mode_info_context cleanup in filter_block_plane" 2013-06-25 10:34:19 -07:00
Dmitry Kovalev
87ee34aacb Removing unused code.
Removing block index (ib) parameter from get_tx_type_{8x8, 16x16}
functions.

Change-Id: Ia213335aae7a7cb027f97b9cc9b04519840250f1
2013-06-25 10:17:19 -07:00
Dmitry Kovalev
70e9622185 Merge "Removing find_seg_id and using vp9_get_pred_mi_segid instead." 2013-06-25 10:16:06 -07:00
Dmitry Kovalev
529679bd52 Merge "Transforming scale_mv_component_q4 into scale_mv_q4 function." 2013-06-25 10:15:33 -07:00
Jingning Han
ab362621fe Add 8x8 dct/adst unit tests
This commit enables 8x8 DCT and hybrid transform unit tests. It
also tunes the forward hybrid transform rounding opertions for
more precise round-trip performance.

Change-Id: If05c1ce59d75d641b9c6c91527d02d3a6ef498c3
2013-06-25 09:57:01 -07:00
Jingning Han
67365520e7 Merge "Use aligned buffer operations in 8x8/16x16 2D-DCT" 2013-06-25 09:49:03 -07:00
Scott LaVarnway
c787f40bc4 Small mode_info_context cleanup in filter_block_plane
Unnecessary updates to xd->mode_info_context.

Change-Id: I36d2d68ca48366f727548526726b1b5437f62968
2013-06-25 12:28:50 -04:00
John Koleszar
b9e2761019 vpxenc: send usage to stderr
Thanks to hiiragikei AT gmail.com for the fix.

Change-Id: Iab6c0822593fc5557d86efbb014ff6409ff05b35
2013-06-25 09:17:41 -07:00
Yaowu Xu
b9c934df8e Merge "Enable sse2 implmentation of 8x8 ADST/DCT" 2013-06-25 09:13:22 -07:00
Yaowu Xu
ca976db44d Merge "change to enable use_largest_txform feature" 2013-06-25 09:07:01 -07:00
Jingning Han
82d504b50f Use aligned buffer operations in 8x8/16x16 2D-DCT
This reduces 16x16 2D-DCT runtime from 865 cycles to 837 cycles.

Change-Id: I137758b81cd127b936175284310e81378db64552
2013-06-24 19:56:23 -07:00
Jingning Han
a32a086d23 Enable sse2 implmentation of 8x8 ADST/DCT
This commit makes use of the butterfly structure to enable the sse2
version implementation of 8x8 ADST/DCT hybrid transform coding.

The runtime of hybrid transform module goes down from 1170 cycles
to 245 cycles. Overall speed-up around 1.5%.

Change-Id: Ic808ffd21ece8a9d0410d8c0243d7b6c28ac3b3f
2013-06-24 18:41:33 -07:00
Yaowu Xu
e371cd73a3 change to enable use_largest_txform feature
for all regular inter frames at speed 1

Change-Id: I0a8b301273ecf2b8730ab1f6b7a05f89f4d498e0
2013-06-24 16:43:26 -07:00
John Koleszar
381e0882dc .gitignore: add gcov files
Change-Id: I0a58578e7cf27f3de839eb62a334e343eaed12c5
2013-06-24 15:59:32 -07:00
John Koleszar
4ecd6dbead Move vp9_counts_to_nmv_context to encoder
This function only used from within vp9_encodemv.c.

Change-Id: Ib3fc7c30b1e2d27321397ac474cbc8976bc1f4b1
2013-06-24 15:58:18 -07:00
John Koleszar
08b1798ae7 Move vp9_full_to_model_counts to encoder
This function is not called from the decoder, so it doesn't need to be
in common/.

Change-Id: I6977dd462a25b4ff39c9c7e1b0b5b16aa58ee733
2013-06-24 15:46:15 -07:00
John Koleszar
ece724ae16 Merge "Remove unused vp9_build_intra_predictors_sb{y,uv}_s" 2013-06-24 15:08:58 -07:00
John Koleszar
ee4a7e4e46 Merge "Remove unused vp9_model_to_full_probs_sb()" 2013-06-24 15:08:54 -07:00
Scott LaVarnway
dfa2ecc3f1 Changed size of mb_mode_context to 8 bits
This reduced the size of the MODE_INFO array (mip and prev_mip)
by 425,568 bytes each for 1080p resolutions.

Change-Id: Ifa513ec2d0a49e8ec0867ec90620762fb7f1261d
2013-06-24 17:11:16 -04:00
Ronald S. Bultje
3c4abbe454 Add SAD unit tests for all rectangular sizes.
Change-Id: I47e81b51f072abdb276bdec85423febba34b5f81
2013-06-24 14:05:13 -07:00
Ronald S. Bultje
4dc70fa7f9 Don't re-allocate comp_pred buffers for each call to comp motion search.
Instead, just allocate a few bytes on the stack, this is 4k, which isn't
all that much.

Change-Id: I82af6ee89e6ed01faaa23ff891ee7ced76df8c16
2013-06-24 14:05:13 -07:00
James Zern
c2fa8390f6 I420VideoSource: normalize framerate types
ctor inputs are ints as are vpx_rational_t members

Change-Id: I62a39bf3df123727a872e40b74e3ee9e55ef2ede
2013-06-21 19:34:51 -07:00
James Zern
f6d293adf6 intrapred_test: add virtual dtor to IntraPredBase
classes with virtual functions should have virtual destructors

Change-Id: If54e2f8384f0bfcbf812cc727eb9d0a586173674
2013-06-21 19:33:50 -07:00
John Koleszar
9e7019f7df Remove unused vp9_build_intra_predictors_sb{y,uv}_s
The functions no longer referenced.

Change-Id: If2705dfbc607f79ec8ec2242d5e03bec27a35aaf
2013-06-21 16:10:05 -07:00
John Koleszar
5c32215e27 Remove unused vp9_model_to_full_probs_sb()
This function never referenced.

Change-Id: I1c42cd355bfa88e17d169f7335a44be682af58cc
2013-06-21 15:38:55 -07:00
Dmitry Kovalev
f27f76dfb3 Transforming scale_mv_component_q4 into scale_mv_q4 function.
Using MV instead of int_mv for function arguments.

Change-Id: Ic25e13dccbc98fac1fa1b3255127e00cca2a57f6
2013-06-21 15:34:29 -07:00
Dmitry Kovalev
40141681c0 Removing find_seg_id and using vp9_get_pred_mi_segid instead.
Change-Id: Ia40229903c08f14020e90e94cfdf494aba1be827
2013-06-21 13:05:10 -07:00
266 changed files with 31447 additions and 19894 deletions

4
.gitignore vendored
View File

@@ -1,6 +1,8 @@
*.a
*.asm.s
*.d
*.gcno
*.gcda
*.o
*~
/*.ivf
@@ -14,7 +16,7 @@
/.install-*
/.libs
/Makefile
/config.err
/config.log
/config.mk
/decode_to_md5
/decode_to_md5.c

36
README
View File

@@ -1,7 +1,7 @@
vpx Multi-Format Codec SDK
README - 21 June 2012
README - 1 August 2013
Welcome to the WebM VP8 Codec SDK!
Welcome to the WebM VP8/VP9 Codec SDK!
COMPILING THE APPLICATIONS/LIBRARIES:
The build system used is similar to autotools. Building generally consists of
@@ -53,33 +53,63 @@ COMPILING THE APPLICATIONS/LIBRARIES:
armv5te-android-gcc
armv5te-linux-rvct
armv5te-linux-gcc
armv5te-none-rvct
armv6-darwin-gcc
armv6-linux-rvct
armv6-linux-gcc
armv6-none-rvct
armv7-android-gcc
armv7-darwin-gcc
armv7-linux-rvct
armv7-linux-gcc
armv7-none-rvct
armv7-win32-vs11
mips32-linux-gcc
ppc32-darwin8-gcc
ppc32-darwin9-gcc
ppc32-linux-gcc
ppc64-darwin8-gcc
ppc64-darwin9-gcc
ppc64-linux-gcc
sparc-solaris-gcc
x86-android-gcc
x86-darwin8-gcc
x86-darwin8-icc
x86-darwin9-gcc
x86-darwin9-icc
x86-darwin10-gcc
x86-darwin11-gcc
x86-darwin12-gcc
x86-darwin13-gcc
x86-linux-gcc
x86-linux-icc
x86-os2-gcc
x86-solaris-gcc
x86-win32-gcc
x86-win32-vs7
x86-win32-vs8
x86-win32-vs9
x86-win32-vs10
x86-win32-vs11
x86_64-darwin9-gcc
x86_64-darwin10-gcc
x86_64-darwin11-gcc
x86_64-darwin12-gcc
x86_64-darwin13-gcc
x86_64-linux-gcc
x86_64-linux-icc
x86_64-solaris-gcc
x86_64-win64-gcc
x86_64-win64-vs8
x86_64-win64-vs9
x86_64-win64-vs10
x86_64-win64-vs11
universal-darwin8-gcc
universal-darwin9-gcc
universal-darwin10-gcc
universal-darwin11-gcc
universal-darwin12-gcc
universal-darwin13-gcc
generic-gnu
The generic-gnu target, in conjunction with the CROSS environment variable,
@@ -97,7 +127,7 @@ COMPILING THE APPLICATIONS/LIBRARIES:
5. Configuration errors
If the configuration step fails, the first step is to look in the error log.
This defaults to config.err. This should give a good indication of what went
This defaults to config.log. This should give a good indication of what went
wrong. If not, contact us for support.
SUPPORT

View File

@@ -7,18 +7,7 @@ REM in the file PATENTS. All contributing project authors may
REM be found in the AUTHORS file in the root of the source tree.
echo on
cl /I "./" /I "%1" /nologo /c /DWINAPI_FAMILY=WINAPI_FAMILY_PHONE_APP "%1/vp9/common/vp9_asm_com_offsets.c"
cl /I "./" /I "%1" /nologo /c /DWINAPI_FAMILY=WINAPI_FAMILY_PHONE_APP "%1/vp9/decoder/vp9_asm_dec_offsets.c"
cl /I "./" /I "%1" /nologo /c /DWINAPI_FAMILY=WINAPI_FAMILY_PHONE_APP "%1/vp9/encoder/vp9_asm_enc_offsets.c"
obj_int_extract.exe rvds "vp9_asm_com_offsets.obj" > "vp9_asm_com_offsets.asm"
obj_int_extract.exe rvds "vp9_asm_dec_offsets.obj" > "vp9_asm_dec_offsets.asm"
obj_int_extract.exe rvds "vp9_asm_enc_offsets.obj" > "vp9_asm_enc_offsets.asm"
cl /I "./" /I "%1" /nologo /c /DWINAPI_FAMILY=WINAPI_FAMILY_PHONE_APP "%1/vp8/common/vp8_asm_com_offsets.c"
cl /I "./" /I "%1" /nologo /c /DWINAPI_FAMILY=WINAPI_FAMILY_PHONE_APP "%1/vp8/decoder/vp8_asm_dec_offsets.c"
cl /I "./" /I "%1" /nologo /c /DWINAPI_FAMILY=WINAPI_FAMILY_PHONE_APP "%1/vp8/encoder/vp8_asm_enc_offsets.c"
obj_int_extract.exe rvds "vp8_asm_com_offsets.obj" > "vp8_asm_com_offsets.asm"
obj_int_extract.exe rvds "vp8_asm_dec_offsets.obj" > "vp8_asm_dec_offsets.asm"
obj_int_extract.exe rvds "vp8_asm_enc_offsets.obj" > "vp8_asm_enc_offsets.asm"
cl /I "./" /I "%1" /nologo /c /DWINAPI_FAMILY=WINAPI_FAMILY_PHONE_APP "%1/vpx_scale/vpx_scale_asm_offsets.c"

View File

@@ -1,4 +1,4 @@
#!/bin/bash
#!/bin/sh
##
## Copyright (c) 2010 The WebM project authors. All Rights Reserved.
##
@@ -13,20 +13,20 @@
verbose=0
set -- $*
for i; do
if [ "$i" == "-o" ]; then
if [ "$i" = "-o" ]; then
on_of=1
elif [ "$i" == "-v" ]; then
elif [ "$i" = "-v" ]; then
verbose=1
elif [ "$i" == "-g" ]; then
elif [ "$i" = "-g" ]; then
args="${args} --debug"
elif [ "$on_of" == "1" ]; then
elif [ "$on_of" = "1" ]; then
outfile=$i
on_of=0
elif [ -f "$i" ]; then
infiles="$infiles $i"
elif [ "${i:0:2}" == "-l" ]; then
elif [ "${i#-l}" != "$i" ]; then
libs="$libs ${i#-l}"
elif [ "${i:0:2}" == "-L" ]; then
elif [ "${i#-L}" != "$i" ]; then
libpaths="${libpaths} ${i#-L}"
else
args="${args} ${i}"

View File

@@ -1,4 +1,4 @@
#!/bin/bash
#!/bin/sh
##
## configure.sh
##
@@ -75,7 +75,7 @@ Options:
Build options:
--help print this message
--log=yes|no|FILE file configure log is written to [config.err]
--log=yes|no|FILE file configure log is written to [config.log]
--target=TARGET target platform tuple [generic-gnu]
--cpu=CPU optimize for a specific cpu rather than a family
--extra-cflags=ECFLAGS add ECFLAGS to CFLAGS [$CFLAGS]
@@ -198,11 +198,11 @@ add_extralibs() {
#
# Boolean Manipulation Functions
#
enable(){
enable_feature(){
set_all yes $*
}
disable(){
disable_feature(){
set_all no $*
}
@@ -219,7 +219,7 @@ soft_enable() {
for var in $*; do
if ! disabled $var; then
log_echo " enabling $var"
enable $var
enable_feature $var
fi
done
}
@@ -228,7 +228,7 @@ soft_disable() {
for var in $*; do
if ! enabled $var; then
log_echo " disabling $var"
disable $var
disable_feature $var
fi
done
}
@@ -251,10 +251,10 @@ tolower(){
# Temporary File Functions
#
source_path=${0%/*}
enable source_path_used
enable_feature source_path_used
if test -z "$source_path" -o "$source_path" = "." ; then
source_path="`pwd`"
disable source_path_used
disable_feature source_path_used
fi
if test ! -z "$TMPDIR" ; then
@@ -264,12 +264,13 @@ elif test ! -z "$TEMPDIR" ; then
else
TMPDIRx="/tmp"
fi
TMP_H="${TMPDIRx}/vpx-conf-$$-${RANDOM}.h"
TMP_C="${TMPDIRx}/vpx-conf-$$-${RANDOM}.c"
TMP_CC="${TMPDIRx}/vpx-conf-$$-${RANDOM}.cc"
TMP_O="${TMPDIRx}/vpx-conf-$$-${RANDOM}.o"
TMP_X="${TMPDIRx}/vpx-conf-$$-${RANDOM}.x"
TMP_ASM="${TMPDIRx}/vpx-conf-$$-${RANDOM}.asm"
RAND=$(awk 'BEGIN { srand(); printf "%d\n",(rand() * 32768)}')
TMP_H="${TMPDIRx}/vpx-conf-$$-${RAND}.h"
TMP_C="${TMPDIRx}/vpx-conf-$$-${RAND}.c"
TMP_CC="${TMPDIRx}/vpx-conf-$$-${RAND}.cc"
TMP_O="${TMPDIRx}/vpx-conf-$$-${RAND}.o"
TMP_X="${TMPDIRx}/vpx-conf-$$-${RAND}.x"
TMP_ASM="${TMPDIRx}/vpx-conf-$$-${RAND}.asm"
clean_temp_files() {
rm -f ${TMP_C} ${TMP_CC} ${TMP_H} ${TMP_O} ${TMP_X} ${TMP_ASM}
@@ -316,8 +317,8 @@ check_header(){
header=$1
shift
var=`echo $header | sed 's/[^A-Za-z0-9_]/_/g'`
disable $var
check_cpp "$@" <<EOF && enable $var
disable_feature $var
check_cpp "$@" <<EOF && enable_feature $var
#include "$header"
int x;
EOF
@@ -479,7 +480,7 @@ process_common_cmdline() {
for opt in "$@"; do
optval="${opt#*=}"
case "$opt" in
--child) enable child
--child) enable_feature child
;;
--log*)
logging="$optval"
@@ -491,7 +492,7 @@ process_common_cmdline() {
;;
--target=*) toolchain="${toolchain:-${optval}}"
;;
--force-target=*) toolchain="${toolchain:-${optval}}"; enable force_toolchain
--force-target=*) toolchain="${toolchain:-${optval}}"; enable_feature force_toolchain
;;
--cpu)
;;
@@ -511,7 +512,7 @@ process_common_cmdline() {
echo "${CMDLINE_SELECT}" | grep "^ *$option\$" >/dev/null ||
die_unknown $opt
fi
$action $option
${action}_feature $option
;;
--require-?*)
eval `echo "$opt" | sed 's/--/action=/;s/-/ option=/;s/-/_/g'`
@@ -523,11 +524,11 @@ process_common_cmdline() {
;;
--force-enable-?*|--force-disable-?*)
eval `echo "$opt" | sed 's/--force-/action=/;s/-/ option=/;s/-/_/g'`
$action $option
${action}_feature $option
;;
--libc=*)
[ -d "${optval}" ] || die "Not a directory: ${optval}"
disable builtin_libc
disable_feature builtin_libc
alt_libc="${optval}"
;;
--as=*)
@@ -653,6 +654,10 @@ process_common_toolchain() {
tgt_isa=x86_64
tgt_os=darwin12
;;
*darwin13*)
tgt_isa=x86_64
tgt_os=darwin13
;;
x86_64*mingw32*)
tgt_os=win64
;;
@@ -692,13 +697,13 @@ process_common_toolchain() {
# Mark the specific ISA requested as enabled
soft_enable ${tgt_isa}
enable ${tgt_os}
enable ${tgt_cc}
enable_feature ${tgt_os}
enable_feature ${tgt_cc}
# Enable the architecture family
case ${tgt_isa} in
arm*) enable arm;;
mips*) enable mips;;
arm*) enable_feature arm;;
mips*) enable_feature mips;;
esac
# PIC is probably what we want when building shared libs
@@ -751,13 +756,17 @@ process_common_toolchain() {
add_cflags "-mmacosx-version-min=10.8"
add_ldflags "-mmacosx-version-min=10.8"
;;
*-darwin13-*)
add_cflags "-mmacosx-version-min=10.9"
add_ldflags "-mmacosx-version-min=10.9"
;;
esac
# Handle Solaris variants. Solaris 10 needs -lposix4
case ${toolchain} in
sparc-solaris-*)
add_extralibs -lposix4
disable fast_unaligned
disable_feature fast_unaligned
;;
*-solaris-*)
add_extralibs -lposix4
@@ -782,7 +791,7 @@ process_common_toolchain() {
;;
armv5te)
soft_enable edsp
disable fast_unaligned
disable_feature fast_unaligned
;;
esac
@@ -797,7 +806,7 @@ process_common_toolchain() {
arch_int=${arch_int%%te}
check_add_asflags --defsym ARCHITECTURE=${arch_int}
tune_cflags="-mtune="
if [ ${tgt_isa} == "armv7" ]; then
if [ ${tgt_isa} = "armv7" ]; then
if [ -z "${float_abi}" ]; then
check_cpp <<EOF && float_abi=hard || float_abi=softfp
#ifndef __ARM_PCS_VFP
@@ -834,8 +843,8 @@ EOF
asm_conversion_cmd="${source_path}/build/make/ads2armasm_ms.pl"
AS_SFX=.s
msvs_arch_dir=arm-msvs
disable multithread
disable unit_tests
disable_feature multithread
disable_feature unit_tests
;;
rvct)
CC=armcc
@@ -847,7 +856,7 @@ EOF
tune_cflags="--cpu="
tune_asflags="--cpu="
if [ -z "${tune_cpu}" ]; then
if [ ${tgt_isa} == "armv7" ]; then
if [ ${tgt_isa} = "armv7" ]; then
if enabled neon
then
check_add_cflags --fpu=softvfp+vfpv3
@@ -872,8 +881,8 @@ EOF
case ${tgt_os} in
none*)
disable multithread
disable os_support
disable_feature multithread
disable_feature os_support
;;
android*)
@@ -905,9 +914,9 @@ EOF
# Cortex-A8 implementations (NDK Dev Guide)
add_ldflags "-Wl,--fix-cortex-a8"
enable pic
enable_feature pic
soft_enable realtime_only
if [ ${tgt_isa} == "armv7" ]; then
if [ ${tgt_isa} = "armv7" ]; then
soft_enable runtime_cpu_detect
fi
if enabled runtime_cpu_detect; then
@@ -961,7 +970,7 @@ EOF
;;
linux*)
enable linux
enable_feature linux
if enabled rvct; then
# Check if we have CodeSourcery GCC in PATH. Needed for
# libraries
@@ -992,14 +1001,14 @@ EOF
tune_cflags="-mtune="
if enabled dspr2; then
check_add_cflags -mips32r2 -mdspr2
disable fast_unaligned
disable_feature fast_unaligned
fi
check_add_cflags -march=${tgt_isa}
check_add_asflags -march=${tgt_isa}
check_add_asflags -KPIC
;;
ppc*)
enable ppc
enable_feature ppc
bits=${tgt_isa##ppc}
link_with_cc=gcc
setup_gnu_toolchain
@@ -1147,7 +1156,7 @@ EOF
;;
universal*|*-gcc|generic-gnu)
link_with_cc=gcc
enable gcc
enable_feature gcc
setup_gnu_toolchain
;;
esac
@@ -1181,6 +1190,12 @@ EOF
fi
fi
# default use_x86inc to yes if pic is no or 64bit or we are not on darwin
echo " checking here for x86inc \"${tgt_isa}\" \"$pic\" "
if [ ${tgt_isa} = x86_64 -o ! "$pic" = "yes" -o "${tgt_os#darwin}" = "${tgt_os}" ]; then
soft_enable use_x86inc
fi
# Position Independent Code (PIC) support, for building relocatable
# shared objects
enabled gcc && enabled pic && check_add_cflags -fPIC
@@ -1190,14 +1205,14 @@ EOF
enabled linux && check_add_cflags -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=0
# Check for strip utility variant
${STRIP} -V 2>/dev/null | grep GNU >/dev/null && enable gnu_strip
${STRIP} -V 2>/dev/null | grep GNU >/dev/null && enable_feature gnu_strip
# Try to determine target endianness
check_cc <<EOF
unsigned int e = 'O'<<24 | '2'<<16 | 'B'<<8 | 'E';
EOF
[ -f "${TMP_O}" ] && od -A n -t x1 "${TMP_O}" | tr -d '\n' |
grep '4f *32 *42 *45' >/dev/null 2>&1 && enable big_endian
grep '4f *32 *42 *45' >/dev/null 2>&1 && enable_feature big_endian
# Try to find which inline keywords are supported
check_cc <<EOF && INLINE="inline"
@@ -1222,7 +1237,7 @@ EOF
if enabled dspr2; then
if enabled big_endian; then
echo "dspr2 optimizations are available only for little endian platforms"
disable dspr2
disable_feature dspr2
fi
fi
;;
@@ -1273,8 +1288,8 @@ print_config_h() {
print_webm_license() {
local destination=$1
local prefix=$2
local suffix=$3
local prefix="$2"
local suffix="$3"
shift 3
cat <<EOF > ${destination}
${prefix} Copyright (c) 2011 The WebM project authors. All Rights Reserved.${suffix}
@@ -1295,8 +1310,8 @@ process_detect() {
true;
}
enable logging
logfile="config.err"
enable_feature logging
logfile="config.log"
self=$0
process() {
cmdline_args="$@"

View File

@@ -1,4 +1,4 @@
#!/bin/bash
#!/bin/sh
##
## Copyright (c) 2010 The WebM project authors. All Rights Reserved.
##

View File

@@ -381,7 +381,7 @@ generate_vcproj() {
RuntimeLibrary="$debug_runtime" \
UsePrecompiledHeader="0" \
WarningLevel="3" \
DebugInformationFormat="1" \
DebugInformationFormat="2" \
$warn_64bit \
$uses_asm && tag Tool Name="YASM" IncludePaths="$incs" Debug="true"
@@ -395,7 +395,7 @@ generate_vcproj() {
RuntimeLibrary="$debug_runtime" \
UsePrecompiledHeader="0" \
WarningLevel="3" \
DebugInformationFormat="1" \
DebugInformationFormat="2" \
$warn_64bit \
$uses_asm && tag Tool Name="YASM" IncludePaths="$incs" Debug="true"

View File

@@ -72,10 +72,21 @@ parse_project() {
eval "${var}_name=$name"
eval "${var}_guid=$guid"
# assume that all projects have the same list of possible configurations,
# so overwriting old config_lists is not a problem
config_list=`grep -A1 '<Configuration' $file |
grep Name | cut -d\" -f2`
if [ "$sfx" = "vcproj" ]; then
cur_config_list=`grep -A1 '<Configuration' $file |
grep Name | cut -d\" -f2`
else
cur_config_list=`grep -B1 'Label="Configuration"' $file |
grep Condition | cut -d\' -f4`
fi
new_config_list=$(for i in $config_list $cur_config_list; do
echo $i
done | sort | uniq)
if [ "$config_list" != "" ] && [ "$config_list" != "$new_config_list" ]; then
mixed_platforms=1
fi
config_list="$new_config_list"
eval "${var}_config_list=\"$cur_config_list\""
proj_list="${proj_list} ${var}"
}
@@ -125,6 +136,11 @@ process_global() {
indent_push
IFS_bak=${IFS}
IFS=$'\r'$'\n'
if [ "$mixed_platforms" != "" ]; then
config_list="
Release|Mixed Platforms
Debug|Mixed Platforms"
fi
for config in ${config_list}; do
echo "${indent}$config = $config"
done
@@ -139,10 +155,17 @@ process_global() {
indent_push
for proj in ${proj_list}; do
eval "local proj_guid=\${${proj}_guid}"
eval "local proj_config_list=\${${proj}_config_list}"
IFS=$'\r'$'\n'
for config in ${config_list}; do
echo "${indent}${proj_guid}.${config}.ActiveCfg = ${config}"
echo "${indent}${proj_guid}.${config}.Build.0 = ${config}"
for config in ${proj_config_list}; do
if [ "$mixed_platforms" != "" ]; then
local c=${config%%|*}
echo "${indent}${proj_guid}.${c}|Mixed Platforms.ActiveCfg = ${config}"
echo "${indent}${proj_guid}.${c}|Mixed Platforms.Build.0 = ${config}"
else
echo "${indent}${proj_guid}.${config}.ActiveCfg = ${config}"
echo "${indent}${proj_guid}.${config}.Build.0 = ${config}"
fi
done
IFS=${IFS_bak}
@@ -168,9 +191,14 @@ process_makefile() {
IFS=$'\r'$'\n'
local TAB=$'\t'
cat <<EOF
found_devenv := \$(shell which devenv.com >/dev/null 2>&1 && echo yes)
ifeq (\$(CONFIG_VS_VERSION),7)
MSBUILD_TOOL := devenv.com
else
MSBUILD_TOOL := msbuild.exe
endif
found_devenv := \$(shell which \$(MSBUILD_TOOL) >/dev/null 2>&1 && echo yes)
.nodevenv.once:
${TAB}@echo " * devenv.com not found in path."
${TAB}@echo " * \$(MSBUILD_TOOL) not found in path."
${TAB}@echo " * "
${TAB}@echo " * You will have to build all configurations manually using the"
${TAB}@echo " * Visual Studio IDE. To allow make to build them automatically,"
@@ -195,16 +223,17 @@ ${TAB}rm -rf "$platform"/"$config"
ifneq (\$(found_devenv),)
ifeq (\$(CONFIG_VS_VERSION),7)
$nows_sln_config: $outfile
${TAB}devenv.com $outfile -build "$config"
${TAB}\$(MSBUILD_TOOL) $outfile -build "$config"
else
$nows_sln_config: $outfile
${TAB}devenv.com $outfile -build "$sln_config"
${TAB}\$(MSBUILD_TOOL) $outfile -m -t:Build \\
${TAB}${TAB}-p:Configuration="$config" -p:Platform="$platform"
endif
else
$nows_sln_config: $outfile .nodevenv.once
${TAB}@echo " * Skipping build of $sln_config (devenv.com not in path)."
${TAB}@echo " * Skipping build of $sln_config (\$(MSBUILD_TOOL) not in path)."
${TAB}@echo " * "
endif

View File

@@ -1,4 +1,4 @@
#!/bin/bash
#!/bin/sh
##
## Copyright (c) 2010 The WebM project authors. All Rights Reserved.
##

View File

@@ -7,17 +7,6 @@ REM in the file PATENTS. All contributing project authors may
REM be found in the AUTHORS file in the root of the source tree.
echo on
cl /I "./" /I "%1" /nologo /c "%1/vp9/common/vp9_asm_com_offsets.c"
cl /I "./" /I "%1" /nologo /c "%1/vp9/decoder/vp9_asm_dec_offsets.c"
cl /I "./" /I "%1" /nologo /c "%1/vp9/encoder/vp9_asm_enc_offsets.c"
obj_int_extract.exe rvds "vp9_asm_com_offsets.obj" > "vp9_asm_com_offsets.asm"
obj_int_extract.exe rvds "vp9_asm_dec_offsets.obj" > "vp9_asm_dec_offsets.asm"
obj_int_extract.exe rvds "vp9_asm_enc_offsets.obj" > "vp9_asm_enc_offsets.asm"
cl /I "./" /I "%1" /nologo /c "%1/vp8/common/vp8_asm_com_offsets.c"
cl /I "./" /I "%1" /nologo /c "%1/vp8/decoder/vp8_asm_dec_offsets.c"
cl /I "./" /I "%1" /nologo /c "%1/vp8/encoder/vp8_asm_enc_offsets.c"
obj_int_extract.exe rvds "vp8_asm_com_offsets.obj" > "vp8_asm_com_offsets.asm"
obj_int_extract.exe rvds "vp8_asm_dec_offsets.obj" > "vp8_asm_dec_offsets.asm"
obj_int_extract.exe rvds "vp8_asm_enc_offsets.obj" > "vp8_asm_enc_offsets.asm"

99
configure vendored
View File

@@ -1,4 +1,4 @@
#!/bin/bash
#!/bin/sh
##
## configure
##
@@ -38,6 +38,7 @@ Advanced options:
${toggle_internal_stats} output of encoder internal stats for debug, if supported (encoders)
${toggle_mem_tracker} track memory usage
${toggle_postproc} postprocessing
${toggle_vp9_postproc} vp9 specific postprocessing
${toggle_multithread} multithreaded encoding and decoding
${toggle_spatial_resampling} spatial sampling (scaling) support
${toggle_realtime_only} enable this option while building for real-time encoding
@@ -115,6 +116,7 @@ all_platforms="${all_platforms} x86-darwin9-icc"
all_platforms="${all_platforms} x86-darwin10-gcc"
all_platforms="${all_platforms} x86-darwin11-gcc"
all_platforms="${all_platforms} x86-darwin12-gcc"
all_platforms="${all_platforms} x86-darwin13-gcc"
all_platforms="${all_platforms} x86-linux-gcc"
all_platforms="${all_platforms} x86-linux-icc"
all_platforms="${all_platforms} x86-os2-gcc"
@@ -129,6 +131,7 @@ all_platforms="${all_platforms} x86_64-darwin9-gcc"
all_platforms="${all_platforms} x86_64-darwin10-gcc"
all_platforms="${all_platforms} x86_64-darwin11-gcc"
all_platforms="${all_platforms} x86_64-darwin12-gcc"
all_platforms="${all_platforms} x86_64-darwin13-gcc"
all_platforms="${all_platforms} x86_64-linux-gcc"
all_platforms="${all_platforms} x86_64-linux-icc"
all_platforms="${all_platforms} x86_64-solaris-gcc"
@@ -142,6 +145,7 @@ all_platforms="${all_platforms} universal-darwin9-gcc"
all_platforms="${all_platforms} universal-darwin10-gcc"
all_platforms="${all_platforms} universal-darwin11-gcc"
all_platforms="${all_platforms} universal-darwin12-gcc"
all_platforms="${all_platforms} universal-darwin13-gcc"
all_platforms="${all_platforms} generic-gnu"
# all_targets is a list of all targets that can be configured
@@ -150,7 +154,7 @@ all_targets="libs examples docs"
# all targets available are enabled, by default.
for t in ${all_targets}; do
[ -f ${source_path}/${t}.mk ] && enable ${t}
[ -f ${source_path}/${t}.mk ] && enable_feature ${t}
done
# check installed doxygen version
@@ -161,30 +165,30 @@ if [ ${doxy_major:-0} -ge 1 ]; then
doxy_minor=${doxy_version%%.*}
doxy_patch=${doxy_version##*.}
[ $doxy_major -gt 1 ] && enable doxygen
[ $doxy_minor -gt 5 ] && enable doxygen
[ $doxy_minor -eq 5 ] && [ $doxy_patch -ge 3 ] && enable doxygen
[ $doxy_major -gt 1 ] && enable_feature doxygen
[ $doxy_minor -gt 5 ] && enable_feature doxygen
[ $doxy_minor -eq 5 ] && [ $doxy_patch -ge 3 ] && enable_feature doxygen
fi
# install everything except the sources, by default. sources will have
# to be enabled when doing dist builds, since that's no longer a common
# case.
enabled doxygen && php -v >/dev/null 2>&1 && enable install_docs
enable install_bins
enable install_libs
enabled doxygen && php -v >/dev/null 2>&1 && enable_feature install_docs
enable_feature install_bins
enable_feature install_libs
enable static
enable optimizations
enable fast_unaligned #allow unaligned accesses, if supported by hw
enable md5
enable spatial_resampling
enable multithread
enable os_support
enable temporal_denoising
enable_feature static
enable_feature optimizations
enable_feature fast_unaligned #allow unaligned accesses, if supported by hw
enable_feature md5
enable_feature spatial_resampling
enable_feature multithread
enable_feature os_support
enable_feature temporal_denoising
[ -d ${source_path}/../include ] && enable alt_tree_layout
[ -d ${source_path}/../include ] && enable_feature alt_tree_layout
for d in vp8 vp9; do
[ -d ${source_path}/${d} ] && disable alt_tree_layout;
[ -d ${source_path}/${d} ] && disable_feature alt_tree_layout;
done
if ! enabled alt_tree_layout; then
@@ -197,10 +201,10 @@ else
[ -f ${source_path}/../include/vpx/vp8dx.h ] && CODECS="${CODECS} vp8_decoder"
[ -f ${source_path}/../include/vpx/vp9cx.h ] && CODECS="${CODECS} vp9_encoder"
[ -f ${source_path}/../include/vpx/vp9dx.h ] && CODECS="${CODECS} vp9_decoder"
[ -f ${source_path}/../include/vpx/vp8cx.h ] || disable vp8_encoder
[ -f ${source_path}/../include/vpx/vp8dx.h ] || disable vp8_decoder
[ -f ${source_path}/../include/vpx/vp9cx.h ] || disable vp9_encoder
[ -f ${source_path}/../include/vpx/vp9dx.h ] || disable vp9_decoder
[ -f ${source_path}/../include/vpx/vp8cx.h ] || disable_feature vp8_encoder
[ -f ${source_path}/../include/vpx/vp8dx.h ] || disable_feature vp8_decoder
[ -f ${source_path}/../include/vpx/vp9cx.h ] || disable_feature vp9_encoder
[ -f ${source_path}/../include/vpx/vp9dx.h ] || disable_feature vp9_decoder
[ -f ${source_path}/../lib/*/*mt.lib ] && soft_enable static_msvcrt
fi
@@ -247,7 +251,6 @@ EXPERIMENT_LIST="
multiple_arf
non420
alpha
balanced_coeftree
"
CONFIG_LIST="
external_build
@@ -255,6 +258,7 @@ CONFIG_LIST="
install_bins
install_libs
install_srcs
use_x86inc
debug
gprof
gcov
@@ -276,6 +280,7 @@ CONFIG_LIST="
dc_recon
runtime_cpu_detect
postproc
vp9_postproc
multithread
internal_stats
${CODECS}
@@ -311,6 +316,7 @@ CMDLINE_SELECT="
gprof
gcov
pic
use_x86inc
optimizations
ccache
runtime_cpu_detect
@@ -329,6 +335,7 @@ CMDLINE_SELECT="
dequant_tokens
dc_recon
postproc
vp9_postproc
multithread
internal_stats
${CODECS}
@@ -354,12 +361,12 @@ process_cmdline() {
for opt do
optval="${opt#*=}"
case "$opt" in
--disable-codecs) for c in ${CODECS}; do disable $c; done ;;
--disable-codecs) for c in ${CODECS}; do disable_feature $c; done ;;
--enable-?*|--disable-?*)
eval `echo "$opt" | sed 's/--/action=/;s/-/ option=/;s/-/_/g'`
if echo "${EXPERIMENT_LIST}" | grep "^ *$option\$" >/dev/null; then
if enabled experimental; then
$action $option
${action}_feature $option
else
log_echo "Ignoring $opt -- not in experimental mode."
fi
@@ -380,8 +387,8 @@ post_process_cmdline() {
# If the codec family is enabled, enable all components of that family.
log_echo "Configuring selected codecs"
for c in ${CODECS}; do
disabled ${c%%_*} && disable ${c}
enabled ${c%%_*} && enable ${c}
disabled ${c%%_*} && disable_feature ${c}
enabled ${c%%_*} && enable_feature ${c}
done
# Enable all detected codecs, if they haven't been disabled
@@ -389,12 +396,12 @@ post_process_cmdline() {
# Enable the codec family if any component of that family is enabled
for c in ${CODECS}; do
enabled $c && enable ${c%_*}
enabled $c && enable_feature ${c%_*}
done
# Set the {en,de}coders variable if any algorithm in that class is enabled
for c in ${CODECS}; do
enabled ${c} && enable ${c##*_}s
enabled ${c} && enable_feature ${c##*_}s
done
}
@@ -434,7 +441,7 @@ process_targets() {
done
enabled debug_libs && DIST_DIR="${DIST_DIR}-debug"
enabled codec_srcs && DIST_DIR="${DIST_DIR}-src"
! enabled postproc && DIST_DIR="${DIST_DIR}-nopost"
! enabled postproc && ! enabled vp9_postproc && DIST_DIR="${DIST_DIR}-nopost"
! enabled multithread && DIST_DIR="${DIST_DIR}-nomt"
! enabled install_docs && DIST_DIR="${DIST_DIR}-nodocs"
DIST_DIR="${DIST_DIR}-${tgt_isa}-${tgt_os}"
@@ -504,13 +511,13 @@ process_detect() {
fi
if [ -z "$CC" ] || enabled external_build; then
echo "Bypassing toolchain for environment detection."
enable external_build
enable_feature external_build
check_header() {
log fake_check_header "$@"
header=$1
shift
var=`echo $header | sed 's/[^A-Za-z0-9_]/_/g'`
disable $var
disable_feature $var
# Headers common to all environments
case $header in
stdio.h)
@@ -522,7 +529,7 @@ process_detect() {
[ -f "${d##-I}/$header" ] && result=true && break
done
${result:-true}
esac && enable $var
esac && enable_feature $var
# Specialize windows and POSIX environments.
case $toolchain in
@@ -530,7 +537,7 @@ process_detect() {
case $header-$toolchain in
stdint*-gcc) true;;
*) false;;
esac && enable $var
esac && enable_feature $var
;;
*)
case $header in
@@ -539,7 +546,7 @@ process_detect() {
sys/mman.h) true;;
unistd.h) true;;
*) false;;
esac && enable $var
esac && enable_feature $var
esac
enabled $var
}
@@ -557,7 +564,7 @@ EOF
check_header sys/mman.h
check_header unistd.h # for sysconf(3) and friends.
check_header vpx/vpx_integer.h -I${source_path} && enable vpx_ports
check_header vpx/vpx_integer.h -I${source_path} && enable_feature vpx_ports
}
process_toolchain() {
@@ -639,14 +646,18 @@ process_toolchain() {
# ccache only really works on gcc toolchains
enabled gcc || soft_disable ccache
if enabled mips; then
enable dequant_tokens
enable dc_recon
enable_feature dequant_tokens
enable_feature dc_recon
fi
if enabled internal_stats; then
enable_feature vp9_postproc
fi
# Enable the postbuild target if building for visual studio.
case "$tgt_cc" in
vs*) enable msvs
enable solution
vs*) enable_feature msvs
enable_feature solution
vs_version=${tgt_cc##vs}
case $vs_version in
[789])
@@ -682,6 +693,14 @@ process_toolchain() {
# iOS/ARM builds do not work with gtest. This does not match
# x86 targets.
;;
*-win*)
# Some mingw toolchains don't have pthread available by default.
# Treat these more like visual studio where threading in gtest
# would be disabled for the same reason.
check_cxx "$@" <<EOF && soft_enable unit_tests
int z;
EOF
;;
*)
enabled pthread_h && check_cxx "$@" <<EOF && soft_enable unit_tests
int z;

View File

@@ -49,6 +49,9 @@ vpxenc.DESCRIPTION = Full featured encoder
UTILS-$(CONFIG_VP8_ENCODER) += vp8_scalable_patterns.c
vp8_scalable_patterns.GUID = 0D6A210B-F482-4D6F-8570-4A9C01ACC88C
vp8_scalable_patterns.DESCRIPTION = Temporal Scalability Encoder
UTILS-$(CONFIG_VP8_ENCODER) += vp9_spatial_scalable_encoder.c
vp8_scalable_patterns.GUID = 4A38598D-627D-4505-9C7B-D4020C84100D
vp8_scalable_patterns.DESCRIPTION = Spatial Scalable Encoder
# Clean up old ivfenc, ivfdec binaries.
ifeq ($(CONFIG_MSVS),yes)

20
libs.mk
View File

@@ -57,6 +57,13 @@ CLEAN-OBJS += $$(BUILD_PFX)$(1).h
RTCD += $$(BUILD_PFX)$(1).h
endef
# x86inc.asm is not compatible with pic 32bit builds. Restrict
# files which use it to 64bit builds or 32bit without pic
USE_X86INC = no
ifeq ($(CONFIG_USE_X86INC),yes)
USE_X86INC = yes
endif
CODEC_SRCS-yes += CHANGELOG
CODEC_SRCS-yes += libs.mk
@@ -383,7 +390,12 @@ LIBVPX_TEST_DATA=$(addprefix $(LIBVPX_TEST_DATA_PATH)/,\
$(call enabled,LIBVPX_TEST_DATA))
libvpx_test_data_url=http://downloads.webmproject.org/test_data/libvpx/$(1)
$(LIBVPX_TEST_DATA):
libvpx_test_srcs.txt:
@echo " [CREATE] $@"
@echo $(LIBVPX_TEST_SRCS) | xargs -n1 echo | sort -u > $@
CLEAN-OBJS += libvpx_test_srcs.txt
$(LIBVPX_TEST_DATA): $(SRC_PATH_BARE)/test/test-data.sha1
@echo " [DOWNLOAD] $@"
$(qexec)trap 'rm -f $@' INT TERM &&\
curl -L -o $@ $(call libvpx_test_data_url,$(@F))
@@ -443,6 +455,10 @@ else
include $(SRC_PATH_BARE)/third_party/googletest/gtest.mk
GTEST_SRCS := $(addprefix third_party/googletest/src/,$(call enabled,GTEST_SRCS))
GTEST_OBJS=$(call objs,$(GTEST_SRCS))
ifeq ($(filter win%,$(TGT_OS)),$(TGT_OS))
# Disabling pthreads globally will cause issues on darwin and possibly elsewhere
$(GTEST_OBJS) $(GTEST_OBJS:.o=.d): CXXFLAGS += -DGTEST_HAS_PTHREAD=0
endif
$(GTEST_OBJS) $(GTEST_OBJS:.o=.d): CXXFLAGS += -I$(SRC_PATH_BARE)/third_party/googletest/src
$(GTEST_OBJS) $(GTEST_OBJS:.o=.d): CXXFLAGS += -I$(SRC_PATH_BARE)/third_party/googletest/src/include
OBJS-$(BUILD_LIBVPX) += $(GTEST_OBJS)
@@ -467,7 +483,7 @@ $(foreach bin,$(LIBVPX_TEST_BINS),\
lib$(CODEC_LIB)$(CODEC_LIB_SUF) libgtest.a ))\
$(if $(BUILD_LIBVPX),$(eval $(call linkerxx_template,$(bin),\
$(LIBVPX_TEST_OBJS) \
-L. -lvpx -lgtest -lpthread -lm)\
-L. -lvpx -lgtest $(extralibs) -lm)\
)))\
$(if $(LIPO_LIBS),$(eval $(call lipo_bin_template,$(bin))))\

View File

@@ -8,8 +8,8 @@
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef LIBVPX_TEST_ACM_RANDOM_H_
#define LIBVPX_TEST_ACM_RANDOM_H_
#ifndef TEST_ACM_RANDOM_H_
#define TEST_ACM_RANDOM_H_
#include "third_party/googletest/src/include/gtest/gtest.h"
@@ -59,4 +59,4 @@ class ACMRandom {
} // namespace libvpx_test
#endif // LIBVPX_TEST_ACM_RANDOM_H_
#endif // TEST_ACM_RANDOM_H_

View File

@@ -33,10 +33,6 @@ class AltRefTest : public ::libvpx_test::EncoderTest,
altref_count_ = 0;
}
virtual bool Continue() const {
return !HasFatalFailure() && !abort_;
}
virtual void PreEncodeFrameHook(libvpx_test::VideoSource *video,
libvpx_test::Encoder *encoder) {
if (video->frame() == 1) {

View File

@@ -27,14 +27,10 @@ class BordersTest : public ::libvpx_test::EncoderTest,
SetMode(GET_PARAM(1));
}
virtual bool Continue() const {
return !HasFatalFailure() && !abort_;
}
virtual void PreEncodeFrameHook(::libvpx_test::VideoSource *video,
::libvpx_test::Encoder *encoder) {
if ( video->frame() == 1) {
encoder->Control(VP8E_SET_CPUUSED, 0);
if (video->frame() == 1) {
encoder->Control(VP8E_SET_CPUUSED, 1);
encoder->Control(VP8E_SET_ENABLEAUTOALTREF, 1);
encoder->Control(VP8E_SET_ARNR_MAXFRAMES, 7);
encoder->Control(VP8E_SET_ARNR_STRENGTH, 5);

View File

@@ -10,7 +10,7 @@
#ifndef TEST_CLEAR_SYSTEM_STATE_H_
#define TEST_CLEAR_SYSTEM_STATE_H_
#include "vpx_config.h"
#include "./vpx_config.h"
extern "C" {
#if ARCH_X86 || ARCH_X86_64
# include "vpx_ports/x86.h"

View File

@@ -134,14 +134,14 @@ class VP8CodecFactory : public CodecFactory {
const libvpx_test::VP8CodecFactory kVP8;
#define VP8_INSTANTIATE_TEST_CASE(test, params)\
#define VP8_INSTANTIATE_TEST_CASE(test, ...)\
INSTANTIATE_TEST_CASE_P(VP8, test, \
::testing::Combine( \
::testing::Values(static_cast<const libvpx_test::CodecFactory*>( \
&libvpx_test::kVP8)), \
params))
__VA_ARGS__))
#else
#define VP8_INSTANTIATE_TEST_CASE(test, params)
#define VP8_INSTANTIATE_TEST_CASE(test, ...)
#endif // CONFIG_VP8
@@ -216,14 +216,14 @@ class VP9CodecFactory : public CodecFactory {
const libvpx_test::VP9CodecFactory kVP9;
#define VP9_INSTANTIATE_TEST_CASE(test, params)\
#define VP9_INSTANTIATE_TEST_CASE(test, ...)\
INSTANTIATE_TEST_CASE_P(VP9, test, \
::testing::Combine( \
::testing::Values(static_cast<const libvpx_test::CodecFactory*>( \
&libvpx_test::kVP9)), \
params))
__VA_ARGS__))
#else
#define VP9_INSTANTIATE_TEST_CASE(test, params)
#define VP9_INSTANTIATE_TEST_CASE(test, ...)
#endif // CONFIG_VP9

View File

@@ -40,10 +40,6 @@ class ConfigTest : public ::libvpx_test::EncoderTest,
++frame_count_out_;
}
virtual bool Continue() const {
return !HasFatalFailure() && !abort_;
}
unsigned int frame_count_in_;
unsigned int frame_count_out_;
unsigned int frame_count_max_;

View File

@@ -8,6 +8,7 @@
* be found in the AUTHORS file in the root of the source tree.
*/
#include <string.h>
#include "test/acm_random.h"
#include "test/register_state_check.h"
#include "test/util.h"
@@ -22,8 +23,8 @@ extern "C" {
}
namespace {
typedef void (*convolve_fn_t)(const uint8_t *src, int src_stride,
uint8_t *dst, int dst_stride,
typedef void (*convolve_fn_t)(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int filter_x_stride,
const int16_t *filter_y, int filter_y_stride,
int w, int h);
@@ -187,7 +188,7 @@ class ConvolveTest : public PARAMS(int, int, const ConvolveFunctions*) {
protected:
static const int kDataAlignment = 16;
static const int kOuterBlockSize = 128;
static const int kOuterBlockSize = 256;
static const int kInputStride = kOuterBlockSize;
static const int kOutputStride = kOuterBlockSize;
static const int kMaxDimension = 64;
@@ -211,7 +212,7 @@ class ConvolveTest : public PARAMS(int, int, const ConvolveFunctions*) {
virtual void SetUp() {
UUT_ = GET_PARAM(2);
/* Set up guard blocks for an inner block cetered in the outer block */
/* Set up guard blocks for an inner block centered in the outer block */
for (int i = 0; i < kOutputBufferSize; ++i) {
if (IsIndexInBorder(i))
output_[i] = 255;
@@ -224,6 +225,10 @@ class ConvolveTest : public PARAMS(int, int, const ConvolveFunctions*) {
input_[i] = prng.Rand8Extremes();
}
void SetConstantInput(int value) {
memset(input_, value, kInputBufferSize);
}
void CheckGuardBlocks() {
for (int i = 0; i < kOutputBufferSize; ++i) {
if (IsIndexInBorder(i))
@@ -456,45 +461,86 @@ DECLARE_ALIGNED(256, const int16_t, kChangeFilters[16][8]) = {
{ 128}
};
/* This test exercises the horizontal and vertical filter functions. */
TEST_P(ConvolveTest, ChangeFilterWorks) {
uint8_t* const in = input();
uint8_t* const out = output();
/* Assume that the first input sample is at the 8/16th position. */
const int kInitialSubPelOffset = 8;
/* Filters are 8-tap, so the first filter tap will be applied to the pixel
* at position -3 with respect to the current filtering position. Since
* kInitialSubPelOffset is set to 8, we first select sub-pixel filter 8,
* which is non-zero only in the last tap. So, applying the filter at the
* current input position will result in an output equal to the pixel at
* offset +4 (-3 + 7) with respect to the current filtering position.
*/
const int kPixelSelected = 4;
/* Assume that each output pixel requires us to step on by 17/16th pixels in
* the input.
*/
const int kInputPixelStep = 17;
/* The filters are setup in such a way that the expected output produces
* sets of 8 identical output samples. As the filter position moves to the
* next 1/16th pixel position the only active (=128) filter tap moves one
* position to the left, resulting in the same input pixel being replicated
* in to the output for 8 consecutive samples. After each set of 8 positions
* the filters select a different input pixel. kFilterPeriodAdjust below
* computes which input pixel is written to the output for a specified
* x or y position.
*/
/* Test the horizontal filter. */
REGISTER_STATE_CHECK(UUT_->h8_(in, kInputStride, out, kOutputStride,
kChangeFilters[8], 17, kChangeFilters[4], 16,
Width(), Height()));
kChangeFilters[kInitialSubPelOffset],
kInputPixelStep, NULL, 0, Width(), Height()));
for (int x = 0; x < Width(); ++x) {
const int kQ4StepAdjust = x >> 4;
const int kFilterPeriodAdjust = (x >> 3) << 3;
const int ref_x = kQ4StepAdjust + kFilterPeriodAdjust + kPixelSelected;
ASSERT_EQ(in[ref_x], out[x]) << "x == " << x;
const int ref_x =
kPixelSelected + ((kInitialSubPelOffset
+ kFilterPeriodAdjust * kInputPixelStep)
>> SUBPEL_BITS);
ASSERT_EQ(in[ref_x], out[x]) << "x == " << x << "width = " << Width();
}
/* Test the vertical filter. */
REGISTER_STATE_CHECK(UUT_->v8_(in, kInputStride, out, kOutputStride,
kChangeFilters[4], 16, kChangeFilters[8], 17,
Width(), Height()));
NULL, 0, kChangeFilters[kInitialSubPelOffset],
kInputPixelStep, Width(), Height()));
for (int y = 0; y < Height(); ++y) {
const int kQ4StepAdjust = y >> 4;
const int kFilterPeriodAdjust = (y >> 3) << 3;
const int ref_y = kQ4StepAdjust + kFilterPeriodAdjust + kPixelSelected;
const int ref_y =
kPixelSelected + ((kInitialSubPelOffset
+ kFilterPeriodAdjust * kInputPixelStep)
>> SUBPEL_BITS);
ASSERT_EQ(in[ref_y * kInputStride], out[y * kInputStride]) << "y == " << y;
}
/* Test the horizontal and vertical filters in combination. */
REGISTER_STATE_CHECK(UUT_->hv8_(in, kInputStride, out, kOutputStride,
kChangeFilters[8], 17, kChangeFilters[8], 17,
kChangeFilters[kInitialSubPelOffset],
kInputPixelStep,
kChangeFilters[kInitialSubPelOffset],
kInputPixelStep,
Width(), Height()));
for (int y = 0; y < Height(); ++y) {
const int kQ4StepAdjustY = y >> 4;
const int kFilterPeriodAdjustY = (y >> 3) << 3;
const int ref_y = kQ4StepAdjustY + kFilterPeriodAdjustY + kPixelSelected;
const int ref_y =
kPixelSelected + ((kInitialSubPelOffset
+ kFilterPeriodAdjustY * kInputPixelStep)
>> SUBPEL_BITS);
for (int x = 0; x < Width(); ++x) {
const int kQ4StepAdjustX = x >> 4;
const int kFilterPeriodAdjustX = (x >> 3) << 3;
const int ref_x = kQ4StepAdjustX + kFilterPeriodAdjustX + kPixelSelected;
const int ref_x =
kPixelSelected + ((kInitialSubPelOffset
+ kFilterPeriodAdjustX * kInputPixelStep)
>> SUBPEL_BITS);
ASSERT_EQ(in[ref_y * kInputStride + ref_x], out[y * kOutputStride + x])
<< "x == " << x << ", y == " << y;
@@ -502,6 +548,34 @@ TEST_P(ConvolveTest, ChangeFilterWorks) {
}
}
/* This test exercises that enough rows and columns are filtered with every
possible initial fractional positions and scaling steps. */
TEST_P(ConvolveTest, CheckScalingFiltering) {
uint8_t* const in = input();
uint8_t* const out = output();
SetConstantInput(127);
for (int frac = 0; frac < 16; ++frac) {
for (int step = 1; step <= 32; ++step) {
/* Test the horizontal and vertical filters in combination. */
REGISTER_STATE_CHECK(UUT_->hv8_(in, kInputStride, out, kOutputStride,
vp9_sub_pel_filters_8[frac], step,
vp9_sub_pel_filters_8[frac], step,
Width(), Height()));
CheckGuardBlocks();
for (int y = 0; y < Height(); ++y) {
for (int x = 0; x < Width(); ++x) {
ASSERT_EQ(in[y * kInputStride + x], out[y * kOutputStride + x])
<< "x == " << x << ", y == " << y
<< ", frac == " << frac << ", step == " << step;
}
}
}
}
}
using std::tr1::make_tuple;
@@ -527,9 +601,9 @@ INSTANTIATE_TEST_CASE_P(C, ConvolveTest, ::testing::Values(
#if HAVE_SSSE3
const ConvolveFunctions convolve8_ssse3(
vp9_convolve8_horiz_ssse3, vp9_convolve8_avg_horiz_c,
vp9_convolve8_vert_ssse3, vp9_convolve8_avg_vert_c,
vp9_convolve8_ssse3, vp9_convolve8_avg_c);
vp9_convolve8_horiz_ssse3, vp9_convolve8_avg_horiz_ssse3,
vp9_convolve8_vert_ssse3, vp9_convolve8_avg_vert_ssse3,
vp9_convolve8_ssse3, vp9_convolve8_avg_ssse3);
INSTANTIATE_TEST_CASE_P(SSSE3, ConvolveTest, ::testing::Values(
make_tuple(4, 4, &convolve8_ssse3),
@@ -546,4 +620,26 @@ INSTANTIATE_TEST_CASE_P(SSSE3, ConvolveTest, ::testing::Values(
make_tuple(32, 64, &convolve8_ssse3),
make_tuple(64, 64, &convolve8_ssse3)));
#endif
#if HAVE_NEON
const ConvolveFunctions convolve8_neon(
vp9_convolve8_horiz_neon, vp9_convolve8_avg_horiz_neon,
vp9_convolve8_vert_neon, vp9_convolve8_avg_vert_neon,
vp9_convolve8_neon, vp9_convolve8_avg_neon);
INSTANTIATE_TEST_CASE_P(NEON, ConvolveTest, ::testing::Values(
make_tuple(4, 4, &convolve8_neon),
make_tuple(8, 4, &convolve8_neon),
make_tuple(4, 8, &convolve8_neon),
make_tuple(8, 8, &convolve8_neon),
make_tuple(16, 8, &convolve8_neon),
make_tuple(8, 16, &convolve8_neon),
make_tuple(16, 16, &convolve8_neon),
make_tuple(32, 16, &convolve8_neon),
make_tuple(16, 32, &convolve8_neon),
make_tuple(32, 32, &convolve8_neon),
make_tuple(64, 32, &convolve8_neon),
make_tuple(32, 64, &convolve8_neon),
make_tuple(64, 64, &convolve8_neon)));
#endif
} // namespace

112
test/cpu_speed_test.cc Normal file
View File

@@ -0,0 +1,112 @@
/*
* Copyright (c) 2012 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include <climits>
#include <vector>
#include "third_party/googletest/src/include/gtest/gtest.h"
#include "test/codec_factory.h"
#include "test/encode_test_driver.h"
#include "test/i420_video_source.h"
#include "test/util.h"
namespace {
class CpuSpeedTest : public ::libvpx_test::EncoderTest,
public ::libvpx_test::CodecTestWith2Params<
libvpx_test::TestMode, int> {
protected:
CpuSpeedTest() : EncoderTest(GET_PARAM(0)) {}
virtual void SetUp() {
InitializeConfig();
SetMode(GET_PARAM(1));
set_cpu_used_ = GET_PARAM(2);
}
virtual void PreEncodeFrameHook(::libvpx_test::VideoSource *video,
::libvpx_test::Encoder *encoder) {
if (video->frame() == 1) {
encoder->Control(VP8E_SET_CPUUSED, set_cpu_used_);
encoder->Control(VP8E_SET_ENABLEAUTOALTREF, 1);
encoder->Control(VP8E_SET_ARNR_MAXFRAMES, 7);
encoder->Control(VP8E_SET_ARNR_STRENGTH, 5);
encoder->Control(VP8E_SET_ARNR_TYPE, 3);
}
}
virtual void FramePktHook(const vpx_codec_cx_pkt_t *pkt) {
if (pkt->data.frame.flags & VPX_FRAME_IS_KEY) {
}
}
int set_cpu_used_;
};
TEST_P(CpuSpeedTest, TestQ0) {
// Validate that this non multiple of 64 wide clip encodes and decodes
// without a mismatch when passing in a very low max q. This pushes
// the encoder to producing lots of big partitions which will likely
// extend into the border and test the border condition.
cfg_.g_lag_in_frames = 25;
cfg_.rc_2pass_vbr_minsection_pct = 5;
cfg_.rc_2pass_vbr_minsection_pct = 2000;
cfg_.rc_target_bitrate = 400;
cfg_.rc_max_quantizer = 0;
cfg_.rc_min_quantizer = 0;
::libvpx_test::I420VideoSource video("hantro_odd.yuv", 208, 144, 30, 1, 0,
20);
ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
}
TEST_P(CpuSpeedTest, TestEncodeHighBitrate) {
// Validate that this non multiple of 64 wide clip encodes and decodes
// without a mismatch when passing in a very low max q. This pushes
// the encoder to producing lots of big partitions which will likely
// extend into the border and test the border condition.
cfg_.g_lag_in_frames = 25;
cfg_.rc_2pass_vbr_minsection_pct = 5;
cfg_.rc_2pass_vbr_minsection_pct = 2000;
cfg_.rc_target_bitrate = 12000;
cfg_.rc_max_quantizer = 10;
cfg_.rc_min_quantizer = 0;
::libvpx_test::I420VideoSource video("hantro_odd.yuv", 208, 144, 30, 1, 0,
40);
ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
}
TEST_P(CpuSpeedTest, TestLowBitrate) {
// Validate that this clip encodes and decodes without a mismatch
// when passing in a very high min q. This pushes the encoder to producing
// lots of small partitions which might will test the other condition.
cfg_.g_lag_in_frames = 25;
cfg_.rc_2pass_vbr_minsection_pct = 5;
cfg_.rc_2pass_vbr_minsection_pct = 2000;
cfg_.rc_target_bitrate = 200;
cfg_.rc_min_quantizer = 40;
::libvpx_test::I420VideoSource video("hantro_odd.yuv", 208, 144, 30, 1, 0,
40);
ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
}
using std::tr1::make_tuple;
#define VP9_FACTORY \
static_cast<const libvpx_test::CodecFactory*> (&libvpx_test::kVP9)
VP9_INSTANTIATE_TEST_CASE(
CpuSpeedTest,
::testing::Values(::libvpx_test::kTwoPassGood),
::testing::Range(0, 5));
} // namespace

View File

@@ -42,10 +42,6 @@ class CQTest : public ::libvpx_test::EncoderTest,
n_frames_ = 0;
}
virtual bool Continue() const {
return !HasFatalFailure() && !abort_;
}
virtual void PreEncodeFrameHook(libvpx_test::VideoSource *video,
libvpx_test::Encoder *encoder) {
if (video->frame() == 1) {

View File

@@ -36,10 +36,6 @@ class DatarateTest : public ::libvpx_test::EncoderTest,
duration_ = 0.0;
}
virtual bool Continue() const {
return !HasFatalFailure() && !abort_;
}
virtual void PreEncodeFrameHook(::libvpx_test::VideoSource *video,
::libvpx_test::Encoder *encoder) {
const vpx_rational_t tb = video->timebase();
@@ -79,7 +75,7 @@ class DatarateTest : public ::libvpx_test::EncoderTest,
bits_in_buffer_model_ -= frame_size_in_bits;
// Update the running total of bits for end of test datarate checks.
bits_total_ += frame_size_in_bits ;
bits_total_ += frame_size_in_bits;
// If first drop not set and we have a drop set it to this time.
if (!first_drop_ && duration > 1)

View File

@@ -13,14 +13,16 @@
#include <string.h>
#include "third_party/googletest/src/include/gtest/gtest.h"
#include "test/acm_random.h"
#include "test/clear_system_state.h"
#include "test/register_state_check.h"
#include "test/util.h"
extern "C" {
#include "vp9/common/vp9_entropy.h"
#include "vp9_rtcd.h"
void vp9_short_idct16x16_add_c(short *input, uint8_t *output, int pitch);
#include "./vp9_rtcd.h"
void vp9_short_idct16x16_add_c(int16_t *input, uint8_t *output, int pitch);
}
#include "acm_random.h"
#include "vpx/vpx_integer.h"
using libvpx_test::ACMRandom;
@@ -30,12 +32,13 @@ namespace {
#ifdef _MSC_VER
static int round(double x) {
if (x < 0)
return (int)ceil(x - 0.5);
return static_cast<int>(ceil(x - 0.5));
else
return (int)floor(x + 0.5);
return static_cast<int>(floor(x + 0.5));
}
#endif
const int kNumCoeffs = 256;
const double PI = 3.1415926535898;
void reference2_16x16_idct_2d(double *input, double *output) {
double x;
@@ -44,7 +47,9 @@ void reference2_16x16_idct_2d(double *input, double *output) {
double s = 0;
for (int i = 0; i < 16; ++i) {
for (int j = 0; j < 16; ++j) {
x=cos(PI*j*(l+0.5)/16.0)*cos(PI*i*(k+0.5)/16.0)*input[i*16+j]/256;
x = cos(PI * j * (l + 0.5) / 16.0) *
cos(PI * i * (k + 0.5) / 16.0) *
input[i * 16 + j] / 256;
if (i != 0)
x *= sqrt(2.0);
if (j != 0)
@@ -58,23 +63,23 @@ void reference2_16x16_idct_2d(double *input, double *output) {
}
static const double C1 = 0.995184726672197;
static const double C2 = 0.98078528040323;
static const double C3 = 0.956940335732209;
static const double C4 = 0.923879532511287;
static const double C5 = 0.881921264348355;
static const double C6 = 0.831469612302545;
static const double C7 = 0.773010453362737;
static const double C8 = 0.707106781186548;
static const double C9 = 0.634393284163646;
static const double C10 = 0.555570233019602;
static const double C11 = 0.471396736825998;
static const double C12 = 0.38268343236509;
static const double C13 = 0.290284677254462;
static const double C14 = 0.195090322016128;
static const double C15 = 0.098017140329561;
const double C1 = 0.995184726672197;
const double C2 = 0.98078528040323;
const double C3 = 0.956940335732209;
const double C4 = 0.923879532511287;
const double C5 = 0.881921264348355;
const double C6 = 0.831469612302545;
const double C7 = 0.773010453362737;
const double C8 = 0.707106781186548;
const double C9 = 0.634393284163646;
const double C10 = 0.555570233019602;
const double C11 = 0.471396736825998;
const double C12 = 0.38268343236509;
const double C13 = 0.290284677254462;
const double C14 = 0.195090322016128;
const double C15 = 0.098017140329561;
static void butterfly_16x16_dct_1d(double input[16], double output[16]) {
void butterfly_16x16_dct_1d(double input[16], double output[16]) {
double step[16];
double intermediate[16];
double temp1, temp2;
@@ -107,36 +112,36 @@ static void butterfly_16x16_dct_1d(double input[16], double output[16]) {
output[6] = step[1] - step[6];
output[7] = step[0] - step[7];
temp1 = step[ 8]*C7;
temp2 = step[15]*C9;
temp1 = step[ 8] * C7;
temp2 = step[15] * C9;
output[ 8] = temp1 + temp2;
temp1 = step[ 9]*C11;
temp2 = step[14]*C5;
temp1 = step[ 9] * C11;
temp2 = step[14] * C5;
output[ 9] = temp1 - temp2;
temp1 = step[10]*C3;
temp2 = step[13]*C13;
temp1 = step[10] * C3;
temp2 = step[13] * C13;
output[10] = temp1 + temp2;
temp1 = step[11]*C15;
temp2 = step[12]*C1;
temp1 = step[11] * C15;
temp2 = step[12] * C1;
output[11] = temp1 - temp2;
temp1 = step[11]*C1;
temp2 = step[12]*C15;
temp1 = step[11] * C1;
temp2 = step[12] * C15;
output[12] = temp2 + temp1;
temp1 = step[10]*C13;
temp2 = step[13]*C3;
temp1 = step[10] * C13;
temp2 = step[13] * C3;
output[13] = temp2 - temp1;
temp1 = step[ 9]*C5;
temp2 = step[14]*C11;
temp1 = step[ 9] * C5;
temp2 = step[14] * C11;
output[14] = temp2 + temp1;
temp1 = step[ 8]*C9;
temp2 = step[15]*C7;
temp1 = step[ 8] * C9;
temp2 = step[15] * C7;
output[15] = temp2 - temp1;
// step 3
@@ -145,20 +150,20 @@ static void butterfly_16x16_dct_1d(double input[16], double output[16]) {
step[ 2] = output[1] - output[2];
step[ 3] = output[0] - output[3];
temp1 = output[4]*C14;
temp2 = output[7]*C2;
temp1 = output[4] * C14;
temp2 = output[7] * C2;
step[ 4] = temp1 + temp2;
temp1 = output[5]*C10;
temp2 = output[6]*C6;
temp1 = output[5] * C10;
temp2 = output[6] * C6;
step[ 5] = temp1 + temp2;
temp1 = output[5]*C6;
temp2 = output[6]*C10;
temp1 = output[5] * C6;
temp2 = output[6] * C10;
step[ 6] = temp2 - temp1;
temp1 = output[4]*C2;
temp2 = output[7]*C14;
temp1 = output[4] * C2;
temp2 = output[7] * C14;
step[ 7] = temp2 - temp1;
step[ 8] = output[ 8] + output[11];
@@ -175,18 +180,18 @@ static void butterfly_16x16_dct_1d(double input[16], double output[16]) {
output[ 0] = (step[ 0] + step[ 1]);
output[ 8] = (step[ 0] - step[ 1]);
temp1 = step[2]*C12;
temp2 = step[3]*C4;
temp1 = step[2] * C12;
temp2 = step[3] * C4;
temp1 = temp1 + temp2;
output[ 4] = 2*(temp1*C8);
output[ 4] = 2*(temp1 * C8);
temp1 = step[2]*C4;
temp2 = step[3]*C12;
temp1 = step[2] * C4;
temp2 = step[3] * C12;
temp1 = temp2 - temp1;
output[12] = 2*(temp1*C8);
output[12] = 2 * (temp1 * C8);
output[ 2] = 2*((step[4] + step[ 5])*C8);
output[14] = 2*((step[7] - step[ 6])*C8);
output[ 2] = 2 * ((step[4] + step[ 5]) * C8);
output[14] = 2 * ((step[7] - step[ 6]) * C8);
temp1 = step[4] - step[5];
temp2 = step[6] + step[7];
@@ -196,17 +201,17 @@ static void butterfly_16x16_dct_1d(double input[16], double output[16]) {
intermediate[8] = step[8] + step[14];
intermediate[9] = step[9] + step[15];
temp1 = intermediate[8]*C12;
temp2 = intermediate[9]*C4;
temp1 = intermediate[8] * C12;
temp2 = intermediate[9] * C4;
temp1 = temp1 - temp2;
output[3] = 2*(temp1*C8);
output[3] = 2 * (temp1 * C8);
temp1 = intermediate[8]*C4;
temp2 = intermediate[9]*C12;
temp1 = intermediate[8] * C4;
temp2 = intermediate[9] * C12;
temp1 = temp2 + temp1;
output[13] = 2*(temp1*C8);
output[13] = 2 * (temp1 * C8);
output[ 9] = 2*((step[10] + step[11])*C8);
output[ 9] = 2 * ((step[10] + step[11]) * C8);
intermediate[11] = step[10] - step[11];
intermediate[12] = step[12] + step[13];
@@ -217,150 +222,300 @@ static void butterfly_16x16_dct_1d(double input[16], double output[16]) {
output[15] = (intermediate[11] + intermediate[12]);
output[ 1] = -(intermediate[11] - intermediate[12]);
output[ 7] = 2*(intermediate[13]*C8);
output[ 7] = 2 * (intermediate[13] * C8);
temp1 = intermediate[14]*C12;
temp2 = intermediate[15]*C4;
temp1 = intermediate[14] * C12;
temp2 = intermediate[15] * C4;
temp1 = temp1 - temp2;
output[11] = -2*(temp1*C8);
output[11] = -2 * (temp1 * C8);
temp1 = intermediate[14]*C4;
temp2 = intermediate[15]*C12;
temp1 = intermediate[14] * C4;
temp2 = intermediate[15] * C12;
temp1 = temp2 + temp1;
output[ 5] = 2*(temp1*C8);
output[ 5] = 2 * (temp1 * C8);
}
static void reference_16x16_dct_1d(double in[16], double out[16]) {
const double kPi = 3.141592653589793238462643383279502884;
const double kInvSqrt2 = 0.707106781186547524400844362104;
for (int k = 0; k < 16; k++) {
out[k] = 0.0;
for (int n = 0; n < 16; n++)
out[k] += in[n]*cos(kPi*(2*n+1)*k/32.0);
if (k == 0)
out[k] = out[k]*kInvSqrt2;
}
}
void reference_16x16_dct_2d(int16_t input[16*16], double output[16*16]) {
void reference_16x16_dct_2d(int16_t input[256], double output[256]) {
// First transform columns
for (int i = 0; i < 16; ++i) {
double temp_in[16], temp_out[16];
for (int j = 0; j < 16; ++j)
temp_in[j] = input[j*16 + i];
temp_in[j] = input[j * 16 + i];
butterfly_16x16_dct_1d(temp_in, temp_out);
for (int j = 0; j < 16; ++j)
output[j*16 + i] = temp_out[j];
output[j * 16 + i] = temp_out[j];
}
// Then transform rows
for (int i = 0; i < 16; ++i) {
double temp_in[16], temp_out[16];
for (int j = 0; j < 16; ++j)
temp_in[j] = output[j + i*16];
temp_in[j] = output[j + i * 16];
butterfly_16x16_dct_1d(temp_in, temp_out);
// Scale by some magic number
for (int j = 0; j < 16; ++j)
output[j + i*16] = temp_out[j]/2;
output[j + i * 16] = temp_out[j]/2;
}
}
typedef void (*fdct_t)(int16_t *in, int16_t *out, int stride);
typedef void (*idct_t)(int16_t *in, uint8_t *out, int stride);
typedef void (*fht_t) (int16_t *in, int16_t *out, int stride, int tx_type);
typedef void (*iht_t) (int16_t *in, uint8_t *dst, int stride, int tx_type);
TEST(VP9Idct16x16Test, AccuracyCheck) {
ACMRandom rnd(ACMRandom::DeterministicSeed());
const int count_test_block = 1000;
for (int i = 0; i < count_test_block; ++i) {
int16_t in[256], coeff[256];
uint8_t dst[256], src[256];
double out_r[256];
for (int j = 0; j < 256; ++j) {
src[j] = rnd.Rand8();
dst[j] = rnd.Rand8();
}
// Initialize a test block with input range [-255, 255].
for (int j = 0; j < 256; ++j)
in[j] = src[j] - dst[j];
reference_16x16_dct_2d(in, out_r);
for (int j = 0; j < 256; j++)
coeff[j] = round(out_r[j]);
vp9_short_idct16x16_add_c(coeff, dst, 16);
for (int j = 0; j < 256; ++j) {
const int diff = dst[j] - src[j];
const int error = diff * diff;
EXPECT_GE(1, error)
<< "Error: 16x16 IDCT has error " << error
<< " at index " << j;
}
}
void fdct16x16_ref(int16_t *in, int16_t *out, int stride, int tx_type) {
vp9_short_fdct16x16_c(in, out, stride);
}
// we need enable fdct test once we re-do the 16 point fdct.
TEST(VP9Fdct16x16Test, AccuracyCheck) {
ACMRandom rnd(ACMRandom::DeterministicSeed());
int max_error = 0;
double total_error = 0;
const int count_test_block = 1000;
for (int i = 0; i < count_test_block; ++i) {
int16_t test_input_block[256];
int16_t test_temp_block[256];
uint8_t dst[256], src[256];
void fht16x16_ref(int16_t *in, int16_t *out, int stride, int tx_type) {
vp9_short_fht16x16_c(in, out, stride, tx_type);
}
for (int j = 0; j < 256; ++j) {
src[j] = rnd.Rand8();
dst[j] = rnd.Rand8();
class Trans16x16TestBase {
public:
virtual ~Trans16x16TestBase() {}
protected:
virtual void RunFwdTxfm(int16_t *in, int16_t *out, int stride) = 0;
virtual void RunInvTxfm(int16_t *out, uint8_t *dst, int stride) = 0;
void RunAccuracyCheck() {
ACMRandom rnd(ACMRandom::DeterministicSeed());
uint32_t max_error = 0;
int64_t total_error = 0;
const int count_test_block = 10000;
for (int i = 0; i < count_test_block; ++i) {
DECLARE_ALIGNED_ARRAY(16, int16_t, test_input_block, kNumCoeffs);
DECLARE_ALIGNED_ARRAY(16, int16_t, test_temp_block, kNumCoeffs);
DECLARE_ALIGNED_ARRAY(16, uint8_t, dst, kNumCoeffs);
DECLARE_ALIGNED_ARRAY(16, uint8_t, src, kNumCoeffs);
// Initialize a test block with input range [-255, 255].
for (int j = 0; j < kNumCoeffs; ++j) {
src[j] = rnd.Rand8();
dst[j] = rnd.Rand8();
test_input_block[j] = src[j] - dst[j];
}
REGISTER_STATE_CHECK(RunFwdTxfm(test_input_block,
test_temp_block, pitch_));
REGISTER_STATE_CHECK(RunInvTxfm(test_temp_block, dst, pitch_));
for (int j = 0; j < kNumCoeffs; ++j) {
const uint32_t diff = dst[j] - src[j];
const uint32_t error = diff * diff;
if (max_error < error)
max_error = error;
total_error += error;
}
}
// Initialize a test block with input range [-255, 255].
for (int j = 0; j < 256; ++j)
test_input_block[j] = src[j] - dst[j];
const int pitch = 32;
vp9_short_fdct16x16_c(test_input_block, test_temp_block, pitch);
vp9_short_idct16x16_add_c(test_temp_block, dst, 16);
EXPECT_GE(1u, max_error)
<< "Error: 16x16 FHT/IHT has an individual round trip error > 1";
for (int j = 0; j < 256; ++j) {
const int diff = dst[j] - src[j];
const int error = diff * diff;
if (max_error < error)
max_error = error;
total_error += error;
EXPECT_GE(count_test_block , total_error)
<< "Error: 16x16 FHT/IHT has average round trip error > 1 per block";
}
void RunCoeffCheck() {
ACMRandom rnd(ACMRandom::DeterministicSeed());
const int count_test_block = 1000;
DECLARE_ALIGNED_ARRAY(16, int16_t, input_block, kNumCoeffs);
DECLARE_ALIGNED_ARRAY(16, int16_t, output_ref_block, kNumCoeffs);
DECLARE_ALIGNED_ARRAY(16, int16_t, output_block, kNumCoeffs);
for (int i = 0; i < count_test_block; ++i) {
// Initialize a test block with input range [-255, 255].
for (int j = 0; j < kNumCoeffs; ++j)
input_block[j] = rnd.Rand8() - rnd.Rand8();
fwd_txfm_ref(input_block, output_ref_block, pitch_, tx_type_);
REGISTER_STATE_CHECK(RunFwdTxfm(input_block, output_block, pitch_));
// The minimum quant value is 4.
for (int j = 0; j < kNumCoeffs; ++j)
EXPECT_EQ(output_block[j], output_ref_block[j]);
}
}
EXPECT_GE(1, max_error)
<< "Error: 16x16 FDCT/IDCT has an individual round trip error > 1";
void RunMemCheck() {
ACMRandom rnd(ACMRandom::DeterministicSeed());
const int count_test_block = 1000;
DECLARE_ALIGNED_ARRAY(16, int16_t, input_block, kNumCoeffs);
DECLARE_ALIGNED_ARRAY(16, int16_t, input_extreme_block, kNumCoeffs);
DECLARE_ALIGNED_ARRAY(16, int16_t, output_ref_block, kNumCoeffs);
DECLARE_ALIGNED_ARRAY(16, int16_t, output_block, kNumCoeffs);
EXPECT_GE(count_test_block , total_error)
<< "Error: 16x16 FDCT/IDCT has average round trip error > 1 per block";
}
for (int i = 0; i < count_test_block; ++i) {
// Initialize a test block with input range [-255, 255].
for (int j = 0; j < kNumCoeffs; ++j) {
input_block[j] = rnd.Rand8() - rnd.Rand8();
input_extreme_block[j] = rnd.Rand8() % 2 ? 255 : -255;
}
if (i == 0)
for (int j = 0; j < kNumCoeffs; ++j)
input_extreme_block[j] = 255;
if (i == 1)
for (int j = 0; j < kNumCoeffs; ++j)
input_extreme_block[j] = -255;
TEST(VP9Fdct16x16Test, CoeffSizeCheck) {
ACMRandom rnd(ACMRandom::DeterministicSeed());
const int count_test_block = 1000;
for (int i = 0; i < count_test_block; ++i) {
int16_t input_block[256], input_extreme_block[256];
int16_t output_block[256], output_extreme_block[256];
fwd_txfm_ref(input_extreme_block, output_ref_block, pitch_, tx_type_);
REGISTER_STATE_CHECK(RunFwdTxfm(input_extreme_block,
output_block, pitch_));
// Initialize a test block with input range [-255, 255].
for (int j = 0; j < 256; ++j) {
input_block[j] = rnd.Rand8() - rnd.Rand8();
input_extreme_block[j] = rnd.Rand8() % 2 ? 255 : -255;
}
if (i == 0)
for (int j = 0; j < 256; ++j)
input_extreme_block[j] = 255;
const int pitch = 32;
vp9_short_fdct16x16_c(input_block, output_block, pitch);
vp9_short_fdct16x16_c(input_extreme_block, output_extreme_block, pitch);
// The minimum quant value is 4.
for (int j = 0; j < 256; ++j) {
EXPECT_GE(4*DCT_MAX_VALUE, abs(output_block[j]))
<< "Error: 16x16 FDCT has coefficient larger than 4*DCT_MAX_VALUE";
EXPECT_GE(4*DCT_MAX_VALUE, abs(output_extreme_block[j]))
<< "Error: 16x16 FDCT extreme has coefficient larger than 4*DCT_MAX_VALUE";
// The minimum quant value is 4.
for (int j = 0; j < kNumCoeffs; ++j) {
EXPECT_EQ(output_block[j], output_ref_block[j]);
EXPECT_GE(4 * DCT_MAX_VALUE, abs(output_block[j]))
<< "Error: 16x16 FDCT has coefficient larger than 4*DCT_MAX_VALUE";
}
}
}
void RunInvAccuracyCheck() {
ACMRandom rnd(ACMRandom::DeterministicSeed());
const int count_test_block = 1000;
DECLARE_ALIGNED_ARRAY(16, int16_t, in, kNumCoeffs);
DECLARE_ALIGNED_ARRAY(16, int16_t, coeff, kNumCoeffs);
DECLARE_ALIGNED_ARRAY(16, uint8_t, dst, kNumCoeffs);
DECLARE_ALIGNED_ARRAY(16, uint8_t, src, kNumCoeffs);
for (int i = 0; i < count_test_block; ++i) {
double out_r[kNumCoeffs];
// Initialize a test block with input range [-255, 255].
for (int j = 0; j < kNumCoeffs; ++j) {
src[j] = rnd.Rand8();
dst[j] = rnd.Rand8();
in[j] = src[j] - dst[j];
}
reference_16x16_dct_2d(in, out_r);
for (int j = 0; j < kNumCoeffs; ++j)
coeff[j] = round(out_r[j]);
const int pitch = 32;
REGISTER_STATE_CHECK(RunInvTxfm(coeff, dst, pitch));
for (int j = 0; j < kNumCoeffs; ++j) {
const uint32_t diff = dst[j] - src[j];
const uint32_t error = diff * diff;
EXPECT_GE(1u, error)
<< "Error: 16x16 IDCT has error " << error
<< " at index " << j;
}
}
}
int pitch_;
int tx_type_;
fht_t fwd_txfm_ref;
};
class Trans16x16DCT : public Trans16x16TestBase,
public PARAMS(fdct_t, idct_t, int) {
public:
virtual ~Trans16x16DCT() {}
virtual void SetUp() {
fwd_txfm_ = GET_PARAM(0);
inv_txfm_ = GET_PARAM(1);
tx_type_ = GET_PARAM(2);
pitch_ = 32;
fwd_txfm_ref = fdct16x16_ref;
}
virtual void TearDown() { libvpx_test::ClearSystemState(); }
protected:
void RunFwdTxfm(int16_t *in, int16_t *out, int stride) {
fwd_txfm_(in, out, stride);
}
void RunInvTxfm(int16_t *out, uint8_t *dst, int stride) {
inv_txfm_(out, dst, stride >> 1);
}
fdct_t fwd_txfm_;
idct_t inv_txfm_;
};
TEST_P(Trans16x16DCT, AccuracyCheck) {
RunAccuracyCheck();
}
TEST_P(Trans16x16DCT, CoeffCheck) {
RunCoeffCheck();
}
TEST_P(Trans16x16DCT, MemCheck) {
RunMemCheck();
}
TEST_P(Trans16x16DCT, InvAccuracyCheck) {
RunInvAccuracyCheck();
}
class Trans16x16HT : public Trans16x16TestBase,
public PARAMS(fht_t, iht_t, int) {
public:
virtual ~Trans16x16HT() {}
virtual void SetUp() {
fwd_txfm_ = GET_PARAM(0);
inv_txfm_ = GET_PARAM(1);
tx_type_ = GET_PARAM(2);
pitch_ = 16;
fwd_txfm_ref = fht16x16_ref;
}
virtual void TearDown() { libvpx_test::ClearSystemState(); }
protected:
void RunFwdTxfm(int16_t *in, int16_t *out, int stride) {
fwd_txfm_(in, out, stride, tx_type_);
}
void RunInvTxfm(int16_t *out, uint8_t *dst, int stride) {
inv_txfm_(out, dst, stride, tx_type_);
}
fht_t fwd_txfm_;
iht_t inv_txfm_;
};
TEST_P(Trans16x16HT, AccuracyCheck) {
RunAccuracyCheck();
}
TEST_P(Trans16x16HT, CoeffCheck) {
RunCoeffCheck();
}
TEST_P(Trans16x16HT, MemCheck) {
RunMemCheck();
}
using std::tr1::make_tuple;
INSTANTIATE_TEST_CASE_P(
C, Trans16x16DCT,
::testing::Values(
make_tuple(&vp9_short_fdct16x16_c, &vp9_short_idct16x16_add_c, 0)));
INSTANTIATE_TEST_CASE_P(
C, Trans16x16HT,
::testing::Values(
make_tuple(&vp9_short_fht16x16_c, &vp9_short_iht16x16_add_c, 0),
make_tuple(&vp9_short_fht16x16_c, &vp9_short_iht16x16_add_c, 1),
make_tuple(&vp9_short_fht16x16_c, &vp9_short_iht16x16_add_c, 2),
make_tuple(&vp9_short_fht16x16_c, &vp9_short_iht16x16_add_c, 3)));
#if HAVE_SSE2
INSTANTIATE_TEST_CASE_P(
SSE2, Trans16x16DCT,
::testing::Values(
make_tuple(&vp9_short_fdct16x16_sse2, &vp9_short_idct16x16_add_c, 0)));
INSTANTIATE_TEST_CASE_P(
SSE2, Trans16x16HT,
::testing::Values(
make_tuple(&vp9_short_fht16x16_sse2, &vp9_short_iht16x16_add_sse2, 0),
make_tuple(&vp9_short_fht16x16_sse2, &vp9_short_iht16x16_add_sse2, 1),
make_tuple(&vp9_short_fht16x16_sse2, &vp9_short_iht16x16_add_sse2, 2),
make_tuple(&vp9_short_fht16x16_sse2, &vp9_short_iht16x16_add_sse2, 3)));
#endif
} // namespace

View File

@@ -13,15 +13,17 @@
#include <string.h>
#include "third_party/googletest/src/include/gtest/gtest.h"
#include "test/acm_random.h"
#include "test/clear_system_state.h"
#include "test/register_state_check.h"
#include "test/util.h"
extern "C" {
#include "./vpx_config.h"
#include "vp9/common/vp9_entropy.h"
#include "./vp9_rtcd.h"
void vp9_short_fdct32x32_c(int16_t *input, int16_t *out, int pitch);
void vp9_short_idct32x32_add_c(short *input, uint8_t *output, int pitch);
}
#include "test/acm_random.h"
#include "vpx/vpx_integer.h"
using libvpx_test::ACMRandom;
@@ -30,35 +32,15 @@ namespace {
#ifdef _MSC_VER
static int round(double x) {
if (x < 0)
return (int)ceil(x - 0.5);
return static_cast<int>(ceil(x - 0.5));
else
return (int)floor(x + 0.5);
return static_cast<int>(floor(x + 0.5));
}
#endif
static const double kPi = 3.141592653589793238462643383279502884;
static void reference2_32x32_idct_2d(double *input, double *output) {
double x;
for (int l = 0; l < 32; ++l) {
for (int k = 0; k < 32; ++k) {
double s = 0;
for (int i = 0; i < 32; ++i) {
for (int j = 0; j < 32; ++j) {
x = cos(kPi * j * (l + 0.5) / 32.0) *
cos(kPi * i * (k + 0.5) / 32.0) * input[i * 32 + j] / 1024;
if (i != 0)
x *= sqrt(2.0);
if (j != 0)
x *= sqrt(2.0);
s += x;
}
}
output[k * 32 + l] = s / 4;
}
}
}
static void reference_32x32_dct_1d(double in[32], double out[32], int stride) {
const int kNumCoeffs = 1024;
const double kPi = 3.141592653589793238462643383279502884;
void reference_32x32_dct_1d(const double in[32], double out[32], int stride) {
const double kInvSqrt2 = 0.707106781186547524400844362104;
for (int k = 0; k < 32; k++) {
out[k] = 0.0;
@@ -69,7 +51,8 @@ static void reference_32x32_dct_1d(double in[32], double out[32], int stride) {
}
}
static void reference_32x32_dct_2d(int16_t input[32*32], double output[32*32]) {
void reference_32x32_dct_2d(const int16_t input[kNumCoeffs],
double output[kNumCoeffs]) {
// First transform columns
for (int i = 0; i < 32; ++i) {
double temp_in[32], temp_out[32];
@@ -91,27 +74,165 @@ static void reference_32x32_dct_2d(int16_t input[32*32], double output[32*32]) {
}
}
TEST(VP9Idct32x32Test, AccuracyCheck) {
ACMRandom rnd(ACMRandom::DeterministicSeed());
const int count_test_block = 1000;
for (int i = 0; i < count_test_block; ++i) {
int16_t in[1024], coeff[1024];
uint8_t dst[1024], src[1024];
double out_r[1024];
typedef void (*fwd_txfm_t)(int16_t *in, int16_t *out, int stride);
typedef void (*inv_txfm_t)(int16_t *in, uint8_t *dst, int stride);
for (int j = 0; j < 1024; ++j) {
class Trans32x32Test : public PARAMS(fwd_txfm_t, inv_txfm_t, int) {
public:
virtual ~Trans32x32Test() {}
virtual void SetUp() {
fwd_txfm_ = GET_PARAM(0);
inv_txfm_ = GET_PARAM(1);
version_ = GET_PARAM(2); // 0: high precision forward transform
// 1: low precision version for rd loop
}
virtual void TearDown() { libvpx_test::ClearSystemState(); }
protected:
int version_;
fwd_txfm_t fwd_txfm_;
inv_txfm_t inv_txfm_;
};
TEST_P(Trans32x32Test, AccuracyCheck) {
ACMRandom rnd(ACMRandom::DeterministicSeed());
uint32_t max_error = 0;
int64_t total_error = 0;
const int count_test_block = 1000;
DECLARE_ALIGNED_ARRAY(16, int16_t, test_input_block, kNumCoeffs);
DECLARE_ALIGNED_ARRAY(16, int16_t, test_temp_block, kNumCoeffs);
DECLARE_ALIGNED_ARRAY(16, uint8_t, dst, kNumCoeffs);
DECLARE_ALIGNED_ARRAY(16, uint8_t, src, kNumCoeffs);
for (int i = 0; i < count_test_block; ++i) {
// Initialize a test block with input range [-255, 255].
for (int j = 0; j < kNumCoeffs; ++j) {
src[j] = rnd.Rand8();
dst[j] = rnd.Rand8();
test_input_block[j] = src[j] - dst[j];
}
const int pitch = 64;
REGISTER_STATE_CHECK(fwd_txfm_(test_input_block, test_temp_block, pitch));
REGISTER_STATE_CHECK(inv_txfm_(test_temp_block, dst, 32));
for (int j = 0; j < kNumCoeffs; ++j) {
const uint32_t diff = dst[j] - src[j];
const uint32_t error = diff * diff;
if (max_error < error)
max_error = error;
total_error += error;
}
}
if (version_ == 1) {
max_error /= 2;
total_error /= 45;
}
EXPECT_GE(1u, max_error)
<< "Error: 32x32 FDCT/IDCT has an individual round-trip error > 1";
EXPECT_GE(count_test_block, total_error)
<< "Error: 32x32 FDCT/IDCT has average round-trip error > 1 per block";
}
TEST_P(Trans32x32Test, CoeffCheck) {
ACMRandom rnd(ACMRandom::DeterministicSeed());
const int count_test_block = 1000;
DECLARE_ALIGNED_ARRAY(16, int16_t, input_block, kNumCoeffs);
DECLARE_ALIGNED_ARRAY(16, int16_t, output_ref_block, kNumCoeffs);
DECLARE_ALIGNED_ARRAY(16, int16_t, output_block, kNumCoeffs);
for (int i = 0; i < count_test_block; ++i) {
for (int j = 0; j < kNumCoeffs; ++j)
input_block[j] = rnd.Rand8() - rnd.Rand8();
const int pitch = 64;
vp9_short_fdct32x32_c(input_block, output_ref_block, pitch);
REGISTER_STATE_CHECK(fwd_txfm_(input_block, output_block, pitch));
if (version_ == 0) {
for (int j = 0; j < kNumCoeffs; ++j)
EXPECT_EQ(output_block[j], output_ref_block[j])
<< "Error: 32x32 FDCT versions have mismatched coefficients";
} else {
for (int j = 0; j < kNumCoeffs; ++j)
EXPECT_GE(6, abs(output_block[j] - output_ref_block[j]))
<< "Error: 32x32 FDCT rd has mismatched coefficients";
}
}
}
TEST_P(Trans32x32Test, MemCheck) {
ACMRandom rnd(ACMRandom::DeterministicSeed());
const int count_test_block = 2000;
DECLARE_ALIGNED_ARRAY(16, int16_t, input_block, kNumCoeffs);
DECLARE_ALIGNED_ARRAY(16, int16_t, input_extreme_block, kNumCoeffs);
DECLARE_ALIGNED_ARRAY(16, int16_t, output_ref_block, kNumCoeffs);
DECLARE_ALIGNED_ARRAY(16, int16_t, output_block, kNumCoeffs);
for (int i = 0; i < count_test_block; ++i) {
// Initialize a test block with input range [-255, 255].
for (int j = 0; j < 1024; ++j)
for (int j = 0; j < kNumCoeffs; ++j) {
input_block[j] = rnd.Rand8() - rnd.Rand8();
input_extreme_block[j] = rnd.Rand8() & 1 ? 255 : -255;
}
if (i == 0)
for (int j = 0; j < kNumCoeffs; ++j)
input_extreme_block[j] = 255;
if (i == 1)
for (int j = 0; j < kNumCoeffs; ++j)
input_extreme_block[j] = -255;
const int pitch = 64;
vp9_short_fdct32x32_c(input_extreme_block, output_ref_block, pitch);
REGISTER_STATE_CHECK(fwd_txfm_(input_extreme_block, output_block, pitch));
// The minimum quant value is 4.
for (int j = 0; j < kNumCoeffs; ++j) {
if (version_ == 0) {
EXPECT_EQ(output_block[j], output_ref_block[j])
<< "Error: 32x32 FDCT versions have mismatched coefficients";
} else {
EXPECT_GE(6, abs(output_block[j] - output_ref_block[j]))
<< "Error: 32x32 FDCT rd has mismatched coefficients";
}
EXPECT_GE(4 * DCT_MAX_VALUE, abs(output_ref_block[j]))
<< "Error: 32x32 FDCT C has coefficient larger than 4*DCT_MAX_VALUE";
EXPECT_GE(4 * DCT_MAX_VALUE, abs(output_block[j]))
<< "Error: 32x32 FDCT has coefficient larger than "
<< "4*DCT_MAX_VALUE";
}
}
}
TEST_P(Trans32x32Test, InverseAccuracy) {
ACMRandom rnd(ACMRandom::DeterministicSeed());
const int count_test_block = 1000;
DECLARE_ALIGNED_ARRAY(16, int16_t, in, kNumCoeffs);
DECLARE_ALIGNED_ARRAY(16, int16_t, coeff, kNumCoeffs);
DECLARE_ALIGNED_ARRAY(16, uint8_t, dst, kNumCoeffs);
DECLARE_ALIGNED_ARRAY(16, uint8_t, src, kNumCoeffs);
for (int i = 0; i < count_test_block; ++i) {
double out_r[kNumCoeffs];
// Initialize a test block with input range [-255, 255]
for (int j = 0; j < kNumCoeffs; ++j) {
src[j] = rnd.Rand8();
dst[j] = rnd.Rand8();
in[j] = src[j] - dst[j];
}
reference_32x32_dct_2d(in, out_r);
for (int j = 0; j < 1024; j++)
for (int j = 0; j < kNumCoeffs; ++j)
coeff[j] = round(out_r[j]);
vp9_short_idct32x32_add_c(coeff, dst, 32);
for (int j = 0; j < 1024; ++j) {
REGISTER_STATE_CHECK(inv_txfm_(coeff, dst, 32));
for (int j = 0; j < kNumCoeffs; ++j) {
const int diff = dst[j] - src[j];
const int error = diff * diff;
EXPECT_GE(1, error)
@@ -121,72 +242,21 @@ TEST(VP9Idct32x32Test, AccuracyCheck) {
}
}
TEST(VP9Fdct32x32Test, AccuracyCheck) {
ACMRandom rnd(ACMRandom::DeterministicSeed());
unsigned int max_error = 0;
int64_t total_error = 0;
const int count_test_block = 1000;
for (int i = 0; i < count_test_block; ++i) {
int16_t test_input_block[1024];
int16_t test_temp_block[1024];
uint8_t dst[1024], src[1024];
using std::tr1::make_tuple;
for (int j = 0; j < 1024; ++j) {
src[j] = rnd.Rand8();
dst[j] = rnd.Rand8();
}
// Initialize a test block with input range [-255, 255].
for (int j = 0; j < 1024; ++j)
test_input_block[j] = src[j] - dst[j];
INSTANTIATE_TEST_CASE_P(
C, Trans32x32Test,
::testing::Values(
make_tuple(&vp9_short_fdct32x32_c, &vp9_short_idct32x32_add_c, 0),
make_tuple(&vp9_short_fdct32x32_rd_c, &vp9_short_idct32x32_add_c, 1)));
const int pitch = 64;
vp9_short_fdct32x32_c(test_input_block, test_temp_block, pitch);
vp9_short_idct32x32_add_c(test_temp_block, dst, 32);
for (int j = 0; j < 1024; ++j) {
const unsigned diff = dst[j] - src[j];
const unsigned error = diff * diff;
if (max_error < error)
max_error = error;
total_error += error;
}
}
EXPECT_GE(1u, max_error)
<< "Error: 32x32 FDCT/IDCT has an individual roundtrip error > 1";
EXPECT_GE(count_test_block, total_error)
<< "Error: 32x32 FDCT/IDCT has average roundtrip error > 1 per block";
}
TEST(VP9Fdct32x32Test, CoeffSizeCheck) {
ACMRandom rnd(ACMRandom::DeterministicSeed());
const int count_test_block = 1000;
for (int i = 0; i < count_test_block; ++i) {
int16_t input_block[1024], input_extreme_block[1024];
int16_t output_block[1024], output_extreme_block[1024];
// Initialize a test block with input range [-255, 255].
for (int j = 0; j < 1024; ++j) {
input_block[j] = rnd.Rand8() - rnd.Rand8();
input_extreme_block[j] = rnd.Rand8() % 2 ? 255 : -255;
}
if (i == 0)
for (int j = 0; j < 1024; ++j)
input_extreme_block[j] = 255;
const int pitch = 64;
vp9_short_fdct32x32_c(input_block, output_block, pitch);
vp9_short_fdct32x32_c(input_extreme_block, output_extreme_block, pitch);
// The minimum quant value is 4.
for (int j = 0; j < 1024; ++j) {
EXPECT_GE(4*DCT_MAX_VALUE, abs(output_block[j]))
<< "Error: 32x32 FDCT has coefficient larger than 4*DCT_MAX_VALUE";
EXPECT_GE(4*DCT_MAX_VALUE, abs(output_extreme_block[j]))
<< "Error: 32x32 FDCT extreme has coefficient larger than "
"4*DCT_MAX_VALUE";
}
}
}
#if HAVE_SSE2
INSTANTIATE_TEST_CASE_P(
SSE2, Trans32x32Test,
::testing::Values(
make_tuple(&vp9_short_fdct32x32_sse2,
&vp9_short_idct32x32_add_sse2, 0),
make_tuple(&vp9_short_fdct32x32_rd_sse2,
&vp9_short_idct32x32_add_sse2, 1)));
#endif
} // namespace

View File

@@ -12,7 +12,7 @@
#define TEST_DECODE_TEST_DRIVER_H_
#include <cstring>
#include "third_party/googletest/src/include/gtest/gtest.h"
#include "vpx_config.h"
#include "./vpx_config.h"
#include "vpx/vpx_decoder.h"
namespace libvpx_test {
@@ -36,9 +36,8 @@ class DxDataIterator {
};
// Provides a simplified interface to manage one video decoding.
//
// TODO: similar to Encoder class, the exact services should be
// added as more tests are added.
// Similar to Encoder class, the exact services should be added
// as more tests are added.
class Decoder {
public:
Decoder(vpx_codec_dec_cfg_t cfg, unsigned long deadline)

View File

@@ -8,7 +8,7 @@
* be found in the AUTHORS file in the root of the source tree.
*/
#include "vpx_config.h"
#include "./vpx_config.h"
#include "test/codec_factory.h"
#include "test/encode_test_driver.h"
#include "test/decode_test_driver.h"
@@ -114,19 +114,19 @@ static bool compare_img(const vpx_image_t *img1,
const unsigned int height_y = img1->d_h;
unsigned int i;
for (i = 0; i < height_y; ++i)
match = ( memcmp(img1->planes[VPX_PLANE_Y] + i * img1->stride[VPX_PLANE_Y],
img2->planes[VPX_PLANE_Y] + i * img2->stride[VPX_PLANE_Y],
width_y) == 0) && match;
match = (memcmp(img1->planes[VPX_PLANE_Y] + i * img1->stride[VPX_PLANE_Y],
img2->planes[VPX_PLANE_Y] + i * img2->stride[VPX_PLANE_Y],
width_y) == 0) && match;
const unsigned int width_uv = (img1->d_w + 1) >> 1;
const unsigned int height_uv = (img1->d_h + 1) >> 1;
for (i = 0; i < height_uv; ++i)
match = ( memcmp(img1->planes[VPX_PLANE_U] + i * img1->stride[VPX_PLANE_U],
img2->planes[VPX_PLANE_U] + i * img2->stride[VPX_PLANE_U],
width_uv) == 0) && match;
match = (memcmp(img1->planes[VPX_PLANE_U] + i * img1->stride[VPX_PLANE_U],
img2->planes[VPX_PLANE_U] + i * img2->stride[VPX_PLANE_U],
width_uv) == 0) && match;
for (i = 0; i < height_uv; ++i)
match = ( memcmp(img1->planes[VPX_PLANE_V] + i * img1->stride[VPX_PLANE_V],
img2->planes[VPX_PLANE_V] + i * img2->stride[VPX_PLANE_V],
width_uv) == 0) && match;
match = (memcmp(img1->planes[VPX_PLANE_V] + i * img1->stride[VPX_PLANE_V],
img2->planes[VPX_PLANE_V] + i * img2->stride[VPX_PLANE_V],
width_uv) == 0) && match;
return match;
}
@@ -158,7 +158,7 @@ void EncoderTest::RunLoop(VideoSource *video) {
Decoder* const decoder = codec_->CreateDecoder(dec_cfg, 0);
bool again;
for (again = true, video->Begin(); again; video->Next()) {
again = video->img() != NULL;
again = (video->img() != NULL);
PreEncodeFrameHook(video);
PreEncodeFrameHook(video, encoder);

View File

@@ -190,7 +190,9 @@ class EncoderTest {
virtual void PSNRPktHook(const vpx_codec_cx_pkt_t *pkt) {}
// Hook to determine whether the encode loop should continue.
virtual bool Continue() const { return !abort_; }
virtual bool Continue() const {
return !(::testing::Test::HasFatalFailure() || abort_);
}
const CodecFactory *codec_;
// Hook to determine whether to decode frame after encoding

View File

@@ -50,10 +50,6 @@ class ErrorResilienceTest : public ::libvpx_test::EncoderTest,
mismatch_nframes_ = 0;
}
virtual bool Continue() const {
return !HasFatalFailure() && !abort_;
}
virtual void PSNRPktHook(const vpx_codec_cx_pkt_t *pkt) {
psnr_ += pkt->data.psnr.psnr[0];
nframes_++;
@@ -66,7 +62,7 @@ class ErrorResilienceTest : public ::libvpx_test::EncoderTest,
if (droppable_nframes_ > 0 &&
(cfg_.g_pass == VPX_RC_LAST_PASS || cfg_.g_pass == VPX_RC_ONE_PASS)) {
for (unsigned int i = 0; i < droppable_nframes_; ++i) {
if (droppable_frames_[i] == nframes_) {
if (droppable_frames_[i] == video->frame()) {
std::cout << " Encoding droppable frame: "
<< droppable_frames_[i] << "\n";
frame_flags_ |= (VP8_EFLAG_NO_UPD_LAST |
@@ -152,7 +148,7 @@ TEST_P(ErrorResilienceTest, OnVersusOff) {
const vpx_rational timebase = { 33333333, 1000000000 };
cfg_.g_timebase = timebase;
cfg_.rc_target_bitrate = 2000;
cfg_.g_lag_in_frames = 25;
cfg_.g_lag_in_frames = 10;
init_flags_ = VPX_CODEC_USE_PSNR;
@@ -183,6 +179,9 @@ TEST_P(ErrorResilienceTest, DropFramesWithoutRecovery) {
const vpx_rational timebase = { 33333333, 1000000000 };
cfg_.g_timebase = timebase;
cfg_.rc_target_bitrate = 500;
// FIXME(debargha): Fix this to work for any lag.
// Currently this test only works for lag = 0
cfg_.g_lag_in_frames = 0;
init_flags_ = VPX_CODEC_USE_PSNR;

View File

@@ -15,68 +15,69 @@
#include "third_party/googletest/src/include/gtest/gtest.h"
extern "C" {
#include "vp9_rtcd.h"
#include "./vp9_rtcd.h"
}
#include "acm_random.h"
#include "test/acm_random.h"
#include "vpx/vpx_integer.h"
#include "vpx_ports/mem.h"
using libvpx_test::ACMRandom;
namespace {
void fdct4x4(int16_t *in, int16_t *out, uint8_t *dst, int stride, int tx_type) {
void fdct4x4(int16_t *in, int16_t *out, uint8_t* /*dst*/,
int stride, int /*tx_type*/) {
vp9_short_fdct4x4_c(in, out, stride);
}
void idct4x4_add(int16_t *in, int16_t *out, uint8_t *dst,
int stride, int tx_type) {
void idct4x4_add(int16_t* /*in*/, int16_t *out, uint8_t *dst,
int stride, int /*tx_type*/) {
vp9_short_idct4x4_add_c(out, dst, stride >> 1);
}
void fht4x4(int16_t *in, int16_t *out, uint8_t *dst, int stride, int tx_type) {
void fht4x4(int16_t *in, int16_t *out, uint8_t* /*dst*/,
int stride, int tx_type) {
vp9_short_fht4x4_c(in, out, stride >> 1, tx_type);
}
void iht4x4_add(int16_t *in, int16_t *out, uint8_t *dst,
void iht4x4_add(int16_t* /*in*/, int16_t *out, uint8_t *dst,
int stride, int tx_type) {
vp9_short_iht4x4_add_c(out, dst, stride >> 1, tx_type);
}
class FwdTrans4x4Test : public ::testing::TestWithParam<int> {
public:
FwdTrans4x4Test() {SetUpTestTxfm();}
~FwdTrans4x4Test() {}
void SetUpTestTxfm() {
tx_type = GetParam();
if (tx_type == 0) {
fwd_txfm = fdct4x4;
inv_txfm = idct4x4_add;
virtual ~FwdTrans4x4Test() {}
virtual void SetUp() {
tx_type_ = GetParam();
if (tx_type_ == 0) {
fwd_txfm_ = fdct4x4;
inv_txfm_ = idct4x4_add;
} else {
fwd_txfm = fht4x4;
inv_txfm = iht4x4_add;
fwd_txfm_ = fht4x4;
inv_txfm_ = iht4x4_add;
}
}
protected:
void RunFwdTxfm(int16_t *in, int16_t *out, uint8_t *dst,
int stride, int tx_type) {
(*fwd_txfm)(in, out, dst, stride, tx_type);
(*fwd_txfm_)(in, out, dst, stride, tx_type);
}
void RunInvTxfm(int16_t *in, int16_t *out, uint8_t *dst,
int stride, int tx_type) {
(*inv_txfm)(in, out, dst, stride, tx_type);
(*inv_txfm_)(in, out, dst, stride, tx_type);
}
int tx_type;
void (*fwd_txfm)(int16_t *in, int16_t *out, uint8_t *dst,
int tx_type_;
void (*fwd_txfm_)(int16_t *in, int16_t *out, uint8_t *dst,
int stride, int tx_type);
void (*inv_txfm)(int16_t *in, int16_t *out, uint8_t *dst,
void (*inv_txfm_)(int16_t *in, int16_t *out, uint8_t *dst,
int stride, int tx_type);
};
TEST_P(FwdTrans4x4Test, SignBiasCheck) {
ACMRandom rnd(ACMRandom::DeterministicSeed());
int16_t test_input_block[16];
int16_t test_output_block[16];
DECLARE_ALIGNED_ARRAY(16, int16_t, test_input_block, 16);
DECLARE_ALIGNED_ARRAY(16, int16_t, test_output_block, 16);
const int pitch = 8;
int count_sign_block[16][2];
const int count_test_block = 1000000;
@@ -87,7 +88,7 @@ TEST_P(FwdTrans4x4Test, SignBiasCheck) {
for (int j = 0; j < 16; ++j)
test_input_block[j] = rnd.Rand8() - rnd.Rand8();
RunFwdTxfm(test_input_block, test_output_block, NULL, pitch, tx_type);
RunFwdTxfm(test_input_block, test_output_block, NULL, pitch, tx_type_);
for (int j = 0; j < 16; ++j) {
if (test_output_block[j] < 0)
@@ -103,7 +104,7 @@ TEST_P(FwdTrans4x4Test, SignBiasCheck) {
EXPECT_TRUE(bias_acceptable)
<< "Error: 4x4 FDCT/FHT has a sign bias > 1%"
<< " for input range [-255, 255] at index " << j
<< " tx_type " << tx_type;
<< " tx_type " << tx_type_;
}
memset(count_sign_block, 0, sizeof(count_sign_block));
@@ -112,7 +113,7 @@ TEST_P(FwdTrans4x4Test, SignBiasCheck) {
for (int j = 0; j < 16; ++j)
test_input_block[j] = (rnd.Rand8() >> 4) - (rnd.Rand8() >> 4);
RunFwdTxfm(test_input_block, test_output_block, NULL, pitch, tx_type);
RunFwdTxfm(test_input_block, test_output_block, NULL, pitch, tx_type_);
for (int j = 0; j < 16; ++j) {
if (test_output_block[j] < 0)
@@ -135,12 +136,13 @@ TEST_P(FwdTrans4x4Test, RoundTripErrorCheck) {
ACMRandom rnd(ACMRandom::DeterministicSeed());
int max_error = 0;
double total_error = 0;
int total_error = 0;
const int count_test_block = 1000000;
for (int i = 0; i < count_test_block; ++i) {
int16_t test_input_block[16];
int16_t test_temp_block[16];
uint8_t dst[16], src[16];
DECLARE_ALIGNED_ARRAY(16, int16_t, test_input_block, 16);
DECLARE_ALIGNED_ARRAY(16, int16_t, test_temp_block, 16);
DECLARE_ALIGNED_ARRAY(16, uint8_t, dst, 16);
DECLARE_ALIGNED_ARRAY(16, uint8_t, src, 16);
for (int j = 0; j < 16; ++j) {
src[j] = rnd.Rand8();
@@ -151,10 +153,10 @@ TEST_P(FwdTrans4x4Test, RoundTripErrorCheck) {
test_input_block[j] = src[j] - dst[j];
const int pitch = 8;
RunFwdTxfm(test_input_block, test_temp_block, dst, pitch, tx_type);
RunFwdTxfm(test_input_block, test_temp_block, dst, pitch, tx_type_);
for (int j = 0; j < 16; ++j) {
if(test_temp_block[j] > 0) {
if (test_temp_block[j] > 0) {
test_temp_block[j] += 2;
test_temp_block[j] /= 4;
test_temp_block[j] *= 4;
@@ -166,7 +168,7 @@ TEST_P(FwdTrans4x4Test, RoundTripErrorCheck) {
}
// inverse transform and reconstruct the pixel block
RunInvTxfm(test_input_block, test_temp_block, dst, pitch, tx_type);
RunInvTxfm(test_input_block, test_temp_block, dst, pitch, tx_type_);
for (int j = 0; j < 16; ++j) {
const int diff = dst[j] - src[j];
@@ -181,7 +183,7 @@ TEST_P(FwdTrans4x4Test, RoundTripErrorCheck) {
EXPECT_GE(count_test_block, total_error)
<< "Error: FDCT/IDCT or FHT/IHT has average "
"roundtrip error > 1 per block";
<< "roundtrip error > 1 per block";
}
INSTANTIATE_TEST_CASE_P(VP9, FwdTrans4x4Test, ::testing::Range(0, 4));

View File

@@ -13,23 +13,78 @@
#include <string.h>
#include "third_party/googletest/src/include/gtest/gtest.h"
#include "test/clear_system_state.h"
#include "test/register_state_check.h"
#include "vpx_ports/mem.h"
extern "C" {
#include "vp9_rtcd.h"
void vp9_short_idct8x8_add_c(short *input, uint8_t *output, int pitch);
#include "./vp9_rtcd.h"
void vp9_short_idct8x8_add_c(int16_t *input, uint8_t *output, int pitch);
}
#include "acm_random.h"
#include "test/acm_random.h"
#include "vpx/vpx_integer.h"
using libvpx_test::ACMRandom;
namespace {
void fdct8x8(int16_t *in, int16_t *out, uint8_t* /*dst*/,
int stride, int /*tx_type*/) {
vp9_short_fdct8x8_c(in, out, stride);
}
void idct8x8_add(int16_t* /*in*/, int16_t *out, uint8_t *dst,
int stride, int /*tx_type*/) {
vp9_short_idct8x8_add_c(out, dst, stride >> 1);
}
void fht8x8(int16_t *in, int16_t *out, uint8_t* /*dst*/,
int stride, int tx_type) {
// TODO(jingning): need to refactor this to test both _c and _sse2 functions,
// when we have all inverse dct functions done sse2.
#if HAVE_SSE2
vp9_short_fht8x8_sse2(in, out, stride >> 1, tx_type);
#else
vp9_short_fht8x8_c(in, out, stride >> 1, tx_type);
#endif
}
void iht8x8_add(int16_t* /*in*/, int16_t *out, uint8_t *dst,
int stride, int tx_type) {
vp9_short_iht8x8_add_c(out, dst, stride >> 1, tx_type);
}
TEST(VP9Fdct8x8Test, SignBiasCheck) {
class FwdTrans8x8Test : public ::testing::TestWithParam<int> {
public:
virtual ~FwdTrans8x8Test() {}
virtual void SetUp() {
tx_type_ = GetParam();
if (tx_type_ == 0) {
fwd_txfm = fdct8x8;
inv_txfm = idct8x8_add;
} else {
fwd_txfm = fht8x8;
inv_txfm = iht8x8_add;
}
}
virtual void TearDown() { libvpx_test::ClearSystemState(); }
protected:
void RunFwdTxfm(int16_t *in, int16_t *out, uint8_t *dst,
int stride, int tx_type) {
(*fwd_txfm)(in, out, dst, stride, tx_type);
}
void RunInvTxfm(int16_t *in, int16_t *out, uint8_t *dst,
int stride, int tx_type) {
(*inv_txfm)(in, out, dst, stride, tx_type);
}
int tx_type_;
void (*fwd_txfm)(int16_t*, int16_t*, uint8_t*, int, int);
void (*inv_txfm)(int16_t*, int16_t*, uint8_t*, int, int);
};
TEST_P(FwdTrans8x8Test, SignBiasCheck) {
ACMRandom rnd(ACMRandom::DeterministicSeed());
int16_t test_input_block[64];
int16_t test_output_block[64];
DECLARE_ALIGNED_ARRAY(16, int16_t, test_input_block, 64);
DECLARE_ALIGNED_ARRAY(16, int16_t, test_output_block, 64);
const int pitch = 16;
int count_sign_block[64][2];
const int count_test_block = 100000;
@@ -40,8 +95,9 @@ TEST(VP9Fdct8x8Test, SignBiasCheck) {
// Initialize a test block with input range [-255, 255].
for (int j = 0; j < 64; ++j)
test_input_block[j] = rnd.Rand8() - rnd.Rand8();
vp9_short_fdct8x8_c(test_input_block, test_output_block, pitch);
REGISTER_STATE_CHECK(
RunFwdTxfm(test_input_block, test_output_block,
NULL, pitch, tx_type_));
for (int j = 0; j < 64; ++j) {
if (test_output_block[j] < 0)
@@ -55,7 +111,7 @@ TEST(VP9Fdct8x8Test, SignBiasCheck) {
const int diff = abs(count_sign_block[j][0] - count_sign_block[j][1]);
const int max_diff = 1125;
EXPECT_LT(diff, max_diff)
<< "Error: 8x8 FDCT has a sign bias > "
<< "Error: 8x8 FDCT/FHT has a sign bias > "
<< 1. * max_diff / count_test_block * 100 << "%"
<< " for input range [-255, 255] at index " << j
<< " count0: " << count_sign_block[j][0]
@@ -69,8 +125,9 @@ TEST(VP9Fdct8x8Test, SignBiasCheck) {
// Initialize a test block with input range [-15, 15].
for (int j = 0; j < 64; ++j)
test_input_block[j] = (rnd.Rand8() >> 4) - (rnd.Rand8() >> 4);
vp9_short_fdct8x8_c(test_input_block, test_output_block, pitch);
REGISTER_STATE_CHECK(
RunFwdTxfm(test_input_block, test_output_block,
NULL, pitch, tx_type_));
for (int j = 0; j < 64; ++j) {
if (test_output_block[j] < 0)
@@ -84,24 +141,25 @@ TEST(VP9Fdct8x8Test, SignBiasCheck) {
const int diff = abs(count_sign_block[j][0] - count_sign_block[j][1]);
const int max_diff = 10000;
EXPECT_LT(diff, max_diff)
<< "Error: 4x4 FDCT has a sign bias > "
<< "Error: 4x4 FDCT/FHT has a sign bias > "
<< 1. * max_diff / count_test_block * 100 << "%"
<< " for input range [-15, 15] at index " << j
<< " count0: " << count_sign_block[j][0]
<< " count1: " << count_sign_block[j][1]
<< " diff: " << diff;
}
};
}
TEST(VP9Fdct8x8Test, RoundTripErrorCheck) {
TEST_P(FwdTrans8x8Test, RoundTripErrorCheck) {
ACMRandom rnd(ACMRandom::DeterministicSeed());
int max_error = 0;
double total_error = 0;
int total_error = 0;
const int count_test_block = 100000;
for (int i = 0; i < count_test_block; ++i) {
int16_t test_input_block[64];
int16_t test_temp_block[64];
uint8_t dst[64], src[64];
DECLARE_ALIGNED_ARRAY(16, int16_t, test_input_block, 64);
DECLARE_ALIGNED_ARRAY(16, int16_t, test_temp_block, 64);
DECLARE_ALIGNED_ARRAY(16, uint8_t, dst, 64);
DECLARE_ALIGNED_ARRAY(16, uint8_t, src, 64);
for (int j = 0; j < 64; ++j) {
src[j] = rnd.Rand8();
@@ -112,9 +170,11 @@ TEST(VP9Fdct8x8Test, RoundTripErrorCheck) {
test_input_block[j] = src[j] - dst[j];
const int pitch = 16;
vp9_short_fdct8x8_c(test_input_block, test_temp_block, pitch);
for (int j = 0; j < 64; ++j){
if(test_temp_block[j] > 0) {
REGISTER_STATE_CHECK(
RunFwdTxfm(test_input_block, test_temp_block,
dst, pitch, tx_type_));
for (int j = 0; j < 64; ++j) {
if (test_temp_block[j] > 0) {
test_temp_block[j] += 2;
test_temp_block[j] /= 4;
test_temp_block[j] *= 4;
@@ -124,7 +184,9 @@ TEST(VP9Fdct8x8Test, RoundTripErrorCheck) {
test_temp_block[j] *= 4;
}
}
vp9_short_idct8x8_add_c(test_temp_block, dst, 8);
REGISTER_STATE_CHECK(
RunInvTxfm(test_input_block, test_temp_block,
dst, pitch, tx_type_));
for (int j = 0; j < 64; ++j) {
const int diff = dst[j] - src[j];
@@ -136,21 +198,23 @@ TEST(VP9Fdct8x8Test, RoundTripErrorCheck) {
}
EXPECT_GE(1, max_error)
<< "Error: 8x8 FDCT/IDCT has an individual roundtrip error > 1";
<< "Error: 8x8 FDCT/IDCT or FHT/IHT has an individual roundtrip error > 1";
EXPECT_GE(count_test_block/5, total_error)
<< "Error: 8x8 FDCT/IDCT has average roundtrip error > 1/5 per block";
};
<< "Error: 8x8 FDCT/IDCT or FHT/IHT has average roundtrip "
"error > 1/5 per block";
}
TEST(VP9Fdct8x8Test, ExtremalCheck) {
TEST_P(FwdTrans8x8Test, ExtremalCheck) {
ACMRandom rnd(ACMRandom::DeterministicSeed());
int max_error = 0;
double total_error = 0;
int total_error = 0;
const int count_test_block = 100000;
for (int i = 0; i < count_test_block; ++i) {
int16_t test_input_block[64];
int16_t test_temp_block[64];
uint8_t dst[64], src[64];
DECLARE_ALIGNED_ARRAY(16, int16_t, test_input_block, 64);
DECLARE_ALIGNED_ARRAY(16, int16_t, test_temp_block, 64);
DECLARE_ALIGNED_ARRAY(16, uint8_t, dst, 64);
DECLARE_ALIGNED_ARRAY(16, uint8_t, src, 64);
for (int j = 0; j < 64; ++j) {
src[j] = rnd.Rand8() % 2 ? 255 : 0;
@@ -161,8 +225,12 @@ TEST(VP9Fdct8x8Test, ExtremalCheck) {
test_input_block[j] = src[j] - dst[j];
const int pitch = 16;
vp9_short_fdct8x8_c(test_input_block, test_temp_block, pitch);
vp9_short_idct8x8_add_c(test_temp_block, dst, 8);
REGISTER_STATE_CHECK(
RunFwdTxfm(test_input_block, test_temp_block,
dst, pitch, tx_type_));
REGISTER_STATE_CHECK(
RunInvTxfm(test_input_block, test_temp_block,
dst, pitch, tx_type_));
for (int j = 0; j < 64; ++j) {
const int diff = dst[j] - src[j];
@@ -173,13 +241,14 @@ TEST(VP9Fdct8x8Test, ExtremalCheck) {
}
EXPECT_GE(1, max_error)
<< "Error: Extremal 8x8 FDCT/IDCT has an"
<< "Error: Extremal 8x8 FDCT/IDCT or FHT/IHT has an"
<< " individual roundtrip error > 1";
EXPECT_GE(count_test_block/5, total_error)
<< "Error: Extremal 8x8 FDCT/IDCT has average"
<< "Error: Extremal 8x8 FDCT/IDCT or FHT/IHT has average"
<< " roundtrip error > 1/5 per block";
}
};
}
INSTANTIATE_TEST_CASE_P(VP9, FwdTrans8x8Test, ::testing::Range(0, 4));
} // namespace

View File

@@ -11,6 +11,7 @@
#define TEST_I420_VIDEO_SOURCE_H_
#include <cstdio>
#include <cstdlib>
#include <string>
#include "test/video_source.h"
@@ -34,7 +35,6 @@ class I420VideoSource : public VideoSource {
height_(0),
framerate_numerator_(rate_numerator),
framerate_denominator_(rate_denominator) {
// This initializes raw_sz_, width_, height_ and allocates an img.
SetSize(width, height);
}
@@ -49,7 +49,7 @@ class I420VideoSource : public VideoSource {
if (input_file_)
fclose(input_file_);
input_file_ = OpenTestDataFile(file_name_);
ASSERT_TRUE(input_file_) << "Input file open failed. Filename: "
ASSERT_TRUE(input_file_ != NULL) << "Input file open failed. Filename: "
<< file_name_;
if (start_) {
fseek(input_file_, raw_sz_ * start_, SEEK_SET);
@@ -92,6 +92,7 @@ class I420VideoSource : public VideoSource {
}
virtual void FillFrame() {
ASSERT_TRUE(input_file_ != NULL);
// Read a frame from input_file.
if (fread(img_->img_data, raw_sz_, 1, input_file_) == 0) {
limit_ = frame_;
@@ -108,8 +109,8 @@ class I420VideoSource : public VideoSource {
unsigned int frame_;
unsigned int width_;
unsigned int height_;
unsigned int framerate_numerator_;
unsigned int framerate_denominator_;
int framerate_numerator_;
int framerate_denominator_;
};
} // namespace libvpx_test

View File

@@ -15,10 +15,10 @@
#include "third_party/googletest/src/include/gtest/gtest.h"
extern "C" {
#include "vp9_rtcd.h"
#include "./vp9_rtcd.h"
}
#include "acm_random.h"
#include "test/acm_random.h"
#include "vpx/vpx_integer.h"
using libvpx_test::ACMRandom;
@@ -27,10 +27,10 @@ namespace {
#ifdef _MSC_VER
static int round(double x) {
if(x < 0)
return (int)ceil(x - 0.5);
if (x < 0)
return static_cast<int>(ceil(x - 0.5));
else
return (int)floor(x + 0.5);
return static_cast<int>(floor(x + 0.5));
}
#endif

View File

@@ -8,7 +8,6 @@
* be found in the AUTHORS file in the root of the source tree.
*/
extern "C" {
#include "./vpx_config.h"
#include "./vp8_rtcd.h"
@@ -17,105 +16,101 @@ extern "C" {
#include "test/register_state_check.h"
#include "third_party/googletest/src/include/gtest/gtest.h"
typedef void (*idct_fn_t)(short *input, unsigned char *pred_ptr,
#include "vpx/vpx_integer.h"
typedef void (*idct_fn_t)(int16_t *input, unsigned char *pred_ptr,
int pred_stride, unsigned char *dst_ptr,
int dst_stride);
namespace {
class IDCTTest : public ::testing::TestWithParam<idct_fn_t> {
protected:
virtual void SetUp() {
int i;
protected:
virtual void SetUp() {
int i;
UUT = GetParam();
memset(input, 0, sizeof(input));
/* Set up guard blocks */
for (i = 0; i < 256; i++)
output[i] = ((i & 0xF) < 4 && (i < 64)) ? 0 : -1;
}
UUT = GetParam();
memset(input, 0, sizeof(input));
/* Set up guard blocks */
for (i = 0; i < 256; i++) output[i] = ((i & 0xF) < 4 && (i < 64)) ? 0 : -1;
}
virtual void TearDown() {
libvpx_test::ClearSystemState();
}
virtual void TearDown() { libvpx_test::ClearSystemState(); }
idct_fn_t UUT;
short input[16];
unsigned char output[256];
unsigned char predict[256];
idct_fn_t UUT;
int16_t input[16];
unsigned char output[256];
unsigned char predict[256];
};
TEST_P(IDCTTest, TestGuardBlocks) {
int i;
int i;
for (i = 0; i < 256; i++)
if ((i & 0xF) < 4 && i < 64)
EXPECT_EQ(0, output[i]) << i;
else
EXPECT_EQ(255, output[i]);
for (i = 0; i < 256; i++)
if ((i & 0xF) < 4 && i < 64)
EXPECT_EQ(0, output[i]) << i;
else
EXPECT_EQ(255, output[i]);
}
TEST_P(IDCTTest, TestAllZeros) {
int i;
int i;
REGISTER_STATE_CHECK(UUT(input, output, 16, output, 16));
REGISTER_STATE_CHECK(UUT(input, output, 16, output, 16));
for (i = 0; i < 256; i++)
if ((i & 0xF) < 4 && i < 64)
EXPECT_EQ(0, output[i]) << "i==" << i;
else
EXPECT_EQ(255, output[i]) << "i==" << i;
for (i = 0; i < 256; i++)
if ((i & 0xF) < 4 && i < 64)
EXPECT_EQ(0, output[i]) << "i==" << i;
else
EXPECT_EQ(255, output[i]) << "i==" << i;
}
TEST_P(IDCTTest, TestAllOnes) {
int i;
int i;
input[0] = 4;
REGISTER_STATE_CHECK(UUT(input, output, 16, output, 16));
input[0] = 4;
REGISTER_STATE_CHECK(UUT(input, output, 16, output, 16));
for (i = 0; i < 256; i++)
if ((i & 0xF) < 4 && i < 64)
EXPECT_EQ(1, output[i]) << "i==" << i;
else
EXPECT_EQ(255, output[i]) << "i==" << i;
for (i = 0; i < 256; i++)
if ((i & 0xF) < 4 && i < 64)
EXPECT_EQ(1, output[i]) << "i==" << i;
else
EXPECT_EQ(255, output[i]) << "i==" << i;
}
TEST_P(IDCTTest, TestAddOne) {
int i;
int i;
for (i = 0; i < 256; i++)
predict[i] = i;
input[0] = 4;
REGISTER_STATE_CHECK(UUT(input, predict, 16, output, 16));
for (i = 0; i < 256; i++) predict[i] = i;
input[0] = 4;
REGISTER_STATE_CHECK(UUT(input, predict, 16, output, 16));
for (i = 0; i < 256; i++)
if ((i & 0xF) < 4 && i < 64)
EXPECT_EQ(i+1, output[i]) << "i==" << i;
else
EXPECT_EQ(255, output[i]) << "i==" << i;
for (i = 0; i < 256; i++)
if ((i & 0xF) < 4 && i < 64)
EXPECT_EQ(i + 1, output[i]) << "i==" << i;
else
EXPECT_EQ(255, output[i]) << "i==" << i;
}
TEST_P(IDCTTest, TestWithData) {
int i;
int i;
for (i = 0; i < 16; i++)
input[i] = i;
for (i = 0; i < 16; i++) input[i] = i;
REGISTER_STATE_CHECK(UUT(input, output, 16, output, 16));
REGISTER_STATE_CHECK(UUT(input, output, 16, output, 16));
for (i = 0; i < 256; i++)
if ((i & 0xF) > 3 || i > 63)
EXPECT_EQ(255, output[i]) << "i==" << i;
else if (i == 0)
EXPECT_EQ(11, output[i]) << "i==" << i;
else if (i == 34)
EXPECT_EQ(1, output[i]) << "i==" << i;
else if (i == 2 || i == 17 || i == 32)
EXPECT_EQ(3, output[i]) << "i==" << i;
else
EXPECT_EQ(0, output[i]) << "i==" << i;
for (i = 0; i < 256; i++)
if ((i & 0xF) > 3 || i > 63)
EXPECT_EQ(255, output[i]) << "i==" << i;
else if (i == 0)
EXPECT_EQ(11, output[i]) << "i==" << i;
else if (i == 34)
EXPECT_EQ(1, output[i]) << "i==" << i;
else if (i == 2 || i == 17 || i == 32)
EXPECT_EQ(3, output[i]) << "i==" << i;
else
EXPECT_EQ(0, output[i]) << "i==" << i;
}
INSTANTIATE_TEST_CASE_P(C, IDCTTest,
::testing::Values(vp8_short_idct4x4llm_c));
INSTANTIATE_TEST_CASE_P(C, IDCTTest, ::testing::Values(vp8_short_idct4x4llm_c));
#if HAVE_MMX
INSTANTIATE_TEST_CASE_P(MMX, IDCTTest,
::testing::Values(vp8_short_idct4x4llm_mmx));

View File

@@ -15,8 +15,8 @@
#include "test/register_state_check.h"
#include "third_party/googletest/src/include/gtest/gtest.h"
extern "C" {
#include "vpx_config.h"
#include "vp8_rtcd.h"
#include "./vpx_config.h"
#include "./vp8_rtcd.h"
#include "vp8/common/blockd.h"
#include "vpx_mem/vpx_mem.h"
}
@@ -27,6 +27,8 @@ using libvpx_test::ACMRandom;
class IntraPredBase {
public:
virtual ~IntraPredBase() {}
virtual void TearDown() {
libvpx_test::ClearSystemState();
}
@@ -104,9 +106,9 @@ class IntraPredBase {
for (int y = 0; y < block_size_; y++)
sum += data_ptr_[p][y * stride_ - 1];
expected = (sum + (1 << (shift - 1))) >> shift;
} else
} else {
expected = 0x80;
}
// check that all subsequent lines are equal to the first
for (int y = 1; y < block_size_; ++y)
ASSERT_EQ(0, memcmp(data_ptr_[p], &data_ptr_[p][y * stride_],

View File

@@ -28,7 +28,7 @@ static unsigned int MemGetLe32(const uint8_t *mem) {
// so that we can do actual file decodes.
class IVFVideoSource : public CompressedVideoSource {
public:
IVFVideoSource(const std::string &file_name)
explicit IVFVideoSource(const std::string &file_name)
: file_name_(file_name),
input_file_(NULL),
compressed_frame_buf_(NULL),
@@ -47,12 +47,13 @@ class IVFVideoSource : public CompressedVideoSource {
virtual void Init() {
// Allocate a buffer for read in the compressed video frame.
compressed_frame_buf_ = new uint8_t[libvpx_test::kCodeBufferSize];
ASSERT_TRUE(compressed_frame_buf_) << "Allocate frame buffer failed";
ASSERT_TRUE(compressed_frame_buf_ != NULL)
<< "Allocate frame buffer failed";
}
virtual void Begin() {
input_file_ = OpenTestDataFile(file_name_);
ASSERT_TRUE(input_file_) << "Input file open failed. Filename: "
ASSERT_TRUE(input_file_ != NULL) << "Input file open failed. Filename: "
<< file_name_;
// Read file header
@@ -72,6 +73,7 @@ class IVFVideoSource : public CompressedVideoSource {
}
void FillFrame() {
ASSERT_TRUE(input_file_ != NULL);
uint8_t frame_hdr[kIvfFrameHdrSize];
// Check frame header and read a frame from input_file.
if (fread(frame_hdr, 1, kIvfFrameHdrSize, input_file_)

View File

@@ -31,10 +31,6 @@ class KeyframeTest : public ::libvpx_test::EncoderTest,
set_cpu_used_ = 0;
}
virtual bool Continue() const {
return !HasFatalFailure() && !abort_;
}
virtual void PreEncodeFrameHook(::libvpx_test::VideoSource *video,
::libvpx_test::Encoder *encoder) {
if (kf_do_force_kf_)
@@ -136,7 +132,6 @@ TEST_P(KeyframeTest, TestAutoKeyframe) {
// Verify that keyframes match the file keyframes in the file.
for (std::vector<vpx_codec_pts_t>::const_iterator iter = kf_pts_list_.begin();
iter != kf_pts_list_.end(); ++iter) {
if (deadline_ == VPX_DL_REALTIME && *iter > 0)
EXPECT_EQ(0, (*iter - 1) % 30) << "Unexpected keyframe at frame "
<< *iter;

View File

@@ -8,8 +8,8 @@
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef LIBVPX_TEST_MD5_HELPER_H_
#define LIBVPX_TEST_MD5_HELPER_H_
#ifndef TEST_MD5_HELPER_H_
#define TEST_MD5_HELPER_H_
extern "C" {
#include "./md5_utils.h"
@@ -25,9 +25,15 @@ class MD5 {
void Add(const vpx_image_t *img) {
for (int plane = 0; plane < 3; ++plane) {
uint8_t *buf = img->planes[plane];
const int h = plane ? (img->d_h + 1) >> 1 : img->d_h;
const int w = plane ? (img->d_w + 1) >> 1 : img->d_w;
const uint8_t *buf = img->planes[plane];
// Calculate the width and height to do the md5 check. For the chroma
// plane, we never want to round down and thus skip a pixel so if
// we are shifting by 1 (chroma_shift) we add 1 before doing the shift.
// This works only for chroma_shift of 0 and 1.
const int h = plane ? (img->d_h + img->y_chroma_shift) >>
img->y_chroma_shift : img->d_h;
const int w = plane ? (img->d_w + img->x_chroma_shift) >>
img->x_chroma_shift : img->d_w;
for (int y = 0; y < h; ++y) {
MD5Update(&md5_, buf, w);
@@ -61,4 +67,4 @@ class MD5 {
} // namespace libvpx_test
#endif // LIBVPX_TEST_MD5_HELPER_H_
#endif // TEST_MD5_HELPER_H_

View File

@@ -11,8 +11,8 @@
#include "test/register_state_check.h"
#include "third_party/googletest/src/include/gtest/gtest.h"
extern "C" {
#include "vpx_config.h"
#include "vp8_rtcd.h"
#include "./vpx_config.h"
#include "./vp8_rtcd.h"
#include "vpx/vpx_integer.h"
#include "vpx_mem/vpx_mem.h"
}
@@ -63,7 +63,8 @@ TEST_P(Vp8PostProcessingFilterTest, FilterOutputCheck) {
// Pointers to top-left pixel of block in the input and output images.
uint8_t *const src_image_ptr = src_image + (input_stride << 1);
uint8_t *const dst_image_ptr = dst_image + 8;
uint8_t *const flimits = reinterpret_cast<uint8_t *>(vpx_memalign(16, block_width));
uint8_t *const flimits =
reinterpret_cast<uint8_t *>(vpx_memalign(16, block_width));
(void)vpx_memset(flimits, 255, block_width);
// Initialize pixels in the input:

View File

@@ -8,8 +8,8 @@
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef LIBVPX_TEST_REGISTER_STATE_CHECK_H_
#define LIBVPX_TEST_REGISTER_STATE_CHECK_H_
#ifndef TEST_REGISTER_STATE_CHECK_H_
#define TEST_REGISTER_STATE_CHECK_H_
#ifdef _WIN64
@@ -92,4 +92,4 @@ class RegisterStateCheck {};
#endif // _WIN64
#endif // LIBVPX_TEST_REGISTER_STATE_CHECK_H_
#endif // TEST_REGISTER_STATE_CHECK_H_

View File

@@ -16,8 +16,68 @@
#include "test/video_source.h"
#include "test/util.h"
// Enable(1) or Disable(0) writing of the compressed bitstream.
#define WRITE_COMPRESSED_STREAM 0
namespace {
#if WRITE_COMPRESSED_STREAM
static void mem_put_le16(char *const mem, const unsigned int val) {
mem[0] = val;
mem[1] = val >> 8;
}
static void mem_put_le32(char *const mem, const unsigned int val) {
mem[0] = val;
mem[1] = val >> 8;
mem[2] = val >> 16;
mem[3] = val >> 24;
}
static void write_ivf_file_header(const vpx_codec_enc_cfg_t *const cfg,
int frame_cnt, FILE *const outfile) {
char header[32];
header[0] = 'D';
header[1] = 'K';
header[2] = 'I';
header[3] = 'F';
mem_put_le16(header + 4, 0); /* version */
mem_put_le16(header + 6, 32); /* headersize */
mem_put_le32(header + 8, 0x30395056); /* fourcc (vp9) */
mem_put_le16(header + 12, cfg->g_w); /* width */
mem_put_le16(header + 14, cfg->g_h); /* height */
mem_put_le32(header + 16, cfg->g_timebase.den); /* rate */
mem_put_le32(header + 20, cfg->g_timebase.num); /* scale */
mem_put_le32(header + 24, frame_cnt); /* length */
mem_put_le32(header + 28, 0); /* unused */
(void)fwrite(header, 1, 32, outfile);
}
static void write_ivf_frame_size(FILE *const outfile, const size_t size) {
char header[4];
mem_put_le32(header, static_cast<unsigned int>(size));
(void)fwrite(header, 1, 4, outfile);
}
static void write_ivf_frame_header(const vpx_codec_cx_pkt_t *const pkt,
FILE *const outfile) {
char header[12];
vpx_codec_pts_t pts;
if (pkt->kind != VPX_CODEC_CX_FRAME_PKT)
return;
pts = pkt->data.frame.pts;
mem_put_le32(header, static_cast<unsigned int>(pkt->data.frame.sz));
mem_put_le32(header + 4, pts & 0xFFFFFFFF);
mem_put_le32(header + 8, pts >> 32);
(void)fwrite(header, 1, 12, outfile);
}
#endif // WRITE_COMPRESSED_STREAM
const unsigned int kInitialWidth = 320;
const unsigned int kInitialHeight = 240;
@@ -42,6 +102,8 @@ class ResizingVideoSource : public ::libvpx_test::DummyVideoSource {
limit_ = 60;
}
virtual ~ResizingVideoSource() {}
protected:
virtual void Next() {
++frame_;
@@ -56,13 +118,15 @@ class ResizeTest : public ::libvpx_test::EncoderTest,
protected:
ResizeTest() : EncoderTest(GET_PARAM(0)) {}
virtual ~ResizeTest() {}
struct FrameInfo {
FrameInfo(vpx_codec_pts_t _pts, unsigned int _w, unsigned int _h)
: pts(_pts), w(_w), h(_h) {}
vpx_codec_pts_t pts;
unsigned int w;
unsigned int h;
unsigned int w;
unsigned int h;
};
virtual void SetUp() {
@@ -70,10 +134,6 @@ class ResizeTest : public ::libvpx_test::EncoderTest,
SetMode(GET_PARAM(1));
}
virtual bool Continue() const {
return !HasFatalFailure() && !abort_;
}
virtual void DecompressedFrameHook(const vpx_image_t &img,
vpx_codec_pts_t pts) {
frame_info_list_.push_back(FrameInfo(pts, img.d_w, img.d_h));
@@ -99,17 +159,47 @@ TEST_P(ResizeTest, TestExternalResizeWorks) {
}
}
const unsigned int kStepDownFrame = 3;
const unsigned int kStepUpFrame = 6;
class ResizeInternalTest : public ResizeTest {
protected:
#if WRITE_COMPRESSED_STREAM
ResizeInternalTest()
: ResizeTest(),
frame0_psnr_(0.0),
outfile_(NULL),
out_frames_(0) {}
#else
ResizeInternalTest() : ResizeTest(), frame0_psnr_(0.0) {}
#endif
virtual ~ResizeInternalTest() {}
virtual void BeginPassHook(unsigned int /*pass*/) {
#if WRITE_COMPRESSED_STREAM
outfile_ = fopen("vp90-2-05-resize.ivf", "wb");
#endif
}
virtual void EndPassHook() {
#if WRITE_COMPRESSED_STREAM
if (outfile_) {
if (!fseek(outfile_, 0, SEEK_SET))
write_ivf_file_header(&cfg_, out_frames_, outfile_);
fclose(outfile_);
outfile_ = NULL;
}
#endif
}
virtual void PreEncodeFrameHook(libvpx_test::VideoSource *video,
libvpx_test::Encoder *encoder) {
if (video->frame() == 3) {
if (video->frame() == kStepDownFrame) {
struct vpx_scaling_mode mode = {VP8E_FOURFIVE, VP8E_THREEFIVE};
encoder->Control(VP8E_SET_SCALEMODE, &mode);
}
if (video->frame() == 6) {
if (video->frame() == kStepUpFrame) {
struct vpx_scaling_mode mode = {VP8E_NORMAL, VP8E_NORMAL};
encoder->Control(VP8E_SET_SCALEMODE, &mode);
}
@@ -121,21 +211,46 @@ class ResizeInternalTest : public ResizeTest {
EXPECT_NEAR(pkt->data.psnr.psnr[0], frame0_psnr_, 1.0);
}
virtual void FramePktHook(const vpx_codec_cx_pkt_t *pkt) {
#if WRITE_COMPRESSED_STREAM
++out_frames_;
// Write initial file header if first frame.
if (pkt->data.frame.pts == 0)
write_ivf_file_header(&cfg_, 0, outfile_);
// Write frame header and data.
write_ivf_frame_header(pkt, outfile_);
(void)fwrite(pkt->data.frame.buf, 1, pkt->data.frame.sz, outfile_);
#endif
}
double frame0_psnr_;
#if WRITE_COMPRESSED_STREAM
FILE *outfile_;
unsigned int out_frames_;
#endif
};
TEST_P(ResizeInternalTest, TestInternalResizeWorks) {
::libvpx_test::I420VideoSource video("hantro_collage_w352h288.yuv", 352, 288,
30, 1, 0, 10);
init_flags_ = VPX_CODEC_USE_PSNR;
// q picked such that initial keyframe on this clip is ~30dB PSNR
cfg_.rc_min_quantizer = cfg_.rc_max_quantizer = 48;
// If the number of frames being encoded is smaller than g_lag_in_frames
// the encoded frame is unavailable using the current API. Comparing
// frames to detect mismatch would then not be possible. Set
// g_lag_in_frames = 0 to get around this.
cfg_.g_lag_in_frames = 0;
ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
for (std::vector<FrameInfo>::iterator info = frame_info_list_.begin();
info != frame_info_list_.end(); ++info) {
const vpx_codec_pts_t pts = info->pts;
if (pts >= 3 && pts < 6) {
if (pts >= kStepDownFrame && pts < kStepUpFrame) {
ASSERT_EQ(282U, info->w) << "Frame " << pts << " had unexpected width";
ASSERT_EQ(173U, info->h) << "Frame " << pts << " had unexpected height";
} else {

View File

@@ -17,7 +17,6 @@ extern "C" {
#include "./vpx_config.h"
#if CONFIG_VP8_ENCODER
#include "./vp8_rtcd.h"
//#include "vp8/common/blockd.h"
#endif
#if CONFIG_VP9_ENCODER
#include "./vp9_rtcd.h"
@@ -428,6 +427,7 @@ INSTANTIATE_TEST_CASE_P(MMX, SADTest, ::testing::ValuesIn(mmx_tests));
#if HAVE_SSE
#if CONFIG_VP9_ENCODER
#if CONFIG_USE_X86INC
const sad_m_by_n_fn_t sad_4x4_sse_vp9 = vp9_sad4x4_sse;
const sad_m_by_n_fn_t sad_4x8_sse_vp9 = vp9_sad4x8_sse;
INSTANTIATE_TEST_CASE_P(SSE, SADTest, ::testing::Values(
@@ -441,6 +441,7 @@ INSTANTIATE_TEST_CASE_P(SSE, SADx4Test, ::testing::Values(
make_tuple(4, 4, sad_4x4x4d_sse)));
#endif
#endif
#endif
#if HAVE_SSE2
#if CONFIG_VP8_ENCODER
@@ -451,14 +452,20 @@ const sad_m_by_n_fn_t sad_8x8_wmt = vp8_sad8x8_wmt;
const sad_m_by_n_fn_t sad_4x4_wmt = vp8_sad4x4_wmt;
#endif
#if CONFIG_VP9_ENCODER
#if CONFIG_USE_X86INC
const sad_m_by_n_fn_t sad_64x64_sse2_vp9 = vp9_sad64x64_sse2;
const sad_m_by_n_fn_t sad_64x32_sse2_vp9 = vp9_sad64x32_sse2;
const sad_m_by_n_fn_t sad_32x64_sse2_vp9 = vp9_sad32x64_sse2;
const sad_m_by_n_fn_t sad_32x32_sse2_vp9 = vp9_sad32x32_sse2;
const sad_m_by_n_fn_t sad_32x16_sse2_vp9 = vp9_sad32x16_sse2;
const sad_m_by_n_fn_t sad_16x32_sse2_vp9 = vp9_sad16x32_sse2;
const sad_m_by_n_fn_t sad_16x16_sse2_vp9 = vp9_sad16x16_sse2;
const sad_m_by_n_fn_t sad_8x16_sse2_vp9 = vp9_sad8x16_sse2;
const sad_m_by_n_fn_t sad_16x8_sse2_vp9 = vp9_sad16x8_sse2;
const sad_m_by_n_fn_t sad_8x16_sse2_vp9 = vp9_sad8x16_sse2;
const sad_m_by_n_fn_t sad_8x8_sse2_vp9 = vp9_sad8x8_sse2;
const sad_m_by_n_fn_t sad_8x4_sse2_vp9 = vp9_sad8x4_sse2;
#endif
#endif
const sad_m_by_n_test_param_t sse2_tests[] = {
#if CONFIG_VP8_ENCODER
make_tuple(16, 16, sad_16x16_wmt),
@@ -468,18 +475,25 @@ const sad_m_by_n_test_param_t sse2_tests[] = {
make_tuple(4, 4, sad_4x4_wmt),
#endif
#if CONFIG_VP9_ENCODER
#if CONFIG_USE_X86INC
make_tuple(64, 64, sad_64x64_sse2_vp9),
make_tuple(64, 32, sad_64x32_sse2_vp9),
make_tuple(32, 64, sad_32x64_sse2_vp9),
make_tuple(32, 32, sad_32x32_sse2_vp9),
make_tuple(32, 16, sad_32x16_sse2_vp9),
make_tuple(16, 32, sad_16x32_sse2_vp9),
make_tuple(16, 16, sad_16x16_sse2_vp9),
make_tuple(8, 16, sad_8x16_sse2_vp9),
make_tuple(16, 8, sad_16x8_sse2_vp9),
make_tuple(8, 16, sad_8x16_sse2_vp9),
make_tuple(8, 8, sad_8x8_sse2_vp9),
make_tuple(8, 4, sad_8x4_sse2_vp9),
#endif
#endif
};
INSTANTIATE_TEST_CASE_P(SSE2, SADTest, ::testing::ValuesIn(sse2_tests));
#if CONFIG_VP9_ENCODER
#if CONFIG_USE_X86INC
const sad_n_by_n_by_4_fn_t sad_64x64x4d_sse2 = vp9_sad64x64x4d_sse2;
const sad_n_by_n_by_4_fn_t sad_64x32x4d_sse2 = vp9_sad64x32x4d_sse2;
const sad_n_by_n_by_4_fn_t sad_32x64x4d_sse2 = vp9_sad32x64x4d_sse2;
@@ -505,6 +519,7 @@ INSTANTIATE_TEST_CASE_P(SSE2, SADx4Test, ::testing::Values(
make_tuple(8, 4, sad_8x4x4d_sse2)));
#endif
#endif
#endif
#if HAVE_SSE3
#if CONFIG_VP8_ENCODER
@@ -523,9 +538,11 @@ INSTANTIATE_TEST_CASE_P(SSE3, SADx4Test, ::testing::Values(
#endif
#if HAVE_SSSE3
#if CONFIG_USE_X86INC
const sad_m_by_n_fn_t sad_16x16_sse3 = vp8_sad16x16_sse3;
INSTANTIATE_TEST_CASE_P(SSE3, SADTest, ::testing::Values(
make_tuple(16, 16, sad_16x16_sse3)));
#endif
#endif
} // namespace

View File

@@ -17,15 +17,19 @@
#include <sys/types.h>
#include "third_party/googletest/src/include/gtest/gtest.h"
#include "test/acm_random.h"
#include "vpx/vpx_integer.h"
#include "vpx_mem/vpx_mem.h"
extern "C" {
#include "vp8/encoder/onyx_int.h"
}
using libvpx_test::ACMRandom;
namespace {
TEST(Vp8RoiMapTest, ParameterCheck) {
ACMRandom rnd(ACMRandom::DeterministicSeed());
int delta_q[MAX_MB_SEGMENTS] = { -2, -25, 0, 31 };
int delta_lf[MAX_MB_SEGMENTS] = { -2, -25, 0, 31 };
unsigned int threshold[MAX_MB_SEGMENTS] = { 0, 100, 200, 300 };
@@ -121,10 +125,10 @@ TEST(Vp8RoiMapTest, ParameterCheck) {
for (int i = 0; i < 1000; ++i) {
int rand_deltas[4];
int deltas_valid;
rand_deltas[0] = (rand() % 160) - 80;
rand_deltas[1] = (rand() % 160) - 80;
rand_deltas[2] = (rand() % 160) - 80;
rand_deltas[3] = (rand() % 160) - 80;
rand_deltas[0] = rnd(160) - 80;
rand_deltas[1] = rnd(160) - 80;
rand_deltas[2] = rnd(160) - 80;
rand_deltas[3] = rnd(160) - 80;
deltas_valid = ((abs(rand_deltas[0]) <= 63) &&
(abs(rand_deltas[1]) <= 63) &&

View File

@@ -13,8 +13,8 @@
#include "test/clear_system_state.h"
#include "test/register_state_check.h"
extern "C" {
#include "vpx_config.h"
#include "vp8_rtcd.h"
#include "./vpx_config.h"
#include "./vp8_rtcd.h"
#include "vp8/common/blockd.h"
#include "vp8/encoder/block.h"
#include "vpx_mem/vpx_mem.h"
@@ -51,7 +51,7 @@ TEST_P(SubtractBlockTest, SimpleSubtract) {
bd.predictor = reinterpret_cast<unsigned char*>(
vpx_memalign(16, kBlockHeight * kDiffPredStride * sizeof(*bd.predictor)));
for(int i = 0; kSrcStride[i] > 0; ++i) {
for (int i = 0; kSrcStride[i] > 0; ++i) {
// start at block0
be.src = 0;
be.base_src = &source;
@@ -61,7 +61,7 @@ TEST_P(SubtractBlockTest, SimpleSubtract) {
int16_t *src_diff = be.src_diff;
for (int r = 0; r < kBlockHeight; ++r) {
for (int c = 0; c < kBlockWidth; ++c) {
src_diff[c] = 0xa5a5;
src_diff[c] = static_cast<int16_t>(0xa5a5);
}
src_diff += kDiffPredStride;
}

View File

@@ -33,10 +33,6 @@ class SuperframeTest : public ::libvpx_test::EncoderTest,
delete[] modified_buf_;
}
virtual bool Continue() const {
return !HasFatalFailure() && !abort_;
}
virtual void PreEncodeFrameHook(libvpx_test::VideoSource *video,
libvpx_test::Encoder *encoder) {
if (video->frame() == 1) {

View File

@@ -520,3 +520,12 @@ d17bc08eedfc60c4c23d576a6c964a21bf854d1f vp90-2-03-size-226x202.webm
83c6d8f2969b759e10e5c6542baca1265c874c29 vp90-2-03-size-226x224.webm.md5
fe0af2ee47b1e5f6a66db369e2d7e9d870b38dce vp90-2-03-size-226x226.webm
94ad19b8b699cea105e2ff18f0df2afd7242bcf7 vp90-2-03-size-226x226.webm.md5
b6524e4084d15b5d0caaa3d3d1368db30cbee69c vp90-2-03-deltaq.webm
65f45ec9a55537aac76104818278e0978f94a678 vp90-2-03-deltaq.webm.md5
4dbb87494c7f565ffc266c98d17d0d8c7a5c5aba vp90-2-05-resize.ivf
7f6d8879336239a43dbb6c9f13178cb11cf7ed09 vp90-2-05-resize.ivf.md5
bf61ddc1f716eba58d4c9837d4e91031d9ce4ffe vp90-2-06-bilinear.webm
f6235f937552e11d8eb331ec55da6b3aa596b9ac vp90-2-06-bilinear.webm.md5
495256cfd123fe777b2c0406862ed8468a1f4677 vp91-2-04-yv444.webm
65e3a7ffef61ab340d9140f335ecc49125970c2c vp91-2-04-yv444.webm.md5

View File

@@ -24,7 +24,9 @@ LIBVPX_TEST_SRCS-$(CONFIG_ENCODERS) += error_resilience_test.cc
LIBVPX_TEST_SRCS-$(CONFIG_ENCODERS) += i420_video_source.h
LIBVPX_TEST_SRCS-$(CONFIG_VP8_ENCODER) += keyframe_test.cc
LIBVPX_TEST_SRCS-$(CONFIG_VP9_ENCODER) += borders_test.cc
LIBVPX_TEST_SRCS-$(CONFIG_VP8_ENCODER) += resize_test.cc
LIBVPX_TEST_SRCS-$(CONFIG_VP9_ENCODER) += resize_test.cc
LIBVPX_TEST_SRCS-$(CONFIG_VP9_ENCODER) += cpu_speed_test.cc
LIBVPX_TEST_SRCS-$(CONFIG_VP9_ENCODER) += vp9_lossless_test.cc
LIBVPX_TEST_SRCS-$(CONFIG_DECODERS) += ../md5_utils.h ../md5_utils.c
LIBVPX_TEST_SRCS-yes += decode_test_driver.cc
@@ -87,6 +89,7 @@ LIBVPX_TEST_SRCS-yes += tile_independence_test.cc
endif
LIBVPX_TEST_SRCS-$(CONFIG_VP9) += convolve_test.cc
LIBVPX_TEST_SRCS-$(CONFIG_VP9_DECODER) += vp9_thread_test.cc
LIBVPX_TEST_SRCS-$(CONFIG_VP9_ENCODER) += fdct4x4_test.cc
LIBVPX_TEST_SRCS-$(CONFIG_VP9_ENCODER) += fdct8x8_test.cc
@@ -626,3 +629,11 @@ LIBVPX_TEST_DATA-$(CONFIG_VP9_DECODER) += vp90-2-03-size-226x224.webm
LIBVPX_TEST_DATA-$(CONFIG_VP9_DECODER) += vp90-2-03-size-226x224.webm.md5
LIBVPX_TEST_DATA-$(CONFIG_VP9_DECODER) += vp90-2-03-size-226x226.webm
LIBVPX_TEST_DATA-$(CONFIG_VP9_DECODER) += vp90-2-03-size-226x226.webm.md5
LIBVPX_TEST_DATA-$(CONFIG_VP9_DECODER) += vp90-2-03-deltaq.webm
LIBVPX_TEST_DATA-$(CONFIG_VP9_DECODER) += vp90-2-03-deltaq.webm.md5
LIBVPX_TEST_DATA-$(CONFIG_VP9_DECODER) += vp90-2-05-resize.ivf
LIBVPX_TEST_DATA-$(CONFIG_VP9_DECODER) += vp90-2-05-resize.ivf.md5
LIBVPX_TEST_DATA-$(CONFIG_VP9_DECODER) += vp90-2-06-bilinear.webm
LIBVPX_TEST_DATA-$(CONFIG_VP9_DECODER) += vp90-2-06-bilinear.webm.md5
LIBVPX_TEST_DATA-$(CONFIG_VP9_DECODER) += vp91-2-04-yv444.webm
LIBVPX_TEST_DATA-$(CONFIG_VP9_DECODER) += vp91-2-04-yv444.webm.md5

View File

@@ -8,7 +8,7 @@
* be found in the AUTHORS file in the root of the source tree.
*/
#include <string>
#include "vpx_config.h"
#include "./vpx_config.h"
extern "C" {
#if ARCH_X86 || ARCH_X86_64
#include "vpx_ports/x86.h"
@@ -48,7 +48,9 @@ int main(int argc, char **argv) {
#endif
#if !CONFIG_SHARED
/* Shared library builds don't support whitebox tests that exercise internal symbols. */
// Shared library builds don't support whitebox tests
// that exercise internal symbols.
#if CONFIG_VP8
vp8_rtcd();
#endif

View File

@@ -159,7 +159,11 @@ const char *kVP9TestVectors[] = {
"vp90-2-03-size-226x198.webm", "vp90-2-03-size-226x200.webm",
"vp90-2-03-size-226x202.webm", "vp90-2-03-size-226x208.webm",
"vp90-2-03-size-226x210.webm", "vp90-2-03-size-226x224.webm",
"vp90-2-03-size-226x226.webm"
"vp90-2-03-size-226x226.webm", "vp90-2-03-deltaq.webm",
"vp90-2-05-resize.ivf", "vp90-2-06-bilinear.webm",
#if CONFIG_NON420
"vp91-2-04-yv444.webm"
#endif
};
#endif
@@ -181,6 +185,7 @@ class TestVectorTest : public ::libvpx_test::DecoderTest,
virtual void DecompressedFrameHook(const vpx_image_t& img,
const unsigned int frame_number) {
ASSERT_TRUE(md5_file_ != NULL);
char expected_md5[33];
char junk[128];

View File

@@ -23,10 +23,13 @@ extern "C" {
namespace {
class TileIndependenceTest : public ::libvpx_test::EncoderTest,
public ::libvpx_test::CodecTestWithParam<int> {
public ::libvpx_test::CodecTestWithParam<int> {
protected:
TileIndependenceTest() : EncoderTest(GET_PARAM(0)), n_tiles_(GET_PARAM(1)),
md5_fw_order_(), md5_inv_order_() {
TileIndependenceTest()
: EncoderTest(GET_PARAM(0)),
md5_fw_order_(),
md5_inv_order_(),
n_tiles_(GET_PARAM(1)) {
init_flags_ = VPX_CODEC_USE_PSNR;
vpx_codec_dec_cfg_t cfg;
cfg.w = 704;
@@ -56,9 +59,8 @@ class TileIndependenceTest : public ::libvpx_test::EncoderTest,
void UpdateMD5(::libvpx_test::Decoder *dec, const vpx_codec_cx_pkt_t *pkt,
::libvpx_test::MD5 *md5) {
const vpx_codec_err_t res =
dec->DecodeFrame(reinterpret_cast<uint8_t*>(pkt->data.frame.buf),
pkt->data.frame.sz);
const vpx_codec_err_t res = dec->DecodeFrame(
reinterpret_cast<uint8_t*>(pkt->data.frame.buf), pkt->data.frame.sz);
if (res != VPX_CODEC_OK) {
abort_ = true;
ASSERT_EQ(VPX_CODEC_OK, res);
@@ -72,11 +74,11 @@ class TileIndependenceTest : public ::libvpx_test::EncoderTest,
UpdateMD5(inv_dec_, pkt, &md5_inv_order_);
}
private:
int n_tiles_;
protected:
::libvpx_test::MD5 md5_fw_order_, md5_inv_order_;
::libvpx_test::Decoder *fw_dec_, *inv_dec_;
private:
int n_tiles_;
};
// run an encode with 2 or 4 tiles, and do the decode both in normal and
@@ -93,7 +95,7 @@ TEST_P(TileIndependenceTest, MD5Match) {
timebase.den, timebase.num, 0, 30);
ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
const char *md5_fw_str = md5_fw_order_.Get();
const char *md5_fw_str = md5_fw_order_.Get();
const char *md5_inv_str = md5_inv_order_.Get();
// could use ASSERT_EQ(!memcmp(.., .., 16) here, but this gives nicer
@@ -102,7 +104,6 @@ TEST_P(TileIndependenceTest, MD5Match) {
ASSERT_STREQ(md5_fw_str, md5_inv_str);
}
VP9_INSTANTIATE_TEST_CASE(TileIndependenceTest,
::testing::Range(0, 2, 1));
VP9_INSTANTIATE_TEST_CASE(TileIndependenceTest, ::testing::Range(0, 2, 1));
} // namespace

View File

@@ -37,7 +37,7 @@ static double compute_psnr(const vpx_image_t *img1,
img2->planes[VPX_PLANE_Y][i * img2->stride[VPX_PLANE_Y] + j];
sqrerr += d * d;
}
double mse = sqrerr / (width_y * height_y);
double mse = static_cast<double>(sqrerr) / (width_y * height_y);
double psnr = 100.0;
if (mse > 0.0) {
psnr = 10 * log10(255.0 * 255.0 / mse);

View File

@@ -16,16 +16,16 @@
#include "test/register_state_check.h"
#include "vpx/vpx_integer.h"
#include "vpx_config.h"
#include "./vpx_config.h"
extern "C" {
#include "vpx_mem/vpx_mem.h"
#if CONFIG_VP8_ENCODER
# include "vp8/common/variance.h"
# include "vp8_rtcd.h"
# include "./vp8_rtcd.h"
#endif
#if CONFIG_VP9_ENCODER
# include "vp9/encoder/vp9_variance.h"
# include "vp9_rtcd.h"
# include "./vp9_rtcd.h"
#endif
}
#include "test/acm_random.h"
@@ -107,8 +107,8 @@ static unsigned int subpel_avg_variance_ref(const uint8_t *ref,
}
template<typename VarianceFunctionType>
class VarianceTest :
public ::testing::TestWithParam<tuple<int, int, VarianceFunctionType> > {
class VarianceTest
: public ::testing::TestWithParam<tuple<int, int, VarianceFunctionType> > {
public:
virtual void SetUp() {
const tuple<int, int, VarianceFunctionType>& params = this->GetParam();
@@ -191,9 +191,9 @@ void VarianceTest<VarianceFunctionType>::OneQuarterTest() {
}
template<typename SubpelVarianceFunctionType>
class SubpelVarianceTest :
public ::testing::TestWithParam<tuple<int, int,
SubpelVarianceFunctionType> > {
class SubpelVarianceTest
: public ::testing::TestWithParam<tuple<int, int,
SubpelVarianceFunctionType> > {
public:
virtual void SetUp() {
const tuple<int, int, SubpelVarianceFunctionType>& params =
@@ -218,6 +218,7 @@ class SubpelVarianceTest :
vpx_free(src_);
delete[] ref_;
vpx_free(sec_);
libvpx_test::ClearSystemState();
}
protected:
@@ -482,6 +483,7 @@ INSTANTIATE_TEST_CASE_P(
#endif
#if HAVE_SSE2
#if CONFIG_USE_X86INC
const vp9_variance_fn_t variance4x4_sse2 = vp9_variance4x4_sse2;
const vp9_variance_fn_t variance4x8_sse2 = vp9_variance4x8_sse2;
const vp9_variance_fn_t variance8x4_sse2 = vp9_variance8x4_sse2;
@@ -595,8 +597,11 @@ INSTANTIATE_TEST_CASE_P(
make_tuple(6, 5, subpel_avg_variance64x32_sse2),
make_tuple(6, 6, subpel_avg_variance64x64_sse2)));
#endif
#endif
#if HAVE_SSSE3
#if CONFIG_USE_X86INC
const vp9_subpixvariance_fn_t subpel_variance4x4_ssse3 =
vp9_sub_pixel_variance4x4_ssse3;
const vp9_subpixvariance_fn_t subpel_variance4x8_ssse3 =
@@ -681,6 +686,7 @@ INSTANTIATE_TEST_CASE_P(
make_tuple(6, 5, subpel_avg_variance64x32_ssse3),
make_tuple(6, 6, subpel_avg_variance64x64_ssse3)));
#endif
#endif
#endif // CONFIG_VP9_ENCODER
} // namespace vp9

View File

@@ -8,10 +8,6 @@
* be found in the AUTHORS file in the root of the source tree.
*/
extern "C" {
#include "vp8/encoder/boolhuff.h"
#include "vp8/decoder/dboolhuff.h"
}
#include <math.h>
#include <stddef.h>
@@ -24,6 +20,11 @@ extern "C" {
#include "third_party/googletest/src/include/gtest/gtest.h"
#include "vpx/vpx_integer.h"
extern "C" {
#include "vp8/encoder/boolhuff.h"
#include "vp8/decoder/dboolhuff.h"
}
namespace {
const int num_tests = 10;
@@ -44,7 +45,7 @@ void encrypt_buffer(uint8_t *buffer, int size) {
void test_decrypt_cb(void *decrypt_state, const uint8_t *input,
uint8_t *output, int count) {
int offset = input - (uint8_t *)decrypt_state;
int offset = input - reinterpret_cast<uint8_t *>(decrypt_state);
for (int i = 0; i < count; i++) {
output[i] = input[i] ^ secret_key[(offset + i) & 15];
}
@@ -58,10 +59,10 @@ TEST(VP8, TestBitIO) {
ACMRandom rnd(ACMRandom::DeterministicSeed());
for (int n = 0; n < num_tests; ++n) {
for (int method = 0; method <= 7; ++method) { // we generate various proba
const int bits_to_test = 1000;
uint8_t probas[bits_to_test];
const int kBitsToTest = 1000;
uint8_t probas[kBitsToTest];
for (int i = 0; i < bits_to_test; ++i) {
for (int i = 0; i < kBitsToTest; ++i) {
const int parity = i & 1;
probas[i] =
(method == 0) ? 0 : (method == 1) ? 255 :
@@ -76,14 +77,14 @@ TEST(VP8, TestBitIO) {
}
for (int bit_method = 0; bit_method <= 3; ++bit_method) {
const int random_seed = 6432;
const int buffer_size = 10000;
const int kBufferSize = 10000;
ACMRandom bit_rnd(random_seed);
BOOL_CODER bw;
uint8_t bw_buffer[buffer_size];
vp8_start_encode(&bw, bw_buffer, bw_buffer + buffer_size);
uint8_t bw_buffer[kBufferSize];
vp8_start_encode(&bw, bw_buffer, bw_buffer + kBufferSize);
int bit = (bit_method == 0) ? 0 : (bit_method == 1) ? 1 : 0;
for (int i = 0; i < bits_to_test; ++i) {
for (int i = 0; i < kBitsToTest; ++i) {
if (bit_method == 2) {
bit = (i & 1);
} else if (bit_method == 3) {
@@ -98,19 +99,20 @@ TEST(VP8, TestBitIO) {
#if CONFIG_DECRYPT
encrypt_buffer(bw_buffer, buffer_size);
vp8dx_start_decode(&br, bw_buffer, buffer_size,
test_decrypt_cb, (void *)bw_buffer);
test_decrypt_cb,
reinterpret_cast<void *>(bw_buffer));
#else
vp8dx_start_decode(&br, bw_buffer, buffer_size, NULL, NULL);
vp8dx_start_decode(&br, bw_buffer, kBufferSize, NULL, NULL);
#endif
bit_rnd.Reset(random_seed);
for (int i = 0; i < bits_to_test; ++i) {
for (int i = 0; i < kBitsToTest; ++i) {
if (bit_method == 2) {
bit = (i & 1);
} else if (bit_method == 3) {
bit = bit_rnd(2);
}
GTEST_ASSERT_EQ(vp8dx_decode_bool(&br, probas[i]), bit)
<< "pos: "<< i << " / " << bits_to_test
<< "pos: "<< i << " / " << kBitsToTest
<< " bit_method: " << bit_method
<< " method: " << method;
}

View File

@@ -26,7 +26,8 @@ const uint8_t test_key[16] = {
0x89, 0x9a, 0xab, 0xbc, 0xcd, 0xde, 0xef, 0xf0
};
void encrypt_buffer(const uint8_t *src, uint8_t *dst, int size, int offset = 0) {
void encrypt_buffer(const uint8_t *src, uint8_t *dst,
int size, int offset = 0) {
for (int i = 0; i < size; ++i) {
dst[i] = src[i] ^ test_key[(offset + i) & 15];
}
@@ -34,10 +35,11 @@ void encrypt_buffer(const uint8_t *src, uint8_t *dst, int size, int offset = 0)
void test_decrypt_cb(void *decrypt_state, const uint8_t *input,
uint8_t *output, int count) {
encrypt_buffer(input, output, count, input - (uint8_t *)decrypt_state);
encrypt_buffer(input, output, count,
input - reinterpret_cast<uint8_t *>(decrypt_state));
}
} // namespace
} // namespace
namespace libvpx_test {

View File

@@ -18,7 +18,7 @@
extern "C" {
#include "vp8_rtcd.h"
#include "./vp8_rtcd.h"
}
#include "test/acm_random.h"

View File

@@ -19,7 +19,7 @@ extern "C" {
#include "vp9/decoder/vp9_dboolhuff.h"
}
#include "acm_random.h"
#include "test/acm_random.h"
#include "vpx/vpx_integer.h"
using libvpx_test::ACMRandom;
@@ -32,10 +32,10 @@ TEST(VP9, TestBitIO) {
ACMRandom rnd(ACMRandom::DeterministicSeed());
for (int n = 0; n < num_tests; ++n) {
for (int method = 0; method <= 7; ++method) { // we generate various proba
const int bits_to_test = 1000;
uint8_t probas[bits_to_test];
const int kBitsToTest = 1000;
uint8_t probas[kBitsToTest];
for (int i = 0; i < bits_to_test; ++i) {
for (int i = 0; i < kBitsToTest; ++i) {
const int parity = i & 1;
probas[i] =
(method == 0) ? 0 : (method == 1) ? 255 :
@@ -50,14 +50,14 @@ TEST(VP9, TestBitIO) {
}
for (int bit_method = 0; bit_method <= 3; ++bit_method) {
const int random_seed = 6432;
const int buffer_size = 10000;
const int kBufferSize = 10000;
ACMRandom bit_rnd(random_seed);
vp9_writer bw;
uint8_t bw_buffer[buffer_size];
uint8_t bw_buffer[kBufferSize];
vp9_start_encode(&bw, bw_buffer);
int bit = (bit_method == 0) ? 0 : (bit_method == 1) ? 1 : 0;
for (int i = 0; i < bits_to_test; ++i) {
for (int i = 0; i < kBitsToTest; ++i) {
if (bit_method == 2) {
bit = (i & 1);
} else if (bit_method == 3) {
@@ -72,16 +72,16 @@ TEST(VP9, TestBitIO) {
GTEST_ASSERT_EQ(bw_buffer[0] & 0x80, 0);
vp9_reader br;
vp9_reader_init(&br, bw_buffer, buffer_size);
vp9_reader_init(&br, bw_buffer, kBufferSize);
bit_rnd.Reset(random_seed);
for (int i = 0; i < bits_to_test; ++i) {
for (int i = 0; i < kBitsToTest; ++i) {
if (bit_method == 2) {
bit = (i & 1);
} else if (bit_method == 3) {
bit = bit_rnd(2);
}
GTEST_ASSERT_EQ(vp9_read(&br, probas[i]), bit)
<< "pos: " << i << " / " << bits_to_test
<< "pos: " << i << " / " << kBitsToTest
<< " bit_method: " << bit_method
<< " method: " << method;
}

75
test/vp9_lossless_test.cc Normal file
View File

@@ -0,0 +1,75 @@
/*
Copyright (c) 2012 The WebM project authors. All Rights Reserved.
Use of this source code is governed by a BSD-style license
that can be found in the LICENSE file in the root of the source
tree. An additional intellectual property rights grant can be found
in the file PATENTS. All contributing project authors may
be found in the AUTHORS file in the root of the source tree.
*/
#include "third_party/googletest/src/include/gtest/gtest.h"
#include "test/codec_factory.h"
#include "test/encode_test_driver.h"
#include "test/i420_video_source.h"
#include "test/util.h"
namespace {
const int kMaxPsnr = 100;
class LossLessTest : public ::libvpx_test::EncoderTest,
public ::libvpx_test::CodecTestWithParam<libvpx_test::TestMode> {
protected:
LossLessTest() : EncoderTest(GET_PARAM(0)),
psnr_(kMaxPsnr),
nframes_(0),
encoding_mode_(GET_PARAM(1)) {
}
virtual ~LossLessTest() {}
virtual void SetUp() {
InitializeConfig();
SetMode(encoding_mode_);
}
virtual void BeginPassHook(unsigned int /*pass*/) {
psnr_ = 0.0;
nframes_ = 0;
}
virtual void PSNRPktHook(const vpx_codec_cx_pkt_t *pkt) {
if (pkt->data.psnr.psnr[0] < psnr_)
psnr_= pkt->data.psnr.psnr[0];
}
double GetMinPsnr() const {
return psnr_;
}
private:
double psnr_;
unsigned int nframes_;
libvpx_test::TestMode encoding_mode_;
};
TEST_P(LossLessTest, TestLossLessEncoding) {
const vpx_rational timebase = { 33333333, 1000000000 };
cfg_.g_timebase = timebase;
cfg_.rc_target_bitrate = 2000;
cfg_.g_lag_in_frames = 25;
cfg_.rc_min_quantizer = 0;
cfg_.rc_max_quantizer = 0;
init_flags_ = VPX_CODEC_USE_PSNR;
// intentionally changed the dimension for better testing coverage
libvpx_test::I420VideoSource video("hantro_collage_w352h288.yuv", 356, 284,
timebase.den, timebase.num, 0, 30);
const double psnr_lossless = GetMinPsnr();
EXPECT_GE(psnr_lossless, kMaxPsnr);
}
VP9_INSTANTIATE_TEST_CASE(LossLessTest, ALL_TEST_MODES);
} // namespace

View File

@@ -39,8 +39,8 @@ TEST_P(VP9SubtractBlockTest, SimpleSubtract) {
ACMRandom rnd(ACMRandom::DeterministicSeed());
// FIXME(rbultje) split in its own file
for (BLOCK_SIZE_TYPE bsize = BLOCK_SIZE_AB4X4; bsize < BLOCK_SIZE_TYPES;
bsize = static_cast<BLOCK_SIZE_TYPE>(static_cast<int>(bsize) + 1)) {
for (BLOCK_SIZE bsize = BLOCK_4X4; bsize < BLOCK_SIZES;
bsize = static_cast<BLOCK_SIZE>(static_cast<int>(bsize) + 1)) {
const int block_width = 4 << b_width_log2(bsize);
const int block_height = 4 << b_height_log2(bsize);
int16_t *diff = reinterpret_cast<int16_t *>(
@@ -93,9 +93,8 @@ TEST_P(VP9SubtractBlockTest, SimpleSubtract) {
INSTANTIATE_TEST_CASE_P(C, VP9SubtractBlockTest,
::testing::Values(vp9_subtract_block_c));
#if HAVE_SSE2
#if HAVE_SSE2 && CONFIG_USE_X86INC
INSTANTIATE_TEST_CASE_P(SSE2, VP9SubtractBlockTest,
::testing::Values(vp9_subtract_block_sse2));
#endif
} // namespace vp9

109
test/vp9_thread_test.cc Normal file
View File

@@ -0,0 +1,109 @@
/*
* Copyright (c) 2013 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "vp9/decoder/vp9_thread.h"
#include "third_party/googletest/src/include/gtest/gtest.h"
#include "test/codec_factory.h"
#include "test/decode_test_driver.h"
#include "test/md5_helper.h"
#include "test/webm_video_source.h"
namespace {
class VP9WorkerThreadTest : public ::testing::Test {
protected:
virtual ~VP9WorkerThreadTest() {}
virtual void SetUp() {
vp9_worker_init(&worker_);
}
virtual void TearDown() {
vp9_worker_end(&worker_);
}
VP9Worker worker_;
};
int ThreadHook(void* data, void* return_value) {
int* const hook_data = reinterpret_cast<int*>(data);
*hook_data = 5;
return *reinterpret_cast<int*>(return_value);
}
TEST_F(VP9WorkerThreadTest, HookSuccess) {
EXPECT_TRUE(vp9_worker_sync(&worker_)); // should be a no-op.
for (int i = 0; i < 2; ++i) {
EXPECT_TRUE(vp9_worker_reset(&worker_));
int hook_data = 0;
int return_value = 1; // return successfully from the hook
worker_.hook = ThreadHook;
worker_.data1 = &hook_data;
worker_.data2 = &return_value;
vp9_worker_launch(&worker_);
EXPECT_TRUE(vp9_worker_sync(&worker_));
EXPECT_FALSE(worker_.had_error);
EXPECT_EQ(5, hook_data);
EXPECT_TRUE(vp9_worker_sync(&worker_)); // should be a no-op.
}
}
TEST_F(VP9WorkerThreadTest, HookFailure) {
EXPECT_TRUE(vp9_worker_reset(&worker_));
int hook_data = 0;
int return_value = 0; // return failure from the hook
worker_.hook = ThreadHook;
worker_.data1 = &hook_data;
worker_.data2 = &return_value;
vp9_worker_launch(&worker_);
EXPECT_FALSE(vp9_worker_sync(&worker_));
EXPECT_TRUE(worker_.had_error);
// Ensure _reset() clears the error and _launch() can be called again.
return_value = 1;
EXPECT_TRUE(vp9_worker_reset(&worker_));
EXPECT_FALSE(worker_.had_error);
vp9_worker_launch(&worker_);
EXPECT_TRUE(vp9_worker_sync(&worker_));
EXPECT_FALSE(worker_.had_error);
}
TEST(VP9DecodeMTTest, MTDecode) {
libvpx_test::WebMVideoSource video("vp90-2-03-size-226x226.webm");
video.Init();
vpx_codec_dec_cfg_t cfg = {0};
cfg.threads = 2;
libvpx_test::VP9Decoder decoder(cfg, 0);
libvpx_test::MD5 md5;
for (video.Begin(); video.cxdata(); video.Next()) {
const vpx_codec_err_t res =
decoder.DecodeFrame(video.cxdata(), video.frame_size());
ASSERT_EQ(VPX_CODEC_OK, res) << decoder.DecodeError();
libvpx_test::DxDataIterator dec_iter = decoder.GetDxData();
const vpx_image_t *img = NULL;
// Get decompressed data
while ((img = dec_iter.Next())) {
md5.Add(img);
}
}
EXPECT_STREQ("b35a1b707b28e82be025d960aba039bc", md5.Get());
}
} // namespace

View File

@@ -99,7 +99,7 @@ class WebMVideoSource : public CompressedVideoSource {
virtual void Begin() {
input_file_ = OpenTestDataFile(file_name_);
ASSERT_TRUE(input_file_) << "Input file open failed. Filename: "
ASSERT_TRUE(input_file_ != NULL) << "Input file open failed. Filename: "
<< file_name_;
nestegg_io io = {nestegg_read_cb, nestegg_seek_cb, nestegg_tell_cb,
@@ -130,6 +130,7 @@ class WebMVideoSource : public CompressedVideoSource {
}
void FillFrame() {
ASSERT_TRUE(input_file_ != NULL);
if (chunk_ >= chunks_) {
unsigned int track;

View File

@@ -1370,12 +1370,12 @@ static void ScaleFilterRows_SSSE3(uint8* dst_ptr, const uint8* src_ptr,
mov edx, [esp + 8 + 12] // src_stride
mov ecx, [esp + 8 + 16] // dst_width
mov eax, [esp + 8 + 20] // source_y_fraction (0..255)
shr eax, 1
cmp eax, 0
je xloop1
cmp eax, 128
cmp eax, 64
je xloop2
shr eax, 1
mov ah,al
neg al
add al, 128
@@ -2132,11 +2132,11 @@ void ScaleFilterRows_SSSE3(uint8* dst_ptr,
"mov 0x14(%esp),%edx \n"
"mov 0x18(%esp),%ecx \n"
"mov 0x1c(%esp),%eax \n"
"shr %eax \n"
"cmp $0x0,%eax \n"
"je 2f \n"
"cmp $0x80,%eax \n"
"cmp $0x40,%eax \n"
"je 3f \n"
"shr %eax \n"
"mov %al,%ah \n"
"neg %al \n"
"add $0x80,%al \n"
@@ -2662,6 +2662,7 @@ static void ScaleFilterRows_SSE2(uint8* dst_ptr,
static void ScaleFilterRows_SSSE3(uint8* dst_ptr,
const uint8* src_ptr, int src_stride,
int dst_width, int source_y_fraction) {
source_y_fraction >>= 1;
if (source_y_fraction == 0) {
asm volatile (
"1:"
@@ -2680,7 +2681,7 @@ static void ScaleFilterRows_SSSE3(uint8* dst_ptr,
: "memory", "cc", "rax"
);
return;
} else if (source_y_fraction == 128) {
} else if (source_y_fraction == 64) {
asm volatile (
"1:"
"movdqa (%1),%%xmm0 \n"
@@ -2703,7 +2704,6 @@ static void ScaleFilterRows_SSSE3(uint8* dst_ptr,
} else {
asm volatile (
"mov %3,%%eax \n"
"shr %%eax \n"
"mov %%al,%%ah \n"
"neg %%al \n"
"add $0x80,%%al \n"

View File

@@ -97,21 +97,91 @@
%endif
%endmacro
%if WIN64
%define PIC
%elifidn __OUTPUT_FORMAT__,macho64
%define PIC
%elif ARCH_X86_64 == 0
; x86_32 doesn't require PIC.
; Some distros prefer shared objects to be PIC, but nothing breaks if
; the code contains a few textrels, so we'll skip that complexity.
%undef PIC
%elif CONFIG_PIC
%define PIC
; PIC macros are copied from vpx_ports/x86_abi_support.asm. The "define PIC"
; from original code is added in for 64bit.
%ifidn __OUTPUT_FORMAT__,elf32
%define ABI_IS_32BIT 1
%elifidn __OUTPUT_FORMAT__,macho32
%define ABI_IS_32BIT 1
%elifidn __OUTPUT_FORMAT__,win32
%define ABI_IS_32BIT 1
%elifidn __OUTPUT_FORMAT__,aout
%define ABI_IS_32BIT 1
%else
%define ABI_IS_32BIT 0
%endif
%if ABI_IS_32BIT
%if CONFIG_PIC=1
%ifidn __OUTPUT_FORMAT__,elf32
%define GET_GOT_SAVE_ARG 1
%define WRT_PLT wrt ..plt
%macro GET_GOT 1
extern _GLOBAL_OFFSET_TABLE_
push %1
call %%get_got
%%sub_offset:
jmp %%exitGG
%%get_got:
mov %1, [esp]
add %1, _GLOBAL_OFFSET_TABLE_ + $$ - %%sub_offset wrt ..gotpc
ret
%%exitGG:
%undef GLOBAL
%define GLOBAL(x) x + %1 wrt ..gotoff
%undef RESTORE_GOT
%define RESTORE_GOT pop %1
%endmacro
%elifidn __OUTPUT_FORMAT__,macho32
%define GET_GOT_SAVE_ARG 1
%macro GET_GOT 1
push %1
call %%get_got
%%get_got:
pop %1
%undef GLOBAL
%define GLOBAL(x) x + %1 - %%get_got
%undef RESTORE_GOT
%define RESTORE_GOT pop %1
%endmacro
%endif
%endif
%if ARCH_X86_64 == 0
%undef PIC
%endif
%else
%macro GET_GOT 1
%endmacro
%define GLOBAL(x) rel x
%define WRT_PLT wrt ..plt
%if WIN64
%define PIC
%elifidn __OUTPUT_FORMAT__,macho64
%define PIC
%elif CONFIG_PIC
%define PIC
%endif
%endif
%ifnmacro GET_GOT
%macro GET_GOT 1
%endmacro
%define GLOBAL(x) x
%endif
%ifndef RESTORE_GOT
%define RESTORE_GOT
%endif
%ifndef WRT_PLT
%define WRT_PLT
%endif
%ifdef PIC
default rel
%endif
; Done with PIC macros
; Always use long nops (reduces 0x90 spam in disassembly on x86_32)
%ifndef __NASM_VER__
@@ -528,6 +598,10 @@ DECLARE_ARG 7, 8, 9, 10, 11, 12, 13, 14
global %1:function hidden
%elifidn __OUTPUT_FORMAT__,elf64
global %1:function hidden
%elifidn __OUTPUT_FORMAT__,macho32
global %1:private_extern
%elifidn __OUTPUT_FORMAT__,macho64
global %1:private_extern
%else
global %1
%endif

View File

@@ -173,7 +173,6 @@ void vp8_create_common(VP8_COMMON *oci)
oci->use_bilinear_mc_filter = 0;
oci->full_pixel = 0;
oci->multi_token_partition = ONE_PARTITION;
oci->clr_type = REG_YUV;
oci->clamp_type = RECON_CLAMP_REQUIRED;
/* Initialize reference frame sign bias structure to defaults */

View File

@@ -41,7 +41,8 @@ extern "C"
{
USAGE_STREAM_FROM_SERVER = 0x0,
USAGE_LOCAL_FILE_PLAYBACK = 0x1,
USAGE_CONSTRAINED_QUALITY = 0x2
USAGE_CONSTRAINED_QUALITY = 0x2,
USAGE_CONSTANT_QUALITY = 0x3
} END_USAGE;

View File

@@ -72,7 +72,6 @@ typedef struct VP8Common
int horiz_scale;
int vert_scale;
YUV_TYPE clr_type;
CLAMP_TYPE clamp_type;
YV12_BUFFER_CONFIG *frame_to_show;
@@ -115,9 +114,6 @@ typedef struct VP8Common
int uvdc_delta_q;
int uvac_delta_q;
unsigned int frames_since_golden;
unsigned int frames_till_alt_ref_frame;
/* We allocate a MODE_INFO struct for each macroblock, together with
an extra row on top and column on the left to simplify prediction. */
@@ -157,7 +153,6 @@ typedef struct VP8Common
unsigned int current_video_frame;
int near_boffset[3];
int version;
TOKEN_PARTITION multi_token_partition;
@@ -165,8 +160,10 @@ typedef struct VP8Common
#ifdef PACKET_TESTING
VP8_HEADER oh;
#endif
#if CONFIG_POSTPROC_VISUALIZER
double bitrate;
double framerate;
#endif
#if CONFIG_MULTITHREAD
int processor_core_count;

View File

@@ -923,7 +923,7 @@ int vp8_post_proc_frame(VP8_COMMON *oci, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t
if (flags & VP8D_DEBUG_TXT_RATE_INFO)
{
char message[512];
sprintf(message, "Bitrate: %10.2f frame_rate: %10.2f ", oci->bitrate, oci->framerate);
sprintf(message, "Bitrate: %10.2f framerate: %10.2f ", oci->bitrate, oci->framerate);
vp8_blit_text(message, oci->post_proc_buffer.y_buffer, oci->post_proc_buffer.y_stride);
}

View File

@@ -1,52 +0,0 @@
/*
* Copyright (c) 2011 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "vpx_config.h"
#include "vpx/vpx_codec.h"
#include "vpx_ports/asm_offsets.h"
#include "vp8/common/blockd.h"
#if CONFIG_POSTPROC
#include "postproc.h"
#endif /* CONFIG_POSTPROC */
BEGIN
#if CONFIG_POSTPROC
/* mfqe.c / filter_by_weight */
DEFINE(MFQE_PRECISION_VAL, MFQE_PRECISION);
#endif /* CONFIG_POSTPROC */
END
/* add asserts for any offset that is not supported by assembly code */
/* add asserts for any size that is not supported by assembly code */
#if HAVE_MEDIA
/* switch case in vp8_intra4x4_predict_armv6 is based on these enumerated values */
ct_assert(B_DC_PRED, B_DC_PRED == 0);
ct_assert(B_TM_PRED, B_TM_PRED == 1);
ct_assert(B_VE_PRED, B_VE_PRED == 2);
ct_assert(B_HE_PRED, B_HE_PRED == 3);
ct_assert(B_LD_PRED, B_LD_PRED == 4);
ct_assert(B_RD_PRED, B_RD_PRED == 5);
ct_assert(B_VR_PRED, B_VR_PRED == 6);
ct_assert(B_VL_PRED, B_VL_PRED == 7);
ct_assert(B_HD_PRED, B_HD_PRED == 8);
ct_assert(B_HU_PRED, B_HU_PRED == 9);
#endif
#if HAVE_SSE2
#if CONFIG_POSTPROC
/* vp8_filter_by_weight16x16 and 8x8 */
ct_assert(MFQE_PRECISION_VAL, MFQE_PRECISION == 4)
#endif /* CONFIG_POSTPROC */
#endif /* HAVE_SSE2 */

View File

@@ -512,15 +512,15 @@ static void read_mb_modes_mv(VP8D_COMP *pbi, MODE_INFO *mi, MB_MODE_INFO *mbmi)
else
{
mbmi->mode = NEARMV;
vp8_clamp_mv2(&near_mvs[CNT_NEAR], &pbi->mb);
mbmi->mv.as_int = near_mvs[CNT_NEAR].as_int;
vp8_clamp_mv2(&mbmi->mv, &pbi->mb);
}
}
else
{
mbmi->mode = NEARESTMV;
vp8_clamp_mv2(&near_mvs[CNT_NEAREST], &pbi->mb);
mbmi->mv.as_int = near_mvs[CNT_NEAREST].as_int;
vp8_clamp_mv2(&mbmi->mv, &pbi->mb);
}
}
else

View File

@@ -1026,7 +1026,7 @@ int vp8_decode_frame(VP8D_COMP *pbi)
const unsigned char *clear = data;
if (pbi->decrypt_cb)
{
int n = data_end - data;
int n = (int)(data_end - data);
if (n > 10) n = 10;
pbi->decrypt_cb(pbi->decrypt_state, data, clear_buffer, n);
clear = clear_buffer;
@@ -1095,7 +1095,7 @@ int vp8_decode_frame(VP8D_COMP *pbi)
vpx_internal_error(&pc->error, VPX_CODEC_MEM_ERROR,
"Failed to allocate bool decoder 0");
if (pc->frame_type == KEY_FRAME) {
pc->clr_type = (YUV_TYPE)vp8_read_bit(bc);
(void)vp8_read_bit(bc); // colorspace
pc->clamp_type = (CLAMP_TYPE)vp8_read_bit(bc);
}

View File

@@ -430,7 +430,6 @@ int vp8dx_get_raw_frame(VP8D_COMP *pbi, YV12_BUFFER_CONFIG *sd, int64_t *time_st
*time_stamp = pbi->last_time_stamp;
*time_end_stamp = 0;
sd->clrtype = pbi->common.clr_type;
#if CONFIG_POSTPROC
ret = vp8_post_proc_frame(&pbi->common, sd, flags);
#else

View File

@@ -1,26 +0,0 @@
/*
* Copyright (c) 2011 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "vpx_ports/asm_offsets.h"
#include "onyxd_int.h"
BEGIN
DEFINE(bool_decoder_user_buffer_end, offsetof(BOOL_DECODER, user_buffer_end));
DEFINE(bool_decoder_user_buffer, offsetof(BOOL_DECODER, user_buffer));
DEFINE(bool_decoder_value, offsetof(BOOL_DECODER, value));
DEFINE(bool_decoder_count, offsetof(BOOL_DECODER, count));
DEFINE(bool_decoder_range, offsetof(BOOL_DECODER, range));
END
/* add asserts for any offset that is not supported by assembly code */
/* add asserts for any size that is not supported by assembly code */

View File

@@ -1322,7 +1322,7 @@ void vp8_pack_bitstream(VP8_COMP *cpi, unsigned char *dest, unsigned char * dest
vp8_start_encode(bc, cx_data, cx_data_end);
/* signal clr type */
vp8_write_bit(bc, pc->clr_type);
vp8_write_bit(bc, 0);
vp8_write_bit(bc, pc->clamp_type);
}

View File

@@ -1325,7 +1325,7 @@ static int estimate_kf_group_q(VP8_COMP *cpi, double section_err, int section_ta
return Q;
}
extern void vp8_new_frame_rate(VP8_COMP *cpi, double framerate);
extern void vp8_new_framerate(VP8_COMP *cpi, double framerate);
void vp8_init_second_pass(VP8_COMP *cpi)
{
@@ -1349,9 +1349,9 @@ void vp8_init_second_pass(VP8_COMP *cpi)
* sum duration is not. Its calculated based on the actual durations of
* all frames from the first pass.
*/
vp8_new_frame_rate(cpi, 10000000.0 * cpi->twopass.total_stats.count / cpi->twopass.total_stats.duration);
vp8_new_framerate(cpi, 10000000.0 * cpi->twopass.total_stats.count / cpi->twopass.total_stats.duration);
cpi->output_frame_rate = cpi->frame_rate;
cpi->output_framerate = cpi->framerate;
cpi->twopass.bits_left = (int64_t)(cpi->twopass.total_stats.duration * cpi->oxcf.target_bandwidth / 10000000.0) ;
cpi->twopass.bits_left -= (int64_t)(cpi->twopass.total_stats.duration * two_pass_min_rate / 10000000.0);
@@ -2398,7 +2398,7 @@ static void assign_std_frame_bits(VP8_COMP *cpi, FIRSTPASS_STATS *this_frame)
target_frame_size += cpi->min_frame_bandwidth;
/* Every other frame gets a few extra bits */
if ( (cpi->common.frames_since_golden & 0x01) &&
if ( (cpi->frames_since_golden & 0x01) &&
(cpi->frames_till_gf_update_due > 0) )
{
target_frame_size += cpi->twopass.alt_extra_bits;
@@ -2529,7 +2529,7 @@ void vp8_second_pass(VP8_COMP *cpi)
/* Set nominal per second bandwidth for this frame */
cpi->target_bandwidth = (int)
(cpi->per_frame_bandwidth * cpi->output_frame_rate);
(cpi->per_frame_bandwidth * cpi->output_framerate);
if (cpi->target_bandwidth < 0)
cpi->target_bandwidth = 0;
@@ -3185,7 +3185,7 @@ static void find_next_key_frame(VP8_COMP *cpi, FIRSTPASS_STATS *this_frame)
/* Convert to a per second bitrate */
cpi->target_bandwidth = (int)(cpi->twopass.kf_bits *
cpi->output_frame_rate);
cpi->output_framerate);
}
/* Note the total error score of the kf group minus the key frame itself */
@@ -3224,7 +3224,7 @@ static void find_next_key_frame(VP8_COMP *cpi, FIRSTPASS_STATS *this_frame)
cpi->common.vert_scale = NORMAL;
/* Calculate Average bits per frame. */
av_bits_per_frame = cpi->oxcf.target_bandwidth / DOUBLE_DIVIDE_CHECK((double)cpi->frame_rate);
av_bits_per_frame = cpi->oxcf.target_bandwidth / DOUBLE_DIVIDE_CHECK((double)cpi->framerate);
/* CBR... Use the clip average as the target for deciding resample */
if (cpi->oxcf.end_usage == USAGE_STREAM_FROM_SERVER)
@@ -3299,7 +3299,7 @@ static void find_next_key_frame(VP8_COMP *cpi, FIRSTPASS_STATS *this_frame)
}
else
{
int64_t clip_bits = (int64_t)(cpi->twopass.total_stats.count * cpi->oxcf.target_bandwidth / DOUBLE_DIVIDE_CHECK((double)cpi->frame_rate));
int64_t clip_bits = (int64_t)(cpi->twopass.total_stats.count * cpi->oxcf.target_bandwidth / DOUBLE_DIVIDE_CHECK((double)cpi->framerate));
int64_t over_spend = cpi->oxcf.starting_buffer_level - cpi->buffer_level;
/* If triggered last time the threshold for triggering again is

View File

@@ -301,11 +301,11 @@ static int rescale(int val, int num, int denom)
static void init_temporal_layer_context(VP8_COMP *cpi,
VP8_CONFIG *oxcf,
const int layer,
double prev_layer_frame_rate)
double prev_layer_framerate)
{
LAYER_CONTEXT *lc = &cpi->layer_context[layer];
lc->frame_rate = cpi->output_frame_rate / cpi->oxcf.rate_decimator[layer];
lc->framerate = cpi->output_framerate / cpi->oxcf.rate_decimator[layer];
lc->target_bandwidth = cpi->oxcf.target_bitrate[layer] * 1000;
lc->starting_buffer_level_in_ms = oxcf->starting_buffer_level;
@@ -335,7 +335,7 @@ static void init_temporal_layer_context(VP8_COMP *cpi,
lc->avg_frame_size_for_layer =
(int)((cpi->oxcf.target_bitrate[layer] -
cpi->oxcf.target_bitrate[layer-1]) * 1000 /
(lc->frame_rate - prev_layer_frame_rate));
(lc->framerate - prev_layer_framerate));
lc->active_worst_quality = cpi->oxcf.worst_allowed_q;
lc->active_best_quality = cpi->oxcf.best_allowed_q;
@@ -363,7 +363,7 @@ static void reset_temporal_layer_change(VP8_COMP *cpi,
const int prev_num_layers)
{
int i;
double prev_layer_frame_rate = 0;
double prev_layer_framerate = 0;
const int curr_num_layers = cpi->oxcf.number_of_layers;
// If the previous state was 1 layer, get current layer context from cpi.
// We need this to set the layer context for the new layers below.
@@ -377,7 +377,7 @@ static void reset_temporal_layer_change(VP8_COMP *cpi,
LAYER_CONTEXT *lc = &cpi->layer_context[i];
if (i >= prev_num_layers)
{
init_temporal_layer_context(cpi, oxcf, i, prev_layer_frame_rate);
init_temporal_layer_context(cpi, oxcf, i, prev_layer_framerate);
}
// The initial buffer levels are set based on their starting levels.
// We could set the buffer levels based on the previous state (normalized
@@ -403,8 +403,8 @@ static void reset_temporal_layer_change(VP8_COMP *cpi,
lc->bits_off_target = lc->buffer_level;
restore_layer_context(cpi, 0);
}
prev_layer_frame_rate = cpi->output_frame_rate /
cpi->oxcf.rate_decimator[i];
prev_layer_framerate = cpi->output_framerate /
cpi->oxcf.rate_decimator[i];
}
}
@@ -1282,21 +1282,21 @@ int vp8_reverse_trans(int x)
return 63;
}
void vp8_new_frame_rate(VP8_COMP *cpi, double framerate)
void vp8_new_framerate(VP8_COMP *cpi, double framerate)
{
if(framerate < .1)
framerate = 30;
cpi->frame_rate = framerate;
cpi->output_frame_rate = framerate;
cpi->framerate = framerate;
cpi->output_framerate = framerate;
cpi->per_frame_bandwidth = (int)(cpi->oxcf.target_bandwidth /
cpi->output_frame_rate);
cpi->output_framerate);
cpi->av_per_frame_bandwidth = cpi->per_frame_bandwidth;
cpi->min_frame_bandwidth = (int)(cpi->av_per_frame_bandwidth *
cpi->oxcf.two_pass_vbrmin_section / 100);
/* Set Maximum gf/arf interval */
cpi->max_gf_interval = ((int)(cpi->output_frame_rate / 2.0) + 2);
cpi->max_gf_interval = ((int)(cpi->output_framerate / 2.0) + 2);
if(cpi->max_gf_interval < 12)
cpi->max_gf_interval = 12;
@@ -1337,13 +1337,13 @@ static void init_config(VP8_COMP *cpi, VP8_CONFIG *oxcf)
* seems like a reasonable framerate, then use that as a guess, otherwise
* use 30.
*/
cpi->frame_rate = (double)(oxcf->timebase.den) /
(double)(oxcf->timebase.num);
cpi->framerate = (double)(oxcf->timebase.den) /
(double)(oxcf->timebase.num);
if (cpi->frame_rate > 180)
cpi->frame_rate = 30;
if (cpi->framerate > 180)
cpi->framerate = 30;
cpi->ref_frame_rate = cpi->frame_rate;
cpi->ref_framerate = cpi->framerate;
/* change includes all joint functionality */
vp8_change_config(cpi, oxcf);
@@ -1369,13 +1369,13 @@ static void init_config(VP8_COMP *cpi, VP8_CONFIG *oxcf)
if (cpi->oxcf.number_of_layers > 1)
{
unsigned int i;
double prev_layer_frame_rate=0;
double prev_layer_framerate=0;
for (i=0; i<cpi->oxcf.number_of_layers; i++)
{
init_temporal_layer_context(cpi, oxcf, i, prev_layer_frame_rate);
prev_layer_frame_rate = cpi->output_frame_rate /
cpi->oxcf.rate_decimator[i];
init_temporal_layer_context(cpi, oxcf, i, prev_layer_framerate);
prev_layer_framerate = cpi->output_framerate /
cpi->oxcf.rate_decimator[i];
}
}
@@ -1399,14 +1399,14 @@ static void update_layer_contexts (VP8_COMP *cpi)
if (oxcf->number_of_layers > 1)
{
unsigned int i;
double prev_layer_frame_rate=0;
double prev_layer_framerate=0;
for (i=0; i<oxcf->number_of_layers; i++)
{
LAYER_CONTEXT *lc = &cpi->layer_context[i];
lc->frame_rate =
cpi->ref_frame_rate / oxcf->rate_decimator[i];
lc->framerate =
cpi->ref_framerate / oxcf->rate_decimator[i];
lc->target_bandwidth = oxcf->target_bitrate[i] * 1000;
lc->starting_buffer_level = rescale(
@@ -1432,9 +1432,9 @@ static void update_layer_contexts (VP8_COMP *cpi)
lc->avg_frame_size_for_layer =
(int)((oxcf->target_bitrate[i] -
oxcf->target_bitrate[i-1]) * 1000 /
(lc->frame_rate - prev_layer_frame_rate));
(lc->framerate - prev_layer_framerate));
prev_layer_frame_rate = lc->frame_rate;
prev_layer_framerate = lc->framerate;
}
}
}
@@ -1625,7 +1625,7 @@ void vp8_change_config(VP8_COMP *cpi, VP8_CONFIG *oxcf)
cpi->oxcf.target_bandwidth, 1000);
/* Set up frame rate and related parameters rate control values. */
vp8_new_frame_rate(cpi, cpi->frame_rate);
vp8_new_framerate(cpi, cpi->framerate);
/* Set absolute upper and lower quality limits */
cpi->worst_quality = cpi->oxcf.worst_allowed_q;
@@ -1945,7 +1945,7 @@ struct VP8_COMP* vp8_create_compressor(VP8_CONFIG *oxcf)
for (i = 0; i < KEY_FRAME_CONTEXT; i++)
{
cpi->prior_key_frame_distance[i] = (int)cpi->output_frame_rate;
cpi->prior_key_frame_distance[i] = (int)cpi->output_framerate;
}
#ifdef OUTPUT_YUV_SRC
@@ -2273,7 +2273,7 @@ void vp8_remove_compressor(VP8_COMP **ptr)
{
extern int count_mb_seg[4];
FILE *f = fopen("modes.stt", "a");
double dr = (double)cpi->frame_rate * (double)bytes * (double)8 / (double)count / (double)1000 ;
double dr = (double)cpi->framerate * (double)bytes * (double)8 / (double)count / (double)1000 ;
fprintf(f, "intra_mode in Intra Frames:\n");
fprintf(f, "Y: %8d, %8d, %8d, %8d, %8d\n", y_modes[0], y_modes[1], y_modes[2], y_modes[3], y_modes[4]);
fprintf(f, "UV:%8d, %8d, %8d, %8d\n", uv_modes[0], uv_modes[1], uv_modes[2], uv_modes[3]);
@@ -2750,7 +2750,7 @@ static void update_alt_ref_frame_stats(VP8_COMP *cpi)
cpi->gf_active_count = cm->mb_rows * cm->mb_cols;
/* this frame refreshes means next frames don't unless specified by user */
cpi->common.frames_since_golden = 0;
cpi->frames_since_golden = 0;
/* Clear the alternate reference update pending flag. */
cpi->source_alt_ref_pending = 0;
@@ -2802,7 +2802,7 @@ static void update_golden_frame_stats(VP8_COMP *cpi)
* user
*/
cm->refresh_golden_frame = 0;
cpi->common.frames_since_golden = 0;
cpi->frames_since_golden = 0;
cpi->recent_ref_frame_usage[INTRA_FRAME] = 1;
cpi->recent_ref_frame_usage[LAST_FRAME] = 1;
@@ -2834,12 +2834,12 @@ static void update_golden_frame_stats(VP8_COMP *cpi)
if (cpi->frames_till_gf_update_due > 0)
cpi->frames_till_gf_update_due--;
if (cpi->common.frames_till_alt_ref_frame)
cpi->common.frames_till_alt_ref_frame --;
if (cpi->frames_till_alt_ref_frame)
cpi->frames_till_alt_ref_frame --;
cpi->common.frames_since_golden ++;
cpi->frames_since_golden ++;
if (cpi->common.frames_since_golden > 1)
if (cpi->frames_since_golden > 1)
{
cpi->recent_ref_frame_usage[INTRA_FRAME] +=
cpi->mb.count_mb_ref_frame_usage[INTRA_FRAME];
@@ -2890,11 +2890,11 @@ static void update_rd_ref_frame_probs(VP8_COMP *cpi)
cpi->prob_last_coded = 200;
cpi->prob_gf_coded = 1;
}
else if (cpi->common.frames_since_golden == 0)
else if (cpi->frames_since_golden == 0)
{
cpi->prob_last_coded = 214;
}
else if (cpi->common.frames_since_golden == 1)
else if (cpi->frames_since_golden == 1)
{
cpi->prob_last_coded = 192;
cpi->prob_gf_coded = 220;
@@ -3368,12 +3368,12 @@ static void encode_frame_to_data_rate
cpi->per_frame_bandwidth = cpi->twopass.gf_bits;
/* per second target bitrate */
cpi->target_bandwidth = (int)(cpi->twopass.gf_bits *
cpi->output_frame_rate);
cpi->output_framerate);
}
}
else
#endif
cpi->per_frame_bandwidth = (int)(cpi->target_bandwidth / cpi->output_frame_rate);
cpi->per_frame_bandwidth = (int)(cpi->target_bandwidth / cpi->output_framerate);
/* Default turn off buffer to buffer copying */
cm->copy_buffer_to_gf = 0;
@@ -4557,7 +4557,7 @@ static void encode_frame_to_data_rate
{
LAYER_CONTEXT *lc = &cpi->layer_context[i];
int bits_off_for_this_layer =
(int)(lc->target_bandwidth / lc->frame_rate -
(int)(lc->target_bandwidth / lc->framerate -
cpi->projected_frame_size);
lc->bits_off_target += bits_off_for_this_layer;
@@ -4805,7 +4805,7 @@ static void Pass2Encode(VP8_COMP *cpi, unsigned long *size, unsigned char *dest,
{
double two_pass_min_rate = (double)(cpi->oxcf.target_bandwidth
*cpi->oxcf.two_pass_vbrmin_section / 100);
cpi->twopass.bits_left += (int64_t)(two_pass_min_rate / cpi->frame_rate);
cpi->twopass.bits_left += (int64_t)(two_pass_min_rate / cpi->framerate);
}
}
#endif
@@ -4821,8 +4821,10 @@ int vp8_receive_raw_frame(VP8_COMP *cpi, unsigned int frame_flags, YV12_BUFFER_C
{
#if HAVE_NEON
int64_t store_reg[8];
#endif
#if CONFIG_RUNTIME_CPU_DETECT
VP8_COMMON *cm = &cpi->common;
#endif
#endif
struct vpx_usec_timer timer;
int res = 0;
@@ -4848,7 +4850,6 @@ int vp8_receive_raw_frame(VP8_COMP *cpi, unsigned int frame_flags, YV12_BUFFER_C
if(vp8_lookahead_push(cpi->lookahead, sd, time_stamp, end_time,
frame_flags, cpi->active_map_enabled ? cpi->active_map : NULL))
res = -1;
cm->clr_type = sd->clrtype;
vpx_usec_timer_mark(&timer);
cpi->time_receive_data += vpx_usec_timer_elapsed(&timer);
@@ -4933,7 +4934,7 @@ int vp8_get_compressed_data(VP8_COMP *cpi, unsigned int *frame_flags, unsigned l
cpi->frames_till_gf_update_due);
force_src_buffer = &cpi->alt_ref_buffer;
}
cm->frames_till_alt_ref_frame = cpi->frames_till_gf_update_due;
cpi->frames_till_alt_ref_frame = cpi->frames_till_gf_update_due;
cm->refresh_alt_ref_frame = 1;
cm->refresh_golden_frame = 0;
cm->refresh_last_frame = 0;
@@ -5038,7 +5039,7 @@ int vp8_get_compressed_data(VP8_COMP *cpi, unsigned int *frame_flags, unsigned l
if (this_duration)
{
if (step)
cpi->ref_frame_rate = 10000000.0 / this_duration;
cpi->ref_framerate = 10000000.0 / this_duration;
else
{
double avg_duration, interval;
@@ -5052,11 +5053,11 @@ int vp8_get_compressed_data(VP8_COMP *cpi, unsigned int *frame_flags, unsigned l
if(interval > 10000000.0)
interval = 10000000;
avg_duration = 10000000.0 / cpi->ref_frame_rate;
avg_duration = 10000000.0 / cpi->ref_framerate;
avg_duration *= (interval - avg_duration + this_duration);
avg_duration /= interval;
cpi->ref_frame_rate = 10000000.0 / avg_duration;
cpi->ref_framerate = 10000000.0 / avg_duration;
}
if (cpi->oxcf.number_of_layers > 1)
@@ -5067,12 +5068,12 @@ int vp8_get_compressed_data(VP8_COMP *cpi, unsigned int *frame_flags, unsigned l
for (i=0; i<cpi->oxcf.number_of_layers; i++)
{
LAYER_CONTEXT *lc = &cpi->layer_context[i];
lc->frame_rate = cpi->ref_frame_rate /
cpi->oxcf.rate_decimator[i];
lc->framerate = cpi->ref_framerate /
cpi->oxcf.rate_decimator[i];
}
}
else
vp8_new_frame_rate(cpi, cpi->ref_frame_rate);
vp8_new_framerate(cpi, cpi->ref_framerate);
}
cpi->last_time_stamp_seen = cpi->source->ts_start;
@@ -5089,7 +5090,7 @@ int vp8_get_compressed_data(VP8_COMP *cpi, unsigned int *frame_flags, unsigned l
layer = cpi->oxcf.layer_id[
cpi->temporal_pattern_counter % cpi->oxcf.periodicity];
restore_layer_context (cpi, layer);
vp8_new_frame_rate (cpi, cpi->layer_context[layer].frame_rate);
vp8_new_framerate(cpi, cpi->layer_context[layer].framerate);
}
if (cpi->compressor_speed == 2)

View File

@@ -232,7 +232,7 @@ enum
typedef struct
{
/* Layer configuration */
double frame_rate;
double framerate;
int target_bandwidth;
/* Layer specific coding parameters */
@@ -320,6 +320,7 @@ typedef struct VP8_COMP
YV12_BUFFER_CONFIG scaled_source;
YV12_BUFFER_CONFIG *last_frame_unscaled_source;
unsigned int frames_till_alt_ref_frame;
/* frame in src_buffers has been identified to be encoded as an alt ref */
int source_alt_ref_pending;
/* an alt ref frame has been encoded and is usable */
@@ -369,6 +370,7 @@ typedef struct VP8_COMP
double key_frame_rate_correction_factor;
double gf_rate_correction_factor;
unsigned int frames_since_golden;
/* Count down till next GF */
int frames_till_gf_update_due;
@@ -401,7 +403,7 @@ typedef struct VP8_COMP
/* Minimum allocation that should be used for any frame */
int min_frame_bandwidth;
int inter_frame_target;
double output_frame_rate;
double output_framerate;
int64_t last_time_stamp_seen;
int64_t last_end_time_stamp_seen;
int64_t first_time_stamp_ever;
@@ -415,8 +417,8 @@ typedef struct VP8_COMP
int buffered_mode;
double frame_rate;
double ref_frame_rate;
double framerate;
double ref_framerate;
int64_t buffer_level;
int64_t bits_off_target;

View File

@@ -313,7 +313,7 @@ void vp8cx_pick_filter_level(YV12_BUFFER_CONFIG *sd, VP8_COMP *cpi)
/* Get baseline error score */
/* Copy the unfiltered / processed recon buffer to the new buffer */
vp8_yv12_copy_y(saved_frame, cm->frame_to_show);
vpx_yv12_copy_y(saved_frame, cm->frame_to_show);
vp8cx_set_alt_lf_level(cpi, filt_mid);
vp8_loop_filter_frame_yonly(cm, &cpi->mb.e_mbd, filt_mid);
@@ -339,7 +339,7 @@ void vp8cx_pick_filter_level(YV12_BUFFER_CONFIG *sd, VP8_COMP *cpi)
if(ss_err[filt_low] == 0)
{
/* Get Low filter error score */
vp8_yv12_copy_y(saved_frame, cm->frame_to_show);
vpx_yv12_copy_y(saved_frame, cm->frame_to_show);
vp8cx_set_alt_lf_level(cpi, filt_low);
vp8_loop_filter_frame_yonly(cm, &cpi->mb.e_mbd, filt_low);
@@ -367,7 +367,7 @@ void vp8cx_pick_filter_level(YV12_BUFFER_CONFIG *sd, VP8_COMP *cpi)
{
if(ss_err[filt_high] == 0)
{
vp8_yv12_copy_y(saved_frame, cm->frame_to_show);
vpx_yv12_copy_y(saved_frame, cm->frame_to_show);
vp8cx_set_alt_lf_level(cpi, filt_high);
vp8_loop_filter_frame_yonly(cm, &cpi->mb.e_mbd, filt_high);

View File

@@ -234,7 +234,7 @@ void vp8_save_coding_context(VP8_COMP *cpi)
cc->frames_since_key = cpi->frames_since_key;
cc->filter_level = cpi->common.filter_level;
cc->frames_till_gf_update_due = cpi->frames_till_gf_update_due;
cc->frames_since_golden = cpi->common.frames_since_golden;
cc->frames_since_golden = cpi->frames_since_golden;
vp8_copy(cc->mvc, cpi->common.fc.mvc);
vp8_copy(cc->mvcosts, cpi->rd_costs.mvcosts);
@@ -271,7 +271,7 @@ void vp8_restore_coding_context(VP8_COMP *cpi)
cpi->frames_since_key = cc->frames_since_key;
cpi->common.filter_level = cc->filter_level;
cpi->frames_till_gf_update_due = cc->frames_till_gf_update_due;
cpi->common.frames_since_golden = cc->frames_since_golden;
cpi->frames_since_golden = cc->frames_since_golden;
vp8_copy(cpi->common.fc.mvc, cc->mvc);
@@ -388,7 +388,7 @@ static void calc_iframe_target_size(VP8_COMP *cpi)
int initial_boost = 32; /* |3.0 * per_frame_bandwidth| */
/* Boost depends somewhat on frame rate: only used for 1 layer case. */
if (cpi->oxcf.number_of_layers == 1) {
kf_boost = MAX(initial_boost, (int)(2 * cpi->output_frame_rate - 16));
kf_boost = MAX(initial_boost, (int)(2 * cpi->output_framerate - 16));
}
else {
/* Initial factor: set target size to: |3.0 * per_frame_bandwidth|. */
@@ -399,9 +399,9 @@ static void calc_iframe_target_size(VP8_COMP *cpi)
kf_boost = kf_boost * kf_boost_qadjustment[Q] / 100;
/* frame separation adjustment ( down) */
if (cpi->frames_since_key < cpi->output_frame_rate / 2)
if (cpi->frames_since_key < cpi->output_framerate / 2)
kf_boost = (int)(kf_boost
* cpi->frames_since_key / (cpi->output_frame_rate / 2));
* cpi->frames_since_key / (cpi->output_framerate / 2));
/* Minimal target size is |2* per_frame_bandwidth|. */
if (kf_boost < 16)
@@ -715,7 +715,7 @@ static void calc_pframe_target_size(VP8_COMP *cpi)
if (Adjustment > (cpi->this_frame_target - min_frame_target))
Adjustment = (cpi->this_frame_target - min_frame_target);
if (cpi->common.frames_since_golden == (cpi->current_gf_interval >> 1))
if (cpi->frames_since_golden == (cpi->current_gf_interval >> 1))
cpi->this_frame_target += ((cpi->current_gf_interval - 1) * Adjustment);
else
cpi->this_frame_target -= Adjustment;
@@ -1360,7 +1360,7 @@ static int estimate_keyframe_frequency(VP8_COMP *cpi)
* whichever is smaller.
*/
int key_freq = cpi->oxcf.key_freq>0 ? cpi->oxcf.key_freq : 1;
av_key_frame_frequency = 1 + (int)cpi->output_frame_rate * 2;
av_key_frame_frequency = 1 + (int)cpi->output_framerate * 2;
if (cpi->oxcf.auto_key && av_key_frame_frequency > key_freq)
av_key_frame_frequency = key_freq;

View File

@@ -341,7 +341,7 @@ void vp8_initialize_rd_consts(VP8_COMP *cpi, MACROBLOCK *x, int Qvalue)
void vp8_auto_select_speed(VP8_COMP *cpi)
{
int milliseconds_for_compress = (int)(1000000 / cpi->frame_rate);
int milliseconds_for_compress = (int)(1000000 / cpi->framerate);
milliseconds_for_compress = milliseconds_for_compress * (16 - cpi->oxcf.cpu_used) / 16;

View File

@@ -66,7 +66,6 @@ VP8_COMMON_SRCS-yes += common/setupintrarecon.c
VP8_COMMON_SRCS-yes += common/swapyv12buffer.c
VP8_COMMON_SRCS-yes += common/variance_c.c
VP8_COMMON_SRCS-yes += common/variance.h
VP8_COMMON_SRCS-yes += common/vp8_asm_com_offsets.c
VP8_COMMON_SRCS-yes += common/vp8_entropymodedata.h
@@ -192,7 +191,4 @@ VP8_COMMON_SRCS-$(HAVE_NEON) += common/arm/neon/vp8_subpixelvariance8x8_neon$(A
VP8_COMMON_SRCS-$(HAVE_NEON) += common/arm/neon/vp8_subpixelvariance16x16_neon$(ASM)
VP8_COMMON_SRCS-$(HAVE_NEON) += common/arm/neon/vp8_subpixelvariance16x16s_neon$(ASM)
$(eval $(call asm_offsets_template,\
vp8_asm_com_offsets.asm, $(VP8_PREFIX)common/vp8_asm_com_offsets.c))
$(eval $(call rtcd_h_template,vp8_rtcd,vp8/common/rtcd_defs.sh))

View File

@@ -153,7 +153,7 @@ static vpx_codec_err_t validate_config(vpx_codec_alg_priv_t *ctx,
#else
RANGE_CHECK_HI(cfg, g_lag_in_frames, 25);
#endif
RANGE_CHECK(cfg, rc_end_usage, VPX_VBR, VPX_CQ);
RANGE_CHECK(cfg, rc_end_usage, VPX_VBR, VPX_Q);
RANGE_CHECK_HI(cfg, rc_undershoot_pct, 1000);
RANGE_CHECK_HI(cfg, rc_overshoot_pct, 1000);
RANGE_CHECK_HI(cfg, rc_2pass_vbr_bias_pct, 100);
@@ -204,7 +204,7 @@ static vpx_codec_err_t validate_config(vpx_codec_alg_priv_t *ctx,
RANGE_CHECK_HI(vp8_cfg, arnr_strength, 6);
RANGE_CHECK(vp8_cfg, arnr_type, 1, 3);
RANGE_CHECK(vp8_cfg, cq_level, 0, 63);
if(finalize && cfg->rc_end_usage == VPX_CQ)
if (finalize && (cfg->rc_end_usage == VPX_CQ || cfg->rc_end_usage == VPX_Q))
RANGE_CHECK(vp8_cfg, cq_level,
cfg->rc_min_quantizer, cfg->rc_max_quantizer);
@@ -327,17 +327,14 @@ static vpx_codec_err_t set_vp8e_config(VP8_CONFIG *oxcf,
oxcf->resample_up_water_mark = cfg.rc_resize_up_thresh;
oxcf->resample_down_water_mark = cfg.rc_resize_down_thresh;
if (cfg.rc_end_usage == VPX_VBR)
{
oxcf->end_usage = USAGE_LOCAL_FILE_PLAYBACK;
}
else if (cfg.rc_end_usage == VPX_CBR)
{
oxcf->end_usage = USAGE_STREAM_FROM_SERVER;
}
else if (cfg.rc_end_usage == VPX_CQ)
{
oxcf->end_usage = USAGE_CONSTRAINED_QUALITY;
if (cfg.rc_end_usage == VPX_VBR) {
oxcf->end_usage = USAGE_LOCAL_FILE_PLAYBACK;
} else if (cfg.rc_end_usage == VPX_CBR) {
oxcf->end_usage = USAGE_STREAM_FROM_SERVER;
} else if (cfg.rc_end_usage == VPX_CQ) {
oxcf->end_usage = USAGE_CONSTRAINED_QUALITY;
} else if (cfg.rc_end_usage == VPX_Q) {
oxcf->end_usage = USAGE_CONSTANT_QUALITY;
}
oxcf->target_bandwidth = cfg.rc_target_bitrate;
@@ -695,7 +692,6 @@ static vpx_codec_err_t image2yuvconfig(const vpx_image_t *img,
yv12->uv_stride = img->stride[VPX_PLANE_U];
yv12->border = (img->stride[VPX_PLANE_Y] - img->w) / 2;
yv12->clrtype = (img->fmt == VPX_IMG_FMT_VPXI420 || img->fmt == VPX_IMG_FMT_VPXYV12);
return res;
}
@@ -1079,11 +1075,7 @@ static vpx_image_t *vp8e_get_preview(vpx_codec_alg_priv_t *ctx)
ctx->preview_img.planes[VPX_PLANE_U] = sd.u_buffer;
ctx->preview_img.planes[VPX_PLANE_V] = sd.v_buffer;
if (sd.clrtype == REG_YUV)
ctx->preview_img.fmt = VPX_IMG_FMT_I420;
else
ctx->preview_img.fmt = VPX_IMG_FMT_VPXI420;
ctx->preview_img.fmt = VPX_IMG_FMT_I420;
ctx->preview_img.x_chroma_shift = 1;
ctx->preview_img.y_chroma_shift = 1;
@@ -1277,7 +1269,7 @@ static vpx_codec_enc_cfg_map_t vp8e_usage_cfg_map[] =
1, /* g_delete_first_pass_file */
"vp8.fpf" /* first pass filename */
#endif
VPX_SS_DEFAULT_LAYERS, /* ss_number_layers */
1, /* ts_number_layers */
{0}, /* ts_target_bitrate */
{0}, /* ts_rate_decimator */

View File

@@ -41,15 +41,6 @@ typedef enum
static unsigned long vp8_priv_sz(const vpx_codec_dec_cfg_t *si, vpx_codec_flags_t);
typedef struct
{
unsigned int id;
unsigned long sz;
unsigned int align;
unsigned int flags;
unsigned long(*calc_sz)(const vpx_codec_dec_cfg_t *, vpx_codec_flags_t);
} mem_req_t;
static const mem_req_t vp8_mem_req_segs[] =
{
{VP8_SEG_ALG_PRIV, 0, 8, VPX_CODEC_MEM_ZERO, vp8_priv_sz},
@@ -93,65 +84,6 @@ static unsigned long vp8_priv_sz(const vpx_codec_dec_cfg_t *si, vpx_codec_flags_
return sizeof(vpx_codec_alg_priv_t);
}
static void vp8_mmap_dtor(vpx_codec_mmap_t *mmap)
{
free(mmap->priv);
}
static vpx_codec_err_t vp8_mmap_alloc(vpx_codec_mmap_t *mmap)
{
vpx_codec_err_t res;
unsigned int align;
align = mmap->align ? mmap->align - 1 : 0;
if (mmap->flags & VPX_CODEC_MEM_ZERO)
mmap->priv = calloc(1, mmap->sz + align);
else
mmap->priv = malloc(mmap->sz + align);
res = (mmap->priv) ? VPX_CODEC_OK : VPX_CODEC_MEM_ERROR;
mmap->base = (void *)((((uintptr_t)mmap->priv) + align) & ~(uintptr_t)align);
mmap->dtor = vp8_mmap_dtor;
return res;
}
static vpx_codec_err_t vp8_validate_mmaps(const vp8_stream_info_t *si,
const vpx_codec_mmap_t *mmaps,
vpx_codec_flags_t init_flags)
{
int i;
vpx_codec_err_t res = VPX_CODEC_OK;
for (i = 0; i < NELEMENTS(vp8_mem_req_segs) - 1; i++)
{
/* Ensure the segment has been allocated */
if (!mmaps[i].base)
{
res = VPX_CODEC_MEM_ERROR;
break;
}
/* Verify variable size segment is big enough for the current si. */
if (vp8_mem_req_segs[i].calc_sz)
{
vpx_codec_dec_cfg_t cfg;
cfg.w = si->w;
cfg.h = si->h;
if (mmaps[i].sz < vp8_mem_req_segs[i].calc_sz(&cfg, init_flags))
{
res = VPX_CODEC_MEM_ERROR;
break;
}
}
}
return res;
}
static void vp8_init_ctx(vpx_codec_ctx_t *ctx, const vpx_codec_mmap_t *mmap)
{
int i;
@@ -178,16 +110,6 @@ static void vp8_init_ctx(vpx_codec_ctx_t *ctx, const vpx_codec_mmap_t *mmap)
}
}
static void *mmap_lkup(vpx_codec_alg_priv_t *ctx, unsigned int id)
{
int i;
for (i = 0; i < NELEMENTS(ctx->mmaps); i++)
if (ctx->mmaps[i].id == id)
return ctx->mmaps[i].base;
return NULL;
}
static void vp8_finalize_mmaps(vpx_codec_alg_priv_t *ctx)
{
/* nothing to clean up */
@@ -214,7 +136,7 @@ static vpx_codec_err_t vp8_init(vpx_codec_ctx_t *ctx,
mmap.align = vp8_mem_req_segs[0].align;
mmap.flags = vp8_mem_req_segs[0].flags;
res = vp8_mmap_alloc(&mmap);
res = vpx_mmap_alloc(&mmap);
if (res != VPX_CODEC_OK) return res;
vp8_init_ctx(ctx, &mmap);
@@ -366,8 +288,7 @@ static void yuvconfig2image(vpx_image_t *img,
* the Y, U, and V planes, nor other alignment adjustments that
* might be representable by a YV12_BUFFER_CONFIG, so we just
* initialize all the fields.*/
img->fmt = yv12->clrtype == REG_YUV ?
VPX_IMG_FMT_I420 : VPX_IMG_FMT_VPXI420;
img->fmt = VPX_IMG_FMT_I420;
img->w = yv12->y_stride;
img->h = (yv12->y_height + 2 * VP8BORDERINPIXELS + 15) & ~15;
img->d_w = yv12->y_width;
@@ -488,7 +409,7 @@ static vpx_codec_err_t vp8_decode(vpx_codec_alg_priv_t *ctx,
ctx->mmaps[i].sz = vp8_mem_req_segs[i].calc_sz(&cfg,
ctx->base.init_flags);
res = vp8_mmap_alloc(&ctx->mmaps[i]);
res = vpx_mmap_alloc(&ctx->mmaps[i]);
}
if (!res)
@@ -500,7 +421,9 @@ static vpx_codec_err_t vp8_decode(vpx_codec_alg_priv_t *ctx,
/* Initialize the decoder instance on the first frame*/
if (!res && !ctx->decoder_init)
{
res = vp8_validate_mmaps(&ctx->si, ctx->mmaps, ctx->base.init_flags);
res = vpx_validate_mmaps(&ctx->si, ctx->mmaps,
vp8_mem_req_segs, NELEMENTS(vp8_mem_req_segs),
ctx->base.init_flags);
if (!res)
{
@@ -797,8 +720,6 @@ static vpx_codec_err_t image2yuvconfig(const vpx_image_t *img,
yv12->uv_stride = img->stride[VPX_PLANE_U];
yv12->border = (img->stride[VPX_PLANE_Y] - img->d_w) / 2;
yv12->clrtype = (img->fmt == VPX_IMG_FMT_VPXI420 || img->fmt == VPX_IMG_FMT_VPXYV12);
return res;
}

View File

@@ -35,9 +35,5 @@ VP8_DX_SRCS-yes += decoder/onyxd_int.h
VP8_DX_SRCS-yes += decoder/treereader.h
VP8_DX_SRCS-yes += decoder/onyxd_if.c
VP8_DX_SRCS-$(CONFIG_MULTITHREAD) += decoder/threading.c
VP8_DX_SRCS-yes += decoder/vp8_asm_dec_offsets.c
VP8_DX_SRCS-yes := $(filter-out $(VP8_DX_SRCS_REMOVE-yes),$(VP8_DX_SRCS-yes))
$(eval $(call asm_offsets_template,\
vp8_asm_dec_offsets.asm, $(VP8_PREFIX)decoder/vp8_asm_dec_offsets.c))

View File

@@ -0,0 +1,116 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
;
EXPORT |vp9_convolve_avg_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
|vp9_convolve_avg_neon| PROC
push {r4-r6, lr}
ldrd r4, r5, [sp, #32]
mov r6, r2
cmp r4, #32
bgt avg64
beq avg32
cmp r4, #8
bgt avg16
beq avg8
b avg4
avg64
sub lr, r1, #32
sub r4, r3, #32
avg64_h
pld [r0, r1, lsl #1]
vld1.8 {q0-q1}, [r0]!
vld1.8 {q2-q3}, [r0], lr
pld [r2, r3]
vld1.8 {q8-q9}, [r6@128]!
vld1.8 {q10-q11}, [r6@128], r4
vrhadd.u8 q0, q0, q8
vrhadd.u8 q1, q1, q9
vrhadd.u8 q2, q2, q10
vrhadd.u8 q3, q3, q11
vst1.8 {q0-q1}, [r2@128]!
vst1.8 {q2-q3}, [r2@128], r4
subs r5, r5, #1
bgt avg64_h
pop {r4-r6, pc}
avg32
vld1.8 {q0-q1}, [r0], r1
vld1.8 {q2-q3}, [r0], r1
vld1.8 {q8-q9}, [r6@128], r3
vld1.8 {q10-q11}, [r6@128], r3
pld [r0]
vrhadd.u8 q0, q0, q8
pld [r0, r1]
vrhadd.u8 q1, q1, q9
pld [r6]
vrhadd.u8 q2, q2, q10
pld [r6, r3]
vrhadd.u8 q3, q3, q11
vst1.8 {q0-q1}, [r2@128], r3
vst1.8 {q2-q3}, [r2@128], r3
subs r5, r5, #2
bgt avg32
pop {r4-r6, pc}
avg16
vld1.8 {q0}, [r0], r1
vld1.8 {q1}, [r0], r1
vld1.8 {q2}, [r6@128], r3
vld1.8 {q3}, [r6@128], r3
pld [r0]
pld [r0, r1]
vrhadd.u8 q0, q0, q2
pld [r6]
pld [r6, r3]
vrhadd.u8 q1, q1, q3
vst1.8 {q0}, [r2@128], r3
vst1.8 {q1}, [r2@128], r3
subs r5, r5, #2
bgt avg16
pop {r4-r6, pc}
avg8
vld1.8 {d0}, [r0], r1
vld1.8 {d1}, [r0], r1
vld1.8 {d2}, [r6@64], r3
vld1.8 {d3}, [r6@64], r3
pld [r0]
pld [r0, r1]
vrhadd.u8 q0, q0, q1
pld [r6]
pld [r6, r3]
vst1.8 {d0}, [r2@64], r3
vst1.8 {d1}, [r2@64], r3
subs r5, r5, #2
bgt avg8
pop {r4-r6, pc}
avg4
vld1.32 {d0[0]}, [r0], r1
vld1.32 {d0[1]}, [r0], r1
vld1.32 {d2[0]}, [r6@32], r3
vld1.32 {d2[1]}, [r6@32], r3
vrhadd.u8 d0, d0, d2
vst1.32 {d0[0]}, [r2@32], r3
vst1.32 {d0[1]}, [r2@32], r3
subs r5, r5, #2
bgt avg4
pop {r4-r6, pc}
ENDP
END

View File

@@ -0,0 +1,302 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
;
; These functions are only valid when:
; x_step_q4 == 16
; w%4 == 0
; h%4 == 0
; taps == 8
; VP9_FILTER_WEIGHT == 128
; VP9_FILTER_SHIFT == 7
EXPORT |vp9_convolve8_avg_horiz_neon|
EXPORT |vp9_convolve8_avg_vert_neon|
IMPORT |vp9_convolve8_avg_horiz_c|
IMPORT |vp9_convolve8_avg_vert_c|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
; Multiply and accumulate by q0
MACRO
MULTIPLY_BY_Q0 $dst, $src0, $src1, $src2, $src3, $src4, $src5, $src6, $src7
vmull.s16 $dst, $src0, d0[0]
vmlal.s16 $dst, $src1, d0[1]
vmlal.s16 $dst, $src2, d0[2]
vmlal.s16 $dst, $src3, d0[3]
vmlal.s16 $dst, $src4, d1[0]
vmlal.s16 $dst, $src5, d1[1]
vmlal.s16 $dst, $src6, d1[2]
vmlal.s16 $dst, $src7, d1[3]
MEND
; r0 const uint8_t *src
; r1 int src_stride
; r2 uint8_t *dst
; r3 int dst_stride
; sp[]const int16_t *filter_x
; sp[]int x_step_q4
; sp[]const int16_t *filter_y ; unused
; sp[]int y_step_q4 ; unused
; sp[]int w
; sp[]int h
|vp9_convolve8_avg_horiz_neon| PROC
ldr r12, [sp, #4] ; x_step_q4
cmp r12, #16
bne vp9_convolve8_avg_horiz_c
push {r4-r10, lr}
sub r0, r0, #3 ; adjust for taps
ldr r5, [sp, #32] ; filter_x
ldr r6, [sp, #48] ; w
ldr r7, [sp, #52] ; h
vld1.s16 {q0}, [r5] ; filter_x
sub r8, r1, r1, lsl #2 ; -src_stride * 3
add r8, r8, #4 ; -src_stride * 3 + 4
sub r4, r3, r3, lsl #2 ; -dst_stride * 3
add r4, r4, #4 ; -dst_stride * 3 + 4
rsb r9, r6, r1, lsl #2 ; reset src for outer loop
sub r9, r9, #7
rsb r12, r6, r3, lsl #2 ; reset dst for outer loop
mov r10, r6 ; w loop counter
loop_horiz_v
vld1.8 {d24}, [r0], r1
vld1.8 {d25}, [r0], r1
vld1.8 {d26}, [r0], r1
vld1.8 {d27}, [r0], r8
vtrn.16 q12, q13
vtrn.8 d24, d25
vtrn.8 d26, d27
pld [r0, r1, lsl #2]
vmovl.u8 q8, d24
vmovl.u8 q9, d25
vmovl.u8 q10, d26
vmovl.u8 q11, d27
; save a few instructions in the inner loop
vswp d17, d18
vmov d23, d21
add r0, r0, #3
loop_horiz
add r5, r0, #64
vld1.32 {d28[]}, [r0], r1
vld1.32 {d29[]}, [r0], r1
vld1.32 {d31[]}, [r0], r1
vld1.32 {d30[]}, [r0], r8
pld [r5]
vtrn.16 d28, d31
vtrn.16 d29, d30
vtrn.8 d28, d29
vtrn.8 d31, d30
pld [r5, r1]
; extract to s16
vtrn.32 q14, q15
vmovl.u8 q12, d28
vmovl.u8 q13, d29
pld [r5, r1, lsl #1]
; slightly out of order load to match the existing data
vld1.u32 {d6[0]}, [r2], r3
vld1.u32 {d7[0]}, [r2], r3
vld1.u32 {d6[1]}, [r2], r3
vld1.u32 {d7[1]}, [r2], r3
sub r2, r2, r3, lsl #2 ; reset for store
; src[] * filter_x
MULTIPLY_BY_Q0 q1, d16, d17, d20, d22, d18, d19, d23, d24
MULTIPLY_BY_Q0 q2, d17, d20, d22, d18, d19, d23, d24, d26
MULTIPLY_BY_Q0 q14, d20, d22, d18, d19, d23, d24, d26, d27
MULTIPLY_BY_Q0 q15, d22, d18, d19, d23, d24, d26, d27, d25
pld [r5, -r8]
; += 64 >> 7
vqrshrun.s32 d2, q1, #7
vqrshrun.s32 d3, q2, #7
vqrshrun.s32 d4, q14, #7
vqrshrun.s32 d5, q15, #7
; saturate
vqmovn.u16 d2, q1
vqmovn.u16 d3, q2
; transpose
vtrn.16 d2, d3
vtrn.32 d2, d3
vtrn.8 d2, d3
; average the new value and the dst value
vrhadd.u8 q1, q1, q3
vst1.u32 {d2[0]}, [r2@32], r3
vst1.u32 {d3[0]}, [r2@32], r3
vst1.u32 {d2[1]}, [r2@32], r3
vst1.u32 {d3[1]}, [r2@32], r4
vmov q8, q9
vmov d20, d23
vmov q11, q12
vmov q9, q13
subs r6, r6, #4 ; w -= 4
bgt loop_horiz
; outer loop
mov r6, r10 ; restore w counter
add r0, r0, r9 ; src += src_stride * 4 - w
add r2, r2, r12 ; dst += dst_stride * 4 - w
subs r7, r7, #4 ; h -= 4
bgt loop_horiz_v
pop {r4-r10, pc}
ENDP
|vp9_convolve8_avg_vert_neon| PROC
ldr r12, [sp, #12]
cmp r12, #16
bne vp9_convolve8_avg_vert_c
push {r4-r8, lr}
; adjust for taps
sub r0, r0, r1
sub r0, r0, r1, lsl #1
ldr r4, [sp, #32] ; filter_y
ldr r6, [sp, #40] ; w
ldr lr, [sp, #44] ; h
vld1.s16 {q0}, [r4] ; filter_y
lsl r1, r1, #1
lsl r3, r3, #1
loop_vert_h
mov r4, r0
add r7, r0, r1, asr #1
mov r5, r2
add r8, r2, r3, asr #1
mov r12, lr ; h loop counter
vld1.u32 {d16[0]}, [r4], r1
vld1.u32 {d16[1]}, [r7], r1
vld1.u32 {d18[0]}, [r4], r1
vld1.u32 {d18[1]}, [r7], r1
vld1.u32 {d20[0]}, [r4], r1
vld1.u32 {d20[1]}, [r7], r1
vld1.u32 {d22[0]}, [r4], r1
vmovl.u8 q8, d16
vmovl.u8 q9, d18
vmovl.u8 q10, d20
vmovl.u8 q11, d22
loop_vert
; always process a 4x4 block at a time
vld1.u32 {d24[0]}, [r7], r1
vld1.u32 {d26[0]}, [r4], r1
vld1.u32 {d26[1]}, [r7], r1
vld1.u32 {d24[1]}, [r4], r1
; extract to s16
vmovl.u8 q12, d24
vmovl.u8 q13, d26
vld1.u32 {d6[0]}, [r5@32], r3
vld1.u32 {d6[1]}, [r8@32], r3
vld1.u32 {d7[0]}, [r5@32], r3
vld1.u32 {d7[1]}, [r8@32], r3
pld [r7]
pld [r4]
; src[] * filter_y
MULTIPLY_BY_Q0 q1, d16, d17, d18, d19, d20, d21, d22, d24
pld [r7, r1]
pld [r4, r1]
MULTIPLY_BY_Q0 q2, d17, d18, d19, d20, d21, d22, d24, d26
pld [r5]
pld [r8]
MULTIPLY_BY_Q0 q14, d18, d19, d20, d21, d22, d24, d26, d27
pld [r5, r3]
pld [r8, r3]
MULTIPLY_BY_Q0 q15, d19, d20, d21, d22, d24, d26, d27, d25
; += 64 >> 7
vqrshrun.s32 d2, q1, #7
vqrshrun.s32 d3, q2, #7
vqrshrun.s32 d4, q14, #7
vqrshrun.s32 d5, q15, #7
; saturate
vqmovn.u16 d2, q1
vqmovn.u16 d3, q2
; average the new value and the dst value
vrhadd.u8 q1, q1, q3
sub r5, r5, r3, lsl #1 ; reset for store
sub r8, r8, r3, lsl #1
vst1.u32 {d2[0]}, [r5@32], r3
vst1.u32 {d2[1]}, [r8@32], r3
vst1.u32 {d3[0]}, [r5@32], r3
vst1.u32 {d3[1]}, [r8@32], r3
vmov q8, q10
vmov d18, d22
vmov d19, d24
vmov q10, q13
vmov d22, d25
subs r12, r12, #4 ; h -= 4
bgt loop_vert
; outer loop
add r0, r0, #4
add r2, r2, #4
subs r6, r6, #4 ; w -= 4
bgt loop_vert_h
pop {r4-r8, pc}
ENDP
END

View File

@@ -0,0 +1,280 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
;
; These functions are only valid when:
; x_step_q4 == 16
; w%4 == 0
; h%4 == 0
; taps == 8
; VP9_FILTER_WEIGHT == 128
; VP9_FILTER_SHIFT == 7
EXPORT |vp9_convolve8_horiz_neon|
EXPORT |vp9_convolve8_vert_neon|
IMPORT |vp9_convolve8_horiz_c|
IMPORT |vp9_convolve8_vert_c|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
; Multiply and accumulate by q0
MACRO
MULTIPLY_BY_Q0 $dst, $src0, $src1, $src2, $src3, $src4, $src5, $src6, $src7
vmull.s16 $dst, $src0, d0[0]
vmlal.s16 $dst, $src1, d0[1]
vmlal.s16 $dst, $src2, d0[2]
vmlal.s16 $dst, $src3, d0[3]
vmlal.s16 $dst, $src4, d1[0]
vmlal.s16 $dst, $src5, d1[1]
vmlal.s16 $dst, $src6, d1[2]
vmlal.s16 $dst, $src7, d1[3]
MEND
; r0 const uint8_t *src
; r1 int src_stride
; r2 uint8_t *dst
; r3 int dst_stride
; sp[]const int16_t *filter_x
; sp[]int x_step_q4
; sp[]const int16_t *filter_y ; unused
; sp[]int y_step_q4 ; unused
; sp[]int w
; sp[]int h
|vp9_convolve8_horiz_neon| PROC
ldr r12, [sp, #4] ; x_step_q4
cmp r12, #16
bne vp9_convolve8_horiz_c
push {r4-r10, lr}
sub r0, r0, #3 ; adjust for taps
ldr r5, [sp, #32] ; filter_x
ldr r6, [sp, #48] ; w
ldr r7, [sp, #52] ; h
vld1.s16 {q0}, [r5] ; filter_x
sub r8, r1, r1, lsl #2 ; -src_stride * 3
add r8, r8, #4 ; -src_stride * 3 + 4
sub r4, r3, r3, lsl #2 ; -dst_stride * 3
add r4, r4, #4 ; -dst_stride * 3 + 4
rsb r9, r6, r1, lsl #2 ; reset src for outer loop
sub r9, r9, #7
rsb r12, r6, r3, lsl #2 ; reset dst for outer loop
mov r10, r6 ; w loop counter
loop_horiz_v
vld1.8 {d24}, [r0], r1
vld1.8 {d25}, [r0], r1
vld1.8 {d26}, [r0], r1
vld1.8 {d27}, [r0], r8
vtrn.16 q12, q13
vtrn.8 d24, d25
vtrn.8 d26, d27
pld [r0, r1, lsl #2]
vmovl.u8 q8, d24
vmovl.u8 q9, d25
vmovl.u8 q10, d26
vmovl.u8 q11, d27
; save a few instructions in the inner loop
vswp d17, d18
vmov d23, d21
add r0, r0, #3
loop_horiz
add r5, r0, #64
vld1.32 {d28[]}, [r0], r1
vld1.32 {d29[]}, [r0], r1
vld1.32 {d31[]}, [r0], r1
vld1.32 {d30[]}, [r0], r8
pld [r5]
vtrn.16 d28, d31
vtrn.16 d29, d30
vtrn.8 d28, d29
vtrn.8 d31, d30
pld [r5, r1]
; extract to s16
vtrn.32 q14, q15
vmovl.u8 q12, d28
vmovl.u8 q13, d29
pld [r5, r1, lsl #1]
; src[] * filter_x
MULTIPLY_BY_Q0 q1, d16, d17, d20, d22, d18, d19, d23, d24
MULTIPLY_BY_Q0 q2, d17, d20, d22, d18, d19, d23, d24, d26
MULTIPLY_BY_Q0 q14, d20, d22, d18, d19, d23, d24, d26, d27
MULTIPLY_BY_Q0 q15, d22, d18, d19, d23, d24, d26, d27, d25
pld [r5, -r8]
; += 64 >> 7
vqrshrun.s32 d2, q1, #7
vqrshrun.s32 d3, q2, #7
vqrshrun.s32 d4, q14, #7
vqrshrun.s32 d5, q15, #7
; saturate
vqmovn.u16 d2, q1
vqmovn.u16 d3, q2
; transpose
vtrn.16 d2, d3
vtrn.32 d2, d3
vtrn.8 d2, d3
vst1.u32 {d2[0]}, [r2@32], r3
vst1.u32 {d3[0]}, [r2@32], r3
vst1.u32 {d2[1]}, [r2@32], r3
vst1.u32 {d3[1]}, [r2@32], r4
vmov q8, q9
vmov d20, d23
vmov q11, q12
vmov q9, q13
subs r6, r6, #4 ; w -= 4
bgt loop_horiz
; outer loop
mov r6, r10 ; restore w counter
add r0, r0, r9 ; src += src_stride * 4 - w
add r2, r2, r12 ; dst += dst_stride * 4 - w
subs r7, r7, #4 ; h -= 4
bgt loop_horiz_v
pop {r4-r10, pc}
ENDP
|vp9_convolve8_vert_neon| PROC
ldr r12, [sp, #12]
cmp r12, #16
bne vp9_convolve8_vert_c
push {r4-r8, lr}
; adjust for taps
sub r0, r0, r1
sub r0, r0, r1, lsl #1
ldr r4, [sp, #32] ; filter_y
ldr r6, [sp, #40] ; w
ldr lr, [sp, #44] ; h
vld1.s16 {q0}, [r4] ; filter_y
lsl r1, r1, #1
lsl r3, r3, #1
loop_vert_h
mov r4, r0
add r7, r0, r1, asr #1
mov r5, r2
add r8, r2, r3, asr #1
mov r12, lr ; h loop counter
vld1.u32 {d16[0]}, [r4], r1
vld1.u32 {d16[1]}, [r7], r1
vld1.u32 {d18[0]}, [r4], r1
vld1.u32 {d18[1]}, [r7], r1
vld1.u32 {d20[0]}, [r4], r1
vld1.u32 {d20[1]}, [r7], r1
vld1.u32 {d22[0]}, [r4], r1
vmovl.u8 q8, d16
vmovl.u8 q9, d18
vmovl.u8 q10, d20
vmovl.u8 q11, d22
loop_vert
; always process a 4x4 block at a time
vld1.u32 {d24[0]}, [r7], r1
vld1.u32 {d26[0]}, [r4], r1
vld1.u32 {d26[1]}, [r7], r1
vld1.u32 {d24[1]}, [r4], r1
; extract to s16
vmovl.u8 q12, d24
vmovl.u8 q13, d26
pld [r5]
pld [r8]
; src[] * filter_y
MULTIPLY_BY_Q0 q1, d16, d17, d18, d19, d20, d21, d22, d24
pld [r5, r3]
pld [r8, r3]
MULTIPLY_BY_Q0 q2, d17, d18, d19, d20, d21, d22, d24, d26
pld [r7]
pld [r4]
MULTIPLY_BY_Q0 q14, d18, d19, d20, d21, d22, d24, d26, d27
pld [r7, r1]
pld [r4, r1]
MULTIPLY_BY_Q0 q15, d19, d20, d21, d22, d24, d26, d27, d25
; += 64 >> 7
vqrshrun.s32 d2, q1, #7
vqrshrun.s32 d3, q2, #7
vqrshrun.s32 d4, q14, #7
vqrshrun.s32 d5, q15, #7
; saturate
vqmovn.u16 d2, q1
vqmovn.u16 d3, q2
vst1.u32 {d2[0]}, [r5@32], r3
vst1.u32 {d2[1]}, [r8@32], r3
vst1.u32 {d3[0]}, [r5@32], r3
vst1.u32 {d3[1]}, [r8@32], r3
vmov q8, q10
vmov d18, d22
vmov d19, d24
vmov q10, q13
vmov d22, d25
subs r12, r12, #4 ; h -= 4
bgt loop_vert
; outer loop
add r0, r0, #4
add r2, r2, #4
subs r6, r6, #4 ; w -= 4
bgt loop_vert_h
pop {r4-r8, pc}
ENDP
END

View File

@@ -0,0 +1,78 @@
/*
* Copyright (c) 2013 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "./vp9_rtcd.h"
#include "vp9/common/vp9_common.h"
#include "vpx_ports/mem.h"
void vp9_convolve8_neon(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4,
int w, int h) {
/* Given our constraints: w <= 64, h <= 64, taps == 8 we can reduce the
* maximum buffer size to 64 * 64 + 7 (+ 1 to make it divisible by 4).
*/
DECLARE_ALIGNED_ARRAY(8, uint8_t, temp, 64 * 72);
// Account for the vertical phase needing 3 lines prior and 4 lines post
int intermediate_height = h + 7;
if (x_step_q4 != 16 || y_step_q4 != 16)
return vp9_convolve8_c(src, src_stride,
dst, dst_stride,
filter_x, x_step_q4,
filter_y, y_step_q4,
w, h);
/* Filter starting 3 lines back. The neon implementation will ignore the
* given height and filter a multiple of 4 lines. Since this goes in to
* the temp buffer which has lots of extra room and is subsequently discarded
* this is safe if somewhat less than ideal.
*/
vp9_convolve8_horiz_neon(src - src_stride * 3, src_stride,
temp, 64,
filter_x, x_step_q4, filter_y, y_step_q4,
w, intermediate_height);
/* Step into the temp buffer 3 lines to get the actual frame data */
vp9_convolve8_vert_neon(temp + 64 * 3, 64,
dst, dst_stride,
filter_x, x_step_q4, filter_y, y_step_q4,
w, h);
}
void vp9_convolve8_avg_neon(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int x_step_q4,
const int16_t *filter_y, int y_step_q4,
int w, int h) {
DECLARE_ALIGNED_ARRAY(8, uint8_t, temp, 64 * 72);
int intermediate_height = h + 7;
if (x_step_q4 != 16 || y_step_q4 != 16)
return vp9_convolve8_avg_c(src, src_stride,
dst, dst_stride,
filter_x, x_step_q4,
filter_y, y_step_q4,
w, h);
/* This implementation has the same issues as above. In addition, we only want
* to average the values after both passes.
*/
vp9_convolve8_horiz_neon(src - src_stride * 3, src_stride,
temp, 64,
filter_x, x_step_q4, filter_y, y_step_q4,
w, intermediate_height);
vp9_convolve8_avg_vert_neon(temp + 64 * 3,
64, dst, dst_stride,
filter_x, x_step_q4, filter_y, y_step_q4,
w, h);
}

View File

@@ -0,0 +1,84 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
;
EXPORT |vp9_convolve_copy_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
|vp9_convolve_copy_neon| PROC
push {r4-r5, lr}
ldrd r4, r5, [sp, #28]
cmp r4, #32
bgt copy64
beq copy32
cmp r4, #8
bgt copy16
beq copy8
b copy4
copy64
sub lr, r1, #32
sub r3, r3, #32
copy64_h
pld [r0, r1, lsl #1]
vld1.8 {q0-q1}, [r0]!
vld1.8 {q2-q3}, [r0], lr
vst1.8 {q0-q1}, [r2@128]!
vst1.8 {q2-q3}, [r2@128], r3
subs r5, r5, #1
bgt copy64_h
pop {r4-r5, pc}
copy32
pld [r0, r1, lsl #1]
vld1.8 {q0-q1}, [r0], r1
pld [r0, r1, lsl #1]
vld1.8 {q2-q3}, [r0], r1
vst1.8 {q0-q1}, [r2@128], r3
vst1.8 {q2-q3}, [r2@128], r3
subs r5, r5, #2
bgt copy32
pop {r4-r5, pc}
copy16
pld [r0, r1, lsl #1]
vld1.8 {q0}, [r0], r1
pld [r0, r1, lsl #1]
vld1.8 {q1}, [r0], r1
vst1.8 {q0}, [r2@128], r3
vst1.8 {q1}, [r2@128], r3
subs r5, r5, #2
bgt copy16
pop {r4-r5, pc}
copy8
pld [r0, r1, lsl #1]
vld1.8 {d0}, [r0], r1
pld [r0, r1, lsl #1]
vld1.8 {d2}, [r0], r1
vst1.8 {d0}, [r2@64], r3
vst1.8 {d2}, [r2@64], r3
subs r5, r5, #2
bgt copy8
pop {r4-r5, pc}
copy4
ldr r12, [r0], r1
str r12, [r2], r3
subs r5, r5, #1
bgt copy4
pop {r4-r5, pc}
ENDP
END

View File

@@ -0,0 +1,69 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license and patent
; grant that can be found in the LICENSE file in the root of the source
; tree. All contributing project authors may be found in the AUTHORS
; file in the root of the source tree.
;
EXPORT |vp9_dc_only_idct_add_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
;void vp9_dc_only_idct_add_neon(int input_dc, uint8_t *pred_ptr,
; uint8_t *dst_ptr, int pitch, int stride)
;
; r0 int input_dc
; r1 uint8_t *pred_ptr
; r2 uint8_t *dst_ptr
; r3 int pitch
; sp int stride
|vp9_dc_only_idct_add_neon| PROC
; generate cospi_16_64 = 11585
mov r12, #0x2d00
add r12, #0x41
; dct_const_round_shift(input_dc * cospi_16_64)
mul r0, r0, r12 ; input_dc * cospi_16_64
add r0, r0, #0x2000 ; +(1 << ((DCT_CONST_BITS) - 1))
asr r0, r0, #14 ; >> DCT_CONST_BITS
; dct_const_round_shift(out * cospi_16_64)
mul r0, r0, r12 ; out * cospi_16_64
add r0, r0, #0x2000 ; +(1 << ((DCT_CONST_BITS) - 1))
asr r0, r0, #14 ; >> DCT_CONST_BITS
; ROUND_POWER_OF_TWO(out, 4)
add r0, r0, #8 ; + (1 <<((4) - 1))
asr r0, r0, #4 ; >> 4
vdup.16 q0, r0; ; duplicate a1
ldr r12, [sp] ; load stride
vld1.32 {d2[0]}, [r1], r3
vld1.32 {d2[1]}, [r1], r3
vld1.32 {d4[0]}, [r1], r3
vld1.32 {d4[1]}, [r1]
vaddw.u8 q1, q0, d2 ; a1 + pred_ptr[c]
vaddw.u8 q2, q0, d4
vqmovun.s16 d2, q1 ; clip_pixel
vqmovun.s16 d4, q2
vst1.32 {d2[0]}, [r2], r12
vst1.32 {d2[1]}, [r2], r12
vst1.32 {d4[0]}, [r2], r12
vst1.32 {d4[1]}, [r2]
bx lr
ENDP ; |vp9_dc_only_idct_add_neon|
END

View File

@@ -0,0 +1,169 @@
/*
* Copyright (c) 2013 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "./vp9_rtcd.h"
#include "vp9/common/vp9_common.h"
extern void vp9_short_idct16x16_add_neon_pass1(int16_t *input,
int16_t *output,
int output_stride);
extern void vp9_short_idct16x16_add_neon_pass2(int16_t *src,
int16_t *output,
int16_t *pass1Output,
int16_t skip_adding,
uint8_t *dest,
int dest_stride);
extern void vp9_short_idct10_16x16_add_neon_pass1(int16_t *input,
int16_t *output,
int output_stride);
extern void vp9_short_idct10_16x16_add_neon_pass2(int16_t *src,
int16_t *output,
int16_t *pass1Output,
int16_t skip_adding,
uint8_t *dest,
int dest_stride);
extern void save_neon_registers();
extern void restore_neon_registers();
void vp9_short_idct16x16_add_neon(int16_t *input,
uint8_t *dest, int dest_stride) {
int16_t pass1_output[16*16] = {0};
int16_t row_idct_output[16*16] = {0};
// save d8-d15 register values.
save_neon_registers();
/* Parallel idct on the upper 8 rows */
// First pass processes even elements 0, 2, 4, 6, 8, 10, 12, 14 and save the
// stage 6 result in pass1_output.
vp9_short_idct16x16_add_neon_pass1(input, pass1_output, 8);
// Second pass processes odd elements 1, 3, 5, 7, 9, 11, 13, 15 and combines
// with result in pass1(pass1_output) to calculate final result in stage 7
// which will be saved into row_idct_output.
vp9_short_idct16x16_add_neon_pass2(input+1,
row_idct_output,
pass1_output,
0,
dest,
dest_stride);
/* Parallel idct on the lower 8 rows */
// First pass processes even elements 0, 2, 4, 6, 8, 10, 12, 14 and save the
// stage 6 result in pass1_output.
vp9_short_idct16x16_add_neon_pass1(input+8*16, pass1_output, 8);
// Second pass processes odd elements 1, 3, 5, 7, 9, 11, 13, 15 and combines
// with result in pass1(pass1_output) to calculate final result in stage 7
// which will be saved into row_idct_output.
vp9_short_idct16x16_add_neon_pass2(input+8*16+1,
row_idct_output+8,
pass1_output,
0,
dest,
dest_stride);
/* Parallel idct on the left 8 columns */
// First pass processes even elements 0, 2, 4, 6, 8, 10, 12, 14 and save the
// stage 6 result in pass1_output.
vp9_short_idct16x16_add_neon_pass1(row_idct_output, pass1_output, 8);
// Second pass processes odd elements 1, 3, 5, 7, 9, 11, 13, 15 and combines
// with result in pass1(pass1_output) to calculate final result in stage 7.
// Then add the result to the destination data.
vp9_short_idct16x16_add_neon_pass2(row_idct_output+1,
row_idct_output,
pass1_output,
1,
dest,
dest_stride);
/* Parallel idct on the right 8 columns */
// First pass processes even elements 0, 2, 4, 6, 8, 10, 12, 14 and save the
// stage 6 result in pass1_output.
vp9_short_idct16x16_add_neon_pass1(row_idct_output+8*16, pass1_output, 8);
// Second pass processes odd elements 1, 3, 5, 7, 9, 11, 13, 15 and combines
// with result in pass1(pass1_output) to calculate final result in stage 7.
// Then add the result to the destination data.
vp9_short_idct16x16_add_neon_pass2(row_idct_output+8*16+1,
row_idct_output+8,
pass1_output,
1,
dest+8,
dest_stride);
// restore d8-d15 register values.
restore_neon_registers();
return;
}
void vp9_short_idct10_16x16_add_neon(int16_t *input,
uint8_t *dest, int dest_stride) {
int16_t pass1_output[16*16] = {0};
int16_t row_idct_output[16*16] = {0};
// save d8-d15 register values.
save_neon_registers();
/* Parallel idct on the upper 8 rows */
// First pass processes even elements 0, 2, 4, 6, 8, 10, 12, 14 and save the
// stage 6 result in pass1_output.
vp9_short_idct10_16x16_add_neon_pass1(input, pass1_output, 8);
// Second pass processes odd elements 1, 3, 5, 7, 9, 11, 13, 15 and combines
// with result in pass1(pass1_output) to calculate final result in stage 7
// which will be saved into row_idct_output.
vp9_short_idct10_16x16_add_neon_pass2(input+1,
row_idct_output,
pass1_output,
0,
dest,
dest_stride);
/* Skip Parallel idct on the lower 8 rows as they are all 0s */
/* Parallel idct on the left 8 columns */
// First pass processes even elements 0, 2, 4, 6, 8, 10, 12, 14 and save the
// stage 6 result in pass1_output.
vp9_short_idct16x16_add_neon_pass1(row_idct_output, pass1_output, 8);
// Second pass processes odd elements 1, 3, 5, 7, 9, 11, 13, 15 and combines
// with result in pass1(pass1_output) to calculate final result in stage 7.
// Then add the result to the destination data.
vp9_short_idct16x16_add_neon_pass2(row_idct_output+1,
row_idct_output,
pass1_output,
1,
dest,
dest_stride);
/* Parallel idct on the right 8 columns */
// First pass processes even elements 0, 2, 4, 6, 8, 10, 12, 14 and save the
// stage 6 result in pass1_output.
vp9_short_idct16x16_add_neon_pass1(row_idct_output+8*16, pass1_output, 8);
// Second pass processes odd elements 1, 3, 5, 7, 9, 11, 13, 15 and combines
// with result in pass1(pass1_output) to calculate final result in stage 7.
// Then add the result to the destination data.
vp9_short_idct16x16_add_neon_pass2(row_idct_output+8*16+1,
row_idct_output+8,
pass1_output,
1,
dest+8,
dest_stride);
// restore d8-d15 register values.
restore_neon_registers();
return;
}

View File

@@ -0,0 +1,47 @@
/*
* Copyright (c) 2013 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "vp9/common/vp9_common.h"
// defined in vp9/common/arm/neon/vp9_short_idct32x32_add_neon.asm
extern void idct32_transpose_and_transform(int16_t *transpose_buffer,
int16_t *output, int16_t *input);
extern void idct32_combine_add(uint8_t *dest, int16_t *out, int dest_stride);
// defined in vp9/common/arm/neon/vp9_short_idct16x16_add_neon.asm
extern void save_neon_registers();
extern void restore_neon_registers();
void vp9_short_idct32x32_add_neon(int16_t *input, uint8_t *dest,
int dest_stride) {
// TODO(cd): move the creation of these buffers within the ASM file
// internal buffer used to transpose 8 lines into before transforming them
int16_t transpose_buffer[32 * 8];
// results of the first pass (transpose and transform rows)
int16_t pass1[32 * 32];
// results of the second pass (transpose and transform columns)
int16_t pass2[32 * 32];
// save register we need to preserve
save_neon_registers();
// process rows
idct32_transpose_and_transform(transpose_buffer, pass1, input);
// process columns
// TODO(cd): do these two steps/passes within the ASM file
idct32_transpose_and_transform(transpose_buffer, pass2, pass1);
// combine and add to dest
// TODO(cd): integrate this within the last storage step of the second pass
idct32_combine_add(dest, pass2, dest_stride);
// restore register we need to preserve
restore_neon_registers();
}
// TODO(cd): Eliminate this file altogether when everything is in ASM file

View File

@@ -0,0 +1,708 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
;
EXPORT |vp9_loop_filter_horizontal_edge_neon|
EXPORT |vp9_loop_filter_vertical_edge_neon|
EXPORT |vp9_mbloop_filter_horizontal_edge_neon|
EXPORT |vp9_mbloop_filter_vertical_edge_neon|
ARM
AREA ||.text||, CODE, READONLY, ALIGN=2
; Currently vp9 only works on iterations 8 at a time. The vp8 loop filter
; works on 16 iterations at a time.
; TODO(fgalligan): See about removing the count code as this function is only
; called with a count of 1.
;
; void vp9_loop_filter_horizontal_edge_neon(uint8_t *s,
; int p /* pitch */,
; const uint8_t *blimit,
; const uint8_t *limit,
; const uint8_t *thresh,
; int count)
;
; r0 uint8_t *s,
; r1 int p, /* pitch */
; r2 const uint8_t *blimit,
; r3 const uint8_t *limit,
; sp const uint8_t *thresh,
; sp+4 int count
|vp9_loop_filter_horizontal_edge_neon| PROC
push {lr}
vld1.8 {d0[]}, [r2] ; duplicate *blimit
ldr r12, [sp, #8] ; load count
ldr r2, [sp, #4] ; load thresh
add r1, r1, r1 ; double pitch
cmp r12, #0
beq end_vp9_lf_h_edge
vld1.8 {d1[]}, [r3] ; duplicate *limit
vld1.8 {d2[]}, [r2] ; duplicate *thresh
count_lf_h_loop
sub r2, r0, r1, lsl #1 ; move src pointer down by 4 lines
add r3, r2, r1, lsr #1 ; set to 3 lines down
vld1.u8 {d3}, [r2@64], r1 ; p3
vld1.u8 {d4}, [r3@64], r1 ; p2
vld1.u8 {d5}, [r2@64], r1 ; p1
vld1.u8 {d6}, [r3@64], r1 ; p0
vld1.u8 {d7}, [r2@64], r1 ; q0
vld1.u8 {d16}, [r3@64], r1 ; q1
vld1.u8 {d17}, [r2@64] ; q2
vld1.u8 {d18}, [r3@64] ; q3
sub r2, r2, r1, lsl #1
sub r3, r3, r1, lsl #1
bl vp9_loop_filter_neon
vst1.u8 {d4}, [r2@64], r1 ; store op1
vst1.u8 {d5}, [r3@64], r1 ; store op0
vst1.u8 {d6}, [r2@64], r1 ; store oq0
vst1.u8 {d7}, [r3@64], r1 ; store oq1
add r0, r0, #8
subs r12, r12, #1
bne count_lf_h_loop
end_vp9_lf_h_edge
pop {pc}
ENDP ; |vp9_loop_filter_horizontal_edge_neon|
; Currently vp9 only works on iterations 8 at a time. The vp8 loop filter
; works on 16 iterations at a time.
; TODO(fgalligan): See about removing the count code as this function is only
; called with a count of 1.
;
; void vp9_loop_filter_vertical_edge_neon(uint8_t *s,
; int p /* pitch */,
; const uint8_t *blimit,
; const uint8_t *limit,
; const uint8_t *thresh,
; int count)
;
; r0 uint8_t *s,
; r1 int p, /* pitch */
; r2 const uint8_t *blimit,
; r3 const uint8_t *limit,
; sp const uint8_t *thresh,
; sp+4 int count
|vp9_loop_filter_vertical_edge_neon| PROC
push {lr}
vld1.8 {d0[]}, [r2] ; duplicate *blimit
ldr r12, [sp, #8] ; load count
vld1.8 {d1[]}, [r3] ; duplicate *limit
ldr r3, [sp, #4] ; load thresh
sub r2, r0, #4 ; move s pointer down by 4 columns
cmp r12, #0
beq end_vp9_lf_v_edge
vld1.8 {d2[]}, [r3] ; duplicate *thresh
count_lf_v_loop
vld1.u8 {d3}, [r2], r1 ; load s data
vld1.u8 {d4}, [r2], r1
vld1.u8 {d5}, [r2], r1
vld1.u8 {d6}, [r2], r1
vld1.u8 {d7}, [r2], r1
vld1.u8 {d16}, [r2], r1
vld1.u8 {d17}, [r2], r1
vld1.u8 {d18}, [r2]
;transpose to 8x16 matrix
vtrn.32 d3, d7
vtrn.32 d4, d16
vtrn.32 d5, d17
vtrn.32 d6, d18
vtrn.16 d3, d5
vtrn.16 d4, d6
vtrn.16 d7, d17
vtrn.16 d16, d18
vtrn.8 d3, d4
vtrn.8 d5, d6
vtrn.8 d7, d16
vtrn.8 d17, d18
bl vp9_loop_filter_neon
sub r0, r0, #2
;store op1, op0, oq0, oq1
vst4.8 {d4[0], d5[0], d6[0], d7[0]}, [r0], r1
vst4.8 {d4[1], d5[1], d6[1], d7[1]}, [r0], r1
vst4.8 {d4[2], d5[2], d6[2], d7[2]}, [r0], r1
vst4.8 {d4[3], d5[3], d6[3], d7[3]}, [r0], r1
vst4.8 {d4[4], d5[4], d6[4], d7[4]}, [r0], r1
vst4.8 {d4[5], d5[5], d6[5], d7[5]}, [r0], r1
vst4.8 {d4[6], d5[6], d6[6], d7[6]}, [r0], r1
vst4.8 {d4[7], d5[7], d6[7], d7[7]}, [r0]
add r0, r0, r1, lsl #3 ; s += pitch * 8
subs r12, r12, #1
subne r2, r0, #4 ; move s pointer down by 4 columns
bne count_lf_v_loop
end_vp9_lf_v_edge
pop {pc}
ENDP ; |vp9_loop_filter_vertical_edge_neon|
; void vp9_loop_filter_neon();
; This is a helper function for the loopfilters. The invidual functions do the
; necessary load, transpose (if necessary) and store. The function does not use
; registers d8-d15.
;
; Inputs:
; r0-r3, r12 PRESERVE
; d0 blimit
; d1 limit
; d2 thresh
; d3 p3
; d4 p2
; d5 p1
; d6 p0
; d7 q0
; d16 q1
; d17 q2
; d18 q3
;
; Outputs:
; d4 op1
; d5 op0
; d6 oq0
; d7 oq1
|vp9_loop_filter_neon| PROC
; filter_mask
vabd.u8 d19, d3, d4 ; m1 = abs(p3 - p2)
vabd.u8 d20, d4, d5 ; m2 = abs(p2 - p1)
vabd.u8 d21, d5, d6 ; m3 = abs(p1 - p0)
vabd.u8 d22, d16, d7 ; m4 = abs(q1 - q0)
vabd.u8 d3, d17, d16 ; m5 = abs(q2 - q1)
vabd.u8 d4, d18, d17 ; m6 = abs(q3 - q2)
; only compare the largest value to limit
vmax.u8 d19, d19, d20 ; m1 = max(m1, m2)
vmax.u8 d20, d21, d22 ; m2 = max(m3, m4)
vabd.u8 d17, d6, d7 ; abs(p0 - q0)
vmax.u8 d3, d3, d4 ; m3 = max(m5, m6)
vmov.u8 d18, #0x80
vmax.u8 d23, d19, d20 ; m1 = max(m1, m2)
; hevmask
vcgt.u8 d21, d21, d2 ; (abs(p1 - p0) > thresh)*-1
vcgt.u8 d22, d22, d2 ; (abs(q1 - q0) > thresh)*-1
vmax.u8 d23, d23, d3 ; m1 = max(m1, m3)
vabd.u8 d28, d5, d16 ; a = abs(p1 - q1)
vqadd.u8 d17, d17, d17 ; b = abs(p0 - q0) * 2
veor d7, d7, d18 ; qs0
vcge.u8 d23, d1, d23 ; abs(m1) > limit
; filter() function
; convert to signed
vshr.u8 d28, d28, #1 ; a = a / 2
veor d6, d6, d18 ; ps0
veor d5, d5, d18 ; ps1
vqadd.u8 d17, d17, d28 ; a = b + a
veor d16, d16, d18 ; qs1
vmov.u8 d19, #3
vsub.s8 d28, d7, d6 ; ( qs0 - ps0)
vcge.u8 d17, d0, d17 ; a > blimit
vqsub.s8 d27, d5, d16 ; filter = clamp(ps1-qs1)
vorr d22, d21, d22 ; hevmask
vmull.s8 q12, d28, d19 ; 3 * ( qs0 - ps0)
vand d27, d27, d22 ; filter &= hev
vand d23, d23, d17 ; filter_mask
vaddw.s8 q12, q12, d27 ; filter + 3 * (qs0 - ps0)
vmov.u8 d17, #4
; filter = clamp(filter + 3 * ( qs0 - ps0))
vqmovn.s16 d27, q12
vand d27, d27, d23 ; filter &= mask
vqadd.s8 d28, d27, d19 ; filter2 = clamp(filter+3)
vqadd.s8 d27, d27, d17 ; filter1 = clamp(filter+4)
vshr.s8 d28, d28, #3 ; filter2 >>= 3
vshr.s8 d27, d27, #3 ; filter1 >>= 3
vqadd.s8 d19, d6, d28 ; u = clamp(ps0 + filter2)
vqsub.s8 d26, d7, d27 ; u = clamp(qs0 - filter1)
; outer tap adjustments
vrshr.s8 d27, d27, #1 ; filter = ++filter1 >> 1
veor d6, d26, d18 ; *oq0 = u^0x80
vbic d27, d27, d22 ; filter &= ~hev
vqadd.s8 d21, d5, d27 ; u = clamp(ps1 + filter)
vqsub.s8 d20, d16, d27 ; u = clamp(qs1 - filter)
veor d5, d19, d18 ; *op0 = u^0x80
veor d4, d21, d18 ; *op1 = u^0x80
veor d7, d20, d18 ; *oq1 = u^0x80
bx lr
ENDP ; |vp9_loop_filter_neon|
; void vp9_mbloop_filter_horizontal_edge_neon(uint8_t *s, int p,
; const uint8_t *blimit,
; const uint8_t *limit,
; const uint8_t *thresh,
; int count)
; r0 uint8_t *s,
; r1 int p, /* pitch */
; r2 const uint8_t *blimit,
; r3 const uint8_t *limit,
; sp const uint8_t *thresh,
; sp+4 int count
|vp9_mbloop_filter_horizontal_edge_neon| PROC
push {r4-r5, lr}
vld1.8 {d0[]}, [r2] ; duplicate *blimit
ldr r12, [sp, #16] ; load count
ldr r2, [sp, #12] ; load thresh
add r1, r1, r1 ; double pitch
cmp r12, #0
beq end_vp9_mblf_h_edge
vld1.8 {d1[]}, [r3] ; duplicate *limit
vld1.8 {d2[]}, [r2] ; duplicate *thresh
count_mblf_h_loop
sub r3, r0, r1, lsl #1 ; move src pointer down by 4 lines
add r2, r3, r1, lsr #1 ; set to 3 lines down
vld1.u8 {d3}, [r3@64], r1 ; p3
vld1.u8 {d4}, [r2@64], r1 ; p2
vld1.u8 {d5}, [r3@64], r1 ; p1
vld1.u8 {d6}, [r2@64], r1 ; p0
vld1.u8 {d7}, [r3@64], r1 ; q0
vld1.u8 {d16}, [r2@64], r1 ; q1
vld1.u8 {d17}, [r3@64] ; q2
vld1.u8 {d18}, [r2@64], r1 ; q3
sub r3, r3, r1, lsl #1
sub r2, r2, r1, lsl #2
bl vp9_mbloop_filter_neon
vst1.u8 {d0}, [r2@64], r1 ; store op2
vst1.u8 {d1}, [r3@64], r1 ; store op1
vst1.u8 {d2}, [r2@64], r1 ; store op0
vst1.u8 {d3}, [r3@64], r1 ; store oq0
vst1.u8 {d4}, [r2@64], r1 ; store oq1
vst1.u8 {d5}, [r3@64], r1 ; store oq2
add r0, r0, #8
subs r12, r12, #1
bne count_mblf_h_loop
end_vp9_mblf_h_edge
pop {r4-r5, pc}
ENDP ; |vp9_mbloop_filter_horizontal_edge_neon|
; void vp9_mbloop_filter_vertical_edge_neon(uint8_t *s,
; int pitch,
; const uint8_t *blimit,
; const uint8_t *limit,
; const uint8_t *thresh,
; int count)
;
; r0 uint8_t *s,
; r1 int pitch,
; r2 const uint8_t *blimit,
; r3 const uint8_t *limit,
; sp const uint8_t *thresh,
; sp+4 int count
|vp9_mbloop_filter_vertical_edge_neon| PROC
push {r4-r5, lr}
vld1.8 {d0[]}, [r2] ; duplicate *blimit
ldr r12, [sp, #16] ; load count
vld1.8 {d1[]}, [r3] ; duplicate *limit
ldr r3, [sp, #12] ; load thresh
sub r2, r0, #4 ; move s pointer down by 4 columns
cmp r12, #0
beq end_vp9_mblf_v_edge
vld1.8 {d2[]}, [r3] ; duplicate *thresh
count_mblf_v_loop
vld1.u8 {d3}, [r2], r1 ; load s data
vld1.u8 {d4}, [r2], r1
vld1.u8 {d5}, [r2], r1
vld1.u8 {d6}, [r2], r1
vld1.u8 {d7}, [r2], r1
vld1.u8 {d16}, [r2], r1
vld1.u8 {d17}, [r2], r1
vld1.u8 {d18}, [r2]
;transpose to 8x16 matrix
vtrn.32 d3, d7
vtrn.32 d4, d16
vtrn.32 d5, d17
vtrn.32 d6, d18
vtrn.16 d3, d5
vtrn.16 d4, d6
vtrn.16 d7, d17
vtrn.16 d16, d18
vtrn.8 d3, d4
vtrn.8 d5, d6
vtrn.8 d7, d16
vtrn.8 d17, d18
sub r2, r0, #3
add r3, r0, #1
bl vp9_mbloop_filter_neon
;store op2, op1, op0, oq0
vst4.8 {d0[0], d1[0], d2[0], d3[0]}, [r2], r1
vst4.8 {d0[1], d1[1], d2[1], d3[1]}, [r2], r1
vst4.8 {d0[2], d1[2], d2[2], d3[2]}, [r2], r1
vst4.8 {d0[3], d1[3], d2[3], d3[3]}, [r2], r1
vst4.8 {d0[4], d1[4], d2[4], d3[4]}, [r2], r1
vst4.8 {d0[5], d1[5], d2[5], d3[5]}, [r2], r1
vst4.8 {d0[6], d1[6], d2[6], d3[6]}, [r2], r1
vst4.8 {d0[7], d1[7], d2[7], d3[7]}, [r2]
;store oq1, oq2
vst2.8 {d4[0], d5[0]}, [r3], r1
vst2.8 {d4[1], d5[1]}, [r3], r1
vst2.8 {d4[2], d5[2]}, [r3], r1
vst2.8 {d4[3], d5[3]}, [r3], r1
vst2.8 {d4[4], d5[4]}, [r3], r1
vst2.8 {d4[5], d5[5]}, [r3], r1
vst2.8 {d4[6], d5[6]}, [r3], r1
vst2.8 {d4[7], d5[7]}, [r3]
add r0, r0, r1, lsl #3 ; s += pitch * 8
subs r12, r12, #1
subne r2, r0, #4 ; move s pointer down by 4 columns
bne count_mblf_v_loop
end_vp9_mblf_v_edge
pop {r4-r5, pc}
ENDP ; |vp9_mbloop_filter_vertical_edge_neon|
; void vp9_mbloop_filter_neon();
; This is a helper function for the loopfilters. The invidual functions do the
; necessary load, transpose (if necessary) and store. The function does not use
; registers d8-d15.
;
; Inputs:
; r0-r3, r12 PRESERVE
; d0 blimit
; d1 limit
; d2 thresh
; d3 p3
; d4 p2
; d5 p1
; d6 p0
; d7 q0
; d16 q1
; d17 q2
; d18 q3
;
; Outputs:
; d0 op2
; d1 op1
; d2 op0
; d3 oq0
; d4 oq1
; d5 oq2
|vp9_mbloop_filter_neon| PROC
; filter_mask
vabd.u8 d19, d3, d4 ; m1 = abs(p3 - p2)
vabd.u8 d20, d4, d5 ; m2 = abs(p2 - p1)
vabd.u8 d21, d5, d6 ; m3 = abs(p1 - p0)
vabd.u8 d22, d16, d7 ; m4 = abs(q1 - q0)
vabd.u8 d23, d17, d16 ; m5 = abs(q2 - q1)
vabd.u8 d24, d18, d17 ; m6 = abs(q3 - q2)
; only compare the largest value to limit
vmax.u8 d19, d19, d20 ; m1 = max(m1, m2)
vmax.u8 d20, d21, d22 ; m2 = max(m3, m4)
vabd.u8 d25, d6, d4 ; m7 = abs(p0 - p2)
vmax.u8 d23, d23, d24 ; m3 = max(m5, m6)
vabd.u8 d26, d7, d17 ; m8 = abs(q0 - q2)
vmax.u8 d19, d19, d20
vabd.u8 d24, d6, d7 ; m9 = abs(p0 - q0)
vabd.u8 d27, d3, d6 ; m10 = abs(p3 - p0)
vabd.u8 d28, d18, d7 ; m11 = abs(q3 - q0)
vmax.u8 d19, d19, d23
vabd.u8 d23, d5, d16 ; a = abs(p1 - q1)
vqadd.u8 d24, d24, d24 ; b = abs(p0 - q0) * 2
; abs () > limit
vcge.u8 d19, d1, d19
; only compare the largest value to thresh
vmax.u8 d25, d25, d26 ; m4 = max(m7, m8)
vmax.u8 d26, d27, d28 ; m5 = max(m10, m11)
vshr.u8 d23, d23, #1 ; a = a / 2
vmax.u8 d25, d25, d26 ; m4 = max(m4, m5)
vqadd.u8 d24, d24, d23 ; a = b + a
vmax.u8 d20, d20, d25 ; m2 = max(m2, m4)
vmov.u8 d23, #1
vcge.u8 d24, d0, d24 ; a > blimit
vcgt.u8 d21, d21, d2 ; (abs(p1 - p0) > thresh)*-1
vcge.u8 d20, d23, d20 ; flat
vand d19, d19, d24 ; mask
vcgt.u8 d23, d22, d2 ; (abs(q1 - q0) > thresh)*-1
vand d20, d20, d19 ; flat & mask
vmov.u8 d22, #0x80
vorr d23, d21, d23 ; hev
; This instruction will truncate the "flat & mask" masks down to 4 bits
; each to fit into one 32 bit arm register. The values are stored in
; q10.64[0].
vshrn.u16 d30, q10, #4
vmov.u32 r4, d30[0] ; flat & mask 4bits
adds r5, r4, #1 ; Check for all 1's
; If mask and flat are 1's for all vectors, then we only need to execute
; the power branch for all vectors.
beq power_branch_only
cmp r4, #0 ; Check for 0, set flag for later
; mbfilter() function
; filter() function
; convert to signed
veor d21, d7, d22 ; qs0
veor d24, d6, d22 ; ps0
veor d25, d5, d22 ; ps1
veor d26, d16, d22 ; qs1
vmov.u8 d27, #3
vsub.s8 d28, d21, d24 ; ( qs0 - ps0)
vqsub.s8 d29, d25, d26 ; filter = clamp(ps1-qs1)
vmull.s8 q15, d28, d27 ; 3 * ( qs0 - ps0)
vand d29, d29, d23 ; filter &= hev
vaddw.s8 q15, q15, d29 ; filter + 3 * (qs0 - ps0)
vmov.u8 d29, #4
; filter = clamp(filter + 3 * ( qs0 - ps0))
vqmovn.s16 d28, q15
vand d28, d28, d19 ; filter &= mask
vqadd.s8 d30, d28, d27 ; filter2 = clamp(filter+3)
vqadd.s8 d29, d28, d29 ; filter1 = clamp(filter+4)
vshr.s8 d30, d30, #3 ; filter2 >>= 3
vshr.s8 d29, d29, #3 ; filter1 >>= 3
vqadd.s8 d24, d24, d30 ; op0 = clamp(ps0 + filter2)
vqsub.s8 d21, d21, d29 ; oq0 = clamp(qs0 - filter1)
; outer tap adjustments: ++filter1 >> 1
vrshr.s8 d29, d29, #1
vbic d29, d29, d23 ; filter &= ~hev
vqadd.s8 d25, d25, d29 ; op1 = clamp(ps1 + filter)
vqsub.s8 d26, d26, d29 ; oq1 = clamp(qs1 - filter)
; If mask and flat are 0's for all vectors, then we only need to execute
; the filter branch for all vectors.
beq filter_branch_only
; If mask and flat are mixed then we must perform both branches and
; combine the data.
veor d24, d24, d22 ; *f_op0 = u^0x80
veor d21, d21, d22 ; *f_oq0 = u^0x80
veor d25, d25, d22 ; *f_op1 = u^0x80
veor d26, d26, d22 ; *f_oq1 = u^0x80
; At this point we have already executed the filter branch. The filter
; branch does not set op2 or oq2, so use p2 and q2. Execute the power
; branch and combine the data.
vmov.u8 d23, #2
vaddl.u8 q14, d6, d7 ; r_op2 = p0 + q0
vmlal.u8 q14, d3, d27 ; r_op2 += p3 * 3
vmlal.u8 q14, d4, d23 ; r_op2 += p2 * 2
vbif d0, d4, d20 ; op2 |= p2 & ~(flat & mask)
vaddw.u8 q14, d5 ; r_op2 += p1
vbif d1, d25, d20 ; op1 |= f_op1 & ~(flat & mask)
vqrshrn.u16 d30, q14, #3 ; r_op2
vsubw.u8 q14, d3 ; r_op1 = r_op2 - p3
vsubw.u8 q14, d4 ; r_op1 -= p2
vaddw.u8 q14, d5 ; r_op1 += p1
vaddw.u8 q14, d16 ; r_op1 += q1
vbif d2, d24, d20 ; op0 |= f_op0 & ~(flat & mask)
vqrshrn.u16 d31, q14, #3 ; r_op1
vsubw.u8 q14, d3 ; r_op0 = r_op1 - p3
vsubw.u8 q14, d5 ; r_op0 -= p1
vaddw.u8 q14, d6 ; r_op0 += p0
vaddw.u8 q14, d17 ; r_op0 += q2
vbit d0, d30, d20 ; op2 |= r_op2 & (flat & mask)
vqrshrn.u16 d23, q14, #3 ; r_op0
vsubw.u8 q14, d3 ; r_oq0 = r_op0 - p3
vsubw.u8 q14, d6 ; r_oq0 -= p0
vaddw.u8 q14, d7 ; r_oq0 += q0
vbit d1, d31, d20 ; op1 |= r_op1 & (flat & mask)
vaddw.u8 q14, d18 ; oq0 += q3
vbit d2, d23, d20 ; op0 |= r_op0 & (flat & mask)
vqrshrn.u16 d22, q14, #3 ; r_oq0
vsubw.u8 q14, d4 ; r_oq1 = r_oq0 - p2
vsubw.u8 q14, d7 ; r_oq1 -= q0
vaddw.u8 q14, d16 ; r_oq1 += q1
vbif d3, d21, d20 ; oq0 |= f_oq0 & ~(flat & mask)
vaddw.u8 q14, d18 ; r_oq1 += q3
vbif d4, d26, d20 ; oq1 |= f_oq1 & ~(flat & mask)
vqrshrn.u16 d6, q14, #3 ; r_oq1
vsubw.u8 q14, d5 ; r_oq2 = r_oq1 - p1
vsubw.u8 q14, d16 ; r_oq2 -= q1
vaddw.u8 q14, d17 ; r_oq2 += q2
vaddw.u8 q14, d18 ; r_oq2 += q3
vbif d5, d17, d20 ; oq2 |= q2 & ~(flat & mask)
vqrshrn.u16 d7, q14, #3 ; r_oq2
vbit d3, d22, d20 ; oq0 |= r_oq0 & (flat & mask)
vbit d4, d6, d20 ; oq1 |= r_oq1 & (flat & mask)
vbit d5, d7, d20 ; oq2 |= r_oq2 & (flat & mask)
bx lr
power_branch_only
vmov.u8 d27, #3
vmov.u8 d21, #2
vaddl.u8 q14, d6, d7 ; op2 = p0 + q0
vmlal.u8 q14, d3, d27 ; op2 += p3 * 3
vmlal.u8 q14, d4, d21 ; op2 += p2 * 2
vaddw.u8 q14, d5 ; op2 += p1
vqrshrn.u16 d0, q14, #3 ; op2
vsubw.u8 q14, d3 ; op1 = op2 - p3
vsubw.u8 q14, d4 ; op1 -= p2
vaddw.u8 q14, d5 ; op1 += p1
vaddw.u8 q14, d16 ; op1 += q1
vqrshrn.u16 d1, q14, #3 ; op1
vsubw.u8 q14, d3 ; op0 = op1 - p3
vsubw.u8 q14, d5 ; op0 -= p1
vaddw.u8 q14, d6 ; op0 += p0
vaddw.u8 q14, d17 ; op0 += q2
vqrshrn.u16 d2, q14, #3 ; op0
vsubw.u8 q14, d3 ; oq0 = op0 - p3
vsubw.u8 q14, d6 ; oq0 -= p0
vaddw.u8 q14, d7 ; oq0 += q0
vaddw.u8 q14, d18 ; oq0 += q3
vqrshrn.u16 d3, q14, #3 ; oq0
vsubw.u8 q14, d4 ; oq1 = oq0 - p2
vsubw.u8 q14, d7 ; oq1 -= q0
vaddw.u8 q14, d16 ; oq1 += q1
vaddw.u8 q14, d18 ; oq1 += q3
vqrshrn.u16 d4, q14, #3 ; oq1
vsubw.u8 q14, d5 ; oq2 = oq1 - p1
vsubw.u8 q14, d16 ; oq2 -= q1
vaddw.u8 q14, d17 ; oq2 += q2
vaddw.u8 q14, d18 ; oq2 += q3
vqrshrn.u16 d5, q14, #3 ; oq2
bx lr
filter_branch_only
; TODO(fgalligan): See if we can rearange registers so we do not need to
; do the 2 vswp.
vswp d0, d4 ; op2
vswp d5, d17 ; oq2
veor d2, d24, d22 ; *op0 = u^0x80
veor d3, d21, d22 ; *oq0 = u^0x80
veor d1, d25, d22 ; *op1 = u^0x80
veor d4, d26, d22 ; *oq1 = u^0x80
bx lr
ENDP ; |vp9_mbloop_filter_neon|
END

View File

@@ -0,0 +1,603 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
;
EXPORT |vp9_mb_lpf_horizontal_edge_w_neon|
EXPORT |vp9_mb_lpf_vertical_edge_w_neon|
ARM
AREA ||.text||, CODE, READONLY, ALIGN=2
; void vp9_mb_lpf_horizontal_edge_w_neon(uint8_t *s, int p,
; const uint8_t *blimit,
; const uint8_t *limit,
; const uint8_t *thresh
; int count)
; r0 uint8_t *s,
; r1 int p, /* pitch */
; r2 const uint8_t *blimit,
; r3 const uint8_t *limit,
; sp const uint8_t *thresh,
|vp9_mb_lpf_horizontal_edge_w_neon| PROC
push {r4-r8, lr}
vpush {d8-d15}
ldr r4, [sp, #88] ; load thresh
ldr r12, [sp, #92] ; load count
h_count
vld1.8 {d16[]}, [r2] ; load *blimit
vld1.8 {d17[]}, [r3] ; load *limit
vld1.8 {d18[]}, [r4] ; load *thresh
sub r8, r0, r1, lsl #3 ; move src pointer down by 8 lines
vld1.u8 {d0}, [r8@64], r1 ; p7
vld1.u8 {d1}, [r8@64], r1 ; p6
vld1.u8 {d2}, [r8@64], r1 ; p5
vld1.u8 {d3}, [r8@64], r1 ; p4
vld1.u8 {d4}, [r8@64], r1 ; p3
vld1.u8 {d5}, [r8@64], r1 ; p2
vld1.u8 {d6}, [r8@64], r1 ; p1
vld1.u8 {d7}, [r8@64], r1 ; p0
vld1.u8 {d8}, [r8@64], r1 ; q0
vld1.u8 {d9}, [r8@64], r1 ; q1
vld1.u8 {d10}, [r8@64], r1 ; q2
vld1.u8 {d11}, [r8@64], r1 ; q3
vld1.u8 {d12}, [r8@64], r1 ; q4
vld1.u8 {d13}, [r8@64], r1 ; q5
vld1.u8 {d14}, [r8@64], r1 ; q6
vld1.u8 {d15}, [r8@64], r1 ; q7
bl vp9_wide_mbfilter_neon
tst r7, #1
beq h_mbfilter
; flat && mask were not set for any of the channels. Just store the values
; from filter.
sub r8, r0, r1, lsl #1
vst1.u8 {d25}, [r8@64], r1 ; store op1
vst1.u8 {d24}, [r8@64], r1 ; store op0
vst1.u8 {d23}, [r8@64], r1 ; store oq0
vst1.u8 {d26}, [r8@64], r1 ; store oq1
b h_next
h_mbfilter
tst r7, #2
beq h_wide_mbfilter
; flat2 was not set for any of the channels. Just store the values from
; mbfilter.
sub r8, r0, r1, lsl #1
sub r8, r8, r1
vst1.u8 {d18}, [r8@64], r1 ; store op2
vst1.u8 {d19}, [r8@64], r1 ; store op1
vst1.u8 {d20}, [r8@64], r1 ; store op0
vst1.u8 {d21}, [r8@64], r1 ; store oq0
vst1.u8 {d22}, [r8@64], r1 ; store oq1
vst1.u8 {d23}, [r8@64], r1 ; store oq2
b h_next
h_wide_mbfilter
sub r8, r0, r1, lsl #3
add r8, r8, r1
vst1.u8 {d16}, [r8@64], r1 ; store op6
vst1.u8 {d24}, [r8@64], r1 ; store op5
vst1.u8 {d25}, [r8@64], r1 ; store op4
vst1.u8 {d26}, [r8@64], r1 ; store op3
vst1.u8 {d27}, [r8@64], r1 ; store op2
vst1.u8 {d18}, [r8@64], r1 ; store op1
vst1.u8 {d19}, [r8@64], r1 ; store op0
vst1.u8 {d20}, [r8@64], r1 ; store oq0
vst1.u8 {d21}, [r8@64], r1 ; store oq1
vst1.u8 {d22}, [r8@64], r1 ; store oq2
vst1.u8 {d23}, [r8@64], r1 ; store oq3
vst1.u8 {d1}, [r8@64], r1 ; store oq4
vst1.u8 {d2}, [r8@64], r1 ; store oq5
vst1.u8 {d3}, [r8@64], r1 ; store oq6
h_next
add r0, r0, #8
subs r12, r12, #1
bne h_count
vpop {d8-d15}
pop {r4-r8, pc}
ENDP ; |vp9_mb_lpf_horizontal_edge_w_neon|
; void vp9_mb_lpf_vertical_edge_w_neon(uint8_t *s, int p,
; const uint8_t *blimit,
; const uint8_t *limit,
; const uint8_t *thresh)
; r0 uint8_t *s,
; r1 int p, /* pitch */
; r2 const uint8_t *blimit,
; r3 const uint8_t *limit,
; sp const uint8_t *thresh,
|vp9_mb_lpf_vertical_edge_w_neon| PROC
push {r4-r8, lr}
vpush {d8-d15}
ldr r4, [sp, #88] ; load thresh
vld1.8 {d16[]}, [r2] ; load *blimit
vld1.8 {d17[]}, [r3] ; load *limit
vld1.8 {d18[]}, [r4] ; load *thresh
sub r8, r0, #8
vld1.8 {d0}, [r8@64], r1
vld1.8 {d8}, [r0@64], r1
vld1.8 {d1}, [r8@64], r1
vld1.8 {d9}, [r0@64], r1
vld1.8 {d2}, [r8@64], r1
vld1.8 {d10}, [r0@64], r1
vld1.8 {d3}, [r8@64], r1
vld1.8 {d11}, [r0@64], r1
vld1.8 {d4}, [r8@64], r1
vld1.8 {d12}, [r0@64], r1
vld1.8 {d5}, [r8@64], r1
vld1.8 {d13}, [r0@64], r1
vld1.8 {d6}, [r8@64], r1
vld1.8 {d14}, [r0@64], r1
vld1.8 {d7}, [r8@64], r1
vld1.8 {d15}, [r0@64], r1
sub r0, r0, r1, lsl #3
vtrn.32 q0, q2
vtrn.32 q1, q3
vtrn.32 q4, q6
vtrn.32 q5, q7
vtrn.16 q0, q1
vtrn.16 q2, q3
vtrn.16 q4, q5
vtrn.16 q6, q7
vtrn.8 d0, d1
vtrn.8 d2, d3
vtrn.8 d4, d5
vtrn.8 d6, d7
vtrn.8 d8, d9
vtrn.8 d10, d11
vtrn.8 d12, d13
vtrn.8 d14, d15
bl vp9_wide_mbfilter_neon
tst r7, #1
beq v_mbfilter
; flat && mask were not set for any of the channels. Just store the values
; from filter.
sub r8, r0, #2
vswp d23, d25
vst4.8 {d23[0], d24[0], d25[0], d26[0]}, [r8], r1
vst4.8 {d23[1], d24[1], d25[1], d26[1]}, [r8], r1
vst4.8 {d23[2], d24[2], d25[2], d26[2]}, [r8], r1
vst4.8 {d23[3], d24[3], d25[3], d26[3]}, [r8], r1
vst4.8 {d23[4], d24[4], d25[4], d26[4]}, [r8], r1
vst4.8 {d23[5], d24[5], d25[5], d26[5]}, [r8], r1
vst4.8 {d23[6], d24[6], d25[6], d26[6]}, [r8], r1
vst4.8 {d23[7], d24[7], d25[7], d26[7]}, [r8], r1
b v_end
v_mbfilter
tst r7, #2
beq v_wide_mbfilter
; flat2 was not set for any of the channels. Just store the values from
; mbfilter.
sub r8, r0, #3
vst3.8 {d18[0], d19[0], d20[0]}, [r8], r1
vst3.8 {d21[0], d22[0], d23[0]}, [r0], r1
vst3.8 {d18[1], d19[1], d20[1]}, [r8], r1
vst3.8 {d21[1], d22[1], d23[1]}, [r0], r1
vst3.8 {d18[2], d19[2], d20[2]}, [r8], r1
vst3.8 {d21[2], d22[2], d23[2]}, [r0], r1
vst3.8 {d18[3], d19[3], d20[3]}, [r8], r1
vst3.8 {d21[3], d22[3], d23[3]}, [r0], r1
vst3.8 {d18[4], d19[4], d20[4]}, [r8], r1
vst3.8 {d21[4], d22[4], d23[4]}, [r0], r1
vst3.8 {d18[5], d19[5], d20[5]}, [r8], r1
vst3.8 {d21[5], d22[5], d23[5]}, [r0], r1
vst3.8 {d18[6], d19[6], d20[6]}, [r8], r1
vst3.8 {d21[6], d22[6], d23[6]}, [r0], r1
vst3.8 {d18[7], d19[7], d20[7]}, [r8], r1
vst3.8 {d21[7], d22[7], d23[7]}, [r0], r1
b v_end
v_wide_mbfilter
sub r8, r0, #8
vtrn.32 d0, d26
vtrn.32 d16, d27
vtrn.32 d24, d18
vtrn.32 d25, d19
vtrn.16 d0, d24
vtrn.16 d16, d25
vtrn.16 d26, d18
vtrn.16 d27, d19
vtrn.8 d0, d16
vtrn.8 d24, d25
vtrn.8 d26, d27
vtrn.8 d18, d19
vtrn.32 d20, d1
vtrn.32 d21, d2
vtrn.32 d22, d3
vtrn.32 d23, d15
vtrn.16 d20, d22
vtrn.16 d21, d23
vtrn.16 d1, d3
vtrn.16 d2, d15
vtrn.8 d20, d21
vtrn.8 d22, d23
vtrn.8 d1, d2
vtrn.8 d3, d15
vst1.8 {d0}, [r8@64], r1
vst1.8 {d20}, [r0@64], r1
vst1.8 {d16}, [r8@64], r1
vst1.8 {d21}, [r0@64], r1
vst1.8 {d24}, [r8@64], r1
vst1.8 {d22}, [r0@64], r1
vst1.8 {d25}, [r8@64], r1
vst1.8 {d23}, [r0@64], r1
vst1.8 {d26}, [r8@64], r1
vst1.8 {d1}, [r0@64], r1
vst1.8 {d27}, [r8@64], r1
vst1.8 {d2}, [r0@64], r1
vst1.8 {d18}, [r8@64], r1
vst1.8 {d3}, [r0@64], r1
vst1.8 {d19}, [r8@64], r1
vst1.8 {d15}, [r0@64], r1
v_end
vpop {d8-d15}
pop {r4-r8, pc}
ENDP ; |vp9_mb_lpf_vertical_edge_w_neon|
; void vp9_wide_mbfilter_neon();
; This is a helper function for the loopfilters. The invidual functions do the
; necessary load, transpose (if necessary) and store.
;
; r0-r3 PRESERVE
; d16 blimit
; d17 limit
; d18 thresh
; d0 p7
; d1 p6
; d2 p5
; d3 p4
; d4 p3
; d5 p2
; d6 p1
; d7 p0
; d8 q0
; d9 q1
; d10 q2
; d11 q3
; d12 q4
; d13 q5
; d14 q6
; d15 q7
|vp9_wide_mbfilter_neon| PROC
mov r7, #0
; filter_mask
vabd.u8 d19, d4, d5 ; abs(p3 - p2)
vabd.u8 d20, d5, d6 ; abs(p2 - p1)
vabd.u8 d21, d6, d7 ; abs(p1 - p0)
vabd.u8 d22, d9, d8 ; abs(q1 - q0)
vabd.u8 d23, d10, d9 ; abs(q2 - q1)
vabd.u8 d24, d11, d10 ; abs(q3 - q2)
; only compare the largest value to limit
vmax.u8 d19, d19, d20 ; max(abs(p3 - p2), abs(p2 - p1))
vmax.u8 d20, d21, d22 ; max(abs(p1 - p0), abs(q1 - q0))
vmax.u8 d23, d23, d24 ; max(abs(q2 - q1), abs(q3 - q2))
vmax.u8 d19, d19, d20
vabd.u8 d24, d7, d8 ; abs(p0 - q0)
vmax.u8 d19, d19, d23
vabd.u8 d23, d6, d9 ; a = abs(p1 - q1)
vqadd.u8 d24, d24, d24 ; b = abs(p0 - q0) * 2
; abs () > limit
vcge.u8 d19, d17, d19
; flatmask4
vabd.u8 d25, d7, d5 ; abs(p0 - p2)
vabd.u8 d26, d8, d10 ; abs(q0 - q2)
vabd.u8 d27, d4, d7 ; abs(p3 - p0)
vabd.u8 d28, d11, d8 ; abs(q3 - q0)
; only compare the largest value to thresh
vmax.u8 d25, d25, d26 ; max(abs(p0 - p2), abs(q0 - q2))
vmax.u8 d26, d27, d28 ; max(abs(p3 - p0), abs(q3 - q0))
vmax.u8 d25, d25, d26
vmax.u8 d20, d20, d25
vshr.u8 d23, d23, #1 ; a = a / 2
vqadd.u8 d24, d24, d23 ; a = b + a
vmov.u8 d30, #1
vcge.u8 d24, d16, d24 ; (a > blimit * 2 + limit) * -1
vcge.u8 d20, d30, d20 ; flat
vand d19, d19, d24 ; mask
; hevmask
vcgt.u8 d21, d21, d18 ; (abs(p1 - p0) > thresh)*-1
vcgt.u8 d22, d22, d18 ; (abs(q1 - q0) > thresh)*-1
vorr d21, d21, d22 ; hev
vand d16, d20, d19 ; flat && mask
vmov r5, r6, d16
; flatmask5(1, p7, p6, p5, p4, p0, q0, q4, q5, q6, q7)
vabd.u8 d22, d3, d7 ; abs(p4 - p0)
vabd.u8 d23, d12, d8 ; abs(q4 - q0)
vabd.u8 d24, d7, d2 ; abs(p0 - p5)
vabd.u8 d25, d8, d13 ; abs(q0 - q5)
vabd.u8 d26, d1, d7 ; abs(p6 - p0)
vabd.u8 d27, d14, d8 ; abs(q6 - q0)
vabd.u8 d28, d0, d7 ; abs(p7 - p0)
vabd.u8 d29, d15, d8 ; abs(q7 - q0)
; only compare the largest value to thresh
vmax.u8 d22, d22, d23 ; max(abs(p4 - p0), abs(q4 - q0))
vmax.u8 d23, d24, d25 ; max(abs(p0 - p5), abs(q0 - q5))
vmax.u8 d24, d26, d27 ; max(abs(p6 - p0), abs(q6 - q0))
vmax.u8 d25, d28, d29 ; max(abs(p7 - p0), abs(q7 - q0))
vmax.u8 d26, d22, d23
vmax.u8 d27, d24, d25
vmax.u8 d23, d26, d27
vcge.u8 d18, d30, d23 ; flat2
vmov.u8 d22, #0x80
orrs r5, r5, r6 ; Check for 0
orreq r7, r7, #1 ; Only do filter branch
vand d17, d18, d16 ; flat2 && flat && mask
vmov r5, r6, d17
; mbfilter() function
; filter() function
; convert to signed
veor d23, d8, d22 ; qs0
veor d24, d7, d22 ; ps0
veor d25, d6, d22 ; ps1
veor d26, d9, d22 ; qs1
vmov.u8 d27, #3
vsub.s8 d28, d23, d24 ; ( qs0 - ps0)
vqsub.s8 d29, d25, d26 ; filter = clamp(ps1-qs1)
vmull.s8 q15, d28, d27 ; 3 * ( qs0 - ps0)
vand d29, d29, d21 ; filter &= hev
vaddw.s8 q15, q15, d29 ; filter + 3 * (qs0 - ps0)
vmov.u8 d29, #4
; filter = clamp(filter + 3 * ( qs0 - ps0))
vqmovn.s16 d28, q15
vand d28, d28, d19 ; filter &= mask
vqadd.s8 d30, d28, d27 ; filter2 = clamp(filter+3)
vqadd.s8 d29, d28, d29 ; filter1 = clamp(filter+4)
vshr.s8 d30, d30, #3 ; filter2 >>= 3
vshr.s8 d29, d29, #3 ; filter1 >>= 3
vqadd.s8 d24, d24, d30 ; op0 = clamp(ps0 + filter2)
vqsub.s8 d23, d23, d29 ; oq0 = clamp(qs0 - filter1)
; outer tap adjustments: ++filter1 >> 1
vrshr.s8 d29, d29, #1
vbic d29, d29, d21 ; filter &= ~hev
vqadd.s8 d25, d25, d29 ; op1 = clamp(ps1 + filter)
vqsub.s8 d26, d26, d29 ; oq1 = clamp(qs1 - filter)
veor d24, d24, d22 ; *f_op0 = u^0x80
veor d23, d23, d22 ; *f_oq0 = u^0x80
veor d25, d25, d22 ; *f_op1 = u^0x80
veor d26, d26, d22 ; *f_oq1 = u^0x80
tst r7, #1
bxne lr
; mbfilter flat && mask branch
; TODO(fgalligan): Can I decrease the cycles shifting to consective d's
; and using vibt on the q's?
vmov.u8 d29, #2
vaddl.u8 q15, d7, d8 ; op2 = p0 + q0
vmlal.u8 q15, d4, d27 ; op2 = p0 + q0 + p3 * 3
vmlal.u8 q15, d5, d29 ; op2 = p0 + q0 + p3 * 3 + p2 * 2
vaddl.u8 q10, d4, d5
vaddw.u8 q15, d6 ; op2=p1 + p0 + q0 + p3 * 3 + p2 *2
vaddl.u8 q14, d6, d9
vqrshrn.u16 d18, q15, #3 ; r_op2
vsub.i16 q15, q10
vaddl.u8 q10, d4, d6
vadd.i16 q15, q14
vaddl.u8 q14, d7, d10
vqrshrn.u16 d19, q15, #3 ; r_op1
vsub.i16 q15, q10
vadd.i16 q15, q14
vaddl.u8 q14, d8, d11
vqrshrn.u16 d20, q15, #3 ; r_op0
vsubw.u8 q15, d4 ; oq0 = op0 - p3
vsubw.u8 q15, d7 ; oq0 -= p0
vadd.i16 q15, q14
vaddl.u8 q14, d9, d11
vqrshrn.u16 d21, q15, #3 ; r_oq0
vsubw.u8 q15, d5 ; oq1 = oq0 - p2
vsubw.u8 q15, d8 ; oq1 -= q0
vadd.i16 q15, q14
vaddl.u8 q14, d10, d11
vqrshrn.u16 d22, q15, #3 ; r_oq1
vsubw.u8 q15, d6 ; oq2 = oq0 - p1
vsubw.u8 q15, d9 ; oq2 -= q1
vadd.i16 q15, q14
vqrshrn.u16 d27, q15, #3 ; r_oq2
; Filter does not set op2 or oq2, so use p2 and q2.
vbif d18, d5, d16 ; t_op2 |= p2 & ~(flat & mask)
vbif d19, d25, d16 ; t_op1 |= f_op1 & ~(flat & mask)
vbif d20, d24, d16 ; t_op0 |= f_op0 & ~(flat & mask)
vbif d21, d23, d16 ; t_oq0 |= f_oq0 & ~(flat & mask)
vbif d22, d26, d16 ; t_oq1 |= f_oq1 & ~(flat & mask)
vbit d23, d27, d16 ; t_oq2 |= r_oq2 & (flat & mask)
vbif d23, d10, d16 ; t_oq2 |= q2 & ~(flat & mask)
tst r7, #2
bxne lr
; wide_mbfilter flat2 && flat && mask branch
vmov.u8 d16, #7
vaddl.u8 q15, d7, d8 ; op6 = p0 + q0
vaddl.u8 q12, d2, d3
vaddl.u8 q13, d4, d5
vaddl.u8 q14, d1, d6
vmlal.u8 q15, d0, d16 ; op6 += p7 * 3
vadd.i16 q12, q13
vadd.i16 q15, q14
vaddl.u8 q14, d2, d9
vadd.i16 q15, q12
vaddl.u8 q12, d0, d1
vaddw.u8 q15, d1
vaddl.u8 q13, d0, d2
vadd.i16 q14, q15, q14
vqrshrn.u16 d16, q15, #4 ; w_op6
vsub.i16 q15, q14, q12
vaddl.u8 q14, d3, d10
vqrshrn.u16 d24, q15, #4 ; w_op5
vsub.i16 q15, q13
vaddl.u8 q13, d0, d3
vadd.i16 q15, q14
vaddl.u8 q14, d4, d11
vqrshrn.u16 d25, q15, #4 ; w_op4
vadd.i16 q15, q14
vaddl.u8 q14, d0, d4
vsub.i16 q15, q13
vsub.i16 q14, q15, q14
vqrshrn.u16 d26, q15, #4 ; w_op3
vaddw.u8 q15, q14, d5 ; op2 += p2
vaddl.u8 q14, d0, d5
vaddw.u8 q15, d12 ; op2 += q4
vbif d26, d4, d17 ; op3 |= p3 & ~(f2 & f & m)
vqrshrn.u16 d27, q15, #4 ; w_op2
vsub.i16 q15, q14
vaddl.u8 q14, d0, d6
vaddw.u8 q15, d6 ; op1 += p1
vaddw.u8 q15, d13 ; op1 += q5
vbif d27, d18, d17 ; op2 |= t_op2 & ~(f2 & f & m)
vqrshrn.u16 d18, q15, #4 ; w_op1
vsub.i16 q15, q14
vaddl.u8 q14, d0, d7
vaddw.u8 q15, d7 ; op0 += p0
vaddw.u8 q15, d14 ; op0 += q6
vbif d18, d19, d17 ; op1 |= t_op1 & ~(f2 & f & m)
vqrshrn.u16 d19, q15, #4 ; w_op0
vsub.i16 q15, q14
vaddl.u8 q14, d1, d8
vaddw.u8 q15, d8 ; oq0 += q0
vaddw.u8 q15, d15 ; oq0 += q7
vbif d19, d20, d17 ; op0 |= t_op0 & ~(f2 & f & m)
vqrshrn.u16 d20, q15, #4 ; w_oq0
vsub.i16 q15, q14
vaddl.u8 q14, d2, d9
vaddw.u8 q15, d9 ; oq1 += q1
vaddl.u8 q4, d10, d15
vaddw.u8 q15, d15 ; oq1 += q7
vbif d20, d21, d17 ; oq0 |= t_oq0 & ~(f2 & f & m)
vqrshrn.u16 d21, q15, #4 ; w_oq1
vsub.i16 q15, q14
vaddl.u8 q14, d3, d10
vadd.i16 q15, q4
vaddl.u8 q4, d11, d15
vbif d21, d22, d17 ; oq1 |= t_oq1 & ~(f2 & f & m)
vqrshrn.u16 d22, q15, #4 ; w_oq2
vsub.i16 q15, q14
vaddl.u8 q14, d4, d11
vadd.i16 q15, q4
vaddl.u8 q4, d12, d15
vbif d22, d23, d17 ; oq2 |= t_oq2 & ~(f2 & f & m)
vqrshrn.u16 d23, q15, #4 ; w_oq3
vsub.i16 q15, q14
vaddl.u8 q14, d5, d12
vadd.i16 q15, q4
vaddl.u8 q4, d13, d15
vbif d16, d1, d17 ; op6 |= p6 & ~(f2 & f & m)
vqrshrn.u16 d1, q15, #4 ; w_oq4
vsub.i16 q15, q14
vaddl.u8 q14, d6, d13
vadd.i16 q15, q4
vaddl.u8 q4, d14, d15
vbif d24, d2, d17 ; op5 |= p5 & ~(f2 & f & m)
vqrshrn.u16 d2, q15, #4 ; w_oq5
vsub.i16 q15, q14
vbif d25, d3, d17 ; op4 |= p4 & ~(f2 & f & m)
vadd.i16 q15, q4
vbif d23, d11, d17 ; oq3 |= q3 & ~(f2 & f & m)
vqrshrn.u16 d3, q15, #4 ; w_oq6
vbif d1, d12, d17 ; oq4 |= q4 & ~(f2 & f & m)
vbif d2, d13, d17 ; oq5 |= q5 & ~(f2 & f & m)
vbif d3, d14, d17 ; oq6 |= q6 & ~(f2 & f & m)
bx lr
ENDP ; |vp9_wide_mbfilter_neon|
END

View File

@@ -0,0 +1,198 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license and patent
; grant that can be found in the LICENSE file in the root of the source
; tree. All contributing project authors may be found in the AUTHORS
; file in the root of the source tree.
;
EXPORT |vp9_short_idct16x16_1_add_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
;void vp9_short_idct16x16_1_add_neon(int16_t *input, uint8_t *dest,
; int dest_stride)
;
; r0 int16_t input
; r1 uint8_t *dest
; r2 int dest_stride)
|vp9_short_idct16x16_1_add_neon| PROC
ldrsh r0, [r0]
; generate cospi_16_64 = 11585
mov r12, #0x2d00
add r12, #0x41
; out = dct_const_round_shift(input[0] * cospi_16_64)
mul r0, r0, r12 ; input[0] * cospi_16_64
add r0, r0, #0x2000 ; +(1 << ((DCT_CONST_BITS) - 1))
asr r0, r0, #14 ; >> DCT_CONST_BITS
; out = dct_const_round_shift(out * cospi_16_64)
mul r0, r0, r12 ; out * cospi_16_64
mov r12, r1 ; save dest
add r0, r0, #0x2000 ; +(1 << ((DCT_CONST_BITS) - 1))
asr r0, r0, #14 ; >> DCT_CONST_BITS
; a1 = ROUND_POWER_OF_TWO(out, 6)
add r0, r0, #32 ; + (1 <<((6) - 1))
asr r0, r0, #6 ; >> 6
vdup.s16 q0, r0 ; duplicate a1
mov r0, #8
sub r2, #8
; load destination data row0 - row3
vld1.64 {d2}, [r1], r0
vld1.64 {d3}, [r1], r2
vld1.64 {d4}, [r1], r0
vld1.64 {d5}, [r1], r2
vld1.64 {d6}, [r1], r0
vld1.64 {d7}, [r1], r2
vld1.64 {d16}, [r1], r0
vld1.64 {d17}, [r1], r2
vaddw.u8 q9, q0, d2 ; dest[x] + a1
vaddw.u8 q10, q0, d3 ; dest[x] + a1
vaddw.u8 q11, q0, d4 ; dest[x] + a1
vaddw.u8 q12, q0, d5 ; dest[x] + a1
vqmovun.s16 d2, q9 ; clip_pixel
vqmovun.s16 d3, q10 ; clip_pixel
vqmovun.s16 d30, q11 ; clip_pixel
vqmovun.s16 d31, q12 ; clip_pixel
vst1.64 {d2}, [r12], r0
vst1.64 {d3}, [r12], r2
vst1.64 {d30}, [r12], r0
vst1.64 {d31}, [r12], r2
vaddw.u8 q9, q0, d6 ; dest[x] + a1
vaddw.u8 q10, q0, d7 ; dest[x] + a1
vaddw.u8 q11, q0, d16 ; dest[x] + a1
vaddw.u8 q12, q0, d17 ; dest[x] + a1
vqmovun.s16 d2, q9 ; clip_pixel
vqmovun.s16 d3, q10 ; clip_pixel
vqmovun.s16 d30, q11 ; clip_pixel
vqmovun.s16 d31, q12 ; clip_pixel
vst1.64 {d2}, [r12], r0
vst1.64 {d3}, [r12], r2
vst1.64 {d30}, [r12], r0
vst1.64 {d31}, [r12], r2
; load destination data row4 - row7
vld1.64 {d2}, [r1], r0
vld1.64 {d3}, [r1], r2
vld1.64 {d4}, [r1], r0
vld1.64 {d5}, [r1], r2
vld1.64 {d6}, [r1], r0
vld1.64 {d7}, [r1], r2
vld1.64 {d16}, [r1], r0
vld1.64 {d17}, [r1], r2
vaddw.u8 q9, q0, d2 ; dest[x] + a1
vaddw.u8 q10, q0, d3 ; dest[x] + a1
vaddw.u8 q11, q0, d4 ; dest[x] + a1
vaddw.u8 q12, q0, d5 ; dest[x] + a1
vqmovun.s16 d2, q9 ; clip_pixel
vqmovun.s16 d3, q10 ; clip_pixel
vqmovun.s16 d30, q11 ; clip_pixel
vqmovun.s16 d31, q12 ; clip_pixel
vst1.64 {d2}, [r12], r0
vst1.64 {d3}, [r12], r2
vst1.64 {d30}, [r12], r0
vst1.64 {d31}, [r12], r2
vaddw.u8 q9, q0, d6 ; dest[x] + a1
vaddw.u8 q10, q0, d7 ; dest[x] + a1
vaddw.u8 q11, q0, d16 ; dest[x] + a1
vaddw.u8 q12, q0, d17 ; dest[x] + a1
vqmovun.s16 d2, q9 ; clip_pixel
vqmovun.s16 d3, q10 ; clip_pixel
vqmovun.s16 d30, q11 ; clip_pixel
vqmovun.s16 d31, q12 ; clip_pixel
vst1.64 {d2}, [r12], r0
vst1.64 {d3}, [r12], r2
vst1.64 {d30}, [r12], r0
vst1.64 {d31}, [r12], r2
; load destination data row8 - row11
vld1.64 {d2}, [r1], r0
vld1.64 {d3}, [r1], r2
vld1.64 {d4}, [r1], r0
vld1.64 {d5}, [r1], r2
vld1.64 {d6}, [r1], r0
vld1.64 {d7}, [r1], r2
vld1.64 {d16}, [r1], r0
vld1.64 {d17}, [r1], r2
vaddw.u8 q9, q0, d2 ; dest[x] + a1
vaddw.u8 q10, q0, d3 ; dest[x] + a1
vaddw.u8 q11, q0, d4 ; dest[x] + a1
vaddw.u8 q12, q0, d5 ; dest[x] + a1
vqmovun.s16 d2, q9 ; clip_pixel
vqmovun.s16 d3, q10 ; clip_pixel
vqmovun.s16 d30, q11 ; clip_pixel
vqmovun.s16 d31, q12 ; clip_pixel
vst1.64 {d2}, [r12], r0
vst1.64 {d3}, [r12], r2
vst1.64 {d30}, [r12], r0
vst1.64 {d31}, [r12], r2
vaddw.u8 q9, q0, d6 ; dest[x] + a1
vaddw.u8 q10, q0, d7 ; dest[x] + a1
vaddw.u8 q11, q0, d16 ; dest[x] + a1
vaddw.u8 q12, q0, d17 ; dest[x] + a1
vqmovun.s16 d2, q9 ; clip_pixel
vqmovun.s16 d3, q10 ; clip_pixel
vqmovun.s16 d30, q11 ; clip_pixel
vqmovun.s16 d31, q12 ; clip_pixel
vst1.64 {d2}, [r12], r0
vst1.64 {d3}, [r12], r2
vst1.64 {d30}, [r12], r0
vst1.64 {d31}, [r12], r2
; load destination data row12 - row15
vld1.64 {d2}, [r1], r0
vld1.64 {d3}, [r1], r2
vld1.64 {d4}, [r1], r0
vld1.64 {d5}, [r1], r2
vld1.64 {d6}, [r1], r0
vld1.64 {d7}, [r1], r2
vld1.64 {d16}, [r1], r0
vld1.64 {d17}, [r1], r2
vaddw.u8 q9, q0, d2 ; dest[x] + a1
vaddw.u8 q10, q0, d3 ; dest[x] + a1
vaddw.u8 q11, q0, d4 ; dest[x] + a1
vaddw.u8 q12, q0, d5 ; dest[x] + a1
vqmovun.s16 d2, q9 ; clip_pixel
vqmovun.s16 d3, q10 ; clip_pixel
vqmovun.s16 d30, q11 ; clip_pixel
vqmovun.s16 d31, q12 ; clip_pixel
vst1.64 {d2}, [r12], r0
vst1.64 {d3}, [r12], r2
vst1.64 {d30}, [r12], r0
vst1.64 {d31}, [r12], r2
vaddw.u8 q9, q0, d6 ; dest[x] + a1
vaddw.u8 q10, q0, d7 ; dest[x] + a1
vaddw.u8 q11, q0, d16 ; dest[x] + a1
vaddw.u8 q12, q0, d17 ; dest[x] + a1
vqmovun.s16 d2, q9 ; clip_pixel
vqmovun.s16 d3, q10 ; clip_pixel
vqmovun.s16 d30, q11 ; clip_pixel
vqmovun.s16 d31, q12 ; clip_pixel
vst1.64 {d2}, [r12], r0
vst1.64 {d3}, [r12], r2
vst1.64 {d30}, [r12], r0
vst1.64 {d31}, [r12], r2
bx lr
ENDP ; |vp9_short_idct16x16_1_add_neon|
END

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,68 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license and patent
; grant that can be found in the LICENSE file in the root of the source
; tree. All contributing project authors may be found in the AUTHORS
; file in the root of the source tree.
;
EXPORT |vp9_short_idct4x4_1_add_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
;void vp9_short_idct4x4_1_add_neon(int16_t *input, uint8_t *dest,
; int dest_stride)
;
; r0 int16_t input
; r1 uint8_t *dest
; r2 int dest_stride)
|vp9_short_idct4x4_1_add_neon| PROC
ldrsh r0, [r0]
; generate cospi_16_64 = 11585
mov r12, #0x2d00
add r12, #0x41
; out = dct_const_round_shift(input[0] * cospi_16_64)
mul r0, r0, r12 ; input[0] * cospi_16_64
add r0, r0, #0x2000 ; +(1 << ((DCT_CONST_BITS) - 1))
asr r0, r0, #14 ; >> DCT_CONST_BITS
; out = dct_const_round_shift(out * cospi_16_64)
mul r0, r0, r12 ; out * cospi_16_64
mov r12, r1 ; save dest
add r0, r0, #0x2000 ; +(1 << ((DCT_CONST_BITS) - 1))
asr r0, r0, #14 ; >> DCT_CONST_BITS
; a1 = ROUND_POWER_OF_TWO(out, 4)
add r0, r0, #8 ; + (1 <<((4) - 1))
asr r0, r0, #4 ; >> 4
vdup.s16 q0, r0 ; duplicate a1
vld1.32 {d2[0]}, [r1], r2
vld1.32 {d2[1]}, [r1], r2
vld1.32 {d4[0]}, [r1], r2
vld1.32 {d4[1]}, [r1]
vaddw.u8 q8, q0, d2 ; dest[x] + a1
vaddw.u8 q9, q0, d4
vqmovun.s16 d6, q8 ; clip_pixel
vqmovun.s16 d7, q9
vst1.32 {d6[0]}, [r12], r2
vst1.32 {d6[1]}, [r12], r2
vst1.32 {d7[0]}, [r12], r2
vst1.32 {d7[1]}, [r12]
bx lr
ENDP ; |vp9_short_idct4x4_1_add_neon|
END

View File

@@ -0,0 +1,190 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
;
EXPORT |vp9_short_idct4x4_add_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
AREA Block, CODE, READONLY ; name this block of code
;void vp9_short_idct4x4_add_neon(int16_t *input, uint8_t *dest, int dest_stride)
;
; r0 int16_t input
; r1 uint8_t *dest
; r2 int dest_stride)
|vp9_short_idct4x4_add_neon| PROC
; The 2D transform is done with two passes which are actually pretty
; similar. We first transform the rows. This is done by transposing
; the inputs, doing an SIMD column transform (the columns are the
; transposed rows) and then transpose the results (so that it goes back
; in normal/row positions). Then, we transform the columns by doing
; another SIMD column transform.
; So, two passes of a transpose followed by a column transform.
; load the inputs into q8-q9, d16-d19
vld1.s16 {q8,q9}, [r0]!
; generate scalar constants
; cospi_8_64 = 15137 = 0x3b21
mov r0, #0x3b00
add r0, #0x21
; cospi_16_64 = 11585 = 0x2d41
mov r3, #0x2d00
add r3, #0x41
; cospi_24_64 = 6270 = 0x 187e
mov r12, #0x1800
add r12, #0x7e
; transpose the input data
; 00 01 02 03 d16
; 10 11 12 13 d17
; 20 21 22 23 d18
; 30 31 32 33 d19
vtrn.16 d16, d17
vtrn.16 d18, d19
; generate constant vectors
vdup.16 d20, r0 ; replicate cospi_8_64
vdup.16 d21, r3 ; replicate cospi_16_64
; 00 10 02 12 d16
; 01 11 03 13 d17
; 20 30 22 32 d18
; 21 31 23 33 d19
vtrn.32 q8, q9
; 00 10 20 30 d16
; 01 11 21 31 d17
; 02 12 22 32 d18
; 03 13 23 33 d19
vdup.16 d22, r12 ; replicate cospi_24_64
; do the transform on transposed rows
; stage 1
vadd.s16 d23, d16, d18 ; (input[0] + input[2])
vsub.s16 d24, d16, d18 ; (input[0] - input[2])
vmull.s16 q15, d17, d22 ; input[1] * cospi_24_64
vmull.s16 q1, d17, d20 ; input[1] * cospi_8_64
; (input[0] + input[2]) * cospi_16_64;
; (input[0] - input[2]) * cospi_16_64;
vmull.s16 q13, d23, d21
vmull.s16 q14, d24, d21
; input[1] * cospi_24_64 - input[3] * cospi_8_64;
; input[1] * cospi_8_64 + input[3] * cospi_24_64;
vmlsl.s16 q15, d19, d20
vmlal.s16 q1, d19, d22
; dct_const_round_shift
vqrshrn.s32 d26, q13, #14
vqrshrn.s32 d27, q14, #14
vqrshrn.s32 d29, q15, #14
vqrshrn.s32 d28, q1, #14
; stage 2
; output[0] = step[0] + step[3];
; output[1] = step[1] + step[2];
; output[3] = step[0] - step[3];
; output[2] = step[1] - step[2];
vadd.s16 q8, q13, q14
vsub.s16 q9, q13, q14
vswp d18, d19
; transpose the results
; 00 01 02 03 d16
; 10 11 12 13 d17
; 20 21 22 23 d18
; 30 31 32 33 d19
vtrn.16 d16, d17
vtrn.16 d18, d19
; 00 10 02 12 d16
; 01 11 03 13 d17
; 20 30 22 32 d18
; 21 31 23 33 d19
vtrn.32 q8, q9
; 00 10 20 30 d16
; 01 11 21 31 d17
; 02 12 22 32 d18
; 03 13 23 33 d19
; do the transform on columns
; stage 1
vadd.s16 d23, d16, d18 ; (input[0] + input[2])
vsub.s16 d24, d16, d18 ; (input[0] - input[2])
vmull.s16 q15, d17, d22 ; input[1] * cospi_24_64
vmull.s16 q1, d17, d20 ; input[1] * cospi_8_64
; (input[0] + input[2]) * cospi_16_64;
; (input[0] - input[2]) * cospi_16_64;
vmull.s16 q13, d23, d21
vmull.s16 q14, d24, d21
; input[1] * cospi_24_64 - input[3] * cospi_8_64;
; input[1] * cospi_8_64 + input[3] * cospi_24_64;
vmlsl.s16 q15, d19, d20
vmlal.s16 q1, d19, d22
; dct_const_round_shift
vqrshrn.s32 d26, q13, #14
vqrshrn.s32 d27, q14, #14
vqrshrn.s32 d29, q15, #14
vqrshrn.s32 d28, q1, #14
; stage 2
; output[0] = step[0] + step[3];
; output[1] = step[1] + step[2];
; output[3] = step[0] - step[3];
; output[2] = step[1] - step[2];
vadd.s16 q8, q13, q14
vsub.s16 q9, q13, q14
; The results are in two registers, one of them being swapped. This will
; be taken care of by loading the 'dest' value in a swapped fashion and
; also storing them in the same swapped fashion.
; temp_out[0, 1] = d16, d17 = q8
; temp_out[2, 3] = d19, d18 = q9 swapped
; ROUND_POWER_OF_TWO(temp_out[j], 4)
vrshr.s16 q8, q8, #4
vrshr.s16 q9, q9, #4
vld1.32 {d26[0]}, [r1], r2
vld1.32 {d26[1]}, [r1], r2
vld1.32 {d27[1]}, [r1], r2
vld1.32 {d27[0]}, [r1] ; no post-increment
; ROUND_POWER_OF_TWO(temp_out[j], 4) + dest[j * dest_stride + i]
vaddw.u8 q8, q8, d26
vaddw.u8 q9, q9, d27
; clip_pixel
vqmovun.s16 d26, q8
vqmovun.s16 d27, q9
; do the stores in reverse order with negative post-increment, by changing
; the sign of the stride
rsb r2, r2, #0
vst1.32 {d27[0]}, [r1], r2
vst1.32 {d27[1]}, [r1], r2
vst1.32 {d26[1]}, [r1], r2
vst1.32 {d26[0]}, [r1] ; no post-increment
bx lr
ENDP ; |vp9_short_idct4x4_add_neon|
END

View File

@@ -0,0 +1,88 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license and patent
; grant that can be found in the LICENSE file in the root of the source
; tree. All contributing project authors may be found in the AUTHORS
; file in the root of the source tree.
;
EXPORT |vp9_short_idct8x8_1_add_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
;void vp9_short_idct8x8_1_add_neon(int16_t *input, uint8_t *dest,
; int dest_stride)
;
; r0 int16_t input
; r1 uint8_t *dest
; r2 int dest_stride)
|vp9_short_idct8x8_1_add_neon| PROC
ldrsh r0, [r0]
; generate cospi_16_64 = 11585
mov r12, #0x2d00
add r12, #0x41
; out = dct_const_round_shift(input[0] * cospi_16_64)
mul r0, r0, r12 ; input[0] * cospi_16_64
add r0, r0, #0x2000 ; +(1 << ((DCT_CONST_BITS) - 1))
asr r0, r0, #14 ; >> DCT_CONST_BITS
; out = dct_const_round_shift(out * cospi_16_64)
mul r0, r0, r12 ; out * cospi_16_64
mov r12, r1 ; save dest
add r0, r0, #0x2000 ; +(1 << ((DCT_CONST_BITS) - 1))
asr r0, r0, #14 ; >> DCT_CONST_BITS
; a1 = ROUND_POWER_OF_TWO(out, 5)
add r0, r0, #16 ; + (1 <<((5) - 1))
asr r0, r0, #5 ; >> 5
vdup.s16 q0, r0 ; duplicate a1
; load destination data
vld1.64 {d2}, [r1], r2
vld1.64 {d3}, [r1], r2
vld1.64 {d4}, [r1], r2
vld1.64 {d5}, [r1], r2
vld1.64 {d6}, [r1], r2
vld1.64 {d7}, [r1], r2
vld1.64 {d16}, [r1], r2
vld1.64 {d17}, [r1]
vaddw.u8 q9, q0, d2 ; dest[x] + a1
vaddw.u8 q10, q0, d3 ; dest[x] + a1
vaddw.u8 q11, q0, d4 ; dest[x] + a1
vaddw.u8 q12, q0, d5 ; dest[x] + a1
vqmovun.s16 d2, q9 ; clip_pixel
vqmovun.s16 d3, q10 ; clip_pixel
vqmovun.s16 d30, q11 ; clip_pixel
vqmovun.s16 d31, q12 ; clip_pixel
vst1.64 {d2}, [r12], r2
vst1.64 {d3}, [r12], r2
vst1.64 {d30}, [r12], r2
vst1.64 {d31}, [r12], r2
vaddw.u8 q9, q0, d6 ; dest[x] + a1
vaddw.u8 q10, q0, d7 ; dest[x] + a1
vaddw.u8 q11, q0, d16 ; dest[x] + a1
vaddw.u8 q12, q0, d17 ; dest[x] + a1
vqmovun.s16 d2, q9 ; clip_pixel
vqmovun.s16 d3, q10 ; clip_pixel
vqmovun.s16 d30, q11 ; clip_pixel
vqmovun.s16 d31, q12 ; clip_pixel
vst1.64 {d2}, [r12], r2
vst1.64 {d3}, [r12], r2
vst1.64 {d30}, [r12], r2
vst1.64 {d31}, [r12], r2
bx lr
ENDP ; |vp9_short_idct8x8_1_add_neon|
END

View File

@@ -0,0 +1,519 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
;
EXPORT |vp9_short_idct8x8_add_neon|
EXPORT |vp9_short_idct10_8x8_add_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
; Parallel 1D IDCT on all the columns of a 8x8 16bit data matrix which are
; loaded in q8-q15. The output will be stored back into q8-q15 registers.
; This macro will touch q0-q7 registers and use them as buffer during
; calculation.
MACRO
IDCT8x8_1D
; stage 1
vdup.16 d0, r3 ; duplicate cospi_28_64
vdup.16 d1, r4 ; duplicate cospi_4_64
vdup.16 d2, r5 ; duplicate cospi_12_64
vdup.16 d3, r6 ; duplicate cospi_20_64
; input[1] * cospi_28_64
vmull.s16 q2, d18, d0
vmull.s16 q3, d19, d0
; input[5] * cospi_12_64
vmull.s16 q5, d26, d2
vmull.s16 q6, d27, d2
; input[1]*cospi_28_64-input[7]*cospi_4_64
vmlsl.s16 q2, d30, d1
vmlsl.s16 q3, d31, d1
; input[5] * cospi_12_64 - input[3] * cospi_20_64
vmlsl.s16 q5, d22, d3
vmlsl.s16 q6, d23, d3
; dct_const_round_shift(input_dc * cospi_16_64)
vqrshrn.s32 d8, q2, #14 ; >> 14
vqrshrn.s32 d9, q3, #14 ; >> 14
; dct_const_round_shift(input_dc * cospi_16_64)
vqrshrn.s32 d10, q5, #14 ; >> 14
vqrshrn.s32 d11, q6, #14 ; >> 14
; input[1] * cospi_4_64
vmull.s16 q2, d18, d1
vmull.s16 q3, d19, d1
; input[5] * cospi_20_64
vmull.s16 q9, d26, d3
vmull.s16 q13, d27, d3
; input[1]*cospi_4_64+input[7]*cospi_28_64
vmlal.s16 q2, d30, d0
vmlal.s16 q3, d31, d0
; input[5] * cospi_20_64 + input[3] * cospi_12_64
vmlal.s16 q9, d22, d2
vmlal.s16 q13, d23, d2
; dct_const_round_shift(input_dc * cospi_16_64)
vqrshrn.s32 d14, q2, #14 ; >> 14
vqrshrn.s32 d15, q3, #14 ; >> 14
; stage 2 & stage 3 - even half
vdup.16 d0, r7 ; duplicate cospi_16_64
; dct_const_round_shift(input_dc * cospi_16_64)
vqrshrn.s32 d12, q9, #14 ; >> 14
vqrshrn.s32 d13, q13, #14 ; >> 14
; input[0] * cospi_16_64
vmull.s16 q2, d16, d0
vmull.s16 q3, d17, d0
; input[0] * cospi_16_64
vmull.s16 q13, d16, d0
vmull.s16 q15, d17, d0
; (input[0] + input[2]) * cospi_16_64
vmlal.s16 q2, d24, d0
vmlal.s16 q3, d25, d0
; (input[0] - input[2]) * cospi_16_64
vmlsl.s16 q13, d24, d0
vmlsl.s16 q15, d25, d0
vdup.16 d0, r8 ; duplicate cospi_24_64
vdup.16 d1, r9 ; duplicate cospi_8_64
; dct_const_round_shift(input_dc * cospi_16_64)
vqrshrn.s32 d18, q2, #14 ; >> 14
vqrshrn.s32 d19, q3, #14 ; >> 14
; dct_const_round_shift(input_dc * cospi_16_64)
vqrshrn.s32 d22, q13, #14 ; >> 14
vqrshrn.s32 d23, q15, #14 ; >> 14
; input[1] * cospi_24_64 - input[3] * cospi_8_64
; input[1] * cospi_24_64
vmull.s16 q2, d20, d0
vmull.s16 q3, d21, d0
; input[1] * cospi_8_64
vmull.s16 q8, d20, d1
vmull.s16 q12, d21, d1
; input[1] * cospi_24_64 - input[3] * cospi_8_64
vmlsl.s16 q2, d28, d1
vmlsl.s16 q3, d29, d1
; input[1] * cospi_8_64 + input[3] * cospi_24_64
vmlal.s16 q8, d28, d0
vmlal.s16 q12, d29, d0
; dct_const_round_shift(input_dc * cospi_16_64)
vqrshrn.s32 d26, q2, #14 ; >> 14
vqrshrn.s32 d27, q3, #14 ; >> 14
; dct_const_round_shift(input_dc * cospi_16_64)
vqrshrn.s32 d30, q8, #14 ; >> 14
vqrshrn.s32 d31, q12, #14 ; >> 14
vadd.s16 q0, q9, q15 ; output[0] = step[0] + step[3]
vadd.s16 q1, q11, q13 ; output[1] = step[1] + step[2]
vsub.s16 q2, q11, q13 ; output[2] = step[1] - step[2]
vsub.s16 q3, q9, q15 ; output[3] = step[0] - step[3]
; stage 3 -odd half
vdup.16 d16, r7 ; duplicate cospi_16_64
; stage 2 - odd half
vsub.s16 q13, q4, q5 ; step2[5] = step1[4] - step1[5]
vadd.s16 q4, q4, q5 ; step2[4] = step1[4] + step1[5]
vsub.s16 q14, q7, q6 ; step2[6] = -step1[6] + step1[7]
vadd.s16 q7, q7, q6 ; step2[7] = step1[6] + step1[7]
; step2[6] * cospi_16_64
vmull.s16 q9, d28, d16
vmull.s16 q10, d29, d16
; step2[6] * cospi_16_64
vmull.s16 q11, d28, d16
vmull.s16 q12, d29, d16
; (step2[6] - step2[5]) * cospi_16_64
vmlsl.s16 q9, d26, d16
vmlsl.s16 q10, d27, d16
; (step2[5] + step2[6]) * cospi_16_64
vmlal.s16 q11, d26, d16
vmlal.s16 q12, d27, d16
; dct_const_round_shift(input_dc * cospi_16_64)
vqrshrn.s32 d10, q9, #14 ; >> 14
vqrshrn.s32 d11, q10, #14 ; >> 14
; dct_const_round_shift(input_dc * cospi_16_64)
vqrshrn.s32 d12, q11, #14 ; >> 14
vqrshrn.s32 d13, q12, #14 ; >> 14
; stage 4
vadd.s16 q8, q0, q7 ; output[0] = step1[0] + step1[7];
vadd.s16 q9, q1, q6 ; output[1] = step1[1] + step1[6];
vadd.s16 q10, q2, q5 ; output[2] = step1[2] + step1[5];
vadd.s16 q11, q3, q4 ; output[3] = step1[3] + step1[4];
vsub.s16 q12, q3, q4 ; output[4] = step1[3] - step1[4];
vsub.s16 q13, q2, q5 ; output[5] = step1[2] - step1[5];
vsub.s16 q14, q1, q6 ; output[6] = step1[1] - step1[6];
vsub.s16 q15, q0, q7 ; output[7] = step1[0] - step1[7];
MEND
; Transpose a 8x8 16bit data matrix. Datas are loaded in q8-q15.
MACRO
TRANSPOSE8X8
vswp d17, d24
vswp d23, d30
vswp d21, d28
vswp d19, d26
vtrn.32 q8, q10
vtrn.32 q9, q11
vtrn.32 q12, q14
vtrn.32 q13, q15
vtrn.16 q8, q9
vtrn.16 q10, q11
vtrn.16 q12, q13
vtrn.16 q14, q15
MEND
AREA Block, CODE, READONLY ; name this block of code
;void vp9_short_idct8x8_add_neon(int16_t *input, uint8_t *dest, int dest_stride)
;
; r0 int16_t input
; r1 uint8_t *dest
; r2 int dest_stride)
|vp9_short_idct8x8_add_neon| PROC
push {r4-r9}
vpush {d8-d15}
vld1.s16 {q8,q9}, [r0]!
vld1.s16 {q10,q11}, [r0]!
vld1.s16 {q12,q13}, [r0]!
vld1.s16 {q14,q15}, [r0]!
; transpose the input data
TRANSPOSE8X8
; generate cospi_28_64 = 3196
mov r3, #0x0c00
add r3, #0x7c
; generate cospi_4_64 = 16069
mov r4, #0x3e00
add r4, #0xc5
; generate cospi_12_64 = 13623
mov r5, #0x3500
add r5, #0x37
; generate cospi_20_64 = 9102
mov r6, #0x2300
add r6, #0x8e
; generate cospi_16_64 = 11585
mov r7, #0x2d00
add r7, #0x41
; generate cospi_24_64 = 6270
mov r8, #0x1800
add r8, #0x7e
; generate cospi_8_64 = 15137
mov r9, #0x3b00
add r9, #0x21
; First transform rows
IDCT8x8_1D
; Transpose the matrix
TRANSPOSE8X8
; Then transform columns
IDCT8x8_1D
; ROUND_POWER_OF_TWO(temp_out[j], 5)
vrshr.s16 q8, q8, #5
vrshr.s16 q9, q9, #5
vrshr.s16 q10, q10, #5
vrshr.s16 q11, q11, #5
vrshr.s16 q12, q12, #5
vrshr.s16 q13, q13, #5
vrshr.s16 q14, q14, #5
vrshr.s16 q15, q15, #5
; save dest pointer
mov r0, r1
; load destination data
vld1.64 {d0}, [r1], r2
vld1.64 {d1}, [r1], r2
vld1.64 {d2}, [r1], r2
vld1.64 {d3}, [r1], r2
vld1.64 {d4}, [r1], r2
vld1.64 {d5}, [r1], r2
vld1.64 {d6}, [r1], r2
vld1.64 {d7}, [r1]
; ROUND_POWER_OF_TWO(temp_out[j], 5) + dest[j * dest_stride + i]
vaddw.u8 q8, q8, d0
vaddw.u8 q9, q9, d1
vaddw.u8 q10, q10, d2
vaddw.u8 q11, q11, d3
vaddw.u8 q12, q12, d4
vaddw.u8 q13, q13, d5
vaddw.u8 q14, q14, d6
vaddw.u8 q15, q15, d7
; clip_pixel
vqmovun.s16 d0, q8
vqmovun.s16 d1, q9
vqmovun.s16 d2, q10
vqmovun.s16 d3, q11
vqmovun.s16 d4, q12
vqmovun.s16 d5, q13
vqmovun.s16 d6, q14
vqmovun.s16 d7, q15
; store the data
vst1.64 {d0}, [r0], r2
vst1.64 {d1}, [r0], r2
vst1.64 {d2}, [r0], r2
vst1.64 {d3}, [r0], r2
vst1.64 {d4}, [r0], r2
vst1.64 {d5}, [r0], r2
vst1.64 {d6}, [r0], r2
vst1.64 {d7}, [r0], r2
vpop {d8-d15}
pop {r4-r9}
bx lr
ENDP ; |vp9_short_idct8x8_add_neon|
;void vp9_short_idct10_8x8_add_neon(int16_t *input, uint8_t *dest, int dest_stride)
;
; r0 int16_t input
; r1 uint8_t *dest
; r2 int dest_stride)
|vp9_short_idct10_8x8_add_neon| PROC
push {r4-r9}
vpush {d8-d15}
vld1.s16 {q8,q9}, [r0]!
vld1.s16 {q10,q11}, [r0]!
vld1.s16 {q12,q13}, [r0]!
vld1.s16 {q14,q15}, [r0]!
; transpose the input data
TRANSPOSE8X8
; generate cospi_28_64 = 3196
mov r3, #0x0c00
add r3, #0x7c
; generate cospi_4_64 = 16069
mov r4, #0x3e00
add r4, #0xc5
; generate cospi_12_64 = 13623
mov r5, #0x3500
add r5, #0x37
; generate cospi_20_64 = 9102
mov r6, #0x2300
add r6, #0x8e
; generate cospi_16_64 = 11585
mov r7, #0x2d00
add r7, #0x41
; generate cospi_24_64 = 6270
mov r8, #0x1800
add r8, #0x7e
; generate cospi_8_64 = 15137
mov r9, #0x3b00
add r9, #0x21
; First transform rows
; stage 1
; The following instructions use vqrdmulh to do the
; dct_const_round_shift(input[1] * cospi_28_64). vqrdmulh will do doubling
; multiply and shift the result by 16 bits instead of 14 bits. So we need
; to double the constants before multiplying to compensate this.
mov r12, r3, lsl #1
vdup.16 q0, r12 ; duplicate cospi_28_64*2
mov r12, r4, lsl #1
vdup.16 q1, r12 ; duplicate cospi_4_64*2
; dct_const_round_shift(input[1] * cospi_28_64)
vqrdmulh.s16 q4, q9, q0
mov r12, r6, lsl #1
rsb r12, #0
vdup.16 q0, r12 ; duplicate -cospi_20_64*2
; dct_const_round_shift(input[1] * cospi_4_64)
vqrdmulh.s16 q7, q9, q1
mov r12, r5, lsl #1
vdup.16 q1, r12 ; duplicate cospi_12_64*2
; dct_const_round_shift(- input[3] * cospi_20_64)
vqrdmulh.s16 q5, q11, q0
mov r12, r7, lsl #1
vdup.16 q0, r12 ; duplicate cospi_16_64*2
; dct_const_round_shift(input[3] * cospi_12_64)
vqrdmulh.s16 q6, q11, q1
; stage 2 & stage 3 - even half
mov r12, r8, lsl #1
vdup.16 q1, r12 ; duplicate cospi_24_64*2
; dct_const_round_shift(input_dc * cospi_16_64)
vqrdmulh.s16 q9, q8, q0
mov r12, r9, lsl #1
vdup.16 q0, r12 ; duplicate cospi_8_64*2
; dct_const_round_shift(input[1] * cospi_24_64)
vqrdmulh.s16 q13, q10, q1
; dct_const_round_shift(input[1] * cospi_8_64)
vqrdmulh.s16 q15, q10, q0
; stage 3 -odd half
vdup.16 d16, r7 ; duplicate cospi_16_64
vadd.s16 q0, q9, q15 ; output[0] = step[0] + step[3]
vadd.s16 q1, q9, q13 ; output[1] = step[1] + step[2]
vsub.s16 q2, q9, q13 ; output[2] = step[1] - step[2]
vsub.s16 q3, q9, q15 ; output[3] = step[0] - step[3]
; stage 2 - odd half
vsub.s16 q13, q4, q5 ; step2[5] = step1[4] - step1[5]
vadd.s16 q4, q4, q5 ; step2[4] = step1[4] + step1[5]
vsub.s16 q14, q7, q6 ; step2[6] = -step1[6] + step1[7]
vadd.s16 q7, q7, q6 ; step2[7] = step1[6] + step1[7]
; step2[6] * cospi_16_64
vmull.s16 q9, d28, d16
vmull.s16 q10, d29, d16
; step2[6] * cospi_16_64
vmull.s16 q11, d28, d16
vmull.s16 q12, d29, d16
; (step2[6] - step2[5]) * cospi_16_64
vmlsl.s16 q9, d26, d16
vmlsl.s16 q10, d27, d16
; (step2[5] + step2[6]) * cospi_16_64
vmlal.s16 q11, d26, d16
vmlal.s16 q12, d27, d16
; dct_const_round_shift(input_dc * cospi_16_64)
vqrshrn.s32 d10, q9, #14 ; >> 14
vqrshrn.s32 d11, q10, #14 ; >> 14
; dct_const_round_shift(input_dc * cospi_16_64)
vqrshrn.s32 d12, q11, #14 ; >> 14
vqrshrn.s32 d13, q12, #14 ; >> 14
; stage 4
vadd.s16 q8, q0, q7 ; output[0] = step1[0] + step1[7];
vadd.s16 q9, q1, q6 ; output[1] = step1[1] + step1[6];
vadd.s16 q10, q2, q5 ; output[2] = step1[2] + step1[5];
vadd.s16 q11, q3, q4 ; output[3] = step1[3] + step1[4];
vsub.s16 q12, q3, q4 ; output[4] = step1[3] - step1[4];
vsub.s16 q13, q2, q5 ; output[5] = step1[2] - step1[5];
vsub.s16 q14, q1, q6 ; output[6] = step1[1] - step1[6];
vsub.s16 q15, q0, q7 ; output[7] = step1[0] - step1[7];
; Transpose the matrix
TRANSPOSE8X8
; Then transform columns
IDCT8x8_1D
; ROUND_POWER_OF_TWO(temp_out[j], 5)
vrshr.s16 q8, q8, #5
vrshr.s16 q9, q9, #5
vrshr.s16 q10, q10, #5
vrshr.s16 q11, q11, #5
vrshr.s16 q12, q12, #5
vrshr.s16 q13, q13, #5
vrshr.s16 q14, q14, #5
vrshr.s16 q15, q15, #5
; save dest pointer
mov r0, r1
; load destination data
vld1.64 {d0}, [r1], r2
vld1.64 {d1}, [r1], r2
vld1.64 {d2}, [r1], r2
vld1.64 {d3}, [r1], r2
vld1.64 {d4}, [r1], r2
vld1.64 {d5}, [r1], r2
vld1.64 {d6}, [r1], r2
vld1.64 {d7}, [r1]
; ROUND_POWER_OF_TWO(temp_out[j], 5) + dest[j * dest_stride + i]
vaddw.u8 q8, q8, d0
vaddw.u8 q9, q9, d1
vaddw.u8 q10, q10, d2
vaddw.u8 q11, q11, d3
vaddw.u8 q12, q12, d4
vaddw.u8 q13, q13, d5
vaddw.u8 q14, q14, d6
vaddw.u8 q15, q15, d7
; clip_pixel
vqmovun.s16 d0, q8
vqmovun.s16 d1, q9
vqmovun.s16 d2, q10
vqmovun.s16 d3, q11
vqmovun.s16 d4, q12
vqmovun.s16 d5, q13
vqmovun.s16 d6, q14
vqmovun.s16 d7, q15
; store the data
vst1.64 {d0}, [r0], r2
vst1.64 {d1}, [r0], r2
vst1.64 {d2}, [r0], r2
vst1.64 {d3}, [r0], r2
vst1.64 {d4}, [r0], r2
vst1.64 {d5}, [r0], r2
vst1.64 {d6}, [r0], r2
vst1.64 {d7}, [r0], r2
vpop {d8-d15}
pop {r4-r9}
bx lr
ENDP ; |vp9_short_idct10_8x8_add_neon|
END

View File

@@ -0,0 +1,237 @@
;
; Copyright (c) 2013 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
;
EXPORT |vp9_short_iht4x4_add_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
; Parallel 1D IDCT on all the columns of a 4x4 16bits data matrix which are
; loaded in d16-d19. d0 must contain cospi_8_64. d1 must contain
; cospi_16_64. d2 must contain cospi_24_64. The output will be stored back
; into d16-d19 registers. This macro will touch q10- q15 registers and use
; them as buffer during calculation.
MACRO
IDCT4x4_1D
; stage 1
vadd.s16 d23, d16, d18 ; (input[0] + input[2])
vsub.s16 d24, d16, d18 ; (input[0] - input[2])
vmull.s16 q15, d17, d2 ; input[1] * cospi_24_64
vmull.s16 q10, d17, d0 ; input[1] * cospi_8_64
vmull.s16 q13, d23, d1 ; (input[0] + input[2]) * cospi_16_64
vmull.s16 q14, d24, d1 ; (input[0] - input[2]) * cospi_16_64
vmlsl.s16 q15, d19, d0 ; input[1] * cospi_24_64 - input[3] * cospi_8_64
vmlal.s16 q10, d19, d2 ; input[1] * cospi_8_64 + input[3] * cospi_24_64
; dct_const_round_shift
vqrshrn.s32 d26, q13, #14
vqrshrn.s32 d27, q14, #14
vqrshrn.s32 d29, q15, #14
vqrshrn.s32 d28, q10, #14
; stage 2
; output[0] = step[0] + step[3];
; output[1] = step[1] + step[2];
; output[3] = step[0] - step[3];
; output[2] = step[1] - step[2];
vadd.s16 q8, q13, q14
vsub.s16 q9, q13, q14
vswp d18, d19
MEND
; Parallel 1D IADST on all the columns of a 4x4 16bits data matrix which
; loaded in d16-d19. d3 must contain sinpi_1_9. d4 must contain sinpi_2_9.
; d5 must contain sinpi_4_9. d6 must contain sinpi_3_9. The output will be
; stored back into d16-d19 registers. This macro will touch q11,q12,q13,
; q14,q15 registers and use them as buffer during calculation.
MACRO
IADST4x4_1D
vmull.s16 q10, d3, d16 ; s0 = sinpi_1_9 * x0
vmull.s16 q11, d4, d16 ; s1 = sinpi_2_9 * x0
vmull.s16 q12, d6, d17 ; s2 = sinpi_3_9 * x1
vmull.s16 q13, d5, d18 ; s3 = sinpi_4_9 * x2
vmull.s16 q14, d3, d18 ; s4 = sinpi_1_9 * x2
vmovl.s16 q15, d16 ; expand x0 from 16 bit to 32 bit
vaddw.s16 q15, q15, d19 ; x0 + x3
vmull.s16 q8, d4, d19 ; s5 = sinpi_2_9 * x3
vsubw.s16 q15, q15, d18 ; s7 = x0 + x3 - x2
vmull.s16 q9, d5, d19 ; s6 = sinpi_4_9 * x3
vadd.s32 q10, q10, q13 ; x0 = s0 + s3 + s5
vadd.s32 q10, q10, q8
vsub.s32 q11, q11, q14 ; x1 = s1 - s4 - s6
vdup.32 q8, r0 ; duplicate sinpi_3_9
vsub.s32 q11, q11, q9
vmul.s32 q15, q15, q8 ; x2 = sinpi_3_9 * s7
vadd.s32 q13, q10, q12 ; s0 = x0 + x3
vadd.s32 q10, q10, q11 ; x0 + x1
vadd.s32 q14, q11, q12 ; s1 = x1 + x3
vsub.s32 q10, q10, q12 ; s3 = x0 + x1 - x3
; dct_const_round_shift
vqrshrn.s32 d16, q13, #14
vqrshrn.s32 d17, q14, #14
vqrshrn.s32 d18, q15, #14
vqrshrn.s32 d19, q10, #14
MEND
; Generate cosine constants in d6 - d8 for the IDCT
MACRO
GENERATE_COSINE_CONSTANTS
; cospi_8_64 = 15137 = 0x3b21
mov r0, #0x3b00
add r0, #0x21
; cospi_16_64 = 11585 = 0x2d41
mov r3, #0x2d00
add r3, #0x41
; cospi_24_64 = 6270 = 0x187e
mov r12, #0x1800
add r12, #0x7e
; generate constant vectors
vdup.16 d0, r0 ; duplicate cospi_8_64
vdup.16 d1, r3 ; duplicate cospi_16_64
vdup.16 d2, r12 ; duplicate cospi_24_64
MEND
; Generate sine constants in d1 - d4 for the IADST.
MACRO
GENERATE_SINE_CONSTANTS
; sinpi_1_9 = 5283 = 0x14A3
mov r0, #0x1400
add r0, #0xa3
; sinpi_2_9 = 9929 = 0x26C9
mov r3, #0x2600
add r3, #0xc9
; sinpi_4_9 = 15212 = 0x3B6C
mov r12, #0x3b00
add r12, #0x6c
; generate constant vectors
vdup.16 d3, r0 ; duplicate sinpi_1_9
; sinpi_3_9 = 13377 = 0x3441
mov r0, #0x3400
add r0, #0x41
vdup.16 d4, r3 ; duplicate sinpi_2_9
vdup.16 d5, r12 ; duplicate sinpi_4_9
vdup.16 q3, r0 ; duplicate sinpi_3_9
MEND
; Transpose a 4x4 16bits data matrix. Datas are loaded in d16-d19.
MACRO
TRANSPOSE4X4
vtrn.16 d16, d17
vtrn.16 d18, d19
vtrn.32 q8, q9
MEND
AREA Block, CODE, READONLY ; name this block of code
;void vp9_short_iht4x4_add_neon(int16_t *input, uint8_t *dest,
; int dest_stride, int tx_type)
;
; r0 int16_t input
; r1 uint8_t *dest
; r2 int dest_stride
; r3 int tx_type)
; This function will only handle tx_type of 1,2,3.
|vp9_short_iht4x4_add_neon| PROC
; load the inputs into d16-d19
vld1.s16 {q8,q9}, [r0]!
; transpose the input data
TRANSPOSE4X4
; decide the type of transform
cmp r3, #2
beq idct_iadst
cmp r3, #3
beq iadst_iadst
iadst_idct
; generate constants
GENERATE_COSINE_CONSTANTS
GENERATE_SINE_CONSTANTS
; first transform rows
IDCT4x4_1D
; transpose the matrix
TRANSPOSE4X4
; then transform columns
IADST4x4_1D
b end_vp9_short_iht4x4_add_neon
idct_iadst
; generate constants
GENERATE_COSINE_CONSTANTS
GENERATE_SINE_CONSTANTS
; first transform rows
IADST4x4_1D
; transpose the matrix
TRANSPOSE4X4
; then transform columns
IDCT4x4_1D
b end_vp9_short_iht4x4_add_neon
iadst_iadst
; generate constants
GENERATE_SINE_CONSTANTS
; first transform rows
IADST4x4_1D
; transpose the matrix
TRANSPOSE4X4
; then transform columns
IADST4x4_1D
end_vp9_short_iht4x4_add_neon
; ROUND_POWER_OF_TWO(temp_out[j], 4)
vrshr.s16 q8, q8, #4
vrshr.s16 q9, q9, #4
vld1.32 {d26[0]}, [r1], r2
vld1.32 {d26[1]}, [r1], r2
vld1.32 {d27[0]}, [r1], r2
vld1.32 {d27[1]}, [r1]
; ROUND_POWER_OF_TWO(temp_out[j], 4) + dest[j * dest_stride + i]
vaddw.u8 q8, q8, d26
vaddw.u8 q9, q9, d27
; clip_pixel
vqmovun.s16 d26, q8
vqmovun.s16 d27, q9
; do the stores in reverse order with negative post-increment, by changing
; the sign of the stride
rsb r2, r2, #0
vst1.32 {d27[1]}, [r1], r2
vst1.32 {d27[0]}, [r1], r2
vst1.32 {d26[1]}, [r1], r2
vst1.32 {d26[0]}, [r1] ; no post-increment
bx lr
ENDP ; |vp9_short_iht4x4_add_neon|
END

Some files were not shown because too many files have changed in this diff Show More