Commit Graph

17301 Commits

Author SHA1 Message Date
Jerome Jiang
a5ab38093f Merge "Fix vp8 race when build --enable-vp9-highbitdepth." 2017-05-30 05:47:44 +00:00
Johann Koenig
2693b89c19 Merge "subpel variance neon: reduce stack usage" 2017-05-26 17:25:47 +00:00
Johann Koenig
47174d60c8 Merge "Use vdup instead of vmov" 2017-05-26 17:25:24 +00:00
Jerome Jiang
0afa2dad76 Fix vp8 race when build --enable-vp9-highbitdepth.
Split vp8/vp9 implementations on yv12_copy_frame_c.
Remove high-bitdepth codes from vp8_yv12_extend_frame_borders_c.
Clean up vp8 codes usage in vp9.

BUG=webm:1435

Change-Id: Ic68e79e9d71e1b20ddfc451fb8dcf2447861236d
2017-05-26 09:45:01 -07:00
Marco
146005a911 vp9: SVC: Fix to condition on using source_sad.
Fix the condition on usage of source_sad for temporal layers.
Fix allows it to be used for the case of 1 temporal layer.

Change-Id: I02b1b0ade67a7889d1b93cee66d27c0951131fc3
2017-05-26 08:46:50 -07:00
Marco Paniconi
9ec9415fd9 Merge "vp9: Use source_sad only on top temporal enhancement layer." 2017-05-26 05:24:06 +00:00
Marco Paniconi
4be18ab295 Merge "vp9: SVC: Enable copy partition for SVC speed >= 7." 2017-05-26 05:23:47 +00:00
Marco
ea914456af vp9: Use source_sad only on top temporal enhancement layer.
For 1 pass CBR SVC mode.

Change-Id: Ic026740f9d0ec5eee7c5845be9c5b15884fec48d
2017-05-25 16:32:05 -07:00
Jerome Jiang
327c9bb1da Refactor: Move vp8 skin detection to new files.
Change-Id: If760f28cbbf22beac1cc9bd1546f13831e9dd3f0
2017-05-25 16:12:27 -07:00
Marco
747cf7a505 vp9: SVC: Enable copy partition for SVC speed >= 7.
Adjust the max_copied_frame setting for temporal layers.
Keep the same setting for non-SVC at speed 8.
This change also enables copy_partition for non-SVC at speed 7,
but with smaller value of max_copied_frame (=2).

~2% speedup for SVC speed 7, 3 layers, with little/no quality loss.

Change-Id: Ic65ac9aad764ec65a35770d263424b2393ec6780
2017-05-25 12:21:46 -07:00
Johann
f3c97ed32e subpel variance neon: reduce stack usage
Unlike x86, arm does not impose additional alignment restrictions on
vector loads. For incoming values to the first pass, it uses vld1_u32()
which typically does impose a 4 byte alignment. However, as the first
pass operates on user-supplied values we must prepare for unaligned
values anyway (and have, see mem_neon.h).

But for the local temporary values there is no stride and the load will
use vld1_u8 which does not require 4 byte alignment.

There are 3 temporary structures. In the C, one is uint16_t. The arm
saturates between passes but still passes tests. If this becomes an
issue new functions will be needed.

Change-Id: I3c9d4701bfeb14b77c783d0164608e621bfecfb1
2017-05-24 13:28:13 -07:00
Johann
d204c4bf01 Use vdup instead of vmov
Change-Id: Idb6248c1429b55176bb3e9f4e8365ea0ed2be62a
2017-05-24 11:38:15 -07:00
Johann Koenig
de1a9c77a7 Merge changes Iaab2b9a1,Idfb458d3
* changes:
  sub pel avg variance neon: 4x block sizes
  sub pel variance neon: 4x block sizes
2017-05-24 18:33:53 +00:00
Johann Koenig
b11a37f540 Merge changes I31fa6ef8,I228c6f29
* changes:
  sub pel avg variance neon: add neon optimizations
  sub pel variance neon: normalize variable names
2017-05-24 18:32:02 +00:00
James Zern
f0279ceb92 Merge "partial_idct_test,InitInput: fix rollover in mult" 2017-05-24 16:27:21 +00:00
James Zern
566f6d75bd partial_idct_test,InitInput: fix rollover in mult
promote coeff to signed 64-bit to avoid exceeding integer bounds when
squaring the value

Change-Id: If77bef6bc0a6a4c39ca3013e5e2ddb426a1c6e1f
2017-05-24 15:27:38 +02:00
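The rollover fix above can be illustrated with a minimal sketch. It assumes 32-bit coefficients whose squares are accumulated; `sum_squares` is a hypothetical helper for illustration, not the code in test/partial_idct_test.cc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from libvpx): sums the squares of 32-bit
 * coefficients. Each value is widened to int64_t BEFORE the multiply, so
 * the product is computed in 64 bits and cannot exceed the int range. */
int64_t sum_squares(const int32_t *coeff, int count) {
  int64_t sum = 0;
  assert(coeff != NULL && count >= 0);
  for (int i = 0; i < count; ++i) {
    const int64_t c = coeff[i]; /* promote first, then square */
    sum += c * c;
  }
  return sum;
}
```

Writing `coeff[i] * coeff[i]` directly would perform the multiplication in 32-bit `int`, where a value such as 65536 already overflows when squared.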
Linfeng Zhang
6444958f62 Update inv_txfm_sse2.h and inv_txfm_sse2.c
Extract shared code into inline functions.

Change-Id: Iee1e5a4bc6396aeed0d301163095c9b21aa66b2f
2017-05-23 14:54:46 -07:00
Linfeng Zhang
36f1b183e4 Update InitInput() in test/partial_idct_test.cc
Make it work in high bit depth.

BUG=webm:1412

Change-Id: Ic5cfd410a69709f01e2924774356a108a349d273
2017-05-23 14:24:23 -07:00
Gregor Jasny
bcfd9c9750 Add support for Visual Studio 2017
BUG=webm:1428

Change-Id: Iba98aef1159724d106cf39b94d7b69843d76cd48
2017-05-23 11:32:27 +02:00
Johann
f6fcd3410d sub pel avg variance neon: 4x block sizes
BUG=webm:1423

Change-Id: Iaab2b9a183fdb54aae5f717aba95d90dc36a9e3b
2017-05-22 14:40:05 -07:00
Johann
188d58eaa9 sub pel variance neon: 4x block sizes
Add optimizations for blocks of width 4

BUG=webm:1423

Change-Id: Idfb458d36db3014d48fbfbe7f5462aa6eb249938
2017-05-22 14:40:01 -07:00
Johann
9b0d306a2f sub pel avg variance neon: add neon optimizations
These are missing an optimized version of vpx_comp_avg_pred

BUG=webm:1423

Change-Id: I31fa6ef842e98f7ff3ea079ffed51ae33178e2ed
2017-05-22 13:58:43 -07:00
Johann
e0d294c3af sub pel variance neon: normalize variable names
match vpx_dsp/variance.c variable names

Change-Id: I228c6f296c183af147b079b7c8bcdf97bd09cf3a
2017-05-22 13:58:43 -07:00
Linfeng Zhang
27beada6d0 Merge "Add vpx_highbd_idct{4x4,8x8,16x16}_1_add_sse2" 2017-05-22 20:58:18 +00:00
Johann
67ac68e399 variance neon: assert overflow conditions
Change-Id: I12faca82d062eb33dc48dfeb39739b25112316cd
2017-05-22 11:25:06 -07:00
Linfeng Zhang
c167345ffb Add vpx_highbd_idct{4x4,8x8,16x16}_1_add_sse2
BUG=webm:1412

Change-Id: Ia338a6057d36f9ed7eaa9cbd4dfbf0c3cbdc6468
2017-05-22 11:24:21 -07:00
Johann
d217c87139 neon variance: special case 4x
The sub pixel variance uses a temp buffer which guarantees width ==
stride. Take advantage of this with the 4x and avoid the very costly
lane loads.

Change-Id: Ia0c97eb8c29dc8dfa6e51a29dff9b75b3c6726f1
2017-05-22 10:51:31 -07:00
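As a rough illustration of why width == stride helps for 4x blocks, the sketch below uses plain C rather than NEON intrinsics. `load_4x4` is a hypothetical name; the real code operates on vector registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from libvpx): gathers a 4x4 block of pixels
 * into 16 contiguous bytes. When stride == 4 the four rows are already
 * adjacent in memory, so a single 16-byte copy replaces four row loads. */
void load_4x4(const uint8_t *src, int stride, uint8_t out[16]) {
  assert(src != NULL && out != NULL && stride >= 4);
  if (stride == 4) {
    memcpy(out, src, 16); /* rows are contiguous: one wide load */
  } else {
    for (int r = 0; r < 4; ++r) {
      memcpy(out + 4 * r, src + r * stride, 4); /* one load per row */
    }
  }
}
```

In vector code the per-row path corresponds to the lane loads that the commit message describes as very costly; the contiguous path maps to one full-width load.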
Johann Koenig
e7cac13016 Merge changes Ib8dd96f7,Ie9854b77
* changes:
  neon variance: process 4x blocks
  use memcpy for unaligned neon stores
2017-05-22 17:48:33 +00:00
Marco Paniconi
b3bf91bdc6 Merge "vp9: Adjustments to cyclic refresh for high motion." 2017-05-22 06:27:30 +00:00
Marco
2adc0443dd vp9: Adjustments to cyclic refresh for high motion.
For aq-mode=3: refactor the condition for turning off
the refresh. Add some adjustments for high motion content.

No/little change in RTC metrics, only affects high motion case.

Change-Id: I7da8eabfb0e61db014be4562806f72ee5ef4a43b
2017-05-21 22:21:44 -07:00
Marco
ff9395eb3b vp9: Speed >= 8: Modify condition for low-resoln.
No change on RTC metrics.

Change-Id: I5abc573cb56572188d900645d13ba479f55a1ea0
2017-05-21 22:14:38 -07:00
Johann Koenig
b5055002d7 Merge "neon 4 byte helper functions" 2017-05-19 17:11:30 +00:00
Johann Koenig
3c603eadb4 Merge "neon fdct: 4x4 implementation" 2017-05-19 17:08:58 +00:00
Paul Wilkins
a7977ece93 Merge "Changes to modified error." 2017-05-19 12:24:32 +00:00
Marco
1205e3207e vp9: SVC: Modify condition to allow for copy partition.
When temporal layers are used, only allow for copy partition
on the top temporal enhancement layer frames.

Change-Id: I5472abdc0f9f6c8dafa75a7a84c615e08ae22af8
2017-05-18 14:19:31 -07:00
Jerome Jiang
6b6ff9c969 Merge "vp9: Make copy partition work for SVC and dynamic resize." 2017-05-18 19:37:30 +00:00
Marco
2ba4729ef8 vp9: Make copy partition work for SVC and dynamic resize.
Only affects speed 8.

Make changes to copy partition to fix a bug in setting microblock
offset. Avg PSNR shows 0.02% gain on rtc_derf and 0.08% loss on rtc.

Change-Id: I61c3e5914dde645331344388e7437e5638acd4f3
2017-05-18 11:33:56 -07:00
paulwilkins
5680b4517f Changes to modified error.
The modified error was a derivative of the "coded_error"
that was used to allocate bits between different frames on the
assumption that the allocation should be linear in terms of this
modified error.  I.e. a frame with double the modified error score
should all things being equal get double the number of bits. The
code also included upper and lower caps derived from input
VBR parameters.

This patch improves the initial calculation of the clip mean error
(now called "mean_mod_score" as it is no longer a prediction error)
used as the midpoint for the rate distribution function and normalizes
the output "modified scores" such that 1.0 indicates a frame
in the middle of the distribution.  The VBR upper and lower caps are
then applied directly to a frame's normalized score.

This refactoring is intended to make it easier to drop in alternative
distribution functions or to base the rate allocation on a corpus wide
midpoint (rather than the clip mean).

Change-Id: I4fb09de637e93566bfc4e022b2e7d04660817195
2017-05-18 12:56:02 +01:00
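The normalization described above might look like the following sketch. It assumes the simplest possible distribution function, a plain ratio against the midpoint; the actual function in the encoder may differ. `normalized_score` and its parameter names other than `mean_mod_score` are hypothetical.

```c
#include <assert.h>

/* Hypothetical sketch (not the libvpx implementation): normalizes a frame's
 * error against the clip midpoint so that 1.0 means "a frame in the middle
 * of the distribution", then applies the VBR lower and upper caps directly
 * to the normalized value. A linear ratio is ASSUMED here for illustration. */
double normalized_score(double frame_error, double mean_mod_score,
                        double lower_cap, double upper_cap) {
  double score;
  assert(mean_mod_score > 0.0 && lower_cap <= upper_cap);
  score = frame_error / mean_mod_score;
  if (score < lower_cap) score = lower_cap;
  if (score > upper_cap) score = upper_cap;
  return score;
}
```

Because the caps act on the normalized score, swapping in a different distribution function or a corpus-wide midpoint only changes how `score` is computed, not how it is clamped.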
Johann
7b742da63e neon variance: process 4x blocks
Continue processing sets of 16 values. Plenty of improvement for 4x8
(doubles the speed) but only about 30% for 4x4.

BUG=webm:1422

Change-Id: Ib8dd96f75d474f0348800271d11e58356b620905
2017-05-17 17:35:01 -07:00
Johann
2057d3ef75 use memcpy for unaligned neon stores
Advise the compiler that the store is eventually going to a uint8_t
buffer. This helps avoid getting alignment hints which would cause the
memory access to fail.

Originally added as a workaround for clang:
https://bugs.llvm.org//show_bug.cgi?id=24421

Change-Id: Ie9854b777cfb2f4baaee66764f0e51dcb094d51e
2017-05-17 12:11:31 -07:00
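The memcpy idiom above can be sketched in portable C. This is a stand-in for the NEON version, and the function names are hypothetical rather than the helpers in mem_neon.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers (not from libvpx): move 4 bytes to or from a byte
 * buffer at any alignment. Going through memcpy gives the compiler no
 * reason to assume dst/src is 4-byte aligned, unlike a cast to uint32_t *,
 * so it cannot attach an alignment hint to the memory access. */
void store_u32_unaligned(uint8_t *dst, uint32_t value) {
  assert(dst != NULL);
  memcpy(dst, &value, sizeof(value));
}

uint32_t load_u32_unaligned(const uint8_t *src) {
  uint32_t value;
  assert(src != NULL);
  memcpy(&value, src, sizeof(value));
  return value;
}
```

Compilers typically lower a fixed-size memcpy like this to a single load or store instruction, so the idiom normally costs nothing at run time.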
Marco Paniconi
a2dfbbd7d6 Merge "vp9: Modify ChangingDropFrameThresh unittest." 2017-05-17 18:42:51 +00:00
Linfeng Zhang
13918a9ccc Merge "Update partial idct testing code" 2017-05-17 17:53:03 +00:00
Yaowu Xu
bde2c04fb7 Merge "Experiment. Store first pass errors as per MB values." 2017-05-17 17:38:15 +00:00
Marco
4733df333f vp9: Modify ChangingDropFrameThresh unittest.
Add another (lower) bitrate to the test, to cover
frame drop behavior at low bitrate range.

Change-Id: Iaad003974159daf3d2d65ef3a6575a3e72e498d6
2017-05-17 09:38:21 -07:00
Linfeng Zhang
3210ca6d60 Update partial idct testing code
Add PartialIDctTest::PrintDiff() to help debugging.
In RunQuantCheck, try all combinations of +/-mask_ input for 4x4 idct.
Update PartialIDctTest::InitInput().

Change-Id: I13fd163954a4c1a3a6cfeb5e4a4d3d0e7ff901f4
2017-05-17 09:28:32 -07:00
Johann
105503b839 neon fdct: 4x4 implementation
Approximately twice as fast as C implementation.

BUG=webm:1424

Change-Id: I3c0307fb08ddc23df42545cd089a78e2ed5c9d3f
2017-05-17 07:38:18 -07:00
paulwilkins
42e5073f94 Experiment. Store first pass errors as per MB values.
Most existing first pass stats are stored in a form normalized to a
macro-block scale. However the error scores for intra / inter etc were
stored as frame level values but mainly used as MB level values.

This change fixes that. Normalized per MB values make comparisons
between different formats easier and in any case this is usually what is
wanted.

Any change in results should be limited to slight differences in rounding.

*** Change after patch 8 +2 requiring new approval.

Final pre-submit testing showed one 4K clip with above expected change.
Investigation showed this was due to a value used to test for ultra low intra
complexity in key frame detection. This was a per frame not per MB value but
also did not scale with frame size. Replacement with a small per MB value
(based on original per frame value and cif frame size) resolved the KF detection
problem.

Also converted kf_group_error_left to a double in line with other error values
to reduce rounding problems in KF group bit allocation

All clips and sets now show nominal (or 0) change as expected.

Change-Id: Ic2d57980398c99ade2b7380e3e6ca6b32186901f
2017-05-17 12:00:18 +01:00
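The per-MB normalization described above can be sketched as follows, assuming 16x16 macroblocks and frame dimensions rounded up to whole macroblocks. Both function names are hypothetical, and the rounding rule is an assumption rather than something the commit message states.

```c
#include <assert.h>

/* Hypothetical helper (not from libvpx): number of 16x16 macroblocks needed
 * to cover a frame, rounding partial macroblocks up (an assumption). */
int macroblock_count(int width, int height) {
  assert(width > 0 && height > 0);
  return ((width + 15) / 16) * ((height + 15) / 16);
}

/* Converts a frame-level error score to a per-macroblock value so that
 * scores from different frame sizes can be compared directly. */
double per_mb_error(double frame_error, int width, int height) {
  return frame_error / macroblock_count(width, height);
}
```

A CIF frame (352x288) covers 396 macroblocks and a 3840x2160 frame covers 32400, which is why a fixed per-frame threshold tuned at one size does not carry over to the other.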
Linfeng Zhang
18e8baa5c0 Add transpose_32bit_4x4() and rename transpose_4x4() for vpx_dsp/x86
Change-Id: Ib57377f6cf6573c04720d3cc5dea4285362b4220
2017-05-16 17:46:37 -07:00
Johann Koenig
31cb852a90 Merge "Revert "Add visibility="protected" attribute for global variables referenced in asm files."" 2017-05-16 23:39:37 +00:00
Johann Koenig
2300e16675 Revert "Add visibility="protected" attribute for global variables referenced in asm files."
This reverts commit 0d88e15454.

Reason for revert: chromium builds are failing to locate vpx_rv during dlopen()

dlopen failed: cannot locate symbol "vpx_rv" referenced by "libstandalonelibwebviewchromium.so"

Original change's description:
> Add visibility="protected" attribute for global variables referenced in asm files.
>
> During aosp builds with binutils-2.27, we're seeing linker error
> messages of this form:
> libvpx.a(subpixel_mmx.o): relocation R_386_GOTOFF against preemptible
> symbol vp8_bilinear_filters_x86_8 cannot be used when making a shared
> object
>
> subpixel_mmx.o is assembled from "vp8/common/x86/subpixel_mmx.asm".
> Other messages refer to symbol references from deblock_sse2.o and
> subpixel_sse2.o, also assembled from asm files.
>
> This change marks such symbols as having "protected" visibility. This
> satisfies the linker as the symbols are not preemptible from outside
> the shared library now, which I think is the original intent anyway.
>
> Change-Id: I2817f7a5f43041533d65ebf41aefd63f8581a452
>

TBR=jzern@google.com,johannkoenig@google.com,rahulchaudhry@chromium.org,builds@webmproject.org

Change-Id: I0c2ea375aa7ef5fda15b9d9e23e654bb315c941b
2017-05-16 15:54:33 -07:00