Commit Graph

4920 Commits

Author SHA1 Message Date
Peter de Rivaz
2c886953d1 Reinsert macro to fix issue 884.
Change 72056 unfolded some macro definitions,
but lost some alternative behaviour required for
high bitdepth encodes.
This causes the encoder to crash, see issue 884.

Change-Id: I8ce4d73c9fe0a3c10ccb86fba210fabc8b2f0ccc
2014-12-02 13:45:26 -08:00
Deb Mukherjee
02941b0df2 Fix a warning related to VPX_EFLAG_FORCE_KF check
Fixes a warning in chrome build.

Change-Id: I8fa0fd3e7ba1aecf89e5f79ce94cd64ed6a9567c
2014-12-02 11:35:52 -08:00
Peter de Rivaz
7e40a55ef9 Added high bitdepth sse2 transform functions
Also removes some spurious changes in common/vp9_blockd.h which
was introduced by a rebase issue between nextgen and master branches.

Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282
(cherry picked from commit 005d80cd05)
(cherry picked from commit 08d2f54800)
(cherry picked from commit 4230c2306c)
2014-12-02 11:16:24 -08:00
Paul Wilkins
00e3626e13 Use average mb energy from first pass in AQ2 test.
AQ2 modified to use mb_av_energy in defining variance
thresholds used alongside complexity when defining the
segment to be used for an SB64.

Slight improvements in metrics (ssim and PSNR).

Change-Id: Idb9cb73f7d9c4f7118cd7e84ac77b0f25cacbf81
2014-12-02 16:07:30 +00:00
Marco Paniconi
83fd18977f Cyclic refresh: factor segment delta-q into rate control.
Incorporate segment delta-q into estimated bits.
This generally improves the rate control under cyclic refresh (aq=3) mode.

Change-Id: I1dc60fb230e7d08357fae18909d8ed27bf58e037
2014-12-01 16:56:43 -08:00
Jingning Han
f59cb45e90 Merge "Remove repeated search_type_check_frequency assign" 2014-12-01 14:02:10 -08:00
Yunqing Wang
7af927e324 Merge "vp9_ethread: calculate and save the tok starting address for tiles" 2014-12-01 12:49:03 -08:00
Paul Wilkins
0d3d6e0e31 Increase strength of AQ1.
This patch greatly increase the strength of AQ1.

Visual tests show strong gains on many clips but their is a big
hit on psnr.

SSIM is more mixed with some winners and losers.

Change-Id: Idaa5d3b41d8576096bfa000b62bc531c3d8bf6a1
2014-11-27 10:53:37 +00:00
Jingning Han
a6df0cbcca Remove repeated search_type_check_frequency assign
This parameter is initialized as 50. No need to re-assign the
same value in speed -6.

Change-Id: I8735a5593412df2fdcee53ae45c8ebd1c3d792e7
2014-11-25 18:36:41 -08:00
Yunqing Wang
0993bef7e9 vp9_ethread: calculate and save the tok starting address for tiles
Each tile's tok starting address is calculated before the encoding
process. These addresses are stored so that the same calculation
won't be done again in packing bit stream.

Change-Id: I0a3be0301f002260c19a850303f2f73ebc47aa50
2014-11-25 17:19:35 -08:00
Yaowu Xu
e4234b3f8b Separate rate_correction_factor for boosted GFs
When the golden frame is boosted, the rate correction factor is not
correlated well with other inter frames even in CBR mode. This commit
changes to use GF specific rate_correction_factor when gf_cbr_boost
is greater than 20%.

Change-Id: I6312c1564387bcacc11f4c5e8a9cfdc781b5c3ab
2014-11-25 14:32:07 -08:00
Jingning Han
a04ed98482 Cosmetic change in vp9_pick_inter_mode
Change-Id: Ic072585ebffdb36982ed7b8b9f875ca6c1c656c4
2014-11-25 09:42:57 -08:00
Jingning Han
92a7cfc8bf Adaptively adjust mode test kick-off thresholds in RTC coding
This commit allows the encoder to increase the mode test kick-off
thresholds if the previous best mode renders all zero quantized
coefficients, thereby saving motion search runs when possible.
The compression performance of speed -5 and -6 is down by -0.446%
and 0.591%, respectively. The runtime of speed -6 is improved by
10% for many test clips.

vidyo1, 1000 kbps
16578 b/f, 40.316 dB, 7873 ms -> 16575 b/f, 40.262 dB, 7126 ms

nik720p, 1000 kbps
33311 b/f, 38.651 dB, 7263 ms -> 33304 b/f, 38.629 dB, 6865 ms

dark720p, 1000 kbps
33331 b/f, 39.718 dB, 13596 ms -> 33324 b/f, 39.651 dB, 12000 ms

mmoving, 1000 kbps
33263 b/f, 40.983 dB, 7566 ms -> 33259 b/f, 40.978 dB, 7531 ms

Change-Id: I7591617ff113e91125ec32c9b853e257fbc41d90
2014-11-25 09:42:08 -08:00
Jingning Han
30104207fd Merge "Rework forward txfm/quantization skip system in RTC coding mode" 2014-11-25 09:33:57 -08:00
Jingning Han
6912c44135 Merge "Remove redundant intra mode penalty from vp9_pick_inter_mode" 2014-11-24 22:13:44 -08:00
Yunqing Wang
edbd61e136 vp9_ethread: modify VP9_COMP structure
This patch modified struct VP9_COMP. Created a struct ThreadData
to include data that need to be copied for each thread. In
multiple thread case, one thread processes one tile. all threads
share one copy of VP9_COMP,
(refer to VP9_COMP *cpi in the code)
but each thread has its own copy of ThreadData,
(refer to ThreadData *td in the code).
Therefore, within the scope of encode_tiles(), both cpi and td
need to be passed as function parameters.

In single thread case, the FRAME_COUNTS pointer in ThreadData
points to "counts" in VP9_COMMON.

Change-Id: Ib37908b2d8e2c0f4f9c18f38017df5ce60e8b13e
2014-11-24 17:57:38 -08:00
Jingning Han
25be81e2dd Remove redundant intra mode penalty from vp9_pick_inter_mode
The intra mode penalty is covered by intra_cost_penalty. This
commit removes the other intra cost threshold, provided that the
constant 50 is negligible in normal rate-distortion cost.

Change-Id: I9b8b7483c43b9a41741622e7057def1f7d51bb72
2014-11-24 14:55:59 -08:00
Jingning Han
e6fb9c0b0b Merge "Key frame non-RD mode decision process" 2014-11-24 13:21:56 -08:00
Debargha Mukherjee
e9d9f1adab Merge "Refactored idct routines and headers" 2014-11-24 12:47:03 -08:00
John Stark
71379b87df Changes to assembler for NASM on mac.
fixes non-Apple nasm part of issue #755

Change-Id: I11955d270c4ee55e3c00e99f568de01b95e7ea9a
2014-11-24 12:00:50 -08:00
Peter de Rivaz
3a8c43a479 Refactored idct routines and headers
This change is made in preparation for a
subsequent patch which adds acceleration
for the highbitdepth transform functions.

The highbitdepth transform functions attempt
to use 16/32bit sse instructions where possible,
but fallback to using the C implementations if
potential overflow is detected.  For this reason
the dct routines are made global so they can be
called from the acceleration functions in the
subsequent patch.

Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665
(cherry picked from commit 454342d4e7)
2014-11-24 09:57:40 -08:00
Jingning Han
2fbdfd2c66 Key frame non-RD mode decision process
This commit makes a non-RD coding mode decision process for key
frame coding. It can be optionally turned on in speed -6 and above.

Change-Id: I0847258b392877a0210b4768bef88ebc9ad009b5
2014-11-24 09:04:28 -08:00
Marco
681d5e9024 Merge "Only allow for cyclic refresh (aq=3 mode) for base layer." 2014-11-24 07:46:36 -08:00
Paul Wilkins
2232c3e34b Merge "Fix some minor nits." 2014-11-21 17:39:43 -08:00
Debargha Mukherjee
02355a4abf Merge "Added highbitdepth sse2 acceleration for quantize" 2014-11-21 16:08:47 -08:00
Paul Wilkins
d28e9ed452 Merge changes Ie077edd0,Id31a74fc
* changes:
  Remove rate component adjustment for AQ1
  Switch AQ1 segment basis from q ratio to rate ratio.
2014-11-21 15:38:32 -08:00
Paul Wilkins
771259fe10 Merge "Add adaptive midpoint for AQ1." 2014-11-21 15:26:18 -08:00
Paul Wilkins
6dbf83d082 Merge "Add variance restriction to AQ2." 2014-11-21 15:25:43 -08:00
Marco
53c3f2ca4d Only allow for cyclic refresh (aq=3 mode) for base layer.
Condition existed for temporal case, added it for spatial as well.
Issue: https://code.google.com/p/webm/issues/detail?id=878.

Change-Id: I38339207f9a94924f5568a081eabe64f867a686d
2014-11-21 14:47:32 -08:00
Paul Wilkins
ea494c0e76 Fix some minor nits.
Change-Id: Ib8810d431fa20a2c78e0caaa28eb2c99903e60fb
2014-11-21 14:13:59 -08:00
Paul Wilkins
a867bb538b Merge "Further AQ1 clean up." 2014-11-21 12:58:03 -08:00
Jingning Han
7428cebe4f Rework forward txfm/quantization skip system in RTC coding mode
This commit allows more aggressive decision to skip forward
transform and quantization for luma component in RTC coding mode.
The chroma components remains going through the normal coding
routine, since they are not included in the non-RD mode search
process.

It reduces the runtime cost by 2% - 10%. In speed -6,
vidyo1 1000 kbps
16576 b/f, 40.281 dB, 8402 ms -> 16576 b/f, 40.323 dB, 7764 ms

nik720p 1000 kbps
33337 b/f, 38.622 dB, 7473 ms -> 33299 b/f, 38.660 dB, 7314 ms

dark720p 1000 kbps
33330 b/f, 39.785 dB, 13505 ms -> 33325 b/f, 39.714 dB, 13105 ms

The compression performance of speed -6 is improved by 0.44% in
PSNR and 1.31% in SSIM.

Change-Id: Iae9e3738de6255babea734e5897f29118bebc6d7
2014-11-21 12:46:40 -08:00
Paul Wilkins
b87c51ce55 Merge "Initial AQ1 restructuring." 2014-11-21 12:10:03 -08:00
Paul Wilkins
f5209d7e01 Remove rate component adjustment for AQ1
In AQ1 a rate adjustment was applied for blocks coded with a
deltaq. This tends to skew the partition selection and cause
rate overshoot.

For example, consider a 64x64 super block where some but not all
sub blocks are in a low q segment and some are in a high q segment.
The choice of Q when considering large partition and transform sizes
is defined by the lowest sub block segment id (currently this implies the
lowest Q). If some parts of the larger partition are very hard this will
cause a high rate component.

The correct behavior here is for the rd code to discard the large partition
choice and break down to sub blocks where some have low and some
have high Q.  However the rate correction factor above mask the high
cost of coding at a larger partition size.

Change-Id: Ie077edd0b1b43c094898f481df772ea280b35960
2014-11-21 08:51:58 -08:00
Paul Wilkins
1663eff7f8 Switch AQ1 segment basis from q ratio to rate ratio.
In defining the Q deltas for segments in AQ1 use a rate
ratio rather than a q ratio.

Change-Id: Id31a74fcf2b7e55437e42a51c21b3cbcb57028d4
2014-11-21 08:50:57 -08:00
Paul Wilkins
fc47c5d653 Add adaptive midpoint for AQ1.
Make the midpoint variance used in AQ mode 1 segmentation
depend on the overall complexity of the frame in two pass.

Change-Id: I452814ec57f7a32352e41bb250e78066abe952dd
2014-11-20 18:37:34 -08:00
Alex Converse
bc1b3d8412 Allow DC/H/V/TM on screen content.
6.3% better compression
less than 1% compression time increase

Change-Id: Ie83c059436e54c09de9e7c87e06e0a6d40dc38fe
2014-11-20 18:04:57 -08:00
Alex Converse
722e9d611b Drop special inter mode selection for screen content.
Better mode selection was implemented for all content.

Change-Id: I479778ed21d3968892f4dce396c83733583f4f23
2014-11-20 18:04:57 -08:00
Yunqing Wang
72522dbc86 Merge "vp9_ethread: move filter_cache out of RD_OPT struct" 2014-11-20 16:51:31 -08:00
Paul Wilkins
d031237999 Add variance restriction to AQ2.
Add an additional restriction to bit/complexity based
segmentation based on spatial variance.

Only lower Q when both the number of bits spent
in the initial encoding pass and the spatial complexity are
below a threshold. This will prevent the low Q segments
being used just because there is a surfeit of bits.

Small metrics gains especially opsnr.
derf ~0.2% std-hd ~0.3%

Change-Id: I6a8496d466d673f9b0e2b2ca6304ea7b6d8e1cce
2014-11-20 16:23:35 -08:00
Paul Wilkins
3d1e8c9a85 Further AQ1 clean up.
Further patch to restructure AQ mode 1.

Change-Id: I566452a033d047a49a40441a7be24690ea69412d
2014-11-20 16:00:51 -08:00
Paul Wilkins
6a760d483d Initial AQ1 restructuring.
This is the first of a series of patches to restructure and
improve AQ mode 1 (variance based AQ).

Change-Id: Idcf693131a3ea2459dcfd957a54a65b971fa4a2a
2014-11-20 15:50:15 -08:00
Paul Wilkins
b74eeb8675 Merge "Fix bug in calculating number of mbs with scaling." 2014-11-20 15:45:41 -08:00
Yunqing Wang
54ba65a63e Merge "vp9_ethread: move max/min partition size to mb struct" 2014-11-20 14:00:37 -08:00
Yunqing Wang
379334c2d8 vp9_ethread: move filter_cache out of RD_OPT struct
Similar to mask_filter, the filter_cache in RD_OPT struct can be
moved out, and declared as a local variable since it is only
used in pick_inter_mode functions.

Change-Id: I412b99cca82bade07ac912064ec03dd1de6b2c17
2014-11-20 13:44:16 -08:00
Yunqing Wang
0b71fdbf80 Merge "vp9_ethread: change mask_filter to a local variable" 2014-11-20 13:02:55 -08:00
Yunqing Wang
bdaa3eaf43 Merge "Revert "vp9_ethread: include a pointer to mb in VP9_COMP"" 2014-11-20 12:27:34 -08:00
Paul Wilkins
5e5da2e963 Fix bug in calculating number of mbs with scaling.
Correct calculation of number of mbs in two pass code when
frame resizing is enabled. Always use initial number of mbs if
scaling is enabled, as this is what was used in the first pass.

Change-Id: I49a4280ab5a8b1000efcc157a449a081cbb6d410
2014-11-20 12:24:43 -08:00
Yunqing Wang
b0efddd8e6 vp9_ethread: change mask_filter to a local variable
The mask_filter in RD_OPT struct is used to record rd result in
filter decision. It is only used in pick_inter_mode functions,
and is removed from the struct and declared as a local variable.

Change-Id: I3c95c8632ba7241591ce00ef2ef5677b5e297d7b
2014-11-20 09:41:49 -08:00
Yunqing Wang
ad7586a9e1 vp9_ethread: move max/min partition size to mb struct
The max_partition_size and max_partition_size are set at the
beginning while setting speed features, and then adjusted at
SB level. Moving them to mb struct ensures there is a local
copy for each thread.

Change-Id: I7dd08dc918d9f772fcd718bbd6533e0787720ad4
2014-11-20 09:24:50 -08:00
Yunqing Wang
70c9d2983b Revert "vp9_ethread: include a pointer to mb in VP9_COMP"
This reverts commit 6906d218dd.

Another way will be used to handle mb struct.

Change-Id: Ic1111a46b2b1ee00f8f9e3fcd4cf3eb6030b2dc4
2014-11-20 08:31:12 -08:00
Peter de Rivaz
a7b2d09f36 Added highbitdepth sse2 acceleration for quantize
Also includes block error.

(This patch is mostly cherry picked from
commit db7192e0b0)

Change-Id: Idef18f90b111a0d0c9546543d3347e551908fd78
2014-11-19 23:55:19 -08:00
Jingning Han
c42715b721 Enable ssse3 version of vp9_fdct8x8_quant
It improves the speed performance of vp9_fdct8x8_quant_sse2 by
about 5%.

Change-Id: I74b093ba4d81df64caf71ac7693f3d917f673097
2014-11-19 22:14:19 -08:00
Yaowu Xu
21db24efcb Add a reset to rc tracking for dropped frames
VP9/DatarateTestVP9Large.ChangingDropFrameThresh/[34] fails post the
merge of commit#ffa06b37. This commit adds reset of rc tracking info
when frame is dropped, and fixes the causes of the bad interaction
between the tests and the previous commit.

Change-Id: I848acfd9fcb336359662274325190f94aac76eae
2014-11-19 15:32:11 -08:00
Jingning Han
bf63652d34 Merge "Combine fdct8x8 and quantization process" 2014-11-19 11:17:44 -08:00
Jingning Han
ce77a7bcb0 Merge "Add sse2 version for vp9_quantize_fp" 2014-11-19 11:17:36 -08:00
Jingning Han
c6908fd5f7 Combine fdct8x8 and quantization process
This commit reworks the forward transform and quantization process
for 8x8 block coding. It combines the two operations in a single
function to save a store/load stage of the original transform
coefficients. Overall the speed -6 is slightly faster (around 1%
range). The compression performance of speed -6 is improved by
3.4%.

Change-Id: Id6628daef123f3e4649248735ec2ad7423629387
2014-11-18 18:10:56 -08:00
Yaowu Xu
587f0cd39d Merge "Prevent severe rate control errors in CBR mode" 2014-11-18 16:56:06 -08:00
Marco
3c715863da Merge "Modify active_worst_quality setting for one pass CBR." 2014-11-18 11:22:18 -08:00
Yaowu Xu
550707b9e1 Merge "change to call vp9_refining_search_sad() directly" 2014-11-18 09:14:02 -08:00
Yaowu Xu
ffa06b3708 Prevent severe rate control errors in CBR mode
In rare cases, the interaction between rate correction factor and Q
choices may cause severe oscillating frame sizes that are way off
target bandwidth. This commit adds tracking of rate control results
for last two frames, and use the information to prevent oscillating
Q choices.

Change-Id: I9a6d125a15652b9bcac0e1fec6d7a1aedc4ed97e
2014-11-18 09:05:57 -08:00
Jingning Han
2d3cc8ea2b Add sse2 version for vp9_quantize_fp
vp9_quantize_fp is the quantization process used by rtc coding
mode. This commit adds a sse2 implementation of it. The
implementation is modified based on vp9_quantize_b_sse2. No speed
difference from ssse3 version.

Change-Id: I24949c5b27df160b4f35117d28858d269454e64a
2014-11-18 09:01:41 -08:00
Jingning Han
13a999de8e Merge "Add empty pointer check to pred buffering in rtc coding mode" 2014-11-17 17:40:54 -08:00
Marco
b660f723b4 Modify active_worst_quality setting for one pass CBR.
Current setting had active_worst_quality set too high (close to worst_quality)
for first frame(s) following first key frame. This changes that to be somewhat
more aggressive in allowing active_worst_quality to be lower following key frame.

Also remove the 4/5 reduction in active_worst for key frame as
this should be set by the user qp_max setting.

Change-Id: I0530b3ddcc85c00e3eb7568de1b14a31206c4a4c
2014-11-17 11:46:49 -08:00
Yaowu Xu
1687c47bfd change to call vp9_refining_search_sad() directly
The function pointer in compressor instance does not change, so this
commit changes to call the function directly.

Change-Id: I9c9c460e3475711c384b74c9842f0b4f3d037cc5
2014-11-17 11:30:17 -08:00
Jingning Han
a62c87fb04 Add empty pointer check to pred buffering in rtc coding mode
This commit adds a check condition to the prediction buffering
operation used in the rtc coding mode. This resolves a unit test
warning in example/vpx_tsvc_encoder_vp9_mode_7.

Change-Id: I9fd50d5956948b73b53bd8fc5a16ee66aff61995
2014-11-17 11:24:07 -08:00
Yunqing Wang
4539c496bc Merge "Code cleanup: remove unused members in RD_OPT" 2014-11-17 09:10:28 -08:00
Yunqing Wang
3f4a93baf2 Merge "vp9_ethread: combine encoder counts in separate struct" 2014-11-17 08:57:38 -08:00
Debargha Mukherjee
c3a9056df4 Merge "Added sse2 acceleration for highbitdepth variance" 2014-11-14 21:11:27 -08:00
Yunqing Wang
87ae6d73d4 Code cleanup: remove unused members in RD_OPT
These 2 members in RD_OPT were moved to TileDataEnc struct
already, and therefore were removed here.

Change-Id: I22fee3b67f96e473a58e194a7edc76dbd48bfa04
2014-11-14 16:33:25 -08:00
Yunqing Wang
d0b547c676 vp9_ethread: combine encoder counts in separate struct
Several frame counters in encoder are updated at SB level. Combine
those counters and put them in a separate struct, which allows us
to allocate one copy for each thread.

Change-Id: I00366296a13c0ada4d8fa12f5e07728388b6cab7
2014-11-14 16:09:22 -08:00
Peter de Rivaz
48032bfcdb Added sse2 acceleration for highbitdepth variance
Change-Id: I446bdf3a405e4e9d2aa633d6281d66ea0cdfd79f
(cherry picked from commit d7422b2b1e)
(cherry picked from commit 6d741e4d76)
2014-11-14 15:18:53 -08:00
Yunqing Wang
6906d218dd vp9_ethread: include a pointer to mb in VP9_COMP
Modified VP9_COMP struct to include MACROBLOCK *mb. This change
makes it feasible in multi-thread case to allocate a mb for each
thread.

Change-Id: I624d6d1aa9c132362200753e5d90b581b1738d6e
2014-11-14 12:31:06 -08:00
Yunqing Wang
807885b5e0 Merge "vp9_ethread: modify the cyclic refresh struct" 2014-11-13 18:35:01 -08:00
Yaowu Xu
e4e85ad4a8 Merge "adapt the adjustment limit for rate correction factor in RTC mode" 2014-11-13 15:50:30 -08:00
Yunqing Wang
8ee605f188 vp9_ethread: modify the cyclic refresh struct
Two members in struct CYCLIC_REFRESH
  int64_t projected_rate_sb;
  int64_t projected_dist_sb;
are updated at the superblock level, which makes them shared data
in the multi-thread situation, and requires extra work to handle
them. However, those values are updated and used immediately, and
therefore can be removed. This patch cleaned up the code and
removed the two members.

Change-Id: I2c6ee4552bf49fb63ce590cdb47f9723974fffb1
2014-11-13 15:05:46 -08:00
Adrian Grange
35de9db312 Merge "Prepare for dynamic frame resizing in the recode loop" 2014-11-13 15:01:49 -08:00
Paul Wilkins
20517e0627 Merge "Fix 32 bit build emms problem." 2014-11-13 15:00:41 -08:00
Jingning Han
aeff1f7ec2 Merge "Use reconstructed pixels for intra prediction" 2014-11-13 13:59:02 -08:00
Jingning Han
6efafda738 Merge "Refactor nonrd_use_partition coding process" 2014-11-13 13:58:21 -08:00
Adrian Grange
0d085ebc0a Prepare for dynamic frame resizing in the recode loop
Prepare for the introduction of frame-size change
logic into the recode loop.

Separated the speed dependent features into
separate static and dynamic parts, the latter being
those features that are dependent on the frame size.

Change-Id: Ia693e28c5cf069a1a7bf12e49ecf83e440e1d313
2014-11-13 11:41:20 -08:00
Paul Wilkins
b9c4f9a7db Fix 32 bit build emms problem.
Add extra vp9_clear_system_state() calls to fix
double / mmx issue introduced into first pass
code for 32 bit builds.

Change-Id: I84cd2986b80d83650a091ab25c43755efeb82e03
2014-11-13 11:33:55 -08:00
Yaowu Xu
9f79259e54 adapt the adjustment limit for rate correction factor in RTC mode
Rate correction factor is used to correct the estimated rate for any
given quantizer, and feeds into rate control for quantizer selection.
We make use of the actual bits used to calculate this rate correction
factor with an adjustment limit to prevent over-adjustment.

This commit adapts the adjustment limit to the difference between the
estimated bits and the actual bits, allows the adjustment limit to vary
between 0.125 (when estimate is close to actual) and 0.625 (when there
is >10X factor off between estimated and actual bits). By doing this,
the commit appears to have largely corrected two observed issues:
1. Adjustment is too slow when the actual bits used is way off from
estimate due to the small adjustment limit.
2. Extreme oscillating quantizer choices due to the feedback loop.

Change-Id: I4ee148d2c9d26d173b6c48011313ddb07ce2d7d6
2014-11-13 11:26:52 -08:00
Deb Mukherjee
7621a19aa5 Merge "Vidyo: Turn off keyframes in higher spatial layers" 2014-11-13 03:27:11 -08:00
Debargha Mukherjee
002172efd6 Merge "Added highbitdepth sse2 SAD acceleration and tests" 2014-11-12 21:20:34 -08:00
Peter de Rivaz
7eee487c00 Added highbitdepth sse2 SAD acceleration and tests
Change-Id: I1a74a1b032b198793ef9cc526327987f7799125f
(cherry picked from commit b1a6f6b9cb)
2014-11-12 14:25:45 -08:00
Yaowu Xu
8e112d9586 Merge "Use normal rate_correction_factor for gf in CBR mode" 2014-11-12 08:00:26 -08:00
Deb Mukherjee
48a7627316 Vidyo: Turn off keyframes in higher spatial layers
Change-Id: Icdd5e71cd6a2b59bc4b3b972af9e4d4a36821792
2014-11-11 16:09:07 -08:00
Jingning Han
e717d22b63 Use reconstructed pixels for intra prediction
This commit makes the speed -6 and above use the reconstructed
boundary pixels for precise intra prediction. This allows more
intra prediction modes to be tested in the non-RD coding process.

Enabling horizontal and vertical intra prediction modes can
improve the speed -6 compression performance for rtc set
by 0.331%.

Change-Id: I3a99f9d12c6af54de2bdbf28c76eab8e0905f744
2014-11-11 10:04:43 -08:00
Paul Wilkins
4999505472 Merge "AQ1 - remove first pass weights." 2014-11-11 09:17:33 -08:00
Yaowu Xu
f2b978e895 Use normal rate_correction_factor for gf in CBR mode
I0c5f010 changed to allow update golden reference buffer in CBR mode,
this commit changes the use of rate_correction_factor for those frames
to be aligned with the new usage. This commit attempts to solve two
issues:

a. Initialization of rate correction factor for Golden Frame
Prior to this patch, even the regular inter frame has been update
the rate correction factor based on content and encoding results,
the first golden frame would still use the ininitialized value
that can be way off.

b. Allowing rate correction factor update to be slightly faster
Prior to this patch, when the rate correction factor is off, the
update to the factor is too slow, the factor could not get close
to a semi-correct value even after many frames.

The commit helps all clips in psnr/ssim metric, but especially to
a few clip in RTC set that rate correction was way off. For example
thaloundeskmtgvga gained about .5dB for both overall/average psnr.

Change-Id: I0be5c41691be57891d824505348b64be87fa3545
2014-11-10 16:55:13 -08:00
Alex Converse
ce9ba97a9d Fix LAST SKIP when considering GOLDEN
Change-Id: I39d9f13fa34984ee9dad0c4f303ef672635f420e
2014-11-07 13:44:17 -08:00
Paul Wilkins
08d86bc904 Merge "Add intra complexity and brightness weight to first pass." 2014-11-07 09:22:12 -08:00
Paul Wilkins
31b6d7c1eb AQ1 - remove first pass weights.
Removed redundant weighting function tied for AQ1 from first
pass code.

Improvment in baseline AQ1 results:-
Derf  opsnr +0.142% SSIm +0.258%
YT  opsnr +0.173% SSIm +0.3%

Change-Id: I16ef91caf2d7f302cd5940cc5e2626d48ebcb212
2014-11-07 14:11:29 +00:00
Jingning Han
754b05a4de Refactor nonrd_use_partition coding process
This commit integrates the non-RD mode decision process and the
encoding process into a single recursion scheme.

Change-Id: I6a7e72a0b84d567554801ebbe01ec75d54c1f77d
2014-11-06 17:00:48 -08:00
Yunqing Wang
bf44117d5f Merge "Modify the frame context memory deallocation" 2014-11-06 13:08:57 -08:00
Jingning Han
417e754f56 Merge "Remove unused is_background function" 2014-11-06 12:03:15 -08:00
Jingning Han
e97f404e52 Merge "Rework cut-off decisions in cyclic refresh aq mode" 2014-11-06 12:03:07 -08:00
Yunqing Wang
1228433430 Modify the frame context memory deallocation
This patch was to fix the vpxdec fuzzing3 test failure. When an
error occurs, setjmp() is invoked, which calls the decoder
removing routine. In multiple thread situation, other threads
could try to access the frame context memory that is already
deallocated, thus causing a segfault.

An invalid unit test was added for this issue.

Change-Id: Ida7442154f3d89759483f0f4fe0324041fffb952
2014-11-06 11:34:19 -08:00
Paul Wilkins
5e935126a6 Add intra complexity and brightness weight to first pass.
The aim of this patch is to apply a positive weighting to
frames that have a significant number of blocks that are
of low spatial complexity and are dark. The rationale behind
this is that artifacts tend to be more visible in such frames.

In this patch the weight is only applied in regard to the distribution
of bits between frames. Hence if all the frames share similar
characteristics (as is the case for most of our short test clips) there
will be little or no net effect.

However, the effect can be seen on some longer form test content.

For example Tears of steel baseline test:
2323.09 Kbit/s opsnr 39.915 ssim 74.729
With this patch:-
2213.34 Kbit/s opsnr 39.963 ssim 74.808
(Sligtly better metrics and about 5% smaller)

The weighting may well need some further tuning along side changes
to the aq modes.

Change-Id: Ieced379bca03938166ab87b2b97f55d94948904c
2014-11-06 10:45:00 +00:00
Jingning Han
10da059b52 Remove unused is_background function
Change-Id: Ia540eac5f066ae95280c2f898370eddf0110c279
2014-11-05 21:19:23 -08:00
Jingning Han
caaf63b2c4 Rework cut-off decisions in cyclic refresh aq mode
This commit removes the cyclic aq mode dependency on
in_static_area and reworks the corresponding cut-off thresholds.
It improves the compression performance of speed -5 by 1.47% in
PSNR and 2.07% in SSIM, and the compression performance of speed
-6 by 3.10% in PSNR and 5.25% in SSIM. Speed wise, about 1% faster
in both settings at high bit-rates.

Change-Id: I1ffc775afdc047964448d9dff5751491ba4ff4a9
2014-11-05 21:17:09 -08:00
hkuang
e8860693ea Merge "Totally remove prev_mi in VP9 decoder." 2014-11-05 17:48:47 -08:00
hkuang
4cc7c5a17f Totally remove prev_mi in VP9 decoder.
This will save the memory and improve the decode speed due to
removing unnecessary memset of big prev_mi array for
all the key frames.

Decoding a all key frames 1080p video shows speed improve around 2%.

Change-Id: I6284a445c1291056e3c15135c3c20d502f791c10
2014-11-05 16:14:30 -08:00
Yaowu Xu
2c4fee17bc Fix visual studio 2013 compiler warnings
For configured with --enable-vp9-highbitdepth

Change-Id: I2b181519d7192f8d7a241ad5760c3578255f24e6
2014-11-05 13:47:28 -08:00
Hui Su
2c95a3f374 Merge "Simplify interface of write_selected_tx_size and read_tx_size" 2014-11-05 13:33:09 -08:00
Jingning Han
a7889cac9a Merge "Skip ref frame mode search conditioned on predicted mv residuals" 2014-11-05 12:04:10 -08:00
Hui Su
709c634b84 Simplify interface of write_selected_tx_size and read_tx_size
Change-Id: Ia2b2a895deefaaf7b34bf26df86add56dbab082c
2014-11-04 16:11:50 -08:00
Minghai Shang
9f9e30d7bf Merge "[spatial svc] Make spatial svc working for one pass rate control" 2014-11-04 15:57:16 -08:00
Minghai Shang
86c36a504d [spatial svc] Make spatial svc working for one pass rate control
Change-Id: Ibd9114485c3d747f9d148f64f706bf873ea473ac
2014-11-04 11:46:48 -08:00
Jingning Han
1e753387c8 Merge "Refactor sub-pixel motion search unit" 2014-11-04 09:11:15 -08:00
Jingning Han
1434f7695b Skip ref frame mode search conditioned on predicted mv residuals
This commit makes the RTC coding mode to conditionally skip the
reference frame mode search, when the predicted motion vector of
the current reference frame gives more than two times sum of
absolute difference compared to that of other reference frames.

It reduces the runtim by 1% - 4% for speed -5 and -6. The average
compression performance is improved by about 0.1% in both settings.

It is of particular benefit to light change scenarios. The
compression performance of test clip mmmovingvga.y4m is improved by
6.39% and 15.69% at high bit rates for speed -5 and -6, respectively.

Speed -5
vidyo1 16555 b/f, 40.818 dB, 12422 ms ->
       16552 b/f, 40.804 dB, 12100 ms

nik    33211 b/f, 39.138 dB, 11341 ms ->
       33228 b/f, 39.139 dB, 11023 ms

mmmoving 33263 b/f, 40.935 dB, 13508 ms ->
         33256 b/f, 41.068 dB, 12861 ms

Speed -6
vidyo1 16541 b/f, 40.227 dB, 8437 ms ->
       16540 b/f, 40.220 dB, 8216 ms

nik    33272 b/f, 38.399 dB, 7610 ms ->
       33267 b/f, 38.414 dB, 7490 ms

mmmoving 33255 b/f, 40.555 dB, 7523 ms ->
         33257 b/f, 40.975 dB, 7493 ms

Change-Id: Id2aef76ef74a3cba5e9a82a83b792144948c6a91
2014-11-04 09:10:19 -08:00
Marco
343acaa8f2 Merge "Allow disable of refresh golden for more than 1 layer encoding." 2014-11-03 14:38:05 -08:00
Jingning Han
e083f6bd08 Refactor sub-pixel motion search unit
This commit unfolds the legacy macro definitions used in the
sub-pixel motion search and refactors the operational flow for
later optimizations.

Change-Id: I3e3f770cad961d03d1a6eb0b2a0186cc77eaf2b8
2014-11-03 09:02:57 -08:00
Jingning Han
0ca5908ff6 Merge "Fix the THR_MODES array used in vp9_pick_inter_mode" 2014-11-03 08:46:42 -08:00
Yaowu Xu
2fe893c94f Merge "Fix speed 7 and speed 12 for rt" 2014-11-03 08:02:58 -08:00
Marco
d6b688375f Allow disable of refresh golden for more than 1 layer encoding.
The current logic was allowing for disabling golden refresh only
for two pass svc encoding. This change disables it as long as
more than 1 layer encoding is used (for example temporal layers under 1pass CBR).

Change-Id: I4dc5204a7ad365c821ec7963e93b59da82e1826b
2014-11-02 22:24:00 -08:00
Jingning Han
7e119e2946 Fix the THR_MODES array used in vp9_pick_inter_mode
Fix the alignment of entries fo intra prediction modes.

Change-Id: Ie32ad87cf90694efd591a4b1cc29c916c4cd56f7
2014-11-02 12:25:57 -08:00
Yaowu Xu
0271ff7775 Fix speed 7 and speed 12 for rt
A recent change has introduced big quality drops for speed 7 and 12
for --rt mode. The change reverted the big drop and improved quality
by 9.5% for speed 7 and 13.4% for speed 12.

Change-Id: I07b82e3bb6002a73af486a083458c88877bdad01
2014-10-31 17:29:02 -07:00
hkuang
55577431ae Bind motion vectors with frame buffer structure.
This will save a lot of memory for decoder due to removing of prev_mi,
but prev_mi is still needed in encoder. So this will increase a little bit
memory for encoder.

Change-Id: I24b2f1a423ebffa55a9bd2fcee1077dac995b2ed
2014-10-31 17:01:08 -07:00
Jingning Han
1c84e73ebd Merge "Fix mode index use case in vp9_pick_inter_mode" 2014-10-31 08:55:40 -07:00
Jingning Han
61966b1d10 Merge "Refactor vp9_update_rd_thresh_fact" 2014-10-31 08:55:28 -07:00
Jingning Han
1cffea9fb7 Merge "Rework pred pixel buffer system in non-RD coding mode" 2014-10-31 08:55:24 -07:00
Jingning Han
64348d9f8d Fix mode index use case in vp9_pick_inter_mode
This improves coding performance of speed -5 and -6 by 0.6%,
respectively.

Change-Id: Ic5a7746a88c73285f0b14333d35dc16b02152c25
2014-10-30 11:10:06 -07:00
Jingning Han
f7b46d8c5e Refactor vp9_update_rd_thresh_fact
Reduce the scope of function parameters.

Change-Id: Ifef2cfb559908a97498ffdbd6ea53da1cd45a73c
2014-10-30 11:09:40 -07:00
Jingning Han
7bea8c59f9 Rework pred pixel buffer system in non-RD coding mode
This commit makes the inter prediction buffer system to support
hybrid partition search. It reduces the runtime of speed -5 by
about 3%. No compression performance change.

vidyo1 720p 1000 kbps
11831 ms -> 11497 ms

nik 720p 1000 kbps
10919 ms -> 10645 ms

Change-Id: I5b2da747c6395c253cd074d3907f5402e1840c36
2014-10-30 11:08:35 -07:00
Hui Su
d478d2df37 Merge "Move the definition of switchable filter numbers into enum INTERP_FILTER; Modify the macro ADD_MV_REF_LIST and IF_DIFF_REF_FRAME_ADD_MV." 2014-10-30 11:05:04 -07:00
Hui Su
66906da066 Merge "Combine vp9_encode_block_intra and encode_block_intra" 2014-10-30 11:02:31 -07:00
Yunqing Wang
aed48c786a Remove unused speed feature
Partition_check was unused and removed.

Change-Id: I15ec9162d86dc61f04c09229c498629878ed7155
2014-10-29 17:05:04 -07:00
Jingning Han
afa31ab9b8 Merge "Enable mode search threshold update in non-RD coding mode" 2014-10-29 12:42:22 -07:00
Jingning Han
9349a28e80 Enable mode search threshold update in non-RD coding mode
Adaptively adjust the mode thresholds after each mode search round
to skip checking less likely selected modes. Local tests indicate
5% - 10% speed-up in speed -5 and -6. Average coding performance
loss is -1.055%.

speed -5
vidyo1 720p 1000 kbps
16533 b/f, 40.851 dB, 12607 ms -> 16556 b/f, 40.796 dB, 11831 ms

nik 720p 1000 kbps
33229 b/f, 39.127 dB, 11468 ms -> 33235 b/f, 39.131 dB, 10919 ms

speed -6
vidyo1 720p 1000 kbps
16549 b/f, 40.268 dB, 10138 ms -> 16538 b/f, 40.212 dB, 8456 ms

nik 720p 1000 kbps
33271 b/f, 38.433 dB,  7886 ms -> 33279 b/f, 38.416 dB, 7843 ms

Change-Id: I2c2963f1ce4ed9c1cf233b5b2c880b682e1c1e8b
2014-10-29 10:55:34 -07:00
Adrian Grange
4074099ed8 Simplify vp9_set_rd_speed_thresholds_sub8x8
Change-Id: I4bf0f9a38697f5aea564a47afd7f02bb8b2888b6
2014-10-29 09:09:46 -07:00
Hui Su
0928da3b6e Combine vp9_encode_block_intra and encode_block_intra
Change-Id: I79091fb677b64892ecca2fb466fde14602d8cdfc
2014-10-28 18:57:01 -07:00
Jingning Han
982dab6050 Merge "Use zero motion vector in choose_partitioning" 2014-10-28 12:00:13 -07:00
JackyChen
50e5c30536 Merge "vp9_denoiser_sse2: refactor the code." 2014-10-28 11:06:05 -07:00
Yaowu Xu
7d7b43b9af Merge "Allow update of golden refernce buffer in CBR mode" 2014-10-28 10:48:02 -07:00
JackyChen
99a8dac4de vp9_denoiser_sse2: refactor the code.
Combined vp9_denoiser_8xM_sse2 and vp9_denoiser_4xM_sse2 into one
function vp9_denoiser_NxM_sse2_small and passed the bitexact testing.
Changed the name of the function vp9_denoiser_64_32_16xM_sse2 to
vp9_denoiser_NxM_sse2_big.

Change-Id: Ib22478df585994dd347ebae04202c0b701e7f451
2014-10-28 09:36:58 -07:00
Yaowu Xu
2a506e33b4 Merge "Add a new control of golden frame boost in CBR mode" 2014-10-28 09:32:58 -07:00
Yaowu Xu
e5cd51880e Allow update of golden refernce buffer in CBR mode
This commit changes to allow the usage of golden reference frame in
VP9 CBR mode to improve quality. VP9 supports potentially up to 8
reference buffers, it has reference buffers available for this
purpose. This was not possible in VP8 as golden and alt-ref buffers
were used for temporal scalability purpose in CBR mode in WebRTC.

For frames that update golden frame, there can be a quality boost.
The amount of allowed bitrate boost can be controlled via parameter
rc_max_inter_bitrate_pct. The inital value of the boost ratior is
currently based on over_shoot_pct. Further experiments will work
out the adaption of this boost value.

Change-Id: I0c5f010c8fd8b7b598f69779c1b30e5b2ac30a4d
2014-10-28 09:31:10 -07:00
Paul Wilkins
422d7bc918 Relax maximum Q for extreme overshoot.
Added code to relax the active maximum Q in response
to extreme local overshoot to reduce bandwidth peaks.

The impact is small in metrics terms, but it this helps reduce
bandwidth spikes and overall overshoot in a number of
clips in our tests sets (especially the YT test set).

In particular this should help prevent very big spikes where a clip
is mainly easy but has a short hard section. In such a case a choice
of maximum Q for the clip as a whole may allow us to hit the overall
target rate but give some extreme spikes. The chunked encoding in YT
mitigates this problem but it can show up where a longer clip is
coded as a single chunk.

Change-Id: I213d09950ccb8489d10adf00fda1e53235b39203
2014-10-28 13:03:06 +00:00
Jingning Han
07436abb86 Use zero motion vector in choose_partitioning
The zero motion vector was effectively used in the subsampled pixel
based variance calculation. This commit makes it directly use zero
mv to generate prediction.

Change-Id: Ica83dc843e9f8da2f89c3ef451e50f16214c0def
2014-10-27 19:38:43 -07:00
Jingning Han
d56b3eb0cf Refactor encoder tile data structure
Make the common tile info as one element in the encoder tile data
struct.

Change-Id: I8c474b4ba67ee3e2c86ab164f353ff71ea9992be
2014-10-27 19:37:13 -07:00
Yaowu Xu
03a60b78db Add a new control of golden frame boost in CBR mode
0 means that golden boost is off, and uses average frame target rate,
a non-zero number means the percentage of boost over average frame
bitrate is given initially to golden frames in CBR mode.

Change-Id: If4334fe2cc424b65ae0cce27f71b5561bf1e577d
2014-10-27 13:55:18 -07:00
Jingning Han
192010d218 Refactor rtc coding mode to support tile encoding
Use per tile threshold in the prediction mode search process.

Change-Id: I6c74ee5a3b069bb4281002dfe51310911a0756c0
2014-10-27 09:53:46 -07:00
Yaowu Xu
aa2af3ff6e Merge "Add a new control of max bitrate for inter frame" 2014-10-27 08:11:54 -07:00
Jingning Han
ac53c41e64 Merge "Tile based adaptive mode search in RD loop" 2014-10-24 18:44:52 -07:00
Yaowu Xu
636099f7b6 Add a new control of max bitrate for inter frame
Change-Id: I205de3611622cff7f751ea8baf9f82784581730a
2014-10-24 10:19:28 -07:00
Jingning Han
eee201c221 Tile based adaptive mode search in RD loop
Make the spatially adaptive mode search in rate-distortion
optimization loop inter tile independent. Experiments suggest that
this does not significantly change the coding staticstics.

Single tile, speed 3:
pedestrian_area 1080p 1500 kbps
59192 b/f, 40.611 dB, 101689 ms

blue_sky 1080p 1500 kbps
58505 b/f, 36.347 dB, 62458 ms

mobile_cal 720p 1000 kbps
13335 b/f, 35.646 dB, 45655 ms

as compared to 4 column tiles, speed 3:
pedestrian_area 1080p 1500 kbps
59329 b/f, 40.597 dB, 101917 ms

blue_sky 1080p 1500 kbps
58712 b/f, 36.320 dB, 62693 ms

mobile_cal 720p 1000 kbps
13191 b/f, 35.485 dB, 45319 ms

Change-Id: I35c6e1e0a859fece8f4145dec28623cbc6a12325
2014-10-24 10:00:27 -07:00
Paul Wilkins
60d192db04 Merge "Enable dual arf with constant q." 2014-10-24 05:51:25 -07:00
Paul Wilkins
3758650c98 Merge "Move frame re-sizing into the recode loop" 2014-10-24 05:50:39 -07:00
Adrian Grange
65753eeb8a Move frame re-sizing into the recode loop
The point at which frames are scaled to their
coded dimensions is moved into the re-code loop.

This is in preparation for a further patch that
will add logic into the re-code loop to reduce
the coded frame size if the encoder is struggling
to hit the target data rate at the native frame
size.

Change-Id: Ie4131f5ec6fb93148879f6ce96123296442bf2d1
2014-10-23 16:20:57 -07:00
Yaowu Xu
86777f2e1e Merge "Move filter_ref initialization" 2014-10-23 11:20:22 -07:00
Yaowu Xu
065809d286 Move filter_ref initialization
To outside the loop to avoid repeating the operations.

Change-Id: I66c1986e98ce0d7594caad3d3b45de655b299bff
2014-10-23 08:27:25 -07:00
Paul Wilkins
8fc3ab774f Enable dual arf with constant q.
Add second level arf Q adjustment when using dual arfs
in constant Q mode.

Previously in constant Q mode enabling dual arf hurt by ~5%
but with this change the average benefit is ~1-1.5% with some
mid range data points up ~10%.

Note however that it still hurts on some clips including
some very low motion show content.

Change-Id: I5b7789a2f42a6127d9e801cc010c20a7113bdd9b
2014-10-23 13:19:31 +01:00
Paul Wilkins
9363425daa Merge "Initialization bug for multi arf." 2014-10-23 02:02:48 -07:00
Jingning Han
41a17f4457 Merge "Allow checking zeromv mode in vp9_pick_inter_mode" 2014-10-22 18:46:20 -07:00
Yunqing Wang
330a6b2756 Merge "vp9_ethread: allocate frame contexts outside VP9_COMMON struct" 2014-10-22 17:10:39 -07:00
Yunqing Wang
7c7e4d4eb8 vp9_ethread: allocate frame contexts outside VP9_COMMON struct
This patch allocated frame contexts outside VP9_COMMON. This allows
multiple threads to share the same copy of frame contexts, and
reduces the overhead. It also guarantees the correct update of
these contexts during bitstream packing. This patch doesn't change
encoding result.

Change-Id: Ic181a2460b891d1d587278a6d02d8057b9dbd353
2014-10-22 15:03:12 -07:00
Yaowu Xu
7c48a295ae Merge "Fix a subtle issue in re-use inter_pred" 2014-10-22 14:53:06 -07:00
Jingning Han
08cdd006e1 Allow checking zeromv mode in vp9_pick_inter_mode
This improves the compression performance of speed -5 by 0.6%. The
speed impact is less than 1%.

Change-Id: Ie77daa561976dfc8b479061e1221bdf428eb0c3b
2014-10-22 14:47:15 -07:00
JackyChen
897500b9ba Merge "vp9_denoiser_sse2.c: improve code style." 2014-10-22 13:52:03 -07:00
Yaowu Xu
3f79359e0a Fix a subtle issue in re-use inter_pred
The initialization of this_mode_pred does not work when the ref_frame
loop ever goes beyond LAST_FRAME. This commit fixes the subtle issue
and allows potentially expanding the loop to test GOLDEN_FRAME.

Change-Id: Ibbd427a22160d1d9eacb8ed0c87f88d6cef9c0f3
2014-10-22 12:06:27 -07:00
JackyChen
5cba6516aa vp9_denoiser_sse2.c: improve code style.
denoiser_sse2.c: fix typos in comment.

Change-Id: Ic0fb102331b0e533c058da3cab1fbc30de9a0070
2014-10-22 10:55:54 -07:00
Paul Wilkins
7cd6330ef3 Initialization bug for multi arf.
Moved erroneous reset of cpi->multi_arf_last_grp_enabled.

Change-Id: Ibb0b96f6ed1d5eeb575a3b1c798e0fe2ee651d06
2014-10-22 18:51:07 +01:00
Hangyu Kuang
9ce3a7d76c Implement frame parallel decode for VP9.
Using 4 threads, frame parallel decode is ~3x faster than single thread
decode and around 30% faster than tile parallel decode for frame parallel
encoded video on both Android and desktop with 4 threads. Decode speed is
scalable to threads too which means decode could be even faster with more threads.

Change-Id: Ia0a549aaa3e83b5a17b31d8299aa496ea4f21e3e
2014-10-22 10:50:58 -07:00
Jingning Han
0e64aa5073 Merge "Refactor rate distortion cost structure in non-RD coding mode" 2014-10-22 08:41:36 -07:00
Yaowu Xu
87665f16f4 Merge "Change speed features for good quality(cpu-used=5)" 2014-10-22 08:40:15 -07:00
Jingning Han
be212d4db3 Refactor rate distortion cost structure in non-RD coding mode
This commit refactors the rate distortion structure used in the
non-RD coding mode and saves a few RDCOST calculations.

Change-Id: I62c3416c300d2c5372f21b96d93a6b633a34ab3a
2014-10-21 17:17:11 -07:00
Hui Su
8947b18fa3 Move the definition of switchable filter numbers into enum
INTERP_FILTER; Modify the macro ADD_MV_REF_LIST and
IF_DIFF_REF_FRAME_ADD_MV.

Change-Id: Ic36c9eb6ccb8ec324d991f7241e42b40b60b1dcb
2014-10-21 15:41:37 -07:00
Yaowu Xu
c30f7e6cc5 Change speed features for good quality(cpu-used=5)
The existing speed features produce horrible encoding results, almost
30% worse than cpu-used=4, this commit adjust the speed features to
produce relatively resonable results to be within 3%-5% of cpu-used=4.

Change-Id: I0ca6ebafb33024d4a0cbcf04c78a4a00b8dd1ecf
2014-10-21 11:59:12 -07:00
Jingning Han
1ed1dde06d Remove unused copy_partitioning
Change-Id: I75a2a3772ed17e73180eb4f263cc838cae4927b0
2014-10-21 09:47:58 -07:00
Jingning Han
f2c21cfa1e Merge "Remove deprecated constrain_copy_partitioning function" 2014-10-21 09:44:11 -07:00
Jingning Han
55ec7ebca7 Merge "Remove unused sb_has_motion function in vp9_encodeframe.c" 2014-10-21 09:43:55 -07:00
Jingning Han
072d844aff Merge "Remove deprecated use_lastframe_partitioning feature" 2014-10-21 09:43:45 -07:00
Jingning Han
61ff08ef61 Merge "Hybrid partition search for rtc coding mode" 2014-10-21 09:43:35 -07:00
Paul Wilkins
4163da33c2 Merge "Extend --auto-alt-ref so it can enable multi-alt ref." 2014-10-21 07:02:24 -07:00
Paul Wilkins
889c53a507 Merge "Resolve compiler warning." 2014-10-21 07:02:13 -07:00
Jingning Han
abb2fbb10e Remove deprecated constrain_copy_partitioning function
Its functionality has been replaced with choose_partitioning and
threshold based control on split mode check.

Change-Id: Ic9bb321df06b524f5c38ea5874dc6f6a8f93c5e3
2014-10-20 17:08:21 -07:00
Jingning Han
ef53898c48 Remove unused sb_has_motion function in vp9_encodeframe.c
Change-Id: I035fb6aa5c10741b065e27befb097d8087e3c62f
2014-10-20 17:08:11 -07:00
Jingning Han
e62ce79e1a Remove deprecated use_lastframe_partitioning feature
This speed feature has been deprecated in both yt and rtc coding
modes. This commit removes the related operations.

Change-Id: I079c79c6adafe45581af2ebf8b98faebcface1ce
2014-10-20 17:03:38 -07:00
Jingning Han
9f128b3ed9 Hybrid partition search for rtc coding mode
This commit re-designs the recursive partition search scheme in
rtc speed -5. It first checks if the current block is under cyclic
refresh mode. If so, apply recursive partition search. Otherwise,
perform sub-sampled pixel based partition selection. When the
pre-selection finds the partition size should be 32x32 or above,
use the partition size directly. Otherwise, apply partition search
at nearby levels around the preset partition size.

It is enabled in speed -5. The compression performance of rtc
speed -5 is improved by 9.4%. Speed wise, the run-time goes slower
from 1% to 10%.

nik_720p, 1000 kbps
33220 b/f, 38.977 dB, 10109 ms -> 33200 b/f, 39.119 dB, 10210 ms

vidyo1_720p, 1000 kbps
16536 b/f, 40.495 dB, 10119 ms -> 16536 b/f, 40.827 dB, 11287 ms

Change-Id: I65adba352e3adc03bae50854ddaea1b421653c6c
2014-10-20 13:02:12 -07:00
Yunqing Wang
687c56e802 Merge "SAD32xh and SAD64xh for AVX2" 2014-10-20 12:37:55 -07:00
Yunqing Wang
67c866750c Merge "Remove the dependency in token storing locations" 2014-10-20 08:26:46 -07:00
Paul Wilkins
6f0ae3a2d1 Extend --auto-alt-ref so it can enable multi-alt ref.
Extend --auto-alt-ref from parameter so we can use it to
turn multi-arf on and off from the command line.

For now the range is 0-off, 1-on, 2-multi-arf on.

Rename play_alternate to enable_auto_arf

Change-Id: Id7b64407cfbe76ba0090a83b588a03e22a240386
2014-10-20 16:09:37 +01:00
Paul Wilkins
9626a0cb62 Resolve compiler warning.
conversion from 'const int64_t' to 'int', possible loss of data.

Change-Id: I471a73bba5d448d9be0ef9cbf1590fa73aa74be1
2014-10-20 12:08:33 +01:00
Paul Wilkins
9c98fb2bab Merge "Alter adjustment of two pass GF/ARF boost with Q." 2014-10-20 03:12:06 -07:00
levytamar82
7045aec00a SAD32xh and SAD64xh for AVX2
All sad function that process above 32 consecutive elements are optimized
for AVX2:
vp9_sad64x64
vp9_sad64x32
vp9_sad32x64
vp9_sad32x32
vp9_sad32x16
vp9_sad64x64_avg
vp9_sad64x32_avg
vp9_sad32x64_avg
vp9_sad32x32_avg
vp9_sad32x16_avg
The functions that appeared as a hotspot is vp9_sad32x32 and vp9_sad64x64
vp9_sad32x32 was optimized by 68% and vp9_sad64x64 was optimized by 90%
both of them gave and overall ~2.3% user level gain

Change-Id: Iccf86b375a2b54c5fbbe685902ead0c9a561b9fd
2014-10-19 13:59:10 -07:00
Debargha Mukherjee
6202c75f84 Merge "Add highbitdepth function for vp9_avg_8x8" 2014-10-18 14:37:10 -07:00
Yaowu Xu
06e65269c7 Merge "Remove unused VAR_BASED_FIXED_PARTITION flag" 2014-10-18 13:31:47 -07:00
Yaowu Xu
7bf475926b Merge "Use rate/distortion thresholds to control non-RD partition search" 2014-10-18 13:31:41 -07:00
Peter de Rivaz
73ae6e495c Add highbitdepth function for vp9_avg_8x8
Cherry-picked from https://gerrit.chromium.org/gerrit/#/c/71914/
(a92f987a6b) on highbitdepth branch.

Change-Id: I6903e4e4cb57d90590725c8a1c64c23da7ae65e8
2014-10-17 17:04:37 -07:00
Yunqing Wang
7c4992c466 Remove the dependency in token storing locations
Currently, the tokens for a tile are stored immediately after its
preceding tile, which causes a dependency. This is unnecessary
since we always allocate enough memory for tokens. Removing
the dependency allows token writing done in parallel. This patch
doesn't change encoding result.

Change-Id: I7365a6e5e2c2833eb14377c37e1503c9d0f26543
2014-10-17 14:25:33 -07:00
JackyChen
0615a196e2 Merge "vp9_denoiser_sse2.c: solve windows build error." 2014-10-17 11:16:22 -07:00
Jingning Han
3bc94cd2eb Merge "Add init and reset functions for RD_COST struct" 2014-10-17 11:15:19 -07:00
Jingning Han
4a6c99e5fd Merge "Reset rate cost value in rd mode search" 2014-10-17 11:15:03 -07:00
Jingning Han
94ecfa323f Reset rate cost value in rd mode search
When early termination is triggered, properly reset the rate cost
to invalid value to avoid potential ioc issue.

Change-Id: I3444390be2e49a34bb02cf8a74c33d5dbd96d88d
2014-10-17 09:33:59 -07:00
JackyChen
6356d21a47 vp9_denoiser_sse2.c: solve windows build error.
Change-Id: Ib5df91c8580d5dbeb0b3554edc9c2ca906ba4c4d
2014-10-17 09:28:22 -07:00
Jingning Han
e1111fba7e Remove unused VAR_BASED_FIXED_PARTITION flag
Change-Id: I4ce19b7cb1c45fed86e81ee785e787630020fb4f
2014-10-17 09:02:25 -07:00
Paul Wilkins
f0c3da930a Alter adjustment of two pass GF/ARF boost with Q.
Delete gfboost_qadjust() and move Q based adjustment
into calc_frame_boost(). Also remove clamping. Making
the adjustment here means that it influences not just the
boost level but also the selection of the GF/ARF interval.

This change gives a small average gain in PSNR but
larger gains in SSIM, especially for harder std-hd set (1.5%)

Change-Id: I3aa81b8feccaeff93d915e19fb9cf5cd64c86327
2014-10-17 16:37:17 +01:00
James Zern
00f1cf40ed Merge "vp9_denoiser_sse2.c: eliminate gcc warnings" 2014-10-17 03:26:06 -07:00
JackyChen
8514d03402 vp9_denoiser_sse2.c: eliminate gcc warnings
Change-Id: I5f63f48e11e31ea9951223c5b18f42a2471e4560
2014-10-17 11:00:57 +02:00
Jingning Han
4b5f01e12e Merge "Fix an ioc issue in super_block_uvrd" 2014-10-16 12:35:11 -07:00
Jingning Han
ed100c0b00 Fix an ioc issue in super_block_uvrd
This commit fixes an ioc issue that will happen when the cumulative
variables are not in effective use. The fix discards these
redundant additions.

Change-Id: Idbac5bfb989c0cedc5f8a323effce938519b2457
2014-10-16 11:07:39 -07:00
Paul Wilkins
716ae78ce4 Change initialization of static_scene_max_gf_interval.
This removes an unnecessary restriction that causes
a problem (noticed by AWG) when the forced key frame
interval is set to a very small value, such as 10. In this case
we were being forced to code minimal length GF groups.

Change-Id: I76ef5861a09638ff51f61fea02359554184ada53
2014-10-16 16:18:29 +01:00
Minghai Shang
68b550f551 [spatial svc]Another workaround to avoid using prev_mi
We encode a empty invisible frame in front of the base layer frame to
avoid using prev_mi. Since there's a restriction for reference frame
scaling factor, we have to make it smaller and smaller gradually until
its size is 16x16.

Change remerged.

Change-Id: I9efab38bba7da86e056fbe8f663e711c5df38449
2014-10-16 16:09:40 +01:00
Paul Wilkins
d5130af568 Revert "Move input frame scaling into the recode loop"
This reverts commit 452dc21500.

This change has introduced a significant quality regression on content
with forced key frames. (e.g. the YT and yt-hd set). It is most
noticeable in static content where the kf bits dominate. Here, despite
key frames being apparently coded at the same Q, there is a drop in all
metrics of ~20% (e.g clXR and BFa0).

Change-Id: Iba14cc61778c0846fa0a59c33c55a9fc49512cb4
2014-10-16 15:54:40 +01:00
Paul Wilkins
468032961d Revert "[spatial svc]Another workaround to avoid using prev_mi"
This reverts commit c113457af9.

Temporary revert to allow clean revert of another commit.

Change-Id: Ia9b7b755e6c48e1b6e383329f121fef175a24b27
2014-10-16 15:52:08 +01:00
Marco
48ea5b7190 Merge "Some updates for Speed 6/VAR_BASED_PARTITION." 2014-10-15 15:57:21 -07:00
Jingning Han
e2612fbd70 Add init and reset functions for RD_COST struct
Change-Id: I2902de7051a883fd22e27a655209233733969cfd
2014-10-15 15:02:06 -07:00
Jingning Han
801b77d48c Merge "Replace copy_partitioning use case with choose_partitioning" 2014-10-15 14:54:52 -07:00
Jingning Han
5e766ccee0 Use rate/distortion thresholds to control non-RD partition search
Compare the estimated rate and distortion to the thresholds scaled
according to the operating block size and determine if further
split partition search will be run. The compression performance of
speed -5 is changed by -0.074%. The encoding speed is 10% - 15%
faster.

vidyo1 720p
16545 b/f, 40.492 dB, 11475 ms -> 16535 b/f, 40.486 dB, 10100 ms

nik720p
16624 b/f, 36.310 dB, 10071 ms -> 16617 b/f, 36.313 dB, 8346 ms

Change-Id: Ic9197ab5761279ae55d2fb7813b2af0e0db497b8
2014-10-15 13:40:33 -07:00
Marco
09ea74f194 Some updates for Speed 6/VAR_BASED_PARTITION.
Reduce the intra_cost_penalty for non-rd mode,
and some updates to VAR_BASED_PARTITION.

Visual tests show some improvement at Speed 6, for RTC clips.

Change-Id: If9090daf7aed14906a32d931a538ab544bbca606
2014-10-15 12:06:48 -07:00
Jingning Han
89b8c7a513 Replace copy_partitioning use case with choose_partitioning
This commit replaces the use of copy_partitioning with
choose_partitioning based on the sse of subsamped pixels, which
provides significantly better coding performance and runs at
similar speed, as compared to copy_partitioning. It improves rtc
speed 5 coding performance by 3%.

Change-Id: I52d3682a12dce0147f5e52383a594fc242ca3228
2014-10-15 11:37:20 -07:00
Minghai Shang
c113457af9 [spatial svc]Another workaround to avoid using prev_mi
We encode a empty invisible frame in front of the base layer frame to
avoid using prev_mi. Since there's a restriction for reference frame
scaling factor, we have to make it smaller and smaller gradually until
its size is 16x16.
Change-Id: I60b680314e33a60b4093cafc296465ee18169c19
2014-10-14 16:26:39 -07:00
Adrian Grange
2040bb58fb Merge "Move input frame scaling into the recode loop" 2014-10-14 15:30:42 -07:00
Alex Converse
00a9671bbd Merge "Add a 32-bit friendly sse2 quantizer." 2014-10-14 14:35:02 -07:00
Yunqing Wang
a614f2288c Remove an unneeded function call
set_tile_limits() is called in vp9_change_config() already.

Change-Id: I91c3a0df2c1c7fd7e71546d8f51fd5b65838a7da
2014-10-14 11:41:37 -07:00
Alex Converse
7497d2fb23 Add a 32-bit friendly sse2 quantizer.
This is based on the 64-bit ssse3 quantizer.

1.1x speedup for screen content at speed 7.

Change-Id: I57d15415ef97c49165954bbe3daaaf9318e37448
2014-10-14 11:37:41 -07:00
Jingning Han
f67e75a6f4 Merge "Refactor super_block_uvrd function to remove goto statement" 2014-10-14 11:33:00 -07:00
Jingning Han
f3a5de816d Refactor super_block_uvrd function to remove goto statement
Use return value 0/1 as indicator of the validity of the rate-
distortion cost.

Change-Id: I6244126fbf03472cebcba4f177a6cd329fae4743
2014-10-14 09:58:11 -07:00
Adrian Grange
452dc21500 Move input frame scaling into the recode loop
Move the point at which input frames are scaled
into the recode loop. This will allow us to change
the coded frame size dynamically in response
to previous attempts to encode the frame at a
higher resolution.

A following patch will implement a scheme for
resizing the frame in the recode loop.

Change-Id: I6a59c02d6ac1626512edad6de8b60063b79433e6
2014-10-14 09:27:55 -07:00
Jingning Han
d0369d6fd4 Merge "Use speed feature variable in vp9_rd_pick_inter/intra_mode" 2014-10-14 09:10:24 -07:00
Jingning Han
fdf2205558 Merge "Fix vp9_rd_pick_inter/intra function types" 2014-10-14 09:10:11 -07:00
Jingning Han
790a96c94f Merge "Refactor rate distortion cost structure" 2014-10-14 08:58:55 -07:00
Jingning Han
69a09a70e9 Use speed feature variable in vp9_rd_pick_inter/intra_mode
Replace repeated fetch cpi->sf with a local sf pointer.

Change-Id: I5a55bba3e1c41fbdbc6ad5f078d2fa49dd95ee67
2014-10-13 16:15:00 -07:00
Jingning Han
3bdb6bfcee Fix vp9_rd_pick_inter/intra function types
The returned value is not used anywhere, hence changing the function
type into void.

Change-Id: I0ece49ed61e7aab6df01140135503ad41d4ef4a4
2014-10-13 16:00:46 -07:00
Jingning Han
811cef97c9 Refactor rate distortion cost structure
This commit makes a struct that contains rate value, distortion
value, and the rate-distortion cost. The goal is to provide a
better interface for rate-distortion related operation. It is
first used in rd_pick_partition and saves a few RDCOST calculations.

Change-Id: I1a6ab7b35282d3c80195af59b6810e577544691f
2014-10-13 14:27:16 -07:00
Paul Wilkins
6dbb9e4d44 Clamp rate error estimate.
Add back clamp which ensures that the Q adaptation
is turned off when the over_shoot_pct and under_shoot_pct
parameters are set to 100.

Change-Id: Id0161b114d39a3029cd3eb28020caab0c3914922
2014-10-13 18:07:58 +01:00
Paul Wilkins
f7f0eaa581 Add adaptation option for VBR.
Allow min and maxQ to creep when the undershoot
or overshoot exceeds thresholds controlled by the
command line under_shoot_pct and over_shoot_pct
values.

Default is 100%,100% which ~disables adaptation.

Derf results for example undershoot% / overshoot%:-

Head:- Mean abs (%rate error) = 14.4%

This check in:-
25%/25% - Mean abs (%rate error) = 6.7%
                  PSNR hit -1% SSIM -0.1%

5% / 5%  - Mean abs (%rate error) = 2.2%
                 PSNR hit -3.3% SSIM - 1.1%

Most of the remaining error and most of the quality hit is
at extreme data rates. The adaptation code still has an
exception for material that is in effect static so that we
don't over adjust and over spend on YT slide show type
content.

(Rebase of If25a2449a415449c150acff23df713e9598d64c9
to resolve a auto-merge error)

Change-Id: Iec4e1613ef0d067454751d8220edb7058dfbd816
2014-10-13 10:16:44 +01:00
Jingning Han
a62acf3c0a Fix ActiveMapTest valgrind warning
This fixes a valgrind warning in the ActiveMapTest unit test
reported in issue 870.

Change-Id: Idf172ab0244ebefe630c3577e649bc9ba7c43d10
2014-10-11 22:36:58 -07:00
Alex Converse
a90255c366 Revert "Add adaptation option for VBR."
This reverts commit 869d4ca519.

This breaks the build via conflict with
e18edd5eb6.

Change-Id: If544b99e367a449452834eb8cce600f58c34ec0d
2014-10-10 11:34:00 -07:00
Paul Wilkins
169949dd74 Merge "Add adaptation option for VBR." 2014-10-10 09:22:58 -07:00
Yaowu Xu
bdea0055b2 Merge "vp9/choose_partitioning: add missing clear_system_state" 2014-10-10 09:16:19 -07:00
James Zern
a3e1a9291a vp9/choose_partitioning: add missing clear_system_state
set_vt_partitioning does double math

Change-Id: I8e9d73d5c89b937a5326abf04164d24d9d88c5ef
2014-10-10 08:14:46 -07:00
Paul Wilkins
869d4ca519 Add adaptation option for VBR.
Allow min and maxQ to creep when the undershoot
or overshoot exceeds thresholds controlled by the
command line under_shoot_pct and over_shoot_pct
values.

Default is 100%,100% which ~disables adaptation.

Derf results for example undershoot% / overshoot%:-

Head:- Mean abs (%rate error) = 14.4%

This check in:-
25%/25% - Mean abs (%rate error) = 6.7%
                  PSNR hit -1% SSIM -0.1%

5% / 5%  - Mean abs (%rate error) = 2.2%
                 PSNR hit -3.3% SSIM - 1.1%

Most of the remaining error and most of the quality hit is
at extreme data rates. The adaptation code still has an
exception for material that is in effect static so that we
don't over adjust and over spend on YT slide show type
content.

Change-Id: If25a2449a415449c150acff23df713e9598d64c9
2014-10-10 12:54:16 +01:00
James Zern
7c6fec672f vp9_avg_intrin_sse2: correct intrinsics include
immintrin.h -> emmintrin.h
fixes build where newer intrinsics are unavailable

Change-Id: I79311b39bfa782fc2abeb45884ecb417050cb9f8
2014-10-10 10:05:47 +02:00
Deb Mukherjee
9a29fdbae7 Merge "Rename highbitdepth functions to use highbd prefix" 2014-10-09 15:39:56 -07:00
Deb Mukherjee
1929c9b391 Rename highbitdepth functions to use highbd prefix
Uses highbd_ prefix convention consistently.

Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e
2014-10-09 14:40:40 -07:00
Jingning Han
112789d4f2 Merge "Remove sub8x8 block index from rd_pick_partition argument" 2014-10-09 11:16:11 -07:00
Deb Mukherjee
3117830af3 Merge "Subpel search cleanups and enhancements" 2014-10-09 11:14:51 -07:00
Alex Converse
9ffbc31367 Merge "Move the high freq coeff check outside store_coding_context" 2014-10-09 11:12:02 -07:00
Jingning Han
6a0d291fce Remove sub8x8 block index from rd_pick_partition argument
This parameter is deprecated. Its function is replaced with
other explicit condition check.

Change-Id: I61337e350ba8ca9eb50382db8b4d4acbf45cb7eb
2014-10-09 09:20:16 -07:00
Yaowu Xu
3af39b9997 Merge "Fix src frame buffer copy and extend" 2014-10-09 07:08:48 -07:00
James Zern
cec763bd97 set_vt_partitioning: fix type conversion warning
double -> int64
+ make threshold_multiplier an int

Change-Id: I6d3607fdf13d670f57c9d9b04a80acb2be1346a0
2014-10-09 11:41:36 +02:00
Deb Mukherjee
d78dbff09a Subpel search cleanups and enhancements
- Some fixes to surface fit.
- Returns variance function as cost rather than sad in the
  pattern search and diamond search functions. Only
  vp9_pattern_search_sad function used in bigdia search
  uses sad as integer 1-away costs.
- Deploys SUBPEL_TREE_PRUNED_MORE for speed 4+.

Results:
derf [Speed 3]: About +0.036% in coding efficiency without any
discernible speed loss.
derf [Speed 4]: About 2-3% faster at -0.199% loss in coding efficiency.
derf [Speed 5]: About 3-4% faster at -0.149% loss in coding efficiency.

Change-Id: I8462f94f6adb46966ca964f2bd0400977357fd63
2014-10-08 23:59:43 -07:00
Yunqing Wang
189566db58 Merge "Allow mode search breakout at very low prediction errors" 2014-10-08 19:58:18 -07:00
Yunqing Wang
e18edd5eb6 Allow mode search breakout at very low prediction errors
In model_rd_for_sb function, the spatial domain SSE and variance
are checked to see if transform coefficients are quantized to 0.
Besides that, this patch adds another set of thresholds that are
much more strict. These thresholds are used to conduct a partition
block level check to measure if all its TX blocks are skippable
for YUV planes. If it is true, x->skip is set for this partition
block, and thus its mode search is terminated.

This speeds up the encoding at very low prediction error case,
such as screen sharing application. This patch covers what
rd_encode_breakout_test() does, so that function is removed.

Borg test at speed 3 shows:
For stdhd set, psnr: +0.008%, ssim: +0.014%;
For derf set, psnr: +0.018%, ssim: +0.025%.
No noticeable speed change.

Change-Id: I4e5f15cf10016a282a68e35175ff854b28195944
2014-10-08 17:46:22 -07:00
Jingning Han
5fcbcf1b22 Move the high freq coeff check outside store_coding_context
This fixes valgrind message issue 870.

Change-Id: Ibbc2481923a2995029ab05de30c9e8a6e9f0f9a8
2014-10-08 16:10:32 -07:00
Jingning Han
41cea46154 Use local variable in vp9_rd_pick_inter_mode_sb
Change-Id: Ie35a965a6b8de536ccaf61ff61498620d22db205
2014-10-08 16:09:47 -07:00
Yaowu Xu
d602500d4a Fix src frame buffer copy and extend
For input source with size that is not multiple of 8, the size is
rounded to 8 and saved in width or height, the original source sizes
are saved in crop_width and crop_height. This commit corrects the
computation of bottom and right extension amounts to use the orignal
sizes, hence crop_width and crop_height.

In addition, this commit also adds the missed initialization for
uv_crop_width and uv_crop_height.

This addresses issue #834

Change-Id: I084543ca7645a4964b88f7cf8ff668f517d3a39b
2014-10-08 11:07:04 -07:00
Jim Bankoski
20254d1daa Merge "experimental : partition using 1/8 x 1/8 image" 2014-10-08 09:04:26 -07:00
Jim Bankoski
4130e691bb Merge "Force better lower quantizer keyframe in case of high quantizer." 2014-10-08 09:04:01 -07:00
Paul Wilkins
2faff64866 Merge "Improve two pass VBR accuracy." 2014-10-08 04:23:30 -07:00
Paul Wilkins
679a1c2f5a Merge "Two pass rc changes." 2014-10-08 04:23:20 -07:00
Jim Bankoski
0ce51d823f experimental : partition using 1/8 x 1/8 image
The concept:

There's too much noise in source pixels for variance and at low bitrate
the reconstructed looks nothing like the source so we have problems
getting good partitionings with either.   This skirts the issue by using
a box blur scaled down version for variance calculations.  To compare
against source_var_ moved keyframe to be rd based like source_var.

Change-Id: Ie3babdbfadae324b7b5a76bea192893af27f0624
2014-10-07 16:36:14 -07:00
Jim Bankoski
dae97868da Force better lower quantizer keyframe in case of high quantizer.
Change-Id: Ie69a164bc166b6a8819777038d65a7d9f9c3361f
2014-10-07 15:40:02 -07:00
Jingning Han
3bbec7b422 Merge "Replace mi_width_log2() with mi_width_log2_lookup table" 2014-10-07 15:33:52 -07:00
Jingning Han
27c9577f8e Merge "Take out repeated block width/height lookup functions" 2014-10-07 15:33:45 -07:00
Yunqing Wang
0cac69f594 Merge "Fix skip_txfm issue in rdopt code" 2014-10-07 15:13:44 -07:00
Jingning Han
bd9706506f Merge "Move inter filter defs to vp9_filter.h" 2014-10-07 13:42:26 -07:00
Jingning Han
91442e1626 Merge "Reduce the scope of the header file used in vp9_context_tree.h" 2014-10-07 13:42:17 -07:00
Jingning Han
92b45e5d4e Merge "Remove redundant header file from vp9_encoder.h" 2014-10-07 13:42:13 -07:00
Yunqing Wang
a4aa14020a Fix skip_txfm issue in rdopt code
Fixed an encoder crash. Set skip_txfm to 0 for cases that skip_txfm
isn't calculated. Put memcpy of skip_txfm at right place.

Change-Id: Ib3b6afc1b251a85b2a853c8138fb3393f48cfef6
2014-10-07 12:47:43 -07:00
Jingning Han
7ee58985bd Replace mi_width_log2() with mi_width_log2_lookup table
Change-Id: If0ea98aa139d14d40cd924114e18396aff36b5a5
2014-10-07 12:45:25 -07:00
Jingning Han
b66f7016c1 Take out repeated block width/height lookup functions
The functions b_width_log2 and b_height_log2 only do direct
table fetch. This commit unifies such use cases by using the
table directly and removes these functions.

Change-Id: I3103fc6ba959c1182886a2799d21b8b77c8a7b6b
2014-10-07 12:33:07 -07:00
Jingning Han
5d9cdac087 Move inter filter defs to vp9_filter.h
Add comments on the use case of these definitions. Further reduce
the scope of header file in vp9_context_tree.h.

Change-Id: Ic4a7638e838d0ac441b64abfc56e57354c059d75
2014-10-07 12:16:37 -07:00
Deb Mukherjee
cfc337aae8 Merge "Resolves some static analysis / undefined warnings" 2014-10-07 12:15:26 -07:00
Deb Mukherjee
fced63ed30 Resolves some static analysis / undefined warnings
Also fixes a case of distortion becoming negative and messing
up the RDCOST computation.

Change-Id: Id345af9e8dfff31ade622be5756e51f2cdface53
2014-10-07 11:20:56 -07:00
Jingning Han
5e32036b97 Reduce the scope of the header file used in vp9_context_tree.h
Change-Id: I264ee35044a5973c7725daba7af870968353a3c1
2014-10-07 11:13:35 -07:00
JackyChen
a9f479682a Merge "Add SSE2 code and unit test for VP9 denoiser." 2014-10-07 10:51:55 -07:00
Jingning Han
3f93f23120 Remove redundant header file from vp9_encoder.h
Change-Id: Ia212390cf8d36db5436bb0f0e1b696f70066341a
2014-10-07 10:49:58 -07:00
Jingning Han
a75551585b Fix eobs buffer pointer mis-use
This commit fixes a buffer pointer mis-use in store_coding_context.
The compression performance for stdhd set of speed 3 is improved by
0.097%. It fixes issue 869.

Change-Id: Idc59e22035eaf39f7133ca04174894374d647ff7
2014-10-06 15:57:13 -07:00
JackyChen
80465dae88 Add SSE2 code and unit test for VP9 denoiser.
This SSE2 is based on VP8 denoiser's SSE2 code. In VP8, there are
only 16x16 blocks in denoiser, while in VP9, there are 13 different
block sizes.

By adding this SSE2 code, the improvement of encoder speed is around
20%(using C code vs using SSE2 code), vary for different clips.

The unit test for VP9 denoiser is to confirm that the SSE2 code is
bit-exact with the C code. The unit test covers all block size.

Change-Id: Ic8d8ac26db4ea40a5f146b5678a065af07eaaa3d
2014-10-06 15:27:40 -07:00
Jingning Han
1b8c57e915 Merge "Fix an IOC issue in vp9_rd_pick_inter_mode_sb" 2014-10-06 09:29:29 -07:00
Yaowu Xu
5966acc1be Merge "Properly initialize segmentID in nonrd coding path" 2014-10-06 07:57:36 -07:00
Paul Wilkins
0e1068a4bd Improve two pass VBR accuracy.
Adjustments to the GF interval choice and minimum boost.
Adjustment to the calculation of 2 pass worst q.
Compared to 09/29 head there is metrics hit on derf of
(-0.123%,-0.191%)

Compared to the September 29 head and a baseline on
September 18 baseline the accuracy of the VBR rate control
measured on the derf set is as follows:-

Mean error %  / Mean abs(error %)
Sept 18 baseline (-7.0% / 14.76%)
Sept 29 head (-15.7%, 19.8%)
This check in (-1.5% / 14.4%)

The mean undershoot is reduced slightly but the
worst case overshoot on e.g. harbour/highway is
increased. This will be addressed in a later patch.

Change-Id: Iffd9b0ab7432a131c98fbaaa82d1e5b40be72b58
2014-10-06 14:20:09 +01:00
Jingning Han
085b97aa5c Fix an IOC issue in vp9_rd_pick_inter_mode_sb
It is possible that the GOLDEN reference frame is not avaiable, in
which setting the predicted mv will be associated with a residual
value of INT_MAX. This commit checks this condition before
left shift and comparison with that of ALTREF frame, to avoid
overflow issue.

Change-Id: Ib98c3149dbdd016f2fe5beaafb13f67d469dd07c
2014-10-05 12:05:14 -07:00
Jingning Han
a021924092 Merge "Fix indent in encode_rd_sb_row" 2014-10-03 15:24:02 -07:00
Jingning Han
a1088e0b5f Merge "Rework partition search skip scheme" 2014-10-03 15:23:54 -07:00
Yaowu Xu
0065b73481 Properly initialize segmentID in nonrd coding path
This commit adds proper initialization of segment id for variance AQ
mode in non-rd coding path. It fixes the enc/dec mismatch issue of
rt=7 with --aq-mode=1, as reported in issue #816

Change-Id: I02fa41b96345bf2e66077d5ea553f85ba800f7bb
2014-10-03 15:01:53 -07:00
Deb Mukherjee
8a01074d04 Merge "Incorporate WRAPLOW macro into non-highbitdepth tx" 2014-10-03 12:45:39 -07:00
Jingning Han
ef62233396 Fix indent in encode_rd_sb_row
Change-Id: Icbcfe7b56d88474f4398b4c5b52f6719d551ab4a
2014-10-03 11:57:36 -07:00
Jingning Han
bb260d9076 Rework partition search skip scheme
This commit enables the encoder to skip split partition search if
the bigger block size has all non-zero quantized coefficients in low
frequency area and the total rate cost is below a certain threshold.
It logarithmatically scales the rate threshold according to the
current block size. For speed 3, the compression performance loss:
derf  -0.093%
stdhd -0.066%

Local experiments show 4% - 20% encoding speed-up for speed 3.
blue_sky_1080p, 1500 kbps
51051 b/f, 35.891 dB, 67236 ms ->
50554 b/f, 35.857 dB, 59270 ms (12% speed-up)

old_town_cross_720p, 1500 kbps
14431 b/f, 36.249 dB, 57687 ms ->
14108 b/f, 36.172 dB, 46586 ms (19% speed-up)

pedestrian_area_1080p, 1500 kbps
50812 b/f, 40.124 dB, 100439 ms ->
50755 b/f, 40.118 dB,  96549 ms (4% speed-up)

mobile_calendar_720p, 1000 kbps
10352 b/f, 35.055 dB, 51837 ms ->
10172 b/f, 35.003 dB, 44076 ms (15% speed-up)

Change-Id: I412e34db49060775b3b89ba1738522317c3239c8
2014-10-03 11:54:30 -07:00
Deb Mukherjee
d50716face Incorporate WRAPLOW macro into non-highbitdepth tx
Incorporates the WRAPLOW macro into the non-highbitdepth transforms
to aid hardware verification between a software C model and an
intended hardware implementation though the use of the configure
options: --enable-experimental --enable-emulate-hardware.
Note that to avoid further discrepancies between the sse/sse2
implementations of the transforms and the C implementation, when the
emulate hardware option is invoked, we also disable sse/sse2/etc.

Also incudes some minor cleanups/renaming etc.

Change-Id: Ib864d8493313927d429cce402982f1c8e45b3287
2014-10-03 11:38:05 -07:00
Deb Mukherjee
2c7b94f6ec Merge "Prevent negative cost for highbitdepth" 2014-10-03 11:37:47 -07:00
Deb Mukherjee
431cdc33ee Prevent negative cost for highbitdepth
Adds proper scaling for highbitdepth in a rdopt cost.

Change-Id: I066694799a7f491b830945ef1c66eb202071c355
2014-10-03 10:22:21 -07:00
Deb Mukherjee
00a4b20fbe rdmult data type change
To fix a VS warning.

Change-Id: I4c530c0afe8d06acdb8cc78b7995aba57a25373d
2014-10-03 00:09:41 -07:00
Yaowu Xu
f809475c73 Merge "Make iscan and scan neighbor arrays static const." 2014-10-02 15:15:58 -07:00
Yaowu Xu
9712bc691d Make iscan and scan neighbor arrays static const.
This commit changes the tables to be read only, which fixes
issue #866

Change-Id: I85bbe03f9d344f50570f8c1c61699bdc5cee248f
2014-10-02 14:08:14 -07:00
Alex Converse
a0befb93e7 Fix subsampling check for images 1 pixel wide/tall
Change-Id: I0e262ede7eb4a4ae0c86181922d744e542e93350
2014-10-02 11:02:57 -07:00
Deb Mukherjee
35e8fa1458 rdmult data type change to fix high bit-depth
Fixes an intermittent assert failure for highbitdepth.

Change-Id: If8cad0209a94f1184b69c7b3f1d587934f857d9b
2014-10-02 07:37:26 -07:00
Jingning Han
9641b1b9ac Merge "Remove unused header files from vp9_encodemb.h" 2014-10-01 17:05:25 -07:00
Deb Mukherjee
30fbf23fda Merge "High-bitdepth bugfixes" 2014-10-01 16:47:43 -07:00
Yunqing Wang
e350e3fe68 Merge "Modify block transform skipping check" 2014-10-01 16:19:56 -07:00
Jingning Han
72a78a0c40 Remove unused header files from vp9_encodemb.h
Change-Id: Icfc3fb62cc0b05e435814035bfe1f2e2870442b4
2014-10-01 14:50:24 -07:00
Deb Mukherjee
a160d72522 High-bitdepth bugfixes
Miscellaneous bug-fixes for high bitdepth functionality.
With this patch, high bit-depth profiles become mostly functional,
except for an intermittent assert failure issue that is being
tracked.

Change-Id: I6a7fcbdcf1e5b09842e88535f8442d2e1230748c
2014-10-01 14:18:11 -07:00
Jingning Han
0a9f5fa146 Remove repeated header files from vp9_block.h
This commit removes unused header file vp9_onyxc_int.h and repeatedly
included file vpx_ports/mem.h from vp9_block.h

Change-Id: I400b210bd1da48f1880bd50a8f4a6e2c690e15a1
2014-10-01 13:01:43 -07:00
Yunqing Wang
e4aac6bb61 Modify block transform skipping check
Block transform skipping was implemented based on DCT's energy
conservation property. Modified the thresholds using zero bin
parameters. AC and DC coefficients were checked separately to
allow better identifying of skippable blocks.

Borg test at speed 3 showed:
stdhd set: psnr gain: 0.153%, ssim gain: 0.051%;
derf set: psnr gain: 0.023%, ssim gain: 0.036%

For most test clips, the encoding speedup is 1% - 2%.
parkrun(720p): 7.5% speedup, park_joy(1080p): 3.5% speedup.

Change-Id: If28eb81113a077414f5ca7b021c14f9069b373bb
2014-10-01 12:58:09 -07:00
Jingning Han
20a37391d9 Merge "Conditionally skip reference frame check" 2014-10-01 11:19:10 -07:00
Jingning Han
891793a540 Conditionally skip reference frame check
For regular inter frames, if the distance from GOLDEN_FRAME is larger
than 2 and if the predicted motion vector of LAST_FRAME gives lower
sse than that of GOLDEN_FRAME, skip the GOLDE_FRAME mode checking in
the rate-distortion optimization. It provides about 5% speed-up at
expense of -0.137% and -0.230% performance down for speed 3. Local
experiment results:

pedestrian 1080p 2000 kbps
66712 b/f, 40.908 dB, 113688 ms ->
66768 b/f, 40.911 dB, 108752 ms

blue_sky 1080p 2000 kbps
51054 b/f, 35.894 dB, 70406 ms ->
51051 b/f, 35.891 dB, 67236 ms

old_town_cross 720p 1500 kbps
14412 b/f, 36.252 dB, 60690 ms ->
14431 b/f, 36.249 dB, 57346 ms

Change-Id: Idfcafe7f63da7a4896602fc60bd7093f0f0d82ca
2014-10-01 08:32:15 -07:00