Commit Graph

56 Commits

Author SHA1 Message Date
Yaowu Xu
81d16e3f53 fixed an issue with 8x8 token cost in trellisquant
changed the token cost for 8x8 transformed macroblock used in trellisquant
from those derived from 4x4 transform coefficient distribution to those
derived from 8x8 transform coefficient distribution. Test results show
this fix help 8x8 transform based compression consistently on cif and hd
sets:

http://www.corp.google.com/~yaowu/no_crawl/t8x8/cif_cost8x8only.html
(avg psnr:.14% glb psnr: .17% ssim: .20%)
http://www.corp.google.com/~yaowu/no_crawl/t8x8/hd_cost8x8only.html
(avg psnr:.17% glb psnr: .18% ssim: .58%)

Note: To test the effect of this change, 8x8 transform was forced to be used
only on 16x16 predicted macroblocks on inter frames, the effect would be
bigger had all macroblocks been forcd to use 8x8 transform.

Change-Id: If9b7868b75357c66541f511e5ee78e4d2d4929a4
2012-01-26 14:50:11 -08:00
Yaowu Xu
d37cd97682 Removed #if CONFIG_I8X8
This commit removed the macro CONFIG_I8X8, which was used to indicate
the 8x8 intra prediction experiment, made the change fully merged in.

Change-Id: Iafa4443781ce6e83f5591c12ba615a0e92ce0ea0
2011-12-07 13:48:53 -08:00
Yaowu Xu
a8fbab8697 enabled 8x8 intra prediction modes on inter frames
This commit enabled the usage of 8x8 intra prediction modes on inter
frames. There are a few TODO items related to this: 1)baseline entropy
need be calibrated; 2)cost of UV need to be done more properly rather
than using decision only relying on Y; 3)Threshold for allowing picking
8x8 intra prediction should be lowered to lower than the B_PRED.

Even with all the TODOs, tests showed consistent gain on derf set ~0.1%
(PSNR:0.08% and SSIM:0.14%). It is assumed that 8x8 intra prediction
will help more on large resolution clips, especially with above TODOs
addressed.

Change-Id: I398ada49dfc32575cfab962a569c2885111ae3ba
2011-12-02 13:44:47 -08:00
Yaowu Xu
bba710fcbd added transform type to MB_MODE_INFO
this commit is to add an variable in the macroblock level mode
info structure to track the transform size used in each MB, so
the information can be used later in the loop filter to change
how loop filter works on MBs with different transform sizes.

Change-Id: Id0eeaba6cc854c6d1be00ed8d237b3d9e250e447
2011-12-01 07:34:27 -08:00
Yaowu Xu
7f33be9e96 fixed the scaling in 8x8 trellis quant
This commit has a few minor fixes to the 8x8 trellis quant, so to
make it work regardless if extend_qrange is enabled or not. It also
borrowed adaptive RDMULT constants from 4x4 trellis that was missed
in the 8x8 trellis quant.

Change-Id: I60d7769071f102c699b5084597e62bca87a1f759
2011-11-16 14:29:02 -08:00
Yaowu Xu
982b061dc2 Make 8x8 and extend_qrange to work together
This commit added scaling factors to 8x8 transform, quant, dequant and
inverse transform pipeline to make 8x8 transform to work when configed
with enable-extend_qrange. This commit also disabled the trellis-quant
when extend_qrange is configured.

Change-Id: Icfb3192e4746f70a4bb35ad18b7b47705b657e52
2011-11-11 07:31:00 -08:00
Yaowu Xu
cbcba9e7c0 scaled the threshold for 2nd order coefficient reset
extend_qrange introduces a different scaling factor, this commit takes
the scaling difference into account for reset 2nd order coefficients.

Change-Id: Ie58bca9f52698fa759e3f88da2aa4d82630fa91a
2011-11-10 08:31:13 -08:00
Yaowu Xu
5883246d01 make debug match release build on win32 with 8x8 transform enabled
The 8x8 forward transform makes use of floating operations, therefore
requires emms call to reset mmx registers to correct state. Without
the resets, the 8x8 forward transform results are indefinite on win32
platform.

Change-Id: Ib5b71c3213e10b8a04fe776adf885f3714e7deb1
2011-11-08 20:36:54 -08:00
Paul Wilkins
a9df4183a6 Segment signaling of TX size
Initial attempt at using new segment feature signaling
to indicate 4x4 or 8x8 transform.

needs --enable-experimental --enable-t8x8

Note this is work in progress.

Change-Id: Ib160d46a5d810307bfcbc79853ce1a65b5b870b7
2011-11-08 12:21:08 +00:00
Yaowu Xu
d8afecef71 Added context reset when 2nd order coefficients are cleared
As discovered in path 10 of Change Ia12acd2f, reset 2nd order coeffs
without reset of above and left coding context may have introduced
problem that causes encoder/decoder mismatching. This commit added
update to coding context when the 2nd coefficients are cleared.

In addition, this commit also introduced early breakout in the checks
to speed up when coefficients are too significant to be cleared.

Change-Id: I85322a432b11e8af85001525d1e9dc218f9a0bd6
2011-11-03 16:05:29 -07:00
Yaowu Xu
c8ef79d22e Change to prevent encoding of effect-less 2nd order coefficients
similar logic to http://gerrit.chromium.org/gerrit/#change,10359

Change-Id: Ia12acd2f2b3b92ef2a601da43c2497034ef62174
2011-10-25 10:25:02 -07:00
Yaowu Xu
ca6b85aa4e add 8x8 intra prediction modes
Patch 1 to Patch 3 is an initial implementation of 8x8 intra prediction
modes, here are with the following assumptions:
a. 8x8 has 4 prediction modes DC, H, V and TM
b. UV 4x4 block use the same mode as corresponding 8x8 area
c. i8x8 modes are enabled for key frame only for now
Patch 4:
d. removed debug code from previous patches
Patch 5:
e. added stats code to collect entropy stats and further cleaned up
Patch 6:
f. changed mode stats code to collect finer stats of modes
Patch 7:
g. normalized i8x8 modes distribution to total at 256 (8bits).
Patch 8:
h. fixed a bug in decoder and removed debug printf output.
Patch 9:
i. more cleanups to address paul's comment
Patch 10:
j. messy rebase/merges to bring the commit up to date.

Tests on HD clips encoded with all key frame showing consistent gain
on all clips and all metrics:~0.5%(psnr) and 0.6%(ssim):
http://www.corp.google.com/~yaowu/no_crawl/i8x8hd_allkey_fixedq.html

To build and test, configure with:
--enable-experimental --enable-i8x8

Change-Id: I9813fe07ae48cab5fdb5d904bca022514ad01e7f
2011-09-16 15:55:19 -07:00
John Koleszar
62371d382a Merge remote branch 'internal/upstream' into HEAD
Conflicts:
	vp8/decoder/decodframe.c
	vp8/encoder/encodeframe.c
	vp8/encoder/encodemb.c

Change-Id: I6e0d1669e4409a2dfd73ba2c7038d730842d3953
2011-09-16 09:22:29 -04:00
Scott LaVarnway
b870947d42 Removed bmi copy to/from BLOCKD
for SPLITMV and B_PRED modes.  Modified code to use the bmi
found in mode_info_context instead of BLOCKD.  On the decode
side, the uvmvs are calculated only when required, instead of
every macroblock.  This is WIP. (bmi should eventually be
removed from BLOCKD)
Small performance gains noticed for RT encodes and decodes.(VGA)

Change-Id: I2ed7f0fd5ca733655df684aa82da575c77a973e7
2011-08-24 14:42:26 -04:00
Yaowu Xu
8c31484ea1 fix more merge issues
With this fix, the experimental branch now builds and encodes correctly
with the following two configure options respectively:
--enable-experimental --enable-t8x8
--enable-experimental

Change-Id: I3147c33c503fe713a85fd371e4f1a974805778bf
2011-07-21 09:01:53 -07:00
Deb Mukherjee
08f6471890 Add 8x8 transform to experimental branch
Please refer to previous commit messages for detailed info:
https://on2-git.corp.google.com/g/#change,5940
https://on2-git.corp.google.com/g/#change,6045

Change-Id: I8b16992f2f69c5a808ad40a3e32ef589cce7c59d
2011-07-20 09:49:22 -07:00
John Koleszar
766fce16c4 Merge remote branch 'internal/upstream' into HEAD 2011-07-01 00:05:11 -04:00
Yunqing Wang
0d87098e08 Copy macroblock data to a buffer before encoding it
I got this idea from Pascal (Thanks). Before encoding a macroblock,
copy it to a 16x16 buffer, and then read source data from there
instead. This will help keep the source data in cache, and help
with the performance.

Change-Id: Id05f4cb601299150511d59dcba0ae62c49b5b757
2011-06-23 13:54:02 -04:00
John Koleszar
b08c6fa699 Merge remote branch 'origin/master' into experimental
Change-Id: I24a548e3ce7794409b6731829f83befc0d465800
2011-05-10 00:05:10 -04:00
Johann
a7d4d3c550 clean up unused variable warnings
Change-Id: I9467d7a50eac32d8e8f3a2f26db818e47c93c94b
2011-05-09 12:56:20 -04:00
John Koleszar
5dfd6f51cb Merge remote branch 'origin/master' into experimental
Change-Id: I6f77e7c10a54c54b26126b8acd5edd0a03358a41
2011-04-22 00:05:08 -04:00
Scott LaVarnway
09c933ea80 Removed redundant checks of the mode_info_context flags
Code cleanup.  The build inter predictor functions are
redundantly checking the mode_info_context for either
INTRA_FRAME or SPLITMV.

Change-Id: I4d58c3a5192a4c2cec5c24ab1caf608bf13aebfb
2011-04-20 14:06:40 -04:00
John Koleszar
cb3e0aaba3 Merge remote branch 'origin/master' into experimental
Change-Id: I231e4dd65adcf4f5c158e3749880a18b8c36cbe4
2011-04-13 00:05:09 -04:00
John Koleszar
7ff5084f33 Merge remote branch 'origin/master' into experimental
Change-Id: Ib42656b05f2b099f17fd6c2033bbc3445421150c
2011-04-12 00:05:09 -04:00
Yunqing Wang
4fd81a99f8 Set cpu_used range to [-16, 16] in real-time mode
Remove encoding speed limitation in real-time mode.

Change-Id: Ib5e35d8bb522b2a25f3e4ad5cfe2788ebebb3617
2011-04-11 15:55:04 -04:00
Yunqing Wang
d1abe62d1c Define RDCOST only once
Clean up the code.

Change-Id: I7db048efa4d972b528d553a7921bc45979621129
2011-04-11 11:53:56 -04:00
John Koleszar
5f6db3591c Merge remote branch 'origin/master' into experimental
Conflicts:
	vp8/encoder/ratectrl.c
	vp8/encoder/rdopt.c

Change-Id: I4cc58acb432662d2c47aceda1680e52982adbc06
2011-03-23 00:24:25 -04:00
John Koleszar
429dc676b1 Increase static linkage, remove unused functions
A large number of functions were defined with external linkage, even
though they were only used from within one file. This patch changes
their linkage to static and removes the vp8_ prefix from their names,
which should make it more obvious to the reader that the function is
contained within the current translation unit. Functions that were
not referenced were removed.

These symbols were identified by:

  $ nm -A libvpx.a | sort -k3 | uniq -c -f2 | grep ' [A-Z] ' \
    | sort | grep '^ *1 '

Change-Id: I59609f58ab65312012c047036ae1e0634f795779
2011-03-17 20:53:47 -04:00
John Koleszar
1a7ce50a6c Merge remote branch 'origin/master' into experimental
Change-Id: I52f21ff6f9a1dca7099a8459657f6f288c5bfe40
2011-02-25 00:05:08 -05:00
Scott LaVarnway
861175ef00 Removed vp8_block2type
and used defines instead.

Change-Id: Idb56e0295d004793f406dfd2d8d8c546aad62e03
2011-02-24 14:35:18 -05:00
John Koleszar
4fafc4d985 Merge remote branch 'origin/master' into experimental
Change-Id: I8999a33db82d38eb85482f3c423db238d6ee3ed9
2011-02-18 00:05:11 -05:00
John Koleszar
02321de0f2 Fix relative include paths
Allow compiling without adding vp8/{common,encoder,decoder} to the
include paths.

Change-Id: Ifeb5dac351cdfadcd659736f5158b315a0030b6c
2011-02-10 15:09:44 -05:00
Yaowu Xu
5b42ae09ae experiment extending the quantizer range
Prior to this change, VP8 min quantizer is 4, which caps the
highest quality around 51DB. This experimental change extends
the min quantizer to 1, removes the cap and allows the highest
quality to be around ~73DB, consistent with the fdct/idct round trip
error. To test this change, at configure time use options:

--enable-experimental --enable-extend_qrange

The following is a brief log of changes in each of the patch sets

patch set 1:
In this commit, the quantization/dequantization constants are kept
unchanged, instead scaling factor 4 is rolled into fdct/idct.
Fixed Q0 encoding tests on mobile:
  Before:    9560.567kbps Overall PSNR:50.255DB VPXSSIM:98.288
  Now:   18035.774kbps Overall PSNR:73.022DB VPXSSIM:99.991

patch set 2:
regenerated dc/ac quantizer lookup tables based on the scaling
factor rolled in the fdct/idct. Also slightly extended the range
towards the high quantizer end.

patch set 3:
slightly tweaked the quantizer tables and generated bits_per_mb
table based on Paul's suggestions.

patch set 4:
fix a typo in idct, re-calculated tables relating active max Q
to active min Q

patch set 5:
added rdmult lookup table based on Q

patch set 6:
fix rdmult scale: dct coefficient has scaled up by 4

patch set 7:
make transform coefficients to be within 16bits

patch set 8:
normalize 2nd order quantizers

patch set 9:
fix mis-spellings

patch set 10:
change the configure script and macros to allow experimental code
to be enabled at configure time with --enable-extend_qrange

patch set 11:
rebase for merge

Change-Id: Ib50641ddd44aba2a52ed890222c309faa31cc59c
2011-01-19 13:22:35 -08:00
Henrik Lundin
48c28fc42c Remove unused local variables
Removing unused local variables causing compiler warnings in
Visual Studio.

Change-Id: I0e2096303be1fdbc01428a6e57cca9796bb32c8a
2011-01-11 15:22:19 +01:00
Yaowu Xu
062980cc48 Merge "adjust RDMULT for UV plane in quantization RDO" 2010-12-06 22:04:45 -08:00
Yaowu Xu
7c03a1c308 adjust RDMULT for UV plane in quantization RDO
This patch adds a weighting factor on RDMULT for UV blocks. The change
has an overall gain about 0.5% based on ssim, between 0.1 and 0.2% by
psnr numbers.

Change-Id: I97781b077ce3bb7e34241b03268491917e8d1d72
2010-12-06 20:53:59 -08:00
John Koleszar
5e76dfcc70 Merge 'Add simple version of activity masking.'
Merge commit 'refs/changes/79/779/2' of
    https://review.webmproject.org/p/libvpx

Conflicts:
	vp8/encoder/encodeintra.c
	vp8/encoder/encodemb.c

Change-Id: Id607063fabe92d99eeb3c380e8ca670b01bfb3ef
2010-12-03 13:30:50 -05:00
Yaowu Xu
ef2f27f10e make rdmult adaptive for intra in quantizer RDO
This intends to correct the tendency that VP8 aggressively favors rate
on intra coded frames. Experiments tested different numbers in [0, 1]
and found 9/16 overall provided about 2-4% gains for all-intra coded
clips based on vpx-ssim metric. The impact on regular encoded clips
is much smaller but positive overall. Overall impact on psnr is also
positive even though very small.

Change-Id: If808553aaaa87fdd44691f9787820ac9856d9f8a
2010-11-11 11:33:35 -08:00
John Koleszar
d6c67f02c9 make vp8_recon16x16mb{,y} RTCD functions
ARM NEON has a platform specific version of vp8_recon16x16mb, though
it's just a stub to extract the various parameters from the
MACROBLOCKD struct and pass them to vp8_recon16x16mb_neon(). Using
that function's prototype directly will be a better long term solution,
but it's quite an invasive change.

Change-Id: I04273149e2ade34749e2d09e7edb0c396e1dd620
2010-10-26 13:23:36 -04:00
Timothy B. Terriberry
8f75ea6b5c Convert [4][4] matrices to [16] arrays.
Most of the code that actually uses these matrices indexes them as
 if they were a single contiguous array, and coverity produces
 reports about the resulting accesses that overflow the static
 bounds of the first row.
This is perfectly legal in C, but converting them to actual [16]
 arrays should eliminate the report, and removes a good deal of
 extraneous indexing and address operators from the code.

Change-Id: Ibda479e2232b3e51f9edf3b355b8640520fdbf23
2010-10-21 17:04:30 -07:00
Yaowu Xu
2e53e9e53f change to make use of more trellis quantization
when a subsequent frame is encoded as an alt reference frame, it is
unlikely that any mb in current frame will be used as reference for
future frames, so we can enable quantization optimization even when
the RD constant is slightly rate-biased. The change has an overall
benefit between 0.1% to 0.2% bit savings on the test sets based on
vpxssim scores.

Change-Id: I9aa7bc5cd573ea84e3ee655d2834c18c4460ceea
2010-10-15 10:14:34 -07:00
John Koleszar
136857475e Centralize mb skip state calculation
This patch moves the scattered updates to the mb skip state
(mode_info_context->mbmi.mb_skip_coeff) to vp8_tokenize_mb. Recent
changes to the quantizer exposed a bug where if a macroblock
could be coded as a skip but isn't, the encoder would run the
loopfilter but the decoder wouldn't, causing a reference buffer
mismatch.

The loopfilter is controlled by a flag called dc_diff. The decoder
looks at the number of decoded coefficients when setting this flag.
The encoder sets this flag based on the skip state, since any
skippable macroblock should be transmitted as a skip. The coefficient
optimization pass (vp8_optimize_b()) could change the coefficients
such that a block that was not a skip becomes one. The encoder was
not updating the skip state in this situation for intra coded blocks.

The underlying issue predates it, but this bug was recently triggered
by enabling trellis quantization on the Y2 block in commit dcd29e3,
and by changing the quantizer range control in commit 305be4e.

Change-Id: I5cce5da0dbc2d22f7d79ee48149f01e868a64802
2010-10-12 09:03:19 -04:00
Timothy B. Terriberry
8d0f7a01e6 Add simple version of activity masking.
This uses MB variance to change the RDO weight for mode decision
 and quantization.
Activity is normalized against the average for the frame, which is
 currently tracked using feed-forward statistics.
This could also be used to adjust the quantizer for the entire
 frame, but that requires more extensive rate control changes.
This does not yet attempt to adapt the quantizer within the frame,
 but the signaling cost means that will likely only be useful at
 very high rates.

Change-Id: I26cd7c755cac3ff33cfe0688b1da50b2b87b9c93
2010-10-12 08:41:03 -04:00
Yaowu Xu
dcd29e369f enable trellis quantization for 2nd order blocks
Experimented with different value for Y2_RD_MULT ranging f[1, 32],
without adapting the value to MB coding mode/frame type/Q value,
4 works out best among all values, providing overall 0.1% coding
gain on the test set.

Change-Id: I6b2583a8aa5db5e7e5c65c646301909c0c58f876
2010-10-02 06:20:33 -07:00
John Koleszar
c2140b8af1 Use WebM in copyright notice for consistency
Changes 'The VP8 project' to 'The WebM project', for consistency
with other webmproject.org repositories.

Fixes issue #97.

Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
2010-09-09 10:01:21 -04:00
Scott LaVarnway
e85e631504 Changed above and left context data layout
The main reason for the change was to reduce cycles in the token
decoder. (~1.5% gain for 32 bit)  This layout should be more
cache friendly.

As a result of this change, the encoder had to be updated.

Change-Id: Id5e804169d8889da0378b3a519ac04dabd28c837
Note: dixie uses a similar layout
2010-08-31 11:24:30 -04:00
Scott LaVarnway
9c7a0090e0 Removed unnecessary MB_MODE_INFO copies
These copies occurred for each macroblock in the encoder and decoder.
Thetemp MB_MODE_INFO mbmi was removed from MACROBLOCKD.  As a result,
a large number compile errors had to be fixed.

Change-Id: I4cf0ffae3ce244f6db04a4c217d52dd256382cf3
2010-08-12 16:25:43 -04:00
Yaowu Xu
c404fa42ac Removed duplicate functions
Change-Id: Ie587972ccefd3c762b8cdf8ef39345cd22924b9b
2010-08-10 21:45:34 -07:00
Timothy B. Terriberry
8fa38096a3 Add trellis quantization.
Replace the exponential search for optimal rounding during
 quantization with a linear Viterbi trellis and enable it
 by default when using --best.
Right now this operates on top of the output of the adaptive
 zero-bin quantizer in vp8_regular_quantize_b() and gives a small
 gain.
It can be tested as a replacement for that quantizer by
 enabling the call to vp8_strict_quantize_b(), which uses
 normal rounding and no zero bin offset.
Ultimately, the quantizer will have to become a function of lambda
 in order to take advantage of activity masking, since there is
 limited ability to change the quantization factor itself.
However, currently vp8_strict_quantize_b() plus the trellis
 quantizer (which is lambda-dependent) loses to
 vp8_regular_quantize_b() alone (which is not) on my test clip.

Patch Set 3:

Fix an issue related to the cost evaluation of successor
states when a coefficient is reduced to zero. With this
issue fixed, now the trellis search almost exactly matches
the exponential search.

Patch Set 2:

Overall, the goal of this patch set is to make "trellis"
search to produce encodings that match the exponential
search version. There are three main differences between
Patch Set 2 and 1:
a. Patch set 1 did not properly account for the scale of
2nd order error, so patch set 2 disable it all together
for 2nd blocks.
b. Patch set 1 was not consistent on when to enable the
the quantization optimization. Patch set 2 restore the
condition to be consistent.
c. Patch set 1 checks quantized level L-1, and L for any
input coefficient was quantized to L. Patch set 2 limits
the candidate coefficient to those that were rounded up
to L. It is worth noting here that a strategy to check
L and L+1 for coefficients that were truncated down to L
might work.

(a and b get trellis quant to basically match the exponential
search on all mid/low rate encodings on cif set, without
a, b, trellis quant can hurt the psnr by 0.2 to .3db at
200kbps for some cif clips)
(c gets trellis quant  to match the exponential search
to match at Q0 encoding, without c, trellis quant can be
1.5 to 2db lower for encodings with fixed Q at 0 on most
derf cif clips)

Change-Id:	Ib1a043b665d75fbf00cb0257b7c18e90eebab95e
2010-08-10 20:58:24 -07:00
Yaowu Xu
d0dd01b8ce Redo the forward 4x4 dct
The new fdct lowers the round trip sum squared error for a
4x4 block ~0.12. or ~0.008/pixel. For reference, the old
matrix multiply version has average round trip error 1.46
for a 4x4 block.

Thanks to "derf" for his suggestions and references.

Change-Id: I5559d1e81d333b319404ab16b336b739f87afc79
2010-06-24 13:17:58 -07:00