Restructure multi-threaded decoder

On each MB, loopfiltering is done right after MB decoding. This
combines two loops in multi-threaded code into one, which reduces
number of synchronizations to half.

The above-row/left-col data are saved in temp buffers for
next-row/next MB decoding.

Tests on 4-core gLucid machine showed 10% decoder performance
gain with threads=4 (tulip clip). Testing on other platforms
isn't done yet.

Change-Id: Id18ea7c1e84965dabea65d4c01ca5bc056ddeac9
This commit is contained in:
Yunqing Wang
2010-09-16 14:08:52 -04:00
parent 9100073e8d
commit f857a85088
8 changed files with 1624 additions and 474 deletions

View File

@@ -15,12 +15,12 @@
#ifndef _DECODER_THREADING_H
#define _DECODER_THREADING_H
extern void vp8_mtdecode_mb_rows(VP8D_COMP *pbi,
MACROBLOCKD *xd);
extern void vp8_mt_loop_filter_frame(VP8D_COMP *pbi);
extern void vp8_stop_lfthread(VP8D_COMP *pbi);
extern void vp8_start_lfthread(VP8D_COMP *pbi);
#if CONFIG_MULTITHREAD
extern void vp8mt_decode_mb_rows(VP8D_COMP *pbi, MACROBLOCKD *xd);
extern void vp8_decoder_remove_threads(VP8D_COMP *pbi);
extern void vp8_decoder_create_threads(VP8D_COMP *pbi);
extern int vp8mt_alloc_temp_buffers(VP8D_COMP *pbi, int width, int prev_mb_rows);
extern void vp8mt_de_alloc_temp_buffers(VP8D_COMP *pbi, int mb_rows);
#endif
#endif