J Comput Electron (****) *:*** ***

Super-resolution using neural networks based on the optimal recovery theory

Yizhen Huang · Yangjing Long

Published online: 18 January 2007
© Springer Science + Business Media, LLC 2007
Abstract  An optimal recovery based neural-network Super-Resolution algorithm is developed. The proposed method is computationally less expensive and outputs images with high subjective quality, compared with previous neural-network or optimal recovery algorithms. It is evaluated on classical SR test images with both generic and specialized training sets, and compared with other state-of-the-art methods. Results show that our algorithm is among the state-of-the-art, both in quality and efficiency.

Keywords  Super resolution · Neural-network · Optimal recovery

Y. Long: This work was supported in part by the National 211/985 Project of Shanghai Jiaotong University under Subgrant PRP[S03009009].

Y. Huang · Y. Long
Department of Computer Science and Engineering, School of Electronics Information & Electric Engineering, Shanghai Jiaotong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, PR China
e-mail: ***********@****.***.**

Y. Long
Department of Mathematics, Shanghai Jiaotong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, PR China;
Digital Media and Data Reconstruction Lab., Shanghai Jiaotong University

1 Introduction

With the popularization of various image-capturing devices such as commercial digital cameras, camphones and camcorders, digital images can be obtained more easily and conveniently than ever before. However, High-Resolution (HR) sensors are still costly. This is because, from the hardware perspective, high spatial resolution requires small pixel size, i.e. a large number of pixels per unit area, which limits the amount of light available for each pixel; this exaggerates the effect of shot noise and evidently degrades image quality. To avoid suffering from shot noise, there is a limit on pixel-size reduction: the optimally limited pixel size is estimated at about 40 μm² for a 0.35 μm CMOS process, and current image sensor technology has almost reached this level.

This urges and expedites the development of software solutions for increasing resolution. Such resolution enhancement techniques may be generally referred to as Super-Resolution (SR) image reconstruction, which primarily consists of three categories: (a) multiple-frame SR, which extracts a single HR image from a sequence of Low-Resolution (LR) images aligned with sub-pixel accuracy (see [1-3] for excellent surveys); (b) single-frame example-based SR, which increases the resolution of a single image with training based on analogous images [4-11, 29-31]; and (c) single-frame SR that does not depend on any other images [12-27]. The latter category of single-frame SR is more commonly referred to as image interpolation or digital zooming. In this article, we focus on single-frame SR.

A finer taxonomy can be made for single-frame SR according to the branches of their algorithms. Due to limited space, only a few are listed: (a) Linear methods [12-16] are relatively fast once the filter coefficients are found, especially short-kernel methods [12], but the visual quality of the resulting images is often quite unsatisfactory: shorter kernel filters usually result in blurred images or aliasing effects [13, 14], and with longer filters the images often exhibit ringing along sharp edges [15, 16]. (b) Images interpolated with edge-directed approaches [17, 18] look sharper but often seem too much like drawings, especially at large zoom factors. (c) The ideal filter response approaches [19, 20] offer theoretical perfection but poor practical
performance. (d) Projection-Onto-Convex-Set (POCS) algorithms produce images with high Peak Signal-to-Noise Ratio (PSNR) but are computationally expensive [21, 22]. (e) Partial Differential Equation (PDE) models [23, 24] establish inter-pixel correlation from a more abstract point of view, yet they usually yield images with jagged artifacts and are vulnerable to noise. (f) Optimal recovery interpolators [23, 25-27] and (g) neural-network based methods [29-31] are detailed in Section 2.

The branch to which each SR algorithm belongs intrinsically affects the output image quality and computing expense. Based on this observation, one way to find potentially useful SR algorithms is to introduce a new category of methodology. However, this is rather difficult considering that SR has been intensively studied for decades (see the seminal work by Tsai and Huang [33] for multiple-frame SR and the early textbook [34] for single-frame SR).

Another way is to combine several existing SR algorithms so as to take their respective advantages. Driven by this, a hybrid method of optimal recovery and neural-network is proposed in this paper. The rest of the manuscript is arranged as follows: Section 2 presents the two branches of algorithms from which our hybrid method evolves, together with the main idea of our method; Section 3 details the proposed method; Section 4 gives experimental results; finally, a conclusion is provided in Section 5.

2 Related work and main idea

Like other image reconstruction problems such as Color Filter Array interpolation [28, 32], SR is an inherently ill-posed inverse problem, since a virtually infinite set of magnified images may be consistent with the original given data. The problem is usually placed into an inter-pixel correlation model [6, 7, 11, 12, 15-17, 23, 24, 29-31], which has been acknowledged to be highly non-linear. Back-propagation neural networks are capable of learning complex nonlinear functions, which naturally motivates the development of neural-network approaches [29-32] that produce better results in high-frequency regions.

One problem in using neural-networks is that there is large information redundancy in the functional relation between input and output neurons, which greatly reduces efficiency. This is especially the case if neural-networks are applied directly in a brute-force way [29, 30, 32]. Another problem is that the training step for neural-networks is considerably time-consuming. Fortunately, training can be done as an initialization and does not occupy any computing resources during real-time processing. On the other hand, the training step is also an opportunity to incorporate a priori information. Such information is often supplied as example images, and statistics show that example images help to improve both the objective quality (commonly measured in terms of PSNR, Mean Square Error (MSE), or Normalized Color Difference (NCD)) and the subjective quality (namely, the visual experience of the Human Vision System (HVS)) of output images [4, 6, 8, 11].

Many image reconstruction algorithms based on optimal recovery theory have arisen recently [23, 25-28]. Although quite different from each other, they benefit from the common strengths intrinsic to the optimal recovery representation: the subjective quality of output images is satisfactory [27]. However, for each image patch they require not only matrix multiplications but also matrix inversions, which is inefficient and induces relatively large errors. Specifically, if the matrix to be inverted is singular or nearly singular, the induced errors are considerable. Therefore optimal recovery algorithms are slow and produce images with relatively inferior subjective quality [25, 27] among the state-of-the-art methods.

The success of these algorithms motivates us to ask whether it is possible to integrate an optimal recovery based approach within a neural-network framework and, if so, whether the two different branches of algorithms can complement each other to offer a better algorithm. We believe that the answers are positive. An important observation is that optimal recovery condenses and extracts local image patch information, enabling neural-networks to be trained and simulated in a more effective way.

Guided by these ideas, an optimal recovery based neural-network SR algorithm is developed, which is computationally less expensive and outputs images with high subjective quality, compared with previous neural-network or optimal recovery algorithms. It is evaluated on classical SR test images and compared with other state-of-the-art methods. Results show that our algorithm is among the state-of-the-art, both in quality and efficiency.

3 Proposed algorithm

The proposed algorithm consists of the following three portions:

3.1 Optimal recovery model

The first key point of using optimal recovery for image interpolation is determining the quadratic signal class K:

K = {x ∈ R^n : x^T Q x ≤ ε}

from a set of training data. The training data is usually taken from the local features of the image; selecting a proper training set is discussed at the end of this subsection. For now, assume that a training set of patches S = {x_1, . . ., x_m} representative of the local data is given for estimating the local quadratic signal class. The Q for which the ellipsoid
x^T Q x ≤ ε    (1)

for some constant ε must be representative of the training set S. In other words, Q must be a matrix such that when an image patch y is similar to the vectors in S then Eq. (1) holds for y. Let matrix S be formed by arranging the image patches in S as columns:

S = (x_1, . . ., x_m)    (2)

and consider the equation relating the image patch y to the training set S using a column of m weights, a:

S a = y    (3)

Vector y is similar to the vectors in S when a has small energy. Using standard notation for the singular value decomposition of S, Eq. (3) can be rewritten as:

U Σ V^T a = y

The weight vector a is given by

a = V Σ^(-1) U^T y

and the sum of squares, or the energy, of a is:

a^T a = y^T U Σ^(-2) U^T y

Since

S S^T = U Σ² U^T

it follows that

a^T a = y^T (S S^T)^(-1) y = y^T Q y

where Q is the pseudo-inverse of S S^T. If y is very similar to the training set S then

a^T a ≤ ε

and with the new Q it follows that

y^T Q y ≤ ε

This is the desired form for our ellipsoidal signal class. It results from an Occam's-razor type of assumption: small weights are used to represent vectors that are similar to our training set S. For more theoretical background, refer to [36] or the appendix of [27].

Fig. 1  Local HR image used for selecting S to estimate the quadratic class for the center 4×4 patch (dark pixels are part of the decimated image)

Given the formulation of the quadratic signal class from a training set S, or equivalently the formulation of Q, the next key point is determining the training set S itself. One commonly used approach is based on the proximity of patch locations to the position of the vector being modeled: patches are generated from the local neighborhood.

For example, in Fig. 1, to model the quadratic signal class that the center patch

x = [x_{2,2}, x_{2,3}, x_{2,4}, x_{2,5}, x_{3,2}, . . ., x_{5,5}]^T    (4)

belongs to, let

S = [ x_{0,0}  · · ·  x_{4,4}
      x_{0,1}  · · ·  x_{4,5}
        ⋮       ⋱       ⋮
      x_{3,3}  · · ·  x_{7,7} ]    (5)

where S is formed by choosing all possible 4×4 image blocks in the 8×8 region of Fig. 1.

However, only the dark pixels belong to the decimated image and are known; the other pixels are missing samples to be interpolated. To resolve this problem, an initial interpolation should be performed before applying our proposed interpolator. Section 3.3 discusses this issue specifically.

Once the quadratic signal class K is obtained, the known representors are products between Q^(-1) and vectors with a one at the location of a known sample and zeros everywhere else. The optimal recovery solution is a linear combination of the known representors.

For example, to achieve 2X magnification, the known representors (denoted F_{x,y}, where (x, y) is the location) in Fig. 1 correspond to the dark pixels, i.e. the known samples. The known representor F_{2,2} at location (2,2) is the product between Q^(-1) and the vector with 1 for x_{2,2} and 0 for all other vector components.

The grey pixels labeled 'a' are computed as the average of the known representors of their immediate left and right neighbors, e.g. the intensity of the pixel (2,3) is estimated and updated to be (F_{2,2} + F_{2,4})/2.
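As an illustration, the patch gathering of Eqs. (2) and (5) and the representor computation can be sketched in NumPy. This is our own reading of the method, not the authors' code; the function names and the use of `pinv` to guard against (nearly) singular matrices are our assumptions.

```python
import numpy as np

def local_signal_class(region):
    """Estimate Q for the center 4x4 patch of an 8x8 region (Eqs. (2), (5))."""
    # Gather every possible 4x4 block of the 8x8 region as a column of S
    cols = [region[r:r + 4, c:c + 4].reshape(16)
            for r in range(5) for c in range(5)]
    S = np.stack(cols, axis=1)          # 16 x 25 patch matrix
    return np.linalg.pinv(S @ S.T)      # Q = pseudo-inverse of S S^T

def known_representor(Q, sample_index):
    """Representor for one known sample: Q^{-1} times an indicator vector."""
    e = np.zeros(Q.shape[0])
    e[sample_index] = 1.0
    # pinv instead of inv: the text notes Q may be singular or nearly so
    return np.linalg.pinv(Q) @ e
```

Missing pixels such as (2,3) would then be filled from averages of neighboring representors, e.g. (F_{2,2} + F_{2,4})/2 as described above.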
Similarly, the grey pixels labeled 'b' are computed as the average of the known representors of their immediate top and bottom neighbors, and the 'c'-labeled pixels as the average of the known representors of their four immediate neighbors.

3.2 Neural-network design

Fig. 2  Schematic of a 3-layer feed-forward neural network

We use the 3-layer feed-forward neural network shown in Fig. 2: the input layer has 64 neurons, corresponding to the 8×8 image patch; the hidden layer has 50 neurons; and the output layer has 9 neurons, corresponding to the 9 known representors used. The transfer function of the hidden layer is tansig(x) = 2/(1 + exp(-2x)) - 1 and that of the output layer is logsig(x) = 1/(1 + exp(-x)). The training algorithm is the Scaled Conjugate Gradient method [35].

In the training step, the neural network is fed with 8×8 image patches, arranged in vector form, at the 64 input neurons, and with the 9 known representors computed by the optimal recovery algorithm of Section 3.1 as target values at the output neurons; in the simulation step, the known representors are generated by the neural network rather than by the optimal recovery theory.

Fig. 4  Twenty 768×512 example images for the generic training set. The images are numbered from left to right, top to bottom, of which the 7, 8, 9, 10 and 14th images are selected to form the specialized training set to SR the image Boat

3.3 Two-pass iterative mechanism

As stated in Section 3.1, an initial interpolation is needed before our proposed algorithm can work. From another perspective, our algorithm can be considered a process of updating a roughly interpolated SR image into a more precise one. Hence it can be utilized as an iterative process, with the output image of the previous iteration used as the input of the current iteration.

The simplest initial interpolation is pixel replication. Bilinear or bicubic initial interpolation can slightly improve the output image quality if our algorithm is used in a one-pass way, which indicates that the output image quality depends on the initial interpolation step. Interestingly, experimental experience shows that using our algorithm in a two-pass way generates visual results that are very similar regardless of the initial interpolation step, and further iterations only waste computing resources while yielding negligible performance gains. Therefore a two-pass iterative mechanism is adopted in the experiments of Section 4. Its block diagram is shown in Fig. 3.
Fig. 3 A block diagram of our proposed algorithm
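The 64-50-9 network of Section 3.2 can be sketched as follows. The weights here are random placeholders, since the Scaled Conjugate Gradient training of [35] is omitted; this only illustrates the simulation (forward) step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the paper: 64 inputs (8x8 patch), 50 hidden, 9 outputs
W1, b1 = rng.standard_normal((50, 64)) * 0.1, np.zeros(50)
W2, b2 = rng.standard_normal((9, 50)) * 0.1, np.zeros(9)

def tansig(x):
    # Hidden-layer transfer function: 2/(1 + exp(-2x)) - 1
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def logsig(x):
    # Output-layer transfer function: 1/(1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def simulate(patch):
    # Simulation step: map a flattened 8x8 patch to 9 known representors
    return logsig(W2 @ tansig(W1 @ patch + b1) + b2)
```

Note that tansig is mathematically identical to tanh; in training, the 9 targets would be the representors computed by the optimal recovery model.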
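The two-pass mechanism of Fig. 3 reduces to a short driver loop. The `refine` callback below stands in for one pass of the representor update (optimal recovery during training, the network during simulation); treating it as a single function is our simplification.

```python
import numpy as np

def two_pass_sr(lr_image, refine, factor=2):
    # Initial interpolation: pixel replication, the simplest choice above
    hr = np.kron(lr_image, np.ones((factor, factor)))
    # Two passes: the output of the first pass feeds the second
    for _ in range(2):
        hr = refine(hr)
    return hr
```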
Fig. 5  Boat's mast, 2X magnification (from left to right, top to bottom): (a) The entire true HR image (5X downsampled), (b) The true HR image cropped from (a), (c) Input (2X magnified), (d) The bicubic spline in Adobe Photoshop, (e) The optimal recovery algorithm, (f) The neural-network algorithm, (g) The proposed algorithm with generic training set, (h) The proposed algorithm with specialized training set

Fig. 6  Boat's mast, 3X magnification (from left to right, top to bottom): (a) Input (3X magnified), (b) Bicubic spline in Adobe Photoshop, (c) The optimal recovery algorithm, (d) The neural-network algorithm, (e) The proposed algorithm with generic training set, (f) The proposed algorithm with specialized training set

4 Experiments and results

For performance evaluation, our method is compared with traditional bicubic interpolation (the bicubic resampling tool in Adobe Photoshop), the state-of-the-art optimal recovery algorithm (our own implementation of the algorithm described in [27]) and the neural-network algorithm (obtained directly from the author [31]). All codes run on an AMD Athlon 64 3000 MHz. The twenty 768×512 example images displayed in Fig. 4 are used to generate 8×8 image patches for training the neural network.

In what follows, both subjective visual evaluation and objective quantitative measurement are presented. The Peak Signal-to-Noise Ratio (PSNR) is used as an objective indicator of image quality, defined by

PSNR = 10 log₁₀ [ 255² / ( (1 / (cols · rows)) Σ_{i=1}^{cols} Σ_{j=1}^{rows} e(i, j)² ) ]

where cols and rows are the column and row numbers of the image, respectively, 255 is the maximum value for pixel intensity, and e(i, j) is the difference between the real and computed images at pixel location (i, j).
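The PSNR definition translates directly into code; this sketch assumes a single channel (the paper averages PSNR over the 3 channels in Table 1).

```python
import numpy as np

def psnr(reference, computed):
    # e(i, j): difference between the real and computed images
    e = reference.astype(np.float64) - computed.astype(np.float64)
    mse = np.mean(e ** 2)             # (1/(cols*rows)) * sum of e(i,j)^2
    return 10.0 * np.log10(255.0 ** 2 / mse)
```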
Table 1  Average PSNR over the 3 channels of each method for noise-free 2X magnification (in dB)

         Bicubic   Optimal recovery   Neural-network   Proposed
Lena     39.72     34.19              38.35            39.63
Peppers  40.20     40.32              39.77            40.09
Boy      34.41     35.93              36.26            35.87
Boat     34.58     36.43              37.05            38.21

Table 2  Computation time corresponding to Table 1 (ratio to the proposed method)

         Optimal recovery   Neural-network   Proposed
Lena     2.81               13.42            1
Peppers  2.57               19.24            1
Boy      2.89               14.37            1
Boat     2.92               12.53            1

The PSNR results and the corresponding computation times of each method are listed in Tables 1 and 2. The bicubic spline is excluded from the computation-time comparison because its source code remains unavailable to us. Our algorithm is used as the baseline 1, to which the time ratios of all other methods are computed. We can see that our algorithm is computationally less expensive while offering an excellent PSNR in most cases. Figures 5 and 6 depict close-ups of the results on the image Boat by each method for 2X and 3X magnification, respectively. Due to limited space, only the mast part of the boat is cropped from the entire HR image and presented in full size. Our proposed method produces visually pleasing HR images with fine details at edges while synthesizing plausible texture and shine in the water-ripple region.

5 Conclusion

An optimal recovery based neural-network Super-Resolution algorithm is developed. The proposed method is computationally less expensive and outputs images with high subjective quality, compared with previous neural-network or optimal recovery algorithms. It is evaluated on classical SR test images and compared with other state-of-the-art methods. Results show that our algorithm is among the state-of-the-art, both in quality and efficiency.

Besides, we believe extensions of our algorithm, such as clustering and classification of example image patches, and using multiple neural-networks for specialized learning and simulation of image patches with different gradients and directions, may bring further performance increases. We will pursue this interesting direction in our future research.

References

1. Park, S.C., Park, M.K., Kang, M.G.: Super-resolution image reconstruction: a technical overview. IEEE Signal Proc. Mag. 20(3), 21-36 (2003)
2. Baker, S., Kanade, T.: Limits on super-resolution and how to break them. IEEE Trans. Patt. Anal. Mach. Intell. 24(9), 1167-1183 (2002)
3. Farsiu, S., Robinson, D., Elad, M., Milanfar, P.: Advances and challenges in super-resolution. Int. J. Imag. Sys. Technol. 14(2), 47-57 (2004)
4. Freeman, W.T., Pasztor, E.C., Carmichael, O.T.: Learning low-level vision. Int. J. Comp. Vis. 40(1), 25-47 (2000)
5. Qiu, G.: Inter-resolution look-up table for improved spatial magnification of images. J. Vis. Commun. Image Represent. 11, 360-373 (2000)
6. Freeman, W.T., Jones, T.R., Pasztor, E.C.: Example-based super-resolution. IEEE Comp. Graph. Appl. 22(2), 56-65 (2002)
7. Freeman, W.T., Pasztor, E.C.: Learning to estimate scenes from images. In: Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems II, pp. 775-781 (1998)
8. Sun, J., Zheng, N., Tao, H., Shum, H.: Image hallucination with primal sketch priors. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. II, pp. 729-736 (2003)
9. Freeman, W.T., Pasztor, E.C.: Markov networks for super-resolution. In: Proceedings of the 34th Annual Conference on Information Sciences and Systems (2000)
10. Hertzmann, A., Jacobs, C.E., Oliver, N., Curless, B., Salesin, D.H.: Image analogies. In: Proceedings of SIGGRAPH 2001, pp. 327-340 (2001)
11. Chang, H., Yeung, D.Y., Xiong, Y.: Super-resolution through neighbor embedding. In: Proceedings of the IEEE International Conference on Computer Vision, vol. I, pp. 275-282 (2004)
12. Keys, R.G.: Cubic convolution interpolation for digital image processing. IEEE Trans. Acoust., Speech and Signal Proc. 29, 1153-1160 (1981)
13. Ramstad, T., Wang, Y., Mitra, S.K.: An efficient image interpolation scheme using hybrid IIR Nyquist filters. Opt. Eng. 31, 1277-1283 (1992)
14. Chan, A.K., Chui, C.K., Zha, J., Liu, Q.: Local cardinal spline interpolation and its application to image processing. In: Proceedings of the Conference on Curves and Surfaces in Computer Vision and Graphics II, SPIE Proceedings, vol. 1610, pp. 272-283 (1991)
15. Unser, M., Aldroubi, A., Eden, M.: B-spline signal processing: part I, theory. IEEE Trans. Sign. Proc. 41, 821-833 (1993)
16. Unser, M., Aldroubi, A., Eden, M.: B-spline signal processing: part II, efficient design and applications. IEEE Trans. Sign. Proc. 41, 834-848 (1993)
17. Thurnhofer, S., Mitra, S.: Edge-enhanced image zooming. Opt. Eng. 35(7), 1862-1870 (1996)
18. Li, X., Orchard, M.T.: New edge-directed interpolation. IEEE Trans. Image Proc. 10(10), 1521-1527 (2001)
19. Chan, S.C., Ho, K.L., Kok, C.W.: Interpolation of 2-D signal by subsequence FFT. IEEE Trans. Circuits Sys. Part II 40, 115-118 (1993)
20. Kim, S.P., Su, W.: Direct image resampling using block transform coefficients. Sign. Proc.: Image Commun. 5, 259-272 (1993)
21. Stasinski, R., Konrad, J.: POCS-based image reconstruction from irregularly-spaced samples. In: Proceedings of the 7th IEEE International Conference on Image Processing, vol. 2, pp. 315-318 (2000)
22. Stasinski, R., Konrad, J.: Improved POCS-based image reconstruction from irregularly-spaced samples. In: Proceedings of the 7th European Signal Processing Conference, vol. 2, pp. 461-464 (2002)
23. Chen, G., de Figueiredo, R.J.P.: A unified approach to optimal image interpolation problems based on linear partial differential equation models. IEEE Trans. Image Proc. 2, 41-49 (1993)
24. Karayiannis, N.B., Venetsanopoulos, A.N.: Image interpolation based on variational principles. Sign. Proc. 25, 259-288 (1991)
25. Muresan, D.D., Parks, T.W.: Optimal recovery approach to image interpolation. In: Proceedings of the 8th IEEE International Conference on Image Processing, vol. III, pp. 7-10 (2001)
26. Shenoy, R.G., Parks, T.W.: An optimal recovery approach to interpolation. IEEE Trans. Sign. Proc. 40(8), 1987-1996 (1992)
27. Muresan, D.D., Parks, T.W.: Adaptively quadratic (AQua) image interpolation. IEEE Trans. Image Proc. 13(5), 690-698 (2004)
28. Muresan, D.D., Parks, T.W.: Demosaicing using optimal recovery. IEEE Trans. Image Proc. 14(2), 267-278 (2005)
29. Swanston, D.J., Bishop, J.M., Mitchell, R.J.: Simple adaptive momentum: new algorithm for training multilayer perceptrons. Elect. Lett. 30, 1498-1500 (1994)
30. Scalero, R.S., Tepedelenlioglu, N.: A fast new algorithm for training feedforward neural networks. IEEE Trans. Sign. Proc. 40, 203-210 (1992)
31. Plaziac, N.: Image interpolation using neural networks. IEEE Trans. Image Proc. 8, 1647-1651 (1999)
32. Go, J., Sohn, K., Lee, C.: Interpolation using neural networks for digital still cameras. IEEE Trans. Consum. Elect. 46(3), 610-616 (2000)
33. Tsai, R.Y., Huang, T.S.: Multiframe image restoration and registration. Adv. Comp. Vis. Image Proc. 1, 317-339 (1984)
34. Andrews, H.C., Hunt, B.R.: Digital Image Restoration. Prentice-Hall, NJ (1977)
35. Moller, M.F.: A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks 6(4), 525-533 (1993)
36. Micchelli, C.A., Rivlin, T.J.: Optimal Estimation in Approximation Theory. Plenum, NY (1976)