J Comput Electron (****) *:*** ***

Super-resolution using neural networks based on the optimal recovery theory

Yizhen Huang · Yangjing Long

Published online: 18 January 2007
© Springer Science + Business Media, LLC 2007
Abstract  An optimal recovery based neural-network Super-Resolution algorithm is developed. The proposed method is computationally less expensive and outputs images with high subjective quality, compared with previous neural-network or optimal recovery algorithms. It is evaluated on classical SR test images with both generic and specialized training sets, and compared with other state-of-the-art methods. Results show that our algorithm is among the state-of-the-art, both in quality and efficiency.

Keywords  Super resolution · Neural-network · Optimal recovery

Y. Long: This work was supported in part by the National 211/985 Project of Shanghai Jiaotong University under Subgrant PRP[S03009009].

Y. Huang · Y. Long
Department of Computer Science and Engineering, School of Electronics Information & Electric Engineering, Shanghai Jiaotong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, PR China
e-mail: ***********@****.***.**

Y. Long
Department of Mathematics, Shanghai Jiaotong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, PR China;
Digital Media and Data Reconstruction Lab., Shanghai Jiaotong University

1 Introduction

With the popularization of various image-capturing devices such as commercial digital cameras, camphones and camcorders, digital images can be obtained more easily and conveniently than ever before. However, High-Resolution (HR) sensors are still costly. This is because, from the hardware perspective, high spatial resolution requires small pixel size, i.e. a large number of pixels per unit area, which limits the amount of light available for each pixel; this exaggerates the effect of shot noise and evidently degrades image quality. To avoid suffering from shot noise, there is a limit on pixel-size reduction: the optimally limited pixel size is estimated at about 40 μm² for a 0.35 μm CMOS process, and current image sensor technology has almost reached this level.

This urges and expedites the development of software solutions for increasing resolution. Such resolution enhancement techniques may be generally referred to as Super-Resolution (SR) image reconstruction, which primarily consists of three categories: (a) multiple-frame SR, which extracts a single HR image from a sequence of Low-Resolution (LR) images aligned with sub-pixel accuracy (see [1-3] for excellent surveys); (b) single-frame example-based SR, which increases the resolution of a single image with training based on analogous images [4-11, 29-31]; and (c) single-frame SR that does not depend on any other images [12-27]. The latter category of single-frame SR is more commonly referred to as image interpolation or digital zooming. In this article, we focus on single-frame SR.

A finer taxonomy can be made for single-frame SR according to the branches of their algorithms. Due to limited space, only a few are listed: (a) Linear methods [12-16] are relatively fast once the filter coefficients are found, especially short-kernel methods [12], but the visual quality of the resulting images is often quite unsatisfactory: shorter kernel filters usually result in blurred images or aliasing effects [13, 14], and with longer filters the images often exhibit ringing along sharp edges [15, 16]. (b) Images interpolated with edge-directed approaches [17, 18] look sharper but often seem too much like drawings, especially at large zoom factors. (c) The ideal filter response approaches [19, 20] offer theoretical perfection but poor practical
performance. (d) Projection-Onto-Convex-Set (POCS) algorithms produce images with high Peak Signal-to-Noise Ratio (PSNR) but are computationally expensive [21, 22]. (e) Partial Differential Equation (PDE) models [23, 24] establish inter-pixel correlation from a more abstract point of view, yet they usually yield images with jagged artifacts and are vulnerable to noise. (f) Optimal recovery interpolators [23, 25-27] and (g) neural-network based methods [29-31] are detailed in Section 2.

The branch to which each SR algorithm belongs intrinsically affects the output image quality and computing expense. Based on this observation, one way to find potentially useful SR algorithms is to introduce a new category of methodology. However, this is rather difficult considering that SR has been intensively studied for decades (see the seminal work by Tsai and Huang [33] for multiple-frame SR and the early textbook [34] for single-frame SR).

Another way is to combine several existing SR algorithms so as to take their respective advantages. Driven by this, a hybrid method of optimal recovery and neural-network is proposed in this paper. The rest of the manuscript is arranged as follows: Section 2 presents the two branches of algorithms from which our hybrid method evolves, together with the main idea of our method; Section 3 details the proposed method; Section 4 gives experimental results; finally, a conclusion is provided in Section 5.

2 Related work and main idea

Like other image reconstruction problems such as Color Filter Array interpolation [28, 32], SR is an inherently ill-posed inverse problem, since a virtually infinite set of magnified images may be consistent with the original given data. The problem is usually placed into an inter-pixel correlation model [6, 7, 11, 12, 15-17, 23, 24, 29-31], which has been acknowledged to be highly non-linear. Back-propagation neural networks are capable of learning complex nonlinear functions, which naturally motivates the development of neural-network approaches [29-32] that produce better results in high-frequency regions.

One problem in using neural-networks is that there is large information redundancy in the functional relation between input and output neurons, which greatly reduces efficiency. This is especially the case if neural-networks are applied directly in a brute-force way [29, 30, 32]. Another problem is that the training step for neural-networks is considerably time-consuming. Fortunately, training can be done as an initialization and does not occupy any computing resources during real-time processing. On the other hand, the training step is also an opportunity to incorporate a priori information. Such information is often supplied as example images, and statistics show that example images help to improve both the objective quality (commonly measured in terms of PSNR, Mean Square Error (MSE), or Normalized Color Difference (NCD)) and the subjective quality (namely, the visual experience of the Human Vision System (HVS)) of output images [4, 6, 8, 11].

Many image reconstruction algorithms based on optimal recovery theory have arisen recently [23, 25-28]. Although quite different from each other, they benefit from the common strengths intrinsic to the optimal recovery representation: the subjective quality of output images is satisfactory [27]. However, for each image patch they require not only matrix multiplications but also matrix inversions, which is inefficient and induces relatively large errors. Specifically, if the matrix to be inverted is singular or nearly singular, the induced errors are considerable. Therefore optimal recovery algorithms are slow and produce images with relatively inferior subjective quality [25, 27] among the state-of-the-art methods.

The success of these algorithms motivates us to ask whether it is possible to integrate an optimal recovery based approach within a neural-network framework and, if so, whether the two different branches of algorithms can complement each other to offer a better algorithm. We believe that the answers are positive. An important observation is that optimal recovery condenses and extracts local image patch information, enabling neural-networks to be trained and simulated in a more effective way.

Guided by these ideas, an optimal recovery based neural-network SR algorithm is developed, which is computationally less expensive and outputs images with high subjective quality, compared with previous neural-network or optimal recovery algorithms. It is evaluated on classical SR test images and compared with other state-of-the-art methods. Results show that our algorithm is among the state-of-the-art, both in quality and efficiency.

3 Proposed algorithm

The proposed algorithm consists of the following three portions:

3.1 Optimal recovery model

The first key point of using optimal recovery for image interpolation is determining the quadratic signal class K:

K = {x ∈ R^n : x^T Q x ≤ ε}

from a set of training data. The training data is usually taken from the local features of the image; selecting a proper training set is discussed at the end of this subsection. For now, assume that a training set of patches S = {x_1, . . ., x_m} representative of the local data is given for estimating the local quadratic signal class. The Q for which the ellipsoid
x^T Q x ≤ ε    (1)

for some constant ε must be representative of the training set S. In other words, Q must be a matrix such that when an image patch y is similar to the vectors in S then Eq. (1) holds for y. Let matrix S be formed by arranging the image patches in S as columns:

S = (x_1, . . ., x_m)    (2)

and consider the equation relating the image patch y to the training set S using a column of m weights, a:

S a = y    (3)

Vector y is similar to the vectors in S when a has small energy. Using standard notation for the singular value decomposition of S, Eq. (3) can be rewritten as:

U Σ V^T a = y

The weight vector a is given by

a = V Σ^(-1) U^T y

and the sum of squares, or the energy, of a is:

a^T a = y^T U Σ^(-2) U^T y

Since

S S^T = U Σ² U^T

it follows that

a^T a = y^T (S S^T)^(-1) y = y^T Q y

where Q is the pseudo-inverse of S S^T. If y is very similar to the training set S then

a^T a ≤ ε

and with the new Q it follows that

y^T Q y ≤ ε

This is the desired form for our ellipsoidal signal class. It results from an Occam's-razor type of assumption: small weights are used to represent vectors that are similar to our training set S. For more theoretical background, refer to [36] or the appendix of [27].

Fig. 1  Local HR image used for selecting S to estimate the quadratic class for the center 4×4 patch (dark pixels are part of the decimated image)

Given the formulation of the quadratic signal class from a training set S, or equivalently the formulation of Q, the next key point is determining the training set S itself. One commonly used approach is based on the proximity of patch locations to the position of the vector being modeled: patches are generated from the local neighborhood.

For example, in Fig. 1, to model the quadratic signal class that the center patch

x = [x_{2,2}, x_{2,3}, x_{2,4}, x_{2,5}, x_{3,2}, . . ., x_{5,5}]^T    (4)

belongs to, let

S = [ x_{0,0}  · · ·  x_{4,4}
      x_{0,1}  · · ·  x_{4,5}
        ⋮       ⋱       ⋮
      x_{3,3}  · · ·  x_{7,7} ]    (5)

where S is formed by choosing all possible 4×4 image blocks in the 8×8 region of Fig. 1.

However, only the dark pixels belong to the decimated image and are known; the other pixels are missing samples to be interpolated. To resolve this problem, an initial interpolation should be performed before applying our proposed interpolator. Section 3.3 discusses this issue specifically.

Once the quadratic signal class K is obtained, the known representors are products between Q^(-1) and vectors with a one at the location of a known sample and zeros everywhere else. The optimal recovery solution is a linear combination of the known representors.

For example, to achieve 2X magnification, the known representors (denoted F_{x,y}, where (x, y) is the location) in Fig. 1 correspond to the dark pixels, i.e. the known samples. The known representor F_{2,2} at location (2,2) is the product between Q^(-1) and the vector with 1 for x_{2,2} and 0 for all other vector components.

The grey pixels labeled 'a' are computed as the average of the known representors of their immediate left and right neighbors, e.g. the intensity of the pixel (2,3) is estimated and updated to be (F_{2,2} + F_{2,4})/2.
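As an illustration, the patch gathering of Eqs. (2) and (5) and the representor computation can be sketched in NumPy. This is our own reading of the method, not the authors' code; the function names and the use of `pinv` to guard against (nearly) singular matrices are our assumptions.

```python
import numpy as np

def local_signal_class(region):
    """Estimate Q for the center 4x4 patch of an 8x8 region (Eqs. (2), (5))."""
    # Gather every possible 4x4 block of the 8x8 region as a column of S
    cols = [region[r:r + 4, c:c + 4].reshape(16)
            for r in range(5) for c in range(5)]
    S = np.stack(cols, axis=1)          # 16 x 25 patch matrix
    return np.linalg.pinv(S @ S.T)      # Q = pseudo-inverse of S S^T

def known_representor(Q, sample_index):
    """Representor for one known sample: Q^{-1} times an indicator vector."""
    e = np.zeros(Q.shape[0])
    e[sample_index] = 1.0
    # pinv instead of inv: the text notes Q may be singular or nearly so
    return np.linalg.pinv(Q) @ e
```

Missing pixels such as (2,3) would then be filled from averages of neighboring representors, e.g. (F_{2,2} + F_{2,4})/2 as described above.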
Similarly, the grey pixels labeled 'b' are computed as the average of the known representors of their immediate top and bottom neighbors, and the 'c'-labeled pixels as the average of the known representors of their four immediate neighbors.

3.2 Neural-network design

Fig. 2  Schematic of a 3-layer feed-forward neural network

We use the 3-layer feed-forward neural network shown in Fig. 2: the input layer has 64 neurons, corresponding to the 8×8 image patch; the hidden layer has 50 neurons; and the output layer has 9 neurons, corresponding to the 9 known representors used. The transfer function of the hidden layer is tansig(x) = 2/(1 + exp(-2x)) - 1 and that of the output layer is logsig(x) = 1/(1 + exp(-x)). The training algorithm is the Scaled Conjugate Gradient method [35].

In the training step, the neural network is fed with 8×8 image patches, arranged in vector form, at the 64 input neurons, and with the 9 known representors computed by the optimal recovery algorithm of Section 3.1 as target values at the output neurons; in the simulation step, the known representors are generated by the neural network rather than by the optimal recovery theory.

Fig. 4  Twenty 768×512 example images for the generic training set. The images are numbered from left to right, top to bottom, of which the 7, 8, 9, 10 and 14th images are selected to form the specialized training set to SR the image Boat

3.3 Two-pass iterative mechanism

As stated in Section 3.1, an initial interpolation is needed before our proposed algorithm can work. From another perspective, our algorithm can be considered a process of updating a roughly interpolated SR image into a more precise one. Hence it can be utilized as an iterative process, with the output image of the previous iteration used as the input of the current iteration.

The simplest initial interpolation is pixel replication. Bilinear or bicubic initial interpolation can slightly improve the output image quality if our algorithm is used in a one-pass way, which indicates that the output image quality depends on the initial interpolation step. Interestingly, experimental experience shows that using our algorithm in a two-pass way generates visual results that are very similar regardless of the initial interpolation step, and further iterations only waste computing resources while yielding negligible performance gains. Therefore a two-pass iterative mechanism is adopted in the experiments of Section 4. Its block diagram is shown in Fig. 3.
Fig. 3 A block diagram of our proposed algorithm
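The 64-50-9 network of Section 3.2 can be sketched as follows. The weights here are random placeholders, since the Scaled Conjugate Gradient training of [35] is omitted; this only illustrates the simulation (forward) step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the paper: 64 inputs (8x8 patch), 50 hidden, 9 outputs
W1, b1 = rng.standard_normal((50, 64)) * 0.1, np.zeros(50)
W2, b2 = rng.standard_normal((9, 50)) * 0.1, np.zeros(9)

def tansig(x):
    # Hidden-layer transfer function: 2/(1 + exp(-2x)) - 1
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def logsig(x):
    # Output-layer transfer function: 1/(1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def simulate(patch):
    # Simulation step: map a flattened 8x8 patch to 9 known representors
    return logsig(W2 @ tansig(W1 @ patch + b1) + b2)
```

Note that tansig is mathematically identical to tanh; in training, the 9 targets would be the representors computed by the optimal recovery model.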
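The two-pass mechanism of Fig. 3 reduces to a short driver loop. The `refine` callback below stands in for one pass of the representor update (optimal recovery during training, the network during simulation); treating it as a single function is our simplification.

```python
import numpy as np

def two_pass_sr(lr_image, refine, factor=2):
    # Initial interpolation: pixel replication, the simplest choice above
    hr = np.kron(lr_image, np.ones((factor, factor)))
    # Two passes: the output of the first pass feeds the second
    for _ in range(2):
        hr = refine(hr)
    return hr
```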
Fig. 5  Boat's mast, 2X magnification (from left to right, top to bottom): (a) The entire true HR image (5X downsampled), (b) The true HR image cropped from (a), (c) Input (2X magnified), (d) The bicubic spline in Adobe Photoshop, (e) The optimal recovery algorithm, (f) The neural-network algorithm, (g) The proposed algorithm with generic training set, (h) The proposed algorithm with specialized training set

Fig. 6  Boat's mast, 3X magnification (from left to right, top to bottom): (a) Input (3X magnified), (b) Bicubic spline in Adobe Photoshop, (c) The optimal recovery algorithm, (d) The neural-network algorithm, (e) The proposed algorithm with generic training set, (f) The proposed algorithm with specialized training set

4 Experiments and results

For performance evaluation, our method is compared with traditional bicubic interpolation (the bicubic resampling tool in Adobe Photoshop), the state-of-the-art optimal recovery algorithm (our own implementation of the algorithm described in [27]) and the neural-network algorithm (obtained directly from the author [31]). All codes run on an AMD Athlon 64 3000 MHz. The twenty 768×512 example images displayed in Fig. 4 are used to generate 8×8 image patches for training the neural network.

In what follows, both subjective visual evaluation and objective quantitative measurement are presented. The Peak Signal-to-Noise Ratio (PSNR) is used as an objective indicator of image quality, defined by

PSNR = 10 log₁₀ [ 255² / ( (1 / (cols · rows)) Σ_{i=1}^{cols} Σ_{j=1}^{rows} e(i, j)² ) ]

where cols and rows are the column and row numbers of the image, respectively, 255 is the maximum value for pixel intensity, and e(i, j) is the difference between the real and computed images at pixel location (i, j).
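The PSNR definition translates directly into code; this sketch assumes a single channel (the paper averages PSNR over the 3 channels in Table 1).

```python
import numpy as np

def psnr(reference, computed):
    # e(i, j): difference between the real and computed images
    e = reference.astype(np.float64) - computed.astype(np.float64)
    mse = np.mean(e ** 2)             # (1/(cols*rows)) * sum of e(i,j)^2
    return 10.0 * np.log10(255.0 ** 2 / mse)
```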
Table 1  Average PSNR over the 3 channels of each method for noise-free 2X magnification (in dB)

         Bicubic   Optimal recovery   Neural-network   Proposed
Lena     39.72     34.19              38.35            39.63
Peppers  40.20     40.32              39.77            40.09
Boy      34.41     35.93              36.26            35.87
Boat     34.58     36.43              37.05            38.21

Table 2  Computation time corresponding to Table 1 (ratio to the proposed method)

         Optimal recovery   Neural-network   Proposed
Lena     2.81               13.42            1
Peppers  2.57               19.24            1
Boy      2.89               14.37            1
Boat     2.92               12.53            1

The PSNR results and the corresponding computation times of each method are listed in Tables 1 and 2. The bicubic spline is excluded from the computation-time comparison because its source code remains unavailable to us. Our algorithm is used as the baseline 1, to which the time ratios of all other methods are computed. We can see that our algorithm is computationally less expensive while offering an excellent PSNR in most cases. Figures 5 and 6 depict close-ups of the results on the image Boat by each method for 2X and 3X magnification, respectively. Due to limited space, only the mast part of the boat is cropped from the entire HR image and presented in full size. Our proposed method produces visually pleasing HR images with fine details at edges while synthesizing plausible texture and shine in the water-ripple region.

5 Conclusion

An optimal recovery based neural-network Super-Resolution algorithm is developed. The proposed method is computationally less expensive and outputs images with high subjective quality, compared with previous neural-network or optimal recovery algorithms. It is evaluated on classical SR test images and compared with other state-of-the-art methods. Results show that our algorithm is among the state-of-the-art, both in quality and efficiency.

Besides, we believe extensions of our algorithm, such as clustering and classification of example image patches, and using multiple neural-networks for specialized learning and simulation of image patches with different gradients and directions, may bring further performance increases. We will pursue this interesting direction in our future research.

References

1. Park, S.C., Park, M.K., Kang, M.G.: Super-resolution image reconstruction: a technical overview. IEEE Signal Proc. Mag. 20(3), 21-36 (2003)
2. Baker, S., Kanade, T.: Limits on super-resolution and how to break them. IEEE Trans. Patt. Anal. Mach. Intell. 24(9), 1167-1183 (2002)
3. Farsiu, S., Robinson, D., Elad, M., Milanfar, P.: Advances and challenges in super-resolution. Int. J. Imag. Sys. Technol. 14(2), 47-57 (2004)
4. Freeman, W.T., Pasztor, E.C., Carmichael, O.T.: Learning low-level vision. Int. J. Comp. Vis. 40(1), 25-47 (2000)
5. Qiu, G.: Inter-resolution look-up table for improved spatial magnification of images. J. Vis. Commun. Image Represent. 11, 360-373 (2000)
6. Freeman, W.T., Jones, T.R., Pasztor, E.C.: Example-based super-resolution. IEEE Comp. Graph. Appl. 22(2), 56-65 (2002)
7. Freeman, W.T., Pasztor, E.C.: Learning to estimate scenes from images. In: Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems II, pp. 775-781 (1998)
8. Sun, J., Zheng, N., Tao, H., Shum, H.: Image hallucination with primal sketch priors. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. II, pp. 729-736 (2003)
9. Freeman, W.T., Pasztor, E.C.: Markov networks for super-resolution. In: Proceedings of the 34th Annual Conference on Information Sciences and Systems (2000)
10. Hertzmann, A., Jacobs, C.E., Oliver, N., Curless, B., Salesin, D.H.: Image analogies. In: Proceedings of SIGGRAPH 2001, pp. 327-340 (2001)
11. Chang, H., Yeung, D.Y., Xiong, Y.: Super-resolution through neighbor embedding. In: Proceedings of the IEEE International Conference on Computer Vision, vol. I, pp. 275-282 (2004)
12. Keys, R.G.: Cubic convolution interpolation for digital image processing. IEEE Trans. Acoust., Speech and Signal Proc. 29, 1153-1160 (1981)
13. Ramstad, T., Wang, Y., Mitra, S.K.: An efficient image interpolation scheme using hybrid IIR Nyquist filters. Opt. Eng. 31, 1277-1283 (1992)
14. Chan, A.K., Chui, C.K., Zha, J., Liu, Q.: Local cardinal spline interpolation and its application to image processing. In: Proceedings of the Conference on Curves and Surfaces in Computer Vision and Graphics II, SPIE Proceedings, vol. 1610, pp. 272-283 (1991)
15. Unser, M., Aldroubi, A., Eden, M.: B-spline signal processing: part I, theory. IEEE Trans. Sign. Proc. 41, 821-833 (1993)
16. Unser, M., Aldroubi, A., Eden, M.: B-spline signal processing: part II, efficient design and applications. IEEE Trans. Sign. Proc. 41, 834-848 (1993)
17. Thurnhofer, S., Mitra, S.: Edge-enhanced image zooming. Opt. Eng. 35(7), 1862-1870 (1996)
18. Li, X., Orchard, M.T.: New edge-directed interpolation. IEEE Trans. Image Proc. 10(10), 1521-1527 (2001)
19. Chan, S.C., Ho, K.L., Kok, C.W.: Interpolation of 2-D signal by subsequence FFT. IEEE Trans. Circuits Sys. Part II 40, 115-118 (1993)
20. Kim, S.P., Su, W.: Direct image resampling using block transform coefficients. Sign. Proc.: Image Commun. 5, 259-272 (1993)
21. Stasinski, R., Konrad, J.: POCS-based image reconstruction from irregularly-spaced samples. In: Proceedings of the 7th IEEE International Conference on Image Processing, vol. 2, pp. 315-318 (2000)
22. Stasinski, R., Konrad, J.: Improved POCS-based image reconstruction from irregularly-spaced samples. In: Proceedings of the 7th European Signal Processing Conference, vol. 2, pp. 461-464 (2002)
23. Chen, G., de Figueiredo, R.J.P.: A unified approach to optimal image interpolation problems based on linear partial differential equation models. IEEE Trans. Image Proc. 2, 41-49 (1993)
24. Karayiannis, N.B., Venetsanopoulos, A.N.: Image interpolation based on variational principles. Sign. Proc. 25, 259-288 (1991)
25. Muresan, D.D., Parks, T.W.: Optimal recovery approach to image interpolation. In: Proceedings of the 8th IEEE International Conference on Image Processing, vol. III, pp. 7-10 (2001)
26. Shenoy, R.G., Parks, T.W.: An optimal recovery approach to interpolation. IEEE Trans. Sign. Proc. 40(8), 1987-1996 (1992)
27. Muresan, D.D., Parks, T.W.: Adaptively quadratic (AQua) image interpolation. IEEE Trans. Image Proc. 13(5), 690-698 (2004)
28. Muresan, D.D., Parks, T.W.: Demosaicing using optimal recovery. IEEE Trans. Image Proc. 14(2), 267-278 (2005)
29. Swanston, D.J., Bishop, J.M., Mitchell, R.J.: Simple adaptive momentum: new algorithm for training multilayer perceptrons. Elect. Lett. 30, 1498-1500 (1994)
30. Scalero, R.S., Tepedelenlioglu, N.: A fast new algorithm for training feedforward neural networks. IEEE Trans. Sign. Proc. 40, 203-210 (1992)
31. Plaziac, N.: Image interpolation using neural networks. IEEE Trans. Image Proc. 8, 1647-1651 (1999)
32. Go, J., Sohn, K., Lee, C.: Interpolation using neural networks for digital still cameras. IEEE Trans. Consum. Elect. 46(3), 610-616 (2000)
33. Tsai, R.Y., Huang, T.S.: Multiframe image restoration and registration. Adv. Comp. Vis. Image Proc. 1, 317-339 (1984)
34. Andrews, H.C., Hunt, B.R.: Digital Image Restoration. Prentice-Hall, NJ (1977)
35. Moller, M.F.: A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks 6(4), 525-533 (1993)
36. Micchelli, C.A., Rivlin, T.J.: Optimal Estimation in Approximation Theory. Plenum, NY (1976)