Constructing an accurate prediction model from a small training data set is an important but difficult task in the field of forecasting. When the data size is small, the incomplete data may mean that the resulting model cannot sufficiently represent the true data structure, or may cause the model training to overfit. To address this issue, this paper presents an approach that combines multiple prediction models to extract data information in multiple facets. In the multi-model approach, a compromise weight method is proposed to determine the relative reliability of each prediction model. The methods used include multiple regression, an artificial neural network, and support vector machines for regression. A thin-film transistor liquid crystal display manufacturing case study is used to illustrate the details of this research. The empirical results show that the proposed multi-model approach not only reduces manufacturing variation and increases production yield, but also provides a robust and reliable parameter interval for the online engineers in the early manufacturing stage.
Keywords: manufacturing; forecasting; multi-model; small data set; TFT-LCD
In traditional pattern-recognition systems, a single (or particular) learning algorithm is used to predict the true pattern of a given training data set (Ho et al. [
In previous studies of manufacturing yield modelling, Kumar et al. ([
Process capability indices (PCIs) have been proposed for the manufacturing industry to control the process yield or line yield (defined as the percentage of processed product units passing inspection) using past in-control data. For a single characteristic, the production yield can be measured by several well-known PCIs, but the challenge is measuring the production yield with multiple characteristics (Pearn et al. [
A multiple-model approach has been proposed to take advantage of the many kinds of pattern analysis and different facets of data information (Kuncheva [
There are many studies that discuss combining multiple models. Todorovski and Džeroski ([
Small data sets are associated with incomplete data and insufficient information. In data analysis, incomplete data may mean that the model produced cannot sufficiently represent the true data structure, which increases the model variation. Management in the early stages of production is a difficult but critical task, because problems must be detected and policies corrected as early as possible. However, in the early manufacturing process, the online engineers usually determine the manufacturing parameters based on small pilot data sets and their own manufacturing experience. The rough parameters thus produced usually cause more variation in the production yield.
In this study, we propose a multi-model system to help the engineers determine the manufacturing parameters in a factory. Unlike past prediction methods that use a single prediction model, the proposed method combines several learning models to acquire the prediction pattern from the data. Moreover, some previous multi-model studies address only classification problems, so their weight-combination methods are designed for classification purposes. The main aim of this approach is to produce a robust forecasting model to determine the manufacturing parameters. We develop the multi-model system by combining various forecasting models, including multiple regression, a backpropagation (BP) neural network, and support vector machines for regression (SVR) with Gaussian and polynomial kernels. The system is built on model weights determined by the models' output accuracies (mean absolute errors). Note that a compromise weight determination method is developed here on the basis of the mega-trend diffusion (MTD) function proposed by Li et al. ([
The reasons for selecting these models are as follows. The multiple regression model is used to build up the linear relation between the input and output variables. A backpropagation neural network is a multiple-layer neural network that can forecast well using nonlinear data, such as with the XOR (exclusive OR) problem (Haykin [
A real problem from a thin-film transistor liquid crystal display (TFT-LCD) manufacturing factory in Taiwan is examined in this paper. Robustness and yield improvement in the factory are the performance indices used to show that the proposed multi-model system is effective when the training data set is small.
The remainder of this paper is organised as follows. In Section 2, a review of small-data-set analysis literature is presented, and several prediction models are introduced. Section 3 shows the detailed multi-model procedure. The empirical case study in TFT-LCD manufacturing and the multi-model system performance are presented in Section 4. Finally, conclusions are given in Section 5, along with a suggestion for future research.
In this section, we will review the literature on small-data-set analysis and several prediction models that include the backpropagation neural network and support-vector regression.
Small data sets usually suffer from incomplete data and insufficient information; in data analysis, this may mean that any model that is developed does not accurately represent the data structure. There are many studies that discuss small-data-set problems. One approach, virtual sample generation, is a data-preprocessing method proposed to enhance the prediction performance for small-data-set problems. The original idea of virtual data generation was proposed by Niyogi et al. ([
Another approach is usually named the wrapper method, in which artificial neural networks and kernel methods are used to deal with small-data-set problems. For example, Huang and Moraga ([
Since the first neural-network model, the perceptron, was proposed in 1957, many different neural-network models have been developed, such as back-propagation neural networks and self-organising map networks. Over the last two decades, artificial neural networks (ANN) (Haykin [
A back-propagation algorithm is also called a generalised delta rule; from the perspective of a learning algorithm, it can be seen as a nonlinear extension of the least-mean-square (LMS) approach. This kind of neural network is commonly known as a back-propagation neural network, with the structure shown in Figure 1.
Figure 1. Structure of a back-propagation neural network.
Backpropagation learning consists of two passes through the layers of the network: a forward pass and a backward pass. In the forward pass, the input vector propagates layer by layer through the network to the output. If $X$ is the input data set and $\mathbf{x}$ denotes an input signal with $M$ attributes, then in the $q$th training iteration the net input to neuron $j$ in layer $s$ of an $S$-layer network is

$$v_j^{(s)}(q) = \sum_i w_{ji}^{(s)}(q)\, y_i^{(s-1)}(q),$$

and the output of neuron $j$ is $y_j^{(s)}(q) = \varphi\big(v_j^{(s)}(q)\big)$, where $\varphi(\cdot)$ is the activation function.

The new weight, $w_{ji}(q+1)$, is then updated by $w_{ji}(q+1) = w_{ji}(q) + \Delta w_{ji}(q)$, where $\Delta w_{ji}(q) = \eta\,\delta_j(q)\, y_i(q)$ and $\eta$ is the learning rate.

The local gradient is $\delta_j(q) = e_j(q)\,\varphi'\big(v_j(q)\big)$ if neuron $j$ is in the output layer of the network, or $\delta_j(q) = \varphi'\big(v_j(q)\big)\sum_k \delta_k(q)\, w_{kj}(q)$ if neuron $j$ is in a hidden layer, where $\delta_k$ is the local gradient propagated back from neuron $k$ of the next layer.
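As an illustration, one forward/backward training step for a one-hidden-layer network can be sketched in pure Python. Sigmoid activations, the network size, and the learning rate of 0.3 are arbitrary choices for the example, not the paper's exact configuration:

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def train_step(x, t, w_hid, w_out, eta=0.3):
    """One forward/backward pass for a one-hidden-layer network.

    w_hid[j][i] weights input i to hidden node j; w_out[j] weights
    hidden node j to the single output node. Returns the squared error.
    """
    # Forward pass: propagate the input layer by layer to the output.
    hid = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hid]
    y = sigmoid(sum(w * h for w, h in zip(w_out, hid)))

    # Backward pass: local gradients (generalised delta rule).
    delta_out = (t - y) * y * (1.0 - y)             # output-layer delta
    delta_hid = [h * (1.0 - h) * delta_out * w      # hidden-layer deltas
                 for h, w in zip(hid, w_out)]

    # Weight updates: w(q+1) = w(q) + eta * delta * input signal.
    for j in range(len(w_out)):
        w_out[j] += eta * delta_out * hid[j]
    for j, row in enumerate(w_hid):
        for i in range(len(row)):
            row[i] += eta * delta_hid[j] * x[i]
    return (t - y) ** 2
```

Repeating `train_step` over the training data drives the squared error down by gradient descent, which is the behaviour the BPNN model in the later experiments relies on.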
A support-vector machine (SVM) is a promising pattern-recognition technique that was first proposed by Vapnik ([
$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \frac{1}{2}\|\mathbf{w}\|^{2} + C\sum_{n=1}^{N}\xi_{n} \quad \text{subject to} \quad y_{n}\big(\mathbf{w}^{\top}\mathbf{x}_{n} + b\big) \ge 1 - \xi_{n},\quad \xi_{n} \ge 0.$$

Here, the parameter C tunes the trade-off between the margin and the acceptable amount of error; the $\xi_n$ are slack variables that measure how far a point falls on the wrong side of its margin.
Vapnik extended the SVM to regression models, treating a regression problem in the same optimisation framework as classification. Given a training set of N samples $\{(\mathbf{x}_n, t_n)\}_{n=1}^{N}$, where $\mathbf{x}_n$ is the input vector and $t_n$ is the corresponding target value, an $\varepsilon$-insensitive error function is proposed as a trade-off between the acceptable amounts of error.
$$E_{\varepsilon}\big(y(\mathbf{x}) - t\big) = \begin{cases} 0, & |y(\mathbf{x}) - t| < \varepsilon, \\ |y(\mathbf{x}) - t| - \varepsilon, & \text{otherwise.} \end{cases}$$
When minimising the objective function (also called the error function):
$$C\sum_{n=1}^{N} E_{\varepsilon}\big(y(\mathbf{x}_n) - t_n\big) + \frac{1}{2}\|\mathbf{w}\|^{2}.$$
We can re-express the optimisation problem by introducing slack variables into the model. For each data point $\mathbf{x}_n$, we use two slack variables, $\xi_n \ge 0$ and $\hat{\xi}_n \ge 0$, to describe the points that lie outside the tube $y(\mathbf{x}) \pm \varepsilon$, where $\xi_n > 0$ corresponds to a point for which $t_n > y(\mathbf{x}_n) + \varepsilon$, and $\hat{\xi}_n > 0$ corresponds to a point for which $t_n < y(\mathbf{x}_n) - \varepsilon$, as illustrated in Figure 2.
Figure 2. Illustration of SVM regression.
The condition for a target point to lie inside the $\varepsilon$-tube is that $y(\mathbf{x}_n) - \varepsilon \le t_n \le y(\mathbf{x}_n) + \varepsilon$. The purpose of introducing the slack variables is to allow points to lie outside the tube provided that the slack variables are non-zero, and the corresponding conditions are:

$$t_n \le y(\mathbf{x}_n) + \varepsilon + \xi_n, \qquad t_n \ge y(\mathbf{x}_n) - \varepsilon - \hat{\xi}_n.$$
Hence, the error function for support vector regression can then be written as:
$$C\sum_{n=1}^{N}\big(\xi_n + \hat{\xi}_n\big) + \frac{1}{2}\|\mathbf{w}\|^{2}.$$
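As a sanity check on the loss behaviour, the $\varepsilon$-insensitive error can be written as a small helper; the default ε = 0.05 matches the SVR settings used later in the case study:

```python
def eps_insensitive(y_pred, t, eps=0.05):
    """Epsilon-insensitive error: zero inside the eps-tube, linear outside."""
    r = abs(y_pred - t)
    return 0.0 if r < eps else r - eps
```

Residuals smaller than ε cost nothing, so the fitted function is not penalised for staying anywhere inside the tube; only excursions beyond it contribute linearly to the objective.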
Using the Lagrange multipliers to solve the quadratic programming problem, we can find the dual problem as:
$$\max_{a,\hat{a}}\ -\frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N}(a_n - \hat{a}_n)(a_m - \hat{a}_m)\,k(\mathbf{x}_n, \mathbf{x}_m) - \varepsilon\sum_{n=1}^{N}(a_n + \hat{a}_n) + \sum_{n=1}^{N}(a_n - \hat{a}_n)\,t_n$$

subject to $0 \le a_n \le C$, $0 \le \hat{a}_n \le C$, and $\sum_{n=1}^{N}(a_n - \hat{a}_n) = 0$,
where $k(\mathbf{x}, \mathbf{x}')$ is called a positive semidefinite kernel (or Mercer kernel), which is symmetric (i.e. $k(\mathbf{x}, \mathbf{x}') = k(\mathbf{x}', \mathbf{x})$) and satisfies the following equation:
$$\iint k(\mathbf{x}, \mathbf{x}')\, g(\mathbf{x})\, g(\mathbf{x}')\, d\mathbf{x}\, d\mathbf{x}' \ge 0 \quad \text{for all square-integrable functions } g.$$
Each Mercer kernel can be expressed as $k(\mathbf{x}, \mathbf{x}') = \langle \phi(\mathbf{x}), \phi(\mathbf{x}') \rangle$, where $\phi$ is a feature mapping and $\langle \cdot, \cdot \rangle$ is the inner product. Moreover, commonly used kernel functions are polynomial kernels, radial-basis-function kernels, and two-layer perceptron kernels (Yu et al. 2008):
$$k(\mathbf{x}, \mathbf{x}') = \big(\mathbf{x}^{\top}\mathbf{x}' + 1\big)^{d}, \qquad k(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|^{2}}{2\sigma^{2}}\right), \qquad k(\mathbf{x}, \mathbf{x}') = \tanh\!\big(\kappa\,\mathbf{x}^{\top}\mathbf{x}' + \theta\big),$$
where the kernel parameters ($d$, $\sigma$, $\kappa$, and $\theta$) are specified a priori by the user.
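The two kernels used later in the case study, polynomial and Gaussian, can be sketched directly. The defaults d = 4 and 0.5 mirror the tuned values in the case study, though whether the toolbox's Gaussian parameter 0.5 plays the role of σ or of another width parameterisation is an assumption here:

```python
import math

def poly_kernel(x, z, d=4):
    """Polynomial kernel (x . z + 1)^d."""
    return (sum(a * b for a, b in zip(x, z)) + 1.0) ** d

def gaussian_kernel(x, z, sigma=0.5):
    """Radial-basis-function kernel exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

Both functions are symmetric in their arguments, as a Mercer kernel must be, and the Gaussian kernel of any point with itself is exactly 1.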
There are many studies that discuss the performance of SVR for different kernels, such as Sánchez ([
In this section, we introduce the proposed multi-model for small data sets and the implementation procedure.
Different methods and techniques can be applied to analyse and process data, and each has its own unique approach to extracting valuable, method-oriented information. Our goal is therefore to develop a prediction model that combines multiple forecast models to extract information from multiple facets. Some of these prediction models forecast well in general, while others perform well specifically for minimum mean absolute (or squared) errors or minimum forecast variances on a small data set. Given that $S = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ is the training data set, where N is the size of the data set, $\mathbf{x}_i$ is the vector of M explanatory variables for the ith sample in S, and the response variable $y_i$ is a real number, the forecast model can be constructed by aggregating multiple forecast models over the small training data set S.
Two commonly used structures for combining multiple models are shown in Figure 3. On the left-hand side, the model is generated by applying k different learning algorithms to a single training data set S; on the right-hand side, the model is generated by applying a single learning algorithm A with d different model-parameter settings to the same training data set S.
Figure 3. Structures for combining multiple models.
The main aim of the two proposed structures is to extract a variety of relationships among the data attributes to construct a combined forecast model for the small data set. Figure 4 shows the multi-model system, which includes two main phases: multiple models and model combination. In the multiple-model phase, we not only use different learning algorithms, but also consider different parameter settings within a single learning machine.
Figure 4. Multi-model system for small data sets.
In the model-combination phase, deciding the combination weights is an important task in the multi-model system, whose main purpose is to minimise the forecast mean absolute errors. The forecast model, represented by the multiple models $f_1, \ldots, f_k$ with weights $w_1, \ldots, w_k$ drawn from the weight space $W$, is denoted $f(\mathbf{x}) = \sum_{k} w_k f_k(\mathbf{x})$. Thus, the prediction error (PE) is defined as in Byon et al. ([
$$\mathrm{PE} = E_{S}\Big[E_{(\mathbf{x}, y)}\,\big|\,y - f(\mathbf{x}; S)\,\big|\Big],$$
where the outer expectation of the absolute error is taken over the training set S, whose members are independently and identically distributed, and the inner expectation is taken over the test observations $(\mathbf{x}, y)$.
In the small-training-data problem, it is unlikely that minimum prediction errors can be achieved; the training model may suffer from overfitting, and so compromise weights should be considered. In this paper, we modify the model using the mega-trend diffusion (MTD) function, which was built to deal with the quantity problem of small data sets. The detailed concept of the MTD function is described below.
The mega-trend diffusion (MTD) function was first proposed by Li et al. ([
Figure 5. MTD function (Li et al. [
Suppose $X = \{x_1, \ldots, x_N\}$ is the given training sample; the boundaries a and b are then defined as follows:
$$a = u_{set} - Skew_L \times \sqrt{-2 \times \big(\hat{s}^2 / N_L\big) \times \ln\!\big(10^{-20}\big)},$$

$$b = u_{set} + Skew_U \times \sqrt{-2 \times \big(\hat{s}^2 / N_U\big) \times \ln\!\big(10^{-20}\big)},$$
where $u_{set} = \big(\min(X) + \max(X)\big)/2$ is the centre of the data range, $\hat{s}^2$ is the variance of X, $N_L$ and $N_U$ are the numbers of data points smaller and larger than $u_{set}$, respectively, and $Skew_L = N_L/(N_L + N_U)$ and $Skew_U = N_U/(N_L + N_U)$ are the skewness weights. The diffusion (membership) value of a point x is then

$$M(x) = \begin{cases} (x - a)/(u_{set} - a), & a \le x \le u_{set}, \\ (b - x)/(b - u_{set}), & u_{set} < x \le b. \end{cases}$$
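A minimal sketch of the MTD computation follows, assuming the common form of the boundaries from Li et al. (the 10⁻²⁰ constant is the usual negligible-membership threshold); the guard against an empty side of the range is an implementation choice of this sketch:

```python
import math

def mtd_bounds(x):
    """Mega-trend diffusion boundaries (a, b) and centre u_set for sample x."""
    u_set = (min(x) + max(x)) / 2.0
    mean = sum(x) / len(x)
    s2 = sum((v - mean) ** 2 for v in x) / (len(x) - 1)  # sample variance
    n_l = sum(1 for v in x if v < u_set) or 1            # guard empty side
    n_u = sum(1 for v in x if v > u_set) or 1
    skew_l = n_l / (n_l + n_u)
    skew_u = n_u / (n_l + n_u)
    spread = -2.0 * math.log(1e-20)
    a = u_set - skew_l * math.sqrt(spread * s2 / n_l)
    b = u_set + skew_u * math.sqrt(spread * s2 / n_u)
    return a, b, u_set

def mtd_height(v, a, b, u_set):
    """Triangular membership height of v within [a, b], peaking at u_set."""
    if v < a or v > b:
        return 0.0
    if v <= u_set:
        return (v - a) / (u_set - a)
    return (b - v) / (b - u_set)
```

Applied to a small sample, the boundaries extend beyond the observed minimum and maximum, which is exactly how MTD compensates for the information gaps of a small data set.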
In our model, the mean absolute errors, $e_1, \ldots, e_k$, of the multiple models are given as the input to the modified mega-trend diffusion (MMTD) function. Each mean absolute error $e_k$ has a corresponding sample variance for its forecast model. We collect the model variances and compute the expected model variance to replace the variance term used in MTD. Hence, the boundaries a and b in MMTD are defined as follows:
$$a = \bar{e}_{set} - Skew_L \times \sqrt{-2 \times \big(\bar{s}^2 / N_L\big) \times \ln\!\big(10^{-20}\big)},$$

$$b = \bar{e}_{set} + Skew_U \times \sqrt{-2 \times \big(\bar{s}^2 / N_U\big) \times \ln\!\big(10^{-20}\big)},$$
where $\bar{e}_{set}$ is the average of the minimum and maximum errors, $\bar{s}^2$ is the expected variance of the corresponding models, and $N_L$ and $N_U$ are the numbers of model errors smaller and larger than $\bar{e}_{set}$, respectively. The height of an error $e_k$ in the MMTD function is

$$h(e_k) = \begin{cases} (e_k - a)/(\bar{e}_{set} - a), & a \le e_k \le \bar{e}_{set}, \\ (b - e_k)/(b - \bar{e}_{set}), & \bar{e}_{set} < e_k \le b, \end{cases}$$

and these heights are used as the corresponding model weights.
Figure 6 shows the MMTD function; the height at each error point $e_k$ gives the compromise weight of the corresponding model.
Figure 6. MMTD function.
The proposed procedure can be summarised as follows.
Assume that S is a small training data set and T is the testing data set; each sample $\mathbf{x}_i$ (i = 1, ..., N in S, or i = 1, ..., q in T) has M attributes, and $y_i$ is the target value of $\mathbf{x}_i$, which is a continuous number.
Step 1 (data standardisation): Standardise the data set when the scales of the explanatory variables differ widely.
Step 2 (training models): Train the chosen models, including the single model with different parameter settings, using the training data set S.
Step 3 (collecting the mean absolute errors): Input the testing data into the models trained in Step 2 and collect the mean absolute errors $e_1, \ldots, e_k$.
Step 4 (weight determination): Construct the MMTD function and compute the height of each error in the MMTD function as the corresponding model weight.
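Steps 3 and 4, plus the final weighted forecast, can be condensed into two small helpers. The triangular-height form over user-supplied bounds [a, b] is an assumption following the MTD shape, and the normalisation of heights to sum to one is an implementation choice of this sketch:

```python
def mmtd_weights(errors, a, b):
    """Heights of each model error under a triangular MMTD profile on
    [a, b], peaking at the midpoint of the smallest and largest error;
    heights are normalised so that the weights sum to one."""
    centre = (min(errors) + max(errors)) / 2.0
    def height(e):
        if e <= centre:
            return (e - a) / (centre - a)
        return (b - e) / (b - centre)
    h = [height(e) for e in errors]
    total = sum(h)
    return [v / total for v in h]

def combine(forecasts, weights):
    """Weighted multi-model forecast: y_hat = sum_k w_k * f_k(x)."""
    return sum(w * f for w, f in zip(weights, forecasts))
```

With the weights in hand, each new input is pushed through every trained model and the individual forecasts are blended by `combine`, which is the multi-model system's final prediction.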
We use an empirical study in TFT-LCD manufacturing to illustrate our multi-model system. The details of problem definition, factor selection, data collection, model parameter setting, and the experiment design are described in the following subsections.
The process of producing a TFT-LCD is mainly separated into three steps: the TFT and Colour Filter (CF) Process, the Cell Process, and the Module Process. A colour filter is one of the important components of a TFT-LCD, and the key processes in manufacturing a CF include coating the black matrix (BM), green (G), red (R), blue (B), ITO (Indium Tin Oxide), and photo-spacer (PS) layers (Figure 7); these are flow-shop production-type processes. The coating of the photo-spacer layer is the most important step, because it is the last process in making a CF, in which the target PSH (photo-spacer height) value is determined by the height of the photo-spacer layer. If the PSH does not meet the specifications, the cost of reprocessing is much greater than that for the other layers. Therefore, cell-manufacturing engineers state that related panel data must be provided to calculate the cell-manufacturing parameters when CF base panels are shipped to the cell-making station.
Figure 7. Main processes in manufacturing CF.
The structure of PSH is shown in Figure 8, where PSth denotes the thickness of the photo-spacer layer, Rth the thickness of the R layer, and BMth the thickness of the BM layer. The thicknesses of the BM, R, and photo-spacer layers are mainly determined by the coating speed in the CF manufacturing process: when the coating speed is low, the thickness is high, and vice versa. The final index that the cell process requires is PSH, whose value is estimated as a function of the Rth, BMth, and PSth thicknesses.
Figure 8. PSH structure.
When PSH exceeds the specification in a CF, the cell has a defect known as 'Gravity Mura', and when PSH is below the specification, it has a defect known as 'Push Mura'. It is difficult to control the quality performance of PSH in CF manufacturing, because the PSH value accumulates the variance of the BM, R, and photo-spacer layers.
For cost control, a random sampling strategy is used to measure the PSH: the quality-control engineer can sample only one out of every 100 glass panels. In the SPC (statistical process control) database, only 13 PSH measurements are available for the previous 3 months.
In this case, only 13 measurement data points were collected from a TFT-LCD manufacturing company in Taiwan, and Table 1 shows only the first five data points for business reasons. In the multi-model system, three commonly used forecast models are applied: multiple regression, a backpropagation neural network (BPNN), and a support vector machine for regression (SVR). In the SVR model, two kinds of kernel, polynomial and Gaussian, are employed. The analysis tool is Matlab 7.0 with the statistics and neural-network toolboxes, plus an SVM toolbox, which is downloaded from
Table 1. First five raw data.
PSH     Rth     BMth    PSth
3.4879  2.0471  1.3472  3.8273
3.5525  2.0297  1.3930  3.8796
3.5999  2.0327  1.3936  3.9238
3.5350  2.0216  1.3792  3.8896
3.5178  2.0752  1.3590  3.8684
For the parameter sensitivity analysis, this research uses the 13 collected data points and leave-one-out cross-validation to ensure that the learning model is robust. The mean absolute error (MAE) is the performance index. In the model-parameter sensitivity experiment, the two SVM kernel parameters are based on the range of parameter values suggested by Ali and Smith-Miles ([
Table 2. Parameter sensitivity analysis results.
SVR learning model (Gaussian kernel)
  Parameter:  0.4     0.5     0.6     0.7     0.8     0.9     1
  MAE:        0.0288  0.0285  0.0289  0.0285  0.0287  0.0289  0.0287
SVR learning model (polynomial kernel)
  Parameter:  1       2       3       4       5
  MAE:        0.0289  0.0287  0.0286  0.0285  0.0285
BPNN learning model
  Parameter (LR/HNs):  (0.3/3) (0.3/4) (0.3/5) (0.4/3) (0.4/4) (0.4/5) (0.5/3) (0.5/4) (0.5/5)
  MAE:                 0.0210  0.0263  0.0206  0.0244  0.0261  0.0267  0.0245  0.0255  0.0318
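The leave-one-out MAE used throughout this sensitivity analysis can be sketched generically: hold out each sample in turn, train on the rest, and average the absolute prediction errors. The one-variable least-squares learner below is only a stand-in for the paper's actual models:

```python
def loocv_mae(xs, ys, fit, predict):
    """Leave-one-out cross-validation MAE for any fit/predict pair."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]   # all samples except the i-th
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        errors.append(abs(predict(model, xs[i]) - ys[i]))
    return sum(errors) / len(errors)

def fit_line(xs, ys):
    """One-variable least squares, a stand-in learner for the example."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept
```

With only 13 samples, leave-one-out is the natural choice: every model is validated on all 13 points while still never predicting a sample it was trained on.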
Table 3. Parameter settings for the corresponding training models.
Multiple regression: none
BPNN: learning rate = 0.3; one hidden layer; five hidden nodes; 8000 learning iterations
SVR (polynomial): kernel = polynomial; degree = 4; C = 1000; epsilon = 0.05
SVR (Gaussian): kernel = Gaussian; kernel parameter = 0.5; C = 1000; epsilon = 0.05
The attributes PSH, Rth, and BMth in Table 1 are the input features, and the prediction target is PSth, because we expect to find the best PSth parameter setting for the PS layer in CF manufacturing. In the model building, we train the prediction models with the 13 collected data points, and leave-one-out cross-validation is used to find the mean absolute errors of the prediction models. Table 4 shows the mean absolute errors and the corresponding standard deviations (STDs) for the prediction models. Finally, we use the mean absolute errors of the prediction models as the input to the MMTD function and compute the compromise weights for the four prediction models.
Table 4. Mean absolute errors and standard deviation of the prediction models.
        Multiple regression   BPNN     SVR (polynomial)   SVR (Gaussian)
Mean    0.0427                0.0206   0.0285             0.0285
SD      0.0276                0.0167   0.0155             0.0179
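For illustration, compromise weights can be recomputed from the Table 4 MAEs under the triangular-MMTD assumption described earlier. The diffusion bounds here are derived from the errors' own spread, so the resulting numbers are illustrative rather than the paper's exact weights:

```python
import math

# Table 4 mean absolute errors: regression, BPNN, SVR(poly), SVR(Gaussian)
maes = [0.0427, 0.0206, 0.0285, 0.0285]

centre = (min(maes) + max(maes)) / 2.0
mean = sum(maes) / len(maes)
s2 = sum((e - mean) ** 2 for e in maes) / (len(maes) - 1)
n_l = sum(1 for e in maes if e < centre) or 1
n_u = sum(1 for e in maes if e > centre) or 1
spread = -2.0 * math.log(1e-20)
a = centre - (n_l / (n_l + n_u)) * math.sqrt(spread * s2 / n_l)
b = centre + (n_u / (n_l + n_u)) * math.sqrt(spread * s2 / n_u)

def height(e):
    # triangular MMTD height, peaking at the centre of the error range
    return (e - a) / (centre - a) if e <= centre else (b - e) / (b - centre)

heights = [height(e) for e in maes]
weights = [h / sum(heights) for h in heights]
```

Note that, unlike an accuracy-proportional scheme, the compromise profile does not hand all the weight to the single best model; every model retains a positive weight, which is what stabilises the forecast on a small data set.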
This paper uses model robustness and the empirical results in the factory to verify whether the proposed multi-model system is effective for small-data-set prediction. To test model robustness, we use 30 experimental products as the testing data set. Table 5 shows the mean absolute errors and standard deviations for the prediction models, including the proposed multi-model system using the combination weights computed from the 13 training data points. The multi-model system has a smaller mean absolute error and STD than the other prediction models, except for the multiple regression model. Comparing the performance of the multiple regression model in Tables 4 and 5, we find that the STDs differ between the two cases. To verify the effectiveness of the proposed approach, we designed a further experiment for the small-data-set scenario by randomly sampling training data sets of sizes 5, 10, 15, 20, and 25 from the 30 experimental products; the experiment was repeated 30 times for each data-set size. The results in Table 6 clearly show that the STD of the proposed method is lower than that of the other models when the data size is small. For the case with 30 samples, although the MAE of the proposed method is not lower than that of linear regression, the difference is not statistically significant in a t-test. Hence, we conclude that the proposed method is effective and robust when the data set is small.
Table 5. Mean absolute errors and standard deviation of the prediction models and the multi-model for 30 data.
        Multiple regression   BPNN     SVR (polynomial)   SVR (Gaussian)   Multi-model
Mean    0.0132                0.0383   0.0315             0.0320           0.0261
SD      0.0118                0.0293   0.0210             0.0294           0.0185
Table 6. Experiment results for the small-data-set scenarios.
Size         Multiple regression   BPNN     SVR (polynomial)   SVR (Gaussian)   Multi-model
5    MAE     0.0683                0.0644   0.0715             0.0662           0.0652
     SD      0.0658                0.0390   0.0314             0.0546           0.0305
10   MAE     0.0305                0.0328   0.0345             0.0447           0.0312
     SD      0.0202                0.0276   0.0301             0.0326           0.0196
15   MAE     0.0287                0.0292   0.0286             0.0299           0.0287
     SD      0.0164                0.0314   0.0293             0.0192           0.0157
20   MAE     0.0292                0.0274   0.0330             0.0359           0.0286
     SD      0.0156                0.0138   0.0215             0.0253           0.0136
25   MAE     0.0291                0.0258   0.0123             0.0236           0.0209
     SD      0.0141                0.0121   0.0209             0.0232           0.0120
To verify the empirical results, we build upper and lower limits for the PSth parameter settings and require the online engineers to follow these specifications when setting the PSth parameters. The lower and upper limits of PSH, set by the module engineers, are 3.43 and 3.53, respectively, and the corresponding values of Rth and BMth are 2.026 and 1.358. The upper and lower limits of the PSth values obtained from the models are shown in Table 7.
Table 7. PSth values obtained from the models.
             PSH    Rth     BMth    Multiple regression   BPNN     SVR (poly)   SVR (Gaussian)   Multi-model
Lower limit  3.43   2.026   1.358   3.7862                3.8460   3.8365       3.8364           3.8261
Upper limit  3.53   2.026   1.358   3.8578                3.8852   3.8386       3.8587           3.8599
After 20 weeks of tracking in the factory, the production yield of the PS layer was indeed improved from 95.61% to 98.96%, and the yield variation decreased from 6.35 to 0.29. The PS yield chart is shown in Figure 9.
Figure 9. PS yield in CF.
As product life cycles have become shorter, management in the early stages of manufacturing systems has become more important. Unfortunately, the data collected in the early manufacturing stage are usually small and incomplete. This paper proposed a multi-model approach for small-data-set prediction in order to take advantage of various prediction-learning algorithms and achieve more robust and accurate prediction results. In addition, the MMTD method was developed to build the compromise weight system, and a TFT-LCD manufacturing case study was used to demonstrate the proposed model in detail. The empirical verification with 30 pilot experimental products and the resulting yield improvement show that the proposed method is both useful and effective when the data size is small. Regarding the model parameter settings, since the proposed model has been shown to be robust across various parameter values, an engineer who wants to apply this methodology can start with the parameter values listed in Table 3. However, parameter selection is a case-by-case problem, and trying several parameter values to confirm the model's correctness is a prudent way to find the final model. In this paper, we used only TFT-LCD data sets to verify the proposed method; in future work, other product data sets for production-yield prediction would also be valuable for verifying this approach.
By Der-Chiang Li; Chiao-Wen Liu and Wen-Chih Chen