Product life cycles are becoming shorter, especially in the optoelectronics industry. Shortening production cycle times using knowledge obtained in pilot runs, where sample sizes are usually very small, is thus becoming a core competitive ability for firms. Machine learning algorithms are widely applied to this task, but the number of training samples is always a key factor in determining their knowledge acquisition capability. Therefore, this study systematically generates more training samples, based on box-and-whisker plots, to help gain more knowledge in the early stages of manufacturing systems. A case study of a TFT-LCD manufacturer phasing in a new product in 2008 is taken as an example. The experimental results show that it is possible to rapidly develop a production model that provides more information and precise predictions with the limited data acquired from pilot runs.
Keywords: TFT-LCD; small data set learning; box-and-whisker plots; distribution reconstructing; M5' model tree; artificial sample
Product life cycles are becoming increasingly shorter as a result of the increasing pressure of global competition. In this environment, companies may dominate their target market if they can shorten their production cycle time to accelerate the time to market of new products.
In the optoelectronics manufacturing industry, a TFT-LCD (thin film transistor liquid crystal display) is mainly composed of a thin film transistor panel and a colour filter (CF) panel. To produce a TFT-LCD, the four processes in the major manufacturing procedure, shown in Figure 1, are as follows. The first two processes, array and CF, are similar to the semiconductor manufacturing process. The third process, cell, assembles the arrayed back substrate and the CF front substrate and then fills the space between them with liquid crystal. Finally, there is the module assembly process, which connects additional components (e.g. driver integrated circuits and backlight units) to the fabricated glass panel.
Graph: Figure 1. TFT-LCD manufacturing process.
If a failure is detected in a panel after the cell process, not only are the costs and the quality efforts of the first three processes wasted, but the ultimate delivery time will also be postponed. A failure that commonly occurs in the cell process is the shifting problem caused when combining the two glass substrates. The alignment error indicator, Cell-Vernier, is a key index used to identify the quality of a panel. If the value of the indicator does not comply with the specifications, the panel will be scrapped. In order to control the Cell-Vernier to avoid the shifting problem, six input attributes, the TPEs (total pitch errors), can be measured on a CF substrate when the CF process is completed. These are based on six lines, including four lines across both sides and two diagonal lines on a CF substrate (shown in Figure 2). The six TPEs are the differences, measured in mm, between the designed and actual lengths of these lines.
Graph: Figure 2. The six TPEs (total pitch errors) related coordinates in a panel.
If the Cell-Vernier can be predicted with the six TPEs measured in the CF process, the company can thus avoid unnecessary costs in the cell process. Whenever a new product is being phased in, a pilot run from the first step (including the array and CF processes) to the second step (the cell process) will take at least one week. In addition, to reduce costs, only a few items will be produced in a pilot run. Since the data needed to derive meaningful knowledge is hard to collect at this early stage of production, there needs to be an effective procedure to overcome the problem of insufficient data to deliver more useful information to decision makers.
Kernel-based neural networks with probabilistic reasoning derived from a probability/possibility consistency principle were applied to the problem of rare fault detection in semiconductor manufacturing quality control (Thomas et al.).
Adding a number of artificial samples to training sets is another effective method to improve the predictive capability of machine learning algorithms. One example is virtual data generation (VDG), most often used in pattern recognition, where prior knowledge obtained from the given small training set is used to create virtual samples that enhance recognition ability (Niyogi et al.).
Based on the principles of information diffusion (Huang), VDG approaches have also been developed that diffuse each observation into a distribution in order to derive synthetic samples for small data sets.
One problem with the VDG algorithms based on information diffusion is how to accurately estimate the population skewness and domain bounds. In order to deal with this issue, the proposed procedure employs a convenient descriptive statistical approach: box-and-whisker plots (Tukey 1977).
In this study, more training samples are systematically generated by a simple heuristic mechanism within the value bounds obtained from the analysis results of the box plots. When a new training set is formed by adding the generated training samples, the M5' model tree (Wang and Witten) is employed as the learning tool to extract knowledge and build the forecasting model.
The rest of this paper is organised as follows: the box-and-whisker plots and the M5' model tree algorithm are introduced in Section 2, Section 3 describes the proposed method, and Section 4 describes the detailed implementation of the modelling procedure. In order to validate the effectiveness of this method, the experimental results of the TFT-LCD case and discussions are provided in Section 5. Finally, the conclusions are presented in Section 6.
There are two topics introduced in this section: the box-and-whisker plots and the M5' model tree.
The box-and-whisker plots were proposed by Tukey in 1977. They are a convenient graphical tool in descriptive analysis, displaying a group of numerical data through its median, mean, quartiles, and minimum and maximum observations. A box plot (shown in Figure 3) is useful for displaying the distribution of the data, examining symmetry and indicating potential outliers.
Graph: Figure 3. A right-skewed distribution drawn in a box-and-whisker plot.
A box is drawn to represent 50% of the data, where the box's upper boundary represents the upper quartile (Q3) of the data and the lower boundary the lower quartile (Q1). The length of the box represents the interquartile range (IQR), which is calculated by

IQR = Q3 − Q1
The median (i.e. Q2) is shown by a straight line drawn inside the box, and the mean is marked with a plus (+) symbol. There are two fences in a box plot: the upper fence is defined as 1.5 × IQR higher than Q3, and the lower fence as 1.5 × IQR lower than Q1. The fences are not drawn in the plots. The small square symbols outside the fences denote the outliers of the observations. Whiskers in a box plot are drawn from the upper and lower edges of the box to the largest and smallest observations, respectively, that lie within the upper and lower fences.
A box plot for normally distributed data should be symmetric: the mean is close to the median line, the median line divides the box roughly evenly and the lengths of the two whiskers are roughly equal. For a non-normal distribution, skewness can be detected when, for example, the median line deviates from the centre of the box downward and the upper whisker is longer than the lower one; that is, the right tail is longer and the distribution is right skewed. The opposite conditions represent a left-skewed distribution.
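This reading of a box plot can be sketched in a few lines of Python. As an approximation, the skew direction is judged here from the position of the median inside the box (the distance from Q2 to each quartile), which is one of the two indications described above:

```python
import statistics

def box_skew(values):
    """Classify skewness from box-plot geometry: a median line that sits
    below the centre of the box indicates a longer right tail."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method='inclusive')
    upper_gap = q3 - q2   # distance from median to upper quartile
    lower_gap = q2 - q1   # distance from median to lower quartile
    if upper_gap > lower_gap:
        return 'right-skewed'
    if upper_gap < lower_gap:
        return 'left-skewed'
    return 'symmetric'

print(box_skew([1, 2, 2, 3, 10]))   # right-skewed: long upper tail
```

A fuller check would also compare the whisker lengths, but for the small pilot-run samples considered here the quartile gaps already capture the asymmetry.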
Model tree algorithms are treated as an extension of the classification and regression tree (CART) approach proposed by Breiman et al. (1984): instead of the constant values stored at the leaves of a regression tree, a model tree stores linear regression models.
The M5' model tree, developed from the growing procedure of M5, adds the ability to handle nominal attributes and missing values, and compensation for sharp discontinuities. The two model trees are widely applied to solve many real cases (Bhattacharya and Solomatine).
As shown in Figure 4, there are two main sections of the proposed procedure. The purpose of the first section is to generate more samples for training to enhance the knowledge acquisition capability of learning tools (e.g. M5' model tree), while the goal of the second section is to obtain more information and represent the learned knowledge concretely.
Graph: Figure 4. The eight processes of the proposed modelling procedure.
The four main processes to generate more training samples are depicted in the following subsections.
Domain conjecture
In this paper, the box-and-whisker plots are used to help understand the distribution, skewness and outliers of a group of observations. As mentioned in the definition of box-and-whisker plots in Section 2.1, the reasonable range within which most values lie should fall between the two fences (see Figure 3), which are employed to define the possible value bounds as

L = Q1 − 1.5 × IQR

and

U = Q3 + 1.5 × IQR

where L and U denote the lower and upper bounds of the observations, respectively. Whenever the minimum (or maximum) value of the observations falls outside the two fences, it implies that there may be further values located between the minimum (or maximum) value and the lower (or upper) fence. Hence, the estimated lower (or upper) bound should move toward that value to contain this area, and the bounds are modified to

L = min(min, Q1 − 1.5 × IQR)

U = max(max, Q3 + 1.5 × IQR)

where min and max are the minimum and maximum values of the observations, respectively.
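The domain conjecture step can be sketched as follows. Using the inclusive quartile method reproduces the quartiles reported later in Table 3 for the Cell-Vernier attribute of the six training samples:

```python
import statistics

def domain_bounds(values):
    """Estimate (L, U, Q2) for one attribute: start from the fences
    and widen a bound whenever an observation falls outside its fence."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method='inclusive')
    iqr = q3 - q1
    lower = min(min(values), q1 - 1.5 * iqr)   # L
    upper = max(max(values), q3 + 1.5 * iqr)   # U
    return lower, upper, q2

# Cell-Vernier values of the six training samples in Table 2
cv = [1.5990, 1.1087, 0.9471, 1.0731, 1.9577, 1.4135]
L, U, Q2 = domain_bounds(cv)
print(round(L, 4), round(U, 4), round(Q2, 4))   # 0.3761 2.2586 1.2611, as in Table 3
```

Here both observations lie inside the fences, so the bounds equal the fences themselves; with an outlying minimum or maximum, the `min`/`max` calls would widen the corresponding bound.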
Distribution reconstructing
In order to rebuild the distribution of observations, the fuzzy technique concept is employed in this study. As shown in Figure 5, an asymmetric triangular membership function (MF) can be drawn based on L, U and the median (i.e. Q2). Note that the area coloured red is the result of moving L to min, and the median is employed in this study because, when the sample size is small, a mean is more easily affected by extreme outliers than a median.
Graph: Figure 5. An asymmetric triangle membership function is drawn to reconstruct the distribution of observations.
Value filling
The objective of this process is to generate more values to fill the gaps between observations and so enhance the knowledge acquisition capability of learning tools. In order to make the synthetic values obey the rebuilt distribution (i.e. its central tendency), a simple heuristic mechanism is proposed to assess whether a randomly generated value should be kept. The three steps of the value filling process are described as follows:
Set the MF value of Q2 to 1 as the height of the triangle membership function. Accordingly, the MF value is employed as the occurrence possibility of a value (sample point), and is restricted within the range of [0, 1].
As shown in Figure 6, randomly generate a value within [L, U] based on a uniform distribution as a temporary value (tv), and then compute its MF value:

MF(tv) = (tv − L) / (Q2 − L), if L ≤ tv ≤ Q2
MF(tv) = (U − tv) / (U − Q2), if Q2 < tv ≤ U
Graph: Figure 6. Calculating the MF value of tv.
Set the MF value (i.e. the occurrence possibility) of tv as a dynamic threshold and test it against another randomly created seed value (s), also drawn from the interval [0, 1]. In this test, tv is kept as the synthetic value (v) if its MF value is not lower than s, meaning that tv is close enough to the median to be likely to occur; otherwise tv is dropped and Step 2 is repeated to generate another tv. These steps can be described as

v = tv, if MF(tv) ≥ s; otherwise repeat Step 2.
From Figure 7, one finds that, if the lower bound (L) moves from the lower fence (i.e. Q1 − 1.5 × IQR) to min, the occurrence possibility of tv will increase, meaning that tv has a higher probability of being kept as a synthetic value. On the other hand, the occurrence possibility of outliers positioned between L and the lower fence is relatively low, meaning that only a few temporary values there can be kept as synthetic values.
Graph: Figure 7. Moving L from lower fence toward min makes tv relatively close to Q2 and thus increases the occurrence possibility of tv.
Sample set forming
Before forming a training sample, an assumption of independence between attributes is needed in this research, since the size of the acquired data is small (e.g. 10 or fewer) and the relations between attributes may not be completely reliable. Assume an obtained data set contains m attributes and n samples. If the expected number of synthetic training samples is N, the process in the 'Value filling' section should be repeated m × N times (N values per attribute), and the final training set used by the learning tools then contains the n real and N synthetic samples.
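The value-filling acceptance test and the set-forming loop can be sketched together as a small rejection sampler. The per-attribute bounds triple is assumed to come from the domain conjecture step, and the function names here are illustrative, not taken from the paper:

```python
import random

def mf(x, L, Q2, U):
    """Asymmetric triangular membership function with its peak of 1 at the median Q2."""
    if L <= x <= Q2:
        return (x - L) / (Q2 - L)
    if Q2 < x <= U:
        return (U - x) / (U - Q2)
    return 0.0

def fill_value(L, Q2, U, rng):
    """Steps 2-3: keep a uniform temporary value tv only when MF(tv) >= seed s."""
    while True:
        tv = rng.uniform(L, U)       # temporary value
        s = rng.uniform(0.0, 1.0)    # random seed value
        if mf(tv, L, Q2, U) >= s:
            return tv                # accepted as synthetic value v

def form_sample_set(bounds, N, seed=0):
    """Under the attribute-independence assumption, draw one value per attribute
    for each of the N synthetic samples, i.e. m * N calls of the value filling."""
    rng = random.Random(seed)
    return [[fill_value(L, Q2, U, rng) for (L, Q2, U) in bounds] for _ in range(N)]

# one attribute only, using the Cell-Vernier bounds (L, Q2, U) from Table 3
samples = form_sample_set([(0.3761, 1.2611, 2.2586)], N=100)
```

Because candidates near the median survive the test more often, the accepted values cluster around Q2, which is exactly the central tendency the rebuilt triangular distribution encodes.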
The five processes in building a complete M5' model tree include numeric input attribute preprocessing, branching data to grow a complete tree, calculating a regression model in each node for pruning and prediction, pruning the tree to avoid the overfitting problem, and using a smoothing procedure to compensate for the sharp discontinuities caused by the data division. However, an unpruned M5' tree is employed in this work, because the acquirable data size is small. Therefore, we will introduce all of the processes except tree pruning in the following sections.
Input attributes preprocessing
Since a model tree requires a discrete feature space for the data splitting process, the domain spaces of numeric attributes need to be discretised first. The most commonly used unsupervised methods are equal width discretisation and equal frequency discretisation (Dougherty et al.). With equal width discretisation, given an expected number of intervals K, the bin width of an attribute is calculated by

w = (max − min) / K

and the bin bounds are constructed at min + i × w, where i = 1, 2, …, K − 1. Therefore, we have K − 1 splitting positions of an attribute for the following data splitting process test.
However, when the size of the data is small (e.g. smaller than K), the expected number of intervals may not be available, because there may be no values located between certain bin bounds.
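A sketch of equal width discretisation under these definitions, where K is the expected number of intervals:

```python
def splitting_positions(values, K):
    """Equal width discretisation: divide [min, max] into K bins of width w
    and return the K - 1 interior bin bounds as candidate splitting positions."""
    lo, hi = min(values), max(values)
    w = (hi - lo) / K
    return [lo + i * w for i in range(1, K)]

print(splitting_positions([0.0, 1.0, 3.0, 8.0], K=4))   # [2.0, 4.0, 6.0]
```

Note that in this toy example no observation falls in the bin (4.0, 6.0]; with small data many bins are empty in exactly this way, which is why generating synthetic samples yields more usable splitting positions.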
Data splitting
The splitting criterion is used to determine which attribute is the best to split a portion T of the training data set that reaches a particular node. It is based on the standard deviation (sd) of the output values in T as a measure of the error at the node, and on the expected reduction in this error as a result of testing each attribute at that node. The expected error reduction, SDR, is calculated by

SDR = sd(T) − Σ (|Ti| / |T|) × sd(Ti)

where Ti denotes the ith subset produced by splitting the node at the tested position and |·| denotes the number of samples in a set. The computation is repeated over the splitting positions of each attribute, and the position with the maximum SDR is chosen as the branching criterion; this means that the error of T before splitting is reduced as much as possible after branching. The branching process repeats until the data size of a portion T is less than four; such a portion becomes a leaf node, while the portions that continue to be split are the interior nodes of the model tree.
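The SDR computation for one candidate split can be sketched as follows; the toy attribute values and candidate positions are illustrative, and the population standard deviation is used for simplicity:

```python
import statistics

def sdr(parent, subsets):
    """Expected error reduction: sd(T) minus the size-weighted sd of the subsets."""
    n = len(parent)
    return statistics.pstdev(parent) - sum(
        len(t) / n * statistics.pstdev(t) for t in subsets
    )

# toy data: one attribute x, output y with two clear clusters
x = [0, 1, 2, 10, 11, 12]
y = [1.0, 1.1, 1.2, 5.0, 5.1, 5.2]

def sdr_at(pos):
    left = [yi for xi, yi in zip(x, y) if xi <= pos]
    right = [yi for xi, yi in zip(x, y) if xi > pos]
    return sdr(y, [left, right])

best = max([1, 5, 11], key=sdr_at)
print(best)   # 5: splitting between the two clusters removes almost all variance
```

The position between the two output clusters wins because the children's standard deviations are nearly zero there, so nearly all of sd(T) is "reduced".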
Regression modelling
When the branching process terminates, a linear regression model is calculated for each interior and leaf node of the unpruned tree. A regression model takes the form

y = a0 + a1·x1 + a2·x2 + … + am·xm

where a0, a1, …, am are the regression coefficients and x1, x2, …, xm are the values of the input attributes.
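For a node with a single input attribute, the regression coefficients can be obtained in closed form; this simple least-squares fit is only a sketch of the per-node modelling step, not the full multi-attribute procedure:

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a0 + a1 * x for the samples stored at a node."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a0 = my - a1 * mx
    return a0, a1

print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))   # (1.0, 2.0): y = 1 + 2x
```

With several attributes the same idea generalises to solving the normal equations, which numerical libraries handle directly.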
Prediction smoothing
In M5', the smoothing process is used to compensate for the sharp discontinuities that will inevitably occur between adjacent linear models at the leaves of the pruned tree. An appropriate smoothing calculation is

p' = (n·p + k·q) / (n + k)

where p' is the prediction passed up to the next higher node, p is the prediction passed to this node from below, q is the value predicted by the model at this node, n is the number of training samples that reach the node below and k is a smoothing constant. The smoothing process is performed only when the model tree is used to carry out forecasting tasks.
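The smoothing rule translates directly into code. The default k = 15 used here is the constant commonly reported for M5-style trees in the literature, assumed rather than taken from this paper:

```python
def smooth(p, q, n, k=15.0):
    """Blend the prediction p from the child node with this node's model value q,
    weighting the child by the number n of training samples that reach it."""
    return (n * p + k * q) / (n + k)

print(smooth(p=2.0, q=1.0, n=15))   # 1.5: child and node get equal weight when n == k
```

As n grows, the child's prediction dominates; at a sparsely populated child, the parent's model pulls the prediction back toward its own estimate, softening the discontinuity.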
Suppose that an obtained data set contains n samples with m attributes, and the expected number of synthetic training samples is N. The proposed procedure can be summarised in six steps, as follows:
Domain conjecture: for each of the m attributes, estimate the corresponding domain bounds (L, U) using the fence-based equations given in the 'Domain conjecture' section.
Distribution reconstructing: rebuild m distributions for the m attributes by employing m triangle membership functions with the m sets of L, U and Q2.
Sample set generation: repeat steps 3.1 to 3.2 N times to generate the expected synthetic training samples.
For each of the m triangle membership functions, follow the three steps mentioned in the 'Value filling' section to acquire m synthetic values.
Based on the assumption of independence between attributes, combine the m synthetic values to create one synthetic training sample.
Final training set forming: append the N synthetic training samples to the original data set to form the final training set, which contains n + N samples.
Grow an unpruned M5' model tree: initially, set the training set as the portion T, and call it 'root'. Repeat steps 5.1 to 5.4 to grow the tree until all the 'portion Ts' meet the terminal criterion.
Input attribute preprocessing: use equal width discretisation to construct the candidate splitting positions of each attribute.
For each of the splitting positions, calculate the SDR.
Choose the splitting position that has the maximum SDR as the branching criterion, and then divide portion T into two subsets (i.e. T1 and T2).
Set each subset (i.e. T1 and T2) as a new portion T; if its size is less than four, mark it as a leaf node, otherwise continue branching it.
Regression models building: for each interior and leaf node, calculate a linear regression model with the samples stored at it.
In this paper, the problem is solved using a data set of 19 samples (see Table 1) obtained from a series of six pilot runs when a new 19" TFT-LCD product was phased in in 2008. Our experiments take the form of a sensitivity analysis over the training sample sizes. Since only three or four experimental items are issued in each pilot run, the training sample sizes for the sensitivity analysis are set as 3, 6, 9, 12, 15 and 18 to simulate the data collected at each pilot run stage. When a certain number of samples (e.g. six) are randomly taken from the data set without replacement as the training set, the remainder (e.g. 13) is regarded as the testing set. The system performance for each of the training sample sizes 3, 6, 9, 12 and 15 is measured by the average learning error of 10 experimental runs, while the training sample size of 18 is examined with leave-one-out cross validation. The details of the proposed procedure are illustrated in 10 steps with an example of six training samples:
Table 1. The 19 samples obtained from a series of six pilot runs.
No.    TPE1      TPE2      TPE3      TPE4      TPE5      TPE6      Cell-Vernier
 1     2.3000    1.0740    0.4460    1.6020    1.4550    0.4790    1.0913
 2     0.0880   −0.6780   −0.6555   −0.1265   −0.8885    0.4444    1.5990
 3     0.4945   −0.1145   −0.5975    0.4410   −0.5775    0.0684    1.1923
 4     1.0700    1.2810    0.3150    1.0230    1.0590    0.4035    1.2692
 5     0.9950    1.1230    1.4650    2.3870    0.6830    1.6452    1.1087
 6     0.3785   −0.1740   −0.5715   −0.0625   −0.2600    0.0994    1.1971
 7     1.9870    1.3950    1.4190    2.8960    1.3810    1.9795    1.1183
 8     1.7470    0.9450    1.3200    2.4810    1.4500    1.2474    1.6904
 9     1.5960    0.9450    1.4050    1.9930    2.0300    1.3277    0.9471
10     2.0150    1.4410    0.7010    1.8500    1.7390    1.0101    1.0731
11     1.1705   −0.0630    0.1320    0.4140    0.9410   −0.0083    1.8788
12     1.8470    0.9630    1.6510    2.1600    2.1620    1.5899    1.2962
13     0.7780   −0.9790    0.4180    0.9510    0.3370   −0.4092    0.9798
14     0.0170    0.1815   −0.2080   −0.3185    0.5185   −0.0378    1.9577
15     0.8025   −1.7490   −0.4720   −0.5025   −0.2805    0.8255    1.2865
16     1.8330    1.4010    1.1100    2.3400    1.1770    1.5551    1.0894
17     0.5370    0.6340    1.7250    2.3160    1.0940    1.0937    1.3663
18     1.4090   −0.0340    0.6490    0.7400    1.5150   −0.0221    1.4135
19    −0.2560   −1.8570   −0.2460   −1.1275   −0.5855    0.4568    1.5067
Randomly select six samples from Table 1 as the training set to simulate the scenario in which the data are obtained when the second pilot run is finished. The six selected samples are listed in Table 2. Note that when the training sample size is 18, each sample is in turn held out as the testing set while the others form the training set, for 19 validation runs.
Table 2. The six training samples randomly selected from Table 1.
No.    TPE1      TPE2      TPE3      TPE4      TPE5      TPE6      Cell-Vernier
 2     0.0880   −0.6780   −0.6555   −0.1265   −0.8885    0.4444    1.5990
 5     0.9950    1.1230    1.4650    2.3870    0.6830    1.6452    1.1087
 9     1.5960    0.9450    1.4050    1.9930    2.0300    1.3277    0.9471
10     2.0150    1.4410    0.7010    1.8500    1.7390    1.0101    1.0731
14     0.0170    0.1815   −0.2080   −0.3185    0.5185   −0.0378    1.9577
18     1.4090   −0.0340    0.6490    0.7400    1.5150   −0.0221    1.4135
For each of the seven attributes, calculate Q1, Q2, Q3 and the IQR, and then estimate the lower and upper bounds L and U from the fences, adjusted toward the minimum and maximum observations as described in the 'Domain conjecture' section. The results are shown in Table 3.
Table 3. The L and U of each attribute.
       TPE1      TPE2      TPE3      TPE4      TPE5      TPE6      Cell-Vernier
Q1     0.3148    0.0199    0.0063    0.0901    0.5596    0.0945    1.0820
Q2     1.2020    0.5633    0.6750    1.2950    1.0990    0.7273    1.2611
Q3     1.5493    1.0785    1.2290    1.9573    1.6830    1.2483    1.5526
IQR    1.2345    1.0586    1.2228    1.8671    1.1234    1.1538    0.4706
L     −1.5370   −1.5681   −1.8279   −2.7106   −1.1254   −1.6361    0.3761
U      3.4010    2.6664    3.0631    4.7579    3.3681    2.9790    2.2586
For each of the seven attributes, draw an asymmetric triangle membership function based on the corresponding L, U and Q2. In this example, the triangle membership function of attribute Y1 can thus be drawn as Figure 8.
Graph: Figure 8. The triangle membership function of the attribute Y1.
For each of the seven attributes, follow the three steps mentioned in the 'Value filling' section to acquire seven synthetic values. For instance, if a temporary value (tv) of Y1 is generated as −0.3503, its MF value can be read from the membership function in Figure 8 and tested against a random seed value s; if accepted, −0.3503 is kept as the synthetic value of Y1. The resulting synthetic values of the seven attributes are listed in Table 4.
Table 4. The results of each attribute's synthetic value.
Attribute   TPE1      TPE2      TPE3      TPE4      TPE5      TPE6      Cell-Vernier
Value      −0.3503    0.8949    2.0276    0.2361    2.4047   −0.0625    1.1639
Combine the values in Table 4 to generate a synthetic training sample. For example, if 100 synthetic training samples are expected to be created, steps 4 to 5 should be repeated another 99 times.
Append the generated training samples to the six real samples to form the final training set, which has 106 samples.
The effectiveness of the proposed sample generation procedure is shown in Figure 9. The distribution of attribute Y1 of the six real samples is given in Figure 9(a), while that of the 106-sample training set is in Figure 9(b). Initially, the distribution of Y1 appears fragmentary, with large gaps between values, and it is obviously difficult to obtain more information from the distribution in Figure 9(a). In contrast, the structure of the distribution in Figure 9(b) is more complete, because the synthetic samples fill the gaps between the observed values. As the sample size increases, the central tendency of the rebuilt distribution becomes more pronounced, and the result shown in Figure 9(b) demonstrates the effectiveness of the proposed heuristic mechanism. In addition, as mentioned in the 'Input attributes preprocessing' section, the distribution of the 106-sample training set affords more effective bin bounds (i.e. splitting positions) for the tree-growing process. Therefore, the proposed procedure strengthens the knowledge acquisition capability of the M5' model tree.
Graph: Figure 9. The distribution of attribute Y1 of (a) the six real samples; (b) the 106 training set.
Take the final training set to build an M5' model tree. One may follow the steps mentioned in Section 3.3 or use the WEKA data mining freeware, which can be downloaded from its official website. The learning result of the unpruned M5' model tree built with the 106 final training samples is shown in Figure 10.
Graph: Figure 10. The learning result of an unpruned M5' model tree built with the 106 final training samples.
Take the remaining 13 samples in Table 1 as a testing set to validate the forecasting model built in step 7, and then use the mean absolute percentage error (MAPE) to evaluate its predictive accuracy:

MAPE = (1/M) × Σ |Yi − Ŷi| / Yi × 100%

where M is the sample size of the testing set, and Yi and Ŷi are the actual and predicted values of the ith testing sample, respectively.
In the example, each real Cell-Vernier (Y) and the corresponding predicted value (Ŷ) of the testing set (13 samples) are shown in Table 5, with an MAPE of 13.39%. It takes approximately five minutes to build a forecasting model, including calculating the MAPE from steps 1 to 8 with the six training samples, using the proposed procedure on a Pentium 4 3.0 GHz computer with 2 GB of RAM. If the procedure is programmed, the whole process can be completed in at most a minute. This processing time is acceptable for most TFT-LCD manufacturers.
Table 5. The real Cell-Vernier (Y), the predicted values (Ŷ) and the related error percentages of the testing set.
No.    Y        Ŷ        Error
 1     1.0913   1.2320   12.89%
 3     1.1923   1.1900    0.19%
 4     1.2692   1.2160    4.19%
 6     1.1971   1.1870    0.84%
 7     1.1183   1.2000    7.31%
 8     1.6904   1.1990   29.07%
11     1.8788   1.1890   36.71%
12     1.2962   1.3320    2.76%
13     0.9798   1.1740   19.82%
15     1.2865   1.0970   14.73%
16     1.0894   1.2260   12.54%
17     1.3663   1.2280   10.12%
19     1.5067   1.1610   22.94%
Repeat steps 1 to 8 ten times and calculate the average MAPE of each training sample size:

AvgMAPE = (1/k) × Σ MAPEj

where MAPEj is the result of the jth experimental run and k is the total number of experimental runs for each training sample size, here set as 10. Note that when the training size is 18, k is set as 19.
Repeat steps 1 to 9 with different sample sizes of training set.
In this section, unpruned M5' model trees built with the real and final training sets, respectively, are compared with regard to their predictive accuracy, knowledge learned, and the practical validation of the 19" TFT-LCD product obtained in the mass production stage.
The results of the sensitivity analysis of training sample sizes are presented in Table 6 and Figure 11, where 'M5'' and 'this study' denote the unpruned M5' model trees built with the real and final training sets, respectively.
Graph: Figure 11. The trend of the average MAPE in each training sample size.
Table 6. The detailed computational MAPE of each training sample size with the corresponding runs.
Training size  Method      MAPE of each experimental run (%)                                Average  S.d.   P-value (two-sided)
 3             M5'         32.05 23.85 17.50 24.41 17.01 26.30 16.26 54.04 29.27 19.06      25.98    11.24  0.003
               This study  24.49 21.80 16.55 21.70 15.68 25.32 13.65 49.20 20.35 15.50      22.42    10.22
 6             M5'         22.33 17.51 14.72 19.07 17.04 22.47 16.28 21.57 18.87 27.03      19.69     3.67  0.012
               This study  15.70 15.66 13.72 16.56 16.66 15.11 15.40 12.47 17.59 13.39      15.23     1.60
 9             M5'         24.09 16.61 23.27 28.59 18.91 17.87 15.69 16.54 17.67 16.61      19.59     4.27  0.005
               This study  15.12 14.24 15.18 14.75 15.90 14.39 12.68 16.10 13.17 15.42      14.70     1.11
12             M5'         11.44 20.04 17.96 18.07 15.29 20.06 18.00 24.28 17.74 16.13      17.90     3.36  0.001
               This study  10.15 15.98 10.91 17.48 12.86 15.04 15.70 14.12 14.41 11.03      13.77     2.45
15             M5'         19.43 10.25 26.92 39.15 18.42 11.47 12.49 12.96 13.61 14.31      17.90     8.95  0.003
               This study  11.29  5.08 15.35 19.37 12.27 10.23  9.33  8.35  8.69 11.27      11.12     3.96
18 (19 runs)   M5'         14.02  0.54  8.63  0.60  7.21  8.56 33.43  0.00 27.78 53.27      16.74    14.92  0.001
                           23.22 31.87  2.15 11.24 16.76 41.50 14.07 16.63  6.49
               This study   7.33  0.18  2.75  0.60  0.35  1.47 22.13  0.08 21.45 17.14       6.99     7.70
                            8.27 20.94  0.59  8.39  0.94  6.23  6.15  2.87  4.96
Generally, the acceptable error (MAPE) between a prediction and the real Cell-Vernier value is expected to be within 10%. As the training sample size increases, both learning results (AvgMAPE) decrease, although it is difficult to achieve the goal (within 10%) using the M5' model tree built with the real training sets. However, the results (AvgMAPE) of the M5' model tree built with the final training sets reveal fewer learning errors, indicating a better and more stable forecasting system.
Table 6 also shows that, when the training sets contain the synthetic training samples generated by the proposed procedure, a lower and more stable standard deviation of the MAPE is observed. In addition, a paired two-sided t-test is employed to test the significance of the difference between the paired measures, and all the P-values support the final training sets containing the synthetic training samples at an alpha level of 0.05. This indicates that the proposed procedure can build a statistically useful and robust model.
Since the process engineers need to obtain more information about the criteria ranges of the six TPEs, the M5' model tree can achieve this: it represents the learned knowledge as causal tree-based classification rules and has precise predictive capability. However, the training sample size is always a key factor for the M5' model tree in determining knowledge acquisition capability.
We use the example (mentioned in Section 4) that has only six training samples to further illustrate this. When an unpruned M5' model tree is used to extract the knowledge hidden in the real training set, only two classification rules can be obtained (as shown in Figure 12). On the other hand, more information with 47 rules can be derived from the final training set (shown in Figure 10). From Figure 10, the process engineers can thus learn more about the criteria range of the six TPEs to reduce the potential costs in the cell process. This is further proof that the proposed procedure can build an informative and robust model.
Graph: Figure 12. The learning result of an unpruned M5' model tree built with the real training set (six samples).
When the six pilot runs of the 19" TFT-LCD product were completed, the first 1000 issued products in the mass production stage were taken to form a validation data set. The 19 samples collected in the six pilot runs are used to build two unpruned M5' model trees: one is learned with the 19 samples alone, while the other is learned with the 19 samples plus 100 synthetic training samples. The predicted results of the two models are summarised in Table 7, and the MAPEs are 28.89% and 8.06%, respectively. Obviously, the model built with the training set containing the synthetic training samples maintains good predictive capability in mass production. Figure 13 provides the cumulative frequency charts of the results for the two models.
Graph: Figure 13. Results of the 1000 samples using an unpruned M5' model tree built with (a) the real training set (19 samples) and (b) the final training set (119 samples).
Table 7. Actual and two kinds of predicted values of the 1000 samples.
No.  Actual M5'   This research |  No.  Actual M5'   This research | ... |  No.  Actual M5'   This research |  No.   Actual M5'   This research
  1  0.891  1.196 0.967         |  101  0.971  1.319 1.068         | ... |  801  1.053  1.387 1.055         |  901   1.055  1.417 1.051
  2  1.027  1.212 1.036         |  102  0.96   1.392 0.939         | ... |  802  1.098  1.285 1.476         |  902   1.163  1.349 1.075
  3  0.974  1.381 1.184         |  103  1.024  1.394 0.967         | ... |  803  1.089  1.38  1.074         |  903   1.051  1.297 1.057
  4  1.023  1.391 1.031         |  104  0.986  1.289 0.989         | ... |  804  1.009  1.372 1.157         |  904   1.032  1.365 1.024
  5  0.941  1.269 1.019         |  105  1.038  1.435 1.487         | ... |  805  0.967  1.427 1.058         |  905   1.154  1.323 1.099
  6  1.078  1.279 1.016         |  106  1.049  1.313 1.047         | ... |  806  1.025  1.374 1.077         |  906   1.102  1.34  1.09
  7  0.94   1.219 0.958         |  107  1.166  1.337 1.067         | ... |  807  1.101  1.352 1.071         |  907   1.047  1.365 1.097
  8  1.136  1.359 1.024         |  108  1.125  1.325 1.065         | ... |  808  1.051  1.365 1.42          |  908   1.149  1.444 1.034
  9  1.119  1.352 1.002         |  109  1.101  1.385 1.211         | ... |  809  1.01   1.432 1.377         |  909   1.039  1.36  1.071
 10  1.086  1.431 1.095         |  110  0.987  1.357 1.089         | ... |  810  1.101  1.432 1.119         |  910   1.009  1.328 1.011
...
 91  1.165  1.345 1.061         |  191  1.168  1.381 1.319         | ... |  891  1.15   1.386 0.975         |  991   1.03   1.343 1.029
 92  1.106  1.309 1.05          |  192  0.984  1.364 1.134         | ... |  892  1      1.317 1.09          |  992   1.066  1.272 1.013
 93  0.976  1.316 1.057         |  193  0.974  1.363 1.059         | ... |  893  0.981  1.375 1.071         |  993   1.05   1.416 1.093
 94  1.054  1.337 1.07          |  194  0.964  1.364 1.079         | ... |  894  0.969  1.278 1.201         |  994   1.101  1.321 1.054
 95  1.044  1.283 1.074         |  195  1.103  1.352 1.088         | ... |  895  0.974  1.436 1.151         |  995   1.147  1.275 1.027
 96  1.026  1.318 1.044         |  196  1.024  1.361 1.027         | ... |  896  1.106  1.445 1.092         |  996   0.962  1.378 1.036
 97  1.016  1.394 1.005         |  197  1.142  1.409 1.004         | ... |  897  1.121  1.399 1.047         |  997   1.122  1.411 1.132
 98  1.098  1.434 1.156         |  198  1.061  1.282 1.016         | ... |  898  1.03   1.37  1.012         |  998   1.156  1.426 1.191
 99  1.101  1.373 1.12          |  199  1.135  1.333 1.033         | ... |  899  1.073  1.292 1.05          |  999   1.055  1.295 1.024
100  1.073  1.396 1.042         |  200  1.056  1.33  1.466         | ... |  900  0.972  1.375 1.09          |  1000  0.991  1.405 1.147
Since the acceptable error between a prediction and the real Cell-Vernier value is expected to be within 10%, from Figure 13(a) we find that only two of the 1000 samples reach this goal for the model built with the 19 samples, while 754 of the 1000 samples (shown in Figure 13(b)) achieve it with the proposed procedure. This result also reveals that the proposed procedure provides a practically robust model for predictive purposes.
In the optoelectronics manufacturing industry, if the value of the Cell-Vernier on a panel is out of specification after the cell process, the panel will be scrapped. Predicting the Cell-Vernier in advance, using the six TPEs measured in the CF process, to avoid unnecessary costs and delays in delivery is thus an important issue for TFT-LCD manufacturers. The proposed procedure makes the predictions more precise, and more learned knowledge, i.e. the classification rules, can be extracted for engineers. If TFT-LCD manufacturers follow the procedure outlined in this study, it is expected that the inventory turnover rate will rise and costs can thus be reduced. The storehouse utilisation rate will also improve.
There have been few studies of small samples in this context to date, and potential exists to derive more comprehensive theories to obtain a higher rate of accuracy. In addition, taking dependency conditions between attributes into consideration when generating synthetic values is also considered a worthy subject in the future.
By Der-Chiang Li; Chien-Chih Chen; Che-Jung Chang and Wen-Chih Chen