Product life cycles are becoming shorter, especially in the optoelectronics industry. Shortening production cycle times using knowledge obtained in pilot runs, where sample sizes are usually very small, is thus becoming a core competitive ability for firms. Machine learning algorithms are widely applied to this task, but the number of training samples is always a key factor in determining their knowledge acquisition capability. Therefore, this study systematically generates more training samples, based on box-and-whisker plots, to help gain more knowledge in the early stages of manufacturing systems. A case study of a TFT-LCD manufacturer phasing in a new product in 2008 is taken as an example. The experimental results show that it is possible to rapidly develop a production model that provides more information and precise predictions with the limited data acquired from pilot runs.
Keywords: TFT-LCD; small data set learning; box-and-whisker plots; distribution reconstructing; M5' model tree; artificial sample
Product life cycles are becoming increasingly shorter as a result of the increasing pressure of global competition. In this environment, companies may dominate their target market if they can shorten their production cycle time to accelerate the time to market of new products.
In the optoelectronics manufacturing industry, a TFT-LCD (thin film transistor liquid crystal display) is mainly composed of a thin film transistor panel and a colour filter (CF) panel. To produce a TFT-LCD, the four processes in the major manufacturing procedure, shown in Figure 1, are as follows. The first two processes, array and CF, are similar to the semiconductor manufacturing process. The third process, cell, assembles the arrayed back substrate and the CF front substrate and then fills the space between them with liquid crystal. Finally, there is the module assembly process, which connects additional components (e.g. driver integrated circuits and backlight units) to the fabricated glass panel.
Graph: Figure 1. TFT-LCD manufacturing process.
If a failure is detected in a panel after the cell process, not only are the costs and the quality efforts of the first three processes wasted, but the ultimate delivery time will also be postponed. A failure that commonly occurs in the cell process is the shifting problem caused when combining the two glass substrates. The alignment error indicator, Cell-Vernier, is a key index used to identify the quality of a panel. If the value of the indicator does not comply with the specifications, the panel will be scrapped. In order to control the Cell-Vernier to avoid the shifting problem, six input attributes, the TPEs (total pitch errors), can be measured on a CF substrate when the CF process is completed. These are based on six lines, including four lines across both sides and two diagonal lines on a CF substrate (shown in Figure 2). The six TPEs are the differences, measured in mm, between the designed and actual lengths of these lines.
Graph: Figure 2. The six TPEs (total pitch errors) related coordinates in a panel.
If the Cell-Vernier can be predicted with the six TPEs measured in the CF process, the company can thus avoid unnecessary costs in the cell process. Whenever a new product is being phased in, a pilot run from the first step (including the array and CF processes) to the second step (the cell process) will take at least one week. In addition, to reduce costs, only a few items will be produced in a pilot run. Since the data needed to derive meaningful knowledge is hard to collect at this early stage of production, there needs to be an effective procedure to overcome the problem of insufficient data to deliver more useful information to decision makers.
Kernel-based neural networks with probabilistic reasoning derived from a probability/possibility consistency principle were applied to the problem of rare fault detection in semiconductor manufacturing quality control (Thomas et al.).
Adding a number of artificial samples to training sets is another effective method to improve the predictive capability of machine learning algorithms. One example is virtual data generation (VDG), most often used in pattern recognition, where prior knowledge obtained from the given small training set is used to create virtual samples that enhance recognition ability (Niyogi et al.).
Based on the principles of information diffusion (Huang), VDG approaches have also been developed that diffuse each observation into a distribution in order to derive synthetic samples for small data sets.
One problem with the VDG algorithms based on information diffusion is how to accurately estimate the population skewness and domain bounds. In order to deal with this issue, the proposed procedure employs a convenient descriptive statistical approach: box-and-whisker plots (Tukey 1977).
In this study, more training samples are systematically generated by a simple heuristic mechanism within the value bounds obtained from the analysis results of the box plots. When a new training set is formed by adding the generated training samples, the M5' model tree (Wang and Witten) is employed as the learning tool to extract knowledge and build the forecasting model.
The rest of this paper is organised as follows: the box-and-whisker plots and the M5' model tree algorithm are introduced in Section 2, Section 3 describes the proposed method, and Section 4 describes the detailed implementation of the modelling procedure. In order to validate the effectiveness of this method, the experimental results of the TFT-LCD case and discussions are provided in Section 5. Finally, the conclusions are presented in Section 6.
There are two topics introduced in this section: the box-and-whisker plots and the M5' model tree.
The box-and-whisker plots were proposed by Tukey in 1977. They are a convenient graphical tool in descriptive analysis, displaying a group of numerical data through its median, mean, quartiles, and minimum and maximum observations. A box plot (shown in Figure 3) is useful for displaying the distribution of the data, examining symmetry and indicating potential outliers.
Graph: Figure 3. A right-skewed distribution drawn in a box-and-whisker plot.
A box is drawn to represent 50% of the data, where the box's upper boundary represents the upper quartile (Q3) of the data and the lower boundary the lower quartile (Q1). The length of the box represents the interquartile range (IQR), which is calculated by

IQR = Q3 − Q1
The median (i.e. Q2) is shown by a straight line drawn inside the box, and the mean is marked with a plus (+) symbol. There are two fences in a box plot: the upper fence is defined as 1.5 × IQR higher than Q3, and the lower fence as 1.5 × IQR lower than Q1. The fences are not drawn in the plots. The small square symbols outside the fences denote the outliers of the observations. Whiskers in a box plot are drawn from the upper and lower edges of the box to the largest and smallest observations, respectively, that lie within the upper and lower fences.
A box plot for normally distributed data should be symmetric: the mean is close to the median line, the median line divides the box roughly evenly and the lengths of the two whiskers are roughly equal. For a non-normal distribution, skewness can be detected when, for example, the median line deviates from the centre of the box downward and the upper whisker is longer than the lower one; that is, the right tail is longer and the distribution is right skewed. The opposite conditions represent a left-skewed distribution.
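This reading of a box plot can be sketched in a few lines of Python. As an approximation, the skew direction is judged here from the position of the median inside the box (the distance from Q2 to each quartile), which is one of the two indications described above:

```python
import statistics

def box_skew(values):
    """Classify skewness from box-plot geometry: a median line that sits
    below the centre of the box indicates a longer right tail."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method='inclusive')
    upper_gap = q3 - q2   # distance from median to upper quartile
    lower_gap = q2 - q1   # distance from median to lower quartile
    if upper_gap > lower_gap:
        return 'right-skewed'
    if upper_gap < lower_gap:
        return 'left-skewed'
    return 'symmetric'

print(box_skew([1, 2, 2, 3, 10]))   # right-skewed: long upper tail
```

A fuller check would also compare the whisker lengths, but for the small pilot-run samples considered here the quartile gaps already capture the asymmetry.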
Model tree algorithms are treated as an extension of the classification and regression tree (CART) approach proposed by Breiman et al. (1984): instead of the constant values stored at the leaves of a regression tree, a model tree stores linear regression models.
The M5' model tree, developed from the growing procedure of M5, adds the ability to handle nominal attributes and missing values, and compensation for sharp discontinuities. The two model trees are widely applied to solve many real cases (Bhattacharya and Solomatine).
As shown in Figure 4, there are two main sections of the proposed procedure. The purpose of the first section is to generate more samples for training to enhance the knowledge acquisition capability of learning tools (e.g. M5' model tree), while the goal of the second section is to obtain more information and represent the learned knowledge concretely.
Graph: Figure 4. The eight processes of the proposed modelling procedure.
The four main processes to generate more training samples are depicted in the following subsections.
Domain conjecture
In this paper, the box-and-whisker plots are used to help understand the distribution, skewness and outliers of a group of observations. As mentioned in the definition of box-and-whisker plots in Section 2.1, the reasonable range within which most values lie should fall between the two fences (see Figure 3), which are employed to define the possible value bounds as

L = Q1 − 1.5 × IQR

and

U = Q3 + 1.5 × IQR

where L and U denote the lower and upper bounds of the observations, respectively. Whenever the minimum (or maximum) value of the observations falls outside the two fences, it implies that there may be further values located between the minimum (or maximum) value and the lower (or upper) fence. Hence, the estimated lower (or upper) bound should move toward that value to contain this area, and the bounds are modified to

L = min(min, Q1 − 1.5 × IQR)

U = max(max, Q3 + 1.5 × IQR)

where min and max are the minimum and maximum values of the observations, respectively.
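The domain conjecture step can be sketched as follows. Using the inclusive quartile method reproduces the quartiles reported later in Table 3 for the Cell-Vernier attribute of the six training samples:

```python
import statistics

def domain_bounds(values):
    """Estimate (L, U, Q2) for one attribute: start from the fences
    and widen a bound whenever an observation falls outside its fence."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method='inclusive')
    iqr = q3 - q1
    lower = min(min(values), q1 - 1.5 * iqr)   # L
    upper = max(max(values), q3 + 1.5 * iqr)   # U
    return lower, upper, q2

# Cell-Vernier values of the six training samples in Table 2
cv = [1.5990, 1.1087, 0.9471, 1.0731, 1.9577, 1.4135]
L, U, Q2 = domain_bounds(cv)
print(round(L, 4), round(U, 4), round(Q2, 4))   # 0.3761 2.2586 1.2611, as in Table 3
```

Here both observations lie inside the fences, so the bounds equal the fences themselves; with an outlying minimum or maximum, the `min`/`max` calls would widen the corresponding bound.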
Distribution reconstructing
In order to rebuild the distribution of observations, the fuzzy technique concept is employed in this study. As shown in Figure 5, an asymmetric triangular membership function (MF) can be drawn based on L, U and the median (i.e. Q2). Note that the area coloured red is the result of moving L to min, and the median is employed in this study because, when the sample size is small, a mean is more easily affected by extreme outliers than a median.
Graph: Figure 5. An asymmetric triangle membership function is drawn to reconstruct the distribution of observations.
Value filling
The objective of this process is to generate more values to fill the gaps between observations and so enhance the knowledge acquisition capability of learning tools. In order to make the synthetic values obey the rebuilt distribution (i.e. its central tendency), a simple heuristic mechanism is proposed to assess whether a randomly generated value should be kept. The three steps of the value filling process are described as follows:
Set the MF value of Q2 to 1 as the height of the triangle membership function. Accordingly, the MF value is employed as the occurrence possibility of a value (sample point), and is restricted within the range of [0, 1].
As shown in Figure 6, randomly generate a value within [L, U] based on a uniform distribution as a temporary value (tv), and then compute its MF value:

MF(tv) = (tv − L) / (Q2 − L), if L ≤ tv ≤ Q2
MF(tv) = (U − tv) / (U − Q2), if Q2 < tv ≤ U
Graph: Figure 6. Calculating the MF value of tv.
Set the MF value (i.e. the occurrence possibility) of tv as a dynamic threshold and test it against another randomly created seed value (s), also drawn from the interval [0, 1]. In this test, tv is kept as the synthetic value (v) if its MF value is not lower than s, meaning that tv is close enough to the median to be likely to occur; otherwise tv is dropped and Step 2 is repeated to generate another tv. These steps can be described as

v = tv, if MF(tv) ≥ s; otherwise repeat Step 2.
From Figure 7, one finds that, if the lower bound (L) moves from the lower fence (i.e. Q1 − 1.5 × IQR) to min, the occurrence possibility of tv will increase, meaning that tv has a higher probability of being kept as a synthetic value. On the other hand, the occurrence possibility of outliers positioned between L and the lower fence is relatively low, meaning that only a few temporary values there can be kept as synthetic values.
Graph: Figure 7. Moving L from lower fence toward min makes tv relatively close to Q2 and thus increases the occurrence possibility of tv.
Sample set forming
Before forming a training sample, an assumption of independence between attributes is needed in this research, since the size of the acquired data is small (e.g. 10 or fewer) and the relations between attributes may not be completely reliable. Assume an obtained data set contains m attributes and n samples. If the expected number of synthetic training samples is N, the process in the 'Value filling' section should be repeated m × N times (N values per attribute), and the final training set used by the learning tools then contains the n real and N synthetic samples.
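The value-filling acceptance test and the set-forming loop can be sketched together as a small rejection sampler. The per-attribute bounds triple is assumed to come from the domain conjecture step, and the function names here are illustrative, not taken from the paper:

```python
import random

def mf(x, L, Q2, U):
    """Asymmetric triangular membership function with its peak of 1 at the median Q2."""
    if L <= x <= Q2:
        return (x - L) / (Q2 - L)
    if Q2 < x <= U:
        return (U - x) / (U - Q2)
    return 0.0

def fill_value(L, Q2, U, rng):
    """Steps 2-3: keep a uniform temporary value tv only when MF(tv) >= seed s."""
    while True:
        tv = rng.uniform(L, U)       # temporary value
        s = rng.uniform(0.0, 1.0)    # random seed value
        if mf(tv, L, Q2, U) >= s:
            return tv                # accepted as synthetic value v

def form_sample_set(bounds, N, seed=0):
    """Under the attribute-independence assumption, draw one value per attribute
    for each of the N synthetic samples, i.e. m * N calls of the value filling."""
    rng = random.Random(seed)
    return [[fill_value(L, Q2, U, rng) for (L, Q2, U) in bounds] for _ in range(N)]

# one attribute only, using the Cell-Vernier bounds (L, Q2, U) from Table 3
samples = form_sample_set([(0.3761, 1.2611, 2.2586)], N=100)
```

Because candidates near the median survive the test more often, the accepted values cluster around Q2, which is exactly the central tendency the rebuilt triangular distribution encodes.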
The five processes in building a complete M5' model tree include numeric input attribute preprocessing, branching data to grow a complete tree, calculating a regression model in each node for pruning and prediction, pruning the tree to avoid the overfitting problem, and using a smoothing procedure to compensate for the sharp discontinuities caused by the data division. However, an unpruned M5' tree is employed in this work, because the acquirable data size is small. Therefore, we will introduce all of the processes except tree pruning in the following sections.
Input attributes preprocessing
Since a model tree requires a discrete feature space for the data splitting process, the domain spaces of numeric attributes need to be discretised first. The most commonly used unsupervised methods are equal width discretisation and equal frequency discretisation (Dougherty et al.). With equal width discretisation, given an expected number of intervals K, the bin width of an attribute is calculated by

w = (max − min) / K

and the bin bounds are constructed at min + i × w, where i = 1, 2, …, K − 1. Therefore, we have K − 1 splitting positions of an attribute for the following data splitting process test.
However, when the size of the data is small (e.g. smaller than K), the expected number of intervals may not be available, because there may be no values located between certain bin bounds.
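A sketch of equal width discretisation under these definitions, where K is the expected number of intervals:

```python
def splitting_positions(values, K):
    """Equal width discretisation: divide [min, max] into K bins of width w
    and return the K - 1 interior bin bounds as candidate splitting positions."""
    lo, hi = min(values), max(values)
    w = (hi - lo) / K
    return [lo + i * w for i in range(1, K)]

print(splitting_positions([0.0, 1.0, 3.0, 8.0], K=4))   # [2.0, 4.0, 6.0]
```

Note that in this toy example no observation falls in the bin (4.0, 6.0]; with small data many bins are empty in exactly this way, which is why generating synthetic samples yields more usable splitting positions.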
Data splitting
The splitting criterion is used to determine which attribute is the best to split a portion T of the training data set that reaches a particular node. It is based on the standard deviation (sd) of the output values in T as a measure of the error at the node, and on the expected reduction in this error as a result of testing each attribute at that node. The expected error reduction, SDR, is calculated by

SDR = sd(T) − Σ (|Ti| / |T|) × sd(Ti)

where Ti denotes the ith subset produced by splitting the node at the tested position and |·| denotes the number of samples in a set. The computation is repeated over the splitting positions of each attribute, and the position with the maximum SDR is chosen as the branching criterion; this means that the error of T before splitting is reduced as much as possible after branching. The branching process repeats until the data size of a portion T is less than four; such a portion becomes a leaf node, while the portions that continue to be split are the interior nodes of the model tree.
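The SDR computation for one candidate split can be sketched as follows; the toy attribute values and candidate positions are illustrative, and the population standard deviation is used for simplicity:

```python
import statistics

def sdr(parent, subsets):
    """Expected error reduction: sd(T) minus the size-weighted sd of the subsets."""
    n = len(parent)
    return statistics.pstdev(parent) - sum(
        len(t) / n * statistics.pstdev(t) for t in subsets
    )

# toy data: one attribute x, output y with two clear clusters
x = [0, 1, 2, 10, 11, 12]
y = [1.0, 1.1, 1.2, 5.0, 5.1, 5.2]

def sdr_at(pos):
    left = [yi for xi, yi in zip(x, y) if xi <= pos]
    right = [yi for xi, yi in zip(x, y) if xi > pos]
    return sdr(y, [left, right])

best = max([1, 5, 11], key=sdr_at)
print(best)   # 5: splitting between the two clusters removes almost all variance
```

The position between the two output clusters wins because the children's standard deviations are nearly zero there, so nearly all of sd(T) is "reduced".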
Regression modelling
When the branching process terminates, a linear regression model is calculated for each interior and leaf node of the unpruned tree. A regression model takes the form

y = a0 + a1·x1 + a2·x2 + … + am·xm

where a0, a1, …, am are the regression coefficients and x1, x2, …, xm are the values of the input attributes.
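For a node with a single input attribute, the regression coefficients can be obtained in closed form; this simple least-squares fit is only a sketch of the per-node modelling step, not the full multi-attribute procedure:

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a0 + a1 * x for the samples stored at a node."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a0 = my - a1 * mx
    return a0, a1

print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))   # (1.0, 2.0): y = 1 + 2x
```

With several attributes the same idea generalises to solving the normal equations, which numerical libraries handle directly.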
Prediction smoothing
In M5', the smoothing process is used to compensate for the sharp discontinuities that will inevitably occur between adjacent linear models at the leaves of the pruned tree. An appropriate smoothing calculation is

p' = (n·p + k·q) / (n + k)

where p' is the prediction passed up to the next higher node, p is the prediction passed to this node from below, q is the value predicted by the model at this node, n is the number of training samples that reach the node below and k is a smoothing constant. The smoothing process is performed only when the model tree is used to carry out forecasting tasks.
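The smoothing rule translates directly into code. The default k = 15 used here is the constant commonly reported for M5-style trees in the literature, assumed rather than taken from this paper:

```python
def smooth(p, q, n, k=15.0):
    """Blend the prediction p from the child node with this node's model value q,
    weighting the child by the number n of training samples that reach it."""
    return (n * p + k * q) / (n + k)

print(smooth(p=2.0, q=1.0, n=15))   # 1.5: child and node get equal weight when n == k
```

As n grows, the child's prediction dominates; at a sparsely populated child, the parent's model pulls the prediction back toward its own estimate, softening the discontinuity.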
Suppose that an obtained data set contains n samples with m attributes, and the expected number of synthetic training samples is N. The proposed procedure can be summarised in six steps, as follows:
Domain conjecture: for each of the m attributes, estimate the corresponding domain bounds (L, U) using the fence-based equations given in the 'Domain conjecture' section.
Distribution reconstructing: rebuild m distributions for the m attributes by employing m triangle membership functions with the m sets of L, U and Q2.
Sample set generation: repeat steps 3.1 to 3.2 N times to generate the expected synthetic training samples.
For each of the m triangle membership functions, follow the three steps mentioned in the 'Value filling' section to acquire m synthetic values.
Based on the assumption of independence between attributes, combine the m synthetic values to create one synthetic training sample.
Final training set forming: append the N synthetic training samples to the original data set to form the final training set, which contains n + N samples.
Grow an unpruned M5' model tree: initially, set the training set as the portion T, and call it 'root'. Repeat steps 5.1 to 5.4 to grow the tree until all the 'portion Ts' meet the terminal criterion.
Input attribute preprocessing: use equal width discretisation to construct the candidate splitting positions of each attribute.
For each of the splitting positions, calculate the SDR.
Choose the splitting position that has the maximum SDR as the branching criterion, and then divide portion T into two subsets (i.e. T1 and T2).
Set each subset (i.e. T1 and T2) as a new portion T; if its size is less than four, mark it as a leaf node, otherwise continue branching it.
Regression models building: for each interior and leaf node, calculate a linear regression model with the samples stored at it.
In this paper, the problem is solved using a data set of 19 samples (see Table 1) obtained from a series of six pilot runs when a new 19" TFT-LCD product was phased in in 2008. Our experiments take the form of a sensitivity analysis over the training sample sizes. Since only three or four experimental items are issued in each pilot run, the training sample sizes for the sensitivity analysis are set as 3, 6, 9, 12, 15 and 18 to simulate the data collected at each pilot run stage. When a certain number of samples (e.g. six) are randomly taken from the data set without replacement as the training set, the remainder (e.g. 13) is regarded as the testing set. The system performance for each of the training sample sizes 3, 6, 9, 12 and 15 is measured by the average learning error of 10 experimental runs, while the training sample size of 18 is examined with leave-one-out cross validation. The details of the proposed procedure are illustrated in 10 steps with an example of six training samples:
Table 1. The 19 samples obtained from a series of six pilot runs.
No.    TPE1      TPE2      TPE3      TPE4      TPE5      TPE6      Cell-Vernier
 1     2.3000    1.0740    0.4460    1.6020    1.4550    0.4790    1.0913
 2     0.0880   −0.6780   −0.6555   −0.1265   −0.8885    0.4444    1.5990
 3     0.4945   −0.1145   −0.5975    0.4410   −0.5775    0.0684    1.1923
 4     1.0700    1.2810    0.3150    1.0230    1.0590    0.4035    1.2692
 5     0.9950    1.1230    1.4650    2.3870    0.6830    1.6452    1.1087
 6     0.3785   −0.1740   −0.5715   −0.0625   −0.2600    0.0994    1.1971
 7     1.9870    1.3950    1.4190    2.8960    1.3810    1.9795    1.1183
 8     1.7470    0.9450    1.3200    2.4810    1.4500    1.2474    1.6904
 9     1.5960    0.9450    1.4050    1.9930    2.0300    1.3277    0.9471
10     2.0150    1.4410    0.7010    1.8500    1.7390    1.0101    1.0731
11     1.1705   −0.0630    0.1320    0.4140    0.9410   −0.0083    1.8788
12     1.8470    0.9630    1.6510    2.1600    2.1620    1.5899    1.2962
13     0.7780   −0.9790    0.4180    0.9510    0.3370   −0.4092    0.9798
14     0.0170    0.1815   −0.2080   −0.3185    0.5185   −0.0378    1.9577
15     0.8025   −1.7490   −0.4720   −0.5025   −0.2805    0.8255    1.2865
16     1.8330    1.4010    1.1100    2.3400    1.1770    1.5551    1.0894
17     0.5370    0.6340    1.7250    2.3160    1.0940    1.0937    1.3663
18     1.4090   −0.0340    0.6490    0.7400    1.5150   −0.0221    1.4135
19    −0.2560   −1.8570   −0.2460   −1.1275   −0.5855    0.4568    1.5067
Randomly select six samples from Table 1 as the training set to simulate the scenario in which the data are obtained when the second pilot run is finished. The six selected samples are listed in Table 2. Note that when the training sample size is 18, each sample is in turn held out as the testing set while the others form the training set, for 19 validation runs.
Table 2. The six training samples randomly selected from Table 1.
No.    TPE1      TPE2      TPE3      TPE4      TPE5      TPE6      Cell-Vernier
 2     0.0880   −0.6780   −0.6555   −0.1265   −0.8885    0.4444    1.5990
 5     0.9950    1.1230    1.4650    2.3870    0.6830    1.6452    1.1087
 9     1.5960    0.9450    1.4050    1.9930    2.0300    1.3277    0.9471
10     2.0150    1.4410    0.7010    1.8500    1.7390    1.0101    1.0731
14     0.0170    0.1815   −0.2080   −0.3185    0.5185   −0.0378    1.9577
18     1.4090   −0.0340    0.6490    0.7400    1.5150   −0.0221    1.4135
For each of the seven attributes, calculate Q1, Q2, Q3 and the IQR, and then estimate the lower and upper bounds L and U from the fences, adjusted toward the minimum and maximum observations as described in the 'Domain conjecture' section. The results are shown in Table 3.
Table 3. The L and U of each attribute.
       TPE1      TPE2      TPE3      TPE4      TPE5      TPE6      Cell-Vernier
Q1     0.3148    0.0199    0.0063    0.0901    0.5596    0.0945    1.0820
Q2     1.2020    0.5633    0.6750    1.2950    1.0990    0.7273    1.2611
Q3     1.5493    1.0785    1.2290    1.9573    1.6830    1.2483    1.5526
IQR    1.2345    1.0586    1.2228    1.8671    1.1234    1.1538    0.4706
L     −1.5370   −1.5681   −1.8279   −2.7106   −1.1254   −1.6361    0.3761
U      3.4010    2.6664    3.0631    4.7579    3.3681    2.9790    2.2586
For each of the seven attributes, draw an asymmetric triangle membership function based on the corresponding L, U and Q2. In this example, the triangle membership function of attribute Y1 can thus be drawn as Figure 8.
Graph: Figure 8. The triangle membership function of the attribute Y1.
For each of the seven attributes, follow the three steps mentioned in the 'Value filling' section to acquire seven synthetic values. For instance, if a temporary value (tv) of Y1 is generated as −0.3503, its MF value can be read from the membership function in Figure 8 and tested against a random seed value s; if accepted, −0.3503 is kept as the synthetic value of Y1. The resulting synthetic values of the seven attributes are listed in Table 4.
Table 4. The results of each attribute's synthetic value.
Attribute   TPE1      TPE2      TPE3      TPE4      TPE5      TPE6      Cell-Vernier
Value      −0.3503    0.8949    2.0276    0.2361    2.4047   −0.0625    1.1639
Combine the values in Table 4 to generate a synthetic training sample. For example, if 100 synthetic training samples are expected to be created, steps 4 to 5 should be repeated another 99 times.
Append the generated training samples to the six real samples to form the final training set, which has 106 samples.
The effectiveness of the proposed sample generation procedure is shown in Figure 9. The distribution of attribute Y1 of the six real samples is given in Figure 9(a), while that of the 106-sample training set is in Figure 9(b). Initially, the distribution of Y1 appears fragmentary, with large gaps between values, and it is obviously difficult to obtain more information from the distribution in Figure 9(a). In contrast, the structure of the distribution in Figure 9(b) is more complete, because the synthetic samples fill the gaps between the observed values. As the sample size increases, the central tendency of the rebuilt distribution becomes more pronounced, and the result shown in Figure 9(b) demonstrates the effectiveness of the proposed heuristic mechanism. In addition, as mentioned in the 'Input attributes preprocessing' section, the distribution of the 106-sample training set affords more effective bin bounds (i.e. splitting positions) for the tree-growing process. Therefore, the proposed procedure strengthens the knowledge acquisition capability of the M5' model tree.
Graph: Figure 9. The distribution of attribute Y1 of (a) the six real samples; (b) the 106 training set.
Take the final training set to build an M5' model tree. One may follow the steps mentioned in Section 3.3 or use the WEKA data mining freeware, which can be downloaded from its official website. The learning result of the unpruned M5' model tree built with the 106 final training samples is shown in Figure 10.
Graph: Figure 10. The learning result of an unpruned M5' model tree built with the 106 final training samples.
Take the remaining 13 samples in Table 1 as a testing set to validate the forecasting model built in step 7, and then use the mean absolute percentage error (MAPE) to evaluate its predictive accuracy:

MAPE = (1/M) × Σ |Yi − Ŷi| / Yi × 100%

where M is the sample size of the testing set, and Yi and Ŷi are the actual and predicted values of the ith testing sample, respectively.
In the example, each real Cell-Vernier (Y) and the corresponding predicted value (Ŷ) of the testing set (13 samples) are shown in Table 5, with an MAPE of 13.39%. It takes approximately five minutes to build a forecasting model, including calculating the MAPE from steps 1 to 8 with the six training samples, using the proposed procedure on a Pentium 4 3.0 GHz computer with 2 GB of RAM. If the procedure is programmed, the whole process can be completed in at most a minute. This processing time is acceptable for most TFT-LCD manufacturers.
Table 5. The real Cell-Vernier (Y), the predicted values (Ŷ) and the related error percentages of the testing set.
No.    Y        Ŷ        Error
 1     1.0913   1.2320   12.89%
 3     1.1923   1.1900    0.19%
 4     1.2692   1.2160    4.19%
 6     1.1971   1.1870    0.84%
 7     1.1183   1.2000    7.31%
 8     1.6904   1.1990   29.07%
11     1.8788   1.1890   36.71%
12     1.2962   1.3320    2.76%
13     0.9798   1.1740   19.82%
15     1.2865   1.0970   14.73%
16     1.0894   1.2260   12.54%
17     1.3663   1.2280   10.12%
19     1.5067   1.1610   22.94%
Repeat steps 1 to 8 ten times and calculate the average MAPE of each training sample size:

AvgMAPE = (1/k) × Σ MAPEj

where MAPEj is the result of the jth experimental run and k is the total number of experimental runs for each training sample size, here set as 10. Note that when the training size is 18, k is set as 19.
Repeat steps 1 to 9 with different sample sizes of training set.
In this section, unpruned M5' model trees built with the real and final training sets, respectively, are compared with regard to their predictive accuracy, knowledge learned, and the practical validation of the 19" TFT-LCD product obtained in the mass production stage.
The results of the sensitivity analysis of training sample sizes are presented in Table 6 and Figure 11, where 'M5'' and 'this study' denote the unpruned M5' model trees built with the real and final training sets, respectively.
Graph: Figure 11. The trend of the average MAPE in each training sample size.
Table 6. The detailed computational MAPE of each training sample size with the corresponding runs.
Training size  Method      MAPE of each experimental run (%)                                Average  S.d.   P-value (two-sided)
 3             M5'         32.05 23.85 17.50 24.41 17.01 26.30 16.26 54.04 29.27 19.06      25.98    11.24  0.003
               This study  24.49 21.80 16.55 21.70 15.68 25.32 13.65 49.20 20.35 15.50      22.42    10.22
 6             M5'         22.33 17.51 14.72 19.07 17.04 22.47 16.28 21.57 18.87 27.03      19.69     3.67  0.012
               This study  15.70 15.66 13.72 16.56 16.66 15.11 15.40 12.47 17.59 13.39      15.23     1.60
 9             M5'         24.09 16.61 23.27 28.59 18.91 17.87 15.69 16.54 17.67 16.61      19.59     4.27  0.005
               This study  15.12 14.24 15.18 14.75 15.90 14.39 12.68 16.10 13.17 15.42      14.70     1.11
12             M5'         11.44 20.04 17.96 18.07 15.29 20.06 18.00 24.28 17.74 16.13      17.90     3.36  0.001
               This study  10.15 15.98 10.91 17.48 12.86 15.04 15.70 14.12 14.41 11.03      13.77     2.45
15             M5'         19.43 10.25 26.92 39.15 18.42 11.47 12.49 12.96 13.61 14.31      17.90     8.95  0.003
               This study  11.29  5.08 15.35 19.37 12.27 10.23  9.33  8.35  8.69 11.27      11.12     3.96
18 (19 runs)   M5'         14.02  0.54  8.63  0.60  7.21  8.56 33.43  0.00 27.78 53.27      16.74    14.92  0.001
                           23.22 31.87  2.15 11.24 16.76 41.50 14.07 16.63  6.49
               This study   7.33  0.18  2.75  0.60  0.35  1.47 22.13  0.08 21.45 17.14       6.99     7.70
                            8.27 20.94  0.59  8.39  0.94  6.23  6.15  2.87  4.96
Generally, the acceptable error (MAPE) between a prediction and the real Cell-Vernier value is expected to be within 10%. As the training sample size increases, both learning results (AvgMAPE) decrease, although it is difficult to achieve the goal (within 10%) using the M5' model tree built with the real training sets. However, the results (AvgMAPE) of the M5' model tree built with the final training sets reveal fewer learning errors, indicating a better and more stable forecasting system.
Table 6 also shows that, when the training sets contain the synthetic training samples generated by the proposed procedure, a lower and more stable standard deviation of the MAPE is observed. In addition, a paired two-sided t-test is employed to test the significance of the difference between the paired measures, and all the P-values support the final training sets containing the synthetic training samples at an alpha level of 0.05. This indicates that the proposed procedure can build a statistically useful and robust model.
Since the process engineers need to obtain more information about the criteria ranges of the six TPEs, the M5' model tree can achieve this: it represents the learned knowledge as causal tree-based classification rules and has precise predictive capability. However, the training sample size is always a key factor for the M5' model tree in determining knowledge acquisition capability.
We use the example (mentioned in Section 4) that has only six training samples to further illustrate this. When an unpruned M5' model tree is used to extract the knowledge hidden in the real training set, only two classification rules can be obtained (as shown in Figure 12). On the other hand, more information with 47 rules can be derived from the final training set (shown in Figure 10). From Figure 10, the process engineers can thus learn more about the criteria range of the six TPEs to reduce the potential costs in the cell process. This is further proof that the proposed procedure can build an informative and robust model.
Graph: Figure 12. The learning result of an unpruned M5' model tree built with the real training set (six samples).
When the six pilot runs of the 19" TFT-LCD product were completed, the first 1000 issued products in the mass production stage were taken to form a validation data set. The 19 samples collected in the six pilot runs are used to build two unpruned M5' model trees: one is learned with the 19 samples alone, while the other is learned with the 19 samples plus 100 synthetic training samples. The predicted results of the two models are summarised in Table 7, and the MAPEs are 28.89% and 8.06%, respectively. Obviously, the model built with the training set containing the synthetic training samples maintains good predictive capability in mass production. Figure 13 provides the cumulative frequency charts of the results for the two models.
Graph: Figure 13. Results of the 1000 samples using an unpruned M5' model tree built with (a) the real training set (19 samples) and (b) the final training set (119 samples).
Table 7. Actual and two kinds of predicted values of the 1000 samples.
No.  Actual M5'   This research |  No.  Actual M5'   This research | ... |  No.  Actual M5'   This research |  No.   Actual M5'   This research
  1  0.891  1.196 0.967         |  101  0.971  1.319 1.068         | ... |  801  1.053  1.387 1.055         |  901   1.055  1.417 1.051
  2  1.027  1.212 1.036         |  102  0.96   1.392 0.939         | ... |  802  1.098  1.285 1.476         |  902   1.163  1.349 1.075
  3  0.974  1.381 1.184         |  103  1.024  1.394 0.967         | ... |  803  1.089  1.38  1.074         |  903   1.051  1.297 1.057
  4  1.023  1.391 1.031         |  104  0.986  1.289 0.989         | ... |  804  1.009  1.372 1.157         |  904   1.032  1.365 1.024
  5  0.941  1.269 1.019         |  105  1.038  1.435 1.487         | ... |  805  0.967  1.427 1.058         |  905   1.154  1.323 1.099
  6  1.078  1.279 1.016         |  106  1.049  1.313 1.047         | ... |  806  1.025  1.374 1.077         |  906   1.102  1.34  1.09
  7  0.94   1.219 0.958         |  107  1.166  1.337 1.067         | ... |  807  1.101  1.352 1.071         |  907   1.047  1.365 1.097
  8  1.136  1.359 1.024         |  108  1.125  1.325 1.065         | ... |  808  1.051  1.365 1.42          |  908   1.149  1.444 1.034
  9  1.119  1.352 1.002         |  109  1.101  1.385 1.211         | ... |  809  1.01   1.432 1.377         |  909   1.039  1.36  1.071
 10  1.086  1.431 1.095         |  110  0.987  1.357 1.089         | ... |  810  1.101  1.432 1.119         |  910   1.009  1.328 1.011
...
 91  1.165  1.345 1.061         |  191  1.168  1.381 1.319         | ... |  891  1.15   1.386 0.975         |  991   1.03   1.343 1.029
 92  1.106  1.309 1.05          |  192  0.984  1.364 1.134         | ... |  892  1      1.317 1.09          |  992   1.066  1.272 1.013
 93  0.976  1.316 1.057         |  193  0.974  1.363 1.059         | ... |  893  0.981  1.375 1.071         |  993   1.05   1.416 1.093
 94  1.054  1.337 1.07          |  194  0.964  1.364 1.079         | ... |  894  0.969  1.278 1.201         |  994   1.101  1.321 1.054
 95  1.044  1.283 1.074         |  195  1.103  1.352 1.088         | ... |  895  0.974  1.436 1.151         |  995   1.147  1.275 1.027
 96  1.026  1.318 1.044         |  196  1.024  1.361 1.027         | ... |  896  1.106  1.445 1.092         |  996   0.962  1.378 1.036
 97  1.016  1.394 1.005         |  197  1.142  1.409 1.004         | ... |  897  1.121  1.399 1.047         |  997   1.122  1.411 1.132
 98  1.098  1.434 1.156         |  198  1.061  1.282 1.016         | ... |  898  1.03   1.37  1.012         |  998   1.156  1.426 1.191
 99  1.101  1.373 1.12          |  199  1.135  1.333 1.033         | ... |  899  1.073  1.292 1.05          |  999   1.055  1.295 1.024
100  1.073  1.396 1.042         |  200  1.056  1.33  1.466         | ... |  900  0.972  1.375 1.09          |  1000  0.991  1.405 1.147
Since the acceptable error between a prediction and the real Cell-Vernier value is expected to be within 10%, from Figure 13(a) we find that only two of the 1000 samples reach this goal for the model built with the 19 samples, while 754 of the 1000 samples (shown in Figure 13(b)) achieve it with the proposed procedure. This result also reveals that the proposed procedure provides a practically robust model for predictive purposes.
In the optoelectronics manufacturing industry, if the value of the Cell-Vernier on a panel is out of specification after the cell process, the panel will be scrapped. Predicting the Cell-Vernier in advance, using the six TPEs measured in the CF process, to avoid unnecessary costs and delays in delivery is thus an important issue for TFT-LCD manufacturers. The proposed procedure makes the predictions more precise, and more learned knowledge, i.e. the classification rules, can be extracted for engineers. If TFT-LCD manufacturers follow the procedure outlined in this study, it is expected that the inventory turnover rate will rise and costs can thus be reduced. The storehouse utilisation rate will also improve.
There have been few studies of small samples in this context to date, and potential exists to derive more comprehensive theories to obtain a higher rate of accuracy. In addition, taking dependency conditions between attributes into consideration when generating synthetic values is also considered a worthy subject in the future.
By Der-Chiang Li; Chien-Chih Chen; Che-Jung Chang and Wen-Chih Chen