Forest species classifications are becoming increasingly automated as advances are made in machine learning. Complex algorithms can reach high accuracies, but are not always suitable for small-scale classifications, which may benefit from simpler conventional methods. The goal of this classification was to identify contiguous stands of ponderosa pine (Pinus ponderosa Douglas ex Lawson) against a mix of forest and non-forest background in the southern Willamette Valley, Oregon. The study area is approximately 816,600 ha, considerably larger than most study areas used for presenting techniques for tree species classification. To achieve the objective, we used two classification procedures, one parametric and one non-parametric. For the parametric method, we selected the maximum likelihood (ML) algorithm, whereas for the non-parametric method we chose the random forest (RF) algorithm. To identify ponderosa pine, we used 1 m spatial resolution red-green-blue-infrared (RGBI) aerial images supplied by the U.S. National Agriculture Imagery Program (NAIP) and 1 m spatial resolution canopy height models (CHMs) provided by the Oregon Department of Geology and Mineral Industries (DOGAMI). We tested four data variations for each method: Aerial imagery, CHM-masked aerial imagery, aerial imagery with an additional CHM band, and CHM-masked aerial imagery with a CHM band. The parametric classifications of aerial imagery alone reached an average kappa coefficient of 0.29, which increased to 0.51 when masked with CHM data. The incorporation of CHM data as a fifth band resulted in a similar improvement in kappa (0.47), but the most effective parametric method was the incorporation of CHM data as both a fifth band and a post-classification mask, resulting in a kappa coefficient of 0.89. The non-parametric classification of aerial imagery achieved a mean validation kappa coefficient of 0.85 collectively and 0.90 individually, which only increased by approximately 0.01 or less when the CHM masks were applied. The addition of the CHM band increased the kappa value to 0.91 for both individual and collective tile classifications. The highest kappa of all methods was achieved through five-band non-parametric classification with the addition of the CHM band (0.94) for both collective and individual classifications. Our results suggest that parametric methods, when enhanced with a CHM mask, could be suitable for large-area, small-scale classifications based on RGBI imagery, but a non-parametric classification of fused spectral and height data will generally achieve the highest accuracy for large, unbalanced datasets.
Keywords: tree species classification; binary maps; maximum likelihood; random forest; canopy height model
Remote sensing has become an indispensable tool in forest management, due to the spatial scales of forested land and management timelines. The use of satellite or aerial imagery in land cover analysis enables land managers and planners to make landscape level decisions quickly and accurately. This is crucial for managers who work with large tracts of land, or researchers performing small-scale analyses. Advances in remote sensing introduce increasingly fine-resolution datasets, both spatially and spectrally. This allows more precise measures of variations across the landscape, such as forest species composition. Fortunately, many of these datasets are freely available for landscape analyses.
The relatively recent availability of sub-meter spatial resolution introduced new opportunities for precise measurements and predictions of land cover, allowing managers to identify land classes at a very fine scale. With the introduction of such high spatial and spectral resolution imagery, forest vegetation classification can be performed at species-level precision [[
Most recent studies have made use of non-parametric algorithms for tree species classification, particularly support vector machine and random forest (RF) [[
As data quality progresses and improves, so do the classification techniques performed upon them. Many of the advances of the past few decades are concentrated in machine learning, the ultimate objective of which is to achieve or surpass the pattern recognition capacity of a human [[
When large swaths of land are captured by a sensor, the rich spectral information used for land-cover classification can be challenging for classes that covers only a fraction of the image. Therefore, we considered an unbalanced image one that contain classes with a number of pixels three orders of magnitude smaller than the rest of the classes [[
Many factors influence the suitability of parametric and non-parametric algorithms for vegetation classification. Species-level studies vary in study area, quantity and precision of classes, model complexity, time constraints, and desired product. The decision to use a parametric or non-parametric algorithm can be influenced by all of these factors. For species classifications that target one class of vegetation, unbalanced data, as discussed above, can introduce another challenge in selecting an appropriate algorithm.
Species classifications require algorithm training, and for large study areas that are broken into dozens or even hundreds of rasters, this can be time consuming. When building an unbalanced classification, analysts may attempt to create a representative sample of class training data in which the target species is known to occur in greater quantities, to classify a large assortment of rasters with unknown target species abundance. For datasets that are unequally distributed across the study area, creating a representative set of training data can be particularly time consuming. Non-parametric algorithms, such as decision trees or machine learning classifiers, are often selected for such tasks, as they are better equipped to handle unbalanced data through ensemble mechanisms such as bagging or boosting [[
The Willamette Valley is located in northwestern Oregon (Figure 1), a region composed of agriculture, developed urban areas, and temperate forest. Douglas-fir (Pseudotsuga menziesii) dominates the majority of surrounding forests of the Willamette Valley, often growing with western hemlock (Tsuga heterophylla), western redcedar (Thuja plicata), and deciduous trees such as bigleaf maple (Acer macrophylla) and red alder (Alnus rubra). While these species are among the most common today, this was not the case prior to European settlement of the region.
Historically, indigenous communities burned large areas of the valley to maintain an abundance of edible plant species and to maintain visibility for hunting. The routine burning minimized the growth of trees and hardwood shrubs, creating an abundance of open prairies. Surveyors noted that Willamette Valley Ponderosa Pine (WVPP) typically grew within white oak (Quercus garryana) woodlands, shallow hills and ridges, areas prone to flooding, and in open savannahs with white oak and Douglas-fir [[
In 1996, a group of concerned landowners and researchers formed the Willamette Valley Ponderosa Pine Conservation Association (WVPPCA) with the intention of maintaining the genetic diversity of WVPP and increasing the ease of acquiring and growing the pine locally [[
The Willamette Valley is an alluvial region in northwestern Oregon of about 14,457 km
The center of the valley is primarily developed urban and agricultural land, and encompasses several large cities, including Portland, Salem, and Eugene. Nutrient-rich soil and relatively flat topography (sea level to 122 m) makes the Willamette Valley a significant agricultural area. The Valley contains several prominent rivers, including the Columbia, Willamette, Tualatin, McKenzie, Santiam, and others. The perimeter of the valley rises into the foothills of the Coast and Cascade Ranges, where dense forests take over the landscape. Most WVPP has been planted at the interface of rural urban areas and commercial forestland.
High-resolution red-green-blue-infrared (RGBI) aerial imagery was acquired from the National Agriculture Imagery Program (NAIP), which is flown every two to three years in all continental states. The data were collected using Leica Geosystem's ADSH100/SH100 digital sensors, flown on a Cessna 441 at a height of about 8400 m. The sensors recorded imagery at 40cm or 80cm GSD in four wavelengths, namely red, green, blue, and near-infrared. Images were captured continuously at three angles: Nadir, forward (26°), and backward (19°). Flight lines were oriented north/south with 30% sidelap.
The data were processed to contain four RBGI bands with a ground pixel resolution of 1 m. The imagery was delivered as georeferenced digital ortho-quarter quad tiles (DOQQ), each tile covering one USGS topographic quadrangle. The Willamette Valley ecoregion contained 507 orthophotos collected in June and August of 2016 with no cloud cover. Two hundred and thirty-one of these raster images fell within the focus area of the southern Willamette Valley. However, only a small fraction contains Ponderosa pine that could be reliable identified from the images. Therefore, we have focused the study on five DOQQ tiles, which covered an area of 69,722 hectares.
The training data was partially created through known locations of WVPP, but most sites were identified by visually scanning aerial imagery for WVPP-specific characteristics, such as color, crown structure, and planting patterns. This process was facilitated by a false color composite of the NAIP imagery and very-high-resolution satellite reference imagery. Across the used dataset, 12.8 ha of WVPP training regions were identified and used. An additional two or three classes were used in parametric classification to hone the pine training areas, namely non-pine coniferous forest and low vegetation classes were used for all classifications, and a deciduous class was used in riparian-dominated regions. For non-parametric classifications, only two classes were used (pine and non-pine). The training data for the additional classes were created using visual assessment of the NAIP imagery, as they were less spectrally precise than the WVPP regions.
Lidar-derived highest hit digital surface models (DSMs) and bare earth digital elevation models (DEMs) of the focus area were acquired from the State of Oregon Department of Geology and Mineral Industries. The majority of data used was collected in the 2009, with some additional data from the 2012, 2015, and 2016 [[
Watershed Sciences, Inc. collected and processed the lidar data for the 2009 projects. The data were flown using a Leica ALS50 Phase II laser system mounted in a Cessna Caravan 208. Data were collected at near nadir, and flight lines overlap 60% to 75% laterally. The average pulse density in 2009 was 8.14 pulses per m
Estimated species cover datasets have been created for western Oregon and surrounding regions in the Pacific Northwest, such as the gradient nearest neighbor (GNN) species map created by the Landscape Ecology, Modeling, Mapping, and Analysis (LEMMA) group. LEMMA's species map provides a continuous raster image of estimated plant community composition over large landscapes [[
The inherently mixed species pixels of this dataset limit its usefulness in training or validating a single-species classification map, especially a species such as WVPP, which is present in small quantities within a very fragmented matrix of agricultural, forested, and developed land covers, and often occurs in very small stands that may be lost in the 30 m resolution of the GNN map. However, in future studies of unbalanced species mapping, a GNN species map may be very useful in providing a preliminary filter that can mask out unsuitable vegetation communities in large study areas.
Initial attempts to run an ML classification on the original NAIP aerial imagery resulted in a high level of noise and class blending. The high spatial resolution of the imagery led to wide spectral variation within each target class, causing substantial overlap between the vegetation classes. To decrease spectral variability within the classes, a smoothing filter was applied to each image before classification. This filter came from the Orfeo Toolbox "Mean-Shift Smoothing" algorithm, which draws spatial and spectral information from neighboring cells within given radii to produce an image that preserves integral edges [[
The original NAIP imagery data had limited class separability potential. Although the spatial resolution was very high—so high that the smoothing filter was needed—the spectral resolution was limited to four bands. Many tree species classification studies make use of multispectral or hyperspectral data with much higher spectral resolution [[
In an additional test of the effectiveness of CHM data in vegetation class separation, we created CHM-based height masks to apply post-classification. Our goal was to reduce commission of agricultural and mature conifer regions. The CHM rasters were reclassified, with values between 5 feet and 90 feet assigned a value of one, and all other heights assigned a value of zero. The minimum value was selected to exclude areas of low, non-forest vegetation while preserving young WVPP stands, and the maximum value was selected to exclude mature, non-pine conifer stands. This range exceeds the expected height of most planted WVPP stands, because most stands targeted by this study are under 30 years old as a result of the conservation efforts of the 1990s [[
In a preliminary attempt to increase class separability, we tested the impact of incorporating two vegetation indices as fifth bands: The normalized difference vegetation index (NDVI) and the visible-band difference vegetation index (VDVI), similar to Wang Xiaoqin et al. [[
(
(
The NDVI and VDVI are calculated with pixel values and do not rely on sensor-specific coefficients, such as the enhanced vegetation index. The NDVI and VDVI are extensively used in unmanned aircraft systems (UAS), which do not experience significant atmospheric interference, and often only collect visible band imagery, due to the wide availability of high spatial resolution cameras [[
To identify the WVPP using the parametric method, we chose three algorithms based on their popularity in image analysis: ML [[
The ML algorithm distributes pixels among the classes based on the probability of class membership. The version of ML used in the study uses non-normalized discriminant function g
(
- i = class
- x = n-dimensional data (where n is the number of bands)
-
p(ω
i ) = probability that class ωi occurs in the image and is assumed the same for all classes -
|Σ
i | = determinant of the covariance matrix of the data in class -
Σ
i −1 = its inverse matrix -
m
i = mean vector
To allocate a pixel to a class, the discriminant functions are compared with the threshold, which leads to the inequality:
(
where T
The rearrangements of terms from (
(
In eventuality that x is normally distributed,
(
The ML classifications were performed using the ENVI implementation of the ML algorithm, which was applied on the enhanced NAIP imagery with and without the CHM mask on the four-band spectral NAIP data and the five-band NAIP + CHM data. Therefore, a set of four ML classifications were executed (Figure 2).
The ML classifier is not the only algorithm that provides accurate results for image analysis [[
Mirroring the ML method, we performed four RF variations within the non-parametric classification method. The 4-band NAIP and 5-band NAIP + CHM rasters were classified in R, and the CHM binary mask was applied to each of these (Figure 2).
Irrespective of the classification approach (i.e., parametric or non-parametric), the NAIP images were proven to provide superior results than classifications applied on images with lower spatial resolution [[
The NAIP images used for assessment contain less than 0.02% of identified ponderosa pine. Because of the relatively low abundance of WVPP in naturally or artificially mixed stands, and the difficulty in validating the accuracy of a classified mixed stand, only plantations that would provide an uninterrupted signature of WVPP were considered for training and validation. Across the 69,722 ha of the area, all of the 12.8 ha of pine pixels were used to train and validate the classifiers, with a randomized 7:3 ratio of training to validation pine stands. For the ML parametric classifier, we randomly selected non-pine pixels covering 86.3 ha. Because of the reduced, unbalanced presence of the pine in the study area, there were no assessment data outside the 12.8 ha, meaning that the accuracy was carried out on the validation data. For each tile and classification method, a set of three randomizations were executed, which produced 243 combinations, i.e., 3
We assess the performances of the nonparametric classifier RF with a 5-fold cross-validation approach, similar to Goetz et al. [[
All classifications were assessed using overall accuracy, kappa coefficient, and pine commission and omission error (Equations (
(
(
(
(
where:
- TP = number of pixels with pine predicted with pine (true positive)
- TN = number of pixels without pine predicted without pine (true negative)
- FP = number of pixels without pine predicted with pine (false positive)
- FN = number of pixels with pine predicted without pine (false negative)
- All = total number of pixels: All = TP + TN + FP + FN
To achieve the best smoothing results in Orfeo Toolbox, we used a spatial radius of four pixels, a range (spectral) radius of 256 DN values, and a 0.1 mode convergence threshold. Iterations were capped at 100. These values were settled upon through trial and error. The spatial radius was selected for its ability to smooth stand canopies while preserving the edges of larger features. The range radius value impacted trial classifications to a lesser degree, so a relatively large value was selected to ensure spectral consistency within features. The mode convergence and max iteration values were kept at default.
We found that incorporating the CHM data as a fifth band generated greater accuracies for both parametric and nonparametric methods in this preliminary test. However, the benefit was not consistent between both classification methods. Adding the CHM band increased overall accuracy from 62.8% to 65.2% and kappa coefficient from 0.48 to 0.52 for ML, but for the RF algorithm, the CHM band increased overall accuracy from 86.0% to 96.3% and kappa coefficient from 0.46 to 0.75 (Table 1).
While the addition of a CHM band was most pronounced in the non-parametric method, the application of a CHM mask seemed most effective in the parametric method. The preliminary mask application increased the parametric overall accuracy from 62.8% to 92.4% and kappa coefficient from 0.48 to 0.60 (Table 2). The effect of the mask on the non-parametric method was not as impressive as was on the parametric method; overall accuracy rose from 86.0% to 90.2%, and the kappa coefficient rose from 0.46 to 0.49 (Table 2).
The large increase in pine omission to RF following the application of the CHM mask was initially worrisome, but the likely cause of the increase is canopy gap lidar returns. Any pixel falling within the pine class validation polygon was flagged as "true pine" in the accuracy assessment, but the high resolution of the CHM resulted in canopy gap ground returns being classified as "non-pine" after masking, because the pixel had a true height of less than five feet. This was flagged as an error in the accuracy assessment, but it should not be considered an innate shortcoming of the RF classification, because oftentimes the spectral classification was correct, and was later changed to non-pine because of the ground return. This could be remedied by adjusting the mask parameters.
The incorporation of NDVI and VDVI as bands did not improve the classification accuracy by an appreciable margin during preliminary testing. The addition of an NDVI band did improve overall accuracy by 2.4% and the kappa statistic by about 0.02 (Table 3), and was considered insufficient to justify processing and adding NDVI bands for all rasters. VDVI decreased overall accuracy by 1.9% and kappa by 0.02. In all instances of index incorporation, pine omission was decreased by several percentages, but this improvement came at the cost of pine commission.
After considering the results of the CHM and vegetation index band additions, we concluded that the parametric and nonparametric methods would benefit from the incorporation of CHM data through pixel-level fusion and post-classification masking, and that the inclusion of vegetation index bands would not appreciably improve the results of either method. Therefore, the subsequent analysis was executed without including NDVI and VDVI.
Of the three parametric algorithms, ML most accurately classified the known WVPP stands present in the sample region (Table 4). Although it did not attain the highest overall accuracy, it classified the pine class with the highest producer accuracy by 13% and achieved a user accuracy consistent with the other two algorithms. SAM correctly identified most pine areas but had a very high rate of commission error in both non-pine classes; its relatively high overall accuracy was inflated by high accuracies in the non-pine classes. MD was unable to distinguish the WVPP stands from certain agricultural crops and mixed conifer stands. Some of the most prominent WVPP stands were a challenge for MD to classify.
In addition to the SAM results listed above, which used individual training datasets, the SAM trial with the spectral library training dataset performed extremely poorly at 24.8% overall accuracy with a 0.095 kappa coefficient. Pine was massively overrepresented on the landscape, yet was poorly predicted in the true stand locations, with a commission rate of 36.24% and an omission rate of 42.14%. It was this finding that affirmed our decision to use ML for the parametric classification, in which each image had its own finely tuned training dataset.
The parametric classification involved multiple iterations, as new training areas were identified or unreliable training areas removed. This process produced higher accuracies, as additional adjustments were implemented on the training areas. The most effective ENVI class threshold was 0.05 for all classes; a threshold of 0.1 or higher resulted in very high commission error, while thresholds lower than 0.05 resulted in a high level of unclassified areas, likely causing omission error.
The ML classifications produced a broad range of accuracies across the various combinations of spectral and CHM data. The simple four-band aerial imagery classification produced a fairly low-accuracy image at 67.73% overall accuracy and 0.29 kappa (Table 5). Pine commission was very high at 47.48%, and pine omission was at its highest at 13.59%. The addition of the CHM band increased the overall accuracy and kappa coefficient of the parametric method substantially. Overall accuracy increased by about 15%, and the kappa coefficient increased by 0.22. Pine omission was nearly unchanged, but the application of the CHM mask removed nearly 30% of pine class commission (Table 5).
The classification of the five-band images, which incorporated the CHM band, reached higher accuracies on average. The un-masked five-band images were classified with a 78.08% overall accuracy and a kappa of 0.47 (Table 5). Pine omission was a few percentage points lower than the four-band classifications, but pine commission saw an even greater decrease to an average of 6.42%. The application of the CHM mask increased overall accuracy by nearly 20% and nearly doubled the kappa coefficient. Pine omission was nearly unchanged, but pine commission nearly disappeared, achieving an average of just 0.83% (Table 5). A visual representation of the five-band masked classification can be seen in Figure 3.
The non-parametric classifications were conducted in the same four combinations as the parametric classifications, with the addition of one extra variation. Each tile was classified once using only its spatially related training data (individual classification—Table 6) and once using training information from all five images (collective classification—Table 7).
Overall accuracy for all RF classification variations exceeded 90%, with kappa values ranging between 0.85 and 0.94. In most cases, exclusively using training data that were spatially related to the raster gave better results, but only by at the most 0.05 kappa (Table 6). The collective classifications only showed superior results in the five-band masked classification (Table 7).
The application of a CHM mask did not benefit the four-band classification, and in fact reduced accuracy by a very small margin. When applying the mask to the five-band classification, accuracy increased by 0.03 kappa. Both pine omission and pine commission were reduced in the five-band masked classification (Figure 4).
The addition of height data to the NAIP aerial imagery at both pixel level (included in classification) and decision level (included post-classification) improved classification accuracies across all variations of parametric and non-parametric methods. The incorporation of vegetation indices had minimal effects upon the accuracy of the preliminary ML classification, and in fact reduced overall accuracy and kappa for two of the three index combinations. The number of indices that were available for use were extremely limited, as EVF and other indices requiring band coefficients have not yet been established for NAIP imagery. NDVI and VDVI use simple band algebra to emphasize differences in vegetation reflectance. Such indices can be valuable in reducing excess spectral information and exaggerating differences between classes, but when classified in combination with all original input bands, there is little to no new information added to the pixel stack. The minor improvement in overall accuracy offered by the NDVI band did not provide sufficient evidence for its usefulness in class separation, and the decrease in accuracy among the other index combinations supported our decision not to include these indices further.
The four-band ML classification on average had the lowest accuracy of all classification methods (67.73% overall accuracy and 0.29 kappa coefficient). This low average can be attributed to several influences. The biggest challenge for the four-band classification is the difficulty to represent the pine class fully, with its spectral variations introduced by age and tree heath, while also excluding all other types of vegetation. Additionally, each raster contains varying quantities of pine, exacerbating the difficulty in adequately representing the pine class within each raster when few stands are available to be sampled. Images with fewer pine stands available, consequently training data, generally produced worse results.
The accuracy of the four-band classification was greatly improved by applying the CHM mask, primarily by its removal of pine commission in the non-forest vegetation class. By masking the four-band ML classified images, mean pine commission was reduced by about 29%, resulting in an overall accuracy of 83.14% and kappa coefficient of 0.51. Visually, the similarities between WVPP stands and certain agricultural crops are stronger than the similarities between WVPP stands and other deciduous or coniferous forest species, making the lower height mask of five feet particularly effective.
Incorporation of the CHM band to create the five-band had a similar, but slightly weaker effect on the ML classification in terms of average overall accuracy and kappa coefficient. However, pine class commission was reduced by a substantial 41%, while pine class omission was reduced by about 3%. Masking cannot truly affect pine omission, as the mask does not add any pixels excluded from the pine class. Any decrease in pine omission from masking is the result of masking a different iteration of the four-band or five-band classified image. This large decrease in pine commission suggests that the ML algorithm was successfully able to remove non-pine areas that matched the spectral signature but did not match the average height of the pine training areas. Some of these non-pine areas that were ultimately included may have been included at the fringe of the class parameters, lacking a hard limit in height parameters. There may have been points within the pine training data that fell outside the 5 to 90-foot limit, creating leeway in height discrimination.
While the generalized accuracy metrics suggest that a mask application is more effective in isolating the pine class, the substantial decrease in pine commission in the five-band classification compared to the mask's effect on commission reinforces the impact of the unbalanced data. Because the non-pine class is an amalgam of every single land cover other than pine, its less impressive accuracy lowered the overall accuracy and kappa of the five-band classification, while the mask automatically transferred any pixels outside of height range to the non-pine class, boosting its accuracy automatically.
The steady improvement in accuracy throughout the ML methods suggests that neither incorporation of CHM data should be ruled out in a parametric classification. Certainly, a combination of spectral and height bands is superior to an entirely spectral classification, but the application of a decision-level height mask helps correct excess commission error within a binary classification, even when the same data are incorporated at the pixel level. This is particularly visible in the jump from 0.47 to 0.89 kappa coefficient between the unmasked and masked five-band ML classification.
The four-band RF classification applied to individual images surpassed the accuracy of the four-band ML classification by nearly 30%, achieving an average overall accuracy of 95.63% (Table 6). This incredible increase in accuracy speaks to the effectiveness of non-parametric algorithms in classifying pure spectral data. While the four-band RF classification was quite impressive, increases in accuracy were practically inexistent with the addition of height data, as opposed to the dramatic increases seen between ML methods.
The application of the height mask to the four-band classification was effectively null, while the incorporation of the CHM band as the five-band raster improved accuracy by a scant 0.33% (0.01 kappa). Pine omission and commission were likewise weakly improved. A slightly larger, but otherwise negligible improvement was made by masking the five-band classification.
We found that better accuracy was achieved when classifying images with their own spatially related training data (individual classification) as opposed to a collective classification (Table 7). This is likely due to the spectral variability between images, which has a pronounced effect upon the attempt to separate the precise spectral signatures of varying vegetation species. The minimum accuracy for collective classification was more than 5% smaller than the smallest accuracy obtained for individual images (Table 7), even that the standard deviation was almost the same (i.e., 0.021). The decrease in accuracy with more than 2% from individual images to all the images is probably the results of the RF accounting for the increase variability associated with the larger landscape considered simultaneously rather than separately. The reduction in the ability of the RF to delineate ponderosa pine over large areas, which inheritably have not only larger variability but also was recorded from different position of the sun and atmospheric conditions, was present in the omission and commission errors, which also increased with more than 2% compared with the classification of individual images (Table 6 and Table 7). The addition of CHM did not improve the accuracy (i.e., 93.43% vs. 93.36%) but reduced the variability (i.e., standard deviation decreased from 0.0205 to 0.0203, meaning 1%), even insignificantly.
While the RF algorithm generally outperformed the ML method in most variations in terms of overall accuracy, most notably when classifying four-band spectral data, the accuracies of ML and RF began to converge with the application of CHM data through pixel-level fusion and mask application in the ML classifications. Overall accuracies were nearly equal between the five-band masked classifications by each algorithm, with a slightly lower kappa for ML (Table 5). The use of CHM data in the ML classifications was particularly effective in reducing commission in the pine class. This came at the price of increasing pine omission, which is visible in Figure 3 and Figure 4; the ML image is much less dense than the RF image. The contiguous pine stands were not classified as completely by ML as they were by RF, and therefore were highlighted by a larger sieve for the ML image (areas greater than 1000 m highlighted for ML, areas greater than one hectare highlighted for RF).
When classifying large areas, the acquisition of training and validation regions can be tedious. While the collective RF classifications were technically lower in accuracy than the individual classifications, the differences were small enough that a collective RF classification could require a lower density of training data across the landscape, enabling the user to accurately classify regions where training data are scant or non-existent. This is a major drawback of ML, which requires that training data be provided from the image to be classified.
The unbalanced proportion of WVPP in respect to the rest of the land use classes was successfully identified by the selected parametric and non-parametric algorithm. However, the non-parametric algorithm provided results that were consistently superior to the parametric algorithm, even if the difference in overall accuracy was operationally negligible. The non-parametric algorithm had a kappa coefficient consistently larger than 0.90—values unattained by the parametric algorithm. Nevertheless, the only situation when the two types of algorithms were comparable were when the elevation data were included, which in many areas is not available. In the case when only spectral data are available, the non-parametric algorithm is superior in binary classify landscapes with highly unbalanced vegetation classes.
The limited spectral resolution, large size of the imagery dataset, and the unbalanced targeted classes contributed to many of classification errors. Considering that the training data could not be fully ground-verified, it is likely that some of the pine training regions that could not be verified with street-view imagery were not truly pine but were instead Douglas-fir or another spectrally or structurally similar species. Furthermore, the training data could only be obtained by visual assessment of the imagery, rather than in-person sampling. In as many cases as possible, we only used training data that could be validated through street-view imagery. However, these data are not available in rural parts of the study area, so for some regions, training data were created solely through assessment of high-resolution satellite imagery in various color composites. Some training data were composed of plantations that may have been too thin to provide a solid spectral signature, thereby including understory pixels that contributing to pine commission. Age differences between pine stands also contributed to spectral variability within the pine class.
The most common confusion was between pine and mixed-conifer plantations that shared similar planting and growth patterns, and certain agricultural crops. Widely spaced Douglas-fir plantations, especially mature Douglas-fir, is visually similar to many WVPP plantations. As WVPP and Douglas-fir stands age, they become more similar spectrally. WVPP stands tend to have a duller, browner reflectance than other conifer stands. This is partially because WVPP is planted at a wider spacing and has low-density crown characteristics, resulting in a higher proportion of ground visibility within the stand and through the canopy. We aimed to minimize this confusion through the integration of CHM data with the spectral data.
We found that the highest accuracies, considering overall average, kappa coefficient, omission, and commission, are achieved by fusing CHM data with spectral data and performing a non-parametric classification, followed by a height mask. However, if height data are not available for the area of interest, a non-parametric classification of spectral data alone provides near-equivalent results.
If a parametric algorithm is to be used, the classification is dramatically improved through the addition of height data and masking, as the addition of non-spectral data offers greater class separability in spectrally similar areas. Because our classification targets were forest stands rather than individual trees, height data of moderate resolution was useful and more readily available for such a large study size. The unbalanced nature of the dataset was sufficiently handled through under-sampling of the majority non-pine class.
RF was the superior classification method for our data, because the non-parametric algorithm was ultimately the best fit for the unbalanced, non-parametric nature of our study area. Isolating such a small segment of the spectral landscape within a massive study area required the on-the-fly adaptations of RF, while ML was limited by our defined classes and training areas.
Although RF classification of five-band imagery achieved the highest accuracies of the six methods, the ML method still reached a reasonably high kappa coefficient and extremely low commission rates with the addition of the CHM mask. While RF is the preferable method when classifying small-scale, high-resolution aerial imagery, the ML method can still achieve high accuracies if height data is available to enhance it.
Our findings suggest that the parametric and non-parametric method used in the study are appropriate for unbalanced binary classifications. However, the non-parametric algorithm is better suited for heterogeneous land cover, and if necessary, can be modified to use a standardized training dataset on an iteration of images, whereas the parametric method requires training data that are spatially related to each image in the iteration. The non-parametric method seems to be suitable for both standalone spectral and height-fused data, although the addition of height data may increase the classification accuracy across nearly all metrics.
Graph: Figure 1 The Willamette Valley and the footprints of the rasters used for identification of Willamette Valley Ponderosa Pine (WVPP) stands.
Graph: Figure 2 Methods workflow for final classification techniques.
Graph: Figure 3 ML classification of five-band raster with CHM mask applied. Red areas are contiguous regions of pine class greater than 1000 square meters. White areas are smaller regions of pine class. Black areas are classified as non-pine.
Graph: Figure 4 RF individual classification of five-band raster with CHM mask applied. Red areas are contiguous regions of pine class greater than one hectare. White areas are smaller regions of pine class. Black areas are classified as non-pine.
Table 1 Accuracy statistics for the addition of a canopy height models (CHM) band to the maximum likelihood (ML) and random forest (RF) classifications.
4-Band ML 5-Band ML 4-Band RF 5-Band RF Overall Accuracy 62.8% 65.2% 86.0% 96.2% Kappa Coefficient 0.48 0.52 0.46 0.75 Pine Omission 40.2% 7.4% 17.5% 16.1% Pine Commission 25.9% 43.8% 14.8% 3.6%
Table 2 Accuracy statistics for the application of a CHM mask to the ML and RF classifications.
4-Band ML Masked 4-Band ML 4-Band RF Masked 4-Band RF Overall Accuracy 62.9% 92.4% 86.0% 90.2% Kappa Coefficient 0.48 0.60 0.46 0.49 Pine Omission 40.2% 38.3% 17.5% 34.7% Pine Commission 25.9% 2.8% 14.8% 6.5%
Table 3 Accuracy statistics for the addition of indices to the ML classification.
Overall Accuracy Kappa Pine Omission Pine Commission No Indices 61.1% 0.478 23.8% 12.9% 63.5% 0.507 19.4% 15.6% 59.2% 0.451 19.2% 18.8% 60.5% 0.468 17.9% 18.3%
Table 4 Comparative statistics for parametric classifiers.
Overall Accuracy Kappa Coefficient Pine Omission Pine Commission ML 53.9% 0.41 19.8% 15.5% SAM 56.6% 0.44 33.0% 17.0% MD 35.9% 0.23 40.0% 19.3%
Table 5 Confusion matrix for the ML classifications (in parenthesis, change from four-band statistics).
4-Band 4-Band Masked 5-Band 5-Band Masked Overall Accuracy 67.73% 83.14% (+15.41%) 78.08% (+10.35%) 97.22% (+29.49%) Kappa Coefficient 0.29 0.51 (+0.22) 0.47 (+0.18) 0.89 (+0.60) Pine Omission 13.59% 13.48% (−0.11%) 10.98% (−2.61%) 10.83% (−2.76%) Pine Commission 47.48% 18.84% (−28.64%) 6.42% (−41.06%) 0.83% (−46.65)
Table 6 Mean statistics of all RF individual images classifications (in parenthesis, change from four-band statistics).
4-Band 4-Band Masked 5-Band 5-Band Masked Overall Accuracy 95.63% 95.62% (−0.01%) 95.96% (+0.33%) 97.28% (+1.65%) Kappa Coefficient 0.90 0.90 (0.00) 0.91 (+0.01) 0.94 (+0.04) Pine Omission 4.27% 4.23% (−0.04%) 3.63% (−0.64%) 2.35% (−1.92%) Pine Commission 4.42% 4.43% (+0.01%) 4.24% (−0.18%) 2.91% (−1.51%)
Table 7 Mean statistics of all RF collective classifications (in parenthesis, change from four-band statistics.).
4-Band 4-Band Masked 5-Band 5-Band Masked Overall Accuracy 93.43% 93.36% (−0.07%) 95.93% (+0.50%) 97.30% (+1.87%) Kappa Coefficient 0.85 0.85 (0.00) 0.91 (+0.06) 0.94 (+0.09) Pine Omission 7.17% 6.94% (−0.23%) 3.74% (−3.43%) 2.25% (−4.92%) Pine Commission 6.27% 6.42% (+0.15%) 4.23% (−2.04%) 2.94% (−3.33%)
Conceptualization, S.A.F.; methodology, A.P.R. and B.M.S.; software, A.P.R., B.M.S., and C.Q.; validation, A.P.R.; formal analysis, A.P.R. and B.M.S.; investigation, A.P.R., S.A.F., and B.M.S.; resources, S.A.F. and B.M.S.; data curation, A.P.R.; writing—original draft preparation, A.P.R.; writing—review and editing, B.M.S. and S.A.F.; visualization, A.P.R.; supervision, S.A.F. and B.M.S.; project administration, A.P.R.; funding acquisition, S.A.F. All authors have read and agreed to the published version of the manuscript.
This research received funding from the Forestry and Natural Resources Extension Program at Oregon State University, the U.S. Department of Agriculture, grant number 2019-67019-29462, and the McIntire Stennis project OREZ-FERM-875.
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.
We would like to thank Glenn Howe for his significant contributions in recommending the usage of RF. We would also like to thank Matt Gregory for his support in the beginning stages of the study and for his help in troubleshooting code issues.
By Audrey P. Riddell; Stephen A. Fitzgerald; Chu Qi and Bogdan M. Strimbu
Reported by Author; Author; Author; Author