A high-resolution, area-wide spatial survey of the main tree species does not yet exist for the whole of Bavaria. Information on the percentage shares of the different tree species and their distribution across Bavaria is based on the extrapolation and spatial interpolation of sample data from the National Forest Inventory. These results do not, however, allow the derivation of a precise, area-specific spatial distribution of the tree species.

Since the end of 2022, a tree species map for the whole of Germany with a spatial resolution of 10 m x 10 m has been available for the first time, created and published by the Thünen Institute. This map is based on classic machine learning (ML) methods. Satellite images (2017/2018) from the Sentinel-2 satellites of the European Earth observation programme Copernicus served as the data basis. Tree species recognition is possible at the stand level here, but, because of the limited spatial resolution, not at the individual tree level.

In the KIHBA project, we hope to be able to use deep learning (DL) algorithms and a higher spatial resolution of the remote sensing data to generate a model to derive a tree species map at the individual tree level for the whole of Bavaria. The classes selected for the modelling correspond to the main tree species occurring in Bavaria: beech, oak, spruce and pine, as well as other coniferous and broadleaved trees. An additional class for deadwood was also defined.

Results from the remote sensing studies BeechSAT and IpsSAT, carried out to date at the LWF in cooperation with IABG, have shown higher classification accuracies for deep learning methods than were achieved with classic ML methods. Since DL methods for image evaluation can capture spatial structures and relationships, DL should complement a purely colour-based evaluation with the use of structural information. Compared to classic ML, however, DL methods require much larger amounts of reference data, the collection of which is usually very labour-intensive.

Data basis and reference data generation

For modelling at the individual tree level and in order to include structural information, the project uses remote sensing data of the highest possible resolution. These are digital true orthophotos (tDOP, aerial photographs corrected for distortion and tilt). These very precise, official aerial survey data from the Bavarian surveying administration are updated every two years and are available for the whole of Bavaria with a spatial resolution of 0.2 m in 4 channels (red, green, blue (RGB) + near infrared). The coverage of Bavaria is based on a large number of image flights taken at different times, resulting in inhomogeneities in the image material due to different lighting conditions. Alternative data from satellite-based sensors with resolutions of < 1 m (e.g. WorldView 2/3, SkySAT) were also considered, but were ruled out because of strong tilting of the trees due to oblique imaging, high positional inaccuracy, lack of availability, and high costs.

Possible climatic and geographical influences, the phenology and other heterogeneity of the input data must be considered during the tree species survey and mapped in the reference data used for modelling. This means that the reference data must be collected as widely as possible across Bavaria. Terrestrial inventory data from the National Forest Inventory (NFI) and the annual forest condition survey were used to help with their collection, as were high-resolution drone images from other projects at the LWF.

A total of 809 tract corners of NFI inventory points distributed across Bavaria were selected for the derivation of reference data. A square area measuring 50 m x 50 m was defined around each tract corner. In accordance with this selection, tDOPs from the years 2017/18/19 and 2021/22 were used. These had mainly been taken between May and July. Within each square, all trees with a minimum height of 12 metres were identified through visual interpretation, marked with a dot in the GIS, and assigned to a class (Figure 2).
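The geometry of such a reference square is simple to express; the following is an illustrative Python sketch (not project code), assuming coordinates in a metric, projected reference system as used by the official Bavarian survey data. The coordinates are hypothetical.

```python
# Sketch: define a 50 m x 50 m reference square centred on an NFI tract
# corner. Assumes a metric, projected coordinate system; the sample
# coordinates below are purely hypothetical.
def reference_square(x, y, size=50.0):
    """Return (xmin, ymin, xmax, ymax) of a square centred on (x, y)."""
    half = size / 2.0
    return (x - half, y - half, x + half, y + half)

bbox = reference_square(692350.0, 5335120.0)  # hypothetical tract corner
```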

Trees under 12 metres in height were excluded from the survey, as reliable determination of the species was not possible. The height information was taken from an official, normalised digital surface model (nDSM) with a spatial resolution of 1 m.
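The height filter can be illustrated with a toy example: candidate tree points are kept only where the nDSM value at the point reaches 12 m. The small array below stands in for the real 1 m raster, which would be read from file.

```python
import numpy as np

# Sketch: filter candidate tree points by canopy height from an nDSM,
# keeping only trees of at least 12 m. The nDSM is modelled here as a
# toy array with 1 m pixels; real data would come from a raster file.
ndsm = np.array([[ 3.0, 14.2, 18.5],
                 [11.9, 25.1,  7.4],
                 [13.0, 16.8,  2.2]])  # heights in metres

points = [(0, 1), (1, 0), (1, 1), (2, 2)]  # (row, col) of candidate trees
kept = [p for p in points if ndsm[p] >= 12.0]  # survey threshold: 12 m
```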

The quality of the recorded tree points was improved through multiple inspections according to the dual control principle, and each point was attributed to a “certain” or “uncertain” category (approx. 5 % uncertain). A total of approx. 103,000 tree points (97,200 of them certain) were set, with spruce, at 37.6 % of the total, being the most common tree species in the reference data.

The selected DL methods require areal labels, which depict the crown areas associated with the tree points. Through image segmentation, an attempt was made to derive the boundaries of these areas automatically. The results were unreliable, however, especially in heterogeneous stands; they showed over-/under-segmentation and some missing delimitations of tree species. Subsequent correction of the segmentation proved to be more time-consuming than a completely manual recording of the tree crowns, so that the areal labels were generated manually on the basis of the RGB-tDOP images. Since semantic segmentation (see following paragraph) does not consider individual objects, it was possible to group adjacent crowns of a tree species.

For later evaluation of the modelling, 10 % of the reference data was withheld from model training and tuning as a test data set. It was ensured that this test data set was representative in terms of tree species distribution, time of recording and spatial distribution. Because the test data were excluded from model training, predictions and subsequent analyses could be carried out on data unknown to the model.
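A stratified split of this kind can be sketched as follows; the example (illustrative only, with made-up class counts) draws the same fraction from every tree species class, and the temporal and spatial strata would be handled analogously.

```python
import random

# Sketch: a stratified 10 % test split, drawing the same fraction from
# every tree species class so that the test set mirrors the class
# distribution. Class labels and counts are purely illustrative.
def stratified_split(labels, test_frac=0.1, seed=42):
    rng = random.Random(seed)
    by_class = {}
    for idx, cls in enumerate(labels):
        by_class.setdefault(cls, []).append(idx)
    test = []
    for cls, idxs in by_class.items():
        rng.shuffle(idxs)
        n_test = max(1, round(len(idxs) * test_frac))
        test.extend(idxs[:n_test])
    return sorted(test)

labels = ["spruce"] * 40 + ["beech"] * 30 + ["pine"] * 30
test_idx = stratified_split(labels)  # 10 % of each class
```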

Selection of the appropriate deep learning modelling approach

In recent years, DL approaches, especially so-called Convolutional Neural Networks (CNNs), have become established in Earth observation. Notable features of DL approaches are the significantly larger number of model parameters compared with conventional ML methods, and the consideration of local neighbourhoods of the image pixels. In addition to spectral information (colour), spatial structures and textures of the image content are thus also included in the classification process. Since the aim is not only to recognise the tree species, but also to delimit their exact spatial boundaries, a semantic segmentation approach was chosen. In this process, each pixel of the input image is assigned to the relevant class; a differentiation between individual trees (as with instance segmentation) is not possible.
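The final step of semantic segmentation can be illustrated independently of the network itself: the model outputs one score map per class, and each pixel is assigned the class with the highest score. The toy 2x2 example below shows only this principle, not the project's actual model.

```python
import numpy as np

# Sketch: semantic segmentation assigns every pixel to a class via the
# argmax over per-class score maps. Toy 2x2 image, three classes; a
# real network would derive these scores from the tDOP input.
scores = np.array([
    [[0.7, 0.1], [0.2, 0.6]],   # class 0 scores (e.g. spruce)
    [[0.2, 0.8], [0.1, 0.3]],   # class 1 scores (e.g. beech)
    [[0.1, 0.1], [0.7, 0.1]],   # class 2 scores (e.g. deadwood)
])  # shape: (classes, rows, cols)

class_map = scores.argmax(axis=0)  # per-pixel class label, shape (2, 2)
```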

As part of the model optimisation, different model architectures were compared (U-Net, U-Net3P), and model parameters, e.g. class weightings, were tested.
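Class weighting addresses the imbalance in the reference data (e.g. the dominance of spruce). As a minimal sketch, assuming a weighted pixel-wise cross-entropy loss with illustrative weights, rarer classes can be up-weighted so that errors on them cost more:

```python
import numpy as np

# Sketch: class-weighted cross-entropy to counteract class imbalance.
# Probabilities, targets and weights are illustrative values only.
def weighted_cross_entropy(probs, targets, weights):
    """probs: (n_pixels, n_classes) softmax outputs; targets: class ids."""
    picked = probs[np.arange(len(targets)), targets]  # prob of true class
    w = weights[targets]                              # weight per pixel
    return float(-(w * np.log(picked)).sum() / w.sum())

probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1]])
targets = np.array([0, 1])
weights = np.array([0.5, 2.0, 1.0])  # up-weight the rarer class 1
loss = weighted_cross_entropy(probs, targets, weights)
```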

The influence of additional input data (e.g. tree heights) or of image augmentation was also analysed. With the latter, the original images are manipulated geometrically (e.g. through rotation, mirroring) or in terms of content (e.g. by adding noise) in order to increase the robustness of the model and enlarge the data set.
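The augmentations mentioned in the text can be sketched in a few lines; the tile below is a toy single-channel image standing in for a real tDOP tile, and the noise level is arbitrary.

```python
import numpy as np

# Sketch: simple geometric and content augmentations of an image tile,
# as described in the text (rotation, mirroring, additive noise).
rng = np.random.default_rng(0)
tile = np.arange(16.0).reshape(4, 4)   # toy single-channel image tile

rotated  = np.rot90(tile)              # 90-degree rotation
mirrored = np.fliplr(tile)             # horizontal mirroring
noisy    = tile + rng.normal(0.0, 0.1, tile.shape)  # additive noise
```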

Depending on the amount of input data, the computation time for a training session was approx. 10-12 hours on a standard graphics card; extensive hyperparameter tuning experiments took several days. The models generated were then evaluated using validation and test data, and the best model was determined. Established statistical metrics such as the F1 score and analysis of the error matrix were used.
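The F1 score can be derived directly from the error matrix, as a harmonic mean of precision and recall per class. A minimal sketch with made-up counts:

```python
import numpy as np

# Sketch: per-class F1 scores from an error (confusion) matrix,
# rows = reference class, columns = predicted class. Toy counts only.
cm = np.array([[50,  5],
               [10, 35]])  # e.g. class 0 = coniferous, class 1 = deciduous

tp = np.diag(cm).astype(float)       # correctly classified per class
precision = tp / cm.sum(axis=0)      # correct / all predicted as class
recall = tp / cm.sum(axis=1)         # correct / all reference of class
f1 = 2 * precision * recall / (precision + recall)
```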

The winning combination

The systematic validation of the training results made it possible to determine which combinations of parameters and input data sets improve the DL model and which ultimately form the best model. Figure 3 shows a selection of relevant measures aimed at improving the model; contrary to expectations, not all of them led to a significant improvement.

The achieved accuracies (F1 scores) of the best model are shown in Figure 4. The highest accuracies are achieved by the pine and spruce classes, with F1 scores of 0.75 and 0.67 respectively. In addition to their above-average share of the training data set, this is also due to their distinctive and easily recognisable structure in the image data. By contrast, oak achieves very low values (0.21). This can be explained by the comparatively small amount of training data for this class and its poor differentiability from the other deciduous tree classes. It was also noticed that confusion occurs particularly within the deciduous and within the coniferous tree classes, so further training runs were carried out with aggregated classes. By aggregating individual classes, it was possible to achieve a significant improvement in the accuracy values, particularly for deciduous trees. This again shows the difficulty of differentiation within the deciduous tree classes, and the good separability of deciduous from coniferous trees. The model with the combined classes is thus suitable as a pixel-precise classifier into deciduous/coniferous trees, achieving an F1 score of 0.83 for the deciduous tree class; the other class results can be seen in Figure 4 (right). Figure 5 provides a visual comparison between the results of the individual tree species classes and the combined classes.
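Aggregating easily-confused classes amounts to summing the corresponding rows and columns of the error matrix, so that confusion among the merged species no longer counts as error. A sketch with illustrative counts:

```python
import numpy as np

# Sketch: aggregating easily-confused classes (e.g. merging deciduous
# species) by summing the corresponding rows and columns of the error
# matrix. Counts are toy values for illustration only.
cm = np.array([[30, 12,  2],
               [14, 25,  3],
               [ 1,  2, 40]])  # toy classes: beech, oak, spruce

group = np.array([0, 0, 1])    # beech + oak -> deciduous (0); spruce (1)
n = group.max() + 1
agg = np.zeros((n, n), dtype=int)
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        agg[group[i], group[j]] += cm[i, j]
```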

In addition to the pixel-based validation, which takes into account the correct delimitation of the trees, the results of the model were also analysed on a point-by-point basis. This shows whether the core area of an individual tree has been correctly recognised, regardless of the peripheral delimitation of its crown. F1 scores greater than 0.8 were achieved for the combined classes of deciduous trees, pine, and spruce/other conifers. The additionally surveyed deadwood class shows low accuracy values, which can be explained by the pronounced heterogeneity of this class (ranging from damaged to dead trees).
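Point-based validation checks only the predicted class at each reference tree point's pixel, ignoring how well the crown outline was traced. A toy sketch of this check, with made-up prediction map and points:

```python
import numpy as np

# Sketch: point-based validation, comparing the predicted class at each
# reference tree point's pixel with the reference label, regardless of
# crown delineation. Prediction map and points are toy values.
pred = np.array([[1, 1, 0],
                 [1, 0, 0],
                 [2, 2, 0]])  # predicted class per pixel

ref_points = [((0, 0), 1), ((1, 1), 1), ((2, 0), 2)]  # ((row, col), label)
hits = sum(pred[rc] == lbl for rc, lbl in ref_points)
point_accuracy = hits / len(ref_points)
```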

Conclusion and outlook

In the BeechSAT and IpsSAT projects mentioned above, DL achieved better results than ML in distinguishing between healthy and damaged or dead trees. However, the number of classes there was low, and the classes were clearly distinguishable. In our study, many of the classes, especially the deciduous tree classes, showed only minor spectral and structural differences, so that they could not be reliably separated even with DL. It was, however, possible to differentiate reliably between deciduous and coniferous trees.

An analysis of the confusion between these two classes showed the great influence of the image material: artefacts (artificial structures such as edges or distortions caused by orthorectification) were frequent here and evidently had a direct influence on the classification.

The reference data based on the tDOPs are of great added value (high positional accuracy). They can be updated with little effort and used to validate other remote sensing products (e.g. the Thünen Institute’s tree species group map). An application of the model over a larger area is currently being developed in order to evaluate the influence of spatial structures, the occurrence of possible peripheral effects and the computation times of the prediction. If successful, it would be possible to generate a precise, area-specific tree species group map with a spatial resolution of 0.2 m. This would have a 100-fold higher resolution in terms of pixel resolution than the tree species group maps already available at the LWF, which are based on Sentinel-2 data.
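One common way to handle peripheral effects in large-area prediction is to process the scene in overlapping tiles and crop away the unreliable tile borders. A minimal sketch of such a tiling scheme, with illustrative tile and overlap sizes (not the project's actual configuration):

```python
# Sketch: overlapping tiles for large-area prediction, so that border
# pixels affected by peripheral effects can be cropped away. Tile size
# and overlap are illustrative values only.
def tile_offsets(extent, tile=512, overlap=64):
    """Top-left pixel offsets of overlapping tiles covering 0..extent."""
    step = tile - overlap
    offsets = list(range(0, max(extent - tile, 0) + 1, step))
    if offsets[-1] + tile < extent:   # shift a final tile to the edge
        offsets.append(extent - tile)
    return offsets

cols = tile_offsets(1200)  # e.g. a scene 1200 pixels wide
```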

Summary

The KIHBA project aims to develop an automated classification of the main tree species beech, oak, spruce and pine at individual tree level using deep learning methods and high-resolution remote sensing data. The model architecture used is a U-Net. The results show the limitations of the methodology, as deciduous trees in particular cannot be reliably differentiated. A precise differentiation between the aggregated classes “deciduous trees”, “pine” and “spruce/other conifers” did prove to be possible, however. There is a large, updatable reference data set, and the classification can be applied across the whole of Bavaria.

The KIHBA project is funded by the BMWK [Federal Ministry for Economic Affairs and Climate Action] (project duration: 1.5.2021-31.1.2024) and is carried out in cooperation with IABG mbH Geospatial Solutions.