A very biased summary of the 2019 IGARSS conference, by Tugdual CEILLIER.
Last summer, I had the chance to attend the 2019 edition of the IGARSS conference, which presents the latest advances in remote sensing research. And as I submitted our new paper to the 2020 edition of IGARSS a few days ago, I realised that I hadn’t shared my notes on the 2019 edition yet! Because in between sushi lunches and karaoke nights, I attended some great presentations and I’d like to share them with the community.
This is obviously a selection that is heavily biased towards our topics of interest at Earthcube, so it will focus mostly on object detection, change detection and segmentation of very high-resolution optical images.
As it would be impossible to cover the full spectrum of this symposium, this post will only deal with 4 events: the Technology, Industry and Education (TIE) industry workshop, the Object detection from space session and two very nice invited sessions, Deep learning for multispectral image analysis and Labels in Deep Learning: Friend or Foe?
TIE industry workshop
It was the second edition of this workshop, and the first time it had a reserved full day before the symposium itself. It took place on Sunday and its goal was to present scientists with tools developed by companies to ease accessing and processing of remote sensing data. It was chaired by Nathan Longbotham, from Descartes Labs.
Development Seed is an American company with offices in Washington (USA), Lisbon (Portugal) and Ayacucho (Peru). It specializes in developing remote sensing processing pipelines for a wide variety of clients.
The very nice thing about Development Seed is that they open-source many of the tools they build for their projects. Their GitHub page is full of libraries that are useful to anyone interested in using remote sensing data without rebuilding everything.
In this talk, Drew Bollinger presented how to use these tools to very quickly build a prototype able to predict land use from Landsat images. The nice thing was that the whole demo ran on a Jupyter notebook that was launched using mybinder.org so everyone could experiment with it in real-time.
Development Seed also contributes to projects that aim to build stronger standards for the industry. In particular, they are power users of Cloud-Optimized GeoTIFFs (COG) and SpatioTemporal Asset Catalogs (STAC), two standards that could make working with many different sources of remote sensing data much easier, even if they are not yet fully mature.
Descartes Labs is an American company that leverages cloud computing and machine learning to scale remote sensing analysis. To do so, they have built their own platform that was the main object of this presentation by Kornelijus Survila.
One of the strengths of this platform is that it gives access to both open data (like Landsat or Sentinel) and commercial data (including Airbus and Planet). It is designed as a one-stop shop to build products using remote sensing data. Many common processing steps, such as band ratios, are included in the platform, and you can also use their API to run your own code on their infrastructure, using either CPU or GPU machines.
The example presented was a one-person project to detect solar power plants over the whole globe. Using a U-Net neural network, the API made it possible to process 320 TB of Sentinel-2 data in parallel without having to make any effort to distribute the computation.
Tellus is a Japanese initiative to encourage companies to use more remote sensing (and more generally geospatial) data. It is developed by the cloud computing company Sakura Internet and backed by the Japanese space agency, JAXA.
The tool developed by Tellus lets users visualize and process several different sources of remote sensing images, focusing on the ones acquired over Japan. Interestingly, they are planning to give access to very high-resolution images of Japan, both optical and radar.
Google Earth Engine
Earth Engine is a free and powerful tool for quickly running simple processing on freely available remote sensing images and visualizing the results in real time, which makes it ideal for developing small prototypes quickly.
The presentation by Yasushi Onda was filled with very useful resources and detailed two nice examples. The first one leveraged night light images to measure the impact of the 2018 Winter Olympics on the city of Pyeongchang, South Korea. The second one consisted of detecting paddy rice fields by crossing several sources of images, including the NDVI computed from Landsat images.
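For readers unfamiliar with NDVI, it is the normalized difference between near-infrared and red reflectance. Here is a minimal sketch in plain Python, assuming the two bands are already available as lists of reflectance values. The function name and the sample values are illustrative and do not come from the talk.

```python
def ndvi(nir, red):
    """Return NDVI = (NIR - Red) / (NIR + Red) for each pixel pair."""
    values = []
    for n, r in zip(nir, red):
        total = n + r
        # Avoid dividing by zero on empty pixels; 0.0 is an arbitrary fill value.
        values.append((n - r) / total if total != 0 else 0.0)
    return values


# Illustrative reflectances: dense vegetation, bare soil, water.
nir_band = [0.50, 0.30, 0.02]
red_band = [0.10, 0.25, 0.04]
print(ndvi(nir_band, red_band))
```

High positive values point to vegetation, values near zero to bare surfaces, and negative values typically to water.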
In my opinion, the fact that this event had a dedicated slot and was attended by a lot of people (especially for a Sunday!) shows how much the need to access and process data easily is growing in the remote sensing community. Being able to easily test algorithms on lots of images and iterate rapidly is key when developing a product or during a research project. The quality of the presentations shows that great tools are becoming available.
However, I feel that the diversity of the platforms and the lack of widely used standards are still a problem for the industry. Should you become vendor-locked and make all your code specific to a given platform? Will your customers be willing to use a third-party platform to use your services? So I really feel that the field needs to go through a consolidation phase, with many platforms probably dying or merging in the process, before we reach a viable state.
Object detection from space
So let’s now dive into the symposium itself with this first session that, in addition to its film-like title, included two very nice presentations.
Geoseg: a computer vision package for automatic building segmentation and outline extraction
In his talk, Guangming Wu presented the PyTorch library he and his colleagues developed to make it easier to benchmark different machine learning algorithms for building segmentation on remote sensing data. The library includes various models, losses, metrics and some relevant datasets.
Interestingly, his own tests showed that the best performing model was the BR-Net (Boundary-regulated network) which I wasn’t aware of. All the details of these tests can be found in the paper.
Multiclass vessel detection from high-resolution optical satellite images based on deep neural networks
A quite thorough presentation by Sergey Voinov, from the German Aerospace Center (DLR), of their highly productized pipeline for ship detection in high-resolution satellite images (50cm resolution).
Diagram of the DLR’s ship detection pipeline (figure from the IGARSS paper)
This very pragmatic approach first uses a light neural network to classify small tiles (300x300 px) cut from the whole image. This avoids running the actual object detection network (Faster-RCNN), which is far more compute-intensive, on the full image.
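To make the idea concrete, here is a minimal sketch of such a two-stage pipeline in plain Python. It is not DLR's implementation: `has_ship` and `detect` are hypothetical callables standing in for the light classifier and the heavy detector, and I am assuming non-overlapping tiles.

```python
def iter_tiles(height, width, tile=300):
    """Yield (row, col) offsets of non-overlapping tiles covering the image."""
    for row in range(0, height, tile):
        for col in range(0, width, tile):
            yield row, col


def detect_with_prefilter(height, width, has_ship, detect, tile=300):
    """Run the costly detector only on tiles accepted by the cheap classifier.

    has_ship(row, col) -> bool    : hypothetical light tile classifier
    detect(row, col)   -> [boxes] : hypothetical heavy detector, boxes given as
                                    (x, y, w, h) in tile coordinates
    """
    detections, tiles_seen, tiles_detected = [], 0, 0
    for row, col in iter_tiles(height, width, tile):
        tiles_seen += 1
        if not has_ship(row, col):
            continue  # skip the expensive step on empty tiles
        tiles_detected += 1
        for x, y, w, h in detect(row, col):
            # Shift boxes from tile coordinates back to full-image coordinates.
            detections.append((x + col, y + row, w, h))
    return detections, tiles_seen, tiles_detected


# Toy stand-ins: a single tile contains a ship.
ship_tile = (300, 600)
boxes, seen, detected = detect_with_prefilter(
    900, 1200,
    has_ship=lambda r, c: (r, c) == ship_tile,
    detect=lambda r, c: [(10, 20, 50, 15)],
)
print(boxes, seen, detected)
```

In this toy run the heavy detector is called on one tile out of twelve, which is where the savings come from on mostly empty sea.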
Another point of interest is that the boxes that are output by the system are not directly the ones from the Faster-RCNN. They use a clever post-processing step to get rotated bounding boxes, showing the ship’s bow. The final label for each ship comes from a comparison with AIS data.
Figure 3 of the IGARSS paper
Deep learning for multispectral image analysis
A lot of presentations at IGARSS 2019 combined deep learning with remote sensing data in some way. It is such a strong trend that it led to a lot of talks along the lines of “I tested this deep learning algorithm on this dataset and it worked that well”… This invited session, though, brought together what were, in my opinion, the most relevant presentations on how to use deep learning in remote sensing in a meaningful way.
Image registration of satellite imagery with deep convolutional neural networks
Using deep learning to co-register remote sensing images is not wildly original, even if papers on the subject remain rare. The originality of this work, presented by Maria Vakalopoulou, is that their network can be trained for rigid transformation, for deformable transformation, or for both at the same time.
Figure 1 of the IGARSS paper
To do so, they use an encoder-decoder structure with dilated convolutions that predicts the parameters of the transformation from the two images (the reference one and the one to transform). These parameters are then fed to a fixed layer that performs the transformation itself. The loss computed between the reference image and the transformed one is a simple MSE (mean squared error) with two additional regularization terms, which ensure that the transformation does not stray too far from the identity transformation.
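A loss of this shape can be sketched in a few lines of plain Python. This is only my reading of the talk: the use of L1 penalties, the weights `alpha` and `beta`, and the representation of the deformable part as per-pixel offsets are all assumptions, not details confirmed by the paper.

```python
def mse(reference, warped):
    """Mean squared error between two images given as flat lists of pixels."""
    return sum((r - w) ** 2 for r, w in zip(reference, warped)) / len(reference)


def l1_distance(a, b):
    """Sum of absolute differences between two flat parameter lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Identity 2x3 affine matrix, flattened row by row.
IDENTITY_AFFINE = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]


def registration_loss(reference, warped, affine, displacement, alpha=1e-3, beta=1e-3):
    """MSE plus two penalties pulling the transformation towards identity.

    affine       : 6 predicted affine parameters (rigid / linear part)
    displacement : predicted per-pixel offsets (deformable part), 0 = no motion
    alpha, beta  : hypothetical weights, not values from the paper
    """
    affine_penalty = l1_distance(affine, IDENTITY_AFFINE)
    deform_penalty = sum(abs(d) for d in displacement)
    return mse(reference, warped) + alpha * affine_penalty + beta * deform_penalty


# A perfectly aligned pair with an identity transformation costs nothing.
print(registration_loss([1.0, 2.0], [1.0, 2.0], IDENTITY_AFFINE, [0.0, 0.0]))
```

Any deviation from identity adds to the loss, so the network only moves pixels when doing so reduces the MSE by more than the penalty.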
Continual learning for dense labelling of satellite images
Finding a dataset that perfectly suits what you want to do is nearly impossible. However, it is often possible to find multiple datasets, each one solving part of the problem. In his presentation, Onur Tasar presented a way of leveraging several datasets with heterogeneous labels to train a network able to predict all of the various classes.
To do so, he trains the network on the first dataset and then uses this network as a memory. For the next datasets, the predictions from this network are used as labels to train an updated network that will output old classes plus new classes. It is found that keeping 30% of the training patches from the previous datasets is a good compromise to avoid losing performance over the whole process.
Principle of the continual learning process (Figure 1 of the IGARSS paper).
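The way training targets are assembled at each step can be sketched as follows. This is a simplified illustration, not the authors' code: the function names, the class names in the comments and the per-pixel list layout are hypothetical.

```python
import random


def build_targets(memory_predictions, new_labels):
    """Concatenate pseudo-labels for old classes with annotations for new ones.

    memory_predictions : per-pixel scores for the old classes, produced by the
                         previous network (the "memory")
    new_labels         : per-pixel annotations for the newly added classes
    """
    return [old + new for old, new in zip(memory_predictions, new_labels)]


def sample_replay(previous_patches, fraction=0.3, seed=0):
    """Keep a fraction of the earlier training patches (30% in the talk)."""
    count = round(len(previous_patches) * fraction)
    return random.Random(seed).sample(previous_patches, count)


# Two pixels, two old classes (say building and road) and one new class (say tree).
old_scores = [[0.9, 0.1], [0.2, 0.7]]
tree_labels = [[0], [1]]
targets = build_targets(old_scores, tree_labels)
replay = sample_replay(list(range(10)))
print(targets, replay)
```

The updated network is then trained on these combined targets, so it never needs ground truth for the old classes on the new dataset.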
To improve the process further, Onur said they plan on using style transfer to reduce the domain adaptation problem. To do so, they think they will use GANs constrained to only modify the dynamics of the image.
Cross-domain-classification of tsunami damage via data simulation and residual-network-derived features from multi-source images
Post-disaster damage assessment is a hot topic in remote sensing, as it can be directly used by the rescue services to target the most affected areas. The work presented by Bruno Adriano aims at solving this problem in the case of tsunamis.
The approach here is to combine VHR optical imagery (WorldView-2) with SAR imagery (ALOS). The problem is that while WV2 has a ground resolution of 50cm, ALOS has a 10m resolution (no high-resolution SAR imagery was available on the test area). To artificially increase the resolution of the ALOS images, they use a conditional GAN to transform WV2+ALOS into high-resolution TerraSAR-X images, using a dataset from another area. They then use a ResNet-50 to extract features from the optical image and combine features from optical and SAR using a random forest algorithm.
Schematic of the fusion architecture (Figure 2 of the IGARSS paper).
I really liked the approach of fusing different types of data, and the results obtained on their test area are quite promising.
Visual question answering from remote sensing images
This was a very unusual and very fun presentation. The goal here is to train a network to directly answer questions about a patch of remote sensing imagery, as a human would. This is the principle of visual question answering (VQA).
Examples of questions that could be asked about this image (Figure 1 of the IGARSS paper).
The method developed by Sylvain Lobry and collaborators consists in using a CNN to extract a feature vector from the image, using an RNN to transform the question into a vector, and then combining the two vectors by multiplying them. The answer is then predicted by a multi-layer perceptron (MLP) that chooses among all possible answers.
Principle of the VQA architecture (Figure 2 of the IGARSS paper).
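The fusion step is simple enough to sketch in plain Python. I am assuming the multiplication is element-wise, and a single linear layer stands in for the MLP here. All weights and vectors are made-up toy values.

```python
def elementwise_product(image_vec, question_vec):
    """Fuse the two embeddings by multiplying them component by component."""
    return [i * q for i, q in zip(image_vec, question_vec)]


def linear(weights, bias, vec):
    """One dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def predict_answer(image_vec, question_vec, weights, bias, answers):
    """Score every candidate answer and return the highest-scoring one."""
    fused = elementwise_product(image_vec, question_vec)
    scores = linear(weights, bias, fused)
    best = max(range(len(answers)), key=lambda k: scores[k])
    return answers[best]


# Toy embeddings standing in for the CNN and RNN outputs.
image_embedding = [1.0, 2.0]
question_embedding = [0.5, -1.0]
toy_weights = [[1.0, 0.0], [0.0, -1.0]]
toy_bias = [0.0, 0.0]
answer = predict_answer(image_embedding, question_embedding,
                        toy_weights, toy_bias, ["yes", "no"])
print(answer)
```

Framing the problem as a choice among a fixed list of answers turns VQA into plain classification, which also hints at why precise counting is hard: every possible count has to be its own class.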
To train this solution, they used Sentinel-2 images and automatically generated questions using a given template. The labels were obtained with OpenStreetMap. The results they got are not bad but remain pretty far from usable, which they are completely aware of, as this is very preliminary work. Interestingly, the most difficult task for the network was to count objects precisely, the network being only able to guess the right order of magnitude.
Unsupervised multiple-change detection in VHR multisensor images via deep-learning-based adaptation
Change detection in VHR satellite images is HARD! I know because we have worked a lot on the topic at Earthcube before getting good results. So I was very interested in this approach. The work presented by Sudipan Saha is very clever in the way it takes advantage of the progress on GANs to address the change detection issue.
The principle of the method detailed in the talk is to train a cycleGAN to perform style transfer between two different images, which can be acquired by different sensors or in different viewing conditions.
Training of the cycleGAN (Figure 1 of the IGARSS paper).
Then the different networks trained are used in the following process:
Transform image1 into image1', which has the style of image2 (using the first generator)
Extract features from image1' and image2 (using the second generator)
Compute the difference of the extracted features and threshold it
Cluster the detected changes based on the feature differences
Principle of the change detection framework (Figure 3 of the IGARSS paper).
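The last two steps of this process can be sketched in plain Python, assuming the features have already been extracted. The threshold value, the choice of k-means and its naive initialisation are my own simplifications, not details from the paper.

```python
import math


def difference_vectors(features_a, features_b):
    """Per-pixel difference between two lists of feature vectors."""
    return [[a - b for a, b in zip(fa, fb)] for fa, fb in zip(features_a, features_b)]


def changed_pixels(diffs, threshold):
    """Indices of pixels whose difference magnitude exceeds the threshold."""
    return [i for i, d in enumerate(diffs) if math.hypot(*d) > threshold]


def kmeans(points, k=2, iterations=10):
    """Tiny k-means; centres start at the first k points (a simplification)."""
    centres = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy features for five pixels: image1' (already restyled) versus image2.
features_1 = [[1.0, 1.0]] * 5
features_2 = [[1.0, 1.1], [4.0, 1.0], [1.0, 5.0], [4.2, 1.0], [1.0, 1.0]]

diffs = difference_vectors(features_1, features_2)
changed = changed_pixels(diffs, threshold=1.0)  # threshold chosen by hand here
clusters = kmeans([diffs[i] for i in changed], k=2)
print(changed, clusters)
```

Pixels whose features moved in the same direction end up in the same cluster, which is what separates one kind of change from another without any labels.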
The results obtained look pretty good and the process has the great advantage of being completely unsupervised. Anyone who has labelled data for change detection will know how invaluable this is!
Fusing multi-seasonal Sentinel-2 images with residual convolutional neural networks for local climate zone-derived urban land cover classification
Remote sensing images used to be few and far between. But with Sentinel-2 having a revisit time that can be as short as 5 days, using the temporal information of these images is more and more appealing. This is exactly what the work presented by Chunping Qiu is aiming for. It uses Sentinel-2 images averaged over each season and aims to predict local climate zone labels, i.e. land use categories.
To do so, the team from the Technical University of Munich tried different solutions: stacking the 4 images before feeding them to a single CNN, using 4 different ResNets for feature extraction followed by an LSTM to predict the classes, and using 4 different ResNets for image segmentation and averaging the results.
Interestingly, the best results were obtained by averaging the results from four networks, each one specialized in one season. This makes me think that it would be very interesting to test the LSTM version using directly all the images from Sentinel-2, without performing the season average as a preprocessing.
Figure 1 from the IGARSS paper.
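The winning strategy, averaging the outputs of one network per season, amounts to a very small piece of code. Here is a sketch with made-up probabilities. The function names and the dictionary layout are illustrative only.

```python
def average_predictions(per_season_probs):
    """Average class probabilities predicted by one network per season."""
    n_classes = len(next(iter(per_season_probs.values())))
    n_models = len(per_season_probs)
    return [
        sum(probs[c] for probs in per_season_probs.values()) / n_models
        for c in range(n_classes)
    ]


def predict_class(per_season_probs):
    """Return the index of the class with the highest averaged probability."""
    averaged = average_predictions(per_season_probs)
    return max(range(len(averaged)), key=lambda c: averaged[c])


# Made-up softmax outputs for one patch and three classes.
patch_probs = {
    "spring": [0.6, 0.3, 0.1],
    "summer": [0.2, 0.7, 0.1],
    "autumn": [0.3, 0.6, 0.1],
    "winter": [0.5, 0.4, 0.1],
}
print(predict_class(patch_probs))
```

In this toy case two seasons vote for the first class and two for the second, and the average settles the tie in favour of the class predicted with more confidence.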
Labels in Deep Learning: Friend or Foe?
This session was a first at IGARSS and based on the fact that the very small room it was held in was completely crowded, I daresay it was a big success. So congratulations to Katarina Doctor and Devis Tuia for proposing this topic and chairing this inspiring session. Anyone working in deep learning knows how important the labels you use are and how difficult it is to produce or even define them!
Where do labels come from?
Very interesting introduction by Katarina Doctor on how labels are produced. The paper gets pretty technical but one thing I really liked is the emphasis put on the fact that as labels are decided by humans, they are in fact the outputs of a kind of reinforcement learning.
Labels actually come from a reinforcement process (Figure 2 of the IGARSS paper).
So it is important to keep in mind that labels cannot be perfect and depend a lot on the context they are produced in. In the case of remote sensing labels, we have in fact three different biases: the possibility of measurement (instrument limitation), the measurement error (human limitation), and the relevance of the labels (fit between labels and task).
I encourage anyone interested in this topic to peruse the paper as it is very thorough and full of useful references to explore this field of research.
The truth about ground truth: label noise in human-generated reference data
In his talk, Ronny Hansch detailed a truly great experiment. It is based on the fact that there are often no clear boundaries between classes in remote sensing (think: when does a road become a dirt track?) and that people can interpret classes in different ways (especially if there is no “unknown” class: do you put an airport runway in the “road” class?). So the goal was to estimate the influence of label uncertainty on the training and evaluation of deep learning algorithms.
The experimental protocol is pretty straightforward:
Make 4 volunteers label the same area, with the same instructions,
As ground truths, use each individual labelling + their union + their intersection,
Train and evaluate on the different ground truths and compare performances.
Comparison between the different algorithms (left) and between the human labellers (right) (Tables 1 and 2 of the IGARSS paper)
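Building the different ground truths from the individual labellings is straightforward. Here is a sketch on binary masks stored as flat lists, where the four labellers and their six pixels are entirely made up for illustration.

```python
def union(masks):
    """A pixel is positive if at least one labeller marked it."""
    return [int(any(pixels)) for pixels in zip(*masks)]


def intersection(masks):
    """A pixel is positive only if every labeller marked it."""
    return [int(all(pixels)) for pixels in zip(*masks)]


def agreement(mask_a, mask_b):
    """Fraction of pixels on which two labellers gave the same label."""
    same = sum(a == b for a, b in zip(mask_a, mask_b))
    return same / len(mask_a)


# Four hypothetical labellers annotating the same six pixels (1 = target class).
labellers = [
    [1, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0],
]
print(union(labellers), intersection(labellers))
print(agreement(labellers[0], labellers[1]))
```

Even on this tiny example the union and the intersection differ a lot, and no two labellers agree on every pixel.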
The results are pretty interesting. First of all, one can see in Table 2 that two humans never agree 100%. So for this task, it would be foolish to expect performance of 99%! This goes against the usual idea that the higher the performance, the better, and that you can always get better metrics. I think it is especially useful when discussing with people from the business side :-).
The second conclusion is rather surprising: sensitivity to noise in the training data appears to be low (in Table 1, numbers are stable in a given column). This is reassuring, as we know that we will never get perfect labels and must still work with them.
Finally, the last point is that the measured performance depends a lot on the data used for evaluation! This shows how important it is to have very clean testing sets, which undergo a strict quality check and must perfectly represent the use case that you want to tackle. This is something that we are very, very careful with at Earthcube and this work proves us right!
Building an operationally relevant dataset from satellite imagery
This was a very brave presentation, by Katie Rainey from the Naval Information Warfare Center. She basically detailed all the mistakes she made building a dataset for ship classification from very high-resolution patches. I think this is the kind of presentation that is almost never given but can be of help to a lot of people working in the same field.
Left: Illustration of the patches creation process. Right: Example of images from the four classes. (Figures 1 & 2 from the IGARSS paper)
Mistakes Katie detailed are the following ones:
Launching an experiment (“proof of concept”) without a clear task in mind;
Having the researcher label the data herself, instead of dedicating an expert;
Using the classes that are in the available data (barge, cargo, container, and tanker), instead of ones relevant for the operational use-case;
Resizing the patches and therefore losing all information about the size of the ship;
Breaking the connection between the original data and the patches and losing important metadata in the process.
The presentation did a very good job of highlighting biases that are very frequent in deep learning and, more generally, in data science. More often than not, decision-makers want to experiment with it and ask data scientists to “make a POC” without specifying much more and without dedicating the proper resources (like experts to label relevant data). I could see many people nodding in approval during the presentation so I think it is a message worth spreading.
However, Katie did say that the project had been a great opportunity to expose researchers to operational use-cases and that it helped to create links between her lab and operatives.
Learning to understand earth observation images with weak and unreliable ground truth
Very often, it is very hard to get exactly the labelled data you need. You then have to make do with poorly suited labels or data. This is exactly the starting point of the work presented by Rodrigo Daudt.
The goal is one that I can feel close to, as we have worked a lot on the topic at Earthcube: change detection in very high-resolution optical images. The main issue with change detection is that it is very difficult and highly time-consuming to label changes, as they are very diverse, can be hard to define and are very rare.
The solution this team came up with was to use pretty poor labels but in a very clever way. The process they put together can be summarised as follows:
Changes are taken from modifications in different dates of the Copernicus Urban Atlas. It is not precise, it just tells if a given parcel has changed its land use (in reality, only part of the parcel has actually changed).
From this coarse ground truth (GT), a first model is trained.
Then, predictions are made with the trained model on the images of the dataset and a new GT is obtained by keeping only the intersection of the original GT and the prediction. Different strategies can be used for this.
The model is re-trained on this new GT.
Iterate the previous steps (in this work, 4 times)
Comparison between the different strategies to intersect GT and predictions (Figure 2 from the IGARSS article)
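The loop described above can be sketched as follows, using the plain intersection as the refinement strategy. `train` and `predict` are hypothetical callables standing in for the real training and inference code, and the masks are toy flat lists.

```python
def refine_ground_truth(original_gt, prediction):
    """Keep a pixel as 'changed' only if both the coarse GT and the model agree."""
    return [g & p for g, p in zip(original_gt, prediction)]


def iterative_training(coarse_gt, train, predict, rounds=4):
    """Alternate between training on the current labels and cleaning them.

    train(labels) -> model and predict(model) -> binary mask are hypothetical
    callables; rounds=4 matches the number of iterations quoted in the talk.
    """
    labels = list(coarse_gt)
    model = None
    for _ in range(rounds):
        model = train(labels)
        # The new GT is the intersection of the ORIGINAL GT and the prediction.
        labels = refine_ground_truth(coarse_gt, predict(model))
    return model, labels


# Toy example: the whole parcel is flagged, but only two pixels really changed.
parcel_gt = [1, 1, 1, 1, 0, 0]
real_change = [0, 1, 1, 0, 0, 0]
model, cleaned = iterative_training(
    parcel_gt,
    train=lambda labels: tuple(labels),  # the "model" just remembers its labels
    predict=lambda model: real_change,   # pretend the model finds the real change
)
print(cleaned)
```

The intersection can only remove pixels from the coarse labels, so this simple strategy sharpens the ground truth inside the flagged parcels but never adds changes outside them.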
The results obtained are very interesting and show that the model is able to find both unchanged areas inside the GT areas and additional changes outside of them. I find that this approach is very promising, in particular as a first step to perform transfer learning for a more precise task in a second phase.
It is worth noting that an extended version of this work was rewarded with the Best Student Paper Award at the Earthvision workshop of the 2019 edition of the CVPR conference. Way to go, Rodrigo et al.!
Interactive coconut tree annotation using feature space projections
Great presentation by Devis Tuia, whose smile and enthusiasm are always refreshing! The context is fun, as this work aims at detecting coconut trees in drone imagery, as part of an Open AI Challenge run by WeRobotics.
The goal here is to label very quickly many small tiles either as containing coconut trees or as background, to train a classifier. The process proposed is the following one:
Label a limited amount of tiles as coconut tree or background;
Train a first (bad) classifier;
Use this classifier to project all the possible tiles in the feature space;
Project the feature space in 2D and cluster the tiles;
Explore the projected feature space to decide if the tiles are in the right cluster or not;
Iterate ad libitum.
Figure 2 from the IGARSS paper
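To give an idea of how such a review step could work, here is a minimal sketch that assumes the tiles have already been projected to 2D. It uses a nearest-centroid rule as a stand-in for the clustering and ranks tiles by ambiguity, which is my own simplification and not the method from the paper.

```python
import math


def centroid(points):
    """Mean position of a list of 2D points."""
    return [sum(axis) / len(points) for axis in zip(*points)]


def propose_labels(labelled, unlabelled):
    """Assign each unlabelled 2D point to the nearest class centroid.

    labelled   : dict mapping class name -> list of 2D points already annotated
    unlabelled : list of 2D points (tiles projected from the feature space)
    Returns (point, proposed class, margin) tuples, most ambiguous first.
    """
    centres = {name: centroid(points) for name, points in labelled.items()}
    proposals = []
    for point in unlabelled:
        ranked = sorted(centres, key=lambda name: math.dist(point, centres[name]))
        best, runner_up = ranked[0], ranked[1]
        margin = math.dist(point, centres[runner_up]) - math.dist(point, centres[best])
        proposals.append((point, best, margin))
    return sorted(proposals, key=lambda item: item[2])


# Toy 2D projection: a few annotated tiles per class and three tiles to review.
annotated = {
    "coconut": [[0.0, 0.0], [1.0, 0.0]],
    "background": [[5.0, 0.0], [6.0, 0.0]],
}
to_review = [[0.0, 1.0], [3.2, 0.0], [6.0, 1.0]]
proposals = propose_labels(annotated, to_review)
for point, label, margin in proposals:
    print(point, label, round(margin, 2))
```

Showing the annotator the low-margin tiles first concentrates the labelling effort where the current classifier is least sure.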
The idea of projecting the feature space to explore it more easily is a nice one, and it reminds me a lot of hard negative mining techniques. Moreover, a simple but efficient user interface was developed to ease the work of the annotators and rapidly iterate. In the end, they show that for a fixed labelling time, their strategy yielded significantly better results than randomly exploring the available image.
Overall, this was a very interesting edition of IGARSS, where deep learning was clearly gaining momentum. The community of convolutional neural network users for remote sensing applications is consolidating and addressing subjects that are often on point, such as the importance of good data, and especially of good labels. Can’t wait for the 2020 edition in Hawaii!