Estimating coal power plant operation from satellite images with computer vision

TransitionZero - Tech Team
17 min read · Aug 7, 2023


In this first Tech Blog, we return to where TransitionZero began — using machine learning and satellite data to fill critical data gaps in public emissions data. This work led to our co-founding of ClimateTRACE, with the support of Google.org.

This tutorial was prepared for the Climate Change AI Summer School by André Ferreira, data scientist at TransitionZero.

In a (data scientist’s) utopian world, for every region of the world, every domain and every point in time, there would be perfectly formatted, reliable and easily accessible data. Unfortunately, that is not the case. Particularly in greenhouse gas emissions data, the quality and availability of data varies greatly, both geographically and over time. For instance, the USA has CAMPD, where power generation and emissions are made publicly available for each power plant, on an hourly basis. Similarly, the EU has ENTSO-E and Australia has the NEM. But for most other areas of the world, data of this granularity is either non-existent, difficult to access or self-reported by energy companies (with the reliability concerns that carries). If we think outside of the power sector, such as in heavy industry or in the infamous scope 3 emissions from supply chains, data scarcity and other issues are even more problematic.

While ground-level sensors are hard to deploy on a global scale and self-reported data can be unreliable and inconsistent, there is one type of sensor that is global by nature: satellites. For domains whose emissions are visible from space, at least in one of the available remote sensing bands (e.g. the 13 bands of the Sentinel 2 satellites), we can aim to collect emission estimates across the whole world. And given that some regions have emissions data freely available, we can use that data to train machine learning models and automate the emissions estimation process.

Climate action from data

When faced with a challenge, it’s important to have a thorough understanding of its context and of our progress towards resolving it. Emissions and climate data can thus guide us in comprehending and addressing climate change. Furthermore, as we get a sense of the kinds of solutions that can help us achieve our mitigation and adaptation goals, such as renewable energy penetration, plant-based diets and switching from fossil fuelled cars to electric and/or active modes of transport, a key step is spreading the message to bring those ideas to fruition. And while the best communication approach can vary with culture and target audience, data can help construct a solid, fact-based narrative for climate action.

At TransitionZero, we have put this approach of using data for climate action into practice across several of our projects.

This blog post is based on the Climate Change AI Summer School 2023 tutorial on Monitoring, Reporting, and Verification. The original content of the tutorial is in this Colab notebook, where you can run the code and train models.

In the end, you should have learnt more about:

  • The gaps in emissions data
  • Practical usage of freely available satellite imagery
  • The application of computer vision models for the purpose of emissions monitoring

Table of contents

1) Overview
2) Prerequisites
3) Background on coal power plants
4) Software requirements
5) Data description
6) Methodology
7) Modelling experiments
8) Results & discussion
9) References

Overview

Over the next sections of this article, we’ll go through the steps of how to monitor coal power plants, leveraging satellite imagery and computer vision models. In pursuit of this goal, the main contributions of this tutorial are a labelled dataset of satellite images of coal power plant cooling towers and computer vision models trained to estimate their operation status.

Prerequisites

Hardware settings

In the methodology section, there are code cells that train deep learning models. To get these cells to run in a reasonably short amount of time (~11 minutes each), you should make sure to add a GPU to your Colab notebook. You can do so by clicking on Runtime ➡️ Change runtime type ➡️ Hardware accelerator ➡️ GPU.
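If you want to confirm from code that the GPU is active, a quick sanity check like the following (not part of the tutorial code) works in any notebook cell:

import torch

# Should print True and the GPU name (e.g. a Tesla T4 on free Colab) when a GPU is attached
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))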

Power sector knowledge

You don’t need to be an expert in power plants to go through this tutorial. The section below has some further context on coal power plants if you’re interested, but it’s enough to just be aware that coal power plants can be generating power (ON) or idle (OFF), producing visible plumes when ON.

Machine learning knowledge

You may benefit from some knowledge of Python, PyTorch and deep learning in general, as all of these are utilised. This knowledge is particularly relevant if you want to experiment with using different models or different data approaches. You can still run all of the code and analyse the results without having to edit anything.

Background on coal power plants

Operation status

We can think of a coal power plant as having two main operational statuses: idle (OFF) or generating electricity (ON). Within the ON state, the power plant can generate at different levels, with emissions roughly proportional to output, depending on its capacity (i.e. how much power it can generate at its maximum) and its capacity factor (i.e. what percentage of that capacity it is currently generating). But for the purposes of this tutorial, we only care about whether the power plant is ON or OFF.

Coal power plants tend to be used as baseload, i.e. they stay in a fairly constant state of either ON or OFF, as can be seen later in one of the data analysis plots, and are typically slow to ramp up and down. Thus, a snapshot of them every few days, such as from satellite images, should be enough to get a realistic picture of their operational status and emissions.

How a coal power plant works

Image source: Fundamentals of Thermal Power Generation, by Mohammed Elamin

As a thermal energy source, a coal power plant produces electricity by heating up water and using the resulting steam to turn a turbine; the heat comes from burning coal.

From an emissions standpoint, the main types of infrastructure in a coal power plant that we should care about are:

  • Flue stacks: narrow, tall chimneys where the waste gases from coal burning (including CO2) are vented out
  • Cooling technology: what’s used to cool down the water that is heated up inside the power plant; some of the main cooling types are:
    • Cooling towers: wide, voluminous towers where the water cools down in a pool at the bottom while big plumes of water vapour come out
    • Water outlet: in the once-through cooling method, the cooling water that is drawn in is pushed straight back out of the power plant (e.g. back to a river)

Signals

Depending on the technology in use, coal power plants can have a set of signals that relate to their operational status and CO2 emissions:

  • Cooling tower plumes → generally a very easily visible signal; we’ll focus on these
Photo by Patrick Federi on Unsplash
  • Flue stack plumes → can be harder to spot, especially in medium to low-resolution images
Photo by Árpád Kiss on Unsplash
  • Flue stack and cooling tower heat → thermal data from satellites is still too low resolution to accurately model this
Image source: SatelliteVu
  • Cooling water flow → hard to decouple from nearby industrial activity
Sentinel 2 image of a once-through cooled coal power plant, as obtained in our internal processes

Software requirements

The main software dependency here is coal_emissions_monitoring, a Python library that was designed for this tutorial. It includes a lot of helper functions and notebooks to replicate this work, from dataset creation to machine learning model definition.

Data description

The data in this tutorial has been assembled using a combination of the following sources:

  • AWS’ Sentinel 2 L2A collection: satellite images made available both as the typical visible light images and as multispectral images; the visual bands have a spatial resolution of 10 metres (i.e. each pixel covers a ~10 x 10 metre area on the ground).
  • Google’s geospatial classification demo: dataset of timestamps and coordinates of cooling towers, with a label on whether or not its associated power plant was operating.

Our goal in the end is to estimate the operation status of coal power plants from the satellite images, which serves as a proxy for their emissions.

The data has already been collected and processed in the coal_emissions_monitoring repo, but feel free to check it for more information on how it was done. The main step taken there was downloading satellite images and associating them to each row of the labelled dataset.

Data download

All of the data required for this tutorial is made available in this public Google Drive folder. We’re going to download it to this notebook’s disk.

Data pre-processing

Here we’re going to decompress our satellite images, configure a dataset and explore what’s in the data.

import os  # used below to count CPU cores for the data loader workers

# CoalEmissionsDataModule and images_path are imported/defined in earlier cells of the notebook
batch_size = 64  # how many samples to use in each batch of model training
crop_size = 64   # size in pixels to crop the satellite images to

data = CoalEmissionsDataModule(
    final_dataset_path="/content/google/final_dataset.csv",  # CSV file with the labelled dataset we're training on
    batch_size=batch_size,  # how many samples to use in each batch of model training
    crop_size=crop_size,  # size in pixels to crop the satellite images to
    num_workers=os.cpu_count(),  # workers that fetch the next batches of data while the model trains
    predownload_images=True,  # use predownloaded images instead of downloading on the fly (very slow)
    download_missing_images=False,  # whether to download any images that are missing from the dataset
    images_dir=images_path,  # path to where the images are saved on disk
)

# prepare the data for model training, i.e.
# • load the dataset table
# • filter out cloudy images (>50% of the satellite tile)
# • make sure that images are already downloaded and on the right path
# • split non-2020 data into training and validation sets, based on random spatial assignment
data.setup("fit")

# prepare the data for testing the model, i.e.
# • load the dataset table
# • filter out cloudy images (>50% of the satellite tile)
# • make sure that images are already downloaded and on the right path
# • assign all 2020 data to the test set
data.setup("test")

For a clearer understanding, here’s a description of each column in the dataset above (a short snippet for loading and inspecting the table follows this list):

  • ts - timestamp of when the operation status (i.e. is_powered_on) was recorded
  • latitude - latitude coordinate of the centre of the cooling tower
  • longitude - longitude coordinate of the centre of the cooling tower
  • is_powered_on - operation status of the power plant at the time ts; it's the label that we're going to estimate with our machine learning models; 1 means the power plant was ON (i.e. generating electricity), 0 is OFF
  • facility_id - unique identifier of the cooling tower
  • geometry - square-shaped geometry centred on the cooling tower, with dimensions of 640 x 640 metres, which represents 64 x 64 pixels in the Sentinel 2 satellite images
  • cloud_cover - percentage, from 0 to 100, that represents how much of the original satellite tile (which is 110 x 110 km) is estimated to be covered by clouds; it's an important column, as clouds can obscure the power plants and trick our models into thinking they're seeing cooling plumes. This is calculated using a combination of remote sensing band arithmetic; you can see more information about it here
  • cog_url - URL for the Cloud Optimised GeoTIFF (COG) that contains the image that is relevant for the respective row (i.e. contains an image of the geometry taken at the date ts)
  • local_image_path - path to the image that was cropped from the cog_url using geometry, saved on the local disk
  • data_set - group that the respective sample corresponds to; can be train, if it's going to be used to train the model; val, if it's being used to validate the model's performance during training; or test, if it's going to be used to test the model on unseen data after training
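As a quick way to get familiar with these columns, you can load the table with pandas; here is a minimal sketch, assuming the same CSV path that is passed to CoalEmissionsDataModule above:

import pandas as pd

# Load the labelled dataset table and peek at some of the columns described above
df = pd.read_csv("/content/google/final_dataset.csv", parse_dates=["ts"])

print(df.columns.tolist())
print(df[["ts", "facility_id", "is_powered_on", "cloud_cover", "data_set"]].head())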

Data analysis

Let’s explore the dataset then to find out what patterns it uncovers:
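The summary statistics below can be roughly reproduced with a groupby over the dataset table; note that this is just a sketch assuming the data_set column is already filled in the CSV, and that the figures reported below come from the data module after cloud filtering, so raw-CSV numbers may differ slightly.

import pandas as pd

df = pd.read_csv("/content/google/final_dataset.csv", parse_dates=["ts"])

# Per-split row counts, number of unique facilities and time ranges
for split in ["train", "val", "test"]:
    subset = df[df["data_set"] == split]
    print(
        f"{split}: {len(subset)} rows, "
        f"{subset['facility_id'].nunique()} power plants, "
        f"from {subset['ts'].min()} to {subset['ts'].max()}"
    )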

Training data stats:

  • Number of rows: 3305
  • Number of power plants: 133
  • Time range: 2016-11-17 10:58:28+00:00 to 2021-04-27 10:46:46.734000+00:00

Validation data stats:

  • Number of rows: 926
  • Number of power plants: 23
  • Time range: 2016-12-05 10:14:12.680000+00:00 to 2021-05-04 10:38:51.055000+00:00

Test data stats:

  • Number of rows: 1885
  • Number of power plants: 156
  • Time range: 2020-01-02 10:26:23.818000+00:00 to 2020-12-31 09:18:43.139000+00:00

Note that the train/validation split is a spatial split, randomly allocating facilities to either the training or the validation set. This means that each unique cooling tower can be in either the training or the validation set, but not in both at the same time. Precautions were taken to ensure that cooling towers that are very near each other, to the point of appearing in each other’s images, would always be in the same set, so as to avoid leakage (e.g. similar images appearing both during training and validation, which could result in misleadingly high performance). For more information on this, check the get_facility_set_mapper function in ccai-ss23-ai-monitoring-tutorial.
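For illustration, here is a rough sketch of a facility-level split using scikit-learn’s GroupShuffleSplit; this is not the tutorial’s actual logic (which also keeps nearby facilities together via get_facility_set_mapper), just the general idea of grouping rows by facility:

import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

df = pd.read_csv("/content/google/final_dataset.csv", parse_dates=["ts"])
non_test = df[df["ts"].dt.year != 2020]  # 2020 is held out as the test set

# Group rows by facility so that each cooling tower lands entirely in one set
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, val_idx = next(splitter.split(non_test, groups=non_test["facility_id"]))
print(f"{len(train_idx)} training rows, {len(val_idx)} validation rows")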

The test set includes all power plants but only has 2020 data, while the other sets have data for the years of 2016, 2017, 2018, 2019 and 2021.

For an interactive version of this plot, check the Colab notebook

Above you can see the time series of all of the cooling towers’ operation status (is_powered_on). To see each cooling tower one at a time, you can double-click on its facility_id in the legend on the right side of the graph.

You can notice that, while some cooling towers show some variation in their power generation, most spend the majority of the time generating power (is_powered_on = 1).

For an interactive version of this plot, check the Colab notebook

The histogram above reinforces the idea that coal power plants are generating electricity very often.

As the operation status is the label that we want to classify with a machine learning model, this disparity between positive and negative states constitutes a problem known as an unbalanced dataset. With one class (ON) having many more observations than the other (OFF), if handled incorrectly it could trick our models into always predicting that a power plant is ON.

Fortunately there are ways to counteract this issue, such as applying a smaller weight to ON samples in the model's loss function, which makes the model pay relatively more attention to OFF samples.
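Here is a minimal sketch of that idea with PyTorch’s BCEWithLogitsLoss, using hypothetical class counts rather than the tutorial’s exact values; a pos_weight below 1 down-weights the abundant ON class relative to the rarer OFF class:

import torch

# Hypothetical class counts: many more ON (1) than OFF (0) samples
n_on, n_off = 2500, 800
pos_weight = torch.tensor([n_off / n_on])  # < 1, so ON samples count less in the loss

loss_fn = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(4)                      # raw model outputs for a batch of 4 images
labels = torch.tensor([1.0, 1.0, 0.0, 1.0])  # mostly-ON labels, mirroring the dataset
print(loss_fn(logits, labels))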

For an interactive version of this plot, check the Colab notebook

The cloud cover histogram shows us that most images are relatively cloudless, with most of them having between 0% and 0.5% cloud cover (and this is an estimate over the entire tile that the satellite captured, not just the small patch that we extracted). So hopefully clouds won’t be much of an issue.

Note that, in this plot, cloud_cover is represented as a percentage of the satellite tile.

Map of the cooling towers in the dataset, split by training and validation set. Note that the squares represent the size and shape of the images that were retrieved for each cooling tower on each timestamp. For an interactive version of this plot, check the Colab notebook

Methodology

Core model

For the purposes of this tutorial, a relatively simple model was defined from scratch, based on common features of modern convolutional neural networks. You can read the model’s definition in the SmallCNN class of coal_emissions_monitoring.model (a rough sketch of a similar architecture follows this list). It consists of:

  • 6 convolutional layers with ReLU activation function
  • 2 max pooling layers interspersed between the convolutional layers
  • 1 adaptive average pooling layer after the final convolutional layer
  • 1 fully connected layer in the end for estimating the final logit (i.e. the value that is used to estimate if a cooling tower is ON or OFF)
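Below is a minimal sketch in the same spirit, not a copy of SmallCNN (check coal_emissions_monitoring.model for the real definition); the channel counts and the 3-band RGB input are assumptions:

import torch
import torch.nn as nn

class TinyCoalCNN(nn.Module):
    def __init__(self, in_channels: int = 3):
        super().__init__()
        # 6 convolutional layers with ReLU, with 2 max pooling layers interspersed
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)  # collapse spatial dims regardless of crop size
        self.head = nn.Linear(64, 1)         # single logit: is the cooling tower ON?

    def forward(self, x):
        x = self.pool(self.features(x)).flatten(1)
        return self.head(x)

# A batch of 64 x 64 pixel crops with 3 bands produces one logit per image
print(TinyCoalCNN()(torch.randn(2, 3, 64, 64)).shape)  # torch.Size([2, 1])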

You are encouraged to try using other models, such as those made available in Hugging Face’s timm library, but note that most popular computer vision models were developed with larger image sizes (e.g. 256 x 256 pixels instead of our current maximum size of 64 x 64 pixels).

Early stopping

To make efficient use of training time, we’re using a common technique called early stopping. It ends model training if a specific validation metric hasn’t improved for a certain number of training epochs. In this case, we stop training if the loss on the validation set doesn’t decrease for 10 consecutive epochs.

Model checkpoints

As we monitor model performance during training, we can keep an eye on when the model is performing at its best. Since it’s not guaranteed that the best model will be the one that comes out at the very end of training, we use model checkpoints to continuously save the model whenever we detect a new best performance on a given metric (in this case, the balanced accuracy on the validation set).
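In PyTorch Lightning (introduced below), both behaviours are configured through callbacks passed to the Trainer. Here is a rough sketch; the monitored metric names and other arguments are assumptions, not necessarily what the tutorial notebook uses:

import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

# Stop training if the validation loss hasn't improved for 10 consecutive epochs
early_stopping = EarlyStopping(monitor="val_loss", mode="min", patience=10)

# Keep the checkpoint with the best validation balanced accuracy seen so far
checkpoint = ModelCheckpoint(monitor="val_balanced_accuracy", mode="max", save_top_k=1)

trainer = pl.Trainer(
    max_epochs=100,
    accelerator="auto",  # picks up the Colab GPU when one is available
    callbacks=[early_stopping, checkpoint],
)
# trainer.fit(model, datamodule=data)  # `model` being the Lightning module wrapping the CNN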

PyTorch Lightning

There are many different Python libraries for developing machine learning code. In this tutorial, we’re using PyTorch Lightning, a neat wrapper around PyTorch (one of the most widely used machine learning libraries in the world) that simplifies code and gives us a consistent, structured approach to defining models and datasets.

Aim to visualise and compare modelling experiments

Below we’ll start an Aim experiment tracker. This can be very useful to visually compare multiple modelling experiments, as it tracks many metrics and parameters automatically, and presents them in a nice interface.

Note that when running it before training any models it will be empty. You need to train models in the cells coming afterwards to actually see models showing up here.
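If you’re wiring this up yourself, Aim ships a logger for PyTorch Lightning that can be passed to the Trainer; a small sketch, with a hypothetical experiment name (the Aim UI itself is launched separately, e.g. with the aim up command):

from aim.pytorch_lightning import AimLogger

# Log metrics and hyperparameters from Lightning runs into the local Aim repository
aim_logger = AimLogger(experiment="coal-plant-monitoring")
# trainer = pl.Trainer(..., logger=aim_logger)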

For the purposes of this tutorial, we focus on the following metrics:

  • Balanced accuracy: a proxy for how often the model is correct in its predictions, while taking into account the disparate number of samples in each class (this addresses the unbalanced dataset that we discussed previously); the closer this number is to 1, the better, while a value close to 0.5 is no better than always guessing the majority class, i.e. we want to maximise this metric (see the small example after this list)
  • Loss: an error value that, in the case of the training set, is directly used in the machine learning optimisation; over the course of training, we should see this value decrease, quickly at first and then slowly plateauing at a low value; we want to minimise this metric
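As a tiny worked example of why balanced accuracy is the right fit here, consider a toy set of mostly-ON labels (hypothetical values) and a model that always predicts ON:

from sklearn.metrics import balanced_accuracy_score

# Balanced accuracy averages the recall of each class, so an always-ON model
# scores only 0.5 even though its plain accuracy would be 75%
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
always_on = [1] * len(y_true)
print(balanced_accuracy_score(y_true, always_on))  # 0.5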

All metrics should improve on the training set data quite smoothly, even if it has a few ups and downs, given that it is the data that the model is directly optimising for. We’re always likely to have worse performance in the validation and test sets, but the performance there is relevant to see how well the model generalises to new data.

Example of an Aim dashboard while training models in this tutorial. For creating your own dashboard and visualising your experiments, check the Colab notebook

Modelling experiments

In this tutorial, we explore three different modelling approaches:

  • Train on the full training set data as it is, with images of size 64 x 64 pixels and a cloud filter of 50%
  • Train using a smaller crop size, in this case 32 x 32 pixels (see the data module sketch after this list)
    • The reasoning behind using a smaller crop size is to further focus the model on each cooling tower’s output. The larger the image, the more background and potentially irrelevant context we get.
    • For example, an image of 128 x 128 pixels would correspond to 1.28 km in width and length on the ground, which would add a lot of noise to what we actually want to model: the plumes (or the lack thereof) coming out of cooling towers.
  • Train using a stricter cloud filter, in this case 5%
    • The images provided in this tutorial were obtained using a 50% cloud filter. This could still let through some images with clouds, which can reduce the quality of our dataset. It’s relevant then to experiment with stricter cloud filters.
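For the smaller crop size experiment, the data module can simply be re-created with crop_size=32; a sketch, assuming the variables from the data pre-processing cell above are still in scope:

# Same configuration as before, but cropping images to 32 x 32 pixels (~320 x 320 metres)
data_small_crop = CoalEmissionsDataModule(
    final_dataset_path="/content/google/final_dataset.csv",
    batch_size=batch_size,
    crop_size=32,
    num_workers=os.cpu_count(),
    predownload_images=True,
    download_missing_images=False,
    images_dir=images_path,
)
data_small_crop.setup("fit")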

Refer to the Colab notebook for performing model training and visualising model predictions.

Results & discussion

As we can confirm both from the validation and particularly the test set metrics, the first model trained with the largest 64 x 64 crop size and on the full training data is the best performing one (we get 83.76% balanced accuracy, 99.28% precision, 69.65% recall and 0.13 loss on the test set).

While the recall may look like the weak point, being under 70%, it’s worth remembering that we’re using one individual cooling tower to assess the operation of a power plant. A power plant can have multiple cooling towers and flue stacks from which plumes come out. It is often the case that a power plant is operating but visible plumes appear at only some of its cooling towers. Thus, with our dataset of cooling tower images, we’re bound to have false negatives where we see no activity at a specific cooling tower while the power plant is actually running.

Potential reasons why the model performs better with a larger crop size:

  • A larger crop size gives more context to the model, allowing it to see other infrastructure of the power plant, such as more cooling towers and flue stacks, and being able to see more of the plume if there is one
  • With a larger image come more features to work with, which can be beneficial for the model

Potential reasons why the model performs better when being less strict about cloud cover:

  • Using a less strict cloud filter allows more images to be used for training the model; with a larger dataset, models tend to perform better
  • Cloud cover is only estimated using remote sensing methods; the estimate can sometimes mistake power plant plumes for clouds and thus discard perfectly acceptable images

Limitations

Despite having trained a model that reached an interesting performance and that generalised well to the validation and test sets, there are still some limitations to this approach, such as:

  • While it’s useful to know whether a power plant is operating or not, this doesn’t give us very specific emissions estimates; we would need to either work on top of these outputs or change the modelling approach to estimate CO2 emissions (see the Climate TRACE Power sector — Electricity Generation Methodology)
  • When relying on visible spectrum imagery from satellites, as we did here, we’re hampered by weather factors such as clouds, snow and bright objects on the ground
  • This tutorial only modelled cooling towers of coal power plants; there are coal power plants that don’t use cooling towers (or at least not ones that generate clearly visible plumes) and there are other power plants that aren’t fuelled by coal; this means that for countries with a relatively low amount of coal power (e.g. Brazil) and countries with different types of prevailing cooling technology (e.g. Japan), this coal cooling tower modelling approach will be less useful to get country-level insights
  • Our data was all located in the US and Europe; without testing on samples from other locations in the world that have some form of ground truth labels, we don’t know how well the models generalise to the rest of the world

Next steps

There are many more experiments that can be done to try to improve emissions estimates from satellite images. Here are some suggestions to consider:

  • Use more Sentinel 2 bands (the folder on Google Drive contains images with all of the bands)
  • Add spatial indices
  • Explore a time-series approach
  • Explore pixel-wise, tabular approach
  • Work on a specific emissions estimation step
  • Try different models, including ones pre-trained on Sentinel 2 imagery
  • Add training data from other regions, by extracting power generation data
  • Tackle other types of power plants, such as ones with flue gas desulfurisation
