Introduction

Inspired by this twitter, I downloaded monthly temperature data and produced similar plots…to celebrate hottest UK summer. Unlike their anomaly maps, I simply produce actual temperature maps. Here is one for July 2022.

2022 07

Notes:

Jupyter notebook is here
Data source: land temperature, sea temperature

Procedure

The procedure to produce a combined temperature map can be summarised into the following four figures:

data

Steps

The steps in the procedure are as below:

Download monthly data (see below for the data source). One for land and the other for sea temperatures.
- Sea data are already given in a rectangular form over longitudes and latitudes. Moreover, they seem to be clean - great! The values are missing over the land coordinates. See the first sub-figure above.
- Land data are recorded at stations over lands - so non-rectangular form. And, some records seem to be obviously outliers…. arg… See the second sub-figure above where some outliers are enclosed by circles.
Cleanse land data - identify and remove outliers. The third sub-figure is the cleansed land data. Also, see below on the outlier identification method.
From the cleansed land data, infer the missing temperatures in the sea data set, producing a combined temperature dataset over the rectangular grid. See the fourth sub-figure above. For the inference (imputation) method, see below.
Transform the combined data through a map projection, and draw a contour plot. See below further details.

Outlier Detection

A temperature record is deemed to be an outlier if it is very different from neighbouring values. To make this statement quantitative, we need to

Define the distance between two points. This is easy. Use the great-circle distance.
Specify an error band at each point. If the record is outside the band, it is tagged as an outlier.

Local and Global Methods:

Two methods are considered

a local method: For each record point, take $K$ nearest neighbouring points (measured in the great-circle distance), and calculate the lower, middle and upper quantiles, denoted by $l$, $m$, and $u$, respectively. Applying a scaling factor $s$, an error band for the point under question is set as
\[\begin{equation} [m+s(l-m), m+s(u-m)] \label{E:local-band} \end{equation}\]
a global method: Let $c(\lambda, \phi)$ be the temperature at longitude $\lambda$ and latitude $\phi$. Since the temperature variation is more pronounced along the latitude, we make a global spline fit $f(\phi)$ and calculate the lower, middle and upper quantiles of residuals $c(\lambda,\phi) - f(\phi)$, denoted by $L$, $M$, and $U$, respectively. Apply a scaling factor $s$ to produce an error band profile
\[\begin{equation} [f(\phi)+M+s(L-M), f(\phi)+M+s(U-M)] \label{E:global-band} \end{equation}\]

For illustrations, see the figures on the top row below.

For both methods,
- low, middle, and upper quantiles are calculated at 5%, 50%, 95%.
- $s = 4$
For the local method, $K=5$.
For the global method, we used the cubic spline with 9 knots.

Observe that

local method:
- The width of the error band is sensitive (maybe too much) to the dispersion of neighbouring temperatures. For example, it does a good job of capturing variations toward the south pole well, identifying no points there.
- The method does not work well when a neighbouring stations are densely located, creating very small error band. For example, two points (those closest to blue data points) are incorrectly identified as outliers.
global method: The opposite observations can be made.
- Some points near the north and south poles are incorrectly identified as outliers.
- No interior points are incorrectly identified as outliers.

outliers

(latitude, temperature) scatter plots.
blue points: data
green points: upper bounds of error bands
orange points: lower bounds of error bands
red points: outliers, i.e. those outside the bands

Combined Method

Let’s take the average of the local and global bands to get the combined band, which I hope to be the best of both. See the bottom-left figure, which seems to be working well! The bottom-right figure is the cleaned land dataset.

Land Temperature Inference

From the sea data set, consider a (longitude, latitude) point corresponding to a land location. We impute the temperature at the point as the weighted average of the cleaned land temperatures. The weighting is given as

\[\exp(-d^2/w)\]

where $d$ is the great-circle distance and $w$ is the width to determine the width. $w$ is set to the larger of (i) the half of the distance of 1 degree in latitude and (ii) the distance to the closest land station.

Map Projection

I used GeoPandas to package to apply ESRI Projection 54012.

TODOs

country specific historical time series.
make them into an animation.