Sending Money Overseas

In this post, I’m going to discuss a few R Shiny web apps I built to visualise the flow of remittances around the world. Remittances (for these purposes) are defined as consumer-to-consumer transfers of money across borders, often by diaspora workers. These apps are data exploration tools; because I don’t have a research question and can’t verify the underlying data, I didn’t set out to provide an analysis, but instead to provide something useful to someone who’s interested in remittances. It’s also the first time I’ve worked with Shiny apps and mapping software in R, and so I learned a lot during the development process.

I’m quite interested in bilateral remittance flows - the amount of money that is sent in each direction between each pair of countries in the world. Market-makers in this space, digital hawalas, care about two things: the total sum of money that moves, and the percentage of it that goes in each direction. We can characterise each bilateral flow as having a total amount - the sum of the money moving in each direction - and a balance measure. If the same amount of money moves in each direction, then the balance measure is 0, and if 100% of the total moves in one direction, then the balance measure is 1. Hawala systems work best when the currency flows are balanced, so we will colour balanced flows green, and unbalanced flows red.
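As a minimal sketch, one formula that matches this description is the absolute difference between the two directions divided by their sum. The helper name flow_summary and the example amounts are hypothetical, and the apps may compute the measure differently:

```r
# Hypothetical helper: summarise a bilateral flow from the amounts sent
# in each direction (A to B, and B to A).
flow_summary <- function(a_to_b, b_to_a) {
  total <- a_to_b + b_to_a
  # 0 when both directions are equal, 1 when all the money moves one way.
  # Pairs with no money moving at all get NA rather than a division by zero.
  balance <- ifelse(total > 0, abs(a_to_b - b_to_a) / total, NA_real_)
  data.frame(total = total, balance = balance)
}

# Made-up amounts: a balanced pair, a one-way pair, and something in between.
flow_summary(a_to_b = c(50, 100, 75), b_to_a = c(50, 0, 25))
```

With this definition, a pair that splits 75/25 gets a balance of 0.5, halfway between perfectly balanced and entirely one-way.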

The World Bank provides historical data about the total value of incoming and outgoing remittances for countries across the world; however, they don’t provide information about the bilateral flows between countries - indeed, they seem to see it as sensitive information. I suspect that you can get access to that information (and more) by paying McKinsey for their Global Payments Map. Some McKinsey visualisations built on this data can be found here - however, I don’t think they’re very useful. They feel a bit like what you see in an action film when someone “hacks the mainframe” - a stream of impressive-looking computery images, but too zoomed-out to get any actual information from them.

But although the World Bank don’t make their bilateral data available any more, it seems like they used to; two tools, provided by the Migration Policy Institute and the Pew Research Centre respectively, use very similar data to power visualisations, and have broken links back to the World Bank page that refuses to provide the bilateral data. These visualisations show the flow of remittances in 2017, denominated in US dollars; however, I think it’s possible to do better.

The MPI plot shows histograms for inflows and outflows from a country, but only uses a map to let you select countries; in contrast, the PRC plot doesn’t even bother to show histograms, but it does shade countries across the world by the amount they either receive from or send to a chosen country.

I’m skeptical about the accuracy of the data; the country names in the datasets don’t match, and the MPI data doesn’t include flows to and from the UK, whereas the PRC data does. Strangely, the sums of the bilateral inflows and outflows for each country don’t match the data the World Bank provides here; however, by combining the two datasets, I think we can get a rough sense of global payment flows, accurate to within an order of magnitude.

There could be lots of reasons for this - most notably, disagreement over what counts as a remittance. To illustrate the point, here’s a brilliant line from a Nigerian fintech Substack:

Which brings me to the second part. It is amazing the amount of confusion that has been caused by different definitions of remittances. To make it really simple: on one hand you have finance guys and accountants who define remittances as the amount of cash sent. Cash dollars or euros or bitcoin :). On the other hand you have economists and statisticians who define remittances as the value of stuff sent. Which includes cash but also include cars, shirts, TVs, laptops, purses, random gifts and so on. The finance / accounting definition is typically calculated by looking at financial transactions. The economists/statistical definition is estimated by looking at trade flows and balance of payments. and of course cash.

Which is why both numbers look very different. Does Nigeria receive $30bn worth of finance/accounting defined remittances? NO. Last estimate I saw a few years ago was about $3bn. Does Nigeria receive $30bn worth of economist/statistical remittances? Probably close. It is an estimate remember.

Importantly, if you are sitting in a fancy building in Abuja looking for $30bn worth of cash remittances to divert to your official foreign exchange market then I hate to be the one to break it to you, you are not going to find it. Although I’m sure the people in the stats department downstairs could have told you that.

As such, I think it’s best to treat the results below as an exercise in data visualisation rather than as the basis for further conclusions.

Visualising the Data

First, let’s look at remittance flows for a single country, both in and out:

If any of these plots don’t initially load, just refresh the page.

It’s also possible to show this data on the same axis; the bar graph below shows the inflows and outflows between a given country and the rest of the world, ordered either by inflows, outflows, or the total going in either direction.

However, the histograms don’t tell the whole story. While MPI used their map simply to provide a way to select countries, the app below shows a map overlaid with remittance flows:

The flows are coloured by how balanced they are - green flows have the same amount of money going in each direction, red flows have all the money going in one direction. For instance, looking at the plot above, the UK and South Africa would have a green flow, while the UK and Germany would have a red flow.
The arrows show the net direction of flow - for instance, more money flows from Australia to the UK than vice versa.
The transparency of the flows reflects the total value of the remittances sent - more opaque means higher value.
You can control the thickness of the flows and the size of the arrowheads using the sliders in the bottom left, and also set a minimum total flow amount.
You can remove individual countries from the plot, and focus on specific continents or countries.
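
The colour and transparency rules above can be sketched in base R. This is an illustration only: the helper names balance_colour and flow_alpha are hypothetical, and the exact palette and the linear opacity scaling are assumptions rather than what the app necessarily does.

```r
# Hypothetical helper: map a balance measure in [0, 1] onto a green-to-red ramp.
balance_colour <- function(balance) {
  ramp <- grDevices::colorRamp(c("green", "red"))
  rgb_vals <- ramp(balance)  # one row per flow, columns are R, G, B (0-255)
  grDevices::rgb(rgb_vals[, 1], rgb_vals[, 2], rgb_vals[, 3],
                 maxColorValue = 255)
}

# Hypothetical helper: scale opacity with the total value of the flow.
# min_alpha (an assumed floor) keeps the smallest flows faintly visible.
flow_alpha <- function(total, min_alpha = 0.1) {
  min_alpha + (1 - min_alpha) * total / max(total)
}

balance_colour(c(0, 0.5, 1))
flow_alpha(c(10, 500, 1000))
```

The resulting hex colours and alpha values could then be passed to the plotting layer as fixed aesthetics, one per flow.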

I think that’s really cool - but what do we learn from this?

We shouldn’t draw conclusions about the global economy or payments industry from these visualisations; but I was surprised by how unbalanced global remittance flows are. While it seems intuitive that the phenomenon of migrants sending money back to their home country would create disparities, the data is remarkably skewed; if you plot a histogram of the balance of each of the relationships in the data, you get the result below:

Notably, however, European flows are pretty balanced, as one can see on the map above. That implies that it’s much easier to build a remittance network within Europe than it is to expand it globally - unlike, say, a social network, which will be more or less as useful to anyone with a smartphone and an internet connection regardless of where they are in the world.

I like these visualisations because they’re engaging, fun to use, and fun to build; and, given the right data, I think they’d be a really great way to understand the flow of global remittances.

How it’s Made

I hadn’t done much with Shiny or geographical data before, but these tutorials were really helpful for getting to grips with the technology. You can find all the code and data on my GitHub. I started off using base R maps, but quickly moved over to the tidyverse with ggmap. It integrates better with Shiny, and the tidy data approach makes it much easier to work with; for example, removing Antarctica from the plot was as simple as writing filter(region != "Antarctica"). A process like this takes a lot of iterations to get right - you can see some of my WIP maps below!

I had to solve lots of small problems along the way, but two neat ones stand out - both created by the International Date Line (IDL).

ggplot generates these maps using latitude and longitude coordinates - you can see the axis marks laid out in my attempt on the bottom-right! I wanted to be able to zoom in on each continent - but when you look at a histogram of the coordinates for each continent, you can see a problem:

Both the Americas and Oceania straddle the IDL, which means that when we plot them individually, ggplot will centre on 0 (i.e. Greenwich), rather than the real “middle” of the data. For the Americas, the problem is pretty easily solved by filtering out any land-masses with a longitude over 50; the only things over the line are a few of the Aleutian islands. But we can’t do that for Oceania, because entire countries are on the “wrong” side of the IDL, including French Polynesia, Tonga, and Samoa - all of which have remittance flows! As such, we need to transform the longitude coordinates so that the origin is the IDL, not Greenwich, using the transformation below:
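
One transformation with this effect, sketched in base R, is to wrap each longitude into the range 0 to 360 and then subtract 180. The helper name recentre_on_idl is hypothetical, and this may differ from the exact version used in the app:

```r
# Hypothetical helper: re-centre longitudes so that 0 sits on the IDL
# instead of on Greenwich. Input is in degrees, between -180 and 180.
recentre_on_idl <- function(long) {
  # %% wraps negative longitudes into 0..360; subtracting 180 then puts
  # the IDL at 0 and Greenwich at the left-hand edge (-180).
  (long %% 360) - 180
}

# Rough example longitudes: just west of the IDL, just east of it,
# roughly Samoa (-172), and roughly eastern Australia (150).
recentre_on_idl(c(179, -179, -172, 150))
```

After the shift, places on either side of the IDL sit next to each other around 0, so a plot of Oceania centres on the data. The axis values are then no longer true longitudes, so the tick labels would need relabelling or hiding.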

However, focusing on Oceania makes clear that there’s another problem with our visualisation - nasty horizontal lines cover the north and south of the map in the work-in-progress images above.

These come from a bug in the way that the flows between capital cities are drawn. It happens in two steps: first, we plot 100 points along the great circle, the line of shortest distance, which connects the two points on the Earth’s surface; and then we connect each of these points together in sequence using geom_path(). The issue is that when one of these paths crosses the IDL from, say, (179.9, 0) to (-179.9, 0), the path has to go all the way horizontally across the “front” of the map, whereas really we want it to go around the “back” of the map!
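
To make the first step concrete, here is a minimal base R sketch that interpolates points along a great circle using spherical linear interpolation. The function name great_circle_points is hypothetical and this is not necessarily how the app generates its points; packages such as geosphere offer a ready-made equivalent (gcIntermediate).

```r
# Hypothetical helper: n points along the great circle between two
# (longitude, latitude) pairs, given in degrees.
# Assumes the two points are neither identical nor exactly antipodal.
great_circle_points <- function(lon1, lat1, lon2, lat2, n = 100) {
  to_rad <- pi / 180
  # Convert a (lon, lat) pair to a unit vector on the sphere.
  to_xyz <- function(lon, lat) {
    c(cos(lat * to_rad) * cos(lon * to_rad),
      cos(lat * to_rad) * sin(lon * to_rad),
      sin(lat * to_rad))
  }
  p <- to_xyz(lon1, lat1)
  q <- to_xyz(lon2, lat2)
  # Central angle between the two points (clamped for numerical safety).
  omega <- acos(max(-1, min(1, sum(p * q))))
  f <- seq(0, 1, length.out = n)
  # Spherical linear interpolation weights for each fraction f.
  w1 <- sin((1 - f) * omega) / sin(omega)
  w2 <- sin(f * omega) / sin(omega)
  xyz <- outer(w1, p) + outer(w2, q)
  data.frame(
    long = atan2(xyz[, 2], xyz[, 1]) / to_rad,
    lat  = asin(pmax(-1, pmin(1, xyz[, 3]))) / to_rad
  )
}

# A short hop along the equator that crosses the IDL.
path <- great_circle_points(170, 0, -170, 0)
range(path$long)
```

The longitudes in this example jump from just under 180 to just over -180 partway along the path, which is exactly the jump that geom_path() joins with a straight line across the whole map.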

Each of these flows is named something like “Australia Canada”. To solve the problem, we first have to identify the 253 flows that cross the IDL (out of 6504 total) by checking whether the difference between the maximum and minimum x-coordinate for each flow is more than 180 degrees. Having done that, we can divide the points that make up each such flow into two sets, one on either side of the IDL - giving us both “Australia Canada” and “Negative Australia Canada”. These points can then be connected by two separate paths. Once we’ve given the same colour and transparency to the flows on each side (which proved easier said than done), we’ve successfully solved the problem - and the map suddenly looks much cleaner and more intelligible.
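
The detection and splitting steps can be sketched in base R as follows. The function name split_at_idl, the column names, and the extra parent column are assumptions for illustration; the code on GitHub may be organised differently.

```r
# Hypothetical helper: label the points of one flow so that a path
# crossing the IDL is drawn as two separate pieces.
# `path` is a data frame with columns long and lat, in drawing order.
split_at_idl <- function(path, name) {
  # A flow crosses the IDL if its longitudes span more than 180 degrees.
  crosses <- diff(range(path$long)) > 180
  # Keep the original name so both halves can share colour and transparency.
  path$parent <- name
  if (crosses) {
    # Points at negative longitudes go into a second, separately drawn group.
    path$flow <- ifelse(path$long < 0, paste("Negative", name), name)
  } else {
    path$flow <- name
  }
  path
}

crossing <- data.frame(long = c(170, 175, 179.9, -179.9, -175, -170), lat = 0)
unique(split_at_idl(crossing, "Australia Canada")$flow)

# A path near Greenwich has negative longitudes too, but is left alone.
local_path <- data.frame(long = c(-10, 0, 10), lat = 50)
unique(split_at_idl(local_path, "France UK")$flow)
```

Grouping the paths by the flow column (for example with the group aesthetic in geom_path()) would then draw each half on its own, while the parent column gives a key for looking up the shared colour and transparency.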
