Simplex But Not Easy
Last week I wrote about the flows of remittances around the world; the main point of that piece was the tool below, which lets you explore those flows in a visual and interactive way:
To create the map, I paired off flows that went between the same countries in opposite directions, representing them with an arrow pointing in the direction of the larger flow, a colour representing how balanced the two flows were in each direction, and a transparency showing the total value of both flows put together.
This reduces the total amount of money which has to move, which is probably a good thing; if you owe me £10 and I owe you £5, then it makes sense to say that you just owe me £5. However, it's possible to do better - for instance, if A, B, and C all owe each other money in a triangle, then we could resolve their debts. Unfortunately, this is a non-trivial problem, as the example below shows.
If we assume that each of the arrows represent a debt of £1, we can definitely simplify the graph; we could, for instance, cancel out the debt between A and B. However, this would still leave us with A owing C, and C owing B - so £2 outstanding in 2 different payments.
In this case, it's obviously better to resolve the 3-cycle A > C > B > A, rather than the 2-cycle A > B > A - we would then be left with just 1 payment of £1 outstanding. But there's no easy rule for resolving this problem on larger graphs - like our world map, which has 203 nodes and 4880 edges worth more than $500,000.
There's a brilliant Fast Show sketch which encapsulates the confusion that can result when you try to settle up debts between multiple people:
Our goal is to find a way to make sure every country in the world sends and receives the right amount of money; some countries send more remittances than they receive, and vice versa. We can formulate this as a demand, which will be either positive or negative for each country; the task then is to find the smallest set of flows that satisfy these demands.
This what a computer scientist would call a minimum-cost flow problem, which takes this form:
In our case, the vertices V of the graph G are the countries, and the arcs A are the remittance flows between them. The goal is to find a set of arcs F between nodes that will satisfy the demands of all the nodes, while minimising the total amount of flow. In our case, the capacities of the flows are infinite - there's no limit to how much money we can send between any pair of countries - and the costs are all 1, since we assume that it costs the same to send money between any two countries.
Interestingly, minimising the total amount of money flowing around the network is the same as minimising the total number of edges in the network, because of the integrality property: if we have integer demands and costs, then there will always be an integer-valued minimum-cost flow, since since we can always cancel a cycle with integer flow during each iteration of the cycle-cancelling algorithm.
This can also be stated as a linear programming problem:
In this context, we want to find a vector of flows f, where each flow goes from vertex i to vertex j, such that we minimise the scalar product of the costs and values of each of those flows; again, however, all our costs will be 1. If we made costs variable, we could incorporate the idea that, perhaps, it costs the same amount to send £1 from the UK to Afghanistan as it does to send £10 from the UK to France.
All these flows have to be non-negative, and they are also skew-symmetrical - their value is positive in one direction, and negative in another.
The flow vector f has length m, for the m flows that we start off with; the goal will be to reduce this vector down, so that many of its elements have value 0, reducing the number of non-zero edges in our graph.
The constraint Af = b represents the demands of each node. If we have n nodes, then we define b as a demand vector of length n; the demand will be positive if the node sends out money, and negative if it receives it.
A is then an n x m matrix, with a row for each country and a column for each flow between them. The entries of A are 1 if that flow (column) starts in that country (row), -1 if it finishes in that country, and 0 otherwise.
These problems can best be solved using the simplex algorithm, a piece of linear algebra originally invented by George Dantzig while working in operations research for the Americans in WW2. The mechanics of the algorithm go a little over my head, but there's a popular C implementation called lp_solve that we can call in R.
As an illustration, take the graph below, that represents the flow of remittances between 4 countries - the USA, Canada, South Africa, and France. In the base case, with our raw data, there are 12 flows, totalling $5.66bn.
By netting off the flows pairwise, we get to the bilateral solution, which I visualised in my last post; that results in the graph below, with 6 flows totalling $3.97bn, or 70.2% of the original total.
However, when we apply the simplex algorithm to the original graph, we get a much better solution - there are just 3 flows, totalling $3.67bn, or 64.8% of the original.
The graphs below compare the number and total value of the flows we get in each case:
We can apply the same algorithm to the entire graph of 203 nodes and 4880 edges, and the results are pretty impressive. Whereas the bilateral method reduces the total value of the flows from $577bn to $473bn, the simplex algorithm reduces it further down to $401bn - from 83.1% of the total to 69.5%. The simplex method, however, is particularly effective in reducing the total number of flows necessary to satisfy all the demands. Whereas there are 3535 edges in the bilateral graph, there are just 201 in the graph created by the simplex algorithm.
In fact, the simplex algorithm is so effective at reducing the number of flows necessary to satisfy all the demands that we can visualise them on a map. Trying to show all the flows in the bilateral solution creates an uninterpretable mess - but the simplex method produces a picture that we can actually read and understand.
The simplex method does produce some pretty unexpected results at times; the four graphs below show the flows in the raw data, in the bilateral solution, and after applying the simplex algorithm, for the top 30 flows according to the raw data, the simplex solution, and finally the magnitude of the difference between the bilateral and simplex solutions.
Some of these results are quite surprising - who'd have thought that the best solution involved sending billions of dollars from Russia to Morocco, or from Switzerland to Mexico? I think it's also a good example of how maths can be used to find a really non-intuitive solution to an important problem; algorithms first developed in the 1940s are still in use today.