Code & Math

Before I knew anything about programming, I came across a problem that has followed me ever since. It's definitely not a hard one, but for me it has come to represent the beauty of code and mathematics. It's the first thing I implement when I learn a new programming language.

The problem was stated to me like this: 

How many interpersonal relationships can be formed in a group of people of a given size?

When I started thinking about this, the first challenge was to specify this problem in mathematical terms. I drew groups as fully connected graphs with people as nodes and relationships as edges connecting them. Then I counted the edges and wrote a table with two columns - the number of people in the group, n, and the number of connections as a function of the group size, f(n):

n f(n)
1 0
2 1
3 3
4 6
5 10
6 15

The first thing I realized was that the number of potential relationships grows by n−1 for each new member added to the group. This makes intuitive sense, since the new member may form relationships with each of the existing members. So, my first intuition was to view this as a recursive function, where n is a positive integer:

$$f(n) = \begin{cases} 0 & \text{if $n = 1$} \\ (n - 1) + f(n - 1) & \text{if $n > 1$} \end{cases}$$

If the group consists of only 1 person, there are 0 possible relationships. And if it consists of more than 1 person, there are n−1 plus the number of possible relationships in a group without the last person in it. Clear as day! After defining the formula in this way, I wrote my very first code in JavaScript.*

// JavaScript
var numRelations = function(n) {
    if(n === 1){
        return 0;
    }
    else if(n > 1){
        return (n - 1) + numRelations(n - 1);
    }
}

This code is very close to the mathematical formulation, but the recursive implementation is pretty inefficient. It takes time to run and it fails for large integers due to stack overflow. Still, at the time I was quite proud to have found a way to solve my problem and calculate answers to the question, even with groups of thousands of people!

A while later, I started to learn programming in Java and got introduced to for-loops. This was a revelation to me! Suddenly I could implement the calculations much more efficiently, by using mutability and iteration - cornerstones in imperative programming. In the code below, rel is storing the total number of possible relations, being updated at each iteration by adding (i1) to itself, while i is in turn keeping track of the number of group members in the current iteration. It's all just a matter of repeated addition, going from i = 1 to i = n.

// Java
private int numRelations(int n) {
    int rel = 0;
    for (int i = 1; i <= n; i++) {
        rel += (i - 1);
    }        
    return rel;
}

This code is much more efficient, but it produces really strange results for large integers. I learned about numerical overflow this way - the integer variable rel that stores the result at each step has limited space in memory and therefore returns the wrong results when the numbers get too large.

Since then, I have taught myself to write code in R, Python and most recently Scala. For each language, I have written scripts to solve the problem and have found better and more elegant ways to formulate and understand it. More on that later. 

Something that I found almost poetic was learning about tail recursion in Scala. It turns out there is an efficient way of writing recursive functions that is very close to my first solution! The script below is almost as quick as the for-loop in Java, and it doesn't suffer from memory issues as my JavaScript implementation. Also, it makes use of the type Long, which allows for computation on larger numbers without numerical overflow. 

// Scala
def numRelations(n: Long): Long = {
    def loop(acc: Long, n: Long): Long = {
        if (n == 1) acc
        else loop((n - 1) + acc, n - 1) 
    }
    loop(0, n)
}

The secret lies in the inner loop of the function that has an accumulator acc which is updated at each recursive call - much like the variable rel in the Java code above, but without relying on mutable state.  So this has really 'closed the loop' for me when it comes to this specific problem ;)


As for alternative formulations of the problem, it may have occurred to you already that it can be stated as a combinatorial one. An equivalent question to the first one is:

How many possible (unordered) combinations of two elements are there in a set of n elements?

This is a special case of the binomial coefficient, where k = 2:

$$f(n) = {n \choose 2} = \frac{n!}{2!(n - 2)!} = \frac{n(n-1)}{2}$$

The last step is arrived at by cancelling out (n−2)! in both the numerator and the denominator. Let's implement the formula in python, just because of the clean and neat syntax!

# Python
def numRelations(n):
    return (n * (n - 1)) / 2

It should be clear to anyone that this is much much more efficient than any of the previous implementations - there's only one computation needed, instead of one operation per step going from 1 until n as previously. The time complexity is constant, O(1) instead of linear O(n), to use some computer science terminology. 

Interestingly, the exact same formula can be arrived at by using even older mathematics. As hinted already, the problem can also be viewed as a simple sum of an arithmetic sequence of n integers, going from 0 to n−1. Looking in the table above, f(n) at a given row is equal to the sum of all the values in the n column in the rows above that specific row. This means that we can use a clever formula that dates back to Aryabhata in 499 AD:

$$f(n) = S_n = \frac{n(a_1 + a_n)}{2} = \frac{n(0 + (n-1))}{2} = \frac{n(n-1)}{2}$$

That's beautiful if you ask me. You can reason about the problem from different mathematical standpoints but arrive at the same simple solution. And there are a lot of different ways of implementing code to make the calculations for you. 

I've probably been overthinking this particular problem by a large margin. But I think this example shows the importance of being comfortable with basic mathematics to find good ways of thinking about problems and arrive at efficient solutions. 

Of course, you can always resort to use existing implementations of mathematical functions, like the generic choose in R (which I've been using to validate the results from the scripts above). The script below is the last one in this post, and it's also the easiest one if you don't want to dig into the maths:

# R
num_relations <- function(n) choose(n = n, k = 2)

Hope you enjoyed reading about this problem and my journey when trying to understand it and solve it :) Please add any thoughts or inputs in the comment section below!

/Morgan Ström

* In the snippets, I'm only showing the algorithms and not the code for error handling etc.