‘as_triangle’ - Wrapper Function

In which I look at the ‘as.triangle()’ function from the {ChainLadder} package

R
Reserving
Author

Michael Gicheru

Published

February 21, 2025

The {ChainLadder} package is described as “an R package providing methods and models which are typically used in insurance claims reserving”. I was recently tasked with moving our claims reserving models from Excel to R, as our models are getting too big to handle (some of them are pushing 1GB). The {ChainLadder} package exposes functions that have been incredibly useful and time-saving for me during this exercise. These include functions for converting long data into wide data, i.e. a triangle (more on this later), functions for calculating development factors, and “predict” functions for calculating ultimate claims reserves. To be honest, all I had to do was manipulate our claims data into a usable format for the {ChainLadder} suite of functions, run the data through them and finally create outputs in Excel summarizing the final reserves.

That being said, there are a few “gotchas” that I have encountered that I think are worth writing about. I also worked out some solutions that hopefully address some of these “gotchas”. I discuss one of them here.

Quick Tutorial

Historical insurance data is typically represented in a triangle structure or wide format, showing the development of claims over time for each origin period. The triangle makes it easy to see the development of claims from one development period to another. As an example, we use data from the Reinsurance Association of America (RAA):

library(ChainLadder)
RAA
      dev
origin    1     2     3     4     5     6     7     8     9    10
  1981 5012  8269 10907 11805 13539 16181 18009 18608 18662 18834
  1982  106  4285  5396 10666 13782 15599 15496 16169 16704    NA
  1983 3410  8992 13873 16141 18735 22214 22863 23466    NA    NA
  1984 5655 11555 15766 21266 23425 26083 27067    NA    NA    NA
  1985 1092  9565 15836 22169 25955 26180    NA    NA    NA    NA
  1986 1513  6445 11702 12935 15852    NA    NA    NA    NA    NA
  1987  557  4020 10946 12314    NA    NA    NA    NA    NA    NA
  1988 1351  6947 13112    NA    NA    NA    NA    NA    NA    NA
  1989 3133  5395    NA    NA    NA    NA    NA    NA    NA    NA
  1990 2063    NA    NA    NA    NA    NA    NA    NA    NA    NA

From left to right, we can see the initial claim amount for all origin years under development (“dev”) period 1 and annual evaluations thereafter. A human could easily intuit the development of claims from this wide format. However, data is usually stored in a long format, where each variable has its own column and each row is an observation:

filename <-  file.path(system.file("Database",
                                   package="ChainLadder"),
                       "TestData.csv")
myData <- read.csv(filename)
raa <- subset(myData, lob %in% "RAA")
head(raa)
   origin dev value lob
67   1981   1  5012 RAA
68   1982   1   106 RAA
69   1983   1  3410 RAA
70   1984   1  5655 RAA
71   1985   1  1092 RAA
72   1986   1  1513 RAA

How do we get from long to wide? {ChainLadder} provides a function for this called as.triangle, which converts data into a triangle format. Let’s test this out:

raa.tri <- as.triangle(
  Triangle = raa,
  origin = "origin",
  dev = "dev"
)

raa.tri
      dev
origin    1    2    3    4    5    6    7   8   9  10
  1981 5012 3257 2638  898 1734 2642 1828 599  54 172
  1982  106 4179 1111 5270 3116 1817 -103 673 535  NA
  1983 3410 5582 4881 2268 2594 3479  649 603  NA  NA
  1984 5655 5900 4211 5500 2159 2658  984  NA  NA  NA
  1985 1092 8473 6271 6333 3786  225   NA  NA  NA  NA
  1986 1513 4932 5257 1233 2917   NA   NA  NA  NA  NA
  1987  557 3463 6926 1368   NA   NA   NA  NA  NA  NA
  1988 1351 5596 6165   NA   NA   NA   NA  NA  NA  NA
  1989 3133 2262   NA   NA   NA   NA   NA  NA  NA  NA
  1990 2063   NA   NA   NA   NA   NA   NA  NA  NA  NA
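
Conceptually, going from long to wide is a pivot: origin periods become rows, development periods become columns, and any combination with no observation is left as NA. The base R sketch below shows the idea on a few values copied from the output above. It is only an illustration of the reshaping, not a claim about how as.triangle is implemented.

```r
# A few rows of long data (values copied from raa.tri above)
long <- data.frame(
  origin = c(1981, 1981, 1981, 1982, 1982, 1983),
  dev    = c(1, 2, 3, 1, 2, 1),
  value  = c(5012, 3257, 2638, 106, 4179, 3410)
)

# Pivot: one row per origin, one column per dev; empty cells become NA
wide <- with(long, tapply(value, list(origin = origin, dev = dev), sum))
wide
```

The result is a plain matrix with the same triangular shape, with NA in the cells below the diagonal.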

You may have noticed that raa.tri is not the same triangle as the RAA triangle shown earlier. This is because RAA is a cumulative triangle while raa.tri is an incremental triangle. We use incr2cum to turn an incremental triangle into a cumulative one:

raa.cum <- incr2cum(Triangle = raa.tri, na.rm = FALSE)
raa.cum
      dev
origin    1     2     3     4     5     6     7     8     9    10
  1981 5012  8269 10907 11805 13539 16181 18009 18608 18662 18834
  1982  106  4285  5396 10666 13782 15599 15496 16169 16704    NA
  1983 3410  8992 13873 16141 18735 22214 22863 23466    NA    NA
  1984 5655 11555 15766 21266 23425 26083 27067    NA    NA    NA
  1985 1092  9565 15836 22169 25955 26180    NA    NA    NA    NA
  1986 1513  6445 11702 12935 15852    NA    NA    NA    NA    NA
  1987  557  4020 10946 12314    NA    NA    NA    NA    NA    NA
  1988 1351  6947 13112    NA    NA    NA    NA    NA    NA    NA
  1989 3133  5395    NA    NA    NA    NA    NA    NA    NA    NA
  1990 2063    NA    NA    NA    NA    NA    NA    NA    NA    NA
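
The link between the two triangles is a running sum along each origin row. As a quick check in base R (no {ChainLadder} needed), here is the 1981 row, with the incremental values copied from raa.tri above:

```r
# Incremental claims for origin year 1981 (copied from raa.tri)
incr_1981 <- c(5012, 3257, 2638, 898, 1734, 2642, 1828, 599, 54, 172)

# Running sum gives the cumulative row
cum_1981 <- cumsum(incr_1981)
cum_1981

# Differencing the cumulative row recovers the incremental values
back_to_incr <- diff(c(0, cum_1981))
back_to_incr
```

The cumulative row ends at 18834, which matches the 1981 row of RAA.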

Pretty straightforward, I would say. In a perfect world, these functions are enough. Unfortunately, I don’t live in a perfect world. Let’s put these functions through their paces.

The problem

I have adjusted the RAA long data (stored as raa.example) to have some gaps in the years and removed one development period at the end. Converted to a triangle, this is what it looks like now:

new_raa <- as.triangle(
  Triangle = raa.example
)

new_raa
      dev
origin    1    2    3    4    5    6    7   8   9
  1981 5012 3257 2638  898 1734   NA   NA 599  54
  1982  106 4179 1111 5270 3116 1817 -103 673 535
  1983 3410 5582 4881   NA 2594 3479  649 603  NA
  1984 5655 5900 4211 5500 2159 2658  984  NA  NA
  1985 1092 8473 6271 6333 3786  225   NA  NA  NA
  1986 1513 4932 5257   NA   NA   NA   NA  NA  NA
  1987  557 3463 6926 1368   NA   NA   NA  NA  NA
  1988 1351 5596 6165   NA   NA   NA   NA  NA  NA
  1989 3133 2262   NA   NA   NA   NA   NA  NA  NA
  1990 2063   NA   NA   NA   NA   NA   NA  NA  NA

Make it into a cumulative triangle:

new_raa_cum <- incr2cum(new_raa)
new_raa_cum
      dev
origin    1     2     3     4     5     6     7     8     9
  1981 5012  8269 10907 11805 13539    NA    NA    NA    NA
  1982  106  4285  5396 10666 13782 15599 15496 16169 16704
  1983 3410  8992 13873    NA    NA    NA    NA    NA    NA
  1984 5655 11555 15766 21266 23425 26083 27067    NA    NA
  1985 1092  9565 15836 22169 25955 26180    NA    NA    NA
  1986 1513  6445 11702    NA    NA    NA    NA    NA    NA
  1987  557  4020 10946 12314    NA    NA    NA    NA    NA
  1988 1351  6947 13112    NA    NA    NA    NA    NA    NA
  1989 3133  5395    NA    NA    NA    NA    NA    NA    NA
  1990 2063    NA    NA    NA    NA    NA    NA    NA    NA

You may have noticed NAs in periods where we do have observations, for example dev 9 for 1981. We can fix this with the na.rm argument:

new_raa_cum <- incr2cum(new_raa, na.rm = TRUE)
new_raa_cum
      dev
origin    1     2     3     4     5     6     7     8     9
  1981 5012  8269 10907 11805 13539 13539 13539 14138 14192
  1982  106  4285  5396 10666 13782 15599 15496 16169 16704
  1983 3410  8992 13873 13873 16467 19946 20595 21198    NA
  1984 5655 11555 15766 21266 23425 26083 27067    NA    NA
  1985 1092  9565 15836 22169 25955 26180    NA    NA    NA
  1986 1513  6445 11702 11702    NA    NA    NA    NA    NA
  1987  557  4020 10946 12314    NA    NA    NA    NA    NA
  1988 1351  6947 13112    NA    NA    NA    NA    NA    NA
  1989 3133  5395    NA    NA    NA    NA    NA    NA    NA
  1990 2063    NA    NA    NA    NA    NA    NA    NA    NA
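
Both outputs make sense once you recall how a running sum treats missing values: a single NA makes every later total NA. Below is a rough base R sketch of the 1981 row. Treating the gaps as zero claims is my reading of what na.rm = TRUE does for this row, based on the output above.

```r
# Incremental 1981 row from new_raa, with gaps at dev 6 and dev 7
incr_1981 <- c(5012, 3257, 2638, 898, 1734, NA, NA, 599, 54)

# Without any NA handling, everything from dev 6 onwards is NA
cum_with_na <- cumsum(incr_1981)
cum_with_na

# Treating the gaps as zero keeps the running total going
cum_filled <- cumsum(replace(incr_1981, is.na(incr_1981), 0))
cum_filled
```

The second result ends at 14192, the same as dev 9 for 1981 in the na.rm = TRUE output.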

Setting na.rm = TRUE fixes some of the issues, but there is one glaring problem left: the triangle is not really “square”. Checking the diagonal, we can see a missing value where you would not expect one, i.e. for origin year 1986, dev period 5. Since there was no observation for that development period, you would expect the cumulative amount to stay at 11702. Instead, there is a missing value.

Why did this happen? The incr2cum function assumes the triangle is half of a perfect square, where the number of development periods is equal to the number of origin periods. Here we have ten origin periods but only nine development periods, so the assumption does not hold. It fails in a lot of cases, in particular for long-tail classes, where there can be development periods in which no new claims are recorded for a while. Someone opened an issue on the {ChainLadder} GitHub page that does a good job of explaining this.
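
One way to picture the “half of a square” assumption is as a mask over the cells that are expected to hold observations. The sketch below is my own simplified illustration (upper_mask is a hypothetical helper, not a {ChainLadder} function): a cell counts as expected if its column index is at most the number of development periods, plus one, minus its row index.

```r
# Hypothetical helper: TRUE for cells in the upper-left "half square"
upper_mask <- function(n_origin, n_dev) {
  rows <- matrix(seq_len(n_origin), n_origin, n_dev)                # row index of each cell
  cols <- matrix(seq_len(n_dev), n_origin, n_dev, byrow = TRUE)     # column index of each cell
  cols <= n_dev + 1 - rows
}

mask_9  <- upper_mask(n_origin = 10, n_dev = 9)   # the adjusted data: 9 dev periods
mask_10 <- upper_mask(n_origin = 10, n_dev = 10)  # a true square: 10 dev periods

# Origin year 1986 is row 6; look at dev period 5
mask_9[6, 5]
mask_10[6, 5]
```

With nine development periods, 1986 dev 5 falls outside the mask, which is consistent with the NA in the output above. With ten development periods it falls inside.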

This is a problem because it breaks a lot of downstream functions that are essential in predicting the ultimate claims. I won’t go into these downstream functions here but I will provide a potential solution for this.

The solution

To fix this, we can write a wrapper function, call it as_triangle, that creates a skeleton of all unique origin periods and then creates development periods based on the number of unique origin periods.

as_triangle <- function(data, origin, dev, value) {
  # create skeleton
  unique_origins <- unique(data[[origin]]) # get unique origin periods
  dev_period <- 1:(length(unique_origins)) # create development periods
  triangle_skeleton <- expand.grid(unique_origins, dev_period, stringsAsFactors = FALSE)
  names(triangle_skeleton) <- c(origin, dev)  

  complete_skeleton <- merge(triangle_skeleton, data[, c(origin, dev, value)], by = c(origin, dev), all.x = TRUE)
  incremental_triangle <- ChainLadder::as.triangle(
    Triangle = complete_skeleton,
    origin = origin,
    dev = dev,
    value = value
  )

  return(incremental_triangle)
}
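
To see the skeleton step in isolation, here is a sketch on made-up numbers (the toy data frame is hypothetical). expand.grid builds every origin and dev combination, and merge with all.x = TRUE keeps all of them, leaving NA where the data has no observation.

```r
# Hypothetical long data: 2001 has no observation at dev 2
toy <- data.frame(
  origin = c(2001, 2001, 2002, 2003),
  dev    = c(1, 3, 1, 1),
  value  = c(100, 30, 120, 90)
)

origins  <- unique(toy$origin)
# One dev period per origin period, so the grid is square (3 x 3 here)
skeleton <- expand.grid(origin = origins, dev = seq_along(origins))

# Left join: every skeleton cell survives, unmatched cells get NA
filled <- merge(skeleton, toy, by = c("origin", "dev"), all.x = TRUE)
filled
```

The merged table has nine rows, four of them with values, so the triangle built from it always has as many development periods as origin periods.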

Let’s test this out:

new_raa_fix <- as_triangle(data = raa.example, origin = "origin", dev = "dev", value = "value")
new_raa_fix
      dev
origin    1    2    3    4    5    6    7   8   9 10
  1981 5012 3257 2638  898 1734   NA   NA 599  54 NA
  1982  106 4179 1111 5270 3116 1817 -103 673 535 NA
  1983 3410 5582 4881   NA 2594 3479  649 603  NA NA
  1984 5655 5900 4211 5500 2159 2658  984  NA  NA NA
  1985 1092 8473 6271 6333 3786  225   NA  NA  NA NA
  1986 1513 4932 5257   NA   NA   NA   NA  NA  NA NA
  1987  557 3463 6926 1368   NA   NA   NA  NA  NA NA
  1988 1351 5596 6165   NA   NA   NA   NA  NA  NA NA
  1989 3133 2262   NA   NA   NA   NA   NA  NA  NA NA
  1990 2063   NA   NA   NA   NA   NA   NA  NA  NA NA

The cumulative triangle will now be:

new_raa_fix_cum <- incr2cum(Triangle = new_raa_fix, na.rm = TRUE)
new_raa_fix_cum
      dev
origin    1     2     3     4     5     6     7     8     9    10
  1981 5012  8269 10907 11805 13539 13539 13539 14138 14192 14192
  1982  106  4285  5396 10666 13782 15599 15496 16169 16704    NA
  1983 3410  8992 13873 13873 16467 19946 20595 21198    NA    NA
  1984 5655 11555 15766 21266 23425 26083 27067    NA    NA    NA
  1985 1092  9565 15836 22169 25955 26180    NA    NA    NA    NA
  1986 1513  6445 11702 11702 11702    NA    NA    NA    NA    NA
  1987  557  4020 10946 12314    NA    NA    NA    NA    NA    NA
  1988 1351  6947 13112    NA    NA    NA    NA    NA    NA    NA
  1989 3133  5395    NA    NA    NA    NA    NA    NA    NA    NA
  1990 2063    NA    NA    NA    NA    NA    NA    NA    NA    NA

This is now a perfect square. There is a caveat to this function though: it assumes that the delay, i.e. the gap between two consecutive development periods, is one. Some triangles are quarterly, unlike the yearly example here, with development periods labelled 3, 6, 9, 12… We can adjust the function further:

as_triangle <- function(data, origin, dev, value, delay = 1, start = 1) {
  # create skeleton
  unique_origins <- unique(data[[origin]]) # get unique origin periods
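  # create development periods: one per origin period, spaced `delay` apart
  # (length.out fixes the count; `to` would cap the largest label instead)
  dev_period <- seq(from = start, by = delay, length.out = length(unique_origins))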
  triangle_skeleton <- expand.grid(unique_origins, dev_period, stringsAsFactors = FALSE)
  names(triangle_skeleton) <- c(origin, dev)  

  complete_skeleton <- merge(triangle_skeleton, data[, c(origin, dev, value)], by = c(origin, dev), all.x = TRUE)
  incremental_triangle <- ChainLadder::as.triangle(
    Triangle = complete_skeleton,
    origin = origin,
    dev = dev,
    value = value
  )

  return(incremental_triangle)
}

With this, you need to know the delay beforehand. This is what I currently use for the IBNR exercise and I have not encountered any issues with it. However, don’t take my word for it. You can find the function in this GitHub repository. You can adjust the function to suit your use case.
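
If you do adjust it, one detail worth checking is how the development labels are generated when the delay is not one. Here is a small sketch, assuming you want one development period per origin period (n_origins is a made-up count). In seq(), length.out fixes how many labels you get, whereas to caps the largest label.

```r
n_origins <- 4  # hypothetical number of origin periods

# Fix the number of labels: one per origin period
labels_by_count <- seq(from = 3, by = 3, length.out = n_origins)
labels_by_count

# Cap the largest label at n_origins instead: far fewer labels
labels_by_cap <- seq(from = 3, to = n_origins, by = 3)
labels_by_cap
```

The first call returns 3, 6, 9, 12, while the second returns only 3, so it pays to confirm which behaviour your triangle needs.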

If you can think of any other solutions, please share!

Acknowledgements

I would like to acknowledge the person who opened the GitHub issue. They provide an excellent explanation of the problem and inspired the solution in this blog post.