library(ChainLadder)
The {ChainLadder} package is described as “an R package providing methods and models which are typically used in insurance claims reserving”. I was recently tasked with moving our claims reserving models from Excel to R, as our models are getting too big to handle (some of them are pushing 1GB). The {ChainLadder} package exposes functions that have been incredibly useful and time-saving for me during this exercise. These include functions for converting long data into wide data, i.e. a triangle (more on this later), functions for calculating development factors, and “predict” functions for calculating ultimate claims reserves. To be honest, all I had to do was manipulate our claims data into a usable format for the {ChainLadder} suite of functions, run the data through them and finally create outputs in Excel summarizing the final reserves.
That being said, there are a few “gotchas” that I have encountered that I think are worth writing about. I also worked out some solutions that hopefully address some of these “gotchas”. I discuss one of them here.
Quick Tutorial
Historical insurance data is typically represented in a triangle structure or wide format, showing the development of claims over time for each origin period. The triangle makes it easy to see the development of claims from one development period to another. As an example, we use data from the Reinsurance Association of America (RAA):
RAA
dev
origin 1 2 3 4 5 6 7 8 9 10
1981 5012 8269 10907 11805 13539 16181 18009 18608 18662 18834
1982 106 4285 5396 10666 13782 15599 15496 16169 16704 NA
1983 3410 8992 13873 16141 18735 22214 22863 23466 NA NA
1984 5655 11555 15766 21266 23425 26083 27067 NA NA NA
1985 1092 9565 15836 22169 25955 26180 NA NA NA NA
1986 1513 6445 11702 12935 15852 NA NA NA NA NA
1987 557 4020 10946 12314 NA NA NA NA NA NA
1988 1351 6947 13112 NA NA NA NA NA NA NA
1989 3133 5395 NA NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA NA
From left to right, we can see the initial claim amount for all origin years under development (“dev”) period 1 and the annual evaluations thereafter. A human can easily intuit the development of claims from this wide format. However, data is usually stored in a long format, where each variable has its own column and each row is an observation:
filename <- file.path(system.file("Database",
                                  package = "ChainLadder"),
                      "TestData.csv")
myData <- read.csv(filename)
raa <- subset(myData, lob %in% "RAA")
head(raa)
origin dev value lob
67 1981 1 5012 RAA
68 1982 1 106 RAA
69 1983 1 3410 RAA
70 1984 1 5655 RAA
71 1985 1 1092 RAA
72 1986 1 1513 RAA
How do we get from long to wide? {ChainLadder} provides a function for this called as.triangle, which converts data into a triangle format. Let’s test this out:
raa.tri <- as.triangle(
  Triangle = raa,
  origin = "origin",
  dev = "dev"
)
raa.tri
dev
origin 1 2 3 4 5 6 7 8 9 10
1981 5012 3257 2638 898 1734 2642 1828 599 54 172
1982 106 4179 1111 5270 3116 1817 -103 673 535 NA
1983 3410 5582 4881 2268 2594 3479 649 603 NA NA
1984 5655 5900 4211 5500 2159 2658 984 NA NA NA
1985 1092 8473 6271 6333 3786 225 NA NA NA NA
1986 1513 4932 5257 1233 2917 NA NA NA NA NA
1987 557 3463 6926 1368 NA NA NA NA NA NA
1988 1351 5596 6165 NA NA NA NA NA NA NA
1989 3133 2262 NA NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA NA
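For intuition, the long-to-wide step can be approximated in base R with a cross-tabulation. This is only a sketch of the reshaping idea on made-up numbers, not what as.triangle does internally; the data frame `long` and its values are invented for illustration.

```r
# Toy long-format claims data (values are illustrative, not from RAA)
long <- data.frame(
  origin = c(2001, 2001, 2001, 2002, 2002, 2003),
  dev    = c(1, 2, 3, 1, 2, 1),
  value  = c(100, 50, 25, 110, 60, 120)
)

# Cross-tabulate: one row per origin period, one column per development period.
# Combinations with no observation come back as NA.
wide <- with(long, tapply(value, list(origin = origin, dev = dev), sum))
wide
```

The result is a plain matrix with the familiar triangle shape: the lower-right cells are NA because those origin and development combinations have not been observed yet.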
You may have noticed that raa.tri is not the same triangle as the RAA triangle shown earlier. This is because RAA is a cumulative triangle while raa.tri is an incremental triangle. We use incr2cum to turn an incremental triangle into a cumulative one:
raa.cum <- incr2cum(Triangle = raa.tri, na.rm = FALSE)
raa.cum
dev
origin 1 2 3 4 5 6 7 8 9 10
1981 5012 8269 10907 11805 13539 16181 18009 18608 18662 18834
1982 106 4285 5396 10666 13782 15599 15496 16169 16704 NA
1983 3410 8992 13873 16141 18735 22214 22863 23466 NA NA
1984 5655 11555 15766 21266 23425 26083 27067 NA NA NA
1985 1092 9565 15836 22169 25955 26180 NA NA NA NA
1986 1513 6445 11702 12935 15852 NA NA NA NA NA
1987 557 4020 10946 12314 NA NA NA NA NA NA
1988 1351 6947 13112 NA NA NA NA NA NA NA
1989 3133 5395 NA NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA NA
Pretty straightforward, I would say. In a perfect world, these functions are enough. Unfortunately, I don’t live in a perfect world. Let’s put these functions through their paces.
The problem
I have adjusted the RAA long data to have some gaps in the years and removed one development period at the end. I call the adjusted data raa.example. This is what it looks like now:
new_raa <- as.triangle(
  Triangle = raa.example
)
new_raa
dev
origin 1 2 3 4 5 6 7 8 9
1981 5012 3257 2638 898 1734 NA NA 599 54
1982 106 4179 1111 5270 3116 1817 -103 673 535
1983 3410 5582 4881 NA 2594 3479 649 603 NA
1984 5655 5900 4211 5500 2159 2658 984 NA NA
1985 1092 8473 6271 6333 3786 225 NA NA NA
1986 1513 4932 5257 NA NA NA NA NA NA
1987 557 3463 6926 1368 NA NA NA NA NA
1988 1351 5596 6165 NA NA NA NA NA NA
1989 3133 2262 NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA
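To follow along, a data set with the same gaps can be rebuilt in base R from the incremental values printed earlier. This is a sketch of one way to do it, not necessarily how raa.example was originally created; the names `inc`, `raa_long` and `is_gap` are made up for illustration, and the list of dropped cells is read off the printed new_raa triangle.

```r
# Incremental RAA values, copied row by row from the printed raa.tri
# (NA marks cells below the diagonal).
inc <- matrix(c(
  5012, 3257, 2638,  898, 1734, 2642, 1828, 599,  54, 172,
   106, 4179, 1111, 5270, 3116, 1817, -103, 673, 535,  NA,
  3410, 5582, 4881, 2268, 2594, 3479,  649, 603,  NA,  NA,
  5655, 5900, 4211, 5500, 2159, 2658,  984,  NA,  NA,  NA,
  1092, 8473, 6271, 6333, 3786,  225,   NA,  NA,  NA,  NA,
  1513, 4932, 5257, 1233, 2917,   NA,   NA,  NA,  NA,  NA,
   557, 3463, 6926, 1368,   NA,   NA,   NA,  NA,  NA,  NA,
  1351, 5596, 6165,   NA,   NA,   NA,   NA,  NA,  NA,  NA,
  3133, 2262,   NA,   NA,   NA,   NA,   NA,  NA,  NA,  NA,
  2063,   NA,   NA,   NA,   NA,   NA,   NA,  NA,  NA,  NA
), nrow = 10, byrow = TRUE, dimnames = list(origin = 1981:1990, dev = 1:10))

# Wide to long: one row per observed (origin, dev) cell
raa_long <- data.frame(
  origin = as.integer(rownames(inc)[row(inc)]),
  dev    = as.integer(colnames(inc)[col(inc)]),
  value  = as.vector(inc)
)
raa_long <- raa_long[!is.na(raa_long$value), ]

# Assumed gaps, matching the NAs visible in the printed new_raa triangle
is_gap <- with(raa_long,
  dev == 10 |
  (origin == 1981 & dev %in% c(6, 7)) |
  (origin == 1983 & dev == 4) |
  (origin == 1986 & dev %in% c(4, 5))
)
raa.example <- raa_long[!is_gap, ]
```

Dropping rows rather than setting values to NA mirrors how gaps tend to show up in practice: in long data, a period with no claims movement usually has no row at all.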
Convert new_raa into a cumulative triangle:
new_raa_cum <- incr2cum(new_raa)
new_raa_cum
dev
origin 1 2 3 4 5 6 7 8 9
1981 5012 8269 10907 11805 13539 NA NA NA NA
1982 106 4285 5396 10666 13782 15599 15496 16169 16704
1983 3410 8992 13873 NA NA NA NA NA NA
1984 5655 11555 15766 21266 23425 26083 27067 NA NA
1985 1092 9565 15836 22169 25955 26180 NA NA NA
1986 1513 6445 11702 NA NA NA NA NA NA
1987 557 4020 10946 12314 NA NA NA NA NA
1988 1351 6947 13112 NA NA NA NA NA NA
1989 3133 5395 NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA
You may have noticed NAs in periods where we do have observations, for example dev 9 for 1981. A single missing increment makes every later cumulative value in that row NA. We can fix this with the na.rm argument:
new_raa_cum <- incr2cum(new_raa, na.rm = TRUE)
new_raa_cum
dev
origin 1 2 3 4 5 6 7 8 9
1981 5012 8269 10907 11805 13539 13539 13539 14138 14192
1982 106 4285 5396 10666 13782 15599 15496 16169 16704
1983 3410 8992 13873 13873 16467 19946 20595 21198 NA
1984 5655 11555 15766 21266 23425 26083 27067 NA NA
1985 1092 9565 15836 22169 25955 26180 NA NA NA
1986 1513 6445 11702 11702 NA NA NA NA NA
1987 557 4020 10946 12314 NA NA NA NA NA
1988 1351 6947 13112 NA NA NA NA NA NA
1989 3133 5395 NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA
This fixes some of the issues but there is one glaring problem left. This triangle is not really “square”: it has ten origin periods but only nine development periods. Checking the diagonal, we can see a missing value where you would not expect one, i.e. for origin year 1986, dev period 5. Since there was no observation for that development period, you would expect the cumulative amount to stay at 11702; instead there is a missing value.
Why did this happen? The incr2cum function assumes the triangle is half of a perfect square, where the number of development periods is equal to the number of origin periods. This assumption fails in a lot of cases. In particular, it fails for long-tail classes, where there can be stretches of development periods in which no new claims are recorded. Someone opened an issue on the {ChainLadder} GitHub page that does a good job of explaining this.
This is a problem because it breaks a lot of downstream functions that are essential in predicting the ultimate claims. I won’t go into these downstream functions here but I will provide a potential solution for this.
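The “half of a square” assumption can be illustrated on a toy triangle in base R. This is a sketch of the behaviour as described here and inferred from the printed outputs, not a copy of the package's code; `cumulate_square` is a hypothetical helper and the numbers are invented.

```r
# Hypothetical helper: cumulate an incremental triangle, zero-filling NAs only
# in cells that a "half of a square" layout would treat as observed.
cumulate_square <- function(inc) {
  expected <- col(inc) <= ncol(inc) + 1 - row(inc)
  inc[expected & is.na(inc)] <- 0
  t(apply(inc, 1, cumsum))
}

# 4 origin periods but only 3 development periods.
# Origin 2 has a gap at dev 2; origin 3 has no movement at dev 2.
inc <- matrix(c(
  10,  5,  2,
  12, NA,  3,
  11, NA, NA,
   9, NA, NA
), nrow = 4, byrow = TRUE, dimnames = list(origin = 1:4, dev = 1:3))

cum_3 <- cumulate_square(inc)                   # origin 3, dev 2 stays NA
cum_4 <- cumulate_square(cbind(inc, "4" = NA))  # origin 3, dev 2 becomes 11
```

With three columns, the mask places the diagonal one cell too far to the left for every row, so the gap for origin 3 is never filled. Padding the matrix to four columns moves the diagonal back to where it belongs.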
The solution
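To fix this, we can make a wrapper function, call it as_triangle, that creates a skeleton of all unique origin periods and then creates as many development periods as there are unique origin periods. The original data is merged onto this skeleton, so any combination without an observation is kept as an explicit NA.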
as_triangle <- function(data, origin, dev, value) {
  # create skeleton
  unique_origins <- unique(data[[origin]]) # get unique origin periods
  dev_period <- seq_along(unique_origins) # create development periods
  triangle_skeleton <- expand.grid(unique_origins, dev_period, stringsAsFactors = FALSE)
  names(triangle_skeleton) <- c(origin, dev)
  # left join: keep every skeleton cell, unobserved cells become NA
  complete_skeleton <- merge(triangle_skeleton, data[, c(origin, dev, value)], by = c(origin, dev), all.x = TRUE)
  incremental_triangle <- ChainLadder::as.triangle(
    Triangle = complete_skeleton,
    origin = origin,
    dev = dev,
    value = value
  )
  return(incremental_triangle)
}
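To see the skeleton step in isolation, here is a small base R sketch on made-up data in which the last development period never appears. The data frame `claims` and its values are invented for illustration.

```r
# Toy long data: 3 origin periods, but development period 3 never appears
claims <- data.frame(
  origin = c(2001, 2001, 2002, 2003),
  dev    = c(1, 2, 1, 1),
  value  = c(100, 50, 110, 120)
)

# Every origin period crossed with as many development periods as origins
origins  <- unique(claims$origin)
skeleton <- expand.grid(origin = origins, dev = seq_along(origins))

# Left join: every skeleton cell is kept, unobserved cells get NA
complete <- merge(skeleton, claims, by = c("origin", "dev"), all.x = TRUE)
complete
```

The merged data has nine rows, one per cell of a 3 x 3 square, even though only four cells were observed. That is what guarantees a square triangle after reshaping.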
Let’s test the wrapper on raa.example:
new_raa_fix <- as_triangle(data = raa.example, origin = "origin", dev = "dev", value = "value")
new_raa_fix
dev
origin 1 2 3 4 5 6 7 8 9 10
1981 5012 3257 2638 898 1734 NA NA 599 54 NA
1982 106 4179 1111 5270 3116 1817 -103 673 535 NA
1983 3410 5582 4881 NA 2594 3479 649 603 NA NA
1984 5655 5900 4211 5500 2159 2658 984 NA NA NA
1985 1092 8473 6271 6333 3786 225 NA NA NA NA
1986 1513 4932 5257 NA NA NA NA NA NA NA
1987 557 3463 6926 1368 NA NA NA NA NA NA
1988 1351 5596 6165 NA NA NA NA NA NA NA
1989 3133 2262 NA NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA NA
The cumulative triangle will now be:
new_raa_fix_cum <- incr2cum(Triangle = new_raa_fix, na.rm = TRUE)
new_raa_fix_cum
dev
origin 1 2 3 4 5 6 7 8 9 10
1981 5012 8269 10907 11805 13539 13539 13539 14138 14192 14192
1982 106 4285 5396 10666 13782 15599 15496 16169 16704 NA
1983 3410 8992 13873 13873 16467 19946 20595 21198 NA NA
1984 5655 11555 15766 21266 23425 26083 27067 NA NA NA
1985 1092 9565 15836 22169 25955 26180 NA NA NA NA
1986 1513 6445 11702 11702 11702 NA NA NA NA NA
1987 557 4020 10946 12314 NA NA NA NA NA NA
1988 1351 6947 13112 NA NA NA NA NA NA NA
1989 3133 5395 NA NA NA NA NA NA NA NA
1990 2063 NA NA NA NA NA NA NA NA NA
This is now a perfect square. There is a caveat to this function though: it assumes that the delay, i.e. the gap between two consecutive development periods, is one. Some triangles are quarterly, unlike the yearly example here, with development periods labelled 3, 6, 9, 12… We can adjust the function further:
as_triangle <- function(data, origin, dev, value, delay = 1, start = 1) {
  # create skeleton
  unique_origins <- unique(data[[origin]]) # get unique origin periods
  # one development period per origin period, spaced `delay` apart from `start`
  dev_period <- seq(from = start, by = delay, length.out = length(unique_origins))
  triangle_skeleton <- expand.grid(unique_origins, dev_period, stringsAsFactors = FALSE)
  names(triangle_skeleton) <- c(origin, dev)
  # left join: keep every skeleton cell, unobserved cells become NA
  complete_skeleton <- merge(triangle_skeleton, data[, c(origin, dev, value)], by = c(origin, dev), all.x = TRUE)
  incremental_triangle <- ChainLadder::as.triangle(
    Triangle = complete_skeleton,
    origin = origin,
    dev = dev,
    value = value
  )
  return(incremental_triangle)
}
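One detail worth checking when the delay is not 1 is how the development labels are generated. In seq(), the `to` argument bounds the largest label, whereas `length.out` fixes how many labels are produced. A small base R sketch, assuming four origin quarters with development periods labelled in months (the variable names are made up for illustration):

```r
n_origins <- 4  # e.g. four origin quarters
start     <- 3  # first development label, in months
delay     <- 3  # months between development periods

# `to` caps the largest label, so the sequence stops after the first value
by_to <- seq(from = start, to = n_origins, by = delay)

# `length.out` fixes the number of periods: one per origin period
by_count <- seq(from = start, by = delay, length.out = n_origins)

by_to     # 3
by_count  # 3 6 9 12
```

For a square triangle the number of development periods has to match the number of origin periods, which is what the `length.out` form guarantees regardless of how the periods are labelled.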
With this, you need to know the delay and the first development period beforehand. This is what I currently use for the IBNR exercise and I have not encountered any issues with it. However, don’t take my word for it. You can find the function in this GitHub repository. You can adjust the function to suit your use case.
If you can think of any other solutions, please share!
Acknowledgements
I would like to acknowledge the person who opened the GitHub issue. They provide an excellent explanation of the problem and inspired the solution in this blog post.