hicp-package
The Harmonised Index of Consumer Prices (HICP) is the key economic indicator for measuring inflation in the euro area. The methodology underlying the HICP is documented in the HICP Methodological Manual (European Commission 2024). Based on this manual, the hicp-package provides functions for data users to work with publicly available HICP price indices and weights (upper-level aggregation).
This vignette highlights the main package features. It contains three sections: one on data access, one on the classification of individual consumption according to purpose (COICOP) underlying the HICP, and one on index aggregation, change rates and contributions of lower-level indices to the overall inflation rate. It also shows how the package functions can be used in the same way for quarterly index series like the owner-occupied housing price index (OOHPI).
# load packages:
library(data.table) # data.table syntax used throughout this vignette
library(hicp)
# set global options:
options(hicp.coicop.version="ecoicop.hicp") # the coicop version to be used
options(hicp.coicop.bundles=hicp:::coicop.bundles) # coicop bundle code dictionary (e.g., 08X)
options(hicp.all.items.code="00") # internal code for the all-items index
options(hicp.chatty=TRUE) # print package-specific messages and warnings
The hicp-package offers easy access to HICP data from Eurostat’s public database. For that purpose, it uses the download functionality provided by Eurostat’s restatapi-package.
This section shows how to list, filter and retrieve HICP data using the functions datasets(), datafilters(), and data().
Eurostat’s database contains various data sets of different statistics. All data sets are classified by topic and can be accessed via a navigation tree. HICP data can be found under “Economy and finance / Prices”. An even simpler solution that does not require visiting Eurostat’s database is provided by the function datasets(), which lists all available HICP data sets with corresponding metadata (e.g., number of observations, last update).
The output below shows the first five HICP data sets. For each data set, a short description and some metadata are provided. The variable code is the data set identifier, which is needed to filter and download data.
# list all available HICP data sets:
dtd <- hicp::datasets()
dtd[1:5, list(title, code, lastUpdate, values)]
#> title
#> <char>
#> 1: HICP at constant tax rates - monthly data (index)
#> 2: HICP at constant tax rates - monthly data (annual rate of change)
#> 3: HICP at constant tax rates - monthly data (monthly rate of change)
#> 4: HICP - administered prices (composition)
#> 5: HICP - country weights
#> code lastUpdate values
#> <char> <char> <num>
#> 1: prc_hicp_cind 2025.01.17 2193537
#> 2: prc_hicp_cann 2025.01.17 2016222
#> 3: prc_hicp_cmon 2025.01.17 4357812
#> 4: prc_hicp_apc 2025.01.17 728607
#> 5: prc_hicp_cow 2024.02.22 2921
The HICP is compiled each month in each member state of the European Union (EU) for various items. Its compilation started in 1996. Therefore, the data set of price indices is relatively large. Sometimes, however, data users only need the price indices of certain years or specific countries. Eurostat’s API and, thus, the restatapi-package allow filters to be applied to each data request, e.g., to download only the price indices of the euro area for the all-items HICP. The filtering options can differ for each data set. Therefore, the function datafilters() returns the allowed filtering options for a given data set.
The output below shows that the data set prc_hicp_inw of the HICP item weights can be filtered by frequency (freq), product code (coicop), and geographical area (geo). For each filter, the table dtf contains the allowed values, e.g., CP011 for coicop and A for freq. These filters can be integrated into the data download as explained in the following subsection.
# allowed filters for the item weights:
dtf <- hicp::datafilters(id="prc_hicp_inw")
unique(dtf$concept)
#> [1] "freq" "coicop" "geo"
# allowed filter values:
dtf[1:5,]
#> concept code name
#> <char> <char> <char>
#> 1: freq A Annual
#> 2: coicop CP00 All-items HICP
#> 3: coicop CP01 Food and non-alcoholic beverages
#> 4: coicop CP011 Food
#> 5: coicop CP0111 Bread and cereals
Applying a filter to a data request can noticeably reduce the download time, particularly for larger data sets. The function data() downloads a specific data set, optionally restricted by filters and a date range.
# download item weights with filters:
item.weights <- hicp::data(id="prc_hicp_inw",
filters=list("geo"=c("EA","DE","FR")),
date.range=c("2015","2024"),
flags=TRUE)
The object item.weights contains the item weights for the euro area, Germany, and France from 2015 to 2024.
# inspect data:
item.weights[1:5, ]
#> Key: <coicop, geo, time>
#> coicop geo time values flags
#> <char> <char> <char> <num> <char>
#> 1: AP DE 2015 141.49 b
#> 2: AP DE 2016 146.47 <NA>
#> 3: AP DE 2017 141.30 <NA>
#> 4: AP DE 2018 139.96 <NA>
#> 5: AP DE 2019 141.78 <NA>
nrow(item.weights) # number of observations
#> [1] 13412
unique(item.weights$geo) # only EA, DE, and FR
#> [1] "DE" "EA" "FR"
range(item.weights$time) # from 2015 to 2024
#> [1] "2015" "2024"
If the whole data set were needed, the request would simplify to hicp::data(id="prc_hicp_inw").
HICP item weights and price indices are classified according to the European COICOP (ECOICOP-HICP). This COICOP version is used by default (options(hicp.coicop.version="ecoicop.hicp")), but other versions are available in the package as well. The all-items HICP comprises twelve divisions, which are further broken down by consumption purpose. The lowest level of subclasses (5-digit codes) provides the finest differentiation of items for which weights are available, e.g., rice (01111) or bread (01113). Both rice and bread belong to the same class, bread and cereals (0111), and, at higher levels, to the same group, food (011), and the same division, food and non-alcoholic beverages (01). Hence, ECOICOP, and thus also the HICP, follows a predefined hierarchical tree in which the item weights of the all-items HICP add up to 1000. This section shows how to work with the COICOP codes, for example, to derive the lowest level of items that form the all-items HICP.
COICOP codes and bundles. The COICOP codes underlying the HICP (ECOICOP) consist of digits. In this package, the code 00 is used for the all-items HICP, although it is not an official COICOP code (see options(hicp.all.items.code="00")). The codes of the twelve divisions below are 01, 02, ..., 12. At the lowest level of subclasses, the codes consist of 5 digits.
The function is.coicop() checks whether a code is a valid COICOP code. Bundle codes like 082_083, which violate the standard COICOP code pattern but can be found in HICP data, are not considered valid. Bundle codes can generally be detected using the function is.bundle() and ‘unbundled’ into the underlying valid COICOP codes using the function unbundle().
# example codes:
ids <- c("00","CP00","01","08X")
# check if bundle codes:
is.bundle(id=ids)
#> [1] FALSE FALSE FALSE TRUE
# unbundle any bundle codes into their components:
unbundle(id=ids)
#> [[1]]
#> [1] "00"
#>
#> [[2]]
#> [1] "CP00"
#>
#> [[3]]
#> [1] "01"
#>
#> [[4]]
#> [1] "082" "083"
# bundle codes, codes with prefix "CP" and the all-items code 00 are not valid ECOICOP codes:
is.coicop(id=ids)
#> [1] FALSE FALSE TRUE FALSE
# games of chance have a valid ECOICOP code:
is.coicop("0943", settings=list(coicop.version="ecoicop"))
#> [1] TRUE
# but not in the ECOICOP-HICP version 1:
is.coicop("0943", settings=list(coicop.version="ecoicop.hicp"))
#> [1] FALSE
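To make the idea of unbundling concrete, here is a minimal base-R sketch. It is not the implementation of is.bundle() or unbundle(): the dictionary bundle_dict and the helpers is_bundle_sketch() and unbundle_sketch() are hypothetical, and the rule that codes joined by "_" form a bundle is an assumption based on the example 082_083 above.

```r
# Hypothetical bundle dictionary (assumption): maps bundle codes such as
# "08X", which do not follow the COICOP pattern, to their component codes
bundle_dict <- list("08X" = c("082", "083"))

# Hypothetical helper: a code counts as a bundle if it is listed in the
# dictionary or joins several codes with "_" (e.g., "082_083")
is_bundle_sketch <- function(id) {
  id %in% names(bundle_dict) | grepl("_", id, fixed = TRUE)
}

# Hypothetical helper: returns a list with the component codes of each
# bundle; codes that are not bundles are returned unchanged
unbundle_sketch <- function(id) {
  lapply(id, function(code) {
    if (code %in% names(bundle_dict)) {
      bundle_dict[[code]]
    } else if (grepl("_", code, fixed = TRUE)) {
      strsplit(code, "_", fixed = TRUE)[[1]]
    } else {
      code
    }
  })
}

ids <- c("00", "01", "08X", "082_083")
is_bundle_sketch(ids)   # FALSE FALSE TRUE TRUE
unbundle_sketch(ids)    # "00", "01", c("082", "083"), c("082", "083")
```

A dictionary is needed for codes like 08X because their components cannot be derived from the code itself, while underscore-joined codes can simply be split.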
COICOP relatives. COICOP codes in the data downloaded from Eurostat’s database should generally be valid (except for the prefix “CP”). More relevant is thus the detection of child and parent codes in the data. Children are codes that belong to a higher-level code, their parent. Such relations can be direct (e.g., 01->011) or indirect (e.g., 01->0111). Note that children usually have exactly one parent, while a parent may contain multiple children. This can be seen in the example below.
# example codes:
ids <- c("00","01","011","01111","01112")
# no direct parent for 01111 and 01112 available:
parent(id=ids, usedict=FALSE, closest=FALSE, k=1)
#> [[1]]
#> NULL
#>
#> [[2]]
#> [1] "00"
#>
#> [[3]]
#> [1] "01"
#>
#> [[4]]
#> NULL
#>
#> [[5]]
#> NULL
# but 011 is one indirect (or closest) parent:
parent(id=ids, usedict=FALSE, closest=TRUE)
#> [[1]]
#> NULL
#>
#> [[2]]
#> [1] "00"
#>
#> [[3]]
#> [1] "01"
#>
#> [[4]]
#> [1] "011"
#>
#> [[5]]
#> [1] "011"
# while 011 has two (indirect) children:
child(id=ids, usedict=FALSE, closest=TRUE)
#> [[1]]
#> [1] "01"
#>
#> [[2]]
#> [1] "011"
#>
#> [[3]]
#> [1] "01111" "01112"
#>
#> [[4]]
#> NULL
#>
#> [[5]]
#> NULL
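The idea behind closest=TRUE can be sketched in base R: the closest parent of a code is found by dropping trailing digits until a code present in the data remains, with the all-items code 00 as the parent of the divisions. The functions closest_parent() and closest_children() are hypothetical helpers for illustration, not the package’s parent() and child(), and they assume plain ECOICOP codes without bundles or the “CP” prefix.

```r
# Hypothetical helper: closest parent of each code among the codes in 'id'
closest_parent <- function(id, all.items = "00") {
  vapply(id, function(code) {
    if (code == all.items) return(NA_character_)
    # drop trailing digits until a code present in 'id' is found
    n <- nchar(code) - 1
    while (n >= 2) {
      candidate <- substr(code, 1, n)
      if (candidate %in% id) return(candidate)
      n <- n - 1
    }
    # divisions (and codes without any other parent) fall back to all-items
    if (all.items %in% id) all.items else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

# Hypothetical helper: the closest children of a code are the codes whose
# closest parent is that code
closest_children <- function(id) {
  p <- closest_parent(id)
  lapply(id, function(code) id[!is.na(p) & p == code])
}

ids <- c("00", "01", "011", "01111", "01112")
closest_parent(ids)     # NA "00" "01" "011" "011"
closest_children(ids)   # "01", "011", c("01111", "01112"), character(0), character(0)
```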
The functions child() and parent() may be useful for various purposes. To derive the composition of COICOP codes at the lowest possible level, however, the function tree() is better suited. For the HICP, this composition can be derived separately for each country and year. Consequently, the selection of COICOP codes may differ across space and time. If needed, however, specifying the argument by in the function tree() allows merging the compositions of COICOP codes at the lowest possible level, e.g., to obtain the same selection of COICOP codes over time. Because the derivation searches the whole COICOP tree, the resulting composition of COICOP codes is also referred to as the COICOP tree in this package.
# remove the prefix "CP" from the COICOP codes:
item.weights[, "coicop":=gsub(pattern="^CP", replacement="", x=coicop)]
# derive separate trees for each time period and country:
item.weights[, "t1" := tree(id=coicop, w=values, flag=TRUE, settings=list(w.tol=0.1)), by=c("geo","time")]
item.weights[t1==TRUE,
list("n"=uniqueN(coicop), # varying coicops over time and space
"w"=sum(values, na.rm=TRUE)), # weight sums should equal 1000
by=c("geo","time")][order(geo,time),]
#> geo time n w
#> <char> <char> <int> <num>
#> 1: DE 2015 295 1000.00
#> 2: DE 2016 295 1000.00
#> 3: DE 2017 295 1000.00
#> 4: DE 2018 295 1000.00
#> 5: DE 2019 295 1000.00
#> ---
#> 26: FR 2020 295 999.99
#> 27: FR 2021 295 999.94
#> 28: FR 2022 295 1000.01
#> 29: FR 2023 295 1000.04
#> 30: FR 2024 295 999.95
# derive merged trees over time, but not across countries:
item.weights[, "t2" := tree(id=coicop, by=time, w=values, flag=TRUE, settings=list(w.tol=0.1)), by="geo"]
item.weights[t2==TRUE,
list("n"=uniqueN(coicop), # same selection over time in a country
"w"=sum(values, na.rm=TRUE)), # weight sums should equal 1000
by=c("geo","time")][order(geo,time),]
#> geo time n w
#> <char> <char> <int> <num>
#> 1: DE 2015 295 1000.00
#> 2: DE 2016 295 1000.00
#> 3: DE 2017 295 1000.00
#> 4: DE 2018 295 1000.00
#> 5: DE 2019 295 1000.00
#> ---
#> 26: FR 2020 284 999.98
#> 27: FR 2021 284 999.94
#> 28: FR 2022 284 1000.02
#> 29: FR 2023 284 1000.04
#> 30: FR 2024 284 999.95
# derive merged trees over countries and time:
item.weights[, "t3" := tree(id=coicop, by=paste(geo,time), w=values, flag=TRUE, settings=list(w.tol=0.1))]
item.weights[t3==TRUE,
list("n"=uniqueN(coicop), # same selection over time and across countries
"w"=sum(values, na.rm=TRUE)), # weight sums should equal 1000
by=c("geo","time")][order(geo,time),]
#> geo time n w
#> <char> <char> <int> <num>
#> 1: DE 2015 93 1000.00
#> 2: DE 2016 93 1000.00
#> 3: DE 2017 93 1000.00
#> 4: DE 2018 93 1000.00
#> 5: DE 2019 93 1000.00
#> ---
#> 26: FR 2020 93 999.99
#> 27: FR 2021 93 1000.02
#> 28: FR 2022 93 1000.03
#> 29: FR 2023 93 999.98
#> 30: FR 2024 93 999.96
All three COICOP trees in the example above can be used to aggregate the all-items HICP in a single aggregation step, because in each case the item weights add up to 1000 (within the tolerance w.tol). The selection of COICOP codes varies over time and across countries for t1, is the same over time within each country for t2, and is the same over time and across countries for t3.
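A simplified version of this search can be sketched in base R: starting from the all-items code, a code is replaced by its direct children whenever their weights add up to the code’s own weight within a tolerance (here 0.1, as w.tol above). The helpers direct_children() and lowest_level() are hypothetical and not the algorithm behind tree(); they only consider direct children (one more digit), ignore bundles and indirect relatives, and use made-up weights.

```r
# Hypothetical helper: direct children are the codes with one more digit
# that start with the parent code; the children of "00" are the divisions
direct_children <- function(code, ids) {
  if (code == "00") {
    ids[nchar(ids) == 2 & ids != "00"]
  } else {
    ids[nchar(ids) == nchar(code) + 1 & startsWith(ids, code)]
  }
}

# Hypothetical helper: recursively replace a code by its direct children
# if their weights sum to the code's own weight (within w.tol)
lowest_level <- function(code, w, w.tol = 0.1) {
  kids <- direct_children(code, names(w))
  if (length(kids) > 0 && abs(sum(w[kids]) - w[[code]]) <= w.tol) {
    unlist(lapply(kids, lowest_level, w = w, w.tol = w.tol))
  } else {
    code
  }
}

# made-up item weights: the children of 02 do not fully cover its weight
w <- c("00" = 1000, "01" = 600, "011" = 450, "012" = 150,
       "02" = 400, "021" = 300, "022" = 95)

sel <- lowest_level("00", w)
sel          # "011" "012" "02"
sum(w[sel])  # 1000
```

Because 02 is kept whenever its children are incomplete, the selected codes always cover the full all-items weight.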
The HICP is a chain-linked Laspeyres-type index (European Union 2016). The (unchained) price indices in each calendar year refer to December of the previous year, which is the price reference period. To obtain the HICP, these price indices are chain-linked to the existing index via December of the previous year. The HICP indices currently refer to the index reference period 2015=100. Monthly and annual change rates can be derived from the price indices. The contributions of the price changes of individual items to the annual rate of change can be computed as “Ribe contributions”. More details can be found in European Commission (2024, chap. 8).
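The chain-linking via December and the re-referencing to 2015=100 can be sketched in base R. The functions chain_monthly() and rebase_to_year() are hypothetical stand-ins for chain() and rebase(), the unchained indices are made up, and the sketch assumes a complete monthly series starting in a December whose chained level is set to 100 arbitrarily (rebasing removes this choice, since the 2015 average is scaled to 100).

```r
# Hypothetical helper: chain-link unchained monthly indices (December of the
# previous year = 100) via the December overlap; 't' must be sorted, complete
# and start in a December (the value in that first December is not used)
chain_monthly <- function(x, t) {
  month <- as.integer(format(t, "%m"))
  stopifnot(month[1] == 12)
  out <- numeric(length(x))
  out[1] <- 100                          # arbitrary starting level
  link <- out[1]                         # chained level of the last December
  for (i in seq_along(x)[-1]) {
    out[i] <- link * x[i] / 100
    if (month[i] == 12) link <- out[i]   # this December links the next year
  }
  out
}

# Hypothetical helper: re-reference so that the average of 'ref.year' is 100
rebase_to_year <- function(x, t, ref.year = "2015") {
  100 * x / mean(x[format(t, "%Y") == ref.year])
}

# made-up unchained indices from December 2014 to December 2016
time <- seq(as.Date("2014-12-01"), as.Date("2016-12-01"), by = "month")
unchained <- 100 + 0.2 * as.integer(format(time, "%m"))

chained <- chain_monthly(unchained, time)
rebased <- rebase_to_year(chained, time, ref.year = "2015")
```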
The all-items index is a weighted average of the items’ subindices. However, because the HICP is a chain index, the subindices cannot simply be aggregated. They first need to be unchained, i.e., expressed relative to December of the previous year. These unchained indices can then be aggregated as a weighted average. Since the Laspeyres-type index is consistent in aggregation, the aggregation can be done either gradually from the bottom level to the top or directly in one step.
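Consistency in aggregation can be checked with made-up numbers. Here laspeyres_sketch() is a hypothetical helper (a weighted arithmetic mean of unchained indices, not the package’s laspeyres()), and the grouping into two intermediate aggregates A and B is an arbitrary assumption. Each intermediate aggregate carries the sum of its items’ weights.

```r
# Hypothetical helper: Laspeyres-type aggregation of unchained indices
laspeyres_sketch <- function(x, w0) sum(w0 * x) / sum(w0)

x   <- c(102.0, 101.5, 99.0, 104.0)  # made-up unchained subindices (Dec = 100)
w0  <- c(150, 50, 300, 500)          # made-up item weights (sum 1000)
grp <- c("A", "A", "B", "B")         # made-up intermediate aggregates

# direct aggregation in one step
direct <- laspeyres_sketch(x, w0)

# step-wise aggregation: items -> intermediate aggregates -> all-items
x.grp <- sapply(split(seq_along(x), grp),
                function(i) laspeyres_sketch(x[i], w0[i]))
w.grp <- tapply(w0, grp, sum)
stepwise <- laspeyres_sketch(x.grp, w.grp)

c(direct = direct, stepwise = stepwise)   # both 102.075
```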
In the following example, the euro area HICP is computed directly in one step and also gradually through all higher-level indices. First, the monthly price indices are downloaded from Eurostat’s database for the index reference period 2015=100 (unit I15) and the period from December 2014 to December 2024.
# download monthly price indices:
prc <- hicp::data(id="prc_hicp_midx",
filters=list(unit="I15", geo="EA"),
date.range=c("2014-12", "2024-12"))
# manipulate data:
prc[, "time":=as.Date(paste0(time, "-01"))]
prc[, "year":=as.integer(format(time, "%Y"))]
prc[, "coicop" := gsub(pattern="^CP", replacement="", x=coicop)]
setnames(x=prc, old="values", new="index")
Second, the price indices are unchained separately for each ECOICOP code using the function unchain().
The (unchained) price indices prc and the item weights inw are then merged into one data set.
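# unchain price indices separately for each ECOICOP code:
prc[, "dec_ratio" := unchain(x=index, t=time, by=12), by="coicop"]
# manipulate item weights: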
inw <- item.weights[geo=="EA", list(coicop,geo,time,values,t1)]
inw[, "time":=as.integer(time)]
setnames(x=inw, old=c("time","values","t1"), new=c("year","weight","tree"))
# merge price indices and item weights:
hicp.data <- merge(x=prc, y=inw, by=c("geo","coicop","year"), all.x=TRUE)
Based on the derived ECOICOP tree, the unchained price indices are aggregated in one step using the function laspeyres(), chained into a long-term index series using the function chain(), and finally re-referenced to the index reference period 2015 using the function rebase(). The resulting index is plotted below.
# compute all-items HICP in one aggregation step:
hicp.own <- hicp.data[tree==TRUE,
list("laspey"=laspeyres(x=dec_ratio, w0=weight)),
by="time"]
setorderv(x=hicp.own, cols="time")
# chain the resulting index:
hicp.own[, "chain_laspey" := chain(x=laspey, t=time, by=12)]
# rebase the index to 2015:
hicp.own[, "chain_laspey_15" := rebase(x=chain_laspey, t=time, t.ref="2015")]
# plot all-items index:
plot(chain_laspey_15~time, data=hicp.own, type="l", xlab="Time", ylab="Index")
title("Euro area HICP")
abline(h=100, lty="dashed") # index reference level (2015=100)
Similarly, the (unchained) price indices are aggregated gradually following the ECOICOP tree, which produces all lower-level indices in addition to the all-items index.
# compute all-items HICP gradually from bottom to top:
hicp.own.all <- hicp.data[ , aggregate.tree(x=dec_ratio, w0=weight, id=coicop, formula=laspeyres),
by="time"]
setorderv(x=hicp.own.all, cols="time")
hicp.own.all[, "chain_laspey" := chain(x=laspeyres, t=time, by=12), by="id"]
hicp.own.all[, "chain_laspey_15" := rebase(x=chain_laspey, t=time, t.ref="2015"), by="id"]
A comparison with the all-items index computed in one step shows no differences. This highlights the consistency in aggregation of the Laspeyres-type index.
# compare all-items HICP from direct and step-wise aggregation:
agg.comp <- merge(x=hicp.own.all[id=="00", list(time, "index_stpwse"=chain_laspey_15)],
y=hicp.own[, list(time, "index_direct"=chain_laspey_15)],
by="time")
# no differences -> consistent in aggregation:
nrow(agg.comp[abs(index_stpwse-index_direct)>1e-4,])
#> [1] 0
User-defined aggregates can easily be calculated with the functions aggregate() and disaggregate(). This is particularly useful for calculating HICP special aggregates like food, energy, or the overall index excluding both.
# compute food and energy by aggregation:
spa <- spec.aggs[code%in%c("FOOD","NRG"), ]
hicp.data[time>="2019-12-01",
aggregate(x=dec_ratio, w0=weight, id=coicop,
agg=spa$composition,
settings=list(names=spa$code)),
by="time"]
#> time id w0 laspeyres
#> <Date> <char> <num> <num>
#> 1: 2019-12-01 NRG 101.31 1.0021199
#> 2: 2019-12-01 FOOD 190.17 1.0201893
#> 3: 2020-01-01 NRG 98.49 1.0079341
#> 4: 2020-01-01 FOOD 190.73 1.0067257
#> 5: 2020-02-01 NRG 98.49 0.9916196
#> ---
#> 118: 2024-10-01 FOOD 194.71 1.0255994
#> 119: 2024-11-01 NRG 99.11 0.9952538
#> 120: 2024-11-01 FOOD 194.71 1.0273904
#> 121: 2024-12-01 NRG 99.11 1.0014037
#> 122: 2024-12-01 FOOD 194.71 1.0261042
# compute overall index excluding food and energy by disaggregation
hicp.data[time>="2019-12-01",
disaggregate(x=dec_ratio, w0=weight, id=coicop,
agg=list("00"=c("FOOD","NRG")),
settings=list(names="TOT_X_FOOD_NRG")),
by="time"]
#> time id w0 laspeyres
#> <Date> <char> <num> <num>
#> 1: 2019-12-01 TOT_X_FOOD_NRG 708.53 1.0130118
#> 2: 2020-01-01 TOT_X_FOOD_NRG 710.78 0.9829321
#> 3: 2020-02-01 TOT_X_FOOD_NRG 710.78 0.9868098
#> 4: 2020-03-01 TOT_X_FOOD_NRG 710.78 0.9980751
#> 5: 2020-04-01 TOT_X_FOOD_NRG 710.78 1.0054154
#> ---
#> 57: 2024-08-01 TOT_X_FOOD_NRG 706.17 1.0251445
#> 58: 2024-09-01 TOT_X_FOOD_NRG 706.17 1.0258097
#> 59: 2024-10-01 TOT_X_FOOD_NRG 706.17 1.0283045
#> 60: 2024-11-01 TOT_X_FOOD_NRG 706.17 1.0224486
#> 61: 2024-12-01 TOT_X_FOOD_NRG 706.17 1.0270710
The resulting aggregates can finally be chained and rebased as shown before.
User-defined functions can be passed to aggregate() as well, which allows aggregation using various weighted or unweighted bilateral index formulas. By contrast, the function disaggregate() requires the underlying data to be aggregated as a Laspeyres-type index.
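The Laspeyres requirement stems from the fact that a weighted arithmetic mean can be inverted: the weighted value of the removed components is subtracted from the weighted aggregate, and the result is divided by the remaining weight. A minimal base-R sketch with made-up unchained indices and weights follows; disaggregate_sketch() is a hypothetical helper, not the package function.

```r
# Hypothetical helper: remove sub-aggregates from a Laspeyres-type aggregate
disaggregate_sketch <- function(x.agg, w.agg, x.sub, w.sub) {
  (w.agg * x.agg - sum(w.sub * x.sub)) / (w.agg - sum(w.sub))
}

# made-up unchained indices (as ratios) and weights of four components
x <- c(food = 1.02, nrg = 0.99, goods = 1.01, services = 1.03)
w <- c(food = 200, nrg = 100, goods = 300, services = 400)

x.all <- sum(w * x) / sum(w)    # all-items aggregate: 1.018
excl  <- c("food", "nrg")
rest  <- c("goods", "services")

# overall index excluding food and energy by disaggregation ...
x.disagg <- disaggregate_sketch(x.all, sum(w), x[excl], w[excl])
# ... equals the direct aggregate of the remaining components (715/700)
x.direct <- sum(w[rest] * x[rest]) / sum(w[rest])
```

For other index formulas (e.g., a geometric mean), this simple subtraction would not recover the remaining components, which is why aggregate() accepts them but disaggregate() does not.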
The HICP indices show the price change between a comparison period and the index reference period. However, data users are more often interested in monthly and annual rates of change.
Monthly change rates are computed by dividing the index in the current month by the index one month before, while annual change rates compare the index in the current month with the index in the same month one year before; in both cases, the ratio minus one is expressed in percent. Both can easily be derived using the function rates(). Contributions of the price changes of individual items to the annual rate of change can be computed as Ribe or Kirchner contributions, as implemented in the function contrib().
# compute annual rates of change for each index:
hicp.data[, "ar" := rates(x=index, t=time, type="year"), by=c("geo","coicop")]
# add all-items hicp:
hicp.data <- merge(x=hicp.data,
y=hicp.data[coicop=="00", list(geo,time,index,weight)],
by=c("geo","time"), all.x=TRUE, suffixes=c("","_all"))
# ribe decomposition:
hicp.data[, "ribe" := contrib(x=index, w=weight, t=time,
x.all=index_all, w.all=weight_all, type="year"),
by="coicop"]
# annual change rates and contributions over time:
plot(ar~time, data=hicp.data[coicop=="00",],
type="l", xlab="Time", ylab="", ylim=c(-2,12))
lines(ribe~time, data=hicp.data[coicop=="011"], col="red")
title("Contributions of food to overall inflation")
legend("topleft", col=c("black","red"), lty=1, bty="n",
legend=c("Overall inflation (in %)", "Contributions of food (in percentage points)"))
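Both the change rates and the Ribe decomposition can be reproduced with a few lines of base R. The sketch below uses made-up indices for two items covering the months needed for one annual rate: December of year y-2 (the common starting point), month m and December of year y-1, and month m of year y. The helper rates_sketch() and all numbers are hypothetical. The decomposition is written in a form whose item contributions add up exactly to the all-items annual rate; contrib() may differ in details such as weight normalisation.

```r
# Hypothetical helper: percentage change relative to 'lag' periods before
# (lag = 1: monthly rate; lag = 12: annual rate of a monthly series)
rates_sketch <- function(x, lag) {
  n <- length(x)
  c(rep(NA_real_, lag), 100 * (x[-seq_len(lag)] / x[seq_len(n - lag)] - 1))
}
rates_sketch(c(100, 101, 103.02), lag = 1)   # NA 1 2

# made-up item weights (normalised to 1) in years y-1 and y
w.prev <- c(0.6, 0.4)
w.curr <- c(0.5, 0.5)
# made-up chained subindices of the two items
i.dec2 <- c(100, 100)              # December of year y-2
i.m1   <- c(102, 105)              # month m of year y-1
i.dec1 <- c(103, 104)              # December of year y-1
i.m2   <- i.dec1 * c(1.01, 1.06)   # month m of year y

# all-items index as a chain-linked Laspeyres aggregate of the subindices
a.dec2 <- 100
a.m1   <- a.dec2 * sum(w.prev * i.m1 / i.dec2)
a.dec1 <- a.dec2 * sum(w.prev * i.dec1 / i.dec2)
a.m2   <- a.dec1 * sum(w.curr * i.m2 / i.dec1)
annual.rate <- 100 * (a.m2 / a.m1 - 1)

# Ribe contributions (in percentage points): price change in year y since
# December y-1 plus price change in year y-1 between month m and December
ribe <- 100 * (w.curr * (a.dec1 / a.m1) * (i.m2 / i.dec1 - 1) +
               w.prev * (a.dec2 / a.m1) * (i.dec1 - i.m1) / i.dec2)

c(annual.rate = annual.rate, sum.ribe = sum(ribe))   # both about 3.70
```

The second term shows why contributions depend on the previous year as well: price changes between month m and December of year y-1 still enter the annual rate of month m in year y.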
Most of the calculations shown in the previous two sections can be done similarly for quarterly (or annual) index series. The owner-occupied housing price index (OOHPI) is a prominent example of a chained quarterly price index. The OOHPI indices and weights can be downloaded from Eurostat’s database. Below, this is done for the euro area for the period from 2014 to 2024.
# download quarterly OOHPI for euro area:
dtp <- hicp::data(id="prc_hpi_ooq",
filters=list(unit="I15_Q", geo="EA"),
date.range=c("2014-10","2024-12"))
# download annual OOH weights for euro area:
dtw <- hicp::data(id="prc_hpi_ooinw",
filters=list(geo="EA"),
date.range=c("2014","2024"))
Before the calculations can start, all time variables in the data must be converted into proper date or year formats. Afterwards, the indices and weights can be merged into a single data set.
# manipulate indices:
dtp[, c("year","quarter") := tstrsplit(x=time, split="-Q", fixed=TRUE)]
dtp[, "year":=as.integer(year)]
dtp[, "quarter":=as.integer(quarter)]
dtp[, "time":=as.Date(paste(year, quarter*3, "01", sep="-"), format="%Y-%m-%d")]
dtp[, c("unit","quarter"):=NULL]
setnames(x=dtp, old="values", new="index")
# manipulate item weights:
dtw[, "year":=as.integer(time)]
dtw[, c("unit","time"):=NULL]
setnames(x=dtw, old="values", new="weight")
# merge indices and item weights:
dtooh <- merge(x=dtp, y=dtw, by=c("geo","expend","year"), all.x=TRUE)
setcolorder(x=dtooh, neworder=c("geo","expend","year","time"))
setkeyv(x=dtooh, cols=c("geo","expend","time"))
The OOHPI is chained using the fourth quarter of the previous year. Hence, before the OOHPI subcomponents can be aggregated, their indices must first be unchained using the function unchain(). The argument by of this function should now refer to one month of the relevant quarter. Hence, for the fourth quarter, by should be set to 10, 11 or 12. Apart from that, the unchaining works as usual.
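How the overlap quarter enters the unchaining can be sketched in base R. The function unchain_quarterly() is a hypothetical helper, not the package’s unchain(): it divides each quarterly index by the index of the previous year’s quarter that contains month by, so that by = 10, 11 or 12 all select the fourth quarter. Quarters are assumed to be dated in their last month, as in the date conversion of dtp above, and the index values are made up.

```r
# Hypothetical helper: unchain a quarterly index relative to the overlap
# quarter (the quarter containing month 'by') of the previous year
unchain_quarterly <- function(x, t, by = 12) {
  year   <- as.integer(format(t, "%Y"))
  q      <- (as.integer(format(t, "%m")) - 1) %/% 3 + 1   # quarter of each date
  is.ovl <- q == (by - 1) %/% 3 + 1                       # overlap quarters
  # index of the overlap quarter in the previous year (NA if not available)
  link <- x[is.ovl][match(year - 1, year[is.ovl])]
  100 * x / link
}

# made-up quarterly index from Q4 2014 to Q4 2015
dates <- as.Date(c("2014-12-01", "2015-03-01", "2015-06-01",
                   "2015-09-01", "2015-12-01"))
x <- c(98, 99, 100, 101, 102)

unchain_quarterly(x, dates, by = 12)   # NA, then 100 * x[2:5] / 98
```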
The subcomponents of the OOHPI do not follow the COICOP system. Instead, they are classified into expenditure categories (expend). These must be selected manually for index aggregation. For example, the total OOHPI is an aggregate of the two categories ‘acquisition of dwellings’ (DW_ACQ) and ‘ownership of dwellings’ (DW_OWN). These two expenditure categories are further broken down into finer ones. In the following, they are used to compute the overall OOHPI, which is finally chained and rebased to the year 2015.
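# unchain indices separately for each expenditure category:
dtooh[, "ratio" := unchain(x=index, t=time, by=12), by="expend"]
# aggregate, chain and rebase: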
dtagg <- dtooh[expend%in%c("DW_ACQ","DW_OWN"), list("oohpi"=laspeyres(x=ratio, w0=weight)), by="time"]
dtagg[, "oohpi" := chain(x=oohpi, t=time)]
dtagg[, "oohpi" := rebase(x=oohpi, t=time, t.ref="2015")]
Note that the functions unchain(), chain() and rebase() auto-detect the frequency of the time series. Users who prefer to define the frequency manually can change the function settings to settings=list(freq="quarter"). The same applies to the derivation of annual (or quarterly) change rates:
# derive annual change rates:
dtagg[, "ar" := rates(x=oohpi, t=time, type="year", settings=list(freq="quarter"))]
The annual change rates ar show the percentage change of the overall OOHPI in the current quarter compared with the same quarter one year before. These change rates could be further decomposed into the individual contributions of each expenditure category using the function contrib().
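The vignette does not describe how the frequency is auto-detected. One plausible approach, given here only as a hypothetical base-R sketch (detect_freq() and annual_rates() are not package functions), is to look at the typical number of months between consecutive observations and to choose the lag of the annual rate accordingly: 12 for monthly, 4 for quarterly and 1 for annual series. The index values are made up.

```r
# Hypothetical helper: guess the frequency from the typical spacing in months
detect_freq <- function(t) {
  m <- as.integer(format(t, "%Y")) * 12 + as.integer(format(t, "%m"))
  step <- median(diff(sort(unique(m))))
  switch(as.character(step),
         "1" = "month", "3" = "quarter", "12" = "year",
         NA_character_)
}

# Hypothetical helper: annual rate of change (in %) compared with the same
# period one year before, with the lag derived from the detected frequency
annual_rates <- function(x, t) {
  lag <- c(month = 12, quarter = 4, year = 1)[[detect_freq(t)]]
  n <- length(x)
  c(rep(NA_real_, lag), 100 * (x[-seq_len(lag)] / x[seq_len(n - lag)] - 1))
}

# made-up quarterly index from Q4 2014 to Q1 2016
dates <- seq(as.Date("2014-12-01"), by = "3 months", length.out = 6)
x <- c(100, 101, 102, 103, 104, 106)

detect_freq(dates)      # "quarter"
annual_rates(x, dates)  # four NAs, then 4 and about 4.95
```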