
New Zealand

Ethics for new data sources and technology: Perspectives from Aotearoa New Zealand. Emma MacDonald (Statistics New Zealand)


Ethics for new data sources and technology: Perspectives from Aotearoa New Zealand

Emma MacDonald, Director, Interim Centre for Data Ethics and Innovation, Statistics New Zealand

Data landscape:

Aotearoa New Zealand

• Closest country in the world to the South Pole.

• Land area of ~268,000 km².

• Population ~ 5.1m people.

• 3rd in the Transparency International Corruption Perceptions Index.

Data and Statistics Act 2022: A legislative framework that embeds ethics and trust

• Gives effect to the principles of the Treaty of Waitangi.

• Sets a robust framework for deciding on who should have research access to data.

• Determines under what conditions people can access the data.

We are beginning a 10-year programme to change how we collect data and statistics.

We know that the quality of data is critical in building trust.

Integrative and inclusive

Image from Stats NZ

Māori authority over Māori data

Image from Te Kāhui Raraunga

Data ethics: New Zealand’s framework

• Government Chief Data Steward role.

• Government Data Strategy and Roadmap.

• Guidance on ethics and data use.

• Centre for Data Ethics and Innovation.

• Data Ethics Advisory Group.

The future of data ethics: The opportunities, challenges, and choices

Image from Stats NZ


From field collection to alternative prices data at Stats NZ - Mark Colville, Frances Krsinich (Stats NZ, New Zealand)


From field collection to alternative prices data at Stats NZ

Mark Colville

Frances Krsinich

14 June 2023

UNECE 'rethinking data collection' 12-14 June 2023

Introduction

• Since 2001, Stats NZ has been moving from price survey data to ‘alternative data’, and the methods associated with it, for inflation measurement:

- Used cars – introduced a ‘multilateral method’ (hedonics) in 2001 (on large-scale survey data), then incorporated admin data in 2017

- Scanner data – consumer electronics (2014), supermarkets (2019)

- Rent price index from tenancy bonds data (2019)

- Overseas trade index (import data for TVs and phones: 2013; customs data for all imports: 2020)

• Time saved and quality improved but risks from bespoke systems becoming ‘black boxes’ over time

• So, after 20 years, Stats NZ is building MAP (Multilateral Application Pipeline) to generalise our production processes


Multilateral price indexes

• Traditional methods don’t work well with alternative data:

- chain drift (asymmetrical price/quantities due to sales)

- implicit price movements associated with new products

• Over the last 20 years, significant research on multilateral methods:

- TDH, GEKS, TPD, GK, ITRYGEKS

• Stats NZ has adopted multilateral methods in production since 2001:

- used cars (2001, TDH)

- consumer electronics (2014, ITRYGEKS)

- rents (2019, TPD)

- overseas trade index (2013, 2020, TPD)

• 2019 internal review recommended consolidation of processes for both production and R&D


Production processes

Production processes are needed in addition to the index estimation itself:

• input diagnostics to explore and validate source data

• output diagnostics to validate indexes, compare them to previous production runs, and show the effect of splicing on the most recent movement

• analytical measures such as decomposition (i.e. what drives change)

• processes to identify and deal with changes – e.g. to coding of characteristics


Version control

Data Storage

Interface

System level logging

• Timestamps for each production run incl. Topic, Period, User and System Version

[2023-04-04 11:41:13] PRD_RPIQ_2023.03 - mcolvill - v.1.4.0 – Complete

[2023-05-05 08:36:18] PRD_RPIM_2023.04 - mcolvill - v.1.4.0 – Complete

“Run” level logging

• Timestamps each step of the production run

[2023-04-04 11:32:21] 00 - Log File Initialisation

[2023-04-04 11:32:21] 01 - Folder structure created

[2023-04-04 11:32:21] 02 - Running data ingest script

[2023-04-04 11:41:13] Run complete. Time taken: 8.9 mins
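The run-level log format shown above can be imitated in a few lines. The following is a minimal sketch in Python, for illustration only (MAP itself is R-based); the `RunLog` class and its method names are hypothetical and are not part of MAP.

```python
from datetime import datetime


def log_line(message, now=None):
    """Format one entry as '[YYYY-MM-DD HH:MM:SS] message'."""
    now = now or datetime.now()
    return f"[{now:%Y-%m-%d %H:%M:%S}] {message}"


class RunLog:
    """Hypothetical run-level log: one timestamped line per pipeline step."""

    def __init__(self):
        self.lines = []
        self.started = None

    def step(self, number, description, now=None):
        now = now or datetime.now()
        if self.started is None:
            self.started = now  # the first logged step marks the start of the run
        self.lines.append(log_line(f"{number:02d} - {description}", now))

    def complete(self, now=None):
        now = now or datetime.now()
        started = self.started or now
        minutes = (now - started).total_seconds() / 60
        self.lines.append(
            log_line(f"Run complete. Time taken: {minutes:.1f} mins", now)
        )


log = RunLog()
log.step(0, "Log File Initialisation", datetime(2023, 4, 4, 11, 32, 21))
log.step(1, "Folder structure created", datetime(2023, 4, 4, 11, 32, 21))
log.complete(datetime(2023, 4, 4, 11, 41, 13))
print("\n".join(log.lines))
```

Passing explicit timestamps, as in the usage lines, makes the output reproducible; in normal use the `now` argument would be omitted.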

Thank you!

…and we welcome any questions or feedback:

[email protected]

[email protected]

Performance time

The multilateral R package is the index-estimating R package that sits within the wider Multilateral Application Pipeline (MAP) R-based system.

Relative processing times (in minutes) using multilateral within the Stats NZ environment using parallel processing (with four CPU cores) compared to standard runs (one CPU core) on two years of supermarket scanner data - approximately 50 million observations. (Note – in this example both the GEKS-Tornqvist and TPD (time-product dummy) methods use geomean splicing and an estimation window length of 13 months).


GEKS-T: 45 min (1 core), 23 min (4 cores)

TPD: 105 min (1 core), 36 min (4 cores)

References

Bentley, A and F Krsinich (2017) Towards a big data CPI for New Zealand Paper presented at the 2017 Ottawa Group, Eltville, Germany

Bentley, A (2022) Rentals for Housing: A Property Fixed-Effects Estimator of Inflation from Administrative Data Journal of Official Statistics, 38(1)

de Haan, J and Krsinich, F (2014) Scanner data and the treatment of quality change in nonrevisable price indexes Journal of Business and Economic Statistics, 32(3)

Krsinich, F (2016) The FEWS index: Fixed effects with a window splice Journal of Official Statistics 32(2)

Stansfield, M (2019) Import and export price indexes using fixed-effects window-splicing Paper presented at the 2019 New Zealand Association of Economists conference, Wellington, New Zealand

Stansfield, M and F Krsinich (2022, June). A MAP for the future of price indexes at Stats NZ Paper presented at the 17th Ottawa Group 2022, Rome, Italy

Stansfield, M (2022) Multilateral R package available on the Comprehensive R Archive Network (CRAN)

Stats NZ (2014) Measuring price change for consumer electronics using scanner data

Stats NZ (2019a) New methodology for rental prices in the CPI

Stats NZ (2019b) Overseas trade price indexes through a multilateral method


From field collection to alternative price data at Stats NZ

Paper presented at the UNECE Expert meeting on Statistical Data Collection, Geneva 12-14 June 2023

Mark Colville, Frances Krsinich, Prices, Stats NZ P O Box 2922

Wellington, New Zealand [email protected] www.stats.govt.nz

Disclaimer

Conference papers represent the views of the authors, and do not imply commitment by Stats NZ to adopt any findings, methodologies, or recommendations. Any data analysis was carried out under the security and confidentiality provisions of the Data and Statistics Act 2022.

Liability statement

Stats NZ gives no warranty that the information or data supplied in this paper is error free. All care and diligence has been used, however, in processing, analysing, and extracting information. Statistics New Zealand will not be liable for any loss or damage suffered by customers consequent upon the use directly, or indirectly, of the information in this paper.

Reproduction of material

Any table or other material published in this paper may be reproduced and published without further licence, provided that it does not purport to be published under government authority and that acknowledgement is made of this source.

Citation

Colville, M and F Krsinich (2023, June). From field collection to alternative price data at Stats NZ. Paper presented at the UNECE Expert meeting on Statistical Data Collection, Geneva

From field collection to alternative price data at Stats NZ, UNECE expert meeting 2023


Executive summary

Stats NZ has increasingly been using alternative data for inflation measurement over the last 20 years. In particular, scanner data was introduced into the NZ CPI for consumer electronics products in 2014, replacing both field collection and significant in-office resources spent on largely subjective quality adjustment.

Efforts to get scanner data for supermarket products were given a kickstart during the COVID pandemic and average prices for products in the CPI basket of goods have been sourced from scanner data rather than field collection since then. With the recent agreement from supermarket retailers to provide expenditure data in addition to prices for all products, we are now able to improve the index quality by using new methodology to incorporate data for all supermarket products.

Stats NZ is currently developing a generalised production process for these new methodologies, called MAP (the ‘multilateral application pipeline’) – which uses a common process after the initial data wrangling stage, with methods and parameters set specific to the data source. The production of price indexes from supermarket scanner data will join that of consumer electronics products (scanner data), used cars (survey and vehicle registration data), rents (tenancy bond data) and overseas trade indexes (customs data) in production using MAP. We will present and discuss this development in terms of its efficiency gains and futureproofing of our ability to use alternative data sources for inflation measurement.

Introduction

In this paper we give a history of Stats NZ’s[1] use of multilateral methods for alternative prices data and explain why we are now developing a generalised research and production system in R called the Multilateral Application Pipeline (MAP).

In addition to index estimation, other processes are required in production, and these need to be automated and standardised across different price indexes and data sources to aid transparency, robustness, and efficiency. These include:

1. input diagnostics to explore and validate source data

2. output diagnostics to validate the results of index estimation against those of previous periods

3. analytical measures such as decompositions, or ‘points effects’, to aid insights into the aggregate-level price indexes

4. processes to identify and deal with changes in the coding of characteristics, if those characteristics are used for explicit hedonic modelling[2], or if they are required for the creation of unique product identifiers[3]

Because Stats NZ has adopted a range of multilateral methods gradually, for a number of data sources, over the last 20 years, a range of production processes across SAS, Excel, and R has been introduced over time, with different levels of automation and robustness. The development of MAP enables us to simultaneously improve our current production processes and pre-build the development and production system for future adoption of new alternative price data sources.

[1] Statistics NZ is now called ‘Stats NZ’.

[2] For example, in the time dummy hedonic (TDH) or Imputation Törnqvist Rolling Year GEKS (ITRYGEKS) indexes.

[3] Such as required for consumer electronics scanner data, where the model name is masked for those products sold predominantly by one retailer, to protect the confidentiality of that retailer.

Since Stats NZ started using these methods, the theory of multilateral price indexes has developed, and we are now in a position to develop a system that generalises all the production processes once the source data has been transformed into a consistent format. The appropriate index estimation can then be specified with parameters for each choice of a multilateral index method, a splicing method, and an estimation window length, with flexibility to easily change these settings in response to future theoretical findings.
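To make the idea of a parameterised specification concrete, the following is a minimal sketch in Python, for illustration only (MAP itself is R-based). The `IndexSpec` class, its field names, and the lists of allowed values are assumptions made for this example and do not reproduce MAP’s actual metadata format.

```python
from dataclasses import dataclass, replace

# Illustrative option lists; the method names follow those used in this paper,
# the splice names are generic labels assumed for the example.
ALLOWED_METHODS = {"TDH", "TPD", "GEKS-T", "GEKS-J", "GEKS-IT"}
ALLOWED_SPLICES = {"geomean", "window", "movement"}


@dataclass(frozen=True)
class IndexSpec:
    """Hypothetical per-output specification of an index estimation."""
    output: str          # e.g. a data source or statistical output
    method: str          # multilateral index method
    splice: str          # splicing method
    window_length: int   # estimation window length, in periods

    def __post_init__(self):
        if self.method not in ALLOWED_METHODS:
            raise ValueError(f"unknown method: {self.method}")
        if self.splice not in ALLOWED_SPLICES:
            raise ValueError(f"unknown splice: {self.splice}")
        if self.window_length < 2:
            raise ValueError("window_length must cover at least two periods")


# Settings matching the timing example later in the paper:
# GEKS-T with geomean splicing and a 13-month window.
spec = IndexSpec("supermarket_test", "GEKS-T", "geomean", 13)

# Changing one setting leaves the rest of the specification untouched.
tpd_spec = replace(spec, method="TPD")
print(spec)
print(tpd_spec)
```

Keeping the specification immutable and validated at construction is one way to make a change of method or window length an explicit, reviewable edit rather than a code change.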

Multilateral price indexes

Traditional index methods do not work well with alternative prices data[4] such as scanner data, administrative data, and web-scraped online data for two main reasons:

1. Chained superlative indexes[5] tend to exhibit ‘chain drift’ when frequent sales result in asymmetric volatility in prices and quantities.

2. Matched-model methods omit the implicit price movements associated with the introduction of new products.
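The first of these problems can be reproduced with a toy example. The sketch below, in Python and for illustration only, uses invented prices and quantities for two products: product A goes on sale in period 1 (quantities spike), returns to full price in period 2 while quantities are still depressed by stockpiling, and everything is back to its starting values in period 3. The data and function names are assumptions for the example.

```python
import math


def tornqvist(p0, q0, p1, q1):
    """Bilateral Törnqvist price index from period 0 to period 1.

    Assumes the same products are present in both periods.
    """
    total0 = sum(p0[k] * q0[k] for k in p0)
    total1 = sum(p1[k] * q1[k] for k in p0)
    log_index = 0.0
    for k in p0:
        share0 = p0[k] * q0[k] / total0
        share1 = p1[k] * q1[k] / total1
        log_index += 0.5 * (share0 + share1) * math.log(p1[k] / p0[k])
    return math.exp(log_index)


# Invented data: one dict per period.
prices = [
    {"A": 1.0, "B": 1.0},
    {"A": 0.5, "B": 1.0},   # A on sale
    {"A": 1.0, "B": 1.0},   # A back to full price
    {"A": 1.0, "B": 1.0},
]
quantities = [
    {"A": 10, "B": 10},
    {"A": 40, "B": 10},     # sale-driven spike
    {"A": 2, "B": 10},      # post-sale dip
    {"A": 10, "B": 10},     # back to normal
]

# Chain the period-on-period links, then compare with the direct index.
chained = 1.0
for t in range(1, len(prices)):
    chained *= tornqvist(prices[t - 1], quantities[t - 1], prices[t], quantities[t])
direct = tornqvist(prices[0], quantities[0], prices[-1], quantities[-1])

print(f"chained: {chained:.4f}, direct: {direct:.4f}")
```

Prices and quantities in the last period equal those in the first, so the direct index is 1, yet the chained index ends at roughly 0.89. The price fall receives more expenditure weight than the matching price rise, which is the asymmetry described in point 1.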

Over the last 20 years, there has been a significant amount of research and development in this area, resulting in the adoption of multilateral index methods, such as:

• the Time Dummy Hedonic (TDH)

• the Rolling Year GEKS (RYGEKS) (Ivancic, Diewert and Fox, 2011)

• the Time Product Dummy (TPD) (ibid.) or FEWS (Krsinich, 2016)[6]

• the Imputation-Törnqvist RYGEKS (ITRYGEKS) (de Haan and Krsinich, 2014)
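As an illustration of how a GEKS-type method works, the sketch below computes a GEKS index from bilateral Törnqvist indexes over a single fixed window, using the textbook formula in which the index from period 0 to period t is the geometric mean, over every link period l in the window, of P(0,l) × P(l,t). It is written in Python for illustration only, with invented data and hypothetical function names. It is not the multilateral R package’s implementation and omits rolling windows and splicing.

```python
import math


def tornqvist(prices_a, qty_a, prices_b, qty_b):
    """Bilateral Törnqvist index from period a to period b on matched products.

    Assumes at least one product is present in both periods.
    """
    matched = prices_a.keys() & prices_b.keys()
    total_a = sum(prices_a[k] * qty_a[k] for k in matched)
    total_b = sum(prices_b[k] * qty_b[k] for k in matched)
    log_index = 0.0
    for k in matched:
        share_a = prices_a[k] * qty_a[k] / total_a
        share_b = prices_b[k] * qty_b[k] / total_b
        log_index += 0.5 * (share_a + share_b) * math.log(prices_b[k] / prices_a[k])
    return math.exp(log_index)


def geks_tornqvist(prices, quantities):
    """GEKS-Törnqvist index levels relative to period 0 over one window."""
    n = len(prices)

    def bilateral(a, b):
        return tornqvist(prices[a], quantities[a], prices[b], quantities[b])

    levels = []
    for t in range(n):
        # Geometric mean over all link periods l of P(0, l) * P(l, t).
        log_sum = sum(math.log(bilateral(0, l) * bilateral(l, t)) for l in range(n))
        levels.append(math.exp(log_sum / n))
    return levels


# Invented data: product A is on sale in period 1 and quantities swing around it.
prices = [
    {"A": 1.0, "B": 1.0},
    {"A": 0.5, "B": 1.0},
    {"A": 1.0, "B": 1.0},
    {"A": 1.0, "B": 1.0},
]
quantities = [
    {"A": 10, "B": 10},
    {"A": 40, "B": 10},
    {"A": 2, "B": 10},
    {"A": 10, "B": 10},
]

levels = geks_tornqvist(prices, quantities)
print([round(x, 4) for x in levels])
```

Because every period is compared with every other period, the result does not depend on the path taken through the window: on this data the index returns to 1 in the final period, where prices and quantities equal their starting values.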

Evolution since 2001 at Stats NZ

Stats NZ has used alternative data and multilateral methods in the New Zealand Consumers Price Index (NZ CPI) for used cars from 2001, consumer electronics from 2014, and housing rentals from 2019. In the NZ Overseas Trade Index (OTI), a multilateral method was used for mobile phones and televisions from 2013 before being fully adopted for all price indexes from customs data in the OTI in 2020.

Krsinich (2014) explains the adoption of multilateral methods at Stats NZ in the wider context of the history of quality adjustment in the New Zealand Consumers Price Index (NZ CPI).

Used cars

Stats NZ first used a multilateral index in production in 2001, when a time-dummy hedonic (TDH) index was adopted as a more efficient and accurate way of estimating price change from a large-scale survey of all used cars sold by a sample of used-car dealers. In 2011 the hedonic formulation was improved and in 2017 administrative data on used cars’ characteristics from the New Zealand Transport Agency was incorporated to reduce respondent burden.

[4] Also known as ‘non-traditional data’ or ‘big data’ in the context of price measurement, though many argue that these data sources are not strictly ‘big data’. A more accurate term might be ‘bigger data’.

[5] The seemingly appropriate way to estimate representative price indexes in the context of rapidly changing product universes and full-coverage data.

[6] The FEWS index explicitly combined window-splicing with a TPD index to address the systematic bias that would result from using a TPD in production for a non-revisable index such as the CPI. Now that splicing (of more than just the latest period) is recognised as an important element in the specification of multilateral methods, the distinction between TPD and FEWS is no longer required and so we will now tend to use the original term ‘TPD’ to refer to this method.

Rental prices

In 2009 a time-product dummy (TPD) was used to benchmark the performance of the then matched-model rental index based on a longitudinal survey of landlords. Exploring the properties of this approach then motivated further research by Stats NZ into the potential of using fixed-effects (or time-product dummy) indexes with splicing more generally, for any longitudinal price data with insufficient data on product characteristics to exploit explicit hedonic methods such as the TDH.

In 2019 Stats NZ then redeveloped the rental index in production as a TPD[7] index based on tenancy bond data (Stats NZ, 2019a; Bentley, 2022).
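The idea behind a time-product dummy (fixed-effects) index can be sketched briefly: log price is modelled as a product effect plus a time effect, and the index is the exponentiated difference in time effects. The Python sketch below is for illustration only. It fits an unweighted model by simple alternating averaging (backfitting) rather than a regression library, uses invented data, and has a hypothetical function name; it is not the production method, which may differ, for example in its use of weights and splicing.

```python
import math
from collections import defaultdict


def tpd_index(observations, max_iter=1000, tol=1e-12):
    """Unweighted TPD index from (product_id, period, price) observations.

    Fits log(price) = product effect + time effect by alternating averages,
    and returns index levels relative to the earliest period.
    """
    by_period = defaultdict(list)
    by_product = defaultdict(list)
    for product, period, price in observations:
        y = math.log(price)
        by_period[period].append((product, y))
        by_product[product].append((period, y))

    gamma = {i: 0.0 for i in by_product}   # product effects
    delta = {t: 0.0 for t in by_period}    # time effects
    for _ in range(max_iter):
        new_delta = {
            t: sum(y - gamma[i] for i, y in rows) / len(rows)
            for t, rows in by_period.items()
        }
        gamma = {
            i: sum(y - new_delta[t] for t, y in rows) / len(rows)
            for i, rows in by_product.items()
        }
        change = max(abs(new_delta[t] - delta[t]) for t in delta)
        delta = new_delta
        if change < tol:
            break

    base = min(delta)
    return {t: math.exp(delta[t] - delta[base]) for t in sorted(delta)}


# Invented data: price = quality level x common price movement.
# Product C enters in period 1 and product A leaves after period 2.
quality = {"A": 2.0, "B": 5.0, "C": 10.0}
movement = [1.0, 1.1, 1.21, 1.1]
present = {"A": [0, 1, 2], "B": [0, 1, 2, 3], "C": [1, 2, 3]}
observations = [
    (i, t, quality[i] * movement[t]) for i, periods in present.items() for t in periods
]

index = tpd_index(observations)
print({t: round(v, 4) for t, v in index.items()})
```

Because each product carries its own fixed effect, entering and exiting products contribute to the index without any explicit data on their characteristics, provided the products overlap enough in time for the effects to be identified.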

Overseas trade indexes

In 2013 the TPD was used to estimate price indexes for mobile phones and televisions from import data in the overseas trade index. Then, in 2020, Stats NZ fully adopted the TPD for estimation of all price indexes from customs data for the NZ OTI (Stansfield, 2019; Stats NZ, 2019b).

Consumer electronics

In 2014 the Imputation Törnqvist Rolling Year GEKS (ITRYGEKS) (de Haan and Krsinich, 2014) index was adopted to estimate price indexes from scanner data for consumer electronics products in the NZ CPI (Stats NZ, 2014).

Stats NZ’s strategy for future use of alternative price data

Bentley and Krsinich (2017) gave an overview of the potential for alternative data in the NZ CPI. Following this, in 2021 an internal review by Stats NZ recommended a strategy for the future of using alternative data in the NZ CPI. The internal report’s key recommendation was that Stats NZ should pursue the development of a generalised processing system to consolidate the existing production processes and provide a solid basis for the future incorporation of alternative data sources. The paper by Stansfield and Krsinich (2022) presents some of the conclusions and empirical testing undertaken during that review.

Production processes are non-trivial

At price index conferences and in the literature, most of the focus on the use of alternative data has centred around index methodology. In particular, on the still-evolving concepts, limitations and empirical results relating to multilateral index methods.

However, the estimation of indexes is just one element of what must be dealt with when using alternative data in production. It is also crucial in the production of price indexes to understand (1) what drives aggregate price movements and (2) the impact on the most recent index movement of the splicing procedure used[8]. Issues also arise when dealing with source data that is incomplete or inconsistent across time which, in Stats NZ’s experience, is the rule rather than the exception with this data.

[7] With a geomean splice.

Many of the processes required for these insights and mitigations become non-trivial to automate at scale. The iterative development of the MAP system is therefore incorporating an automation of processes which, in the past, have involved relatively time-consuming analytical work, often at least partially using Excel.

Stansfield and Krsinich (2022) show in more detail the implications of inconsistent coding over time of scanner data for consumer electronics products, and the need for production processes to deal with this.

Empirical testing at scale

The ability to automate and scale up both the index estimation and many of the associated production processes is also important when determining the appropriate methods for new data sources. Decisions are required about which underlying multilateral index methods to use (e.g., TDH, GEKS-T, GEKS-IT[9], TPD) and what their appropriate settings should be in terms of splicing method and estimation window length. While some methods will be better than others based on theoretical considerations, we acknowledge that the theory is still evolving. This heightens the importance of empirical testing – across methods and their parameters, and against historical series (where they exist) to help justify those decisions.

The Multilateral Application Pipeline

As already mentioned, until recently the processing of alternative data sources with multilateral methods at Stats NZ has used bespoke systems across a variety of different languages and operational systems, namely Excel, SAS, and R, with varying degrees of manual intervention required by analysts.

The earliest implemented processes, such as those for used cars and consumer electronics, are inefficient in various ways by today’s standards. For example, the splicing[10] of the most recently estimated quarter’s movement for used cars is done in Excel in a relatively manual way, rather than coded into the production of the index. For consumer electronics, the identification and treatment of changed coding for characteristics has been done in Excel and is labour-intensive and relatively opaque, without documentation of decisions and treatments incorporated into the system itself.

By centralising the process in Stats NZ’s new Multilateral Application Pipeline (MAP) system, the integration of alternative data sources and multilateral methods can be consolidated and streamlined. Creating a centralised system also brings transparency to these complex processes and a platform upon which team members can learn, with links to documentation and instructions.

[8] The use of splicing (where the splicing period is greater than just the latest period) trades off the quality of the most recent movement in favour of the longer term index. While this is generally a desirable property, focus is often on the most recent movement (either annual or monthly) so the NSO should understand the impact of the implicit revision implied by the splicing.

[9] The multilateral package refers to ITRYGEKS as GEKS-IT (GEKS Imputation Törnqvist) as a more standardised naming convention. Similarly, GEKS-T (or the CCDI) is the Rolling Year GEKS based on a Törnqvist index and GEKS-J is the GEKS based on a Jevons index.

[10] The CPI is non-revisable. Multilateral methods, however, re-estimate a back series with each successive period. This means that new results must be ‘spliced’ onto the published index such that they preserve the integrity of the published series (by incorporating a ‘revision’ factor). See de Haan (2019) for a discussion of different splicing approaches.
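The splicing step mentioned in the footnotes can be illustrated with one common formulation of a geometric-mean splice, in which the newly estimated window is linked onto the published series at every overlapping period and the resulting candidate values are averaged geometrically. The Python sketch below is for illustration only; the function name and the invented figures are assumptions, and the exact formulation used in production (or in the multilateral R package) may differ.

```python
import math


def geomean_splice(published, window):
    """Next published index value via a geometric-mean splice.

    published: index levels already released (never revised).
    window:    index levels from the latest estimation window; its last
               element is the new period, and the earlier elements line up
               with the most recent len(window) - 1 published periods.
    """
    overlap = len(window) - 1
    if overlap < 1 or overlap > len(published):
        raise ValueError("window must overlap the published series")
    recent = published[-overlap:]
    # Linking at overlap period s gives the candidate
    # published[s] * window[new] / window[s]; average the candidates in logs.
    log_links = [
        math.log(pub * window[-1] / w) for pub, w in zip(recent, window[:-1])
    ]
    return math.exp(sum(log_links) / overlap)


published = [100.0, 102.0, 104.0]

# If the new window agrees with the published movements, every link gives 106.
consistent = geomean_splice(published, [1.0, 1.02, 1.04, 1.06])

# If the window re-estimates an earlier period (1.03 instead of 1.02), the
# links disagree and the splice spreads the implied revision across them.
revised = geomean_splice(published, [1.0, 1.03, 1.04, 1.06])

print(round(consistent, 4), round(revised, 4))
```

Linking only at the most recent overlapping period would instead correspond to a movement splice; averaging over all overlapping periods is what trades some accuracy in the latest movement for a better longer-term index.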

While making production processes for the existing use of alternative data more robust and transparent, this generalised system also lays much of the groundwork for future implementations of new data sources.

Multilateral R package for index estimation

Over the past few years, we have developed an R package for estimating all the multilateral indexes in production at Stats NZ, the multilateral package, which is now available at CRAN[11]. Some of the underlying functions are an implementation of the IndexNumR package[12] by Graham White. We have also added multilateral methods that use hedonic regression modelling, such as the time dummy hedonic (TDH) and the Imputation Törnqvist Rolling Year GEKS (ITRYGEKS[13]).

Stats NZ built its own package internally to ensure full transparency, particularly for our validation against existing SAS-based implementations, and with consideration of speed and the flexibility to change between methods and parameters easily. For speed of processing, the package allows parallel processing and uses optimisations such as sparse matrices and memory-efficient operations. The extra hedonic regression functionality is computationally intensive and requires this optimisation.

Figure 1 shows the relative processing times within the Stats NZ environment using parallel processing (with four CPU cores) compared to standard runs (one CPU core) on two years of data of approximately 50 million observations. Both the GEKS-T and TPD methods use geomean splicing and an estimation window length of 13 months.

Figure 1: The effect of parallel processing on run-time (in minutes) of the GEKS and TPD indexes

GEKS-T: 45 min (1 core), 23 min (4 cores)

TPD: 105 min (1 core), 36 min (4 cores)
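The timings above translate directly into speedup and parallel-efficiency figures. The short Python calculation below is for illustration only and uses nothing beyond the four run-times reported in Figure 1; the variable names are assumptions.

```python
# Run-times in minutes from Figure 1: (one core, four cores).
timings = {"GEKS-T": (45, 23), "TPD": (105, 36)}
cores = 4

summary = {}
for method, (one_core, four_cores) in timings.items():
    speedup = one_core / four_cores
    efficiency = speedup / cores  # share of the ideal 4x speedup achieved
    summary[method] = (speedup, efficiency)
    print(f"{method}: speedup {speedup:.2f}x, parallel efficiency {efficiency:.0%}")
```

On these figures the TPD benefits more from parallel processing (about 2.9 times faster) than the GEKS-T (about 2.0 times faster).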

The multilateral R package is the index-estimating R package that sits within the wider Multilateral Application Pipeline (MAP) system.

[11] Comprehensive R Archive Network https://cran.r-project.org/web/packages/multilateral/index.html

[12] https://mirrors.pku.edu.cn/cran/web/packages/IndexNumR/index.html

[13] Referred to as GEKS-IT in the multilateral R package.


Overview of the MAP system

The goal of the Multilateral Application Pipeline (MAP) is to be a generic system capable of consuming raw data, processing it, producing statistics, and presenting diagnostic information to the end user (internal analysts). The main use-case of MAP at Stats NZ is to calculate multilateral price indexes on a range of product categories using the “multilateral” R package (Stansfield, 2022), to validate those outputs using a variety of diagnostic measures, and then to collate them with other indexes for dissemination. The system is designed to be very user-friendly, requiring no prior coding experience, and during typical usage of MAP manual intervention should not be required. The system is operated using a simple interface of button prompts and text entry fields, resembling a stand-alone application.

Architecture

The MAP system is written in R and R Markdown and built into an R package, alongside a secure file storage location for data steady-states and metadata. The system uses a single high-level function to run end-to-end, and a Shiny application is included as the primary intended way for non-developers to use MAP, which streamlines their interaction with the system.

Modularity

The system is designed to be flexible with functionality separated into discrete steps, including Initialization, Storage Setup, Data Ingestion, Editing and Imputation, Index Calculation, Data Export, Diagnostics and Cleanup. Individual steps can be run in isolation or repeated, such as when an error occurs, removing the need for running redundant steps.
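The ability to run steps in isolation, or to resume after a failure, can be sketched as follows. This Python example is for illustration only (MAP itself is R-based); the step names follow the list above, while `run_pipeline`, its arguments, and the placeholder handlers are hypothetical.

```python
# Step names taken from the list above, in execution order.
STEPS = [
    "initialization",
    "storage_setup",
    "data_ingestion",
    "editing_and_imputation",
    "index_calculation",
    "data_export",
    "diagnostics",
    "cleanup",
]


def run_pipeline(handlers, state, start_from=None, stop_after=None):
    """Run the steps from start_from to stop_after (inclusive), in order.

    handlers maps each step name to a function that takes the shared state.
    Returns the names of the steps that were executed.
    """
    first = STEPS.index(start_from) if start_from else 0
    last = STEPS.index(stop_after) if stop_after else len(STEPS) - 1
    executed = []
    for name in STEPS[first:last + 1]:
        handlers[name](state)
        executed.append(name)
    return executed


# Placeholder handlers that only record that they ran.
handlers = {
    name: (lambda state, n=name: state.setdefault("done", []).append(n))
    for name in STEPS
}

full_run = run_pipeline(handlers, {})
# Resume from index calculation, e.g. after fixing an error in that step.
resumed = run_pipeline(handlers, {}, start_from="index_calculation")
# Re-run a single step in isolation.
single = run_pipeline(handlers, {}, start_from="diagnostics", stop_after="diagnostics")
print(resumed, single)
```

In a real system each step would read its inputs from a stored intermediate state rather than from memory, which is what makes resuming part-way through safe.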

To streamline using MAP, all indexes produced from a distinct alternative data source are bundled together into an “output”. These outputs are typically aligned to a specific statistical output, such as the Rent Price Index (RPI), or a homogenous group of products such as supermarket products. Each output has a corresponding metadata file that describes calculation parameters, in addition to discrete data ingestion processes, diagnostics and output data structures.

Steady-states and version control

To maintain strict reproducibility of our statistics, data steady-states are produced during the processing of data sources. These states vary depending on the origin and specifications of the data source, but typically include the data in its raw unadjusted format, processed states before and after editing and imputation, followed by the production statistics. Each steady-state is date-time-stamped, allowing traceability of any statistical output from the system.
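A date-time-stamped steady-state can be as simple as a snapshot file whose name records when it was written and which is never overwritten. The Python sketch below is for illustration only; the folder layout, file format, and function name are assumptions and do not describe MAP’s actual storage.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def save_steady_state(root, output, stage, records, now=None):
    """Write one snapshot as <root>/<output>/<stage>/<stage>_<timestamp>.json."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%dT%H%M%S")
    folder = Path(root) / output / stage
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{stage}_{stamp}.json"
    if path.exists():
        # Steady-states are treated as immutable: refuse to overwrite.
        raise FileExistsError(path)
    path.write_text(json.dumps(records, indent=2))
    return path


# Usage with a temporary folder and invented records.
with tempfile.TemporaryDirectory() as root:
    when = datetime(2023, 4, 4, 11, 32, 21)
    raw = [{"product": "A", "period": "2023-03", "price": 1.0}]
    saved = save_steady_state(root, "rents", "raw", raw, now=when)
    saved_name = saved.name
    restored = json.loads(saved.read_text())
    print(saved_name)
```

Writing a new file for each state, rather than updating one in place, is what allows any published figure to be traced back to the exact data it was computed from.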

MAP is version-controlled using GitLab, making use of branches to allow development of the system to occur alongside production outputs. Designated releases additionally make it easier to trace any specific statistical output, to provide documentation on changes and to simplify troubleshooting.

Documentation and Diagnostics

Due to the ability to version-control MAP, it was beneficial to incorporate documentation directly into the system. User guides and process documentation are written in R Markdown and are built directly into the Shiny application used by internal analysts running MAP.

Diagnostic reports are written in R Markdown, with default diagnostics for input data and output indexes, with the ability to create tailored diagnostics for product groups, or specific products. Desirable diagnostics include analysis of input data such as column/row count, expected variables and simple averages of key variables. More complex diagnostics such as interactive graphs of multilateral index splicing are also available, allowing visualisation of complex analysis.
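The simpler input diagnostics named above (row and column counts, expected variables, and simple averages of key variables) can be sketched in a few lines. This Python example is for illustration only; the function name, column names, and data are invented, and MAP’s own diagnostics are written in R Markdown.

```python
from statistics import mean


def input_diagnostics(rows, expected_columns, numeric_columns):
    """Basic checks on input data supplied as a list of dicts (one per row)."""
    columns = set()
    for row in rows:
        columns.update(row.keys())
    means = {}
    for col in numeric_columns:
        values = [row[col] for row in rows if row.get(col) is not None]
        if values:  # skip columns with no usable values
            means[col] = mean(values)
    return {
        "row_count": len(rows),
        "column_count": len(columns),
        "missing_columns": sorted(set(expected_columns) - columns),
        "unexpected_columns": sorted(columns - set(expected_columns)),
        "means": means,
    }


# Invented example: the expected 'period' variable is absent from the input.
rows = [
    {"product": "A", "price": 2.0, "quantity": 10},
    {"product": "B", "price": 4.0, "quantity": 30},
]
report = input_diagnostics(
    rows,
    expected_columns=["product", "period", "price", "quantity"],
    numeric_columns=["price", "quantity"],
)
print(report)
```

Returning the results as a plain dictionary keeps the checks separate from their presentation, so the same values could feed a report, a log, or an automated stop rule.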


Example: migrating the consumer electronics scanner data system

The original production system for consumer electronics scanner data was implemented in 2014 in SAS, with diagnostic and analysis processes largely executed in Excel. The system required manual intervention from analysts to produce and respond to diagnostic processes. At the time it was introduced, the system was well understood but with staff turnover the process has slowly turned into a ‘black box’ with gradual loss of understanding of the purpose underlying key steps. This has made the system quite fragile and overly dependent on a few senior technical staff to deal with ad hoc issues.

The system usually took about four days for an analyst to run, as issues often arose requiring bespoke problem-solving. If no issues at all arose the fastest possible run (which incorporated some quite laborious semi-automated work in Excel) took approximately 4 hours.

With its migration to MAP this system can now run in less than 5 minutes end-to-end, with all reports automatically produced. The new process has required little manual intervention and is significantly simpler to maintain and interpret.

Future plans to migrate into MAP

To date, used cars, consumer electronics and rents have been migrated to MAP. The next system to migrate is overseas trade indexes (which use customs data). Although this already has its own relatively robust R-based systems, it will be rebuilt in the generalized MAP system to enable full consolidation and streamlining. Despite the overseas trade index migration not yet being complete, we are already observing a ~60% decrease in processing time based on the consolidation into MAP.

Likely future data sources to be developed in the MAP system:

• Supermarket scanner data is in the exploration stage for use of multilateral methods, with the testing of methods and parameters, and investigation of the raw data.

• A prototype official house price index able to be disaggregated into land and structure indexes, using local councils’ valuation and sales data (see Krsinich, 2019)

We are also now exploring the potential to use MAP inside the NZ GS1[14] environment to produce indexes securely, with only the aggregate-level indexes released to Stats NZ. This is looking very promising.

Conclusion

In addition to the methodological challenges of using alternative data for price index estimation, there are non-trivial issues associated with production at scale. Our development of the R-based Multilateral Application Pipeline (MAP) helps to automate, consolidate, and generalise these production processes.

The development of MAP has been iterative, starting with the migration of existing production systems for used cars and consumer electronics products, from SAS and Excel. More recently, the Rent Price Indexes (based on tenancy bond administrative data) were consolidated from existing R systems, and we are currently migrating the R-based systems for the Overseas Trade Index (based on customs data).

[14] GS1 hold price and quantity information corresponding to their barcode information from a market research company, meaning that sufficient information for (non-hedonic) multilateral index methods (such as TPD or GEKS) is available within their secure environment, though not able to be released at that level of disaggregation.

We plan to develop supermarket scanner data and a prototype house price index using the MAP system, and we are currently exploring the use of MAP inside NZ GS1’s secure environment to enable the safe use of confidential price and quantity data linked to barcode information.

For Stats NZ, there are multiple benefits of the MAP system:

• A reduction in manual, error-prone processes – everything that can be automated will be automated.

• More transparency, with the underlying code open for review and reuse by others.

• Diagnostics, monitoring, and analysis are incorporated alongside index estimation.

• Index estimation is done with our in-house developed multilateral R package, which enables the full range of multilateral methods already in production at Stats NZ, and performs well at scale through optimised functionality and parallel processing.

• Consistent interfaces and processes across product types, data sources and methods.

• The potential for incorporation of links to training and documentation in the user interface.

In addition to the multilateral package (Stansfield, 2022), the rest of MAP’s R packages will be made open source and available from CRAN. We hope that other agencies and researchers will also make use of them in their research and development.

References

Bentley, A and F Krsinich (2017) Towards a big data CPI for New Zealand Paper presented at the 2017 Ottawa Group, Eltville, Germany

Bentley, A (2022) Rentals for Housing: A Property Fixed-Effects Estimator of Inflation from Administrative Data Journal of Official Statistics, 38(1)

Bentley, A and Krsinich, F (2022) Timely Rental Price Indices for thin markets: Revisiting a chained property fixed-effects estimator Paper presented at the 2022 Ottawa Group conference, Rome, Italy

de Haan, J and Krsinich, F (2014) Scanner data and the treatment of quality change in nonrevisable price indexes Journal of Business and Economic Statistics, 32(3)

de Haan, J (2019) Rolling Year Time Dummy Indexes and the Choice of Splicing Method Paper presented at the 2019 Ottawa Group conference, Rio de Janeiro, Brazil

Ivancic, L, W E Diewert and K J Fox (2011) Scanner Data, Time Aggregation and the Construction of Price Indexes Journal of Econometrics 161, 24-35

Krsinich, F (2014) Quality Adjustment in the New Zealand Consumers Price Index Chapter from The New Zealand CPI at 100. History and Interpretation Publisher: Victoria University Press. Editors: Sharleen Forbes, Antong Victorio

Krsinich, F (2016) The FEWS index: Fixed effects with a window splice Journal of Official Statistics 32(2)


Krsinich, F (2019) Land prices: UNCOVERED! Extricating land price indexes from improved property price indexes for New Zealand Paper presented at the 2019 New Zealand Association of Economists conference, Wellington, New Zealand

Stansfield, M (2019) Import and export price indexes using fixed-effects window-splicing Paper presented at the 2019 New Zealand Association of Economists conference, Wellington, New Zealand

Stansfield, M and F Krsinich (2021) Bigger, better, faster: further progress in using non-traditional data to measure price inflation Paper presented at the 2021 New Zealand Association of Economists conference, Wellington, New Zealand

Stansfield, M and F Krsinich (2022, June). A MAP for the future of price indexes at Stats NZ Paper presented at the 17th Ottawa Group 2022, Rome, Italy

Stansfield, M (2022) Multilateral R package available on the Comprehensive R Archive Network (CRAN)

Stats NZ (2014) Measuring price change for consumer electronics using scanner data

Stats NZ (2019a) New methodology for rental prices in the CPI

Stats NZ (2019b) Overseas trade price indexes through a multilateral method

The Relationship Approach for Living in Aotearoa - Donna Jones and Kelly Evans (Stats NZ, New Zealand)


The Relationship Approach for Living in Aotearoa – New Zealand’s new longitudinal survey

Purpose

This paper acknowledges the unique challenge of retaining participants in longitudinal surveys and describes how Stats NZ Tatauranga Aotearoa engages and builds relationships with survey participants in Living in Aotearoa, New Zealand’s new longitudinal survey. It outlines high-level concepts and considerations that have been identified to improve survey members’ willingness and participation in the survey for its six-year duration. It then outlines the benefits and challenges associated with this approach, and the strategy for evaluating its effectiveness.

Summary

Living in Aotearoa is a new longitudinal survey in Aotearoa (New Zealand), introduced to enable Stats NZ to report on all ten measures of child poverty under the Child Poverty Reduction Act (2018). Currently, nine of ten measures can be collected through the Household Economic Survey (HES), but the final measure, that of persistent poverty, requires longitudinal data.

Many longitudinal surveys find that, after just a few waves of interviewing, members of the responding sample from the initial wave are no longer participating1,2. The primary concern is that the loss of participants, also known as sample attrition, can be systematic rather than random. Evidence from longitudinal studies shows that establishing and maintaining a meaningful relationship between the survey administrators and survey participants is fundamental to achieving high retention rates3,4.

The Respondent Relationship Approach (RRA) represents a new approach to engaging with survey participants to help manage attrition within Living in Aotearoa. Aotearoa is a bicultural nation, built upon a founding treaty document between its indigenous Māori population and the British Crown. In the Māori worldview, te ao Māori, investing in relationships is key to building trust and confidence, and this has been shown to be effective in ensuring Māori have this trust and confidence in survey processes and outputs produced5. The RRA has been designed to develop a sense of collective responsibility, known as a shared kaupapa (purpose or topic), so that people are more inclined to participate and contribute their information for the ‘greater good’.

1 Watson, N., & Wooden, M. (2009). Identifying factors affecting longitudinal survey response. In: Methodology of Longitudinal Surveys (ed. P. Lynn)

2 Satherley, N., Milojev, P., Greaves, L., Huang, Y., Osborne, D., Bulbulia, J., et al. (2015) Demographic and Psychological Predictors of Panel Attrition: Evidence from the New Zealand Attitudes and Values Study. PLoS ONE 10(3): e0121950. doi:10.1371/journal.pone.0121950

3 Estrada, M., Woodcock, A., & Schultz, P. W. (2014). Tailored panel management: A theory-based approach to building and maintaining participant commitment to a longitudinal study, Evaluation Review, 38(1), 3-28

4 Poulton, R., Moffitt, T. E., & Silva, P. A. (2015). The Dunedin Multidisciplinary Health and Development Study: overview of the first 40 years, with an eye to the future, Social Psychiatry and Psychiatric Epidemiology, 50(5), 679-693.

5 Jones, B., Ingham, T., Cram, F., Dean, S., & Davie, C. (2013). An indigenous approach to explore health-related experiences among Māori parents: the Pukapuka Hauora asthma study, BMC Public Health, 13:228

UNECE Expert Meeting on Statistical Data Collection:

‘Rethinking Data Collection' (12-14 June 2023)

Development of Living in Aotearoa began in mid-2020 and the RRA is an integral part of its design. Living in Aotearoa began its second year of data collection in April 2023 and early feedback has been received about the RRA. Evaluation criteria and a two-phase evaluation process have been established, enabling Stats NZ to gather baseline data for analysis from multiple sources. This will lead to an annual evaluation of the RRA starting in 2024.

Introduction to Living in Aotearoa

In 2018, the New Zealand government introduced the Child Poverty Reduction Act. The Act is designed to help achieve a significant and sustained reduction in child poverty in New Zealand. It requires annual reporting on ten measures of child poverty, nine of which have been reported from the Household Economic Survey (HES) since February 2020. One measure, however, persistent child poverty, cannot be reported from existing annual Stats NZ surveys because it requires longitudinal data, and HES is an annual survey.

To enable Stats NZ to deliver all ten child poverty measures, including persistent poverty, HES is being transformed into two new surveys: a longitudinal survey called Living in Aotearoa (LiA) and an annual survey called Household Expenditure and Wealth (HEW). Living in Aotearoa is a rotating panel survey that will enable us to report on all ten child poverty measures as well as continue to meet reporting needs for income, housing costs and material wellbeing (currently met by HES). HEW will meet the remaining needs for expenditure and net worth.

Living in Aotearoa will interview respondents every year for six years. Once fully implemented, more than 20,000 households will be participating in the survey at any given time. Cross-sectional (annual) and longitudinal data from Living in Aotearoa, including the provision of Official Statistics on all ten child poverty measures, will be available from early 2026.

Figure 1 – Rotating panel survey design for Living in Aotearoa

Wave 1 is the first interview for the household, and so on. Green text shows the data used to form cross-sectional measures. Yellow shading shows the data used to form longitudinal measures.
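The rotating panel idea can be sketched in a few lines of Python. The six yearly interviews per household come from the text; the assumption that exactly one new panel enters each collection year, and the names `WAVES_PER_PANEL` and `waves_in_field`, are hypothetical and used only for illustration.

```python
WAVES_PER_PANEL = 6  # from the text: each household is interviewed yearly for six years

def waves_in_field(collection_year, first_year=1):
    """Return {panel_start_year: wave_number} for panels interviewed in a given year.

    Assumes (hypothetically) that exactly one new panel enters every year.
    """
    active = {}
    for start in range(first_year, collection_year + 1):
        wave = collection_year - start + 1
        if wave <= WAVES_PER_PANEL:
            active[start] = wave
    return active

for year in range(1, 9):
    print(year, waves_in_field(year))
```

Under this assumption, all panels interviewed in a given year can feed that year's cross-sectional measures, while following a single panel across its waves gives the repeated observations that longitudinal measures need.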


Why the Respondent Relationship Approach?

Many longitudinal surveys find that, after just a few waves of interviewing, members of the responding sample from the initial wave are no longer participating.6,7 The primary concern is that the loss of participants, also known as sample attrition, can be systematic rather than random. For example, people with lower incomes may be less likely to own their own home and therefore more likely to move to a new house, meaning they are more difficult to maintain contact with and retain in the survey. This pattern would result in selective attrition where people with lower incomes are lost from the survey at higher rates than people with higher incomes. As a result, the sample would become less representative over time, introducing systematic bias to the estimates and threatening the validity of the statistics generated.
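The effect of selective attrition on sample composition can be illustrated with simple arithmetic. The sketch below assumes a hypothetical starting sample in which 30% of respondents have lower incomes, with 80% of that group and 95% of the higher-income group retained from each wave to the next. These figures and the function name `low_income_share_by_wave` are invented for illustration and are not Living in Aotearoa results.

```python
def low_income_share_by_wave(initial_share, retain_low, retain_high, waves):
    """Share of the remaining sample that is low income at each wave.

    All inputs are hypothetical illustration values.
    """
    low, high = initial_share, 1.0 - initial_share
    shares = []
    for _ in range(waves):
        shares.append(low / (low + high))
        low *= retain_low    # proportion of low-income respondents kept to the next wave
        high *= retain_high  # proportion of higher-income respondents kept to the next wave
    return shares

shares = low_income_share_by_wave(0.30, retain_low=0.80, retain_high=0.95, waves=6)
for wave, share in enumerate(shares, start=1):
    print(f"wave {wave}: {share:.1%} low income")
```

With these assumed rates the low-income share of the responding sample falls from 30% at wave 1 to roughly 15% by wave 6, even though no individual wave loses a dramatic number of people. If both groups were retained at the same rate, the share would stay constant.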

A robust relationship approach to minimising attrition is critical to the success of Living in Aotearoa, particularly so that priority groups such as Māori, Pacific Peoples, migrant communities and people with disabilities can be accurately represented in the survey.

The current approach to collection of annual social survey data used by Stats NZ is called the “persistence approach”. This approach makes use of a series of notifications, by mail or email, that inform participants of their obligation under the Data and Statistics Act 2022 to take part in the survey if selected in a sample. This approach, while accepted and common internationally, particularly for annual or one-off surveys, is less focused on ensuring willingness to participate, engage and interact in an ongoing way, which is essential for longitudinal surveys.

The development of the Respondent Relationship Approach (RRA) drew on the collective expertise of the Stats NZ Collection Operations team, Te Ao Māori (New Zealand indigenous world view) perspectives, academic literature, and evidence from other longitudinal studies including the Dunedin Study, Growing Up in New Zealand (GUiNZ), New Zealand Attitudes and Values Study (NZAVS) and Household, Income and Labour Dynamics in Australia (HILDA).

Conceptual framework

Aotearoa New Zealand is a bicultural nation founded on Te Tiriti o Waitangi | the Treaty of Waitangi, a treaty between the indigenous Māori population and the British Crown. Within the Māori worldview, known as te ao Māori, establishing and nurturing strong relationships is immensely important to fostering trust and community.

The Respondent Relationship Approach (RRA) is centred in te ao Māori. It is specifically designed to foster a sense of collective responsibility, referred to as a shared kaupapa (purpose or topic), where individuals are more inclined to actively participate and contribute their information for the betterment of the whole community. While built around te ao Māori principles, the approach and the components within it are good practice and extend beyond the New Zealand context.

Further, honouring the Government’s commitment to improving issues of systemic inequity for Māori in Aotearoa requires deliberate survey design with Māori as a priority group. By clearly prioritising relationships with survey respondents, Stats NZ hopes to improve participation in the survey, as well as confidence and trust in the survey processes, for Māori.

6 Watson, N., & Wooden, M. (2009). Identifying factors affecting longitudinal survey response. In: Methodology of Longitudinal Surveys (ed. P. Lynn)

7 Satherley, N., Milojev, P., Greaves, L., Huang, Y., Osborne, D., Bulbulia, J., et al. (2015) Demographic and Psychological Predictors of Panel Attrition: Evidence from the New Zealand Attitudes and Values Study. PLoS ONE 10(3): e0121950. doi:10.1371/journal.pone.0121950

The Relationship Approach for Living in Aotearoa can be synthesised into the conceptual framework shown in Figure 2. It focusses on building relationships with people that are mana-enhancing8, based on trust, and reciprocal.

The approach seeks to provide a framework for decision making about the engagement methods and tools employed. The components of the framework are:

• People first

• Building trust

• Being mana-enhancing

• Showing reciprocity

• Shared kaupapa, collective responsibility

Figure 2 – Respondent Relationship Approach conceptual framework

8 Mana is a multi-faceted Māori concept that can mean dignity, authority, prestige, or power.


People first

People and relationships sit at the heart of the framework, recognising the need to work with people and communities to establish a trusting relationship first and foremost. The approach is aligned to the concept of whanaungatanga: the process of establishing links, making connections and relating to the people one meets by identifying, in culturally appropriate ways, whakapapa (line of descent) linkages, past heritages, points of engagement, or other relationships.

In the survey context, whanaungatanga requires a focus on the integrity and authenticity of the survey interviewer/survey participant relationship and interactions.

The focus on establishing a relationship first and foremost is not significantly different from current practice. However, there is increased emphasis on the development of a relationship which can support a longitudinal relationship rather than a one-off engagement, and ways of engaging that are appreciated by and culturally safe for Māori.

In practice, the ‘people first’ approach means:

• A focus on building a meaningful relationship, first and foremost

• Working with survey participants to understand and work through any barriers to participation and cater to individual needs; this could mean:

o Returning to a household on a different day

o Being flexible around times

o Being flexible about the meeting place

o Making the survey available in different modes

o Making the survey available in different languages

o Ensuring the participant knows they may have a support person with them

o Having activities to help entertain children so that caregivers can focus on the survey

o Switching survey interviewers based on experience, gender and/or cultural or community connections where a survey participant relationship is proving challenging to establish

Building trust

Trust is essential for survey participants to feel comfortable sharing their information over consecutive years. The information collected in Living in Aotearoa is sensitive and survey participants must feel confident in the survey processes and how their information is stored, protected, and used to produce outputs.

This component places increased emphasis on trust required to support a longitudinal relationship with survey participants. This will involve development of different survey collateral and may involve more survey interviewer time invested into explaining the survey processes and implications of being selected to participate. It will also require increased focus on community engagement to build awareness of household surveys and identify opportunities to improve survey experience and access for priority groups.

In practice, building trust with survey participants means:

• Building community awareness and acceptance of household surveys

• Utilising survey interviewers that are known and respected in their local community

• Clearly explaining the survey processes including any risks and benefits of participating

• Being open and honest about the legislative requirements


• Being open and honest about how data is sourced, stored and used, including the use of administrative data and of data used to maintain contact with survey participants

• Ensuring people’s information is kept secure and confidential

• Ensuring survey participants remain at the centre of all decision making around the collection approach

Being mana-enhancing

In te ao Māori, mana is a multi-faceted concept that can mean dignity, authority, prestige, or power. There are different types of mana:

• Mana a person is born with, their whakapapa

• Mana given to a person by others

• Mana of a grouping

In the survey context, upholding and enhancing the mana of participants means recognising that everything a survey participant shares with Stats NZ is tapu (sacred), because it is their personal information and belongs to them. It is a taonga (treasure) which they give to Stats NZ. By upholding participants’ mana, the mana of Stats NZ is also upheld.

In practice, being mana-enhancing means:

• Being culturally aware and responsive

• Being respectful of survey participants; listening and validating but not judging or assuming

• Considering the needs of survey participants and adapting the approach to suit: for example, allowing choice in the meeting time and place and working with them to resolve barriers to participation

• Embracing reciprocity: ensuring that survey participants feel valued as individuals, not merely as a source of data

• Using language and explanations that people can understand and feel empowered by

• Ensuring survey participants remain at the centre of all decision making around the collection approach.

Showing reciprocity

Showing reciprocity refers to a social norm that involves in-kind exchanges between people. It recognises that relationships are strengthened when contributions are acknowledged and returned in-kind. In practice, showing reciprocity means:

• Practicing whanaungatanga

• Saying thank you and expressing gratitude in all exchanges

• Affirming the feelings and experiences of survey participants

• Sharing how the survey data is being used for good

• Offering genuine koha (donation or gift)

Reciprocity must be woven throughout the RRA, during every point of contact with survey participants. We acknowledge that each action builds on the action before it and contributes to the development and maintenance of the relationship.

The exchange of koha is one way that Stats NZ can give meaningful expression to the reciprocal relationship with survey participants. Koha is a rich concept in te ao Māori. It acts to initiate and then maintain the balance of the relationship. It is not given with the intent of changing a person’s mind or incentivising participation. By offering koha, Stats NZ can enhance the reciprocal nature of the relationship with the survey participant and uphold their mana in the exchange.

Shared kaupapa, collective responsibility

Survey participants will be motivated to participate by different things depending on their personal circumstances, life experiences, history, and beliefs. Survey interviewers need to be skilled in telling a compelling story that can draw people together and create a sense that participation in the survey is contributing to the betterment of New Zealand.

Strategies to develop a shared kaupapa and sense of collective responsibility can be woven through each stage of the approach. This can be further enhanced through community engagement, for example, partnering with community organisations to improve survey experience and access for priority groups e.g. Māori, Pacific Peoples, migrant communities and people with disabilities.

This component of the approach is not significantly different from current practice; however, there is increased emphasis on the need to build a strong sense of community to retain people for the duration of the longitudinal survey. Development of survey collateral with consistent branding and a robust, localised community engagement approach are required to support the work of survey interview teams.

Implementation – opportunities and challenges

In implementing the Respondent Relationship Approach (RRA) within the Living in Aotearoa survey, several opportunities and challenges have been identified.

Opportunities

One key opportunity lies in redesigning the survey collateral to enhance its visual appeal and content. This includes providing versions in te reo Māori (the Māori language) to ensure the survey is inclusive and culturally relevant. By incorporating te ao Māori perspectives, the survey materials can resonate with participants and foster a stronger connection.

To support the RRA, increasing Te Ao Māori and cultural awareness among frontline staff through training opportunities is essential. Implementing this approach initiates quality professional development of our workforce, equipping staff with a deep and authentic understanding of Māori culture.

Another opportunity lies in sharing “data value stories” with participants. Highlighting how national and community organisations like KidsCan, Variety, and Plunket utilise the survey data to develop impactful programmes can demonstrate the real-world significance of participant contributions. This approach helps participants recognise the value of their involvement and the positive change it can bring to their communities.

Furthermore, taking the time to explain the purpose and importance of the survey, as well as addressing any concerns, strengthens participants' commitment, motivation, and trust in StatsNZ surveys and processes as a whole. It generates increased community engagement at a time when cohesiveness is being challenged worldwide.


Challenges

Implementing effective “staying in contact” approaches that align with the relationship approach framework presents a notable challenge. It has been acknowledged already that the RRA signals a move away from the “persistence approach”, which relies on communicating the obligation to participate under the law. Balancing regular and clear communication while respecting participants' privacy and boundaries requires careful consideration. Developing strategies that maintain engagement without overwhelming participants is crucial for long-term retention.

Supporting frontline staff in adjusting their approach to align with the nuanced differences of the relationship approach can be demanding. Providing guidance, training, and resources to help staff navigate these subtleties is essential for successful implementation.

Lastly, understanding the impact of the survey's approach versus external environmental factors on response rates can be complex. Differentiating between the effects of the relationship approach and other influences requires thorough analysis and evaluation.

Addressing these challenges and capitalising on the identified opportunities will contribute to the successful implementation of the RRA within the Living in Aotearoa survey. By prioritising cultural relevance, community engagement, and personalized interactions, Stats NZ aims to enhance participant retention and ensure the long-term success of the survey.

Evaluating the Respondent Relationship Approach

Guiding principles

The following guiding principles inform the evaluation approach, topline evaluation questions, and the granular key performance indicators.

A two-phase evaluation process

There is no baseline data to use to differentiate the impact of the RRA from that of the standard data collection process. Therefore, monitoring and evaluation is structured into two phases:


An annual evaluation cadence

After baseline data creation, evaluation is to be undertaken at the end of the collection period each calendar year. This will allow reflections to be captured in a timely way, maximising available time to implement identified necessary changes before the next collection.

Three components for evaluation

Acknowledging the complexity of the collection approach and the longitudinal nature of the survey, evaluation is focussed into three categories:

Factoring in broader experience

It is recognised that Living in Aotearoa is operating in a changing macro socio-economic environment and that this and other surveys are experiencing specific challenges. For this reason, a collaborative learning approach has been taken: collection monitoring metrics are gathered on an ongoing basis on the performance of all surveys, and the evaluation draws on work under way in New Zealand and internationally to better understand falling survey response rates and how to address them.

Align with strategic priorities

To ensure that the RRA appropriately delivers on its objective, delivery and evaluation are aligned with Living in Aotearoa’s committed benefits; in particular, “Improved confidence that survey processes and outputs deliver for Māori and for all” and Stats NZ’s overarching strategic priorities, including alignment of its approach with te ao Māori and its intention to pursue an integrated roadmap approach across all bodies of work.

Evaluation questions

Three high-level evaluation questions are given below, along with examples of “measurable” components and potential sources of data to address these.

Process Evaluation Question: How well is the implementation and delivery of the Respondent Relationship Approach working?

This question may be answered by looking at the organisation’s internal systems readiness to support the implementation of the RRA, and the extent to which survey interviewers are supported in their application of the RRA. Available measures or sources of data include recruitment monitoring data, survey interviewer retention, survey interviewer confidence and feedback, training design, training participation and feedback, and development of supporting collateral materials.

Outcome Evaluation Question: How well is the Respondent Relationship Approach contributing to survey participants being willing and able to participate in the survey for its full six-year duration?

This question seeks to understand both the extent to which the RRA is fostering a positive and enduring relationship with respondents and the more general success of Living in Aotearoa in delivering quality data on child poverty. Measures to assess the direct effectiveness of the RRA include the practical processes developed to stay in contact with respondents, the percentage of respondents proactively staying in contact with Stats NZ, respondent feedback on the collection approach, and analysis of refusal comments or complaints. Measures to assess the more general success of Living in Aotearoa include the stability and retention rate of samples, data quality, and demographic gap analysis.
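As a minimal sketch of how a sample retention measure of this kind might be computed, the Python below derives wave-on-wave and cumulative retention rates from counts of wave 1 respondents still responding at each later wave. The function name `retention_rates` and the counts are hypothetical; the paper does not specify how Stats NZ calculates its retention measures.

```python
def retention_rates(respondents_by_wave):
    """Return (wave_on_wave, cumulative) retention rates from responding counts.

    respondents_by_wave[0] is the wave 1 responding sample; later entries count
    the wave 1 respondents still responding at each subsequent wave.
    """
    if not respondents_by_wave or respondents_by_wave[0] <= 0:
        raise ValueError("wave 1 count must be positive")
    # Share of the previous wave's respondents who responded again.
    wave_on_wave = [
        later / earlier
        for earlier, later in zip(respondents_by_wave, respondents_by_wave[1:])
    ]
    # Share of the original wave 1 respondents still responding.
    cumulative = [count / respondents_by_wave[0] for count in respondents_by_wave[1:]]
    return wave_on_wave, cumulative

counts = [1000, 900, 855]  # hypothetical counts for waves 1 to 3
wave_on_wave, cumulative = retention_rates(counts)
print(wave_on_wave, cumulative)
```

Computing the same rates separately for each priority group would give one simple input to a demographic gap analysis, since groups with lower cumulative retention are those at risk of under-representation.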

Synthesis

Process and outcome question insights are combined to inform lessons learned and next steps for Living in Aotearoa. In the final year of the evaluation process, the following question is applied.

Synthesis Evaluation Question: What has been learned from the utilisation of the Respondent Relationship Approach?

This question seeks to identify specific lessons for the design and delivery of future surveys and the critical factors to the success of quality data collection in a dynamic environment. These are informed by the process and outcome evaluation activities.

Conclusion

This paper highlights the unique challenge of participant retention in longitudinal surveys and presents the approach employed by Stats NZ in the Living in Aotearoa survey to address this issue.

Recognising the cultural context of Aotearoa as a bicultural nation, the RRA draws upon te ao Māori principles of whanaungatanga and building trust and confidence through investing in relationships. Creating a shared kaupapa fosters a sense of collective responsibility, motivating individuals to actively participate and contribute their information for the greater good.

By engaging and building relationships with survey participants, Living in Aotearoa strives to enhance the quality and longevity of longitudinal data, ultimately contributing to a better understanding of child poverty and supporting evidence-based policy decisions in Aotearoa, New Zealand.

Forest Product Conversion Factors

Forest products conversion factors provides ratios of raw material input to the output of wood-based forest products for 37 countries of the world. Analysts, policymakers, forest practitioners and forest-based manufacturers often have a need for this information for understanding the drivers of efficiency, feasibility and economics of the sector.
