ETC Conference Papers 2022

A simple model versus a complex model; what do we gain?

Seminar
Day 1 (7 Sep 2022), Session 1, NATIONAL MODELS, 11:00 - 13:00

Status
Accepted, documents submitted

Submitted by / Abstract owner
Barry Zondag

Authors
Barry Zondag, Significance
Frank Hofman, Rijkswaterstaat, Dutch Ministry of Infrastructure and Water Management
Jasper Willigers, Significance

Short abstract
The Dutch strategic transport model is a large-scale, complex model. It is often questioned whether a simple model would not do the ‘job’ just as well. This paper explores what is gained from each aspect that is added in the search for an optimal model specification.

Abstract
The Netherlands has a long history of strategic passenger transport modelling. Over the last decades the national and regional models (Landelijk Modelsysteem, LMS, and Netherlands Regional Model, NRM) have continuously been improved based on user experience, data availability, methodological developments and policy demands from the clients. The current version of the demand model can be characterized as a large-scale disaggregated choice model. The model has a detailed representation of population segments, included as explanatory variables to address differences in behaviour between these segments, and its main module integrates destination, mode, time-of-day, access and egress mode, and station choice in an estimated nested logit structure. The model distinguishes 9 transport modes, 5 home-based purposes, 2 work-based purposes and 3 child purposes. For each of the purposes a separate sub-model has been estimated.

Over the years the complexity of the LMS/NRM models has increased, as new specification research has built upon existing model specifications. Policy makers and practitioners often question whether all this complexity is needed, as it makes the model outcomes more difficult to explain and interpret. In addition, the model uses numerous dummy variables to account for heterogeneity in behaviour. This approach might lead to “overfitting”. This study aims to explore whether the practice of building upon existing model specifications has resulted in a logical outcome, or whether the specification would look very different if we started from scratch. Furthermore, by following a stepwise approach, starting with a very simple model and adding the additional aspects of the current model step by step, we can analyse what is gained by adding each of these aspects.

In this research we have identified five steps between a basic model and the current version of the model specification, as follows:
1. Time: simple model with a multinomial logit (MNL) structure and explanatory variables for travel times by mode, alternative-specific constants by mode and size variables by purpose;
2. Cost: adding travel costs by mode, including existing travel cost reduction factors and reimbursements, and parking costs. This step also includes estimated cost coefficients by income class;
3. Spatial variables: adding spatial variables such as intrazonal constants, train variables for short distances, urbanization factors at both origin and destination zones by mode, and the match between the education level of employees and jobs for commuters;
4. Segment variables: adding segment variables, mainly dummy variables, to address heterogeneity in mode and destination choices. Variables tested and included are, among others, age, gender, type of participation, education level, car availability in the household and ownership of a student card for public transport;
5. Structure of model: in this last step a nested logit model structure, instead of the MNL structure of steps 1 to 4, is freely estimated and tested for significance.
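
The difference between the MNL structure of steps 1 to 4 and the nested logit structure of step 5 can be illustrated with a minimal sketch. It uses the textbook two-level nested logit form; the utilities, the nest layout and the nesting coefficients (theta) below are hypothetical values chosen for illustration and are not taken from the LMS/NRM estimation.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit: P_i = exp(V_i) / sum_j exp(V_j)."""
    m = max(utilities.values())  # subtract the max for numerical stability
    exp_v = {alt: math.exp(v - m) for alt, v in utilities.items()}
    total = sum(exp_v.values())
    return {alt: e / total for alt, e in exp_v.items()}

def nested_logit_probabilities(utilities, nests, theta):
    """Two-level nested logit.

    nests maps a nest name to its alternatives; theta maps a nest name to
    its nesting (logsum) coefficient, assumed to lie in (0, 1].
    """
    within = {}   # P(alternative | nest)
    logsums = {}  # log of the sum of exp(V / theta) within each nest
    for nest, alts in nests.items():
        scaled = {a: utilities[a] / theta[nest] for a in alts}
        m = max(scaled.values())
        exp_s = {a: math.exp(s - m) for a, s in scaled.items()}
        total = sum(exp_s.values())
        within[nest] = {a: e / total for a, e in exp_s.items()}
        logsums[nest] = m + math.log(total)
    # The upper level is an MNL over nests with utility theta * logsum.
    nest_prob = mnl_probabilities({n: theta[n] * logsums[n] for n in nests})
    return {a: nest_prob[n] * within[n][a] for n in nests for a in nests[n]}

# Hypothetical systematic utilities for three modes.
V = {"car": -1.0, "bus": -1.5, "train": -1.4}
NESTS = {"private": ["car"], "public": ["bus", "train"]}

p_mnl = mnl_probabilities(V)
p_nl = nested_logit_probabilities(V, NESTS, {"private": 1.0, "public": 0.5})
p_nl_flat = nested_logit_probabilities(V, NESTS, {"private": 1.0, "public": 1.0})
```

With every nesting coefficient equal to 1 the nested model collapses to the MNL, which is why a freely estimated coefficient can be tested against 1 for significance.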

The stepwise testing of the model specification has been carried out for the commuting and shopping purposes. Both purpose models have been estimated for each step in the ALOGIT estimation software and applied in so-called apply-models, in which the models are run on and compared with the estimation data (weighted and expanded). The steps are compared on the ‘standard’ evaluation criteria, as usually applied, including model fit, average travel distances and trip length distribution by mode, time and cost elasticities, and spatial aspects of particular interest such as traffic flows to the 4 main cities by mode.
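
As an illustration of how model fit can be compared between steps, the sketch below computes the log-likelihood and McFadden's rho-squared relative to an equal-shares model. The abstract does not state which fit statistic was reported, so the choice of rho-squared is an assumption, and the probabilities are hypothetical numbers rather than results from the study.

```python
import math

def log_likelihood(chosen_probs):
    """Sum of the log of the predicted probability of each observed choice."""
    return sum(math.log(p) for p in chosen_probs)

def rho_squared(ll_model, ll_null):
    """McFadden's rho-squared: 1 - LL(model) / LL(null)."""
    return 1.0 - ll_model / ll_null

# Hypothetical predicted probabilities of the chosen alternative for four
# observations, each with three available alternatives.
N_ALTS = 3
probs_simple = [0.40, 0.35, 0.50, 0.30]  # e.g. a time-only specification
probs_richer = [0.55, 0.45, 0.60, 0.50]  # e.g. a specification with more terms

# Null model: every alternative equally likely.
ll_null = len(probs_simple) * math.log(1.0 / N_ALTS)

rho_simple = rho_squared(log_likelihood(probs_simple), ll_null)
rho_richer = rho_squared(log_likelihood(probs_richer), ll_null)
```

A higher rho-squared means the model assigns more probability to the choices actually observed, which is the individual-level sense of "fit" used when comparing the steps.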

Each of the steps results in a very substantial increase in model fit, which means that the more advanced models give a better estimate of the travel choices made at the individual level. Compared with more aggregate reference values, step 1 already gives a reasonable fit for average travel distances and trip length distribution (except for train). The inclusion of step 2 (cost) and step 3 (spatial variables) is important to improve the match with the observed train trip length distribution and the traffic flows by mode to the 4 main urban areas. Without steps 2 and 3, both reference values are poorly simulated by the step 1 (travel time) model. Adding the segment variables (step 4) and the nesting structure (step 5) is critical to improve the time and cost elasticities by mode. Without these steps, the elasticities for steps 1 to 3 are mainly outside the ‘acceptable’ range drawn from the literature. In the simple models the time elasticity tends to become very high, in combination with very low cost elasticities, especially for public transport.
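
To make the notion of a time elasticity concrete, the sketch below uses the textbook direct point elasticity of an MNL model, E = beta * x * (1 - P), and checks it against a numerical derivative. This is an individual-level illustration only: the elasticities in the study come from the apply-models, and the coefficient, constants and travel times below are hypothetical.

```python
import math

def mnl_prob(utilities, alt):
    """MNL probability of one alternative given all systematic utilities."""
    m = max(utilities.values())
    denom = sum(math.exp(v - m) for v in utilities.values())
    return math.exp(utilities[alt] - m) / denom

def direct_elasticity(beta, x, p):
    """Direct point elasticity of an MNL probability: beta * x * (1 - P)."""
    return beta * x * (1.0 - p)

# Hypothetical values, not estimated coefficients.
BETA_TIME = -0.03                     # utility per minute of travel time
ASC = {"car": 0.0, "train": -0.5}     # alternative-specific constants
TIMES = {"car": 30.0, "train": 45.0}  # travel times in minutes

def train_share(train_time):
    """Train probability as a function of train travel time."""
    v = {
        "car": ASC["car"] + BETA_TIME * TIMES["car"],
        "train": ASC["train"] + BETA_TIME * train_time,
    }
    return mnl_prob(v, "train")

p_train = train_share(TIMES["train"])
elasticity = direct_elasticity(BETA_TIME, TIMES["train"], p_train)

# Central finite difference as a check on the analytic formula.
h = 1e-4
slope = (train_share(TIMES["train"] + h) - train_share(TIMES["train"] - h)) / (2 * h)
elasticity_numeric = slope * TIMES["train"] / p_train
```

The formula shows why the estimated coefficient matters so much: for a given travel time and market share, the elasticity scales directly with beta, so a specification that produces a large time coefficient and a small cost coefficient also produces high time and low cost elasticities.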

Programme committee
Transport Models