Behavioural Economics Symposium 2016: Data Science and Behavioural Economics
Big Data and advances in data science have enabled policymakers to gain deeper insights into patterns of how people think, choose and act. However, the value of Big Data lies not in the data itself but in how it is collected, organised and analysed to generate useful insights, which in turn facilitate the design and implementation of more tailored interventions and policies.
In August 2016, the Civil Service College organised a Behavioural Economics Symposium that brought together Gary King, Sumit Agarwal, and Daniel Lim to discuss how data science complements behavioural economics in designing better policies.
Click below to read summaries of the presentations:
• Big Data is not about the Data! The Power of Modern Analytics — Gary King
• Big Data enables key insights to be drawn, in order to shape and plan policies in an impactful manner. For instance, data analytics can improve forecasts for better planning and allocation of public funds (see Box
Box 1: How analytics improves forecasting for the US Social Security Administration.
Customised analytics could be used for improved forecasting, as seen in the example of the US Social Security Administration (SSA).
The SSA uses forecasts of the funds needed in the Social Security Trust Funds for retirees when planning its budget. However, no evaluation of SSA forecasts had been done for the past 85 years. It was found that the forecasting errors were unbiased until the year 2000 but were systematically biased thereafter. This made the Social Security Trust Funds look healthier than they actually were. One key oversight was that retirees were living longer than expected. There were thus insufficient trust funds to support them.
Customised analytics predicted that the trust funds would need US$800 billion more than the SSA had initially forecast.
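The kind of forecast evaluation described in Box 1 can be illustrated with a minimal sketch. It assumes a hypothetical list of (year, forecast, actual) records and compares the mean forecast error up to and after a cutoff year. The figures and the function name are invented for illustration and are not SSA data or the method used in the presentation.

```python
from statistics import mean


def mean_error_by_period(records, cutoff_year):
    """Return (mean error up to the cutoff, mean error after the cutoff).

    Error is forecast minus actual, so a mean far from zero suggests
    that forecasts in that period were systematically biased.
    """
    before = [f - a for year, f, a in records if year <= cutoff_year]
    after = [f - a for year, f, a in records if year > cutoff_year]
    return mean(before), mean(after)


# Hypothetical (year, forecast, actual) values, invented for illustration.
records = [
    (1998, 100.0, 101.0),
    (1999, 102.0, 101.0),
    (2000, 104.0, 104.0),
    (2001, 105.0, 108.0),
    (2002, 107.0, 110.0),
    (2003, 109.0, 113.0),
]

bias_before, bias_after = mean_error_by_period(records, cutoff_year=2000)
print(f"mean error up to 2000: {bias_before:+.2f}")
print(f"mean error after 2000: {bias_after:+.2f}")
```

In this toy data the errors cancel out up to the cutoff and are consistently negative afterwards, which is the pattern of "unbiased, then systematically biased" that the box describes.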
Box 2: Identifying a common solution for two different problems.
While these two examples seemed different, they faced the same challenge of accurate classification (i.e., classifying deaths and social media posts into the correct categories). They thus shared the same solution, which involved estimating the percentage of individuals dying of a particular cause, and the percentage of social media posts indicating unemployment. A new customised analytics method based on this approach was then developed to improve classification in both examples.
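The summary does not describe how the percentages in Box 2 were estimated. As one simple illustration of estimating a category's share directly, rather than trusting each individual classification, the sketch below corrects a classifier's raw positive rate using its error rates (an approach sometimes called an adjusted count). The function name and all rates are hypothetical, and this is not necessarily the method presented at the symposium.

```python
def adjusted_proportion(observed_rate, true_positive_rate, false_positive_rate):
    """Estimate the true share of a category from a noisy classifier.

    The observed rate of positive labels satisfies
        observed = tpr * p + fpr * (1 - p),
    so p = (observed - fpr) / (tpr - fpr), clipped to the range [0, 1].
    """
    denominator = true_positive_rate - false_positive_rate
    if denominator <= 0:
        raise ValueError("the classifier must beat chance (tpr > fpr)")
    estimate = (observed_rate - false_positive_rate) / denominator
    return min(1.0, max(0.0, estimate))


# Hypothetical figures: 38% of posts are flagged as indicating unemployment
# by a classifier that catches 80% of true cases and wrongly flags 10% of
# the rest.
share = adjusted_proportion(0.38, true_positive_rate=0.8, false_positive_rate=0.1)
print(round(share, 3))
```

With these assumed error rates, a raw flagged rate of 38 per cent corresponds to an estimated true share of about 40 per cent, since 0.8 × 0.4 + 0.1 × 0.6 = 0.38.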
Box 3: Limited capability of humans to recall keywords.
In an experiment, 43 Harvard undergraduates were instructed to recall keywords through the following prompt: There are 10,000 Twitter posts, each containing the word “healthcare”, in the time period surrounding the Supreme Court decision on Obamacare. List any keywords that come to mind that will select posts related to Obamacare and will not select posts unrelated to Obamacare.
Some examples of keywords selected included “unconstitutional”, “coverage”, and “Obama”. The median number of keywords selected by the respondents was 8. Collectively, 149 unique keywords were recalled. However, 66 per cent of these 149 keywords were each selected by only one respondent. This illustrates that human users perform poorly at recalling keywords when faced with a sea of information and data.
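The summary statistics in Box 3 (median keywords per respondent, number of unique keywords, and the share named by only one respondent) can be computed as in the following sketch. The three response lists are invented for illustration and are not the experiment's data; the function name is likewise hypothetical.

```python
from collections import Counter
from statistics import median


def keyword_recall_summary(responses):
    """Summarise keyword lists supplied by several respondents."""
    # For each keyword, count how many respondents listed it at least once.
    counts = Counter(kw for keywords in responses for kw in set(keywords))
    named_once = sum(1 for n in counts.values() if n == 1)
    return {
        "median_per_respondent": median(len(set(k)) for k in responses),
        "unique_keywords": len(counts),
        "share_named_once": named_once / len(counts),
    }


# Hypothetical responses, one list of keywords per respondent.
responses = [
    ["obama", "coverage", "unconstitutional"],
    ["obama", "mandate"],
    ["coverage", "insurance", "ruling"],
]

summary = keyword_recall_summary(responses)
print(summary)
```

A high value of `share_named_once` means respondents rarely converge on the same keywords, which is the point the experiment makes.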
RESEARCH IN THE SINGAPORE CONTEXT
• Agarwal shared research that drew on multiple data sets in the Singapore context, including research on social peer effects (see Box 4) and ways to ease traffic congestion (see Box
Box 4: Social peer effect of bankruptcies.
Data sets of all bankruptcy cases, credit and debit card transactions in Singapore were obtained to study how the spending behaviour of individuals would be affected when another person, residing in the same building, was declared bankrupt — given that this could impact how often and where they met.
After a bankruptcy case, the average spending of residents in the same building fell by 4 per cent per month. This effect was likely to be more significant for those living on the same floor of the building or those with similar demographics (e.g., age, gender). This shows how the behaviour of individuals can have implications for the community and the macroeconomy, and provides insights for policies that can mitigate the effect.
Box 5: Easing congestion on public transport with more covered walkways.
Data was collected from EZ-link cards to gather information about commuters’ mode of transport and their embarkation and disembarkation stations.
It was found that 10 per cent of all commuter rides were single-stop rides (i.e., rides in which the commuter boards at one station and alights at the next). The average distance between two stations is approximately 200 metres. If these commuters were to walk instead of taking public transport, congestion during peak hours could be eased.
Given that people are more likely to walk when there are covered walkways, providing more such walkways could reduce peak-hour congestion by encouraging more commuters to walk between bus stops or train stations.
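The share of single-stop rides in Box 5 is a simple proportion over boarding and alighting records. The sketch below assumes each ride is recorded as a pair of stop positions along one route; this record layout, the sample rides and the function name are assumptions made for illustration and do not reflect the actual EZ-link data format.

```python
def single_stop_share(rides):
    """Return the fraction of rides that span exactly one stop.

    Each ride is a (board_index, alight_index) pair giving stop
    positions along a single route (an assumed record layout).
    """
    if not rides:
        return 0.0
    single = sum(1 for board, alight in rides if abs(alight - board) == 1)
    return single / len(rides)


# Hypothetical rides, invented for illustration.
rides = [
    (0, 1), (2, 7), (3, 4), (1, 5), (4, 9),
    (6, 5), (0, 8), (2, 3), (5, 9), (1, 6),
]

share = single_stop_share(rides)
print(f"{share:.0%} of rides are single-stop")
```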
DATA SCIENCE CAN BE APPLIED TO SOLVE PUBLIC SECTOR PROBLEMS
• Data science can be effectively applied across different domains. Lim shared examples from the National Library Board (see Box 6); the Housing Development Board (see Box 7); and SingHealth (see Box
Box 6: National Library Board (NLB) — Using data science to understand the profiles of library borrowers.
Clustering algorithms not only reveal natural groupings in data but can also uncover new and previously unknown groupings. Running a clustering algorithm on 20 million library records allowed NLB to better understand its customer profiles.
The following customer segments emerged: (1) younger adults with general interests; (2) young families; (3) children and early teens from the West of Singapore; (4) middle-aged casual fiction readers from the North of Singapore; (5) retiree hobbyists borrowing from downtown; (6) grandparents and grandchildren from mature estates; and (7) working, travelling businessmen and techies.
Clusters (5) and (6) were previously unknown to NLB, which had originally categorised these individuals together as one homogeneous group. Such clustering techniques could be used to enhance NLB’s annual planning and help it to understand its customer base better.
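The summary does not say which clustering algorithm NLB used. As an illustration only, the sketch below implements k-means (Lloyd's algorithm) in plain Python on a hypothetical table of borrowers described by age and loans per year. The features, the data and the function name are all assumptions.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (Lloyd's algorithm) on tuples of numbers."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids


# Hypothetical borrowers as (age, loans per year), invented for illustration.
borrowers = [(8, 40), (9, 35), (10, 42), (67, 12), (70, 10), (72, 15)]

labels, centres = kmeans(borrowers, k=2)
print(labels)
print(centres)
```

On this toy data the children and the retirees end up in separate clusters. With real records, features on different scales would usually be standardised first, and the number of clusters would need to be chosen by inspection or a validation measure.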
Box 7: Housing Development Board (HDB) — Using textual analysis to understand public feedback.
Previously, HDB officers manually assigned emails received from the public to pre-defined categories. However, due to the complex nature of these emails, many were inevitably placed in a generic category called “Others”. Moreover, pre-defined categories might not capture emerging issues because taxonomies change over time. This made analytics challenging.
Topic modelling (a type of unsupervised machine learning) can discover topic clusters in large amounts of unstructured textual data. In HDB’s case, over 90,000 emails were analysed, resulting in the discovery of a topic cluster related to key collection timings. This analysis confirmed the suspicion that key collection timing was an issue, which helped convince HDB’s senior management to improve the relevant process. This illustrates how textual analysis can support evidence-based decision making.
Lim noted that instead of understanding key public concerns through the use of surveys, the government could achieve the same goal in a quicker manner through analysing emails or feedback received from the public. This would also enable the government to detect problems and solve them in a timely manner.
Box 8: Healthcare — Using predictive analytics to identify potential frequent admitters to a hospital.
SingHealth was interested in predicting the number of frequent admitters (i.e., those admitted more than three times a year) to a hospital. Typically, a “patient navigator” interviews repeat patients and decides if this patient is “high-risk” based on his/her living conditions. A high-risk patient would then be assigned to a social medical worker to help prevent a readmission.
A machine learning algorithm that used data on patients’ demographics and medical histories could predict, with up to 80 per cent accuracy, which second-time admitted patients were at risk of becoming frequent admitters. Lim emphasised that it was also important to see how this data could be integrated into hospital processes to further reduce readmission cases.
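A figure such as the 80 per cent in Box 8 can be read as the share of patients whose predicted risk label matched what later happened. The sketch below shows how such a score, together with the share of actual frequent admitters that were caught, might be computed. The labels are invented for illustration, and the summary does not state how the figure was measured.

```python
def accuracy(predicted, actual):
    """Share of cases where the predicted label equals the actual outcome."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(1 for p, a in zip(predicted, actual) if p == a) / len(actual)


def recall(predicted, actual):
    """Share of actual frequent admitters (label 1) that were flagged."""
    flagged = [p for p, a in zip(predicted, actual) if a == 1]
    if not flagged:
        raise ValueError("no actual positives to evaluate")
    return sum(flagged) / len(flagged)


# Hypothetical labels: 1 = became a frequent admitter, 0 = did not.
predicted = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
actual = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

acc = accuracy(predicted, actual)
rec = recall(predicted, actual)
print(f"accuracy: {acc:.0%}, recall: {rec:.0%}")
```

Reporting recall alongside accuracy is a design choice worth considering here, because a missed high-risk patient and a false alarm carry different costs for the hospital.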
THE IMPORTANCE OF GROWING PUBLIC SECTOR DATA SCIENCE CAPABILITIES
ABOUT THE SPEAKERS
Gary King is the Albert J. Weatherhead III University Professor at Harvard University. He is based in the Department of Government (in the Faculty of Arts and Sciences) and serves as Director of the Institute for Quantitative Social Science, where he develops and applies empirical methods in many areas of social science research, focusing on innovations that span the range from statistical theory to practical application. He is an elected Fellow in 8 honorary societies and has won more than 40 awards for his work. Professor King is also President of the Society for Political Methodology and Vice President of the American Political Science Association.
Sumit Agarwal is a Visiting Professor in the Department of Finance, National University of Singapore (NUS). He was the Vice-Dean (PhD and Research) and Low Tuck Kwong Professor at the School of Business, NUS. Previously, he was a senior financial economist at the Federal Reserve Bank of Chicago and a senior vice president and credit risk management executive in the Small Business Risk Solutions Group of Bank of America. His research interests include issues relating to financial institutions, household finance, behavioral finance, international finance, real estate markets, urban economics and capital markets.
Daniel Lim is a Consultant in the Infocomm Development Authority's (IDA) Data Science Division, where he is Team Lead for the multidisciplinary Quantitative Strategy team. He was formerly a member of Harvard’s Behavioural Insights Group and has collaborated with government agencies such as the Ministry of Home Affairs and the Land Transport Authority to design and implement randomised controlled trials. He is also Special Assistant to IDA’s Managing Director and a member of the Civil Service College’s Economics Experts Group and the 2035 National Scenarios Team.