Behavioural Economics Symposium 2016: Data Science and Behavioural Economics



Big Data and advances in data science have enabled policymakers to gain deeper insights into how people think, choose and act. However, the value of Big Data lies not in the data itself but in how it is collected, organised and analysed to generate useful insights, which in turn facilitate the design and implementation of more tailored interventions and policies.

In August 2016, the Civil Service College organised a Behavioural Economics Symposium that brought together Gary King, Sumit Agarwal, and Daniel Lim to discuss how data science complements behavioural economics in designing better policies.

Summaries of the three presentations follow:

• Big Data is not about the Data! The Power of Modern Analytics — Gary King
• Big Data Analytics: Why and How of Real Time Actionable Insights — Sumit Agarwal
• Data Science for the Public Sector — Daniel Lim

 



LECTURE SUMMARY



Big Data is not about the Data!
The Power of Modern Analytics

By Prof Gary King


 
Big Data enables key insights to be drawn that can help shape and plan policies more effectively. For instance, data analytics can improve forecasts, leading to better planning and allocation of public funds (see Box 1).


 
 

Box 1: How analytics improves forecasting for the US Social Security Administration.

Customised analytics could be used for improved forecasting, as seen in the example of the US Social Security Administration (SSA).

The SSA uses forecasts of the funds needed in the Social Security Trust Funds for retirees when planning its budget. However, no evaluation of SSA forecasts had been done for the past 85 years. It was found that the forecasting errors were unbiased until the year 2000 but systematically biased thereafter, which made the Social Security Trust Funds look healthier than they actually were. One key oversight was that retirees were living longer than expected, so the trust funds would be insufficient to support them.

With the use of customised analytics, it was predicted that the trust funds would need US$800 billion more than the SSA had initially forecasted.
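To make "unbiased until 2000, systematically biased thereafter" concrete: a forecast error can be taken as actual minus forecast, and bias shows up as a mean signed error that stays away from zero. The sketch below uses made-up life-expectancy figures (not SSA data), and the function names are hypothetical.

```python
from statistics import mean

def forecast_errors(actual, forecast):
    """Signed error (actual minus forecast) for each year present in both series."""
    return {year: actual[year] - forecast[year] for year in forecast if year in actual}

def mean_bias(errors, start, end):
    """Mean signed error for years in [start, end]; near zero suggests no systematic bias."""
    return mean(e for year, e in errors.items() if start <= year <= end)

# Hypothetical life expectancy at birth (years), for illustration only.
actual   = {1996: 76.1, 1998: 76.7, 2000: 76.8, 2002: 77.0, 2004: 77.5, 2006: 77.8}
forecast = {1996: 76.2, 1998: 76.6, 2000: 76.8, 2002: 76.6, 2004: 76.9, 2006: 77.0}

errors = forecast_errors(actual, forecast)
print(mean_bias(errors, 1996, 2000))  # errors cancel out: roughly unbiased
print(mean_bias(errors, 2001, 2006))  # consistently positive: people outlive the forecast
```

In this toy series a persistently positive mean error after 2000 means the forecasts understated longevity. That is the same direction of error that makes a pension fund look healthier than it is.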

 

 
Human capital remains key to how well Big Data is used. Data analytics requires little cost or infrastructure, and data collection has been made easier by the growing presence of the Internet, increased data sharing and advances in statistical methods. However, analytics still relies heavily on human capital to provide reliable insights and to identify similar patterns across problems. To illustrate this, Prof King shared how data patterns alone could be misleading and how researchers identified a common analytics approach to address two seemingly different problems (see Box 2).


 
 

Box 2: Identifying a common solution for two different problems.

1. In countries where medical death certificates were not issued, physicians would visit the caregivers to collect data on symptoms in order to classify the cause of death. Such “verbal autopsies” were inaccurate; even with the same list of symptoms, different physicians would arrive at different causes of death.

2. In a totally different domain, social media posts were analysed to forecast the US unemployment rate, using sentiment analysis based on word counts. The frequency of keywords related to unemployment, such as “jobs”, “classifieds” and “unemployment”, was counted. However, the classification of social media posts became unexpectedly inaccurate when Steve Jobs died. There was a sudden spike in the number of tweets containing the word “jobs”, but these tweets were not related to unemployment.
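The word-count approach in the second example, and the way it fails, can be shown in a few lines. This is a minimal sketch: the keyword set comes from the example above, but the posts and the `keyword_share` helper are hypothetical.

```python
import re

UNEMPLOYMENT_KEYWORDS = {"jobs", "classifieds", "unemployment"}

def keyword_share(posts, keywords=UNEMPLOYMENT_KEYWORDS):
    """Fraction of posts containing at least one keyword (case-insensitive word match)."""
    hits = 0
    for post in posts:
        words = set(re.findall(r"[a-z]+", post.lower()))
        if words & keywords:
            hits += 1
    return hits / len(posts)

normal_day = [
    "Looking for jobs in Ohio",
    "Great weather today",
    "Filed for unemployment again",
    "New phone arrived",
]
# Same posts plus tributes to Steve Jobs: they match "jobs" but are not about work.
event_day = normal_day + ["RIP Steve Jobs", "Steve Jobs changed everything"]

print(keyword_share(normal_day))  # 0.5
print(keyword_share(event_day))   # rises even though no one lost a job
```

A pure keyword counter cannot tell the two senses of "jobs" apart, so the share jumps for reasons unrelated to unemployment.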

While these two examples seemed different, they faced the same challenge of accurate classification (i.e., classifying deaths and social media posts into the correct categories). They therefore shared the same solution: rather than relying on the classification of each individual case, estimate directly the percentage of individuals dying of a particular cause and the percentage of social media posts indicating unemployment. A new customised analytics method based on this approach was then developed to improve results in both examples.
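Estimating a percentage directly can work even when individual classifications are unreliable. The sketch below uses a simpler, different technique from King's own method: "adjusted classify and count". It corrects the raw share of positive classifications using the classifier's true- and false-positive rates, which are measured on a small hand-labelled validation set. The observed share is q = tpr·p + fpr·(1 − p), so p = (q − fpr)/(tpr − fpr). All data below is hypothetical. The correction also assumes the error rates are the same in the new batch as in the validation set.

```python
def rates_from_validation(labels, predictions):
    """True- and false-positive rates of a classifier on a hand-labelled validation set."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, yhat in pairs if y and yhat)
    fn = sum(1 for y, yhat in pairs if y and not yhat)
    fp = sum(1 for y, yhat in pairs if not y and yhat)
    tn = sum(1 for y, yhat in pairs if not y and not yhat)
    return tp / (tp + fn), fp / (fp + tn)

def adjusted_share(predictions, tpr, fpr):
    """Correct the raw positive share q using p = (q - fpr) / (tpr - fpr)."""
    if tpr <= fpr:
        raise ValueError("classifier must beat chance (tpr > fpr)")
    q = sum(predictions) / len(predictions)
    p = (q - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))  # clip to a valid proportion

# Hypothetical validation set: 1 = post is about unemployment, 0 = not.
val_labels      = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
val_predictions = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tpr, fpr = rates_from_validation(val_labels, val_predictions)

# Hypothetical classifier output on a large unlabelled batch: 30% flagged.
batch_predictions = [1] * 30 + [0] * 70
print(tpr, fpr)
print(adjusted_share(batch_predictions, tpr, fpr))  # lower than the raw 0.3
```

The raw 30 per cent overstates the true share because some flagged posts are false positives. The correction removes that inflation at the aggregate level, without needing to know which individual posts were misclassified.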

 

 
In text analysis, which requires users to select keywords from a large amount of text, King asserted that a computer-assisted approach was more effective than either an unassisted human approach or a fully automated one. The weakness of the unassisted human approach is that people are poor at recalling keywords (see Box 3). A computer-assisted approach addresses this problem by enabling data scientists to judge and select suitable keywords from categories generated by the computer. This approach thus provides the best balance between the strengths of human judgement and the ability of computers to manage big data sets.
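Below is a simplified sketch of the computer-assisted idea, in which the computer proposes and a human decides. It is not King's published algorithm. Words are ranked by how much more often they appear in a reference set of posts already judged relevant than in a broader search set, using a smoothed log ratio of document frequencies. The posts, thresholds and function names are hypothetical.

```python
import math
import re
from collections import Counter

def doc_freq(posts):
    """Number of posts in which each lowercase word appears."""
    counts = Counter()
    for post in posts:
        counts.update(set(re.findall(r"[a-z]+", post.lower())))
    return counts

def suggest_keywords(reference, search, top_n=5, min_count=2):
    """Rank candidate keywords by a smoothed log ratio of document frequencies."""
    ref_df, search_df = doc_freq(reference), doc_freq(search)
    scores = {}
    for word, count in ref_df.items():
        if count < min_count:  # ignore words too rare to judge
            continue
        ref_rate = (count + 1) / (len(reference) + 2)
        search_rate = (search_df[word] + 1) / (len(search) + 2)
        scores[word] = math.log(ref_rate / search_rate)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

reference = [
    "obamacare ruling upheld mandate",
    "supreme court upholds obamacare mandate",
    "mandate ruling is unconstitutional says senator",
    "obamacare coverage expands",
]
search = [
    "healthcare costs rising",
    "healthcare jobs fair today",
    "my healthcare plan renewal",
    "healthcare workers strike",
    "weather is nice",
]
print(suggest_keywords(reference, search))
```

The output is a shortlist, not a final answer. A human reviewer would accept or reject each suggestion, which keeps judgement with the person while the computer handles the volume.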

King also cautioned against the use of a fully automated approach. In China, such an approach would be incapable of following social media conversations that use homographs (i.e., words with the same spelling but different meanings) and homophones (i.e., words with the same pronunciation but different meanings). In addition, this approach cannot select the correct keywords if there are sudden changes in social trends. This occurred during the Boston Marathon bombing, when social media posts switched from the hashtag #BostonBombings to #BostonStrong.


 
 

Box 3: Limited capability of humans to recall keywords.

In an experiment, 43 Harvard undergraduates were asked to recall keywords in response to the following prompt: “There are 10,000 Twitter posts, each containing the word ‘healthcare’, in the time period surrounding the Supreme Court decision on Obamacare. List any keywords that come to mind that will select posts related to Obamacare and will not select posts unrelated to Obamacare.”

Results:

Examples of keywords selected included “unconstitutional”, “coverage” and “Obama”. The median number of keywords selected per respondent was 8. Collectively, 149 unique keywords were recalled, but 66 per cent of these were named by only one respondent. This illustrates that human users perform poorly at recalling keywords when faced with a sea of information and data.
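The three statistics reported in Box 3 can be computed from raw responses as follows. This sketch uses three hypothetical respondents, not the study's data, and `recall_summary` is an illustrative name.

```python
from collections import Counter
from statistics import median

def recall_summary(responses):
    """Median list length, unique keyword count, and share of keywords named by one person only."""
    per_person = [{k.lower() for k in r} for r in responses]
    mentions = Counter(k for keywords in per_person for k in keywords)
    singletons = sum(1 for n in mentions.values() if n == 1)
    return {
        "median_keywords": median([len(k) for k in per_person]),
        "unique_keywords": len(mentions),
        "share_named_once": singletons / len(mentions),
    }

# Hypothetical responses from three people.
responses = [
    ["Obama", "coverage", "mandate"],
    ["Obama", "unconstitutional"],
    ["coverage", "Roberts", "tax", "ACA"],
]
print(recall_summary(responses))
```

Applied to the study's reported figures, 66 per cent of 149 unique keywords is roughly 98 keywords that only a single respondent thought of.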

 
 

Big Data Analytics: Why and How of Real Time Actionable Insights

By Prof Sumit Agarwal


 
Prof Agarwal described Big Data as a large volume of data (20 per cent structured and 80 per cent unstructured) characterised by the “4 Vs”:

1. Volume: The amount of data.

2. Velocity: The speed at which data is transmitted.

3. Variety: The different types of data such as text, sensor data, audio, video, click streams, log files, etc.

4. Veracity: The uncertainty of data that affects its accuracy.

He highlighted key challenges of using Big Data: the inability to carry out random sampling, the difficulty of using classical statistics to interpret the data, biased and inaccurate data, having too many variables of different weights, merging unstructured and structured data sets, and privacy and confidentiality issues. Despite these challenges, he encouraged government agencies to tap the full potential of Big Data by sharing more data, avoiding working in silos, and drawing on insights from social media.

RESEARCH IN THE SINGAPORE CONTEXT

 
Agarwal shared research that drew on multiple data sets in the Singapore context, including studies of social peer effects (see Box 4) and of ways to ease traffic congestion (see Box 5).


 
 

Box 4: Social peer effect of bankruptcies.

Data sets of all bankruptcy cases and of credit and debit card transactions in Singapore were obtained to study how individuals’ spending behaviour changed when another resident of the same building was declared bankrupt, since living in the same building affects how often and where neighbours meet.

After a bankruptcy case, the average spending of residents in the same building fell by 4 per cent per month. This effect was likely to be stronger for those living on the same floor or those with similar demographics (e.g., age, gender). This shows how the behaviour of individuals could have implications for the community and the macroeconomy, and provides insights for policies that can mitigate the effect.
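The summary does not describe how the spending effect was estimated. A common design for this kind of question is difference-in-differences, sketched here as an assumption rather than the study's actual method. It compares the change in spending of residents in the affected building with the change for residents of comparable buildings that had no bankruptcy. The spending figures are invented.

```python
from statistics import mean

def pct_change(before, after):
    """Percentage change in mean monthly spending."""
    return (mean(after) - mean(before)) * 100 / mean(before)

def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Treated group's change minus control group's change, in percentage points.

    The control group absorbs shocks that hit everyone (e.g. seasonality); the
    remainder is attributed to the bankruptcy under a parallel-trends assumption.
    """
    return pct_change(treated_before, treated_after) - pct_change(control_before, control_after)

# Hypothetical mean monthly card spending (S$) per resident.
same_building_before  = [1000, 1200, 800]
same_building_after   = [940, 1128, 752]
other_building_before = [1000, 1100, 900]
other_building_after  = [990, 1089, 891]

effect = diff_in_diff(same_building_before, same_building_after,
                      other_building_before, other_building_after)
print(round(effect, 1))  # -5.0
```

In this toy data, spending fell 6 per cent next to the bankruptcy and 1 per cent elsewhere. Only the 5-point gap is attributed to the peer effect.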

 

 
 

Box 5: Easing congestion on public transport with more covered walkways.

EZ-link card data was used to gather information about commuters’ modes of transport and the stations at which they boarded and alighted.

It was found that 10 per cent of all commuter rides are single-stop rides (i.e., the commuter boards at a particular station and alights at the next one). The average distance between two adjacent stations is approximately 200 metres. If these commuters were to walk instead of taking public transport, congestion during peak hours could be eased.

Given that people are more likely to walk when there are covered walkways, providing more such walkways could reduce peak-hour congestion by encouraging more commuters to walk between bus stops or train stations.
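Single-stop rides can be identified from tap-in/tap-out records as sketched below. The record layout (a route plus the boarding and alighting stop sequence numbers) is a hypothetical schema; the summary does not describe the actual EZ-link data format.

```python
from dataclasses import dataclass

@dataclass
class Ride:
    """One hypothetical tap-in/tap-out record, with stops numbered along the route."""
    card_id: str
    route: str
    board_seq: int
    alight_seq: int

def single_stop_share(rides):
    """Fraction of rides that alight at the stop immediately after boarding."""
    single = sum(1 for r in rides if r.alight_seq - r.board_seq == 1)
    return single / len(rides)

rides = [
    Ride("A", "bus_12", 3, 4),
    Ride("B", "bus_12", 1, 9),
    Ride("C", "bus_98", 5, 6),
    Ride("D", "bus_98", 2, 7),
    Ride("E", "bus_12", 6, 10),
]
print(single_stop_share(rides))  # 0.4
```

On real data one would also filter by tap-in time to focus on peak hours, since that is where shifting riders to walking would matter most.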

 

 

Data Science for the Public Sector

By Dr Daniel Lim


 
Data science can help enable evidence-based decision making in the government. However, this is a difficult task to accomplish. Data scientists are familiar with data but might not understand policy and operational issues. Conversely, policymakers might not know how analytics can add value to their work. To succeed, policymakers and data scientists need to work together to form a multidisciplinary team. There is a lot of value in such translational work.

Using analytics does not guarantee that a data science project will be successful. Dr Lim shared three critical success factors:

• Have a clear problem statement;

• Have a clear and actionable impact — “Assuming analytics proves fantastic, how does that change policy or improve operations?”

• Ensure that data is available in the right format and relevant to addressing the problem statements that have been raised.

To achieve the first two criteria, there must be buy-in from the stakeholders who are in a position to act on analytics insights and to share data.

Data science can help to measure outcomes of interest, generate hypotheses, or derive insights. For example, one hypothesis is that the size and structure of homes influence the number of babies a couple has, i.e., that smaller homes or homes with fewer rooms lead to smaller families. To test this hypothesis, data can be analysed for correlations, and the insights gathered can be used in policymaking. In addition, data science provides new ways of measuring outcomes that were previously difficult to measure.
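A first check of the home-size hypothesis could be a simple correlation between number of rooms and number of children. The sketch computes the Pearson coefficient on hypothetical household records, not real HDB or census data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical household records: (number of rooms, number of children).
households = [(2, 0), (3, 1), (3, 2), (4, 2), (5, 3), (5, 2)]
rooms = [r for r, _ in households]
children = [c for _, c in households]
print(round(pearson(rooms, children), 2))
```

A strong positive coefficient here would support the hypothesis but not prove it. Income, age and household preferences can drive both home size and family size, so a correlation is a starting point for further analysis rather than evidence of cause.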

DATA SCIENCE CAN BE APPLIED TO SOLVE PUBLIC SECTOR PROBLEMS

 
Data science can be effectively applied across different domains. Lim shared examples from the National Library Board (see Box 6), the Housing & Development Board (see Box 7) and SingHealth (see Box 8).


 
 

Box 6: National Library Board (NLB) — Using data science to understand the profiles of library borrowers.

Clustering algorithms not only reveal natural groupings in data but can also uncover new and previously unknown ones. Running a clustering algorithm on 20 million library records allowed NLB to better understand its customer profiles.

The following customer segments emerged: (1) younger adults with general interests; (2) young families; (3) children and early teens from the West of Singapore; (4) middle-aged casual fiction readers from the North of Singapore; (5) retiree hobbyists borrowing from downtown; (6) grandparents and grandchildren from mature estates; and (7) working, travelling businessmen and techies.

Clusters (5) and (6) were unknown to NLB, which had originally categorised these individuals as one homogeneous group. Such clustering techniques could be used to enhance NLB’s annual planning and help it to understand its customer base better.
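The summary does not say which clustering algorithm NLB used. As an assumption, the sketch below uses k-means (Lloyd's algorithm), one of the most common choices, with deterministic farthest-first seeding. The two borrower features (age and share of loans that are children's books) and all values are hypothetical. In practice, features on different scales would be standardised first.

```python
import math

def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add the farthest point."""
    centroids = [points[0]]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(far)
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical borrowers: (age, share of loans that are children's books).
borrowers = [(8, 0.9), (10, 0.95), (9, 0.85), (35, 0.5), (38, 0.55),
             (33, 0.45), (68, 0.1), (70, 0.05), (66, 0.15)]
centroids, clusters = kmeans(borrowers, 3)
for c in clusters:
    print(c)
```

The algorithm is never told which groups exist. Segments such as "grandparents and grandchildren" emerge only if the data supports them, which is how previously unknown clusters can surface.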

 

 
 

Box 7: Housing & Development Board (HDB) — Using textual analysis to understand public feedback.

Previously, HDB officers manually assigned emails received from the public to pre-defined categories. However, because of the complex nature of these emails, many were inevitably categorised under a generic category called “Others”. Moreover, pre-defined categories might not capture emerging issues, because taxonomies change over time. This made analysis challenging.

Topic modelling (a type of unsupervised machine learning) can discover topic clusters in large amounts of unstructured textual data. In HDB’s case, over 90,000 emails were analysed, leading to the discovery of a topic cluster related to key collection timings. This analysis confirmed the suspicion that key collection timing was an issue and helped convince HDB’s senior management to improve the relevant process. This illustrates how textual analysis can support evidence-based decision making.
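The summary does not name the topic model HDB used. Latent Dirichlet Allocation (LDA) is the best-known one, and the sketch below fits it by collapsed Gibbs sampling in plain Python. The tokenised "emails" are invented, and a production system would use a library plus preprocessing such as stop-word removal.

```python
import random
from collections import Counter

def lda_gibbs(docs, n_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns the top three words of each topic."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})                 # vocabulary size
    topic_word = [Counter() for _ in range(n_topics)]     # word counts per topic
    doc_topic = [[0] * n_topics for _ in docs]            # topic counts per document
    topic_total = [0] * n_topics                          # tokens per topic
    assignments = []
    for d, doc in enumerate(docs):                        # random initial topics
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            topic_word[z][w] += 1
            doc_topic[d][z] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove this token's current assignment from the counts.
                topic_word[z][w] -= 1
                doc_topic[d][z] -= 1
                topic_total[z] -= 1
                # Full conditional p(topic = t | everything else), up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + V * beta)
                    for t in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                topic_word[z][w] += 1
                doc_topic[d][z] += 1
                topic_total[z] += 1

    return [[w for w, _ in topic_word[t].most_common(3)] for t in range(n_topics)]

# Hypothetical, already-tokenised feedback emails.
emails = [
    "key collection appointment timing delay".split(),
    "key collection date timing".split(),
    "appointment key collection timing".split(),
    "lift repair noise block".split(),
    "lift breakdown repair".split(),
    "noise block lift".split(),
]
for topic in lda_gibbs(emails, n_topics=2):
    print(topic)
```

No categories are defined in advance. If many emails share words like "key", "collection" and "timing", a topic built from those words can emerge on its own, rather than being filed under "Others".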

Lim noted that instead of relying on surveys to understand key public concerns, the government could achieve the same goal more quickly by analysing emails or other feedback from the public. This would also enable the government to detect problems and solve them in a timely manner.

 

 
 

Box 8: Healthcare — Using predictive analytics to identify potential frequent admitters to a hospital.

SingHealth was interested in identifying potential frequent admitters (i.e., those admitted more than three times a year) to a hospital. Typically, a “patient navigator” interviews repeat patients and decides whether each patient is “high-risk” based on his or her living conditions. A high-risk patient is then assigned to a medical social worker to help prevent readmission.

A machine learning algorithm using data on patients’ demographics and medical histories could predict, with up to 80 per cent accuracy, which patients admitted for a second time were at risk of becoming frequent admitters. Lim emphasised that it was also important to consider how these predictions could be integrated into hospital processes to further reduce readmissions.
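The summary does not specify the algorithm or how accuracy was measured. As an assumption, the sketch uses logistic regression trained by batch gradient descent, a common baseline for binary risk prediction, and reports accuracy (the share of held-out patients classified correctly). The features (age / 100, number of chronic conditions, admissions in the past year) and all records are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent; the last weight is the intercept."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def predict(w, x, threshold=0.5):
    """1 = predicted to become a frequent admitter."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1]) >= threshold)

def accuracy(w, X, y):
    return sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)

# Hypothetical second-time admits: (age / 100, chronic conditions, admissions in past year).
train_X = [(0.45, 0, 1), (0.52, 1, 1), (0.38, 0, 1), (0.60, 1, 2),
           (0.78, 3, 2), (0.82, 4, 3), (0.70, 2, 3), (0.85, 3, 2)]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
test_X = [(0.40, 0, 1), (0.80, 3, 3), (0.55, 1, 1), (0.75, 3, 2)]
test_y = [0, 1, 0, 1]

w = train_logistic(train_X, train_y)
print(accuracy(w, test_X, test_y))
```

Measuring accuracy on patients the model has not seen matters. A model scored on its own training data will look better than it is, and a score like "up to 80 per cent" is meaningful only on held-out cases.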

 

THE IMPORTANCE OF GROWING PUBLIC SECTOR DATA SCIENCE CAPABILITIES

 
Lim emphasised the importance of developing an evidence-based culture, one where decisions and changes are backed by evidence and data. The role of data science is to support this culture, which requires data science capabilities in the public sector to be levelled up. Greater literacy in data science will make policymakers more aware of what analytics can offer, as well as its limitations.

Technology can make analytics more accessible to those not trained in the area, increasing its use. IDA has been working on a Citizen Feedback Analytics Platform to enable non-technical public officers to leverage textual analytics in their work.

 
 

ABOUT THE SPEAKERS

Gary King is the Albert J. Weatherhead III University Professor at Harvard University. He is based in the Department of Government (in the Faculty of Arts and Sciences) and serves as Director of the Institute for Quantitative Social Science, where he develops and applies empirical methods in many areas of social science research, focusing on innovations that span the range from statistical theory to practical application. He is an elected Fellow of eight honorary societies and has won more than 40 awards for his work. Professor King is also President of the Society for Political Methodology and Vice President of the American Political Science Association.

Sumit Agarwal is a Visiting Professor in the Department of Finance, National University of Singapore (NUS). He was the Vice-Dean (PhD and Research) and Low Tuck Kwong Professor at the School of Business, NUS. Previously, he was a senior financial economist at the Federal Reserve Bank of Chicago and a senior vice president and credit risk management executive in the Small Business Risk Solutions Group of Bank of America. His research interests include issues relating to financial institutions, household finance, behavioral finance, international finance, real estate markets, urban economics and capital markets.

Daniel Lim is a Consultant in the Infocomm Development Authority's (IDA) Data Science Division, where he is Team Lead for the multidisciplinary Quantitative Strategy team. He was formerly a member of Harvard’s Behavioural Insights Group and has collaborated with government agencies such as the Ministry of Home Affairs and the Land Transport Authority to design and implement randomised controlled trials. He is also Special Assistant to IDA’s Managing Director and a member of the Civil Service College’s Economics Experts Group and the 2035 National Scenarios Team.

