Regulators are using text analytics to identify signs of fraud inadvertently revealed in the narrative disclosures of annual reports, as Steve Young reports
This article was first published in the February 2016 UK edition of Accounting and Business magazine.
Information is the lifeblood of financial markets, and most financial information involves narrative content in some form – annual reports, earnings announcements, analysts’ reports, corporate web pages, regulatory guidelines, etc.
The volume of financially relevant text is vast and increasing rapidly in response to rising business complexity and evolving methods of communication, such as social media. Recent developments in corporate reporting aimed at improving transparency and quality – such as the revisions to the UK corporate governance code, the extended audit report and the new strategic report – have also focused heavily on enhancing narrative disclosures.
Financial narrative disclosures form part of the big data revolution, with annual reports ranging from a few words to hundreds of pages – for example, HSBC’s December 2014 annual report weighs in at 488 pages.
The practical and conceptual problems associated with incorporating textual content into traditional economic models lead some decision-makers to eschew narrative data in favour of more familiar, quantitative outputs such as sales, earnings and stock returns. However, emerging evidence suggests that stakeholders’ ability to make sense of textual content in a timely way and on a large scale is central to effective decision-making.
Research shows that narratives are informative beyond traditional financial data. For example, forward-looking disclosures predict future changes in sales, operating cashflows, profitability and capital expenditure for up to three years. Narratives in corporate reports have also been found to contain information that helps assess a business’s competitive environment and predict financial distress.
Research also suggests that narrative disclosures can help predict short-run share price movements. For example, market and company-level measures of sentiment constructed using news stories from financial media have been shown to provide incremental predictive ability for one-day-ahead share price changes. Views expressed on specialist social media platforms such as SeekingAlpha.com may also contain information that is useful for predicting stock returns and earnings surprises.
Narrative commentaries allow management to engage in hype or spin to influence perceptions, as research published in the Journal of Accounting Literature reveals. Company-generated narratives tend to discuss positive news, with bad news partially reported or ignored completely. Attribution bias is also widespread, with management ascribing positive outcomes to internal policies while blaming negative outcomes on external factors such as unfavourable macro-economic conditions and bad weather. Companies with weak accounting results tend to produce longer narratives that are harder to read, and less readable narratives are associated with less persistent earnings performance. The ability to discriminate between useful commentary and spin is therefore crucial.
According to research published in Contemporary Accounting Research, the language used by, for example, US-listed entities in the management discussion and analysis section can distinguish truthful from fraudulent reporters, even after controlling for financial statement data. This evidence is prompting regulators to incorporate text analytics into their market surveillance. For example, the US Securities and Exchange Commission (SEC) is using linguistic tools to screen financial statement filings for signs of fraud, and is achieving higher detection rates than when only quantitative data was used.
Speaking with forked tongue
The value of qualitative data is not restricted to written disclosures. Management responses to analysts’ questions during earnings conferences can also predict accounting irregularities. Deceptive executives make more references to received wisdom (‘you know’, ‘investors well know’, ‘others know well’, etc), fewer references to shareholder value, and use more extreme positive words (‘fantastic’, ‘great’, ‘definitely’, etc) and fewer anxiety words (‘worried’, ‘fearful’, ‘nervous’, etc).
Researchers, investors and regulators interested in incorporating qualitative information into decision-making are increasingly looking to computer science and linguistics for help in analysing company data. The standard approach involves developing algorithms that let users harvest large samples of text from various sources and then measure key properties, including readability (typically via the ‘fog index’, which estimates the number of years of education needed to understand a given sample of text – a higher score indicates lower readability), tone (positive or negative), narrative themes, degree of uncertainty and level of boilerplate. These measures may then be used in applications predicting performance and as red flags for future problems.
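A minimal sketch of the fog index calculation is shown below. It uses a crude vowel-group heuristic to count syllables; production tools rely on pronunciation dictionaries and more careful sentence splitting, so this is illustrative only.

```python
import re

def count_syllables(word):
    # Crude heuristic: count vowel groups, dropping a trailing silent 'e'.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def fog_index(text):
    """Gunning fog index: 0.4 * (avg sentence length + % complex words).

    A 'complex' word is one with three or more syllables.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_len = len(words) / len(sentences)
    pct_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_len + pct_complex)
```

The score is read as years of education: a fog index of 12 suggests a text accessible to a school leaver, while scores near 20 indicate graduate-level difficulty.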
A key issue when applying automated textual analysis is the form and format in which text is presented to financial market users. The SEC’s Electronic Data Gathering, Analysis and Retrieval (Edgar) system is designed to facilitate automated download and analysis of vast datasets of company-level financial information (10-Qs, 10-Ks, 8-Ks). In contrast, reporting norms in many other countries create significant barriers to textual analysis. In the UK, for example, annual reports and accounts are published as PDF documents, so direct, automated access to text is problematic, while the lack of any consistent document structure makes it very difficult to locate and analyse comparable disclosures.
In response, researchers at Lancaster University, the London School of Economics and Manchester University have developed web-based software that analyses large samples of UK annual report content as part of their Corporate Financial Information Environment (CFIE) project. The tool extracts from PDFs all text for each section listed in the table of contents, and computes a range of linguistic features for each, including word count, readability and tone.
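Once section text has been extracted from the PDF (for example, with a library such as pdfminer.six), the per-section measures can be sketched as follows. This is not the CFIE tool itself, and the tone word lists are illustrative placeholders, not the project’s actual dictionaries.

```python
# Sketch of per-section feature extraction, assuming section text has
# already been pulled out of the PDF. Word lists are illustrative only.
POSITIVE = {"growth", "strong", "improved"}
NEGATIVE = {"loss", "decline", "impairment"}

def section_features(sections):
    """sections: dict mapping section title -> plain text of that section."""
    features = {}
    for title, text in sections.items():
        words = text.lower().split()
        pos = sum(w in POSITIVE for w in words)
        neg = sum(w in NEGATIVE for w in words)
        total = max(len(words), 1)
        features[title] = {
            "word_count": len(words),
            # Net tone, scaled by section length so long and short
            # sections are comparable.
            "tone": (pos - neg) / total,
        }
    return features
```

Running such measures across thousands of reports and years is what makes the large-sample comparisons described below possible.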
The research team is currently using the software to produce the first large-sample analysis of UK annual reporting practices (based on a sample of more than 16,000 reports published since 2003). Not only does the analysis reveal a rise in report length (see box), it also reveals a decline in readability. The median annual corporate report published in 2014 was 3% less readable overall than the comparable report in 2003, with readability in the performance commentary sections down 11%. These results support concerns that UK annual reports are becoming increasingly complex and less understandable.
The research is an example of how developments in textual analysis are shedding new light on corporate communication practices, and helping to unlock the power of financial narratives as an investment resource.
Professor Steve Young is head of department – accounting and finance at Lancaster University Management School