A Conceptual Framework for Subdomain Specific Pre-Training of Large Language Models for Green Claim Detection

Authors

  • Wayne Moodaley, Senior Lecturer in Accounting and Financial Management at the Johannesburg Business School, University of Johannesburg.
  • Arnesh Telukdarie, Professor of Digital Business at the Johannesburg Business School, University of Johannesburg.

DOI:

https://doi.org/10.14207/ejsd.2023.v12n4p319

Keywords:

greenwashing, artificial intelligence, sustainability, sustainability reporting, sustainability disclosures

Abstract

Detection of false or misleading green claims (referred to as “greenwashing”) within company sustainability disclosures is challenging for several reasons, including the textual and qualitative nature, volume, and complexity of such disclosures. In recent years, notable progress in artificial intelligence, and specifically in large language models (LLMs), has showcased the capacity of these tools to effectively analyse extensive and intricate textual data, including the contents of sustainability disclosures. Transformer-based LLMs, such as Google’s BERT architecture, were trained on general domain text corpora. Subsequent research has shown that further pre-training of such LLMs on specific domains, such as the climate or sustainability domains, may improve performance. However, previous research often uses text corpora that exhibit significant variation across topics and language and that often consist of heterogeneous subdomains. We therefore propose a conceptual framework for further pre-training of transformer-based LLMs using text corpora relating to specific sustainability subdomains, i.e. subdomain-specific pre-training. We do so as a basis for improving the performance of such models in analysing sustainability disclosures. The main contribution is a conceptual framework to advance the use of LLMs for the reliable identification of green claims and, ultimately, greenwashing.

Published

2023-10-01

How to Cite

Moodaley, W., & Telukdarie, A. (2023). A Conceptual Framework for Subdomain Specific Pre-Training of Large Language Models for Green Claim Detection. European Journal of Sustainable Development, 12(4), 319. https://doi.org/10.14207/ejsd.2023.v12n4p319

Section

Articles