Using Machine Learning to Generate a Dictionary for Environmental Issues

Daniel E. O’Leary, Yangin Yoon

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

The purpose of this paper is to investigate the use of machine learning approaches to build a dictionary of terms for analyzing text for ESG content with a bag-of-words approach, where ESG stands for “environment, social and governance.” Specifically, the paper reviews experiments to develop a dictionary for one environmental concept, “carbon footprint.” We investigate Word2Vec models built from Form 10-K text and from earnings calls, as well as queries to ChatGPT, and compare the results. As part of developing our dictionaries, we find that bigrams and trigrams are more likely to be found when using ChatGPT, suggesting that bigrams and trigrams provide a “better” approach for the dictionaries developed with Word2Vec. We also find that terms provided by ChatGPT were less likely than terms generated using Word2Vec to appear in Form 10-Ks or other business disclosures. In addition, we explored different ways of phrasing questions to ChatGPT to elicit different perspectives on carbon footprint, such as “reducing carbon footprint” or “negative effects of carbon footprint.” We then discuss combining the findings from each of these approaches to build a dictionary that could be used alone or with other ESG concept dictionaries.
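The abstract describes three steps:

1. Expand a seed concept into candidate terms, via Word2Vec similarity or ChatGPT queries.
2. Merge the candidate lists into one dictionary.
3. Score documents such as Form 10-Ks with a bag-of-words count that includes bigrams and trigrams.

The Python sketch below illustrates these steps under stated assumptions. It is not the authors' code.

- The tokenizer, the hits-per-token score, and the coverage measure are assumptions.
- The function names (`nearest_terms`, `merge_dictionaries`, `count_dictionary_hits`, `dictionary_score`, `coverage`) are hypothetical.
- The tiny hand-written vectors stand in for a trained Word2Vec model. With the gensim library, a trained model's `KeyedVectors.most_similar` would typically take the place of `nearest_terms`.
- The example terms are illustrative, not the paper's dictionary.

```python
import math
import re
from collections import Counter
from typing import Iterable


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumerics; underscores also split, so a
    # phrase token like "carbon_footprint" becomes ["carbon", "footprint"].
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize(term: str) -> str:
    # Canonical dictionary entry: space-joined lowercase tokens.
    return " ".join(tokenize(term))


def ngrams(tokens: list[str], n: int) -> list[str]:
    # All contiguous n-grams (n=2 gives bigrams, n=3 trigrams), space-joined.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_terms(seed: str, vectors: dict[str, list[float]], k: int = 10) -> list[str]:
    # Word2Vec-style expansion: the k terms most cosine-similar to the seed.
    seed_vec = vectors[seed]
    scored = [(t, cosine(seed_vec, v)) for t, v in vectors.items() if t != seed]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [t for t, _ in scored[:k]]


def merge_dictionaries(*dictionaries: Iterable[str]) -> set[str]:
    # Union of several term lists after normalization, so duplicates collapse.
    return {normalize(t) for d in dictionaries for t in d if normalize(t)}


def count_dictionary_hits(text: str, dictionary: Iterable[str], max_n: int = 3) -> Counter:
    # Count dictionary unigrams..max_n-grams in the text. Overlapping matches
    # all count: with entries "carbon" and "carbon footprint", one occurrence
    # of the phrase scores twice.
    entries = merge_dictionaries(dictionary)
    tokens = tokenize(text)
    hits: Counter = Counter()
    for n in range(1, max_n + 1):
        for gram in ngrams(tokens, n):
            if gram in entries:
                hits[gram] += 1
    return hits


def dictionary_score(text: str, dictionary: Iterable[str], max_n: int = 3) -> float:
    # Bag-of-words intensity: dictionary hits per token (0.0 for empty text).
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(count_dictionary_hits(text, dictionary, max_n).values()) / len(tokens)


def coverage(dictionary: Iterable[str], corpus: Iterable[str]) -> float:
    # Share of dictionary terms that occur at least once somewhere in the corpus.
    entries = merge_dictionaries(dictionary)
    docs = [tokenize(doc) for doc in corpus]
    if not entries:
        return 0.0
    found = sum(1 for term in entries
                if any(term in ngrams(toks, len(term.split())) for toks in docs))
    return found / len(entries)


if __name__ == "__main__":
    # Toy 2-d vectors stand in for a trained Word2Vec model (illustrative only).
    vectors = {"carbon_footprint": [1.0, 0.0], "emissions": [0.9, 0.1],
               "greenhouse_gas": [0.8, 0.3], "dividend": [0.0, 1.0]}
    w2v_terms = nearest_terms("carbon_footprint", vectors, k=2)
    chatgpt_terms = ["carbon footprint", "renewable energy", "carbon offsets"]  # hypothetical
    combined = merge_dictionaries(["carbon_footprint"], w2v_terms, chatgpt_terms)
    filing = ("We reduced greenhouse gas emissions and our carbon footprint "
              "through renewable energy.")
    print(w2v_terms)                                     # ['emissions', 'greenhouse_gas']
    print(sorted(count_dictionary_hits(filing, combined).items()))
    print(round(dictionary_score(filing, combined), 3))  # 0.333
    print(coverage(combined, [filing]))                  # 0.8 ("carbon offsets" is absent)
```

Counting every overlapping n-gram is only one design choice. A longest-match-first scheme would not credit "carbon" separately inside "carbon footprint". Which scheme the paper used is not stated here. The `coverage` function is one hypothetical way to quantify the abstract's observation that some generated terms rarely appear in business disclosures.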

Original language: English
Title of host publication: Machine Learning and Knowledge Extraction - 7th IFIP TC 5, TC 12, WG 8.4, WG 8.9, WG 12.9 International Cross-Domain Conference, CD-MAKE 2023, Proceedings
Editors: Andreas Holzinger, Peter Kieseberg, Federico Cabitza, Andrea Campagner, A Min Tjoa, Edgar Weippl
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 141-154
Number of pages: 14
ISBN (Print): 9783031408366
State: Published - 2023
Event: Machine Learning and Knowledge Extraction 7th IFIP TC 5, TC 12, WG 8.4, WG 8.9, WG 12.9 International Cross-Domain Conference, CD-MAKE 2023 - Benevento, Italy
Duration: 28 Aug 2023 to 1 Sep 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14065 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Machine Learning and Knowledge Extraction 7th IFIP TC 5, TC 12, WG 8.4, WG 8.9, WG 12.9 International Cross-Domain Conference, CD-MAKE 2023
Country/Territory: Italy
City: Benevento
Period: 28/08/23 to 1/09/23

Keywords

  • Bag of Words
  • Carbon Footprint
  • ChatGPT
  • Concept
  • Dictionary
  • ESG
  • Environment
  • Form 10K
  • Hybrid Approach
  • Ontology
  • Reducing Carbon Footprint
  • Word2Vec
