Exploratory Experiments in Generating Bag of Word Dictionaries Using Word2Vec and ChatGPT

Research output: Contribution to journal › Article › peer-review

Abstract

This paper investigates the use of machine learning to generate “bag of words” dictionaries from text corpora in accounting applications. We use Word2Vec to generate words (dictionaries) that are “similar” to a seed word that captures a concept. As part of our analysis, we perform several experiments using text from Form 10-Ks and earnings calls. We investigate several activities, including choice of the seed word(s), choice of word sources (corpora), and analysis of the resulting word lists, among other concerns. We also examine the notion of “human-in-the-loop” and the roles that a person would need to perform while generating a dictionary. Further, we investigate the impact of using accounting and financial corpora, in contrast to Wikipedia, on the different semantic and syntactic relationships. We then extend the analysis to compare those findings to ChatGPT, another source of words, and investigate some of the advantages and disadvantages of that approach.
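To make the seed-word idea concrete, the following is a minimal sketch and not the authors' method. It swaps Word2Vec for plain co-occurrence counts so that it runs with the Python standard library alone. The corpus sentences are made up for illustration, and the function names (`cooccurrence_vectors`, `candidate_words`, `build_dictionary`) and the `approve` callback are hypothetical. The sketch ranks words by cosine similarity to a seed word, then lets a human reviewer accept or reject each candidate.

```python
import math
from collections import Counter, defaultdict

# Toy, made-up sentences standing in for 10-K or earnings-call text (assumption).
CORPUS = [
    "litigation risk increased due to pending lawsuit",
    "lawsuit risk increased due to pending claims",
    "revenue growth improved due to strong demand",
    "sales growth improved due to strong demand",
]


def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return dict(vectors)  # plain dict: lookups of unknown words raise KeyError


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_words(seed, vectors, top_k=3):
    """Return the `top_k` words most similar to `seed`, best first."""
    seed_vector = vectors[seed]
    scores = [(w, cosine(seed_vector, v)) for w, v in vectors.items() if w != seed]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))[:top_k]


def build_dictionary(seed, vectors, approve, top_k=3):
    """Human-in-the-loop step: keep the seed plus candidates the reviewer approves."""
    candidates = candidate_words(seed, vectors, top_k)
    return [seed] + [word for word, _ in candidates if approve(word)]


if __name__ == "__main__":
    vectors = cooccurrence_vectors(CORPUS)
    for word, score in candidate_words("revenue", vectors):
        print(f"{word}\t{score:.3f}")
```

In this toy corpus, "sales" ranks first for the seed "revenue" because the two words appear in identical contexts. With real filings, a trained embedding model would replace `cooccurrence_vectors`, and the `approve` callback would stand in for the reviewer's judgment about which candidates belong in the dictionary.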

Original language: English
Pages (from-to): 125-147
Number of pages: 23
Journal: Journal of Emerging Technologies in Accounting
Volume: 22
Issue number: 2
State: Published - 1 Sep 2025

Keywords

  • bag of word dictionaries
  • ChatGPT
  • experiments with AI
  • exploratory research
  • text mining
  • Word2Vec
