Abstract
This paper investigates the use of machine learning to generate “bag of words” dictionaries from text corpora in accounting applications. We use Word2Vec to generate lists of words (dictionaries) that are “similar” to a seed word that captures a concept. As part of our analysis, we perform several experiments using text from Form 10-Ks and earnings calls. We investigate several activities, including choosing the seed word(s), choosing word sources (corpora), and analyzing the resulting word lists, along with other concerns. We also examine the notion of “human-in-the-loop” and the roles that a person would need to perform while generating a dictionary. Further, we investigate the impact of using accounting and financial corpora, in contrast to Wikipedia, on the different semantic and syntactic relationships. We then extend the analysis to compare those findings with ChatGPT, another source of words, and investigate some of the advantages and disadvantages of that approach.
| Original language | English |
|---|---|
| Pages (from-to) | 125-147 |
| Number of pages | 23 |
| Journal | Journal of Emerging Technologies in Accounting |
| Volume | 22 |
| Issue number | 2 |
| State | Published - 1 Sep 2025 |
Keywords
- bag of word dictionaries
- ChatGPT
- experiments with AI
- exploratory research
- text mining
- Word2Vec
Title: Exploratory Experiments in Generating Bag of Word Dictionaries Using Word2Vec and ChatGPT