We run the model on only the first 200 words of each abstract, since the default BERT model accepts at most 512 input tokens; other models may allow longer inputs. We then convert the Pandas DataFrame to a Hugging Face Dataset.
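The truncation step can be sketched as follows. This is a minimal illustration with made-up abstracts; the column names `Abstract` and `Abstract_200` are assumptions matching the code later in this section, and the commented line shows the conversion to a Hugging Face Dataset.

```python
import pandas as pd
# from datasets import Dataset  # Hugging Face datasets library

# Toy DataFrame standing in for the real article data
df = pd.DataFrame({'Abstract': ['token ' * 600, 'A short abstract.']})

# Keep only the first 200 words so the tokenized length stays under BERT's 512-token cap
df['Abstract_200'] = df['Abstract'].apply(lambda s: ' '.join(s.split()[:200]))

# dataset = Dataset.from_pandas(df)  # converts the DataFrame to a HF Dataset

print(df['Abstract_200'].str.split().str.len().tolist())  # [200, 3]
```

Note that truncating by words is a rough heuristic: BERT's limit is on subword tokens, so 200 words comfortably clears the 512-token cap for typical English text.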
We now run the model on the abstracts. As an illustration, we apply FinBERT, a model pre-trained on financial communication text.
```python
from transformers import BertTokenizer, BertForSequenceClassification, pipeline

finbert = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-tone', num_labels=3)
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-tone')
nlp = pipeline("sentiment-analysis", model=finbert, tokenizer=tokenizer)

def get_sentiment(examples):
    # Initialize lists to store results
    sentiments = []
    scores = []
    # Process each entry in the batch
    for text in examples['Abstract_200']:
        try:
            # Get the sentiment label and score for each article
            result = nlp(text)
            sentiments.append(result[0]['label'])
            scores.append(result[0]['score'])
        except Exception as e:
            print(f'Error processing text: {text}. Error: {e}', flush=True)
            # Append placeholder values so output lists stay aligned with the batch
            sentiments.append(None)
            scores.append(None)
    return {'sentiment': sentiments, 'score': scores}

# Run the sentiment analysis in batches
dataset = dataset.map(get_sentiment, batched=True, batch_size=64)
```
We can examine the results for the first 15 articles.
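A minimal sketch of what such an inspection might look like. The data below is hypothetical, standing in for the mapped dataset; the labels `Positive`, `Neutral`, and `Negative` are the three tone classes produced by `finbert-tone`, and the commented line shows the equivalent selection on a Hugging Face Dataset.

```python
import pandas as pd

# Hypothetical scores standing in for real model output
results = pd.DataFrame({
    'sentiment': ['Positive', 'Neutral', 'Negative'] * 5,
    'score': [0.99, 0.85, 0.93] * 5,
})

# With a Hugging Face Dataset, the equivalent would be:
# dataset.select(range(15)).to_pandas()[['sentiment', 'score']]
print(results.head(15))
```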