Sentiment Analysis

Data

Below is a snapshot of the data we will be working with: Journal of World Business (JWB) article records for 1967–2025, downloaded from Scopus.

import pandas as pd
data = pd.read_csv('../data/jwb-articles.csv')
data = data[data['Abstract'].notna()] # Keep nonempty abstracts
data.head()
Authors Author full names Author(s) ID Title Year Source title Volume Issue Art. No. Page start ... ISSN ISBN CODEN PubMed ID Language of Original Document Document Type Publication Stage Open Access Source EID
0 Al Asady, A.; Anokhin, S. Al Asady, Ahmad (57219984746); Anokhin, Sergey... 57219984746; 24482882200 The Trojan horse of international entrepreneur... 2025 Journal of World Business 60 6 101677.0 NaN ... 10909516 NaN NaN NaN English Article Final NaN Scopus 2-s2.0-105014957115
1 Thams, Y.; Dau, L.A.; Doh, J.; Kostova, T.; Ne... Thams, Yannick (55357149800); Dau, Luis Alfons... 55357149800; 35147597100; 7003920280; 66037741... Political ideology and the multinational enter... 2025 Journal of World Business 60 6 101678.0 NaN ... 10909516 NaN NaN NaN English Short survey Final NaN Scopus 2-s2.0-105014844629
2 Lindner, T.; Puck, J.; Puhr, H. Lindner, Thomas (57159151000); Puck, Jonas (85... 57159151000; 8563161700; 57223389639 Artificial intelligence in international busin... 2025 Journal of World Business 60 6 101676.0 NaN ... 10909516 NaN NaN NaN English Short survey Final All Open Access; Hybrid Gold Open Access Scopus 2-s2.0-105014595041
3 Bruton, G.D.; Mejía-Morelos, J.H.; Ahlstrom, D. Bruton, Garry D. (6603867202); Mejía-Morelos, ... 6603867202; 55748855800; 56525447800 Multinational corporations and inclusive suppl... 2025 Journal of World Business 60 6 101663.0 NaN ... 10909516 NaN NaN NaN English Article Final All Open Access; Hybrid Gold Open Access Scopus 2-s2.0-105013512235
4 Liang, Y.; Giroud, A.; Rygh, A.; Chen, Z. Liang, Yanze (57223851564); Giroud, Axèle L.A.... 57223851564; 7003496253; 37117826800; 58631386600 Political embeddedness and post-acquisition in... 2025 Journal of World Business 60 6 101665.0 NaN ... 10909516 NaN NaN NaN English Article Final All Open Access; Hybrid Gold Open Access Scopus 2-s2.0-105013485759

5 rows × 41 columns

BERT Sentiment

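We pass only the first 200 words of each abstract to the model, because the default BERT model accepts at most 512 tokens per input. A 200-word cap usually stays under that limit, although the tokenizer can split a single word into several tokens. Other models may have different limits. We then convert the pandas DataFrame to a Hugging Face Dataset.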

from datasets import Dataset
data['Abstract_200'] = data['Abstract'].apply(lambda x: ' '.join(x.split(' ')[:200]))
dataset = Dataset.from_pandas(data)
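
Splitting on a single space, as in the cell above, counts an empty string for every extra space and does not split on newlines or tabs. A minimal sketch of a whitespace-robust alternative follows; the helper name `first_n_words` is hypothetical and not part of the original notebook.

```python
def first_n_words(text, n=200):
    """Return the first n whitespace-separated words of text.

    Hypothetical helper for illustration. str.split() with no argument
    splits on any run of whitespace and drops empty strings.
    """
    return ' '.join(text.split()[:n])


sample = "Firms  expand\nabroad when   institutions are stable."
print(first_n_words(sample, n=3))
```

The helper could replace the lambda in the `apply` call. A word cap still only approximates the token limit, so it is a convenience rather than a guarantee.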

We now apply a pretrained model to the abstracts; no additional training takes place. We use FinBERT as an illustration. This model was pre-trained on financial communication text.

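from transformers import BertForSequenceClassification, BertTokenizer, pipeline

# Load the pretrained FinBERT tone classifier (three classes) and its tokenizer
finbert = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-tone', num_labels=3)
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-tone')
# Wrap the model and tokenizer in a text-classification pipeline
nlp = pipeline("sentiment-analysis", model=finbert, tokenizer=tokenizer)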

def get_sentiment(examples):
    # Initialize lists to store results
    sentiments = []
    scores = []

    # Process each entry in the batch
    for text in examples['Abstract_200']:
        try:
            # Get the sentiment and sentiment score for each article
            result = nlp(text)
            sentiment = result[0]['label']
            score = result[0]['score']

            sentiments.append(sentiment)
            scores.append(score)
        except Exception as e:
            print(f'Error processing text: {text}. Error: {e}', flush=True)
            # Append default values in case of an error
            sentiments.append(None)
            scores.append(None)

    # Each input appends exactly one entry above (a result or None),
    # so both lists already match the batch size

    return {'sentiment': sentiments, 'score': scores}

# Run the sentiment analysis in batches
dataset = dataset.map(get_sentiment, batched=True, batch_size=64)
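
With `batched=True`, `map` hands the function a dictionary of column lists and, when new columns are added alongside the existing ones, expects lists with one entry per input row in return. The sketch below mimics that contract without loading a model. `toy_classifier` and `label_batch` are hypothetical stand-ins for illustration, not FinBERT and not part of the `datasets` API.

```python
def toy_classifier(text):
    """Hypothetical stand-in for the pipeline: returns [{'label', 'score'}]."""
    if not isinstance(text, str):
        raise TypeError('expected a string')
    label = 'Positive' if 'growth' in text else 'Neutral'
    return [{'label': label, 'score': 1.0}]


def label_batch(examples, classify=toy_classifier):
    """Map a batch (dict of column lists) to new columns of the same length."""
    sentiments, scores = [], []
    for text in examples['Abstract_200']:
        try:
            result = classify(text)[0]
            sentiments.append(result['label'])
            scores.append(result['score'])
        except Exception:
            # Keep row alignment when a single text fails
            sentiments.append(None)
            scores.append(None)
    return {'sentiment': sentiments, 'score': scores}


batch = {'Abstract_200': ['Strong growth abroad.', 'We review prior work.', None]}
print(label_batch(batch))
```

Because a failed row still contributes a `None` to each list, the returned columns stay aligned with the input batch.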

We can examine the results for the first 15 articles.

df = pd.DataFrame(dataset)
df[['Abstract_200', 'sentiment', 'score']].head(15)
Abstract_200 sentiment score
0 This study explores the under-theorized relati... Neutral 0.993450
1 While politics and political issues such as ri... Neutral 0.999964
2 This paper discusses the impact of artificial ... Neutral 0.999968
3 An institutional logic represents the way a pa... Neutral 0.993483
4 Political embeddedness has been shown to influ... Positive 0.577034
5 How do MNE subsidiaries respond and perform af... Negative 0.999626
6 The ever-increasing internationalization and g... Neutral 0.999457
7 While the impact of digital platforms on firms... Positive 0.992149
8 Female entrepreneurs in emerging economies enc... Neutral 0.999127
9 This study centers upon Vietnam's Law on Inves... Positive 0.999999
10 Binational decoupling—especially between the U... Neutral 0.922705
11 Drawing primarily upon institutional theory, w... Neutral 0.999989
12 This paper explores the factors that influence... Neutral 0.999493
13 The recent acceleration in the international e... Neutral 0.962983
14 Achieving lateral collaboration benefits acros... Neutral 0.967192
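
To get a feel for the distribution, the labels can be tallied and low-confidence predictions flagged. A minimal sketch using the 15 rows displayed above, typed in by hand; the 0.6 cutoff is an arbitrary assumption for illustration, not a recommended threshold.

```python
from collections import Counter

# (label, score) pairs copied from the 15 rows displayed above
rows = [
    ('Neutral', 0.993450), ('Neutral', 0.999964), ('Neutral', 0.999968),
    ('Neutral', 0.993483), ('Positive', 0.577034), ('Negative', 0.999626),
    ('Neutral', 0.999457), ('Positive', 0.992149), ('Neutral', 0.999127),
    ('Positive', 0.999999), ('Neutral', 0.922705), ('Neutral', 0.999989),
    ('Neutral', 0.999493), ('Neutral', 0.962983), ('Neutral', 0.967192),
]

# Count how often each label occurs
label_counts = Counter(label for label, _ in rows)
print(label_counts)

# Row positions whose top-label score falls below the assumed cutoff
THRESHOLD = 0.6
uncertain = [i for i, (_, score) in enumerate(rows) if score < THRESHOLD]
print(uncertain)
```

On the full results, the same tally is available through `df['sentiment'].value_counts()`.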