Topic Modeling

Data

Below is a snapshot of the data we will be working with: Journal of World Business (JWB) article metadata, 1967–2025, downloaded from Scopus.

import pandas as pd
data = pd.read_csv('../data/jwb-articles.csv')
data = data[data['Abstract'].notna()] # Keep nonempty abstracts
data.head()
   Authors                                            Title                                               Year  Document Type
0  Al Asady, A.; Anokhin, S.                          The Trojan horse of international entrepreneur...   2025  Article
1  Thams, Y.; Dau, L.A.; Doh, J.; Kostova, T.; Ne...  Political ideology and the multinational enter...   2025  Short survey
2  Lindner, T.; Puck, J.; Puhr, H.                    Artificial intelligence in international busin...   2025  Short survey
3  Bruton, G.D.; Mejía-Morelos, J.H.; Ahlstrom, D.    Multinational corporations and inclusive suppl...   2025  Article
4  Liang, Y.; Giroud, A.; Rygh, A.; Chen, Z.          Political embeddedness and post-acquisition in...   2025  Article

5 rows × 41 columns (selected columns shown; the full table also includes Source title, Volume, Issue, ISSN, Language, Publication Stage, Open Access, and EID)

Latent Dirichlet Allocation (LDA)

Text preprocessing: tokenize abstracts, remove punctuation and stop words, and store cleaned tokens.

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, RegexpTokenizer

# Download the required NLTK resources on first use
nltk.download('punkt', quiet=True)
nltk.download('stopwords', quiet=True)

tokenizer = RegexpTokenizer(r'\w+')           # keep word characters only
stop_words = set(stopwords.words('english'))  # set membership tests are O(1)

docs = []
for abstract in data['Abstract']:
    tokens = word_tokenize(abstract.lower())       # tokenize the lowercased abstract
    tokens = tokenizer.tokenize(' '.join(tokens))  # drop punctuation-only tokens
    docs.append([word for word in tokens if word not in stop_words])

Before we fit the LDA model, we construct a dictionary that maps each unique word to an integer id, and convert each document to a bag-of-words representation.

import gensim
from gensim.models.ldamodel import LdaModel
from gensim import corpora

lda_dict = corpora.Dictionary(docs)
print('The number of unique words:', len(lda_dict))
print(lda_dict)
The number of unique words: 8944
Dictionary<8944 unique tokens: ['activities', 'affect', 'aims', 'also', 'argues']...>
lda_doc_corpus = [lda_dict.doc2bow(doc) for doc in docs]
print(lda_doc_corpus[0])
[(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 1), (7, 2), (8, 3), (9, 1), (10, 1), (11, 1), (12, 1), (13, 1), (14, 2), (15, 1), (16, 1), (17, 1), (18, 1), (19, 1), (20, 1), (21, 1), (22, 1), (23, 1), (24, 3), (25, 2), (26, 2), (27, 1), (28, 1), (29, 1), (30, 1), (31, 1), (32, 1), (33, 1), (34, 1), (35, 1), (36, 1), (37, 1), (38, 1), (39, 1), (40, 2), (41, 1), (42, 1), (43, 1), (44, 1), (45, 1), (46, 1)]
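Each pair above is (token_id, count). To make the encoding concrete, here is a pure-Python sketch of what `Dictionary` and `doc2bow` do, using a made-up toy corpus rather than the real abstracts:

```python
from collections import Counter

# Toy corpus of two tokenized documents (hypothetical tokens)
docs = [['firms', 'performance', 'firms'], ['performance', 'global']]

# Build a token -> id mapping, like gensim's corpora.Dictionary
vocab = sorted({tok for doc in docs for tok in doc})
token2id = {tok: i for i, tok in enumerate(vocab)}

def doc2bow(doc):
    """Map a tokenized document to sorted (token_id, count) pairs."""
    counts = Counter(doc)
    return sorted((token2id[t], c) for t, c in counts.items())

print(doc2bow(docs[0]))  # [(0, 2), (2, 1)] -> 'firms' twice, 'performance' once
```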

Now we train an LDA model to identify latent topics in the abstracts.

lda = LdaModel(corpus=lda_doc_corpus, id2word=lda_dict, num_topics=5,
               random_state=42, passes=40, alpha=10, eta=0.1)

We can examine the fitted model by listing the most heavily weighted words for each topic.

lda.show_topics()
[(0,
  '0.010*"foreign" + 0.009*"institutional" + 0.009*"countries" + 0.008*"performance" + 0.008*"firms" + 0.007*"study" + 0.007*"chinese" + 0.007*"international" + 0.007*"relationship" + 0.006*"knowledge"'),
 (1,
  '0.019*"international" + 0.014*"research" + 0.013*"business" + 0.010*"global" + 0.007*"ib" + 0.006*"knowledge" + 0.005*"new" + 0.005*"literature" + 0.005*"future" + 0.004*"study"'),
 (2,
  '0.013*"subsidiary" + 0.008*"firm" + 0.007*"subsidiaries" + 0.007*"performance" + 0.006*"firms" + 0.006*"foreign" + 0.006*"results" + 0.005*"global" + 0.005*"knowledge" + 0.005*"family"'),
 (3,
  '0.009*"cultural" + 0.008*"research" + 0.007*"study" + 0.007*"country" + 0.007*"political" + 0.007*"leadership" + 0.006*"multinational" + 0.006*"mncs" + 0.005*"performance" + 0.005*"institutional"'),
 (4,
  '0.025*"firms" + 0.010*"management" + 0.010*"international" + 0.009*"market" + 0.008*"study" + 0.008*"firm" + 0.006*"performance" + 0.006*"internationalization" + 0.005*"markets" + 0.005*"talent"')]
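If you want the weights as numbers rather than strings, gensim can return (word, weight) tuples directly via `lda.show_topics(formatted=False)`. Alternatively, a small helper (illustrative, not part of gensim) can parse the formatted strings shown above:

```python
def parse_topic(topic_str):
    """Split a gensim topic string like '0.010*"foreign" + 0.009*"firms"'
    into (word, weight) pairs."""
    pairs = []
    for term in topic_str.split(' + '):
        weight, word = term.split('*')
        pairs.append((word.strip('"'), float(weight)))
    return pairs

print(parse_topic('0.019*"international" + 0.014*"research"'))
# [('international', 0.019), ('research', 0.014)]
```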

To visualize the topic probabilities of the first 15 abstracts as a heatmap, we can run the following code.

import matplotlib.pyplot as plt

# minimum_probability=0 ensures every topic appears in every row,
# so all rows of the heatmap have the same length
doc_topics = lda.get_document_topics(lda_doc_corpus, minimum_probability=0)

all_probs = []
for doc_i in range(15):
    doc_probs = doc_topics[doc_i]
    print(doc_probs)
    all_probs.append([prob for (topic, prob) in doc_probs])

plt.imshow(all_probs)
plt.colorbar()
plt.xlabel('Topic')
plt.ylabel('Document')
plt.show()
[(0, 0.19223882), (1, 0.280424), (2, 0.18413724), (3, 0.17475025), (4, 0.1684497)]
[(0, 0.09653838), (1, 0.35228702), (2, 0.11452496), (3, 0.3348996), (4, 0.10175001)]
[(0, 0.14665498), (1, 0.25366387), (2, 0.13557251), (3, 0.26665145), (4, 0.19745715)]
[(0, 0.12201825), (1, 0.20950222), (2, 0.121986), (3, 0.1319087), (4, 0.4145848)]
[(0, 0.28674793), (1, 0.1418315), (2, 0.18685031), (3, 0.1600391), (4, 0.22453114)]
[(0, 0.17187466), (1, 0.14167556), (2, 0.32062945), (3, 0.23193854), (4, 0.13388178)]
[(0, 0.11212903), (1, 0.22102064), (2, 0.094602), (3, 0.1505873), (4, 0.421661)]
[(0, 0.14603102), (1, 0.107232735), (2, 0.11303985), (3, 0.11569671), (4, 0.5179997)]
[(0, 0.3068515), (1, 0.18931858), (2, 0.15783337), (3, 0.15932915), (4, 0.18666743)]
[(0, 0.3844842), (1, 0.16181146), (2, 0.15116443), (3, 0.13615756), (4, 0.16638234)]
[(0, 0.16029285), (1, 0.2216254), (2, 0.11499759), (3, 0.14059122), (4, 0.36249295)]
[(0, 0.4805228), (1, 0.09621751), (2, 0.14083625), (3, 0.17901455), (4, 0.10340884)]
[(0, 0.2092328), (1, 0.13705625), (2, 0.22701237), (3, 0.25130144), (4, 0.17539714)]
[(0, 0.20451798), (1, 0.3667167), (2, 0.11523414), (3, 0.11886554), (4, 0.19466569)]
[(0, 0.11036309), (1, 0.1859023), (2, 0.15423217), (3, 0.41059655), (4, 0.13890587)]
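Each printed row is a list of (topic, probability) pairs. To label a document with its single most likely topic, take the argmax over the probabilities; a minimal sketch using the first row's (rounded) values:

```python
def dominant_topic(doc_probs):
    """Return the topic id with the highest probability."""
    return max(doc_probs, key=lambda pair: pair[1])[0]

# First document's distribution from the output above (rounded)
doc_probs = [(0, 0.192), (1, 0.280), (2, 0.184), (3, 0.175), (4, 0.168)]
print(dominant_topic(doc_probs))  # 1
```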

BERTopic

We first extract the abstracts from the DataFrame and fit a BERTopic model.

from bertopic import BERTopic

# Use the same DataFrame as above; cast to str as a guard against stray NaNs
docs = data['Abstract'].tolist()
docs = [str(doc) for doc in docs]

topic_model = BERTopic(language='english', calculate_probabilities=True, verbose=True)
topic_model.fit(docs)

We can retrieve the topic assignments from the trained BERTopic model: `topics_` holds one topic id per document.

doc_topic = topic_model.topics_
print('Topic assignment per document:')
print(doc_topic)
Topic assignment per document:
[-1, 30, -1, -1, 24, -1, 1, 5, -1, -1, -1, -1, -1, 5, 0, 8, -1, 5, 9, 10, 9, 3, -1, 30, -1, -1, -1, -1, 19, -1, -1, 8, 8, -1, -1, -1, -1, 9, 1, 9, 30, 5, -1, 29, 12, -1, -1, -1, 1, -1, 8, 3, 1, 17, -1, 10, 4, 10, -1, 1, 1, 18, -1, 19, -1, 11, 3, 8, 0, 18, 18, 18, 30, 17, -1, -1, 18, 2, 9, 8, 30, 9, 18, 8, -1, 0, -1, 9, 5, 5, 28, 8, -1, -1, 2, -1, 19, -1, -1, 0, 18, -1, -1, -1, -1, 23, 8, 17, 18, -1, -1, -1, 9, -1, 9, 1, 30, 5, -1, 5, 28, -1, 2, 8, -1, -1, 8, 8, 9, -1, -1, -1, 12, -1, -1, 16, 8, -1, 24, -1, 6, -1, 9, 22, 10, -1, 10, 6, 9, -1, -1, 3, -1, -1, -1, 9, -1, -1, 3, -1, 10, 0, 0, 0, 23, 0, 1, -1, 24, -1, -1, 27, 4, 3, -1, -1, 27, 24, -1, 5, 9, -1, -1, 0, 1, -1, 18, 8, -1, 1, 28, -1, 9, 13, -1, -1, -1, 5, -1, 21, 17, 1, 18, 27, 1, 8, 9, 1, -1, -1, 0, 9, 2, -1, -1, 1, -1, -1, -1, -1, 24, 14, -1, -1, 5, 23, 6, 5, 1, -1, -1, 3, -1, -1, -1, 27, 17, -1, -1, -1, -1, 1, -1, 18, -1, -1, 1, 10, 10, -1, -1, -1, -1, 10, 9, 8, 1, 17, 23, 17, 29, -1, -1, 1, 5, -1, 5, -1, -1, -1, -1, -1, 13, 19, 17, -1, -1, -1, -1, -1, 29, 30, -1, 18, 1, 10, -1, 5, 27, 3, 9, 27, -1, 10, 9, -1, -1, 0, 0, -1, -1, -1, 29, 9, -1, -1, -1, 16, 5, 16, 1, 1, 27, -1, 1, 20, 3, -1, -1, 23, 26, 27, -1, 3, 8, -1, -1, 10, 20, -1, 11, 19, 11, 11, -1, -1, -1, 3, 17, 0, 10, 0, 6, 23, 3, 5, -1, 11, 12, -1, -1, 0, -1, 5, 28, -1, -1, -1, -1, -1, 16, -1, 13, 0, -1, 8, -1, 10, 26, -1, 5, 0, 3, -1, 10, 10, 20, 11, 10, -1, 10, -1, 6, 0, 24, 26, -1, 3, 11, 26, -1, 4, 0, 0, 8, -1, -1, 8, -1, 23, -1, -1, -1, 3, -1, 13, 18, -1, -1, 19, 8, 6, 27, -1, -1, -1, 11, 10, 5, -1, -1, -1, -1, -1, -1, 6, 8, 21, -1, -1, -1, 11, 24, -1, -1, -1, -1, 30, 8, 4, 5, 5, 2, 5, 11, -1, -1, -1, -1, -1, 4, -1, -1, 30, -1, -1, 18, -1, 0, -1, 0, 20, 17, 6, 1, 17, -1, 27, 6, 3, -1, -1, 11, 25, 2, -1, 17, 16, 24, 20, -1, -1, -1, -1, -1, -1, 11, -1, 11, 11, -1, 5, 10, -1, 0, -1, -1, 24, 10, -1, -1, -1, 25, -1, -1, 24, 11, 5, -1, 0, -1, 10, -1, 18, 4, 20, 8, 18, -1, -1, -1, 2, -1, -1, 10, -1, 3, 5, 1, -1, 16, 7, 12, -1, 3, -1, 6, 27, 23, 24, 
11, 16, -1, 10, -1, 8, 12, 8, 2, 8, -1, 4, 11, 8, -1, 8, 1, 16, -1, 16, -1, -1, 9, -1, 11, 0, 3, -1, -1, -1, -1, 1, 3, 1, 16, -1, -1, 13, 0, -1, 6, 1, -1, 20, 24, 24, 29, -1, -1, -1, 1, 30, -1, 0, -1, 4, 16, 20, 28, 0, 19, -1, -1, -1, 2, -1, 13, -1, -1, 13, -1, 30, 25, 0, -1, -1, 0, 2, 30, 0, 26, -1, 17, 4, -1, 11, -1, 17, 20, 10, 8, 20, -1, 3, 19, 5, -1, 20, -1, -1, 13, -1, 2, 3, -1, 0, 7, 12, 12, 12, 12, 12, 12, 12, -1, 12, 12, 12, 12, 2, 3, -1, 6, 11, -1, -1, 29, -1, 29, 29, 9, -1, 0, 29, 26, -1, 0, -1, -1, 5, 10, -1, 10, -1, 0, -1, -1, 1, -1, -1, 6, 1, 19, 0, 16, -1, -1, -1, -1, 18, 13, -1, 16, 13, 4, 18, -1, -1, 20, 13, 22, -1, 3, 7, 2, 7, 7, 2, 7, 2, 7, 12, 5, 0, 17, 3, 6, 4, 5, -1, -1, 0, 9, -1, -1, 3, -1, -1, 2, -1, 4, 6, 4, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, -1, 1, 4, 16, 4, 4, 26, 0, 11, 18, -1, -1, 23, 0, -1, -1, 3, -1, -1, -1, 7, -1, -1, 7, -1, 20, -1, -1, 1, 28, -1, 25, 7, 25, 26, -1, -1, 0, 6, 16, 3, 7, -1, 20, -1, -1, 22, -1, 30, 16, -1, 16, 26, -1, -1, 12, 6, 6, 9, -1, -1, -1, 27, 9, -1, 17, 17, -1, 1, 1, 17, -1, -1, 17, 17, -1, 17, 20, 19, 22, 6, 0, -1, 4, -1, -1, -1, 0, 9, 16, -1, 7, 8, 2, 29, -1, -1, 4, 12, 14, 0, 23, 7, -1, 6, 13, -1, 13, -1, 13, 13, 13, 13, 13, -1, -1, -1, 4, 5, 19, 1, 19, 5, -1, -1, 26, 13, -1, 12, 12, 12, 12, 12, 12, 12, -1, 12, 11, 7, 13, 22, 15, 6, 0, 2, 10, -1, -1, 0, -1, -1, -1, -1, -1, -1, 21, 4, -1, 24, 20, -1, 0, -1, 20, -1, 25, -1, -1, 26, -1, -1, -1, 19, -1, -1, 0, 19, -1, 25, 0, 20, 6, -1, 31, -1, 3, 7, -1, -1, -1, 18, -1, 4, 29, 19, -1, -1, 6, -1, -1, -1, 21, 1, 3, 1, 20, 1, 1, -1, 13, 1, -1, 23, 11, 2, 25, -1, -1, 9, 2, -1, 5, -1, 22, 28, 0, -1, 0, -1, 1, 1, 1, 1, 1, 1, 1, 1, -1, 2, -1, -1, 25, -1, 25, -1, -1, 2, -1, 31, 11, 21, 7, -1, 7, 5, -1, 25, -1, -1, 6, 10, 22, -1, 26, -1, 21, 3, -1, -1, -1, -1, -1, 25, 2, -1, -1, -1, -1, 6, -1, -1, -1, 2, 2, 2, 2, 0, 2, -1, 23, 26, -1, -1, -1, 4, 6, 7, 23, 8, -1, -1, -1, 0, -1, 3, -1, 21, 2, 2, -1, 7, 7, 7, -1, -1, -1, 17, 7, 7, -1, 7, -1, 29, -1, 0, 16, -1, -1, -1, -1, 21, 0, 
-1, -1, -1, 7, -1, 7, -1, 6, 11, 31, 15, 31, 11, 31, 31, -1, -1, -1, 9, 11, 28, -1, -1, -1, -1, 0, -1, 0, -1, 2, 0, 25, 0, 28, -1, 0, -1, 0, 15, -1, 2, 29, -1, -1, 28, -1, -1, 2, -1, -1, 2, -1, 21, 13, 4, -1, 4, -1, -1, -1, -1, -1, 2, 2, -1, 7, 7, 6, 25, -1, -1, -1, -1, 9, 21, 14, 7, 22, -1, 3, 3, 3, 3, -1, 6, 19, 3, 22, -1, 4, -1, -1, -1, 22, 22, 19, 1, 1, -1, -1, -1, 31, -1, 23, 31, -1, 16, -1, -1, 22, -1, 13, 21, -1, -1, 7, 14, -1, 21, 21, -1, -1, 7, 21, -1, 22, 21, -1, 7, -1, 24, -1, -1, -1, -1, 23, -1, 16, -1, -1, -1, 4, -1, 16, 6, -1, 2, -1, 11, 21, 2, 6, 21, 23, 2, 2, 31, -1, -1, -1, -1, -1, 6, -1, 2, 7, 21, -1, 2, 6, -1, -1, 6, 28, -1, -1, -1, -1, 14, -1, -1, -1, -1, 14, -1, -1, 14, -1, 14, 13, 22, 5, -1, -1, -1, -1, 7, -1, -1, -1, 19, -1, -1, 9, -1, 15, -1, 28, 15, 13, 14, -1, 14, 8, 9, 14, 15, -1, 14, -1, 14, -1, 31, 14, 14, 9, 14, 6, -1, -1, -1, 14, 14, -1, 14, -1, 27, 14, 14, 14, 16, 14, 14, 14, 3, 15, 15, 28, 15, -1, -1, 15, 15, 15, 15, -1, -1, 15, 15, 15, 15, 15, 15, 15, -1, 15, 19, -1, 15, 15, 22, 22, 22]

We can also retrieve the per-document topic probability distributions, which show how strongly each topic is associated with each document.

# Get probabilities for each topic
probs = topic_model.probabilities_
print('Topic probabilities for the first document:')
print(probs[0].round(2))
print()
# Print topic probabilities for the first 15 documents
for i in range(min(15, len(docs))):
    print(f'Document {i + 1} is in topic {doc_topic[i]}')
    print(f'Topic probabilities for Document {i + 1}:')
    print(probs[i].round(3))
    print()
Topic probabilities for the first document:
[0.   0.   0.   0.   0.   0.   0.   0.   0.   0.01 0.01 0.01 0.   0.
 0.   0.   0.01 0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.01
 0.   0.   0.01 0.01]

Document 1 is in topic -1
Topic probabilities for Document 1:
[0.003 0.003 0.003 0.004 0.002 0.003 0.003 0.003 0.003 0.006 0.007 0.007
 0.003 0.004 0.005 0.004 0.005 0.002 0.004 0.004 0.002 0.004 0.003 0.003
 0.003 0.003 0.004 0.008 0.003 0.004 0.006 0.005]

Document 2 is in topic 30
Topic probabilities for Document 2:
[0.012 0.014 0.008 0.013 0.007 0.017 0.011 0.011 0.02  0.037 0.024 0.017
 0.008 0.016 0.015 0.01  0.03  0.006 0.009 0.016 0.007 0.011 0.017 0.009
 0.014 0.009 0.019 0.025 0.012 0.015 0.55  0.012]

Document 3 is in topic -1
Topic probabilities for Document 3:
[0.004 0.004 0.002 0.003 0.002 0.004 0.002 0.003 0.005 0.004 0.003 0.002
 0.002 0.003 0.002 0.002 0.006 0.002 0.002 0.003 0.002 0.003 0.004 0.002
 0.005 0.002 0.006 0.004 0.003 0.005 0.006 0.002]

Document 4 is in topic -1
Topic probabilities for Document 4:
[0.003 0.003 0.002 0.004 0.002 0.004 0.003 0.002 0.003 0.007 0.018 0.009
 0.002 0.005 0.006 0.003 0.007 0.001 0.003 0.005 0.002 0.003 0.004 0.002
 0.003 0.002 0.004 0.006 0.003 0.003 0.007 0.005]

Document 5 is in topic 24
Topic probabilities for Document 5:
[0.077 0.035 0.014 0.057 0.013 0.023 0.02  0.015 0.017 0.019 0.02  0.016
 0.012 0.016 0.014 0.011 0.028 0.012 0.012 0.026 0.014 0.027 0.021 0.026
 0.151 0.016 0.049 0.024 0.033 0.027 0.022 0.012]

Document 6 is in topic -1
Topic probabilities for Document 6:
[0.02  0.021 0.014 0.023 0.012 0.024 0.017 0.018 0.027 0.069 0.047 0.034
 0.014 0.024 0.028 0.019 0.05  0.01  0.016 0.028 0.011 0.019 0.024 0.016
 0.022 0.015 0.032 0.057 0.019 0.022 0.124 0.023]

Document 7 is in topic 1
Topic probabilities for Document 7:
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0.]

Document 8 is in topic 5
Topic probabilities for Document 8:
[0.015 0.042 0.01  0.015 0.009 0.088 0.018 0.01  0.036 0.019 0.018 0.013
 0.009 0.02  0.013 0.009 0.024 0.009 0.011 0.019 0.01  0.012 0.031 0.012
 0.024 0.01  0.018 0.016 0.014 0.017 0.024 0.011]

Document 9 is in topic -1
Topic probabilities for Document 9:
[0.028 0.122 0.014 0.031 0.014 0.056 0.027 0.015 0.028 0.025 0.025 0.019
 0.013 0.025 0.016 0.012 0.034 0.013 0.015 0.026 0.012 0.021 0.033 0.02
 0.066 0.015 0.032 0.024 0.024 0.03  0.03  0.014]

Document 10 is in topic -1
Topic probabilities for Document 10:
[0.013 0.015 0.007 0.019 0.007 0.017 0.016 0.009 0.014 0.026 0.125 0.052
 0.007 0.02  0.028 0.016 0.038 0.006 0.012 0.03  0.009 0.013 0.02  0.011
 0.016 0.008 0.018 0.031 0.014 0.013 0.028 0.023]

Document 11 is in topic -1
Topic probabilities for Document 11:
[0.01  0.016 0.006 0.012 0.005 0.016 0.01  0.007 0.016 0.029 0.021 0.012
 0.006 0.012 0.011 0.007 0.04  0.005 0.007 0.015 0.005 0.008 0.014 0.007
 0.014 0.006 0.015 0.018 0.009 0.01  0.023 0.008]

Document 12 is in topic -1
Topic probabilities for Document 12:
[0.014 0.012 0.006 0.05  0.006 0.01  0.011 0.007 0.009 0.012 0.018 0.014
 0.006 0.01  0.01  0.008 0.019 0.005 0.007 0.018 0.006 0.014 0.011 0.011
 0.017 0.007 0.018 0.023 0.015 0.011 0.014 0.01 ]

Document 13 is in topic -1
Topic probabilities for Document 13:
[0.026 0.027 0.012 0.031 0.011 0.027 0.019 0.015 0.026 0.041 0.04  0.026
 0.012 0.021 0.022 0.014 0.151 0.01  0.012 0.043 0.012 0.02  0.027 0.017
 0.035 0.014 0.055 0.054 0.023 0.024 0.049 0.017]

Document 14 is in topic 5
Topic probabilities for Document 14:
[0.016 0.048 0.011 0.017 0.01  0.108 0.02  0.012 0.039 0.021 0.02  0.015
 0.01  0.022 0.014 0.01  0.026 0.01  0.012 0.021 0.011 0.014 0.034 0.013
 0.027 0.011 0.02  0.017 0.015 0.018 0.026 0.012]

Document 15 is in topic 0
Topic probabilities for Document 15:
[0.055 0.017 0.017 0.023 0.014 0.014 0.013 0.02  0.013 0.013 0.013 0.011
 0.014 0.012 0.01  0.008 0.018 0.012 0.009 0.017 0.015 0.026 0.014 0.026
 0.028 0.021 0.04  0.017 0.032 0.031 0.016 0.009]
# Get the lists of keywords under each topic
topic_keywords = topic_model.get_topics()

# Print the lists of keywords for each topic
for topic_id, keywords in topic_keywords.items():
    keywords = [(u, round(v, 3)) for u, v in keywords]
    print(f'Topic {topic_id}: {keywords}')

We can also examine the topics more closely.

# To see the first 5 topics
freq = topic_model.get_topic_info()
freq.head(5)
Topic Count Name Representation Representative_Docs
0 -1 544 -1_the_of_and_in [the, of, and, in, to, that, we, on, this, with] [In recent years, there has been an increasing...
1 0 61 0_knowledge_transfer_and_the [knowledge, transfer, and, the, of, on, to, th... [This paper proposes a conceptual framework de...
2 1 52 1_international_entrepreneurial_internationali... [international, entrepreneurial, international... [Grounded in the resource-based view of the fi...
3 2 43 2_career_expatriates_expatriate_assignments [career, expatriates, expatriate, assignments,... [Creating organizational processes which nurtu...
4 3 38 3_acquisitions_acquisition_crossborder_acquirers [acquisitions, acquisition, crossborder, acqui... [This study develops and tests a framework abo...

Note that topic -1 is BERTopic's outlier topic: documents assigned -1 were not confidently grouped into any cluster.
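Since -1 marks these ungrouped documents, it is worth checking how large the outlier group is before interpreting the topics; a quick sketch over a hypothetical `topics_` list:

```python
from collections import Counter

# Hypothetical topic assignments, as returned by topic_model.topics_
topics = [-1, 30, -1, 24, 1, 5, -1, 0, 8, -1]

counts = Counter(topics)
outlier_share = counts[-1] / len(topics)
print(f'{counts[-1]} of {len(topics)} documents are outliers ({outlier_share:.0%})')
# 4 of 10 documents are outliers (40%)
```

If the outlier share is large, recent BERTopic versions also offer a `reduce_outliers` method for reassigning such documents to their nearest topic (check the version you have installed).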