Topic Modeling

Data

Below is a snapshot of the data we will be working with: Journal of World Business (JWB) article metadata, 1967–2025, downloaded from Scopus.

import pandas as pd
data = pd.read_csv('../data/jwb-articles.csv')
data = data[data['Abstract'].notna()] # Keep nonempty abstracts
data.head()
   Authors                                            Title                                               Year  Document Type
0  Al Asady, A.; Anokhin, S.                          The Trojan horse of international entrepreneur...   2025  Article
1  Thams, Y.; Dau, L.A.; Doh, J.; Kostova, T.; Ne...  Political ideology and the multinational enter...   2025  Short survey
2  Lindner, T.; Puck, J.; Puhr, H.                    Artificial intelligence in international busin...   2025  Short survey
3  Bruton, G.D.; Mejía-Morelos, J.H.; Ahlstrom, D.    Multinational corporations and inclusive suppl...   2025  Article
4  Liang, Y.; Giroud, A.; Rygh, A.; Chen, Z.          Political embeddedness and post-acquisition in...   2025  Article

5 rows × 41 columns (selected columns shown; the full table also includes Source title, Volume, Issue, ISSN, Language, Publication Stage, Open Access, and EID)

Latent Dirichlet Allocation (LDA)

Text preprocessing: tokenize abstracts, remove punctuation and stop words, and store cleaned tokens.

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, RegexpTokenizer

# Download the required NLTK resources on first use
nltk.download('punkt', quiet=True)
nltk.download('stopwords', quiet=True)

tokenizer = RegexpTokenizer(r'\w+')           # keep word characters only
stop_words = set(stopwords.words('english'))  # set membership tests are O(1)

docs = []
for abstract in data['Abstract']:
    tokens = word_tokenize(abstract.lower())       # tokenize the lowercased abstract
    tokens = tokenizer.tokenize(' '.join(tokens))  # drop punctuation-only tokens
    docs.append([word for word in tokens if word not in stop_words])

Before we fit the LDA model, we construct a dictionary that maps each unique word to an integer id, and convert each document to a bag-of-words representation.

import gensim
from gensim.models.ldamodel import LdaModel
from gensim import corpora

lda_dict = corpora.Dictionary(docs)
print('The number of unique words:', len(lda_dict))
print(lda_dict)
The number of unique words: 8944
Dictionary<8944 unique tokens: ['activities', 'affect', 'aims', 'also', 'argues']...>
lda_doc_corpus = [lda_dict.doc2bow(doc) for doc in docs]
print(lda_doc_corpus[0])
[(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 1), (7, 2), (8, 3), (9, 1), (10, 1), (11, 1), (12, 1), (13, 1), (14, 2), (15, 1), (16, 1), (17, 1), (18, 1), (19, 1), (20, 1), (21, 1), (22, 1), (23, 1), (24, 3), (25, 2), (26, 2), (27, 1), (28, 1), (29, 1), (30, 1), (31, 1), (32, 1), (33, 1), (34, 1), (35, 1), (36, 1), (37, 1), (38, 1), (39, 1), (40, 2), (41, 1), (42, 1), (43, 1), (44, 1), (45, 1), (46, 1)]
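Each pair above is (token_id, count). To make the encoding concrete, here is a pure-Python sketch of what `Dictionary` and `doc2bow` do, using a made-up toy corpus rather than the real abstracts:

```python
from collections import Counter

# Toy corpus of two tokenized documents (hypothetical tokens)
docs = [['firms', 'performance', 'firms'], ['performance', 'global']]

# Build a token -> id mapping, like gensim's corpora.Dictionary
vocab = sorted({tok for doc in docs for tok in doc})
token2id = {tok: i for i, tok in enumerate(vocab)}

def doc2bow(doc):
    """Map a tokenized document to sorted (token_id, count) pairs."""
    counts = Counter(doc)
    return sorted((token2id[t], c) for t, c in counts.items())

print(doc2bow(docs[0]))  # [(0, 2), (2, 1)] -> 'firms' twice, 'performance' once
```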

Now we train an LDA model to identify latent topics in the abstracts.

lda = LdaModel(corpus=lda_doc_corpus, id2word=lda_dict, num_topics=5,
               random_state=42, passes=40, alpha=10, eta=0.1)

We can examine the fitted model by listing the most heavily weighted words for each topic.

lda.show_topics()
[(0,
  '0.010*"foreign" + 0.009*"institutional" + 0.009*"countries" + 0.008*"performance" + 0.008*"firms" + 0.007*"study" + 0.007*"chinese" + 0.007*"international" + 0.007*"relationship" + 0.006*"knowledge"'),
 (1,
  '0.019*"international" + 0.014*"research" + 0.013*"business" + 0.010*"global" + 0.007*"ib" + 0.006*"knowledge" + 0.005*"new" + 0.005*"literature" + 0.005*"future" + 0.004*"study"'),
 (2,
  '0.013*"subsidiary" + 0.008*"firm" + 0.007*"subsidiaries" + 0.007*"performance" + 0.006*"firms" + 0.006*"foreign" + 0.006*"results" + 0.005*"global" + 0.005*"knowledge" + 0.005*"family"'),
 (3,
  '0.009*"cultural" + 0.008*"research" + 0.007*"study" + 0.007*"country" + 0.007*"political" + 0.007*"leadership" + 0.006*"multinational" + 0.006*"mncs" + 0.005*"performance" + 0.005*"institutional"'),
 (4,
  '0.025*"firms" + 0.010*"management" + 0.010*"international" + 0.009*"market" + 0.008*"study" + 0.008*"firm" + 0.006*"performance" + 0.006*"internationalization" + 0.005*"markets" + 0.005*"talent"')]
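If you want the weights as numbers rather than strings, gensim can return (word, weight) tuples directly via `lda.show_topics(formatted=False)`. Alternatively, a small helper (illustrative, not part of gensim) can parse the formatted strings shown above:

```python
def parse_topic(topic_str):
    """Split a gensim topic string like '0.010*"foreign" + 0.009*"firms"'
    into (word, weight) pairs."""
    pairs = []
    for term in topic_str.split(' + '):
        weight, word = term.split('*')
        pairs.append((word.strip('"'), float(weight)))
    return pairs

print(parse_topic('0.019*"international" + 0.014*"research"'))
# [('international', 0.019), ('research', 0.014)]
```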

To visualize the topic probabilities of the first 15 abstracts as a heatmap, we can run the following code.

import matplotlib.pyplot as plt

# minimum_probability=0 ensures every topic appears in every row,
# so all rows of the heatmap have the same length
doc_topics = lda.get_document_topics(lda_doc_corpus, minimum_probability=0)

all_probs = []
for doc_i in range(15):
    doc_probs = doc_topics[doc_i]
    print(doc_probs)
    all_probs.append([prob for (topic, prob) in doc_probs])

plt.imshow(all_probs)
plt.colorbar()
plt.xlabel('Topic')
plt.ylabel('Document')
plt.show()
[(0, 0.19223882), (1, 0.280424), (2, 0.18413724), (3, 0.17475025), (4, 0.1684497)]
[(0, 0.09653838), (1, 0.35228702), (2, 0.11452496), (3, 0.3348996), (4, 0.10175001)]
[(0, 0.14665498), (1, 0.25366387), (2, 0.13557251), (3, 0.26665145), (4, 0.19745715)]
[(0, 0.12201825), (1, 0.20950222), (2, 0.121986), (3, 0.1319087), (4, 0.4145848)]
[(0, 0.28674793), (1, 0.1418315), (2, 0.18685031), (3, 0.1600391), (4, 0.22453114)]
[(0, 0.17187466), (1, 0.14167556), (2, 0.32062945), (3, 0.23193854), (4, 0.13388178)]
[(0, 0.11212903), (1, 0.22102064), (2, 0.094602), (3, 0.1505873), (4, 0.421661)]
[(0, 0.14603102), (1, 0.107232735), (2, 0.11303985), (3, 0.11569671), (4, 0.5179997)]
[(0, 0.3068515), (1, 0.18931858), (2, 0.15783337), (3, 0.15932915), (4, 0.18666743)]
[(0, 0.3844842), (1, 0.16181146), (2, 0.15116443), (3, 0.13615756), (4, 0.16638234)]
[(0, 0.16029285), (1, 0.2216254), (2, 0.11499759), (3, 0.14059122), (4, 0.36249295)]
[(0, 0.4805228), (1, 0.09621751), (2, 0.14083625), (3, 0.17901455), (4, 0.10340884)]
[(0, 0.2092328), (1, 0.13705625), (2, 0.22701237), (3, 0.25130144), (4, 0.17539714)]
[(0, 0.20451798), (1, 0.3667167), (2, 0.11523414), (3, 0.11886554), (4, 0.19466569)]
[(0, 0.11036309), (1, 0.1859023), (2, 0.15423217), (3, 0.41059655), (4, 0.13890587)]
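Each printed row is a list of (topic, probability) pairs. To label a document with its single most likely topic, take the argmax over the probabilities; a minimal sketch using the first row's (rounded) values:

```python
def dominant_topic(doc_probs):
    """Return the topic id with the highest probability."""
    return max(doc_probs, key=lambda pair: pair[1])[0]

# First document's distribution from the output above (rounded)
doc_probs = [(0, 0.192), (1, 0.280), (2, 0.184), (3, 0.175), (4, 0.168)]
print(dominant_topic(doc_probs))  # 1
```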

BERTopic

We first extract the abstracts from the DataFrame and fit a BERTopic model.

from bertopic import BERTopic

# Use the same DataFrame as above; cast to str as a guard against stray NaNs
docs = data['Abstract'].tolist()
docs = [str(doc) for doc in docs]

topic_model = BERTopic(language='english', calculate_probabilities=True, verbose=True)
topic_model.fit(docs)

We can retrieve the topic assignments from the trained BERTopic model: `topics_` holds one topic id per document.

doc_topic = topic_model.topics_
print('Topic assignment per document:')
print(doc_topic)
Topic assignment per document:
[-1, 30, -1, -1, 24, -1, 1, 5, -1, -1, -1, -1, -1, 5, 0, 8, -1, 5, 9, 10, 9, 3, -1, 30, -1, -1, -1, -1, 19, -1, -1, 8, 8, -1, -1, -1, -1, 9, 1, 9, 30, 5, -1, 29, 12, -1, -1, -1, 1, -1, 8, 3, 1, 17, -1, 10, 4, 10, -1, 1, 1, 18, -1, 19, -1, 11, 3, 8, 0, 18, 18, 18, 30, 17, -1, -1, 18, 2, 9, 8, 30, 9, 18, 8, -1, 0, -1, 9, 5, 5, 28, 8, -1, -1, 2, -1, 19, -1, -1, 0, 18, -1, -1, -1, -1, 23, 8, 17, 18, -1, -1, -1, 9, -1, 9, 1, 30, 5, -1, 5, 28, -1, 2, 8, -1, -1, 8, 8, 9, -1, -1, -1, 12, -1, -1, 16, 8, -1, 24, -1, 6, -1, 9, 22, 10, -1, 10, 6, 9, -1, -1, 3, -1, -1, -1, 9, -1, -1, 3, -1, 10, 0, 0, 0, 23, 0, 1, -1, 24, -1, -1, 27, 4, 3, -1, -1, 27, 24, -1, 5, 9, -1, -1, 0, 1, -1, 18, 8, -1, 1, 28, -1, 9, 13, -1, -1, -1, 5, -1, 21, 17, 1, 18, 27, 1, 8, 9, 1, -1, -1, 0, 9, 2, -1, -1, 1, -1, -1, -1, -1, 24, 14, -1, -1, 5, 23, 6, 5, 1, -1, -1, 3, -1, -1, -1, 27, 17, -1, -1, -1, -1, 1, -1, 18, -1, -1, 1, 10, 10, -1, -1, -1, -1, 10, 9, 8, 1, 17, 23, 17, 29, -1, -1, 1, 5, -1, 5, -1, -1, -1, -1, -1, 13, 19, 17, -1, -1, -1, -1, -1, 29, 30, -1, 18, 1, 10, -1, 5, 27, 3, 9, 27, -1, 10, 9, -1, -1, 0, 0, -1, -1, -1, 29, 9, -1, -1, -1, 16, 5, 16, 1, 1, 27, -1, 1, 20, 3, -1, -1, 23, 26, 27, -1, 3, 8, -1, -1, 10, 20, -1, 11, 19, 11, 11, -1, -1, -1, 3, 17, 0, 10, 0, 6, 23, 3, 5, -1, 11, 12, -1, -1, 0, -1, 5, 28, -1, -1, -1, -1, -1, 16, -1, 13, 0, -1, 8, -1, 10, 26, -1, 5, 0, 3, -1, 10, 10, 20, 11, 10, -1, 10, -1, 6, 0, 24, 26, -1, 3, 11, 26, -1, 4, 0, 0, 8, -1, -1, 8, -1, 23, -1, -1, -1, 3, -1, 13, 18, -1, -1, 19, 8, 6, 27, -1, -1, -1, 11, 10, 5, -1, -1, -1, -1, -1, -1, 6, 8, 21, -1, -1, -1, 11, 24, -1, -1, -1, -1, 30, 8, 4, 5, 5, 2, 5, 11, -1, -1, -1, -1, -1, 4, -1, -1, 30, -1, -1, 18, -1, 0, -1, 0, 20, 17, 6, 1, 17, -1, 27, 6, 3, -1, -1, 11, 25, 2, -1, 17, 16, 24, 20, -1, -1, -1, -1, -1, -1, 11, -1, 11, 11, -1, 5, 10, -1, 0, -1, -1, 24, 10, -1, -1, -1, 25, -1, -1, 24, 11, 5, -1, 0, -1, 10, -1, 18, 4, 20, 8, 18, -1, -1, -1, 2, -1, -1, 10, -1, 3, 5, 1, -1, 16, 7, 12, -1, 3, -1, 6, 27, 23, 24, 
11, 16, -1, 10, -1, 8, 12, 8, 2, 8, -1, 4, 11, 8, -1, 8, 1, 16, -1, 16, -1, -1, 9, -1, 11, 0, 3, -1, -1, -1, -1, 1, 3, 1, 16, -1, -1, 13, 0, -1, 6, 1, -1, 20, 24, 24, 29, -1, -1, -1, 1, 30, -1, 0, -1, 4, 16, 20, 28, 0, 19, -1, -1, -1, 2, -1, 13, -1, -1, 13, -1, 30, 25, 0, -1, -1, 0, 2, 30, 0, 26, -1, 17, 4, -1, 11, -1, 17, 20, 10, 8, 20, -1, 3, 19, 5, -1, 20, -1, -1, 13, -1, 2, 3, -1, 0, 7, 12, 12, 12, 12, 12, 12, 12, -1, 12, 12, 12, 12, 2, 3, -1, 6, 11, -1, -1, 29, -1, 29, 29, 9, -1, 0, 29, 26, -1, 0, -1, -1, 5, 10, -1, 10, -1, 0, -1, -1, 1, -1, -1, 6, 1, 19, 0, 16, -1, -1, -1, -1, 18, 13, -1, 16, 13, 4, 18, -1, -1, 20, 13, 22, -1, 3, 7, 2, 7, 7, 2, 7, 2, 7, 12, 5, 0, 17, 3, 6, 4, 5, -1, -1, 0, 9, -1, -1, 3, -1, -1, 2, -1, 4, 6, 4, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, -1, 1, 4, 16, 4, 4, 26, 0, 11, 18, -1, -1, 23, 0, -1, -1, 3, -1, -1, -1, 7, -1, -1, 7, -1, 20, -1, -1, 1, 28, -1, 25, 7, 25, 26, -1, -1, 0, 6, 16, 3, 7, -1, 20, -1, -1, 22, -1, 30, 16, -1, 16, 26, -1, -1, 12, 6, 6, 9, -1, -1, -1, 27, 9, -1, 17, 17, -1, 1, 1, 17, -1, -1, 17, 17, -1, 17, 20, 19, 22, 6, 0, -1, 4, -1, -1, -1, 0, 9, 16, -1, 7, 8, 2, 29, -1, -1, 4, 12, 14, 0, 23, 7, -1, 6, 13, -1, 13, -1, 13, 13, 13, 13, 13, -1, -1, -1, 4, 5, 19, 1, 19, 5, -1, -1, 26, 13, -1, 12, 12, 12, 12, 12, 12, 12, -1, 12, 11, 7, 13, 22, 15, 6, 0, 2, 10, -1, -1, 0, -1, -1, -1, -1, -1, -1, 21, 4, -1, 24, 20, -1, 0, -1, 20, -1, 25, -1, -1, 26, -1, -1, -1, 19, -1, -1, 0, 19, -1, 25, 0, 20, 6, -1, 31, -1, 3, 7, -1, -1, -1, 18, -1, 4, 29, 19, -1, -1, 6, -1, -1, -1, 21, 1, 3, 1, 20, 1, 1, -1, 13, 1, -1, 23, 11, 2, 25, -1, -1, 9, 2, -1, 5, -1, 22, 28, 0, -1, 0, -1, 1, 1, 1, 1, 1, 1, 1, 1, -1, 2, -1, -1, 25, -1, 25, -1, -1, 2, -1, 31, 11, 21, 7, -1, 7, 5, -1, 25, -1, -1, 6, 10, 22, -1, 26, -1, 21, 3, -1, -1, -1, -1, -1, 25, 2, -1, -1, -1, -1, 6, -1, -1, -1, 2, 2, 2, 2, 0, 2, -1, 23, 26, -1, -1, -1, 4, 6, 7, 23, 8, -1, -1, -1, 0, -1, 3, -1, 21, 2, 2, -1, 7, 7, 7, -1, -1, -1, 17, 7, 7, -1, 7, -1, 29, -1, 0, 16, -1, -1, -1, -1, 21, 0, 
-1, -1, -1, 7, -1, 7, -1, 6, 11, 31, 15, 31, 11, 31, 31, -1, -1, -1, 9, 11, 28, -1, -1, -1, -1, 0, -1, 0, -1, 2, 0, 25, 0, 28, -1, 0, -1, 0, 15, -1, 2, 29, -1, -1, 28, -1, -1, 2, -1, -1, 2, -1, 21, 13, 4, -1, 4, -1, -1, -1, -1, -1, 2, 2, -1, 7, 7, 6, 25, -1, -1, -1, -1, 9, 21, 14, 7, 22, -1, 3, 3, 3, 3, -1, 6, 19, 3, 22, -1, 4, -1, -1, -1, 22, 22, 19, 1, 1, -1, -1, -1, 31, -1, 23, 31, -1, 16, -1, -1, 22, -1, 13, 21, -1, -1, 7, 14, -1, 21, 21, -1, -1, 7, 21, -1, 22, 21, -1, 7, -1, 24, -1, -1, -1, -1, 23, -1, 16, -1, -1, -1, 4, -1, 16, 6, -1, 2, -1, 11, 21, 2, 6, 21, 23, 2, 2, 31, -1, -1, -1, -1, -1, 6, -1, 2, 7, 21, -1, 2, 6, -1, -1, 6, 28, -1, -1, -1, -1, 14, -1, -1, -1, -1, 14, -1, -1, 14, -1, 14, 13, 22, 5, -1, -1, -1, -1, 7, -1, -1, -1, 19, -1, -1, 9, -1, 15, -1, 28, 15, 13, 14, -1, 14, 8, 9, 14, 15, -1, 14, -1, 14, -1, 31, 14, 14, 9, 14, 6, -1, -1, -1, 14, 14, -1, 14, -1, 27, 14, 14, 14, 16, 14, 14, 14, 3, 15, 15, 28, 15, -1, -1, 15, 15, 15, 15, -1, -1, 15, 15, 15, 15, 15, 15, 15, -1, 15, 19, -1, 15, 15, 22, 22, 22]

We can also retrieve the per-document topic probability distributions, which show how strongly each topic is associated with each document.

# Get probabilities for each topic
probs = topic_model.probabilities_
print('Topic probabilities for the first document:')
print(probs[0].round(2))
print()
# Print topic probabilities for the first 15 documents
for i in range(min(15, len(docs))):
    print(f'Document {i + 1} is in topic {doc_topic[i]}')
    print(f'Topic probabilities for Document {i + 1}:')
    print(probs[i].round(3))
    print()
Topic probabilities for the first document:
[0.   0.   0.   0.   0.   0.   0.   0.   0.   0.01 0.01 0.01 0.   0.
 0.   0.   0.01 0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.01
 0.   0.   0.01 0.01]

Document 1 is in topic -1
Topic probabilities for Document 1:
[0.003 0.003 0.003 0.004 0.002 0.003 0.003 0.003 0.003 0.006 0.007 0.007
 0.003 0.004 0.005 0.004 0.005 0.002 0.004 0.004 0.002 0.004 0.003 0.003
 0.003 0.003 0.004 0.008 0.003 0.004 0.006 0.005]

Document 2 is in topic 30
Topic probabilities for Document 2:
[0.012 0.014 0.008 0.013 0.007 0.017 0.011 0.011 0.02  0.037 0.024 0.017
 0.008 0.016 0.015 0.01  0.03  0.006 0.009 0.016 0.007 0.011 0.017 0.009
 0.014 0.009 0.019 0.025 0.012 0.015 0.55  0.012]

Document 3 is in topic -1
Topic probabilities for Document 3:
[0.004 0.004 0.002 0.003 0.002 0.004 0.002 0.003 0.005 0.004 0.003 0.002
 0.002 0.003 0.002 0.002 0.006 0.002 0.002 0.003 0.002 0.003 0.004 0.002
 0.005 0.002 0.006 0.004 0.003 0.005 0.006 0.002]

Document 4 is in topic -1
Topic probabilities for Document 4:
[0.003 0.003 0.002 0.004 0.002 0.004 0.003 0.002 0.003 0.007 0.018 0.009
 0.002 0.005 0.006 0.003 0.007 0.001 0.003 0.005 0.002 0.003 0.004 0.002
 0.003 0.002 0.004 0.006 0.003 0.003 0.007 0.005]

Document 5 is in topic 24
Topic probabilities for Document 5:
[0.077 0.035 0.014 0.057 0.013 0.023 0.02  0.015 0.017 0.019 0.02  0.016
 0.012 0.016 0.014 0.011 0.028 0.012 0.012 0.026 0.014 0.027 0.021 0.026
 0.151 0.016 0.049 0.024 0.033 0.027 0.022 0.012]

Document 6 is in topic -1
Topic probabilities for Document 6:
[0.02  0.021 0.014 0.023 0.012 0.024 0.017 0.018 0.027 0.069 0.047 0.034
 0.014 0.024 0.028 0.019 0.05  0.01  0.016 0.028 0.011 0.019 0.024 0.016
 0.022 0.015 0.032 0.057 0.019 0.022 0.124 0.023]

Document 7 is in topic 1
Topic probabilities for Document 7:
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0.]

Document 8 is in topic 5
Topic probabilities for Document 8:
[0.015 0.042 0.01  0.015 0.009 0.088 0.018 0.01  0.036 0.019 0.018 0.013
 0.009 0.02  0.013 0.009 0.024 0.009 0.011 0.019 0.01  0.012 0.031 0.012
 0.024 0.01  0.018 0.016 0.014 0.017 0.024 0.011]

Document 9 is in topic -1
Topic probabilities for Document 9:
[0.028 0.122 0.014 0.031 0.014 0.056 0.027 0.015 0.028 0.025 0.025 0.019
 0.013 0.025 0.016 0.012 0.034 0.013 0.015 0.026 0.012 0.021 0.033 0.02
 0.066 0.015 0.032 0.024 0.024 0.03  0.03  0.014]

Document 10 is in topic -1
Topic probabilities for Document 10:
[0.013 0.015 0.007 0.019 0.007 0.017 0.016 0.009 0.014 0.026 0.125 0.052
 0.007 0.02  0.028 0.016 0.038 0.006 0.012 0.03  0.009 0.013 0.02  0.011
 0.016 0.008 0.018 0.031 0.014 0.013 0.028 0.023]

Document 11 is in topic -1
Topic probabilities for Document 11:
[0.01  0.016 0.006 0.012 0.005 0.016 0.01  0.007 0.016 0.029 0.021 0.012
 0.006 0.012 0.011 0.007 0.04  0.005 0.007 0.015 0.005 0.008 0.014 0.007
 0.014 0.006 0.015 0.018 0.009 0.01  0.023 0.008]

Document 12 is in topic -1
Topic probabilities for Document 12:
[0.014 0.012 0.006 0.05  0.006 0.01  0.011 0.007 0.009 0.012 0.018 0.014
 0.006 0.01  0.01  0.008 0.019 0.005 0.007 0.018 0.006 0.014 0.011 0.011
 0.017 0.007 0.018 0.023 0.015 0.011 0.014 0.01 ]

Document 13 is in topic -1
Topic probabilities for Document 13:
[0.026 0.027 0.012 0.031 0.011 0.027 0.019 0.015 0.026 0.041 0.04  0.026
 0.012 0.021 0.022 0.014 0.151 0.01  0.012 0.043 0.012 0.02  0.027 0.017
 0.035 0.014 0.055 0.054 0.023 0.024 0.049 0.017]

Document 14 is in topic 5
Topic probabilities for Document 14:
[0.016 0.048 0.011 0.017 0.01  0.108 0.02  0.012 0.039 0.021 0.02  0.015
 0.01  0.022 0.014 0.01  0.026 0.01  0.012 0.021 0.011 0.014 0.034 0.013
 0.027 0.011 0.02  0.017 0.015 0.018 0.026 0.012]

Document 15 is in topic 0
Topic probabilities for Document 15:
[0.055 0.017 0.017 0.023 0.014 0.014 0.013 0.02  0.013 0.013 0.013 0.011
 0.014 0.012 0.01  0.008 0.018 0.012 0.009 0.017 0.015 0.026 0.014 0.026
 0.028 0.021 0.04  0.017 0.032 0.031 0.016 0.009]
# Get the lists of keywords under each topic
topic_keywords = topic_model.get_topics()

# Print the lists of keywords for each topic
for topic_id, keywords in topic_keywords.items():
    keywords = [(u, round(v, 3)) for u, v in keywords]
    print(f'Topic {topic_id}: {keywords}')

We can also examine the topics more closely.

# To see the first 5 topics
freq = topic_model.get_topic_info()
freq.head(5)
Topic Count Name Representation Representative_Docs
0 -1 544 -1_the_of_and_in [the, of, and, in, to, that, we, on, this, with] [In recent years, there has been an increasing...
1 0 61 0_knowledge_transfer_and_the [knowledge, transfer, and, the, of, on, to, th... [This paper proposes a conceptual framework de...
2 1 52 1_international_entrepreneurial_internationali... [international, entrepreneurial, international... [Grounded in the resource-based view of the fi...
3 2 43 2_career_expatriates_expatriate_assignments [career, expatriates, expatriate, assignments,... [Creating organizational processes which nurtu...
4 3 38 3_acquisitions_acquisition_crossborder_acquirers [acquisitions, acquisition, crossborder, acqui... [This study develops and tests a framework abo...

Note that topic -1 is BERTopic's outlier topic: documents assigned -1 were not confidently grouped into any cluster.
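Since -1 marks these ungrouped documents, it is worth checking how large the outlier group is before interpreting the topics; a quick sketch over a hypothetical `topics_` list:

```python
from collections import Counter

# Hypothetical topic assignments, as returned by topic_model.topics_
topics = [-1, 30, -1, 24, 1, 5, -1, 0, 8, -1]

counts = Counter(topics)
outlier_share = counts[-1] / len(topics)
print(f'{counts[-1]} of {len(topics)} documents are outliers ({outlier_share:.0%})')
# 4 of 10 documents are outliers (40%)
```

If the outlier share is large, recent BERTopic versions also offer a `reduce_outliers` method for reassigning such documents to their nearest topic (check the version you have installed).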