Step 9 — Visualizing Word Co-occurrence (Bigrams) as Networks in Python
Introduction
After learning how to count word co-occurrence (bigrams) in Step 8, it’s time to see these relationships visually. Visualizing bigrams as a network (graph) helps you quickly spot the strongest word associations in your data. In this article, you’ll learn how to turn your bigram co-occurrence counts into an interpretable network using Python’s networkx and matplotlib libraries.
Main concept explained clearly
A network graph (also called a “graph” in mathematics and computer science) is a collection of points (nodes) connected by lines (edges). In NLP bigram analysis:
- Each node represents a unique word.
- Each edge between two nodes represents a bigram (a pair of consecutive words).
- The weight of each edge is the count of how often that bigram appeared.
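To make this mapping concrete, here is a minimal sketch in plain Python. The `bigram_counts` dictionary and its counts are hypothetical values invented for illustration, not results from a real corpus. The sketch derives the set of nodes and a list of weighted edges from the counts.

```python
# Hypothetical bigram counts, invented for illustration only.
bigram_counts = {
    ("natural", "language"): 1,
    ("language", "processing"): 2,
    ("processing", "makes"): 1,
}

# Nodes: every unique word that appears in any bigram.
nodes = sorted({word for pair in bigram_counts for word in pair})

# Edges: one (word1, word2, weight) triple per bigram.
edges = [(w1, w2, count) for (w1, w2), count in bigram_counts.items()]

print(nodes)
for w1, w2, weight in edges:
    print(f"{w1} -- {w2} (weight {weight})")
```

A graph library such as networkx stores the same information, but also provides layout and drawing functions on top of it.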
By converting bigrams and their counts into a network, you can:
- Spot clusters of terms that occur together.
- Identify “hub words” that connect different concepts.
- Visualize the most meaningful relationships in your text.
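One possible way to look for hub words is to rank nodes by weighted degree, that is, the sum of the weights on the edges attached to each word. The sketch below assumes a small, made-up edge list (`weighted_edges` is a hypothetical name and the counts are invented); weighted degree is only one of several reasonable hub measures.

```python
import networkx as nx

# Hypothetical weighted bigram edges, invented for illustration only.
weighted_edges = [
    ("natural", "language", 1),
    ("language", "processing", 3),
    ("processing", "makes", 1),
    ("human", "language", 1),
    ("language", "complex", 1),
]

G = nx.Graph()
G.add_weighted_edges_from(weighted_edges)

# Weighted degree: the sum of edge weights attached to each word.
strength = dict(G.degree(weight="weight"))

# Rank words from most to least connected.
hubs = sorted(strength.items(), key=lambda item: item[1], reverse=True)
for word, score in hubs[:3]:
    print(word, score)
```

In this toy data, "language" ranks first because it takes part in four of the five bigrams.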
Why this matters in NLP
- Word networks capture context and association that isolated word counts miss.
- Visual networks help you understand text structure, spot common phrases, and find emerging topics.
- This approach is a first step toward topic modeling, phrase extraction, or even building word embeddings.
- Network graphs are valuable for presentations, making text analysis results more accessible and impactful.
Python example
Let’s turn a small text into a bigram network graph!
Step 9.1 — Prepare bigram data
Reuse your filtering and bigram code:
    import string

    text = "Natural language processing makes machines understand human language. Processing language is complex and interesting."

    stop_words = set([
        'the', 'is', 'in', 'it', 'by', 'and', 'a', 'of', 'to', 'for', 'on', 'o', 'a', 'de', 'em', 'para'
    ])

    def preprocess(text):
        clean = text.strip().lower()
        translator = str.maketrans('', '', string.punctuation)
        no_punct = clean.translate(translator)
        tokens = no_punct.split()
        return [w for w in tokens if w not in stop_words]

    tokens = preprocess(text)

    bigram_counts = {}
    for i in range(len(tokens) - 1):
        pair = (tokens[i], tokens[i+1])
        bigram_counts[pair] = bigram_counts.get(pair, 0) + 1
Step 9.2 — Build and display the bigram network
Install networkx and matplotlib if you haven’t already:
pip install networkx matplotlib
Now plot the graph:
```python name=step09_bigram_network_graph.py
import string

import networkx as nx
import matplotlib.pyplot as plt

text = "Natural language processing makes machines understand human language. Processing language is complex and interesting."

stop_words = set([
    'the', 'is', 'in', 'it', 'by', 'and', 'a', 'of', 'to', 'for', 'on', 'o', 'de', 'em', 'para'
])

def preprocess(text):
    clean = text.strip().lower()
    translator = str.maketrans('', '', string.punctuation)
    no_punct = clean.translate(translator)
    tokens = no_punct.split()
    return [w for w in tokens if w not in stop_words]

tokens = preprocess(text)

# Count bigrams
bigram_counts = {}
for i in range(len(tokens) - 1):
    pair = (tokens[i], tokens[i+1])
    bigram_counts[pair] = bigram_counts.get(pair, 0) + 1

# Create network graph
G = nx.Graph()
for (w1, w2), count in bigram_counts.items():
    if G.has_edge(w1, w2):
        # Undirected graph: (a, b) and (b, a) share one edge, so add the counts
        G[w1][w2]['weight'] += count
    else:
        G.add_edge(w1, w2, weight=count)

# Draw graph
plt.figure(figsize=(8, 5))
pos = nx.spring_layout(G, seed=42)  # reproducible layout
edges = G.edges()
weights = [G[u][v]['weight'] for u, v in edges]
nx.draw(G, pos, with_labels=True, width=weights, node_color='skyblue', edge_color='gray', font_size=12)
plt.title("Bigram Network Graph")
plt.show()
```
Line-by-line explanation of the code
- Preprocessing and bigram counting: the same process as in Step 8. Prepare clean tokens, then count bigram frequencies.
- G = nx.Graph(): creates a new, empty undirected graph. In an undirected graph, the pairs (w1, w2) and (w2, w1) share a single edge.
- G.add_edge(w1, w2, weight=count): adds an edge for each bigram, with its frequency as the weight.
- nx.spring_layout(G, seed=42): computes node positions with a force-directed (spring) model; the fixed seed makes the layout reproducible.
- nx.draw(...): draws the network graph, setting node color, edge thickness (from the weights), and label font size.
- plt.title(...), plt.show(): standard matplotlib chart labeling and display.
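Because nx.Graph() is undirected, word order inside a bigram is not preserved in the graph. If order matters for your analysis, one option (an alternative to the code above, not part of it) is a directed graph. The sketch below uses networkx's DiGraph class with hypothetical counts invented for illustration.

```python
import networkx as nx

# Hypothetical counts: the same two words observed in both orders.
bigram_counts = {
    ("language", "processing"): 2,
    ("processing", "language"): 1,
}

D = nx.DiGraph()
for (w1, w2), count in bigram_counts.items():
    # Each edge points from the first word of the bigram to the second.
    D.add_edge(w1, w2, weight=count)

print(D["language"]["processing"]["weight"])
print(D["processing"]["language"]["weight"])
```

Here the two orderings stay separate edges, each with its own weight, at the cost of a denser drawing with arrows.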
Practical notes
- You can build and visualize much larger networks from real datasets—just limit the number of edges to keep the figure readable.
- Edge thickness will vary if some bigrams are repeated; higher weights lead to thicker lines.
- Use G.number_of_edges() or G.number_of_nodes() to check network size.
- Try using different layouts (nx.circular_layout, nx.kamada_kawai_layout) for fun.
- For Portuguese, just update your stop word set and text.
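As a rough sketch of the first and third notes, the snippet below keeps only the most frequent bigrams before building the graph and then checks its size. The counts and the `TOP_N` cutoff are assumptions made up for this example; choose a cutoff that suits your own data and figure size.

```python
import networkx as nx

# Hypothetical bigram counts, invented for illustration only.
bigram_counts = {
    ("machine", "learning"): 5,
    ("deep", "learning"): 4,
    ("learning", "models"): 3,
    ("neural", "networks"): 2,
    ("training", "data"): 1,
}

TOP_N = 3  # assumed cutoff, not a recommended value

# Sort bigrams by count (highest first) and keep the first TOP_N.
top_bigrams = sorted(
    bigram_counts.items(), key=lambda item: item[1], reverse=True
)[:TOP_N]

G = nx.Graph()
for (w1, w2), count in top_bigrams:
    G.add_edge(w1, w2, weight=count)

# Check how large the filtered network is before drawing it.
print(G.number_of_nodes(), G.number_of_edges())
```

Filtering before building the graph keeps rare bigrams from cluttering the figure.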
Suggested mini exercise
- Change the sample text to a small paragraph in Portuguese and see how the graph changes.
- Use a longer text, then plot only the top 10 or 20 most frequent bigrams by filtering bigram_counts first.
- Try playing with color, font size, or different networkx layout functions.
- Add a printout of the ten most frequent bigrams below the chart.
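As a possible starting point for the last exercise, the standard library's collections.Counter can count bigrams and return the most frequent ones. The token list below is hypothetical and invented for illustration; in your own script you would use the output of preprocess(text) instead.

```python
from collections import Counter

# Hypothetical token list, invented for illustration only.
tokens = [
    "language", "processing", "language", "processing",
    "makes", "language", "processing",
]

# zip pairs each token with its successor, producing consecutive bigrams.
bigram_counts = Counter(zip(tokens, tokens[1:]))

# most_common(10) returns up to ten (bigram, count) pairs, highest count first.
for (w1, w2), count in bigram_counts.most_common(10):
    print(f"{w1} {w2}: {count}")
```

Counter behaves like the dictionary used earlier, so it can be passed to the same graph-building loop.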
Conclusion
Visualizing word co-occurrence as a network makes relationships in your data leap off the page. You’ve now learned to move from numbers to structure, uncovering the clusters and connections that shape real-world text. With these skills, you’re ready to explore document structure, context, and semantics at a higher level—setting up for collocations, keyphrase extraction, or word embedding models in future steps.
