Step 6 — Visualizing Word Frequencies with Bar Charts in Python
Introduction
After learning to filter out stop words and analyze meaningful word frequencies, it’s time to add an essential skill to your NLP toolkit: visualizing word frequencies with bar charts. Visualization transforms raw counts into actionable insights, making patterns, anomalies, and trends easy to understand even in small datasets. In this step, you’ll create your first matplotlib bar chart, a practical and visual way to summarize the most common tokens in any text.
Main concept
Data visualization is the process of presenting quantitative information (like word frequencies) in a graphical, interpretable form. In NLP, visualizing frequencies allows you to:
- See which words dominate the text at a glance.
- Detect unusual or interesting tokens quickly.
- Communicate findings more effectively.
A bar chart is a classic visualization in which each word is represented by a bar whose height is proportional to its frequency. Python’s matplotlib library is a powerful yet beginner-friendly tool for drawing one.
Why this matters in NLP
- Understanding distribution is crucial for filtering, summarizing, and modeling.
- Visualizations reveal frequent words, rare words, and noise that pure numbers may hide.
- Bar charts are a foundation for more advanced plots (histograms, word clouds, etc.).
- They enable early exploratory analysis—helping you decide what preprocessing or cleaning is needed before moving deeper into NLP.
Python example
Let’s build your first bar chart step-by-step. You’ll need to install matplotlib if you don’t have it:
pip install matplotlib
Step 6.1 — Prepare filtered frequency statistics
Reuse the frequency dictionary built from the filtered token list in Step 5. For the sample text used in this step, it looks like this:
word_freq = {'nlp': 3, '2024': 1, 'practical': 1, 'learning': 1, 'step': 2, 'makes': 1, 'easier': 1}
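If the dictionary from Step 5 is not at hand, it can be rebuilt in a couple of lines. The sketch below uses collections.Counter from the standard library instead of the manual counting loop; the filtered_tokens list is an assumption, written out by hand to match the sample sentence after stop word removal.

```python
from collections import Counter

# Assumed input: tokens from the sample sentence after lowercasing,
# punctuation removal, and stop word filtering (as in Step 5).
filtered_tokens = [
    "nlp", "2024", "nlp", "practical", "learning",
    "nlp", "step", "step", "makes", "easier",
]

# Counter tallies each token; dict() turns the result into a plain dictionary.
word_freq = dict(Counter(filtered_tokens))
print(word_freq)
```

Either approach produces the same mapping of word to count, so the rest of this step works unchanged.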
Step 6.2 — Select the top N words for plotting
sorted_words = sorted(word_freq.items(), key=lambda item: item[1], reverse=True)
top_n = 5
top_words = sorted_words[:top_n]
words = [word for word, count in top_words]
counts = [count for word, count in top_words]
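Several words in the sample share a count of 1, so which of them reach the chart depends on how ties are resolved. Python's sorted() is stable, which means tied words keep the order they had in the dictionary. If alphabetical order among ties is preferred, one option (a sketch, not the only approach) is to sort on a tuple of negative count and word:

```python
word_freq = {'nlp': 3, '2024': 1, 'practical': 1, 'learning': 1,
             'step': 2, 'makes': 1, 'easier': 1}

# Sort by count (descending), then alphabetically among equal counts.
sorted_words = sorted(word_freq.items(), key=lambda item: (-item[1], item[0]))

top_n = 5
top_words = sorted_words[:top_n]
words = [word for word, count in top_words]
counts = [count for word, count in top_words]

print(words)   # ['nlp', 'step', '2024', 'easier', 'learning']
print(counts)  # [3, 2, 1, 1, 1]
```

A deterministic tie-break like this keeps the chart identical between runs, even if the tokens are later collected in a different order.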
Step 6.3 — Plot the bar chart
import matplotlib.pyplot as plt
plt.bar(words, counts)
plt.xlabel('Word')
plt.ylabel('Frequency')
plt.title('Top 5 Words')
plt.show()
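The same chart can also be built with matplotlib's object-oriented interface, which is convenient when the figure should be saved to a file because no display window is available (for example, on a remote server). This is a minimal sketch: the function name plot_top_words and the output filename are illustrative choices, not part of the original steps.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt


def plot_top_words(words, counts, title="Top Words"):
    """Draw a bar chart of word counts and return the figure and axes."""
    fig, ax = plt.subplots()
    ax.bar(words, counts)
    ax.set_xlabel("Word")
    ax.set_ylabel("Frequency")
    ax.set_title(title)
    fig.tight_layout()
    return fig, ax


words = ["nlp", "step", "2024", "practical", "learning"]
counts = [3, 2, 1, 1, 1]

fig, ax = plot_top_words(words, counts, title="Top 5 Words")
fig.savefig("top_words.png")  # hypothetical output filename
```

Returning the figure and axes lets the caller decide whether to save the chart, display it, or customize it further.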
Step 6.4 — Complete code for Step 6
```python name=step06_visualize_word_freq.py
import string
import matplotlib.pyplot as plt

# Sample text with some redundancy
text = "NLP, in 2024! NLP is practical. Learning NLP step by step makes it easier."

# Normalize and tokenize
clean_text = text.strip().lower()
translator = str.maketrans('', '', string.punctuation)
no_punct_text = clean_text.translate(translator)
tokens = no_punct_text.split()

# Define stop words (English + a few Portuguese)
stop_words = set([
    'the', 'is', 'in', 'it', 'by', 'and', 'a', 'of', 'to', 'for', 'on',
    'o', 'a', 'de', 'em', 'para'
])

# Filter tokens
filtered_tokens = [word for word in tokens if word not in stop_words]

# Count frequencies
word_freq = {}
for word in filtered_tokens:
    word_freq[word] = word_freq.get(word, 0) + 1

# Sort and select top N
sorted_words = sorted(word_freq.items(), key=lambda item: item[1], reverse=True)
top_n = 5
top_words = sorted_words[:top_n]
words = [word for word, count in top_words]
counts = [count for word, count in top_words]

# Plotting
plt.bar(words, counts, color='skyblue')
plt.xlabel('Word')
plt.ylabel('Frequency')
plt.title(f'Top {top_n} Most Common Words')
plt.tight_layout()
plt.show()
```
Line-by-line explanation of the code
- import string and import matplotlib.pyplot as plt: string for text cleaning, matplotlib for plotting.
- Text normalization and tokenization: Standard sequence from previous steps.
- Stop word filtering: Removes uninformative words for sharper visualization.
- Frequency counting: Builds the word frequency dictionary for tokens left after filtering.
- Sorting and selecting top_n:
  - Sorts word-count pairs by frequency, highest first.
  - Picks the top N entries.
  - Separates them into lists for plotting.
- Bar chart plotting:
  - plt.bar(words, counts, color='skyblue') creates the bars.
  - plt.xlabel / plt.ylabel set axis labels.
  - plt.title adds a chart title.
  - plt.tight_layout() adjusts the padding so labels and the title fit inside the figure.
  - plt.show() displays the chart (a window pops up when run as a script).
Practical notes
- For longer texts or more words, adapt the script to plot top 10, 20, or even all unique tokens.
- If running in Jupyter Notebook, use %matplotlib inline for in-place plots.
- Bar charts are great for categorical comparisons but not for full vocabulary histograms; those come later.
- You can customize colors, add gridlines, or rotate labels for more readable charts.
- For Portuguese texts, expand the stop word list to suit your dataset.
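As a rough illustration of the customization note above, the sketch below rotates the x-axis labels, adds horizontal gridlines, and gives each bar its own color. The word list, color names, rotation angle, and output filename are arbitrary choices for demonstration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, used here to save to a file
import matplotlib.pyplot as plt

words = ["nlp", "step", "2024", "practical", "learning"]
counts = [3, 2, 1, 1, 1]
colors = ["steelblue", "skyblue", "lightgray", "lightgray", "lightgray"]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(words, counts, color=colors)

ax.set_xlabel("Word")
ax.set_ylabel("Frequency")
ax.set_title("Top 5 Most Common Words")

# Rotate the labels so that longer words do not run into each other.
ax.tick_params(axis="x", labelrotation=45)

# Light horizontal gridlines, drawn behind the bars.
ax.grid(axis="y", linestyle="--", alpha=0.5)
ax.set_axisbelow(True)

fig.tight_layout()
fig.savefig("top_words_styled.png")  # hypothetical output filename
```

In an interactive session, remove the matplotlib.use("Agg") line and call plt.show() instead of saving the figure.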
Suggested mini exercise
- Change the sample text to a short paragraph in Portuguese. Adjust the stop word list and see what words appear.
- Vary top_n for larger charts.
- Color-code bars or tweak chart aesthetics.
- Try plotting the least frequent words instead of the most.
- Use two texts and compare their bar charts side by side.
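As a possible starting point for the least-frequent exercise, sorting in ascending order and slicing gives the rarest words. This is only a sketch: the name bottom_n is a hypothetical counterpart to top_n, and because so many words are tied at a count of 1, the ones selected depend on their order in the dictionary.

```python
word_freq = {'nlp': 3, '2024': 1, 'practical': 1, 'learning': 1,
             'step': 2, 'makes': 1, 'easier': 1}

bottom_n = 3  # hypothetical name, mirroring top_n

# Ascending sort puts the smallest counts first; ties keep dictionary order.
least_common = sorted(word_freq.items(), key=lambda item: item[1])[:bottom_n]

words = [word for word, count in least_common]
counts = [count for word, count in least_common]

print(words)   # ['2024', 'practical', 'learning']
print(counts)  # [1, 1, 1]
```

The resulting words and counts lists can be passed to plt.bar exactly as in Step 6.3.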
Conclusion
You’ve now learned how to turn frequency numbers into clear, compelling visual summaries—a major practical upgrade for NLP beginners. Visualizing token distributions lets you spot patterns instantly, improve preprocessing, and communicate findings to others. This bar chart technique is a durable tool for any text analytics workflow, and prepares you for deeper explorations like word clouds and corpus analysis.
