Step 6 — Visualizing Word Frequencies with Bar Charts in Python

Introduction

After learning to filter out stop words and analyze meaningful word frequencies, it’s time to add an essential skill to your NLP toolkit: visualizing word frequencies with bar charts. Visualization transforms raw counts into actionable insights, making patterns, anomalies, and trends easy to understand even in small datasets. In this step, you’ll create your first matplotlib bar chart, a practical and visual way to summarize the most common tokens in any text.


Main concept explained clearly

Data visualization is the process of presenting quantitative information (like word frequencies) in a graphical, interpretable form. In NLP, visualizing frequencies allows you to:

  • See which words dominate the text at a glance.
  • Detect unusual or interesting tokens quickly.
  • Communicate findings more effectively.

A bar chart is a classic visualization where each word is represented by a bar proportional to its frequency. Python’s matplotlib library is a powerful (yet beginner-friendly) tool for this.


Why this matters in NLP

  • Understanding distribution is crucial for filtering, summarizing, and modeling.
  • Visualizations reveal frequent words, rare words, and noise that pure numbers may hide.
  • Bar charts are a foundation for more advanced plots (histograms, word clouds, etc.).
  • They enable early exploratory analysis—helping you decide what preprocessing or cleaning is needed before moving deeper into NLP.

Python example

Let’s build your first bar chart step-by-step. You’ll need to install matplotlib if you don’t have it:

```
pip install matplotlib
```

Step 6.1 — Prepare filtered frequency statistics

Use your existing frequency code for a filtered token list (from Step 5):

```python
word_freq = {'nlp': 3, '2024': 1, 'practical': 1, 'learning': 1,
             'step': 2, 'makes': 1, 'easier': 1}
```

Step 6.2 — Select the top N words for plotting

```python
sorted_words = sorted(word_freq.items(), key=lambda item: item[1], reverse=True)
top_n = 5
top_words = sorted_words[:top_n]
words = [word for word, count in top_words]
counts = [count for word, count in top_words]
```
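An equivalent way to get the same top-N list, assuming the word_freq dictionary shown above, is collections.Counter from the standard library; its most_common method does the sorting and slicing in one call:

```python
from collections import Counter

# Same filtered frequencies as above
word_freq = {'nlp': 3, '2024': 1, 'practical': 1, 'learning': 1,
             'step': 2, 'makes': 1, 'easier': 1}

# most_common(n) returns the n highest-count (word, count) pairs,
# sorted by count in descending order
top_words = Counter(word_freq).most_common(5)
words = [word for word, count in top_words]
counts = [count for word, count in top_words]
print(top_words[0])  # ('nlp', 3)
```

Either approach works; Counter simply saves you the explicit sorted call.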

Step 6.3 — Plot the bar chart

```python
import matplotlib.pyplot as plt

plt.bar(words, counts)
plt.xlabel('Word')
plt.ylabel('Frequency')
plt.title('Top 5 Words')
plt.show()
```

Step 6.4 — Complete code for Step 6

```python name=step06_visualize_word_freq.py
import string
import matplotlib.pyplot as plt

# Sample text with some redundancy
text = "NLP, in 2024! NLP is practical. Learning NLP step by step makes it easier."

# Normalize and tokenize
clean_text = text.strip().lower()
translator = str.maketrans('', '', string.punctuation)
no_punct_text = clean_text.translate(translator)
tokens = no_punct_text.split()

# Define stop words (English + a few Portuguese)
stop_words = set([
    'the', 'is', 'in', 'it', 'by', 'and', 'a', 'of', 'to', 'for', 'on',
    'o', 'de', 'em', 'para',
])

# Filter tokens
filtered_tokens = [word for word in tokens if word not in stop_words]

# Count frequencies
word_freq = {}
for word in filtered_tokens:
    word_freq[word] = word_freq.get(word, 0) + 1

# Sort and select the top N
sorted_words = sorted(word_freq.items(), key=lambda item: item[1], reverse=True)
top_n = 5
top_words = sorted_words[:top_n]
words = [word for word, count in top_words]
counts = [count for word, count in top_words]

# Plot the bar chart
plt.bar(words, counts, color='skyblue')
plt.xlabel('Word')
plt.ylabel('Frequency')
plt.title(f'Top {top_n} Most Common Words')
plt.tight_layout()
plt.show()
```


Line-by-line explanation of the code

  • import string and import matplotlib.pyplot as plt:
    string supplies the punctuation constant for text cleaning; matplotlib handles plotting.
  • Text normalization and tokenization:
    The standard sequence from previous steps (strip, lowercase, remove punctuation, split).
  • Stop word filtering:
    Removes uninformative words so the visualization highlights meaningful tokens.
  • Frequency counting:
    Builds the word frequency dictionary from the tokens left after filtering.
  • Sorting and selecting top_n:
    Sorts word-count pairs by frequency (descending), picks the top N entries, and separates them into parallel word and count lists for plotting.
  • Bar chart plotting:
    • plt.bar(words, counts, color='skyblue') creates the bars.
    • plt.xlabel / plt.ylabel set the axis labels.
    • plt.title adds a chart title.
    • plt.tight_layout() prevents label overlap.
    • plt.show() displays the chart in a pop-up window.

Practical notes

  • For longer texts or more words, adapt the script to plot top 10, 20, or even all unique tokens.
  • If running in Jupyter Notebook, use %matplotlib inline for in-place plots.
  • Bar charts are great for categorical comparisons but not for full vocabulary histograms—those come later.
  • You can customize colors, add gridlines, or rotate labels for more readable charts.
  • For Portuguese texts, expand the stop word list to suit your dataset.
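The customizations mentioned above can be sketched like this; the gridline and rotation settings are illustrative choices, not part of the step's required code:

```python
import matplotlib.pyplot as plt

# Example data: top words from the sample text above
words = ['nlp', 'step', '2024', 'practical', 'learning']
counts = [3, 2, 1, 1, 1]

plt.bar(words, counts, color='steelblue')
plt.xlabel('Word')
plt.ylabel('Frequency')
plt.title('Top 5 Most Common Words')
plt.grid(axis='y', linestyle='--', alpha=0.5)  # light horizontal gridlines
plt.xticks(rotation=45, ha='right')            # rotate labels so long words stay readable
plt.tight_layout()
plt.show()
```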

Suggested mini exercise

  1. Change the sample text to a short paragraph in Portuguese. Adjust the stop word list and see what words appear.
  2. Vary top_n for larger charts.
  3. Color-code bars or tweak chart aesthetics.
  4. Try plotting the least frequent words instead of the most.
  5. Use two texts and compare their bar charts side by side.
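For exercise 5, matplotlib's subplots makes side-by-side comparison straightforward. A minimal sketch, assuming two made-up frequency dictionaries:

```python
import matplotlib.pyplot as plt

# Hypothetical frequencies for two different texts
freq_a = {'nlp': 3, 'step': 2, 'learning': 1}
freq_b = {'dados': 4, 'texto': 2, 'nlp': 1}

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, (title, freq) in zip(axes, [('Text A', freq_a), ('Text B', freq_b)]):
    ax.bar(list(freq.keys()), list(freq.values()))
    ax.set_title(title)
    ax.set_xlabel('Word')
axes[0].set_ylabel('Frequency')
plt.tight_layout()
plt.show()
```

sharey=True puts both charts on the same frequency scale, which keeps the comparison honest.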

Conclusion

You’ve now learned how to turn frequency numbers into clear, compelling visual summaries—a major practical upgrade for NLP beginners. Visualizing token distributions lets you spot patterns instantly, improve preprocessing, and communicate findings to others. This bar chart technique is a durable tool for any text analytics workflow, and prepares you for deeper explorations like word clouds and corpus analysis.
