A word cloud is an image composed of the words used in a particular text or subject, in which the size of each word indicates its frequency or importance.
In this Python script, we will generate a word cloud image from the text of a news article on CNN.
- wordcloud 1.5.0
- matplotlib 3.0.3
Create and activate a virtual environment, then install these dependencies in it.
The image we want to generate will have the following configuration.
# image configurations
background_color = "#101010"
height = 720
width = 1080
I have copy-pasted the content of the news article into a text file. Read the file and store its words in a list.
# Read a text file and calculate frequency of words in it
with open("/tmp/sample_text.txt", "r") as f:
    words = f.read().split()
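Note that `split()` only breaks on whitespace, so punctuation stays attached to the words and "cloud," would be counted separately from "cloud". A minimal sketch of one way to handle this, assuming a regular expression is good enough for tokenizing English text. The `tokenize` helper is a hypothetical name and is not part of the original script, and the pattern only matches ASCII letters:

```python
import re


def tokenize(text):
    """Return lowercase word tokens with surrounding punctuation dropped."""
    # Match runs of letters, optionally with one internal apostrophe (e.g. "don't").
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


# Hypothetical sample text standing in for the article content.
sample = "Word clouds are fun. Word clouds, don't you agree, are FUN!"
tokens = tokenize(sample)
print(tokens)
```

If you use something like this, `tokenize(f.read())` could replace `f.read().split()`, and the later call to `lower()` becomes redundant.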
Now generate a dictionary with words as keys and their frequencies as values. We will ignore stop words.
data = dict()
for word in words:
    word = word.lower()
    if word in stop_words:
        continue
    data[word] = data.get(word, 0) + 1
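The same counting can also be done with `collections.Counter` from the standard library. This is only an alternative sketch, not what the script in this post uses, and the `words` and `stop_words` values below are made-up sample data:

```python
from collections import Counter

# Hypothetical sample data standing in for the words read from the file.
words = ["The", "cloud", "is", "a", "word", "cloud", "of", "the", "Word"]
stop_words = {"the", "is", "a", "of"}

# Counter tallies every item produced by the generator expression.
data = Counter(
    word.lower() for word in words if word.lower() not in stop_words
)

# most_common(n) returns the n most frequent (word, count) pairs.
print(data.most_common(2))
```

Since `Counter` is a subclass of `dict`, the result should be usable anywhere the plain dictionary is used later in this post.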
You can get the list of stop words from the nltk library or from a resource available online.
import nltk
from nltk.corpus import stopwords
# Download the stop word corpus (only needed once)
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))
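If you keep the stop words in a text file instead, it may be worth loading them into a set, since checking membership in a set does not slow down as the list of stop words grows. A small sketch, assuming the file holds whitespace-separated words; `load_stop_words` is a hypothetical helper name and the temporary file exists only for the demonstration:

```python
import os
import tempfile


def load_stop_words(path):
    """Read whitespace-separated stop words from a file into a lowercase set."""
    with open(path, "r", encoding="utf-8") as f:
        return {word.lower() for word in f.read().split()}


# Demonstration with a throwaway file; in practice, pass your own file path.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("the\nis\na\nThe\n")

stop_words = load_stop_words(tmp.name)
os.remove(tmp.name)  # clean up the demonstration file
print(sorted(stop_words))
```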
Now create a WordCloud object and initialize it with the image configuration.
word_cloud = WordCloud(
    background_color=background_color,
    width=width,
    height=height
)
word_cloud.generate_from_frequencies(data)
word_cloud.to_file('image.png')
The generate_from_frequencies method takes the data dictionary as input and builds the word cloud. The to_file method then saves the image to a file.
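matplotlib is listed in the dependencies, and one common use for it here is previewing the image on screen instead of only saving it. A rough sketch of how that might look; `show_word_cloud` is a hypothetical helper that is not part of the original script, and it assumes both wordcloud and matplotlib are installed:

```python
def show_word_cloud(frequencies, background_color="#101010",
                    width=1080, height=720):
    """Build a word cloud from a {word: count} dict and display it."""
    # Imported here so the third-party packages are only needed
    # when the function is actually called.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    word_cloud = WordCloud(
        background_color=background_color,
        width=width,
        height=height,
    )
    word_cloud.generate_from_frequencies(frequencies)

    # Convert the cloud to a NumPy array so matplotlib can draw it.
    plt.imshow(word_cloud.to_array(), interpolation="bilinear")
    plt.axis("off")  # hide the axes, since they carry no meaning here
    plt.show()
    return word_cloud
```

Calling `show_word_cloud(data)` with the frequency dictionary built earlier should open a window with the image. The returned object can still be saved with `to_file` afterwards.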
The complete code is available on GitHub.
"""
Python script to generate word cloud image.
Author - Anurag Rana
Read more on - https://www.pythoncircle.com
"""
from wordcloud import WordCloud
# image configurations
background_color = "#101010"
height = 720
width = 1080
with open("stopwords.txt", "r") as f:
    stop_words = f.read().split()
# Read a text file and calculate frequency of words in it
with open("/tmp/sample_text.txt", "r") as f:
    words = f.read().split()
data = dict()
for word in words:
    word = word.lower()
    if word in stop_words:
        continue
    data[word] = data.get(word, 0) + 1
word_cloud = WordCloud(
    background_color=background_color,
    width=width,
    height=height
)
word_cloud.generate_from_frequencies(data)
word_cloud.to_file('image.png')
For more details, visit the official documentation of the wordcloud library.