Data Visualization with Python: Project Summary

Preface

   Over the previous seven articles we walked through the whole process of building a data visualization project. The project comes from Eric Matthes's book "Python Crash Course" (published in Chinese under a title that translates as "Python Programming: From Introduction to Practice"), but I presented it in the order and manner in which I learned it myself, from setting up the environment through the step-by-step implementation. As the project grew, so did the amount of code, so to keep each article readable we showed only the code for the feature being implemented. This article gives the complete code so that you can see the project as a whole.

   Readers are still strongly encouraged to start from the first article of the series. Simply pasting all the code from this article teaches little, and you may not even understand the overall structure of the project.

   Note that this project differs from the previous one, Alien Invasion. Alien Invasion is a single program built from interdependent modules, and it needs all of its code to run; the module from the earlier Armed Spacecraft article, for example, cannot run on its own. The data visualization project is different: its scripts are loosely coupled and mostly independent of one another. The only dependencies are that the dice scripts import Die from die.py and rw_visual.py imports RandomWalk from random_walk.py. We mainly cover how to obtain data, how to use APIs, how to analyze existing data, and how to make attractive charts.

Project overview

   Data visualization means exploring data through visual representations. It is closely related to data mining; more precisely, it is one step in data mining and artificial intelligence work, where data mining means using code to explore the patterns and relationships in a dataset. A dataset can be a small list of numbers that fits in one line of code, or it can be presented as a more intuitive picture.


   Presenting data well is not just about attractive pictures. When data is presented in a striking yet concise way, readers can see the meaning behind it clearly and intuitively, and so get a better grasp of the underlying patterns. The first problem this project tackles is the data itself, because visualization presupposes that we have data. Three articles cover generating data, that is, producing data to analyze when none is available; two articles cover downloading data; and, once we have data, the last two articles cover using APIs and analyzing what they return.
   These articles only introduce a few basic techniques, such as drawing histograms and line charts. Readers who want to go further will find plenty of tutorials online; the material is fairly simple, widely applicable, and well worth the effort. The complete code of the project follows, for reference as a whole.

Complete code

1. Implementation of dice_visual.py

import pygal

from die import Die

# Create two D6 dice.
die_1 = Die()
die_2 = Die()

# Make some rolls, and store results in a list.
results = []
for roll_num in range(1000):
    result = die_1.roll() + die_2.roll()
    results.append(result)
    
# Analyze the results.
frequencies = []
max_result = die_1.num_sides + die_2.num_sides
for value in range(2, max_result+1):
    frequency = results.count(value)
    frequencies.append(frequency)
    
# Visualize the results.
hist = pygal.Bar()
hist.force_uri_protocol = 'http'

hist.title = "Results of rolling two D6 dice 1000 times."
hist.x_labels = ['2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12']
hist.x_title = "Result"
hist.y_title = "Frequency of Result"

hist.add('D6 + D6', frequencies)
hist.render_to_file('dice_visual.svg')
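
As a sanity check on the simulated frequencies in dice_visual.py, the expected counts can be computed exactly by enumerating all equally likely outcomes of two fair dice. The following is a minimal sketch and not part of the original project; the function name expected_frequencies and its parameters are made up for illustration.

```python
from collections import Counter
from itertools import product


def expected_frequencies(sides_1=6, sides_2=6, num_rolls=1000):
    """Return the expected count of each possible sum of two fair dice.

    Hypothetical helper for illustration; it is not in the original scripts.
    """
    # Count how many of the sides_1 * sides_2 equally likely outcomes
    # produce each sum.
    ways = Counter(a + b for a, b in product(range(1, sides_1 + 1),
                                             range(1, sides_2 + 1)))
    total_outcomes = sides_1 * sides_2
    return {value: num_rolls * ways[value] / total_outcomes
            for value in range(2, sides_1 + sides_2 + 1)}


expected = expected_frequencies()
for value, count in expected.items():
    print(value, round(count, 1))
```

With two D6 dice, 7 is the most likely sum (6 of the 36 outcomes), so the simulated bar for 7 should be the tallest and close to one sixth of the rolls.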

2. Implementation of die.py

from random import randint

class Die():
    """A class representing a single die."""
    
    def __init__(self, num_sides=6):
        """Assume a six-sided die."""
        self.num_sides = num_sides
        
    def roll(self):
        """Return a random value between 1 and number of sides."""
        return randint(1, self.num_sides)

3. Implementation of die_visual.py

import pygal

from die import Die

# Create a D6.
die = Die()

# Make some rolls, and store results in a list.
results = []
for roll_num in range(1000):
    result = die.roll()
    results.append(result)
    
# Analyze the results.
frequencies = []
for value in range(1, die.num_sides+1):
    frequency = results.count(value)
    frequencies.append(frequency)
    
# Visualize the results.
hist = pygal.Bar()
hist.force_uri_protocol = 'http'

hist.title = "Results of rolling one D6 1000 times."
hist.x_labels = ['1', '2', '3', '4', '5', '6']
hist.x_title = "Result"
hist.y_title = "Frequency of Result"

hist.add('D6', frequencies)
hist.render_to_file('die_visual.svg')
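
The counting loop in die_visual.py calls results.count() once per face, which rescans the whole list each time. A possible alternative, sketched below, tallies all results in a single pass with collections.Counter. To stay self-contained the sketch rolls with random.Random directly instead of importing Die, and the fixed seed is an assumption made only so that the run is repeatable.

```python
from collections import Counter
from random import Random

rng = Random(42)  # fixed seed: an assumption, only for repeatable output
num_sides = 6

# Simulate 1000 rolls of one six-sided die.
results = [rng.randint(1, num_sides) for _ in range(1000)]

# Tally every result in one pass, then read the counts out in face order.
counts = Counter(results)
frequencies = [counts[value] for value in range(1, num_sides + 1)]
print(frequencies)
```

A Counter returns 0 for a face that never came up, so the list always has one entry per face.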

4. Implementation of different_dice.py

from die import Die

import pygal

# Create a D6 and a D10.
die_1 = Die()
die_2 = Die(10)

# Make some rolls, and store results in a list.
results = []
for roll_num in range(50000):
    result = die_1.roll() + die_2.roll()
    results.append(result)
    
# Analyze the results.
frequencies = []
max_result = die_1.num_sides + die_2.num_sides
for value in range(2, max_result+1):
    frequency = results.count(value)
    frequencies.append(frequency)
    
# Visualize the results.
hist = pygal.Bar()
hist.force_uri_protocol = 'http'

hist.title = "Results of rolling a D6 and a D10 50,000 times."
hist.x_labels = ['2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12',
    '13', '14', '15', '16']
hist.x_title = "Result"
hist.y_title = "Frequency of Result"

hist.add('D6 + D10', frequencies)
hist.render_to_file('different_dice.svg')
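
The x-axis labels in the dice scripts are typed out by hand, so they have to be edited whenever the dice change. One way to keep them in step with the dice, shown here as a small sketch, is to build the labels from the same range that the counting loop uses. The variable names mirror the listing above; the two side counts are hard-coded here only to keep the sketch self-contained.

```python
# Side counts for a D6 and a D10, as in different_dice.py.
num_sides_1 = 6
num_sides_2 = 10

# The smallest possible sum is 2 and the largest is the sum of the side counts.
max_result = num_sides_1 + num_sides_2
x_labels = [str(value) for value in range(2, max_result + 1)]
print(x_labels)
```

The resulting list could then be assigned to hist.x_labels in place of the hand-written one.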

5. Implementation of mpl_squares.py

import matplotlib.pyplot as plt

input_values = [1, 2, 3, 4, 5]
squares = [1, 4, 9, 16, 25]
plt.plot(input_values, squares, linewidth=5)

# Set chart title and label axes.
plt.title("Square Numbers", fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)

# Set size of tick labels.
plt.tick_params(axis='both', labelsize=14)

plt.show()

6. Implementation of random_walk.py

from random import choice

class RandomWalk():
    """A class to generate random walks."""
    
    def __init__(self, num_points=5000):
        """Initialize attributes of a walk."""
        self.num_points = num_points
        
        # All walks start at (0, 0).
        self.x_values = [0]
        self.y_values = [0]

    def fill_walk(self):
        """Calculate all the points in the walk."""
        
        # Keep taking steps until the walk reaches the desired length.
        while len(self.x_values) < self.num_points:
            
            # Decide which direction to go, and how far to go in that direction.
            x_direction = choice([1, -1])
            x_distance = choice([0, 1, 2, 3, 4])
            x_step = x_direction * x_distance
            
            y_direction = choice([1, -1])
            y_distance = choice([0, 1, 2, 3, 4])
            y_step = y_direction * y_distance
            
            # Reject moves that go nowhere.
            if x_step == 0 and y_step == 0:
                continue
            
            # Calculate the next x and y values.
            next_x = self.x_values[-1] + x_step
            next_y = self.y_values[-1] + y_step
            
            self.x_values.append(next_x)
            self.y_values.append(next_y)
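
In fill_walk() the x step and the y step are computed by two identical blocks of code. A possible refactoring, sketched below, moves that logic into one helper. The names get_step and walk are hypothetical, and passing in a random.Random instance is an assumption made so that a seeded walk can be reproduced; the original class uses random.choice directly.

```python
from random import Random


def get_step(rng, max_distance=4):
    """Return a signed step: a direction (+1 or -1) times a distance (0..max_distance)."""
    direction = rng.choice([1, -1])
    distance = rng.choice(range(max_distance + 1))
    return direction * distance


def walk(num_points, rng):
    """Return the x and y coordinates of a random walk with num_points points."""
    x_values, y_values = [0], [0]  # every walk starts at (0, 0)
    while len(x_values) < num_points:
        x_step = get_step(rng)
        y_step = get_step(rng)
        # Reject moves that go nowhere, as the original class does.
        if x_step == 0 and y_step == 0:
            continue
        x_values.append(x_values[-1] + x_step)
        y_values.append(y_values[-1] + y_step)
    return x_values, y_values


xs, ys = walk(1000, Random(0))
print(len(xs), len(ys))
```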

7. Implementation of rw_visual.py

import matplotlib.pyplot as plt

from random_walk import RandomWalk

# Keep making new walks, as long as the program is active.
while True:
    # Make a random walk, and plot the points.
    rw = RandomWalk(50000)
    rw.fill_walk()
    
    # Set the size of the plotting window.
    plt.figure(dpi=128, figsize=(10, 6))
    
    point_numbers = list(range(rw.num_points))
    plt.scatter(rw.x_values, rw.y_values, c=point_numbers, cmap=plt.cm.Blues,
        edgecolor='none', s=1)
        
    # Emphasize the first and last points.
    plt.scatter(0, 0, c='green', edgecolors='none', s=100)
    plt.scatter(rw.x_values[-1], rw.y_values[-1], c='red', edgecolors='none',
        s=100)
        
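    # Remove the axes. plt.gca() returns the current axes; in recent
    # Matplotlib versions plt.axes() would create a new, empty axes instead.
    plt.gca().get_xaxis().set_visible(False)
    plt.gca().get_yaxis().set_visible(False)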
        
    plt.show()
    
    keep_running = input("Make another walk? (y/n): ")
    if keep_running == 'n':
        break

8. Implementation of scatter_squares.py

import matplotlib.pyplot as plt

x_values = list(range(1, 1001))
y_values = [x**2 for x in x_values]

plt.scatter(x_values, y_values, color=(0, 0, 0.8), edgecolor='none', s=40)

# Set chart title, and label axes.
plt.title("Square Numbers", fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)

# Set size of tick labels.
plt.tick_params(axis='both', which='major', labelsize=14)

# Set the range for each axis.
plt.axis([0, 1100, 0, 1100000])

plt.show()

9. Implementation of highs_lows.py

import csv
from datetime import datetime

from matplotlib import pyplot as plt

# Get dates, high, and low temperatures from file.
filename = 'death_valley_2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)

    dates, highs, lows = [], [], []
    for row in reader:
        try:
            current_date = datetime.strptime(row[0], "%Y-%m-%d")
            high = int(row[1])
            low = int(row[3])
        except ValueError:
            print(current_date, 'missing data')
        else:
            dates.append(current_date)
            highs.append(high)
            lows.append(low)

# Plot data.
fig = plt.figure(dpi=128, figsize=(10, 6))
plt.plot(dates, highs, c='red', alpha=0.5)
plt.plot(dates, lows, c='blue', alpha=0.5)
plt.fill_between(dates, highs, lows, facecolor='blue', alpha=0.1)

# Format plot.
title = "Daily high and low temperatures - 2014\nDeath Valley, CA"
plt.title(title, fontsize=20)
plt.xlabel('', fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis='both', which='major', labelsize=16)

plt.show()
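
The try/except/else block in highs_lows.py is what lets the script skip days with missing readings. The sketch below isolates that parsing step and runs it on a few in-memory rows so that it can be tried without the CSV file. The sample rows are made up, not real weather data, and the function name read_highs_lows is hypothetical; the column positions (0, 1 and 3) follow the listing above.

```python
import csv
from datetime import datetime
from io import StringIO

# Made-up sample rows (not real weather data). The empty fields in the
# second data row mimic a day with missing readings.
sample = StringIO(
    "date,max_temp,mean_temp,min_temp\n"
    "2014-01-01,52,39,26\n"
    "2014-01-02,,,\n"
    "2014-01-03,59,45,31\n"
)


def read_highs_lows(f):
    """Return dates, highs, lows, and the raw date strings of skipped rows."""
    reader = csv.reader(f)
    next(reader)  # skip the header row
    dates, highs, lows, skipped = [], [], [], []
    for row in reader:
        try:
            current_date = datetime.strptime(row[0], "%Y-%m-%d")
            high = int(row[1])  # int('') raises ValueError
            low = int(row[3])
        except ValueError:
            # Record the raw date string, which exists even if parsing failed.
            skipped.append(row[0])
        else:
            dates.append(current_date)
            highs.append(high)
            lows.append(low)
    return dates, highs, lows, skipped


dates, highs, lows, skipped = read_highs_lows(sample)
print(highs, lows, skipped)
```

Collecting the raw date string of each skipped row, rather than printing the parsed date, avoids relying on a value that may not have been assigned when parsing fails.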

10. Implementation of btc_close_2017.py

from __future__ import (absolute_import, division, print_function,
                        unicode_literals)
try:
    # Python 2.x version
    from urllib2 import urlopen
except ImportError:
    # Python 3.x version
    from urllib.request import urlopen
import json
import requests
import pygal
import math
from itertools import groupby


json_url = 'https://raw.githubusercontent.com/muxuezi/btc/master/btc_close_2017.json'
response = urlopen(json_url)
# Read data
req = response.read()
# Write the raw bytes to a file
with open('btc_close_2017_urllib.json', 'wb') as f:
    f.write(req)
# Decode the bytes and parse the JSON
file_urllib = json.loads(req.decode('utf8'))
print(file_urllib)


json_url = 'https://raw.githubusercontent.com/muxuezi/btc/master/btc_close_2017.json'
req = requests.get(json_url)
# Write the response text to a file
with open('btc_close_2017_request.json', 'w') as f:
    f.write(req.text)
# Parse the response body as JSON
file_requests = req.json()


print(file_urllib == file_requests)


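# Load data into a list.
# This expects a local file named btc_close_2017.json; the download steps
# above save their copies under different file names.
filename = 'btc_close_2017.json'
with open(filename) as f:
    btc_data = json.load(f)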

# Print information for each day
for btc_dict in btc_data:
    date = btc_dict['date']
    month = int(btc_dict['month'])
    week = int(btc_dict['week'])
    weekday = btc_dict['weekday']
    close = int(float(btc_dict['close']))  # convert via float, then truncate to int
    print("{} is month {} week {}, {}, the close price is {} RMB".format(
        date, month, week, weekday, close))


# Create 5 lists to store the date and closing price respectively
dates = []
months = []
weeks = []
weekdays = []
close = []
# Daily information
for btc_dict in btc_data:
    dates.append(btc_dict['date'])
    months.append(int(btc_dict['month']))
    weeks.append(int(btc_dict['week']))
    weekdays.append(btc_dict['weekday'])
    close.append(int(float(btc_dict['close'])))

# Rotate the x labels and show only the major ones to avoid crowding.
line_chart = pygal.Line(x_label_rotation=20, show_minor_x_labels=False)
line_chart.title = 'Closing price(¥)'
line_chart.x_labels = dates
N = 20  # The x-axis coordinates are displayed every 20 days
line_chart.x_labels_major = dates[::N]
line_chart.add('Closing price', close)
line_chart.render_to_file('Closing price line chart(¥).svg')


line_chart = pygal.Line(x_label_rotation=20, show_minor_x_labels=False)
line_chart.title = 'Logarithmic transformation of closing price(¥)'
line_chart.x_labels = dates
N = 20  # The x-axis coordinates are displayed every 20 days
line_chart.x_labels_major = dates[::N]
close_log = [math.log10(_) for _ in close]  # base-10 logarithm of each closing price
line_chart.add('log Closing price', close_log)
line_chart.render_to_file('Log conversion line chart of closing price(¥).svg')
line_chart


def draw_line(x_data, y_data, title, y_legend):
    xy_map = []
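    # Sort the (x, y) pairs, then group them by their x value.
    for x, y in groupby(sorted(zip(x_data, y_data)), key=lambda _: _[0]):
        y_list = [v for _, v in y]
        xy_map.append([x, sum(y_list) / len(y_list)])  # mean of y for this x
    x_unique, y_mean = [*zip(*xy_map)]  # unzip into x values and mean values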
    line_chart = pygal.Line()
    line_chart.title = title
    line_chart.x_labels = x_unique
    line_chart.add(y_legend, y_mean)
    line_chart.render_to_file(title + '.svg')
    return line_chart


idx_month = dates.index('2017-12-01')
line_chart_month = draw_line(
    months[:idx_month], close[:idx_month], 'Monthly and daily average closing price(¥)', 'Monthly daily average')
line_chart_month


idx_week = dates.index('2017-12-11')
line_chart_week = draw_line(
    weeks[1:idx_week], close[1:idx_week], 'Daily average closing price(¥)', 'Daily mean')
line_chart_week


idx_week = dates.index('2017-12-11')
wd = ['Monday', 'Tuesday', 'Wednesday',
      'Thursday', 'Friday', 'Saturday', 'Sunday']
weekdays_int = [wd.index(w) + 1 for w in weekdays[1:idx_week]]
line_chart_weekday = draw_line(
    weekdays_int, close[1:idx_week], 'Weekly average closing price(¥)', 'Weekly average')
line_chart_weekday.x_labels = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
line_chart_weekday.render_to_file('Weekly average closing price(¥).svg')
line_chart_weekday


with open('Closing price Dashboard.html', 'w', encoding='utf8') as html_file:
    html_file.write(
        '<html><head><title>Closing price Dashboard</title><meta charset="utf-8"></head><body>\n')
    for svg in [
            'Closing price line chart(¥).svg', 'Log conversion line chart of closing price(¥).svg', 'Monthly and daily average closing price(¥).svg',
            'Daily average closing price(¥).svg', 'Weekly average closing price(¥).svg'
    ]:
        html_file.write(
            '    <object type="image/svg+xml" data="{0}" height=500></object>\n'.format(svg))
    html_file.write('</body></html>')
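
The core of draw_line() in the listing above is a group-and-average step: sort the (x, y) pairs, group them by x with itertools.groupby, and take the mean of each group. The sketch below isolates that step so it can be run without pygal. The function name mean_by_key and the sample numbers are made up for illustration.

```python
from itertools import groupby


def mean_by_key(keys, values):
    """Return a list of (key, mean of the values that share that key)."""
    # groupby only merges adjacent items, so the pairs must be sorted first.
    pairs = sorted(zip(keys, values))
    result = []
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        group_values = [value for _, value in group]
        result.append((key, sum(group_values) / len(group_values)))
    return result


# Made-up sample data: month numbers and closing prices.
months = [1, 1, 2, 2, 2]
close = [100, 200, 300, 400, 500]
print(mean_by_key(months, close))  # [(1, 150.0), (2, 400.0)]
```

Sorting before grouping matters: without it, groupby would start a new group every time the key changes, and the same month could appear more than once in the result.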

11. Implementation of bar_descriptions.py

import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS

my_style = LS('#333366', base_style=LCS)
chart = pygal.Bar(style=my_style, x_label_rotation=45, show_legend=False)

chart.title = 'Python Projects'
chart.x_labels = ['httpie', 'django', 'flask']
chart.force_uri_protocol = 'http'

plot_dicts = [
    {'value': 16101, 'label': 'Description of httpie.'},
    {'value': 15028, 'label': 'Description of django.'},
    {'value': 14798, 'label': 'Description of flask.'},
    ]

chart.add('', plot_dicts)
chart.render_to_file('bar_descriptions.svg')

12. Implementation of python_repos.py

import requests
import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS

# Make an API call, and store the response.
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)

# Store API response in a variable.
response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

# Explore information about the repositories.
repo_dicts = response_dict['items']

names, plot_dicts = [], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    
    plot_dict = {
        'value': repo_dict['stargazers_count'],
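        # Some repositories have no description (None), so fall back to a string.
        'label': repo_dict['description'] or 'No description provided.',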
        'xlink': repo_dict['html_url'],
        }
    plot_dicts.append(plot_dict)

# Make visualization.
my_style = LS('#333366', base_style=LCS)

my_config = pygal.Config()
my_config.force_uri_protocol = 'http'
my_config.x_label_rotation = 45
my_config.show_legend = False
my_config.title_font_size = 24
my_config.label_font_size = 14
my_config.major_label_font_size = 18
my_config.truncate_label = 15
my_config.show_y_guides = False
my_config.width = 1000

chart = pygal.Bar(my_config, style=my_style)
chart.title = 'Most-Starred Python Projects on GitHub'
chart.x_labels = names

chart.add('', plot_dicts)
chart.render_to_file('python_repos.svg')

13. Implementation of hn_submissions.py

import requests

from operator import itemgetter

# Make an API call, and store the response.
url = 'https://hacker-news.firebaseio.com/v0/topstories.json'
r = requests.get(url)
print("Status code:", r.status_code)

# Process information about each submission.
submission_ids = r.json()
submission_dicts = []
for submission_id in submission_ids[:30]:
    # Make a separate API call for each submission.
    url = ('https://hacker-news.firebaseio.com/v0/item/' +
            str(submission_id) + '.json')
    submission_r = requests.get(url)
    print(submission_r.status_code)
    response_dict = submission_r.json()
    
    submission_dict = {
        'title': response_dict['title'],
        'link': 'http://news.ycombinator.com/item?id=' + str(submission_id),
        'comments': response_dict.get('descendants', 0)
        }
    submission_dicts.append(submission_dict)
    
submission_dicts = sorted(submission_dicts, key=itemgetter('comments'),
                            reverse=True)

for submission_dict in submission_dicts:
    print("\nTitle:", submission_dict['title'])
    print("Discussion link:", submission_dict['link'])
    print("Comments:", submission_dict['comments'])
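
Two small details in hn_submissions.py are worth a closer look: dict.get() with a default of 0 for stories that have no 'descendants' key, and sorting a list of dictionaries with operator.itemgetter. The sketch below shows both on made-up data, with no network calls; the sample titles and counts are invented for illustration.

```python
from operator import itemgetter

# Made-up item responses. The second one has no 'descendants' key,
# which is the case the .get() default is there to handle.
responses = [
    {'id': 1, 'title': 'First story', 'descendants': 12},
    {'id': 2, 'title': 'Second story'},
    {'id': 3, 'title': 'Third story', 'descendants': 40},
]

submission_dicts = [
    {'title': r['title'], 'comments': r.get('descendants', 0)}
    for r in responses
]

# Sort by comment count, most discussed first.
submission_dicts.sort(key=itemgetter('comments'), reverse=True)
print([d['title'] for d in submission_dicts])
# ['Third story', 'First story', 'Second story']
```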

   That is the complete code for this project. I hope it gives readers a clear overview and a deeper understanding of how their basic Python knowledge is applied.

Implementation process of each module

1. Generate data

[1]. Generate data (Part 1)
[2]. Generate data (Part 2)
[3]. Generate data (Part 3)

2. Download data

[1]. Download data (Part 1)
[2]. Download data (Part 2)

3. Use API

[1]. Use API (Part 1)
[2]. Use API (Part 2)

   These articles cover the implementation details of each module in the project. Read them carefully; they will help with your future data analysis work.

Summary

   This article summarizes the data visualization project, from requirements analysis to code structure, and gives the project's complete code. It closes with a list of the earlier articles on each feature for easy reference. Python is a language that rewards hands-on practice, and it is one of the easiest programming languages for beginners to pick up. Once you know it, languages such as Java, Go, and C become easier to learn. Python is also very popular and widely used in artificial intelligence work, so it is well worth your time. Keep working and studying hard, keep improving your skills, and I believe you will get there. Keep it up!

Keywords: Python Data Analysis data visualization

Added by abgoosht on Sun, 13 Feb 2022 05:51:48 +0200