Preface
The previous articles in this series covered the Alien Invasion project, which drew on a range of Python knowledge and used the third-party library pygame to build a game. That project gives you a basic grasp of Python and some feel for how a game is developed, but only at a conceptual level. Games are usually performance-sensitive, so in practice they are more often written in C or C++, and Python is rarely used for game development. Python is in fact most widely used in data analysis and algorithms, especially NLP (natural language processing), CV (computer vision), and data mining. This article introduces the second project, data visualization, which is a foundation of data analysis and worth mastering.
1, Project background
Data visualization means exploring data through visual representation. It is closely related to data analysis and data mining, where data mining means using code to explore the patterns and associations in a dataset. A dataset can be a small list of numbers that fits in one line of code, or it can be data in other forms.
Presenting data well is not just about attractive pictures. When data is presented in a striking and concise way, the viewer can understand its meaning and discover the patterns in the dataset. Fortunately, you do not need a supercomputer to visualize complex data. Python is efficient enough that you can quickly explore a dataset of millions of data points on a laptop. The data points do not have to be numeric either; with Python we can also analyze non-numeric data.
In many fields, such as finance, the stock market, and weather research, people use Python for data-intensive work. Data scientists have written an impressive set of visualization and analysis tools in Python, many of which are available to us. One of the most popular is matplotlib, a mathematical plotting library. We will use it to make simple charts, such as line charts and scatter charts. Then we will generate a more interesting dataset based on the concept of a random walk, a graph generated from a series of random decisions.
Next, we need to install the third-party library required for this project, matplotlib.
2, Install matplotlib
Since I use Windows, I focus on installing matplotlib on Windows; the process on other systems is similar. Before installing matplotlib, you need a Python environment and pip. Readers who followed the earlier Alien Invasion project and typed in the code should already have Python and pip installed, so they are not covered here. If you have not installed them, search online; there are many articles you can follow step by step.
Next, I'll introduce the installation of matplotlib on Windows in detail. First, check your operating system version and your Python version:
Then go to the matplotlib official website and download the file that matches those versions.
Next, put the downloaded file into the Scripts directory of Python or PyCharm. Note that pip3.7 is also in this directory. The details are as follows:
Then right-click in that directory, open a command prompt there, and enter the following command:
pip3.7 install matplotlib-3.4.3-cp37-cp37m-win_amd64.whl
After you enter the command, the system starts the installation. Since matplotlib is already installed on my computer, I won't repeat the installation here. Next, we test the package we just installed. To do this, first type python at the command prompt, and then enter the following code in the Python interpreter to check whether matplotlib was installed successfully:
import matplotlib
The result is as follows:
If the import raises no error, matplotlib has been installed successfully. Next, let's work through some exercises to get familiar with how matplotlib is used.
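As an optional extra check, the short script below reports whether matplotlib can be found without opening the interactive interpreter. It is only a minimal sketch: the helper name is_installed is our own and not part of matplotlib, and it relies on the standard library's importlib.util.find_spec.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the import system can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("matplotlib"):
    import matplotlib
    print("matplotlib version:", matplotlib.__version__)
else:
    print("matplotlib is not installed; run the pip command above first")
```

Running it as a script gives a clear message either way, which can be easier to read than an ImportError traceback.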
3, Drawing a simple line chart with matplotlib
If you want to get started quickly with the many kinds of charts matplotlib can draw, the examples on the official website are a fast introduction. Here we only introduce the commonly used functions and the ones needed for this project; interested readers can study the rest on the official website.
Let's use matplotlib to draw a simple line chart, and then customize it to make the visualization more informative. We will plot the sequence of square numbers 1, 4, 9, 16, and 25. We only need to give these numbers to matplotlib, and it does the rest. The implementation is as follows:
import matplotlib.pyplot as plt

squares = [1, 4, 9, 16, 25]
plt.plot(squares)
plt.show()
We first import the module pyplot and give it the alias plt so that we do not have to type pyplot repeatedly. Most online examples do this, so we do the same here. The module pyplot contains many functions for generating charts.
We create a list that stores the square numbers above and pass it to the function plot(), which tries to draw a meaningful graph from these numbers. plt.show() opens the matplotlib viewer and displays the graph, as shown in the following figure:
1. Change label text and line thickness
The figure above shows that the numbers are increasing, but the label text is too small and the line is too thin. Fortunately, matplotlib lets us adjust every aspect of a visualization. Let's improve the readability of this chart with code:
import matplotlib.pyplot as plt

squares = [1, 4, 9, 16, 25]
plt.plot(squares, linewidth=5)

# Set the chart title and label the axes
plt.title("Square Number", fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)

# Set the size of the tick labels
plt.tick_params(axis='both', labelsize=14)
plt.show()
The parameter linewidth determines the thickness of the line drawn by plot(). The function title() gives the chart a title. The parameter fontsize, which appears several times in the code above, specifies the size of the text in the chart.
The functions xlabel() and ylabel() set a title for each axis, and the function tick_params() sets the style of the tick marks. The arguments used here affect the tick marks on both the x-axis and the y-axis (axis='both') and set the font size of the tick labels to 14 (labelsize=14).
The final chart is much easier to read. The result is as follows:
As can be seen from the figure, the label text is larger and the lines are thicker.
2. Correcting the chart
Now that the chart is easier to read, we can see that the data is not plotted correctly: the end point of the line chart says that the square of 4.0 is 25! Let's fix this problem.
When plot() is given a single series of numbers, it assumes that the x coordinate of the first data point is 0, but the x value of our first point is 1. To override this default behavior, we can provide both input and output values to plot():
import matplotlib.pyplot as plt

input_values = [1, 2, 3, 4, 5]
squares = [1, 4, 9, 16, 25]
plt.plot(input_values, squares, linewidth=5)

# Set the chart title and label the axes
plt.title("Square Number", fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)

# Set the size of the tick labels
plt.tick_params(axis='both', labelsize=14)
plt.show()
Now plot() draws the data correctly: because we provide both input and output values, it no longer has to make assumptions about how the output values were generated. The final figure is correct, as shown below:
When using plot(), you can specify various arguments, and you can also customize the chart with many other functions.
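To see why the first chart was off by one, it helps to write out the x values that plot() assumes when it receives only one list. The sketch below does this in plain Python without calling matplotlib; the variable name default_x is chosen here purely for illustration.

```python
squares = [1, 4, 9, 16, 25]

# With a single list, plot() numbers the points 0, 1, 2, ... by position.
default_x = list(range(len(squares)))
print(list(zip(default_x, squares)))
# [(0, 1), (1, 4), (2, 9), (3, 16), (4, 25)]

# Supplying the inputs explicitly pairs each square with its true base.
input_values = list(range(1, len(squares) + 1))
print(list(zip(input_values, squares)))
# [(1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]
```

The last pair in the first list, (4, 25), is exactly the misleading end point seen in the uncorrected chart.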
3. Use scatter() to draw a scatter chart and style it
Sometimes you need to draw a scatter chart and set the style of each data point. For example, you might want to display smaller values in one color and larger values in another. When plotting a large dataset, you can also give every point the same style and then redraw some points with different style options to highlight them.
To draw a single point, use scatter() and pass it a pair of x and y coordinates; it draws a point at the specified position:
import matplotlib.pyplot as plt

plt.scatter(2, 4)
plt.show()
Let's style the output to make it more interesting: add a title, label the axes, and make sure all the text is large enough to read:
import matplotlib.pyplot as plt

plt.scatter(2, 4, s=200)

# Set the chart title and label the axes
plt.title("Square Number", fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)

# Set the size of the tick labels
plt.tick_params(axis='both', which='major', labelsize=14)
plt.show()
Here we call scatter() and use the argument s to set the size of the point. If you run scatter_squares.py at this point, you will see a single point in the center of the chart. The result is as follows:
4. Draw a series of points using scatter()
To draw a series of points, pass scatter() two lists, one containing the x values and one containing the y values. The implementation is as follows:
import matplotlib.pyplot as plt

x_values = [1, 2, 3, 4, 5]
y_values = [1, 4, 9, 16, 25]
plt.scatter(x_values, y_values, s=100)

# Set the chart title and label the axes
plt.title("Square Number", fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)

# Set the size of the tick labels
plt.tick_params(axis='both', which='major', labelsize=14)
plt.show()
The list x_values contains the numbers to be squared, and the list y_values contains the square of each of those numbers. When these lists are passed to scatter(), matplotlib reads one value from each list in turn to draw a point. The points to be drawn are (1, 1), (2, 4), (3, 9), (4, 16), and (5, 25), and the resulting chart shows a point at each of these coordinates.
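The way the two lists line up can be previewed with Python's built-in zip(). This is only an illustration of the pairing, not a view into matplotlib's internals.

```python
x_values = [1, 2, 3, 4, 5]
y_values = [1, 4, 9, 16, 25]

# zip() takes one value from each list in turn, giving one (x, y) pair per point.
points = list(zip(x_values, y_values))
print(points)  # [(1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]

# Both lists must have the same length, otherwise scatter() raises an error.
print(len(x_values) == len(y_values))  # True
```

Printing the pairs before plotting is a quick way to catch a mistyped value in either list.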
5. Automatically calculate data
Calculating the values in the list by hand can be inefficient, especially when there are many points to draw. Instead of computing the point coordinates manually, we can let a Python loop do the calculation for us. Here is the code for drawing 1000 points:
import matplotlib.pyplot as plt

x_values = list(range(1, 1001))
y_values = [x**2 for x in x_values]
plt.scatter(x_values, y_values, s=40)

# Set the chart title and label the axes
plt.title("Square Number", fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)

# Set the size of the tick labels
plt.tick_params(axis='both', which='major', labelsize=14)

# Set the value range of each axis
plt.axis([0, 1100, 0, 1100000])
plt.show()
We first create a list of x values containing the numbers 1 to 1000. Next comes a list comprehension that generates the y values: it iterates over the x values (for x in x_values), calculates the square of each (x**2), and stores the results in the list y_values. We then pass the input list and the output list to scatter().
Because the dataset is large, we make the points smaller and use the function axis() to specify the value range of each axis. The function axis() requires four values: the minimum and maximum of the x and y coordinates. Here we set the range of the x-axis to 0 to 1100 and the range of the y-axis to 0 to 1100000. The resulting chart shows all 1000 points within these ranges.
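Hard-coding the axis range works for a fixed dataset, but the numbers have to be updated whenever the data changes. One possible alternative, sketched below, derives the range from the data by adding 10% of headroom above the largest value. The helper axis_limits is hypothetical (it is not a matplotlib function), and the 10% margin is an assumption chosen to reproduce the values used above.

```python
def axis_limits(x_values, y_values):
    """Return [x_min, x_max, y_min, y_max], starting at 0 with 10% headroom."""
    x_max = max(x_values)
    y_max = max(y_values)
    # Integer arithmetic keeps the limits exact for integer data.
    return [0, x_max + x_max // 10, 0, y_max + y_max // 10]


x_values = list(range(1, 1001))
y_values = [x ** 2 for x in x_values]

limits = axis_limits(x_values, y_values)
print(limits)  # [0, 1100, 0, 1100000]
```

The resulting list has the four-value form that axis() expects, so it could be passed as plt.axis(limits).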
6. Removing the outline of data points
matplotlib lets us assign a color to each point in a scatter chart. In older versions of matplotlib, the default is blue dots with black outlines, which works well when the chart contains only a few data points. When many points are drawn, however, the black outlines can run together. (Since matplotlib 2.0, points are drawn without a separate outline by default, so the argument below mainly matters on older versions.) To remove the outline of the data points, pass the argument edgecolors='none' when calling scatter():
import matplotlib.pyplot as plt

x_values = list(range(1, 1001))
y_values = [x**2 for x in x_values]
plt.scatter(x_values, y_values, edgecolors='none', s=40)

# Set the chart title and label the axes
plt.title("Square Number", fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)

# Set the size of the tick labels
plt.tick_params(axis='both', which='major', labelsize=14)

# Set the value range of each axis
plt.axis([0, 1100, 0, 1100000])
plt.show()
The result is as follows:
7. Custom color
To change the color of the data points, pass the parameter c to scatter() and set it to the name of the color to use, as shown below:
import matplotlib.pyplot as plt

x_values = list(range(1, 1001))
y_values = [x**2 for x in x_values]
plt.scatter(x_values, y_values, c='red', edgecolors='none', s=40)

# Set the chart title and label the axes
plt.title("Square Number", fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)

# Set the size of the tick labels
plt.tick_params(axis='both', which='major', labelsize=14)

# Set the value range of each axis
plt.axis([0, 1100, 0, 1100000])
plt.show()
The result is as follows:
We can also define custom colors using the RGB color model. To specify a custom color, pass the parameter c and set it to a tuple of three decimal values between 0 and 1, which represent the red, green, and blue components respectively. For example, the following code creates a scatter chart made up of blue dots:
import matplotlib.pyplot as plt

x_values = list(range(1, 1001))
y_values = [x**2 for x in x_values]
plt.scatter(x_values, y_values, c=(0, 0, 0.8), edgecolors='none', s=40)

# Set the chart title and label the axes
plt.title("Square Number", fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)

# Set the size of the tick labels
plt.tick_params(axis='both', which='major', labelsize=14)

# Set the value range of each axis
plt.axis([0, 1100, 0, 1100000])
plt.show()
The closer a value is to 0, the darker the resulting color, and the closer a value is to 1, the lighter the resulting color. The resulting chart shows the points in this blue.
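Color pickers and image editors often report colors as integers from 0 to 255 rather than as fractions. Assuming that is the form you have, the small helper below (a hypothetical name, not part of matplotlib) converts such values into the 0 to 1 tuple described above.

```python
def rgb255_to_unit(red, green, blue):
    """Convert 0-255 color components to the 0-1 range."""
    for component in (red, green, blue):
        if not 0 <= component <= 255:
            raise ValueError("each component must be between 0 and 255")
    return (red / 255, green / 255, blue / 255)


blue_dots = rgb255_to_unit(0, 0, 204)
print(blue_dots)  # (0.0, 0.0, 0.8)
```

The returned tuple has the same form as the (0, 0, 0.8) value passed to the parameter c in the example above.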
8. Use color mapping
A color map is a series of colors that change gradually from a start color to an end color. In visualization, color maps are used to highlight patterns in the data. For example, we can display smaller values in lighter colors and larger values in darker colors.
The module pyplot has a set of color maps built in. To use one, we need to tell pyplot how to set the color of each point in the dataset. The following code shows how to set the color according to the y value of each point:
import matplotlib.pyplot as plt

x_values = list(range(1, 1001))
y_values = [x**2 for x in x_values]
plt.scatter(x_values, y_values, c=y_values, cmap=plt.cm.Blues,
            edgecolors='none', s=40)

# Set the chart title and label the axes
plt.title("Square Number", fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)

# Set the size of the tick labels
plt.tick_params(axis='both', which='major', labelsize=14)

# Set the value range of each axis
plt.axis([0, 1100, 0, 1100000])
plt.show()
We set the parameter c to the list of y values and use the parameter cmap to tell pyplot which color map to use. This code displays points with small y values in light blue and points with large y values in dark blue, so the resulting chart shades from light blue at the lower left to dark blue at the upper right.
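By default, matplotlib rescales the values passed in c linearly, so that the smallest value maps to the start of the color map and the largest to the end. The sketch below reproduces that rescaling in plain Python to make the idea concrete; the function normalize is written here for illustration and is not the library's own code.

```python
def normalize(values):
    """Linearly rescale values so the minimum becomes 0.0 and the maximum 1.0."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # All values are equal, so every point gets the same position.
        return [0.0 for _ in values]
    return [(value - low) / span for value in values]


y_values = [x ** 2 for x in range(1, 1001)]
positions = normalize(y_values)
print(positions[0], positions[-1])  # 0.0 1.0
```

A position near 0.0 corresponds to the light end of Blues and a position near 1.0 to the dark end, which is why the largest squares appear darkest.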
If you want to see all the color maps available in pyplot, go to the official matplotlib website.
There, click Examples, then Color Examples, then colormaps_reference. The reference page that opens lists the available color maps.
9. Auto save chart
To have the program save the chart to a file automatically, replace the call to plt.show() with a call to plt.savefig():
import matplotlib.pyplot as plt

x_values = list(range(1, 1001))
y_values = [x**2 for x in x_values]
plt.scatter(x_values, y_values, c=y_values, cmap=plt.cm.Blues,
            edgecolors='none', s=40)

# Set the chart title and label the axes
plt.title("Square Number", fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)

# Set the size of the tick labels
plt.tick_params(axis='both', which='major', labelsize=14)

# Set the value range of each axis
plt.axis([0, 1100, 0, 1100000])

# Save the chart to a file instead of displaying it
plt.savefig('squares_plot.png', bbox_inches='tight')
The first argument specifies the file name under which to save the chart; the file is stored in the directory where scatter_squares.py is located. The second argument trims the extra white space from the chart. If you want to keep the extra white space around the chart, you can omit this argument.
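Because savefig() needs no window, the same approach can also be used in scripts that run without a display. The sketch below is one way to do that, under a few assumptions of our own: it selects matplotlib's non-interactive Agg backend, writes to a temporary directory rather than next to the script, and uses a smaller dataset to keep the example quick.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files only
import matplotlib.pyplot as plt

x_values = list(range(1, 101))
y_values = [x ** 2 for x in x_values]

plt.scatter(x_values, y_values, c=y_values, cmap=plt.cm.Blues,
            edgecolors='none', s=40)
plt.title("Square Number", fontsize=24)

# Save into a temporary directory (an assumption made for this example).
output_path = os.path.join(tempfile.mkdtemp(), 'squares_plot.png')
plt.savefig(output_path, bbox_inches='tight')
plt.close()

print("saved", output_path, os.path.getsize(output_path), "bytes")
```

Checking the file size after saving is a simple way to confirm that the chart was actually written.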
Summary
This article begins the data visualization project. It introduced the background of the project, namely the importance of data visualization and the related applications of Python. It also covered installing matplotlib on Windows, and finally the basics of drawing line charts and scatter charts with matplotlib. Python is a language that rewards hands-on practice. It is among the simplest programming languages and a good entry point; once you have learned it, languages such as Java, Go, and C become easier to pick up. Python is also a popular language that is very helpful for implementing artificial intelligence, so it is worth your time to learn. Life goes on and so does the effort. If we work hard every day, study hard, and keep improving our abilities, I believe we will learn something. Come on!