streamlit rapid deployment SCADA data analysis

By establishing a web app for SCADA data analysis, this paper briefly introduces the use and deployment of streamlit.
See the code in the text for details GitHub:
https://github.com/SooHooLee/test
Project web app See:
https://share.streamlit.io/soohoolee/test/data_analysis.py

1. What is Streamlit

Streamlit: an application development framework for machine learning and data science teams. It is the fastest way to build and share data applications. Without front-end experience, it can quickly build a shareable user-friendly web app in python in a few minutes.

2. Installation and operation of streamlit

Installation: open Anaconda Prompt and directly install pip in command line mode.

$ pip install streamlit

Run: in the file directory, run the streamlit file

$ streamlit run file name.py

3. SCADA data analysis practice

Import Streamlit: import streamlit as st
The main interface of streamlit can only be divided into left and right sides. Although it is not as good-looking as HTML/CSS, the good thing is that it is concise and clear. In this project, the left side is used for operation and the right side is displayed in real time. Add in the statement of streamlit sidebar. You can put it on the left interface.

3.1 guide library

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
import streamlit as st
from windrose import WindroseAxes
warnings.filterwarnings("ignore")
plt.rcParams['font.sans-serif'] = ['SimHei'] #Display Chinese labels normally
plt.rcParams['axes.unicode_minus'] = False #The negative sign is displayed normally
from statsmodels.tsa.stattools import acf
from statsmodels.graphics.tsaplots import plot_acf

3.2 data reading

(1) Title: st.header('This is a header ')
(2) Subtitle: st.subheader('This is a subheader ')
(3) Import file: st.file_uploader(label, type=None, accept_multiple_files=False, key=None, help=None, on_change=None, args=None, kwargs=None)
Parameter details:
label (str): a short label that prompts the user about the purpose of the uploader;
type (str/list/str/None): an array of allowed extensions. ['png', 'jpg'] defaults to None, indicating that all extensions are allowed;
accept_multiple_files (bool): if True, users are allowed to upload multiple files at the same time. In this case, the return value will be the file list. Default: False;
key (str/int): an optional string or integer used as the unique key of the widget;
help (str): a tooltip displayed next to the file uploader.
on_change (callable): file_ The value of uploader is an optional callback.
args (tuple): tuple of optional parameters passed to the callback.
kwargs (dict): an optional Dictionary of kwargs passed to the callback.
(4) Display the data frame as an interactive table: st.dataframe(data)

 st.sidebar.subheader('* Data upload *')
 st.subheader('* Data description *')
 file = st.sidebar.file_uploader('Upload data', type=['csv'], key=None)
 scada_data = pd.read_csv(file)
 st.dataframe(scada_data)

3.3 selecting time series

(5) Slider: st.slider(label, min_value=None, max_value=None, value=None, step=None, format=None, key=None, help=None, on_change=None, args=None, kwargs=None)
Parameter details:
label (str): a short label that prompts the user about the purpose of the slider;
min_value: minimum allowable value. If the value is int, it defaults to 0; if it is a floating point number, it is 0.0; if it is a date / datetime, it is value - timedelta(days=14); if it is a time, it is time min;
max_value: maximum allowable value. If the value is int, it defaults to 100; if it is a floating point number, it is 1.0; if it is a date / datetime, it is value + timedelta(days=14); if it is a time, it is time max;
Value: the value of the slider when it is first rendered. If a tuple / list containing two values is passed here, a range slider with these lower and upper limits is rendered. For example, if set to (1, 10), the slider will have an optional range between 1 and 10. The default is min_value.
step(int/float/timedelta/None): step interval. If the value is an integer, it defaults to 1; if it is a floating point number, it defaults to 0.01; if it is a date / date time, it defaults to 0.01; if it is a date / date time, it defaults to timedelta(minutes=15) (or if max_value - min_value < 1 day);
format (str/None): the format string of printf style, which controls how the interface should display numbers. This does not affect the return value. int/float formatter support:% d% e% F% g% I date / time / date time formatter use Moment.js Symbol
(6) Write: st.write(*args, **kwargs)
write () is a treasure chest. You can display the corresponding contents according to the commands in (), such as:
write(string): print the formatted Markdown string, using short code that supports LaTeX expressions and emoticons.
write(data_frame): displays the DataFrame as a table.
write(error): print exceptions specifically.
write(func): displays information about the function.
write(module): displays information about the module.
write(dict): displays the dict in the interactive widget.
write(mpl_fig): displays the Matplotlib diagram.
write(altair): displays the Altair chart.
write(keras): displays the Keras model.
write(graphviz): displays the Graphviz graph.
Write (plot_fig): displays the plot diagram.
write(bokeh_fig): displays a scatter chart.
write(sympy_expr): use LaTeX to print SymPy expressions.
write(htmlable): if available, print the repr for the object_ html() .
write(obj): print str(obj) if others are unknown
Wait, wait

  # ---------------------------Select time series-----------------------
    st.sidebar.subheader('* time series *')
    st.subheader('* time series *')
    options = np.array(scada_data['real_time']).tolist()
    (start_time, end_time) = st.select_slider("Please select a time series:", options=options,value=(options[0], options[len(options) - 1]))
    # setting index as date
    scada_data['real_time'] = pd.to_datetime(scada_data['real_time'], format='%Y-%m-%d')
    scada_data.index = scada_data['real_time']
    st.write("Sequence start time:", start_time)
    st.write("Sequence end time:", end_time)
    scada_data = scada_data[start_time:end_time]
    st.dataframe(scada_data)

3.4 time varying characteristic analysis

(7) Selection box: st.selectbox(label, options, index=0, format_func=special_internal_function, key=None, help=None, on_change=None, args=None, kwargs=None)
Note: # the elements in the selection box are labels in SCADA data, which can be modified according to the actual labels;
(8) Line chart: st.line_chart(data)

# ---------------------------Data analysis-----------------------
    st.header('------------------------Data analysis------------------------')  
    st.sidebar.header('-----------------Data analysis-----------------')  
    # -----------Time varying characteristic analysis-----------------
    st.sidebar.subheader('* Time varying characteristic analysis *')
    st.subheader('* Time varying characteristic analysis *')
    type = st.sidebar.selectbox('Please select the analysis object:',
                                ('ActivePower', 'WindSpeed', 'NacellePosition', 'WindDirction', 'AirPressure',
                                 'Temperature', 'PitchAngle', 'ErrorMode', 'OperationMode', 'GeratorSpeed', 'RotorSpeed',
                                 'AirDensity'))
    st.write("The selected analysis objects are:", type)
    st.line_chart(scada_data[type])

3.5 wind rose chart

You can use the windrose library to draw rose charts in python.
Install first: $pip install windrose
from windrose import WindroseAxes
For specific usage, see Official documents.
Official examples:

from windrose import WindroseAxes
from matplotlib import pyplot as plt
import matplotlib.cm as cm
import numpy as np
# Create wind speed and direction variables
ws = np.random.random(500) * 6
wd = np.random.random(500) * 360
ax = WindroseAxes.from_ax()
ax.bar(wd, ws, normed=True, opening=0.8, edgecolor='white')
ax.set_legend()


Then draw a gourd and ladle according to the gourd, draw a rose picture first, and then display the picture.
(9) Figure showing matplotlib.pyplot: st.pyplot(fig)

   # -----------Wind rose-----------------
    st.sidebar.subheader('* Wind rose *')
    st.subheader('* Wind rose *')
    ws = scada_data['WindSpeed']
    wd = scada_data['WindDirction']
    ax = WindroseAxes.from_ax()
    fig = ax.bar(wd, ws, normed=True, opening=0.8, edgecolor='white')
    st.set_option('deprecation.showPyplotGlobalUse', False)
    st.pyplot(fig)

3.6 correlation analysis

3.6.1 Pearson correlation

# -----------Correlation analysis-----------------
    st.sidebar.subheader('* correlation analysis  *')
    st.subheader('* correlation analysis  *')
    # data.corr() calculates the correlation coefficient
    # Pearson correlation
    st.sidebar.subheader(" (1) Pearson correlation coefficient")
    st.subheader(" (1) Pearson correlation coefficient")
    corr = scada_data.corr()
    st.write("Correlation coefficient:", corr)
    fig, ax = plt.subplots(figsize=(8, 8))  # Canvas Size 
    ax = sns.heatmap(corr, vmax=.8, square=True, annot=True)  # Draw thermal diagram annot=True display coefficient
    st.pyplot(fig)

3.6. 2 autocorrelation

   # Autocorrelation acf
    st.sidebar.subheader(" (2) Autocorrelation")
    st.subheader(" (2) Autocorrelation")

    type_acf = st.sidebar.selectbox('Please select the autocorrelation analysis object:',
                                    (
                                        'ActivePower', 'WindSpeed', 'NacellePosition', 'WindDirction', 'AirPressure',
                                        'Temperature',
                                        'PitchAngle', 'ErrorMode', 'OperationMode', 'GeratorSpeed', 'RotorSpeed',
                                        'AirDensity'))
    st.write("The selected autocorrelation analysis objects are:", type_acf)
    lags = st.sidebar.number_input("Please enter the order of autocorrelation:", min_value=1, max_value=200, value=30, step=1)
    st.write("The selected autocorrelation order is:", lags)
    data_acf = acf(scada_data[type_acf], unbiased=False, nlags=lags, qstat=False, fft=None, alpha=None, missing='none')
    st.write("Autocorrelation coefficient:", data_acf)
    plot_acf(scada_data[type_acf])
    st.pyplot()

4. Local operation

Step 1: open CMD and enter the folder path in command line mode
My file location is D:\project\test\data_analysis.py
Step 2: run the streamlit file streamlit run file name py
Step 3: the URL will be displayed. Open the URL in the browser and you can open it locally.

Another way is to open the streamlit run file name in pycharm's terminal py, so you don't have to enter the folder location in cmd.

We can see that the url of local deployment is port 8502. If you need to show it, you can't close the program. Keep the program open and the url can be opened. Isn't this very inconvenient? Next, we will deploy quickly and open the link without local.

5. Streamlit rapid deployment

First, let's take a look Official documents How to deploy:
Step 1: add your App to GitHub;
Step 2: deploy your App on streamlit cloud
Step 3: advanced settings for deployment;
Step 4: waiting for deployment
About the rapid deployment of Streamlit, the official document gives the above four steps. Is it very simple! A lot. Let's take it step by step.

5.1 add App to GitHub

First, you have to have a GitHub account, which is omitted.. Here is to upload the whole folder with pychar. Pychar associates GitHub and Git, and HA is also omitted here.
The test folder contains two files, data_analysis.py is the code for the above data analysis, requirements Txt is the dependency that APP needs to install.
Step1: VCS --> Import into Version Control -->Share Project on GitHub

Step 2: fill in project information

Step 3: documents submitted for the first time


Step4: upload succeeded


5.2 deploying apps on streamlit cloud

Step 1: Enter streamlit cloud Associate GitHub account and click "New app"

Step 3: configure APP. Either fill in it yourself or copy the GitHub URL for one click deployment.

Just copy the GitHub URL: https://github.com/SooHooLee/test/blob/master/data_analysis.py

5.3 advanced settings for deployment

You can select the python version, etc,

Step 4: waiting for deployment
App is being deployed, most of which can be deployed in a few minutes, but if there are many dependencies, the first deployment may take some time. After initial deployment, any changes that do not involve dependencies should be displayed immediately.

After the balloon is deployed, the standard structure of web app URL:
https://share.streamlit.io/[user name]/[repo name]/[branch name]/[app path]
For example: http://share.streamlit.io/streamlit/demo-self-driving/master/streamlit_app.py

See the code in the text for details GitHub:
https://github.com/SooHooLee/test
Project web app See:
https://share.streamlit.io/soohoolee/test/data_analysis.py

Thoughts on some questions:
1. If no data is uploaded, a ValueError will appear. Just upload the data (because streamlit is executed from top to bottom. If no data is uploaded, no data can be found later, so an error is reported).


2. About the data format: the attribute name of the data needs to be selected according to the given. Otherwise, the corresponding data content cannot be found after selecting time series analysis. Are general labels for SCADA data.
It's inconvenient to give data. You can upload your own data to try, or create data to test.
3. Other methods are also used for deployment, such as docker + Alibaba server, heroku, etc,

reference resources:
1,Streamlit
2,Streamlit Deployment Guide (wiki)

Keywords: Python Data Analysis Data Mining

Added by gazza52 on Tue, 14 Dec 2021 05:05:42 +0200