Introduction to Superset deployment, installation and use
Superset overview
Apache Superset is an open source, modern, lightweight BI analysis tool. It connects to a variety of data sources, offers a rich set of chart types, supports custom dashboards, and has a friendly, easy-to-use interface.
Superset application scenario
Because Superset connects to common big data analysis tools such as Hive, Kylin, and Druid, and supports custom dashboards, it can serve as the visualization tool for a data warehouse.
Superset installation and use
Superset official website address: https://superset.apache.org/
GitHub source address: https://github.com/apache/superset
Install Python environment
Superset is a web application written in Python. The project team developed it against Python 3.6, so a Python 3.6 environment is the most stable.
Installing Miniconda
conda is an open source package and environment manager. It can install different versions of Python and of software packages, together with their dependencies, on the same machine, and can switch between the resulting environments. Anaconda bundles conda, Python, and a large set of preinstalled packages such as numpy and pandas. Miniconda includes only conda and Python.
Download the latest version of Miniconda3
wangting@ops04:/opt/software >wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
Installing Miniconda
wangting@ops04:/opt/software >bash Miniconda3-latest-Linux-x86_64.sh
In order to continue the installation process, please review the license agreement.
Please, press ENTER to continue                      # press Enter to continue
>>>
Do you accept the license terms? [yes|no]
[no] >>>
Please answer 'yes' or 'no':'                        # accept the license terms: yes
>>> yes
Miniconda3 will now be installed into this location:
/home/wangting/miniconda3
  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below
[/home/wangting/miniconda3] >>> /opt/module/miniconda3   # custom installation path; the default is the home directory
Do you wish the installer to initialize Miniconda3
by running conda init? [yes|no]                      # run conda initialization: yes
[no] >>> yes
Thank you for installing Miniconda3!                 # this message means the installation is complete
While the installer runs, it automatically appends the following environment setup to the ~/.bashrc file in the user's home directory
__conda_setup="$('/opt/module/miniconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/opt/module/miniconda3/etc/profile.d/conda.sh" ]; then
        . "/opt/module/miniconda3/etc/profile.d/conda.sh"
    else
        export PATH="/opt/module/miniconda3/bin:$PATH"
    fi
fi
unset __conda_setup
Source the ~/.bashrc file modified by the installer
wangting@ops04:/opt/module/miniconda3 >source ~/.bashrc
(base) wangting@ops04:/opt/module/miniconda3 >
Exit base environment mode
(base) wangting@ops04:/opt/module/miniconda3 >conda deactivate
Disable automatic activation of the base environment at login
After Miniconda is installed, the base environment is activated by default every time a terminal is opened. The following command disables that automatic activation.
wangting@ops04:/opt/module/miniconda3 >conda config --set auto_activate_base false
Configure domestic (China) conda mirrors
wangting@ops04:/opt/module/miniconda3 >conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
wangting@ops04:/opt/module/miniconda3 >conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
wangting@ops04:/opt/module/miniconda3 >conda config --set show_channel_urls yes
Create Python environment
Create a Python 3.6 environment. The value after --name (-n) is the custom environment name, and python= sets the Python version.
wangting@ops04:/home/wangting >conda create --name superset python=3.6
Proceed ([y]/n)? y
# Wait for creation to finish; packages and dependencies are downloaded during the process
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate superset
#
# To deactivate an active environment, use
#
#     $ conda deactivate
# Seeing the output above means the environment was created
[Note] If conda create prints "WARNING: A newer version of conda exists", conda can be updated with the suggested command. The update is not required when the environment is created successfully.
wangting@ops04:/home/wangting >conda create --name superset python=3.6
Collecting package metadata (current_repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 4.7.12
  latest version: 4.10.1

Please update conda by running

    $ conda update -n base -c defaults conda

Segmentation fault (core dumped)
wangting@ops04:/home/wangting >conda update -n base -c defaults conda
Common conda environment management commands
View all conda environments
wangting@ops04:/opt/module/miniconda3/pkgs >conda info --envs
# conda environments:
#
base                  *  /opt/module/miniconda3
superset                 /opt/module/miniconda3/envs/superset
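The same listing can be consumed from a script. The following is a minimal Python sketch, assuming that `conda env list --json` prints a JSON object with an `envs` list of paths. The function name `env_names` is hypothetical, and treating every path outside an `envs` directory as `base` is a simplification.

```python
import json
from pathlib import PurePosixPath


def env_names(conda_json: str) -> dict:
    """Map environment name -> path from `conda env list --json` output."""
    paths = json.loads(conda_json)["envs"]
    names = {}
    for p in paths:
        path = PurePosixPath(p)
        # Environments created with `conda create -n NAME` live under .../envs/NAME;
        # any other path is treated as the base installation (simplification).
        name = path.name if path.parent.name == "envs" else "base"
        names[name] = p
    return names


# Sample input built from the paths shown in the listing above.
sample = (
    '{"envs": ["/opt/module/miniconda3", '
    '"/opt/module/miniconda3/envs/superset"]}'
)
print(env_names(sample))
```

In practice the JSON text would come from running the conda command, for example through `subprocess.run(..., capture_output=True)`.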
Activate a conda environment
wangting@ops04:/opt/module/miniconda3/pkgs >conda activate superset
(superset) wangting@ops04:/opt/module/miniconda3/pkgs >
(superset) wangting@ops04:/opt/module/miniconda3/pkgs >python --version
Python 3.6.13 :: Anaconda, Inc.
Exit the current conda environment
(superset) wangting@ops04:/opt/module/miniconda3/pkgs >conda deactivate
wangting@ops04:/opt/module/miniconda3/pkgs >
Verify that the environment works
# Activate the environment again
wangting@ops04:/opt/module/miniconda3/pkgs >conda activate superset
(superset) wangting@ops04:/opt/module/miniconda3/pkgs >
# Start the Python interpreter of the conda environment
(superset) wangting@ops04:/opt/module/miniconda3/pkgs >python
Python 3.6.13 |Anaconda, Inc.| (default, Jun 4 2021, 14:25:59)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
# Inside the conda environment, install modules with pip; -i specifies the package index (the official index by default). This verifies pip
(superset) wangting@ops04:/opt/module/miniconda3/pkgs >pip install gunicorn -i https://pypi.douban.com/simple/
Looking in indexes: https://pypi.douban.com/simple/
Collecting gunicorn
  Downloading https://pypi.doubanio.com/packages/e4/dd/5b190393e6066286773a67dfcc2f9492058e9b57c4867a95f1ba5caf0a83/gunicorn-20.1.0-py3-none-any.whl (79 kB)
     |████████████████████████████████| 79 kB 2.3 MB/s
Requirement already satisfied: setuptools>=3.0 in /opt/module/miniconda3/envs/superset/lib/python3.6/site-packages (from gunicorn) (52.0.0.post20210125)
Installing collected packages: gunicorn
Successfully installed gunicorn-20.1.0
(superset) wangting@ops04:/opt/module/miniconda3/pkgs >
Add another conda environment for verification. --name can be abbreviated as -n. The Python version is pinned explicitly here: without python=, conda does not install an interpreter into the new environment, so the python command resolves to whatever else is on the PATH (2.7 in the author's setup).
# The process is the same as for the superset environment above
wangting@ops04:/opt/module/miniconda3/pkgs >conda create -n wangting python=3.6
wangting@ops04:/opt/module/miniconda3/pkgs >
wangting@ops04:/opt/module/miniconda3/pkgs >conda info --envs
# conda environments:
#
base                  *  /opt/module/miniconda3
superset                 /opt/module/miniconda3/envs/superset
wangting                 /opt/module/miniconda3/envs/wangting
wangting@ops04:/opt/module/miniconda3/pkgs >conda activate wangting
(wangting) wangting@ops04:/opt/module/miniconda3/pkgs >
Superset deployment
Before installing Superset, install the following required dependencies
wangting@ops04:/opt/software >sudo yum install -y python-setuptools
wangting@ops04:/opt/software >sudo yum install -y gcc gcc-c++ libffi-devel python-devel python-pip python-wheel openssl-devel cyrus-sasl-devel openldap-devel
Activate the conda superset environment for installation and deployment
wangting@ops04:/opt/software >conda activate superset
(superset) wangting@ops04:/opt/software >
Install (update) setuptools and pip
(superset) wangting@ops04:/opt/software >pip install --upgrade setuptools pip -i https://pypi.douban.com/simple/
Looking in indexes: https://pypi.douban.com/simple/
Requirement already satisfied: setuptools in /opt/module/miniconda3/envs/superset/lib/python3.6/site-packages (52.0.0.post20210125)
Collecting setuptools
  Downloading https://pypi.doubanio.com/packages/4e/78/56aa1b5f4d8ac548755ae767d84f0be54fdd9d404197a3d9e4659d272348/setuptools-57.0.0-py3-none-any.whl (821 kB)
     |████████████████████████████████| 821 kB 2.4 MB/s
Requirement already satisfied: pip in /opt/module/miniconda3/envs/superset/lib/python3.6/site-packages (21.1.2)
Installing collected packages: setuptools
  Attempting uninstall: setuptools
    Found existing installation: setuptools 52.0.0.post20210125
    Uninstalling setuptools-52.0.0.post20210125:
      Successfully uninstalled setuptools-52.0.0.post20210125
Successfully installed setuptools-57.0.0
(superset) wangting@ops04:/opt/software >
Install superset
# Installing apache-superset pulls in a series of dependent modules; wait for the installation to complete
(superset) wangting@ops04:/opt/software >pip install apache-superset -i https://pypi.douban.com/simple/
Initialize the Superset database
(superset) wangting@ops04:/opt/software >superset db upgrade
Traceback (most recent call last):
  File "/opt/module/miniconda3/envs/superset/bin/superset", line 5, in <module>
    from superset.cli import superset
  File "/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/superset/__init__.py", line 21, in <module>
    from superset.app import create_app
  File "/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/superset/app.py", line 45, in <module>
    from superset.security import SupersetSecurityManager
  File "/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/superset/security/__init__.py", line 17, in <module>
    from superset.security.manager import SupersetSecurityManager  # noqa: F401
  File "/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/superset/security/manager.py", line 44, in <module>
    from superset import sql_parse
  File "/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/superset/sql_parse.py", line 18, in <module>
    from dataclasses import dataclass
ModuleNotFoundError: No module named 'dataclasses'
# The command fails because the dataclasses module cannot be found; install it as the error indicates
(superset) wangting@ops04:/opt/software >pip install dataclasses
Collecting dataclasses
  Downloading dataclasses-0.8-py3-none-any.whl (19 kB)
Installing collected packages: dataclasses
Successfully installed dataclasses-0.8
# Run the initialization again; this time it completes
(superset) wangting@ops04:/opt/software >superset db upgrade
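The error occurs because dataclasses only joined the Python standard library in 3.7, so a 3.6 environment needs the backport package from PyPI. As a minimal sketch, a missing module can be detected before running `superset db upgrade`. The helper name `missing_modules` is hypothetical, and the check only covers top-level module names.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Report the interpreter version and any missing module.
print("Python", ".".join(str(part) for part in sys.version_info[:3]))
for name in missing_modules(["dataclasses"]):
    print("missing module, install it with: pip install", name)
```

Run inside the activated superset environment, this prints nothing after the version line once the backport is installed.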
Create administrator user
(superset) wangting@ops04:/opt/software >export FLASK_APP=superset
(superset) wangting@ops04:/opt/software >flask fab create-admin
Username [admin]:            # press Enter; this is the admin user for logging in to the management page
User first name [admin]:     # press Enter; user information
User last name [user]:       # press Enter; user information
Email [admin@fab.org]:       # press Enter; email information
Password:                    # set the admin login password for the management page (123456 here)
Repeat for confirmation:     # repeat the password (123456)
logging was configured successfully
INFO:superset.utils.logging_configurator:logging was configured successfully
/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/flask_caching/__init__.py:202: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
  "Flask-Caching: CACHE_TYPE is set to null, "
No PIL installation found
INFO:superset.utils.screenshots:No PIL installation found
Recognized Database Authentications.
Admin User admin created.
Superset initialization
(superset) wangting@ops04:/opt/software >superset init
logging was configured successfully
INFO:superset.utils.logging_configurator:logging was configured successfully
...
...
INFO:superset.security.manager:Creating missing metrics permissions
Cleaning faulty perms
INFO:superset.security.manager:Cleaning faulty perms
(superset) wangting@ops04:/opt/software >
Install gunicorn to provide http services
(superset) wangting@ops04:/opt/software >pip install gunicorn -i https://pypi.douban.com/simple/
Looking in indexes: https://pypi.douban.com/simple/
Requirement already satisfied: gunicorn in /opt/module/miniconda3/envs/superset/lib/python3.6/site-packages (20.0.4)
Requirement already satisfied: setuptools>=3.0 in /opt/module/miniconda3/envs/superset/lib/python3.6/site-packages (from gunicorn) (57.0.0)
Start Superset
(superset) wangting@ops04:/opt/software >gunicorn --workers 5 --timeout 120 --bind ops04:8787 "superset.app:create_app()" --daemon
(superset) wangting@ops04:/opt/software >
[Note:] ops04 is the host name; /etc/hosts contains an entry that resolves it to the host's IP.
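The same options can be kept in a gunicorn configuration file, which is itself a Python module. The following is a sketch under assumptions: the file name `gunicorn.conf.py` and the `pidfile` path are not part of the original setup and are chosen only for illustration. The other values mirror the command line above.

```python
# gunicorn.conf.py (hypothetical file name)
# Each setting mirrors one flag of the gunicorn command shown above.

bind = "ops04:8787"   # --bind ops04:8787
workers = 5           # --workers 5
timeout = 120         # --timeout 120 (seconds)
daemon = True         # --daemon

# Assumption, not in the original command: record the master process ID
# in a file so the service can be located later without grepping ps output.
pidfile = "/tmp/superset-gunicorn.pid"
```

With such a file in place, the service would be started as `gunicorn -c gunicorn.conf.py "superset.app:create_app()"`.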
View Superset running status
(superset) wangting@ops04:/opt/software >netstat -tnlpu|grep 8787
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 11.8.38.86:8787         0.0.0.0:*               LISTEN      18884/python
(superset) wangting@ops04:/opt/software >ps -ef | grep 8787 | grep -v grep
wangting 18884     1  0 11:32 ?        00:00:00 /opt/module/miniconda3/envs/superset/bin/python /opt/module/miniconda3/envs/superset/bin/gunicorn --workers 5 --timeout 120 --bind ops04:8787 superset.app:create_app() --daemon
wangting 18887 18884  5 11:32 ?        00:00:04 /opt/module/miniconda3/envs/superset/bin/python /opt/module/miniconda3/envs/superset/bin/gunicorn --workers 5 --timeout 120 --bind ops04:8787 superset.app:create_app() --daemon
wangting 18888 18884  5 11:32 ?        00:00:04 /opt/module/miniconda3/envs/superset/bin/python /opt/module/miniconda3/envs/superset/bin/gunicorn --workers 5 --timeout 120 --bind ops04:8787 superset.app:create_app() --daemon
wangting 18890 18884  5 11:32 ?        00:00:04 /opt/module/miniconda3/envs/superset/bin/python /opt/module/miniconda3/envs/superset/bin/gunicorn --workers 5 --timeout 120 --bind ops04:8787 superset.app:create_app() --daemon
wangting 18892 18884  5 11:32 ?        00:00:04 /opt/module/miniconda3/envs/superset/bin/python /opt/module/miniconda3/envs/superset/bin/gunicorn --workers 5 --timeout 120 --bind ops04:8787 superset.app:create_app() --daemon
wangting 18893 18884  5 11:32 ?        00:00:04 /opt/module/miniconda3/envs/superset/bin/python /opt/module/miniconda3/envs/superset/bin/gunicorn --workers 5 --timeout 120 --bind ops04:8787 superset.app:create_app() --daemon
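The netstat check can also be done from Python by attempting a TCP connection to the port. This is a minimal sketch using only the standard library; the function name `is_listening` is hypothetical.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and host name resolution failures.
        return False
```

For the deployment above the call would be `is_listening("ops04", 8787)`, which should return True while gunicorn is running.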
Stop the Superset service (only when necessary)
# Equivalent to killing the corresponding process IDs one by one; the service itself does not provide a command to stop it
(superset) wangting@ops04:/opt/software >ps -ef | awk '/gunicorn/ && !/awk/{print $2}' | xargs kill -9
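kill -9 ends the workers immediately, whereas gunicorn treats SIGTERM sent to its master process as a request for graceful shutdown. A gentler alternative can be sketched as follows, assuming gunicorn was started with its `--pid FILE` option so that the master PID is recorded in a file. The function name and any pidfile path passed to it are hypothetical.

```python
import os
import signal
from pathlib import Path


def signal_gunicorn(pidfile: str, sig: int = signal.SIGTERM) -> int:
    """Send `sig` to the process whose PID is stored in `pidfile`; return the PID.

    SIGTERM (the default) asks the gunicorn master to shut down gracefully.
    Signal 0 sends nothing and only checks that the process exists.
    """
    pid = int(Path(pidfile).read_text().strip())
    os.kill(pid, sig)
    return pid
```

A call such as `signal_gunicorn("/path/to/gunicorn.pid")` would then replace the ps, awk, and xargs pipeline.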
Accessing Superset
http://ops04:8787/login/
[Note:]
- The user name and password are the ones just defined with create-admin
- The URL http://ops04:8787/login/ is reachable because ops04 is mapped to its IP in the C:\Windows\System32\drivers\etc\hosts file on the client machine. The page can also be reached directly at ip:8787
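Whether a host name has such a mapping can be checked by scanning the hosts file. The sketch below works on the file's text so that it runs anywhere; the function name `lookup_in_hosts` is hypothetical, and the sample content is made up from the host name and IP that appear in this article.

```python
def lookup_in_hosts(hosts_text: str, hostname: str):
    """Return the first IP mapped to `hostname` in hosts-file text, or None."""
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into: IP, name, [aliases...]
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            return fields[0]
    return None


# Sample hosts content (illustrative only).
sample = """
127.0.0.1   localhost
11.8.38.86  ops04   # Superset host
"""
print(lookup_in_hosts(sample, "ops04"))
```

On a real machine the text would be read from /etc/hosts on Linux or from C:\Windows\System32\drivers\etc\hosts on Windows.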
Installing data source drivers for Superset
Superset needs a different driver package for each type of data source. The official documentation describes them at the following address
https://superset.apache.org/docs/databases/installing-database-drivers
Common data source pip installation methods and connection formats
Database | PyPI package | Connection String |
---|---|---|
Apache Hive | pip install pyhive | hive://hive@{hostname}:{port}/{database} |
Apache Impala | pip install impyla | impala://{hostname}:{port}/{database} |
Apache Kylin | pip install kylinpy | kylin://<username>:<password>@<hostname>:<port>/<project>?<param1>=<value1>&<param2>=<value2> |
Apache Spark SQL | pip install pyhive | hive://hive@{hostname}:{port}/{database} |
Big Query | pip install pybigquery | bigquery://{project_id} |
Elasticsearch | pip install elasticsearch-dbapi | elasticsearch+http://{user}:{password}@{host}:9200/ |
MySQL | pip install mysqlclient | mysql://<UserName>:<DBPassword>@<Database Host>/<Database Name> |
Oracle | pip install cx_Oracle | oracle:// |
PostgreSQL | pip install psycopg2 | postgresql://<UserName>:<DBPassword>@<Database Host>/<Database Name> |
Presto | pip install pyhive | presto:// |
SQLite | No additional library needed | sqlite:// |
SQL Server | pip install pymssql | mssql:// |
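A few rows of the table can be captured as data so that a connection string is filled in from its parts. This is only a sketch: the `DRIVERS` mapping and the `connection_string` helper are hypothetical, and the placeholder names are normalized to Python format fields rather than copied verbatim from the table.

```python
# Database name -> (PyPI package, connection string template).
# Placeholder names are normalized for str.format(); this is an assumption
# made for the example, not the notation used by Superset itself.
DRIVERS = {
    "Apache Hive": ("pyhive", "hive://hive@{hostname}:{port}/{database}"),
    "Apache Impala": ("impyla", "impala://{hostname}:{port}/{database}"),
    "MySQL": ("mysqlclient", "mysql://{username}:{password}@{hostname}/{database}"),
    "PostgreSQL": ("psycopg2", "postgresql://{username}:{password}@{hostname}/{database}"),
}


def connection_string(db: str, **fields) -> str:
    """Fill in the connection string template for the given database type."""
    _package, template = DRIVERS[db]
    return template.format(**fields)


uri = connection_string(
    "MySQL", username="root", password="123456",
    hostname="11.8.38.86", database="gmall",
)
print(uri)
```

The values passed here are the ones this article uses for its MySQL data source.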
Installing mysqlclient dependencies
(superset) wangting@ops04:/opt/software >conda install mysqlclient
Proceed ([y]/n)?       # y
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Restart Superset after installation
(superset) wangting@ops04:/opt/software >ps -ef | awk '/gunicorn/ && !/awk/{print $2}' | xargs kill -9
(superset) wangting@ops04:/opt/software >netstat -tnlpu|grep 8787
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
(superset) wangting@ops04:/opt/software >gunicorn --workers 5 --timeout 120 --bind ops04:8787 "superset.app:create_app()" --daemon
(superset) wangting@ops04:/opt/software >netstat -tnlpu|grep 8787
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 11.8.38.86:8787         0.0.0.0:*               LISTEN      44883/python
(superset) wangting@ops04:/opt/software >
Data source configuration
Database configuration
Add database and save
Data > Databases
gmall # display name for the database; can be changed as needed
mysql://root:123456@11.8.38.86/gmall?charset=utf8
root # username
123456 # database password
11.8.38.86 # database ip
gmall # database
charset=utf8 # specifies the character set
Default 3306 port
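The breakdown above can be reproduced with the Python standard library, which helps when checking a connection string before pasting it into Superset. This is a minimal sketch; the function name `describe` is hypothetical, and falling back to port 3306 reflects the MySQL default mentioned above.

```python
from urllib.parse import parse_qs, urlsplit


def describe(uri: str) -> dict:
    """Split a database connection URI into its named parts."""
    parts = urlsplit(uri)
    return {
        "dialect": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port or 3306,  # no port in the URI: assume the MySQL default
        "database": parts.path.lstrip("/"),
        "options": {key: values[0] for key, values in parse_qs(parts.query).items()},
    }


info = describe("mysql://root:123456@11.8.38.86/gmall?charset=utf8")
for key, value in info.items():
    print(key, "=", value)
```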
Add test case data table
table: supersetwt
Table configuration
Data > Datasets: click the + sign to add a dataset
Once the database and the table are added, the data source setup is complete
Create a dashboard
Dashboards: click the + sign to add one
To create a chart, click the table and select a chart template
Edit model
Click SAVE, then open the dashboard to see the first version
The chart does not show much because the weight values are too close together. Modify the MySQL data so that the weight drops more sharply
Click the small menu in the upper right corner of the chart to refresh its data
Continue adding chart elements (reading statistics)
Continue adding chart elements (maximum body temperature over the past week)
The dashboard layout can be edited