Data cleaning using poi

1. Data cleaning

1.1. What is data cleansing

Data cleaning – the process of re examining and verifying data to remove duplicate information, correct existing errors, and provide data consistency.

From the name of data cleaning, we can see that it is to "wash away" the "dirty", which refers to the last procedure to find and correct the identifiable errors in the data file, including checking the consistency of data, dealing with invalid and missing values, etc. Because the data in the data warehouse is a collection of data facing a certain subject, which is extracted from multiple business systems and contains historical data, it is inevitable that some data are wrong data and some data conflict with each other. These wrong or conflicting data are obviously what we don't want, which is called "dirty data". We should "wash away" dirty data according to certain rules, which is data cleaning. The task of data cleaning is to filter the data that does not meet the requirements, submit the filtering results to the business competent department, and confirm whether to filter it or extract it after being corrected by the business unit. The data that do not meet the requirements mainly include incomplete data, wrong data and duplicate data. Data cleaning is different from questionnaire review. The data cleaning after input is generally completed by computer rather than manually

1.2. Steps of data cleaning

The steps of data cleaning are relatively simple and clear. There are two stages:

  • Step 1: deviation detection: check the factors leading to the deviation and identify the discrete value and noise value, such as whether the data we take is representative, whether the amount of data is sufficient, the influence of accidental factors and other factors, which can reflect the discrete value and noise value on the mathematical model

  • Step 2: data cleaning: deal with missing values and noise.

1.3. Method of data cleaning

  • Missing data filling
    The methods to fill in missing values fall into three categories:
  • Delete directly:
    Records with missing values can be deleted, or attributes with missing values can be deleted directly. But it is not very effective in practice, especially when the missing value changes a lot and spans multiple independent variables.
  • Fill in manually:
    It sounds impractical and time-consuming to fill in the exact value manually. This method can be directly ignored in the analysis of big data.
  • Auto fill:
    This method is the most widely used at present. As the name suggests, missing values are automatically populated with certain values. Make cleaning according to different values of automatic filling

1.4. So how do we achieve data cleaning?

Note: in fact, we don't need to do it, because these are often carried out by special posts or departments. For example, Algorithm Engineers produce algorithms, big data engineers implement them, and clean them with the relevant framework / knowledge of big data.

ps: so how do we achieve data cleaning? Our approach is to achieve statistical summation or de reordering of the data obtained by the crawler

Reading guide module: our cleaning this time is to put three excel The salary in is classified and counted, and the salary in such situations as face-to-face negotiation and others is ignored

2.1. Use poi to write excel (2003 version of Excel)

package com.Li.poi;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

import jxl.Workbook;
/**
 * @Description:   Write to Excel using hssfworkbook POI jar
 * @auther:Li Ya Hui
 * @Time:2021 April 29, 2014 1:51:42 PM
 */
public class Test {

	public static void main(String[] args) throws IOException {
		//1. Create a workbook object using hssfworkbook POI jar
		HSSFWorkbook workbook = new HSSFWorkbook();
		//2. Create sheet
		HSSFSheet sheet = workbook.createSheet();
		//3. Create the first row of objects
		HSSFRow row = sheet.createRow(0);
		//4. Write to header
		String[] biaotouStrings = {"number","full name"};
		for (int i = 0; i < biaotouStrings.length; i++) {
			//Create column create cell
			HSSFCell cell = row.createCell(i);
			//Write value
			cell.setCellValue(biaotouStrings[i]);
		}
		//5. Write 20 pieces of data under the header
		for (int i = 1; i < 20; i++) {
			//Create row
			HSSFRow row2 = sheet.createRow(i);
			//Create the first column of each row
			HSSFCell cell01 = row2.createCell(0);
			//Write the value of the first column
			cell01.setCellValue(i);
			//Create second column
			HSSFCell cell02 = row2.createCell(1);
			//Write the value of the second column
			cell02.setCellValue("00_"+i);
		}
		//6. Specify the path of the file
		File file = new File("E://poi_Excell//poi_one.xls");
		//7. Initialize file output stream object
		FileOutputStream fileOutputStream = new FileOutputStream(file);
		//8. Write data to file
		workbook.write(fileOutputStream);
		//FileStream. Refresh data
		fileOutputStream.flush();
		//10. Close the corresponding flow object
		fileOutputStream.close();
		workbook.close();
	}
}

2.2.poi realizes reading data in excel (2003 version of Excel)

package com.Li.poi;

import java.io.FileInputStream;

import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.xmlbeans.impl.xb.xsdschema.TotalDigitsDocument.TotalDigits;
/**
 * @Description:  Reading data in Excel with Poi
 * @auther:Li Ya Hui
 * @Time:2021 April 29, 2014 2:15:54 PM
 */
public class Test02 {
	public static void main(String[] args) throws Exception {
		//1. File path
		String pathName = "E://poi_Excell//poi_one.xls";
		//2. Convert the excel to be parsed into a stream for output
		FileInputStream fileInputStream = new FileInputStream(pathName);
		//3. Instantiate the core class of poi
		HSSFWorkbook workbook = new HSSFWorkbook(fileInputStream);
		//4. Get the first sheet of the current Excel, that is, the first page
		HSSFSheet sheet = workbook.getSheetAt(0);
		//5. Get the number of rows in the current sheet
		int totalRows = sheet.getPhysicalNumberOfRows();
		//6. Cycle through each line
		for (int i = 0; i < totalRows; i++) {
			//Take out each line
			HSSFRow row = sheet.getRow(i);
			//Get the total number of columns
			int totalCellNums = row.getPhysicalNumberOfCells();
			//Loop each column of each row
			for (int j = 0; j < totalCellNums; j++) {
				 //Get cell
				 HSSFCell cell = row.getCell(j);
				 //Get value
				 //The function to get the value is defective. Write a function yourself
				   String CellValue = getValue(cell);
				 //Output value
				 System.out.print(CellValue+"\t");
			}
			System.out.println();
		}
	}
	/**
	 * @desc Custom function -- format output
	 * @param cell
	 * @return
	 */
	//Get various types of cell s
	public static String getValue(Cell cell)
	{
		//Determines whether it is the value of a numeric cell
		if (cell.getCellType()==Cell.CELL_TYPE_NUMERIC)//Numeric cell
		{
			//When returning, use valueof to transfer out the value
			return  String.valueOf((int) cell.getNumericCellValue()); 
		}
		else if(cell.getCellType()==Cell.CELL_TYPE_BOOLEAN) //
		{
			return String.valueOf(cell.getBooleanCellValue());
		}
		//Returns a value of type string
		return cell.getStringCellValue();
	}
}

2.3. Analysis of POI operation Excel 2007 + Version (Excel 2007 + version)

Reading guide module: with the update of Excel version, some people in the market use Excel 2003, and of course some people use Excel 2007 +. What should we do? In fact, poi has helped us do a good job. As for parsing the higher version of Excel 2007 + compared with the 2003 version, only the core class has changed, and its steps and ideas have not changed

excel2007:XSSFWorkbook ---->XSSFSheet ---->XSSFRow—>XSSFCell
package com.Li.poi;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFCell;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import jxl.Workbook;
/**
 * @Description:   Use poi to write to Excel (2007 + version of Excel)
 * @auther:Li Ya Hui
 * @Time:2021 April 29, 2014 1:51:42 PM
 */
public class Test03 {

	public static void main(String[] args) throws IOException {
		//1. Create a workbook object using hssfworkbook POI jar
//		HSSFWorkbook workbook = new HSSFWorkbook();
		XSSFWorkbook workbook = new XSSFWorkbook();
		//2. Create sheet
//		HSSFSheet sheet = workbook.createSheet();
		XSSFSheet sheet = workbook.createSheet();
		//3. Create the first row of objects
//		HSSFRow row = sheet.createRow(0);
		XSSFRow row = sheet.createRow(0);
		//4. Write to header
		String[] biaotouStrings = {"number","full name"};
		for (int i = 0; i < biaotouStrings.length; i++) {
			//Create column create cell
//			HSSFCell cell = row.createCell(i);
			XSSFCell cell = row.createCell(i);
			//Write value
			cell.setCellValue(biaotouStrings[i]);
		}
		//5. Write 20 pieces of data under the header
		for (int i = 1; i < 20; i++) {
			//Create row
//			HSSFRow row2 = sheet.createRow(i);
			XSSFRow row2 = sheet.createRow(i);
			//Create the first column of each row
//			HSSFCell cell01 = row2.createCell(0);
			XSSFCell cell01 = row2.createCell(0);
			//Write the value of the first column
			cell01.setCellValue(i);
			//Create second column
//			HSSFCell cell02 = row2.createCell(1);
			XSSFCell cell02 = row2.createCell(1);
			//Write the value of the second column
			cell02.setCellValue("00_"+1);
		}
		//6. Specify the path of the file
		File file = new File("E://poi_Excell//poi_one.xlsx");
		//7. Initialize file output stream object
		FileOutputStream fileOutputStream = new FileOutputStream(file);
		//8. Write data to file
		workbook.write(fileOutputStream);
		//9. Refresh fileOutputStream data
		fileOutputStream.flush();
		//10. Close the corresponding flow object
		fileOutputStream.close();
		workbook.close();
	}
}

2.4. Using poi to parse excel in 2003 and 2007 at the same time

package com.Li.poi;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

/**
 * @Description:  Using poi to parse excel in 2003 and 2007 at the same time
 * @auther:Li Ya Hui
 * @Time:2021 April 29, 2014 3:47:09 PM
 */
public class Test04 {
	public static void main(String[] args) throws IOException {
		//Create a workbook interface object
		Workbook wb = null;
		//Create a sheet interface object
		Sheet sheet = null;
		//Create a Row interface object
		Row row = null;
		//Create a Cell interface object
		Cell cell = null;
		String pathname = "E:\\poi_Excell\\poi_one.xlsx"; 
		//Convert excel to file object
		File file = new File(pathname);
		//2. Convert file object to stream object
		FileInputStream fileInputStream = new FileInputStream(file);
		//3. Judgment
		if(file.isFile()&&file.exists()) 
		{
			//3.1 get the name of current excel
			String fileName = file.getName();
			System.out.println(fileName);
			//3.2 get the suffix of the file
			int st = fileName.lastIndexOf(".");
			CharSequence suffix = fileName.subSequence(st+1, fileName.length());
			//3.3 realize multi-channel judgment through the suffix of the file
			if(suffix.equals("xls")) 
			{
				System.out.println("Walk 2003 analysis");
				//Different core classes
				wb = new HSSFWorkbook(fileInputStream);
			}
			else if(suffix.equals("xlsx"))
			{
				System.out.println("Go 2007 analysis");
				//Different core classes
				wb = new XSSFWorkbook(fileInputStream);
			}
			else 
			{
				System.out.println("The current file cannot be parsed. Exit the program");
				System.exit(0);
			}
		}
		//4. Get the first sheet of the current Excel, that is, the first page
		sheet =  wb.getSheetAt(0);
		//5. Get the number of rows in the current sheet
		int totalRows = sheet.getPhysicalNumberOfRows();
		//6. Cycle through each line
		for (int i = 0; i < totalRows; i++) {
			//Take out each line
			row = sheet.getRow(i);
			//Get the total number of columns
			int totalCellNums = row.getPhysicalNumberOfCells();
			//Loop each column of each row
			for (int j = 0; j < totalCellNums; j++) {
				 //Get cell
				 cell = row.getCell(j);
				 //Get value
				 //The function to get the value is defective. Write a function yourself
				   String CellValue = getValue(cell);
				 //Output value
				 System.out.print(CellValue+"\t");
			}
			System.out.println();
		}
	}
	/**
	 * @desc Custom function -- format output
	 * @param cell
	 * @return
	 */
	//Get various types of cell s
	public static String getValue(Cell cell)
	{
		//Determines whether it is the value of a numeric cell
		if (cell.getCellType()==Cell.CELL_TYPE_NUMERIC)//Numeric cell
		{
			//When returning, use valueof to transfer out the value
			return  String.valueOf((int) cell.getNumericCellValue()); 
		}
		else if(cell.getCellType()==Cell.CELL_TYPE_BOOLEAN) //
		{
			return String.valueOf(cell.getBooleanCellValue());
		}
		//Returns a value of type string
		return cell.getStringCellValue();
	}
}

2.5. Using poi to clean data

package com.Li.flushs;
/**
 * @Description:  Using POI to build a tool class
 * @auther:Li Ya Hui
 * @Time:2021 April 29, 2014 5:40:53 PM
 */

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class FileUtils {
	//Get different versions of excel through the path
	public Workbook getWorkbook(String pathname) throws IOException 
	{
		//Create a workbook interface object
		Workbook wb = null;
		//Create a sheet interface object
		Sheet sheet = null;
		//Create a Row interface object
		Row row = null;
		//Create a Cell interface object
		Cell cell = null;
		//Convert excel to file object
		File file = new File(pathname);
		//2. Convert file object to stream object
		FileInputStream fileInputStream = new FileInputStream(file);
		//3. Judgment
		if(file.isFile()&&file.exists()) 
		{
			//3.1 get the name of current excel
			String fileName = file.getName();
			System.out.println(fileName);
			//3.2 get the suffix of the file
			int st = fileName.lastIndexOf(".");
			CharSequence suffix = fileName.subSequence(st+1, fileName.length());
			//3.3 realize multi-channel judgment through the suffix of the file
			if(suffix.equals("xls")) 
			{
				System.out.println("Walk 2003 analysis");
				//Different core class hssfworkbooks
				wb = new HSSFWorkbook(fileInputStream);
			}
			else if(suffix.equals("xlsx"))
			{
				System.out.println("Go 2007 analysis");
				//Different core classes XSSFWorkbook
				wb = new XSSFWorkbook(fileInputStream);
			}
			else 
			{
				System.out.println("The current file cannot be parsed. Exit the program");
				System.exit(0);
			}
		}
		return wb;
	}
	//2. Read the value of salary column in each Excel and store it into the container
	public List<String> getCellData(Workbook workbook , int sheetNum , int beginColumn , int endColumn)
	{
		Sheet sheet = null;
		Row row = null;
		Cell cell = null; 
		List<String> list = new ArrayList<String>();
		//4. Get the first sheet of the current Excel, that is, the first page
		sheet =  workbook.getSheetAt(0);
		//5. Get the number of rows in the current sheet
		int totalRows = sheet.getPhysicalNumberOfRows();
		//6. Cycle through each line
		for (int i = 0; i < totalRows; i++) {//Rows in excel obtained by outer loop
			//Take out each line
			row = sheet.getRow(i);
			//Get the total number of columns
			int totalCellNums = row.getPhysicalNumberOfCells();
			//Loop each column of each row
			for (int j = beginColumn; j <= endColumn; j++) {//Columns in excel obtained by outer loop
				 //Get cell
				 cell = row.getCell(j);
				 //Get value
				 String CellValue = getValue(cell);
				 list.add(CellValue);
			}
		}
		return list;
	}
	/**
	 * @desc Custom function -- format output
	 * @param cell
	 * @return
	 */
	//Get various types of cell s
	public String getValue(Cell cell)
	{
		//Determines whether it is the value of a numeric cell
		if (cell.getCellType()==Cell.CELL_TYPE_NUMERIC)//Numeric cell
		{
			//When returning, use valueof to transfer out the value
			return  String.valueOf((int) cell.getNumericCellValue()); 
		}
		else if(cell.getCellType()==Cell.CELL_TYPE_BOOLEAN) //
		{
			return String.valueOf(cell.getBooleanCellValue());
		}
		//Returns a value of type string
		return cell.getStringCellValue();
	}
}

//Test class
package com.Li.flushs;

import java.io.IOException;
import java.util.List;

import org.apache.poi.ss.usermodel.Workbook;

/**
 * @Description: Test class: the test uses poi to obtain the data in excel, and then classifies and counts
 * @auther:Li Ya Hui
 * @Time:2021 April 29, 2014 7:40:53 PM
 */
public class Test {

	public static void main(String[] args) throws IOException {
		String pathName = "E:\\recruit\\zhyc_java.xls";
		FileUtils fileUtils = new FileUtils();
		Workbook workbook = fileUtils.getWorkbook(pathName);
		List<String> a = fileUtils.getCellData(workbook, 0, 2, 2);
		for (String string : a) {
			System.out.println(string);
		}
	}
}

2.6. Using poi to clean data

package com.Li.flushs;
/**
 * @Description:  Using POI to build a tool class
 * @auther:Li Ya Hui
 * @Time:2021 April 29, 2014 5:40:53 PM
 */

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class FileUtils {
	//Get different versions of excel through the path
	public Workbook getWorkbook(String pathname) throws IOException 
	{
		//Create a workbook interface object
		Workbook wb = null;
		//Create a sheet interface object
		Sheet sheet = null;
		//Create a Row interface object
		Row row = null;
		//Create a Cell interface object
		Cell cell = null;
		//Convert excel to file object
		File file = new File(pathname);
		//2. Convert file object to stream object
		FileInputStream fileInputStream = new FileInputStream(file);
		//3. Judgment
		if(file.isFile()&&file.exists()) 
		{
			//3.1 get the name of current excel
			String fileName = file.getName();
			System.out.println(fileName);
			//3.2 get the suffix of the file
			int st = fileName.lastIndexOf(".");
			CharSequence suffix = fileName.subSequence(st+1, fileName.length());
			//3.3 realize multi-channel judgment through the suffix of the file
			if(suffix.equals("xls")) 
			{
				System.out.println("Walk 2003 analysis");
				//Different core class hssfworkbooks
				wb = new HSSFWorkbook(fileInputStream);
			}
			else if(suffix.equals("xlsx"))
			{
				System.out.println("Go 2007 analysis");
				//Different core classes XSSFWorkbook
				wb = new XSSFWorkbook(fileInputStream);
			}
			else 
			{
				System.out.println("The current file cannot be parsed. Exit the program");
				System.exit(0);
			}
		}
		return wb;
	}
	//2. Read the value of salary column in each Excel and store it into the container
	public List<String> getCellData(Workbook workbook , int sheetNum , int beginColumn , int endColumn)
	{
		Sheet sheet = null;
		Row row = null;
		Cell cell = null; 
		List<String> list = new ArrayList<String>();
		//4. Get the first sheet of the current Excel, that is, the first page
		sheet =  workbook.getSheetAt(0);
		//5. Get the number of rows in the current sheet
		int totalRows = sheet.getPhysicalNumberOfRows();
		//6. Cycle through each line
		for (int i = 0; i < totalRows; i++) {//Rows in excel obtained by outer loop
			//Take out each line
			row = sheet.getRow(i);
			//Get the total number of columns
			int totalCellNums = row.getPhysicalNumberOfCells();
			//Loop each column of each row
			for (int j = beginColumn; j <= endColumn; j++) {//Columns in excel obtained by outer loop
				 //Get cell
				 cell = row.getCell(j);
				 //Get value
				 String CellValue = getValue(cell);
				 list.add(CellValue);
			}
		}
		return list;
	}
	/**
	 * @desc Custom function -- format output
	 * @param cell
	 * @return
	 */
	//Get various types of cell s
	public String getValue(Cell cell)
	{
		//Determines whether it is the value of a numeric cell
		if (cell.getCellType()==Cell.CELL_TYPE_NUMERIC)//Numeric cell
		{
			//When returning, use valueof to transfer out the value
			return  String.valueOf((int) cell.getNumericCellValue()); 
		}
		else if(cell.getCellType()==Cell.CELL_TYPE_BOOLEAN) //
		{
			return String.valueOf(cell.getBooleanCellValue());
		}
		//Returns a value of type string
		return cell.getStringCellValue();
	}
}
//Test class
package com.Li.flushs;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.ss.usermodel.Workbook;

/**
 * @Description: Test class: the test uses poi to obtain the data in excel, and then classifies and counts
 * @auther:Li Ya Hui
 * @Time:2021 April 29, 2014 7:40:53 PM
 */
public class Test {

	public static void main(String[] args) throws IOException {
		//Create a list and save the list obtained after execution
		List<List<String>> list  = new ArrayList<List<String>>();
		int h=0,m=0,w=0;
		for (int i = 1; i <= 3; i++) {
			String pathName = "E:\\recruit\\zhyc_java0_"+i+".xls";
			FileUtils fileUtils = new FileUtils();
			Workbook workbook = fileUtils.getWorkbook(pathName);
			List<String> resultList = fileUtils.getCellData(workbook, 0, 2, 2);
			list.add(resultList);//Save the salaries obtained from the three excel into the list container
//			System.out.println(resultList);
		}
		/**
		 *Rules of data cleaning
		 * 1.As long as it's not digital, give it up
		 * 2.Three levels are defined according to the salary level, namely
		 * 12000 Above: high income, i.e. h
		 * 8000-12000: Middle income is m
		 * 8000 The following: low income, i.e. w
		 * 
		 * 
		 */
		for (int i = 0; i < list.size(); i++) {
			  List<String> everyResultList = list.get(i);//The container of salary in each excel
			  for (int j = 0; j < everyResultList.size(); j++) {
				  String everyResult = everyResultList.get(j);//Salary of every item in every excel
//				  System.out.println("everyResult:\t"+everyResult);
				  //According to the cleaning rules
				  if(!(everyResult.startsWith("other")||everyResult.startsWith("salary")||everyResult.startsWith("Face to face discussion"))) 
				  {
//					  System.out.println(everyResult);
					  //Starting position
					  int st = everyResult.lastIndexOf("-");//Pay attention to this. It is recommended to paste the in excel
					  //Each specific salary
					   String ft = (String) everyResult.subSequence(0, st);
					   int xinshui =Integer.parseInt(ft);
					   System.out.println(xinshui);
					  //judge
					  if(xinshui>=12000) 
					  {
						  h++;
					  }else if(xinshui>=8000)
					  {
						  m++;
					  }else 
					  {
						  w++;
					  }
				  }
			  }
		}
		
		System.out.println("12000 The above are"+h+"individual");
		System.out.println("8000-12000 The above are"+m+"individual");
		System.out.println("8000 The following are"+w+"individual");
		
	}  
}

3. What is data visualization

  1. Data visualization:
    • It is a scientific and technological research on the visual expression of data. Among them, the visual representation of this data is defined as a kind of information extracted in a summary form, including various attributes and variables of the corresponding information unit.
    • It is in a concept of continuous expansion. It mainly refers to the more advanced technical methods in technology, which allow the visual interpretation of data through expression, modeling and the display of three-dimensional, surface, attribute and animation by using graphics, image processing, computer vision and user interface. Compared with special technical methods such as stereo modeling, data visualization covers a wide range of technical methods
  2. Classification of data visualization
    • In a broad sense, visualization is everywhere. The website is a data visualization, behind which are the dense data tables of the database. When you get to the browser, it is a simple and easy to understand page; Taobao is the visualization of goods, with various filters such as price and delivery place; Wechat is the visualization of real-time data, which surrounds your social network and allows you to see the latest message flow at a glance.

    • Data visualization in a narrow sense is more pure graphics to represent data

  3. Then we analyze and realize data visualization from a narrow perspective
    • We all know that we got some data through crawlers a few days ago. If we look at these data one by one, we will find a lot and it's very laborious. So I proposed to analyze and display the data from the perspective of data visualization. How to display it? How to deal with it?
    • As for processing, we adopt data cleaning, so what do these data cleaning include? We can see the corresponding topics later. How should the data be displayed? We take the form of charts (because we haven't learned servlet and other technologies, so we can only do this for the time being)
Data Guide

1. Third party jar packages capable of operating excel: jxl, poi, Jxcell (charge)
2. However, these can not directly operate excel to realize visual charts. Therefore, we need to change our thinking and see if we can realize them indirectly
3. It is found that we can use the following methods:
1) Java calls excel Macro (what is a macro? It's hard to write, and it can only be written with the permission when writing, and then it is called with java code)
2) Jxcell is used, but this one is charged, and the version is relatively old. For jdk1 After 6, the support was not very good
3) Using poi+JfreeChart, which can operate excel to realize visualization, but because JfreeChart is an old product, the generated chart is ugly
4) Use poi+excel template (in fact, it is to define an excel chart, but it is recommended to use the excel version after 2003, otherwise the effect may not come out), you can refer to F: / 123 xlsx

3.1. Two methods of assigning excel formula

3.1.1 get cell formula workbook getCreationHelper(). createFormulaEvaluator(). evaluateAll();
  • workbook.getCreationHelper().createFormulaEvaluator().evaluateAll();

  • FileOutputStream out = new FileOutputStream(new File("formula.xlsx"));

  • workbook.write(out);

3.1.2. Get cell formula page sheet setForceFormulaRecalculation(true);
  • FileInputStream fileInputStream = new FileInputStream("E:\poi_Excell\123.xlsx");
  • XSSFWorkbook xssfWorkbook = new XSSFWorkbook(fileInputStream);
  • XSSFSheet sheet = xssfWorkbook.getSheetAt(0);
  • xssfWorkbook.setSheetName(0, "sheet0");
  • Stores the style of the current table
  • XSSFCellStyle cellStyle = xssfWorkbook.createCellStyle();

3.2 case display

package com.Li.flushs;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.poi.xssf.usermodel.XSSFCell;
import org.apache.poi.xssf.usermodel.XSSFCellStyle;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

/**
 * @Description: poi+excel Template copy of excel for data visualization
 * @auther:Li Ya Hui
 * @Time:2021 April 30, 2014 1:11:58 PM
 */
public class Test02 {
	public static void main(String[] args) throws IOException {
		//1. Convert the template file into a stream object
		FileInputStream fileInputStream = new FileInputStream("E:\\poi_Excell\\123.xlsx");
		//2. Create output stream object
		FileOutputStream fileOutputStream =new FileOutputStream("E:\\poi_Excell\\2021-04-30.xlsx");
		//3. Initialize poi core class
		XSSFWorkbook xssfWorkbook = new XSSFWorkbook(fileInputStream);
		//3.1 obtain the original template, i.e. 123 First page of xlsx
		XSSFSheet sheet = xssfWorkbook.getSheetAt(0);
		
		//Two methods of obtaining formula
	    //3.2 automatic calculation
//        sheet.setForceFormulaRecalculation(true);
        //3.3 name the designated sheet
        xssfWorkbook.setSheetName(0, "sheet0");
        //4. Store the style of the current table
        XSSFCellStyle cellStyle = xssfWorkbook.createCellStyle();
		
        //5. Re assign values to the first column of four rows of data in excel template
        String [] mingcheng = {"java","php","python","C++"};//name
		  
		//7. Get the assignment of four rows of sheet
        for (int i = 0; i < mingcheng.length; i++) {
			XSSFRow row = sheet.getRow(i);
			XSSFCell cell = row.getCell(0);
			cell.setCellValue(mingcheng[i]);
		}
       //8. Re assign values to the second column of four rows of data in excel template
        int [] shuLiang = {800,500,260,100};//quantity
        for (int i = 0; i < shuLiang.length; i++) {
			XSSFRow row = sheet.getRow(i);
			XSSFCell cell = row.getCell(1);
			cell.setCellValue(shuLiang[i]) ;
		}
        //Two methods of obtaining formula
        xssfWorkbook.getCreationHelper().createFormulaEvaluator().evaluateAll();
        //9. Write
        xssfWorkbook.write(fileOutputStream);
        //10. Close flow object
        fileOutputStream.close();
        fileInputStream.close();
        xssfWorkbook.close();
		
	}
}

4. Data analysis (understanding)

Reading guide module: many students may feel a bit repetitive when they see this module. In fact, it is not. After data cleaning, we can also visualize it. Then whether we can make decisions is not enough, because commercial investment needs to be rigorous and careful. After all, money is very important
So how do we implement data analysis?

1. Data analysis: refers to the process of analyzing a large amount of collected data with appropriate statistical analysis methods, extracting useful information and forming conclusions, and studying and summarizing the data in detail. This process is also the supporting process of the quality management system. In practice, data analysis can help people make judgments in order to take appropriate action.
The mathematical basis of data analysis was established in the early 20th century, but it was not until the emergence of computers that practical operation became possible and data analysis was popularized. Data analysis is the product of the combination of mathematics and computer science

Data analysis is a young and developing subject

2. Data analysis mainly includes:
1) Simple Math
2) Statistics
3) Fast Fourier transform (FFT)
4) Smoothing and Filtering
5) Baseline and peak analysis

3. It sounds so difficult. How can we learn? Haha, we don't need to learn this, because it also has special posts or departments. After all, today's complex business can't be completed by one person alone. Now we should pay attention to team cooperation

4. Additional expansion and supplement, so how to learn? Master Python, Matlab, R language, and then the use of SPSS platform (software)

5. Business decisions

  1. Purpose of decision-making
    Strict work priorities have always existed in the military, on which all decisions are based. Tasks always come first, followed by decisions in line with the interests of the team, and finally decisions related to personal interests. Individuals should be placed last because individuals are always the smallest link in the organizational chain. Egoism doesn't make much sense. Individual gains and losses should not be considered in the team or organization.

  2. Mistakes are by no means immutable
    It may be a little too much to use the word "by no means", but you should be able to understand what I mean. As I said before, failure depends only on when you choose to stop, but also on how people look at specific problems. For example, some difficult problems appear to people at one level, but they may not be urgent problems in the eyes of people at another level. Collect as many opinions as possible and understand the specific situation more deeply.

  3. Timeline of execution
    Internal and external influences determine the feasibility of implementing decisions within a given timeline. Internal influence refers to the ability of you and the team to make decisions within a given time, while external influence refers to the driving force affecting the deadline, and these factors are beyond your control, such as weather, economy or market demand.

  • You have to ask yourself two questions. The first question is, "is this the right time to make a decision?" If the answer is yes, ask the second question: "can I implement this decision?" If the answer is no, find out "why?"
  1. Known unknown and unknown unknown
  • You will face many constraints when making decisions.

  • "Known unknowns" means that you are aware of some intangible constraints, but you can't quantify them, such as transportation (if you live in Los Angeles, you must know what I'm talking about). For example, you know that the rush hour in Los Angeles never ends, so it may take you between 20 minutes and 2 hours from point A to point B. The key is that you know the existence of uncertainty, but you don't know the specific degree of uncertainty.

  • According to Murphy's law, "unknown unknowns" are accidents that you don't plan at all, such as (continue to take traffic as an example) traffic accidents or engine failures.

  • Try to be clear about all the constraints that will help you achieve your goals.

  1. Availability of resources
  • If you don't succeed at first, keep trying. The outcome of each effort depends largely on the resources you use to implement the plan, so identify not only the main resources available, but also the secondary resources. Each decision should develop an emergency plan to deal with the "unknown unknown" and the failure of the main action plan.
  • If you are not prepared enough, you will be overwhelmed by the decision-making process. Reviewing the above may make you more confident in your decision-making
  • Therefore, a good software system always helps enterprises

Keywords: Java Programming Big Data Visualization

Added by Chamza on Fri, 18 Feb 2022 15:30:10 +0200