Readers who have followed the "Six-Pulse Sword" series on interactive analytics (portal: A first look at HoloStudio in the interactive analytics "Six-Pulse Sword" series) will know that HoloStudio, a one-stop development platform built on the interactive analytics engine, is deeply integrated with DataWorks. Backed by DataWorks's powerful capabilities, interactive analytics accelerates queries over data sources such as MaxCompute and real-time compute, and HoloStudio connects directly to modules such as Data Service and Scheduling, making one-stop big data development straightforward.
However, a reader mentioned not knowing how to use the DataWorks scheduling feature. Taking this opportunity, this article walks through how to periodically schedule HoloStudio data through DataWorks. Pull up a chair and let's get started!
Prerequisites
Before using DataWorks, make sure the following services are activated and a workspace is configured:
- Activate the DataWorks service and configure a workspace.
- Activate the MaxCompute service.
- Activate the interactive analytics (Hologres) service.
Steps
Step 1: Prepare the MaxCompute data source
Prepare a MaxCompute source table. You can refer to "Create a MaxCompute table" to create one, or select an existing table directly from the data map. In this example, an existing table from the data map is used; its DDL is as follows (about 40,000 rows of data):
CREATE TABLE IF NOT EXISTS bank_data_odps (
    age             BIGINT COMMENT 'Age',
    job             STRING COMMENT 'Type of work',
    marital         STRING COMMENT 'Marital status',
    education       STRING COMMENT 'Education level',
    card            STRING COMMENT 'Has a credit card',
    housing         STRING COMMENT 'Has a housing loan',
    loan            STRING COMMENT 'Has a personal loan',
    contact         STRING COMMENT 'Contact method',
    month           STRING COMMENT 'Month',
    day_of_week     STRING COMMENT 'Day of the week',
    duration        STRING COMMENT 'Contact duration',
    campaign        BIGINT COMMENT 'Number of contacts in this campaign',
    pdays           DOUBLE COMMENT 'Days since last contact',
    previous        DOUBLE COMMENT 'Number of previous customer contacts',
    poutcome        STRING COMMENT 'Outcome of the previous campaign',
    emp_var_rate    DOUBLE COMMENT 'Employment variation rate',
    cons_price_idx  DOUBLE COMMENT 'Consumer price index',
    cons_conf_idx   DOUBLE COMMENT 'Consumer confidence index',
    euribor3m       DOUBLE COMMENT '3-month Euribor rate',
    nr_employed     DOUBLE COMMENT 'Number of employees',
    y               BIGINT COMMENT 'Has a term deposit'
);
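If the source table does not yet contain data, a few sample rows can be inserted in MaxCompute first; the values below are purely hypothetical and only illustrate the column order of the DDL above:

```sql
-- Hypothetical sample row; the column order matches the DDL above.
INSERT INTO bank_data_odps VALUES
    (30, 'services', 'married', 'high.school', 'no', 'yes', 'no',
     'cellular', 'may', 'mon', '120', 1, 999.0, 0.0, 'nonexistent',
     1.1, 93.994, -36.4, 4.857, 5191.0, 0);
```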
Step 2: Create a data development node in HoloStudio
Go to HoloStudio, click Data Development > New Data Development in the left-side menu, and create a foreign table that maps the MaxCompute source table data.
Enter the following SQL, click Save, and then click the button in the upper-left corner to go to DataWorks for scheduling.
BEGIN;
CREATE FOREIGN TABLE IF NOT EXISTS bank_data_foreign_holo (
    age int8,
    job text,
    marital text,
    education text,
    card text,
    housing text,
    loan text,
    contact text,
    month text,
    day_of_week text,
    duration text,
    campaign int8,
    pdays float8,
    previous float8,
    poutcome text,
    emp_var_rate float8,
    cons_price_idx float8,
    cons_conf_idx float8,
    euribor3m float8,
    nr_employed float8,
    y int8
)
SERVER odps_server
OPTIONS (project_name 'projectname', table_name 'bank_data_odps');
GRANT SELECT ON bank_data_foreign_holo TO PUBLIC;
COMMIT;
Note: the OPTIONS clause supplies the connection parameters: project_name is the MaxCompute project name, and table_name is the MaxCompute table name.
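To confirm that the foreign table maps the source table correctly, a quick sanity-check query can be run in HoloStudio (a sketch using the example table names above):

```sql
-- The foreign table should return rows read from the underlying
-- MaxCompute table bank_data_odps.
SELECT age, job, marital FROM bank_data_foreign_holo LIMIT 10;

-- The count should match the source table (about 40,000 rows here).
SELECT count(*) FROM bank_data_foreign_holo;
```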
Step 3: Schedule the foreign table
After you jump to the DataWorks scheduling page, configure the scheduling information and publish the node.
Create a new Hologres development node, select it, and click Update Node Version to synchronize the new node from HoloStudio to DataWorks. Click Scheduling Configuration on the left, set the schedule dependency, and make sure the parent node output is the MaxCompute source table. When the configuration is complete, click Save, Submit, and Publish, then go to the Operation and Maintenance Center to publish to the production environment.
You can configure the time attributes according to your own project's needs.
In the production environment, select the published node and click Publish.
After publishing succeeds, click Operation and Maintenance Center in the upper-right corner to backfill the data.
Click Cycle Task in the left-side menu, select the published node, and right-click Backfill Data > Current Node. The node is now published successfully.
Step 4: Create a partition table in HoloStudio
After the foreign table node is published, go back to HoloStudio to create the partition table and write the partition data.
In HoloStudio, choose Data Development > New Data Development, enter the SQL below, click Run, and assign a value to the custom parameter ${bizdate}. After it runs successfully, click Save and go to DataWorks for scheduling. Example SQL:
BEGIN;
CREATE TABLE IF NOT EXISTS bank_data_holo (
    age int8,
    job text,
    marital text,
    education text,
    card text,
    housing text,
    loan text,
    contact text,
    month text,
    day_of_week text,
    duration text,
    campaign int8,
    pdays float8,
    previous float8,
    poutcome text,
    emp_var_rate float8,
    cons_price_idx float8,
    cons_conf_idx float8,
    euribor3m float8,
    nr_employed float8,
    y int8,
    ds text NOT NULL
)
PARTITION BY LIST (ds);
CALL SET_TABLE_PROPERTY('bank_data_holo', 'orientation', 'column');
CALL SET_TABLE_PROPERTY('bank_data_holo', 'time_to_live_in_seconds', '700000');
COMMIT;

CREATE TABLE IF NOT EXISTS bank_data_holo_1_${bizdate}
    PARTITION OF bank_data_holo FOR VALUES IN ('${bizdate}');

INSERT INTO bank_data_holo_1_${bizdate}
SELECT
    age, job, marital, education, card, housing, loan, contact,
    month, day_of_week, duration, campaign, pdays, previous, poutcome,
    emp_var_rate, cons_price_idx, cons_conf_idx, euribor3m, nr_employed, y,
    '${bizdate}' AS ds
FROM bank_data_foreign_holo;
Step 5: Schedule the partition table
After jumping to DataWorks data development, click Update Node Version to synchronize the partition table information to the node. Click Scheduling Configuration on the right, set the basic-attribute parameter to the business-date time variable, and set the schedule dependency to the foreign table node just published. When done, click Save, Submit, and Publish, then go to the Operation and Maintenance Center to publish to the production environment.
Step 6: Publish and periodically schedule the data
After DataWorks publishes the development node successfully, backfill the data.
Go to HoloStudio, click PG Management > Table in the left-side menu, select the partition table whose scheduling was configured successfully, and click Data Preview to view the data.
Once the scheduling time is set, the system runs the job periodically and, through the partitions, loads each day's data into its own partition.
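After a few scheduled runs, the per-partition row counts can be checked with a query like the following (a sketch; the ds values depend on your actual business dates):

```sql
-- One row per partition: each ds value corresponds to one scheduled run.
SELECT ds, count(*) AS rows_loaded
FROM bank_data_holo
GROUP BY ds
ORDER BY ds;
```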
With this walkthrough, you should now know how to use DataWorks scheduling with interactive analytics. Give it a try!
If you have any questions, you are welcome to join the DingTalk group for consultation.