What? You still don't use DataWorks scheduling?

Readers who have followed the Interactive Analytics "Six-Pulse Sword" series (see: A first look at HoloStudio in the Interactive Analytics Six-Pulse Sword series) will know that HoloStudio, the one-stop development platform built on the interactive analytics engine (Hologres), is deeply integrated with DataWorks. Backed by DataWorks's powerful feature set, you can use interactive analytics to accelerate queries over your data (MaxCompute, real-time compute), connect directly to modules such as Data Service and scheduling from HoloStudio, and get one-stop big data development end to end.
But a reader recently told us that they didn't know how to use DataWorks's scheduling feature, and we can't let that stand. So today we'll walk through how to periodically schedule data from HoloStudio in DataWorks. Pull up a chair, class is in session!

Prerequisites

Before using DataWorks, make sure the required services are activated and the workspace is configured.

Operation steps

Step 1: Prepare the MaxCompute data source

Prepare a MaxCompute source table. You can create one by following the MaxCompute table-creation documentation, or pick an existing table directly from Data Map. This example uses an existing table from Data Map; its DDL is below (the table holds roughly 40,000 rows):

CREATE TABLE IF NOT EXISTS bank_data_odps
(
 age             BIGINT COMMENT 'Age',
 job             STRING COMMENT 'Type of work',
 marital         STRING COMMENT 'marriage',
 education       STRING COMMENT 'Education level',
 card            STRING COMMENT 'Is there a credit card',
 housing         STRING COMMENT 'Housing loan',
 loan            STRING COMMENT 'loan',
 contact         STRING COMMENT 'Contact way',
 month           STRING COMMENT 'Month',
 day_of_week     STRING COMMENT 'What day is it',
 duration        STRING COMMENT 'Duration',
 campaign        BIGINT COMMENT 'Number of contacts in this activity',
 pdays           DOUBLE COMMENT 'Time interval to last contact',
 previous        DOUBLE COMMENT 'Number of previous customer contacts',
 poutcome        STRING COMMENT 'Results of previous campaigns',
 emp_var_rate    DOUBLE COMMENT 'Rate of change in employment',
 cons_price_idx  DOUBLE COMMENT 'Consumer price index',
 cons_conf_idx   DOUBLE COMMENT 'Consumer confidence index',
 euribor3m       DOUBLE COMMENT 'Euro deposit rate',
 nr_employed     DOUBLE COMMENT 'Number of staff and workers',
 y               BIGINT COMMENT 'Whether there is fixed deposit'
);
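
If the table is empty, you can insert a couple of sample rows to smoke-test the rest of the pipeline. A minimal sketch in MaxCompute SQL; the two rows below are fabricated purely for illustration:

-- Hypothetical sample rows (made-up values, 21 columns matching the DDL above)
INSERT INTO bank_data_odps VALUES
 (44, 'blue-collar', 'married', 'basic.4y', 'no', 'yes', 'no', 'cellular', 'aug', 'thu', '210', 1, 999, 0, 'nonexistent', 1.4, 93.444, -36.1, 4.963, 5228.1, 0),
 (53, 'technician', 'married', 'unknown', 'no', 'no', 'no', 'cellular', 'nov', 'fri', '138', 1, 999, 0, 'nonexistent', -0.1, 93.2, -42.0, 4.021, 5195.8, 0);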

Step 2: Create a data development node in HoloStudio

Switch to HoloStudio and click Data Development > New Data Development in the left menu bar, then create a foreign table that maps the MaxCompute source table.
Enter the following SQL and click Save, then click the button in the upper left corner to go to DataWorks for scheduling.

BEGIN;
CREATE FOREIGN TABLE IF NOT EXISTS bank_data_foreign_holo (
 age int8,
 job text,
 marital text,
 education text,
 card text,
 housing text,
 loan text,
 contact text,
 month text,
 day_of_week text,
 duration text,
 campaign int8,
 pdays float8,
 previous float8,
 poutcome text,
 emp_var_rate float8,
 cons_price_idx float8,
 cons_conf_idx float8,
 euribor3m float8,
 nr_employed float8,
 y int8
)
SERVER odps_server
OPTIONS (project_name 'projectname', table_name 'bank_data_odps');
GRANT SELECT ON bank_data_foreign_holo TO PUBLIC;
COMMIT;

Note: the OPTIONS clause supplies the connection parameters: project_name is the name of the MaxCompute project, and table_name is the name of the MaxCompute table.
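
Before wiring up scheduling, it's worth sanity-checking the mapping with a quick query in HoloStudio, for example:

-- Verify the foreign table resolves and the MaxCompute data is readable
SELECT COUNT(*) FROM bank_data_foreign_holo;
SELECT * FROM bank_data_foreign_holo LIMIT 10;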

Step 3: Schedule the foreign table

After jumping to the DataWorks scheduling page, configure the scheduling information and publish the node.
Select the newly created Hologres development node and click Update Node Version to sync the new HoloStudio node to DataWorks. Then click Scheduling Configuration on the left, set the schedule dependency, and make sure the parent node's output name is the MaxCompute source table (for example, yourproject.bank_data_odps). When the scheduling configuration is done, click Save > Submit > Publish, then go to the Operation Center to publish it to the production environment.
Configure the time attribute to suit your own project, for example a daily run in the early morning.


After entering the production environment, select the node you just published and click Publish.

After publishing succeeds, click Operation Center in the upper right corner to backfill the data.
In the left menu bar click Cycle Task, select the published node, then right-click and choose Patch Data > Current Node; once the backfill run succeeds, the node is fully live.

Step 4: Create the partitioned table in HoloStudio

After the foreign table node is published, go back to HoloStudio to create the partitioned table and write the partition data.
In HoloStudio choose Data Development > New Data Development, enter the SQL, click Run, and assign a value to the custom parameter when prompted. After it runs successfully, click Save and go to DataWorks for scheduling. Example SQL:

BEGIN;
CREATE TABLE IF NOT EXISTS bank_data_holo (
 age int8,
 job text,
 marital text,
 education text,
 card text,
 housing text,
 loan text,
 contact text,
 month text,
 day_of_week text,
 duration text,
 campaign int8,
 pdays float8,
 previous float8,
 poutcome text,
 emp_var_rate float8,
 cons_price_idx float8,
 cons_conf_idx float8,
 euribor3m float8,
 nr_employed float8,
 y int8,
 ds text NOT NULL
)
PARTITION BY LIST (ds);
CALL SET_TABLE_PROPERTY('bank_data_holo', 'orientation', 'column');
CALL SET_TABLE_PROPERTY('bank_data_holo', 'time_to_live_in_seconds', '700000');
COMMIT;

The two set_table_property calls store bank_data_holo in column-oriented format, which suits analytical scans, and set a time-to-live of 700,000 seconds (roughly 8 days) on the data, so adjust the TTL to your retention needs. With the parent table in place, create a child partition for the business date and load it from the foreign table; ${bizdate} is the DataWorks scheduling parameter that resolves to the business date (for example, 20191031) at run time:

CREATE TABLE IF NOT EXISTS bank_data_holo_1_${bizdate} PARTITION OF bank_data_holo
  FOR VALUES IN ('${bizdate}');

INSERT INTO bank_data_holo_1_${bizdate}
SELECT
    age,
    job,
    marital,
    education,
    card,
    housing,
    loan,
    contact,
    month,
    day_of_week,
    duration,
    campaign,
    pdays,
    previous,
    poutcome,
    emp_var_rate,
    cons_price_idx,
    cons_conf_idx,
    euribor3m,
    nr_employed,
    y,
    '${bizdate}' AS ds
FROM bank_data_foreign_holo;
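
After a successful run you can confirm the day's partition was written; for example, if the run's bizdate was 20191031:

-- Count rows for one business date; the filter prunes to a single child partition
SELECT COUNT(*) FROM bank_data_holo WHERE ds = '20191031';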

Step 5: Schedule the partitioned table

Jump to the DataWorks data development page and click Update Node Version to sync the partitioned-table node. Click Scheduling Configuration on the right, assign the basic-attribute parameter its time value (the business date), and set the schedule dependency to the foreign table node you just published. When done, click Save > Submit > Publish and go to the Operation Center to publish it to the production environment.
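
The parameter assignment typically maps the ${bizdate} placeholder in the SQL to DataWorks's built-in business-date variable. A common setting looks like the line below; check the parameter syntax of your DataWorks version:

bizdate=$bizdate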

Step 6: publish and periodically schedule data

Once DataWorks has published the development node successfully, backfill the data just as in Step 3.

Go to HoloStudio, click PG Management > Table in the left menu bar, select the partitioned table whose scheduling you just configured, and click Data Preview to see the data.

With the scheduling time set, the system runs the job periodically, and each day's data lands in its own partition.
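
Once a few cycles have run, a quick way to confirm the daily partitions are filling up is to group by the partition key in HoloStudio:

-- One row per business date; a new count should appear after each daily run
SELECT ds, COUNT(*) FROM bank_data_holo GROUP BY ds ORDER BY ds;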

We're sure that after this walkthrough you've learned how to use DataWorks scheduling. Activate Interactive Analytics (Hologres) and give it a try!
If you have any questions, you're welcome to join our DingTalk group for support.
