The Oozie Editor/Dashboard application allows you to define Oozie workflow, coordinator, and bundle applications, run workflow, coordinator, and bundle jobs, and view the status of jobs. For information about Oozie, see Oozie Documentation.
In the previous episode, we saw how to transfer some file data into Apache Hadoop. To query the data easily, the next step is to create some Hive tables. This enables quick interaction with high-level languages like SQL and Pig.
We experiment with the SQL queries, then parameterize them and insert them into a workflow in order to run them together in parallel. Including Hive queries in an Oozie workflow is a pretty common use case with recurrent pitfalls as seen on the user group. We can do it with Hue in a few clicks.
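As a rough illustration of what such a workflow looks like outside the editor, the sketch below builds a workflow definition that forks into one Hive action per script and joins afterwards. The function name, node names, script names, and the schema versions (`uri:oozie:workflow:0.4`, `uri:oozie:hive-action:0.2`) are assumptions chosen for the example; Hue generates its own definition, which may differ.

```python
import xml.etree.ElementTree as ET

WF_NS = "uri:oozie:workflow:0.4"        # assumed workflow schema version
HIVE_NS = "uri:oozie:hive-action:0.2"   # assumed Hive action schema version


def build_parallel_hive_workflow(name, scripts, params):
    """Return workflow XML that runs one Hive action per script in parallel.

    Hypothetical helper: `scripts` is a list of script file names and
    `params` maps parameter names to values passed to every script.
    """
    root = ET.Element("workflow-app", {"name": name, "xmlns": WF_NS})
    ET.SubElement(root, "start", {"to": "fork-queries"})

    # The fork lists one path per Hive action so the actions run in parallel.
    fork = ET.SubElement(root, "fork", {"name": "fork-queries"})
    for index in range(len(scripts)):
        ET.SubElement(fork, "path", {"start": f"hive-{index}"})

    for index, script in enumerate(scripts):
        action = ET.SubElement(root, "action", {"name": f"hive-{index}"})
        hive = ET.SubElement(action, "hive", {"xmlns": HIVE_NS})
        ET.SubElement(hive, "job-tracker").text = "${jobTracker}"
        ET.SubElement(hive, "name-node").text = "${nameNode}"
        ET.SubElement(hive, "script").text = script
        # Each query parameter becomes a <param>name=value</param> element.
        for key, value in params.items():
            ET.SubElement(hive, "param").text = f"{key}={value}"
        ET.SubElement(action, "ok", {"to": "join-queries"})
        ET.SubElement(action, "error", {"to": "kill"})

    # The join waits for every forked path before moving on to the end node.
    ET.SubElement(root, "join", {"name": "join-queries", "to": "end"})
    kill = ET.SubElement(root, "kill", {"name": "kill"})
    ET.SubElement(kill, "message").text = "A Hive query failed"
    ET.SubElement(root, "end", {"name": "end"})
    return ET.tostring(root, encoding="unicode")
```

Calling `build_parallel_hive_workflow("top-queries", ["q1.sql", "q2.sql"], {"day": "${day}"})` yields a definition with two Hive actions between a fork and a join. The script names and the `day` parameter are placeholders.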
In order to run DistCp, Streaming, Pig, Sqoop, and Hive jobs, Oozie must be configured to use the Oozie ShareLib. See Oozie Installation in http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/CDH4-Installation-Guide.html.
A coordinator application allows you to define and execute recurrent and interdependent workflow jobs. The coordinator application defines the conditions under which the execution of workflows can occur.
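To make the idea of a recurrent workflow concrete, here is a minimal sketch that builds a coordinator definition running one workflow once a day. The helper name, the example path and dates, the UTC timezone, and the schema version `uri:oozie:coordinator:0.2` are assumptions for illustration. A real coordinator usually also declares datasets and input events, which are omitted here.

```python
import xml.etree.ElementTree as ET

COORD_NS = "uri:oozie:coordinator:0.2"  # assumed coordinator schema version


def build_daily_coordinator(name, workflow_path, start, end):
    """Return coordinator XML that triggers one workflow every day.

    Hypothetical helper: `start` and `end` are UTC timestamps such as
    "2013-01-01T00:00Z"; `workflow_path` is the HDFS path of the workflow.
    """
    root = ET.Element("coordinator-app", {
        "name": name,
        "frequency": "${coord:days(1)}",  # run once per day
        "start": start,
        "end": end,
        "timezone": "UTC",
        "xmlns": COORD_NS,
    })
    # The action section names the workflow application to execute.
    action = ET.SubElement(root, "action")
    workflow = ET.SubElement(action, "workflow")
    ET.SubElement(workflow, "app-path").text = workflow_path
    return ET.tostring(root, encoding="unicode")
```

For example, `build_daily_coordinator("daily-report", "${nameNode}/user/demo/report", "2013-01-01T00:00Z", "2013-02-01T00:00Z")` describes a month of daily runs.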
A bundle application allows you to batch a set of coordinator applications.
Oozie Editor/Dashboard is one of the applications installed as part of Hue. For information about installing and configuring Hue, see Hue Installation in http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/CDH4-Installation-Guide.html.
Click the Oozie Editor icon or the Oozie Dashboard icon in the navigation bar at the top of the Hue browser page. Oozie Editor/Dashboard opens with the following screens:
Many screens contain lists. When you type in the Filter field on screens, the lists are dynamically filtered to display only those rows containing text that matches the specified substring.
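The filtering behavior can be pictured with a small sketch. It assumes rows are simple tuples of cell values and that matching is case-insensitive; neither detail is confirmed by this guide, and the function name is hypothetical.

```python
def filter_rows(rows, query):
    """Keep only the rows in which at least one cell contains the substring.

    Assumption: matching ignores case. An empty query keeps every row.
    """
    needle = query.lower()
    return [
        row for row in rows
        if any(needle in str(cell).lower() for cell in row)
    ]


rows = [
    ("DailyReport", "Aggregates daily logs"),
    ("Cleanup", "Removes temporary files"),
]
print(filter_rows(rows, "daily"))
```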
In the Dashboard, workflows, coordinators, and bundles can only be viewed, submitted, and modified by their owner or a superuser.
Editor permissions for performing actions on workflows, coordinators, and bundles are summarized in the following table:
Action | Superuser or Owner | All |
---|---|---|
View | Y | Only if "Is shared" is set |
Submit | Y | Only if "Is shared" is set |
Modify | Y | N |
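The rules in the table can be read as a single decision function. The sketch below is only a restatement of the table for illustration, not the check that Hue performs; the function and parameter names are hypothetical.

```python
def editor_allows(action, is_owner_or_superuser, is_shared):
    """Return True if the Editor permission table allows the action.

    `action` is one of "view", "submit", or "modify".
    """
    if action not in ("view", "submit", "modify"):
        raise ValueError(f"unknown action: {action}")
    # Owners and superusers may perform every action.
    if is_owner_or_superuser:
        return True
    # Other users can never modify, whatever the sharing setting.
    if action == "modify":
        return False
    # Viewing and submitting require "Is shared" to be set.
    return is_shared
```

For instance, `editor_allows("submit", False, True)` is `True`, while `editor_allows("modify", False, True)` is `False`.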
The Dashboard shows a summary of the running and completed workflow, coordinator, and bundle jobs.
You can view jobs for a period up to the last 30 days.
You can filter the list by date (1, 7, 15, or 30 days) or status (Succeeded, Running, or Killed). The date and status buttons are toggles.
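A minimal sketch of how the two filters could combine is shown below. It assumes each job is a dictionary with hypothetical `status` and `started` keys, and that leaving every status toggle off means "show all statuses"; the Dashboard's actual data model is not described in this guide.

```python
from datetime import datetime, timedelta

ALLOWED_DAYS = (1, 7, 15, 30)
ALLOWED_STATUSES = ("SUCCEEDED", "RUNNING", "KILLED")


def filter_jobs(jobs, now, days=30, statuses=()):
    """Keep jobs started within the last `days` days that match a status.

    Assumption: an empty `statuses` tuple means no status filter is active.
    """
    if days not in ALLOWED_DAYS:
        raise ValueError(f"days must be one of {ALLOWED_DAYS}")
    unknown = set(statuses) - set(ALLOWED_STATUSES)
    if unknown:
        raise ValueError(f"unknown statuses: {sorted(unknown)}")
    cutoff = now - timedelta(days=days)
    return [
        job for job in jobs
        if job["started"] >= cutoff
        and (not statuses or job["status"] in statuses)
    ]
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the sketch easy to test.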
Click the Workflows tab to view the running and completed workflow jobs for the filters you have specified.
Click a workflow row in the Running or Completed table to view detailed information about that workflow job.
The left pane contains a link to the workflow and the variable values specified.
For the selected job, the following information is available in the right area.
For each action in the workflow you can:
Click the Coordinators tab to view the running and completed coordinator jobs for the filters you have specified.
For the selected job, the following information is available.
Click the Bundles tab to view the running and completed bundle jobs for the filters you have specified.
The Oozie tab provides subtabs that give you access to Oozie instrumentation and configuration settings.
For information on the instrumentation metrics supported by Oozie, see Oozie Monitoring.
For information on the configuration properties supported by Oozie, see Oozie Configuration.
In Workflow Manager you create Oozie workflows and submit them for execution.
Click the Workflows tab to open the Workflow Manager.
Each row shows a workflow: its name, description, and the timestamp of its last modification. It also shows:
In Workflow Editor you edit workflows that include MapReduce, Streaming, Java, Pig, Hive, Sqoop, Shell, Ssh, DistCp, Fs, Email, Sub-workflow, and Generic actions. You can configure these actions in the Workflow Editor, or you can import job designs from Job Designer to be used as actions in your workflow. For information about defining workflows, see the Workflow Specification.
To open a workflow, in Workflow Manager, click the workflow. Proceed with Editing a Workflow.
To submit a workflow for execution, do one of the following:
The workflow job is submitted and the Dashboard displays the workflow job.
To view the output of the job, click View the logs.
In the pane on the left, click the Suspend button.
In the pane on the left, click the Resume button.
In the pane on the left, click the Rerun button.
To schedule a workflow for recurring execution, do one of the following:
A coordinator is created and opened in the Coordinator Editor. Proceed with Editing a Coordinator.
In the Workflow Editor you can easily perform operations on Oozie action and control nodes.
The Workflow Editor supports dragging and dropping action nodes. As you move the action over other actions and forks, highlights indicate active areas. If there are actions in the workflow, the active areas are the actions themselves and the areas above and below the actions. If you drop an action on an existing action, a fork and join is added to the workflow.
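The effect of dropping one action onto another can be sketched with a deliberately simplified model. Here a workflow is a flat list of action names, and a fork/join pair is represented as a dictionary holding the parallel branches. This representation and the function name are assumptions for illustration, not the editor's internal structure.

```python
def drop_on_action(steps, target, new_action):
    """Return a new step list where `target` runs in parallel with `new_action`.

    Simplified model: each step is an action name, or a dictionary
    {"fork": [...]} whose branches run in parallel and join afterwards.
    """
    if target not in steps:
        raise ValueError(f"no action named {target!r} in the workflow")
    result = []
    for step in steps:
        if step == target:
            # Dropping onto an existing action wraps both in a fork and join.
            result.append({"fork": [target, new_action]})
        else:
            result.append(step)
    return result


print(drop_on_action(["load", "aggregate", "export"], "aggregate", "archive"))
```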
In the Workflow Editor, click the Upload button.
The workspace of the workflow is opened in the File Browser application. Follow the procedure in Uploading Files to upload the files. You must put JAR files in a lib directory in the workspace.
In Coordinator Manager you create Oozie coordinator applications and submit them for execution.
Click the Coordinators tab to open the Coordinator Manager.
Each row shows a coordinator: its name, description, and the timestamp of its last modification. It also shows:
In Coordinator Editor, you edit coordinators and the datasets required by the coordinators. For information about defining coordinators and datasets, see the Coordinator Specification.
To open a coordinator, in Coordinator Manager, click the coordinator. Proceed with Editing a Coordinator.
To create a coordinator, in Coordinator Manager:
In the Coordinator Editor you specify coordinator properties and the datasets on which the workflow scheduled by the coordinator will operate by stepping through screens in a wizard. You can also advance to particular steps and revisit steps by clicking the Step "tabs" above the screens. The following instructions walk you through the wizard.
In Bundle Manager you create Oozie bundle applications and submit them for execution.
Click the Bundles tab to open the Bundle Manager.
Each row shows a bundle: its name, description, and the timestamp of its last modification. It also shows:
For information about defining bundles, see the Bundle Specification.
To open a bundle, in Bundle Manager, click the bundle. Proceed with Editing a Bundle.
To submit a bundle for execution, check the checkbox next to the bundle and click the Submit button.
In the Bundle Editor, you specify properties by stepping through screens in a wizard. You can also advance to particular steps and revisit steps by clicking the Step "tabs" above the screens. The following instructions walk you through the wizard.