2014/07/04

Control Cluster, Bootstrap and Step Statuses

In this post we are going to learn how to get the status of a cluster and of its components: bootstrap actions and steps. In some cases the application has to wait for a step to finish or fail; this is achieved with a status loop that checks the state of the desired element at regular intervals (Amazon has a request limit to avoid potential attacks, so the interval should not be too short). This post covers how to implement such loops for job flows, bootstrap actions, and steps.

If you do not know how a cluster is created, a previous post covering that topic is accessible via this link. Even if you are not interested in cluster creation itself, reading it is recommended because it makes the material discussed here easier to follow.

Check Job Flow (or cluster) status

Without going into too much detail, the following code shows how to check the status of a given job flow (several job flows can be checked in the same request):


import java.util.Arrays;
import java.util.List;

import com.amazonaws.services.elasticmapreduce.model.DescribeJobFlowsRequest;
import com.amazonaws.services.elasticmapreduce.model.DescribeJobFlowsResult;
import com.amazonaws.services.elasticmapreduce.model.JobFlowDetail;
import com.amazonaws.services.elasticmapreduce.model.JobFlowExecutionState;

String state;
String jobFlowId = runJobFlowResult.getJobFlowId(); //Created in the previous post

DescribeJobFlowsRequest jobFlowDescRequest;
DescribeJobFlowsResult jobFlowDescResult;
JobFlowDetail jobFlowDetail;

STATUS_LOOP: while(true){

   //Ask the EMR service for the current description of the job flow
   jobFlowDescRequest = new DescribeJobFlowsRequest(Arrays.asList(jobFlowId));
   jobFlowDescResult = emrClient.describeJobFlows(jobFlowDescRequest);

   //Only one id was requested, so the result holds a single job flow
   jobFlowDetail = jobFlowDescResult.getJobFlows().get(0);
   state = jobFlowDetail.getExecutionStatusDetail().getState().toString();

   if(jobFlowIsDone(state)){
      break;
   }else{
      try {
         Thread.sleep(5000); //To avoid making requests too frequently
      } catch (InterruptedException ex) {
         Thread.currentThread().interrupt(); //Restore the interrupt flag
         break STATUS_LOOP;                  //and stop polling
      }
   }
}

//****************************
//****** EXTRA FUNCTION ******

//States in which the loop stops waiting
private static final List<JobFlowExecutionState> JOB_DONE_STATES = Arrays.asList(
   JobFlowExecutionState.COMPLETED,
   JobFlowExecutionState.FAILED,
   JobFlowExecutionState.TERMINATED,
   JobFlowExecutionState.WAITING,
   JobFlowExecutionState.RUNNING);

private boolean jobFlowIsDone (String value){
   return JOB_DONE_STATES.contains(JobFlowExecutionState.fromValue(value));
}

As can be observed, the method consists of a simple request to the EMR server, which returns a state that the client can inspect. After requesting the job flow status, the code checks whether it is one of the following: completed, failed, terminated, waiting, running. If the status matches any element of the list, the loop is broken and the application continues running. Otherwise, the thread sleeps for 5000 ms (this value can be changed) and the loop runs once again. Note that the list includes waiting and running, so the loop also ends as soon as the cluster is up; remove those two states to wait until the job flow has actually finished.
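The loop above never gives up if the job flow stays in a non-matching state. As a minimal sketch of the same polling idea with an upper bound on the waiting time, the following stand-alone code can be used as a starting point. The class StatusPoller, the method waitForState and its parameters are hypothetical names introduced here for illustration, they are not part of the AWS SDK, and the states are simulated so that the example runs without a cluster. In a real application the supplier would wrap the describeJobFlows call shown above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical helper (not part of the AWS SDK): polls a state supplier until
// it reports one of the expected states or the timeout elapses.
class StatusPoller {

    static String waitForState(Supplier<String> stateSupplier,
                               Set<String> doneStates,
                               long intervalMillis,
                               long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            String state = stateSupplier.get();
            if (doneStates.contains(state)) {
                return state;
            }
            // Give up if the next sleep would take us past the deadline
            if (System.currentTimeMillis() + intervalMillis > deadline) {
                throw new IllegalStateException("Timed out, last state was " + state);
            }
            try {
                Thread.sleep(intervalMillis); // To avoid making requests too frequently
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt(); // Restore the interrupt flag
                throw new IllegalStateException("Interrupted while waiting", ex);
            }
        }
    }

    public static void main(String[] args) {
        // Simulated sequence of states; a real supplier would query the EMR client
        Iterator<String> simulated =
            Arrays.asList("STARTING", "BOOTSTRAPPING", "RUNNING").iterator();
        Set<String> done = new HashSet<>(
            Arrays.asList("COMPLETED", "FAILED", "TERMINATED", "WAITING", "RUNNING"));

        String finalState = waitForState(simulated::next, done, 10, 1000);
        System.out.println(finalState);
    }
}
```

Passing the interval and the timeout as parameters keeps the request rate under the caller's control, which matters given the request limit mentioned at the beginning of the post.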

Check step (and bootstrap) status

To obtain the status of a step, only a small change is needed: instead of reading the state of the job flow itself, the code reads the state of one of the steps contained in the job flow description. The changes are shown in the following code:


import java.util.Arrays;
import java.util.List;

import com.amazonaws.services.elasticmapreduce.model.DescribeJobFlowsRequest;
import com.amazonaws.services.elasticmapreduce.model.DescribeJobFlowsResult;
import com.amazonaws.services.elasticmapreduce.model.JobFlowDetail;
import com.amazonaws.services.elasticmapreduce.model.StepDetail;
import com.amazonaws.services.elasticmapreduce.model.StepExecutionState;

String state;
String jobFlowId = runJobFlowResult.getJobFlowId(); //Created in the previous post
StepDetail stepDetail;
DescribeJobFlowsRequest jobFlowDescRequest;
DescribeJobFlowsResult jobFlowDescResult;
JobFlowDetail jobFlowDetail;

STATUS_LOOP: while(true){

   jobFlowDescRequest = new DescribeJobFlowsRequest(Arrays.asList(jobFlowId));
   jobFlowDescResult = emrClient.describeJobFlows(jobFlowDescRequest);

   //Take the last step of the job flow (getSteps(), not getStep())
   jobFlowDetail = jobFlowDescResult.getJobFlows().get(0);
   stepDetail = jobFlowDetail.getSteps().get(jobFlowDetail.getSteps().size() - 1);
   state = stepDetail.getExecutionStatusDetail().getState().toString();

   if(stepIsDone(state)){
      break;
   }else{
      try {
         Thread.sleep(5000); //To avoid making requests too frequently
      } catch (InterruptedException ex) {
         Thread.currentThread().interrupt(); //Restore the interrupt flag
         break STATUS_LOOP;                  //and stop polling
      }
   }
}

//****************************
//****** EXTRA FUNCTION ******

//States in which a step will not make any further progress
private static final List<StepExecutionState> STEP_DONE_STATES = Arrays.asList(
   StepExecutionState.COMPLETED,
   StepExecutionState.FAILED,
   StepExecutionState.CANCELLED,
   StepExecutionState.INTERRUPTED);

private boolean stepIsDone (String value){
   return STEP_DONE_STATES.contains(StepExecutionState.fromValue(value));
}

The code above is appropriate for getting the status of a single step, namely the last one in the job flow's step list. Again, it can be modified to check all the steps and bootstrap actions, whether active or not; it all depends on the application the user wants to develop.
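As an illustration of that modification, the following sketch checks a whole list of step states instead of only the last one. To keep it runnable without the AWS SDK, it uses a stand-in enum, StepState, that mirrors the names of the SDK's StepExecutionState; the class AllStepsCheck and its methods are hypothetical names introduced here. In a real application the list would presumably be built by walking jobFlowDetail.getSteps() and reading each step's execution state.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: decides whether every step of a job flow has finished.
class AllStepsCheck {

    // Stand-in for the SDK's StepExecutionState, so the sketch compiles alone
    enum StepState { PENDING, RUNNING, CONTINUE, COMPLETED, CANCELLED, FAILED, INTERRUPTED }

    // States in which a step will not make any further progress
    static final Set<StepState> DONE_STATES = EnumSet.of(
        StepState.COMPLETED, StepState.FAILED,
        StepState.CANCELLED, StepState.INTERRUPTED);

    // True when no step is still pending or running (also true for an empty list)
    static boolean allStepsDone(List<StepState> states) {
        for (StepState state : states) {
            if (!DONE_STATES.contains(state)) {
                return false;
            }
        }
        return true;
    }

    // True when at least one step ended in a state other than COMPLETED
    static boolean anyStepUnsuccessful(List<StepState> states) {
        for (StepState state : states) {
            if (DONE_STATES.contains(state) && state != StepState.COMPLETED) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<StepState> states =
            Arrays.asList(StepState.COMPLETED, StepState.FAILED, StepState.RUNNING);
        System.out.println(allStepsDone(states));        // one step is still running
        System.out.println(anyStepUnsuccessful(states)); // one step has failed
    }
}
```

Inside the status loop, allStepsDone would take the place of stepIsDone, and anyStepUnsuccessful could be used afterwards to decide whether the application should continue or report an error.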

In the next post we will see how to resize a running cluster using instance groups. If this post has been useful to you, please comment and share it so that others can also take advantage of these pieces of code.



