The ingestion phase is both the initialization of a model and an application interaction between users and the model. During initialization, users specify locations for data sources or annotate data (another form of ingestion). During interaction, users consume the predictions of the model and provide feedback that is used to reinforce the model.

The staging phase is where transformations are applied to data to make it consumable, and where the data is stored so that it is available for processing. Staging is responsible for normalization and standardization of data, as well as data management in some computational data store.
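
As a rough illustration of staging, the sketch below normalizes a few raw records and stores them in a table for later processing. The record fields (`User`, `Amount`), the `normalize` and `stage` functions, and the use of an in-memory SQLite database as the data store are all assumptions made for the example, not something the book prescribes.

```python
import sqlite3

def normalize(record):
    """Standardize one raw record: trim and lowercase the user, parse the amount."""
    return {
        "user": record["User"].strip().lower(),
        "amount": float(record["Amount"]),
    }

def stage(raw_records, conn):
    """Normalize raw records and store them so they are available for processing."""
    conn.execute("CREATE TABLE IF NOT EXISTS staged (user TEXT, amount REAL)")
    rows = [normalize(r) for r in raw_records]
    conn.executemany("INSERT INTO staged VALUES (:user, :amount)", rows)
    conn.commit()
    return len(rows)

# Hypothetical raw input with inconsistent casing, whitespace, and string numbers.
conn = sqlite3.connect(":memory:")
raw = [{"User": " Alice ", "Amount": "3.50"}, {"User": "BOB", "Amount": "2"}]
count = stage(raw, conn)
```

Keeping normalization in its own function means the same cleaning rules could be reused if the data store were swapped for something else.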

The computation phase is the heavy-lifting phase with the primary responsibility of mining the data for insights, performing aggregations or reports, or building machine learning models for recommendations, clustering, or classification.
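
One of the simpler computations mentioned here, an aggregation, might look like the following sketch. It sums a numeric field per key over already-staged records. The function name `aggregate_by_key` and the sample records are hypothetical, and a real computation phase would typically run on a distributed engine rather than in a single process.

```python
from collections import defaultdict

def aggregate_by_key(records, key, value):
    """Sum the `value` field of each record, grouped by its `key` field."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record[value]
    return dict(totals)

# Hypothetical staged records.
staged = [
    {"user": "alice", "amount": 3.5},
    {"user": "bob", "amount": 2.0},
    {"user": "alice", "amount": 1.5},
]
totals = aggregate_by_key(staged, "user", "amount")
```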

The workflow management phase performs abstraction, orchestration, and automation tasks that enable the workflow steps to be operationalized for production. The end result of this step should be an application, job, or script that can be run on demand in an automated fashion.
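
A minimal sketch of this idea chains the earlier phases into a single callable job. The `run_pipeline` helper and the toy `ingest`, `stage`, and `compute` steps are illustrative assumptions; a production workflow would more likely rely on a dedicated scheduler or orchestration tool.

```python
from collections import Counter

def ingest(lines):
    """Drop blank lines and surrounding whitespace from raw input."""
    return [line.strip() for line in lines if line.strip()]

def stage(lines):
    """Standardize casing and split each line into fields."""
    return [line.lower().split(",") for line in lines]

def compute(rows):
    """Count how many rows each user (first field) contributed."""
    return dict(Counter(row[0] for row in rows))

def run_pipeline(source, steps):
    """Run each phase in order, feeding the output of one into the next."""
    data = source
    for _name, step in steps:
        data = step(data)
    return data

steps = [("ingest", ingest), ("stage", stage), ("compute", compute)]
result = run_pipeline(["Alice,3", " bob,2 ", "", "ALICE,1"], steps)
```

Because the steps are passed in as a list, the same runner can be invoked from a script or a scheduled job without changing the phase functions themselves.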

By the book:

Data Analytics with Hadoop

Generated on 09/29/22