
Flink Ecosystem Components

[Diagram: Apache Flink ecosystem components]
The diagram above shows the complete Apache Flink ecosystem, which is organized into several layers:

i. Storage / Streaming

Flink does not ship with a storage system; it is purely a computation engine. It can read data from and write data to a variety of storage systems, and it can also consume data from streaming systems. Below is the list of storage/streaming systems from which Flink can read and write data:
  • HDFS – Hadoop Distributed File System
  • Local-FS – Local file system
  • S3 – Simple Storage Service from Amazon
  • HBase – NoSQL database in the Hadoop ecosystem
  • MongoDB – NoSQL database
  • RDBMS – Any relational database
  • Kafka – Distributed messaging queue
  • RabbitMQ – Messaging queue
  • Flume – Data collection and aggregation tool
The second layer is deployment/resource management. Flink can be deployed in the following modes:
  • Local mode – On a single node, in a single JVM
  • Cluster – On a multi-node cluster, with one of the following resource managers:
    • Standalone – The default resource manager, shipped with Flink.
    • YARN – A very popular resource manager; it is part of Hadoop, introduced in Hadoop 2.x.
    • Mesos – A generalized resource manager.
  • Cloud – On Amazon or Google cloud
The next layer is the runtime – the Distributed Streaming Dataflow, also called the kernel of Apache Flink. This is the core layer of Flink, which provides distributed processing, fault tolerance, reliability, native iterative processing capability, etc. The top layer contains the APIs and libraries, which give Flink its diverse capabilities:

ii. DataSet API

It handles data at rest and allows the user to apply operations like map, filter, join, group, etc. on datasets. It is mainly used for distributed batch processing. In fact, batch processing is a special case of stream processing where the data source is finite; batch applications are also executed on the streaming runtime.
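The map/filter/group operations named above can be sketched in plain Python. This is a conceptual illustration of the semantics only, not Flink DataSet API code (which is written in Java or Scala against a `DataSet` type):

```python
from itertools import groupby

# A bounded "dataset" of log lines, illustrating the map, filter, and
# group operations the DataSet API exposes (conceptual sketch, not Flink).
records = ["error: disk full", "info: started", "error: timeout", "info: done"]

# map: split each line into a (level, message) pair
mapped = [tuple(line.split(": ", 1)) for line in records]

# filter: keep only the error records
errors = [r for r in mapped if r[0] == "error"]

# group + aggregate: count records per level
mapped.sort(key=lambda r: r[0])
counts = {level: len(list(group)) for level, group in groupby(mapped, key=lambda r: r[0])}

print(errors)  # [('error', 'disk full'), ('error', 'timeout')]
print(counts)  # {'error': 2, 'info': 2}
```

In the real DataSet API the same pipeline would be expressed as chained `map`/`filter`/`groupBy` calls, and Flink would execute it distributed across the cluster.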

iii. DataStream API

It handles continuous streams of data. To process live data streams it provides various operations like map, filter, state updates, windowing, aggregation, etc. It can consume data from various streaming sources and write data to different sinks. It supports both Java and Scala. Now let's discuss some DSL (Domain Specific Language) tools:
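The windowing operation mentioned above is the key difference from batch processing: an unbounded stream is cut into finite windows so aggregates can be emitted. A plain-Python sketch of a tumbling 10-second window (again a conceptual illustration, not Flink DataStream API code):

```python
from collections import defaultdict

# Events as (timestamp_seconds, value); a tumbling 10-second window sums
# the values per window, illustrating the window/aggregate operations of
# the DataStream API (conceptual sketch, not Flink).
events = [(1, 5), (4, 3), (11, 7), (13, 1), (21, 2)]
WINDOW = 10

windows = defaultdict(int)
for ts, value in events:
    window_start = (ts // WINDOW) * WINDOW  # assign the event to its window
    windows[window_start] += value

print(dict(windows))  # {0: 8, 10: 8, 20: 2}
```

In Flink the same logic would be written with `window(...)` and an aggregate function on a keyed stream, with the runtime handling out-of-order events and state for you.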

iv. Table

It enables users to perform ad-hoc analysis using an SQL-like expression language for relational stream and batch processing. It can be embedded in the DataSet and DataStream APIs. It saves users from writing complex code to process the data, instead allowing them to run SQL queries on top of Flink.
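The kind of ad-hoc relational query the Table layer enables looks like ordinary SQL over a table. As a stand-in illustration (plain `sqlite3` here, not the Flink Table API), an aggregation query over a small hypothetical "clicks" table:

```python
import sqlite3

# Illustrates running an ad-hoc SQL aggregation over data, the style of
# query the Flink Table layer supports (sqlite3 stand-in, not Flink).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user TEXT, url TEXT)")
conn.executemany("INSERT INTO clicks VALUES (?, ?)",
                 [("alice", "/home"), ("bob", "/home"), ("alice", "/cart")])

rows = conn.execute(
    "SELECT user, COUNT(*) AS cnt FROM clicks GROUP BY user ORDER BY user"
).fetchall()
print(rows)  # [('alice', 2), ('bob', 1)]
```

In Flink, a query of this shape can run over a bounded dataset or continuously over a stream, with results updated as new rows arrive.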

v. Gelly

It is the graph processing engine, which allows users to run a set of operations to create, transform, and process graphs. Gelly also provides a library of algorithms to simplify the development of graph applications. It leverages Flink's native iterative processing model to handle graphs efficiently. Its APIs are available in Java and Scala.
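The iterative, propagate-until-stable style that Gelly builds on can be sketched with a classic graph algorithm: connected components by repeatedly propagating the minimum vertex id along edges (plain Python, not the Gelly API):

```python
# Connected components via label propagation: each vertex starts as its
# own component, then the minimum id spreads along edges until nothing
# changes -- the kind of loop Flink's native iteration runs distributed.
edges = [(1, 2), (2, 3), (4, 5)]
vertices = sorted({v for e in edges for v in e})
labels = {v: v for v in vertices}  # start: each vertex is its own component

changed = True
while changed:
    changed = False
    for a, b in edges:
        low = min(labels[a], labels[b])
        for v in (a, b):
            if labels[v] > low:
                labels[v] = low
                changed = True

print(labels)  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

Vertices 1–3 collapse into one component and 4–5 into another. Gelly's library ships ready-made algorithms of this kind, so users rarely have to write the iteration themselves.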

vi. FlinkML

It is the machine learning library, which provides intuitive APIs and efficient algorithms to handle machine learning applications. It is written in Scala. As machine learning algorithms are iterative in nature, Flink's native support for iterative algorithms lets it handle them quite effectively and efficiently.
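To see why iteration matters here, consider the simplest possible learner: fitting y = w·x by gradient descent. Each pass refines the weight using the whole dataset, which is exactly the repeated-pass pattern Flink's native iteration accelerates (plain-Python sketch, not the FlinkML API):

```python
# Fit y = w * x with batch gradient descent on a tiny dataset (y = 2x),
# illustrating the iterative refinement loop at the heart of ML training.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0
lr = 0.05
for _ in range(200):  # each iteration is one full pass over the data
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad

print(round(w, 3))  # converges to 2.0
```

On a general-purpose batch engine each such pass would re-read and re-shuffle the data; Flink's native iteration keeps the loop state in the running dataflow, which is what makes it a good fit for ML workloads.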
