Since Hadoop appeared, many technologies have emerged to handle Big Data, each targeting a particular kind of workload:
- MapReduce – Batch processing engine,
- Apache Storm – Stream processing engine,
- Apache Tez – Batch and interactive engine,
- Apache Giraph – Graph processing engine,
- Apache Hive – SQL engine.
Big Data Processing Requirements
The industry needs a single, general-purpose platform that can handle diverse workloads such as:
- Batch Processing
- Interactive Processing
- Real-time (stream) Processing
- Graph Processing
- Iterative processing
- In-memory processing
The platform should also provide:
- Distributed computing
- Fault tolerance
- High availability
- Ease of use
- High processing speed
Apache Flink Meets All of These Requirements in a Single Platform
Apache Spark started the trend of offering one platform for different kinds of problems, but it is limited by its underlying batch processing engine, which handles streams by splitting them into micro-batches. Flink takes the same idea further and aims to cover all of these types of Big Data workloads.
Apache Flink is a general-purpose cluster computing tool that can handle batch, interactive, stream, iterative, in-memory, and graph processing.
Apache Flink is therefore described as a next-generation Big Data platform, sometimes called the 4G of Big Data. At its heart, it is a stream processing framework: it does not cut the stream into micro-batches.
Flink’s kernel is a streaming runtime that also provides high speed, fault tolerance, distributed processing, and ease of use.
Flink processes data at a consistently high speed with very low latency, which makes it a large-scale data processing platform suited to data that is generated at very high speed.
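To make the micro-batch versus per-event distinction concrete, the following is a minimal sketch in plain Python. It does not use Flink or Spark; the function names, the arrival times, and the 500 ms batch interval are all assumptions chosen for illustration. It models only the waiting time an event incurs before processing starts, which is where the two approaches differ.

```python
from statistics import mean


def per_event_latencies(arrivals_ms, processing_ms=0):
    """Each event is handled as soon as it arrives, so it waits only for processing."""
    return [processing_ms for _ in arrivals_ms]


def micro_batch_latencies(arrivals_ms, batch_interval_ms, processing_ms=0):
    """Each event waits until the batch window containing it closes, then is processed."""
    latencies = []
    for t in arrivals_ms:
        # End of the window [k * interval, (k + 1) * interval) that contains t.
        window_end = (t // batch_interval_ms + 1) * batch_interval_ms
        latencies.append(window_end - t + processing_ms)
    return latencies


# Hypothetical arrival times in milliseconds, chosen only for illustration.
arrivals = [0, 120, 250, 400, 990, 1010]

streaming = per_event_latencies(arrivals)
batched = micro_batch_latencies(arrivals, batch_interval_ms=500)

print("per-event mean latency (ms):", mean(streaming))
print("micro-batch mean latency (ms):", round(mean(batched), 1))
```

In this simplified model, an event that arrives just after a window opens waits almost the full batch interval, while per-event processing adds no waiting at all. Real systems have further costs (scheduling, network, state access) that this sketch deliberately ignores.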
Flink
Apache Flink is a next-generation Big Data tool, sometimes called the 4G of Big Data.
- It is a true stream processing framework (it does not cut the stream into micro-batches).
- Flink’s kernel (core) is a streaming runtime that also provides distributed processing, fault tolerance, and more.
- It processes events at a consistently high speed with low latency.
- It is a large-scale data processing framework that can process data generated at very high velocity.
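One way to see how a streaming runtime can also serve batch workloads is to treat a batch as a stream that happens to end. The sketch below is plain Python, not Flink's API; `running_word_count` and `batch_word_count` are hypothetical names introduced here. The same operator is applied to a bounded input (batch) and to an unbounded generator (stream).

```python
from collections import Counter
from itertools import cycle
from typing import Dict, Iterable, Iterator


def running_word_count(lines: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Consume lines one at a time and yield the updated counts after each line."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
        yield dict(counts)


def batch_word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Batch mode: run the same streaming operator to the end of a bounded input."""
    result: Dict[str, int] = {}
    for result in running_word_count(lines):
        pass  # Only the final snapshot matters for a batch job.
    return result


# Bounded input: the "stream" ends, so a final answer exists.
print(batch_word_count(["flink streams data", "flink batches data"]))

# Unbounded input: the stream never ends, so results are read incrementally.
snapshots = running_word_count(cycle(["flink streams data"]))
print(next(snapshots))
print(next(snapshots))
```

The design point is that only the driver differs: the batch caller waits for the input to finish, while the streaming caller reads each intermediate result as it is produced.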
Apache Flink is a powerful open source platform that can efficiently address the following types of requirements:
- Batch Processing
- Interactive processing
- Real-time stream processing
- Graph Processing
- Iterative Processing
- In-memory processing
Flink is an alternative to MapReduce and can process data more than 100 times faster than MapReduce.
It is independent of Hadoop, but it can use HDFS to read, write, and store the data it processes.
Flink does not provide its own data storage system; it reads data from distributed storage.
What is RAD (Rapid Application Development) Model?
RAD, or Rapid Application Development, is an adaptation of the waterfall model that aims to develop software in a short span of time. RAD follows the iterative
The SDLC RAD model has the following phases:
- Business Modeling
- Data Modeling
- Process Modeling
- Application Generation
- Testing and Turnover