MOOC: Understanding Apache Flink

Understanding Apache Flink

By Ivan Mushketyk

Course: https://app.pluralsight.com/library/courses/understanding-apache-flink/table-of-contents

Apache Flink is "Framework for stream and batch processing"

  • Has Java, Scala, and Python APIs
  • Features: Batch is a special case, Flexible windowing, Iteration support, Memeory management, High-throughput low latency, Exactly-once processing, Query optimization

Batch Processing

  • Operations
    • map
    • filter
    • flatMap
    • groupBy
  • Operation for multiple dataset
    • join
    • outerJoin
    • cross
  • Write Data
    • writeAsText
    • writeAsCsv
    • print / printToErr
    • write
    • output
    • collect
  • A simple processing flow
    • readCsvFile()
    • map()
    • filter()
    • writeAsText()

Stream processing

  • Get StreamEcecutionEnvironment
  • Operations
    • readTextFile
    • fromCollection / fromElements
    • socketTextStream
    • addSource
    • map
    • filter
    • flatMap
    • union
    • split
    • connect
    • writeAsText / writeAsCsv
    • print / printToErr
    • addSink
  • A simple processing flow
    • addSource()
    • map()
    • filter()
    • print()
  • Windows
    • Non-keyed: non-parallel processing of a single stream
    • Keyed windows: "groups" stream elements to processl in parallel
    • Window Assigner (how to split data into windows)
      • Tumbling (time)
      • Tumbling (count)
      • Sliding
      • Session
      • Global
    • Operations in Windows
      • reduce/fold
      • min/max/sum
      • apply
    • Late Events – Events that arrive after the a window is processed. By default, these events are dropped. Can define how long to wait for events.
  • State in Stream Processing
    • State types
      • Keyed
        • ValueState
        • ListState
        • ReduceSate
        • FoldState
      • Operator

Other topics to explore

  • Table API
  • Graph processing (gelly)
  • Machine learning library (FlinkML)
  • Complex event processing (CEP)
  • Checkpoints and savepoints
  • Apache Kafka integration
Advertisement

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s