# Spark Streaming + Flume Integration Guide

Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Here we explain how to configure Flume and Spark Streaming to receive data from Flume. There are two approaches to this.

## Approach 1: Flume-style Push-based Approach

Flume is designed to push data between Flume agents. In this approach, Spark Streaming essentially sets up a receiver that acts as an Avro agent for Flume, to which Flume can push the data.

#### General Requirements

Choose a machine in your cluster such that:

- when your Flume + Spark Streaming application is launched, one of the Spark workers runs on that machine, and
- Flume can be configured to push data to a port on that machine.

Due to the push model, the streaming application needs to be up, with the receiver scheduled and listening on the chosen port, for Flume to be able to push data.

#### Configuring Flume

Configure the Flume agent to send data to an Avro sink by including the following in its configuration file.
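A minimal sketch of that sink configuration; the agent, channel, and sink names (`agent`, `memoryChannel`, `avroSink`) are placeholders to adapt to your own pipeline:

```
agent.sinks = avroSink
agent.sinks.avroSink.type = avro
agent.sinks.avroSink.channel = memoryChannel
agent.sinks.avroSink.hostname = <chosen machine's hostname>
agent.sinks.avroSink.port = <chosen port on the machine>
```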
See Flume's documentation for more information about configuring Flume agents.

#### Configuring the Spark Streaming Application

1. **Linking:** In your SBT/Maven project definition, link your streaming application against the following artifact (see the Linking section in the main programming guide for further information).

        groupId = org.apache.spark
        artifactId = spark-streaming-flume_2.10
        version = 1.6.2

2. **Programming:** In the streaming application code, import `FlumeUtils` and create an input DStream as follows.

        from pyspark.streaming.flume import FlumeUtils

        flumeStream = FlumeUtils.createStream(streamingContext, [chosen machine's hostname], [chosen port])

    By default, the Python API will decode the Flume event body as UTF8-encoded strings. You can specify your own decoding function to decode the body byte arrays in Flume events to any arbitrary data type.
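For instance, a sketch of a custom decoder that parses each event body as JSON; the application name, host name, and port are hypothetical, and the `bodyDecoder` keyword argument matches the 1.6 Python API but should be verified against your Spark version:

```python
import json

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.flume import FlumeUtils

sc = SparkContext(appName="FlumeJsonEvents")
ssc = StreamingContext(sc, 10)  # 10-second batch interval

# Parse each Flume event body as JSON instead of a plain UTF-8 string.
def json_decoder(body):
    if body is None:
        return None
    return json.loads(body.decode("utf-8"))

# "spark-worker-1" and 4545 stand in for the chosen machine's hostname
# and the port Flume pushes to.
flumeStream = FlumeUtils.createStream(ssc, "spark-worker-1", 4545,
                                      bodyDecoder=json_decoder)
flumeStream.pprint()

ssc.start()
ssc.awaitTermination()
```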
Note that the hostname should be the same as the one used by the resource manager in the cluster (Mesos, YARN, or Spark Standalone), so that resource allocation can match the names and launch the receiver on the right machine.

3. **Deploying:** As with any Spark application, spark-submit is used to launch your application. However, the details are slightly different for Scala/Java applications and Python applications.

    For Scala and Java applications, if you are using SBT or Maven for project management, package `spark-streaming-flume_2.10` and its dependencies into the application JAR. Make sure `spark-core_2.10` and `spark-streaming_2.10` are marked as `provided` dependencies, as those are already present in a Spark installation. Then use spark-submit to launch your application (see the Deploying section in the main programming guide).

    For Python applications, which lack SBT/Maven project management, `spark-streaming-flume_2.10` and its dependencies can be directly added to spark-submit using `--packages` (see the Application Submission Guide):

        ./bin/spark-submit --packages org.apache.spark:spark-streaming-flume_2.10:1.6.2 ...

    Alternatively, you can also download the JAR of the Maven artifact `spark-streaming-flume-assembly` from the Maven repository and add it to spark-submit with `--jars`, as in the hypothetical invocation sketched below.
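Here the JAR and script names are placeholders; the assembly JAR file name depends on the Scala and Spark versions you download:

```
./bin/spark-submit --jars spark-streaming-flume-assembly_2.10-1.6.2.jar your_flume_app.py
```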
## Approach 2: Pull-based Approach using a Custom Sink

Instead of Flume pushing data directly to Spark Streaming, this approach runs a custom Flume sink that allows the following.

- Flume pushes data into the sink, and the data stays buffered.
- Spark Streaming uses a reliable Flume receiver and transactions to pull data from the sink. Transactions succeed only after data is received and replicated by Spark Streaming.

This ensures stronger reliability and fault-tolerance guarantees than the previous approach. However, it requires configuring Flume to run a custom sink.
#### General Requirements

Choose a machine that will run the custom sink in a Flume agent. The rest of the Flume pipeline is configured to send data to that agent. Machines in the Spark cluster should have access to the chosen machine running the custom sink.

#### Configuring Flume

Configuring Flume on the chosen machine requires the following two steps: adding the custom sink JAR and its dependencies to Flume's classpath, and pointing the agent's configuration file at the custom sink.
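A sketch of that configuration file entry; the sink class name assumes the `spark-streaming-flume-sink_2.10` artifact and its dependencies are on Flume's classpath, and the agent and channel names are placeholders:

```
agent.sinks = spark
agent.sinks.spark.type = org.apache.spark.streaming.flume.sink.SparkSink
agent.sinks.spark.hostname = <hostname of the local machine>
agent.sinks.spark.port = <port to listen on for connection from Spark>
agent.sinks.spark.channel = memoryChannel
```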
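On the Spark Streaming side, the pull-based receiver is created with `FlumeUtils.createPollingStream` rather than `createStream`; a minimal sketch, assuming the custom sink above is listening on `custom-sink-host:4545` (both placeholders):

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.flume import FlumeUtils

sc = SparkContext(appName="FlumePullExample")
ssc = StreamingContext(sc, 10)  # 10-second batch interval

# Address(es) of the machine(s) running the custom sink.
addresses = [("custom-sink-host", 4545)]
flumeStream = FlumeUtils.createPollingStream(ssc, addresses)
flumeStream.pprint()

ssc.start()
ssc.awaitTermination()
```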