Example usage for org.apache.spark.sql.streaming OutputMode Append

Introduction

This page shows example usage for org.apache.spark.sql.streaming OutputMode.Append.

Prototype

public static OutputMode Append() 

Document

OutputMode in which only the new rows in the streaming DataFrame/Dataset will be written to the sink.
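Append mode can be illustrated with a minimal, self-contained sketch. This example is not from the Hoodie source below; it assumes a local SparkSession and uses Spark's built-in "rate" source and console sink (both illustrative choices) to show where `OutputMode.Append()` plugs into a streaming query:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.OutputMode;
import org.apache.spark.sql.streaming.StreamingQuery;

public class AppendModeExample {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("AppendModeExample")
                .master("local[2]")
                .getOrCreate();

        // The built-in "rate" source continuously generates rows
        Dataset<Row> stream = spark.readStream().format("rate")
                .option("rowsPerSecond", "1")
                .load();

        // OutputMode.Append(): only rows added since the last trigger are
        // written to the sink; previously emitted results are never updated
        StreamingQuery query = stream.writeStream()
                .format("console")
                .outputMode(OutputMode.Append())
                .start();

        query.awaitTermination(5000); // run briefly, then shut down
        spark.stop();
    }
}
```

Append is the default output mode and is only valid for queries whose results never change once emitted (e.g. simple selections and filters, or aggregations with watermarked event time); queries that update earlier results require Update or Complete mode instead.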

Usage

From source file: HoodieJavaStreamingApp.java

License: Apache License

/**
 * Hoodie spark streaming job.
 * @param streamingInput streaming source dataset to write to the Hoodie table
 * @throws Exception
 */
public void stream(Dataset<Row> streamingInput) throws Exception {

    // Write the streaming input to a Hoodie table; OutputMode.Append() emits
    // only newly appended rows to the sink on each trigger
    DataStreamWriter<Row> writer = streamingInput.writeStream().format("com.uber.hoodie")
            .option("hoodie.insert.shuffle.parallelism", "2").option("hoodie.upsert.shuffle.parallelism", "2")
            .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY(), tableType)
            .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "_row_key")
            .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "partition")
            .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "timestamp")
            .option(HoodieWriteConfig.TABLE_NAME, tableName)
            .option("checkpointLocation", streamingCheckpointingPath).outputMode(OutputMode.Append());

    updateHiveSyncConfig(writer);
    // Trigger a micro-batch every 500 ms and run for the configured duration
    writer.trigger(new ProcessingTime(500)).start(tablePath).awaitTermination(streamingDurationInMs);
}