Example usage for com.google.common.collect SetMultimap replaceValues


Introduction

This page shows example usage of com.google.common.collect SetMultimap replaceValues.

Prototype

@Override
Set<V> replaceValues(K key, Iterable<? extends V> values);

Document

Because a SetMultimap has unique values for a given key, this method returns a Set, instead of the java.util.Collection specified in the Multimap interface.
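
The behaviour described above can be illustrated with a minimal sketch. It assumes Guava is on the classpath and uses HashMultimap as the concrete SetMultimap; the class name ReplaceValuesDemo, the helper resetSources, and the node names are hypothetical and chosen only for illustration. The Set returned by replaceValues holds the values that were stored under the key before the call.

```java
import com.google.common.collect.HashMultimap;
import com.google.common.collect.ImmutableSet;
import com.google.common.collect.SetMultimap;

import java.util.Set;

public class ReplaceValuesDemo {

    /**
     * Hypothetical helper: replaces every value stored under key with a single value
     * and returns the values that were stored there before the call.
     */
    static Set<String> resetSources(SetMultimap<String, String> nodeSources, String key, String onlySource) {
        return nodeSources.replaceValues(key, ImmutableSet.of(onlySource));
    }

    public static void main(String[] args) {
        SetMultimap<String, String> nodeSources = HashMultimap.create();
        nodeSources.put("n2", "reduce1.connector");
        nodeSources.put("n2", "reduce2.connector");

        // The return type is Set<String>, not Collection<String>, because the receiver is a SetMultimap.
        Set<String> previous = resetSources(nodeSources, "n2", "n2.connector");

        System.out.println("previous size = " + previous.size());
        System.out.println("current       = " + nodeSources.get("n2"));
    }
}
```

After the call, the key maps only to the replacement value, while the returned Set still contains the two earlier values.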

Usage

From source file:co.cask.cdap.etl.planner.ConnectorDag.java

/**
 * Insert connector nodes into the dag.
 *
 * A connector node is a boundary at which the pipeline can be split into sub dags.
 * It is treated as a sink within one subdag and as a source in another subdag.
 * A connector is inserted in front of a reduce node (aggregator plugin type, etc)
 * when there is a path from some source to one or more reduce nodes or sinks.
 * This is required because in a single mapper, we can't write to both a sink and do a reduce.
 * We also can't have 2 reducers in a single mapreduce job.
 * A connector is also inserted in front of any node if the inputs into the node come from multiple sources.
 * A connector is also inserted in front of a reduce node that has another reduce node as its input.
 *
 * After splitting, the result will be a collection of subdags, with each subdag representing a single
 * mapreduce job (or possibly map-only job). Or in spark, each subdag would be a series of operations from
 * one rdd to another rdd.
 *
 * @return the nodes that had connectors inserted in front of them
 */
public Set<String> insertConnectors() {
    // none of this is particularly efficient, but this should never be a bottleneck
    // unless we're dealing with very very large dags

    Set<String> addedAlready = new HashSet<>();

    /*
        Isolate the specified node by inserting a connector in front of and behind the node.
        If all inputs into the node are sources, a connector will not be inserted in front.
        If all outputs from the node are sinks, a connector will not be inserted after.
        Other connectors count as both a source and a sink.
     */
    for (String isolationNode : isolationNodes) {
        isolate(isolationNode, addedAlready);
    }

    /*
        Find sections of the dag where a source is writing to both a sink and a reduce node
        or to multiple reduce nodes. A connector counts as both a source and a sink.

        For example, if a source is writing to both a sink and a reduce:

                    |---> sink1
          source ---|
                    |---> reduce ---> sink2

        we need to split this up into:

                    |---> sink1
          source ---|                    =>     connector ---> reduce ---> sink2
                    |---> connector
            
        The same logic applies if a source is writing to multiple reduce nodes. So if we run into this scenario,
        we will add a connector in front of all reduce nodes accessible from the source.
        When trying to find a path from a source to multiple reduce nodes, we also need to stop searching
        once we see a reduce node or a connector. Otherwise, every single reduce node would end up
        with a connector in front of it.
     */
    for (String node : getTopologicalOrder()) {
        if (!sources.contains(node) && !connectors.contains(node)) {
            continue;
        }

        Set<String> accessibleByNode = accessibleFrom(node, Sets.union(connectors, reduceNodes));
        Set<String> sinksAndReduceNodes = Sets.intersection(accessibleByNode,
                Sets.union(connectors, Sets.union(sinks, reduceNodes)));
        // don't count this node
        sinksAndReduceNodes = Sets.difference(sinksAndReduceNodes, ImmutableSet.of(node));

        if (sinksAndReduceNodes.size() > 1) {
            for (String reduceNodeConnector : Sets.intersection(sinksAndReduceNodes, reduceNodes)) {
                addConnectorInFrontOf(reduceNodeConnector, addedAlready);
            }
        }
    }

    /*
        Find nodes that have input from multiple sources and add them to the connectors set.
        We can probably remove this part once we support multiple sources. Even though we don't support
        multiple sources today, the fact that we support forks means we have to deal with the multi-input case
        and break it down into separate phases. For example:
            
                |---> reduce1 ---|
          n1 ---|                |---> n2
                |---> reduce2 ---|

        From the previous section, both reduces will get a connector inserted in front:

                |---> reduce1.connector               reduce1.connector ---> reduce1 ---|
          n1 ---|                            =>                                         |---> n2
                |---> reduce2.connector               reduce2.connector ---> reduce2 ---|
            
        Since we don't support multi-input yet, we need to convert that further into 3 phases:
            
          reduce1.connector ---> reduce1 ---> n2.connector
                                                            =>       n2.connector ---> n2
          reduce2.connector ---> reduce2 ---> n2.connector
            
        To find these nodes, we traverse the graph in topological order and keep track of the sources that
        have a path to each node, using a map of node -> [ sources that have a path to the node ].
        If we find that a node is accessible by more than one source, we insert a connector in front of it and
        reset all sources for that node to its connector.
     */
    SetMultimap<String, String> nodeSources = HashMultimap.create();
    for (String source : sources) {
        nodeSources.put(source, source);
    }
    for (String node : getTopologicalOrder()) {
        Set<String> connectedSources = nodeSources.get(node);
        /*
            If this node is a connector, replace all sources for this node with itself, since a connector is a source.
            Taking the example above, we end up with:

              reduce1.connector ---> reduce1 ---|
                                                |---> n2
              reduce2.connector ---> reduce2 ---|

            When we get to n2, we need it to see that it has 2 sources: reduce1.connector and reduce2.connector
            So when we get to reduce1.connector, we need to replace its source (n1) with itself.
            Similarly, when we get to reduce2.connector, we need to replace its source (n1) with itself.
            If we didn't, when we got to n2, it would think its only source is n1, and we would
            miss the connector that should be inserted in front of it.
         */
        if (connectors.contains(node)) {
            connectedSources = new HashSet<>();
            connectedSources.add(node);
            nodeSources.replaceValues(node, connectedSources);
        }
        // if more than one source is connected to this node, then we need to insert a connector in front of this node.
        // its source should then be changed to the connector that was inserted in front of it.
        if (connectedSources.size() > 1) {
            String connectorNode = addConnectorInFrontOf(node, addedAlready);
            connectedSources = new HashSet<>();
            connectedSources.add(connectorNode);
            nodeSources.replaceValues(node, connectedSources);
        }
        for (String nodeOutput : getNodeOutputs(node)) {
            // propagate the source connected to me to all my outputs
            nodeSources.putAll(nodeOutput, connectedSources);
        }
    }

    /*
        Find reduce nodes that are accessible from other reduce nodes. For example:
            
          source ---> reduce1 ---> reduce2 ---> sink
            
        Needs to be broken down into:
            
          source ---> reduce1 ---> reduce2.connector      =>     reduce2.connector ---> reduce2 ---> sink
     */
    for (String reduceNode : reduceNodes) {
        Set<String> accessibleByNode = accessibleFrom(reduceNode, Sets.union(connectors, reduceNodes));
        Set<String> accessibleReduceNodes = Sets.intersection(accessibleByNode, reduceNodes);

        // Sets.difference because we don't want to add ourselves
        accessibleReduceNodes = Sets.difference(accessibleReduceNodes, ImmutableSet.of(reduceNode));
        for (String accessibleReduceNode : accessibleReduceNodes) {
            addConnectorInFrontOf(accessibleReduceNode, addedAlready);
        }
    }

    return addedAlready;
}
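
The source-tracking pass in the method above (the loop that calls replaceValues) can be sketched in isolation. The following is a simplified illustration, not the CDAP implementation: it uses only java.util collections, with a Map<String, Set<String>> standing in for the SetMultimap and Map.put standing in for replaceValues. The class name SourceTrackingSketch, the method findMultiSourceNodes, and the ".connector" suffix for the inserted connector are assumptions made for this example, and the key order of the outputs map is assumed to be a valid topological order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SourceTrackingSketch {

    /**
     * Walks the nodes in the key order of outputs (assumed topological) and returns
     * every node that is reachable from more than one source.
     */
    static List<String> findMultiSourceNodes(Map<String, List<String>> outputs, Set<String> sources) {
        // node -> sources that have a path to the node
        Map<String, Set<String>> nodeSources = new HashMap<>();
        for (String source : sources) {
            nodeSources.computeIfAbsent(source, k -> new HashSet<>()).add(source);
        }

        List<String> needConnector = new ArrayList<>();
        for (String node : outputs.keySet()) {
            Set<String> connected = nodeSources.getOrDefault(node, new HashSet<>());
            if (connected.size() > 1) {
                needConnector.add(node);
                // Stand-in for replaceValues: discard the old sources and keep only the
                // (hypothetically named) connector inserted in front of this node.
                connected = new HashSet<>(Arrays.asList(node + ".connector"));
                nodeSources.put(node, connected);
            }
            // Propagate the sources connected to this node to all of its outputs.
            for (String output : outputs.get(node)) {
                nodeSources.computeIfAbsent(output, k -> new HashSet<>()).addAll(connected);
            }
        }
        return needConnector;
    }

    public static void main(String[] args) {
        Map<String, List<String>> outputs = new LinkedHashMap<>();
        outputs.put("reduce1.connector", Arrays.asList("reduce1"));
        outputs.put("reduce2.connector", Arrays.asList("reduce2"));
        outputs.put("reduce1", Arrays.asList("n2"));
        outputs.put("reduce2", Arrays.asList("n2"));
        outputs.put("n2", Arrays.asList("n3"));
        outputs.put("n3", new ArrayList<String>());

        Set<String> sources = new HashSet<>(Arrays.asList("reduce1.connector", "reduce2.connector"));
        System.out.println(findMultiSourceNodes(outputs, sources)); // prints [n2]
    }
}
```

Because the sources of n2 are reset to its connector before they are propagated, n3 sees a single source and is not flagged. This is the effect that the replaceValues call achieves in the original method.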