Example usage for com.google.common.collect MinMaxPriorityQueue size

List of usage examples for com.google.common.collect MinMaxPriorityQueue size

Introduction

This page collects example usages of the com.google.common.collect MinMaxPriorityQueue size() method, gathered from open-source projects.

Prototype

public int size()
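
Before the project examples below, here is a minimal, self-contained sketch (not drawn from any of the projects under Usage) showing that size() tracks the element count as entries are added and removed from either end of the queue:

import com.google.common.collect.MinMaxPriorityQueue;

public class MinMaxSizeDemo {
    public static void main(String[] args) {
        // Natural ordering; size() is 0 on a freshly created queue
        MinMaxPriorityQueue<Integer> queue = MinMaxPriorityQueue.create();
        queue.add(5);
        queue.add(1);
        queue.add(9);
        System.out.println(queue.size()); // 3

        queue.pollFirst(); // removes the least element, 1
        queue.pollLast();  // removes the greatest element, 9
        System.out.println(queue.size()); // 1
    }
}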


Usage

From source file: com.alibaba.wasp.master.balancer.DefaultLoadBalancer.java

/**
 * Generate a global load balancing plan according to the specified map of
 * server information to the most loaded entityGroups of each server.
 *
 * The load balancing invariant is that all servers are within 1 entityGroup of the
 * average number of entityGroups per server. If the average is an integer number,
 * all servers will be balanced to the average. Otherwise, all servers will
 * have either floor(average) or ceiling(average) entityGroups.
 * 
 * HBASE-3609 Modeled entityGroupsToMove using Guava's MinMaxPriorityQueue so that
 * we can fetch from both ends of the queue. At the beginning, we check
 * whether there was empty entityGroup server just discovered by Master. If so, we
 * alternately choose new / old entityGroups from head / tail of entityGroupsToMove,
 * respectively. This alternation avoids clustering young entityGroups on the newly
 * discovered entityGroup server. Otherwise, we choose new entityGroups from head of
 * entityGroupsToMove.
 * 
 * Another improvement from HBASE-3609 is that we assign entityGroups from
 * entityGroupsToMove to underloaded servers in round-robin fashion. Previously one
 * underloaded server would be filled before we move onto the next underloaded
 * server, leading to clustering of young entityGroups.
 * 
 * Finally, we randomly shuffle underloaded servers so that they receive
 * offloaded entityGroups relatively evenly across calls to balanceCluster().
 * 
 * The algorithm is currently implemented as such:
 * 
 * <ol>
 * <li>Determine the two valid numbers of entityGroups each server should have,
 * <b>MIN</b>=floor(average) and <b>MAX</b>=ceiling(average).
 * 
 * <li>Iterate down the most loaded servers, shedding entityGroups from each so
 * each server hosts exactly <b>MAX</b> entityGroups. Stop once you reach a server
 * that already has &lt;= <b>MAX</b> entityGroups.
 * <p>
 * Order the entityGroups to move from most recent to least.
 * 
 * <li>Iterate down the least loaded servers, assigning entityGroups so each server
 * has exactly <b>MIN</b> entityGroups. Stop once you reach a server that already
 * has &gt;= <b>MIN</b> entityGroups.
 * 
 * EntityGroups being assigned to underloaded servers are those that were shed in
 * the previous step. It is possible that there were not enough entityGroups shed
 * to fill each underloaded server to <b>MIN</b>. If so we end up with a
 * number of entityGroups required to do so, <b>neededEntityGroups</b>.
 * 
 * It is also possible that we were able to fill each underloaded server but ended
 * up with entityGroups that were unassigned from overloaded servers but that still do
 * not have assignment.
 * 
 * If neither of these conditions hold (no entityGroups needed to fill the
 * underloaded servers, no entityGroups leftover from overloaded servers), we are
 * done and return. Otherwise we handle these cases below.
 * 
 * <li>If <b>neededEntityGroups</b> is non-zero (still have underloaded servers),
 * we iterate the most loaded servers again, shedding a single entityGroup from
 * each (this brings them from having <b>MAX</b> entityGroups to having <b>MIN</b>
 * entityGroups).
 * 
 * <li>We now definitely have more entityGroups that need assignment, either from
 * the previous step or from the original shedding from overloaded servers.
 * Iterate the least loaded servers filling each to <b>MIN</b>.
 * 
 * <li>If we still have more entityGroups that need assignment, again iterate the
 * least loaded servers, this time giving each one (filling them to
 * <b>MAX</b>) until we run out.
 * 
 * <li>All servers will now either host <b>MIN</b> or <b>MAX</b> entityGroups.
 * 
 * In addition, any server hosting &gt;= <b>MAX</b> entityGroups is guaranteed to
 * end up with <b>MAX</b> entityGroups at the end of the balancing. This ensures
 * the minimal number of entityGroups possible are moved.
 * </ol>
 * 
 * TODO: We can at-most reassign the number of entityGroups away from a particular
 * server to be how many they report as most loaded. Should we just keep all
 * assignment in memory? Any objections? Does this mean we need HeapSize on
 * HMaster? Or just careful monitor? (current thinking is we will hold all
 * assignments in memory)
 * 
 * @param clusterMap Map of entityGroup servers and their load/entityGroup information
 *          to a list of their most loaded entityGroups
 * @return a list of entityGroups to be moved, including source and destination, or
 *         null if cluster is already balanced
 */
public List<EntityGroupPlan> balanceCluster(Map<ServerName, List<EntityGroupInfo>> clusterMap) {
    boolean emptyFServerPresent = false;
    long startTime = System.currentTimeMillis();

    ClusterLoadState cs = new ClusterLoadState(clusterMap);

    int numServers = cs.getNumServers();
    if (numServers == 0) {
        LOG.debug("numServers=0 so skipping load balancing");
        return null;
    }
    NavigableMap<ServerAndLoad, List<EntityGroupInfo>> serversByLoad = cs.getServersByLoad();

    int numEntityGroups = cs.getNumEntityGroups();

    if (!this.needsBalance(cs)) {
        // Skipped because no server outside (min,max) range
        float average = cs.getLoadAverage(); // for logging
        LOG.info("Skipping load balancing because balanced cluster; " + "servers=" + numServers + " "
                + "entityGroups=" + numEntityGroups + " average=" + average + " " + "mostloaded="
                + serversByLoad.lastKey().getLoad() + " leastloaded=" + serversByLoad.firstKey().getLoad());
        return null;
    }

    int min = numEntityGroups / numServers;
    int max = numEntityGroups % numServers == 0 ? min : min + 1;

    // Using to check balance result.
    StringBuilder strBalanceParam = new StringBuilder();
    strBalanceParam.append("Balance parameter: numEntityGroups=").append(numEntityGroups)
            .append(", numServers=").append(numServers).append(", max=").append(max).append(", min=")
            .append(min);
    LOG.debug(strBalanceParam.toString());

    // Balance the cluster
    // TODO: Look at data block locality or a more complex load to do this
    MinMaxPriorityQueue<EntityGroupPlan> entityGroupsToMove = MinMaxPriorityQueue.orderedBy(rpComparator)
            .create();
    List<EntityGroupPlan> entityGroupsToReturn = new ArrayList<EntityGroupPlan>();

    // Walk down most loaded, pruning each to the max
    int serversOverloaded = 0;
    // flag used to fetch entityGroups from head and tail of list, alternately
    boolean fetchFromTail = false;
    Map<ServerName, BalanceInfo> serverBalanceInfo = new TreeMap<ServerName, BalanceInfo>();
    for (Map.Entry<ServerAndLoad, List<EntityGroupInfo>> server : serversByLoad.descendingMap().entrySet()) {
        ServerAndLoad sal = server.getKey();
        int entityGroupCount = sal.getLoad();
        if (entityGroupCount <= max) {
            serverBalanceInfo.put(sal.getServerName(), new BalanceInfo(0, 0));
            break;
        }
        serversOverloaded++;
        List<EntityGroupInfo> entityGroups = server.getValue();
        int numToOffload = Math.min(entityGroupCount - max, entityGroups.size());
        // account for the out-of-band entityGroups which were assigned to this server
        // after some other entityGroup server crashed
        Collections.sort(entityGroups, riComparator);
        int numTaken = 0;
        for (int i = 0; i <= numToOffload;) {
            EntityGroupInfo egInfo = entityGroups.get(i); // fetch from head
            if (fetchFromTail) {
                egInfo = entityGroups.get(entityGroups.size() - 1 - i);
            }
            i++;
            entityGroupsToMove.add(new EntityGroupPlan(egInfo, sal.getServerName(), null));
            numTaken++;
            if (numTaken >= numToOffload)
                break;
            // fetch in alternate order if there is new entityGroup server
            if (emptyFServerPresent) {
                fetchFromTail = !fetchFromTail;
            }
        }
        serverBalanceInfo.put(sal.getServerName(), new BalanceInfo(numToOffload, (-1) * numTaken));
    }
    int totalNumMoved = entityGroupsToMove.size();

    // Walk down least loaded, filling each to the min
    int neededEntityGroups = 0; // number of entityGroups needed to bring all up to min
    fetchFromTail = false;

    Map<ServerName, Integer> underloadedServers = new HashMap<ServerName, Integer>();
    for (Map.Entry<ServerAndLoad, List<EntityGroupInfo>> server : serversByLoad.entrySet()) {
        int entityGroupCount = server.getKey().getLoad();
        if (entityGroupCount >= min) {
            break;
        }
        underloadedServers.put(server.getKey().getServerName(), min - entityGroupCount);
    }
    // number of servers that get new entityGroups
    int serversUnderloaded = underloadedServers.size();
    int incr = 1;
    List<ServerName> sns = Arrays
            .asList(underloadedServers.keySet().toArray(new ServerName[serversUnderloaded]));
    Collections.shuffle(sns, RANDOM);
    while (entityGroupsToMove.size() > 0) {
        int cnt = 0;
        int i = incr > 0 ? 0 : underloadedServers.size() - 1;
        for (; i >= 0 && i < underloadedServers.size(); i += incr) {
            if (entityGroupsToMove.isEmpty())
                break;
            ServerName si = sns.get(i);
            int numToTake = underloadedServers.get(si);
            if (numToTake == 0)
                continue;

            addEntityGroupPlan(entityGroupsToMove, fetchFromTail, si, entityGroupsToReturn);
            if (emptyFServerPresent) {
                fetchFromTail = !fetchFromTail;
            }

            underloadedServers.put(si, numToTake - 1);
            cnt++;
            BalanceInfo bi = serverBalanceInfo.get(si);
            if (bi == null) {
                bi = new BalanceInfo(0, 0);
                serverBalanceInfo.put(si, bi);
            }
            bi.setNumEntityGroupsAdded(bi.getNumEntityGroupsAdded() + 1);
        }
        if (cnt == 0)
            break;
        // iterates underloadedServers in the other direction
        incr = -incr;
    }
    for (Integer i : underloadedServers.values()) {
        // If we still want to take some, increment needed
        neededEntityGroups += i;
    }

    // If none needed to fill all to min and none left to drain all to max,
    // we are done
    if (neededEntityGroups == 0 && entityGroupsToMove.isEmpty()) {
        long endTime = System.currentTimeMillis();
        LOG.info("Calculated a load balance in " + (endTime - startTime) + "ms. " + "Moving " + totalNumMoved
                + " entityGroups off of " + serversOverloaded + " overloaded servers onto " + serversUnderloaded
                + " less loaded servers");
        return entityGroupsToReturn;
    }

    // Need to do a second pass.
    // Either more entityGroups to assign out or servers that are still underloaded

    // If we need more to fill min, grab one from each most loaded until enough
    if (neededEntityGroups != 0) {
        // Walk down most loaded, grabbing one from each until we get enough
        for (Map.Entry<ServerAndLoad, List<EntityGroupInfo>> server : serversByLoad.descendingMap()
                .entrySet()) {
            BalanceInfo balanceInfo = serverBalanceInfo.get(server.getKey().getServerName());
            int idx = balanceInfo == null ? 0 : balanceInfo.getNextEntityGroupForUnload();
            if (idx >= server.getValue().size())
                break;
            EntityGroupInfo entityGroup = server.getValue().get(idx);
            entityGroupsToMove.add(new EntityGroupPlan(entityGroup, server.getKey().getServerName(), null));
            totalNumMoved++;
            if (--neededEntityGroups == 0) {
                // No more entityGroups needed, done shedding
                break;
            }
        }
    }

    // Now we have a set of entityGroups that must be all assigned out
    // Assign each underloaded up to the min, then if leftovers, assign to max

    // Walk down least loaded, assigning to each to fill up to min
    for (Map.Entry<ServerAndLoad, List<EntityGroupInfo>> server : serversByLoad.entrySet()) {
        int entityGroupCount = server.getKey().getLoad();
        if (entityGroupCount >= min)
            break;
        BalanceInfo balanceInfo = serverBalanceInfo.get(server.getKey().getServerName());
        if (balanceInfo != null) {
            entityGroupCount += balanceInfo.getNumEntityGroupsAdded();
        }
        if (entityGroupCount >= min) {
            continue;
        }
        int numToTake = min - entityGroupCount;
        int numTaken = 0;
        while (numTaken < numToTake && 0 < entityGroupsToMove.size()) {
            addEntityGroupPlan(entityGroupsToMove, fetchFromTail, server.getKey().getServerName(),
                    entityGroupsToReturn);
            numTaken++;
            if (emptyFServerPresent) {
                fetchFromTail = !fetchFromTail;
            }
        }
    }

    // If we still have entityGroups to dish out, assign underloaded to max
    if (0 < entityGroupsToMove.size()) {
        for (Map.Entry<ServerAndLoad, List<EntityGroupInfo>> server : serversByLoad.entrySet()) {
            int entityGroupCount = server.getKey().getLoad();
            if (entityGroupCount >= max) {
                break;
            }
            addEntityGroupPlan(entityGroupsToMove, fetchFromTail, server.getKey().getServerName(),
                    entityGroupsToReturn);
            if (emptyFServerPresent) {
                fetchFromTail = !fetchFromTail;
            }
            if (entityGroupsToMove.isEmpty()) {
                break;
            }
        }
    }

    long endTime = System.currentTimeMillis();

    if (!entityGroupsToMove.isEmpty() || neededEntityGroups != 0) {
        // Emit data so can diagnose how balancer went astray.
        LOG.warn("entityGroupsToMove=" + totalNumMoved + ", numServers=" + numServers + ", serversOverloaded="
                + serversOverloaded + ", serversUnderloaded=" + serversUnderloaded);
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<ServerName, List<EntityGroupInfo>> e : clusterMap.entrySet()) {
            if (sb.length() > 0)
                sb.append(", ");
            sb.append(e.getKey().toString());
            sb.append(" ");
            sb.append(e.getValue().size());
        }
        LOG.warn("Input " + sb.toString());
    }

    // All done!
    LOG.info("Done. Calculated a load balance in " + (endTime - startTime) + "ms. " + "Moving " + totalNumMoved
            + " entityGroups off of " + serversOverloaded + " overloaded servers onto " + serversUnderloaded
            + " less loaded servers");

    return entityGroupsToReturn;
}
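
The excerpt above calls a helper, addEntityGroupPlan, whose body is not included on this page. Judging by how the fetchFromTail flag is toggled, it presumably removes one plan from the head or tail of the min-max queue, fills in the destination server, and collects the finished plan. A hypothetical, compilable reconstruction (the Plan interface and its setDestination setter are assumptions, standing in for EntityGroupPlan's API):

import com.google.common.collect.MinMaxPriorityQueue;
import java.util.List;

final class PlanDraining {
    // Stand-in for EntityGroupPlan/RegionPlan; assumes a destination setter exists,
    // since the excerpt constructs plans with a null destination
    interface Plan {
        void setDestination(String serverName);
    }

    static <P extends Plan> void addPlan(MinMaxPriorityQueue<P> toMove, boolean fetchFromTail,
            String serverName, List<P> toReturn) {
        // Take a plan from the tail or the head of the queue, depending on the flag
        P plan = fetchFromTail ? toMove.pollLast() : toMove.pollFirst();
        plan.setDestination(serverName); // destination chosen by the balancer
        toReturn.add(plan);
    }
}

Each call shrinks the queue by one, which is why the surrounding loops can rely on entityGroupsToMove.size() and isEmpty() to decide when to stop.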

From source file: org.apache.hadoop.hbase.master.balancer.SimpleLoadBalancer.java

/**
 * Generate a global load balancing plan according to the specified map of
 * server information to the most loaded regions of each server.
 *
 * The load balancing invariant is that all servers are within 1 region of the
 * average number of regions per server.  If the average is an integer number,
 * all servers will be balanced to the average.  Otherwise, all servers will
 * have either floor(average) or ceiling(average) regions.
 *
 * HBASE-3609 Modeled regionsToMove using Guava's MinMaxPriorityQueue so that
 *   we can fetch from both ends of the queue. 
 * At the beginning, we check whether there was empty region server 
 *   just discovered by Master. If so, we alternately choose new / old
 *   regions from head / tail of regionsToMove, respectively. This alternation
 *   avoids clustering young regions on the newly discovered region server.
 *   Otherwise, we choose new regions from head of regionsToMove.
 *
 * Another improvement from HBASE-3609 is that we assign regions from
 *   regionsToMove to underloaded servers in round-robin fashion.
 *   Previously one underloaded server would be filled before we move onto
 *   the next underloaded server, leading to clustering of young regions.
 *   
 * Finally, we randomly shuffle underloaded servers so that they receive
 *   offloaded regions relatively evenly across calls to balanceCluster().
 *         
 * The algorithm is currently implemented as such:
 *
 * <ol>
 * <li>Determine the two valid numbers of regions each server should have,
 *     <b>MIN</b>=floor(average) and <b>MAX</b>=ceiling(average).
 *
 * <li>Iterate down the most loaded servers, shedding regions from each so
 *     each server hosts exactly <b>MAX</b> regions.  Stop once you reach a
 *     server that already has &lt;= <b>MAX</b> regions.
 *     <p>
 *     Order the regions to move from most recent to least.
 *
 * <li>Iterate down the least loaded servers, assigning regions so each server
 *     has exactly <b>MIN</b> regions.  Stop once you reach a server that
 *     already has &gt;= <b>MIN</b> regions.
 *
 *     Regions being assigned to underloaded servers are those that were shed
 *     in the previous step.  It is possible that there were not enough
 *     regions shed to fill each underloaded server to <b>MIN</b>.  If so we
 *     end up with a number of regions required to do so, <b>neededRegions</b>.
 *
 *     It is also possible that we were able to fill each underloaded server but
 *     ended up with regions that were unassigned from overloaded servers but that
 *     still do not have assignment.
 *
 *     If neither of these conditions hold (no regions needed to fill the
 *     underloaded servers, no regions leftover from overloaded servers),
 *     we are done and return.  Otherwise we handle these cases below.
 *
 * <li>If <b>neededRegions</b> is non-zero (still have underloaded servers),
 *     we iterate the most loaded servers again, shedding a single region from
 *     each (this brings them from having <b>MAX</b> regions to having
 *     <b>MIN</b> regions).
 *
 * <li>We now definitely have more regions that need assignment, either from
 *     the previous step or from the original shedding from overloaded servers.
 *     Iterate the least loaded servers filling each to <b>MIN</b>.
 *
 * <li>If we still have more regions that need assignment, again iterate the
 *     least loaded servers, this time giving each one (filling them to
 *     <b>MAX</b>) until we run out.
 *
 * <li>All servers will now either host <b>MIN</b> or <b>MAX</b> regions.
 *
 *     In addition, any server hosting &gt;= <b>MAX</b> regions is guaranteed
 *     to end up with <b>MAX</b> regions at the end of the balancing.  This
 *     ensures the minimal number of regions possible are moved.
 * </ol>
 *
 * TODO: We can at-most reassign the number of regions away from a particular
 *       server to be how many they report as most loaded.
 *       Should we just keep all assignment in memory?  Any objections?
 *       Does this mean we need HeapSize on HMaster?  Or just careful monitor?
 *       (current thinking is we will hold all assignments in memory)
 *
 * @param clusterMap Map of regionservers and their load/region information to
 *                   a list of their most loaded regions
 * @return a list of regions to be moved, including source and destination,
 *         or null if cluster is already balanced
 */
public List<RegionPlan> balanceCluster(Map<ServerName, List<HRegionInfo>> clusterMap) {
    List<RegionPlan> regionsToReturn = balanceMasterRegions(clusterMap);
    if (regionsToReturn != null) {
        return regionsToReturn;
    }
    filterExcludedServers(clusterMap);
    boolean emptyRegionServerPresent = false;
    long startTime = System.currentTimeMillis();

    Collection<ServerName> backupMasters = getBackupMasters();
    ClusterLoadState cs = new ClusterLoadState(masterServerName, backupMasters, backupMasterWeight, clusterMap);

    if (!this.needsBalance(cs))
        return null;

    int numServers = cs.getNumServers();
    NavigableMap<ServerAndLoad, List<HRegionInfo>> serversByLoad = cs.getServersByLoad();
    int numRegions = cs.getNumRegions();
    float average = cs.getLoadAverage();
    int max = (int) Math.ceil(average);
    int min = (int) average;

    // Using to check balance result.
    StringBuilder strBalanceParam = new StringBuilder();
    strBalanceParam.append("Balance parameter: numRegions=").append(numRegions).append(", numServers=")
            .append(numServers).append(", numBackupMasters=").append(cs.getNumBackupMasters())
            .append(", backupMasterWeight=").append(backupMasterWeight).append(", max=").append(max)
            .append(", min=").append(min);
    LOG.debug(strBalanceParam.toString());

    // Balance the cluster
    // TODO: Look at data block locality or a more complex load to do this
    MinMaxPriorityQueue<RegionPlan> regionsToMove = MinMaxPriorityQueue.orderedBy(rpComparator).create();
    regionsToReturn = new ArrayList<RegionPlan>();

    // Walk down most loaded, pruning each to the max
    int serversOverloaded = 0;
    // flag used to fetch regions from head and tail of list, alternately
    boolean fetchFromTail = false;
    Map<ServerName, BalanceInfo> serverBalanceInfo = new TreeMap<ServerName, BalanceInfo>();
    for (Map.Entry<ServerAndLoad, List<HRegionInfo>> server : serversByLoad.descendingMap().entrySet()) {
        ServerAndLoad sal = server.getKey();
        int load = sal.getLoad();
        if (load <= max) {
            serverBalanceInfo.put(sal.getServerName(), new BalanceInfo(0, 0));
            break;
        }
        serversOverloaded++;
        List<HRegionInfo> regions = server.getValue();
        int w = 1; // Normal region server has weight 1
        if (backupMasters != null && backupMasters.contains(sal.getServerName())) {
            w = backupMasterWeight; // Backup master has heavier weight
        }
        int numToOffload = Math.min((load - max) / w, regions.size());
        // account for the out-of-band regions which were assigned to this server
        // after some other region server crashed 
        Collections.sort(regions, riComparator);
        int numTaken = 0;
        for (int i = 0; i <= numToOffload;) {
            HRegionInfo hri = regions.get(i); // fetch from head
            if (fetchFromTail) {
                hri = regions.get(regions.size() - 1 - i);
            }
            i++;
            // Don't rebalance special regions.
            if (shouldBeOnMaster(hri) && masterServerName.equals(sal.getServerName()))
                continue;
            regionsToMove.add(new RegionPlan(hri, sal.getServerName(), null));
            numTaken++;
            if (numTaken >= numToOffload)
                break;
            // fetch in alternate order if there is new region server
            if (emptyRegionServerPresent) {
                fetchFromTail = !fetchFromTail;
            }
        }
        serverBalanceInfo.put(sal.getServerName(), new BalanceInfo(numToOffload, (-1) * numTaken));
    }
    int totalNumMoved = regionsToMove.size();

    // Walk down least loaded, filling each to the min
    int neededRegions = 0; // number of regions needed to bring all up to min
    fetchFromTail = false;

    Map<ServerName, Integer> underloadedServers = new HashMap<ServerName, Integer>();
    int maxToTake = numRegions - min;
    for (Map.Entry<ServerAndLoad, List<HRegionInfo>> server : serversByLoad.entrySet()) {
        if (maxToTake == 0)
            break; // no more to take
        int load = server.getKey().getLoad();
        if (load >= min && load > 0) {
            continue; // look for other servers which haven't reached min
        }
        int w = 1; // Normal region server has weight 1
        if (backupMasters != null && backupMasters.contains(server.getKey().getServerName())) {
            w = backupMasterWeight; // Backup master has heavier weight
        }
        int regionsToPut = (min - load) / w;
        if (regionsToPut == 0) {
            regionsToPut = 1;
        }
        maxToTake -= regionsToPut;
        underloadedServers.put(server.getKey().getServerName(), regionsToPut);
    }
    // number of servers that get new regions
    int serversUnderloaded = underloadedServers.size();
    int incr = 1;
    List<ServerName> sns = Arrays
            .asList(underloadedServers.keySet().toArray(new ServerName[serversUnderloaded]));
    Collections.shuffle(sns, RANDOM);
    while (regionsToMove.size() > 0) {
        int cnt = 0;
        int i = incr > 0 ? 0 : underloadedServers.size() - 1;
        for (; i >= 0 && i < underloadedServers.size(); i += incr) {
            if (regionsToMove.isEmpty())
                break;
            ServerName si = sns.get(i);
            int numToTake = underloadedServers.get(si);
            if (numToTake == 0)
                continue;

            addRegionPlan(regionsToMove, fetchFromTail, si, regionsToReturn);
            if (emptyRegionServerPresent) {
                fetchFromTail = !fetchFromTail;
            }

            underloadedServers.put(si, numToTake - 1);
            cnt++;
            BalanceInfo bi = serverBalanceInfo.get(si);
            if (bi == null) {
                bi = new BalanceInfo(0, 0);
                serverBalanceInfo.put(si, bi);
            }
            bi.setNumRegionsAdded(bi.getNumRegionsAdded() + 1);
        }
        if (cnt == 0)
            break;
        // iterates underloadedServers in the other direction
        incr = -incr;
    }
    for (Integer i : underloadedServers.values()) {
        // If we still want to take some, increment needed
        neededRegions += i;
    }

    // If none needed to fill all to min and none left to drain all to max,
    // we are done
    if (neededRegions == 0 && regionsToMove.isEmpty()) {
        long endTime = System.currentTimeMillis();
        LOG.info("Calculated a load balance in " + (endTime - startTime) + "ms. " + "Moving " + totalNumMoved
                + " regions off of " + serversOverloaded + " overloaded servers onto " + serversUnderloaded
                + " less loaded servers");
        return regionsToReturn;
    }

    // Need to do a second pass.
    // Either more regions to assign out or servers that are still underloaded

    // If we need more to fill min, grab one from each most loaded until enough
    if (neededRegions != 0) {
        // Walk down most loaded, grabbing one from each until we get enough
        for (Map.Entry<ServerAndLoad, List<HRegionInfo>> server : serversByLoad.descendingMap().entrySet()) {
            BalanceInfo balanceInfo = serverBalanceInfo.get(server.getKey().getServerName());
            int idx = balanceInfo == null ? 0 : balanceInfo.getNextRegionForUnload();
            if (idx >= server.getValue().size())
                break;
            HRegionInfo region = server.getValue().get(idx);
            if (region.isMetaRegion())
                continue; // Don't move meta regions.
            regionsToMove.add(new RegionPlan(region, server.getKey().getServerName(), null));
            totalNumMoved++;
            if (--neededRegions == 0) {
                // No more regions needed, done shedding
                break;
            }
        }
    }

    // Now we have a set of regions that must be all assigned out
    // Assign each underloaded up to the min, then if leftovers, assign to max

    // Walk down least loaded, assigning to each to fill up to min
    for (Map.Entry<ServerAndLoad, List<HRegionInfo>> server : serversByLoad.entrySet()) {
        int regionCount = server.getKey().getLoad();
        if (regionCount >= min)
            break;
        BalanceInfo balanceInfo = serverBalanceInfo.get(server.getKey().getServerName());
        if (balanceInfo != null) {
            regionCount += balanceInfo.getNumRegionsAdded();
        }
        if (regionCount >= min) {
            continue;
        }
        int numToTake = min - regionCount;
        int numTaken = 0;
        while (numTaken < numToTake && 0 < regionsToMove.size()) {
            addRegionPlan(regionsToMove, fetchFromTail, server.getKey().getServerName(), regionsToReturn);
            numTaken++;
            if (emptyRegionServerPresent) {
                fetchFromTail = !fetchFromTail;
            }
        }
    }

    // If we still have regions to dish out, assign underloaded to max
    if (0 < regionsToMove.size()) {
        for (Map.Entry<ServerAndLoad, List<HRegionInfo>> server : serversByLoad.entrySet()) {
            int regionCount = server.getKey().getLoad();
            BalanceInfo balanceInfo = serverBalanceInfo.get(server.getKey().getServerName());
            if (balanceInfo != null) {
                regionCount += balanceInfo.getNumRegionsAdded();
            }
            if (regionCount >= max) {
                break;
            }
            addRegionPlan(regionsToMove, fetchFromTail, server.getKey().getServerName(), regionsToReturn);
            if (emptyRegionServerPresent) {
                fetchFromTail = !fetchFromTail;
            }
            if (regionsToMove.isEmpty()) {
                break;
            }
        }
    }

    long endTime = System.currentTimeMillis();

    if (!regionsToMove.isEmpty() || neededRegions != 0) {
        // Emit data so can diagnose how balancer went astray.
        LOG.warn("regionsToMove=" + totalNumMoved + ", numServers=" + numServers + ", serversOverloaded="
                + serversOverloaded + ", serversUnderloaded=" + serversUnderloaded);
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<ServerName, List<HRegionInfo>> e : clusterMap.entrySet()) {
            if (sb.length() > 0)
                sb.append(", ");
            sb.append(e.getKey().toString());
            sb.append(" ");
            sb.append(e.getValue().size());
        }
        LOG.warn("Input " + sb.toString());
    }

    // All done!
    LOG.info("Done. Calculated a load balance in " + (endTime - startTime) + "ms. " + "Moving " + totalNumMoved
            + " regions off of " + serversOverloaded + " overloaded servers onto " + serversUnderloaded
            + " less loaded servers");

    return regionsToReturn;
}
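
Note that this variant computes the bounds as min = (int) average and max = (int) Math.ceil(average), while the other two balancers use integer division with a remainder check; both forms yield floor(average) and ceiling(average). A quick illustration with made-up numbers:

public class MinMaxBoundsDemo {
    public static void main(String[] args) {
        int numRegions = 26, numServers = 5;                       // illustrative only
        float average = (float) numRegions / numServers;           // 5.2
        int min = (int) average;                                   // floor   -> 5
        int max = (int) Math.ceil(average);                        // ceiling -> 6
        // Integer form used by the other two balancers, same result:
        int min2 = numRegions / numServers;                        // 5
        int max2 = numRegions % numServers == 0 ? min2 : min2 + 1; // 6
        System.out.println(min + ".." + max + " == " + min2 + ".." + max2);
    }
}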

From source file: org.apache.hadoop.hbase.master.DefaultLoadBalancer.java

/**
 * Generate a global load balancing plan according to the specified map of
 * server information to the most loaded regions of each server.
 *
 * The load balancing invariant is that all servers are within 1 region of the
 * average number of regions per server.  If the average is an integer number,
 * all servers will be balanced to the average.  Otherwise, all servers will
 * have either floor(average) or ceiling(average) regions.
 *
 * HBASE-3609 Modeled regionsToMove using Guava's MinMaxPriorityQueue so that
 *   we can fetch from both ends of the queue. 
 * At the beginning, we check whether there was empty region server 
 *   just discovered by Master. If so, we alternately choose new / old
 *   regions from head / tail of regionsToMove, respectively. This alternation
 *   avoids clustering young regions on the newly discovered region server.
 *   Otherwise, we choose new regions from head of regionsToMove.
 *
 * Another improvement from HBASE-3609 is that we assign regions from
 *   regionsToMove to underloaded servers in round-robin fashion.
 *   Previously one underloaded server would be filled before we move onto
 *   the next underloaded server, leading to clustering of young regions.
 *   
 * Finally, we randomly shuffle underloaded servers so that they receive
 *   offloaded regions relatively evenly across calls to balanceCluster().
 *         
 * The algorithm is currently implemented as such:
 *
 * <ol>
 * <li>Determine the two valid numbers of regions each server should have,
 *     <b>MIN</b>=floor(average) and <b>MAX</b>=ceiling(average).
 *
 * <li>Iterate down the most loaded servers, shedding regions from each so
 *     each server hosts exactly <b>MAX</b> regions.  Stop once you reach a
 *     server that already has &lt;= <b>MAX</b> regions.
 *     <p>
 *     Order the regions to move from most recent to least.
 *
 * <li>Iterate down the least loaded servers, assigning regions so each server
 *     has exactly <b>MIN</b> regions.  Stop once you reach a server that
 *     already has &gt;= <b>MIN</b> regions.
 *
 *     Regions being assigned to underloaded servers are those that were shed
 *     in the previous step.  It is possible that there were not enough
 *     regions shed to fill each underloaded server to <b>MIN</b>.  If so we
 *     end up with a number of regions required to do so, <b>neededRegions</b>.
 *
 *     It is also possible that we were able to fill each underloaded server but
 *     ended up with regions that were unassigned from overloaded servers but that
 *     still do not have assignment.
 *
 *     If neither of these conditions hold (no regions needed to fill the
 *     underloaded servers, no regions leftover from overloaded servers),
 *     we are done and return.  Otherwise we handle these cases below.
 *
 * <li>If <b>neededRegions</b> is non-zero (still have underloaded servers),
 *     we iterate the most loaded servers again, shedding a single region from
 *     each (this brings them from having <b>MAX</b> regions to having
 *     <b>MIN</b> regions).
 *
 * <li>We now definitely have more regions that need assignment, either from
 *     the previous step or from the original shedding from overloaded servers.
 *     Iterate the least loaded servers filling each to <b>MIN</b>.
 *
 * <li>If we still have more regions that need assignment, again iterate the
 *     least loaded servers, this time giving each one (filling them to
 *     <b>MAX</b>) until we run out.
 *
 * <li>All servers will now either host <b>MIN</b> or <b>MAX</b> regions.
 *
 *     In addition, any server hosting &gt;= <b>MAX</b> regions is guaranteed
 *     to end up with <b>MAX</b> regions at the end of the balancing.  This
 *     ensures the minimal number of regions possible are moved.
 * </ol>
 *
 * TODO: We can at-most reassign the number of regions away from a particular
 *       server to be how many they report as most loaded.
 *       Should we just keep all assignment in memory?  Any objections?
 *       Does this mean we need HeapSize on HMaster?  Or just careful monitor?
 *       (current thinking is we will hold all assignments in memory)
 *
 * @param clusterState Map of regionservers and their load/region information to
 *                   a list of their most loaded regions
 * @return a list of regions to be moved, including source and destination,
 *         or null if cluster is already balanced
 */
public List<RegionPlan> balanceCluster(Map<ServerName, List<HRegionInfo>> clusterState) {
    boolean emptyRegionServerPresent = false;
    long startTime = System.currentTimeMillis();

    int numServers = clusterState.size();
    if (numServers == 0) {
        LOG.debug("numServers=0 so skipping load balancing");
        return null;
    }
    NavigableMap<ServerAndLoad, List<HRegionInfo>> serversByLoad = new TreeMap<ServerAndLoad, List<HRegionInfo>>();
    int numRegions = 0;
    // Iterate so we can count regions as we build the map
    for (Map.Entry<ServerName, List<HRegionInfo>> server : clusterState.entrySet()) {
        List<HRegionInfo> regions = server.getValue();
        int sz = regions.size();
        if (sz == 0)
            emptyRegionServerPresent = true;
        numRegions += sz;
        serversByLoad.put(new ServerAndLoad(server.getKey(), sz), regions);
    }
    // Check if we even need to do any load balancing
    float average = (float) numRegions / numServers; // for logging
    // HBASE-3681 check sloppiness first
    int floor = (int) Math.floor(average * (1 - slop));
    int ceiling = (int) Math.ceil(average * (1 + slop));
    if (serversByLoad.lastKey().getLoad() <= ceiling && serversByLoad.firstKey().getLoad() >= floor) {
        // Skipped because no server outside (min,max) range
        LOG.info("Skipping load balancing because balanced cluster; " + "servers=" + numServers + " "
                + "regions=" + numRegions + " average=" + average + " " + "mostloaded="
                + serversByLoad.lastKey().getLoad() + " leastloaded=" + serversByLoad.firstKey().getLoad());
        return null;
    }
    int min = numRegions / numServers;
    int max = numRegions % numServers == 0 ? min : min + 1;

    // Using to check balance result.
    StringBuilder strBalanceParam = new StringBuilder();
    strBalanceParam.append("Balance parameter: numRegions=").append(numRegions).append(", numServers=")
            .append(numServers).append(", max=").append(max).append(", min=").append(min);
    LOG.debug(strBalanceParam.toString());

    // Balance the cluster
    // TODO: Look at data block locality or a more complex load to do this
    MinMaxPriorityQueue<RegionPlan> regionsToMove = MinMaxPriorityQueue.orderedBy(rpComparator).create();
    List<RegionPlan> regionsToReturn = new ArrayList<RegionPlan>();

    // Walk down most loaded, pruning each to the max
    int serversOverloaded = 0;
    // flag used to fetch regions from head and tail of list, alternately
    boolean fetchFromTail = false;
    Map<ServerName, BalanceInfo> serverBalanceInfo = new TreeMap<ServerName, BalanceInfo>();
    for (Map.Entry<ServerAndLoad, List<HRegionInfo>> server : serversByLoad.descendingMap().entrySet()) {
        ServerAndLoad sal = server.getKey();
        int regionCount = sal.getLoad();
        if (regionCount <= max) {
            serverBalanceInfo.put(sal.getServerName(), new BalanceInfo(0, 0));
            break;
        }
        serversOverloaded++;
        List<HRegionInfo> regions = server.getValue();
        int numToOffload = Math.min(regionCount - max, regions.size());
        // account for the out-of-band regions which were assigned to this server
        // after some other region server crashed 
        Collections.sort(regions, riComparator);
        int numTaken = 0;
        for (int i = 0; i <= numToOffload;) {
            HRegionInfo hri = regions.get(i); // fetch from head
            if (fetchFromTail) {
                hri = regions.get(regions.size() - 1 - i);
            }
            i++;
            // Don't rebalance meta regions.
            if (hri.isMetaRegion())
                continue;
            regionsToMove.add(new RegionPlan(hri, sal.getServerName(), null));
            numTaken++;
            if (numTaken >= numToOffload)
                break;
            // fetch in alternate order if there is new region server
            if (emptyRegionServerPresent) {
                fetchFromTail = !fetchFromTail;
            }
        }
        serverBalanceInfo.put(sal.getServerName(), new BalanceInfo(numToOffload, (-1) * numTaken));
    }
    int totalNumMoved = regionsToMove.size();

    // Walk down least loaded, filling each to the min
    int neededRegions = 0; // number of regions needed to bring all up to min
    fetchFromTail = false;

    Map<ServerName, Integer> underloadedServers = new HashMap<ServerName, Integer>();
    for (Map.Entry<ServerAndLoad, List<HRegionInfo>> server : serversByLoad.entrySet()) {
        int regionCount = server.getKey().getLoad();
        if (regionCount >= min) {
            break;
        }
        underloadedServers.put(server.getKey().getServerName(), min - regionCount);
    }
    // number of servers that get new regions
    int serversUnderloaded = underloadedServers.size();
    int incr = 1;
    List<ServerName> sns = Arrays
            .asList(underloadedServers.keySet().toArray(new ServerName[serversUnderloaded]));
    Collections.shuffle(sns, RANDOM);
    while (regionsToMove.size() > 0) {
        int cnt = 0;
        int i = incr > 0 ? 0 : underloadedServers.size() - 1;
        for (; i >= 0 && i < underloadedServers.size(); i += incr) {
            if (regionsToMove.isEmpty())
                break;
            ServerName si = sns.get(i);
            int numToTake = underloadedServers.get(si);
            if (numToTake == 0)
                continue;

            addRegionPlan(regionsToMove, fetchFromTail, si, regionsToReturn);
            if (emptyRegionServerPresent) {
                fetchFromTail = !fetchFromTail;
            }

            underloadedServers.put(si, numToTake - 1);
            cnt++;
            BalanceInfo bi = serverBalanceInfo.get(si);
            if (bi == null) {
                bi = new BalanceInfo(0, 0);
                serverBalanceInfo.put(si, bi);
            }
            bi.setNumRegionsAdded(bi.getNumRegionsAdded() + 1);
        }
        if (cnt == 0)
            break;
        // iterates underloadedServers in the other direction
        incr = -incr;
    }
    for (Integer i : underloadedServers.values()) {
        // If we still want to take some, increment needed
        neededRegions += i;
    }

    // If none needed to fill all to min and none left to drain all to max,
    // we are done
    if (neededRegions == 0 && regionsToMove.isEmpty()) {
        long endTime = System.currentTimeMillis();
        LOG.info("Calculated a load balance in " + (endTime - startTime) + "ms. " + "Moving " + totalNumMoved
                + " regions off of " + serversOverloaded + " overloaded servers onto " + serversUnderloaded
                + " less loaded servers");
        return regionsToReturn;
    }

    // Need to do a second pass.
    // Either more regions to assign out or servers that are still underloaded

    // If we need more to fill min, grab one from each most loaded until enough
    if (neededRegions != 0) {
        // Walk down most loaded, grabbing one from each until we get enough
        for (Map.Entry<ServerAndLoad, List<HRegionInfo>> server : serversByLoad.descendingMap().entrySet()) {
            BalanceInfo balanceInfo = serverBalanceInfo.get(server.getKey().getServerName());
            int idx = balanceInfo == null ? 0 : balanceInfo.getNextRegionForUnload();
            if (idx >= server.getValue().size())
                break;
            HRegionInfo region = server.getValue().get(idx);
            if (region.isMetaRegion())
                continue; // Don't move meta regions.
            regionsToMove.add(new RegionPlan(region, server.getKey().getServerName(), null));
            totalNumMoved++;
            if (--neededRegions == 0) {
                // No more regions needed, done shedding
                break;
            }
        }
    }

    // Now we have a set of regions that must be all assigned out
    // Assign each underloaded up to the min, then if leftovers, assign to max

    // Walk down least loaded, assigning to each to fill up to min
    for (Map.Entry<ServerAndLoad, List<HRegionInfo>> server : serversByLoad.entrySet()) {
        int regionCount = server.getKey().getLoad();
        if (regionCount >= min)
            break;
        BalanceInfo balanceInfo = serverBalanceInfo.get(server.getKey().getServerName());
        if (balanceInfo != null) {
            regionCount += balanceInfo.getNumRegionsAdded();
        }
        if (regionCount >= min) {
            continue;
        }
        int numToTake = min - regionCount;
        int numTaken = 0;
        while (numTaken < numToTake && 0 < regionsToMove.size()) {
            addRegionPlan(regionsToMove, fetchFromTail, server.getKey().getServerName(), regionsToReturn);
            numTaken++;
            if (emptyRegionServerPresent) {
                fetchFromTail = !fetchFromTail;
            }
        }
    }

    // If we still have regions to dish out, assign underloaded to max
    if (0 < regionsToMove.size()) {
        for (Map.Entry<ServerAndLoad, List<HRegionInfo>> server : serversByLoad.entrySet()) {
            int regionCount = server.getKey().getLoad();
            if (regionCount >= max) {
                break;
            }
            addRegionPlan(regionsToMove, fetchFromTail, server.getKey().getServerName(), regionsToReturn);
            if (emptyRegionServerPresent) {
                fetchFromTail = !fetchFromTail;
            }
            if (regionsToMove.isEmpty()) {
                break;
            }
        }
    }

    long endTime = System.currentTimeMillis();

    if (!regionsToMove.isEmpty() || neededRegions != 0) {
        // Emit data so can diagnose how balancer went astray.
        LOG.warn("regionsToMove=" + totalNumMoved + ", numServers=" + numServers + ", serversOverloaded="
                + serversOverloaded + ", serversUnderloaded=" + serversUnderloaded);
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<ServerName, List<HRegionInfo>> e : clusterState.entrySet()) {
            if (sb.length() > 0)
                sb.append(", ");
            sb.append(e.getKey().toString());
            sb.append(" ");
            sb.append(e.getValue().size());
        }
        LOG.warn("Input " + sb.toString());
    }

    // All done!
    LOG.info("Done. Calculated a load balance in " + (endTime - startTime) + "ms. " + "Moving " + totalNumMoved
            + " regions off of " + serversOverloaded + " overloaded servers onto " + serversUnderloaded
            + " less loaded servers");

    return regionsToReturn;
}
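
All three balancers use the queue the same way: build it with MinMaxPriorityQueue.orderedBy(comparator).create(), snapshot size() as totalNumMoved, then drain from either end until isEmpty(). A stripped-down sketch of that lifecycle, with a natural-order comparator and plain strings as placeholders for the RegionPlan/EntityGroupPlan comparators and plan objects above:

import com.google.common.collect.MinMaxPriorityQueue;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class QueueDrainPattern {
    public static void main(String[] args) {
        Comparator<String> comparator = Comparator.naturalOrder(); // placeholder ordering
        MinMaxPriorityQueue<String> toMove =
                MinMaxPriorityQueue.orderedBy(comparator).create();
        toMove.add("planB");
        toMove.add("planA");
        toMove.add("planC");

        int totalNumMoved = toMove.size(); // snapshot before draining, as the balancers do
        List<String> returned = new ArrayList<>();
        boolean fetchFromTail = false;
        while (!toMove.isEmpty()) {
            // Alternate between the two ends, mirroring the fetchFromTail flag above
            returned.add(fetchFromTail ? toMove.pollLast() : toMove.pollFirst());
            fetchFromTail = !fetchFromTail;
        }
        System.out.println("moved=" + totalNumMoved + " -> " + returned);
    }
}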