Towards the Optimal Solution of the Multiprocessor Scheduling Problem with Communication Delays - PDF

Please download to get full document.

View again

of 9
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
Information Report
Category:

Arts & Culture

Published:

Views: 0 | Pages: 9

Extension: PDF | Download: 0

Share
Related documents
Description
Towards the Optimal Solution of the Multiprocessor Scheduling Problem with Communication Delays Tatjana Davidović Mathematical Institute, Serbian Academy of Sciences, Belgrade, Serbia,
Transcript
Towards the Optimal Solution of the Multiprocessor Scheduling Problem with Communication Delays Tatjana Davidović Mathematical Institute, Serbian Academy of Sciences, Belgrade, Serbia, Leo Liberti LIX Ecole Polytechnique, F Palaiseau, France, Nelson Maculan COPPE Systems Engineering, Federal University of Rio de Janeiro, Brazil, Nenad Mladenović Mathematical Institute, Serbian Academy of Sciences, Belgrade, Serbia, and School of Information Systems, Computing and Mathematics, Brunel University, UB83PH Uxbridge UK, We consider the Multiprocessor Scheduling Problem with Communication Delays, where the delay is proportional to both the amount of exchanged data between pairs of dependent tasks and the distance between processors in the multiprocessor architecture. Although scheduling problems are usually solved by means of heuristics due to their large sizes, we propose methods to identify optimal solutions of small and medium-scale instances. A set of instances with known optima is a useful benchmarking tool for new heuristic algorithms. We propose two new Mixed-Integer Bilinear Programming formulations, we linearize them in two different ways, and test them with CPLEX 8.1. To decrease the time needed by CPLEX for finding the optimal solution, we use Variable Neighborhood Search heuristic to obtain a good approximation for the initial solution. Keywords: Multiprocessors, Task Scheduling, Communication Delays, Linear Programming, CPLEX. 1 Introduction In this paper we propose formulation-based and combinatorial methods to find optimal solutions to the Multiprocessor Scheduling Problem with Communication Delays (MSPCD). This consists in finding a minimum makespan schedule to run a set of tasks with fixed lengths L i on an arbitrary network of homogeneous processors. The task precedence is modelled by a directed acyclic graph G = (V, A) where V is the set of tasks and an arc (i, j) exists if task i has to be completed before task j can start. If i and j are executed on different processors h, k P, they incur a communication cost penalty γij hk dependent on the distance d hk between the processors, and on the amount of exchanged data c ij between tasks (γij hk = Γc ij d hk, where Γ is a known constant). Since this problem is a super-class of the classic scheduling problem, it is NP-hard. Its natural application field is in compiler design [13] and robotics [14]. The size of real-life instances is almost always too large to be solved to optimality, so heuristic algorithms are used to find good solutions. When designing new heuristic methods, it is necessary to validate them and benchmark them on a set of instances with known optimal solutions. The motivation of this study (and purpose of this paper) is to provide a tool to accomplish this, i.e. finding optimal solutions of the MSPCD. Of course, our methods are prohibitively computationally expensive to be applied to real-life MSPCD instances, so our numerical experiments were carried out on a set of small- and medium-sized instances. In the literature one can find mathematical formulations for different variants of scheduling problems. For example, in [12] the formulation for scheduling independent tasks is given. Scheduling of dependent tasks without communication is modelled in several different ways [1, 10, 15]. In [9] the authors considered the problem of scheduling dependent tasks onto heterogeneous multiprocessor architecture, but neglected communication time. To the best of our knowledge, up to now no mathematical programming model involving communications was developed and solved to optimality. Moreover, in existing models targeting similar problems, the processor topology is assumed to be a complete graph. The main contributions of this paper are: (a) an adaptation of the classic scheduling formulation with ordering variables to the MSPCD, and a compact linearization of the resulting bilinear program; (b) a new formulation based on packing rectangles into a larger rectangle, which considerably reduces the number of variables; (c) comparative computational experiments on CPLEX 8.1 solving the two formulations with the impact of good, heuristic-based initial solution for the search process. The rest of this paper is organized as follows. The two formulations are presented in Section 2. In Section 3 we discuss the comparative computational results. Section 4 concludes the paper. 2 Problem formulation In this section we present two new formulations for the MSPCD. One is an adaptation of the classic scheduling model with variables expressing both assignment and ordering, and the other is derived from the idea of identifying tasks with rectangles and packing rectangles into a larger rectangle representing the total makespan and processor set. The parameters for both models are: the set of task lengths L i (i V ), the task precedence graph G = (V, A), the costs c ij on the arcs (i, j) A, the processor distance matrix (d kl ) for k, l P. We let δ (j) be the set of precedents of task i, that is δ (j) = {i V (i, j) A}. 2.1 Classic formulation The classic scheduling formulation employs a set of binary variables which control both assignment of task to processor and position in the order of tasks executed by the given processor, and a set of continuous variables indicating the starting time for each task. Thus, we let: { i, s V, k P yik s 1 task i is the s-th task on processor k = 0 otherwise, and t i R + be the starting time of task i for all i V. The MSPCD can be formulated as follows: min x,t i V k P k P, s V \{1} max i + L i } i V (1) p yik s = 1 (2) k=1 s=1 i=1 i=1 y 1 ik 1 (3) y s ik i=1 y s 1 ik (4) j V, i δ (j) t j t i + L i + k P, s V \{n}, i, j V p p h=1 s=1 k=1 r=1 t j t i + L i M 2 yik s + γ hk ij y s ihy r jk (5) yjk r r=s+1 (6) i, s V, k P y s ik {0, 1} (7) i V t i 0, (8) where M 0 is a sufficiently large penalty coefficient and γ hk ij represent the delay between tasks i and j as mentioned in Section 1. Equations (2) ensure that each task is assigned to exactly one processor. Inequalities (3)- (4) state that each processor can not be simultaneously used by more than one task. (3) means that at most one task will be the first one at k, while (4) ensures that if some task is the s th one (s 2) scheduled to processor k then there must be another task assigned as (s 1)-th to the same processor. Inequalities (5) express precedence constraints together with communication time required for tasks assigned to different processors. Constraints (6) define the sequence of the starting times for the set of tasks assigned to the same processor. They express the fact that task j must start at least L i time units after the beginning of task i whenever j is executed after i on the same processor k; the M parameter must be large enough so that constraint (6) is active only if i and j are executed on the same processor k and r s Linearization of the classic formulation The mathematical formulation of MSPCD given by (1)-(8) contains linear terms in the continuous t variables and linear and bilinear terms in the binary y variables. It therefore belongs to the class of mixed integer bilinear programs. It is possible to linearize this model by introducing a new set of (continuous) variables ζijhk sr [0, 1] which replace the bilinear terms ys ih yr jk in Eq. (5) [4, 5]. The linearizing variables ζijhk sr have to satisfy the following linearization constraints: y s ih + y r jk 1 yih s ζijhk sr (9) yjk r ζ sr ijhk (10) ζ sr ijhk (11) i, j, s, r V, h, k P, which guarantee that ζijhk sr = ys ih yr jk. We call this reformulation the usual linearization. The number of variables and constraints in the linearized model is O( V 4 P 2 ). Solving the formulation with a Branch-and-Bound method, we need to solve a linear relaxation at each node. Thus, it is important to obtain formulations with a small number of constraints. Since the constraint size is due to the linearization constraints (9)-(11), we propose a different linearization technique, called compact linearization, which reduces the size of the problem. This consists in multiplying each assignment constraints (2) by yjk r ; the resulting bilinear terms are successively linearized by substituting them with the appropriate ζ variable: thus, we obtain valid linear relationships between the ζ and the y variables [8]. The compact linearization also assumes that ζijhk sr = ζrs jihk, which holds by commutativity of product. Thus, the compact linearization for the MSPCD is as follows: i, j, r V, l P i, j, s, r V, h, k P p h=1 s=1 ζ sr ijhk = y r jk (12) ζ sr ijhk = ζ rs jikh (13) Notice that although there are still O( V 4 P 2 ) constraints (13), these can be used to eliminate half of the linearizing variables and deleted from the formulation (any decent MILP pre-solver will actually perform this reformulation automatically). Hence, there are only O( V 3 P ) constraints in the compact linearization Bounds to the objective function Simply feeding the classic formulation to a MILP solver was found to be inefficient. To speed up the search, we artificially set lower and upper bounds to the objective function value. Load Balancing (LB) and the Critical Path Method (CPM) are two fast alternatives to compute cheap lower bounds. The optimal solution cannot have a shorter execution time than the ideal load balancing case, i.e. when there is no idle time intervals and all the processors complete the execution exactly at the same time. In other words, if we let T max = max j n {t j + L j } denote the objective function, we have T max 1 ni=1 p L i = T LB. Furthermore, the length of final schedule can not be smaller that the length of the critical path, the longest path connecting a node with no predecessors to a node without successors. Let us denote by δ + (j) the set of immediate successors of task j, i.e. δ + (j) = {i V : (j, i) A}. By using CPM the corresponding lower bound T CP can be defined as T CP = max j n CP(j) where CP(j) = L j, if δ + (j) =, L j + max j δ + (j) CP(j ), otherwise, Let T be the greatest of the bounds obtained by LB and CPM; a constraint of the form T max T is then also added to the formulation. Notice that this lower bound is valid throughout the whole solution process and does not depend on the current Branch-and-Bound node being solved. Several efficient heuristics and meta-heuristics can provide upper bounds. These heuristics are seldom applicable to a Branch-and-Bound algorithm because at any given Branch-and-Bound iteration, some of the variables are fixed and it is usually difficult to force the graph-based heuristics to constrain the parts of the graph structure which correspond to the fixed variables. At the first iteration, however, no variables are fixed, which makes it possible to apply some efficient graph-based heuristics as a kind of pre-processing step to the whole Branch-and-Bound run. In our tests, we used Variable Neighborhood Search (VNS) [3] to compute tight upper bounds T to the objective function value, and a constraint T max T was added to the formulation prior to starting the solution process. 2.2 Packing formulation The idea on which this model is based is to liken task i to a rectangle of length L i and height 1, and to pack all the rectangles representing the tasks into a larger rectangle of height P and length W, where W is the total makespan to be minimized [6]. For each task i V let t i R be the starting execution time (or alternatively, the horizontal coordinate of the bottom left corner of the rectangle representing task i) and p i N be the ID of the processor where task i is to be executed (alternatively, the vertical coordinate of the bottom left corner of the rectangle representing task i). Let W be the total makespan (alternatively, the length of the larger rectangle where we want to pack the task-rectangles). Let x ik be 1 if task i is assigned to processor k, and zero otherwise. In order to enforce non-overlapping constraints, we define two sets of binary variables: { 1 task i finishes before task j starts i, j V σ ij = 0 otherwise i, j V ǫ ij = { 1 the processor index of task i is strictly less than that of task j 0 otherwise. The formulation is as follows: min t,p,σ,ǫ W (14) i V t i + L i W (15) i j V t j t i L i (σ ij 1)W max 0 (16) i j V p j p i 1 (ǫ ij 1) P 0 (17) i j V σ ij + σ ji + ǫ ij + ǫ ji 1 (18) i j V σ ij + σ ji 1 (19) i j V ǫ ij + ǫ ji 1 (20) j V : i δ (j) σ ij = 1 (21) j V : i δ (j) t i + L i + γij hk x ih x jk t j (22) h,k P i V kx ik = p i (23) i V k P k P x ik = 1 (24) W 0 (25) i V t i 0 (26) i V p i {1,..., P } (27) i V, k P x ik {0, 1} (28) where W max is an upper bound on the makespan W, e.g. i, j V σ ij, ǫ ij {0, 1}, (29) W max = i V L i + i,j V c ij max{d hk h, k P }. We minimize the makespan in (14) and (15), we define the time order on the tasks in terms of the σ variables in (16), and similarly we define the CPU ID order on the tasks in terms of the ǫ variables in (17). By (18), the rectangles must have at least one relative positioning on the plane. By (19) a task cannot both be before and after another task; similarly, by (20) a task cannot be placed both on a higher and on a lower CPU ID than another task. Constraints (21) enforce the task precedences; constraints (22) model the communication delays. Constraints (23) link the assignment variables x with the CPU ID variables p, and (24) are the assignment constraints. Formulation (14)-(29) is also a bilinear MINLP where the bilinearities arise because of the communication delay constraints. Observe, however, that this formulation has variables with only two indices, as opposed to the formulation of Section 2.1 whose variables have three indices. Since linearization squares the size of the variable set, it appears clear that the packing formulation size is much smaller than the classic one. (30) 2.2.1 Linearization of the packing formulation We linearize formulation (14)-(29) by introducing variables 0 z 1 as follows: j V, i δ (j), h, k P (z hk ij = x ih x jk ). Integrality on the linearizing variables z follows from the usual linearization constraints: j V, i δ (j), h, k P (x ih z hk ij x jk z hk ij x ih + x jk 1 z hk ij ) (31) The above linearization constraints are equivalent to the following compact linearization: i j V, k P ( h P z hk ij = x jk ). (32) i j V, h, k P (z hk ij = z kh ji ). (33) Although the set of compact linearization constraints (33)-(32) is smaller than the set of usual linearization constraints (31), the difference is not as dramatic as for the classic formulation (O( V 2 P ) against O( V 2 P 2 ). Experimentally, we found that the compact linearization is beneficial when the precedence graph is dense. For sparse graphs the usual linearization sometimes yields shorter solution times. This is in accordance with the observation that a more precise evaluation of the magnitude order of the usual linearization constraint size is O( A P 2 ) rather than O( V 2 P 2 ), and that A decreases with respect to V 2 as the sparsity of the graph increases. 3 Computational results In this section we describe and discuss the computational experiments performed to test the ideas presented in this paper. All our computations were carried out on an Intel PIV 2.66GHz CPU with 1GB RAM running Linux. We used CPLEX v. 8.1 [7]. 3.1 Instances Several small size test instances known from the literature (like [11]) were used in our experiments. In addition, we tested a few examples with known optimal solutions (named with the prefix ogra) generated as in [2]. We scheduled small-sizes instances on: p = 2 and p = 3, completely connected processors. For ogra examples the optimal solution is obtained when the tasks are well packed (as in ideal schedule) in the order defined by the task indices. For these task graphs it is always the case that T max = T LB = T CP. This means that they have a special structure which makes them very hard instances for heuristic methods. In other words, in the cases when the task graph density is small, i.e. the number of mutually independent tasks is large, it is hard to find the task ordering which yields the optimal schedule. Medium-size instances (with 30 and 40 tasks) are generated randomly, as it was proposed in [2]. These instances names are generated with the prefix t followed by the number of tasks, density and index distinguishing graphs with the same characteristics. They are scheduled on a ring of p = 4 processors. 3.2 Resulting tables Here we present some selected computational results. In Table 1 comparative results for both formulations are given for small test instances. First four columns contain the task graph characteristics (name, number of processors, number of tasks, edge densities). The length of the optimal schedule is given in the fifth column, while the last two columns contain the time required by CPLEX to solve classical and packing formulation respectively. The time is represented by hours : minutes : seconds. hundreds of seconds with zero values in front omitted. Table 1: Results for small task graph examples example p n ρ T max Classic Packing ogra :14: ogra : ogra : mt : ogra :42: ogra :13: ogra :12: gra :48.92 ogra t :43:51.52 t :29.86 t t :56.79 t As we can see, with all improvements (artificial lower bounds, VNS heuristic initial solution) we were not able to solve to the optimality graphs with more that 10 tasks using classical formulation. On the other hand, with packing formulation 20 tasks represent no significant difficulty. Medium-size examples seem to be difficult even for packing formulation. Results given in the Table 2 show that more that 24 hours are needed for most of them to be solved to the optimality. Namely, for these examples we set time limit of 24 hours for the CPLEX execution. In the fourth column of Table 2 we put the required execution time to obtain the optimal solution (if it has been reached) or the given time limit (if the optimal solution remind unfound). The last columns contains the gap between the final solution and the obtained lower bound. The average CPU time improvement of the packing formulation over the best variant of the classic formulation is experimentally proved to be about 5000 times. Therefore, we believe that the packing formulation represents a valuable tool for solving small MSPCD instances to optimality. 4 Conclusion In this paper two linear programming formulations for scheduling dependent tasks onto homogeneous multiprocessor architecture of an arbitrary structure with taking into account communication delays is proposed. The models are used within the commercial software for solving Mixed-Integer Linear Programs CPLEX. To reduce the execution time required by CPLEX to solve test examples Table 2: Results for medium-size task graph examples example p T max Time Gap t :00: % t :44: % t :00: % t :11: % t :00: % t :00: % t :24: % t :00: % t :00: % to the optimality, we added heuristic limitations to the upper bound and the lower bound and introduce reduction constraints to the original model. References [1] E. H. Bowman (1959), The schedule-sequencing problem, Operations Research, 7(5): [2] T. Davidović and T. G. Crainic (2006), Benchmark problem instances for static task scheduling of task graphs with communication delays on homogeneous multiprocessor systems, Computers & OR, 33(8): [3] T. Davidović, P. Hansen, and N. Mladenović (2005), Permutation based genetic, tabu and variable neighborhood search heuristics for multiprocessor scheduling with communication delays, Asia-pacific Journal of Operational Research, 22(3): [4] R. Fortet (1960), Applications de l algèbre de boole en recherche opérationelle. Revue Française de Recherche Opérationelle, 4: [5] P.L. Hammer and S. Rudeanu, Boolean Methods in Operations Research and Related Areas, Springer, Berlin, [6] Y. Guan and R.K. Cheung (2004), The berth allocation problem: models and solution methods. OR S
Recommended
View more...
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks