A unified fault-tolerant routing scheme for a class of cluster networks

Khaled Day; Bassel Arafeh; Abderezak Touzene

doi:10.1016/j.sysarc.2008.01.002

Corresponding author: Khaled Day

Computer Science

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

Large cluster systems with thousands of nodes have become a cost-effective alternative to traditional supercomputers. In these systems cluster nodes are interconnected using high-degree switches. Regular direct interconnection network topologies including tori (k-ary n-cubes) and meshes are among adapted choices for interconnecting these high-degree switches. We propose a generalized fault-tolerant routing scheme for highly connected regular interconnection networks and derive conditions for its applicability. The scheme is based on the availability of efficiently identifiable disjoint routes between network nodes. When routing paths become faulty, alternative disjoint routes are identified and taken. The methods used to identify the routing paths, to propagate failure information, and to switch from a routing path to another incur little communication and computation overhead. If the faults occur reasonably apart in time, then packets are efficiently routed along paths of minimal or near-minimal lengths. In the unlikely case where several faults occur in a short period of time, the scheme still delivers packets but possibly along longer paths. The proposed scheme and its properties are first presented in general terms for any interconnection topology satisfying certain derived connectivity conditions. The applicability of the general scheme is then illustrated on examples of well known regular topologies satisfying the derived connectivity conditions including the binary hypercube, the k-ary n-cube and the star graph networks.
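The abstract does not spell out how the disjoint routes are identified, so the following is only a minimal sketch of the general idea for one of the topologies it names, the binary hypercube. It uses the classical rotation construction of n node-disjoint paths in an n-cube, which is not necessarily the authors' scheme, and it assumes a known set of faulty nodes rather than modelling how failure information propagates. The function names (`walk`, `disjoint_paths`, `route`) are hypothetical.

```python
def walk(src, dims):
    """Follow the given sequence of dimensions from src, flipping one bit per hop."""
    path = [src]
    for b in dims:
        path.append(path[-1] ^ (1 << b))
    return path


def disjoint_paths(src, dst, n):
    """Return n internally node-disjoint paths from src to dst in a binary n-cube.

    Assumes src != dst. Paths are ordered shortest first.
    """
    delta = src ^ dst
    diff = [b for b in range(n) if (delta >> b) & 1]        # bits that must change
    same = [b for b in range(n) if not ((delta >> b) & 1)]  # bits that already agree
    paths = []
    # k minimal paths: correct the differing bits in each cyclic rotation of their order.
    for i in range(len(diff)):
        paths.append(walk(src, diff[i:] + diff[:i]))
    # n - k detour paths of length k + 2: leave along an agreeing dimension j,
    # correct all differing bits, then return along j.
    for j in same:
        paths.append(walk(src, [j] + diff + [j]))
    return paths


def route(src, dst, n, faulty):
    """Pick the first (shortest) disjoint path whose intermediate nodes are all healthy."""
    for path in disjoint_paths(src, dst, n):
        if not faulty.intersection(path[1:-1]):
            return path
    return None  # every disjoint path is blocked
```

For example, in a 4-cube `route(0b0000, 0b0011, 4, set())` returns the minimal path `[0, 1, 3]`, while `route(0b0000, 0b0011, 4, {1, 2})` falls back to the longer detour `[0, 4, 5, 7, 3]`. Because the n paths share no intermediate node, fewer than n faulty intermediate nodes can never block all of them.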

Original language	English
Pages (from-to)	757-768
Number of pages	12
Journal	Journal of Systems Architecture
Volume	54
Issue number	8
DOIs	https://doi.org/10.1016/j.sysarc.2008.01.002
Publication status	Published - Aug 2008

ASJC Scopus subject areas

Software
Hardware and Architecture

Access to Document

10.1016/j.sysarc.2008.01.002

Cite this

@article{e6212733bd494dbd94122f03c24022b3,

title = "A unified fault-tolerant routing scheme for a class of cluster networks",

abstract = "Large cluster systems with thousands of nodes have become a cost-effective alternative to traditional supercomputers. In these systems cluster nodes are interconnected using high-degree switches. Regular direct interconnection network topologies including tori (k-ary n-cubes) and meshes are among adapted choices for interconnecting these high-degree switches. We propose a generalized fault-tolerant routing scheme for highly connected regular interconnection networks and derive conditions for its applicability. The scheme is based on the availability of efficiently identifiable disjoint routes between network nodes. When routing paths become faulty, alternative disjoint routes are identified and taken. The methods used to identify the routing paths, to propagate failure information, and to switch from a routing path to another incur little communication and computation overhead. If the faults occur reasonably apart in time, then packets are efficiently routed along paths of minimal or near-minimal lengths. In the unlikely case where several faults occur in a short period of time, the scheme still delivers packets but possibly along longer paths. The proposed scheme and its properties are first presented in general terms for any interconnection topology satisfying certain derived connectivity conditions. The applicability of the general scheme is then illustrated on examples of well known regular topologies satisfying the derived connectivity conditions including the binary hypercube, the k-ary n-cube and the star graph networks.",

keywords = "Cluster systems, Fault-tolerant routing, Interconnection networks",

author = "Khaled Day and Bassel Arafeh and Abderezak Touzene",

year = "2008",

month = aug,

doi = "10.1016/j.sysarc.2008.01.002",

language = "English",

volume = "54",

pages = "757--768",

journal = "Journal of Systems Architecture",

issn = "1383-7621",

publisher = "Elsevier",

number = "8",

}

TY - JOUR

T1 - A unified fault-tolerant routing scheme for a class of cluster networks

AU - Day, Khaled

AU - Arafeh, Bassel

AU - Touzene, Abderezak

PY - 2008/8

Y1 - 2008/8

AB - Large cluster systems with thousands of nodes have become a cost-effective alternative to traditional supercomputers. In these systems cluster nodes are interconnected using high-degree switches. Regular direct interconnection network topologies including tori (k-ary n-cubes) and meshes are among adapted choices for interconnecting these high-degree switches. We propose a generalized fault-tolerant routing scheme for highly connected regular interconnection networks and derive conditions for its applicability. The scheme is based on the availability of efficiently identifiable disjoint routes between network nodes. When routing paths become faulty, alternative disjoint routes are identified and taken. The methods used to identify the routing paths, to propagate failure information, and to switch from a routing path to another incur little communication and computation overhead. If the faults occur reasonably apart in time, then packets are efficiently routed along paths of minimal or near-minimal lengths. In the unlikely case where several faults occur in a short period of time, the scheme still delivers packets but possibly along longer paths. The proposed scheme and its properties are first presented in general terms for any interconnection topology satisfying certain derived connectivity conditions. The applicability of the general scheme is then illustrated on examples of well known regular topologies satisfying the derived connectivity conditions including the binary hypercube, the k-ary n-cube and the star graph networks.

KW - Cluster systems

KW - Fault-tolerant routing

KW - Interconnection networks

UR - http://www.scopus.com/inward/record.url?scp=48149109850&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=48149109850&partnerID=8YFLogxK

U2 - 10.1016/j.sysarc.2008.01.002

DO - 10.1016/j.sysarc.2008.01.002

M3 - Article

AN - SCOPUS:48149109850

SN - 1383-7621

VL - 54

SP - 757

EP - 768

JO - Journal of Systems Architecture

JF - Journal of Systems Architecture

IS - 8

ER -
