M1: "Technologies for Cluster Computing"

Morning session (9:00 to 12:00)

Oren Laadan, Columbia University, USA

Oren Laadan is a PhD student in Computer Science at Columbia University and a member of the Network Computing Laboratory (www.ncl.cs.columbia.edu). He received his B.Sc. in physics, math and computer science, and his M.Sc. in computer science, from the Hebrew University of Jerusalem. He was a principal member of the distributed computing lab there, where he developed the MOSIX (www.mosix.cs.huji.ac.il) multicomputer operating system. Oren's main research interests include distributed operating systems, dynamic resource allocation by process migration, virtualization systems, high performance computing, and algorithms for efficient resource utilization in clusters and Grids. His current research explores operating system virtualization for cluster computing and mobile computing.

Course Objectives:

Cluster Computing has emerged as a major strategy for delivering high performance to technical and commercial applications. The growing popularity and availability of powerful computers and high-speed networks as off-the-shelf components, together with superior flexibility and cost-effectiveness, have redefined how parallel and distributed computing is being accomplished today. Commodity clusters today provide a convenient ready-to-use platform for executing complex computation-, data- and/or transaction-centric applications. Cluster computing provides a wide spectrum of research and development challenges in many of its areas. In this course we will focus on emerging software technologies that are geared to provide high performance, high availability, scalability and ease-of-use. We will also discuss tools and algorithms for resource management, middleware and finally the integration of clusters into computational grids.

Syllabus:

Course overview, introduction to parallel and distributed computing, the case for cluster computing, basic concepts in cluster computing (architecture, middleware, programming environments, applications).
Software environments, single system image (SSI) environments, advanced operating system support for cluster computing.
Virtualization technologies: virtual machines, virtual operating systems, virtual programming environments, checkpoint/restart of jobs, process migration.
Job assignment, resources in a cluster (CPU, memory, network, storage) and resource sharing, algorithms and systems for resource balancing, consolidation, scalability of clusters.
Overview of Grid computing, integrating grid computing and clusters.
Throughout the course detailed examples of existing systems which are deployed in commerce, industry and research environments will be given to demonstrate the concepts described.

Prerequisites:

Basic knowledge of operating systems

Bibliography:

High Performance Cluster Computing: Architectures and Systems, Vol. 1, Rajkumar Buyya (editor), Prentice Hall, USA, 1999. ISBN: 0-13-013784-7.
The Grid: Blueprint for a New Computing Infrastructure, 2nd Edition, Ian Foster, Carl Kesselman, Morgan Kaufmann, 2004. ISBN: 1-55860-933-4.
The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. I. Foster, C. Kesselman, J. Nick, S. Tuecke, Open Grid Service Infrastructure WG, Global Grid Forum, June 22, 2002.
Virtual machine monitors: current technology and future trends. Mendel Rosenblum and Tal Garfinkel, IEEE Computer, 38(5):39-47, 2005.
Xen and the Art of Virtualization. P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP), October 2003.
Live Migration of Virtual Machines. C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield, USENIX NSDI, Boston, May 2005.

Course Material:

About the exam:

The exam will last 1 hour and will have 3 parts:

1. Analyze a system with sub-optimal performance in order to pinpoint the possible causes.

2. A list of "short questions", each to be answered in no more than 3-5 sentences.

3. Two questions that require applying what you have learned to an unfamiliar system/configuration and suggesting a suitable solution. Examples:

1. An operating system shows 10% CPU utilization and 90% disk utilization. Which of the following changes can improve it (and why)?
(a) install another CPU
(b) install more memory
...
...

2.
(a) Name 3 key differences between Grids and clusters
(b) Name 3 tradeoffs between virtualization at the OS level and at the hardware level
(c) Normally, parallelism grows with the size of the problem. Is it true that there are cases where the size grows but the parallelism decreases?
(d) Explain the effect of bandwidth and latency on the efficiency of the cluster
...
...

3. (you will see)
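Example 1 and question 2(d) above both lend themselves to back-of-the-envelope arithmetic. The Python sketch below is only an illustration under stated assumptions, not an answer key from the course: the function names, the first-order "latency plus size over bandwidth" message cost, and all numeric values are hypothetical choices made for this example.

```python
# Illustrative first-order models; names and numbers are assumptions,
# not taken from the course material.

def bottleneck(utilization):
    """Return (resource, headroom) for the busiest resource.

    `utilization` maps a resource name to its busy fraction in (0, 1].
    Utilization grows linearly with throughput (utilization law), so
    throughput can grow by at most 1/U of the busiest resource before
    that resource saturates.
    """
    name = max(utilization, key=utilization.get)
    return name, 1.0 / utilization[name]


def message_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Cost of one message: fixed latency plus size-dependent transfer time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def parallel_efficiency(compute_s, nodes, msgs_per_node, size_bytes,
                        latency_s, bandwidth_bytes_per_s):
    """Efficiency = T_serial / (nodes * T_parallel).

    Assumes perfectly divisible computation and no overlap between
    communication and computation.
    """
    comm_s = msgs_per_node * message_time(size_bytes, latency_s,
                                          bandwidth_bytes_per_s)
    t_parallel = compute_s / nodes + comm_s
    return compute_s / (nodes * t_parallel)


if __name__ == "__main__":
    # Example 1: 10% CPU, 90% disk.
    name, headroom = bottleneck({"cpu": 0.10, "disk": 0.90})
    print(f"busiest resource: {name}, throughput headroom: {headroom:.2f}x")

    # Question 2(d): same job, small messages, two hypothetical networks.
    for label, latency, bandwidth in [("high latency", 1e-3, 1e8),
                                      ("low latency", 1e-5, 1e8)]:
        eff = parallel_efficiency(100.0, 10, 1000, 1000, latency, bandwidth)
        print(f"{label}: efficiency {eff:.3f}")
```

Under these assumptions the disk, not the CPU, limits throughput in example 1, so a second CPU cannot help, while more memory can help only if part of the disk traffic comes from paging. For small messages the latency term dominates the message cost, and for large messages the bandwidth term dominates.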

Final project: for this assignment you must write a short report (6 to 10 pages) in which you review and evaluate, in more detail, one of the topics covered in class. You may work in pairs; in that case, the work is expected to be more in-depth.

The report must be structured as follows:
1. Abstract: a short summary of the work.
2. Introduction (motivation, background, and organization of the report)
3. Discussion (describe the existing systems and trends, evaluate them, and compare the alternatives)
4. Conclusions
5. Bibliography

Below is a list of topics; please select ONE (you may suggest a new topic by email if it is not listed). Please let me know your selection by email.

Notes
1. Papers must be submitted on Monday, August 22.
2. Submission by email: the format may be plain text, Word .doc, pdf, or ghostscript (the report may be written in Spanish).
3. Reminder: the final grade is 50% exam and 50% project, so put real effort into the project!

(Note: I will be offline from August 5 to August 16, so plan accordingly)

Available topics:
1. Cluster operating systems (Solaris MC, MOSIX, UnixWare, GLUnix, Sprite, RHODOS, etc.)
2. Cluster management systems (Condor, PBS, Maui Scheduler, Codine, Utopia, etc.)
3. Cluster monitoring systems (Alert, SMILE, PARMON, bWatch, etc.)
4. Cluster tools (Nimrod, Parallel Commands, etc.; also check the Parallel Tools Consortium page)
5. Checkpoint/Restart (Condor, CoCheck, Zap, libcpt, libckpt, etc.)
6. Virtual machines (VMware, Xen, Denali, UML (User Mode Linux), Virtual PC, etc.)
7. Cluster applications (scientific, commercial, web serving, data mining, etc.)
8. Interconnection networks for clusters (Ethernet, Gigabit Ethernet, Myrinet, SCI, InfiniBand, VIA, etc.)

Good luck!

Grades:

Name
Esteban Mocskos
Marious Zapetero
Esteban Franqueiro
Sebastian Garcia Rojas
Paula Casero
Jose Ignacio Orlicki
Juan Poablo Suarez
Leonardo Dominguez
Maria Engenio Berezin
Alejandro Jose Panelli
Demian Ponce
Maximiliano Bertacchini
Maria Carolina Leon Carri
Diego M. Vadell
Federico A. Ocampo
Maximiliano Sacco
Guillermo Luis Grinblat
Eduardo Vega
Marcelo Doallo
Alejandro Benaban
Martinez Luquez Juan Cruz
salberu@buenosaires.gov.ar

Exam

Project

Final

100