Analysis of community structure in a young Internet domain

Master's thesis data

Master's Specialization: Algorithms and Programming
Topic Approval date: 23/01/09
Orientation: research

Student: Miquel Camprodon
Thesis advisor(s): Jordi Delgado, Ricard Gavaldà

Thesis Description
The .cat top-level Internet domain opened for registration in February 2006 and currently contains over 30,000 subdomains and several million pages. It is administered by Fundació PuntCat, a non-profit organization whose goals also include promoting the domain's usage and encouraging related research.

Fundació PuntCat has been performing monthly crawls of the whole .cat domain since day 1, with recent crawls measuring several gigabytes. This offers an almost unique case study in the history of the Internet: the possibility of watching in detail the growth and evolution of a top-level Internet domain from its very start to a reasonable level of maturity.

The thesis investigates community structure in the .cat domain, and in particular studies the peculiarities due to its youth. It will include research on and implementation of a good number of the community identification algorithms from the literature, a brief comparison among them to choose the one or two that give the most meaningful and robust results, an in-depth study of the communities detected by the chosen algorithms at several points in time, and an interpretation of the results obtained.
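
As an illustration of how candidate algorithms might be compared, the sketch below computes Newman's modularity for a given partition of an undirected graph. This is only an assumption about one possible comparison measure: the description above does not say which quality measure the thesis uses, and the function name `modularity`, the toy edge list and the two partitions are hypothetical, not data from the .cat crawls.

```python
def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    communities: list of sets of nodes covering every endpoint exactly once.
    Q = sum over communities c of  L_c / m - (d_c / (2m))^2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")

    # Map every node to the index of its community.
    community_of = {
        node: idx for idx, group in enumerate(communities) for node in group
    }

    internal = [0] * len(communities)  # L_c for each community
    degree = [0] * len(communities)    # d_c for each community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1

    return sum(
        internal[c] / m - (degree[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )


# Hypothetical toy graph: two triangles joined by a single edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = [{0, 1, 2}, {3, 4, 5}]   # the partition one would expect
one_group = [{0, 1, 2, 3, 4, 5}]      # trivial partition, for contrast

q_two = modularity(toy_edges, two_groups)
q_one = modularity(toy_edges, one_group)
print(q_two, q_one)
```

On this toy graph the two-triangle partition scores higher than the trivial one, which is the kind of signal such a measure gives. Scoring each algorithm's output this way at each monthly crawl would be one possible way to compare them, under the assumptions stated above.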