Crawling

Papers

Focused crawling: A new approach for topic-specific resource discovery
S Chakrabarti, M van den Berg, B Dom - Cited by 250

    A short intro to focused crawling by Chakrabarti

Distributed Hypertext Resource Discovery Through Examples (Chakrabarti et al.)

Efficient crawling through URL ordering
J Cho, H Garcia-Molina, L Page - Cited by 169

Crawling the Hidden Web (use this link)
S Raghavan, H Garcia-Molina - Cited by 78

Focused crawling using context graphs
M Diligenti, F Coetzee, S Lawrence, CL Giles, M … - Cited by 90

SPHINX: a framework for creating personal, site-specific web crawlers
RC Miller, K Bharat - Cited by 51


Breadth-first search crawling yields high-quality pages
M Najork, JL Wiener - Cited by 42


Accelerated focused crawling through online relevance feedback
S Chakrabarti, K Punera, M Subramanyam - Cited by 21

Personalized and Focused Web Spiders
M Chau, H Chen - Cited by 2

Mercator: A scalable, extensible web crawler
A Heydon, M Najork - Cited by 94

A Longer, Newer Mercator paper

A 2001 Survey of Crawlers

Menczer, Indiana U. Topical Web Crawlers: Evaluating Adaptive Algorithms.

Semantic Web Crawling

Carroll et al., Implementing the Semantic Web Recommendations

 

Communities

Inferring web communities from link topology
D Gibson, J Kleinberg, P Raghavan - Cited by 212


Efficient identification of web communities
GW Flake, S Lawrence, CL Giles, FM Coetzee - Cited by 79

 

Organizations/Projects

Heritrix and the Internet Archive