Web Track
- Goal: investigate whether "web data is different" and how links might be used in retrieval
- 2GB static snapshot of the Web
- an approximation to the real web task...
- links (frames, scripts, etc.) not live
- collection not closed under links
- sampling to produce 2GB favored contentful sites
- ... but much closer to web than are other TREC collections
- HTML
- dirty data
- heterogeneous genres and subject matter