Friday, December 5, 2025

BB25006 Archiving the Internet V01 051225

As part of “Book on Book” there will be information on physical libraries and digital libraries, with internet crawling and data capture for archiving purposes an important part of the informational cataloguing area.



 https://www.loc.gov/preservation/digital/formats/intro/intro.shtml



You can view a huge chunk of the internet that has been crawled and extracted on the site below, but getting at it seems a bit technical.


www.commoncrawl.org 
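
For anyone wanting a first look without downloading the whole archive, Common Crawl publishes a URL index at index.commoncrawl.org that can be queried one crawl at a time. The Python below is only a rough sketch of how that might be done. The crawl label "CC-MAIN-2024-33" and the search pattern are example values I have assumed (the current list of crawls is published at index.commoncrawl.org/collinfo.json), and the function names are my own, not part of any Common Crawl library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

INDEX_HOST = "https://index.commoncrawl.org"


def build_query(crawl_id, url_pattern, limit=5):
    """Build the index query URL for one crawl (crawl_id is an assumed example label)."""
    params = urlencode({"url": url_pattern, "output": "json", "limit": limit})
    return f"{INDEX_HOST}/{crawl_id}-index?{params}"


def parse_records(body):
    """The index replies with one JSON object per line; skip any blank lines."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def lookup(crawl_id, url_pattern, limit=5):
    """Fetch and parse index records. Needs a network connection."""
    with urlopen(build_query(crawl_id, url_pattern, limit)) as resp:
        return parse_records(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Example only: list a few captured pages under one site.
    for rec in lookup("CC-MAIN-2024-33", "loc.gov/preservation/*"):
        print(rec.get("timestamp"), rec.get("url"))
```

Each record returned describes one captured page, so this only tells you what was crawled and when. Pulling the page content itself out of the archive files is a further step not shown here.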


The link below is a paper based on the Common Crawl corpus data, looking at where geospatial information is being used on the internet.


https://dl.acm.org/doi/pdf/10.1145/3678717.3691286

