HathiTrust Posts February 2012 Activities Update

The complete Feb. 2012 Activities Update is accessible here.

Here are a few highlights:

Changes to tab-delimited metadata files

Staff at Michigan added five new fields to HathiTrust’s tab-delimited inventory files: publication date, publication location, language, bibliographic format, and an indication of whether or not a volume has been identified as a U.S. federal government document. A description of the new fields, which are included in the inventory files as of March 1, is available at http://www.hathitrust.org/hathifiles_description.
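For readers who want to work with the inventory files, here is a minimal Python sketch of reading tab-delimited rows and filtering on the government document indicator. The column names, their order, the sample rows, and the "1"/"0" flag encoding are all assumptions made for illustration; the actual layout is defined in the description at the URL above.

```python
import csv
import io

# Hypothetical column names: the real names and order are defined in the
# hathifiles description, not here.
NEW_FIELDS = ["pub_date", "pub_place", "lang", "bib_fmt", "us_gov_doc_flag"]
COLUMNS = ["volume_id", "title"] + NEW_FIELDS


def read_inventory(handle, columns=COLUMNS):
    """Yield one dict per tab-delimited row, keyed by the assumed column names."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Pad short rows so that missing trailing fields become empty strings.
        row = row + [""] * (len(columns) - len(row))
        yield dict(zip(columns, row))


def is_gov_doc(record):
    """Treat "1" as the U.S. federal government document flag (assumed encoding)."""
    return record["us_gov_doc_flag"] == "1"


# Made-up sample rows standing in for an inventory file.
sample = io.StringIO(
    "vol.001\tAnnual report\t1923\tdcu\teng\tSE\t1\n"
    "vol.002\tCollected poems\t1901\tnyu\teng\tBK\t0\n"
)
records = list(read_inventory(sample))
gov_docs = [r["volume_id"] for r in records if is_gov_doc(r)]
```

Quoting is disabled with `csv.QUOTE_NONE` so that stray quotation marks inside titles are read as ordinary characters rather than as field delimiters.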

Print on Demand Reports

HathiTrust is now posting reports of public domain and open access volumes in HathiTrust that are available for print on demand. The reports can be found at http://www.hathitrust.org/pod_reports and will be released on the first of every month beginning in April.

IMLS Quality Grant

Project staff completed page-level review of a third production sample, consisting of 1,000 volumes digitized by the Internet Archive. More than 85,000 pages were reviewed in all. Approximately 9,400 of these (about 10%) were coded by two reviewers for quality assurance purposes.

Full-text Search

Michigan staff began work on the next iteration of advanced full-text search, which will allow users to build queries with greater Boolean complexity and make advanced searches easier to revise. Staff also made progress on plans to improve the relevance ranking of search results; that work is planned to begin after the next release of advanced full-text search.

California Digital Library staff completed dictionary-building work for the spelling suggester feature. The code can now build a language-sensitive dictionary of unigrams and bigrams from any Lucene index, automatically choosing a frequency cut-off to constrain the size of the dictionary. Focus will now shift to implementing fast lookup and suggestion ranking.
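The general idea of a frequency cut-off can be sketched in a few lines of Python. This is not the California Digital Library code: it counts n-grams from in-memory token lists rather than from a Lucene index, it omits the per-language split, and the function names and the `max_entries` parameter are hypothetical.

```python
from collections import Counter


def count_ngrams(documents):
    """Count unigrams and bigrams over an iterable of token lists."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in documents:
        unigrams.update(tokens)
        # Pair each token with its successor to form bigrams.
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def choose_cutoff(counts, max_entries):
    """Return the smallest frequency cut-off that keeps at most max_entries items."""
    cutoff = 1
    while sum(1 for c in counts.values() if c >= cutoff) > max_entries:
        cutoff += 1
    return cutoff


def build_dictionary(documents, max_entries):
    """Build size-limited unigram and bigram dictionaries (illustrative only)."""
    unigrams, bigrams = count_ngrams(documents)
    result = {}
    for name, counts in (("unigrams", unigrams), ("bigrams", bigrams)):
        cutoff = choose_cutoff(counts, max_entries)
        result[name] = {k: c for k, c in counts.items() if c >= cutoff}
    return result


# Toy corpus standing in for the terms of an index.
docs = [
    ["digital", "library", "search"],
    ["digital", "library", "catalog"],
    ["digital", "archive"],
]
dictionary = build_dictionary(docs, max_entries=2)
```

Raising the cut-off one step at a time is the simplest way to honor a size limit; a real implementation over a large index would more likely derive the cut-off from a sorted frequency list in a single pass.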

New Growth/Overall Size of Collection

  • 91,500 Volumes Added During February 2012
  • 10,074,909 Overall (Volume count does not include archival and image materials in the Minnesota Digital Library project)
  • ~28% of the Collection is Public Domain (2,791,223)
    • Includes volumes opened through copyright review and rights holder permissions

Much more is available in the complete Feb. 2012 Activities Update.