Tagging and The Scotsman Digital Archive

Posted on February 27, 2007
Filed Under Advertising, Digital Archive, Digitisation, Future, Newspapers, Paid Content, Tools and Services, User Generated, del.icio.us

This follows on from some of the issues raised in “When tags work and when they don’t: Amazon and LibraryThing”.

The team behind the Scotsman Digital Archive (http://archive.scotsman.com/), a searchable archive of The Scotsman newspaper from 1817 to 1950 that was developed during 2005, built functionality that allows users to “clip” and “tag” articles of interest that they might want to get back to later.

This is important in the context of a relatively large repository - and especially important when searching across text extracted via OCR from historical material.

The tagging functionality allows users to organise, group and locate articles easily and was quickly adopted by the user community as the benefits were substantial and obvious.
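
As a rough illustration of the kind of clip-and-tag functionality described above, here is a minimal Python sketch. It is not the Scotsman Digital Archive’s implementation: the `ClipBook` class, its method names and the article identifiers are all hypothetical, made up for this example.

```python
from collections import defaultdict


class ClipBook:
    """Hypothetical per-user store of clipped articles and their tags."""

    def __init__(self):
        # tag -> set of article ids, and article id -> set of tags
        self._by_tag = defaultdict(set)
        self._by_article = defaultdict(set)

    def clip(self, article_id, *tags):
        """Clip an article and attach zero or more tags to it."""
        self._by_article[article_id]  # register the clip even with no tags
        for tag in tags:
            normalised = tag.strip().lower()
            self._by_tag[normalised].add(article_id)
            self._by_article[article_id].add(normalised)

    def find(self, tag):
        """Return the clipped article ids carrying this tag, sorted."""
        return sorted(self._by_tag.get(tag.strip().lower(), set()))

    def tags_for(self, article_id):
        """Return the tags attached to one clipped article, sorted."""
        return sorted(self._by_article.get(article_id, set()))


# Made-up article identifiers (date/page/article), for illustration only.
book = ClipBook()
book.clip("1817-01-25/p1/a3", "Edinburgh", "first edition")
book.clip("1912-04-16/p7/a1", "Titanic", "shipping")
book.clip("1912-04-17/p5/a2", "titanic")
print(book.find("Titanic"))
```

Tags are lower-cased on the way in so that “Titanic” and “titanic” land in the same group, which is one simple way to keep a personal tag list from fragmenting.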

Clearly this tagging adds a great deal of value to the archive, providing some structure and pathways to difficult-to-find or significant material that could ultimately benefit the wider community of users.

The next obvious stage of development would have been to develop the social/community side of this by allowing users to share their tags with other users, make them public and connect with each other.  This reinforces the principle that the best way to get users to add value to data is to do it in such a way that they don’t realise they are doing so - I guess this is the same as saying provide a clear incentive (see Amazon, Google Maps, Listal.com etc).  It also goes without saying that it should be as pain-free as possible.

The other point to note here is that in certain circumstances you can get users to tag for the greater good - and not just as a by-product of some personal benefit.  You can see this in del.icio.us where individuals can develop into experts through their consistent and regular tagging of material on specific subject areas.  These experts can eventually develop a “network” of like-minded individuals and attract “fans” who track what they are tagging via RSS.  This network effect is one of the most powerful aspects of del.icio.us.  The profile that the experts receive from their peers within the community is a great incentive to continue tagging.

[Disclosure:  Please note the author is a former General Manager of scotsman.com.]

More than one million unique, historical newspaper pages online …

Posted on February 23, 2007
Filed Under Digitisation, Good Things, Newspapers, Paid Content, Search, Tools and Services

In a press release on 15th February, SmallTownPapers, Inc. announced that it has partnered with World Vital Records, Inc. to make over one million newspaper pages from small towns across America available and searchable online.

The press release states that:

“We selected World Vital Records to distribute our collection of small-town newspapers because of their commitment to the millions of people who want to research their family history,” said Paul Jeffko, president and founder of SmallTownPapers, Inc. “World Vital Records is delivering on their mission to help people discover their ancestors with an incredible collection of exclusive materials, including SmallTownPapers.”

Current editions are available from over 250 small town newspapers and users can also search the archive.  Users have to register to access added benefits such as the “Scrap Book” and “Notifiers”.  The revenue model appears to be advertising rather than subscription based and the site looks to be reasonably well monetised via display and contextual (Google AdSense) advertising deals.  Geo-targeting of ads also appears to be pretty good - while looking at an edition of the Mifflinburg Telegraph from November 10th 2005 I was getting skyscraper and banner ads from The Sun (a UK national) and Talk Talk (a UK broadband service).

There is an “order a digital reprint” link but it doesn’t work, so I guess there are plans to offer this service online eventually.

They are looking to extend the service.  On the “For Publishers” page it states:

“Would you like your newspaper to be included in the SmallTownPapers web site? We can convert your paper or film archives to a fully-searchable image archive. Small community newspapers can participate with little or no cost.”

As far as I could tell Boolean operators are not available in search and pages are not segmented into individual articles for search or display purposes - meaning you can’t search for “apples AND pears” within the same article.  If you search for “Edinburgh garden” you get “Edinburgh” from one article and “garden” from another, which makes it harder to find things.
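
To make the difference concrete, here is a small Python sketch of the same query run against a page treated as one block of text and against the same page split into articles. It is only an illustration under my own assumptions: the sample text and the function names are invented, and it says nothing about how the SmallTownPapers search is actually built.

```python
def tokens(text):
    """Lower-case a text and split it into a set of words."""
    return set(text.lower().split())


def matches_all(text, query):
    """True if every query word occurs in the text (an implicit AND)."""
    return tokens(query) <= tokens(text)


# A hypothetical scanned page made up of two unrelated articles.
page = [
    "Visitors from Edinburgh attended the county fair",
    "Tips for preparing your garden before the first frost",
]

query = "Edinburgh garden"

# Page-level index: the whole page is one block of text.
page_hit = matches_all(" ".join(page), query)

# Article-level index: each segmented article is matched on its own.
article_hits = [article for article in page if matches_all(article, query)]

print(page_hit)      # the page matches even though no single article does
print(article_hits)  # no article contains both words
```

With segmentation, the page would simply not be returned for this query, which is the behaviour you want when looking for both words in the same story.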

Saying that - not bad for a free service.  

Newspaper Digital Editions - future or futile?

Posted on February 23, 2007
Filed Under Advertising, Copyright, Digitisation, Future, Newspapers, Paid Content

Following on from the Roy Greenslade post on newspaper Digital Editions - “Is PDF an acronym for Pretty Damn Futile?”

Digital Editions are indeed an easy leap for old-world print editors / execs to make - amazing that they are still going for it with such enthusiasm really.  Half an hour of research would tell them that it simply isn’t going to meet their objectives - if those objectives include significant subscription or ad revenues.

It ticks the “digital” box, but in reality doesn’t do much else in terms of value for the regular user.  Any demand there is comes from those that need to keep a record of how the paper actually looked when it was published, such as ad agencies and media monitoring companies, plus a small market of readers who live outside the circulation area.  Even if they were available for free most users would never use them - they will go to the newspaper web site or use Google or Google News to find current or older articles.

Last time I reviewed uptake of digital editions among newspaper titles (to be honest it was some time ago now) it varied between 0.2% and 1% of actual newspaper circulation - at this rate it was always going to be a struggle to get revenue to justify the cost of production.  Even the NYT, which invested in its digital edition provider - NewsStand - and was therefore more incentivised than most to make it work, could only make it to the higher end of this range.  It could be that uptake has changed for these services - but I doubt it.  Anyone got any stats on this other than the recent report analysis on Norwegian titles?
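
To put the 0.2% to 1% range into numbers, here is a quick back-of-the-envelope calculation in Python. The circulation figure is hypothetical, picked only to make the sums easy to follow, and the function name is my own.

```python
def digital_edition_subscribers(print_circulation, uptake):
    """Estimated digital edition subscribers for a given uptake share."""
    return round(print_circulation * uptake)


# Hypothetical print circulation, chosen only to make the sums easy to follow.
circulation = 100_000

low = digital_edition_subscribers(circulation, 0.002)   # 0.2% uptake
high = digital_edition_subscribers(circulation, 0.01)   # 1% uptake

print(f"{low} to {high} digital edition subscribers")
```

On those assumptions a title selling 100,000 copies would be looking at somewhere between a couple of hundred and a thousand digital edition subscribers.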

One area that does seem to work a bit better is specialist publications and magazines where readers like to hold on to copies for reference purposes and build their own archive.

As more newspapers digitise their historical material there will at least be some justification for the cost of production as the process of turning the newspaper into a Digital Edition can also populate a Digital Archive - allowing users to search from the first edition of the newspaper to the most recent from a single user interface. 

Even this justification will be short-lived as the goal will be to eventually populate the Digital Archive directly from the newspaper / web site production system - The Guardian and Observer already do this for their Digital Editions.

This is quite old but a good overview of the key suppliers and issues from J.D. Lasica in OJR from June 2004 - “Are Digital Newspaper Editions More Than Smoke and Mirrors?”

National Archives of Japan - Digital Gallery

Posted on February 21, 2007
Filed Under Digitisation, Entertainment, Good Things, Search, Technology, Tools and Services

National Archives of Japan - Digital Gallery has some great maps, photographs and posters - this is one of a series on “Air-Raid Precautions and Civil Defense, Illustrated Posters of ‘Air-Raid Defense’”.

“You can run the keyword or layered search, and view the detailed descriptions and digitized images of the records preserved by the National Archives of Japan. You can, according to your circumstances for the use of the Internet, view the digitized images in the formats of JPEG2000, PDF or JPEG. You can also run the cross-file search linked to various data bases worldwide to share a wide range of information and knowledge.”

Decline and fall of a music empire

Posted on January 4, 2007
Filed Under Advertising, Broadband, Copyright, Digitisation, Entertainment, Future, Music, Newspapers, Paid Content, Technology, Trends

The FT reports on MusicZone’s slide into administration.  There is upside for the industry in terms of the growth of digital revenues but this growth is nowhere near enough to cover the dramatic decline in the revenue from physical formats.  This will all sound very familiar to newspaper executives.

The downward trend has been clear for five years but recent figures suggest that the decline in CDs and DVDs has accelerated. The IFPI, the music trade association, reported a 10 per cent slide in physical format sales in the first half of the year around the world.

Ged Doherty, the head of Sony BMG’s UK operations, predicted two months ago that CD sales would halve over the next three years.

“We predict digital growth of 25 per cent per year but it is not enough to replace the loss from falling CD sales.”

Mr Doherty warned that, if current trends continued, by 2010 the industry’s total revenues could be 30 per cent lower than they are now. He said: “We have to reinvent.”

Start-up will seek out content being used without permission

Posted on December 19, 2006
Filed Under Copyright, Digitisation, Future, Newspapers, Paid Content, Search, User Generated, Weblogs

A start-up founded by ex-Yahoo and VeriSign execs will help content owners work out if their material is being used without permission.  It might not be welcomed by social networking / blogging sites:

Attributor analyzes the content of clients, who could range from individuals to big media companies, using a technique known as “digital fingerprinting,” which determines unique and identifying characteristics of content. It uses these digital fingerprints to search its index of the Web for the content. The company claims to be able to spot a customer’s content based on the appearance of as little as a few sentences of text or a few seconds of audio or video. It will provide customers with alerts and a dashboard of identified uses of their content on the Web and the context in which it is used.

And:

Its co-founders, former Yahoo Inc. executive Jim Brock, and Jim Pitkow, a Silicon Valley entrepreneur who has sold companies to Google and VeriSign Inc., claim to have cracked the thorny computer-science problem of scouring the entire Web by using undisclosed technology to efficiently process and comb through chunks of content. The company says it will have over 10 billion Web pages in its index before the end of this month.
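
Attributor has not disclosed how its “digital fingerprinting” works, so the following is only a generic illustration of one well-known idea for text: hash every short run of consecutive words (a “shingle”) and see how many of an original’s hashes turn up in another document. This is my own assumption about what such a technique could look like, not a description of the company’s technology; the function names and sample texts are invented.

```python
import hashlib


def fingerprints(text, size=5):
    """Hash every run of `size` consecutive words (a "shingle") in the text."""
    words = text.lower().split()
    shingles = (
        " ".join(words[i:i + size]) for i in range(len(words) - size + 1)
    )
    return {hashlib.sha1(s.encode("utf-8")).hexdigest() for s in shingles}


def overlap(original, candidate, size=5):
    """Share of the original's fingerprints that also appear in the candidate."""
    source = fingerprints(original, size)
    if not source:
        return 0.0
    return len(source & fingerprints(candidate, size)) / len(source)


# Hypothetical texts, invented for this example.
article = "the council voted last night to close the old harbour road to traffic"
blog_post = (
    "as reported elsewhere the council voted last night to close "
    "the old harbour road to traffic which seems short sighted"
)
unrelated = "a recipe for apple and pear crumble with oats and brown sugar"

print(overlap(article, blog_post))  # every shingle of the article reappears
print(overlap(article, unrelated))  # nothing in common
```

A real service would also need to cope with punctuation, small edits and an index of billions of pages, none of which this sketch attempts.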

Yahoo launches Open Content Alliance

Posted on October 3, 2005
Filed Under Digitisation

The FT reports that Yahoo are following Google with the announcement of their involvement in a major digital archiving project - the Open Content Alliance.

Initial members of the consortium include: University of Toronto, Adobe Systems, the European Archive, Hewlett Packard Labs, the UK’s National Archives, O’Reilly Media, the Prelinger Archives and the Internet Archive.

The big difference from the Google Print project is that “content under copyright will be made available through the OCA only with the copyright holder’s authorisation.”

Search Engine Watch notes that “At the option of the copyright holder, copyrighted content may be distributed through a Creative Commons license,” going on to explain that “Creative Commons is a non-profit organization whose licensing encourages personal use, reuse and re-purposing of digital content. Content that is made available on the OCA website will be available in PDF and other widely adopted formats. This approach enables mass media and independent publishers to expand their reach by submitting content that spans categories, file formats and languages while retaining their copyrights.”

Chronicle.com points out that the Association of Learned and Professional Society Publishers has endorsed the Yahoo plan. In a press release, Sally Morris, chief executive of the association, said, “We welcome the launch of the OCA because its approach respects the rights of publishers and other copyright owners.”

According to project leaders, neither Yahoo nor any other group involved has been given exclusive rights to the content - material will be made available so that it can be indexed and searched by other search engines.  Yahoo will get things going by paying for the scanning of an 18,000-volume collection of American literature at the University of California.

BusinessWeek: Mainstream press will open archives

Posted on May 4, 2005
Filed Under Digitisation, Newspapers, Paid Content

Link: Prediction: Mainstream press will open archives.

US to have 30m newspaper pages online by 2006

Posted on November 19, 2004
Filed Under Digitisation, Newspapers

The US announced this week that it will have 30m newspaper pages on the net by 2006.

This article mentions that …

The span of the joint project is limited because type faces of printers used before 1836 are too difficult for optical scanners to read, and copyright restrictions are in force on papers published after 1923.

They have developed a prototype at the Library of Congress site - The Stars and Stripes, 1918-1919. It is a pretty basic interface - they are clearly focusing on getting the basics right before developing the front end.  If you go to the above link and look to the bottom left of the page you will see that you are able to view “the OCR-generated text transcription of this page”. This gives a reasonably accurate OCR version of the page.  The PDFs and the OCR accuracy look to be excellent from a quick look.  However they don’t look to have done that well on “segmenting” the page into its constituent elements, but to be fair this is clearly an early version.

Copyright and licensing for digital preservation

Posted on June 16, 2004
Filed Under Digitisation, Journalism, Newspapers, Paid Content

Excellent report looking at how libraries can use material deposited by publishers.

Libraries cannot preserve digital material they do not own. Adrienne Muir describes a new project to identify copyright and licensing issues that currently hinder digital preservation and looks at whether new legislation will help.
