tag:blogger.com,1999:blog-24369178538088894742024-03-19T21:11:58.820+00:00Ian Davis - managing information, taxonomy, metadata and knowledgeThe posts on this blog are provided 'as is' with no warranties and confer no rights. The opinions expressed on this site are my own and do not necessarily represent those of my past, future or present employer or any organisations I might belong to unless explicitly stated.Ian Davishttp://www.blogger.com/profile/07525154101983512716noreply@blogger.comBlogger19125tag:blogger.com,1999:blog-2436917853808889474.post-12810545978437325352010-07-01T09:44:00.000+01:002010-07-01T09:44:50.507+01:00Musings on E-Commerce Retail MetadataRetail based e-commerce, buying and selling products online, is a growing part of all our lives. Whether purchasing the latest DVD movie, ordering a part for your hand-built PC, or just treating yourself to the odd novel, using retail websites is something we all do day in, day out.<br />
<br />
The business of creating operational e-commerce sites is a complex one. From relationships with data vendors and product suppliers, to product data, supply chain and pricing issues, to website navigation, search, and item ordering and fulfillment, there is a lot to consider. A great deal of work goes into creating and running the sites we use every day, but how much do most people know about what goes on below the surface? As the swan glides smoothly across the water, how much frantic paddling is really taking place underneath? I thought I'd take a little time to introduce a few key areas, giving a flavour of some day-to-day e-commerce retail issues and challenges.<br />
<br />
<span style="font-weight: bold;">Product Data</span><br />
Metadata is a major issue for e-commerce retail websites. One of the clearest divisions many people will see on their favourite sites is between descriptive metadata and promotional metadata. Companies often maintain divisions between these two data types. Different people may create them, in different systems and for different reasons. Different types of metadata may be used in different ways and often interact in specific ways with each website’s search and browse functionality.<br />
<br />
Descriptive metadata is usually linked to products in order to describe them - either for the benefit of people, IT systems or both. Taking books as an example, their product attributes can be many and varied including: title names, author names, publisher names, languages, prices, weights, number of pages, genres or subjects.<br />
<br />
Take a look at any retail site, look closely and you'll start to see the metadata. You'll soon notice how much of it there is, and there's usually more of it behind the scenes than there is in the public-facing website pages.<br />
<br />
How do companies deal with all this data sloshing around their web businesses?<br />
<br />
One key concern is the need to decide which metadata is controlled, how and why, and which metadata is not controlled. It's perfectly reasonable to have a number of free text fields, in combination with data fields that are semi-controlled and other fields which only contain pre-determined values.<br />
<br />
For example,<br />
<br />
Free text fields can be populated using agreed editorial guidelines. These may cover short and long descriptions of products, product reviews by users, publishers, or newspapers.<br />
<br />
Semi-controlled fields will be constructed under tighter guidelines. These fields often include titles or subtitles.<br />
<br />
Controlled named entity fields are a big part of the equation. These deal with the proper names of people - authors or illustrators, organisations - such as publishers, and events etc. These metadata fields are populated with controlled names taken from authority files.<br />
<br />
Another key set of controlled vocabulary fields covers: subjects or genres, types - books, CDs etc, formats - paperbacks, or hardbacks, and audiences - children, students, teens. Values for these fields are often created in thesauri with preferred terms being linked to variant forms or synonyms, hierarchical relationships between preferred terms, and related links across them to take people from a subject like Dogs, to the related subject of Pets.<br />
<br />
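The thesaurus structure described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular retailer stores its vocabulary: the `Term` and `Thesaurus` classes, their field names and the sample terms (other than Dogs and Pets) are assumptions made up for the example.<br />

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term in a hypothetical subject thesaurus."""
    preferred: str
    variants: set[str] = field(default_factory=set)  # synonyms / variant forms
    broader: str | None = None                       # hierarchical parent term
    related: set[str] = field(default_factory=set)   # links to related subjects


class Thesaurus:
    def __init__(self) -> None:
        self._terms: dict[str, Term] = {}
        self._lookup: dict[str, str] = {}  # lower-cased label -> preferred term

    def add(self, term: Term) -> None:
        """Register a term and index its preferred and variant labels."""
        self._terms[term.preferred] = term
        for label in {term.preferred, *term.variants}:
            self._lookup[label.lower()] = term.preferred

    def resolve(self, label: str) -> str | None:
        """Map any known label (preferred or variant) to its preferred term."""
        return self._lookup.get(label.strip().lower())

    def related_to(self, label: str) -> set[str]:
        """Return the related subjects for a label, or an empty set."""
        preferred = self.resolve(label)
        return set(self._terms[preferred].related) if preferred else set()


# Illustrative data only: the variants and the 'Animals' parent are invented.
thesaurus = Thesaurus()
thesaurus.add(Term("Pets", variants={"Companion animals"}))
thesaurus.add(Term("Dogs", variants={"Dog", "Canines"},
                   broader="Animals", related={"Pets"}))

print(thesaurus.resolve("canines"))   # variant label resolved to its preferred term
print(thesaurus.related_to("Dog"))    # related subjects a site could link to
```

Keeping the variant lookup separate from the term records means a customer's wording can change freely while the preferred term assigned to products stays stable.<br />
<br />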
<b>A moot point is how much control should exist behind promotional metadata versus product descriptive metadata?</b><br />
<br />
I've always taken the view that getting the ‘best of both’ is usually ideal. In a fast-paced area such as promoting products to customers, much of the promotional metadata needs to be loosely controlled in terms of how it is created and used. Staff should be given freedom to respond to market conditions and use their initiative. However, tighter controls are often needed in terms of who creates promotional metadata. This helps to achieve a broad consistency – giving customers a framework to work within.<br />
<br />
Customers visiting e-commerce websites are often looking for something new, and fast moving promotional metadata assists them with this need. However, customers also like to know how to get to regular promotions relating to fairly consistent products e.g. ‘CD Friday – 10% off the top 20’. Consistency comforts and reassures, unpredictability excites and enlivens – a mix can give the best result.<br />
<br />
<b>How is all this descriptive and promotional metadata created?</b><br />
<br />
For descriptive metadata -<br />
<br />
The needs of each product type, core customers, and the business are assessed and a list of metadata fields created. For example, a field may be created to cover the concept of product genres. The preferred name of the field may internally be ‘Subjects’, while its website display name may be ‘Genres’. Controlled vocabulary terms, needed for controlled vocabulary fields, are created to support product descriptions. These terms are maintained by staff and assigned to products. For example, current genre terms are gathered and reviewed, then approved as they are, modified or removed. Additional controlled vocabulary is created and maintained to support customer needs. Some metadata fields will be populated from third party vendors; others will have data manually entered into them as free text, semi-controlled text, named entities or controlled vocabulary terms. For example, subject entries are applied to products through mapping from vendors or entry by staff. Governance structures are created and maintained, including descriptions outlining the data entered into each field and guidelines on where the data comes from, how it is entered and who enters it. For example, rules are often written explaining why a controlled vocabulary focused on subjects is needed, why the internal name is ‘Subjects’ and the external name is ‘Genres’, and how terms are created, maintained and deleted. The rules also cover what relationships exist between these terms and between related terms in other areas: relationships may be hierarchical (broader term > narrower terms) or associative links between related concepts. Other questions usually include how a vocabulary is used by other systems, who has the right to request additions and deletions, and who has the final say.<br />
<br />
Vocabulary development<br />
<br />
The creation of efficient and effective controlled vocabularies - taxonomies, thesauri and ontologies - whether they support browsing or retrieval, is a controlled process based on ongoing assessment and review. It is not something that is done quickly or with little thought. Proper steps are taken to fully and effectively create and modify the necessary vocabulary types. Essentially, it is possible to create and develop a consistent and logical set of data structures behind the scenes, upon which much can be effectively built, whilst ensuring flexibility as to how data elements and the relationships between them can be displayed on websites.<br />
<br />
For promotional metadata - <br />
<br />
Guidelines are usually written outlining the role of promotional metadata and the ways in which it promotes sales. These guidelines describe and control the process of creating, modifying and removing promotional metadata. Specific staff create, modify and delete promotional metadata on a daily basis. The effectiveness of promotional metadata in generating product sales is analysed and changes made as needed. Effective promotional metadata is often defined as promotional data that sells more products. Ineffective promotional metadata is conversely defined as that which sells fewer products, reduces or damages the sales experience, or results in potential customers not buying and moving to rival websites.<br />
<br />
A simple example of one possible guideline may be to ensure that in the promotional area of a website, promotional items with a short duration are always at the top of the display, whilst those with a longer duration are lower down the display. People need to see very time restricted offerings first, but like to know where longer sales promotions can easily be found.<br />
<br />
For example<br />
<br />
Unstructured view:<br />
<br />
• Buy Dr Who DVDs<br />
• 2 for 3 on CDs<br />
• Latest Disney blu-rays<br />
• Magic Monday – special offers<br />
• 6 hour speed sale – click now<br />
<br />
Structured view:<br />
<br />
• 6 hour speed sale<br />
<br />
• Magic Monday – special offers<br />
<br />
• 2 for 3 on CDs<br />
<br />
• Buy Dr Who DVDs<br />
• Latest Disney blu-rays<br />
<br />
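The duration guideline behind the structured view can be expressed as a simple sort. The sketch below is an assumption-laden illustration: the `Promotion` class, the `display_order` function and the specific durations (one day for Magic Monday, two weeks for the CD offer, no end date for the last two items) are invented for the example.<br />

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Promotion:
    title: str
    duration: timedelta | None  # None means an open-ended promotion


def display_order(promotions: list[Promotion]) -> list[Promotion]:
    """Shortest-running promotions first; open-ended ones last.

    sorted() is stable, so open-ended items keep their original order.
    """
    return sorted(
        promotions,
        key=lambda p: (p.duration is None, p.duration or timedelta.max),
    )


# The unstructured view from the example, with assumed durations.
PROMOTIONS = [
    Promotion("Buy Dr Who DVDs", None),
    Promotion("2 for 3 on CDs", timedelta(days=14)),
    Promotion("Latest Disney blu-rays", None),
    Promotion("Magic Monday – special offers", timedelta(days=1)),
    Promotion("6 hour speed sale – click now", timedelta(hours=6)),
]

ordered_titles = [p.title for p in display_order(PROMOTIONS)]
for title in ordered_titles:
    print(title)
```

Staff stay free to create whatever promotions the market calls for; only the display rule is fixed.<br />
<br />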
<b>The challenges e-commerce sites face are many and varied. They include:</b><br />
<br />
Data mapping from vendors:<br />
* Reviewing mapping tables.<br />
* Documenting data needs.<br />
* Analysing data vendor metadata for breadth, depth and accuracy.<br />
* Negotiating with vendors regarding additional information or fixing data feed issues.<br />
* Modifying mapping tables – changing current mappings, adding additional ones, and creating new vocabularies to map to.<br />
* Testing and releasing updates.<br />
* Agreeing and implementing governance rules and guidelines.<br />
<br />
Named entity enhancements:<br />
* Reviewing named entity files for data accuracy, breadth and depth.<br />
* Identifying problems with current data and possible problems with the addition of data – either from vendors or through manual entry.<br />
* Fixing vendor or data entry issues relating to new data.<br />
* Cleaning metadata previously entered.<br />
* Identifying named entities with more than one alternative name. For example, ‘Arthur Conan Doyle’, ‘Conan Doyle, Arthur’, ‘Conan Doyle’, ‘Doyle, Conan’, ‘Arthur Conan-Doyle’, etc.<br />
<br />
Data cleansing tasks to fix this kind of problem would include: identifying the named entities, choosing a preferred name based on guidelines, and creating a data structure allowing the creation of a number of alternative names, which would be linked as synonyms to the preferred name. A newly cleansed vocabulary of preferred names, each related to a wide range of synonyms, would assist greatly with data retrieval.<br />
<br />
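A minimal sketch of such a structure, using the name variants listed above. The choice of ‘Arthur Conan Doyle’ as the preferred form, the `normalise` rules and the function names are all assumptions for illustration; real guidelines may prefer an inverted form and different matching rules.<br />

```python
from __future__ import annotations

import re

# Hypothetical authority file: preferred name -> alternative names (synonyms).
AUTHORITY_FILE = {
    "Arthur Conan Doyle": [
        "Conan Doyle, Arthur",
        "Conan Doyle",
        "Doyle, Conan",
        "Arthur Conan-Doyle",
    ],
}


def normalise(name: str) -> str:
    """Lower-case, treat hyphens as spaces, drop punctuation, collapse spaces."""
    name = name.lower().replace("-", " ")
    name = re.sub(r"[^\w\s]", "", name)
    return " ".join(name.split())


def build_index(authority: dict[str, list[str]]) -> dict[str, str]:
    """Map every normalised label, preferred or alternative, to its preferred name."""
    index: dict[str, str] = {}
    for preferred, alternatives in authority.items():
        for label in [preferred, *alternatives]:
            index[normalise(label)] = preferred
    return index


INDEX = build_index(AUTHORITY_FILE)


def preferred_name(label: str) -> str | None:
    """Return the preferred name for a label, or None if it is not in the file."""
    return INDEX.get(normalise(label))


print(preferred_name("conan doyle, arthur"))
print(preferred_name("Agatha Christie"))  # not in this tiny sample file
```

Unmatched names are returned as `None` rather than guessed at, so they can be queued for review by staff.<br />
<br />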
Search Support:<br />
* Reviewing and extending search synonyms.<br />
* Reviewing search metrics: zero hits, few hits, too many hits, searches with low product views, searches with low basket conversion rates.<br />
* Directing the results of each search review into enhanced metadata creation, product descriptions, search effectiveness (e.g. stop words review and updating) and website usability.<br />
* Agreeing and implementing governance rules and guidelines.<br />
<br />
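The search metrics review can be partly automated. The sketch below flags queries that fall into the categories listed above; the `QueryStats` fields and every threshold value are assumptions chosen for illustration, and each site would need to set its own.<br />

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class QueryStats:
    query: str
    searches: int       # how many times the query was run
    avg_hits: float     # mean number of results returned
    product_views: int  # searches followed by a product page view
    basket_adds: int    # searches followed by an item added to the basket


# Assumed thresholds, for illustration only.
FEW_HITS = 3
TOO_MANY_HITS = 500
MIN_VIEW_RATE = 0.10
MIN_BASKET_RATE = 0.02


def review_flags(stats: QueryStats) -> list[str]:
    """Return the review categories a query falls into (empty if none)."""
    flags: list[str] = []
    if stats.avg_hits == 0:
        flags.append("zero hits")
    elif stats.avg_hits <= FEW_HITS:
        flags.append("few hits")
    elif stats.avg_hits >= TOO_MANY_HITS:
        flags.append("too many hits")
    # Engagement rates only make sense when something was returned.
    if stats.searches and stats.avg_hits > 0:
        if stats.product_views / stats.searches < MIN_VIEW_RATE:
            flags.append("low product views")
        if stats.basket_adds / stats.searches < MIN_BASKET_RATE:
            flags.append("low basket conversion")
    return flags


# Invented sample rows standing in for a real search log summary.
sample = [
    QueryStats("sherlock homes", 120, 0, 0, 0),
    QueryStats("dvd", 1000, 8000, 50, 5),
    QueryStats("dr who box set", 200, 12, 90, 10),
]
for row in sample:
    print(row.query, review_flags(row))
```

Zero-hit queries are natural candidates for new search synonyms, while queries with many hits but few product views may point to weak facets or product descriptions.<br />
<br />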
Browsing Assistance:<br />
* Creating and displaying consistent and intuitive facets describing and promoting products.<br />
* Creating useful divisions between descriptive and promotional categories.<br />
* Creating processes to manage the maintenance of these divisions, the ways in which both are created and developed and the ways in which metadata in back-end systems interacts with metadata displayed on public facing websites.<br />
* Agreeing and implementing governance rules and guidelines.<br />
<br />
Luckily all of these challenges can be dealt with and minimised by employing the ongoing professional services of staff or consultants adept at using data analysis, metadata modelling, taxonomy, thesaurus and ontology creation and mapping to support content description and findability. When these skills are combined with current stake analysis, key task analysis and supported by the best in usability, wonderful things can be achieved.<br />
<br />
IanIan Davishttp://www.blogger.com/profile/07525154101983512716noreply@blogger.com2tag:blogger.com,1999:blog-2436917853808889474.post-36254825733436335462010-04-30T14:33:00.024+01:002010-04-30T15:31:05.753+01:00Juice Based Findability<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-4SPshpuaHWfEGRcRZMzp_zR9InF1HZ3BF9PLzIGy6CR6qWS2H0pc5iacVfgGlmK6SryUqRLWRWxEt1-elL4pzptQ9pgd0Q0ucOs1g_61jLR9KkfahwrNje97TVWyyPl9u4Slvmlmn44_/s1600/orange+juice.jpg"><img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 200px; height: 163px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-4SPshpuaHWfEGRcRZMzp_zR9InF1HZ3BF9PLzIGy6CR6qWS2H0pc5iacVfgGlmK6SryUqRLWRWxEt1-elL4pzptQ9pgd0Q0ucOs1g_61jLR9KkfahwrNje97TVWyyPl9u4Slvmlmn44_/s200/orange+juice.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5465936725948093154" /></a>I recently returned from an e-commerce assessment project in Cape Town. The project went well, and the client was absolutely wonderful - very welcoming and extremely keen to strengthen the asset categorisation of their products and the search and browse support they offer.<br /><div style="text-align: right;"><br /></div><div>My stay was extended somewhat by the antics of an Icelandic volcano - yes me too. While I was on 'volcation' I enjoyed a number of visits to the hotel's 'full breakfast buffet'. Sitting there, sipping my coffee, I received a lesson in 'Juice Based Findability' - bear with me, it will make sense soon.</div><div style="text-align: right;"><span class="Apple-style-span" style="font-size:x-small;"><br /></span></div><div>My hotel had the usual juice section - glasses close to a variety of freshly squeezed juices. I probably sat near this juice area 10 times during my recent stay. Whilst idly watching my fellow breakfasters I noticed at least 5 occasions when the guests could not find the glasses for the juice. 
The thing was that the glasses were lined up below the juice bar, and the table top on which the juice bar was sitting was wide enough to obscure the glasses to the guests who were standing next to the juices. On a number of occasions, guests approached the juice area intent on getting a drink, and all too often they were unsuccessful - they could just not find the glasses. Some looked around quite determinedly, some spent longer than others trying to track down the errant glasses. Some asked members of staff for help, some just walked away and got a coffee or tea instead.</div><div><br /></div><div>Some people tried harder than others to solve the problem for themselves and get a glass of juice, but everyone with the problem was unsuccessful in solving it. The same staff were asked to solve the same problem day in day out, and yet they never altered the juice bar area. They never changed the location of the glasses or added any signage explaining the location of the glasses.</div><div><br /></div><div>This experience is very similar to information finding challenges online. All too often sites do not make information finding tasks as simple and as fast as they should be. Also, when faced with real people having real problems, some sites ignore them, others help individuals via customer services centres, but most don't fix the root of the problem.</div><div><br /></div><div>Faced with problems, frustrated by confusing navigation, strange search results, or missing information, most web users will go elsewhere with their business. If they do let the site owner know the problem, then please website owners, fix it at the root so other people don't encounter it.</div><div><br /></div><div>Sometimes information architects and website owners are too close to things - too focused on their issues and their plans. 
They need to regularly take a step back and watch their customers and users interacting with their websites.</div><div><br /></div><div>Next time you have a moment, look at the key information tasks your customers or clients have, sit back and ask, "How easy is it to get to the juice?" Analyse search logs, or sit with people and watch them use your site; there are lots of ways to do it. Then, act on what you see, focusing on helping most of the people most of the time. I guarantee that valuable lessons will be learned and findability will improve.</div><div><br /></div><div>Dow Jones Client Solutions offers audits targeted at improving information findability through enhanced asset categorisation, browse navigation and search support. Let me know if you would like to get more value out of your information.</div><div><br /></div><div>Ian</div><div><br /></div><div><span class="Apple-style-span" style="font-size:x-small;">photo by </span><a href="http://www.flickr.com/photos/mamchenkov/470956381/"><span class="Apple-style-span" style="font-size:x-small;">Leonid Mamchenkov</span></a></div>Ian Davishttp://www.blogger.com/profile/07525154101983512716noreply@blogger.com0tag:blogger.com,1999:blog-2436917853808889474.post-14240701717551886202010-03-22T16:13:00.005+00:002010-03-22T16:20:50.419+00:00E-Commerce Websites - Metadata and Controlled Vocabulary Can Help<div>I've worked for Dow Jones Client Solutions, managing our 'Outside Americas' information consulting services, since 2006. 
In that time I've been involved in a wide range of projects for a variety of businesses.</div><div><br /></div><div>Dow Jones Client Solutions offers a diverse range of information management services, amongst them services to: organise audio, video, image, and text assets, improve information browsing, provide effective search experiences and create bespoke user journeys that direct clients and customers from initial products and services to related ones.</div><div><br /></div><div>I've been thinking a lot about e-commerce websites recently, and looking at quite a few examples of the genre. I am also a customer myself, and all too frequently come up against frustrating websites with poor search and browse functionality and a complete lack of regard for the possible customer.</div><div><br /></div><div>Competition online is strong. It's easy for customers to move between competing websites - choosing the ones with the best experience and the right mix of products, price and customer service. 
Revenue and market share go to sites that offer an easy-to-understand information architecture - with user-friendly navigation, an intuitive and efficient search experience - with effective asset categorisation, search facets and filters, related links to products and services, and the appropriate sets of keywords to direct simple searches to the appropriate results.</div><div><br /></div><div>Dow Jones Client Solutions offers:</div><div><br /></div><div> * E-Commerce Assessments.</div><div> * Search and browse advice and development.</div><div> * Metadata and vocabulary development and maintenance.</div><div> * Categorization advice for text, images, video and audio assets.</div><div> * Vocabulary and metadata mapping to aid sharing and interoperability.</div><div> * Metadata and vocabulary translation and localisation.</div><div> * Information management workshops and training sessions.</div><div><br /></div><div>If anyone reading this feels that the consulting services we offer may be of interest, I would love to arrange a quick informal call to discuss your business objectives.</div><div><br /></div><div>I look forward to hearing from you.</div><div><br /></div><div>Ian</div><div><br /></div>Ian Davishttp://www.blogger.com/profile/07525154101983512716noreply@blogger.com0tag:blogger.com,1999:blog-2436917853808889474.post-77753623354394927602009-11-30T15:34:00.041+00:002009-11-30T16:47:46.138+00:00Digital Asset Management Foundation - Coffee Meet-Up - Notes and AudioIn my last blog post I mentioned I was taking part in an informal 'meet-up' to discuss Digital Asset Management (DAM). 
I made some rough notes during the call, which I hope will serve to give a flavour of the discussions:<br /><br />Topics:<br /><ul><li>The need to broaden the understanding of DAM.</li><li>The need to share experiences and challenges in DAM.</li><li>The need to connect with clients, understand needs and deliver targeted solutions.</li><li>Creating metadata and vocabularies to support assets: images and video.</li><li>Applying metadata to image and video assets - manual, automatic and semi-automatic solutions.</li><li>DAM solutions: 'software as a service' versus 'enterprise solutions'.</li><li>Creating Vision Statements for DAM.</li><li>The phases of DAM.<br /></li><li>DAM return on investment: key task analysis, baselining and measuring outcomes.</li><li>Controlled vocabularies for DAM - license to kick start development, then develop and customise.</li><li>Using consultancy to support DAM creation and utilisation.</li><li>Working with legacy data in DAM systems.</li><li>Harvesting metadata from creators and suppliers.</li><li>Adding value through manual tagging of assets.</li><li>Tagging assets using: external sources - off-shore or local, or in-house resources.</li><li>Video processing: soundtrack indexing, scene and key recognition.<br /></li></ul>For those who want to listen to the conversation you're free to do so by visiting the following URL:<br /><br /><a href="http://rec1.dimdim.com/view2/dimdim/e89feedc-2cac-102d-9515-003048642bd7">DAM Foundation - Audio Track of Coffee Meetup 27 Nov 2009</a><br /><br />The audio is a little broken up at the start, but stick with it, it gets better. 
Also, time delays between the US and UK mean it sounds as if the speakers are talking over each other.<br /><br />Speakers were:<br /><ul><li><a href="http://uk.linkedin.com/in/nigeljcliffe">Nigel Cliffe</a>, Managing Director at Cliffe Associates Ltd</li><li><a href="http://uk.linkedin.com/in/iandavistaxonomyspecialist">Ian Davis</a>, Taxonomy Delivery Manager, Outside Americas, Dow Jones Client Solutions</li><li><a href="http://www.linkedin.com/in/hdegyor">Henrik de Gyor</a>, Digital Asset Manager at K12 Inc</li></ul>I hope you all enjoy the conversation; we hope to arrange more in a few weeks.<br /><br />IanIan Davishttp://www.blogger.com/profile/07525154101983512716noreply@blogger.com2tag:blogger.com,1999:blog-2436917853808889474.post-34499507665065352322009-11-27T10:35:00.048+00:002009-11-27T12:16:59.089+00:00Digital Asset Management and Metadata for Images and VideoMissing out on the recent Photo Metadata Conference - <a href="http://www.linkedin.com/redirect?url=http%3A%2F%2Fbit%2Ely%2F6PlLJj&urlhash=mV6T&_t=NUS_STAT-link_text&trk=NUS_STAT-link_text" target="_blank">http://bit.ly/6PlLJj</a><span class="text"> - has reminded me how much I love working in the DAM world, in particular in the area of creating metadata and controlled vocabularies to support digital image and video search and browse.<br /><br />Reading about the <a href="http://www.phmdc.org/programme2009.htm">Photo Metadata Conference programme</a></span><span class="text"> it seems like there were some great presentations. 
I downloaded them all (they're available from the conference website) and had great fun going through all the excellent experiences, comments and ideas.<br /><br />I wish I'd been there for <a href="http://www.google.co.uk/url?sa=t&source=web&ct=res&cd=1&ved=0CAcQFjAA&url=http%3A%2F%2Ftaxonomysociety.com%2Fblog1%2Fabout-2%2F&ei=ZboPS-yFEJ-H4gblq4ybBA&usg=AFQjCNEKictIJvJTkwmpsiT1HrvfjzuStg&sig2=A4sZAzbgOxCG4W7FWie-rA">Madi Solomon</a>'s keynote on the collapse of boundaries in the digital world. I agree that it's less and less about what format an asset is in and more about what that asset is, and how it needs to be organised to support its use.<br /><br />Assets need to work for their places in the world. Finding them and using them needs to be simpler, and metadata and controlled vocabularies need to support and enable this.<br /><br />Understanding the assets an organization has, analysing the needs of that organisation, and ensuring they have what they need and that each asset is organised to support its use, is where the really exciting and satisfying work is for me.<br /><br />After having worked for <a href="http://www.corbis.com/">Corbis</a> from 1991 to 1999, in the early research and development days of digital image organisation and sale, I was excited to see Max Wieberneit's presentation on still and video metadata.<br /><br />Video and still images have much in common. I've blogged about this in the past and it's still a big area for me. Both asset types have technical metadata, depicted content metadata and aboutness metadata, to name but a few. 
Add to this the sound tracks for video - which can be indexed for retrieval, and the ability to segment video into scenes and key frames, and you have an exciting mix of metadata across both formats.<br /><br />I agree with Max that using established metadata systems makes a huge amount of sense, as does working to get as much metadata as possible from the creators or custodians of images and video - it's much easier to capture metadata early on in the creation process than down the line, and some metadata will be lost if you leave its capture too late.<br /><br />As Max says, one key concern for image and video asset metadata is the users of the assets. Different people have different needs and need different metadata. For many people a good level of access to video can be built using initial metadata associated with the videos, key scene and frame analysis and the indexing of the audio tracks of the videos. Whereas for others, access to the mood of the video may only come through music analysis, lack of noise at key moments, and manually applied subject tags.<br /><br />On the image side, as Max says, editorial users have somewhat differing needs to commercial users of stock photos. Max showed a great slide listing a long set of conceptual keywords: 'comfortable, dreaming, luxury, spoiled' etc. I remember the fun we had creating these concepts, arranging them in hierarchies, providing synonyms for them, and creating definitions and application rules to control how they're assigned. It sounds easy, but trying to accurately use a concept like, "spoiled" or "luxury" often brings many challenges.<br /><br />I've already touched on the needs of video users, and some of the basic ways video can be organised. It was great to read Lionel Faucher's piece on how a video agency uses metadata. 
Video is easier than still images to work with; automated solutions are more applicable to video and much more successful, but challenges still abound, as Lionel clearly shows in his presentation.<br /><br />One of the interesting topics I've been following for a while is the metadata being generated from digital cameras, and the work being done to make more use of it. Related to this is the exciting area of geographic coordinate metadata, which is created by some digital cameras when a photo is taken, and the uses to which that can be put.<br /><br />Two presentations in the area of geography and image metadata were given by Bernd Beuermann</span><span class="text">, and Ross Purves. A great research area was mentioned by Bernd - the taking of GPS co-ordinates and linking them to points of interest that are within a certain range of a GPS location. This can make the tagging of images with key depicted buildings or topography a little easier and will produce many advantages for image tagging and retrieval.<br /><br />A couple of things that I'm interested in were missing from the conference. I'd have liked to have seen more on: working with video soundtracks, automatic scene and frame analysis, and the place of manually applied tags in video indexing. I'd also like to have seen more about the creation of hybrid image retrieval systems that bring together content based image retrieval with controlled vocabulary and folksonomy tags. Maybe that's all for next year!<br /><br />There also seemed to have been a big emphasis on technology, file formats, and metadata standards - in many ways the building blocks or key tools for organising and providing access to video and image content. 
What I'd have liked to see more of is the uses to which these building blocks have been put, the real world sharing of user needs and the challenges of actually making the technology and the supporting structures work to achieve business aims.<br /><br />I should end by thanking the organisers of the event, and the presenters, for putting so many presentations online - it's very helpful and refreshing to have such a good level of access to this form of content.<br /><br />One way in which I keep involved in the image and video world is through my involvement in the <a href="http://www.linkedin.com/groups?about=&gid=1952873&trk=anet_ug_grppro">DAM Foundation</a> on LinkedIn. There is a coffee meet-up organised for this afternoon, which I hope will kick start a lot of exciting developments. I'll post more about the outcome of the meeting next week.<br /><br />Ian<br /><br /></span>Ian Davishttp://www.blogger.com/profile/07525154101983512716noreply@blogger.com2tag:blogger.com,1999:blog-2436917853808889474.post-79252286956320418622009-10-06T10:09:00.031+01:002009-10-06T10:56:56.538+01:00My Thoughts on, "Collaboration: know your enthusiasts and laggards", article from CiscoLast week I spent some time reading an excellent and very interesting piece from Cisco, "<a href="http://bit.ly/KknE8">Collaboration: know your enthusiasts and laggards</a>".<br /><br />I encourage you to take a look at the results of the study Cisco undertook into the factors linked to successful adoption of collaboration via networked tools: instant messaging, wikis, shared workspaces, video conferencing, forums and discussion boards etc.<br /><br />Whilst reading their interesting findings a couple of things struck me.<br /><br />On page one of the article was the sentence,<br /><br />"You can use the study results to maximize your return on investment from collaboration tools. 
One way is to implement business practices shown to lead to more enthusiastic collaboration."<br /><br />This struck me as possibly being another way of saying: if you have already purchased tools to allow collaboration you can enjoy a return on that investment by putting in place an environment which will encourage collaboration using these tools. Please correct me if I'm wrong but this sounds a little too close to the assumption that collaborating is an end in itself, not a means to an end.<br /><br />To my mind, collaboration is very important in many walks of life and many types of organisations can benefit from doing a lot more of it. Some of it will come via software; much of it should come through face-to-face chats, discussions and more formal meetings. None of it will, I think, lead to a return on investment in and of itself. If I asked a CEO how their business was doing in these hard times, I wouldn't expect them to say, "We're doing well, we're collaborating so much more than before."<br /><br />For me, the key to a return on investment from collaboration is controlling that collaboration. Knowing what the business goals and objectives are and making a conscious decision to use collaboration as a technique to help achieve them. Also important is the monitoring of the collaboration taking place and then linking the collaboration efforts to the outcomes of the collaboration.<br /><br />Collaboration can have a very specific goal, "We have a project to deliver and two teams in different cities need to collaborate, in these ways, to successfully deliver that project."<br /><br />Collaboration can be less concrete, but no less valuable, "We have a group of people over here, and another group over there, who would benefit from talking more and understanding each other - their jobs, their day to day issues and how they go about solving them. 
We're not sure exactly what will come from this but we will set up collaborative spaces, monitor them, get feedback from the collaborators, and look at how these groups do their jobs one month, three months, six months, after the collaboration was established. We'll then analyse how collaboration contributed to getting a, b, and c done, learn from the experience and build on it."<br /><br />Rather than saying, "We collaborate therefore we succeed", I'd like to be able to say, "We had a business need, problem or corporate goal, we put a number of collaboration techniques in place and we achieved our goals or fixed our problems. We also saw where and how our collaboration contributed to our success."<br /><br />Collaboration is a tool to use to achieve an objective, not an end in itself. Return on investment comes from what results from collaboration, not from collaboration alone.<br /><br />For many people and organizations the goal should be to achieve results through targeted collaboration, not to just collaborate more.<br /><br />I hope we all succeed because we know how to collaborate, we know why we're doing it, we know what we get from it, and we know how it contributes to our goals and objectives.<br /><br />IanIan Davishttp://www.blogger.com/profile/07525154101983512716noreply@blogger.com0tag:blogger.com,1999:blog-2436917853808889474.post-28080749831556219012009-10-05T11:18:00.020+01:002009-10-05T11:53:58.619+01:00Accessing Useful Knowledge: musings from a train carriageSitting on a train, slowly trundling through Hertfordshire, my thoughts turned to the challenges of knowledge and information sharing.<br /><br />I was minding my own business, surrounded by other similar people, also minding their own business and I started to think that if I had a need for knowledge and information, what would be my best course of action? 
What would be the most efficient and effective way to obtain, share and distribute information and knowledge?<br /><br />Pondering this question produced some interesting thoughts.<br /><br />If I needed a particular newspaper, document or magazine article, that I'd forgotten to bring along with me, my best bet was to stand up, forget I was English, and ask my fellow travelers whether anyone had a copy. A long shot I know, but a direct request for specific information was my best chance.<br /><br />On the other hand, if I had a less structured knowledge and information need what would work best?<br /><br />If I wanted to exchange information and knowledge regarding how to get people to share their knowledge in a work environment, and how to persuade them, "not to panic" and convince them that knowledge sharing, "is a good thing", my best bet is not to ask a specific question out loud, or to call, tweet, or email the people in the carriage. My best bet is to try to get a conversation going between all the people in the train carriage.<br /><br />Back in the real world, persuading a bunch of strangers to talk to each other on a train is only going to happen if the train grinds to a halt and all the lights go out - otherwise, forget it.<br /><br />However, the thought emphasised for me that often the best means of communication is face-to-face. 
The best way to exchange knowledge and information in order to meet a range of needs is to get a group of people to sit in the same physical space, and with a clear idea of the boundaries and objectives of the meeting, to talk to each other in the real world.<br /><br />Other forms of more distanced communication, email, phone, etc have their place and are very popular and useful, but in this world of technology let's not lose track, let's not forget, that having a discussion with a real person is often the best way to communicate.<br /><br />IanIan Davishttp://www.blogger.com/profile/07525154101983512716noreply@blogger.com0tag:blogger.com,1999:blog-2436917853808889474.post-89844107144708360572009-10-05T10:58:00.005+01:002009-10-05T11:08:44.729+01:00Search Solutions 2009I recently attended the<a href="http://bit.ly/YDRRt"> Search Solutions 2009</a> one day conference. For an excellent summary of a very interesting day take a look at <a href="http://bit.ly/ppt9r">Karen's recent blog post</a><br /><br />For me, 'a star of the show' was <a href="http://bit.ly/lzUeV">Dave Mountain</a>'s enthralling discussion, "Location-Based Services: Positioning, Geocontent and Location-Aware Applications". Dave looked at location based services, their current uses and future possibilities. One aspect, which sparked heated debate over coffee, was the very real security implications of having your position pinpointed to a couple of metres. Location Based Services will I think continue to grow and meld together with social applications such as Twitter, Facebook, Flickr, e-commerce and mobile devices. We will increasingly know where the nearest coffee shop is to our location in terms of direct route, time taken to get there etc. Add to this the possibility that everyone else will know where you are in real time and what you're doing and you have a world of many information and privacy challenges. 
I wonder whether we'll end up with people paying a surcharge to cloak themselves from all this information gathering?<br /><br />If you want to know more about the world of Geocontent and Location Aware Applications Dave Mountain is a great person to talk to.<br /><br />Ian<br /><br />This post was previously posted on <a href="http://bit.ly/INW7J">Taxonomy Watch</a>Ian Davishttp://www.blogger.com/profile/07525154101983512716noreply@blogger.com0tag:blogger.com,1999:blog-2436917853808889474.post-31114187557089395062009-09-28T10:11:00.007+01:002009-09-28T10:17:26.807+01:00Image Findability: Improving through TagsTake a look at my recent article on Image Findability on FUMSI - <a href="http://bit.ly/LQ3UP">bit.ly/LQ3UP</a><br /><br />My article outlines the options open to tag images for a business need - selling, sharing, reducing duplication of effort etc. It assumes an image focused audit or assessment has already understood the creation and use of image content and the need is to choose from a set of options in order to create a tagging plan, with a set of rules, guidelines and success metrics.Ian Davishttp://www.blogger.com/profile/07525154101983512716noreply@blogger.com0tag:blogger.com,1999:blog-2436917853808889474.post-22983184423765398262009-09-25T16:56:00.000+01:002009-09-25T16:58:17.903+01:00Need to Create Good Work Fast? Simple - Get a New ComputerI have a problem. I have six pieces of work to write in a couple of weeks and I'm under pressure. I need the work to be spot on, of the highest quality and created in the shortest space of time.<br /><br />The answer to my problem? Buy a new computer.<br /><br />Does this sound strange to you? Can you see how improved output comes from a new computer?<br /><br />I was sceptical, but the Sales guy said a new computer was the answer. I asked him to explain and he told me how the time I was wasting messing with my old computer was at the heart of my problem. 
All those lost minutes fixing crashes, worrying about blue screens, battling with slow performance, scanning for adware, spyware and worse. 'Forget all that' was the message I was getting: move to the promised land of a newer, faster computer and your problems are solved. After a bit more chat I was sold. My new computer would save me time and that extra time would be spent devoted to my key tasks, which in turn would lead to better quality work and faster work at that. Saving time was even money in the bank for me to set against the cost of the computer - so it wasn't even as expensive as I'd thought.<br /><br />At this point I excused myself, had a coffee, and thought it through one more time. Did it make sense that a new computer was my solution? The light quickly dawned: of course it didn't. A new computer wasn't the solution and time saving was not my key issue. How did the Sales guy know that time saved would be time I'd actually spend on my document tasks? How did he know the processes and tasks I'd been performing with my current computer were not valuable experiences - not to be lightly ignored? Why did he make no attempt to understand me and my circumstances and simply sell me the one-size-fits-all Sales line that so many people still hear today?<br /><br />I soon realised that I'm better off assessing my goals and objectives. What is it I need to do? For whom? Why? And when? Then I need to ensure I'm prepared and enabled to achieve them. Is my broadband connection operating? Is it fast enough? Is the right software up and running? Can I access the libraries I need?<br /><br />I would also benefit from improving my time planning and management skills. I need to focus on my key tasks. What is it I need to do? What problems am I having here? I also should not forget my deliverables. What do I need to produce and how do I get there?<br /><br />All these areas, when addressed in the right way, will enable my tasks and improve my outcomes. 
Granted, this is a little harder to sell than a new computer equals better work and a wonderful life, but surely I'm worth that extra effort and it's certainly what I need to hear.<br /><br />Many of us encounter this scenario frequently. How many times have you watched a Sales presentation built around saving time? Usually a calculator is involved and sometimes members of the audience are asked to volunteer key pieces of information - "How much time do you spend searching for information in a day?", "What's your hourly rate?", "How hard do you find tracking down the information you need?" "Could you be more productive if you saved some of this time?" Very often 'time saved' is then calculated and that 'time saved' directly equated to business advantage. Very often there is little or no thought put into the needs or objectives of individual businesses or any injection of common sense into the Sales pitch.<br /><br />A Dow Jones information assessment looks for the real issues and pain points our clients experience, and works with them to solve their problems and enable improved outcomes. If you have an information management issue you need assistance with, speak to us and let us work with you to get to the heart of your needs. 
You never know you might even save enough money to afford that new computer you've always wanted!<br /><br />Ian<br /><br />This post first appeared at the <a href="http://www.synapticacentral.com/content/need-create-good-work-fast-simple-get-new-computer">Synaptica Central blog</a>Ian Davishttp://www.blogger.com/profile/07525154101983512716noreply@blogger.com0tag:blogger.com,1999:blog-2436917853808889474.post-83517920704865027332009-09-25T16:54:00.000+01:002009-09-25T16:56:16.545+01:00Passionate GeographersI noticed a very interesting initiative recently <a href="http://www.geograph.org.uk/">Project Geograph: Photograph Every Grid Square.</a><br /><br />This project is working towards collecting and making available images depicting the geography of every square kilometre of the British Isles. This ambitious project seems to be progressing very well, with many good quality images loaded to the website.<br /><br />Already over 8,900 contributors have submitted nearly 1,500,000 images, with an average of 5 images associated to each geographic square across England, Wales, Scotland and Ireland. This is a great resource, preserving in amazing detail what the British Isles looked like at the start of the 21st Century. This is also a wonderful way to learn about the geography of these amazing islands and to dig deeply into their hills, valleys, towns and villages. This is also a superb source for genealogists looking at how a particular part of the British Isles looks today.<br /><br />Back in 2007 I attended the <a href="http://www.eu.socialtext.net/blogsandsocialmedia/index.cgi?programme_information">Blogs and Social Media Conference 2.0</a> in London. One presentation which has stayed in my mind since then, was <a href="http://www.headshift.com/mt/mt-cp.cgi?__mode=view&blog_id=3&id=20">Lee Bryant's</a>, "Engaging with Passionates". 
In his exceptional presentation Lee described a ground-breaking social networking case study and talked about the energy that can be released when organisations successfully tap into a group of people who are truly passionate about a given topic.<br /><br />I think you'd be hard pressed to find a better example of the power of passionates than the Geograph Project. Looking at the number of contributors, the amount of the British Isles covered, and the quality of the photography and metadata created, makes a clear point - find people who are passionate about a topic, people who are committed to a hobby or interest, engage them in the right way and they will deliver time and again.<br /><br />I wish everyone associated with the Geograph Project all the luck in the world, may they stay passionate and committed to what they do, and may their project benefit from their commitment.<br /><br />Oh, and if you like what you see, submit a photograph, or start a similar initiative.<br /><br />Ian<br /><br />This post first appeared in the <a href="http://www.synapticacentral.com/content/passionate-geographers">Synaptica Central Blog</a>Ian Davishttp://www.blogger.com/profile/07525154101983512716noreply@blogger.com0tag:blogger.com,1999:blog-2436917853808889474.post-60321353864209419802009-09-25T16:48:00.000+01:002009-09-25T16:53:58.078+01:00Report from Digital Asset Management (DAM) Conference - London, 1 JulyI spent Wednesday 1st July at the Henry Stewart DAM Conference in London.<br /><br />In my slot I talked about, "Tagging Images for Findability - Making Your DAM System Work for You." I used my 30 minutes to raise the issue of organising images using metadata and controlled vocabulary to connect the images to the people who want to use them. I spent a little time looking at the ways to use text to categorise images and the advantages and disadvantages that brings. 
I devoted a lot of the presentation to raising issues to watch out for when tagging images, in particular specificity and focus in image depictions, abstract concepts and image 'aboutness' and the deceptive simplicity of visually simple images.<br /><br />A far braver presentation than mine was given by Madi Solomon. Madi ditched the PowerPoint presentation to facilitate a refreshing debate on metadata. Questions from the floor came thick and fast. Madi did a great job of presenting 'on the edge' and drew out the experiences of many of the attendees and the challenges they were facing.<br /><br />Also of note at the conference was a very informative presentation from Theresa Regli on 'Evaluating and Selecting Technologies' and a stimulating piece from Mark Davey on the old chestnut of ROI and Digital Asset Management Systems. Mark took a pretty dry subject and a slot directly after a good lunch and succeeded brilliantly in making it entertaining, informative and practical. Take a look at his excellent presentation Digital Asset Management ROI - the basics. 
I think this is a key resource for anyone interested in return on investment in the DAM space and it's fun to watch too.<br /><br />I had a great day at DAM London and I hope my fellow delegates found the presentations as helpful and enlightening as I did.<br /><br />Ian<br /><br />This post first appeared on the <a href="http://www.synapticacentral.com/content/report-digital-asset-management-dam-conference-london-1-july">Synaptica Central blog</a>Ian Davishttp://www.blogger.com/profile/07525154101983512716noreply@blogger.com0tag:blogger.com,1999:blog-2436917853808889474.post-7996709509207785942009-09-25T16:42:00.000+01:002009-09-25T16:48:15.917+01:00Report from the ISKO Content Architecture Conference - 22-23 June, London, UK<span style="color: rgb(0, 0, 0);">I spent Monday and Tuesday of this week at the fascinating </span><a style="color: rgb(0, 0, 0);" href="http://www.iskouk.org/conf2009/index.htm">ISKO Content Architecture Conference.</a><br /><br /><span style="color: rgb(0, 0, 0);">On Monday I gave a presentation on, "Still Digital Images - the hardest things to classify and find."</span><br /><br /><span style="color: rgb(0, 0, 0);">My presentation looked at the image market and the ways in which images can be annotated - or is that processed, classified, categorized, tagged, keyworded… We need a controlled vocabulary to control the vocabulary of controlled vocabulary!</span><br /><p style="color: rgb(0, 0, 0);">I then went on to raise some of the challenges of image organization and retrieval - picking out the need to consider different image domains and user groups, and considering how to provide users with access to basic attributes, depicted content and abstract concepts linked to images.</p> <p style="color: rgb(0, 0, 0);">There were some amazingly interesting presentations over the two days of this event.</p> <p style="color: rgb(0, 0, 0);">Highlights for me included a <a target="_blank" 
href="http://www.iskouk.org/conf2009/abstracts.htm#crystal">great keynote from David Crystal</a> looking at the evolution of the linguistic approach to content analysis. <a target="_blank" href="http://www.iskouk.org/conf2009/abstracts.htm#solomon">Madi Solomon</a> highlighted the challenges faced by Disney and Pearson in the management of content using metadata. <a target="_blank" href="http://www.iskouk.org/conf2009/papers/inskip_ISKOUK2009.pdf">Charles Inskip</a> opened my mind to music categorization and sale, and the many similarities with image retrieval and organization. Also, intriguing was the work showcased by the BBC's Tom Scott, who spoke about <a target="_blank" href="http://www.iskouk.org/conf2009/abstracts.htm#scott">'Building Coherence at bbc.co.uk'</a></p> <p style="color: rgb(0, 0, 0);">As always at these events, interesting posters and presentations abounded, and this blog can only give a flavour of them.</p> <p style="color: rgb(0, 0, 0);">If you want to know more, the organizers have made abstracts available online, and in some cases full papers. They also plan to make the slides of individual presentations available along with recorded audio. I'm told the full set of resources will be on the <a target="_blank" href="http://www.iskouk.org/conf2009/index.htm">conference website</a> in the next few weeks.</p> <p style="color: rgb(0, 0, 0);">Next week I'm at a <a target="_blank" href="http://www.damusers.com/events/about.php?eventid=28&PHPSESSID=ad8583d5258f3649ec12c3afcc023962">Digital Asset Management (DAM) conference</a> in London talking about "Tagging Images for Findability: making your DAM system work for you." 
More about that next week.</p> <p style="color: rgb(0, 0, 0);">Ian</p><span style="color: rgb(0, 0, 0);">This post first appeared in the </span><a style="color: rgb(0, 0, 0);" href="http://www.synapticacentral.com/content/report-isko-content-architecture-conference-22-23-june-london-uk">Synaptica Central Blog</a>Ian Davishttp://www.blogger.com/profile/07525154101983512716noreply@blogger.com0tag:blogger.com,1999:blog-2436917853808889474.post-10492111724221451862009-09-25T16:21:00.000+01:002009-09-25T16:42:10.855+01:00Classifying Images Part 3: Depicted ContentWelcome back to my occasional image classification series.<br /><br />The last time I raised the topic of image classification I discussed the basic attributes of images. This time I want to focus on the thornier issue of the content, or concepts, depicted in them.<br /><br />There is a danger of treating an image like a piece of text and classifying its attributes: Who created it? When? What techniques were used? Then writing a title or caption and leaving it at that. Sometimes little more need be done to a document than record this kind of information, especially with free text searching, but lots more needs to be done to most images.<br /><br /><span style="font-weight: bold;">Image findability</span><br /><br />Image findability is the process of using search and browse to access the images required. A major aspect of image findability relates to the things depicted in them. Image users often search for images based on the generic things in them and also the proper names of these things. Classifying images based on depicted content means considering anything and everything that is and can be depicted in an image. When considering this I like to focus my efforts on understanding the images I'm dealing with, the users who are trying to find and work with the images, and the ways in which these people need to search and browse for the images they need. 
After an assessment of these areas I then tailor my approach.<br /><br />Broadly speaking people searching for depicted content are looking for a number of types:<br /><ul><li>Places: cities, towns, villages, streets...</li><li>Built works: parks, skyscrapers, cottages, walls, doors, windows...</li><li>Topography: mountains, valleys...</li><li>Groups and organisations: air forces, choirs, police departments...</li><li>People: roles, occupations, ethnicity and nationality: mothers, doctors, Caucasians, French, Germans...</li><li>Actions, activities and events: running, writing, laughing, smiling, birthdays, parties, book signings, meetings...</li><li>Objects: a myriad of items...</li><li>Animals and plants: common and scientific names...</li><li>Anatomy and attributes of people, animals and plants: arms, legs, adults, leaves, trunks, paws, tails...</li><li>Depicted text shown in images - often signs or writing shown in images..</li></ul>Many of these generic types can also have proper named instances:<br /><ul><li>Proper names of people, places, buildings, topography, organisations, animals etc</li></ul>When dealing with depicted content I've found some of the biggest issues to be:<br /><ul><li>Identification - knowing what is in an image</li><li>Focus and specificity - knowing what to include and what to exclude</li><li>Consistency - applying the same term in the same way for the same depicted content</li></ul><span style="font-weight: bold;">Identification</span> - knowing what is in an image<br /><br />Depicted content is a relatively black and white area - a dog is depicted so a dog is tagged. However, it might sound a little weird, but working out what is actually in an image can be a lot harder than you think.<br /><br />Take a look at the image <a href="http://www.flickr.com/photos/sis/107999620/">"Do You Know What This Is?" 
by Sister72</a><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuXJYki0Gtq037-JaUAFPexLgw5o6p0mKG1nv3gmyHgf0AoE-JlDlhg4oqdQSX_X_PRqGo_YWHxBsKj5f0c-Jc3UbFk1MI3-TpkZWBURW9HgDGjvEVcqnYA-7ct8A96jedamRVcBj4Ninr/s1600-h/107999620_a09bbea78e.jpg"><img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 297px; height: 256px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuXJYki0Gtq037-JaUAFPexLgw5o6p0mKG1nv3gmyHgf0AoE-JlDlhg4oqdQSX_X_PRqGo_YWHxBsKj5f0c-Jc3UbFk1MI3-TpkZWBURW9HgDGjvEVcqnYA-7ct8A96jedamRVcBj4Ninr/s400/107999620_a09bbea78e.jpg" alt="" id="BLOGGER_PHOTO_ID_5385426814855457202" border="0" /></a><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br />This depicted content is fairly simple to see, but understanding what you're looking at is not that easy. Even if you know roughly what you're looking at, do you know what it's actually called?<br /><br />One tip is to group similar images together when you're classifying them. Also, always start by assembling as much information as possible before you begin to classify images. 
It is especially important to gather together the information you have from the creator or custodians of the images.<br /><br />Also important, when you have the luxury, is to get the image creator to add key metadata about the image at the point of creation, or soon after.<br /><br /><span style="font-weight: bold;">Focus and specificity</span><br /><br />Knowing what to include and what to exclude, what to mention and what to ignore, is also much harder than it sounds.<br /><br />Firstly, some image users will want a piece of depicted content tagged whenever it appears in an image, others will only want it tagged when the image shows a very good representation of that content, and of course many people will want something in between the two extremes.<br /><br />Different users have different requirements. You need to understand the domain in which you're working and see the classification of depicted image content as supporting the needs of your users.<br /><br />For example, Would you tag everything in this <a href="http://www.flickr.com/photos/puyo/253932597/">'Messy Room' image?</a><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhFKN4tCVIIjwTxtWaDvYyQDgPU2Gbt2fbkDQVLodAL2eN-W_AcjgMySQI48C5cf9TICogtLQq07qQsDU5pvGRZO6MEcTtOWS9_BpAC_0tdwqN0rMCtQeZDqH64nZEa_WrkCCq68kbAxCQn/s1600-h/253932597_a23322970f.jpg"><img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 306px; height: 229px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhFKN4tCVIIjwTxtWaDvYyQDgPU2Gbt2fbkDQVLodAL2eN-W_AcjgMySQI48C5cf9TICogtLQq07qQsDU5pvGRZO6MEcTtOWS9_BpAC_0tdwqN0rMCtQeZDqH64nZEa_WrkCCq68kbAxCQn/s400/253932597_a23322970f.jpg" alt="" id="BLOGGER_PHOTO_ID_5385427689013934274" border="0" /></a><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br />What would you miss out and why?<br /><br />Looking at the image of <a 
href="http://www.flickr.com/photos/thorne-enterprises/331263172/">"Mountain Goats", from Thorne Enterprises</a><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjdjJE0hc4ESh6xhRel0DKiSRGh2bWvT4BmYYuesnanWdlCGr6ZcKJhi4BH2RtPOIUsTsPUKvmGqw9ORDdGx66DgVeY0IFBnPZhatpyjPyUuaYvxJy6wwwlHGLeArWRWjIz8aJXl5GD8LT/s1600-h/331263172_91ba485c29.jpg"><img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 320px; height: 202px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjdjJE0hc4ESh6xhRel0DKiSRGh2bWvT4BmYYuesnanWdlCGr6ZcKJhi4BH2RtPOIUsTsPUKvmGqw9ORDdGx66DgVeY0IFBnPZhatpyjPyUuaYvxJy6wwwlHGLeArWRWjIz8aJXl5GD8LT/s320/331263172_91ba485c29.jpg" alt="" id="BLOGGER_PHOTO_ID_5385428171463971554" border="0" /></a><br />Would you tag this with goats as well as mountains? Would this be helpful?<br /><br /><br /><br /><br /><br /><br /><br /><br /><br />Let's look at four images depicting windows:<br /><br /><a href="http://www.flickr.com/photos/_dietrich/2402860858/">'Window to the World'?,</a><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmisbtWBVbQILKGHCAR9piAP_7kEcYny832YO67o6tQzQ3E5XW6y44Mku-Ol-3JjVhIU9Cn6bZsyOn9k-B7pBa-te54zsnyGPDH-P18QqREkFL5m8mSFa3mDtdePYQobSJaUudUlKErqXf/s1600-h/2402860858_92e68af42a.jpg"><img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 209px; height: 320px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmisbtWBVbQILKGHCAR9piAP_7kEcYny832YO67o6tQzQ3E5XW6y44Mku-Ol-3JjVhIU9Cn6bZsyOn9k-B7pBa-te54zsnyGPDH-P18QqREkFL5m8mSFa3mDtdePYQobSJaUudUlKErqXf/s320/2402860858_92e68af42a.jpg" alt="" id="BLOGGER_PHOTO_ID_5385429006804415010" border="0" /></a><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><a 
href="http://www.flickr.com/photos/franciscoantunes/2200581990/">Portu</a><a href="http://www.flickr.com/photos/franciscoantunes/2200581990/">guese Window'?</a>, '<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQzuR-nVrCnsaS-L8ka_i7Q2yhHvLlTvJm-hmzCobsSoFU92IlDI4QvFrphcsmJRoUNdZxjY3qzMaU6jPWaBBAQKqoL_GpX1bUlMw0PFzczghPZznP66W2NBSjVWaAW2pHp6hwGomM52du/s1600-h/2200581990_d5816cb474.jpg"><img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 320px; height: 221px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQzuR-nVrCnsaS-L8ka_i7Q2yhHvLlTvJm-hmzCobsSoFU92IlDI4QvFrphcsmJRoUNdZxjY3qzMaU6jPWaBBAQKqoL_GpX1bUlMw0PFzczghPZznP66W2NBSjVWaAW2pHp6hwGomM52du/s320/2200581990_d5816cb474.jpg" alt="" id="BLOGGER_PHOTO_ID_5385429567976099010" border="0" /></a><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><a href="http://www.flickr.com/photos/redfishid/3142704373/">What Light Through Yonder Window Breaks'?</a><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinZYfqXl5jquB_GCCt_MQ6E9bgB3EI-tyYxBRO1s-SOzAdTjQVYYKEMVV6_BpUSsetnqCl6mNexoLm2BHqMXJJ0QHXuCbOzZCoHgGhATvtW5vFqeDeoWSKKJb8LaIfNShfhEgnMI4d4LSz/s1600-h/3142704373_5312fdfae4.jpg"><img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 320px; height: 213px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinZYfqXl5jquB_GCCt_MQ6E9bgB3EI-tyYxBRO1s-SOzAdTjQVYYKEMVV6_BpUSsetnqCl6mNexoLm2BHqMXJJ0QHXuCbOzZCoHgGhATvtW5vFqeDeoWSKKJb8LaIfNShfhEgnMI4d4LSz/s320/3142704373_5312fdfae4.jpg" alt="" id="BLOGGER_PHOTO_ID_5385429851742037634" border="0" /></a><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br />and<br /><br /><a href="http://www.flickr.com/photos/franciscoantunes/2310858543/">'Window'.</a><br /><br 
/><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLgDf9pwqZPw26UHncNggGd1IAKbTBv5hREtGo6Bh6h0VEC3ZXfwjTQ7ZUJoRLgb1Ek63dMUAbPqPL7WF2INRZmhI5OavwM4SjeU5TKfjjwBVJQvD3_GCcuUJyjg33SgsRtWfMr0NjJ2Tq/s1600-h/2310858543_a137832c25.jpg"><img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 213px; height: 320px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLgDf9pwqZPw26UHncNggGd1IAKbTBv5hREtGo6Bh6h0VEC3ZXfwjTQ7ZUJoRLgb1Ek63dMUAbPqPL7WF2INRZmhI5OavwM4SjeU5TKfjjwBVJQvD3_GCcuUJyjg33SgsRtWfMr0NjJ2Tq/s320/2310858543_a137832c25.jpg" alt="" id="BLOGGER_PHOTO_ID_5385430227107792786" border="0" /></a><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br />Looking at these, it soon becomes clear that even deciding to apply a simple term like 'Windows' is not always easy.<br /><br />Would you apply 'Windows' to the image of the cat looking out of the window? Is a window actually depicted in that image? If the image wasn't tagged with 'Windows' how else would anyone find an image of a cat looking out of a window?<br /><br />The other three images show windows as parts of buildings, but is a building always depicted? Deciding when to apply a building type or the name of a building can be hard. Should you do this every time a part of a building is shown? Only when the whole building is shown? When enough of the building is visible? Or when a section of the building that to most people would represent the building is visible? For example, what part of the Empire State Building would you consider to depict that building? Rarely does anyone see it all - how much is enough? 
Would you treat the images of windows in a similar way and classify them all with a building type of 'Houses', or would you ignore the structure and focus on the parts - the window, the roof?<br /><br /><span style="font-weight: bold;">Consistency</span><br /><br />Achieving consistent application of terms to images revolves partly around clear term definitions, well-defined application rules and guidelines, and a robust quality assurance process.<br /><br />Term definitions are very important. Defining the meaning of a term, and ensuring the people choosing which term to assign understand that meaning, can be crucial to term application. For example, creating a term such as 'Bow' without defining its meaning is not going to make it easy to apply.<br /><br />Application rules that are well considered, thorough and clear are also very useful. Even a simple concept often needs some form of guidance linked to it. I remember a while ago needing two terms, 'Indoors' and 'Outdoors', to allow users to find images of people who were outside and inside - a simple concept you might think, one that people often need, and one that's easy to apply - who'd need guidelines for that? However, it soon became clear that guidelines were needed after I received a series of interesting questions: Is being on a train indoors? Should studio shots always be considered indoors? Does every shot of a person have to have indoors or outdoors assigned to it? If not, when should this term be used and when not? Is this a focus issue? If so, how much of a location needs to be seen before Indoors or Outdoors is used? A clear set of application guidelines followed an interesting meeting!<br /><br />Strong quality assurance processes are very valuable. People make mistakes and images generate interesting issues. 
Appointing staff to review a percentage of classification work based on clear guidelines, and then sharing findings with the people who assigned the terms to the images, is an important way of assessing how well the image classification is progressing and keeping a classification team synchronised.<br /><br />Today I’ve talked a lot about content depicted in images; next time I’ll focus on abstract concepts which are related to an image’s ‘aboutness’.<br /><br />This post first appeared in the <a href="http://www.synapticacentral.com/content/classifying-images-part-3-depicted-content">Synaptica Central blog</a>Ian Davishttp://www.blogger.com/profile/07525154101983512716noreply@blogger.com0tag:blogger.com,1999:blog-2436917853808889474.post-1308902676656358342009-09-25T16:00:00.001+01:002009-09-25T16:21:12.576+01:00Content Based Image Retrieval - Google and Similar Image SearchI was very interested to see Google experimenting with visual similarity in still images, what I usually call Content Based Image Retrieval or CBIR.<br /><br />Google Labs recently launched an image search function based on visual similarity - <a href="http://similar-images.googlelabs.com/">Google Similar Images</a>. This new offering allows searchers to start with an initial image and then find other images that look like their example picture.<br /><br />I've been reviewing these types of systems on and off since the early '90s. They've always offered much, but I never saw any evidence that the delivery matched the hype.<br /><br />I've always found that using pictures instead of text to find images works best on simple 2D images: carpet patterns, trademarks, simple shapes, colours and textures. Finding objects in images was always a struggle, and looking for abstract concepts (fear, excitement, gloom, isolation, solitude) was never more than a vague possibility. 
Over the years a lot of work has been done in this area, and the search results I've seen have started to improve, but this technology is still young, and in my personal opinion still rarely delivers what most users want, need and expect.<br /><br />Looking at Google Similar Images, I wonder how much of the back-end is pure content based image retrieval (CBIR), how much is using metadata in some way, and how the two are interacting. One thing that often appears to help produce a tight first page of results is simply pulling the same image from different sites. I also noticed that the 'similar images' option is not available for all images, which makes me wonder why. Have some images been processed in ways that others haven't?<br /><br />Diving right into the <a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5EA-DRI2gQCokfayTtOW4WoV2znKIlFqmFz2ZbSLZNQHNqeslPgk2zqurRLSzmE2Y43gl8_jnVDLUJaPYKney9CY0uaX02q2k7DkeTUyL5RY3cVBmH-94rFEHDS3KYviSCVOUxdkMJ1HM/s1600-h/Google-Similar-5.preview.JPG"><img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 331px; height: 247px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5EA-DRI2gQCokfayTtOW4WoV2znKIlFqmFz2ZbSLZNQHNqeslPgk2zqurRLSzmE2Y43gl8_jnVDLUJaPYKney9CY0uaX02q2k7DkeTUyL5RY3cVBmH-94rFEHDS3KYviSCVOUxdkMJ1HM/s320/Google-Similar-5.preview.JPG" alt="" id="BLOGGER_PHOTO_ID_5385421184113544066" border="0" /></a> experience, I entered a query for a place in the UK and didn't see any image results with the 'Similar Images' option. 
I wonder whether this is to do with the presence of the results on UK websites?<br /><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjo0Z4OgsIFGTh4L-WwN_zP9T49f-_joFKoqEjXj8rnbJGqSxZyKkgUTXeQ30VSsvB-OLxuMpDgxFc1phXZHa9Enmkvy-CwlJeqNExqMGaskyrIWzvI0k9GD-9ZQYtqdJn5d6h4FiPbypbk/s1600-h/Google-Simiar-1_0.preview.JPG"><img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 330px; height: 247px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjo0Z4OgsIFGTh4L-WwN_zP9T49f-_joFKoqEjXj8rnbJGqSxZyKkgUTXeQ30VSsvB-OLxuMpDgxFc1phXZHa9Enmkvy-CwlJeqNExqMGaskyrIWzvI0k9GD-9ZQYtqdJn5d6h4FiPbypbk/s400/Google-Simiar-1_0.preview.JPG" alt="" id="BLOGGER_PHOTO_ID_5385422214934932562" border="0" /></a><br /><br /><br /><br />I persevered, and found some interesting images and got some interesting results.<br /><br />I started with a fairly standard image of a beach scene, always a favourite with testers. As you can see I got a pretty good first screen back. However, the 5th and 6th image on the top row show no sea or beach, neither do the first three images on the second row.<br /><br />I moved on to an image of what looks like equipment at the top of a pole.<br /><br />The results were much more mixed: studio shots of objects, fighting people, trucks etc. 
No images were returned that I would consider similar to the example picture.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjDkFd8-AQK1vPTODycMvElMEu-r03gygxnCS1NpzSnw27fOdslQaazZgit2Mbw4Vur_B6uYWnZnIEWnF6Oi0A-2tJvxvwYvPvRP5aZWbo-5d8G7jBk2-Bm_cjKdb4lcOMIO2hNhL16Hhh/s1600-h/Google-Search-2.preview.JPG"><img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 328px; height: 246px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjDkFd8-AQK1vPTODycMvElMEu-r03gygxnCS1NpzSnw27fOdslQaazZgit2Mbw4Vur_B6uYWnZnIEWnF6Oi0A-2tJvxvwYvPvRP5aZWbo-5d8G7jBk2-Bm_cjKdb4lcOMIO2hNhL16Hhh/s400/Google-Search-2.preview.JPG" alt="" id="BLOGGER_PHOTO_ID_5385422597087859090" border="0" /></a><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><div style="text-align: left;"><br /><div style="text-align: left;"><br />Interesting results came from a similarity query on a clock face. 
A couple of the first results hit the mark, then the results set degenerated into image similarity based more on the colour and the black background than anything else.<br /></div></div><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwabvXxCuNjk16bv4h9Q-NcufUSZ6l7TV7RQRr1HbF0CXx-NnD1ep6b_ryGcKKZgRFFXLchncTi4udJtzPzRvk3HKP8qPiz3G-RitcR5VecoqYCGgYMYZFF7oJDrurMcTer1JeIhaSAytM/s1600-h/Google-Search3.preview.JPG"><img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 322px; height: 241px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwabvXxCuNjk16bv4h9Q-NcufUSZ6l7TV7RQRr1HbF0CXx-NnD1ep6b_ryGcKKZgRFFXLchncTi4udJtzPzRvk3HKP8qPiz3G-RitcR5VecoqYCGgYMYZFF7oJDrurMcTer1JeIhaSAytM/s400/Google-Search3.preview.JPG" alt="" id="BLOGGER_PHOTO_ID_5385422952752324306" border="0" /></a><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br />My last attempt, before morning coffee called, was an image of a country road. I was hoping that the clear roadway might produce a pretty precise results set. 
However, I was a little disappointed by what I saw.<br /><br />The first results page only produced one vague road on the bottom row, with most of the similarity seemingly related to colours instead of objects.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhX1iwUpXOMiizz_R7MYcIkqJmMkkiJzcLpFcqpdtMtWnKQfxLhvOBEtPRxndZ5P6fEpiDNNGcFgSeiwW93cVdiHeOdq8S-JQ3EmVvtBWuNWshPMZMEZUa90-oWKVhjjEWavALnlic2whQE/s1600-h/google-search-4.preview.JPG"><img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 319px; height: 239px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhX1iwUpXOMiizz_R7MYcIkqJmMkkiJzcLpFcqpdtMtWnKQfxLhvOBEtPRxndZ5P6fEpiDNNGcFgSeiwW93cVdiHeOdq8S-JQ3EmVvtBWuNWshPMZMEZUa90-oWKVhjjEWavALnlic2whQE/s400/google-search-4.preview.JPG" alt="" id="BLOGGER_PHOTO_ID_5385423289816296546" border="0" /></a><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br />From my less than scientific dip into this Google Labs offering, it looks like the highlighted images on the Google Similar Images home page produce good results - better results than I've seen other systems come up with. Many other image queries are sure to also produce results which may well impress. However, many of the results I saw did not match the initial level of accuracy I saw from the highlighted home page pictures.<br /><br />I don't want to be picky, this is still a prototype after all, and well done to Google for introducing a wider audience to this type of image search. 
Hopefully, after more work, the results will increasingly make more sense to people, the access points offered to depicted content and conceptual aboutness will improve and more images will be more findable for more people.<br /><br />Until that time, visual search without text will help with image findability, but text, metadata, and controlled vocabulary applied to images by people are for me still king, and will continue to offer the widest and deepest access to images for a long time to come.<br /><br />Ian<br /><br />This post first appeared on the <a href="http://www.synapticacentral.com/content/content-based-image-retrieval-google-and-similar-image-search">Synaptica Central Blog</a>Ian Davishttp://www.blogger.com/profile/07525154101983512716noreply@blogger.com0tag:blogger.com,1999:blog-2436917853808889474.post-76643939849962747332009-09-25T15:58:00.000+01:002009-09-25T16:00:08.054+01:00VideoSurf - a new way to search for video?If you have been keeping up with my posts on this blog you won't be surprised to learn that today I spent my lunch hour exploring a video search offering that's new to me called VideoSurf. I was so interested in this new search tool that I interrupted my usual run of image indexing articles, and my lunch hour, to do some research and write up this post.<br /><br />In a September press release VideoSurf claimed its computers can now "see inside videos to understand and analyze the content." 
I would encourage anyone who has an interest in this area to take a look at the company's website, give it a whirl and see what they think.<br /><br />In my experience, video search engines have relied on a combination of the metadata that is linked to the video clips, scene and key frame analysis, and automatic indexing of sound tracks synched with the video.<br /><br />For example, sound tracks synchronised to video content can be transformed into text and indexed. The indexed text can then be linked to sections of a video, with scenes identified by looking for gaps in the video, and with various techniques used to create key frames that attempt to represent each scene. These techniques are backed up with metadata to accompany a video clip.<br /><br />If you have worked in the industry you know that video metadata is expensive to create. Most of what people see online is either harvested for free from other sources, or limited in size and scope. Such metadata may cover the title of a video clip, text describing the clip, clip length, etc. It may even include some information about the depicted content in the video or even abstract concepts which try to specify what a clip is about. Though this level of video metadata is the most time-consuming and complex to create, it also offers the fullest level of access for users.<br /><br />Audio tracks can also be of great use, and many information needs can be met by searching on audio in a video. There are, however, limitations. For example, many VERY SCARY scenes have little dialogue in them, and depend heavily on camera-work and music to give the feeling of fear. How easy is it to find these scenes based on dialogue alone, or even based on 'seeing inside a video'? How can you look for 'fear' as a concept?<br /><br />Content based image retrieval, looking at textures, basic shapes, and colours in still images, has yet to offer the promised revolution in image indexing and retrieval. 
In some contexts it works quite well; in many contexts end-users don't really see how it works at all. So adding a layer to video search that tries to analyse the actual content, pixel for pixel, is an interesting development.<br /><br />To my mind, a full set of access paths to all the layers of a video still demands the use of fairly extensive metadata, especially for depicted content and abstract concepts. Up to now, metadata has always been the way to find what an image, whether it's still or moving, is conceptually about, and what can be seen in individual images and videos, even when that metadata is actually sound, turned into text and stored in a database.<br /><br />Is VideoSurf's offering really any different from what's gone before?<br /><br />Is this system, which seems to be using Content-Based Image Retrieval (CBIR) technology to some extent, a significant advance?<br /><br />Reviewing some of the blog posts people have published, it seems many others are interested in VideoSurf's offering as well.<br /><br />For an initial idea as to how VideoSurf works, try taking a look at James McQuivey's OmniVideo blog post, "Video search, are we there yet?". As James describes in the article, one pretty neat aspect of what VideoSurf can do is to match faces, enabling you to look for the same face in different videos, thus reducing the need to have the depicted person mentioned in the metadata exclusively. However, this clearly isn't much help if the person you're looking for is mentioned but not depicted, in which case indexed audio would help, or if the person is not well depicted, for example when the person is only shown from the side or the back. Quibbles aside, if this works, then this is a pretty useful function in itself.<br /><br />Here are some of the other bloggers who have been writing their thoughts on VideoSurf. 
For example:<br /><br /> * An interesting post on this subject from the Rhondda's Reflections blog on Searching for videos with VideoSurf<br /> * Phil Bradley comments on his Weblog on the VideoSurf Video Search<br /> * And one of the best current reviews of VideoSurf that I've found comes from Chris Sherman at SearchEngineLand.<br /><br />Clearly, we're on the right track and there is a lot of interest in the opportunities and technologies around video search. However, I think that there is a long way to go before detailed and automatic object recognition is of any meaningful use to people. As far as I can see, it's still not there with still or moving digital images. Metadata for me is still the 'king' of visual search. There are, however, a growing number of needs that automatic solutions can already resolve, and a growing case for solutions that work by offering a combination of automatic computer recognition of image elements, metadata schemes and controlled vocabulary search and browse support.<br /><br />I'd love to know what people think about VideoSurf and other services that provide video search.<br /><br />Ian<br /><br />This post first appeared at the <a href="http://www.synapticacentral.com/content/videosurf-new-way-search-video">Synaptica Central blog</a>Ian Davishttp://www.blogger.com/profile/07525154101983512716noreply@blogger.com0tag:blogger.com,1999:blog-2436917853808889474.post-68791637471844664362009-09-25T15:55:00.000+01:002009-09-25T15:58:01.696+01:00Classifying Images Part 2: Basic AttributesI've already asked the question "What is the Hardest Content to Classify?" 
and promised additional posts on the subject based on my background of 13 years developing taxonomy and indexing solutions for still image libraries, so I am continuing my thoughts in this post, focusing on the basic attributes of image classification.<br /><br />In my opinion, images are the hardest content items to classify, but luckily for sanity's sake not all image classification is equally demanding.<br /><br />The easiest elements of image classification relate to what I'm going to call image attributes metadata. This area, for me, covers all the metadata about the image files themselves, rather than information describing what is depicted in images and what images are about.<br /><br />Metadata aspects in this area cover many things and there are also layers to consider:<br /><br />1, The original object<br />-- This could be a statue, an oil painting, a glass plate negative, a digital original, or a photographic print<br /><br />2, The second generation images<br />-- The archive image taken of the original object, plus any further images, cut-down image files, screen sizes, thumbnails, images in different formats, JPEG, TIFF, etc.<br /><br />The first thing to think about is the need to create a full and useful metadata scheme, capturing everything you need to know to support what you need to do. This may be to support archiving and/or search and retrieval.<br /><br />Then look at what data you may already have or can obtain. Analyse data for accuracy and completeness and use whatever you can. Look to the new generation of digital cameras to obtain metadata from them. 
Ask image creators to create basic attribute data at the time of creation.<br /><br />You'll be interested in the following metadata types:<br /><br />- Scanner types<br />- Image processing activities<br />- Creator names<br />- Creator dates<br />- Last modified names<br />- Last modified dates<br />- Image sizes and formats<br />- Creator roles - photographers, artists, sculptors<br />- Locations of original objects<br />- Locations at which second generation images were created<br />- Unique image ID numbers and batch numbers<br />- Secondary image codes that may come from various legacy systems<br />- Techniques used in the images - grain, blur, etc.<br />- Whether the images are part of a series and where they fit in that series<br />- The type of image - photographic print, glass plate negative, colour images, black and white images<br /><br />This data really gives you a lot of background on the original and on the various second generation images created during production. Much of this data can be obtained freely or cheaply, and lots of it will be quick and easy to grab and enter into your systems. It should also be objective and easy to check.<br /><br />My next post will cover dealing with depicted content in images. Please feel free to leave comments or questions on the subject.<br /><br />This post first appeared on the <a href="http://www.synapticacentral.com/content/classifying-images-part-2-basic-attributes">Synaptica Central blog</a>Ian Davishttp://www.blogger.com/profile/07525154101983512716noreply@blogger.com0tag:blogger.com,1999:blog-2436917853808889474.post-50804152124267960452009-09-25T15:52:00.000+01:002009-09-25T15:55:35.814+01:00What is the Hardest Content to Classify?A topic that came to mind, as I thought about things to blog about, is the whole area of classification of different types of content: text, sound, video and images.<br /><br />I often speak to clients who have a range of item types stored in a number of repositories. 
They're often looking to classify new content, or to work on older content in order to improve its findability. They are always looking to get more value from their content.<br /><br />In these circumstances a content audit is often called for, to answer the 'What do you have?' question. This then leads to a general discussion of the content types and the ways in which they can be classified, usually using a controlled vocabulary applied by a machine, by a person, or by a mixture of the two.<br /><br />One thing that often makes people ask me questions is my fairly frequent assertion that images are easily the hardest item types to deal with.<br /><br />Why are Images the Hardest Content to Classify?<br /><br />-Textual items contain text. Use of auto-categorising software, free text storage and access, etc. makes organising and finding textual items relatively easy.<br /><br />-Sound can be digitised and turned into text.<br /><br />-Video often has an audio track that can be turned into text too. Computers can be used to identify scenes. Breaking a video into scenes and linking a synched and indexed soundtrack to them can provide pretty good access for many people - (though there's a whole blog post on the many access points to video that these processes don't provide).<br /><br />Images, on the other hand, have no text and no scenes; all you have are individual images, with the meaning and access points held in the visuals.<br /><br />Some will say that this is really not a problem: all you need to do is use content based image retrieval software to identify colours, textures and shapes in your images, and you'll soon be searching for images without any manual indexing. 
However, whilst this technology is promising, it leaves a lot to be desired.<br /><br />Today, the way to provide a wide and deep level of access to still images continues to be by using people to view images, write captions and assign keywords or tags to each image based on image 'depictions' and 'aboutness and attributes'. This manual process often requires the use of a controlled vocabulary to improve consistency and application.<br /><br />However, how this indexing is done, and what structures support it, will be the subject of further posts. I just wanted to get my thoughts out there!<br /><br />So stay tuned.<br /><br />Ian<br /><br />This blog post first appeared on the <a href="http://www.synapticacentral.com/content/what-hardest-content-classify">Synaptica Central blog</a>Ian Davishttp://www.blogger.com/profile/07525154101983512716noreply@blogger.com0tag:blogger.com,1999:blog-2436917853808889474.post-170235025081400172009-09-25T14:59:00.000+01:002009-09-25T15:51:59.104+01:00Author Spotlight: Ian DavisMy name is Ian Davis, and I'm a Global Project Delivery Manager working in the Dow Jones Client Solutions Taxonomy Delivery Team and based in our London office. I work to develop and deliver a range of content and information solutions for our global clients. Projects can include discovery assessments, taxonomy strategy and creation, taxonomy mapping, search support, information architecture and website development. 
I also assist in the marketing and deployment of the website <a href="http://www.taxonomywarehouse.com/">www.taxonomywarehouse.com</a><br /><br />My particular areas of interest include: developing taxonomies, thesauri, and metadata schemas, manual and automated indexing of still and moving images, deploying and using Synaptica controlled vocabulary software, the challenges of managing teams of geographically dispersed information workers, website creation and development, and the localisation of content into multi-lingual environments.<br /><br />I joined Dow Jones in February 2006, after 13 years developing taxonomy and indexing solutions for still image libraries at both Corbis Corporation and Photonica (formerly part of Amana Japan and now part of Getty Images).<br /><br />At Corbis, I served as head of the UK division’s image cataloguing department.<br /><br />At Photonica, I worked to create and implement the e-commerce website www.iconica.com and was responsible for the development of www.photonica.com. I also developed, implemented and maintained all vocabularies underpinning the classification and retrieval of Photonica's extensive digital image content. One aspect of this included creating an extensive English language thesaurus and managing the localisation of that controlled vocabulary into five European languages. I managed a team of ten still image indexers and five thesaurus developers.<br /><br />After leaving Photonica, I worked as an independent consultant for BUPA in the area of metadata and taxonomy creation and development, and the implementation of an enterprise search solution.<br /><br />Most of my time is currently spent working on the delivery of a range of client engagements outside the Americas. I manage a team of geographically dispersed staff who are working on the customisation of large topical thesauri and the creation of various browsable taxonomies. 
We also create multi-lingual thesauri.<br /><br />This post first featured on the <a href="http://www.synapticacentral.com/content/author-spotlight-ian-davis-global-project-delivery-manager-taxonomy-delivery-team">Synaptica Central blog</a>Ian Davishttp://www.blogger.com/profile/07525154101983512716noreply@blogger.com0