Thursday, 1 July 2010

Musings on E-Commerce Retail Metadata

Retail-based e-commerce, buying and selling products online, is a growing part of all our lives. Whether purchasing the latest DVD movie, ordering a part for your hand-built PC, or just treating yourself to the odd novel, using retail websites is something we all do day in, day out.

The business of creating operational e-commerce sites is a complex one. From relationships with data vendors and product suppliers, to product data, supply chain and pricing issues, to website navigation, search, and item ordering and fulfillment, there is a lot to consider. A great deal of work goes into creating and running the sites we use every day, but how much do most people know about what goes on behind the scenes? As the swan glides smoothly across the water, how much frantic paddling is really taking place below the surface? I thought I'd take a little time to introduce a few key areas, giving a flavour of some day-to-day e-commerce retail issues and challenges.

Product Data
Metadata is a major issue for e-commerce retail websites. One of the clearest divisions many people will see on their favourite sites is between descriptive metadata and promotional metadata. Companies often maintain a division between these two data types. Different people may create them, in different systems and for different reasons. Different types of metadata may be used in different ways and often interact in specific ways with each website’s search and browse functionality.

Descriptive metadata is usually linked to products in order to describe them - either for the benefit of people, IT systems or both. Taking books as an example, their product attributes can be many and varied including: title names, author names, publisher names, languages, prices, weights, number of pages, genres or subjects.

Take a close look at any retail site and you'll start to see the metadata. You'll soon notice how much of it there is, and there's usually more of it behind the scenes than there is in the public-facing website pages.

How do companies deal with all this data sloshing around their web businesses?

One key concern is the need to decide on what metadata is controlled - how and why, and what metadata is not controlled. It's perfectly reasonable to have a number of free text fields, in combination with data fields that are semi-controlled and other fields which only contain pre-determined values.

For example,

Free text fields can be populated using agreed editorial guidelines. These may cover short and long descriptions of products, product reviews by users, publishers, or newspapers.

Semi-controlled fields will be constructed under tighter guidelines. These fields often include titles or subtitles.

Controlled named entity fields are a big part of the equation. These deal with the proper names of people - authors or illustrators, organisations - such as publishers, and events etc. These metadata fields are populated with controlled names taken from authority files.

Another key set of controlled vocabulary fields covers: subjects or genres, types - books, CDs etc, formats - paperbacks, or hardbacks, and audiences - children, students, teens. Values for these fields are often created in thesauri with preferred terms being linked to variant forms or synonyms, hierarchical relationships between preferred terms, and related links across them to take people from a subject like Dogs, to the related subject of Pets.
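To make the thesaurus idea a little more concrete, here is a minimal sketch in Python of how preferred terms, variant forms, hierarchical links and related links might hang together. It is only an illustration: the class names, the 'Animals' parent term and the 'Canines' variant are my own assumptions, and only the Dogs-to-Pets related link comes from the example above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term in a small, hypothetical thesaurus."""
    preferred: str
    variants: set[str] = field(default_factory=set)  # synonyms and variant forms
    broader: str | None = None                       # parent preferred term, if any
    related: set[str] = field(default_factory=set)   # links across to related terms


class Thesaurus:
    def __init__(self) -> None:
        self.terms: dict[str, Term] = {}
        self._lookup: dict[str, str] = {}  # lower-cased label -> preferred term

    def add(self, term: Term) -> None:
        """Register a term and index its preferred and variant labels."""
        self.terms[term.preferred] = term
        for label in {term.preferred, *term.variants}:
            self._lookup[label.lower()] = term.preferred

    def resolve(self, label: str) -> str | None:
        """Return the preferred term for any known label, else None."""
        return self._lookup.get(label.strip().lower())

    def narrower(self, preferred: str) -> list[str]:
        """Preferred terms whose broader term is the one given."""
        return sorted(t.preferred for t in self.terms.values() if t.broader == preferred)


thesaurus = Thesaurus()
thesaurus.add(Term("Animals"))  # assumed parent term, for illustration only
thesaurus.add(Term("Pets", related={"Dogs"}))
thesaurus.add(Term("Dogs", variants={"Dog", "Canines"}, broader="Animals", related={"Pets"}))

print(thesaurus.resolve("canines"))      # variant form leads to the preferred term
print(thesaurus.narrower("Animals"))     # hierarchical relationship
print(thesaurus.terms["Dogs"].related)   # related link from Dogs to Pets
```

Keeping the label lookup separate from the term records means a website can display whichever label suits it, while search and browse always work from the preferred term.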

A moot point is how much control should exist behind promotional metadata versus product descriptive metadata?

I've always taken the view that usually getting the ‘best of both’ is ideal. In a fast paced area - promoting products to customers, much of the promotional metadata needs to be loosely controlled in terms of how it is created and used. Staff should be given freedom to respond to market conditions and use their initiative. However, tighter controls are often needed in terms of who creates promotional metadata. This helps to achieve a broad consistency – giving customers a framework to work within.

Customers visiting e-commerce websites are often looking for something new, and fast moving promotional metadata assists them with this need. However, customers also like to know how to get to regular promotions relating to fairly consistent products e.g. ‘CD Friday – 10% off the top 20’. Consistency comforts and reassures, unpredictability excites and enlivens – a mix can give the best result.

How is all this descriptive and promotional metadata created?

For descriptive metadata -

* The needs of each product type, core customers, and the business are assessed and a list of metadata fields created. For example, a field may be created to cover the concept of product genres. The preferred name of the field may internally be ‘Subjects’, while its website display name may be ‘Genres’.
* Controlled vocabulary terms, needed for controlled vocabulary fields, are created to support product descriptions. These terms are maintained by staff and assigned to products. For example, current genre terms are gathered and reviewed. They are approved as they are, modified, or removed. Additional controlled vocabulary is created and maintained to support customer needs.
* Some metadata fields will be populated from third-party vendors; others will have data manually entered into them as free text, semi-controlled text, named entities or controlled vocabulary terms. For example, subject entries are applied to products through mapping from vendors or entry by staff.
* Governance structures are created and maintained, including descriptions outlining the data entered into each field, and guidelines on where the data comes from, how it is entered and who enters it. For example, rules are often written explaining why a controlled vocabulary focused on subjects is needed, why the internal name is ‘Subjects’ and the external name is ‘Genres’, and how terms are created, maintained and deleted. The rules also cover what relationships exist between these terms and between related terms in other areas. Relationships may be hierarchical (Broad term > Narrow terms) or associative links between related concepts. Other questions usually include how a vocabulary is used by other systems, who has the right to request additions and deletions, and who has the final say.

Vocabulary development

The creation of efficient and effective controlled vocabularies - taxonomies, thesauri and ontologies - whether they support browsing or retrieval, is a controlled process based on ongoing assessment and review. It is not something that is done quickly or with little thought. Proper steps are taken to fully and effectively create and modify the necessary vocabulary types. Essentially, it is possible to create and develop a consistent and logical set of data structures behind the scenes, upon which much can be effectively built, whilst ensuring flexibility as to how data elements and the relationships between them can be displayed on websites.

For promotional metadata -

Guidelines are usually written outlining the role of promotional metadata and the ways in which it promotes sales. These guidelines describe and control the process of creating, modifying and removing promotional metadata. Specific staff create, modify and delete promotional metadata on a daily basis. The effectiveness of promotional metadata in generating product sales is analysed and changes made as needed. Effective promotional metadata is often defined as promotional data that sells more products. Ineffective promotional metadata is conversely defined as that which sells fewer products, reduces or damages the sales experience, or results in potential customers not buying and moving to rival websites.

A simple example of one possible guideline may be to ensure that in the promotional area of a website, promotional items with a short duration are always at the top of the display, whilst those with a longer duration are lower down the display. People need to see very time restricted offerings first, but like to know where longer sales promotions can easily be found.

For example

Unstructured view:

• Buy Dr Who DVDs
• 2 for 3 on CDs
• Latest Disney blu-rays
• Magic Monday – special offers
• 6 hour speed sale – click now

Structured view:

• 6 hour speed sale

• Magic Monday – special offers

• 2 for 3 on CDs

• Buy Dr Who DVDs
• Latest Disney blu-rays
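
The guideline above, shortest-running offers first, boils down to a simple sort. Here is a rough Python sketch of it. The `Promotion` class and the durations are my own assumptions for illustration; only the promotion titles come from the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Promotion:
    title: str
    duration_hours: float  # hypothetical field: how long the offer runs


def structured_view(promotions: list[Promotion]) -> list[str]:
    """Order promotions so the most time-restricted offers appear first."""
    # sorted() is stable, so promotions of equal duration keep their entry order.
    return [p.title for p in sorted(promotions, key=lambda p: p.duration_hours)]


# Durations below are invented purely to show the ordering.
promotions = [
    Promotion("Buy Dr Who DVDs", 24 * 90),
    Promotion("2 for 3 on CDs", 24 * 14),
    Promotion("Latest Disney blu-rays", 24 * 90),
    Promotion("Magic Monday – special offers", 24),
    Promotion("6 hour speed sale", 6),
]

order = structured_view(promotions)
for title in order:
    print("•", title)
```

Because the sort is stable, the two long-running promotions stay in the order staff entered them, which leaves room for editorial judgement within each group.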

The challenges e-commerce sites face are many and varied. These include:

Data mapping from vendors:
* Reviewing mapping tables.
* Documenting data needs.
* Analysing data vendor metadata for breadth, depth and accuracy.
* Negotiating with vendors regarding additional information or fixing data feed issues.
* Modifying mapping tables – changing current mappings, adding additional ones, and creating new vocabularies to map to.
* Testing and releasing updates.
* Agreeing and implementing governance rules and guidelines.
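
As a flavour of what a mapping table does in practice, here is a small Python sketch that maps vendor subject codes to internal 'Subjects' terms and collects any codes that have no mapping yet, ready for review. The vendor codes, the internal terms and the record layout are all hypothetical; real vendor feeds will look different.

```python
from __future__ import annotations

# Hypothetical mapping table: vendor subject code -> internal 'Subjects' term.
MAPPING = {
    "V-CRIME": "Crime Fiction",
    "V-SF": "Science Fiction",
    "V-PETS": "Pets",
}


def map_subjects(
    records: list[dict], mapping: dict[str, str]
) -> tuple[dict[str, list[str]], set[str]]:
    """Return (product id -> internal subjects, vendor codes with no mapping)."""
    mapped: dict[str, list[str]] = {}
    unmapped: set[str] = set()
    for record in records:
        subjects: list[str] = []
        for code in record["vendor_subjects"]:
            term = mapping.get(code)
            if term is None:
                unmapped.add(code)      # flag for the next mapping table review
            elif term not in subjects:
                subjects.append(term)   # avoid duplicate subject entries
        mapped[record["id"]] = subjects
    return mapped, unmapped


# Invented sample feed, for illustration only.
feed = [
    {"id": "book-1", "vendor_subjects": ["V-CRIME", "V-HIST"]},
    {"id": "book-2", "vendor_subjects": ["V-PETS", "V-PETS"]},
]

mapped, unmapped = map_subjects(feed, MAPPING)
print(mapped)
print(unmapped)
```

The list of unmapped codes is the useful by-product: it feeds directly into the 'modifying mapping tables' task above.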

Named entity enhancements:
* Reviewing named entity files for data accuracy, breadth and depth.
* Identifying problems with current data and possible problems with the addition of data – either from vendors or through manual entry.
* Fixing vendor or data entry issues relating to new data.
* Cleaning metadata previously entered.
* Identifying named entities with more than one alternative name. For example, ‘Arthur Conan Doyle’, ‘Conan Doyle, Arthur’, ‘Conan Doyle’, ‘Doyle, Conan’, ‘Arthur Conan-Doyle’, etc.

Data cleansing tasks to fix these kinds of problems would include: identifying the named entities, choosing a preferred name based on guidelines and creating a data structure allowing the creation of a number of alternative names, which would be linked as synonyms to the preferred name. A newly cleansed vocabulary of preferred names, related to a wide range of synonyms, would assist greatly with data retrieval.
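
A minimal Python sketch of that data structure might look like the following. I've assumed ‘Arthur Conan Doyle’ as the preferred name purely for illustration (real guidelines might well choose an inverted form), and the function names and normalisation rules are my own.

```python
from __future__ import annotations

# Hypothetical authority file: preferred name -> alternative names (synonyms).
AUTHORITY = {
    "Arthur Conan Doyle": [
        "Conan Doyle, Arthur",
        "Conan Doyle",
        "Doyle, Conan",
        "Arthur Conan-Doyle",
    ],
}


def normalise(name: str) -> str:
    """Lower-case, turn hyphens into spaces and collapse whitespace."""
    return " ".join(name.lower().replace("-", " ").split())


def build_authority_index(authority: dict[str, list[str]]) -> dict[str, str]:
    """Map every normalised name form to its preferred name."""
    index: dict[str, str] = {}
    for preferred, alternatives in authority.items():
        for name in [preferred, *alternatives]:
            index[normalise(name)] = preferred
    return index


def preferred_name(name: str, index: dict[str, str]) -> str | None:
    """Look up the preferred name for any known form, else None."""
    return index.get(normalise(name))


index = build_authority_index(AUTHORITY)
print(preferred_name("Doyle, Conan", index))
print(preferred_name("arthur  CONAN-doyle", index))
print(preferred_name("Agatha Christie", index))  # not in this authority file
```

A search for any of the alternative forms can then be pointed at products tagged with the preferred name.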

Search Support:
* Reviewing and extending search synonyms.
* Reviewing search metrics: zero hits, few hits, too many hits, searches with low product views, searches with low basket conversion rates.
* Directing the results of each search review into enhanced metadata creation, product descriptions, search effectiveness (e.g. stop words review and updating) and website usability.
* Agreeing and implementing governance rules and guidelines.
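
The search metrics review can be sketched as a simple flagging exercise over search log summaries. The Python below is illustrative only: the `SearchStat` fields and every threshold are assumptions of mine, and each site would need to set its own.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SearchStat:
    query: str
    searches: int       # times the query was run
    hits: int           # number of results returned
    product_views: int  # searches that led to a product page view
    basket_adds: int    # searches that led to a basket addition


def review_flags(
    stat: SearchStat,
    few_hits: int = 3,              # assumed threshold
    too_many_hits: int = 500,       # assumed threshold
    low_view_rate: float = 0.2,     # assumed threshold
    low_basket_rate: float = 0.02,  # assumed threshold
) -> list[str]:
    """Return the review categories a search falls into."""
    if stat.hits == 0:
        return ["zero hits"]  # nothing to view or buy, so skip the rate checks
    flags: list[str] = []
    if stat.hits <= few_hits:
        flags.append("few hits")
    elif stat.hits >= too_many_hits:
        flags.append("too many hits")
    if stat.searches > 0:
        if stat.product_views / stat.searches < low_view_rate:
            flags.append("low product views")
        if stat.basket_adds / stat.searches < low_basket_rate:
            flags.append("low basket conversion")
    return flags


# Invented figures, for illustration only.
stats = [
    SearchStat("sherlok holmes", searches=120, hits=0, product_views=0, basket_adds=0),
    SearchStat("dvd", searches=1000, hits=8000, product_views=150, basket_adds=10),
    SearchStat("conan doyle", searches=200, hits=40, product_views=120, basket_adds=12),
]

for stat in stats:
    print(stat.query, "->", review_flags(stat))
```

A misspelt query with zero hits points towards search synonyms; a broad query with thousands of hits and few product views points towards facets and filters.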

Browsing Assistance:
* Creating and displaying consistent and intuitive facets describing and promoting products.
* Creating useful divisions between descriptive and promotional categories.
* Creating processes to manage the maintenance of these divisions, the ways in which both are created and developed and the ways in which metadata in back-end systems interacts with metadata displayed on public facing websites.
* Agreeing and implementing governance rules and guidelines.

Luckily, all of these challenges can be dealt with and minimised by employing the ongoing professional services of staff or consultants adept at using data analysis, metadata modelling, taxonomy, thesaurus and ontology creation and mapping to support content description and findability. When these skills are combined with current state analysis and key task analysis, and supported by the best in usability, wonderful things can be achieved.


Friday, 30 April 2010

Juice Based Findability

I recently returned from an e-commerce assessment project in Cape Town. The project went well, and the client was absolutely wonderful - very welcoming and extremely keen to strengthen the asset categorisation of their products and the search and browse support they offer.

My stay was extended somewhat by the antics of an Icelandic volcano - yes me too. While I was on 'volcation' I enjoyed a number of visits to the hotel's 'full breakfast buffet'. Sitting there, sipping my coffee, I received a lesson in 'Juice Based Findability' - bear with me, it will make sense soon.

My hotel had the usual juice section - glasses close to a variety of freshly squeezed juices. I probably sat near this juice area 10 times during my recent stay. Whilst idly watching my fellow breakfasters I noticed at least 5 occasions when guests could not find the glasses for the juice. The thing was that the glasses were lined up below the juice bar, and the table top on which the juice bar was sitting was wide enough to hide the glasses from guests standing next to the juices. On a number of occasions, guests approached the juice area intent on getting a drink, and all too often they were unsuccessful - they just could not find the glasses. Some looked around quite determinedly, some spent longer than others trying to track down the errant glasses. Some asked members of staff for help, some just walked away and got a coffee or tea instead.

Some people tried harder than others to solve the problem for themselves and get a glass of juice, but everyone with the problem was unsuccessful in solving it. The same staff were asked to solve the same problem day in day out, and yet they never altered the juice bar area. They never changed the location of the glasses or added any signage explaining the location of the glasses.

This experience is very similar to information finding challenges online. All too often sites do not make information finding tasks as simple and as fast as they should be. Also, when faced with real people having real problems, some sites ignore them, others help individuals via customer services centres, but most don't fix the root of the problem.

Faced with problems, frustrated by confusing navigation, strange search results, or missing information, most web users will go elsewhere with their business. If they do let the site owner know about the problem, then please, website owners, fix it at the root so other people don't encounter it.

Sometimes information architects and website owners are too close to things - too focused on their issues and their plans. They need to regularly take a step back and watch their customers and users interacting with their websites.

Next time you have a moment, look at the key information tasks your customers or clients have, sit back and ask, "How easy is it to get to the juice?" Analyse search logs, sit with people and watch them use your site - there are lots of ways to do it. Then, act on what you see, focusing on helping most of the people most of the time. I guarantee that valuable lessons will be learned and findability will improve.

Dow Jones Client Solutions offers audits targeted at improving information findability through enhanced asset categorisation, browse navigation and search support. Let me know if you would like to get more value out of your information.


Monday, 22 March 2010

E-Commerce Websites - Metadata and Controlled Vocabulary Can Help

I've worked for Dow Jones Client Solutions, managing our 'Outside Americas' information consulting services, since 2006. In that time I've been involved in a wide range of projects for a variety of businesses.

Dow Jones Client Solutions offers a diverse range of information management services, amongst them services to: organise audio, video, image, and text assets, improve information browsing, provide effective search experiences and create bespoke user journeys that direct clients and customers from initial products and services to related ones.

I've been thinking a lot about e-commerce websites recently, and looking at quite a few examples of the genre. I am also a customer myself, and all too frequently come up against frustrating websites with poor search and browse functionality and a complete lack of regard for the possible customer.

Competition online is strong. It's easy for customers to move between competing websites - choosing the ones with the best experience and the right mix of products, price and customer service. Revenue and market share go to sites that offer an easy to understand information architecture - with user-friendly navigation, an intuitive and efficient search experience - with effective asset categorisation, search facets and filters, related links to products and services, and the appropriate sets of keywords to direct simple searches to the appropriate results.

Dow Jones Client Solutions offers:

* E-Commerce Assessments.
* Search and browse advice and development.
* Metadata and vocabulary development and maintenance.
* Categorization advice for text, images, video and audio assets.
* Vocabulary and metadata mapping to aid sharing and interoperability.
* Metadata and vocabulary translation and localisation.
* Information management workshops and training sessions.

If anyone reading this feels that the consulting services we offer may be of interest, I would love to arrange a quick informal call to discuss your business objectives.

I look forward to hearing from you.


Monday, 30 November 2009

Digital Asset Management Foundation - Coffee Meet-Up - Notes and Audio

In my last blog post I mentioned I was taking part in an informal 'meet-up' to discuss Digital Asset Management (DAM). I made some rough notes during the call, which I hope will serve to give a flavour of the discussions:

  • The need to broaden the understanding of DAM.
  • The need to share experiences and challenges in DAM.
  • The need to connect with clients, understand needs and deliver targeted solutions.
  • Creating metadata and vocabularies to support assets: images and video.
  • Applying metadata to image and video assets - manual, automatic and semi-automatic solutions.
  • DAM solutions: 'software as a service' versus 'enterprise solutions'.
  • Creating Vision Statements for DAM.
  • The phases of DAM.
  • DAM return on investment: key task analysis, baselining and measuring outcomes.
  • Controlled vocabularies for DAM - license to kick start development, then develop and customise.
  • Using consultancy to support DAM creation and utilisation.
  • Working with legacy data in DAM systems.
  • Harvesting metadata from creators and suppliers.
  • Adding value through manual tagging of assets.
  • Tagging assets using: external sources - off-shore or local, or in-house resources.
  • Video processing: soundtrack indexing, scene and key recognition.
For those who want to listen to the conversation you're free to do so by visiting the following URL:

DAM Foundation - Audio Track of Coffee Meetup 27 Nov 2009

The audio is a little broken up at the start, but stick with it, it gets better. Also, time delays between the US and UK mean it sometimes sounds as if the speakers are talking over each other.

Speakers were:
  • Nigel Cliffe, Managing Director at Cliffe Associates Ltd
  • Ian Davis, Taxonomy Delivery Manager, Outside Americas, Dow Jones Client Solutions
  • Henrik de Gyor, Digital Asset Manager at K12 Inc
I hope you all enjoy the conversation. We hope to arrange more in a few weeks.


Friday, 27 November 2009

Digital Asset Management and Metadata for Images and Video

Missing out on the recent Photo Metadata Conference has reminded me how much I love working in the DAM world, in particular in the area of creating metadata and controlled vocabularies to support digital image and video search and browse.

Reading about the Photo Metadata Conference programme, it seems like there were some great presentations. I downloaded them all (they're available from the conference website) and had great fun going through all the excellent experiences, comments and ideas.

I wish I'd been there for Madi Solomon's keynote on the collapse of boundaries in the digital world. I agree that it's less and less about what format an asset is in and more about what that asset is, and how it needs to be organised to support its use.

Assets need to work for their places in the world. Finding them and using them needs to be simpler, and metadata and controlled vocabularies need to support and enable this.

Understanding the assets an organization has, analysing the needs of that organisation, and ensuring they have what they need and that each asset is organised to support its use, is where the really exciting and satisfying work is for me.

After having worked for Corbis from 1991 to 1999, in the early research and development days of digital image organisation and sale, I was excited to see Max Wieberneit's presentation on still and video metadata.

Video and still images have much in common. I've blogged about this in the past and it's still a big area for me. Both asset types have technical metadata, depicted content metadata and aboutness metadata, to name but a few. Add to this the soundtracks for video - which can be indexed for retrieval - and the ability to segment video into scenes and key frames, and you have an exciting mix of metadata across both formats.

I agree with Max that using established metadata systems makes a huge amount of sense, as does working to get as much metadata as possible from the creators or custodians of images and video - it's much easier to capture metadata early on in the creation process than down the line, and some metadata will be lost if you leave its capture too late.

As Max says, one key concern for image and video asset metadata is the users of the assets. Different people have different needs and need different metadata. For many people a good level of access to video can be built using initial metadata associated with the videos, key scene and frame analysis and the indexing of the audio tracks of the videos. Whereas for others, access to the mood of the video may only come through music analysis, lack of noise at key moments, and manually applied subject tags.

On the image side, as Max says, editorial users have somewhat different needs from commercial users of stock photos. Max showed a great slide listing a long set of conceptual keywords: 'comfortable, dreaming, luxury, spoiled' etc. I remember the fun we had creating these concepts, arranging them in hierarchies, providing synonyms for them, and creating definitions and application rules to control how they're assigned. It sounds easy, but trying to accurately use a concept like "spoiled" or "luxury" often brings many challenges.

I've already touched on the needs of video users, and some of the basic ways video can be organised. It was great to read Lionel Faucher's piece on how a video agency uses metadata. Video is easier than still images to work with: automated solutions are more applicable to video and much more successful. Challenges still abound, though, as Lionel clearly shows in his presentation.

One of the interesting topics I've been following for a while is the metadata being generated from digital cameras, and the work being done to make more use of it. Related to this is the exciting area of geographic coordinate metadata, which is created by some digital cameras when a photo is taken, and the uses to which that can be put.

Two presentations in the area of geography and image metadata were given by Bernd Beuermann and Ross Purves. A great research area was mentioned by Bernd - the taking of GPS co-ordinates and linking them to points of interest that are within a certain range of a GPS location. This can make the tagging of images with key depicted buildings or topography a little easier, and will produce many advantages for image tagging and retrieval.
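
To illustrate the general idea (this is my own sketch, not the method from the presentation), the Python below uses the haversine great-circle formula to find points of interest within a set range of a photo's GPS location. The point-of-interest list, the approximate coordinates and the 0.5 km range are all assumptions for the example.

```python
from __future__ import annotations

import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_points_of_interest(
    photo: tuple[float, float],
    points: dict[str, tuple[float, float]],
    max_km: float = 0.5,  # assumed range
) -> list[str]:
    """Names of points of interest within max_km of the photo, nearest first."""
    distances = [
        (haversine_km(photo[0], photo[1], lat, lon), name)
        for name, (lat, lon) in points.items()
    ]
    return [name for dist, name in sorted(distances) if dist <= max_km]


# Approximate coordinates, for illustration only.
POINTS_OF_INTEREST = {
    "Tower Bridge": (51.5055, -0.0754),
    "Tower of London": (51.5081, -0.0759),
    "Buckingham Palace": (51.5014, -0.1419),
}

photo_location = (51.5060, -0.0750)  # hypothetical GPS reading from a camera
suggested_tags = nearby_points_of_interest(photo_location, POINTS_OF_INTEREST)
print(suggested_tags)
```

The names that come back are only suggested tags. A person, or some image analysis, still needs to confirm that the building is actually depicted in the photo.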

A couple of things that I'm interested in were missing from the conference. I'd have liked to have seen more on: working with video soundtracks, automatic scene and frame analysis, and the place of manually applied tags in video indexing. I'd also like to have seen more about the creation of hybrid image retrieval systems that bring together content based image retrieval with controlled vocabulary and folksonomy tags. Maybe that's all for next year!

There also seemed to have been a big emphasis on technology, file formats, and metadata standards - in many ways the building blocks or key tools for organising and providing access to video and image content. What I'd have liked to see more of is the uses to which these building blocks have been put, the real world sharing of user needs and the challenges of actually making the technology and the supporting structures work to achieve business aims.

I should end by thanking the organisers of the event, and the presenters, for putting so many presentations online - it's very helpful and refreshing to have such a good level of access to this form of content.

One way in which I keep involved in the image and video world is through my involvement in the DAM Foundation on LinkedIn. There is a coffee meet-up organised for this afternoon, which I hope will kick start a lot of exciting developments. I'll post more about the outcome of the meeting next week.


Tuesday, 6 October 2009

My Thoughts on, "Collaboration: know your enthusiasts and laggards", article from Cisco

Last week I spent some time reading an excellent and very interesting piece from Cisco, "Collaboration: know your enthusiasts and laggards".

I encourage you to take a look at the results of the study Cisco undertook into the factors linked to successful adoption of collaboration via networked tools: instant messaging, wikis, shared workspaces, video conferencing, forums and discussion boards etc.

Whilst reading their interesting findings a couple of things struck me.

On page one of the article was the sentence,

"You can use the study results to maximize your return on investment from collaboration tools. One way is to implement business practices shown to lead to more enthusiastic collaboration."

This struck me as possibly being another way of saying: if you have already purchased tools to allow collaboration you can enjoy a return on that investment by putting in place an environment which will encourage collaboration using these tools. Please correct me if I'm wrong but this sounds a little too close to the assumption that collaborating is an end in itself, not a means to an end.

To my mind, collaboration is very important in many walks of life and many types of organisations can benefit from doing a lot more of it. Some of it will come via software; much of it should come through face-to-face chats, discussions and more formal meetings. None of it will, I think, lead to a return on investment in and of itself. If I asked a CEO how their business was doing in these hard times, I wouldn't expect them to say, "We're doing well, we're collaborating so much more than before."

For me, the key to a return on investment from collaboration is controlling that collaboration. Knowing what the business goals and objectives are and making a conscious decision to use collaboration as a technique to help achieve them. Also important is the monitoring of the collaboration taking place and then linking the collaboration efforts to the outcomes of the collaboration.

Collaboration can have a very specific goal, "We have a project to deliver and two teams in different cities need to collaborate, in these ways, to successfully deliver that project."

Collaboration can be less concrete, but no less valuable, "We have a group of people over here, and another group over there, who would benefit from talking more and understanding each other - their jobs, their day to day issues and how they go about solving them. We're not sure exactly what will come from this but we will set up collaborative spaces, monitor them, get feedback from the collaborators, and look at how these groups do their jobs one month, three months, six months, after the collaboration was established. We'll then analyse how collaboration contributed to getting a, b, and c done, learn from the experience and build on it."

Rather than saying, "We collaborate therefore we succeed", I'd like to be able to say, "We had a business need, problem or corporate goal, we put a number of collaboration techniques in place and we achieved our goals or fixed our problems. We also saw where and how our collaboration contributed to our success."

Collaboration is a tool to use to achieve an objective, not an end in itself. Return on investment comes from what results from collaboration, not from collaboration alone.

For many people and organizations the goal should be to achieve results through targeted collaboration, not to just collaborate more.

I hope we all succeed because we know how to collaborate, we know why we're doing it, we know what we get from it, and we know how it contributes to our goals and objectives.


Monday, 5 October 2009

Accessing Useful Knowledge: musings from a train carriage

Sitting on a train, slowly trundling through Hertfordshire, my thoughts turned to the challenges of knowledge and information sharing.

I was minding my own business, surrounded by other similar people, also minding their own business and I started to think that if I had a need for knowledge and information, what would be my best course of action? What would be the most efficient and effective way to obtain, share and distribute information and knowledge?

Pondering this question produced some interesting thoughts.

If I needed a particular newspaper, document or magazine article, that I'd forgotten to bring along with me, my best bet was to stand up, forget I was English, and ask my fellow travelers whether anyone had a copy. A long shot I know, but a direct request for specific information was my best chance.

On the other hand, if I had a less structured knowledge and information need what would work best?

If I wanted to exchange information and knowledge regarding how to get people to share their knowledge in a work environment, and how to persuade them, "not to panic" and convince them that knowledge sharing, "is a good thing", my best bet is not to ask a specific question out loud, or to call, tweet, or email the people in the carriage. My best bet is to try to get a conversation going between all the people in the train carriage.

Back in the real world, persuading a bunch of strangers to talk to each other on a train is only going to happen if the train grinds to a halt and all the lights go out - otherwise, forget it.

However, the thought emphasised for me that often the best means of communication is face-to-face. The best way to exchange knowledge and information in order to meet a range of needs is to get a group of people to sit in the same physical space, and with a clear idea of the boundaries and objectives of the meeting, to talk to each other in the real world.

Other, more distanced forms of communication (email, phone, etc.) have their place and are very popular and useful, but in this world of technology let's not lose track, let's not forget, that having a discussion with a real person is often the best way to communicate.