Wednesday, August 29, 2012

The problem with crowdsourced content

We have a whole lot of sites featuring reviews written by their readers on pretty much every topic. Of these, at least the front runners attract large traffic volumes, have a devoted user base, and are, I assume, making money. However, I want to discuss what I think is wrong with them.

The problem is the method of generating content. All these sites rely on a bevy of users to come write reviews of the products or services that the site focuses on and that they have used or experienced. This can be restaurants, books, travel, gadgets, or whatever else. Via friendly UIs, facilitated social media interaction, etc., users are encouraged to contribute data for each other's benefit. This strategy is very effective at generating large amounts of data. However, it is very bad at generating cohesive data.

In general, I have three somewhat interconnected problems with this:
  1. Data is of poor quality – Since the website wants readers to submit reviews, it can rarely hold them accountable for the quality of their writing. The aim is to lower the barrier to writing and social media sharing. Get him to write. No matter what he writes, get him to write. To be fair, most reputable websites will intervene if you write inflammatory or profane material, but apart from that, pretty much anything goes. As a direct consequence, the quality of reviews in terms of both the content (what is written) and the form (how it is written) goes down. Most people write unbalanced reviews, either giving full marks and endless praise or griping about a very bad experience. This can be avoided to some extent by making sincere efforts at moderation and community building (StackOverflow is a great example), but none of the major commercial websites seem to be doing so.
  2. There is just too much data – This is the explicit result of successfully inducing readers to submit content to a site. Since anyone can submit anything, the data volumes are large, and it becomes well-nigh impossible to find information. This is what I like to call the Problem of 500 Reviews: on any successful reviews site today, you can find 500 reviews for pretty much every single item. Too much data is not much better than no data. The best this deluge of disjointed reader inputs can give us is a general sense of how well people like something. As an experiment, choose any famous book about which you know nothing, and try to learn about it using only GoodReads reviews. I am fairly regular on the site myself, but mostly for the bragging rights, and to let my friends know what I am up to.
  3. Data is without structure – I totally agree that a successful travel site of the kind we are talking about will have all the data about a given destination. But how do we find this data? Since it is broken up across a large number of unconnected reviews, it is very difficult to present the information in a coherent, intuitive manner. It is left to the reader to sift through the data that each reviewer has provided and collate what he needs (when to go, how to get there, etc.).
The crowdsourced content model is like a group discussion where everyone is talking at the same time. There is no anchor or reference around which a discussion can be built.

IMO, a far better alternative is to have an informed member write one review, and then use that to gather all sorts of varied and personalized experiences regarding the topic of discussion. It may seem so up front, but such a model (a critic-driven model, if you will) is not about classroom-style information broadcast. The Expert has not spoken. It is about providing a structured core, the basic information, and then inviting the readers to extend that into a wider body of information. If you want people to spend time and effort sharing their opinion, it is only fair that you offer them something in return.

3 comments:

  1. I agree with you in this particular context and strongly advocate creating meaningful, moderated reviews for various things (books, in the current context). But crowdsourcing is at present the most powerful way to gather information in a broader context, and it is the only way to scale up information gathering when a site can only be sustained by its users, since managing a huge amount of active content in a personal capacity is impossible.

  2. UJJWAL - This is what happens when the only target is to scale riding on user generated content - http://tcrn.ch/RnzVDq

    And my context wasn't books. It was any structured data, typically reviews.

    Challenges to scaling are not (or at least shouldn't be) the paramount concern for a service aiming to disseminate information. The quality of the data itself is the benchmark against which everything should be evaluated.

    Nor is everyone likely to attain FB-style scaling (1 billion users and still going strong) regardless of strategy. So what do you do?

    One road to take is the one I have outlined here - take some informed, dedicated people and let them build communities and discussions around informative reviews. This is old-school scaling - organic growth. You want more content, you hire more people. Unlike what you think, it is not impossible. The news industry does it all the time. You need enough boots on the ground, that's all.

    The other way I can imagine is to write software that summarizes all points raised in community reviews into a bulleted list (or something similar) and shows them by order of importance (frequency of occurrence perhaps?)

    On the whole, I lean towards the first approach, considering that summarizing hundreds of user comments, each with a different style and tone, is a non-trivial computational challenge.

  3. Hey.
    I'm follower #5. Thanks for following me.

    Cynthia
    http://thethingsyoucanread.blogspot.com/
