Seeding your user-generated content site (or not)
We often get asked why we didn’t seed Spotstory with content from some other source. Certainly, databases with location data are easily obtained and integrated.
I imagine that anyone launching a site that depends on user-generated content, or some other sort of social media, is going to ask themselves this question. I’m not sure there is one right answer for every circumstance, but I can explain how we made our choice.
Pros
Here are a couple of reasons we came up with for using seed data.
- Empty sites are no fun. We’ve probably all had the experience of seeing a site’s launch announcement on TechCrunch or Mashable!, only to find that the site contains just three or four records (usually somewhere in downtown San Francisco). Seeding with a pre-existing database seems to solve this problem.
This is the big argument for seeding, and, no doubt, for certain services it makes complete sense.
- Scaling. You want to make sure your system can handle the large datasets you will (you hope!) have some day.
This argument is voiced less often, but it deserves more attention, because it applies to everyone: use seed data during development, then discard it before you deploy. You can still keep it around in your development environment. This is what we do.
Cons
Here are some of the reasons against using seed data.
- The site is still empty anyway. We’ve all seen this too: you go to a site, click a link or search by latitude/longitude or place name, and get 1,000 results. You click through them and find … nothing.
Aron and I discussed this at great length when we were getting started. We’d experienced many sites that were filled with this “empty” data, and those sites just felt … empty. Big, certainly, but empty nonetheless.
(I have a lot more to say here, but it’s a tangent to the topic at hand. I’ll write more about it later, I promise.)
- It’s illegal. It’s mechanically easy to access and import data, but do you have the right to?
There is a lot of data in Wikipedia that would be appropriate and welcome in Spotstory. The problem is that Wikipedia uses the GNU Free Documentation License, while Spotstory uses the Creative Commons Attribution-ShareAlike 2.5 License for all text content. Though these licenses share the same spirit, they are not the same.
Some might see this as a technicality, but we don’t. (And, yes, it is a challenge to keep unadulterated Wikipedia content out of Spotstory!)
In the end
We decided to start from scratch and try to grow as much content as we could before we launched. We felt it was important that real people discover and share the Spots, and we thought it best that every Spot be associated with a person who considered it interesting enough to share with the rest of the community.
Ultimately, this approach seemed most appropriate to the spirit of the site.
We’re based in the Boston area, and for now we’re happy to focus on building a critical mass of interesting content for this area. So far, it seems to be working out well for us.
But, please, don’t let that keep you from creating Spots in other places! :)

