Seven Notable Silicon Valley 'Big Data' Startups
Not long ago, at the invitation of a group called Stanford SEED, I had the chance to give a talk to Stanford master's and PhD students, exchange students, and Koreans living in Silicon Valley. While chatting with the organizers, we decided to build the talk around Silicon Valley itself rather than around a book or my own story. They suggested it would be nice to cover notable startups, so I did some research to surface not only companies I had long been interested in but also some new ones. There are so many capable startups in Silicon Valley that picking just a few was not easy, but what sparked my interest most were data-driven startups. Part of it is that I am personally very interested in this area, so such companies stand out to me. On AngelList, one of the best curated startup directories, a search for 'Big Data Analytics Startups' turns up a whopping 558 results. I got to know some of them more deeply through interviews after being contacted via LinkedIn, so I will focus on companies I have actually used or whose internal workings I know well.
I put the words 'Big Data' in the title to grab attention, but honestly 'big data' is often used as a marketing buzzword, and it is not a term I love. People have always analyzed data and drawn insights from it; that is nothing new in itself. Still, it is appealing that you can build a business simply by analyzing data. And most startups that fly the big-data flag use Hadoop and its MapReduce programming model, so calling them big data companies is not unreasonable.
1. TellApart
TellApart is an ad retargeting company. I first learned about them a few months ago while researching big data companies, and later got a much more detailed picture when a recruiter reached out to me. The company was founded by Josh McFarland, a Stanford economics graduate and former Google product manager. I heard the full founding story in an interview. In 2009, James Slavet of the renowned Silicon Valley VC Greylock Partners was looking for a new partner to bring on board and settled on Josh. The discussions went smoothly, but at the dinner where they were to finalize things, Josh said that instead of accepting the partner role he actually wanted to start a company. He did not seem to have a specific idea locked in at the time. James then offered him an EIR (Entrepreneur in Residence) position, and Josh took it. Over 8 months there he conceived a new business and built a prototype, and in April 2010 household-name VCs and investors, including Greylock Partners, SV Angels, Dick Costolo, and Reid Hoffman, invested $4.75M (about 5 billion won) to officially launch the company. They went on to raise an additional $11M (about 12 billion won), and today about 50 people work at their office in Burlingame, not far from San Francisco. Last December they announced that their annual revenue run rate had passed $100M (about 110 billion won) and that they were profitable. That works out to roughly $2M (about 2 billion won) in revenue per employee, truly a rocket ship. Their customers already include luxury department store chain Neiman Marcus, Warby Parker, which is rewriting the history of glasses and sunglasses, and kitchenware franchise Sur La Table.
Let me pause here to talk a bit about ad retargeting. As the name suggests, it means 'sending ads again.' Someone visits a shopping site, browses, and leaves without buying (which apparently happens about 98% of the time). Later, when that person uses another service such as Facebook or Gmail, the retargeter shows them the brand and product again to drive return visits and purchases. Take a look at the image below.
Ever since I visited the homepage of an ad retargeting company called AdRoll, ads like these have occasionally shown up in my Twitter timeline. Likewise, after visiting TellApart customer Hayneedle.com, Hayneedle ads have been relentlessly appearing in my Facebook timeline.
When sending an ad to me, the most important thing is judging whether I am worth sending an ad to, because every ad shown to me costs money. That judgment is hard to make without collecting as much information about me as possible. So how quickly and accurately a company can understand me becomes the competitive edge of an ad retargeting company, and that analysis falls squarely in the big data domain.
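To make the "is this person worth an ad?" judgment concrete, here is a minimal sketch of one way to frame it: show the ad only when the expected profit from the impression exceeds its cost. This is an illustration, not TellApart's actual method; the function name, the purchase probabilities, the margin, and the impression cost are all made-up assumptions.

```python
def should_show_ad(p_purchase: float, margin_usd: float, cost_per_impression_usd: float) -> bool:
    """Show the ad only if the expected profit from one impression exceeds its cost."""
    expected_profit = p_purchase * margin_usd
    return expected_profit > cost_per_impression_usd


# Hypothetical visitors: (description, estimated purchase probability, margin if they buy)
visitors = [
    ("left item in cart", 0.02, 40.0),
    ("bounced from home page", 0.0001, 40.0),
]
COST_PER_IMPRESSION_USD = 0.005  # assumed price of a single ad impression

decisions = {
    label: should_show_ad(p, margin, COST_PER_IMPRESSION_USD)
    for label, p, margin in visitors
}
print(decisions)
```

In this framing, the hard part is estimating the purchase probability, which is where the data about each visitor comes in.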
On the other hand, the fact that they already have information about the sites I visited and use that to send ads feels a little uncomfortable. If sensitive information is being used, there is potential for invasion of privacy. Canada seems to apply this more strictly. There is a story where a Canadian, after visiting a website selling sleep apnea treatment devices, kept seeing related ads on Google, reported it, and the Canadian government pressured Google on the grounds that it had violated personal information protection law.
2. The Climate Corporation

The Climate Corporation website
About two years ago I first learned about this company by listening to a 2011 Stanford talk by founder David Friedberg. After that a company recruiter reached out and I got a sense of where the company was, and a few months ago I met a product manager working there and got a detailed look.
This, too, is a company built purely on 'data,' and what they do is fascinating. The founding story is best told in David's lecture. When he was working at Google, he walked past a bike rental shop on a rainy day and thought:
On a rainy day like this, who would rent a bike? For that shop, weather is directly tied to revenue. If I collected more accurate weather information than anyone else and sold it, wouldn't that be a business?
Convinced it could work as a business, he left Google, started the company, collected weather information, and began selling the product to places where weather impacts business results, such as bike rental shops, construction companies, ski resorts, travel agencies, farmers, and so on. The idea was: customers would pay a set amount regularly, and whenever weather that hurts sales hit (too cold, raining, etc.), they would be paid out immediately. A kind of insurance product.
Results were weak. There was a gap between being interested in the weather and actually paying for a product that is effectively insurance. After a lot of struggle, he found the market: farmers in Iowa. Their crops were heavily affected by small climate shifts, and since they were already used to insurance, they understood the product right away. The company decided to focus only on farmers and further specialized the product. The result was a huge success.
I said it is similar to insurance, but one big difference is that, unlike insurance, this company has no 'claim filing' process. The company monitors climate shifts across the entire US, and when abnormal weather appears (temperature below a threshold, humidity above a threshold), it automatically deposits money into the subscriber's bank account. That convenience resonated with many customers.
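The "no claim filing" idea can be sketched as a simple threshold check: compare each weather observation against the thresholds in the policy and pay out automatically when one is crossed. This is a minimal sketch under stated assumptions; the policy fields, threshold values, and payout amount are hypothetical and not taken from the company's actual product.

```python
def payout_due(policy: dict, temp_c: float, humidity_pct: float) -> float:
    """Return the amount to deposit for one observation (0.0 if no threshold is crossed)."""
    triggered = (
        temp_c < policy["min_temp_c"]
        or humidity_pct > policy["max_humidity_pct"]
    )
    return policy["payout_usd"] if triggered else 0.0


# Hypothetical policy and weather observations: (temperature in Celsius, humidity in percent)
policy = {
    "subscriber": "example farm",
    "min_temp_c": 0.0,         # pay out if the temperature falls below this
    "max_humidity_pct": 90.0,  # pay out if the humidity rises above this
    "payout_usd": 5000.0,
}
observations = [(12.0, 60.0), (-3.5, 55.0), (8.0, 95.0)]

payouts = [payout_due(policy, temp, humidity) for temp, humidity in observations]
print(payouts)
```

Because the trigger depends only on measured weather, nobody has to file or review a claim.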
A few months ago, in October 2013, they were acquired by Monsanto at a very high price of $1.1B (about 1.2 trillion won), and founder David, of course, became very wealthy.
3. SmartZip

SmartZip homepage
When doing real estate transactions in the US, most people work with someone called a realtor or broker. You can get a detailed sense of their role from the nadoo post 'The Process of Buying a Home in the US at a Glance,' but there are some differences from Korean brokers.
First, they spend much more time getting a home bought or sold. It is not like: 'Samsung Raemian, 24 pyeong, 680 million won, move-in ready next March.' Each home has a different shape and style, so it takes much more time to find a home the customer wants. The same is true when selling: rarely do you just show the lived-in home as-is and sell it. The realtor usually hires people to stage the home attractively and then holds what is called an 'open house'—a time when interested buyers come, see the home, and ask questions.
Second, the commission is much higher. Each side's agent (buyer and seller) takes 3%. Conventionally, the seller pays 6%, which is split 3% and 3% between the two realtors. In California, a decent home goes for over $1M (about 1.1 billion won), and in areas like San Francisco, Palo Alto, and Cupertino, $3M homes are common. Selling a $3M home nets each realtor $90K, almost 100 million won. Earning 100 million won from selling one home is not bad.
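The commission arithmetic above can be checked in a few lines. The 6% total and the even split come from the text; the exchange rate of 1,100 won per dollar is an assumption implied by the conversions used in this post.

```python
def commission_per_agent(sale_price_usd: int, total_rate_pct: int = 6, agents: int = 2) -> float:
    """Seller pays total_rate_pct of the price, split evenly between the agents."""
    return sale_price_usd * total_rate_pct / 100 / agents


WON_PER_USD = 1100  # assumed rate, matching "$1M (about 1.1 billion won)"

fee_usd = commission_per_agent(3_000_000)
fee_won = fee_usd * WON_PER_USD
print(fee_usd, fee_won)
```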
Representing the seller takes far less time than representing the buyer, so naturally every realtor wants to work for the seller. The problem is that the number of homes coming on the market is far smaller than the number of realtors. To win the seller, realtors run all sorts of marketing to make their names known.
SmartZip solves exactly this problem. They gather all the information about homes and the people who live in them. When was it bought and for how much, how many rooms, how close to the highway, how big is the yard, how many people live there, what is the household income, and so on. They reportedly collect up to 2,000 attributes per home. Using all this information, what they do is 'Predictive Analytics.' That is, the core of the algorithm is to identify in advance which homes are likely to come onto the market in the next 6 to 12 months. The concrete method is below. In fact, it is a commonly used data mining approach.
- Continuously gather information about homes and households.
- When homes come on the market, take half of them and train the algorithm on the events that occurred at those homes 6 to 12 months earlier.
- Apply the trained algorithm to the other half of the homes and measure how well it predicts them.
- If the results on that held-out half are poor, adjust the algorithm and run again. Repeat until the accuracy stops improving.
- Once accuracy is good enough, apply the algorithm to the information about homes not yet on the market.
- Score the 'likelihood of being sold soon' and rank homes accordingly.
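The steps above can be sketched in plain Python. This is a toy illustration of the general train, hold out, and rank workflow, not SmartZip's actual algorithm: the data is synthetic, the two features are invented, and a small hand-written logistic regression stands in for whatever model they really use.

```python
import math
import random

random.seed(0)


def make_home():
    """Synthetic home: feature vector [bias, tenure, kids_moved_out] and a 0/1 'listed' label."""
    years_owned = random.uniform(0, 20)
    kids_moved_out = random.random() < 0.3
    # Assumed ground truth for the toy data only: long tenure and an emptier house raise the odds.
    logit = -3.0 + 0.15 * years_owned + 1.5 * kids_moved_out
    listed = random.random() < 1 / (1 + math.exp(-logit))
    return [1.0, years_owned / 20, float(kids_moved_out)], int(listed)


def predict(weights, x):
    """Logistic model: probability that the home is listed."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1 / (1 + math.exp(-z))


def train(rows, epochs=200, lr=0.5):
    """Fit logistic regression weights with batch gradient descent."""
    weights = [0.0] * len(rows[0][0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in rows:
            err = predict(weights, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


homes = [make_home() for _ in range(2000)]
half = len(homes) // 2
train_rows, test_rows = homes[:half], homes[half:]  # train on one half, hold out the other

weights = train(train_rows)

# Rank the held-out homes by predicted score and compare the top 20% with the overall rate.
ranked = sorted(test_rows, key=lambda row: predict(weights, row[0]), reverse=True)
top = ranked[: len(ranked) // 5]
top_rate = sum(y for _, y in top) / len(top)
base_rate = sum(y for _, y in test_rows) / len(test_rows)
print(f"listed rate in top 20%: {top_rate:.2f}, overall: {base_rate:.2f}")
```

If the top-ranked slice lists at a clearly higher rate than the held-out half overall, the ranking carries real signal, which is the property the realtors are paying for.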
I do not know how accurate their list is, but according to their claims, homes in the top 20% of the ranking come on the market within a year with a 40 to 50% probability. If that is true, the list is worth its weight in gold, and realtors would happily pay for it. Indeed, many realtors pay for the service and report a big impact.
As in this case, many big data companies use a technique called 'Machine Learning.' It was the most interesting subject I took in my MBA, and I wanted to learn more, so I even did homework on Coursera. As the term 'machine learning' suggests, you 'train' an algorithm on a large amount of data and then use it to make new inferences. Sometimes the results are absurd, but in many cases the accuracy is astonishingly high, and it is being applied to more and more fields.
4. C9
What this company does is not easy to describe; they call themselves a Revenue Performance Company. They have built several products, but at the core is 'revenue forecasting.' Like the earlier companies, they use machine learning to analyze data. Using the attributes of in-progress deals, they calculate each deal's probability of closing, and use that to estimate quarterly or annual revenue. Accurately forecasting quarterly revenue is a very hard problem, and how well a company can do it says a lot about its caliber.
Suppose a company has 100 salespeople. Suppose they are working on hundreds of deals. Each deal is at a different stage. Some have just met the buyer; some are close to signing after good conversations; some have been nurtured for years without much progress. Some deals close with a single phone call; others only after a long investment. You gather all the historical information from past deals, analyze it, build an algorithm, then apply it to in-progress deals to estimate future revenue.
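As a rough sketch of the idea, the simplest version estimates a close rate per deal stage from historical deals and then sums each open deal's amount weighted by that rate. This is a deliberate simplification for illustration, not C9's method: a real model would use many more deal attributes, and the stage names and amounts here are hypothetical.

```python
from collections import defaultdict

# Hypothetical history: (stage the deal was in at some snapshot, whether it eventually closed)
history = [
    ("first meeting", False), ("first meeting", False),
    ("first meeting", False), ("first meeting", True),
    ("negotiation", True), ("negotiation", True),
    ("negotiation", False), ("negotiation", True),
]


def close_rates(past_deals):
    """Fraction of past deals that eventually closed, per stage."""
    seen = defaultdict(int)
    won = defaultdict(int)
    for stage, closed in past_deals:
        seen[stage] += 1
        won[stage] += int(closed)
    return {stage: won[stage] / seen[stage] for stage in seen}


def forecast(open_deals, rates):
    """Expected revenue: each open deal's amount weighted by its stage's close rate."""
    return sum(amount * rates[stage] for stage, amount in open_deals)


rates = close_rates(history)
open_deals = [("first meeting", 100_000), ("negotiation", 40_000)]  # (stage, amount in dollars)
expected_revenue = forecast(open_deals, rates)
print(rates, expected_revenue)
```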
They already count companies like LinkedIn and Pandora as customers, and the company shows continued growth potential.
5. Kaggle
Kaggle is a company whose spelling and pronunciation resemble Google's, which makes it easy to remember. What they do is connect data scientists with companies that need their skills. There are many interesting challenges on the site, and some carry large prizes.
GE and Alaska Airlines are among the biggest customers. Their Flight Quest 1 with a total purse of $250,000 ended last year, and the second challenge with a $220,000 prize is nearing its submission deadline. The goal is to analyze Alaska Airlines' vast data to help pilots fly more efficiently and on time. The data includes weather and countless other variables, and information about how long each flight took to arrive, whether it was delayed, and so on. In the first challenge, 173 teams and 236 data scientists competed. According to the video featuring the second-place winners who took the $50,000 prize, the two of them together spent over 300 hours.
Many interesting challenges are posted right now. One is a 'loan default' prediction challenge from Imperial College London. Data on 200,000 customers is provided, containing each customer's situation when they took out the loan, whether they paid it back, whether they defaulted, how big the loss was if they did, and so on. The team that builds the algorithm with the best prediction wins and takes a $10,000 prize.
Founder Anthony Goldbloom, an Australian born in 1983, started Kaggle in 2010 and raised $11.25M (about 12 billion won). I do not think the investors will necessarily see a huge exit, but regardless, they deserve high marks for creating an interesting marketplace where data scientists around the world gather to showcase their skills.
6. Mattermark

Mattermark.com service screen. They collect all kinds of information about private companies, with scores assigned based on how 'hot' each is.
Calling this company a 'big data company' may be a stretch, but either way, it is a startup that gathers, processes, and sells data. What they do is collect as much information as possible about pre-IPO companies (headcount, VC investment data, web/mobile app popularity, social network metrics, etc.) and use it to rank 'hot' companies. An acquaintance of mine joined as the company's first employee, so I got to know the company well, and I even signed up as a paid member and used it for a while. They have gathered a surprising amount of information about private companies, i.e., startups, and it is very useful. Their main customers are VCs (venture capital firms). The cost is $500 per user per month, and they have already signed up many VCs.
VCs do this kind of company analysis themselves, but that is something only large VCs can handle; many smaller firms cannot afford to. Mattermark fills that gap. Customers do not need to be only VCs. Private equity firms looking for acquisitions, or M&A departments at large companies, can also be customers.
A big reason they landed paying customers right after launch is founder Danielle Morrill. Her Crunchbase profile lists high school as her final education, and her LinkedIn profile shows a high school graduation year of 2003, so she is quite a young founder. She is well known for her active presence in Silicon Valley, and her blog is especially famous. She writes really well. Before starting this company, she founded Referly, and the blog post she wrote when deciding to shut that company down due to slow growth went viral, which seems to have made her name widely known. If you subscribe to the Mattermark newsletter, you get her weekly digest of curated and summarized reading. It is one I never skip.
Their office sits in a San Francisco high-rise, and their team page is styled rather nicely. They are steadily taking in $500 per customer per month for gathering and processing data, and already have a decent customer base, so I am looking forward to their growth.
7. Lumosity
This one too does not quite fit the same category, but it is similarly a data-driven model. Contrary to older theories that once you reach a certain age the neural networks of the brain harden and you cannot get smarter, this service is built on the theory of Neuroplasticity, which says the brain is malleable and plastic. It is a service that helps you 'train' your brain. I only recently learned about it and paid for a month. At first it was quite unimpressive, but as the difficulty ramped up it became genuinely interesting. I will need more time to judge, but it seems to help with brain training.
It is divided into five areas: Speed, Memory, Attention, Flexibility, and Problem Solving, and each area has various games. One of my favorites is a memory game called Pinball, shown below. Several pins appear on screen, then disappear completely. Next you see a marker showing from which corner the ball will be shot. Now you must remember where the pins were and predict where the ball will end up after multiple bounces. As the number of pins grows and the number of bounces increases, it becomes fairly hard to predict.

Lumosity Pinball, one of the games
According to the TechCrunch interview video, founder Mike Scanlon says that seeing his grandfather pass away from Alzheimer's disease when he was young was the trigger for starting the company. Curious about how to prevent such things, he studied neuroscience at Stanford. There he found that theories that could help with brain training had been researched, but nobody had turned them into a service that actually helps people, so he built one himself.
Perhaps because it was built by a Stanford neuroscientist who watched a family member suffer from Alzheimer's, the service has grown quickly: it launched in 2007 and had 40 million members as of April 2013. With a subscription fee of $12 per month, revenue must be substantial. This is another company I expect will keep growing.
The future of big data
A few days ago Amazon made headlines when it emerged that it had been granted a patent titled Anticipatory Shipping. Before a customer even presses the order button, Amazon predicts whether they will order and starts shipping the item to a warehouse near them. By the time the order is placed, the item is already close by, so delivery can be much faster. Analyzing data to learn customer tastes is already impressive, but now predicting purchases too? It really does look like Amazon intends to rule the world. You can view the patent here. From a quick read, the main idea is to ship the item toward the customer's area before the final delivery address is known, then fill in the precise address while the package is in transit. If the item was sent to a nearby warehouse but the customer ultimately did not order it, Amazon compares the cost of returning the item to the central warehouse with the cost of simply delivering it to the customer, and if the latter is cheaper, ships it to the customer for free. It gives me goosebumps to think that Amazon might guess I want something and send it before I order.
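The return-or-give-away rule described in the patent summary boils down to a single cost comparison. Here is a minimal sketch of that comparison as I read it; the function name and the dollar amounts are hypothetical, and the real decision presumably weighs more factors than these two costs.

```python
def handle_unordered_item(return_cost_usd: float, delivery_cost_usd: float) -> str:
    """Decide what to do with an item staged near a customer who never ordered it."""
    if delivery_cost_usd < return_cost_usd:
        return "deliver to customer for free"
    return "return to central warehouse"


# Hypothetical costs for two staged items
cheap_to_deliver = handle_unordered_item(return_cost_usd=8.0, delivery_cost_usd=3.0)
cheap_to_return = handle_unordered_item(return_cost_usd=2.0, delivery_cost_usd=3.0)
print(cheap_to_deliver, "|", cheap_to_return)
```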
In the future, will the coffee you were thinking of ordering already be made while you are on your way to Starbucks, the clothes you will buy already picked out on your way to the department store, the dish you will order already prepared on your way to the restaurant? That is a bit of a stretch, but still, it is not entirely comforting that we keep trying to figure out and predict the sacred free will of human beings.





