
🐰 #32 Please don’t become the next AltaVista, TikTok’s Algorithm, Snowflake’s Difference; ThDPTh #32 🐰

Three Data Point Thursday
By Sven Balnojan  • Issue #32
Why your company will probably end up in the graveyard of data companies, how TikTok’s algorithm works, and why Snowflake seems to be different from other databases.
If you’re reading this via email, congrats: you’re one of the first 100 subscribers to this newsletter. “Small but excellent” seems like a good mantra here.
Data will power every piece of our existence in the near future. I collect “Data Points” to help understand & shape this future.
If you want to support this, read it.

(1) TikTok’s Algorithm
Why is TikTok’s algorithm so good? This is a lovely video explaining some ideas behind the algorithm. The WSJ simply created hundreds of fake accounts, had them watch videos, and then tried to reverse-engineer the algorithm. Their key insight: the algorithm’s most important input is how long you “linger over / rewatch a specific video”. So, why is it in this newsletter? Because I had a few thoughts while watching this video:
  1. The algorithm is really successful.
  2. It’s apparently pretty simple.
  3. It could be much more successful if it were optimized for long-term wins instead of short-term ones, which I think is what a lot of the criticism is about. (Call me naive, but I think every company should want healthy, functioning & well-informed customers in the long run.)
  4. It’s apparently easy to reverse engineer.
Putting points 1–3 aside, think about point 4. This algorithm seems easy enough to reverse-engineer, even though ByteDance seems to consider it an important piece of intellectual property. I think the reason is that machine learning algorithms are seldom engineered with the goal of being hard to reverse-engineer. So if you truly believe an algorithm is an important piece of intellectual property, you should also consider how likely it is to be reverse-engineered.
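To make the WSJ finding a bit more concrete, here is a toy sketch of a recommender whose only signal is watch time. This is not TikTok’s code (ByteDance hasn’t published it); it simply assumes that the watch ratio (seconds watched divided by video length, where values above 1.0 mean rewatches) drives the ranking. The viewing log, topics, and function names are all hypothetical. A ranker this simple is also easy to probe from the outside, which is roughly what the fake-account experiment did.

```python
from collections import defaultdict

# Hypothetical viewing log: (video_id, topic, seconds_watched, video_length_seconds)
views = [
    ("v1", "cooking", 30, 30),  # watched to the end
    ("v2", "cooking", 55, 30),  # rewatched
    ("v3", "sports", 3, 30),    # skipped quickly
    ("v4", "sports", 5, 20),
    ("v5", "travel", 18, 30),
]

def topic_affinity(log):
    """Average watch ratio per topic; a ratio above 1.0 means the viewer rewatched."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _video, topic, watched, length in log:
        totals[topic] += watched / length
        counts[topic] += 1
    return {topic: totals[topic] / counts[topic] for topic in totals}

def rank_candidates(candidates, affinity):
    """Order candidate (video_id, topic) pairs by the viewer's lingering on that topic."""
    return sorted(candidates, key=lambda c: affinity.get(c[1], 0.0), reverse=True)

affinity = topic_affinity(views)
candidates = [("c1", "sports"), ("c2", "travel"), ("c3", "cooking"), ("c4", "gaming")]
print([video for video, _ in rank_candidates(candidates, affinity)])
# ['c3', 'c2', 'c1', 'c4']
```

Note that nothing here optimizes for the viewer’s long-term well-being; the sketch maximizes lingering, and nothing else.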
(2) Why Snowflake is different
This article takes a stab at explaining why Snowflake is fundamentally different from what’s happening at Azure or AWS. Since none of us can see the source code of these products, it’s only an educated guess, but the story makes sense.
The basic idea is that Redshift is Postgres plus massively parallel processing, but still Postgres, whereas Snowflake was truly built to decouple storage from compute. Wherever these products stand today, Snowflake’s focus on decoupling storage from compute is a key difference from what, e.g., Redshift does. On AWS it’s a feature; for Snowflake, it’s the USP.
The #1 Reason Snowflake is Different | by Doug Foo | Geek Culture | Medium
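To illustrate what “decoupling storage from compute” means, here is a toy sketch, not Snowflake’s actual architecture. It assumes a single shared storage layer (standing in for files in an object store) and several independent “warehouses” that own compute but no data. All names are hypothetical. In a coupled, shared-nothing design, each node would instead own a shard of the rows, so adding nodes would mean redistributing data.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical shared storage: immutable partitions of rows that every
# warehouse reads from. No warehouse owns any of this data.
SHARED_STORAGE: dict[str, list[list[int]]] = {
    "orders": [[5, 10, 15], [20, 25], [30, 35, 40, 45]],
}

@dataclass
class Warehouse:
    """A toy compute cluster: it has workers, but stores no data itself."""
    name: str
    workers: int

    def resize(self, workers: int) -> None:
        # Scaling compute only changes the worker count; no rows are moved.
        self.workers = workers

    def sum_table(self, table: str) -> int:
        partitions = SHARED_STORAGE[table]
        # Workers scan whole partitions pulled from the shared storage layer.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return sum(pool.map(sum, partitions))

# Two independent warehouses query the same data at the same time.
etl = Warehouse("etl", workers=1)
bi = Warehouse("bi", workers=4)
print(etl.sum_table("orders"), bi.sum_table("orders"))  # 225 225
bi.resize(8)  # scale one warehouse without touching storage or the other warehouse
print(bi.sum_table("orders"))  # 225
```

The design choice this highlights: because storage is shared, each team can get its own right-sized compute, and resizing never requires a data migration.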
🔮🔮🔮 Data Company Corner 🔮🔮🔮
Stuff that might interest anyone on the front lines of the data world inside a data company, inspired by the positive feedback on my article about commercial open-source software data companies.
(3) Graveyard of Search Engines
A point I’ve been trying to make for some time now is that in the data space, you very likely have to embrace open source. And in the open-source data space, you’re heading into a winner-takes-all market.
That means you are not building “another ETL tool” or “another CDP solution”. You’re either building the only one, the winner, or a company headed for bankruptcy sometime in the next 10–15 years.
I really like the search engine market analogy because it illustrates this dynamic very well. The search engine market is much older than either the data or the open-source market, and it has developed a very clear structure:
  1. One winner: Google
  2. Two followers that survive because they enforce monopolies (Yandex & Baidu)
  3. One well-differentiated player, DuckDuckGo, with an uncertain future. (Google did try to take down Baidu but was stopped; I’m not sure what would really stop Google from taking down DuckDuckGo once it becomes big enough.)
That’s it: that’s 99% of the entire search market. Here you can find everyone else:
7 Search Engines Google Obliterated
As a side note, the CMS market looks similar: it’s dominated by one huge open-source player.
🎄 In Other News & Thanks
Thanks for reading this far! I’d also love it if you shared this newsletter with people who you think might be interested in it.
P.S.: I share things that matter, not the most recent ones. I share books, research papers, and tools. I try to provide a simple way of understanding all these things. I tend to be opinionated. You can always hit the unsubscribe button!

Data; Business Intelligence; Machine Learning, Artificial Intelligence; Everything about what powers our future.
