An Austin, Texas, startup called Pilosa launched today with an open source product it says can vastly speed up access to massive datasets and make it much easier to federate them in a single place. The key piece of its technology is a distributed bitmap index (essentially, one that stores data as single bits, 0s or 1s) that’s designed to fit in memory. Pilosa says its index technology can sit on top of pretty much any data store, and that its closest natural competitor is Elasticsearch.
Pilosa spun out of a sports-fan-management company called Umbel, which used (and uses) the technology for customer segmentation. Pilosa also claims some early users in the bioinformatics and network security spaces.
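For readers who haven't run into bitmap indexes before, here is a rough sketch of the general idea as applied to customer segmentation. It is not Pilosa's code or API: the BitmapIndex class, the attribute names and the fan IDs are all made up for illustration, and it runs in a single process rather than being distributed. Each attribute gets one bitmap in which bit i is 1 if record i has that attribute, so combining segments comes down to bitwise AND and OR.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy in-memory bitmap index (hypothetical, not Pilosa's implementation)."""

    def __init__(self):
        # Maps an attribute such as "sport:soccer" to a Python int used as a
        # bitmap: bit i is 1 when record i has that attribute.
        self.rows = defaultdict(int)

    def set_bit(self, attribute, record_id):
        """Mark record_id as having the given attribute."""
        self.rows[attribute] |= 1 << record_id

    def bitmap(self, attribute):
        """Return the bitmap for an attribute (0 if it was never set)."""
        return self.rows.get(attribute, 0)

    @staticmethod
    def ids(bitmap):
        """Return the record ids whose bits are set, in ascending order."""
        result = []
        position = 0
        while bitmap:
            if bitmap & 1:
                result.append(position)
            bitmap >>= 1
            position += 1
        return result


# Made-up example data: fan ids tagged with two attributes.
index = BitmapIndex()
for fan_id in (1, 2, 5):
    index.set_bit("sport:soccer", fan_id)
for fan_id in (2, 3, 5):
    index.set_bit("city:austin", fan_id)

# A segment is a bitwise combination of attribute bitmaps.
both = index.bitmap("sport:soccer") & index.bitmap("city:austin")
either = index.bitmap("sport:soccer") | index.bitmap("city:austin")

print(BitmapIndex.ids(both))    # [2, 5]
print(BitmapIndex.ids(either))  # [1, 2, 3, 5]
```

Because the index holds only bits and record IDs rather than the records themselves, it can stay small enough for memory while the underlying data lives in whatever store it already occupies.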
I spoke with CEO H.O. Maycotte and VP of Product Troy Lanier, who say the company’s initial focus is on building an open source community (here’s the Pilosa GitHub repo) and figuring out what the next set of ideal use cases might be. Pilosa does have an enterprise edition as well, though.
Maycotte used a pretty illustrative analogy to describe how people might think about Pilosa and where it would fit into their current big data environments:
“The card catalog’s always sat in the front of the library, but the truth is that the library is now just a very small piece of the information that you have access to as a student or as a citizen of a city. What we’re proposing now is, let’s make the card catalog its own building, and let’s not just look at the library that’s next door. Let’s look at all the libraries at once.
“That’s really what we want to do to your data. If it’s huge and it’s fragmented, we want to help you access it faster.”
The problems Pilosa is trying to solve are obviously a huge deal today, with exploding volumes of data coming off everything from genomes to connected devices, and with data silos still so prevalent inside large companies. However, even assuming the technology works as advertised beyond its home at Umbel, Pilosa faces the challenge of convincing folks to give it a try, and then of convincing them to re-architect their data environments to incorporate yet another distinct technology on top of, or in lieu of, Hadoop (et al.), Spark, Elasticsearch, Neo4j or whatever else they’re using.
If the company is lucky, some common use cases and architectures will emerge from its community-building efforts, and it can leverage more well-known partner technologies as a foot in the door for educating the market.
In other news (with only a hint of self-promotion), accelerator 500 Startups announced the 500 Startups Data Track today, which will focus on companies building big data, machine learning and AI technologies. Chris Neumann (formerly of DataHero and Aster Data, and now venture partner at 500 Startups) is heading it up, and I’m proud to be among the group of speakers/mentors/coaches for the first batch of companies. I’ll be providing guidance mainly around marketing and PR, two things about which I have some experience on both sides of the equation.
The program kicks off on June 17, and interested startups can apply here.