Modern enterprises run on data, but moving this data around and giving it the right shape so it can be used in specific applications remains a complex undertaking. Definity, which is launching out of stealth Wednesday and announcing a $4.5 million seed funding round, wants to give these companies the tools to observe, fix and optimize their data pipelines.
The twist here is that unlike many of its competitors, it doesn’t only look at the data once it’s transformed and deposited somewhere — at which point it becomes hard to troubleshoot when things go awry — but while the data is still in motion.
The startup supports a wide variety of environments but focuses on Apache Spark-based applications (on-prem or on top of managed services like Google’s Dataproc, AWS EMR or Databricks, for example), which is maybe no surprise given that all of the co-founders have a lot of experience with open-source data-processing engines. CTO Ohad Raviv is a Spark contributor and the former big-data tech lead at PayPal. Roy Daniel, the company’s CEO, previously worked at FIS, while VP of R&D Tom Bar-Yacov was formerly a data engineering manager at PayPal.
In an interview, Daniel stressed that the company focuses on the data transformation plane on top of a data lake or warehouse, not the data ingestion part of the pipeline. Some of the issues the team experienced during its time working for these large enterprises include data quality problems brought about by inconsistent data, schema changes and stale data. “Those are data quality issues that propagate downstream,” he said. “They affect the business, whether it’s models that are working on top of bad data now, or dashboards or BI that is broken and all of a sudden, the CFO is like, what’s going on?”
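To make those failure modes concrete, here is a minimal PySpark sketch of the kind of checks a data team might otherwise hand-roll to catch them. The table path, column names and thresholds are invented for illustration, and the snippet says nothing about how Definity itself implements its checks.

```python
# Hypothetical, hand-rolled checks for the issue types Daniel describes:
# schema changes, stale data and inconsistent values. Illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("manual-data-quality-checks").getOrCreate()
df = spark.read.parquet("s3://example-bucket/orders/")  # placeholder path

# 1. Schema change: compare today's columns against an expected snapshot.
expected_columns = {"order_id", "customer_id", "amount", "updated_at"}
actual_columns = set(df.columns)
if actual_columns != expected_columns:
    print(f"Schema drift detected: {actual_columns ^ expected_columns}")

# 2. Stale data: finding no rows updated in the last day is suspicious.
fresh_rows = df.filter(
    F.col("updated_at") >= F.current_timestamp() - F.expr("INTERVAL 1 DAY")
).count()
if fresh_rows == 0:
    print("Stale data: no rows updated in the last 24 hours")

# 3. Inconsistent data: e.g. a spike in null or negative amounts.
total = df.count()
bad = df.filter(F.col("amount").isNull() | (F.col("amount") < 0)).count()
if total > 0 and bad / total > 0.01:  # illustrative 1% threshold
    print(f"Inconsistent data: {bad} of {total} rows look wrong")
```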
Another problem is data pipelines that simply break, fail and then re-run, as well as pipelines that haven’t been optimized and end up costing far more to run than necessary.
“We met through a mutual friend,” Daniel told me when I asked him how the founding team first met and decided to tackle this specific issue. “We all come from financial services, but in our first meeting, we already realized that we’re actually fighting and are challenged by the same problem from the two sides of the coin. And this was the spark, and we thought: ‘Hey, we should do something about that.’”
What makes Definity stand out is that it monitors the data in motion. This allows it to detect issues right at the source, making it easier to troubleshoot and to optimize these pipelines. Diagnosing the root cause of an issue isn’t impossible when all you have is the final result, but it’s definitely a lot easier when you can look at all of the different steps that led to it. This also means that Definity could stop a pipeline from ever running if the input data is corrupted, for example.
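The snippet below sketches that “gate before the run” idea in PySpark. The validation rules and paths are placeholders, and this is not Definity’s mechanism; it simply shows why catching corrupted input up front is cheaper than debugging a broken output table afterwards.

```python
# Illustrative "gate" pattern: validate the input before the expensive
# transformation ever runs, instead of debugging its output afterwards.
# The checks and paths are placeholders, not Definity's implementation.
import sys

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("gated-pipeline").getOrCreate()
events = spark.read.parquet("s3://example-bucket/raw_events/")  # placeholder

def input_is_corrupted(df) -> bool:
    """Cheap sanity checks on the input, before any transformation."""
    if df.rdd.isEmpty():
        return True
    if "event_id" not in df.columns or "event_time" not in df.columns:
        return True
    duplicate_ids = df.groupBy("event_id").count().filter("count > 1").count()
    return duplicate_ids > 0

if input_is_corrupted(events):
    # Stop here: nothing bad propagates downstream to models or dashboards.
    sys.exit("Input failed validation; skipping this pipeline run.")

# The actual (expensive) transformations only run past this point.
result = events.withColumn("event_date", F.to_date("event_time"))
result.write.mode("overwrite").parquet("s3://example-bucket/clean_events/")
```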
“Today’s enterprise data leaders face serious pressure to ensure the reliability of the data powering the business, while increasing scale, cutting costs, and adopting AI technologies,” said Nate Meir, a general partner at StageOne Ventures, which led Definity’s seed round. “But without X-ray vision into every data application, data teams are left blind and reactionary. Definity is addressing this need head-on with a paradigm-shifting solution that is both powerful and seamless for data engineering and data platform teams.”
Since the service uses an agent-based system, it also stays out of the way of the developers who build and maintain these systems. No code changes are needed, and the agents simply run in line with every Spark or data application in the pipeline. It’s worth noting, though, that even for those customers who use Definity’s hosted service, only metadata is ever transferred to its servers.
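Definity hasn’t published its payload format, but the hedged sketch below illustrates what “metadata only” can mean in practice for a Spark job: the schema, row count, partitioning and timing leave the application, while the rows themselves never do. The field names, path and output format are invented for illustration.

```python
# Hypothetical sketch of metadata-only telemetry: it describes the data
# (schema, volume, timing) without ever shipping the rows themselves.
# The payload shape is invented for illustration, not Definity's format.
import json
import time

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("metadata-only-telemetry").getOrCreate()

start = time.time()
df = spark.read.parquet("s3://example-bucket/transactions/")  # placeholder
row_count = df.count()

run_metadata = {
    "application_id": spark.sparkContext.applicationId,
    "input_path": "s3://example-bucket/transactions/",
    "schema": df.schema.jsonValue(),          # column names and types only
    "row_count": row_count,
    "partitions": df.rdd.getNumPartitions(),
    "read_seconds": round(time.time() - start, 2),
}

# A real agent would send this to a collector; printing stands in here.
print(json.dumps(run_metadata, indent=2))
```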
The funding round was led by StageOne, with participation from Hyde Park Venture Partners and a number of strategic angel investors.