Enter data mesh learning mode, a new data tool category, and why you need data SLAs.

Data will power every piece of our existence in the near future. I collect “Data Points” to help understand this future.

If you want to support this, please share it on Twitter, LinkedIn, or Facebook.

This week I stumbled over a few things: the Data Mesh Learning Slack community, a primer on Reverse ETL, and data product SLAs.

1 Data Mesh Learning Slack Community

I stumbled over the Data Mesh Learning Slack community, which is an excellent place to learn about data meshes and ask questions of people already deeply involved in building these things. Scott Hirleman et al. did a great job of…


The 101 of data meshes, testing data with auto-profiling, and developing data pipelines in a test-driven way.

Created by the author with pitch.com. The image has nothing to do with the newsletter; it just came to my mind while observing the latest COVID-19 statistics in Germany, and it’s a data point.


This week I stumbled over a few important topics I’ve mentioned before: data meshes, test-driven development for data teams, and testing data with Great Expectations.

1 TDD for Data People

I was about to write an article myself about a test-driven workflow for data people’s daily bread. But lo and behold, Marcos Marx was faster. …
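The gist of a test-driven workflow for data work: write the test that pins down the expected output first, then write the transformation to make it pass. A minimal sketch with pandas; the function and column names are my own for illustration, not from Marcos’s article:

```python
import pandas as pd

# Step 1: the test comes first and pins down what the transformation must do.
def test_deduplicate_keeps_latest_order():
    raw = pd.DataFrame({
        "order_id": [1, 1, 2],
        "updated_at": ["2021-01-01", "2021-01-02", "2021-01-01"],
        "amount": [10, 12, 7],
    })
    result = deduplicate_orders(raw)
    assert list(result["order_id"]) == [1, 2]
    assert list(result["amount"]) == [12, 7]  # the latest row wins

# Step 2: the transformation is written only to make the test pass.
def deduplicate_orders(df: pd.DataFrame) -> pd.DataFrame:
    return (df.sort_values("updated_at")
              .drop_duplicates("order_id", keep="last")
              .sort_values("order_id")
              .reset_index(drop=True))

test_deduplicate_keeps_latest_order()
```

The payoff is the same as in regular software TDD: the test doubles as documentation of the expected shape of your data, and it keeps passing (or loudly failing) as the pipeline evolves.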


Three tips on data product management: how data errors cascade, why one-piece workflows beat context switching, and why you shouldn’t fix all data bugs.


As you might’ve noticed from the title, I’ve been thinking about data product management this week, in particular three topics I’d like to share my favorite resources on, because I think product management is so important for data teams.

1 Data Errors Cascade, Watch Your Data or Your Data Project Dies

I recently picked up a great paper, published by Google, from the Data Engineering Weekly newsletter about data quality and its impact on…
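The cascade effect is easy to demo with a toy pipeline; this example is my own illustration of the idea, not from the paper:

```python
# One malformed upstream record (a price logged in cents instead of euros)
# cascades: every downstream aggregate built on top of it is wrong too.
orders = [
    {"order_id": 1, "price": 10.0},
    {"order_id": 2, "price": 12.0},
    {"order_id": 3, "price": 1100.0},  # bad record: 11.00 EUR logged as cents
]

revenue = sum(o["price"] for o in orders)   # stage 1: 1122.0 instead of 33.0
avg_order_value = revenue / len(orders)     # stage 2: 374.0 instead of 11.0
forecast = avg_order_value * 1000           # stage 3: the error is now enormous

print(revenue, avg_order_value, forecast)   # 1122.0 374.0 374000.0
```

No single downstream stage is buggy, which is exactly why these errors are so hard to spot: the fix has to happen (or at least be detected) at the source.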


Versioned data lakes, declarative DAGs, and shared SQL with SQLPad.


The three data points for today are next-gen data lakes with lakeFS, declarative DAGs with boundary-layer, and fast data engineer onboarding with SQLPad.

1 lakeFS, versioning and branching data

lakeFS is a tool that provides a layer on top of your AWS S3 or GCS data lake. It allows automatic versioning and branching of your data. The team provides lots of best practices, e.g. showing how to set up a data…
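The core idea is git-like: a branch is a cheap pointer to your objects, and writes on a branch stay invisible to main until you merge. Here is a toy mental model in Python; this is my own sketch of the concept, not lakeFS’s API or internals:

```python
# A toy model of lakeFS-style branching: writes on a branch don't touch
# the source branch until you merge. (A real system copies pointers,
# not data; this sketch copies data for simplicity.)

class ToyLake:
    def __init__(self):
        self.branches = {"main": {}}

    def put(self, branch, key, value):
        self.branches[branch][key] = value

    def branch(self, name, source="main"):
        # create a new branch as a snapshot of the source branch
        self.branches[name] = dict(self.branches[source])

    def merge(self, source, target="main"):
        # bring the source branch's changes into the target branch
        self.branches[target].update(self.branches[source])

lake = ToyLake()
lake.put("main", "events/2021-01.parquet", "v1")
lake.branch("experiment")
lake.put("experiment", "events/2021-01.parquet", "v2")
# main is untouched until the experiment branch is merged
assert lake.branches["main"]["events/2021-01.parquet"] == "v1"
lake.merge("experiment")
assert lake.branches["main"]["events/2021-01.parquet"] == "v2"
```

That isolation is what makes patterns like “run the new pipeline against a branch, validate, then merge” safe on a production data lake.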


Getting Started

How to avoid the merge hell, speed up delivery of business value, reduce defects, and live happily ever after in your data warehouse.

Faster development, fewer defects on deployment to production with trunk-based development in data workflows. Image by the author.

“We needed an extra day to merge the transformation branches together”, “Ah yeah but there was a bug once we finally got the data to production, so we had to redo some stuff for another 2 days”,… sound familiar? To me, it seems like data and analytics engineers are particularly prone to run into the “merge hell” or the “defect in production” scenario.

But there is a good software engineering practice that can resolve these problems altogether! It’s called “trunk-based development” (TBD). …
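The TBD recipe: everyone merges small changes into the trunk at least daily, and unfinished work ships dark behind a feature flag instead of living on a long branch. A minimal sketch of the flag pattern for a data transformation; the flag name and functions are invented for illustration:

```python
import os

def clean_revenue_v1(rows):
    # current production behavior
    return [r for r in rows if r["revenue"] is not None]

def clean_revenue_v2(rows):
    # new, still-unfinished logic lives on the trunk but stays dark in prod
    return [r for r in rows if r["revenue"] is not None and r["revenue"] >= 0]

def clean_revenue(rows):
    # flip the flag per environment instead of keeping a long-lived branch
    if os.environ.get("ENABLE_REVENUE_V2") == "1":
        return clean_revenue_v2(rows)
    return clean_revenue_v1(rows)

rows = [{"revenue": 5}, {"revenue": None}, {"revenue": -3}]
print(clean_revenue(rows))  # v1 path keeps the -3 row until the flag flips
```

Because both code paths merge to the trunk daily, the “extra day to merge the transformation branches” simply never happens; the risky part becomes flipping a flag, which is trivially reversible.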


PyTorch BigGraph, Alibaba's Euler, and the PersLay Framework.



This letter has a topic: data that does not come in table form. Huh? Yep, most data sets look pretty “tabular”: they come in tables. Some data, like images, comes in a different form but can still be put into tables.

But SOME data does not come in table form. That’s the basic idea behind a field called “geometric deep…
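A graph is the canonical example: you can flatten its edges into a table, but the neighborhood structure this field cares about lives between the rows, not inside any single one. A tiny, library-free sketch (the names are invented for illustration):

```python
# The same small graph, once as a "table" of edges and once as an
# adjacency list. The table loses nothing, but questions like
# "who are ada's neighbors?" are about structure, not individual rows.
edges = [("ada", "bob"), ("bob", "cid"), ("ada", "cid")]  # tabular form

adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

print(sorted(adjacency["ada"]))  # ['bob', 'cid']
```

Geometric deep learning methods operate directly on such neighborhoods (and on meshes, manifolds, and point clouds) instead of forcing the data through a tabular representation first.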


Use 7 simple questions to find machine learning opportunities, even without any technical knowledge

(Photo by Markus Spiske, Unsplash)

Machine learning, AI, and data science all carry lots of scary, complicated concepts: deep neural networks, cross-entropy, optimization…

Enough scary words to scare off all but the most tech-savvy product managers from even thinking about integrating machine learning into their products.

But that, in turn, makes it hard for a company to get the full value out of its machine learning engineers if most product managers shy away from employing them.

I like to use a dead-simple checklist that, in my opinion, any non-techy product manager can use to spot whether a business opportunity lends itself…


Your data warehouse will be in the cloud, period. Data teams and machine learners are adopting CI & CD as well.

CD4ML Demo based on GitLab CI.


Here are your weekly three data points: Cloud Data Warehouses, CI & CD for Data Warehouses, and Continuous Delivery for Machine Learning (CD4ML).

1 Cloud Data Warehouses: Which to Choose?

For the first time, I’m sharing an actual data point, a number:

Gartner estimates that by 2023, 75% of all data warehouses will be in the cloud.

Why do I share this? Because I think it’s important to know! Your future data warehouse…


Dashboards, Graphs, Reports, Spreadsheets, OLAP Cubes, or direct SQL Access?

The six BI artifacts. The typical company goes on a journey from spreadsheets -> reports & dashboards -> “somewhere else…”.

Artifact, from the two Latin words arte (“by skill”) and factum (“made”): something skillfully created on purpose. A great word to describe something created by a development process.

In the discipline of business intelligence, the collection of technologies and processes a company uses to systematically analyze data, I find that six artifacts are the cornerstone of most processes.

In most articles on data architectures, some of these artifacts are missing, or at least the appropriate tooling is, maybe intentionally, maybe not (see for instance the great article at a16z, which mostly focuses on reports, dashboards & ad hoc…


Understanding DataOps testing, doing it with dbt, and how central initiatives are key to data quality.


Here are your weekly three data points: DataOps testing, Airbnb’s quality initiative, and testing with dbt.

1 DataOps, Value and Innovation Pipelines, DataKitchen

In DataOps, companies aim for one error or fewer per year. To the average data person, that might sound crazy! …
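For the “testing with dbt” data point above: dbt’s built-in schema tests (not_null, unique, and friends) boil down to assertions run against each model. The same idea in plain Python, as my own sketch of the concept rather than dbt’s implementation:

```python
# Hand-rolled equivalents of dbt's not_null and unique schema tests.
def not_null(rows, column):
    return all(r[column] is not None for r in rows)

def unique(rows, column):
    values = [r[column] for r in rows]
    return len(values) == len(set(values))

orders = [
    {"order_id": 1, "customer": "a"},
    {"order_id": 2, "customer": None},
]

# dbt would fail the build on a failing test; here we just collect results.
results = {
    "order_id_not_null": not_null(orders, "order_id"),
    "order_id_unique": unique(orders, "order_id"),
    "customer_not_null": not_null(orders, "customer"),
}
print(results)  # customer_not_null is False
```

In dbt you declare the same assertions in a model’s schema YAML and the tool generates and runs the SQL for you, which is what makes this kind of testing cheap enough to apply to every model.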

Sven Balnojan

Ph.D., Product Manager, DevOps & Data enthusiast, and author of “Three Data Point Thursday”: https://www.getrevue.co/profile/svenbalnojan.
