What’s the t in EtLT? How to conduct manual data checks and the rise of the data engineer.

Data will power every piece of our existence in the near future. I collect “Data Points” to help understand & shape this future.

If you want to support this, please share it on Twitter, LinkedIn, or Facebook.

(1)🔮 E T! LT

With the rise of EL(T) over ETL, we took a great step towards much simpler and better processes in the data world. But it is becoming apparent that in some cases a little (t), as in E(t)L(T), is actually needed in some form, because for some data sources we only want parts of the data, not the complete raw data…
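To make the little (t) concrete, here is a minimal sketch of what such a step could look like: a light projection applied between extract and load that keeps only the fields we want. The field names, the `extract`, `small_t`, and `load` functions, and the sample records are all hypothetical and only serve as an illustration; no specific tool works exactly like this.

```python
from typing import Any, Dict, Iterable, Iterator, List, Sequence

# Hypothetical allow-list: the only fields we want to land in the warehouse.
KEEP_FIELDS: Sequence[str] = ("id", "email", "created_at")


def extract() -> Iterator[Dict[str, Any]]:
    """Stand-in for a real source; yields raw records (made-up sample data)."""
    yield {"id": 1, "email": "a@example.com", "created_at": "2021-01-01", "debug_blob": "xxx"}
    yield {"id": 2, "email": "b@example.com", "created_at": "2021-01-02", "debug_blob": "yyy"}


def small_t(
    records: Iterable[Dict[str, Any]], keep: Sequence[str] = KEEP_FIELDS
) -> Iterator[Dict[str, Any]]:
    """The little (t): drop every field that is not on the allow-list."""
    for record in records:
        yield {key: record[key] for key in keep if key in record}


def load(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Stand-in for a real loader; here it just collects the rows in memory."""
    return list(records)


# E -> (t) -> L; the heavy (T) would still happen later, inside the warehouse.
loaded = load(small_t(extract()))
```

The point of the sketch is that the (t) stays deliberately small: it only selects data and applies no business logic, so the bigger transformations can still live after the load.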


What are the cool OSS data projects this year? What does a good data roadmap look like? And how to pitch the data mesh paradigm to the C-level.

(1)🔥 Hot OSS Data Projects

Pete Soderling ran a Data Council survey on the hot data projects for 2021, and it features a bunch of interesting ones you should have a look at. The good old Apache Airflow and the transformation tool dbt are on the list, but so are a few other interesting tools you might not have seen.

A few I’d like to highlight: Apache…


Notes from Industry

How to apply the true Data as Code philosophy to achieve close to zero production defects using the tried & true methods from software development on data.

Yep, zero defects! That’d be awesome. Image by the author.

Data teams spend close to 60% of their time on operational things, not producing value. They also experience a high number of bugs in their data systems, according to the DataKitchen study & Gartner’s survey. Yet, in the software development world, we already have the philosophies in place that allow high-performing teams to deliver both quickly and at a high level of quality, without any of these problems.

So why don’t we carry over the exact same practices? After all, they originate in the lean manufacturing world and already got carried over to software. …


How GitHub, Meltano, Airbyte, and Atlassian manage to stay focused on bigger goals while still staying flexible and agile.

“As a PM, you must plan for the near term milestones (more detailed) as well as for the long term strategy (more broad), and everything in between. Considered as a spectrum, these form a nearsighted roadmap. This will enable you to efficiently communicate both internally and externally how the team is planning to deliver on the product vision.” (from the GitLab Product Handbook)

“In preparing for battle I have always found that plans are useless, but planning is indispensable.” (Dwight D. Eisenhower)

Planning, roadmaps, OKRs. The modern-day product manager has lots of different tools he can use to conduct his…


Finally a Singer SDK, the data-as-a-product webinar, and using README-driven development for better data-related work products.

(1) Singer SDK + SingerHub + Spec Extension

I do believe that the future of data, be it BI or data integration & EL(T) workflows, is open source, simply due to the nature of the task. So it’s great to see the GitLab Meltano team tackle the three major challenges that are out there with one of the current options.

Let’s step back for a second. Currently, there really is only…


How self-service analytics works at GitLab, how Delivery Hero built their data mesh with BigQuery, and what you should know about Data Catalogs v.2.0.

(1) 🔮 GitLab’s Data Team & Their Self-Service Program

I just stumbled over the data team at GitLab. The company maintains software I personally enjoy using; it has over 1,000 employees and roughly $130m in revenue. Of course, I like the openness of GitLab in general, and the precision with which e.g. the data team crafts their team page.

But what I really like is the presentation of their self-service program…


How to design great dashboards, why breaking your production system makes it more robust, and the dbt coding conventions.

(1)🔮 Chaos Toolkit

Chaos Engineering means breaking things in production to make them more resilient and better. The key idea is that test/stage systems are not similar enough to the production system to test out everything. Sounds like a crazy concept, but Netflix very convincingly uses this exact idea to improve their systems day in and day out.

Data: so what? I find the…


Lots of Machine Learning Libraries to assess image quality, produce explanations for your models, or forecast & classify time series.

(1) 🚀 Image Quality Assessment Implementation

The German price comparison website idealo.de provides an implementation of some interesting applied Google research from 2018 called “NIMA: Neural Image Assessment”. The paper describes two neural networks the team open-sourced. The first network aims to establish the aesthetic looks of an image, while the second takes a guess at the technical looks.

So basically, these two networks help you determine how pretty…


Why functional data engineering is the right approach to batch ETL, how machine learning can use a functional approach as well, and how to build evolutionary data architectures.

🎄 (1) Functional Data Engineering

Two years ago, Maxime Beauchemin, the creator of both Apache Airflow and Superset, published an article about why the functional paradigm is as important in data engineering as it is in software engineering. I very much agree, and I feel this idea is still not completely absorbed by the community. …


What the future of BI looks like, how to generate proper unique keys in SQL, and a final look at how to build data platforms.

🔥 (1) The Future of BI is Open Source

Maxime Beauchemin, the creator of both Apache Airflow and Superset, just published a great piece about why the future of business intelligence is open source. I totally agree with him and still find it mind-boggling that open source is just now catching up to this. …

Sven Balnojan

Ph.D., Product Manager, DevOps & Data enthusiast, and author of “Three Data Point Thursday”: https://www.getrevue.co/profile/svenbalnojan.
