Data will power every piece of our existence in the near future. I collect “Data Points” to help understand this future.
If you want to support this, please share it on Twitter, LinkedIn, or Facebook.
This week I stumbled over a series of things: the Data Mesh Learning Slack community, a primer on Reverse ETL, and Data Product SLAs.
I stumbled over the Data Mesh Learning Slack community, which is an excellent place to learn about data meshes and to ask questions of people who are already deeply involved in building them. Scott Hirleman et al. did a great job of…
This week I stumbled over a couple of important topics I've mentioned before: data meshes, test-driven development for data teams, and testing data with great-expectations.
I was about to write an article myself about a test-driven workflow for data people’s “daily bread”. But lo and behold, Marcos Marx was faster. …
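In a test-driven workflow you write a failing test for a transformation first, and only then do you write the code that makes it pass. The following is a minimal sketch under a few assumptions. The `latest_per_customer` deduplication step, its column names, and the sample rows are all hypothetical. The tests are plain `assert`-style functions that pytest would collect, and they can also be called directly.

```python
# Hypothetical transformation: keep only the most recent row per customer_id.
def latest_per_customer(rows):
    """Return one row per customer_id, keeping the row with the largest updated_at."""
    latest = {}
    for row in rows:
        key = row["customer_id"]
        # ISO-8601 date strings compare correctly as plain strings.
        if key not in latest or row["updated_at"] > latest[key]["updated_at"]:
            latest[key] = row
    # Sort so the output order is deterministic and easy to test.
    return sorted(latest.values(), key=lambda r: r["customer_id"])


# Written *before* the function body in a test-first workflow.
def test_keeps_most_recent_row():
    rows = [
        {"customer_id": 1, "updated_at": "2021-01-01", "email": "old@example.com"},
        {"customer_id": 1, "updated_at": "2021-03-01", "email": "new@example.com"},
        {"customer_id": 2, "updated_at": "2021-02-01", "email": "b@example.com"},
    ]
    emails = [r["email"] for r in latest_per_customer(rows)]
    assert emails == ["new@example.com", "b@example.com"]


def test_empty_input():
    assert latest_per_customer([]) == []


if __name__ == "__main__":
    test_keeps_most_recent_row()
    test_empty_input()
    print("all tests passed")
```

Because the tests pin down the expected behavior first, a later refactor of the transformation (for example, moving it into SQL) can be checked against the same cases.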
As you might’ve noticed from the title, this week I got to thinking about data product management, and in particular about three topics on which I’d like to share my favorite resources, because I think product management is so important for data teams.
I recently picked up a great paper by Google, via the Data Engineering Weekly newsletter, about data, quality, and the impact on…
The three data points for today are next-gen data lakes with lakeFS, declarative DAGs with boundary-layer, and fast data engineer onboarding with SQLPad.
lakeFS is a tool that provides a layer on top of your AWS S3 or GCS data lake. It allows automatic versioning and branching of your data. The team provides lots of best practices, e.g. showing how to set up a data…
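The core idea behind git-like branching of a data lake is that a branch is only a pointer to an immutable snapshot, so creating one copies no data. lakeFS has its own API and an S3-compatible gateway, and the code below uses neither. It is a hypothetical, in-memory toy model of the concept. `ToyRepo`, `Commit`, and the fast-forward-only merge are illustrative assumptions, not how lakeFS is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    # Immutable snapshot: object path -> object bytes (stand-in for S3/GCS objects).
    objects: Dict[str, bytes]
    parent: Optional["Commit"] = None
    message: str = ""


@dataclass
class ToyRepo:
    """Hypothetical in-memory model of git-like branches over object paths."""

    branches: Dict[str, Commit] = field(
        default_factory=lambda: {"main": Commit(objects={})}
    )
    staged: Dict[str, Dict[str, bytes]] = field(default_factory=dict)

    def create_branch(self, name: str, source: str = "main") -> None:
        # A branch is only a pointer to a commit, so no data is copied.
        self.branches[name] = self.branches[source]

    def put(self, branch: str, path: str, data: bytes) -> None:
        # Uncommitted writes are isolated to the branch they were made on.
        self.staged.setdefault(branch, {})[path] = data

    def get(self, branch: str, path: str) -> Optional[bytes]:
        staged = self.staged.get(branch, {})
        if path in staged:
            return staged[path]
        return self.branches[branch].objects.get(path)

    def commit(self, branch: str, message: str) -> Commit:
        head = self.branches[branch]
        objects = {**head.objects, **self.staged.pop(branch, {})}
        self.branches[branch] = Commit(objects=objects, parent=head, message=message)
        return self.branches[branch]

    def merge(self, source: str, target: str = "main") -> None:
        # Fast-forward only: the target head must be an ancestor of the source head.
        node: Optional[Commit] = self.branches[source]
        while node is not None and node is not self.branches[target]:
            node = node.parent
        if node is None:
            raise ValueError("target has diverged; a real tool needs a three-way merge")
        self.branches[target] = self.branches[source]


if __name__ == "__main__":
    repo = ToyRepo()
    repo.put("main", "events/2021-01.parquet", b"v1")
    repo.commit("main", "initial load")
    repo.create_branch("etl-fix")
    repo.put("etl-fix", "events/2021-01.parquet", b"v2")
    repo.commit("etl-fix", "reprocess January")
    print(repo.get("main", "events/2021-01.parquet"))  # production still sees b'v1'
    repo.merge("etl-fix")
    print(repo.get("main", "events/2021-01.parquet"))  # now b'v2'
```

The reprocessing happens on its own branch, and production data only changes at the moment of the merge.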
“We needed an extra day to merge the transformation branches together.” “Ah yeah, but there was a bug once we finally got the data to production, so we had to redo some stuff for another 2 days.” … Sound familiar? To me, data and analytics engineers seem particularly prone to running into the “merge hell” or the “defect in production” scenario.
But there is a good software engineering practice that can resolve these problems altogether! It’s called “trunk-based development” (TBD). …
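Trunk-based development is commonly paired with feature flags. Unfinished work can then be merged into the trunk early and often without changing what production runs. Here is a minimal sketch of that idea applied to a data transformation. The flag name `USE_NEW_REVENUE_LOGIC` and both revenue functions are hypothetical, and reading flags from environment variables is just one simple option among many.

```python
import os


def flag_enabled(name: str) -> bool:
    # Hypothetical flag convention: "1" means on, anything else (or unset) means off.
    return os.environ.get(name, "0") == "1"


def revenue_v1(orders):
    # Current production logic: gross revenue over all orders.
    return sum(o["amount"] for o in orders)


def revenue_v2(orders):
    # New logic still under development: exclude refunded orders.
    return sum(o["amount"] for o in orders if not o.get("refunded", False))


def compute_revenue(orders):
    # Both code paths live on the trunk; the flag decides which one runs.
    if flag_enabled("USE_NEW_REVENUE_LOGIC"):
        return revenue_v2(orders)
    return revenue_v1(orders)


if __name__ == "__main__":
    orders = [{"amount": 100}, {"amount": 50, "refunded": True}]
    print(compute_revenue(orders))
```

The switch-over then becomes a configuration change rather than a long-lived branch waiting to be merged.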
This letter has a single topic: data that does not come in table form. Huh? Yep, lots of datasets look pretty “tabular”; they come in tables. Some data, like images, comes in different forms but can still be put into tables.
But SOME data does not come in table form. That’s the basic idea behind a field called “geometric deep…
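A graph, such as a small social network, is a typical example of non-tabular data: it is naturally stored as nodes plus edges. Graph neural networks, one family of geometric deep learning models, compute each node's representation from its neighbors. The code below is a hypothetical, dependency-free illustration of a single simplified step of that kind (mean aggregation over a node and its neighbors). It is not a full model.

```python
from collections import defaultdict


def neighbors(edge_list):
    # Build an undirected adjacency list from (u, v) edge pairs.
    adj = defaultdict(set)
    for u, v in edge_list:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def mean_aggregate(feats, edge_list):
    """One simplified message-passing step: each node's new vector is the
    element-wise mean of its own vector and its neighbors' vectors."""
    adj = neighbors(edge_list)
    out = {}
    for node, vec in feats.items():
        group = [vec] + [feats[n] for n in adj[node]]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out


if __name__ == "__main__":
    # Illustrative data: three nodes with 2-dimensional features, connected a-b-c.
    features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    edges = [("a", "b"), ("b", "c")]
    print(mean_aggregate(features, edges))
```

Real graph neural networks add learned weight matrices and nonlinearities and stack several such steps. What the example shows is that the computation follows the graph's connections, not rows and columns.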
Machine learning, AI, and data science all come with lots of scary, complicated concepts like deep neural networks, cross-entropy, and optimization.
That’s enough jargon to scare any product manager except the most tech-savvy away from even thinking about integrating machine learning into their products.
That, in turn, makes it hard for a company to get the full value out of its machine learning engineers if most product managers shy away from involving them.
I like to use a dead-simple checklist that, in my opinion, any non-techy product manager can use to spot whether a business opportunity lends itself…
Here are your weekly three data points: Cloud Data Warehouses, CI & CD for Data Warehouses, and Continuous Delivery for Machine Learning (CD4ML).
For the first time, I’m sharing an actual data point, a number:
Gartner estimates that by 2023, 75% of all data warehouses will be in the cloud.
Why do I share this? Because I think it’s important to know! Your future data warehouse…
Artifact comes from the two Latin words arte, “by skill”, and factum, “made”. Something skillfully created on purpose: a great word to describe something created by a development process.
In the discipline of Business Intelligence (the collection of “technologies & processes” a company uses to systematically analyze data), I find that six artifacts are the cornerstone of most processes.
Most articles on data architectures leave some of them out, or at least leave out the appropriate tooling, maybe intentionally, maybe not (see, for instance, the great article at a16z, which mostly focuses on reports, dashboards & ad hoc…
Here are your weekly three data points: DataOps Testing, Airbnb’s Quality Initiative, and Testing with dbt.
1 DataOps, Value and Innovation Pipelines, DataKitchen
In DataOps, companies aim for one error or fewer per year. To an average data guy, that might sound crazy! …
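Aiming that low usually means testing the data itself automatically on every pipeline run, not just the code. dbt, for example, ships generic tests such as `unique` and `not_null`. The sketch below mimics those two checks in plain Python over a list of rows. The function names, the `order_id` column, and the sample rows are hypothetical, and this is not dbt's implementation.

```python
from collections import Counter


def check_not_null(rows, column):
    """Return indices of rows where `column` is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def check_unique(rows, column):
    """Return values of `column` that appear more than once (nulls are ignored here)."""
    counts = Counter(row.get(column) for row in rows if row.get(column) is not None)
    return sorted(value for value, n in counts.items() if n > 1)


def run_checks(rows):
    # Each check returns its failures; only failing checks are reported.
    failures = {
        "not_null(order_id)": check_not_null(rows, "order_id"),
        "unique(order_id)": check_unique(rows, "order_id"),
    }
    return {name: bad for name, bad in failures.items() if bad}


if __name__ == "__main__":
    rows = [{"order_id": 1}, {"order_id": 2}, {"order_id": 2}, {"order_id": None}]
    print(run_checks(rows))
```

In a real pipeline, a non-empty result would fail the run before the bad data reaches production. That is the mechanism that makes low error counts plausible.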
Ph.D., Product Manager, DevOps & Data enthusiast, and author of “Three Data Point Thursday”: https://www.getrevue.co/profile/svenbalnojan.