Earlier this year, we released an anonymous data stack survey, through social channels and email, to find out more about data tooling and its impact on different company sizes and roles.
The survey was available to anyone but, out of the 189 responses we received, 89% were Metabase customers.
While we can’t say our insights are a statistically significant representation of data folks everywhere because of the small sample size, the results have a few things you may want to know, like how one specific database could be worse for your team’s morale... Keep reading to find out more.
75% of survey respondents said they’re using an open-source production database, so it’s no surprise that Postgres and MySQL were mentioned most often throughout the survey. But one surprise: larger companies choose an open-source production database more often than not.
Large companies said open source, over performance, scalability, and security, was the deciding factor in choosing their production database.
On trend, 75% of respondents at larger companies said dbt is currently a part of their data stack. Open source was also in their top three reasons for choosing a data modeling tool.
Open source was in every corner of the survey results, which is not surprising given that our community rallies around all open-source tools, not just BI.
Salesforce was the number one upstream data source. Stripe and Slack were also in the top five. We’re curious to know what Slack data you’re ingesting...
Speaking of the top five, these tools all contain quite a bit of PII so it came as a surprise to us that 92% of respondents didn’t choose security and compliance as their top reason for choosing a data storage option. (This warrants a whole other survey...)
What wasn’t shocking to see: many folks aren’t ingesting data from one social media platform anymore.
That $42,000 per month enterprise API cost may have been the final nail in the coffin. X, formerly known as Twitter (RIP), barely made our top ten upstream data sources.
Airbyte and Fivetran rounded out the top three, but in-house data ingestion was still more popular than the two combined.
Maybe legacy architecture forces people to build in-house ingestion tools. Or the cost of third-party tooling outweighs the benefits.
It could also just be that third-party ingestion tools are still growing, so maybe we’ll see a shift to them in the coming year.
But a good number of companies are still choosing to build their own ingestion pipelines. We’ve seen a similar trend in data cataloging (more on that below).
You can keep those Python scripts handy for now. In-house data ingestion seems poised to stay as a complement to commercial offerings rather than being replaced entirely by third-party ingestion tooling.
Although it’s one of the most widely used databases in the industry, MySQL had the lowest role satisfaction score out of the three most commonly used analytics databases.
You may want to rethink your database... and your return to office policy, too. Those happiest in their role said they use PostgreSQL in a distributed team setting.
If you’re using MySQL and have opposing opinions to share, we’re all ears. As for our theory on MySQL’s lower score: it’s a battle-hardened database, but maybe MySQL is keeping older (less fun) codebases afloat.
Postgres users also said their companies are more self-serve than users of other analytics databases, so it may be a wise option if you’re a global, fully remote team.
People working on distributed teams said their companies are more self-serve than localized teams. Distributed companies need self-service tools and processes to work asynchronously and let workers query on their own time. This is pretty straightforward.
But from the results around employee satisfaction, there is one large caveat. Perceptions of self-serve differ by role.
People in data roles perceived their companies as less self-serve than their C-level and Engineering counterparts did.
It’s not surprising that C-levels and Engineers see their companies as more self-serve. They’re the ones using self-serve tooling.
These results could mean that self-serve is doing what it was intended to do. It could also mean that Data Analytics folks think their companies aren’t as self-serve as they had hoped. There isn’t a huge variation here, but it’s worth keeping an eye on.
The good news is we can let you know if that changes! Fill out the survey below to help us figure it out.
The data stack survey is still open. You can submit your answers now via the form. We’ll create follow-up posts on new, interesting findings as they roll in.
The dashboard and this report are static data for you to use. If you do use the data for something cool, make sure to share it with us!