Exploring the 2023 Metabase Community State of the Data Stack

· 60 minutes


About this event

During this event, Candice Ren, co-founder of 173Tech, and Jacob Joseph, Success Engineer at Metabase, gave their personal thoughts on the insights and results of Metabase's 2023 Community Data Report. Topics covered include why Candice and Jacob believe large companies aim for open source data tooling and keep data ingestion in-house, and why data analysts might have rated their companies as less self-serve than their engineering and C-level counterparts did.

Guests

Candice Ren

Founder, 173Tech

Candice Ren is the co-founder of 173Tech, a modern analytics agency helping fast-growing companies turn data into powerful growth engines. Their team of data experts has been using Metabase since 2019, helping SaaS, mobile app, and eCommerce businesses make sense of their data.

Jacob Joseph

Success Engineer, Metabase

Jacob Joseph is a data educator and social entrepreneur who specializes in enterprise cloud data infrastructure, BI/reporting/analytics, data modeling, ETL, and data engineering.

Summary

Candice and Jacob kicked off the event by sharing a bit about their backgrounds. Candice has been in the data industry for several years, is the former Head of Analytics at Bumble, and is the co-founder of 173Tech. The company has been using Metabase since 2019 and was one of Metabase's first official partners. Jacob has been using Metabase for the last five years and currently works as a Success Engineer at Metabase. He has about 15 years of experience working with data, data engineering, web development, and information visualization.

Before jumping into the report, we quickly reiterated that of the 189 responses we received for the data stack survey, a large majority of respondents were Metabase users, and that most of the insights in this event would be around Candice and Jacob’s personal experiences and takeaways within the industry.

The first insight discussed, that larger companies prefer open-source data tools, made sense to Candice and Jacob. They mentioned that the need for flexibility and scalability drives the demand for open source tooling. Candice added that open source solutions are common, especially in fast-paced, tech-driven organizations.

“The flexibility it offers, in terms of control, optimization, and scalability, is crucial,” said Candice.

Jacob mentioned open source “addresses the history of vendor lock-in issues in the data community… although Airbyte is gaining popularity, offering an approachable solution for data ingestion.”

While discussing insight number two, that customer data was ingested more often than social media data, Candice said she found the result surprising.

“It's surprising to see social media data ranked lower, considering the increasing integration of marketing stacks. First-party data is crucial, especially with changes like iOS 14. Social media data integration is growing, albeit not as fast as hoped. The shift towards centralized data stacks is observed,” said Candice.

Jacob mentioned that “the skew in the survey results, with 90% being Metabase users, might influence the perception. Financial system data is also essential, not represented here.”

On insight number three, “Most companies keep data ingestion in-house”, Candice said, “It's surprising to see in-house data ingestion lead. The philosophy is to recommend cost-effective and time-effective tools like Airbyte and Fivetran alongside in-house solutions.” She also mentioned how “companies, especially enterprises, tend to be slightly more risk-averse and prefer proprietary products.”

Jacob followed up by saying, “Airbyte's rapid growth is observed (in the results), making it more approachable for users building their own solutions.”

Insight number four, “In-house built data catalogs are preferred over specific tools”, sparked conversation around why customers may be shying away from using data cataloging tools.

Candice mentioned, “there might be an underlying problem when companies decide they need a data catalog tool. The focus should be on business needs, and collaboration is key. The target audience varies, and different roles need different things out of a data catalog concept. Collaboration among all teams is crucial.”

Jacob agreed and mentioned how “most data catalogs on the list don't solve the key problems companies are looking for. Collaborative approaches like Metabase's internal data dictionary are effective.”

For the remainder of the report, Candice and Jacob went through the correlations between role satisfaction and factors like the chosen database or the level of self-service at respondents' organizations.

Insight number five covered the impact of the analytics database on the average role satisfaction score. In our report, survey respondents who used Postgres reported a higher role satisfaction score than those who used MySQL.

Candice said that this result aligns with her experience. “MySQL and MariaDB face scaling limitations, impacting performance. Users often face challenges as they scale, leading to a lower satisfaction score.”

Jacob mentioned that “Postgres is a popular backend database, especially among our clients. However, for analytics, data warehouses like BigQuery and Snowflake tend to be more effective as they handle larger volumes optimally.”

Candice agreed. “Exactly, and the flexibility of Metabase on top of different data warehouses adds to its appeal.”

We asked respondents to rate their company's self-service capability on a scale from 1 to 10. PostgreSQL users gave the highest scores, possibly indicating a correlation between the choice of database and a more self-service-oriented environment.

Jacob mentioned he saw this as correlation, not causation. “Choosing a better database may align with a more data-driven culture, fostering self-service.”

Candice agreed. “It's essential to consider the nuances of why certain databases lead to higher self-service scores.”

For our insights around self-service score based on team localization, Jacob said it makes sense that distributed teams scored higher than others. “Distributed teams often need more self-service due to time zone differences. It's a necessity. I'm not surprised by their higher scores,” said Jacob.

Candice agreed because “the nature of distributed work demands self-sufficiency. It is interesting to note the regional differences, although sample sizes in Africa, Central America, and the Middle East are small.”

For our last insight, that data analysts rated their companies lower on self-service than their C-level and engineering counterparts did, Candice and Jacob had a bit of a laugh about the nature of data professionals and their drive to always improve.

“Data professionals are often critical of their work, always striving for improvement. This might be reflected in lower self-service scores, though they might be more self-service than they think,” said Jacob.

Candice agreed. “Data analysts aim to make others more self-service, and their role involves delving deep into data. If they rate their companies as highly self-service, it might be a concern.”