Set up a basic pipeline for log analysis
The Metabase Team
‧ 4 min read
Choosing tools to clean, parse, and structure log data can be overwhelming (and expensive). But when you’re just getting started, you can get away with a simple setup for ad hoc analysis with a BI tool like Metabase. Here are a few best practices to follow for setting up a basic pipeline for analyzing logs.
Use a data connector tool as a shortcut for ingestion
Tools like Airbyte can quickly connect to your database and structure logs for you. Choose your logging source, like AWS CloudTrail, and connect it to a database, like Snowflake (a relatively easy, scalable, and reasonably priced solution) or AWS Aurora Serverless Postgres (an easy, somewhat scalable, low-cost solution).
Other ETL tools, like Fivetran or Stitch, work in a similar way: they use a connector to move log data from a source, like CloudTrail, to your database. You can also pair an ETL tool with data modeling to do some of the heavier lifting for you.
Use a single cloud provider to keep everything under one roof
Google Cloud Logging connects with BigQuery so you can automatically ingest logs right into your data warehouse. AWS has multiple logging options, like CloudTrail or CloudWatch, that you can connect to one of its database options, like RDS for Postgres. Azure Monitor also has logging and storage capabilities.
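For example, once a Cloud Logging sink is exporting to BigQuery, you can sanity-check the data with the google-cloud-bigquery Python client. This is a minimal sketch; the project, dataset, and table names below are placeholders for wherever your sink writes, not a fixed schema.

```python
from google.cloud import bigquery

# Assumes a Cloud Logging sink exporting to a BigQuery dataset named
# "logs_export"; the project and table names below are placeholders.
client = bigquery.Client()

query = """
    SELECT timestamp, severity, resource.type AS source, textPayload
    FROM `my-project.logs_export.run_googleapis_com_stderr`
    WHERE severity IN ('ERROR', 'CRITICAL')
      AND timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
    ORDER BY timestamp DESC
    LIMIT 100
"""

for row in client.query(query).result():
    print(row.timestamp, row.severity, row.source, row.textPayload)
```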
Advanced use case: dump logs from multiple AWS services into an S3 bucket and query them with Athena
If you have a bit of experience with a cloud provider like AWS, you can use its suite of services to pull logs from several different services and push them into one central location to prepare for analysis.
For example, push web server or application logs from your EC2 instances into an S3 bucket, along with your CloudTrail logs. Connect your S3 bucket to a querying tool, like Athena, so you can create a few tables to use for analysis. Once you have tables, you can connect to your analytics tool and create a troubleshooting dashboard, like one that maps EC2 events to CloudTrail incidents for root cause analysis.
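As a rough sketch of that flow, here's how you might run an Athena query over CloudTrail logs from Python with boto3. It assumes you've already created a cloudtrail_logs table in a database called logs (for example, with the CREATE TABLE statement from the CloudTrail console); the region and results bucket are placeholders.

```python
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Placeholder database, table, and results bucket; adjust to your setup.
query = """
    SELECT eventtime, eventsource, eventname, errorcode, sourceipaddress
    FROM cloudtrail_logs
    WHERE errorcode IS NOT NULL
    ORDER BY eventtime DESC
    LIMIT 50
"""

execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "logs"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then print the rows.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=query_id)
    for row in results["ResultSet"]["Rows"][1:]:  # first row is the header
        print([col.get("VarCharValue") for col in row["Data"]])
```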
Here are some other AWS logging options that you can use with S3 and Athena:
- CloudWatch: store application, system, or custom logs
- RDS: store error, slow query, or transaction logs
- Lambda: store lambda logs that contain execution details, error messages, and custom log statements
- Elastic Load Balancer (ELB): store ELB logs that contain client IP address, request time, and response status code
If you want to go a step further, you can connect Athena to dbt and learn how to write your own data transformations in SQL. dbt streamlines version control, deployment, and testing so you don’t have to run separate tools for each. However, we only recommend this setup if you’re familiar with data modeling and developer tooling.
Batch load logs for efficiency
You should batch load your logs into your data warehouse, or into a storage option like S3, rather than writing every event individually; this cuts down on overhead and resource consumption. Most cloud services offer a batch service where you can schedule and queue jobs. Note that if you’re paying for a database or log storage, double-check the price of batch loading first, as some cloud services charge per batch upload.
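Here’s a small illustration of that idea in Python with boto3: buffer log lines in memory and flush them to S3 in batches instead of uploading each event on its own. The bucket name, key prefix, and batch size are all placeholders.

```python
import datetime
import gzip
import boto3

s3 = boto3.client("s3")
BUCKET = "my-log-landing-bucket"  # placeholder bucket name

buffer = []

def log_event(line: str, batch_size: int = 1000):
    """Buffer log lines and flush to S3 once the batch is full."""
    buffer.append(line)
    if len(buffer) >= batch_size:
        flush()

def flush():
    if not buffer:
        return
    # Compress the batch and write it under a timestamped key so
    # Athena or a warehouse loader can pick it up later.
    body = gzip.compress("\n".join(buffer).encode("utf-8"))
    key = f"app-logs/{datetime.datetime.utcnow():%Y/%m/%d/%H%M%S}.log.gz"
    s3.put_object(Bucket=BUCKET, Key=key, Body=body)
    buffer.clear()
```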
Use a database client library or connector for ingestion
If you don’t have access to a connector tool, that’s not a problem; it just takes more development work. An existing database client library or driver can help you ingest logs directly into your storage or data warehouse at log time.
For example, Postgres has drivers, and MySQL has connectors. Use one of these to hook into your database without having to reinvent the wheel.
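For example, a minimal sketch with the psycopg2 Postgres driver could look like the following. The connection string, table, and columns are placeholders, and in practice you’d combine this with the batching approach above rather than writing one row per event.

```python
import datetime
import psycopg2
from psycopg2.extras import execute_values

# Placeholder connection string and table; adjust to your own setup.
conn = psycopg2.connect("dbname=logs user=ingest password=secret host=localhost")

def write_logs(entries):
    """Insert a batch of (timestamp, source, level, message) tuples."""
    with conn, conn.cursor() as cur:
        execute_values(
            cur,
            "INSERT INTO app_logs (logged_at, source, level, message) VALUES %s",
            entries,
        )

write_logs([
    (datetime.datetime.utcnow(), "checkout-service", "ERROR", "payment gateway timeout"),
    (datetime.datetime.utcnow(), "checkout-service", "INFO", "retrying payment"),
])
```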
Make sure your logs include a timestamp, source, message, and log level
There are four fields we recommend adding to your logs to make log analysis smoother (see the sketch after this list):
- Timestamps: They make it easier to establish a sequence of events. Timestamps are especially important if you’re using an analytics tool to create dashboards for performance monitoring, or for auditing and compliance.
- Source: The service that created the log, as well as the location, file, or sub-service the log came from. You can use the source field for troubleshooting, or just to gauge which resources are allocated to each service.
- Log message: Keep messages clear and concise so you can understand each event. Reuse keywords so it’s easier to filter and find what you need during textual analysis.
- Log level: You can filter on levels like ALERT and CRITICAL to get to the bottom of which logs require immediate investigation and response.
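As a quick sketch of what those fields look like in practice, here’s Python’s standard logging module configured so every line carries a timestamp, a source (the logger name), a level, and a message. The logger name and messages are just examples.

```python
import logging

# Include a timestamp, the source (logger name), the level, and the
# message in every line so downstream parsing stays simple.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

logger = logging.getLogger("checkout-service")  # placeholder source name

logger.info("order created order_id=1234")
logger.critical("payment gateway unreachable, retries exhausted")
```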
Additional best practices and ideas for log analysis
If you’re aiming for real-time, advanced, or frequent log analysis, log-specific tooling tends to be a better fit. If your team is already using an ELK stack, a tool like Grafana could fit the bill.
Here are a few additional resources you can use to make decisions when building out a small-scale logging pipeline: