Data Taps 

The Optimized Data Ingestion

Start streaming newline-delimited JSON (logs, clickstream, events, IoT metrics, etc.) into your S3 Express Bucket in minutes, at a tenth of the cost of AWS Firehose!

A Data Tap uses DuckDB SQL to transform your incoming data into optimally compressed, partitioned Parquet files, immediately cutting your data processing and data storage costs. You write the SQL!

Our innovative custom C++ AWS Lambda runtime delivers steady latency, high performance, and low cost at unparalleled scale!

Start reliable processing now. Stop worrying about clusters, setups, scaling configurations, onboarding and integration projects, breaking servers, downtime, or costs!

Data Taps is as cloud native as it gets: fully serverless on AWS Lambda with S3 and S3 Express!
— Welcome to the new world!

A Data Tap is a single AWS Lambda Function with a Function URL, running an embedded OLAP SQL database (DuckDB), and it scales like nothing else!
— It is 10x cheaper than Firehose to start with!

Furthermore, you can deploy Taps to your own AWS Account — Data Taps is BYOC ready from the start, so your Data Plane stays with you!

Data Taps traffic is encrypted, and ingestion is both authenticated and access controlled (ACLs). In the configuration you list the Data Taps users who are allowed to send data to the Tap. Data Taps works securely across regions and across accounts.

Even with authenticated senders, you need a resilient data processor (DuckDB) that bypasses erroneous data instead of stopping — no poison pills!
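The bypass behaviour can be pictured with a small sketch (illustrative only; in a real Tap this filtering happens inside the DuckDB-based Lambda runtime, not in Python): parse each newline-delimited JSON line, keep the valid rows, and skip the poison pills instead of aborting the whole batch.

```python
import json

def filter_ndjson(payload: str) -> tuple[list[dict], int]:
    """Parse newline-delimited JSON, skipping unparseable lines.

    Returns (parsed rows, number of skipped "poison pill" lines).
    """
    rows, skipped = [], 0
    for line in payload.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError:
            skipped += 1  # bypass the bad line, keep processing
    return rows, skipped

batch = '{"ok": 1}\n{broken json\n{"ok": 2}\n'
rows, skipped = filter_ndjson(batch)
# rows == [{"ok": 1}, {"ok": 2}], skipped == 1
```

The point is that one malformed line costs you exactly one line, never the stream.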

Embed Data Tap realtime metrics into your documentation

Below you can find 2 embedded (iframe) Data Tap metrics Widgets. If not connected, they ask for your Data Taps username and password. Once connected, and if your account has access to the configured Data Tap metrics, data starts flowing into the configured PlotlyJS or ECharts diagrams.

  • You can log in with the demo user demo+test@boilingdata.com and password demodata2024.

  • You can use our demo Postgres CDC to Data Taps repository as a starting point: push changes made to your Postgres databases to Data Taps, and enjoy realtime embeddable analytics along with OLAP charts in Boiling as the Data Tap pushes your change capture data to your S3 Bucket.

  • This demo Data Tap is public, so you (or anybody else) can HTTP POST data to it and see the metrics update on the widgets. Non-public Taps require an authorization token, which you get from the Boiling API.

  • For example, send data to the public Tap using the command below.

curl -d '{"some":"testData","newlineDelimited":true,"really":100}' \
  https://ythjkjtk6behgxomi3sajud5qq0rlxky.lambda-url.eu-west-1.on.aws/
  • A Data Tap widget embeds an SQL database engine (DuckDB WASM) in the browser, and all matching data flowing from BoilingData is written to it.

  • Once per second, an SQL command is run to fetch data from the local in-browser database, and the PlotlyJS or ECharts graph is updated. The first chart on this page uses ECharts and the second one below uses PlotlyJS.

  • Data Tap iframes are connected to each other, so once you log in to one of them, the rest also log in and connect. Likewise, if you log out from one of them, the others log out too, for better UX.

  • The temporary session context is saved in localStorage in the browser, so even if you refresh the page, you don't need to re-login as long as the security context is still valid. After 30 minutes, the Taps are automatically disconnected in this demo.

  • If you click the upper left corner to log out, the temporary security session context is also removed, forcing a login with username and password again.

  • These widgets are Operational Analytics widgets displaying incoming data characteristics, but nothing about the data contents. With Data Taps you can also use SQL to generate insights from the data itself, both in the browser and on the cloud side.

  • These metrics come directly from the Data Taps (via a WebSocket connection), so this is the fastest, lowest-latency route to the browser. Please note that the widget itself only refreshes once per second, regardless of how much and how frequently metrics data flows in.

  • This public Tap has a Lambda reserved concurrency of 2, so about 20 RPS max and about 12 MB/s. The Lambda Function URL will throttle if 2 Lambda functions are already running concurrently; just keep sending the data.

  • Maximum reserved concurrency with default settings on a clean AWS account is 1000, giving a theoretical maximum of 100k RPS and about 580 GB/s data ingestion speed (when every ingestion message is the maximum 6 MB). And this 1000 concurrency is only a soft limit on AWS.

  • All the data flows into S3, the widgets only show metrics about the incoming data.

1. Generate & Validate

Send as much data as you like to your Tap URLs. Taps will filter out erroneous, unparseable JSON lines.

2. Filter & Transform

Taps do data transformations with DuckDB SQL. You can aggregate, filter, and transform your streaming data.
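As a concrete illustration of this step, the sketch below runs an aggregate-and-filter SQL over a few NDJSON events. It uses Python's bundled sqlite3 as a stand-in for DuckDB, and the table and column names are made up for the example; a real Tap runs the DuckDB SQL you configure, inside the Lambda.

```python
import json
import sqlite3

# Incoming newline-delimited JSON events (illustrative sample).
ndjson = (
    '{"service": "api", "status": 200, "ms": 12}\n'
    '{"service": "api", "status": 500, "ms": 87}\n'
    '{"service": "web", "status": 200, "ms": 5}\n'
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (service TEXT, status INT, ms INT)")
conn.executemany(
    "INSERT INTO events VALUES (:service, :status, :ms)",
    [json.loads(line) for line in ndjson.splitlines()],
)

# The transform step: aggregate and filter before landing to Parquet.
rows = conn.execute(
    """
    SELECT service, COUNT(*) AS requests, AVG(ms) AS avg_ms
    FROM events
    WHERE status < 500          -- drop server errors
    GROUP BY service
    ORDER BY service
    """
).fetchall()
# rows == [("api", 1, 12.0), ("web", 1, 5.0)]
```

The same shape of SQL (filter in WHERE, reduce in GROUP BY) is what a Tap applies continuously to the stream.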

3. Deploy in a minute

Data Taps are created with a simple YAML template and deployed with a fully fledged, secure command line tool that supports CI/CD. Templates are rigorously validated before being accepted.

4. Collaborate!

Taps use high-performance security access control that borrows ideas from cellular network security.

Your YAML template is like a security Sandbox. You can add Shares to list which users, besides you, can send data to the Data Tap.
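A template might look roughly like the sketch below. Every key name here is hypothetical; the actual schema is defined and validated by the Data Taps command line tool.

```yaml
# Hypothetical Data Tap template: all key names are illustrative,
# NOT the actual schema validated by the CLI.
tap:
  name: clickstream-tap
  transform: |
    SELECT * FROM source WHERE status < 500
  sink:
    bucket: my-s3-express-bucket
shares:
  # Users besides the owner allowed to send data to this Tap.
  - user: colleague@example.com
```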

I want to play with it right away!
— SNR. DATA ENGINEER
I can post JSON to the Tap URL and get realtime stats on my Browser in one go
— Startup Founder
We needed SQL Query Log for Boiling Data and found out this pattern serves tons of use cases!
— Dan Forsberg, Founder of Boiling Data

Obsessed

We are obsessed with making the system simple and powerful!
With a small set of well-defined resources you can build numerous use cases and more. Sandboxes are corners in your S3 Bucket(s), Taps are URLs for ingesting data, and Shares provide access.

We are obsessed with making Data Taps easy, reliable, verified, and fast!
If you have been frustrated by existing IaC tooling, we hear you. We give you a command line tool, run deployments on the cloud side, plan them against real existing resources, and run rigid verifications against your YAML on multiple levels. We do deployments with Least Privilege security principles so that deployers can only deploy what they need to. Secure interworking is designed into the system inside out. The data is yours, in an open and optimal format (Parquet). Deployment resources are write-through cached so that a deployment takes a couple of seconds instead of tens of minutes or hours.

We built Data Taps on top of the lowest AWS Lambda layer possible: a dedicated, custom-built C++ runtime. We optimised it further, for example taking care of data synchronisation when AWS Lambda containers shut down. We deliver optimal chunks of data to the S3 Express Bucket, partitioned by year, month, day, and hour. Furthermore, the data is put into <start>_<end> folders and carries an accompanying ingestion timestamp inside the data. Data compaction from the S3 Express Bucket to normal S3 Buckets is done automatically for you with the Core User subscription.
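The resulting layout can be sketched as a key builder. Only the year/month/day/hour partitioning and the <start>_<end> folder naming come from the description above; the Hive-style key format and the timestamp format are assumptions for illustration.

```python
from datetime import datetime, timezone

def partition_key(start: datetime, end: datetime, filename: str) -> str:
    """Build an S3 key partitioned by year/month/day/hour with a
    <start>_<end> folder for the chunk's time range.

    The "year=.../month=..." Hive-style key and the compact
    timestamp format are assumptions, not the documented layout.
    """
    folder = f"{start:%Y%m%dT%H%M%S}_{end:%Y%m%dT%H%M%S}"
    return (f"year={start:%Y}/month={start:%m}/day={start:%d}/"
            f"hour={start:%H}/{folder}/{filename}")

start = datetime(2024, 5, 1, 13, 0, 5, tzinfo=timezone.utc)
end = datetime(2024, 5, 1, 13, 59, 59, tzinfo=timezone.utc)
key = partition_key(start, end, "data.parquet")
# key == "year=2024/month=05/day=01/hour=13/20240501T130005_20240501T135959/data.parquet"
```

Hour-level partitioning like this is what lets downstream queries prune to just the time range they need.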

We are obsessed with making data more reachable for you!
Compared to other vendors' serverless or non-serverless offerings, Data Taps is typically at least 10-20x more cost effective while maintaining very high standards. This is only possible thanks to the innovative, highly optimised custom C++ AWS Lambda Data Taps runtime. The sheer simplicity is hard to grasp, and the quality delivered is unparalleled. In December 2023, AWS published a 12x Lambda scalability increase per Lambda Function and released S3 Express. Together with DuckDB, they are a killer combo for stream processing that you need to see and experience yourself to believe! For similar usage scenarios and requirements, this is hands down game over for other stream processing architectures. With Data Lake patterns and e.g. Apache Iceberg becoming more popular, Data Taps addresses the data ingestion problem with unbelievable efficiency!

Compare!

Trad. Cloud Serverless Offering

  • TOTAL

    • $1500 - 2500 / month 

    • $36k impl. cost over 6m

  • $850 - Stream ingestion into S3

  • $300 - $500 - Raw data processing and business logic

    • AWS Glue JSON → partitioned Parquet

    • AWS Glue Jobs business logic

    • S3 storage and API + DWH ingestion

  • $150 - Serverless DWH + Cloud BI for Reporting

    • Redshift Serverless 24 RPU 15min / day

    • QuickSight admin account and 5 readers

    • NO realtime operational Dashboards

  • $1000 - maintenance / devops / monitoring / CICD

    • 20% one person + infra

  • $36,000 (one time cost) - Impl. project

    • IaC, CI/CD, monitoring, alerts for the above infra, Glue Spark code with tests, BI Dashboards, DWH schemas and ingestion, documentation, etc.

    • ~ 6 man months x $6,000

  • Other

    • No High-Availability (HA)

    • Data loss possibilities if MSK is sized too small

    • Monolithic / shared infrastructure

    • Slow data processing with Spark / Python / SparkSQL

    • Time to market: optimistically 6 months, if you have the required competencies and capacity and know what to build

Data Taps

  • TOTAL

    • $30 / month

    • $50 impl. cost over 1h

  • $30 - Data Taps, up to 1 TB (Core User 10 TB)

    • Secure: Encrypted, Authenticated, ACLed

    • Stream data ingestion (newline delimited JSON)

    • AWS Lambda unparalleled horizontal scalability!

    • DuckDB SQL Transformation that you choose

    • Optimised ingestion to your S3 Express

    • Embeddable realtime Tap data metrics

  • $0 - maintenance / devops – N/A

  • $50 - (one time cost) Impl. project. 1 h from 1 person.

  • Other

    • Time to market: in hours!

    • Steady and low latency

    • Multi-region support

    • HA built-in, multi-AZ architecture and S3

    • High data durability (S3, S3 Express)

    • Data mesh architecture

    • Data processing with SQL

Choose your plan

FREE Starter!
Free

Data Taps with "small" data


✓ 1 Tap, transformations with DuckDB SQL
✓ Data intake: 10 GB / month
✓ Landing to your S3 Bucket
✓ auth token lifetime: 24h
Subscriber
€29.90
Every month


✓ 10 Taps, transformations with DuckDB SQL
✓ Data intake: 1 TB / month
✓ Landing to your S3 Bucket
✓ Embeddable realtime Tap data metrics
Core User
€199.00
Every month


✓ Unlimited Taps, transformations with DuckDB SQL
✓ Data intake: 10 TB / month
✓ Landing to your S3 Bucket
✓ Embeddable realtime Tap data metrics
✓ Cleanup & Compaction into your S3 Bucket with Iceberg