Data Taps
The Best Data Ingestion!
Stream data to S3 at scale in minutes, up to 50x cheaper than AWS Firehose!
Save on time, development, downstream processing, and storage costs. Transform with DuckDB SQL.
Our innovative custom C++ AWS Lambda runtime guarantees steady latency, high performance, and low cost with unparalleled scale!
Stop worrying and start reliable processing now! No clusters, setups, scaling configurations, onboarding and integration projects, breaking servers, downtime, or surprise costs to worry about!
Data Taps is as cloud native as it gets: fully AWS Lambda serverless with S3 and S3 Express. Welcome to the new world!
A Data Tap is a single AWS Lambda Function with a Function URL. It runs an embedded SQL OLAP database (DuckDB) and scales like nothing else!
It is up to 50x cheaper than Firehose to start with!
Furthermore, you can deploy Taps to your own AWS Account. Data Taps is BYOC ready from the start, so your Data Plane stays with you!
Data Taps traffic is encrypted, and ingestion is both authenticated and access controlled (ACLs). In the configuration you list the Data Tap users who are allowed to send data to the Tap. Data Taps works securely across regions and accounts.
Even with authenticated data, you need a resilient data processor (DuckDB) that bypasses erroneous data instead of stopping. No poison pills!
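The poison-pill behaviour can be illustrated with a minimal sketch (plain Python for illustration, not the actual C++ runtime): unparseable newline-delimited JSON lines are skipped rather than failing the whole batch.

```python
import json

def filter_json_lines(payload: str) -> list[dict]:
    """Keep parseable newline-delimited JSON records, skip poison pills."""
    records = []
    for line in payload.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # Erroneous data is bypassed instead of stopping the stream.
            continue
    return records

batch = '{"a": 1}\nnot json at all\n{"b": 2}'
print(filter_json_lines(batch))  # → [{'a': 1}, {'b': 2}]
```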
Embed Data Tap realtime metrics into your documentation
Below you can find two embedded (iframe) Data Tap metrics widgets. If not connected, they ask for your Data Taps username and password. Once connected, and if your account has access to the configured Data Tap metrics, the data starts flowing into the PlotlyJS or ECharts diagrams.
You can login with demo user: demo+test@boilingdata.com and password: demodata2024.
You can use our demo Postgres CDC to Data Taps repository as a starting point: push changes made to your Postgres databases to Data Taps and start enjoying realtime embeddable analytics and OLAP charts with Boiling, as the Data Tap pushes your change capture data to your S3 Bucket.
This demo Data Tap is public, so you (or anybody else) can HTTP POST data to it and see the metrics update on the widgets. Non-public Taps require an authorization token, which you get from the Boiling API.
Send data to the public Tap for example using the command below.
curl -d '{"some":"testData","newlineDelimited":true,"really":100}' \
  https://ythjkjtk6behgxomi3sajud5qq0rlxky.lambda-url.eu-west-1.on.aws/
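The same POST can be sketched in Python with only the standard library. The send is disabled by default here so the sketch is safe to run offline; the `Authorization` Bearer header shown in the comment is an assumption, only relevant for non-public Taps.

```python
import json
import urllib.request

# Public demo Tap URL from this page.
TAP_URL = "https://ythjkjtk6behgxomi3sajud5qq0rlxky.lambda-url.eu-west-1.on.aws/"

# Data Taps ingests newline-delimited JSON: one record per line.
records = [{"some": "testData", "newlineDelimited": True, "really": 100}]
payload = "\n".join(json.dumps(r) for r in records).encode()

req = urllib.request.Request(TAP_URL, data=payload, method="POST")
# For a non-public Tap you would attach the token from the Boiling API,
# e.g. (assumed header shape):
# req.add_header("Authorization", "Bearer <token>")

SEND = False  # flip to True to actually POST to the public demo Tap
if SEND:
    with urllib.request.urlopen(req) as resp:
        print(resp.status)

print(payload.decode())
```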
A Data Tap widget embeds an SQL db engine (DuckDB WASM) in the browser and all matching data flowing from BoilingData is written to it.
Once per second, an SQL command is run to fetch data from the local in-browser database for the PlotlyJS or ECharts graph, and the graph is updated. The first chart on this page uses ECharts and the second one below uses PlotlyJS.
Data Tap iframes are connected to each other, so that once you log in to one of them, the rest will also log in and connect. Likewise, if you log out from one of them, the others will log out too, for better UX.
A temporary session context is saved in localStorage in the browser, so even if you refresh the page you don't need to log in again as long as the security context is still valid. For this demo, Taps are automatically disconnected after 30 minutes.
If you click the upper left corner to log out, the temporary security session context is also removed, forcing you to log in with username and password again.
These widgets are Operational Analytics widgets displaying incoming data characteristics, but nothing about the data contents. With Data Taps you can also use SQL to generate insights from the data itself, both in the browser and on the cloud side.
These metrics come directly from the Data Taps (via a WebSocket connection), so this is the fastest, lowest-latency route to the browser. Please note that the widget itself only refreshes once per second, regardless of how much and how frequently metrics data flows in.
This public Tap has a Lambda reserved concurrency of 2, so about 20 RPS max and about 12 MB/s. The Lambda Function URL will throttle if there are already 2 concurrent Lambda functions running; just keep sending the data.
Maximum reserved concurrency with default settings on a clean AWS account is 1000, giving a theoretical maximum of 100k RPS and about 580 GB/s data ingestion speed (when every ingestion message is the maximum 6 MB). And this 1000 concurrency is only a soft limit on AWS.
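The back-of-the-envelope math behind those numbers, in Python, using the 6 MB Lambda payload limit and the 100 RPS per concurrent function implied by the 100k RPS figure:

```python
concurrency = 1000       # default soft limit on a clean AWS account
rps_per_function = 100   # implied by the 100k RPS figure above
max_payload_mb = 6       # maximum ingestion message size

total_rps = concurrency * rps_per_function
ingest_gb_s = total_rps * max_payload_mb / 1024  # MB/s -> GB/s

print(total_rps)            # → 100000
print(round(ingest_gb_s))   # → 586, i.e. "about 580 GB/s"
```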
All the data flows into S3, the widgets only show metrics about the incoming data.
1. Generate & Validate
Send as much data to your Tap URLs as you like. Taps will filter out erroneous, unparseable JSON lines.
2. Filter & Transform
Taps do data transformations with DuckDB SQL. You can aggregate, filter, and transform your streaming data.
3. Deploy in a minute
Data Taps are created with a simple YAML template and deployed with a fully fledged, secure command line tool supporting CI/CD. Templates are rigorously validated before being accepted.
4. Collaborate!
Taps utilise highly performant access control and borrow ideas from cellular network security.
Your YAML template is like a security Sandbox. You can add Shares to list which other users may send data to the Data Tap.
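The exact template schema is not shown on this page; purely as an illustration of the Sandbox/Tap/Share concepts, a Data Tap YAML could look something like this (all field names here are hypothetical, not the actual Data Taps schema):

```yaml
# Hypothetical sketch only - the real Data Taps template fields may differ.
tap:
  name: demo-tap
  sink:
    bucket: s3://my-express-bucket   # your Sandbox corner in S3
  transform:
    sql: |
      SELECT * FROM stream WHERE status = 'ok'   # DuckDB SQL transform
  shares:
    - demo+test@boilingdata.com      # users allowed to send data
```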
Obsessed
We are obsessed with making the system simple and powerful!
With a small set of well-defined resources you can build numerous use cases and more: Sandboxes are corners in your S3 Bucket(s), Taps are URLs for ingesting data, and Shares provide access.
We are obsessed with making Data Taps easy, reliable, verified, and fast!
If you have been frustrated by existing IaC tooling, we hear you. We give you a command line tool, run deployments on the cloud side, plan them against the real existing resources, and run rigid verifications on multiple levels against your YAML. We do the deployments with Least Privilege security principles, so that deployers can only deploy what they need to. Secure interworking is designed into the system from the inside out. The data is yours, in an open, optimal format (Parquet). Deployment resources are write-through cached, so that a deployment takes a couple of seconds instead of tens of minutes or hours.
We built Data Taps on top of the lowest AWS Lambda layer possible: a dedicated, custom-built C++ runtime. We further optimised it and, for example, take care of data synchronisation when AWS Lambda containers shut down. We deliver optimal chunks of data to the S3 Express Bucket, partitioned by year, month, day, and hour. Furthermore, the data is put into <start>_<end> folders and carries an accompanying ingestion timestamp inside the data. Data compaction from the S3 Express Bucket to normal S3 Buckets is done automatically for you with a Core User subscription.
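As an illustration of the layout described above, a partition prefix could be derived like this (the folder naming here is assumed; the exact key format used by the Data Taps runtime may differ):

```python
from datetime import datetime, timezone

def partition_prefix(start: datetime, end: datetime) -> str:
    """Build an hour-partitioned S3 prefix with a <start>_<end> leaf folder.

    Illustrative only - the actual key format used by the runtime
    may differ.
    """
    return (
        f"year={start:%Y}/month={start:%m}/day={start:%d}/hour={start:%H}/"
        f"{start:%Y%m%dT%H%M%S}_{end:%Y%m%dT%H%M%S}/"
    )

start = datetime(2024, 1, 15, 10, 0, 0, tzinfo=timezone.utc)
end = datetime(2024, 1, 15, 10, 0, 30, tzinfo=timezone.utc)
print(partition_prefix(start, end))
# → year=2024/month=01/day=15/hour=10/20240115T100000_20240115T100030/
```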
We are obsessed with making data more reachable for you!
Compared to other vendor-provided serverless or non-serverless offerings, Data Taps is typically at least 10-20x more cost effective while maintaining very high standards. This is only possible due to the innovative and highly optimised custom C++ AWS Lambda Data Taps runtime. Its sheer simplicity is hard to grasp, and the quality delivered is unparalleled. In December 2023 AWS announced a 12x scalability increase per Lambda Function and released S3 Express. Together with DuckDB, they are a killer combo for stream processing that you need to see and experience yourself to believe! For similar usage scenarios and requirements, this is hands down game over for other stream processing architectures. With Data Lake patterns and e.g. Apache Iceberg becoming more popular, Data Taps addresses the data ingestion problem with unbelievable efficiency!
Compare!
Trad. Cloud Serverless Offering
TOTAL:
$1500 - 2500 / month
$36k impl. cost over 6m
$850 - Stream ingestion into S3
$600 - Kafka
$200 - Kafka Connect
$25 - S3 JSON Raw Storage (1TB & API)
$300 - $500 - Raw data processing and business logic
AWS Glue JSON → partitioned Parquet
AWS Glue Jobs business logic
S3 storage and API + DWH ingestion
$150 - Serverless DWH + Cloud BI for Reporting
RedShift Serverless 24 RPU 15min / day
QuickSight admin account and 5 readers
NO realtime operational Dashboards
$1000 - maintenance / devops / monitoring / CICD
20% one person + infra
$36,000 (one time cost) - Impl. project
IaC, CI/CD, monitoring, alerts for the above infra, Glue Spark code with tests, BI Dashboards, DWH schemas and ingestion, documentation, etc.
~ 6 man months x $6,000
Other
No High-Availability (HA)
Possible data loss if MSK is sized too small
Monolithic / shared infrastructure
Slow data processing with Spark / Python / SparkSQL
Time to market: optimistically 6 months, if you have the required competencies and capacity and you know what to build
Data Taps
TOTAL
$30 / month
$50 impl. cost over 1h
$30 - Data Taps, up to 1 TB (Core User 10 TB)
Secure: Encrypted, Authenticated, ACLed
Stream data ingestion (newline delimited JSON)
AWS Lambda unparalleled horizontal scalability!
DuckDB SQL Transformation that you choose
Optimised ingestion to your S3 Express
Embeddable realtime Tap data metrics
$0 - maintenance / devops – N/A
$50 (one time cost) - Impl. project: 1 hour from 1 person
Other
Time to market: in hours!
Steady and low latency
Multi-region support
HA built-in, multi-AZ architecture and S3
High data durability (S3, S3 Express)
Data mesh architecture
Data processing with SQL
Choose your plan
Data Taps with "small" data