Anvilogic on Databricks Architecture
Anvilogic implementation on Databricks (AWS, GCP, Azure).
Below is the generic architecture diagram for how Anvilogic works on top of Databricks.
This supports Databricks on Azure, AWS, and GCP.
Diagram:
Databricks is deployed in the IaaS environment you already have and is available on AWS, GCP, and Azure.
Data that already originates in IaaS and can be sent to cloud storage does not require a streaming tool; it can be onboarded to Databricks directly.
Anvilogic requires two primary compute warehouses to run.
SQL Warehouse - Compute for ad-hoc queries to support search, hunting, and incident response (IR).
Serverless (default) or All-Purpose - Runs 24/7, executing workflow jobs (detection use cases) on a cron schedule.
You can choose either type of compute, but Serverless is the default and recommended option, since this warehouse is likely to run 24/7 as detection use cases execute continuously.
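If you prefer to provision the ad-hoc SQL Warehouse programmatically rather than through the UI, a minimal sketch using the Databricks SDK for Python could look like the following. The warehouse name, size, and auto-stop value are placeholder assumptions; adjust them to your environment.

```python
# Minimal sketch: provisioning a serverless SQL warehouse with the Databricks
# SDK for Python (databricks-sdk). Assumes workspace authentication is already
# configured; the name, size, and auto-stop values are placeholders.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

warehouse = w.warehouses.create(
    name="anvilogic-adhoc-sql",          # placeholder name
    cluster_size="Small",                # size to fit your query volume
    enable_serverless_compute=True,      # serverless is the recommended default
    auto_stop_mins=30,                   # stop when idle to control cost
    max_num_clusters=1,
).result()                               # wait for the warehouse to start

print(f"Created warehouse {warehouse.id} ({warehouse.name})")
```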
Detections execute as jobs within a Workflow. Rules built on the Anvilogic platform are converted from a user-friendly SQL builder into PySpark functions that run on a defined schedule.
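As an illustration only, a converted detection rule could resemble the sketch below: a PySpark function that scans a normalized endpoint table for encoded PowerShell over a lookback window and appends matches to an output table. The table and column names (gold_endpoint, process_name, command_line, detections_output) are hypothetical placeholders, not the actual Anvilogic schema.

```python
# Hypothetical sketch of a scheduled detection job produced from a SQL-builder
# rule. Table and column names are placeholders, not the real Anvilogic schema.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

def encoded_powershell_detection(lookback_minutes: int = 15) -> None:
    """Flag PowerShell launches that use an encoded command within the lookback window."""
    matches = (
        spark.table("gold_endpoint")
        .where(F.col("time") >= F.expr(f"current_timestamp() - INTERVAL {lookback_minutes} MINUTES"))
        .where(F.lower(F.col("process_name")).contains("powershell"))
        .where(F.lower(F.col("command_line")).contains("-enc"))
    )
    matches.write.mode("append").saveAsTable("detections_output")

if __name__ == "__main__":
    encoded_powershell_detection()
```

In practice, a Workflow job would invoke a function like this on the cron schedule defined for the rule.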
Datasets that come from assets hosted in a data center, or otherwise outside a public IaaS environment, require a solution to route that data to Databricks.
Data streaming tools (ex. Cribl, Fluentbit, Apache NiFi) can be used to send on-premises logs directly to Databricks.
A data transport/streaming tool is required to send data to IaaS storage or Anvilogic pipelines for ingestion.
Forwarding agents installed on endpoints also need to be reconfigured to send to the streaming tools for ingestion into Databricks.
Neither Databricks nor Anvilogic provides data streaming or endpoint agent technology.
Python Notebooks are used to collect & ingest data from storage and transform raw events into the AVL detection schema using Delta Live Tables.
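To make the collect-and-ingest step concrete, here is a minimal, hypothetical sketch of a Delta Live Tables notebook cell that lands raw events from cloud storage into a bronze table with Auto Loader. The storage path and table name are placeholders.

```python
# Minimal DLT sketch: land raw log lines from cloud storage into a bronze table.
# The storage path and table name are placeholders.
import dlt
from pyspark.sql import functions as F

@dlt.table(
    name="bronze_windows_events",
    comment="Raw, unparsed Windows event logs landed from cloud storage",
)
def bronze_windows_events():
    return (
        spark.readStream.format("cloudFiles")          # Auto Loader
        .option("cloudFiles.format", "text")
        .load("s3://your-security-logs/windows/")      # placeholder path
        .select(
            F.current_timestamp().alias("time"),       # ingestion timestamp
            F.col("value").alias("raw"),               # raw event payload
        )
    )
```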
Yes, if you have a streaming tool (ex. Cribl, Fluentbit, Apache NiFi), you can send custom data sources directly to your primary cloud storage (ex. S3, Blob, etc.) and Anvilogic can orchestrate the ETL process into the correct schema and tables required for detection purposes.
Yes, Anvilogic helps with all of the parsing and normalization of security-relevant data into the Anvilogic schema.
We have onboarding templates and configs that help ensure the data you are bringing into Databricks is properly formatted to execute detections and perform triage, hunting, and response.
All data parsing, normalization, and enrichment is done in the Python Notebook section of the diagram above.
Anvilogic assists in the ETL process of parsing, normalization and enrichment.
Bronze Tables - Unparsed, unstructured data; usually stored in two columns (time and raw).
Silver Tables - Parsed and structured data; this is usually where raw data is separated into multiple columns (normalization and enrichment can also occur here).
Gold Tables - Data feeds that are critical for security operations (ex. detections, triage, hunting) are consolidated into parent tables by security domain (ex. Endpoint, Cloud, Network). This data is parsed, normalized, and enriched where available.
Each feed can be customized based on an organization's preference.
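Continuing the hypothetical bronze sketch above, the silver and gold layers could be expressed as additional Delta Live Tables. The parsing schema and the normalized column names below are illustrative placeholders, not the actual Anvilogic field names.

```python
# Hypothetical continuation of the bronze sketch: parse into silver, then
# normalize into a gold endpoint-domain table. Schemas and names are placeholders.
import dlt
from pyspark.sql import functions as F

@dlt.table(name="silver_windows_events", comment="Parsed and structured Windows events")
def silver_windows_events():
    schema = "event_id STRING, host STRING, user STRING, process STRING, command_line STRING"
    return (
        dlt.read_stream("bronze_windows_events")
        .withColumn("event", F.from_json("raw", schema))
        .select("time", "event.*")
    )

@dlt.table(name="gold_endpoint", comment="Endpoint-domain table consumed by detections")
def gold_endpoint():
    return dlt.read_stream("silver_windows_events").select(
        F.col("time"),
        F.col("host").alias("hostname"),
        F.col("user").alias("user_name"),
        F.col("process").alias("process_name"),
        F.col("command_line"),
    )
```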
Yes, Anvilogic provides out-of-the-box integrations for common vendor alerts and data collection for specific SaaS security tools (ex. CrowdStrike FDR).
Tools not listed in our integration marketplace can be sent through the Custom Data Integration pipeline as a self-service option.
Raw data sources are events/telemetry generated by endpoints, tools, and appliances (ex. Windows Event Logs, EDR logs).
Alert data is curated signals from security tools (ex. Proofpoint alerts, anti-virus alerts, etc.) that the vendor has already identified as suspicious or malicious.
Yes, Anvilogic can integrate with most SOARs via REST API through either a push or a pull method.
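For the push method, a generic sketch might forward a detection event to a SOAR's inbound webhook over HTTPS. The URL, token handling, and payload fields below are hypothetical; use your SOAR's actual inbound API.

```python
# Generic, hypothetical sketch of pushing an alert to a SOAR webhook over REST.
# The endpoint URL and payload fields are placeholders, not a specific SOAR API.
import requests

SOAR_WEBHOOK_URL = "https://soar.example.com/api/v1/events"  # placeholder endpoint

def push_alert_to_soar(alert: dict, api_token: str) -> None:
    response = requests.post(
        SOAR_WEBHOOK_URL,
        json=alert,
        headers={"Authorization": f"Bearer {api_token}"},
        timeout=10,
    )
    response.raise_for_status()

push_alert_to_soar(
    {"rule": "Encoded PowerShell", "hostname": "host-01", "severity": "high"},
    api_token="<your-soar-token>",
)
```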
Yes, Anvilogic has a search user interface (UI) to make it easy to query data that is inside of a Databricks catalog.
In addition, Anvilogic makes it easy to build repeatable detections that can execute on top of Databricks using a low-code UI builder.
Yes, Anvilogic has a data model and offers parsing and normalization code for any security data set that you want to use within the platform.
Yes, we can also work with OCSF data, and each data feed can be modified/controlled to customize to your needs.
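As a rough illustration of how OCSF data could be reshaped, the sketch below remaps a handful of OCSF Process Activity attributes onto normalized columns. The source attribute paths follow OCSF; the target column and table names are hypothetical placeholders, not the Anvilogic data model.

```python
# Hypothetical sketch: remap OCSF Process Activity attributes onto normalized
# columns. Target column/table names are placeholders, not the Anvilogic schema.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

ocsf_events = spark.table("bronze_ocsf_process_activity")  # placeholder source table

normalized = ocsf_events.select(
    F.col("time"),
    F.col("device.hostname").alias("hostname"),
    F.col("actor.user.name").alias("user_name"),
    F.col("process.name").alias("process_name"),
    F.col("process.cmd_line").alias("command_line"),
)
normalized.write.mode("append").saveAsTable("gold_endpoint")  # placeholder target
```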
Yes, Anvilogic can onboard IOCs from your third-party threat intel tools (ex. ThreatConnect) and use that data to create new detections, conduct ongoing exposure checks across your data feeds, or enrich your alert output for triage analysts.
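For example, an ongoing exposure check could be sketched as a join between an IOC table and a network-domain table; the table and column names below (ioc_indicators, gold_network, dest_ip) are hypothetical placeholders.

```python
# Hypothetical sketch of an IOC exposure check: join indicator IPs against a
# network-domain table. Table and column names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

iocs = spark.table("ioc_indicators").where(F.col("ioc_type") == "ip")
network = spark.table("gold_network")

exposures = network.join(
    iocs, network["dest_ip"] == iocs["indicator"], "inner"
).select("time", "src_ip", "dest_ip", "indicator", "source_feed")

exposures.write.mode("append").saveAsTable("ioc_exposure_matches")
```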