List of Apache Spark Customers

Apache Software Foundation

1000 N West Street, Suite 1200,
Wilmington, 19801, DE,
United States

https://www.apache.org/

1 302-250-4080

Non Profit

$2M

Since 2010, our global team of researchers has been studying Apache Spark customers around the world, aggregating data points that form the basis of our forecast assumptions and that may signal the rise and fall of certain vendors and their products on a quarterly basis.

Each quarter, our research team identifies companies that have purchased Apache Spark for Analytics and BI from public sources (press releases, customer references, testimonials, case studies and success stories) and proprietary sources. Each record includes the customer size, industry, location, implementation status, partner involvement, line-of-business (LOB) key stakeholders and contact details for related IT decision-makers.

Companies using Apache Spark for Analytics and BI include:

HCA Healthcare, a United States based Healthcare organisation with 226000 employees and revenues of $70.60 billion
Allstate, a United States based Insurance organisation with 55000 employees and revenues of $67.69 billion
Royal Bank of Canada, a Canada based Banking and Financial Services organisation with 96628 employees and revenues of $48.64 billion
Banco Itau, a Brazil based Banking and Financial Services organisation with 93200 employees and revenues of $28.40 billion
Freddie Mac, a United States based Banking and Financial Services organisation with 8004 employees and revenues of $21.20 billion

and many others.

Contact us if you need a complete and verified list of companies using Apache Spark, including the breakdown by industry (21 verticals), geography (region, country, state, city), company size (revenue, employees, assets) and the related IT decision makers, key stakeholders, and business and technology executives responsible for the IaaS software purchases.

The Apache Spark customer wins are being incorporated into our Enterprise Applications Buyer Insight and Technographics Customer Database, which has over 100 data fields that detail company usage of IaaS software systems and their digital transformation initiatives. Apps Run The World wants to become your No. 1 technographic data source!

Apply Filters For Customers


Customer	Industry	Employees	Revenue	Country	Vendor	Application	Category	Year	System Integrator	Insight
2K Games	Professional Services	3000	$930M	United States	Apache Software	Apache Spark	Analytics and BI	2018	n/a	In 2018, 2K Games deployed Apache Spark as a Distributed Processing System to support data engineering and analytics workflows across its operational data estate. The Apache Spark implementation was positioned to process Hive data stored in HDFS and to analyze data held in Teradata, using the HDP platform and YARN cluster for job execution and resource management. Implementation focused on batch processing and interactive query capabilities, with engineers designing Spark batch jobs to replace slower MapReduce patterns and to accelerate ETL pipelines. Apache Spark applications were developed using Spark SQL, Spark DataFrame API, Spark RDDs, PySpark, Python and Scala, and included custom aggregate functions, data validation, cleansing, transformation, and interactive querying workflows. The data pipeline architecture integrated Apache Spark with Hive, HDFS, Teradata and custom-built input adapters to ingest and normalize disparate sources. Operational tasks included migrating Hive tables between environments, a scripted data copy from edge node to on premises data lake, and auditing YARN cluster log files while data copy processes ran. Governance and operational ownership were established through a reuse framework for interfaces and day to day coordination with offshore teams to assign tasks and maintain pipeline reliability. Engineers were responsible for building reusable Spark components, validating data with PySpark applications, and maintaining operational monitoring via log auditing in the YARN cluster.
Afiniti	Professional Services	2000	$350M	Bermuda	Apache Software	Apache Spark	Analytics and BI	2020	n/a	In 2020, Afiniti deployed Apache Spark as a central execution engine within a Distributed Processing System to operationalize real-time streaming and large scale data processing for its AI driven customer experience platform. Apache Spark supported in memory analytics and micro batch processing to enable predictive agent pairing and near real time model scoring for customer service routing. The Spark implementation covered stream processing and batch ETL workloads, integrated into a Medallion Architecture for staged data refinement. Functional capabilities included streaming ingestion and processing, micro batch aggregation, schema aware transformations, automated data validation, and support for change data capture workflows and dimensional modeling using DBT and ErWin. Integrations were explicitly instrumented with Kafka for real time streaming, Azure Databricks for Spark execution, Azure Data Factory and Airflow for orchestration, and downstream consumption by Snowflake and Apache Superset for analytics. The implementation also leveraged data ingestion and replication tooling such as AirByte, Talend, and QLIK Replicate to ingest MySQL, SQL Server, Greenplum, and PostgreSQL sources, and included C#.NET API integrations for system interoperability. Operational rollout targeted global analytics and engineering teams and included documentation driven onboarding, capacity planning, and staff training to sustain production operations. Governance changes formalized CDC pipelines and standardized dimensional models, while outcomes documented in project notes included a 30% reduction in streaming latency, a 40% reduction in infrastructure cost following Azure migration, a 50% increase in data processing speeds in cloud pipelines, automated validation that cut manual effort by 20% and maintained 99.9% data integrity, and a 25% reduction in query time for the enterprise data portal.
Allstate	Insurance	55000	$67.7B	United States	Apache Software	Apache Spark	Analytics and BI	2017	n/a	In 2017, Allstate implemented Apache Spark as its Distributed Processing System to operationalize ETL and feature engineering pipelines that support analytics and machine learning workflows. Apache Spark became the core distributed processing engine used by data engineering and data science teams to process large claims, policy and agency datasets and to enable downstream modeling efforts. The implementation centered on PySpark, Spark SQL and the DataFrame APIs to build complex ETL and feature engineering pipelines in Python, with Apache Airflow used for orchestration and job scheduling. Pipelines covered structured ETL, feature creation for Agency Analytics and Product Operations predictive models, and NLP pipelines that extract signals from unstructured claims text, while also preparing imagery datasets for computer vision models developed with TensorFlow, Keras and Scikit Learn. Apache Spark was integrated into an ecosystem that included Amazon S3 as a data lake, Oracle Database for transactional sources, and Hadoop and Hive for data storage and cataloging, with analytic outputs consumed in Tableau and model artifacts promoted into production monitoring workflows. Spark jobs were invoked and monitored by Airflow orchestrations, and outputs were surfaced to data science teams for model training and to production model deployment processes. Governance and operationalization included an internal product owner function that developed data management tools and a data catalog to simplify find and request access workflows, and an Engineering Consulting Services team that supported adoption across data science groups. Engineering best practices such as version control, unit testing and data validation were coached into teams to increase pipeline reliability while supporting machine learning initiatives focused on claims routing, handling and agency performance analytics.
Asurion	Professional Services	23000	$9.0B	United States	Apache Software	Apache Spark	Analytics and BI	2022	n/a	In 2022, Asurion deployed Apache Spark as the core processing engine for its Analytics and BI platform. The initiative focused on scalable data processing and ETL workflows to support data science and analytics use cases across the company, with software engineering teams responsible for platform development and operational ownership. Apache Spark was implemented as the primary engine for large scale batch processing and data transformation. The implementation emphasized Scala based data processing applications, modular ETL pipelines, job orchestration, and routine performance optimization, with engineers expected to maintain code quality through reviews and comprehensive technical documentation. The Spark deployment integrates with AWS infrastructure components explicitly referenced in hiring and technical notes, including EMR for managed Spark clusters, S3 as object storage for the data lake, EC2 for compute resources, and Lambda for event driven orchestration. Operational coverage includes software engineers in Nashville, TN collaborating with data scientists, DevOps engineers, and product managers to design and tune pipelines for downstream analytics workloads. Governance for the Apache Spark environment centers on software engineering practices, code review processes, and documentation standards to support maintainability and troubleshooting. The platform design assigns responsibilities for ETL development, performance tuning, and cross team coordination, aligning the Analytics and BI platform with established engineering workflows.
Banco Inter	Banking and Financial Services	3800	$3.6B	Brazil	Apache Software	Apache Spark	Analytics and BI	2020	n/a	In 2020, Banco Inter deployed Apache Spark as part of an Analytics and BI initiative to support batch and realtime data processing across its data platform. Apache Spark was positioned to provide scalable ETL and streaming compute for analytics workloads feeding internal reporting and downstream data products. The implementation emphasized an observability and data reliability layer, with modules for continuous log monitoring, metric collection, automated alerting, and key performance indicators for pipeline health. Data pipeline capabilities included batch processing and streaming processing, event driven change data capture workflows, and engineering support for Kafka producers and consumers. Teams implemented automated alerts and anomaly detection techniques to surface performance degradation and data quality issues. Apache Spark was integrated with Apache Kafka and Kafka Connect using Debezium for CDC, and with cloud data services on Amazon Web Services including Amazon RDS and Amazon Redshift for storage and downstream analytics. Monitoring and performance telemetry were instrumented through New Relic and pipeline metrics, enabling close coupling between Spark processing, messaging, and cloud storage. The configuration connected data engineer teams, application developers, infrastructure engineers, and analytics consumers across Banco Inter. Governance centered on cross functional collaboration to maintain observability, implement pipeline improvements, and evolve event driven solutions, with Data Reliability Engineers responsible for alerting, KPI definition, and remediation workflows. The deployment aimed to guarantee pipeline stability and performance and to produce data products that help infrastructure and application teams with cost reduction, vulnerability management and resource improvements.
Banco Itau	Banking and Financial Services	93200	$28.4B	Brazil	Apache Software	Apache Spark	Analytics and BI	2020	n/a
	Professional Services	840	$350M	United States	Apache Software	Apache Spark	Analytics and BI	2012	n/a
	Banking and Financial Services	2600	$1.5B	Ireland	Apache Software	Apache Spark	Analytics and BI	2017	n/a
	Automotive	450	$65M	United States	Apache Software	Apache Spark	Analytics and BI	2021	n/a
Freddie Mac	Banking and Financial Services	8004	$21.2B	United States	Apache Software	Apache Spark	Analytics and BI	2021	n/a

Showing 1 to 10 of 26 entries

Buyer Intent: Companies Evaluating Apache Spark

ARTW Buyer Intent uncovers actionable customer signals, identifying software buyers actively evaluating Apache Spark. Gain ongoing access to real-time prospects and uncover hidden opportunities. Companies Actively Evaluating Apache Spark for Analytics and BI include:

University of Illinois, a United States based Education organization with 12000 Employees
Edwards Lifesciences, a United States based Life Sciences company with 19800 Employees
Joachim Weigelt Bueroorganisation, a Germany based Retail organization with 11 Employees

Discover Software Buyers actively Evaluating Enterprise Applications


Company	Industry	Employees	Revenue	Country	Evaluated
University of Illinois	Education	12000	$5.0B	United States	2026-05-24
Edwards Lifesciences	Life Sciences	19800	$6.0B	United States	2026-03-10
Joachim Weigelt Bueroorganisation	Retail	11	$2M	Germany	2026-03-09
	Non Profit	2000	$250M	United States	2026-01-28
	Education	8540	$990M	Germany	2026-01-26
	Banking and Financial Services	317233	$180.6B	United States	2025-12-16
	Construction and Real Estate	1900	$480M	Bulgaria	2025-11-12
	Media	150	$27M	France	2025-10-28
	Professional Services	59300	$4.8B	United States	2025-09-26
	Government	5693	$1.2B	Australia	2025-08-26

FAQ - APPS RUN THE WORLD Apache Spark Coverage

Q1. What is Apache Spark used for?

Apache Spark is an Analytics and BI solution from Apache Software.

Q2. Who uses Apache Spark for Analytics and BI?

Companies worldwide use Apache Spark, from small firms to large enterprises across 21+ industries.

Q3. Which companies use Apache Spark?

Organizations such as HCA Healthcare, Allstate, Royal Bank of Canada, Banco Itau and Freddie Mac are recorded users of Apache Spark for Analytics and BI.

Q4. What is the industry breakdown of companies using Apache Spark?

Companies using Apache Spark are most concentrated in Healthcare, Insurance, and Banking and Financial Services, with adoption spanning over 21 industries.

Q5. What is the country breakdown of companies using Apache Spark?

Companies using Apache Spark are most concentrated in the United States, Canada and Brazil, with adoption tracked across 195 countries worldwide. This global distribution highlights the popularity of Apache Spark across the Americas, EMEA, and APAC.

Q6. What is the breakdown by employee size of companies using Apache Spark?

Companies using Apache Spark range from small businesses with 0-100 employees (3.85%), to mid-sized firms with 101-1,000 employees (19.23%), large organizations with 1,001-10,000 employees (30.77%), and global enterprises with 10,000+ employees (46.15%).

Q7. What is the breakdown by revenue of companies using Apache Spark?

Customers of Apache Spark include firms across all revenue levels: $0-100M, $101M-$1B, $1B-$10B, and $10B+ global corporations.

Q8. How can I get the full list of companies using Apache Spark?

Contact APPS RUN THE WORLD to access the full verified Apache Spark customer database with detailed firmographics such as industry, geography, revenue, and employee breakdowns, as well as key decision makers in charge of Analytics and BI.
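
The employee-size shares quoted in the FAQ (3.85%, 19.23%, 30.77%, 46.15%) can be reproduced with simple bucketing arithmetic. The sketch below is illustrative only: the function names are hypothetical, the band boundaries assume that exactly 10,000 employees falls in the 1,001-10,000 band, and the input is limited to the ten employee counts visible in the customer table, so its shares differ from the full 26-entry figures. The per-band counts of 1, 5, 8 and 12 are not stated in the listing; they are inferred as the only whole-number split of 26 entries that matches the published percentages.

```python
from collections import Counter

# Size bands as named in the FAQ. Upper bounds are inclusive; treating
# exactly 10,000 employees as "1,001-10,000" is an assumption.
BANDS = [
    ("0-100", 0, 100),
    ("101-1,000", 101, 1_000),
    ("1,001-10,000", 1_001, 10_000),
    ("10,000+", 10_001, None),
]


def band_for(employees: int) -> str:
    """Return the label of the size band that contains `employees`."""
    for label, low, high in BANDS:
        if employees >= low and (high is None or employees <= high):
            return label
    raise ValueError(f"invalid employee count: {employees}")


def band_shares(employee_counts: list[int]) -> dict[str, float]:
    """Percentage of companies in each band, rounded to two decimals."""
    counts = Counter(band_for(n) for n in employee_counts)
    total = len(employee_counts)
    return {label: round(100 * counts[label] / total, 2) for label, _, _ in BANDS}


# Employee counts from the ten rows visible in the customer table.
visible_rows = [3000, 2000, 55000, 23000, 3800, 93200, 840, 2600, 450, 8004]
shares = band_shares(visible_rows)
print(shares)

# Inferred (not published) per-band counts for all 26 entries.
inferred_counts = (1, 5, 8, 12)
inferred_shares = [round(100 * c / 26, 2) for c in inferred_counts]
print(inferred_shares)
```

On the ten visible rows the split is 0%, 20%, 50% and 30%, which shows how far a partial sample can drift from the shares reported for the full list.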