List of Apache Spark Customers
Since 2010, our global team of researchers has been studying Apache Spark customers around the world, aggregating large numbers of data points that form the basis of our forecast assumptions and that may signal, quarter by quarter, the rise and fall of certain vendors and their products.
Each quarter, our research team identifies companies that have purchased Apache Spark for Analytics and BI, drawing on public sources (press releases, customer references, testimonials, case studies and success stories) and proprietary sources. For each company we record customer size, industry, location, implementation status, partner involvement, line-of-business (LOB) key stakeholders, and contact details for the related IT decision-makers.
Companies using Apache Spark for Analytics and BI include the following, among many others:
- HCA Healthcare, a United States based Healthcare organization with 226000 employees and revenues of $70.60 billion
- Allstate, a United States based Insurance organization with 55000 employees and revenues of $67.69 billion
- Royal Bank of Canada, a Canada based Banking and Financial Services organization with 96628 employees and revenues of $48.64 billion
- Banco Itau, a Brazil based Banking and Financial Services organization with 93200 employees and revenues of $28.40 billion
- Freddie Mac, a United States based Banking and Financial Services organization with 8004 employees and revenues of $21.20 billion
Contact us if you need a complete and verified list of companies using Apache Spark, including a breakdown by industry (21 verticals), geography (region, country, state, city) and company size (revenue, employees, assets), along with the related IT decision makers, key stakeholders, and business and technology executives responsible for the IaaS software purchases.
The Apache Spark customer wins are being incorporated into our Enterprise Applications Buyer Insight and Technographics Customer Database, which has over 100 data fields detailing company usage of IaaS software systems and their digital transformation initiatives. Apps Run The World wants to become your No. 1 technographic data source!
| Logo | Customer | Industry | Employees | Revenue | Country | Vendor | Application | Category | Year | System Integrator | Insight | Insight Source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 2K Games | Professional Services | 3000 | $930M | United States | Apache Software | Apache Spark | Analytics and BI | 2018 | n/a | In 2018, 2K Games deployed Apache Spark as a Distributed Processing System to support data engineering and analytics workflows across its operational data estate. The Apache Spark implementation was positioned to process Hive data stored in HDFS and to analyze data held in Teradata, using the HDP platform and YARN cluster for job execution and resource management. Implementation focused on batch processing and interactive query capabilities, with engineers designing Spark batch jobs to replace slower MapReduce patterns and to accelerate ETL pipelines. Apache Spark applications were developed using Spark SQL, Spark DataFrame API, Spark RDDs, PySpark, Python and Scala, and included custom aggregate functions, data validation, cleansing, transformation, and interactive querying workflows. The data pipeline architecture integrated Apache Spark with Hive, HDFS, Teradata and custom-built input adapters to ingest and normalize disparate sources. Operational tasks included migrating Hive tables between environments, a scripted data copy from edge node to on premises data lake, and auditing YARN cluster log files while data copy processes ran. Governance and operational ownership were established through a reuse framework for interfaces and day to day coordination with offshore teams to assign tasks and maintain pipeline reliability. Engineers were responsible for building reusable Spark components, validating data with PySpark applications, and maintaining operational monitoring via log auditing in the YARN cluster. | |
| | Afiniti | Professional Services | 2000 | $350M | Bermuda | Apache Software | Apache Spark | Analytics and BI | 2020 | n/a | In 2020, Afiniti deployed Apache Spark as a central execution engine within a Distributed Processing System to operationalize real-time streaming and large scale data processing for its AI driven customer experience platform. Apache Spark supported in memory analytics and micro batch processing to enable predictive agent pairing and near real time model scoring for customer service routing. The Spark implementation covered stream processing and batch ETL workloads, integrated into a Medallion Architecture for staged data refinement. Functional capabilities included streaming ingestion and processing, micro batch aggregation, schema aware transformations, automated data validation, and support for change data capture workflows and dimensional modeling using DBT and ErWin. Integrations were explicitly instrumented with Kafka for real time streaming, Azure Databricks for Spark execution, Azure Data Factory and Airflow for orchestration, and downstream consumption by Snowflake and Apache Superset for analytics. The implementation also leveraged data ingestion and replication tooling such as AirByte, Talend, and QLIK Replicate to ingest MySQL, SQL Server, Greenplum, and PostgreSQL sources, and included C#.NET API integrations for system interoperability. Operational rollout targeted global analytics and engineering teams and included documentation driven onboarding, capacity planning, and staff training to sustain production operations. Governance changes formalized CDC pipelines and standardized dimensional models, while outcomes documented in project notes included a 30% reduction in streaming latency, a 40% reduction in infrastructure cost following Azure migration, a 50% increase in data processing speeds in cloud pipelines, automated validation that cut manual effort by 20% and maintained 99.9% data integrity, and a 25% reduction in query time for the enterprise data portal. | |
| | Allstate | Insurance | 55000 | $67.7B | United States | Apache Software | Apache Spark | Analytics and BI | 2017 | n/a | In 2017, Allstate implemented Apache Spark as its Distributed Processing System to operationalize ETL and feature engineering pipelines that support analytics and machine learning workflows. Apache Spark became the core distributed processing engine used by data engineering and data science teams to process large claims, policy and agency datasets and to enable downstream modeling efforts. The implementation centered on PySpark, Spark SQL and the DataFrame APIs to build complex ETL and feature engineering pipelines in Python, with Apache Airflow used for orchestration and job scheduling. Pipelines covered structured ETL, feature creation for Agency Analytics and Product Operations predictive models, and NLP pipelines that extract signals from unstructured claims text, while also preparing imagery datasets for computer vision models developed with TensorFlow, Keras and Scikit Learn. Apache Spark was integrated into an ecosystem that included Amazon S3 as a data lake, Oracle Database for transactional sources, and Hadoop and Hive for data storage and cataloging, with analytic outputs consumed in Tableau and model artifacts promoted into production monitoring workflows. Spark jobs were invoked and monitored by Airflow orchestrations, and outputs were surfaced to data science teams for model training and to production model deployment processes. Governance and operationalization included an internal product owner function that developed data management tools and a data catalog to simplify find and request access workflows, and an Engineering Consulting Services team that supported adoption across data science groups. Engineering best practices such as version control, unit testing and data validation were coached into teams to increase pipeline reliability while supporting machine learning initiatives focused on claims routing, handling and agency performance analytics. | |
| | | Professional Services | 23000 | $9.0B | United States | Apache Software | Apache Spark | Analytics and BI | 2022 | n/a | | |
| | | Banking and Financial Services | 3800 | $3.6B | Brazil | Apache Software | Apache Spark | Analytics and BI | 2020 | n/a | | |
| | Banco Itau | Banking and Financial Services | 93200 | $28.4B | Brazil | Apache Software | Apache Spark | Analytics and BI | 2020 | n/a | | |
| | | Professional Services | 840 | $350M | United States | Apache Software | Apache Spark | Analytics and BI | 2012 | n/a | | |
| | | Banking and Financial Services | 2600 | $1.5B | Ireland | Apache Software | Apache Spark | Analytics and BI | 2017 | n/a | | |
| | | Automotive | 450 | $65M | United States | Apache Software | Apache Spark | Analytics and BI | 2021 | n/a | | |
| | Freddie Mac | Banking and Financial Services | 8004 | $21.2B | United States | Apache Software | Apache Spark | Analytics and BI | 2021 | n/a | | |
Buyer Intent: Companies Evaluating Apache Spark
- Hoag Memorial Hospital, a United States based Non Profit organization with 2000 Employees
- RWTH Aachen University, a Germany based Education company with 8540 Employees
- JPMorgan Chase, a United States based Banking and Financial Services organization with 317233 Employees