List of Apache PySpark Customers
Since 2010, our global team of researchers has been studying Apache PySpark customers around the world, aggregating the data points that underpin our quarterly forecast assumptions and track the rise and fall of vendors and their products.
Each quarter our research team identifies companies that have purchased Apache PySpark for API Management from public sources (press releases, customer references, testimonials, case studies and success stories) and proprietary sources, capturing customer size, industry, location, implementation status, partner involvement, LOB key stakeholders and contact details for the related IT decision-makers.
Companies using Apache PySpark for API Management include: NextEra Energy, a United States-based Utilities organisation with 16800 employees and revenues of $24.75 billion; Synchrony, a United States-based Banking and Financial Services organisation with 20000 employees and revenues of $16.13 billion; FBD Insurance, an Ireland-based Insurance organisation with 900 employees and revenues of $272.0 million; and many others.
Contact us if you need a complete and verified list of companies using Apache PySpark, including the breakdown by Industry (21 Verticals), Geography (Region, Country, State, City), Company Size (Revenue, Employees, Assets) and related IT Decision Makers, Key Stakeholders, and business and technology executives responsible for the software purchases.
The Apache PySpark customer wins are incorporated into our Enterprise Applications Buyer Insight and Technographics Customer Database, which has over 100 data fields that detail company usage of software systems and their digital transformation initiatives. Apps Run The World wants to become your No. 1 technographic data source!
Apply Filters For Customers
| Logo | Customer | Industry | Empl. | Revenue | Country | Vendor | Application | Category | When | SI | Insight |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | FBD Insurance | Insurance | 900 | $272M | Ireland | Apache Software | Apache PySpark | API Management | 2025 | n/a | |
In 2025, FBD Insurance implemented Apache PySpark for API Management to support claims data pipelines. The deployment centres on migrating and transforming high-volume claims data and provisioning programmatic access layers for downstream actuarial and claims workflows.
Implementation scope covers data-engineering-led design of ETL orchestration and batch processing, using Apache PySpark together with SQL-based processing. Functional capabilities include extraction and transformation pipelines, automated data cleansing and schema enforcement, and pipeline scheduling to feed actuarial financial modelling and reserving processes.
Integrations include SSIS for ETL workflow automation and SQL for persistent staging and analytic layers, with the new claims system acting as the primary target for transformed datasets. Operational coverage focuses on the claims and actuarial business functions and is executed by the Dublin-based hybrid data engineering team.
Governance and rollout emphasize data engineering best practices, automation of legacy processes, and modular pipeline design. The project modernizes and automates prior data handling patterns, improving pipeline maintainability and scalability as part of the Apache PySpark deployment for API Management.
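The cleansing and schema-enforcement pattern described above can be sketched in PySpark. This is a minimal illustration, not FBD's actual code: the claims schema, field names, and cleansing rule are assumptions, and the source and target paths are placeholders. The cleansing rule is kept as a pure Python function so it can be unit-tested without a Spark runtime.

```python
def normalise_policy_no(policy_no):
    """Illustrative cleansing rule: trim whitespace, upper-case, map empty strings to None."""
    if policy_no is None:
        return None
    cleaned = policy_no.strip().upper()
    return cleaned if cleaned else None

if __name__ == "__main__":
    # Spark wiring -- requires a pyspark installation; schema and paths are placeholders.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    claims_schema = StructType([  # enforced on read: non-conforming fields become nulls
        StructField("claim_id", StringType(), nullable=False),
        StructField("policy_no", StringType()),
        StructField("claim_amount", DoubleType()),
    ])

    spark = SparkSession.builder.appName("claims-etl-sketch").getOrCreate()
    normalise = F.udf(normalise_policy_no, StringType())

    cleaned = (
        spark.read.schema(claims_schema).json("claims_raw/")  # placeholder source
        .withColumn("policy_no", normalise("policy_no"))
        .dropDuplicates(["claim_id"])
        .filter(F.col("claim_amount").isNotNull() & (F.col("claim_amount") >= 0))
    )
    # Staged output feeding the downstream actuarial and reserving jobs.
    cleaned.write.mode("overwrite").parquet("claims_clean/")
```

Keeping the cleansing rule outside the Spark job mirrors the modular pipeline design the case study mentions: the rule can be exercised in plain unit tests, while the UDF wrapper applies it at scale.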
| | NextEra Energy | Utilities | 16800 | $24.8B | United States | Apache Software | Apache PySpark | API Management | 2018 | n/a | |
In 2018, NextEra Energy deployed Apache PySpark as part of a Big Data analytics build within its IT organization, establishing a cloud-native analytics foundation to process IoT sensor telemetry from Florida solar assets. A contract Cloud Engineer from ProTek Consulting led the end-to-end design and deployment on AWS, aligning the effort with corporate migration sequencing while enabling departmental analytics ahead of broader schedules.
Apache PySpark was embedded in the ETL layer to perform scalable transformations as sensor data moved into the cloud, with ingestion and orchestration architecture using AWS DMS, AWS Glue, Amazon S3, and AWS Lambda to stage and transform records before landing in Amazon Redshift. Infrastructure provisioning was automated through AWS CloudFormation to ensure repeatability across environments, and ETL jobs and data pipelines were instrumented for operational visibility.
The implementation integrated Amazon Redshift with both Power BI via ODBC and JDBC connectors and with Amazon QuickSight to enable a hybrid BI strategy that supported centralized corporate reporting alongside agile internal dashboards. Security and governance controls were implemented using IAM policies, VPC isolation, and KMS encryption, and the solution maintained close alignment with on-premises network, data, and compliance stakeholders to preserve hybrid operational continuity.
Operational governance included automated provisioning workflows, observability using Amazon CloudWatch to track ETL job health, data latency, and Redshift query performance, and a structured handoff to IT and analytics teams. The build reduced infrastructure setup time by 25 percent and improved issue detection and incident response by 40 percent as reported by the implementation team, and the solution served as a reference model for subsequent departmental cloud initiatives.
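A PySpark transformation of the kind described in this telemetry pipeline might look like the sketch below. It is a hedged illustration only: the field names (`site_id`, `output_kw`, `event_ts`), validity thresholds, and paths are assumptions, not NextEra's actual schema, and the real build staged data via AWS Glue and S3 before Redshift. The validity rule is a pure function so it can be tested without Spark.

```python
def is_valid_reading(output_kw, lo_kw=0.0, hi_kw=500.0):
    """Illustrative validity rule: a solar inverter reading must be a number in [lo_kw, hi_kw]."""
    if output_kw is None:
        return False
    return lo_kw <= output_kw <= hi_kw

if __name__ == "__main__":
    # Spark wiring -- requires pyspark; paths stand in for the S3 landing zone.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import BooleanType, DoubleType

    spark = SparkSession.builder.appName("telemetry-etl-sketch").getOrCreate()
    valid = F.udf(is_valid_reading, BooleanType())

    telemetry = (
        spark.read.json("telemetry_raw/")  # placeholder for the raw sensor landing prefix
        .withColumn("output_kw", F.col("output_kw").cast(DoubleType()))
        .withColumn("is_valid", valid("output_kw"))
        .withColumn("reading_hour", F.date_trunc("hour", F.col("event_ts")))
    )

    # Hourly aggregates per site, written as Parquet ahead of the Redshift load step.
    (telemetry.filter("is_valid")
        .groupBy("site_id", "reading_hour")
        .agg(F.avg("output_kw").alias("avg_kw"), F.count("*").alias("n_readings"))
        .write.mode("overwrite").parquet("telemetry_hourly/"))
```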
| | Synchrony | Banking and Financial Services | 20000 | $16.1B | United States | Apache Software | Apache PySpark | API Management | 2018 | n/a | |
In 2018, Synchrony implemented Apache PySpark for API Management, positioning it as the primary engine for large-scale ETL and big data processing.
Implementation scope centered on designing and developing scalable ETL pipelines using AWS Glue, PySpark, and SQL, with Python-based orchestration using AWS Lambda and Step Functions. Functional capabilities implemented included JSON encoding and decoding with PySpark to transform semi-structured data into analytics-ready tables, automated ingestion and validation pipelines, and data validation frameworks built with PySpark and Pandas.
Integrations and operational architecture leveraged AWS services including S3 for landing and persistent storage, Glue for cataloging and ETL orchestration, Redshift for analytic modeling, EMR for Spark-based batch processing, IAM and KMS for security, and CloudWatch for monitoring. Migration work included moving on-premises DB2 datasets to Amazon S3 using AWS Glue and PySpark, and Redshift schema design used distribution keys, sort keys, and materialized views to improve query execution.
Governance and operational controls emphasized secure access and compliance through IAM role management, S3 bucket policies, and KMS encryption, while operationalizing monitoring and incident workflows with CloudWatch, JIRA, and ServiceNow in an Agile Scrum delivery model. The Apache PySpark implementation supported cross-functional data engineering and analytics teams, and included automated source file ingestion, data quality checks, and cleanup workflows that reduced manual intervention and addressed performance bottlenecks through SQL and ETL optimization.
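The JSON decoding step described above, turning semi-structured payloads into analytics-ready tables, can be sketched with PySpark's `from_json`. The event shape, field names, and paths below are assumptions for illustration; Synchrony's real schemas are not public. The parse-and-reject rule is also shown as a pure function so the validation logic can be tested without a Spark cluster.

```python
import json

def parse_event(raw):
    """Illustrative decode step: parse one JSON record, returning None for malformed input."""
    try:
        event = json.loads(raw)
    except (TypeError, ValueError):
        return None
    return event if isinstance(event, dict) else None

if __name__ == "__main__":
    # Spark wiring -- requires pyspark; schema and paths are placeholders.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    event_schema = StructType([  # assumed event shape, for illustration only
        StructField("account_id", StringType()),
        StructField("txn_amount", DoubleType()),
        StructField("txn_ts", TimestampType()),
    ])

    spark = SparkSession.builder.appName("json-decode-sketch").getOrCreate()

    # from_json turns the semi-structured payload column into typed columns;
    # records that fail to parse yield a null struct, which the filter quarantines.
    events = (
        spark.read.text("events_raw/")  # placeholder for the S3 landing prefix
        .withColumn("event", F.from_json(F.col("value"), event_schema))
        .filter(F.col("event").isNotNull())
        .select("event.*")
    )
    events.write.mode("overwrite").parquet("events_typed/")
```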
Buyer Intent: Companies Evaluating Apache PySpark
Discover Software Buyers actively Evaluating Enterprise Applications
| Logo | Company | Industry | Employees | Revenue | Country | Evaluated |
|---|---|---|---|---|---|---|
| No data found | | | | | | |