List of Apache PySpark Customers
Since 2010, our global team of researchers has been studying Apache PySpark customers around the world, aggregating the data points that underpin our quarterly forecast assumptions and track the rise and fall of vendors and their products.
Each quarter our research team identifies companies that have purchased Apache PySpark for API Management from public sources (press releases, customer references, testimonials, case studies and success stories) and proprietary sources, capturing customer size, industry, location, implementation status, partner involvement, LOB key stakeholders and contact details for the related IT decision-makers.
Companies using Apache PySpark for API Management include: NextEra Energy, a United States-based Utilities organisation with 16800 employees and revenues of $24.75 billion; Synchrony, a United States-based Banking and Financial Services organisation with 20000 employees and revenues of $16.13 billion; FBD Insurance, an Ireland-based Insurance organisation with 900 employees and revenues of $272.0 million; and many others.
Contact us if you need a complete and verified list of companies using Apache PySpark, including the breakdown by Industry (21 Verticals), Geography (Region, Country, State, City), Company Size (Revenue, Employees, Assets) and related IT Decision Makers, Key Stakeholders, and business and technology executives responsible for the software purchases.
The Apache PySpark customer wins are incorporated into our Enterprise Applications Buyer Insight and Technographics Customer Database, which has over 100 data fields that detail company usage of software systems and their digital transformation initiatives. Apps Run The World wants to become your No. 1 technographic data source!
Apply Filters For Customers
| Logo | Customer | Industry | Empl. | Revenue | Country | Vendor | Application | Category | When | SI | Insight |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | FBD Insurance | Insurance | 900 | $272M | Ireland | Apache Software | Apache PySpark | API Management | 2025 | n/a | |
In 2025, FBD Insurance implemented Apache PySpark for API Management to support claims data pipelines. The deployment centres on migrating and transforming high-volume claims data and provisioning programmatic access layers for downstream actuarial and claims workflows.
Implementation scope covers data-engineering-led design of ETL orchestration and batch processing, using Apache PySpark together with SQL-based processing. Functional capabilities include extraction and transformation pipelines, automated data cleansing and schema enforcement, and pipeline scheduling to feed actuarial financial modelling and reserving processes.
Integrations include SSIS for ETL workflow automation and SQL for persistent staging and analytic layers, with the new claims system acting as the primary target for transformed datasets. Operational coverage focuses on the claims and actuarial business functions and is executed by the Dublin-based hybrid data engineering team.
Governance and rollout emphasize data engineering best practices, automation of legacy processes, and modular pipeline design. The project modernizes and automates prior data handling patterns, improving pipeline maintainability and scalability as part of the Apache PySpark deployment for API Management.
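The cleansing and schema-enforcement pattern described above can be sketched in PySpark. This is a minimal illustration, not FBD's actual code: the claims schema, field names, and cleansing rule are assumptions, and the source and target paths are placeholders. The cleansing rule is kept as a pure Python function so it can be unit-tested without a Spark runtime.

```python
def normalise_policy_no(policy_no):
    """Illustrative cleansing rule: trim whitespace, upper-case, map empty strings to None."""
    if policy_no is None:
        return None
    cleaned = policy_no.strip().upper()
    return cleaned if cleaned else None

if __name__ == "__main__":
    # Spark wiring -- requires a pyspark installation; schema and paths are placeholders.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    claims_schema = StructType([  # enforced on read: non-conforming fields become nulls
        StructField("claim_id", StringType(), nullable=False),
        StructField("policy_no", StringType()),
        StructField("claim_amount", DoubleType()),
    ])

    spark = SparkSession.builder.appName("claims-etl-sketch").getOrCreate()
    normalise = F.udf(normalise_policy_no, StringType())

    cleaned = (
        spark.read.schema(claims_schema).json("claims_raw/")  # placeholder source
        .withColumn("policy_no", normalise("policy_no"))
        .dropDuplicates(["claim_id"])
        .filter(F.col("claim_amount").isNotNull() & (F.col("claim_amount") >= 0))
    )
    # Staged output feeding the downstream actuarial and reserving jobs.
    cleaned.write.mode("overwrite").parquet("claims_clean/")
```

Keeping the cleansing rule outside the Spark job mirrors the modular pipeline design the case study mentions: the rule can be exercised in plain unit tests, while the UDF wrapper applies it at scale.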
| | NextEra Energy | Utilities | 16800 | $24.8B | United States | Apache Software | Apache PySpark | API Management | 2018 | n/a | |
In 2018, NextEra Energy deployed Apache PySpark as part of a Big Data analytics build within its IT organization, establishing a cloud-native analytics foundation to process IoT sensor telemetry from Florida solar assets. A contract Cloud Engineer from ProTek Consulting led the end-to-end design and deployment on AWS, aligning the effort with corporate migration sequencing while enabling departmental analytics ahead of broader schedules.
Apache PySpark was embedded in the ETL layer to perform scalable transformations as sensor data moved into the cloud, with ingestion and orchestration architecture using AWS DMS, AWS Glue, Amazon S3, and AWS Lambda to stage and transform records before landing in Amazon Redshift. Infrastructure provisioning was automated through AWS CloudFormation to ensure repeatability across environments, and ETL jobs and data pipelines were instrumented for operational visibility.
The implementation integrated Amazon Redshift with both Power BI via ODBC and JDBC connectors and with Amazon QuickSight to enable a hybrid BI strategy that supported centralized corporate reporting alongside agile internal dashboards. Security and governance controls were implemented using IAM policies, VPC isolation, and KMS encryption, and the solution maintained close alignment with on-premises network, data, and compliance stakeholders to preserve hybrid operational continuity.
Operational governance included automated provisioning workflows, observability using Amazon CloudWatch to track ETL job health, data latency, and Redshift query performance, and a structured handoff to IT and analytics teams. The build reduced infrastructure setup time by 25 percent and improved issue detection and incident response by 40 percent as reported by the implementation team, and the solution served as a reference model for subsequent departmental cloud initiatives.
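A PySpark transformation of the kind described in this telemetry pipeline might look like the sketch below. It is a hedged illustration only: the field names (`site_id`, `output_kw`, `event_ts`), validity thresholds, and paths are assumptions, not NextEra's actual schema, and the real build staged data via AWS Glue and S3 before Redshift. The validity rule is a pure function so it can be tested without Spark.

```python
def is_valid_reading(output_kw, lo_kw=0.0, hi_kw=500.0):
    """Illustrative validity rule: a solar inverter reading must be a number in [lo_kw, hi_kw]."""
    if output_kw is None:
        return False
    return lo_kw <= output_kw <= hi_kw

if __name__ == "__main__":
    # Spark wiring -- requires pyspark; paths stand in for the S3 landing zone.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import BooleanType, DoubleType

    spark = SparkSession.builder.appName("telemetry-etl-sketch").getOrCreate()
    valid = F.udf(is_valid_reading, BooleanType())

    telemetry = (
        spark.read.json("telemetry_raw/")  # placeholder for the raw sensor landing prefix
        .withColumn("output_kw", F.col("output_kw").cast(DoubleType()))
        .withColumn("is_valid", valid("output_kw"))
        .withColumn("reading_hour", F.date_trunc("hour", F.col("event_ts")))
    )

    # Hourly aggregates per site, written as Parquet ahead of the Redshift load step.
    (telemetry.filter("is_valid")
        .groupBy("site_id", "reading_hour")
        .agg(F.avg("output_kw").alias("avg_kw"), F.count("*").alias("n_readings"))
        .write.mode("overwrite").parquet("telemetry_hourly/"))
```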
| | Synchrony | Banking and Financial Services | 20000 | $16.1B | United States | Apache Software | Apache PySpark | API Management | 2018 | n/a | |
In 2018, Synchrony implemented Apache PySpark for API Management, positioning it as the primary engine for large-scale ETL and big data processing.
Implementation scope centered on designing and developing scalable ETL pipelines using AWS Glue, PySpark, and SQL, with Python-based orchestration using AWS Lambda and Step Functions. Functional capabilities implemented included JSON encoding and decoding with PySpark to transform semi-structured data into analytics-ready tables, automated ingestion and validation pipelines, and data validation frameworks built with PySpark and Pandas.
Integrations and operational architecture leveraged AWS services including S3 for landing and persistent storage, Glue for cataloging and ETL orchestration, Redshift for analytic modeling, EMR for Spark-based batch processing, IAM and KMS for security, and CloudWatch for monitoring. Migration work included moving on-premises DB2 datasets to Amazon S3 using AWS Glue and PySpark, and Redshift schema design used distribution keys, sort keys, and materialized views to improve query execution.
Governance and operational controls emphasized secure access and compliance through IAM role management, S3 bucket policies, and KMS encryption, while operationalizing monitoring and incident workflows with CloudWatch, JIRA, and ServiceNow in an Agile Scrum delivery model. The Apache PySpark implementation supported cross-functional data engineering and analytics teams, and included automated source file ingestion, data quality checks, and cleanup workflows that reduced manual intervention and addressed performance bottlenecks through SQL and ETL optimization.
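The JSON decoding step described above, turning semi-structured payloads into analytics-ready tables, can be sketched with PySpark's `from_json`. The event shape, field names, and paths below are assumptions for illustration; Synchrony's real schemas are not public. The parse-and-reject rule is also shown as a pure function so the validation logic can be tested without a Spark cluster.

```python
import json

def parse_event(raw):
    """Illustrative decode step: parse one JSON record, returning None for malformed input."""
    try:
        event = json.loads(raw)
    except (TypeError, ValueError):
        return None
    return event if isinstance(event, dict) else None

if __name__ == "__main__":
    # Spark wiring -- requires pyspark; schema and paths are placeholders.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    event_schema = StructType([  # assumed event shape, for illustration only
        StructField("account_id", StringType()),
        StructField("txn_amount", DoubleType()),
        StructField("txn_ts", TimestampType()),
    ])

    spark = SparkSession.builder.appName("json-decode-sketch").getOrCreate()

    # from_json turns the semi-structured payload column into typed columns;
    # records that fail to parse yield a null struct, which the filter quarantines.
    events = (
        spark.read.text("events_raw/")  # placeholder for the S3 landing prefix
        .withColumn("event", F.from_json(F.col("value"), event_schema))
        .filter(F.col("event").isNotNull())
        .select("event.*")
    )
    events.write.mode("overwrite").parquet("events_typed/")
```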
Buyer Intent: Companies Evaluating Apache PySpark
Discover Software Buyers actively Evaluating Enterprise Applications
| Logo | Company | Industry | Employees | Revenue | Country | Evaluated |
|---|---|---|---|---|---|---|
| No data found | | | | | | |